Registry of Open Data on AWS Access to datasets used by governments and researchers that happen to be stored on Amazon’s servers. Skewed toward large datasets.
Packaged
These sources offer datasets that are already packaged as CSV or Excel files, along with descriptions.
FiveThirtyEight Data used to support the site’s journalism, mainly in politics and sports.
Kaggle Long-time host of data science competitions. The formal competitions are well-curated, but user contributions vary widely.
UCI Machine Learning Repository Well-known source of datasets that have been used extensively in machine learning research, along with more recent contributions.
OpenML Appears to have been largely inactive for years, but many eclectic datasets remain.
IMDb Datasets Information about movies and TV shows. (Big files!)
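Once a packaged file is downloaded, loading it is usually a one-liner. The following is a minimal sketch that assumes the pandas package is installed; the CSV text, column names, and values are made up for illustration and stand in for a real downloaded file.

```python
import io

import pandas as pd

# Stand-in for a downloaded file; the columns and values are hypothetical.
csv_text = """title,year,rating
Example A,1999,7.5
Example B,2005,6.1
Example C,2012,8.3
"""

# read_csv accepts a file path, a URL, or any file-like object.
movies = pd.read_csv(io.StringIO(csv_text))

print(movies.shape)   # (number of rows, number of columns)
print(movies.dtypes)  # type inferred for each column
print(movies.head())  # first few rows
```

For a real file, replace the `io.StringIO(...)` argument with the path to the downloaded CSV. Checking the shape and inferred types first is a cheap way to catch a bad download or an unexpected delimiter.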
Other sources require you to navigate an interface to select data from a large pool. Typically, you can make selections, preview the dataset, and then download it in CSV or Excel format.
A much more exhaustive glossary can be found here.
Git
Git Version control system that maintains the entire file history of a project, including all versions and author attributions.
repository Collection of files needed to record the history of a git project.
GitHub Website that hosts git repositories created by private users, along with tools to help inspect and manage them.
commit Collection of particular changes to the repository made by an individual and given a message.
stage Temporary designation of locally modified files to be added to the next commit.
merge Automatic union of non-conflicting commits from different sources.
conflict Disagreement between repository versions that requires human intervention to resolve.
push Sending one or more commits from a local repository to a remote repository.
pull Receiving and merging all commits from a remote repository that are unknown to the local repository.
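The difference between a merge and a conflict can be sketched with a toy model. This is not how Git is implemented: Git compares files line by line and tracks a full commit history, whereas the hypothetical `merge` function below treats each version as a plain dictionary from file name to whole-file content. It only illustrates the rule that changes made on one side combine automatically, while different changes to the same file need a human decision.

```python
def merge(base, ours, theirs):
    """Toy three-way merge of {filename: content} snapshots.

    Returns the merged snapshot and a list of conflicting file names.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree, or neither changed the file
            result = o
        elif o == b:      # only "theirs" changed the file
            result = t
        elif t == b:      # only "ours" changed the file
            result = o
        else:             # both changed the same file in different ways
            conflicts.append(name)
            continue
        if result is not None:   # None means the file was deleted
            merged[name] = result
    return merged, conflicts


base = {"README": "v1", "data.csv": "a,b"}
ours = {"README": "v2", "data.csv": "a,b"}
theirs = {"README": "v1", "data.csv": "a,b,c", "notes.txt": "todo"}

# Non-conflicting changes are combined automatically.
clean_result, clean_conflicts = merge(base, ours, theirs)
print(clean_result, clean_conflicts)

# Both sides edited README differently, so it is reported as a conflict.
theirs_edited = dict(theirs, README="v3")
partial_result, found_conflicts = merge(base, ours, theirs_edited)
print(partial_result, found_conflicts)
```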
Notebooks
notebook Self-contained collection of text, math, code, output, and graphics.
kernel Back-end that executes code from and returns output to the notebook.
cell Atomic unit of a notebook that contains one or more lines of text or code.
Markdown Simplified syntax to put boldface, italics, and other formatting within text.
TeX/LaTeX Language used to express mathematical notation within a notebook.
Jupyter Popular format and system for interactive editing, execution, and export of notebooks.
JupyterLab Layer over Jupyter notebook functionality to help manage notebooks and extensions.
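It can help to see that a notebook file is ordinary structured text. The sketch below builds a simplified notebook as a Python dictionary and serializes it to JSON, following the general layout of the Jupyter version 4 file format (a list of cells, each tagged as Markdown or code). The cell contents are invented, and a real notebook saved by Jupyter carries more metadata than is shown here.

```python
import json

notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        # Names the kernel that will execute the code cells.
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        }
    },
    "cells": [
        {
            # Text cell: Markdown formatting with TeX math between $ signs.
            "cell_type": "markdown",
            "metadata": {},
            "source": ["Some **bold** text and the formula $e^{i\\pi} + 1 = 0$."],
        },
        {
            # Code cell: not yet executed, so it has no count and no outputs.
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Because the kernel is named in the metadata rather than built into the file, the same notebook structure can in principle be used with kernels for other languages.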
Python
package Collection of Python files distributed in a portable way to provide extra functionality; a wheel is a common prebuilt format for distributing one.
numpy Package of essential tools for numerical computation.
scipy Package of tools useful in scientific and engineering computation.
database Structured collection of data, usually with a formal interface for interaction with the data.
data frame Tabular representation of a dataset, analogous to a spreadsheet, in which columns are observable quantities and rows are individual observations of those quantities.
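A short sketch of how these pieces look in practice, assuming numpy, scipy, and pandas are installed. The glossary does not name pandas; it is used here as one common package that provides data frames. The matrix, the integrand, and the measurements are all invented for illustration.

```python
import numpy as np
import pandas as pd
from scipy import integrate

# numpy: arrays and linear algebra. Solve the system A @ solution = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = np.linalg.solve(A, b)
print(solution)

# scipy: numerical integration of sin(x) from 0 to pi (exact value is 2).
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)
print(area)

# Data frame: each column is an observed quantity, each row one observation.
frame = pd.DataFrame(
    {
        "height_cm": [170.0, 182.5, 165.0],
        "weight_kg": [65.0, 80.0, 58.5],
    }
)
print(frame.shape)                # (3 observations, 2 quantities)
print(frame["height_cm"].mean())  # summary of a single column
```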