Getting started

We will be using Python extensively. You have choices to make for two major aspects of using Python: the back end, where your code runs, and the front end, where you write and view it.

Back end

Here are some pros and cons of the usual options:

Your own machine | Cloud
Available offline 😄 | Use any device 👍
Total control 😎/😕 | No installations 😌
Choose the front end | Browser only
Yours forever 😁 | Permanence is an illusion ⛄️
Your hardware 💵 | You get what you get ¯\_(ツ)_/¯

Tip

You want to use Python 3.x, not Python 2.x. These coexisted for a while during a dark time for Python. If you see guidance based on Python 2 now, it is either terribly outdated or the product of a disordered mind.
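If you are not sure which version you have, you can ask Python itself. Here is a minimal sketch; the function name is_python3 is made up for this example and is not part of any library.

```python
import sys

def is_python3():
    """Return True when the running interpreter is Python 3 or later."""
    return sys.version_info.major >= 3

print(sys.version)    # full version string of the running interpreter
print(is_python3())   # should be True for everything in this book
```

The function is written without any Python 3-only syntax, so it also runs (and reports False) on an old Python 2 interpreter.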

Run on your own computer

My recommended option is to download Anaconda. It’s a big download and a long installation, but it comes with just about everything ready to go.
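Once the installation finishes, a quick way to confirm that packages are available is to ask Python whether it can find them. The sketch below assumes a package list (numpy, pandas, matplotlib) purely for illustration; the packages you actually need may differ, and the helper name missing_packages is hypothetical.

```python
from importlib.util import find_spec

# Assumed list, for illustration only; substitute the packages you need.
PACKAGES = ["numpy", "pandas", "matplotlib"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

print(missing_packages(PACKAGES))  # an empty list means everything was found
```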

Run on the cloud

There are many choices, but a popular one is Google Colab. It’s free and saves you from having to install anything. The hardware is basic but most likely fine for our purposes. You will need to download your results for assignment submissions.
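If you switch between Colab and your own machine, it can help to know where a notebook is currently running. One common heuristic, shown here as a sketch rather than an official test, is to check whether the google.colab module has been loaded. The function name running_in_colab is made up for this example.

```python
import sys

def running_in_colab():
    """Heuristic: the google.colab module is normally preloaded inside Colab."""
    return "google.colab" in sys.modules

print(running_in_colab())
```

On your own machine this should print False, unless something has imported a module of that name.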

Front end

Much of data science is expressed using notebooks, which we will use exclusively. Specifically, you will use Jupyter notebooks.

Tip

The name IPython refers to the direct ancestor of Jupyter. The older name continues to stick to a few of the tools, though.

A notebook is a self-contained collection of text, math, code, output, and graphics grouped into cells. The front end manages the cells and communicates with a back end called the kernel.
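On disk, a notebook is stored as a JSON text file holding a list of cells. The sketch below builds a bare-bones notebook with one text cell and one code cell, following version 4 of the notebook file format. Real notebooks carry more metadata than this, so treat it as an illustration of the structure, not a template.

```python
import json

# A minimal notebook: one text (Markdown) cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["The equation of a line is $y=mx+b$."],
        },
        {
            "cell_type": "code",
            "execution_count": None,   # not yet run by any kernel
            "metadata": {},
            "outputs": [],             # the kernel's results would be stored here
            "source": ["print(1 + 1)"],
        },
    ],
}

# Convert to the JSON text that would be saved, then read it back.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)  # ['markdown', 'code']
```

Notice that the output of a code cell is saved alongside the code, which is why a notebook can show results even before you run anything.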

If you are using Google Colab, then you are using a Jupyter notebook, though within an interface customized by Google.

On your own machine

Jupyter splits into “classic” Jupyter and the newer JupyterLab. Either is fine for this book, but there is no reason to prefer the older variant. Just start it up from the Anaconda dashboard, and it will open your web browser to the front-end server.

Setting up VS Code

If you use the free Visual Studio Code editor, you can take advantage of its AI assistant.

  1. First install Anaconda Python. (Be patient! It’s a big download and a long installation.)
  2. Download and install Visual Studio Code.
  3. Start VS Code and look for the “Extensions” icon on the left. (For me, it looks like 4 building blocks.) Search for “Jupyter” and install the extension by Microsoft.
  4. You may also want to install the Python extension by Microsoft.

A notebook file has the extension .ipynb. When you open or create such a file, you will see a “Choose Kernel” button in the top right. The kernel is the background computing process that produces output for the notebook. Click that button, select “Python environments…”, and then select the one called “base” or “anaconda3” or something similar.

Important

You may have other Python environments installed on your computer. Use the one associated with Anaconda to access the packages we will use in this course.

Once the kernel is selected for a notebook file, that selection will be remembered the next time you open the file.
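To check which environment a notebook's kernel is actually using, you can run a cell like the following. The path test is only a heuristic based on typical folder names, and the function name looks_like_anaconda is made up for this sketch; your installation may live in a folder with a different name.

```python
import sys

print(sys.executable)  # path of the Python interpreter running this kernel
print(sys.prefix)      # root folder of the active environment

def looks_like_anaconda(path):
    """Heuristic only: check whether a path mentions a typical conda folder name."""
    lowered = path.lower()
    return any(word in lowered for word in ("anaconda", "miniconda", "conda"))

print(looks_like_anaconda(sys.prefix))
```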

Tips on Jupyter success

Warning

The order of cells that you see in a notebook is not necessarily the order in which they were executed.

By far the greatest source of confusion and subtle problems in a notebook is the freedom it gives you to execute the cells in whatever order you want. As you experiment and add and delete cells to try things out, you will reach a point at which the code on the screen is no longer a recipe for reaching the current state of your workspace.

Tip

Before submitting a notebook, take two steps: restart the kernel, then run all cells. This confirms that everything still works as it appears on screen.
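Here is a small simulation of the problem. It is a sketch, not how Jupyter works internally: the three "cells" are just strings, and the hypothetical helper run_cells executes them in a chosen order inside a fresh workspace.

```python
# Three "cells", stored as strings so they can be run in any order.
cells = ["x = 1", "x = x + 1", "result = x * 10"]

def run_cells(order):
    """Run the cells in the given order in a fresh workspace and return it."""
    workspace = {}
    for index in order:
        exec(cells[index], workspace)
    return workspace

top_to_bottom = run_cells([0, 1, 2])     # like restarting and running all cells
out_of_order = run_cells([0, 1, 1, 2])   # the second cell accidentally run twice

print(top_to_bottom["result"])  # 20
print(out_of_order["result"])   # 30
```

Both runs show exactly the same code on screen, yet they end in different states. Only the top-to-bottom run is one that a reader of your notebook can reproduce.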

Generative AI

Generative AI is a catchall term for models that can produce output similar to data they were trained on. Most notoriously of late, these include the large language models such as ChatGPT that can simulate conversation.

The ultimate utility of these models is yet to be realized in most domains. However, they are already quite useful for generating computer code, given careful use and supervision. If nothing else, they offer a sort of intelligent autocomplete that can assist you when you have forgotten the names of functions and the orders of their arguments.

For students, there is a sophisticated code-oriented AI that you can use for free: GitHub Copilot. It is a plugin for the free Visual Studio Code editor that is well-developed for use with most computer programming languages, including Python, MATLAB, and Julia, and it can work with Jupyter notebooks natively.

Getting started with GitHub Copilot

  1. Sign up for a free GitHub account.
  2. Apply for GitHub Global Campus as a student. This requires uploading some validation of your student status, such as a student ID or bursar’s receipt. It may take some time for your status to be verified.
  3. Download and install Visual Studio Code.
  4. Install the GitHub Copilot and GitHub Copilot Chat extensions for Visual Studio Code. You may need to restart or reload VS Code after installing these extensions.
  5. Sign in to your GitHub account in Visual Studio Code.
  6. Finally, open the GitHub Copilot chat window by clicking on the speech bubbles icon along the left side of the window. If it shows a button for starting a 30-day free trial, go ahead and click that, which takes you to a website. If your student status has been approved, you should not be asked for payment information before getting to a success screen. After one more restart of VS Code, you should be good to go.

Usage

There are now several ways you can interact with Copilot:

  • Open or create a code file and start typing. You will soon start to see grayed-out suggestions from Copilot. You can accept a suggestion by pressing Tab. If you continue typing, the suggestions will change to match. Note that Copilot will use the open file as context for its suggestions, including comments and variable names.
  • Within your code file, you can type Ctrl+i, or Cmd+i on a Mac, to open the Copilot panel. You can type in the panel to ask questions or create a prompt for it to respond to.
  • Along the left side of the VS Code window are icons representing the extensions you have installed. The Copilot Chat icon is a pair of speech bubbles (though this can change). Clicking on this icon will open a chat window.
  • You can use Ctrl+Shift+P, or Cmd+Shift+P on a Mac, to open the command palette and type “Copilot” to see a list of commands you can use to interact with Copilot. For example, you can ask Copilot to generate a function definition or a class definition, or to complete a line of code.

Math notation

To make good-looking math in Jupyter notebooks, you can use LaTeX notation. For example, to write the equation of a line as \(y=mx+b\), you can use the following within a text cell:

For example, to write the equation of a line as $y=mx+b$, you can use the following

The example above produces inline math, which is shown as part of the text. You can also use display math, which is shown on its own line.

For example, to write the equation of a line on its own line as

\[ y=mx+b, \]

you can use the following within a text cell:

For example, to write the equation of a line on its own line as

$$
y=mx+b,
$$

you can use the following

Use _ for subscripts and ^ for superscripts. For example, to write \(x_1^2\), use $x_1^2$. If you have more than one character in the sub/superscript, use curly braces. For example, to write \(X_{i,j}=10^{100}\), use $X_{i,j}=10^{100}$.

Function names and Greek letters are written with a backslash. For example, to write \(\sin(\theta)\), use $\sin(\theta)$. To write \(\alpha+\ln(\beta)\), use $\alpha + \ln(\beta)$. Note the difference between \(\cos(\phi)\), which is written as $\cos(\phi)$, and \(cos(\phi)\), which is written as $cos(\phi)$. All the functions you know work like this, plus square roots, like \(\sqrt{2+z}\), which is written as $\sqrt{2+z}$. Occasionally you want a short bit of text to appear like function names (upright) rather than multiplied variables (italic). For instance, to get \(x_\text{init}\), use $x_\text{init}$.

You’ve seen that curly braces are used to indicate groupings. If you actually want curly braces to appear, you have to escape them with a backslash. For example, to write \(\{2, 3, 5, 7\}\), use $\{2, 3, 5, 7\}$.

To write a fraction, use \frac{numerator}{denominator}. For example, to write \(\frac{1}{2}\), use $\frac{1}{2}$. Derivatives are written as fractions, so to write \(f'(x) = \frac{df}{dx}\), use $f'(x) = \frac{df}{dx}$.

If you want to put fractions inside of parentheses, use \left( and \right) to make the parentheses the right size. For example, to write

\[ \left( x - \frac{1}{2} \right)^3, \]

use $\left(x - \frac{1}{2}\right)^3$. The same applies to \left[ and \right] for square brackets.

For the most part, the spaces you put in don’t matter. For example, $x+1$ and $x + 1$ both produce \(x+1\). Usually, the automatic spacing makes the best-looking result. But if you want to put a space in manually, you can use \, for a small space, \: for a medium space, \; for a large space, and \quad for a really large space.

Integrals and sums are produced using \int and \sum, with subscripts and superscripts put where you would expect them to go. For example, to write

\[ \int_0^1 x^2 \, dx, \]

use $$\int_0^1 x^2 \, dx$$. (Here is one legitimate use for a manual space!) To write

\[ \sum_{i=1}^n i^2, \]

use $$\sum_{i=1}^n i^2$$.
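As a sketch of how these pieces combine, the following display math for a text cell puts a sum, a fraction, an integral, and a manual space together. The two identities are standard facts chosen only for illustration:

```latex
$$
\sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6},
\quad
\int_0^1 x^2 \, dx = \frac{1}{3}
$$
```

Here \quad keeps the two equations from running into each other.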

There are seventeen squijillion web pages and blogs giving quick introductions to LaTeX, if you need to do something more complex than the above. Beware: it can be a deep rabbit hole to fall into.