Prepared by: Cindee Madison, Thomas Kluyver (Any errors are our own)
Thanks to: Justin Kitzes, Matt Davis
The most basic components of any programming language are "things", also called variables or (in special cases) objects.
The most common basic "things" in Python are integers, floats, strings, booleans, and some special objects of various types. We'll meet many of these as we go through the lesson.
TIP: To run the code in a cell quickly, press Ctrl-Enter.
TIP: To quickly create a new cell below an existing one, type Ctrl-m then b. Other shortcuts for making, deleting, and moving cells are in the menubar at the top of the screen.
# A thing
2
# Use print to show multiple things in the same cell
# Note that you can use single or double quotes for strings
print(2)
print('hello')
# Things can be stored as variables
a = 2
b = 'hello'
c = True # This is case sensitive
print(a, b, c)
# The type function tells us the type of thing we have
print(type(a))
print(type(b))
print(type(c))
# What happens when a new variable points to a previous variable?
a = 1
b = a
a = 2
## What is b?
print(b)
Just storing data in variables isn't much use to us. Right away, we'd like to start performing operations and manipulations on data and variables.
There are three very common means of performing an operation on a thing: operators, functions, and methods.
All of the basic math operators work like you think they should for numbers. They can also do some useful operations on other things, like strings. There are also boolean operators that compare quantities and give back a bool variable as a result.
# Standard math operators work as expected on numbers
a = 2
b = 3
print(a + b)
print(a * b)
print(a ** b) # a to the power of b (a^b does something completely different!)
print(a / b) # Careful with dividing integers if you use Python 2
# There are also operators for strings
print('hello' + 'world')
print('hello' * 3)
#print('hello' / 3) # You can't do this!
# Boolean operators compare two things
a = (1 > 3)
b = (3 == 3)
print(a)
print(b)
print(a or b)
print(a and b)
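The comment above warns that a^b is not exponentiation. As a small illustration (the variable names power and xor are only for this sketch), ^ is Python's bitwise XOR operator, which compares the binary digits of two integers:

```python
a = 2  # binary 10
b = 3  # binary 11

power = a ** b  # exponentiation: 2 * 2 * 2
xor = a ^ b     # bitwise XOR: binary 10 ^ 11 = 01

print(power)  # 8
print(xor)    # 1
```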
These will be very familiar to anyone who has programmed in any language, and work like you would expect.
# There are thousands of functions that operate on things
print(type(3))
print(len('hello'))
print(round(3.3))
TIP: To find out what a function does, you can type its name and then a question mark to get a pop-up help window. Or, to see what arguments it takes, you can type its name, an open parenthesis, and hit tab.
round?
#round(
round(3.14159, 2)
TIP: Many useful functions are not in the Python built-in library, but are in external scientific packages. These need to be imported into your Python notebook (or program) before they can be used. Probably the most important of these are numpy and matplotlib.
# Many useful functions are in external packages
# Let's meet numpy
import numpy as np
# To see what's in a package, type the name, a period, then hit tab
#np?
#np.
# Some examples of numpy functions and "things"
print(np.sqrt(4))
print(np.pi) # Not a function, just a variable
print(np.sin(np.pi))
Before we get any farther into the Python language, we have to say a word about "objects". We will not be teaching object oriented programming in this workshop, but you will encounter objects throughout Python (in fact, even seemingly simple things like ints and strings are actually objects in Python).
In the simplest terms, you can think of an object as a small bundled "thing" that contains within itself both data and functions that operate on that data. For example, strings in Python are objects that contain a set of characters and also various functions that operate on the set of characters. When bundled in an object, these functions are called "methods".
Instead of the "normal" function(arguments) syntax, methods are called using the syntax variable.method(arguments).
# A string is actually an object
a = 'hello, world'
print(type(a))
# Objects have bundled methods
#a.
print(a.capitalize())
print(a.replace('l', 'X'))
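To make the difference between the two call styles concrete, here is a minimal sketch that uses one built-in function and two string methods on the same string. The variable names greeting and shouted are made up for this example.

```python
greeting = 'hello, world'

# A function takes the thing as an argument
print(len(greeting))

# A method is called on the thing itself, using a dot
print(greeting.upper())
print(greeting.count('l'))

# Strings cannot be changed in place, so string methods return a new string
shouted = greeting.upper()
print(greeting)
print(shouted)
```

Because the original string is left untouched, you need to store the result of a string method in a variable if you want to keep it.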
Throughout this lesson, we will successively build towards a program that will calculate the variance of some measurements, in this case Height in Metres. The first thing we want to do is convert from an antiquated measurement system.
To change inches into metres we use the following equation (conversion factor is rounded):
$$\text{metres} = \frac{\text{inches}}{39}$$
1. Create a variable for the conversion factor, called inches_in_metre.
2. Create a variable (inches) for your height in inches, as inaccurately as you want.
3. Divide inches by inches_in_metre, and store the result in a new variable, metres.
Bonus
Convert from feet and inches to metres.
TIP: A 'gotcha' for all Python 2 users (it was changed in Python 3) is the result of integer division. To make it work the obvious way, either:
- inches_in_metre = 39. (add the decimal to make it a float, or use 39.4 to be more accurate)
- from __future__ import division (put this at the top of the code and it will work)
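In Python 3 the two kinds of division have separate operators. A minimal sketch, using the rounded conversion factor from above and a made-up height of 70 inches (the name whole_metres is also just for illustration):

```python
inches_in_metre = 39
inches = 70  # hypothetical height, for illustration only

metres = inches / inches_in_metre         # true division: always gives a float in Python 3
whole_metres = inches // inches_in_metre  # floor division: drops the fractional part

print(metres)
print(whole_metres)
```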
While it is interesting to explore your own height, in science we work with larger, slightly more complex datasets. In this example, we are interested in the characteristics and distribution of heights. Python provides us with a number of objects to handle collections of things.
Probably 99% of your work in scientific Python will use one of four types of collections: lists, tuples, dictionaries, and numpy arrays. We'll look quickly at each of these and what they can do for you.
Lists are probably the handiest and most flexible type of container.
Lists are declared with square brackets [].
Individual elements of a list can be selected using the syntax a[ind].
# Lists are created with square bracket syntax
a = ['blueberry', 'strawberry', 'pineapple']
print(a, type(a))
# Lists (and all collections) are also indexed with square brackets
# NOTE: The first index is zero, not one
print(a[0])
print(a[1])
## You can also count from the end of the list
print('last item is:', a[-1])
print('second to last item is:', a[-2])
# you can access multiple items from a list by slicing, using a colon between indexes
# NOTE: The end value is not inclusive
print('a =', a)
print('get first two:', a[0:2])
# You can leave off the start or end if desired
print(a[:2])
print(a[2:])
print(a[:])
print(a[:-1])
# Lists are objects, like everything else, and have methods such as append
a.append('banana')
print(a)
a.append([1,2])
print(a)
a.pop()
print(a)
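Note that append adds its argument as a single item, so a.append([1,2]) puts a whole list inside the list. If the aim is to add each item separately, the extend method is one option. A small sketch with a fresh list (the name fruits is made up for this example):

```python
fruits = ['blueberry', 'strawberry']

fruits.append(['banana', 'kiwi'])  # adds ONE item, which is itself a list
print(fruits, len(fruits))

fruits.pop()  # remove that nested list again (pop also returns the removed item)

fruits.extend(['banana', 'kiwi'])  # adds each item separately
print(fruits, len(fruits))
```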
TIP: A 'gotcha' for some new Python users is that many collections, including lists, actually store pointers to data, not the data itself.
Remember when we set b=a and then changed a? What happens when we do this in a list?
HELP: look into the copy module
a = 1
b = a
a = 2
## What is b?
print('What is b?', b)
a = [1, 2, 3]
b = a
print('original b', b)
a[0] = 42
print('What is b after we change a ?', b)
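Following the hint about the copy module, here is a minimal sketch of how to get an independent copy of a list. The names c, nested, and deep are made up for this example.

```python
import copy

a = [1, 2, 3]
b = a             # b is another name for the same list
c = copy.copy(a)  # c is a new list with the same items (a[:] or list(a) also work)

a[0] = 42
print('b follows a:', b)
print('c is independent:', c)

# For nested collections, copy.deepcopy also copies the inner lists
nested = [[1, 2], [3, 4]]
deep = copy.deepcopy(nested)
nested[0][0] = 99
print(deep)
```

A plain copy.copy is usually enough for a flat list of numbers or strings; deepcopy matters once the list contains other lists.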
Make a list of heights (in metres) and store it in a variable called heights.
Bonus
HINT: len() can be used to find the length of a collection
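We won't say a whole lot about tuples except to mention that they basically work just like lists, with two major exceptions: they are declared using parentheses () instead of square brackets [], and once you make a tuple you can't change what's in it (it is immutable).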
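You'll see tuples come up throughout the Python language, and over time you'll develop a feel for when to use them.
In general, they're often used instead of lists when the position of each item matters, such as an (x, y) coordinate pair, or when you want to prevent accidental modification of the items.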
xy = (23, 45)
print(xy[0])
xy[0] = "this won't work with a tuple"
Traceback errors are raised when you try to do something with code that it isn't meant to do. A traceback is also meant to be informative, but like many things, it is not always as informative as we would like.
Looking at our error:
TypeError Traceback (most recent call last)
<ipython-input-25-4d15943dd557> in <module>()
1 xy = (23, 45)
2 xy[0]
----> 3 xy[0] = 'this wont work with a tuple'
TypeError: 'tuple' object does not support item assignment
Dictionaries are the collection to use when you want to store and retrieve things by their names (or some other kind of key) instead of by their position in the collection. A good example is a set of model parameters, each of which has a name and a value. Dictionaries are declared using {}.
# Make a dictionary of model parameters
convertors = {'inches_in_feet' : 12,
'inches_in_metre' : 39}
print(convertors)
print(convertors['inches_in_feet'])
## Add a new key:value pair
convertors['metres_in_mile'] = 1609.34
print(convertors)
# Raise a KeyError by asking for a key that is not in the dictionary
print(convertors['blueberry'])
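If you are not sure whether a key is present, there are a couple of ways to avoid the KeyError. This is a minimal sketch using the same dictionary; the names value and fallback are made up for this example.

```python
convertors = {'inches_in_feet': 12,
              'inches_in_metre': 39}

# Check whether a key exists before using it
if 'blueberry' in convertors:
    print(convertors['blueberry'])
else:
    print('No entry for blueberry')

# get returns a default (None unless you supply one) instead of raising KeyError
value = convertors.get('blueberry')
fallback = convertors.get('blueberry', 0)
print(value, fallback)
```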
Even though numpy arrays (often written as ndarrays, for n-dimensional arrays) are not part of the core Python libraries, they are so useful in scientific Python that we'll include them here in the core lesson. Numpy arrays are collections of things, all of which must be the same type, that work similarly to lists (as we've described them so far). The most important differences are that arithmetic on arrays is applied element by element, and that arrays can have more than one dimension.
Arrays can be created from existing collections such as lists, or instantiated "from scratch" in a few useful ways.
When getting started with scientific Python, you will probably want to try to use ndarrays whenever possible, saving the other types of collections for those cases when you have a specific reason to use them.
# We need to import the numpy library to have access to it
# We can also create an alias for a library, this is something you will commonly see with numpy
import numpy as np
# Make an array from a list
alist = [2, 3, 4]
blist = [5, 6, 7]
a = np.array(alist)
b = np.array(blist)
print(a, type(a))
print(b, type(b))
# Do arithmetic on arrays
print(a**2)
print(np.sin(a))
print(a * b)
print(a.dot(b), np.dot(a, b))
# Boolean operators work on arrays too, and they return boolean arrays
print(a > 2)
print(b == 6)
c = a > 2
print(c)
print(type(c))
print(c.dtype)
# Indexing arrays
print(a[0:2])
c = np.random.rand(3,3)
print(c)
print('\n')
print(c[1:3,0:2])
c[0,:] = a
print('\n')
print(c)
# Arrays can also be indexed with other boolean arrays
print(a)
print(b)
print(a > 2)
print(a[a > 2])
print(b[a > 2])
b[a == 3] = 77
print(b)
# ndarrays have attributes in addition to methods
#c.
print(c.shape)
print(c.prod())
# There are handy ways to make arrays full of ones and zeros
print(np.zeros(5), '\n')
print(np.ones(5), '\n')
print(np.identity(5), '\n')
# You can also easily make arrays of number sequences
print(np.arange(0, 10, 2))
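Two of the points made about arrays above are easy to check directly: arithmetic works element by element (unlike lists), and all elements share one type. A small sketch, assuming numpy is installed; the name mixed is made up for this example.

```python
import numpy as np

alist = [2, 3, 4]
a = np.array(alist)

# + joins lists end to end, but adds arrays element by element
print(alist + alist)
print(a + a)

# All elements share one type: mixing ints and floats gives a float array
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)
```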
Revisit your list of heights
BONUS
So far, everything that we've done could, in principle, be done by hand calculation. In this section and the next, we really start to take advantage of the power of programming languages to do things for us automatically.
We start here with ways to repeat yourself. The two most common ways of doing this are known as for loops and while loops. For loops in Python are useful when you want to cycle over all of the items in a collection (such as all of the elements of an array), and while loops are useful when you want to cycle for an indefinite amount of time until some condition is met.
The basic examples below will work for looping over lists, tuples, and arrays. Looping over dictionaries is a bit different, since there is a key and a value for each item in a dictionary. Have a look at the Python docs for more information.
# A basic for loop - don't forget the white space!
wordlist = ['hi', 'hello', 'bye']
for word in wordlist:
    print(word + '!')
Note on indentation: Notice the indentation once we enter the for loop. Every indented statement after the for loop declaration is part of the for loop. This rule holds true for while loops, if statements, functions, etc. Required indentation is one of the reasons Python is such a beautiful language to read.
If you do not have consistent indentation you will get an IndentationError. Fortunately, most code editors will ensure your indentation is correct.
NOTE: In Python the convention is to use four (4) spaces for each level of indentation; most editors can be configured to follow this guide.
# Indentation error: Fix it!
for word in wordlist:
    new_word = word.capitalize()
   print(new_word + '!') # Bad indent
# Sum all of the values in a collection using a for loop
numlist = [1, 4, 77, 3]
total = 0
for num in numlist:
    total = total + num
print("Sum is", total)
# Often we want to loop over the indexes of a collection, not just the items
print(wordlist)
for i, word in enumerate(wordlist):
    print(i, word, wordlist[i])
# While loops are useful when you don't know how many steps you will need,
# and want to stop once a certain condition is met.
step = 0
prod = 1
while prod < 100:
    step = step + 1
    prod = prod * 2
    print(step, prod)
print('Reached a product of', prod, 'at step number', step)
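The text above notes that looping over dictionaries is a bit different, because each item has a key and a value. A minimal sketch, reusing the convertors dictionary from earlier (the loop variable names are made up for this example):

```python
convertors = {'inches_in_feet': 12,
              'inches_in_metre': 39}

# Looping over a dictionary directly gives its keys
for name in convertors:
    print(name)

# items() gives each key together with its value
for name, value in convertors.items():
    print(name, '=', value)
```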
TIP: Once we start really generating useful and large collections of data, it becomes unwieldy to inspect our results manually. The code below shows how to make a very simple plot of an array. We'll do much more plotting later on; this is just to get started.
# Load up pyplot, the plotting interface of the matplotlib library
%matplotlib inline
import matplotlib.pyplot as plt
# Make some x and y data and plot it
y = np.arange(100)**2
plt.plot(y)
We can now calculate the variance of the heights we collected before.
As a reminder, sample variance is calculated from the sum of squared differences of each observation from the mean:
$$\text{variance} = \frac{\sum (x - \text{mean})^2}{n - 1}$$
where mean is the mean of our observations, x is each individual observation, and n is the number of observations.
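First, we need to calculate the mean:
1. Create a variable total for the sum of the heights.
2. Using a for loop, add each height to total.
3. Divide total by the number of heights, and store the result in a new variable, mean.
Note: To get the number of things in a list, use len(the_list).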
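Now we'll use another loop to calculate the variance:
1. Create a variable sum_diffsq for the sum of squared differences.
2. Make a for loop over heights:
   - Find the difference between each height and the mean, and call it diff.
   - Square this value, and call it diffsq.
   - Add diffsq on to sum_diffsq.
3. Divide sum_diffsq by n-1 to get the variance.
Note: To square a number in Python, use **, e.g. 5**2.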
Bonus
Test whether variance is larger than 0.01, and print out a line that says "variance more than 0.01: " followed by the answer (either True or False).
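Once your own loop works, one way to check the answer is to compare it with the statistics module from the Python standard library, whose variance function also divides by n - 1. The numbers below are made up for illustration and are not real heights.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0]  # made-up numbers, not real heights

# Worked by hand: mean = 2.5, squared differences sum to 5.0, and 5.0 / (4 - 1) = 1.666...
expected = statistics.variance(values)  # sample variance, dividing by n - 1
print(expected)
```

If your loop gives a noticeably different number for the same list, a common cause is dividing by n instead of n - 1.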
Often we want to check if a condition is True and take one action if it is, and another action if the condition is False. We can achieve this in Python with an if statement.
TIP: You can use any expression that returns a boolean value (True or False) in an if statement.
Common boolean operators are ==, !=, <, <=, >, >=. You can also use is and is not if you want to check if two variables are identical in the sense that they are stored in the same location in memory.
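The difference between == and is shows up most clearly with two lists that hold the same values. A minimal sketch (the variable names are made up for this example):

```python
first = [1, 2, 3]
second = [1, 2, 3]  # a separate list with equal contents
alias = first       # another name for the same list

print(first == second)  # True: the values are equal
print(first is second)  # False: they are two different objects
print(first is alias)   # True: both names point at the same object
```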
# A simple if statement
x = 3
if x > 0:
    print('x is positive')
elif x < 0:
    print('x is negative')
else:
    print('x is zero')
# If statements can rely on boolean variables
x = -1
test = (x > 0)
print(type(test)); print(test)
if test:
    print('Test was true')
One way to write a program is to simply string together commands, like the ones described above, in a long file, and then to run that file to generate your results. This may work, but it can be cognitively difficult to follow the logic of programs written in this style. Also, it does not allow you to reuse your code easily - for example, what if we wanted to run our variance calculation on several different sets of measurements?
The most important ways to "chunk" code into more manageable pieces are to create functions and then to gather these functions into modules, and eventually packages. Below we will discuss how to create functions and modules. A third common type of "chunk" in Python is classes, but we will not be covering object-oriented programming in this workshop.
# We've been using functions all day
x = 3.333333
print(round(x, 2))
print(np.sin(x))
# It's very easy to write your own functions
def multiply(x, y):
    return x*y
# Once a function is "run" and saved in memory, it's available just like any other function
print(type(multiply))
print(multiply(4, 3))
# It's useful to include docstrings to describe what your function does
def say_hello(time, people):
    '''
    Function says a greeting. Useful for engendering goodwill
    '''
    return 'Good ' + time + ', ' + people
Docstrings: A docstring is a special type of comment that tells you what a function does. You can see them when you ask for help about a function.
say_hello('afternoon', 'friends')
# All arguments must be present, or the function will raise an error
say_hello('afternoon')
# Keyword arguments can be used to make some arguments optional by giving them a default value
# All mandatory arguments must come first, in order
def say_hello(time, people='friends'):
    return 'Good ' + time + ', ' + people
say_hello('afternoon')
say_hello('afternoon', 'students')
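Arguments can also be passed by name when calling a function, which makes the call easier to read and lets you change the order. A minimal sketch that repeats the say_hello definition from above so that it runs on its own:

```python
def say_hello(time, people='friends'):
    return 'Good ' + time + ', ' + people

# Arguments can be passed by name, in any order
print(say_hello(people='students', time='morning'))

# Positional and named arguments can be mixed, with positional ones first
print(say_hello('evening', people='everyone'))
```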
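Finally, let's turn our variance calculation into a function that we can use over and over again. Copy your code from Exercise 4 into the box below, and do the following:
1. Turn your code into a function called calculate_variance that takes a list of values and returns their variance.
Bonus
Move the part that calculates the mean into a separate function, calculate_mean, and call it inside your calculate_variance function.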
Put the calculate_mean and calculate_variance function(s) in a module
We can make our functions more easily reusable by placing them into modules that we can import, just like we have been doing with numpy. It's pretty simple to do this.
1. Copy your function(s) into a new text file called stats.py, in the same directory as this notebook.
2. In the cell below, type import stats to import the module. Type stats. and hit tab to see the available functions in the module. Try calculating the variance of a number of samples of heights (or other random numbers) using your imported module.