Math 480A / 582E - Day 2 Python and Cython

131 days ago by cswiercz

Some Important Links and Resources

 
       

Day 2 - Python and Cython

Today we'll talk about the Python programming language and a tool called Cython for writing fast Python programs.

  • Questions?
  • Feel free to bring your computer to class to follow along.
    • (William wil probably do lots of in-class demos)

"Homework" Read the official Python tutorial located here: http://docs.python.org/tutorial 

A Brief Introduction to the Python Programming Language

Sage is built on top of the Python programming language. In fact, Sage can run any Python code!

Some features of Python:

  • easy to read
  • dynamic typing (i.e. no need to define variable type like in C/C++ or FORTRAN)
  • object-oriented
  • modular
  • extensions / modules can be written in C/C++ or Cython (more on Cython later)

 

Data Types

Python has several basic data types:

  • int
  • float
  • string
  • list
  • tuple
  • dict
# data types mystr = "what up!" myint = int(2012) sageint = 2012 
       
type(mystr) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'str'>|
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'str'>|
type(myint) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'int'>|
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'int'>|
type(sageint) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'sage.rings.integer.Integer'>|
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'sage.rings.integer.Integer'>|
sageint.factor() 
       
\newcommand{\Bold}[1]{\mathbf{#1}}2^{2} \cdot 503
\newcommand{\Bold}[1]{\mathbf{#1}}2^{2} \cdot 503
myfloat = float(3.14159) type(myfloat) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'float'>|
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'float'>|
ans = myfloat*myint print ans print type(ans) 
       
6320.87908
<type 'float'>
6320.87908
<type 'float'>
# strings can be encapsulated in single or double quotes s = "Hello, world!" t = 'She said "Hello, world!" to everybody.' t 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|She|\phantom{x}\verb|said|\phantom{x}\verb|"Hello,|\phantom{x}\verb|world!"|\phantom{x}\verb|to|\phantom{x}\verb|everybody.|
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|She|\phantom{x}\verb|said|\phantom{x}\verb|"Hello,|\phantom{x}\verb|world!"|\phantom{x}\verb|to|\phantom{x}\verb|everybody.|
t. 
       
t.split(' ') 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\left[\verb|She|, \verb|said|, \verb|"Hello,|, \verb|world!"|, \verb|to|, \verb|everybody.|\right]
\newcommand{\Bold}[1]{\mathbf{#1}}\left[\verb|She|, \verb|said|, \verb|"Hello,|, \verb|world!"|, \verb|to|, \verb|everybody.|\right]
l = [1,2,3] print l print type(l) 
       
[1, 2, 3]
<type 'list'>
[1, 2, 3]
<type 'list'>
l.reverse() print l 
       
[3, 2, 1]
[3, 2, 1]
mixedtypes = [3, 'super', 4.1, ['another','list'], 42] print mixedtypes 
       
[3, 'super', 4.10000000000000, ['another', 'list'], 42]
[3, 'super', 4.10000000000000, ['another', 'list'], 42]
print mixedtypes[0] print mixedtypes[1] print mixedtypes[2] print mixedtypes[3] 
       
3
super
4.10000000000000
['another', 'list']
3
super
4.10000000000000
['another', 'list']
tup[1] = 8912375419283 
       
Traceback (click to the left of this block for traceback)
...
TypeError: 'tuple' object does not support item assignment
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_sage_input_29.py", line 10, in <module>
    exec compile(u'open("___code___.py","w").write("# -*- coding: utf-8 -*-\\n" + _support_.preparse_worksheet_cell(base64.b64decode("dHVwWzFdID0gODkxMjM3NTQxOTI4Mw=="),globals())+"\\n"); execfile(os.path.abspath("___code___.py"))
  File "", line 1, in <module>
    
  File "/tmp/tmpTsWU6r/___code___.py", line 3, in <module>
    exec compile(u'tup[_sage_const_1 ] = _sage_const_8912375419283 
  File "", line 1, in <module>
    
TypeError: 'tuple' object does not support item assignment
# lists are mutable! mixedtypes[3] = sqrt(2) print mixedtypes 
       
[3, 'super', 4.10000000000000, sqrt(2), 42]
[3, 'super', 4.10000000000000, sqrt(2), 42]
# tuples are immutable lists tup = (5, 'foo', 4.1) print tup print type(tup) 
       
(5, 'foo', 4.10000000000000)
<type 'tuple'>
(5, 'foo', 4.10000000000000)
<type 'tuple'>
# dictionaries are like lists, except they're indexed by whatever you want # # the "indexes" are called "keys" and the elements of the dictionary are called "items" d = {4.2:'hello', 'fun':'world'}; d 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\left\{\verb|fun| : \verb|world|, 4.20000000000000 : \verb|hello|\right\}
\newcommand{\Bold}[1]{\mathbf{#1}}\left\{\verb|fun| : \verb|world|, 4.20000000000000 : \verb|hello|\right\}
d[4.2] 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|hello|
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|hello|
d['fun'] 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|world|
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|world|
d[19235] # 
       
Traceback (click to the left of this block for traceback)
...
KeyError: 19235
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_sage_input_39.py", line 10, in <module>
    exec compile(u'open("___code___.py","w").write("# -*- coding: utf-8 -*-\\n" + _support_.preparse_worksheet_cell(base64.b64decode("ZFsxOTIzNV0gICAj"),globals())+"\\n"); execfile(os.path.abspath("___code___.py"))
  File "", line 1, in <module>
    
  File "/tmp/tmp4Y4Gds/___code___.py", line 3, in <module>
    exec compile(u'd[_sage_const_19235 ]   #
  File "", line 1, in <module>
    
KeyError: 19235
d.keys() 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\left[\verb|fun|, 4.20000000000000\right]
\newcommand{\Bold}[1]{\mathbf{#1}}\left[\verb|fun|, 4.20000000000000\right]
d.items() 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\left[\left(\verb|fun|, \verb|world|\right), \left(4.20000000000000, \verb|hello|\right)\right]
\newcommand{\Bold}[1]{\mathbf{#1}}\left[\left(\verb|fun|, \verb|world|\right), \left(4.20000000000000, \verb|hello|\right)\right]

Loops

for loops and while loops

for loops are all about lists. That is, all loops loop over items in a list.

mixedtypes = [3, 'super', 4.1, sqrt(2), 42] # tab indicates that we're inside the loop for thing in mixedtypes: print "a thing:" print thing print 
       
a thing:
3

a thing:
super

a thing:
4.10000000000000

a thing:
sqrt(2)

a thing:
42
a thing:
3

a thing:
super

a thing:
4.10000000000000

a thing:
sqrt(2)

a thing:
42
# an important command: range range(8) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\left[0, 1, 2, 3, 4, 5, 6, 7\right]
\newcommand{\Bold}[1]{\mathbf{#1}}\left[0, 1, 2, 3, 4, 5, 6, 7\right]
for n in range(8): print n^2 
       
0
1
4
9
16
25
36
49
0
1
4
9
16
25
36
49
N = randint(100,200) 
       
# this loop will find smallest prime greater than N N = randint(1000,2000) print "Starting number:", N while not is_prime(N): N = N + 1 print "Next prime:", N print "is prime? ", is_prime(N) 
       
Starting number: 1582
Next prime: 1583
is prime?  True
Starting number: 1582
Next prime: 1583
is prime?  True

Functions

The contents of a function are indicated by an indentation.

# functions def foo(x): return x^2 
       
foo(5) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}25
\newcommand{\Bold}[1]{\mathbf{#1}}25
type(foo) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'function'>|
\newcommand{\Bold}[1]{\mathbf{#1}}\verb|<type|\phantom{x}\verb|'function'>|
# "quick functions" using lambda notation foo = lambda x: x^2 
       
foo(5) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}25
\newcommand{\Bold}[1]{\mathbf{#1}}25
bar = lambda x: foo(2*x) 
       
bar(5) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}100
\newcommand{\Bold}[1]{\mathbf{#1}}100
(2*5)^2 
       
\newcommand{\Bold}[1]{\mathbf{#1}}100
\newcommand{\Bold}[1]{\mathbf{#1}}100
# a function with documentation def my_prime_test(n): """ Determines the primality of `n` using trial division. INPUT: - ``n``: integer OUTPUT: - ``True`` if `n` is prime, ``False`` if `n` is composite EXAMPLES: An example with the current year:: sage: my_prime_test(2012) False """ for k in range(2,n): if n/k == int(n/k): return False return True 
       
my_prime_test(9) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\mathrm{False}
\newcommand{\Bold}[1]{\mathbf{#1}}\mathrm{False}
my_prime_test(2012) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\mathrm{False}
\newcommand{\Bold}[1]{\mathbf{#1}}\mathrm{False}
my_prime_test? 
       

File: /tmp/tmpe8HhpC/___code___.py

Type: <type ‘function’>

Definition: my_prime_test(n)

Docstring:

Determines the primality of n using trial division.

INPUT:

  • n: integer

OUTPUT:

  • True if n is prime, False if n is composite

EXAMPLES:

An example with the current year:

sage: my_prime_test(2012)
False

File: /tmp/tmpe8HhpC/___code___.py

Type: <type ‘function’>

Definition: my_prime_test(n)

Docstring:

Determines the primality of n using trial division.

INPUT:

  • n: integer

OUTPUT:

  • True if n is prime, False if n is composite

EXAMPLES:

An example with the current year:

sage: my_prime_test(2012)
False

 

Classes

If we have time at the end we can talk about classes. If we don't have time, take a look at the Python Docuementation on classes: http://docs.python.org/tutorial/classes.html

Python Modules

The functionality of Python (and of Sage) is broken up into "modules". A module is a Python file that contains a lot of functions and class definitions. modules are a similar concept to Matlab toolboxes, Maple packages, and Mathematica...whatever they're called. If you've done C/C++ programming, modules are similar to header files and libraries.

Python has a lot of built in modules: (see http://docs.python.org/tutorial/stdlib.html and http://docs.python.org/tutorial/stdlib2.html for a partial list and introduction)

  • os --- provides functions for interacting with the operating system
  • shutil --- daily file and directory management commands
  • re --- regular expression tools for string parsing
  • math --- basic mathematics (Sage, of course, has a lot more)
  • email --- a module for, yes, sending emails via Python

Often, modules will have submodules containing even more functionality.


#importing a module import os 
       
os. 
       
import sys 
       
sys.version 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\begin{array}{l}
\verb|2.6.4|\phantom{x}\verb|(r264:75706,|\phantom{x}\verb|Nov|\phantom{xx}\verb|8|\phantom{x}\verb|2011,|\phantom{x}\verb|18:43:30)|\\
\verb|[GCC|\phantom{x}\verb|4.2.4|\phantom{x}\verb|(Ubuntu|\phantom{x}\verb|4.2.4-1ubuntu4)]|
\end{array}
\newcommand{\Bold}[1]{\mathbf{#1}}\begin{array}{l}
\verb|2.6.4|\phantom{x}\verb|(r264:75706,|\phantom{x}\verb|Nov|\phantom{xx}\verb|8|\phantom{x}\verb|2011,|\phantom{x}\verb|18:43:30)|\\
\verb|[GCC|\phantom{x}\verb|4.2.4|\phantom{x}\verb|(Ubuntu|\phantom{x}\verb|4.2.4-1ubuntu4)]|
\end{array}

Often, modules will contain submodules offering even more functionality. For example, Numpy is a Python module for doing numerics. It contains a submodule called linalg for performing common linear algebra operations.

import numpy 
       
numpy. 
       
mat = numpy.matrix([[4,2],[2,5]]); mat 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\begin{array}{l}
\verb|[[4|\phantom{x}\verb|2]|\\
\phantom{x}\verb|[2|\phantom{x}\verb|5]]|
\end{array}
\newcommand{\Bold}[1]{\mathbf{#1}}\begin{array}{l}
\verb|[[4|\phantom{x}\verb|2]|\\
\phantom{x}\verb|[2|\phantom{x}\verb|5]]|
\end{array}
from numpy import linalg 
       
linalg. 
       
Traceback (click to the left of this block for traceback)
...
SyntaxError: invalid syntax
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_sage_input_34.py", line 10, in <module>
    exec compile(u'open("___code___.py","w").write("# -*- coding: utf-8 -*-\\n" + _support_.preparse_worksheet_cell(base64.b64decode("bGluYWxnLg=="),globals())+"\\n"); execfile(os.path.abspath("___code___.py"))
  File "", line 1, in <module>
    
  File "/tmp/tmpnVG9TT/___code___.py", line 2
    linalg.
          ^
SyntaxError: invalid syntax
print mat print L*L.transpose() 
       
[[4 2]
 [2 5]]
[[ 4.  2.]
 [ 2.  5.]]
[[4 2]
 [2 5]]
[[ 4.  2.]
 [ 2.  5.]]
L = linalg.cholesky(mat); L 
       
\newcommand{\Bold}[1]{\mathbf{#1}}\begin{array}{l}
\verb|[[|\phantom{x}\verb|2.|\phantom{xx}\verb|0.]|\\
\phantom{x}\verb|[|\phantom{x}\verb|1.|\phantom{xx}\verb|2.]]|
\end{array}
\newcommand{\Bold}[1]{\mathbf{#1}}\begin{array}{l}
\verb|[[|\phantom{x}\verb|2.|\phantom{xx}\verb|0.]|\\
\phantom{x}\verb|[|\phantom{x}\verb|1.|\phantom{xx}\verb|2.]]|
\end{array}
 
       

Cython

Cython is an easy way to create "C-extensions" in Python. That is, with Cython you can easily write fast C/C++ code (that looks a lot like Python) with the usual easy to use  Python interface. Use Cython to greatly speed up your computations.

As a demonstrative example, let's create a function that will compute the sum of a bunch of numbers.

def sum_sage(n): s = 1 for i in range(n): s += i return s 
       
timeit('sum_py(1000000)') 
       
5 loops, best of 3: 159 ms per loop
5 loops, best of 3: 159 ms per loop

Just out of curiosity, let's compare the speed of Sage constructs, like Integer, to the base Python types like int.

%python def sum_python(n): s = 1 for i in range(n): s += i return s 
       
timeit('sum_python(1000000)') 
       
5 loops, best of 3: 162 ms per loop
5 loops, best of 3: 162 ms per loop

Interestingly, these results are comparable on this machine.

Now, lets begin writing some Cython. In the Sage Notebook, you can (usually) start writing Cython code by starting with the Sage/Python version verbatim. When we execute the above code under the %cython environment, Cython will turn the Python code into C, using underlying Python types, compile the C into a Python-accessible library, and load the function contained in the library into the Sage Notebook.

%cython def sum_cython_naive(n): s = 1 for i in range(n): s += i return s 
timeit('sum_cython_naive(1000000)') 
       
5 loops, best of 3: 140 ms per loop
5 loops, best of 3: 140 ms per loop

A slight speedup.

Note that two files are generated when we run the above code. One is a .c file containing the C code generated by Cython. Imagine tryingt to write this by hand! The .html file shows the "C translation" of your code line by line. (The long variable names come from the directory scheme used in the Sage Notebook.) Take a look at these files.

Now, let's declare the variable s and i to be C integer types. These are much faster than the (relatively) bloated Python integer types.

%cython def sum_cython(n): cdef int i, s = 1 for i in range(n): s += i return s 
timeit('sum_cython(1000000)') 
       
625 loops, best of 3: 662 µs per loop
625 loops, best of 3: 662 µs per loop
140/0.656 
       
\newcommand{\Bold}[1]{\mathbf{#1}}213.414634146341
\newcommand{\Bold}[1]{\mathbf{#1}}213.414634146341

A factor of 200 speedup!

Note that when dealing with C data types, Cython interprets "for i in range(n)" appropriately. Namely, as a C "for loop".

Careful, though. Sage integer types are arbitrary precision whereas C integer types are fixed precision: ether 32-bit or 64-bit. You might encounter overflow if your input is outside of this precision range.

sum_sage(10^7) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}49999995000001
\newcommand{\Bold}[1]{\mathbf{#1}}49999995000001
sum_cython(10^7) 
       
\newcommand{\Bold}[1]{\mathbf{#1}}-2014260031
\newcommand{\Bold}[1]{\mathbf{#1}}-2014260031