These are the slides for a talk I gave recently.

**Abstract.** IPython notebooks, NumPy and
Pandas data frames are the go-to tools for doing data science with Python. Spark and PySpark are rapidly becoming the *de facto* standard for
analyzing large volumes of data. But what about CPU-intensive tasks? What about
rough-and-ready, yet distributed, numerical computations? In the first part of this talk
I give an overview of the most interesting alternatives. The second part is
a brief roundup of file formats for storing data for numerical analysis;
most of these formats are language-independent.