https://iamtrask.github.io/2014/11/23/cython-blas-fortran/
https://stackoverflow.com/questions/31931798/implementing-faster-python-inner-product-with-blas
https://stackoverflow.com/questions/11443302/compiling-numpy-with-openblas-integration
http://markus-beuckelmann.de/blog/boosting-numpy-blas.html
On your journey to optimizing your Python code, you may reach a point where, even after porting everything to Cython, the bottleneck lies in linear algebra operations.
At this stage, the most efficient way to speed things up is to use the BLAS routines for linear algebra. Luckily, the dense BLAS routines are exposed to Cython through SciPy, so all you need to do is add from scipy.linalg.blas cimport c_name_of_routine
to your code and you're ready to go.
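As a sketch of what those SciPy-exposed BLAS routines look like, the snippet below calls two of them (ddot and dgemm) from Python; in Cython you would cimport the same names from scipy.linalg.blas to get the raw C function pointers instead of the Python wrappers.

```python
import numpy as np
from scipy.linalg.blas import ddot, dgemm

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# ddot: double-precision dot product, computed by the BLAS backend
print(ddot(x, y))  # 32.0

# dgemm: double-precision matrix-matrix product, C = alpha * A @ B
c = dgemm(1.0, np.eye(2), np.ones((2, 2)))
print(c)
```

The Python-level wrappers are handy for checking which routines exist and what they compute before wiring up the cimport in your .pyx file.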
However, the Sparse BLAS routines are not accessible through SciPy, so there is no easy way to benefit from processor-level optimization for sparse operations.
This repo sums up a working experiment (on Windows and Linux) calling the Sparse BLAS routines from the MKL library, which ships by default with the Anaconda distribution and can be installed with conda install mkl
The package assumes an Anaconda installation of Python. Other distributions are not supported.
conda install mkl
(be sure that your distribution is linked against MKL; reinstall numpy if necessary)
python setup.py install
pytest tests
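To check whether your NumPy install is actually linked against MKL (as the note above requires), you can inspect its build configuration; an MKL-linked install will mention mkl among the BLAS/LAPACK libraries.

```python
import numpy as np

# Print the BLAS/LAPACK libraries NumPy was linked against.
# With an MKL-linked install you should see 'mkl' in the output.
np.show_config()
```

If mkl does not appear, reinstalling numpy from the Anaconda channels after conda install mkl usually fixes the linkage.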
To call the routines from your own package, you need to modify your setup.py
by
adding the libraries and library_dirs keywords to the Extension class
adding the include_dirs argument to the setup function
Take a look at setup.py
to see what the instructions above mean in practice.
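The two setup.py changes above can be sketched as follows. The module name, source file, and paths here are hypothetical placeholders; adjust them to your own package and conda environment (the library name mkl_rt is MKL's single dynamic runtime library).

```python
from setuptools import Extension

# Hypothetical module and paths: adjust to your own package layout
# and to where your conda environment keeps the MKL libraries.
ext = Extension(
    name="mypkg.sparse_ops",                  # hypothetical extension module
    sources=["mypkg/sparse_ops.pyx"],
    libraries=["mkl_rt"],                     # link against the MKL runtime
    library_dirs=["/path/to/conda/env/lib"],  # directory containing libmkl_rt
)

# In setup(), also pass include_dirs so the MKL headers are found, e.g.:
# setup(ext_modules=cythonize([ext]),
#       include_dirs=["/path/to/conda/env/include"])
print(ext.libraries)
```

Constructing the Extension does not trigger a build; the compile and link flags only take effect when setup() runs.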
The Sparse BLAS package contains headers for some sparse functions. To include other functions, take a look at the C headers for instructions.
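To see how the data you already have in SciPy maps onto what a CSR Sparse BLAS routine consumes, note that a scipy.sparse CSR matrix exposes exactly the three arrays (values, column indices, row pointers) that an MKL CSR matrix-vector routine such as mkl_cspblas_dcsrgemv takes. The sketch below shows the mapping and the result the MKL call would produce, computed here with SciPy for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 0.0, 5.0]]))

# These are the CSR arrays a Sparse BLAS routine would receive:
#   A.data    -> a  (non-zero values)
#   A.indices -> ja (column indices)
#   A.indptr  -> ia (row start offsets)
x = np.array([1.0, 1.0, 1.0])
y = A.dot(x)  # the same y = A @ x that the MKL CSR gemv computes
print(y)      # [3. 3. 9.]
```

This makes it straightforward to validate a wrapped MKL routine against the pure-SciPy result in your tests.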