Bottleneck

Bottleneck is a collection of fast, NaN-aware NumPy array functions written in C.

As one example, to check if a np.array has any NaNs using numpy, one must call np.any(np.isnan(array)). The bottleneck.anynan() function interleaves the np.isnan() check with np.any() pre-exit, enabling up to an O(N) speedup relative to numpy.

Bottleneck strives to be a drop-in accelerator for NumPy functions. When using the following libraries, Bottleneck support is automatically enabled and utilized:

Details on the performance benefits can be found in Benchmark

Example

Let’s give it a try. Create a NumPy array:

>>> import numpy as np
>>> a = np.array([1, 2, np.nan, 4, 5])

Find the nanmean:

>>> import bottleneck as bn
>>> bn.nanmean(a)
3.0

Moving window mean:

>>> bn.move_mean(a, window=2, min_count=1)
array([ 1. ,  1.5,  2. ,  4. ,  4.5])

Benchmark

Bottleneck comes with a benchmark suite:

>>> bn.bench()
Bottleneck performance benchmark
    Bottleneck 1.3.0.dev0+122.gb1615d7; Numpy 1.16.4
    Speed is NumPy time divided by Bottleneck time
    NaN means approx one-fifth NaNs; float64 used

              no NaN     no NaN      NaN       no NaN      NaN
               (100,)  (1000,1000)(1000,1000)(1000,1000)(1000,1000)
               axis=0     axis=0     axis=0     axis=1     axis=1
nansum         29.7        1.4        1.6        2.0        2.1
nanmean        99.0        2.0        1.8        3.2        2.5
nanstd        145.6        1.8        1.8        2.7        2.5
nanvar        138.4        1.8        1.8        2.8        2.5
nanmin         27.6        0.5        1.7        0.7        2.4
nanmax         26.6        0.6        1.6        0.7        2.5
median        120.6        1.3        4.9        1.1        5.7
nanmedian     117.8        5.0        5.7        4.8        5.5
ss             13.2        1.2        1.3        1.5        1.5
nanargmin      66.8        5.5        4.8        3.5        7.1
nanargmax      57.6        2.9        5.1        2.5        5.3
anynan         10.2        0.3       52.3        0.8       41.6
allnan         15.1      196.0      156.3      135.8      111.2
rankdata       45.9        1.2        1.2        2.1        2.1
nanrankdata    50.5        1.4        1.3        2.4        2.3
partition       3.3        1.1        1.6        1.0        1.5
argpartition    3.4        1.2        1.5        1.1        1.6
replace         9.0        1.5        1.5        1.5        1.5
push         1565.6        5.9        7.0       13.0       10.9
move_sum     2159.3       31.1       83.6      186.9      182.5
move_mean    6264.3       66.2      111.9      361.1      246.5
move_std     8653.6       86.5      163.7      232.0      317.7
move_var     8856.0       96.3      171.6      267.9      332.9
move_min     1186.6       13.4       30.9       23.5       45.0
move_max     1188.0       14.6       29.9       23.5       46.0
move_argmin  2568.3       33.3       61.0       49.2       86.8
move_argmax  2475.8       30.9       58.6       45.0       82.8
move_median  2236.9      153.9      151.4      171.3      166.9
move_rank     847.1        1.2        1.4        2.3        2.6

You can also run a detailed benchmark for a single function using, for example, the command:

>>> bn.bench_detailed("move_median", fraction_nan=0.3)

Only arrays with data type (dtype) int32, int64, float32, and float64 are accelerated. All other dtypes result in calls to slower, unaccelerated functions. In the rare case of a byte-swapped input array (e.g. a big-endian array on a little-endian operating system) the function will not be accelerated regardless of dtype.

License

Bottleneck is distributed under a Simplified BSD license. See the LICENSE file and LICENSES directory for details.

Install

Requirements:

Bottleneck

Python 3.6, 3.7, 3.8; NumPy 1.15.0+ (follows NEP 29)

Compile

gcc, clang, MinGW or MSVC

Unit tests

pytest, hypothesis

Documentation

sphinx, numpydoc

To install Bottleneck on Linux, Mac OS X, et al.:

$ pip install .

To install bottleneck on Windows, first install MinGW and add it to your system path. Then install Bottleneck with the command:

python setup.py install --compiler=mingw32

Alternatively, you can use the Windows binaries created by Christoph Gohlke: http://www.lfd.uci.edu/~gohlke/pythonlibs/#bottleneck

Unit tests

After you have installed Bottleneck, run the suite of unit tests:

In [1]: import bottleneck as bn

In [2]: bn.test()
============================= test session starts =============================
platform linux -- Python 3.7.4, pytest-4.3.1, py-1.8.0, pluggy-0.12.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/chris/code/bottleneck/.hypothesis/examples')
rootdir: /home/chris/code/bottleneck, inifile: setup.cfg
plugins: openfiles-0.3.2, remotedata-0.3.2, doctestplus-0.3.0, mock-1.10.4, forked-1.0.2, cov-2.7.1, hypothesis-4.32.2, xdist-1.26.1, arraydiff-0.3
collected 190 items

bottleneck/tests/input_modification_test.py ........................... [ 14%]
..                                                                      [ 15%]
bottleneck/tests/list_input_test.py .............................       [ 30%]
bottleneck/tests/move_test.py .................................         [ 47%]
bottleneck/tests/nonreduce_axis_test.py ....................            [ 58%]
bottleneck/tests/nonreduce_test.py ..........                           [ 63%]
bottleneck/tests/reduce_test.py ....................................... [ 84%]
............                                                            [ 90%]
bottleneck/tests/scalar_input_test.py ..................                [100%]

========================= 190 passed in 46.42 seconds =========================
Out[2]: True

If developing in the git repo, simply run py.test