Commit c38919d

Merge pull request #94 from prajwel/docupdate
Mentioned the histogramdd function in the README
2 parents 3a98148 + ddada09 commit c38919d

4 files changed: 58 additions and 18 deletions

README.rst

Lines changed: 15 additions & 12 deletions
@@ -3,7 +3,7 @@
 About
 -----

-Sometimes you just want to compute simple 1D or 2D histograms with regular bins. Fast. No
+Sometimes you just want to compute simple 1D, 2D, or multidimensional histograms with regular bins. Fast. No
 nonsense. `Numpy's <http://www.numpy.org>`__ histogram functions are
 versatile, and can handle for example non-regular binning, but this
 versatility comes at the expense of performance.
@@ -13,8 +13,9 @@ histogram functions for regular bins that don't compromise on performance. It do
 anything complicated - it just implements a simple histogram algorithm
 in C and keeps it simple. The aim is to have functions that are fast but
 also robust and reliable. The result is a 1D histogram function here that
-is **7-15x faster** than ``numpy.histogram``, and a 2D histogram function
-that is **20-25x faster** than ``numpy.histogram2d``.
+is **2-15x faster** than ``numpy.histogram``, a 2D histogram function
+that is **10x faster** than ``numpy.histogram2d``, and a multidimensional
+histogram function that is **5-10x faster** than ``numpy.histogramdd``.

 To install::

@@ -24,12 +25,12 @@ or if you use conda you can instead do::

     conda install -c conda-forge fast-histogram

-The ``fast_histogram`` module then provides two functions:
-``histogram1d`` and ``histogram2d``:
+The ``fast_histogram`` module then provides three functions:
+``histogram1d``, ``histogram2d``, and ``histogramdd``:

 .. code:: python

-    from fast_histogram import histogram1d, histogram2d
+    from fast_histogram import histogram1d, histogram2d, histogramdd

 Example
 -------
@@ -46,24 +47,26 @@ histogram:
     In [3]: y = np.random.random(10_000_000)

     In [4]: %timeit _ = np.histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)
-    935 ms ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+    562 ms ± 5.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

     In [5]: from fast_histogram import histogram2d

     In [6]: %timeit _ = histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)
-    40.2 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
+    55.9 ms ± 583 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

 (note that ``10_000_000`` is possible in Python 3.6 syntax, use ``10000000`` instead in previous versions)

-The version here is over 20 times faster! The following plot shows the
+The version here is over 10 times faster! The following plot shows the
 speedup as a function of array size for the bin parameters shown above:

 .. figure:: https://github.com/astrofrog/fast-histogram/raw/main/speedup_compared.png
    :alt: Comparison of performance between Numpy and fast-histogram

-as well as results for the 1D case, also with 30 bins. The speedup for
-the 2D case is consistently between 20-25x, and for the 1D case goes
-from 15x for small arrays to around 7x for large arrays.
+as well as results for the 1D and 3D cases, also with 30 bins. The speedup for
+the 2D case is consistently between 10-12x, and for the 1D case goes
+from 15x for small arrays to around 2x for large arrays.
+We have benchmarked the ``histogramdd`` function with a 3D array, and the speedup
+is found to be between 5-10x.

 Q&A
 ---
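
The README text above attributes the speed to a "simple histogram algorithm" for regular bins. As a rough illustration of why regular bins allow this, the sketch below computes each bin index with one multiplication instead of a search over bin edges. It is a pure-Python sketch, not the library's C implementation; the function name `regular_histogram1d` and the half-open range `[lo, hi)` are assumptions made for this example.

```python
def regular_histogram1d(values, bins, value_range):
    """Count values into `bins` equal-width bins spanning [lo, hi).

    Hypothetical helper for illustration only; not part of fast-histogram.
    """
    lo, hi = value_range
    scale = bins / (hi - lo)  # number of bins per unit of the input axis
    counts = [0] * bins
    for v in values:
        # Values outside the half-open range are skipped (an assumption
        # of this sketch, chosen to keep the index arithmetic simple).
        if lo <= v < hi:
            idx = int((v - lo) * scale)
            # Guard against floating-point rounding pushing idx to `bins`.
            if idx >= bins:
                idx = bins - 1
            counts[idx] += 1
    return counts


counts = regular_histogram1d([0.0, 0.5, 1.5, 2.9, 3.0, -1.0], bins=3, value_range=(0.0, 3.0))
print(counts)
```

With irregular bins, each value needs a search through the edge array; with regular bins the index follows directly from the value, and the same per-axis arithmetic would extend to 2D and higher dimensions.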

comparison/benchmark.py

Lines changed: 32 additions & 2 deletions
@@ -25,6 +25,18 @@
 NUMPY_2D_STMT = "np_histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)"
 FAST_2D_STMT = "histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)"

+SETUP_3D = """
+import numpy as np
+from numpy import histogramdd as np_histogramdd
+from fast_histogram import histogramdd
+x = np.random.random({size})
+y = np.random.random({size})
+z = np.random.random({size})
+"""
+
+NUMPY_3D_STMT = "np_histogramdd(np.column_stack([x, y, z]), range=[[-1, 2], [-2, 4], [-2, 4]], bins=30)"
+FAST_3D_STMT = "histogramdd(np.column_stack([x, y, z]), range=[[-1, 2], [-2, 4], [-2, 4]], bins=30)"
+
 # How long each benchmark should aim to take
 TARGET_TIME = 1.0

@@ -44,8 +56,8 @@ def time_stats(stmt=None, setup=None):
     return np.min(times) / number, np.mean(times) / number, np.median(times) / number


-FMT_HEADER = "# {:7s}" + " {:10s}" * 12 + "\n"
-FMT = "{:9d}" + " {:10.7e}" * 12 + "\n"
+FMT_HEADER = "# {:7s}" + " {:10s}" * 18 + "\n"
+FMT = "{:9d}" + " {:10.7e}" * 18 + "\n"

 with open("benchmark_times.txt", "w") as f:
     f.write(
@@ -63,6 +75,12 @@ def time_stats(stmt=None, setup=None):
             "fa_2d_min",
             "fa_2d_mean",
             "fa_2d_median",
+            "np_3d_min",
+            "np_3d_mean",
+            "np_3d_median",
+            "fa_3d_min",
+            "fa_3d_mean",
+            "fa_3d_median",
         )
     )

@@ -83,6 +101,12 @@ def time_stats(stmt=None, setup=None):
         fa_2d_min, fa_2d_mean, fa_2d_median = time_stats(
             stmt=FAST_2D_STMT, setup=SETUP_2D.format(size=size)
         )
+        np_3d_min, np_3d_mean, np_3d_median = time_stats(
+            stmt=NUMPY_3D_STMT, setup=SETUP_3D.format(size=size)
+        )
+        fa_3d_min, fa_3d_mean, fa_3d_median = time_stats(
+            stmt=FAST_3D_STMT, setup=SETUP_3D.format(size=size)
+        )

         f.write(
             FMT.format(
@@ -99,6 +123,12 @@ def time_stats(stmt=None, setup=None):
                 fa_2d_min,
                 fa_2d_mean,
                 fa_2d_median,
+                np_3d_min,
+                np_3d_mean,
+                np_3d_median,
+                fa_3d_min,
+                fa_3d_mean,
+                fa_3d_median,
             )
         )
         f.flush()
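
The change from `* 12` to `* 18` in the format strings follows from the column layout: three statistics (min, mean, median) for each of two libraries and, now, three dimensionalities. The sketch below rebuilds the column names programmatically and checks that they fit the new format strings. The 1D column names and their ordering are assumed by analogy with the 2D and 3D names visible in the diff, and the timing value is a placeholder, not a measurement.

```python
# Column layout assumed from the names visible in the diff above.
DIMS = ("1d", "2d", "3d")
LIBS = ("np", "fa")
STATS = ("min", "mean", "median")

columns = [f"{lib}_{dim}_{stat}" for dim in DIMS for lib in LIBS for stat in STATS]

# Format strings as they appear after this commit.
FMT_HEADER = "# {:7s}" + " {:10s}" * 18 + "\n"
FMT = "{:9d}" + " {:10.7e}" * 18 + "\n"

# One "size" column plus 18 timing columns.
header = FMT_HEADER.format("size", *columns)

# Placeholder timings, purely to show that a data row formats cleanly.
placeholder_times = [1.0e-5] * len(columns)
row = FMT.format(1000, *placeholder_times)

print(header, end="")
print(row, end="")
```

Passing fewer than 18 timing values to either format string would raise an `IndexError`, which is why the header names, the `FMT.format(...)` arguments, and the multiplier all change together in this commit.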

comparison/plot.py

Lines changed: 11 additions & 4 deletions
@@ -17,17 +17,24 @@
     fa_2d_min,
     fa_2d_mean,
     fa_2d_median,
+    np_3d_min,
+    np_3d_mean,
+    np_3d_median,
+    fa_3d_min,
+    fa_3d_mean,
+    fa_3d_median,
 ) = np.loadtxt("benchmark_times.txt", unpack=True)

 fig = plt.figure()
 ax = fig.add_subplot(1, 1, 1)
-ax.plot(size, np_1d_min / fa_1d_min, color=(34 / 255, 122 / 255, 181 / 255), label="1D")
-ax.plot(size, np_2d_min / fa_2d_min, color=(255 / 255, 133 / 255, 25 / 255), label="2D")
+ax.plot(size, np_1d_min / fa_1d_min, label="1D")
+ax.plot(size, np_2d_min / fa_2d_min, label="2D")
+ax.plot(size, np_3d_min / fa_3d_min, label="DD (3D)")
 ax.set_xscale("log")
 ax.set_xlim(0.3, 3e8)
-ax.set_ylim(1, 35)
+ax.set_ylim(1, 20)
 ax.grid()
 ax.set_xlabel("Array size")
-ax.set_ylabel("Speedup (fast-histogram / numpy)")
+ax.set_ylabel(f"Speedup (fast-histogram / numpy (version {np.__version__})")
 ax.legend()
 fig.savefig("speedup_compared.png", bbox_inches="tight")
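
Each plotted curve is the NumPy time divided by the fast-histogram time, so larger values mean fast-histogram is faster. As a small worked check, the sketch below applies the same ratio to the `%timeit` means quoted in the README.rst diff. The helper name `speedup` is introduced here for illustration, and the README figures are means while plot.py uses minimum times, so the numbers are only indicative.

```python
def speedup(numpy_seconds, fast_seconds):
    """Ratio of NumPy time to fast-histogram time (hypothetical helper)."""
    return numpy_seconds / fast_seconds


# 2D timings quoted in README.rst after this commit: 562 ms vs 55.9 ms.
new_2d = speedup(562e-3, 55.9e-3)

# 2D timings quoted in README.rst before this commit: 935 ms vs 40.2 ms.
old_2d = speedup(935e-3, 40.2e-3)

print(f"new 2D speedup: {new_2d:.1f}x")
print(f"old 2D speedup: {old_2d:.1f}x")
```

The two ratios come out at roughly 10x and 23x, which matches the README wording changing from "over 20 times faster" to "over 10 times faster".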

speedup_compared.png

12.3 KB