;; Is Clojure Still Fast ? ;; Once upon a time I wrote a blog post saying that clojure was fast. It still is, and optimizing ;; it is now much easier than it used to be, but it doesn't seem to be *quite* as fast as it once ;; was. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; I optimized a program to solve a differential equation in a ;; simple-minded way: ;; The equation was: dy/dt = f(t,y) where f(t, y) = t-y and y=0 when t=0 ;; ( the exact solution is: y=e^(-t)+t-1 ) ;; Here's a program to solve it using Euler's method (defn f [t y] (- t y)) (defn solveit [t0 y0 h its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (f t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])) ;; For every go round the loop we have to: ;; compare its with 0, ;; branch depending on the result, ;; add t0 to h, ;; call f with t0 and y0, ;; multiply h and the result, ;; add that to y0, ;; jump. ;; So if this was an assembly language program that worked the way ;; you'd expect, each loop would take 7 cycles. ;; According to /proc/cpuinfo, my netbook runs at 1.662 GHz. (def *cpuspeed* 1.662) ;; We care about how many cycles each iteration of the solver takes: (defmacro cyclesperit [expr its] `(let [start# (. System (nanoTime)) ret# ( ~@expr (/ 1.0 ~its) ~its ) finish# (. System (nanoTime))] (int (/ (* *cpuspeed* (- finish# start#)) ~its)))) ;; With the program as written, this estimate turns out to have been a little optimistic. ;; The figures in the original post were on a desktop machine that I no longer have, which was more ;; powerful than my current netbook. ;; To two significant figures, the results of the timing expression (cyclesperit (solveit 0.0 1.0) 1000000) ;; are: ;; On my old desktop with clojure 1.2 : 2300 cycles ;; On my netbook with clojure 1.2 : 2800 cycles ;; On my netbook with clojure 1.3 : 2500 cycles ;; On my netbook with clojure 1.4 : 2400 cycles ;; So it looks like my netbook is not only slower in clock speed than my desktop was, but also in ;; terms of cycles/iteration. That's not surprising as the netbook has an Atom processor, optimized ;; for low power rather than for speed. ;; But it also looks as though clojure has been speeding up slightly, which has almost made up for that. ;; I'm also assuming that the JVM itself hasn't changed much since the original blog post. The ;; netbook timings were all done today on the same JVM, but the desktop timings are from a while ;; back, so that might account for some differences. ;; So we're looking at a slowdown of about 300 times over what we could probably ;; achieve coding in assembler or in C with a good optimizing compiler (and of ;; course I'm assuming that floating point operations take one cycle each) ;; This is about the sort of speed that you'd expect from a dynamic language ;; without any optimization or type hinting. ;; In the original blog post I went through a number of faster versions: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; In solveit-2 I explicitly typed the loop variables, inlined the function call, and made an ;; internal target for recur. That speeded things up considerably: (defn solveit-2 [t0 y0 h its] (loop [t0 (double t0), y0 (double y0), h (double h), its (int its)] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (- t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) ;; Let's time that and see how it goes: (cyclesperit (solveit-2 0.0 1.0) 10000000) ;; On my old desktop with clojure 1.2 : 490 cycles ;; On my netbook with clojure 1.2 : 600 cycles ;; On my netbook with clojure 1.3 : 44 cycles ;; On my netbook with clojure 1.4 : 44 cycles ;; Wow! ;; Again we see that my netbook is around 20% slower, but the performance change between clojure 1.2 ;; and clojure 1.3 was incredible. ;; There's now about a factor of 6 between the clojure version and the imaginary assembler program ;; on the imaginary cpu in my head. And all I've done is to declare the types and inline the functions. ;; This is actually so impressive that I want to examine the three things in detail, to see how the ;; three things combine to cause the speedup. As I remember from clojure 1.2, you needed all three ;; changes to see any great difference. In 1.4 you can type hint separately from making an internal loop target. ;; All timings with clojure 1.4.0. (defn f [t y] (- t y)) (defn solveit [t0 y0 h its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (f t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])) (cyclesperit (solveit 0.0 1.0) 10000000) ; 2481 ;; type hints: (defn ^double tf [^double t ^double y] (- t y)) (defn solveit-1-1 [^double t0 ^double y0 ^double h ^long its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (tf t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])) (cyclesperit (solveit-1-0 0.0 1.0) 10000000) ; 289 ;; Inline f (defn solveit-1-2 [ ^double t0 ^double y0 ^double h ^long its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (- t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])) (cyclesperit (solveit-1-2 0.0 1.0) 10000000) ; 44 ;; Internal loop target (defn solveit-1-3 [ ^double t0 ^double y0 ^double h ^long its] (loop [t0 t0, y0 y0, h h, its its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (- t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) (cyclesperit (solveit-1-3 0.0 1.0) 10000000) ; 44 ;; Original version (defn solveit-2 [t0 y0 h its] (loop [t0 (double t0), y0 (double y0), h (double h), its (int its)] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (- t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) ;; Let's time that and see how it goes: (cyclesperit (solveit-2 0.0 1.0) 10000000) ; 44 ;; This is pretty awesome. Type hints alone are giving us a factor of eight speedup, and inlining ;; the function then gives us another factor of 6. The internal loop target, which was originally a ;; bit of a hack to allow clojure to deduce types doesn't make any difference to the hinted version, ;; but in fact that still works, and gives the compiler enough information to get the same speed. ;; So far I really could not be more impressed. ;; In the original post, I then used a profiler to find out where the loop was running slow, and ;; did some strange things to make it fast: (defn solveit-3 [t0 y0 h its] (loop [t0 (double t0), y0 (double y0), h (double h), its (int its)] (if (> its (int 0)) (let [t1 (+ t0 h) y1 (+ y0 (* h (- t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) (cyclesperit (solveit-3 0.0 1.0) 10000000) ;; On my old desktop with clojure 1.2 : 70 cycles ;; On my netbook with clojure 1.2 : 90 cycles ;; On my netbook with clojure 1.3 : 100 cycles ;; On my netbook with clojure 1.4 : 100 cycles ;; Originally, that (int 0) instead of 0 in the comparison made all the difference, causing a huge ;; and unexpected speedup. ;; The odd thing here is that the same program actually runs slightly slower in clojure 1.3/1.4 than ;; it does in clojure 1.2. Not much, and they're all considerably slower than clojure 1.4's best ;; shot so far. ;; What I think is happening here is that that int cast is taking up unnecessary time in the later clojures, ;; whereas in 1.2 it was the final piece of the puzzle as far as the compiler was concerned. ;; To test this: (defn solveit-3-a [ t0 y0 h its] (loop [t0 (double t0), y0 (double y0), h (double h), its (long its)] (if (> its (long 0 )) (let [t1 (+ t0 h) y1 (+ y0 (* h (- t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) (cyclesperit (solveit-3-a 0.0 1.0) 10000000) ; 44 ;; solveit-3-a runs at the same 44 cycles/iteration as we've been seeing before. I get the ;; impression that modern clojure prefers longs to ints. ;; The final craziness was to take the (int 0) out of the loop entirely: (defn solveit-4 [t0 y0 h its] (let [zero (int 0)] (loop [t0 (double t0) y0 (double y0) h (double h) its (int its)] (if (> its zero) (let [t1 (+ t0 h) y1 (+ y0 (* h (- t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])))) (cyclesperit (solveit-4 0.0 1.0) 100000000) ;; And this is where it gets weird: ;; On my old desktop with clojure 1.2 : 23 cycles ;; On my netbook with clojure 1.2 : 32 cycles ;; On my netbook with clojure 1.3 : 45 cycles ;; On my netbook with clojure 1.4 : 45 cycles ;; Clojure 1.3/1.4 is now running at its usual (fast) speed, despite the ints rather than longs in solveit-4 ;; But Clojure 1.2 on this program is able to run about 33% faster than the modern versions. ;; So it actually looks as though whatever awesomeness has caused the phenomenal speedup between ;; clojure 1.2 and 1.3 has actually slightly slowed down the optimized version. ;; I'm imagining that this is something to do with using longs rather than ints. ;; But I haven't done any kind of close profiling to see if I can make 1.4 run any faster than 44 cycles/loop. ;; Someone in a comment on my earlier post said that they were seeing 8 cycles/second on their ;; Macintosh, so it might just be that there's something weird about the Atom processor. ;; At any rate, it looks as though optimizing numeric code in Clojure is now dead easy. You just ;; tell it the types, inline function calls, and that's as good as it gets. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; For those in real need of cycles and willing to take risks to save them, there's: (set! *unchecked-math* true) ;; Which I don't think has an equivalent in 1.2, which cuts a few cycles off: (defn solveit-4 [t0 y0 h its] (let [zero (long 0)] (loop [t0 (double t0) y0 (double y0) h (double h) its (long its)] (if (> its zero) (let [t1 (+ t0 h) y1 (+ y0 (* h (- t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])))) (cyclesperit (solveit-4 0.0 1.0) 100000000) ; 37 ;; I do wonder what is going on here. I'm not sure what checking is here to be turned off. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Incidentally, the following two java programs also produce the numbers that are seen above: ;; public class Euler{ ;; public static void main (String[] args){ ;; double cpuspeed = 1.662; ;; int its = 10000000; ;; double t=0; ;; double y=0; ;; double h=1.0/its; ;; Long start = System.nanoTime(); ;; for(int i = 0; i < its; i++){ ;; y = y+h*(t-y); ;; t = t+h; ;; } ;; Long finish = System.nanoTime(); ;; System.out.println("y=" + y + " t=" +t); ;; System.out.println("cycles/iteration: " + ((int) ((cpuspeed * (finish - start)) / its))); ;; } ;; } ;; y=0.3678794227282174 t=0.99999999975017 ;; cycles/iteration: 32 ;; public class Euler{ ;; public static void main (String[] args){ ;; double cpuspeed = 1.662; ;; long its = 10000000; ;; double t=0; ;; double y=0; ;; double h=1.0/its; ;; long start = System.nanoTime(); ;; for(long i = 0; i < its; i++){ ;; y = y+h*(t-y); ;; t = t+h; ;; } ;; long finish = System.nanoTime(); ;; System.out.println("y=" + y + " t=" +t); ;; System.out.println("cycles/iteration: " + ((int) ((cpuspeed * (finish - start)) / its))); ;; } ;; } ;; y=0.3678794227282174 t=0.99999999975017 ;; cycles/iteration: 37
Blog Archive
-
▼
2013
(52)
-
▼
February
(10)
- Rerum Cognoscere Causas
- Clojure's Reader is Unsafe
- Linear Congruential Random Number Generators
- Packing and Unpacking Numbers : Deterministic Rand...
- Clojure Emacs Eval Paste Keyboard Macros / Generat...
- Mathematics as an Experimental Science
- A Naive Bayesian Classifier
- Clojure is Fast / Is Clojure Still Fast? / Clojure...
- FOR vs MAP
- Runtime Require: Downloading a New Dependency and ...
-
▼
February
(10)
Search This Blog
Friday, February 8, 2013
Clojure is Fast / Is Clojure Still Fast? / Clojure is as Fast as Java
Subscribe to:
Post Comments (Atom)
Running on a MBA (Core i5 1.8 GHz) on Clojure 1.4 and Java (build 1.7.0_05-b06). Only chang was setting cpuspeed to 1.8.
ReplyDeleteI got 424 on my first run of the unoptimized version, successive runs drop to a steady state of 103. Killing the nrepl started me at 257 and drops to same steady state.
Also ran solveit-4 wuthout setting unchecked math, 11 on the initial run dropping to 7 steady state. yay.
Your benchmarking is too simplistic. https://github.com/hugoduncan/criterium or better yet : http://www.azulsystems.com/presentations/art-of-java-benchmarking
ReplyDeleteThe newer versions of Clojure have moved to 64-bit arithmetic for good, which is why `int` is giving speedup in 1.2 for you. Now, if your post was annotated with line numbers I could mention those in my comment. Another thing to note is when you run the code with JVM arg "-server" it automatically JIT-inlines functions doing primitive arithmetic stuff, so let the JVM warm up for a while before measuring anything. Criterium is the right tool. Nice run-through though.
ReplyDelete