
> Nim compilation process took an additional 702 ms

That's horrifyingly slow for a compiler. The author mentioned "modern languages look like Python but run as fast as C", which is a common promise those languages make but that never really materializes, except for the few happy-path cases the language was heavily optimised for. Julia, for example, makes this promise too, but compiles even slower than that and takes ridiculous amounts of RAM even for hello world.

Did the author post the data set they used for the examples? It would be nice to try it out in a few languages, to see how fast the same task compiles and runs in a mature language like Common Lisp (which is just as easy to write) or even node.js.



Nim is actually one of the fastest compiled languages to compile, on par with Go. Although this is a bit subjective, I think a second of compilation is good enough for light scripting tasks. (And being a statically typed language, it catches a good chunk of errors before compilation is finished.)

Nim's advantage is that it uses a good old C compiler for the backend (which has been hyperoptimized for decades), but the frontend (transpiler) is also pretty fast. Nim's compilation speed should improve a bit when incremental compilation support is added (which would probably solve a lot of other current issues for Nim, for example by enabling better IDE tooling).


I didn't post it because it's quite big (150M) but readily available from the NCBI Virus portal [1]. I would love to see how well other languages compete both for speed and simplicity.

[1] https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType...


I couldn't get your 150M file, so I used one of the smaller files I could get by clicking on the first set shown in the table (the FASTA file was only 30KB) and duplicated it until it was around 150MB.
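In case anyone wants to reproduce the padded input, here is a minimal sketch of the duplication step. The function name `inflate_fasta` and its parameters are my own placeholders, and it assumes the source file ends with a newline so that concatenated copies do not merge lines.

```python
import os
import shutil


def inflate_fasta(src_path, dst_path, target_bytes):
    """Write whole copies of src_path into dst_path until it is at least
    target_bytes long. Returns the number of copies written.

    Assumption: src_path ends with a newline, so copies do not run together.
    """
    src_size = os.path.getsize(src_path)
    if src_size == 0:
        raise ValueError("source file is empty")
    copies = -(-target_bytes // src_size)  # ceiling division
    with open(dst_path, "wb") as dst:
        for _ in range(copies):
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, dst)
    return copies
```

Something like `inflate_fasta("small.fasta", "big.fasta", 150 * 1024 * 1024)` would do it (file names hypothetical). Since only whole copies are written, the GC ratio of the padded file is the same as that of the small one; only the runtime changes.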

Here's a comparison with Common Lisp:

    ~/fasta-dna $ time python3 run.py
    0.3797277865097147
    21.828 secs

    ~/fasta-dna $ time sbcl --script run.lisp
    0.37972778
    2.415 secs

    ~/fasta-dna $ ls -al nc_045512.2.fasta
    -rw-r--r-- 1 156095639 2021-09-25 11:15 nc_045512.2.fasta

So, almost as fast as Nim (the time includes compilation time)?

Here's the Common Lisp code:

    (with-open-file (in "nc_045512.2.fasta")
      (loop with gc = 0 and total = 0
            for line = (read-line in nil)
            while line do
              ;; skip blank lines and FASTA header lines (starting with ">")
              (unless (or (zerop (length line)) (eql (char line 0) #\>))
                (loop for ch across line do
                  (incf total)
                  (when (or (eql ch #\C) (eql ch #\G))
                    (incf gc))))
            finally (format t "~f~%" (/ gc total))))

With a top-level function and some type declarations it could run even faster, I think.

EDIT: compiling the Lisp code to FASL and annotating the types brings the total runtime to 2.0 seconds. Running it from source increases the time very slightly, to 2.08 seconds, which shows how incredibly fast the SBCL compiler is. Taking 0.7 seconds to compile a few lines of code is crazy; imagine what happens when your project grows to many thousands of lines.

The Lisp code still can't really match the speed of Nim, which is really C at runtime, once compile time is excluded, but if you need a scripting language, CL is great (especially when used with the REPL and SLIME).
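For anyone comparing other languages, the computation being timed in this thread is roughly the following. This is only my sketch of a Python baseline, not the blog post's actual code or the `run.py` timed above; the function name `gc_content` is made up, and it uses `bytes.count` rather than a per-character loop, so its timing will not match the 21.8 seconds reported for `run.py`.

```python
def gc_content(path):
    """Return the fraction of sequence characters that are C or G.

    Lines starting with ">" are FASTA headers and are skipped.
    Only uppercase C and G are counted, as in the Lisp and Nim versions.
    """
    gc = 0
    total = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b">"):
                continue
            seq = line.rstrip(b"\r\n")
            total += len(seq)
            gc += seq.count(b"C") + seq.count(b"G")
    # Avoid dividing by zero on a file with no sequence data.
    return gc / total if total else 0.0
```

Calling `gc_content("nc_045512.2.fasta")` should give the same ratio as the Lisp version, up to floating-point formatting.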


@brabel - The Nim compiler actually builds a relatively large `system` package every time. (They are also working on speeding up compiles.) So, compile time does not scale as badly as you think. E.g., you might have to grow the "user level" source code by 50..100x to double the compile time.

Also, @benjamin-lee, this version of the Nim program is a bit lower level, but probably much faster:

    import memfiles as mf
    var gc = 0
    var total = 0

    var f = mf.open("orthocoronavirinae.fasta")
    for line in memSlices(f):
        let n = line.size
        let cs = cast[cstring](line.data)
        if n > 0 and cs[0] == '>': # ignore comment lines
            continue
        for i in 0 ..< n:
            let letter = cs[i]
            if letter == 'C' or letter == 'G':
                gc += 1
            total += 1

    echo(gc.float / total.float)
    mf.close(f) # not really needed; process about to end

Compile with -d:danger and so on, of course. (On a small 30 KB test file I got about a 1.7x speed-up over the blog post's version. I also could not find the 150 MB file. Multiplying up the tiny 30 KB file like @brabel did, I got only a 1.25x speed-up, down to 0.5 seconds. So it might not be worth the low-level code, but a real file might tilt more towards the 1.7x end.)


I clicked on the big Download button and selected "all records", and it downloaded over 3.5GB before I gave up... Which file exactly should I use?


I'm sorry, I completely forgot that the file I used was from six months ago when I wrote the blog post (and then promptly forgot to publish it). In the last half year, the number of coronavirus sequences has increased dramatically. One thing that you could do to drop the file size down is to filter for only complete and unambiguous sequences, which drops the number down from 1.6 million to ~100k [1].

Alternatively, the exact file I used for the post is available for one week here with MD5 sum 3c33c3c4c2610f650c779291668450c9 [2]. Anyone who wants the file is free to reach out to me directly (email is on site).

[1] https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType...

[2] https://file.io/nUNc7cG5i8gj
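If you do get hold of the file, a quick way to confirm it is the same one is to compare its MD5 sum against the one quoted above. A minimal sketch, assuming the file has been saved locally; the helper name `md5_of_file` and the chunk size are my own choices:

```python
import hashlib

# MD5 sum quoted in the comment above.
EXPECTED_MD5 = "3c33c3c4c2610f650c779291668450c9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so a 150 MB file is never held in memory all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Then `md5_of_file("orthocoronavirinae.fasta") == EXPECTED_MD5` tells you whether you have the exact data set (the local file name here is just a guess borrowed from the Nim snippet upthread).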


The file at [2] is already gone :(


Can you upload your 150M file somewhere? If I follow the link in your comment, there are a bunch of small files; did you concatenate them?



