"Safe Rust guarantees an absence of data races, which are defined as: two or more threads concurrently accessing a location of memory, where one or more of them is a write, and one or more of them is unsynchronized. A data race has Undefined Behavior, and is therefore impossible to perform in Safe Rust. [...] However Rust does not prevent general race conditions. This is mathematically impossible in situations where you do not control the scheduler, which is true for the normal OS environment. [...] For this reason, it is considered "safe" for Rust to get deadlocked or do something nonsensical with incorrect synchronization: this is known as a general race condition or resource race. Obviously such a program isn't very good, but Rust of course cannot prevent all logic errors. In any case, a race condition cannot violate memory safety in a Rust program on its own. Only in conjunction with some other unsafe code can a race condition actually violate memory safety."
Specifically: one thread reads the counter with proper synchronization, then another thread writes an incremented value to the counter with proper synchronization, then the first thread writes its own (now stale) incremented value to the counter with proper synchronization. At every step the use of an AtomicU32 guarantees proper synchronization of the underlying memory, which is all Rust is concerned with.
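That lost-update interleaving can be written out sequentially as a sketch (plain Python standing in for the two threads; each commented load/store corresponds to one individually synchronized atomic operation):

```python
# Sequential sketch of the lost-update interleaving described above.
# Every load and store is individually "synchronized", yet one
# increment is lost -- a logic race, not a data race.
counter = 0          # stands in for the AtomicU32

a = counter          # thread 1: atomic load  -> sees 0
b = counter          # thread 2: atomic load  -> sees 0
counter = b + 1      # thread 2: atomic store -> counter is 1
counter = a + 1      # thread 1: atomic store -> counter is 1 (thread 2's increment lost)

assert counter == 1  # two increments ran, but the counter only moved by one
```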
Great feedback. I have updated the post to use convolve instead. There is a huge difference between convolve and the naive approach. On the other hand, convolve is slower than the FFT, as expected, but only for polynomials of degree greater than 3000 or so.
See diff: https://github.com/alrevuelta/myblog/commit/9fcc3dc84b1d9b66...
Thanks! In case it helps:
* Multiplying two polynomials as taught in school is, in reality, a convolution of their coefficient sequences.
* "performing the convolution of two signals in the time domain is equivalent to multiplying them in the frequency domain"
* The FFT lets us convert from the time domain to the frequency domain (and back).
* We use the FFT to convert our polynomials to the frequency domain.
* Once in the frequency domain, we just multiply pointwise, which is faster than a convolution.
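The steps above can be sketched in plain Python (a minimal illustration, not the post's actual code; `convolve` and `poly_mul_fft` are names I made up): a schoolbook coefficient convolution on one side, and a textbook radix-2 FFT with pointwise multiplication on the other, producing the same coefficients.

```python
import cmath

def convolve(a, b):
    """Schoolbook polynomial multiplication: a direct convolution."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_mul_fft(a, b):
    """Transform both polynomials, multiply pointwise, transform back."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    fa = fft([complex(c) for c in a] + [0j] * (n - len(a)))
    fb = fft([complex(c) for c in b] + [0j] * (n - len(b)))
    pointwise = [x * y for x, y in zip(fa, fb)]
    inv = fft(pointwise, invert=True)
    # the inverse transform needs a 1/n scaling; coefficients here are real integers
    return [round(v.real / n) for v in inv[: len(a) + len(b) - 1]]
```

For example, `convolve([9, 8, 3], [2, 1])` and `poly_mul_fft([9, 8, 3], [2, 1])` both give `[18, 25, 14, 3]`.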
Does it clarify the missing step? Happy to update the post with what's missing.
Yep, it definitely helps! What I'm struggling to understand is why "performing the convolution of two signals in the time domain is equivalent to multiplying them in the frequency domain" (I'm going to google/gpt it, this is more of an exercise left for me to dive into, your post is perfect as it is)
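One way to see why: a degree-(n-1) polynomial is fully determined by its values at n distinct points, so if you evaluate both polynomials at enough shared points, multiply the values pointwise, and interpolate back, you recover the product. The DFT is exactly that evaluation step, with roots of unity as the points (chosen because they make evaluation fast). A small sketch of this view under my own (hypothetical) helper names, using integer points and exact rationals instead of roots of unity so the arithmetic is easy to follow:

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Horner evaluation; coeffs are lowest-degree first."""
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def interpolate(points):
    """Lagrange interpolation back to a coefficient list, lowest-degree first."""
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        basis = [Fraction(1)]              # running product of (x - xj), j != i
        denom = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            nxt = [Fraction(0)] * (len(basis) + 1)
            for k, b in enumerate(basis):
                nxt[k] += -xj * b          # constant part of (x - xj)
                nxt[k + 1] += b            # x part of (x - xj)
            basis = nxt
            denom *= xi - xj
        for k in range(n):
            coeffs[k] += yi * basis[k] / denom
    return coeffs

a = [3, 8, 9]                              # 3 + 8x + 9x^2
b = [2, 1]                                 # 2 + x
xs = [0, 1, 2, 3]                          # product has degree 3 -> 4 points suffice
vals = [evaluate(a, x) * evaluate(b, x) for x in xs]
prod_coeffs = interpolate(list(zip(xs, vals)))   # == [6, 19, 26, 9]
```

The pointwise products of values at shared points interpolate back to exactly the convolution of the coefficient lists, which is the content of the convolution theorem.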
Thanks for the comment. That makes sense since, as you say, a base-10 number can be expressed as a polynomial evaluated at x=10. E.g.: 983 = 9x^2 + 8x + 3, aka [9, 8, 3]. Wondering how big the numbers have to be for this to make sense, and where it is used in practice.
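That digit-list view can be turned into a toy multiplier: convolve the digit sequences, then propagate carries to get back to base-10 digits. A minimal sketch (function names are my own, not from any library):

```python
def digits(n):
    """Base-10 digits, least-significant first: 983 -> [3, 8, 9]."""
    return [int(d) for d in str(n)][::-1]

def mul_via_convolution(x, y):
    """Multiply two positive integers by convolving their digit lists."""
    a, b = digits(x), digits(y)
    conv = [0] * (len(a) + len(b) - 1)
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            conv[i + j] += da * db
    # convolution entries can exceed 9, so propagate carries
    result, carry = 0, 0
    for k, c in enumerate(conv):
        total = c + carry
        result += (total % 10) * 10 ** k
        carry = total // 10
    return result + carry * 10 ** len(conv)
```

For example, `mul_via_convolution(983, 12)` returns `11796`, i.e. `983 * 12`.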
FFT is used extensively in curve-based zero-knowledge cryptography, not for the purpose of splitting a single large number into smaller ones, but to interpolate and evaluate very large polynomials (of degree around 2^27, for example).
All of this happens over the scalar field of an elliptic curve, so the complexity reduction is greatly appreciated.
GMP has lots of other methods in between schoolbook multiplication and FFT multiplication. A nice one is Karatsuba multiplication, which is very easy to understand and delivers O(n^1.58) rather than O(n^2) performance. CPython uses this method for multiplying large integers.
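Karatsuba replaces the four sub-multiplications of the naive high/low split with three, which is where the n^log2(3) ≈ n^1.58 exponent comes from. A minimal sketch for non-negative integers (illustrative only; real implementations like CPython's work on binary limbs, not decimal strings):

```python
def karatsuba(x, y):
    """Karatsuba multiplication: 3 recursive multiplies instead of 4."""
    if x < 10 or y < 10:
        return x * y                      # base case: single-digit operand
    m = max(len(str(x)), len(str(y))) // 2
    pow10 = 10 ** m
    xh, xl = divmod(x, pow10)             # split x = xh * 10^m + xl
    yh, yl = divmod(y, pow10)
    hh = karatsuba(xh, yh)                # high * high
    ll = karatsuba(xl, yl)                # low * low
    # the single trick: (xh+xl)(yh+yl) - hh - ll == xh*yl + xl*yh
    mid = karatsuba(xh + xl, yh + yl) - hh - ll
    return hh * pow10 ** 2 + mid * pow10 + ll
```

For example, `karatsuba(1234, 5678)` returns `7006652`, matching `1234 * 5678`.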
PS: if you're interested in multiplying "ludicrously large numbers", Harvey and van der Hoeven had a nice breakthrough and got multiplication down to "FFT speed", O(n log n); see