"Safe Rust guarantees an absence of data races, which are defined as: two or more threads concurrently accessing a location of memory, where one or more of them is a write, and one or more of them is unsynchronized. A data race has Undefined Behavior, and is therefore impossible to perform in Safe Rust. [...] However Rust does not prevent general race conditions. This is mathematically impossible in situations where you do not control the scheduler, which is true for the normal OS environment. [...] For this reason, it is considered "safe" for Rust to get deadlocked or do something nonsensical with incorrect synchronization: this is known as a general race condition or resource race. Obviously such a program isn't very good, but Rust of course cannot prevent all logic errors. In any case, a race condition cannot violate memory safety in a Rust program on its own. Only in conjunction with some other unsafe code can a race condition actually violate memory safety."
Specifically: one thread reads the counter with proper synchronization, then another thread writes an incremented value to the counter with proper synchronization, then the first thread writes its own (now stale) incremented value to the counter with proper synchronization. At every step the use of an AtomicU32 guarantees proper synchronization of the underlying memory, which is all Rust is concerned with.
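That lost-update interleaving can be written out sequentially as a sketch (plain Python standing in for the two threads; each commented load/store corresponds to one individually synchronized atomic operation):

```python
# Sequential sketch of the lost-update interleaving described above.
# Every load and store is individually "synchronized", yet one
# increment is lost -- a logic race, not a data race.
counter = 0          # stands in for the AtomicU32

a = counter          # thread 1: atomic load  -> sees 0
b = counter          # thread 2: atomic load  -> sees 0
counter = b + 1      # thread 2: atomic store -> counter is 1
counter = a + 1      # thread 1: atomic store -> counter is 1 (thread 2's increment lost)

assert counter == 1  # two increments ran, but the counter only moved by one
```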
Great feedback. I have updated the post to use convolve instead. There is a huge difference between convolve and the naive approach. On the other hand, convolve is slower than the FFT, as expected, but only for polynomials of degree greater than 3000 or so.
See diff: https://github.com/alrevuelta/myblog/commit/9fcc3dc84b1d9b66...
Thanks! In case it helps:
* Multiplying two polynomials as taught in school is, in reality, a convolution of their coefficient sequences.
* "performing the convolution of two signals in the time domain is equivalent to multiplying them in the frequency domain"
* The FFT lets us convert from the time domain to the frequency domain (and back).
* We use the FFT to convert our polynomials to the frequency domain.
* Once in the frequency domain, we just multiply pointwise, which is faster than a convolution.
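The steps above can be sketched in plain Python (a minimal illustration, not the post's actual code; `convolve` and `poly_mul_fft` are names I made up): a schoolbook coefficient convolution on one side, and a textbook radix-2 FFT with pointwise multiplication on the other, producing the same coefficients.

```python
import cmath

def convolve(a, b):
    """Schoolbook polynomial multiplication: a direct convolution."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_mul_fft(a, b):
    """Transform both polynomials, multiply pointwise, transform back."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    fa = fft([complex(c) for c in a] + [0j] * (n - len(a)))
    fb = fft([complex(c) for c in b] + [0j] * (n - len(b)))
    pointwise = [x * y for x, y in zip(fa, fb)]
    inv = fft(pointwise, invert=True)
    # the inverse transform needs a 1/n scaling; coefficients here are real integers
    return [round(v.real / n) for v in inv[: len(a) + len(b) - 1]]
```

For example, `convolve([9, 8, 3], [2, 1])` and `poly_mul_fft([9, 8, 3], [2, 1])` both give `[18, 25, 14, 3]`.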
Does it clarify the missing step? Happy to update the post with what's missing.
Yep, it definitely helps! What I'm struggling to understand is why "performing the convolution of two signals in the time domain is equivalent to multiplying them in the frequency domain" (I'm going to google/gpt it, this is more of an exercise left for me to dive into, your post is perfect as it is)
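One way to see why: a degree-(n-1) polynomial is fully determined by its values at n distinct points, so if you evaluate both polynomials at enough shared points, multiply the values pointwise, and interpolate back, you recover the product. The DFT is exactly that evaluation step, with roots of unity as the points (chosen because they make evaluation fast). A small sketch of this view under my own (hypothetical) helper names, using integer points and exact rationals instead of roots of unity so the arithmetic is easy to follow:

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Horner evaluation; coeffs are lowest-degree first."""
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def interpolate(points):
    """Lagrange interpolation back to a coefficient list, lowest-degree first."""
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        basis = [Fraction(1)]              # running product of (x - xj), j != i
        denom = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            nxt = [Fraction(0)] * (len(basis) + 1)
            for k, b in enumerate(basis):
                nxt[k] += -xj * b          # constant part of (x - xj)
                nxt[k + 1] += b            # x part of (x - xj)
            basis = nxt
            denom *= xi - xj
        for k in range(n):
            coeffs[k] += yi * basis[k] / denom
    return coeffs

a = [3, 8, 9]                              # 3 + 8x + 9x^2
b = [2, 1]                                 # 2 + x
xs = [0, 1, 2, 3]                          # product has degree 3 -> 4 points suffice
vals = [evaluate(a, x) * evaluate(b, x) for x in xs]
prod_coeffs = interpolate(list(zip(xs, vals)))   # == [6, 19, 26, 9]
```

The pointwise products of values at shared points interpolate back to exactly the convolution of the coefficient lists, which is the content of the convolution theorem.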
Thanks for the comment. That makes sense since, as you say, a base-10 number can be expressed as a polynomial evaluated at x=10. E.g.: 983 = 9x^2 + 8x + 3, aka [9, 8, 3]. Wondering how big the numbers have to be for this to make sense, and where it is used in practice.
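That digit-list view can be turned into a toy multiplier: convolve the digit sequences, then propagate carries to get back to base-10 digits. A minimal sketch (function names are my own, not from any library):

```python
def digits(n):
    """Base-10 digits, least-significant first: 983 -> [3, 8, 9]."""
    return [int(d) for d in str(n)][::-1]

def mul_via_convolution(x, y):
    """Multiply two positive integers by convolving their digit lists."""
    a, b = digits(x), digits(y)
    conv = [0] * (len(a) + len(b) - 1)
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            conv[i + j] += da * db
    # convolution entries can exceed 9, so propagate carries
    result, carry = 0, 0
    for k, c in enumerate(conv):
        total = c + carry
        result += (total % 10) * 10 ** k
        carry = total // 10
    return result + carry * 10 ** len(conv)
```

For example, `mul_via_convolution(983, 12)` returns `11796`, i.e. `983 * 12`.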
FFT is used extensively in curve-based zero-knowledge cryptography, not for the purpose of splitting a single large number into smaller ones, but to interpolate and evaluate very large polynomials (of degree around 2^27, for example).
All of this happens over the scalar field of an elliptic curve, so the complexity reduction is greatly appreciated.
GMP has lots of other methods in between schoolbook multiplication and FFT multiplication. A nice one is Karatsuba multiplication, which is very easy to understand and delivers O(n^1.58) rather than O(n^2) performance. CPython uses this method for multiplying large integers.
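Karatsuba replaces the four sub-multiplications of the naive high/low split with three, which is where the n^log2(3) ≈ n^1.58 exponent comes from. A minimal sketch for non-negative integers (illustrative only; real implementations like CPython's work on binary limbs, not decimal strings):

```python
def karatsuba(x, y):
    """Karatsuba multiplication: 3 recursive multiplies instead of 4."""
    if x < 10 or y < 10:
        return x * y                      # base case: single-digit operand
    m = max(len(str(x)), len(str(y))) // 2
    pow10 = 10 ** m
    xh, xl = divmod(x, pow10)             # split x = xh * 10^m + xl
    yh, yl = divmod(y, pow10)
    hh = karatsuba(xh, yh)                # high * high
    ll = karatsuba(xl, yl)                # low * low
    # the single trick: (xh+xl)(yh+yl) - hh - ll == xh*yl + xl*yh
    mid = karatsuba(xh + xl, yh + yl) - hh - ll
    return hh * pow10 ** 2 + mid * pow10 + ll
```

For example, `karatsuba(1234, 5678)` returns `7006652`, matching `1234 * 5678`.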
PS: if you're interested in multiplying "ludicrously large numbers", Harvey and van der Hoeven had a nice breakthrough and got multiplication down to "FFT speed", O(n log n); see