Just to be clear, JavaScript arrays will, depending on the implementation, dynamically switch between sparse and compact representations. A sparse JavaScript array literally behaves like a JavaScript object, or a map keyed by integer index. Ruby and Python arrays can grow in place as long as the allocator can extend the memory at the end, but they have to copy the entire array when it can't (e.g. because of memory fragmentation)...it's not quite the same as a C array.
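To make the dense/sparse switch concrete, here's a minimal Python sketch. `HybridArray` and its `SPARSE_GAP` threshold are hypothetical; real JS engines use their own, more involved heuristics, so treat this as an illustration of the idea rather than how any engine actually does it.

```python
SPARSE_GAP = 1024  # assumed threshold: largest hole tolerated while dense


class HybridArray:
    """Hypothetical array that switches from a dense list to a dict."""

    def __init__(self):
        self._dense = []     # compact representation
        self._sparse = None  # index -> value map, used after switching
        self.length = 0

    @property
    def is_sparse(self):
        return self._sparse is not None

    def __setitem__(self, index, value):
        if index < 0:
            raise IndexError(index)
        if self._sparse is not None:
            self._sparse[index] = value
        elif index < len(self._dense):
            self._dense[index] = value
        elif index - len(self._dense) <= SPARSE_GAP:
            # Small hole: pad with None (standing in for JS holes) and append.
            self._dense.extend([None] * (index - len(self._dense)))
            self._dense.append(value)
        else:
            # Large hole: fall back to a map keyed by integer index.
            self._sparse = dict(enumerate(self._dense))
            self._dense = None
            self._sparse[index] = value
        self.length = max(self.length, index + 1)

    def __getitem__(self, index):
        if self._sparse is not None:
            return self._sparse.get(index)
        if 0 <= index < len(self._dense):
            return self._dense[index]
        return None


arr = HybridArray()
arr[0] = "a"
arr[1] = "b"
arr[10] = "c"        # small gap: stays dense
arr[100_000] = "d"   # huge gap: switches to the dict representation
print(arr.is_sparse, arr.length)
```

Writing index 100000 into a dense list would mean allocating ~100k empty slots, which is exactly the case where a map keyed by index wins.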
Because the data is immutable, your best bets are actually vectors or maps.
Btw, the Erlang library you mention implements arrays on top of a tree of 10-ary(-ish) tuples. It'd need to be measured, but I'd be willing to bet that the native maps are faster at access, insert, and delete. Maps are implemented in C, and each operation should be roughly O(log n).
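Roughly, the tree-of-tuples idea looks like the following Python sketch. It's a simplified, fixed-capacity version written for illustration, not the actual code of Erlang's `array` module, and the function names are made up. The point is that an update copies only the tuples on the path from root to leaf (about log10(n) of them) and shares everything else with the old version.

```python
BRANCH = 10  # fan-out per tuple, mirroring the "10-ary(-ish)" tree


def new_array(size, default=None):
    """Fixed-capacity tree of nested tuples with at least `size` slots."""
    depth, capacity = 1, BRANCH
    while capacity < size:
        depth, capacity = depth + 1, capacity * BRANCH

    def build(level):
        if level == 0:
            return (default,) * BRANCH
        # Identical empty subtrees can be shared because tuples are immutable.
        return (build(level - 1),) * BRANCH

    return (depth, capacity, build(depth - 1))


def get_item(arr, index):
    depth, capacity, node = arr
    if not 0 <= index < capacity:
        raise IndexError(index)
    # Walk from the root down, picking one base-10 "digit" of the index per level.
    for level in range(depth - 1, -1, -1):
        node = node[(index // BRANCH ** level) % BRANCH]
    return node


def set_item(arr, index, value):
    """Return a new array; only the root-to-leaf path is copied."""
    depth, capacity, root = arr
    if not 0 <= index < capacity:
        raise IndexError(index)

    def update(node, level):
        slot = (index // BRANCH ** level) % BRANCH
        child = value if level == 0 else update(node[slot], level - 1)
        # New tuple for this node; every other child is shared with the old tree.
        return node[:slot] + (child,) + node[slot + 1:]

    return (depth, capacity, update(root, depth - 1))


a = new_array(1000)
b = set_item(a, 123, "x")
print(get_item(a, 123), get_item(b, 123))
```

Every update allocates a handful of small tuples, which is the cost you'd be comparing against native maps.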
There are many performance differences, but two basic ones: 1) GC is per-process, not global/stop-the-world; 2) the VM is responsible for scheduling processes onto its threads and can preempt them...blocking a scheduler is considered a VM bug.
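Here's a toy Python model of the scheduling half, assuming each `yield` stands for one "reduction" (the BEAM counts reductions to decide when to switch processes). `REDUCTIONS_PER_SLICE` is an arbitrary made-up budget. Python generators are cooperative under the hood, so this only models the bookkeeping, not real preemption or per-process GC.

```python
from collections import deque

REDUCTIONS_PER_SLICE = 4  # made-up budget; small so the interleaving is visible


def worker(name, steps, log):
    """A 'process' that does `steps` units of work, one reduction each."""
    for i in range(steps):
        log.append((name, i))
        yield  # a point where the scheduler is allowed to switch away


def run(processes):
    """Round-robin scheduler: each process gets a fixed reduction budget."""
    queue = deque(processes)
    while queue:
        proc = queue.popleft()
        try:
            for _ in range(REDUCTIONS_PER_SLICE):
                next(proc)
        except StopIteration:
            continue  # process finished; drop it
        queue.append(proc)  # budget used up: back of the queue


log = []
run([worker("a", 6, log), worker("b", 3, log)])
print(log)
```

Process "a" gets four steps, then "b" gets its turn, then "a" finishes; no single process can hog the scheduler no matter how long it runs.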
Aside from pretty simple `__using__` macros, most Elixir devs come to appreciate macros and then elect not to use them in most situations. Most people who do use macros write a very small macro that just returns the AST for a function call after making some tiny adjustments to the arguments.
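Python has no macros, but as a loose analogy, here's a sketch using the standard `ast` module of the shape described above: take a function call's AST, tweak the arguments a little, and return the new AST. `log_call_macro` and `print_args` are hypothetical names; in Elixir you'd do this with `quote`/`unquote` instead.

```python
import ast


def log_call_macro(call_src):
    """Hypothetical 'macro': parse a call, prepend one argument, return the AST."""
    tree = ast.parse(call_src, mode="eval")
    if not isinstance(tree.body, ast.Call):
        raise ValueError("expected a function call")
    # The "tiny adjustment to the arguments": add a tag as the first argument.
    tree.body.args.insert(0, ast.Constant(value="[debug]"))
    return ast.fix_missing_locations(tree)


expanded = log_call_macro("print_args(1, 2)")
code = compile(expanded, "<macro>", "eval")
result = eval(code, {"print_args": lambda *args: args})
print(result)  # ('[debug]', 1, 2)
```

The whole "macro" is a few lines that hand back an ordinary call, which is the common case.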
There are a few exceptions, very large projects that make heavy use of macros (Phoenix's router, Ecto), but they are rare. Most of the macros you use every day are simple ones.
If you had a multi-paradigm language, you'd need not only some way to break the rules easily and have that bubble up, but you'd also want this rule-breaking behavior to be exposed up the chain from any libraries. You'd also have to enforce immutability and similar discipline through social pressure for users to see most of the benefits of side-effect isolation.
Many of the choices FP languages make that are less about side effects (like strict type systems) are also often all-or-nothing propositions. Optional type systems have very limited benefits if libraries aren't well-typed; static analyzers can only do so much.
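Python's gradual typing shows the library problem nicely. In this sketch, `parse_port` is a hypothetical stand-in for an untyped third-party function. Under mypy's default settings, calls to an unannotated function are treated as returning `Any`, so the annotated caller type-checks even though it returns a string at runtime.

```python
# Stand-in for an untyped third-party library function.
def parse_port(raw):
    return raw.strip()  # bug: returns a str, not an int


def configured_port(raw: str) -> int:
    # Fully annotated caller, but mypy (default settings) sees parse_port(...)
    # as returning Any, so assigning it to an int is not flagged.
    port: int = parse_port(raw)
    return port


port = configured_port(" 8080 ")
print(type(port).__name__)  # str, despite the "-> int" annotation
```

Your own code can be annotated perfectly and the checker still can't help at the boundary with the untyped library.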
Doing some rough math with my limited understanding of Linux network internals, it's about 40KB per connection in this benchmark. I know that Cowboy is going to require ~4KB or so per connection. Consulting a local Ubuntu install, the kernel's TCP buffers are at least 4KB each (one for reads and one for writes), 16KB each by default, and can grow to a max of 1.5MB or so each. These buffers are required for TCP retransmits and such. If you have clients on shoddy connections or see packet loss, your memory could skyrocket on you. I remember reading about a case where someone's service died despite 33% memory headroom: the TCP packet loss rate went up (still under 1%), and that caused their buffer sizes to grow large enough to run out of memory.
So that's 8KB (higher with more usage) for TCP buffers in the kernel, 4KB or so for Cowboy, and 28KB or so left over for various other parts of the system, amortized per connection.
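The arithmetic above as a small Python sketch. On Linux the three buffer sizes (min, default, max) live in /proc/sys/net/ipv4/tcp_rmem and tcp_wmem; the values below are illustrative, picked to match the figures quoted above rather than read from a real machine, and the 40KB total and 4KB for Cowboy are the estimates from the text.

```python
def buffer_bytes(sysctl_value: str, which: str) -> int:
    """Pick min/default/max from a 'min default max' tcp_rmem/tcp_wmem string."""
    minimum, default, maximum = (int(x) for x in sysctl_value.split())
    return {"min": minimum, "default": default, "max": maximum}[which]


def leftover_kb(total_kb: float, rmem: str, wmem: str, app_kb: float,
                which: str = "min") -> float:
    """KB per connection not explained by kernel TCP buffers or the app server."""
    kernel_kb = (buffer_bytes(rmem, which) + buffer_bytes(wmem, which)) / 1024
    return total_kb - kernel_kb - app_kb


# Illustrative values only: 4KB min, 16KB default, ~1.5MB max, as quoted above.
RMEM = WMEM = "4096 16384 1572864"

print(leftover_kb(40, RMEM, WMEM, 4))             # 28.0
print(leftover_kb(40, RMEM, WMEM, 4, "default"))  # 4.0
print(leftover_kb(40, RMEM, WMEM, 4, "max"))      # -3036.0
```

With both buffers at their defaults the leftover drops to 4KB, and at the max the 40KB budget is blown many times over, which is the packet-loss scenario described above.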
Minor nitpick to the nitpick, since I've dug into Elixir's compiler before: Elixir spits out a normal Erlang AST, not a Core Erlang AST, which is a bit different. There are also half a dozen levels between the Erlang AST and BEAM. There are several recordings of a talk by Robert Virding about implementing languages on the BEAM, where he discusses the advantages of hooking into different levels of the compiler.