At its heart, the JVM is designed to run OO languages very efficiently. This is why languages like Ruby and Python are (mostly) easily
ported to the platform...
However, the JVM is less suited to running non-OO languages. Languages like Erlang, Haskell, and Scheme provide features, like tail recursion, closures, and continuations, which are not prominent in the mainstream OO world the JVM targets. They depart far enough from the OO model to make the JVM a poor platform choice.
That's strange, because all of the Smalltalk VMs are capable of closures and many of them can do continuations, yet Smalltalk is often cited as one of the purest OO environments. (I'm also fairly sure that tail recursion can be done as a compiler trick.)
That said, there are real limitations of the JVM. Most Smalltalk VMs can be targeted by a Java compiler to do a dandy job as a JVM. It's much harder to target Smalltalk for a JVM, unless you limit the capabilities of the environment. (Particularly runtime compilation.)
> I'm also fairly sure that tail recursion can be done as a compiler trick.
Only for a single function, at least on the JVM. For mutually recursive functions (i.e., a() tail-calls b(), which tail-calls a()) you either need the VM to do the optimization, or some explicit control over the stack pointer, or to implement your own function-call model within one function.
For the first, you would need some sort of new tailcall instruction, or the VM would have to be guaranteed to do tail-call optimization. (You can't really give explicit control of the stack while keeping things portable and efficient.)
For the second, it's workable... if you don't care about interoperating with other languages on the JVM.
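Implementing your own function-call model within one function amounts to a trampoline: each would-be tail call returns a thunk instead of recursing, and a driver loop bounces the thunks at constant stack depth. A minimal sketch, with all names illustrative:

```java
import java.util.function.Supplier;

public class Trampoline {
    // A step is either a finished result (next == null) or a thunk for the next bounce.
    static final class Step<T> {
        final T value;
        final Supplier<Step<T>> next;
        Step(T value, Supplier<Step<T>> next) { this.value = value; this.next = next; }
        static <T> Step<T> done(T v) { return new Step<>(v, null); }
        static <T> Step<T> more(Supplier<Step<T>> s) { return new Step<>(null, s); }
    }

    // The driver loop: runs in constant stack depth no matter how many bounces.
    static <T> T run(Step<T> step) {
        while (step.next != null) step = step.next.get();
        return step.value;
    }

    // a() tail-calls b() and vice versa, expressed as thunks rather than real calls.
    static Step<Boolean> isEven(long n) {
        return n == 0 ? Step.done(true) : Step.more(() -> isOdd(n - 1));
    }
    static Step<Boolean> isOdd(long n) {
        return n == 0 ? Step.done(false) : Step.more(() -> isEven(n - 1));
    }

    public static void main(String[] args) {
        // A million mutual "tail calls" with no StackOverflowError.
        System.out.println(run(isEven(1_000_000))); // prints: true
    }
}
```

The cost, as noted, is that every cross-language caller must also speak this thunk protocol, which is exactly the interoperability problem described below.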
I get the impression that Smalltalk is very late-bound, whereas Java is sort of half-late-bound[1]. Late binding enables you to do pretty much anything.
[1] For example: in Java, methods with the same name but different argument types ("overloaded") are statically bound; whereas in Smalltalk, they are dynamically bound. Java dynamically binds methods with the same name and same argument types that are defined in a subclass/implementation ("overridden"). Sounds like a performance tradeoff to me. (Is Smalltalk performance C-like or Python-like?)
Suppose we have two print methods, one taking an Object and one taking a String, and we make the following invocation. Which one gets called? The static, compile-time type of the invocation is Object and the dynamic, runtime class of the invocation is String. Which one will select the method?
Object a = "hello";
print(a);
In Java, the first method gets called (the one taking an Object argument). Because the compile-time type of this invocation is Object, it is bound to the first method regardless of the runtime class of a, even if a is a String. It's the compile-time type that selects the method, not the runtime class. To select the other method, we need to change the compile-time type.
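Casting is what changes the compile-time type. A runnable version of the scenario, with the two print declarations reconstructed from context:

```java
public class OverloadDemo {
    // Two overloads of the same name; resolution happens at compile time.
    static String print(Object o) { return "Object overload"; }
    static String print(String s) { return "String overload"; }

    public static void main(String[] args) {
        Object a = "hello";                    // runtime class: String; compile-time type: Object
        System.out.println(print(a));          // prints: Object overload
        System.out.println(print((String) a)); // the cast changes the compile-time type
    }
}
```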
> For example: in Java, methods with the same name but different argument types ("overloaded") are statically bound; whereas in Smalltalk, they are dynamically bound.
In Smalltalk you simply have messages. An object either responds to one or it doesn't, there's no concept of overloading. A method that looks the same, but has a different number of arguments is a different method.
For example, in Java you might have a print method which takes a string, or a method with the same name which takes a format string and a variable number of replacements.
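A plausible sketch of the two Java signatures being described (the bodies are illustrative), with the distinct Smalltalk-style selectors noted in a comment:

```java
public class Printer {
    // In Java, these are two overloads of one name, resolved by argument types.
    static String print(String s) { return s; }
    static String print(String fmt, Object... replacements) {
        return String.format(fmt, replacements);
    }
    // In Smalltalk, these would simply be two different selectors,
    // e.g. print: and print:withArguments: -- distinct methods, no overloading.

    public static void main(String[] args) {
        System.out.println(print("plain"));         // fixed-arity overload wins
        System.out.println(print("%s + %s", 1, 2)); // varargs overload
    }
}
```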
Smalltalk is generally slow, and it has the same calling system as Ruby.
In Smalltalk, methods with the same name don't exist: every method on an object has a unique selector. The selector plays exactly the same role as the signature in Java; there is no difference on this aspect.
In what context? I've seen implementations of block encryption in Smalltalk that ran 3% faster than a DLL written in C! You wouldn't want to do heavy numerical integration utilizing lots of double precision floating point in Smalltalk, but for a lot of the sort of programming I do, several of the available Smalltalks are very snappy and responsive.
No, if I say "generally" I don't give context, sorry. Don't answer a rule of thumb with an anecdote.
More specifically, on the same algorithm:
It's generally slower than C/C++ (because of the method calls and the tagged arithmetic), unless you pit the GC against statically compiled code with a slow allocator, but those are special cases. C/C++ generally have faster method calls.
It has few chances to be faster than Java/.NET, which have static types to get longer monomorphic specialization chains, apart from corner cases, and generally better GCs. And those can deoptimize if the profile changes; it's been a long time since I left Smalltalk, but the last time I checked, VisualWorks couldn't.
In 11 years of Smalltalk IT consulting, I have never run up against the message send being a performance issue. For me, it's accessing the disk, the database, middleware, poor design with regards to network latency, or poor algorithm design.
I take your message, replace "Smalltalk" with any other language, and I get a generic excuse for any language.
You can take a prominent Ruby website: they will never admit Ruby is slow; they cache everything everywhere to do the trick and then blame the algorithm too. On a JVM/.NET platform, the guys could recompute the same stuff 300 times a second without even needing to think about it, and still have I/O as the bottleneck. The same algorithm wouldn't even be necessary in the first place.
BUT there are actually great differences in execution speed (or memory consumption) between languages for the same algorithm.
When your language pales in the benchmark, you have various choices:
- dismiss the bench as not "real life"
- rewrite the algorithm to bank on the strength of your language (for smalltalk I would bet on the GC)
- count the lines of code (readability is not an objective metric so you can kill it for free). "Yeah but mine is more maintainable".
- admit that this is one weakness of your language and learn other languages for the day you'll need them.
I'm not dismissing your experience (far from it, actually I tend to like "slow languages" too).
I've worked with Smalltalk, and some friends did crazy stuff with it performance-wise; I did crazy stuff in Java too.
But
- Smalltalk has slow message-dispatch time
- Ruby has slow message dispatch and no real GC
- Java has too big a memory footprint and is too verbose
- OCaml and Haskell have unreadable type-error messages when you play with inference. And they are statically compiled.
- etc.
Maybe Apple will own the next disruptive platform (iPhone) and objective-C will have its day after all.
BTW: "OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I'm not aware of them." http://userpage.fu-berlin.de/~ram/pub/pub_jf47ht81Ht/doc_kay...
First-class closures are not a JVM problem, but a language problem. Moreover, closures (lexical closures) exist in Java itself: an anonymous inner class captures (well, on demand, for performance reasons) all the lexical terms visible at the point of use.
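A quick sketch of that capture (names are illustrative): the anonymous inner class copies the final local into itself at construction, which is Java's pre-lambda form of a lexical closure.

```java
import java.util.concurrent.Callable;

public class CaptureDemo {
    // The anonymous inner class closes over x; x must be (effectively) final,
    // and its value is copied into the instance when the class is constructed.
    static Callable<Integer> makeAdder(final int x) {
        return new Callable<Integer>() {
            @Override public Integer call() { return x + 1; }
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(makeAdder(41).call()); // prints: 42
    }
}
```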
The tail call is a bytecode security (verification) problem. It looks like this problem has been solved and will be merged.
Anonymous inner classes aren't really closures in the more traditional sense, since they require that the captured variables be final; that makes them more restricted than closures generally are in other programming languages. That might not be a problem for Erlang given its write-once variables, but it is a problem for other languages.
We've been working on generating JVM bytecode for our in-house language, which does support a less-restricted version of closures. We work around that by wrapping closed-over variables in one-element arrays, though you have to do that both for the initial declaration and on every reference to the variable, even outside the closure. It gets really annoying when the closed-over variable is something passed on the stack as a function argument, and probably in other cases I'm not thinking of right now. It's definitely workable; it's just messy and incurs more overhead than is ideal.
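Expressed in Java source rather than bytecode, the one-element-array workaround looks roughly like this (names are illustrative):

```java
public class MutableCapture {
    // Captured locals must be final, so a mutable closed-over variable is
    // wrapped in a one-element array; every read and write goes through cell [0].
    static int runTwice() {
        final int[] counter = { 0 };
        Runnable increment = new Runnable() {
            @Override public void run() { counter[0]++; } // mutation through the cell
        };
        increment.run();
        increment.run();
        // References outside the closure pay the same [0] indirection.
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(runTwice()); // prints: 2
    }
}
```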
So, in other words, it doesn't fit into preexisting languages idiomatically. You want to introduce new idioms into Clojure to make it integrate better with the JVM.
That's fine, but it also means that if you want to do a direct port of, say, Ruby or Python or Erlang, you'll have several rather large warts compared to new languages designed to run on the JVM and use the Java class libraries.
Iteration and getting-system-settings stuff are so generic you might as well implement them yourself or write a thin wrapper (in this case, iterating over entries in a hash table and getting the OS name).
IMO, a good chunk of the Java libraries are there to overcome Java deficiencies, and don't produce anything grand on their own for other languages.
I'm looking at LLVM at the moment, and I can say that it is far from a JVM/.NET.
I wonder if I can invest in it because they will close the gap, or if they are just playing catch-up with gcc, in which case there is no future for GC languages there.
I don't know too much about LLVM, but the JVM has a ton of advantages: it's pretty much universally available on all platforms, it uses a fairly trivial set of instructions that make learning the instruction set and using it fairly easy, there's a ton of information out there about it covering everything from GC algorithms to VM flags to bytecode instructions and memory models, and both the Sun and IBM versions have been heavily optimized over the years. It might not be the best fit for every language, but it's a pretty impressive platform to build a language on.
Check http://blogs.azulsystems.com/cliff/ for really impressive work on a JVM that works on 800+ cores... oh yes, and you have constant-time garbage collection.
I remember his hashmap when he published it; state machines are a great way to do concurrency.
The Erlang way of handling concurrency is to spawn millions of "processes" (threads). Obviously these can't all be hardware threads, so Erlang handles scheduling problems itself that, in the JVM's case, the OS scheduler would have to handle (or fail to handle).
I would argue that the JVM is eventually going to have to handle the threading itself if it's to support any actor-based programming. OS schedulers have other uses they need to be fast for, and hardware threads will always have higher overhead. Python's Stackless is perhaps an example of the amount of work it would require.
You seem to be implying that one needs e.g., "green threads" to support huge numbers of "actors". That implication is false.
Also, the only way that Erlang can support running on multiple actual hardware cores is, just as in Java "actor" implementations, to do an MxN mapping of actors to cores.
Now, at one end of the spectrum there's the basic rule of thumb that each system should have O(number of cores) hardware threads and map all of the actors down to that; at the other end, there are good arguments (by people like Paul Tyma) that promote using tens or even hundreds of thousands of hardware threads in Java on real, shipping OSes (late-model Linux is his focus).
Of course, Erlang the language and its VM are matched to each other and to their specific problem domain (lots of little actors focused on coordination rather than computation- or data-transformation-intensive jobs), whereas Java is more general. So things like the management of multiple run queues in Erlang certainly make sense.
No, you brought that (green threads implication) into the discussion. My whole point is that you're making an implication based on that distinction and that's just not true.
Both Erlang and the JVM have to map lots of actors to a number of hardware cores. Lightweight threads of various sorts on top of a smaller number of hardware processes/threads: it's the same for both. Both Erlang and the JVM use an MxN model. Your last sentence seems to imply that you believe something different.
They make some different tradeoffs because of their goals (coordination vs. general purpose), and those have real implications for solving particular problems more or less easily. I.e., if you're building coordination-dominated systems, then Erlang is easy and the underlying performance loss due to other implementation issues is mitigated. On the other hand, if you have lots of data and computation, then Java's solution will blow Erlang away. The real world is about understanding, choosing, and managing the real tradeoffs -- not pushing some theoretical ideal.
Even discounting specialized hardware+VM solutions like Azul, the regular JVM runs fine on shipping hardware that has lots of cores.
The dominating limiter in practice is actually garbage collection. I.e., very large heaps will drive the scale-up vs. scale-out decision way before the number of cores is a serious issue.
Why would you even bother doing this in the first place? Erlang's VM is one of the things that makes it interesting, not the syntax.
The only reason I can see is the OTP libraries, but then again, why not just use the Erlang VM, as you get so much great stuff for actor-based concurrency out of it?
Erlang isn't a dynamic language in the sense that Parrot is optimized for, either. Erlang-on-Parrot wouldn't be able to use other libraries developed for the Parrot ecosystem any better than it already can, since they would depend on mutable languages (it would still have to go through the Erlang ports). That eliminates a major reason to move to a given VM, and the Parrot VM optimizes a lot of cases that Erlang doesn't care about while missing out on many it does.
Erlang's data structures are not manifestly typed, but in most other ways it is a static language.
He doesn't mention the Erlang scheduler and processes. Shouldn't be that hard to hack into the JVM, but as I've commented elsewhere, that's only half the battle...