Capsule Shield: A Docker Alternative for the JVM (paralleluniverse.co)
113 points by pron on Oct 13, 2015 | hide | past | favorite | 48 comments


[Disclaimer: I'm biased, as I work for HashiCorp]

We designed Nomad[1], our scheduler, with this in mind. Nomad has the concept of "task drivers" and one of those task drivers is "java." A task driver provides isolation using things like cgroups automatically, without you having to containerize your application. This makes sense for applications that are already _mostly_ "containers" (in the abstract sense): Java JARs, statically linked binaries, VMs, etc.

If you compile a fat JAR you can schedule it directly with Nomad, and Nomad will handle the isolation and resource constraints. Pretty nifty!
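To make that concrete, a Nomad task using the "java" driver looks roughly like the following -- a sketch only; the job name, JAR path, JVM options and resource values are all illustrative:

```hcl
# Sketch: run a fat JAR directly via Nomad's "java" task driver,
# with no container image involved.
job "billing" {
  group "app" {
    task "server" {
      driver = "java"

      config {
        jar_path    = "local/billing-server.jar"
        jvm_options = ["-Xmx512m"]
      }

      # Nomad enforces these limits itself (e.g. via cgroups on Linux).
      resources {
        cpu    = 500 # MHz
        memory = 512 # MB
      }
    }
  }
}
```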

And I noticed Capsule talks about OSv. Nomad can also deploy VMs natively on top of QEMU, so you can deploy that as well.

The coolest thing perhaps is that you can mix and match this stuff: you can deploy JARs alongside containers alongside VMs and Nomad handles the binpacking and resource constraints for you. And this is why I bring up this plug for us.

[1]: https://nomadproject.io/docs/drivers/index.html


> If you compile a fat JAR you can schedule it directly with Nomad

That JAR better be a capsule, then :) Non-capsule fat JARs don't have enough information to configure the JVM, don't support native dependencies and in too many cases simply don't work.


For me, this misses the point of Docker.

These days, production and dev environments are extremely heterogeneous. We're already running a database written in C, a messaging queue written in a JVM language and a front-end API written in Node, all while serving our static assets from a generic web server. And we're starting to explore Go and Rust for more specialized components. Docker allows me to abstract away all the platform-specific issues and treat all my infrastructure components as if they're homogeneous. Docker Compose can bring up my entire dev stack. Docker Swarm or any of the other orchestration services (ECS, Kubernetes, Nomad, Fleet, etc.) can manage my entire production deployment.

A JVM-specific option is really a step backwards. Using that, I'd now have to go back to treating my JVM components differently from everything else. The whole point of Docker, for me, is that none of my diverse infrastructure components needs special consideration.


I work on Capsule.

It's not about philosophy but pragmatism. Most organizations are mostly Java, and even if Java doesn't make up 90% of the components, it makes up 90% of the deployments, because that's what the core business software uses. For such organizations, Docker is a rather crude tool that takes away some of the power of their technology stack. Why not use a slimmer, faster, less-hassle tool for the majority of your deploys?

And if we want to talk philosophy, Docker really mixes two separate issues -- portable packaging and isolation. Capsule separates them. If you want, you can launch a capsule inside a Docker container. If you want, you can directly launch it in a more convenient container simply by launching it with Shield (or outside a container by running it directly). As Shield will be open-container-compatible, there's no reason why Docker and Shield shouldn't play together. And remember, Capsule requires zero additional tools on top of what developers use anyway (it's just a plain Maven Central library that you stick in your JAR with your build tool).
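To make the "zero additional tools" point concrete: a capsule is an ordinary JAR whose manifest drives the launch. A minimal sketch follows -- the attribute names are per capsule.io's documentation, while the application class and JVM flags are purely illustrative:

```
Main-Class: Capsule
Application-Class: com.example.Main
Min-Java-Version: 1.8.0
JVM-Args: -server -Xmx1g
```

Running `java -jar app.jar` then has the embedded Capsule class configure the JVM per the manifest and launch the declared application class.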


You live in a much more Java-centric world than I do. Your world seems similar to the one I inhabited for well over a decade (I wrote Java code professionally from 1997-2012), but with the rise of microservices, I'm seeing almost no interest in creating any new Java code. The JVM isn't going away...there's plenty of Scala and even some Clojure. But there's also a move towards newer platforms like Node and Go. And there's even been a rediscovery of platforms like Ruby and Python when the performance penalty doesn't come into play.

Docker enables everything to play nicely together. Capsule seems to double down on the old, monoculture way of developing. For people who want to do that, I'm sure your solution is an excellent product. But my own perspective, which may be skewed by being in the Bay Area, is that those people are the minority, not the majority.


> there's plenty of Scala and even some Clojure. But there's also a move towards newer platforms like Node and Go. And there's even been a rediscovery of platforms like Ruby and Python when the performance penalty doesn't come into play.

I won't say what you're saying isn't real, but it is happening on a completely different scale than the size of the Java ecosystem. Even on the JVM, what you see as "plenty of Scala" doesn't even amount to 5% -- not that it matters, but it's just an example of why I think your view is skewed.

The size of the Go ecosystem is hardly 1% of that of the JVM (rough estimate), and while Node.js/Python are "a thing" (and are the only languages/platforms of those you mentioned that see substantial use outside of startups), they are still tiny compared to the JVM. Even in Silicon Valley (which is itself a very small portion of the software world), the large "webby" application companies (with the notable exception of Facebook) are JVM shops: Amazon, Google, eBay, Twitter and Netflix are either entirely or predominantly JVM, and even Facebook is making growing use of Java.

Most of the shifts you're seeing are among non-JVM technologies: it's mostly Pythonistas switching to Go and Rubyists switching to Elixir/Node.js. The JVM was never the biggest player in the "fast application development" arena -- at the beginning there was VB, then Perl, then Python/Ruby and so forth. I see no sign that the movement away from the JVM -- mostly by those who realize they're better served by the "fast dev" platforms even at the cost of performance -- is any greater than the movement toward it, by those realizing they need the best the industry can offer in performance, monitorability and tooling, even at the cost of slower development.


> I'm seeing almost no interest in creating any new Java code.

I work at a firm where we have dozens of greenfield Spring Boot projects going on.

And if we run the stats on our public cloud, Java dominates the workload.


Even if 90% of deployments are on JVM, that's 10% of deployments that you have to somehow treat differently. Docker is designed to take away this concern, just treat everything as units that need to be deployed on boxes.


So you want 100% of your deploys to be slower and less convenient than they need be just because you need to use another tool for 10% of them (and mostly for code that wasn't even written in the organization, so those images might not even need to be maintained by the organization)? I'd understand if you'd want to do that for a 50-50 mix, but not 90-10.

Also, remember that Capsule doesn't add another build step. You need to create JARs anyway, so you might as well make them capsules. You then either have the option of adding another step of building Docker images (just because you have some other Docker containers, most of them not even built by you) or not. Capsule just saves you that extra step and still lets you isolate your app in a container if you want, while making everything faster and management+logging easier. I don't think the need/wish to use containers for isolation should dictate your packaging solution.


Docker is incredibly good at caching; the assumption that Docker is slow is incorrect. The first time you build or pull an image on a machine, it might take a few minutes to fetch the base images, but past that it is lightning fast, and it works the same for any language and any stack.

For most companies that operate several services, having one trusted, robust way to deploy and orchestrate is far more important than the occasional extra load/build when working on a new laptop or instance.

Edit: I'm not saying Capsule isn't useful, it just doesn't make any sense to say it compares to Docker, because the use-case is quite different in reality.


What makes Docker-based deploys of JVM applications slower than Capsule-based deploys? All of an organization's Docker images for JVM applications would probably be based on the same base image, so there's no waste there.


Because you still need the extra step of creating the images (and layering them carefully if you want them to share dependencies other than the JVM). To run a capsule inside a container (if you want to run it in a container), you just tell Shield to launch the capsule JAR (which is created anyway by the build). In fact, you can even embed Shield itself in the capsule.


> Most organizations are mostly Java

...quite a bold statement. Can you cite some sources?


Sure. Estimates are that 90% of Fortune 500 companies use Java[1], and Java dominates job offerings by a wide margin[2]: "its stranglehold on modern development is unshakable." The #2 platform is, unsurprisingly, .NET. If you concentrate on non-Windows platforms, you see that Java easily constitutes a majority. Note that what I said is adjusted by size: fledgling companies prefer quicker development.

[1]: http://www.zipcodewilmington.com/blog/why-java-skills-matter...

[2]: http://www.infoworld.com/article/2868654/it-jobs/java-develo..., http://www.infoworld.com/article/2608294/java/employers-want...


I think enormous, sprawling enterprises whose infrastructure has evolved over decades really need something like Docker in order to consolidate what used to feel like a tangled mass of different tech. For that use case, Docker probably saves top firms a ton of money by reducing overall complexity in their operations.

However, for early-stage web startups, especially ones that use the JVM, there are likely to be only a handful of diverse technologies in play. A typical example: nginx, JRuby, Redis, Kafka, <primary persistence tech goes here>. Now at first glance you might look at this, see 5 diverse systems and think "these guys could probably benefit from Docker" -- and perhaps some could -- but I personally think a growing number of shops feel like it's a waste of time and resources, and here is why:

Although there are five separate systems in the example above, the truth is that developers are not sitting around writing nginx config files all day, nor Kafka config, nor Redis config, nor MySQL config files. Those are things you only have to set up correctly once and then change gradually, a few lines here and there over the lifetime of the business, until you settle on the most optimized configuration for your workload.

99.9% of your team's changes will only touch the "application layer", which in this example resides on a single runtime environment (the JVM). Teams who utilize the JVM in this way usually don't feel "pinned in" by it, since the performance is there, the mature profiling tools are there, the library ecosystem is there, and there are diverse programming languages, frameworks and programming paradigms that all ride atop the JVM. Examples: Java, JRuby, Scala, Clojure, Vert.x (non-blocking IO/JavaScript), Lift, Akka, spray.io, Play, Immutant, WildFly, etc.

What I'm getting at is there's really no good excuse for a team who standardizes on the JVM to ever need to go outside of that paradigm unless perhaps they're writing a C extension to optimize some tight loop, but these days the JVM's optimizing bytecode compiler is getting pretty decent at doing that for you.

If 99% of your application code (the code that is updated continually) runs on the JVM, then even if you have a microservices architecture with several different programming languages in use, you don't really need to lean on a containerization technology to standardize deployment, because all of it already lives under the same runtime-environment roof. From an operations standpoint, all your build/deploy step really needs to do is ship JAR files around over the network, regardless of whether it's a Clojure service, a Scala service or a Ruby app. And all of it can be monitored in a clean way thanks to JMX, the monitoring and management support built into the JVM.
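For example, JMX-based remote monitoring can be switched on with standard JVM flags, identically for any JVM language. A sketch -- the port and JAR name are illustrative, and disabling authentication/SSL like this is only sane on a trusted network:

```sh
# Launch any JAR with remote JMX enabled; JConsole or VisualVM can then attach.
java \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -jar service.jar
```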

To use docker in such a situation is to just waste system resources and to increase network IO (moving heavy disk images around) for no reason. Instead you can just use shell scripts or Ansible to provision an instance and then rely on the JVM's own venerable mini-containerization primitive (the jar file) as the standard way to deploy apps across your fleet.

To each his own though..


I've spent some time in the past few weeks trying to rebuild a bunch of uberjars as capsules, and have found it not suitable for production server use. The main promise for me is that rather than shipping 100M+ uberjars, we can ship just our code (~5MB) and have the capsule download its dependencies on startup. This sounds pretty nice in theory: your boxes can cache your dependencies (and transitive dependencies) and new deploys only need to update the new stuff (generally just your code).

But there are a bunch of ways capsule makes this difficult, particularly in the context of a mono-repo with a bunch of internal libraries. Capsule lacks a real transitive dependency management system, so you're stuck bypassing it entirely and having your build system compute the complete graph of transitive dependencies and write it in the jar manifest.
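For readers unfamiliar with this setup: a "thin" capsule lists its (pre-resolved) artifacts in the JAR manifest so they can be fetched on startup instead of being bundled. A sketch, with illustrative coordinates and application class, and the caplet wiring omitted:

```
Main-Class: Capsule
Application-Class: com.example.Main
Dependencies: com.google.guava:guava:18.0 org.slf4j:slf4j-api:1.7.12
```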

Even worse, on every start up capsule will open all the jars that have already been cached to validate checksums, which can take many minutes if you have a lot of dependencies.

The idea is really solid. I just wish the implementation were better.


I think fetching dependencies at run time is a bad idea. The more dynamism and complexity you have at run time, the more ways there are for something to go wrong. For maximum robustness, I think a good rule is to never do at run time what you can do at build time. So in that respect, a fat jar is better. Also, in principle a fat-jar deployment should start faster, because (given an appropriately optimized JVM implementation) the system can just map one file into memory and go.

So why not just use a fat jar, as you were before?


Capsule by itself does not fetch dependencies at runtime. But because Capsule is so flexible, we have a caplet[1] that you can choose to use that does.

And a capsule is a fat jar. Non-capsule fat jars frequently just don't work: they can create resolution conflicts that require shading (and even that can fail), they don't support native dependencies, and, most importantly, they require a startup script because they don't set up JVM options.

[1]: http://www.capsule.io/caplets/


I agree, in principle. The issue is that as the number of dependencies (and, in particular, transitive dependencies) grows, fat jars become less and less wieldy. Our uberjars are in the 150-200MB range, which means they take up a lot of space (we produce and store artifacts from every merge to master) and take a while to build and deploy.


> Capsule lacks a real transitive dependency management system

The Maven caplet uses Aether which is the same dependency-resolution library used by Maven. If you've found a problem, file an issue and we'll solve it.

> so you're stuck bypassing it entirely and having your build system compute the complete graph of transitive dependencies and write it in the jar manifest

That sure sounds like a bug. The Maven capsule should resolve transitive dependencies.


I'm referring to the issue in this thread [0]. I haven't seen any issues with how capsule (or rather, the maven caplet) does dependency resolution; it's that you don't have much control over transitive dependencies. This is particularly a problem when trying to create a capsule that contains the output of your internal dependencies but not your external dependencies. In other words, we've spent a lot of time getting gradle to package the correct set of transitive dependencies so that our code works; I have no desire to re-implement that in capsule's limited dependency system.

See also this bug on maven-capsule-plugin [1]. Having tried to actually implement that strategy (albeit in gradle), it's clear that capsule isn't well-adapted to this sort of thing.

[0] https://groups.google.com/forum/#!topic/capsule-user/Mjtnvwt...

[1] https://github.com/chrischristo/capsule-maven-plugin/issues/...


Ah, well, that's a design issue we've been grappling with, and haven't yet come up with a definitive solution for. The opposing forces are convenience vs. repeatable results, and we'd like to provide the former without harming the latter. The way I see it, there are two options:

1) Allow a more elaborate description of dependencies.

2) Improve the build-tool plugins so that generating the full dependency tree in the build tool and passing it on to Capsule is more convenient.

I favor option 2, because the precise resolution already happens in the build tool, and it seems like an easy and complete solution. We can continue this discussion on the mailing list if you like.

In any event, as you can see, we are always open to suggestions and happy to accept PRs.


So it's basically a reinvention of the servlet-container model. Shield equals Tomcat or one of the other servlet containers. Caplets equal WAR-packaged applications. Have I missed something?


Capsule does not resemble servlet containers and WAR files at all. You can think of Capsule as a fat jar that always works and contains a startup script (which is declarative and placed in the JAR's manifest). It is a packaging solution. It isn't a deployment format (like a WAR), but a JAR that can configure the JVM that runs it.

And while the word "container" is used here, we mean a different kind of container -- the kind that gives you virtualization and app isolation at the OS level[1]. So there is no container -- like a servlet container -- that hosts multiple applications, and no specific programming model.

[1]: https://en.wikipedia.org/wiki/Operating-system-level_virtual...


Ah, that's interesting. So can it be used to overcome the memory-leak problems of traditional servlet containers, since every app is a system process whose memory is released by the OS and not by the container?


Well, it's completely orthogonal, really. Capsule makes it very easy to package and deploy single-app servlet containers (like, say, Jetty), but if you wanted to, you could also package JBoss as a single capsule JAR. If it can run on the JVM, it can be packaged with Capsule (and if not, let us know and we'll fix it!).


Thanks for the explanation. I will definitely try Capsule.


> Docker is a terrific product, but as a general-purpose solution, is simply too blunt an instrument for the deployment and management of JVM application containers

Why?

> Image management, logging and monitoring present challenges to general-purpose container solutions, but are non-issues for the JVM.

WTF? Java applications can be a pain to set up: JARs here, JARs there, Maven, Ant, XML configuration -- it's all a big mess. If someone can do all that for me and say "here's my Docker image that just works", I would say: issue solved.

> Docker images are big and contain full-blown operating systems as they are meant to run arbitrary applications: managing their archival and evolution can become a serious hassle. On the other hand JVM applications need nothing more than a JVM and a kernel

This person seems to have a serious lack of understanding of Docker. Your Docker image does not need to contain a "full-blown operating system" -- it is perfectly acceptable for it to contain just the strict dependencies of the program (the JVM) you are running. On my system, java links against libpthread, libdl, libz and libc.
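To illustrate, a hedged sketch of such a minimal image -- the base-image name and file paths are illustrative, not a recommendation:

```dockerfile
# A JRE plus the application JAR; no full distro userland.
FROM openjdk:8-jre-alpine
COPY app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
```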

> Linking: Java already supports customized domain-name resolution, so Docker’s solution of modifying the container’s hosts file is unnecessary

Does he even get what linking is about? Linking is a way of sharing configuration between containers...so the mysql host:port on one container is available as a convenient environment variable or /etc/hosts on another. What has this got to do with Java's DNS custom support?
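For example, with Docker's (legacy) links, the alias chosen at run time determines the names of the injected environment variables -- the image names and addresses below are illustrative:

```sh
docker run -d --name mysql mysql:5.6
docker run --link mysql:db myapp
# Inside "myapp", Docker injects variables such as:
#   DB_PORT_3306_TCP_ADDR=172.17.0.2
#   DB_PORT_3306_TCP_PORT=3306
# and adds a "db" entry to /etc/hosts.
```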

> Monitoring and Management: The JVM has its own rich monitoring and management API, JMX

And Docker is somehow preventing you from using this?

This post reeks of Java elitism and reads as though written by people who didn't even bother trying to understand the technology they claim is inferior, or not applicable to a True Java approach.


I am sorry you feel hurt, but you have misunderstood the post. Docker really is great -- and as the post says, it really does let you do everything you want. But for JVM applications, Capsule is much easier and faster, requires fewer steps and fewer tools, and launches containers that are more secure and easier to monitor and manage. Docker doesn't prevent you from doing all that (well, except for the security bit), but it does add extra work and an extra layer of inconvenience compared to capsules. A capsule is a JAR file, built by your chosen build tool, that requires no setup or configuration. You can launch it as a simple executable JAR or, if you wish, optionally launch it in a container. Docker isn't inferior by any means. It is simply less streamlined -- for the JVM -- than a solution specifically tailored for the JVM. That's it.


you kinda lost me at this:

"Docker images are big and contain full-blown operating systems as they are meant to run arbitrary applications"

because it's exactly how you are not supposed to use docker, you are supposed to include only what you need and you most definitely do not need a full operating system inside a container that runs on a full operating system.

Back to Java: our base Java Docker images are around 70MB. See how here (I'm not involved in the project): https://github.com/delitescere/docker-zulu

Yes, still bigger than the uberjar, but that's not the point.


Even if you optimize your images and layer them in such a way that they share dependencies, you still need to create them, which is much slower than not creating them, requires an extra tool, and makes management and logging less convenient. You don't need Docker for containers, and the JVM has better packaging solutions than Docker. Why add an extra tool and an extra step for less convenience?


You are missing the point of my comment, which is: your "Footprint" claim is factually wrong.


I don't think so. For the same amount of work you get a much higher footprint. Want smaller footprint? Start grouping your dependencies into layers -- namely, extra work.

When I say that something is higher footprint or worse performance etc., I mean that that's what you get for the same amount of work compared to your point of comparison, not that it's impossible to make it better if you work harder.


OK, but that's not what you wrote. I quoted verbatim what you wrote, and it's not correct: Docker images do not "contain full-blown operating systems as they are meant to run arbitrary applications" unless you decide to put a full-blown operating system in them. That's your choice; you definitely don't have to, and everyone recommends not to.


Fair enough. Clarified in the post.


I think it's a step backwards if platform-specific containers like Capsule catch on. Nodeck and Gontainer next?

It's more productive to develop general containers that solve issues with logging etc. for all platforms, not for just one.


Docker isn't a container -- it's a packaging solution for containers. Capsule is another packaging solution that can be launched in the same kind of container. Also, the JVM isn't just another platform; it's the biggest platform for server-side applications by a very wide margin.

So the containers are still general, and you use them for all platforms and they all interoperate. I just see no sense in requiring an extra, inconvenient packaging step for a platform that already packages applications well.


What you're asking for is a PaaS. For those wanting Docker support, Cloud Foundry and OpenShift Origin are available. For those who also want Buildpacks, Cloud Foundry supports that out of the box.

Disclaimer (deep breath): I previously worked on the CF Buildpacks team, and I work for Pivotal Labs, a division of Pivotal, the company which donates the majority of engineering effort to Cloud Foundry.


Meanwhile others use JavaEE. Supported by multiple vendors and multiple open source options, easy to use, well-tested and battle-hardened.

Oh: and it has a secret feature - pragmatic and friendly developers.


Do you really need just a JVM and a Kernel? You must need a filesystem too. How do you connect to a database?


Filesystems run inside kernels (normally).


Very true, although technically I'm not sure the JVM requires a filesystem. The class loader for application classes is a URLClassLoader that can load classes over the network, so the only problem is the bootstrap class loader, and I'm sure that could be taken care of, too (it might require a JNI wrapper).


> "BONUS: Capsule OSv"

That could be something decent. I'd be interested to know what performance benefits you get from using Capsule to manage OSv. Does anyone have a JVM-based Docker app they could use for comparison?


The performance boost from using OSv can be pretty significant if you can use it (it's a rather specialised tool): you're effectively using the hypervisor as a kernel. If you mean "a Docker-based app on OSv", then the answer is almost certainly no, as OSv is an operating system designed to run only one or two apps at once. It's not a regular operating system at all; I doubt Docker would even start on it.


I think it would be more interesting to compare a JVM app running on OSv inside a VM to the same JVM app running in a Docker-compatible container on bare metal. In both cases you effectively have a single process running on a kernel, without the extra kernel in between (as in the common case of an app running on a general-purpose OS in a VM). But virtualization probably still has extra overhead. And of course, the hypervisor offers lower-level abstractions to the VM than a general-purpose kernel would offer to a container; for example, for storage you get a block device rather than a filesystem.

Note that I said Docker-compatible. One alternative to the Docker Engine itself would be Joyent's Triton, which uses Illumos rather than Linux as the kernel (while still supporting Linux binaries), so it doesn't have the security problems of Linux namespaces + cgroups.


Right, but luckily JVM applications do run on OSv, and they have even made some modifications to HotSpot that can use more efficient (non-POSIX) syscalls for added performance (esp. networking I believe).


Sorry, I wasn't clear, I mean comparing a JVM app managed in a Docker container with a JVM app running on OSv managed by capsule.


I do not like fat jars. We use JVM + mvn + appassembler and pack the output into docker images. Not a big deal.



