I’ve heard FBLearner Flow is pretty cool for running/managing/sharing ML pipelines inside Facebook. Never seen or used it myself, but Microsoft had a similar internal tool called AEther that was very cool too. We’ve definitely taken inspiration from AEther in building Floyd.

Here’s an anecdote about how awesome AEther was (it’s been a long time, so I’m a little fuzzy on details): In 2011, Harry Shum was the VP of the Bing division at Microsoft. It was the early days of Bing (~10% market share, ~$2bn annual loss, etc.) - we had good talent, but were lagging behind Google in tech. In one of our all-hands meetings, Harry jokingly announced that if we beat Google in our core relevance metric (called NDCG), he’d take the entire Bing team, approx. 300 people strong, on a fully paid trip to Las Vegas.

Sure enough, a year later, Bing did beat Google in our core relevance metric (http://www.insideris.com/microsoft-bing-beats-google-in-the-...) and all 300 of us went to Vegas for a weekend as promised. (Spoiler: Google did eventually beat Bing back.)

The success and rapid acceleration in relevance gains were attributed in large part to the introduction of a new tool called AEther (in addition to improving ML tech and hiring top talent). AEther was an experimentation platform for building and running data workflows. It allowed data scientists to build complex workflows and experiment in a massively parallel fashion, while abstracting away all the engineering concerns. I used it a ton on a daily basis and loved it. The AEther team claimed that it increased the experimentation productivity of researchers and engineers by almost 100x. Even now, when I ask ex-Bing data scientists working at other companies what they miss the most from their time at Microsoft, AEther is almost always in the top 3 answers.

Having seen how awesome AEther was from the inside, one of our goals is to bring its benefits to the rest of the world as well. However, from talking to a few individual data scientists and researchers over the last month, I’ve found they seem to prefer a CLI over a GUI (while bigger companies like the GUI much better). Maybe it’s one of those things you have to get used to, or maybe our implementation is clunky. So we’re making the GUI an enterprise-only feature for now, while we continue to help individual data scientists through our CLI.