clarecorthell · on Feb 9, 2021

No bothersome problem is too small! If you'd like to share, we would be eager to look into this. Reach us through the app or at osm-questions [a] lyft.com.

RicoElectrico · on Feb 9, 2021

The real problem in GP's case is the drivers not reading the note. Every system should have a manual override; Lyft does seem to have one, but the human element failed.

clarecorthell · on Feb 9, 2021

Lyft study shows crucial OpenStreetMap road attributes are fresh and high quality in 30 North American cities, as compared to ground truth. Blog post and paper detail the process, methodology, and results.

clarecorthell · on Oct 12, 2014

Of people who use these tools, only a very small percentage of them also build said tools. Which makes sense. In industry, companies want to build meaningful products and data sets that provide value which can be exchanged for money. That means delivering some majority of the value in the shortest amount of time possible. For that reason, engineering teams have little room for R&D, including the kind you're hinting at. Scikit-learn, for example, has largely academic contributors (http://scikit-learn.org/stable/about.html). Not surprising. Talents of pedantry, proof-writing, and pushing the boundaries of theory probably lie in academia as opposed to industry. Industry rewards shipping code (useful oversimplification) and academia rewards novel theory (yet another oversimplification).

Solving problems requires understanding what solutions exist (and whether they can be used, must be built upon, could be used in ensemble, etc). Choosing among those solutions requires understanding (to some varying degree) why and how the solution solves the problem. Choosing the correct out-of-the-box solution is not trivial.

One very real danger here is unwittingly lying with statistics. Which is arguably worse than wittingly doing so.

clarecorthell · on Oct 12, 2014

I'm curious about the audience here -

What career goal is driving your interest in learning Data Science?

KrisAndrew · on Oct 12, 2014

It's more of a response to the market. Beginning about 5 years ago I was increasingly asked to do more numerically oriented things. Prior to that I was mostly writing applications that generated SQL and wrapped the results in some HTML. Pretty boring. Data science is more compelling.

Over the past 15-20 years there has been a massive amount of information piling up in databases and log files; not just from web applications but from desktop and mobile apps too. And there are companies who want to pan for gold in that data. So if you want to do something more interesting than fiddle with canvases, or CSS or MVC frameworks, then data science is fairly accessible if you're not afraid of math. Furthermore, most companies will have a need for it even if you don't really care to develop their software products directly.

NB: I doubled my salary by moving into data science. Nowadays gas station attendants can write a Rails app to search a database. The bar has been lowered. Understanding stochastic gradient descent (among other things) and knowing where/when to use it commands more earning power.

dasboth · on Oct 13, 2014

"I was mostly writing applications that generated SQL and wrapped the results in some HTML. Pretty boring." - This is basically why I'm interested in making the move (eventually). Earning power may turn out to be a perk but the main motivation is something that is perpetually stimulating.

clarecorthell · on Oct 12, 2014

Deterrent roadblocks are a huge problem. In fact, they're the primary reason education can't happen solely through a book or video lecture series. Students need mentorship to gain motivation and remove roadblocks. Teachers, TAs, and cohorts fill this need in the university setting. It makes sense that when you learn something new, you face more unknown unknowns than known unknowns. It's nearly impossible to ask questions about the former because you don't know how to begin formulating a question, while the latter are googleable and likely solvable.

The programmers' standard roadblock remover? google -> "stackoverflow" + problem

clarecorthell · on Oct 12, 2014

You'll find R resources here: https://github.com/datasciencemasters/go/blob/master/r-resou...

The OSDSM maintains a focus on Python resources.

clarecorthell · on Oct 12, 2014

I took six months off for the OSDSM. I am very aware that it was a luxury to do so. I had to take loans, move out of my apartment, and I studied 10 hours per day.

If, instead of working 10 hours per day, 6 days per week for six months (1440 hrs), you put in 6 hours per week, it would take you more than 4 years to finish a similar curriculum.
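
A quick sketch of that arithmetic, assuming "six months" is counted as 24 weeks (the figure that reproduces 1440 hrs) and a 52-week year; the variable names are only illustrative:

```python
# Full-time self-study schedule from the comment above.
HOURS_PER_DAY = 10
DAYS_PER_WEEK = 6
WEEKS = 24  # assumption: six months counted as 24 weeks

total_hours = HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS

# Part-time schedule: 6 hours per week.
part_time_hours_per_week = 6
weeks_needed = total_hours / part_time_hours_per_week
years_needed = weeks_needed / 52  # assumption: 52 weeks per year

print(total_hours)             # 1440
print(weeks_needed)            # 240.0
print(round(years_needed, 1))  # 4.6
```

So roughly four and a half years at 6 hours per week, which matches "more than 4 years."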

Managing your time is still the hardest part of self-study. It always will be. Most people pay institutions to structure their lives with a workload, deadlines, time off, expectations, and consequences (positive and negative). Having the time for self-study is a luxury few people have; most people won't have the time, money, and opportunity to give up for a "classic liberal education," and by extension they won't have time for self-study, either (one such conversation on the topic: http://www.newyorker.com/culture/culture-desk/loud-nathan-he...). That's part of why new forms of education like The OSDSM are so necessary. Such curriculums fit exceptional cases that are less and less the exception.

clarecorthell · on Oct 12, 2014

The truthful meta-answer which you probably don't want to hear: We only have a statistical likelihood (of dubious confidence) of what will transpire in the future. We have no knowledge of the future. None. Same is true of your knowledge of your future problems. Use some basic Bayesian logic and guess in an educated manner, like with most prediction problems.

More practical advice: Find out what problems you're interested in working on, then google, read books, and ask people working on those problems to tell you more about how they solve them. Another great place to start is to look at your business's biggest inefficiencies and informational gaps, and determine which of them could potentially be addressed with prediction or statistical inference. This used to be called Business Intelligence. Even a coffee shop can benefit from understanding simple seasonality.

clarecorthell · on Oct 12, 2014

It's worthwhile to explain why many people find the OSDSM useful.

The topics listed do indeed require in-depth study before you can understand and command them. The OSDSM is most useful for people who don't know where to start, which is often the hardest part. The hardest things are often the most trivial in hindsight.

Typical interaction:

Q: I want to be a Data Scientist. What do you think I should study first?

A: Build a basic proficiency in linear algebra, programming in Python, and statistics. Then take cursory classes in the subjects in the OSDSM that interest you. You'll figure out what depth is most meaningful to you from there.

Q: What does a Data Scientist actually need to know?

A: Totally and completely depends on what you want to do. There are people who crunch click logs all day, people who comb the tendrils of search algorithms, and yet others who seek terrorists and criminals in statistical signs among the bits and bytes. See my last answer for relevant guidance.

e.g. https://medium.com/@clarecorthell/the-brief-multi-tweet-enum...

_qc3o · on Oct 12, 2014

In that case I have some feedback. As a beginner I still don't know where to start after reading your post. There are 101* bullet points in your post, distributed among IDEs, books, online courses, libraries (the programming kind, not the book kind), programming languages, pure math, applied math, database theory, machine learning, natural language processing, visualization, etc. There is no obvious ordering by how hard or easy things are: the very first link in the math section is another list of bullets tackling some heavy-duty mathematics, and I say this as someone who studied mathematics at the graduate level.

Expanding mailshanx's answer will be much more helpful to beginners than the current 101 bullet points.

*I just counted the bullet points with document.querySelectorAll('li') so there are some false positives.

clarecorthell · on Oct 12, 2014

Start here: http://bit.ly/uwintrodatascience. It's the first bullet.

clarecorthell · on Oct 12, 2014

Try Codecademy and CS106A (how I learned programming at Stanford): https://www.udemy.com/cs-106a-programming-methodology

I've started cataloging some of the best beginning resources here. The main benefit is that people can battle out what the best resources are in PRs. https://github.com/datasciencemasters/go/blob/master/basic-p...