Looking for a co-founder to change language learning forever
50 points by solarmist on Aug 22, 2022 | hide | past | favorite | 38 comments
I’m throwing this out because I’ve gone as far as I can as a solo founder without holding myself back. (More info about me in my profile.)

I have an awesome prototype that re-imagines how people learn a language. I think this has a real chance to change the world by making language learning as achievable as learning to read in your first language and as predictable as following a recipe.

I applied to YC last year and got an interview, but it was still early enough in development that I needed to rely on instinct and intuition to answer questions. I’ve finished the initial R&D and am building the final UI for the prototype. The prototype natively supports any pairing between 20+ languages, including English, Japanese, Korean, Chinese, Russian, and many European languages. Practically these pairs are limited by the availability of bilingual dictionaries.

I’ve been meeting with a couple of people a week, mostly language learners, teachers, and tech people, but I am finding people, perhaps unsurprisingly, have a lot of trouble seeing past the curse of knowledge. We’ve all learned a language, so most of us feel qualified to teach it to others without deeply examining our assumptions.

Meeting someone on HN isn’t ideal, but I’m determined to find the right people and I’m willing to improve my chances in any way that seems productive.

What’s my idea?

I’ve studied Japanese for years and constantly wonder, “how close am I to being able to read and understand this book, academic paper, song, or show?” And there isn’t any tool that can answer that question for me.

Natural language processing (NLP) has advanced to the point where I can build something resembling abstract syntax trees (ASTs) for natural language sentences. By indexing the words and original sentences, I can easily track a user's learning across any number of pieces of content, link to word references like dictionaries, cross-index words to show other examples of usage, create automatic exercises with full context, schedule reviews (SRS), tag named entities (proper nouns and the like), and provide concrete guidance.
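The indexing step described above can be pictured as an inverted index from words to the sentences that contain them. This is only a toy illustration with a regex tokenizer, not the actual Parsnip pipeline, which presumably uses real NLP parsing:

```python
import re
from collections import defaultdict

def build_index(text):
    """Split text into sentences, then build an inverted index
    mapping each (lowercased) word to the sentences that use it."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    index = defaultdict(set)
    for i, sent in enumerate(sentences):
        for word in re.findall(r"[a-zA-Z']+", sent):
            index[word.lower()].add(i)
    return sentences, index

sentences, index = build_index(
    "I want to eat bread. Where do we eat? Bread is cheap."
)
# "eat" is indexed against the first two sentences; "bread" against
# the first and third, so each word links back to real usage examples.
```

With an index like this, cross-referencing a word to every sentence that uses it is a single dictionary lookup.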

I want to build language servers (natural language LSPs) and integrated language environments (an IDE for language) for individuals. For any content a user adds, she will have access to a content reader, dictionaries, cross-references, character information (like kanji) if applicable, review exercises, parallel text tools (not machine translation), and sentence shadowing tools.

My prototype takes the books, articles, or web pages you would like to read and creates a detailed index of all the words and sentences; it then finds the fewest sentences that cover all the words and turns those into fill-in-the-blank exercises to study from. You are given real feedback and correction when answering exercises. It also includes an integrated e-reader and dictionaries.
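Finding "the fewest sentences that cover all the words" is the classic set-cover problem, which is NP-hard in general; the standard workaround is a greedy heuristic that repeatedly takes the sentence adding the most not-yet-covered words. A minimal sketch under that assumption, not the app's actual algorithm:

```python
def fewest_covering_sentences(sentences):
    """Greedy set cover: repeatedly pick the sentence that adds the
    most uncovered words, until every word has been covered."""
    word_sets = [set(s.lower().split()) for s in sentences]
    uncovered = set().union(*word_sets)
    chosen = []
    while uncovered:
        # Index of the sentence covering the most remaining words.
        best = max(range(len(sentences)),
                   key=lambda i: len(word_sets[i] & uncovered))
        chosen.append(sentences[best])
        uncovered -= word_sets[best]
    return chosen
```

The greedy strategy is provably within a logarithmic factor of optimal, which is good enough for study-planning purposes even on book-length inputs.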

Who do I want to meet?

You must be:

* Deeply curious and technical (but this need not be in programming);

* Willing to ignore established methods/ideas when reality doesn’t match;

* Deeply believe that at every level, the details matter;

* Able to say “No” in order to ship, even for things with obvious immediate value.

I'm currently preparing my YC application for the winter 2023 batch. If interested, please email me at joshua@solarmist.net. Put "co-founder" somewhere in the subject and include:

1) something awesome you've built

2) why you would be right for this, and

3) the best time and way to contact (FaceTime/Zoom/etc) you.

Even if you aren't the right person, please upvote this and help us meet.



I think it is a solid method to learn a language by reading a lot and taking notes on words and expressions. I learned french this way in the past year and now I can read entry level texts including news sites like lemonde.fr.

The method I was using is similar to what you described, but less automatic and integrated.

To save dictionary time, I used the google translate extension, I configured it to pop up automatically when I select some words. It worked like a charm.

I didn't have a good tool for review, but I usually read the article again to check if I remembered the new words. Recently I came across lingonote.com, which seems to target the material collection and review problem as well.

I myself am also building a language learning app, but more focused on listening and speaking. I am not reaching out to be a co-founder per se, but I am very happy to see your post and would like to offer my encouragement.

I also checked out the Parsnip app the other day. Nice try :)


Thank you.

Yes, your experience is one I hear all the time. That's the biggest thing I want to accomplish: more automatic and integrated.


I missed the most basic part of my pitch.

Language learning is a 30 billion dollar market, with Duolingo and Quizlet representing the first billion-dollar companies in a market with no clear leader, and I think Parsnip could become the leader for language learning apps.

I use Parsnip for all of my own reading and I'm doing my best to get it into users' hands as soon as possible. You learn what you need to read books, articles, or web pages in ~30% of the time compared to immersion only, and you learn vocabulary at least 6 times faster than flashcards using spaced repetition (SRS).

As you can see by the passionate and detailed responses on this thread there is significant user interest.


> mostly language learners, teachers, and tech people, but finding people, perhaps unsurprisingly, have a lot of trouble seeing past the curse of knowledge.

Respectfully, how do you know that you're making something people want?


If you've ever completed a Duolingo course just to realize you can barely string a sentence together, you will have shared my pain. Language acquisition is a long, confusing and lonely road. It's clear that OP's vision for NLP-augmented language learning will be the norm one day, sooner or later. Dictionaries, endless books and tutors are just too much time and money for most people.


Oh, I completely agree on the problem, and I think any technological solution to that problem will require NLP (perhaps tautologically so). But to my mind, the open questions are (a) what does that mean specifically? and (b) is a tool like that compelling enough to motivate a venture-backed startup in 2022?

To expand on (a), I think a Young Lady's Illustrated Primer [1] that generates interesting level-appropriate content on the fly is extremely compelling, and it's aligned pedagogically with principles like extensive reading [2]. Leaving aside its feasibility, such a primer is clearly providing NLP-augmented language learning --- but at the same time, it's a very different vision from what OP describes.

More broadly, tools more aligned with communicative language teaching (CLT) would almost certainly rely on NLP without reaching too deeply for dictionaries, exercises, spaced repetition, and grammatical explanations, all of which seem to be in scope for OP.

There's a lot of room here, which is why I maintain that the critical question is a simple one: is OP's specific solution to the problem something people want?

[1]: https://notes.andymatuschak.org/The_Young_Lady%E2%80%99s_Ill... [2]: https://erfoundation.org/wordpress/


Huh? I don’t follow. Parsnip is a tool to highlight and surface unknown words to directly enable easier extensive reading and comprehensible input. One of the key parts of it is a reader app.

Just because we can’t see the steps needed to build the primer doesn’t mean we shouldn’t create anything.

It seems like you find something about my idea distasteful, but I’m having trouble understanding what, if anything, your objection is?

Is it too “mechanical” of an approach to you? If so I hope you realize what I’m describing is “how the sausage is made” not what the user experience will be like. Computers are here to enhance human experiences not make us more mechanical.

Also, Duolingo and Quizlet are both billion-dollar companies, so I believe that proves VC interest. Language learning is a nearly universal desire, but most people give up on it in their teens.


I was just replying to ipnon's comment in the abstract here -- it wasn't specifically about Parsnip, and I don't mean it as a subtweet or anything like that.


Ok, gotcha. Yeah, I understand that. It can be hard to determine intention with text only.


The answer is that Duolingo doesn't teach you grammar, which is why a bunch of startups have popped up to teach grammar specifically. It's not evident NLP is the solution.


The simplest answer is because this is what I want for studying Japanese and I wish I had when I was learning Korean at the defense language institute.

For a longer answer:

I also know a lot of people and am constantly meeting more who have successfully and unsuccessfully learned other languages. One problem that comes up constantly is the intermediate plateau or death march where it seems like no matter how much someone studies they never make any observable progress.

To be functionally independent in another language you need to be able to recognize and understand on the order of 25k words. Parsnip is designed to target that by finding and tracking your learning of all those words and automating the work of looking up, practicing and recording those words.


> One problem that comes up [...]

I agree that many language learners end up in the intermediate plateau, so there's a clear problem. But, how do you know that Parsnip is a solution these people want?

> [...] by finding and tracking your learning of all those words and automating the work of looking up, practicing and recording those words.

On a tech note, I'm curious what you think of lingq.com, which seems quite adjacent to what you describe here.


Again mostly personal experience and direct feedback from people saying that keeping track of vocabulary is hard.

Lingq was an early inspiration, but it has many limitations (for example, it has trouble tracking word conjugations, let alone kanji in languages like Japanese) and relies almost exclusively on manual volunteer work. It provides very little leverage to users.

For a purely leverage-based answer, Parsnip will save hundreds of hours of manual work per user by automating things like creating fill-in-the-blank exercises, vocabulary tracking, and dictionary lookups (a conservative estimate is 70+ hours for lookups alone).


Hello! I have some input and questions and feedback - I have a deep interest in this topic.

For background, I studied Japanese to a reasonable level of proficiency (read novels, worked in an all-Japanese company for many years, passed N1, etc).

I found the hardest plateau to overcome was the intermediate ("I know a bunch of words and grammar. I can read most things given time, and have a basic conversation in a 1:1 environment about topics I'm comfortable with") to properly fluent ("I'm comfortable in group conversations, able to work around words I don't understand, not have to ask people to slow down, etc").

The most challenging thing was always knowing if I'm correct or not, how correct, and if not, why.

When you are beginner -> intermediate, the validation curve is basically if they understand you, or not. Once you get pretty decent, upper intermediate/lower advanced, people can generally figure out what you are trying to say or mean, even when it's weird/unnatural. Getting feedback beyond this point is really hard.

What you are suggesting just sounds like a faster way to get to upper intermediate - not really the last 10%-20%, which is the hardest (upper intermediate to advanced, to fluent).

For this last step, I think the problem that needs to be solved is feedback that includes not just what is un-natural, but how it can be more natural and ideally why (sometimes the reason is "it just is", which is fine -- in this case just show examples that are similar that illustrate the common pattern).

A super basic example that illustrates this might be as described here: https://oshiete.goo.ne.jp/qa/7848587.html

O パンが食べたい <> パンを食べたい

Both are grammatically correct, but the former is natural, the latter is weird. There is no real way to get this feedback without someone telling you, though, since everyone understands you either way, no-one is going to stop and point out your weirdness.

Do you think your tool can solve this problem? Ideally, I want to be able to write something complex, and have it improved - kind of like bilingual Grammarly, I suppose, but more aimed at semantic/natural language.


Yes, this is a big problem and will require many tools to tackle.

I see turning Parsnip into a language learning platform that can be incorporated into other tools and products, including one like Grammarly in the future. But active feedback for language production is, as you mentioned, a much harder and more subtle challenge.

I do think tools like GPT-3 and the like make it achievable, though.


This is very akin to something that I have been interested in for some time. Being bilingual myself (German and English) and having a keen interest in the historical progression and evolution of natural language has led me down a few roads related to learning models and personal experiences, testing teaching approaches on myself for other languages using non-traditional models. Many may seem abstract, but this is how I feel language learning should be approached: the cookie-cutter, one-size-fits-all methodology (flashcards, standard repetition and memorization, etc.) does not work.

I have taken a more functional approach to language modeling personally - not so much in a technical sense, as my coding skills are still subpar, but the ideas and approaches are grounded in current NLP models and methodologies and in how to tailor them to myself. It is rather difficult to lay out the approach in a "short" reply, as it is multi-faceted and may sound disjointed in this format. However, I feel I have found a "new" model, or approach to a model, of learning which is non-traditional and works by engaging individuals through their own interests and experiences, tailoring the experience in a way that doesn't feel like task engagement or an obvious game-model approach.

Language is also rooted in mathematics and, in my opinion, cannot be disconnected from the model, especially regarding typed text. I have found that handwritten script, as well as type script and font formats, also reinforce these aspects; much of this idea is based in pattern recognition and repetition within the context of all language, but not in a "conscious" way.

Hopefully this reply makes sense. I would love to make the time to show some of the methodology I have developed and solidify these approaches into a less abstract response. Either way, good luck!


That is quite the information dump. But I get it.

In building Parsnip, one of the hardest things to do is communicate in a way others can digest and get excited about, without being disjointed.

Email me, and I'd be glad to chat with you.


What are your thoughts on the Fluent Forever method, specifically the part that going through the effort of creating your own study materials is an integral part of recall and internalizing new words/knowledge? It's definitely attractive to have a lot of the "manual work" be automated, but maybe it's a necessary effort to cross the chasm that is intermediate language learning and perhaps most people stay at intermediate levels not because of the lack of a tool but because there's a natural filter with how much effort is required to go from intermediate to fluent.

I've been "stuck" in intermediate Japanese for years now after being on and off multiple times. Got to 1.2k kanji, "ok" grammar and ~3k vocab words. Perhaps something like this is what's needed. I've been wanting an "instructor" that can do this sort of indexing for all sorts of content like TV shows, movies, books, articles, etc.


I think it's conflating doing anything that might help with doing the most valuable things. As a concrete example, if you looked up 25k words once in a dictionary at 10 seconds each (this is speedy for a digital or paper dictionary), it would cost you >70 hours looking things up. You'd be hard-pressed to convince me that getting very good at finding stuff in a reference is directly improving my language skills.

The intermediate plateau is because of Zipf's law. In a 300-page book, there are ~5500 unique words and ~3000 of them occur once or twice. This isn't a big deal for native speakers because a 300-page book is about 100k words (1 day's worth of content), but for a language learner, that might take weeks or months to cover. To go further, that native speaker will probably encounter those words again in ~40 days, but it might be years before that learner re-encounters all of them (having long since forgotten them).
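The split described above - words occurring once or twice versus words occurring often - is easy to compute for any text. A toy sketch that counts it over already-tokenized words (a real pipeline would lemmatize first, so conjugated forms collapse into one entry):

```python
from collections import Counter

def vocabulary_profile(words):
    """Count distinct words, and how many of them occur only once or
    twice - the long Zipfian tail that drives the intermediate plateau."""
    freq = Counter(words)
    rare = sum(1 for count in freq.values() if count <= 2)
    return {"unique": len(freq), "once_or_twice": rare}

profile = vocabulary_profile("a a a b b c d".split())
# In this tiny sample, 3 of the 4 distinct words occur once or twice,
# mirroring the roughly 3000-of-5500 ratio claimed for a real book.
```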

Your time is best spent focusing on the sentences (30% of the book) that contain those 3000 words because they use almost all of the rest of the words.


> Your time is best spent focusing on the sentences (30% of the book) that contain those 3000 words because they use almost all of the rest of the words.

This seems to assume: (i) that readers of a 300 page book in a foreign language (not typically a beginner task!) are choosing to do so primarily as a means to the end of learning/remembering unfamiliar words, and not because they want to understand the content of the book itself, develop their appreciation of literary phrasing, challenge themself etc., and (ii) that focusing on a [probably disjointed] subset of the sentences in the book won't deprive the reader of the necessary context to grok sentences even when the words are familiar. I'm not sure either is generally true.

Ultimately the alternative to using machine-selected sentences isolated from long form text for learning new words or fill-the-blanks exercises is using definitions and exercises specifically constructed to be accessible and relevant to language learners. The only obvious case where I can see the ML process generating more useful examples is if the language learners' needs are skewed heavily towards absorbing the sort of specialist technical/professional vocabulary conventional learning courses don't cover.

I also think that picking up common and uncommon idiomatic phrases would be at least as important as individual words too (though this is definitely something an ML tool can aid)


This scales up and down freely. I chose 300 pages because that is ~100k words, or the amount a native speaker processes daily.

My process is just the opposite of (i). I want to read and understand a book, so I want something to show me where my deficiencies are. Then when I read the page, chapter, etc. it will go much more smoothly. This is also an iterative process where I'm constantly going back and forth between studying new words in a section and trying to read the section.

For (ii), every exercise or review sentence has a link directly back to the source material. I am playing with the idea of extending the context to +/- N sentences when showing an exercise as well.

> using definitions and exercises specifically constructed to be accessible and relevant to language learners

This is the prescriptivist view of language learning and how all classes and textbooks are created. It can be useful, especially at the earliest stages of learning a language. Still, I mostly reject it because when using a language, I have very little control over the content I have to consume. I don't get to choose how an article is written or how someone speaks to me. So the sooner I address that as a language learner, the faster I will become comfortable with arbitrary content.

Phrases are great, I agree. I don't have a vision of how that could work technically so it's just in the pile of ideas I'd love to do eventually.


> For (ii), every exercise or review sentence has a link directly back to the source material. I am playing with the idea of extending the context to +/- N sentences when showing an exercise as well.

This would be a good idea, but my point is more that content in general writing (as opposed to specifically constructed to be self-explanatory writing) is inferred from structure and callbacks to the words or tone of much earlier sections of the writing. Language learners do have to handle passages of text which aren't written with ease-of-comprehension in mind, but they don't have to try to fill in the blanks for "As seen in the previous chapter, x is an example of ______" without reading references to x in the previous chapter first. That's often an impossible task even for native speakers. Similarly, people are much more likely to correctly guess at meaning of a word describing a characters' emotional state (or internalise the meaning after looking it up) if they followed the narrative of the section six pages earlier which provided the context for their emotional state. Not stripping that context, or algorithmically isolating the sentences in a piece which don't require context to fully understand is a tough challenge.


Ah, okay. I understand now.

Yes, that can happen, but from my experience it is rare. Also, because this is focused on learning the language, I can give hints/affordances like the word, or definition, in their native language, so all the user needs to do is produce the word in the target language and conjugation.


If you're looking for something like this for Japanese, Kou has been doing great work on https://jpdb.io/, which contains many of the concepts OP's app is trying to accomplish, specifically for Japanese.

In particular it has the SRS, parallel text for words, assisted study and % of content you already know.


Yeah, I'm familiar with this and it's a great resource.

The key difference is that it is still flashcard-based. And you have to choose between word or sentence cards, which conflate the context with the word being studied.

By actually parsing the text, when you are studying the word "eat": "Where are we going to eat?", "I wanna eat a burrito the size of my head!", "No way. You just ate that ice cream cone."

* You separate the target of studying from the context/presentation. You can create an exercise from any of the many sentences using eat, ate, eating, etc. And change examples for every study repetition.

* You can focus on individual conjugations or overall knowledge of the word.

* You have an objective right/wrong signal instead of a subjective 0-4 scale.

* You can attach definitions to not only the target word but all of the words in the sentence on demand.

* You can track learning across all 25 of the words from my examples simultaneously. And those sentences can be used to generate exercises for any one of those 25 words.
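The first bullet - generating a fill-in-the-blank exercise from any sentence containing any surface form of the target word - can be sketched as below. In practice the form list would come from a morphological analyzer; here it is hard-coded as an assumption:

```python
import re

def make_cloze(sentence, forms):
    """Blank out whichever surface form of the target word the sentence
    uses, returning the exercise plus the answer for objective
    right/wrong checking."""
    for form in forms:
        pattern = re.compile(rf"\b{re.escape(form)}\b", re.IGNORECASE)
        if pattern.search(sentence):
            return pattern.sub("____", sentence, count=1), form
    return None  # sentence doesn't contain the target word

exercise, answer = make_cloze(
    "No way. You just ate that ice cream cone.",
    ["eat", "ate", "eating"],  # assumed forms of the lemma "eat"
)
# exercise: "No way. You just ____ that ice cream cone."; answer: "ate"
```

Because the answer is a concrete string, grading can be an exact comparison rather than a self-reported 0-4 score.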


Hi. I'm the dev behind jpdb.

> The key difference is that it is still flashcard based.

Oh, I have a lot more ambitious plans than just flashcards. (: It's just that you have to start somewhere, and flashcards are basically the current gold standard. No need to immediately throw the baby out with the bathwater, especially since huge chunks of code can be reused for other forms of learning.

> You have an objective right/wrong signal instead of a subjective 0-4 scale.

I also have this as an option (pass/fail mode). The con is that it lowers the quality of SRS predictions, so you end up having to do more reviews. But depending on how much time you gain by not having to think about how to grade, it can become a net improvement, so it's a tradeoff. I haven't yet had the time to precisely measure it.
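For readers unfamiliar with the tradeoff being discussed: in an SM-2-style scheduler, collapsing the 0-4 grade to pass/fail gives the algorithm less signal per review. A rough sketch of binary-grade scheduling (illustrative constants, not jpdb's actual parameters):

```python
def next_interval(prev_interval_days, ease, passed):
    """SM-2-style scheduling collapsed to pass/fail: a pass grows the
    interval by the ease factor; a fail resets the interval to one day
    and lowers the ease (floored at 1.3, as in SM-2)."""
    if passed:
        return max(1, round(prev_interval_days * ease)), ease
    return 1, max(1.3, ease - 0.2)

# A pass at a 10-day interval with ease 2.5 reschedules for 25 days out;
# a fail at the same point resets to 1 day and drops the ease to 2.3.
```

With only two grades, every review lands on one of these two branches, which is exactly why the interval predictions get coarser than with a 0-4 scale.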

> And change examples for every study repetition.

For what it's worth, I have that too. (:

> You can focus on individual conjugations or overall knowledge of the word.

In Japanese I feel this would be counterproductive, since essentially all conjugations are regular. This means that once you're not a beginner, you just want to learn a word without wasting time on conjugations. But for a beginner, sure. And if you want to make any money, targeting beginners is the way to go.


Sounds great. If you're up for it, I'd love to chat with you about jpdb. It sounds like, and if nothing else, we'd have a great conversation about the challenges and ideas behind making a language learning tool.


I could be interested. My app is currently translation-based (word alignments, sense disambiguation), but I had planned to venture into assisted writing for people who need translation assistance, but not 100% help, then giving them a score for how much help they needed, or how accurate their portion was, then using that to advise them on what they struggle with. Language is hugely complex, so I'm currently focusing on ES-EN (or is that spa-eng? Depends on if you're using nltk or spacy).


I'd love to chat; even if it doesn't lead to anything, please shoot me an email.

Yes, translation is what all the tech is focused on, but it's really hard to make it useful for language learners, because translating requires many more skills than just being bilingual. The most fundamental issue is that most native-sounding translations require being non-literal in many places.


I agree. That’s why my focus is not on being native sounding but conveying meaning. If I say “I want to buy a seal.” The most native sounding translation that says “I want to buy a gasket” is useless if my meaning was “I want to buy a (marine animal).” Obviously that’s a blatant use of an ambiguous word but there’s a surprisingly large number of words we don’t think about as being ambiguous but translators can alter the meaning.


I've studied a few languages and have long wished for / thought of making software roughly along the lines of what you're suggesting here. While tackling a text, looking up words one-at-a-time can be painfully inefficient and break the flow. Digital dictionaries are quicker than paper ones, but don't resolve the fundamental problem. Nothing as advanced as natural language processing crossed my mind, but I did imagine software that would take a target text and generate a vocabulary list that the user could then study efficiently, allowing the user to then read the text fluidly without need for looking up words. There is a long tradition of human-produced "readers" that do this, but software would allow generating this for any text the user wanted and with much greater flexibility and learning options (e.g. auto-generated Anki flashcard decks).

I looked into LingQ (mentioned in another comment here), which promises just this idea, but found that it failed pretty badly in execution, to the point that a paper dictionary and notebook just worked better.

I know you're seeking a cofounder, but here are some suggestions I hope you find useful:

- I think it's essential to use a top-quality dictionary for something like this. I've found Oxford to produce excellent bilingual dictionaries in the languages I've studied. Of course these are copyright-protected and so would require paid purchases or licensing of some kind to use. I think there's a tendency to use generic free-license dictionaries for software like this, and these really aren't good enough.

- There's also a tendency to bake into the software the false assumption that "a word is a word", when of course words can have many different meanings, words in different languages do not have one-to-one correspondence, and for most languages, learning a word requires also learning extra information such as gender, conjugation, declension, pronunciation/stress, etc. This goes hand-in-hand with using a top-quality dictionary, which lists multiple word meanings, phrases, and extra linguistic information that poorer-quality dictionaries omit or get wrong.

- I would suggest incorporating human-produced translations in the user's native language of the texts in the user's target language, at least as an option that the user could upload. Even the best machine translation software can miss quite a lot.


Thank you for the thoughtful response. I completely agree on all points.

Yeah, my first idea was automatic Anki cards, but once you try to use sentences and track words, it's intractable for Anki. It's a complex graph problem even before you get to the level of word vs. form disambiguation.

For dictionaries, my initial source is Oxford bi-lingual dictionaries and a few others.

Re: words. Yup. This was one of the top technical challenges I needed to solve for this project to be possible. I separate each of the possible meanings and map them to a lemma or character that uniquely represents each.

For example, 見ました [Japanese to see (past tense)] maps to several things simultaneously:

みる - the lemma/dictionary form of the word

見る - the kanji representation of the lemma

見 - the kanji in the word

みました - the non-kanji conjugation of the word
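The mapping above can be represented as a lookup table from a surface form to its several simultaneous representations. A hand-written sketch (in a real system the entries would be produced by a morphological analyzer such as MeCab, not written by hand):

```python
# Hand-written stand-in for morphological analyzer output (assumption).
SURFACE_MAP = {
    "見ました": {
        "lemma_kana": "みる",     # dictionary form of the word
        "lemma_kanji": "見る",    # kanji representation of the lemma
        "kanji": ["見"],          # kanji appearing in the word
        "kana_form": "みました",  # non-kanji form of this conjugation
    },
}

def lookup(surface):
    """Return every representation a single surface form maps to,
    or None for unknown forms."""
    return SURFACE_MAP.get(surface)
```

Tracking a user's knowledge then means updating every linked representation (lemma, kanji, conjugation) whenever one surface form is studied.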

Parallel texts are a feature that I hope to implement eventually, but aligning parallel texts isn't trivial; even alignment (much simpler than machine translation) ends up being a many-to-many mapping.


I can see the difficulty of integrating with Anki if you're trying to implement a sophisticated system for tracking a user's progress, however I would definitely want some way of studying vocabulary with spaced-repetition flashcards I could have on my phone for something like this, Anki or otherwise. I've personally found traditional flashcards and (even more so) writing words with pen on paper to be the most effective ways of absorbing vocabulary. I've found that I don't ultimately absorb vocab as well using other self-quiz methods like multiple-choice / fill-in-the-blank / matching / etc. in Duolingo and similar apps (though I've had the false impression of learning from these methods). Just my personal experience.

It seems we're on the same page regarding specific word-meanings, comprehensive linguistic information provided along with a word, and using good dictionaries.

I would suggest including at least the option of viewing the full conjugation / declension / etc. associated with a word, and not just that used in the context of the text. I don't know any Japanese (though I recognize the example character you provided from the little Chinese I've studied), but something like this full conjugation of that verb is roughly what I'd want to see for any conjugated/declined language: http://www.japaneseverbconjugator.com/VerbDetails.asp?txtVer....

I understand parsing and matching up parallel texts is difficult, but I thought NLP would help with that, and if the software fails at matching up individual words, you could default to matching up clauses or sentences, which the user could study (with flashcards or something similar) alongside the individual words taken from the dictionary.

I took a look at the app, but I see the currently-available demo is only English-Japanese, and I'm also consistently getting "API Error: Request failed with status code 412" in multiple browsers.

Exactly what parts of the app are you looking for someone else to work on?


The current demo version of the app uses heavily cached 1 GB data files at the limits of the instance's memory, so, unfortunately, it crashes a lot.

I've been waiting to update until I had UI for exercises implemented. They will be something like flashcards, including SRS for scheduling.

Well, two things mentioned in your reply are things that need to be implemented.

* Consuming and parsing the Oxford Dictionary's API and transforming that into Parsnip's data models is one thing.

* Building statistics pages showing a user's progress/knowledge. Instead of an information firehose, I'm leaning towards only showing conjugations in the user's library. That way, the user can immediately see usage examples for those forms.


This sounds incredibly interesting to me as a Japanese learner myself. Do you have some kind of email list I could sign up for future updates? Even if it's a couple of years until it is available, I'm sure I'll still be interested in the product.


Yes, at the top of https://www.solarmist.net there is a mailing list signup box.


Hi. How many languages are you proficient in? What languages has your tool helped you acquire? What are the results?


I speak Japanese and Korean, but I've had little time to study recently.

Building Parsnip is a catch-22 in that all my time is spent fixing bugs and changing the UI.

I have used parts of it for some stories and news articles. To me, it's like interlinear books. There's a nice glossary for anything you want to look up already there.

The exercises are only now becoming usable, but I'm most looking forward to not forgetting all the words that I only come across once in a blue moon.



