The original misunderstanding behind "knowledge base" was that, in the 1980s, it...

ezst · on April 12, 2025

"Funny" how reminiscent that is of the whole blockchain discussion. If the need is fully satisfied by a "boring", cost-effective "facts" database, why would a competent engineer push for (blockchain/)LLM instead?

PaulHoule · on April 12, 2025

There were several reasons why "expert systems" were rejected in the 1980s, including competition from programmable calculators and spreadsheets and the absence of a correct paradigm for reasoning with uncertainty, but the one most often quoted was that building that kind of database is not cost-effective.

I spent about 10 years working (sometimes for myself, sometimes for employers, sometimes part-time, sometimes as a software developer, sometimes as a business developer) on the problem of turning a mass of text into facts, and facts back into text, to solve problems like:

- Doctors write copious medical notes containing facts that would be useful to themselves, payers, researchers, and regulators.

- An accounting or legal firm may need to scan vast numbers of documents and extract facts for an audit or lawsuit.

- An aerospace manufacturer has a vast database of documentation and maintenance notes (even from the teams at the airports) that it needs to keep on top of

- A fashion retailer wants to keep track of social media chatter to understand how it connects and fails to connect with customers and answer questions like "should we endorse sports star A or B?"

- Police and soldiers chat with each other over XMPP about encounters with "the other", and those chats are again rich with entities, attributes, events, etc.

Tasks like this need an interactive system, but you face the problem that people can make at most 2000 or so simple decisions [1] in a sustainable day. The problem is large, but it is not "boil the ocean", because you can set requirements for what gets extracted and use the techniques of statistical quality control, as in Deming, to confirm that accuracy stays within bounds.
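As a minimal sketch of the "know accuracy is in bounds" step, assume a reviewer spends part of that daily decision budget judging a random sample of extracted facts as right or wrong. The sketch uses a Wilson score interval on the sample accuracy; that choice, the function names, and the 95% level are my assumptions, not something the comment specifies, and Deming-style quality control involves more than this (control charts, process monitoring over time).

```python
import math

def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one judged sample")
    p_hat = correct / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def accuracy_in_bounds(correct: int, n: int, required: float, z: float = 1.96) -> bool:
    """Hypothetical acceptance check: pass only if the lower bound clears the requirement."""
    lower, _ = wilson_interval(correct, n, z)
    return lower >= required

if __name__ == "__main__":
    # Example: 400 sampled facts (a fifth of a 2000-decision day), 388 judged correct.
    low, high = wilson_interval(388, 400)
    print(f"observed 0.970, 95% interval [{low:.3f}, {high:.3f}]")
    print("meets 0.94:", accuracy_in_bounds(388, 400, 0.94))
    print("meets 0.95:", accuracy_in_bounds(388, 400, 0.95))
```

With 388 of 400 correct, the observed accuracy is 97%, but the lower bound is only about 0.948, so the batch would pass a 94% requirement and fail a 95% one. Tightening the bound means judging a larger sample, which is exactly the trade-off against the human decision budget.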

You can give people tools to tag things in bulk, you can apply rules, and you can give people tools to create the rules. I worked on RNN- and CNN-based models, SVMs, logistic regression, autoencoders, and other models, and before BERT they all sucked. If you have the interactive framework, you can plug encoder or decoder LLMs into it, and it is a revolution that makes systems like that much cheaper to develop and run, with better results.
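The comment doesn't describe the framework's internals, so the following is only a hypothetical sketch of the idea that the model is a swappable part: user-authored rules run first, a pluggable model labels what the rules miss when it is confident enough, and everything else lands in a human review queue. The names (`regex_rule`, `Pipeline`, `toy_model`) and the 0.9 threshold are illustrative assumptions; in practice the model slot could wrap an SVM, a BERT-style encoder classifier, or an LLM call.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# A labeler takes text and returns (label, confidence), or None to abstain.
Labeler = Callable[[str], Optional[tuple[str, float]]]

def regex_rule(pattern: str, label: str) -> Labeler:
    """Hypothetical rule factory: a user-written regex assigns a label with full confidence."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def rule(text: str) -> Optional[tuple[str, float]]:
        return (label, 1.0) if compiled.search(text) else None

    return rule

@dataclass
class Pipeline:
    rules: list[Labeler]
    model: Optional[Labeler] = None  # swappable: SVM, BERT classifier, LLM wrapper...
    threshold: float = 0.9           # model answers below this go to a human
    review_queue: list[str] = field(default_factory=list)

    def label(self, text: str) -> Optional[str]:
        # 1. Rules first: cheap, auditable, written by the users themselves.
        for rule in self.rules:
            result = rule(text)
            if result is not None:
                return result[0]
        # 2. Then the pluggable model, accepted only when confident.
        if self.model is not None:
            result = self.model(text)
            if result is not None and result[1] >= self.threshold:
                return result[0]
        # 3. Otherwise spend one of the human's daily decisions on it.
        self.review_queue.append(text)
        return None

if __name__ == "__main__":
    # Stand-in for a real model; purely illustrative.
    def toy_model(text: str) -> Optional[tuple[str, float]]:
        return ("event", 0.95) if "meeting" in text else ("other", 0.5)

    p = Pipeline(rules=[regex_rule(r"\bhot dog\b", "hot dog")], model=toy_model)
    for t in ["I ate a Hot Dog", "team meeting at noon", "random text"]:
        print(t, "->", p.label(t))
    print("needs review:", p.review_queue)
```

The design point is that the rules, the threshold, and the review queue stay the same when the model changes, so upgrading from an older classifier to an encoder or decoder LLM touches only the `model` slot.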

[1] hot dog/not hot dog