The original misunderstanding behind "knowledge base" is that in the 1980s it was an idea from symbolic AI: you'd develop a set of facts against an ontology designed for accurate inference. Somehow by the 1990s it had become a text repository with a search engine that may or may not work. Occasionally useful, sometimes hard to distinguish from a trash can. See Confluence.
Prompt engineers with their decoder models are always going to wonder why they are always the bridesmaid and never the bride. With encoder models you can attain the holy grail: a system where you put text in one side and get out, within calibrated accuracy, facts to put into the first kind of knowledge base. Or, for that matter, a good search engine for the second kind of knowledge base, which could raise it above the "trash can" level.
"Funny" how that is reminiscent of the whole blockchain discussion. If the need is fully satisfied by a "boring" and cost-effective "facts" database, why would an adequate engineer push for (blockchain/)LLM instead?
There were several reasons why "expert systems" were rejected in the 1980s, including competition with programmable calculators and spreadsheets and the lack of a correct paradigm for reasoning with uncertainty, but the one most quoted was that the creation of that kind of database is not cost-effective.
I spent about 10 years working (sometimes for myself, sometimes for employers, sometimes part time, sometimes as a software developer, sometimes as a business developer) on the problem of turning a mass of text into facts to solve problems like:
- Doctors write copious medical notes from which facts would be useful for themselves, payers, researchers, regulators.
- An accounting or legal firm may need to scan vast numbers of documents and extract facts for an audit or lawsuit.
- An aerospace manufacturer has a vast database of documentation and maintenance notes (even from the teams at the airports) that it needs to keep on top of.
- A fashion retailer wants to keep track of social media chatter to understand how it connects and fails to connect with customers and answer questions like "should we endorse sports star A or B?"
- Police and soldiers chat with each other over XMPP chat about encounters with "the other" which again are rich with entities, attributes, events, etc.
Tasks like this need an interactive system, but you face the problem that people have an upper limit of 2000 or so simple decisions [1] in a sustainable day. The problem is large but it is not "boil the ocean", because you can set requirements for what gets extracted and use the techniques of statistical quality control, as in Deming, to know that accuracy is within bounds.
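To make the quality-control point concrete, here is a minimal sketch of one way to check that accuracy is within bounds: have a person audit a random sample of extracted facts and put a confidence interval around the observed accuracy. The Wilson score interval, the 95% requirement, and the function names are my assumptions for illustration, not something from a particular system.

```python
import math


def wilson_interval(correct, sampled, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if sampled <= 0:
        raise ValueError("need at least one audited sample")
    p = correct / sampled
    denom = 1 + z * z / sampled
    center = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled * sampled)) / denom
    return center - half, center + half


def accuracy_in_bounds(correct, sampled, required=0.95):
    """True when even the lower end of the interval meets the requirement."""
    low, _high = wilson_interval(correct, sampled)
    return low >= required


# 500 audited facts is a fraction of one person's ~2000-decision day.
print(accuracy_in_bounds(490, 500))  # 98% observed on 500 samples
print(accuracy_in_bounds(96, 100))   # 96% observed on only 100 samples
```

The second call illustrates why sample size matters: 96% observed on 100 samples does not demonstrate 95% accuracy, because the interval is too wide.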
You can give people tools to tag things in bulk, you can apply rules, you can give the people tools to create the rules. I worked on RNN- and CNN-based models, SVMs, logistic regression, autoencoders and other models, and before BERT they all sucked. If you have the interactive framework you can put encoder or decoder LLMs in, and it is a revolution that makes systems like that much cheaper to develop and run, with better results.
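A rough sketch of the "apply rules, tag in bulk, send the rest to people" loop might look like the following. Everything here (the `Rule` class, the labels, the patterns, the 2000-item budget constant) is hypothetical and only meant to show the shape of the idea; in a real system a model would sit alongside or replace the regexes.

```python
import re
from dataclasses import dataclass

DAILY_DECISION_BUDGET = 2000  # assumed per-person limit, per the comment above


@dataclass
class Rule:
    label: str
    pattern: re.Pattern


def bulk_tag(texts, rules):
    """Auto-tag texts matched by exactly one label; queue the rest for a human."""
    tagged, review = [], []
    for text in texts:
        labels = {r.label for r in rules if r.pattern.search(text)}
        if len(labels) == 1:
            tagged.append((text, labels.pop()))
        else:
            # No rule fired, or rules disagree: a person has to decide.
            review.append(text)
    return tagged, review


rules = [
    Rule("maintenance", re.compile(r"\b(replaced|inspected|torque)\b", re.I)),
    Rule("billing", re.compile(r"\b(invoice|payer|claim)\b", re.I)),
]
notes = [
    "Replaced hydraulic line on left gear",
    "Claim sent to payer after line was inspected",
    "Customer asked about delivery",
]
tagged, review = bulk_tag(notes, rules)
todays_queue = review[:DAILY_DECISION_BUDGET]
print(tagged)
print(todays_queue)
```

The review queue is where the human decision budget gets spent, so the better the rules (or the model), the further each person's day goes.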