A Predictive Database (aito.ai)
125 points by tlarkworthy on Dec 18, 2019 | 34 comments


I think this kind of tool can be neat, and probably sometimes useful, but honestly I hope tools like this do not become common and get used in inappropriate places. I am sick and tired of recommendations based on actions that I never intended to affect my recommendations.

Take YouTube, for instance. It has gotten so bad that I actively avoid watching videos that I might otherwise want to see (even when I just want to educate myself - a simple example might be Nazi war propaganda videos), because I really don't want all my YouTube recommendations to turn into similar crap. I therefore use the service less. And I like the service less. And I think it is now more difficult for people to discover interesting videos, because discovery largely relies on behavioral pattern-matching rather than robust, intentional search. And people avoid searching for specific topics they don't want popping up in their sidebar.

I am learning that my behavior is what changes my settings, and therefore I should change my behavior if I want my settings to be good. Robust search is falling by the wayside. This is an objectively terrible situation.


Turning off viewing history and search history puts the recommendations much more under your own control — they’re then (AFAICT) only based on likes and subscriptions, and you still have the option to reject recommendations, as sbierwagen says.


I tried doing that on my Android TV. There is no option in the YouTube app on TV to turn off viewing history. Do you know if it is hidden somewhere?


If you're logged in, you can reject recommendations by clicking the triple dot menu and selecting "Not interested".


I've found myself watching those sorts of videos in a separate Firefox profile that I'm not logged into YouTube in.


I find this very true for things like movie reviews or product reviews. You likely will be bombarded by a ridiculous amount of similar content for weeks after you no longer have any intention to watch similar content.


I just occasionally clear videos from my Watch History which I do not want to be used for recommendations. It’s become a habit to do this every once in a while, and doesn’t take a lot of time.


Truth is that the metrics show it is better than scrolling through a gigantic list, which is what 99% of search websites are.


For some arbitrary definition of better.


Better for the advertisers?


Hi, the author here. We made a predictive database to help developers test, prototype and productize predictive functionality - lightning fast.

Just to be clear, the predictive database's value proposition is twofold. First: querying for predictions instantly is much faster than fitting and deploying an ML model and then calling it. Second: it looks like a database and is used like a database, so it is familiar and easy to use.
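To make the first point concrete, here is a hypothetical sketch of what a single predictive query might look like, expressed as a JSON body built in Python. The table, field names, and query keys (`from`, `where`, `predict`) are illustrative assumptions, not the documented Aito API.

```python
import json

# Hypothetical predictive query: everything an ML pipeline would need a
# trained model for is expressed as one declarative request. The keys and
# field names below are illustrative assumptions, not the documented API.
predict_query = {
    "from": "invoices",                            # table to predict from
    "where": {"vendor": "Acme", "amount": 120.0},  # the known fields
    "predict": "gl_account",                       # the unknown field to infer
}

# Serialized the way such a query would be POSTed to the database.
body = json.dumps(predict_query)
```

The contrast with a classic ML workflow is that there is no separate fit/deploy step: the "model" is implied by the data already in the database, and the query just asks a question about an unknown field.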

I am available for any questions, here or via email (antti@aito.ai)


Is this using Collaborative Co-Occurrence for recommendations?


The recommendations are content-based. Basically, if you have a preference for a certain item or feature, it will get a better score.

Compared to the collaborative approach: content-based scoring works better for learning routines, e.g. the weekly grocery shopping routine. It also works better in situations where there aren't many samples about the recommended content, but there is lots of metadata about the options. E.g. a typical sales situation is like this: you likely haven't sold to this customer company before, but you may have a lot of information about it.
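A minimal sketch of the content-based idea, under the simplest possible assumption: score each candidate item by how often its features appear in the user's own history (no other users needed, unlike collaborative filtering). The data and feature names are made up.

```python
from collections import Counter

def content_score(item_features, preference_counts):
    # Sum how often the user's history exhibited each of the item's features.
    return sum(preference_counts[f] for f in item_features)

# Toy grocery example: a weekly routine dominated by a few features.
history = [
    {"dairy", "organic"},
    {"dairy", "organic"},
    {"bread"},
]
prefs = Counter(f for basket in history for f in basket)

candidates = {
    "organic_milk": {"dairy", "organic"},
    "soda": {"soft_drink"},
}
ranked = sorted(candidates,
                key=lambda c: content_score(candidates[c], prefs),
                reverse=True)
# organic_milk outranks soda because its features match the routine,
# even though no other shoppers' data was used.
```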


Would have been cool if they weren't already called recommendation engines. Introducing side effects into data loading or DBMS scheduled tasks is not the future but the past.


Aito can do statistical predicting, recommending, matching and relating, and of course normal queries and FTS search.

It has been built bottom-up to give programmers the ability to query the unknown (like an ML system) in addition to the known (like a database).

I'm not certain how this relates to the recommendation systems you are talking about. Perhaps you could provide me a link to one.


This overlooks a huge part of data science, which is cleaning your data. The rest is relatively available with off-the-shelf tools.


The reason data science is so expensive is because you have to spend so much time cleaning your data, teasing out the true relationships between things, and avoiding common pitfalls.

If you want quick, cheap predictions, there are tools out there that make it very easy, like Azure Machine Learning Studio where you just paste data and have common algorithms run on it.


Founder here. Aito does help with some of the data cleaning concerns:

- There is automatic feature selection/filtering to select the most relevant features from huge feature pools

- Inference through links helps with the data aggregation / flattening work

- There are also new techniques coming in a future release that do some feature engineering automatically

- Then there is the ability to express missing data, the ability to bin numeric values, and Bayesian mathematics that helps with certain data characteristics

In the future, you could likely skip most of the step where a data scientist turns deep data structures into flat dataframes.


It's just a query language, not a "new database category".


They've also somehow modified Lucene's search algorithm to perform some ML calculations, but it's not clear what they've implemented. Still not a "new database category" though.


Founder here. Aito has a custom database that has been optimized bottom-up for statistical operations.

This lets Aito create models on the spot to answer predictive queries.


Could you share any of the thought process that led to making this from the ground up rather than extending/layering on top of an existing DBMS (or, conceivably, multiple DBMSes)?

Were there indexing and storage-engine considerations? Was it a lack of interface support for this kind of thing? Marketing? I could see a lot of arguments either way and wondered what convinced you.

It's always auspicious to start a software project in Finland, all the best of luck on this! The site looks great.


Aito builds a model for predictive queries on a millisecond scale. Reaching that performance requires heavy optimization and preparation in the DB. The indexes are optimized for statistics, and there are extra data structures not found in normal DBs.

The ML is also implemented inside the database to minimize various overheads, and to have direct access to the data and indexes. If you need to do thousands of statistical operations in 10 ms, even the IPC can become a huge overhead. You want to put the data and the math in the same process.

Overall, it's all based on tight AI+DB integration to enable the instant modeling.
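As an illustration of "creating a model at query time" (the actual math and data structures are not disclosed in this thread, so this is only a sketch): once per-feature counts sit in index-like structures, a naive Bayes estimate for one query reduces to a handful of count lookups, with no offline training step.

```python
import math
from collections import Counter, defaultdict

# Toy data standing in for a table of discrete rows.
rows = [
    {"weather": "sunny", "bought": "ice_cream"},
    {"weather": "sunny", "bought": "ice_cream"},
    {"weather": "rainy", "bought": "umbrella"},
]

# "Indexes": counts precomputed once, like a statistics-optimized index.
class_counts = Counter(r["bought"] for r in rows)
feature_counts = defaultdict(Counter)  # class -> (field, value) -> count
for r in rows:
    feature_counts[r["bought"]][("weather", r["weather"])] += 1

def predict(evidence, alpha=1.0):
    """Naive Bayes at query time: log P(class) + sum log P(feature|class),
    with Laplace smoothing alpha. Only count lookups, no training loop."""
    n = sum(class_counts.values())
    scores = {}
    for c, cc in class_counts.items():
        s = math.log(cc / n)
        for kv in evidence.items():
            s += math.log((feature_counts[c][kv] + alpha) / (cc + 2 * alpha))
        scores[c] = s
    return max(scores, key=scores.get)

predict({"weather": "sunny"})  # -> "ice_cream"
```

The point of the sketch is the latency argument above: because the counts live next to the math, each query costs a few dictionary lookups instead of a cross-process round trip per statistic.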


Can you share details about the "instant modeling" capability? While simple copulas and correlation matrices may be calculable in milliseconds, larger models are likely to have more performance considerations and training latencies.


It's done for discrete data. Operations on such data can be optimized to a pretty extreme level. Aito scales to around a million rows and a million features before slowing down too much.

We believe we can make it scale to 10M or 100M rows in the future. Maybe more.


Got it. Any plans on supporting models requiring backprop/gradient descent?


>> The end users have gotten used to AI-driven features like recommendations. Features like personalization can provide huge benefits for both the user and the business.

While it's true that "the end users have gotten used to AI-driven features like recommendations", most of the time the user is "used to" recommendations like Medieval peasants were "used to" poverty, wars and the Black Death. If I had a penny every time I heard an "end user" making fun of e.g. Amazon's recommendation algorithm I'd be a penny billionaire (latest example: "every time I order shoes on Amazon it shows me shoes for a week afterwards").

"Personalisation" in particular usually means personalised advertisement. I don't think at this point anyone can seriously deny that personalised advertisement is just personalised nuisance. It seems that only the people working in advertisement companies are immune to this observation. As a small bit of concrete evidence- well, that's why we have ad-blockers (and the success of ad-blockers, evidenced by attempts to er, block them, is evidence of the strength of feeling against internet advertisement, personalised or otherwise).

So, yes, personalised ads can provide "huge benefits" for businesses, as long as those businesses can profit while ignoring the annoyance those ads cause to the users. How the user benefits- that's another matter and I'm very skeptical of the article's claim that the user also reaps "huge benefits" by personalisation in this context.

Edit: just noticed the author of the article is participating in the thread. I hope the above doesn't come across as a criticism of the product itself. I'd be interested to know how the "predictive database" can help reduce the nuisance of targeted advertisement. For example, is the predictive database smarter than a typical recommender engine? Can it avoid situations like "I get shoe ads for a week afterwards"?


Not sure what is so special about this. Sounds like just a normal SQL database queried with a YAML file, where the AI part is just ranking with fuzzy search.


Founder here. Aito has a custom database optimized for statistical operations. It essentially creates Bayesian models in real time to answer the queries.


Sounds like BayesDB. (That was an academic project and I’m not sure if it’s still being worked on).


True. Aito does resemble BayesDB. BayesDB wasn't the inspiration for Aito (I found it afterwards), but it is an extremely impressive piece of work.

The biggest difference between BayesDB and Aito is that BayesDB is built on top of SQLite, while Aito has its own custom implementation. My understanding is that the SQLite approach puts pretty hard limits on BayesDB's scaling. The custom database enables pretty radical optimizations, which allow a much, much bigger scale.


> For example if we predicted how likely a vegetarian is to purchase bacon, Aito could return that it is very likely, because based on data, that's the common average.

Why is this a good thing??? Maybe it's just me who doesn't understand the point of this? I see this "feature" as a benefit in some cases, but in most instances it makes me very doubtful very quickly.
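The worry can be shown with a toy marginal-versus-conditional calculation (numbers made up): a prediction based on the overall "common average" purchase rate can directly contradict the rate conditioned on what is known about the user.

```python
# Made-up purchase log: (is_vegetarian, bought_bacon)
purchases = [
    (False, True), (False, True), (False, False), (False, True),
    (True, False), (True, False),
]

# The "common average": P(bacon) over everyone.
marginal = sum(b for _, b in purchases) / len(purchases)

# The rate conditioned on the fact we actually know about this user.
veg_rows = [b for v, b in purchases if v]
conditional = sum(veg_rows) / len(veg_rows)

# marginal is 0.5 ("bacon is likely"), conditional is 0.0 -- predicting
# from the average alone yields the absurd recommendation the quote
# describes.
```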


but why is JSON the query language...?


Any similarities to BayesDB?



