Cavallium's comments

Cavallium · on Feb 24, 2024

A lot of vlang-related statistics look very suspicious; some of the metrics are probably boosted by a click farm or something similar. For example, 99% of the global Google searches for "vlang" come from Beijing. https://trends.google.com/trends/explore?date=today%205-y&q=...

alwayslikethis · on Feb 24, 2024

This makes very little sense to me. Keep in mind that Google is blocked in China and has been for a long time, except perhaps on specific government machines that may have unrestricted access. Even if there is a lot of search interest from people using VPNs, it shouldn't show up as coming from China.

arp242 · on Feb 24, 2024

"golang" seems to have similar statistics: https://trends.google.com/trends/explore?date=today%205-y&q=...

So I'm not sure if that's evidence of metric boosting by a click farm, or anything else. Clicking on the question mark, it looks like it's not really a ranking of "where do searches come from", but rather "how popular is it in this region":

> Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term.
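As a rough illustration of the scaling that quote describes, here is a minimal Java sketch. It assumes the values are simply each data point divided by the peak and multiplied by 100; Google's actual computation is not spelled out in the quote, and the class name, method name, and sample numbers are all hypothetical.

```java
import java.util.Arrays;

public class TrendsScaleSketch {
    // Hypothetical helper: rescales raw interest values so the peak becomes 100.
    // This only mirrors the wording of the quoted help text; it is not Google's code.
    static int[] scaleToPeak(double[] raw) {
        double peak = Arrays.stream(raw).max().orElse(0.0);
        int[] scaled = new int[raw.length];
        if (peak <= 0.0) {
            return scaled; // no usable data: every value stays 0
        }
        for (int i = 0; i < raw.length; i++) {
            scaled[i] = (int) Math.round(100.0 * raw[i] / peak);
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Made-up raw values; only their ratios matter after scaling.
        double[] raw = {2.0, 4.0, 1.0};
        System.out.println(Arrays.toString(scaleToPeak(raw))); // prints [50, 100, 25]
    }
}
```

Under this assumption a value of 100 says nothing about absolute search volume, only that the point is the peak of its own series.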

avgcorrection · on Feb 25, 2024

> "golang" seems to have similar statistics: https://trends.google.com/trends/explore?date=today%205-y&q=...

But Go is a working programming language.

baranul · on March 1, 2024

I was curious, so I checked. V has Chinese contributors[2], who have also translated its documentation from English to Chinese[1]. As was mentioned, other languages have Chinese followings too.

[1] https://www.bookstack.cn/search/result?wd=Vlang

[2] https://lydiandylin.gitbook.io/

Cavallium · on Nov 17, 2021

I opened a PR to use the more precise timers. This would not improve the performance of any of the tests, but it will improve their time accuracy.

I highlighted some other problems related to the Lucene benchmark, but since I can't program in Clojure I can't fix them with a PR: I opened 3 issues to describe what can be done to address them.

huahaiy · on Nov 17, 2021

Merged your PR. Thank you! If you do not know Clojure, that's fine; just outline what you want to do in Java and I can translate that into Clojure. Thanks.

To be honest, none of your proposed changes would make a material difference, for algorithmic differences cannot be made up by implementation details.

As has already been demonstrated, T-Wand is implemented in a much slower language, Clojure. Clojure is implemented on top of the JVM, so it cannot possibly beat the same algorithm written in optimized Java. The only way to make a difference is to change the algorithm. So all your suggestions would not make a difference.

Cavallium · on Nov 17, 2021

You completely misunderstood what this project does. Libraries of this kind do not force you to return results in a specific way. Generally they allow the programmer to customize how the results are scored and chosen.

huahaiy · on Nov 17, 2021

I don't think he misunderstood anything.

There's the kind of library Lucene wants to be, a configurable library. But there are other libraries that do not want that.

In this case, Datalevin is a database system and full-text search is just one feature. I do not want users to configure full-text search in Datalevin as much as they like in Lucene. I want the default to be good, which Lucene's is not.

Cavallium · on Nov 17, 2021

That's not true. The real problem is that he is implying that his software is faster than Lucene by showing data from a benchmark that has substantial flaws. The majority of the criticism comes from Lucene users because they generally have more knowledge of this field than people who read the article without the basic knowledge needed to form any kind of criticism.

huahaiy · on Nov 17, 2021

Incorrect. The benchmark is fine. Your so-called "fundamental" flaws are just superficial things, such as using a different time measure, using a different benchmark library, passing in a thread pool, and so on. These are immaterial for relative comparison, unless proven otherwise.

I have repeatedly asked you to send the code that does things properly in your view, and you refused to do so.

So you are just trolling. Please stop hindering the progress of the industry through pointless trolling. Make some positive contributions instead; I have pointed out repeatedly how to make a positive contribution in this case. Please do.

You are a university student. I am a seasoned computer scientist, a past researcher and professor, and I am busy running a startup. This will be the last time I will say this to you: please consider contributing something positive to the world.

Cavallium · on Nov 17, 2021

I already explained to you why I didn't open more than one PR to the project. I'm not trolling anyone: without knowing Clojure, the only contribution I could make without spending literally days of work is the use of System.nanoTime() instead of System.currentTimeMillis(). I don't write the other benchmarks in Java because reimplementing anything from scratch costs precious time that I don't have. If you don't have time either, you can just say that instead of closing the issues and calling me a troll; you are taking some valid critiques as a personal attack.

It's true that I'm just a university student and you are a scientist, a researcher, a professor, and an entrepreneur, but that doesn't mean I can't be as knowledgeable as you in very narrow fields. I have been running a personal project that uses Lucene for four years, with about 8 billion messages and 340 million chats stored in a distributed Lucene index, so I'm not the regular troll or a person who just talks without knowing anything. I certainly can't critique your T-WAND algorithm, which seems to be good for your use case, but with all the humility in the world I can say I have the minimum knowledge required to spot some weird usages of Lucene APIs.

huahaiy · on Nov 18, 2021

As I have repeatedly suggested, since you obviously know Lucene, it does not take much for you to write a few lines of Java code to say, "here, this is how it is supposed to be done".

Instead, you keep giving all kinds of excuses. You said you do not have time, but somehow you have time to write a long wall of text like this. If you are not trolling, what are you doing then?

I pointed out to you why your suggestions will not make a difference; some I have already tried. For example, initializing a query parser only once, but that would crash Lucene. Clearly, you do not know about this, so your knowledge of Lucene is not as good as you think.

Fair?

Cavallium · on Nov 17, 2021

I also suggest you look at his "benchmarks" code.

Cavallium · on Nov 17, 2021

He is beating Lucene only if you look at his broken benchmark.

huahaiy · on Nov 17, 2021

Send a PR if you feel the benchmark can be improved. I will be happy to merge it and rerun the benchmark.

Cavallium · on Nov 17, 2021

The funny thing is that the benchmarks that he wrote are more problematic than the T-WAND code itself.

He didn't use any benchmarking library, and he used System.currentTimeMillis instead of the high precision timer available with System.nanoTime.

He also instantiated the IndexSearcher without specifying any executor, and he instantiated the legacy query parser on every Lucene search.
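To make the timer point concrete, here is a minimal Java sketch that times the same task with both clocks. It is not the Datalevin benchmark code; the class name, helper names, and the busy-loop workload are hypothetical, and a real benchmark would also need warm-up runs and repeated measurements.

```java
import java.util.concurrent.TimeUnit;

public class TimerSketch {
    // Elapsed time from System.nanoTime(), a monotonic clock meant for measuring intervals.
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Elapsed time from System.currentTimeMillis(), a wall clock with millisecond
    // granularity that can also jump if the system time is adjusted.
    static long elapsedMillisWallClock(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Hypothetical short workload standing in for a single search call.
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i;
            }
            if (sum == 42) {
                System.out.println(sum); // never true; keeps the loop from being trivially dead code
            }
        };

        long nanos = elapsedNanos(task);
        long millis = elapsedMillisWallClock(task);
        System.out.println("nanoTime: " + nanos + " ns ("
                + TimeUnit.NANOSECONDS.toMicros(nanos) + " us)");
        System.out.println("currentTimeMillis: " + millis + " ms");
    }
}
```

For a task this short the millisecond clock will often report 0 or 1, which is why it mainly affects the accuracy of the reported times rather than the speed of the code being measured.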

Cavallium · on Nov 17, 2021

The benchmarks in question have several implementation issues; I reported them on GitHub.

https://github.com/juji-io/datalevin/issues/created_by/caval...

huahaiy · on Nov 17, 2021

I would appreciate it if you also sent a PR to address these issues, since clearly you know more about benchmarking than me.

Contributing to open source projects is good, right?

BTW, Lucene is an open source project, Datalevin is also an open source project. Contributing to either would be equally good.

Even if you only want to contribute to Lucene for some reason, a better contribution would be to integrate T-Wand in Lucene, rather than trying to talk down Datalevin. Would you agree?

Cavallium · on Nov 17, 2021

I never had any interest in "talking down Datalevin". I criticized the results because the title was quite misleading in tone and the benchmarks were not rigorous enough to support conclusive statements.

Cavallium · on Aug 4, 2021

Looking at the gitly repository, it has almost no features: only a fraction of the basic git features are present, and they are barely working. Why is it being advertised here on HN as a fast alternative to GitHub/GitLab?

P.S.: The website has probably crashed; it's giving me a 502 Bad Gateway from Cloudflare.

amedvednikov · on Aug 4, 2021

502 was fixed.

I'm actually surprised it handled an HN front page traffic spike on a $3 VPS.

That was a good test of V/vweb.

Cavallium · on Aug 4, 2021

Now it's online

Cavallium · on June 10, 2021

They also care about criticism https://imgur.com/a/8wYsj4C