I have a bot I wrote to help me with various web tasks that are too tedious to do manually. I just tested it against this and it says "isbot: false".
Edit: looks like it only detects bots that overtly identify themselves as bots, e.g. Googlebot. It's designed to identify clients, not to act as some sort of security device.
The worst bots normally don't identify themselves as bots.
A really useful flag would be "isAIBot" so you can tell them to f-off. A colleague returned from SRECon and had been asking around about bots from AI companies, and it's getting ridiculous. AI companies are just hammering sites left and right, to the point where some are hitting the limits on their deals with hosting companies and transit providers.
And you can't filter them out, because they're running on AWS, Azure or GCP IPs and aren't identifying themselves properly.
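For what it's worth, a minimal sketch of what an "isAIBot" flag could look like, assuming the crawler announces itself in the user agent. The function name and the token list are illustrative assumptions, not part of any library, and the list is far from complete. It does nothing about the unlabelled cloud-hosted traffic described above.

```typescript
// Hypothetical helper: flags crawlers that name themselves in the UA string.
// The token list is illustrative and incomplete; unlabelled traffic passes through.
const AI_CRAWLER_TOKENS: readonly string[] = [
  "GPTBot",
  "ClaudeBot",
  "CCBot",
  "Bytespider",
  "PerplexityBot",
];

function isAIBot(userAgent: string): boolean {
  const ua = userAgent.toLowerCase();
  // Case-insensitive substring match against each known token.
  return AI_CRAWLER_TOKENS.some((token) => ua.includes(token.toLowerCase()));
}
```

A bot sending a stock Chrome user agent from a cloud IP would still come back false here, which is exactly the problem.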
Many years ago (pre-smartphone), there was a Java library, written by an Italian chap, that did pretty much the same thing. I don’t remember the name. This appears to use the same approach. I think they had a PHP version, but that was a long time ago. I know it was several megabytes, which was huge in those days.
It did what it said on the tin, but did so by maintaining a huge list of individual devices and their characteristics. At the time, I chose not to use it (I was developing a [c]WAP server), but it had a number of supporters, and its maintainer was pretty sharp and quite dedicated.
These days, there’s an order of magnitude more devices, and a much greater variety. Big job.
Same here... back in 2011 or so. We needed something much more performant than WURFL. My efforts eventually became a feature/product at Akamai known as "Edge Device Characterization" (EDC) using algorithms not dissimilar to how LLMs are trained today.
I can't speak to how good the actual product is today (or even when it launched, but that's a whole 'nother story), but during development it was capable of processing 100K RPS in a footprint of ~30MB RAM with ~98% accuracy compared to WURFL as a baseline.
Wow, I remember WURFL! I used this at my second-ever job, back when mobile was still taking off, and we were trying to create some sort of mobile-server-plugin-thing for a big Java CMS monstrosity thing, as well as running NYC Restaurant Week's mobile site.
I'm just curious — what could be a potential use case for such things on the backend? For bot detection, it seems quite unreliable. Would it be more suitable for server-side rendered UIs? Or am I missing something?
Hypothetical example: When I open Twitter in the browser, I see a feed - but I also see a "What's Happening" section, and a "Who To Follow" list of suggestions, as well as what looks to be my inbox, minimized. Plus, the feed itself automatically loads the images that people are tweeting.
If you know a client is likely to be from a place where bandwidth is expensive, you may choose not to load the "What's Happening"/"Who To Follow", or the messages, or possibly even the image URLs (which I'd guess come from the backend with an array of URLs of those images in various sizes & resolutions.)
Hell, you might even load a smaller subset of the feed - 10 items instead of 30.
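A rough sketch of that hypothetical in code. Every name here is made up for illustration (this is not Twitter's actual API), and how the backend decides that a client is on expensive bandwidth is left as an input.

```typescript
// All names are hypothetical; this only mirrors the example described above.
interface ClientProfile {
  lowBandwidth: boolean; // decided elsewhere (geo lookup, headers, user setting)
}

interface FeedPlan {
  itemCount: number;
  includeTrends: boolean; // the "What's Happening" section
  includeSuggestions: boolean; // the "Who To Follow" list
  includeInbox: boolean;
  includeImageUrls: boolean;
}

function planFeed(client: ClientProfile): FeedPlan {
  if (client.lowBandwidth) {
    // Trimmed response: fewer items, no side sections, no image URLs.
    return {
      itemCount: 10,
      includeTrends: false,
      includeSuggestions: false,
      includeInbox: false,
      includeImageUrls: false,
    };
  }
  // Full response for everyone else.
  return {
    itemCount: 30,
    includeTrends: true,
    includeSuggestions: true,
    includeInbox: true,
    includeImageUrls: true,
  };
}
```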
1) End device has ability to display HiDPI images -> Send big
2) End device does not have ability to display HiDPI images -> Send small
Of course, if you have (1), in a low-bandwidth environment, then you actually want the server to send small, even if the device can handle big, but that can be indicated with a different flag.
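The two cases plus the override fit in a few lines. A sketch, assuming both flags have already been worked out somewhere upstream; the function and type names are invented for illustration.

```typescript
type ImageVariant = "small" | "big";

// Hypothetical decision helper: the bandwidth flag wins over device capability.
function chooseImageVariant(
  supportsHiDpi: boolean,
  lowBandwidth: boolean,
): ImageVariant {
  if (lowBandwidth) {
    return "small"; // send small even if the device could show big
  }
  return supportsHiDpi ? "big" : "small";
}
```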
Good tool. I wish Google had gone even further with Chrome in reducing the information in the user agent. It seems like user agent is primarily used as a browser fingerprinting signal.
I need a way to detect the screen DPI from the user agent, so I can return higher resolution images only to devices that can use them. I realize detecting that based on user agent may not always be accurate, but surely it could work the vast majority of the time. Does anybody know of a lib that implements that on NodeJS?
Please consider taking network speed into account. The device can be great, but on a mobile network it may take ages to load everything, depending on the location (e.g. on a train you may not have stable 5G for long enough).
This is still a consideration, and one of the reasons that having a customized server delivery is an important capability.
Responsive sites still send the same data, but show less of it to you.
That said, if there were a way to report network connection speed to the server, it could make the decision to reduce the data load (regardless of end device).
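As far as I know, something close to this already exists in Chromium-based browsers: the Save-Data request header, and the ECT and Downlink network hints, which the server has to opt into via Accept-CH. Support elsewhere is limited, so treat them as optional signals. A sketch of reading them, where the function name, the simplified header type and the 1 Mbps cut-off are all my own assumptions:

```typescript
// Simplified header map with lowercase keys, as Node's http module reports them.
// (Node's real type also allows string arrays; ignored here for brevity.)
type RequestHeaders = Record<string, string | undefined>;

// Hypothetical helper: true when the client hints suggest a constrained connection.
function shouldReduceData(headers: RequestHeaders): boolean {
  // Save-Data: "on" means the user asked for reduced data usage.
  if ((headers["save-data"] ?? "").toLowerCase() === "on") {
    return true;
  }
  // ECT: effective connection type, one of "slow-2g", "2g", "3g", "4g".
  const ect = (headers["ect"] ?? "").toLowerCase();
  if (ect === "slow-2g" || ect === "2g") {
    return true;
  }
  // Downlink: approximate bandwidth in Mbps. The 1 Mbps threshold is arbitrary.
  const downlink = Number.parseFloat(headers["downlink"] ?? "");
  return Number.isFinite(downlink) && downlink < 1;
}
```

When none of the headers are present the function returns false, so clients that send no hints get the normal response.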
What can you say about DPI from a string like "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"? I don't think it's possible from the user agent, but it's a one-liner in JavaScript.
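The one-liner in question is presumably window.devicePixelRatio. A small sketch of using it on the client to ask for the right image; the "dpr" query parameter and the function name are invented for illustration, and the base URL is assumed to have no query string of its own.

```typescript
// In a browser, pass `window`; the narrow parameter type keeps this testable elsewhere.
function imageUrlForScreen(
  baseUrl: string,
  win: { devicePixelRatio?: number },
): string {
  // devicePixelRatio is 1 on standard screens and higher on HiDPI ones.
  // Round to a whole number and clamp to 1..3 so the server only needs three variants.
  const dpr = Math.min(Math.max(Math.round(win.devicePixelRatio ?? 1), 1), 3);
  // Hypothetical "dpr" parameter; assumes baseUrl has no existing query string.
  return `${baseUrl}?dpr=${dpr}`;
}
```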
I needed something similar for Golang and tried using regexes in those projects, but the performance wasn't good enough. Sometimes I wish I understood regexes more deeply.
Maybe there is another way to speed this up in Golang, like a prefix tree instead of regexes. Does anyone know of something similar for Golang?
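Not a Go library, but the prefix-tree idea is small enough to sketch, and it ports to Go more or less line for line. This is a plain trie scanned from every start position, not Aho-Corasick, and the token list and labels are made up for illustration. Whether it beats a given regex engine is something you'd have to measure.

```typescript
// A trie node: children keyed by character, plus a label where a token ends.
interface TrieNode {
  children: Map<string, TrieNode>;
  label?: string;
}

// Builds a trie from [token, label] pairs; tokens are matched case-insensitively.
function buildTrie(tokens: ReadonlyArray<[string, string]>): TrieNode {
  const root: TrieNode = { children: new Map() };
  for (const [token, label] of tokens) {
    let node = root;
    for (const ch of token.toLowerCase()) {
      let next = node.children.get(ch);
      if (!next) {
        next = { children: new Map() };
        node.children.set(ch, next);
      }
      node = next;
    }
    node.label = label;
  }
  return root;
}

// Returns the label of the first token found, scanning the UA left to right.
function findToken(root: TrieNode, userAgent: string): string | undefined {
  const ua = userAgent.toLowerCase();
  for (let start = 0; start < ua.length; start++) {
    let node: TrieNode | undefined = root;
    for (let i = start; i < ua.length; i++) {
      node = node.children.get(ua.charAt(i));
      if (!node) break; // no token continues with this character
      if (node.label !== undefined) return node.label;
    }
  }
  return undefined;
}
```

Usage would be something like building the trie once at startup with buildTrie([["Googlebot", "bot"], ["Chrome/", "browser"]]) and calling findToken per request.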
It can only tell you things actually included in the UA string itself, as it's just a parser and not a "knowledge engine".
https://github.com/donatj/PhpUserAgent
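To illustrate the "parser, not a knowledge engine" point, here is a toy sketch of what pure parsing gets you. It is not how PhpUserAgent is implemented; the function name, field names and regexes are my own simplifications. It pulls out the platform and Chrome major version, and there is simply no DPI field to extract.

```typescript
interface ParsedUA {
  platform: string | undefined; // first entry inside the first parentheses
  chromeMajor: string | undefined; // major version after "Chrome/"
}

// Hypothetical toy parser: reports only what is literally present in the string.
function parseUA(userAgent: string): ParsedUA {
  const platform = /\(([^;)]+)/.exec(userAgent);
  const chrome = /Chrome\/(\d+)/.exec(userAgent);
  return { platform: platform?.[1], chromeMajor: chrome?.[1] };
}
```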