
Kind of. They all run Stable Diffusion because it was released fully open source.

There’s still a competitive advantage to owning, training, and gatekeeping access to models. Midjourney and DALL-E are both superior to Stable Diffusion along many axes.

Monetizing models is tricky because it’s so cheap to run locally but so expensive in the cloud. Except that if you release your model so it can run locally, all advantage is lost.

I wonder if there is a way to split compute such that only the last 10% runs in the cloud?
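Something like that could in principle look as follows. It's only a rough sketch: the endpoint URL, the finish-diffusion API, and the local denoising loop are all made up for illustration, not real services.

    # Hypothetical split: run ~90% of the denoising steps locally, then ship
    # the intermediate latent to an imaginary cloud endpoint that finishes
    # the remaining steps with a bigger model and returns the decoded image.
    import io
    import requests
    import torch

    TOTAL_STEPS = 50
    LOCAL_FRACTION = 0.9  # fraction of denoising done on-device

    def denoise_locally(latent: torch.Tensor, steps: int) -> torch.Tensor:
        """Placeholder for the on-device part of the diffusion loop."""
        for _ in range(steps):
            pass  # a real implementation would call the local UNet + scheduler here
        return latent

    def finish_in_cloud(latent: torch.Tensor) -> bytes:
        """Send the partially denoised latent to a hypothetical cloud API."""
        buf = io.BytesIO()
        torch.save(latent, buf)
        resp = requests.post(
            "https://example.com/v1/finish-diffusion",  # imaginary endpoint
            data=buf.getvalue(),
            headers={"Content-Type": "application/octet-stream"},
        )
        resp.raise_for_status()
        return resp.content  # image bytes produced by the cloud-side decoder

    latent = torch.randn(1, 4, 64, 64)  # SD-style latent for a 512x512 image
    latent = denoise_locally(latent, int(TOTAL_STEPS * LOCAL_FRACTION))
    png = finish_in_cloud(latent)

The catch is that the last 10% of the steps may not be where most of the model's value lives, so the provider might not be gatekeeping much.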



Why is it expensive to run in the cloud and cheap to run on a device?

1. Commodity hardware can do the inference on a single instance (must be true if a user device can do it).

2. It’s apparently possible to run a video game streaming service for $10/month/user.

3. So users should be able to generate unlimited images (one at a time) for $10/month?

Maybe the answer is that the DALL-E/Midjourney models running in the cloud are super inefficient and Stable Diffusion is more efficient. So the services will need to care about optimization to get that kind of performance. But it’s not inherently expensive just because it runs in the cloud.


I wouldn’t assume those $10/mo gaming services are profitable.

It’s not that running in the cloud is more expensive. It’s that people already have a $2000 laptop or maybe even a $1600 RTX 4090. If I’ve got that, I don’t want to pay $20/month to 6 different AI services.

Sam Altman said ChatGPT costs like 2 cents per message. I’m sure they can get that way down. Their bills are astronomical. But the data they’re collecting is more valuable than the money they’re spending.

Stable Diffusion isn’t super fast; it takes 30 to 60 GPU seconds. There’s minimal consumer advantage to running it in the cloud. I’d run them all locally if I could.


If I try to pencil it out, $10/mo seems maybe doable at 10% utilization and downright lucrative at 1%.
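Rough numbers, all assumed (the $10 price is from the thread; the ~$100/month per-GPU figure is a guess at an owned, amortized consumer card plus power, not a cloud quote):

    # Back-of-the-envelope: one GPU serves subscribers one at a time, each
    # pays $10/month, and "utilization" is the fraction of the month an
    # average subscriber keeps a GPU busy.
    price_per_user = 10.0       # $/month subscription
    gpu_monthly_cost = 100.0    # assumed: amortized GPU + power + hosting

    for utilization in (0.01, 0.10):
        users_per_gpu = 1 / utilization        # subscribers sharing one GPU
        revenue_per_gpu = users_per_gpu * price_per_user
        margin = revenue_per_gpu - gpu_monthly_cost
        print(f"{utilization:.0%}: {users_per_gpu:.0f} users/GPU, "
              f"${revenue_per_gpu:.0f} revenue vs ${gpu_monthly_cost:.0f} cost "
              f"-> ${margin:.0f}/GPU/month margin")

    # 1%:  100 users/GPU, $1000 vs $100 -> $900/GPU/month margin
    # 10%: 10 users/GPU,  $100 vs $100  -> break-even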


The problem is (as always) the "bad user" case. You get some users who run at 100% utilization full time (or more, because depending on your model they might be able to run multiple instances). They'll be the ones doing things like running a Discord bot in a popular server, or reselling the image generation or something.

This kills your margin.


Many users will use the service at once though, not evenly distributed... so you might wanna overprovision. Which is basically what you don't wanna do - profitability is reached by underprovisioning.


I think for an AI generation service this problem is actually more solvable than usual. You can slow down how fast the results are returned, which will slow down the demand. Charge more for a higher tier that gets prioritized. People are going to be somewhat bothered if the result takes 10 seconds instead of 1 second, but it’s not the end of the world if it’s a rare event. If Netflix can’t keep up with demand and your video spends half the time buffering, that would be a worse failure mode.
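A toy version of that tiering, just to illustrate the shape of it (the tier names and the pacing knob are made up):

    # Paid-tier jobs jump the queue, free/relax jobs wait, and a single
    # worker loop paces how fast results come back.
    import heapq
    import itertools
    import time

    PRIORITY = {"fast": 0, "relax": 1}   # lower number = served first
    _queue, _counter = [], itertools.count()

    def submit(prompt: str, tier: str) -> None:
        # the counter breaks ties so same-tier jobs stay first-in-first-out
        heapq.heappush(_queue, (PRIORITY[tier], next(_counter), prompt))

    def worker_loop(run_inference) -> None:
        while _queue:
            _, _, prompt = heapq.heappop(_queue)
            run_inference(prompt)    # the actual GPU call goes here
            time.sleep(0.1)          # pacing knob: stretch this under load

    submit("castle at sunset", "relax")
    submit("studio portrait", "fast")    # served before the relax job
    worker_loop(lambda p: print("generating:", p))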


Yes yes, I was contemplating game streaming.


Yeah, lots of services are lucrative when people buy them but hardly use them!


Some random stats for successful web services (unit is average minutes of use per day per user):

YouTube - 19 minutes

Spotify - 140 minutes

TikTok - 95 minutes

Netflix - 71 minutes

So we’re looking at roughly a 1%-10% utilization range, depending on where your game streaming or AI inference app falls. You need to factor that in when figuring out the pricing; your competition certainly will.
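For reference, those utilization figures are just minutes of use divided by the 1,440 minutes in a day:

    minutes_per_day = {"YouTube": 19, "Spotify": 140, "TikTok": 95, "Netflix": 71}
    for name, mins in minutes_per_day.items():
        print(f"{name}: {mins / (24 * 60):.1%} of the day")
    # YouTube: 1.3%, Spotify: 9.7%, TikTok: 6.6%, Netflix: 4.9%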


My intuition tells me GPU utilization is very different. Those services are egress bound. Egress is super elastic and can be scaled to stupefyingly large numbers.

GPU capacity is less scalable. No GPU cloud service is particularly popular. I don’t think any of them are profitable. Having 1:1 GPUs to users is tough.

Gaming is especially difficult because it’s super latency sensitive. Which means you need racks of expensive GPUs sitting in hundreds of edge nodes. I’m super bearish on cloud gaming still.

ML tools aren’t that latency sensitive. They’ll exist and they’ll be profitable. But I think the economics are tough. And as a consumer I’d still greatly prefer to run locally. Which means there’s a tension between “good for the customer” and “good for the business”.


Nvidia’s business model makes it inherently more expensive to run on the cloud.


Ah, do you have to sign a contract when you buy the cheap GPUs saying that you might use them for game streaming but you won’t do AI inference?

Makes me wonder if you could first-sale-doctrine your way out of that problem by buying the GPUs on eBay and not making any agreement with Nvidia.


The software is proprietary and is governed by the license. It's not the hardware.


Can't wait for a court to toss that particular one out. "Consumers who purchase a product containing a copy of embedded software have the inherent legal right to use that copy of the software" (Chamberlain v. Skylink)


The drivers are not embedded in the hardware. They are gigabytes of additional downloads.


They are required to use the hardware, however. They also come with Windows by default.


AFAIK Nvidia restricts which GPUs you can run in a datacenter, so you cannot buy, for instance, an RTX 4090 and use it in a datacenter. You need to buy the datacenter cards, which are much more expensive.


GPUs are expensive. You need at least 10 GPUs to quickly render Stable Diffusion images. If you want to run a service you need more of them. Thousands of dollars per month is easily reached.


>> Monetizing models is tricky because it’s so cheap to run locally but so expensive in the cloud.

Can you expand on this a bit? The way I'm thinking, that is only the case if you need low latency. And in that case, it seems you just need to charge enough to cover compute.

We're running Stable Diffusion on an EKS cluster, which evens out the load across calls and prevents over-provisioning.

If latency isn't an issue, it can be run on non-GPU machines. If you're looking for something under $300 or $400/mo, then I agree it may be an issue.

On that note, I haven't checked whether there are Lambda/Fargate-style options which provide GPU power, to achieve consumption-based pricing tied to usage, but that might be a route. Can anyone speak to this?


>On that note, I haven't checked whether there are Lambda/Fargate-style options which provide GPU power, to achieve consumption-based pricing tied to usage, but that might be a route. Can anyone speak to this?

https://lambdalabs.com/service/gpu-cloud


Thanks for this. This is nice and the prices are great... but I was specifically curious about something where consumption can be tied to cost (e.g. Lambda/Fargate style, where you pay by the call).


It's not quite Lambda, but GKE Autopilot supports GPU workloads, so it could be a relatively easy way to do this.

You could have a REST service sticking incoming requests into a queue, and then a processor deployment picking jobs off the queue using GPU resource requests / spot instances. You'd probably also want something scaling the processor deployment replicas based on the queue depth and your budget (a rough sketch of the processor side follows below the link).

I haven't compared the pricing to EKS so unsure if it would really be better financially, but it would avoid having to manage scaling up/down GPU nodes explicitly.

https://cloud.google.com/kubernetes-engine/docs/how-to/autop...
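For what it's worth, the processor side could be as simple as the sketch below. Assumptions on my part: a Redis list as the queue, a made-up payload shape, and a generate() stub where the model call would go.

    # Pull jobs from a Redis list and run them on whatever GPU the pod was
    # scheduled onto; a separate REST pod does RPUSH on the same list.
    import json
    import redis

    QUEUE = "sd-jobs"
    r = redis.Redis(host="redis", port=6379)

    def generate(prompt: str) -> bytes:
        raise NotImplementedError  # load the model once at startup, run inference here

    def main() -> None:
        while True:
            _, raw = r.blpop(QUEUE)              # blocks until a job arrives
            job = json.loads(raw)
            image = generate(job["prompt"])
            r.set(f"result:{job['id']}", image)  # the REST pod polls for this key

    if __name__ == "__main__":
        main()

Scaling the replica count on the Redis list length (e.g. with something like KEDA) would cover the "scale on queue depth" part.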


https://www.banana.dev/ has been working on the Lambda-style thing. I haven't tried it, but it looks very impressive.


> If you're looking for something under $300 or $400/mo, then I agree it may be an issue.

Yeah. These models don’t need special resources to run. As a consumer I would prefer to buy a 4090 and then run everything locally. I don’t want to pay $10 or $20 monthly subs to a half dozen different AI services. All professional software turning into subscription services sucks.

Midjourney charges $30/mo for unlimited “relax” time and 15 hours of fast GPU time. That’s not too bad. But multiply that by 6 services and a 4090 pays for itself in under a year.
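The payback arithmetic, with the numbers from this thread (all approximate):

    services = 6
    monthly_sub = 30.0      # Midjourney-style price per service
    gpu_price = 1600.0      # rough RTX 4090 street price

    print(gpu_price / (services * monthly_sub))   # ~8.9 months to pay off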


Midjourney is completely different from SD on a computational level. SD is optimized for speed: it takes about 5 seconds to generate a 512x512 image, and their internal optimizations are bringing it down to 0.5 seconds (stated on their Twitter). To achieve this, they do one-shot generation straight to 512x512, without upscaling slowly from 64x64 -> 256x256 -> 512x512.

Midjourney is optimized for quality. It actually does do the gradual upscaling, which is the approach the Imagen and eDiff-I papers demonstrated. This results in far better quality, but it's extremely taxing and slow. Even on 'fast' mode it runs like a snail compared to SD. I don't think it'll work on anything below an A100.
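For reference, the single-stage SD path looks like this with the diffusers library (the model ID and step count are just common choices, nothing specific to the services above):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Diffusion runs in a 64x64 latent space and the VAE decodes straight to
    # 512x512 in one pass -- no 64 -> 256 -> 512 super-resolution cascade.
    image = pipe("a watercolor fox in a forest", num_inference_steps=25).images[0]
    image.save("fox.png")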



