Hacker News
averne_
1 hour ago
| on:
Real-time LLM Inference on Standard GPUs: 3k token...
The blog makes it clear that "standard" GPU here is meant in contrast to purpose-built hardware like Cerebras. The selling point is reaching the same order of magnitude in generation speed as those approaches.