Hacker News
calaphos · 10 months ago · on: Batch Mode in the Gemini API: Process More for Les...
Inference throughput scales really well with larger batch sizes (at the cost of latency) due to rising arithmetic intensity and the fact that it's almost always memory-bandwidth limited.
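A back-of-envelope sketch of why this holds (my own illustration, not from the comment; the numbers and helper are hypothetical): for a single d × d weight matrix applied to a batch of B token vectors, the FLOP count is about 2·B·d², while the dominant memory traffic is loading the weights once (~2·d² bytes in fp16). Arithmetic intensity therefore grows roughly linearly in B until the accelerator becomes compute-bound:

```python
def arithmetic_intensity(batch: int, d: int, bytes_per_param: int = 2) -> float:
    """FLOPs per byte moved for a batched matrix-vector product.

    Assumes fp16 weights/activations and that weights are read once
    per batch -- a rough model, ignoring caches and KV-cache traffic.
    """
    flops = 2 * batch * d * d                     # multiply-adds for B matvecs
    weight_bytes = bytes_per_param * d * d        # weights loaded once
    act_bytes = bytes_per_param * 2 * batch * d   # read inputs, write outputs
    return flops / (weight_bytes + act_bytes)

# Intensity rises near-linearly with batch size while weights dominate traffic:
for b in (1, 8, 64, 512):
    print(b, round(arithmetic_intensity(b, d=4096), 1))
```

At batch 1 the ratio is about 1 FLOP per byte, far below the compute/bandwidth ratio of modern GPUs, which is why single-stream decoding is bandwidth-bound and batching recovers throughput almost for free.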