
AI 2030: projections and predictions


(Header image generated with Nanobanana from the prompt “generate an image of this conversation for AI in 2030”)

2030 is less than 5 years away, and there has been much debate lately about whether AI is in a bubble, whether progress is slowing down, and whether it will cause mass unemployment and somehow destroy all humans. Despite the silly narratives in both directions, progress in general still seems to be happening at a good pace: there are now four major labs with cutting-edge models pushing new capabilities and breaking benchmark high-water marks every few months.

As an investor, though, I want to understand the potential sources of token demand and who will end up accumulating the revenue.

I worked a bit with GPT 5.2 thinking and GPT 5.2 pro to refine these projections.

OpenRouter December 2025

I’m writing this in December 2025 and I always like to snapshot OpenRouter just for a kind of vibe check on token usage.

For the remainder of this article xAI isn’t really included much, but it’s pretty clear they made grok-fast-4.1 very good and cheap, and people are using it for things (including me). How they turn that into a lot of revenue isn’t clear yet.

(Screenshot: OpenRouter token-usage snapshot, December 2025)

Revenue by inference application

First, let’s bucket each source of inference demand in 2030. It’s safe to assume consumer and enterprise assistants will still use a lot of tokens, but there are a ton of other applications we should think about.

This is annual inference spend run-rate in 2030 (what customers pay for model calls; excludes training). Base case total: $790B (sanity-checked in the sketch after the table).

| workload bucket | 2030 spend (US$B) | what matters for “who wins” |
|---|---|---|
| consumer assistants (chat/search/tutoring) | 180 | distribution + bundling |
| white-collar docs (legal/finance/hr/ops) | 80 | enterprise trust, compliance, integration |
| coding copilots | 70 | workflow lock-in; coding share already concentrated today (OpenRouter) |
| voice agents (call handling, scheduling, sales) | 70 | low latency + TTS/ASR specialists still matter (Poe) |
| cybersecurity & threat detection | 70 | on-prem + auditability + agentic triage |
| robots/cars/industrial autonomy control | 60 | edge inference + safety + real-time |
| customer support agents | 40 | cost curve + reliability |
| video creation | 35 | compute intensity; China already strong in usage on Poe (Poe) |
| video analysis / analytics | 35 | surveillance/retail/industrial adoption |
| observability & log intelligence (app/iot/infra) | 25 | “cheap intelligence” at scale |
| image creation & editing | 25 | model quality jumps cause rapid share flips (Poe) |
| image analysis / vision | 25 | edge + vertical CV stacks |
| healthcare & biology inference | 25 | regulation + validated tooling; FDA is starting to qualify AI tools (Reuters) |
| simulation & digital twins | 20 | engineering + industrial “digital ops”; big market, but inference is a slice (Grand View Research) |
| social media analysis & moderation | 15 | platform + brand safety |
| gaming/VR interactive agents | 15 | latency, local inference, UGC scale; huge gaming TAM but inference slice is smaller (Grand View Research) |
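
As a quick sanity check, the buckets above really do sum to the headline figure. A minimal sketch in Python (the short dict keys are my abbreviations of the table rows, not canonical names):

```python
# Sum the 2030 workload buckets (US$B) and confirm the $790B base case.
buckets = {
    "consumer assistants": 180, "white-collar docs": 80, "coding copilots": 70,
    "voice agents": 70, "cybersecurity": 70, "robots/cars/industrial": 60,
    "customer support": 40, "video creation": 35, "video analysis": 35,
    "observability & logs": 25, "image creation": 25, "image analysis": 25,
    "healthcare & biology": 25, "simulation & digital twins": 20,
    "social media moderation": 15, "gaming/VR agents": 15,
}
print(f"base case total: ${sum(buckets.values())}B")  # base case total: $790B
```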

Model company revenue shares

Next, who is going to be capturing this revenue?

| provider bucket | 2030 inference revenue (US$B) | share |
|---|---|---|
| Google | 176.0 | 22.3% |
| OpenAI | 172.3 | 21.8% |
| Neocloud / open weights (Together/Fireworks/Groq style) | 114.8 | 14.5% |
| Anthropic | 114.5 | 14.5% |
| China providers (ByteDance/Alibaba/etc, incl. video) | 77.0 | 9.7% |
| Other long tail | 60.0 | 7.6% |
| Edge / local OEM (auto, robots, devices) | 39.0 | 4.9% |
| Specialists (media/audio like ElevenLabs/Runway/etc) | 36.5 | 4.6% |
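
The share column is just each bucket’s revenue divided by the same ~$790B total from the workload table. A minimal sketch of that division (provider names abbreviated from the rows above):

```python
# Recompute each provider's share from the revenue column (US$B).
revenue = {
    "Google": 176.0, "OpenAI": 172.3, "Neocloud / open weights": 114.8,
    "Anthropic": 114.5, "China providers": 77.0, "Other long tail": 60.0,
    "Edge / local OEM": 39.0, "Specialists (media/audio)": 36.5,
}
total = sum(revenue.values())  # ~790.1, consistent with the workload table
for name, rev in revenue.items():
    print(f"{name}: {rev / total:.1%}")  # e.g. Google: 22.3%
```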

New paradigm inference sources

Next, we try to model out some new paradigms / sources of inference demand that don’t really exist today.

Specifically around the “agentic swarms” idea: I think we can assume it’s likely there will be a lot of agentic AI “employees” that join your Slack and then respond to things and do tasks.

In the background these agents should be able to work together to accomplish tasks. I can imagine that soon you will have a ton of virtual-environment-based agents, with their own accounts, controlling browsers and doing things end to end. That’s likely where we are headed very soon. I would be super happy to hand off tasks like accounting or Twilio compliance to something like this. Right now, tools like Manus or Claude computer use are just not that good yet: they are slow and make too many mistakes. That probably changes this year.

| paradigm | endpoints/users | events rate | compute per event | effective $/compute | 2030 spend ($B/yr) | cloud compute implied |
|---|---|---|---|---|---|---|
| agentic swarms (enterprise autopilots) | 300M active knowledge workers (subset of ~644M-997M) | 100 agent-steps/user/day | 10 H100-sec/step | $8 per H100-hr (blended) | 243.3 | ~30.4B H100-hr/yr (3.47M always-on eq.) |
| system 2 + verification tax (deep reasoning, proof-carrying output) | 500M deep runs/day (global) | per run | 200 H100-sec/run | $12 per H100-hr (premium) | 121.7 | ~10.1B H100-hr/yr (1.16M always-on) |
| video/world simulation (generative video, interactive worlds, sims) | 2B output-sec/day | per output-sec | 150 H100-sec/output-sec | $3.5 per H100-hr (commodity) | 106.5 | ~30.4B H100-hr/yr (3.47M always-on) |
| always-on digital twins (continuous monitoring + intervention) | 20M always-on twins | continuous | 0.04 H100 per twin 24/7 | $4 per H100-hr | 28.0 | ~7.0B H100-hr/yr (0.80M always-on) |
| ambient perception (cars + smart glasses) | 120M vehicles + 80M glasses paying; plus cloud assist | cloud assist: 60 H100-sec/endpoint/day for 200M endpoints | see left | edge subs: $15/mo vehicles, $8/mo glasses; cloud: $8/H100-hr | 39.0 | cloud only: ~1.22B H100-hr/yr (139k always-on); edge compute not counted |
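
Each of the pure-cloud rows above comes out of the same unit-economics arithmetic: endpoints × events per day × H100-seconds per event, annualized and priced per H100-hour. A minimal sketch of that calculation (the function and key names are mine; the ambient-perception row doesn’t fit it directly since it mixes edge subscriptions with cloud assist):

```python
# Unit economics behind the table: spend = endpoints * events/day
# * H100-sec/event * 365 days, converted to H100-hours and priced.
def paradigm_spend(endpoints, events_per_day, h100_sec_per_event, usd_per_h100_hr):
    h100_hr_per_yr = endpoints * events_per_day * h100_sec_per_event * 365 / 3600
    return {
        "spend_$B_per_yr": round(h100_hr_per_yr * usd_per_h100_hr / 1e9, 1),
        "h100_hr_per_yr_B": round(h100_hr_per_yr / 1e9, 1),
        "always_on_h100_M": round(h100_hr_per_yr / 8760 / 1e6, 2),  # 8,760 hr/yr
    }

# agentic swarms row: 300M workers, 100 steps/day, 10 H100-sec/step, $8/H100-hr
print(paradigm_spend(300e6, 100, 10, 8))
# {'spend_$B_per_yr': 243.3, 'h100_hr_per_yr_B': 30.4, 'always_on_h100_M': 3.47}
```

Plugging in the system 2 row the same way, `paradigm_spend(500e6, 1, 200, 12)` reproduces the $121.7B and 1.16M always-on figures.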

Conclusion

These numbers are obviously just projections, and we have no idea how they will eventually turn out. But this is roughly what you get if you start from today’s numbers and extrapolate inference trends forward.

I don’t think it’s crazy to say you could see somewhere between $500 billion and $1 trillion in annual inference revenue attributable to the major model providers and the open-weights neocloud providers; in the share table above, Google, OpenAI, Anthropic, and the neocloud bucket alone sum to roughly $578B.

People continue to miss that you can train models on any number of data modalities, and the applications are hard to even imagine right now.

If anything, it’s really entertaining to write these posts and then come back later to see how wrong I was.