With the AI infrastructure push reaching staggering proportions, AI companies are under more pressure than ever to squeeze as much inference as possible out of the GPUs they have. And for researchers with expertise in a particular technique, it’s a great time to raise funding.
That’s part of the driving force behind Tensormesh, launching out of stealth this week with $4.5 million in seed funding. The investment was led by Laude Ventures, with additional angel funding from database pioneer Michael Franklin.
Tensormesh is using the money to build a commercial version of the open source LMCache utility, launched and maintained by Tensormesh co-founder Yihua Cheng. Used well, LMCache can reduce inference costs by as much as 10x, a capability that has made it a staple in open source deployments and drawn integrations from heavy hitters like Google and Nvidia. Now Tensormesh is planning to parlay that academic reputation into a viable business.
The core of the product is the key-value cache (or KV cache), the memory structure where a model stores the intermediate key and value tensors it computes for each token of its input, so it doesn’t have to recompute them for every new token it generates. In traditional architectures, the KV cache is discarded at the end of each query, but Tensormesh co-founder and CEO Junchen Jiang argues that this is an enormous source of inefficiency.
“It’s like having a very smart analyst reading all the data, but they forget what they have learned after each question,” says Jiang.
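To make the mechanics concrete, here is a minimal single-head attention sketch in Python. It is illustrative only, not Tensormesh or LMCache code: every past token’s key and value tensors sit in the cache, so each new token computes only its own, and discarding the cache at the end of a query discards all of that work.

```python
# Illustrative sketch of a KV cache in autoregressive attention
# (a generic single-head example; not Tensormesh or LMCache code).
import numpy as np

d = 64                                   # head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))  # stand-in weights

k_cache, v_cache = [], []                # the "KV cache": one entry per past token

def attend(x):
    """Process one new token, reusing cached keys/values for all past tokens."""
    q = x @ Wq
    k_cache.append(x @ Wk)               # compute K/V for the new token only...
    v_cache.append(x @ Wv)               # ...everything earlier comes from the cache
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)          # attend over all cached positions
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

for _ in range(5):                       # decode five tokens
    attend(rng.standard_normal(d))
# Dropping k_cache and v_cache here forces the next query to redo all of it.
```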
Instead of discarding that cache, Tensormesh’s systems hold onto it, allowing it to be reused when a later query covers the same or overlapping input. Because GPU memory is so precious, this can mean spreading the data across several different storage layers, but the reward is significantly more inference power from the same server load.
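What that retention might look like in practice is a lookup structure keyed by the input the tensors came from, with entries spilling to cheaper tiers as GPU memory fills. The sketch below is a hedged illustration of the idea only; the class name, tier sizes, hashing scheme, and LRU spill policy are placeholder assumptions, not LMCache’s actual design.

```python
# Hedged sketch of cross-query KV reuse with tiered storage. The class name,
# tier sizes, and LRU spill policy are illustrative, not LMCache's design.
import hashlib
from collections import OrderedDict

class TieredKVStore:
    def __init__(self, gpu_slots=2, cpu_slots=8):
        self.gpu = OrderedDict()                     # fastest, smallest tier
        self.cpu = OrderedDict()                     # larger, slower tier
        self.gpu_slots, self.cpu_slots = gpu_slots, cpu_slots

    @staticmethod
    def _key(prefix_tokens):
        return hashlib.sha256(str(tuple(prefix_tokens)).encode()).hexdigest()

    def put(self, prefix_tokens, kv_tensors):
        self.gpu[self._key(prefix_tokens)] = kv_tensors
        while len(self.gpu) > self.gpu_slots:        # spill oldest entries to CPU
            old_key, old_kv = self.gpu.popitem(last=False)
            self.cpu[old_key] = old_kv
        while len(self.cpu) > self.cpu_slots:        # evict (or spill to disk/network)
            self.cpu.popitem(last=False)

    def get(self, prefix_tokens):
        key = self._key(prefix_tokens)
        if key in self.gpu:
            return self.gpu[key]                     # hit: skip recomputing this prefix
        if key in self.cpu:
            self.gpu[key] = self.cpu.pop(key)        # promote back to the fast tier
            return self.gpu[key]
        return None                                  # miss: prefill must recompute
```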
The change is particularly powerful for chat interfaces, since models need to continually refer back to the growing chat log as the conversation progresses. Agentic systems have a similar issue, with a growing log of actions and goals.
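Continuing the sketch above, that pattern maps directly onto a chat: each turn’s prompt is the previous turn’s prompt plus the newest messages, so a prefix lookup hits the cache for everything but the new tail. The token IDs below are hypothetical.

```python
# Usage of the TieredKVStore sketch above for a growing chat log.
store = TieredKVStore()
turn1 = (101, 2054, 2003)                  # hypothetical token IDs for turn 1
store.put(turn1, "kv-tensors-for-turn-1")  # placeholder for the real KV tensors
turn2 = turn1 + (1999, 2023)               # turn 2 extends turn 1's prompt
cached = store.get(turn1)                  # hit: only the new tail needs prefill
```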
In theory, these are changes AI companies could make on their own, but the technical complexity makes it a daunting task. Given the Tensormesh team’s research pedigree and the intricacy of the problem itself, the company is betting there will be lots of demand for an out-of-the-box product.
“Keeping the KV cache in a secondary storage system and reused efficiently without slowing the whole system down is a very challenging problem,” says Jiang. “We’ve seen people hire 20 engineers and spend three or four months to build such a system. Or they can use our product and do it very efficiently.”