Betting on Data: AI’s Value Mismatch

It occurs to me that the valuations of many “AI companies” are too high relative to the valuations placed on certain proprietary datasets. Many of these datasets are owned, controlled, and maintained by companies whose valuations are driven by more traditional means, but whose businesses stand to benefit immensely from further AI development because of the data they maintain.

In the realm of AI, tech companies and startups are making substantial investments in research and development, and are receiving a substantial lift in valuation from the potential of the technology. However, they are only able to create solutions for scenarios for which they have sufficient training data. This has constrained AI development to datasets that are publicly available or that can be easily manufactured through newly formed operations, sometimes via labeling services like Amazon Mechanical Turk.

This explains why we are seeing generative AI models do amazing things in language, code, or image generation: There is an abundance of free data online upon which to train a model for these purposes. However, for AI to become useful in many industries, and for AI to deliver a true competitive advantage to any firm, it will require models trained on high-value, proprietary datasets.

On the other side of the landscape there are established firms, often non-tech or non-AI-focused, that possess incredibly valuable datasets that cannot be found online or easily replicated without building the specific businesses they operate. Despite holding the key to creating highly valuable AI models, most of these firms are not directing their investments toward AI R&D, and thus are not experiencing the associated valuation lift.

Specialty insurance, agriculture, and healthcare all sit in this sweet spot, along with several other industries. Each is especially data-rich, and firms in these fields rely heavily on their data to succeed. Unlike the training data for existing language models like ChatGPT, which is drawn from publicly accessible writing on the internet, you won’t find claim data from insurance providers, yield data from agricultural companies, or patient data from healthcare firms floating around online. Each firm receives a competitive advantage – or disadvantage – based on the quality and quantity of its proprietary data, one that future AI implementation will greatly amplify.

Bowery, a NYC-based vertical farming company that has raised over $700 million, recognized from its start the value of agricultural data as AI advanced. Alongside its vertical farms, it built BoweryOS, an AI-powered system that adjusts the amount of light, water, and nutrients an individual plant receives based on several inputs, including photos of the plants analyzed by computer vision. By playing both sides of the equation, generating massive amounts of proprietary data and developing the AI to maximize its value, Bowery enjoyed valuation multiples that would make a traditional farmer gasp. But as interest rates have risen and venture capital has dried up, Bowery has become a perfect demonstration of how difficult it is to develop both the AI and the underlying business operation required to train it. Fidelity has written down Bowery's valuation by over 85% from its 2021 peak, which is still a far better outcome than that of several peers, such as AeroFarms, AppHarvest, and Kalera, all of which have filed for bankruptcy this year.

Now, I am personally a big fan of what Bowery and similar firms are building, and despite the setbacks, I still see a bright future for their technology. The lesson in all of this, for me at least, is that there is a class of businesses out there that, because of the operational data they already produce, are wildly undervalued, especially when held next to companies that are developing only the technology half of the AI equation or trying to tackle all of it at once. It seems far more likely that an established company with great data will leverage AI than that a company with great technology will manufacture comparable data, especially through the various economic cycles we’re likely to experience before AI is fully adopted.

In a few years’ time, a handful of firms, or fewer, will win big in the race to develop the best AI technology, and they will undoubtedly have spent billions of investor dollars in the process. Meanwhile, there are countless firms that have yet to realize they are sitting on a goldmine of data, one that will become far more valuable as venture capital and big tech subsidize AI development. For now, the companies developing the tech enjoy inflated valuations, and the companies sitting on the data are being presented with a huge opportunity.