The AI Training Data Dilemma: Human Native AI’s Solution
The AI revolution is in full swing, and large language models and other AI systems need massive amounts of data to train accurately. However, these systems shouldn’t train on data they don’t have the rights to use. The recent licensing deals between OpenAI and The Atlantic, as well as Vox, show that both AI companies and publishers are interested in striking content-licensing agreements for AI training.
A marketplace for AI training data
Human Native AI, a London-based startup, is building a marketplace to broker such deals between companies building large language models (LLMs) and those willing to license data to them. The goal is to help AI companies find data to train their models on while ensuring that rights holders opt in and are compensated.
“It feels like we are in the Napster-era of generative AI. Can we get to a better era? Can we make it easier to acquire content? Can we give creators some level of control and compensation?” - James Smith, CEO and co-founder of Human Native AI
The company, which launched in April, is currently operating in beta and has already signed a handful of partnerships that will be announced in the near future. This week, Human Native AI announced a £2.8 million seed round led by LocalGlobe and Mercuri, two British micro VCs.
The future of AI training data licensing
Smith said demand from both sides has been encouraging, and the company has already seen interest from CEOs of 160-year-old publishing companies. The big AI players need lots of data to train on. Giving rights holders an easier way to work with them, while keeping full control over how their content is used, could serve both sides of the table.
“Sony Music just sent letters to 700 AI companies asking that they cease and desist. That is the size of the market and potential customers that could be acquiring data.” - James Smith
The number of publishers and rights holders could be in the thousands, if not tens of thousands. Human Native AI hopes to provide the infrastructure to facilitate these deals, making it easier for smaller AI systems to access data and leveling the playing field.
Pricing content from deal history
The data that Human Native AI collects also has future potential. Smith said that, over time, the company will be able to give rights holders more clarity on how to price their content, based on the history of deals made on the platform.