Llama Is Open-Source, But Why?
Training a large language model can cost millions of dollars. Why would Meta spend so much money training a model and letting everyone use it for free? This article analyzes Meta’s GenAI and large model strategy to understand the considerations of open-sourcing their large models. We also discuss how this wave of open-source models is similar to and different from traditional open-source software.
The Illusion of Proprietary Models
If Meta open-sources its models, wouldn’t people just build their own services instead of paying for the service (e.g., the chatbot on Meta AI, an API based on Llama, or helping you fine-tune the model and serve it efficiently) provided by Meta?
Preventing people from building their own solutions by keeping your models proprietary is an illusion. Regardless of whether you open-source your models, others will: Mistral AI, Alibaba, and even Google have open-sourced theirs.
“Unless your model is better than every other open-source model by orders of magnitude, keeping it proprietary won’t affect the quality of the applications users can build on open-source models.”
Why Be the Leader of Open-Source Models?
Being the leader of open-source models has many benefits, but the most important is attracting talent. The GenAI race is a talent competition bottlenecked by computing power. How much computing power you get largely depends on your cash flow and your relationship with Nvidia (Google, with its own TPUs, being the exception). How much talent you have, however, is another story.
According to Elon Musk, Google once had two-thirds of the top AI talent, and OpenAI was founded to counter that concentration of power. Later, some of OpenAI’s best people left to found Anthropic, with a focus on AI safety. These three companies currently have the best and the most AI experts on the market; everyone else is hungry for more.
Being the leader of open-source models would help Meta bridge this gap of AI experts. Open-source models attract talent in two different ways.
First, AI experts want to work for Meta. It is exciting to have the whole world use the model you built: it gives your work exposure, amplifies your professional impact, and benefits your future career. Many talented people want to join for exactly that reason.
Second, the AI experts in the community do work for Meta for free. Right after the release of Llama, people started experimenting with it. They develop new serving techniques to reduce costs, fine-tune the models to uncover new applications, and scrutinize them for vulnerabilities to make them safer.
Iterate Fast with the Community
With open-source models, Meta can iterate quickly with the community by directly incorporating their newly developed methods.
How much would it cost Google to adopt a new method from the community? The process has two phases: implementation and evaluation. First, they must reimplement the method for Gemini, which means rewriting the code in JAX and takes a fair amount of engineering effort. Then, during evaluation, they must run a list of benchmarks, which consumes a lot of computing power. Most importantly, all of this takes time, which keeps them from iterating on the latest techniques as soon as they become available.
Conversely, if Meta wants to adopt a new method from the community, it costs them almost nothing. The community has already run its experiments and benchmarks directly on Llama, so little further evaluation is needed. And since the code is written in PyTorch, they can practically copy and paste it into their own system.
Can They Still Make Money?
The model is open-source, so wouldn’t people just build their own service? Why would they pay Meta for a service built on an open-source model? They would, because a good service is difficult to build even with an open-source model.
How do you fine-tune and align the model to your specific application? How do you balance between the service cost and the model quality? Are you aware of all the tricks to fully utilize your GPUs?
The people who know the answers to these questions are expensive to hire. And even with enough people, it is hard to secure the computing power to fine-tune and serve the model. Imagine how hard it would be to build Meta AI from the open-source Llama model; I would expect hundreds of employees, along with a large fleet of GPUs, to be involved.
It’s Just Like Open-Source Software, but Not Quite
The situation is very similar to traditional open-source software. The “free code, paid service” framework still applies: the code, or the model, is free, which attracts more users into the ecosystem; the larger the ecosystem, the more benefits its owner collects; and the service built on top of the free code is where the profit is.
However, it is also NOT quite like open-source software. The main differences are lower user retention and a new type of ecosystem.
Low User Retention
Open-source models have lower user retention. Migrating to a new model is much easier than migrating to new software.
It is hard to migrate away from software. PyTorch and HuggingFace have established strong ecosystems as, respectively, a deep learning framework and a model hub. Imagine how hard it would be to shift their dominance even slightly with a new competing framework or model hub.
A good example is JAX. It has better support for large-scale distributed training, but onboarding users is hard because its ecosystem and community are smaller, so there are fewer people to help when users run into issues. Moreover, the engineering cost of migrating an entire infrastructure to a new framework is too high for most companies.
Open-source models do not have these problems: they are easy to swap out and require almost no user support, so users readily shift to the latest and best model. To keep your leadership in open-source models, you must constantly release new models at the top of the leaderboards. That is the downside, or at least the challenge, of being the leader.
A New Type of Ecosystem
Open-source models create a new type of ecosystem. Unlike open-source software, which creates ecosystems of contributors and new software built upon them, open-source models create ecosystems of fine-tuned and quantized models, which can be seen as forks of the original model.
As a result, an open-source foundational model doesn’t have to excel at every specific task, because users fine-tune it for their applications with domain-specific data. What matters most is that the foundational model meets users’ deployment requirements, such as low inference latency or being small enough to fit on an end device.
This is why each Llama version comes in multiple sizes. Llama-3, for example, comes in three sizes: 8B, 70B, and 400B. Meta wants to ensure the lineup covers all deployment scenarios.
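To see why size drives deployment, a back-of-envelope calculation helps: the memory needed just to hold a model’s weights is roughly the parameter count times the bytes per parameter. The sketch below (my own illustration, not an official sizing guide) ignores activation and KV-cache overhead, which add more at inference time.

```python
def weight_memory_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Rough memory (in GB, 1 GB = 1e9 bytes) to hold the weights alone."""
    bytes_per_param = bits_per_param / 8
    return num_params_billion * 1e9 * bytes_per_param / 1e9


# Estimate each Llama-3 size at fp16 (16-bit) and at 4-bit quantization.
for size in (8, 70, 400):
    fp16 = weight_memory_gb(size, 16)
    int4 = weight_memory_gb(size, 4)
    print(f"{size}B params: ~{fp16:.0f} GB at fp16, ~{int4:.0f} GB at 4-bit")
```

By this estimate, the 8B model at 4-bit fits in a few gigabytes, which is why the small sizes matter for end devices, while the 400B model needs a multi-GPU server even before accounting for runtime overhead.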
Summary
Even if Meta did not open-source its models, others would. So it is wise for Meta to open-source early and lead the open-source model space. Meta can then iterate quickly with the community to improve its models and catch up with OpenAI and Google.
When open-sourcing a model, there is no need to worry that people will stop paying for your service: a huge gap remains between a foundational model and a well-built service.
Open-source models are similar to open-source software in that both follow the “free code, paid service” framework, but they differ in user retention and in the type of ecosystem they create.
In the future, I expect to see more open-source models from more companies. Unlike deep learning frameworks, which converged on PyTorch, open-source models will remain diverse and competitive for a long time.