Understanding the Business of Open Source Software and AI
Recently, I had dinner with Eric Flaningam, an investor, (excellent) writer, and one of our cult members in NYC. We talked about a bunch of interesting topics, including open-source software (OSS) in AI and why companies invest so much in it. After all, how does a company gain from putting so much time and effort into R&D for a tool only to give it to the public for free, especially since doing so exposes possible trade secrets/competitive advantages to its competitors?
This article will look to answer that question from a purely business perspective. To do so, we will look at various kinds of stakeholders in the Tech Ecosystem (focusing on AI) and how each can leverage OSS for their benefit. Once that is covered, we will review the different strategies companies can use to leverage Open Source and increase business adoption.
For conciseness- I’ll keep the highlights section more focused on the principles and go into more details and examples in the main article.
Executive Highlights (TL;DR of the article)
To fully understand the ideas discussed, it helps first to clear a key misunderstanding that often comes up in this discussion.
The False Dichotomy Between Open Source and Closed Source Software in AI
Conversations around Open Source (we will use OS for this for simplicity) often frame OSS and Closed Software in direct opposition to each other. Open source is often misconstrued as simply “free software” with no monetization potential. However, this narrow view ignores the vast ecosystem and diverse business models that exist in tech (every closed project would break without OS components, and OS projects rely extensively on Closed Companies funneling money into the projects).
Open Source is great for solving large problems that affect lots of people. Closed Software takes the general solution created by OS projects and refines its implementation for the specific use cases required by specific people. Without OSS, Closed Software would have to build everything from scratch. Without Closed Software, Open Source solutions would often remain inaccessible or unusable to the average person, and their potential impact would be significantly diminished.
Thus, Open and Closed Software are often complementary forces that are blended together to create a useful end product.
To craft a good OS strategy for a company, it helps to understand how it impacts different entities, which is what we will discuss next.
How Open Source Helps Different Stakeholders in AI
Developers (I’m grouping researchers into this group as well): Open-source provides developers access to cutting-edge algorithms, models, and tools. Platforms like TensorFlow, PyTorch, and Google’s BERT models have enabled programmers to experiment with advanced AI technologies without prohibitive costs. This access accelerates learning and fosters innovation by allowing developers to study and contribute to high-quality AI projects alongside experts worldwide. Participation in open-source AI projects enhances career prospects as developers build public portfolios showcasing expertise in a highly competitive field.
Businesses: It is important to look at two perspectives- the adopters (companies using existing tools) and the builders (companies building and sharing OSS). The adoption perspective is relatively easy to understand- adopting preexisting OS tools allows you to reduce costs, build more secure systems (you have a better understanding of the vulnerabilities), and iterate quickly.
What about builders? Companies that share their software get better street cred, outsource a lot of R&D to people for free, and hook more people into their ecosystem (which helps with customer acquisition AND reduces employee training costs). All of these are huge gains that they get for relatively little downside (a builder would have spent that money building a tool for internal use anyway, so they lose very little by sharing).
to ensure that we have access to the best technology and aren’t locked into a closed ecosystem over the long term, Llama needs to develop into a full ecosystem of tools, efficiency improvements, silicon optimizations, and other integrations. If we were the only company using Llama, this ecosystem wouldn’t develop and we’d fare no better than the closed variants of Unix.
-Meta’s “Open Source AI Is the Path Forward” is a good read on the benefits of OS AI
Some people worry that open-sourcing their tools will lead to competitors catching up. However, this concern is often overstated since successful products require refined execution, deep business relationships, network effects, established reputation, lots of resources, and the ability to handle a million little edge cases. That’s much harder to replicate than technical know-how (I have all the knowledge and free tools to build a Facebook or ChatGPT clone, but that would do very little for me when it comes to building viable competitors).
End-users: End-users benefit from AI-powered applications that are improved through open-source collaboration. The widespread use of open-source AI frameworks leads to more robust, efficient, and feature-rich products. The more crowd-sourced nature of OSS generally leads to downward pressure on costs and improved technology accessibility (look at how the OS community reduced the costs of LLMs after Llama weights were leaked).
Government and Public Sector: Governments have a lot of potential to leverage the benefits of OSS for more security and fairness. Many of the AI fairness features are closer to organizations' cost centers (they increase costs without increasing revenue) and thus will be underprioritized in purely free market situations. Thus, they require more proactive regulation. OS is a great way to foster public-private partnerships, ensuring safety does not become innovation-killing regulation. I will write a separate article covering this topic in more detail.
How Firms Should Adapt Their Strategy to Benefit from Open Source in AI
There are several ways for organizations to integrate OSS to meet their business goals-
Support and Services: Firms can offer specialized support, consulting, and customization services for open-source AI tools. Given the complexity of AI technologies, businesses often require expert assistance to implement and optimize AI solutions effectively. This is how I primarily avoid starvation. My newsletter shares all the important techniques and discussions for free. My clients hire me not to give them any secret techniques but to help them judge which ideas would best suit their needs (and implement them if required).
Providing training and certification in open-source AI frameworks can also generate revenue and build a community of skilled users who rely on the firm’s expertise. Platform players, however, typically use free courses/certs to get more people on their platform. For example, GCP (Google Cloud Platform) gives out free courses for Google Cloud, which adds more GCP developers to the market. This makes it more likely for neutral companies to use Google Cloud when building solutions (since the architects will rely on it). All major cloud providers do this; I chose GCP as an example since I think their dev-rel has been a step above the competitors.
Dual Licensing: Companies can employ dual licensing for their AI software, offering an open-source version to encourage widespread adoption and a commercial version with additional features or support. This strategy allows the firm to benefit from community contributions to the open-source version while generating revenue from enterprise clients who need enhanced capabilities or guaranteed service levels.
A powerful variant of this strategy often involves giving students and educational institutions free versions of premium products. This gets the students hooked on the product, making them much more willing to stick to it even when they have to pay (it also enables cross-selling). By getting more students familiar with the platforms/solutions, it also turns them into passive salespeople (similar to the GCP strat from the previous section).
Open Core Model: In the AI sector, an open-core model involves releasing the foundational AI algorithms or models as open source while offering proprietary tools or platforms that enhance these models for commercial use. This approach encourages community engagement and innovation at the core level while monetizing the added value provided by proprietary enhancements, such as user-friendly interfaces, scalability solutions, or advanced analytics.
Hugging Face, which has raised nearly $400 million from tech giants including Alphabet and Nvidia, helped develop the open-source BLOOM and StarCoder models. Rather than focusing on developing models, however, the startup sells compute power and enterprise support for other open models and its open-source model-sharing platform, said Jeff Boudier, head of product at Hugging Face.
“Open models tend to create an ecosystem, whereas closed models just tend to find customers,” he said. “That’s a ten, one hundred times multiplier.”
Hosted AI Services: Providing cloud-based, managed AI services based on open-source tools can be highly profitable. Companies can offer scalable AI platforms that allow clients to deploy machine learning models without managing the underlying infrastructure. This is a play to monetize convenience and is extremely powerful in a fragmented sector like AI- where there are 30 different options for every step of the process.
Complementary Proprietary Products: Firms can develop proprietary applications or tools that integrate with open-source AI frameworks. Offering specialized AI model deployment platforms, performance optimization tools, or domain-specific AI solutions adds customer value and creates revenue opportunities. Companies encourage long-term use of their proprietary products and the underlying open-source AI technologies by ensuring seamless integration.
Partnership and Ecosystem Development: Collaborating with other organizations to create integrated AI solutions expands market opportunities. By building ecosystems around open-source AI tools, firms can influence industry standards and benefit from network effects. Partnerships can lead to co-development of AI models, shared research, and joint offerings that enhance the value proposition for all parties involved. Fostering such ecosystems encourages innovation and accelerates the deployment of AI technologies across industries.
A good case study for this is Nvidia and GPUs. One major reason GPUs became mainstream as the “AI Chip” before TPUs (despite TPUs being better) was the stronger community support and awareness for GPUs. This made developers more likely to optimize AI solutions on GPUs, which created more awareness, and so on. Creating strong partnerships and ecosystems is probably more important for technical adoption than actual performance (what good is the best product if I have to kill myself to use it?).
That is an overview of how I view the business of Open-Source in tech. We will build on these ideas throughout the rest of the article.
I put a lot of work into writing this newsletter. To do so, I rely on you for support. If a few more people choose to become paid subscribers, the Chocolate Milk Cult can continue to provide high-quality and accessible education and opportunities to anyone who needs it. If you think this mission is worth contributing to, please consider a premium subscription. You can do so for less than the cost of a Netflix Subscription (pay what you want here).
Many companies have a learning budget, and you can expense your subscription through that budget. You can use the following for an email template.
I provide various consulting and advisory services. If you’d like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.
How Open Source Can Promote Business
Businesses like Red Hat homed in on how to leverage open source to create professional products. They took a free good–Linux–and turned it into a stress-tested product, Red Hat Enterprise Linux. In doing so, they focused on removing the pain points for CTOs who needed reliable computing stacks with turnkey maintenance. Open source code alone doesn’t provide that. Red Hat was selling what the customer needed. This approach to business is not new. As Henry Ford said, “A business absolutely devoted to service will have only one worry about profits. They will be embarrassingly large.”
People who are unfamiliar with tech often assume that OSS acts at best as a loss leader- where companies give away free software to gain market share and distract from the competition. In their minds, Software exists on two extremes- closed and open, with Open Software or AI Models being a free (and inferior) version of their closed counterparts.
This incomplete representation reduces a very complex and dynamic ecosystem into a false binary. It’s like trying to boil my GOAT Antony down to something simplistic like United Winger who spins a lot. In reality, OSS is a different paradigm from closed software, and companies benefit from combining both in their development.
Because it leverages crowd-sourced expertise, OSS is really good at the Macro- solving big, important problems that affect tons of people. Consequently, OS Projects often form foundational components- frameworks, platforms, core technologies, etc. However, you can’t build a successful product by solving the average problem because the average person does not exist. People have specific challenges, quirks, and advantages, and solving those is what they will pay for. A strong foundation is important for a house, but no one will buy a house with only a foundation. Closed Software is great at this kind of specialization, and thus many successful AI/Tech companies build massive businesses around OS Tech to solve specific challenges for specific enterprises.
Consider the example of Databricks, a company built around Apache Spark, an open-source data processing engine-
Since its release, Apache Spark, the unified analytics engine, has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at massive scale, collectively processing multiple petabytes of data on clusters of over 8,000 nodes. It has quickly become the largest open source community in big data, with over 1000 contributors from 250+ organizations…
Apache Spark is 100% open source, hosted at the vendor-independent Apache Software Foundation. At Databricks, we are fully committed to maintaining this open development model. Together with the Spark community, Databricks continues to contribute heavily to the Apache Spark project, through both development and community evangelism.
Databricks offers a managed Spark platform with additional features and services tailored for enterprise use. This model allows them to:
- Benefit from the innovation and community support of Apache Spark: Databricks leverages the power and flexibility of Spark, which is constantly being improved by a global community of developers (solving more refined ‘average problems’ faced by a large community).
- Differentiate with value-added services: Databricks provides enterprise-grade features like security, compliance, and managed infrastructure, which are not available in the open-source version.
- Contribute back to the ecosystem: Databricks actively contributes to the development of Apache Spark, ensuring its continued growth and relevance.
This symbiotic relationship benefits both Databricks and the Apache Spark community. Databricks gains a powerful and widely adopted technology foundation, while the Spark community benefits from the contributions and support of a successful company.
That, in a nutshell, is the symbiotic relationship between Open and Closed Software. If you’re very busy, you probably could click off the article here since that’s the most important aspect of the article. However, for those of you who really want to understand how to craft an Open Source Strategy for your project/company, it is important to look deeper into how OSS affects each individual player in the ecosystem. Let’s discuss that next.
How Different Stakeholders Play with Open Source
Open Source and Individual/Independent Developers
Platforms like TensorFlow, PyTorch, and Hugging Face Transformers provide access to cutting-edge algorithms, pre-trained models, and sophisticated tools — all without the prohibitive costs often associated with proprietary software. This accessibility democratizes AI development, allowing individuals and small teams to experiment with advanced technologies and contribute to the forefront of innovation.
Open source offers more than just access; it provides a unique learning environment. Developers can dig into the source code of established projects, understand the design decisions behind them, and learn from some of the brightest minds in the field. This transparency accelerates the learning curve and fosters a deeper understanding of AI principles (the learn by doing principle).
Finally, actively contributing to open-source projects allows developers to build a public portfolio that showcases their skills and expertise, enhancing their career prospects in a highly competitive job market (learn more about how here). By collaborating with a global community, developers can hone their skills, gain recognition for their contributions, and ultimately accelerate their careers.
The presence of an active developer community will dramatically enhance any project. Therefore, it is critical for any group to invest in creating a developer-friendly open-source project. This goes beyond simply making the code public. The most important parts include-
- Providing clear and comprehensive documentation.
- Establishing straightforward contribution guidelines (new devs are often overwhelmed by how to contribute to OS Projects).
- Actively engaging with the community through forums and discussion channels.
- Fostering a culture of constructive feedback and mentorship.
By prioritizing these aspects, project maintainers can attract a wider range of contributors, accelerate the pace of development, and ensure the long-term sustainability of the project. That is why larger companies have invested extensively in Dev-Rel and Evangelism groups that act as marketing/customer success agents between the larger developer communities and the company (I’ve personally gotten offers to do this full-time for some Big Tech Companies). Their role is to ensure that OSS developers can contribute smoothly, and to relay any feedback to the parent company.
With this covered, let’s move on to companies and how they engage with OS.
What Companies Want with Open-Source
Regarding open source, businesses can play two distinct but interconnected roles: the adopter and the builder.
Adopters: Reaping the Benefits of Shared Innovation
For companies adopting open-source tools, the advantages are clear. By leveraging pre-existing solutions, businesses can:
- Reduce Development Costs: Instead of reinventing the wheel, companies can focus on building differentiating features and functionalities.
- Accelerate Time-to-Market: Open-source components provide a head start on development, allowing companies to bring products and services to market faster.
- Enhance Security and Transparency: With open-source code, companies have full visibility into their software, enabling them to identify and address potential vulnerabilities more effectively.
- Tap into a Global Talent Pool: Open-source projects often attract a vibrant community of skilled developers, providing companies with access to a wider talent pool for support, collaboration, and even recruitment.
Builders: Cultivating an Ecosystem
While the benefits for adopters are straightforward, the motivations for companies building and sharing open-source software might seem less obvious. However, the strategic advantages are significant:
- Boosting Brand Reputation and Developer Trust: Open-sourcing technology demonstrates a commitment to transparency, collaboration, and innovation, enhancing brand reputation and building trust within the developer community. A great reputation acts as a powerful moat, while a bad reputation can cause overreactions and mania (something Google AI has experienced recently).
- Crowdsourcing R&D and Accelerating Innovation: By opening up their code, companies can tap into the collective intelligence of a global community, accelerating the pace of innovation and feature development. This is one of the biggest benefits of this newsletter (I get to ask all of you to do my research). Take, for example, our recent exploration of Deepfakes and how to detect them. In the article where I talked about the engineering of a Deepfake detection system, I talked about our usage of Triplet Loss for richer decision boundaries. In one of our discussion posts, I got this excellent comment that I will remember for all my future work-
- Cultivating a Thriving Ecosystem: Open-source projects often become industry standards, attracting a network of users, contributors, and complementary products and services. This creates a virtuous cycle of adoption, innovation, and growth. Meta’s investment into PyTorch is a good case-study for this.
- Reducing Internal Development Costs: By sharing their tools and technologies, companies can reduce the need for internal development and maintenance, freeing up resources for other strategic initiatives. Take React’s VR capabilities, which were developed quickly because of its large community of advanced Web Devs. This development for VR has been crucial to Zuck’s Metaverse aspirations (which I still maintain makes long-term business sense as a powerful platform play).
- Acting as a Funnel: Open Source platforms can act as a funnel to paid services. Google has done this with Android. The OS nature of the Android ecosystem attracts a lot of developers, who are then much more likely to use Google’s more expensive APIs to build more refined projects.
The perceived downside of “giving away” technology is often outweighed by the long-term benefits of fostering a collaborative ecosystem and establishing a leadership position within a particular domain.
Let’s end with a discussion of the impact of OSS on the non-technical end-users of technology. Fortunately, that equation is relatively straightforward.
How Open Source Benefits End Users
OSS leads to cheaper, safer, and more accessible products, all of which benefit end users. OS Projects attract a more diverse set of contributors, many of whom are much more concerned about the efficiency of solutions than developers from richer companies (who tend to prioritize performance on benchmarks).
A good example of this is the RWKV project, which aims to build a more efficient LLM by replacing Transformers with RNNs. RWKV models are generally better trained in non-English languages (e.g., Chinese, Japanese, etc.) than most existing OSS models. This stuck out to me, so I spoke to the team about it. It turns out that RWKV has always had very diverse contributors, many of whom speak lower-resourced languages. They noticed very early that OpenAI’s tokenizers didn’t work as well for their languages, so they decided to build their own. This is a giant leg up.
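To make the tokenizer point a bit more concrete, here is a minimal sketch of why a tokenizer tuned mostly on English text can be expensive for other scripts. This is my own illustration, not RWKV's or OpenAI's actual tokenizer: it assumes a hypothetical worst case where the tokenizer has no learned vocabulary for a script and falls back to one token per UTF-8 byte. The function name and sample strings are made up for the example.

```python
def byte_fallback_tokens_per_char(text: str) -> float:
    """Tokens per character in the assumed worst case where every
    UTF-8 byte becomes its own token (hypothetical byte fallback)."""
    if not text:
        return 0.0
    # English letters take 1 byte each in UTF-8; most Chinese and
    # Japanese characters take 3 bytes each.
    return len(text.encode("utf-8")) / len(text)


# Illustrative sample strings, chosen only for this sketch.
samples = {"English": "hello", "Chinese": "你好", "Japanese": "こんにちは"}

for language, text in samples.items():
    ratio = byte_fallback_tokens_per_char(text)
    print(f"{language}: {ratio:.1f} tokens per character")
```

Under this assumption, the same number of characters costs roughly three times as many tokens in Chinese or Japanese as in English, which eats into context length and raises inference cost. Real tokenizers learn merges and usually do better than this worst case, but the gap is the kind of thing a community with speakers of those languages would spot early.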
One overlooked benefit of OSS to end-users is its ability to foster more long-term innovation. Since OS projects aren’t bound to the same short-term profit motives as Companies, they can explore more novel directions (companies can get stuck in local maxima, while OS will be more likely to explore). A similar principle is why so many core tech innovations came from the government (or from Bell Labs), which could invest in projects on much longer time horizons.
This creates a lot of synergy between the government and OS, which will be the subject of its own article (along with important discussions around safety and regulation). I’m going to end this article here to keep it from being too long. We will discuss the intricacies of different OS strategies in dedicated case-studies (feel free to send me a message if you want to discuss how your group might want to craft an OS strat).
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow. You can share your testimonials over here.
I regularly share mini-updates on what I read on the Microblogging sites X (https://twitter.com/Machine01776819), Threads (https://www.threads.net/@iseethings404), and TikTok (https://www.tiktok.com/@devansh_ai_made_simple), so follow me there if you’re interested in keeping up with my learnings.
AI Newsletter- https://artificialintelligencemadesimple.substack.com/