With Open Source Artificial Intelligence, Don’t Forget the Lessons of Open Source Software

By: Jack Cable, Senior Technical Advisor, and Aeva Black, Section Chief, Open Source Security

The accelerated development of new artificial intelligence (AI) capabilities, including large language models (LLMs), has spurred international debates around the potential impact of “open source AI” models. Does open sourcing a model benefit society because it enables developers to rapidly innovate by studying, using, sharing, and collaboratively iterating on these state-of-the-art models? Or do such capabilities pose security threats, allowing adversaries to leverage these models for greater harm?

Fortunately, the conversation isn’t starting from scratch. Developers of all AI models, including open foundation models, can learn from existing work to secure software. As the Cybersecurity and Infrastructure Security Agency’s (CISA) leads on open source software (OSS) security, we’ve spent significant time immersed in open source communities. OSS faced similar debates during the 1990s, and we know that there are many lessons to be learned from the history of OSS.

Last month, CISA responded to the National Telecommunications and Information Administration’s (NTIA) Request for Information on Dual Use Foundation Artificial Intelligence Models With Widely Available Model Weights. At CISA, we see significant value in open foundation models to help strengthen cybersecurity, increase competition, and promote innovation. Our response highlighted that the global AI community should (1) learn from existing software security work and (2) continue to promote the responsible development and release of open foundation models while mitigating their potential harms. There is a tremendous wealth of experience from the OSS community that shouldn’t be lost when considering open foundation models.

While there is not yet consensus on what constitutes “open source AI”, the Open Source Initiative, which maintains the “Open Source Definition” and a list of approved OSS licenses, has been “driving a multi-stakeholder process to define an ‘Open Source AI’”. Therefore, in the interest of precision, we use the term “open foundation models” to refer to AI models with widely available weights.

Learn from Existing Software Security Work

OSS facilitates extensive innovation in every sector. A recent paper from Harvard and the University of Toronto estimated that the total cost to produce the world’s OSS is $4.15 billion, while the value it creates is orders of magnitude larger: $8.8 trillion. It’s safe to say that many innovations of the digital age would not have been possible without OSS.

We must all work to ensure that OSS doesn’t fall victim to the tragedy of the commons. At CISA, we continue to emphasize that every software manufacturer should be a responsible consumer of the OSS that it uses, and that means also being a sustainable contributor back to the open source ecosystem. This same principle applies to open foundation models – everyone ought to do their part to ensure a safe, secure, and sustainable community.

Additionally, we have observed the benefits of open source tools in cybersecurity. While there has been a decades-long debate over the open sourcing of dual-use cybersecurity tools (i.e., tools that can both aid cyber defenders and be used maliciously by threat actors), the general consensus among the security community is that the benefit these tools provide to defenders outweighs the harm that adversaries might cause with them – adversaries who, in many cases, will get their hands on such tools whether or not they are open sourced. While we cannot anticipate all the potential use cases of AI, lessons from cybersecurity history indicate that we stand to benefit from dual-use open source tools.

CISA has been hard at work in recent years to help secure the OSS ecosystem. Our Open Source Software Security Roadmap, published last year, starts with the recognition that open source software is supported by an inherently global community – and that government’s role is to show up as a member of, and supporter of, that community. We collaborated with the open source community to release principles for the security of package repositories and highlighted actions that five major package repositories – npm, PyPI, Crates.io, Composer, and Maven Central – are taking in line with this framework.

The AI community should heed these and other lessons from the open source community. Operators of package repositories in the AI ecosystem – such as platforms that distribute AI source code, models, weights, or training data – should work towards the items in the Principles for Package Repository Security framework and think about what unique considerations might apply. Tool developers should begin incorporating traceability and artifact composition analysis techniques. Model developers should include diverse viewpoints early and throughout the development lifecycle, ensuring that trust and safety is a core consideration during model development.
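As a simplified illustration of the kind of artifact traceability mentioned above, the sketch below records a SHA-256 digest for every file in a model release so that downstream users can later verify what they downloaded. The directory and manifest names are placeholders for illustration, not part of any CISA guidance or existing standard.

```python
# Minimal sketch of artifact traceability for a model release: record a
# SHA-256 digest for each artifact so downstream users can verify that the
# files they obtain match what the developer published.
# File and directory names here are illustrative placeholders.
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact_dir: Path) -> dict[str, str]:
    """Map each artifact in the release directory (weights, tokenizer, config, ...) to its digest."""
    return {
        p.name: sha256_of(p)
        for p in sorted(artifact_dir.iterdir())
        if p.is_file()
    }


if __name__ == "__main__":
    # "model_release" and "manifest.json" are hypothetical names for this sketch.
    manifest = build_manifest(Path("model_release"))
    Path("manifest.json").write_text(json.dumps(manifest, indent=2))
    print(f"Recorded {len(manifest)} artifacts")
```

A developer could publish such a manifest alongside a release; richer approaches layer in signing and dependency metadata, but the core idea is simply a verifiable record of what was shipped.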

Promote the Responsible Development and Release of Open Foundation Models While Mitigating Their Potential Harms

In our response, we define two sets of potential harms of foundation models. Our definitions turn on whether the deployer of the model – the entity that runs the model – intends for those harms to occur or seeks to prevent them. The first class consists of harms deliberately sought by the deployer of the model, such as using the model to conduct cyberattacks or to generate non-consensual intimate imagery (NCII). The second class consists of harms not desired by the deployer of the model, such as a cybersecurity vulnerability in a model deployed by a critical infrastructure entity.

The first class of harms must be addressed with a multipronged risk reduction approach. This includes additional research and investment to limit abuse of the technology (which should draw on existing trust and safety work), although we know that many protections will inevitably be circumvented by malicious deployers. Therefore, domain-specific risk mitigations are also needed. For NCII, this might involve discouraging the training of specific capabilities in models that are widely distributed (such as by filtering training data) and societal approaches to support victims of abuse. For cybersecurity vulnerabilities, the best approach to all forms of threats – including those enabled by foundation models – is to ensure that software is built in a secure by design manner, resilient to the most common classes of vulnerabilities.
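To make the training-data filtering idea slightly more concrete, here is a deliberately minimal sketch that drops records matching a static blocklist before they reach a training corpus. Real filtering pipelines rely on trained classifiers, provenance checks, and human review rather than keyword matching; the record format and blocklist terms below are placeholders.

```python
# Deliberately simplified sketch of training-data filtering: exclude records
# that match a blocklist of terms before they enter the training corpus.
# Real pipelines combine trained classifiers, provenance checks, and human
# review; the record format and blocklist here are placeholders.
from typing import Iterable, Iterator

BLOCKLIST = {"placeholder-disallowed-term-1", "placeholder-disallowed-term-2"}


def filter_records(records: Iterable[str]) -> Iterator[str]:
    for text in records:
        lowered = text.lower()
        if any(term in lowered for term in BLOCKLIST):
            continue  # drop the record so the unwanted capability is not reinforced
        yield text


# Usage: kept = list(filter_records(raw_corpus))
```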

Most of the conversation about open foundation models has been dominated by the abuse of such models by malicious deployers. We certainly should work to study and mitigate these harms. That said, the second class of harms is equally deserving of attention, and, promisingly, because the deployer of these models does not want harm to occur, building protections against these risks into models is more readily achievable. For instance, the developer of an open source model could take a secure by design approach and build the model in a responsible manner, resilient to common classes of vulnerabilities. Such a developer could also train the model in a publicly verifiable way, or on publicly available data, thereby allowing others to more fully study the model’s behavior and gain confidence that it does not contain vulnerabilities or backdoors. Much as transparency strengthens the security of OSS, allowing for public study and verification can help secure open foundation models.
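As one illustration of what public verifiability could look like in practice, the sketch below checks locally downloaded model artifacts against a digest manifest of the kind a developer might publish alongside a release. The manifest layout mirrors the earlier traceability sketch and is illustrative only, not an established standard.

```python
# Sketch of the verification side of a published digest manifest: recompute
# each artifact's SHA-256 locally and compare it with the developer's
# published value, so third parties can confirm they are studying the same
# files the developer released. Manifest format and paths are illustrative.
import hashlib
import json
from pathlib import Path


def verify_against_manifest(artifact_dir: Path, manifest_path: Path) -> bool:
    manifest: dict[str, str] = json.loads(manifest_path.read_text())
    ok = True
    for name, expected in manifest.items():
        actual = hashlib.sha256((artifact_dir / name).read_bytes()).hexdigest()
        if actual != expected:
            print(f"MISMATCH: {name}")
            ok = False
    return ok


if __name__ == "__main__":
    # Placeholder paths for illustration.
    if verify_against_manifest(Path("model_release"), Path("manifest.json")):
        print("All artifacts match the published manifest")
```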

We recognize that today’s open foundation models exist along a spectrum of openness, and we applaud efforts to train open foundation models on appropriate public data. When a model’s weights and training code are published without disclosure of the training data or pre-training process, users may be able to modify the model in some ways, but they have only a limited ability to understand, verify, or mitigate any vulnerabilities in it. This lack of transparency hinders inspection of, and research into, how these models operate. Thus, developers of open weight foundation models without open data have a responsibility to ensure the outputs of their models are safe, secure, and trustworthy.

CISA is committed to ensuring that OSS, including AI models, can continue to be deployed in a safe and secure manner to foster innovation. We encourage readers to review our full response to NTIA and to learn more about our work in OSS Security.

As always, we can be contacted at OpenSource@cisa.dhs.gov.

CISA does not endorse any commercial entity, product, or service. Any reference to specific commercial entities, products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation, or favoring by CISA.