May 27, 2024

How to Protect Confidential Corporate Information in the Age of Generative AI

Employees are sharing trade secrets, source code, and financial forecasts with generative AI tools every day. Learn what legal protections actually exist for confidential corporate information, where the gaps are, and how organizations can take back control.

Kathrin Gardhouse

What happens to confidential corporate information when employees use generative AI tools?

The headlines have made it impossible to ignore. Employees at major organizations have inadvertently leaked source code, internal strategy documents, and sensitive client data by pasting them directly into generative AI prompts. In the most widely reported case, Samsung engineers submitted proprietary semiconductor code to ChatGPT for debugging assistance. Once that data was entered, it became part of the inputs the model could potentially learn from, and Samsung had no mechanism to take it back.

This is not an isolated incident. It reflects a structural problem: the legal and technical frameworks that organizations rely on to protect their confidential information were not designed with generative AI in mind. Employees are not acting maliciously. They are reaching for a productivity tool the same way they would reach for a search engine, without fully appreciating the difference between the two.

The result is a quiet, ongoing leak of some of the most sensitive information businesses hold: internal financial projections, unreleased product roadmaps, client communications, legal strategy, source code, and more. And unlike a traditional data breach, there is no alert, no audit log, and often no awareness that anything happened at all.

What legal protections exist for confidential corporate information?

To understand the gap, it helps to understand what protections currently exist. Several legal mechanisms are available to businesses, but each one covers a narrow slice of the problem.

Copyright and patent law are designed to prevent others from commercially exploiting creative and inventive works once those works have been made public. Copyright does not protect the confidentiality of information, and patent law requires public disclosure in exchange for the protections it grants. Neither framework gives an organization the ability to demand that a third party delete or cease processing information that was shared with them.

Non-disclosure agreements (NDAs) and internal corporate policies can contractually restrict how employees handle confidential information, but they operate after the fact. An NDA cannot prevent a prompt from being sent; it can only create liability once a breach has already occurred.

Trade secret law and unfair competition legislation offer some protection against deliberate misappropriation, but the standards for what qualifies as a trade secret are demanding, and the burden of proving misappropriation in an AI context is largely untested.

Sector-specific financial regulation, such as the EU's Market Abuse Regulation, prohibits the unlawful disclosure of material non-public information that could influence stock prices. This protects a specific category of corporate information, but only in the context of securities markets.

What all of these mechanisms have in common is that they protect corporate information indirectly, through commercial or contractual frameworks, rather than giving organizations direct control over how their information is collected and used once it leaves their walls. Compare this to the rights individuals hold under data privacy law, and the gap becomes stark.

Why personal data gets better protection than corporate data

Under modern privacy law, individuals have rights that go far beyond what any business can claim over its own confidential information. The General Data Protection Regulation (GDPR) provides individuals with the right to erasure under Article 17 and the right to restrict processing under Article 18, subject to certain conditions, such as when data has been processed unlawfully. Individuals can request that AI providers remove their personal data from a model's outputs, and that right is backed by law, not just a provider's terms of service.

The International Association of Privacy Professionals defines privacy as "the right to be let alone, or freedom from interference or intrusion. Information privacy is the right to have some control over how your personal information is collected and used."

The key word there is control. And when it comes to confidential corporate information, that control is almost entirely absent. A company whose employee submits a proprietary algorithm to a large language model has no corresponding right to erasure. No regulatory framework compels the AI provider to delete or stop using that data. And, unlike with personal data, the company has no established right to know how or whether that information has been incorporated into model training.

This creates a situation where the personal data of a single employee can trigger extensive regulatory obligations, while the trade secrets of an entire organization may have no enforceable remedy at all.

What are the real-world risks of employees sharing corporate data with AI tools?

The risk is not hypothetical. It is structural, and it plays out across industries every day.

Consider a few scenarios that are already common practice. A marketing professional drafts a product launch strategy using an AI writing tool, inputting unreleased product details to get the copy right. A financial analyst shares internal forecasting models with an AI assistant to pressure-test assumptions. A project manager uploads a presentation containing client logos and confidential partnership terms to an AI platform for polishing. A software engineer pastes source code into a chat interface to ask for help debugging.

In each case, the intent is productive. The employee is trying to do their job more effectively. But the information shared may now reside in systems outside the organization's control, processed by a provider whose data practices are governed by their own terms of service rather than by the organization's policies.

Organizations in regulated industries face compounded exposure. For companies handling patient data, financial records, or personal information about consumers, the inadvertent disclosure of that data through an AI tool may trigger notification obligations and regulatory penalties under frameworks like HIPAA or the GDPR, in addition to the business harm of the disclosure itself. Healthcare organizations and financial services firms are particularly exposed, given the sensitivity and regulatory weight of the data they handle daily.

Are AI providers doing enough to protect corporate data?

OpenAI, Anthropic, Google, and other major AI providers have introduced settings designed to address some of these concerns. Users can opt out of having their conversations used for model training, and enterprise versions of these tools typically offer stronger data handling commitments through contractual agreements. These are genuine improvements, and they matter.

But there is an important distinction between a voluntary contractual commitment and a legally enforceable right. An enterprise contract can be modified, terminated, or made subject to carve-outs that are difficult for customers to audit. A data subject's rights under the GDPR are backed by independent regulatory bodies with the power to investigate and impose penalties. Corporate confidential information does not have an equivalent backstop.

There is also the question of unintended model behavior. Even where providers commit not to use customer data for training, the information entered into a prompt exists within that provider's infrastructure at the moment of processing. The security posture of that infrastructure, the controls applied to that data, and the possibility of output leakage through prompt injection or model inversion attacks are all considerations that fall outside the organization's direct control.

How should organizations respond to the corporate data leakage problem?

The immediate response for most organizations has been to issue usage policies: guidelines prohibiting employees from entering certain categories of information into external AI tools. This is a necessary first step, but it is not sufficient on its own. Policies depend on awareness and compliance. They do not create a technical control that intercepts the data before it leaves the organization.

Many of the security frameworks organizations already have in place for personal data protection are also relevant here. Access controls that limit who can interact with AI tools, data loss prevention systems that monitor outbound communications, and incident response protocols all apply. But when organizations stop there, they are applying a personal data security posture to a problem that extends well beyond personal data.
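To make the difference between a written policy and a technical control concrete, here is a minimal Python sketch of an outbound prompt check that also records each decision, so that a blocked or allowed submission leaves a trace. Everything in it is an assumption made for illustration: the pattern names, the `Project Falcon` codename, and the `check_outbound_prompt` and `submit` functions are hypothetical and do not describe any particular data loss prevention product.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical categories and patterns, for illustration only.
# A real deployment would derive these from the organization's own policy.
BLOCKED_PATTERNS = {
    "api_key": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
    "confidential_marker": re.compile(r"\b(?:confidential|internal only)\b", re.IGNORECASE),
    "project_codename": re.compile(r"\bProject\s+Falcon\b"),
}


@dataclass
class GateDecision:
    allowed: bool
    matched_categories: list
    timestamp: str


def check_outbound_prompt(prompt: str) -> GateDecision:
    """Return a decision listing every blocked category found in the prompt."""
    matched = sorted(
        name for name, pattern in BLOCKED_PATTERNS.items() if pattern.search(prompt)
    )
    return GateDecision(
        allowed=not matched,
        matched_categories=matched,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


# In-memory audit trail; a real system would write to durable storage.
audit_log = []


def submit(prompt: str) -> bool:
    """Record the decision and report whether the prompt may be sent."""
    decision = check_outbound_prompt(prompt)
    audit_log.append(decision)
    return decision.allowed
```

A check like this only catches what its patterns anticipate, which is the limitation of rules-based filtering. Its value is that the decision happens before transmission and is logged, rather than relying on an employee remembering the policy.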

The more robust approach is to address the problem at the point of transmission: before confidential information reaches an external AI system at all. This is precisely the challenge that Limina's data de-identification platform is designed to address. Built by linguists rather than pattern-matching engineers, Limina's technology is context-aware, understanding the relationships between entities in a document rather than simply scanning for recognizable formats. This means it can identify and redact confidential business information in the same way it handles personal data, including references embedded within natural language, partial identifiers, and contextually sensitive terms that would evade a rules-based system.

By intercepting and de-identifying sensitive information before it is transmitted to a third-party AI tool, organizations can preserve the productivity benefits of generative AI while maintaining meaningful control over what leaves their environment. That control is the thing the current legal framework struggles to provide. Technology that builds it in operationally is, at present, one of the most reliable ways to close the gap.
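The general idea of de-identifying a prompt before transmission can be sketched in a few lines of Python. This is not Limina's method: it is a deliberately simple rules-based example using regular expressions, which is exactly the kind of approach that misses context-dependent references. The patterns, the placeholder format, and the `deidentify` and `reidentify` function names are assumptions made for illustration.

```python
import re

# Hypothetical patterns, for illustration only. Rules like these recognize
# formats, not meaning, so they will miss contextually sensitive terms.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("MONEY", re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s?(?:million|billion))?")),
    ("CODENAME", re.compile(r"\bProject\s+[A-Z][a-z]+\b")),
]


def deidentify(text: str):
    """Replace matched spans with placeholders; return the text and the mapping."""
    mapping = {}   # placeholder -> original value
    counters = {}  # label -> number of placeholders issued so far

    for label, pattern in PATTERNS:
        def replace(match, label=label):
            value = match.group(0)
            # Reuse the same placeholder when a value appears more than once.
            for placeholder, original in mapping.items():
                if original == value:
                    return placeholder
            counters[label] = counters.get(label, 0) + 1
            placeholder = f"[{label}_{counters[label]}]"
            mapping[placeholder] = value
            return placeholder

        text = pattern.sub(replace, text)
    return text, mapping


def reidentify(text: str, mapping: dict) -> str:
    """Restore original values in a response that still carries placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

In this sketch the mapping never leaves the organization: only the placeholder text is sent to the external tool, and `reidentify` is applied locally to whatever comes back.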

If your organization is working through how to govern employee use of AI tools without forfeiting the productivity benefits, the Limina team is ready to help.

What should a corporate data protection framework look like?

Looking at the trajectory of regulatory development, there is a reasonable case that the gap between personal data protection and corporate data protection will narrow over time. The EU AI Act includes requirements for transparency by AI developers about the use of copyrighted works in model training, which signals a broader legislative interest in the accountability of AI systems for the data they process. Japan has taken a different approach, carving out certain AI-related uses from copyright infringement under defined circumstances, in an effort to balance innovation with protection.

As a thought experiment, consider what a comprehensive corporate data protection framework modeled on the GDPR would look like. Organizations would hold enforceable rights requiring AI providers to disclose what corporate data has been processed, for what purpose, and under what conditions. They would have the right to demand erasure or restriction of processing when that data was submitted outside policy. And regulatory bodies would have the authority to investigate and penalize violations.

That framework does not exist yet. But the direction of travel in AI regulation suggests that some version of expanded corporate data rights is more likely than not in the medium term. In the meantime, the burden of protection falls on organizations themselves.

For businesses operating in industries where data governance is already a compliance requirement, such as pharmaceutical and life sciences companies, insurance providers, and organizations running contact center operations, the infrastructure for handling sensitive data responsibly is already partially in place. Extending that infrastructure to govern AI tool usage is a natural next step, and one that carries real competitive advantage as regulators begin to pay closer attention to this space.

Organizations that build robust controls now will be better positioned when regulatory frameworks catch up. Those that wait may find themselves managing a disclosure that legal policy alone was powerless to prevent.

Talk to Limina about protecting your confidential data before it reaches external AI systems.



Frequently Asked Questions

What is confidential corporate information?

Confidential corporate information refers to any information belonging to a business that is not publicly available and that the business has an interest in keeping private. This includes trade secrets, source code, internal financial data, client records, business strategies, legal communications, and proprietary processes. Unlike personal data, which is protected by privacy law in most jurisdictions, confidential corporate information relies primarily on contractual mechanisms such as non-disclosure agreements, supplemented by intellectual property law and, in specific sectors, financial regulation.


Can employees be held liable for sharing corporate data with generative AI tools?

Potentially, yes. If an employee's use of a generative AI tool violates their employment contract, a non-disclosure agreement, or an internal acceptable use policy, that employee may face disciplinary consequences and, in serious cases, legal liability. However, liability is difficult to enforce after the fact, and in many cases the disclosure occurs without any intent to harm the organization. Preventive technical controls, rather than policy enforcement after a breach, are a more reliable approach.


Does the GDPR protect confidential corporate information?

No. The GDPR is a personal data protection framework. It grants rights to individuals whose personal information is being processed, not to organizations whose confidential business information has been disclosed. A business whose confidential data is shared with an AI provider has no equivalent right to erasure or restriction of processing under the GDPR. Individuals whose personal data appears in that same disclosure may have such rights, which is why the exposure of personal data through AI tools creates a separate compliance problem for organizations in addition to the business harm of the disclosure.


What is the biggest gap in legal protection for corporate data shared with AI?

The most significant gap is the absence of a right to control or recover information once it has been transmitted to a third-party AI system. Intellectual property law, trade secret law, and NDAs all operate indirectly and are difficult to enforce in the context of AI-generated outputs. In contrast to the position for personal data, no regulatory framework currently compels AI providers to delete or restrict the use of confidential corporate information on request. The result is that organizations have far weaker legal remedies than individuals do when their information is processed by a generative AI tool without authorization.


Will AI regulation eventually protect corporate confidential information the way the GDPR protects personal data?

It is increasingly likely, though the timeline is uncertain. The EU AI Act introduces new transparency obligations for AI developers, including around the use of third-party content in model training. This signals a legislative willingness to regulate AI systems' handling of data more broadly. Whether that will extend to enforceable rights for organizations over their confidential information remains to be seen. In the meantime, organizations should not wait for regulatory frameworks to close the gap, particularly in regulated industries where the cost of disclosure is already significant.
