Privacy Guide | April 17, 2026 | 9 min read

How to Remove Yourself From AI Training Data


By Sarah Chen

Head of Privacy Research


AI models are trained on vast amounts of data scraped from the internet — and that data very likely includes your personal information. From people-search sites and social media profiles to public records and forum posts, AI companies have ingested billions of web pages to build their models. While you cannot erase data that has already been baked into a trained model, you can take concrete steps to prevent your information from being used in future training and reduce your overall exposure.

Important Limitation

No method can remove data that has already been used to train an existing AI model. Once a model has been trained on your data, that information is embedded in the model's parameters, and it cannot be selectively removed. The steps in this guide focus on preventing your data from being used in future training runs and reducing the amount of personal information available for AI companies to scrape going forward.

Platform-by-Platform Opt-Out Guide

ChatGPT (OpenAI)

By default, OpenAI may use your ChatGPT conversations to improve its models. To opt out:

  1. Open ChatGPT and click on your profile icon
  2. Go to Settings
  3. Select Data Controls
  4. Toggle off "Improve the model for everyone"

When this setting is disabled, your future conversations will not be used to train or improve OpenAI's models. Turning it off does not delete your chat history; OpenAI says deleted conversations and Temporary Chats may be retained for up to 30 days for safety monitoring before they are permanently removed. If you use the OpenAI API, data sent through it is not used for training by default.

Google

Google uses data from its services to train AI models, including Gemini. To limit this:

  1. Visit myaccount.google.com
  2. Navigate to Data & Privacy
  3. Review and adjust your Web & App Activity settings
  4. Turn off Gemini Apps Activity to prevent your Gemini conversations from being reviewed and used for training
  5. Delete existing Gemini activity by going to myactivity.google.com and filtering by Gemini

Keep in mind that turning off Web & App Activity may affect the personalization of other Google services.

Meta (Facebook, Instagram)

Meta uses content shared on its platforms to train its AI models. If you are in the EU or UK, you have stronger rights to object:

  1. Go to the Meta Privacy Center or search for Meta's "Right to Object" form
  2. Submit the form explaining that you object to Meta using your data for AI training
  3. Under GDPR, Meta must stop using your data for this purpose unless it can demonstrate compelling legitimate grounds that override your interests

For users outside the EU and UK, options are more limited. You can submit the objection form, but Meta is not legally obligated to honor it in all jurisdictions. You can also reduce your exposure by limiting public posts, removing old content, and tightening your privacy settings.

Microsoft 365 Copilot

If you use Microsoft 365 services, your data may be processed by AI features:

  1. Visit the Microsoft Trust Center
  2. Navigate to Privacy Options
  3. Review and adjust settings related to connected experiences and optional data sharing
  4. For enterprise users, your IT administrator may need to configure organizational policies

Microsoft states that enterprise customer data in Microsoft 365 is not used to train foundation models, but individual consumer products may have different policies.

GitHub Copilot

GitHub Copilot uses code from public repositories to train its AI models. If you are a developer, you should be aware of an important deadline:

GitHub Copilot Opt-Out Deadline

You must opt out before April 24, 2026 to prevent your public code from being used to train Copilot. Code in public repositories is used by default. Go to your GitHub Settings, then Copilot, and disable "Allow GitHub to use my code snippets for product improvements." If you miss this deadline, your existing public code may already be incorporated into training datasets.

For Website Owners: Use robots.txt

If you own a website, you can use your robots.txt file to block AI crawlers from scraping your content. Add rules to block the most common AI training bots:

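  • GPTBot (OpenAI)
  • Google-Extended (Google's token for opting out of Gemini and Vertex AI training; it does not affect Google Search indexing)
  • CCBot (Common Crawl, used by many AI companies)
  • ClaudeBot (Anthropic; older guides list the anthropic-ai token)
  • Meta-ExternalAgent (Meta)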

While robots.txt is a widely respected standard (formalized as RFC 9309), compliance is voluntary: the file cannot technically stop a crawler that chooses to ignore it, and it is not legally enforceable in every jurisdiction. However, major AI companies have publicly committed to honoring robots.txt directives, and ignoring them could expose a company to legal liability.
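To make the list above concrete, here is a minimal Python sketch that generates a robots.txt blocking those agents and then checks the result with the standard library's `urllib.robotparser`. The agent list is an assumption based on this guide (it includes both `ClaudeBot` and the older `anthropic-ai` token for Anthropic) and should be verified against each vendor's current documentation; `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# AI crawler tokens from this guide (assumption: verify each against the
# vendor's current documentation, since tokens change over time).
AI_CRAWLERS = [
    "GPTBot",
    "Google-Extended",
    "CCBot",
    "ClaudeBot",
    "anthropic-ai",
    "Meta-ExternalAgent",
]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Return robots.txt text that disallows the given agents site-wide
    while leaving every other crawler unrestricted."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    # Catch-all group: an empty Disallow value means "allow everything".
    groups.append("User-agent: *\nDisallow:")
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_robots_txt(AI_CRAWLERS)
    print(rules)
    # Googlebot is included to confirm ordinary search crawling still works.
    for agent in AI_CRAWLERS + ["Googlebot"]:
        allowed = is_allowed(rules, agent, "https://example.com/blog/post")
        print(f"{agent:20} allowed: {allowed}")
```

Running the script prints the generated file and reports each AI agent as disallowed while Googlebot remains allowed, because only the named groups carry `Disallow: /`. The generated text would be served as `/robots.txt` at the root of your site.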

Remove Your Data from Data Broker Sites

One of the most overlooked sources of AI training data is data broker and people-search sites. These websites aggregate and publish personal information — names, addresses, phone numbers, email addresses, family members, employment history, and more — and they are regularly scraped by AI companies building training datasets.

When your personal information is publicly available on sites like Spokeo, BeenVerified, WhitePages, and hundreds of others, it becomes part of the publicly available web data that AI models are trained on. Removing your information from these sites directly reduces the amount of personal data available for future AI training runs.

PrivacyOn automates this process by continuously monitoring and removing your personal data from over 100 data broker sites. Since AI models are periodically retrained on updated web data, removing your information from broker sites makes it less likely to appear in future training datasets built from fresh crawls, although copies captured in earlier crawl snapshots may persist. This is one of the most effective practical steps you can take to limit your presence in AI training data.

Exercise Your GDPR Rights (EU and UK Residents)

If you are a resident of the EU or UK, the General Data Protection Regulation gives you powerful rights over your personal data:

  • Right to erasure: Request that a company delete your personal data
  • Right to object: Object to the processing of your data for AI training purposes
  • Right of access: Request a copy of all personal data a company holds about you
  • Right to restrict processing: Request that a company stop processing your data while your objection is reviewed

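Several EU data protection authorities have already ruled that AI companies must provide clear opt-out mechanisms; in 2023, for example, Italy's data protection authority (the Garante) required OpenAI to give people a way to object to the use of their data for training. If a company fails to honor your request, you can file a complaint with your national data protection authority.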
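If you prefer to write your own requests, each right above maps to a GDPR article: access (Article 15), erasure (Article 17), restriction (Article 18), and objection (Article 21). The Python sketch below assembles a plain-text request letter from those articles. The letter wording, the `Requester` type, and the `ExampleCorp` company name are hypothetical placeholders, so adapt them before sending anything.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Each right from this guide mapped to its GDPR article, a label, and an
# illustrative request sentence (the wording is an assumption, not a legal template).
GDPR_RIGHTS = {
    "access": ("Article 15", "right of access",
               "provide a copy of all personal data you hold about me"),
    "erasure": ("Article 17", "right to erasure",
                "erase the personal data you hold about me"),
    "restriction": ("Article 18", "right to restriction of processing",
                    "restrict processing of my personal data while my objection is reviewed"),
    "objection": ("Article 21", "right to object",
                  "stop processing my personal data to train AI models"),
}


@dataclass
class Requester:
    """Hypothetical container for the person making the request."""
    name: str
    email: str


def draft_gdpr_request(company: str, requester: Requester,
                       rights: list[str], today: Optional[date] = None) -> str:
    """Build a plain-text GDPR request letter covering the selected rights."""
    unknown = [r for r in rights if r not in GDPR_RIGHTS]
    if unknown:
        raise ValueError(f"Unsupported right(s): {', '.join(unknown)}")

    today = today or date.today()
    lines = [
        f"Date: {today.isoformat()}",
        f"To: Data Protection Officer, {company}",
        "Subject: Data subject request under the GDPR",
        "",
        f"I, {requester.name} ({requester.email}), am exercising the following rights:",
        "",
    ]
    # One line per requested right, in the order the caller listed them.
    for key in rights:
        article, label, request = GDPR_RIGHTS[key]
        lines.append(f"- {article} ({label}): please {request}.")
    lines += [
        "",
        "Under Article 12(3), please respond without undue delay "
        "and in any event within one month of receipt.",
        "",
        "Regards,",
        requester.name,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    me = Requester(name="Jane Doe", email="jane@example.com")  # hypothetical requester
    print(draft_gdpr_request("ExampleCorp", me, ["objection", "erasure"]))
```

Rejecting unknown rights up front keeps a typo from silently producing a letter that omits the request you meant to make.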

Additional Steps to Reduce Your AI Training Data Footprint

  • Audit your social media: Delete old posts, limit public visibility, and review what is accessible without logging in
  • Remove old forum posts: Search Google for your name and usernames, then delete outdated posts yourself or ask the site's administrators to remove them
  • Use Google's "Results About You": This tool lets you request removal of search results containing your personal contact information
  • Limit future public sharing: Be thoughtful about what you post publicly going forward, knowing that anything publicly accessible may be scraped for AI training
  • Review app permissions: Revoke access for apps and services you no longer use, as they may still be collecting and sharing your data

The Bottom Line

Removing yourself completely from AI training data is not possible today — but reducing your future exposure is. By opting out on major platforms, blocking AI crawlers on your websites, removing your data from broker sites with a service like PrivacyOn, and exercising your privacy rights, you can significantly limit how much of your personal information ends up in the next generation of AI models. The sooner you act, the less data there will be for AI companies to train on.

Sarah Chen

Head of Privacy Research

CIPP/US Certified | IAPP Member | B.S. Computer Science

CIPP/US-certified privacy researcher with over a decade of experience helping consumers remove their personal information from data brokers.
