The Opt Out: 4 privacy concerns in the age of AI
You are more than a data point. The Opt Out is here to help you take your privacy back.
THE LATEST WAVE of artificial intelligence development has forced many of us to rethink key aspects of our lives. Digital artists, for example, now need to focus on protecting their work from image-generating sites, and teachers need to contend with some of their students potentially outsourcing essay writing to ChatGPT.
But the flood of AI also comes with important privacy risks everyone should understand—even if you don’t plan on ever finding out what this technology thinks you’d look like as a merperson.
A lack of transparency
“We often know very little about who is using our personal information, how, and for what purposes,” says Jessica Brandt, policy director for the Artificial Intelligence and Emerging Technology Initiative at the Brookings Institution, a nonprofit in Washington, D.C., that conducts research it uses to tackle a wide array of national and global problems.
In broad terms, machine learning, the process by which an AI system becomes more accurate, requires a lot of data. As a general rule, the more data a system has, the more accurate it tends to become. Generative AI platforms, such as the chatbots ChatGPT and Google’s Bard and the image generator DALL-E, get some of their training data through a technique called scraping: They sweep the internet to harvest useful public information.
But sometimes, due to human error or negligence, private data that was never supposed to be public, like sensitive company documents, images, or even login lists, can make its way to the accessible part of the internet, where anyone can find it with the help of Google search operators. And once that information is scraped and added to an AI’s training dataset, there’s not a lot anyone can do to remove it.
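To make the idea of scraping concrete, here is a minimal Python sketch of the harvesting step. It only parses a small made-up page stored in a string; the page contents, the example.com addresses, and the PageHarvester class are all assumptions for illustration, not how any particular AI company collects data.

```python
from html.parser import HTMLParser


class PageHarvester(HTMLParser):
    """Hypothetical scraper step: collects visible text and image URLs."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Remember the address of every image on the page.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_data(self, data):
        # Keep any non-empty piece of text found between tags.
        cleaned = data.strip()
        if cleaned:
            self.text_chunks.append(cleaned)


# Made-up page standing in for something a crawler might download.
SAMPLE_PAGE = """
<html><body>
<h1>Family reunion 2023</h1>
<p>Contact me at jane@example.com</p>
<img src="https://example.com/photos/reunion.jpg" alt="Group photo">
</body></html>
"""

harvester = PageHarvester()
harvester.feed(SAMPLE_PAGE)
print(harvester.text_chunks)
print(harvester.image_urls)
```

Nothing in this step asks whether the email address or the photo was meant to be shared. Whatever is publicly reachable gets collected.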
“People should be able to freely share a photo without thinking that it is going to end up feeding a generative AI tool or, even worse—that their image may end up being used to create a deepfake,” says Ivana Bartoletti, global chief privacy officer at Indian tech company Wipro and a visiting cybersecurity and privacy executive fellow at Virginia Tech’s Pamplin College of Business. “Scraping personal data across the internet undermines people’s control over their data.”
Data scraping is only one potentially problematic source of training data for AI systems. Katharina Koerner, a senior fellow for privacy engineering at the International Association of Privacy Professionals, says another is the secondary use of personal data. This happens when you voluntarily give up some of your information for a specific purpose but it ends up serving another purpose you didn’t consent to.
Businesses have been accumulating their clients’ information for years, including email addresses, shipping details, and what kinds of products they like, but in the past, there wasn’t a lot they could do with this data. Today, complex algorithms and AI platforms provide an easy way to process this information so they can learn more about people’s behavioral patterns. This can benefit you by serving you only ads and information you might actually care about, but it can also limit product availability and increase prices depending on your ZIP code. Koerner says it’s tempting for businesses to do this given that some are already sitting on large piles of data their own clients provided.
“AI makes it easy to extract valuable patterns from available data that can support future decision making, so it is very tempting for businesses to use personal data for machine learning when the data was not collected for that purpose,” she explains.
It doesn’t help that it’s extremely complicated for developers to selectively delete your personal information from a large training dataset. Sure, it may be easy to eliminate specifics, like your date of birth or Social Security number (please don’t provide personal details to a generative AI platform). But performing a full deletion request compliant with Europe’s General Data Protection Regulation, for example, is a whole other beast, and perhaps the most complex challenge to solve, Bartoletti says.
Selective content deletion is difficult even in traditional IT systems, thanks to their convoluted microservice structures, where each part works as an independent unit. But Koerner says it’s even harder, if not currently impossible, in the context of AI.
That’s because it’s not just a matter of hitting “ctrl + F” and deleting every piece of data with someone’s name on it—removing one person’s data would require the costly procedure of retraining the whole model from scratch, she explains.
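A toy Python sketch shows why deleting a record is not enough. It swaps in a very simple stand-in for an AI model (a straight line fitted with ordinary least squares), and the names and numbers are invented. Real systems are vastly larger, but the underlying problem is the same: the trained model keeps what it learned from a record even after the record is gone.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical training records: name -> (age, monthly spending in dollars).
records = {
    "alice": (25, 200.0),
    "bob": (40, 350.0),
    "carol": (55, 500.0),
    "dave": (30, 900.0),  # Dave will later ask to be deleted.
}

# "Training": the model's parameters now depend on everyone, Dave included.
model_before_deletion = fit_line(list(records.values()))

# The ctrl + F approach: remove Dave's row from the stored data.
del records["dave"]

# The already-trained parameters have not changed, so Dave's influence remains.
print("model still in use:", model_before_deletion)

# Only retraining from scratch on the remaining data removes that influence.
retrained_model = fit_line(list(records.values()))
print("retrained model:   ", retrained_model)
```

Here retraining takes a fraction of a second. For a large generative model, the same step can mean repeating an entire training run.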
It’ll be harder and harder to opt out
A well-nourished AI system can provide incredible amounts of analysis, including pattern recognition that helps its users understand people’s behavior. But this is not due only to the tech’s abilities—it’s also because people tend to behave in predictable ways. This particular facet of human nature allows AI systems to work just fine without knowing a lot about you specifically. Because what’s the point in knowing you when knowing people like you will suffice?
“We’re at the point where it just takes minimal information—just three to five pieces of relevant data about a person, which is pretty easy to pick up—and they’re immediately sucked into the predictive system,” says Brenda Leong, a partner at BNH.AI, a Washington, D.C., law firm that focuses on AI audits and risk. In short: It’s harder, maybe impossible, to stay outside the system these days.
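As a rough illustration of how little it takes, the Python sketch below guesses a label for a newcomer using only three attributes and a handful of look-alike profiles. Everything in it is hypothetical: the attributes, the profiles, the "premium" and "budget" labels, and the simple nearest-neighbor vote, which stands in for whatever a real predictive system would use.

```python
from collections import Counter

# Hypothetical profiles the system already holds, each with a known label.
known_people = [
    ({"age_band": "25-34", "zip3": "941", "device": "ios"}, "premium"),
    ({"age_band": "25-34", "zip3": "941", "device": "android"}, "premium"),
    ({"age_band": "25-34", "zip3": "100", "device": "ios"}, "premium"),
    ({"age_band": "55-64", "zip3": "606", "device": "android"}, "budget"),
    ({"age_band": "55-64", "zip3": "606", "device": "ios"}, "budget"),
    ({"age_band": "45-54", "zip3": "606", "device": "android"}, "budget"),
]


def similarity(profile_a, profile_b):
    """Count how many attributes two profiles share."""
    return sum(1 for key in profile_a if profile_a[key] == profile_b.get(key))


def predict(profile, k=3):
    """Label a newcomer by majority vote among the k most similar profiles."""
    ranked = sorted(
        known_people,
        key=lambda item: similarity(profile, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Three data points about someone the system has never seen before.
newcomer = {"age_band": "25-34", "zip3": "941", "device": "ios"}
print(predict(newcomer))
```

The newcomer never supplied a purchase history or an income. The label comes entirely from people who resemble them.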
This leaves us with little freedom, as even people who’ve gone out of their way for years to protect their privacy will have AI models make decisions and recommendations for them. That could make them feel like all their effort was for nothing.
“Even if it’s done in a helpful way for me, like offering me loans that are the right level for my income, or opportunities I’d genuinely be interested in, it’s doing that to me without me really being able to control that in any way,” Leong continues.
Using big data to pigeonhole entire groups of people also leaves no place for nuance—for outliers and exceptions—which we all know life is full of. The devil’s in the details, but it’s also in applying generalized conclusions to special circumstances where things can go very wrong.
The weaponization of data
Another crucial challenge is how to instill fairness in algorithmic decision making—especially when an AI model’s conclusions might be based on faulty, outdated, or incomplete data. It’s well known at this point that AI systems can perpetuate the biases of their human creators, sometimes with terrible consequences for an entire community.
As more and more companies rely on algorithms to help them fill positions or determine a driver’s risk profile, it becomes more likely that our own data will be used against our own interests. You may one day be harmed by the automated decisions, recommendations, or predictions these systems make, with very little recourse available.
It’s also a problem when these predictions or labels become facts in the eyes of an algorithm that can’t distinguish between true and false. To modern AI, it’s all data, whether it’s personal, public, factual, or totally made up.
More integration means less security
Just as your internet presence is only as strong as your weakest password, the integration of large AI tools with other platforms gives attackers more latches to pry at when trying to access private data. Don’t be surprised if some of those integrations are not up to standard, security-wise.
And that’s not even considering all the companies and government agencies harvesting your data without your knowledge. Think about the surveillance cameras around your neighborhood, facial recognition software tracking you around a concert venue, kids running around your local park with GoPros, and even people trying to go viral on TikTok.
The more people and platforms handle your data, the more likely it is that something will go wrong. More room for error means a higher chance that your information spills all over the internet, where it could easily be scraped into an AI model’s training dataset. And as mentioned above, that’s terribly difficult to undo.
What you can do
The bad news is that there’s not a lot you can do about any of it right now—not about the possible security threats stemming from AI training datasets containing your information, nor about the predictive systems that may be keeping you from landing your dream job. Our best bet, at the moment, is to demand regulation.
The European Union is already moving ahead by passing the first draft of the AI Act, which will regulate how companies and governments can use this technology based on acceptable levels of risk. US president Joe Biden, meanwhile, has used executive orders to award funding for the development of ethical and equitable AI technology, but Congress has passed no law that protects the privacy of US citizens when it comes to AI platforms. The Senate has been holding hearings to learn about the technology, but it hasn’t come close to putting together a federal bill.
As the government works, you can—and should—advocate for privacy regulation that includes AI platforms and protects users from the mishandling of their data. Have meaningful conversations with those around you about the development of AI, make sure you know where your representatives stand in terms of federal privacy regulation, and vote for those who have your best interests at heart.