A new study shows that even when real names and other personal information are stripped from customer data, it’s often possible to use just a few pieces of the information to identify a specific person, the New York Times reported.
In a study titled “Unique in the Shopping Mall: On the Reidentifiability of Credit Card Metadata,” published Friday (Jan. 30) in the journal Science, a group of data scientists analyzed credit card transactions made by 1.1 million people in 10,000 stores over a three-month period. The data set contained details including the date of each transaction, the amount charged and the name of the store, but had been anonymized by removing personal details like names and account numbers.
Even with the “personally identifiable information” removed, the uniqueness of customers’ behavior made it easy to single them out, the researchers said. In fact, knowing just four random pieces of information was enough to reidentify 90 percent of the shoppers as unique individuals and to uncover their records, the researchers calculated.
That uniqueness of behavior — or “unicity,” as the researchers call it — combined with publicly available information, such as Instagram or Twitter posts, could make it possible to reidentify people’s records by name.
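To make the idea of unicity concrete, here is a minimal Python sketch of how such a figure could be estimated. It is not the researchers' code. The function name `estimate_unicity`, the toy shoppers and the (store, date) tuples are all hypothetical, and the sketch assumes each shopper's record can be reduced to a set of such points. For each shopper it picks a few of their transactions at random and checks whether any other shopper's record also contains all of them.

```python
import random


def estimate_unicity(traces, num_points=4, trials=1, seed=0):
    """Estimate the share of shoppers singled out by `num_points` known transactions.

    `traces` maps a shopper ID to a set of (store, date) tuples.
    This layout is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    unique = 0
    tested = 0
    for points in traces.values():
        if len(points) < num_points:
            continue  # too few transactions to sample from
        for _ in range(trials):
            # Pretend an outsider knows these few transactions.
            known = set(rng.sample(sorted(points), num_points))
            # Count every record that contains all of the known points.
            matches = sum(1 for other in traces.values() if known <= other)
            unique += matches == 1
            tested += 1
    return unique / tested if tested else 0.0


# Hypothetical toy data: three shoppers with three transactions each.
toy_traces = {
    "u1": {("bakery", "01-05"), ("cafe", "01-06"), ("grocer", "01-07")},
    "u2": {("bakery", "01-05"), ("cafe", "01-06"), ("gym", "01-08")},
    "u3": {("bakery", "01-05"), ("bookshop", "01-09"), ("grocer", "01-07")},
}

print(estimate_unicity(toy_traces, num_points=3))  # prints 1.0
```

In this toy data all three shoppers visited the same bakery on the same day, so that one fact identifies no one, yet each full set of three transactions belongs to exactly one shopper. A real analysis would run over far larger data and average many random draws per shopper.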
“The message is that we ought to rethink and reformulate the way we think about data protection,” said Yves-Alexandre de Montjoye, a graduate student in computational privacy at the M.I.T. Media Lab who was the lead author of the study. “The old model of anonymity doesn’t seem to be the right model when we are talking about large-scale metadata.”
The study may give ammunition to privacy advocates who have challenged the consumer-tracking processes used on supposedly anonymized customer data. A survey last fall reported that 60 percent of Americans said they are comfortable divulging information about themselves anonymously to their favorite stores.
But if standard techniques aren’t actually making individuals unidentifiable, companies or institutions collecting the data should quantitatively attest to the risks of reidentification, the researchers wrote in the study, adding, “A data set’s lack of names, home addresses, phone numbers or other obvious identifiers does not make it anonymous nor safe to release to the public and to third parties.”