Using Online, Social Data To Make ‘Thin Files’ Thick

Can using online and social data give consumers without a significant credit history better odds of having their identity verified? Socure sure thinks so. And it has the predictive data model and a 98.8 percent success rate to prove it.


Can the data trapped in “Digital Exhaust,” such as online and social media data, be used to validate identity and predict fraud?

What if you could build a model that helps financial institutions of any kind fill the gaps in thin credit files with online and social data, making those files thick without increasing risk?

Socure ran that experiment. Here is how it unfolded.

Step One: Correlate a consumer’s participation in various types of social networks (traditional, blog-based and professional) with the associated fraud risk.

Socure found that while an individual who belongs to no social networks carries a fraud risk of 22.9 percent, that figure drops by 5.7 percent when the consumer participates in all three types.
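Step One comes down to grouping identities by how many of the three network types they belong to and comparing fraud rates across the groups. The Python sketch below is a hypothetical illustration of that calculation, not Socure’s code; the network-type labels and the `(networks, is_fraud)` record shape are assumptions. It also shows why the reported drop is ambiguous: “5.7 percent” could mean 5.7 percentage points or a 5.7 percent relative reduction, and the article does not say which.

```python
from collections import defaultdict

# Hypothetical labels for the three network types named in Step One.
NETWORK_TYPES = {"traditional", "blog", "professional"}

def fraud_rate_by_participation(records):
    """Return {number_of_network_types: observed_fraud_rate}.

    `records` is an iterable of (networks, is_fraud) pairs, where `networks`
    is a set of network-type labels (an assumed input shape).
    """
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for networks, is_fraud in records:
        k = len(set(networks) & NETWORK_TYPES)  # 0..3 network types joined
        totals[k] += 1
        frauds[k] += int(is_fraud)
    return {k: frauds[k] / totals[k] for k in sorted(totals)}

# Two readings of "drops by 5.7 percent" from the 22.9 percent baseline.
BASELINE = 22.9
as_points = BASELINE - 5.7            # drop of 5.7 percentage points
as_relative = BASELINE * (1 - 0.057)  # drop of 5.7% of the baseline

print(f"percentage-point reading: {as_points:.1f}%")  # 17.2%
print(f"relative reading: {as_relative:.1f}%")        # 21.6%
```

Under the percentage-point reading, consumers active in all three network types would carry roughly a 17.2 percent fraud risk; under the relative reading, roughly 21.6 percent.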


Step Two: Demonstrate the effectiveness of a fraud model that uses social data and takes a supervised learning approach to identity verification.

In the company’s research, three data sets were constructed to test the validity of online and social media data as a means of authentication:

  • Real Data: The control group, consisting of 10,000 real U.S. consumers identified by name, address, phone number and date of birth (DOB).
  • Synthetic/Fake Data: Another 10,000 identities, all fake and generated automatically with an online tool, simulated what a fraudster could make up. Each synthetic identity had a name, DOB, email address (with a valid domain), a random phone number with a valid area code, an address (with a valid city, state, country and ZIP code but a random house number and street name) and a random IP address. As a skilled fraudster would ensure, the city, state, country, ZIP code and area code of each synthetic identity are consistent with one another.
  • Stolen (Simulated) Data: To create this third data set, researchers randomized the real data from the first set, keeping each piece of data valid in and of itself but associating it with different people. This simulates a fraudster’s tactic of stealing most parts of an identity and changing a few components for misdirection, so that goods, funds or services are delivered to the fraudster rather than to the legitimate consumer.
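The sketch below shows one way fake and stolen records like those described above could be produced. It is illustrative only: the field names, the two-row geography table, the street list and the placeholder email domain are assumptions, and Socure generated its synthetic identities with an online tool rather than with code like this. For the stolen set, rotating selected fields by a random nonzero offset is a simple way to guarantee that every record receives another person’s values while each value stays individually valid.

```python
import random

# Illustrative geography table: each row is an internally consistent
# city/state/ZIP/area-code combination (assumed sample values).
GEO = [
    {"city": "Austin", "state": "TX", "zip": "78701", "area_code": "512"},
    {"city": "Denver", "state": "CO", "zip": "80202", "area_code": "303"},
]
STREETS = ["Oak", "Maple", "Cedar", "Pine"]  # hypothetical street names

def synthetic_identity(rng, first, last, dob):
    """Build one fake identity whose location fields agree with each other."""
    geo = rng.choice(GEO)
    return {
        "name": f"{first} {last}",
        "dob": dob,
        # Placeholder domain standing in for "an email with a valid domain".
        "email": f"{first.lower()}.{last.lower()}@example.com",
        # Valid area code for the chosen city, random 7-digit remainder.
        "phone": geo["area_code"] + f"{rng.randrange(10**7):07d}",
        # Random house number and street name.
        "address": f"{rng.randint(1, 9999)} {rng.choice(STREETS)} St",
        "city": geo["city"],
        "state": geo["state"],
        "zip": geo["zip"],
        "country": "US",
        "ip": ".".join(str(rng.randint(1, 254)) for _ in range(4)),
    }

def stolen_identities(real, fields, rng):
    """Give each real record the `fields` values of a different real record.

    Rotating by a random nonzero offset guarantees every record receives
    `fields` from some other record. Requires at least two records.
    """
    n = len(real)
    if n < 2:
        raise ValueError("need at least two real identities")
    offset = rng.randrange(1, n)
    stolen = []
    for i, record in enumerate(real):
        donor = real[(i + offset) % n]
        mixed = dict(record)
        for field in fields:
            mixed[field] = donor[field]
        stolen.append(mixed)
    return stolen
```

Passing a whole contact block, for example `["address", "city", "state", "zip", "phone"]`, as `fields` would move each address together with its matching location fields, so the stolen records stay as coherent as the real ones.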

Step Three: Generate a series of social and online data points for each identity, then test whether those variables accurately classify each identity as real, fake or stolen.

Socure then used its ID+ platform to build a predictive model from the three data sets.
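Socure does not describe the internals of the model ID+ builds, so the following is only a generic illustration of supervised classification into the three labels. It uses a nearest-centroid classifier (a deliberately simple stand-in, not Socure’s method) and three invented features: the number of social profiles found for the identity, the email address’s age in years, and whether the name matches those profiles. The training rows are toy values, not data from the study.

```python
import math

def train_centroids(X, y):
    """Average the feature vectors for each label (supervised training)."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Toy, invented training rows:
# [social_profiles_found, email_age_years, name_matches_profiles]
X = [[3, 8, 1], [2, 6, 1], [0, 0, 0], [0, 1, 0], [2, 5, 0], [3, 7, 0]]
y = ["real", "real", "fake", "fake", "stolen", "stolen"]
model = train_centroids(X, y)
print(predict(model, [3, 7, 1]))  # real
```

Nearest-centroid keeps the supervised idea visible (labels drive training, new identities get the closest label), but it is sensitive to feature scale; a real system would at least standardize features, and would likely use a stronger model.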

According to Socure’s research, the predictive model showed a success rate of 98.8 percent.
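The article does not define “success rate.” A common reading for a three-way classifier is accuracy: the share of test identities whose predicted label matches the true label. The sketch below computes that under this assumption, plus a confusion count showing which labels get mixed up (for example, stolen identities predicted as real). It does not reproduce Socure’s evaluation. For scale, 98.8 percent accuracy on a hypothetical 1,000-identity test set means 988 correct classifications and 12 errors.

```python
from collections import Counter

def success_rate(y_true, y_pred):
    """Fraction of identities whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion(y_true, y_pred):
    """Count (true, predicted) pairs; pairs with true != predicted are errors."""
    return Counter(zip(y_true, y_pred))

# Hypothetical example: 988 of 1,000 identities classified correctly.
y_true = ["real"] * 500 + ["stolen"] * 500
y_pred = ["real"] * 500 + ["stolen"] * 488 + ["real"] * 12
print(success_rate(y_true, y_pred))                   # 0.988
print(confusion(y_true, y_pred)[("stolen", "real")])  # 12
```

A single accuracy figure can hide which errors occur; in fraud screening, a stolen identity passed as real usually costs more than a real consumer flagged for review, which is why the confusion counts are worth reporting alongside it.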


For full insight into how online and social data can supplement offline data to provide stronger evidence of a consumer’s true digital identity (or lack thereof) than traditional methods can, download Socure’s new white paper, “Real, Fake or Stolen: Validating the Use of Alternative Data for Identity Verification.” The paper examines how online and social media data can provide a fuller digital identity picture of otherwise “thin file,” under-documented population segments.
