Data Provenance - Unintended Consequences of Multiple Data Breaches
by Michael Queralt
Multiple data breaches have left compromised identity data flowing freely through the vast digital ecosystem, and the result is a data provenance problem, one that will have a great impact on individuals as that data makes its way into the data supply chain.
When compromised data enters the regular supply chain without proper vetting, it is aggregated and used by a number of organizations, which gives it an appearance of validity. Once in the supply chain, it is consumed by algorithms to make data-driven, actionable decisions, which will have a ripple effect on unsuspecting parties and actors.
In this age of AI, ML, and smart algorithms, the issue is that those algorithms are being deployed to streamline and improve many processes, making opaque decisions that will have a long-term impact on the same individuals whose data was compromised in the first place.
As an example, in recruiting, algorithms are used to score candidates using data from many sources, including Equifax. An individual's application can therefore be affected by compromised data that has found its way back into the same company that exposed it, via a complex and obscure data supply chain.
“After multiple months of trying to find a new position, applying to an average of five positions per week, Maria was totally defeated. She did not understand. She had the qualifications, she had the experience, her resume was written by a professional, and she was submitting customized cover letters for each position, yet she did not get a single call back. Then came the decisive call from a recruiter, who asked, ‘Which Maria are you?’ That is when she realized: how many Marias are there?”
In this example, how many Marias exist? How do you determine which is the real Maria if multiple identities have been created for the same person, all using the same relationship information with a slight change of address?
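To make the "which Maria" question concrete, here is a minimal sketch of how near-duplicate identities might be surfaced. It assumes each record carries a name, a date of birth, and an address. The field names, the sample records, and the 0.85 similarity threshold are all illustrative assumptions, not a description of any real screening system.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records: field names and values are invented for illustration.
records = [
    {"id": 1, "name": "Maria Lopez", "dob": "1985-04-12", "address": "12 Oak Street, Apt 4"},
    {"id": 2, "name": "Maria Lopez", "dob": "1985-04-12", "address": "12 Oak Street, Apt 40"},
    {"id": 3, "name": "Maria Lopez", "dob": "1985-04-12", "address": "21 Oak Street, Apt 4"},
    {"id": 4, "name": "Maria Lopez", "dob": "1990-09-30", "address": "800 Pine Avenue"},
]

def address_similarity(a, b):
    """Return a 0..1 similarity ratio between two address strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_suspect_clusters(records, threshold=0.85):
    """Group records by (name, dob) and return the ids of records whose
    address differs from, but closely resembles, another in the same group."""
    by_identity = defaultdict(list)
    for rec in records:
        by_identity[(rec["name"].lower(), rec["dob"])].append(rec)

    clusters = []
    for group in by_identity.values():
        flagged = set()
        for r1, r2 in combinations(group, 2):
            identical = r1["address"].lower() == r2["address"].lower()
            similar = address_similarity(r1["address"], r2["address"]) >= threshold
            # A different-but-almost-identical address is the warning sign.
            if similar and not identical:
                flagged.update((r1["id"], r2["id"]))
        if flagged:
            clusters.append(sorted(flagged))
    return clusters

print(find_suspect_clusters(records))
```

A check like this can only flag that several Marias exist. It cannot say which one is real, and that gap is exactly where provenance matters.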
The data supply chain is an interwoven process of originators, suppliers, aggregators and consumers, where provenance is difficult to ascertain and validate.
The issue in the coming years is understanding the impact of long-forgotten data breaches and how they affect individuals when they interact with an organization that uses data- and algorithm-driven decision making, for example when trying to open a bank account, find a new job, request a new service, or change insurance services.
It is then that the full effect of the data breaches will be felt and understood by the individuals whose private data was captured, aggregated and compromised.
Algorithm-driven organizations must understand the provenance of the data they are using. They must audit and validate it; otherwise they could become liable for decisions that they do not understand, based on data that they do not control.
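As a rough illustration of what auditing provenance could look like in practice, the sketch below assumes each record carries an ordered chain of the parties it passed through, from originator to current holder. The `DataRecord` type, the `audit_provenance` function, and the list of trusted originators are hypothetical names invented for this example. A real audit would depend on the organization's own suppliers and legal requirements.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list of originators the organization has vetted.
TRUSTED_ORIGINATORS = {"employer_hr_system", "government_registry"}

@dataclass
class DataRecord:
    subject: str
    value: str
    # Ordered hops from the originator (first) to the current holder (last).
    chain: list = field(default_factory=list)

def audit_provenance(record, trusted=TRUSTED_ORIGINATORS):
    """Return a list of findings; an empty list means the record passed."""
    findings = []
    if not record.chain:
        # Data with no recorded origin cannot be validated at all.
        findings.append("no provenance recorded")
        return findings
    if record.chain[0] not in trusted:
        findings.append(f"untrusted originator: {record.chain[0]}")
    if "unknown" in record.chain:
        findings.append("chain contains an unidentified supplier")
    return findings

vetted = DataRecord("Maria", "12 Oak Street", ["employer_hr_system", "aggregator_a"])
orphan = DataRecord("Maria", "21 Oak Street")
print(audit_provenance(vetted))
print(audit_provenance(orphan))
```

The design choice here is to return findings instead of a simple pass or fail, so that a decision made on questionable data can later be traced back to the reason it was questionable.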