Black Friday, Cyber Monday means more fraud; Machine learning to the rescue?
Although Black Friday traditionally evokes images of mayhem as consumers run wild in brick-and-mortar outlets, this may be changing. In a survey released a few days ago, 80 percent of UK consumers said they plan to avoid stores on Black Friday this year, while 21 percent said they prefer to hunt for Black Friday deals online.
This translates to more and more online sales on peak retail days. One of the issues plaguing online sales, however, is fraud. With more online sales comes more fraud. Data and machine learning can help fight fraud, but this approach, too, has its blind spots, which new EU regulation aims to address.
Big Retail, Big Fraud, Big Data
According to a 2016 report, the average yearly financial expense attributed to fraud for retailers was 7.6 percent of annual revenue across all channels, including online and offline sales. And that is on a business-as-usual day. On peak-retail days, clients operating on Amazon have reportedly seen an increase of 150 percent in fraud attempts.
A recently published data-driven analysis on fraud by Sift Science shows that peak retail days are not the ones with the highest proportion of attempted fraud. They may be the ones with the highest volume overall, but there is a difference.
“The ratio of fraud actually goes down during the holidays because the bad orders are dwarfed by the good orders. If you are a fraud analyst, you may see your workload (aka fraud orders) double or triple, but the amount of good traffic (that the analyst does not see) is increasing way more”.
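The difference between fraud volume and fraud ratio is easy to check with a little arithmetic. The sketch below uses made-up order counts, chosen only for illustration and not taken from any of the reports cited here: fraud orders double on the peak day, yet the fraud rate halves because good orders quadruple.

```python
# Hypothetical order counts, chosen only to illustrate the arithmetic.
def fraud_rate(fraud_orders: int, good_orders: int) -> float:
    """Return the share of all orders that are fraudulent."""
    return fraud_orders / (fraud_orders + good_orders)


# A business-as-usual day: 100 bad orders out of 10,000.
normal_day = fraud_rate(fraud_orders=100, good_orders=9_900)

# A peak day: bad orders double, but total orders quadruple to 40,000.
peak_day = fraud_rate(fraud_orders=200, good_orders=39_800)

print(f"normal day: {normal_day:.2%}")  # 1.00%
print(f"peak day:   {peak_day:.2%}")    # 0.50%
```

The analyst's queue is twice as long on the peak day, which is why it feels like fraud is surging even though the rate has fallen.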
Analyses done by other anti-fraud experts seem to support this as well. When looking at transaction data for retail, we find that fraud rates stay fairly consistent throughout the year and often even go down as an overall percentage of all transactions, according to Angie White, iovation product marketing manager:
This is simply because the volume of good transactions increases over those time frames. This supports what we’ve been saying for years: fraud has become a business. These are sophisticated operations that operate all year long, are constantly evolving their tactics, and view this as their full-time job.
Data Sift’s findings also support this. Fraud does not happen only while you’re asleep; it happens all the time, in sync with legitimate transactions. People don’t commit fraud as a side hustle; being a fraudster is a proper day job, if you can call it that. The question, then, is what to do about it.
Machine learning to the rescue, and humans in the loop
We have already seen how machine learning can take on online retail fraud. But is it the only way to go? Both Data Sift and iovation offer solutions for online retail anti-fraud, so their insights should be of interest.
Lee said Data Sift does see merchants lead with a machine-learning-first approach, as it is the most scalable and adaptable tool that merchants currently have, their mechanical advantage or superpower:
We use supervised machine learning, because it ramps up faster than unsupervised learning and does not take as much data to learn. We leverage our global model: over 12,000 websites use us, and we use them to protect each other.
We have real-time adaptability and anomaly detection, which means that we can stop new forms of bad as soon as they pop up. We don’t need to wait for a model refresh or a rule to be generated. This reduces a business’s overall exposure to fraud.
We practice dynamic friction, meaning that machine learning enables us and merchants to provide amazing user experiences to known, trusted customers. This enables merchants to complete the bigger / Amazon-esque transactions with more tools under their belt. We level the playing field a bit.
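To make the idea of dynamic friction concrete, here is a minimal sketch of how a fraud score might be mapped to a checkout experience. It is not Data Sift's implementation: the function name, the three outcomes, and the 0.2 and 0.8 cut-offs are all assumptions made for illustration.

```python
def friction_for(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map a fraud score in [0, 1] to a checkout experience.

    The thresholds are hypothetical; a real system would tune them.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < low:
        return "frictionless"  # trusted customer: no extra steps
    if score < high:
        return "step_up"       # uncertain: ask for extra verification
    return "block"             # very likely fraud: refuse the order


for s in (0.05, 0.5, 0.95):
    print(s, friction_for(s))
```

The point of the design is that friction is spent only where the score says it is needed, so known, trusted customers never see it.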
Iovation also offers a machine learning solution, but does not rely on machine learning exclusively. White said iovation takes a layered approach to fraud prevention, which allows them to leverage the strengths of both machine learning solutions and human insights with a rules-based system:
Employees are a business’s most important asset. They understand the unique landscape better than any technology ever will. Fraud analysts can catch certain types of fraud machines might miss and react quickly to threats that are unique to that business. But they may miss trends that are too subtle for humans to pick up on or are only noticeable on a global scale.
Our machine learning algorithms analyze billions of combinations of inputs to detect subtle fraud trends across multiple businesses and industries more quickly and accurately than a human. iovation leverages the best of both worlds.
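A layered approach of this kind can be pictured as analyst-written rules checked alongside a model score. The sketch below is an illustration under stated assumptions, not iovation's product: the `Transaction` fields, the blocked-country list, the review amount, and the score cut-offs are all hypothetical, and the model score is assumed to come from some existing machine learning model.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    country: str
    model_score: float  # fraud probability from an ML model, assumed given


# Hypothetical analyst-written rules, specific to one business.
BLOCKED_COUNTRIES = {"XX"}
REVIEW_AMOUNT = 1_000.0


def decide(tx: Transaction) -> str:
    """Combine human-written rules with a model score."""
    # Layer 1: business rules maintained by fraud analysts.
    if tx.country in BLOCKED_COUNTRIES:
        return "deny"
    # Layer 2: the machine learning score catches subtle patterns.
    if tx.model_score >= 0.9:
        return "deny"
    # Layer 3: borderline or unusually large orders go to a human.
    if tx.model_score >= 0.5 or tx.amount >= REVIEW_AMOUNT:
        return "review"
    return "allow"


print(decide(Transaction(amount=40.0, country="GB", model_score=0.1)))
print(decide(Transaction(amount=2_500.0, country="GB", model_score=0.1)))
```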
Online fraud: There’s regulation for that, too
But the online retail anti-fraud business is about to change, and that change is going to affect consumers and retailers as well. This is due to a new EU regulation called PSD2. PSD2, which comes into effect in mid-2019, is mainly about opening bank APIs to third parties. But it also includes provisions applying to online sales.
For online businesses with high fraud occurrences, transactions from €30 and up must go through heavy authentication, which is not going to be a very good experience for customers. Lower fraud rates mean that the threshold goes up and customers aren’t subjected to such authentication. We asked the experts how they see that playing out.
The intention of this directive is good at heart, but it adds unnecessary friction for the more than 99 percent of users out there that are good, according to Lee: “We are essentially making buyers conform to a set of rules because the system is being exploited by a select few bad apples.
Our EU customers have been well prepared for these changes and have put the proper safeguards in place, including upgrading their systems with Data Sift’s suite of products which includes machine learning, reporting, and data visualization tools”.
White thinks we’re likely going to see that the most successful retailers post-PSD2 are those that are able to collaborate with payment service providers (PSPs) to achieve the lowest reference fraud rates and therefore the highest exemption thresholds for Strong Customer Authentication (SCA):
Only PSPs will be able to request exemptions to SCA based on transaction risk analysis. But the reference fraud rates are calculated based on all the transactions that they process, not just those for a single merchant. This means it’s going to require very close collaboration between merchants and PSPs to hit the highest exemption rates.
It will likely have the effect of creating a tiered system where merchants with higher fraud rates will have to pay a higher rate for transaction processing. The chart below correlates the reference fraud rates with the SCA exemption thresholds and predicts how this will correlate to transaction processing costs.
White said this is going to have a tremendous impact on the market, specifically in the e-commerce space: “Conversion rates are already low in this space, and any added obstacles or friction could translate into an increase in cart abandonments.
The winners are going to be companies that really look at how they can maximize their SCA exemptions to reduce the total number of transactions subject to SCA. And then provide low-friction, user-friendly authentication for transactions that are subject to SCA”.
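The link between a PSP's fraud rate and the exemption threshold can be sketched as a simple lookup. The bands below are the reference fraud rates for remote card-based payments as we understand them from the annex to the PSD2 regulatory technical standards on SCA; treat them, and the function names, as assumptions to be checked against the official text. The €30 fallback is the low-value figure mentioned above, and the sketch ignores the other conditions attached to each exemption.

```python
# (maximum reference fraud rate, exemption threshold in EUR), strictest first.
# Assumed from the PSD2 RTS annex for remote card-based payments.
TRA_BANDS = [
    (0.0001, 500),  # 0.01 percent
    (0.0006, 250),  # 0.06 percent
    (0.0013, 100),  # 0.13 percent
]
LOW_VALUE_LIMIT = 30  # EUR; fallback when no risk-analysis band applies


def exemption_threshold(psp_fraud_rate: float) -> int:
    """Highest amount (EUR) that can skip SCA for a PSP with this fraud rate."""
    for max_rate, threshold in TRA_BANDS:
        if psp_fraud_rate <= max_rate:
            return threshold
    return LOW_VALUE_LIMIT


def needs_sca(amount: float, psp_fraud_rate: float) -> bool:
    """True if a payment of this amount would be subject to SCA."""
    return amount > exemption_threshold(psp_fraud_rate)


# The same 200 euro basket, processed by two hypothetical PSPs.
print(needs_sca(200, psp_fraud_rate=0.0005))  # False: threshold is 250
print(needs_sca(200, psp_fraud_rate=0.0010))  # True: threshold is 100
```

Because the rate is computed across everything a PSP processes, one merchant's fraud can push every other merchant on that PSP into a lower band.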
What about interpretability?
Another side effect of regulation has to do with interpretability. Since machine learning is, and will most likely continue to be, the dominant approach for anti-fraud, how can transparency requirements be met by machine learning-based anti-fraud offerings, given the issues with machine learning model interpretability?
Oftentimes, machine learning can be seen as a black box because (if it’s good) it’s crunching thousands of signals in real-time and uncovering the unknown unknowns and their correlations, according to Data Sift.
Lee noted that their machine learning was built in a way to give analysts more insight into the known signals that contributed most heavily to an increased probability of fraud. Data Sift gives the analyst some breadcrumbs to help them better understand the why behind the scores the models output.
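One simple way to produce such breadcrumbs, shown here only as an illustration and not as Data Sift's method, is to use a linear model, where each signal's contribution to the log-odds of fraud is just its weight times its value. The signal names, weights, and bias below are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weights of a logistic-regression fraud model.
WEIGHTS = {"new_account": 1.2, "ip_country_mismatch": 2.0, "order_value_z": 0.5}
BIAS = -4.0


def explain(features: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the fraud probability and the signals ranked by contribution."""
    # In a linear model, each signal adds weight * value to the log-odds.
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    log_odds = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


prob, ranked = explain(
    {"new_account": 1.0, "ip_country_mismatch": 1.0, "order_value_z": 3.0}
)
print(f"fraud probability: {prob:.2f}")
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

This works because the model is linear. For models crunching thousands of interacting signals, contributions can only be approximated, which is part of why interpretability is hard.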
White thinks the transparency requirements laid out by other regulatory frameworks such as the GDPR and e-Privacy legislation will require vendors to ensure that the most complex solutions can be understood in the simplest terms:
This can be a difficult task because if it was simple we wouldn’t need the predictive power of machine learning in the first place. Our approach allows our customers to accurately distinguish between the users they can trust, and those they can’t. Savvy businesses use this knowledge to focus on improving the experiences for trusted users, while keeping fraudsters at bay.
Unsurprisingly, the answers to this question do not seem to go very deep. Machine learning interpretability is an open research problem. First off, not all models are directly interpretable. Deep learning, for example, is notoriously hard to interpret. Oftentimes, there is a tradeoff between efficiency and interpretability.
But even for the machine learning models that are simpler, for example the ones based on decision trees, the notion of interpretability for trees with thousands of branches is not very clear. Interpretability, or explainable AI, is a topic in and of itself. We will be revisiting it soon.