Foundations of a Fraud Detection System

img

Icons for the image were picked from 3, 4, 5

Online fraud has been a problem since the early days of e-commerce. Companies such as Google, Amazon, and Microsoft have been working to tackle it for years. When the first internet companies emerged in the late 90s, machine learning had not yet seen widespread application. However, organizations still needed to tackle fraud to protect themselves from monetary and customer loss. They started by analyzing the common behaviors of fraudsters and setting up teams to manually review high-cost activities. Even today, manual reviews and analysis of common fraud behaviors are the foundations of fraud detection in any company where fraud vectors have started to emerge. Large, established organizations still use manual reviews as an additional layer of validation, despite having highly accurate machine learning-based fraud detection systems, to make sure that they are not creating unnecessary friction for legitimate users by marking them as fraudulent (remember that machine learning can get things wrong in various situations).

Manual reviews and behavioral analytics are two foundational ways to detect fraud. However, neither works in real time, i.e., it takes time to identify an online activity as fraud after it has already happened. To prevent fraud, it is equally important to use the knowledge gained from past fraud patterns to stop similar activities from occurring in the future. The simplest way to encode that knowledge is in the form of rules. Rules can be classified into two categories: 1) lists, e.g., a list of usernames or IP addresses to deny or allow access, and 2) thresholding, i.e., limiting access if a certain pattern is observed. For example, when more than 25 items are bought from an e-commerce site within 30 minutes, block further transactions and send the activity for manual review. Notice how different mechanisms are knitted together in a fraud prevention system: behavioral patterns are used to devise rules, and those rules then decide which activities to send for manual review.
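To make the thresholding example concrete, here is a minimal sketch of the "more than 25 items within 30 minutes" rule as a sliding window per account. The class name, the `account_id` key, and the `"allow"` / `"review"` return values are assumptions made for illustration. A production system would keep this state in a shared store, not in process memory.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)  # window from the example above
MAX_ITEMS = 25                  # threshold from the example above


class PurchaseVelocityRule:
    """Tracks items bought per account inside a sliding time window."""

    def __init__(self, window=WINDOW, max_items=MAX_ITEMS):
        self.window = window
        self.max_items = max_items
        # account_id -> deque of (timestamp, item_count), oldest first
        self._events = defaultdict(deque)

    def check(self, account_id, timestamp, item_count):
        """Record a purchase and return "review" if the window total exceeds the threshold."""
        events = self._events[account_id]
        events.append((timestamp, item_count))
        # Drop purchases that have fallen out of the window.
        while events and timestamp - events[0][0] > self.window:
            events.popleft()
        total = sum(count for _, count in events)
        return "review" if total > self.max_items else "allow"


rule = PurchaseVelocityRule()
start = datetime(2024, 1, 1, 12, 0)
print(rule.check("acct-1", start, 20))                          # allow
print(rule.check("acct-1", start + timedelta(minutes=10), 10))  # review
```

Timestamps are assumed to arrive in order for each account. Out-of-order events would need a different data structure.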

Having discussed the foundations of fraud detection systems in a narrative format, let’s also summarize each in bullet format so they are easier to remember. If you are tasked with creating the first ever fraud detection system for your organization, here’s what you could start with:

1. Manual Investigations: Human investigators manually look at online activities for suspicious patterns. These investigators generally have good data analysis skills, good knowledge of common fraud typologies, and a good grasp of various cyber risks. These specialists are also educated on the laws and regulations pertaining to fraud and abuse in the country of operation. They should be well versed in finding common patterns of suspicious activity and anomalies in numbers. An example of a suspicious pattern is an order shipped to a different state than previous orders from the same account, or a shipping state that differs from the billing state of the cardholder for the transaction. Since fraud can happen suddenly, day or night, weekend or weekday, many fraud specialists work in shifts so that someone is present if a fraud outbreak occurs on a Saturday night.

A very important role that manual investigations play is gathering data points and labels for building machine learning models later on. The activities marked by human investigators are used as ground truth and form the foundation of most ML-backed fraud detection models.
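The shipping-state example above, together with the idea of keeping investigator verdicts as labels, could be sketched as follows. The `Order` fields, the signal names, and the shape of the labeled row are all hypothetical. An investigator would weigh many more signals than these two.

```python
from dataclasses import dataclass


@dataclass
class Order:
    account_id: str
    shipping_state: str
    billing_state: str


def suspicious_signals(order, past_shipping_states):
    """Return the reasons an order might deserve a closer look."""
    signals = []
    # Shipping to a state this account has never shipped to before.
    if past_shipping_states and order.shipping_state not in past_shipping_states:
        signals.append("new_shipping_state")
    # Shipping state differs from the cardholder's billing state.
    if order.shipping_state != order.billing_state:
        signals.append("shipping_billing_mismatch")
    return signals


def label_case(order, signals, is_fraud):
    """Store the investigator's verdict as a labeled row for later model training."""
    return {"account_id": order.account_id, "signals": signals, "is_fraud": is_fraud}


order = Order(account_id="acct-1", shipping_state="TX", billing_state="CA")
signals = suspicious_signals(order, past_shipping_states={"CA"})
row = label_case(order, signals, is_fraud=True)  # verdict comes from the human reviewer
```

The signals only surface a case for review. The label itself still comes from the investigator's decision.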

2. Behavioral Analytics: Manual investigations are a good first step for a fraud detection system, but they can’t be performed at scale. Fraudsters act at scale. They use software that can do the same thing over and over, so fraud prevention also has to happen at scale. If you are building a fraud detection system, you need a mechanism to identify and stop commonly occurring malicious anomalies. Such analysis can be done by Fraud Analysts who have at least a basic knowledge of statistical analysis. They can cluster multiple fraud activities (already found by manual investigators) into behavioral patterns. Such patterns can later be encoded into a set of rules, and those rules can be embedded into the website or the app to automatically stop those activities from recurring.
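As a rough illustration of turning confirmed cases into patterns, the sketch below counts how often the same attribute value shows up across cases that investigators have already marked as fraud. This is a simple group-and-count, not a clustering algorithm, and the field names and the `min_cases` cutoff are assumptions.

```python
from collections import Counter


def recurring_attributes(confirmed_cases, attribute, min_cases=3):
    """Return attribute values that appear in at least `min_cases` confirmed fraud cases."""
    counts = Counter(
        case[attribute] for case in confirmed_cases if case.get(attribute)
    )
    return {value: n for value, n in counts.items() if n >= min_cases}


# Hypothetical cases already confirmed as fraud by manual investigators.
cases = [
    {"case_id": 1, "device_id": "dev-A", "ip_address": "198.51.100.4"},
    {"case_id": 2, "device_id": "dev-A", "ip_address": "198.51.100.9"},
    {"case_id": 3, "device_id": "dev-A", "ip_address": "198.51.100.4"},
    {"case_id": 4, "device_id": "dev-B", "ip_address": "198.51.100.4"},
]

print(recurring_attributes(cases, "device_id"))   # {'dev-A': 3}
print(recurring_attributes(cases, "ip_address"))  # {'198.51.100.4': 3}
```

Values that recur this way are candidates for a rule, which an analyst would still want to validate against legitimate traffic before deploying it.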

3. Rules: Rules are pieces of logic that a Software Engineer can code directly into the service.
3.1. Lists: Lists contain attributes that can be used directly to deny access to a service. For example, during manual investigations, you might observe that a fraudster is creating fake accounts and repeatedly using the same device to do so. You could put that Device ID in a deny list. Whenever a new account is created from that Device ID, it gets blocked. Other common attributes used in deny lists include IP address, Device Fingerprint, Browser Cookie, Email address, Card number, Phone number, etc. These attributes help identify multiple activities carried out under different names by the same person, which makes them good candidates for lists. Deny list is also a common term in firewalls, where it is used to stop traffic from a specific origin from entering a network.
3.2. Thresholding: Some rules are not tied to a specific attribute, like a specific IP address, but rather to a specific pattern. For example, if a transaction has more than 10 items worth $10,000 or more, you can flag that transaction as suspicious, or if a user signs in more than 100 times in a single day, you can mark that user as suspicious. Such rules are based on thresholds and are learned from past behaviors. These rules are not perfect and in many cases require another layer of human verification. But they still help put up a wall against high-impact activities and reduce the work for human investigators by sending only the more suspicious activities their way.
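The two kinds of rules can be sketched together as a small evaluation function. This is a minimal sketch under stated assumptions: the event field names, the deny list contents, and the `"block"` / `"manual_review"` / `"allow"` outcomes are hypothetical, and the order rule is read here as "more than 10 items with a total of $10,000 or more".

```python
# Hypothetical deny lists, keyed by the event attribute they apply to.
DENY_LISTS = {
    "device_id": {"device-123"},
    "ip_address": {"203.0.113.7"},
}


def deny_list_hit(event):
    """Return the name of the first deny-listed attribute on the event, or None."""
    for attribute, blocked_values in DENY_LISTS.items():
        if event.get(attribute) in blocked_values:
            return attribute
    return None


def threshold_flags(event):
    """Return the names of the threshold rules that the event trips."""
    flags = []
    if event.get("item_count", 0) > 10 and event.get("order_total", 0) >= 10_000:
        flags.append("large_order")
    if event.get("sign_ins_today", 0) > 100:
        flags.append("excessive_sign_ins")
    return flags


def evaluate(event):
    """Block deny-listed events, send threshold hits to humans, allow the rest."""
    if deny_list_hit(event):
        return "block"
    if threshold_flags(event):
        return "manual_review"
    return "allow"


print(evaluate({"device_id": "device-123"}))               # block
print(evaluate({"item_count": 11, "order_total": 10_000}))  # manual_review
print(evaluate({"item_count": 2, "order_total": 50}))       # allow
```

Deny list hits are blocked outright because they match a known bad attribute, while threshold hits go to manual review because, as noted above, threshold rules are imperfect.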

4. Customer Feedback: Sometimes, customers who are impacted (for example, by an account compromise) reach out to the organization about suspicious activity in their account. Customer feedback is therefore another way of finding fraud and taking enforcement actions, such as helping the customer change their password. However, an organization that only finds fraud through customer feedback is not doing a good job of proactively preventing it and might lose customers.

So, if you are planning to build a fraud detection system in your organization, first think about a team of Investigators, Risk Analysts, Data Scientists, and Software Engineers who will work together to create it. In many situations, especially in startups, all these skills (Investigator + Risk Analyst/Data Scientist + Software Engineer) can be found in a single person. Then, think about how you will build (or find) the infrastructure and tools that 1) make it easy for investigators to visualize the information they need, 2) keep customers’ data secure while still allowing your risk analysts (who have access to confidential data for security purposes) to analyze it for suspicious behaviors, and 3) make it easy for your customers to reach out if they see anything suspicious. Later on in the series, we will build such an infrastructure from scratch.
