Methodology for Fraud Detection in credit card transactions with small manual labelling effort (Keras / MLP / Autoencoder)

The annual loss due to fraudulent credit card transactions in France reached 400 million euros in 2016 (source: L’observatoire de la sécurité des moyens de paiement). Even though this figure is small compared with the global loss ($21.8 billion in 2015 according to the Nilson Report), fraud detection is an important concern for banks.

The variety of payment methods and of devices that can be used to pay (mobile phones, smart watches, etc.) makes it harder to label a transaction as fraudulent or normal. Because many payment devices can be linked to the same credit card, the number of credit card transactions to analyse is huge. These factors make the day-to-day tracking of transactions challenging: manual labelling is no longer feasible, and identifying new types of fraud becomes difficult.

Fig. 1 The dataset of credit card transactions is highly unbalanced.

We can observe that the majority of fraudulent transactions can be grouped together in a reduced-dimension space (the group of red points in the lower left corner). However, there is a group of fraudulent transactions that can easily be confused with normal transactions (the red dots in the middle of the distribution of blue dots).

We present an unsupervised learning approach that reduces the number of transactions to analyse; new types of fraudulent transactions may then be discovered in this subset. A drawback of this approach is detection quality: it has lower precision and sensitivity for fraud detection than a supervised learning approach. However, discovering new types of fraudulent transactions with a supervised learning approach is very difficult, because we do not know whether the new types of fraud can be represented with the same features or whether new features have to be added to describe them.

The aim of this article is to give a roadmap for implementing a fraud detection mechanism from scratch. A deep learning approach is used for both the unsupervised and the supervised learning algorithms, which reduces the effort of manually labelling credit card transactions. Finally, we suggest how to update the algorithms for new types of fraud, using either transfer learning or retraining of the previous model with a bigger labeled dataset, i.e. the labeled dataset grows as the total dataset grows. This does not mean that all transactions are labeled, only a reduced group of them.

1. Fraud detection mechanism

The methodology that we use to build a fraud detection mechanism from scratch includes three phases: the analysis of an initial dataset, from which a subset is selected for a manual labelling request; the building of a fraud prediction model using the labeled subset; and the update or evolution of that model and of the labeled subset.

Fig. 2 The models are defined in the first two steps of the methodology: the first step defines an Autoencoder to select a subset for manual labelling. The second step defines an MLP architecture for fraud prediction using a labeled dataset.

Analysis of the initial unlabeled dataset

The main constraint in developing a fraud detection mechanism is the lack of labeled data. Tagging a transaction as fraudulent or normal is difficult, not only because of the limited number of experts but primarily because of the huge number of transactions to analyse. Labeling millions of transactions and detecting the features that characterize fraud is not easy, especially when frauds are far fewer than normal transactions, i.e. fraudulent transactions represent 1% or even less of the total in some datasets.

A fraudulent transaction can be seen as an outlier in a transaction dataset. Indeed, Barnett and Lewis (1994) ¹ define an outlier as an observation which appears to be inconsistent with the remainder of the dataset, or one that appears to deviate markedly from other members of the sample in which it occurs.

To deal with the huge number of transactions to label, we use an unsupervised learning approach in this phase, since it does not need any labels. The algorithm learns to detect the outliers in a dataset automatically, using as input only the features that characterize each transaction. In summary, the advantages of the unsupervised learning approach are:

Automatically extract meaningful features from the data.
Leverage the availability of unlabelled data.

Some examples of unsupervised learning algorithms are Restricted Boltzmann Machines, Sparse Coding Models, and Autoencoders. In our case, we use the last one to show that an unsupervised learning approach can reduce the dataset for a manual labelling request.

An Autoencoder ² is a neural network architecture composed of an Encoder and a Decoder. The goal of an autoencoder is to copy its input to its output through a reconstruction process. The encoder maps the input to a hidden layer space and the decoder reconstructs the input from that space. Autoencoder architectures differ in the dimension of the hidden layer space and in the inputs used in the reconstruction process.

In our approach, we use an Undercomplete Autoencoder, which performs a dimensionality reduction similar to PCA. The hidden layer space has fewer dimensions than the input, and we can see the encoding phase as a feature extraction process.

The usual loss function for autoencoders is the squared error, which measures the reconstruction error of a data point. To limit overfitting in the autoencoder, regularizers as well as dropout layers can be used in the architecture of the encoder.

Given the size of the dataset and the small number of features used to characterize each transaction, the Autoencoder has only two hidden layers in each of the encoder and the decoder, and the number of features at the output of the encoder is a quarter of the initial number of features.

The subset for the manual labelling request is selected according to a threshold on the reconstruction error. We choose a threshold that divides the dataset in two groups: a group that contains at least 95% of the dataset, with a low reconstruction error, and a group with the remaining transactions, which have a high reconstruction error. With this procedure, at most 5% of the dataset is included in the subset that will be analysed to assign a label, i.e. to classify the transactions as fraudulent or normal. If the original dataset is bigger, the percentage of transactions to analyse can be reduced.
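The selection step can be sketched as follows. This is a minimal illustration rather than the implementation used for the article: it assumes the reconstructions have already been obtained from the trained autoencoder (for instance with `autoencoder.predict`), and the function names `reconstruction_errors` and `select_for_labelling`, the `keep_fraction` parameter and the nearest-rank percentile rule are all assumptions made for the example. It uses only the standard library.

```python
import math


def reconstruction_errors(inputs, reconstructions):
    """Mean squared reconstruction error of each transaction."""
    return [
        sum((x - r) ** 2 for x, r in zip(row, rec)) / len(row)
        for row, rec in zip(inputs, reconstructions)
    ]


def select_for_labelling(errors, keep_fraction=0.95):
    """Return (threshold, indices of transactions to send for labelling).

    The threshold is the nearest-rank percentile of the errors, so at
    least `keep_fraction` of the dataset lies at or below it. Only the
    transactions strictly above the threshold are selected.
    """
    ordered = sorted(errors)
    rank = math.ceil(keep_fraction * len(ordered))  # 1-based nearest rank
    threshold = ordered[rank - 1]
    selected = [i for i, error in enumerate(errors) if error > threshold]
    return threshold, selected
```

Because the selection uses a strict comparison, ties at the threshold stay in the well-reconstructed group, which keeps the selected subset at or below the intended share of the dataset.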

A drawback of this approach is that it does not distinguish between fraudulent and normal transactions with similar reconstruction errors. However, it detects a group of abnormal transactions, fraudulent or normal, that are difficult to reconstruct. This subset is much smaller than the original dataset, which is the main advantage of the autoencoder: it reduces the dataset to be manually labeled.

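from keras import regularizers
from keras.layers import Dense, Dropout, Input
from keras.models import Model

# input_dim, encoder_l1, encoder_l2, decoder_l1, decoder_l2, dropout_prob and
# dropout_seed are hyperparameters expected to be defined at module level.
def get_autoencoder_model():
    # Building the model
    inputs = Input(shape=(input_dim,))
    # ENCODER layers (L1 activity regularization on the first layer)
    encoder = Dense(units=encoder_l1, activation="tanh",
                    activity_regularizer=regularizers.l1(10e-5))(inputs)
    encoder = Dense(units=encoder_l2, activation="relu")(encoder)
    encoder = Dropout(dropout_prob, seed=dropout_seed)(encoder)
    # DECODER layers
    decoder = Dense(units=decoder_l1, activation="tanh")(encoder)
    decoder = Dense(units=decoder_l2, activation="relu")(decoder)
    # Defining the AUTOENCODER (the keyword is `outputs`, not `output`)
    autoencoder = Model(inputs=inputs, outputs=decoder)
    # Compiling the model
    autoencoder.compile(optimizer='Adam', loss='mean_squared_error',
                        metrics=['accuracy'])
    return autoencoder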

Fig. 3 Implementation of the Autoencoder in Keras.

Modelling and evaluation

The manual labelling of the subset defined in the previous step should be done by an expert. This new labeled subset is then used to build a supervised learning model. Following the deep learning approach, the model for this step is based on a Multi-Layer Perceptron (MLP) ³.

The labeled dataset is then divided into three groups: training, validation and test sets. Splitting the data is important to evaluate how well the model generalizes.
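A possible way to perform this split is sketched below. The article does not state the split proportions or whether the split is stratified; the 70/15/15 fractions, the per-class (stratified) shuffling and the function name `stratified_split` are assumptions chosen for illustration, motivated by the strong class imbalance of the data.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split sample indices into train/validation/test sets per class.

    Each class is shuffled and cut separately, so the rare fraud class
    appears in all three sets in roughly the same proportion.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(fractions[0] * len(indices))
        n_val = int(fractions[1] * len(indices))
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, validation, test
```

The function returns index lists rather than copies of the data, so the same split can be applied to the feature matrix and to the label vector.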

The MLP model has three hidden layers, all fully connected. The output layer is a softmax layer with two outputs, one for fraudulent transactions and the other for normal transactions. Using a softmax layer lets us predict a transaction as normal or fraudulent without any explicit threshold, i.e. we select the class with the maximum probability as the predicted class. The task is thus framed as a multi-class classification (here with two classes), and the loss function used to train the model is the categorical cross-entropy.
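The role of the softmax output and of the categorical cross-entropy can be illustrated with a small standard-library sketch. It is not taken from the article's code: the helper names and the class order in `CLASS_NAMES` (index 0 for normal, index 1 for fraudulent) are assumptions made for the example.

```python
import math

CLASS_NAMES = ("normal", "fraudulent")  # assumed index order


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, n_classes=2):
    """Encode an integer class label as the target vector used by the loss."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


def categorical_crossentropy(target, probabilities):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probabilities) if t > 0)


def predict_class(probabilities):
    """Pick the index of the most probable class (no explicit threshold)."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)
```

With only two classes, picking the most probable class is equivalent to an implicit cut at 0.5 on the fraud probability, so an explicit threshold can still be introduced later if false negatives need to be traded against false positives.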

Furthermore, the training process uses an early stopping mechanism as well as a reduction of the learning rate on plateau: if the validation loss does not improve after some epochs, the learning rate is halved, and training continues until the early stopping mechanism stops it.
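In Keras, these two mechanisms are typically provided by the `ReduceLROnPlateau` and `EarlyStopping` callbacks. To make their interaction explicit, the sketch below replays a list of validation losses through a simplified version of both rules. Only the halving factor comes from the text; the patience values, the initial learning rate and the function name `simulate_training_schedule` are assumptions, and the simulation deliberately ignores refinements of the real callbacks such as a minimum improvement or a cooldown period.

```python
def simulate_training_schedule(val_losses, initial_lr=1e-3, lr_patience=2,
                               stop_patience=5, factor=0.5):
    """Replay validation losses through plateau LR reduction and early stopping.

    Returns (epochs_run, learning_rates), where learning_rates[i] is the
    learning rate in effect during epoch i.
    """
    best = float("inf")
    lr = initial_lr
    since_best = 0       # epochs since the best validation loss
    since_lr_cut = 0     # non-improving epochs since the last LR reduction
    learning_rates = []
    for epoch, loss in enumerate(val_losses):
        learning_rates.append(lr)
        if loss < best:
            best = loss
            since_best = 0
            since_lr_cut = 0
            continue
        since_best += 1
        since_lr_cut += 1
        if since_best >= stop_patience:
            return epoch + 1, learning_rates  # early stopping ends training
        if since_lr_cut >= lr_patience:
            lr *= factor                      # halve the learning rate
            since_lr_cut = 0
    return len(val_losses), learning_rates
```

Choosing a stopping patience larger than the learning-rate patience gives the reduced learning rate a few epochs to recover before training is stopped.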

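from keras.layers import Dense, Dropout, Input
from keras.models import Model

# input_nodes, hidden_nodes1, hidden_nodes2, hidden_nodes3, output_nodes,
# dropout_prob and dropout_seed are hyperparameters expected to be defined
# at module level.
def get_mlp_model():
    # Building the model
    inputs = Input(shape=(input_nodes,))
    # HIDDEN layers (fully connected)
    layer1 = Dense(units=hidden_nodes1, activation="sigmoid")(inputs)
    layer2 = Dense(units=hidden_nodes2, activation="sigmoid")(layer1)
    layer3 = Dense(units=hidden_nodes3, activation="sigmoid")(layer2)
    layer3 = Dropout(dropout_prob, seed=dropout_seed)(layer3)
    # OUTPUT layer: one softmax unit per class
    output = Dense(units=output_nodes, activation="softmax")(layer3)
    # Defining the MLP (the keyword is `outputs`, not `output`)
    model = Model(inputs=inputs, outputs=output)
    # Compiling the model
    model.compile(optimizer='Adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model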

Fig. 4 Implementation of the MLP in Keras.

Update or evolution of the model

The MLP model can then be tested on a new dataset. There are three tasks to perform in this step:

Test the MLP model: This task detects fraudulent transactions in the entire dataset. It can be used in real time. However, it does not detect new types of fraudulent transactions.
Test the autoencoder model: This task defines a subset which has to be manually labeled. Constantly analysing new data is very important to detect new types of fraud. The newly labeled subset can then be added to the labeled subset defined previously. The labeled subset grows over time, and it can be used to create new models or to update the existing ones.
Update the model: In this task, we can retrain the previous MLP model or build a new one using the new labeled dataset. If the model is already complex, we can also use a transfer learning approach. This task updates the model to account for the new types of transactions discovered gradually.
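One iteration of this update step can be sketched as below. It is a hypothetical outline, not the article's implementation: the function name `update_cycle` and its parameters are assumptions, the autoencoder is abstracted as a `reconstruction_error` callable, and the expert is abstracted as an `ask_expert` callable that returns a label.

```python
def update_cycle(new_transactions, reconstruction_error, threshold,
                 ask_expert, labelled_pool):
    """Run one update iteration and return the transactions sent to the expert.

    reconstruction_error: callable giving the autoencoder error of a transaction
    ask_expert: callable returning the expert's label for a transaction
    labelled_pool: list of (transaction, label) pairs, extended in place
    """
    # Flag the transactions that the autoencoder reconstructs poorly.
    flagged = [t for t in new_transactions
               if reconstruction_error(t) > threshold]
    # Ask the expert to label them and grow the labelled pool.
    labelled_pool.extend((t, ask_expert(t)) for t in flagged)
    return flagged
```

After each iteration, the enlarged `labelled_pool` is what the MLP would be retrained or fine-tuned on.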

Fig. 5 The third step of the methodology uses the models defined in steps 1 and 2. It integrates a re-training/update process using a new labeled dataset which could include new types of fraudulent transactions.

2. Conclusions

It is possible to use a deep learning approach to implement a fraud detection mechanism. Using both unsupervised and supervised learning approaches, it is possible to reduce the effort of manually labelling a dataset. The discovery of new types of fraud can be incorporated during the update of the model. The approach for the model update depends on the quantity of labeled data, i.e. the model can be retrained with a new dataset, or a transfer learning approach can be used for more complex model architectures.

A deep learning approach can give good results. However, it is very important to compare a deep learning model with traditional machine learning approaches. The difference between the two lies in the creation of hand-engineered features, which in traditional machine learning is part of the data preprocessing phase. Comparing both techniques could give good insights for detecting fraudulent transactions more easily. This comparison should therefore be explored, along with better tuning of the MLP to reduce the false negative and false positive rates for fraudulent transactions.

3. References

¹ Barnett, V. & Lewis, T. (1994). Outliers in Statistical Data, 3rd edn. John Wiley & Sons.

² Baldi, P. (2012). "Autoencoders, unsupervised learning, and deep architectures." Proceedings of ICML Workshop on Unsupervised and Transfer Learning.

³ Maes, S., et al. (2002). "Credit card fraud detection using Bayesian and neural networks." Proceedings of the 1st International NAISO Congress on Neuro Fuzzy Technologies.
