Forecasting fraudulent transactions of YKB credit card users with machine learning techniques instead of traditional rule-based systems. Credit card fraud means a transaction that is not intentionally performed by the cardholder.
Yapı Kredi Teknoloji is a Turkish company that works in the fields of machine learning, data mining, pattern recognition, artificial intelligence, and natural language processing, and also develops mobile applications.
Existing fraud prevention mechanisms in banks are mostly based on manually defined rules. These rules evaluate the fraud risk of credit card transactions and alert the fraud operation team according to the risk scores they produce. The operation team then reports the flagged transactions to the cardholders each day, within the limits of its daily call capacity. This challenge is about forecasting fraudulent transactions of YKB credit card users with machine learning techniques instead of traditional rule-based systems. Credit card fraud means a transaction that is not intentionally performed by the cardholder.
The challenge has the following sample datasets available for download:
Dataset Description: The dataset is anonymized with the PCA method and balanced at card level to reduce the high class imbalance. Half of the credit cards are selected because they have at least one fraudulent transaction in the given time frame. The remaining half consists of credit cards that have no fraudulent transactions in that time frame. This dataset can be used to train models but should not be used to evaluate their performance. For evaluation, please use the unbalanced dataset that is also provided.
Dataset Description: Unbalanced test set anonymized with PCA and containing transactions of all credit cards in a certain time frame.
There are two metrics to evaluate the performance of the credit card fraud detection models: Detection Rate (DR) and customized False Positive Rate (FPR). Please note that the customized FPR is not the regular FPR used in machine learning.
These metrics are calculated from the confusion matrix. In the confusion matrix:
- Positives: Fraudulent transactions, Negatives: Legitimate transactions
- FN: False Negative, FP: False Positive, TP: True Positive, TN: True Negative
- DR is the fraction of fraudulent credit card transactions that the model correctly detects. The customized FPR is the ratio of legitimate transactions flagged as fraudulent to the number of correctly detected fraudulent transactions. The formulas for DR and FPR are as follows.
- DR = TP / (TP + FN) = (# of detected fraudulent transactions) / (# of fraudulent transactions)
- FPR = FP / TP = (# of legitimate transactions detected as fraudulent) / (# of detected fraudulent transactions)
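As an illustration of the two formulas, the following minimal Python sketch counts the confusion matrix entries from labels and predictions and then computes DR and the customized FPR. The function names, the 1 = fraudulent / 0 = legitimate encoding, and the choice to return infinity when TP is zero are assumptions made for this example; they are not part of the challenge materials.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN), assuming 1 = fraudulent and 0 = legitimate."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def detection_rate(tp, fn):
    """DR = TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("DR is undefined without fraudulent transactions")
    return tp / (tp + fn)


def customized_fpr(fp, tp):
    """Customized FPR = FP / TP (not the usual FP / (FP + TN))."""
    if tp == 0:
        # Assumption for this sketch: treat the ratio as unbounded.
        return float("inf")
    return fp / tp


# Toy data, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("DR:", detection_rate(tp, fn))
print("Customized FPR:", customized_fpr(fp, tp))
```

Because the customized FPR divides by TP rather than by the number of legitimate transactions, it can exceed 1, which is why a value such as 30/1 is meaningful here.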
Lastly, the expected result is that at a customized FPR of 30/1 (that is, 30 false positives for every true positive), DR should be higher than 50%. Only results on the unbalanced dataset(s) should be reported.
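One way to check this target for a model that outputs a fraud score per transaction is to sweep the decision threshold and keep the highest DR whose customized FPR stays within the limit. The sketch below is a hedged example, not the official evaluation procedure: the function name `best_dr_at_fpr`, the reading of "at 30/1 FPR" as FP / TP <= 30, and the convention that a transaction is flagged when its score is at or above the threshold are all assumptions.

```python
def best_dr_at_fpr(y_true, scores, max_fpr=30.0):
    """Return (best DR, threshold) subject to FP / TP <= max_fpr.

    Assumes 1 = fraudulent, 0 = legitimate, and that higher scores mean
    higher fraud risk. Returns (0.0, None) if no threshold qualifies.
    """
    total_fraud = sum(y_true)
    if total_fraud == 0:
        raise ValueError("DR is undefined without fraudulent transactions")

    # Visit transactions from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    best_dr, best_threshold = 0.0, None
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Flag every transaction that shares this score (ties move together).
        while i < len(order) and scores[order[i]] == threshold:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp > 0 and fp / tp <= max_fpr:
            dr = tp / total_fraud
            if dr > best_dr:
                best_dr, best_threshold = dr, threshold
    return best_dr, best_threshold


# Toy scores, invented purely for illustration; a small limit is used
# here so that the constraint is visible on eight transactions.
y_true = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

dr, threshold = best_dr_at_fpr(y_true, scores, max_fpr=1.0)
print("Best DR:", dr, "at threshold:", threshold)
```

On the unbalanced dataset, the same function would be called with `max_fpr=30.0` and the result compared against the 50% DR target.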