Improving Classification Performance in Credit Card Fraud Detection by Using New Data Augmentation

Figure 1.
The architecture of K-CGAN: (a) K-CGAN Discriminator Architecture; (b) K-CGAN Generator Architecture with Novelty Loss.
Figure 2.
ROC curve of K-CGAN.
Figure 3.
ROC curve of Original Dataset.
Figure 4.
ROC curve using SMOTE.
Figure 5.
ROC curve using ADASYN.
Figure 6.
ROC curve using B-SMOTE.
Figure 7.
ROC curve using Vanilla GAN.
Figure 8.
ROC curve using WS GAN.
Figure 9.
ROC curve using SDG GAN.
Figure 10.
ROC curve using NS GAN.
Figure 11.
ROC curve using LS GAN.
Figure 12.
ROC curve using K-CGAN.
Figure 13.
Correlation comparison of Original Data, SMOTE, ADASYN, B-SMOTE, Novelty K-CGAN, Vanilla GAN, WS GAN, SDG GAN, NS GAN, and LS GAN methods.
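Figure 13 compares correlation structure across the original and augmented datasets. One simple way to make such a comparison numeric is the Frobenius distance between Pearson correlation matrices; the sketch below uses synthetic stand-in data rather than the Kaggle features:

```python
import numpy as np

def correlation_gap(real, synth):
    """Frobenius-norm distance between the Pearson correlation matrices
    of a real and a synthetic sample -- one way to score how well an
    augmentation method preserves pairwise feature structure."""
    c_real = np.corrcoef(real, rowvar=False)
    c_synth = np.corrcoef(synth, rowvar=False)
    return np.linalg.norm(c_real - c_synth)

rng = np.random.default_rng(42)
real = rng.normal(size=(500, 5))
gap_same = correlation_gap(real, real)                    # identical data -> 0
gap_noise = correlation_gap(real, rng.normal(size=(500, 5)))
```

A smaller gap indicates the synthetic sample reproduces the original pairwise correlations more faithfully.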
Figure 14.
Univariate V1 Feature Distribution comparison of Original Data, SMOTE, ADASYN, B-SMOTE, Novelty K-CGAN, Vanilla GAN, WS GAN, SDG GAN, NS GAN, and LS GAN methods.
Figure 15.
Univariate V5 Feature Distribution comparison of Original Data, SMOTE, ADASYN, B-SMOTE, Novelty K-CGAN, Vanilla GAN, WS GAN, SDG GAN, NS GAN, and LS GAN methods.
Figure 16.
Univariate V15 Feature Distribution comparison of Original Data, SMOTE, ADASYN, B-SMOTE, Novelty K-CGAN, Vanilla GAN, WS GAN, SDG GAN, NS GAN, and LS GAN methods.
Figure 17.
Bivariate V1 vs. V3 Feature Distribution comparison of Original Data, SMOTE, ADASYN, B-SMOTE, Novelty K-CGAN, Vanilla GAN, WS GAN, SDG GAN, NS GAN, and LS GAN methods.
Figure 18.
Bivariate V1 vs. V4 Feature Distribution comparison of Original Data, SMOTE, ADASYN, B-SMOTE, Novelty K-CGAN, Vanilla GAN, WS GAN, SDG GAN, NS GAN, and LS GAN methods.
Figure 19.
Bivariate V1 vs. V5 Feature Distribution comparison of Original Data, SMOTE, ADASYN, B-SMOTE, Novelty K-CGAN, Vanilla GAN, WS GAN, SDG GAN, NS GAN, and LS GAN methods.
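Figures 2 through 12 are ROC curves: true-positive rate plotted against false-positive rate as the decision threshold sweeps over classifier scores. A minimal sketch of how such a curve and its trapezoidal AUC are computed:

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC points (FPR, TPR) swept over score thresholds, plus the
    trapezoidal AUC. `labels` are 1 for fraud, 0 for legitimate."""
    order = np.argsort(-np.asarray(scores))   # descending by score
    y = np.asarray(labels)[order]
    tps = np.cumsum(y)                        # true positives at each cut
    fps = np.cumsum(1 - y)                    # false positives at each cut
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    auc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0)
    return fpr, tpr, auc

# Scores that perfectly separate the classes give AUC = 1.0.
fpr, tpr, auc = roc_auc([0.9, 0.8, 0.7, 0.4, 0.3, 0.1],
                        [1, 1, 1, 0, 0, 0])
```

This sketch ignores tied scores, which production implementations handle by grouping equal-score examples into a single threshold step.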
Table 1.
Credit card dataset (sourced from Kaggle.com).
Dataset | No. of Attributes | No. of Instances | No. of Fraud Instances | No. of Legal Instances |
---|---|---|---|---|
Kaggle | 30 | 284,807 | 492 | 284,315 |
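The severity of the imbalance in Table 1 is worth stating explicitly, since it is what motivates the augmentation methods compared here:

```python
# Class counts from Table 1 (Kaggle credit card dataset).
fraud = 492
legal = 284_315
total = fraud + legal

# Fraction of fraudulent transactions -- the minority class that
# SMOTE, ADASYN, B-SMOTE, and the GAN variants oversample.
fraud_rate = fraud / total
imbalance_ratio = legal / fraud   # legitimate transactions per fraud

print(f"total={total}, fraud_rate={fraud_rate:.5f}, ratio={imbalance_ratio:.0f}:1")
```

Roughly 0.17% of transactions are fraudulent, i.e. about 578 legitimate transactions per fraud, which is why raw accuracy in Table 11 is near 1.0 for every method and precision/recall (Tables 8 and 9) are the more informative metrics.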
Table 2.
Credit card dataset features (first sample rows).
Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 |
---|---|---|---|---|---|---|---|---|
0 | −1.35981 | −0.07278 | 2.536347 | 1.378155 | −0.33832 | 0.462388 | 0.239599 | 0.098698 |
0 | 1.191857 | 0.266151 | 0.16648 | 0.448154 | 0.060018 | −0.08236 | −0.0788 | 0.085102 |
1 | −1.35835 | −1.34016 | 1.773209 | 0.37978 | −0.5032 | 1.800499 | 0.791461 | 0.247676 |
1 | −0.96627 | −0.18523 | 1.792993 | −0.86329 | −0.01031 | 1.247203 | 0.237609 | 0.377436 |
2 | −1.15823 | 0.877737 | 1.548718 | 0.403034 | −0.40719 | 0.095921 | 0.592941 | −0.27053 |
2 | −0.42597 | 0.960523 | 1.141109 | −0.16825 | 0.420987 | −0.02973 | 0.476201 | 0.260314 |
4 | 1.229658 | 0.141004 | 0.045371 | 1.202613 | 0.191881 | 0.272708 | −0.00516 | 0.081213 |
7 | −0.64427 | 1.417964 | 1.07438 | −0.4922 | 0.948934 | 0.428118 | 1.120631 | −3.80786 |
7 | −0.89429 | 0.286157 | −0.11319 | −0.27153 | 2.669599 | 3.721818 | 0.370145 | 0.851084 |
V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 |
---|---|---|---|---|---|---|---|---|
0.363787 | 0.090794 | −0.5516 | −0.6178 | −0.99139 | −0.31117 | 1.468177 | −0.4704 | 0.207971 |
−0.25543 | −0.16697 | 1.612727 | 1.065235 | 0.489095 | −0.14377 | 0.635558 | 0.463917 | −0.1148 |
−1.51465 | 0.207643 | 0.624501 | 0.066084 | 0.717293 | −0.16595 | 2.345865 | −2.89008 | 1.109969 |
−1.38702 | −0.05495 | −0.22649 | 0.178228 | 0.507757 | −0.28792 | −0.63142 | −1.05965 | −0.68409 |
0.817739 | 0.753074 | −0.82284 | 0.538196 | 1.345852 | −1.11967 | 0.175121 | −0.45145 | −0.23703 |
−0.56867 | −0.37141 | 1.341262 | 0.359894 | −0.35809 | −0.13713 | 0.517617 | 0.401726 | −0.05813 |
0.46496 | −0.09925 | −1.41691 | −0.15383 | −0.75106 | 0.167372 | 0.050144 | −0.44359 | 0.002821 |
0.615375 | 1.249376 | −0.61947 | 0.291474 | 1.757964 | −1.32387 | 0.686133 | −0.07613 | −1.22213 |
−0.39205 | −0.41043 | −0.70512 | −0.11045 | −0.28625 | 0.074355 | −0.32878 | −0.21008 | −0.49977 |
V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 |
---|---|---|---|---|---|---|---|---|
0.025791 | 0.403993 | 0.251412 | −0.01831 | 0.277838 | −0.11047 | 0.066928 | 0.128539 | −0.18911 |
−0.18336 | −0.14578 | −0.06908 | −0.22578 | −0.63867 | 0.101288 | −0.33985 | 0.16717 | 0.125895 |
−0.12136 | −2.26186 | 0.52498 | 0.247998 | 0.771679 | 0.909412 | −0.68928 | −0.32764 | −0.1391 |
1.965775 | −1.23262 | −0.20804 | −0.1083 | 0.005274 | −0.19032 | −1.17558 | 0.647376 | −0.22193 |
−0.03819 | 0.803487 | 0.408542 | −0.00943 | 0.798278 | −0.13746 | 0.141267 | −0.20601 | 0.502292 |
0.068653 | −0.03319 | 0.084968 | −0.20825 | −0.55982 | −0.0264 | −0.37143 | −0.23279 | 0.105915 |
−0.61199 | −0.04558 | −0.21963 | −0.16772 | −0.27071 | −0.1541 | −0.78006 | 0.750137 | −0.25724 |
−0.35822 | 0.324505 | −0.15674 | 1.943465 | −1.01545 | 0.057504 | −0.64971 | −0.41527 | −0.05163 |
0.118765 | 0.570328 | 0.052736 | −0.07343 | −0.26809 | −0.20423 | 1.011592 | 0.373205 | −0.38416 |
V27 | V28 | Amount |
---|---|---|
0.133558 | −0.02105 | 149.62 |
−0.00898 | 0.014724 | 2.69 |
−0.05535 | −0.05975 | 378.66 |
0.062723 | 0.061458 | 123.5 |
0.219422 | 0.215153 | 69.99 |
0.253844 | 0.08108 | 3.67 |
0.034507 | 0.005168 | 4.99 |
−1.20692 | −1.08534 | 40.8 |
0.011747 | 0.142404 | 93.2 |
Table 3.
Variance inflation factor (VIF).
Feature | VIF |
---|---|
Time | 1.104214 |
V1 | 1.003973 |
V2 | 1.000397 |
V3 | 1.038927 |
V4 | 1.002805 |
V5 | 1.007125 |
V6 | 1.000983 |
V7 | 1.002670 |
V8 | 1.001018 |
V9 | 1.000367 |
V10 | 1.001049 |
V11 | 1.013779 |
V12 | 1.003927 |
V13 | 1.000932 |
V14 | 1.002786 |
V15 | 1.007373 |
V16 | 1.000528 |
V17 | 1.002051 |
V18 | 1.002158 |
V19 | 1.000196 |
V20 | 1.000669 |
V21 | 1.001252 |
V22 | 1.004694 |
V23 | 1.000729 |
V24 | 1.000058 |
V25 | 1.012106 |
V26 | 1.000409 |
V27 | 1.000941 |
V28 | 1.000440 |
Amount | 11.650240 |
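The values in Table 3 follow the standard definition VIF_j = 1/(1 − R_j²), where R_j² comes from regressing feature j on the remaining features; values near 1 indicate negligible collinearity, and the Amount value (~11.65) exceeds the common rule-of-thumb threshold of 10. A self-contained numpy sketch on synthetic (not the Kaggle) data:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X:
    VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j
    (plus an intercept) on all remaining columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = np.sum((y - A @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        out[j] = ss_tot / ss_res      # equals 1 / (1 - R^2)
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))        # near-independent columns
vifs = vif(X)                         # all values close to 1
```

Because V1–V28 are PCA components (mutually orthogonal by construction), their VIFs sitting just above 1 in Table 3 is exactly what this definition predicts.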
Table 4.
K-CGAN Generator Neural Network Hyperparameter Settings.
Parameter | Value |
---|---|
Learning Rate | 0.0001 |
Hidden Layer Activation | ReLU |
Optimizer | Adam |
Loss Function | Trained Discriminator Loss + KL Divergence |
Hidden Layers | 2 (128, 64 units) |
Dropout | 0.1 |
Random Noise Vector | 100 |
Kernel Initializer | glorot_uniform |
Kernel Regularizer | L2 method |
Total Learning Parameters | 36,837 |
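Table 4 specifies the generator objective as the trained discriminator loss plus a KL divergence term. The numpy sketch below illustrates such a combined objective; the moment-matching Gaussian form of the KL term and the `kl_weight` balance are illustrative assumptions, not the paper's exact novelty loss:

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy for discriminator scores p in (0, 1)."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

def gaussian_kl(mu_g, var_g, mu_r, var_r):
    """KL(N(mu_g, var_g) || N(mu_r, var_r)) summed over features."""
    return 0.5 * np.sum(np.log(var_r / var_g)
                        + (var_g + (mu_g - mu_r) ** 2) / var_r - 1.0)

def generator_loss(d_scores_fake, fake, real, kl_weight=1.0):
    """Adversarial term (generator wants D(fake) -> 1) plus a KL term
    matching per-feature Gaussian moments of fake vs. real batches."""
    adv = bce(d_scores_fake, np.ones_like(d_scores_fake))
    kl = gaussian_kl(fake.mean(0), fake.var(0) + 1e-7,
                     real.mean(0), real.var(0) + 1e-7)
    return adv + kl_weight * kl

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(256, 8))
fake = rng.normal(0.1, 1.1, size=(256, 8))
loss = generator_loss(rng.uniform(0.3, 0.7, size=256), fake, real)
```

The KL term pulls generated feature statistics toward the real minority-class distribution, which is the intuition behind adding it to the plain adversarial loss.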
Table 5.
K-CGAN Discriminator Neural Network Hyperparameter Settings.
Parameter | Value |
---|---|
Learning Rate | 0.0001 |
Hidden Layer Activation | LeakyReLU |
Optimizer | Adam |
Loss Function | Binary Cross Entropy |
Hidden Layers | 2 (20, 10 units) |
Dropout | 0.1 |
Kernel Regularizer | L2 method |
Table 6.
GAN Generator Neural Network Hyperparameter Settings.
Parameter | Value |
---|---|
Learning Rate | 0.0001 |
Hidden Layer Activation | ReLU |
Optimizer | RMSprop |
Loss Function | Trained Discriminator Loss |
Hidden Layers | 64, 32 |
Dropout | 0.5 |
Random Noise Vector | 100 |
Table 7.
GAN Discriminator Neural Network Hyperparameter Settings.
Parameter | Value |
---|---|
Learning Rate | 0.0001 |
Hidden Layer Activation | LeakyReLU |
Optimizer | RMSprop |
Loss Function | Binary Cross Entropy |
Hidden Layers | 128, 64, 32 |
Dropout | 0.1 |
Table 8.
Precision values for classification methods for balanced dataset.
Classifier | K-CGAN | Original Dataset | SMOTE | ADASYN | B-SMOTE | Vanilla GAN | WS GAN | SDG GAN | NS GAN | LS GAN |
---|---|---|---|---|---|---|---|---|---|---|
XG-Boost | 0.999762 | 0.924370 | 0.999467 | 0.999182 | 0.999816 | 0.997085 | 0.988636 | 0.986072 | 0.980831 | 0.982405 |
Random Forest | 0.999776 | 0.931035 | 0.999762 | 0.999760 | 0.999958 | 0.994135 | 0.980170 | 0.986111 | 0.977564 | 0.982249 |
Nearest Neighbor | 0.999608 | 0.864865 | 0.982366 | 0.973762 | 0.997603 | 0.960606 | 0.954416 | 0.966197 | 0.954545 | 0.961194 |
MLP | 0.999692 | 0.881890 | 0.997690 | 0.997970 | 0.998082 | 0.982456 | 0.974504 | 0.957219 | 0.962145 | 0.959885 |
Logistic Regression | 0.999566 | 0.890110 | 0.974443 | 0.909084 | 0.994725 | 0.965732 | 0.958457 | 0.970149 | 0.949495 | 0.968051 |
Table 9.
Recall values for classification methods for balanced dataset.
Classifier | K-CGAN | Original Dataset | SMOTE | ADASYN | B-SMOTE | Vanilla GAN | WS GAN | SDG GAN | NS GAN | LS GAN |
---|---|---|---|---|---|---|---|---|---|---|
XG-Boost | 0.999706 | 0.827068 | 1.000000 | 0.999986 | 0.999703 | 0.955307 | 0.932976 | 0.917098 | 0.962382 | 0.941011 |
Random Forest | 0.999706 | 0.812030 | 1.000000 | 1.000000 | 0.999661 | 0.946927 | 0.927614 | 0.919689 | 0.956113 | 0.932584 |
Nearest Neighbor | 0.999706 | 0.721804 | 0.999804 | 1.000000 | 0.999746 | 0.885475 | 0.898123 | 0.888601 | 0.921630 | 0.904494 |
MLP | 0.999594 | 0.842105 | 1.000000 | 0.999929 | 0.999746 | 0.938547 | 0.922252 | 0.927461 | 0.956113 | 0.941011 |
Logistic Regression | 0.999608 | 0.609023 | 0.919681 | 0.860942 | 0.996383 | 0.865922 | 0.865952 | 0.841969 | 0.884013 | 0.851124 |
Table 10.
F1 Score values for classification methods for balanced dataset.
Classifier | K-CGAN | Original Dataset | SMOTE | ADASYN | B-SMOTE | Vanilla GAN | WS GAN | SDG GAN | NS GAN | LS GAN |
---|---|---|---|---|---|---|---|---|---|---|
XG-Boost | 0.999734 | 0.873016 | 0.999733 | 0.999584 | 0.999760 | 0.975749 | 0.960000 | 0.950336 | 0.971519 | 0.961263 |
Random Forest | 0.999741 | 0.867470 | 0.999881 | 0.999880 | 0.999809 | 0.969957 | 0.953168 | 0.951743 | 0.966720 | 0.956772 |
Nearest Neighbor | 0.999657 | 0.786885 | 0.991008 | 0.986707 | 0.998673 | 0.921512 | 0.925414 | 0.925776 | 0.937799 | 0.931983 |
MLP | 0.999643 | 0.861538 | 0.998844 | 0.998949 | 0.998913 | 0.960000 | 0.947658 | 0.942105 | 0.959119 | 0.950355 |
Logistic Regression | 0.999587 | 0.723214 | 0.946270 | 0.884358 | 0.995553 | 0.913108 | 0.909859 | 0.901526 | 0.915584 | 0.905830 |
Table 11.
Accuracy values for classification methods for balanced dataset.
Classifier | K-CGAN | Original Dataset | SMOTE | ADASYN | B-SMOTE | Vanilla GAN | WS GAN | SDG GAN | NS GAN | LS GAN |
---|---|---|---|---|---|---|---|---|---|---|
XG-Boost | 0.999733 | 0.999551 | 0.999733 | 0.999585 | 0.999761 | 0.999762 | 0.999594 | 0.999482 | 0.999748 | 0.999622 |
Random Forest | 0.999740 | 0.999537 | 0.999880 | 0.999880 | 0.999810 | 0.999706 | 0.999524 | 0.999496 | 0.999706 | 0.999580 |
Nearest Neighbor | 0.999655 | 0.999270 | 0.990905 | 0.986578 | 0.998678 | 0.999244 | 0.999244 | 0.999230 | 0.999454 | 0.999342 |
MLP | 0.999641 | 0.999494 | 0.998839 | 0.998952 | 0.998917 | 0.999608 | 0.999468 | 0.999384 | 0.999636 | 0.999510 |
Logistic Regression | 0.999585 | 0.999129 | 0.947643 | 0.887842 | 0.995568 | 0.999174 | 0.999104 | 0.999006 | 0.999272 | 0.999118 |
Table 12.
Comparison of classification models using the K-CGAN synthetic data only (sample of 30,000 valid and 30,000 fraud transactions generated by K-CGAN model).
Algorithm | Precision | Recall | F1 Score | Accuracy |
---|---|---|---|---|
XG-Boost | 1.0 | 1.000000 | 1.000000 | 1.00000 |
Random Forest | 1.0 | 0.982301 | 0.991071 | 0.99996 |
Nearest Neighbor | 1.0 | 0.929204 | 0.963303 | 0.99984 |
MLP | 1.0 | 1.000000 | 1.000000 | 1.00000 |
Logistic Regression | 1.0 | 0.946903 | 0.972727 | 0.99988 |
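The reported metrics are internally consistent with F1 = 2PR/(P + R); for example, the Table 12 rows can be reproduced from their precision and recall columns:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Table 12, Random Forest row: P = 1.0, R = 0.982301 -> F1 ~ 0.991071.
rf_f1 = f1(1.0, 0.982301)
# Table 12, Logistic Regression row: P = 1.0, R = 0.946903 -> F1 ~ 0.972727.
lr_f1 = f1(1.0, 0.946903)
```

With precision fixed at 1.0 for every classifier, F1 differences in Table 12 are driven entirely by recall.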