Research on Blockchain Transaction Privacy Protection Methods Based on Deep Learning
1. Introduction
In order to address the issues mentioned above, we propose a scheme that balances privacy protection and regulatory functions. Our main contributions are summarized below:
-
We propose a blockchain transaction scheme that integrates a variety of cryptographic technologies to balance privacy protection and regulatory functions. Specifically, it adopts probabilistic public-key encryption to protect the user’s identity from being exposed.
-
To validate the basic legality of blockchain transactions, our scheme employs cryptographic commitment schemes and zero-knowledge proofs. It further integrates graph neural networks (GNNs) for anomaly detection in blockchain transaction data, thus meeting the requirements for transaction privacy protection and regulatory compliance without disclosing sensitive transaction information.
-
Our scheme allows regulatory authorities to avoid storing users’ real identities and key information, significantly reducing storage and computational burdens. While preserving transaction efficiency as far as possible, it balances privacy protection with regulatory functions.
2. Preliminaries
2.1. Literature Review
Blockchain features include decentralized storage, data immutability, and consensus mechanisms. These features ensure the transparency and security of blockchain data, but they also create challenges for users’ privacy protection. Researchers have introduced numerous privacy protection technologies to safeguard privacy.
2.2. UTXO Model
2.3. Probabilistic Public-Key Cryptosystems
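Parameter Setting: Let $n = pq$, where $p$ and $q$ are large primes, and $p \equiv q \equiv 3 \pmod{4}$. Here, $n$ is the public key, while $p$ and $q$ serve as the private keys. Define the plaintext space as $\mathcal{P} = \mathbb{Z}_2^{k}$, the ciphertext space as $\mathcal{C} = \mathbb{Z}_2^{k} \times \mathbb{Z}_n^{*}$, and the key space as $\mathcal{K} = \{(n, p, q)\}$.
Encryption: For plaintext message $m = (m_1, \ldots, m_k)$ to be encrypted, the process is as follows:
-
Randomly select a seed $s_0 \in \mathbb{Z}_n^{*}$ and use the BBS generator to produce $k$ random bits $z_1, \ldots, z_k$ as the keystream;
-
Calculate $s_{k+1} = s_0^{2^{k+1}} \bmod n$;
-
Calculate $c_i = (m_i + z_i) \bmod 2$, $1 \le i \le k$;
-
The ciphertext is $c = (c_1, \ldots, c_k, s_{k+1})$.
Decryption: The process of decrypting the ciphertext $c$ is as follows:
-
Calculate $a_1 = ((p + 1)/4)^{k+1} \bmod (p - 1)$;
-
Calculate $a_2 = ((q + 1)/4)^{k+1} \bmod (q - 1)$;
-
Calculate $b_1 = s_{k+1}^{a_1} \bmod p$;
-
Calculate $b_2 = s_{k+1}^{a_2} \bmod q$;
-
Utilize the Chinese remainder theorem to calculate $s_0$, satisfying $s_0 \equiv b_1 \pmod{p}$ and $s_0 \equiv b_2 \pmod{q}$;
-
Using the BBS generator, derive $z_1, \ldots, z_k$ from the seed $s_0$;
-
For each bit $1 \le i \le k$, compute $m_i = (c_i + z_i) \bmod 2$;
-
The decrypted plaintext is $m = (m_1, \ldots, m_k)$.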
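The encryption and decryption steps above can be sketched in a few lines of Python. This is a toy illustration rather than the implementation used in the scheme: the function names are hypothetical, the primes in the usage note are far too small for real use, and the seed is taken as the square of a random value so that it is a quadratic residue (an assumption that makes the recovered seed equal the original one).

```python
import secrets
from math import gcd


def bbs_keystream(seed, n, count):
    """Blum-Blum-Shub generator: s_i = s_{i-1}^2 mod n, z_i = s_i mod 2.

    Returns the bits z_1..z_count and the next state s_{count+1}.
    """
    state = seed
    bits = []
    for _ in range(count):
        state = pow(state, 2, n)
        bits.append(state & 1)
    return bits, pow(state, 2, n)


def bg_encrypt(message_bits, n, r=None):
    """Encrypt a list of bits under the public key n."""
    if r is None:
        # Draw a random value coprime to n.
        while True:
            r = secrets.randbelow(n - 2) + 2
            if gcd(r, n) == 1:
                break
    seed = pow(r, 2, n)  # squaring makes the seed a quadratic residue
    keystream, final_state = bbs_keystream(seed, n, len(message_bits))
    cipher_bits = [m ^ z for m, z in zip(message_bits, keystream)]
    return cipher_bits, final_state


def bg_decrypt(ciphertext, p, q):
    """Recover the plaintext bits with the private primes p and q."""
    cipher_bits, final_state = ciphertext
    n = p * q
    k = len(cipher_bits)
    # Exponents that undo k + 1 squarings modulo p and modulo q.
    a1 = pow((p + 1) // 4, k + 1, p - 1)
    a2 = pow((q + 1) // 4, k + 1, q - 1)
    b1 = pow(final_state, a1, p)
    b2 = pow(final_state, a2, q)
    # Chinese remainder theorem: seed = b1 (mod p) and seed = b2 (mod q).
    seed = (b1 * q * pow(q, -1, p) + b2 * p * pow(p, -1, q)) % n
    keystream, _ = bbs_keystream(seed, n, k)
    return [c ^ z for c, z in zip(cipher_bits, keystream)]
```

For example, with the toy key `p, q = 499, 547` (both congruent to 3 modulo 4), calling `bg_encrypt` twice on the same bit list gives two different ciphertexts whenever the random seeds differ, and `bg_decrypt` returns the original bits for both. This is the property that later lets one identity map to many ciphertexts.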
2.4. Identity-Based Cryptosystems
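Define $P_1$ as the generator of an additive cyclic group $G_1$ of prime order $N$ on an elliptic curve, $P_2$ as the generator of a similar group $G_2$ on a second elliptic curve, $H$ as a hash function (from which SM9 derives $H_1$ and $H_2$), and $e: G_1 \times G_2 \to G_T$ as a bilinear pairing. Considering A as the signer and B as the verifier, the digital signature process for SM9 is as follows:
Key Generation: The KGC selects a random number $ks \in [1, N-1]$ as the master private key for signing and computes $P_{pub} = [ks]P_2$ as the master public key. Therefore, the master key pair is established as $(ks, P_{pub})$. The identification of user A is $ID_A$. To create A’s private signing key $ds_A$, the KGC computes $t_1 = H_1(ID_A \| hid, N) + ks$ and $t_2 = ks \cdot t_1^{-1}$ within the field $F_N$, where $hid$ is the signature key-generation function identifier, subsequently obtaining $ds_A = [t_2]P_1$.
Signing: To sign a message $M$, A’s signing process is as follows:
-
$g = e(P_1, P_{pub})$;
-
Select a random number $r \in [1, N-1]$;
-
$w = g^{r}$, $h = H_2(M \| w, N)$, $l = (r - h) \bmod N$;
-
$S = [l]\,ds_A$. Then, $M$’s signature is $(h, S)$.
Verification: For verifying a signature $(h', S')$ on the message $M'$, B follows the following steps:
-
$g = e(P_1, P_{pub})$;
-
$t = g^{h'}$, $h_1 = H_1(ID_A \| hid, N)$;
-
$P = [h_1]P_2 + P_{pub}$, $u = e(S', P)$, $w' = u \cdot t$;
-
$h_2 = H_2(M' \| w', N)$; if $h_2 = h'$, the signature verification is successful; if not, it fails.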
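A short derivation shows why verification succeeds for an honestly generated signature. It assumes the standard SM9 notation, which may differ from the symbols of the original article: $g = e(P_1, P_{pub})$ with $P_{pub} = [ks]P_2$, $t_1 = h_1 + ks$, $t_2 = ks \cdot t_1^{-1}$, $ds_A = [t_2]P_1$, $l = (r - h) \bmod N$, and $S = [l]\,ds_A$.

```latex
\begin{aligned}
u  &= e\bigl(S,\ [h_1]P_2 + P_{pub}\bigr)
    = e\bigl([l\,t_2]P_1,\ [h_1 + ks]P_2\bigr)
    = e(P_1, P_2)^{\,l \cdot t_2 \cdot t_1}
    = e(P_1, P_2)^{\,l \cdot ks}
    = g^{\,l}, \\
w' &= u \cdot t = g^{\,l} \cdot g^{\,h} = g^{\,(r - h) + h} = g^{\,r} = w .
\end{aligned}
```

Since $w' = w$, the verifier recomputes $h_2 = H_2(M \| w', N) = h$, so the check $h_2 = h'$ passes exactly when neither the message nor the signature was altered.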
2.5. Cryptographic Commitment Scheme
2.6. Graph Neural Networks
This model primarily comprises three major modules. The graph construction module builds a directed weighted graph from node representation matrices, adjacency matrices, and temporal density matrices, generating distinctive feature representations for each node. In this framework, the node representation matrix includes information about the nodes’ out-degree and in-degree, as well as the node type. The adjacency matrix is constructed with four types of edges based on transactions, contract calls, rewards, and other methods. The temporal density matrix is built according to the frequency and timing of interactions between account addresses. The model employs graph convolutional neural networks for learning and ultimately uses the softmax function to predict the node types for anomalous identity detection.
The model initially constructs a graph network through blockchain transaction information, then samples labeled accounts, extracting subgraphs centered around the target accounts as input for the model. Finally, it trains a GNN model and evaluates the results. Experiments conducted on the EOSG and ETHG datasets demonstrate that this method achieves superior results in the domain of identity inference.
3. Deep Learning-Based Blockchain Transaction Privacy Protection Model
This paper integrates technologies such as the UTXO transaction model, the BG probabilistic public-key encryption algorithm, the IBC cryptographic system, Pedersen commitments, and graph neural networks to propose a supervised blockchain transaction privacy protection scheme. The design process is introduced in detail below.
3.1. Model
The scheme comprises the following seven algorithms:
BG key generation: This is the key generation process of the BG algorithm. It generates the BG algorithm’s public key pk and private key sk using large primes p and q. Because the algorithm provides probabilistic encryption, it generates different ciphertexts even if the same message is encrypted multiple times.
BG encryption: This is the encryption process of the BG algorithm. It encrypts message m using the public key pk of the probabilistic public-key BG algorithm to produce the ciphertext.
BG decryption: This is the decryption process of the BG algorithm. It decrypts ciphertext ct using the private key sk of the probabilistic public-key BG algorithm to retrieve the plaintext. A user with the correct private key can successfully decrypt the ciphertext.
SM9 key generation: This is the key generation function of the SM9 algorithm based on IBC. It generates the user’s private key from the SM9 algorithm’s master key (sk) and the user’s identifier (id).
SM9 encryption: This is the encryption process of the SM9 algorithm. It encrypts message m using the public key pk of the SM9 algorithm to produce ciphertext. SM9 is an identity-based encryption algorithm, which means that encryption can be performed directly using the user’s public identity information.
SM9 decryption: This is the decryption process of the SM9 algorithm. It decrypts ciphertext ct using the private key sk of the SM9 algorithm to retrieve the plaintext. A user with the correct private key can decrypt successfully.
SM9 signing: This is the signature process of the SM9 algorithm. It signs message m using the private key sk of the SM9 algorithm to obtain the signature value. The signature makes any alteration of the message in transit detectable, providing data integrity and non-repudiation.
This scheme performs public-key encryption and decryption with the BG probabilistic public-key cryptography algorithm (with a key size of 2048 bits), which provides strong security guarantees for transactions, especially in terms of its resistance to chosen-plaintext attacks. We also use the SM9 algorithm based on the IBC cryptosystem (with a key size of 256 bits), whose security strength is comparable to that of 3072-bit RSA. The SM9 algorithm allows the direct use of a user’s identification data as the public key, which simplifies key distribution and management. It also provides digital signature and authentication functions, which secure transactions and verify user identities. Using the two algorithms together improves the compatibility and flexibility of the system, enabling the scheme to serve different transaction scenarios, and layering different encryption algorithms strengthens the security of the system as a whole.
3.2. Anonymous Identity Realization
During the initialization phase of the scheme, the regulatory authority needs to generate three public-private key pairs. First, it uses the BG algorithm to produce a private key and a public key. Second, acting as the KGC within the IBC framework, it creates a master public key and the corresponding master private key. Third, it defines its own identity marker in IBC, treats that marker as its public key, and derives the matching signature private key from the master private key using the IBC algorithm. Users then apply to the regulatory authority for key distribution using unique identifying information (which needs to be self-proving, such as an email address, ID card number, or phone number).
After verifying the user’s identity information, the regulatory authority encrypts it with the public key of the BG probabilistic cryptography algorithm to generate an identity ciphertext. To show that this ciphertext is certified by the regulatory authority, the authority signs it, generating a signature. The user’s anonymous identity is defined from the ciphertext together with this signature. Since the ciphertext is obtained using the BG probabilistic public-key encryption algorithm and has good randomness, and the signature is obtained through the IBC signature algorithm, the anonymous identity also possesses good randomness, effectively hiding the user’s real identity information.
Next, the anonymous identity is used as the public-key identity. Using the IBC algorithm, the regulatory authority generates the corresponding private key for the user. Each user therefore holds a verifiable true identity, an anonymous identity calculated from it, and the private key corresponding to the anonymous identity. Because BG encryption is probabilistic, the same true identity can yield many different anonymous identities, establishing a one-to-many connection between a true identity and its anonymous identities. In theory, an unlimited number of anonymous identities can be created from the same true identity, allowing users to renew their anonymous identities continuously.
For ease of subsequent description, consider a transaction sender and a receiver, each with their own identity marker. Through the above process, their corresponding anonymous identities and private keys can be calculated. When the sender transacts with the receiver, the sender uses their own private key to unlock the UTXO input script and sets the receiver’s anonymous identity as the receiver’s address, thereby maintaining identity anonymity.
3.3. Transaction Data Privacy Protection
In this transaction, there are two inputs and two outputs: one output pays the receiver, and the other returns change to the sender. The portion of the input amounts not covered by the two outputs is the transaction fee, which goes to the miner for packaging the transaction.
where and can be decrypted by using the private key .
The transaction also includes data that verifies the range of the transaction value. The transaction is broadcast across the network, and after miners verify its legitimacy, it is incorporated into a block and recorded in the blockchain ledger via the consensus protocol. The receiver can then recognize the transaction and decrypt it with their private key to obtain the transaction information, completing the transaction while keeping the amount concealed.
3.4. Transaction Legitimacy Verification
In a blockchain, transactions are recorded via a consensus mechanism. During the consensus process, miners verify transactions’ legitimacy, which primarily includes the verification of participant identity and transaction amount legitimacy.
Identity legitimacy verification involves verifying the legitimacy of both the sender’s and the receiver’s identities. Within the UTXO model, the sender uses the private key of their anonymous identity to unlock UTXO inputs. Each anonymous identity carries the signature that the regulatory authority produced with its private key over the encrypted identity. Therefore, to validate the unlocking script’s signature, miners only need the sender’s anonymous identity, which serves as the public key.
Transaction amount legitimacy also requires two aspects of verification: the equality of input and output amounts and the validity of the range of output amounts.
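The equality check can be illustrated with the additive homomorphism of Pedersen commitments, which the scheme names among its building blocks. The sketch below is a toy under stated assumptions: the group parameters `P`, `Q`, `G`, `H` and the function names are chosen for readability and are far too small for real use, and the range check on outputs (handled by a range proof in practice) appears only as a comment.

```python
import secrets

# Toy parameters (assumptions for illustration only): P = 2Q + 1 is a safe
# prime, and G, H are squares mod P, so both have prime order Q.
P, Q = 1019, 509
G, H = 4, 9


def commit(value, blind):
    """Pedersen commitment C = G^value * H^blind mod P."""
    return pow(G, value % Q, P) * pow(H, blind % Q, P) % P


def balanced(input_commits, output_commits, fee):
    """Check sum(inputs) == sum(outputs) + fee using only the commitments.

    The check passes when the amounts balance and the sender chose blinding
    factors whose input and output sums agree modulo Q.
    """
    left = 1
    for c in input_commits:
        left = left * c % P
    right = pow(G, fee % Q, P)  # the fee is public, so its blinding factor is 0
    for c in output_commits:
        right = right * c % P
    return left == right


# Two inputs (30 and 25), a payment of 40, change of 13, and a fee of 2.
r_in = [secrets.randbelow(Q), secrets.randbelow(Q)]
r_pay = secrets.randbelow(Q)
r_change = (sum(r_in) - r_pay) % Q  # forces the blinding factors to balance
inputs = [commit(30, r_in[0]), commit(25, r_in[1])]
outputs = [commit(40, r_pay), commit(13, r_change)]
print(balanced(inputs, outputs, fee=2))

# Values live modulo Q, so a "negative" output such as -5 wraps around to
# Q - 5 and would still balance. This is why each output amount also needs
# a range proof in addition to the equality check.
```

Miners running such a check learn only that the amounts balance, not the amounts themselves, which is the property the two-part verification relies on.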
3.5. Micro-Level Supervision Algorithm for Transaction Data
Blockchain transaction privacy protection is relative, primarily aimed at protecting user data from unauthorized access by malicious third parties. Nevertheless, regulatory authorities need transaction monitoring to combat illegal activities. Thus, it is crucial to ensure participant identities and transaction amounts can be regulated.
The regulatory authority thus obtains the transfer amount to . Similarly, processing allows for the querying and monitoring of blockchain transactions.
3.6. Anomaly Transaction Data Detection Based on Graph Neural Networks
Anomaly detection is a method used to identify behaviors that deviate from the expected norm. The task of graph-based anomaly detection aims to uncover nodes, edges, or subgraphs within a network that exhibit significantly outlier characteristics. Anomaly detection of transaction data using GNNs is particularly useful in identifying fraudulent activities, money laundering and other anomalous patterns in financial transactions. This method is especially adept at handling complex financial networks, where transaction relationships can be modeled as graph structures, with nodes representing participants (such as individuals and companies) and edges representing transactions.
In this application, the GNNs’ role is to leverage the structural information of the graph to learn underlying patterns within transaction data. Traditional fraud detection methods typically rely on rules or simple machine learning models that may not be able to capture complex non-linear fraud patterns. In contrast, GNNs can more effectively identify anomalous patterns by considering the relationships between nodes and transaction patterns in the transaction network.
First, identify the graph structure relevant to the specific context and represent the data as a graph. Next, determine the type of graph, such as directed or undirected and homogeneous or heterogeneous. Subsequently, develop a loss function. Depending on the graph learning task, prediction types can be categorized at various levels: node, edge, community, or graph-wide. Finally, establish computational modules and train the model. The propagation module facilitates information exchange between nodes, enabling the aggregation of information to capture the graph’s characteristics and topological details. The sampling module is responsible for graph sampling. For higher-dimensional subgraph representations, the pooling module can extract node information.
The stacked node states $H$ are computed by the iteration $H^{t+1} = F(H^{t}, X)$, where $F$ is the global transition function, $X$ stacks the node features, and $H^{t}$ represents the tensor of $H$ at the $t$-th iteration cycle.
With $p$ supervised nodes, the loss function for GNN training incorporates true values $t_i$ and predicted values $o_i$, and leverages a gradient descent strategy with the following steps: The state $h_v^{t}$ is iteratively updated according to Equation (24) for $T$ cycles until it approaches the fixed-point solution of Equation (26), at which point the obtained $H^{T}$ will be close to the fixed-point solution $H^{*}$. During backpropagation, the gradient of the weight $W$ is calculated from the loss, and $W$ is then updated based on the gradient computed in the previous step. After $T$ cycles, the gradient with respect to $W$ is obtained, which is then used to update the model parameters.
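The fixed-point iteration can be illustrated on a tiny graph. The sketch below is not the article’s model: the transition function `transition`, the damping factor 0.5, and the scalar node states are assumptions, chosen so that the update is a contraction and the iteration visibly converges.

```python
def transition(states, features, neighbors, damping=0.5):
    """One update H^{t+1} = F(H^t, X).

    Each node combines its own feature with the damped mean of its
    neighbors' current states.
    """
    new_states = []
    for v, feat in enumerate(features):
        nbrs = neighbors[v]
        mean = sum(states[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        new_states.append(feat + damping * mean)
    return new_states


def iterate_to_fixed_point(features, neighbors, tol=1e-9, max_cycles=200):
    """Repeat the update until the largest state change falls below tol."""
    states = [0.0] * len(features)
    cycle = 0
    for cycle in range(1, max_cycles + 1):
        updated = transition(states, features, neighbors)
        change = max(abs(a - b) for a, b in zip(updated, states))
        states = updated
        if change < tol:
            break
    return states, cycle


# A path graph 0 - 1 - 2 with one scalar feature per node.
neighbors = [[1], [0, 2], [1]]
features = [1.0, 0.0, 2.0]
states, cycles = iterate_to_fixed_point(features, neighbors)
print(cycles, states)
```

Because the damping factor is below 1, each cycle at least halves the largest change, so the states settle after a few dozen cycles. Applying `transition` once more to the result leaves it essentially unchanged, which is what being close to the fixed point means.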
The framework for anomaly detection in transaction data based on GNNs is a technique for identifying and locating abnormal information in transaction data, which plays a significant role in fields like finance, e-commerce, and insurance. The process flow of the GNN-based transaction data anomaly detection model can be broadly divided into the following:
-
Data Preprocessing: Initially, transaction data are converted into graph data where nodes represent transaction entities (e.g., users, merchants, banks, etc.) and edges represent transaction relationships (e.g., payments, transfers, refunds, etc.). Attributes of nodes and edges represent transaction characteristics (e.g., amount, time, frequency, type, etc.).
-
Graph Neural Networks: Subsequently, GNNs are employed for feature extraction and representation learning of the graph data. Utilizing the attribute information and structural information of nodes and edges, low-dimensional vector representations for each node and edge are obtained.
-
Anomaly Scoring: The vector representations of each node and edge are then assessed using an anomaly scoring function to compute their level of anomaly. Candidates for abnormal transactions are selected based on certain thresholds or ranking methods.
-
Anomaly Interpretation: Finally, the anomaly interpretation module explains the candidate nodes and edges involved in abnormal transactions. This analysis includes the causes and effects of anomalies, providing visual and interpretable results to help users understand and address abnormal transactions.
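The anomaly scoring step selects candidates either by a threshold or by rank. A minimal sketch of both selection rules follows; the function names and the mean-plus-k-standard-deviations cutoff are assumptions, since the text does not fix a particular rule.

```python
from statistics import mean, pstdev


def select_by_threshold(scores, k=2.0):
    """Flag entities whose score exceeds mean + k * standard deviation."""
    values = list(scores.values())
    cutoff = mean(values) + k * pstdev(values)
    return sorted(name for name, score in scores.items() if score > cutoff)


def select_top_n(scores, n):
    """Return the n entities with the highest anomaly scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical anomaly scores for six transaction entities.
scores = {"a": 0.10, "b": 0.20, "c": 0.15, "d": 0.12, "e": 5.00, "f": 0.11}
print(select_by_threshold(scores))
print(select_top_n(scores, 2))
```

A threshold adapts the number of candidates to the data, while a fixed rank cut keeps the review workload constant; which is preferable depends on how the flagged transactions are handled downstream.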
The detailed design of the detection model includes the following:
-
Input: Transaction data form an attributed graph G = (V, E, X). Nodes V denote transaction entities, while edges E depict the relationships between them. X is the node attribute matrix representing transaction features like amount, time, frequency, type, etc.
-
Output: An anomaly score vector $S \in \mathbb{R}^{|V|}$ with one entry for each node, indicating the degree of anomaly. A higher score suggests a higher likelihood of anomaly.
-
Model Structure: The model has three parts: the graph neural network, the anomaly scoring function, and the loss function.
-
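Graph Neural Network: The graph neural network extracts feature representations from the graph data. Various types of GNNs can be used. Taking GCN as an example, the layer-wise formula is as follows:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right)$$
Here, $H^{(l)} \in \mathbb{R}^{|V| \times d_l}$ is the node feature matrix at layer l, $d_l$ is the feature dimension of layer l, $W^{(l)} \in \mathbb{R}^{d_l \times d_{l+1}}$ is the trainable weight matrix at layer l, $\tilde{A} = A + I$ is the adjacency matrix with self-loops, $\tilde{D}$ is the degree matrix of $\tilde{A}$, and $\sigma$ is an activation function like ReLU. After L layers of the GNN, the final node feature representation $H^{(L)}$ is obtained.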
-
Anomaly Scoring Function: The anomaly scoring function computes the anomaly score based on the node’s feature representation. Various types of scoring functions can be used, such as those based on reconstruction error, distance, or density. Taking a scoring function based on the reconstruction error as an example, the score of node v is
$$S_v = \lVert X_v - \hat{X}_v \rVert_F$$
Here, $\hat{X}$ is the reconstructed node attribute matrix, which can be decoded from the final node representation $H^{(L)}$, and $X_v$ and $\hat{X}_v$ are the rows of $X$ and $\hat{X}$ belonging to node v. $\lVert \cdot \rVert_F$ is the Frobenius norm, representing the square root of the sum of the squared matrix elements. The larger the reconstruction error, the more inconsistent the node’s attributes are with the normal pattern, hence the higher the anomaly score.
-
Loss Function: The loss function optimizes the model’s parameters to better distinguish between normal and abnormal nodes. Various types of loss functions can be used, such as contrastive, self-supervised, or adversarial. Taking a contrastive loss as an example, the loss is computed from the anomaly score $S_v$ of each node v and a temperature parameter $\tau$ that controls the scaling of scores. The purpose of this loss function is to maximize the scores of anomalous nodes while minimizing the scores of normal nodes, thereby increasing the score differences between nodes. This loss function requires some prior anomaly labels, which can be obtained by simple rules or statistical methods, or by semi-supervised or unsupervised methods.
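The three parts of the model structure can be sketched together in plain Python. Everything here is an assumption made for illustration: the weights are untrained identity matrices, the single GCN layer doubles as the decoder, and the loss is one common softmax-with-temperature form rather than the exact formula of the original article.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    norm = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, x) for x in row] for row in out]


def reconstruction_scores(feats, reconstructed):
    """Per-node score: norm of the gap between a node's attributes and their reconstruction."""
    return [math.sqrt(sum((x - y) ** 2 for x, y in zip(row, rec)))
            for row, rec in zip(feats, reconstructed)]


def contrastive_loss(scores, labels, tau=0.5):
    """Assumed loss form: negative mean log-softmax (temperature tau) of labeled anomalies."""
    exps = [math.exp(s / tau) for s in scores]
    total = sum(exps)
    picked = [e for e, y in zip(exps, labels) if y == 1]
    return -sum(math.log(e / total) for e in picked) / len(picked)


# Four fully connected entities; node 3 has an unusual first attribute.
adj = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
feats = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

reconstructed = gcn_layer(adj, feats, identity)
scores = reconstruction_scores(feats, reconstructed)
print(scores)
print(contrastive_loss(scores, labels=[0, 0, 0, 1]))
```

With identity weights the layer simply averages each node with its neighbors, so the reconstruction error measures how far a node deviates from its neighborhood, and node 3 receives the largest score. Training would adjust the weights so that the loss falls as labeled anomalies separate further from normal nodes.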
5. Conclusions and Discussions
To address the challenge of balancing privacy protection and regulatory requirements in blockchain transactions, we integrate multiple cryptographic technologies (probabilistic public-key cryptography, identity-based cryptography (IBC), Pedersen commitments, and Bulletproofs) with graph neural networks to propose a blockchain transaction and regulatory scheme that offers both privacy protection and regulatory functions. Our scheme can be applied as an independent module in existing blockchain technologies. Our analysis of its security performance indicates that the scheme is simple and practical. It has broad practical value in areas such as digital asset risk analysis and financial transaction regulation.
Although our scheme balances privacy protection and regulatory functions, it still has some limitations. For example, there is still room to improve transaction efficiency: increasing transaction speed, reducing verification time, and minimizing computational overhead are important research directions for regulatable blockchain transactions. In addition, it is important to choose appropriate privacy protection schemes for different transaction scenarios. Future research will therefore focus on balancing privacy protection, regulatory function, transaction efficiency, and other elements in blockchain transactions.
Legal regulation is another important issue for blockchain technology. Blockchain’s decentralization, tamper-resistance, transparency, and security make it widely applicable in the financial sector, but the same properties have also attracted illegal activity. Countries are therefore developing specific laws and regulations related to cryptocurrencies, such as registration requirements for trading platforms and anti-money laundering rules, while actively supervising the digital currency market to prevent financial risks. Several countries have also established dedicated regulatory frameworks for blockchain, aiming to maintain financial security, protect consumer interests, and strengthen the fight against money laundering. Our scheme provides regulatory capabilities for blockchain trading activities under the premise of privacy protection, which can help relevant organizations avoid illegal transactions. However, national legal systems for blockchain technology are still incomplete, and the relevant rules are still being refined. Future research should therefore enhance the compatibility and flexibility of the scheme so that it can adapt to the legal requirements of different countries.