Efficient and Secure Distributed Data Storage and Retrieval Using Interplanetary File System and Blockchain
1. Introduction
One of the main characteristics of IPFS is that anyone can access the data stored on this public network, which limits its usability when the stored data raise privacy concerns. In digital data security, hashing and encryption are two fundamental cryptographic techniques. Hashing maps input data of any size to a fixed-size hash value (digest) that, for a cryptographic hash function, is in practice unique to that input, and it is a critical tool for ensuring data integrity. The procedure is deterministic, and any change in the data, however small, produces a substantially different hash value, which allows data tampering or corruption to be detected effectively. Encryption, on the other hand, converts the original data, known as plaintext, into an encoded version known as ciphertext. This transformation, governed by a cryptographic algorithm and a key, ensures that unauthorized entities cannot read the data, thus providing confidentiality. In distributed data storage systems, combining hashing and encryption offers a robust framework that addresses two critical aspects of data security: integrity and confidentiality. Indeed, a hash value acts as a fingerprint of the data, so integrity can be checked without decrypting the data and comparing the original contents. Conversely, relying on encryption alone requires decrypting the retrieved data, i.e., applying the inverse transformation from ciphertext to plaintext, which requires access to private keys and therefore poses a significant security risk.
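As a minimal Python sketch of the hashing properties described above (SHA-256 is assumed here, since the paragraph does not name a specific hash function, and the sample readings are hypothetical), the following code shows that the digest has a fixed size, is deterministic, and changes completely after a one-character edit:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

reading = b'{"sensor": "pump-01", "litres": 120.5}'
tampered = b'{"sensor": "pump-01", "litres": 120.6}'  # one character changed

h1 = sha256_hex(reading)
h2 = sha256_hex(reading)
h3 = sha256_hex(tampered)

print(len(h1))   # 64: fixed size regardless of input length
print(h1 == h2)  # True: hashing is deterministic
print(h1 == h3)  # False: a one-character edit yields a different digest
```

Comparing `h1` with a freshly computed digest is all an integrity check needs; no key and no decryption are involved.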
The solution proposed in this paper integrates IPFS with blockchain technology to address these problems. In particular, we propose to exploit the decentralized and efficient architecture of IPFS to store large amounts of data, significantly reducing costs while improving scalability. We propose a dual-layer security mechanism that combines hashing and encryption to ensure robust data security and privacy. We also propose a novel approach for verifying data integrity, which detects and handles any modification to the data by generating and comparing hashes of the retrieved and stored information. Furthermore, we eliminate the need for decryption during data retrieval and querying, thus mitigating the risk of exposing private keys and improving overall system security. The proposed approach is developed with reference to a real-world scenario in which a set of IoT devices periodically produces information records that must be stored in an immutable and tamper-proof way while preserving their confidentiality.
2. Related Work
The integration of IPFS and blockchain technology represents a significant advancement in distributed data storage and security. This section reviews existing research on blockchain-based data storage, the use of IPFS, and encryption and hashing techniques for ensuring data security and privacy.
2.1. Blockchain and IPFS for Data Storage
2.2. Data Security and Privacy Methods
2.3. Query Optimization Techniques
3. Problem Statement and Formalization
In a typical IoT scenario, several IoT devices or sensors produce data every day that must be stored and certified to ensure immutability, traceability, and tamper-proofing. In this scenario, it is crucial that the produced data be stored permanently, without any subsequent modification or deletion.
Given these premises, we can formalize the properties that we want to ensure by using blockchain and smart contract technology together with the IPFS protocol.
(data immutability). Given a piece of data d stored in a database or file system, we say that d is immutable if it cannot be modified after being stored or if, otherwise, any subsequent modification of d can be easily identified.
Blockchain technology ensures the immutability of the data stored in its blocks: once a block is confirmed, its contents can no longer be modified. Typically, the amount of data stored on-chain is kept minimal because of technological limitations and cost constraints. However, the blockchain can also ensure the immutability of data stored off-chain by storing only a fingerprint (hash) of the data, which is enough to detect whether the information stored outside the blockchain has been modified.
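The following sketch illustrates the off-chain pattern under simplifying assumptions: a plain Python list, `ledger`, stands in for on-chain storage, SHA-256 is assumed as the fingerprint function, and `anchor` and `is_unmodified` are hypothetical helper names. Only the 32-byte digest goes on-chain, while the record itself stays off-chain:

```python
import hashlib

# Hypothetical stand-in for on-chain storage: an append-only list of digests.
ledger: list = []

def anchor(off_chain_data: bytes) -> int:
    """Record the SHA-256 fingerprint of off-chain data and return its index."""
    ledger.append(hashlib.sha256(off_chain_data).hexdigest())
    return len(ledger) - 1

def is_unmodified(off_chain_data: bytes, index: int) -> bool:
    """Check off-chain data against the fingerprint anchored at `index`."""
    return hashlib.sha256(off_chain_data).hexdigest() == ledger[index]

record = b"greenhouse-3,temperature=21.4"
i = anchor(record)
print(is_unmodified(record, i))                            # True
print(is_unmodified(b"greenhouse-3,temperature=25.0", i))  # False
```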
In many real-world scenarios, immutability is too strict a requirement, since data may need to be updated or corrected because of missing or erroneous parts. Therefore, we consider immutability together with another property, called traceability.
(data traceability). Data traceability refers to the ability to follow data transformations back to their origin, verifying the authenticity and integrity of the data. It can also be understood as the degree to which a system or data provider records the changes made to the data.
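One common way to make changes traceable, shown here only as an assumption since the paper does not prescribe a data structure at this point, is a hash-linked history in which every version stores the digest of its predecessor. `Version`, `append_version`, and `trace_is_intact` are hypothetical names:

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class Version:
    payload: dict
    prev_hash: str  # digest of the previous version; "" marks the origin

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so identical content always yields the same digest.
        body = json.dumps({"payload": self.payload, "prev": self.prev_hash}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_version(history: list, payload: dict) -> None:
    """Record a new version that points to the digest of the latest one."""
    prev = history[-1].digest() if history else ""
    history.append(Version(payload, prev))

def trace_is_intact(history: list) -> bool:
    """Check that the chain starts at an origin and every link matches its predecessor."""
    if not history or history[0].prev_hash != "":
        return False
    return all(later.prev_hash == earlier.digest()
               for earlier, later in zip(history, history[1:]))

history: list = []
append_version(history, {"sensor": "rain-gauge-7", "mm": 3.2})
append_version(history, {"sensor": "rain-gauge-7", "mm": 3.4, "note": "recalibrated"})
print(trace_is_intact(history))  # True
history[0].payload["mm"] = 9.9   # silently rewrite the origin
print(trace_is_intact(history))  # False: the second version no longer points to it
```

Updates remain possible, but silently rewriting an earlier version breaks the link that follows it. The digest of the latest version would still need to be anchored on the blockchain, since the chain alone cannot reveal a rewrite of its last entry.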
The last requirement concerns the system used to store data off-chain.
(data availability). Data availability refers to a user’s confidence that the stored data, and all the information required to verify specific properties of the data (such as immutability), are available during a given period.
Given these desired properties, we can contextualize the considered problem as follows. Suppose we have a set of IoT devices that continuously produce measurements of a predefined quantity, for instance, the amount of water flowing through a pump, the temperature in a greenhouse, or the amount of precipitation. These data are collected through a Message Queuing Telemetry Transport (MQTT) broker, which stores the received data in centralized storage, such as a relational database or a set of log files. In this scenario, the problem we want to solve is making this information immutable and tamper-proof, providing the traceability property described above and ensuring its availability.
The following section illustrates the proposed solution, which integrates blockchain, smart contract technology, and the IPFS decentralized storage.
4. Proposed Solution
The proposed solution achieves data privacy, integrity, and query optimization by properly combining hashing and encryption of the data. Any later modification can be detected simply by comparing the hash previously stored on IPFS with a newly generated hash of the current data. Consequently, decryption, which requires access to private keys and therefore poses security risks, is needed only when a previous version of the data must actually be restored. Moreover, this scheme enables a clean separation of concerns, since an integrity check can be performed without revealing the content.
Algorithm 1 Creation of the data to be stored in IPFS
where s is the salt, c is the number of iterations, and is the desired key length.
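The parameters listed above (a salt, an iteration count, and a desired key length) match those of PBKDF2 as specified in RFC 8018. Assuming PBKDF2 with HMAC-SHA256, which the text does not state explicitly, the derivation can be sketched with Python's standard `hashlib.pbkdf2_hmac`; `derive_key` and the default values are illustrative choices:

```python
import hashlib
import os

def derive_key(secret: str, salt: bytes, iterations: int = 600_000, key_length: int = 32) -> bytes:
    """Derive a key_length-byte key from `secret` with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, iterations, dklen=key_length)

salt = os.urandom(16)  # s: random, stored alongside the data so the key can be re-derived
key = derive_key("correct horse battery staple", salt)
print(len(key))  # 32 bytes, i.e., a 256-bit key
print(key == derive_key("correct horse battery staple", salt))  # True: same inputs, same key
```

A higher iteration count slows down brute-force guessing of the secret at the cost of a slower derivation; the salt is not secret, but it must be stored to re-derive the same key.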
Algorithm 2 Verification of a secret S without revealing it
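The body of Algorithm 2 is not reproduced here, so the following is only one plausible way to verify a secret without storing or revealing it, under stated assumptions: only a random salt and a verifier (the SHA-256 of the PBKDF2-derived key) are kept, and `enroll` and `verify_secret` are hypothetical names:

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative value for the iteration count c

def derive_key(secret: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS, dklen=32)

def enroll(secret: str) -> Tuple[bytes, bytes]:
    """Return (salt, verifier). Only these are stored; the secret and the key are not."""
    salt = os.urandom(16)
    # Store a hash of the derived key, not the key itself, so the verifier cannot decrypt anything.
    verifier = hashlib.sha256(derive_key(secret, salt)).digest()
    return salt, verifier

def verify_secret(candidate: str, salt: bytes, verifier: bytes) -> Optional[bytes]:
    """Return the derived key if `candidate` matches the enrolled secret, else None."""
    key = derive_key(candidate, salt)
    # Constant-time comparison avoids leaking how many leading bytes matched.
    if hmac.compare_digest(hashlib.sha256(key).digest(), verifier):
        return key
    return None

salt, verifier = enroll("s3cret-passphrase")
print(verify_secret("s3cret-passphrase", salt, verifier) is not None)  # True
print(verify_secret("wrong guess", salt, verifier))                    # None
```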
If the secret verification is successful, the data are encrypted using the derived key, as reported in line 6 of Algorithm 1. The encrypted data are then combined with the hashed data in a JSON object (see line 7 of Algorithm 1).
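The JSON object produced by Algorithm 1 can be sketched as follows. AES is not available in Python's standard library, so a toy SHA-256 counter-mode keystream (`keystream_xor`) stands in for it; this is an assumption made only to keep the example self-contained, and the toy cipher must not be used in practice. The field names `hash`, `nonce`, and `ciphertext` are also hypothetical:

```python
import base64
import hashlib
import json
import os

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY stand-in for AES: XOR with SHA-256(key || nonce || counter) blocks.
    The same call encrypts and decrypts. Illustration only, not production-grade."""
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + (start // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)

def build_payload(record: dict, key: bytes, nonce: bytes) -> str:
    """Combine the record's hash and its ciphertext in one JSON object (hypothetical field names)."""
    plaintext = json.dumps(record, sort_keys=True).encode("utf-8")
    return json.dumps({
        "hash": hashlib.sha256(plaintext).hexdigest(),
        "nonce": base64.b64encode(nonce).decode("ascii"),
        "ciphertext": base64.b64encode(keystream_xor(key, nonce, plaintext)).decode("ascii"),
    })

key = os.urandom(32)    # in the described pipeline, this would be the PBKDF2-derived key
nonce = os.urandom(12)  # must never be reused with the same key
payload = build_payload({"sensor": "pump-01", "litres": 120.5}, key, nonce)
print(sorted(json.loads(payload)))  # ['ciphertext', 'hash', 'nonce']
```

One design consideration: because the hash field is readable by anyone who fetches the object from IPFS, a short, predictable record could be confirmed by hashing candidate values; a keyed hash such as HMAC would prevent this, at the cost of requiring a key to verify.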
Once this JSON object is generated, it is stored on IPFS, which returns a unique content identifier (CID) that references the encrypted and hashed data. The blockchain constitutes the final layer of the proposed architecture: after the encrypted and hashed data have been stored on IPFS and the corresponding CID has been obtained, the CID is recorded on the blockchain through a smart contract that maintains the list of CIDs of the data stored on IPFS. The immutability of the blockchain ensures that each recorded CID cannot be changed or removed, which provides an auditable trail for data integrity and traceability. The smart contract not only stores CIDs permanently but also supports the retrieval and verification of the persisted data.
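To illustrate the flow from IPFS to the smart contract without a running IPFS node or blockchain, the sketch below uses two in-memory stand-ins (hypothetical classes `ContentStore` and `CidRegistry`). The SHA-256 hex string it returns is not a real CID, but it shares the property that matters here: the identifier is derived from the content itself:

```python
import hashlib
from typing import Dict, List, Tuple

class ContentStore:
    """In-memory stand-in for IPFS: each object is addressed by the SHA-256 of its bytes.
    Real CIDs add multihash/multibase encoding and chunking, so actual CID strings differ."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Content addressing makes every retrieval self-verifying.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

class CidRegistry:
    """Stand-in for the smart contract: an append-only list with no update or delete operation."""

    def __init__(self) -> None:
        self._cids: List[str] = []

    def store(self, cid: str) -> int:
        self._cids.append(cid)
        return len(self._cids) - 1

    def all_cids(self) -> Tuple[str, ...]:
        return tuple(self._cids)  # an immutable copy, so callers cannot edit the list

store, registry = ContentStore(), CidRegistry()
payload = b"hashed and encrypted record"
cid = store.add(payload)
registry.store(cid)
print(registry.all_cids() == (cid,))  # True
print(store.get(cid) == payload)      # True
```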
Algorithm 3 illustrates the function DataVerification, which checks the integrity of the current content of the centralized database against what has already been stored on IPFS. After retrieving both contents, the function checks whether each row in the database has already been recorded on the blockchain and, if so, whether the hashes match. Newly identified records are then stored on IPFS, as illustrated in Algorithm 1, and a transaction is submitted to record the corresponding new CID. Conversely, records whose hashes do not match are classified as corrupted or inconsistent. This consistency check does not require decrypting the stored data, as it relies only on hash comparison. However, depending on the specific policies of the system, the original version of corrupted or inconsistent data can eventually be recovered by decrypting the encrypted data stored at the corresponding CID.
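Since the pseudocode of Algorithm 3 is not reproduced here, the following is a minimal sketch of the classification step only, assuming each record carries an `id` field and that the hashes previously stored on IPFS have already been fetched into a dictionary keyed by that id (`row_hash` and `data_verification` are hypothetical names):

```python
import hashlib
import json
from typing import Dict, List

def row_hash(row: dict) -> str:
    # Canonical serialization (sorted keys) so the same row always yields the same hash.
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()

def data_verification(db_rows: List[dict], stored_hashes: Dict[int, str]) -> Dict[str, list]:
    """Classify database rows against hashes already stored on IPFS, keyed by a row id.
    Only hashes are compared; nothing is decrypted."""
    result: Dict[str, list] = {"new": [], "consistent": [], "corrupted": []}
    for row in db_rows:
        expected = stored_hashes.get(row["id"])
        if expected is None:
            result["new"].append(row["id"])        # to be stored on IPFS and its CID recorded
        elif expected == row_hash(row):
            result["consistent"].append(row["id"])
        else:
            result["corrupted"].append(row["id"])  # hash mismatch: tampered or inconsistent
    return result

rows = [{"id": 1, "temp": 21.4}, {"id": 2, "temp": 22.0}, {"id": 3, "temp": 19.8}]
stored = {1: row_hash({"id": 1, "temp": 21.4}), 2: row_hash({"id": 2, "temp": 23.5})}
print(data_verification(rows, stored))
# {'new': [3], 'consistent': [1], 'corrupted': [2]}
```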
Algorithm 3 Data retrieval and verification
5. Evaluation
This section presents the implementation of the proposed solution and focuses on evaluating the system’s performance for different data sizes and its security against specific attacks.
5.1. Experimental Setup
5.2. Performance Evaluation
5.2.1. Encryption Throughput
5.2.2. Total Time
5.2.3. Scalability
5.3. Security Evaluation
This section presents a security evaluation of the proposed system, assessing its ability to verify data integrity and to prevent unauthorized access. The proposed system successfully detected both the data-tampering and the unauthorized-access attacks.
5.3.1. Data Integrity
The primary purpose of AES and similar encryption algorithms is to ensure confidentiality. When used in an unauthenticated mode (e.g., CBC or CTR), however, they do not ensure data integrity, meaning that they do not detect changes to the data during transmission or storage: an attacker could modify the encrypted data so that decryption still succeeds but yields incorrect plaintext. Therefore, AES in such modes requires an additional integrity mechanism. The proposed solution addresses this challenge by combining a hashing mechanism with encryption. The system computes the hash of the original data, which acts as a digital fingerprint of the data; after decryption, it computes the hash of the decrypted data and compares it with the original hash. If the two hashes match, the data were not altered between encryption and decryption. This approach protects against unauthorized modifications and improves the security of stored and transmitted information.
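The following sketch reproduces the attack described above on a toy cipher. It assumes a SHA-256 keystream as a stand-in for an unauthenticated mode such as AES-CTR, which shares the relevant weakness: flipping ciphertext bits flips the same plaintext bits, and decryption still succeeds. The hash comparison is what exposes the change:

```python
import hashlib
import os

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY stand-in for an unauthenticated stream mode such as AES-CTR. Not for real use."""
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + (start // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)

key, nonce = os.urandom(32), os.urandom(12)
plaintext = b"valve=closed;flow=0.0"
stored_hash = hashlib.sha256(plaintext).hexdigest()  # fingerprint computed before encryption
ciphertext = keystream_xor(key, nonce, plaintext)

# The attacker knows the record layout but not the key, and flips ciphertext bits at offset 6.
delta = bytes(a ^ b for a, b in zip(b"closed", b"opened"))
tampered = bytearray(ciphertext)
for i, d in enumerate(delta):
    tampered[6 + i] ^= d

decrypted = keystream_xor(key, nonce, bytes(tampered))
print(decrypted)                                             # b'valve=opened;flow=0.0'
print(hashlib.sha256(decrypted).hexdigest() == stored_hash)  # False: tampering detected
```

Authenticated modes such as AES-GCM would reject the modified ciphertext during decryption, which would complement the hash check.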
5.3.2. Unauthorized Access
The unauthorized access attack consists of accessing the content of the encrypted data without a valid key. We evaluated the system against this attack in two steps: first, the data are encrypted with a valid secret key; then, an attempt is made to decrypt them using a different, unauthorized secret key. This approach simulates a real-world scenario in which an attacker tries to gain access to encrypted information. The system consistently failed to decrypt the data with the unauthorized key, producing either an empty output or an error, and thus identified the unauthorized attempt. These results show that the proposed encryption and key management schemes reliably prevent unauthorized access to sensitive data.
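A minimal sketch of this test follows, again using a toy SHA-256 keystream as a stand-in for AES (an assumption; the hypothetical helper `decrypt_and_check` is not part of the original system). With this toy cipher a wrong key never raises an error but yields unreadable bytes, so the comparison with the stored hash is what flags the attempt:

```python
import hashlib
import os
from typing import Optional

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY stand-in for AES (SHA-256 counter-mode keystream). Not for real use."""
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + (start // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)

def decrypt_and_check(key: bytes, nonce: bytes, ciphertext: bytes, expected_hash: str) -> Optional[bytes]:
    """Decrypt, then release the plaintext only if its hash matches the stored one."""
    candidate = keystream_xor(key, nonce, ciphertext)
    if hashlib.sha256(candidate).hexdigest() != expected_hash:
        return None  # wrong key or tampered data: return nothing
    return candidate

valid_key, wrong_key, nonce = os.urandom(32), os.urandom(32), os.urandom(12)
plaintext = b"greenhouse-3;temp=21.4"
expected = hashlib.sha256(plaintext).hexdigest()
ciphertext = keystream_xor(valid_key, nonce, plaintext)  # step 1: encrypt with the valid key

print(decrypt_and_check(valid_key, nonce, ciphertext, expected))  # b'greenhouse-3;temp=21.4'
print(decrypt_and_check(wrong_key, nonce, ciphertext, expected))  # None (step 2: wrong key)
```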
6. Conclusions
Data immutability, traceability, and availability can be provided by properly integrating blockchain technology and IPFS. This paper proposes a complete architecture that combines these two technologies with encryption and hashing algorithms to ensure these properties efficiently. A set of scalability tests has also been performed to demonstrate the system’s ability to handle increasing amounts of data. Finally, the paper discusses how the system prevents unauthorized access and ensures data integrity.