A Multi-Modal Entity Alignment Method with Inter-Modal Enhancement

4.1. Experimental Settings
Datasets. Experiments in this paper utilize three public knowledge graph datasets: FB15K, DB15K, and YG15K. FB15K (Freebase15K) is a widely used subset of the Freebase knowledge graph, containing 14,951 entities, 592,213 relation triples, 29,395 attribute triples, and 13,444 images. It covers real-world entities and relations such as people, organizations, locations, and times. DB15K (DBpedia15K) is a subset of DBpedia, the knowledge graph extracted from Wikipedia by researchers originally at Leipzig University, containing 12,842 entities, 89,197 relation triples, 48,080 attribute triples, and 12,837 images; it includes relations such as “place_of_birth” and “place_of_death”. YG15K (YAGO15K) is a subset of YAGO, developed at the Max Planck Institute for Informatics, containing 15,404 entities, 122,886 relation triples, 23,532 attribute triples, and 11,194 images; it draws its entities and relations from YAGO3 and GeoNames, where GeoNames is a geospatial database of locations and geographical features worldwide. These datasets are widely used in multi-modal entity alignment because of their large scale and diverse domains, making them among the most representative benchmarks for the task.
Evaluation Metrics. Following standard practice in entity alignment, we report Hits@n, mean rank (MR), and mean reciprocal rank (MRR):

$$\mathrm{Hits@}n = \frac{1}{|S|}\sum_{i=1}^{|S|}\mathbb{I}\left(\mathrm{rank}_i \le n\right),\qquad \mathrm{MR} = \frac{1}{|S|}\sum_{i=1}^{|S|}\mathrm{rank}_i,\qquad \mathrm{MRR} = \frac{1}{|S|}\sum_{i=1}^{|S|}\frac{1}{\mathrm{rank}_i},$$

where $S$ denotes the set of triples, $\mathbb{I}(\cdot)$ denotes the indicator function (its value is 1 if its argument is true and 0 otherwise), and $\mathrm{rank}_i$ is the link prediction rank of the $i$-th triple. Higher values of Hits@n and MRR indicate better entity alignment performance, as does a lower value of MR.
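For concreteness, the following minimal sketch (our own illustration in NumPy, not the paper's released code) computes these metrics directly from the predicted ranks:

```python
import numpy as np

def alignment_metrics(ranks, ns=(1, 5, 10)):
    """Compute Hits@n, MR, and MRR from 1-based link-prediction ranks."""
    ranks = np.asarray(ranks, dtype=np.float64)
    metrics = {f"Hits@{n}": float(np.mean(ranks <= n)) for n in ns}
    metrics["MR"] = float(np.mean(ranks))          # mean rank: lower is better
    metrics["MRR"] = float(np.mean(1.0 / ranks))   # mean reciprocal rank: higher is better
    return metrics

# Example: five test pairs whose correct matches ranked 1, 3, 2, 10, 50
print(alignment_metrics([1, 3, 2, 10, 50]))
```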
Implementation Details. The experiment began with data pre-processing. We normalized all images in the datasets using Z-score normalization, which computes the mean and standard deviation of the pixel values and transforms them into a distribution with mean 0 and standard deviation 1; this makes pixel values comparable across images and improves training stability and convergence. In addition, the numerical information in the datasets was normalized so that its values fall within [0,1], and duplicate and missing records were carefully screened out and removed to ensure the accuracy of the experiment.
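A minimal sketch of both normalization steps is given below (our own illustration; we normalize each image over all of its pixels, which is one plausible reading of the per-pixel description above):

```python
import numpy as np

def zscore_image(img):
    """Z-score normalization: rescale pixel values to mean 0, std 1."""
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)  # epsilon guards flat images

def minmax_scale(values):
    """Min-max scaling of numerical attribute values into [0, 1]."""
    values = np.asarray(values, dtype=np.float64)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo + 1e-8)

img = np.random.randint(0, 256, size=(64, 64, 3))
normed = zscore_image(img)
print(round(normed.mean(), 6), round(normed.std(), 6))  # ~0.0, ~1.0
print(minmax_scale([3.0, 7.5, 12.0]))                   # ~[0.0, 0.5, 1.0]
```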
4.3. Results and Analysis
4.3.1. Overall Results
Comparing the MEAIE model with MMEA and MultiJAF, all three models process the numerical modality; however, the other two ignore cross-modal effects and the impact of weak modalities, which are precisely the points this paper improves upon. The final experimental results show an improvement of at least 14% in Hits@1, 15% in Hits@5, and 17% in Hits@10, as well as an increase of at least 6% in MRR. This demonstrates the necessity of introducing the cross-modal enhancement mechanism and the attention layer, and the soundness of the selected modal knowledge and fusion method. However, during the experiments we found that some entity images were missing from the knowledge graphs, leaving those entities without visual knowledge and degrading the final alignment performance due to the absent visual features. This paper replaces the missing visual features with zero vectors; since a zero vector neither enhances entity relations nor assigns attribute weights correctly, the improvement on such entities is limited.
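A minimal sketch of this zero-vector fallback follows (our own illustration; the embedding dimension and feature store are hypothetical, not specified by the paper):

```python
import numpy as np

VISUAL_DIM = 512  # hypothetical embedding size; not given in the paper

def visual_feature(entity_id, image_features):
    """Look up an entity's visual embedding, falling back to a zero vector.

    `image_features` maps entity ids to precomputed image embeddings.
    Entities without images get zeros, so they contribute nothing to the
    relation-enhancement and attribute-weighting steps.
    """
    return image_features.get(entity_id, np.zeros(VISUAL_DIM))

features = {"e1": np.random.randn(VISUAL_DIM)}  # toy feature store
print(visual_feature("e1", features).shape)     # (512,)
print(visual_feature("e2", features).any())     # False: e2 has no image
```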
4.3.2. Ablation Study
The first set of variants reveals that every modality contributes to entity alignment. Notably, visual knowledge has the greatest impact, as evidenced by the substantial decrease in Hits@1, Hits@10, and MRR when it is removed. This is because this paper leverages visual knowledge to enhance entity relations and allocate attribute weights, thereby introducing inter-modal effects; the dominant influence of visual knowledge is thus consistent with the design of the proposed model. Concerning the additional numerical modality introduced in this paper, the experimental results show a slight decrease in Hits@1, Hits@10, and MRR when the numerical modality is removed, further demonstrating the value of adding it.
The second set of variants demonstrates that introducing the attention layer is beneficial for the entity alignment task. Its main purpose is to avoid excessive influence from weak modalities: strong modalities receive higher weights while weak modalities receive a relatively smaller share, which further improves alignment performance once the joint embedding is formed, as the sketch below illustrates. Similar effects were observed on the FB15K-YG15K dataset in the same ablation experiments, so the details are not repeated here.
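The following sketch illustrates the idea of modality-level attention weighting (our own simplification; the modality names and scalar scores are assumptions, not the paper's exact architecture, where the scores would come from a learned scoring network):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_fusion(modal_embeddings, attention_scores):
    """Fuse per-modality entity embeddings with attention weights.

    `modal_embeddings`: modality name -> (d,) embedding vector.
    `attention_scores`: modality name -> scalar score. Softmax turns scores
    into weights, so strong modalities dominate and weak ones are
    down-weighted rather than discarded.
    """
    names = sorted(modal_embeddings)
    weights = softmax(np.array([attention_scores[m] for m in names]))
    fused = sum(w * modal_embeddings[m] for w, m in zip(weights, names))
    return fused, dict(zip(names, weights))

emb = {"structure": np.ones(4), "visual": 2 * np.ones(4), "numerical": 0.5 * np.ones(4)}
scores = {"structure": 2.0, "visual": 1.5, "numerical": 0.2}  # hypothetical scores
fused, weights = attentive_fusion(emb, scores)
print(weights)  # the weak numerical modality receives the smallest weight
print(fused)
```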
4.3.3. Seed Sensitivity
Specifically, in the experimental preparation phase, sensitivity experiments were conducted on the pre-aligned seed ratio of the multi-modal entity alignment methods. The experiments show that MMEA benefits relatively little from additional pre-aligned seeds: its network structure is fairly simple and has limited fitting capacity, so its dependence on pre-aligned entities is weak. MEAIE shows a significant improvement in Hits@1, Hits@10, and MRR over the MCLEA model, validating that the alignment performance of MEAIE improves steadily as the training seed ratio increases. Furthermore, the graph shows that MSNEA achieves its most outstanding results when the seed ratio reaches 80%, with Hits@10 and MRR even higher than MEAIE, indicating that MSNEA can only reach a high level with a high proportion of seed pairs, whereas MEAIE performs well even with a limited number of pre-aligned entities.
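A minimal sketch of how such a seed-ratio split can be generated (our own illustration; the entity ids and the 1,000-pair toy set are hypothetical):

```python
import random

def split_seeds(aligned_pairs, train_ratio, rng_seed=0):
    """Split pre-aligned entity pairs into training seeds and test pairs."""
    rng = random.Random(rng_seed)
    pairs = list(aligned_pairs)
    rng.shuffle(pairs)
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]

pairs = [(f"fb:{i}", f"db:{i}") for i in range(1000)]  # toy aligned pairs
for ratio in (0.2, 0.5, 0.8):
    train, test = split_seeds(pairs, ratio)
    print(f"ratio={ratio:.0%}: {len(train)} train seeds, {len(test)} test pairs")
```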