A Multi-Modal Entity Alignment Method with Inter-Modal Enhancement

4.1. Experimental Settings

Datasets. The experiments in this paper use three public knowledge graph datasets: FB15K, DB15K, and YG15K. FB15K (Freebase15K) is a knowledge graph dataset developed by Facebook AI Research and released in 2015, containing 14,951 entities, 592,213 relation triples, 29,395 attribute triples, and 13,444 images. It covers real-world entities and relations such as people, organizations, locations, and times. DB15K (Deep Learning Benchmarking 15K) is a knowledge graph dataset developed by researchers from Leipzig University and released in 2016, containing 12,842 entities, 89,197 relation triples, 48,080 attribute triples, and 12,837 images. It includes entities and relations from Wikidata, with relations such as “place_of_birth” and “place_of_death”. YG15K (YAGO3-SP Geospatial) is a knowledge graph dataset developed by researchers from the Max Planck Institute and released in 2017, containing 15,404 entities, 122,886 relation triples, 23,532 attribute triples, and 11,194 images. It includes entities and relations from YAGO3 and GeoNames, where GeoNames is a geospatial entity library containing locations and geographical features worldwide. Because of their large scale and diverse domains, these datasets have been widely used in multi-modal entity alignment tasks and are the most representative datasets for this task.

To ensure the effectiveness of the entity alignment task, the three public datasets are combined pairwise in the preparation stage to form the experimental datasets. These combined datasets cover a variety of attributes, relations, and images, which provides sufficient diversity for measuring entity alignment performance. Their statistics are shown in Table 1.

Evaluation Metrics. All models are evaluated by ranking candidate entities by their cosine similarity to the source entity and reporting Hits@n, MRR, and MR. Hits@n is the proportion of test cases in which the correct entity is ranked within the top n, MR is the average rank of the correct entities, and MRR is the average reciprocal rank of the correct entities. The three metrics are defined in Equations (20)–(22):

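$$\mathrm{MRR} = \frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{\mathrm{rank}_i} = \frac{1}{|S|} \left( \frac{1}{\mathrm{rank}_1} + \frac{1}{\mathrm{rank}_2} + \cdots + \frac{1}{\mathrm{rank}_{|S|}} \right) \tag{20}$$

$$\mathrm{MR} = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathrm{rank}_i = \frac{1}{|S|} \left( \mathrm{rank}_1 + \mathrm{rank}_2 + \cdots + \mathrm{rank}_{|S|} \right) \tag{21}$$

$$\mathrm{Hit}@n = \frac{1}{|S|} \sum_{i=1}^{|S|} I\left( \mathrm{rank}_i \le n \right) \tag{22}$$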

where $S$ denotes the set of triples, $I(\cdot)$ denotes the indicator function (its value is 1 if the argument is true and 0 otherwise), and $\mathrm{rank}_i$ is the link prediction rank of the i-th triple. Higher values of Hits@n and MRR and lower values of MR indicate better entity alignment performance.
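The three metrics can be sketched in a few lines of Python. This is a minimal illustration rather than the evaluation code used in the paper: the function names (`cosine`, `rank_of_correct`, `alignment_metrics`) are hypothetical, embeddings are plain lists of floats, and ties are resolved optimistically (the correct entity is ranked ahead of equally similar candidates), which is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_of_correct(source_vec, candidates, correct_idx):
    """1-based rank of the correct candidate when sorted by descending similarity."""
    sims = [cosine(source_vec, c) for c in candidates]
    target = sims[correct_idx]
    # Rank = 1 + number of candidates strictly more similar than the correct one.
    return 1 + sum(1 for s in sims if s > target)


def alignment_metrics(ranks, n_values=(1, 5, 10)):
    """Hits@n, MRR and MR over a list of 1-based ranks."""
    total = len(ranks)
    metrics = {f"Hits@{n}": sum(r <= n for r in ranks) / total for n in n_values}
    metrics["MRR"] = sum(1.0 / r for r in ranks) / total
    metrics["MR"] = sum(ranks) / total
    return metrics
```

For example, `alignment_metrics([1, 2, 4])` averages the reciprocal ranks 1, 1/2, and 1/4 for MRR and the ranks themselves for MR.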

Implementation Details. The experiments began with data pre-processing. All images in the dataset were normalized with Z-score normalization, which computes the mean and standard deviation of the pixel values and transforms them into a distribution with mean 0 and standard deviation 1. This makes the pixel values of different images more comparable and improves training stability and convergence. In addition, the numerical information in the dataset was normalized to the range [0,1], and duplicate and missing data were carefully screened out to ensure the accuracy of the experiments.
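The two normalization steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the image is represented as a flat list of pixel values, the statistics are computed over all pixels of one image (the text does not specify whether they are per image, per channel, or per dataset), and the function names `z_score` and `min_max` are hypothetical.

```python
import math


def z_score(pixels):
    """Rescale values to mean 0 and standard deviation 1 (population std)."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if std == 0:
        # A constant image carries no contrast; map it to all zeros.
        return [0.0] * n
    return [(p - mean) / std for p in pixels]


def min_max(values):
    """Rescale numerical attribute values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```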

All experiments were conducted on the two combined datasets with the following parameter settings. The knowledge embeddings in the knowledge graph were first initialized to limit the scope of subsequent operations. The embedding size was set to 100 for all models, and mini-batch training was used with a batch size of 512. In each experiment, the model was trained for 1000 epochs with its corresponding learning rate. Additional model parameters are shown in Table 2.

4.3. Results and Analysis

4.3.1. Overall Results

Our MEAIE was compared with several state-of-the-art entity alignment methods to demonstrate the proposed model’s effectiveness and superiority. Table 3 and Table 4 show the performance of all methods trained with 20% alignment seeds on the combined datasets FB15K-DB15K and FB15K-YG15K.
Table 3 shows that MEAIE achieves remarkable results in entity alignment tasks by enhancing entity representations through cross-modal effects and adding dynamic modal weights. It leads on all evaluation metrics except MR. MR only averages the ranks of the correct entities and does not measure how accurately the model places correctly matched entities at the top of the ranking. A model can therefore obtain a favorable MR by ranking most correct entities moderately well while rarely ranking them first, in which case its actual matching performance is not good. In contrast, MRR pays more attention to how highly the correct entities are ranked and thus reflects the model’s actual performance more accurately.

MEAIE achieves good results on the FB15K-DB15K dataset. Compared with traditional entity alignment methods, MEAIE outperforms the state-of-the-art method SEA by 50%, 45%, 43%, and 49% on Hit@1, Hit@5, Hit@10, and MRR, respectively, demonstrating the significant improvement of cross-modal entity alignment over traditional entity alignment. Using auxiliary modalities in multi-modal knowledge graphs can enhance entity alignment performance, which validates the importance of exploiting auxiliary modalities in entity alignment tasks. Compared with other multi-modal entity alignment methods, such as EVA, MSNEA, and MCLEA, the proposed MEAIE model performs the best. With 20% of the alignment seeds used for training, MEAIE outperforms the state-of-the-art baselines MCLEA and MSNEA by at least 1.5% on Hit@1, 1.6% on Hit@5, 2.9% on Hit@10, and 3.2% on MRR, validating the novelty and effectiveness of the proposed MEAIE model.
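The difference between MR and MRR discussed above can be made concrete with a small worked example. The rank lists below are invented for illustration and are not results from the paper. Hypothetical model A places every correct entity at rank 3. Hypothetical model B places four of five at rank 1 and one at rank 21.

```python
def mean_rank(ranks):
    """MR: average rank of the correct entities (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """MRR: average of 1/rank over the correct entities (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks, n):
    """Hits@n: share of correct entities ranked within the top n."""
    return sum(r <= n for r in ranks) / len(ranks)


# Invented ranks for two hypothetical models on five test entities.
model_a = [3, 3, 3, 3, 3]
model_b = [1, 1, 1, 1, 21]

for name, ranks in (("A", model_a), ("B", model_b)):
    print(name, mean_rank(ranks), round(mean_reciprocal_rank(ranks), 3), hits_at(ranks, 1))
```

Model A has the better MR (3 versus 5) although it never ranks the correct entity first, while model B has the better MRR and Hits@1. A single badly ranked entity dominates MR, which is why MR alone can be misleading.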

MEAIE, MMEA, and MultiJAF all process the numerical modality. However, MMEA and MultiJAF ignore cross-modal effects and the impact of weak modalities, both of which this paper addresses. The final experimental results show that MEAIE improves on these two models by at least 14% in Hit@1, 15% in Hit@5, 17% in Hit@10, and 6% in MRR. This demonstrates the necessity of introducing cross-modal enhancement mechanisms and attention layers, as well as the rationality of the selected modal knowledge and fusion method. During the experiments, however, some entity images were found to be missing from the knowledge graph. These entities lack visual knowledge, and the absence of visual features affects the final entity alignment performance. This paper replaces the missing visual features with zero vectors. For such entities, the visual modality can neither enhance the representation of entity relations nor help assign attribute weights correctly, so the improvement in the experimental results for them is only slight.
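The zero-vector strategy for entities without images can be sketched as follows. The function name `fill_missing_visual`, the dictionary layout, and the returned `has_image` mask are assumptions made for illustration; the paper only states that missing visual features are replaced with zero vectors.

```python
def fill_missing_visual(entity_ids, image_embeddings, dim):
    """Return a visual feature per entity, using a zero vector when no image exists.

    entity_ids: iterable of entity identifiers.
    image_embeddings: dict mapping entity id -> list of floats (entities with images only).
    dim: dimensionality of the visual embedding.
    """
    features = {}
    has_image = {}
    for entity in entity_ids:
        vector = image_embeddings.get(entity)
        has_image[entity] = vector is not None
        # Copy existing vectors so callers cannot mutate the input by accident.
        features[entity] = list(vector) if vector is not None else [0.0] * dim
    return features, has_image
```

Keeping the `has_image` mask alongside the features makes it possible to measure how many test entities rely on the zero-vector fallback.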

Table 4 shows that the proposed MEAIE also achieves good experimental results on the FB15K-YG15K dataset, with Hit@1, Hit@5, Hit@10, and MRR scores of 46%, 63%, 69%, and 0.534, respectively. Compared with the FB15K-DB15K dataset, the entity alignment performance of all methods is generally lower on FB15K-YG15K, which is due to factors such as the heterogeneity of the two datasets’ structures. However, the MEAIE model still achieves state-of-the-art performance and a significant improvement, demonstrating good generalization and robustness in dealing with heterogeneous data in multi-modal knowledge graph entity alignment. Additionally, the performance of EVA declines significantly on the FB15K-YG15K dataset. This is because its multi-modal fusion approach is not well suited to this dataset, resulting in poor results. The MEAIE model, on the other hand, improves the alignment performance by using contrastive learning and adding attention layers to fuse modal knowledge effectively.

4.3.2. Ablation Study

To investigate the impact of each component of the proposed MEAIE model on entity alignment, this section designs two sets of variants for ablation experiments: (1) MEAIE without one of the modalities (relation, attribute, visual, or numerical), denoted w/R, w/A, w/V, and w/N, respectively; (2) MEAIE without the attention mechanism, denoted w/DW, which simply concatenates the modal embeddings into the joint embedding without dynamic modal weights. Figure 2 shows the experimental results.

The first set of variants reveals that every modality contributes to entity alignment. Notably, removing visual knowledge leads to a substantial decrease in Hit@1, Hit@10, and MRR. This is because this paper uses visual knowledge to enhance entity relations and to allocate attribute weights, which introduces inter-modal effects. The impact of visual knowledge is therefore the greatest among all variants, which is consistent with the characteristics of the proposed model. As for the additional numerical modality introduced in this paper, Hit@1, Hit@10, and MRR decrease slightly when the numerical modality is removed, which further supports the feasibility of adding a numerical modality.

The second set of variants demonstrates that introducing an attention layer is beneficial for the entity alignment task. The main reason is that it avoids an excessive influence of weak modalities: strong modalities receive a higher weight and weak modalities a relatively smaller one, which further improves the effectiveness of entity alignment after the joint embedding is formed. Similar effects were observed on the FB15K-YG15K dataset in the same ablation experiments, but they are not discussed in detail here.
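The contrast between plain concatenation (the w/DW variant) and concatenation with dynamic modal weights can be sketched as follows. This is a minimal illustration and not the attention layer used in MEAIE: the softmax over one scalar score per modality, the function names, and the score values are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(modal_embeddings, modal_scores=None):
    """Concatenate modality embeddings, optionally scaled by softmax weights.

    modal_embeddings: dict modality name -> list of floats.
    modal_scores: dict modality name -> float score (hypothetical attention logits),
                  or None for plain concatenation without dynamic weights.
    """
    names = sorted(modal_embeddings)  # fixed order keeps the layout deterministic
    if modal_scores is None:
        weights = [1.0] * len(names)
    else:
        weights = softmax([modal_scores[name] for name in names])
    joint = []
    for name, weight in zip(names, weights):
        joint.extend(weight * x for x in modal_embeddings[name])
    return joint
```

With scores such as `{"visual": 2.0, "numerical": 0.0}`, the visual block of the joint embedding is scaled by a larger weight than the numerical block, so a weak modality contributes less to the cosine similarity between joint embeddings.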

4.3.3. Seed Sensitivity

To evaluate the sensitivity of the MEAIE model to pre-aligned entities, based on existing research, this paper uses 20%, 50%, and 80% of the alignment seeds as training sets for the entity alignment task. Figure 3 displays the training results of the model for different alignment seed proportions on the FB15K-DB15K dataset. The experimental results show that the MEAIE model achieved excellent results in almost all metrics and ratios.
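The seed proportions above correspond to splitting the set of pre-aligned entity pairs into training seeds and test pairs. A minimal sketch of such a split is given below. The function name `split_seeds`, the shuffling with a fixed random seed, and the rounding rule are assumptions, since the paper does not describe how the split is drawn.

```python
import random


def split_seeds(aligned_pairs, train_ratio, seed=0):
    """Split pre-aligned entity pairs into training seeds and test pairs.

    aligned_pairs: iterable of (source_entity, target_entity) tuples.
    train_ratio: fraction used as alignment seeds, e.g. 0.2, 0.5 or 0.8.
    seed: random seed so that the split is reproducible.
    """
    pairs = list(aligned_pairs)
    random.Random(seed).shuffle(pairs)
    cut = round(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]


# Example with ten invented entity pairs.
pairs = [(i, i + 100) for i in range(10)]
for ratio in (0.2, 0.5, 0.8):
    train, test = split_seeds(pairs, ratio)
    print(ratio, len(train), len(test))
```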

Specifically, in the experimental preparation phase, sensitivity experiments on the proportion of seed entities were conducted for the multi-modal entity alignment methods. The experiments show that MMEA performs relatively poorly when trained on pre-aligned seeds. This is because the network structure of MMEA is fairly simple and has poor fitting ability, resulting in a weak dependence on pre-aligned entities. MEAIE shows a significant improvement in Hit@1, Hit@10, and MRR over the MCLEA model, and its entity alignment performance gradually improves as the training seed ratio increases. Furthermore, Figure 3 shows that the MSNEA model achieves the best results when the seed ratio reaches 80%, with Hit@10 and MRR even higher than those of the MEAIE model. This indicates that MSNEA reaches a high level of performance only with a high proportion of seed pairs, while the MEAIE model performs well even with a limited number of pre-aligned entities.
