Abstract
The task of entity alignment between knowledge graphs aims to find entities in two knowledge graphs that represent the same real-world entity. Recently, embedding-based models have been proposed for this task. Such models are built on top of a knowledge graph embedding model that learns entity embeddings to capture the semantic similarity between entities in the same knowledge graph. We propose to learn embeddings that can capture the similarity between entities in different knowledge graphs. Our proposed model helps align entities from different knowledge graphs, and hence enables the integration of multiple knowledge graphs. Our model exploits large numbers of attribute triples existing in the knowledge graphs and generates attribute character embeddings. The attribute character embedding shifts the entity embeddings from two knowledge graphs into the same space by computing the similarity between entities based on their attributes. We use a transitivity rule to further enrich the number of attributes of an entity to enhance the attribute character embedding. Experiments using real-world knowledge bases show that our proposed model achieves consistent improvements over the baseline models by over 50% in terms of hits@1 on the entity alignment task.
1 Introduction
- (Background) The same entity may exist in different forms in different KGs. → These KGs are complementary to each other in terms of completeness.
- To integrate KGs, a basic problem is to identify entities in different KGs that refer to the same real-world entity → This study focuses on entity alignment between two KGs.
- An RDF triple used in this study consists of three elements (see the triple examples after this list):
  - subject: an entity
  - predicate: the relationship or attribute name
  - object: 1) an entity (relationship triple), or 2) a literal value (attribute triple)
- (Early studies) entity alignment based on the similarity between attributes of entities
  - user-defined rules determine which attributes are compared between the entities
  - limitation: different pairs of entities may need different attributes to be compared
- (Recent studies) embedding-based models capture the semantic similarity between entities based on relationship triples (e.g., TransE)
  - limitation: these models require seed alignments between the two KGs, which are difficult to obtain due to the expensive human effort required and hence are rarely available.
- (In this study) proposes a novel embedding model that 1) generates attribute embeddings from the attribute triples, and 2) uses the attribute embeddings to shift the entity embeddings of the two KGs into the same vector space (see the sketches after this list).
  - attribute similarity between the two KGs helps the attribute embedding yield a unified embedding space for the two KGs.
  - using the attribute embeddings to shift the entity embeddings of the two KGs into the same vector space → the entity embeddings can capture the similarity between entities from the two KGs.
  - also includes predicate alignment: the predicates of the two KGs are renamed into a unified naming scheme to ensure that the relationship embeddings of the two KGs are also in the same vector space.
- (Contributions)
- propose a framework for entity alignment between two KGs that consists of a predicate alignment module, an embedding learning module, and an entity alignment module.
- propose a novel embedding model that integrates entity embeddings with attribute embeddings to learn a unified embedding space for two KGs
- On three real KG pairs, the model consistently outperforms the state-of-the-art models on the entity alignment task by over 50% in terms of hits@1
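To make the two triple types and the predicate alignment idea concrete, here is a minimal, illustrative example; the specific entities, literals, and the `predicate_map` pairing below are hypothetical, not taken from the paper.

```python
# Relationship triple: subject and object are both entities.
relationship_triple = ("dbp:Barack_Obama", "dbp:birthPlace", "dbp:Honolulu")

# Attribute triple: the object is a literal value (a string, date, number, ...).
attribute_triple = ("dbp:Barack_Obama", "dbp:birthDate", "1961-08-04")

# Predicate alignment: predicates from the two KGs that express the same
# relationship are renamed into one unified naming scheme, so relationship
# embeddings learned from both KGs end up in the same vector space.
predicate_map = {
    "dbp:birthPlace": ":birthPlace",   # predicate from KG1
    "yago:wasBornIn": ":birthPlace",   # predicate from KG2 (hypothetical pair)
}
```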
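The sketch below shows, under simplifying assumptions, how the pieces of the proposed model could fit together: a TransE-style translation score for relationship triples, a sum-of-character-embeddings composition for attribute values, and a cosine-similarity term that pulls an entity's structure embedding toward its attribute embedding. The function names and the exact compositional function are my own simplifications; the paper's full objective differs in detail.

```python
import numpy as np

def transe_score(h, r, t):
    # Translation-based plausibility: for a true triple, h + r should be close to t.
    return np.linalg.norm(h + r - t, ord=1)

def attribute_char_embedding(value, char_vecs):
    # Simplest compositional function: sum the character embeddings of the
    # literal value (character-level, so similar strings yield similar vectors).
    return np.sum([char_vecs[c] for c in value], axis=0)

def similarity_shift(entity_vec, attr_vec):
    # Cosine similarity between an entity's structure embedding and its
    # attribute embedding; maximizing it over both KGs shifts their entity
    # embeddings toward the shared attribute space.
    return np.dot(entity_vec, attr_vec) / (
        np.linalg.norm(entity_vec) * np.linalg.norm(attr_vec))

# Tiny demo with random vectors (illustrative only).
rng = np.random.default_rng(0)
char_vecs = {c: rng.normal(size=16) for c in "0123456789-."}
attr_vec = attribute_char_embedding("1961-08-04", char_vecs)
entity_vec = rng.normal(size=16)
print(similarity_shift(entity_vec, attr_vec))
```

Because literal values such as names, dates, and coordinates are often written similarly in both KGs, the attribute embeddings of matching entities end up close, and the similarity term transfers that closeness to the entity embeddings.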
2 Related Work
2.1 String-Similarity-based Entity Alignment
- use string similarity as the main alignment tool
- LIMES (Ngomo and Auer 2011)
- RDF-AI (Scharffe, Yanbin, and Zhou 2009)
- SILK (Volz et al. 2009)
- use graph similarity to improve performance
- LD-Mapper (Raimond, Sutton, and Sandler 2008)
- RuleMiner (Niu et al. 2012)
- HolisticEM (Pershina, Yakout, and Chakrabarti 2015)
2.2 Embedding-based Entity Alignment
[KG completion]
- translation-based KGE models
  - distributed representations that separate the relationship vector space from the entity vector space: TransE (Bordes et al. 2013), TransH (Wang et al. 2014), TransR (Lin et al. 2015), TransD (Ji et al. 2015), ...
  - additional information used along with the relationship triples to compute entity embeddings: DKRL (Xie et al. 2016), TEKE (Wang and Li 2016)
- non-translation-based KGE models
  - tensor-based factorization that represents relationships with matrices: RESCAL (Nickel, Tresp, and Kriegel 2012) and HolE (Nickel, Rosasco, and Poggio 2016)
  - a bilinear tensor operator that represents each relationship and jointly models head and tail entity embeddings: NTN (Socher et al. 2013)
- all of these models preserve the structural information of the entities: entities that share similar neighbor structures in the KG should have close representations in the embedding space (the two scoring styles are contrasted in the sketch below).
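To make the translation-based vs. tensor/bilinear distinction concrete, here is a minimal sketch of the two scoring styles; the dimension, variable names, and random vectors are arbitrary placeholders rather than any paper's actual setup.

```python
import numpy as np

d = 50
rng = np.random.default_rng(0)
h, t = rng.normal(size=d), rng.normal(size=d)   # head / tail entity embeddings

# Translation-based (TransE): the relationship is a vector r, and a true
# triple should satisfy h + r ≈ t, so a smaller distance means more plausible.
r = rng.normal(size=d)
transe_score = -np.linalg.norm(h + r - t)

# Bilinear / tensor-based (RESCAL): the relationship is a matrix M_r, and
# plausibility is the bilinear form h^T M_r t, so a larger value means more
# plausible.
M_r = rng.normal(size=(d, d))
rescal_score = h @ M_r @ t

print(transe_score, rescal_score)
```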
[Entity alignment]
- Chen et al. (2017a; 2017b):