Unsupervised Enrichment of Knowledge Graph Schemas
Date: 28th Jun 2023
Time: 11:45 AM
Venue: MR - I (SSB 233, First Floor)
Details
Knowledge graphs, which represent information as a semantic graph, have a widespread
impact in both industry and academia. Because they store semantically structured
information, knowledge graphs are considered promising tools for many tasks, such as
question answering, recommendation, and information retrieval.
The ontology associated with a knowledge graph can be considered to have two
parts: the terminological box (T-Box) and the assertion box (A-Box). The A-Box contains
assertions about individual entities, whereas the T-Box contains statements about the
entity classes, the class hierarchy, and other schema-level elements. In most cases, the
A-Box of a knowledge graph is populated by automatic means, so it is much larger than
the T-Box, which is mostly curated manually. Hence, fully automated techniques that
enrich the T-Box of a knowledge graph, so that it better caters to the needs of the
application, have become the need of the hour.
To this end, this thesis proposes two unsupervised systems, DARO and DOPLEX, for
property enrichment and property axiom enrichment of a knowledge graph, respectively.
Given a pair of classes, DARO discovers new object properties between them, along
with their instances. DARO identifies text patterns in a web corpus that can potentially
represent relations between individuals. These text patterns are then clustered by
semantic similarity, and a representative pattern from each cluster is suggested to the
ontology engineer as a new object property. DARO is designed as a recall-oriented
system and is observed to perform better than newOntExt, an offshoot of the popular
NELL project.
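The cluster-then-pick-a-representative step can be illustrated with a small sketch. This is not DARO's implementation: the abstract does not say how patterns are extracted or how semantic similarity is measured. The sketch assumes each pattern already has an embedding vector, uses cosine similarity, a greedy threshold clustering, and the cluster medoid as the representative. All function names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Assumption: every text pattern (e.g. "was born in") comes with an embedding
# vector. DARO's actual similarity measure is not specified in the abstract.
Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cluster_patterns(embeddings: Dict[str, Vector], threshold: float) -> List[List[str]]:
    """Greedy single-pass clustering (hypothetical choice): a pattern joins the
    first cluster whose seed pattern is at least `threshold` similar to it,
    otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for pattern, vec in embeddings.items():
        for cluster in clusters:
            if cosine(vec, embeddings[cluster[0]]) >= threshold:
                cluster.append(pattern)
                break
        else:
            clusters.append([pattern])
    return clusters


def representative(cluster: List[str], embeddings: Dict[str, Vector]) -> str:
    """Medoid: the pattern with the highest total similarity to the other members."""
    return max(
        cluster,
        key=lambda p: sum(cosine(embeddings[p], embeddings[q]) for q in cluster if q != p),
    )


if __name__ == "__main__":
    # Toy, made-up embeddings for four patterns between Person and another class.
    toy = {
        "was born in": [0.9, 0.1, 0.0],
        "is a native of": [0.85, 0.2, 0.0],
        "works for": [0.0, 0.1, 0.95],
        "is employed by": [0.05, 0.0, 0.9],
    }
    for group in cluster_patterns(toy, threshold=0.8):
        print(group, "->", representative(group, toy))
```

With these toy vectors the four patterns fall into two clusters, one about birthplace and one about employment, and each cluster yields one candidate property name. A real system would tune the threshold or use a different clustering algorithm altogether.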
Given a knowledge graph, DOPLEX finds the disjoint object property pairs in its
schema. In addition to the traditional method of checking for common triples, it uses
Probabilistic Soft Logic (PSL) to determine whether the property names imply
disjointness. Our evaluation on knowledge graphs auto-extracted from large text corpora
demonstrates that the proposed approach discovers disjoint property pairs with better
precision than the state-of-the-art system. We have also proposed Temp-DOPLEX, which
attempts to find object property pairs in a schema that are potentially temporally
non-disjoint.
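The "common triples" check mentioned above can be sketched as follows; the PSL component that reasons over property names is not reproduced. The sketch assumes the graph is a list of (subject, property, object) triples and treats two properties as candidate disjoint when no subject-object pair is asserted with both. The function names and the toy triples are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def pairs_by_property(triples: Iterable[Triple]) -> Dict[str, Set[Tuple[str, str]]]:
    """Index every property by the set of (subject, object) pairs it connects."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for s, p, o in triples:
        index[p].add((s, o))
    return index


def candidate_disjoint_pairs(triples: Iterable[Triple]) -> List[Tuple[str, str]]:
    """Property pairs that share no (subject, object) pair in the data."""
    index = pairs_by_property(triples)
    return [
        (p, q)
        for p, q in combinations(sorted(index), 2)
        if not index[p] & index[q]  # empty intersection: no common triple
    ]


if __name__ == "__main__":
    toy = [
        ("alice", "worksFor", "acme"),
        ("alice", "founded", "acme"),
        ("bob", "worksFor", "globex"),
        ("bob", "bornIn", "paris"),
        ("carol", "founded", "initech"),
    ]
    print(candidate_disjoint_pairs(toy))
```

Here `founded` and `worksFor` share the pair (alice, acme), so they are not reported. In an incomplete, auto-extracted graph the mere absence of shared triples is weak evidence of disjointness, which is one reason an additional signal, such as DOPLEX's name-based PSL reasoning, is useful.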
We also made several attempts to address the question of finding the right input
class pairs (or sequence of class pairs) to feed to DARO. We incorporated our findings as
multiple criteria and fed them into TOPSIS (Technique for Order of Preference by
Similarity to Ideal Solution), a standard multi-criteria decision-making method, to find
the class pairs most likely to be connected by object properties. Our experimental
evaluation on three popular knowledge graphs shows that the proposed approach yields
promising class-pair recommendations that can be fed as input to systems such as DARO.
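TOPSIS itself is a standard method: normalise and weight the decision matrix, find the ideal best and ideal worst value of each criterion, and score every alternative by its relative closeness to the ideal best. The abstract does not list the criteria used in the thesis, so the class pairs, the three criteria, their values and weights in this sketch are purely hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def topsis(matrix: Sequence[Sequence[float]], weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Return the TOPSIS closeness coefficient (0..1) of each row (alternative)."""
    n = len(weights)
    # 1. Vector-normalise each criterion column, then apply its weight.
    #    (An all-zero column is left at zero instead of dividing by zero.)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    # 2. Ideal best / worst per criterion: max is best for benefit criteria,
    #    min is best for cost criteria.
    cols = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Euclidean distances to both ideals, then relative closeness to the best.
    scores = []
    for row in weighted:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


def rank_class_pairs(pairs: Sequence[Tuple[str, str]], matrix: Sequence[Sequence[float]],
                     weights: Sequence[float], benefit: Sequence[bool]):
    """Sort (class pair, score) tuples from most to least recommended."""
    scores = topsis(matrix, weights, benefit)
    return sorted(zip(pairs, scores), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical criteria per class pair:
    #   c1 = co-occurrence count in a corpus (benefit)
    #   c2 = number of existing properties linking the classes elsewhere (benefit)
    #   c3 = some difficulty or noise estimate (cost)
    pairs = [("Person", "Organization"), ("Person", "City"), ("Film", "Disease")]
    matrix = [[10, 8, 1], [5, 4, 3], [1, 1, 5]]
    for pair, score in rank_class_pairs(pairs, matrix, [0.4, 0.4, 0.2], [True, True, False]):
        print(pair, round(score, 3))
```

In this toy matrix the first pair is best on every criterion, so its closeness is 1.0, and the last pair is worst on every criterion, so its closeness is 0.0; real criteria would rarely be this clear-cut, which is where the weighting matters.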
Speakers
Ms. Subhashree S, Roll No: CS13D029
Computer Science and Engineering