Neural Language Models as Domain-Specific Knowledge Bases

The fundamental challenge of natural language processing (NLP) is resolving the ambiguity present in the meaning and intent of natural language. Ambiguity occurs at multiple levels of language understanding, from the sense of an individual word to the structure and intent of a whole sentence.
To resolve ambiguity within a text, algorithms use knowledge from the context in which the text appears. For example, the sentence “I visited the zoo.” appearing before the sentence “I saw a bat” can be used to conclude that bat refers to an animal and not a wooden club.
While neighboring text is often sufficient for reducing ambiguity, it typically is not when dealing with text from specialized domains. Processing domain-specific text requires an understanding of a large number of domain-specific concepts and processes that NLP algorithms cannot glean from neighboring text alone.
As an example, in the title insurance and settlement domain, an algorithm may require an understanding of concepts like loan applications (1003s), closing disclosures, deeds, lenders, and notaries.
This is where domain-specific knowledge bases come in.
Domain-specific knowledge bases capture domain knowledge that NLP algorithms need to correctly interpret domain-specific text.
Depending on the use case, knowledge housed in a knowledge base could be of a specific type or of multiple types.
Traditionally, knowledge bases have been modeled as graph-based ontologies and added as one of the components in text processing pipelines. However, with the advent of Transformer-based language models, like BERT and GPT-2, there has been research and evaluation[1][2] on the type and quality of knowledge inherent in a trained language model.
Note: Statistical language models are probability distributions over sequences of words. The probability distributions are used to predict the most likely word(s) at a position in a text sequence, given preceding and/or succeeding words of the sequence.
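As a concrete illustration of this note, the sketch below uses the public gpt2 checkpoint from Hugging Face to print the most likely next words for a prefix. This is a minimal sketch; the prefix text is illustrative, and the public checkpoint stands in for any fine-tuned model.

```python
# A minimal sketch of a statistical language model in action: GPT-2
# produces a probability distribution over the vocabulary for the next
# position, given the preceding words. The prefix is illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("I visited the zoo. I saw a", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the last position gives P(next word | preceding words).
probs = torch.softmax(logits[0, -1], dim=-1)
top = probs.topk(5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}: {prob.item():.3f}")
```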
A favorable result of such an evaluation can make a case for dropping the separate knowledge base component from text processing pipelines, significantly reducing their complexity. Focusing on this idea, this blog post describes the challenges of graph-based knowledge bases and provides an evaluation of the domain knowledge present in language models that have been fine-tuned on text from the title insurance and settlement domain.
To evaluate a knowledge base, one needs to test it on all the types of knowledge that will be required to process text from the domain. Select knowledge types that might be required for domain-specific text processing include is-type-of relationships, entity-attribute relationships, entity-associated-with-action relationships, part-of relationships, and business process understanding.
Depending on the business domain and use case, more types of knowledge may be required in the knowledge base.
To create a knowledge base, one needs to define the knowledge it will contain. Depending on the knowledge base, this is done either entirely by human experts, as with WordNet[3], or by automated algorithms that may or may not build upon human-defined knowledge, as with YAGO[4].
Knowledge, once defined, is modeled in knowledge bases as graphs or ontologies. Concepts like classes and individuals are modeled as nodes, and relations amongst them are modeled as edges of graphs. Classes express concepts like documents and events. Individuals express instances of classes, for example 1003s, closing disclosures, and deeds all being instances of class documents. Edges capture relationships between classes and individuals. Examples of relationships that edges capture are is-type-of, is-instance-of, and has-attribute. In the example below, a directed edge between 1003 and Document is used to capture the knowledge that a 1003 is a type of Document.
The ontology or graph itself is represented using knowledge representation languages like RDF, RDFS, and OWL, and stored in formats like XML.
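As a minimal sketch of this representation, the snippet below uses the rdflib Python library to encode the “1003 is a type of Document” edge from the example above and serialize the graph in the XML form just mentioned. The namespace URL and the has-attribute edge are illustrative placeholders.

```python
# A minimal sketch using rdflib: encode "1003 is a type of Document" as a
# triple and serialize the graph as RDF/XML. The namespace is a placeholder.
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/title-ontology/")  # placeholder namespace

g = Graph()
g.add((EX.Document, RDF.type, RDFS.Class))           # node: the Document class
g.add((EX["1003"], RDF.type, EX.Document))           # edge: 1003 is a type of Document
g.add((EX["1003"], EX.hasAttribute, EX.LoanAmount))  # illustrative has-attribute edge

print(g.serialize(format="xml"))  # the stored XML form mentioned above
```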
There are many popular knowledge bases, like YAGO and ConceptNet, that model a variety of relationships this way. They often pick up entities to model from other knowledge bases like Wikipedia and WordNet.
Once a knowledge base is ready, different NLP models use different ways to incorporate it into their pipelines. Options include injecting relation knowledge into the model through multi-task learning[5] and combining the language model with knowledge-aware graph networks[6].
These methods are non-trivial and require significant effort to design and incorporate into both the training and application phases.
Based on the above, it is clear that knowledge bases, in the form of graphs and ontologies, are costly in terms of time, effort, and money, especially when human experts are involved.
Since creating and incorporating ontology-based knowledge bases is costly, a better alternative is always welcome.
In recent years, Transformer-based language models like BERT and GPT-2 have dominated the leaderboards across NLP competitions, tasks, and benchmarks. Impressively, they have achieved this with little task-specific engineering beyond fine-tuning on the target data.
Given such success and ease of use, it would be ideal if a fine-tuned version of these models incorporated domain knowledge within them. If so, one could drop the separate knowledge base component from text processing pipelines and avoid the cost of building and maintaining graph-based ontologies.
To evaluate this, we fine-tuned a roberta-base and a gpt2-medium model (both from Hugging Face) on internal company data and explored the knowledge captured in them. The methodology used to evaluate the models was:
1. Decide on a type of knowledge that needs to be evaluated. Example: is-type-of relationships.
2. Generate specific instances of the knowledge type to test for the presence of that knowledge. Example: 1003 is a type of document.
3. Modify the knowledge instance by either masking the word or adding an additional fill-in-the-blank kind of question at the end of the instance. Example: 1003 is a type of _______.
4. Ask the model to fill in the blank and evaluate the answer (a minimal sketch of this probe is shown below).
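Below is a minimal sketch of steps 3 and 4 using the Hugging Face fill-mask pipeline. The public roberta-base checkpoint stands in for the internal fine-tuned model, which is not public.

```python
# A minimal sketch of the fill-in-the-blank probe (steps 3 and 4) using the
# Hugging Face fill-mask pipeline. The public roberta-base checkpoint stands
# in for the internal fine-tuned model.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# Step 3: replace the word being probed with the model's mask token.
prompt = "1003 is a type of <mask>."

# Step 4: ask the model to fill in the blank and inspect the top candidates.
for candidate in fill_mask(prompt, top_k=5):
    print(f"{candidate['token_str'].strip():>12}  score={candidate['score']:.3f}")
```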
Note that there are other methodologies that can be used, depending on the type of knowledge that needs to be tested.
Is-type-of relationship:
4. Pre-Fine Tuning: Citibank is a type of bank.
5. Post-Fine Tuning: Citibank is a type of lender.
Note that Citibank was not present in the dataset at all. It seems the model learned that banks play the role of lenders in the title insurance and settlement business, and then applied that learning to banks it knew from its pre-training.
Entity-attribute relationship:
6. Post-Fine Tuning: Notary’s _____ is _____
Top values suggested for the first blank: name, information, email, contact, info, address, confirmation, schedule, number
Entity-associated-with-action relationship:
7. ________ needs to be changed
Top values suggested for the blank: password, title, this, nothing, CD, amount, fee, it, name
While there were a few examples where the model’s results were less convincing than those above, for most examples they were comparable. It seems the language model nicely captures is-type-of, entity-attribute, and entity-associated-with-action relationships.
Part-of relationship:
8. Lender package is made up of pages.
Other top answers were: two, documents, trust, boxes
For the above example, while answers like pages and documents are not wrong, they are not very useful either. Answers like closing disclosure, 1003, or signature affidavit would have been more informative. Similarly, in example 7, answers like nothing and this are not informative. These non-useful answers appeared across various evaluations and could roughly be classified into the following categories: pronouns, interrogative words like ‘what’, punctuation, and words repeated with case variations (a simple post-filter for these categories is sketched below).
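As an illustration of how such outputs might be screened, here is a sketch that post-filters candidate answers for the categories above. The stop lists are illustrative, not the ones used in our evaluation.

```python
# A sketch of post-filtering fill-in answers to drop the non-useful
# categories observed above. The stop lists are illustrative, not exhaustive.
PRONOUNS = {"it", "this", "that", "nothing", "something"}
INTERROGATIVES = {"what", "which", "who", "when", "where", "how"}

def useful_answers(candidates):
    """Drop pronouns, interrogative words, punctuation-only tokens,
    and case-variant repeats; keep everything else, in order."""
    seen, kept = set(), []
    for word in candidates:
        token = word.strip()
        key = token.lower()
        if not any(ch.isalnum() for ch in token):  # punctuation-only token
            continue
        if key in PRONOUNS or key in INTERROGATIVES:
            continue
        if key in seen:                            # case-variant repeat
            continue
        seen.add(key)
        kept.append(token)
    return kept

print(useful_answers(["pages", "Pages", "this", "what", ".", "documents"]))
# -> ['pages', 'documents']
```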
Business process understanding:
[Figure: business process probes, with example 9 from the fine-tuned RoBERTa model and examples 10, 11, and 12 from the fine-tuned gpt2-medium model.]
Note that in examples 9, 10, and 12, when provided with a good augmentation, the model was able to deliver a sensible answer. In contrast, in example 11, where the augmentation wasn’t typical, the model did not generate an acceptable answer. This may turn out to be an issue if downstream tasks do not have an easy way to provide the required text augmentation.
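For these generation-style probes, a minimal sketch with the text-generation pipeline is shown below. The prompt is an illustrative augmentation, and the public gpt2-medium checkpoint stands in for the fine-tuned one.

```python
# A minimal sketch of a generation-style probe: an augmented prompt is fed
# to gpt2-medium and the continuation is inspected. The prompt is
# illustrative, and the public checkpoint stands in for the fine-tuned one.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-medium")

prompt = "After the borrower signs the closing documents, the next step is"
result = generator(prompt, max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])
```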
While this blog primarily focused on evaluating language models’ built-in knowledge bases, if it is required and not cost-prohibitive, one may want to consider an approach similar to COMeT[7]. COMeT fine-tunes a pre-trained language model on an existing knowledge base to predict known and new knowledge base relationships.
Training on an existing knowledge base enriches the model with wider and more accurate knowledge than what was contained in either the original knowledge base or the pre-trained language model. While a discussion of COMeT’s approach is out of scope for this blog, an online demo of COMeT trained on ATOMIC[8] and ConceptNet[9] is available online.
Based on the above analysis, we believe that language models, once fine-tuned, do hold rich domain knowledge within them. However, there are challenges in making direct use of that knowledge, such as choosing the right augmentation text and identifying the more informative model outputs.
References
[1] Petroni, Fabio, et al. “Language Models as Knowledge Bases?” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
[2] Davison, Joe, Joshua Feldman, and Alexander M. Rush. “Commonsense knowledge mining from pretrained models.” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
[3] Princeton University. “About WordNet.” WordNet. Princeton University. 2010.
[4] Suchanek, Fabian M., Gjergji Kasneci, and Gerhard Weikum. “Yago: a core of semantic knowledge.” Proceedings of the 16th International Conference on World Wide Web. 2007.
[5] Xia, Jiangnan, Chen Wu, and Ming Yan. “Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning.” Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019.
[6] Lin, Bill Yuchen, et al. “Kagnet: Knowledge-aware graph networks for commonsense reasoning.” arXiv preprint arXiv:1909.02151. 2019.
[7] Bosselut, Antoine, et al. “COMET: Commonsense Transformers for Knowledge Graph Construction.” Association for Computational Linguistics (ACL). 2019.
[8] Sap, Maarten, et al. “Atomic: An atlas of machine commonsense for if-then reasoning.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.
[9] Speer, Robyn, Joshua Chin, and Catherine Havasi. “ConceptNet 5.5: an open multilingual graph of general knowledge.” Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. 2017.