CN115221313A

CN115221313A - Knowledge entity identification method and knowledge entity identification device

Info

Publication number: CN115221313A
Application number: CN202110410253.6A
Authority: CN
Inventors: 曾俋颖; 邱德旺
Original assignee: Delta Electronics Inc
Current assignee: Delta Electronics Inc
Priority date: 2021-04-16
Filing date: 2021-04-16
Publication date: 2022-10-21

Abstract

A knowledge entity identification method comprising the steps of: receiving a target text to be analyzed and metadata, wherein the target text comprises candidate words; comparing the candidate words in the knowledge base to obtain a plurality of entity names related to the candidate words from the knowledge base, wherein each entity name has corresponding entity description data; comparing the entity description data in the knowledge base with the metadata to obtain a comparison result; and setting the entity name related to the candidate word in the knowledge base as the output classification of the candidate word in the target text according to the comparison result. The disclosure also relates to a knowledge entity recognition device.

Description

Knowledge entity identification method and knowledge entity identification device

Technical Field

The present disclosure relates to electronic devices and methods thereof, and more particularly, to a knowledge entity recognition device and method.

Background

In conventional knowledge management, experts manually label the data of every file one by one. With advances in technology, current data-labeling methods can analyze syntax and semantics through natural language techniques; however, such corpus analysis cannot make a machine understand new words, which still have to be labeled by experts. The existing training process using labeled data is tedious and inflexible: it is difficult to train a knowledge management system for a new field from the data of an established one, and building knowledge management systems for different fields incurs a high training cost.

In view of the above, a knowledge management system is a tool with considerable management ability, but an efficient way of building one is still lacking, and the accuracy of such systems still has room for improvement. Accordingly, how to build the system efficiently while providing high-precision knowledge management is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

According to an embodiment of the present disclosure, a knowledge entity identification method is disclosed, which includes the steps of: receiving a target text to be analyzed and metadata, wherein the target text comprises candidate words; comparing the candidate words in the knowledge base to obtain a plurality of entity names related to the candidate words from the knowledge base, wherein each entity name has corresponding entity description data; comparing the entity description data in the knowledge base with the metadata to obtain a comparison result; and setting the entity name associated with the candidate word in the knowledge base as the output classification of the candidate word in the target text according to the comparison result.

According to another embodiment, a knowledge entity identification apparatus is disclosed that includes a knowledge entity candidate generation module, a knowledge entity verification and enhancement module, and a knowledge entity classification module. The knowledge entity candidate generation module is configured to receive a target text to be parsed and metadata, and compare candidate words of the target text in a knowledge base to obtain a plurality of entity names associated with the candidate words from the knowledge base, wherein each entity name has corresponding entity description data. The knowledge entity verification and enhancement module is coupled to the knowledge entity candidate generation module and is configured to compare the entity description data in the knowledge base with the metadata to obtain a comparison result. The knowledge entity classification module is coupled to the knowledge entity verification and enhancement module and is configured to set an entity name associated with the candidate word in the knowledge base as an output classification of the candidate word in the target text according to the comparison result.

Drawings

The following detailed description will facilitate a better understanding of embodiments of the disclosure when read in conjunction with the accompanying drawings. It should be noted that the features of the drawings are not necessarily drawn to scale in accordance with the requirements of an illustrative implementation. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.

FIG. 1 shows a block diagram of a knowledge entity identification apparatus in an embodiment according to the present disclosure.

FIG. 2 shows a flow diagram of a knowledge entity identification method in an embodiment in accordance with the disclosure.

FIG. 3 shows a flow diagram of a knowledge entity identification method in accordance with an embodiment of the present disclosure.

Description of the reference numerals:

100: knowledge entity recognition device

102: input data

104: output data

112: knowledge entity candidate generation module

114: knowledge entity verification and enhancement module

116: knowledge entity classification module

200, 300: knowledge entity identification method

500: knowledge base

502: parsing and storage module

600: external general knowledge base

S210 to S240, S310 to S340: steps

Detailed Description

The following disclosure provides many different embodiments for implementing different features of the disclosure. Embodiments of the elements and arrangements are described below to simplify the present disclosure. Of course, these embodiments are merely exemplary and not intended to be limiting. For example, the terms "first", "second", etc. are used in this disclosure to describe elements, but are used only to distinguish the same or similar elements or operations, and the terms are not used to limit the technical elements of the disclosure, nor to limit the order or sequence of operations.

Referring to FIG. 1, a block diagram of a knowledge entity identification apparatus 100 according to an embodiment of the disclosure is shown. The knowledge entity identification apparatus 100 recognizes target objects in input data 102 and provides recognized output data 104. For example, the knowledge entity identification apparatus 100 parses an input text, sentence, or paragraph to perform named entity recognition (NER). In one embodiment, the input data 102 received by the knowledge entity identification apparatus 100 includes a target text and metadata (domain metadata). The target text is the data to be parsed. The metadata is data that assists the analysis of the target text, and may consist of categories and their keywords defined by the user in advance.

In one embodiment, the knowledge entity identification apparatus 100 is coupled to the knowledge base 500, and the knowledge base 500 is coupled to an external general knowledge base 600. The external general knowledge base 600 is a database with varied formats and domain contents, such as Wikipedia, specialized dictionaries, and domain expert knowledge. The knowledge base 500 may be a database that stores knowledge data built from internally defined knowledge and/or from data of the external general knowledge base 600. For example, the knowledge base 500 is provided with a parsing and storage module 502. The parsing and storage module 502 may read the data in the external general knowledge base 600 and convert it into a data structure with a specific format, for example by normalizing the external data and the domain expert knowledge data, so that the data stored in the knowledge base 500 can be provided to the knowledge entity identification apparatus 100 for recognizing the target text.

In one embodiment, the knowledge entity identification apparatus 100 comprises a knowledge entity candidate generation module 112, a knowledge entity verification and enhancement module 114, and a knowledge entity classification module 116. The knowledge entity candidate generation module 112 is electrically coupled to the knowledge entity verification and enhancement module 114. The knowledge entity verification and enhancement module 114 is electrically coupled to the knowledge entity classification module 116. For the purpose of facilitating an understanding of the present disclosure, reference is made to fig. 1 and 2 in conjunction with the following description. FIG. 2 shows a flow diagram of a knowledge entity identification method 200 in accordance with an embodiment of the present disclosure. The knowledge entity identification method 200 may be performed by the knowledge entity identification apparatus 100 of FIG. 1.

In step S210, the knowledge entity candidate generation module 112 receives the target text to be parsed and the metadata.

In one embodiment, the target text to be parsed is text data to be analyzed and includes one or more sentences or paragraphs. The metadata includes a plurality of categories (keys), each category including a plurality of keywords (values). The user may define all categories of the metadata and the keywords of each category in advance, and input them to the knowledge entity identification apparatus 100 together with the target text. For ease of description, the following uses the sentence "An apple a day keeps the doctor away" as an example target text, with the metadata shown in Table one. It should be noted that the present disclosure is not limited by this example.

Table one: metadata

Category (key)	Keyword (value)
FRUIT	fruit, juicy, tree, …
MEAT	animal, hunt, …
DESSERT	sugar, sweet, …

In one embodiment, the knowledge entity candidate generation module 112 performs natural language processing to extract the nouns or noun phrases of the target text. The extracted nouns or noun phrases are used as candidate words of the target text. Continuing the above example of the target text "An apple a day keeps the doctor away", the candidate words extracted from the target text include "apple", "day", and "doctor". The number of candidate words may vary depending on the content of the target text. In one embodiment, the target text includes one or more candidate words; in this example, the number of candidate words is 3.

In step S220, the knowledge entity candidate generating module 112 compares the candidate words of the target text with the knowledge base 500 to obtain a plurality of entity names associated with the candidate words from the knowledge base 500.

In one embodiment, the knowledge entity verification and enhancement module 114 compares the candidate words one by one in the knowledge base 500. The knowledge base 500 records a plurality of entity data. The data structure of each entity data includes, but is not limited to, a number, an entity name, an entity description, an entity type, and the like, as shown in table two.

Table two: knowledge base

In the above example, when the candidate word is "apple", the knowledge entity verification and enhancement module 114 compares the candidate word "apple" against the knowledge base 500 of Table two to obtain a plurality of entity names associated with "apple", such as "Apple Inc." with number 0, "Apple" with number 1, "Pineapple" with number 2, and "Apple, Oklahoma" with number 3. In one embodiment, the obtained entity names numbered 0 to 3 may be recorded in a candidate list of the candidate word "apple". On the other hand, since the entity name "Orange" with number N is neither identical nor similar to the candidate word "apple", it is not recorded in the candidate list of the candidate word "apple".

In an embodiment, the information retrieval method for searching and comparing the candidate words in the knowledge base 500 may be a term frequency-inverse document frequency (tf-idf) method or other data exploration/term frequency statistical methods, but the disclosure is not limited thereto.
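The lookup of step S220 can be sketched as follows. This is only an illustration under stated assumptions: the `Entity` record mirrors the fields listed for the knowledge base (number, entity name, entity description, entity type), the function name `candidate_list` is hypothetical, and a case-insensitive substring test stands in for the tf-idf or other retrieval method, which the disclosure does not specify in detail. Descriptions and types are left empty because they are not needed for this step, and number 4 stands in for the entry numbered N.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One knowledge-base record: number, name, description, type."""
    number: int
    name: str
    description: str
    entity_type: str


# Only the entity names come from the example; the other fields are
# intentionally empty placeholders.
KNOWLEDGE_BASE = [
    Entity(0, "Apple Inc.", "", ""),
    Entity(1, "Apple", "", ""),
    Entity(2, "Pineapple", "", ""),
    Entity(3, "Apple, Oklahoma", "", ""),
    Entity(4, "Orange", "", ""),  # stands in for the entry numbered N
]


def candidate_list(candidate: str, knowledge_base: list[Entity]) -> list[Entity]:
    """Return entities whose name contains the candidate word.

    Assumption: a case-insensitive substring test is used here as a
    simple stand-in for the retrieval method.
    """
    needle = candidate.lower()
    return [e for e in knowledge_base if needle in e.name.lower()]


matches = candidate_list("apple", KNOWLEDGE_BASE)
print([e.number for e in matches])  # [0, 1, 2, 3]
```

With this stand-in, "Orange" is excluded because its name does not contain the candidate word, which agrees with the candidate list described above.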

In step S230, the knowledge entity verification and enhancement module 114 compares the entity description data and the metadata in the knowledge base 500 to obtain a comparison result.

In one embodiment, the entity description data is searched for matching words to obtain additional descriptive content, which serves as enhancement information for the candidate word and is used by the subsequent knowledge entity classification module 116.

In the above example, the candidate list of the candidate word "apple" records 4 entity names: "Apple Inc.", "Apple", "Pineapple", and "Apple, Oklahoma". The entity description data corresponding to each entity name in the candidate list is then searched one by one according to the metadata received in step S210. Take the metadata category "FRUIT" and its keywords "fruit, juicy, tree" as an example (as shown in Table one). The knowledge entity verification and enhancement module 114 compares the keyword "fruit" with the entity description data "An apple is an edible fruit produced by an apple tree (Malus domestica). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus", determines whether any word matches "fruit", and increments the count by 1 for each matching word. In this embodiment, the category "FRUIT" has three keywords, and the same search and matching are performed for each of them to obtain the total number of matches of the category. For example, the keyword "fruit" of the category "FRUIT" obtains 1 match in the entity description data corresponding to the entity name "Apple"; the keyword "juicy" obtains 0 matches; and the keyword "tree" obtains 2 matches ("tree" and "trees"). Thus, the total number of matches of the category "FRUIT" with respect to the entity name "Apple" is 1 + 0 + 2 = 3.

By analogy, the total number of matches of the keywords "animal, hunt" of the category "MEAT" in the entity description data corresponding to the entity name "Apple" is 0, and the total number of matches of the keywords "sugar, sweet" of the category "DESSERT" is also 0. As can be seen, among the three categories of metadata input in step S210, the category "FRUIT" has the largest total number of matches. Therefore, the category "FRUIT" of the metadata is the comparison result of the target text. Meanwhile, the entity name "Apple", which is most associated with the category "FRUIT", is set as the most associated entity name.
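The counting in this example can be reproduced with a short sketch. It rests on several assumptions: the description string is a cleaned-up reading of the example description for "Apple", only the keywords explicitly listed in Table one are used (the elided ones are left out), and "same or similar" is interpreted as an exact word or its simple plural. The names `count_matches` and `category_totals` are hypothetical.

```python
import re

# Keywords explicitly listed in Table one (elided entries are omitted).
METADATA = {
    "FRUIT": ["fruit", "juicy", "tree"],
    "MEAT": ["animal", "hunt"],
    "DESSERT": ["sugar", "sweet"],
}

# Assumed reading of the example description for the entity "Apple".
APPLE_DESCRIPTION = (
    "An apple is an edible fruit produced by an apple tree (Malus domestica). "
    "Apple trees are cultivated worldwide and are the most widely grown "
    "species in the genus Malus."
)


def count_matches(keyword: str, description: str) -> int:
    """Count description words equal to the keyword or its simple plural."""
    words = re.findall(r"[a-z]+", description.lower())
    kw = keyword.lower()
    return sum(1 for w in words if w in (kw, kw + "s"))


def category_totals(metadata: dict, description: str) -> dict:
    """Sum the keyword matches of each category for one description."""
    return {
        category: sum(count_matches(k, description) for k in keywords)
        for category, keywords in metadata.items()
    }


totals = category_totals(METADATA, APPLE_DESCRIPTION)
best_category = max(totals, key=totals.get)
print(totals)         # {'FRUIT': 3, 'MEAT': 0, 'DESSERT': 0}
print(best_category)  # FRUIT
```

The plural rule is what lets "tree" match both "tree" and "trees"; a real system would more likely rely on stemming or lemmatization.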

In one embodiment, the metadata may be compared with each entity description data by a similarity comparison method such as cosine similarity. The metadata is used to search the entity description data in the knowledge base 500, and the entity name closest to the metadata is selected by the similarity comparison.
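As an illustration of the cosine-similarity variant, the sketch below compares a bag-of-words vector of one category's keywords with bag-of-words vectors of entity descriptions. This is one possible reading, not the method of the disclosure: the vectorization by raw word counts is an assumption, the function names are hypothetical, and the description given for "Apple Inc." is a placeholder written for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


fruit_keywords = bag_of_words("fruit juicy tree")

descriptions = {
    # Assumed reading of the example description.
    "Apple": (
        "An apple is an edible fruit produced by an apple tree (Malus domestica). "
        "Apple trees are cultivated worldwide and are the most widely grown "
        "species in the genus Malus."
    ),
    # Hypothetical placeholder, not taken from the knowledge base.
    "Apple Inc.": "A technology company that designs consumer electronics and software.",
}

scores = {
    name: cosine_similarity(fruit_keywords, bag_of_words(text))
    for name, text in descriptions.items()
}
closest = max(scores, key=scores.get)
print(closest)  # Apple
```

Unlike the plural-tolerant counting of the worked example, exact word counts do not treat "trees" as a match for "tree", so the two approaches can score the same description differently.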

In step S240, the knowledge entity classification module 116 sets the entity name associated with the candidate word in the knowledge base 500 as the output classification of the candidate word in the target text according to the comparison result.

In the above example, the comparison result for the category of the candidate word in the target text is "FRUIT". The knowledge entity classification module 116 further compares the comparison result "FRUIT" with the entity type (i.e., "Fruits; Malus; Plants") corresponding to the most associated entity name (i.e., "Apple") in the knowledge base 500. Since the word "Fruits", which matches the comparison result "FRUIT", can be found in the entity type, the comparison result "FRUIT" is verified as the output classification of the candidate word in the target text.
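The verification in step S240 can be sketched as a tolerant comparison between the category and the terms of the entity type. The sketch assumes that the entity type is a semicolon-separated string reading "Fruits; Malus; Plants" and that matching ignores case and a trailing plural "s"; both the matching rule and the function names are hypothetical.

```python
def normalize(term: str) -> str:
    """Lowercase and drop one trailing 's'.

    This is a deliberately crude plural rule: it also shortens words
    such as "Malus", which is harmless for this comparison.
    """
    term = term.strip().lower()
    return term[:-1] if term.endswith("s") else term


def verify_classification(category: str, entity_type: str) -> bool:
    """Check whether the category appears among the entity-type terms."""
    type_terms = {normalize(t) for t in entity_type.split(";")}
    return normalize(category) in type_terms


APPLE_ENTITY_TYPE = "Fruits; Malus; Plants"  # assumed reading of the example

print(verify_classification("FRUIT", APPLE_ENTITY_TYPE))  # True
print(verify_classification("MEAT", APPLE_ENTITY_TYPE))   # False
```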

In an embodiment, the data enhancement result of the candidate word obtained in step S230 and the category defined in advance by the metadata of the user and the keyword thereof may be input to a word classification model (not shown in fig. 1) for classification, so as to determine that the candidate word is the knowledge entity of the target text, and classify the knowledge entity into the corresponding category to obtain the final knowledge entity and the category to which the knowledge entity belongs.

Please refer to fig. 1 and fig. 3 together. FIG. 3 shows a flow diagram of a knowledge entity identification method 300 in accordance with an embodiment of the present disclosure. The knowledge entity identification method 300 may be performed by the knowledge entity identification apparatus 100 of FIG. 1.

In step S310, the knowledge entity verification and enhancement module 114 performs comparison in the knowledge base 500 using the candidate words of the target text, and obtains a plurality of entity names in a sequence according to the similarity.

In one embodiment, the candidate words of the target text may be extracted by the knowledge entity candidate generation module 112 using natural language processing techniques. Continuing the above example of the target text "An apple a day keeps the doctor away", the candidate word "apple" is compared with all entity names in the knowledge base 500 of Table two. The entity name in the knowledge base 500 that is most similar to the candidate word "apple" has the highest rank. The sorted entity names are obtained by ordering the entity names from highest to lowest similarity. The sorted entity names are shown in Table three: the entity name with number 1 is first, the entity name with number 0 is second, and so on. After the similarity comparison, the 4 sorted entity names selected from the knowledge base 500 are those identical or similar to the candidate word.
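A possible way to produce such an ordering is sketched below. The disclosure does not state how the similarity is computed, so the sort key used here (exact match first, then names that start with the candidate word, then shorter names) is purely a hypothetical heuristic, as are the function names. It reproduces the first two positions described above; the order of the remaining names is a consequence of the heuristic and is not taken from the disclosure.

```python
def similarity_key(candidate: str, name: str) -> tuple:
    """Hypothetical ranking key; larger tuples mean more similar names."""
    c, n = candidate.lower(), name.lower()
    return (n == c, n.startswith(c), -len(n))


def rank_entity_names(candidate: str, names: list[str]) -> list[str]:
    """Sort entity names from most to least similar to the candidate."""
    return sorted(names, key=lambda n: similarity_key(candidate, n), reverse=True)


names = ["Apple Inc.", "Apple", "Pineapple", "Apple, Oklahoma"]
ranked = rank_entity_names("apple", names)
print(ranked)
# ['Apple', 'Apple Inc.', 'Apple, Oklahoma', 'Pineapple']
```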

Table three:

In step S320, the knowledge entity verification and enhancement module 114 compares the keywords of each category of the metadata with the words in the entity description data corresponding to the sorted entity names to obtain a comparison result. In some embodiments, this comparison counts, for each category, the words in the entity description data that are the same as or similar to the keywords of that category, so that each category has a corresponding word matching number.

In one embodiment, the metadata includes a plurality of categories, each including a plurality of keywords. For example, the metadata includes a first category "FRUIT", a second category "MEAT", and a third category "DESSERT". The first category "FRUIT" includes the keywords "fruit", "juicy", and "tree". The second category "MEAT" includes the keywords "animal" and "hunt". The third category "DESSERT" includes the keywords "sugar" and "sweet".

In one embodiment, the keyword "fruit" is compared with the first sorted entity description data "An apple is an edible fruit produced by an apple tree (Malus domestica). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus" to obtain 1 matching word. Similarly, the keywords "juicy" and "tree" are compared with the first entity description data to obtain 0 and 2 matching words, respectively. In other words, the first category "FRUIT" has a total of 3 matching words associated with the first entity name. By analogy, the matching words of the second category "MEAT" associated with the first entity name total 0. The total number of matching words between the keywords of each category and the entity description data of the first entity name "Apple" is shown in Table four.

Table four:

Metadata	Word matching number in the first entity description data
First category "FRUIT"	3
Second category "MEAT"	0
Third category "DESSERT"	0

In step S330, the knowledge entity verification and enhancement module 114 sets the category with the largest word matching number as the output category of the candidate word in the target text.

Following the example above, the first category has the largest number of word matches (i.e., 3), and thus the first category "FRUIT" will be set as the output classification for the candidate word in the target text.

It should be noted that, in steps S320 and S330, the word matching numbers of the first, second, and third categories of the metadata are likewise calculated for the sorted second, third, and fourth entity names. In other words, all categories of the metadata are matched against each of the sorted entity names to obtain the word matching number of every category for each entity name. For brevity, the description of these matching steps is not repeated here.
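Extending the count to every sorted entity name gives a matrix of word matching numbers, one row per entity name and one column per category. The sketch below is illustrative only: the description for "Apple" is an assumed reading of the example, the descriptions of the other three entities are hypothetical placeholders (the contents of Table two are not reproduced in this text), and the names `word_matches` and `match_matrix` are made up for the example.

```python
import re

METADATA = {
    "FRUIT": ["fruit", "juicy", "tree"],
    "MEAT": ["animal", "hunt"],
    "DESSERT": ["sugar", "sweet"],
}

DESCRIPTIONS = {
    # Assumed reading of the example description.
    "Apple": (
        "An apple is an edible fruit produced by an apple tree (Malus domestica). "
        "Apple trees are cultivated worldwide and are the most widely grown "
        "species in the genus Malus."
    ),
    # Hypothetical placeholders, not taken from the knowledge base.
    "Apple Inc.": "A technology company that designs consumer electronics and software.",
    "Apple, Oklahoma": "A community in the state of Oklahoma.",
    "Pineapple": "A tropical plant with an edible fruit.",
}


def word_matches(keywords: list[str], description: str) -> int:
    """Count words equal to any keyword or to its simple plural."""
    words = re.findall(r"[a-z]+", description.lower())
    accepted = {k.lower() for k in keywords} | {k.lower() + "s" for k in keywords}
    return sum(1 for w in words if w in accepted)


# One row per entity name, one column per metadata category.
match_matrix = {
    name: {cat: word_matches(kws, text) for cat, kws in METADATA.items()}
    for name, text in DESCRIPTIONS.items()
}

for name, row in match_matrix.items():
    print(name, row)
```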

In step S340, the knowledge entity verification and enhancement module 114 compares the output classification with the entity type corresponding to the sorted entity name in the knowledge base 500 to verify whether the output classification of the candidate word in the target text is correct.

Continuing the example above, the first category "FRUIT" has the largest word matching number, so the output classification of the candidate word in the target text is set to "FRUIT". In step S340, to verify whether the output classification is correct, the output classification "FRUIT" is further compared with the first entity type. As shown in Table three, the first entity type includes "Fruits", "Malus", and "Plants". Since the first entity type "Fruits" matches the output classification "FRUIT", the output classification is verified as a correct result.

In one embodiment, the knowledge entity identification apparatus 100 may be implemented as, but is not limited to, a portable electronic device, a mobile phone, a tablet computer, a personal digital assistant (PDA), a wearable device, or a notebook computer.

In one embodiment, the knowledge entity identification apparatus 100 includes at least a processor (not shown in FIG. 1), a storage medium (not shown in FIG. 1), and an input/output interface (not shown in FIG. 1). The processor is configured to operate and control the knowledge entity candidate generation module 112, the knowledge entity verification and enhancement module 114, and the knowledge entity classification module 116. The storage medium stores a plurality of program instructions and the temporary data generated while the instructions are executed. The input/output interface is coupled to the processor and is used to receive the input data 102 and send the output data 104.

The processor may be implemented as, but not limited to, a Central Processing Unit (CPU), a System on Chip (SoC), an application processor, an audio processor, a Digital Signal Processor (DSP), or a function specific processing Chip or controller.

The storage medium may be implemented as, but not limited to, a Random Access Memory (RAM) or a nonvolatile Memory (e.g., a Flash Memory, a Read Only Memory (ROM), a Hard Disk Drive (HDD), a Solid State Drive (SSD), an optical Memory, or the like).

In one embodiment, the word classification model may be an artificial intelligence model and may be built from a plurality of sub-algorithms, including artificial neural networks (ANN) and supervised learning in machine learning, wherein the supervised learning includes algorithms such as support vector machines (SVM), regression analysis, and statistical classification.

In one embodiment, the present disclosure provides a non-transitory computer readable recording medium storing a plurality of program codes. After the program code is loaded into the processor of the knowledge entity recognition device 100 shown in fig. 1, the processor executes the program code and performs the steps shown in fig. 2 and fig. 3.

Compared with the prior art, for the same number of knowledge entities to be analyzed, the knowledge entity identification method and apparatus of the present disclosure can identify more knowledge entities, achieving a high recall rate. For the same number of identified knowledge entities, they yield more correct knowledge entities, achieving high precision.
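For reference, recall and precision in their standard sense can be computed as in the sketch below. The entity sets are hypothetical and serve only to illustrate the two ratios; they are not results reported in the disclosure.

```python
def precision_recall(predicted: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for a set of predicted entities.

    precision = correct predictions / all predictions
    recall    = correct predictions / all entities that should be found
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: entities a system reported vs. entities an expert marked.
predicted_entities = {"apple", "day"}
expert_entities = {"apple", "doctor"}

precision, recall = precision_recall(predicted_entities, expert_entities)
print(precision, recall)  # 0.5 0.5
```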

In summary, according to the present disclosure, metadata is input together with the target text to be labeled, and after an entity name is found in the knowledge base, the entity description data of that entity name is further retrieved for verification, which can improve the accuracy of classifying knowledge entities in the target text. In addition, the knowledge entity identification apparatus and method of the present disclosure can be applied to labeling large numbers of documents. When the documents to be labeled move to a different field, the field can be switched simply by switching to the corresponding knowledge base. As for extension, new vocabulary only needs to be added to the knowledge base. Furthermore, the method reduces the cost of manual labeling and the burden on experts, saves a large amount of manual labeling work, and enables various subsequent applications (for example, automatically labeling input articles with their categories and keywords).

The foregoing outlines features of several embodiments so that those skilled in the art may better understand the implementations of the present disclosure. Those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein without departing from the spirit and scope of the present disclosure. The foregoing is to be understood as illustrative of the present disclosure, and the scope of protection is to be determined by the claims.

Claims

1. A knowledge entity identification method, comprising:

receiving a target text to be analyzed and metadata, wherein the target text comprises a candidate word;

comparing the candidate word in a knowledge base to obtain a plurality of entity names related to the candidate word from the knowledge base, wherein each entity name has corresponding entity description data;

comparing the entity description data in the knowledge base with the metadata to obtain a comparison result; and

setting the entity name related to the candidate word in the knowledge base as an output classification of the candidate word in the target text according to the comparison result.

2. The method of claim 1, wherein the metadata comprises a plurality of categories, each category comprising a plurality of keyword terms, wherein the method comprises:

comparing the key words of each category of the metadata with the words in the entity description data corresponding to the entity name to obtain the comparison result.

3. The knowledge entity identification method of claim 2, further comprising:

comparing the candidate words of the target text in the knowledge base, and obtaining the sorted entity names based on the similarity of comparison;

comparing the key words of each category of the metadata with the words in the entity description data corresponding to the sorted entity names to calculate a word matching number of the key words in the entity description data which is the same as or similar to the key words of each category, so that each category has the corresponding word matching number; and

setting the category with the largest word matching number as the output classification of the candidate word in the target text.

4. The knowledge entity identification method of claim 3, further comprising:

comparing the output classification with an entity type corresponding to the entity name in the knowledge base to verify whether the output classification of the candidate word in the target text is correct.

5. The knowledge entity identification method of claim 2, further comprising:

comparing the candidate words of the target text in the knowledge base, and obtaining the sorted entity names based on the similarity;

comparing the words in the entity description data of the sorted entity names according to each key word in the categories to respectively obtain a matching quantity, wherein the sum of the matching quantities of each category is the word matching quantity of the corresponding category; and

taking the category corresponding to the maximum word matching number as the output classification.

6. A knowledge entity recognition apparatus comprising:

a knowledge entity candidate generation module configured to receive a target text and metadata to be parsed and compare the candidate words of the target text in a knowledge base to obtain a plurality of entity names associated with the candidate words from the knowledge base, wherein each entity name has a corresponding entity description data;

a knowledge entity verification and enhancement module coupled to the knowledge entity candidate generation module, wherein the knowledge entity verification and enhancement module is configured to compare the entity description data and the metadata in the knowledge base to obtain a comparison result; and

a knowledge entity classification module, coupled to the knowledge entity verification and enhancement module, wherein the knowledge entity classification module is configured to set the entity name associated with the candidate word in the knowledge base as an output classification of the candidate word in the target text according to the comparison result.

7. The apparatus of claim 6, wherein the metadata comprises a plurality of categories, each category comprises a plurality of key words, wherein the knowledge entity candidate generation module is further configured to perform a comparison in the knowledge base using the candidate words of the target text, obtain an entity name, and compare the key words of each category of the metadata with the words in the entity description data corresponding to the entity name to obtain the comparison result.

8. The apparatus of claim 7, wherein the knowledge entity verifying and enhancing module performs a comparison in the knowledge base using the candidate words of the target text, obtains the ranked entity names based on the similarity of the comparison, compares the key words of each category of the metadata with words in the entity description data corresponding to the ranked entity names to calculate a word matching number of the key words in the entity description data that are the same as or similar to the key words of each category, such that each category has the corresponding word matching number, and sets the category with the largest word matching number as the output classification of the candidate words in the target text.

9. The apparatus of claim 8, wherein the knowledge entity classification module is further configured to compare the output classification with an entity type corresponding to the entity name in the knowledge base to verify whether the output classification of the candidate word in the target text is correct.

10. The apparatus of claim 7, wherein the knowledge entity verification and enhancement module is further configured to: