CN115221313A - Knowledge entity identification method and knowledge entity identification device - Google Patents

Knowledge entity identification method and knowledge entity identification device

Info

Publication number
CN115221313A
Authority
CN
China
Prior art keywords
entity
knowledge
category
candidate
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110410253.6A
Other languages
Chinese (zh)
Inventor
曾俋颖
邱德旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Delta Electronics Inc
Original Assignee
Delta Electronics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Delta Electronics Inc filed Critical Delta Electronics Inc
Priority to CN202110410253.6A priority Critical patent/CN115221313A/en
Publication of CN115221313A publication Critical patent/CN115221313A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

A knowledge entity identification method comprising the steps of: receiving a target text to be analyzed and metadata, wherein the target text comprises candidate words; comparing the candidate words in the knowledge base to obtain a plurality of entity names related to the candidate words from the knowledge base, wherein each entity name has corresponding entity description data; comparing the entity description data in the knowledge base with the metadata to obtain a comparison result; and setting the entity name related to the candidate word in the knowledge base as the output classification of the candidate word in the target text according to the comparison result. The disclosure also relates to a knowledge entity recognition device.

Description

Knowledge entity identification method and knowledge entity identification device
Technical Field
The present disclosure relates to electronic devices and methods thereof, and more particularly, to a knowledge entity recognition device and method.
Background
In conventional knowledge management, experts manually label the data of every file one by one. With the development of technology, current data labeling methods can analyze syntax and semantics through natural language processing; however, such corpus analysis does not enable a machine to understand new words, which still have to be labeled by experts. The existing training process based on labeled data is tedious and inflexible: it is difficult to train a knowledge management system for a new field from the data of an established system, and building knowledge management systems for different fields incurs a high training cost.
In view of the above, a knowledge management system is a tool with considerable management capability, but an efficient way to build one is still lacking, and its accuracy still has room for improvement. Accordingly, how to build such a system efficiently and provide high-precision knowledge management is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
According to an embodiment of the present disclosure, a knowledge entity identification method is disclosed, which includes the steps of: receiving a target text to be analyzed and metadata, wherein the target text comprises candidate words; comparing the candidate words in the knowledge base to obtain a plurality of entity names related to the candidate words from the knowledge base, wherein each entity name has corresponding entity description data; comparing the entity description data in the knowledge base with the metadata to obtain a comparison result; and setting the entity name associated with the candidate word in the knowledge base as the output classification of the candidate word in the target text according to the comparison result.
According to another embodiment, a knowledge entity identification apparatus is disclosed that includes a knowledge entity candidate generation module, a knowledge entity validation and enhancement module, and a knowledge entity classification module. The knowledge entity candidate generation module is configured to receive a target text to be parsed and metadata, and compare candidate words of the target text in a knowledge base to obtain a plurality of entity names associated with the candidate words from the knowledge base, wherein each entity name has corresponding entity description data. The knowledge entity verification and enhancement module is coupled to the knowledge entity candidate generation module, wherein the knowledge entity verification and enhancement module is configured to compare the entity description data and the metadata in the knowledge base to obtain a comparison result. The knowledge entity classification module is coupled to the knowledge entity verification and enhancement module, wherein the knowledge entity classification module is configured to set an entity name associated with the candidate word in the knowledge base as an output classification of the candidate word in the target text according to the comparison result.
Drawings
The following detailed description will facilitate a better understanding of embodiments of the disclosure when read in conjunction with the accompanying drawings. It should be noted that the features of the drawings are not necessarily drawn to scale in accordance with the requirements of an illustrative implementation. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
Fig. 1 shows a block diagram of a knowledge entity identification apparatus in an embodiment according to the present disclosure.
FIG. 2 shows a flow diagram of a knowledge entity identification method in an embodiment in accordance with the disclosure.
FIG. 3 shows a flow diagram of a knowledge entity identification method in accordance with an embodiment of the present disclosure.
Description of the reference numerals:
100: knowledge entity recognition device
102: input data
104: output data
112: knowledge entity candidate generation module
114: knowledge entity verification and enhancement module
116: knowledge entity classification module
200, 300: knowledge entity identification method
500: knowledge base
502: analysis and storage module
600: external universal knowledge base
S210 to S240, S310 to S340: steps
[Deposit of biological material]
Domestic deposit information (please note in order of depository institution, date, and number)
None
Foreign deposit information (please note in order of depository country, institution, date, and number)
None
Detailed Description
The following disclosure provides many different embodiments for implementing different features of the disclosure. Embodiments of the elements and arrangements are described below to simplify the present disclosure. Of course, these embodiments are merely exemplary and not intended to be limiting. For example, the terms "first", "second", etc. are used in this disclosure to describe elements, but are used only to distinguish the same or similar elements or operations, and the terms are not used to limit the technical elements of the disclosure, nor to limit the order or sequence of operations.
Referring to FIG. 1, a block diagram of a knowledge entity identification apparatus 100 according to an embodiment of the disclosure is shown. The knowledge entity identification apparatus 100 is used for identifying a target object in input data 102 and providing identified output data 104. For example, the knowledge entity identification apparatus 100 parses an input text, sentence, paragraph, etc. to perform named entity recognition (NER). In one embodiment, the input data 102 received by the knowledge entity identification apparatus 100 includes a target text and metadata (domain metadata). The target text is the data to be parsed. The metadata is data that assists the analysis of the target text, and may be categories and their key words defined by the user in advance.
In one embodiment, the knowledge entity identification apparatus 100 is coupled to the knowledge base 500. The knowledge base 500 is coupled to an external general knowledge base 600. The external general knowledge base 600 is a database having different formats and domain contents, such as Wikipedia, a specialized dictionary, and domain expert knowledge. The knowledge base 500 may be a database that stores knowledge data built from internally defined knowledge and/or from data obtained through the external general knowledge base 600. For example, the knowledge base 500 is provided with a parsing and storage module 502. The parsing and storage module 502 may read the data in the external general knowledge base 600 and convert the external data into a data structure with a specific format, for example by normalizing the external data and the domain expert knowledge data into a regular form, so that the data stored in the knowledge base 500 can be provided to the knowledge entity identification apparatus 100 for use in identifying the target text.
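The conversion performed by the parsing and storage module 502 can be pictured with a minimal sketch. The raw record layout (`title`, `summary`, `categories`) and the names `EntityRecord` and `parse_external_record` are hypothetical and introduced only for illustration; the disclosure does not specify the external format. The target fields follow the entity data structure described for the knowledge base 500 (number, entity name, entity description, entity type).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityRecord:
    number: int
    name: str
    description: str
    entity_type: tuple  # e.g. ("Fruits", "Plants")


def parse_external_record(number, raw):
    """Normalize one raw external record (hypothetical layout) into an EntityRecord."""
    name = raw.get("title", "").strip()
    # Collapse runs of whitespace so descriptions are stored in a regular form.
    description = " ".join(raw.get("summary", "").split())
    entity_type = tuple(c.strip() for c in raw.get("categories", []) if c.strip())
    return EntityRecord(number, name, description, entity_type)


raw = {"title": " Apple ", "summary": "An edible   fruit.\n", "categories": ["Fruits", " Plants "]}
record = parse_external_record(1, raw)
```

Storing every source in one record shape is what would let the later comparison steps treat all knowledge sources uniformly.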
In one embodiment, the knowledge entity identification apparatus 100 comprises a knowledge entity candidate generation module 112, a knowledge entity verification and enhancement module 114, and a knowledge entity classification module 116. The knowledge entity candidate generation module 112 is electrically coupled to the knowledge entity verification and enhancement module 114. The knowledge entity verification and enhancement module 114 is electrically coupled to the knowledge entity classification module 116. For the purpose of facilitating an understanding of the present disclosure, reference is made to fig. 1 and 2 in conjunction with the following description. FIG. 2 shows a flow diagram of a knowledge entity identification method 200 in accordance with an embodiment of the present disclosure. The knowledge entity identification method 200 may be performed by the knowledge entity identification apparatus 100 of FIG. 1.
In step S210, the knowledge entity candidate generation module 112 receives the target text and the metadata to be parsed.
In one embodiment, the target text to be parsed is text data to be analyzed and includes one or more sentences or paragraphs. The metadata, in turn, includes a plurality of categories (keys), each category including a plurality of key words (values). The user may define all the categories of the metadata and the key words of each category in advance and input them to the knowledge entity identification apparatus 100 together with the target text. To facilitate the description, the sentence "An apple a day keeps the doctor away" is used below as an example target text, with the metadata shown in Table one. It should be noted that the present disclosure is not limited to this example.
Table one: Metadata
Category (key)    Key words (value)
FRUIT             fruit, juicy, tree, …
MEAT              animal, hunt, …
DESSERT           sugar, sweet, …
In one embodiment, the knowledge entity candidate generation module 112 performs natural language processing to extract the nouns or noun phrases of the target text. These extracted nouns or noun phrases are used as the candidate words of the target text. Continuing the above example of the target text "An apple a day keeps the doctor away", the candidate words extracted from the target text are "apple", "day", and "doctor". The number of candidate words may vary depending on the content of the target text. In one embodiment, the target text includes one or more candidate words. In this example, the number of candidate words is 3.
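The input of step S210 and the noun extraction can be sketched as follows, reading the example sentence as "An apple a day keeps the doctor away". This is only an illustration: a real system would use a part-of-speech tagger or noun-phrase chunker, and the tiny lexicon `TOY_POS` as well as the names `METADATA` and `extract_candidates` are hypothetical stand-ins, not part of the disclosure.

```python
import re

# Metadata from Table one: category (key) -> key words (value).
# The elided entries ("…") of the table are left out.
METADATA = {
    "FRUIT": ["fruit", "juicy", "tree"],
    "MEAT": ["animal", "hunt"],
    "DESSERT": ["sugar", "sweet"],
}

# Hypothetical toy part-of-speech lexicon standing in for a real tagger.
TOY_POS = {
    "an": "DET", "a": "DET", "the": "DET",
    "apple": "NOUN", "day": "NOUN", "doctor": "NOUN",
    "keeps": "VERB", "away": "ADV",
}


def extract_candidates(text):
    """Return the nouns of the text in order of first appearance, without duplicates."""
    seen, candidates = set(), []
    for token in re.findall(r"[a-z]+", text.lower()):
        if TOY_POS.get(token) == "NOUN" and token not in seen:
            seen.add(token)
            candidates.append(token)
    return candidates


candidates = extract_candidates("An apple a day keeps the doctor away")
```

Under these assumptions `candidates` holds the three candidate words of the example, in sentence order.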
In step S220, the knowledge entity candidate generating module 112 compares the candidate words of the target text with the knowledge base 500 to obtain a plurality of entity names associated with the candidate words from the knowledge base 500.
In one embodiment, the knowledge entity verification and enhancement module 114 looks up the candidate words one by one in the knowledge base 500. The knowledge base 500 records a plurality of entity data. The data structure of each entity data includes, but is not limited to, a number, an entity name, an entity description, and an entity type, as shown in Table two.
Table two: knowledge base
Figure BDA0003023894100000051
In the above example, when the candidate word is "apple", the knowledge entity verification and enhancement module 114 compares the candidate word "apple" with the knowledge base 500 of Table two to obtain a plurality of entity names associated with "apple", such as "Apple Inc." (number 0), "Apple" (number 1), "Pineapple" (number 2), and "Apple, Oklahoma" (number 3). In one embodiment, the obtained entity names numbered 0 to 3 are recorded in the candidate list of the candidate word "apple". On the other hand, since the entity name "Orange" (number N) is neither the same as nor similar to the candidate word "apple", it is not recorded in the candidate list of the candidate word "apple".
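A minimal sketch of building the candidate list, assuming that "associated" means a case-insensitive substring match on the entity name (the disclosure leaves the matching rule open). The names `Entity`, `KNOWLEDGE_BASE`, and `candidate_list` are hypothetical, and the number 99 is an arbitrary placeholder for the number N of "Orange".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    number: int
    name: str


# Entity names taken from the example; descriptions and types are omitted here.
KNOWLEDGE_BASE = [
    Entity(0, "Apple Inc."),
    Entity(1, "Apple"),
    Entity(2, "Pineapple"),
    Entity(3, "Apple, Oklahoma"),
    Entity(99, "Orange"),  # placeholder number for entity N
]


def candidate_list(word, knowledge_base):
    """Keep the entities whose name contains the candidate word (case-insensitive)."""
    needle = word.lower()
    return [entity for entity in knowledge_base if needle in entity.name.lower()]


matches = candidate_list("apple", KNOWLEDGE_BASE)
names = [entity.name for entity in matches]
```

Substring matching is what lets "Pineapple" enter the list while "Orange" stays out; a stricter word-level match would exclude both.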
In an embodiment, the information retrieval method for searching and comparing the candidate words in the knowledge base 500 may be a term frequency-inverse document frequency (tf-idf) method or other data exploration/term frequency statistical methods, but the disclosure is not limited thereto.
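As an illustration of tf-idf scoring, the sketch below scores each entity name against the candidate word. It is one possible formulation only: the smoothed idf variant, the word-level tokenizer, and the function names are assumptions, not details given in the disclosure.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(query, documents):
    """Score each document against the query by summing tf * idf over the query terms."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term]:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed idf (assumed variant)
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores


entity_names = ["Apple Inc.", "Apple", "Pineapple", "Apple, Oklahoma", "Orange"]
scores = tfidf_scores("apple", entity_names)
```

With word-level tokens the exact name "Apple" scores highest, while "Pineapple" and "Orange" both score zero; catching "Pineapple" would require character n-grams or substring matching instead.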
In step S230, the knowledge entity verification and enhancement module 114 compares the entity description data and the metadata in the knowledge base 500 to obtain a comparison result.
In one embodiment, words are searched in the entity description data to obtain more content description information as enhancement information of candidate words for use by the subsequent knowledge entity classification module 116.
In the above example, the candidate list of the candidate word "apple" records four entity names: "Apple Inc.", "Apple", "Pineapple", and "Apple, Oklahoma". The entity description data corresponding to each entity name in the candidate list is then searched one by one according to the metadata received in step S210. Take the metadata category "FRUIT" and its key words "fruit", "juicy", and "tree" as an example (as shown in Table one). The knowledge entity verification and enhancement module 114 searches the entity description data of the entity name "Apple", namely "An apple is an edible fruit produced by an apple tree (Malus domestica). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus", for the key word "fruit", determines whether any word matches "fruit", and increments the count by 1 for each matching word. In this embodiment, the category "FRUIT" has three key words, and the same search and matching are performed for each of them to obtain the total number of matches for the category. For example, the key word "fruit" of the category "FRUIT" obtains 1 match in the entity description data corresponding to the entity name "Apple"; the key word "juicy" obtains 0 matches; and the key word "tree" obtains 2 matches. Thus, the total number of matches of the category "FRUIT" with respect to the entity name "Apple" is 1 + 0 + 2 = 3.
By analogy, the total number of matches of the key words "animal" and "hunt" of the category "MEAT" in the entity description data corresponding to the entity name "Apple" is 0, and the total number of matches of the key words "sugar" and "sweet" of the category "DESSERT" is also 0. Thus, among the three categories of metadata input in step S210, the category "FRUIT" has the largest total number of matches. Therefore, the metadata category "FRUIT" is the comparison result for the target text. Meanwhile, the entity name "Apple", which is most associated with the category "FRUIT", is set as the most associated entity name.
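The counting in step S230 can be sketched as below. The description string is an assumed reading of the example's entity description for "Apple", and treating a plural as "similar" by stripping a trailing "s" is a simplification chosen here; `normalize` and `count_matches` are hypothetical names.

```python
import re

METADATA = {
    "FRUIT": ["fruit", "juicy", "tree"],
    "MEAT": ["animal", "hunt"],
    "DESSERT": ["sugar", "sweet"],
}

# Assumed reading of the entity description for "Apple" used in the example.
APPLE_DESCRIPTION = (
    "An apple is an edible fruit produced by an apple tree (Malus domestica). "
    "Apple trees are cultivated worldwide and are the most widely grown species "
    "in the genus Malus."
)


def normalize(word):
    """Lowercase and strip a trailing plural 's' (a crude 'same or similar' test)."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def count_matches(keywords, description):
    """Total number of description words that match any key word of one category."""
    words = [normalize(w) for w in re.findall(r"[A-Za-z]+", description)]
    return sum(words.count(normalize(k)) for k in keywords)


totals = {cat: count_matches(kws, APPLE_DESCRIPTION) for cat, kws in METADATA.items()}
best_category = max(totals, key=totals.get)
```

Here "tree" is counted twice because both "tree" and "trees" normalize to the same form, which reproduces the total of 3 for "FRUIT".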
In one embodiment, the comparison between the metadata and each entity description data may use a similarity measure such as cosine similarity. The metadata is used to search the entity description data in the knowledge base 500, and the entity name closest to the metadata is selected according to the similarity measure.
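For reference, cosine similarity between two term-count vectors can be computed as in the following sketch. The bag-of-words representation and the function names are assumptions made for illustration; the disclosure does not state how the vectors are formed.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


description = bag_of_words("An apple is an edible fruit produced by an apple tree")
fruit = bag_of_words("fruit juicy tree")
meat = bag_of_words("animal hunt")
fruit_score = cosine_similarity(fruit, description)
meat_score = cosine_similarity(meat, description)
```

Unlike a raw match count, the cosine score is normalized by vector length, so long descriptions are not favored merely for containing more words.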
In step S240, the knowledge entity classification module 116 sets the entity name associated with the candidate word in the knowledge base 500 as the output classification of the candidate word in the target text according to the comparison result.
In the above example, the comparison result for the category of the candidate word in the target text is "FRUIT". The knowledge entity classification module 116 further compares the comparison result "FRUIT" with the entity type (i.e., "Fruits; Malus; Plants") corresponding to the most associated entity name (i.e., "Apple") in the knowledge base 500. Since the word "Fruits", which matches the comparison result "FRUIT", can be found in the entity type, the comparison result "FRUIT" is verified as the output classification of the candidate word in the target text.
In an embodiment, the data enhancement result of the candidate word obtained in step S230, together with the categories and key words defined in advance in the user's metadata, may be input to a word classification model (not shown in FIG. 1) for classification, so as to determine whether the candidate word is a knowledge entity of the target text and to classify the knowledge entity into the corresponding category, thereby obtaining the final knowledge entity and the category to which it belongs.
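The verification against the entity type can be sketched as follows, assuming the entity type of "Apple" reads "Fruits; Malus; Plants" and that a plural label counts as a match for its singular category name. `singular` and `verify_classification` are hypothetical helper names.

```python
def singular(word):
    """Lowercase, trim, and strip a trailing plural 's'."""
    word = word.strip().lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def verify_classification(output_category, entity_type):
    """True if the category matches one of the ';'-separated entity type labels."""
    labels = {singular(label) for label in entity_type.split(";") if label.strip()}
    return singular(output_category) in labels


APPLE_ENTITY_TYPE = "Fruits; Malus; Plants"  # assumed reading of the example entry
is_verified = verify_classification("FRUIT", APPLE_ENTITY_TYPE)
```

A category such as "MEAT" would fail this check for the same entity, which is how the entity type acts as a second, independent confirmation of the keyword-count result.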
Please refer to fig. 1 and fig. 3 together. FIG. 3 shows a flow diagram of a knowledge entity identification method 300 in accordance with an embodiment of the present disclosure. The knowledge entity identification method 300 may be performed by the knowledge entity identification apparatus 100 of FIG. 1.
In step S310, the knowledge entity verification and enhancement module 114 performs comparison in the knowledge base 500 using the candidate words of the target text, and obtains a plurality of entity names in a sequence according to the similarity.
In one embodiment, the candidate words of the target text may be extracted by the knowledge entity candidate generation module 112 using natural language processing techniques. Continuing the above example of the target text "An apple a day keeps the doctor away", the candidate word "apple" is compared with all entity names in the knowledge base 500 of Table two. The entity name in the knowledge base 500 that is most similar to the candidate word "apple" is ranked highest. Sorting the entity names from high to low similarity yields the sorted entity names shown in Table three: the entity name with number 1 is first, the entity name with number 0 is second, and so on. After the similarity comparison, the 4 sorted entity names screened from the knowledge base 500 are those that are the same as or similar to the candidate word.
Table three:
Figure BDA0003023894100000081
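The ranking of step S310 can be illustrated with a small sketch. The similarity measure used here, Jaccard similarity over word tokens, is an assumption chosen for simplicity, since the disclosure does not fix the measure; ties are left in their input order.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_entities(word, names):
    """Sort entity names by descending similarity; ties keep their input order."""
    query = tokens(word)
    return sorted(names, key=lambda name: jaccard(query, tokens(name)), reverse=True)


candidate_names = ["Apple Inc.", "Apple", "Pineapple", "Apple, Oklahoma"]
ranked = rank_entities("apple", candidate_names)
```

With this measure the exact name "Apple" ranks first and "Apple Inc." second, consistent with the order described for numbers 1 and 0; the order of the remaining names depends on the measure chosen.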
In step S320, the knowledge entity verifying and enhancing module 114 compares the key words of each category of the metadata with the words in the entity description data corresponding to the sorted entity names to obtain a comparison result. In some embodiments, the knowledge entity verification and enhancement module 114 compares the key words of each category of the metadata with the words in the entity description data corresponding to the sorted entity names to calculate the word matching number of the key words in the entity description data that are the same or similar to the key words of each category, so that each category has the corresponding word matching number.
In one embodiment, the metadata includes a plurality of categories, wherein each category includes a plurality of key words. For example, the metadata includes a first category "FRUIT", a second category "MEAT", and a third category "DESSERT". The first category "FRUIT" includes the key words "fruit", "juicy", and "tree". The second category "MEAT" includes the key words "animal" and "hunt". The third category "DESSERT" includes the key words "sugar" and "sweet".
In one embodiment, the key word "fruit" is compared with the sorted first entity description data "An apple is an edible fruit produced by an apple tree (Malus domestica). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus" to obtain 1 matching word. Similarly, the key words "juicy" and "tree" are compared with the first entity description data to obtain 0 and 2 matching words, respectively. In other words, the matching words of the first category "FRUIT" associated with the first entity name total 3. By analogy, the matching words of the second category "MEAT" associated with the first entity name total 0. The total number of matching words between the key words of each category and the entity description data of the first entity name "Apple" is shown in Table four.
Table four: Word match counts for the first entity description data
Metadata category             Word match count
First category "FRUIT"        3
Second category "MEAT"        0
Third category "DESSERT"      0
In step S330, the knowledge entity verification and enhancement module 114 sets the category with the largest word matching number as the output category of the candidate word in the target text.
Following the example above, the first category has the largest number of word matches (i.e., 3), and thus the first category "FRUIT" will be set as the output classification for the candidate word in the target text.
It should be noted that, in steps S320 and S330, the sum of the word matching numbers of the sorted second entity names in the first, second and third categories of the metadata is calculated, the sum of the word matching numbers of the sorted third entity names in the first, second and third categories of the metadata is calculated, and the sum of the word matching numbers of the sorted fourth entity names in the first, second and third categories of the metadata is calculated. In other words, all categories of metadata match each of the ranked entity names to get the sum of the number of word matches for all categories of each entity name. For brevity of description, the description of the matching step is not repeated here.
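Repeating the match count for every sorted entity name yields one total per entity and category, as the sketch below illustrates. Only the "Apple" description follows the example (in shortened form); the other descriptions are hypothetical placeholders, because the knowledge base table is not reproduced in the text, and the helper names are likewise assumptions.

```python
import re

METADATA = {
    "FRUIT": ["fruit", "juicy", "tree"],
    "MEAT": ["animal", "hunt"],
    "DESSERT": ["sugar", "sweet"],
}

# Hypothetical descriptions, listed in ranked order; only "Apple" follows the example.
RANKED_DESCRIPTIONS = {
    "Apple": "An apple is an edible fruit produced by an apple tree. "
             "Apple trees are cultivated worldwide.",
    "Apple Inc.": "A technology company that designs consumer electronics.",
    "Pineapple": "A tropical plant with an edible fruit.",
    "Apple, Oklahoma": "A community in Oklahoma.",
}


def normalize(word):
    """Lowercase and strip a trailing plural 's' (a crude 'same or similar' test)."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def category_totals(description, metadata):
    """Word match count of every category for one entity description."""
    words = [normalize(w) for w in re.findall(r"[A-Za-z]+", description)]
    return {cat: sum(words.count(normalize(k)) for k in kws) for cat, kws in metadata.items()}


totals_by_entity = {
    name: category_totals(desc, METADATA) for name, desc in RANKED_DESCRIPTIONS.items()
}
# Pick the (entity, category) pair with the largest word match count.
best_entity, best_category = max(
    ((name, cat) for name, totals in totals_by_entity.items() for cat in totals),
    key=lambda pair: totals_by_entity[pair[0]][pair[1]],
)
```

Under these placeholder descriptions the pair ("Apple", "FRUIT") has the largest count, so "FRUIT" would be the output classification and "Apple" the most associated entity name.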
In step S340, the knowledge entity verification and enhancement module 114 compares the output classification with the entity type corresponding to the sorted entity name in the knowledge base 500 to verify whether the output classification of the candidate word in the target text is correct.
Continuing the example above, the first category "FRUIT" has the largest word match count, so the output classification of the candidate word in the target text is set to "FRUIT". In step S340, to verify whether the output classification is correct, the output classification "FRUIT" is further compared with the first entity type. As shown in Table three, the first entity type includes "Fruits", "Malus", and "Plants". Since the first entity type "Fruits" matches the output classification "FRUIT", the output classification is verified as a correct result.
In one embodiment, the knowledge entity identification apparatus 100 may be implemented as, but is not limited to, a portable electronic device, a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), a wearable device, or a notebook computer.
In one embodiment, the knowledge entity identification apparatus 100 includes at least a processor (not shown in FIG. 1), a storage medium (not shown in FIG. 1), and an input/output interface (not shown in FIG. 1). The processor is configured to operate and control the knowledge entity candidate generation module 112, the knowledge entity verification and enhancement module 114, and the knowledge entity classification module 116. The storage medium is used for storing a plurality of program instructions and the temporary data produced while the instructions are executed. The input/output interface is coupled to the processor for receiving the input data 102 and sending the output data 104.
The processor may be implemented as, but not limited to, a Central Processing Unit (CPU), a System on Chip (SoC), an application processor, an audio processor, a Digital Signal Processor (DSP), or a function specific processing Chip or controller.
The storage medium may be implemented as, but not limited to, a Random Access Memory (RAM) or a nonvolatile Memory (e.g., a Flash Memory, a Read Only Memory (ROM), a Hard Disk Drive (HDD), a Solid State Drive (SSD), an optical Memory, or the like).
In one embodiment, the text classification model may be an artificial intelligence model built from one or more algorithms, including artificial neural networks (ANN) and supervised learning in machine learning, wherein supervised learning includes algorithms such as support vector machines (SVM), regression analysis, and statistical classification.
In one embodiment, the present disclosure provides a non-transitory computer readable recording medium storing a plurality of program codes. After the program code is loaded into the processor of the knowledge entity recognition device 100 shown in fig. 1, the processor executes the program code and performs the steps shown in fig. 2 and fig. 3.
Compared with the prior art, the knowledge entity identification method and the knowledge entity identification device of the present disclosure can identify more knowledge entities out of the same number of knowledge entities to be identified, achieving a high recall rate. Likewise, for the same number of identified knowledge entities, more of them are correct, achieving high precision.
In summary, according to the present disclosure, metadata is input together with the target text to be labeled, and after an entity name is found in the knowledge base, the entity description data of that entity name is further retrieved for verification, so that the accuracy of classifying the knowledge entities in the target text can be improved. In addition, the knowledge entity identification device and method of the present disclosure can be applied to the labeling of a large number of documents. When the documents to be labeled belong to a different field, the field can be switched simply by switching to the corresponding knowledge base. As for expansion, new vocabulary only needs to be added to the knowledge base. Furthermore, the method reduces the cost of manual labeling and the burden on experts, saves a large amount of manual labeling work, and supports various subsequent applications (for example, input articles can be automatically labeled with their categories and key words).
The foregoing outlines features of several embodiments so that those skilled in the art may better understand the implementations of the present disclosure. Those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein without departing from the spirit and scope of the present disclosure. The foregoing is to be understood as illustrative of the present disclosure, and the scope of protection is to be determined by the claims.

Claims (10)

1. A knowledge entity identification method, comprising:
receiving a target text and metadata to be analyzed, wherein the target text comprises a candidate word;
comparing the candidate word in a knowledge base to obtain a plurality of entity names related to the candidate word from the knowledge base, wherein each entity name has corresponding entity description data;
comparing the entity description data with the metadata in the knowledge base to obtain a comparison result; and
setting the entity name related to the candidate word in the knowledge base as an output classification of the candidate word in the target text according to the comparison result.
2. The method of claim 1, wherein the metadata comprises a plurality of categories, each category comprising a plurality of keyword terms, wherein the method comprises:
comparing the key words of each category of the metadata with the words in the entity description data corresponding to the entity name to obtain the comparison result.
3. The knowledge entity identification method of claim 2, further comprising:
comparing the candidate words of the target text in the knowledge base, and obtaining the sorted entity names based on the similarity of comparison;
comparing the key words of each category of the metadata with the words in the entity description data corresponding to the sorted entity names to calculate a word matching number of the key words in the entity description data which is the same as or similar to the key words of each category, so that each category has the corresponding word matching number; and
setting the category with the largest word matching number as the output classification of the candidate word in the target text.
4. The knowledge entity identification method of claim 3, further comprising:
comparing the output classification with an entity type corresponding to the entity name in the knowledge base to verify whether the output classification of the candidate word in the target text is correct.
5. The knowledge entity identification method of claim 2, further comprising:
comparing the candidate words of the target text against the knowledge base, and obtaining the sorted entity names based on the similarity;
comparing each key word in the categories with the words in the entity description data of the sorted entity names to obtain a matching quantity for each key word, wherein the sum of the matching quantities of the key words in a category is the word matching number of that category; and
taking the category corresponding to the maximum word matching number as the output classification.
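Claim 5 differs from claim 3 in that a matching quantity is obtained for each key word and then summed per category. A minimal sketch under stated assumptions: the descriptions and metadata below are hypothetical, the descriptions of all sorted entity names are pooled, and matching is exact and case-insensitive.

```python
from collections import Counter


def category_word_matching_numbers(descriptions, metadata):
    """Sum, per category, the matching quantity of each of its key words."""
    word_counts = Counter()
    for description in descriptions:
        word_counts.update(description.lower().split())

    per_category = {}
    for category, key_words in metadata.items():
        # Matching quantity of one key word = its occurrences in the descriptions.
        matching_quantities = {kw: word_counts[kw.lower()] for kw in key_words}
        per_category[category] = sum(matching_quantities.values())
    return per_category


# Hypothetical description data of the sorted entity names.
descriptions = [
    "a technology company and software company",
    "an edible fruit",
]
# Hypothetical metadata: category -> key words.
metadata = {
    "organization": ["company", "technology"],
    "food": ["fruit", "edible"],
}

numbers = category_word_matching_numbers(descriptions, metadata)
output_classification = max(numbers, key=numbers.get)
print(numbers, output_classification)
```

Here "company" contributes 2 and "technology" contributes 1, so "organization" has a word matching number of 3 against 2 for "food".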
6. A knowledge entity recognition apparatus comprising:
a knowledge entity candidate generation module configured to receive a target text to be parsed and metadata, and to compare candidate words of the target text against a knowledge base to obtain a plurality of entity names associated with the candidate words from the knowledge base, wherein each entity name has corresponding entity description data;
a knowledge entity verification and enhancement module coupled to the knowledge entity candidate generation module, wherein the knowledge entity verification and enhancement module is configured to compare the entity description data in the knowledge base with the metadata to obtain a comparison result; and
a knowledge entity classification module, coupled to the knowledge entity verification and enhancement module, wherein the knowledge entity classification module is configured to set the entity name associated with the candidate word in the knowledge base as an output classification of the candidate word in the target text according to the comparison result.
7. The apparatus of claim 6, wherein the metadata comprises a plurality of categories, each category comprises a plurality of key words, wherein the knowledge entity candidate generation module is further configured to perform a comparison in the knowledge base using the candidate words of the target text, obtain an entity name, and compare the key words of each category of the metadata with the words in the entity description data corresponding to the entity name to obtain the comparison result.
8. The apparatus of claim 7, wherein the knowledge entity verification and enhancement module performs a comparison in the knowledge base using the candidate words of the target text, obtains the sorted entity names based on the similarity of the comparison, compares the key words of each category of the metadata with the words in the entity description data corresponding to the sorted entity names to calculate, for each category, a word matching number, which is the number of words in the entity description data that are the same as or similar to the key words of that category, such that each category has a corresponding word matching number, and sets the category with the largest word matching number as the output classification of the candidate words in the target text.
9. The apparatus of claim 8, wherein the knowledge entity classification module is further configured to compare the output classification with an entity type corresponding to the entity name in the knowledge base to verify whether the output classification of the candidate word in the target text is correct.
10. The apparatus of claim 7, wherein the knowledge entity verification and enhancement module is further configured to:
compare the candidate words of the target text against the knowledge base, and obtain the sorted entity names based on the similarity;
compare each key word in the categories with the words in the entity description data of the sorted entity names to obtain a matching quantity for each key word, wherein the sum of the matching quantities of the key words in a category is the word matching number of that category; and
take the category corresponding to the maximum word matching number as the output classification.
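The three coupled modules of the apparatus claims can be pictured as a small pipeline. The sketch below is only an illustration: the class names, the substring lookup, the sample data, and the choice to return the entity name together with its best category are all assumptions, and the score counts distinct description words that are key words rather than every occurrence.

```python
class KnowledgeEntityCandidateGenerator:
    """Finds entity names related to a candidate word (hypothetical substring lookup)."""

    def __init__(self, knowledge_base):
        self.knowledge_base = knowledge_base  # entity name -> entity description data

    def generate(self, candidate):
        needle = candidate.lower()
        return {name: text for name, text in self.knowledge_base.items()
                if needle in name.lower()}


class KnowledgeEntityVerifier:
    """Compares entity description data with the key words of each metadata category."""

    def __init__(self, metadata):
        self.metadata = metadata  # category -> key words

    def compare(self, descriptions):
        result = {}
        for name, text in descriptions.items():
            words = set(text.lower().split())
            # Score = number of distinct description words that are key words.
            result[name] = {category: len(words & {k.lower() for k in key_words})
                            for category, key_words in self.metadata.items()}
        return result


class KnowledgeEntityClassifier:
    """Selects the entity name and category with the highest score."""

    def classify(self, comparison_result):
        best = None
        for name, scores in comparison_result.items():
            for category, score in scores.items():
                if best is None or score > best[2]:
                    best = (name, category, score)
        return best  # None when no entity name was found


# Hypothetical data for the demonstration.
knowledge_base = {
    "Apple Inc.": "technology company and consumer electronics corporation",
    "Apple (fruit)": "edible fruit of the apple tree",
    "Banana": "elongated edible fruit",
}
metadata = {
    "organization": ["company", "technology", "corporation"],
    "food": ["fruit", "edible", "dish"],
}

generator = KnowledgeEntityCandidateGenerator(knowledge_base)
verifier = KnowledgeEntityVerifier(metadata)
classifier = KnowledgeEntityClassifier()

candidates = generator.generate("apple")
comparison = verifier.compare(candidates)
print(classifier.classify(comparison))
```

Keeping the three stages as separate objects mirrors the coupling described in claim 6 and lets each stage be replaced independently, for example by swapping the substring lookup for a similarity-ranked search.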
CN202110410253.6A 2021-04-16 2021-04-16 Knowledge entity identification method and knowledge entity identification device Pending CN115221313A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110410253.6A CN115221313A (en) 2021-04-16 2021-04-16 Knowledge entity identification method and knowledge entity identification device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110410253.6A CN115221313A (en) 2021-04-16 2021-04-16 Knowledge entity identification method and knowledge entity identification device

Publications (1)

Publication Number Publication Date
CN115221313A (en) 2022-10-21

Family

ID=83604622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110410253.6A Pending CN115221313A (en) 2021-04-16 2021-04-16 Knowledge entity identification method and knowledge entity identification device

Country Status (1)

Country Link
CN (1) CN115221313A (en)

Similar Documents

Publication Publication Date Title
CN111104794B (en) Text similarity matching method based on subject term
CN106649818B (en) Application search intention identification method and device, application search method and server
Pasca et al. Organizing and searching the world wide web of facts-step one: the one-million fact extraction challenge
US8494987B2 (en) Semantic relationship extraction, text categorization and hypothesis generation
CN112818093B (en) Evidence document retrieval method, system and storage medium based on semantic matching
US8370345B2 (en) Snippet based proximal search
CN111324771B (en) Video tag determination method and device, electronic equipment and storage medium
Al-Ash et al. Fake news identification characteristics using named entity recognition and phrase detection
WO2020114100A1 (en) Information processing method and apparatus, and computer storage medium
US11893537B2 (en) Linguistic analysis of seed documents and peer groups
CN107844493B (en) File association method and system
CN108038099B (en) Low-frequency keyword identification method based on word clustering
Elhadi et al. Use of text syntactical structures in detection of document duplicates
CN110019474B (en) Automatic synonymy data association method and device in heterogeneous database and electronic equipment
CN113282729A (en) Question-answering method and device based on knowledge graph
Feng et al. Question classification by approximating semantics
Yan et al. Chemical name extraction based on automatic training data generation and rich feature set
CN108345694B (en) Document retrieval method and system based on theme database
CN111753514A (en) Automatic generation method and device of patent application text
CN114742062B (en) Text keyword extraction processing method and system
CN110705285A (en) Government affair text subject word bank construction method, device, server and readable storage medium
Groza et al. Reference information extraction and processing using random conditional fields
CN114842982A (en) Knowledge expression method, device and system for medical information system
CN115221313A (en) Knowledge entity identification method and knowledge entity identification device
TWI777496B (en) Knowledge entity identification method and knowledge entity identification device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination