CN113127645B - Automatic extraction method of large-scale knowledge graph ontology, terminal equipment and storage medium - Google Patents

Automatic extraction method of large-scale knowledge graph ontology, terminal equipment and storage medium

Info

Publication number
CN113127645B
Authority
CN
China
Prior art keywords
entity
entities
knowledge graph
automatic extraction
steps
Legal status
Active
Application number
CN202110380611.3A
Other languages
Chinese (zh)
Other versions
CN113127645A (en)
Inventor
洪万福 (Hong Wanfu)
张林娜 (Zhang Linna)
Current Assignee
Xiamen Yuanting Information Technology Co., Ltd.
Original Assignee
Xiamen Yuanting Information Technology Co., Ltd.
Priority date
2021-04-09
Filing date
2021-04-09
Publication date
2022-09-13
Application filed by Xiamen Yuanting Information Technology Co., Ltd.
Priority to CN202110380611.3A
Publication of CN113127645A
Application granted
Publication of CN113127645B
Legal status: Active

Classifications

    • G06F16/367 Ontology
    • G06F16/35 Clustering; Classification
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method for automatically extracting the ontology of a large-scale knowledge graph, together with terminal equipment and a storage medium. The method comprises the following steps: S1: obtain entities from a knowledge graph; S2: perform a preliminary classification of the extracted entities with a rule-matching algorithm; S3: apply a named entity recognition model to the entities left unclassified in step S2, and confirm the types of the recognized named entities; S4: classify the entities still remaining after the named entity recognition of step S3 with a clustering algorithm; S5: merge and adjust the classification results of steps S2, S3 and S4 to obtain the final classification result. The invention innovatively integrates several techniques and effectively achieves automatic ontology extraction for large-scale industrial knowledge graphs; it can extract the ontology without any manually labeled data, even when the graph's entities are complex, very numerous, and contain much dirty data.

Description

Automatic extraction method of large-scale knowledge graph ontology, terminal equipment and storage medium
Technical Field
The invention relates to the field of knowledge graphs, in particular to a large-scale knowledge graph ontology automatic extraction method, terminal equipment and a storage medium.
Background
The concept of the knowledge graph was formally proposed by Google in 2012 with the aim of building a more intelligent search engine, and it began to spread through academia and industry after 2013. With the continued development of intelligent information services, knowledge graphs are now widely applied in intelligent search, intelligent question answering, personalized recommendation, intelligence analysis, anti-fraud, and other fields.
Knowledge graphs can be constructed top-down or bottom-up. Top-down construction first defines an ontology and then adds entities to the knowledge base; bottom-up construction extracts entities from publicly available data by some technical means and adds the entities with higher confidence to the knowledge base. The bottom-up approach is currently mainstream, and it requires the ontology to be extracted and constructed after the graph has been built. Depending on the degree of manual intervention, ontology construction methods can be divided into manual, semi-automatic, and automatic construction, but no mature technical system for this yet exists.
Disclosure of Invention
To solve the above problems, the present invention provides an automatic extraction method for a large-scale knowledge graph ontology, a terminal device, and a storage medium.
The specific scheme is as follows:
A large-scale knowledge graph ontology automatic extraction method comprises the following steps:
S1: obtain entities from a knowledge graph;
S2: perform a preliminary classification of the extracted entities with a rule-matching algorithm;
S3: apply a named entity recognition model to the entities left unclassified in step S2, and confirm the types of the recognized named entities;
S4: classify the entities remaining after the named entity recognition of step S3 with a clustering algorithm;
S5: merge and adjust the classification results of steps S2, S3 and S4 to obtain the final classification result.
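The cascade in steps S1 to S5 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the S2 and S3 stages are modeled as hypothetical per-entity functions that return a category or `None`, the S4 clustering stage is a hypothetical batch function, and the S5 merge is simplified to a dictionary update in which earlier stages take precedence.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical stage types: a per-entity classifier (S2 rules, S3 NER) returns a
# category or None; the batch clusterer (S4) labels every entity it is given.
Stage = Callable[[str], Optional[str]]
Clusterer = Callable[[List[str]], Dict[str, str]]


def extract_ontology(entities: Iterable[str], stages: List[Stage],
                     cluster: Clusterer) -> Dict[str, str]:
    """Run the stages in order; each stage only sees entities that earlier
    stages could not classify, and whatever is left goes to clustering."""
    final: Dict[str, str] = {}
    remaining = list(entities)
    for stage in stages:
        leftover = []
        for name in remaining:
            label = stage(name)
            if label is None:
                leftover.append(name)
            else:
                final[name] = label
        remaining = leftover
    if remaining:
        # S5 (simplified): add the cluster labels of the leftover entities.
        final.update(cluster(remaining))
    return final


# Toy stand-ins for the real stages, for illustration only.
def toy_rules(name: str) -> Optional[str]:
    return "equipment" if name.endswith("tank") else None


def toy_ner(name: str) -> Optional[str]:
    return "person" if name == "dwight eisenhower" else None


def toy_cluster(names: List[str]) -> Dict[str, str]:
    return {name: "cluster_0" for name in names}


print(extract_ontology(["type 99 tank", "dwight eisenhower", "art of war"],
                       [toy_rules, toy_ner], toy_cluster))
# {'type 99 tank': 'equipment', 'dwight eisenhower': 'person', 'art of war': 'cluster_0'}
```

The sketch returns raw cluster IDs from S4 and does not show how clusters are given category names. Because the clusterer only receives the leftovers, the cheap stages shrink the set that needs vectorizing and clustering.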
Further, step S1 includes preprocessing the obtained entities; the preprocessing includes punctuation cleaning, filtering of entities with abnormal lengths, and conversion of uppercase letters to lowercase.
Further, the clustering algorithm in step S4 adopts a Kmeans clustering algorithm.
Further, the classification with the clustering algorithm in step S4 proceeds as follows:
S401: for each entity to be classified, extract one or more of its attributes, labels and relations from the knowledge graph and concatenate them with the entity name; obtain a vector representation of each character in the concatenated string with a natural language processing word vector technique, and take the average of the vector representations of all characters as the word vector of the entity to be classified;
S402: input the word vectors of the entities to be classified into a Kmeans model, and determine the number of clusters k with the elbow method;
S403: input the word vector representations of the entities to be classified together with the cluster number k into the Kmeans model to obtain the clustering result.
Further, the natural language processing word vector technique adopted in step S401 is the bert-base-multilingual-uncased model, trained on corpora in 102 languages.
Further, if the number of entities in a category of the final classification result exceeds a preset number threshold, steps S2 to S5 are performed again on the entities of that category to classify them further.
A terminal device for the automatic extraction of a large-scale knowledge graph ontology comprises a processor, a memory, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the method of the embodiments of the invention.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method of the embodiments of the invention described above.
By adopting the above technical scheme, the invention has the following beneficial effects:
1. Strong applicability: the invention can be used with knowledge graphs in different domains.
2. Good results: several techniques are innovatively integrated to ensure the quality of ontology extraction. First, rule matching performs the preliminary classification, giving high classification quality. Second, a named entity recognition model is used, either an open-source model or a self-trained one; both are trained on large-scale labeled text corpora, so they recognize and classify text well. Third, the entity name is innovatively concatenated with the entity's attributes, labels and relations, and a natural language processing word vector technique turns the result into a text vector; this extracts more features than the entity name alone and greatly improves the learning of the subsequent Kmeans model.
3. High speed: first, classification by rule matching and by the named entity recognition model is fast. Second, because these two stages classify part of the entities in advance, fewer samples remain to be clustered, which greatly reduces the time spent on subsequent word vector conversion and on training and prediction with the Kmeans model.
4. Quick implementation: the named entity recognition model and the natural language processing word vector model are interchangeable and open-source models can be used, so a first version of a project can be implemented quickly and results can be seen early.
5. Strong extensibility: the procedure can be iterated as many times as desired, so the result is highly extensible.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Fig. 2 is the line graph of SSE against the cluster number k produced by the elbow method in this embodiment.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the accompanying drawings and detailed description.
The first embodiment is as follows:
The embodiment of the invention provides an automatic extraction method for a large-scale knowledge graph ontology. As shown in the flowchart of Fig. 1, the method comprises the following steps:
S1: obtain entities from the knowledge graph.
In this embodiment, 400,000 entities are obtained from the knowledge graph using Cypher query statements.
Further, because the obtained entities are not uniformly formatted and include some useless data, they need to be preprocessed. In this embodiment the preprocessing includes punctuation cleaning, filtering of entities with abnormal lengths, conversion of uppercase letters to lowercase, and the like; other embodiments may use other processing, which is not limited here.
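The three preprocessing operations can be sketched as below. This is a minimal sketch under assumptions: the length bounds `MIN_LEN`/`MAX_LEN` are placeholders (the patent gives no numbers for "abnormal length"), and the punctuation set (ASCII plus some common full-width Chinese marks) is an assumed choice.

```python
import re
import string
from typing import Iterable, List

# Assumed length bounds: the patent filters "abnormal length" entities but
# gives no numbers, so these values are placeholders.
MIN_LEN, MAX_LEN = 2, 30

# ASCII punctuation plus some common full-width Chinese marks (assumed set).
PUNCTUATION = string.punctuation + "，。、；：！？“”‘’（）《》【】"
PUNCT_RE = re.compile("[" + re.escape(PUNCTUATION) + "]")


def preprocess(names: Iterable[str]) -> List[str]:
    """Punctuation cleaning, abnormal-length filtering, and lowercasing."""
    cleaned = []
    for name in names:
        text = PUNCT_RE.sub("", name).strip().lower()
        if MIN_LEN <= len(text) <= MAX_LEN:
            cleaned.append(text)
    return cleaned


print(preprocess(["Type-99 Tank!", "A", "《孙子兵法》"]))
# ['type99 tank', '孙子兵法']
```

Lowercasing only affects scripts with case, so Chinese names pass through unchanged apart from punctuation removal.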
S2: perform a preliminary classification of the extracted entities with a rule-matching algorithm.
The following rules are employed in this example:
a. entities ending in "ship", "boat", "gun", "radar", "tank", and the like are assigned the category "equipment";
b. entities ending in "army", "brigade", "regiment", "division", "theater command", and the like are assigned the category "organization".
The above are only example rules used in this embodiment; in other embodiments, those skilled in the art may set other rules as required, which is not limited here.
These rules classify part of the entities, and the quality of this classification is high.
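Rules a. and b. amount to suffix matching on the preprocessed entity name. A minimal sketch follows; the English suffixes stand in for the suffixes of the original (Chinese) entity names, and the rule table holds only the examples of this embodiment.

```python
from typing import Optional, Sequence, Tuple

# Example rule table from this embodiment (English stand-ins for the suffixes).
RULES: Sequence[Tuple[Tuple[str, ...], str]] = [
    (("ship", "boat", "gun", "radar", "tank"), "equipment"),
    (("army", "brigade", "regiment", "division", "theater command"), "organization"),
]


def rule_classify(name: str) -> Optional[str]:
    """Return the category of the first matching rule, else None."""
    for suffixes, category in RULES:
        if name.endswith(suffixes):  # str.endswith accepts a tuple of suffixes
            return category
    return None  # passed on to named entity recognition (S3)
```

Suffix rules are cheap but coarse (for instance, "lifeboat" also ends in "boat"), so in practice they are best limited to high-precision patterns.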
S3: apply a named entity recognition model to the entities left unclassified in step S2, and confirm the types of the recognized named entities.
The named entity recognition model can be an open-source model, such as HanLP or LTP, or a self-trained model. Open-source HanLP and LTP can recognize categories such as person names, place names and organization names, whereas the categories recognized by a self-trained model are defined when it is trained. Training the named entity recognition model is beyond the scope of the invention and is not described here. This embodiment uses a self-trained named entity recognition model: the entities left unclassified in step S2 are input into the model, and part of them are recognized and classified. For example, "Dwight Eisenhower" is input into the model and is recognized and classified as "person".
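Step S3 can be sketched as a thin wrapper around whatever recognizer is used. In this sketch the recognizer is a hypothetical callable returning `(span, tag)` pairs, the tag names `PER`/`LOC`/`ORG` are assumed, and an entity is accepted only when a recognized span covers the whole entity name; that acceptance rule is an assumption, not something the patent specifies. HanLP, LTP, or a self-trained model would have to be adapted to this interface; the toy gazetteer below only stands in for a real model.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical recognizer interface: text -> list of (span, tag) pairs.
Recognizer = Callable[[str], List[Tuple[str, str]]]

# Assumed tag names mapped to the ontology categories used in this document.
TAG_TO_CATEGORY = {"PER": "person", "LOC": "location", "ORG": "organization"}


def ner_classify(name: str, recognize: Recognizer) -> Optional[str]:
    """Accept an NER tag only if its span covers the whole entity name."""
    for span, tag in recognize(name):
        if span == name and tag in TAG_TO_CATEGORY:
            return TAG_TO_CATEGORY[tag]
    return None  # left for the clustering step (S4)


def toy_recognizer(text: str) -> List[Tuple[str, str]]:
    """Toy gazetteer standing in for HanLP, LTP, or a self-trained model."""
    gazetteer = {"dwight eisenhower": "PER", "xiamen": "LOC"}
    return [(text, gazetteer[text])] if text in gazetteer else []
```

Requiring a whole-name match keeps partial hits (for example a person's name inside a ship's name) from being assigned the wrong type.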
S4: classify the entities remaining after the named entity recognition of step S3 with a clustering algorithm.
In this embodiment, a Kmeans clustering algorithm is used for classification, and the specific classification process is as follows:
S401: for each entity to be classified, extract one or more of its attributes, labels and relations from the knowledge graph, concatenate them with the entity name, obtain a vector representation of each character in the concatenated string with a natural language processing word vector technique, and take the average of the vector representations of all characters as the word vector of the entity to be classified.
Natural language processing word vector techniques include BERT (Bidirectional Encoder Representations from Transformers), fastText, Word2vec, and the like.
Because the knowledge graph may contain foreign-language entities, this embodiment preferably uses the bert-base-multilingual-uncased model, trained on corpora in 102 languages, to obtain the vector representation of each character in the concatenated string.
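Step S401 can be sketched as two small functions: one concatenates the entity name with its attributes, labels and relations, and one averages per-token vectors into a single entity vector. The `encode` argument is a hypothetical interface; with bert-base-multilingual-uncased it would return one hidden-state vector per token of the concatenated string, but the toy one-hot encoder below is used so the example runs without a model. The space separator and the plain-string form of attributes and relations are assumptions.

```python
from typing import Callable, List, Sequence

# Hypothetical encoder interface: text -> one vector per token/character.
Encoder = Callable[[str], List[List[float]]]


def entity_text(name: str, attributes: Sequence[str], labels: Sequence[str],
                relations: Sequence[str]) -> str:
    """Concatenate the entity name with its attributes, labels and relations."""
    return " ".join([name, *attributes, *labels, *relations])


def average_vector(text: str, encode: Encoder) -> List[float]:
    """Average the per-token vectors of `text` into one entity vector."""
    vectors = encode(text)
    if not vectors:
        raise ValueError("nothing to average")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def toy_encoder(text: str) -> List[List[float]]:
    """Stand-in for a real model: a 3-d one-hot vector per non-space character."""
    return [[1.0 if ord(ch) % 3 == i else 0.0 for i in range(3)]
            for ch in text if not ch.isspace()]


print(entity_text("type 99 tank", ["main battle tank"], ["equipment"], []))
# type 99 tank main battle tank equipment
print(average_vector("ab", toy_encoder))
# [0.0, 0.5, 0.5]
```

Averaging gives every entity a vector of the same length regardless of how many attributes, labels and relations were concatenated, which is what the Kmeans step needs.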
S402: input the word vectors of the entities to be classified into a Kmeans model, and determine the number of clusters k with the elbow method.
The elbow method proceeds as follows: preset a start value, a stop value and a step for k; input the word vectors of the entities to be classified into the Kmeans model for each k; record the sum of squared errors (SSE) obtained for each k; plot the SSE values as a line graph; and take the inflection point of the line graph as the final number of clusters k.
In this embodiment, k runs from 2 to 20 in steps of 2, and the resulting line graph is shown in Fig. 2; its inflection point is at 4, so the number of clusters k is 4.
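Steps S402 and S403 can be sketched with a plain Lloyd's k-means written from scratch (a library implementation would normally be used in practice). Here `sse_curve` computes the SSE for each candidate k, and `elbow_k` picks the k where the SSE curve bends most sharply, measured as the largest second difference. The embodiment reads the inflection point off the line graph by eye, so this automatic rule is an assumption, as are the random initialization and the iteration limit.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], float]:
    """Lloyd's algorithm; returns (cluster label per point, SSE)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse


def sse_curve(points: Sequence[Point], ks: Sequence[int]) -> Dict[int, float]:
    """SSE for each candidate k (the values plotted in the elbow graph)."""
    return {k: kmeans(points, k)[1] for k in ks}


def elbow_k(curve: Dict[int, float]) -> int:
    """Pick the k where the SSE curve bends most (largest second difference)."""
    ks = sorted(curve)
    best_k, best_bend = ks[0], float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (curve[prev] - curve[k]) - (curve[k] - curve[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

Following the embodiment, one would call `sse_curve(vectors, range(2, 21, 2))`, pass the result to `elbow_k`, and then run `kmeans(vectors, k)` once more with the chosen k to obtain the cluster labels of step S403. The second-difference rule assumes evenly spaced k values, as in the 2-to-20, step-2 grid.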
S403: input the word vector representations of the entities to be classified together with the cluster number k into the Kmeans model to obtain the clustering result.
In this embodiment, the obtained clustering result is: "equipment", "organization", "literature", "location", "people".
S5: merge and adjust the classification results of steps S2, S3 and S4 to obtain the final classification result.
In this embodiment, the final classification result of this round is: "equipment", "organization", "people", "location", "literature".
Further, if the number of entities in a category of the final classification result exceeds the preset number threshold and a finer division of the ontology categories is desired, steps S2 to S5 may be performed again on the entities of that category.
In this embodiment, 150,000 entities belong to the equipment category, 40,000 to the organization category, 70,000 to the people category, 60,000 to the location category, and 80,000 to the literature category. The equipment category exceeds the number threshold and therefore needs further subdivision.
The Kmeans clustering of step S4 is applied to the 150,000 entities of the equipment category, giving the clusters "land equipment", "water equipment" and "air equipment". After merging and adjusting, the final classification result of this round is: "land equipment" (70,000 entities), "water equipment" (40,000 entities), "air equipment" (40,000 entities), "organization" (40,000 entities), "people" (70,000 entities), "location" (60,000 entities), "literature" (80,000 entities). Every category is now below the number threshold, which meets the requirements of the engineering project, so no further subdivision is needed and ontology extraction is complete.
The number threshold may be set by one skilled in the art according to actual requirements, and is not limited herein.
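The repeat-until-small-enough loop can be sketched as below. `reclassify` is a hypothetical callable standing for a rerun of steps S2 to S5 (in this embodiment, only the S4 clustering was rerun) on one oversized category, returning its subcategories; `max_rounds` is an added safeguard, not part of the patent, against a category that never splits. With the embodiment's numbers and a threshold of, say, 100,000 (a hypothetical value, since the patent gives none), only "equipment" (150,000 entities) would be split, and its three subcategories (70,000 + 40,000 + 40,000) would all fall below the threshold.

```python
from typing import Callable, Dict, List

Groups = Dict[str, List[str]]


def refine(groups: Groups, reclassify: Callable[[str, List[str]], Groups],
           threshold: int, max_rounds: int = 3) -> Groups:
    """Split every category larger than `threshold` until none remain or
    `max_rounds` is reached (the round limit is an added safeguard)."""
    groups = {cat: list(members) for cat, members in groups.items()}
    for _ in range(max_rounds):
        oversized = [cat for cat, members in groups.items() if len(members) > threshold]
        if not oversized:
            break
        for cat in oversized:
            members = groups.pop(cat)
            for sub, sub_members in reclassify(cat, members).items():
                groups.setdefault(sub, []).extend(sub_members)
    return groups


def toy_reclassify(category: str, members: List[str]) -> Groups:
    """Toy stand-in: group by first word, e.g. 'land tank' -> 'land equipment'."""
    out: Groups = {}
    for m in members:
        out.setdefault(f"{m.split()[0]} {category}", []).append(m)
    return out


result = refine({"equipment": ["land tank", "water ship", "air jet"],
                 "person": ["eisenhower"]}, toy_reclassify, threshold=2)
print(sorted(result))
# ['air equipment', 'land equipment', 'person', 'water equipment']
```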
The first embodiment innovatively integrates rule matching, named entity recognition, natural language processing word vectors and Kmeans clustering, and effectively achieves automatic ontology extraction for large-scale industrial knowledge graphs. It works without any manually labeled data, even for knowledge graphs whose entities are complex, very numerous, and contain much dirty data; if labeled data are available for some entities, the invention can achieve even better results.
Used on its own, the entity name carries too few intrinsic features: the Kmeans model learns from overly simple input and tends to underfit. Knowledge graph entity nodes usually have not only a name but also attributes, labels and relations. This embodiment innovatively extracts the entity's attributes, labels and relations from the graph and concatenates them with the entity name into one string, so that Kmeans can learn richer features of each entity.
The second embodiment is as follows:
The invention also provides a terminal device for the automatic extraction of a large-scale knowledge graph ontology, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the method of the first embodiment of the invention.
Further, as an implementable solution, the terminal device for the automatic extraction of a large-scale knowledge graph ontology may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that this structure is only an example of the terminal device and does not limit it: the device may include more or fewer components than described, combine some components, or use different components; for example, it may further include input/output devices, network access devices, a bus, and the like, which is not limited in the embodiments of the invention.
Further, as an implementable solution, the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the terminal device and connects all parts of the device through various interfaces and lines.
The memory can be used to store the computer program and/or modules, and the processor implements the various functions of the terminal device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly comprise a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created through use of the mobile phone, and the like. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the above embodiment of the invention.
If the modules/units integrated in the terminal device are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, all or part of the flow of the method of the above embodiments may also be carried out by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), a software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A large-scale knowledge graph ontology automatic extraction method, characterized by comprising the following steps:
S1: obtaining entities from a knowledge graph;
S2: performing a preliminary classification of the extracted entities with a rule-matching algorithm;
S3: applying a named entity recognition model to the entities left unclassified in step S2, and confirming the types of the recognized named entities, the types including person names, place names or organization names;
S4: classifying the entities remaining after the named entity recognition of step S3 with a clustering algorithm, the clustering algorithm being the Kmeans clustering algorithm, wherein the classification with the clustering algorithm proceeds as follows:
S401: for each entity to be classified, extracting one or more of its attributes, labels and relations from the knowledge graph and concatenating them with the entity name, obtaining a vector representation of each character in the concatenated string with a natural language processing word vector technique, and taking the average of the vector representations of all characters as the word vector of the entity to be classified;
S402: inputting the word vectors of the entities to be classified into a Kmeans model, and determining the number of clusters k with the elbow method;
S403: inputting the word vector representations of the entities to be classified together with the cluster number k into the Kmeans model to obtain the clustering result;
S5: merging and adjusting the classification results of steps S2, S3 and S4 to obtain the final classification result.
2. The large-scale knowledge graph ontology automatic extraction method according to claim 1, wherein step S1 further comprises preprocessing the obtained entities, the preprocessing including punctuation cleaning, filtering of entities with abnormal lengths, and conversion of uppercase letters to lowercase.
3. The large-scale knowledge graph ontology automatic extraction method according to claim 1, wherein the natural language processing word vector technique adopted in step S401 is the bert-base-multilingual-uncased model trained on corpora in 102 languages.
4. The large-scale knowledge graph ontology automatic extraction method according to claim 1, wherein, if the number of entities of a category in the final classification result exceeds a preset number threshold, steps S2 to S5 are performed again on the entities of that category for further classification.
5. A large-scale knowledge graph ontology automatic extraction terminal device, characterized by comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 4 when executing the computer program.
6. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110380611.3A CN113127645B (en) 2021-04-09 2021-04-09 Automatic extraction method of large-scale knowledge graph ontology, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113127645A (en) 2021-07-16
CN113127645B (en) 2022-09-13

Family

ID=76775510

Country Status (1)

Country Link
CN (1) CN113127645B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138923B (en) * 2021-12-03 2024-06-07 Jilin University Method for constructing geological map knowledge graph
CN114691889B (en) * 2022-04-15 2024-04-12 North University of China A method for constructing knowledge graph for fault diagnosis of turnout machine
CN115309906B (en) * 2022-09-19 2023-06-13 北京三维天地科技股份有限公司 Intelligent data classification method based on knowledge graph technology
US12028224B1 (en) 2023-02-17 2024-07-02 International Business Machines Corporation Converting an architecture document to infrastructure as code

Citations (1)

Publication number Priority date Publication date Assignee Title
CN110569405A (en) * 2019-08-26 2019-12-13 CETC Big Data Research Institute Co., Ltd. Method for extracting ontology concepts from government official documents based on BERT

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US10558754B2 (en) * 2016-09-15 2020-02-11 Infosys Limited Method and system for automating training of named entity recognition in natural language processing
CN107330011B (en) * 2017-06-14 2019-03-26 Beijing Ultrapower Software Co., Ltd. Multi-strategy fusion named entity recognition method and device
US10853576B2 (en) * 2018-12-13 2020-12-01 Hong Kong Applied Science and Technology Research Institute Company Limited Efficient and accurate named entity recognition method and apparatus
CN109858018A (en) * 2018-12-25 2019-06-07 Institute of Information Engineering, Chinese Academy of Sciences Entity recognition method and system for threat intelligence

Non-Patent Citations (1)

Title
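Knowledge graph construction based on non-taxonomic relation extraction technology (基于非分类关系提取技术的知识图谱构建); Wei Tao et al.; Industrial Technology Innovation (《工业技术创新》); 2020-04-25 (No. 02); pp. 23-28 *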

Also Published As

Publication number Publication date
CN113127645A (en) 2021-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant