CN112784063A

CN112784063A - Idiom knowledge graph construction method and device

Info

Publication number: CN112784063A
Application number: CN202110116596.1A
Authority: CN
Inventors: 李长亮; 汪美玲; 郭昱; 唐剑波
Original assignee: Chengdu Kingsoft Interactive Entertainment Technology Co ltd; Beijing Kingsoft Software Co Ltd
Current assignee: Chengdu Kingsoft Interactive Entertainment Technology Co ltd; Beijing Kingsoft Software Co Ltd
Priority date: 2019-03-15
Filing date: 2019-03-15
Publication date: 2021-05-11
Anticipated expiration: 2039-03-15
Also published as: CN109977233A; CN112784062A; CN109977233B; CN112784063B; CN112784062B

Abstract

An embodiment of the invention provides an idiom knowledge graph construction method, which comprises the following steps: acquiring a plurality of idioms to be processed and description information of each idiom to be processed; for each idiom to be processed, analyzing the description information of the idiom and determining a label corresponding to the idiom; and constructing a knowledge graph of the plurality of idioms to be processed based on the association relationship between the idioms and their corresponding labels. In this way, a corresponding label can be determined for each idiom to be processed based on its description information, and the knowledge graph is constructed based on the association relationship between labels and idioms. When a user queries idioms, a plurality of corresponding idioms can be determined from a given label. Compared with searching by a specific idiom or a specific keyword, this helps the user obtain idiom information from more aspects and meets the user's idiom usage requirements.

Description

Idiom knowledge graph construction method and device

Technical Field

The invention relates to the technical field of information storage, in particular to an idiom knowledge graph construction method and device.

Background

Existing online idiom dictionaries store a large amount of idiom information, including the pronunciation, paraphrase, source, synonyms and antonyms of each idiom, so that idiom-related services can be provided for users.

In the related art, a relational database is usually used to store idiom information, so that a user can look up the information related to a specific idiom by searching for that idiom, or find the idioms whose paraphrase or related information contains a specific keyword.

However, with a relational database it is difficult for the user to obtain idiom information from more aspects. For example, although "ancient rare year" (古稀之年) and "Mao die year" (耄耋之年) are both idioms about age, the user can hardly retrieve both idioms by searching for "age". The related art therefore has difficulty meeting the user's idiom usage requirements.

Disclosure of Invention

The embodiment of the invention aims to provide an idiom knowledge graph construction method and device, so that idiom information can be obtained from more aspects and the idiom usage requirements of users are met. The specific technical scheme is as follows:

An embodiment of the invention provides an idiom knowledge graph construction method, which comprises the following steps:

acquiring a plurality of idioms to be processed and description information of each idiom to be processed;

analyzing the description information of the idiom to be processed aiming at each idiom to be processed, and determining a label corresponding to the idiom to be processed;

and constructing a knowledge graph of the multiple idioms to be processed based on the association relationship between the multiple idioms to be processed and the labels corresponding to the idioms to be processed.

Optionally, the analyzing, for each idiom to be processed, the description information of the idiom to be processed to determine a tag corresponding to the idiom to be processed includes:

performing word segmentation processing on the description information to obtain a word list corresponding to the idiom to be processed;

and screening the words with semantic similarity meeting preset conditions with the idioms to be processed from the word list to serve as labels corresponding to the idioms to be processed.

Optionally, the performing word segmentation processing on the description information to obtain a word list corresponding to the idiom to be processed includes:

filtering stop words and symbols in the description information to obtain filtering information;

and performing word segmentation processing on the filtering information to obtain a word list corresponding to the idiom to be processed.

Optionally, before the words whose semantic similarity with the to-be-processed idiom satisfies a preset condition are screened from the word list and used as the tags corresponding to the to-be-processed idiom, the method further includes:

acquiring an associated word of each word in the word list, and adding the associated word to the word list;

judging whether the number of the words in the word list changes, if so, returning to the step of acquiring the associated words of each word in the word list and adding the associated words to the word list, and if not, executing the step of screening the words, the semantic similarity of which with the idiom to be processed meets the preset condition, from the word list to serve as the label corresponding to the idiom to be processed.

Optionally, the constructing a knowledge graph of the multiple idioms to be processed based on the association relationship between the multiple idioms to be processed and the tags corresponding to each idiom to be processed includes:

generating idiom entities corresponding to the multiple idioms to be processed and label entities corresponding to labels corresponding to the idioms to be processed respectively;

and establishing an association relationship between each idiom entity and each label entity based on the association relationship between the multiple idioms to be processed and the labels corresponding to the idioms to be processed, to obtain the knowledge graph of the multiple idioms to be processed.

Optionally, the description information includes: the pronunciation, paraphrase and origin of the idiom to be processed.

Optionally, after the knowledge graph is constructed and stored, the method further includes:

acquiring a term to be queried;

querying a label matched with the term to be queried in the knowledge graph as a target label;

and outputting the idioms to be processed corresponding to the target tags.

The embodiment of the invention also provides a device for constructing the idiom knowledge graph, which comprises the following components:

the acquisition module is used for acquiring a plurality of idioms to be processed and the description information of each idiom to be processed;

the determining module is used for analyzing the description information of the idiom to be processed aiming at each idiom to be processed and determining a label corresponding to the idiom to be processed;

and the construction module is used for constructing the knowledge graph of the multiple idioms to be processed based on the incidence relation between the multiple idioms to be processed and the labels corresponding to the idioms to be processed.

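Optionally, the determining module is specifically configured to:

perform word segmentation processing on the description information to obtain a word list corresponding to the idiom to be processed;

and screen, from the word list, the words whose semantic similarity with the idiom to be processed meets a preset condition as the labels corresponding to the idiom to be processed.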

Optionally, the determining module is further configured to:

acquire an associated word of each word in the word list, and add the associated word to the word list;

and judge whether the number of words in the word list has changed; if so, return to the step of acquiring the associated word of each word in the word list and adding the associated word to the word list; if not, execute the step of screening, from the word list, the words whose semantic similarity with the idiom to be processed meets the preset condition as the labels corresponding to the idiom to be processed.

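Optionally, the building module is specifically configured to:

generate idiom entities corresponding to the multiple idioms to be processed and label entities corresponding to the labels of the idioms to be processed, respectively;

and establish an association relationship between each idiom entity and each label entity based on the association relationship between the multiple idioms to be processed and the labels corresponding to the idioms to be processed, to obtain the knowledge graph of the multiple idioms to be processed.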

Optionally, the apparatus further comprises:

the query module is used for acquiring terms to be queried; querying a label matched with the term to be queried in the knowledge graph as a target label; and outputting the idioms to be processed corresponding to the target tags.

The embodiment of the invention also provides electronic equipment which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;

a memory for storing a computer program;

and the processor is used for realizing any idiom knowledge graph construction method when the program stored in the memory is executed.

The embodiment of the invention also provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when being executed by a processor, the computer program realizes any one of the idiom knowledge graph construction methods.

Embodiments of the present invention further provide a computer program product containing instructions, which when run on a computer, cause the computer to execute any one of the idiom knowledge-graph construction methods described above.

In the idiom knowledge graph construction method and device provided by the embodiment of the invention, a plurality of idioms to be processed and the description information of each idiom to be processed are first acquired; then, for each idiom to be processed, the description information of the idiom is analyzed and a label corresponding to the idiom is determined; finally, a knowledge graph of the plurality of idioms to be processed is constructed based on the association relationship between the idioms and the labels corresponding to each idiom. In this way, a corresponding label can be determined for each idiom to be processed based on its description information, and the knowledge graph is constructed based on the association relationship between labels and idioms. When a user queries idioms, a plurality of corresponding idioms can be determined from a given label. Compared with searching by a specific idiom or a specific keyword, this helps the user obtain idiom information from more aspects and meets the user's idiom usage requirements. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of an idiom knowledge graph construction method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of another idiom knowledge-graph building method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an idiom knowledge-graph constructing apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

To address the problems in the related art, the embodiment of the invention provides an idiom knowledge graph construction method, which a computer, a server or other electronic equipment can use to construct the idiom knowledge graph.

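The idiom knowledge graph construction method provided by the embodiment of the invention is generally explained below. The method comprises the following steps:

acquiring a plurality of idioms to be processed and description information of each idiom to be processed;

for each idiom to be processed, analyzing the description information of the idiom to be processed, and determining a label corresponding to the idiom to be processed;

and constructing a knowledge graph of the multiple idioms to be processed based on the association relationship between the multiple idioms to be processed and the labels corresponding to the idioms to be processed.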

As can be seen from the above, the idiom knowledge graph construction method and apparatus provided in the embodiments of the present invention can determine a corresponding tag for each idiom to be processed based on the description information, and construct a knowledge graph based on the association relationship between the tags and the idioms to be processed, when a user performs idiom query, a plurality of corresponding idioms can be determined according to a certain tag, and compared with a method of searching idioms according to a specific idiom or a specific keyword, the method and apparatus are helpful for the user to obtain idiom information from more sides, and meet the idiom use requirements of the user.

The idiom knowledge graph construction method provided by the embodiment of the invention will be described in detail through specific embodiments.

FIG. 1 is a schematic flow chart of an idiom knowledge graph construction method provided in an embodiment of the present invention, which includes the following steps:

S101: acquiring a plurality of idioms to be processed and the description information of each idiom to be processed.

In some scenarios, the electronic device (execution subject) may obtain some to-be-processed idioms, and store the to-be-processed idioms, so that a user can perform operations such as query and browsing on the to-be-processed idioms. And different storage modes of the idiom to be processed correspond to different query modes of the idiom to be processed.

In this step, the idioms to be processed may be idioms with any number of words, each idiom to be processed has its corresponding description information, and these description information may describe the idioms to be processed to distinguish them from other idioms. The description information may include one or more of pronunciation, paraphrase, and provenance information of the idiom to be processed, and is not limited specifically.

S102: for each idiom to be processed, analyzing the description information of the idiom to be processed, and determining a label corresponding to the idiom to be processed.

After obtaining a plurality of idioms to be processed and the description information of each idiom to be processed, the description information of each idiom to be processed can be analyzed, and the label corresponding to each idiom to be processed is respectively determined, wherein each idiom to be processed can correspond to a unique label or a plurality of labels, and different idioms can correspond to the same label or different labels, and the details are not limited.

In one implementation manner, the manner of analyzing the description information of each idiom to be processed and determining the tag corresponding to the idiom to be processed may be: firstly, performing word segmentation processing on description information to obtain a word list corresponding to a to-be-processed idiom, and then screening words with semantic similarity meeting preset conditions with the to-be-processed idiom from the word list to serve as tags corresponding to the to-be-processed idiom.

For example, a shortest path algorithm may be used to perform word segmentation on the description information. First, the description information is split into a plurality of word strings, and an association graph between the word strings is constructed according to the association relationships between them. Then, a preset word-frequency probability algorithm is applied to the association graph to obtain the word-frequency probability of each associated word of each word string. Finally, for each word string, the ambiguity produced when splitting the description information is eliminated according to these word-frequency probabilities, so that the words in the original text are identified more accurately. Alternatively, an n-gram model, a maximum matching algorithm, a cross-ambiguity algorithm, or the like may be adopted, which is not limited in the embodiment of the present invention.
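The shortest-path idea above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patent's implementation: the word-frequency table `WORD_FREQ` and its counts are hypothetical, each candidate word is treated as an edge whose cost is the negative log of its relative frequency, and unknown single characters are given a frequency of 1 so that every position stays reachable.

```python
import math

# Hypothetical word-frequency table, for illustration only.
WORD_FREQ = {"年": 5, "纪": 2, "年纪": 40, "很": 30, "大": 30, "很大": 10}


def segment(text, freq=WORD_FREQ):
    """Split text along the cheapest path through the graph of candidate words."""
    total = sum(freq.values())
    max_len = max(len(word) for word in freq)
    n = len(text)
    # best[i] = (cost of the cheapest path from position i to the end,
    #            end index of the first word on that path)
    best = [(0.0, n)] * (n + 1)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            # Single characters are always allowed, even if unknown.
            if word in freq or j == i + 1:
                cost = -math.log(freq.get(word, 1) / total)
                candidates.append((cost + best[j][0], j))
        best[i] = min(candidates)
    # Follow the stored end indices to read off the words.
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


print(segment("年纪很大"))  # ['年纪', '很大']
```

With these assumed counts, the two-character words are cheaper than their single-character splits, so the ambiguity between "年 / 纪" and "年纪" is resolved in favour of the more frequent word.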

In addition, the semantic similarity between each word and the idiom to be processed may be calculated by using a Jaccard similarity coefficient algorithm or a cosine similarity algorithm, and the words whose semantic similarity with the idiom to be processed satisfies a preset condition are then screened from the word list. Alternatively, a worker may manually review the words in the word list against the idiom to be processed and screen out the words whose semantic similarity satisfies the preset condition. The preset condition may be that the word has the highest semantic similarity, or that the semantic similarity of the word reaches a preset threshold, and is not particularly limited.
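A minimal Python sketch of the two similarity measures and the two kinds of preset condition is given below. How a word and an idiom are represented is not specified in the text, so the feature sets (for Jaccard) and numeric vectors (for cosine) used here are assumptions, and the function names are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity coefficient of two feature collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def select_labels(scores, threshold=None):
    """Apply the preset condition to a {word: similarity} mapping.

    With a threshold, keep every word whose similarity reaches it;
    without one, keep the word(s) with the highest similarity.
    """
    if not scores:
        return []
    if threshold is not None:
        return sorted(w for w, s in scores.items() if s >= threshold)
    top = max(scores.values())
    return sorted(w for w, s in scores.items() if s == top)
```

For instance, `select_labels({"age": 0.8, "rare": 0.3})` keeps only "age", while passing `threshold=0.25` keeps both words.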

When the word segmentation processing is performed on the description information to obtain the word list corresponding to the idiom to be processed, the stop words and symbols in the description information can be filtered to obtain the filtering information, and then the word segmentation processing is performed on the filtering information to obtain the word list corresponding to the idiom to be processed.

Therefore, repeated or useless information in the description information can be filtered, a more effective word list is obtained, and the efficiency and the accuracy of label extraction are further improved.
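As a rough illustration of filtering before segmentation, the Python sketch below strips symbols and stop words from a description and then segments the result with forward maximum matching (one of the alternative algorithms mentioned above). The stop-word list, the lexicon and the sample description are hypothetical.

```python
import re

# Hypothetical stop-word list and lexicon, for illustration only.
STOP_WORDS = {"的", "了", "很"}
LEXICON = {"形容", "年纪", "大"}


def filter_description(text, stop_words=STOP_WORDS):
    """Remove symbols and stop words from the description information."""
    # Drop punctuation, whitespace and other non-word characters.
    text = re.sub(r"[\W_]+", "", text)
    # Remove longer stop words first so they are not split by shorter ones.
    for stop_word in sorted(stop_words, key=len, reverse=True):
        text = text.replace(stop_word, "")
    return text


def max_match(text, lexicon=LEXICON):
    """Forward maximum matching: take the longest lexicon word at each position."""
    max_len = max(len(word) for word in lexicon)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in the lexicon matches.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


filtered = filter_description("形容年纪很大。")  # "形容年纪大"
word_list = max_match(filtered)                 # ["形容", "年纪", "大"]
```

Note that removing stop words by plain substring replacement before segmentation can delete characters that belong to longer words; a production system would likely need a more careful filter.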

In addition, before the words whose semantic similarity with the idiom to be processed meets the preset condition are screened from the word list as the tags corresponding to the idiom to be processed, association summarization processing can be further performed on the words in the word list: an associated word of each word in the word list is acquired and added to the word list, and this is repeated until the number of words in the word list no longer changes.

The associated word of a word may be a synonym or near-synonym of the word; for example, for a word meaning "age", the associated word may be another word with the same meaning. The associated word may also be a hypernym of the word; for example, if the word is "fifty", the associated word may be "age". The associated word of each word can be obtained by looking the word up in a preset semantic dictionary, or by inputting the word into a pre-trained algorithm model, which is not specifically limited.

Therefore, words in the word list can be richer and more general, and the efficiency and the accuracy of label extraction are further improved.

For example, the description information of "ancient rare year" (古稀之年) is "a person living to the age of seventy has been rare since ancient times", and the description information of "Mao die year" (耄耋之年) is "very old". After these two idioms to be processed and their description information are obtained, word segmentation can be performed on the description information to obtain the word list corresponding to each idiom: the word list for "ancient rare year" may be "people \ seventy years old \ rare", and the word list for "Mao die year" may be "old \ big". The words in the word lists can then be summarized at a higher level; for example, the associated word of "seventy years old" is "age", and the associated word of "old" is also "age". Finally, the word with the highest similarity to each idiom to be processed can be screened from its word list as the label corresponding to that idiom, so the labels corresponding to "ancient rare year" and "Mao die year" may both be "age".

S103: constructing a knowledge graph of the multiple idioms to be processed based on the association relationship between the multiple idioms to be processed and the labels corresponding to the idioms to be processed.

A knowledge graph, also called a scientific knowledge graph, is a visual mapping of a knowledge domain that can describe knowledge resources and their carriers using visualization technology. That is, based on the association relationship between the multiple idioms to be processed and the tags corresponding to each idiom, a knowledge graph of the idioms can be constructed so that the idioms are described visually, and the user can mine, analyze, construct, draw and display the idioms and the interrelations among them.

For example, the knowledge graph of the multiple idioms to be processed may be constructed as follows: first, idiom entities corresponding to the idioms to be processed and label entities corresponding to the labels of the idioms are generated respectively; then, based on the association relationship between the idioms to be processed and their corresponding labels, an association is established between each idiom entity and each label entity, to obtain the knowledge graph of the idioms to be processed.
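The two steps above (generate entities, then link them) could look like the following Python sketch. The in-memory representation, the integer entity ids and the relation name `has_label` are assumptions made for illustration; the text does not prescribe a storage format.

```python
def build_knowledge_graph(idiom_labels):
    """Build entities and edges from a {idiom: [label, ...]} mapping.

    Returns (entities, edges), where entities maps (kind, name) to an id
    and edges is a set of (idiom_id, "has_label", label_id) triples.
    """
    entities = {}
    edges = set()
    for idiom, labels in idiom_labels.items():
        # One idiom entity per idiom to be processed.
        idiom_id = entities.setdefault(("idiom", idiom), len(entities))
        for label in labels:
            # Label entities are shared by all idioms that carry the label.
            label_id = entities.setdefault(("label", label), len(entities))
            edges.add((idiom_id, "has_label", label_id))
    return entities, edges


entities, edges = build_knowledge_graph({
    "ancient rare year": ["age"],
    "Mao die year": ["age"],
})
```

Here both idiom entities end up linked to the single shared label entity for "age", which is what later allows a query by label to reach both idioms.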

In one implementation, after the knowledge graph of the plurality of idioms to be processed is constructed, a user can use the knowledge graph to perform idiom queries.

For example, a user may input any term to be queried. After obtaining the term to be queried, the electronic device (execution subject) may query the knowledge graph for a tag that matches the term as the target tag, and then output the idioms to be processed corresponding to the target tag. In this way, the user can find all idioms related to the term to be queried.

For example, when the user inputs "age", the electronic device (execution subject) may query the knowledge graph for the tag matching "age" as the target tag, and then output the idioms to be processed corresponding to the target tag, such as "ancient rare year" and "Mao die year", thereby further meeting the user's idiom usage requirements.
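A query of this kind can be sketched in Python with an index from labels to idioms. The class name `IdiomGraph` is hypothetical, and matching a term to a label by exact string equality is an assumption; the text leaves the matching rule open.

```python
from collections import defaultdict


class IdiomGraph:
    """Minimal label-to-idiom index built from (idiom, label) pairs."""

    def __init__(self, pairs):
        self._idioms_by_label = defaultdict(set)
        for idiom, label in pairs:
            self._idioms_by_label[label].add(idiom)

    def query(self, term):
        """Return the idioms whose label matches the term (exact match assumed)."""
        return sorted(self._idioms_by_label.get(term, ()))


graph = IdiomGraph([
    ("ancient rare year", "age"),
    ("Mao die year", "age"),
])
print(graph.query("age"))  # ['Mao die year', 'ancient rare year']
```

A term that matches no label simply yields an empty list.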

As can be seen from the above, the idiom knowledge graph construction method provided in the embodiment of the present invention can determine a corresponding tag for each idiom to be processed, construct a knowledge graph based on an association relationship between the tags and the idioms to be processed, and store the knowledge graph, so that the stored idioms are more organized, so that a user can query the idioms according to the tags.

FIG. 2 is a schematic flow chart of another idiom knowledge graph construction method provided in the embodiment of the present invention, which includes the following steps:

S201: acquiring a plurality of idioms to be processed and the description information of each idiom to be processed.

S202: filtering stop words and symbols in the description information to obtain filtering information.

The stop words and symbols in the description information can be filtered to obtain filtered information, and then the filtered information is subjected to word segmentation to obtain a word list corresponding to the idioms to be processed.

S203: performing word segmentation processing on the filtering information to obtain a word list corresponding to the idioms to be processed.

For example, a shortest path algorithm may be used to perform word segmentation on the filtering information. First, the filtering information is split into a plurality of word strings, and an association graph between the word strings is constructed according to the association relationships between them. Then, a preset word-frequency probability algorithm is applied to the association graph to obtain the word-frequency probability of each associated word of each word string. Finally, for each word string, the ambiguity produced during splitting is eliminated according to these word-frequency probabilities, so that the words in the original text are identified more accurately.

Alternatively, an n-gram model method, a maximum matching algorithm, a cross ambiguity algorithm, and the like may also be adopted, which is not limited in the embodiment of the present invention.

S204: acquiring the associated words of each word in the word list, and adding the associated words to the word list.

S205: judging whether the number of words in the word list has changed; if so, returning to S204; if not, executing S206.

For example, an associated word of each word in the word list is first obtained and added to the word list. It is then judged whether the number of words in the word list has changed. If it has, the words in the word list continue to be summarized until the number of words no longer changes, after which the words whose semantic similarity to the idiom to be processed meets the preset condition are screened from the word list as the tags corresponding to the idiom to be processed.
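The loop formed by S204 and S205 is a fixed-point iteration, which might be sketched in Python as follows. The associated-word dictionary here is a hypothetical stand-in for the preset semantic dictionary or pre-trained model mentioned earlier.

```python
def expand_word_list(words, associated):
    """Add associated words until the word list stops growing.

    associated: {word: iterable of associated words} (hypothetical dictionary).
    """
    word_list = list(dict.fromkeys(words))  # de-duplicate, keep order
    while True:
        size_before = len(word_list)
        seen = set(word_list)
        # S204: look up associated words for every word currently in the list.
        for word in list(word_list):
            for candidate in associated.get(word, ()):
                if candidate not in seen:
                    seen.add(candidate)
                    word_list.append(candidate)
        # S205: stop once a full pass adds nothing new.
        if len(word_list) == size_before:
            return word_list


expanded = expand_word_list(
    ["seventy years old", "rare"],
    {"seventy years old": ["age"], "age": ["time"]},
)
# expanded == ["seventy years old", "rare", "age", "time"]
```

Because words already in the list are never added again, the loop terminates whenever the associated-word source is finite, even if associations are cyclic.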

S206: screening, from the word list, the words whose semantic similarity with the idiom to be processed meets the preset condition as the labels corresponding to the idiom to be processed.

For example, a Jaccard similarity coefficient algorithm or a cosine similarity algorithm may be used to calculate the semantic similarity between each word and the idiom to be processed, and the words whose semantic similarity with the idiom to be processed satisfies a preset condition are then selected from the word list. Alternatively, a worker may manually review the words in the word list against the idiom to be processed and select the words whose semantic similarity satisfies the preset condition. The preset condition may be that the word has the highest semantic similarity, or that the semantic similarity of the word reaches a preset threshold, and is not particularly limited.

Each idiom to be processed may correspond to a unique tag or to multiple tags, and different idioms may correspond to the same tag or to different tags, which is not limited specifically.

S207: constructing a knowledge graph of the multiple idioms to be processed based on the association relationship between the multiple idioms to be processed and the labels corresponding to the idioms to be processed.

The embodiment of the present invention further provides an idiom knowledge graph constructing apparatus. FIG. 3 is a schematic structural diagram of the idiom knowledge graph constructing apparatus provided in the embodiment of the present invention, and the apparatus includes:

an obtaining module 301, configured to obtain multiple idioms to be processed and description information of each idiom to be processed;

a determining module 302, configured to analyze, for each idiom to be processed, description information of the idiom to be processed, and determine a tag corresponding to the idiom to be processed;

a building module 303, configured to build a knowledge graph of the multiple idioms to be processed based on the association relationship between the multiple idioms to be processed and the tags corresponding to each idiom to be processed.

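In an implementation manner, the determining module 302 is specifically configured to:

perform word segmentation processing on the description information to obtain a word list corresponding to the idiom to be processed;

and screen, from the word list, the words whose semantic similarity with the idiom to be processed meets a preset condition as the tags corresponding to the idiom to be processed.

In one implementation, the determining module 302 is further configured to:

acquire an associated word of each word in the word list, and add the associated word to the word list;

and judge whether the number of words in the word list has changed; if so, return to the step of acquiring the associated word of each word in the word list; if not, execute the step of screening the tags from the word list.

In an implementation manner, the building module 303 is specifically configured to:

generate idiom entities corresponding to the multiple idioms to be processed and tag entities corresponding to the tags of the idioms to be processed, respectively;

and establish an association relationship between each idiom entity and each tag entity based on the association relationship between the multiple idioms to be processed and their corresponding tags, to obtain the knowledge graph of the multiple idioms to be processed.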

In one implementation, the description information includes: the pronunciation, paraphrase and origin of the idiom to be processed.

In one implementation, the apparatus further includes:

a query module 304, configured to obtain a term to be queried; querying a label matched with the term to be queried in the knowledge graph as a target label; and outputting the idioms to be processed corresponding to the target tags.
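The division into modules 301 to 304 can be pictured as a small Python class whose stages are pluggable. This is only a structural sketch: the class name, the callable signatures and the use of (idiom, tag) pairs as the stored graph are all assumptions, and the tagging logic is deliberately left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class IdiomGraphDevice:
    # Module 301: returns {idiom: description information}.
    acquire: Callable[[], Dict[str, str]]
    # Module 302: maps (idiom, description information) to its tags.
    determine: Callable[[str, str], Iterable[str]]
    # Graph produced by module 303, stored as (idiom, tag) pairs.
    edges: Set[Tuple[str, str]] = field(default_factory=set)

    def build(self) -> None:
        """Module 303: link every idiom to each of its tags."""
        for idiom, description in self.acquire().items():
            for tag in self.determine(idiom, description):
                self.edges.add((idiom, tag))

    def query(self, term: str) -> List[str]:
        """Module 304: output the idioms whose tag matches the term."""
        return sorted(idiom for idiom, tag in self.edges if tag == term)
```

Keeping acquisition and tag determination as injected callables mirrors the way the apparatus separates its modules, so either stage can be swapped without touching graph construction or querying.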

As can be seen from the above, the idiom knowledge graph constructing device provided in the embodiment of the present invention can determine a corresponding tag for each idiom to be processed, construct a knowledge graph based on an association relationship between the tags and the idioms to be processed, and store the knowledge graph, so that the stored idioms are more organized, so that a user can query the idioms according to the tags.

An embodiment of the present invention further provides an electronic device, as shown in fig. 4, including a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 complete mutual communication through the communication bus 404,

a memory 403 for storing a computer program;

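the processor 401, when executing the program stored in the memory 403, implements the following steps:

acquiring a plurality of idioms to be processed and description information of each idiom to be processed;

for each idiom to be processed, analyzing the description information of the idiom to be processed, and determining a label corresponding to the idiom to be processed;

and constructing a knowledge graph of the multiple idioms to be processed based on the association relationship between the multiple idioms to be processed and the labels corresponding to the idioms to be processed.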

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.

As can be seen from the above, the idiom knowledge graph construction method and apparatus provided in the embodiments of the present invention can determine a corresponding tag for each idiom to be processed, construct a knowledge graph based on an association relationship between the tags and the idioms to be processed, and store the knowledge graph, so that the stored idioms are more organized, so that a user can query the idioms according to the tags.

In another embodiment of the present invention, there is further provided a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to execute the idiom knowledge graph construction method according to any one of the above embodiments.

In yet another embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the idiom knowledge graph construction method of any one of the above embodiments.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It should be noted that relational terms such as "first" and "second" are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Likewise, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a related manner; for parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiment, the electronic device embodiment and the storage medium embodiment are described relatively briefly because they are substantially similar to the method embodiment; for the relevant details, reference may be made to the corresponding parts of the description of the method embodiment.

The above description covers only preferred embodiments of the present invention and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims

1. An idiom knowledge graph construction method, characterized by comprising the following steps:

for each word in the word list, performing association summarization processing on the word list based on the associated word of the word;

screening words with semantic similarity meeting preset conditions with the idiom to be processed from the processed word list, and taking the words as labels corresponding to the idiom to be processed;
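As an illustration of the screening step, the following minimal sketch assumes that a similarity score has already been computed for every word in the processed word list and that the "preset condition" is a score threshold, optionally combined with a cap on the number of labels. The function name `screen_labels`, the threshold value and the `top_k` parameter are hypothetical; the claim itself does not fix the form of the condition.

```python
def screen_labels(scores, threshold=0.5, top_k=None):
    """Keep words whose similarity score meets the (assumed) preset condition.

    scores: dict mapping each candidate word to its similarity with the idiom.
    threshold: assumed form of the preset condition (score >= threshold).
    top_k: optional cap on the number of labels returned.
    """
    kept = [(word, score) for word, score in scores.items() if score >= threshold]
    # Highest score first; ties are broken alphabetically for determinism.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    if top_k is not None:
        kept = kept[:top_k]
    return [word for word, _ in kept]


# Illustrative scores only; they are not taken from the source.
example_scores = {"persistence": 0.8, "water": 0.2, "effort": 0.6}
labels = screen_labels(example_scores)
```

With the illustrative scores above, only the words at or above the threshold remain as labels for the idiom.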

2. The method according to claim 1, wherein constructing the knowledge graph of the plurality of idioms to be processed based on the association relationship between the plurality of idioms to be processed and the label corresponding to each idiom to be processed comprises:

3. The method of claim 1, wherein the description information comprises: the pronunciation, paraphrase and origin of the idiom to be processed.

4. The method of claim 1, further comprising:

after the knowledge graph is stored, obtaining a term to be queried;

and outputting the idioms to be processed corresponding to the target labels.
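The label-based query can be sketched as follows. This is a hedged illustration only: the knowledge graph is reduced to a mapping from each label to the idioms associated with it, and the target label is assumed to be the label that exactly matches the queried term. The names `build_graph` and `query`, the matching rule and the example labels are assumptions, not part of the claims.

```python
from collections import defaultdict


def build_graph(idiom_labels):
    """Build a label -> idioms mapping from the idiom -> labels associations."""
    label_to_idioms = defaultdict(set)
    for idiom, labels in idiom_labels.items():
        for label in labels:
            label_to_idioms[label].add(idiom)
    return dict(label_to_idioms)


def query(graph, term):
    """Return the idioms whose label equals the queried term (exact match assumed)."""
    return sorted(graph.get(term, ()))


# Illustrative idioms and labels; the label assignments are hypothetical.
graph = build_graph({
    "水滴石穿": ["坚持"],
    "锲而不舍": ["坚持"],
    "画蛇添足": ["多余"],
})
persistent_idioms = query(graph, "坚持")
```

A single label thus returns every idiom associated with it, which is the behaviour the abstract describes as querying idioms by label rather than by a specific idiom or keyword.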

5. The method according to claim 1, wherein performing word segmentation processing on the description information to obtain a word list corresponding to the idiom to be processed comprises:

and performing word segmentation processing on the description information by adopting a shortest path algorithm to obtain a word list corresponding to the idiom to be processed.

6. The method of claim 5, wherein performing word segmentation processing on the description information by using a shortest path algorithm to obtain a word list corresponding to the idiom to be processed comprises:

segmenting the description information to obtain a plurality of word string data;

constructing an association graph among the word string data according to the association relationship among the word string data;

processing the association graph by using a preset word frequency probability algorithm to obtain the word frequency probability of each associated word of the word string data;

and for each word string data, obtaining a word list corresponding to the idiom to be processed according to the word frequency probability of each associated word of the word string data.
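The shortest-path segmentation described in claims 5 and 6 can be illustrated with a minimal sketch. It assumes that the association graph is a word graph over character positions, that each edge is a dictionary word weighted by the negative logarithm of its word frequency probability, and that the cheapest path gives the word list. The frequency table `WORD_FREQ`, the fallback probability and the maximum word length are hypothetical values chosen for illustration.

```python
import math

# Hypothetical word-frequency probabilities (illustrative values only).
WORD_FREQ = {
    "研究": 0.020, "研究生": 0.005, "生命": 0.015,
    "命": 0.001, "起源": 0.010, "生": 0.002,
}
UNKNOWN_PROB = 1e-8  # assumed fallback for single characters not in the table
MAX_WORD_LEN = 4     # assumed upper bound on word length


def segment(text):
    """Segment text by finding the cheapest path through its word graph.

    Node i is the position before character i. An edge (i, j) exists when
    text[i:j] is a known word or a single character, and costs -log(prob).
    """
    n = len(text)
    best_cost = [math.inf] * (n + 1)
    back = [0] * (n + 1)
    best_cost[0] = 0.0
    # Positions are already in topological order, so one forward pass suffices.
    for i in range(n):
        for j in range(i + 1, min(n, i + MAX_WORD_LEN) + 1):
            word = text[i:j]
            prob = WORD_FREQ.get(word)
            if prob is None:
                if j - i > 1:
                    continue  # unknown multi-character strings are not edges
                prob = UNKNOWN_PROB
            cost = best_cost[i] - math.log(prob)
            if cost < best_cost[j]:
                best_cost[j] = cost
                back[j] = i
    # Walk the back-pointers from the end to recover the word list.
    words, j = [], n
    while j > 0:
        i = back[j]
        words.append(text[i:j])
        j = i
    return words[::-1]


word_list = segment("研究生命起源")
```

Minimising the summed negative log probabilities is equivalent to maximising the product of the word probabilities, so the ambiguous prefix is resolved in favour of the more probable split.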

7. The method according to claim 1, wherein before screening, from the processed word list, words whose semantic similarity with the idiom to be processed meets a preset condition as labels corresponding to the idiom to be processed, the method further comprises:

calculating the semantic similarity between each word in the processed word list and the idiom to be processed based on a preset similarity algorithm; wherein the preset similarity algorithm is a Jaccard similarity coefficient algorithm or a cosine similarity algorithm.
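The two similarity measures named in claim 7 can be sketched as follows. The claim does not state how a word and an idiom are converted into comparable representations, so this sketch assumes that each is given as a collection of tokens (for example, characters or words from its description); the function names are hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient of two token collections: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a, b):
    """Cosine similarity of two token collections using token-count vectors."""
    count_a, count_b = Counter(a), Counter(b)
    dot = sum(count_a[t] * count_b[t] for t in count_a.keys() & count_b.keys())
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Both functions return a value between 0 and 1 for these inputs, so either can feed the same screening condition; in practice the cosine measure could equally be applied to learned word vectors instead of token counts.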

8. The method of claim 1, wherein the associated word of the word comprises at least one of: a synonym of the word, and a hypernym of the word.

9. The method of claim 1, wherein before performing association summarization processing on the word list based on the associated words of each word in the word list, the method further comprises:

for each word in the word list, looking up the word in a preset semantic dictionary to obtain the associated words of the word; or,

for each word in the word list, inputting the word into a pre-trained algorithm model to obtain the associated words of the word.
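The dictionary lookup of claim 9, together with one possible reading of the association summarization of claim 1, can be sketched as follows. The dictionary contents and the summarization rule are assumptions: synonyms are collapsed onto the first-seen word and hypernyms are added once. The claims shown here do not define the summarization rule, so this is only one plausible interpretation.

```python
# Hypothetical semantic dictionary (illustrative entries only).
SEMANTIC_DICT = {
    "persist": {"synonyms": ["persevere"], "hypernyms": ["continue"]},
    "persevere": {"synonyms": ["persist"], "hypernyms": ["continue"]},
    "stone": {"synonyms": [], "hypernyms": ["material"]},
}


def associated_words(word):
    """Look up the synonyms and hypernyms of a word in the semantic dictionary."""
    entry = SEMANTIC_DICT.get(word, {})
    return entry.get("synonyms", []) + entry.get("hypernyms", [])


def summarize(word_list):
    """Assumed summarization rule: merge synonyms, add hypernyms, drop duplicates."""
    result, seen, canonical = [], set(), {}
    for word in word_list:
        entry = SEMANTIC_DICT.get(word, {})
        # A word already recorded as a synonym maps to its representative.
        representative = canonical.get(word, word)
        for synonym in entry.get("synonyms", []):
            canonical.setdefault(synonym, representative)
        for candidate in [representative] + entry.get("hypernyms", []):
            if candidate not in seen:
                seen.add(candidate)
                result.append(candidate)
    return result


processed = summarize(["persist", "stone", "persevere"])
```

Under this rule the processed word list is shorter where synonyms overlap and richer where hypernyms add more general candidates, which gives the later screening step a more compact set of label candidates.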

10. An idiom knowledge graph construction apparatus, the apparatus comprising:

the word list generating module is used for carrying out word segmentation processing on the description information to obtain a word list corresponding to the idiom to be processed;

the association summarization processing module is used for carrying out association summarization processing on the word list based on the associated words of each word in the word list;

the label acquisition module is used for screening the words of which the semantic similarity with the idiom to be processed meets the preset condition from the processed word list to be used as labels corresponding to the idiom to be processed;

11. The apparatus according to claim 10, wherein the building block is specifically configured to:

12. The apparatus of claim 10, wherein the description information comprises: the pronunciation, paraphrase and origin of the idiom to be processed.

13. The apparatus of claim 10, further comprising:

14. The apparatus according to claim 10, wherein the word list generating module is specifically configured to perform word segmentation processing on the description information by using a shortest path algorithm to obtain a word list corresponding to the to-be-processed idiom.

15. The apparatus according to claim 14, wherein the word list generating module is specifically configured to perform segmentation processing on the description information to obtain a plurality of word string data;

16. The apparatus of claim 10, further comprising:

the semantic similarity calculation module is used for calculating, based on a preset similarity algorithm, the semantic similarity between each word in the word list and the idiom to be processed, before the words whose semantic similarity with the idiom to be processed meets a preset condition are screened from the word list as labels corresponding to the idiom to be processed; wherein the preset similarity algorithm is a Jaccard similarity coefficient algorithm or a cosine similarity algorithm.

17. The apparatus of claim 10, wherein the associated word of the word comprises at least one of: a synonym of the word, and a hypernym of the word.

18. The apparatus of claim 10, further comprising:

the associated word acquisition module is used for looking up each word in the word list in a preset semantic dictionary to obtain the associated words of the word; or,

19. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the method of any one of claims 1 to 9 when executing a program stored in a memory.

20. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 9.