CN114579755A - Method and device for constructing traditional Chinese medicine knowledge map

Info

Publication number
CN114579755A
Authority
CN
China
Legal status
Pending
Application number
CN202210094947.8A
Inventor
李响
胡鑫平
刘沛丰
李井娜
程佩玉
Current Assignee
Beijing Borui Tongyun Technology Co ltd
Original Assignee
Beijing Borui Tongyun Technology Co ltd
Application filed by Beijing Borui Tongyun Technology Co ltd
Priority to CN202210094947.8A
Publication of CN114579755A
Legal status: Pending (current)

Classifications

    • G06F16/367 Ontology
    • G06F40/279 Recognition of textual entities
    • G06F40/30 Semantic analysis
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G16H20/90 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to alternative medicines, e.g. homeopathy or oriental medicines
    • G16H70/20 ICT specially adapted for the handling or processing of medical references relating to practices or guidelines

Abstract

The invention provides a method and a device for constructing a traditional Chinese medicine (TCM) knowledge graph. The method comprises: pre-establishing a TCM entity relationship extraction model, which splits an input TCM knowledge text into three text segments, each containing the two marked TCM entities, and identifies from the three segments semantic information about the relationship between the two entities; training the model to obtain a trained TCM entity relationship extraction model; using the trained model to extract entity relationships from a plurality of TCM knowledge texts, yielding the relationships among the TCM entities; and generating a TCM knowledge graph from those entity relationships. The method and device can improve the reliability of the TCM entity relationships in the TCM knowledge graph.

Description

Method and device for constructing traditional Chinese medicine knowledge map
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to a method and a device for constructing a traditional Chinese medicine knowledge graph.
Background
Traditional Chinese medicine is a valuable medical heritage, and its digitization is receiving increasing attention. Constructing a traditional Chinese medicine knowledge graph is an important technical means of digitizing traditional Chinese medicine.
At present, a traditional Chinese medicine knowledge graph is mainly constructed as follows: traditional Chinese medicine entity relationships are extracted from a large body of traditional Chinese medicine literature, and the knowledge graph is built from these relationships.
The accuracy of entity relationship extraction largely determines the reliability of the resulting knowledge graph. Conventional algorithms extract traditional Chinese medicine entity relationships with low accuracy, so the knowledge graphs constructed from them have low reliability.
Disclosure of Invention
Embodiments of the invention provide a method and a device for constructing a traditional Chinese medicine knowledge graph that can improve the reliability of the traditional Chinese medicine entity relationships in the knowledge graph.
In a first aspect, an embodiment of the present invention provides a method for constructing a traditional Chinese medicine knowledge graph, where the method includes:
pre-establishing a TCM entity relationship extraction model, wherein the model is used to split an input TCM knowledge text into three text segments, each text segment containing the two marked TCM entities, and to identify, from the three text segments, semantic information about the entity relationship between the two TCM entities;
training the TCM entity relationship extraction model to generate a trained TCM entity relationship extraction model;
performing entity relationship extraction on a plurality of TCM knowledge texts using the trained model to generate the entity relationships among the TCM entities;
and generating a TCM knowledge graph according to the entity relationships among the TCM entities.
In a first implementation of the first aspect, performing entity relationship extraction on the plurality of TCM knowledge texts using the trained TCM entity relationship extraction model to generate the entity relationships among the TCM entities includes:
performing the following steps for each TCM knowledge text:
splitting the TCM knowledge text into three text segments, each text segment containing the two marked TCM entities;
determining a feature vector of each text segment;
generating a feature matrix of the TCM knowledge text according to the feature vectors of the three text segments;
performing a convolution operation on the feature matrix to generate a feature map of the TCM knowledge text;
performing a pooling operation on the feature map to generate a pooled vector of the TCM knowledge text;
and inputting the pooled vector into the output layer of the TCM entity relationship extraction model to extract the entity relationship between the two TCM entities in the TCM knowledge text.
In a second implementation of the first aspect, generating the feature matrix of the TCM knowledge text according to the feature vectors of the three text segments includes:
performing the following steps for the feature vector of each text segment:
determining a feature matrix of the text segment according to a first formula and the feature vector of the text segment, wherein the first formula is:
[Formula image BDA0003490664360000021]
where $Z_q$ is the feature matrix of the $q$-th text segment, $C_q$ is the feature vector of the $q$-th text segment, $B$ is a preset initial matrix, $D$ is the difference between the entity vectors of the two TCM entities in the TCM knowledge text, and $C_r$ is the feature vector of the $r$-th text segment;
and generating the feature matrix of the TCM knowledge text according to the feature matrices of the three text segments.
In a third implementation of the first aspect, performing the pooling operation on the feature map to generate the pooled vector includes:
pooling the feature map with a max pooling algorithm to generate a first vector;
pooling the feature map with an average pooling algorithm to generate a second vector;
and generating the pooled vector according to the first vector and the second vector.
In a fourth implementation of the first aspect, training the TCM entity relationship extraction model includes:
training the TCM entity relationship extraction model using a first loss function, wherein the first loss function is:
$$L = \bigl(1 - \operatorname{softmax}(dm + h)\bigr)\bigl(\lVert\delta\rVert^{2} + \lg \operatorname{softmax}(dm + h)\bigr)$$
where $L$ is the value of the first loss function, $\delta$ is a hyperparameter of the TCM entity relationship extraction model, $\tau$ is a preset sample adjustment parameter, $d$ is a preset first proportionality coefficient, $h$ is a preset second proportionality coefficient, and $m$ is the pooled vector of the sample.
In a second aspect, an embodiment of the present invention provides a device for constructing a TCM knowledge graph, the device including:
a training module, configured to train a pre-established TCM entity relationship extraction model to generate a trained TCM entity relationship extraction model, wherein the model is used to split an input TCM knowledge text into three text segments, each text segment containing the two marked TCM entities, and to identify, from the three text segments, semantic information about the entity relationship between the two TCM entities;
an extraction module, configured to perform entity relationship extraction on a plurality of TCM knowledge texts using the trained model to generate the entity relationships among the TCM entities;
and a generating module, configured to generate the TCM knowledge graph according to the entity relationships among the TCM entities.
In a first implementation of the second aspect, the extraction module is specifically configured to:
perform the following steps for each TCM knowledge text:
splitting the TCM knowledge text into three text segments, each text segment containing the two marked TCM entities;
determining a feature vector of each text segment;
generating a feature matrix of the TCM knowledge text according to the feature vectors of the three text segments;
performing a convolution operation on the feature matrix to generate a feature map of the TCM knowledge text;
performing a pooling operation on the feature map to generate a pooled vector of the TCM knowledge text;
and inputting the pooled vector into the output layer of the TCM entity relationship extraction model to extract the entity relationship between the two TCM entities in the TCM knowledge text.
In a second implementation of the second aspect, when generating the feature matrix of the TCM knowledge text according to the feature vectors of the three text segments, the extraction module is specifically configured to:
perform the following steps for the feature vector of each text segment:
determining a feature matrix of the text segment according to a first formula and the feature vector of the text segment, wherein the first formula is:
[Formula image BDA0003490664360000041]
where $Z_q$ is the feature matrix of the $q$-th text segment, $C_q$ is the feature vector of the $q$-th text segment, $B$ is a preset initial matrix, $D$ is the difference between the entity vectors of the two TCM entities in the TCM knowledge text, and $C_r$ is the feature vector of the $r$-th text segment;
and generating the feature matrix of the TCM knowledge text according to the feature matrices of the three text segments.
In a third aspect, an embodiment of the present invention provides a device for constructing a TCM knowledge graph, including at least one memory and at least one processor;
the at least one memory is configured to store a machine-readable program;
the at least one processor is configured to invoke the machine-readable program to perform the method for constructing a TCM knowledge graph according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable medium storing computer instructions which, when executed by a processor, cause the processor to perform the method for constructing a TCM knowledge graph according to any implementation of the first aspect.
In the embodiments of the invention, each TCM knowledge text to be processed contains two marked TCM entities. The TCM entity relationship extraction model splits the text into three text segments, each of which contains both entities. The model can therefore extract semantic information about the two entities from different parts of the text and give particular weight to the content between them, because that content appears in all three segments. As a result, the extracted relationship between the two entities is more accurate, and the TCM knowledge graph generated from the extracted relationships is more reliable.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for constructing a TCM knowledge graph according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a device for constructing a TCM knowledge graph according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Extracting accurate TCM entity relationships from a large amount of TCM data is the key to constructing a reliable TCM knowledge graph. Current algorithms for extracting TCM entity relationships rely mainly on the positional relationship and the co-occurrence frequency of two entities: if two entities are close together in a text, they are judged to be associated by some entity relationship; likewise, if two entities frequently co-occur in the same text, they are judged to be associated. However, TCM texts are generally complex, so these algorithms extract TCM entity relationships with low accuracy, and the TCM knowledge graphs built from the extracted relationships have low reliability.
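For contrast, the conventional proximity and co-occurrence approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not any specific prior-art system: the function name `baseline_relations`, the thresholds `max_distance` (measured in characters) and `min_count`, and the choice to require both criteria together are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def baseline_relations(
    texts: Iterable[str], entities: List[str], max_distance: int = 20, min_count: int = 2
) -> Set[Tuple[str, str]]:
    """Hypothetical prior-art heuristic: pair two entities when they appear close
    together in a text, and keep pairs that do so in enough texts."""
    counts: Counter = Counter()
    for text in texts:
        # Character offset of the first occurrence of each entity found in this text.
        positions = {e: text.find(e) for e in entities if e in text}
        for a, b in combinations(sorted(positions), 2):
            # Proximity criterion: the two entities appear close together.
            if abs(positions[a] - positions[b]) <= max_distance:
                counts[(a, b)] += 1
    # Co-occurrence criterion: keep pairs that were close together often enough.
    return {pair for pair, n in counts.items() if n >= min_count}


texts = [
    "cucumber vine can treat gastritis",
    "cucumber vine boiled in water treats gastritis",
    "gastritis is common",
]
print(baseline_relations(texts, ["cucumber vine", "gastritis"], max_distance=40, min_count=2))
# {('cucumber vine', 'gastritis')}
```

The heuristic only reports that two entities are associated; it cannot say what the relationship is, and a small change to the distance threshold changes the result, which reflects the weakness described above.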
To solve these problems in the prior art, as shown in FIG. 1, an embodiment of the present invention provides a method for constructing a TCM knowledge graph, including:
step 101: pre-establishing a TCM entity relationship extraction model, wherein the model is used to split an input TCM knowledge text into three text segments, each text segment containing the two marked TCM entities, and to identify, from the three text segments, semantic information about the entity relationship between the two TCM entities;
step 102: training the TCM entity relationship extraction model to generate a trained TCM entity relationship extraction model;
step 103: performing entity relationship extraction on a plurality of TCM knowledge texts using the trained model to generate the entity relationships among the TCM entities;
step 104: generating a TCM knowledge graph according to the entity relationships among the TCM entities.
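Steps 103 and 104 can be sketched as a small driver loop. This is an illustrative sketch only: `construct_graph` and the `extract` callable (a stand-in for the trained TCM entity relationship extraction model) are hypothetical names, and representing the knowledge graph as an adjacency map from each head entity to its (relation, tail entity) pairs is an assumption, since the text does not specify a storage format.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

# Hypothetical stand-in for the trained model: (text, entity_a, entity_b) -> relation or None.
Extractor = Callable[[str, str, str], Optional[str]]


def construct_graph(
    samples: Iterable[Tuple[str, str, str]], extract: Extractor
) -> Dict[str, Set[Tuple[str, str]]]:
    """Run relation extraction on each (text, entity_a, entity_b) sample (step 103)
    and collect the resulting triples into an adjacency map (step 104)."""
    graph: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for text, entity_a, entity_b in samples:
        relation = extract(text, entity_a, entity_b)
        if relation is not None:  # skip pairs for which no relationship is extracted
            graph[entity_a].add((relation, entity_b))
    return dict(graph)


def toy_model(text: str, entity_a: str, entity_b: str) -> Optional[str]:
    """Toy keyword rule used only to exercise the driver; not a real model."""
    return "treats" if "treat" in text else None


samples = [
    ("Keep drinking cucumber vine boiled in water, and it can treat gastritis", "cucumber vine", "gastritis"),
    ("Gastritis is common in winter", "Gastritis", "winter"),
]
print(construct_graph(samples, toy_model))  # {'cucumber vine': {('treats', 'gastritis')}}
```

In a real system the toy rule would be replaced by the trained model from step 102, and the adjacency map could be exported to whatever graph store is used.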
In the embodiments of the invention, each TCM knowledge text to be processed contains two marked TCM entities. The TCM entity relationship extraction model splits the text into three text segments, each of which contains both entities. The model can therefore extract semantic information about the two entities from different parts of the text and give particular weight to the content between them, because that content appears in all three segments. As a result, the extracted relationship between the two entities is more accurate, and the TCM knowledge graph generated from the extracted relationships is more reliable.
The input of the TCM entity relationship extraction model is a TCM knowledge text, and the output is the entity relationship between the two TCM entities in that text.
The following describes in detail how a TCM knowledge text is split into three text segments, using this TCM knowledge text as an example:
"Keep drinking cucumber vine boiled in water, and it can treat gastritis; many friends have tried it and found it very effective."
The two TCM entities in this text are "cucumber vine" and "gastritis".
The first text segment is: "Keep drinking cucumber vine boiled in water, and it can treat gastritis";
the second text segment is: "cucumber vine boiled in water, and it can treat gastritis";
the third text segment is: "cucumber vine boiled in water, and it can treat gastritis; many friends have tried it and found it very effective."
The span "cucumber vine boiled in water, and it can treat gastritis" carries the most important relationship information between "cucumber vine" and "gastritis". All three text segments contain this span, so the TCM entity relationship extraction model recognizes it multiple times and can extract the entity relationship accurately. In addition, the first and third text segments contain further, partial relationship information about the two entities; identifying and extracting this information as well makes the entity relationship extracted from the text more comprehensive and accurate.
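The splitting rule can be read off the example above: the first segment runs from the start of the text to the end of the later entity, the second covers only the span from the earlier entity to the later one, and the third runs from the earlier entity to the end of the text. The sketch below encodes that reading. The rule is inferred from this single example rather than stated in the text, `split_into_segments` is a hypothetical name, and locating each entity by its first occurrence is an additional assumption.

```python
from typing import Tuple


def split_into_segments(text: str, entity_a: str, entity_b: str) -> Tuple[str, str, str]:
    """Split `text` into three overlapping segments, each containing both entities
    (rule inferred from the worked example; an assumption):
      1. start of text      -> end of the later entity
      2. earlier entity     -> end of the later entity
      3. earlier entity     -> end of text
    """
    start_a, start_b = text.find(entity_a), text.find(entity_b)
    if start_a < 0 or start_b < 0:
        raise ValueError("both entities must occur in the text")
    spans = sorted([(start_a, start_a + len(entity_a)), (start_b, start_b + len(entity_b))])
    first_start = spans[0][0]
    last_end = max(spans[0][1], spans[1][1])  # also handles one entity nested in the other
    return text[:last_end], text[first_start:last_end], text[first_start:]


example = ("Keep drinking cucumber vine boiled in water, and it can treat gastritis; "
           "many friends have tried it and found it very effective.")
for segment in split_into_segments(example, "cucumber vine", "gastritis"):
    print(segment)
```

Because the order of the two entities is resolved by position, the function returns the same three segments whichever entity is passed first.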
In an embodiment of the present invention, performing entity relationship extraction on a plurality of TCM knowledge texts using the trained TCM entity relationship extraction model to generate the entity relationships among the TCM entities includes:
performing the following steps for each TCM knowledge text:
splitting the TCM knowledge text into three text segments, each text segment containing the two marked TCM entities;
determining a feature vector of each text segment;
generating a feature matrix of the TCM knowledge text according to the feature vectors of the three text segments;
performing a convolution operation on the feature matrix to generate a feature map of the TCM knowledge text;
performing a pooling operation on the feature map to generate a pooled vector of the TCM knowledge text;
and inputting the pooled vector into the output layer of the TCM entity relationship extraction model to extract the entity relationship between the two TCM entities in the TCM knowledge text.
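The convolution step above can be illustrated with a plain-Python "valid" convolution that slides each filter down the rows of the feature matrix. This is a generic sketch, not the patent's network: the text gives no filter sizes, stride, padding, or activation, so stride 1, no padding, no activation, treating each row of the feature matrix as one position, and the name `conv1d_valid` are all assumptions.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_valid(matrix: Matrix, filters: List[Matrix], bias: float = 0.0) -> Matrix:
    """Slide each filter (window_rows x dim) down the rows of `matrix`
    (n_rows x dim) with stride 1 and no padding. Returns one row of
    responses per filter, i.e. the feature map."""
    feature_map: Matrix = []
    for kernel in filters:
        window = len(kernel)
        responses = []
        for start in range(len(matrix) - window + 1):
            total = bias
            for offset, kernel_row in enumerate(kernel):
                # Dot product of one kernel row with the aligned matrix row.
                total += sum(w * x for w, x in zip(kernel_row, matrix[start + offset]))
            responses.append(total)
        feature_map.append(responses)
    return feature_map


# Usage: a 3 x 2 feature matrix and two filters of heights 2 and 1.
feature_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
print(conv1d_valid(feature_matrix, filters))  # [[2.0, 1.0], [1.0, 1.0, 2.0]]
```

A trained model would learn the filter weights; they are fixed here only so the arithmetic can be checked by hand.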
In the embodiments of the invention, the semantic information identified from each text segment is converted into a vector, and subsequent operations are performed on vectors. The feature vector of each text segment may include, for example, the number of words in the segment, the length of the segment, and the distance between each word in the segment and each TCM entity. Based on this feature vector, the semantic information of the text segment can be described comprehensively and accurately.
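A segment feature vector built only from the features listed above might look like the following sketch. It assumes whitespace tokenization (Chinese text would first need word segmentation), assumes the token index of each entity's first word is already known, and uses the hypothetical name `segment_features`; the exact layout of the vector is also an assumption.

```python
from typing import List


def segment_features(segment: str, entity_a_index: int, entity_b_index: int) -> List[int]:
    """Hypothetical feature vector for one text segment: word count, segment
    length in characters, then each word's signed distance to entity A and
    to entity B (in token positions)."""
    words = segment.split()  # assumption: whitespace tokenization
    distances_a = [i - entity_a_index for i in range(len(words))]
    distances_b = [i - entity_b_index for i in range(len(words))]
    return [len(words), len(segment)] + distances_a + distances_b


print(segment_features("cucumber vine can treat gastritis", 0, 4))
# [5, 33, 0, 1, 2, 3, 4, -4, -3, -2, -1, 0]
```

The vector length grows with the segment, so a practical model would typically map the words and distances to fixed-size embeddings and pad segments to a common length.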
In an embodiment of the present invention, generating the feature matrix of the TCM knowledge text according to the feature vectors of the three text segments includes:
performing the following steps for the feature vector of each text segment:
determining a feature matrix of the text segment according to a first formula and the feature vector of the text segment, wherein the first formula is:
[Formula image BDA0003490664360000081]
where $Z_q$ is the feature matrix of the $q$-th text segment, $C_q$ is the feature vector of the $q$-th text segment, $B$ is a preset initial matrix, $D$ is the difference between the entity vectors of the two TCM entities in the TCM knowledge text, and $C_r$ is the feature vector of the $r$-th text segment;
and generating the feature matrix of the TCM knowledge text according to the feature matrices of the three text segments.
In the embodiments of the invention, the feature matrix of each text segment is calculated with the first formula, which reflects the importance of each text segment: the more important a segment is, the greater the weight of its feature matrix within the feature matrix of the TCM knowledge text. Feeding the feature matrix generated in this way into the subsequent layers of the model makes the extracted entity relationship more accurate.
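The first formula itself is not reproduced in this text, so the sketch below does not implement it. As a clearly hypothetical stand-in that uses the same symbols, it scores each segment with the bilinear form $C_q^\top B D$, normalizes the scores over all segments $r$ with a softmax, and scales each $C_q$ by its weight, so that more important segments contribute more to the text's feature matrix. The function name `weight_segments` and this particular scoring scheme are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def weight_segments(
    segment_vectors: List[Vector], B: List[Vector], D: Vector
) -> Tuple[List[Vector], List[float]]:
    """HYPOTHETICAL stand-in for the first formula (not the patent's formula).
    Scores each segment vector C_q by C_q . (B D), softmax-normalizes the scores
    over all segments r, and returns the weighted vectors Z_q = w_q * C_q as the
    rows of the text's feature matrix, together with the weights."""
    bd = [sum(b * d for b, d in zip(row, D)) for row in B]  # B @ D
    scores = [sum(c * v for c, v in zip(C_q, bd)) for C_q in segment_vectors]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    Z = [[w * c for c in C_q] for w, C_q in zip(weights, segment_vectors)]
    return Z, weights


# Usage: D is the difference of two toy entity vectors; B starts as the identity.
entity_a, entity_b = [1.0, 0.5], [0.0, 0.5]
D = [a - b for a, b in zip(entity_a, entity_b)]  # [1.0, 0.0]
B = [[1.0, 0.0], [0.0, 1.0]]
C = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
Z, weights = weight_segments(C, B, D)
print([round(w, 3) for w in weights])
```

Here the first and third segments align with $D$ and receive equal, larger weights than the second; whether the patented formula behaves this way cannot be confirmed from the text.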
In an embodiment of the present invention, performing the pooling operation on the feature map to generate the pooled vector includes:
pooling the feature map with a max pooling algorithm to generate a first vector;
pooling the feature map with an average pooling algorithm to generate a second vector;
and generating the pooled vector according to the first vector and the second vector.
In the embodiments of the invention, the feature map of the TCM knowledge text is pooled with two algorithms, max pooling and average pooling. Max pooling extracts the most salient semantic information in the text, while the second vector, obtained by average pooling, represents the semantic information of the text on average. The first and second vectors are concatenated to obtain the pooled vector, which therefore carries more of the text's semantic information and makes the subsequently extracted entity relationship more accurate.
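The max-plus-average pooling described above can be sketched as follows. The feature map is assumed to be a list of rows, one per convolution filter, each pooled over its positions; the name `pool_feature_map` and the concatenation order (max values first) are assumptions.

```python
from typing import List


def pool_feature_map(feature_map: List[List[float]]) -> List[float]:
    """Pool each row of the feature map over its positions with max pooling
    (first vector) and average pooling (second vector), then concatenate."""
    first_vector = [max(row) for row in feature_map]                # max pooling
    second_vector = [sum(row) / len(row) for row in feature_map]    # average pooling
    return first_vector + second_vector


print(pool_feature_map([[1.0, 3.0, 2.0], [4.0, 0.0, 2.0]]))  # [3.0, 4.0, 2.0, 2.0]
```

The two rows in the usage example have the same mean but different peaks, which shows why keeping both statistics preserves more information than either alone.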
In the TCM field, the samples available for training the TCM entity relationship extraction model contain few positive samples and many negative samples. Training the model on such imbalanced data can reduce the accuracy with which it extracts entity relationships.
To further improve the accuracy of entity relationship extraction, in an embodiment of the present invention, training the TCM entity relationship extraction model includes:
training the TCM entity relationship extraction model using a first loss function, wherein the first loss function is:
$$L = \bigl(1 - \operatorname{softmax}(dm + h)\bigr)\bigl(\lVert\delta\rVert^{2} + \lg \operatorname{softmax}(dm + h)\bigr)$$
where $L$ is the value of the first loss function, $\delta$ is a hyperparameter of the TCM entity relationship extraction model, $\tau$ is a preset sample adjustment parameter, $d$ is a preset first proportionality coefficient, $h$ is a preset second proportionality coefficient, and $m$ is the pooled vector of the sample.
In the embodiments of the invention, the first loss function is used to optimize the internal parameters of the TCM entity relationship extraction model so that the model extracts the entity relationships between TCM entities more accurately. In the first loss function, the preset sample adjustment parameter $\tau$ adjusts the relative weight of positive and negative samples so that they are more balanced and influence the first loss function more evenly. The internal parameters optimized with the first loss function therefore further improve the accuracy with which the trained model extracts TCM entity relationships.
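The sketch below evaluates the first loss function as printed, for a single sample. It rests on several assumptions not stated in the text: $d$ and $h$ are scalars, $m$ has one entry per relation class, $\lVert\delta\rVert^2$ is the squared Euclidean norm of $\delta$, and the softmax term is read as the probability of the sample's gold relation class. $\tau$ does not appear in the formula as printed, so the sketch has no $\tau$ argument. The names `softmax` and `first_loss` are hypothetical.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def first_loss(m: List[float], d: float, h: float, delta: List[float], gold: int) -> float:
    """Evaluate L = (1 - p) * (||delta||^2 + lg p) exactly as printed, where
    p = softmax(d*m + h)[gold]. Assumptions: scalar d and h, one entry of m per
    relation class, p taken at the gold class."""
    p = softmax([d * x + h for x in m])[gold]
    norm_sq = sum(v * v for v in delta)  # squared Euclidean norm of delta
    return (1 - p) * (norm_sq + math.log10(p))


print(first_loss([0.0, 0.0], d=1.0, h=0.0, delta=[0.0], gold=0))
```

Two properties follow from the formula as printed. If $h$ is a scalar it shifts every logit equally and cancels inside the softmax, so it only has an effect as a per-class vector. And $\lg p \le 0$, so the sign convention should be checked against the original publication before using this expression for training.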
As shown in FIG. 2, an embodiment of the present invention provides a device for constructing a TCM knowledge graph, including:
a training module 201, configured to train a pre-established TCM entity relationship extraction model to generate a trained TCM entity relationship extraction model, wherein the model is used to split an input TCM knowledge text into three text segments, each text segment containing the two marked TCM entities, and to identify, from the three text segments, semantic information about the entity relationship between the two TCM entities;
an extraction module 202, configured to perform entity relationship extraction on a plurality of TCM knowledge texts using the trained model to generate the entity relationships among the TCM entities;
and a generating module 203, configured to generate the TCM knowledge graph according to the entity relationships among the TCM entities.
In an embodiment of the present invention, the extraction module is specifically configured to:
perform the following steps for each TCM knowledge text:
split the TCM knowledge text into three text segments, each of which contains the two marked TCM entities;
determine a feature vector of each text segment;
generate a feature matrix of the TCM knowledge text according to the feature vectors of the three text segments;
perform a convolution operation on the feature matrix to generate a feature map of the TCM knowledge text;
perform a pooling operation on the feature map to generate a pooled vector of the TCM knowledge text; and
input the pooled vector into the output layer of the TCM entity relationship extraction model to extract the entity relationship between the two TCM entities in the TCM knowledge text.
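The patent does not say how three segments can each contain both marked entities. One plausible scheme, assumed here and not taken from the source, uses overlapping segments: text start through the second entity, the first entity through the second entity, and the first entity through text end. The sketch also shows a generic text-CNN-style "valid" convolution that turns a feature matrix into a feature map; the function names, array shapes, and ReLU activation are hypothetical choices.

```python
from typing import List, Tuple

import numpy as np

Span = Tuple[int, int]  # [start, end) token indices of a marked entity


def split_three_segments(tokens: List[str], e1: Span, e2: Span) -> List[List[str]]:
    """Hypothetical overlapping split in which every segment keeps both entities.

    Assumes the first entity e1 ends before the second entity e2 starts.
    """
    if e1[1] > e2[0]:
        raise ValueError("expected e1 to end before e2 starts")
    left = tokens[: e2[1]]          # text start .. end of e2 (left context)
    middle = tokens[e1[0] : e2[1]]  # start of e1 .. end of e2 (text between entities)
    right = tokens[e1[0] :]         # start of e1 .. text end (right context)
    return [left, middle, right]


def conv_feature_map(x: np.ndarray, filters: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """'Valid' convolution over the rows of a feature matrix, followed by ReLU.

    x:       (n, k) feature matrix, one k-dimensional row per position
    filters: (f, w, k) array of f filters, each spanning w consecutive rows
    bias:    (f,) one bias per filter
    returns: (n - w + 1, f) feature map
    """
    n = x.shape[0]
    f, w, _ = filters.shape
    if n < w:
        raise ValueError("feature matrix has fewer rows than the filter width")
    out = np.empty((n - w + 1, f))
    for i in range(n - w + 1):
        window = x[i : i + w]  # (w, k) slice of consecutive rows
        # Sum of element-wise products between each filter and the window.
        out[i] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)
```

For tokens `["a", "E1", "b", "E2", "c"]` with entity spans `(1, 2)` and `(3, 4)`, the split yields `["a", "E1", "b", "E2"]`, `["E1", "b", "E2"]`, and `["E1", "b", "E2", "c"]`, so each segment carries both entities plus a different slice of context.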
In an embodiment of the present invention, when generating the feature matrix of the TCM knowledge text according to the feature vectors of the three text segments, the extraction module is specifically configured to:
perform the following steps for the feature vector of each text segment:
determine the feature matrix of the text segment according to a first formula and the feature vector of the text segment, where the first formula is:
[First formula shown as image BDA0003490664360000101 in the original; not reproduced]
where $Z_q$ is the feature matrix of the q-th text segment, $C_q$ is the feature vector of the q-th text segment, $B$ is a preset initial matrix, $D$ is the difference between the entity vectors of the two TCM entities in the TCM knowledge text, and $C_r$ is the feature vector of the r-th text segment;
and generate the feature matrix of the TCM knowledge text according to the feature matrices of the three text segments.
In an embodiment of the present invention, when performing the pooling operation on the feature map to generate the pooled vector, the extraction module is specifically configured to:
pool the feature map by using a max pooling algorithm to generate a first vector;
pool the feature map by using an average pooling algorithm to generate a second vector; and
generate the pooled vector according to the first vector and the second vector.
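The two pooling steps and their combination can be sketched in NumPy as follows. Concatenation follows the "spliced" wording in the beneficial-effects section; the `(positions, filters)` layout of the feature map and the function name `pool_feature_map` are assumptions.

```python
import numpy as np


def pool_feature_map(feature_map: np.ndarray) -> np.ndarray:
    """Max-pool and average-pool a (positions, filters) feature map, then concatenate.

    first_vector:  per-filter maximum, the strongest activation anywhere in the text
    second_vector: per-filter mean, an averaged view of the whole text
    """
    first_vector = feature_map.max(axis=0)
    second_vector = feature_map.mean(axis=0)
    return np.concatenate([first_vector, second_vector])
```

The pooled vector has twice as many entries as there are filters, which is what the output layer would then consume.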
In an embodiment of the present invention, the training module is specifically configured to train the TCM entity relationship extraction model by using a first loss function, where the first loss function is:
$$L=\left(1-\mathrm{softmax}(dm+h)\right)^{\tau}\left(\|\delta\|^{2}+\lg\left(\mathrm{softmax}(dm+h)\right)\right)$$
where $L$ is the value of the first loss function, $\delta$ is a hyperparameter of the TCM entity relationship extraction model, $\tau$ is a preset sample adjustment parameter, $d$ is a preset first proportionality coefficient, $h$ is a preset second proportionality coefficient, and $m$ is the pooled vector of the sample.
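The printed first loss function has lost part of its formatting: τ is defined but does not appear in the flattened text, and "||δ||2" may denote a squared norm. The sketch below is a hypothetical, focal-loss-style reading of it, assuming that τ is the exponent of the modulating factor (1 − softmax(dm + h)), that "||δ||2" is the squared L2 norm, that softmax runs over relation classes with the gold-class probability taken, and that the log term is negated (as in standard focal loss) so that minimizing L raises the gold-class probability. None of these choices is confirmed by the source.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of scores."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def first_loss_sketch(m, d, h, label, delta, tau):
    """Hypothetical focal-loss-style reading of the first loss function.

    m     : pooled vector of one sample, shape (n,)
    d     : first proportionality coefficient; a scalar or a (classes, n) matrix
    h     : second proportionality coefficient; a scalar or a (classes,) vector
    label : index of the gold relation class (assumption: softmax is over classes)
    delta : array penalised by the squared-L2 term (assumed reading of ||delta||2)
    tau   : sample adjustment parameter, assumed to be the modulating exponent
    """
    p = softmax(np.dot(d, m) + h)[label]  # probability of the gold relation
    modulating = (1.0 - p) ** tau  # shrinks the loss of easy, confident samples
    # The log term is negated (assumption) so that a higher gold-class
    # probability lowers the loss, as in standard focal loss; lg = log base 10.
    return modulating * (np.sum(np.square(delta)) - np.log10(p))
```

Setting `tau = 0` removes the modulating factor and leaves a base-10 cross-entropy plus an L2 penalty, which is a convenient sanity check; larger values of `tau` down-weight samples the model already classifies confidently.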
In an embodiment of the present invention, an apparatus for constructing a traditional Chinese medicine knowledge graph is provided, including at least one memory and at least one processor, where:
the at least one memory is configured to store a machine-readable program; and
the at least one processor is configured to invoke the machine-readable program to perform any of the methods for constructing a traditional Chinese medicine knowledge graph provided by the present invention.
In an embodiment of the present invention, a computer-readable medium is provided, which stores computer instructions that, when executed by a processor, cause the processor to perform any of the methods for constructing a traditional Chinese medicine knowledge graph according to the embodiments of the present invention.
It should be understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the apparatus for constructing a traditional Chinese medicine knowledge graph. In other embodiments of the present invention, the apparatus may include more or fewer components than those shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Because the information exchange, execution process, and other details between the units of the apparatus are based on the same concept as the method embodiment of the present invention, refer to the description of the method embodiment for specifics; they are not repeated here.
The present invention also provides a computer-readable medium storing instructions for causing a computer to perform the method for constructing a traditional Chinese medicine knowledge graph described herein. Specifically, a system or apparatus equipped with a storage medium may be provided, where software program code that implements the functions of any of the above embodiments is stored on the storage medium, and a computer (or CPU or MPU) of the system or apparatus reads out and executes the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a non-volatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communication network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it should be understood that the program code read out from the storage medium may be written to a memory provided in an expansion board inserted into the computer, or to a memory provided in an expansion unit connected to the computer, after which a CPU or the like mounted on the expansion board or expansion unit performs part or all of the actual operations based on the instructions of the program code, thereby implementing the functions of any of the above embodiments.
The method and apparatus for constructing a traditional Chinese medicine knowledge graph provided by the embodiments of the present invention have at least the following beneficial effects:
1. In the embodiment of the present invention, each TCM knowledge text to be processed contains two marked TCM entities, and a TCM entity relationship extraction model is established that splits the TCM knowledge text into three text segments, each containing both TCM entities. The model can therefore extract semantic information about the two TCM entities from different parts of the TCM knowledge text and can focus on the text between the two entities, so that the extracted entity relationship between the two TCM entities is more accurate and the TCM knowledge graph generated from the extracted entity relationships is more reliable.
2. In the embodiment of the present invention, the semantic information identified from each text segment is converted into a vector, and subsequent operations are performed on vectors. The feature vector of each text segment may include information such as the number of words in the text segment, the length of the text segment, and the distance between each word in the text segment and the TCM entities. Based on this feature vector, the semantic information of the text segment can be described comprehensively and accurately.
3. In the embodiment of the present invention, the feature matrix of each text segment is calculated with the first formula, which reflects how important each text segment is: the more important a text segment, the greater the weight of its feature matrix within the feature matrix of the TCM knowledge text. Because the subsequent convolution, pooling, and output layers operate on the feature matrix generated in this way, the extracted entity relationship is more accurate.
4. In the embodiment of the present invention, two algorithms, max pooling and average pooling, are each used to pool the feature map of the TCM knowledge text. Max pooling extracts the most important semantic information in the TCM knowledge text, while the second vector obtained by average pooling represents the semantic information of the TCM knowledge text as an average. The first vector and the second vector are concatenated to obtain the pooled vector, which therefore retains more of the semantic information in the TCM knowledge text and makes the subsequently extracted entity relationship more accurate.
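The features listed in item 2 can be illustrated with a small, hypothetical helper that computes, for one text segment, its word count, its length in characters, and the signed distance of every word to each marked entity (the relative-position features commonly used in CNN-based relation extraction). Measuring length in characters, the sign convention, and the names `segment_features` and `relative_distance` are assumptions; the patent does not define how these quantities are encoded.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # [start, end) token indices of an entity within the segment


def relative_distance(i: int, span: Span) -> int:
    """Signed distance from token i to an entity span: negative before it,
    positive after it, and 0 for tokens inside the span."""
    start, end = span
    if i < start:
        return i - start
    if i >= end:
        return i - (end - 1)
    return 0


def segment_features(tokens: List[str], e1: Span, e2: Span) -> Dict[str, object]:
    """Hypothetical per-segment features named in the description."""
    positions = range(len(tokens))
    return {
        "num_words": len(tokens),
        "char_length": sum(len(t) for t in tokens),  # assumption: length in characters
        "dist_to_e1": [relative_distance(i, e1) for i in positions],
        "dist_to_e2": [relative_distance(i, e2) for i in positions],
    }
```

In a CNN relation extractor, the distance lists would typically be mapped to learned position embeddings and concatenated with word embeddings before convolution.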
It should be noted that not all steps and modules in the above flows and system structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structure described in the above embodiments may be a physical structure or a logical structure, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities, or some components in a plurality of independent devices may be implemented together.
In the above embodiments, a hardware unit may be implemented mechanically or electrically. For example, a hardware unit may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA, or ASIC) to perform the corresponding operations. A hardware unit may also comprise programmable logic or circuitry, such as a general-purpose processor or other programmable processor, that is temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, dedicated permanent circuitry, or temporarily configured circuitry) may be chosen based on cost and time considerations.
Although the invention has been shown and described in detail through the drawings and preferred embodiments, it is not limited to the disclosed embodiments. It will be apparent to those skilled in the art that further embodiments of the invention may be obtained by combining the technical means of the various embodiments described above, and such embodiments also fall within the scope of the invention.

Claims (10)

1. A method for constructing a traditional Chinese medicine knowledge graph, characterized by comprising the following steps:
pre-establishing a traditional Chinese medicine (TCM) entity relationship extraction model, wherein the TCM entity relationship extraction model is used for splitting an input TCM knowledge text into three text segments, each text segment comprises the two marked TCM entities, and semantic information comprising the entity relationship between the two TCM entities is identified from the three text segments;
training the TCM entity relationship extraction model to generate a trained TCM entity relationship extraction model;
performing entity relationship extraction on a plurality of TCM knowledge texts by using the trained TCM entity relationship extraction model to generate the entity relationships between the TCM entities;
and generating a traditional Chinese medicine knowledge graph according to the entity relationships between the TCM entities.
2. The method of claim 1, wherein
performing entity relationship extraction on the plurality of TCM knowledge texts by using the trained TCM entity relationship extraction model to generate the entity relationships between the TCM entities comprises:
performing the following steps for each TCM knowledge text:
splitting the TCM knowledge text into three text segments, each of which comprises the two marked TCM entities;
determining a feature vector of each text segment;
generating a feature matrix of the TCM knowledge text according to the feature vectors of the three text segments;
performing a convolution operation on the feature matrix to generate a feature map of the TCM knowledge text;
performing a pooling operation on the feature map to generate a pooled vector of the TCM knowledge text;
and inputting the pooled vector into an output layer of the TCM entity relationship extraction model to extract the entity relationship between the two TCM entities in the TCM knowledge text.
3. The method of claim 2, wherein
generating the feature matrix of the TCM knowledge text according to the feature vectors of the three text segments comprises:
performing the following steps for the feature vector of each text segment:
determining the feature matrix of the text segment according to a first formula and the feature vector of the text segment, wherein the first formula is:
[First formula shown as image FDA0003490664350000021 in the original; not reproduced]
wherein $Z_q$ is the feature matrix of the q-th text segment, $C_q$ is the feature vector of the q-th text segment, $B$ is a preset initial matrix, $D$ is the difference between the entity vectors of the two TCM entities in the TCM knowledge text, and $C_r$ is the feature vector of the r-th text segment;
and generating the feature matrix of the TCM knowledge text according to the feature matrices of the three text segments.
4. The method of claim 2, wherein
performing the pooling operation on the feature map to generate the pooled vector comprises:
pooling the feature map by using a max pooling algorithm to generate a first vector;
pooling the feature map by using an average pooling algorithm to generate a second vector;
and generating the pooled vector according to the first vector and the second vector.
5. The method of claim 2, wherein
training the TCM entity relationship extraction model comprises:
training the TCM entity relationship extraction model by using a first loss function, wherein the first loss function is:
$$L=\left(1-\mathrm{softmax}(dm+h)\right)^{\tau}\left(\|\delta\|^{2}+\lg\left(\mathrm{softmax}(dm+h)\right)\right)$$
wherein $L$ is the value of the first loss function, $\delta$ is a hyperparameter of the TCM entity relationship extraction model, $\tau$ is a preset sample adjustment parameter, $d$ is a preset first proportionality coefficient, $h$ is a preset second proportionality coefficient, and $m$ is the pooled vector of the sample.
6. An apparatus for constructing a traditional Chinese medicine knowledge graph, characterized by comprising:
a training module, configured to train a pre-established TCM entity relationship extraction model to generate a trained TCM entity relationship extraction model, wherein the TCM entity relationship extraction model is used for splitting an input TCM knowledge text into three text segments, each text segment comprises the two marked TCM entities, and semantic information comprising the entity relationship between the two TCM entities is identified from the three text segments;
an extraction module, configured to perform entity relationship extraction on a plurality of TCM knowledge texts by using the trained TCM entity relationship extraction model to generate the entity relationships between the TCM entities;
and a generating module, configured to generate a traditional Chinese medicine knowledge graph according to the entity relationships between the TCM entities.
7. The apparatus of claim 6, wherein
the extraction module is specifically configured to:
perform the following steps for each TCM knowledge text:
split the TCM knowledge text into three text segments, each of which comprises the two marked TCM entities;
determine a feature vector of each text segment;
generate a feature matrix of the TCM knowledge text according to the feature vectors of the three text segments;
perform a convolution operation on the feature matrix to generate a feature map of the TCM knowledge text;
perform a pooling operation on the feature map to generate a pooled vector of the TCM knowledge text;
and input the pooled vector into an output layer of the TCM entity relationship extraction model to extract the entity relationship between the two TCM entities in the TCM knowledge text.
8. The apparatus of claim 7, wherein
when generating the feature matrix of the TCM knowledge text according to the feature vectors of the three text segments, the extraction module is specifically configured to:
perform the following steps for the feature vector of each text segment:
determine the feature matrix of the text segment according to a first formula and the feature vector of the text segment, wherein the first formula is:
[First formula shown as image FDA0003490664350000031 in the original; not reproduced]
wherein $Z_q$ is the feature matrix of the q-th text segment, $C_q$ is the feature vector of the q-th text segment, $B$ is a preset initial matrix, $D$ is the difference between the entity vectors of the two TCM entities in the TCM knowledge text, and $C_r$ is the feature vector of the r-th text segment;
and generate the feature matrix of the TCM knowledge text according to the feature matrices of the three text segments.
9. An apparatus for constructing a traditional Chinese medicine knowledge graph, comprising at least one memory and at least one processor, wherein:
the at least one memory is configured to store a machine-readable program; and
the at least one processor is configured to invoke the machine-readable program to perform the method for constructing a traditional Chinese medicine knowledge graph of any one of claims 1 to 5.
10. A computer-readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method for constructing a traditional Chinese medicine knowledge graph of any one of claims 1 to 5.
CN202210094947.8A 2022-01-26 2022-01-26 Method and device for constructing traditional Chinese medicine knowledge map Pending CN114579755A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210094947.8A CN114579755A 2022-01-26 2022-01-26 Method and device for constructing traditional Chinese medicine knowledge map

Publications (1)

Publication Number Publication Date
CN114579755A 2022-06-03

Family

ID=81770836

Country Status (1)

Country Link
CN (1) CN114579755A

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008959A (en) * 2019-03-26 2019-07-12 北京博瑞彤芸文化传播股份有限公司 A kind of medical data processing method and system
CN110413999A (en) * 2019-07-17 2019-11-05 新华三大数据技术有限公司 Entity relation extraction method, model training method and relevant apparatus
CN110457703A (en) * 2019-08-12 2019-11-15 广东工业大学 A kind of file classification method, device and equipment based on improvement convolutional neural networks
CN111898384A (en) * 2020-05-30 2020-11-06 中国兵器科学研究院 Text emotion recognition method and device, storage medium and electronic equipment
CN112860904A (en) * 2021-04-06 2021-05-28 哈尔滨工业大学 External knowledge-integrated biomedical relation extraction method
CN113505244A (en) * 2021-09-10 2021-10-15 中国人民解放军总医院 Knowledge graph construction method, system, equipment and medium based on deep learning

Similar Documents

Publication Publication Date Title
US11031009B2 (en) Method for creating a knowledge base of components and their problems from short text utterances
CN109272043A (en) Training data generation method, system and electronic equipment for optical character identification
CN107451106A (en) Text method and device for correcting, electronic equipment
CN105593845B (en) Generating means and its method based on the arrangement corpus for learning by oneself arrangement, destructive expression morpheme analysis device and its morpheme analysis method using arrangement corpus
CN111222336B (en) Method and device for identifying unknown entity
CN116127953B (en) Chinese spelling error correction method, device and medium based on contrast learning
CN113822054A (en) Chinese grammar error correction method and device based on data enhancement
CN116542297A (en) Method and device for generating countermeasure network based on text data training
CN111177328B (en) Question-answer matching system and method, question-answer processing device and medium
CN113641707A (en) Knowledge graph disambiguation method, device, equipment and storage medium
CN113468323B (en) Dispute focus category and similarity judging method, system and device and recommending method
CN112926323B (en) Chinese named entity recognition method based on multistage residual convolution and attention mechanism
CN109614610A (en) Similar Text recognition methods and device
CN113887200A (en) Text variable-length error correction method and device, electronic equipment and storage medium
CN111782892B (en) Similar character recognition method, device, apparatus and storage medium based on prefix tree
CN114579755A (en) Method and device for constructing traditional Chinese medicine knowledge map
CN117217233A (en) Text correction and text correction model training method and device
CN116663536A (en) Matching method and device for clinical diagnosis standard words
US20230186351A1 (en) Transformer Based Search Engine with Controlled Recall for Romanized Multilingual Corpus
US20220383159A1 (en) Systems and methods for open domain multi-hop question answering
CN115455969A (en) Medical text named entity recognition method, device, equipment and storage medium
CN115080748A (en) Weak supervision text classification method and device based on noisy label learning
US11687723B2 (en) Natural language processing with missing tokens in a corpus
US20230140480A1 (en) Utterance generation apparatus, utterance generation method, and program
CN114647727A (en) Model training method, device and equipment applied to entity information recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220603
