CN113076411B - Medical query expansion method based on knowledge graph - Google Patents

Info

Publication number
CN113076411B
Authority
CN
China
Prior art keywords
question
query
medical
words
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110454713.5A
Other languages
Chinese (zh)
Other versions
CN113076411A (en)
Inventor
方钰
崔雪
翟鹏珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN202110454713.5A
Publication of CN113076411A
Application granted
Publication of CN113076411B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

A medical query expansion method based on a knowledge graph. Query expansion in an automatic question-answering system narrows the semantic gap between questions and answers by adding expansion information to the question, thereby improving the accuracy of the system. In medical question answering, existing query expansion methods do not fully combine the co-occurrence associations and the inference associations among medical terms under different query intentions, so the expansion words they produce are not accurate enough. The invention uses a medical knowledge graph as the knowledge source for expansion words, obtains candidate expansion words from the inference associations of medical terms under different query intentions, and screens the final expansion words by combining negated medical term recognition with mutual information, thereby improving the accuracy of the medical question-answering system.

Description

Medical query expansion method based on knowledge graph
Technical Field
The invention relates to the field of natural language processing, and in particular to query processing in question-answering systems. Query expansion is an important stage and a key technology in automatic question-answering systems.
Background
With the rapid development of the Internet, more and more patients seek medical help through online health communities. However, the sharply increasing number of questions places a heavy burden on the physicians who answer them. To reduce physicians' workload and meet users' demand for quick answers, many researchers have turned to medical question answering. In a medical question-answering system, two key factors affect accuracy: word mismatch caused by differences in how questions and answers are phrased, and semantic deviation caused by differences in how much information questions and answers contain. Researchers have therefore introduced query expansion, which adds query-related expansion words to the query to reduce the deviation between questions and answers and so improve system performance.
In medical question answering today, query expansion methods fall mainly into keyword-based and semantics-based approaches. Keyword-based query expansion selects keywords only at the statistical level and ignores the semantic information of the query, so it may expand many irrelevant medical entities and introduce noise into the original query, which degrades answer selection. Semantics-based query expansion uses a medical ontology or a medical semantic dictionary to mine the latent semantics beyond the surface words of the query. However, when acquiring candidate expansion words, current semantics-based research selects candidates based on the concept of a medical entity and ignores the important role that the inference associations of medical entities between questions and answers play in guiding candidate acquisition. When screening expansion words, some researchers use mutual information to filter candidates, but they neglect the interference of negated medical entities with the mutual information values between entities.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a semantic query expansion method for medical question answering based on entity associations. The method combines the query intention with the inference associations between entities to acquire candidate expansion words from a medical knowledge graph, and screens the expansion words with a strategy that combines negated medical entity recognition and mutual information.
Query expansion is an important stage in automatic question-answering systems: by processing the original question, it helps the question-answering model pick the correct answer. At present, most query expansion in medical question answering obtains expansion words through pseudo-relevance feedback, through statistical relationships among medical terms, or through semantic similarity among terms. The resulting expansion words may be irrelevant to the query intention, may not fit the medical scenario of the query, or may be only weakly related to the query, which introduces substantial noise into the question-answering system and reduces its accuracy.
To address these problems, the invention expands the user query as follows: it uses an SVM classifier to obtain the user's query intention, then obtains query-related candidate expansion words from a medical knowledge graph based on the inference associations of medical terms under different query intentions, and finally screens the candidates with negated term recognition and mutual information to obtain the final expansion words.
To achieve this purpose, the technical solution provided by the invention is as follows:
The invention provides a medical query expansion method based on a knowledge graph, comprising the following steps:
step 1, preprocessing a data set of medical question-answer pairs;
step 2, training an SVM classifier to predict the query intention of a question;
step 3, using the query intention obtained in step 2 to obtain query-related candidate expansion words from the medical knowledge graph;
step 4, screening the candidate expansion words obtained in step 3 by negated medical term recognition and mutual information to obtain the final expansion words.
Advantageous effects
Existing query expansion techniques in medical question answering cannot accurately generate expansion words that fit the medical scenario of the query, do not fully combine the co-occurrence associations and inference associations among medical terms under different query intentions, and do not consider the influence of negated medical terms on the co-occurrence relationships among terms. To address these problems, the invention realizes a medical query expansion method based on a knowledge graph. It uses a semi-supervised SVM classifier to obtain the user's query intention, uses the inference associations among medical terms under different intentions to obtain candidate expansion words from a medical knowledge graph, and finally uses negated medical term recognition and mutual information to screen out the expansion words closely related to the query.
The method was verified experimentally on a data set of medical question-answer pairs. The expansion words obtained were observed to fit the medical scenario of the query better and to be more closely related to the query, and an improvement in answer selection performance was also observed using the evaluation tool of the TREC conference. In a smart-community scenario, the method is of great significance for providing residents with convenient, timely online medical services and for reducing physicians' workload.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic flowchart of the query expansion method;
FIG. 2 is a flowchart of the query intention classification of questions in step 2;
FIG. 3 is a diagram of selecting candidate expansion words from the knowledge graph in step 3;
FIG. 4 is a diagram of screening expansion words by negated medical term recognition and mutual information in step 4.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, a detailed description of the embodiments of the present invention will be given below with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative and explanatory of the invention and are not restrictive thereof.
The specific implementation process of the invention is shown in FIG. 1 and comprises the following four steps:
step 1, preprocessing a data set of medical question-answer pairs;
step 2, training an SVM classifier to predict the query intention of a question;
step 3, using the query intention obtained in step 2 to obtain query-related candidate expansion words from the medical knowledge graph;
step 4, screening the candidate expansion words obtained in step 3 by negated medical term recognition and mutual information to obtain the final expansion words.
Each step is described in detail below.
Step 1: preprocessing the data set of Chinese medical question-answer pairs.
1.1 integrating the question-answer pair data set
To keep the data set balanced and to facilitate the subsequent classification, invalid question-answer pairs are deleted, namely pairs that are unclearly expressed, that do not contain an answer or a question, or that contain pictures. Question-answer pairs outside the four categories of disease diagnosis, disease symptom, disease treatment and disease cause are also deleted. The integrated data set is provided to step 1.2;
1.2 removing stop words
Stop words are removed from the questions and answers in the data set using a stop-word list. Stop words are mainly high-frequency words with no actual meaning, such as modal particles and polite expressions. The result is provided to step 1.4;
1.3 integrating domain dictionaries
Because China currently lacks a public and relatively complete Chinese medical knowledge base, ICD-9-CM, ICD-10, the 39 Health Network, the Sogou medical lexicon (as examples, without limitation) and small-scale medical entity dictionaries published on the Internet are integrated to obtain medical domain dictionaries of four types: diseases, symptoms, medicines and examinations.
1.4 adding the domain dictionary to the dictionary of the jieba word segmenter, and segmenting the questions in the data set with jieba;
After word segmentation, the preprocessing of the question-answer data set in step 1 is complete. The questions in the preprocessed data set are provided to step 2, step 3 and step 4, and the domain dictionary is provided to step 3 and step 4.
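The preprocessing in step 1 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the patent uses the jieba segmenter with the domain dictionary added, whereas this sketch swaps in a simple forward maximum matching segmenter so that it runs with the standard library alone. The function names, the dictionary entries and the stop-word list are all hypothetical, and the sketch segments first and filters stop words afterwards, which is an assumption about how the two steps interact.

```python
from typing import List, Set


def segment(text: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Forward maximum matching: take the longest lexicon entry at each position.

    Stand-in for jieba; a single character is emitted when nothing matches.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(question: str, lexicon: Set[str], stop_words: Set[str]) -> List[str]:
    """Segment a question with the domain lexicon, then drop stop words."""
    return [tok for tok in segment(question, lexicon) if tok not in stop_words]


# Hypothetical toy resources (not taken from the patent).
DOMAIN_DICT = {"小儿肺炎", "咳嗽", "发烧", "阿莫西林"}
GENERAL_WORDS = {"孩子", "怎么办", "请问"}
STOP_WORDS = {"请问", "，", "？", "了", "和"}

tokens = preprocess("请问孩子咳嗽和发烧了怎么办？", DOMAIN_DICT | GENERAL_WORDS, STOP_WORDS)
print(tokens)
```

Adding the domain dictionary to the segmenter's lexicon is what keeps multi-character medical terms from being split into single characters.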
Step 2: training the SVM classifier to predict the query intention of the question, as shown in FIG. 2.
2.1 labeling question classification labels
The intention type of part of the questions obtained in step 1 is labeled: a question whose query intention belongs to the disease diagnosis category is labeled 0; to the disease treatment category, 1; to the disease symptom category, 2; to the diagnosis and treatment category, 3; and to the disease cause category, 4. The labeled results are provided to step 2.2.
2.2 semi-supervised training of the SVM intention classifier
Because the questions in the data set do not carry intention classes, a self-training semi-supervised method is used to train the intention classifier. The statistics from step 2.1 show that the data set is imbalanced, so the initial classifier uses a Support Vector Machine (SVM) algorithm designed for imbalanced samples. Training the classifier requires two features: (1) the TF-IDF features of the question; (2) the interrogative feature words of the question.
(1) TF-IDF is a feature vectorization method commonly used in text classification. It reflects the importance of a word in the whole corpus through term frequency (TF) and inverse document frequency (IDF). The calculation formula is as follows:
$$\mathrm{TF\text{-}IDF} = \frac{t}{N} \times \log\frac{x}{w}$$
where t is the number of times the word occurs in the document, N is the total number of words in the document, x is the total number of documents, and w is the number of documents in which the word occurs.
(2) The interrogative feature words of the four categories of questions are obtained from statistics on the data set. Each question is then encoded with discrete features that indicate, for each category, whether the question contains an interrogative feature word of that category (value 0 or 1).
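The two feature types can be illustrated with a short sketch. It assumes the standard TF-IDF form (t / N) × log(x / w) built from the symbol definitions above; the smoothing and the logarithm base used in the patent's formula are not recoverable from this text, so the natural logarithm without smoothing is an assumption. The cue-word table and the function names are hypothetical.

```python
import math
from typing import Dict, List


def tf_idf(word: str, doc: List[str], corpus: List[List[str]]) -> float:
    """(t / N) * log(x / w) with the symbols used in the description."""
    t = doc.count(word)                            # occurrences of the word in the document
    n = len(doc)                                   # total words in the document
    x = len(corpus)                                # total number of documents
    w = sum(1 for d in corpus if word in d)        # documents containing the word
    if t == 0 or w == 0:
        return 0.0
    return (t / n) * math.log(x / w)


def cue_features(doc: List[str], cue_words: Dict[str, List[str]]) -> List[int]:
    """One 0/1 flag per category: does the question contain a cue word of that category?"""
    return [int(any(cue in doc for cue in cues)) for cues in cue_words.values()]


# Hypothetical segmented questions and cue words (not taken from the patent).
corpus = [
    ["孩子", "咳嗽", "是", "什么", "病"],
    ["肺炎", "吃", "什么", "药"],
    ["肺炎", "有", "什么", "症状"],
    ["为什么", "会", "得", "肺炎"],
]
CUE_WORDS = {"diagnosis": ["病"], "treatment": ["药"], "symptom": ["症状"], "cause": ["为什么"]}

rare = tf_idf("咳嗽", corpus[0], corpus)      # appears in 1 of 4 questions
common = tf_idf("什么", corpus[0], corpus)    # appears in 3 of 4 questions
flags = cue_features(corpus[0], CUE_WORDS)
print(rare, common, flags)
```

A word that occurs in few questions receives a higher weight than one that occurs in most of them, which is the behaviour the intent classifier relies on. The final feature vector would concatenate the TF-IDF weights with the 0/1 flags.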
The trained intent classifier is provided to step 2.3.
2.3 inputting the question to be classified into the trained SVM classifier, and providing the classification result (the query intention of the question) to step 3.
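The self-training procedure of step 2.2 can be sketched generically. The loop below is an illustration under stated assumptions, not the patent's classifier: to stay dependency-free it swaps the SVM for a nearest-centroid stand-in, uses a distance margin as the confidence score, and the threshold, round limit and all names are hypothetical. In practice the base learner would be an SVM that compensates for class imbalance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class CentroidClassifier:
    """Stand-in base learner (NOT an SVM): predicts the class of the nearest centroid."""

    def fit(self, xs: List[Vector], ys: List[int]) -> "CentroidClassifier":
        sums: Dict[int, Tuple[List[float], int]] = {}
        for x, y in zip(xs, ys):
            total, count = sums.get(y, ([0.0] * len(x), 0))
            sums[y] = ([a + b for a, b in zip(total, x)], count + 1)
        self.centroids = {y: [a / c for a in total] for y, (total, c) in sums.items()}
        return self

    def predict_with_confidence(self, x: Vector) -> Tuple[int, float]:
        """Return (label, margin); needs at least two classes."""
        dists = sorted((math.dist(x, c), y) for y, c in self.centroids.items())
        return dists[0][1], dists[1][0] - dists[0][0]


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.5, max_rounds=10):
    """Repeatedly add confidently pseudo-labeled samples and retrain."""
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    model = CentroidClassifier().fit(xs, ys)
    for _ in range(max_rounds):
        confident, rest = [], []
        for x in pool:
            label, margin = model.predict_with_confidence(x)
            (confident if margin >= threshold else rest).append((x, label))
        if not confident:
            break                                   # nothing new to learn from
        xs += [x for x, _ in confident]
        ys += [y for _, y in confident]
        pool = [x for x, _ in rest]
        model = CentroidClassifier().fit(xs, ys)
    return model


# Hypothetical 2-D feature vectors; labels 0 and 1 follow step 2.1.
model = self_train(
    labeled_x=[[1.0, 0.0], [0.0, 1.0]],
    labeled_y=[0, 1],
    unlabeled_x=[[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]],
)
print(model.predict_with_confidence([0.8, 0.2])[0])
```

Only samples predicted with a margin above the threshold are pseudo-labeled, so the ambiguous vector in the example is never added to the training set.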
Step 3: obtaining query-related candidate expansion words from the medical knowledge graph, as shown in FIG. 3.
3.1 medical knowledge graph acquisition
Triples labeled with the pediatrics department are extracted from a public Chinese general medical knowledge graph and combined with pediatric medical entity relations collected from health websites to build a Chinese pediatric knowledge graph. The graph is provided to step 3.4.
3.2 counting the negation feature words and the termination feature words in the data set. The results are provided to step 3.3 and step 4.1.
3.3 query keyword acquisition
The initial query keywords of the sentence are screened according to the question intention label provided by step 2.3, together with the domain dictionary obtained in step 1.3. The screening rules are: for a disease diagnosis question, symptom entities are selected as the initial query keywords; for a disease treatment question, disease entities; for a disease symptom question, disease entities; and for a diagnosis and treatment question, symptom entities. Negated medical terms are then removed from the initial query keywords by means of negation terms and termination terms to obtain the final query keywords. Specifically, a negation window is delimited by a negation term and a termination term, and every medical term inside the window is marked as a negated medical term. The negation terms are the negation feature words obtained in step 3.2; the termination terms comprise the termination feature words obtained in step 3.2 together with commas, periods and semicolons. The resulting query keywords are provided to step 3.4.
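The negation-window idea can be sketched as a single pass over the segmented question. The sketch assumes that a window opens at a negation word and extends forward until the next termination word or punctuation mark; the direction of the window is an assumption, since the text only states that the two kinds of terms act as boundaries. The word lists and the function name are hypothetical.

```python
from typing import List, Set


def negated_positions(tokens: List[str], medical_terms: Set[str],
                      negation_words: Set[str], terminators: Set[str]) -> Set[int]:
    """Return the indices of medical terms that fall inside a negation window."""
    negated, in_window = set(), False
    for i, tok in enumerate(tokens):
        if tok in terminators:
            in_window = False          # a terminator closes the current window
        elif tok in negation_words:
            in_window = True           # a negation word opens a window
        elif in_window and tok in medical_terms:
            negated.add(i)
    return negated


# Hypothetical word lists (not taken from the patent).
MEDICAL = {"咳嗽", "发烧", "呕吐", "腹泻"}
NEGATION = {"没有", "无", "不"}
TERMINATORS = {"，", "。", "；", "但是"}

tokens = ["孩子", "咳嗽", "，", "没有", "发烧", "和", "呕吐", "，", "但是", "腹泻"]
negated = negated_positions(tokens, MEDICAL, NEGATION, TERMINATORS)
keywords = [t for i, t in enumerate(tokens) if t in MEDICAL and i not in negated]
print(keywords)
```

Here "发烧" and "呕吐" fall inside the window opened by "没有" and are dropped, so only the symptoms the patient actually reports are used as query keywords.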
3.4 candidate expanded word acquisition
Combining the query keywords from step 3.3 with the query intention obtained in step 2.3, the types of medical terms likely to appear in the answer can be inferred with the following inference rule.
[rule: (Q belongsTo C), (Q hasEntity M) → (A hasEntity N)]
In the rule, Q is the question, A is the answer, C is the query intention, M is the type of medical term selected from the query, and N is the corresponding type of medical term in the answer.
For disease diagnosis questions, the disease entities that may correspond to each query keyword are obtained from the knowledge graph, and the intersection of the disease entity sets obtained for the individual symptoms in the query is taken as the final candidate expansion words. For disease treatment questions, the drug entities corresponding to the query keywords are selected from the knowledge graph as candidate expansion words; for disease symptom questions, the typical symptoms corresponding to the query keywords are selected. For compound diagnosis and treatment questions, the disease entities are first queried as for disease diagnosis questions, the commonly used drug entities for those diseases are then queried as for disease treatment questions, and both the disease entities and the drug entities are output as candidate expansion words. For disease cause questions, a cause is difficult to summarize with a few expansion words, so this type of question is not processed for now, to avoid introducing a large amount of noise. The resulting list of candidate expansion words is provided to step 4.2.
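The intent-dependent lookup described above can be sketched over a knowledge graph stored as a list of triples. This is an illustration under assumptions: the relation names "has_symptom" and "treated_by", the toy triples and all function names are hypothetical, and the intent codes follow the labels 0 to 4 from step 2.1.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)
DIAGNOSIS, TREATMENT, SYMPTOM, DIAGNOSIS_AND_TREATMENT, CAUSE = range(5)


def tails(kg: List[Triple], head: str, relation: str) -> Set[str]:
    return {t for h, r, t in kg if h == head and r == relation}


def heads(kg: List[Triple], relation: str, tail: str) -> Set[str]:
    return {h for h, r, t in kg if r == relation and t == tail}


def diseases_for(kg: List[Triple], symptoms: Iterable[str]) -> Set[str]:
    """Intersection of the disease sets reached from each symptom."""
    per_symptom = [heads(kg, "has_symptom", s) for s in symptoms]
    return set.intersection(*per_symptom) if per_symptom else set()


def drugs_for(kg: List[Triple], diseases: Iterable[str]) -> Set[str]:
    return set().union(*(tails(kg, d, "treated_by") for d in diseases))


def candidate_expansions(kg: List[Triple], intent: int, keywords: List[str]) -> Set[str]:
    if intent == DIAGNOSIS:                # symptoms -> diseases
        return diseases_for(kg, keywords)
    if intent == TREATMENT:                # diseases -> drugs
        return drugs_for(kg, keywords)
    if intent == SYMPTOM:                  # diseases -> typical symptoms
        return set().union(*(tails(kg, d, "has_symptom") for d in keywords))
    if intent == DIAGNOSIS_AND_TREATMENT:  # symptoms -> diseases -> drugs
        diseases = diseases_for(kg, keywords)
        return diseases | drugs_for(kg, diseases)
    return set()                           # disease cause questions are not expanded


# Hypothetical toy triples for illustration only.
KG = [
    ("小儿肺炎", "has_symptom", "咳嗽"),
    ("小儿肺炎", "has_symptom", "发烧"),
    ("小儿感冒", "has_symptom", "咳嗽"),
    ("小儿感冒", "has_symptom", "流鼻涕"),
    ("小儿肺炎", "treated_by", "阿莫西林"),
]

print(candidate_expansions(KG, DIAGNOSIS, ["咳嗽", "发烧"]))
```

Taking the intersection rather than the union for diagnosis questions keeps only diseases consistent with every reported symptom, which limits the number of candidates passed to step 4.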
Step 4: screening all candidate expansion words by negated medical term recognition and mutual information, as shown in FIG. 4.
4.1 labeling all negated medical terms in the questions and answers of the data set, using the same labeling method as described in step 3.3. The labeling result is provided to step 4.2.
4.2 calculating the normalized mutual information value between each expansion word and the whole query, and screening to obtain the final expansion words
The mutual information between each candidate expansion word from step 3.4 and the whole query is calculated, and the candidate expansion words whose normalized mutual information value is smaller than the expansion threshold are selected as the final expansion words of the query. The mutual information of two words is calculated as follows:
$$I(w_1, w_2) = \log\frac{N \cdot c(w_1, w_2)}{c(w_1)\, c(w_2)}$$
The co-occurrence window is one question-answer pair. c(w1, w2) is the number of co-occurrence windows in which the word w1 appears in the question and the word w2 appears in the answer, c(w1) is the number of times the medical term w1 appears in the corpus, c(w2) is the number of times the medical term w2 appears in the corpus, and N is the total number of medical terms in the corpus. When the mutual information matrix is calculated, occurrences of the negated medical terms labeled in step 4.1 are not counted, so that negated medical terms do not distort the overall correlation between medical terms in the corpus.
Assuming that the key medical terms q_i in the initial query Q are mutually independent, the mutual information value between an expansion word w and the whole query is calculated as follows.
$$M(Q) = \sum_{q_i \in Q} I(q_i, w)$$
To make it convenient to set a screening threshold, the mutual information values are normalized as follows, where M_max and M_min are the maximum and minimum values of M(Q).
$$NM(Q) = \frac{M_{\max} - M(Q)}{M_{\max} - M_{\min}}$$
The candidate expansion words whose normalized mutual information value NM(Q) is smaller than the expansion threshold become the final expansion words. Because NM(Q) is 0 for the candidate with the largest M(Q) and 1 for the candidate with the smallest, a value below the threshold corresponds to a strong association with the query.
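The screening in step 4.2 can be sketched end to end. Several points are assumptions rather than details from the patent: the mutual information is taken in pointwise form, log(N · c(w1, w2) / (c(w1) · c(w2))), pairs that never co-occur contribute 0, the maximum and minimum of M(Q) are taken over the candidates of the current query, and negated terms are assumed to have been removed from the corpus beforehand. All names and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

QAPair = Tuple[List[str], List[str]]  # (medical terms in question, medical terms in answer)


def build_counts(pairs: List[QAPair]):
    """Count term occurrences and question/answer co-occurrences per QA pair."""
    term_count, pair_count = Counter(), Counter()
    for q_terms, a_terms in pairs:
        term_count.update(q_terms)
        term_count.update(a_terms)
        for w1 in set(q_terms):
            for w2 in set(a_terms):
                pair_count[(w1, w2)] += 1
    return term_count, pair_count, sum(term_count.values())


def mutual_information(w1, w2, term_count, pair_count, n) -> float:
    joint = pair_count[(w1, w2)]
    if joint == 0:
        return 0.0  # assumption: unseen pairs contribute nothing
    return math.log(n * joint / (term_count[w1] * term_count[w2]))


def select_expansions(query_terms, candidates, counts, threshold) -> List[str]:
    """Keep candidates whose normalized score NM(Q) is below the threshold."""
    if not candidates:
        return []
    scores = {w: sum(mutual_information(q, w, *counts) for q in query_terms)
              for w in candidates}                       # M(Q) per candidate
    m_max, m_min = max(scores.values()), min(scores.values())
    if m_max == m_min:
        return sorted(candidates)                        # assumption: keep all on a tie
    return sorted(w for w, m in scores.items()
                  if (m_max - m) / (m_max - m_min) < threshold)


# Hypothetical corpus of negation-free medical terms per question-answer pair.
PAIRS = [
    (["咳嗽", "发烧"], ["小儿肺炎"]),
    (["咳嗽", "发烧"], ["小儿肺炎"]),
    (["咳嗽"], ["小儿感冒"]),
    (["流鼻涕"], ["小儿感冒"]),
]
counts = build_counts(PAIRS)
selected = select_expansions(["咳嗽", "发烧"], ["小儿肺炎", "小儿感冒"], counts, threshold=0.5)
print(selected)
```

In this toy corpus the candidate that co-occurs with both query symptoms has the largest M(Q) and therefore a normalized value of 0, so it passes the threshold while the weaker candidate is discarded.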
Innovation point
The invention provides a medical query expansion method based on a knowledge graph that differs from the query expansion methods currently used in medical question answering. It uses a classifier to determine the user's query intention, then acquires candidate expansion words from a knowledge graph using the inference associations of medical terms under different query intentions, and finally screens them by combining negated medical term recognition with mutual information to obtain the final expansion words. Compared with the synonym-based query expansion commonly used in medical question answering, the method obtains more accurate expansion words.
The method performs well on a data set of Chinese medical question-answer pairs and improves the accuracy of the Chinese medical question-answering system.

Claims (1)

1. A medical query expansion method based on a knowledge graph, characterized by comprising the following steps:
step 1, preprocessing a data set of medical question-answer pairs;
1.1 integrating the question-answer pair data set
To keep the data set balanced and facilitate the subsequent classification, deleting invalid question-answer pairs that are unclearly expressed, that do not contain an answer or a question, or that contain pictures, and deleting question-answer pairs outside the four categories of disease diagnosis, disease symptom, disease treatment and disease cause; providing the integrated data set to step 1.2;
1.2 removing stop words
Removing stop words from the questions and answers in the data set using a stop-word list, wherein the stop words comprise high-frequency words with no actual meaning; the result after stop-word removal is provided to step 1.4;
1.3 integrating domain dictionaries
Constructing a medical field dictionary by integrating various existing medical entity dictionaries, wherein the medical field dictionary comprises four categories of diseases, symptoms, medicines and examinations;
1.4 adding the domain dictionary into the dictionary of the jieba word segmenter, and segmenting the question in the data set by using the jieba word segmenter;
after word segmentation, the preprocessing of the question-answer data set in step 1 is complete; the questions in the preprocessed data set are provided to step 2, step 3 and step 4, and the domain dictionary is provided to step 3 and step 4;
step 2, training an SVM classifier to predict the query intention of the question;
2.1 labeling question classification labels
Labeling the intention type of part of the questions obtained in step 1: if the query intention of a question belongs to the disease diagnosis category, it is labeled 0; if it belongs to the disease treatment category, it is labeled 1; if it belongs to the disease symptom category, it is labeled 2; if it belongs to the diagnosis and treatment category, it is labeled 3; if it belongs to the disease cause category, it is labeled 4; the labeled result is provided to step 2.2;
2.2 semi-supervised training SVM intention classifier
A self-training semi-supervised method is used to train the intention classifier, and the initial classifier uses a Support Vector Machine (SVM) algorithm designed for imbalanced samples; training the classifier requires two features: (1) the TF-IDF features of the question; (2) the interrogative feature words of the question:
(1) TF-IDF is a feature vectorization method commonly used in text classification, which reflects the importance of a word in the whole corpus through term frequency (TF) and inverse document frequency (IDF); the calculation formula is as follows:
$$\mathrm{TF\text{-}IDF} = \frac{t}{N} \times \log\frac{x}{w}$$
wherein t is the number of times a word occurs in the document, N is the total number of words in the document, x is the total number of documents, and w is the number of documents in which the word occurs;
(2) the interrogative feature words of the four categories of questions are obtained from statistics on the data set, the questions are encoded with discrete features, and whether a question contains an interrogative feature word of a given category is indicated by the value 0 or 1;
providing the trained intention classifier to the step 2.3;
2.3 inputting the question to be classified into the trained SVM classifier, and providing the classification result, namely the query intention of the question, to the step 3;
step 3, combining the query intention obtained in the step 2 to obtain candidate expansion words related to query from the medical knowledge graph:
3.1 medical knowledge graph acquisition
Extracting triples labeled with the pediatrics department from a public Chinese general medical knowledge graph, acquiring pediatric medical entity relations from a pediatric question-answer corpus crawled from the 39 Health Network using a BERT-based relation extraction method, and integrating the two into a pediatric knowledge graph; the graph is provided to step 3.4;
3.2 counting the negation feature words and termination feature words in the data set, and providing them to step 3.3 and step 4.1;
3.3 query keyword acquisition
Screening the initial query keywords of the sentence according to the question intention label provided by step 2.3, together with the domain dictionary obtained in step 1.3; the screening rules are that symptom entities are selected as the initial query keywords for a disease diagnosis question, disease entities for a disease treatment question, disease entities for a disease symptom question, and symptom entities for a diagnosis and treatment question; then removing the negated medical terms from the initial query keywords by means of negation terms and termination terms to obtain the final query keywords; specifically, a negation window is delimited by a negation term and a termination term, and every medical term inside the window is marked as a negated medical term, wherein the negation terms are the negation feature words obtained in step 3.2, and the termination terms comprise the termination feature words obtained in step 3.2 together with commas, periods and semicolons; providing the obtained query keywords to step 3.4;
3.4 candidate expanded word acquisition
Combining the query keywords from step 3.3 with the query intention obtained in step 2.3, the types of medical terms likely to appear in the answer can be inferred with the following inference rule;
[rule:(Q belongsTo C),(Q hasEntity M)→(A hasEntity N)]
in the formula, Q represents a question, A represents an answer, C represents a query intention, M represents a medical term type screened in the query, and N represents a corresponding medical term type in the answer;
for disease diagnosis questions, the disease entities that may correspond to each query keyword are acquired from the knowledge graph, and the intersection of the disease entities obtained for each symptom in the query is taken as the final candidate expansion words;
for disease treatment questions and symptom inquiry questions, the drug entities and the typical symptoms corresponding to the query keyword, respectively, are selected from the knowledge graph as candidate expansion words;
for compound diagnosis-and-treatment questions, the disease entities are first queried using the processing method for disease diagnosis questions, the commonly used drug entities for those disease entities are then queried using the processing method for disease treatment questions, and finally both the disease entities and the drug entities are output as candidate expansion words;
questions of the disease reason class are not processed for the time being;
the obtained candidate expansion word list is provided to step 4;
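The per-intention lookup in step 3.4 can be sketched as below. The knowledge graph is stood in for by three plain dictionaries, and the intention labels (`"diagnosis"`, `"treatment"`, `"symptom"`, `"diagnosis_treatment"`), the function names, and the placeholder entities are all hypothetical; a real system would query the graph store built in the earlier steps.

```python
from typing import Dict, Iterable, Set

# Toy stand-in for the knowledge graph (placeholder entities only).
KG: Dict[str, Dict[str, Set[str]]] = {
    "symptom_to_diseases": {
        "symptom_1": {"disease_A", "disease_B"},
        "symptom_2": {"disease_A", "disease_C"},
    },
    "disease_to_drugs": {
        "disease_A": {"drug_X"},
        "disease_B": {"drug_Y"},
    },
    "disease_to_symptoms": {
        "disease_A": {"symptom_1", "symptom_2"},
    },
}


def _union(relation: Dict[str, Set[str]], keys: Iterable[str]) -> Set[str]:
    """Union of all neighbours reachable from the given keys."""
    out: Set[str] = set()
    for k in keys:
        out |= relation.get(k, set())
    return out


def _diseases_for(symptoms: Iterable[str], kg) -> Set[str]:
    """Intersection of the disease sets linked to each symptom."""
    sets = [kg["symptom_to_diseases"].get(s, set()) for s in symptoms]
    return set.intersection(*sets) if sets else set()


def candidate_expansions(intent: str, keywords: Iterable[str], kg=KG) -> Set[str]:
    """Apply the rule (Q belongsTo C), (Q hasEntity M) -> (A hasEntity N)."""
    keywords = list(keywords)
    if intent == "diagnosis":            # symptoms -> diseases
        return _diseases_for(keywords, kg)
    if intent == "treatment":            # diseases -> drugs
        return _union(kg["disease_to_drugs"], keywords)
    if intent == "symptom":              # diseases -> typical symptoms
        return _union(kg["disease_to_symptoms"], keywords)
    if intent == "diagnosis_treatment":  # symptoms -> diseases -> drugs
        diseases = _diseases_for(keywords, kg)
        return diseases | _union(kg["disease_to_drugs"], diseases)
    return set()                         # e.g. disease reason: not processed


print(candidate_expansions("diagnosis_treatment", ["symptom_1", "symptom_2"]))
```

Taking the intersection for diagnosis questions keeps only diseases consistent with every symptom in the query, which is why the compound case returns `disease_A` and its drug but not `disease_B` or `disease_C`.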
Step 4, screening the candidate expansion words obtained in step 3 by means of negative medical term recognition and mutual information to obtain the final expansion words:
4.1 marking all negative medical terms in the question-answer data set, using the same marking method as introduced in step 3.3; the marking result is provided to step 4.2;
4.2 calculating the normalized mutual information value of the expansion word and the whole query, and screening to obtain the final expansion word
Calculating the mutual information between each candidate expansion word from step 3.4 and the whole query, and selecting the candidate expansion words whose normalized mutual information is smaller than the expansion threshold as the final expansion words of the query; the mutual information of two words is calculated as follows:
Figure FDA0003566921720000031
wherein one group of question-answer sentences is taken as the co-occurrence window; c(w1, w2) represents the number of co-occurrence windows in which the word w1 appears in the question sentence and the word w2 appears in the answer sentence; c(w1) represents the number of times the medical term w1 appears in the corpus, c(w2) represents the number of times the medical term w2 appears in the corpus, and N represents the number of all medical terms in the corpus; when the mutual information matrix is calculated, occurrences of the negative medical terms marked in step 4.1 are not counted;
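The counting scheme can be sketched as follows. The formula itself survives in this text only as a figure reference, so the sketch assumes the standard pointwise mutual information form, I(w1, w2) = log(N · c(w1, w2) / (c(w1) · c(w2))), which uses exactly the four quantities defined above; the actual patented formula may differ. Returning 0.0 for pairs that never co-occur is a further simplifying assumption, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

QAPair = Tuple[List[str], List[str], Set[str]]  # (question terms, answer terms, negated terms)


def build_counts(qa_pairs: Iterable[QAPair]):
    """Count term and question/answer co-occurrence frequencies.

    Each QA pair is one co-occurrence window. Terms marked as negated
    (step 4.1) are skipped, so they contribute to no count.
    """
    term_count: Counter = Counter()
    pair_count: Counter = Counter()
    for q_terms, a_terms, negated in qa_pairs:
        q = [t for t in q_terms if t not in negated]
        a = [t for t in a_terms if t not in negated]
        term_count.update(q)
        term_count.update(a)
        for w1 in set(q):              # w1 in the question
            for w2 in set(a):          # w2 in the answer
                pair_count[(w1, w2)] += 1
    total = sum(term_count.values())   # N: all medical term occurrences
    return term_count, pair_count, total


def mutual_information(w1: str, w2: str, term_count, pair_count, total: int) -> float:
    """Assumed PMI form: log(N * c(w1, w2) / (c(w1) * c(w2)))."""
    joint = pair_count[(w1, w2)]
    if joint == 0 or term_count[w1] == 0 or term_count[w2] == 0:
        return 0.0                     # assumption: no evidence, no association
    return math.log(total * joint / (term_count[w1] * term_count[w2]))


# Toy corpus with placeholder terms; "d3" is negated in the third pair.
pairs = [
    (["s1"], ["d1"], set()),
    (["s1"], ["d2"], set()),
    (["s2"], ["d1", "d3"], {"d3"}),
]
tc, pc, n = build_counts(pairs)
print(mutual_information("s1", "d1", tc, pc, n))
```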
assuming that each key medical term qi in the initial query Q is independent, the calculation formula of the mutual information value between the expansion word and the whole query statement is as follows:
$$M(Q)=\sum_{q_i\in Q} I(q_i,w)$$
to make it convenient to set a screening threshold, the obtained mutual information value is normalized according to the following formula, wherein $M_{\max}$ and $M_{\min}$ respectively represent the maximum value and the minimum value of M(Q);
$$NM(Q)=\frac{M_{\max}-M(Q)}{M_{\max}-M_{\min}}$$
the candidate expansion words whose normalized mutual information value NM(Q) with the whole query is smaller than the expansion threshold become the final expansion words.
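The summation, normalization and threshold screening of step 4.2 can be sketched as below. The sketch assumes that the pairwise mutual information values are already available in a dictionary keyed by (query term, candidate word), and that the maximum and minimum of M(Q) are taken over the candidates of the current query; the claim does not state the range over which they are taken. The handling of the degenerate case where all candidates score the same is also an assumption, and the names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def query_mi(query_terms: Iterable[str], word: str,
             mi: Dict[Tuple[str, str], float]) -> float:
    """M(Q): sum of I(q_i, w) over the key terms q_i of the query."""
    return sum(mi.get((q, word), 0.0) for q in query_terms)


def select_expansions(query_terms: List[str], candidates: List[str],
                      mi: Dict[Tuple[str, str], float],
                      threshold: float) -> List[str]:
    """Keep candidates whose normalized value NM(Q) is below the threshold."""
    scores = {w: query_mi(query_terms, w, mi) for w in candidates}
    if not scores:
        return []
    m_max, m_min = max(scores.values()), min(scores.values())
    span = m_max - m_min
    selected = []
    for w, m in scores.items():
        # NM(Q) = (Mmax - M(Q)) / (Mmax - Mmin); assumed 0 when all scores tie.
        nm = 0.0 if span == 0 else (m_max - m) / span
        if nm < threshold:
            selected.append(w)
    return selected


# Toy values with placeholder terms.
mi_table = {
    ("s1", "d1"): 2.0, ("s2", "d1"): 1.0,
    ("s1", "d2"): 0.5,
    ("s1", "d3"): 1.0, ("s2", "d3"): 1.0,
}
print(select_expansions(["s1", "s2"], ["d1", "d2", "d3"], mi_table, threshold=0.5))
```

Because the normalization subtracts M(Q) from the maximum, a smaller NM(Q) means a stronger association with the query, so selecting values below the threshold keeps the most strongly associated candidates.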
CN202110454713.5A 2021-04-26 2021-04-26 Medical query expansion method based on knowledge graph Active CN113076411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110454713.5A CN113076411B (en) 2021-04-26 2021-04-26 Medical query expansion method based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110454713.5A CN113076411B (en) 2021-04-26 2021-04-26 Medical query expansion method based on knowledge graph

Publications (2)

Publication Number Publication Date
CN113076411A CN113076411A (en) 2021-07-06
CN113076411B true CN113076411B (en) 2022-06-03

Family

ID=76618763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110454713.5A Active CN113076411B (en) 2021-04-26 2021-04-26 Medical query expansion method based on knowledge graph

Country Status (1)

Country Link
CN (1) CN113076411B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510558A (en) * 2022-01-26 2022-05-17 北京博瑞彤芸科技股份有限公司 Question-answering method and system based on traditional Chinese medicine knowledge graph
CN115618947A (en) * 2022-12-05 2023-01-17 中国人民解放军总医院 Medical knowledge map quality evaluation system, device, equipment, medium and product
CN116052889B (en) * 2023-03-31 2023-07-04 四川无限智达科技有限公司 sFLC prediction system based on blood routine index detection
CN116542817B (en) * 2023-07-06 2023-10-13 北京烽火万家科技有限公司 Intelligent digital lawyer consultation method and system
CN116932767B (en) * 2023-09-18 2023-12-12 江西农业大学 Text classification method, system, storage medium and computer based on knowledge graph

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391906A (en) * 2017-06-19 2017-11-24 华南理工大学 Health diet knowledge network construction method based on neutral net and collection of illustrative plates structure
CN108986871A (en) * 2018-08-27 2018-12-11 东北大学 A kind of construction method of intelligent medical treatment knowledge mapping

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NZ520461A (en) * 2000-02-14 2005-03-24 First Opinion Corp Automated diagnostic system and method
US11158427B2 (en) * 2017-07-21 2021-10-26 International Business Machines Corporation Machine learning for medical screening recommendations based on patient activity information in social media
CN108256061A (en) * 2018-01-16 2018-07-06 华东师范大学 Search method, electronic equipment and the storage medium of medical text
CN109241257B (en) * 2018-08-20 2022-07-19 重庆柚瓣家科技有限公司 Intelligent question-answering system and method based on knowledge graph
CN111966780A (en) * 2019-05-20 2020-11-20 天津科技大学 Retrospective queue selection method and device based on word vector modeling and information retrieval
CN111370127B (en) * 2020-01-14 2022-06-10 之江实验室 Decision support system for early diagnosis of chronic nephropathy in cross-department based on knowledge graph
CN112241457A (en) * 2020-09-22 2021-01-19 同济大学 Event detection method for event of affair knowledge graph fused with extension features

Also Published As

Publication number Publication date
CN113076411A (en) 2021-07-06

Similar Documents

Publication Publication Date Title
CN113076411B (en) Medical query expansion method based on knowledge graph
CN110765257B (en) Intelligent consulting system of law of knowledge map driving type
Roberts A conceptual framework for quantitative text analysis
Li et al. Database integration using neural networks: implementation and experiences
Ferrández et al. Addressing ontology-based question answering with collections of user queries
CN109308321A (en) A kind of knowledge question answering method, knowledge Q-A system and computer readable storage medium
CN112559684A (en) Keyword extraction and information retrieval method
CN107506472B (en) Method for classifying browsed webpages of students
CN114416942A (en) Automatic question-answering method based on deep learning
CN112925918B (en) Question-answer matching system based on disease field knowledge graph
CN112214335A (en) Web service discovery method based on knowledge graph and similarity network
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
CN116992007B (en) Limiting question-answering system based on question intention understanding
CN112307182A (en) Question-answering system-based pseudo-correlation feedback extended query method
CN114265926A (en) Natural language-based material recommendation method, system, equipment and medium
CN115840812A (en) Method and system for intelligently matching enterprises according to policy text
CN110188170B (en) Multi-entry medical question template device and method thereof
Trabelsi et al. A hybrid deep model for learning to rank data tables
CN111597330A (en) Intelligent expert recommendation-oriented user image drawing method based on support vector machine
CN115828854B (en) Efficient table entity linking method based on context disambiguation
Pinto et al. What Drives Research Efforts? Find Scientific Claims that Count!
CN116227594A (en) Construction method of high-credibility knowledge graph of medical industry facing multi-source data
CN114238735B (en) Intelligent internet data acquisition method
CN113553419A (en) Civil aviation knowledge map question-answering system
CN114817497A (en) Mixed question-answering method based on intention recognition and template matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant