CN106874643B - Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors - Google Patents

Publication number
CN106874643B
Authority
CN
China
Prior art keywords
disease
related factor
dictionary
unit
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611222893.XA
Other languages
Chinese (zh)
Other versions
CN106874643A (en)
Inventor
张文生
牛景昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201611222893.XA
Publication of CN106874643A
Application granted
Publication of CN106874643B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • G06N5/043: Distributed expert systems; Blackboards
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention relates to a method and a system for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment. The method may comprise the following steps: obtaining a patient description; performing keyword matching on the patient description using an expanded disease-disease related factor dictionary built from word vectors, and extracting the medically relevant words and expressions in the patient description; detecting whether the extracted words and expressions are in a standard disease-disease related factor dictionary; based on the detection result, calculating a score for each disease using the relevance scores of disease-related factors with respect to diseases, which are obtained from the expanded disease-disease related factor dictionary; ranking the disease scores; and determining the disease according to the ranking result. The invention thereby solves the technical problem of how to make predictions from a patient's colloquial description of their condition.

Description

Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors
Technical Field
Embodiments of the invention relate to the technical field of data processing, and in particular to a method and a system for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment.
Background
With the rapid development of online doctor-patient question-and-answer websites and mobile applications in internet healthcare, massive numbers of question-answer pairs have accumulated, each consisting of a patient's colloquial description of their condition and other information together with the corresponding doctor's diagnosis. Together they form a valuable consultation knowledge base. Because these records are mostly unstructured and contain many non-standard medical terms arising from colloquial description, using the data directly poses many challenges. At the same time, online consultations involve a great deal of repetitive work, which wastes scarce physician time. If an artificial intelligence algorithm could produce a preliminary diagnosis in place of a doctor, consultation efficiency could be greatly improved. The task can be summarized as follows: given a newly entered description of a patient's sex, age, symptoms, disease history and other information, return a predicted diagnosis by applying sentence analysis and related algorithms in combination with a pre-constructed domain knowledge graph.
Existing technical solutions mainly fall into two kinds. (1) Searching the question-answer library for the question most similar to the patient's description and returning the corresponding doctor's diagnosis. The main problem with this kind of method is that the disease information in the patient's description is not actually analyzed; textual similarity does not fully reflect similarity of medical condition, so matching accuracy is poor. (2) Having the patient click on items such as symptoms and affected body parts, summing the expert-assigned scores that link each selected item to each disease, and finally returning a ranked list of probable diseases. The problems with this kind of method are that manual scoring is unstable and subjective, that it consumes a great deal of labor and time when many diseases must be labeled, and that the diagnostic system cannot analyze or use any information beyond the selectable symptoms.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
To solve the above problem in the prior art, namely the technical problem of how to make predictions from a patient's colloquial description of their condition, an embodiment of the present invention provides a method for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment. An embodiment of the invention also provides a corresponding system.
In order to achieve the above object, according to one aspect of the present invention, the following technical solutions are provided:
a method for automatically constructing a knowledge base to realize auxiliary diagnosis and treatment based on word vectors comprises the following steps:
obtaining a patient description;
performing keyword matching on the patient description using an expanded disease-disease related factor dictionary built from word vectors, and extracting the medically relevant words and expressions in the patient description;
detecting whether the extracted words and expressions are in a standard disease-disease related factor dictionary;
based on the detection result, calculating a score for each disease using the relevance scores of disease-related factors with respect to diseases, which are obtained from the expanded disease-disease related factor dictionary;
ranking the disease scores;
and determining the disease according to the ranking result.
Further, the expanded disease-disease related factor dictionary may be built by:
training a word-vector embedding distributed representation model of diseases and disease-related factors using medical information;
and, based on the word-vector embedding distributed representation model, expanding the standard disease-disease related factor dictionary using a distance measurement method to establish the expanded disease-disease related factor dictionary.
Further, training the word-vector embedding distributed representation model using medical information may specifically include:
acquiring a medical information training corpus;
cleaning the medical information training corpus;
counting the high-frequency expression patterns that appear in the question-answer library records, increasing the weight of those patterns in the word segmentation model, and performing Chinese word segmentation to obtain a training text;
training on the training text to generate the word-vector embedding distributed representation model.
Further, the relevance score of a disease-related factor with respect to a disease may be determined by:
based on the word-vector embedding distributed representation model, expanding the standard disease-disease related factor dictionary using a distance measurement method and establishing a replacement word list;
matching diseases and disease-related factors in the medical information using the expanded disease-disease related factor dictionary and the replacement word list, and calculating the relevance score of each disease-related factor with respect to each disease.
Further, matching diseases and disease-related factors in the medical information using the expanded disease-disease related factor dictionary and the replacement word list, and calculating the relevance score of each disease-related factor with respect to each disease, may specifically include:
performing keyword matching on the doctor-patient question-answer records using the expanded disease-disease related factor dictionary, and extracting the medically relevant words and expressions in those records;
detecting whether the extracted words and expressions are in the standard disease-disease related factor dictionary;
if not, normalizing the extracted words and expressions to the corresponding standard expressions according to the replacement word list;
counting, on the basis of the standard expressions, how often each disease co-occurs with each of its related factors, to obtain a co-occurrence frequency matrix of disease-related factors and diseases;
and applying a nonlinear transformation to the co-occurrence frequency matrix to obtain the relevance score of each disease-related factor with respect to each disease.
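The normalization and counting steps above can be illustrated with a minimal sketch. It assumes a hypothetical record format in which each question-answer record has already been reduced to a diagnosed disease plus a list of extracted expressions; the function name, the sample records and the English terms are illustrative and do not come from the patent.

```python
from collections import defaultdict

def cooccurrence_matrix(records, standard_factors, replacements):
    """Count how often each standard factor co-occurs with each diagnosed disease.

    records: iterable of (disease, extracted_expressions) pairs (hypothetical format).
    standard_factors: set of standard disease-related factor terms.
    replacements: dict mapping a non-standard expression to its standard term.
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[disease][factor] = N_ij
    for disease, expressions in records:
        normalized = set()
        for expr in expressions:
            if expr in standard_factors:
                normalized.add(expr)                # already a standard expression
            elif expr in replacements:
                normalized.add(replacements[expr])  # normalize via the replacement list
        for factor in normalized:
            counts[disease][factor] += 1            # one count per record and factor
    return {d: dict(row) for d, row in counts.items()}

# Toy data, invented for illustration only.
records = [
    ("influenza", ["high temperature", "cough"]),
    ("influenza", ["fever"]),
    ("pneumonia", ["fever", "cough"]),
]
standard = {"fever", "cough"}
repl = {"high temperature": "fever"}
matrix = cooccurrence_matrix(records, standard, repl)
```

Counting each factor at most once per record is a design choice of this sketch; the text does not say whether repeated mentions within one record should be counted separately.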
Further, the method may further include:
based on the word-vector embedding distributed representation model, expanding the standard disease-disease related factor dictionary using a distance measurement method and establishing a replacement word list;
wherein detecting whether the extracted words and expressions are in the standard disease-disease related factor dictionary specifically comprises:
if they are not, normalizing the extracted words and expressions to the corresponding standard expressions according to the replacement word list, to obtain standardized disease-related factors;
and wherein calculating the score of the disease, based on the detection result and the relevance scores obtained from the expanded disease-disease related factor dictionary, specifically comprises:
calculating the score of each disease from the standardized disease-related factors and the relevance scores of those factors with respect to the disease, the relevance scores being obtained from the expanded disease-disease related factor dictionary.
Further, the relevance score of a disease-related factor with respect to a disease may be determined by the following formula:
Figure GDA0001267108800000041
where $\mathrm{Score}(i, j)$ denotes the relevance score of disease-related factor $F_j$ with respect to disease $D_i$; $P(D_i \mid F_j)$ denotes the conditional probability of having the disease; $D_i$ denotes a disease; $F_j$ denotes a disease-related factor; $N_i$ denotes the frequency of the disease, with $N_i = \sum_j N_{ij}$; and $N_{ij}$ denotes the recorded frequency.
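The formula itself is contained in a figure that is not reproduced in this text, so it is not guessed at here. The following sketch only computes the two quantities named in the definitions from a co-occurrence matrix. The estimator for $P(D_i \mid F_j)$, which divides $N_{ij}$ by the total count of factor $F_j$ across all diseases, is an assumption of this sketch, as are the function names and toy counts.

```python
def disease_frequency(matrix):
    """N_i = sum over j of N_ij, for each disease i."""
    return {disease: sum(row.values()) for disease, row in matrix.items()}

def conditional_probability(matrix):
    """Estimate P(D_i | F_j) as N_ij / sum_i N_ij (an assumed estimator)."""
    factor_totals = {}
    for row in matrix.values():
        for factor, n in row.items():
            factor_totals[factor] = factor_totals.get(factor, 0) + n
    return {
        disease: {factor: n / factor_totals[factor] for factor, n in row.items()}
        for disease, row in matrix.items()
    }

# matrix[disease][factor] = N_ij; toy counts invented for illustration.
matrix = {
    "influenza": {"fever": 2, "cough": 1},
    "pneumonia": {"fever": 1, "cough": 1},
}
n_i = disease_frequency(matrix)
p = conditional_probability(matrix)
```

Any nonlinear transformation of these quantities into $\mathrm{Score}(i, j)$ would be applied on top of `n_i` and `p`.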
Further, the score of the disease may be obtained by the following formula:
Figure GDA0001267108800000042
where $DS(D_i)$ denotes the score of the disease; $D_i$ denotes a disease; $W(F_j)$ denotes the mapping weight of the disease category; and $\mathrm{Score}(i, j)$ denotes the relevance score of disease-related factor $F_j$ with respect to disease $D_i$.
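The disease score formula is likewise contained in a figure that is not reproduced here. Given the symbols defined above, one plausible reading is a weighted sum over the patient's factors, $DS(D_i) = \sum_j W(F_j)\,\mathrm{Score}(i, j)$. The sketch below implements that reading as an assumption, not as the patent's confirmed formula; the relevance values and weights are invented.

```python
def disease_scores(patient_factors, relevance, weight):
    """Assumed form: DS(D_i) = sum over the patient's factors of W(F_j) * Score(i, j)."""
    return {
        disease: sum(weight.get(f, 1.0) * row.get(f, 0.0) for f in patient_factors)
        for disease, row in relevance.items()
    }

def rank_diseases(scores):
    """Sort diseases from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# relevance[disease][factor] = Score(i, j); toy values for illustration.
relevance = {
    "influenza": {"fever": 0.8, "cough": 0.5},
    "pneumonia": {"fever": 0.4, "cough": 0.9},
}
weight = {"fever": 1.0, "cough": 2.0}  # hypothetical W(F_j)
scores = disease_scores({"fever", "cough"}, relevance, weight)
ranking = rank_diseases(scores)
```

The top entry of `ranking` corresponds to the disease that would be returned by the final determining step.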
In order to achieve the above object, according to another aspect of the present invention, the following technical solutions are also provided:
a system for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment can comprise:
an acquisition module for acquiring a patient description;
the extraction module is used for performing keyword matching on the patient description by utilizing the expanded disease-disease related factor dictionary established based on the word vector, and extracting words and expressions related to medicine in the patient description;
the detection module is used for detecting whether the extracted words and expressions are in a standard disease-disease related factor dictionary or not;
a calculation module for calculating a score of the disease based on the detection result in combination with a correlation score of the disease-related factor obtained from the expanded disease-related factor dictionary with respect to the disease;
the sorting module is used for sorting scores of diseases;
and the determining module is used for determining the diseases according to the sequencing result.
Further, the extraction module may further specifically include:
a word vector model building unit for training a word vector embedding distributed representation model about the disease-disease related factors using the medical information;
and an expanded dictionary establishing unit for expanding the standard disease-disease related factor dictionary using a distance measurement method, based on the word-vector embedding distributed representation model, to establish the expanded disease-disease related factor dictionary.
Further, the word vector model establishing unit may specifically include:
the acquisition unit is used for acquiring the medical information training corpus;
the cleaning unit is used for cleaning the medical information training corpus;
the first statistical unit is used for counting high-frequency expression modes appearing in the records of the question-answering library, increasing the weight of the high-frequency expression modes in the word segmentation model, and performing Chinese word segmentation to obtain a training text;
and the generating unit is used for training the training text and generating a word vector embedded distributed representation model.
Further, the calculation module may further specifically include:
a first replacement word list establishing unit for expanding the standard disease-disease related factor dictionary using a distance measurement method, based on the word-vector embedding distributed representation model, and establishing a replacement word list;
and a correlation score calculation unit for matching the disease-disease related factors in the medical information using the expanded disease-disease related factor dictionary and the replacement word list, and calculating a correlation score of the disease related factors corresponding to the disease.
Further, the correlation score calculating unit may specifically include:
the extraction unit is used for matching keywords with the doctor-patient question-answer records by utilizing the expanded disease-disease related factor dictionary and extracting medically related words and expressions in the doctor-patient question-answer records;
the detection unit is used for detecting whether the words and expressions related to the medicine in the extracted doctor-patient question-answer records are in a standard disease-disease related factor dictionary or not;
a first normalization unit for normalizing the medically relevant words and expressions extracted from the doctor-patient question-answer records to the corresponding standard expressions according to the replacement word list when they are not in the standard disease-disease related factor dictionary;
the second statistical unit is used for counting the frequency of the co-occurrence of the diseases and the related factors thereof based on the standard expression to obtain a co-occurrence frequency recording matrix of the disease related factors and the diseases;
and the nonlinear transformation unit is used for obtaining the correlation score of the disease-related factor corresponding to the disease by using a nonlinear transformation method based on the co-occurrence frequency recording matrix of the disease-related factor and the disease.
Further, the system comprises:
a second replacement word list establishing unit for expanding the standard disease-disease related factor dictionary using a distance measurement method, based on the word-vector embedding distributed representation model, and establishing a replacement word list;
the detection module may specifically include:
the second normalization unit is used for normalizing the extracted words and expressions to corresponding standard expressions according to the replacement word list to obtain standardized disease related factors when the extracted words and expressions are not in the standard disease-disease related factor dictionary;
the calculating module may specifically include:
and the disease score calculating unit is used for calculating the score of the disease based on the standardized disease related factors and the relevance scores of the disease related factors corresponding to the disease, which are obtained according to the expanded disease-disease related factor dictionary.
Embodiments of the invention provide a method and a system for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment. The method may comprise the following steps: obtaining a patient description; performing keyword matching on the patient description using an expanded disease-disease related factor dictionary built from word vectors, and extracting the medically relevant words and expressions in the patient description; detecting whether the extracted words and expressions are in a standard disease-disease related factor dictionary; based on the detection result, calculating a score for each disease using the relevance scores of disease-related factors with respect to diseases, which are obtained from the expanded disease-disease related factor dictionary; ranking the disease scores; and determining the disease according to the ranking result. The embodiments use distributed word-vector representations trained for the medical domain to build an expanded keyword dictionary of diseases and disease-related factors. They can learn from multi-source medical information, including general medical data and colloquial online doctor-patient question-and-answer records, to construct a disease knowledge graph, and can analyze and process non-standard, colloquial patient descriptions. The technical problem of how to make predictions from a patient's colloquial description of their condition is thereby solved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a method for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a system for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
The basic idea of the embodiments is to use word-vector embedding to generate distributed representations of the diseases and disease-related factors found in general medical information, online colloquial doctor-patient question-and-answer records and patient case databases, to automatically construct a knowledge graph of diseases and disease-related factors from them, and thereby to provide auxiliary diagnosis from a patient's colloquial description of their condition.
The terms or definitions to be explained are as follows:
Disease-related factors: factors that may cause a disease, help in judging it, or carry information about it, such as symptoms, disease history, age, signs and sex.
Word vector embedding: using the method of "Distributed Representation", a word (or phrase) is represented by a continuous real-valued vector of low dimension (for example, fewer than 1000 dimensions), so that words can be distinguished and represented by their vectors for natural language processing tasks such as text classification and relation extraction.
Co-occurrence frequency: when two words or concepts appear together in the same passage or document, this is called a co-occurrence; the number of such joint appearances counted over all the documents considered is the co-occurrence frequency.
An embodiment of the invention provides a method for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment. As shown in Fig. 1, the method may include:
S100: obtaining a patient description.
S110: performing keyword matching on the patient description using the expanded disease-disease related factor dictionary built from word vectors, and extracting the medically relevant words and expressions in the patient description.
The expanded disease-disease related factor dictionary is established through steps S112 to S114.
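The keyword matching of step S110 is not specified in detail. One simple possibility, shown here purely as an assumption, is a greedy longest-match scan, which works on unsegmented text and prefers longer dictionary entries over their substrings. The function name and sample dictionary are hypothetical.

```python
def extract_terms(text, dictionary):
    """Greedy longest-match scan: at each position, take the longest
    dictionary entry that starts there; otherwise advance one character."""
    max_len = max((len(term) for term in dictionary), default=0)
    found = []
    i = 0
    while i < len(text):
        match = None
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is not None:
            found.append(match)
            i += len(match)   # skip past the matched expression
        else:
            i += 1
    return found

# Hypothetical expanded dictionary and patient description.
expanded = {"fever", "high fever", "cough", "insomnia"}
terms = extract_terms("i have had a high fever and a bad cough since monday", expanded)
```

Because the scan prefers the longest entry, "high fever" is extracted as one expression rather than as the shorter "fever".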
S112: training a word-vector embedding distributed representation model of diseases and disease-related factors using medical information.
The medical information includes, but is not limited to, general medical information, doctor-patient question-and-answer records, patient cases, and text data concerning diseases and disease-related factors. General medical information includes, but is not limited to, medical literature (e.g., medical papers and medical patent documents) and textbooks (especially medical textbooks).
Preferably, the doctor-patient question-answer records are online doctor-patient spoken question-answer records.
Specifically, step S112 may include:
S1121: acquiring a medical information training corpus.
The medical information training corpus may include, but is not limited to, a question-answer library, medical textbooks, a case library, and the like.
S1122: cleaning the medical information training corpus.
The purpose of this step is to remove meaningless characters.
S1123: counting the high-frequency expression patterns that appear in the question-answer library records, increasing the weight of those patterns in the word segmentation model, and performing Chinese word segmentation to obtain a training text.
S1124: training on the training text to generate the word-vector embedding distributed representation model.
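The cleaning step S1122 is described only as removing meaningless characters. The sketch below shows one way this might look, assuming that HTML tags, URLs, HTML entities and decorative symbols count as meaningless; that choice, the regular expression and the sample string are assumptions of this sketch.

```python
import re

# Assumed notion of "meaningless": HTML tags, URLs, HTML entities, and any
# character that is not a word character, whitespace, or common punctuation.
_NOISE = re.compile(
    r"<[^>]+>|https?://\S+|&[#\w]+;|[^\w\s,.?!;:，。？！；：、]"
)

def clean(text):
    """Replace noise with spaces, then collapse runs of whitespace."""
    text = _NOISE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

raw = "<p>头痛  三天了★★</p>  see http://example.com/x  &nbsp; fever!!"
cleaned = clean(raw)
```

In Python 3, `\w` matches Chinese characters, so the Chinese text survives while markup and symbols are dropped.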
The training corpora that can be used include, but are not limited to, online doctor-patient question-and-answer records, patient medical records and textbooks. The embodiment of the invention trains a word-vector embedding representation model for the medical field using, for example but not exclusively, the open-source word2vec tool proposed by Tomas Mikolov (https://github.com/danielfrg/word2vec), and stores the model in the knowledge base. A neural network or another training algorithm may be used for training. For methods of word vector training, see application Nos. 201610179115.0 and 201510096570.X, which are hereby incorporated by reference. Experiments reported in the representation learning literature show that larger training corpora yield better word vectors.
For example, in practical applications, 300-dimensional word vectors for several hundred thousand expressions may be trained on the segmented and cleaned text data, with high-frequency words represented as:
<tumors: 0.176907, 0.470268, -0.008468, … (300 dimensions in total)>
<blood sugar: 0.149234, 0.278761, -0.474681, … (300 dimensions in total)>
<fever: 0.184283, 0.046142, -0.107758, … (300 dimensions in total)>
<high fever: 0.204092, 0.089622, 0.0057266, … (300 dimensions in total)>
<high fever: 0.366153, 0.314256, 0.073571, … (300 dimensions in total)>
In some optional implementations, the step of training on the training text may further include: representing the high-frequency words in the training text as low-dimensional real-valued vectors.
The dimensionality may be set according to actual conditions, for example to fewer than 1000 dimensions.
S114: based on the word-vector embedding distributed representation model, expanding the standard disease-disease related factor dictionary using a distance measurement method to establish the expanded disease-disease related factor dictionary.
Those skilled in the art will appreciate that a replacement word list can also be established while establishing the expanded disease-disease related factor dictionary; that is, based on the word-vector embedding distributed representation model, the standard disease-disease related factor dictionary is expanded using a distance measurement method and the replacement word list is established.
The standard disease-disease related factor dictionary is constructed and maintained by medical experts for the specific prediction task. It is a collection of standard terms for diseases and disease-related factors, drawn up, corrected and maintained by the experts with reference to authoritative textbooks and established standards, and it must be organized around the specific diseases to be predicted and related information such as symptoms, disease history, age, signs and sex. For example, heart disease and depression may be two elements of the standard disease dictionary (set), while insomnia and a history of diabetes may be two elements of the standard disease-related factor dictionary (set).
Distance measurement methods include, but are not limited to, cosine distance, Euclidean distance, or other distance measurement methods.
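The two named measures can be written out directly. This is a minimal standard-library sketch; cosine distance is taken here as one minus cosine similarity, and both vectors are assumed to be non-zero and of equal length.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d_orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
d_parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])
d_euclid = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

Cosine distance ignores vector length and compares direction only, which is why the parallel pair above is at distance zero even though the vectors differ in magnitude.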
For each element in the standard disease-disease related factor dictionary, the distance measurement method is used to find the k nearest word or phrase expressions in the word vector vocabulary, and these are recorded as replacements for that element. This creates a replacement word list that maps variant expressions to standard expressions, and at the same time creates the expanded disease-disease related factor dictionary of the knowledge base: each replacement is added to the original standard dictionary to form the expanded dictionary. Here k is a parameter that can be tuned for the specific task and data.
The following describes in detail how the expanded disease-disease related factor dictionary and the replacement word list are obtained in the preferred embodiment, taking the standard disease-related factor term "fever" as an example. The process comprises steps A1 to A3.
Step A1: using cosine distance, find the word or phrase expressions closest to "fever"; with the parameter k set to 2, the two expressions listed above as "high fever" are obtained.
Step A2: add both expressions to the expanded disease-disease related factor dictionary, and record them in the replacement word list as replacements for the standard disease-related factor "fever".
Step A3: using the trained medical-domain word-vector embedding distributed representation model, perform the same operation on every element of the standard disease-disease related factor dictionary to obtain the expanded disease-disease related factor dictionary and the replacement word list.
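Steps A1 to A3 can be sketched with toy two-dimensional vectors standing in for the 300-dimensional model. The vectors and English expressions are invented for illustration. Two choices are assumptions of this sketch rather than statements from the text: standard terms are excluded from the candidate neighbors, and an expression that falls among the k nearest of several standard terms is assigned to the closest one.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def expand_dictionary(standard_terms, vectors, k):
    """Return (expanded dictionary, replacement word list) from k nearest neighbors."""
    best = {}  # candidate expression -> (distance, standard term)
    for term in sorted(standard_terms):
        if term not in vectors:
            continue
        scored = sorted(
            (cosine_distance(vectors[term], vec), word)
            for word, vec in vectors.items()
            if word not in standard_terms      # assumption: skip standard terms
        )
        for dist, word in scored[:k]:
            if word not in best or dist < best[word][0]:
                best[word] = (dist, term)      # assumption: closest term wins
    replacement = {word: term for word, (_, term) in best.items()}
    expanded = set(standard_terms) | set(replacement)
    return expanded, replacement

# Toy vectors, invented for illustration only.
vectors = {
    "fever": [1.0, 0.1],
    "high temperature": [0.9, 0.2],
    "burning up": [0.8, 0.3],
    "cough": [0.1, 1.0],
    "hacking cough": [0.2, 0.9],
    "insomnia": [-1.0, 0.2],
}
expanded, replacement = expand_dictionary({"fever", "cough"}, vectors, k=2)
```

With k = 2, "burning up" appears among the nearest neighbors of both standard terms and is assigned to "fever", the closer of the two.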
S120: detecting whether the extracted words and expressions are in the standard disease-disease related factor dictionary.
In this step, if the extracted words and expressions are found in the standard disease-disease related factor dictionary, they are not processed; if not, they are normalized to the corresponding standard expressions according to the replacement word list to obtain standardized disease-related factors. The replacement word list is built by expanding the standard disease-disease related factor dictionary using a distance measurement method based on the word-vector embedding distributed representation model.
"Not processed" means that subsequent processing uses the standard disease-related factors from the standard disease-disease related factor dictionary directly.
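The detection and normalization of step S120 amount to a dictionary lookup. In the minimal sketch below, the function name and sample terms are hypothetical, and dropping expressions that appear in neither the standard dictionary nor the replacement word list is an assumption.

```python
def normalize_terms(extracted, standard_terms, replacement):
    """Map each extracted expression to a standard disease-related factor."""
    normalized = []
    for term in extracted:
        if term in standard_terms:
            normalized.append(term)               # already standard: used as is
        elif term in replacement:
            normalized.append(replacement[term])  # variant: mapped to standard term
        # expressions found in neither table are dropped (assumption)
    return list(dict.fromkeys(normalized))        # de-duplicate, keep first-seen order

standard = {"fever", "cough"}
replacement = {"high temperature": "fever"}
factors = normalize_terms(["high temperature", "cough", "fever", "tired"], standard, replacement)
```

The resulting list of standardized factors is what the scoring step S130 would take as input.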
S130: Calculate the score of each disease based on the detection result, combined with the relevance scores of disease-related factors with respect to diseases obtained from the expanded disease-disease related factor dictionary.
In this embodiment, when the extracted words and expressions are not found in the standard disease-disease related factor dictionary, they are normalized to the corresponding standard expressions according to the replacement word list to obtain standardized disease-related factors, and the score of each disease is calculated from these standardized disease-related factors combined with the relevance scores obtained from the expanded disease-disease related factor dictionary. When the extracted words and expressions are found in the standard disease-disease related factor dictionary, the score of each disease is calculated from the standardized disease-related factors in that dictionary, again combined with the relevance scores obtained from the expanded disease-disease related factor dictionary.
Here, the relevance score of a disease-related factor with respect to a disease is determined by steps S132 to S134.
S132: Based on the word-vector embedding distributed representation model, expand the standard disease-disease related factor dictionary using a distance metric and establish the replacement word list.
S134: Match diseases and disease-related factors in the medical information using the expanded disease-disease related factor dictionary and the replacement word list, and calculate the relevance score of each disease-related factor with respect to each disease.
Specifically, step S134 may include:
S1341: Perform keyword matching on the doctor-patient question-answer records using the expanded disease-disease related factor dictionary, and extract the medically related words and expressions in those records.
In a preferred embodiment, this step may use the expanded disease-disease related factor dictionary to perform keyword matching on the condition descriptions and diagnosis results in the doctor-patient question-answer library, and extract the medically related words and expressions in the doctor-patient question-answer records.
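The text does not specify how keyword matching is carried out. One simple possibility, shown here only as an assumption, is a greedy longest-match scan of the record against the expanded dictionary; the function name and sample text are hypothetical.

```python
def extract_terms(text, expanded_dictionary):
    """Greedy longest-match keyword extraction (an assumed matching strategy)."""
    # Try longer dictionary entries first so "high fever" wins over "fever".
    terms = sorted((t for t in expanded_dictionary if t), key=len, reverse=True)
    found = []
    position = 0
    while position < len(text):
        for term in terms:
            if text.startswith(term, position):
                found.append(term)
                position += len(term)
                break
        else:
            position += 1  # no entry starts here; move on by one character
    return found

dictionary = {"fever", "high fever", "smoking habit", "cold"}
record = "i have had a high fever for days and a smoking habit"
matched = extract_terms(record, dictionary)
```

Character-level scanning suits unsegmented Chinese text; for pre-segmented text, matching on tokens would be the more natural choice.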
S1342: Detect whether the medically related words and expressions extracted from the doctor-patient question-answer records are in the standard disease-disease related factor dictionary. If so, go to step S1343; otherwise, go to step S1344.
This step checks the extracted words and expressions one by one against the standard disease-disease related factor dictionary. Those found in it receive no special processing; those not found are normalized to the corresponding standard expressions according to the replacement word list.
S1343: No processing is performed.
This step means that the subsequent processing uses the standard expression in the standard disease-disease related factor dictionary.
S1344: Normalize the medically related words and expressions extracted from the doctor-patient question-answer records to the corresponding standard expressions according to the replacement word list.
Step S1344 may further include: normalizing a medically related word or expression that corresponds to a plurality of standard diseases or disease-related factors.
Specifically, when an expression corresponds to a plurality of standard diseases or disease-related factors, the standard factor closest to the expression is determined and substituted for it, giving the standard disease-related factor that corresponds to the patient's description.
As an example, when a word or expression corresponds to more than one standard disease or disease-related factor, a distance such as (but not limited to) the cosine distance or the Euclidean distance is used to find the standard concept closest to it, which then replaces the current expression; this is the normalization of medically related words and expressions.
For a single patient input, this operation yields the Q normalized disease-related factors it contains: $\{F_1, F_2, \ldots, F_j, \ldots, F_Q\}$.
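Choosing among several candidate standard expressions can be sketched as a nearest-neighbour lookup. The function names, the expressions and the toy vectors below are invented for illustration; cosine distance is used, but Euclidean distance could be substituted.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def resolve_standard(expression, candidates, vectors):
    """Pick the candidate standard expression closest to `expression`."""
    return min(
        candidates,
        key=lambda standard: cosine_distance(vectors[expression], vectors[standard]),
    )

# Toy vectors, invented for illustration.
vectors = {
    "burning sensation": [0.7, 0.7],
    "fever": [1.0, 0.0],
    "heartburn": [0.6, 0.8],
}
chosen = resolve_standard("burning sensation", ["fever", "heartburn"], vectors)
```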
S1345: Based on the standard expressions, count how often each disease co-occurs with each of its related factors to obtain a co-occurrence frequency matrix of disease-related factors and diseases.
The standard disease-disease related factor dictionary contains two kinds of elements: diseases and disease-related factors. For example, the m diseases are denoted $\{D_1, D_2, \ldots, D_i, \ldots, D_m\}$ and the n disease-related factors are denoted $\{F_1, F_2, \ldots, F_j, \ldots, F_n\}$; each $N_{ij}$ is initialized to zero. Among the P records $\{R_1, R_2, \ldots, R_s, \ldots, R_P\}$ in the question-answer library, if $D_i$ and $F_j$ appear together in record $R_s$, $N_{ij}$ is increased by 1; that is, one co-occurrence of that disease and that disease-related factor is recorded. Counting over all P records yields an $m \times n$ co-occurrence frequency matrix of disease-related factors and diseases.
Here, P denotes the number of records in the question-answer library; $R_1, R_2, \ldots, R_s, \ldots, R_P$ denote the question-answer library records; and $N_{ij}$ denotes the recorded co-occurrence frequency.
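The counting procedure of S1345 can be sketched as follows, assuming each record has already been reduced to the set of standard expressions it contains. The function name and the three sample records are hypothetical.

```python
def cooccurrence_matrix(records, diseases, factors):
    """Return the m x n matrix N, where N[i][j] counts the records that
    contain both diseases[i] and factors[j]."""
    N = [[0] * len(factors) for _ in diseases]
    for record in records:
        present = set(record)
        for i, disease in enumerate(diseases):
            if disease not in present:
                continue
            for j, factor in enumerate(factors):
                if factor in present:
                    N[i][j] += 1  # one co-occurrence recorded
    return N

diseases = ["cold", "pharyngitis"]
factors = ["fever", "sore throat"]
records = [
    {"cold", "fever", "sore throat"},
    {"cold", "fever"},
    {"pharyngitis", "sore throat"},
]
N = cooccurrence_matrix(records, diseases, factors)
```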
S1346: Based on the co-occurrence frequency matrix of disease-related factors and diseases, obtain the relevance score of each disease-related factor with respect to each disease using a nonlinear transformation.
In a specific implementation, the following is considered: if, in a given record, disease-related factor $F_j$ is known to be present, then the conditional probability of having disease $D_i$ is $P(D_i \mid F_j) = N_{ij} / \sum_i N_{ij}$. This conditional probability reflects, to some extent, how strongly the disease-related factor indicates the disease, but it is easily affected by the cumulative effect of high-frequency common diseases, so that common diseases that appear many times in the records obtain extremely high conditional probabilities. The final scoring function should therefore also include a control parameter related to $N_i = \sum_j N_{ij}$. This is similar to the inverse document frequency idea used in document classification.
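The two quantities named in this paragraph, $P(D_i \mid F_j)$ and $N_i$, can be computed from the co-occurrence matrix as in the following sketch. The function names and the small matrix are illustrative assumptions.

```python
def conditional_probabilities(N):
    """Return P[i][j] = N[i][j] / sum_i N[i][j], i.e. P(D_i | F_j)."""
    m, n = len(N), len(N[0])
    column_totals = [sum(N[i][j] for i in range(m)) for j in range(n)]
    return [
        [N[i][j] / column_totals[j] if column_totals[j] else 0.0 for j in range(n)]
        for i in range(m)
    ]

def disease_frequencies(N):
    """Return N_i = sum_j N[i][j] for every disease i."""
    return [sum(row) for row in N]

# Rows are diseases, columns are disease-related factors (toy counts).
N = [[2, 1], [0, 1]]
P = conditional_probabilities(N)
N_i = disease_frequencies(N)
```

In this toy matrix the first disease accounts for every record containing the first factor, so its conditional probability is 1.0, which illustrates how a frequent disease can dominate.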
Preferably, the relevance score of a disease-associated factor to a disease can be determined by the following formula:
Figure GDA0001267108800000121
where Score(i, j) denotes the relevance score of disease-related factor $F_j$ with respect to disease $D_i$; $P(D_i \mid F_j)$ denotes the conditional probability of having the disease; $D_i$ denotes a disease; $F_j$ denotes a disease-related factor; $N_i$ denotes the disease frequency, $N_i = \sum_j N_{ij}$; and $N_{ij}$ denotes the recorded co-occurrence frequency.
The above formula combines the conditional probability with a nonlinear transformation of the reciprocal of the disease frequency. In the end, each disease-related factor corresponds to at least one related disease, and the corresponding score is given by Score(i, j).
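The exact scoring formula appears only in the figure and is not reproduced in this text, so the sketch below does not implement it. It only shows the general shape described above: the conditional probability multiplied by a decreasing function of the disease frequency $N_i$. The `damping` parameter and the `log_damping` example are assumptions chosen for illustration.

```python
import math

def relevance_scores(N, damping):
    """Return scores[i][j] = P(D_i | F_j) * damping(N_i).

    `damping` is a caller-supplied decreasing function of the disease
    frequency N_i; the source's actual transformation is not known here.
    """
    m, n = len(N), len(N[0])
    column_totals = [sum(N[i][j] for i in range(m)) for j in range(n)]
    row_totals = [sum(row) for row in N]  # N_i
    scores = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if N[i][j] > 0:
                probability = N[i][j] / column_totals[j]
                scores[i][j] = probability * damping(row_totals[i])
    return scores

def log_damping(n_i):
    """Hypothetical damping term that shrinks slowly as N_i grows."""
    return 1.0 / math.log(2.0 + n_i)

scores = relevance_scores([[8, 4], [1, 1]], log_damping)
```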
Through the above steps, a knowledge graph for predicting diseases can be learned and constructed automatically: diseases and disease-related factors in the medical information are matched using the expanded disease-disease related factor dictionary, and the relevance scores of disease-related factors with respect to diseases are calculated and stored in the knowledge graph.
In a preferred embodiment, the method further comprises, after step S1346: periodically testing the scoring function by A/B testing and updating the relevance scores of disease-related factors with respect to diseases.
This step takes into account that the quality and quantity of data in the original question-answer library affect the relevance scores of disease-related factors, and that an online medical inquiry platform generates a large number of new records every day. The scoring functions for disease-related factors are therefore stored in an offline knowledge base, and the version that performs better in online A/B testing is periodically selected and brought online.
Each version's training data independently yields its own version of the factor-to-disease relevance scores. Because a relevance score is not exactly equal to the prior probability of a disease-related factor, a medical expert's evaluation of the scores serves only as a reference; the final evaluation index, and the basis for deciding whether one version replaces another, is whether the accuracy and user-friendliness of disease determination improve.
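Version selection could be sketched as below, assuming each online case is logged as correct or incorrect for the score version that served it. The version names and outcomes are invented, and the sketch compares raw accuracy only; a real A/B test would also check statistical significance and the friendliness criterion mentioned above.

```python
def pick_version(outcomes):
    """Return the best-performing version and the accuracy of every version.

    outcomes: mapping from version name to a list of booleans, True when the
    disease determination for that case was judged correct.
    """
    accuracy = {
        version: sum(results) / len(results)
        for version, results in outcomes.items()
        if results  # ignore versions with no logged cases
    }
    best = max(accuracy, key=accuracy.get)
    return best, accuracy

# Invented outcomes for two hypothetical score versions.
outcomes = {
    "scores_v1": [True, False, True, False],
    "scores_v2": [True, True, True, False],
}
best, accuracy = pick_version(outcomes)
```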
During construction of the knowledge base, existing knowledge bases are incorporated, so that the description of the condition and the basic information entered by a patient can be analyzed and possible diseases can be suggested.
The process of obtaining the relevance score of a disease-related factor with respect to a disease is described in detail below in a preferred embodiment, in which "sore throat", "cold" and "stuffy and runny nose" are in the standard dictionary. The process may include steps B1 to B5.
Step B1: A question-answer pair is obtained from the original question-answer library: the question "My throat has been swollen and sore for the past few days, and I have had a persistent high fever and a stuffy, runny nose. Doctor, what illness do I have?" and the answer "Possibly a cold".
Step B2: The question-answer pair is processed, and "sore throat", "high fever", "stuffy and runny nose" and "cold" are matched.
Step B3: According to steps S121 and S122, "high fever" is replaced with "fever" using the replacement word list.
Step B4: The 3 disease-related factors and the 1 disease are paired one by one, and the frequency with which the disease co-occurs with its related factors is counted to obtain the co-occurrence frequency matrix of disease-related factors and diseases.
Step B5: The relevance score of each disease-related factor with respect to the disease is determined according to the following formula:
Figure GDA0001267108800000141
where Score(i, j) denotes the relevance score of disease-related factor $F_j$ with respect to disease $D_i$; $P(D_i \mid F_j)$ denotes the conditional probability of having the disease; $D_i$ denotes a disease; $F_j$ denotes a disease-related factor; $N_i$ denotes the disease frequency, $N_i = \sum_j N_{ij}$; and $N_{ij}$ denotes the recorded co-occurrence frequency.
In a preferred embodiment, the score of the disease can be obtained by the following formula:
Figure GDA0001267108800000142
where $DS(D_i)$ denotes the score of disease $D_i$; $W(F_j)$ denotes the disease category mapping weight of factor $F_j$; and Score(i, j) denotes the relevance score of disease-related factor $F_j$ with respect to disease $D_i$.
For example, for each factor $F_j$ in $\{F_1, F_2, \ldots, F_j, \ldots, F_Q\}$ that is related to the patient's description, the category of the standardized disease-related factor is taken into account, and the following formula is used to accumulate a score for each disease $D_i$ related to it:
Figure GDA0001267108800000143
where $DS(D_i)$ denotes the score of disease $D_i$; $W(F_j)$ denotes the disease category mapping weight of factor $F_j$; and Score(i, j) denotes the relevance score of disease-related factor $F_j$ with respect to disease $D_i$.
In the above formula, different factors carry different degrees of confidence for disease prediction, so different disease category mapping weights are assigned to different factors. The mapping can be formulated by an expert according to category attributes. For example, "smoking habit" is a disease-related factor in the lifestyle category, whereas "fever" is a disease-related factor in the symptom category; during calculation, the category of each factor is determined and the corresponding disease category mapping weight is used.
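The formula itself is only available as a figure, so the sketch below assumes, from the description of accumulating a score per disease, that $DS(D_i)$ is the weighted sum $\sum_j W(F_j) \cdot \mathrm{Score}(i, j)$ over the patient's normalized factors. The category names, weights and scores are invented for illustration.

```python
def disease_scores(patient_factors, category_of, category_weight, score):
    """Accumulate DS(D_i) as the sum over factors of W(F_j) * Score(i, j).

    score: mapping factor -> {disease: relevance score}
    category_of: mapping factor -> category name
    category_weight: mapping category name -> expert-assigned weight
    """
    totals = {}
    for factor in patient_factors:
        weight = category_weight[category_of[factor]]
        for disease, relevance in score.get(factor, {}).items():
            totals[disease] = totals.get(disease, 0.0) + weight * relevance
    return totals

# All values below are invented for illustration.
category_of = {"fever": "symptom", "smoking habit": "lifestyle"}
category_weight = {"symptom": 1.0, "lifestyle": 0.5}
score = {
    "fever": {"cold": 0.4, "pharyngitis": 0.2},
    "smoking habit": {"pharyngitis": 0.3},
}
totals = disease_scores(["fever", "smoking habit"], category_of, category_weight, score)
```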
S140: Rank the scores of the diseases.
S150: Determine the disease according to the ranking result.
The scoring of suspected diseases using embodiments of the present invention is described in detail below in a preferred embodiment. The patient describes the condition as "I have had a persistent high fever for the past few days, and I have a smoking habit". The process of obtaining the ranking may include steps C1 to C7.
Step C1: Keyword matching is performed on the patient description above using the expanded disease-disease related factor dictionary, and "high fever" and "smoking habit" are extracted.
Step C2: "High fever" is detected to be in the expanded disease-disease related factor dictionary but not in the standard disease-disease related factor dictionary.
Step C3: According to the replacement word list, "high fever" is replaced with "fever".
Step C4: Mapping weights are determined according to the respective categories of "fever" and "smoking habit".
Step C5: The patient's scores for the different diseases are determined according to the following formula:
Figure GDA0001267108800000151
Step C6: The scores for the different diseases are ranked.
Step C7: The top three ranked diseases are output: acute pharyngitis (0.143531), acute tonsil enlargement (0.129281) and tracheal diseases (0.062088).
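Steps C6 and C7 amount to sorting the disease scores in descending order and keeping the first three. In the sketch below, the three scores are those reported in step C7; the fourth entry and the function name are invented so that the truncation is visible.

```python
def top_diseases(scores, n=3):
    """Return the n highest-scoring (disease, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]

scores = {
    "tracheal diseases": 0.062088,
    "acute pharyngitis": 0.143531,
    "placeholder disease": 0.010000,  # invented entry, not from the source
    "acute tonsil enlargement": 0.129281,
}
ranking = top_diseases(scores)
```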
Although the foregoing embodiments describe the steps in the above sequential order, those skilled in the art will understand that, in order to achieve the effect of the present embodiments, the steps may not be executed in such an order, and may be executed simultaneously (in parallel) or in an inverse order, and these simple variations are within the scope of the present invention.
Based on the same technical concept as the method embodiment, an embodiment of the invention provides a system for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment. The system can carry out the method embodiment described above. As shown in Fig. 2, the system 20 may include: an acquisition module 21, an extraction module 22, a detection module 23, a calculation module 24, a ranking module 25 and a determination module 26. The acquisition module 21 is used to acquire a patient description. The extraction module 22 is used to perform keyword matching on the patient description using the expanded disease-disease related factor dictionary established based on word vectors, and to extract the medically related words and expressions in the patient description. The detection module 23 is used to detect whether the extracted words and expressions are in the standard disease-disease related factor dictionary. The calculation module 24 is used to calculate the score of each disease based on the detection result, combined with the relevance scores of disease-related factors with respect to diseases obtained from the expanded disease-disease related factor dictionary. The ranking module 25 is used to rank the scores of the diseases. The determination module 26 is used to determine the disease according to the ranking result.
In a preferred embodiment, the extraction module may specifically include a word vector model building unit and an extended dictionary establishing unit. The word vector model building unit is used to train, on the medical information, a word-vector embedding distributed representation model of diseases and disease-related factors. The extended dictionary establishing unit is used to expand the standard disease-disease related factor dictionary with a distance metric, based on the word-vector embedding distributed representation model, and to establish the expanded disease-disease related factor dictionary.
In a preferred embodiment, the word vector model building unit may specifically include an acquisition unit, a cleaning unit, a first statistical unit and a generating unit. The acquisition unit is used to acquire the medical information training corpus. The cleaning unit is used to clean the medical information training corpus. The first statistical unit is used to count the high-frequency expressions that appear in the question-answer library records, increase their weight in the word segmentation model, and perform Chinese word segmentation to obtain a training text. The generating unit is used to train on the training text and generate the word-vector embedding distributed representation model.
In a preferred embodiment, the calculation module may specifically include a first replacement word list establishing unit and a relevance score calculation unit. The first replacement word list establishing unit is used to expand the standard disease-disease related factor dictionary with a distance metric, based on the word-vector embedding distributed representation model, and to establish the replacement word list. The relevance score calculation unit is used to match diseases and disease-related factors in the medical information using the expanded disease-disease related factor dictionary and the replacement word list, and to calculate the relevance score of each disease-related factor with respect to each disease.
In a preferred embodiment, the relevance score calculation unit may specifically include an extraction unit, a detection unit, a first normalization unit, a second statistical unit and a nonlinear transformation unit. The extraction unit is used to perform keyword matching on the doctor-patient question-answer records using the expanded disease-disease related factor dictionary and to extract the medically related words and expressions in those records. The detection unit is used to detect whether the extracted medically related words and expressions are in the standard disease-disease related factor dictionary. The first normalization unit is used to normalize the extracted medically related words and expressions to the corresponding standard expressions according to the replacement word list when they are not in the standard disease-disease related factor dictionary. The second statistical unit is used to count, based on the standard expressions, how often each disease co-occurs with its related factors, obtaining the co-occurrence frequency matrix of disease-related factors and diseases. The nonlinear transformation unit is used to obtain the relevance score of each disease-related factor with respect to each disease from that matrix using a nonlinear transformation.
In a preferred embodiment, the system may further include a second replacement word list establishing unit, which is used to expand the standard disease-disease related factor dictionary with a distance metric, based on the word-vector embedding distributed representation model, and to establish the replacement word list. The detection module may further include a second normalization unit, which is used to normalize the extracted words and expressions to the corresponding standard expressions according to the replacement word list, obtaining standardized disease-related factors, when the extracted words and expressions are not in the standard disease-disease related factor dictionary. The calculation module may further include a disease score calculation unit, which is used to calculate the score of each disease from the standardized disease-related factors and the relevance scores obtained from the expanded disease-disease related factor dictionary.
For the specific working process and related description of the system described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
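The six-module structure can be pictured as a simple pipeline in which each module is a replaceable callable. The class name, field names and toy lambdas below are assumptions made for illustration and do not reflect any particular implementation of the modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Ranking = List[Tuple[str, float]]

@dataclass
class AssistedDiagnosisSystem:
    """One callable per module; names mirror modules 21 to 26."""
    acquire: Callable[[], str]                          # acquisition module
    extract: Callable[[str], List[str]]                 # extraction module
    detect: Callable[[List[str]], List[str]]            # detection and normalization
    calculate: Callable[[List[str]], Dict[str, float]]  # calculation module
    rank: Callable[[Dict[str, float]], Ranking]         # ranking module
    determine: Callable[[Ranking], List[str]]           # determination module

    def run(self) -> List[str]:
        description = self.acquire()
        expressions = self.extract(description)
        factors = self.detect(expressions)
        scores = self.calculate(factors)
        return self.determine(self.rank(scores))

# Toy stand-ins for each module, invented for illustration.
system = AssistedDiagnosisSystem(
    acquire=lambda: "high fever for days",
    extract=lambda text: [t for t in ("high fever",) if t in text],
    detect=lambda exprs: ["fever" if e == "high fever" else e for e in exprs],
    calculate=lambda factors: {"cold": 0.4, "pharyngitis": 0.6} if "fever" in factors else {},
    rank=lambda scores: sorted(scores.items(), key=lambda kv: kv[1], reverse=True),
    determine=lambda ranking: [name for name, _ in ranking[:1]],
)
result = system.run()
```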
Those skilled in the art will appreciate that the above system for automatically constructing a knowledge base based on word vectors to realize auxiliary diagnosis and treatment may further include other well-known structures, such as a processor, a controller, a memory and a bus. The memory includes, but is not limited to, a random access memory, a flash memory, a read-only memory, a programmable read-only memory, a volatile memory, a non-volatile memory, a serial memory, a parallel memory, a register and the like. The processor includes, but is not limited to, a single-core processor, a multi-core processor, a processor based on the X86 architecture, a CPLD/FPGA, a DSP, an ARM processor, a MIPS processor and the like. The bus may include a data bus, an address bus and the like. Such well-known structures are not shown in Fig. 2 so as not to unnecessarily obscure embodiments of the present disclosure. It should also be noted that the number of modules in Fig. 2 is merely illustrative; any number of modules may be provided according to actual needs.
It should be noted that the division of the modules is only an example, and in practical applications, another division manner may be provided. In addition, each module can be decomposed into other modules again, which is not described herein again. Each module can be implemented by hardware, software, or a combination of hardware and software. In practical applications, the modules may be implemented by a central processing unit, a microprocessor, a digital signal processor, a field programmable gate array, or the like. Exemplary hardware platforms for implementing the various modules may include platforms such as Intel x86 based platforms with compatible operating systems, Mac platforms, MACOS, iOS, Android OS, and the like.
It should be noted that the terms "first", "second", etc. used herein should not be construed as limiting the scope of the present invention in various forms.
The above-mentioned embodiments and experimental examples describe the technical solutions, implementation details and algorithm effectiveness of the present invention in detail. It should be understood that the above description is only exemplary of the present invention, and is not intended to limit the present invention, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. A system for automatically constructing a knowledge base to realize auxiliary diagnosis and treatment based on word vectors, characterized by comprising:
an acquisition module for acquiring a patient description;
the extraction module is used for performing keyword matching on the patient description by utilizing an expanded disease-disease related factor dictionary established based on word vectors, and extracting words and expressions which are related to medicine in the patient description;
a detection module for detecting whether the extracted words and expressions are in a standard disease-disease related factor dictionary;
a calculation module for calculating a score of a disease based on the detection result in combination with a correlation score of a disease-related factor obtained from the expanded disease-related factor dictionary with respect to the disease;
a ranking module for ranking the scores of the diseases;
a determining module for determining the disease according to the sorting result;
the calculation module specifically comprises a first replacement word list establishing unit and a correlation score calculating unit,
the first replacement word list establishing unit is used for embedding a distributed representation model based on a pre-established word vector, expanding the standard disease-disease related factor dictionary by using a distance measurement method and establishing a replacement word list;
the correlation score calculating unit specifically includes:
a first normalization unit for normalizing the medically relevant words and expressions in the extracted patient description into corresponding standard expressions according to the replacement vocabulary when the words and expressions are not in the standard disease-related factor dictionary;
the second statistical unit is used for counting the frequency of the co-occurrence of the diseases and the related factors thereof based on the standard expression to obtain a co-occurrence frequency recording matrix of the disease related factors and the diseases;
the nonlinear transformation unit is used for obtaining a correlation score of the disease-related factor corresponding to the disease by using a nonlinear transformation method based on the disease-related factor and a co-occurrence frequency recording matrix of the disease;
wherein the relevance score of the disease-related factor with respect to the disease is determined by the following formula:
$P(D_i \mid F_j) = N_{ij} / \sum_i N_{ij}$
wherein Score(i, j) denotes the relevance score of the j-th disease-related factor with respect to the i-th disease; $P(D_i \mid F_j)$ denotes the conditional probability of having the disease; $D_i$ denotes the i-th disease; $F_j$ denotes the j-th disease-related factor; $N_i$ denotes the frequency of co-occurrence of the i-th disease and its related factors, $N_i = \sum_j N_{ij}$; and $N_{ij}$ denotes the co-occurrence frequency of the i-th disease and the j-th disease-related factor.
2. The system according to claim 1, wherein the extraction module specifically comprises:
a word vector model building unit for training a word vector embedding distributed representation model about the disease-disease related factors using the medical information;
and the extended dictionary establishing unit is used for embedding a distributed representation model based on the word vector, extending the standard disease-disease related factor dictionary by using a distance measurement method and establishing the extended disease-disease related factor dictionary.
3. The system according to claim 2, wherein the word vector model building unit specifically includes:
the acquisition unit is used for acquiring the medical information training corpus;
the cleaning unit is used for cleaning the medical information training corpus;
the first statistical unit is used for counting high-frequency expression modes appearing in the records of the question-answering library, increasing the weight of the high-frequency expression modes in the word segmentation model, and performing Chinese word segmentation to obtain a training text;
and the generating unit is used for training the training text and generating a word vector embedded distributed representation model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611222893.XA CN106874643B (en) 2016-12-27 2016-12-27 Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611222893.XA CN106874643B (en) 2016-12-27 2016-12-27 Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors

Publications (2)

Publication Number Publication Date
CN106874643A CN106874643A (en) 2017-06-20
CN106874643B true CN106874643B (en) 2020-02-28

Family

ID=59165041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611222893.XA Active CN106874643B (en) 2016-12-27 2016-12-27 Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors

Country Status (1)

Country Link
CN (1) CN106874643B (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358315A (en) * 2017-06-26 2017-11-17 深圳市金立通信设备有限公司 A kind of information forecasting method and terminal
CN110019826B (en) * 2017-07-27 2023-02-28 北大医疗信息技术有限公司 Construction method, construction device, equipment and storage medium of medical knowledge map
CN107633882B (en) * 2017-09-11 2019-05-14 合肥工业大学 Mix the minimally invasive medical service system and its aid decision-making method under cloud framework
CN107863147B (en) * 2017-10-24 2021-03-16 清华大学 Medical diagnosis method based on deep convolutional neural network
CN107610779B (en) * 2017-10-25 2021-10-22 医渡云(北京)技术有限公司 Disease evaluation and disease risk evaluation method and device
CN107833629A (en) * 2017-10-25 2018-03-23 厦门大学 Aided diagnosis method and system based on deep learning
CN108182972B (en) * 2017-12-15 2021-07-20 中电科软件信息服务有限公司 Intelligent coding method and system for Chinese disease diagnosis based on word segmentation network
TWI665684B (en) * 2017-12-27 2019-07-11 瑞友資訊股份有限公司 Care system capable of drawing up intelligent care plan and using method thereof
CN108182973A (en) * 2017-12-29 2018-06-19 湖南大学 A kind of Intelligent Diagnosis Technology of knowledge based collection of illustrative plates reasoning
CN110164544A (en) * 2018-02-11 2019-08-23 深圳欧德蒙科技有限公司 A kind of method, apparatus and terminal device of illness information processing
CN109243599A (en) * 2018-03-16 2019-01-18 申朴信息技术(上海)股份有限公司 A kind of disease based on various dimensions information retrieval is to code method
CN108614885B (en) * 2018-05-03 2019-04-30 杭州认识科技有限公司 Knowledge mapping analysis method and device based on medical information
EP3567605A1 (en) * 2018-05-08 2019-11-13 Siemens Healthcare GmbH Structured report data from a medical text report
CN109240258A (en) * 2018-07-09 2019-01-18 上海万行信息科技有限公司 Vehicle failure intelligent auxiliary diagnosis method and system based on term vector
CN109473169A (en) * 2018-10-18 2019-03-15 安吉康尔(深圳)科技有限公司 A kind of methods for the diagnosis of diseases, device and terminal device
CN109684445B (en) * 2018-11-13 2021-05-28 中国科学院自动化研究所 Spoken medical question-answering method and spoken medical question-answering system
CN109817330A (en) * 2019-01-25 2019-05-28 华院数据技术(上海)有限公司 A kind of disease forecasting device
US11210346B2 (en) * 2019-04-04 2021-12-28 Iqvia Inc. Predictive system for generating clinical queries
CN110276749B (en) * 2019-06-14 2022-04-01 辽宁万象联合医疗科技有限公司 Quality control artificial intelligence system and quality control method for children radiation shooting and diagnosis
CN110867228B (en) * 2019-11-15 2023-01-17 北京大学人民医院(北京大学第二临床医学院) Intelligent information grabbing and evaluating method and system for wound severity of wound inpatient
CN111599489A (en) * 2020-05-19 2020-08-28 万达信息股份有限公司 Disease information acquisition method, terminal equipment and storage medium
CN111985246B (en) * 2020-08-27 2023-08-15 武汉东湖大数据交易中心股份有限公司 Disease cognitive system based on main symptoms and accompanying symptom words
CN112017773B (en) * 2020-08-31 2024-03-26 吾征智能技术(北京)有限公司 Disease cognitive model construction method and disease cognitive system based on nightmare
CN111968740B (en) * 2020-09-03 2021-04-27 卫宁健康科技集团股份有限公司 Diagnostic label recommendation method and device, storage medium and electronic equipment
CN112364055B (en) * 2020-10-29 2023-11-03 上海德衡数据科技有限公司 Service management software system and method
CN112331355B (en) * 2020-11-26 2024-03-19 微医云(杭州)控股有限公司 Disease type evaluation table generation method and device, electronic equipment and storage medium
CN112988953B (en) * 2021-04-26 2021-09-03 成都索贝数码科技股份有限公司 Adaptive broadcast television news keyword standardization method
CN113505236B (en) * 2021-06-29 2023-08-04 朱一帆 Medical knowledge graph construction method, device, equipment and computer readable medium
CN113793668A (en) * 2021-09-17 2021-12-14 平安科技(深圳)有限公司 Symptom standardization method and device based on artificial intelligence, electronic equipment and medium
CN114628012B (en) * 2022-03-21 2023-09-05 中国人民解放军西部战区总医院 Emergency department's preliminary examination sorting system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011180746A (en) * 2010-02-26 2011-09-15 National Institute Of Information & Communication Technology Relational information expansion device, relational information expansion method and program
CN104572624A (en) * 2015-01-20 2015-04-29 浙江大学 Method for discovering treatment relation between single medicine and disease based on term vector
CN104965992A (en) * 2015-07-13 2015-10-07 南开大学 Text mining method based on online medical question and answer information
CN105069123A (en) * 2015-08-13 2015-11-18 易保互联医疗信息科技(北京)有限公司 Automatic coding method and system for Chinese surgical operation information
CN105138829A (en) * 2015-08-13 2015-12-09 易保互联医疗信息科技(北京)有限公司 Natural language processing method and system for Chinese diagnosis and treatment information
CN105426358A (en) * 2015-11-09 2016-03-23 中国农业大学 Automatic disease noun identification method
CN105786782A (en) * 2016-03-25 2016-07-20 北京搜狗科技发展有限公司 Word vector training method and device
CN106156272A (en) * 2016-06-21 2016-11-23 北京工业大学 Information retrieval method based on multi-source semantic analysis

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US20070288304A1 (en) * 2006-06-08 2007-12-13 Adknowledge, Inc. System and method for behaviorally targeted electronic communications
CN101158969B (en) * 2007-11-23 2010-06-02 腾讯科技(深圳)有限公司 Whole sentence generating method and device
CN104375989A (en) * 2014-12-01 2015-02-25 国家电网公司 Natural language text keyword association network construction system
CN104391963A (en) * 2014-12-01 2015-03-04 北京中科创益科技有限公司 Method for constructing correlation networks of keywords of natural language texts
CN105740612B (en) * 2016-01-27 2019-07-05 北京国医精诚科技有限公司 Disease treatment system based on tcm clinical practice case
CN106096273A (en) * 2016-06-08 2016-11-09 江苏华康信息技术有限公司 Disease symptom derivation method based on an improved TF-IDF algorithm

Non-Patent Citations (1)

Title
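Research and Implementation of an Intelligent Medical Diagnosis System; 梁耀波 (Liang Yaobo); China Master's Theses Full-text Database, Information Science and Technology; 20161115; Vol. 2016, No. 11; Chapters 3-5 *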

Also Published As

Publication number Publication date
CN106874643A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106874643B (en) Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors
CN110472229B (en) Sequence labeling model training method, electronic medical record processing method and related device
CN111274806B (en) Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record
CN110459282B (en) Sequence labeling model training method, electronic medical record processing method and related device
Yu et al. Automatic ICD code assignment of Chinese clinical notes based on multilayer attention BiRNN
WO2021151353A1 (en) Medical entity relationship extraction method and apparatus, and computer device and readable storage medium
CN110069779B (en) Symptom entity identification method of medical text and related device
CN111949759A (en) Method and system for retrieving medical record text similarity and computer equipment
US20190057773A1 (en) Method and system for performing triage
CN117744654A (en) Semantic classification method and system for numerical data in natural language context based on machine learning
US10593431B1 (en) Methods and systems for causative chaining of prognostic label classifications
Carchiolo et al. Medical prescription classification: a NLP-based approach
CN110931137B (en) Machine-assisted dialog systems, methods, and apparatus
WO2023029502A1 (en) Method and apparatus for constructing user portrait on the basis of inquiry session, device, and medium
CN112232065A (en) Method and device for mining synonyms
CN110931128A (en) Method, system and device for automatically identifying unsupervised symptoms of unstructured medical texts
CN111145903A (en) Method and device for acquiring vertigo inquiry text, electronic equipment and inquiry system
CN112037909B (en) Diagnostic information review system
CN113722507B (en) Hospitalization cost prediction method and device based on knowledge graph and computer equipment
CN112349367B (en) Method, device, electronic equipment and storage medium for generating simulated medical record
US11727685B2 (en) System and method for generation of process graphs from multi-media narratives
WO2023124837A1 (en) Inquiry processing method and apparatus, device, and storage medium
CN113761899A (en) Medical text generation method, device, equipment and storage medium
CN112652400A (en) Method, system, device and medium for reference of disease condition based on special disease view similarity analysis
CN113643825B (en) Medical case knowledge base construction method and system based on clinical key feature information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant