CN111949759A

CN111949759A - Method and system for retrieving medical record text similarity and computer equipment

Info

Publication number: CN111949759A
Application number: CN201910407594.0A
Authority: CN
Inventors: 郭士成; 王�琦
Original assignee: Peking University Medical Information Technology Co ltd
Current assignee: Peking University Medical Information Technology Co ltd
Priority date: 2019-05-16
Filing date: 2019-05-16
Publication date: 2020-11-17

Abstract

The invention provides a method, a system, and computer equipment for retrieving medical record text similarity. The method comprises: receiving text information; performing word segmentation on the text information to generate words; training the words into a long text vector; and acquiring, from a database, medical record information similar to the text information according to the long text vector. With this method, medical knowledge is automatically mined and learned from the database by medical artificial intelligence techniques, without the participation of experts, and a model for comparing similar medical records is constructed. The model can integrate the comparison results of multiple types of free text and obtain similar medical record recommendations efficiently and accurately. The comparison results are highly consistent with those obtained by manual comparison by doctors, so the method can provide doctors with clinical pathway references of practical value and effectively reduces the large amount of time doctors spend looking up previous medical records.

Description

Method and system for retrieving medical record text similarity and computer equipment

Technical Field

The invention relates to the technical field of computers, in particular to a method, a system and computer equipment for retrieving medical record text similarity.

Background

An Electronic Medical Record (EMR) is the medical record generated when a patient visits a medical institution. It is a carrier of doctors' medical experience and practice patterns, has core value in auxiliary diagnosis, and provides decision support for doctors. EMR data mainly takes the form of tables, free text, and images, and the free text is mainly unstructured data. With the development of hospital informatization, hospitals have accumulated a large amount of unstructured EMR free text, which contains a large amount of valuable medical and clinical information. As medical information becomes increasingly standardized, the free text covers more standard and complete patient information. At present, many scholars, organizations, and enterprises at home and abroad are dedicated to research on EMR-based auxiliary diagnosis systems. This research can cover the complete medical process and plays an important role in optimizing workflows, improving work efficiency, reducing medical errors, and improving medical quality. Domestic application research based on Chinese EMRs focuses on the development of EMR systems on the one hand, and on EMR-based clinical pathway optimization and similar-EMR search on the other. In the related art, the core technology of similar Chinese medical record text retrieval mainly compares records through keywords or an ontology model. This approach relies on the knowledge of medical experts and does not make good use of the information already contained in large-scale EMR data.

Disclosure of Invention

The present invention is directed to solving at least one of the problems of the prior art or the related art.

Therefore, the invention provides a method for searching medical record text similarity in a first aspect.

The invention provides a system for searching medical record text similarity in a second aspect.

A third aspect of the invention provides a computer apparatus.

A fourth aspect of the invention provides a computer-readable storage medium.

In view of this, the first aspect of the present invention provides a method for retrieving similarity between medical records, including: receiving text information; performing word segmentation processing on the text information to generate words; training the words into long text vectors; and acquiring medical record information similar to the text information in the database according to the long text vector.

The method for retrieving medical record text similarity provided by the invention performs word segmentation on the received text information. The word segmentation includes resolving ambiguous segmentations and identifying unknown (out-of-vocabulary) words, so that diseases and symptoms can be segmented. The segmented words are used in the subsequent training, and accurate word segmentation determines the accuracy of that step. The generated words are trained into a long text vector, which gives a numerical representation of the long text, and medical record information similar to the text information is then acquired from the database according to the long text vector. When medical record information is retrieved in this way, medical knowledge is automatically mined and learned from the database by medical artificial intelligence techniques, without the participation of experts, and a model for comparing similar medical records is constructed. The model can integrate the comparison results of multiple types of free text and obtain similar medical record recommendations efficiently and accurately. The results are highly consistent with those obtained by manual comparison by doctors, so the method can provide doctors with clinical pathway references of practical value and effectively reduces the large amount of time doctors spend looking up previous medical records. The method can also assist doctors who lack clinical experience, so that patients receive better and more timely diagnosis and treatment, which further improves the efficiency and accuracy of clinical diagnosis.

Specifically, the method mainly processes the chief complaint, history of present illness, past history, personal history, family history, and general examination results in the free text, in order to obtain a more complete auxiliary diagnosis for the patient.

According to the medical record text similarity retrieval method provided by the invention, the method can also have the following additional technical characteristics:

In the above technical solution, preferably, after the step of performing word segmentation processing on the text information to generate words, the method for retrieving medical record text similarity further includes: performing part-of-speech tagging on the words; and classifying the words according to their part-of-speech tags.

In this technical solution, the text information is preprocessed by a named entity recognition application: the part of speech of each word is tagged and the words are classified according to the tags, so that each word in a sentence is given a correct lexical tag and a category. Further, the named entity recognition application can accurately segment unknown words. Part-of-speech tagging methods are mainly divided into rule-based and statistics-based methods. Specifically, the words segmented from the long text are first part-of-speech tagged using a CRF (conditional random field) algorithm, the tagged words are used as the input of an RNN (recurrent neural network), and the disease and symptom vocabulary appearing in the long text is classified and returned according to the part-of-speech category.

In any of the above technical solutions, preferably, the step of performing word segmentation processing on the text information to generate words specifically includes: performing word segmentation processing on the text information according to a disease dictionary, regular expressions, and stop-word removal to generate words.

In this technical solution, the text information is segmented according to the disease dictionary, the regular expressions, and stop-word removal, which removes interfering words, and a maximum matching method is used to improve the accuracy of word segmentation.
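The segmentation step described above can be illustrated with a minimal sketch of forward maximum matching. The dictionary entries, stop words, duration pattern, and the function name `segment` are all hypothetical and chosen only for illustration; the source does not specify them, and a real system would load a full disease dictionary.

```python
import re

# Hypothetical toy resources (assumptions, not taken from the source).
DISEASE_DICT = {"高血压", "糖尿病", "2型糖尿病", "胸痛", "头痛"}  # hypertension, diabetes, ...
STOP_WORDS = {"患者", "伴", "的", "有", "，", "。"}               # "patient", "with", punctuation
DURATION_RE = re.compile(r"\d+(?:天|周|月|年)")                  # e.g. "3年" (3 years)
MAX_WORD_LEN = max(len(w) for w in DISEASE_DICT | STOP_WORDS)


def segment(text):
    """Segment text with a regex rule plus forward maximum matching, then drop stop words."""
    words = []
    i = 0
    while i < len(text):
        # Rule 1: a regular expression captures duration expressions as one word.
        match = DURATION_RE.match(text, i)
        if match:
            words.append(match.group())
            i = match.end()
            continue
        # Rule 2: try the longest dictionary entry first, falling back to one character.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in DISEASE_DICT or candidate in STOP_WORDS:
                words.append(candidate)
                i += size
                break
    # Rule 3: remove stop words so that only informative words remain.
    return [w for w in words if w not in STOP_WORDS]


print(segment("患者高血压3年，伴胸痛2天"))
```

Trying the longest candidate first is what keeps an entry such as "2型糖尿病" (type 2 diabetes) from being split into a shorter dictionary word plus leftover characters.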

In any of the above technical solutions, preferably, the step of training the words into long text vectors specifically includes: training the words into word vectors; and combining the word vectors into a long text vector.

In this technical solution, the segmented words are first trained into word vectors, and the word vectors in each sentence are then combined to form a long text vector, which gives the numerical representation of the long text of the medical record.
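As a rough illustration of combining word vectors into a long text vector, the sketch below averages the vectors of the known words. Averaging is an assumption made here for simplicity; the source only says the vectors are "combined", and elsewhere names Doc2Vec for training text vectors. The toy vectors and the name `long_text_vector` are hypothetical.

```python
# Toy 3-dimensional word vectors (hypothetical values); real vectors would
# come from a model trained on the medical record corpus.
WORD_VECTORS = {
    "hypertension": [0.9, 0.1, 0.0],
    "chest pain": [0.7, 0.3, 0.1],
    "3 years": [0.0, 0.2, 0.8],
}


def long_text_vector(words, vectors=WORD_VECTORS):
    """Combine word vectors into one text vector by element-wise averaging (an assumption)."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("none of the words has a known vector")
    dim = len(known[0])
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


print(long_text_vector(["hypertension", "chest pain"]))
```

Words without a vector are skipped in this sketch, so the result depends only on the vocabulary the model has seen.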

In any of the above technical solutions, preferably, the step of obtaining medical record information similar to the text information in the database according to the long text vector specifically includes: acquiring a plurality of long texts similar to the text information from a database, and respectively segmenting the long texts into word sets as a screening set; acquiring a long text matched with a word set obtained after word segmentation processing is carried out on text information in the screening set, and taking the long text as a priority result; calculating the relevance of a word set which is not matched with the text information in the screening set and the word set after the text information is subjected to word segmentation processing according to the long text vector; judging whether the relevance is greater than a preset threshold value or not; and if the relevance is greater than a preset threshold value, arranging the long texts which are not matched with the text information in a positive sequence according to the relevance.

In this technical solution, the edit distance is first used to sort the EMRs (electronic medical records) by literal similarity, with the most similar first, and each EMR is segmented into a corresponding word set. The Jaccard distance is used to find the long texts whose word sets completely match that of the text information, and these long texts are given the highest priority. For long texts that do not match completely, the cosine distance is used to compute the relevance between the unmatched words, and a preset threshold is set: if the relevance is smaller than the preset threshold, it is set to 0 and the words are considered unrelated. The relevance of the related words is then added to the ordering to obtain the long text matches of the next-highest priority. For example, if the word set of the current long text is {A, B} and a word set in the library is {C, A}, the weighted similarity distance obtained from the cosine distance calculation is $\frac{B \cdot C}{\lVert B \rVert \, \lVert C \rVert}$, where B and C denote the word vectors of the unmatched words.
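The cosine calculation and thresholding in this paragraph can be sketched as follows. The function name `cosine_relevance` and the default threshold of 0.5 are hypothetical, and treating a score exactly equal to the threshold as unrelated is an assumption, since the source only covers the "greater than" and "smaller than" cases.

```python
import math


def cosine_relevance(b, c, threshold=0.5):
    """Return (B . C) / (|B| * |C|), or 0.0 when the score does not exceed the threshold."""
    dot = sum(x * y for x, y in zip(b, c))
    norm = math.sqrt(sum(x * x for x in b)) * math.sqrt(sum(y * y for y in c))
    if norm == 0:
        return 0.0  # a zero vector carries no direction, so treat it as unrelated
    score = dot / norm
    # Assumption: the boundary case (score == threshold) is treated as unrelated.
    return score if score > threshold else 0.0


print(cosine_relevance([1.0, 1.0], [1.0, 0.0]))  # angle of 45 degrees, above the threshold
print(cosine_relevance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors, below the threshold
```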

In a second aspect of the present invention, a system for retrieving similarity between medical records and texts is provided, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor when executing the computer program implementing: receiving text information; performing word segmentation processing on the text information to generate words; training the words into long text vectors; and acquiring medical record information similar to the text information in the database according to the long text vector.

The system for retrieving medical record text similarity provided by the invention performs word segmentation on the received text information. The word segmentation includes resolving ambiguous segmentations and identifying unknown (out-of-vocabulary) words, so that diseases and time expressions can be segmented. The segmented words are used in the subsequent training, and accurate word segmentation determines the accuracy of that step. The generated words are trained into a long text vector, which gives a numerical representation of the long text, and medical record information similar to the text information is then acquired from the database according to the long text vector. When the system retrieves medical record information, medical knowledge is automatically mined and learned from the database by medical artificial intelligence techniques, without the participation of experts, and a model for comparing similar medical records is constructed. The model can integrate the comparison results of multiple types of free text and obtain similar medical record recommendations efficiently and accurately. The results are highly consistent with those obtained by manual comparison by doctors, so the system can provide doctors with clinical pathway references of practical value and effectively reduces the large amount of time doctors spend looking up previous medical records. The system can also assist doctors who lack clinical experience, so that patients receive better and more timely diagnosis and treatment, which further improves the efficiency and accuracy of clinical diagnosis.

Specifically, the system mainly processes the chief complaint, history of present illness, past history, personal history, family history, and general examination results in the free text, in order to obtain a more complete auxiliary diagnosis for the patient.

The system for searching the medical record text similarity provided by the invention can also have the following additional technical characteristics:

In the above technical solution, preferably, the processor, when executing the computer program, further implements, after the step of performing word segmentation processing on the text information to generate words: performing part-of-speech tagging on the words; and classifying the words according to their part-of-speech tags.

In any of the above technical solutions, preferably, the step of performing word segmentation processing on the text information to generate words, implemented when the processor executes the computer program, specifically includes: performing word segmentation processing on the text information according to a disease dictionary, regular expressions, and stop-word removal to generate words.

In any of the above technical solutions, preferably, the step of training the words into long text vectors, implemented when the processor executes the computer program, specifically includes: training the words into word vectors; and combining the word vectors into a long text vector.

In any of the above technical solutions, preferably, the step of obtaining medical record information similar to the text information in the database according to the long text vector is implemented when the processor executes the computer program, and specifically includes: acquiring a plurality of long texts similar to the text information from a database, and respectively segmenting the long texts into word sets as a screening set; acquiring a long text matched with a word set obtained after word segmentation processing is carried out on text information in the screening set, and taking the long text as a priority result; calculating the relevance of a word set which is not matched with the text information in the screening set and the word set after the text information is subjected to word segmentation processing according to the long text vector; judging whether the relevance is greater than a preset threshold value or not; and if the relevance is greater than a preset threshold value, arranging the long texts which are not matched with the text information in a positive sequence according to the relevance.

In this technical solution, the edit distance is first used to sort the EMRs (electronic medical records) by literal similarity, with the most similar first, and each EMR is segmented into a corresponding word set. The Jaccard distance is used to find the long texts whose word sets completely match that of the text information, and these long texts are given the highest priority. For long texts that do not match completely, the cosine distance is used to compute the relevance between the unmatched words, and a preset threshold is set: if the relevance is smaller than the preset threshold, it is set to 0 and the words are considered unrelated. The relevance of the related words is then added to the ordering to obtain the long text matches of the next-highest priority. For example, if the word set of the current long text is {A, B} and a word set in the library is {C, A}, the weighted similarity distance obtained from the cosine distance calculation is $\frac{B \cdot C}{\lVert B \rVert \, \lVert C \rVert}$, where B and C denote the word vectors of the unmatched words.

In a third aspect of the present invention, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for retrieving the similarity between the medical record texts according to any one of the above technical solutions.

The computer device provided by the invention implements the method for retrieving medical record text similarity according to any one of the technical solutions of the first aspect, and therefore has all the beneficial effects of that method.

In a fourth aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the method according to any of the above technical solutions, so the storage medium has all the technical effects of the method for retrieving medical record text similarity, which are not repeated here.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flowchart illustrating a method for retrieving medical record text similarity according to an embodiment of the present application;

FIG. 2 is another flowchart illustrating a method for retrieving medical record text similarity according to an embodiment of the present application;

FIG. 3 is another flowchart illustrating a method for retrieving medical record text similarity according to an embodiment of the present application;

FIG. 4 is another flowchart illustrating a method for retrieving medical record text similarity according to an embodiment of the present application;

FIG. 5 is a block diagram of a system for retrieving medical record text similarity according to an embodiment of the present application;

FIG. 6 is another block diagram of a system for retrieving medical record text similarity according to one embodiment of the present application;

FIG. 7 is another block diagram of a system for retrieving medical record text similarity according to one embodiment of the present application;

FIG. 8 shows a schematic block diagram of a computer device of one embodiment of the present application.

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.

The following describes a method, a system and a computer device for retrieving medical record text similarity according to some embodiments of the present invention with reference to fig. 1 to 8.

Fig. 1 is a flowchart illustrating a medical record text similarity retrieval method according to an embodiment of the present application. As shown in fig. 1, the method includes:

step 102, receiving text information;

step 104, performing word segmentation processing on the text information to generate words;

step 106, training the words into long text vectors;

and step 108, acquiring medical record information similar to the text information in the database according to the long text vector.

In the foregoing embodiment, preferably, after the step of performing word segmentation processing on the text information to generate words, the method further includes: performing part-of-speech tagging on the words; and classifying the words according to their part-of-speech tags.

In this embodiment, the text information is preprocessed by a named entity recognition application: the part of speech of each word is tagged and the words are classified according to the tags, so that each word in a sentence is given a correct lexical tag and a category. Further, the named entity recognition application can accurately segment unknown words. Part-of-speech tagging methods are mainly divided into rule-based and statistics-based methods. Specifically, the words segmented from the long text are first part-of-speech tagged using a CRF (conditional random field) algorithm, the tagged words are used as the input of an RNN (recurrent neural network), and the disease and symptom vocabulary appearing in the long text is classified and returned according to the part-of-speech category.
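The classification of tagged words into categories can be pictured with the small sketch below. It does not implement the CRF tagger or the RNN; it assumes that their output is already available as (word, tag) pairs. The tag names "disease", "symptom", and "time", the sample data, and the name `classify_by_tag` are hypothetical.

```python
from collections import defaultdict

# Hypothetical tagger output: (word, tag) pairs as they might look after
# part-of-speech tagging. The tag set is an assumption, not from the source.
TAGGED_WORDS = [
    ("hypertension", "disease"),
    ("chest pain", "symptom"),
    ("3 days", "time"),
    ("headache", "symptom"),
]


def classify_by_tag(tagged_words):
    """Group words by their tag, keeping the order in which they appeared."""
    groups = defaultdict(list)
    for word, tag in tagged_words:
        groups[tag].append(word)
    return dict(groups)


categories = classify_by_tag(TAGGED_WORDS)
print(categories["disease"], categories["symptom"])
```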

In any of the above embodiments, preferably, the step of performing word segmentation processing on the text information to generate words specifically includes: performing word segmentation processing on the text information according to a disease dictionary, regular expressions, and stop-word removal to generate words.

In this embodiment, the text information is segmented according to the disease dictionary, the regular expressions, and stop-word removal, which removes interfering words, and a maximum matching method is used to improve the accuracy of word segmentation.

In any of the above embodiments, preferably, the step of training the words into long text vectors specifically includes: training the words into word vectors; and combining the word vectors into a long text vector.

In this embodiment, the segmented words are first trained into word vectors, and the word vectors in each sentence are then combined to form a long text vector, which gives the numerical representation of the long text of the medical record.

In any of the above embodiments, preferably, the step of obtaining medical record information similar to the text information in the database according to the long text vector specifically includes: acquiring a plurality of long texts similar to the text information from a database, and respectively segmenting the long texts into word sets as a screening set; acquiring a long text matched with a word set obtained after word segmentation processing is carried out on text information in the screening set, and taking the long text as a priority result; calculating the relevance of a word set which is not matched with the text information in the screening set and the word set after the text information is subjected to word segmentation processing according to the long text vector; judging whether the relevance is greater than a preset threshold value or not; and if the relevance is greater than a preset threshold value, arranging the long texts which are not matched with the text information in a positive sequence according to the relevance.

In this embodiment, the edit distance is first used to sort the EMRs (electronic medical records) by literal similarity, with the most similar first, and each EMR is segmented into a corresponding word set. The Jaccard distance is used to find the long texts whose word sets completely match that of the text information, and these long texts are given the highest priority. For long texts that do not match completely, the cosine distance is used to compute the relevance between the unmatched words, and a preset threshold is set: if the relevance is smaller than the preset threshold, it is set to 0 and the words are considered unrelated. The relevance of the related words is then added to the ordering to obtain the long text matches of the next-highest priority. For example, if the word set of the current long text is {A, B} and a word set in the library is {C, A}, the weighted similarity distance obtained from the cosine distance calculation is $\frac{B \cdot C}{\lVert B \rVert \, \lVert C \rVert}$, where B and C denote the word vectors of the unmatched words.
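The prioritization described in this embodiment can be sketched roughly as below, starting from a screening set that is assumed to have been produced already. Several details are assumptions: unmatched words on each side are averaged into one vector before the cosine is taken, related records are listed with the most relevant first, and the names `rank_candidates`, `mean_vector`, the toy vectors, and the threshold of 0.5 are hypothetical.

```python
import math


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; a distance of 0 means the word sets match completely."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mean_vector(words, vectors, dim):
    """Average the vectors of the given words (an assumption); zero vector if none is known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


def rank_candidates(query_words, candidates, vectors, threshold=0.5):
    """Exact word-set matches first, then records whose unmatched words are related."""
    dim = len(next(iter(vectors.values())))
    query = set(query_words)
    exact, related = [], []
    for record_id, words in candidates.items():
        cand = set(words)
        if jaccard_distance(query, cand) == 0.0:
            exact.append(record_id)  # highest priority
            continue
        # Compare only the words that do not match, as in the {A, B} vs {C, A} example.
        relevance = cosine(mean_vector(query - cand, vectors, dim),
                           mean_vector(cand - query, vectors, dim))
        if relevance > threshold:  # below the threshold counts as unrelated
            related.append((relevance, record_id))
    related.sort(key=lambda pair: pair[0], reverse=True)
    return exact + [record_id for _, record_id in related]


vectors = {"headache": [0.5, 0.5], "fever": [1.0, 0.1], "high temperature": [0.9, 0.2],
           "chills": [0.8, 0.5], "fracture": [0.0, 1.0]}
candidates = {"r4": ["headache", "chills"], "r3": ["headache", "fracture"],
              "r2": ["headache", "high temperature"], "r1": ["fever", "headache"]}
print(rank_candidates(["headache", "fever"], candidates, vectors))
```

In this sketch a record whose word set is a strict subset or superset of the query has no unmatched words on one side and therefore receives a relevance of 0; a fuller implementation would need an explicit rule for that case.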

Fig. 2 is another flowchart illustrating a method for retrieving similarity between medical records according to an embodiment of the present application. As shown in fig. 2, the method includes:

step 202, receiving patient medical record chief complaint information;

step 204, performing word segmentation processing on patient medical record chief complaint information to generate words;

step 206, training the words into long text vectors;

step 208, screening a search range according to whether the disease name or the specific character is included;

and step 210, calculating the chief complaint similarity according to a combined distance algorithm.

In this embodiment, the received data objects are the patient's chief complaint data (text type) and disease history (numerical type). The similarity of the chief complaint data is calculated first. As shown in fig. 2, the chief complaint entered by the doctor is trained into a text vector using a CRF (conditional random field) algorithm, an RNN (recurrent neural network), and Doc2Vec (a document embedding model). The retrieval range is then screened according to whether the chief complaint contains a disease name or specific characters; here the edit distance is used to narrow the retrieval range, which reduces the time complexity and enables fast retrieval. Finally, the chief complaint similarity is calculated by combining the Jaccard distance and the cosine distance.
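The screening step can be illustrated with a minimal sketch that first keeps records mentioning the same disease name as the query and then narrows the pool by edit distance. The function names `edit_distance` and `screen`, the `keep` parameter, and the sample data are hypothetical, and how the two filters are combined is an assumption, since the source does not give the exact rule.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def screen(query, records, disease_names, keep=2):
    """Keep records sharing a disease name with the query, then the `keep` closest by edit distance."""
    mentioned = [name for name in disease_names if name in query]
    if mentioned:
        pool = [r for r in records if any(name in r for name in mentioned)]
    else:
        pool = list(records)  # no disease name found: fall back to all records
    return sorted(pool, key=lambda r: edit_distance(query, r))[:keep]


records = ["chest pain for 2 days", "headache for 3 days", "chest pain and cough for one week"]
print(screen("chest pain for 3 days", records, ["chest pain", "headache"]))
```

Only the records that survive this screening would then be passed to the more expensive Jaccard and cosine comparison.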

Fig. 3 is another flowchart illustrating a method for retrieving similarity between medical records according to an embodiment of the present application. As shown in fig. 3, the method includes:

step 302, acquiring medical history information in a medical record of a patient according to medical history statistics;

step 304, automatically encoding the medical history;

step 306, performing word segmentation processing on the medical history to generate words;

step 308, training the words into long text vectors;

step 310, calculating the medical history similarity according to the long text vector.

In this embodiment, the medical history records in the medical record are obtained through medical history statistics, the medical histories are encoded using one-hot encoding, and the similarity between the encoded medical histories is then calculated to obtain the medical history similarity.
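A minimal sketch of this encoding step is given below. Each history is turned into a binary vector with one position per history item, and two encoded histories are compared with the cosine measure. The vocabulary, the names `encode_history` and `history_similarity`, and the choice of cosine as the similarity measure are assumptions; the source does not say which measure is applied to the encoded histories.

```python
import math

# Hypothetical vocabulary of history items, one vector position per item.
HISTORY_ITEMS = ["hypertension", "diabetes", "coronary heart disease", "smoking"]


def encode_history(items, vocabulary=HISTORY_ITEMS):
    """Encode a patient's history as a binary vector: 1 if the item is present, else 0."""
    present = set(items)
    return [1 if entry in present else 0 for entry in vocabulary]


def history_similarity(a, b):
    """Cosine similarity of two binary history vectors (the measure is an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    # For 0/1 vectors the sum of the entries equals the sum of their squares.
    norm = math.sqrt(sum(a)) * math.sqrt(sum(b))
    return dot / norm if norm else 0.0


patient_a = encode_history(["hypertension", "diabetes"])
patient_b = encode_history(["hypertension", "smoking"])
print(patient_a, patient_b, history_similarity(patient_a, patient_b))
```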

Fig. 4 is another flowchart illustrating a method for retrieving similarity between medical records according to an embodiment of the present application. As shown in fig. 4, the method includes:

step 402, receiving text information;

step 404, performing word segmentation processing on the text information to generate words;

step 406, training the words into long text vectors;

step 408, calculating the similarity of the chief complaints and the similarity of the medical histories;

step 410, normalizing the similarity of the chief complaints and the similarity of the medical histories;

step 412, feature selection;

step 414, calculating the weight ratio of each feature through feature selection;

and step 416, weighting and summing the chief complaint similarity and the medical history similarity according to the obtained weight ratio to obtain comprehensive similarity.

In this embodiment, after the chief complaint similarity and the medical history similarity are obtained, the comprehensive similarity of the two is calculated. As shown in fig. 4, the chief complaint similarity and the medical history similarity are normalized so that the input data format is standardized; the weight ratio of each feature is calculated through feature selection; and the chief complaint similarity and the medical history similarity are weighted and summed according to the obtained weight ratios to obtain the comprehensive similarity.
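The normalization and weighted summation can be sketched as follows. Min-max scaling is an assumed choice of normalization, and the weights 0.6 and 0.4 are placeholder values standing in for the weight ratios that the feature selection step would produce; neither is specified in the source. The function names are hypothetical.

```python
def min_max(values):
    """Scale a list of scores to the range [0, 1] (an assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all scores equal: no ranking information
    return [(v - lo) / (hi - lo) for v in values]


def comprehensive_similarity(complaint_scores, history_scores,
                             w_complaint=0.6, w_history=0.4):
    """Weighted sum of normalized chief complaint and medical history similarities.

    The default weights are placeholders for the output of feature selection.
    """
    complaint = min_max(complaint_scores)
    history = min_max(history_scores)
    return [w_complaint * c + w_history * h for c, h in zip(complaint, history)]


# One score per candidate record, in the same order in both lists.
print(comprehensive_similarity([0.2, 0.8, 0.5], [10, 30, 20]))
```

Normalizing first keeps a feature with a larger numeric range, such as the history scores in the usage line, from dominating the weighted sum.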

In a second aspect of the present invention, a system 50 for retrieving medical record text similarity is provided, including: a memory 502, a processor 504, and a computer program stored on the memory 502 and executable on the processor 504, the processor 504 when executing the computer program implementing: receiving text information; performing word segmentation processing on the text information to generate words; training the words into long text vectors; and acquiring medical record information similar to the text information in the database according to the long text vector.

As shown in fig. 5, the system 50 for retrieving medical record text similarity provided by the present invention performs word segmentation on the received text information. The word segmentation includes resolving ambiguous segmentations and identifying unknown (out-of-vocabulary) words, so that diseases, disorders, and time expressions can be segmented. The segmented words are used in the subsequent training, and accurate word segmentation determines the accuracy of that step. The generated words are trained into a long text vector, which gives a numerical representation of the long text, and medical record information similar to the text information is then acquired from the database according to the long text vector. When the system retrieves medical record information, medical knowledge is automatically mined and learned from the database by medical artificial intelligence techniques, without the participation of experts, and a model for comparing similar medical records is constructed. The model can integrate the comparison results of multiple types of free text and obtain similar medical record recommendations efficiently and accurately. The results are highly consistent with those obtained by manual comparison by doctors, so the system can provide doctors with clinical pathway references of practical value and effectively reduces the large amount of time doctors spend looking up previous medical records. The system can also assist doctors who lack clinical experience, so that patients receive better and more timely diagnosis and treatment, which further improves the efficiency and accuracy of clinical diagnosis.

Specifically, the system mainly processes the chief complaint, history of present illness, past history, personal history, family history, and general examination results in the free text, in order to obtain a more complete auxiliary diagnosis for the patient.

In the above embodiment, preferably, the processor 504, when executing the computer program, further implements, after the step of performing word segmentation processing on the text information to generate words: performing part-of-speech tagging on the words; and classifying the words according to their part-of-speech tags.
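The tagging and classification step can be illustrated with a minimal sketch. It assumes a lookup-based tagger; the lexicon `POS_LEXICON`, the tag names and the function `classify_by_pos` are hypothetical and do not come from the patent, which does not specify how tags are assigned.

```python
from collections import defaultdict

# Hypothetical tag lexicon; the patent does not disclose its tag set.
POS_LEXICON = {
    "headache": "symptom",
    "hypertension": "disease",
    "3 days": "time",
}


def classify_by_pos(words, lexicon=POS_LEXICON, default_tag="other"):
    """Tag each word by lexicon lookup and group the words by tag."""
    groups = defaultdict(list)
    for word in words:
        tag = lexicon.get(word, default_tag)  # unknown words get the default tag
        groups[tag].append(word)
    return dict(groups)


groups = classify_by_pos(["headache", "3 days", "hypertension", "nausea"])
```

Grouping words by tag in this way lets later steps treat diseases, symptoms and time expressions separately, which is one plausible reading of "classifying the words".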

In any of the above embodiments, preferably, when the processor 504 executes the computer program, the step of performing word segmentation processing on the text information to generate words specifically includes: performing word segmentation processing on the text information according to a disease dictionary and regular expressions, and removing stop words, to generate the words.
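As a rough illustration of segmentation driven by a disease dictionary, regular expressions and stop-word removal, the sketch below uses forward maximum matching. The dictionary entries, the time pattern, the stop-word list and the function `segment` are all assumptions made for the example; the patent does not list its resources or name its matching strategy.

```python
import re

# Hypothetical resources, not taken from the patent.
DISEASE_DICT = {"头痛", "发热", "咳嗽", "高血压"}  # headache, fever, cough, hypertension
TIME_PATTERN = re.compile(r"\d+(?:天|周|个月|年)")  # e.g. "3天" (3 days)
STOP_WORDS = {"的", "了", "伴", "，", "。"}
MAX_WORD_LEN = max(len(word) for word in DISEASE_DICT)


def segment(text):
    """Segment text with regex and dictionary matching, then drop stop words."""
    words = []
    i = 0
    while i < len(text):
        # 1. Regular expression: try a time expression at the current position.
        match = TIME_PATTERN.match(text, i)
        if match:
            words.append(match.group())
            i = match.end()
            continue
        # 2. Dictionary: forward maximum matching, longest candidate first.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in DISEASE_DICT:
                words.append(candidate)
                i += size
                break
        else:
            # 3. No dictionary hit: emit one character as an unknown token.
            words.append(text[i])
            i += 1
    # 4. Remove stop words.
    return [word for word in words if word not in STOP_WORDS]


words = segment("头痛3天，伴发热")  # "headache for 3 days, with fever"
```

Trying the longest dictionary entry first is a simple way to reduce ambiguous splits; a production system would likely use a much larger dictionary and a statistical segmenter.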

In any of the above embodiments, preferably, when the processor 504 executes the computer program, the step of training the words into long text vectors specifically includes: training the words into word vectors; and combining the word vectors into long text vectors.
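The patent does not state how word vectors are combined into a long text vector. One common choice, assumed here purely for illustration, is to average them. The toy vectors and the function `long_text_vector` are hypothetical; in practice the word vectors would come from a trained embedding model.

```python
def long_text_vector(words, word_vectors):
    """Average the vectors of the known words into one long text vector."""
    known = [word_vectors[word] for word in words if word in word_vectors]
    if not known:
        return []  # no known word, so no vector can be formed
    dim = len(known[0])
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Toy 2-dimensional word vectors standing in for trained embeddings.
toy_vectors = {"headache": [1.0, 0.0], "fever": [0.0, 1.0]}
vector = long_text_vector(["headache", "fever", "unknown"], toy_vectors)
```

Averaging ignores word order, which keeps the sketch short; weighted schemes or document-level embedding models are alternatives the source neither confirms nor rules out.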

In any of the above embodiments, preferably, when the processor 504 executes the computer program, the step of obtaining medical record information similar to the text information in the database according to the long text vector specifically includes: acquiring a plurality of long texts similar to the text information from the database, and segmenting each long text into a word set, the word sets serving as a screening set; acquiring, from the screening set, the long texts that match the word set obtained by word segmentation of the text information, and taking them as priority results; calculating, according to the long text vector, the relevance between each word set in the screening set that does not match the text information and the word set obtained by word segmentation of the text information; judging whether the relevance is greater than a preset threshold; and if the relevance is greater than the preset threshold, arranging the long texts that do not match the text information in positive sequence according to the relevance.
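The retrieval steps above can be sketched as follows. Several details are assumptions, since the patent does not fix them: a candidate "matches" when it contains every query word, relevance is the cosine similarity of long text vectors, the threshold defaults to 0.5, and the remaining candidates are listed with the highest relevance first. The names `cosine` and `rank_candidates` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query_words, query_vec, candidates, threshold=0.5):
    """Rank a screening set given as (record_id, words, vector) tuples.

    Candidates containing every query word are priority results; the others
    are kept only if their relevance exceeds the threshold, then sorted.
    """
    query_set = set(query_words)
    priority, scored = [], []
    for record_id, words, vec in candidates:
        if query_set <= set(words):  # assumed meaning of "matched"
            priority.append(record_id)
        else:
            relevance = cosine(query_vec, vec)
            if relevance > threshold:
                scored.append((relevance, record_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # assumed ordering
    return priority + [record_id for _, record_id in scored]


screening_set = [
    ("A", ["headache", "fever", "cough"], [1.0, 1.0]),
    ("B", ["cough"], [1.0, 0.9]),
    ("C", ["rash"], [-1.0, 0.0]),
    ("D", ["fever"], [0.0, 1.0]),
]
ranked = rank_candidates(["headache", "fever"], [1.0, 1.0], screening_set)
```

Placing exact word-set matches ahead of vector-scored candidates mirrors the "priority result" wording; the threshold then filters out weakly related records before they reach the doctor.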

Specifically, as shown in fig. 6, the patient medical record 6 is entered, a similar medical record 62 is obtained from the medical record database 60, and the result is returned to the doctor. After the patient describes the illness, the doctor can search for similar long-text medical records and, drawing on experience, make the corresponding clinical diagnosis and provide a suitable treatment plan for the patient.

Specifically, as shown in fig. 7, according to the medical record data 7 entered for a new patient, the doctor separates the medical record data into chief complaint input data 70, patient disease history data 72, and general examination data 74. Chief complaint similarity calculation 702, medical history similarity calculation 722, and comprehensive similarity calculation 742 are performed on the separated data, similar medical records are acquired from the Chinese electronic medical record database 78, and the result is returned (76) to assist the doctor in making a clinical diagnosis.
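The patent does not say how the per-section similarities are integrated. The sketch below assumes, for illustration only, a weighted average of section scores; the section names, the weights and the function `combined_similarity` are hypothetical.

```python
def combined_similarity(section_scores, weights):
    """Weighted average of per-section similarity scores."""
    total_weight = sum(weights[name] for name in section_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(section_scores[name] * weights[name] for name in section_scores)
    return weighted / total_weight


# Hypothetical scores for one candidate record, and hypothetical weights.
scores = {"chief_complaint": 1.0, "history": 0.5, "general_exam": 0.5}
weights = {"chief_complaint": 2, "history": 1, "general_exam": 1}
overall = combined_similarity(scores, weights)
```

Weighting the chief complaint more heavily is only an example of how a deployment might tune the combination; the source gives no weights.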

As shown in fig. 8, in a third aspect of the present invention, a computer device 8 is provided, which includes a memory 80, a processor 82, and a computer program stored on the memory 80 and executable on the processor 82, and when the processor 82 executes the computer program, the method for retrieving the similarity between the texts of the medical records according to any of the above embodiments is implemented.

The embodiment provided by the invention comprises the medical record text similarity retrieval method in any embodiment, so that the embodiment has all the beneficial effects of the medical record text similarity retrieval method.

In a fourth aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the method according to any of the above embodiments, so that the storage medium has all the technical effects of the method for retrieving medical record text similarity; details are not repeated herein.

In the present invention, the term "plurality" means two or more unless explicitly defined otherwise. The terms "mounted," "connected," "fixed," and the like are to be construed broadly, and for example, "connected" may be a fixed connection, a removable connection, or an integral connection; "coupled" may be direct or indirect through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

In the description herein, the description of the terms "one embodiment," "some embodiments," "specific embodiments," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for retrieving medical record text similarity is characterized by comprising the following steps:

receiving text information;

performing word segmentation processing on the text information to generate words;

training the words into long text vectors;

and acquiring medical record information similar to the text information in a database according to the long text vector.

2. The medical record text similarity retrieval method according to claim 1, wherein after the step of performing word segmentation processing on the text information to generate words, the method further comprises:

performing tagging processing on the part of speech of the word;

and classifying the words according to the part-of-speech labels of the words.

3. The medical record text similarity retrieval method according to claim 1, wherein the step of performing word segmentation processing on the text information to generate words specifically comprises:

performing word segmentation processing on the text information according to a disease dictionary and regular expressions, and removing stop words, to generate the words.

4. The medical record text similarity retrieval method according to claim 2, wherein the step of training the words into long text vectors specifically comprises:

training the words into word vectors;

forming the word vectors into the long text vector.

5. The medical record text similarity retrieval method according to any one of claims 1 to 4, wherein the step of obtaining medical record information similar to the text information in a database according to the long text vector specifically includes:

acquiring a plurality of long texts similar to the text information from the database, and respectively segmenting the long texts into word sets as a screening set;

acquiring a long text matched with the word set after the word segmentation processing is carried out on the text information in the screening set, and taking the long text as a priority result;

calculating the relevance of a word set which is not matched with the text information in the screening set and a word set after word segmentation processing is carried out on the text information according to the long text vector;

judging whether the relevance is greater than a preset threshold value or not;

and if the relevance is greater than the preset threshold value, arranging the long texts which are not matched with the text information in a positive sequence according to the relevance.

6. A system for retrieving medical record text similarity is characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor when executing the computer program implementing:

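receiving text information;

performing word segmentation processing on the text information to generate words;

training the words into long text vectors;

and acquiring medical record information similar to the text information in a database according to the long text vector.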

7. The medical record text similarity retrieval system according to claim 6, wherein the processor, when executing the computer program, further implements, after the step of performing word segmentation processing on the text information to generate words:

performing tagging processing on the part of speech of the word;

and classifying the words according to the part-of-speech labels of the words.

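8. The medical record text similarity retrieval system according to claim 6, wherein, when the processor executes the computer program, the step of performing word segmentation processing on the text information to generate words specifically comprises:

performing word segmentation processing on the text information according to a disease dictionary and regular expressions, and removing stop words, to generate the words.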

9. The system for retrieving medical record text similarity according to claim 7, wherein the processor, when executing the computer program, implements the step of training the words into long text vectors, specifically comprising:

training the words into word vectors;

forming the word vectors into the long text vector.

10. The system for retrieving medical record text similarity according to any one of claims 6 to 9, wherein the processor, when executing the computer program, implements the step of obtaining medical record information similar to the text information in the database according to the long text vector, specifically comprising:

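acquiring a plurality of long texts similar to the text information from the database, and respectively segmenting the long texts into word sets as a screening set;

acquiring a long text matched with the word set after the word segmentation processing is carried out on the text information in the screening set, and taking the long text as a priority result;

calculating the relevance of a word set which is not matched with the text information in the screening set and a word set after word segmentation processing is carried out on the text information according to the long text vector;

judging whether the relevance is greater than a preset threshold value or not;

and if the relevance is greater than the preset threshold value, arranging the long texts which are not matched with the text information in a positive sequence according to the relevance.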

11. A computer device comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor implements the method for retrieving similarity between medical record texts according to any one of claims 1 to 5 when executing the computer program.

12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for retrieving the similarity of medical record texts according to any one of claims 1 to 5.