CN112559686B - Information retrieval method and device and electronic equipment - Google Patents

Info

Publication number
CN112559686B
Authority
CN
China
Prior art keywords
medical
vector space
semantic vector
target
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011460471.2A
Other languages
Chinese (zh)
Other versions
CN112559686A (en
Inventor
蒋帅
罗雨
彭卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011460471.2A
Publication of CN112559686A
Application granted
Publication of CN112559686B

Abstract

The application discloses an information retrieval method, an information retrieval device and electronic equipment, and relates to the technical field of artificial intelligence such as natural language processing, deep learning and the like. The scheme is as follows: receiving medical query words to be queried; acquiring candidate departments corresponding to the medical query words; acquiring a first semantic vector space of the medical query word; and acquiring at least one target medical term corresponding to the medical query word based on the candidate department and the first semantic vector space. According to the application, after the medical query word to be queried is received, at least one target medical term corresponding to the medical query word to be queried is automatically acquired based on the candidate department and the first semantic vector space, so that the matched standardized term is acquired according to the non-standardized query word, the accuracy of the target medical term is ensured, the efficiency and the reliability in the information retrieval process are improved, and the expression of the medical term in the diagnosis process is standardized.

Description

Information retrieval method and device and electronic equipment
Technical Field
Embodiments of the present application relate generally to the field of data processing technology, and more particularly to artificial intelligence technologies such as natural language processing and deep learning.
Background
In the medical field, when a patient's disease is diagnosed, the first page of the patient's medical record often has to state medical terms, such as the name of the disease, in order to specify the type of disease the patient has. However, because doctors differ in skill level and understanding, the same medical term is often referred to in multiple forms. As a result, retrieving medical terms is extremely inefficient, medical terms are not standardized, and the subsequent related processes are affected. Therefore, how to improve efficiency, accuracy and reliability in the information retrieval process has become an important research direction.
Disclosure of Invention
The application provides an information retrieval method, an information retrieval device and electronic equipment.
According to a first aspect, there is provided an information retrieval method comprising:
receiving medical query words to be queried;
acquiring candidate departments corresponding to the medical query words;
acquiring a first semantic vector space of the medical query word;
and acquiring at least one target medical term corresponding to the medical query word based on the candidate department and the first semantic vector space.
According to a second aspect, there is provided an information retrieval apparatus comprising:
the receiving module is used for receiving medical query words to be queried;
the first acquisition module is used for acquiring candidate departments corresponding to the medical query words;
the second acquisition module is used for acquiring a first semantic vector space of the medical query word;
and the third acquisition module is used for acquiring at least one target medical term corresponding to the medical query word based on the candidate department and the first semantic vector space.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the information retrieval method of the first aspect of the present application.
According to a fourth aspect, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the information retrieval method according to the first aspect of the present application.
According to a fifth aspect, there is provided a computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the information retrieval method according to the first aspect of the application.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
FIG. 1 is a schematic diagram of a first embodiment according to the present application;
FIG. 2 is a schematic illustration of an interactive interface;
FIG. 3 is a schematic diagram of a second embodiment according to the present application;
FIG. 4 is a schematic diagram of a third embodiment according to the present application;
FIG. 5 is a schematic diagram of a fourth embodiment according to the application;
FIG. 6 is a schematic diagram of a fifth embodiment according to the present application;
FIG. 7 is a schematic diagram of a sixth embodiment according to the application;
FIG. 8 is a schematic diagram of a seventh embodiment according to the present application;
FIG. 9 is a block diagram of an information retrieval apparatus for implementing an information retrieval method of an embodiment of the present application;
FIG. 10 is a block diagram of an information retrieval apparatus for implementing an information retrieval method of an embodiment of the present application;
FIG. 11 is a block diagram of an information retrieval apparatus for implementing an information retrieval method of an embodiment of the present application;
FIG. 12 is a block diagram of an electronic device for information retrieval that is used to implement an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The technical field related to the scheme of the application is briefly described as follows:
Data processing includes the collection, storage, retrieval, processing, transformation, and transmission of data. It aims to extract and derive data that is valuable and meaningful to particular users from large volumes of data that may be disorganized and hard to interpret.
AI (Artificial Intelligence) is the discipline of making computers simulate certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence software technologies generally include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
DL (Deep Learning) is a research direction within ML (Machine Learning) that was introduced to bring machine learning closer to its original goal, artificial intelligence. Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained during this learning helps in interpreting data such as text, images, and sounds. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize text, image, and sound data. Deep learning is a complex class of machine learning algorithms that has achieved results in speech and image recognition well beyond those of earlier techniques.
NLP (Natural Language Processing) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to linguistics while differing from it in important ways. Natural language processing is not the general study of natural language; rather, it aims to develop computer systems, and in particular software systems, that can effectively carry out natural language communication. It is thus part of computer science.
It should be noted that in the prior art, a doctor often has to determine the name of a patient's disease from personal experience and record it on the first page of the patient's medical record; that is, the doctor relies on personal experience to determine the target medical term. However, doctors differ in skill level and understanding, and the same medical term is often referred to in multiple forms. For example, for a patient with respiratory obstruction, the target medical term determined by doctor A is upper airway obstruction, while the target medical term determined by doctor B is nasal polyp obstruction. As such, the accuracy of the determined target medical term tends to be extremely low.
Further, once a standard term set has been compiled manually, the term expression that meets the requirements has to be found by keyword search during subsequent use. Keyword retrieval, however, is very inflexible: a term is recalled only if it contains exactly the same words as the query, so the recall rate is extremely low, and less experienced physicians must spend great effort learning how the standard terms are expressed.
Therefore, the information retrieval method provided by the application no longer depends on the doctor's experience. Instead, it receives the medical query word to be queried and automatically acquires at least one corresponding target medical term, thereby ensuring the accuracy of the acquired target medical term.
The following describes an information retrieval method, an information retrieval device and electronic equipment according to an embodiment of the application with reference to the accompanying drawings.
Fig. 1 is a schematic diagram according to a first embodiment of the present application. The information retrieval method of the present embodiment is executed by an information retrieval device, which may be a hardware device or software running in a hardware device. The hardware device may be, for example, a terminal device or a server.
As shown in fig. 1, the information retrieval method provided in this embodiment includes the following steps:
S101, receiving medical query words to be queried.
The medical query word may be any query word that expresses a medical concept, for example "poor breathing" or "asthma".
In the embodiment of the application, the user can send the medical query word to be queried in various ways. As one possible implementation, the user may perform operations on the interactive interface of the client to trigger the corresponding functions. Optionally, controls for inputting and sending the medical query word to be queried are arranged on the interactive interface of the client, and the user inputs and sends the medical query word by triggering these controls in turn. Accordingly, the medical query word to be queried sent by the client can be received.
For example, as shown in fig. 2, the medical query word to be queried is "respiratory disorder". The user may input this medical query word by triggering the input control 2-2 on the interactive interface 2-1 of the client, and then send it through the sending control 2-3. Accordingly, the medical query word to be queried may be received.
S102, obtaining candidate departments corresponding to the medical query words.
There may be one or more candidate departments corresponding to a medical query word.
For example, for the medical query word "dyspnea" to be queried, the following 4 candidate departments may be obtained: respiratory medicine, otorhinolaryngology, traditional Chinese medicine, and neurology.
The application does not limit the specific manner of acquiring the candidate departments corresponding to the medical query word, which may be selected according to actual conditions. As a possible implementation, the medical query word to be queried may be input into a pre-trained model, and the output of the model is taken as the candidate departments corresponding to the medical query word.
S103, acquiring a first semantic vector space of the medical query word.
In the embodiment of the application, any medical query word can be mapped into a semantic vector space, and its representation in that space is taken as the first semantic vector space.
For example, for the medical query word "dyspnea" to be queried, the first semantic vector space of the medical query word may be obtained as L_j.
The present application does not limit the specific manner of obtaining the first semantic vector space of the medical query word, which may be selected according to actual conditions. As one possible implementation, the medical query word to be queried may be input into a pre-trained model, and the output of the model is taken as the first semantic vector space.
S104, acquiring at least one target medical term corresponding to the medical query word based on the candidate department and the first semantic vector space.
The target medical term may be at least one standard medical expression corresponding to the medical query word.
Optionally, a single target medical term corresponding to the medical query word may be obtained based on the candidate department and the first semantic vector space and provided to a user such as a doctor, who can record it directly on the first page of the patient's medical record.
Optionally, a plurality of target medical terms corresponding to the medical query word may be obtained based on the candidate department and the first semantic vector space and all provided to a user such as a doctor, who can screen them and record the one selected target medical term on the first page of the patient's medical record.
According to the information retrieval method provided by the embodiment of the application, after the medical query word to be queried is received, at least one target medical term corresponding to the medical query word to be queried can be automatically acquired based on the candidate department and the first semantic vector space, so that the matched standardized term is acquired according to the non-standardized query word, the accuracy of the target medical term is ensured, the efficiency and the reliability in the information retrieval process are improved, and the expression of the medical term in the diagnosis process is standardized. Further, by acquiring the candidate departments corresponding to the medical query words, the matching operation amount can be reduced, interference caused by other departments to the determination of the target medical terms is eliminated, and the accuracy of the target medical terms is further improved.
Fig. 3 is a schematic diagram according to a second embodiment of the application.
As shown in fig. 3, on the basis of the above embodiment, the information retrieval method provided by the present application specifically includes the following steps:
S301, receiving medical query words to be queried.
S302, obtaining candidate departments corresponding to the medical query words.
Optionally, the medical query word may be input into the target department classification model, which outputs the candidate departments corresponding to the medical query word.
The target department classification model is trained in advance.
S303, acquiring a first semantic vector space of the medical query word.
Steps S301 to S303 are the same as steps S101 to S103 in the above embodiment, and will not be described here again.
The specific process in step S104 of obtaining at least one target medical term corresponding to the medical query word based on the candidate department and the first semantic vector space includes steps S304 to S306.
S304, based on the candidate departments, obtaining candidate medical terms corresponding to the medical query words.
As shown in fig. 4, on the basis of the above embodiment, the specific process in step S304 of obtaining the candidate medical terms corresponding to the medical query word based on the candidate departments includes the following steps:
S401, based on the identification of the candidate department, medical terms consistent with the identification are screened from the medical term set.
The medical term set is a set of standard medical expressions, and each medical term in the set corresponds to one identification.
Optionally, the medical term set may be traversed based on the identification of the candidate department, and the medical terms consistent with the identification are screened from it.
For example, based on the identification a of the candidate department, the medical term set is traversed and the medical terms consistent with the identification a are screened from it.
S402, medical terms consistent with the identification are taken as candidate medical terms.
For example, if traversing the medical term set based on the identification a of the candidate department yields medical terms 1 to 3 as the terms consistent with the identification a, then medical terms 1 to 3 are taken as the candidate medical terms.
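Steps S401 and S402 can be illustrated with a minimal Python sketch. The `MedicalTerm` record, the `screen_candidate_terms` function and the sample entries are hypothetical names introduced here for illustration; the application does not specify how the medical term set or the identifications are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedicalTerm:
    name: str
    department_id: str  # identification of the department the term belongs to


def screen_candidate_terms(term_set, candidate_department_ids):
    """S401-S402: traverse the term set and keep the terms whose
    identification matches one of the candidate departments."""
    wanted = set(candidate_department_ids)
    return [term for term in term_set if term.department_id in wanted]


# Hypothetical term set: terms 1 to 3 carry identification "a".
TERM_SET = [
    MedicalTerm("medical term 1", "a"),
    MedicalTerm("medical term 2", "a"),
    MedicalTerm("medical term 3", "a"),
    MedicalTerm("medical term 4", "b"),
]

candidates = screen_candidate_terms(TERM_SET, ["a"])
print([term.name for term in candidates])
```

Because several candidate departments may be returned for one query, the sketch accepts a list of identifications rather than a single one.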
S305, acquiring a second semantic vector space of the candidate medical term.
In the embodiment of the application, the candidate medical terms can be mapped into a semantic vector space with the same dimension as the first semantic vector space, and their representation in that space is taken as the second semantic vector space.
For example, for medical terms 1 to 3, the second semantic vector space of the candidate medical terms may be obtained as L_i.
S306, acquiring at least one target medical term according to the first semantic vector space and the second semantic vector space.
As shown in fig. 5, on the basis of the above embodiment, the specific process in step S306 of obtaining at least one target medical term according to the first semantic vector space and the second semantic vector space includes the following steps:
S501, acquiring the semantic similarity between the first semantic vector space and the second semantic vector space.
It should be noted that the present application does not limit the specific manner of obtaining the semantic similarity between the first semantic vector space and the second semantic vector space, which may be selected according to actual conditions.
Optionally, the semantic similarity between the first semantic vector space and the second semantic vector space may be obtained by calculating the cosine similarity or the Euclidean distance.
For example, for the medical query word "dyspnea" to be queried, the first semantic vector space of the medical query word is L_j and the second semantic vector space of the candidate medical terms is L_i. In this case, the semantic similarity between the first semantic vector space and the second semantic vector space may be obtained by calculating the cosine similarity of L_j and L_i.
It should be noted that, when the cosine similarity is used, the semantic similarity between the first semantic vector space and the second semantic vector space lies in the range [-1, 1]. The closer the value is to 1, the closer the directions of the two vectors; the closer the value is to -1, the more opposite the directions of the two vectors; a value close to 0 indicates that the two vectors are nearly orthogonal.
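The [-1, 1] range described above is the range of the cosine similarity, cos(L_j, L_i) = (L_j · L_i) / (|L_j| |L_i|). The following Python sketch shows one way to compute it for two plain lists of numbers; the function name `cosine_similarity` and the handling of zero vectors are assumptions made for illustration, not part of the application.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The cosine is undefined for a zero vector (assumed handling).
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
opposite_direction = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

The three sample calls correspond to the three cases in the text: same direction, opposite direction, and orthogonal vectors.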
S502, obtaining at least one target medical term corresponding to the medical query word according to the semantic similarity.
Alternatively, one candidate medical term with the highest semantic similarity may be acquired as the target medical term.
Alternatively, at least one candidate medical term whose semantic similarity is greater than a preset similarity threshold may be obtained as the target medical term. Further, these terms may be arranged in a predetermined order, such as descending order of semantic similarity, and then recommended to the user, who selects from among them.
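The two selection strategies of step S502 can be sketched as follows. The function names, the `(term, similarity)` pair layout and the sample scores are hypothetical; the application does not fix the value of the preset similarity threshold.

```python
def select_top1(scored_terms):
    """Return the candidate term with the highest semantic similarity."""
    return max(scored_terms, key=lambda pair: pair[1])[0]


def select_above_threshold(scored_terms, threshold):
    """Return the terms whose similarity exceeds the threshold,
    arranged in descending order of similarity."""
    kept = [(term, score) for term, score in scored_terms if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in kept]


# Hypothetical (candidate term, semantic similarity) pairs.
scored = [("term 1", 0.62), ("term 2", 0.91), ("term 3", 0.35)]

best = select_top1(scored)
recommended = select_above_threshold(scored, threshold=0.5)
```

The first strategy suits cases where a single term is recorded directly, while the second leaves the final choice to the user.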
According to the information retrieval method provided by the embodiment of the application, the second semantic vector space of the candidate medical terms can be obtained, and the semantic similarity between the first semantic vector space and the second semantic vector space is calculated, so that at least one target medical term corresponding to the medical query word is obtained according to the semantic similarity, the accuracy of the target medical term is further ensured, and the efficiency and the reliability in the information retrieval process are improved.
In the present application, after the medical query word to be queried is received, the candidate departments corresponding to the medical query word can be obtained based on a pre-trained target department classification model, which reduces the amount of matching computation and eliminates the interference of other departments with the determination of the target medical terms. Further, the first semantic vector space and the second semantic vector space may be obtained from the medical query word to be queried and the candidate medical terms, respectively, based on a pre-trained target semantic vector space model.
The training process of the target department classification model and the target semantic vector space model is explained below.
For the target department classification model, optionally, fine-tuning can be performed based on the ERNIE classification model to obtain a trained target department classification model, and then candidate departments are obtained based on the target department classification model.
As shown in fig. 6, on the basis of the above embodiment, the training process of the target department classification model specifically includes the following steps:
S601, acquiring standard medical terms and department label information of the standard medical terms as first training data of a department classification model.
For example, based on the data requirements of the target department classification model, 100,000 standard medical terms can be labeled by department so that they are divided into 171 department categories, with each category of standard medical terms corresponding to one piece of department label information. In this case, the 100,000 standard medical terms and their department label information, covering 171 categories, can be used as the first training data of the department classification model.
S602, training the department classification model based on the first training data to generate a target department classification model.
As one possible implementation, the first training data may be normalized into the format label\tdata (a label and the corresponding data separated by a tab character) and input into the department classification model for training, so as to generate the target department classification model.
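As a minimal sketch of the label\tdata normalization, each training example can be written as one line with the department label first and the standard medical term second, separated by a tab. The function name `to_training_lines` and the placeholder terms and labels are hypothetical, and rejecting tabs or newlines inside a field is an assumption added to keep each line parseable.

```python
def to_training_lines(labelled_terms):
    """Format (term, department_label) pairs as 'label<TAB>data' lines."""
    lines = []
    for term, label in labelled_terms:
        for field in (term, label):
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a tab or newline: {field!r}")
        lines.append(f"{label}\t{term}")
    return lines


# Placeholder examples; real data would hold the labeled standard terms.
examples = [
    ("standard term 1", "department_001"),
    ("standard term 2", "department_002"),
]
training_lines = to_training_lines(examples)
```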
Optionally, a department label information prediction result corresponding to the first training data may be obtained, a difference between the department label information prediction result and department label information of standard medical terms may be obtained, then model parameters in the department classification model may be adjusted according to the difference until the difference meets a preset training end condition, and the department classification model after the last adjustment of the model parameters may be determined as a trained target department classification model.
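The adjust-until-the-difference-is-small loop described above can be sketched with a toy classifier. The example below is not ERNIE fine-tuning: a small softmax classifier trained by gradient descent stands in for the department classification model, the cross-entropy between predicted and labeled departments stands in for the "difference", and the loss threshold and epoch limit stand in for the preset training end condition. All names, hyperparameters and the numeric features are assumptions for illustration.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(features, labels, n_classes, lr=0.5, max_epochs=500, loss_threshold=0.05):
    """Adjust the weights until the average cross-entropy (the difference
    between predictions and labels) meets the end condition."""
    n_features = len(features[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    loss = float("inf")
    for _ in range(max_epochs):
        loss = 0.0
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for x, y in zip(features, labels):
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in weights])
            loss -= math.log(probs[y])
            for c in range(n_classes):
                diff = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    grad[c][j] += diff * x[j]
        loss /= len(features)
        if loss < loss_threshold:  # assumed training end condition
            break
        for c in range(n_classes):
            for j in range(n_features):
                weights[c][j] -= lr * grad[c][j] / len(features)
    return weights, loss  # loss is the value from the last evaluated epoch


def predict(weights, x):
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: two numeric features per term, two department classes.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
toy_labels = [0, 1, 0, 1]
toy_weights, final_loss = train(toy_features, toy_labels, n_classes=2)
```

In practice the features would come from the pre-trained language model rather than being supplied by hand, and the end condition could equally be based on validation accuracy.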
For the target semantic vector space model, optionally, fine-tuning can be performed based on the ERNIE classification model to obtain a trained target semantic vector space model, and semantic vector space conversion is then performed based on the target semantic vector space model.
As shown in fig. 7, on the basis of the above embodiment, the training process of the target semantic vector space model specifically includes the following steps:
S701, standard medical terms and the standard semantic vector space of the standard medical terms are acquired and used as second training data of a semantic vector space model.
For example, a recorded set of 100,000 standard medical terms may be converted into the standard semantic vector spaces of those terms. In this case, the 100,000 standard medical terms and their converted standard semantic vector spaces can be used as the second training data of the semantic vector space model.
S702, training the semantic vector space model based on the second training data to generate a target semantic vector space model.
As one possible implementation, the second training data may be input into a semantic vector space model for training to generate a target semantic vector space model.
Optionally, a semantic vector space prediction result corresponding to the second training data may be obtained, a difference between the semantic vector space prediction result and a standard semantic vector space of the standard medical term may be obtained, then model parameters in the semantic vector space model may be adjusted according to the difference until the difference meets a preset training end condition, and the semantic vector space model after the last adjustment of the model parameters may be determined as a trained target semantic vector space model.
Fig. 8 is a schematic diagram according to a seventh embodiment of the present application. As shown in fig. 8, based on the above embodiment, the information retrieval method proposed in the present embodiment includes the following steps:
S801, receiving medical query words to be queried.
S802, obtaining candidate departments corresponding to the medical query words.
S803, acquiring a first semantic vector space of the medical query word.
In the embodiment of the application, the medical query word can be input into the target semantic vector space model, which outputs the first semantic vector space.
S804, based on the identification of the candidate departments, medical terms consistent with the identification are screened from the medical term set.
S805, medical terms consistent with the identification are used as candidate medical terms.
S806, acquiring a second semantic vector space of the candidate medical term.
In the embodiment of the application, the candidate medical terms can be input into the target semantic vector space model, which outputs the second semantic vector space.
S807, acquiring semantic similarity of the first semantic vector space and the second semantic vector space.
S808, acquiring at least one target medical term corresponding to the medical query term according to the semantic similarity.
It should be noted that, the descriptions of steps S801 to S808 may be referred to the relevant descriptions in the above embodiments, and are not repeated here.
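Steps S802 to S808 can be strung together in a short end-to-end Python sketch. The trained models are replaced by toy stand-ins: `embed` builds character-bigram counts instead of calling a target semantic vector space model, and `classify` is a stub passed in by the caller instead of a target department classification model. The department identifications `dept_a` and `dept_b`, the assignment of terms to them, and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for the semantic vector space model: bigram counts."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(u, v):
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, classify, term_set, top_k=3):
    departments = set(classify(query))                       # S802
    query_vector = embed(query)                              # S803
    candidates = [term for term, dept in term_set
                  if dept in departments]                    # S804-S805
    scored = [(term, cosine(query_vector, embed(term)))
              for term in candidates]                        # S806-S807
    scored.sort(key=lambda pair: pair[1], reverse=True)      # S808
    return scored[:top_k]


# Hypothetical term set of (standard term, department identification) pairs.
term_set = [
    ("upper airway obstruction", "dept_a"),
    ("nasal polyp obstruction", "dept_a"),
    ("unrelated term", "dept_b"),
]
results = retrieve("airway obstruction", lambda query: ["dept_a"], term_set)
```

Restricting the candidates to the predicted departments before scoring is what reduces the amount of matching computation: only the terms that pass S804 and S805 are embedded and compared.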
It should be noted that the information retrieval method provided by the application can be applied to various scenes.
For the application scenario of a doctor's consultation, the consulting doctor can input the medical query word to be queried, such as "poor breathing", on the interactive interface of the office system. Accordingly, after the medical query word to be queried is received, candidate departments such as respiratory medicine, otorhinolaryngology, traditional Chinese medicine and neurology can be output based on the trained target department classification model, and the first semantic vector space can be output based on the trained target semantic vector space model.
Further, a plurality of target medical terms can be automatically acquired through the candidate departments and the first semantic vector space and sorted in descending order of semantic similarity, and the 3 top-ranked target medical terms are then provided to the doctor. In this case, the doctor can draw on personal experience to select one target medical term from them and record it in the patient's medical record, which improves diagnostic efficiency.
For the application scenario of medical students' learning, a student can input on the interactive interface of the office system any medical query word, such as "poor breathing", for which the student is unsure of the corresponding standardized medical expression. Accordingly, after the medical query word to be queried is received, candidate departments such as respiratory medicine, otorhinolaryngology, traditional Chinese medicine and neurology can be output based on the trained target department classification model, and the first semantic vector space can be output based on the trained target semantic vector space model.
Further, the target medical term with the highest semantic similarity can be automatically acquired through the candidate departments and the first semantic vector space and provided to the student. In this case, the student can learn the more accurate target medical term corresponding to the medical query word, which improves the learning effect.
In summary, according to the application, the corresponding standardized medical terms can be automatically obtained from the received non-standardized medical query words. This ensures the accuracy of the target medical terms, improves the efficiency and reliability of the information retrieval process, and standardizes the expression of medical terms in the diagnosis process. In addition, obtaining the candidate departments corresponding to the medical query words reduces the amount of matching computation and eliminates the interference of other departments in determining the target medical terms, which further improves the accuracy of the target medical terms. Further, the subsequent encoding process can be facilitated, and the recall rate is improved.
Corresponding to the information retrieval methods provided in the above embodiments, an embodiment of the present application also provides an information retrieval device. Because the device corresponds to the methods provided in the above embodiments, the implementations of the information retrieval method also apply to the information retrieval device and are not described in detail again in this embodiment.
Fig. 9 is a schematic structural view of an information retrieval apparatus according to an embodiment of the present application.
As shown in fig. 9, the information retrieval apparatus 900 includes: a receiving module 910, a first obtaining module 920, a second obtaining module 930, and a third obtaining module 940. Wherein:
a receiving module 910, configured to receive a medical query word to be queried;
a first obtaining module 920, configured to obtain a candidate department corresponding to the medical query word;
a second obtaining module 930, configured to obtain a first semantic vector space of the medical query term;
a third obtaining module 940, configured to obtain at least one target medical term corresponding to the medical query term based on the candidate department and the first semantic vector space.
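The cooperation of the four modules listed above can be sketched as follows. This is only an illustrative arrangement under stated assumptions: the class name `InformationRetrievalDevice`, the callable fields, and the stub functions in the usage example are hypothetical, and the trained models are replaced by placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class InformationRetrievalDevice:
    # Stand-in for the first obtaining module 920: query word -> candidate departments.
    classify_departments: Callable[[str], List[str]]
    # Stand-in for the second obtaining module 930: query word -> first semantic vector space.
    embed: Callable[[str], Vector]
    # Stand-in for the third obtaining module 940: departments + vector -> target medical terms.
    match_terms: Callable[[List[str], Vector], List[str]]

    def retrieve(self, medical_query_word: str) -> List[str]:
        # Stand-in for the receiving module 910: accept and normalize the query word.
        query = medical_query_word.strip()
        if not query:
            raise ValueError("medical query word must not be empty")
        departments = self.classify_departments(query)
        query_vector = self.embed(query)
        return self.match_terms(departments, query_vector)


# Usage with stub functions in place of the trained models (all values hypothetical).
device = InformationRetrievalDevice(
    classify_departments=lambda query: ["respiratory"],
    embed=lambda query: [1.0, 0.0],
    match_terms=lambda departments, vector: ["dyspnea"] if "respiratory" in departments else [],
)
result = device.retrieve("  unsmooth breathing ")
print(result)
```

Passing the three steps in as callables keeps the sketch independent of any particular model; it is a design convenience for illustration, not a structure stated in the application.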
Fig. 10 is a schematic structural view of an information retrieval apparatus according to another embodiment of the present application.
As shown in fig. 10, the information retrieval apparatus 1000 includes: a receiving module 1010, a first acquiring module 1020, a second acquiring module 1030, and a third acquiring module 1040. Wherein:
the third acquisition module 1040 includes:
a first obtaining submodule 1041, configured to obtain, based on the candidate department, a candidate medical term corresponding to the medical query term;
a second obtaining submodule 1042 for obtaining a second semantic vector space of the candidate medical term;
a third obtaining submodule 1043, configured to obtain the at least one target medical term according to the first semantic vector space and the second semantic vector space.
Wherein the third acquiring sub-module 1043 includes:
a first obtaining unit 10431, configured to obtain a semantic similarity between the first semantic vector space and the second semantic vector space;
the second obtaining unit 10432 is configured to obtain, according to the semantic similarity, the at least one target medical term corresponding to the medical query term.
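As an illustration of the two units above, the following sketch computes a similarity between the query vector and each candidate vector and then picks the best match. Cosine similarity is used here purely as an assumed example of a semantic similarity measure; the function names and the three-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_term(query_vector: Sequence[float],
                      candidate_vectors: Dict[str, Sequence[float]]) -> str:
    """Return the candidate term whose vector is most similar to the query vector."""
    return max(candidate_vectors,
               key=lambda term: cosine_similarity(query_vector, candidate_vectors[term]))


# Hypothetical vectors standing in for the first and second semantic vector spaces.
query_vec = [0.9, 0.1, 0.0]
candidates = {
    "dyspnea": [1.0, 0.0, 0.0],
    "nasal obstruction": [0.0, 1.0, 0.0],
}
best = most_similar_term(query_vec, candidates)
print(best)
```

Note that `most_similar_term` raises `ValueError` when the candidate dictionary is empty, because `max` is called on an empty sequence.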
Wherein the first obtaining submodule 1041 includes:
a screening unit 10411, configured to screen, based on the identification of the candidate department, medical terms consistent with the identification from the medical term set;
a determining unit 10412, configured to take, as the candidate medical term, a medical term that is consistent with the identification.
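The screening and determining units above amount to a filter over the medical term set. A minimal sketch follows, assuming that each medical term is stored together with the identifiers of the departments it belongs to; this data layout, the function name, and the sample entries are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def screen_candidate_terms(term_departments: Dict[str, Set[str]],
                           candidate_department_ids: Iterable[str]) -> List[str]:
    """Keep the terms whose department identifiers match a candidate department."""
    wanted = set(candidate_department_ids)
    # A term qualifies when it shares at least one identifier with the candidates.
    return sorted(term for term, departments in term_departments.items()
                  if departments & wanted)


# Hypothetical medical term set keyed by term, with department identifiers as values.
medical_term_set = {
    "dyspnea": {"respiratory"},
    "nasal obstruction": {"otorhinolaryngology"},
    "tibial fracture": {"orthopedics"},
    "sleep apnea": {"respiratory", "otorhinolaryngology"},
}

candidate_terms = screen_candidate_terms(medical_term_set, ["respiratory"])
print(candidate_terms)
```

Only the screened terms need to be encoded and compared afterwards, which is how the department filter reduces the amount of matching computation.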
Wherein, the first obtaining module 1020 includes:
a learning submodule 1021, configured to input the medical query word into a target department classification model for learning, so as to output the candidate departments corresponding to the medical query word.
The receiving module 1010 and the second obtaining module 1030 have the same functions and structures as the receiving module 910 and the second obtaining module 930.
Fig. 11 is a schematic structural view of an information retrieval apparatus according to another embodiment of the present application.
As shown in fig. 11, the information retrieval apparatus 1100 includes a receiving module 1110, a first acquiring module 1120, a second acquiring module 1130, and a third acquiring module 1140, and further includes a first training module 1150, a learning module 1160, and a second training module 1170.
The first training module 1150 is configured to train the target department classification model, and includes:
a fourth acquiring sub-module 1151, configured to acquire standard medical terms and department label information of the standard medical terms as first training data of a department classification model;
a first generating sub-module 1152 is configured to train the department classification model based on the first training data to generate the target department classification model.
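The training flow of the two sub-modules above (collect labelled terms, then fit a classifier) can be sketched with a deliberately simple stand-in model. The character-bigram overlap classifier below is an assumption chosen for brevity and is not the model architecture of the application; the function names and training pairs are likewise hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def char_bigrams(text: str) -> List[str]:
    """Split a lower-cased string into overlapping two-character pieces."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train_department_classifier(first_training_data: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Accumulate bigram counts per department label from (term, department) pairs."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for term, department in first_training_data:
        model[department].update(char_bigrams(term))
    return dict(model)


def predict_departments(model: Dict[str, Counter], query: str, top_n: int = 2) -> List[str]:
    """Rank departments by how many query bigrams they share; drop zero scores."""
    query_counts = Counter(char_bigrams(query))
    scores = {
        department: sum(min(n, counts[gram]) for gram, n in query_counts.items())
        for department, counts in model.items()
    }
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked[:top_n] if scores[d] > 0]


# Hypothetical first training data: standard medical terms with department labels.
training_pairs = [
    ("dyspnea", "respiratory"),
    ("bronchial asthma", "respiratory"),
    ("nasal obstruction", "otorhinolaryngology"),
    ("allergic rhinitis", "otorhinolaryngology"),
]
department_model = train_department_classifier(training_pairs)
predicted = predict_departments(department_model, "asthma attack", top_n=1)
print(predicted)
```

A real system would replace the bigram counter with a trained text classifier; the sketch only shows the shape of the data going in (labelled standard terms) and coming out (a ranked list of candidate departments).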
The learning module 1160 is configured to input the medical query word and the candidate medical term into a target semantic vector space model for learning, so as to output the first semantic vector space and the second semantic vector space.
Wherein, the second training module 1170 is configured to train the target semantic vector space model, and includes:
a fifth obtaining submodule 1171, configured to obtain standard medical terms and standard semantic vector spaces of the standard medical terms as second training data of a semantic vector space model;
a second generating sub-module 1172 is configured to train the semantic vector space model based on the second training data to generate the target semantic vector space model.
It should be noted that the receiving module 1110, the first acquiring module 1120, the second acquiring module 1130, and the third acquiring module 1140 have the same functions and structures as the receiving module 910, the first acquiring module 920, the second acquiring module 930, and the third acquiring module 940.
According to the information retrieval device provided by the embodiment of the application, after the medical query word to be queried is received, at least one target medical term corresponding to the medical query word to be queried can be automatically acquired based on the candidate department and the first semantic vector space, so that the matched standardized term is acquired according to the non-standardized query word, the accuracy of the target medical term is ensured, the efficiency and the reliability in the information retrieval process are improved, and the expression of the medical term in the diagnosis process is standardized. Further, by acquiring the candidate departments corresponding to the medical query words, the matching operation amount can be reduced, interference caused by other departments to the determination of the target medical terms is eliminated, and the accuracy of the target medical terms is further improved.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
Fig. 12 is a block diagram of an electronic device for information retrieval according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 12, the electronic device includes: one or more processors 1210, memory 1220, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 1210 is taken as an example in fig. 12.
Memory 1220 is a non-transitory computer readable storage medium provided by the present application. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the information retrieval method provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the information retrieval method provided by the present application.
The memory 1220 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the receiving module 910, the first acquiring module 920, the second acquiring module 930, and the third acquiring module 940 shown in fig. 9) corresponding to the information retrieval method according to the embodiment of the present application. Processor 1210 executes various functional applications of the server and data processing, i.e., implements the information retrieval method in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in memory 1220.
Memory 1220 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the electronic device for information retrieval, and the like. In addition, memory 1220 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 1220 may optionally include memory located remotely from processor 1210, which may be connected to the electronic device for information retrieval via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for information retrieval may further include: an input device 1230 and an output device 1240. Processor 1210, memory 1220, input device 1230, and output device 1240 may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 12.
The input device 1230 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device for information retrieval. Examples of the input device include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, and joystick. The output device 1240 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
According to the information retrieval method provided by the embodiment of the application, after the medical query word to be queried is received, at least one target medical term corresponding to the medical query word to be queried can be automatically acquired based on the candidate department and the first semantic vector space, so that the matched standardized term is acquired according to the non-standardized query word, the accuracy of the target medical term is ensured, the efficiency and the reliability in the information retrieval process are improved, and the expression of the medical term in the diagnosis process is standardized. Further, by acquiring the candidate departments corresponding to the medical query words, the matching operation amount can be reduced, interference caused by other departments to the determination of the target medical terms is eliminated, and the accuracy of the target medical terms is further improved.
According to an embodiment of the present application, there is also provided a computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the information retrieval method of the embodiment of the present application.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solution disclosed in the present application are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (8)

1. An information retrieval method, comprising:
receiving medical query words to be queried;
inputting the medical query words into a target department classification model for learning so as to output candidate departments corresponding to the medical query words, wherein the training process of the target department classification model comprises the following steps: acquiring standard medical terms and department label information of the standard medical terms as first training data of a department classification model, and training the department classification model based on the first training data to generate the target department classification model;
inputting the medical query words into a target semantic vector space model for learning so as to output a first semantic vector space of the medical query words;
acquiring candidate medical terms corresponding to the medical query words based on the candidate departments;
inputting the candidate medical terms into the target semantic vector space model for learning to output a second semantic vector space of the candidate medical terms, wherein the training process of the target semantic vector space model comprises the following steps: acquiring standard medical terms and standard semantic vector spaces of the standard medical terms as second training data of a semantic vector space model, and training the semantic vector space model based on the second training data to generate the target semantic vector space model;
and acquiring at least one target medical term corresponding to the medical query word according to the first semantic vector space and the second semantic vector space.
2. The information retrieval method according to claim 1, wherein the obtaining at least one target medical term corresponding to the medical query term according to the first semantic vector space and the second semantic vector space includes:
acquiring semantic similarity of the first semantic vector space and the second semantic vector space;
and acquiring the at least one target medical term corresponding to the medical query term according to the semantic similarity.
3. The information retrieval method according to claim 1, wherein the obtaining, based on the candidate department, a candidate medical term corresponding to the medical query term includes:
screening medical terms consistent with the identification from a medical term set based on the identification of the candidate department;
medical terms consistent with the identification are taken as the candidate medical terms.
4. An information retrieval apparatus comprising:
the receiving module is used for receiving medical query words to be queried;
the first acquisition module is used for inputting the medical query word into a target department classification model for learning so as to output candidate departments corresponding to the medical query word, wherein the training process of the target department classification model comprises the following steps: acquiring standard medical terms and department label information of the standard medical terms as first training data of a department classification model, and training the department classification model based on the first training data to generate the target department classification model;
the second acquisition module is used for inputting the medical query words into a target semantic vector space model for learning so as to output a first semantic vector space of the medical query words;
the third obtaining module is used for obtaining, based on the candidate department, a candidate medical term corresponding to the medical query word, and inputting the candidate medical term into the target semantic vector space model for learning so as to output a second semantic vector space of the candidate medical term, wherein the training process of the target semantic vector space model comprises the following steps: acquiring standard medical terms and standard semantic vector spaces of the standard medical terms as second training data of a semantic vector space model, and training the semantic vector space model based on the second training data to generate the target semantic vector space model; and for obtaining at least one target medical term corresponding to the medical query word according to the first semantic vector space and the second semantic vector space.
5. The information retrieval apparatus as recited in claim 4, wherein the third acquisition module comprises:
the first acquisition unit is used for acquiring the semantic similarity between the first semantic vector space and the second semantic vector space;
the second obtaining unit is used for obtaining the at least one target medical term corresponding to the medical query word according to the semantic similarity.
6. The information retrieval apparatus of claim 4, wherein the first acquisition module comprises:
a screening unit, configured to screen medical terms consistent with the identifiers from a medical term set based on the identifiers of the candidate departments;
a determining unit for regarding the medical term consistent with the identification as the candidate medical term.
7. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the information retrieval method of any one of claims 1-3.
8. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the information retrieval method of any one of claims 1-3.
CN202011460471.2A 2020-12-11 Information retrieval method and device and electronic equipment Active CN112559686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011460471.2A CN112559686B (en) 2020-12-11 Information retrieval method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011460471.2A CN112559686B (en) 2020-12-11 Information retrieval method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112559686A CN112559686A (en) 2021-03-26
CN112559686B true CN112559686B (en) 2023-10-27

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106610972A (en) * 2015-10-21 2017-05-03 阿里巴巴集团控股有限公司 Query rewriting method and apparatus
CN106919793A (en) * 2017-02-24 2017-07-04 黑龙江特士信息技术有限公司 A kind of data standardization processing method and device of medical big data
WO2018040503A1 (en) * 2016-08-30 2018-03-08 北京百度网讯科技有限公司 Method and system for obtaining search results
CN109033080A (en) * 2018-07-12 2018-12-18 上海金仕达卫宁软件科技有限公司 Medical terms standardized method and system based on probability transfer matrix
WO2020177230A1 (en) * 2019-03-07 2020-09-10 平安科技(深圳)有限公司 Medical data classification method and apparatus based on machine learning, and computer device and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines";Jan Rygl et al.;《arXiv》;1-10 *
基于语义向量表示的查询扩展方法;李岩;张博文;郝红卫;;计算机应用(第09期);2526-2530 *
知识基础与前沿载文间的知识流动分析――以信息领域中的Gerard Salton为例;高继平;丁堃;刘宇;陈玉光;;情报杂志(第10期);98-101 *
面向电子病历中文医学信息的可视组织方法;徐天明;樊银亭;马翠霞;滕东兴;;计算机系统应用(第11期);44-51 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant