CN114464328A - Test information retrieval method and device, clinical test recommendation method and terminal - Google Patents

Test information retrieval method and device, clinical test recommendation method and terminal

Publication number
CN114464328A
Authority
CN
China
Prior art keywords
medical record
test
vector
clinical
trial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210130064.8A
Other languages
Chinese (zh)
Inventor
谭传奇
靳琪奥
袁正
黄松芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202210130064.8A
Publication of CN114464328A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/338: Presentation of query results
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for retrieving test information, a method for recommending clinical trials, and a terminal. The retrieval method comprises: obtaining a medical record vector of a target medical record by using a pre-trained medical record encoder; calculating a relevance score between each clinical trial and the target medical record according to the medical record vector and a plurality of clinical trials in a preset database, to obtain a plurality of relevance scores; sorting the relevance scores to obtain a score ranking result; and collecting the clinical trials whose relevance scores in the score ranking result are greater than a preset score threshold, to obtain test information of the clinical trials that are associated with the target medical record and in which the target patient can participate. The invention solves the technical problem in the related art that searching for clinical trials matching a patient is inaccurate and inefficient.

Description

Test information retrieval method and device, clinical test recommendation method and terminal
Technical Field
The invention relates to the technical field of information processing, in particular to a method and a device for searching test information, a method for recommending clinical tests and a terminal.
Background
Clinical trials generally refer to studies in which patients are divided into groups, given different treatments, and observed for effect; they can be used to evaluate the efficacy of a new drug or therapy for a particular disease. Clinically, conventional therapies may not be effective for some patients. For example, patients with advanced tumors may develop resistance to existing first-line and second-line treatments, and many rare diseases do not yet have any marketed drug. In such cases, patients often need to participate in clinical trials of emerging therapies.
In the related art, patient information often includes only a medical record summary, and retrieval is performed by entering keywords from that summary. This retrieval mode cannot produce accurate matching results, extracting keywords from the original medical record is time-consuming and labor-intensive, and the extracted keywords cannot describe the patient's overall condition.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
The embodiment of the invention provides a method and a device for searching test information, a method for recommending clinical tests and a terminal, which are used for at least solving the technical problems of lower accuracy and poorer efficiency when searching the clinical tests matched with patients in the related technology.
According to an aspect of an embodiment of the present invention, there is provided a method for retrieving test information, including: obtaining a medical record vector of a target medical record by using a pre-trained medical record encoder; calculating a relevance score between each clinical trial and the target medical record according to the medical record vector and a plurality of clinical trials in a preset database, to obtain a plurality of relevance scores; sorting the plurality of relevance scores to obtain a score ranking result; and collecting the clinical trials whose relevance scores in the score ranking result are greater than a preset score threshold, to obtain test information of the clinical trials that are associated with the target medical record and in which the target patient can participate.
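The sorting and thresholding steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `retrieve_trials`, the trial identifiers, and the score values are hypothetical, and the relevance scores are assumed to have been computed already.

```python
from typing import Dict, List, Tuple


def retrieve_trials(
    scores: Dict[str, float], threshold: float
) -> List[Tuple[str, float]]:
    """Sort trials by relevance score and keep those above the threshold."""
    # Sort the relevance scores in descending order (the score ranking result).
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Keep only trials whose score is strictly greater than the preset threshold.
    return [(trial_id, score) for trial_id, score in ranked if score > threshold]


# Hypothetical relevance scores for three trials (illustrative values only).
example_scores = {"trial_A": 0.82, "trial_B": 0.35, "trial_C": 0.67}
selected = retrieve_trials(example_scores, threshold=0.5)
```

The strict comparison mirrors the "greater than a preset score threshold" wording; the threshold value itself is not specified in the text.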
Optionally, the step of calculating a relevance score of each clinical trial and the target medical record according to the medical record vector and a plurality of clinical trials in a preset database to obtain a plurality of relevance scores includes: traversing the clinical tests in the preset database to obtain a test text corresponding to each clinical test; analyzing the test text corresponding to each clinical test by adopting a pre-trained test encoder to obtain a test vector; calculating the similarity between the medical record vector and each test vector in the test vector set corresponding to each clinical test; and calculating the relevance score of each clinical test and the target medical record based on the similarity between the medical record vector and the test vector corresponding to the clinical test to obtain a plurality of relevance scores.
Optionally, the step of analyzing the test text corresponding to each clinical test by using a pre-trained test encoder to obtain a test vector includes: combining the text title and the brief text in the test text into a test document, and obtaining a first test vector of the test document by using the test encoder, wherein the brief text is a text that summarizes the test text; combining the text content in the test text that meets the requirements of a preset standard text into a test standard text, and obtaining a second test vector of the test standard text by using the medical record encoder; combining the text content in the test text that does not meet the requirements of the preset standard text into an exclusion standard text, and obtaining a third test vector of the exclusion standard text by using the medical record encoder; wherein the first test vector, the second test vector, and the third test vector form the test vector set.
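As a rough sketch of how the three test vectors might be assembled, the example below splits a trial into a document (title plus summary), a test standard text, and an exclusion standard text. Several things here are assumptions: the `TrialText` fields, the reading of the test standard text as inclusion criteria, and above all `toy_encoder`, a deterministic bag-of-words stand-in used only so the code runs without a trained model. The patent's encoders are pre-trained neural encoders, not this function.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[str], Vector]


@dataclass
class TrialText:
    """Hypothetical container for the sections of one trial's test text."""
    title: str
    summary: str
    inclusion: str  # assumed: text meeting the preset standard text requirements
    exclusion: str  # assumed: text not meeting those requirements


def toy_encoder(text: str, dim: int = 8) -> Vector:
    """Stand-in encoder: counts tokens into buckets chosen by code-point sum."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(ch) for ch in token) % dim] += 1.0
    return vec


def build_trial_vectors(
    trial: TrialText, trial_encoder: Encoder, record_encoder: Encoder
) -> List[Vector]:
    """Return [first, second, third] test vectors for one trial."""
    document = f"{trial.title} {trial.summary}"
    return [
        trial_encoder(document),           # first test vector (test encoder)
        record_encoder(trial.inclusion),   # second test vector (medical record encoder)
        record_encoder(trial.exclusion),   # third test vector (medical record encoder)
    ]


trial = TrialText(
    title="Breast cancer study",
    summary="phase two",
    inclusion="adult female",
    exclusion="pregnant",
)
vectors = build_trial_vectors(trial, toy_encoder, toy_encoder)
```

Note that, following the text, the second and third vectors are produced by the medical record encoder rather than the test encoder.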
Optionally, the step of calculating a relevance score of each clinical trial and the target medical record based on the similarity between the medical record vector and the trial vector corresponding to the clinical trial to obtain the multiple relevance scores includes: calculating cosine similarity between the medical record vector and the first test vector to obtain first similarity; calculating cosine similarity between the medical record vector and the second test vector to obtain a second similarity; calculating cosine similarity between the medical record vector and the third test vector to obtain a third similarity; and obtaining the relevance score according to the first similarity, the second similarity and the third similarity.
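The three cosine similarities can be computed as sketched below. The text does not state how the similarities are combined into a relevance score, so the formula `s1 + s2 - s3` (rewarding closeness to the test document and test standard text, penalizing closeness to the exclusion standard text) is purely an assumption for illustration, as are the function names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(
    record_vec: Sequence[float],
    first_vec: Sequence[float],
    second_vec: Sequence[float],
    third_vec: Sequence[float],
) -> float:
    """Combine the first, second, and third similarities into one score."""
    s1 = cosine_similarity(record_vec, first_vec)   # first similarity
    s2 = cosine_similarity(record_vec, second_vec)  # second similarity
    s3 = cosine_similarity(record_vec, third_vec)   # third similarity
    # ASSUMPTION: the combination rule is not given in the source text.
    return s1 + s2 - s3


score = relevance_score([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```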
Optionally, before analyzing the medical record vector of the target medical record by using the pre-trained medical record encoder, the method further includes: calculating a first matching loss parameter for medical record-clinical trial matching; calculating a second matching loss parameter for clinical trial-medical record matching; and training the medical record encoder and the test encoder by combining the first matching loss parameter with its corresponding first adjusting parameter and the second matching loss parameter with its corresponding second adjusting parameter.
Optionally, before calculating the first matching loss parameter for medical record-clinical trial matching, the method further includes: using the medical record encoder to calculate vectors of the text content that meets the requirements of a preset standard text and of the text content that does not meet those requirements, to obtain a first training vector and a second training vector; and using the test encoder to calculate vectors of the clinical trial document and a random trial document in the test text of each clinical trial, to obtain a third training vector and a fourth training vector.
Optionally, the step of calculating a first match loss parameter for medical record-clinical trial matching comprises: calculating a first distance parameter between the first anchor point vector and the first positive case vector and a second distance parameter between the first anchor point vector and the first negative case vector by taking the first training vector as a first anchor point vector in loss calculation in medical record-clinical test matching, taking the third training vector as a first positive case vector and taking the fourth training vector as a first negative case vector; calculating a first difference value between the first distance parameter and the second distance parameter; and calculating a first matching loss parameter matched with the medical record-clinical test based on the first anchor point vector, the first positive case vector, the first negative case vector and the first difference value.
Optionally, the step of calculating a second match loss parameter for clinical trial-medical record matching includes: calculating a third distance parameter between the second anchor point vector and the second positive case vector and a fourth distance parameter between the second anchor point vector and the second negative case vector by taking the third training vector as a second anchor point vector in loss calculation in clinical trial-medical record matching, taking the first training vector as a second positive case vector and taking the second training vector as a second negative case vector; calculating a second difference value between a third distance parameter and the fourth distance parameter; and calculating a second matching loss parameter matched with the clinical test-medical record based on the second anchor point vector, the second positive case vector, the second negative case vector and the second difference value.
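The two matching losses follow the anchor / positive / negative pattern of a triplet loss, and a minimal sketch under that reading is given below. The Euclidean distance, the margin term, the hinge at zero, and the names `alpha` and `beta` for the first and second adjusting parameters are all assumptions; the text only specifies the roles of the four training vectors and that the difference between the two distances enters each loss.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance parameter between two vectors (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Hinge on (anchor-positive distance) minus (anchor-negative distance)."""
    difference = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, difference + margin)


def total_loss(
    v1: Sequence[float],  # first training vector
    v2: Sequence[float],  # second training vector
    v3: Sequence[float],  # third training vector (clinical trial document)
    v4: Sequence[float],  # fourth training vector (random trial document)
    alpha: float = 1.0,   # assumed name for the first adjusting parameter
    beta: float = 1.0,    # assumed name for the second adjusting parameter
    margin: float = 1.0,
) -> float:
    # Medical record-clinical trial matching: anchor v1, positive v3, negative v4.
    first_loss = triplet_loss(v1, v3, v4, margin)
    # Clinical trial-medical record matching: anchor v3, positive v1, negative v2.
    second_loss = triplet_loss(v3, v1, v2, margin)
    return alpha * first_loss + beta * second_loss


loss = total_loss([0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [0.0, 0.0], alpha=0.5, beta=2.0)
```

In the usage line both triplets are deliberately badly arranged (the positive is far from the anchor, the negative coincides with it), so each loss is large; a well-trained pair of encoders would drive both terms toward zero.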
According to another aspect of the embodiments of the present invention, there is also provided a recommendation method for a clinical trial, including: receiving a test recommendation request transmitted by a client, wherein the test recommendation request at least carries: medical record information of the target medical record; analyzing a medical record vector of the target medical record by adopting a pre-trained medical record encoder based on the medical record information of the target medical record; calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores, and sequencing the relevance scores to obtain a score sequencing result; integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record; pushing a test set formed by clinical tests which can be participated by the target patient and test information of each clinical test to the client.
Optionally, the step of calculating a relevance score of each clinical trial and the target medical record according to the medical record vector and a plurality of clinical trials in a preset database to obtain a plurality of relevance scores includes: traversing the clinical tests in the preset database to obtain a test text corresponding to each clinical test; analyzing the test text corresponding to each clinical test by adopting a pre-trained test encoder to obtain a test vector; calculating the similarity between the medical record vector and each test vector in the test vector set corresponding to each clinical test; and calculating the relevance score of each clinical test and the target medical record based on the similarity between the medical record vector and the test vector corresponding to the clinical test to obtain a plurality of relevance scores.
According to an aspect of the embodiments of the present invention, there is provided a method for retrieving test information, which is applied to a cloud server, and includes: receiving a medical record test request transmitted by a request terminal, wherein the medical record test request carries a target medical record of a target patient; responding to the medical record test request, and analyzing a medical record vector of the target medical record by adopting a pre-trained medical record encoder; calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; sequencing the plurality of relevance scores to obtain score sequencing results; integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record; and returning the test information to the request terminal.
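The request and response exchange with the cloud server could look roughly like the sketch below. The field names (`target_medical_record`, `status`, `trials`), the error handling, and the injected `retrieve` callable are hypothetical; the text only states that the request carries the target medical record and that the test information is returned to the request terminal.

```python
from typing import Callable, Dict, List


def handle_medical_record_request(
    request: Dict[str, str],
    retrieve: Callable[[str], List[Dict[str, object]]],
) -> Dict[str, object]:
    """Respond to a medical record test request with matching test information."""
    record = request.get("target_medical_record")
    if not record:
        # Hypothetical error shape; the source does not describe failure cases.
        return {"status": "error", "message": "missing target_medical_record"}
    # `retrieve` stands for the encode, score, sort, and threshold steps.
    return {"status": "ok", "trials": retrieve(record)}


def fake_retrieve(record: str) -> List[Dict[str, object]]:
    """Placeholder retrieval used only to exercise the handler."""
    return [{"trial_id": "trial_A", "score": 0.82}]


response = handle_medical_record_request(
    {"target_medical_record": "example record text"}, fake_retrieve
)
```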
According to another aspect of the embodiments of the present invention, there is also provided a device for retrieving test information, including: the first analysis unit is used for analyzing the medical record vector of the target medical record by adopting a pre-trained medical record encoder; the first determining unit is used for calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; the sorting unit is used for sorting the plurality of relevance scores to obtain a score sorting result; and the second determining unit is used for integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record.
Optionally, the first determining unit includes: the first traversal module is used for traversing the clinical tests in the preset database to obtain a test text corresponding to each clinical test; the first analysis module is used for analyzing the test text corresponding to each clinical test by adopting a pre-trained test encoder to obtain a test vector; the first calculation module is used for calculating the similarity between the medical record vector and each test vector in the test vector set corresponding to each clinical test; and the second calculation module is used for calculating the relevance scores of each clinical test and the target medical record based on the similarity between the medical record vectors and the test vectors corresponding to the clinical tests to obtain the multiple relevance scores.
Optionally, the first analysis module comprises: the first integration submodule is used for integrating a text topic and a brief text in the test text into a test document and acquiring a first test vector of the test document by adopting the test encoder, wherein the brief text is a text for carrying out summary description on the test text; the second integration submodule is used for integrating text contents meeting the requirements of a preset standard text in the test text into a test standard text and acquiring a second test vector of the test standard text by adopting the medical record encoder; a third integration submodule, configured to integrate text content that does not meet a requirement of a preset standard text in the test text into an exclusion standard text, and acquire a third test vector of the exclusion standard text by using the medical record encoder; wherein the first trial vector, the second trial vector, and the third trial vector generate the set of trial vectors.
Optionally, the second computing module comprises: the first calculation submodule is used for calculating cosine similarity between the medical record vector and the first test vector to obtain first similarity; the second calculation submodule is used for calculating cosine similarity between the medical record vector and the second test vector to obtain a second similarity; the third calculation submodule is used for calculating cosine similarity between the medical record vector and the third test vector to obtain third similarity; and the first determining submodule is used for obtaining the correlation score according to the first similarity, the second similarity and the third similarity.
Optionally, the retrieving apparatus further comprises: the third calculation module, used for calculating a first matching loss parameter for medical record-clinical trial matching before the medical record vector of the target medical record is analyzed by using the pre-trained medical record encoder; the fourth calculation module, used for calculating a second matching loss parameter for clinical trial-medical record matching; and the first training module, used for training the medical record encoder and the test encoder by combining the first matching loss parameter with its corresponding first adjusting parameter and the second matching loss parameter with its corresponding second adjusting parameter.
Optionally, the retrieving apparatus further comprises: the fifth calculation module, used for, before the first matching loss parameter for medical record-clinical trial matching is calculated, using the medical record encoder to calculate vectors of the text content that meets the requirements of the preset standard text and of the text content that does not meet those requirements, to obtain a first training vector and a second training vector; and the sixth calculation module, used for using the test encoder to calculate vectors of the clinical trial document and a random trial document in the test text of each clinical trial, to obtain a third training vector and a fourth training vector.
Optionally, the third computing module comprises: a fourth calculation sub-module, configured to calculate a first distance parameter between the first anchor point vector and the first positive case vector and a second distance parameter between the first anchor point vector and the first negative case vector by using the first training vector as a first anchor point vector in loss calculation in medical record-clinical trial matching, using the third training vector as a first positive case vector, and using the fourth training vector as a first negative case vector; the fifth calculation submodule is used for calculating a first difference value between the first distance parameter and the second distance parameter; and the sixth calculating submodule is used for calculating a first matching loss parameter matched with the medical record-clinical test on the basis of the first anchor point vector, the first positive case vector, the first negative case vector and the first difference value.
Optionally, the fourth calculation module includes: a seventh calculation submodule, configured to calculate a third distance parameter between the second anchor point vector and the second positive case vector and a fourth distance parameter between the second anchor point vector and the second negative case vector by using the third training vector as a second anchor point vector in the loss calculation in the clinical trial-medical record matching, using the first training vector as a second positive case vector, and using the second training vector as a second negative case vector; the eighth calculating submodule is used for calculating a second difference value between the third distance parameter and the fourth distance parameter; and the ninth calculation sub-module is used for calculating a second matching loss parameter matched with the clinical test and the medical record based on the second anchor point vector, the second positive case vector, the second negative case vector and the second difference value.
According to another aspect of the embodiments of the present invention, there is also provided a device for retrieving test information, applied to a cloud server, including: a receiving unit, used for receiving a medical record test request transmitted by a request terminal, wherein the medical record test request carries a target medical record of a target patient; a response unit, used for responding to the medical record test request by obtaining a medical record vector of the target medical record by using a pre-trained medical record encoder; calculating a relevance score between each clinical trial and the target medical record according to the medical record vector and a plurality of clinical trials in a preset database, to obtain a plurality of relevance scores; sorting the plurality of relevance scores to obtain a score ranking result; and collecting the clinical trials whose relevance scores in the score ranking result are greater than a preset score threshold, to obtain test information of the clinical trials that are associated with the target medical record and in which the target patient can participate; and a return unit, used for returning the test information to the request terminal.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium, the storage medium includes a stored program, wherein when the program runs, the apparatus on which the storage medium is located is controlled to execute the above-mentioned search method for the test information, or execute the above-mentioned recommendation method for the clinical test.
According to another aspect of the embodiments of the present invention, there is also provided a processor for executing a program, wherein the program executes the method for retrieving the test information or the method for recommending the clinical test.
According to another aspect of the embodiments of the present invention, there is also provided a terminal, including: a first device; a second device; a third device; and a fourth device; a processor running a program, wherein the program is run to perform the following processing steps on data from the first, second, third and fourth devices: the first step, adopting a pre-trained medical record encoder to analyze a medical record vector of a target medical record; secondly, calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; thirdly, sorting the plurality of relevance scores to obtain a score sorting result; and a fourth step of integrating the clinical trials corresponding to the relevance scores which are greater than a preset score threshold in the score sorting result to obtain trial information of the clinical trials which can be participated by the target patient and are associated with the target medical record.
In the present disclosure, a pre-trained medical record encoder is used to obtain a medical record vector of a target medical record; a relevance score between each clinical trial and the target medical record is calculated according to the medical record vector and a plurality of clinical trials in a preset database, to obtain a plurality of relevance scores; the relevance scores are sorted to obtain a score ranking result; and the clinical trials whose relevance scores in the score ranking result are greater than a preset score threshold are collected, to obtain test information of the clinical trials that are associated with the target medical record and in which the target patient can participate. In this application, the pre-trained medical record encoder converts the patient's medical record into a medical record vector, and the relevance score between the medical record vector and each clinical trial is then calculated, thereby obtaining test information of the clinical trials in which the patient can participate. This not only removes the need to extract keywords from the patient's medical record, but also allows the medical record vector to summarize the patient's overall condition, so that clinical trials in which the patient can participate are matched efficiently and accurately. This in turn solves the technical problem in the related art that searching for clinical trials matching a patient is inaccurate and inefficient.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing a method for retrieving test information;
FIG. 2 is a schematic diagram of an alternative implementation environment for implementing a method for retrieving test information, according to an embodiment of the invention;
FIG. 3 is a flow chart of an alternative method for retrieving test information according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an alternative training scheme for an encoder in vector retrieval according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative vector retrieval according to an embodiment of the present invention;
FIG. 6 is a flow chart of an alternative method for retrieving test information according to an embodiment of the present invention;
FIG. 7 is a flow chart of an alternative method for recommending clinical trials according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an alternative device for retrieving test information according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an alternative device for retrieving test information according to an embodiment of the present invention;
FIG. 10 is a block diagram of a computer terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:
Clinical trial: a study in which patients are divided into groups and given different treatments so that the effects can be observed; it can be used to evaluate the efficacy of a new drug or therapy for a particular disease.
Medical record summary: a text summarizing the medical information of a patient, such as chief complaint, medical history, physical examination, auxiliary examination, diagnosis, and treatment.
Vector retrieval: an encoder maps the query and each document in the document set into the same vector space, and the documents closest to the given query are returned as search results.
Transformer: a sequence modeling model that can be used for text encoding.
BERT: a Transformer encoder pre-trained with a language modeling objective.
ClinicalBERT: BERT trained with clinical medical text data.
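The vector retrieval term defined above can be illustrated with a minimal sketch. It assumes that the query and the documents have already been mapped to vectors by some encoder; the three-dimensional vectors, the identifiers trial_a to trial_c, and the function names cosine and retrieve are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents closest to the query."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:top_k]

# Hypothetical vectors standing in for encoder outputs.
docs = {
    "trial_a": [0.9, 0.1, 0.0],
    "trial_b": [0.0, 1.0, 0.0],
    "trial_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
top = retrieve(query, docs)  # ids of the two nearest documents
```

Cosine similarity is used here because the embodiments below also compare vectors by cosine similarity; other distance functions would fit the same pattern.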
Existing search engines can only retrieve clinical trials from keywords taken from the medical record summary. However, keyword extraction is time-consuming and labor-intensive, keywords cannot capture the overall picture of a patient, clinical trials that express the same concept using different terms cannot be found, and it cannot be determined whether the patient meets the inclusion criteria and does not meet the exclusion criteria. For example, the database of a clinical trial registration website contains a large amount of clinical trial registration information from around the world, but the clinical trial search engine of the website can only search clinical trials by keywords such as diseases and drugs, and has the following disadvantages: (1) extracting keywords from the original medical record is time-consuming and labor-intensive; (2) the extracted keywords cannot describe the overall picture of the patient; for example, a clinical trial for breast cancer is concerned not only with the patient's diagnosis but also with information such as disease stage, tumor size, and gene mutation status; (3) clinical trials that express the same concept using different terms cannot be found; (4) there is no way to determine whether the patient meets the inclusion criteria and does not meet the exclusion criteria.
In the present application, a pre-trained medical record encoder and a pre-trained test encoder are used to obtain medical record vectors for patient medical records and test vectors for clinical trials. (For example, the medical record encoder and the clinical trial encoder may be pre-trained using clinical trial data from the database of a clinical trial registration website; the encoders may adopt a Transformer sequence modeling model or other sequence modeling models such as LSTM or CNN, and the embodiments of the present invention are not specifically limited in this respect.) Through vector retrieval, the clinical trials in which a patient can participate can then be matched to the patient. This overcomes the time and labor cost of keyword retrieval, allows the medical record vector to summarize the overall picture of the patient, and overcomes the inability of existing retrieval to find clinical trials that express the same concept using different terms or to judge whether the patient meets the inclusion criteria and does not meet the exclusion criteria, so that patients can be matched efficiently and accurately to the clinical trials in which they can participate.
The present invention will be described in detail below with reference to examples.
Example 1
In accordance with an embodiment of the present invention, an embodiment of a test information retrieval method is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be executed in an order different from that shown.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the test information retrieval method. As shown in FIG. 1, the computer terminal 10 (or mobile device 10) may include one or more processors (shown as 102a, 102b, ..., 102n in the figure; a processor may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
It should be noted that the one or more processors and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the test information retrieval method in the embodiment of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the test information retrieval method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network interface controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
The hardware structure block diagram shown in FIG. 1 may serve as an exemplary block diagram not only of the computer terminal 10 (or mobile device) but also of a server. In an alternative embodiment, FIG. 2 is a schematic diagram of an implementation environment of an optional test information retrieval method according to an embodiment of the present invention. As shown in FIG. 2, the computer terminal 10 (or mobile device) may be connected, via a data network connection or electronically, to one or more servers (e.g., a security server, a resource server, a game server, etc.). In an alternative embodiment, the computer terminal 10 (or mobile device) may be any mobile computing device. The data network connection may be a local area network connection, a wide area network connection, an internet connection, or another type of data network connection. The computer terminal 10 (or mobile device) may connect to a network service executed by a server 110 (e.g., a security server) or a set of servers 108. A network service is a network-based user service such as social networking, cloud resources, email, online payment, or other online applications.
Under the above operating environment, the present application provides a test information retrieval method as shown in FIG. 3. FIG. 3 is a flowchart of an optional test information retrieval method according to an embodiment of the present invention. As shown in FIG. 3, the retrieval method includes the following steps:
Step S302, using a pre-trained medical record encoder to obtain the medical record vector of a target medical record.
Step S304, calculating the relevance score between each clinical trial and the target medical record according to the medical record vector and the plurality of clinical trials in a preset database, to obtain a plurality of relevance scores.
Step S306, sorting the plurality of relevance scores to obtain a score ranking result.
Step S308, integrating the clinical trials corresponding to the relevance scores that are greater than a preset score threshold in the score ranking result, to obtain test information of the clinical trials in which the target patient associated with the target medical record can participate.
Through the above steps, a pre-trained medical record encoder is used to obtain the medical record vector of the target medical record; the relevance score between each clinical trial and the target medical record is calculated according to the medical record vector and the plurality of clinical trials in the preset database, yielding a plurality of relevance scores; the relevance scores are sorted to obtain a score ranking result; and the clinical trials whose relevance scores are greater than the preset score threshold are integrated to obtain test information of the clinical trials in which the target patient associated with the target medical record can participate. In the embodiment of the invention, the patient's medical record is converted into a medical record vector by the pre-trained medical record encoder, and the relevance score between the medical record vector and each clinical trial is then calculated, so that test information of the clinical trials in which the patient can participate is obtained. No keyword extraction from the patient's medical record is needed, the medical record vector can summarize the overall picture of the patient, and the patient can be matched efficiently and accurately to the clinical trials in which the patient can participate. This solves the technical problem in the related art that searching for clinical trials matching a patient has low accuracy and poor efficiency.
The following will explain the embodiments of the present invention in detail with reference to the above steps.
Step S302, using a pre-trained medical record encoder to obtain the medical record vector of the target medical record.
In the embodiment of the present invention, the medical record of a patient (i.e., the target medical record) can be denoted P, and its vector representation can be obtained with the pre-trained medical record encoder, giving the medical record vector of the target medical record, denoted Enc_patient(P). In this embodiment, the medical record encoder may be pre-trained using clinical trial data in the database of a clinical trial registration website, and the encoder may use a Transformer sequence modeling model or other sequence modeling models such as LSTM and CNN.
Step S304, calculating the relevance score of each clinical trial and the target medical record according to the medical record vector and the plurality of clinical trials in the preset database to obtain a plurality of relevance scores.
Optionally, the step of calculating a relevance score of each clinical trial and the target medical record according to the medical record vector and the plurality of clinical trials in the preset database to obtain a plurality of relevance scores includes: traversing the clinical tests in a preset database to obtain a test text corresponding to each clinical test; analyzing the test text corresponding to each clinical test by adopting a pre-trained test encoder to obtain a test vector; calculating the similarity between the medical record vector and each test vector in the test vector set corresponding to each clinical test; and calculating the relevance score of each clinical test and the target medical record based on the similarity between the medical record vector and the test vector corresponding to the clinical test to obtain a plurality of relevance scores.
In an embodiment of the present invention, the relevance score to the target medical record may be calculated by traversing all possible clinical trials (i.e., clinical trials in a preset database, where a large amount of clinical trial registration information is stored in advance). Specifically, for each clinical trial i obtained through traversal, a test text corresponding to each clinical trial can be obtained, a pre-trained test encoder is adopted to analyze the test text corresponding to each clinical trial to obtain a test vector, the similarity between the medical record vector and each test vector in the test vector set corresponding to each clinical trial is calculated, and the relevance score between each clinical trial and the target medical record is calculated based on the similarity between the medical record vector and the test vector corresponding to the clinical trial to obtain a plurality of relevance scores.
Optionally, before the pre-trained medical record encoder is used to obtain the medical record vector of the target medical record, the method further includes: calculating a first matching loss parameter for medical record-clinical trial matching; calculating a second matching loss parameter for clinical trial-medical record matching; and training to obtain the medical record encoder and the test encoder by combining the first matching loss parameter with its corresponding first adjusting parameter and the second matching loss parameter with its corresponding second adjusting parameter.
Optionally, before the first matching loss parameter for medical record-clinical trial matching is calculated, the method further includes: using the medical record encoder to calculate vectors of the text content meeting the preset standard text requirement and of the text content not meeting the preset text requirement, to obtain a first training vector and a second training vector; and using the test encoder to calculate vectors of the clinical trial document in the test text of each clinical trial and of a random trial document, to obtain a third training vector and a fourth training vector.
In the embodiment of the present invention, before the pre-trained medical record encoder is used to obtain the medical record vector of the target medical record, the medical record encoder and the test encoder need to be trained, specifically as follows.
For each clinical trial: the title (i.e., the text title in the test text) and the description text (i.e., the brief text) are recorded as a test document D (i.e., the clinical trial document in the test text of the clinical trial), the inclusion criteria text is recorded as I (i.e., the text content meeting the preset standard text requirement), the exclusion criteria text is recorded as E (i.e., the text content not meeting the preset text requirement), and another clinical trial document selected by random sampling is recorded as D_r (i.e., the random trial document). The medical record encoder Enc_patient is then used to calculate vector representations of the inclusion and exclusion criteria, denoted Enc_patient(I) and Enc_patient(E) (i.e., the medical record encoder is used to calculate vectors of the text content meeting the preset standard text requirement and of the text content not meeting the preset text requirement, to obtain the first training vector and the second training vector), and the clinical trial encoder Enc_trial is used to calculate vector representations of the clinical trial document and the random clinical trial document, denoted Enc_trial(D) and Enc_trial(D_r) (i.e., the test encoder is used to calculate vectors of the clinical trial document in the test text of each clinical trial and of the random trial document, to obtain the third training vector and the fourth training vector).
In the related art, the information of the patient is often summarized by the medical record abstract, but the actual clinical test includes information such as the subject, the summary, the disease diagnosis, the inclusion criteria and the exclusion criteria, and the patient needs to satisfy all the inclusion criteria and not satisfy any one of the exclusion criteria before being included in the clinical test.
In this embodiment, after the first, second, third, and fourth training vectors (denoted Enc_patient(I), Enc_patient(E), Enc_trial(D), and Enc_trial(D_r)) are obtained, the matching loss parameter (Triplet loss) is calculated separately for medical record-clinical trial matching and for clinical trial-medical record matching (i.e., the first matching loss parameter for medical record-clinical trial matching, denoted L_patient-to-trial, and the second matching loss parameter for clinical trial-medical record matching, denoted L_trial-to-patient). The Triplet loss is calculated as follows:
Triplet Loss = max(0, dist(anchor, pos) - dist(anchor, neg) + margin);
where anchor, pos, and neg respectively represent the vectors of the anchor point, the positive example, and the negative example, dist represents a distance function, and margin is the optimization target for the difference between the anchor-negative distance and the anchor-positive distance (the loss is zero once dist(anchor, neg) exceeds dist(anchor, pos) by at least margin).
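The Triplet loss formula above can be sketched as follows. The source does not specify the distance function dist, so Euclidean distance is assumed here purely for illustration; the function names euclidean and triplet_loss and the example vectors are likewise hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance; an assumed choice for dist, not specified by the source."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, pos, neg, margin=1.0, dist=euclidean):
    """max(0, dist(anchor, pos) - dist(anchor, neg) + margin)."""
    return max(0.0, dist(anchor, pos) - dist(anchor, neg) + margin)

anchor = [0.0, 0.0]
near = [3.0, 4.0]   # distance 5 from the anchor
far = [6.0, 8.0]    # distance 10 from the anchor

# Positive example is closer than the negative by more than the margin: no loss.
loss_satisfied = triplet_loss(anchor, pos=near, neg=far, margin=1.0)
# Roles swapped: the positive is farther away, so the loss is 10 - 5 + 1.
loss_violated = triplet_loss(anchor, pos=far, neg=near, margin=1.0)
```

The two calls show both regimes of the formula: the loss vanishes when the margin is already satisfied and grows linearly with the violation otherwise.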
In this embodiment, after the first matching loss parameter and the second matching loss parameter are calculated, the first matching loss parameter with its corresponding first adjusting parameter α and the second matching loss parameter with its corresponding second adjusting parameter 1-α may be combined to obtain the optimized loss function L as follows: L = α * L_patient-to-trial + (1-α) * L_trial-to-patient. With the goal of minimizing this loss function, the medical record encoder and the clinical trial encoder can be trained jointly using an optimizer (e.g., the Adam optimizer), resulting in the medical record encoder and the test encoder.
Alternatively, the optimized loss function may be replaced by a pointwise loss that predicts whether the inclusion/exclusion criteria match the clinical trial document/random clinical trial document.
Optionally, the step of calculating the first matching loss parameter for medical record-clinical trial matching includes: taking the first training vector as the first anchor vector in the loss calculation for medical record-clinical trial matching, the third training vector as the first positive example vector, and the fourth training vector as the first negative example vector, and calculating a first distance parameter between the first anchor vector and the first positive example vector and a second distance parameter between the first anchor vector and the first negative example vector; calculating a first difference between the first distance parameter and the second distance parameter; and calculating the first matching loss parameter for medical record-clinical trial matching based on the first anchor vector, the first positive example vector, the first negative example vector, and the first difference.
In the embodiment of the invention, in the Triplet loss calculation for medical record-clinical trial matching, the first anchor vector can be set to Enc_patient(I), the first positive example vector to Enc_trial(D), and the first negative example vector to Enc_trial(D_r) (i.e., the first training vector is taken as the first anchor vector in the loss calculation for medical record-clinical trial matching, the third training vector as the first positive example vector, and the fourth training vector as the first negative example vector). A first distance parameter dist(Enc_patient(I), Enc_trial(D)) between the first anchor vector and the first positive example vector and a second distance parameter dist(Enc_patient(I), Enc_trial(D_r)) between the first anchor vector and the first negative example vector are calculated. The first difference between the first distance parameter and the second distance parameter is then calculated (i.e., dist(Enc_patient(I), Enc_trial(D)) - dist(Enc_patient(I), Enc_trial(D_r))), and the first matching loss parameter for medical record-clinical trial matching is calculated based on the first anchor vector, the first positive example vector, the first negative example vector, and the first difference. That is, the first matching loss parameter L_patient-to-trial is calculated by the following formula:
L_patient-to-trial = max(0, dist(Enc_patient(I), Enc_trial(D)) - dist(Enc_patient(I), Enc_trial(D_r)) + margin).
Optionally, the step of calculating the second matching loss parameter for clinical trial-medical record matching includes: taking the third training vector as the second anchor vector in the loss calculation for clinical trial-medical record matching, the first training vector as the second positive example vector, and the second training vector as the second negative example vector, and calculating a third distance parameter between the second anchor vector and the second positive example vector and a fourth distance parameter between the second anchor vector and the second negative example vector; calculating a second difference between the third distance parameter and the fourth distance parameter; and calculating the second matching loss parameter for clinical trial-medical record matching based on the second anchor vector, the second positive example vector, the second negative example vector, and the second difference.
In the embodiment of the invention, in the Triplet loss calculation for clinical trial-medical record matching, the second anchor vector can be set to Enc_trial(D), the second positive example vector to Enc_patient(I), and the second negative example vector to Enc_patient(E) (i.e., the third training vector is taken as the second anchor vector in the loss calculation for clinical trial-medical record matching, the first training vector as the second positive example vector, and the second training vector as the second negative example vector). A third distance parameter dist(Enc_trial(D), Enc_patient(I)) between the second anchor vector and the second positive example vector and a fourth distance parameter dist(Enc_trial(D), Enc_patient(E)) between the second anchor vector and the second negative example vector are calculated.
Then, the second difference between the third distance parameter and the fourth distance parameter is calculated (i.e., dist(Enc_trial(D), Enc_patient(I)) - dist(Enc_trial(D), Enc_patient(E))), and the second matching loss parameter for clinical trial-medical record matching is calculated based on the second anchor vector, the second positive example vector, the second negative example vector, and the second difference. That is, the second matching loss parameter L_trial-to-patient is calculated by the following formula:
L_trial-to-patient = max(0, dist(Enc_trial(D), Enc_patient(I)) - dist(Enc_trial(D), Enc_patient(E)) + margin).
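The two directional losses and their weighted combination L = α * L_patient-to-trial + (1-α) * L_trial-to-patient can be sketched together as follows. This is a minimal illustration under stated assumptions: the four input vectors stand in for encoder outputs, dist is assumed to be Euclidean distance (the source does not fix it), and the names joint_loss, enc_i, enc_e, enc_d, and enc_dr are hypothetical.

```python
import math

def dist(u, v):
    """Assumed distance function (Euclidean); the source leaves dist unspecified."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet(anchor, pos, neg, margin):
    return max(0.0, dist(anchor, pos) - dist(anchor, neg) + margin)

def joint_loss(enc_i, enc_e, enc_d, enc_dr, alpha=0.5, margin=1.0):
    """Return (L, L_patient_to_trial, L_trial_to_patient)."""
    # Medical record -> trial: anchor = inclusion criteria, positive = own trial
    # document, negative = randomly sampled trial document.
    l_p2t = triplet(enc_i, enc_d, enc_dr, margin)
    # Trial -> medical record: anchor = trial document, positive = inclusion
    # criteria, negative = exclusion criteria.
    l_t2p = triplet(enc_d, enc_i, enc_e, margin)
    return alpha * l_p2t + (1 - alpha) * l_t2p, l_p2t, l_t2p

# Hypothetical 2-dimensional vectors standing in for encoder outputs.
total, l_p2t, l_t2p = joint_loss(
    enc_i=[0.0, 0.0], enc_e=[0.0, 1.5], enc_d=[0.0, 1.0], enc_dr=[0.0, 3.0]
)
```

In this toy input the random document is already far enough from the inclusion criteria, so only the trial-to-medical-record direction contributes, which illustrates why the two directions are weighted separately by α.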
Alternatively, the distance between the anchor point and the positive and negative examples may be modeled by an interactive model, for example, by inputting clinical trial documents and inclusion/exclusion criteria into the same BERT for calculation.
Optionally, the step of analyzing the test text corresponding to each clinical test by using a pre-trained test encoder to obtain a test vector includes: integrating text titles and brief texts in the test texts into a test document, and acquiring a first test vector of the test document by adopting a test encoder, wherein the brief texts are texts for carrying out summary description on the test text; integrating text contents meeting the requirements of a preset standard text in the test text into a test standard text, and acquiring a second test vector of the test standard text by using a medical record encoder; integrating text contents which do not meet the requirements of a preset standard text in the test text into an exclusion standard text, and acquiring a third test vector of the exclusion standard text by using a medical record encoder; wherein the first trial vector, the second trial vector and the third trial vector generate a set of trial vectors.
In the embodiment of the present invention, for a clinical trial i, the text title and the brief text (i.e., the text that gives a summary description of the test text) in the test text corresponding to the clinical trial may be integrated into a test document D_i, and the test encoder is used to obtain the vector representation Enc_trial(D_i) of the test document (i.e., the first test vector). The text content meeting the preset standard text requirement in the test text is integrated into an inclusion criteria text I_i, and the text content not meeting the preset standard text requirement in the test text is integrated into an exclusion criteria text E_i; the medical record encoder is used to obtain the vector representations Enc_patient(I_i) and Enc_patient(E_i) of the inclusion criteria and the exclusion criteria (i.e., the second test vector and the third test vector). The first test vector, the second test vector, and the third test vector form the test vector set.
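The construction of the test vector set can be sketched as follows. The sketch assumes a toy bag-of-words encoder in place of the trained encoders, so enc_trial and enc_patient are hypothetical stand-ins (in the described method they are two separately trained models), and the vocabulary, the Trial fields, and the example texts are invented for illustration.

```python
from dataclasses import dataclass

VOCAB = ["breast", "cancer", "stage", "pregnant", "adult"]  # hypothetical vocabulary

def toy_encoder(text):
    """Toy stand-in for a trained encoder: counts of vocabulary words."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

# Stand-ins for the test encoder and the medical record encoder.
enc_trial = toy_encoder
enc_patient = toy_encoder

@dataclass
class Trial:
    title: str
    brief: str
    inclusion: str
    exclusion: str

def trial_vector_set(trial):
    """Build the first, second, and third test vectors for one clinical trial."""
    document = trial.title + " " + trial.brief           # test document D_i
    return {
        "document": enc_trial(document),                 # Enc_trial(D_i)
        "inclusion": enc_patient(trial.inclusion),       # Enc_patient(I_i)
        "exclusion": enc_patient(trial.exclusion),       # Enc_patient(E_i)
    }

trial = Trial(
    title="Breast cancer study",
    brief="Therapy for stage II breast cancer",
    inclusion="adult with breast cancer",
    exclusion="pregnant",
)
vectors = trial_vector_set(trial)
```

The point of the sketch is the routing: the test document goes through the test encoder, while the inclusion and exclusion criteria go through the medical record encoder, since they describe patients rather than trials.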
Optionally, the step of calculating a relevance score between each clinical trial and the target medical record based on the similarity between the medical record vector and the trial vector corresponding to the clinical trial to obtain a plurality of relevance scores includes: calculating cosine similarity between the medical record vector and the first test vector to obtain first similarity; calculating cosine similarity between the medical record vector and the second test vector to obtain a second similarity; calculating cosine similarity between the medical record vector and the third test vector to obtain a third similarity; and obtaining a correlation score according to the first similarity, the second similarity and the third similarity.
In the embodiment of the invention, the cosine similarities between the medical record vector Enc_patient(P) and the first test vector Enc_trial(D_i), the second test vector Enc_patient(I_i), and the third test vector Enc_patient(E_i) can first be calculated, giving the first similarity cos(Enc_patient(P), Enc_trial(D_i)), the second similarity cos(Enc_patient(P), Enc_patient(I_i)), and the third similarity cos(Enc_patient(P), Enc_patient(E_i)) (formula images BDA0003502095640000141 to BDA0003502095640000143 in the original publication).
The relevance score is then obtained from the first similarity, the second similarity, and the third similarity (formula image BDA0003502095640000144 in the original publication; the aggregation formula is not reproduced in this text).
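The similarity and scoring step can be sketched as follows. The three cosine similarities follow the description above, but the way they are aggregated into a relevance score is given only as a formula image that is not available in this text. The default aggregation used here (document similarity plus inclusion similarity minus exclusion similarity) is therefore an assumption made for illustration, not the formula of the source, and it is exposed as a replaceable parameter for that reason.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relevance_score(p_vec, d_vec, i_vec, e_vec,
                    aggregate=lambda s_d, s_i, s_e: s_d + s_i - s_e):
    """Combine the three similarities into one score.

    ASSUMPTION: the default `aggregate` is a guess for illustration only;
    the source's aggregation formula is not reproduced in the text.
    """
    s_d = cosine(p_vec, d_vec)  # first similarity: record vs. test document
    s_i = cosine(p_vec, i_vec)  # second similarity: record vs. inclusion criteria
    s_e = cosine(p_vec, e_vec)  # third similarity: record vs. exclusion criteria
    return aggregate(s_d, s_i, s_e)

# Hypothetical 2-dimensional vectors standing in for encoder outputs.
score = relevance_score([1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0])
```

Subtracting the exclusion similarity reflects the requirement that a patient should not satisfy any exclusion criterion, but other aggregations (for example, weighted sums) fit the same interface.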
Step S306, sorting the plurality of relevance scores to obtain a score ranking result.
In the embodiment of the invention, the relevance scores of all clinical trials can be sorted in descending order to obtain the score ranking result.
Step S308, integrating the clinical trials corresponding to the relevance scores that are greater than the preset score threshold in the score ranking result, to obtain test information of the clinical trials in which the target patient associated with the target medical record can participate.
In the embodiment of the present invention, the preset score threshold may be set according to an actual situation, and the clinical trials corresponding to the relevance scores greater than the preset score threshold are integrated to obtain the trial information of the clinical trials that the target patient associated with the target medical record can participate in.
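Steps S306 and S308 can be sketched as follows, assuming the relevance scores have already been computed. The function name select_trials, the trial identifiers, the scores, and the threshold value are hypothetical.

```python
def select_trials(scores, threshold):
    """Sort scores in descending order and keep those above the threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)  # step S306
    return [(tid, s) for tid, s in ranked if s > threshold]              # step S308

# Hypothetical relevance scores keyed by trial identifier.
scores = {"trial_a": 0.42, "trial_b": 1.31, "trial_c": 0.95}
selected = select_trials(scores, threshold=0.9)
```

The comparison is strict (greater than the threshold), matching the wording of step S308; the threshold itself is left to be set according to the actual situation.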
The following describes a method for retrieving test information with reference to another alternative embodiment.
FIG. 4 is a schematic diagram of an optional method for training the encoders used in vector retrieval according to an embodiment of the present invention. The encoders, which include a medical record encoder Enc_patient and a clinical trial encoder Enc_trial, can be trained using clinical trial registration information from a website. Both encoders may use a Transformer model whose parameters may be initialized from ClinicalBERT. Specifically, during the pre-training process, for each clinical trial:
(1) The title and description text are recorded as a clinical trial document D, the inclusion criteria text as I, the exclusion criteria text as E, and another clinical trial document selected by random sampling as D_r;
(2) The medical record encoder Enc_patient is used to calculate vector representations of the inclusion and exclusion criteria, denoted Enc_patient(I) and Enc_patient(E);
(3) The clinical trial encoder Enc_trial is used to calculate vector representations of the clinical trial document and the random clinical trial document, denoted Enc_trial(D) and Enc_trial(D_r);
(4) The Triplet loss is calculated separately for medical record-clinical trial matching and for clinical trial-medical record matching, where Triplet Loss = max(0, dist(anchor, pos) - dist(anchor, neg) + margin); anchor, pos, and neg respectively represent the vectors of the anchor point, the positive example, and the negative example, dist is a distance function, and margin is the optimization target for the difference between the anchor-negative distance and the anchor-positive distance.
In the Triplet loss calculation for medical record-clinical trial matching, the anchor point is Enc_patient(I) (i.e., the inclusion criteria (the anchor patient), encoded with the medical record encoder), the positive example is Enc_trial(D) (i.e., the clinical trial document (a trial the patient can participate in), encoded with the test encoder), and the negative example is Enc_trial(D_r) (i.e., the random clinical trial document (a trial the patient cannot participate in), encoded with the test encoder); the loss is denoted L_patient-to-trial.
In the Triplet loss calculation for clinical trial-medical record matching, the anchor point is Enc_trial(D) (i.e., the clinical trial document (the anchor trial), encoded with the test encoder), the positive example is Enc_patient(I) (i.e., the inclusion criteria (a patient who can participate), encoded with the medical record encoder), and the negative example is Enc_patient(E) (i.e., the exclusion criteria (a patient who cannot participate), encoded with the medical record encoder); the loss is denoted L_trial-to-patient.
(5) The final optimized loss function L is: L = α * L_patient-to-trial + (1-α) * L_trial-to-patient, where α is the adjustable parameter for medical record-clinical trial matching and 1-α is the adjustable parameter for clinical trial-medical record matching. With the goal of minimizing this loss function, the medical record encoder and the clinical trial encoder can be trained jointly using the Adam optimizer.
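Steps (1) to (4) above imply a particular way of assembling training triplets from trial registration records, which can be sketched as follows. The record layout (a dictionary with document, inclusion, and exclusion texts), the function name build_triplets, and the example records are hypothetical; the sketch stops at raw text triplets and leaves encoding and optimization out.

```python
import random

def build_triplets(trials, rng):
    """For each trial, build (anchor, positive, negative) text triplets.

    trials: dict mapping a trial id to {"document", "inclusion", "exclusion"}.
    Requires at least two trials so that a random document D_r can be sampled.
    """
    ids = list(trials)
    samples = []
    for tid in ids:
        t = trials[tid]
        other = rng.choice([x for x in ids if x != tid])  # random sampling of D_r
        d_r = trials[other]["document"]
        samples.append({
            # Medical record -> trial: anchor I, positive D, negative D_r.
            "patient_to_trial": (t["inclusion"], t["document"], d_r),
            # Trial -> medical record: anchor D, positive I, negative E.
            "trial_to_patient": (t["document"], t["inclusion"], t["exclusion"]),
        })
    return samples

# Hypothetical registration records.
trials = {
    "trial_a": {"document": "D_a", "inclusion": "I_a", "exclusion": "E_a"},
    "trial_b": {"document": "D_b", "inclusion": "I_b", "exclusion": "E_b"},
}
samples = build_triplets(trials, random.Random(0))
```

Note that no real patient records are needed to build these triplets: the inclusion and exclusion criteria act as proxies for eligible and ineligible patients, which is what allows training on registration data alone.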
Fig. 5 is a schematic diagram of an alternative vector search according to an embodiment of the present invention, as shown in fig. 5, including: the clinical test set comprises a test document coded by a test encoder, an inclusion standard coded by a medical record encoder and an exclusion standard coded by the medical record encoder, cosine similarities cos between the patient medical record coded by the medical record encoder and the test document, the inclusion standard and the exclusion standard are calculated, and then the cosine similarities cos are aggregated to obtain a correlation score, wherein the correlation score is specifically as follows:
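(1) Given a patient medical record, denoted P, a vector representation of the patient medical record is obtained with the pre-trained medical record encoder and denoted $\mathrm{Enc}_{\text{patient}}(P)$;
(2) A relevance score is calculated for the given patient medical record by traversing all possible clinical trials. Specifically, for clinical trial i:
1) Denote its title and description text as the test document $D_i$, its inclusion criteria text as $I_i$, and its exclusion criteria text as $E_i$;
2) Obtain the vector representation $\mathrm{Enc}_{\text{trial}}(D_i)$ of the test document with the test encoder, and obtain the vector representations $\mathrm{Enc}_{\text{patient}}(I_i)$ and $\mathrm{Enc}_{\text{patient}}(E_i)$ of the inclusion criteria and exclusion criteria with the medical record encoder;
3) Calculate the cosine similarities between $\mathrm{Enc}_{\text{patient}}(P)$ and each of $\mathrm{Enc}_{\text{trial}}(D_i)$, $\mathrm{Enc}_{\text{patient}}(I_i)$ and $\mathrm{Enc}_{\text{patient}}(E_i)$, i.e., $\cos(\mathrm{Enc}_{\text{patient}}(P), \mathrm{Enc}_{\text{trial}}(D_i))$, $\cos(\mathrm{Enc}_{\text{patient}}(P), \mathrm{Enc}_{\text{patient}}(I_i))$ and $\cos(\mathrm{Enc}_{\text{patient}}(P), \mathrm{Enc}_{\text{patient}}(E_i))$, respectively recorded as shown in formula image BDA0003502095640000161;
4) The final relevance score is obtained by aggregating these three cosine similarities (formula image BDA0003502095640000162).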
(3) Sort the relevance scores of all clinical trials from largest to smallest to obtain the retrieval result of clinical trials in which the patient with this medical record can participate.
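Steps (1) to (3) can be sketched in Python as follows. The cosine similarity and the descending sort follow the text. The aggregation rule is not recoverable from this text, so `default_aggregate` below is purely an illustrative assumption (reward similarity to the test document and inclusion criteria, penalize similarity to the exclusion criteria) and is passed in as a replaceable parameter; all names and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def default_aggregate(s_doc, s_inc, s_exc):
    # ASSUMPTION: illustrative aggregation only, not the formula of the source.
    return s_doc + s_inc - s_exc


def rank_trials(patient_vec, trials, aggregate=default_aggregate):
    """Score every trial against the patient vector and sort descending."""
    scored = []
    for trial_id, (doc_vec, inc_vec, exc_vec) in trials.items():
        score = aggregate(
            cosine(patient_vec, doc_vec),
            cosine(patient_vec, inc_vec),
            cosine(patient_vec, exc_vec),
        )
        scored.append((trial_id, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy pre-computed vectors: (Enc_trial(D_i), Enc_patient(I_i), Enc_patient(E_i)).
trials = {
    "B": ([0.0, 1.0], [0.0, 1.0], [1.0, 0.0]),
    "A": ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]),
}
ranking = rank_trials([1.0, 0.0], trials)
print(ranking)
```

Because the trial-side vectors do not depend on the patient, they could be computed once and stored, leaving only the patient encoding and the similarity computation at query time.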
In the embodiment of the invention, the medical record vector of the patient medical record and the test vectors of the clinical tests are obtained through the pre-trained medical record encoder and test encoder, and clinical tests in which the patient can participate are matched through vector retrieval. This avoids the labor and time cost of keyword retrieval, because the medical record vector summarizes the patient as a whole. It also overcomes two defects of existing retrieval: that it cannot find clinical tests which use different terms to express the same concept, and that it cannot judge whether the patient meets the inclusion criteria and does not meet the exclusion criteria. Clinical tests in which the patient can participate can therefore be matched efficiently and accurately.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the invention. Further, those skilled in the art will appreciate that the embodiments described in this specification are preferred embodiments and that the acts and modules involved are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
The application provides another test information retrieval method shown in fig. 6, which is applied to a cloud terminal/cloud server, and the cloud server is used as an implementation main body, so that test information of clinical tests which can be participated by a target patient and are associated with a target medical record can be obtained after a medical record test request transmitted by a request terminal is received, and the test information is returned to the request terminal. Fig. 6 is a flowchart of another alternative test information retrieval method according to an embodiment of the present invention, as shown in fig. 6, the method includes the following steps:
Step S602, receiving a medical record test request transmitted by a request terminal, wherein the medical record test request carries a target medical record related to a target patient.
Step S604, responding to the medical record test request, and analyzing a medical record vector of the target medical record by adopting a pre-trained medical record encoder; calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; sequencing the plurality of relevance scores to obtain a score sequencing result; and integrating the clinical tests corresponding to the relevance scores which are greater than the preset score threshold in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are associated with the target medical record.
Step S606, the test information is returned to the requesting terminal.
Through the above steps, after a medical record test request transmitted by the request terminal is received, the request is responded to as follows: a pre-trained medical record encoder is used to analyze the medical record vector of the target medical record; the relevance score of each clinical test and the target medical record is calculated according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; the relevance scores are sorted to obtain a score sorting result; the clinical tests corresponding to the relevance scores greater than a preset score threshold in the score sorting result are integrated to obtain test information of the clinical tests which can be participated in by the target patient and are associated with the target medical record; and the test information is returned to the request terminal. In the embodiment of the invention, the patient's medical record is converted into a medical record vector by the pre-trained medical record encoder, and the relevance score between the medical record vector and each clinical test is then calculated to obtain the test information of the clinical tests in which the patient can participate. No keyword extraction from the patient's medical record is needed, because the medical record vector summarizes the patient as a whole, so the clinical tests in which the patient can participate can be matched efficiently and accurately. This solves the technical problem in the related art of low accuracy and poor efficiency when retrieving clinical tests matched with a patient.
In the embodiment of the invention, the processing main body can be arranged in the cloud server and interacts with each implementation terminal, after a medical record test request transmitted by the request terminal is received, medical record vectors related to medical records of patients and test vectors related to clinical tests can be obtained through the medical record encoder and the test encoder which are trained in advance, and the clinical tests which can be participated in can be matched for the patients through vector retrieval.
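The server-side flow of steps S602 to S606 can be sketched in Python as below. This is only an outline under stated assumptions: the request and database layouts, the function name `handle_medical_record_request`, and the toy encoder and scoring function are all hypothetical stand-ins for the pre-trained medical record encoder and the relevance score calculation.

```python
def handle_medical_record_request(request, encode_record, score_trial,
                                  trial_db, threshold):
    """Encode the record, score every trial, sort, and keep scores > threshold."""
    record_vec = encode_record(request["medical_record"])
    scores = {tid: score_trial(record_vec, trial) for tid, trial in trial_db.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Only trials scoring strictly above the preset threshold are returned.
    return [
        {"trial_id": tid, "score": score, "info": trial_db[tid]["info"]}
        for tid, score in ranked
        if score > threshold
    ]


def toy_encode(text):
    # Hypothetical stand-in for the pre-trained medical record encoder.
    words = text.lower().split()
    return [words.count("diabetes"), words.count("asthma")]


def toy_score(record_vec, trial):
    # Hypothetical stand-in for the relevance score (a plain dot product).
    return sum(a * b for a, b in zip(record_vec, trial["vector"]))


trial_db = {
    "T1": {"vector": [1, 0], "info": "diabetes drug study"},
    "T2": {"vector": [0, 1], "info": "asthma inhaler study"},
}
request = {"medical_record": "Patient with type 2 diabetes"}
response = handle_medical_record_request(
    request, toy_encode, toy_score, trial_db, threshold=0.5
)
print(response)
```

Passing the encoder and the scoring function in as parameters keeps the request handling separate from the model, so the same handler could serve any pair of trained encoders.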
Example 3
There is also provided, in accordance with an embodiment of the present invention, an embodiment of a method for recommending clinical trials, it being noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
FIG. 7 is a flow chart of an alternative method for recommending clinical trials according to an embodiment of the present invention, as shown in FIG. 7, the method comprising the steps of:
Step S702, receiving a test recommendation request transmitted by a client, wherein the test recommendation request at least carries: medical record information of the target medical record.
Step S704, based on the medical record information of the target medical record, a pre-trained medical record encoder is adopted to analyze the medical record vector of the target medical record.
Step S706, calculating the relevance score of each clinical trial and the target medical record according to the medical record vector and the plurality of clinical trials in the preset database to obtain a plurality of relevance scores, and sequencing the plurality of relevance scores to obtain a score sequencing result.
Step S708, integrating the clinical trials corresponding to the relevance scores greater than the preset score threshold in the score sorting result to obtain trial information of the clinical trials which can be participated by the target patient and are associated with the target medical record.
Step S710, pushing a test set formed by clinical tests which can be participated by the target patient and test information of each clinical test to the client.
Through the above steps, after a test recommendation request transmitted by the client is received, a pre-trained medical record encoder is used to analyze the medical record vector of the target medical record based on the medical record information of the target medical record; the relevance score of each clinical test and the target medical record is calculated according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; the relevance scores are sorted to obtain a score sorting result; the clinical tests corresponding to the relevance scores greater than a preset score threshold in the score sorting result are integrated to obtain test information of the clinical tests which can be participated in by the target patient and are associated with the target medical record; and a test set formed by the clinical tests which can be participated in by the target patient, together with the test information of each clinical test, is pushed to the client. In the embodiment of the invention, the patient's medical record is converted into a medical record vector by the pre-trained medical record encoder, and the relevance score between the medical record vector and each clinical test is then calculated to obtain the clinical tests in which the patient can participate and the test information of each of them. No keyword extraction from the patient's medical record is needed, because the medical record vector summarizes the patient as a whole, so the clinical tests in which the patient can participate can be matched efficiently and accurately. This solves the technical problem in the related art of low accuracy and poor efficiency when retrieving clinical tests matched with a patient.
The following describes the embodiments of the present invention in detail with reference to the above steps.
Step S702, receiving a test recommendation request transmitted by a client, wherein the test recommendation request at least carries: medical record information of the target medical record.
Step S704, based on the medical record information of the target medical record, a pre-trained medical record encoder is adopted to analyze the medical record vector of the target medical record.
In embodiments of the present invention, the patient's medical record (i.e., the target medical record) can be denoted P. The pre-trained medical record encoder analyzes the vector representation of the medical record to obtain the medical record vector of the target medical record, denoted $\mathrm{Enc}_{\text{patient}}(P)$. In this embodiment, the medical record encoder may be pre-trained using clinical trial data from the database of a clinical trial registration website; the encoder may be a Transformer sequence modeling model, or another sequence modeling model such as an LSTM or a CNN.
Step S706, calculating the relevance score of each clinical trial and the target medical record according to the medical record vector and the plurality of clinical trials in the preset database to obtain a plurality of relevance scores, and sequencing the plurality of relevance scores to obtain a score sequencing result.
Optionally, the step of calculating a relevance score of each clinical trial and the target medical record according to the medical record vector and the plurality of clinical trials in the preset database to obtain a plurality of relevance scores includes: traversing the clinical tests in a preset database to obtain a test text corresponding to each clinical test; analyzing the test text corresponding to each clinical test by adopting a pre-trained test encoder to obtain a test vector; calculating the similarity between the medical record vector and each test vector in the test vector set corresponding to each clinical test; and calculating the relevance score of each clinical test and the target medical record based on the similarity between the medical record vector and the test vector corresponding to the clinical test to obtain a plurality of relevance scores.
In an embodiment of the present invention, the relevance score to the target medical record may be calculated by traversing all possible clinical trials (i.e., clinical trials in a preset database, where a large amount of clinical trial registration information is stored in advance). Specifically, for each clinical trial i obtained through traversal, a test text corresponding to each clinical trial can be obtained, a pre-trained test encoder is adopted to analyze the test text corresponding to each clinical trial to obtain a test vector, the similarity between the medical record vector and each test vector in the test vector set corresponding to each clinical trial is calculated, and the relevance score between each clinical trial and the target medical record is calculated based on the similarity between the medical record vector and the test vector corresponding to the clinical trial to obtain a plurality of relevance scores.
Optionally, before analyzing the medical record vector of the target medical record by using a pre-trained medical record encoder, the method further includes: calculating a first matching loss parameter of medical record-clinical test matching; calculating a second matching loss parameter of clinical trial-medical record matching; and training to obtain the medical record encoder and the test encoder by combining the first matching loss parameter with its corresponding first adjusting parameter and the second matching loss parameter with its corresponding second adjusting parameter.
Optionally, before calculating the first matching loss parameter of medical record-clinical trial matching, the method further comprises: calculating, with the medical record encoder, vectors of the text content meeting the requirements of a preset standard text and of the text content not meeting the requirements of the preset standard text, to obtain a first training vector and a second training vector; and calculating, with the test encoder, vectors of the clinical trial document and of a random trial document in the test text of each clinical trial, to obtain a third training vector and a fourth training vector.
In the embodiment of the present invention, before the medical record vector of the target medical record is analyzed by the pre-trained medical record encoder, the medical record encoder and the test encoder need to be trained, specifically as follows:
For each clinical trial: the title (i.e., the text title in the test text) and the description part of the text (i.e., the brief text) are denoted as the test document D (i.e., the clinical trial document in the test text of the clinical trial), the inclusion standard text is denoted I (i.e., the text content meeting the requirements of the preset standard text), the exclusion standard text is denoted E (i.e., the text content not meeting the requirements of the preset standard text), and another clinical trial document selected by random sampling is denoted $D_r$ (i.e., the random trial document). The medical record encoder $\mathrm{Enc}_{\text{patient}}$ is then used to calculate the vector representations of the inclusion and exclusion criteria, denoted $\mathrm{Enc}_{\text{patient}}(I)$ and $\mathrm{Enc}_{\text{patient}}(E)$ (i.e., the medical record encoder calculates the vectors of the text content meeting the requirements of the preset standard text and of the text content not meeting them, giving the first training vector and the second training vector), and the clinical trial encoder $\mathrm{Enc}_{\text{trial}}$ is used to calculate the vector representations of the clinical trial document and the random clinical trial document, denoted $\mathrm{Enc}_{\text{trial}}(D)$ and $\mathrm{Enc}_{\text{trial}}(D_r)$ (i.e., the test encoder calculates the vectors of the clinical trial document and the random trial document in the test text of each clinical trial, giving the third training vector and the fourth training vector).
In this embodiment, after the first, second, third and fourth training vectors (denoted $\mathrm{Enc}_{\text{patient}}(I)$, $\mathrm{Enc}_{\text{patient}}(E)$, $\mathrm{Enc}_{\text{trial}}(D)$ and $\mathrm{Enc}_{\text{trial}}(D_r)$, respectively) are obtained, the matching loss parameter (triplet loss) is calculated separately for medical record-clinical trial matching and for clinical trial-medical record matching (i.e., the first matching loss parameter of medical record-clinical trial matching, denoted $L_{\text{patient-to-trial}}$, and the second matching loss parameter of clinical trial-medical record matching, denoted $L_{\text{trial-to-patient}}$, are calculated), where the matching loss parameter (triplet loss) is calculated as follows:
Triplet Loss=max(0,(dist(anchor,pos)-dist(anchor,neg)+margin));
where anchor, pos and neg denote the vectors of the anchor point, the positive example and the negative example respectively, dist denotes a distance function, and margin is the target gap between the anchor-positive distance and the anchor-negative distance after optimization: the loss is zero once the anchor-negative distance exceeds the anchor-positive distance by at least margin.
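The effect of margin in the formula above can be checked on a few hand-picked distances. The short Python sketch below works directly on distance values rather than vectors; the function name and the numbers are hypothetical and chosen only for illustration.

```python
def triplet_loss_from_distances(d_pos, d_neg, margin):
    """max(0, d_pos - d_neg + margin), with d_pos = dist(anchor, pos)
    and d_neg = dist(anchor, neg)."""
    return max(0.0, d_pos - d_neg + margin)


margin = 0.5
cases = {
    # Negative is farther than the positive by more than the margin: no loss.
    "well separated": triplet_loss_from_distances(0.2, 1.5, margin),
    # Negative is farther, but by less than the margin: small loss.
    "within margin": triplet_loss_from_distances(0.5, 0.75, margin),
    # Negative is closer than the positive: larger loss.
    "wrong order": triplet_loss_from_distances(1.0, 0.5, margin),
}
for name, value in cases.items():
    print(name, value)
```

The loss therefore keeps pushing until the negative example is at least margin farther from the anchor than the positive example, and contributes nothing beyond that point.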
In this embodiment, after the first matching loss parameter and the second matching loss parameter are obtained through calculation, the first matching loss parameter with its corresponding first adjusting parameter $\alpha$ and the second matching loss parameter with its corresponding second adjusting parameter $1-\alpha$ may be combined to obtain the optimized loss function L as follows: $L = \alpha \cdot L_{\text{patient-to-trial}} + (1-\alpha) \cdot L_{\text{trial-to-patient}}$. With the goal of minimizing this loss function, the medical record encoder and the clinical trial encoder can be trained jointly with an optimizer (e.g., the Adam optimizer), resulting in the medical record encoder and the test encoder.
Alternatively, the optimized loss function may instead be a pointwise loss that predicts whether the inclusion/exclusion criteria match the clinical trial document/random clinical trial document.
Optionally, the step of calculating a first matching loss parameter of medical record-clinical trial matching comprises: taking the first training vector as the first anchor point vector in the loss calculation for medical record-clinical test matching, the third training vector as the first positive case vector and the fourth training vector as the first negative case vector, and calculating a first distance parameter between the first anchor point vector and the first positive case vector and a second distance parameter between the first anchor point vector and the first negative case vector; calculating a first difference value between the first distance parameter and the second distance parameter; and calculating the first matching loss parameter of medical record-clinical test matching based on the first anchor point vector, the first positive case vector, the first negative case vector and the first difference value.
In the embodiment of the invention, in the triplet loss calculation for medical record-clinical test matching, the first anchor point vector can be set as $\mathrm{Enc}_{\text{patient}}(I)$, the first positive case vector as $\mathrm{Enc}_{\text{trial}}(D)$ and the first negative case vector as $\mathrm{Enc}_{\text{trial}}(D_r)$ (i.e., the first training vector is used as the first anchor point vector in the loss calculation for medical record-clinical trial matching, the third training vector as the first positive case vector, and the fourth training vector as the first negative case vector). A first distance parameter $\mathrm{dist}(\mathrm{Enc}_{\text{patient}}(I), \mathrm{Enc}_{\text{trial}}(D))$ between the first anchor point vector and the first positive case vector and a second distance parameter $\mathrm{dist}(\mathrm{Enc}_{\text{patient}}(I), \mathrm{Enc}_{\text{trial}}(D_r))$ between the first anchor point vector and the first negative case vector are calculated, and then the first difference value between the first distance parameter and the second distance parameter is calculated (i.e., $\mathrm{dist}(\mathrm{Enc}_{\text{patient}}(I), \mathrm{Enc}_{\text{trial}}(D)) - \mathrm{dist}(\mathrm{Enc}_{\text{patient}}(I), \mathrm{Enc}_{\text{trial}}(D_r))$). The first matching loss parameter $L_{\text{patient-to-trial}}$ of medical record-clinical trial matching is then calculated based on the first anchor point vector, the first positive case vector, the first negative case vector and the first difference value, i.e., by the following formula:
$L_{\text{patient-to-trial}} = \max(0, \mathrm{dist}(\mathrm{Enc}_{\text{patient}}(I), \mathrm{Enc}_{\text{trial}}(D)) - \mathrm{dist}(\mathrm{Enc}_{\text{patient}}(I), \mathrm{Enc}_{\text{trial}}(D_r)) + \mathrm{margin})$.
Optionally, the step of calculating a second matching loss parameter of clinical trial-medical record matching includes: taking the third training vector as the second anchor point vector in the loss calculation for clinical test-medical record matching, the first training vector as the second positive case vector and the second training vector as the second negative case vector, and calculating a third distance parameter between the second anchor point vector and the second positive case vector and a fourth distance parameter between the second anchor point vector and the second negative case vector; calculating a second difference value between the third distance parameter and the fourth distance parameter; and calculating the second matching loss parameter of clinical test-medical record matching based on the second anchor point vector, the second positive case vector, the second negative case vector and the second difference value.
In the embodiment of the invention, in the triplet loss calculation for clinical trial-medical record matching, the second anchor point vector can be set as $\mathrm{Enc}_{\text{trial}}(D)$, the second positive case vector as $\mathrm{Enc}_{\text{patient}}(I)$ and the second negative case vector as $\mathrm{Enc}_{\text{patient}}(E)$ (i.e., the third training vector is used as the second anchor point vector in the loss calculation for clinical trial-medical record matching, the first training vector as the second positive case vector, and the second training vector as the second negative case vector), and a third distance parameter $\mathrm{dist}(\mathrm{Enc}_{\text{trial}}(D), \mathrm{Enc}_{\text{patient}}(I))$ between the second anchor point vector and the second positive case vector and a fourth distance parameter $\mathrm{dist}(\mathrm{Enc}_{\text{trial}}(D), \mathrm{Enc}_{\text{patient}}(E))$ between the second anchor point vector and the second negative case vector are calculated.
Thereafter, the second difference value between the third distance parameter and the fourth distance parameter is calculated (i.e., $\mathrm{dist}(\mathrm{Enc}_{\text{trial}}(D), \mathrm{Enc}_{\text{patient}}(I)) - \mathrm{dist}(\mathrm{Enc}_{\text{trial}}(D), \mathrm{Enc}_{\text{patient}}(E))$), and the second matching loss parameter $L_{\text{trial-to-patient}}$ of clinical trial-medical record matching is calculated based on the second anchor point vector, the second positive case vector, the second negative case vector and the second difference value, i.e., by the following formula:
$L_{\text{trial-to-patient}} = \max(0, \mathrm{dist}(\mathrm{Enc}_{\text{trial}}(D), \mathrm{Enc}_{\text{patient}}(I)) - \mathrm{dist}(\mathrm{Enc}_{\text{trial}}(D), \mathrm{Enc}_{\text{patient}}(E)) + \mathrm{margin})$.
Alternatively, the distance between the anchor point and the positive and negative examples may be modeled by an interactive model, for example, by inputting clinical trial documents and inclusion/exclusion criteria into the same BERT for calculation.
Optionally, the step of analyzing the test text corresponding to each clinical test by using a pre-trained test encoder to obtain a test vector includes: integrating text titles and brief texts in the test texts into a test document, and acquiring a first test vector of the test document by adopting a test encoder, wherein the brief texts are texts for carrying out summary description on the test text; integrating text contents meeting the requirements of a preset standard text in the test text into the test standard text, and acquiring a second test vector of the test standard text by using a medical record encoder; integrating text contents which do not meet the requirements of a preset standard text in the test text into an exclusion standard text, and acquiring a third test vector of the exclusion standard text by using a medical record encoder; wherein the first trial vector, the second trial vector and the third trial vector generate a set of trial vectors.
In the embodiment of the present invention, for a clinical test i, the text title and the brief text (i.e., the text that gives a summary description of the test text) in the test text corresponding to the clinical test may be integrated into a test document $D_i$, and the vector representation $\mathrm{Enc}_{\text{trial}}(D_i)$ of the test document (i.e., the first test vector) is obtained with the test encoder. The text content meeting the requirements of the preset standard text in the test text is integrated into a test standard text $I_i$ (the inclusion criteria), the text content not meeting the requirements of the preset standard text in the test text is integrated into an exclusion standard text $E_i$, and the vector representations $\mathrm{Enc}_{\text{patient}}(I_i)$ and $\mathrm{Enc}_{\text{patient}}(E_i)$ of the inclusion criteria and exclusion criteria (i.e., the second test vector and the third test vector) are obtained with the medical record encoder. The first test vector, the second test vector and the third test vector together form the test vector set.
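The construction of the test vector set for one clinical test can be outlined in Python as follows. The dictionary keys (`title`, `summary`, `inclusion`, `exclusion`) and both toy encoders are hypothetical; the toy encoders only count words and append a marker component so that it is visible which encoder produced each vector, whereas real encoders would map both kinds of text into a shared vector space.

```python
def build_test_vector_set(trial, enc_trial, enc_patient):
    """Return the three vectors of one clinical test.

    The test document (title plus brief text) goes through the test encoder;
    the inclusion and exclusion criteria go through the medical record encoder.
    """
    document = trial["title"] + " " + trial["summary"]  # D_i
    return {
        "first": enc_trial(document),               # Enc_trial(D_i)
        "second": enc_patient(trial["inclusion"]),  # Enc_patient(I_i)
        "third": enc_patient(trial["exclusion"]),   # Enc_patient(E_i)
    }


def toy_enc_trial(text):
    # Hypothetical stand-in for the test encoder (marker component 1.0).
    return [float(len(text.split())), 1.0]


def toy_enc_patient(text):
    # Hypothetical stand-in for the medical record encoder (marker 0.0).
    return [float(len(text.split())), 0.0]


trial = {
    "title": "Metformin study",
    "summary": "Evaluates metformin in adults",
    "inclusion": "Adults with type 2 diabetes",
    "exclusion": "Pregnancy",
}
vector_set = build_test_vector_set(trial, toy_enc_trial, toy_enc_patient)
print(vector_set)
```

Encoding the inclusion and exclusion criteria with the medical record encoder mirrors the training setup, where the criteria text plays the role of a patient description.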
Optionally, the step of calculating a relevance score between each clinical trial and the target medical record based on the similarity between the medical record vector and the trial vector corresponding to the clinical trial to obtain a plurality of relevance scores includes: calculating cosine similarity between the medical record vector and the first test vector to obtain first similarity; calculating cosine similarity between the medical record vector and the second test vector to obtain a second similarity; calculating cosine similarity between the medical record vector and the third test vector to obtain a third similarity; and obtaining a correlation score according to the first similarity, the second similarity and the third similarity.
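In the embodiment of the invention, the cosine similarities between the medical record vector $\mathrm{Enc}_{\text{patient}}(P)$ and the first test vector $\mathrm{Enc}_{\text{trial}}(D_i)$, the second test vector $\mathrm{Enc}_{\text{patient}}(I_i)$ and the third test vector $\mathrm{Enc}_{\text{patient}}(E_i)$ can first be calculated respectively, giving the first similarity $\cos(\mathrm{Enc}_{\text{patient}}(P), \mathrm{Enc}_{\text{trial}}(D_i))$ (formula image BDA0003502095640000221), the second similarity $\cos(\mathrm{Enc}_{\text{patient}}(P), \mathrm{Enc}_{\text{patient}}(I_i))$ (formula image BDA0003502095640000222) and the third similarity $\cos(\mathrm{Enc}_{\text{patient}}(P), \mathrm{Enc}_{\text{patient}}(E_i))$ (formula image BDA0003502095640000223).
The relevance score is then obtained according to the first similarity, the second similarity and the third similarity (formula image BDA0003502095640000224).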
In the embodiment of the invention, after the relevance scores of all clinical trials are obtained, the relevance scores of all clinical trials can be ranked from large to small to obtain the score ranking result.
Step S708, integrating the clinical trials corresponding to the relevance scores greater than the preset score threshold in the score sorting result to obtain trial information of the clinical trials which can be participated by the target patient and are associated with the target medical record.
Step S710, pushing a test set formed by clinical tests which can be participated by the target patient and test information of each clinical test to the client.
In the embodiment of the invention, the medical record vector of the patient medical record and the test vector related to the clinical test are obtained through the pre-trained medical record encoder and the test encoder, and the test set consisting of the participatable clinical tests and the test information of each clinical test can be pushed to the target patient through vector retrieval, so that the function of efficiently and accurately recommending the participatable clinical tests to the patient is realized.
Example 4
According to an embodiment of the present invention, there is also provided a retrieval apparatus for implementing the retrieval method of test information in embodiment 1 above, as shown in fig. 8, the apparatus including:
a first analysis unit 80, configured to analyze a medical record vector of a target medical record using a pre-trained medical record encoder;
the first determining unit 82 is configured to calculate a relevance score between each clinical trial and the target medical record according to the medical record vector and a plurality of clinical trials in a preset database to obtain a plurality of relevance scores;
a sorting unit 84, configured to sort the multiple relevance scores to obtain a score sorting result;
the second determining unit 86 is configured to integrate the clinical trials corresponding to the relevance scores greater than the preset score threshold in the score sorting result to obtain trial information of clinical trials that can be participated by the target patient and are associated with the target medical record.
In the retrieval device, the first analysis unit 80 analyzes the medical record vector of the target medical record with a pre-trained medical record encoder; the first determining unit 82 calculates the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; the sorting unit 84 sorts the relevance scores to obtain a score sorting result; and the second determining unit 86 integrates the clinical tests corresponding to the relevance scores greater than a preset score threshold in the score sorting result to obtain the test information of the clinical tests which can be participated in by the target patient and are associated with the target medical record. In the embodiment of the invention, the patient's medical record is converted into a medical record vector by the pre-trained medical record encoder, and the relevance score between the medical record vector and each clinical test is then calculated to obtain the test information of the clinical tests in which the patient can participate. No keyword extraction from the patient's medical record is needed, because the medical record vector summarizes the patient as a whole, so the clinical tests in which the patient can participate can be matched efficiently and accurately. This solves the technical problem in the related art of low accuracy and poor efficiency when retrieving clinical tests matched with a patient.
Optionally, the first determining unit includes: the first traversal module is used for traversing the clinical tests in the preset database to obtain a test text corresponding to each clinical test; the first analysis module is used for analyzing the test text corresponding to each clinical test by adopting a pre-trained test encoder to obtain a test vector; the first calculation module is used for calculating the similarity between the medical record vectors and each test vector in the test vector set corresponding to each clinical test; and the second calculation module is used for calculating the relevance score of each clinical test and the target medical record based on the similarity between the medical record vector and the test vector corresponding to the clinical test to obtain a plurality of relevance scores.
Optionally, the first analysis module includes: a first integration submodule, configured to integrate the text title and the brief text in the test text into a test document and to obtain a first test vector of the test document by using the test encoder, where the brief text is a text that summarizes the test text; a second integration submodule, configured to integrate the text content in the test text that meets the requirements of a preset standard text into a test standard text and to obtain a second test vector of the test standard text by using the medical record encoder; and a third integration submodule, configured to integrate the text content in the test text that does not meet the requirements of the preset standard text into an exclusion standard text and to obtain a third test vector of the exclusion standard text by using the medical record encoder; where the first test vector, the second test vector and the third test vector form the test vector set.
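As an illustration of the three texts described above, the following is a minimal sketch of how one test text might be split before encoding. The `TrialText` fields and the function name are hypothetical and are not defined in the application; the sketch also assumes that the test standard text and the exclusion standard text correspond to the inclusion and exclusion criteria of the clinical test.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrialText:
    """Hypothetical container for the test text of one clinical test."""
    title: str
    summary: str                    # the "brief text" summarizing the test
    inclusion_criteria: List[str]   # assumed source of the test standard text
    exclusion_criteria: List[str]   # assumed source of the exclusion standard text


def build_trial_inputs(trial: TrialText) -> Dict[str, str]:
    """Assemble the three texts that are later encoded into three test vectors."""
    return {
        # title + brief text -> test document (encoded by the test encoder)
        "trial_document": f"{trial.title}. {trial.summary}".strip(),
        # the two criteria texts are encoded by the medical record encoder
        "inclusion_text": " ".join(trial.inclusion_criteria),
        "exclusion_text": " ".join(trial.exclusion_criteria),
    }
```

Each of the three strings would then be passed to its encoder to obtain the first, second and third test vectors.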
Optionally, the second calculating module includes: a first calculation submodule, configured to calculate the cosine similarity between the medical record vector and the first test vector to obtain a first similarity; a second calculation submodule, configured to calculate the cosine similarity between the medical record vector and the second test vector to obtain a second similarity; a third calculation submodule, configured to calculate the cosine similarity between the medical record vector and the third test vector to obtain a third similarity; and a first determining submodule, configured to obtain the relevance score according to the first similarity, the second similarity and the third similarity.
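The three cosine similarities and their combination can be sketched as follows. The text here does not state how the three similarities are combined, so the weighted sum and the default weights below (which penalize similarity to the exclusion standard text) are assumptions made only for illustration, as are the function names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch, not from the source
    return dot / (norm_u * norm_v)


def relevance_score(
    record_vec: Sequence[float],
    trial_vecs: Sequence[Sequence[float]],
    weights: Sequence[float] = (1.0, 1.0, -1.0),  # hypothetical weights
) -> Tuple[float, List[float]]:
    """Return (score, [first, second, third similarity]) for one clinical test.

    trial_vecs holds the first, second and third test vectors in that order.
    """
    sims = [cosine_similarity(record_vec, t) for t in trial_vecs]
    score = sum(w * s for w, s in zip(weights, sims))
    return score, sims
```

With these assumed weights, a medical record close to the test document and the test standard text but far from the exclusion standard text receives a higher score.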
Optionally, the retrieving apparatus further includes: a third calculation module, configured to calculate, before the medical record vector of the target medical record is analyzed by the pre-trained medical record encoder, a first matching loss parameter for medical record-clinical test matching; a fourth calculation module, configured to calculate a second matching loss parameter for clinical test-medical record matching; and a first training module, configured to train the medical record encoder and the test encoder by combining the first matching loss parameter with its corresponding first adjusting parameter and the second matching loss parameter with its corresponding second adjusting parameter.
Optionally, the retrieving apparatus further includes: a fifth calculation module, configured to use the medical record encoder, before the first matching loss parameter for medical record-clinical test matching is calculated, to calculate vectors of the text content that meets the requirements of the preset standard text and of the text content that does not meet the requirements of the preset standard text, to obtain a first training vector and a second training vector; and a sixth calculation module, configured to use the test encoder to calculate vectors of the clinical test document and the random test document in the test text of each clinical test, to obtain a third training vector and a fourth training vector.
Optionally, the third calculation module includes: a fourth calculation submodule, configured to take, in the loss calculation for medical record-clinical test matching, the first training vector as a first anchor vector, the third training vector as a first positive example vector and the fourth training vector as a first negative example vector, and to calculate a first distance parameter between the first anchor vector and the first positive example vector and a second distance parameter between the first anchor vector and the first negative example vector; a fifth calculation submodule, configured to calculate a first difference value between the first distance parameter and the second distance parameter; and a sixth calculation submodule, configured to calculate the first matching loss parameter for medical record-clinical test matching based on the first anchor vector, the first positive example vector, the first negative example vector and the first difference value.
Optionally, the fourth calculation module includes: a seventh calculation submodule, configured to take, in the loss calculation for clinical test-medical record matching, the third training vector as a second anchor vector, the first training vector as a second positive example vector and the second training vector as a second negative example vector, and to calculate a third distance parameter between the second anchor vector and the second positive example vector and a fourth distance parameter between the second anchor vector and the second negative example vector; an eighth calculation submodule, configured to calculate a second difference value between the third distance parameter and the fourth distance parameter; and a ninth calculation submodule, configured to calculate the second matching loss parameter for clinical test-medical record matching based on the second anchor vector, the second positive example vector, the second negative example vector and the second difference value.
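The anchor, positive example and negative example structure described above resembles a triplet-style loss. The sketch below is one possible reading, not the formula of the application: the Euclidean distance, the hinge with a `margin`, and the weighted sum that applies the two adjusting parameters are all assumptions, and the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance parameter between two vectors (Euclidean distance assumed)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def matching_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # hypothetical hyperparameter, not in the source
) -> float:
    """Triplet-style loss built from the difference of two distance parameters."""
    d_pos = euclidean(anchor, positive)   # anchor to positive example
    d_neg = euclidean(anchor, negative)   # anchor to negative example
    difference = d_pos - d_neg            # the "difference value"
    return max(0.0, difference + margin)  # zero once the negative is far enough


def combined_loss(
    loss_1: float, loss_2: float, weight_1: float = 1.0, weight_2: float = 1.0
) -> float:
    """Combine both matching losses with their adjusting parameters (assumed sum)."""
    return weight_1 * loss_1 + weight_2 * loss_2
```

Under this reading, the first matching loss would use the first training vector as anchor with the third and fourth training vectors as positive and negative examples, and the second matching loss would use the third training vector as anchor with the first and second training vectors.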
In the embodiment of the invention, the medical record vector of the patient medical record and the test vectors of the clinical tests are obtained through the pre-trained medical record encoder and test encoder, and the clinical tests in which the patient can participate are matched through vector retrieval. This overcomes the labor and time cost of keyword retrieval, and the medical record vector summarizes the overall picture of the patient. It also overcomes two defects of existing retrieval: that clinical tests expressing the same concept in different terms cannot be found, and that it cannot be judged whether the patient meets the inclusion standard and does not meet the exclusion standard. Clinical tests in which the patient can participate can therefore be matched efficiently and accurately.
It should be noted that the above modules correspond to the same examples and application scenarios as their corresponding steps, but are not limited to what is disclosed in embodiment 1. It should also be noted that the above modules may operate, as a part of the apparatus, in the computer terminal 10 provided in embodiment 1.
Example 5
According to an embodiment of the present invention, there is also provided a retrieval apparatus for implementing the retrieval method of test information in embodiment 2 described above, as shown in fig. 9, the apparatus including:
the receiving unit 90 is configured to receive a medical record test request transmitted by a request terminal, where the medical record test request carries a target medical record of a target patient;
the response unit 92 is configured to respond to the medical record test request, and analyze a medical record vector of the target medical record by using a pre-trained medical record encoder; calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; sequencing the plurality of relevance scores to obtain a score sequencing result; integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record;
a returning unit 94, configured to return the test information to the requesting terminal.
The retrieval device receives, through the receiving unit 90, a medical record test request transmitted by a request terminal. Through the response unit 92, it responds to the medical record test request: it analyzes a medical record vector of the target medical record by using a pre-trained medical record encoder, calculates the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores, sorts the plurality of relevance scores to obtain a score sorting result, and integrates the clinical tests whose relevance scores in the score sorting result are greater than a preset score threshold to obtain test information of the clinical tests that are related to the target medical record and in which the target patient can participate. Through the returning unit 94, it returns the test information to the request terminal. In the embodiment of the invention, the medical record of the patient is converted into a medical record vector by the pre-trained medical record encoder, and the relevance score between the medical record vector and each clinical test is then calculated to obtain the test information of the clinical tests in which the patient can participate. No keyword extraction from the medical record of the patient is needed, and the medical record vector summarizes the overall picture of the patient, so clinical tests in which the patient can participate can be matched efficiently and accurately. This solves the technical problem in the related art that searching for clinical tests matching a patient has low accuracy and poor efficiency.
In the embodiment of the invention, after the medical record test request transmitted by the request terminal is received, the medical record vector of the patient medical record and the test vectors of the clinical tests are obtained through the pre-trained medical record encoder and test encoder, and the clinical tests in which the patient can participate are matched through vector retrieval. This overcomes the labor and time cost of keyword retrieval, and the medical record vector summarizes the overall picture of the patient. It also overcomes two defects of existing retrieval: that clinical tests expressing the same concept in different terms cannot be found, and that it cannot be judged whether the patient meets the inclusion standard and does not meet the exclusion standard. Clinical tests in which the patient can participate can therefore be matched efficiently and accurately.
It should be noted that the above modules correspond to the same examples and application scenarios as their corresponding steps, but are not limited to what is disclosed in embodiment 2. It should also be noted that the above modules, as a part of the apparatus, may operate in a hardware environment as described in the above embodiments, and may be implemented by software or hardware.
Example 6
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute the program codes of the following steps in the method for retrieving the test information to obtain the test information of the clinical test that can be participated by the target patient and is associated with the target medical record.
Optionally, FIG. 10 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in FIG. 10, the computer terminal A may include: one or more (only one shown) processors 102 and a memory 104.
The memory may be configured to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for retrieving test information in the embodiments of the present invention. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the method for retrieving test information. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, which may be connected to the computer terminal A via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call, through the transmission device, the information and application programs stored in the memory to execute the following steps: a first step of analyzing a medical record vector of a target medical record by using a pre-trained medical record encoder; a second step of calculating the relevance score of each clinical test and the target medical record according to the medical record vector and the plurality of clinical tests in the preset database to obtain a plurality of relevance scores; a third step of sorting the plurality of relevance scores to obtain a score sorting result; and a fourth step of integrating the clinical tests whose relevance scores in the score sorting result are greater than a preset score threshold to obtain test information of the clinical tests that are related to the target medical record and in which the target patient can participate.
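The four steps executed by the processor can be sketched as a single function. This is a minimal illustration under stated assumptions: `encode_record` and `score_trial` are hypothetical callables standing in for the pre-trained medical record encoder and the relevance score calculation, and the returned dictionary layout is invented for the example.

```python
from typing import Any, Callable, Dict, List, Sequence


def retrieve_trials(
    record_text: str,
    trials: Sequence[Any],
    encode_record: Callable[[str], Sequence[float]],        # stand-in encoder
    score_trial: Callable[[Sequence[float], Any], float],   # stand-in scorer
    threshold: float,
) -> List[Dict[str, Any]]:
    """Encode the record, score every test, sort, and keep scores above threshold."""
    # Step 1: medical record -> medical record vector
    record_vec = encode_record(record_text)
    # Step 2: one relevance score per clinical test in the database
    scored = [(score_trial(record_vec, trial), trial) for trial in trials]
    # Step 3: sort by score, highest first (key avoids comparing test objects)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 4: keep only tests whose score is greater than the preset threshold
    return [{"trial": trial, "score": score} for score, trial in scored if score > threshold]
```

Passing the encoder and scorer as parameters keeps the sketch independent of any particular model; in practice the test vectors would likely be precomputed rather than derived per request.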
Those of ordinary skill in the art will understand that the structure shown in FIG. 10 is only illustrative, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. FIG. 10 does not limit the structure of the electronic device. For example, the computer terminal 10 may include more or fewer components (e.g., network interfaces, display devices) than shown in FIG. 10, or have a configuration different from that shown in FIG. 10.
It will be understood by those of ordinary skill in the art that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the test information retrieval method provided in the above embodiment, or a program code executed by a clinical test recommendation method.
Optionally, in this embodiment, the storage medium may be located in any one computer terminal in a computer terminal group in a computer network, or in any one mobile terminal in a mobile terminal group.
The embodiment of the invention also provides a processor. Optionally, in this embodiment, the processor may be configured to execute a method for retrieving the test information provided in the above embodiment, or execute a method for recommending a clinical test provided in the above embodiment.
Optionally, in this embodiment, the processor may be located in any one computer terminal in a computer terminal group in a computer network, or in any one mobile terminal in a mobile terminal group.
An embodiment of the present invention may provide a terminal, including: a first device; a second device; a third device; a fourth device; and a processor running a program, where the program, when run, executes the following processing steps on the data from the first device, the second device, the third device and the fourth device: a first step of analyzing a medical record vector of a target medical record by using a pre-trained medical record encoder; a second step of calculating the relevance score of each clinical test and the target medical record according to the medical record vector and the plurality of clinical tests in the preset database to obtain a plurality of relevance scores; a third step of sorting the plurality of relevance scores to obtain a score sorting result; and a fourth step of integrating the clinical tests whose relevance scores in the score sorting result are greater than a preset score threshold to obtain test information of the clinical tests that are related to the target medical record and in which the target patient can participate.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (13)

1. A method for retrieving test information, comprising:
analyzing a medical record vector of a target medical record by adopting a pre-trained medical record encoder;
calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores;
sequencing the plurality of relevance scores to obtain score sequencing results;
and integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record.
2. The method of claim 1, wherein the step of calculating a relevance score for each clinical trial on the basis of the medical record vector and a plurality of clinical trials in a predetermined database to obtain a plurality of relevance scores comprises:
traversing the clinical tests in the preset database to obtain a test text corresponding to each clinical test;
analyzing the test text corresponding to each clinical test by adopting a pre-trained test encoder to obtain a test vector;
calculating the similarity between the medical record vector and each test vector in the test vector set corresponding to each clinical test;
and calculating the relevance score of each clinical test and the target medical record based on the similarity between the medical record vector and the test vector corresponding to the clinical test to obtain a plurality of relevance scores.
3. The method of claim 2, wherein the step of analyzing the test text corresponding to each of the clinical tests using a pre-trained test encoder to obtain a test vector comprises:
integrating text titles and brief texts in the test texts into a test document, and acquiring a first test vector of the test document by using the test encoder, wherein the brief texts are texts for carrying out summary description on the test documents;
integrating text contents meeting the requirements of a preset standard text in the test text into a test standard text, and acquiring a second test vector of the test standard text by using the medical record encoder;
integrating text contents which do not meet the requirements of a preset standard text in the test text into an exclusion standard text, and acquiring a third test vector of the exclusion standard text by using the medical record encoder;
wherein the first trial vector, the second trial vector, and the third trial vector generate the set of trial vectors.
4. The retrieval method of claim 3, wherein the step of calculating the relevance score of each clinical trial with the target medical record based on the similarity between the medical record vector and the trial vector corresponding to the clinical trial to obtain the plurality of relevance scores comprises:
calculating cosine similarity between the medical record vector and the first test vector to obtain first similarity;
calculating cosine similarity between the medical record vector and the second test vector to obtain a second similarity;
calculating cosine similarity between the medical record vector and the third test vector to obtain a third similarity;
and obtaining the relevance score according to the first similarity, the second similarity and the third similarity.
5. The retrieving method according to claim 2, further comprising, before analyzing the medical record vector of the target medical record using a pre-trained medical record encoder:
calculating a first matching loss parameter of medical record-clinical test matching;
calculating a second matching loss parameter of clinical trial-medical record matching;
and training the medical record encoder and the test encoder by combining the first matching loss parameter with its corresponding first adjusting parameter and the second matching loss parameter with its corresponding second adjusting parameter.
6. A method for recommending a clinical trial, comprising:
receiving a test recommendation request transmitted by a client, wherein the test recommendation request at least carries: medical record information of the target medical record;
analyzing a medical record vector of the target medical record by adopting a pre-trained medical record encoder based on the medical record information of the target medical record;
calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores, and sequencing the relevance scores to obtain a score sequencing result;
integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record;
pushing a test set formed by clinical tests which can be participated by the target patient and test information of each clinical test to the client.
7. The recommendation method of claim 6, wherein the step of calculating a relevance score of each clinical trial and the target medical record based on the medical record vector and a plurality of clinical trials in a predetermined database to obtain a plurality of relevance scores comprises:
traversing the clinical tests in the preset database to obtain a test text corresponding to each clinical test;
analyzing the test text corresponding to each clinical test by adopting a pre-trained test encoder to obtain a test vector;
calculating the similarity between the medical record vector and each test vector in the test vector set corresponding to each clinical test;
and calculating the relevance score of each clinical test and the target medical record based on the similarity between the medical record vector and the test vector corresponding to the clinical test to obtain a plurality of relevance scores.
8. A method for retrieving test information, applied to a cloud server, comprising:
receiving a medical record test request transmitted by a request terminal, wherein the medical record test request carries a target medical record of a target patient;
responding to the medical record test request, and analyzing a medical record vector of the target medical record by adopting a pre-trained medical record encoder; calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; sequencing the plurality of relevance scores to obtain score sequencing results; integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record;
and returning the test information to the request terminal.
9. An apparatus for searching test information, comprising:
the first analysis unit is used for analyzing the medical record vector of the target medical record by adopting a pre-trained medical record encoder;
the first determining unit is used for calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores;
the sorting unit is used for sorting the plurality of relevance scores to obtain a score sorting result;
and the second determining unit is used for integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record.
10. An apparatus for retrieving test information, applied to a cloud server, comprising:
a receiving unit, configured to receive a medical record test request transmitted by a request terminal, wherein the medical record test request carries a target medical record related to a target patient;
the response unit is used for responding to the medical record test request and analyzing a medical record vector of the target medical record by adopting a pre-trained medical record encoder; calculating the relevance score of each clinical test and the target medical record according to the medical record vector and a plurality of clinical tests in a preset database to obtain a plurality of relevance scores; sequencing the plurality of relevance scores to obtain score sequencing results; integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record;
and the return unit is used for returning the test information to the request terminal.
11. A storage medium comprising a stored program, wherein the program, when executed, controls a device on which the storage medium is located to execute a method for retrieving trial information according to any one of claims 1 to 5, or a method for recommending a clinical trial according to any one of claims 6 to 7.
12. A processor for executing a program, wherein the program executes to perform the method for retrieving trial information according to any one of claims 1 to 5 or the method for recommending a clinical trial according to any one of claims 6 to 7.
13. A terminal, comprising:
a first device; a second device;
a third device; and
a fourth device;
a processor running a program, wherein the program is run to perform the following processing steps on data from the first, second, third and fourth devices:
a first step of analyzing a medical record vector of a target medical record by using a pre-trained medical record encoder;
a second step of calculating a relevance score of each clinical trial and the target medical record according to the medical record vector and a plurality of clinical trials in a preset database to obtain a plurality of relevance scores;
a third step of sorting the plurality of relevance scores to obtain a score sorting result; and
a fourth step of integrating the clinical tests corresponding to the relevance scores which are greater than a preset score threshold value in the score sorting result to obtain test information of the clinical tests which can be participated by the target patient and are related to the target medical record.
CN202210130064.8A 2022-02-11 2022-02-11 Test information retrieval method and device, clinical test recommendation method and terminal Pending CN114464328A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210130064.8A CN114464328A (en) 2022-02-11 2022-02-11 Test information retrieval method and device, clinical test recommendation method and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210130064.8A CN114464328A (en) 2022-02-11 2022-02-11 Test information retrieval method and device, clinical test recommendation method and terminal

Publications (1)

Publication Number Publication Date
CN114464328A true CN114464328A (en) 2022-05-10

Family

ID=81413306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210130064.8A Pending CN114464328A (en) 2022-02-11 2022-02-11 Test information retrieval method and device, clinical test recommendation method and terminal

Country Status (1)

Country Link
CN (1) CN114464328A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383344A (en) * 2023-05-25 2023-07-04 广东珠江智联信息科技股份有限公司 Data processing method and system for medical clinical study based on middle stage technology
CN116383344B (en) * 2023-05-25 2023-08-04 广东珠江智联信息科技股份有限公司 Data processing method and system for medical clinical study based on middle stage technology

Similar Documents

Publication Publication Date Title
WO2020007028A1 (en) Medical consultation data recommendation method, device, computer apparatus, and storage medium
KR102088980B1 (en) System and Method for Providing personalized hospital information
US10636515B2 (en) Medical or health information search support apparatus and medical or health information search support system
CN110289068A (en) Drug recommended method and equipment
CN103942463A (en) Medical guide system based on patient physical symptoms
CN110895568B (en) Method and system for processing court trial records
WO2023178971A1 (en) Internet registration method, apparatus and device for seeking medical advice, and storage medium
CN111710429A (en) Information pushing method and device, computer equipment and storage medium
CN112037905A (en) Medical question answering method, equipment and storage medium
CN112201359A (en) Artificial intelligence-based critical illness inquiry data identification method and device
de Herrera et al. Comparing fusion techniques for the ImageCLEF 2013 medical case retrieval task
CN113782195A (en) Physical examination package customization method and device
CN114330267A (en) Structural report template design method based on semantic association
CN113111159A (en) Question and answer record generation method and device, electronic equipment and storage medium
CN114464328A (en) Test information retrieval method and device, clinical test recommendation method and terminal
WO2023029510A1 (en) Remote diagnostic inquiry method and apparatus based on artificial intelligence, and device and medium
Grauer et al. Characterization of applicant preference signals, invitations for interviews, and inclusion on match lists for residency positions in urology
CN109299238B (en) Data query method and device
CN110752027A (en) Electronic medical record data pushing method and device, computer equipment and storage medium
CN114416929A (en) Sample generation method, device, equipment and storage medium of entity recall model
CN106502547A (en) Apply to electronic health record medical data inquiry system and the method for mobile terminal
CN117252664A (en) Medicine recommendation reason generation method, device, medium and equipment
CN111079021B (en) Method, device, server and storage medium for recommending medical information content
CN105701330A (en) Health information processing method and system
Ayadi et al. A medical image retrieval scheme with relevance feedback through a medical social network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination