WO2021190662A1

WO2021190662A1 - Medical text sorting method and apparatus, electronic device, and storage medium

Info

Publication number: WO2021190662A1
Application number: PCT/CN2021/084228
Authority: WO
Inventors: Li Chunyu (李春宇); Zhu Wei (朱威)
Original assignee: Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)
Priority date: 2020-10-31
Filing date: 2021-03-31
Publication date: 2021-09-30
Also published as: CN112307190A; CN112307190B

Abstract

A medical text sorting method and an apparatus, an electronic device, and a storage medium, relating to the technical field of medical technology. The method comprises: acquiring a query statement of a user; acquiring a plurality of candidate medical texts corresponding to the query statement; determining at least one sentence corresponding to each candidate medical text in the plurality of candidate medical texts; on the basis of the query statement and the at least one sentence, determining a score corresponding to each candidate medical text; and on the basis of the score corresponding to each candidate medical text, sorting the plurality of candidate medical texts. The present method is beneficial for improving the efficiency of retrieving medical texts.

Description

Medical literature sorting method, device, electronic equipment and storage medium

This application claims priority to Chinese patent application No. 202011206225.4, filed with the Chinese Patent Office on October 31, 2020 and entitled "Medical Document Sorting Method, Apparatus, Electronic Equipment, and Storage Medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the technical field of information recommendation, and specifically relates to a medical document sorting method, device, electronic equipment, and storage medium.

Background

The PubMed database contains a large amount of medical literature, and this mass of literature often reflects the development trends of a given medical field. Reading the medical literature of a field can improve the efficiency and accuracy of decision-making by researchers in related fields and by public health policy makers. The inventor found that, at present, users generally obtain medical documents from the PubMed database by entering a query sentence; the back end then performs keyword analysis on the query sentence, retrieves candidate documents through keyword matching, and displays the candidate documents on a visual interface for users to consult.

However, the inventor realized that as the number of medical documents in the PubMed database grows, each search returns a very large number of candidate documents. It is difficult for users to find the medical documents they want among so many candidates, so retrieval is inefficient, and users may fail to find the medical literature they are looking for at all.

Summary of the invention

The embodiments of the present application provide a medical document sorting method, device, electronic equipment, and storage medium. By scoring candidate medical literature, the retrieval efficiency of medical literature is improved.

In the first aspect, an embodiment of the present application provides a method for sorting medical documents, including:

Get the user's query statement;

Acquiring multiple candidate medical documents corresponding to the query sentence;

Determine at least one sentence corresponding to each candidate medical document in the plurality of candidate medical documents;

Determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;

According to the score corresponding to each candidate medical document, the multiple candidate medical documents are sorted.
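The five steps above can be sketched as a small Python pipeline. This is a minimal illustration, not the application's implementation: the names `rank_documents`, `recall`, `split_sentences`, and `score` are hypothetical, and each stage is passed in as a plain function so that any recall, sentence-splitting, or scoring method can be plugged in.

```python
from typing import Callable, List, Sequence, Tuple


def rank_documents(
    query: str,
    recall: Callable[[str], List[str]],
    split_sentences: Callable[[str], List[str]],
    score: Callable[[str, Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Recall candidates, split each into sentences, score, and sort (hypothetical API)."""
    candidates = recall(query)                         # acquire candidate documents
    scored = []
    for doc in candidates:
        sentences = split_sentences(doc)               # at least one sentence per document
        scored.append((doc, score(query, sentences)))  # sentence-based document score
    # Sort so that the highest-scoring document comes first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping the stages as injected functions mirrors the method's structure: the first recall, the sentence segmentation, and the scoring model can each be replaced without touching the sorting step.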

In the second aspect, an embodiment of the present application provides a medical document sorting device, including:

The transceiver unit is used to obtain the user's query statement;

A processing unit for obtaining multiple candidate medical documents corresponding to the query sentence;

The processing unit is further configured to determine at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents;

The processing unit is further configured to determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;

The processing unit is further configured to sort the multiple candidate medical documents according to the score corresponding to each candidate medical document.

In a third aspect, an embodiment of the present application provides an electronic device including a processor connected to a memory. The memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so that the electronic device executes the following method:

Get the user's query statement;

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program causes a computer to execute the following method:

Get the user's query statement;

This application scores the candidate medical documents (the medical documents returned by the first recall) based on sentence-level matching between the query sentence and each candidate medical document, and ranks the candidate medical documents by score (that is, performs a second recall). Users see the highest-scoring candidate medical documents first and can quickly find the medical literature they want, which improves the retrieval efficiency of medical literature.

Description of the drawings

FIG. 1 is a schematic flowchart of a method for sorting medical documents according to an embodiment of the application;

FIG. 2 is a schematic structural diagram of a neural network provided by an embodiment of this application;

FIG. 3 is a schematic flowchart of a neural network training method provided by an embodiment of this application;

FIG. 4 is a block diagram of the functional units of a medical document sorting device according to an embodiment of the application;

FIG. 5 is a schematic structural diagram of a medical document sorting device provided by an embodiment of the application.

Detailed description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application.

The technical solution of this application may involve the fields of artificial intelligence and/or big data technology, such as neural network technology, and can be applied to information retrieval scenarios such as information retrieval in the medical field, to realize digital medical care and promote the construction of smart cities. Optionally, the data involved in this application, such as query sentences and/or scores, can be stored in a database or in a blockchain, which is not limited in this application.

Refer to FIG. 1, which is a schematic flowchart of a method for sorting medical documents according to an embodiment of the application. This method is applied to a medical document sorting device. The method includes the following steps:

101: The medical document sorting device acquires the query sentence of the user.

Exemplarily, the query sentence may be manually input by the user in the information input field of the medical literature search device, or it may be obtained by performing voice recognition on the user’s voice. For example, the user’s voice is recognized through a voice assistant to obtain the query sentence. This application does not limit the way of obtaining the query sentence.

102: The medical document ranking device obtains multiple candidate medical documents corresponding to the query sentence.

Exemplarily, the similarity between the query sentence and each medical document in a medical database (for example, the PubMed database) is determined, and multiple candidate medical documents corresponding to the query sentence are obtained from the medical database according to the similarity of each medical document; for example, medical documents whose similarity is greater than a threshold are used as candidate medical documents.

Exemplarily, the similarity between the query sentence and the medical literature can be determined through a search server (for example, Elasticsearch or Solr), and multiple candidate medical documents corresponding to the query sentence can be obtained from the medical database according to the similarity. This constitutes the first recall of medical literature from the medical database. This application does not limit the way of obtaining candidate medical documents.
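As one possible first recall, the sketch below scores documents with Okapi BM25 (the default similarity in Elasticsearch; the application does not name a similarity function) and keeps the documents above a threshold. Whitespace tokenization, the parameters `k1=1.2` and `b=0.75`, and the function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of the query against each document (lowercased whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for t in tokenized for term in set(t))  # document frequency per term
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            s += idf * tf[term] * (k1 + 1) / denom
        scores.append(s)
    return scores


def recall_candidates(query: str, docs: List[str], threshold: float) -> List[Tuple[str, float]]:
    """First recall: keep (document, similarity) pairs whose similarity exceeds the threshold."""
    return [(d, s) for d, s in zip(docs, bm25_scores(query, docs)) if s > threshold]
```

With the query "lung cancer patient", a document that shares only the word "patient" still passes a zero threshold, which is the partial-match redundancy that motivates the entity matching described next.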

In an embodiment of the present application, the search server determines the similarity between the query sentence and a medical document mainly by locally (partially) matching the query sentence against each medical document. As a result, some candidate medical documents match only partially and are redundant. For example, if the query sentence is "lung cancer patient", local matching may treat every medical document that contains the word "patient" as a candidate medical document for the query sentence, producing many redundant candidates. Therefore, to improve the accuracy of obtaining candidate medical documents, after the multiple candidate medical documents are obtained, the entities in each candidate medical document and the entities in the query sentence are determined, and the similarity between the entities in the query sentence and the entities in each candidate medical document is determined. Finally, the similarity between the query sentence and each candidate medical document and the similarity between the entities in the query sentence and the entities in that candidate medical document are weighted to obtain a final similarity for each candidate medical document, and the candidate medical documents corresponding to the query sentence are selected from the multiple candidate medical documents according to their final similarities. Entity matching filters out candidate medical documents whose entities do not match. For example, if the query sentence is "lung cancer patient", entity matching can filter out candidate medical documents that do not contain the entity "lung cancer".
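The entity-weighted re-selection can be sketched as follows. The entity lexicon, the coverage-style entity similarity (the share of query entities found in the document), the equal weights, and the threshold are all hypothetical choices; the application only states that the two similarities are weighted. The text similarity is assumed to be normalized to [0, 1].

```python
from typing import List, Set, Tuple

# Hypothetical entity lexicon; a real system would use a medical NER model.
ENTITY_LEXICON = {"lung cancer", "diabetes", "heart failure", "chemotherapy"}


def extract_entities(text: str, lexicon: Set[str] = ENTITY_LEXICON) -> Set[str]:
    """Toy dictionary lookup: every lexicon entry that occurs in the text."""
    lowered = text.lower()
    return {entity for entity in lexicon if entity in lowered}


def entity_similarity(query: str, doc: str) -> float:
    """Share of the query's entities that also occur in the document (0 if none)."""
    query_entities = extract_entities(query)
    if not query_entities:
        return 0.0
    return len(query_entities & extract_entities(doc)) / len(query_entities)


def rerank_with_entities(
    query: str,
    candidates: List[Tuple[str, float]],
    w_text: float = 0.5,
    w_entity: float = 0.5,
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Weight text and entity similarity; keep candidates whose final similarity exceeds the threshold."""
    kept = []
    for doc, text_sim in candidates:
        final = w_text * text_sim + w_entity * entity_similarity(query, doc)
        if final > threshold:
            kept.append((doc, final))
    return kept
```

With equal weights and a threshold of 0.5, a document that matches the text but lacks the entity "lung cancer" can reach at most 0.5 and is therefore dropped.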

103: The medical document sorting device determines at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents.

Exemplarily, each of the multiple candidate medical documents can be split into sentences with an existing toolkit to obtain at least one sentence corresponding to each candidate medical document; for example, the Natural Language Toolkit (NLTK) can be used to split each medical document into sentences.
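A simplified sentence splitter is sketched below. It is a regular-expression stand-in, assumed here only to keep the example dependency-free; NLTK's `nltk.sent_tokenize` (which needs its Punkt model downloaded) also handles abbreviations such as "e.g." that this rule would split incorrectly.

```python
import re
from typing import List

# Split after ".", "!" or "?" when followed by whitespace (simplified rule).
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(document: str) -> List[str]:
    """Return the non-empty sentences of a document."""
    return [s.strip() for s in _SENT_BOUNDARY.split(document.strip()) if s.strip()]
```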

Exemplarily, NLTK can recognize delimiter symbols (for example, periods and other sentence delimiters) in each medical document, and the text between delimiters is taken as a candidate sentence of the medical document. Identical candidate sentences are then merged to obtain at least one sentence corresponding to each medical document. However, this merging only compares the words of the sentences one by one, so sentences with the same semantics but different wording survive and multiple redundant sentences are retained. For example, the sentence "lung cancer survival rate" and the sentence "survival rate of lung cancer" have the same semantics, but because they are worded differently they are not treated as identical, are kept as two separate sentences, and are not merged; this leaves many redundant sentences with the same semantics. Therefore, semantic recognition can be performed on each of the at least one sentence to obtain its semantics, and sentences with the same semantics can be merged, for example by keeping only one of several sentences with the same semantics. This filters out redundant sentences and improves the efficiency of scoring medical literature.
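The merge step can be sketched as follows. Real semantic recognition would typically compare sentence embeddings; here a crude stand-in (an assumption) treats two sentences as having the same semantics when they contain the same set of content words after removing a small, hypothetical stopword list. That is enough to merge "lung cancer survival rate" with "survival rate of lung cancer".

```python
from typing import FrozenSet, List

# Hypothetical stopword list used only by this illustration.
STOPWORDS = {"of", "the", "a", "an", "in", "for", "and", "to"}


def semantic_key(sentence: str) -> FrozenSet[str]:
    """Crude stand-in for semantic recognition: the set of content words."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return frozenset(w for w in words if w and w not in STOPWORDS)


def merge_sentences(sentences: List[str]) -> List[str]:
    """Keep only the first sentence of each group sharing the same key."""
    seen = set()
    kept = []
    for s in sentences:
        key = semantic_key(s)
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept
```

Because word order is ignored, this key would also merge "aspirin inhibits warfarin" with "warfarin inhibits aspirin", so it only shows where semantic merging fits in the pipeline.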

104: The medical document ranking device determines the score corresponding to each candidate medical document according to the query sentence and the at least one sentence.

Exemplarily, word embedding is performed on each first word in the query sentence to obtain a first word vector corresponding to each first word, and on each second word in sentence A to obtain a second word vector corresponding to each second word, where sentence A is any sentence in the at least one sentence corresponding to each medical document. The inverse document frequency (IDF) corresponding to the query sentence is then determined; that is, the IDF of the query sentence is determined according to the number of times the query sentence appears in the multiple candidate medical documents and the number of candidate medical documents. Then, the score of each medical document is determined according to the first word vector of each first word, the second word vector of each second word, and the IDF.
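The IDF step can be sketched as below. The application does not give a formula; this sketch assumes the standard form log(N / df) with add-one smoothing, so the value stays finite and non-negative, and counts df as the number of candidate documents that contain the query string.

```python
import math
from typing import List


def query_idf(query: str, candidates: List[str]) -> float:
    """IDF of the query over the candidate set: log((1 + N) / (1 + df)).

    df = number of candidates containing the query string (case-insensitive);
    the +1 smoothing is an assumption, not taken from the application.
    """
    n_docs = len(candidates)
    df = sum(query.lower() in doc.lower() for doc in candidates)
    return math.log((1 + n_docs) / (1 + df))
```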

Further, a third word vector corresponding to each first word is obtained according to the self-attention mechanism and the first word vector of each first word, and a fourth word vector corresponding to each second word is obtained according to the self-attention mechanism and the second word vector of each second word. A first feature matrix corresponding to sentence A is obtained according to the two-way attention mechanism, the third word vector of each first word, and the fourth word vector of each second word. Then, the score corresponding to each medical document is determined according to the first similarity matrix and the first feature matrix corresponding to each sentence in that medical document, together with the IDF of the query sentence.

Specifically, the similarity between the first word vector of each first word and the second word vector of each second word is determined to obtain a first similarity matrix corresponding to sentence A, where the element in row i, column j of the first similarity matrix represents the similarity between the i-th word in the query sentence and the j-th word in sentence A. The first similarity matrix is then pooled to obtain a second similarity matrix corresponding to sentence A. The pooling is getmax(k) processing, where k is the number of elements retained in each row of the first similarity matrix; for example, when k=3, the top 3 elements in each row of the first similarity matrix are retained and the remaining elements are deleted to obtain the second similarity matrix.
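The first and second similarity matrices can be sketched with NumPy. Cosine similarity, and reading getmax(k) as k-max pooling (keep the k largest values of each row in descending order), are assumptions; the application does not fix either choice.

```python
import numpy as np


def similarity_matrix(q_vecs: np.ndarray, s_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity: entry (i, j) compares query word i with sentence word j."""
    q = q_vecs / np.linalg.norm(q_vecs, axis=1, keepdims=True)
    s = s_vecs / np.linalg.norm(s_vecs, axis=1, keepdims=True)
    return q @ s.T  # shape (m, n)


def getmax_k(matrix: np.ndarray, k: int) -> np.ndarray:
    """k-max pooling: keep the k largest values of each row, in descending order."""
    k = min(k, matrix.shape[1])
    return -np.sort(-matrix, axis=1)[:, :k]  # shape (m, k)
```

For a 2 x 4 first similarity matrix and k=3, each row is reduced to its three largest similarities, so the second similarity matrix has a fixed width regardless of sentence length.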

Further, the first word vector of each first word is weighted according to the self-attention mechanism to obtain the third word vector corresponding to each first word. That is, the first word vector of each first word is transformed according to first preset parameters to obtain a first query vector, a first key vector, and a first value vector corresponding to each first word. Then, the similarity between the first query vector of first word A and the first key vector of each first word in the query sentence is determined and normalized to obtain the weight coefficient between first word A and each first word. The first value vectors of the first words are then weighted by these weight coefficients to obtain the third word vector corresponding to first word A, where first word A is any first word in the query sentence. Similarly, the second word vector of each second word is weighted according to the self-attention mechanism to obtain the fourth word vector of each second word; this is done in the same way as the weighting of the first word vectors described above and is not repeated here.
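The self-attention weighting can be sketched as scaled dot-product attention. The projection matrices `w_q`, `w_k`, `w_v` stand in for the "first preset parameters"; scaling by the square root of the key dimension and softmax normalization follow the standard Transformer formulation and are assumptions, since the application only says the similarities are normalized.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over the rows of x (one row per word)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                       # query, key, value vectors
    weights = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)  # row i: word i vs every word
    return weights @ v                                        # weighted value vectors, one per word
```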

Further, a two-way attention mechanism (co-attention) is used to weight the third word vector of each first word and the fourth word vector of each second word to obtain the first feature matrix corresponding to sentence A.

Exemplarily, the similarity between the third word vector of first word A and the fourth word vector of each second word is determined to obtain the weight coefficient between first word A and each second word. The fourth word vectors of the second words are weighted by these coefficients to obtain a fifth word vector corresponding to first word A. Next, the first maximum value among the weight coefficients between first word A and the second words is determined, and the fifth word vector of first word A is multiplied by this first maximum value to obtain the target word vector corresponding to first word A. The target word vectors of all first words in the query sentence form a first matrix.

Similarly, the similarity between the fourth word vector of second word B and the third word vector of each first word is determined to obtain the weight coefficient between second word B and each first word, where second word B is any second word in sentence A. The third word vectors of the first words are weighted by these coefficients to obtain a fifth word vector corresponding to second word B. The second maximum value among the weight coefficients between second word B and the first words is then determined, and the fifth word vector of second word B is multiplied by this second maximum value to obtain the target word vector corresponding to second word B. The target word vectors of all second words form a second matrix.

Finally, the first matrix, the second matrix, and a third matrix formed by the fourth word vectors of the second words are spliced to obtain the first feature matrix corresponding to sentence A. High-level semantic extraction is then performed on the first feature matrix to obtain the second feature matrix corresponding to sentence A.
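The co-attention steps above can be sketched as follows. Dot-product similarity, softmax normalization, and stacking the first, second, and third matrices along the row axis are assumptions made so that the shapes line up; the application does not state how the matrices are spliced.

```python
import numpy as np


def _softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def co_attention(third: np.ndarray, fourth: np.ndarray) -> np.ndarray:
    """third: (m, d) query-word vectors; fourth: (n, d) sentence-word vectors."""
    sim = third @ fourth.T                                # (m, n) dot-product similarities
    w_q2s = _softmax(sim, axis=1)                         # first word -> second words weights
    fifth_q = w_q2s @ fourth                              # (m, d) fifth word vectors of first words
    first = w_q2s.max(axis=1, keepdims=True) * fifth_q    # scale by the first maximum value
    w_s2q = _softmax(sim.T, axis=1)                       # second word -> first words weights
    fifth_s = w_s2q @ third                               # (n, d) fifth word vectors of second words
    second = w_s2q.max(axis=1, keepdims=True) * fifth_s   # scale by the second maximum value
    # Assumption: "splicing" stacks the three (., d) matrices along the row axis.
    return np.vstack([first, second, fourth])             # (m + 2n, d) first feature matrix
```

For a query of m words and a sentence of n words with d-dimensional vectors, this sketch yields a first feature matrix of m + 2n rows of width d.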

The second similarity matrix, the second feature matrix, and the IDF of the query sentence corresponding to sentence A are concatenated to obtain a third feature matrix corresponding to sentence A. If the second similarity matrix and the second feature matrix have different dimensions, they can first be mapped to the same dimension; the IDF is then copied to form a feature vector (for example, a column vector) whose dimension equals the number of rows of the dimension-transformed second similarity matrix and second feature matrix. The second similarity matrix, the second feature matrix, and this feature vector can then be spliced (for example, vertical splicing) to obtain the third feature matrix corresponding to sentence A.
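The splicing can be sketched as below, assuming both matrices have already been mapped to the same number of rows R (for example by a learned linear layer, which is not shown). The IDF is copied into an R x 1 column and appended to the right of the two matrices, which is one reading of the splicing described above.

```python
import numpy as np


def third_feature_matrix(sim2: np.ndarray, feat2: np.ndarray, idf: float) -> np.ndarray:
    """Splice the second similarity matrix, the second feature matrix and an IDF column.

    Assumes both matrices were already mapped to the same number of rows R.
    """
    if sim2.shape[0] != feat2.shape[0]:
        raise ValueError("map both matrices to the same number of rows first")
    idf_col = np.full((sim2.shape[0], 1), idf)  # IDF copied into an R x 1 column vector
    return np.hstack([sim2, feat2, idf_col])    # shape (R, k + h + 1)
```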

Then, feature extraction is performed on the third feature matrix of sentence A to obtain a fourth feature matrix corresponding to sentence A, and the fourth feature matrix is pooled (that is, getmax(k) processing) to obtain a fifth feature matrix corresponding to sentence A.

Finally, the fifth feature matrix of each sentence in the at least one sentence and the similarity between the query sentence and the candidate medical document are spliced to obtain a target feature matrix corresponding to each candidate medical document. Feature extraction is then performed on the target feature matrix of each candidate medical document to obtain its target feature vector, and classification is performed on the target feature vector to obtain the score of each candidate medical document.
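The final scoring can be sketched as follows. Padding or truncating to a fixed number of sentences, flattening, a single ReLU dense layer, and a sigmoid output read as the probability of a "relevant" class are all assumptions; the weights would come from training.

```python
from typing import List

import numpy as np


def document_score(
    fifth_mats: List[np.ndarray],
    first_stage_sim: float,
    w_hidden: np.ndarray,
    b_hidden: np.ndarray,
    w_out: np.ndarray,
    b_out: float,
    max_sents: int,
) -> float:
    """Score one candidate document from its sentences' fifth feature matrices.

    All fifth matrices are assumed to share one shape (r, c); sentences are
    zero-padded or truncated to max_sents so the input length is fixed.
    """
    r, c = fifth_mats[0].shape
    flat = [m.reshape(-1) for m in fifth_mats[:max_sents]]
    flat += [np.zeros(r * c)] * (max_sents - len(flat))
    target = np.concatenate(flat + [np.array([first_stage_sim])])  # spliced target features
    hidden = np.maximum(0.0, w_hidden @ target + b_hidden)          # dense layer + ReLU
    logit = float(w_out @ hidden + b_out)
    return float(1.0 / (1.0 + np.exp(-logit)))                      # sigmoid "relevant" probability
```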

105: The medical document sorting device sorts the multiple candidate medical documents according to the score corresponding to each candidate medical document.

Exemplarily, according to the score of each candidate medical document, the multiple candidate medical documents are sorted in descending order, and the sorted multiple candidate medical documents are displayed on the visual interface.

It can be seen that the candidate medical documents can be sorted according to sentence-level scores between the query sentence and the candidate medical documents (that is, a second recall), so that users see the highest-scoring candidate medical documents first, can quickly find the medical literature they want, and retrieve medical literature more efficiently.

In one embodiment of this application, the medical document ranking method can also be applied to smart medicine. For example, doctors can use the method to quickly find historical cases or historical documents that provide case references for the current diagnosis, which improves diagnostic efficiency and promotes the development of medical technology.

In an embodiment of the present application, the sorting of medical documents in the present application can be achieved by a neural network that has been trained. The training process of the neural network will be described in detail later, and no further description will be given here. The following describes the process of determining the target score of medical literature in conjunction with the accompanying drawings and taking sentence A as an example.

As shown in FIG. 2, the neural network includes embedding layer 1, embedding layer 2, convolutional layer 1, convolutional layer 2, self-attention layer 1, self-attention layer 2, a bidirectional attention layer, a feature extraction layer, dense network 1, and dense network 2. Convolutional layer 1 and convolutional layer 2 may be trigram convolution layers used for feature extraction. The feature extraction layer may be a long short-term memory (LSTM) network, also used for feature extraction. Self-attention layer 1, self-attention layer 2, and the bidirectional attention layer may be attention layers based on the BERT model.

Exemplarily, embedding layer 1 performs word embedding on each first word in the query sentence to obtain the first word vector of each first word, and embedding layer 2 performs word embedding on each second word in sentence A to obtain the second word vector of each second word. Then, the similarity between each first word and each second word is determined to obtain the first similarity matrix, and getmax(k) pooling is applied to the first similarity matrix to obtain the second similarity matrix.

Convolutional layer 1 performs feature extraction (that is, semantic feature extraction) on the first word vector of each first word to obtain a first semantic vector for each first word. Self-attention layer 1 applies self-attention weighting to the first semantic vectors to obtain the third word vector of each first word; it amplifies the features of the key words in the query sentence (that is, the words that play a key role in the query). Convolutional layer 2 performs feature extraction (that is, semantic feature extraction) on the second word vector of each second word to obtain a second semantic vector for each second word. Self-attention layer 2 applies self-attention weighting to the second semantic vectors to obtain the fourth word vector of each second word; it amplifies the features of the key words in the sentence (that is, the words that represent the medical document). Finally, the bidirectional attention layer applies bidirectional attention weighting to the third word vector of each first word and the fourth word vector of each second word to obtain the first feature matrix corresponding to sentence A.
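Convolutional layers 1 and 2 are described as trigram convolution layers; one such layer can be sketched in NumPy as a window-3 convolution over the word vectors. Zero padding at both ends (so the output keeps one row per word), the (3, d, h) kernel layout, and the ReLU activation are assumptions for illustration.

```python
import numpy as np


def trigram_conv(word_vecs: np.ndarray, kernel: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Window-3 convolution over a word sequence.

    word_vecs: (n, d); kernel: (3, d, h); bias: (h,). Each output row mixes a
    word with its left and right neighbours; zero padding keeps n output rows.
    """
    n, d = word_vecs.shape
    padded = np.vstack([np.zeros((1, d)), word_vecs, np.zeros((1, d))])
    out = np.stack([np.einsum("wd,wdh->h", padded[i:i + 3], kernel) for i in range(n)])
    return np.maximum(0.0, out + bias)  # ReLU activation (assumed)
```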

The feature extraction layer performs feature extraction on the first feature matrix of sentence A to obtain the second feature matrix of sentence A. Then, the second feature matrix, the second similarity matrix, and the IDF corresponding to sentence A are concatenated to obtain the third feature matrix corresponding to sentence A.

Dense network 1 performs feature extraction on the third feature matrix of sentence A to obtain the fourth feature matrix of sentence A, and getmax(k) pooling is applied to the fourth feature matrix to obtain the fifth feature matrix corresponding to sentence A.

Finally, the fifth feature matrix of each sentence and the first score corresponding to each candidate medical document (that is, the similarity between the query sentence and that document) are concatenated to obtain the target feature matrix corresponding to each candidate medical document.

Dense network 2 is used for feature extraction of the target feature matrix to obtain a target feature vector corresponding to each candidate medical document, and classify the target feature vector to obtain a score corresponding to each candidate medical document.

Refer to FIG. 3, which is a schematic flowchart of a neural network training method provided by an embodiment of the application. Content of this embodiment that is the same as in the embodiment shown in FIG. 1 is not repeated here. The method includes the following steps:

301: Obtain a query sample and multiple medical document samples corresponding to the query sample.

Each of the multiple medical document samples is labeled with a true score.

302: Perform sentence segmentation on each of the multiple medical document samples to obtain at least one sentence corresponding to each medical document sample.

303: Input the query sample and the at least one sentence corresponding to each medical document sample into the neural network to obtain a prediction score corresponding to each medical document sample.

304: Adjust the network parameters of the neural network according to the predicted score corresponding to each medical document sample and the actual score corresponding to each medical document sample.

Exemplarily, a loss is determined according to the predicted score and the true score of each medical document sample, and the network parameters of the neural network are adjusted according to the loss until the neural network converges, which completes the training of the neural network.
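The parameter-update loop can be sketched with a linear scorer standing in for the full network. Mean squared error as the loss, plain gradient descent, and stopping when the loss improvement falls below a tolerance (a simple notion of convergence) are assumptions; the application does not specify them.

```python
from typing import Tuple

import numpy as np


def train_scorer(
    features: np.ndarray,
    true_scores: np.ndarray,
    lr: float = 0.1,
    tol: float = 1e-8,
    max_epochs: int = 10_000,
) -> Tuple[np.ndarray, float, float]:
    """Fit a linear scorer to labelled scores by gradient descent on MSE."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    prev_loss = np.inf
    for _ in range(max_epochs):
        pred = features @ w + b                 # predicted scores
        err = pred - true_scores
        loss = float(np.mean(err ** 2))         # MSE between predicted and true scores
        if prev_loss - loss < tol:              # negligible improvement: treat as converged
            break
        prev_loss = loss
        w -= lr * (2.0 / n) * (features.T @ err)  # gradient of MSE w.r.t. w
        b -= lr * 2.0 * float(err.mean())         # gradient of MSE w.r.t. b
    return w, b, loss
```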

Refer to FIG. 4, which is a block diagram of the functional unit composition of a medical document sorting device provided by an embodiment of the present application. The medical document sorting device 400 includes: a transceiver unit 401 and a processing unit 402, wherein:

The transceiver unit 401 is configured to obtain the user's query sentence;

The processing unit 402 is configured to obtain multiple candidate medical documents corresponding to the query sentence;

The processing unit 402 is further configured to determine at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents;

The processing unit 402 is further configured to determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;

The processing unit 402 is further configured to sort the multiple candidate medical documents according to the score corresponding to each candidate medical document.

In some possible implementation manners, in terms of obtaining multiple candidate medical documents corresponding to the query sentence, the processing unit 402 is specifically configured to:

Determine the similarity between the query sentence and each medical document in the medical database;

According to the similarity corresponding to each medical document, a plurality of candidate medical documents are selected from the medical database.

In some possible implementation manners, in terms of determining the score corresponding to each candidate medical document according to the query sentence and the at least one sentence, the processing unit 402 is specifically configured to:

Performing word embedding processing on each first word in the query sentence to obtain a first word vector corresponding to each first word;

Performing word embedding processing on each second word in sentence A to obtain a second word vector corresponding to each second word, where the sentence A is any sentence in the at least one sentence;

Determine the inverse text frequency corresponding to the query sentence;

Determine the score corresponding to each medical document according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.

In some possible implementation manners, in terms of determining the score corresponding to each medical document according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word, the processing unit 402 is specifically configured to:

Determining the similarity between the first word vector of each first word and the second word vector of each second word to obtain a first similarity matrix;

Obtaining a third word vector corresponding to each first word according to the self-attention mechanism and the first word vector of each first word;

Obtaining the fourth word vector corresponding to each second word according to the self-attention mechanism and the second word vector of each second word;

Obtaining the first feature matrix according to the two-way attention mechanism, the third word vector corresponding to each of the first words, and the fourth word vector corresponding to each of the second words;

According to the inverse text frequency, the first similarity matrix, and the first feature matrix, the score corresponding to each medical document is determined.

In some possible implementation manners, in terms of determining the score corresponding to each medical document according to the inverse text frequency, the first similarity matrix, and the first feature matrix, the processing unit 402 is specifically configured to:

Pooling the first similarity matrix by using a getmax function to obtain a second similarity matrix;

Performing semantic feature extraction on the first feature matrix to obtain a second feature matrix;

Splicing the second similarity matrix, the second feature matrix, and the inverse text frequency to determine a third feature matrix corresponding to the sentence A;

According to the third feature matrix corresponding to each sentence in the at least one sentence, the score corresponding to each medical document is determined.
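A minimal sketch of how the third feature for one sentence might be assembled. Three assumptions are made because the text does not define them: "getmax" pooling is read as row-wise max pooling (the strongest match of each query word within the sentence); the learned semantic feature extractor is replaced by a hypothetical column-wise mean over the rows of the first feature matrix; and the spliced result is kept as one flat vector rather than a matrix.

```python
from typing import List

Matrix = List[List[float]]


def getmax_pool(sim: Matrix) -> List[float]:
    """Assumed reading of 'getmax': the maximum of each row, i.e. each query
    word's best match among the words of the sentence."""
    return [max(row) for row in sim]


def extract_semantic_features(feat: Matrix) -> List[float]:
    """Hypothetical stand-in for the learned extractor: column-wise mean."""
    n_rows = len(feat)
    return [sum(row[k] for row in feat) / n_rows for k in range(len(feat[0]))]


def third_feature(sim: Matrix, first_feature: Matrix, idf: List[float]) -> List[float]:
    """Splice (concatenate) pooled similarity, semantic features and query IDF."""
    return getmax_pool(sim) + extract_semantic_features(first_feature) + idf


if __name__ == "__main__":
    sim = [[0.1, 0.9], [0.4, 0.2]]
    print(third_feature(sim, [[1, 2], [3, 4]], [1.5, 1.2]))
```

Keeping the IDF values next to the pooled similarities lets a downstream scorer weight strong matches on rare query words more heavily.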

In some possible implementation manners, in terms of determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence, the processing unit 402 is specifically configured to:

Performing semantic feature extraction on the third feature matrix corresponding to each sentence to obtain the fourth feature matrix corresponding to each sentence;

Performing pooling processing on the fourth feature matrix corresponding to each sentence by using the getmax function to obtain the fifth feature matrix corresponding to each sentence;

Splicing the fifth feature matrix corresponding to each sentence and the similarity between the query sentence and each candidate medical document to obtain a target feature matrix corresponding to each candidate medical document;

According to the target feature matrix corresponding to each candidate medical document, the score corresponding to each candidate medical document is determined.
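The document-level steps can be sketched as below, under stated assumptions: the semantic feature extractor is replaced by a hypothetical element-wise tanh; "getmax" pooling reduces each sentence's fourth feature to its maximum value; documents are truncated or zero-padded to a hypothetical fixed number of sentences `max_sentences` so that every target feature has the same length; and the score comes from a hypothetical linear layer whose weights a real model would learn.

```python
import math
from typing import List, Sequence


def extract(third: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the learned semantic feature extractor."""
    return [math.tanh(x) for x in third]


def target_feature(third_per_sentence: List[List[float]], doc_similarity: float,
                   max_sentences: int) -> List[float]:
    """Build the target feature vector of one candidate document."""
    fifth = []
    for third in third_per_sentence[:max_sentences]:
        fourth = extract(third)    # fourth feature of this sentence
        fifth.append(max(fourth))  # assumed getmax pooling -> fifth feature
    fifth += [0.0] * (max_sentences - len(fifth))  # pad short documents
    return fifth + [doc_similarity]  # splice with query-document similarity


def score(target: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical linear scoring layer."""
    return sum(w * x for w, x in zip(weights, target)) + bias


if __name__ == "__main__":
    t = target_feature([[0.0, 1.0], [2.0, -1.0]], doc_similarity=0.5, max_sentences=3)
    print(t, score(t, [0.3, 0.3, 0.3, 1.0]))
```

Fixing the number of sentence slots is one simple way to give documents of different lengths a common input size; attention or pooling over sentences would be alternatives.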

In some possible implementation manners, in terms of obtaining the first feature matrix according to the two-way attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word, the processing unit 402 is specifically configured to:

Determine the similarity between the third word vector of a first word A and the fourth word vector of each second word to obtain a weight coefficient between the first word A and each second word, where the first word A is any word in the query sentence;

Weight the fourth word vector corresponding to each second word according to the weight coefficient between the first word A and that second word to obtain a fifth word vector corresponding to the first word A;

Determine a first maximum value among the weight coefficients between the first word A and the second words, and multiply the fifth word vector corresponding to the first word A by the first maximum value to obtain a target word vector corresponding to the first word A;

Form the target word vectors corresponding to the first words in the query sentence into a first matrix;

Determine the similarity between the fourth word vector of a second word B and the third word vector of each first word to obtain a weight coefficient between the second word B and each first word, where the second word B is any second word in the sentence A;

Weight the third word vector corresponding to each first word according to the weight coefficient between the second word B and that first word to obtain a fifth word vector corresponding to the second word B;

Determine a second maximum value among the weight coefficients between the second word B and the first words, and multiply the fifth word vector corresponding to the second word B by the second maximum value to obtain a target word vector corresponding to the second word B;

Form the target word vectors corresponding to the second words into a second matrix;

The first matrix, the second matrix, and the third matrix formed by the fourth word vector corresponding to each second word are spliced to obtain the first feature matrix.
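The two-way attention steps above can be sketched as follows. The text does not say how a similarity becomes a weight coefficient or along which axis the three matrices are spliced, so the sketch assumes a softmax over dot products and row-wise stacking, giving a matrix with (query length + 2 × sentence length) rows. "Dot multiplication" by the maximum weight is read as scaling the vector by that scalar. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    t = sum(e)
    return [v / t for v in e]


def _attend(sources: Matrix, targets: Matrix) -> Matrix:
    """For each source vector: weight coefficients over the targets (assumed
    softmax of dot products), weighted sum of targets (fifth word vector),
    then scaling by the largest coefficient (target word vector)."""
    dim = len(targets[0])
    out = []
    for x in sources:
        w = _softmax([sum(a * b for a, b in zip(x, y)) for y in targets])
        fifth = [sum(wj * y[k] for wj, y in zip(w, targets)) for k in range(dim)]
        w_max = max(w)
        out.append([w_max * v for v in fifth])
    return out


def first_feature_matrix(third: Matrix, fourth: Matrix) -> Matrix:
    """third: query word vectors after self-attention; fourth: sentence word
    vectors after self-attention."""
    first = _attend(third, fourth)   # one row per query word
    second = _attend(fourth, third)  # one row per sentence word
    # Assumed splice: stack the first, second and third (fourth-vector) matrices.
    return first + second + [list(v) for v in fourth]
```

Scaling by the maximum weight shrinks the attended vector of a word whose attention is spread thinly, so words with one clear counterpart contribute more strongly.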

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device includes a memory and a processor, and may optionally further include a transceiver. For example, as shown in FIG. 5, the electronic device 500 includes a transceiver 501, a processor 502, and a memory 503, which are connected to one another by a bus 504. The memory 503 is used to store computer programs and data, and can transmit the data stored in it to the processor 502.

The processor 502 is configured to read the computer program in the memory 503 to perform the following operations:

Control the transceiver 501 to obtain the user's query sentence;

In some possible implementation manners, in terms of obtaining multiple candidate medical documents corresponding to the query sentence, the processor 502 is specifically configured to perform the following operations:

In some possible implementation manners, in terms of determining the score corresponding to each candidate medical document according to the query sentence and the at least one sentence, the processor 502 is specifically configured to perform the following operations:

Determine the inverse text frequency corresponding to the query sentence;

In some possible implementation manners, in terms of determining the score corresponding to each medical document according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word, the processor 502 is specifically configured to perform the following operations:

In some possible implementation manners, in terms of determining the score corresponding to each medical document according to the inverse text frequency, the first similarity matrix, and the first feature matrix, the processor 502 is specifically configured to perform the following operations:

In some possible implementation manners, in terms of determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence, the processor 502 is specifically configured to perform the following operations:

In some possible implementation manners, in terms of obtaining the first feature matrix according to the two-way attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word, the processor 502 is specifically configured to perform the following operations:

Specifically, the transceiver 501 may be the transceiver unit 401 of the medical document sorting apparatus 400 in the embodiment shown in FIG. 4, and the processor 502 may be the processing unit 402 of the medical document sorting apparatus 400 in the embodiment shown in FIG. 4.

It should be understood that the medical document sorting device in this application may be a smartphone (such as an Android phone, an iOS phone, or a Windows Phone phone), a tablet computer, a handheld computer, a notebook computer, a mobile Internet device (MID), a wearable device, or the like. These devices are examples rather than an exhaustive list, and the medical document sorting device is not limited to them. In practical applications, the medical document sorting device may also be an intelligent vehicle-mounted terminal, a computer device, and so on.

The embodiments of the present application also provide a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements part or all of the steps of any medical document sorting method described in the above method embodiments.

Optionally, the storage medium involved in this application, such as a computer-readable storage medium, may be non-volatile or volatile.

The embodiments of the present application also provide a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform part or all of the steps of any medical document sorting method described in the above method embodiments.

In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

The functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or in the form of software program modules.

If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application essentially, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The embodiments of the application are described in detail above, and specific examples are used herein to illustrate the principles and implementations of the application. The descriptions of the above embodiments are only intended to help understand the method and core idea of the application. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and the scope of application based on the idea of the application. In summary, the content of this specification should not be construed as a limitation on the application.

Claims

A medical document sorting method, comprising:

Get the user's query statement;

Acquiring multiple candidate medical documents corresponding to the query sentence;

Determine at least one sentence corresponding to each candidate medical document in the plurality of candidate medical documents;

Determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;

According to the score corresponding to each candidate medical document, the multiple candidate medical documents are sorted.
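The overall flow of the claimed method (split each candidate into sentences, score it against the query, sort by score) can be sketched with a pluggable scorer. The naive regex sentence splitter and the word-overlap scorer `overlap_score` are hypothetical placeholders, not the neural scoring model described in the specification.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def overlap_score(query: str, sentences: List[str]) -> float:
    """Hypothetical placeholder scorer: best per-sentence word overlap."""
    q = set(query.lower().split())
    return float(max((len(q & set(s.lower().strip(".!?").split())) for s in sentences),
                     default=0))


def rank_documents(query: str, docs: List[str],
                   score_fn: Callable[[str, List[str]], float]) -> List[Tuple[float, str]]:
    """Score every candidate document and sort by descending score."""
    scored = [(score_fn(query, split_sentences(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = ["Insulin treats diabetes.", "Aspirin reduces stroke risk. It is cheap."]
    for s, d in rank_documents("aspirin stroke", docs, overlap_score):
        print(s, d)
```

Passing the scorer in as a function keeps the sorting step independent of the scoring model, so the placeholder can be replaced without touching the ranking logic.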
The method according to claim 1, wherein said obtaining a plurality of candidate medical documents corresponding to said query sentence comprises:

Determine the similarity between the query sentence and each medical document in the medical database;

According to the similarity corresponding to each medical document, a plurality of candidate medical documents are selected from the medical database.
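Candidate selection by similarity can be sketched as a top-k search over the database. The claim does not fix the similarity measure, so bag-of-words cosine similarity is used here as an assumed stand-in; BM25 or embedding similarity would be common alternatives. The function names and the parameter `k` are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import List, Tuple


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)  # missing terms count as 0
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_candidates(query: str, database: List[str], k: int) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the query, most similar first."""
    return heapq.nlargest(k, ((bow_cosine(query, d), d) for d in database),
                          key=lambda pair: pair[0])


if __name__ == "__main__":
    db = ["aspirin stroke prevention", "insulin diabetes", "aspirin dosage"]
    print(select_candidates("aspirin stroke", db, 2))
```

`heapq.nlargest` avoids sorting the whole database when only a small number of candidates is needed for the more expensive scoring stage.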
The method according to claim 1 or 2, wherein the determining the score corresponding to each candidate medical document according to the query sentence and the at least one sentence comprises:

Performing word embedding processing on each first word in the query sentence to obtain a first word vector corresponding to each first word;

Performing word embedding processing on each second word in sentence A to obtain a second word vector corresponding to each second word, where the sentence A is any sentence in the at least one sentence;

Determine the inverse text frequency corresponding to the query sentence;

Determine the score corresponding to each medical document according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
The method according to claim 3, wherein the determining the score corresponding to each medical document according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word comprises:

Determining the similarity between the first word vector of each first word and the second word vector of each second word to obtain a first similarity matrix;

Obtaining a third word vector corresponding to each first word according to the self-attention mechanism and the first word vector of each first word;

Obtaining the fourth word vector corresponding to each second word according to the self-attention mechanism and the second word vector of each second word;

Obtaining the first feature matrix according to the two-way attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word;

According to the inverse text frequency, the first similarity matrix, and the first feature matrix, the score corresponding to each medical document is determined.
The method according to claim 4, wherein the determining the score corresponding to each medical document according to the inverse text frequency, the first similarity matrix, and the first feature matrix comprises:

Pooling the first similarity matrix by using a getmax function to obtain a second similarity matrix;

Performing semantic feature extraction on the first feature matrix to obtain a second feature matrix;

Splicing the second similarity matrix, the second feature matrix, and the inverse text frequency to determine a third feature matrix corresponding to the sentence A;

According to the third feature matrix corresponding to each sentence in the at least one sentence, the score corresponding to each medical document is determined.
The method according to claim 5, wherein the determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence comprises:

Performing semantic feature extraction on the third feature matrix corresponding to each sentence to obtain the fourth feature matrix corresponding to each sentence;

Performing pooling processing on the fourth feature matrix corresponding to each sentence by using the getmax function to obtain the fifth feature matrix corresponding to each sentence;

Splicing the fifth feature matrix corresponding to each sentence and the similarity between the query sentence and each candidate medical document to obtain a target feature matrix corresponding to each candidate medical document;

According to the target feature matrix corresponding to each candidate medical document, the score corresponding to each candidate medical document is determined.
The method according to any one of claims 4-6, wherein the obtaining the first feature matrix according to the two-way attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word comprises:

Determine the similarity between the third word vector of a first word A and the fourth word vector of each second word to obtain a weight coefficient between the first word A and each second word, where the first word A is any word in the query sentence;

Weight the fourth word vector corresponding to each second word according to the weight coefficient between the first word A and that second word to obtain a fifth word vector corresponding to the first word A;

Determine a first maximum value among the weight coefficients between the first word A and the second words, and multiply the fifth word vector corresponding to the first word A by the first maximum value to obtain a target word vector corresponding to the first word A;

Form the target word vectors corresponding to the first words in the query sentence into a first matrix;

Determine the similarity between the fourth word vector of a second word B and the third word vector of each first word to obtain a weight coefficient between the second word B and each first word, where the second word B is any second word in the sentence A;

Weight the third word vector corresponding to each first word according to the weight coefficient between the second word B and that first word to obtain a fifth word vector corresponding to the second word B;

Determine a second maximum value among the weight coefficients between the second word B and the first words, and multiply the fifth word vector corresponding to the second word B by the second maximum value to obtain a target word vector corresponding to the second word B;

Form the target word vectors corresponding to the second words into a second matrix;

The first matrix, the second matrix, and the third matrix formed by the fourth word vector corresponding to each second word are spliced to obtain the first feature matrix.
A medical document sorting device, comprising:

The transceiver unit is used to obtain the user's query statement;

A processing unit for obtaining multiple candidate medical documents corresponding to the query sentence;

The processing unit is further configured to determine at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents;

The processing unit is further configured to determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;

The processing unit is further configured to sort the multiple candidate medical documents according to the score corresponding to each candidate medical document.
An electronic device, comprising: a processor connected to a memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device executes the following method:

Get the user's query statement;

Acquiring multiple candidate medical documents corresponding to the query sentence;

Determine at least one sentence corresponding to each candidate medical document in the plurality of candidate medical documents;

Determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;

According to the score corresponding to each candidate medical document, the multiple candidate medical documents are sorted.
The electronic device according to claim 9, wherein the determining the score corresponding to each candidate medical document according to the query sentence and the at least one sentence comprises:

Performing word embedding processing on each first word in the query sentence to obtain a first word vector corresponding to each first word;

Performing word embedding processing on each second word in sentence A to obtain a second word vector corresponding to each second word, where the sentence A is any sentence in the at least one sentence;

Determine the inverse text frequency corresponding to the query sentence;

Determine the score corresponding to each medical document according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
The electronic device according to claim 10, wherein the determining the score corresponding to each medical document according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word comprises:

Determining the similarity between the first word vector of each first word and the second word vector of each second word to obtain a first similarity matrix;

Obtaining a third word vector corresponding to each first word according to the self-attention mechanism and the first word vector of each first word;

Obtaining the fourth word vector corresponding to each second word according to the self-attention mechanism and the second word vector of each second word;

Obtaining the first feature matrix according to the two-way attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word;

According to the inverse text frequency, the first similarity matrix, and the first feature matrix, the score corresponding to each medical document is determined.
The electronic device according to claim 11, wherein the determining the score corresponding to each medical document according to the inverse text frequency, the first similarity matrix, and the first feature matrix comprises:

Pooling the first similarity matrix by using a getmax function to obtain a second similarity matrix;

Performing semantic feature extraction on the first feature matrix to obtain a second feature matrix;

Splicing the second similarity matrix, the second feature matrix, and the inverse text frequency to determine a third feature matrix corresponding to the sentence A;

According to the third feature matrix corresponding to each sentence in the at least one sentence, the score corresponding to each medical document is determined.
The electronic device according to claim 12, wherein the determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence comprises:

Performing semantic feature extraction on the third feature matrix corresponding to each sentence to obtain the fourth feature matrix corresponding to each sentence;

Performing pooling processing on the fourth feature matrix corresponding to each sentence by using the getmax function to obtain the fifth feature matrix corresponding to each sentence;

Splicing the fifth feature matrix corresponding to each sentence and the similarity between the query sentence and each candidate medical document to obtain a target feature matrix corresponding to each candidate medical document;

According to the target feature matrix corresponding to each candidate medical document, the score corresponding to each candidate medical document is determined.
The electronic device according to any one of claims 11-13, wherein the obtaining the first feature matrix according to the two-way attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word comprises:

Determine the similarity between the third word vector of a first word A and the fourth word vector of each second word to obtain a weight coefficient between the first word A and each second word, where the first word A is any word in the query sentence;

Weight the fourth word vector corresponding to each second word according to the weight coefficient between the first word A and that second word to obtain a fifth word vector corresponding to the first word A;

Determine a first maximum value among the weight coefficients between the first word A and the second words, and multiply the fifth word vector corresponding to the first word A by the first maximum value to obtain a target word vector corresponding to the first word A;

Form the target word vectors corresponding to the first words in the query sentence into a first matrix;

Determine the similarity between the fourth word vector of a second word B and the third word vector of each first word to obtain a weight coefficient between the second word B and each first word, where the second word B is any second word in the sentence A;

Weight the third word vector corresponding to each first word according to the weight coefficient between the second word B and that first word to obtain a fifth word vector corresponding to the second word B;

Determine a second maximum value among the weight coefficients between the second word B and the first words, and multiply the fifth word vector corresponding to the second word B by the second maximum value to obtain a target word vector corresponding to the second word B;

Form the target word vectors corresponding to the second words into a second matrix;

The first matrix, the second matrix, and the third matrix formed by the fourth word vector corresponding to each second word are spliced to obtain the first feature matrix.
A computer-readable storage medium in which a computer program is stored, and the computer program is executed by a processor to implement the following method:

Get the user's query statement;

Acquiring multiple candidate medical documents corresponding to the query sentence;

Determine at least one sentence corresponding to each candidate medical document in the plurality of candidate medical documents;

Determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;

According to the score corresponding to each candidate medical document, the multiple candidate medical documents are sorted.
The computer-readable storage medium according to claim 15, wherein the determining the score corresponding to each candidate medical document according to the query sentence and the at least one sentence comprises:

Performing word embedding processing on each first word in the query sentence to obtain a first word vector corresponding to each first word;

Performing word embedding processing on each second word in sentence A to obtain a second word vector corresponding to each second word, where the sentence A is any sentence in the at least one sentence;

Determine the inverse text frequency corresponding to the query sentence;

Determine the score corresponding to each medical document according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
The computer-readable storage medium according to claim 16, wherein the determining the score corresponding to each medical document according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word comprises:

Determining the similarity between the first word vector of each first word and the second word vector of each second word to obtain a first similarity matrix;

Obtaining the third word vector corresponding to each first word according to the self-attention mechanism and the first word vector of each first word;

Obtaining the fourth word vector corresponding to each second word according to the self-attention mechanism and the second word vector of each second word;

Obtaining the first feature matrix according to the two-way attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word;

According to the inverse text frequency, the first similarity matrix, and the first feature matrix, the score corresponding to each medical document is determined.
The computer-readable storage medium according to claim 17, wherein the determining the score corresponding to each medical document according to the inverse text frequency, the first similarity matrix, and the first feature matrix comprises:

Pooling the first similarity matrix by using a getmax function to obtain a second similarity matrix;

Performing semantic feature extraction on the first feature matrix to obtain a second feature matrix;

Splicing the second similarity matrix, the second feature matrix, and the inverse text frequency to determine a third feature matrix corresponding to the sentence A;

According to the third feature matrix corresponding to each sentence in the at least one sentence, the score corresponding to each medical document is determined.
The computer-readable storage medium according to claim 18, wherein the determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence comprises:

Performing semantic feature extraction on the third feature matrix corresponding to each sentence to obtain the fourth feature matrix corresponding to each sentence;

Performing pooling processing on the fourth feature matrix corresponding to each sentence by using the getmax function to obtain the fifth feature matrix corresponding to each sentence;

Splicing the fifth feature matrix corresponding to each sentence and the similarity between the query sentence and each candidate medical document to obtain a target feature matrix corresponding to each candidate medical document;

According to the target feature matrix corresponding to each candidate medical document, the score corresponding to each candidate medical document is determined.
The computer-readable storage medium according to any one of claims 17-19, wherein the obtaining the first feature matrix according to the two-way attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word comprises:

Determine the similarity between the third word vector of a first word A and the fourth word vector of each second word to obtain a weight coefficient between the first word A and each second word, where the first word A is any word in the query sentence;

Weight the fourth word vector corresponding to each second word according to the weight coefficient between the first word A and that second word to obtain a fifth word vector corresponding to the first word A;

Determine a first maximum value among the weight coefficients between the first word A and the second words, and multiply the fifth word vector corresponding to the first word A by the first maximum value to obtain a target word vector corresponding to the first word A;

Form the target word vectors corresponding to the first words in the query sentence into a first matrix;

Determine the similarity between the fourth word vector of a second word B and the third word vector of each first word to obtain a weight coefficient between the second word B and each first word, where the second word B is any second word in the sentence A;

Weight the third word vector corresponding to each first word according to the weight coefficient between the second word B and that first word to obtain a fifth word vector corresponding to the second word B;

Determine a second maximum value among the weight coefficients between the second word B and the first words, and multiply the fifth word vector corresponding to the second word B by the second maximum value to obtain a target word vector corresponding to the second word B;

Form the target word vectors corresponding to the second words into a second matrix;

The first matrix, the second matrix, and the third matrix formed by the fourth word vector corresponding to each second word are spliced to obtain the first feature matrix.