WO2021190662A1 - Medical text sorting method and apparatus, electronic device, and storage medium

Publication number
WO2021190662A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
sentence
matrix
word vector
medical document
Prior art date
Application number
PCT/CN2021/084228
Other languages
French (fr)
Chinese (zh)
Inventor
李春宇
朱威
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021190662A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/335: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/3332: Query translation
    • G06F16/3334: Selection or weighting of terms from queries, including natural language queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/338: Presentation of query results
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the technical field of information recommendation, and specifically relates to a medical document sorting method, device, electronic equipment, and storage medium.
  • The public medicine (PUBMED) database contains a large amount of medical literature, and this mass of literature often reflects the development trend of the research directions in a given medical field.
  • By reading the medical literature of a field, researchers in related fields and relevant public health policy makers can make decisions more efficiently and accurately.
  • The inventor found that, at present, users generally obtain medical documents from the PUBMED database by entering a query sentence; the background then performs some keyword analysis on the query sentence, searches for candidate documents through keyword matching, and displays the candidate documents to the user on a visual interface for easy reference.
  • the embodiments of the present application provide a medical document sorting method, device, electronic equipment, and storage medium.
  • the retrieval efficiency of medical literature is improved.
  • an embodiment of the present application provides a method for sorting medical documents, including:
  • the multiple candidate medical documents are sorted.
  • an embodiment of the present application provides a medical document sorting device, including:
  • the transceiver unit is used to obtain the user's query statement
  • the processing unit is further configured to determine at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents;
  • the processing unit is further configured to determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;
  • the processing unit is further configured to sort the multiple candidate medical documents according to the score corresponding to each candidate medical document.
  • an embodiment of the present application provides an electronic device, including a processor connected to a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device executes the following method:
  • the multiple candidate medical documents are sorted.
  • an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program causes a computer to execute the following method:
  • the multiple candidate medical documents are sorted.
  • This application can score candidate medical documents (medical documents recalled for the first time) based on the sentence dimension scores of query sentences and candidate medical documents, and rank candidate medical documents based on the scores (that is, perform a second recall). Users can see the candidate medical literature with the highest score first, can quickly find the medical literature they want to obtain, and improve the retrieval efficiency of medical literature.
  • FIG. 1 is a schematic flowchart of a method for sorting medical documents according to an embodiment of the application
  • FIG. 2 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • FIG. 3 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
  • FIG. 4 is a block diagram of functional units of a device for sorting medical documents according to an embodiment of the application.
  • Fig. 5 is a schematic structural diagram of a medical document sorting device provided by an embodiment of the application.
  • the technical solution of this application may involve the field of artificial intelligence and/or big data technology, such as neural network technology, and can be applied to information retrieval scenarios, such as information retrieval in the medical field, to realize digital medical care and promote the construction of smart cities.
  • the data involved in this application such as query sentences and/or scores, can be stored in a database, or can be stored in a blockchain, which is not limited in this application.
  • FIG. 1 is a schematic flowchart of a method for sorting medical documents according to an embodiment of the application. This method is applied to a medical document sorting device. The method includes the following steps:
  • the medical document sorting device acquires the query sentence of the user.
  • the query sentence may be manually input by the user in the information input field of the medical literature search device, or it may be obtained by performing voice recognition on the user’s voice.
  • the user’s voice is recognized through a voice assistant to obtain the query sentence. This application does not limit the way of obtaining the query sentence.
  • the medical document ranking device obtains multiple candidate medical documents corresponding to the query sentence.
  • Multiple candidate medical documents corresponding to the query sentence are obtained from a medical database (for example, the Public Medicine (PUBMED) database); for example, medical documents whose similarity to the query sentence is greater than a threshold are used as candidate medical documents.
  • The similarity between the query sentence and each medical document can be determined through a search server (for example, Elasticsearch or Solr), and multiple candidate medical documents corresponding to the query sentence can be obtained from the medical database according to that similarity. That is, a recall is performed on the medical literature in the medical database.
  • This application does not limit the way of obtaining candidate medical documents.
  • The search server determines the similarity between the query sentence and a medical document mainly by locally matching the query sentence against each medical document. As a result, some candidate medical documents match only partially and are redundant.
  • For example, if the query sentence is "lung cancer patient", all medical documents containing "patient" may be regarded as candidate medical documents corresponding to the query sentence, which yields multiple redundant candidate medical documents.
  • Therefore, to improve the accuracy of the candidate medical documents, after the multiple candidate medical documents are obtained, the entities in each candidate medical document and the entities in the query sentence are determined, and the similarity between the entities in the query sentence and the entities in each candidate medical document is determined. Finally, the similarity between the query sentence and each candidate medical document and the similarity between their entities are weighted to obtain the final similarity corresponding to each candidate medical document, and the candidate medical documents corresponding to the query sentence are selected from the multiple candidate medical documents according to that final similarity.
  • Through entity matching, candidate medical documents whose entities do not match can be filtered out. For example, if the query sentence is "lung cancer patient", candidate medical documents that do not contain the entity "lung cancer" can be filtered out.
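The weighting of text similarity and entity similarity described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Candidate` class, the Jaccard overlap used as the entity similarity, the equal weighting, and the 0.5 threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    text_similarity: float   # query-to-document similarity from the search server
    entities: frozenset      # entities extracted from the document (hypothetical)


def entity_similarity(query_entities, doc_entities):
    """Jaccard overlap between two entity sets (an assumed measure)."""
    if not query_entities or not doc_entities:
        return 0.0
    return len(query_entities & doc_entities) / len(query_entities | doc_entities)


def select_candidates(query_entities, candidates, text_weight=0.5, threshold=0.5):
    """Blend text and entity similarity, then keep documents above the threshold."""
    kept = []
    for cand in candidates:
        ent_sim = entity_similarity(query_entities, cand.entities)
        final = text_weight * cand.text_similarity + (1.0 - text_weight) * ent_sim
        if final > threshold:
            kept.append((cand.doc_id, final))
    return kept


query_entities = frozenset({"lung cancer"})
candidates = [
    Candidate("doc-1", 0.8, frozenset({"lung cancer"})),
    Candidate("doc-2", 0.8, frozenset({"diabetes"})),
]
selected = select_candidates(query_entities, candidates)
```

Both documents receive the same text similarity here, so only the entity term separates them: the document without the "lung cancer" entity falls below the threshold and is dropped.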
  • the medical document sorting device determines at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents.
  • Each of the multiple candidate medical documents can be segmented through an existing toolkit to obtain at least one sentence corresponding to each candidate medical document.
  • For example, the Natural Language Toolkit (NLTK), a natural language processing toolkit, can be used to split each medical document into sentences.
  • The identification symbols (for example, a period or a document symbol) in each medical document can be recognized through NLTK, and the medical text between identification symbols is used as a candidate sentence of the medical document. Identical candidate sentences are then merged to obtain at least one sentence corresponding to each medical document.
  • For example, the sentence "lung cancer survival rate" and the sentence "survival rate of lung cancer" have the same semantics, but because they are expressed differently they cannot be regarded as two identical sentences. They are treated as two separate sentences and are not merged, which results in many redundant sentences with the same semantics.
  • Therefore, semantic recognition can be performed on each sentence in the at least one sentence to obtain its semantics, and sentences with the same semantics can be merged. For example, only one of multiple sentences with the same semantics is kept, which filters out redundant sentences and improves the efficiency of scoring medical literature.
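A rough sketch of the split-then-merge step is given below. It deliberately avoids any NLTK call and uses a regular expression as a stand-in for the toolkit's sentence splitter; the "same semantics" test is approximated by comparing sorted content words, which is only a crude proxy for the semantic recognition the text describes. The stop-word list and function names are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would use a semantic model instead.
STOP_WORDS = {"of", "the", "a", "an", "in", "for"}


def split_sentences(document):
    """Split on sentence-ending punctuation (stand-in for a toolkit such as NLTK)."""
    parts = re.split(r"[.!?;]+", document)
    return [p.strip() for p in parts if p.strip()]


def semantic_key(sentence):
    """Crude proxy for 'same semantics': the sorted set of content words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return tuple(sorted(set(words) - STOP_WORDS))


def merge_sentences(sentences):
    """Keep only the first sentence seen for each semantic key."""
    seen = set()
    kept = []
    for sentence in sentences:
        key = semantic_key(sentence)
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept


doc = "Lung cancer survival rate. Survival rate of lung cancer. Treatment options vary."
sentences = merge_sentences(split_sentences(doc))
```

With this proxy, "Lung cancer survival rate" and "Survival rate of lung cancer" share one key, so only the first is kept.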
  • the medical document ranking device determines the score corresponding to each candidate medical document according to the query sentence and the at least one sentence.
  • For example, word embedding processing is performed on each first word in the query sentence to obtain the first word vector corresponding to each first word, and on each second word in sentence A to obtain the second word vector corresponding to each second word, where sentence A is any sentence in the at least one sentence corresponding to each medical document. The inverse document frequency (IDF) corresponding to the query sentence is then determined, that is, the IDF of the query sentence is determined according to the number of times the query sentence appears in the multiple candidate medical documents and the number of the multiple candidate medical documents.
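The text names the two inputs to the IDF (how often the query appears among the candidates, and how many candidates there are) but not the exact formula. The sketch below assumes a common smoothed form, log(N / (1 + n)), and counts the documents that contain the query string; both choices are assumptions.

```python
import math


def query_idf(query, documents):
    """Smoothed IDF: log(N / (1 + n)), n = number of documents containing the query."""
    if not documents:
        raise ValueError("at least one candidate document is required")
    total = len(documents)
    containing = sum(1 for doc in documents if query.lower() in doc.lower())
    return math.log(total / (1 + containing))


docs = [
    "Lung cancer patient outcomes after surgery.",
    "Diabetes management in elderly patients.",
    "Screening of the lung cancer patient cohort.",
    "Hypertension and stroke risk.",
]
idf = query_idf("lung cancer patient", docs)
```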
  • the score of each medical document is determined.
  • According to the self-attention mechanism and the first word vector of each first word, the third word vector corresponding to each first word is obtained; according to the self-attention mechanism and the second word vector of each second word, the fourth word vector corresponding to each second word is obtained; and according to the two-way attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word, the first feature matrix corresponding to sentence A is obtained.
  • The similarity between the first word vector of each first word and the second word vector of each second word is determined to obtain the first similarity matrix corresponding to sentence A, where the j-th element of the i-th row of the first similarity matrix represents the similarity between the i-th word in the query sentence and the j-th word in sentence A.
  • For example, the first 3 elements in each row of the first similarity matrix can be retained and the remaining elements deleted to obtain the second similarity matrix.
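The similarity matrix and its per-row reduction can be sketched as below. Two points are assumptions rather than statements from the text: cosine similarity is used as the word-to-word measure, and "the first 3 elements" is read as the 3 largest values of each row, in line with the getmax(k) pooling mentioned in the network description.

```python
import math


def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(query_vectors, sentence_vectors):
    """Entry [i][j] compares query word i with sentence word j."""
    return [[cosine(q, s) for s in sentence_vectors] for q in query_vectors]


def top_k_per_row(matrix, k=3):
    """Keep the k largest values of each row, in descending order."""
    return [sorted(row, reverse=True)[:k] for row in matrix]


# Toy 2-dimensional word vectors, made up for the example.
query_vectors = [[1.0, 0.0], [0.0, 1.0]]
sentence_vectors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, 0.0]]
first = similarity_matrix(query_vectors, sentence_vectors)
second = top_k_per_row(first, k=3)
```

Reducing every row to a fixed width gives the later layers an input whose shape does not depend on the length of sentence A.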
  • The first word vector of each first word is weighted according to the self-attention mechanism to obtain the third word vector corresponding to each first word. That is, the first word vector of each first word is transformed according to the first preset parameter to obtain the first query vector, first key vector, and first value vector corresponding to each first word. Then, the similarity between the first query vector corresponding to the first word A and the first key vector of each first word in the query sentence is determined and normalized to obtain the weight coefficient between the first word A and each first word. Next, according to these weight coefficients, the first value vector corresponding to each first word is weighted to obtain the third word vector corresponding to the first word A, where the first word A is any first word in the query sentence. Similarly, according to the self-attention mechanism, the second word vector of each second word is weighted to obtain the fourth word vector corresponding to each second word.
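The query/key/value weighting for the query sentence can be sketched in plain Python as follows. The scaled dot product as the similarity and softmax as the normalization are assumptions borrowed from the usual formulation of self-attention; the projection matrices stand in for the "first preset parameter" and are set to the identity purely for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Normalize scores into weight coefficients that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(word_vectors, w_query, w_key, w_value):
    """Return one re-weighted vector per input word."""
    queries = [matvec(w_query, v) for v in word_vectors]
    keys = [matvec(w_key, v) for v in word_vectors]
    values = [matvec(w_value, v) for v in word_vectors]
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity of this word's query vector to every word's key vector.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]   # placeholder for learned parameters
first_word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
third_word_vectors = self_attention(first_word_vectors, identity, identity, identity)
```

The same function applied to the second word vectors of sentence A would give the fourth word vectors.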
  • A two-way attention mechanism (co-attention) is used to weight the third word vector of each first word and the fourth word vector of each second word to obtain the first feature matrix corresponding to sentence A.
  • For example, the similarity between the third word vector of the first word A and the fourth word vector of each second word is determined to obtain the weight coefficient between the first word A and each second word. Then, according to these weight coefficients, the fourth word vector corresponding to each second word is weighted to obtain the fifth word vector corresponding to the first word A. Further, the first maximum value among the weight coefficients between the first word A and each second word is determined, and the first maximum value is multiplied with the fifth word vector corresponding to the first word A to obtain the target feature vector corresponding to the first word A. The target feature vectors corresponding to each first word in the query sentence form a first matrix. The similarity between the fourth word vector of the second word B and the third word vector of each first word is likewise determined.
  • The second similarity matrix, the second feature matrix, and the inverse text frequency of the query sentence corresponding to sentence A are concatenated (concat) to obtain the third feature matrix corresponding to sentence A. It should be understood that if the dimensions of the second similarity matrix and the second feature matrix differ, they can first be mapped to the same dimension. The inverse text frequency is then copied to obtain a corresponding feature vector (for example, a column vector) whose dimension equals the number of rows of the second similarity matrix and the second feature matrix after the dimension transformation. In this way, the second similarity matrix, the second feature matrix, and the feature vector can be spliced (for example, vertical splicing) to obtain the third feature matrix corresponding to sentence A.
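A small sketch of the splice is shown below. It assumes the two matrices already have the same number of rows and appends the copied IDF as one extra column so that rows stay aligned; the text gives vertical splicing only as an example, so this layout is an assumption, and the numbers are made up.

```python
def concat_features(similarity, features, idf):
    """Join two row-aligned matrices and a copied IDF column, row by row."""
    if len(similarity) != len(features):
        raise ValueError("matrices must have the same number of rows")
    idf_column = [idf] * len(similarity)   # the IDF copied once per row
    return [
        list(sim_row) + list(feat_row) + [value]
        for sim_row, feat_row, value in zip(similarity, features, idf_column)
    ]


second_similarity = [[1.0, 0.7, 0.0], [1.0, 0.7, 0.0]]
second_feature = [[0.2, 0.4], [0.6, 0.8]]
third_feature = concat_features(second_similarity, second_feature, idf=0.29)
```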
  • The fifth feature matrix corresponding to each sentence in the at least one sentence and the similarity between the query sentence and each candidate medical document are spliced to obtain the target feature matrix corresponding to each candidate medical document. Then, feature extraction is performed on the target feature matrix corresponding to each candidate medical document to obtain the target feature vector corresponding to each candidate medical document, and classification is performed according to that target feature vector to obtain the score of each candidate medical document.
  • the medical document sorting device sorts the multiple candidate medical documents according to the score corresponding to each candidate medical document.
  • the multiple candidate medical documents are sorted in descending order, and the sorted multiple candidate medical documents are displayed on the visual interface.
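Once every candidate has a score, the ordering step itself is a plain descending sort. The document identifiers and scores below are invented for illustration.

```python
def rank_documents(scores):
    """Return (doc_id, score) pairs ordered from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


scores = {"doc-a": 0.42, "doc-b": 0.91, "doc-c": 0.67}
ranking = rank_documents(scores)
```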
  • The candidate medical documents can be scored according to the sentence-dimension scores of the query sentence and the candidate medical documents and sorted by those scores (that is, a second recall), so that users first see the candidate medical documents with the highest scores and can quickly find the medical literature they want, which improves the retrieval efficiency of medical literature.
  • The medical document ranking method of this application can also be applied to the field of smart medicine. For example, doctors can use the method to quickly find historical cases or historical documents, which provides case references for the current diagnosis, improves diagnosis efficiency, and promotes the development of medical technology.
  • the sorting of medical documents in the present application can be achieved by a neural network that has been trained.
  • the training process of the neural network will be described in detail later, and no further description will be given here.
  • the following describes the process of determining the target score of medical literature in conjunction with the accompanying drawings and taking sentence A as an example.
  • Embedding layer 1 is used to perform word embedding processing on each first word in the query sentence to obtain the first word vector corresponding to each first word, and embedding layer 2 is used to perform word embedding processing on each second word in sentence A to obtain the second word vector corresponding to each second word. Then, the first similarity between each first word and each second word is determined to obtain the first similarity matrix, and getmax(k) pooling is applied to the first similarity matrix to obtain the second similarity matrix.
  • Convolutional layer 1 is used for feature extraction (i.e., semantic feature extraction) on the first word vector of each first word to obtain the first semantic vector corresponding to each first word; self-attention layer 1 is used to perform self-attention weighting on the first semantic vector corresponding to each first word to obtain the third word vector corresponding to each first word. Through self-attention layer 1, the features of the key words in the query sentence (that is, the words that play a key role in the query sentence) can be amplified.
  • Convolutional layer 2 is used to perform feature extraction (i.e., semantic feature extraction) on the second word vector corresponding to each second word to obtain the second semantic vector corresponding to each second word; self-attention layer 2 is used to perform self-attention weighting on the second semantic vector corresponding to each second word to obtain the fourth word vector corresponding to each second word. Through self-attention layer 2, the features of key words (that is, words that can represent the medical literature) are amplified.
  • Finally, the two-way attention layer is used to perform two-way attention weighting on the third word vector of each first word and the fourth word vector corresponding to each second word to obtain the first feature matrix corresponding to sentence A;
  • the feature extraction network is used to perform feature extraction on the first feature matrix corresponding to sentence A to obtain the second feature matrix corresponding to sentence A; then, the second feature matrix, the second similarity matrix, and the inverse text frequency corresponding to sentence A are concatenated to obtain the third feature matrix corresponding to sentence A;
  • Dense network 1 is used to perform feature extraction on the third feature matrix corresponding to sentence A to obtain the fourth feature matrix corresponding to sentence A, and getmax processing is performed on the fourth feature matrix to obtain the fifth feature matrix corresponding to sentence A;
  • Dense network 2 is used for feature extraction of the target feature matrix to obtain a target feature vector corresponding to each candidate medical document, and classify the target feature vector to obtain a score corresponding to each candidate medical document.
  • FIG. 3 is a schematic flowchart of a neural network training method provided by an embodiment of the application.
  • the content in this embodiment is the same as that in the embodiment shown in FIG. 1, and the description will not be repeated here.
  • the method includes the following steps:
  • each of the multiple medical literature samples is marked with a true score.
  • the loss is determined according to the predicted score corresponding to each medical document sample and the true score marked for each medical document sample, and the network parameters of the neural network are adjusted according to the loss until the neural network converges, which yields the trained neural network.
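The predict, compare, and adjust loop can be illustrated with a deliberately tiny stand-in model. The sketch below uses a linear scorer, a mean squared error loss, and plain gradient descent, with "converged" taken to mean that the loss has stopped changing; none of these choices come from the text, which does not name the loss function or the optimizer, and the samples are invented.

```python
def predict(weights, features):
    """Predicted score of one sample under a linear stand-in model."""
    return sum(w * x for w, x in zip(weights, features))


def mse_loss(weights, samples):
    """Mean squared error between predicted and labelled scores."""
    return sum((predict(weights, x) - y) ** 2 for x, y in samples) / len(samples)


def train(samples, learning_rate=0.1, tolerance=1e-10, max_steps=10000):
    weights = [0.0] * len(samples[0][0])
    previous = mse_loss(weights, samples)
    for _ in range(max_steps):
        # Gradient of the mean squared error with respect to each weight.
        grads = [0.0] * len(weights)
        for x, y in samples:
            error = predict(weights, x) - y
            for i, xi in enumerate(x):
                grads[i] += 2.0 * error * xi / len(samples)
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
        current = mse_loss(weights, samples)
        if abs(previous - current) < tolerance:   # treat as converged
            break
        previous = current
    return weights


# (feature vector, labelled true score) pairs; the values are made up.
samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
trained = train(samples)
final_loss = mse_loss(trained, samples)
```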
  • the medical document sorting device 400 includes: a transceiver unit 401 and a processing unit 402, wherein:
  • the transceiver unit 401 is used to obtain user query sentences
  • the processing unit 402 is configured to obtain multiple candidate medical documents corresponding to the query sentence;
  • the processing unit 402 is further configured to determine at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents;
  • the processing unit 402 is further configured to determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;
  • the processing unit 402 is further configured to sort the multiple candidate medical documents according to the score corresponding to each candidate medical document.
  • the processing unit 402 is specifically configured to:
  • a plurality of candidate medical documents are selected from the medical database.
  • the processing unit 402 is specifically configured to:
  • the score corresponding to each medical document is determined.
  • the score of each medical document is determined according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
  • the processing unit 402 is specifically used for:
  • the score corresponding to each medical document is determined.
  • the processing unit 402 is specifically configured to:
  • the score corresponding to each medical document is determined.
  • the processing unit 402 is specifically configured to:
  • the score corresponding to each candidate medical document is determined.
  • the processing unit 402 is specifically used for:
  • the first word A is any word in the query sentence
  • weighting is performed on the fourth word vector corresponding to each second word to obtain the fifth word vector corresponding to the first word A.
  • the first matrix, the second matrix, and the third matrix formed by the fourth word vector corresponding to each second word are spliced to obtain the first feature matrix.
  • FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the application.
  • the electronic device includes a memory and a processor.
  • the electronic device may further include a transceiver.
  • the electronic device 500 includes a transceiver 501, a processor 502, and a memory 503. They are connected by a bus 504 between them.
  • the memory 503 is used to store computer programs and data, and can transmit the data stored in the memory 503 to the processor 502.
  • the processor 502 is configured to read the computer program in the memory 503 to perform the following operations:
  • the multiple candidate medical documents are sorted.
  • the processor 502 is specifically configured to perform the following operations:
  • a plurality of candidate medical documents are selected from the medical database.
  • the processor 502 is specifically configured to perform the following operations:
  • the score corresponding to each medical document is determined.
  • the score of each medical document is determined according to the inverse text frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
  • the processor 502 is specifically configured to perform the following operations:
  • the score corresponding to each medical document is determined.
  • the processor 502 is specifically configured to perform the following operations:
  • the processor 502 in determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence, the processor 502 is specifically configured to perform the following operations:
  • the score corresponding to each candidate medical document is determined.
  • the processor 502 is specifically configured to perform the following operations:
  • the first word A is any word in the query sentence
  • weighting is performed on the fourth word vector corresponding to each second word to obtain the fifth word vector corresponding to the first word A.
  • the first matrix, the second matrix, and the third matrix formed by the fourth word vector corresponding to each second word are spliced to obtain the first feature matrix.
  • the transceiver 501 may be the transceiver unit 401 of the medical document sorting apparatus 400 in the embodiment shown in FIG. 4, and the processor 502 may be the processing unit 402 of the medical document sorting apparatus 400 in the embodiment shown in FIG. 4.
  • the medical document sorting device in this application may include smart phones (such as Android phones, iOS phones, and Windows Phone phones), tablet computers, handheld computers, notebook computers, mobile Internet devices (MID for short), wearable devices, and the like.
  • the aforementioned medical document sorting devices are only examples rather than an exhaustive list; the medical document sorting device includes but is not limited to them.
  • the above-mentioned medical document sorting device may also include: intelligent vehicle-mounted terminals, computer equipment, and so on.
  • the embodiments of the present application also provide a computer-readable storage medium that stores a computer program, and the computer program is executed by a processor to implement part or all of the steps of any one of the medical document sorting methods described in the above method embodiments.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.
  • the embodiments of the present application also provide a computer program product that includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any one of the medical document sorting methods described in the above method embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of software program modules.
  • if the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product, and the computer software product is stored in a memory.
  • the software product includes a number of instructions that enable a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned memory includes media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Abstract

A medical text sorting method and an apparatus, an electronic device, and a storage medium, relating to the technical field of medical technology. The method comprises: acquiring a query statement of a user; acquiring a plurality of candidate medical texts corresponding to the query statement; determining at least one sentence corresponding to each candidate medical text in the plurality of candidate medical texts; on the basis of the query statement and the at least one sentence, determining a score corresponding to each candidate medical text; and on the basis of the score corresponding to each candidate medical text, sorting the plurality of candidate medical texts. The present method is beneficial for improving the efficiency of retrieving medical texts.

Description

Medical document sorting method, apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202011206225.4, filed with the China Patent Office on October 31, 2020 and entitled "Medical Document Sorting Method, Apparatus, Electronic Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the technical field of information recommendation, and in particular to a medical document sorting method, apparatus, electronic device, and storage medium.
Background
The public medicine (PUBMED) database contains a large number of medical documents, and this body of literature often reflects the development trends of research directions in a given medical field. Reading the medical documents of a field can improve the efficiency and accuracy of decision-making by researchers in related fields and by public health policy makers. The inventors found that, at present, a user generally obtains medical documents from the PUBMED database by entering a query sentence; the back end then performs keyword parsing on the query sentence, retrieves candidate documents through keyword matching, and displays the candidate documents on a visual interface for the user to consult.
However, the inventors realized that, as the number of medical documents in the PUBMED database grows, each search returns a very large number of candidate documents, and it is difficult for the user to pick out the desired medical documents from them. As a result, the user's retrieval of medical documents is inefficient, and the user may even fail to obtain the medical documents being sought.
Summary of the invention
The embodiments of this application provide a medical document sorting method, apparatus, electronic device, and storage medium. Scoring the candidate medical documents improves the efficiency of medical document retrieval.
In a first aspect, an embodiment of this application provides a medical document sorting method, including:
obtaining a query sentence of a user;
obtaining multiple candidate medical documents corresponding to the query sentence;
determining at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents;
determining a score corresponding to each candidate medical document according to the query sentence and the at least one sentence;
sorting the multiple candidate medical documents according to the score corresponding to each candidate medical document.
In a second aspect, an embodiment of this application provides a medical document sorting apparatus, including:
a transceiver unit, configured to obtain a query sentence of a user;
a processing unit, configured to obtain multiple candidate medical documents corresponding to the query sentence;
the processing unit being further configured to determine at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents;
the processing unit being further configured to determine a score corresponding to each candidate medical document according to the query sentence and the at least one sentence;
the processing unit being further configured to sort the multiple candidate medical documents according to the score corresponding to each candidate medical document.
In a third aspect, an embodiment of this application provides an electronic device, including a processor connected to a memory, where the memory is configured to store a computer program and the processor is configured to execute the computer program stored in the memory, so that the electronic device performs the following method:
obtaining a query sentence of a user;
obtaining multiple candidate medical documents corresponding to the query sentence;
determining at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents;
determining a score corresponding to each candidate medical document according to the query sentence and the at least one sentence;
sorting the multiple candidate medical documents according to the score corresponding to each candidate medical document.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, where the computer program causes a computer to perform the following method:
obtaining a query sentence of a user;
obtaining multiple candidate medical documents corresponding to the query sentence;
determining at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents;
determining a score corresponding to each candidate medical document according to the query sentence and the at least one sentence;
sorting the multiple candidate medical documents according to the score corresponding to each candidate medical document.
This application can score the candidate medical documents (the medical documents from the first recall) based on sentence-level scores between the query sentence and the candidate medical documents, and sort the candidate medical documents by score (that is, perform a second recall). The user therefore sees the highest-scoring candidate medical documents first and can quickly find the desired medical documents, which improves the efficiency of medical document retrieval.
Description of the drawings
FIG. 1 is a schematic flowchart of a medical document sorting method according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a neural network according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a neural network training method according to an embodiment of this application;
FIG. 4 is a block diagram of the functional units of a medical document sorting apparatus according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of a medical document sorting apparatus according to an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application are described below with reference to the drawings in the embodiments of this application.
The technical solution of this application may involve the fields of artificial intelligence and/or big data, for example neural network technology, and can be applied to information retrieval scenarios such as information retrieval in the medical field, so as to realize digital healthcare and promote the construction of smart cities. Optionally, the data involved in this application, such as query sentences and/or scores, can be stored in a database or in a blockchain, which is not limited in this application.
Refer to FIG. 1, which is a schematic flowchart of a medical document sorting method according to an embodiment of this application. The method is applied to a medical document sorting apparatus and includes the following steps:
101: The medical document sorting apparatus obtains a query sentence of a user.
Exemplarily, the query sentence may be entered manually by the user in the information input field of the medical document search apparatus, or it may be obtained by performing speech recognition on the user's voice, for example by recognizing the user's voice through a voice assistant. This application does not limit the way in which the query sentence is obtained.
102: The medical document sorting apparatus obtains multiple candidate medical documents corresponding to the query sentence.
Exemplarily, the similarity between the query sentence and each medical document in a medical database (for example, the public medicine (PUBMED) database) is determined, and multiple candidate medical documents corresponding to the query sentence are obtained from the medical database according to these similarities. For example, medical documents whose similarity is greater than a threshold are taken as candidate medical documents.
Exemplarily, a search server (for example, Elasticsearch or Solr) can be used to determine the similarity between the query sentence and the medical documents, and multiple candidate medical documents corresponding to the query sentence are obtained from the medical database according to the similarity; that is, a first recall is performed on the medical documents in the medical database. This application does not limit the way in which the candidate medical documents are obtained.
In one embodiment of this application, when the search server determines the similarity between the query sentence and the medical documents, it mainly performs local matching between the query sentence and each medical document. This yields candidate medical documents that match only partially and are therefore redundant. For example, if the query sentence is "lung cancer patient", local matching may treat every medical document containing "patient" as a candidate corresponding to the query sentence, producing many redundant candidate medical documents. Therefore, to improve the precision of the candidate medical documents, after the multiple candidate medical documents are obtained, the entities in each candidate medical document and the entities in the query sentence are determined, and the similarity between the entities in the query sentence and the entities in the candidate medical document is determined. Finally, the similarity between the query sentence and each candidate medical document and the similarity between the entities in the query sentence and the entities in that candidate medical document are weighted to obtain a final similarity for each candidate medical document, and the candidate medical documents corresponding to the query sentence are selected from the multiple candidate medical documents according to these final similarities. Entity matching filters out candidate medical documents whose entities do not match. For example, if the query sentence is "lung cancer patient", entity matching can filter out candidate medical documents that do not contain the entity "lung cancer".
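The weighting of text similarity and entity similarity described above can be sketched as follows. This is only an illustrative sketch: the Jaccard overlap used as the entity similarity, the weight `alpha`, the `threshold`, and the dictionary keys are all assumptions introduced here, since the description does not specify them.

```python
def entity_similarity(query_entities, doc_entities):
    """Jaccard overlap between two entity sets (an assumed similarity measure)."""
    q, d = set(query_entities), set(doc_entities)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def final_similarity(text_sim, query_entities, doc_entities, alpha=0.5):
    """Weighted sum of text-level and entity-level similarity.

    alpha is a hypothetical weight; the description does not give its value.
    """
    return alpha * text_sim + (1.0 - alpha) * entity_similarity(query_entities, doc_entities)


def filter_candidates(candidates, query_entities, threshold=0.4, alpha=0.5):
    """Keep the candidates whose final similarity reaches the assumed threshold."""
    kept = []
    for doc in candidates:
        score = final_similarity(doc["text_sim"], query_entities, doc["entities"], alpha)
        if score >= threshold:
            kept.append((doc["id"], score))
    return kept


# Both documents match the word "patient" equally well at the text level,
# but only doc1 shares the entity "lung cancer" with the query.
candidates = [
    {"id": "doc1", "text_sim": 0.6, "entities": ["lung cancer", "patient"]},
    {"id": "doc2", "text_sim": 0.6, "entities": ["diabetes", "patient"]},
]
kept = filter_candidates(candidates, ["lung cancer"])
```

With these assumed values, the document without the "lung cancer" entity falls below the threshold and is filtered out, which mirrors the example in the text.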
103: The medical document sorting apparatus determines at least one sentence corresponding to each candidate medical document among the multiple candidate medical documents.
Exemplarily, an existing toolkit can be used to split each of the multiple candidate medical documents into sentences, to obtain at least one sentence corresponding to each candidate medical document. For example, the Natural Language Toolkit (NLTK) can be used to split each medical document into sentences.
Exemplarily, NLTK can be used to identify the delimiting punctuation marks (for example, periods and question marks) in each medical document, and the text between two such marks is taken as a candidate sentence of that medical document. Identical candidate sentences are then merged to obtain at least one sentence corresponding to each medical document. However, this merging only compares the words of the sentences one by one, so sentences with the same meaning are still retained, leaving multiple redundant sentences. For example, the sentence "lung cancer survival rate" and the sentence "survival rate of lung cancer" have the same meaning, but because they are worded differently they are not treated as identical; they are kept as two separate sentences and not merged, which leaves many redundant sentences with the same meaning. Therefore, semantic recognition can be performed on each of the at least one sentence to obtain its semantics, and sentences with the same semantics are merged, for example by keeping only one of several sentences with the same semantics. This filters out redundant sentences and improves the efficiency of scoring the medical documents.
104: The medical document sorting apparatus determines the score corresponding to each candidate medical document according to the query sentence and the at least one sentence.
Exemplarily, word embedding is performed on each first word in the query sentence to obtain a first word vector corresponding to each first word; word embedding is performed on each second word in sentence A to obtain a second word vector corresponding to each second word, where sentence A is any one of the at least one sentence corresponding to each medical document; and the inverse document frequency (IDF) of the query sentence is determined, that is, the IDF of the query sentence is determined from the number of times the query sentence appears in the multiple candidate medical documents and the number of the candidate medical documents. The score of each medical document is then determined according to the first word vector of each first word, the second word vector of each second word, and the IDF.
Further, a third word vector corresponding to each first word is obtained according to a self-attention mechanism and the first word vector of each first word, and a fourth word vector corresponding to each second word is obtained according to the self-attention mechanism and the second word vector of each second word. A first feature matrix corresponding to sentence A is obtained according to a bidirectional attention mechanism, the third word vector of each first word, and the fourth word vector of each second word. The score corresponding to each medical document is then determined according to the first similarity matrix and the first feature matrix corresponding to each sentence in that medical document, and the IDF of the query sentence.
Specifically, the similarity between the first word vector of each first word and the second word vector of each second word is determined to obtain a first similarity matrix between the query sentence and sentence A, where the element in row i and column j of the first similarity matrix represents the similarity between the i-th word in the query sentence and the j-th word in sentence A. The first similarity matrix is pooled to obtain a second similarity matrix corresponding to sentence A, where the pooling is getmax(k) processing and k is the number of elements to retain in each row of the first similarity matrix. For example, when k = 3, the three largest elements in each row of the first similarity matrix are retained and the remaining elements are deleted, which yields the second similarity matrix.
Further, the first word vector of each first word is weighted according to a self-attention mechanism to obtain the third word vector corresponding to that first word. Specifically, the first word vector of each first word is transformed according to first preset parameters to obtain a first query vector, a first key vector, and a first value vector for that word. The similarity between the first query vector of first word A and the first key vector of each first word in the query sentence is then determined, and these similarities are normalized to obtain the weight coefficient between first word A and each first word. The first value vectors of the first words are then weighted by these coefficients to obtain the third word vector corresponding to first word A, where first word A is any first word in the query sentence. Likewise, the second word vector of each second word is weighted according to the self-attention mechanism to obtain the fourth word vector of each second word; the procedure is similar to that described above for the first word vectors and is not repeated here.
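The self-attention weighting described above can be sketched in plain Python. The sketch assumes dot-product similarity, softmax normalization, and scaling by the square root of the key dimension (the usual scaled dot-product form); the description only says "similarity" and "normalization". The matrices `w_q`, `w_k`, and `w_v` play the role of the "first preset parameters" and are set to the identity purely for illustration.

```python
import math


def matvec(vec, weight):
    """Multiply a row vector by a weight matrix given as a list of rows."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*weight)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(word_vecs, w_q, w_k, w_v):
    """Return one attention-weighted vector per input word vector."""
    queries = [matvec(x, w_q) for x in word_vecs]
    keys = [matvec(x, w_k) for x in word_vecs]
    values = [matvec(x, w_v) for x in word_vecs]
    scale = math.sqrt(len(keys[0]))  # assumed scaled dot-product form
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)  # weight coefficients between this word and every word
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
first_word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
third_word_vectors = self_attention(first_word_vectors, identity, identity, identity)
```

The same function would be applied to the second word vectors of sentence A to obtain the fourth word vectors, with its own parameter matrices.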
Further, a bidirectional attention mechanism (co-attention) is used to weight the third word vector of each first word and the fourth word vector of each second word, to obtain the first feature matrix corresponding to sentence A. Exemplarily, the similarity between the third word vector of first word A and the fourth word vector of each second word is determined to obtain the weight coefficient between first word A and each second word; the fourth word vectors of the second words are then weighted by these coefficients to obtain the fifth word vector corresponding to first word A. The first maximum value among the weight coefficients between first word A and the second words is then determined and multiplied with the fifth word vector of first word A to obtain the target word vector of first word A, and the target word vectors of all first words in the query sentence form a first matrix. Similarly, the similarity between the fourth word vector of second word B and the third word vector of each first word is determined to obtain the weight coefficient between second word B and each first word, where second word B is any second word in sentence A; the third word vectors of the first words are weighted by these coefficients to obtain the fifth word vector corresponding to second word B. The second maximum value among the weight coefficients between second word B and the first words is then determined and multiplied with the fifth word vector of second word B to obtain the target word vector of second word B, and the target word vectors of all second words form a second matrix. Finally, the first matrix, the second matrix, and a third matrix formed by the fourth word vectors of the second words are concatenated to obtain the first feature matrix corresponding to sentence A. High-level semantic extraction is then performed on the first feature matrix to obtain the second feature matrix corresponding to sentence A.
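The bidirectional attention step can be sketched as follows. Dot-product similarity, softmax normalization, and row-wise concatenation of the three matrices are assumptions made for this sketch; the description does not fix these details, and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(sources, targets):
    """For each source vector: attend over targets, then scale by the largest weight."""
    rows = []
    for s in sources:
        weights = softmax([dot(s, t) for t in targets])        # weight coefficients
        context = [sum(w * t[d] for w, t in zip(weights, targets))
                   for d in range(len(targets[0]))]            # the "fifth word vector"
        peak = max(weights)                                     # the maximum weight coefficient
        rows.append([peak * c for c in context])                # the "target word vector"
    return rows


def co_attention(third_vecs, fourth_vecs):
    first_matrix = attend(third_vecs, fourth_vecs)   # one row per query word
    second_matrix = attend(fourth_vecs, third_vecs)  # one row per sentence word
    third_matrix = [list(v) for v in fourth_vecs]    # the fourth word vectors themselves
    return first_matrix + second_matrix + third_matrix  # assumed row-wise concatenation


third_vecs = [[1.0, 0.0], [0.0, 1.0]]                 # query side (2 words)
fourth_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # sentence side (3 words)
first_feature_matrix = co_attention(third_vecs, fourth_vecs)
```

Scaling each attended vector by its largest attention weight dampens words whose attention is spread thinly over the other side, which seems to be the intent of multiplying by the maximum weight coefficient.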
The second similarity matrix and the second feature matrix corresponding to sentence A and the inverse document frequency of the query sentence are concatenated (concat) to obtain a third feature matrix corresponding to sentence A. It should be understood that, if the second similarity matrix and the second feature matrix have different dimensions, they can first be mapped to the same dimensions; the inverse document frequency is then replicated to obtain a corresponding feature vector (for example, a column vector) whose dimension equals the number of rows of the dimension-transformed second similarity matrix and second feature matrix. The second similarity matrix, the second feature matrix, and this feature vector can then be concatenated (for example, vertically) to obtain the third feature matrix corresponding to sentence A.
Feature extraction is then performed on the third feature matrix corresponding to sentence A to obtain a fourth feature matrix corresponding to sentence A. Further, the fourth feature matrix corresponding to sentence A is pooled, that is, subjected to getmax(k) processing, to obtain a fifth feature matrix corresponding to sentence A.
Finally, the fifth feature matrix corresponding to each of the at least one sentence and the similarity between the query sentence and each candidate medical document are concatenated to obtain a target feature matrix corresponding to each candidate medical document. Feature extraction is then performed on the target feature matrix of each candidate medical document to obtain a target feature vector for that document, and classification is performed on the target feature vector to obtain the score of each candidate medical document.
105: The medical document sorting apparatus sorts the multiple candidate medical documents according to the score corresponding to each candidate medical document.
Exemplarily, the multiple candidate medical documents are sorted in descending order of score, and the sorted candidate medical documents are displayed on a visual interface.
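The final sorting step is simple enough to show directly. The document identifiers and scores below are made-up illustrative values, and `rank_documents` is a hypothetical helper name.

```python
def rank_documents(scored_documents):
    """Sort (document_id, score) pairs from the highest to the lowest score."""
    return sorted(scored_documents, key=lambda item: item[1], reverse=True)


# Hypothetical scores produced by the scoring step.
scored = [("doc_a", 0.42), ("doc_b", 0.91), ("doc_c", 0.67)]
ranked = rank_documents(scored)
```

Python's `sorted` is stable, so documents with equal scores keep their first-recall order, which may be a reasonable tie-breaking rule.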
It can be seen that the candidate medical documents can be sorted according to the sentence-level scores between the query sentence and the candidate medical documents (that is, a second recall is performed). The user therefore sees the highest-scoring candidate medical documents first and can quickly find the desired medical documents, which improves the efficiency of medical document retrieval.
In one embodiment of this application, the medical document sorting method of this application can also be applied to the field of smart healthcare. For example, a doctor can use the method to quickly find historical cases or historical documents, which provide case references for the current diagnosis, improve diagnostic efficiency, and promote the development of medical technology.
In one embodiment of this application, the sorting of medical documents can be implemented by a trained neural network. The training process of the neural network is described in detail later and is not elaborated here. The process of determining the target score of a medical document is described below with reference to the drawings, taking sentence A as an example.
As shown in FIG. 2, the neural network includes embedding layer 1, embedding layer 2, convolutional layer 1, convolutional layer 2, self-attention layer 1, self-attention layer 2, a bidirectional attention layer, a feature extraction layer, dense network 1, and dense network 2. Convolutional layer 1 and convolutional layer 2 may be trigram convolution network layers used for feature extraction. The feature extraction layer may be a long short-term memory (LSTM) network, which is also used for feature extraction. Self-attention layer 1, self-attention layer 2, and the bidirectional attention layer may be attention layers built on the BERT model.
Exemplarily, embedding layer 1 performs word embedding on each first word in the query sentence to obtain the first word vector corresponding to each first word, and embedding layer 2 performs word embedding on each second word in sentence A to obtain the second word vector corresponding to each second word. The first similarity between first word A and each second word is then determined to obtain the first similarity matrix, and getmax(k) pooling is applied to the first similarity matrix to obtain the second similarity matrix.
Convolutional layer 1 performs feature extraction (that is, semantic feature extraction) on the first word vector of each first word to obtain a first semantic vector corresponding to each first word. Self-attention layer 1 performs self-attention weighting on the first semantic vector of each first word to obtain the third word vector corresponding to each first word; self-attention layer 1 amplifies the features of the key words in the query sentence (that is, the words that play a key role in the query sentence). Convolutional layer 2 performs feature extraction (that is, semantic feature extraction) on the second word vector of each second word to obtain a second semantic vector corresponding to each second word. Self-attention layer 2 performs self-attention weighting on the second semantic vector of each second word to obtain the fourth word vector corresponding to each second word; self-attention layer 2 amplifies the features of the key words in sentence A (that is, the words that can represent the medical document). Finally, the bidirectional attention layer performs bidirectional attention weighting on the third word vector of each first word and the fourth word vector of each second word to obtain the first feature matrix corresponding to sentence A.
特征提取网络用于对该句子A对应的第一特征矩阵进行特征提取,得到句子A对应的第二特征矩阵;然后,将句子A对应的第二特征矩阵、第二相似度矩阵以及逆文本频率进行拼接(concat),得到与该句子A对应的第三特征矩阵;The feature extraction network is used to perform feature extraction on the first feature matrix corresponding to sentence A to obtain the second feature matrix corresponding to sentence A; then, the second feature matrix, second similarity matrix and inverse text frequency corresponding to sentence A Perform concat to obtain the third feature matrix corresponding to the sentence A;
Dense网络1用于对该句子A对应的第三特征矩阵进行特征提取,得到与该句子A对应的第四特征矩阵;并对该第四特征矩阵进行getmax处理,得到该句子A对应的第五特征矩阵;Dense network 1 is used to perform feature extraction on the third feature matrix corresponding to sentence A to obtain the fourth feature matrix corresponding to sentence A; and perform getmax processing on the fourth feature matrix to obtain the fifth feature matrix corresponding to sentence A Feature matrix
最后,将每个句子对应的第五特征矩阵以及每篇候选医学对应的第一评分进行拼接,得到与每篇候选医学文献对应的目标特征矩阵;Finally, concatenate the fifth feature matrix corresponding to each sentence and the first score corresponding to each candidate medicine to obtain the target feature matrix corresponding to each candidate medical document;
Dense网络2用于对该目标特征矩阵进行特征提取,得到与每篇候选医学文献对应的目标特征向量,并对该目标特征向量进行分类,得到与每篇候选医学文献对应的评分。Dense network 2 is used for feature extraction of the target feature matrix to obtain a target feature vector corresponding to each candidate medical document, and classify the target feature vector to obtain a score corresponding to each candidate medical document.
Refer to FIG. 3, which is a schematic flowchart of a neural network training method provided by an embodiment of the present application. Content in this embodiment that is the same as in the embodiment shown in FIG. 1 is not described again here. The method includes the following steps:
301: Obtain a query sample and multiple medical document samples corresponding to the query sample.
Each of the multiple medical document samples is labeled with a true score.
302: Split each of the multiple medical document samples into sentences to obtain at least one sentence corresponding to each medical document sample.
303: Input the query sample and the at least one sentence corresponding to each medical document sample into the neural network to obtain a predicted score corresponding to each medical document sample.
304: Adjust the network parameters of the neural network according to the predicted score and the true score corresponding to each medical document sample.
For example, a loss is determined from the predicted score and the true score corresponding to each medical document sample, and the network parameters of the neural network are adjusted according to the loss until the neural network converges, which yields the trained neural network.
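The loop in steps 301 to 304 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the text does not name a loss function or an optimizer, so mean squared error and plain gradient descent are assumed, and a linear scorer (the hypothetical `train_scorer`) stands in for the neural network. Each sample pairs a precomputed feature list with its labeled true score.

```python
def train_scorer(samples, lr=0.1, tol=1e-10, max_epochs=10_000):
    """Fit a linear stand-in for the scoring network.

    samples: list of (features, true_score) pairs.
    Returns (weights, last_loss).
    Assumptions: MSE loss and gradient descent; neither is specified in the source.
    """
    dim = len(samples[0][0])
    weights = [0.0] * dim
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_epochs):
        grads = [0.0] * dim
        loss = 0.0
        for features, true_score in samples:
            # Step 303: predicted score for this sample.
            predicted = sum(w * x for w, x in zip(weights, features))
            error = predicted - true_score
            loss += error * error
            for i, x in enumerate(features):
                grads[i] += 2.0 * error * x
        loss /= len(samples)
        # Step 304: adjust the parameters according to the loss gradient.
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grads)]
        # Treat a negligible change in loss as convergence.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return weights, loss
```

A real network would replace the linear scorer with the layers described above, but the predict, compare, and update cycle would keep the same shape.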
Refer to FIG. 4, which is a block diagram of the functional units of a medical document sorting apparatus provided by an embodiment of the present application. The medical document sorting apparatus 400 includes a transceiver unit 401 and a processing unit 402, where:
The transceiver unit 401 is configured to obtain a user's query sentence;
The processing unit 402 is configured to obtain multiple candidate medical documents corresponding to the query sentence;
The processing unit 402 is further configured to determine at least one sentence corresponding to each of the multiple candidate medical documents;
The processing unit 402 is further configured to determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;
The processing unit 402 is further configured to sort the multiple candidate medical documents according to the score corresponding to each candidate medical document.
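The overall flow handled by the processing unit (split each candidate into sentences, score it against the query, sort by score) can be sketched as below. The names `split_sentences` and `rank_documents` are hypothetical, the punctuation-based splitter is a naive assumption, and `score_fn` is a placeholder for the neural scoring described in this application.

```python
import re

def split_sentences(text):
    """Naive sentence splitter (assumption): break after ASCII end punctuation
    followed by whitespace, or directly after full-width end punctuation."""
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text)
    return [p.strip() for p in parts if p.strip()]

def rank_documents(query, candidates, score_fn):
    """Score every candidate document and return them from highest to lowest score.

    score_fn(query, sentences) -> float is a stand-in for the scoring network.
    """
    scored = [(score_fn(query, split_sentences(doc)), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

For instance, passing a `score_fn` that counts the sentences containing the query term ranks documents with more matching sentences first.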
In some possible implementations, in terms of obtaining multiple candidate medical documents corresponding to the query sentence, the processing unit 402 is specifically configured to:
Determine the similarity between the query sentence and each medical document in the medical database;
Select multiple candidate medical documents from the medical database according to the similarity corresponding to each medical document.
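The two steps above can be illustrated with a short sketch. The text does not state which similarity measure is used, so cosine similarity over whitespace-tokenized term counts is assumed here, and `select_candidates` with its `top_n` parameter is a hypothetical name introduced for illustration.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term counts (assumed measure)."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_candidates(query, database, top_n=2):
    """Return the top_n (similarity, document) pairs, most similar first."""
    scored = [(cosine_similarity(query, doc), doc) for doc in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

The similarity kept alongside each candidate is the kind of value that a later step could concatenate into the target feature matrix.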
In some possible implementations, in terms of determining the score corresponding to each candidate medical document according to the query sentence and the at least one sentence, the processing unit 402 is specifically configured to:
Perform word embedding on each first word in the query sentence to obtain a first word vector corresponding to each first word;
Perform word embedding on each second word in sentence A to obtain a second word vector corresponding to each second word, where sentence A is any one of the at least one sentence;
Determine the inverse document frequency corresponding to the query sentence;
Determine the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
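As an illustration of the inverse document frequency step, the sketch below computes one IDF value per query term. The exact formula is not given in the text; the smoothed form log((1 + N) / (1 + df)) + 1 is an assumption, as are the function name `inverse_document_frequency` and the whitespace tokenization.

```python
import math

def inverse_document_frequency(query, corpus):
    """Return {term: idf} for each whitespace-separated term in the query.

    Assumed formula: log((1 + N) / (1 + df)) + 1, where N is the number of
    documents and df is the number of documents containing the term.
    """
    n_docs = len(corpus)
    doc_terms = [set(doc.lower().split()) for doc in corpus]
    idf = {}
    for term in query.lower().split():
        df = sum(1 for terms in doc_terms if term in terms)
        idf[term] = math.log((1 + n_docs) / (1 + df)) + 1.0
    return idf
```

Terms that appear in few documents receive larger values, so rare query words carry more weight in the later concatenation step.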
In some possible implementations, in terms of determining the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word, the processing unit 402 is specifically configured to:
Determine the similarity between the first word vector of each first word and the second word vector of each second word to obtain a first similarity matrix;
Obtain a third word vector corresponding to each first word according to a self-attention mechanism and the first word vector of each first word;
Obtain a fourth word vector corresponding to each second word according to the self-attention mechanism and the second word vector of each second word;
Obtain a first feature matrix according to a bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word;
Determine the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix.
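The first three steps above can be sketched with plain Python lists. Several details are assumptions, since the text does not fix them: cosine similarity is used for the first similarity matrix, and self-attention is shown as unscaled dot-product attention without learned projections. The function names are hypothetical.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_matrix(first_vecs, second_vecs):
    """Rows: query words; columns: sentence words; entries: cosine similarity."""
    def cosine(u, v):
        norm = math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v))
        return _dot(u, v) / norm if norm else 0.0
    return [[cosine(q, s) for s in second_vecs] for q in first_vecs]

def self_attention(vecs):
    """Re-express each vector as an attention-weighted sum of all vectors."""
    attended = []
    for v in vecs:
        weights = _softmax([_dot(v, other) for other in vecs])
        attended.append(
            [sum(w * other[i] for w, other in zip(weights, vecs)) for i in range(len(v))]
        )
    return attended
```

Applying `self_attention` to the first word vectors would give stand-ins for the third word vectors, and applying it to the second word vectors would give stand-ins for the fourth word vectors.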
In some possible implementations, in terms of determining the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix, the processing unit 402 is specifically configured to:
Pool the first similarity matrix with a getmax function to obtain a second similarity matrix;
Perform semantic feature extraction on the first feature matrix to obtain a second feature matrix;
Concatenate the second similarity matrix, the second feature matrix, and the inverse document frequency to determine a third feature matrix corresponding to sentence A;
Determine the score corresponding to each medical document according to the third feature matrix corresponding to each of the at least one sentence.
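A possible reading of the pooling and concatenation steps is sketched below. The text does not define getmax precisely; here it is assumed to keep the k largest similarities in each row (one row per query word). The row-wise concatenation also assumes that the second feature matrix and the IDF values are aligned to query words, which the source does not confirm. Both function names are hypothetical.

```python
def getmax_pool(matrix, k):
    """Keep the k largest values of each row, in descending order (assumed getmax(k))."""
    return [sorted(row, reverse=True)[:k] for row in matrix]

def concat_features(pooled_similarity, feature_matrix, idf_values):
    """Concatenate, per query word, pooled similarities, features, and the IDF value."""
    return [
        sims + feats + [idf]
        for sims, feats, idf in zip(pooled_similarity, feature_matrix, idf_values)
    ]
```

Pooling to a fixed k gives every sentence a similarity block of the same width regardless of sentence length, which is one plausible motivation for the step.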
In some possible implementations, in terms of determining the score corresponding to each medical document according to the third feature matrix corresponding to each of the at least one sentence, the processing unit 402 is specifically configured to:
Perform semantic feature extraction on the third feature matrix corresponding to each sentence to obtain a fourth feature matrix corresponding to each sentence;
Pool the fourth feature matrix corresponding to each sentence with the getmax function to obtain a fifth feature matrix corresponding to each sentence;
Concatenate the fifth feature matrix corresponding to each sentence with the similarity between the query sentence and each candidate medical document to obtain a target feature matrix corresponding to each candidate medical document;
Determine the score corresponding to each candidate medical document according to the target feature matrix corresponding to each candidate medical document.
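The document-level aggregation above can be sketched as follows, under stated assumptions: getmax is taken to be a column-wise maximum over each sentence's fourth feature matrix, the per-sentence results are flattened into one feature list, and a fixed linear scorer (the hypothetical `score_document`) stands in for the learned dense network. Documents with different sentence counts would need padding or truncation, which is not shown.

```python
def column_max(matrix):
    """Column-wise maximum of a matrix given as a list of rows (assumed getmax)."""
    return [max(column) for column in zip(*matrix)]

def target_features(sentence_matrices, query_doc_similarity):
    """Pool each sentence matrix, then append the query-document similarity."""
    features = []
    for matrix in sentence_matrices:
        features.extend(column_max(matrix))
    features.append(query_doc_similarity)
    return features

def score_document(features, weights, bias=0.0):
    """Linear stand-in for the dense scoring network (assumption)."""
    return sum(w * x for w, x in zip(weights, features)) + bias
```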
In some possible implementations, in terms of obtaining the first feature matrix according to the bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word, the processing unit 402 is specifically configured to:
Determine the similarity between the third word vector of first word A and the fourth word vector of each second word to obtain a weight coefficient between first word A and each second word, where first word A is any word in the query sentence;
Weight the fourth word vector corresponding to each second word according to the weight coefficient between first word A and each second word to obtain a fifth word vector corresponding to first word A;
Determine a first maximum value among the weight coefficients between first word A and each second word, and multiply the fifth word vector corresponding to first word A by the first maximum value to obtain a target word vector corresponding to first word A;
Form a first matrix from the target word vectors corresponding to the first words in the query sentence;
Determine the similarity between the fourth word vector of second word B and the third word vector of each first word to obtain a weight coefficient between second word B and each first word, where second word B is any second word in sentence A;
Weight the third word vector corresponding to each first word according to the weight coefficient between second word B and each first word to obtain a fifth word vector corresponding to second word B;
Determine a second maximum value among the weight coefficients between second word B and each first word, and multiply the fifth word vector corresponding to second word B by the second maximum value to obtain a target word vector corresponding to second word B;
Form a second matrix from the target word vectors corresponding to the second words;
Concatenate the first matrix, the second matrix, and a third matrix formed from the fourth word vectors corresponding to the second words to obtain the first feature matrix.
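The bidirectional attention steps above can be sketched as below. The choices not stated in the text are assumptions: the weight coefficients are softmax-normalized dot products, all word vectors share one dimension, and the three matrices are concatenated by stacking their rows. The names `_attend` and `bidirectional_attention` are hypothetical.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def _attend(source, targets):
    """Target word vector for one word attending over the other side's words."""
    weights = _softmax([_dot(source, t) for t in targets])   # weight coefficients
    fifth = [sum(w * t[i] for w, t in zip(weights, targets)) for i in range(len(source))]
    largest = max(weights)                                   # first or second maximum value
    return [largest * x for x in fifth]

def bidirectional_attention(third_vecs, fourth_vecs):
    """Stack query-to-sentence rows, sentence-to-query rows, and the fourth vectors."""
    first_matrix = [_attend(q, fourth_vecs) for q in third_vecs]
    second_matrix = [_attend(s, third_vecs) for s in fourth_vecs]
    third_matrix = [list(v) for v in fourth_vecs]
    return first_matrix + second_matrix + third_matrix
```

Multiplying by the largest weight coefficient scales down a word whose attention is spread thinly, so words with one strong counterpart on the other side contribute more to the first feature matrix.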
Refer to FIG. 5, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device includes a memory and a processor. Optionally, the electronic device may further include a transceiver. For example, as shown in FIG. 5, the electronic device 500 includes a transceiver 501, a processor 502, and a memory 503, which are connected by a bus 504. The memory 503 is used to store computer programs and data, and can transmit the data it stores to the processor 502.
The processor 502 is configured to read the computer program in the memory 503 to perform the following operations:
Control the transceiver 501 to obtain a user's query sentence;
Obtain multiple candidate medical documents corresponding to the query sentence;
Determine at least one sentence corresponding to each of the multiple candidate medical documents;
Determine the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;
Sort the multiple candidate medical documents according to the score corresponding to each candidate medical document.
In some possible implementations, in terms of obtaining multiple candidate medical documents corresponding to the query sentence, the processor 502 is specifically configured to perform the following operations:
Determine the similarity between the query sentence and each medical document in the medical database;
Select multiple candidate medical documents from the medical database according to the similarity corresponding to each medical document.
In some possible implementations, in terms of determining the score corresponding to each candidate medical document according to the query sentence and the at least one sentence, the processor 502 is specifically configured to perform the following operations:
Perform word embedding on each first word in the query sentence to obtain a first word vector corresponding to each first word;
Perform word embedding on each second word in sentence A to obtain a second word vector corresponding to each second word, where sentence A is any one of the at least one sentence;
Determine the inverse document frequency corresponding to the query sentence;
Determine the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
In some possible implementations, in terms of determining the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word, the processor 502 is specifically configured to perform the following operations:
Determine the similarity between the first word vector of each first word and the second word vector of each second word to obtain a first similarity matrix;
Obtain a third word vector corresponding to each first word according to a self-attention mechanism and the first word vector of each first word;
Obtain a fourth word vector corresponding to each second word according to the self-attention mechanism and the second word vector of each second word;
Obtain a first feature matrix according to a bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word;
Determine the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix.
In some possible implementations, in terms of determining the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix, the processor 502 is specifically configured to perform the following operations:
Pool the first similarity matrix with a getmax function to obtain a second similarity matrix;
Perform semantic feature extraction on the first feature matrix to obtain a second feature matrix;
Concatenate the second similarity matrix, the second feature matrix, and the inverse document frequency to determine a third feature matrix corresponding to sentence A;
Determine the score corresponding to each medical document according to the third feature matrix corresponding to each of the at least one sentence.
In some possible implementations, in terms of determining the score corresponding to each medical document according to the third feature matrix corresponding to each of the at least one sentence, the processor 502 is specifically configured to perform the following operations:
Perform semantic feature extraction on the third feature matrix corresponding to each sentence to obtain a fourth feature matrix corresponding to each sentence;
Pool the fourth feature matrix corresponding to each sentence with the getmax function to obtain a fifth feature matrix corresponding to each sentence;
Concatenate the fifth feature matrix corresponding to each sentence with the similarity between the query sentence and each candidate medical document to obtain a target feature matrix corresponding to each candidate medical document;
Determine the score corresponding to each candidate medical document according to the target feature matrix corresponding to each candidate medical document.
In some possible implementations, in terms of obtaining the first feature matrix according to the bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word, the processor 502 is specifically configured to perform the following operations:
Determine the similarity between the third word vector of first word A and the fourth word vector of each second word to obtain a weight coefficient between first word A and each second word, where first word A is any word in the query sentence;
Weight the fourth word vector corresponding to each second word according to the weight coefficient between first word A and each second word to obtain a fifth word vector corresponding to first word A;
Determine a first maximum value among the weight coefficients between first word A and each second word, and multiply the fifth word vector corresponding to first word A by the first maximum value to obtain a target word vector corresponding to first word A;
Form a first matrix from the target word vectors corresponding to the first words in the query sentence;
Determine the similarity between the fourth word vector of second word B and the third word vector of each first word to obtain a weight coefficient between second word B and each first word, where second word B is any second word in sentence A;
Weight the third word vector corresponding to each first word according to the weight coefficient between second word B and each first word to obtain a fifth word vector corresponding to second word B;
Determine a second maximum value among the weight coefficients between second word B and each first word, and multiply the fifth word vector corresponding to second word B by the second maximum value to obtain a target word vector corresponding to second word B;
Form a second matrix from the target word vectors corresponding to the second words;
Concatenate the first matrix, the second matrix, and a third matrix formed from the fourth word vectors corresponding to the second words to obtain the first feature matrix.
Specifically, the transceiver 501 may be the transceiver unit 401 of the medical document sorting apparatus 400 in the embodiment shown in FIG. 4, and the processor 502 may be the processing unit 402 of the medical document sorting apparatus 400 in the embodiment shown in FIG. 4.
It should be understood that the medical document sorting apparatus in the present application may include a smartphone (such as an Android phone, an iOS phone, or a Windows Phone phone), a tablet computer, a handheld computer, a notebook computer, a mobile Internet device (MID), a wearable device, and the like. These apparatuses are examples only and are not exhaustive; the medical document sorting apparatus includes but is not limited to those listed. In practical applications, the medical document sorting apparatus may also include an intelligent vehicle-mounted terminal, computer equipment, and so on.
An embodiment of the present application further provides a computer-readable storage medium that stores a computer program, and the computer program is executed by a processor to implement some or all of the steps of any medical document sorting method described in the above method embodiments.
Optionally, the storage medium involved in the present application, such as the computer-readable storage medium, may be non-volatile or volatile.
An embodiment of the present application further provides a computer program product that includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps of any medical document sorting method described in the above method embodiments.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The embodiments of the present application are described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are intended only to help in understanding the method and core ideas of the present application. A person of ordinary skill in the art may, based on the ideas of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

  1. A medical document sorting method, comprising:
    obtaining a user's query sentence;
    obtaining multiple candidate medical documents corresponding to the query sentence;
    determining at least one sentence corresponding to each of the multiple candidate medical documents;
    determining the score corresponding to each candidate medical document according to the query sentence and the at least one sentence;
    sorting the multiple candidate medical documents according to the score corresponding to each candidate medical document.
  2. The method according to claim 1, wherein obtaining multiple candidate medical documents corresponding to the query sentence comprises:
    determining the similarity between the query sentence and each medical document in a medical database;
    selecting multiple candidate medical documents from the medical database according to the similarity corresponding to each medical document.
  3. The method according to claim 1 or 2, wherein determining the score corresponding to each candidate medical document according to the query sentence and the at least one sentence comprises:
    performing word embedding on each first word in the query sentence to obtain a first word vector corresponding to each first word;
    performing word embedding on each second word in sentence A to obtain a second word vector corresponding to each second word, wherein sentence A is any one of the at least one sentence;
    determining the inverse document frequency corresponding to the query sentence;
    determining the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
  4. The method according to claim 3, wherein determining the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word comprises:
    determining the similarity between the first word vector of each first word and the second word vector of each second word to obtain a first similarity matrix;
    obtaining a third word vector corresponding to each first word according to a self-attention mechanism and the first word vector of each first word;
    obtaining a fourth word vector corresponding to each second word according to the self-attention mechanism and the second word vector of each second word;
    obtaining a first feature matrix according to a bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word;
    determining the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix.
  5. The method according to claim 4, wherein determining the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix comprises:
    pooling the first similarity matrix by means of a getmax function to obtain a second similarity matrix;
    performing semantic feature extraction on the first feature matrix to obtain a second feature matrix;
    concatenating the second similarity matrix, the second feature matrix, and the inverse document frequency to determine a third feature matrix corresponding to the sentence A; and
    determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence.
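A minimal sketch of the pooling and concatenation steps, under two assumptions that the claim does not state: the getmax function is read as a row-wise maximum (one value per query word), and the semantic feature extraction is replaced by a column-wise mean as a stand-in for whatever trained extractor is actually used. The function names and the numbers are hypothetical.

```python
def getmax_pool(matrix):
    """Row-wise max pooling: keep the best match for each query word."""
    return [max(row) for row in matrix]


def extract_features(matrix):
    """Stand-in for semantic feature extraction: column-wise mean (assumption)."""
    n_rows = len(matrix)
    return [sum(col) / n_rows for col in zip(*matrix)]


def third_feature(first_similarity, first_feature_matrix, idf):
    second_similarity = getmax_pool(first_similarity)
    second_feature = extract_features(first_feature_matrix)
    # Concatenate the three parts into one flat feature vector for the sentence.
    return second_similarity + second_feature + list(idf)


example = third_feature(
    [[0.2, 0.9, 0.1], [0.4, 0.3, 0.8]],  # first similarity matrix (2 query words)
    [[1.0, 2.0], [3.0, 4.0]],            # first feature matrix
    [1.5, 2.0],                          # IDF of the two query words
)
```

With these inputs the result is `[0.9, 0.8, 2.0, 3.0, 1.5, 2.0]`: two pooled similarities, two extracted features, and the two IDF values.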
  6. The method according to claim 5, wherein determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence comprises:
    performing semantic feature extraction on the third feature matrix corresponding to each sentence to obtain a fourth feature matrix corresponding to each sentence;
    pooling the fourth feature matrix corresponding to each sentence by means of the getmax function to obtain a fifth feature matrix corresponding to each sentence;
    concatenating the fifth matrix corresponding to each sentence and the similarity between the query sentence and each candidate medical document to obtain a target feature matrix corresponding to each candidate medical document; and
    determining the score corresponding to each candidate medical document according to the target feature matrix corresponding to each candidate medical document.
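The document-level aggregation in this claim can be sketched as below. Several pieces are assumptions made only for illustration: an element-wise tanh stands in for the semantic feature extraction, getmax is read as a column-wise maximum, and the final score is produced by a logistic layer with hypothetical weights. The claim itself does not specify the extractor, the pooling axis, or the scoring function.

```python
import math


def target_features(sentence_matrices, query_doc_similarity):
    """Build the target feature vector of one candidate document."""
    fifth = []
    for matrix in sentence_matrices:
        # Fourth feature matrix: stand-in extraction (element-wise tanh, assumption).
        fourth = [[math.tanh(x) for x in row] for row in matrix]
        # Fifth features: column-wise max pooling over the sentence matrix.
        fifth.extend(max(col) for col in zip(*fourth))
    # Append the query-to-document similarity as the last feature.
    return fifth + [query_doc_similarity]


def score(features, weights, bias=0.0):
    """Hypothetical scoring layer: logistic function of a weighted sum."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Two sentences, each with a toy third feature matrix.
sentences = [
    [[0.0, 1.0], [2.0, 0.5]],
    [[1.0, 1.0]],
]
features = target_features(sentences, 0.5)
doc_score = score(features, [0.2] * len(features))
```

Because the target vector grows with the number of sentences in this sketch, a real system would need a fixed number of sentences per document or a further pooling step; that design choice is left open here.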
  7. The method according to any one of claims 4-6, wherein obtaining the first feature matrix according to the bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word comprises:
    determining the similarity between the third word vector of a first word A and the fourth word vector of each second word to obtain a weight coefficient between the first word A and each second word, wherein the first word A is any word in the query sentence;
    weighting the fourth word vector corresponding to each second word according to the weight coefficient between the first word A and each second word to obtain a fifth word vector corresponding to the first word A;
    determining a first maximum value among the weight coefficients between the first word A and each second word, and performing dot multiplication on the fifth word vector corresponding to the first word A using the first maximum value to obtain a target word vector corresponding to the first word A;
    forming a first matrix from the target feature vectors corresponding to each first word in the query sentence;
    determining the similarity between the fourth word vector of a second word B and the third word vector of each first word to obtain a weight coefficient between the second word B and each first word, wherein the second word B is any second word in the sentence A;
    weighting the third word vector corresponding to each first word according to the weight coefficient between the second word B and each first word to obtain a fifth word vector corresponding to each second word;
    determining a second maximum value among the weight coefficients between the second word B and each first word, and performing dot multiplication on the fifth word vector corresponding to the first word B using the second maximum value to obtain a target word vector corresponding to the second word B;
    forming a second matrix from the target feature vectors corresponding to each second word; and
    concatenating the first matrix, the second matrix, and a third matrix formed from the fourth word vectors corresponding to each second word to obtain the first feature matrix.
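The bidirectional attention steps listed in this claim can be sketched as follows. The sketch assumes that the weight coefficients are a softmax over dot-product similarities and that the three matrices are stacked row-wise, which requires all word vectors to share one dimension; the claim leaves both the similarity measure and the concatenation axis open. The helper names `attend` and `first_feature_matrix` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(source_vecs, context_vecs):
    """Build one target word vector per source word by attending over context_vecs."""
    dim = len(context_vecs[0])
    targets = []
    for s in source_vecs:
        # Weight coefficients between this word and every context word.
        weights = softmax([dot(s, c) for c in context_vecs])
        # Fifth word vector: weighted sum of the context word vectors.
        fifth = [sum(w * c[k] for w, c in zip(weights, context_vecs))
                 for k in range(dim)]
        # Scale by the largest weight coefficient to get the target word vector.
        peak = max(weights)
        targets.append([peak * x for x in fifth])
    return targets


def first_feature_matrix(third_vecs, fourth_vecs):
    first = attend(third_vecs, fourth_vecs)    # query words attend to sentence words
    second = attend(fourth_vecs, third_vecs)   # sentence words attend to query words
    third = [list(v) for v in fourth_vecs]     # the fourth word vectors themselves
    return first + second + third              # row-wise concatenation (assumption)


third_vecs = [[1.0, 0.0]]                # one query word
fourth_vecs = [[1.0, 0.0], [0.0, 1.0]]   # two sentence words
features = first_feature_matrix(third_vecs, fourth_vecs)
```

Scaling by the maximum weight makes a word whose attention is sharply focused on one counterpart contribute more strongly than a word whose attention is spread evenly.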
  8. A medical document sorting apparatus, comprising:
    a transceiver unit, configured to obtain a query sentence of a user; and
    a processing unit, configured to obtain a plurality of candidate medical documents corresponding to the query sentence;
    wherein the processing unit is further configured to determine at least one sentence corresponding to each candidate medical document among the plurality of candidate medical documents;
    the processing unit is further configured to determine a score corresponding to each candidate medical document according to the query sentence and the at least one sentence; and
    the processing unit is further configured to sort the plurality of candidate medical documents according to the score corresponding to each candidate medical document.
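The overall flow of obtaining a query, splitting each candidate document into sentences, scoring, and sorting can be sketched as below. The scoring function here is a deliberately simple word-overlap measure used as a placeholder for the neural scoring described in the dependent claims; `split_sentences`, `overlap_score`, and `rank_documents` are hypothetical names, and the sample documents are invented for illustration.

```python
import re


def split_sentences(text):
    """Split a document into sentences on ., ! or ? (a simplifying assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def overlap_score(query, sentences):
    """Placeholder scorer: best fraction of query words found in any one sentence."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    best = 0.0
    for sentence in sentences:
        words = set(sentence.lower().split())
        best = max(best, len(query_words & words) / len(query_words))
    return best


def rank_documents(query, candidates, score_fn=overlap_score):
    """Score every candidate against the query and sort by descending score."""
    scored = [(score_fn(query, split_sentences(doc)), doc) for doc in candidates]
    return [doc for _, doc in sorted(scored, key=lambda pair: pair[0], reverse=True)]


candidates = [
    "Insulin therapy guidelines. Dosage varies by patient.",
    "Aspirin reduces fever. It is widely used.",
]
ranked = rank_documents("aspirin fever", candidates)
```

Passing the scorer as a parameter keeps the sorting step independent of how the per-document score is produced, so a learned model could replace the placeholder without changing the rest of the flow.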
  9. An electronic device, comprising a processor connected to a memory, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so that the electronic device performs the following method:
    obtaining a query sentence of a user;
    obtaining a plurality of candidate medical documents corresponding to the query sentence;
    determining at least one sentence corresponding to each candidate medical document among the plurality of candidate medical documents;
    determining a score corresponding to each candidate medical document according to the query sentence and the at least one sentence; and
    sorting the plurality of candidate medical documents according to the score corresponding to each candidate medical document.
  10. The electronic device according to claim 9, wherein performing the determining of the score corresponding to each candidate medical document according to the query sentence and the at least one sentence comprises:
    performing word embedding on each first word in the query sentence to obtain a first word vector corresponding to each first word;
    performing word embedding on each second word in a sentence A to obtain a second word vector corresponding to each second word, wherein the sentence A is any one of the at least one sentence;
    determining an inverse document frequency corresponding to the query sentence; and
    determining the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
  11. The electronic device according to claim 10, wherein performing the determining of the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word comprises:
    determining the similarity between the first word vector of each first word and the second word vector of each second word to obtain a first similarity matrix;
    obtaining a third word vector corresponding to each first word according to a self-attention mechanism and the first word vector of each first word;
    obtaining a fourth word vector corresponding to each second word according to the self-attention mechanism and the second word vector of each second word;
    obtaining a first feature matrix according to a bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word; and
    determining the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix.
  12. The electronic device according to claim 11, wherein performing the determining of the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix comprises:
    pooling the first similarity matrix by means of a getmax function to obtain a second similarity matrix;
    performing semantic feature extraction on the first feature matrix to obtain a second feature matrix;
    concatenating the second similarity matrix, the second feature matrix, and the inverse document frequency to determine a third feature matrix corresponding to the sentence A; and
    determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence.
  13. The electronic device according to claim 12, wherein performing the determining of the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence comprises:
    performing semantic feature extraction on the third feature matrix corresponding to each sentence to obtain a fourth feature matrix corresponding to each sentence;
    pooling the fourth feature matrix corresponding to each sentence by means of the getmax function to obtain a fifth feature matrix corresponding to each sentence;
    concatenating the fifth matrix corresponding to each sentence and the similarity between the query sentence and each candidate medical document to obtain a target feature matrix corresponding to each candidate medical document; and
    determining the score corresponding to each candidate medical document according to the target feature matrix corresponding to each candidate medical document.
  14. The electronic device according to any one of claims 11-13, wherein performing the obtaining of the first feature matrix according to the bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word comprises:
    determining the similarity between the third word vector of a first word A and the fourth word vector of each second word to obtain a weight coefficient between the first word A and each second word, wherein the first word A is any word in the query sentence;
    weighting the fourth word vector corresponding to each second word according to the weight coefficient between the first word A and each second word to obtain a fifth word vector corresponding to the first word A;
    determining a first maximum value among the weight coefficients between the first word A and each second word, and performing dot multiplication on the fifth word vector corresponding to the first word A using the first maximum value to obtain a target word vector corresponding to the first word A;
    forming a first matrix from the target feature vectors corresponding to each first word in the query sentence;
    determining the similarity between the fourth word vector of a second word B and the third word vector of each first word to obtain a weight coefficient between the second word B and each first word, wherein the second word B is any second word in the sentence A;
    weighting the third word vector corresponding to each first word according to the weight coefficient between the second word B and each first word to obtain a fifth word vector corresponding to each second word;
    determining a second maximum value among the weight coefficients between the second word B and each first word, and performing dot multiplication on the fifth word vector corresponding to the first word B using the second maximum value to obtain a target word vector corresponding to the second word B;
    forming a second matrix from the target feature vectors corresponding to each second word; and
    concatenating the first matrix, the second matrix, and a third matrix formed from the fourth word vectors corresponding to each second word to obtain the first feature matrix.
  15. A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the following method:
    obtaining a query sentence of a user;
    obtaining a plurality of candidate medical documents corresponding to the query sentence;
    determining at least one sentence corresponding to each candidate medical document among the plurality of candidate medical documents;
    determining a score corresponding to each candidate medical document according to the query sentence and the at least one sentence; and
    sorting the plurality of candidate medical documents according to the score corresponding to each candidate medical document.
  16. The computer-readable storage medium according to claim 15, wherein performing the determining of the score corresponding to each candidate medical document according to the query sentence and the at least one sentence comprises:
    performing word embedding on each first word in the query sentence to obtain a first word vector corresponding to each first word;
    performing word embedding on each second word in a sentence A to obtain a second word vector corresponding to each second word, wherein the sentence A is any one of the at least one sentence;
    determining an inverse document frequency corresponding to the query sentence; and
    determining the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word.
  17. The computer-readable storage medium according to claim 16, wherein performing the determining of the score corresponding to each medical document according to the inverse document frequency, the first word vector corresponding to each first word, and the second word vector corresponding to each second word comprises:
    determining the similarity between the first word vector of each first word and the second word vector of each second word to obtain a first similarity matrix;
    obtaining a third word vector corresponding to each first word according to a self-attention mechanism and the first word vector of each first word;
    obtaining a fourth word vector corresponding to each second word according to the self-attention mechanism and the second word vector of each second word;
    obtaining a first feature matrix according to a bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word; and
    determining the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix.
  18. The computer-readable storage medium according to claim 17, wherein performing the determining of the score corresponding to each medical document according to the inverse document frequency, the first similarity matrix, and the first feature matrix comprises:
    pooling the first similarity matrix by means of a getmax function to obtain a second similarity matrix;
    performing semantic feature extraction on the first feature matrix to obtain a second feature matrix;
    concatenating the second similarity matrix, the second feature matrix, and the inverse document frequency to determine a third feature matrix corresponding to the sentence A; and
    determining the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence.
  19. The computer-readable storage medium according to claim 19, wherein performing the determining of the score corresponding to each medical document according to the third feature matrix corresponding to each sentence in the at least one sentence comprises:
    performing semantic feature extraction on the third feature matrix corresponding to each sentence to obtain a fourth feature matrix corresponding to each sentence;
    pooling the fourth feature matrix corresponding to each sentence by means of the getmax function to obtain a fifth feature matrix corresponding to each sentence;
    concatenating the fifth matrix corresponding to each sentence and the similarity between the query sentence and each candidate medical document to obtain a target feature matrix corresponding to each candidate medical document; and
    determining the score corresponding to each candidate medical document according to the target feature matrix corresponding to each candidate medical document.
  20. The computer-readable storage medium according to any one of claims 17-19, wherein performing the obtaining of the first feature matrix according to the bidirectional attention mechanism, the third word vector corresponding to each first word, and the fourth word vector corresponding to each second word comprises:
    determining the similarity between the third word vector of a first word A and the fourth word vector of each second word to obtain a weight coefficient between the first word A and each second word, wherein the first word A is any word in the query sentence;
    weighting the fourth word vector corresponding to each second word according to the weight coefficient between the first word A and each second word to obtain a fifth word vector corresponding to the first word A;
    determining a first maximum value among the weight coefficients between the first word A and each second word, and performing dot multiplication on the fifth word vector corresponding to the first word A using the first maximum value to obtain a target word vector corresponding to the first word A;
    forming a first matrix from the target feature vectors corresponding to each first word in the query sentence;
    determining the similarity between the fourth word vector of a second word B and the third word vector of each first word to obtain a weight coefficient between the second word B and each first word, wherein the second word B is any second word in the sentence A;
    weighting the third word vector corresponding to each first word according to the weight coefficient between the second word B and each first word to obtain a fifth word vector corresponding to each second word;
    determining a second maximum value among the weight coefficients between the second word B and each first word, and performing dot multiplication on the fifth word vector corresponding to the first word B using the second maximum value to obtain a target word vector corresponding to the second word B;
    forming a second matrix from the target feature vectors corresponding to each second word; and
    concatenating the first matrix, the second matrix, and a third matrix formed from the fourth word vectors corresponding to each second word to obtain the first feature matrix.
PCT/CN2021/084228 2020-10-31 2021-03-31 Medical text sorting method and apparatus, electronic device, and storage medium WO2021190662A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011206225.4 2020-10-31
CN202011206225.4A CN112307190B (en) 2020-10-31 2020-10-31 Medical literature ordering method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021190662A1 true WO2021190662A1 (en) 2021-09-30

Family

ID=74333971

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084228 WO2021190662A1 (en) 2020-10-31 2021-03-31 Medical text sorting method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112307190B (en)
WO (1) WO2021190662A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992874A (en) * 2023-09-27 2023-11-03 珠海金智维信息科技有限公司 Text quotation auditing and tracing method, system, device and storage medium
CN117316371A (en) * 2023-11-29 2023-12-29 杭州未名信科科技有限公司 Case report table generation method and device, electronic equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307190B (en) * 2020-10-31 2023-07-25 平安科技(深圳)有限公司 Medical literature ordering method, device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040002849A1 (en) * 2002-06-28 2004-01-01 Ming Zhou System and method for automatic retrieval of example sentences based upon weighted editing distance
CN107491547A (en) * 2017-08-28 2017-12-19 北京百度网讯科技有限公司 Searching method and device based on artificial intelligence
CN108733745A (en) * 2018-03-30 2018-11-02 华东师范大学 A kind of enquiry expanding method based on medical knowledge
CN111159359A (en) * 2019-12-31 2020-05-15 达闼科技成都有限公司 Document retrieval method, document retrieval device and computer-readable storage medium
CN111753043A (en) * 2020-06-22 2020-10-09 北京百度网讯科技有限公司 Document data processing method, apparatus and storage medium
CN112307190A (en) * 2020-10-31 2021-02-02 平安科技(深圳)有限公司 Medical literature sorting method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100217768A1 (en) * 2009-02-20 2010-08-26 Hong Yu Query System for Biomedical Literature Using Keyword Weighted Queries
CN108520038B (en) * 2018-03-31 2020-11-10 大连理工大学 Biomedical literature retrieval method based on sequencing learning algorithm
KR102059743B1 (en) * 2018-04-11 2019-12-26 한국과학기술원 Method and system for providing biomedical passage retrieval using deep-learning based knowledge structure construction
CN109977292B (en) * 2019-03-21 2022-12-27 腾讯科技(深圳)有限公司 Search method, search device, computing equipment and computer-readable storage medium
CN111507089B (en) * 2020-06-09 2022-09-09 平安科技(深圳)有限公司 Document classification method and device based on deep learning model and computer equipment
CN111444320B (en) * 2020-06-16 2020-09-08 太平金融科技服务(上海)有限公司 Text retrieval method and device, computer equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992874A (en) * 2023-09-27 2023-11-03 珠海金智维信息科技有限公司 Text quotation auditing and tracing method, system, device and storage medium
CN116992874B (en) * 2023-09-27 2023-12-22 珠海金智维信息科技有限公司 Text quotation auditing and tracing method, system, device and storage medium
CN117316371A (en) * 2023-11-29 2023-12-29 杭州未名信科科技有限公司 Case report table generation method and device, electronic equipment and storage medium
CN117316371B (en) * 2023-11-29 2024-04-16 杭州未名信科科技有限公司 Case report table generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112307190A (en) 2021-02-02
CN112307190B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN111753060B (en) Information retrieval method, apparatus, device and computer readable storage medium
WO2021190662A1 (en) Medical text sorting method and apparatus, electronic device, and storage medium
US9183274B1 (en) System, methods, and data structure for representing object and properties associations
CN109858010B (en) Method and device for recognizing new words in field, computer equipment and storage medium
CN110162771B (en) Event trigger word recognition method and device and electronic equipment
US10915756B2 (en) Method and apparatus for determining (raw) video materials for news
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN111324771B (en) Video tag determination method and device, electronic equipment and storage medium
CN108875065B (en) Indonesia news webpage recommendation method based on content
CN111353021B (en) Intention recognition method and device, electronic device and medium
WO2023040493A1 (en) Event detection
CN112487827A (en) Question answering method, electronic equipment and storage device
WO2021159812A1 (en) Cancer staging information processing method and apparatus, and storage medium
CN117435685A (en) Document retrieval method, document retrieval device, computer equipment, storage medium and product
CN112287217B (en) Medical document retrieval method, medical document retrieval device, electronic equipment and storage medium
CN112199954B (en) Disease entity matching method and device based on voice semantics and computer equipment
CN115098619A (en) Information duplication eliminating method and device, electronic equipment and computer readable storage medium
CN114328894A (en) Document processing method, document processing device, electronic equipment and medium
CN113688633A (en) Outline determination method and device
CN112131350B (en) Text label determining method, device, terminal and readable storage medium
CN116992874B (en) Text quotation auditing and tracing method, system, device and storage medium
CN113220841B (en) Method, apparatus, electronic device and storage medium for determining authentication information
CN110399501B (en) Geological field literature map generation method based on language statistical model
CN110728148B (en) Entity relation extraction method and device
CN117313721A (en) Document management method and device based on natural language processing technology

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21776900

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21776900

Country of ref document: EP

Kind code of ref document: A1