CN113569124A - Medical title matching method, device, equipment and storage medium - Google Patents

Info

Publication number
CN113569124A
Authority
CN
China
Prior art keywords
medical
vector
statement
title
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110049743.8A
Other languages
Chinese (zh)
Inventor
康战辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110049743.8A
Publication of CN113569124A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9532: Query formulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Pathology (AREA)
  • Mathematical Physics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The scheme relates to the application of artificial intelligence technology. After a medical search statement is obtained, the scheme combines the title vector of each medical title with the statement vector of the medical search statement to determine not only the semantic feature similarity between the medical title and the medical search statement, but also whether the medical title and the medical search statement express the same medical intention. By combining the semantic feature similarity with the intention matching result, the matching degree between each medical title and the medical search statement can be analyzed more comprehensively. The medical titles that best match the medical search statement can therefore be determined more accurately, and the medical text content that those titles point to can in turn be retrieved more accurately from the medical search statement.

Description

Medical title matching method, device, equipment and storage medium
Technical Field
The present application relates to the field of search technologies, and in particular, to a method, an apparatus, a device, and a storage medium for matching medical titles.
Background
With the development of internet-based healthcare, users can look up medical knowledge through a browser, a medical dictionary application, and the like. For example, a user can consult a medical encyclopedia to obtain more professional and authoritative medical knowledge.
In a medical content search scenario, each piece of medical content has one medical title. On this basis, after obtaining a medical search statement entered by the user, the search engine matches the medical search statement against the medical titles of the medical contents and retrieves at least one piece of medical content whose medical title matches the medical search statement.
However, in the medical search field, the matching degree between a medical search statement and a medical title cannot currently be determined accurately enough, so users cannot reliably find the medical content they need through a medical encyclopedia.
Disclosure of Invention
In view of this, the present application provides a method, an apparatus, a device, and a storage medium for matching medical titles, so that the medical content represented by a medical title can be matched more accurately from a medical search statement.
To achieve this purpose, the application provides the following technical solutions:
In one aspect, the present application provides a medical title matching method, including:
obtaining a medical search statement;
for each medical title of a plurality of medical titles to be matched, determining a title vector of the medical title;
determining a statement vector of the medical search statement;
for each medical title, determining a feature similarity between the medical title and the medical search statement based on the title vector of the medical title and the statement vector of the medical search statement;
for each medical title, determining an intention matching result of the medical title and the medical search statement by using an intention recognition model, based on the title vector of the medical title and the statement vector of the medical search statement, wherein the intention matching result represents whether the medical title and the medical search statement have the same medical intention, and the intention recognition model is trained according to the intention matching results labeled for a plurality of first sample pairs, using the respective vectors of the medical title sample and the medical search statement sample in each first sample pair;
and determining a ranking of the medical titles by matching degree by combining the feature similarity and the intention matching result of each medical title with the medical search statement.
In one possible case, before determining the feature similarity between the medical title and the medical search statement, the method further includes:
determining the vector difference between the statement vector of the medical search statement and the title vector of the medical title to obtain a difference vector;
the determining the feature similarity between the medical title and the medical search statement based on the title vector of the medical title and the statement vector of the medical search statement includes:
determining the feature similarity between the medical title and the medical search statement based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector.
In yet another possible case, the determining the feature similarity between the medical title and the medical search statement based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector includes:
determining the feature similarity between the medical title and the medical search statement by using a similarity recognition model, based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector;
the similarity recognition model is trained according to the feature similarities labeled for a plurality of second sample pairs, using the vectors of the medical title sample and the medical search statement sample in each second sample pair together with the difference vector between those two vectors.
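The difference-vector step can be illustrated with a minimal Python sketch. It assumes vectors are plain lists of floats of equal dimension and that the difference is taken element-wise as statement minus title. The function names `difference_vector` and `build_similarity_input`, and the concatenation of the three vectors into one model input, are illustrative assumptions; the application does not state how the three vectors are presented to the similarity recognition model.

```python
from typing import List

Vector = List[float]


def difference_vector(statement_vec: Vector, title_vec: Vector) -> Vector:
    """Element-wise difference between the statement vector and the title vector."""
    if len(statement_vec) != len(title_vec):
        raise ValueError("statement and title vectors must have the same dimension")
    return [s - t for s, t in zip(statement_vec, title_vec)]


def build_similarity_input(title_vec: Vector, statement_vec: Vector) -> Vector:
    """Assumed input layout: [title vector | statement vector | difference vector]."""
    return title_vec + statement_vec + difference_vector(statement_vec, title_vec)
```

Under this assumed layout, a model that receives the concatenated input sees both original vectors and an explicit signal of where they disagree.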
In yet another possible case, before determining the feature similarity and the intention matching result of the medical title and the medical search statement, the method further includes:
reducing the dimension of the statement vector of the medical search statement through a vector dimension reduction model, wherein the vector dimension reduction model is trained, in the process of training the similarity recognition model, using the title vectors of the medical title samples and the statement vectors of the medical search statement samples in the second sample pairs;
reducing the dimension of the title vector of the medical title through the vector dimension reduction model;
the determining the vector difference between the statement vector of the medical search statement and the title vector of the medical title to obtain a difference vector includes:
determining the vector difference between the dimension-reduced statement vector of the medical search statement and the dimension-reduced title vector of the medical title to obtain the difference vector.
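The order of operations described here (reduce both vectors with the same model, then take the difference) can be sketched as follows. The application does not specify the form of the vector dimension reduction model, so this sketch substitutes a fixed linear projection matrix as a stand-in; `reduce_dimension`, `reduced_difference`, and the `projection` parameter are hypothetical names.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def reduce_dimension(vec: Vector, projection: Matrix) -> Vector:
    """Stand-in for the trained reduction model: each projection row yields one output value."""
    if any(len(row) != len(vec) for row in projection):
        raise ValueError("each projection row must match the input dimension")
    return [sum(w * x for w, x in zip(row, vec)) for row in projection]


def reduced_difference(statement_vec: Vector, title_vec: Vector, projection: Matrix) -> Vector:
    """Reduce both vectors with the same projection, then subtract element-wise."""
    reduced_statement = reduce_dimension(statement_vec, projection)
    reduced_title = reduce_dimension(title_vec, projection)
    return [s - t for s, t in zip(reduced_statement, reduced_title)]
```

Applying one shared projection to both vectors keeps them in the same reduced space, which is what makes the subsequent difference meaningful.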
In yet another possible case, the determining a title vector of the medical title includes:
determining the title vector of the medical title using a vector conversion model;
the determining a statement vector of the medical search statement includes:
determining the statement vector of the medical search statement using the vector conversion model;
the vector conversion model is a Bidirectional Encoder Representations from Transformers (BERT) model, trained on the masked word sequences corresponding to a plurality of medical corpus samples, with predicting the masked words in each masked word sequence as the training objective;
each medical corpus sample consists of a medical title sample and the medical text content represented by that medical title sample, and the masked word sequence is the word sequence obtained by masking at least one word contained in the medical corpus sample.
In another aspect, the present application also provides a medical title matching apparatus, including:
a statement obtaining unit for obtaining a medical search statement;
a first vector determination unit for determining, for each medical title of a plurality of medical titles to be matched, a title vector of the medical title;
a second vector determination unit for determining a statement vector of the medical search statement;
a feature determination unit for determining, for each medical title, a feature similarity between the medical title and the medical search statement based on the title vector of the medical title and the statement vector of the medical search statement;
an intention determination unit for determining, for each medical title, an intention matching result of the medical title and the medical search statement by using an intention recognition model, based on the title vector of the medical title and the statement vector of the medical search statement, where the intention matching result represents whether the medical title and the medical search statement have the same medical intention, and the intention recognition model is trained according to the intention matching results labeled for a plurality of first sample pairs, using the respective vectors of the medical title sample and the medical search statement sample in each first sample pair;
and a matching determination unit for determining a ranking of the medical titles by matching degree by combining the feature similarity and the intention matching result of each medical title with the medical search statement.
In one possible implementation, the apparatus further includes:
a difference determination unit for determining, before the feature determination unit determines the feature similarity between the medical title and the medical search statement, the vector difference between the statement vector of the medical search statement and the title vector of the medical title to obtain a difference vector;
the feature determination unit is specifically configured to determine the feature similarity between the medical title and the medical search statement based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector.
In another possible implementation, the feature determination unit includes:
a feature determination subunit for determining the feature similarity between the medical title and the medical search statement by using a similarity recognition model, based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector;
the similarity recognition model is trained according to the feature similarities labeled for a plurality of second sample pairs, using the vectors of the medical title sample and the medical search statement sample in each second sample pair together with the difference vector between those two vectors.
In yet another aspect, the present application further provides a server comprising a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, and the program, when executed, implements the medical title matching method described in any one of the above.
In still another aspect, the present application also provides a storage medium storing a program that, when executed, implements the medical title matching method described in any one of the above.
As can be seen from the above, after the medical search statement is obtained, the present application combines, for each medical title, the title vector of the medical title with the statement vector of the medical search statement to determine not only the semantic feature similarity between the medical title and the medical search statement, but also whether the two express the same medical intention. By combining the semantic feature similarity with the intention matching result, the matching degree between each medical title and the medical search statement can be analyzed more comprehensively. The medical titles that best match the medical search statement can therefore be determined more accurately, and the medical text content that those titles point to can in turn be retrieved more accurately from the medical search statement.
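The final combination step can be sketched in Python as follows. This section does not state how the feature similarity and the intention matching result are combined, so the rule used here (titles whose intention matches come first, and ties are broken by higher feature similarity) is purely an assumption for illustration. The `TitleMatch` type and `rank_titles` function are hypothetical names.

```python
from typing import List, NamedTuple


class TitleMatch(NamedTuple):
    title: str
    feature_similarity: float  # e.g. a score in [0, 1]
    intent_matches: bool       # intention matching result for this title


def rank_titles(matches: List[TitleMatch]) -> List[TitleMatch]:
    """Assumed rule: intent-matching titles first, then higher similarity first."""
    return sorted(
        matches,
        key=lambda m: (m.intent_matches, m.feature_similarity),
        reverse=True,
    )
```

Other combinations, such as a weighted score of the two signals, would fit the same description equally well.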
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the component architecture of a system to which the solution of the present application is applicable;
FIG. 2 is a flowchart of one embodiment of the medical title matching method of the present application;
FIG. 3 is a flowchart of a further embodiment of the medical title matching method of the present application;
FIG. 4 is a schematic diagram of a framework for implementing the medical title matching method of the present application;
FIG. 5 is a schematic block diagram of the principle framework for training the BERT model of the present application;
FIG. 6 is a flowchart of one implementation of training the similarity recognition model and the intention recognition model of the present application;
FIG. 7 is a diagram of an application scenario to which the medical title matching method of the present application is applied;
FIG. 8 is a schematic diagram of an interface on which a terminal presents a matched medical encyclopedia article;
FIG. 9 is a schematic diagram of the component structure of one embodiment of the medical title matching apparatus of the present application;
FIG. 10 is a schematic diagram of the component structure of one embodiment of an electronic device of the present application.
Detailed Description
The scheme of the present application is applicable to any medical content search scenario. In such a scenario, the medical titles of the medical contents can be matched against the medical search statement, and the medical content under at least one medical title that matches the medical search statement can be determined.
For ease of understanding, a medical search system to which aspects of the present application are applicable will be described.
Fig. 1 is a schematic diagram illustrating a component architecture of a medical search system to which the present application is applied.
The medical search system may include: a medical retrieval platform 100 and a terminal 200.
The medical retrieval platform can store a plurality of medical contents, each of which has a medical title. Because one piece of medical content concerns only one disease topic, the medical title of each piece of medical content can characterize the disease topic that the medical content relates to.
Each piece of medical content may introduce relevant medical knowledge such as disease symptoms, disease causes, disease diagnosis and treatment, or health care.
The medical content may take various forms. For example, it may be medical text content, such as an article or a short text introducing a medical subject. As another example, the medical title and the medical text content may be a medical question and the answer text for that question, respectively.
Of course, the medical content may also be medical video content and the like, which is not limited here.
The terminal 200 may access the medical retrieval platform through a browser or a medical retrieval application matched with the medical retrieval platform, and send a search request to the medical retrieval platform, where the search request may carry a medical search statement.
Accordingly, the medical retrieval platform 100 may include at least one server 101.
The server can match the medical titles of the plurality of medical contents in the medical retrieval platform against the medical search statement sent by the terminal, find at least one medical title with a high matching degree with the medical search statement, and return to the terminal the medical content that the at least one medical title points to.
For example, the medical retrieval platform may be a medical encyclopedia platform, from which the terminal can request various kinds of medical encyclopedia knowledge, such as knowledge about disease symptoms.
It is to be understood that the medical retrieval platform may store the medical contents and their medical titles on the above server, or a database (not shown in FIG. 1) may be set up in the medical retrieval platform to store the plurality of medical contents and their associated medical titles, which is not limited here.
The server of the medical retrieval platform can use artificial intelligence technology to carry out the processing involved in medical title matching.
Artificial Intelligence (AI) is the theory, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines can perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
In the application, the medical search platform at least relates to natural language processing technology, machine learning and other artificial intelligence technologies in order to match medical search sentences with medical titles.
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering robots, knowledge graphs, and the like.
For example, in the present application, in order to train models such as the vector conversion model, text processing such as word segmentation may be performed on the medical title samples and the medical search statement samples, and semantic understanding may be performed on the medical titles and the medical search statements.
Machine learning is an interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to keep improving its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The medical title matching method of the present application, together with the artificial intelligence and other techniques it involves, is described below with reference to flowcharts.
FIG. 2 shows a flowchart of one embodiment of the medical title matching method of the present application. The method of this embodiment may be applied to the server mentioned above and may include:
S201, obtaining a medical search statement.
For example, the server may obtain a search request sent by the terminal, where the search request may carry a medical search statement, and the search request is used to request to search for at least one piece of medical content matching the medical search statement. Accordingly, the server may obtain the medical search statement carried in the search request.
The medical search statement is a search statement (also referred to as a query statement) related to the medical content being requested; it is called a medical search statement here for ease of distinction.
The medical search statement may be a character string including at least one character. For example, it may be a sentence or a word, such as "how to obtain child anorexia" or "fever".
S202, for each medical title of a plurality of medical titles to be matched, determining a title vector of the medical title.
It is understood that the medical retrieval platform may store a plurality of medical contents, each with a medical title, so the platform stores a plurality of medical titles. In the field of medical search, determining the medical content that matches the medical search statement requires determining the matching degree between the medical search statement and each medical title. The present application therefore performs the operations of steps S202 to S205 for each medical title.
Of course, in practical applications, the medical titles in the medical retrieval platform may first be screened according to the keywords contained in the medical search statement, and the medical titles that may match the medical search statement are screened out and used as the medical titles to be matched.
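The keyword pre-screening mentioned above can be sketched as a simple filter. This sketch assumes the keywords have already been extracted from the medical search statement (for Chinese text this would typically require word segmentation, which is not shown) and uses case-insensitive substring matching as the screening rule. Both the rule and the function name `prescreen_titles` are illustrative assumptions.

```python
from typing import Iterable, List


def prescreen_titles(titles: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep the titles that contain at least one keyword from the search statement."""
    keyword_list = [k.lower() for k in keywords if k]
    return [t for t in titles if any(k in t.lower() for k in keyword_list)]
```

Only the titles that survive this filter would then go through the vector-based matching of the later steps, which reduces the number of titles to compare.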
For ease of distinction, the vector converted from a medical title is referred to as a title vector. The title vector characterizes the semantic features of the medical title.
It will be appreciated that there are many ways to determine the title vector of a medical title. For example, the title vector may be determined using an existing word vector model.
In one possible implementation, the title vector of the medical title may be determined using a vector conversion model trained in advance.
It can be understood that a general-purpose word vector model may not be able to accurately determine vectors for text in the medical field. The present application may therefore fine-tune an existing word vector model or encoder model in advance using medical corpus samples, so as to train a vector conversion model suited to determining vectors for medical text.
A medical corpus sample may consist of a medical title sample and the medical text content represented by that medical title sample. Of course, the medical corpus sample may also be another kind of sample related to medical content, which is not limited here.
For example, a word2vec model may be trained using medical corpus samples to obtain a word2vec model suited to determining word vectors for words related to the medical field.
For another example, the vector conversion model may be a trained Bidirectional Encoder Representations from Transformers (BERT) model. In that case, the vector conversion model is trained on the masked word sequences corresponding to a plurality of medical corpus samples, with predicting the masked words in each masked word sequence as the training objective. The masked word sequence corresponding to a medical corpus sample is the word sequence obtained after at least one word contained in the medical corpus sample is masked.
Masking at least one word contained in a medical corpus sample means applying a set masking rule to replace or change some or all of the words in the sample, so that at least one word in the masked medical corpus sample differs from the original.
For example, assuming that the medical corpus sample can be segmented into 100 words, 80 of the 100 words may be kept unchanged; of the remaining 20 words, 80% are replaced with a mask token, 10% are replaced with other characters, and 10% are kept unchanged. Of course, the same applies when each character in the medical corpus sample is treated as a word.
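The masking rule in this example can be sketched in Python as follows. The 20% selection ratio and the 80%/10%/10% split follow the numbers given above. Everything else is an assumption made for illustration: the `[MASK]` token string, the uniform random choice of positions, drawing the "other characters" from a caller-supplied `vocabulary`, and the function name `mask_words`.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK_TOKEN = "[MASK]"  # assumed mask token string


def mask_words(
    words: Sequence[str],
    vocabulary: Sequence[str],
    select_ratio: float = 0.2,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[int]]:
    """Return the masked word sequence and the positions selected for prediction."""
    if not words:
        return [], []
    rng = random.Random(seed)
    num_selected = max(1, round(len(words) * select_ratio))
    selected = sorted(rng.sample(range(len(words)), num_selected))
    masked = list(words)
    for pos in selected:
        roll = rng.random()
        if roll < 0.8:
            masked[pos] = MASK_TOKEN              # 80%: replace with the mask token
        elif roll < 0.9:
            masked[pos] = rng.choice(vocabulary)  # 10%: replace with another word
        # remaining 10%: keep the original word
    return masked, selected
```

During training, the model would be asked to predict the original word at each selected position; the 80/10/10 split is applied per selected word at random, so the realized proportions only approximate those figures for a single sample.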
It can be understood that, because the BERT model uses a multi-layer Transformer to learn a text (e.g., a medical title in the present application) bidirectionally, it can learn the contextual relationships between the words in the text (e.g., the words in the medical title) more accurately and can therefore extract the semantic features of those words accurately. The trained BERT model can thus extract, more accurately, the semantic features that reflect the medical semantics of a medical title.
S203, determining a statement vector of the medical search statement.
For ease of distinction, the vector converted from the medical search statement is referred to as a statement vector. The statement vector of the medical search statement characterizes the semantic features of the medical search statement.
As in step S202, there are various ways to determine the statement vector. For example, the statement vector of the medical search statement may be determined using a general-purpose word vector model.
For another example, the statement vector of the medical search statement may be determined using a vector conversion model trained in advance on medical corpus samples. This vector conversion model may be the same as the one used in step S202; for example, it may be a trained BERT model.
It is understood that the order of steps S202 and S203 may be interchanged, or may be performed simultaneously, which is not limited thereto.
And S204, determining the feature similarity of the medical titles and the medical search sentences according to the title vectors of the medical titles and the sentence vectors of the medical search sentences aiming at each medical title.
Wherein the feature similarity is used for representing the semantic similarity between the medical treatment title and the medical treatment search statement.
It is understood that the feature similarity may be a similarity score or a similarity grade representing the similarity degree, for example, 5 similarity grades may be set from completely the same to completely different, and different feature similarity degrees may be represented by different similarity grades.
There may be various ways to determine the feature similarity.
For example, in one possible case, a cosine similarity between the title vector of the medical title and the sentence vector of the medical search sentence may be calculated, and the calculated cosine similarity may be determined as a feature similarity of the medical title and the medical search sentence.
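As an illustration of this first case, the following Python sketch computes the cosine similarity between two vectors given as plain lists of floats. The function name and the convention of returning 0.0 for a zero vector are assumptions of the sketch, not details stated in the description.

```python
import math


def cosine_similarity(title_vector, sentence_vector):
    """Cosine of the angle between a title vector and a sentence vector."""
    dot = sum(a * b for a, b in zip(title_vector, sentence_vector))
    norm_title = math.sqrt(sum(a * a for a in title_vector))
    norm_sentence = math.sqrt(sum(b * b for b in sentence_vector))
    if norm_title == 0.0 or norm_sentence == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_title * norm_sentence)
```

The result lies between -1 and 1, with larger values indicating closer semantic features.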
In another possible case, a similarity recognition model may be trained in advance. The similarity recognition model may be obtained by training with the vectors corresponding to the medical title sample and the medical search sentence sample in each sample pair, according to the feature similarity labeled for each of a plurality of sample pairs. Each sample pair comprises a medical title sample and a medical search sentence sample; the medical title sample is a medical title serving as a training sample, and the medical search sentence sample is a medical search sentence serving as a training sample.
For example, the similarity recognition model may be a classification model such as a normalized softmax model. On this basis, the similarity category of each sample pair is labeled in advance. For example, the similarity categories may include five categories, similarity category 1 to similarity category 5, which in order represent: completely the same; similarity exceeding eighty percent; similarity exceeding fifty percent and less than eighty percent; similarity less than fifty percent; and completely different. On this basis, a classification model such as softmax is trained using the title vectors of the medical title samples and the sentence vectors of the medical search sentence samples in the sample pairs labeled with similarity categories; the specific training process is not limited.
In this case, the feature similarity of the medical topic and the medical search sentence may be determined based on the topic vector of the medical topic and the sentence vector of the medical search sentence using the similarity recognition model.
S205, for each medical title, determining an intention matching result of the medical title and the medical search sentence by using the intention recognition model, based on the title vector of the medical title and the sentence vector of the medical search sentence.
Wherein the intention matching result is used for representing whether the medical intention between the medical title and the medical search statement is the same.
For example, the intention matching result can be divided into two categories, i.e., the same intention and the different intentions, and on this basis, for each medical topic, the intention matching result of the medical topic and the medical search statement can be the same intention or the different intentions.
The medical intention may characterize the direction of the medical knowledge that is to be expressed or requested. The medical intention of the medical search sentence characterizes the category of medical knowledge that is expected to be obtained through the medical search sentence, and the medical intention of a medical title characterizes the category of medical knowledge reflected by the medical content to which the medical title points.
For example, medical intent can be divided into symptoms, causes, medical visits, medications, treatments and preventions. For example, if the medical intention of the medical search statement is a symptom, it means that a medical introduction related to the disease symptom is expected to be searched by the medical search statement.
Whether the medical intention of the medical search sentence and that of the medical title are the same can be analyzed through the trained intention recognition model, thereby obtaining the intention matching result.
The intention recognition model is obtained by training with the respective vectors of the medical title sample and the medical search sentence sample in each sample pair, according to the intention matching results labeled for the plurality of sample pairs.
Each sample pair may include a medical title sample and a medical search sentence sample, and each sample pair is labeled with an intention matching result. On this basis, the title vector of the medical title sample and the sentence vector of the medical search sentence sample in each sample pair can be input in turn into the intention recognition model to be trained, and the intention matching result predicted by the intention recognition model for each sample pair is compared with the intention matching result actually labeled for that sample pair, until the comparison indicates that the prediction accuracy of the intention recognition model meets the requirement.
The plurality of sample pairs for training the intention recognition model may be the same as or different from the plurality of sample pairs for training the similarity recognition model, and the method is not limited thereto.
For the sake of convenience of distinction, the pair of samples used for training the intention recognition model will be referred to as a first pair of samples in the claims of the present application, and the pair of samples used for training the similarity recognition model will be referred to as a second pair of samples. Of course, the first sample pair and the second sample pair are only for distinguishing the sample pairs for training different models without other meanings, and in the following embodiments, the sample pair for training the similarity recognition model may be referred to as the first sample pair, and the sample pair for training the intention recognition model may be referred to as the second sample pair, as needed.
S206, determining the matching degree ranking of the medical titles by combining the feature similarity and the intention matching result of each medical title and the medical search sentence.
For example, a medical title with a higher feature similarity to the medical search sentence is ranked higher; among medical titles with the same feature similarity, a medical title whose intention matching result indicates the same intention as the medical search sentence is ranked higher.
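This ordering rule (feature similarity first, intention match as the tie-breaker) can be sketched in Python as follows. The tuple layout `(title, feature_similarity, same_intent)` and the function name are hypothetical choices made for illustration.

```python
def rank_titles(candidates):
    """Rank candidates given as (title, feature_similarity, same_intent).

    Titles are ordered by feature similarity, highest first. Among titles
    with equal feature similarity, those with the same intention as the
    search sentence (same_intent=True) come first.
    """
    return sorted(candidates, key=lambda c: (c[1], c[2]), reverse=True)


# Hypothetical candidates for one search sentence.
candidates = [
    ("title A", 0.8, False),
    ("title B", 0.9, False),
    ("title C", 0.8, True),
]
for title, similarity, same_intent in rank_titles(candidates):
    print(title, similarity, same_intent)
```

The same key works when the feature similarity is a similarity grade rather than a score, provided that a larger number denotes a higher similarity.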
It is understood that this embodiment analyzes the matching degree between the medical title and the medical search sentence from two dimensions: the feature similarity between the medical title and the medical search sentence, and whether their medical intentions are the same. The medical intention of the medical search sentence reflects the type of medical knowledge that the search sentence requests, and the medical intention of the medical title reflects the type of medical content to which the title points. Therefore, determining the matching degree ranking of the medical titles by combining the intention matching result with the feature similarity allows the medical titles corresponding to the medical content requested by the medical search sentence to be ranked higher, so that the medical content can be retrieved more accurately.
It can be understood that, in practical applications, after the matching degree ranks of the plurality of medical titles are determined, the medical content corresponding to at least one medical title with the top matching degree rank of the medical titles may be returned to the terminal according to the matching degree ranks of the plurality of medical titles.
It is understood that the present application may determine the matching degree of the medical treatment title and the medical search sentence by combining the feature similarity and the intention matching result of the medical treatment title and the medical search sentence.
If it is determined, according to the intention matching result of the medical title and the medical search sentence, that the medical title has the same intention as the medical search sentence, the value of the intention matching result is set to 1; accordingly, if the intention of the medical title is not the same as the intention of the medical search sentence, the value of the intention matching result is set to 0.
Accordingly, in the case of obtaining the feature similarity and intention matching result of the medical heading and the medical search sentence, determining the matching degree of the medical heading and the medical search sentence may be as follows:
calculating a first product of the feature similarity of the medical treatment title and the medical treatment search statement and a first weight;
calculating a second product of the value of the intention matching result of the medical title and the medical search statement and a second weight;
and determining the sum of the first product and the second product as the matching degree of the medical title and the medical search statement.
After the matching degree between the medical titles and the medical search statement is obtained, the matching degree sequence of the medical titles can be determined according to the matching degree between each medical title and the medical search statement. Of course, at least one medical content with a high degree of matching of medical titles may be returned to the terminal directly according to the degree of matching of each medical title.
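The weighted-sum computation of the matching degree can be sketched in Python as follows. The weight values 0.7 and 0.3, the function names, and the assumption that the feature similarity is a numeric score between 0 and 1 are illustrative choices; the description does not fix them.

```python
def matching_degree(feature_similarity, same_intent, w_sim=0.7, w_intent=0.3):
    """Matching degree = first product + second product.

    w_sim and w_intent stand for the first and second weights; their
    default values here are placeholders, not values from the description.
    """
    intent_value = 1.0 if same_intent else 0.0   # 1 for same intention, 0 otherwise
    first_product = w_sim * feature_similarity   # feature similarity x first weight
    second_product = w_intent * intent_value     # intention value x second weight
    return first_product + second_product


def top_titles(scored_titles, k=3):
    """Return the k titles with the highest matching degree.

    scored_titles holds (title, feature_similarity, same_intent) tuples.
    """
    ranked = sorted(
        scored_titles,
        key=lambda item: matching_degree(item[1], item[2]),
        reverse=True,
    )
    return [title for title, _, _ in ranked[:k]]
```

With weights like these, a title with a slightly lower feature similarity but the same intention can rank above a title with a higher feature similarity and a different intention.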
It can be seen that, after the medical search statement is obtained, for each medical topic, not only the semantic feature similarity between the medical topic and the medical search statement is determined by combining the topic vector of the medical topic and the statement vector of the medical search statement, but also whether the medical intention between the medical topic and the medical search statement is the same is determined. On the basis, by combining the semantic feature similarity of the medical titles and the medical search sentences and the intention matching result of the medical intention, the matching degree between the medical titles and the medical search sentences can be analyzed more comprehensively, so that the medical titles which are more matched with the medical search sentences can be more accurately determined from the medical titles, and further, the medical text contents pointed by the medical titles can be more accurately searched out based on the medical search sentences.
It can be understood that the vector difference between the sentence vector of the medical search sentence and the title vector of the medical heading can reflect the difference between the semantic features of the medical heading and the medical search sentence, that is, the similarity between the semantic features of the medical heading and the medical search sentence can be reflected in one dimension.
Based on this, before determining the feature similarity between the medical title and the medical search sentence, the present application can also determine the vector difference between the sentence vector of the medical search sentence and the title vector of the medical title to obtain a difference vector.
In an alternative, for each medical topic, the feature similarity between the medical topic and the medical search statement may be determined based on the topic vector of the medical topic, the statement vector of the medical search statement, and the difference vector, and using a similarity recognition model.
In this case, the similarity recognition model may be obtained by training, according to feature similarities labeled for each of the plurality of sample pairs, vectors corresponding to the medical heading sample and the medical search term sample in each second sample pair and a difference vector between the vectors of the medical heading sample and the medical search term sample in the sample pair.
For example, a softmax model trained using a plurality of pairs of samples labeled with feature similarities may be used as the similarity recognition model.
It can be understood that, in the above embodiments of the present application, in consideration of the fact that the dimensionality of the vector converted from the medical search statement and the medical heading statement is higher, the present application may further perform dimensionality reduction on the statement vector of the medical search statement and the heading vector of the medical heading, and then determine the above-mentioned feature similarity and intention matching result based on the reduced statement vector and the heading vector.
The following is described with reference to one implementation. Fig. 3 shows a flowchart of another embodiment of the medical title matching method according to the present application, and the method of this embodiment may include:
S301, obtaining a medical search statement.
S302, determining a statement vector of the medical search statement by using a BERT model.
S303, determining a title vector of each medical title in the plurality of medical titles to be matched by using the BERT model.
The BERT model is obtained by training with the masked word sequences corresponding to a plurality of medical corpus samples, with predicting the masked words in each masked word sequence as the training target. A masked word sequence is the word sequence obtained after at least one word in the medical corpus sample is masked.
It should be noted that, for convenience of understanding, the present embodiment is described by taking a vector conversion model as a BERT model trained by using medical corpus samples, but it is understood that the present embodiment is also applicable to a method of determining the sentence vector and the title vector by using other vector conversion models or by using other methods.
S304, reducing the dimension of the statement vector of the medical search statement through the vector dimension reduction model to obtain the statement vector after dimension reduction.
The vector dimension reduction model is obtained by training a statement vector of a medical search statement sample and a title vector of a medical title sample in each sample pair for training the similarity recognition model in the process of training the similarity recognition model. That is, the vector dimension reduction model can be trained together with the similarity recognition model.
For example, in an alternative, the vector dimension reduction model may be a pooling model.
S305, for each medical topic, reducing the dimension of the topic vector of the medical topic through the vector dimension reduction model to obtain the dimension-reduced topic vector.
S306, aiming at each medical heading, determining a vector difference between the statement vector after dimension reduction and the heading vector after dimension reduction corresponding to the medical heading to obtain a difference vector.
S307, aiming at each medical title, inputting the reduced-dimension title vector, the reduced-dimension statement vector and the difference vector into the trained similarity recognition model to obtain the feature similarity between the medical title and the medical search statement output by the similarity recognition model.
For example, for each medical topic, the reduced-dimension heading vector, the reduced-dimension statement vector, and the difference vector between the heading vector of the medical topic and the statement vector may be reconstructed into one vector, and then the reconstructed vector is input to the similarity recognition model to obtain the feature similarity output by the similarity recognition model.
In one possible case, the similarity recognition model may be a trained first softmax model; the feature similarity category of the medical title and the medical search statement can be determined through the first softmax model, and the degree of similarity between the medical title and the medical search statement is characterized by the feature similarity category.
S308, for each medical title, inputting the dimension-reduced title vector and the dimension-reduced statement vector into the intention recognition model to obtain the intention matching result of the medical title and the medical search statement.
Wherein the intention matching result is used for representing whether the medical intention between the medical title and the medical search statement is the same.
For example, the intention recognition model may be a trained second softmax model, through which the intention matching result between the medical title and the medical search statement can be determined.
S309, combining the feature similarity and intention matching results of the medical titles and the medical search sentences to determine matching degree ranking of the medical titles.
This step S309 can refer to the related description of the previous embodiment, and is not described herein again.
For the convenience of understanding the embodiment of fig. 3, reference may be made to fig. 4, which shows a functional block diagram of one implementation of medical topic matching of the present application, and in fig. 4, a vector transformation model is taken as a BERT model, a vector dimension reduction model is taken as a pooling model, a similarity recognition model is taken as a first softmax classification model, and an intention recognition model is taken as a second softmax classification model.
As can be seen from fig. 4, the medical search sentence may be processed by the BERT model to obtain a sentence vector u, and the medical heading may be processed by the BERT model to obtain a heading vector v. On the basis, the statement vector u is subjected to dimensionality reduction through a pooling layer to obtain a dimensionality-reduced statement vector u; meanwhile, the header vector v of the medical header is subjected to dimensionality reduction through the pooling layer to obtain the header vector v subjected to dimensionality reduction.
It should be noted that, to make it easier to understand how the reduced-dimension statement vector u and the reduced-dimension title vector v are obtained, fig. 4 draws the branch in which the medical search statement passes through the BERT model and the pooling layer separately from the branch in which the medical title passes through the BERT model and the pooling layer. In practical applications, however, the same BERT model may process both the medical search statement and the medical title, and correspondingly the pooling layer is also the same.
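The description does not specify which pooling operation the pooling layer performs. As a hedged illustration only, the sketch below assumes mean pooling over per-word vectors, which turns a variable-length sequence of word vectors into one fixed-size vector; the function name and the input layout (a list of equal-length lists of floats) are assumptions.

```python
def mean_pool(word_vectors):
    """Average a sequence of word vectors into one fixed-size vector.

    Mean pooling is an assumption of this sketch; the description only
    states that a pooling model performs the dimension reduction.
    """
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    count = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[d] for vec in word_vectors) / count for d in range(dim)]
```

Because the same function is applied to the output for the search statement and for each title, the two pooled vectors have the same size and can be subtracted element by element.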
On this basis, the difference vector u-v between the reduced-dimension statement vector u and the reduced-dimension title vector v is calculated, and then the reduced-dimension statement vector u, the reduced-dimension title vector v, and the difference vector u-v are input into the first softmax classification model serving as the similarity recognition model to obtain the similarity category of the medical title and the medical search statement.
Meanwhile, the sentence vector u after dimension reduction and the title vector v after dimension reduction are also input into a second softmax classification model serving as an intention identification model, and an intention matching result of the medical title and the medical search sentence is obtained.
On the basis, the similarity category and the intention matching result of each medical title and the medical search statement are combined, so that the matching degree sequence of each medical title and the medical search statement can be determined.
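How the vectors u, v and u-v might be assembled into a single classifier input can be sketched as follows. The concatenation order, the single linear layer in front of the softmax, and the names `build_features`, `softmax` and `classify` are assumptions made for illustration; the weights shown are placeholders, not trained values.

```python
import math


def build_features(u, v):
    """Concatenate the statement vector u, the title vector v, and u - v."""
    difference = [a - b for a, b in zip(u, v)]
    return list(u) + list(v) + difference


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Apply one linear layer followed by softmax (assumed architecture)."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Hypothetical 2-dimensional pooled vectors and a 2-category classifier.
features = build_features([1.0, 2.0], [0.0, 5.0])
weights = [[0.1] * 6, [-0.1] * 6]  # placeholder parameters
probabilities = classify(features, weights, biases=[0.0, 0.0])
predicted_category = probabilities.index(max(probabilities))
```

For the intention recognition model, the same `classify` function could take only the concatenation of u and v as input.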
In the embodiments of the present application, the vector conversion model can be trained separately. After the BERT model is trained, the intention recognition model and the similarity recognition model can be trained synchronously, or each can be trained separately.
For ease of understanding, the following describes possible scenarios for training the above models in this application.
First, a training process of the vector transformation model is described. For convenience of introduction, the vector transformation model is still taken as the BERT model, and the medical content pointed by the medical heading is taken as the medical text content for example.
As shown in fig. 5, it shows a schematic diagram of one implementation of the present application to train the BERT model.
In the application, a plurality of medical corpus samples can be obtained, and each medical corpus sample comprises a medical heading sample and medical text content corresponding to the medical heading sample.
For each medical corpus sample, a word sequence consisting of a plurality of words constituting a medical caption sample and a medical text content of the medical corpus sample can be obtained. On the basis, the words in the word sequence can be subjected to mask processing, so that partial words in the word sequence are masked, and the masked word sequence corresponding to each medical corpus sample is obtained.
As shown in fig. 5, some words in the word sequence portion corresponding to the medical title sample in the medical corpus sample are marked by the mask; similarly, some words in the word sequence portion corresponding to the medical text content in the medical corpus sample are marked by the mask. On this basis, after the masked word sequence with the mask marks is input into the BERT model to be trained, the BERT model can determine the word vectors of all words in the masked word sequence based on the context relationships among the words in the masked word sequence.
The word vectors of all words in the masked word sequence output by the BERT model can be input into the full-connection network layer, and the mask probability that all words in the masked word sequence belong to the words marked by the masks can be obtained through the full-connection network layer. On the basis, according to the mask probability of each word in the mask word sequence, the predicted masked word in the mask word sequence can be obtained.
Correspondingly, the prediction accuracy of the BERT model and the full-connection network layer is analyzed by combining the actual masked words in the mask word sequence corresponding to each medical corpus sample and the predicted masked words; and if the prediction accuracy does not meet the requirement, adjusting parameters in the BERT model and the fully-connected network layer, and retraining the BERT model by using each medical corpus sample until the prediction accuracy meets the requirement.
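One simple way to measure the prediction accuracy mentioned here is the fraction of masked positions at which the predicted word equals the actual masked word. The sketch below assumes that definition; the function names and the required accuracy of 0.9 are hypothetical, since the description does not state how the requirement is set.

```python
def masked_word_accuracy(actual, predicted):
    """Fraction of masked positions whose predicted word is correct.

    actual and predicted map a masked position to a word.
    """
    if not actual:
        return 0.0
    correct = sum(1 for pos, word in actual.items() if predicted.get(pos) == word)
    return correct / len(actual)


def needs_more_training(accuracy, required=0.9):
    """True while the accuracy is below the (assumed) required level."""
    return accuracy < required
```

While `needs_more_training` returns True, the parameters of the BERT model and the fully-connected network layer would be adjusted and another training pass run.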
After training the BERT model, the similarity recognition model and the intention recognition model may be trained in conjunction with the architecture diagram shown in fig. 4. The similarity recognition model is exemplified as a first normalized softmax classification model, and the intent recognition model is exemplified as a second normalized softmax classification model, and is described with reference to fig. 4. Fig. 6 is a schematic diagram illustrating an implementation flow of training a similarity recognition model and an intention recognition model according to the present application. The method of the embodiment can comprise the following steps:
S601, obtaining a plurality of first sample pairs marked with similarity categories and a plurality of second sample pairs marked with intention matching results.
Wherein any one of the first sample pair and the second sample pair is composed of a pair of a medical caption sample and a medical search sentence sample.
The similarity category at least comprises two categories of similar features and dissimilar features, and the feature similar category between the similar features and the dissimilar features can be set according to requirements.
It can be understood that this embodiment takes as an example training the similarity recognition model with the first sample pairs and training the intention recognition model with the second sample pairs. In practical applications, the same plurality of sample pairs may also be used to train both the similarity recognition model and the intention recognition model, in which case each sample pair is labeled with both a similarity category and an intention matching result.
S602, for any sample pair of the first sample pair and the second sample pair, respectively determining a heading vector corresponding to the medical heading sample and a sentence vector corresponding to the medical search sentence sample in the sample pair by using the trained BERT model, and executing step S603.
S603, pooling the title vector of the medical title sample and the statement vector corresponding to the medical search statement sample respectively by using a pooling model to be trained, to obtain a pooled title vector and a pooled statement vector.
S604, calculating a difference vector between the pooled statement vector and the pooled title vector corresponding to the first sample pair, and inputting the pooled title vector of the medical title sample in the first sample pair, the pooled statement vector corresponding to the medical search statement sample, and the difference vector into a first normalized classification model to obtain the similarity category predicted by the first normalized classification model.
S605, inputting the pooled title vector of the medical title sample and the pooled statement vector corresponding to the medical search statement sample in each second sample pair into the second normalized classification model to obtain the intention identification result predicted by the second normalized classification model.
S606, detecting whether the training end condition is met or not according to the corresponding actually marked similarity class and the predicted similarity class of each first sample pair, the corresponding actually marked intention identification result and the predicted intention identification result of each second sample pair, and if so, finishing the training; if not, adjusting parameters in the first normalized classification model, the second normalized classification model and the pooling model, and returning to the step S603 until the training end condition is met.
The training end condition may be set as needed.
For example, the loss function value may be calculated in accordance with a set loss function; and if the loss function value is determined to be converged, determining that the training end condition is met.
For the first normalized classification model, the objective is to optimize the matching classification function softmax(u, v, |u-v|).
For example, this objective function can be optimized using a cross-entropy loss, which can be expressed as follows:
$$L = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}\,\log f(x_{ij})$$
where n is the number of first sample pairs, m is the number of similarity categories set, and $y_{ij}$ is the label indicating whether the i-th sample pair belongs to similarity category j: its value is 1 if the i-th sample pair belongs to similarity category j, and 0 otherwise. For a single-label classification task, each sample pair belongs to exactly one category, so only one of its labels is non-zero. $f(x_{ij})$ is the predicted probability that sample pair i belongs to similarity category j. The size of the loss therefore depends entirely on the probability assigned to the correct label: when every sample pair is assigned to its correct category with probability 1, the loss is 0; otherwise the loss is greater than 0.
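The cross-entropy computation described here can be sketched in Python for one-hot labels. Averaging over the n sample pairs and clamping probabilities with a small `eps` to avoid log(0) are assumptions of this sketch.

```python
import math


def cross_entropy(labels, probabilities, eps=1e-12):
    """Cross-entropy loss over n sample pairs and m similarity categories.

    labels[i][j] is 1 if sample pair i belongs to category j, else 0.
    probabilities[i][j] is the predicted probability of category j for pair i.
    """
    n = len(labels)
    total = 0.0
    for label_row, prob_row in zip(labels, probabilities):
        for y, p in zip(label_row, prob_row):
            if y:  # only the correct category contributes to the loss
                total -= y * math.log(max(p, eps))
    return total / n  # averaging over pairs is an assumption of this sketch
```

A pair whose correct category receives probability 1 contributes nothing to the loss, while lower probabilities for the correct category increase it.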
In fig. 6, the first normalized classification model and the second normalized classification model can be regarded as a multi-task model training, and on this basis, the overall objective function of the multi-task model training can be expressed as:
Obj_total=alpha*softmax(u,v,|u-v|)+(1-alpha)*ObjFuntion(class(u),class(v))
Here alpha represents the importance weight of the term softmax (u, v, | u-v |) corresponding to the first normalized classification model, with a value range of 0 to 1; alpha can generally be set to a value greater than 0.5 and less than 1.
For example, when the first sample pairs and the second sample pairs are the same, the goal of the multi-task model training is for the value of the overall objective function determined for each sample pair to converge.
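The alpha-weighted combination of the two objectives and a simple convergence test can be sketched as follows. Treating both terms as loss values, the default alpha of 0.7, and the convergence rule (the last few changes all below a tolerance) are assumptions of this sketch; the description leaves the training end condition open.

```python
def total_objective(similarity_loss, intent_loss, alpha=0.7):
    """Combine the two task losses as alpha * first + (1 - alpha) * second."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * similarity_loss + (1.0 - alpha) * intent_loss


def has_converged(history, tolerance=1e-4, window=3):
    """True if the last `window` changes in the objective are all small.

    history is the list of overall objective values, one per training round.
    """
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(abs(b - a) < tolerance for a, b in zip(recent, recent[1:]))
```

Setting alpha above 0.5 gives the similarity task more influence on the shared pooling parameters than the intention task.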
It is to be understood that after the training is completed, the first normalized classification model is the aforementioned similarity recognition model, the second normalized classification model is the intention recognition model, and the pooling model is the vector dimension reduction model.
It is understood that fig. 6 is only one implementation of training the intention recognition model and the similarity recognition model in the present application. In practical applications, the similarity recognition model may be trained first, and the pooling model may be trained synchronously in the process of training the similarity recognition model. Once training of the similarity recognition model is complete, the intention recognition model may be trained separately with a plurality of second sample pairs on the basis of the trained BERT model and pooling model.
An application scenario is introduced below, taking as an example the case where the medical retrieval platform is a medical dictionary platform providing a medical encyclopedia dictionary.
As shown in fig. 7, the medical dictionary platform 710 may include a plurality of servers 711 that provide medical encyclopedia dictionary services.
The terminal 720 may be installed with a medical dictionary application corresponding to the medical encyclopedia dictionary.
The terminal 720 may send a medical search request to the server 711 of the medical dictionary platform through the medical dictionary application. Wherein the medical search request carries a medical search statement.
The server 711 of the medical dictionary platform obtains the medical search statement from the medical search request and, according to the scheme of any one of the above embodiments, determines the feature similarity and the intention matching result between the medical title of each piece of medical text content and the medical search statement; then, for each medical title, it determines the matching degree between the medical title and the medical search statement according to the feature similarity and the intention matching result. On this basis, the server of the medical dictionary platform can, according to the matching degree corresponding to each medical title, return to the terminal the medical science popularization article corresponding to at least one medical title with a high matching degree.
Accordingly, the terminal may present each medical science popularization article returned by the server of the medical dictionary platform. Fig. 8 is a schematic diagram of an interface in which the terminal presents the retrieved medical text content. As shown in fig. 8, after the medical search statement "child anorexia" is entered in the search field at the terminal, the medical science popularization articles returned by the server for the medical search statement may include, in order: a science popularization article whose medical title is "how to care for child anorexia", a science popularization article whose medical title is "how to treat child anorexia", and the like. On this basis, the specific content of a science popularization article can be viewed by opening that article.
The present application also provides a medical title matching device. Fig. 9 shows a schematic structural diagram of an embodiment of the medical title matching device of the present application. The device may include:
a statement obtaining unit 901, configured to obtain a medical search statement;
a first vector determination unit 902, configured to determine, for each medical title of a plurality of medical titles to be matched, a title vector of the medical title;
a second vector determination unit 903, configured to determine a statement vector of the medical search statement;
a feature determination unit 904, configured to determine, for each medical title, a feature similarity between the medical title and the medical search statement based on the title vector of the medical title and the statement vector of the medical search statement;
an intention determining unit 905, configured to determine, for each medical title, an intention matching result between the medical title and the medical search statement by using an intention recognition model based on the title vector of the medical title and the statement vector of the medical search statement, where the intention matching result characterizes whether the medical title and the medical search statement have the same medical intention, and the intention recognition model is obtained by training on the intention matching result labeled for each of a plurality of first sample pairs, using the respective vectors of the medical title sample and the medical search statement sample in each first sample pair;
a matching determination unit 906, configured to determine a matching degree ranking of the medical titles by combining the feature similarity and the intention matching result between each medical title and the medical search statement.
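As an illustrative sketch only, the cooperation of the feature determination, intention determining, and matching determination units might look as follows. The use of cosine similarity as the feature similarity, the callable `intent_model`, and the rule of placing intention-matched titles ahead of the others are all assumptions made for this example; the text above does not prescribe a specific scoring or combination rule.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in feature similarity (assumption: the patent may use a trained model instead)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_titles(
    statement_vec: Vector,
    title_vecs: Dict[str, Vector],
    intent_model: Callable[[Vector, Vector], bool],
) -> List[Tuple[str, float, bool]]:
    """Return (title, similarity, same_intent) tuples, best match first.

    Hypothetical combination rule: titles whose intention matches the
    search statement come first; ties are broken by feature similarity.
    """
    scored = []
    for title, title_vec in title_vecs.items():
        similarity = cosine_similarity(statement_vec, title_vec)
        same_intent = intent_model(title_vec, statement_vec)
        scored.append((title, similarity, same_intent))
    scored.sort(key=lambda item: (item[2], item[1]), reverse=True)
    return scored
```

In practice `intent_model` would wrap the trained intention recognition model; here any function that takes a title vector and a statement vector and returns a boolean can be passed in.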
In one possible implementation, the apparatus may further include:
a difference determining unit, configured to determine a vector difference between the statement vector of the medical search statement and the title vector of the medical title to obtain a difference vector, before the feature determination unit determines the feature similarity between the medical title and the medical search statement;
correspondingly, the feature determination unit is specifically configured to determine the feature similarity between the medical title and the medical search statement based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector.
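The difference vector described above can be sketched as follows. Element-wise subtraction follows directly from the text; concatenating the three vectors into a single model input is an assumption made for illustration, since the text only states that the similarity is determined "based on" them. The function names are hypothetical.

```python
from typing import List, Sequence


def difference_vector(statement_vec: Sequence[float], title_vec: Sequence[float]) -> List[float]:
    """Element-wise difference: statement vector minus title vector."""
    if len(statement_vec) != len(title_vec):
        raise ValueError("statement and title vectors must have the same dimension")
    return [s - t for s, t in zip(statement_vec, title_vec)]


def build_similarity_input(title_vec: Sequence[float], statement_vec: Sequence[float]) -> List[float]:
    """Assumed input layout: [title vector, statement vector, difference vector]."""
    diff = difference_vector(statement_vec, title_vec)
    return list(title_vec) + list(statement_vec) + diff
```

Supplying the difference explicitly gives a downstream model a direct signal of where the two vectors disagree, rather than leaving it to infer that from the raw vectors.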
As an alternative, the feature determination unit includes:
a feature determination subunit, configured to determine the feature similarity between the medical title and the medical search statement by using a similarity recognition model based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector;
where the similarity recognition model is obtained by training on the feature similarity labeled for each of a plurality of second sample pairs, using the respective vectors of the medical title sample and the medical search statement sample in each second sample pair and the difference vector between the vectors of the medical title sample and the medical search statement sample.
In an alternative, the apparatus further includes:
a first vector dimension reduction unit, configured to reduce the dimension of the statement vector of the medical search statement through a vector dimension reduction model before the feature determination unit and the intention determining unit determine the feature similarity and the intention matching result between the medical title and the medical search statement, where the vector dimension reduction model is obtained, in the process of training the similarity recognition model, by training with the title vectors of the medical title samples and the statement vectors of the medical search statement samples in the second sample pairs;
a second vector dimension reduction unit, configured to reduce the dimension of the title vector of the medical title through the vector dimension reduction model;
the difference determining unit is specifically configured to determine the vector difference between the dimension-reduced statement vector of the medical search statement and the dimension-reduced title vector of the medical title, so as to obtain the difference vector.
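A minimal sketch of this order of operations, assuming the vector dimension reduction model can be represented as a single linear projection matrix (the text does not specify the model's form). The same hypothetical `projection` is applied to both vectors before the difference is taken.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def reduce_dimension(vec: Sequence[float], projection: Matrix) -> List[float]:
    """Apply an assumed linear projection: one output component per matrix row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in projection]


def reduced_difference(
    statement_vec: Sequence[float],
    title_vec: Sequence[float],
    projection: Matrix,
) -> List[float]:
    """Reduce both vectors with the same model, then take their difference."""
    reduced_statement = reduce_dimension(statement_vec, projection)
    reduced_title = reduce_dimension(title_vec, projection)
    return [s - t for s, t in zip(reduced_statement, reduced_title)]
```

Using one shared projection keeps the reduced statement and title vectors in the same space, so their element-wise difference remains meaningful.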
In a possible implementation, the first vector determination unit is specifically configured to determine the title vector of the medical title by using a vector conversion model, where the vector conversion model is a Bidirectional Encoder Representations from Transformers (BERT) model obtained by training on masked word sequences corresponding to a plurality of medical corpus samples, with prediction of the masked words in each masked word sequence as the training target; each medical corpus sample consists of a medical title sample and the medical text content represented by the medical title sample, and the masked word sequence is a word sequence obtained by masking at least one word contained in the medical corpus sample;
the second vector determination unit is specifically configured to determine the statement vector of the medical search statement by using the vector conversion model.
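The construction of a masked word sequence and its training targets can be sketched as follows. The `[MASK]` token string, the function name, and the choice of caller-supplied mask positions are assumptions for illustration; how the words to be masked are selected is not specified in the text above.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

MASK_TOKEN = "[MASK]"  # assumed placeholder token, as commonly used with BERT


def mask_words(
    words: Sequence[str], positions: Iterable[int]
) -> Tuple[List[str], Dict[int, str]]:
    """Return the masked word sequence and the words the model should predict.

    `words` stands for a medical corpus sample (title sample followed by
    the text content it represents), already split into words.
    """
    masked = list(words)
    targets: Dict[int, str] = {}
    for pos in positions:
        targets[pos] = words[pos]  # original word is the training target
        masked[pos] = MASK_TOKEN
    return masked, targets
```

During training, the model would receive the masked sequence as input and be scored on how well it recovers each entry of `targets`.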
In yet another aspect, the present application further provides a server in a medical retrieval platform. Fig. 10 is a schematic diagram illustrating a component architecture of a server provided in the present application. In fig. 10, the server 1000 may include: a processor 1001 and a memory 1002.
Optionally, the server may further include: a communication interface 1003, an input unit 1004, a display 1005, and a communication bus 1006.
The processor 1001, the memory 1002, the communication interface 1003, the input unit 1004, and the display 1005 all communicate with each other via the communication bus 1006.
In the embodiment of the present application, the processor 1001 may be a central processing unit, an application specific integrated circuit, or the like.
The processor may call a program stored in the memory 1002, and in particular, the processor may perform the operations on the server side in the above embodiments.
The memory 1002 is used for storing one or more programs, which may include program code comprising computer operation instructions. In the embodiment of the present application, the memory stores at least a program for implementing the medical title matching method in any one of the above embodiments.
In one possible implementation, the memory 1002 may include a program storage area and a data storage area, where the program storage area may store an operating system, the above-mentioned programs, and the like, and the data storage area may store data created during use of the server.
The communication interface 1003 may be an interface of a communication module.
The server of the present application may further include an input unit 1004, which may include a touch sensing unit, a keyboard, and the like.
The display 1005 includes a display panel, such as a touch display panel or the like.
Of course, the server structure shown in fig. 10 does not constitute a limitation on the server in the embodiment of the present application. In practical applications, the server may include more or fewer components than those shown in fig. 10, or some components may be combined.
In another aspect, the present application further provides a storage medium having stored therein computer-executable instructions, which when loaded and executed by a processor, implement the medical title matching method as in any one of the above embodiments.
The present application also proposes a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method provided in the various optional implementations of the medical title matching method aspect or the medical title matching device aspect.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Meanwhile, the features described in the embodiments of this specification may be replaced or combined with each other, so that those skilled in the art can implement or use the present application. Since the device embodiments are basically similar to the method embodiments, their description is relatively brief, and for relevant points reference may be made to the corresponding description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between the entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements should also be considered to fall within the protection scope of the present invention.

Claims (10)

1. A medical title matching method, comprising:
obtaining a medical search statement;
determining, for each medical title of a plurality of medical titles to be matched, a title vector of the medical title;
determining a statement vector of the medical search statement;
for each medical title, determining a feature similarity between the medical title and the medical search statement based on the title vector of the medical title and the statement vector of the medical search statement;
for each medical title, determining an intention matching result between the medical title and the medical search statement by using an intention recognition model based on the title vector of the medical title and the statement vector of the medical search statement, wherein the intention matching result characterizes whether the medical title and the medical search statement have the same medical intention, and the intention recognition model is obtained by training on the intention matching result labeled for each of a plurality of first sample pairs, using the respective vectors of the medical title sample and the medical search statement sample in each first sample pair;
and determining a matching degree ranking of the medical titles by combining the feature similarity and the intention matching result between each medical title and the medical search statement.
2. The method of claim 1, further comprising, prior to said determining a feature similarity between the medical title and the medical search statement:
determining a vector difference between the statement vector of the medical search statement and the title vector of the medical title to obtain a difference vector;
wherein the determining a feature similarity between the medical title and the medical search statement based on the title vector of the medical title and the statement vector of the medical search statement comprises:
determining the feature similarity between the medical title and the medical search statement based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector.
3. The method of claim 2, wherein determining the feature similarity between the medical title and the medical search statement based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector comprises:
determining the feature similarity between the medical title and the medical search statement by using a similarity recognition model based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector;
wherein the similarity recognition model is obtained by training on the feature similarity labeled for each of a plurality of second sample pairs, using the respective vectors of the medical title sample and the medical search statement sample in each second sample pair and the difference vector between the vector of the medical title sample and the vector of the medical search statement sample.
4. The method of claim 3, further comprising, prior to determining the feature similarity and the intention matching result between the medical title and the medical search statement:
reducing the dimension of the statement vector of the medical search statement through a vector dimension reduction model, wherein the vector dimension reduction model is obtained, in the process of training the similarity recognition model, by training with the title vectors of the medical title samples and the statement vectors of the medical search statement samples in the second sample pairs;
reducing the dimension of the title vector of the medical title through the vector dimension reduction model;
wherein the determining a vector difference between the statement vector of the medical search statement and the title vector of the medical title to obtain a difference vector comprises:
determining a vector difference between the dimension-reduced statement vector of the medical search statement and the dimension-reduced title vector of the medical title to obtain the difference vector.
5. The method of claim 1, wherein said determining a title vector of the medical title comprises:
determining the title vector of the medical title by using a vector conversion model;
the determining a statement vector of the medical search statement comprises:
determining the statement vector of the medical search statement by using the vector conversion model;
wherein the vector conversion model is a Bidirectional Encoder Representations from Transformers (BERT) model obtained by training on masked word sequences corresponding to a plurality of medical corpus samples, with prediction of the masked words in each masked word sequence as the training target;
each medical corpus sample consists of a medical title sample and the medical text content represented by the medical title sample, and the masked word sequence is a word sequence obtained by masking at least one word contained in the medical corpus sample.
6. A medical title matching apparatus, comprising:
a statement obtaining unit, configured to obtain a medical search statement;
a first vector determination unit, configured to determine, for each medical title of a plurality of medical titles to be matched, a title vector of the medical title;
a second vector determination unit, configured to determine a statement vector of the medical search statement;
a feature determination unit, configured to determine, for each medical title, a feature similarity between the medical title and the medical search statement based on the title vector of the medical title and the statement vector of the medical search statement;
an intention determining unit, configured to determine, for each medical title, an intention matching result between the medical title and the medical search statement by using an intention recognition model based on the title vector of the medical title and the statement vector of the medical search statement, wherein the intention matching result characterizes whether the medical title and the medical search statement have the same medical intention, and the intention recognition model is obtained by training on the intention matching result labeled for each of a plurality of first sample pairs, using the respective vectors of the medical title sample and the medical search statement sample in each first sample pair;
and a matching determination unit, configured to determine a matching degree ranking of the medical titles by combining the feature similarity and the intention matching result between each medical title and the medical search statement.
7. The apparatus of claim 6, further comprising:
a difference determining unit, configured to determine a vector difference between the statement vector of the medical search statement and the title vector of the medical title to obtain a difference vector, before the feature determination unit determines the feature similarity between the medical title and the medical search statement;
wherein the feature determination unit is specifically configured to determine the feature similarity between the medical title and the medical search statement based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector.
8. The apparatus of claim 7, wherein the feature determination unit comprises:
a feature determination subunit, configured to determine the feature similarity between the medical title and the medical search statement by using a similarity recognition model based on the title vector of the medical title, the statement vector of the medical search statement, and the difference vector;
wherein the similarity recognition model is obtained by training on the feature similarity labeled for each of a plurality of second sample pairs, using the respective vectors of the medical title sample and the medical search statement sample in each second sample pair and the difference vector between the vector of the medical title sample and the vector of the medical search statement sample.
9. A server, comprising a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, and the program, when executed, is specifically used to implement the medical title matching method of any one of claims 1 to 5.
10. A storage medium, storing a program which, when executed, implements the medical title matching method according to any one of claims 1 to 5.
CN202110049743.8A 2021-01-14 2021-01-14 Medical title matching method, device, equipment and storage medium Pending CN113569124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110049743.8A CN113569124A (en) 2021-01-14 2021-01-14 Medical title matching method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110049743.8A CN113569124A (en) 2021-01-14 2021-01-14 Medical title matching method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113569124A true CN113569124A (en) 2021-10-29

Family

ID=78160928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110049743.8A Pending CN113569124A (en) 2021-01-14 2021-01-14 Medical title matching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113569124A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114741526A (en) * 2022-03-23 2022-07-12 中国人民解放军国防科技大学 Knowledge graph cloud platform in network space security field
CN114741526B (en) * 2022-03-23 2024-02-02 中国人民解放军国防科技大学 Knowledge graph cloud platform in network space safety field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination