WO2020007028A1

WO2020007028A1 - Medical consultation data recommendation method, device, computer apparatus, and storage medium

Info

Publication number: WO2020007028A1
Application number: PCT/CN2019/071525
Authority: WO
Inventors: 高羽; 柳恭; 葛培明; 孙行智
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-07-04
Filing date: 2019-01-14
Publication date: 2020-01-09
Also published as: CN109147934A; CN109147934B

Abstract

A medical consultation data recommendation method, comprising: acquiring a current question to be answered, performing word segmentation, extracting feature words according to a word segmentation result, and obtaining a first feature word set corresponding to the current question; acquiring a second feature word set corresponding to each index node in a pre-established index; calculating a cosine similarity level between the first feature word set and the second feature word set, sorting each index node according to a first similarity calculation result, so as to select a pre-determined number of index nodes as target index nodes, and obtaining a target index node set; acquiring, from a medical consultation database, question-answer pairs respectively corresponding to the target index nodes; and calculating second similarity levels between the current question and questions respectively corresponding to the question-answer pairs, sorting the question-answer pairs according to the second similarity levels, so as to select a target question-answer pair, and recommending medical consultation data according to the selected question-answer pair.

Description

Recommendation method, device, computer equipment and storage medium for consultation data

Cross-reference to related applications

This application claims priority to Chinese patent application No. 2018107242917, filed with the Chinese Patent Office on July 4, 2018 and entitled "Recommendation Method, Device, Computer Equipment, and Storage Medium for Consultation Data", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to a method, an apparatus, a computer device, and a storage medium for recommending consultation data.

Background technique

With the rapid development of Internet technology, Internet-based online consultation and online health consultation have become more and more popular. In online consultation and online health consultation, each user expects the fastest response from the doctor after asking a question.

In the traditional technology, after seeing the user's question, the doctor needs to think, organize the language, write the answer, and click to send it before the user can see the response. However, the inventor realized that this approach makes consultation inefficient.

Summary of the invention

According to various embodiments disclosed in the present application, a method, an apparatus, a computer device, and a storage medium for recommending consultation data are provided.

A method for recommending consultation data includes:

Acquiring the current question to be answered, segmenting the current question to be answered, and extracting feature words according to the result of the word segmentation to obtain a first feature word set corresponding to the current question to be answered;

Obtaining a second feature word set corresponding to each index node in a pre-established index;

Calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, thereby obtaining a target index node set;

Obtaining question and answer pairs corresponding to each target index node in the target index node set from the consultation database; and

Calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending consultation data according to the selected target question-answer pair.

A device for recommending consultation data includes:

A first feature word set acquisition module, configured to obtain a current question to be answered, perform word segmentation on the current question to be answered, extract feature words according to the word segmentation result, and obtain a first feature word set corresponding to the current question to be answered;

A second feature word set acquisition module, configured to obtain a second feature word set corresponding to each index node in a pre-established index;

A target index node set acquisition module, configured to calculate a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and to sort the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, thereby obtaining a target index node set;

A question-and-answer pair acquisition module, for obtaining a question-and-answer pair corresponding to each target index node in the target index node set from the consultation database; and

A recommendation module, configured to calculate a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sort the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommend consultation data according to the selected target question-answer pair.

A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, implement the steps of the method for recommending consultation data provided in any embodiment of the present application.

One or more non-transitory computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the method for recommending consultation data provided in any embodiment of the present application.

Details of one or more embodiments of the present application are set forth in the accompanying drawings and description below. Other features and advantages of the application will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application. Those of ordinary skill in the art can also obtain other drawings according to these drawings without paying creative labor.

FIG. 1 is an application scenario diagram of a method for recommending consultation data according to one or more embodiments.

FIG. 2 is a schematic flowchart of a method for recommending consultation data according to one or more embodiments.

FIG. 3 is a schematic flowchart before step S202 in one or more embodiments.

FIG. 4 is a schematic flowchart of step S304 according to one or more embodiments.

FIG. 5 is a schematic flowchart of step S206 according to one or more embodiments.

FIG. 6 is a schematic flowchart of step S502 according to one or more embodiments.

FIG. 7 is a structural block diagram of a consultation data recommendation device according to one or more embodiments.

FIG. 8 is a block diagram of a consultation data recommendation device in another embodiment.

FIG. 9 is a block diagram of a computer device according to one or more embodiments.

Detailed description

In order to make the technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the application and are not intended to limit it.

The method for recommending consultation data provided in this application can be applied to the application environment shown in FIG. 1. The consultation terminal 102 and the doctor terminal 104 each communicate with the server 106 through a network. After receiving the question to be answered sent by the consultation terminal 102, the server 106 performs word segmentation on the question, extracts feature words according to the word segmentation result to obtain a first feature word set corresponding to the question, and obtains the second feature word set corresponding to each index node in a pre-built index. The server 106 then calculates the first similarity between the first feature word set and the second feature word set of each index node, and sorts the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set. Next, it looks up the question-answer pairs corresponding to each target index node in the consultation database, calculates the second similarity between the current question to be answered and the question of each question-answer pair, sorts the question-answer pairs according to the second similarity calculation result to select target question-answer pairs, and recommends consultation data to the doctor terminal 104 according to the selected target question-answer pairs. The recommended consultation data can be the entire target question-answer pair or only the reply in the target question-answer pair.

The consultation terminal 102 and the doctor terminal 104 may be, but are not limited to, various personal computers, notebook computers, smart phones, and tablet computers. The server 106 may be implemented by an independent server or a server cluster composed of multiple servers.

In some embodiments, as shown in FIG. 2, a method for recommending consultation data is provided. Taking the application of the method to the server in FIG. 1 as an example, the method includes the following steps:

Step S202: Obtain a current question to be answered, segment the current question to be answered, and extract feature words according to the result of the word segmentation to obtain a first feature word set corresponding to the question currently to be answered.

Specifically, the question to be answered is a question entered by the consulting user at the consultation terminal. When the consulting user enters a question at the consultation terminal, the server receives the question sent by the consultation terminal and segments it to obtain a segmentation result, that is, the sequence of words obtained after segmentation. For example, the segmentation result for "What should I do for my stomachache" can be: I / stomachache / what to do.

To segment the current question to be answered, the question can first be divided into complete sentences according to punctuation, and word segmentation is then performed on each sentence. For example, a string-matching segmentation method can be used, such as forward maximum matching, which segments the string of a sentence from left to right; reverse maximum matching, which segments the string from right to left; shortest-path segmentation, which requires the number of words cut from the string to be minimal; or bidirectional maximum matching, which performs segmentation matching in both the forward and reverse directions. Word-meaning segmentation can also be used; this is a machine-judgment method that uses syntactic and semantic information to resolve ambiguity. Statistical segmentation can also be used: based on phrase statistics from the search history of the current user or of users in general, if two adjacent words are found to occur together frequently, the two adjacent words can be treated as a phrase during segmentation.
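As an illustration of the forward maximum matching variant mentioned above, the following is a minimal sketch rather than the implementation used by the application. The function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are assumptions introduced only for this example.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Segment `sentence` left to right, preferring the longest dictionary word."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is accepted as a fallback when nothing matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real system would load a domain lexicon.
toy_dictionary = {"ab", "abc", "cd", "de"}
segments = forward_max_match("abcde", toy_dictionary)
```

Reverse maximum matching follows the same idea but scans from the end of the sentence toward the beginning.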

Further, the server extracts feature words according to the segmentation result. In some embodiments, extracting feature words may specifically be matching each word in the segmentation result against each word in a pre-established feature word library and using the matched words as feature words. In some embodiments, a match means that the two words are exactly the same. In other embodiments, a match means that the similarity between the two words exceeds a preset threshold, so that, for example, two near-synonymous expressions for belly pain match each other. The feature word library can be built from authoritative explanations of various diseases obtained from existing medical databases, including the corresponding introduction, symptoms, complications, therapeutic drugs, common examinations, and other professional information; from medical information on various drugs, such as the types of disease a drug treats; or from specific types of information (for example, the treatment plans, therapeutic drugs, responsible departments, and clinical manifestations corresponding to different diseases) obtained in real time or periodically, using tools such as web crawlers, from open medical data sources on the Internet (for example, questions, answers, and discussions about different diseases on various forums, or newly published medical cases and medical question-and-answer texts).
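The matching step can be sketched as follows. This is only an illustration: the application does not specify which word similarity measure is used, so `difflib.SequenceMatcher` from the Python standard library is swapped in as a stand-in, and the function name, the `threshold` parameter, and the sample library are assumptions.

```python
from difflib import SequenceMatcher


def extract_feature_words(segmented, feature_library, threshold=1.0):
    """Keep the segmented words that match an entry of the feature word library.

    threshold=1.0 corresponds to exact matching; a lower threshold accepts
    near matches (the similarity measure here is an assumed stand-in).
    """
    features = []
    for word in segmented:
        for entry in feature_library:
            if word == entry:
                score = 1.0
            else:
                score = SequenceMatcher(None, word, entry).ratio()
            if score >= threshold:
                features.append(word)
                break
    return features


# Hypothetical feature word library.
library = {"stomachache", "fever"}
exact = extract_feature_words(["I", "stomachache", "what to do"], library)
fuzzy = extract_feature_words(["stomach-ache"], library, threshold=0.9)
```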

Step S204: Obtain a second feature word set corresponding to each index node in the pre-established index.

Specifically, question-answer pairs are extracted in advance from historical consultation data, and feature extraction is then performed on them. The extracted features include at least the feature words corresponding to the question in each question-answer pair; these feature words form the second feature word set. Each question-answer pair and its corresponding features are saved in the same row of a data table in the consultation database, and an index is finally built on the consultation database according to the feature column. Each index node in the index includes an index value and a pointer. The index value includes at least the second feature word set of the corresponding question-answer pair, and the pointer refers to a memory area that records a reference to the corresponding row of data stored on the hard disk. A question-answer pair is an information pair consisting of a question from the consulting user and a reply from a doctor. It can consist of one question from the consulting user and one answer from the doctor, one question and multiple answers, multiple consecutive questions and one answer, or multiple consecutive questions from the consulting user and multiple consecutive answers from the doctor.

In this embodiment, the server traverses each index node in the index in turn and obtains its index value, thereby obtaining the second feature word set corresponding to each index node.
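One way to picture an index node (an index value plus a pointer to a database row) is the sketch below. The class name `IndexNode`, its field names, and the use of an integer row identifier as the "pointer" are assumptions made for illustration; the application does not prescribe a concrete layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexNode:
    feature_words: frozenset  # index value: the second feature word set
    row_id: int               # assumed stand-in for the pointer to the table row


def collect_second_feature_sets(index):
    """Traverse the index nodes in order and read each node's index value."""
    return [node.feature_words for node in index]


# Hypothetical two-node index.
index = [
    IndexNode(frozenset({"stomachache", "diarrhea"}), row_id=1),
    IndexNode(frozenset({"fever"}), row_id=2),
]
second_sets = collect_second_feature_sets(index)
```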

Step S206: Calculate the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sort the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set.

Specifically, the first similarity represents the degree of similarity between the first feature word set and the second feature word set. In some embodiments, the first similarity may be a cosine similarity. To calculate the cosine similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to any index node, keywords are first extracted from the first feature word set and the second feature word set to obtain a first keyword set corresponding to the question to be answered and a second keyword set corresponding to the index node. A word frequency vector is then computed for each of the two keyword sets, and the cosine similarity is obtained as the cosine of the angle between the two word frequency vectors.

Further, the server sorts the index nodes of the index according to the magnitude of the cosine similarity, and selects a preset number of index nodes as target index nodes according to the sorting result to obtain a target index node set. In some embodiments, the server may sort the index nodes in descending order of cosine similarity and select the top N1 index nodes as the target index nodes, where N1 is a preset value that can be set and adjusted based on experience.
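The cosine similarity over word frequency vectors and the top-N1 selection can be sketched as follows. This is a minimal illustration under assumptions: keyword sets are passed as plain word lists, the index is modeled as a dictionary from a node identifier to its keyword list, and the names `cosine_similarity`, `select_target_nodes`, and `n1` are introduced only for this example.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine of the angle between the word frequency vectors of two word lists."""
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty keyword set is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_target_nodes(question_words, index, n1):
    """Rank index nodes by descending cosine similarity and keep the top n1 ids."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(question_words, item[1]),
        reverse=True,
    )
    return [node_id for node_id, _ in ranked[:n1]]


# Hypothetical index: node id -> keyword list.
toy_index = {1: ["stomachache", "fever"], 2: ["cough"], 3: ["fever"]}
targets = select_target_nodes(["stomachache", "fever"], toy_index, n1=2)
```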

Step S208: Obtain a question-answer pair corresponding to each target index node in the target index node set from the consultation database.

Specifically, each index node in the index stores a pointer to the corresponding row of a table in the consultation database. The row of data corresponding to the index node can be obtained through the pointer, and the question-answer pair is the data of one column in that row, so the corresponding question-answer pair can be obtained through the index node.

Step S210: Calculate the second similarity between the current question to be answered and the question corresponding to each question-answer pair, sort the question-answer pairs according to the second similarity calculation result to select target question-answer pairs, and recommend consultation data according to the selected target question-answer pairs.

Specifically, the second similarity characterizes the similarity between the current question to be answered and the question corresponding to each question-answer pair. In some embodiments, the second similarity may be a string similarity, calculated as follows. After obtaining the question-answer pair corresponding to each target index node, the server first calculates the edit distance between the current question to be answered and the question in each obtained question-answer pair. The edit distance is the minimum number of single-character edits (such as substitutions, insertions, and deletions) required to change one string into the other. The server then calculates the string similarity between the current question to be answered and the question in each obtained question-answer pair from the edit distance, using the formula Similarity = (Max(x, y) - Levenshtein) / Max(x, y), where x is the length of the string corresponding to the question to be answered, y is the length of the string corresponding to the question in the question-answer pair, and Levenshtein is the edit distance.

Further, the server sorts the question-answer pairs obtained in step S208 according to the magnitude of the string similarity, selects a preset number of question-answer pairs as target question-answer pairs according to the sorting result, and recommends consultation data based on these target question-answer pairs. In some embodiments, the server may sort the question-answer pairs obtained in step S208 in descending order of string similarity and select the top N2 question-answer pairs as the target question-answer pairs, where N2 is a preset value that can be adjusted based on experience.
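The edit-distance-based string similarity and the top-N2 selection can be sketched as follows. The formula is the one given in the text; the function names, the representation of a question-answer pair as a `(question, answer)` tuple, and the handling of two empty strings are assumptions for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def string_similarity(x, y):
    """Similarity = (Max(x, y) - Levenshtein) / Max(x, y), with x, y as lengths."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (longest - levenshtein(x, y)) / longest


def select_target_pairs(question, qa_pairs, n2):
    """Rank (question, answer) tuples by descending similarity and keep the top n2."""
    ranked = sorted(qa_pairs,
                    key=lambda pair: string_similarity(question, pair[0]),
                    reverse=True)
    return ranked[:n2]


# Hypothetical candidate pairs.
candidates = [("xyz", "a1"), ("abd", "a2"), ("abc", "a3")]
top_pairs = select_target_pairs("abc", candidates, n2=2)
```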

In some embodiments, when recommending consultation data according to the target question-answer pairs, the server may recommend all target question-answer pairs to the doctor terminal, select any one of them to recommend, or recommend the top-ranked question-answer pair. The manner of recommendation is not limited in this application.

In other embodiments, the server may instead recommend only the answers in the target question-answer pairs to the doctor terminal: it may recommend the answers of all target question-answer pairs, the answer of any one of them, or the answer of the top-ranked question-answer pair. The manner of recommendation is not limited in this application.

In the above consultation data recommendation method, the server first obtains the feature word set corresponding to the question to be answered, then calculates the first similarity between this feature word set and the feature word set of each index node in the index, and selects the nodes with the greatest similarity as target nodes. It then finds the question-answer pairs corresponding to these nodes, calculates the second similarity between the question to be answered and the question in each question-answer pair, and selects the question-answer pairs with the greatest string similarity as target question-answer pairs for recommending consultation data. Because the candidates are ranked twice, the method can accurately locate the question-answer pairs most similar to the question to be answered and recommend based on them, which enables accurate answers and improves the efficiency of the consultation.

In some embodiments, as shown in FIG. 3, before step S202, the method includes:

Step S302: Obtain the consultation information set corresponding to each previous consultation, and preprocess the consultation information set.

Specifically, previous consultations are the consultations completed before the current time, and a consultation information set is the collection of information exchanged in one complete consultation, consisting of the messages of the consulting user and the replies of the doctor user.

In this embodiment, preprocessing includes sentence splitting, coreference resolution, context processing, and the like. Sentence splitting divides a piece of information into single sentences. Coreference resolution determines what a pronoun in a sentence refers to, which can be computed using syntactic analysis and edit distance. Context processing completes a sentence from its context, for example, in "D: Are you dizzy? U: Yes, I am dizzy.", so that the meaning of the second sentence is more complete; context processing uses syntactic analysis and sentence pattern judgment.
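Of the preprocessing steps, sentence splitting is the simplest to illustrate. The sketch below splits on a small assumed set of end-of-sentence punctuation marks; the function name and the punctuation set are assumptions, and coreference resolution and context completion are not covered.

```python
import re

# Assumed end-of-sentence marks (ASCII and full-width forms).
_SENTENCE = re.compile(r"[^.?!。？！]+[.?!。？！]?")


def split_sentences(text):
    """Split a message into single sentences, keeping the closing punctuation."""
    sentences = (match.strip() for match in _SENTENCE.findall(text))
    return [s for s in sentences if s]


parts = split_sentences("Are you dizzy? Yes, I am dizzy.")
```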

Step S304: Extract question-answer pairs from the preprocessed consultation information set, and perform feature extraction on the extracted question-answer pairs.

Specifically, a consulting user usually asks several questions in one complete consultation, and the doctor replies after each question. Each question asked by the consulting user and the doctor's answer to that question constitute a question-answer pair. Extracting question-answer pairs means extracting these pairs from the consultation information corresponding to a complete consultation.

Further, the server performs feature extraction on the extracted question-answer pairs. In some embodiments, feature extraction may be extracting keywords for questions in a question-answer pair. In other embodiments, the extracted features may be, for example, the number of single sentences in the question-answer pair, the number of adjectives, question words, and so on.

Step S306: Store the question-answer pairs and their corresponding features in association in the consultation database.

Specifically, the server stores the question-answer pairs and their corresponding features in the consultation database, that is, as different columns in the same row of a table in the database.

In some embodiments, the consulting user communicates with the doctor through instant messages during the consultation, and each message carries the user ID of a communicating party: messages sent by the consultation terminal carry the consulting user ID, and messages sent by the doctor terminal carry the doctor user ID. Therefore, when obtaining the consultation information corresponding to previous consultations, the server can also obtain the user IDs corresponding to the consultation information, and store the user IDs, the question-answer pairs, and the features corresponding to the question-answer pairs in one-to-one correspondence in the consultation database.

Step S308: Build an index on the consultation database according to the features.

Specifically, the server builds an index on the feature column of the consultation database. Each node in the index corresponds to a row of data in the consultation database, which includes at least the question-answer pair and its corresponding features.

In some embodiments, the server may also create the index based on both the user IDs and the features.

In this embodiment, features are extracted from the consultation information and an index is built on them, so that calculating the similarity between the question to be answered and each question-answer pair does not require traversing the entire database; the calculation only needs the question to be answered and the index values, which significantly improves computing efficiency.
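Steps S306 and S308 can be pictured with the sketch below, which uses an in-memory SQLite database from the Python standard library. The application does not name a database engine, so SQLite, the table and column names, and the space-separated encoding of feature words (which assumes single-token feature words) are all assumptions made for illustration.

```python
import sqlite3


def build_consultation_db(qa_rows):
    """Store each question-answer pair and its features in one row, then index the feature column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE qa_pairs ("
        "id INTEGER PRIMARY KEY, question TEXT, answer TEXT, features TEXT)"
    )
    conn.executemany(
        "INSERT INTO qa_pairs (question, answer, features) VALUES (?, ?, ?)",
        [(q, a, " ".join(sorted(f))) for q, a, f in qa_rows],
    )
    conn.execute("CREATE INDEX idx_features ON qa_pairs (features)")
    conn.commit()
    return conn


def load_index_values(conn):
    """Read only the row id and feature column, not the full question-answer text."""
    rows = conn.execute("SELECT id, features FROM qa_pairs ORDER BY id")
    return {row_id: set(features.split()) for row_id, features in rows}


def fetch_pair(conn, row_id):
    """Follow the row id (the assumed 'pointer') back to the stored pair."""
    return conn.execute(
        "SELECT question, answer FROM qa_pairs WHERE id = ?", (row_id,)
    ).fetchone()


conn = build_consultation_db([
    ("Why does my stomach hurt?", "It may be gastritis.", {"stomach", "hurt"}),
    ("I have a fever", "Drink water.", {"fever"}),
])
index_values = load_index_values(conn)
```

In this sketch the similarity computation would operate on `index_values` alone, and `fetch_pair` would be called only for the selected target rows.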

In some embodiments, as shown in FIG. 4, extracting question-answer pairs from the preprocessed consultation information set includes:

Step S304A: Obtain the user ID corresponding to each piece of consultation information in the consultation information set, where the user ID is a consulting user ID or a doctor user ID.

Specifically, each piece of consultation information in the consultation information set corresponds to a user ID. For a message sent by the consultation terminal, the corresponding user ID is the consulting user ID, and for a message sent by the doctor terminal, the corresponding user ID is the doctor user ID.

Step S306B: Filter the consultation information corresponding to the doctor user ID according to a preset rule.

Specifically, the preset rule at least includes filtering out messages that end with an interrogative word and messages that match a preset set of stock phrases. Interrogative words can be, for example, "what to do", "what is going on", "why", and so on. The preset stock phrases are sentences set in advance at the doctor terminal to save response time, for example, "Please wait a moment", "Hello, I am not on duty at present", and so on.
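The filtering rule can be sketched as follows. Messages are modeled as `(role, text)` tuples with the roles `"user"` and `"doctor"`; this representation, the function name, the normalization of trailing punctuation, and the sample word lists are assumptions for illustration.

```python
# Hypothetical rule data; a deployment would configure its own lists.
STOCK_PHRASES = {"please wait a moment"}
INTERROGATIVE_ENDINGS = ("what to do", "what is going on", "why")


def filter_doctor_messages(messages,
                           stock_phrases=STOCK_PHRASES,
                           endings=INTERROGATIVE_ENDINGS):
    """Drop doctor messages that are stock phrases or end with an interrogative word."""
    kept = []
    for role, text in messages:
        if role == "doctor":
            # Ignore case and trailing punctuation when applying the rules.
            normalized = text.strip().rstrip("?!. ").lower()
            if normalized in stock_phrases or normalized.endswith(endings):
                continue
        kept.append((role, text))
    return kept


filtered = filter_doctor_messages([
    ("user", "Why?"),
    ("doctor", "Please wait a moment."),
    ("doctor", "Take rest."),
    ("doctor", "Tell me why?"),
])
```

Messages from the consulting user are never filtered here, since the rule applies only to information associated with the doctor user ID.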

Step S308C: Extract question-answer pairs from the filtered consultation information set according to punctuation marks and interrogative words.

Specifically, the filtered consultation information is traversed starting from the first piece, and the user ID corresponding to each piece is obtained in turn. When the user ID is the consulting user ID, it is determined whether the piece of consultation information contains a question sentence. If so, the question sentence is used as a question in a question-answer pair. Then, starting from the first piece of consultation information with the doctor user ID that follows the question, all consecutive pieces with the doctor user ID are obtained, until a piece with the consulting user ID appears again; the obtained doctor messages are used as the answer to the question sentence, forming a question-answer pair. The extracted question-answer pairs can contain one answer to one question, multiple consecutive answers to one question, one answer to multiple consecutive questions, or multiple consecutive answers to multiple consecutive questions, depending on the specific consultation; this application imposes no limitation here.
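The traversal described above can be sketched as follows. Messages are again modeled as `(role, text)` tuples; the question test (a trailing question mark or an interrogative word), the word list, and the function names are assumptions for illustration rather than the rule set of the application.

```python
# Hypothetical interrogative words used to recognize question sentences.
QUESTION_WORDS = ("what to do", "what is going on", "why")


def is_question(text):
    lowered = text.lower()
    return text.rstrip().endswith("?") or any(w in lowered for w in QUESTION_WORDS)


def extract_qa_pairs(messages):
    """Group consecutive user questions with the consecutive doctor replies that follow."""
    pairs = []
    questions, answers = [], []
    for role, text in messages:
        if role == "user":
            if answers:
                # A new user turn after doctor replies closes the current pair.
                pairs.append((questions, answers))
                questions, answers = [], []
            if is_question(text):
                questions.append(text)
        elif role == "doctor" and questions:
            answers.append(text)
    if questions and answers:
        pairs.append((questions, answers))
    return pairs


conversation = [
    ("user", "Hello doctor"),
    ("user", "Why does my stomach hurt?"),
    ("doctor", "It may be gastritis."),
    ("doctor", "Avoid spicy food."),
    ("user", "Thanks"),
    ("user", "Can I take painkillers?"),
    ("doctor", "Only as prescribed."),
]
qa_pairs = extract_qa_pairs(conversation)
```

Each extracted pair holds a list of questions and a list of answers, which covers the one-to-one, one-to-many, many-to-one, and many-to-many cases with a single structure.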

In some embodiments, performing feature extraction on the extracted question-answer pairs includes: segmenting the questions in the extracted question-answer pairs to obtain a word set corresponding to the questions; and matching each word in the word set against each word in a pre-established feature word library, and when the match succeeds, using the word as an extracted feature.

Specifically, the server may first perform word segmentation on the questions in the extracted question-answer pairs to obtain a word set corresponding to the questions. To do so, a question can first be divided into complete sentences according to punctuation, and word segmentation is then performed on each sentence. For example, a string-matching segmentation method can be used, such as forward maximum matching, which segments the string of a sentence from left to right; reverse maximum matching, which segments the string from right to left; shortest-path segmentation, which requires the number of words cut from the string to be minimal; or bidirectional maximum matching, which performs segmentation matching in both the forward and reverse directions. Word-meaning segmentation can also be used; this is a machine-judgment method that uses syntactic and semantic information to resolve ambiguity. Statistical segmentation can also be used: based on phrase statistics from the search history of the current user or of users in general, if two adjacent words are found to occur together frequently, the two adjacent words can be treated as a phrase during segmentation.

Further, each word in the word set obtained by segmentation is matched against each word in a pre-established feature word library, and the matched words are used as feature words. In some embodiments, a match means that the two words are exactly the same. In other embodiments, a match means that the similarity between the two words exceeds a preset threshold, so that, for example, two near-synonymous expressions for belly pain match each other. The feature word library can be built from authoritative explanations of various diseases obtained from existing medical databases, including the corresponding introduction, symptoms, complications, therapeutic drugs, common examinations, and other professional information; from medical information on various drugs, such as the types of disease a drug treats; or from specific types of information (for example, the treatment plans, therapeutic drugs, responsible departments, and clinical manifestations corresponding to different diseases) obtained in real time or periodically, using tools such as web crawlers, from open medical data sources on the Internet (for example, questions, answers, and discussions about different diseases on various forums, or newly published medical cases and medical question-and-answer texts).

In some embodiments, as shown in FIG. 5, the steps of respectively calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node include:

Step S502: Calculate a feature weight for each feature word in the first feature word set to obtain a first calculation result, select keywords based on the first calculation result, and obtain a first keyword set corresponding to the current question to be answered.

Specifically, a feature weight characterizes the importance of a feature word. The larger the feature weight, the more important the feature word is and the better it represents the meaning of the word set. In some embodiments, the term frequency-inverse document frequency (TF-IDF) algorithm may be used to calculate the feature weight of each feature word. In this embodiment, the first calculation result is obtained after the feature weights are calculated. The first calculation result refers to the weight value corresponding to each feature word in the first feature word set. The feature words can be sorted by weight value, and keywords are then selected according to the sorting result, thereby obtaining the first keyword set.

In some embodiments, the server may sort each feature word in the first feature word set in descending order according to the feature weight, and then select a preset number of feature words that are ranked first as keywords to obtain the first keyword set.
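The sort-and-select step can be sketched as follows. This is a minimal illustration: the function name `select_keywords`, the parameter `k` (standing in for the preset number), and the sample weights are assumptions introduced here.

```python
def select_keywords(weights, k):
    """Sort feature words by weight in descending order and keep the top k."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Hypothetical feature weights for one feature word set.
sample_weights = {"insomnia": 0.1, "cough": 0.5, "smoking": 0.4}

print(select_keywords(sample_weights, 2))
# -> ['cough', 'smoking']
```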

Step S504, calculating feature weights for each feature word in the second feature word set to obtain a second calculation result, selecting keywords according to the second calculation result, and obtaining a second keyword set corresponding to each index node.

Specifically, the term frequency-inverse document frequency algorithm may be used to calculate the feature weight of each feature word in the second feature word set to obtain the second calculation result. The second calculation result refers to the feature weight value of each feature word in the second feature word set. The feature words can be sorted by weight value, and keywords are then selected according to the sorting result to obtain the second keyword set.

In some embodiments, the server may sort each feature word in the second feature word set in descending order according to the feature weight, and then select a preset number of feature words that are ranked first as keywords to obtain a second keyword set.

In step S506, the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node are obtained according to the first keyword set and the second keyword set.

Specifically, the first keyword set and the second keyword set are merged to obtain their union; the word frequency of each keyword in the union is then counted in the first feature word set and in the second feature word set, and the first word frequency vector and the second word frequency vector are generated from these word frequencies. For example, suppose the first feature word set is cough / smoking / insomnia and its keyword set is {cough, smoking}, and the second feature word set is headache / cough / runny nose / cold and its keyword set is {headache, runny nose}. Merging the two keyword sets gives {cough, smoking, headache, runny nose}. The word frequencies of these words in the first feature word set are: cough 1, smoking 1, headache 0, runny nose 0; their word frequencies in the second feature word set are: cough 1, smoking 0, headache 1, runny nose 1. The first word frequency vector is therefore [1, 1, 0, 0], and the second word frequency vector is [1, 0, 1, 1].
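The construction of the two word frequency vectors can be sketched as follows, using the word sets of the example. The function name `word_frequency_vectors` and the choice to keep the union in first-seen order are assumptions made for illustration.

```python
from collections import Counter


def word_frequency_vectors(features_a, keywords_a, features_b, keywords_b):
    """Build word frequency vectors over the union of two keyword sets."""
    # Union of the keyword sets; first-seen order keeps dimensions stable.
    union = list(dict.fromkeys(list(keywords_a) + list(keywords_b)))
    counts_a, counts_b = Counter(features_a), Counter(features_b)
    # Counter returns 0 for words that do not occur in a feature word set.
    vec_a = [counts_a[word] for word in union]
    vec_b = [counts_b[word] for word in union]
    return union, vec_a, vec_b


first_features = ["cough", "smoking", "insomnia"]
first_keywords = ["cough", "smoking"]
second_features = ["headache", "cough", "runny nose", "cold"]
second_keywords = ["headache", "runny nose"]

union, vec_a, vec_b = word_frequency_vectors(
    first_features, first_keywords, second_features, second_keywords
)
print(union)  # -> ['cough', 'smoking', 'headache', 'runny nose']
print(vec_a)  # -> [1, 1, 0, 0]
print(vec_b)  # -> [1, 0, 1, 1]
```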

Step S508: Calculate the cosine of the angle between each first word frequency vector and each second word frequency vector to obtain the first similarity.

Specifically, the calculation formula of the cosine similarity is:

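$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

where n (n ≥ 2) is the dimension of the word frequency vectors, A_i is the i-th component of the first word frequency vector, and B_i is the i-th component of the second word frequency vector.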
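The cosine of the angle between two word frequency vectors can be computed as in the following sketch. The function name `cosine_similarity` is hypothetical, and returning 0.0 when either vector is all zeros is an assumption of this sketch, since the text does not address that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length word frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


# Word frequency vectors [1, 1, 0, 0] and [1, 0, 1, 1]:
# dot product 1, norms sqrt(2) and sqrt(3), so the cosine is 1 / sqrt(6).
print(round(cosine_similarity([1, 1, 0, 0], [1, 0, 1, 1]), 3))
# -> 0.408
```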

In this embodiment, the cosine similarity of the two feature word sets is calculated by extracting keywords from the feature word sets and building word frequency vectors. Compared with calculating the similarity between the complete documents (the question to be answered and the question-answer pair), this reduces the amount of calculation and improves calculation efficiency.

In some embodiments, as shown in FIG. 6, calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes:

In step S602, an initial feature weight of each feature word in the first feature word set is calculated using a word frequency-inverse document frequency algorithm.

Specifically, first calculate the word frequency TF, which can be calculated with reference to the following formula:

Word frequency TF = number of times a word appears in a document / total number of words in the document;

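Then calculate the inverse document frequency (IDF), which can be calculated with reference to the following formula:

Inverse document frequency IDF = log(total number of documents in the corpus / number of documents containing the word);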

Finally, calculate the initial feature weight: W = TF * IDF.
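The initial weight W = TF * IDF can be sketched as follows. This is an illustration under stated assumptions: the function name `tf_idf_weights` and the toy corpus are hypothetical, the natural logarithm is used, and IDF is taken in its textbook form log(N / df), where N is the number of documents and df is the number of documents containing the word. Many implementations add a smoothing term to the denominator; the sketch instead clamps df to at least 1.

```python
import math
from collections import Counter


def tf_idf_weights(document, corpus):
    """Return {word: TF * IDF} for every word of `document`.

    `document` is a list of words; `corpus` is a list of such lists.
    """
    counts = Counter(document)
    total_words = len(document)
    weights = {}
    for word, count in counts.items():
        tf = count / total_words
        containing = sum(1 for doc in corpus if word in doc)
        # Assumption: clamp to 1 so a word absent from the corpus
        # does not cause a division by zero.
        idf = math.log(len(corpus) / max(containing, 1))
        weights[word] = tf * idf
    return weights


# Hypothetical toy corpus of segmented documents.
corpus = [
    ["cough", "cough", "fever"],
    ["headache", "fever"],
    ["cough", "insomnia"],
]
weights = tf_idf_weights(corpus[0], corpus)
print(weights["cough"] > weights["fever"])  # -> True
```

In the toy corpus, "cough" and "fever" have the same IDF (each occurs in two of three documents), so "cough" receives the larger weight only because it occurs twice in the document.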

In step S604, it is sequentially judged whether each feature word in the first feature word set satisfies a preset adjustment rule. If so, the process proceeds to step S606; if not, the process proceeds to step S608.

Step S606: Adjust the initial weight of the feature words according to the adjustment rule to obtain the final feature weight.

In step S608, the initial feature weight is used as the final feature weight.

Specifically, the preset adjustment rule is a manually defined rule for adjusting the feature weight of a feature word. In some embodiments, the preset adjustment rule may be: when two feature words appear at the same time and the difference between their feature weights is less than a preset threshold, the weight of one of the words is adjusted so that the difference is not less than the preset threshold. For example, when "headache" and "hand pain" appear as feature words at the same time and the difference between their feature weights is less than 0.2, the feature weight of "headache" is adjusted so that the difference between the feature weights of "headache" and "hand pain" is no longer less than 0.2. The purpose is to increase the weight of the feature word that has the greater bearing on the symptoms, thereby improving the accuracy of keyword selection.

In this embodiment, the accuracy of keyword selection can be improved by adjusting feature weights.
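Steps S604 to S608 can be sketched as follows for the example rule above. The rule representation (a tuple naming the word to boost, the word it is compared with, and the minimum gap) and the function name `adjust_weights` are assumptions of this sketch; the text only states that one of the two weights is adjusted, so raising the boosted word to exactly the other weight plus the gap is one possible choice.

```python
def adjust_weights(weights, rules):
    """Apply preset adjustment rules to initial feature weights.

    Each rule is (boosted_word, other_word, min_gap). If both words are
    present and their weights differ by less than min_gap, the boosted
    word is raised to other_word's weight plus min_gap (step S606).
    Words not covered by a rule keep their initial weight (step S608).
    """
    adjusted = dict(weights)
    for boosted, other, min_gap in rules:
        if boosted in adjusted and other in adjusted:
            if abs(adjusted[boosted] - adjusted[other]) < min_gap:
                adjusted[boosted] = adjusted[other] + min_gap
    return adjusted


# Hypothetical initial weights and the example rule from the text.
initial = {"headache": 0.45, "hand pain": 0.5, "cough": 0.3}
rules = [("headache", "hand pain", 0.2)]

final = adjust_weights(initial, rules)
print(final["headache"] > final["hand pain"])  # -> True
```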

It should be understood that although the steps in the flowcharts of FIGS. 2-6 are displayed sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-6 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In some embodiments, as shown in FIG. 7, a consultation data recommendation device 700 is provided, including:

A first feature word set acquisition module 702, configured to obtain a current question to be answered, segment the current question to be answered, extract feature words according to the result of the word segmentation, and obtain a first feature word set corresponding to the current question to be answered;

A second feature word set obtaining module 704, configured to obtain a second feature word set corresponding to each index node in a pre-established index;

The target index node set acquisition module 706 is configured to calculate a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and to sort the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set;

A question-and-answer pair acquisition module 708 is configured to obtain a question-and-answer pair corresponding to each target index node in the target index node set from the consultation database;

The recommendation module 710 is configured to calculate a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sort the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommend consultation data according to the selected target question-answer pair.
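The two-stage flow carried out by modules 702 to 710 can be sketched as follows. This is a simplified illustration, not the implementation of this application: it skips keyword selection and compares plain bag-of-words vectors, it assumes the same cosine measure for the second similarity (which this passage does not define), and the names `bag_cosine`, `recommend`, `index`, and `qa_database` as well as all sample data are hypothetical.

```python
import math
from collections import Counter


def bag_cosine(words_a, words_b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(words_a), Counter(words_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(question_features, index, qa_database, top_nodes=2, top_pairs=1):
    # Stage 1: rank index nodes by first similarity, keep a preset number.
    targets = sorted(
        index,
        key=lambda node: bag_cosine(question_features, index[node]),
        reverse=True,
    )[:top_nodes]
    # Fetch the question-answer pairs stored under the target index nodes.
    candidates = [pair for node in targets for pair in qa_database[node]]
    # Stage 2: rank the candidate pairs by second similarity.
    candidates.sort(
        key=lambda pair: bag_cosine(question_features, pair["question_features"]),
        reverse=True,
    )
    return candidates[:top_pairs]


# Hypothetical index (node -> feature words) and consultation database.
index = {
    "respiratory": ["cough", "runny nose", "fever"],
    "digestive": ["stomachache", "nausea"],
    "neuro": ["headache", "insomnia"],
}
qa_database = {
    "respiratory": [
        {"question_features": ["cough", "fever"], "answer": "A1"},
        {"question_features": ["runny nose"], "answer": "A2"},
    ],
    "digestive": [{"question_features": ["nausea"], "answer": "A3"}],
    "neuro": [{"question_features": ["headache"], "answer": "A4"}],
}

best = recommend(["cough", "headache"], index, qa_database)
print(best[0]["answer"])  # -> A4
```

Only the pairs under the two best-matching index nodes are compared in the second stage, which is what keeps the more expensive pairwise comparison small.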

In some embodiments, as shown in FIG. 8, the foregoing apparatus further includes:

A pre-processing module 802, configured to obtain a consultation information set corresponding to previous consultations, and pre-process the consultation information set;

A feature extraction module 804, configured to extract question-answer pairs from the pre-processed consultation information set, and perform feature extraction on the extracted question-answer pairs;

A storage module 806, configured to store the question-answer pairs and the features corresponding to the question-answer pairs in the consultation database;

The index establishing module 808 is configured to index the consultation database according to the features.
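Indexing stored question-answer pairs by their features can be sketched as a simple inverted index. The structure of the index is not detailed in this passage, so the mapping from each feature to the positions of the pairs that carry it, the function name `build_index`, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(qa_pairs):
    """Map each feature to the positions of the question-answer pairs
    whose extracted features contain it."""
    index = defaultdict(list)
    for pair_id, pair in enumerate(qa_pairs):
        for feature in pair["features"]:
            index[feature].append(pair_id)
    return dict(index)


# Hypothetical stored records: question, answer, and extracted features.
stored_pairs = [
    {"question": "Q1", "answer": "A1", "features": ["cough", "fever"]},
    {"question": "Q2", "answer": "A2", "features": ["cough"]},
]

feature_index = build_index(stored_pairs)
print(feature_index)
# -> {'cough': [0, 1], 'fever': [0]}
```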

In some embodiments, the feature extraction module 804 is further configured to: obtain a user ID corresponding to each piece of consultation information in the consultation information set, the user ID being either an inquiring user ID or a doctor user ID; filter the consultation information corresponding to the doctor user ID according to preset rules; and extract question-answer pairs from the filtered consultation information set according to punctuation marks and question words.

In some embodiments, the feature extraction module 804 is further configured to: perform word segmentation on the question in each extracted question-answer pair to obtain a word set corresponding to the question; and match each word in the word set against each word in a pre-established feature word library, and when the match is successful, use the word as an extracted feature.

In some embodiments, the target index node set acquisition module 706 is further configured to: calculate a feature weight for each feature word in the first feature word set to obtain a first calculation result, select keywords according to the first calculation result, and obtain a first keyword set corresponding to the current question to be answered; calculate a feature weight for each feature word in the second feature word set to obtain a second calculation result, select keywords according to the second calculation result, and obtain a second keyword set corresponding to each index node; obtain, according to the first keyword set and the second keyword set, the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node; and calculate the cosine of the angle between each first word frequency vector and each second word frequency vector to obtain the first similarity.

In some embodiments, the target index node set acquisition module 706 is further configured to: calculate an initial feature weight for each feature word in the first feature word set using the term frequency-inverse document frequency algorithm; when any feature word in the first feature word set satisfies a preset adjustment rule, adjust the initial feature weight of that feature word according to the preset adjustment rule to obtain the final feature weight; and when any feature word in the first feature word set does not satisfy the preset adjustment rule, use the initial feature weight as the final feature weight.

For the specific limitations of the consultation data recommendation device, refer to the limitations of the consultation data recommendation method above, which are not repeated here. Each module in the above consultation data recommendation device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In some embodiments, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile computer-readable storage medium and an internal memory. The non-volatile computer-readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile computer-readable storage medium. The database of the computer device is used to store data such as question-answer pairs and the features corresponding to the question-answer pairs. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer-readable instructions implement a consultation data recommendation method.

Those skilled in the art can understand that the structure shown in FIG. 8 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

A computer device includes a memory and one or more processors. Computer-readable instructions are stored in the memory. When the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps: obtaining a current question to be answered, performing word segmentation on the current question to be answered, and extracting feature words according to the word segmentation result to obtain a first feature word set corresponding to the current question to be answered; obtaining a second feature word set corresponding to each index node in a pre-established index; calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set; obtaining, from the consultation database, the question-answer pairs corresponding to each target index node in the target index node set; and calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending consultation data according to the selected target question-answer pair.

In some embodiments, before the step of obtaining the current question to be answered, the processor further performs the following steps when executing the computer-readable instructions: obtaining a consultation information set corresponding to previous consultations, and pre-processing the consultation information set; extracting question-answer pairs from the pre-processed consultation information set, and performing feature extraction on the extracted question-answer pairs; storing the question-answer pairs and the features corresponding to the question-answer pairs in the consultation database; and indexing the consultation database according to the features.

In some embodiments, extracting question-answer pairs from the pre-processed consultation information set includes: obtaining a user ID corresponding to each piece of consultation information in the consultation information set, the user ID being either an inquiring user ID or a doctor user ID; filtering the consultation information corresponding to the doctor user ID according to preset rules; and extracting question-answer pairs from the filtered consultation information set according to punctuation marks and question words.

In some embodiments, the step of calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node includes: calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, selecting keywords according to the first calculation result, and obtaining a first keyword set corresponding to the current question to be answered; calculating a feature weight for each feature word in the second feature word set to obtain a second calculation result, selecting keywords according to the second calculation result, and obtaining a second keyword set corresponding to each index node; obtaining, according to the first keyword set and the second keyword set, the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node; and calculating the cosine of the angle between each first word frequency vector and each second word frequency vector to obtain the first similarity.

In some embodiments, calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes: calculating an initial feature weight for each feature word in the first feature word set using the term frequency-inverse document frequency algorithm; when any feature word in the first feature word set satisfies a preset adjustment rule, adjusting the initial feature weight of that feature word according to the preset adjustment rule to obtain the final feature weight; and when any feature word in the first feature word set does not satisfy the preset adjustment rule, using the initial feature weight as the final feature weight.

One or more non-volatile computer-readable storage media store computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: obtaining a current question to be answered, performing word segmentation on the current question to be answered, and extracting feature words according to the word segmentation result to obtain a first feature word set corresponding to the current question to be answered; obtaining a second feature word set corresponding to each index node in a pre-established index; calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set; obtaining, from the consultation database, the question-answer pairs corresponding to each target index node in the target index node set; and calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending consultation data according to the selected target question-answer pair.

In some embodiments, before the step of obtaining the current question to be answered, the following steps are further performed when the computer-readable instructions are executed by the processor: obtaining a consultation information set corresponding to previous consultations, and pre-processing the consultation information set; extracting question-answer pairs from the pre-processed consultation information set, and performing feature extraction on the extracted question-answer pairs; storing the question-answer pairs and the features corresponding to the question-answer pairs in the consultation database; and indexing the consultation database according to the features.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer-readable instructions instructing the related hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments have been described. However, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.

The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be understood as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims

A method for recommending consultation data includes:

Acquiring the current question to be answered, segmenting the current question to be answered, and extracting feature words according to the result of the word segmentation to obtain a first feature word set corresponding to the current question to be answered;

Obtaining a second feature word set corresponding to each index node in a pre-established index;

Calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set;

Obtaining question and answer pairs corresponding to each target index node in the target index node set from the consultation database; and

Calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending consultation data according to the selected target question-answer pair.
The method according to claim 1, wherein before the step of obtaining a current question to be answered, the method further comprises:

Obtaining a consultation information set corresponding to previous consultations, and pre-processing the consultation information set;

Extracting question-answer pairs from the pre-processed consultation information set, and performing feature extraction on the extracted question-answer pairs;

Storing the question-answer pairs and the features corresponding to the question-answer pairs in a consultation database; and

Indexing the consultation database according to the features.
The method according to claim 2, wherein extracting question-answer pairs from the pre-processed consultation information set comprises:

Obtaining a user ID corresponding to each piece of consultation information in the consultation information set, where the user ID is an inquiring user ID or a doctor user ID;

Filtering the consultation information corresponding to the doctor user ID according to preset rules; and

Extracting question-answer pairs from the filtered consultation information set based on punctuation marks and question words.
The method according to claim 2 or 3, wherein performing feature extraction on the extracted question-answer pairs comprises:

Performing word segmentation on the question in each extracted question-answer pair to obtain a word set corresponding to the question; and

Matching each word in the word set against each word in a pre-established feature word library, and when the match is successful, using the word as an extracted feature.
The method according to claim 1, wherein the step of calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node comprises:

Calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, selecting keywords according to the first calculation result, and obtaining a first keyword set corresponding to the current question to be answered;

Calculating feature weights for each feature word in the second feature word set to obtain a second calculation result, selecting keywords according to the second calculation result, and obtaining a second keyword set corresponding to each index node;

Obtaining the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node according to the first keyword set and the second keyword set; and

Calculating the cosine of the angle between each first word frequency vector and each second word frequency vector to obtain the first similarity.
The method according to claim 5, wherein the calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result comprises:

Using a word frequency-inverse document frequency algorithm to calculate an initial feature weight of each feature word in the first feature word set;

When any feature word in the first feature word set meets a preset adjustment rule, adjusting an initial feature weight of the feature word according to the preset adjustment rule to obtain a final feature weight; and

When any feature word in the first feature word set does not satisfy the preset adjustment rule, using the initial feature weight as the final feature weight.
A device for recommending consultation data includes:

A first feature word set acquisition module, configured to obtain a current question to be answered, perform word segmentation on the current question to be answered, extract feature words according to the word segmentation result, and obtain a first feature word set corresponding to the current question to be answered;

A second feature word set acquisition module, configured to obtain a second feature word set corresponding to each index node in a pre-established index;

A target index node set acquisition module, configured to calculate a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and to sort the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set;

A question-answer pair acquisition module, configured to obtain, from the consultation database, the question-answer pairs corresponding to each target index node in the target index node set; and

A recommendation module, configured to calculate a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sort the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommend consultation data according to the selected target question-answer pair.
The apparatus according to claim 7, further comprising:

A pre-processing module, configured to obtain a consultation information set corresponding to previous consultations, and pre-process the consultation information set;

A feature extraction module, configured to extract question-answer pairs from the pre-processed consultation information set, and perform feature extraction on the extracted question-answer pairs;

A storage module, configured to store the question-answer pairs and the features corresponding to the question-answer pairs in a consultation database; and

An index establishing module, configured to index the consultation database according to the features.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:

Acquiring the current question to be answered, segmenting the current question to be answered, and extracting feature words according to the result of the word segmentation to obtain a first feature word set corresponding to the current question to be answered;

Obtaining a second feature word set corresponding to each index node in a pre-established index;

Calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set;

Obtaining question-answer pairs corresponding to each target index node in the target index node set from the consultation database; and

Calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending consultation data according to the selected target question-answer pair.
The computer device according to claim 9, wherein the processor further executes the following steps when executing the computer-readable instructions:

Obtaining the inquiry information set corresponding to previous consultations, and preprocessing the inquiry information set;

Extracting question-answer pairs from the preprocessed inquiry information set, and performing feature extraction on the extracted question-answer pairs;

Correspondingly storing the question-answer pairs and the features corresponding to the question-answer pairs in a consultation database; and

Indexing the consultation database according to the features.
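As a non-limiting illustration of the indexing step, the sketch below groups stored question-answer pairs under index nodes keyed by their feature word set. The application does not fix the index layout, so this grouping, the `build_index` helper, and the sample `qa_features` mapping are assumptions made only for the example.

```python
def build_index(qa_features):
    """Group question-answer pair ids under one node per distinct feature word set."""
    index = {}
    for pair_id, features in qa_features.items():
        # Order and duplicates do not matter, so normalise to a sorted tuple.
        node_key = tuple(sorted(set(features)))
        index.setdefault(node_key, []).append(pair_id)
    return index

# Hypothetical stored features, keyed by question-answer pair id.
qa_features = {
    1: ["cough", "fever"],
    2: ["fever", "cough"],
    3: ["rash"],
}
index = build_index(qa_features)
```

With this layout each node carries a feature word set that can be compared with the feature words of an incoming question, and the node's id list points back into the database.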
The computer device according to claim 10, wherein the processor further executes the following steps when executing the computer-readable instructions:

Obtaining a user ID corresponding to each piece of inquiry information in the inquiry information set, where the user ID is an inquiring user ID or a doctor user ID;

Filtering the inquiry information corresponding to the doctor user ID according to preset rules; and

Extracting question-answer pairs from the filtered inquiry information set based on punctuation marks and question words.
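As a non-limiting illustration, the filtering and pair extraction steps may be sketched as follows. The preset rule used here (dropping canned doctor greetings), the question word list, and the pairing of each patient question with the next doctor message are all assumptions; the application leaves these details open.

```python
import re

# Hypothetical preset rule: doctor messages that are only canned phrases are dropped.
DOCTOR_BOILERPLATE = {"hello, how can i help you?", "thank you for your consultation."}
# Hypothetical question words; the claims do not enumerate them.
QUESTION_WORDS = {"what", "why", "how", "when", "can", "should", "is", "do"}

def is_question(text):
    """A message counts as a question if it ends with a question mark or opens with a question word."""
    words = re.findall(r"[a-z']+", text.lower())
    return text.strip().endswith(("?", "？")) or (bool(words) and words[0] in QUESTION_WORDS)

def extract_pairs(messages):
    """messages: list of (role, text) with role 'patient' or 'doctor'."""
    kept = [(role, text) for role, text in messages
            if not (role == "doctor" and text.strip().lower() in DOCTOR_BOILERPLATE)]
    pairs, pending = [], None
    for role, text in kept:
        if role == "patient" and is_question(text):
            pending = text                 # remember the latest open question
        elif role == "doctor" and pending is not None:
            pairs.append((pending, text))  # first doctor reply answers it
            pending = None
    return pairs

messages = [
    ("doctor", "Hello, how can I help you?"),
    ("patient", "My son has had a fever since yesterday."),
    ("patient", "Should I give him ibuprofen?"),
    ("doctor", "Let me check his age and weight first."),
    ("doctor", "Thank you for your consultation."),
]
pairs = extract_pairs(messages)
```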
The computer device according to claim 10 or 11, wherein the processor further executes the following steps when executing the computer-readable instructions:

Performing word segmentation on the question in each extracted question-answer pair to obtain a word set corresponding to the question; and

Matching each word in the word set against each word in a pre-established feature word library, and, when the matching is successful, using the word as an extracted feature.
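As a non-limiting illustration, feature extraction by library matching may be sketched as follows. A regular expression over lowercase letters stands in for a real word segmenter, and `FEATURE_LIBRARY` is a hypothetical feature word library; for Chinese text a dedicated segmentation tool would be needed instead.

```python
import re

# Hypothetical pre-established feature word library.
FEATURE_LIBRARY = {"fever", "cough", "ibuprofen", "rash"}

def segment(question):
    """Stand-in for word segmentation: split on anything that is not a letter."""
    return re.findall(r"[a-z]+", question.lower())

def extract_features(question):
    """Keep only the segmented words that match an entry in the feature word library."""
    return [word for word in segment(question) if word in FEATURE_LIBRARY]

features = extract_features("Should I give him ibuprofen for a fever?")
```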
The computer device according to claim 9, wherein the processor further executes the following steps when executing the computer-readable instructions:

Calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, selecting keywords according to the first calculation result, and obtaining a first keyword set corresponding to the current question to be answered;

Calculating feature weights for each feature word in the second feature word set to obtain a second calculation result, selecting keywords according to the second calculation result, and obtaining a second keyword set corresponding to each index node;

Obtaining the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node according to the first keyword set and the second keyword set; and

Calculating the cosine of the angle between each first word frequency vector and each second word frequency vector to obtain the first similarity.
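As a non-limiting illustration, the word frequency vectors and their cosine may be computed as in the sketch below. It assumes that both vectors are laid out over the union of the two keyword sets and that each component is the count of that keyword in the corresponding word list; the helper names and sample data are illustrative only.

```python
import math

def frequency_vectors(first_tokens, second_tokens, first_keywords, second_keywords):
    """Build two count vectors over the union of the two keyword sets."""
    vocabulary = sorted(first_keywords | second_keywords)
    first_vec = [first_tokens.count(w) for w in vocabulary]
    second_vec = [second_tokens.count(w) for w in vocabulary]
    return first_vec, second_vec

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Words of the current question and of one index node (hypothetical data).
first_tokens = ["fever", "cough", "fever", "child"]
second_tokens = ["fever", "rash"]
first_vec, second_vec = frequency_vectors(
    first_tokens, second_tokens, {"fever", "cough"}, {"fever", "rash"}
)
similarity = cosine(first_vec, second_vec)
```

Here the shared vocabulary is `cough, fever, rash`, so the vectors are `[1, 2, 0]` and `[0, 1, 1]` and the cosine equals 2 divided by the square root of 10.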
The computer device of claim 13, wherein the processor further executes the following steps when executing the computer-readable instructions:

Using a term frequency-inverse document frequency (TF-IDF) algorithm to calculate an initial feature weight of each feature word in the first feature word set;

When any feature word in the first feature word set meets a preset adjustment rule, adjusting an initial feature weight of the feature word according to the preset adjustment rule to obtain a final feature weight; and

When any feature word in the first feature word set does not satisfy the preset adjustment rule, using the initial feature weight of the feature word as the final feature weight.
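As a non-limiting illustration, the weighting with an adjustment rule may be sketched as follows. The smoothed inverse document frequency formula is one common variant and is an assumption here, as are the adjustment rule (multiplying the weight of words in a hypothetical `BOOSTED` set by `BOOST`) and the sample corpus.

```python
import math

# Hypothetical preset adjustment rule: boost words in this set by a fixed factor.
BOOSTED = {"fever"}
BOOST = 1.5

def tf_idf(word, document, corpus):
    """Initial weight: term frequency times a smoothed inverse document frequency."""
    tf = document.count(word) / len(document)
    df = sum(1 for doc in corpus if word in doc)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf

def final_weight(word, document, corpus):
    """Apply the adjustment rule when it matches; otherwise keep the initial weight."""
    initial = tf_idf(word, document, corpus)
    if word in BOOSTED:
        return initial * BOOST
    return initial

corpus = [["fever", "cough"], ["cough", "night"], ["rash"]]
question = ["fever", "cough"]
weights = {word: final_weight(word, question, corpus) for word in question}
```

Keywords can then be chosen by sorting `weights` in descending order and keeping the leading entries.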
One or more non-transitory computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

Acquiring the current question to be answered, segmenting the current question to be answered, and extracting feature words according to the result of the word segmentation to obtain a first feature word set corresponding to the current question to be answered;

Obtaining a second feature word set corresponding to each index node in a pre-established index;

Calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set;

Obtaining question and answer pairs corresponding to each target index node in the target index node set from the consultation database; and

Calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, ranking the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending consultation data according to the selected target question-answer pair.
The storage medium according to claim 15, wherein when the computer-readable instructions are executed by the processor, the following steps are further performed:

Obtaining the inquiry information set corresponding to previous consultations, and preprocessing the inquiry information set;

Extracting question-answer pairs from the preprocessed inquiry information set, and performing feature extraction on the extracted question-answer pairs;

Correspondingly storing the question-answer pairs and the features corresponding to the question-answer pairs in a consultation database; and

Indexing the consultation database according to the features.
The storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor, the following steps are further performed:

Obtaining a user ID corresponding to each piece of inquiry information in the inquiry information set, where the user ID is an inquiring user ID or a doctor user ID;

Filtering the inquiry information corresponding to the doctor user ID according to preset rules; and

Extracting question-answer pairs from the filtered inquiry information set based on punctuation marks and question words.
The storage medium according to claim 16 or 17, wherein when the computer-readable instructions are executed by the processor, the following steps are further performed:

Performing word segmentation on the question in each extracted question-answer pair to obtain a word set corresponding to the question; and

Matching each word in the word set against each word in a pre-established feature word library, and, when the matching is successful, using the word as an extracted feature.
The storage medium according to claim 15, wherein when the computer-readable instructions are executed by the processor, the following steps are further performed:

Calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, selecting keywords according to the first calculation result, and obtaining a first keyword set corresponding to the current question to be answered;

Calculating feature weights for each feature word in the second feature word set to obtain a second calculation result, selecting keywords according to the second calculation result, and obtaining a second keyword set corresponding to each index node;

Obtaining the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node according to the first keyword set and the second keyword set; and

Calculating the cosine of the angle between each first word frequency vector and each second word frequency vector to obtain the first similarity.
The storage medium according to claim 19, wherein when the computer-readable instructions are executed by the processor, the following steps are further performed:

Using a term frequency-inverse document frequency (TF-IDF) algorithm to calculate an initial feature weight of each feature word in the first feature word set;

When any feature word in the first feature word set meets a preset adjustment rule, adjusting an initial feature weight of the feature word according to the preset adjustment rule to obtain a final feature weight; and

When any feature word in the first feature word set does not satisfy the preset adjustment rule, using the initial feature weight of the feature word as the final feature weight.