WO2020007028A1 - Medical consultation data recommendation method, device, computer apparatus, and storage medium - Google Patents

Medical consultation data recommendation method, device, computer apparatus, and storage medium

Info

Publication number
WO2020007028A1
WO2020007028A1 PCT/CN2019/071525 CN2019071525W WO2020007028A1 WO 2020007028 A1 WO2020007028 A1 WO 2020007028A1 CN 2019071525 W CN2019071525 W CN 2019071525W WO 2020007028 A1 WO2020007028 A1 WO 2020007028A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
feature
word
feature word
answered
Prior art date
Application number
PCT/CN2019/071525
Other languages
French (fr)
Chinese (zh)
Inventor
高羽
柳恭
葛培明
孙行智
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020007028A1

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • the present application relates to a method, an apparatus, a computer device, and a storage medium for recommending consultation data.
  • a method, an apparatus, a computer device, and a storage medium for recommending consultation data are provided.
  • a method for recommending consultation data includes:
  • a device for recommending consultation data includes:
  • a first feature word set acquisition module configured to obtain a current question to be answered, perform word segmentation on the current question to be answered, extract feature words according to the word segmentation result, and obtain a first feature word set corresponding to the current question to be answered;
  • a second feature word set acquisition module configured to obtain a second feature word set corresponding to each index node in a pre-established index;
  • a target index node set acquisition module configured to calculate a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and to sort the index nodes according to the first similarity calculation result so as to select a preset number of index nodes as target index nodes and obtain a target index node set;
  • a question-and-answer pair acquisition module for obtaining a question-and-answer pair corresponding to each target index node in the target index node set from the consultation database;
  • a recommendation module configured to calculate a second similarity between the current question to be answered and the question corresponding to each question-and-answer pair, sort the question-and-answer pairs according to the second similarity calculation result to select target question-and-answer pairs, and recommend consultation data according to the selected target question-and-answer pairs.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the one or more processors, the steps of the method for recommending consultation data provided in any embodiment of the present application are implemented.
  • One or more non-transitory computer-readable storage media storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the method for recommending consultation data provided in any embodiment of the present application.
  • FIG. 1 is an application scenario diagram of a method for recommending diagnosis data according to one or more embodiments.
  • FIG. 2 is a schematic flowchart of a method for recommending diagnosis data according to one or more embodiments.
  • FIG. 3 is a schematic flowchart before step S202 in one or more embodiments.
  • FIG. 4 is a schematic flowchart of step S304 according to one or more embodiments.
  • FIG. 5 is a schematic flowchart of step S206 according to one or more embodiments.
  • FIG. 6 is a schematic flowchart of step S502 according to one or more embodiments.
  • FIG. 7 is a structural block diagram of a consultation data recommendation device according to one or more embodiments.
  • FIG. 8 is a block diagram of a consultation data recommendation device in another embodiment.
  • FIG. 9 is a block diagram of a computer device according to one or more embodiments.
  • the method for recommending diagnosis data provided in this application can be applied to the application environment shown in FIG. 1.
  • the consultation terminal 102 and the doctor terminal 104 communicate with the server 106 through a network, respectively.
  • after receiving the question to be answered sent by the consultation terminal 102, the server 106 performs word segmentation on the question, extracts feature words according to the word segmentation result, and obtains a first feature word set corresponding to the question to be answered.
  • the server then obtains the second feature word set corresponding to each index node in a pre-built index, and calculates the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node.
  • according to the first similarity calculation result, the index nodes are sorted and a preset number of index nodes are selected as target index nodes to obtain the target index node set.
  • the question-and-answer pairs corresponding to each target index node are looked up in the consultation database, and the second similarity between the current question to be answered and the question in each question-and-answer pair is calculated.
  • according to the second similarity calculation result, the question-and-answer pairs are sorted to select target question-and-answer pairs, and consultation data is recommended to the doctor terminal 104 according to the selected target question-and-answer pairs. The recommended consultation data can be the entire target question-and-answer pair or just the reply message in the target question-and-answer pair.
  • the consultation terminal 102 and the doctor terminal 104 may be, but are not limited to, various personal computers, notebook computers, smart phones, and tablet computers.
  • the server 106 may be implemented by an independent server or a server cluster composed of multiple servers.
  • a method for recommending diagnosis data is provided.
  • the method is applied to the server in FIG. 1 as an example, and includes the following steps:
  • Step S202 Obtain a current question to be answered, segment the current question to be answered, and extract feature words according to the result of the word segmentation to obtain a first feature word set corresponding to the question currently to be answered.
  • the question to be answered refers to a question entered by the questioning user at the questioning terminal.
  • the server receives the consultation question sent by the consultation terminal and segments it to obtain the segmentation result.
  • the segmentation result refers to the sequence of words obtained after segmentation. For example, the segmentation result for "What should I do for my stomachache" can be: I / stomachache / what to do.
  • String-matching word segmentation can be used to segment each sentence, for example: forward maximum matching, which splits the string of a sentence from left to right; reverse maximum matching, which splits the string from right to left; shortest-path segmentation, which requires the string to be cut into the fewest possible words; or two-way maximum matching, which performs segmentation matching in both the forward and reverse directions.
  • Semantic word segmentation can also be used. It is a machine judgment method that uses syntactic and semantic information to resolve ambiguity when segmenting words.
  • Statistical word segmentation can also be used. Phrase statistics are gathered from the search history of the current user or of users in general; when two adjacent words are found to occur together frequently, they can be treated as a single phrase during segmentation.
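  • As a minimal sketch of the forward maximum matching idea described above, the following assumes a hypothetical toy dictionary and a hypothetical function name `forward_max_match`; neither comes from the application, and a production system would rely on a full lexicon or an existing segmenter.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Segment `sentence` left to right, always taking the longest dictionary word."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a dictionary word is found.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary; spaces are removed to mimic unsegmented text.
toy_dictionary = {"stomach", "ache", "stomachache", "what", "todo"}
print(forward_max_match("stomachachewhattodo", toy_dictionary, max_len=11))
# prints ['stomachache', 'what', 'todo']
```

  • Reverse maximum matching follows the same pattern but scans from the end of the string, and two-way matching runs both and compares the results.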
  • the server extracts feature words according to the segmentation results.
  • extracting feature words may specifically match each word in the segmentation result with each word in a pre-established feature word library, and use the matched words as feature words.
  • the match may be that the two words are exactly the same.
  • the matching may be that the similarity between the two words exceeds a preset threshold, so that, for example, two different expressions for "belly pain" are treated as words that match each other.
  • the feature word library can be built from authoritative descriptions of various diseases obtained from existing medical databases, including the corresponding introduction, symptoms, complications, therapeutic drugs, common examinations, and other professional information, and it can also cover the medical use of various drugs.
  • the medical data can also be collected from open medical data sources on the Internet, in real time or at regular intervals, using tools such as web crawlers (for example, questions, answers, and discussions about different diseases on various forums, or new medical cases and medical question-and-answer texts) to obtain specific types of information (for example, the treatment plans, therapeutic drugs, departments, and clinical manifestations corresponding to different diseases).
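  • The feature word extraction of step S202 can be sketched as below. The function name `extract_feature_words`, the sample library, and the use of `difflib.SequenceMatcher` as the word similarity measure are assumptions for illustration; the application does not specify how word similarity is computed.

```python
from difflib import SequenceMatcher


def extract_feature_words(segmented, feature_library, threshold=1.0):
    """Return the words of `segmented` that match an entry in `feature_library`.

    threshold=1.0 accepts exact matches only; a lower value also accepts near
    matches. SequenceMatcher is an assumed stand-in for the similarity measure.
    """
    features = []
    for word in segmented:
        for entry in feature_library:
            if word == entry or SequenceMatcher(None, word, entry).ratio() >= threshold:
                features.append(word)
                break  # one matching library entry is enough
    return features


# Hypothetical feature word library.
library = {"stomachache", "cough"}
print(extract_feature_words(["I", "stomachache", "what to do"], library))
# prints ['stomachache']
```

  • Lowering `threshold` trades precision for recall, so a suitable value would have to be tuned on real consultation text.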
  • Step S204 Obtain a second feature word set corresponding to each index node in the pre-established index.
  • question-and-answer pairs are extracted in advance, and feature extraction is then performed on them. The extracted features include at least the feature words corresponding to the question in each question-and-answer pair; these feature words form the second feature word set.
  • each question-and-answer pair and its corresponding features are saved to the same row of a data table in the consultation database, and the consultation database is finally indexed according to the feature column.
  • each index node in the index includes an index value and a pointer. The index value includes at least the second feature word set of the corresponding question-and-answer pair. The pointer refers to a memory area that records a reference to the corresponding row of data stored on the hard disk.
  • a question-answer pair refers to an information pair consisting of a question from a user and a reply from a doctor.
  • a question-and-answer pair can consist of a question from the questioning user and one answer from the doctor, or a question from the questioning user and multiple answers from the doctor. It can also consist of multiple consecutive questions from the questioning user and one from the doctor. The response may consist of multiple consecutive questions from the user and multiple consecutive responses from the doctor.
  • the server sequentially traverses each index node in the index and reads its index value to obtain the second feature word set corresponding to each index node.
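  • One way to picture the index described in step S204 is the in-memory sketch below. The class `IndexNode`, the function `build_index`, and the use of a row id in place of the pointer are all hypothetical simplifications; the application does not prescribe a concrete data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexNode:
    feature_words: frozenset  # index value: the second feature word set
    row_id: int               # stands in for the pointer to the database row


def build_index(rows):
    """Build one index node per stored row.

    `rows` maps a row id to a (question, answer, feature_words) tuple.
    """
    return [
        IndexNode(frozenset(features), row_id)
        for row_id, (_question, _answer, features) in rows.items()
    ]


# Hypothetical table contents.
rows = {
    1: ("Why do I keep coughing?", "It may be a cold.", ["cough"]),
    2: ("My head hurts and my nose runs.", "Rest and drink water.", ["headache", "runny nose"]),
}
index = build_index(rows)
# Traversing the index yields the second feature word set of every node.
second_feature_sets = [node.feature_words for node in index]
```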
  • Step S206 Calculate the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sort the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes and obtain the target index node set.
  • the first similarity is used to represent a degree of similarity between the first feature word set and the second feature word set.
  • the first similarity may be a cosine similarity.
  • the cosine similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to any index node may be calculated as follows.
  • keywords are first extracted from the first feature word set and the second feature word set to obtain the first keyword set corresponding to the question to be answered and the second keyword set corresponding to the index node, and the word frequency vectors corresponding to the first keyword set and the second keyword set are then calculated.
  • the cosine similarity is obtained by calculating the cosine of the angle between the two word frequency vectors.
  • the server sorts each index node of the index database according to the magnitude of the cosine similarity, and selects a preset number of index nodes as target index nodes according to the sorting result to obtain a target index node set.
  • the server may sort the index nodes in descending order of cosine similarity and select the top N1 index nodes as target index nodes, where N1 is a preset value that can be set and adjusted based on experience.
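  • Selecting the top N1 index nodes can be sketched with the standard library as follows. The function name `select_top_nodes` and the dictionary of precomputed similarities are assumptions for illustration.

```python
import heapq


def select_top_nodes(similarities, n1):
    """Return the ids of the n1 index nodes with the highest first similarity.

    `similarities` maps an index node id to its first similarity score.
    """
    ranked = heapq.nlargest(n1, similarities.items(), key=lambda item: item[1])
    return [node_id for node_id, _score in ranked]


# Hypothetical similarity scores for three index nodes.
print(select_top_nodes({"a": 0.2, "b": 0.9, "c": 0.5}, 2))
# prints ['b', 'c']
```

  • `heapq.nlargest` avoids sorting the whole index when N1 is small compared with the number of index nodes; a full descending sort followed by slicing gives the same result.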
  • Step S208 Obtain a question-answer pair corresponding to each target index node in the target index node set from the consultation database.
  • each index node in the index stores a pointer to a corresponding row in a table in the consultation database.
  • the corresponding row of data corresponding to the index node can be obtained through the pointer, and the question-answer pair is data of one column in the row of data, so the corresponding question-answer pair can be obtained through the index node.
  • Step S210 Calculate the second similarity between the current question to be answered and the question corresponding to each question-and-answer pair, sort the question-and-answer pairs according to the second similarity calculation result to select target question-and-answer pairs, and recommend consultation data according to the selected target question-and-answer pairs.
  • the second similarity is used to characterize the similarity between the current question to be answered and the question corresponding to each question-answer pair.
  • the second similarity may be a string similarity. Calculating it may include the following steps: after obtaining the question-and-answer pair corresponding to each target index node, the server first calculates the edit distance between the current question to be answered and the question in each obtained question-and-answer pair. The edit distance is the minimum number of single-character edits (such as substitutions, insertions, or deletions) required to change one string into the other.
  • the string similarity is then calculated as $\text{Similarity} = \dfrac{\max(x, y) - \text{Levenshtein}}{\max(x, y)}$, where $x$ is the length of the string corresponding to the question to be answered, $y$ is the length of the string corresponding to the question in the question-and-answer pair, and Levenshtein is the edit distance.
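  • The edit distance and the resulting string similarity can be sketched as below, using the standard dynamic-programming formulation of the Levenshtein distance. The function names are illustrative, and treating two empty strings as identical is an assumption that the application does not address.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def string_similarity(a, b):
    """(max(x, y) - Levenshtein) / max(x, y), where x and y are the string lengths."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (longest - levenshtein(a, b)) / longest


print(levenshtein("kitten", "sitting"))  # prints 3
```

  • The similarity is 1 for identical strings and falls toward 0 as more edits are needed relative to the longer string.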
  • the server sorts each question-and-answer pair obtained in step S208 according to the magnitude of the string similarity, and then selects a preset number of question-and-answer pairs as target question-and-answer pairs according to the sorting results, and performs consultation data recommendation based on these target question-and-answer pairs.
  • the server may sort the question-and-answer pairs obtained in step S208 in descending order of string similarity and select the top N2 question-and-answer pairs as target question-and-answer pairs, where N2 is a preset value that may be adjusted based on experience.
  • the server recommends consultation data according to the target question-and-answer pairs. It may recommend all target question-and-answer pairs to the doctor terminal, select any one of them to recommend, or recommend the first-ranked question-and-answer pair. How the recommendation is made is not limited in this application.
  • the server may also recommend only the answers in the target question-and-answer pairs to the doctor terminal: the answers of all target question-and-answer pairs, the answer of any one of them, or the answer of the first-ranked question-and-answer pair. How the recommendation is made is not limited in this application.
  • the server first obtains the feature word set corresponding to the question to be answered, then calculates the first similarity between this feature word set and the feature word set of each index node in the index, and selects the nodes with the largest similarity as target nodes.
  • the question-and-answer pairs corresponding to these nodes are then retrieved, the second similarity between the question to be answered and the question in each question-and-answer pair is calculated, and the question-and-answer pairs with the largest string similarity are selected as target question-and-answer pairs.
  • by sorting twice, the application accurately locates the question-and-answer pairs most similar to the question to be answered and recommends answers based on them, which improves the efficiency of the consultation.
  • before step S202, the method includes:
  • Step S302 Obtain a consultation information set corresponding to each previous consultation, and preprocess the consultation information set.
  • the previous consultations refer to the consultations completed before the current time.
  • the consultation information set refers to the set composed of the consultation messages of the consultation user and the reply messages of the doctor user in one complete consultation.
  • preprocessing includes sentence splitting, coreference resolution, context processing, and the like.
  • Sentence splitting refers to segmenting a piece of information into single sentences. Coreference resolution refers to determining what a pronoun in a sentence refers to, which can be done through syntactic analysis and edit distance. Context processing refers to completing a sentence from its context; for example, in "D: Are you dizzy? U: Yes, I am dizzy.", the second sentence is completed so that its meaning is more comprehensive. Context processing uses syntactic analysis and sentence pattern judgment.
  • step S304 question-and-answer pairs are extracted from the pre-processed consultation information set, and feature extraction is performed on the extracted question-and-answer pairs.
  • extracting the question-and-answer pairs means extracting them from the consultation information corresponding to a complete consultation.
  • the server performs feature extraction on the extracted question-answer pairs.
  • feature extraction may be extracting keywords for questions in a question-answer pair.
  • the extracted features may be, for example, the number of single sentences in the question-answer pair, the number of adjectives, question words, and so on.
  • step S306 the question-and-answer pairs and their corresponding features are stored in the consultation database.
  • the server stores each question-and-answer pair and its corresponding features in the consultation database as different columns of the same row of a table.
  • during a consultation, the consultation user communicates with the doctor through instant messages, and the messages carry the user IDs of both parties: messages sent by the consultation terminal carry the consultation user ID, and messages sent by the doctor terminal carry the doctor user ID.
  • therefore, when the server obtains the consultation information corresponding to previous consultations, it can also obtain the user ID corresponding to each piece of consultation information, and store the question-and-answer pairs, the corresponding user IDs, and the corresponding features in one-to-one correspondence in the consultation database.
  • step S308 the consultation database is indexed according to the features.
  • the server establishes an index according to the feature column in the consultation database. Each node in the index corresponds to a row of data in the consultation database, which includes at least the question-and-answer pair and its corresponding features.
  • the server may also create an index based on both the user ID and the features.
  • extracting question-and-answer pairs from the pre-processed consultation information set includes:
  • Step S304A Obtain the user ID corresponding to each piece of consultation information in the consultation information set, where the user ID is either a consultation user ID or a doctor user ID.
  • each piece of consultation information in the consultation information set corresponds to a user ID: when the information was sent by the consultation terminal, the corresponding user ID is the consultation user ID; when it was sent by the doctor terminal, the corresponding user ID is the doctor user ID.
  • step S306B the consultation information corresponding to the doctor user ID is filtered according to a preset rule.
  • the preset rule at least includes: filtering out messages ending with an interrogative word, and messages matching a preset set of phrases.
  • Interrogative words can be, for example, "what to do", "what is going on", "why", and so on.
  • the preset stock phrases are sentences set in advance at the doctor terminal to save response time, for example, "Please wait a moment", "Hello, I'm not on duty at present", and so on.
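  • The filtering rule for doctor messages can be sketched as below. The role labels "doctor" and "patient" stand in for the doctor user ID and consultation user ID, and the constants and function name are hypothetical; the interrogative words and the stock phrase are taken from the examples in the text.

```python
QUESTION_WORDS = ("what to do", "what is going on", "why")  # examples from the text
STOCK_PHRASES = {"Please wait a moment"}                     # hypothetical preset set


def filter_doctor_messages(messages):
    """Drop doctor messages that end with a question word or equal a stock phrase.

    `messages` is a list of (role, text) tuples; only "doctor" messages are filtered.
    """
    kept = []
    for role, text in messages:
        if role == "doctor":
            # Ignore trailing punctuation before applying the rule.
            stripped = text.strip().rstrip("?？.。!！").strip()
            if stripped.lower().endswith(QUESTION_WORDS) or stripped in STOCK_PHRASES:
                continue
        kept.append((role, text))
    return kept


conversation = [
    ("patient", "My head hurts, why?"),
    ("doctor", "Please wait a moment."),
    ("doctor", "Rest and drink water."),
]
print(filter_doctor_messages(conversation))
# prints [('patient', 'My head hurts, why?'), ('doctor', 'Rest and drink water.')]
```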
  • Step S308C Question-and-answer pairs are extracted from the filtered consultation information set according to punctuation marks and question words.
  • the filtered consultation information is traversed starting from the first message, and the user ID corresponding to each message is obtained in turn. When the user ID is a consultation user ID, it is determined whether the message includes a question sentence; if so, the question sentence is used as a question in a question-and-answer pair.
  • starting from the first subsequent message corresponding to a doctor user ID, all consecutive messages corresponding to the doctor user ID are collected until the next message corresponding to a consultation user ID appears, and the collected doctor messages are used as the answer to the question sentence to form a question-and-answer pair.
  • the extracted question-and-answer pairs can consist of one answer to one question, multiple consecutive answers to one question, one answer to multiple consecutive questions, or multiple consecutive answers to multiple consecutive questions. This depends on the specific consultation and is not limited in this application.
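  • The traversal that pairs questions with the doctor replies that follow them can be sketched as below. The role labels, the English question words, and the function names are assumptions for illustration; whole messages are used as questions rather than individual sentences, which simplifies the procedure described above.

```python
import re


def is_question(text, question_words=("what", "why", "how")):
    """Detect a question by a trailing question mark or a question word."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return text.rstrip().endswith(("?", "？")) or bool(tokens & set(question_words))


def extract_qa_pairs(messages):
    """Pair each run of patient questions with the run of doctor replies after it.

    `messages` is a chronological list of (role, text) tuples, where the role
    "patient" or "doctor" stands in for the user ID carried by each message.
    """
    pairs = []
    i = 0
    while i < len(messages):
        if messages[i][0] != "patient":
            i += 1
            continue
        # Gather the run of consecutive patient messages and keep the questions.
        questions = []
        while i < len(messages) and messages[i][0] == "patient":
            if is_question(messages[i][1]):
                questions.append(messages[i][1])
            i += 1
        # Gather the run of consecutive doctor replies that follows.
        answers = []
        while i < len(messages) and messages[i][0] == "doctor":
            answers.append(messages[i][1])
            i += 1
        if questions and answers:
            pairs.append((questions, answers))
    return pairs
```

  • Because both sides are collected as runs, the sketch naturally produces one-to-one, one-to-many, many-to-one, and many-to-many pairs.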
  • performing feature extraction on the extracted question-and-answer pairs includes: segmenting the questions in the extracted question-and-answer pairs to obtain a word set corresponding to each question; and matching each word in the word set against each word in a pre-established feature word library. When the match is successful, the word is used as an extracted feature.
  • the server may first perform word segmentation on the questions in the extracted question-and-answer pairs to obtain a word set corresponding to each question.
  • Semantic word segmentation can be used to segment each sentence. It is a machine judgment method that uses syntactic and semantic information to resolve ambiguity when segmenting words.
  • each word in the word set obtained by the segmentation is matched with each word in a pre-established feature word library, and the matched words are used as feature words.
  • the match may be that the two words are exactly the same.
  • the matching may be that the similarity between the two words exceeds a preset threshold, so that, for example, two different expressions for "belly pain" are treated as words that match each other.
  • the feature word library can be built from authoritative descriptions of various diseases obtained from existing medical databases, including the corresponding introduction, symptoms, complications, therapeutic drugs, common examinations, and other professional information, and it can also cover the medical use of various drugs.
  • the medical data can also be collected from open medical data sources on the Internet, in real time or at regular intervals, using tools such as web crawlers (for example, questions, answers, and discussions about different diseases on various forums, or new medical cases and medical question-and-answer texts) to obtain specific types of information (for example, the treatment plans, therapeutic drugs, departments, and clinical manifestations corresponding to different diseases).
  • the steps of respectively calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node include:
  • Step S502 Calculate a feature weight for each feature word in the first feature word set to obtain a first calculation result, select keywords based on the first calculation result, and obtain a first keyword set corresponding to a question to be currently answered.
  • feature weights are used to characterize the importance of a feature. The larger the feature weight, the more important the feature word is, and the more it can represent the meaning of a word set.
  • a term frequency-inverse document frequency (TF-IDF) algorithm may be used to calculate feature weights for each feature word.
  • a first calculation result is obtained after calculating the feature weights.
  • the first calculation result refers to the weight value corresponding to each feature word in the first feature word set.
  • the feature words can be sorted according to the weight value, and then keywords are selected according to the sorting result, thereby obtaining a first keyword set.
  • the server may sort each feature word in the first feature word set in descending order according to the feature weight, and then select a preset number of feature words that are ranked first as keywords to obtain the first keyword set.
  • Step S504 calculating feature weights for each feature word in the second feature word set to obtain a second calculation result, selecting keywords according to the second calculation result, and obtaining a second keyword set corresponding to each index node.
  • a term frequency-inverse document frequency algorithm may be used to calculate feature weights for each feature word in the second feature word set to obtain a second calculation result. The second calculation result refers to the feature weight of each feature word in the second feature word set.
  • the feature words can be sorted according to the weight value, and then the keywords are selected according to the ranking result to obtain a second keyword set.
  • the server may sort each feature word in the second feature word set in descending order according to the feature weight, and then select a preset number of feature words that are ranked first as keywords to obtain a second keyword set.
  • step S506 the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node are obtained according to the first keyword set and the second keyword set.
  • the first keyword set and the second keyword set are combined to obtain their union, the word frequency of each keyword in the union is calculated in the first feature word set and in the second feature word set respectively, and a first word frequency vector and a second word frequency vector are generated from these word frequencies.
  • For example, if the first feature word set is cough / smoking / insomnia and its corresponding keyword set is {cough, smoking}, and the second feature word set is headache / cough / runny nose / cooling and its corresponding keyword set is {headache, runny nose}, combining the two keyword sets gives {cough, smoking, headache, runny nose}.
  • the word frequency of each word of the union in the first feature word set is: cough 1, smoking 1, headache 0, runny nose 0; the word frequency of each word of the union in the second feature word set is: cough 1, smoking 0, headache 1, runny nose 1. The first word frequency vector is therefore [1, 1, 0, 0], and the second word frequency vector is [1, 0, 1, 1].
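  • The construction of the two word frequency vectors in step S506 can be sketched as follows, reusing the cough and headache example from the text. The function name `word_frequency_vectors` is hypothetical, and ordering the union by first appearance is an assumption made so that the vector layout is reproducible.

```python
from collections import Counter


def word_frequency_vectors(features_a, keywords_a, features_b, keywords_b):
    """Return (vocabulary, vector_a, vector_b) over the union of both keyword sets."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    vocabulary = list(dict.fromkeys(list(keywords_a) + list(keywords_b)))
    counts_a, counts_b = Counter(features_a), Counter(features_b)
    vector_a = [counts_a[word] for word in vocabulary]
    vector_b = [counts_b[word] for word in vocabulary]
    return vocabulary, vector_a, vector_b


vocabulary, first_vector, second_vector = word_frequency_vectors(
    ["cough", "smoking", "insomnia"], ["cough", "smoking"],
    ["headache", "cough", "runny nose", "cooling"], ["headache", "runny nose"],
)
print(first_vector, second_vector)
# prints [1, 1, 0, 0] [1, 0, 1, 1]
```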
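  • In step S508, the cosine of the angle between the first word frequency vector and each second word frequency vector is calculated to obtain the first similarity: $\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$.
  • A_i is the i-th component of the first word frequency vector.
  • B_i is the i-th component of the second word frequency vector.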
  • The cosine similarity of the two feature word sets is calculated by extracting keywords from the feature word sets and building word frequency vectors. Compared with calculating the similarity directly on the full text of the two questions, this reduces the amount of computation and improves calculation efficiency.
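Steps S506 and S508 can be illustrated with a minimal Python sketch that reproduces the cough/smoking example. The function names and the ordering of the union (keywords of the first set first) are illustrative assumptions, not part of the application.

```python
import math
from collections import Counter


def word_frequency_vectors(first_words, second_words, first_keywords, second_keywords):
    """Build word frequency vectors over the union of two keyword sets (step S506)."""
    # Union of the keyword sets in a stable order (assumption: first-set keywords first).
    union = list(dict.fromkeys(list(first_keywords) + list(second_keywords)))
    first_counts = Counter(first_words)
    second_counts = Counter(second_words)
    first_vector = [first_counts[word] for word in union]
    second_vector = [second_counts[word] for word in union]
    return union, first_vector, second_vector


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (step S508)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no shared basis to compare; treat as dissimilar
    return dot / (norm_a * norm_b)


# Feature word sets and keyword sets from the example in the text.
first_words = ["cough", "smoking", "insomnia"]
second_words = ["headache", "cough", "runny nose", "catching a cold"]
union, first_vector, second_vector = word_frequency_vectors(
    first_words, second_words, ["cough", "smoking"], ["headache", "runny nose"]
)
first_similarity = cosine_similarity(first_vector, second_vector)
```

With these inputs the vectors are [1, 1, 0, 0] and [1, 0, 1, 1], and the first similarity is 1/√6, roughly 0.41.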
  • Calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes:
  • In step S602, an initial feature weight of each feature word in the first feature word set is calculated using a term frequency-inverse document frequency algorithm.
  • Term frequency (TF) = number of times a word appears in a document / total number of words in the document.
  • In step S604, each feature word in the first feature word set is checked in turn against a preset adjustment rule. If the rule is satisfied, the process proceeds to step S606; otherwise, it proceeds to step S608.
  • In step S606, the initial feature weight of the feature word is adjusted according to the adjustment rule to obtain the final feature weight.
  • In step S608, the initial feature weight is used as the final feature weight.
  • The preset adjustment rule is a rule for manually adjusting the feature weight of a feature word.
  • The preset adjustment rule may be: when two feature words appear together and the difference between their feature weights is less than a preset threshold, the weight of one of them is adjusted so that the difference is not less than the preset threshold. For example, when "headache" and "hand pain" both appear as feature words and their feature weights differ by less than 0.2, the feature weight of "headache" is adjusted so that the difference between the two exceeds 0.2. The purpose is to increase the weight of the feature word that has the greater bearing on the symptoms, thereby improving the accuracy of keyword selection.
  • the accuracy of keyword selection can be improved by adjusting feature weights.
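Steps S602 to S608, followed by keyword selection, might be sketched as below. This is only an illustration under stated assumptions: the smoothed IDF form `log(N / (1 + df))` is a common variant and is not given in this text, and the rule format (a pair naming which word to boost), the margin, and all function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(feature_words, corpus):
    """Step S602: initial weights. corpus is a non-empty list of word lists."""
    counts = Counter(feature_words)
    total = len(feature_words)
    weights = {}
    for word, count in counts.items():
        tf = count / total  # term frequency as defined in the text
        docs_with_word = sum(1 for doc in corpus if word in doc)
        idf = math.log(len(corpus) / (1 + docs_with_word))  # assumed IDF variant
        weights[word] = tf * idf
    return weights


def adjust_weights(weights, rules, threshold=0.2, margin=0.01):
    """Steps S604 to S608: apply each (boosted_word, other_word) rule when both
    words are present and their weights differ by less than the threshold."""
    adjusted = dict(weights)
    for boosted, other in rules:
        if boosted in adjusted and other in adjusted:
            if abs(adjusted[boosted] - adjusted[other]) < threshold:
                # Raise the boosted word so the gap exceeds the threshold.
                adjusted[boosted] = adjusted[other] + threshold + margin
    return adjusted


def select_keywords(weights, top_n):
    """Sort by weight in descending order and keep the top_n words."""
    return sorted(weights, key=lambda word: weights[word], reverse=True)[:top_n]


initial = {"headache": 0.30, "hand pain": 0.25, "cough": 0.40}
final = adjust_weights(initial, rules=[("headache", "hand pain")])
keywords = select_keywords(final, top_n=2)
```

Words not covered by any rule keep their initial weight, which corresponds to step S608.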
  • Although the steps in the flowcharts of FIGS. 2-6 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated in this document, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-6 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be performed at different times, and they are not necessarily performed sequentially; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
  • A consultation data recommendation device 700 is provided, including:
  • a first feature word set acquisition module 702, configured to obtain a current question to be answered, segment the current question to be answered, and extract feature words according to the segmentation result to obtain a first feature word set corresponding to the current question to be answered;
  • a second feature word set acquisition module 704, configured to obtain a second feature word set corresponding to each index node in a pre-established index;
  • a target index node set acquisition module 706, configured to respectively calculate a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and to sort the index nodes according to the first similarity calculation result so as to select a preset number of index nodes as target index nodes, obtaining a target index node set;
  • a question-answer pair acquisition module 708, configured to obtain, from the consultation database, the question-answer pair corresponding to each target index node in the target index node set; and
  • a recommendation module 710, configured to respectively calculate a second similarity between the current question to be answered and the question of each question-answer pair, sort the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommend consultation data according to the selected target question-answer pair.
  • The foregoing device further includes:
  • a pre-processing module 802, configured to obtain the consultation information set corresponding to previous consultations and pre-process the consultation information set;
  • a feature extraction module 804, configured to extract question-answer pairs from the pre-processed consultation information set and perform feature extraction on the extracted question-answer pairs;
  • a storage module 806, configured to store the question-answer pairs and their corresponding features in the consultation database; and
  • an index establishing module 808, configured to index the consultation database according to the features.
  • The feature extraction module 804 is further configured to obtain the user identifier corresponding to each piece of consultation information in the consultation information set, the user identifier being a consulting-user identifier or a doctor-user identifier; filter the consultation information corresponding to each user identifier according to a preset rule; and extract question-answer pairs from the filtered consultation information set according to punctuation marks and question words.
  • The feature extraction module 804 is further configured to segment the questions in the extracted question-answer pairs to obtain a word set corresponding to each question, and to match each word in the word set against each word in a pre-established feature word library; when a match succeeds, the word is used as an extracted feature.
  • The target index node set acquisition module 706 is further configured to: calculate a feature weight for each feature word in the first feature word set to obtain a first calculation result, and select keywords according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered; calculate a feature weight for each feature word in the second feature word set to obtain a second calculation result, and select keywords according to the second calculation result to obtain a second keyword set corresponding to each index node; obtain, according to the first keyword set and the second keyword set, the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node; and calculate the cosine of the angle between the first word frequency vector and each second word frequency vector to obtain the first similarity.
  • The target index node set acquisition module 706 is further configured to calculate an initial feature weight for each feature word in the first feature word set using a term frequency-inverse document frequency algorithm; when a feature word in the first feature word set satisfies a preset adjustment rule, adjust the initial feature weight of that feature word according to the rule to obtain the final feature weight; and when a feature word in the first feature word set does not satisfy the preset adjustment rule, use its initial feature weight as the final feature weight.
  • Each module in the above-mentioned consultation data recommendation device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • The above-mentioned modules may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile computer-readable storage medium and an internal memory.
  • the non-volatile computer-readable storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for operating the operating system and computer-readable instructions in a non-volatile computer-readable storage medium.
  • The database of the computer device is used to store data such as question-answer pairs and the features corresponding to the question-answer pairs.
  • The network interface of the computer device is used to communicate with an external terminal through a network connection.
  • The computer-readable instructions are executed by the processor to implement a consultation data recommendation method.
  • FIG. 8 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • A computer device includes a memory and one or more processors.
  • Computer-readable instructions are stored in the memory.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps: obtaining a current question to be answered, segmenting the current question to be answered, and extracting feature words according to the segmentation result to obtain a first feature word set corresponding to the current question to be answered; obtaining a second feature word set corresponding to each index node in a pre-established index; respectively calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set; obtaining, from the consultation database, the question-answer pair corresponding to each target index node in the target index node set; and respectively calculating a second similarity between the current question to be answered and the question of each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending consultation data according to the selected target question-answer pair.
  • When executing the computer-readable instructions, the processor further performs the following steps: obtaining the consultation information set corresponding to previous consultations and pre-processing the consultation information set; extracting question-answer pairs from the pre-processed consultation information set and performing feature extraction on the extracted question-answer pairs; storing the question-answer pairs and their corresponding features in the consultation database; and indexing the consultation database according to the features.
  • Extracting question-answer pairs from the pre-processed consultation information set includes: obtaining the user identifier corresponding to each piece of consultation information in the consultation information set, the user identifier being a consulting-user identifier or a doctor-user identifier; filtering the consultation information corresponding to each user identifier according to a preset rule; and extracting question-answer pairs from the filtered consultation information set according to punctuation marks and question words.
  • Performing feature extraction on the extracted question-answer pairs includes: segmenting the questions in the extracted question-answer pairs to obtain a word set corresponding to each question; and matching each word in the word set against each word in a pre-established feature word library. When a match succeeds, the word is used as an extracted feature.
  • Respectively calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node includes: calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered; calculating a feature weight for each feature word in the second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node; obtaining, according to the first keyword set and the second keyword set, the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node; and calculating the cosine of the angle between the first word frequency vector and each second word frequency vector to obtain the first similarity.
  • Calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes: calculating an initial feature weight for each feature word in the first feature word set using a term frequency-inverse document frequency algorithm; when a feature word in the first feature word set satisfies a preset adjustment rule, adjusting the initial feature weight of that feature word according to the rule to obtain the final feature weight; and when a feature word in the first feature word set does not satisfy the preset adjustment rule, using its initial feature weight as the final feature weight.
  • One or more non-volatile computer-readable storage media storing computer-readable instructions are provided.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: obtaining a current question to be answered, segmenting the current question to be answered, and extracting feature words according to the segmentation result to obtain a first feature word set corresponding to the current question to be answered; obtaining a second feature word set corresponding to each index node in a pre-established index; respectively calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set; obtaining, from the consultation database, the question-answer pair corresponding to each target index node in the target index node set; and respectively calculating a second similarity between the current question to be answered and the question of each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending consultation data according to the selected target question-answer pair.
  • When the computer-readable instructions are executed by the processor, the following steps are also implemented: obtaining the consultation information set corresponding to previous consultations and pre-processing the consultation information set; extracting question-answer pairs from the pre-processed consultation information set and performing feature extraction on the extracted question-answer pairs; storing the question-answer pairs and their corresponding features in the consultation database; and indexing the consultation database according to the features.
  • Extracting question-answer pairs from the pre-processed consultation information set includes: obtaining the user identifier corresponding to each piece of consultation information in the consultation information set, the user identifier being a consulting-user identifier or a doctor-user identifier; filtering the consultation information corresponding to each user identifier according to a preset rule; and extracting question-answer pairs from the filtered consultation information set according to punctuation marks and question words.
  • Performing feature extraction on the extracted question-answer pairs includes: segmenting the questions in the extracted question-answer pairs to obtain a word set corresponding to each question; and matching each word in the word set against each word in a pre-established feature word library. When a match succeeds, the word is used as an extracted feature.
  • Respectively calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node includes: calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered; calculating a feature weight for each feature word in the second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node; obtaining, according to the first keyword set and the second keyword set, the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node; and calculating the cosine of the angle between the first word frequency vector and each second word frequency vector to obtain the first similarity.
  • Calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes: calculating an initial feature weight for each feature word in the first feature word set using a term frequency-inverse document frequency algorithm; when a feature word in the first feature word set satisfies a preset adjustment rule, adjusting the initial feature weight of that feature word according to the rule to obtain the final feature weight; and when a feature word in the first feature word set does not satisfy the preset adjustment rule, using its initial feature weight as the final feature weight.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

A medical consultation data recommendation method, comprising: acquiring a current question to be answered, performing word segmentation, extracting feature words according to a word segmentation result, and obtaining a first feature word set corresponding to the current question; acquiring a second feature word set corresponding to each index node in a pre-established index; calculating a cosine similarity level between the first feature word set and the second feature word set, sorting each index node according to a first similarity calculation result, so as to select a pre-determined number of index nodes as target index nodes, and obtaining a target index node set; acquiring, from a medical consultation database, question-answer pairs respectively corresponding to the target index nodes; and calculating second similarity levels between the current question and questions respectively corresponding to the question-answer pairs, sorting the question-answer pairs according to the second similarity levels, so as to select a target question-answer pair, and recommending medical consultation data according to the selected question-answer pair.

Description

Consultation data recommendation method, device, computer device, and storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 2018107242917, filed with the Chinese Patent Office on July 4, 2018 and entitled "Consultation data recommendation method, device, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to a consultation data recommendation method, device, computer device, and storage medium.
Background
With the rapid development of Internet technology, Internet-based online consultation and online health advice are favored by more and more people. In both, every user who asks a question expects the fastest possible answer from a doctor.
In the conventional approach, after seeing a user's question, the doctor has to think it over, organize the wording, write the answer, and finally click send before the user can see a reply. The inventors realized, however, that this makes consultation inefficient.
Summary of the invention
According to various embodiments disclosed in the present application, a consultation data recommendation method, device, computer device, and storage medium are provided.
A consultation data recommendation method includes:
obtaining a current question to be answered, segmenting the current question to be answered, and extracting feature words according to the segmentation result to obtain a first feature word set corresponding to the current question to be answered;
obtaining a second feature word set corresponding to each index node in a pre-established index;
respectively calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set;
obtaining, from a consultation database, the question-answer pair corresponding to each target index node in the target index node set; and
respectively calculating a second similarity between the current question to be answered and the question of each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending consultation data according to the selected target question-answer pair.
A consultation data recommendation device includes:
a first feature word set acquisition module, configured to obtain a current question to be answered, segment the current question to be answered, and extract feature words according to the segmentation result to obtain a first feature word set corresponding to the current question to be answered;
a second feature word set acquisition module, configured to obtain a second feature word set corresponding to each index node in a pre-established index;
a target index node set acquisition module, configured to respectively calculate a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and to sort the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set;
a question-answer pair acquisition module, configured to obtain, from a consultation database, the question-answer pair corresponding to each target index node in the target index node set; and
a recommendation module, configured to respectively calculate a second similarity between the current question to be answered and the question of each question-answer pair, sort the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommend consultation data according to the selected target question-answer pair.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the processors, implement the steps of the consultation data recommendation method provided in any embodiment of the present application.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the consultation data recommendation method provided in any embodiment of the present application.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an application scenario diagram of a consultation data recommendation method according to one or more embodiments.
FIG. 2 is a schematic flowchart of a consultation data recommendation method according to one or more embodiments.
FIG. 3 is a schematic flowchart of the steps performed before step S202 according to one or more embodiments.
FIG. 4 is a schematic flowchart corresponding to step S304 according to one or more embodiments.
FIG. 5 is a schematic flowchart corresponding to step S206 according to one or more embodiments.
FIG. 6 is a schematic flowchart corresponding to step S502 according to one or more embodiments.
FIG. 7 is a structural block diagram of a consultation data recommendation device according to one or more embodiments.
FIG. 8 is a block diagram of a consultation data recommendation device in another embodiment.
FIG. 9 is a block diagram of a computer device according to one or more embodiments.
Detailed description
To make the technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and not to limit it.
The consultation data recommendation method provided in the present application can be applied in the application environment shown in FIG. 1. The consultation terminal 102 and the doctor terminal 104 each communicate with the server 106 through a network. After receiving a question to be answered from the consultation terminal, the server 106 segments the current question to be answered and extracts feature words according to the segmentation result to obtain the first feature word set corresponding to the current question to be answered. It obtains the second feature word set corresponding to each index node in a pre-established index library, calculates the first similarity between the first feature word set and the second feature word set of each index node, and sorts the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set. It then looks up, in the consultation information database, the question-answer pair corresponding to each target index node, calculates the second similarity between the current question to be answered and the question of each question-answer pair, and sorts the question-answer pairs according to the second similarity calculation result to select a target question-answer pair. Finally, it recommends consultation data to the doctor terminal according to the selected target question-answer pair. The recommended consultation data may be the entire target question-answer pair or only the reply in the target question-answer pair.
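The two-stage flow described above (rank index nodes first, then rank only the question-answer pairs behind the best nodes) can be outlined as follows. This is a sketch under assumptions: the dictionary-based index and database, the `recommend` signature, and the Jaccard stand-in similarity are all hypothetical; the application itself uses the cosine of word frequency vectors for the first similarity.

```python
def jaccard(words_a, words_b):
    """Stand-in similarity for the demo only (not the measure used in the text)."""
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(question_words, index_nodes, qa_database,
              node_similarity, question_similarity, top_nodes=2, top_pairs=1):
    """Two-stage retrieval.

    index_nodes: node id -> feature words of that index node.
    qa_database: node id -> list of (question_words, answer) pairs.
    """
    # Stage 1: rank index nodes by the first similarity and keep the best ones.
    ranked_nodes = sorted(
        index_nodes,
        key=lambda node: node_similarity(question_words, index_nodes[node]),
        reverse=True,
    )
    target_nodes = ranked_nodes[:top_nodes]

    # Stage 2: rank only the question-answer pairs attached to the target nodes.
    candidates = [pair for node in target_nodes for pair in qa_database.get(node, [])]
    ranked_pairs = sorted(
        candidates,
        key=lambda pair: question_similarity(question_words, pair[0]),
        reverse=True,
    )
    return ranked_pairs[:top_pairs]


index_nodes = {"n1": ["cough", "fever"], "n2": ["headache"], "n3": ["cough"]}
qa_database = {
    "n1": [(["cough", "fever"], "answer 1")],
    "n2": [(["headache"], "answer 2")],
    "n3": [(["cough"], "answer 3")],
}
result = recommend(["cough"], index_nodes, qa_database, jaccard, jaccard)
```

Restricting the second, finer comparison to the pairs behind a few target nodes is what keeps the per-question cost low.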
The consultation terminal 102 and the doctor terminal 104 may be, but are not limited to, personal computers, notebook computers, smartphones, and tablet computers. The server 106 may be implemented as an independent server or as a server cluster composed of multiple servers.
在一些实施例中,如图2所示,提供了一种问诊数据推荐方法,以该方法应用于图1中的服务器为例进行说明,包括以下步骤:In some embodiments, as shown in FIG. 2, a method for recommending diagnosis data is provided. The method is applied to the server in FIG. 1 as an example, and includes the following steps:
步骤S202,获取当前待回答问题,对当前待回答问题进行分词,根据分词结果提取特征词,得到当前待回答问题对应的第一特征词集合。Step S202: Obtain a current question to be answered, segment the current question to be answered, and extract feature words according to the result of the word segmentation to obtain a first feature word set corresponding to the question currently to be answered.
具体地,待回答问题指的是问诊用户在问诊终端输入的问诊问题。当问诊用户在问诊终端输入问诊问题时,服务器会接收到问诊终端发送的问诊问题,对该问诊问题进行分词,得到分词结果,分词结果指的是分词后得到的一个一个的词语组成的词语序列。如,“我肚子痛怎么办”分词后得到的分词结果可以为:我/肚子痛/怎么办。Specifically, the question to be answered refers to a question entered by the questioning user at the questioning terminal. When the inquisition user enters an inquisition question in the inquisition terminal, the server will receive the inquisition question sent by the inquisition terminal, segment the question, and obtain the segmentation result. The segmentation result refers to the one obtained after the segmentation. Sequence of words. For example, the segmentation result obtained after the segmentation of "What should I do for my stomachache" can be: I / Stomachache / What to do.
To segment the current question, the server may first split it into complete sentences at punctuation marks and then segment each sentence. One option is string-matching segmentation: forward maximum matching scans the string of a sentence from left to right; reverse maximum matching scans it from right to left; shortest-path segmentation chooses the split that yields the fewest words; and bidirectional maximum matching runs forward and reverse matching together. A second option is semantic segmentation, a machine-judgment approach that uses syntactic and semantic information to resolve ambiguity. A third option is statistical segmentation: based on phrase statistics over the current user's search history or the search history of users in general, two adjacent characters that frequently occur together are treated as one word.
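Forward maximum matching, one of the string-matching options named above, can be sketched as follows. This is a minimal illustration rather than the implementation used in the application: the function name `forward_max_match`, the `max_len` window, and the tiny `LEXICON` are assumptions made for the example.

```python
def forward_max_match(sentence, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position take the
    longest substring (up to max_len characters) found in the lexicon."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink it until it is a
        # lexicon word or only a single character is left.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical lexicon, just large enough for the example sentence.
LEXICON = {"我", "肚子", "肚子痛", "怎么办"}
print(forward_max_match("我肚子痛怎么办", LEXICON))
```

With this lexicon the sentence from step S202 is split into 我 / 肚子痛 / 怎么办. Reverse maximum matching follows the same idea but scans from the end of the string.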
Next, the server extracts feature words from the segmentation result. In some embodiments, each word in the segmentation result is matched one by one against the words in a pre-built feature lexicon, and the words that match are taken as feature words. In some embodiments, a match means the two words are identical. In other embodiments, a match means the similarity between the two words exceeds a preset threshold; for example, "肚子痛" and "肚子疼" (two ways of writing "stomachache") can be treated as matching. The feature lexicon may be built from authoritative descriptions of diseases obtained from existing medical databases, including professional information such as the overview, symptoms, complications, therapeutic drugs, and common examinations for each disease, or from medical information about drugs, such as the types of disease a drug treats. The medical data may also be specific types of information (for example, the treatment plans, therapeutic drugs, departments, and clinical manifestations for different diseases) collected in real time or on a schedule by tools such as web crawlers from open medical data sources on the Internet (for example, questions, answers, and discussions about diseases on major forums, or new medical cases and medical question-and-answer texts).
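The two matching modes (identical words, or similarity above a preset threshold) can be sketched as below. The word-to-word similarity measure is not specified in the text; this sketch assumes the ratio from Python's `difflib.SequenceMatcher` as a stand-in, and the function name, the lexicon contents, and the threshold value 0.6 are all illustrative.

```python
from difflib import SequenceMatcher


def extract_feature_words(tokens, feature_lexicon, threshold=1.0):
    """Return the tokens that match some entry of the feature lexicon.

    With threshold=1.0 only identical words match; a lower threshold
    also accepts near matches (the similarity measure is an assumption).
    """
    features = []
    for token in tokens:
        for entry in feature_lexicon:
            similarity = SequenceMatcher(None, token, entry).ratio()
            if token == entry or similarity >= threshold:
                features.append(token)
                break  # one matching lexicon entry is enough
    return features


tokens = ["我", "肚子疼", "怎么办"]
lexicon = {"肚子痛", "咳嗽"}  # hypothetical feature lexicon
print(extract_feature_words(tokens, lexicon))                 # exact matching
print(extract_feature_words(tokens, lexicon, threshold=0.6))  # fuzzy matching
```

Under exact matching "肚子疼" is not found in this lexicon, while the fuzzy mode accepts it because it shares two of three characters with "肚子痛".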
Step S204: obtain the second feature word set corresponding to each index node in the pre-built index.
Specifically, question-answer pairs are extracted from historical consultation data in advance, and features are then extracted from them. The extracted features include at least the feature words of the question in each question-answer pair, and these feature words make up the second feature word set. Each question-answer pair and its features are saved in the same row of a table in the consultation database, and an index is then built on the column that holds the features. Each index node contains an index value and a pointer: the index value includes at least the second feature word set of the question-answer pair, and the pointer refers to a memory area that records a reference to the corresponding row stored on disk. A question-answer pair is the information pair formed by a consulting user's question and the doctor's reply. It may consist of one question and one answer, one question and several answers, several consecutive questions and one answer, or several consecutive questions and several consecutive answers.
In this embodiment, the server traverses the index nodes in turn and reads the index value of each node to obtain the second feature word set corresponding to each index node.
Step S206: compute the first similarity between the first feature word set of the current question and the second feature word set of each index node, and sort the index nodes by the result to select a preset number of them as target index nodes, giving the target index node set.
Specifically, the first similarity characterizes how similar the first feature word set is to a second feature word set. In some embodiments, the first similarity is the cosine similarity. To compute it for the first feature word set and the second feature word set of any index node, keywords are extracted from each set to give a first keyword set for the question and a second keyword set for the index node; a word frequency vector is computed for each keyword set; and the cosine of the angle between the two vectors is the cosine similarity.
Next, the server sorts the index nodes by cosine similarity and selects a preset number of them as target index nodes, giving the target index node set. In some embodiments, the server sorts the index nodes in descending order of cosine similarity and takes the top N1 nodes, where N1 is a preset value that can be set and tuned empirically.
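The descending sort and top-N1 selection can be sketched as follows, assuming the first similarities have already been computed and are held in a dictionary keyed by a hypothetical node identifier. The function name and the sample scores are illustrative.

```python
import heapq


def select_target_nodes(similarities, n1):
    """Return the identifiers of the n1 index nodes with the highest
    first similarity, in descending order of similarity."""
    return heapq.nlargest(n1, similarities, key=similarities.get)


# Hypothetical first-similarity scores for four index nodes.
scores = {"node_a": 0.20, "node_b": 0.90, "node_c": 0.50, "node_d": 0.05}
print(select_target_nodes(scores, 2))
```

`heapq.nlargest` avoids sorting the whole index when N1 is small compared with the number of index nodes; a full `sorted(..., reverse=True)[:n1]` gives the same result.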
Step S208: obtain from the consultation database the question-answer pair corresponding to each target index node in the target index node set.
Specifically, each index node stores a pointer to the corresponding row of a table in the consultation database. The row can be read through this pointer, and the question-answer pair is the data in one column of that row, so the question-answer pair can be obtained through the index node.
Step S210: compute the second similarity between the current question and the question in each question-answer pair, sort the question-answer pairs by the result to select target question-answer pairs, and recommend consultation data based on the selected pairs.
Specifically, the second similarity characterizes how similar the current question is to the question in each question-answer pair. In some embodiments, the second similarity is a string similarity, computed as follows. After obtaining the question-answer pair for each target index node, the server first computes the edit distance between the current question and the question in each pair. The edit distance is the minimum number of single-character edits (substitutions, insertions, or deletions) needed to change one string into the other. The server then computes the string similarity from the edit distance: Similarity = (Max(x, y) - Levenshtein) / Max(x, y), where x is the length of the current question string, y is the length of the question string in the question-answer pair, and Levenshtein is the edit distance.
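The string similarity formula can be sketched with a standard dynamic-programming edit distance. The function names are illustrative, and returning 1.0 for two empty strings is an assumption added to avoid division by zero; the text does not cover that case.

```python
def levenshtein(a, b):
    """Minimum number of single-character substitutions, insertions,
    or deletions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def string_similarity(question, candidate):
    """Similarity = (Max(x, y) - Levenshtein) / Max(x, y)."""
    longest = max(len(question), len(candidate))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (longest - levenshtein(question, candidate)) / longest


print(string_similarity("肚子痛怎么办", "肚子疼怎么办"))
```

The two sample questions differ in one of six characters, so the edit distance is 1 and the similarity is 5/6.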
Next, the server sorts the question-answer pairs obtained in step S208 by string similarity and selects a preset number of them as target question-answer pairs, on which the recommendation is based. In some embodiments, the server sorts the pairs in descending order of string similarity and takes the top N2 pairs, where N2 is a preset value that can be tuned empirically.
In some embodiments, the server may recommend all target question-answer pairs to the doctor terminal, any one of them, or the top-ranked one. This application does not limit how the recommendation is made.
In other embodiments, the server may recommend only the answers from the target question-answer pairs: the answers of all target pairs, the answer of any one pair, or the answer of the top-ranked pair. This application does not limit how the recommendation is made.
In the method above, the server first obtains the feature word set of the question to be answered, then computes the first similarity between it and the feature word set of each index node and selects the most similar nodes as target nodes. It then looks up the question-answer pairs for those nodes, computes the second similarity between the question to be answered and the question in each pair, and selects the pairs with the highest string similarity as target question-answer pairs, on which the recommendation is based. By ranking twice, the method locates the question-answer pairs most similar to the question to be answered and recommends from them, so that answers are suggested to the doctor automatically during a consultation, which improves consultation efficiency.
In some embodiments, as shown in FIG. 3, the method includes the following before step S202:
Step S302: obtain the consultation information sets of previous consultations and preprocess them.
Specifically, previous consultations are the consultations completed before the current time. A consultation information set is the set of messages in one complete consultation, made up of the consulting user's messages and the doctor user's replies.
In this embodiment, preprocessing includes sentence splitting, coreference resolution, and context processing. Sentence splitting divides a message into individual sentences. Coreference resolution determines what the pronouns in a sentence refer to, and can be computed using syntactic analysis and edit distance. Context processing completes an utterance from its context, using syntactic analysis and sentence-pattern recognition. For example, in the exchange "D: Are you dizzy? U: Yes.", the reply "Yes" is expanded to "I am dizzy" so that the second sentence is complete on its own.
Step S304: extract question-answer pairs from the preprocessed consultation information set, and extract features from the extracted pairs.
Specifically, during one complete consultation the consulting user usually asks several questions, and the doctor replies after each one. Each question together with the doctor's reply to it forms a question-answer pair. Extracting question-answer pairs means pulling these pairs out of the messages of a complete consultation.
Next, the server extracts features from the extracted question-answer pairs. In some embodiments, feature extraction means extracting keywords from the question in each pair. In other embodiments, the extracted features may include, for example, the number of sentences in the pair, the number of adjectives, and the interrogative words.
Step S306: store each question-answer pair together with its features in the consultation database.
Specifically, the server stores a question-answer pair and its features as different columns of the same row of a table in the consultation database.
In some embodiments, the consulting user and the doctor communicate by instant messages that carry the sender's user identifier: messages sent from the consultation terminal carry the consulting user identifier, and messages sent from the doctor terminal carry the doctor user identifier. When the server obtains the messages of previous consultations, it can therefore also obtain the user identifier of each message, and it stores the user identifiers of a question-answer pair in the consultation database alongside the pair and its features.
Step S308: build an index on the consultation database according to the features.
Specifically, the server builds the index on the column of the consultation database that holds the features. Each node in the index corresponds to one row of the consultation database, which includes at least a question-answer pair and its features.
In some embodiments, the server may also build the index on both the user identifier and the features.
In this embodiment, because features are extracted from the consultation information and indexed, computing the similarity between the question to be answered and the stored question-answer pairs does not require scanning the whole database; the computation uses only the question to be answered and the index values, which improves computational efficiency.
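The relationship between table rows and index nodes can be sketched in memory as below. This is an illustration under simplifying assumptions, not the database mechanism itself: rows are plain dictionaries, the "pointer" is a list position, and the names `IndexNode`, `build_index`, and `fetch_qa_pair` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexNode:
    feature_words: frozenset  # index value: the second feature word set
    row_id: int               # stands in for the pointer to the table row


def build_index(rows):
    """Create one index node per row from the row's feature column."""
    return [IndexNode(frozenset(row["features"]), row_id)
            for row_id, row in enumerate(rows)]


def fetch_qa_pair(rows, node):
    """Follow the node's pointer back to the row and read the pair."""
    row = rows[node.row_id]
    return row["question"], row["answer"]


# Hypothetical rows: pair and features are columns of the same row.
rows = [
    {"question": "我头晕怎么办?", "answer": "建议先测量血压。", "features": ["头晕"]},
    {"question": "咳嗽一周了为什么?", "answer": "可能是支气管炎。", "features": ["咳嗽"]},
]
index = build_index(rows)
print(index[1].feature_words, fetch_qa_pair(rows, index[1]))
```

Similarity scoring then only touches the small `feature_words` values; the full question and answer text is read through `row_id` for the selected nodes alone.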
In some embodiments, as shown in FIG. 4, extracting question-answer pairs from the preprocessed consultation information includes:
Step S304A: obtain the user identifier of each message in the consultation information set; the user identifier is either a consulting user identifier or a doctor user identifier.
Specifically, every message in the consultation information corresponds to a user identifier: a message sent from the consultation terminal has a consulting user identifier, and a message sent from the doctor terminal has a doctor user identifier.
Step S306B: filter the messages that correspond to the doctor user identifier according to preset rules.
Specifically, the preset rules at least filter out messages that end with an interrogative word and messages that match a preset stock phrase. Interrogative words include, for example, "怎么办" ("what should I do"), "怎么回事" ("what is going on"), and "为什么" ("why"). Stock phrases are sentences configured in advance on the doctor terminal to save reply time, for example "请您稍等" ("Please wait a moment") and "您好,我目前不在班" ("Hello, I am currently off duty").
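The two filtering rules can be sketched as follows. The message representation as `(role, text)` tuples, the function names, and the decision to ignore trailing punctuation before testing the ending are assumptions for the example; the word lists reuse the samples given in the text.

```python
INTERROGATIVE_WORDS = ("怎么办", "怎么回事", "为什么")
STOCK_PHRASES = {"请您稍等", "您好,我目前不在班"}
TRAILING_PUNCTUATION = "?？。.!！"


def keep_doctor_message(text):
    """Apply the preset rules to one doctor message."""
    stripped = text.strip().rstrip(TRAILING_PUNCTUATION)
    if stripped in STOCK_PHRASES:
        return False  # canned reply carries no medical content
    if stripped.endswith(INTERROGATIVE_WORDS):
        return False  # the doctor is asking, not answering
    return True


def filter_messages(messages):
    """Drop doctor messages that fail the rules; user messages pass."""
    return [(role, text) for role, text in messages
            if role != "doctor" or keep_doctor_message(text)]


dialogue = [
    ("user", "我头晕怎么办?"),
    ("doctor", "请您稍等"),
    ("doctor", "这是怎么回事?"),
    ("doctor", "建议多休息。"),
]
print(filter_messages(dialogue))
```

Only the user's question and the doctor's substantive reply remain after filtering.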
Step S308C: extract question-answer pairs from the filtered consultation information set according to punctuation marks and interrogative words.
Specifically, the server traverses the filtered messages starting from the first one and obtains the user identifier of each message in turn. When the identifier is a consulting user identifier, the server checks whether the message contains a question. If it does, the question becomes a question in a question-answer pair; starting from the first doctor message that follows it, the server collects all consecutive doctor messages until the next consulting user message appears, and these doctor messages form the answer that is paired with the question. An extracted question-answer pair may contain one question and one answer, one question and several consecutive answers, several consecutive questions and one answer, or several consecutive questions and several consecutive answers, depending on the consultation; this application imposes no limitation here.
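The traversal described above can be sketched as below for the one-question case. The `(role, text)` message tuples, the function names, and the simple question test (a trailing question mark or one of the sample interrogative words) are assumptions; merging several consecutive questions into one pair is not handled here.

```python
QUESTION_MARKS = ("?", "？")
INTERROGATIVE_WORDS = ("怎么办", "怎么回事", "为什么")


def is_question(text):
    """Detect a question by punctuation or interrogative words."""
    return (text.rstrip().endswith(QUESTION_MARKS)
            or any(word in text for word in INTERROGATIVE_WORDS))


def extract_qa_pairs(messages):
    """Pair each user question with the run of consecutive doctor
    messages that starts at the first doctor message after it."""
    pairs = []
    for i, (role, text) in enumerate(messages):
        if role != "user" or not is_question(text):
            continue
        j = i + 1
        while j < len(messages) and messages[j][0] != "doctor":
            j += 1  # move to the first doctor message after the question
        answers = []
        while j < len(messages) and messages[j][0] == "doctor":
            answers.append(messages[j][1])
            j += 1  # stop at the next user message
        if answers:
            pairs.append((text, answers))
    return pairs


filtered = [
    ("user", "医生你好"),
    ("user", "我头晕怎么办?"),
    ("doctor", "建议先测量血压。"),
    ("doctor", "注意休息。"),
    ("user", "需要吃药吗?"),
    ("doctor", "暂时不需要。"),
]
print(extract_qa_pairs(filtered))
```

The greeting is skipped because it is not a question, the first question is paired with two consecutive replies, and the second with one.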
In some embodiments, extracting features from the extracted question-answer pairs includes: segmenting the question in each extracted pair to obtain the word set of the question; and matching each word in the word set against the words in a pre-built feature lexicon, taking a word as an extracted feature when the match succeeds.
Specifically, the server first segments the question in each extracted question-answer pair to obtain the word set of the question. The question may first be split into complete sentences at punctuation marks, and each sentence is then segmented using any of the approaches described for step S202: string-matching segmentation (forward maximum matching from left to right, reverse maximum matching from right to left, shortest-path segmentation that yields the fewest words, or bidirectional maximum matching that runs both directions together); semantic segmentation, which uses syntactic and semantic information to resolve ambiguity; or statistical segmentation, which treats two adjacent characters that frequently occur together in the search history of the current user or of users in general as one word.
Next, each word in the word set obtained by segmentation is matched one by one against the words in the pre-built feature lexicon, and the words that match are taken as feature words. In some embodiments, a match means the two words are identical. In other embodiments, a match means the similarity between the two words exceeds a preset threshold; for example, "肚子痛" and "肚子疼" (two ways of writing "stomachache") can be treated as matching. The feature lexicon may be built from authoritative descriptions of diseases obtained from existing medical databases, including professional information such as the overview, symptoms, complications, therapeutic drugs, and common examinations for each disease, or from medical information about drugs, such as the types of disease a drug treats. The medical data may also be specific types of information (for example, the treatment plans, therapeutic drugs, departments, and clinical manifestations for different diseases) collected in real time or on a schedule by tools such as web crawlers from open medical data sources on the Internet (for example, questions, answers, and discussions about diseases on major forums, or new medical cases and medical question-and-answer texts).
In some embodiments, as shown in FIG. 5, the step of computing the first similarity between the first feature word set of the current question and the second feature word set of each index node includes:
Step S502: compute a feature weight for each feature word in the first feature word set to obtain a first calculation result, and select keywords according to the first calculation result to obtain the first keyword set of the current question.
Specifically, a feature weight characterizes the importance of a feature: the larger the weight, the more important the feature word and the better it represents the meaning of the word set. In some embodiments, the feature weights are computed with the term frequency-inverse document frequency (TF-IDF) algorithm. In this embodiment, computing the feature weights gives the first calculation result, that is, the weight of each feature word in the first feature word set. The feature words are sorted by weight and keywords are selected from the sorted result, giving the first keyword set.
In some embodiments, the server sorts the feature words in the first feature word set in descending order of feature weight and selects a preset number of top-ranked feature words as keywords, giving the first keyword set.
Step S504: compute a feature weight for each feature word in the second feature word set to obtain a second calculation result, and select keywords according to the second calculation result to obtain the second keyword set of each index node.
Specifically, the term frequency-inverse document frequency algorithm may be used to compute the feature weight of each feature word in the second feature word set, giving the second calculation result, that is, the weight of each feature word in the second feature word set. The feature words are sorted by weight and keywords are selected from the sorted result, giving the second keyword set.
In some embodiments, the server sorts the feature words in the second feature word set in descending order of feature weight and selects a preset number of top-ranked feature words as keywords, giving the second keyword set.
Step S506: obtain the first word frequency vector of the current question and the second word frequency vector of each index node from the first keyword set and the second keyword set.
Specifically, the first keyword set and the second keyword set are merged into their union; the frequency of each keyword in the union is counted in the first feature word set and in the second feature word set; and the first and second word frequency vectors are generated from these frequencies. For example, suppose the first feature word set is cough / smoking / insomnia with keyword set {cough, smoking}, and the second feature word set is headache / cough / runny nose / temperature drop with keyword set {headache, runny nose}. Merging the two keyword sets gives {cough, smoking, headache, runny nose}. The frequencies of these words in the first feature word set are cough 1, smoking 1, headache 0, runny nose 0, and in the second feature word set they are cough 1, smoking 0, headache 1, runny nose 1. The first word frequency vector is therefore [1, 1, 0, 0] and the second word frequency vector is [1, 0, 1, 1].
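The construction of the two word frequency vectors can be sketched as follows, using English renderings of the worked example. The function name and the choice to keep the union in first-seen order (so that vector positions are reproducible) are assumptions for illustration.

```python
from collections import Counter


def frequency_vectors(first_features, second_features,
                      first_keywords, second_keywords):
    """Count each keyword of the merged keyword set in both feature
    word sets and return the vocabulary with the two vectors."""
    # Ordered union: first-set keywords, then new ones from the second.
    vocabulary = list(dict.fromkeys(list(first_keywords) + list(second_keywords)))
    first_counts = Counter(first_features)
    second_counts = Counter(second_features)
    first_vector = [first_counts[word] for word in vocabulary]
    second_vector = [second_counts[word] for word in vocabulary]
    return vocabulary, first_vector, second_vector


vocab, vec1, vec2 = frequency_vectors(
    ["cough", "smoking", "insomnia"],
    ["headache", "cough", "runny nose", "temperature drop"],
    ["cough", "smoking"],
    ["headache", "runny nose"],
)
print(vocab, vec1, vec2)
```

This reproduces the vectors [1, 1, 0, 0] and [1, 0, 1, 1] from the example: "cough" is counted in the second feature word set even though it is not one of that set's keywords.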
Step S508: compute the cosine of the angle between the first word frequency vector and each second word frequency vector to obtain the first similarity.
Specifically, the cosine similarity is calculated as:
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
where n (n ≥ 2) is the dimension of the word frequency vectors, A_i is the i-th component of the first word frequency vector, and B_i is the i-th component of the second word frequency vector.
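As a sketch of step S508, the cosine of the angle between two word frequency vectors can be computed as below. The function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions; the text does not say how that edge case is handled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length word frequency vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_a * norm_b)


# Vectors from the step S506 example: dot product 1, norms sqrt(2) and sqrt(3).
similarity = cosine_similarity([1, 1, 0, 0], [1, 0, 1, 1])
print(similarity)
```

For the example vectors the result is 1/√6, about 0.408, so the two feature word sets share only part of their keywords.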
In this embodiment, keywords are extracted from the feature word sets and converted into word frequency vectors, and the cosine similarity of the two feature word sets is calculated from those vectors. Compared with calculating the similarity between the two full documents (the question to be answered and the question-answer pair), this reduces the amount of computation and improves efficiency.
在一些实施例中,如图6所示,对第一特征词集合中的各个特征词计算特征权重得到第一计算结果,包括:In some embodiments, as shown in FIG. 6, calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes:
步骤S602,采用词频-逆文档频率算法计算第一特征词集合中各个特征词的初始特征权重。In step S602, an initial feature weight of each feature word in the first feature word set is calculated using a word frequency-inverse document frequency algorithm.
Specifically, the word frequency (TF) is calculated first, with reference to the following formula:
Word frequency TF = number of times a word appears in the document / total number of words in the document;
Then the inverse document frequency (IDF) is calculated, with reference to the following formula:
Inverse document frequency IDF = [Figure PCTCN2019071525-appb-000002]
Finally, the initial feature weight is calculated: W = TF * IDF.
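The weighting in step S602 can be sketched as follows. The TF formula and W = TF * IDF follow the text; the IDF formula itself is only available as a figure, so the sketch assumes one widely used form, log(total documents / (documents containing the word + 1)). The function names and the toy corpus are likewise assumptions.

```python
import math


def term_frequency(word, document):
    """TF = occurrences of the word in the document / total words in the document."""
    return document.count(word) / len(document)


def inverse_document_frequency(word, corpus):
    """Assumed IDF form: log(N / (number of documents containing the word + 1))."""
    containing = sum(1 for document in corpus if word in document)
    return math.log(len(corpus) / (containing + 1))


def initial_weight(word, document, corpus):
    """Initial feature weight W = TF * IDF."""
    return term_frequency(word, document) * inverse_document_frequency(word, corpus)


# Toy corpus of already-segmented documents (illustrative data only).
corpus = [
    ["cough", "smoking", "insomnia"],
    ["headache", "cough", "runny nose", "cooling"],
    ["headache", "fever"],
    ["insomnia", "fatigue"],
]
weight = initial_weight("smoking", corpus[0], corpus)
print(weight)
```

With this assumed form, a word that occurs in most documents receives a weight near zero or below, which is why some variants add further smoothing.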
步骤S604,依次判断第一特征词集合中各个特征词是否满足预设的调整规则,若是,则进入步骤S606;若否,则进入步骤S608。In step S604, it is sequentially judged whether each feature word in the first feature word set satisfies a preset adjustment rule. If so, the process proceeds to step S606; if not, the process proceeds to step S608.
步骤S606,根据调整规则对特征词的初始权重进行调整,得到最终的特征权重。Step S606: Adjust the initial weight of the feature words according to the adjustment rule to obtain the final feature weight.
步骤S608,将初始特征权重作为最终的特征权重。In step S608, the initial feature weight is used as the final feature weight.
Specifically, the preset adjustment rule is a manually defined rule for adjusting the feature weights of feature words. In some embodiments, the rule may be: when two feature words appear together and the difference between their feature weights is less than a preset threshold, the weight of one of the words is adjusted so that the difference is not less than the preset threshold. For example, when "headache" and "hand pain" both appear as feature words and the difference between their feature weights is less than 0.2, the feature weight of "headache" is adjusted so that the difference between the two weights is greater than 0.2. The purpose is to increase the weight of the feature word that has the greater bearing on the symptoms, thereby improving the accuracy of keyword selection.
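One possible reading of the adjustment rule in steps S604 to S608 is sketched below. The function name `apply_gap_rule`, its parameters, and the choice to raise the preferred word to exactly the other word's weight plus the threshold are assumptions; the text only requires that the resulting gap is not smaller than the threshold.

```python
def apply_gap_rule(weights, preferred, other, threshold=0.2):
    """Widen the gap between two co-occurring feature words if it is too small.

    If both words are present and their weights differ by less than
    `threshold`, the weight of `preferred` is raised to `other` + threshold
    (assumed adjustment). Otherwise the initial weights are kept as final.
    """
    adjusted = dict(weights)  # do not modify the caller's initial weights
    if preferred in adjusted and other in adjusted:
        if abs(adjusted[preferred] - adjusted[other]) < threshold:
            adjusted[preferred] = adjusted[other] + threshold
    return adjusted


initial = {"headache": 0.30, "hand pain": 0.25, "cough": 0.10}
final = apply_gap_rule(initial, preferred="headache", other="hand pain")
print(final)
```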
在本实施例中,通过对特征权重进行调整,可以提高关键词选取的准确性。In this embodiment, the accuracy of keyword selection can be improved by adjusting feature weights.
It should be understood that although the steps in the flowcharts of FIGS. 2-6 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-6 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time but may be performed at different times, and they are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
在一些实施例中,如图7所示,提供了一种问诊数据推荐装置700,包括:In some embodiments, as shown in FIG. 7, a consultation data recommendation device 700 is provided, including:
第一特征词集合获取模块702,用于获取当前待回答问题,对当前待回答问题进行分词,根据分词结果提取特征词,得到当前待回答问题对应的第一特征词集合;A first feature word set acquisition module 702, configured to obtain a current question to be answered, segment the current question to be answered, extract feature words according to the result of the word segmentation, and obtain a first feature word set corresponding to the current question to be answered;
第二特征词集合获取模块704,用于获取预先建立的索引中各个索引节点对应的第二特征词集合;A second feature word set obtaining module 704, configured to obtain a second feature word set corresponding to each index node in a pre-established index;
A target index node set acquisition module 706, configured to calculate the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, sort the index nodes according to the first similarity calculation results, and select a preset number of index nodes as target index nodes to obtain a target index node set;
问答对获取模块708,用于从问诊数据库中获取目标索引节点集合中各个目标索引节点对应的问答对;A question-and-answer pair acquisition module 708 is configured to obtain a question-and-answer pair corresponding to each target index node in the target index node set from the consultation database;
A recommendation module 710, configured to calculate the second similarity between the current question to be answered and the question of each question-answer pair, sort the question-answer pairs according to the second similarity calculation results to select target question-answer pairs, and recommend consultation data according to the selected target question-answer pairs.
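The cooperation of modules 706, 708 and 710 amounts to a coarse ranking of index nodes followed by a fine ranking of question-answer pairs. The sketch below illustrates that flow under several assumptions: index nodes are modelled as frozensets of feature words, the consultation database as a dictionary from node to question-answer pairs, and both similarity functions are passed in as parameters. Jaccard similarity is used purely as a stand-in; it is not the similarity described in the text.

```python
def top_n(items, score, n):
    """Sort items by score in descending order and keep the first n."""
    return sorted(items, key=score, reverse=True)[:n]


def recommend(question, index_nodes, qa_database,
              first_similarity, second_similarity, node_count, pair_count):
    # Module 706: rank index nodes by the first similarity, keep a preset number.
    target_nodes = top_n(index_nodes,
                         lambda node: first_similarity(question, node), node_count)
    # Module 708: fetch the question-answer pairs stored under the target nodes.
    candidates = [pair for node in target_nodes for pair in qa_database.get(node, [])]
    # Module 710: rank the candidate pairs by the second similarity of their questions.
    return top_n(candidates,
                 lambda pair: second_similarity(question, pair[0]), pair_count)


def jaccard(a, b):
    """Stand-in similarity for the demo (assumption, not the patented measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


nodes = [frozenset({"cough", "smoking"}), frozenset({"headache", "fever"}),
         frozenset({"insomnia"})]
database = {
    nodes[0]: [(("cough", "smoking", "night"), "answer A"), (("cough",), "answer B")],
    nodes[1]: [(("headache", "fever"), "answer C")],
    nodes[2]: [(("insomnia",), "answer D")],
}
result = recommend(("cough", "smoking"), nodes, database,
                   jaccard, jaccard, node_count=1, pair_count=1)
print(result)
```

Ranking index nodes first keeps the more expensive pair-level comparison restricted to the pairs stored under a few candidate nodes.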
在一些实施例中,如图8所示,上述装置还包括:In some embodiments, as shown in FIG. 8, the foregoing apparatus further includes:
预处理模块802,用于获取历次问诊对应的问诊信息集合,对问诊信息集合进行预处理;A pre-processing module 802, configured to obtain a set of inquiry information corresponding to previous visits, and pre-process the set of inquiry information;
特征抽取模块804,用于对预处理后的问诊信息集合提取问答对,并对提取的问答对进行特征抽取;A feature extraction module 804, configured to extract question and answer pairs from the pre-processed questionnaire information set, and perform feature extraction on the extracted question and answer pairs;
存储模块806,用于将问答对及问答对对应的特征对应存储至问诊数据库;A storage module 806, configured to correspondingly store question and answer pairs and the characteristics corresponding to the question and answer pairs in the questionnaire database;
索引建立模块808,用于根据特征对问诊数据库建立索引。The index establishing module 808 is configured to index the consultation database according to characteristics.
In some embodiments, the feature extraction module 804 is further configured to: obtain the user identifier corresponding to each piece of consultation information in the consultation information set, the user identifier being either a consulting user identifier or a doctor user identifier; filter the consultation information corresponding to doctor user identifiers according to preset rules; and extract question-answer pairs from the filtered consultation information set according to punctuation marks and question words.
在一些实施例中,特征抽取模块804还用于对提取的问答对中的问题进行分词,得到问题对应的词语集合;将词语集合中各个词语分别与预先建立的特征词库中各个词语进行匹配,当匹配成功时,将词语作为提取的特征。In some embodiments, the feature extraction module 804 is further configured to perform segmentation on the questions in the extracted question and answer pairs to obtain a set of words corresponding to the questions; and match each word in the word set with each word in a pre-established feature word library When the match is successful, the word is used as the extracted feature.
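The lexicon matching performed by the feature extraction module can be sketched as follows. Word segmentation is assumed to have been done already by some segmenter, so the input is a list of words; the function name `extract_features` and the sample lexicon are assumptions for illustration.

```python
def extract_features(words, feature_lexicon):
    """Keep the words of a segmented question that appear in the feature lexicon."""
    return [word for word in words if word in feature_lexicon]


# Hypothetical pre-established feature word library and segmented question.
feature_lexicon = {"headache", "cough", "fever", "runny nose"}
segmented_question = ["I", "have", "headache", "and", "cough", "since", "yesterday"]
features = extract_features(segmented_question, feature_lexicon)
print(features)
```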
在一些实施例中,目标索引节点集合获取模块706还用于对第一特征词集合中的各个特征词计算特征权重得到第一计算结果,根据第一计算结果选取关键词,得到当前待回答问题对应的第一关键词集合;对第二特征词集合中各个特征词计算特征权重得到第二计算结果,根据第二计算结果选取关键词,得到各个索引节点对应的第二关键词集合;根据第一关键词集合和第二关键词集合得到当前待回答问题对应的第一词频向量以及各个索引节点对应的第二词频向量;分别计算各个第一词频向量与各个第二词频向量之间的夹角余弦值得到第一相似度。In some embodiments, the target index node set acquisition module 706 is further configured to calculate feature weights for each feature word in the first feature word set to obtain a first calculation result, select keywords based on the first calculation result, and obtain a current question to be answered. Corresponding first keyword set; calculating feature weights for each feature word in the second feature word set to obtain a second calculation result, selecting keywords based on the second calculation result, and obtaining a second keyword set corresponding to each index node; A keyword set and a second keyword set to obtain the first word frequency vector corresponding to the current question to be answered and the second word frequency vector corresponding to each index node; and calculate the angle between each first word frequency vector and each second word frequency vector The cosine value gives the first similarity.
在一些实施例中,目标索引节点集合获取模块706还用于采用词频-逆文档频率算法计算第一特征词集合中各个特征词的初始特征权重;当第一特征词集合中的任意一个特征词满足预设调整规则时,根据预设调整规则对特征词的初始特征权重进行调整,得到最终的特征权重;当第一特征词集合中的任意一个特征词不满足预设调整规则时,将初始特征权重作为最终的特征权重。In some embodiments, the target index node set acquisition module 706 is further configured to calculate an initial feature weight of each feature word in the first feature word set using a word frequency-inverse document frequency algorithm; when any feature word in the first feature word set When the preset adjustment rules are satisfied, the initial feature weights of the feature words are adjusted according to the preset adjustment rules to obtain the final feature weights; when any feature word in the first feature word set does not meet the preset adjustment rules, the initial The feature weight is used as the final feature weight.
For specific limitations of the consultation data recommendation device, refer to the limitations of the consultation data recommendation method above; they are not repeated here. Each module in the above consultation data recommendation device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In some embodiments, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile computer-readable storage medium and an internal memory. The non-volatile computer-readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile computer-readable storage medium. The database stores data such as question-answer pairs and the features corresponding to the question-answer pairs. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a consultation data recommendation method.
Those skilled in the art will understand that the structure shown in FIG. 8 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps: obtaining a current question to be answered, segmenting the current question to be answered into words, and extracting feature words from the segmentation result to obtain a first feature word set corresponding to the current question to be answered; obtaining a second feature word set corresponding to each index node in a pre-established index; calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation results to select a preset number of index nodes as target index nodes, thereby obtaining a target index node set; obtaining, from the consultation database, the question-answer pairs corresponding to each target index node in the target index node set; and calculating the second similarity between the current question to be answered and the question of each question-answer pair, sorting the question-answer pairs according to the second similarity calculation results to select target question-answer pairs, and recommending consultation data according to the selected target question-answer pairs.
In some embodiments, before the step of obtaining the current question to be answered, the processor, when executing the computer-readable instructions, further implements the following steps: obtaining the consultation information set corresponding to previous consultations and preprocessing it; extracting question-answer pairs from the preprocessed consultation information set and performing feature extraction on the extracted question-answer pairs; storing each question-answer pair together with its corresponding features in the consultation database; and indexing the consultation database according to the features.
在一些实施例中,对预处理后的问诊信息提取问答对,包括:获取问诊信息集合中每一条问诊信息对应的用户标识,用户标识为问诊用户标识或医生用户标识;对医生用户标识对应的问诊信息按照预设的规则进行过滤;对过滤后的问诊信息集合,根据标点符号和疑问词提取问答对。In some embodiments, extracting question-and-answer pairs from the pre-processed questionnaire information includes: obtaining a user ID corresponding to each piece of questionnaire information in the questionnaire information set, and the user ID is a questioning user ID or a doctor user ID; The questionnaire information corresponding to the user ID is filtered according to a preset rule; for the filtered questionnaire information set, question and answer pairs are extracted according to punctuation marks and question words.
在一些实施例中,对提取的问答对进行特征抽取,包括:对提取的问答对中的问题进行分词,得到问题对应的词语集合;将词语集合中各个词语分别与预先建立的特征词库中各个词语进行匹配,当匹配成功时,将词语作为提取的特征。In some embodiments, performing feature extraction on the extracted question-and-answer pairs includes: segmenting the questions in the extracted question-and-answer pairs to obtain a set of words corresponding to the questions; and separating each word in the word set from a pre-established feature word library. Each word is matched. When the match is successful, the word is used as the extracted feature.
在一些实施例中,分别计算当前待回答问题对应的第一特征词集合与各个索引节点对应的第二特征词集合之间的第一相似度的步骤,包括:对第一特征词集合中的各个特征词计算特征权重得到第一计算结果,根据第一计算结果选取关键词,得到当前待回答问题对应的第一关键词集合;对第二特征词集合中各个特征词计算特征权重得到第二计算结果,根据第二计算结果选取关键词,得到各个索引节点对应的第二关键词集合;根据第一关键词集合和第二关键词集合得到当前待回答问题对应的第一词频向量以及各个索引节点对应的第二词频向量;分别计算各个第一词频向量与各个第二词频向量之间的夹角余弦值得到第一相似度。In some embodiments, the steps of respectively calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node include: The feature weight is calculated for each feature word to obtain a first calculation result, and keywords are selected according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered; the feature weight is calculated for each feature word in the second feature word set to obtain a second Calculate the results, select keywords based on the second calculation results, and obtain a second keyword set corresponding to each index node; obtain the first word frequency vector and each index corresponding to the current question to be answered according to the first keyword set and the second keyword set The second word frequency vector corresponding to the node; the angle cosine between each first word frequency vector and each second word frequency vector is calculated to obtain the first similarity.
在一些实施例中,对第一特征词集合中的各个特征词计算特征权重得到第一计算结果,包括:采用词频-逆文档频率算法计算第一特征词集合中各个特征词的初始特征权重;当第一特征词集合中的任意一个特征词满足预设调整规则时,根据预设调整规则对特征词的初始特征权重进行调整,得到最终的特征权重;当第一特征词集合中的任意一个特征词不满足预设调整规则时,将初始特征权重作为最终的特征权重。In some embodiments, calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes: using a word frequency-inverse document frequency algorithm to calculate an initial feature weight for each feature word in the first feature word set; When any feature word in the first feature word set meets a preset adjustment rule, the initial feature weight of the feature word is adjusted according to the preset adjustment rule to obtain the final feature weight; when any one of the first feature word set When the feature words do not satisfy the preset adjustment rules, the initial feature weight is taken as the final feature weight.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: obtaining a current question to be answered, segmenting the current question to be answered into words, and extracting feature words from the segmentation result to obtain a first feature word set corresponding to the current question to be answered; obtaining a second feature word set corresponding to each index node in a pre-established index; calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation results to select a preset number of index nodes as target index nodes, thereby obtaining a target index node set; obtaining, from the consultation database, the question-answer pairs corresponding to each target index node in the target index node set; and calculating the second similarity between the current question to be answered and the question of each question-answer pair, sorting the question-answer pairs according to the second similarity calculation results to select target question-answer pairs, and recommending consultation data according to the selected target question-answer pairs.
在一些实施例中,获取当前待回答问题的步骤之前,计算机可读指令被处理器执行时还实现以下步骤:获取历次问诊对应的问诊信息集合,对问诊信息集合进行预处理;对预处理后的问诊信息集合提取问答对,并对提取的问答对进行特征抽取;将问答对及问答对对应的特征对应存储至问诊数据库;根据特征对问诊数据库建立索引。In some embodiments, before the step of obtaining the current question to be answered, when the computer-readable instructions are executed by the processor, the following steps are also implemented: obtaining the inquiry information set corresponding to previous visits, and pre-processing the inquiry information set; The pre-processed questionnaire information set extracts question-and-answer pairs and extracts features from the question-and-answer pairs; correspondingly stores the features of the question-and-answer pairs and question-and-answer pairs in the question-and-answer database; and indexes the question-and-answer database based on the features.
在一些实施例中,对预处理后的问诊信息提取问答对,包括:获取问诊信息集合中每一条问诊信息对应的用户标识,用户标识为问诊用户标识或医生用户标识;对医生用户标识对应的问诊信息按照预设的规则进行过滤;对过滤后的问诊信息集合,根据标点符号和疑问词提取问答对。In some embodiments, extracting question-and-answer pairs from the pre-processed questionnaire information includes: obtaining a user ID corresponding to each piece of questionnaire information in the questionnaire information set, and the user ID is a questioning user ID or a doctor user ID; The questionnaire information corresponding to the user ID is filtered according to a preset rule; for the filtered questionnaire information set, question and answer pairs are extracted according to punctuation marks and question words.
在一些实施例中,对提取的问答对进行特征抽取,包括:对提取的问答对中的问题进行分词,得到问题对应的词语集合;将词语集合中各个词语分别与预先建立的特征词库中各个词语进行匹配,当匹配成功时,将词语作为提取的特征。In some embodiments, performing feature extraction on the extracted question-and-answer pairs includes: segmenting the questions in the extracted question-and-answer pairs to obtain a set of words corresponding to the questions; and separating each word in the word set from a pre-established feature word library. Each word is matched. When the match is successful, the word is used as the extracted feature.
在一些实施例中,分别计算当前待回答问题对应的第一特征词集合与各个索引节点对应的第二特征词集合之间的第一相似度的步骤,包括:对第一特征词集合中的各个特征词计算特征权重得到第一计算结果,根据第一计算结果选取关键词,得到当前待回答问题对应的第一关键词集合;对第二特征词集合中各个特征词计算特征权重得到第二计算结果,根据第二计算结果选取关键词,得到各个索引节点对应的第二关键词集合;根据第一关键词集合和第二关键词集合得到当前待回答问题对应的第一词频向量以及各个索引节点对应的第二词频向量;分别计算各个第一词频向量与各个第二词频向量之间的夹角余弦值得到第一相似度。In some embodiments, the steps of respectively calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node include: The feature weight is calculated for each feature word to obtain a first calculation result, and keywords are selected according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered; the feature weight is calculated for each feature word in the second feature word set to obtain a second Calculate the results, select keywords based on the second calculation results, and obtain a second keyword set corresponding to each index node; obtain the first word frequency vector and each index corresponding to the current question to be answered according to the first keyword set and the second keyword set The second word frequency vector corresponding to the node; the angle cosine between each first word frequency vector and each second word frequency vector is calculated to obtain the first similarity.
在一些实施例中,对第一特征词集合中的各个特征词计算特征权重得到第一计算结果,包括:采用词频-逆文档频率算法计算第一特征词集合中各个特征词的初始特征权重;当第一特征词集合中的任意一个特征词满足预设调整规则时,根据预设调整规则对特征词的初始特征权重进行调整,得到最终的特征权重;当第一特征词集合中的任意一个特征词不满足预设调整规则时,将初始特征权重作为最终的特征权重。In some embodiments, calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes: using a word frequency-inverse document frequency algorithm to calculate an initial feature weight for each feature word in the first feature word set; When any feature word in the first feature word set meets a preset adjustment rule, the initial feature weight of the feature word is adjusted according to the preset adjustment rule to obtain the final feature weight; when any one of the first feature word set When the feature words do not satisfy the preset adjustment rules, the initial feature weight is taken as the final feature weight.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions that instruct the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only express several implementation manners of the present application, and the description thereof is more specific and detailed, but cannot be understood as a limitation on the scope of the invention patent. It should be noted that, for those of ordinary skill in the art, without departing from the concept of the present application, several modifications and improvements can be made, and these all belong to the protection scope of the present application. Therefore, the protection scope of this application patent shall be subject to the appended claims.

Claims (20)

  1. 一种问诊数据推荐方法,包括:A method for recommending consultation data includes:
    获取当前待回答问题,对所述当前待回答问题进行分词,根据分词结果提取特征词,得到所述当前待回答问题对应的第一特征词集合;Acquiring the current question to be answered, segmenting the current question to be answered, and extracting feature words according to the result of the word segmentation to obtain a first feature word set corresponding to the current question to be answered;
    获取预先建立的索引中各个索引节点对应的第二特征词集合;Obtaining a second feature word set corresponding to each index node in a pre-established index;
    calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation results to select a preset number of index nodes as target index nodes, thereby obtaining a target index node set;
    从问诊数据库中获取目标索引节点集合中各个目标索引节点对应的问答对;及Obtaining question and answer pairs corresponding to each target index node in the target index node set from the consultation database; and
    calculating the second similarity between the current question to be answered and the question of each question-answer pair, sorting the question-answer pairs according to the second similarity calculation results to select target question-answer pairs, and recommending consultation data according to the selected target question-answer pairs.
  2. 根据权利要求1所述的方法,其特征在于,在所述获取当前待回答问题的步骤之前,所述方法还包括:The method according to claim 1, wherein before the step of obtaining a current question to be answered, the method further comprises:
    获取历次问诊对应的问诊信息集合,对所述问诊信息集合进行预处理;Obtaining the inquiry information set corresponding to previous consultations, and preprocessing the inquiry information set;
    对预处理后的问诊信息集合提取问答对,并对提取的所述问答对进行特征抽取;Extracting question-answer pairs from the pre-processed questionnaire information set, and performing feature extraction on the extracted question-answer pairs;
    将所述问答对及所述问答对对应的所述特征对应存储至问诊数据库;及Correspondingly storing the question-answer pairs and the features corresponding to the question-answer pairs in an inquiry database; and
    根据所述特征对所述问诊数据库建立索引。Indexing the consultation database according to the characteristics.
  3. 根据权利要求2所述的方法,其特征在于,所述对预处理后的问诊信息提取问答对,包括:The method according to claim 2, wherein the extracting the question-and-answer pairs from the pre-processed questionnaire information comprises:
    获取所述问诊信息集合中每一条问诊信息对应的用户标识,所述用户标识为问诊用户标识或医生用户标识;Obtaining a user ID corresponding to each piece of the inquiry information in the inquiry information set, where the user identifier is an inquiry user ID or a doctor user ID;
    对医生用户标识对应的问诊信息按照预设的规则进行过滤;及Filtering the consultation information corresponding to the doctor's user ID according to preset rules; and
    对过滤后的问诊信息集合,根据标点符号和疑问词提取问答对。For the filtered questionnaire information set, question and answer pairs are extracted based on punctuation marks and question words.
  4. 根据权利要求2或3所述的方法,其特征在于,所述对提取的所述问答对进行特征抽取,包括:The method according to claim 2 or 3, wherein performing feature extraction on the extracted question-answer pairs comprises:
    对提取的所述问答对中的问题进行分词,得到所述问题对应的词语集合;及Perform word segmentation on the extracted questions in the question and answer pair to obtain a set of words corresponding to the questions; and
    将所述词语集合中各个词语分别与预先建立的特征词库中各个词语进行匹配,当匹配成功时,将所述词语作为提取的特征。Each word in the word set is matched with each word in a pre-established feature word library, and when the matching is successful, the word is used as the extracted feature.
  5. The method according to claim 1, wherein the step of separately calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node comprises:
    calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered;
    calculating a feature weight for each feature word in the second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node;
    obtaining, according to the first keyword set and the second keyword set, a first term-frequency vector corresponding to the current question to be answered and a second term-frequency vector corresponding to each index node; and
    separately calculating the cosine of the angle between each first term-frequency vector and each second term-frequency vector to obtain the first similarity.
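The last two steps of claim 5 can be sketched as below: both keyword sets are projected onto a shared vocabulary to give term-frequency vectors, and the first similarity is the cosine of the angle between them. Building the vocabulary as the sorted union of the two keyword sets is an assumption; the function names are hypothetical.

```python
import math
from collections import Counter


def term_frequency_vectors(query_keywords, node_keywords):
    """Build two term-frequency vectors over the union of both keyword lists."""
    vocabulary = sorted(set(query_keywords) | set(node_keywords))
    query_counts = Counter(query_keywords)
    node_counts = Counter(node_keywords)
    query_vector = [query_counts[word] for word in vocabulary]
    node_vector = [node_counts[word] for word in vocabulary]
    return query_vector, node_vector


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


query_vector, node_vector = term_frequency_vectors(
    ["fever", "cough"], ["fever", "headache"]
)
similarity = cosine_similarity(query_vector, node_vector)
```

In this made-up example the two keyword sets share one of three vocabulary words, so the vectors are [1, 1, 0] and [0, 1, 1] and the cosine is 1/2.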
  6. The method according to claim 5, wherein calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result comprises:
    calculating an initial feature weight for each feature word in the first feature word set using the term frequency-inverse document frequency (TF-IDF) algorithm;
    when any feature word in the first feature word set satisfies a preset adjustment rule, adjusting the initial feature weight of that feature word according to the preset adjustment rule to obtain a final feature weight; and
    when any feature word in the first feature word set does not satisfy the preset adjustment rule, taking the initial feature weight as the final feature weight.
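A minimal sketch of the weighting in claim 6 follows. It uses a smoothed TF-IDF variant, and it assumes one possible adjustment rule: multiplying the weight of words in a designated set by a fixed factor. The claim does not say what the preset adjustment rule is, so `boosted_words`, `factor`, and the sample corpus are purely illustrative.

```python
import math


def tf_idf_weights(feature_words, documents):
    """Initial weights: term frequency in feature_words times a smoothed
    inverse document frequency over documents (a list of word lists)."""
    n_docs = len(documents)
    weights = {}
    for word in set(feature_words):
        tf = feature_words.count(word) / len(feature_words)
        df = sum(1 for doc in documents if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF variant
        weights[word] = tf * idf
    return weights


def adjust_weights(weights, boosted_words, factor=2.0):
    """Hypothetical adjustment rule: scale the weight of boosted words and
    keep the initial weight for every other word."""
    return {
        word: weight * factor if word in boosted_words else weight
        for word, weight in weights.items()
    }


# Made-up corpus of feature-word lists, for illustration only.
corpus = [["fever"], ["fever", "cough"], ["headache"]]
initial = tf_idf_weights(["fever", "cough"], corpus)
final = adjust_weights(initial, boosted_words={"fever"})
```

Here "cough" starts with the higher weight because it is rarer in the corpus, while the assumed rule lifts "fever" above it; keywords would then be selected from the final weights.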
  7. A consultation data recommendation device, comprising:
    a first feature word set acquisition module, configured to obtain a current question to be answered, perform word segmentation on the current question to be answered, and extract feature words according to the word segmentation result to obtain a first feature word set corresponding to the current question to be answered;
    a second feature word set acquisition module, configured to obtain a second feature word set corresponding to each index node in a pre-established index;
    a target index node set acquisition module, configured to separately calculate a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and to rank the index nodes according to the first similarity calculation result so as to select a preset number of index nodes as target index nodes, thereby obtaining a target index node set;
    a question-answer pair acquisition module, configured to obtain, from a consultation database, the question-answer pairs corresponding to each target index node in the target index node set; and
    a recommendation module, configured to separately calculate a second similarity between the current question to be answered and the question of each question-answer pair, rank the question-answer pairs according to the second similarity calculation result so as to select target question-answer pairs, and recommend consultation data according to the selected target question-answer pairs.
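The two-stage flow performed by these modules can be sketched end to end. The example assumes an index keyed by frozensets of feature words, a cosine measure over word counts for both similarities, and made-up data; the claim does not fix how the second similarity is computed, and every name here (`recommend`, `top_nodes`, `top_pairs`) is hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Stand-in for word segmentation: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(words_a, words_b):
    """Cosine similarity between the word-count vectors of two word lists."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(question, lexicon, index, database, top_nodes=2, top_pairs=1):
    # First feature word set: lexicon words found in the current question.
    query_features = [w for w in tokens(question) if w in lexicon]
    # Stage 1: rank index nodes by first similarity, keep a preset number.
    ranked_nodes = sorted(
        index, key=lambda node: cosine(query_features, list(node)), reverse=True
    )
    target_nodes = ranked_nodes[:top_nodes]
    # Fetch the question-answer pairs attached to the target nodes.
    candidates = [database[i] for node in target_nodes for i in index[node]]
    # Stage 2: rank candidate pairs by second similarity on the full question.
    ranked_pairs = sorted(
        candidates,
        key=lambda qa: cosine(tokens(question), tokens(qa[0])),
        reverse=True,
    )
    return ranked_pairs[:top_pairs]


# Made-up data, for illustration only.
lexicon = {"fever", "cough", "headache"}
database = {
    0: ("How long have you had the fever?", "Two days."),
    1: ("Is the cough dry or productive?", "Dry."),
    2: ("Do you have a headache with the fever?", "Yes."),
}
index = {
    frozenset({"fever"}): [0],
    frozenset({"cough"}): [1],
    frozenset({"fever", "headache"}): [2],
}
result = recommend(
    "I have had a fever and a headache since yesterday", lexicon, index, database
)
```

The coarse first pass compares only short feature sets against index nodes, so the more expensive question-level comparison runs on a small candidate list rather than on the whole database.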
  8. The device according to claim 7, further comprising:
    a preprocessing module, configured to obtain a consultation information set corresponding to previous consultations, and preprocess the consultation information set;
    a feature extraction module, configured to extract question-answer pairs from the preprocessed consultation information set, and perform feature extraction on the extracted question-answer pairs;
    a storage module, configured to store the question-answer pairs in association with their corresponding features in a consultation database; and
    an index building module, configured to build an index on the consultation database according to the features.
  9. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining a current question to be answered, performing word segmentation on the current question to be answered, and extracting feature words according to the word segmentation result to obtain a first feature word set corresponding to the current question to be answered;
    obtaining a second feature word set corresponding to each index node in a pre-established index;
    separately calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and ranking the index nodes according to the first similarity calculation result so as to select a preset number of index nodes as target index nodes, thereby obtaining a target index node set;
    obtaining, from a consultation database, the question-answer pairs corresponding to each target index node in the target index node set; and
    separately calculating a second similarity between the current question to be answered and the question of each question-answer pair, ranking the question-answer pairs according to the second similarity calculation result so as to select target question-answer pairs, and recommending consultation data according to the selected target question-answer pairs.
  10. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining a consultation information set corresponding to previous consultations, and preprocessing the consultation information set;
    extracting question-answer pairs from the preprocessed consultation information set, and performing feature extraction on the extracted question-answer pairs;
    storing the question-answer pairs in association with their corresponding features in a consultation database; and
    building an index on the consultation database according to the features.
  11. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining a user identifier corresponding to each piece of consultation information in the consultation information set, the user identifier being a consulting-user identifier or a doctor-user identifier;
    filtering the consultation information corresponding to the doctor-user identifier according to preset rules; and
    extracting question-answer pairs from the filtered consultation information set according to punctuation marks and interrogative words.
  12. The computer device according to claim 10 or 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    performing word segmentation on the question in each extracted question-answer pair to obtain a word set corresponding to the question; and
    matching each word in the word set against the words in a pre-established feature lexicon, and, when a match succeeds, taking the word as an extracted feature.
  13. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered;
    calculating a feature weight for each feature word in the second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node;
    obtaining, according to the first keyword set and the second keyword set, a first term-frequency vector corresponding to the current question to be answered and a second term-frequency vector corresponding to each index node; and
    separately calculating the cosine of the angle between each first term-frequency vector and each second term-frequency vector to obtain the first similarity.
  14. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    calculating an initial feature weight for each feature word in the first feature word set using the term frequency-inverse document frequency (TF-IDF) algorithm;
    when any feature word in the first feature word set satisfies a preset adjustment rule, adjusting the initial feature weight of that feature word according to the preset adjustment rule to obtain a final feature weight; and
    when any feature word in the first feature word set does not satisfy the preset adjustment rule, taking the initial feature weight as the final feature weight.
  15. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a current question to be answered, performing word segmentation on the current question to be answered, and extracting feature words according to the word segmentation result to obtain a first feature word set corresponding to the current question to be answered;
    obtaining a second feature word set corresponding to each index node in a pre-established index;
    separately calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and ranking the index nodes according to the first similarity calculation result so as to select a preset number of index nodes as target index nodes, thereby obtaining a target index node set;
    obtaining, from a consultation database, the question-answer pairs corresponding to each target index node in the target index node set; and
    separately calculating a second similarity between the current question to be answered and the question of each question-answer pair, ranking the question-answer pairs according to the second similarity calculation result so as to select target question-answer pairs, and recommending consultation data according to the selected target question-answer pairs.
  16. The storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining a consultation information set corresponding to previous consultations, and preprocessing the consultation information set;
    extracting question-answer pairs from the preprocessed consultation information set, and performing feature extraction on the extracted question-answer pairs;
    storing the question-answer pairs in association with their corresponding features in a consultation database; and
    building an index on the consultation database according to the features.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining a user identifier corresponding to each piece of consultation information in the consultation information set, the user identifier being a consulting-user identifier or a doctor-user identifier;
    filtering the consultation information corresponding to the doctor-user identifier according to preset rules; and
    extracting question-answer pairs from the filtered consultation information set according to punctuation marks and interrogative words.
  18. The storage medium according to claim 16 or 17, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    performing word segmentation on the question in each extracted question-answer pair to obtain a word set corresponding to the question; and
    matching each word in the word set against the words in a pre-established feature lexicon, and, when a match succeeds, taking the word as an extracted feature.
  19. The storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered;
    calculating a feature weight for each feature word in the second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node;
    obtaining, according to the first keyword set and the second keyword set, a first term-frequency vector corresponding to the current question to be answered and a second term-frequency vector corresponding to each index node; and
    separately calculating the cosine of the angle between each first term-frequency vector and each second term-frequency vector to obtain the first similarity.
  20. The storage medium according to claim 19, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    calculating an initial feature weight for each feature word in the first feature word set using the term frequency-inverse document frequency (TF-IDF) algorithm;
    when any feature word in the first feature word set satisfies a preset adjustment rule, adjusting the initial feature weight of that feature word according to the preset adjustment rule to obtain a final feature weight; and
    when any feature word in the first feature word set does not satisfy the preset adjustment rule, taking the initial feature weight as the final feature weight.
PCT/CN2019/071525 2018-07-04 2019-01-14 Medical consultation data recommendation method, device, computer apparatus, and storage medium WO2020007028A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810724291.7 2018-07-04
CN201810724291.7A CN109147934B (en) 2018-07-04 2018-07-04 Inquiry data recommendation method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020007028A1 (en) 2020-01-09

Family

ID=64799920

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/071525 WO2020007028A1 (en) 2018-07-04 2019-01-14 Medical consultation data recommendation method, device, computer apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN109147934B (en)
WO (1) WO2020007028A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147934B (en) * 2018-07-04 2023-04-11 平安科技(深圳)有限公司 Inquiry data recommendation method, device, computer equipment and storage medium
CN109783631B (en) * 2019-02-02 2022-05-17 北京百度网讯科技有限公司 Community question-answer data verification method and device, computer equipment and storage medium
CN111858863B (en) * 2019-04-29 2023-07-14 深圳市优必选科技有限公司 Reply recommendation method, reply recommendation device and electronic equipment
CN110321435B (en) * 2019-06-28 2020-09-29 京东数字科技控股有限公司 Data source dividing method, device, equipment and storage medium
CN110377719B (en) * 2019-07-25 2022-02-15 广东工业大学 Medical question and answer method and device
CN110473067B (en) * 2019-08-14 2020-09-04 杭州品茗安控信息技术股份有限公司 Method, device, equipment and storage medium for determining construction cost standard file of component
CN112559676B (en) * 2019-09-25 2022-05-17 北京新唐思创教育科技有限公司 Similar topic retrieval method and device and computer storage medium
CN111367971A (en) * 2020-03-30 2020-07-03 中国建设银行股份有限公司 Financial system abnormity auxiliary analysis method and device based on data mining
CN111553151A (en) * 2020-04-02 2020-08-18 深圳壹账通智能科技有限公司 Question recommendation method and device based on field similarity calculation and server
CN111476029A (en) * 2020-04-13 2020-07-31 武汉联影医疗科技有限公司 Resource recommendation method and device
CN113764111B (en) * 2020-09-29 2024-04-05 北京京东拓先科技有限公司 Method and device for determining message rounds
CN112397197A (en) * 2020-11-16 2021-02-23 康键信息技术(深圳)有限公司 Artificial intelligence-based inquiry data processing method and device
CN112541069A (en) * 2020-12-24 2021-03-23 山东山大鸥玛软件股份有限公司 Text matching method, system, terminal and storage medium combined with keywords
CN112818225A (en) * 2021-01-27 2021-05-18 上海明略人工智能(集团)有限公司 Display method and device of pushed data
CN112786176A (en) * 2021-02-22 2021-05-11 北京融威众邦电子技术有限公司 Intelligent self-service diagnosis method and device and computer equipment
CN112820364B (en) * 2021-02-22 2023-01-24 中国人民解放军联勤保障部队第九八〇医院 Oral cavity outpatient service electronic medical record system based on database framework
CN113203086A (en) * 2021-04-30 2021-08-03 江苏经贸职业技术学院 Lighting device with disinfection function for classroom
CN113658684A (en) * 2021-08-11 2021-11-16 挂号网(杭州)科技有限公司 Consultation result generation method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573028A (en) * 2015-01-14 2015-04-29 百度在线网络技术(北京)有限公司 Intelligent question-answer implementing method and system
CN107491655A (en) * 2017-08-31 2017-12-19 康安健康管理咨询(常熟)有限公司 Liver diseases information intelligent consultation method and system based on machine learning
CN108108449A (en) * 2017-12-27 2018-06-01 哈尔滨福满科技有限责任公司 A kind of implementation method based on multi-source heterogeneous data question answering system and the system towards medical field
CN109147934A (en) * 2018-07-04 2019-01-04 平安科技(深圳)有限公司 Interrogation data recommendation method, device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989040B (en) * 2015-02-03 2021-02-09 创新先进技术有限公司 Intelligent question and answer method, device and system
CN106503175B (en) * 2016-11-01 2019-03-29 上海智臻智能网络科技股份有限公司 Inquiry, problem extended method, device and the robot of Similar Text
WO2019084867A1 (en) * 2017-11-02 2019-05-09 深圳前海达闼云端智能科技有限公司 Automatic answering method and apparatus, storage medium, and electronic device

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708949A (en) * 2020-06-19 2020-09-25 微医云(杭州)控股有限公司 Medical resource recommendation method and device, electronic equipment and storage medium
CN111708949B (en) * 2020-06-19 2023-07-25 微医云(杭州)控股有限公司 Medical resource recommendation method and device, electronic equipment and storage medium
CN112002413A (en) * 2020-08-23 2020-11-27 吾征智能技术(北京)有限公司 Cardiovascular system infection intelligent cognitive system, equipment and storage medium
CN112002415A (en) * 2020-08-23 2020-11-27 吾征智能技术(北京)有限公司 Intelligent cognitive disease system based on human excrement
CN112002413B (en) * 2020-08-23 2023-09-29 吾征智能技术(北京)有限公司 Intelligent cognitive system, equipment and storage medium for cardiovascular system infection
CN112002415B (en) * 2020-08-23 2024-03-01 吾征智能技术(北京)有限公司 Intelligent cognitive disease system based on human excrement
CN112269880B (en) * 2020-11-04 2024-02-09 吾征智能技术(北京)有限公司 Sweet text classification matching system based on linear function
CN112269880A (en) * 2020-11-04 2021-01-26 吾征智能技术(北京)有限公司 Sweet text classification matching system based on linear function
CN112802597A (en) * 2021-01-18 2021-05-14 吾征智能技术(北京)有限公司 Intelligent neonatal jaundice evaluation system, device and storage medium
CN112802597B (en) * 2021-01-18 2023-11-21 吾征智能技术(北京)有限公司 Intelligent evaluation system, equipment and storage medium for neonatal jaundice
CN112951405A (en) * 2021-01-26 2021-06-11 北京搜狗科技发展有限公司 Method, device and equipment for realizing feature sorting
CN116089669A (en) * 2023-03-09 2023-05-09 数影星球(杭州)科技有限公司 Browser-based website uploading interception mode and system
CN116089669B (en) * 2023-03-09 2023-10-03 数影星球(杭州)科技有限公司 Browser-based website uploading interception mode and system

Also Published As

Publication number Publication date
CN109147934A (en) 2019-01-04
CN109147934B (en) 2023-04-11

Similar Documents

Publication Publication Date Title
WO2020007028A1 (en) Medical consultation data recommendation method, device, computer apparatus, and storage medium
US11301637B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
US9348900B2 (en) Generating an answer from multiple pipelines using clustering
US9146987B2 (en) Clustering based question set generation for training and testing of a question and answer system
US9230009B2 (en) Routing of questions to appropriately trained question and answer system pipelines using clustering
JP5998194B2 (en) Interactive search method and apparatus
WO2020119031A1 (en) Deep learning-based question and answer feedback method, device, apparatus, and storage medium
CN112328762A (en) Question and answer corpus generation method and device based on text generation model
CN108846138B (en) Question classification model construction method, device and medium fusing answer information
CN111797214A (en) FAQ database-based problem screening method and device, computer equipment and medium
US11699034B2 (en) Hybrid artificial intelligence system for semi-automatic patent infringement analysis
CN110134777B (en) Question duplication eliminating method and device, electronic equipment and computer readable storage medium
WO2020114100A1 (en) Information processing method and apparatus, and computer storage medium
CN112651236B (en) Method and device for extracting text information, computer equipment and storage medium
WO2023029513A1 (en) Artificial intelligence-based search intention recognition method and apparatus, device, and medium
CN112632261A (en) Intelligent question and answer method, device, equipment and storage medium
CN111930895A (en) Document data retrieval method, device, equipment and storage medium based on MRC
CN110990533A (en) Method and device for determining standard text corresponding to query text
CN110955767A (en) Algorithm and device for generating intention candidate set list set in robot dialogue system
CN113761161A (en) Text keyword extraction method and device, computer equipment and storage medium
WO2021000400A1 (en) Hospital guide similar problem pair generation method and system, and computer device
CN108810640B (en) Television program recommendation method
CN108763258B (en) Document theme parameter extraction method, product recommendation method, device and storage medium
CN112668284B (en) Legal document segmentation method and system
CN114464328A (en) Test information retrieval method and device, clinical test recommendation method and terminal

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19831424; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19831424; Country of ref document: EP; Kind code of ref document: A1)