CN109147934A

CN109147934A - Consultation data recommendation method, apparatus, computer device and storage medium

Info

Publication number: CN109147934A
Application number: CN201810724291.7A
Authority: CN
Inventors: 高羽; 柳恭; 葛培明; 孙行智
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-07-04
Filing date: 2018-07-04
Publication date: 2019-01-04
Anticipated expiration: 2038-07-04
Also published as: WO2020007028A1; CN109147934B

Abstract

This application relates to a consultation data recommendation method, apparatus, computer device and storage medium. The method includes: obtaining the current question to be answered, segmenting it into words, and extracting feature words from the segmentation result to obtain a first feature word set corresponding to the current question; obtaining a second feature word set corresponding to each index node in a pre-established index; calculating the cosine similarity between the first feature word set and each second feature word set, ranking the index nodes by the first similarity results, and selecting a preset number of index nodes as target index nodes to obtain a target index node set; obtaining the question-answer pair corresponding to each target index node from the consultation database; and calculating a second similarity between the current question and the question in each question-answer pair, ranking the question-answer pairs by the second similarity results to select target question-answer pairs, and recommending consultation data according to the selected target question-answer pairs.

Description

Consultation data recommendation method, apparatus, computer device and storage medium

Technical field

This application relates to the technical field of online consultation, and in particular to a consultation data recommendation method, apparatus, computer device and storage medium.

Background technique

With the rapid development of Internet technology, Internet-based online consultation and online health advice have become increasingly popular. In both, every user who asks a question expects to receive the doctor's answer as quickly as possible.

In conventional practice, after seeing a user's question the doctor must think, compose the wording, write the answer and finally click send before the user can see the reply, which makes consultation inefficient.

Summary of the invention

In view of the above technical problem, it is necessary to provide a consultation data recommendation method, apparatus, computer device and storage medium that can improve consultation efficiency.

A consultation data recommendation method, comprising:

obtaining a current question to be answered, segmenting the current question into words, and extracting feature words from the segmentation result to obtain a first feature word set corresponding to the current question;

obtaining a second feature word set corresponding to each index node in a pre-established index;

calculating a first similarity between the first feature word set corresponding to the current question and the second feature word set corresponding to each index node, ranking the index nodes according to the first similarity results, and selecting a preset number of index nodes as target index nodes to obtain a target index node set;

obtaining, from a consultation database, the question-answer pair corresponding to each target index node in the target index node set; and

calculating a second similarity between the current question and the question in each question-answer pair, ranking the question-answer pairs according to the second similarity results to select target question-answer pairs, and recommending consultation data according to the selected target question-answer pairs.

In one embodiment, before the step of obtaining the current question to be answered, the method includes:

obtaining the consultation information sets corresponding to past consultations, and preprocessing the consultation information sets;

extracting question-answer pairs from the preprocessed consultation information sets, and performing feature extraction on the extracted question-answer pairs;

storing each question-answer pair and its corresponding features, in association, in the consultation database; and

building an index on the consultation database according to the features.

In one embodiment, extracting question-answer pairs from the preprocessed consultation information set includes:

obtaining the user identifier corresponding to each consultation message in the consultation information set, the user identifier being either a consulting-user identifier or a doctor-user identifier;

filtering the consultation messages corresponding to doctor-user identifiers according to preset rules; and

extracting question-answer pairs from the filtered consultation information set according to punctuation marks and interrogative words.

In one embodiment, performing feature extraction on the extracted question-answer pairs includes:

segmenting the question in each extracted question-answer pair into words to obtain a word set corresponding to the question; and

matching each word in the word set against each word in a pre-established feature lexicon and, when a match succeeds, taking that word as an extracted feature.

In one embodiment, the step of calculating the first similarity between the first feature word set corresponding to the current question and the second feature word set corresponding to each index node includes:

calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain a first keyword set corresponding to the current question;

calculating a feature weight for each feature word in each second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node;

obtaining, from the first keyword set and the second keyword sets, a first term-frequency vector corresponding to the current question and a second term-frequency vector corresponding to each index node; and

calculating the cosine of the angle between the first term-frequency vector and each second term-frequency vector to obtain the first similarity.

In one embodiment, calculating a feature weight for each feature word in the first feature word set to obtain the first calculation result includes:

calculating an initial feature weight for each feature word in the first feature word set using the term frequency-inverse document frequency (TF-IDF) algorithm;

when a feature word in the first feature word set satisfies a preset adjustment rule, adjusting the initial feature weight of that feature word according to the preset adjustment rule to obtain its final feature weight; and

when a feature word in the first feature word set does not satisfy the preset adjustment rule, taking its initial feature weight as its final feature weight.
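The weighting embodiment above can be illustrated with a minimal Python sketch. The text does not define the preset adjustment rule or the exact TF-IDF variant, so both are assumptions here: the sketch uses one common smoothed variant of TF-IDF, and a hypothetical rule that multiplies the weight of any word found in a `boosted_words` set by a fixed factor. The function names and the toy corpus are illustrative only.

```python
import math
from collections import Counter

def tf_idf_weights(feature_words, corpus):
    """Initial weights: term frequency times a smoothed inverse document frequency."""
    counts = Counter(feature_words)
    total = len(feature_words)
    weights = {}
    for word, count in counts.items():
        # Number of corpus documents (sets of words) that contain the word.
        doc_freq = sum(1 for doc in corpus if word in doc)
        idf = math.log(len(corpus) / (1 + doc_freq))
        weights[word] = (count / total) * idf
    return weights

def adjust_weights(weights, boosted_words, factor=2.0):
    """Hypothetical adjustment rule: scale words in `boosted_words`, keep the rest."""
    return {
        word: weight * factor if word in boosted_words else weight
        for word, weight in weights.items()
    }

# Toy corpus: each document is the set of feature words of one stored question.
corpus = [{"fever", "cough"}, {"cough"}, {"headache"}, {"rash"}]
initial = tf_idf_weights(["fever", "cough"], corpus)
final = adjust_weights(initial, boosted_words={"cough"})
```

Words that satisfy the assumed rule receive an adjusted final weight, and all other words keep their initial weight, which mirrors the two branches described above.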

A consultation data recommendation apparatus, comprising:

a first feature word set obtaining module, configured to obtain a current question to be answered, segment the current question into words, and extract feature words from the segmentation result to obtain a first feature word set corresponding to the current question;

a second feature word set obtaining module, configured to obtain a second feature word set corresponding to each index node in a pre-established index;

a target index node set obtaining module, configured to calculate a first similarity between the first feature word set corresponding to the current question and the second feature word set corresponding to each index node, rank the index nodes according to the first similarity results, and select a preset number of index nodes as target index nodes to obtain a target index node set;

a question-answer pair obtaining module, configured to obtain, from a consultation database, the question-answer pair corresponding to each target index node in the target index node set; and

a recommendation module, configured to calculate a second similarity between the current question and the question in each question-answer pair, rank the question-answer pairs according to the second similarity results to select target question-answer pairs, and recommend consultation data according to the selected target question-answer pairs.

In one embodiment, the apparatus further includes:

a preprocessing module, configured to obtain the consultation information sets corresponding to past consultations and preprocess them;

a feature extraction module, configured to extract question-answer pairs from the preprocessed consultation information sets and perform feature extraction on the extracted question-answer pairs;

a storage module, configured to store each question-answer pair and its corresponding features, in association, in the consultation database; and

an index building module, configured to build an index on the consultation database according to the features.

A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above consultation data recommendation method when executing the computer program.

A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above consultation data recommendation method.

With the above consultation data recommendation method, apparatus, computer device and storage medium, the feature word set corresponding to the question to be answered is obtained first. The first similarity between that feature word set and the feature word set of each node in the index is then calculated, and the nodes with the highest similarity are selected as target nodes. The question-answer pairs corresponding to those nodes are then looked up, the second similarity between the question to be answered and the question in each question-answer pair is calculated, and the question-answer pairs with the highest similarity are selected as target question-answer pairs, from which consultation data is recommended. Through these two rounds of ranking, the application accurately locates the question-answer pairs most similar to the question to be answered and recommends on that basis, so that accurate answers are recommended to the doctor automatically during a consultation, which improves consultation efficiency.

Brief description of the drawings

Fig. 1 is an application scenario diagram of the consultation data recommendation method in one embodiment;

Fig. 2 is a flow diagram of the consultation data recommendation method in one embodiment;

Fig. 3 is a flow diagram of the steps performed before step S202 in one embodiment;

Fig. 4 is a flow diagram corresponding to step S304 in one embodiment;

Fig. 5 is a flow diagram corresponding to step S206 in one embodiment;

Fig. 6 is a flow diagram corresponding to step S502 in one embodiment;

Fig. 7 is a structural block diagram of the consultation data recommendation apparatus in one embodiment;

Fig. 8 is a structural block diagram of the consultation data recommendation apparatus in another embodiment;

Fig. 9 is an internal structure diagram of the computer device in one embodiment.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the application, not to limit it.

The consultation data recommendation method provided by this application can be applied in the application environment shown in Fig. 1, in which the consultation terminal 102 and the doctor terminal 104 each communicate with the server 106 over a network. After receiving the current question to be answered sent by the consultation terminal, the server 106 segments the question into words and extracts feature words from the segmentation result to obtain the first feature word set corresponding to the current question; obtains the second feature word set corresponding to each index node in the pre-established index; calculates the first similarity between the first feature word set and the second feature word set of each index node; ranks the index nodes by the first similarity results and selects a preset number of them as target index nodes to obtain the target index node set; looks up, in the consultation database, the question-answer pair corresponding to each target index node; calculates the second similarity between the current question and the question in each question-answer pair; ranks the question-answer pairs by the second similarity results to select target question-answer pairs; and recommends consultation data to the doctor terminal according to the selected target question-answer pairs. The recommended consultation data may be the entire target question-answer pair or only the reply in the target question-answer pair.

The consultation terminal 102 and the doctor terminal 104 may be, but are not limited to, personal computers, laptops, smartphones and tablet computers. The server 106 may be implemented as an independent server or as a server cluster composed of multiple servers.

In one embodiment, as shown in Fig. 2, a consultation data recommendation method is provided. Taking its application to the server in Fig. 1 as an example, the method includes the following steps:

Step S202: obtain the current question to be answered, segment it into words, and extract feature words from the segmentation result to obtain the first feature word set corresponding to the current question.

Specifically, the question to be answered is the consultation question that the consulting user enters at the consultation terminal. During a consultation, when the user enters a question at the consultation terminal, the server receives the question sent by the terminal and segments it to obtain a segmentation result, that is, the sequence of individual words produced by segmentation. For example, segmenting "I have a stomach ache, what should I do" may yield: I / stomach ache / what should I do.

To segment the current question, the question can first be split into complete sentences at punctuation marks, and each sentence is then segmented. One option is string-matching segmentation, for example: forward maximum matching, which segments the character string of a sentence from left to right; reverse maximum matching, which segments it from right to left; shortest-path segmentation, which requires the number of words cut from the string to be as small as possible; or bidirectional maximum matching, which performs forward and reverse matching at the same time. Another option is semantic segmentation, a segmentation method based on machine speech judgment that uses syntactic and semantic information to resolve ambiguity. A further option is statistical segmentation: phrase statistics are gathered from the historical search records of the current user or of users in general, and when two adjacent characters are found to occur together frequently, they are segmented as a single phrase.
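As an illustration of the string-matching option, the following is a minimal Python sketch of forward maximum matching. The vocabulary, the maximum word length and the sample Chinese sentence are assumptions chosen for the example; the text does not specify a dictionary or a particular implementation.

```python
def forward_max_match(sentence, vocabulary, max_len=4):
    """Segment left to right, always taking the longest vocabulary match."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical vocabulary and sentence ("I / stomach ache / what should I do").
vocabulary = {"我", "胃疼", "怎么办"}
tokens = forward_max_match("我胃疼怎么办", vocabulary)
```

Reverse maximum matching follows the same idea but scans from the end of the sentence, and bidirectional matching compares the two results.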

Further, the server extracts feature words from the segmentation result. In one embodiment, each word in the segmentation result is matched one by one against each word in a pre-established feature lexicon, and the matched words are taken as feature words. In one embodiment, two words match when they are identical. In another embodiment, two words match when the similarity between them exceeds a preset threshold; for example, two different expressions that both mean "stomach ache" can be treated as matching words. The feature lexicon may be built from authoritative descriptions of diseases obtained from existing medical databases, including professional information such as each disease's overview, symptoms, complications, therapeutic drugs and common examinations. It may also be built from medical information about drugs, such as the types of disease a drug is indicated for. The medical data may also be information of specific types (for example, the treatment plans, drugs, departments and clinical manifestations associated with different diseases) obtained in real time or periodically, using tools such as web crawlers, from open medical data sources on the Internet (for example, questions, answers and discussions about diseases on forums, or new medical cases and medical question-and-answer texts).
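The two matching embodiments can be sketched in Python as follows. The text does not say how word similarity is measured, so the use of `difflib.SequenceMatcher` as the similarity measure is an assumption, as are the function name, the threshold value and the sample lexicon.

```python
from difflib import SequenceMatcher

def extract_feature_words(tokens, feature_lexicon, threshold=None):
    """Keep tokens that match a lexicon entry exactly or, optionally, approximately."""
    features = []
    for token in tokens:
        if token in feature_lexicon:
            # Exact-match embodiment.
            features.append(token)
        elif threshold is not None and any(
            SequenceMatcher(None, token, entry).ratio() >= threshold
            for entry in feature_lexicon
        ):
            # Threshold embodiment, with an assumed similarity measure.
            features.append(token)
    return features

lexicon = {"stomachache", "fever"}
exact = extract_feature_words(["I", "stomachache", "what"], lexicon)
fuzzy = extract_feature_words(["stomach-ache"], lexicon, threshold=0.8)
```

With `threshold=None` only identical words count as matches; passing a threshold enables the approximate embodiment.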

Step S204: obtain the second feature word set corresponding to each index node in the pre-established index.

Specifically, question-answer pairs are extracted from historical consultation data in advance, and feature extraction is performed on them. The extracted features include at least the feature words of the question in each question-answer pair, and these feature words form the second feature word set. Each question-answer pair and its features are saved in the same row of a data table in the consultation database, and an index is then built on the consultation database according to the column that holds the features. Each index node in the index contains an index value and a pointer: the index value includes at least the second feature word set of the question-answer pair, and the pointer refers to a region of memory that records a reference to the corresponding row of data stored on disk. A question-answer pair is an information pair made up of a consulting user's question and the doctor's answer. It may consist of one question from the consulting user and one answer from the doctor, one question and multiple answers, multiple consecutive questions and one answer, or multiple consecutive questions and multiple consecutive answers.

In this embodiment, the server traverses the index nodes in the index one by one and reads the index value of each node to obtain the second feature word set corresponding to each index node.
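The index node described above can be pictured with a small Python sketch. This is an in-memory simplification under stated assumptions: the class name `IndexNode`, the `row_id` field standing in for the pointer, and the sample rows are all hypothetical, and a real database index would be managed by the database engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexNode:
    feature_words: frozenset  # index value: the second feature word set
    row_id: int               # stand-in for the pointer to the table row

# Hypothetical table rows keyed by row id: (question, answer).
ROWS = {
    1: ("What should I do about a stomach ache?", "Avoid spicy food."),
    2: ("Why do I keep coughing?", "It may be a cold."),
}

INDEX = [
    IndexNode(frozenset({"stomach ache"}), 1),
    IndexNode(frozenset({"cough"}), 2),
]

def second_feature_word_sets(index):
    """Traverse the index and read each node's index value."""
    return [node.feature_words for node in index]

def fetch_qa_pair(node, rows):
    """Follow the node's row reference to the stored question-answer pair."""
    return rows[node.row_id]
```

Reading only the index values keeps the first ranking pass away from the full rows; the rows are fetched later, and only for the selected nodes.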

Step S206: calculate the first similarity between the first feature word set corresponding to the current question and the second feature word set corresponding to each index node, rank the index nodes according to the first similarity results, and select a preset number of index nodes as target index nodes to obtain the target index node set.

Specifically, the first similarity characterizes how similar the first feature word set is to a second feature word set. In one embodiment, the first similarity is the cosine similarity. To calculate the cosine similarity between the first feature word set of the current question and the second feature word set of any index node, keywords are extracted from the first and second feature word sets to obtain the first keyword set corresponding to the question and the second keyword set corresponding to the index node; a term-frequency vector is then computed for each of the two keyword sets; finally, the cosine of the angle between the two term-frequency vectors gives the cosine similarity.

Further, the server ranks the index nodes by cosine similarity and, according to the ranking, selects a preset number of index nodes as target index nodes to obtain the target index node set. In one embodiment, the server sorts the index nodes in descending order of cosine similarity and selects the top N1 index nodes as target index nodes, where N1 is a preset value that can be set and adjusted empirically.
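The cosine similarity and top-N1 selection can be sketched in Python as follows. The keyword lists are taken as given (keyword selection is a separate step), and the function names, node identifiers and sample keywords are assumptions made for illustration.

```python
import math
from collections import Counter

def cosine_similarity(keywords_a, keywords_b):
    """Cosine of the angle between the term-frequency vectors of two keyword lists."""
    vocab = sorted(set(keywords_a) | set(keywords_b))
    count_a, count_b = Counter(keywords_a), Counter(keywords_b)
    vec_a = [count_a[word] for word in vocab]
    vec_b = [count_b[word] for word in vocab]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_index_nodes(query_keywords, node_keywords, n1):
    """Rank node ids by descending cosine similarity and keep the first n1."""
    ranked = sorted(
        node_keywords.items(),
        key=lambda item: cosine_similarity(query_keywords, item[1]),
        reverse=True,
    )
    return [node_id for node_id, _ in ranked[:n1]]

nodes = {
    "n1": ["stomach", "pain"],
    "n2": ["head", "pain"],
    "n3": ["cough"],
}
targets = top_index_nodes(["stomach", "pain"], nodes, n1=2)
```

Both vectors are built over the union of the two keyword sets so that their dimensions line up before the cosine is taken.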

Step S208: obtain, from the consultation database, the question-answer pair corresponding to each target index node in the target index node set.

Specifically, each index node in the index stores a pointer to the corresponding row of the table in the consultation database. The row of data corresponding to an index node can be obtained through this pointer, and the question-answer pair is the data in one column of that row, so the question-answer pair can be retrieved through the index node.

Step S210: calculate the second similarity between the current question and the question in each question-answer pair, rank the question-answer pairs according to the second similarity results to select target question-answer pairs, and recommend consultation data according to the selected target question-answer pairs.

Specifically, the second similarity characterizes how similar the current question is to the question in each question-answer pair. In one embodiment, the second similarity is a string similarity, calculated as follows. After obtaining the question-answer pair corresponding to each target index node, the server first calculates the edit distance between the current question and the question in each obtained question-answer pair. The edit distance is the minimum number of single-character edits (substitutions, insertions or deletions) needed to change one string into the other. The string similarity between the current question and the question in each question-answer pair is then calculated from the edit distance using the formula Similarity = (Max(x, y) - Levenshtein) / Max(x, y), where x is the length of the string of the question to be answered, y is the length of the string of the question in the question-answer pair, and Levenshtein is the edit distance.

Further, the server ranks the question-answer pairs obtained in step S208 by string similarity, selects a preset number of question-answer pairs as target question-answer pairs according to the ranking, and recommends consultation data according to these target question-answer pairs. In one embodiment, the server sorts the question-answer pairs obtained in step S208 in descending order of string similarity and selects the top N2 question-answer pairs as target question-answer pairs, where N2 is a preset value that can be adjusted empirically.
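The second ranking pass can be sketched in Python as follows. The edit distance is computed with the standard dynamic-programming recurrence, and the similarity follows the formula given above. The function names, the representation of a question-answer pair as a `(question, answer)` tuple and the sample data are assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

def string_similarity(x, y):
    """Similarity = (Max(len x, len y) - edit distance) / Max(len x, len y)."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return (longest - levenshtein(x, y)) / longest

def top_qa_pairs(question, qa_pairs, n2):
    """Rank (question, answer) pairs by descending similarity and keep the first n2."""
    ranked = sorted(
        qa_pairs,
        key=lambda pair: string_similarity(question, pair[0]),
        reverse=True,
    )
    return ranked[:n2]

pairs = [
    ("my head hurts", "Rest and drink water."),
    ("my stomach hurts", "Avoid spicy food."),
]
best = top_qa_pairs("my stomach hurts a lot", pairs, n2=1)
```

Dividing by the longer length normalizes the score to the range 0 to 1, so questions of different lengths can be ranked on one scale.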

In one embodiment, the server recommends consultation data according to the target question-answer pairs by recommending all target question-answer pairs to the doctor terminal, by recommending any one of them, or by recommending the top-ranked one. The application places no limitation on the specific manner of recommendation.

In another embodiment, the server directly selects the answers in the target question-answer pairs and recommends them to the doctor terminal: it may recommend the answers of all target question-answer pairs, the answer of any one question-answer pair, or the answer of the top-ranked question-answer pair. The application places no limitation on the specific manner of recommendation.

In the above consultation data recommendation method, the server first obtains the feature word set corresponding to the question to be answered, then calculates the first similarity between that set and the feature word set of each index node in the index, and selects the nodes with the highest similarity as target nodes. It then looks up the question-answer pairs corresponding to these nodes, calculates the second similarity between the question to be answered and the question in each pair, and selects the pairs with the highest string similarity as target question-answer pairs, from which consultation data is recommended. Through these two rounds of ranking, the application accurately locates the question-answer pairs most similar to the question to be answered and recommends on that basis, so that accurate answers are recommended to the doctor automatically during a consultation, which improves consultation efficiency.

In one embodiment, as shown in Fig. 3, the following steps are performed before step S202:

Step S302: obtain the consultation information sets corresponding to past consultations, and preprocess the consultation information sets.

Specifically, past consultations are the consultations completed before the current time, and a consultation information set is the set of information in one complete consultation, made up of the consulting user's messages and the doctor user's replies.

In this embodiment, preprocessing includes sentence splitting, coreference resolution, context processing and the like. Sentence splitting cuts a message into single sentences. Coreference resolution works out what the pronouns in a sentence refer to, and can be computed using syntactic analysis and edit distance. Context processing completes a sentence from its context. For example, in the exchange "D: Are you dizzy? U: Yes", the reply "Yes" is expanded to "I am dizzy" so that the second sentence expresses its meaning more completely. Context processing relies on syntactic analysis and sentence-pattern judgment.

Step S304: extract question-answer pairs from the preprocessed consultation information sets, and perform feature extraction on the extracted question-answer pairs.

Specifically, in one complete consultation the consulting user usually asks several questions, and the doctor answers after each one. Each question asked by the consulting user, together with the doctor's reply to it, forms a question-answer pair. Extracting question-answer pairs means extracting these pairs from the consultation information of one complete consultation.

Further, the server performs feature extraction on the extracted question-answer pairs. In one embodiment, feature extraction extracts keywords from the question in each question-answer pair. In another embodiment, the extracted features may be, for example, the number of single sentences in the question-answer pair, the number of adjectives, the interrogative words, and so on.

Step S306: store each question-answer pair and its corresponding features, in association, in the consultation database.

Specifically, the server stores each question-answer pair and its features in association in the consultation database, that is, in different columns of the same row of a database table.

In one embodiment, the consulting user and the doctor communicate by instant messaging during a consultation, and each message carries the user identifier of its sender, which is either a consulting-user identifier or a doctor-user identifier: messages sent from the consultation terminal carry the consulting-user identifier, and messages sent from the doctor terminal carry the doctor-user identifier. When the server obtains the consultation information of past consultations, it can therefore obtain the corresponding user identifiers at the same time, and it stores the user identifier corresponding to each question-answer pair in the consultation database in one-to-one correspondence with the question-answer pair and its features.

Step S308: build an index on the consultation database according to the features.

Specifically, the server builds the index on the column of the consultation database that holds the features. Each node in the index corresponds to one row of data in the consultation database, and that row includes at least the question-answer pair and its corresponding features.
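Storing each question-answer pair with its features in one row and indexing the feature column can be sketched with Python's built-in `sqlite3` module. The text does not name a database engine or a schema, so the table name, column names, index name and sample rows are assumptions, and the features are stored as plain text for simplicity.

```python
import sqlite3

def build_consultation_db(records):
    """Create an in-memory table of question-answer pairs and index the feature column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE qa_pairs ("
        "id INTEGER PRIMARY KEY, user_id TEXT, question TEXT, answer TEXT, features TEXT)"
    )
    conn.executemany(
        "INSERT INTO qa_pairs (user_id, question, answer, features) VALUES (?, ?, ?, ?)",
        records,
    )
    # Index on the column that holds the features.
    conn.execute("CREATE INDEX idx_qa_features ON qa_pairs (features)")
    conn.commit()
    return conn

conn = build_consultation_db([
    ("u1", "What should I do about a stomach ache?", "Avoid spicy food.", "stomach ache"),
    ("u2", "Why do I keep coughing?", "It may be a cold.", "cough"),
])
row = conn.execute(
    "SELECT question, answer FROM qa_pairs WHERE features = ?", ("cough",)
).fetchone()
```

The question-answer pair and its features sit in different columns of the same row, so a lookup by feature value returns the pair directly.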

In one embodiment, the server may also build the index according to both the user identifier and the features.

In this embodiment, features are extracted from the consultation information and an index is built on them. When calculating the similarity between the question to be answered and the question in each question-answer pair, the server therefore does not need to traverse the entire database; it only needs to compare the question to be answered with the index values, which significantly improves computational efficiency.

In one embodiment, as shown in Fig. 4, extracting question-answer pairs from the preprocessed interrogation information includes:

Step S304A: obtain the user identifier corresponding to each piece of interrogation information in the interrogation information set, where the user identifier is an interrogation user identifier or a doctor user identifier.

Specifically, each piece of interrogation information corresponds to one user identifier: for a message sent from the interrogation terminal, the corresponding user identifier is the interrogation user identifier, and for a message sent from the doctor terminal, it is the doctor user identifier.

Step S306B: filter the interrogation information corresponding to the doctor user identifier according to preset rules.

Specifically, the preset rules include at least: filtering out messages that end with an interrogative word, and filtering out messages that match a preset polite phrase. Interrogative words may be, for example, "what should I do", "what", "why", and so on. Preset polite phrases are sentences configured in advance on the doctor terminal to save reply time, for example "please wait a moment" or "hello, I am not on duty at the moment".
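As a non-limiting illustration, the two filtering rules could be sketched as follows. The word lists simply reuse the examples given above in English; the function name `keep_doctor_message` and the normalization (trimming trailing punctuation, lower-casing) are assumptions made for this sketch.

```python
# Example interrogative words and polite phrases, lower-cased for comparison.
INTERROGATIVES = ("what should i do", "what", "why")
POLITE_PHRASES = {"please wait a moment", "hello, i am not on duty at the moment"}


def keep_doctor_message(text):
    """Return False for doctor messages that the preset rules filter out."""
    normalized = text.strip().rstrip("?？!！.。").strip().lower()
    # Rule 2: the message matches a preset polite phrase.
    if normalized in POLITE_PHRASES:
        return False
    # Rule 1: the message ends with an interrogative word (whole-word match).
    for word in INTERROGATIVES:
        if normalized == word or normalized.endswith(" " + word):
            return False
    return True


doctor_messages = [
    "Please wait a moment.",
    "Does it hurt, and why?",
    "Take the medicine after meals.",
]
print([m for m in doctor_messages if keep_doctor_message(m)])
```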

Step S308C: extract question-answer pairs from the filtered interrogation information set according to punctuation marks and interrogative words.

Specifically, the filtered interrogation information is traversed starting from the first piece of interrogation information, and the user identifier corresponding to each piece is obtained in turn. When the user identifier corresponding to a piece of interrogation information is the interrogation user identifier, it is determined whether that piece of interrogation information contains a question sentence. If so, the question sentence is taken as one of the questions of a question-answer pair. Then, starting from the first piece of interrogation information after the question that corresponds to the doctor user identifier, all consecutive pieces of interrogation information corresponding to the doctor user identifier are obtained, until the next piece of interrogation information corresponding to the interrogation user identifier appears. The obtained doctor messages are taken as the answer to the question sentence, forming a question-answer pair. An extracted question-answer pair may consist of one question and one answer, one question followed by several consecutive answers, several consecutive questions followed by one answer, or several consecutive questions followed by several consecutive answers. Which combination occurs depends on the specific interrogation, and the application places no restriction on it.
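As a non-limiting illustration, the traversal described above could be sketched as follows. The message representation (a chronological list of `(role, text)` tuples), the role labels, and the `is_question` test based on a trailing question mark are assumptions made for this sketch, not details given in the application.

```python
def extract_qa_pairs(messages, is_question):
    """Group consecutive patient questions with the consecutive doctor replies that follow."""
    pairs = []
    questions, answers = [], []
    for role, text in messages:
        if role == "patient":
            # A patient message arriving after doctor replies closes the current pair.
            if answers:
                if questions:
                    pairs.append((questions, answers))
                questions, answers = [], []
            if is_question(text):
                questions.append(text)
        elif questions:
            # Doctor reply following at least one pending question.
            answers.append(text)
    if questions and answers:
        pairs.append((questions, answers))
    return pairs


def ends_with_question_mark(text):
    return text.rstrip().endswith(("?", "？"))


messages = [
    ("patient", "I have a headache."),
    ("patient", "What should I do?"),
    ("doctor", "A1"),
    ("doctor", "A2"),
    ("patient", "Why?"),
    ("doctor", "A3"),
]
pairs = extract_qa_pairs(messages, ends_with_question_mark)
print(pairs)
```

Each pair holds a list of questions and a list of answers, which covers the one-to-one, one-to-many, many-to-one, and many-to-many combinations mentioned above.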

In one embodiment, performing feature extraction on the extracted question-answer pairs includes: segmenting the question in each extracted question-answer pair into words to obtain the word set corresponding to the question; and matching each word in the word set against each word in a pre-established feature lexicon, and, when the match succeeds, taking the word as an extracted feature.

Specifically, the server may first segment the question in each extracted question-answer pair into words to obtain the word set corresponding to the question. To do so, the question may first be split into complete sentences according to punctuation marks, and word segmentation is then performed on each resulting sentence. One option is segmentation based on string matching, for example: the forward maximum matching method, which segments the character string of a sentence from left to right; the reverse maximum matching method, which segments the character string from right to left; the shortest-path segmentation method, which requires the number of words cut out of the character string to be minimal; or the bidirectional maximum matching method, which performs forward and reverse matching at the same time. Another option is word-sense segmentation, a method based on machine language judgment that uses syntactic and semantic information to resolve ambiguity during segmentation. A further option is statistical segmentation: based on phrase statistics from the historical search records of the current user or of public users, if two adjacent characters are found to occur together frequently, the two adjacent characters are segmented as one phrase.
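As a non-limiting illustration of the forward maximum matching method mentioned above, the following sketch scans a string from left to right and, at each position, takes the longest dictionary word that fits. The dictionary contents, the `max_len` parameter, and the unspaced English example are assumptions chosen for readability; a real system would use a Chinese dictionary.

```python
def forward_max_match(sentence, dictionary, max_len=8):
    """Segment `sentence` left to right, preferring the longest dictionary word."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


dictionary = {"head", "headache", "cough"}
print(forward_max_match("headachecough", dictionary))
```

The reverse maximum matching method follows the same idea but scans from the right end of the string.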

Further, each word in the word set obtained by segmentation is matched one by one against each word in the pre-established feature lexicon, and the matched words are taken as feature words. In one embodiment, two words match when they are identical. In another embodiment, two words match when the similarity between them exceeds a preset threshold; for example, two synonymous expressions for "stomach ache" may be treated as mutually matching words. The feature lexicon may be built from professional information obtained from existing medical databases, such as authoritative descriptions of various diseases, including their introductions, symptoms, complications, therapeutic drugs, and common examinations. It may also be built from the medical information corresponding to various drugs, such as the types of disease that a drug mainly treats. The medical data may also be specific types of information (for example, the treatment plans, therapeutic drugs, departments, and clinical manifestations corresponding to different diseases) obtained in real time or periodically by tools such as web crawlers from open medical data sources on the Internet (for example, questions, answers, and discussions about various diseases on forums, or new medical cases and medical question-and-answer texts).
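As a non-limiting illustration, the two matching modes (exact match, or similarity above a preset threshold) could be sketched as follows. The application does not state how word similarity is measured; this sketch substitutes the character-level ratio from Python's standard `difflib.SequenceMatcher`, and the function name, lexicon, and threshold value are hypothetical.

```python
from difflib import SequenceMatcher


def extract_features(words, lexicon, threshold=None):
    """Keep the words that match the feature lexicon exactly or, optionally, approximately."""
    features = []
    for word in words:
        if word in lexicon:
            features.append(word)
        elif threshold is not None and any(
            SequenceMatcher(None, word, entry).ratio() > threshold
            for entry in lexicon
        ):
            features.append(word)
    return features


lexicon = {"stomachache", "cough"}
words = ["stomachache", "today", "coughing"]
print(extract_features(words, lexicon))                 # exact matching only
print(extract_features(words, lexicon, threshold=0.7))  # similarity matching allowed
```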

In one embodiment, as shown in Fig. 5, the step of separately calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node includes:

Step S502: calculate a feature weight for each feature word in the first feature word set to obtain a first calculation result, and select keywords according to the first calculation result to obtain the first keyword set corresponding to the current question to be answered.

Specifically, a feature weight characterizes the importance of a feature: the larger the feature weight, the more important the feature word is and the better it represents the meaning of the word set. In one embodiment, the feature weight of each feature word may be calculated with the term frequency-inverse document frequency (TF-IDF) algorithm. In this embodiment, the first calculation result is obtained after the feature weights are calculated, where the first calculation result refers to the weight value corresponding to each feature word in the first feature word set. The feature words may be sorted by weight value, and keywords are then selected according to the sorting result to obtain the first keyword set.

In one embodiment, the server may sort the feature words in the first feature word set in descending order of feature weight, and then select a preset number of the top-ranked feature words as keywords to obtain the first keyword set.
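As a non-limiting illustration, selecting a preset number of top-ranked feature words as keywords could be sketched as follows. The weight values and the name `top_keywords` are hypothetical and serve only to show the sort-and-truncate step.

```python
def top_keywords(weights, k):
    """Return the k feature words with the largest feature weights, in descending order."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Hypothetical feature weights for the feature words of one question.
weights = {"cough": 0.5, "smoking": 0.3, "insomnia": 0.1}
print(top_keywords(weights, 2))
```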

Step S504: calculate a feature weight for each feature word in the second feature word set to obtain a second calculation result, and select keywords according to the second calculation result to obtain the second keyword set corresponding to each index node.

Specifically, the term frequency-inverse document frequency algorithm may be used to calculate a feature weight for each feature word in the second feature word set to obtain the second calculation result, where the second calculation result refers to the feature weight value of each feature word in the second feature word set. The feature words may be sorted by weight value, and keywords are then selected according to the sorting result to obtain the second keyword set.

In one embodiment, the server may sort the feature words in the second feature word set in descending order of feature weight, and then select a preset number of the top-ranked feature words as keywords to obtain the second keyword set.

Step S506: according to the first keyword set and the second keyword set, obtain the first word-frequency vector corresponding to the current question to be answered and the second word-frequency vector corresponding to each index node.

Specifically, the first keyword set and the second keyword set are merged to obtain their union, the word frequency of each keyword in the union is calculated in the first feature word set and in the second feature word set respectively, and the first and second word-frequency vectors are generated from these word frequencies. For example, suppose the first feature word set is cough/smoking/insomnia with the corresponding keyword set {cough, smoking}, and the second feature word set is headache/cough/runny nose/catching a cold with the corresponding keyword set {headache, runny nose}. Merging the two keyword sets gives {cough, smoking, headache, runny nose}. The word frequencies of the words of this set in the first feature word set are cough 1, smoking 1, headache 0, runny nose 0, and their word frequencies in the second feature word set are cough 1, smoking 0, headache 1, runny nose 1. The resulting first word-frequency vector is [1, 1, 0, 0] and the second word-frequency vector is [1, 0, 1, 1].
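As a non-limiting illustration, the example above can be reproduced with the following sketch, which merges the two keyword sets into an ordered union and counts each keyword in both feature word sets. The function name and the use of lists to keep a stable keyword order are assumptions made for this sketch.

```python
def word_frequency_vectors(features_a, keywords_a, features_b, keywords_b):
    """Build word-frequency vectors over the union of two keyword sets."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    vocabulary = list(dict.fromkeys(keywords_a + keywords_b))
    vector_a = [features_a.count(word) for word in vocabulary]
    vector_b = [features_b.count(word) for word in vocabulary]
    return vocabulary, vector_a, vector_b


vocabulary, first_vector, second_vector = word_frequency_vectors(
    ["cough", "smoking", "insomnia"],
    ["cough", "smoking"],
    ["headache", "cough", "runny nose", "catching a cold"],
    ["headache", "runny nose"],
)
print(vocabulary)     # ['cough', 'smoking', 'headache', 'runny nose']
print(first_vector)   # [1, 1, 0, 0]
print(second_vector)  # [1, 0, 1, 1]
```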

Step S508: separately calculate the cosine of the angle between each first word-frequency vector and each second word-frequency vector to obtain the first similarity.

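Specifically, the cosine similarity is calculated as:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

where n (n ≥ 2) is the dimension of the word-frequency vectors, A_i is the i-th component of the first word-frequency vector, and B_i is the i-th component of the second word-frequency vector.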
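As a non-limiting illustration, the cosine of the angle between two word-frequency vectors could be computed as in the following sketch. Returning 0.0 for an all-zero vector is an assumption made here to avoid division by zero; the application does not address that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length word-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: similarity with an empty vector is treated as 0
    return dot / (norm_a * norm_b)


# Vectors [1, 1, 0, 0] and [1, 0, 1, 1]: dot product 1, norms sqrt(2) and sqrt(3).
similarity = cosine_similarity([1, 1, 0, 0], [1, 0, 1, 1])
print(round(similarity, 3))
```

For these two vectors the result is 1/√6, roughly 0.408.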

In this embodiment, keywords are extracted from the feature word sets and word-frequency vectors are obtained from them to calculate the cosine similarity of the two feature word sets. Compared with directly calculating the similarity between the question to be answered and the question-answer pair as two documents, this reduces the amount of calculation and improves computational efficiency.

In one embodiment, as shown in Fig. 6, calculating a feature weight for each feature word in the first feature word set to obtain the first calculation result includes:

Step S602: calculate the initial feature weight of each feature word in the first feature word set using the term frequency-inverse document frequency algorithm.

Specifically, the term frequency TF is calculated first, for which the following formula may be used:

TF = number of times a word occurs in the document / total number of words in the document;

Then, the inverse document frequency IDF is calculated, for which the following formula may be used:

IDF = log(total number of documents in the corpus / (number of documents containing the word + 1));

Finally, the initial feature weight is calculated as: W = TF * IDF.
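As a non-limiting illustration, the initial feature weight W = TF * IDF could be computed as in the following sketch. The documents are represented as lists of words, and the IDF uses the natural logarithm with 1 added to the denominator; both the smoothing term and the sample corpus are assumptions made for this sketch.

```python
import math


def tf_idf(word, document, corpus):
    """Initial feature weight W = TF * IDF of `word` within `document`."""
    # TF: occurrences of the word divided by the total number of words.
    tf = document.count(word) / len(document)
    # IDF: log of corpus size over (documents containing the word + 1).
    containing = sum(1 for doc in corpus if word in doc)
    idf = math.log(len(corpus) / (containing + 1))
    return tf * idf


corpus = [["cough", "fever"], ["headache"], ["insomnia"], ["fever"]]
weight = tf_idf("cough", corpus[0], corpus)
print(round(weight, 4))
```

Here TF is 1/2 and IDF is log(4/2), so the weight is 0.5 × ln 2.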

Step S604: determine in turn whether each feature word in the first feature word set satisfies a preset adjustment rule; if so, proceed to step S606; if not, proceed to step S608.

Step S606: adjust the initial feature weight of the feature word according to the adjustment rule to obtain the final feature weight.

Step S608: take the initial feature weight as the final feature weight.

Specifically, a preset adjustment rule is a manually configured rule for adjusting the feature weights of feature words. In one embodiment, the preset adjustment rule may be: when two feature words occur at the same time and the difference between their feature weights is less than a preset threshold, the weight of one of the words is adjusted so that the difference is not less than the preset threshold. For example, when "headache" and "hand pain" both occur as feature words and the difference between their feature weights is less than 0.2, the feature weight of "headache" is adjusted so that the difference between the feature weights of "headache" and "hand pain" is greater than 0.2. The purpose is to increase the weight of the feature word whose symptom has the greater impact, thereby improving the accuracy of keyword selection.

In this embodiment, adjusting the feature weights improves the accuracy of keyword selection.
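As a non-limiting illustration, the adjustment rule from the example above could be sketched as follows. The function name, the choice to raise the more important word to exactly the other weight plus the threshold, and the sample weights are assumptions; the application only requires that the resulting difference be no smaller than the threshold.

```python
def adjust_weights(weights, primary, secondary, min_gap=0.2):
    """Raise the weight of `primary` when it is too close to the weight of `secondary`."""
    adjusted = dict(weights)  # leave the input mapping untouched
    if primary in adjusted and secondary in adjusted:
        gap = adjusted[primary] - adjusted[secondary]
        if abs(gap) < min_gap:
            # Adjustment rule applies: widen the gap to the preset threshold.
            adjusted[primary] = adjusted[secondary] + min_gap
    # Otherwise the initial weights are kept as the final weights.
    return adjusted


initial = {"headache": 0.5, "hand pain": 0.45}
final = adjust_weights(initial, "headache", "hand pain")
print(final)
```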

It should be understood that although the steps in the flowcharts of Figs. 2-6 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless expressly stated herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-6 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.

In one embodiment, as shown in Fig. 7, an interrogation data recommendation device 700 is provided, including:

a first feature word set obtaining module 702, configured to obtain a current question to be answered, segment the current question to be answered into words, and extract feature words according to the segmentation result to obtain the first feature word set corresponding to the current question to be answered;

a second feature word set obtaining module 704, configured to obtain the second feature word set corresponding to each index node in a pre-established index;

a target index node set obtaining module 706, configured to separately calculate the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and to sort the index nodes according to the first similarity calculation results so as to select a preset number of index nodes as target index nodes, obtaining a target index node set;

a question-answer pair obtaining module 708, configured to obtain, from the interrogation database, the question-answer pair corresponding to each target index node in the target index node set;

a recommending module 710, configured to separately calculate the second similarity between the current question to be answered and the question corresponding to each question-answer pair, sort the question-answer pairs according to the second similarity calculation results to select target question-answer pairs, and perform interrogation data recommendation according to the selected target question-answer pairs.
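As a non-limiting illustration, the two-stage flow carried out by modules 706 to 710 (coarse ranking of index nodes, then fine ranking of question-answer pairs) could be sketched as follows. The similarity functions are passed in as parameters; the Jaccard measure used in the example stands in for both similarities purely for brevity, and all names and sample data are hypothetical.

```python
def recommend(query_features, query_text, index_nodes,
              first_sim, second_sim, n_nodes, n_pairs):
    """Rank index nodes by the first similarity, then their QA pairs by the second."""
    ranked_nodes = sorted(
        index_nodes,
        key=lambda node: first_sim(query_features, node["features"]),
        reverse=True,
    )
    targets = ranked_nodes[:n_nodes]  # target index node set
    ranked_pairs = sorted(
        (node["qa"] for node in targets),
        key=lambda qa: second_sim(query_text, qa[0]),
        reverse=True,
    )
    return ranked_pairs[:n_pairs]  # target question-answer pairs


def jaccard(a, b):
    """Stand-in similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


index_nodes = [
    {"features": ["cough", "fever"], "qa": ("cough and fever", "answer 1")},
    {"features": ["headache"], "qa": ("headache at night", "answer 2")},
    {"features": ["cough"], "qa": ("dry cough", "answer 3")},
]
result = recommend(
    ["cough"], "dry cough at night", index_nodes,
    first_sim=jaccard,
    second_sim=lambda q, p: jaccard(q.split(), p.split()),
    n_nodes=2, n_pairs=1,
)
print(result)
```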

In one embodiment, as shown in Fig. 8, the device further includes:

a preprocessing module 802, configured to obtain the interrogation information set corresponding to previous interrogations and preprocess the interrogation information set;

a feature extraction module 804, configured to extract question-answer pairs from the preprocessed interrogation information set and perform feature extraction on the extracted question-answer pairs;

a storage module 806, configured to store each question-answer pair and its corresponding features in the interrogation database in association with each other;

an index building module 808, configured to build an index on the interrogation database according to the features.

In one embodiment, the feature extraction module 804 is further configured to: obtain the user identifier corresponding to each piece of interrogation information in the interrogation information set, where the user identifier is an interrogation user identifier or a doctor user identifier; filter the interrogation information corresponding to the doctor user identifier according to preset rules; and extract question-answer pairs from the filtered interrogation information set according to punctuation marks and interrogative words.

In one embodiment, the feature extraction module 804 is further configured to: segment the question in each extracted question-answer pair into words to obtain the word set corresponding to the question; and match each word in the word set against each word in the pre-established feature lexicon, and, when the match succeeds, take the word as an extracted feature.

In one embodiment, the target index node set obtaining module 706 is further configured to: calculate a feature weight for each feature word in the first feature word set to obtain a first calculation result, and select keywords according to the first calculation result to obtain the first keyword set corresponding to the current question to be answered; calculate a feature weight for each feature word in the second feature word set to obtain a second calculation result, and select keywords according to the second calculation result to obtain the second keyword set corresponding to each index node; obtain, according to the first keyword set and the second keyword set, the first word-frequency vector corresponding to the current question to be answered and the second word-frequency vector corresponding to each index node; and separately calculate the cosine of the angle between each first word-frequency vector and each second word-frequency vector to obtain the first similarity.

In one embodiment, the target index node set obtaining module 706 is further configured to: calculate the initial feature weight of each feature word in the first feature word set using the term frequency-inverse document frequency algorithm; when any feature word in the first feature word set satisfies the preset adjustment rule, adjust the initial feature weight of that feature word according to the preset adjustment rule to obtain the final feature weight; and when any feature word in the first feature word set does not satisfy the preset adjustment rule, take the initial feature weight as the final feature weight.

For specific limitations on the interrogation data recommendation device, reference may be made to the limitations on the interrogation data recommendation method above, which are not repeated here. Each module in the above interrogation data recommendation device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as question-answer pairs and the features corresponding to the question-answer pairs. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer program implements an interrogation data recommendation method.

Those skilled in the art will understand that the structure shown in Fig. 8 is only a block diagram of the part of the structure relevant to the solution of the application, and does not limit the computer device to which the solution of the application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the following steps: obtaining a current question to be answered, segmenting the current question to be answered into words, and extracting feature words according to the segmentation result to obtain the first feature word set corresponding to the current question to be answered; obtaining the second feature word set corresponding to each index node in a pre-established index; separately calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation results to select a preset number of index nodes as target index nodes, obtaining a target index node set; obtaining, from the interrogation database, the question-answer pair corresponding to each target index node in the target index node set; and separately calculating the second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation results to select target question-answer pairs, and performing interrogation data recommendation according to the selected target question-answer pairs.

In one embodiment, before the step of obtaining the current question to be answered, the processor, when executing the computer program, further implements the following steps: obtaining the interrogation information set corresponding to previous interrogations, and preprocessing the interrogation information set; extracting question-answer pairs from the preprocessed interrogation information set, and performing feature extraction on the extracted question-answer pairs; storing each question-answer pair and its corresponding features in the interrogation database in association with each other; and building an index on the interrogation database according to the features.

In one embodiment, extracting question-answer pairs from the preprocessed interrogation information includes: obtaining the user identifier corresponding to each piece of interrogation information in the interrogation information set, where the user identifier is an interrogation user identifier or a doctor user identifier; filtering the interrogation information corresponding to the doctor user identifier according to preset rules; and extracting question-answer pairs from the filtered interrogation information set according to punctuation marks and interrogative words.

In one embodiment, the step of separately calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node includes: calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain the first keyword set corresponding to the current question to be answered; calculating a feature weight for each feature word in the second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain the second keyword set corresponding to each index node; obtaining, according to the first keyword set and the second keyword set, the first word-frequency vector corresponding to the current question to be answered and the second word-frequency vector corresponding to each index node; and separately calculating the cosine of the angle between each first word-frequency vector and each second word-frequency vector to obtain the first similarity.

In one embodiment, calculating a feature weight for each feature word in the first feature word set to obtain the first calculation result includes: calculating the initial feature weight of each feature word in the first feature word set using the term frequency-inverse document frequency algorithm; when any feature word in the first feature word set satisfies the preset adjustment rule, adjusting the initial feature weight of that feature word according to the preset adjustment rule to obtain the final feature weight; and when any feature word in the first feature word set does not satisfy the preset adjustment rule, taking the initial feature weight as the final feature weight.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: obtaining a current question to be answered, segmenting the current question to be answered into words, and extracting feature words according to the segmentation result to obtain the first feature word set corresponding to the current question to be answered; obtaining the second feature word set corresponding to each index node in a pre-established index; separately calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation results to select a preset number of index nodes as target index nodes, obtaining a target index node set; obtaining, from the interrogation database, the question-answer pair corresponding to each target index node in the target index node set; and separately calculating the second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation results to select target question-answer pairs, and performing interrogation data recommendation according to the selected target question-answer pairs.

In one embodiment, before the step of obtaining the current question to be answered, the computer program, when executed by the processor, further implements the following steps: obtaining the interrogation information set corresponding to previous interrogations, and preprocessing the interrogation information set; extracting question-answer pairs from the preprocessed interrogation information set, and performing feature extraction on the extracted question-answer pairs; storing each question-answer pair and its corresponding features in the interrogation database in association with each other; and building an index on the interrogation database according to the features.

Those of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the application, all of which fall within the protection scope of the application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims

1. An interrogation data recommendation method, the method comprising:

obtaining a current question to be answered, segmenting the current question to be answered into words, and extracting feature words according to the segmentation result to obtain a first feature word set corresponding to the current question to be answered;

separately calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation results to select a preset number of index nodes as target index nodes, obtaining a target index node set;

separately calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation results to select target question-answer pairs, and performing interrogation data recommendation according to the selected target question-answer pairs.

2. The method according to claim 1, characterized in that, before the step of obtaining the current question to be answered, the method includes:

building an index on the interrogation database according to the features.

3. The method according to claim 2, characterized in that extracting question-answer pairs from the preprocessed interrogation information includes:

obtaining the user identifier corresponding to each piece of interrogation information in the interrogation information set, where the user identifier is an interrogation user identifier or a doctor user identifier;

4. The method according to claim 2 or 3, characterized in that performing feature extraction on the extracted question-answer pairs includes:

matching each word in the word set against each word in the pre-established feature lexicon, and, when the match succeeds, taking the word as an extracted feature.

5. the method according to claim 1, wherein it is described calculate separately it is described currently corresponding wait answer a question The step of the first similarity between fisrt feature set of words second feature set of words corresponding with each index node, comprising:

Calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered;

Calculating a feature weight for each feature word in each second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node;

Obtaining, according to the first keyword set and the second keyword sets, a first word frequency vector corresponding to the current question to be answered and a second word frequency vector corresponding to each index node;

Separately calculating the cosine of the angle between the first word frequency vector and each second word frequency vector to obtain the first similarity.
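The last two steps of this claim can be sketched as follows. The sketch assumes that the word frequency vectors are built over the union of the two keyword sets and that frequencies are raw counts; the claim does not spell out either choice, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence


def term_frequency_vector(words: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Count how often each vocabulary term occurs in the given words."""
    counts = Counter(words)
    return [counts[term] for term in vocabulary]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_similarity(query_words, query_keywords, node_words, node_keywords) -> float:
    # Assumption: the shared vocabulary is the union of both keyword sets,
    # sorted so that both vectors use the same term order.
    vocabulary = sorted(set(query_keywords) | set(node_keywords))
    return cosine(
        term_frequency_vector(query_words, vocabulary),
        term_frequency_vector(node_words, vocabulary),
    )
```

A value near 1 means the question and the index node use their keywords in similar proportions; a value of 0 means they share no keywords.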

6. The method according to claim 5, wherein calculating a feature weight for each feature word in the first feature word set to obtain the first calculation result comprises:

Calculating an initial feature weight for each feature word in the first feature word set using the term frequency-inverse document frequency (TF-IDF) algorithm;

When any feature word in the first feature word set satisfies a preset adjustment rule, adjusting the initial feature weight of that feature word according to the preset adjustment rule to obtain a final feature weight;

When any feature word in the first feature word set does not satisfy the preset adjustment rule, taking the initial feature weight of that feature word as the final feature weight.
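A rough sketch of this weighting scheme follows. The claim does not define the preset adjustment rule, so the rule used here (multiply the weight of any word found in a hypothetical `boosted_words` set by a fixed factor) is purely an illustrative assumption, as is the smoothed inverse document frequency variant.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence, Set


def tf_idf(words: Sequence[str], corpus: Sequence[Set[str]]) -> Dict[str, float]:
    """Initial feature weights: term frequency times a smoothed inverse document frequency."""
    counts = Counter(words)
    total = len(words)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (one common variant, chosen here as an assumption).
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        weights[word] = (count / total) * idf
    return weights


def final_weights(
    words: Sequence[str],
    corpus: Sequence[Set[str]],
    boosted_words: Iterable[str],  # hypothetical stand-in for the preset adjustment rule
    boost: float = 2.0,
) -> Dict[str, float]:
    boosted = set(boosted_words)
    initial = tf_idf(words, corpus)
    # Words that satisfy the rule get an adjusted weight; all others keep
    # their initial weight as the final weight.
    return {w: wt * boost if w in boosted else wt for w, wt in initial.items()}
```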

7. An interrogation data recommendation device, wherein the device comprises:

A first feature word set obtaining module, configured to obtain a current question to be answered, segment the current question to be answered into words, and extract feature words according to the word segmentation result to obtain a first feature word set corresponding to the current question to be answered;

A second feature word set obtaining module, configured to obtain the second feature word set corresponding to each index node in a pre-established index;

A target index node set obtaining module, configured to separately calculate a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and to sort the index nodes according to the first similarity calculation results so as to select a preset number of index nodes as target index nodes, thereby obtaining a target index node set;

A question-answer pair obtaining module, configured to obtain, from the interrogation database, the question-answer pairs corresponding to each target index node in the target index node set;

A recommending module, configured to separately calculate a second similarity between the current question to be answered and the question of each question-answer pair, sort the question-answer pairs according to the second similarity calculation results to select target question-answer pairs, and recommend interrogation data according to the selected target question-answer pairs.

8. The device according to claim 7, wherein the device further comprises:

A preprocessing module, configured to obtain an interrogation information set corresponding to past interrogations and to preprocess the interrogation information set;

A feature extraction module, configured to extract question-answer pairs from the preprocessed interrogation information set and to perform feature extraction on the extracted question-answer pairs;

A storage module, configured to store the question-answer pairs and their corresponding features in the interrogation database in association with each other;

9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.

10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.