CN109033244A - Search result ordering method and device - Google Patents

Publication number
CN109033244A
Authority
CN
China
Prior art keywords
candidate
correlation
described search
question
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810729232.9A
Other languages
Chinese (zh)
Other versions
CN109033244B (en)
Inventor
施振辉
陈俊
周景博
范斌
罗程亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810729232.9A
Publication of CN109033244A
Application granted
Publication of CN109033244B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a search result ordering method and device. The method includes: obtaining a user request and candidate results from first ranking results, where the user request includes a search question and the candidate results include candidate questions and the candidate answers corresponding to each candidate question; obtaining a first correlation metric between the search question and the candidate questions; obtaining a second correlation metric between the search question and the candidate answers; and reordering the first ranking results according to the first correlation metric and the second correlation metric to obtain second ranking results. Because more specific correlation metrics are added in the second ranking, the ranking results are not limited by a single ranking method, so accurate answer rankings can be provided more easily and certain specific problems can be handled.

Description

Search result ordering method and device
Technical field
The present invention relates to the technical field of automatic question answering, and in particular to a search result ordering method and device.
Background
With the rapid development of the Internet, a large number of search needs related to medical knowledge have emerged. Medical intelligent question answering services have arisen to meet these needs.
In medical automatic question answering, because of the particularity of medicine and the rigor required of answers, the main existing approach is to rank existing answer content by relevance and return an answer. However, methods that rely on a single relevance ranking are one-sided and limited, lack a comprehensive measure of question-answer correlation, and therefore struggle to provide accurate ranking results. Question answering methods from other fields cannot be directly extended to the medical field either.
Scheme (1) ranks based on question-to-question information and ignores the key information contained in the answers; obtaining good ranking results depends heavily on the quality of the question-answer groups in the original question-answer library.
Scheme (2) ranks based on question-to-answer information and ignores the key information contained in the questions. In the medical field, a slight deviation in the question may lead to a completely different answer, so the ranking can be inaccurate.
Scheme (3) ranks based on a fusion of questions and answers. Although it includes both question and answer information, any single ranking method has its own emphasis and cannot cope with the more complex scenarios in medical intelligent question answering.
Summary of the invention
Embodiments of the present invention provide a search result ordering method and device to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a search result ordering method, including:
obtaining a user request and candidate results from first ranking results, where the user request includes a search question and the candidate results include candidate questions and the candidate answers corresponding to each candidate question;
obtaining a first correlation metric between the search question and the candidate questions;
obtaining a second correlation metric between the search question and the candidate answers;
reordering the first ranking results according to the first correlation metric and the second correlation metric to obtain second ranking results.
With reference to the first aspect, in a first implementation of the first aspect, reordering the first ranking results according to the first correlation metric and the second correlation metric to obtain the second ranking results includes:
determining, according to the first correlation metric, the candidate question-answer groups included in a high priority list;
determining, according to the second correlation metric, the candidate question-answer groups included in a low priority list;
merging the candidate question-answer groups in the high priority list and the low priority list, with high priority first and low priority last, to obtain the second ranking results.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, determining the candidate question-answer groups included in the high priority list according to the first correlation metric includes:
if at least one first correlation metric of a candidate question-answer group is higher than a set threshold, adding the candidate question-answer group to the high priority list.
With reference to the first implementation of the first aspect, in a third implementation of the first aspect, determining the candidate question-answer groups included in the low priority list according to the second correlation metric includes:
if at least one second correlation metric of a candidate question-answer group is higher than a set threshold, adding the candidate question-answer group to the low priority list.
With reference to the first aspect, in a fourth implementation of the first aspect, obtaining the first correlation metric between the search question and the candidate questions includes at least one of the following:
calculating the word-level TF-IDF similarity between the search question and the candidate question;
calculating the character-level TF-IDF similarity between the search question and the candidate question;
calculating the pinyin-level TF-IDF similarity between the search question and the candidate question;
calculating the deep question similarity between the search question and the candidate question;
calculating the word vector similarity between the search question and the candidate question;
calculating the latent semantic indexing similarity between the search question and the candidate question.
With reference to the first aspect, in a fifth implementation of the first aspect, obtaining the second correlation metric between the search question and the candidate answers includes at least one of the following:
calculating the deep question-answer correlation between the search question and the candidate answer;
calculating the word-level TF-IDF correlation between the search question and the candidate answer;
calculating the character-level TF-IDF correlation between the search question and the candidate answer;
calculating the pinyin-level TF-IDF correlation between the search question and the candidate answer;
calculating the word vector correlation between the search question and the candidate answer;
calculating the latent semantic indexing correlation between the search question and the candidate answer.
In a second aspect, an embodiment of the present invention provides a search results ranking device, including:
a first sorting module, configured to obtain a user request and candidate results from first ranking results, where the user request includes a search question and the candidate results include candidate questions and the candidate answers corresponding to each candidate question;
a first correlation module, configured to obtain a first correlation metric between the search question and the candidate questions;
a second correlation module, configured to obtain a second correlation metric between the search question and the candidate answers;
a second sorting module, configured to reorder the first ranking results according to the first correlation metric and the second correlation metric to obtain second ranking results.
With reference to the second aspect, in a first implementation of the second aspect, the second sorting module includes:
a high priority submodule, configured to determine, according to the first correlation metric, the candidate question-answer groups included in a high priority list;
a low priority submodule, configured to determine, according to the second correlation metric, the candidate question-answer groups included in a low priority list;
a merge-and-sort submodule, configured to merge the candidate question-answer groups in the high priority list and the low priority list, with high priority first and low priority last, to obtain the second ranking results.
With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the high priority submodule is further configured to add a candidate question-answer group to the high priority list if at least one first correlation metric of that group is higher than a set threshold.
With reference to the first implementation of the second aspect, in a third implementation of the second aspect, the low priority submodule is further configured to add a candidate question-answer group to the low priority list if at least one second correlation metric of that group is higher than a set threshold.
With reference to the second aspect, in a fourth implementation of the second aspect, the first correlation module includes at least one of the following submodules:
a first word-level submodule, configured to calculate the word-level TF-IDF similarity between the search question and the candidate question;
a first character-level submodule, configured to calculate the character-level TF-IDF similarity between the search question and the candidate question;
a first pinyin-level submodule, configured to calculate the pinyin-level TF-IDF similarity between the search question and the candidate question;
a deep question submodule, configured to calculate the deep question similarity between the search question and the candidate question;
a first word vector submodule, configured to calculate the word vector similarity between the search question and the candidate question;
a first latent semantic indexing submodule, configured to calculate the latent semantic indexing similarity between the search question and the candidate question.
With reference to the second aspect, in a fifth implementation of the second aspect, the second correlation module includes at least one of the following submodules:
a deep question-answer submodule, configured to calculate the deep question-answer correlation between the search question and the candidate answer;
a second word-level submodule, configured to calculate the word-level TF-IDF correlation between the search question and the candidate answer;
a second character-level submodule, configured to calculate the character-level TF-IDF correlation between the search question and the candidate answer;
a second pinyin-level submodule, configured to calculate the pinyin-level TF-IDF correlation between the search question and the candidate answer;
a second word vector submodule, configured to calculate the word vector correlation between the search question and the candidate answer;
a second latent semantic indexing submodule, configured to calculate the latent semantic indexing correlation between the search question and the candidate answer.
In a third aspect, an embodiment of the present invention provides a search results ranking device. The functions of the device may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the search results ranking device includes a processor and a memory. The memory is used to store a program that supports the search results ranking device in executing the above search result ordering method, and the processor is configured to execute the program stored in the memory. The search results ranking device may further include a communication interface for communication between the device and other devices or communication networks.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing the computer software instructions used by the search results ranking device, including the program involved in executing the above search result ordering method.
One of the above technical solutions has the following advantage or beneficial effect: applying a reordering technique on top of the main ranking effectively avoids the shortcomings of attending to a single aspect and of extracting one-sided, limited correlation features.
Another of the above technical solutions has the following advantage or beneficial effect: the reordering technique is a core module in medical intelligent question answering. Adding a reordering module further optimizes the ranking results of medical intelligent question answering. In other words, starting from answers that have already been ranked, the positions of some results are adjusted so that more suitable answers move forward and unsuitable answers move back, thereby optimizing the ranking results.
The above summary is for the purpose of the specification only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments and features described above, further aspects, embodiments and features of the present invention will be readily apparent by reference to the drawings and the following detailed description.
Brief description of the drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar components or elements throughout the several figures. The drawings are not necessarily drawn to scale. It should be understood that the drawings depict only some embodiments disclosed according to the present invention and should not be taken as limiting the scope of the present invention.
Fig. 1 is the flow chart according to the search result ordering method of the embodiment of the present invention.
Fig. 2 is the flow chart according to the search result ordering method of the embodiment of the present invention.
Fig. 3 is the flow chart according to the search result ordering method of the embodiment of the present invention.
Fig. 4 is the block diagram according to the search results ranking device of the embodiment of the present invention.
Fig. 5 is the block diagram according to the search results ranking device of the embodiment of the present invention.
Fig. 6 is the structural block diagram according to the search results ranking device of the embodiment of the present invention.
Detailed description of the embodiments
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments can be modified in various ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature rather than restrictive.
Fig. 1 is the flow chart according to the search result ordering method of the embodiment of the present invention.
As shown in Figure 1, the search result ordering method may comprise steps of:
Step S110, obtain a user request and candidate results from first ranking results, where the user request includes a search question and the candidate results include candidate questions and the candidate answers corresponding to each candidate question.
Step S120, obtain a first correlation metric between the search question and the candidate questions.
Step S130, obtain a second correlation metric between the search question and the candidate answers.
Step S140, reorder the first ranking results according to the first correlation metric and the second correlation metric to obtain second ranking results.
In the field of intelligent question answering, a user can enter the question they want to ask (i.e. the search question) into a search engine according to their needs. For example, the candidate results retrieved for the search question may include several question-answer groups (candidate questions and their corresponding candidate answers). These candidate question-answer groups are then preliminarily ranked in one of several ways, for example: 1) Preliminary ranking based on question and question. The candidate questions and the search question are encoded, and ranking is performed according to the similarity between each candidate question and the search question. 2) Preliminary ranking based on question and answer. The candidate answers and the search question are encoded, and ranking is performed according to the similarity between each candidate answer and the search question. 3) Ranking based on a fusion of questions and answers. The candidate questions, candidate answers and search question are encoded, and ranking is performed according to a combined similarity.
After the first ranking, the first ranking results are obtained, and the user request and multiple candidate results can be obtained from them. The user request may include the search question entered by the user, and each candidate result may include one candidate question and its corresponding one or more candidate answers.
For the multiple candidate results in the first ranking results, the first correlation metric between the search question and the candidate questions and the second correlation metric between the search question and the candidate answers can be calculated. The candidate results are then reordered using both metrics to obtain ranking results that are more relevant to the search question and more accurate.
In one possible implementation, as shown in Fig. 2, step S140 includes:
Step S210, determine, according to the first correlation metric, the candidate question-answer groups included in a high priority list.
Step S220, determine, according to the second correlation metric, the candidate question-answer groups included in a low priority list.
Step S230, merge the candidate question-answer groups in the high priority list and the low priority list, with high priority first and low priority last, to obtain the second ranking results.
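The merge in step S230 can be sketched as follows. This is only an illustration: the function name `merge_priority_lists`, the tuple representation of a question-answer group, and the removal of groups that appear in both lists are assumptions, not details given in the description.

```python
from typing import Iterable, List, Tuple

QAGroup = Tuple[str, str]  # (candidate question, candidate answer); assumed representation


def merge_priority_lists(high: Iterable[QAGroup], low: Iterable[QAGroup]) -> List[QAGroup]:
    """Concatenate the two lists with high priority first.

    Dropping a group that already appeared earlier is an assumption made here
    so that the merged ranking contains each group at most once.
    """
    merged: List[QAGroup] = []
    seen = set()
    for group in list(high) + list(low):
        if group not in seen:
            seen.add(group)
            merged.append(group)
    return merged


high_priority = [("q2", "a2"), ("q5", "a5")]
low_priority = [("q1", "a1"), ("q2", "a2")]
second_ranking = merge_priority_lists(high_priority, low_priority)
print(second_ranking)
```

Within each list the existing order is preserved, so the relative order produced by the first ranking carries over into the second ranking.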
In one possible implementation, step S120 includes at least one of the following:
calculating the word-level TF-IDF (Term Frequency-Inverse Document Frequency) similarity between the search question and the candidate question;
calculating the character-level TF-IDF similarity between the search question and the candidate question;
calculating the pinyin-level TF-IDF similarity between the search question and the candidate question;
calculating the deep question similarity between the search question and the candidate question;
calculating the word vector similarity between the search question and the candidate question;
calculating the latent semantic indexing similarity between the search question and the candidate question.
For example, the search question and the candidate question can be segmented into words, and the word-level TF-IDF similarity is then calculated from the segmentation result. The search question and the candidate question can be split into individual characters, and the character-level TF-IDF similarity is then calculated from the character split. The pinyin of the search question and of the candidate question can be obtained separately, and the pinyin-level TF-IDF similarity is then calculated from the pinyin.
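A minimal sketch of word-level and character-level TF-IDF similarity is given below. It assumes cosine similarity over TF-IDF vectors with a smoothed IDF, and it uses whitespace splitting as a stand-in for a real word segmenter; all function names and the sample texts are illustrative assumptions rather than part of the described method.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def word_tokens(text: str) -> List[str]:
    """Whitespace split; a stand-in (assumption) for a real word segmenter."""
    return text.split()


def char_tokens(text: str) -> List[str]:
    """One token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def idf_table(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency over tokenized documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf_vector(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; tokens missing from the IDF table get weight 0."""
    return {tok: freq * idf.get(tok, 0.0) for tok, freq in Counter(tokens).items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tfidf_similarity(query: str, candidate: str, corpus: List[str], tokenize: Tokenizer) -> float:
    """Cosine similarity of TF-IDF vectors; IDF is estimated from corpus plus query."""
    docs = [tokenize(text) for text in corpus + [query]]
    idf = idf_table(docs)
    return cosine(tfidf_vector(tokenize(query), idf), tfidf_vector(tokenize(candidate), idf))


corpus = ["cough with a lot of sputum", "headache and fever", "cough at night"]
query = "cough with sputum"
for candidate in corpus:
    word_sim = tfidf_similarity(query, candidate, corpus, word_tokens)
    char_sim = tfidf_similarity(query, candidate, corpus, char_tokens)
    print(f"{candidate!r}: word={word_sim:.3f} char={char_sim:.3f}")
```

Swapping the tokenizer is the only difference between the two levels, which is why the same scoring code can also serve a pinyin-level variant once the text has been converted to pinyin tokens.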
The advantages of calculating the pinyin TF-IDF similarity are as follows:
Pinyin is one of the important differences between Chinese and English: each Chinese text corresponds to a unique pinyin sequence. Most users use a pinyin input method as their Chinese input tool, that is, they first type the pinyin of a character and then select the intended character from the several characters that share that pinyin. This operation makes wrong selections possible. When the same pinyin corresponds to different characters, for example "life" (生活) and "lighting a fire" (生火), both of which are spelled "shenghuo", the user may select the wrong homophone. In addition, because pinyin input is so widely used, users sometimes know only how a word is pronounced and not how its characters are written, which also affects the accuracy of Chinese input. In medical intelligent question answering scenarios, the medical search requests entered by all kinds of Internet users are often not standard text and may contain many wrong characters. Therefore, representing the text in pinyin before calculating text similarity can, to some extent, weaken the influence of wrong characters.
The pinyin TF-IDF can be calculated at the character level. For example, for a text S that includes Chinese characters, the Chinese characters in S are converted into their pinyin representation (ignoring tones), while the non-Chinese characters in S are kept as they are. Each independent pinyin syllable can be regarded as an independent character. For example, the Chinese text "coughing with a lot of sputum" (咳嗽痰多) is converted into the four characters "ke", "sou", "tan" and "duo". The pinyin TF-IDF similarity can then be calculated using the IDF features of the characters, the TF-IDF features of the text, cosine similarity, and so on.
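The pinyin conversion described above can be sketched with a toy lookup table. The table `PINYIN` and the function `to_pinyin_tokens` are hypothetical and cover only the characters of the two examples, which are written here with the Chinese characters they are assumed to correspond to; a real system would need a full pronunciation dictionary.

```python
from typing import Dict, List

# Toy character-to-pinyin table (hypothetical, tones dropped).
PINYIN: Dict[str, str] = {
    "咳": "ke", "嗽": "sou", "痰": "tan", "多": "duo",
    "生": "sheng", "活": "huo", "火": "huo",
}


def to_pinyin_tokens(text: str) -> List[str]:
    """Replace each known Chinese character with its toneless pinyin.

    Characters that are not in the table (including non-Chinese characters)
    are kept unchanged; whitespace is dropped.
    """
    return [PINYIN.get(ch, ch) for ch in text if not ch.isspace()]


print(to_pinyin_tokens("咳嗽痰多"))
print(to_pinyin_tokens("生活") == to_pinyin_tokens("生火"))
```

Because the two homophones map to the same token sequence, a similarity computed on pinyin tokens is unaffected by this kind of wrong-character input.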
The deep question similarity is also referred to as the deep QQ similarity. To implement the deep QQ similarity, several other questions Q' similar to each question Q can be obtained by methods such as question clustering, and a model is trained using pairwise learning-to-rank (Pairwise Learning). The search question and the candidate question are then input into the trained model to obtain the deep QQ similarity result.
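One common form of pairwise training objective is a margin (hinge) loss over triples of a question, a similar question, and a dissimilar question. The sketch below illustrates that objective only. The use of a hinge loss, the sampling of a dissimilar question, and the token-overlap scorer standing in for a deep model are all assumptions; the description does not specify the loss or the model.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (question Q, similar question Q', dissimilar question)


def pairwise_hinge_loss(score: Callable[[str, str], float],
                        triples: List[Triple],
                        margin: float = 1.0) -> float:
    """Mean of max(0, margin - score(Q, Q') + score(Q, dissimilar)) over all triples."""
    losses = [max(0.0, margin - score(q, pos) + score(q, neg)) for q, pos, neg in triples]
    return sum(losses) / len(losses) if losses else 0.0


def overlap_score(a: str, b: str) -> float:
    """Stand-in scorer (not a deep model): Jaccard overlap of whitespace tokens."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


triples = [("cough with sputum", "cough and sputum", "broken arm")]
print(pairwise_hinge_loss(overlap_score, triples))
print(pairwise_hinge_loss(overlap_score, triples, margin=0.3))
```

The loss reaches zero once the similar question is scored above the dissimilar one by at least the margin, which is the property a trained scorer is meant to satisfy.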
In one possible implementation, step S130 includes at least one of the following:
calculating the deep question-answer correlation between the search question and the candidate answer;
calculating the word-level TF-IDF correlation between the search question and the candidate answer;
calculating the character-level TF-IDF correlation between the search question and the candidate answer;
calculating the pinyin-level TF-IDF correlation between the search question and the candidate answer;
calculating the word vector correlation between the search question and the candidate answer;
calculating the latent semantic indexing correlation between the search question and the candidate answer.
Here, the deep QA (question-answer) correlation can mine the semantic relationship between the user's search question Q and a candidate answer A. Deep learning is used to calculate the correlation between the search question Q and the candidate answer A, in order to adjust the ranking results obtained from question-to-question similarity.
For example, in a medical intelligent question answering scenario, in addition to matching the text similarity between the user's search question Q_u and the candidate question Q_i, ranking accuracy can be further improved by matching the association between questions and answers. On the one hand, two questions may be completely different in wording yet semantically identical or very similar. If the answers corresponding to the two questions are identical or very similar, then even if Q_u and Q_i cannot be matched, a match can still be made through the association between Q_u and A_i. On the other hand, question-answer groups in the question-answer resource library may themselves be mismatched: it is difficult, or very costly, to make every question and answer in the library match exactly, so Q_i and its corresponding A_i in the library may not strictly match. In this case, the ranking results can also be fine-tuned using the deep QA correlation.
In one possible implementation, determining the candidate question-answer groups included in the high priority list according to the first correlation metric includes:
if at least one first correlation metric of a candidate question-answer group is higher than a set threshold, adding the candidate question-answer group to the high priority list.
In one possible implementation, determining the candidate question-answer groups included in the low priority list according to the second correlation metric includes:
if at least one second correlation metric of a candidate question-answer group is higher than a set threshold, adding the candidate question-answer group to the low priority list.
Each correlation metric may have its own set threshold, and the thresholds of different correlation metrics may differ. The first correlation metric mainly reflects the text similarity between question and question. The second correlation metric mainly reflects the correlation between question and answer. In embodiments of the present invention, the number and types of first and second correlation metrics can be selected according to the actual application scenario. Each correlation metric between the search question and a candidate question-answer group is then compared with its threshold, so that candidate question-answer groups are sorted into different priority lists.
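The "at least one metric above its own threshold" rule can be sketched as below. The metric names, the threshold values, and the choice to check the first correlation metrics before the second ones are illustrative assumptions.

```python
from typing import Dict


def exceeds_any(scores: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """True if at least one scored metric is above its own threshold."""
    return any(scores[name] > thresholds[name] for name in scores if name in thresholds)


# Hypothetical metric names and threshold values, for illustration only.
FIRST_THRESHOLDS = {"word_tfidf": 0.8, "char_tfidf": 0.85, "pinyin_tfidf": 0.9}
SECOND_THRESHOLDS = {"deep_qa": 0.7}


def assign_list(first_scores: Dict[str, float], second_scores: Dict[str, float]) -> str:
    """Return 'high', 'low', or 'none' for one candidate question-answer group.

    Checking the first (question-question) metrics before the second
    (question-answer) metrics is an assumption of this sketch.
    """
    if exceeds_any(first_scores, FIRST_THRESHOLDS):
        return "high"
    if exceeds_any(second_scores, SECOND_THRESHOLDS):
        return "low"
    return "none"


print(assign_list({"word_tfidf": 0.5, "char_tfidf": 0.9}, {"deep_qa": 0.1}))
print(assign_list({"word_tfidf": 0.5}, {"deep_qa": 0.75}))
```

Keeping the thresholds in a per-metric table makes it straightforward to add or remove metrics for a given application scenario.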
In one example, multiple metrics can be compared in a certain order. One metric is compared first and the qualifying question-answer groups are put into the corresponding priority list; the question-answer groups that do not qualify are then compared on another metric, and so on.
For example, if 10 of 100 question-answer groups have a word-level similarity to the search question that is higher than the set threshold, these 10 groups are added to the high priority list. The character-level similarity between the remaining 90 question-answer groups and the search question is then compared with its set threshold, and 20 more groups are obtained and added to the high priority list. The remaining metrics are handled in the same way and are not described again.
In another example, multiple metrics can be compared separately, followed by deduplication.
For example, the word-level similarity between 100 question-answer groups and the search question is compared, and the 10 groups whose word-level similarity is higher than the set threshold are selected. The character-level similarity between the same 100 question-answer groups and the search question is compared, and the 40 groups whose character-level similarity is higher than the set threshold are selected. After deduplication, 30 of these 40 question-answer groups are added to the high priority list (alternatively, the groups can be added to the high priority list first and deduplicated afterwards).
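The two comparison strategies in these examples can be contrasted in a short sketch. The dictionary layout of a question-answer group, the precomputed scores, and the function names are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Group = Dict[str, Any]                        # assumed layout: an id plus precomputed scores
Stage = Tuple[Callable[[Group], float], float]  # (metric function, threshold)


def select_in_sequence(groups: List[Group], stages: List[Stage]) -> Tuple[List[Group], List[Group]]:
    """Compare metrics one after another; only groups that failed so far move on."""
    selected: List[Group] = []
    remaining = list(groups)
    for metric, threshold in stages:
        still_remaining: List[Group] = []
        for group in remaining:
            (selected if metric(group) > threshold else still_remaining).append(group)
        remaining = still_remaining
    return selected, remaining


def select_then_dedupe(groups: List[Group], stages: List[Stage]) -> List[Group]:
    """Compare every metric against all groups, then keep each group only once."""
    selected: List[Group] = []
    seen = set()
    for metric, threshold in stages:
        for group in groups:
            if metric(group) > threshold and group["id"] not in seen:
                seen.add(group["id"])
                selected.append(group)
    return selected


groups = [
    {"id": 1, "word": 0.90, "char": 0.20},
    {"id": 2, "word": 0.30, "char": 0.95},
    {"id": 3, "word": 0.10, "char": 0.10},
    {"id": 4, "word": 0.85, "char": 0.90},
]
stages = [(lambda g: g["word"], 0.8), (lambda g: g["char"], 0.8)]

sequential, left_over = select_in_sequence(groups, stages)
print([g["id"] for g in sequential], [g["id"] for g in left_over])
print([g["id"] for g in select_then_dedupe(groups, stages)])
```

Both strategies select the same groups here; the sequential variant evaluates later metrics on fewer groups, while the separate variant evaluates every metric on every group and relies on deduplication.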
Further optimizing and adjusting the ranking results with multiple correlation metrics moves certain more suitable answers forward and unsuitable answers back, thereby optimizing the ranking results.
In a kind of example, based on the above search problem QuWith candidate problem QiSimilitude, and search problem QuWith time Select problem AiCorrelation, using method as shown in Figure 3, by each question and answer group (Q in ranking results beforei, Ai) according to row Sequence is successively handled from front to back, and steps are as follows:
Step S301, Q is calculateduWith QiWord rank TF-IDF similitude, if similitude be higher than a certain threshold value, by (Qi, Ai) the supreme prioritized results list of addition;If similitude is lower than a certain threshold value, the question and answer group is abandoned;Otherwise, it enters step S302。
For example, two threshold values Y1, Y2 can be arranged for word rank TF-IDF similitude, Y1 is greater than Y2.If the question and answer group QuWith QiWord rank TF-IDF similitude be greater than Y1, then be put into high priority list.If the Q of the question and answer groupuWith QiWord Rank TF-IDF similitude is less than Y2, then abandons the question and answer group, the question and answer group for obviously not having correlation can be excluded, after reduction The continuous quantity compared.Question and answer group between Y1 and Y2, can compare other correlation metrics.Various correlations in example refer to The setting of target threshold value is similar with manner of comparison, is not repeated to illustrate below.
Step S302: compute the character-level TF-IDF similarity between Q_u and Q_i. If the similarity is higher than a certain threshold, add (Q_i, A_i) to the high-priority result list; if the similarity is lower than a certain (lower) threshold, discard the question-answer group; otherwise, go to step S303.
Step S303: compute the pinyin-level TF-IDF similarity between Q_u and Q_i. If the similarity is higher than a certain threshold, add (Q_i, A_i) to the high-priority result list; if the similarity is lower than a certain (lower) threshold, discard the question-answer group; otherwise, go to step S304.
Step S304: compute the deep QQ (question-question) similarity between Q_u and Q_i. If the similarity is higher than a certain threshold, add (Q_i, A_i) to the high-priority result list; if the similarity is lower than a certain (lower) threshold, discard the question-answer group; otherwise, go to step S305.
Step S305: compute the word-vector similarity between Q_u and Q_i. If the similarity is higher than a certain threshold, add (Q_i, A_i) to the high-priority result list; if the similarity is lower than a certain (lower) threshold, discard the question-answer group; otherwise, go to step S306.
Step S306: compute the LSI (Latent Semantic Indexing) similarity between Q_u and Q_i. If the similarity is higher than a certain threshold, add (Q_i, A_i) to the high-priority result list; if the similarity is lower than a certain (lower) threshold, discard the question-answer group; otherwise, go to step S307.
Step S307: compute the deep QA (question-answer) correlation between Q_u and A_i. If the maximum deep correlation among the candidate results (those that did not enter the high-priority result list) is higher than a certain threshold, add the candidate results to the low-priority result list; otherwise, set the low-priority result list to empty. Then perform step S308.
Step S308: merge the two result lists on the principle that high priority comes first and low priority comes after. The merged order is the final ranking result.
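Steps S301 to S308 can be sketched as one cascade. This is a hedged illustration under stated assumptions rather than the patented implementation: `rerank`, its parameters, and the metric callables are hypothetical names, each question-question metric is given its own upper and lower threshold (the Y1 and Y2 of the example above), and the sketch assumes that the "candidate results" of step S307 are the groups that passed every question-question step without being promoted or discarded.

```python
from typing import Callable, List, Sequence, Tuple

QAGroup = Tuple[str, str]  # (candidate question Q_i, candidate answer A_i)
# (metric(Q_u, Q_i), upper threshold Y1, lower threshold Y2), with Y1 > Y2
QQMetric = Tuple[Callable[[str, str], float], float, float]


def rerank(query: str,
           ranked_groups: Sequence[QAGroup],
           qq_metrics: Sequence[QQMetric],
           qa_metric: Callable[[str, str], float],
           qa_threshold: float) -> List[QAGroup]:
    """Reorder the main ranking with cascaded question-question and question-answer checks."""
    high: List[QAGroup] = []
    undecided: List[QAGroup] = []

    for group in ranked_groups:  # front to back, so the original order is kept
        question, _answer = group
        for metric, upper, lower in qq_metrics:  # S301 to S306
            score = metric(query, question)
            if score > upper:
                high.append(group)  # promote and stop checking this group
                break
            if score < lower:
                break  # discard: clearly unrelated
        else:
            undecided.append(group)  # every metric fell between its two thresholds

    # S307: keep the undecided groups only if the best of them is relevant enough.
    low: List[QAGroup] = []
    if undecided:
        best = max(qa_metric(query, answer) for _question, answer in undecided)
        if best > qa_threshold:
            low = undecided

    return high + low  # S308: high priority first, low priority after


def token_overlap(a: str, b: str) -> float:
    """Toy stand-in for a real similarity metric: Jaccard overlap of whitespace tokens."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)
```

Because the metrics are passed in as a list, changing their order or swapping one metric for another does not require touching the cascade itself.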
It should be noted that the order of steps S301 to S308 can be adjusted as required. For the similarity and correlation between the user's search question and the candidate questions, and between the user's search question and the candidate answers, different metrics can be selected for reordering according to the actual application scenario; the embodiments of the present invention place no limitation on this.
In the embodiments of the present invention, a reordering method is added after the main ranking method. In scenarios such as medical intelligent question answering, this effectively addresses the problem that the ranking results of the main ranking method are not comprehensive (for example, one-sided or limited) and therefore cannot give a precise order. Because many specific correlation metrics can be added to the reordering, the ranking result takes more factors into account, so an accurate answer order can be given better and more easily, which helps in handling specific medical questions.
Fig. 4 is a block diagram of a search result ordering device according to an embodiment of the present invention. As shown in Fig. 4, the device includes:
a first sorting module 41, configured to obtain a user request and candidate results from a first ranking result, where the user request includes a search question, and the candidate results include candidate questions and the candidate answer corresponding to each candidate question;
a first correlation module 42, configured to obtain a first correlation metric between the search question and the candidate questions;
a second correlation module 43, configured to obtain a second correlation metric between the search question and the candidate answers;
a second sorting module 45, configured to reorder the first ranking result according to the first correlation metric and the second correlation metric to obtain a second ranking result.
In one possible implementation, the second sorting module 45 further includes:
a high-priority submodule 451, configured to determine, according to the first correlation metric, the candidate question-answer groups included in a high-priority list;
a low-priority submodule 452, configured to determine, according to the second correlation metric, the candidate question-answer groups included in a low-priority list;
a merge-and-sort submodule 453, configured to merge the candidate question-answer groups in the high-priority list and the low-priority list, with high priority first and low priority after, to obtain the second ranking result.
In one possible implementation, the high-priority submodule 451 is further configured to add a candidate question-answer group to the high-priority list if at least one first correlation metric of that group is higher than a set threshold.
In one possible implementation, the low-priority submodule 452 is further configured to add a candidate question-answer group to the low-priority list if at least one second correlation metric of that group is higher than a set threshold.
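The behaviour of submodules 451 to 453 can be sketched as below. The names `split_by_priority` and `merge_by_priority` are hypothetical, a single threshold per metric family is used for brevity (per-metric thresholds would work the same way), and the sketch assumes that a group already placed in the high-priority list is not also placed in the low-priority list, which the text does not state explicitly.

```python
from typing import Hashable, List, Mapping, Sequence, Tuple


def split_by_priority(order: Sequence[Hashable],
                      first_scores: Mapping[Hashable, Sequence[float]],
                      second_scores: Mapping[Hashable, Sequence[float]],
                      first_threshold: float,
                      second_threshold: float) -> Tuple[List[Hashable], List[Hashable]]:
    """Assign each candidate question-answer group to the high or low priority list."""
    high: List[Hashable] = []
    low: List[Hashable] = []
    for gid in order:  # keep the order of the first ranking result
        # High priority: at least one question-question metric above its threshold.
        if any(s > first_threshold for s in first_scores.get(gid, ())):
            high.append(gid)
        # Low priority: at least one question-answer metric above its threshold.
        elif any(s > second_threshold for s in second_scores.get(gid, ())):
            low.append(gid)
    return high, low


def merge_by_priority(high: Sequence[Hashable], low: Sequence[Hashable]) -> List[Hashable]:
    """High-priority groups first, low-priority groups after."""
    return list(high) + list(low)
```

Groups that reach neither list are simply absent from the second ranking result in this sketch.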
In one possible implementation, the first correlation module 42 includes at least one of the following submodules:
a first word-level submodule, configured to compute the word-level TF-IDF similarity between the search question and the candidate question;
a first character-level submodule, configured to compute the character-level TF-IDF similarity between the search question and the candidate question;
a first pinyin-level submodule, configured to compute the pinyin-level TF-IDF similarity between the search question and the candidate question;
a deep question submodule, configured to compute the deep question similarity between the search question and the candidate question;
a first word-vector submodule, configured to compute the word-vector similarity between the search question and the candidate question;
a first latent semantic indexing submodule, configured to compute the latent semantic indexing similarity between the search question and the candidate question.
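As an illustration of the word-level and character-level submodules, the sketch below computes a TF-IDF cosine similarity with a pluggable tokenizer. The patent does not specify the TF-IDF variant, so the smoothed IDF formula, the handling of unseen tokens, and all function names here are assumptions; the word tokenizer also assumes text that is already segmented by spaces, whereas real Chinese input would need a word segmenter.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

Tokenizer = Callable[[str], List[str]]


def word_tokens(text: str) -> List[str]:
    """Word level: assumes the text is already segmented by whitespace."""
    return text.split()


def char_tokens(text: str) -> List[str]:
    """Character level: every non-space character is a token."""
    return [c for c in text if not c.isspace()]


def idf_table(docs: Sequence[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency over a tokenized corpus."""
    n = len(docs)
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; unseen tokens are treated as rare as the rarest seen token."""
    default = max(idf.values(), default=1.0)
    return {t: c * idf.get(t, default) for t, c in Counter(tokens).items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def tfidf_similarity(a: str, b: str, corpus: Sequence[str], tokenize: Tokenizer) -> float:
    """TF-IDF cosine similarity of two questions at the granularity chosen by tokenize."""
    idf = idf_table([tokenize(doc) for doc in corpus])
    return cosine(tfidf_vector(tokenize(a), idf), tfidf_vector(tokenize(b), idf))
```

With this sketch, "头痛" and "头疼" share no word-level token but do share the character "头", so the character-level score is positive while the word-level score is zero. This is one reason to compute the same metric at several granularities.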
In one possible implementation, the second correlation module 43 includes at least one of the following submodules:
a deep question-answer submodule, configured to compute the deep question-answer correlation between the search question and the candidate answer;
a second word-level submodule, configured to compute the word-level TF-IDF correlation between the search question and the candidate answer;
a second character-level submodule, configured to compute the character-level TF-IDF correlation between the search question and the candidate answer;
a second pinyin-level submodule, configured to compute the pinyin-level TF-IDF correlation between the search question and the candidate answer;
a second word-vector submodule, configured to compute the word-vector correlation between the search question and the candidate answer;
a second latent semantic indexing submodule, configured to compute the latent semantic indexing correlation between the search question and the candidate answer.
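One common way to obtain a word-vector correlation between a question and an answer is to average the word vectors of each text and take the cosine of the two averages. The patent does not say how its word-vector metric is computed, so the sketch below is only one plausible reading; the function names and the embedding table passed in are hypothetical, and a real system would load pretrained vectors.

```python
import math
from typing import List, Mapping, Sequence


def mean_vector(tokens: Sequence[str],
                embeddings: Mapping[str, Sequence[float]],
                dim: int) -> List[float]:
    """Average the vectors of the tokens that have an embedding; zeros if none do."""
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        for i, x in enumerate(vec):
            total[i] += x
        count += 1
    return [x / count for x in total] if count else total


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_vector_relevance(question_tokens: Sequence[str],
                          answer_tokens: Sequence[str],
                          embeddings: Mapping[str, Sequence[float]],
                          dim: int) -> float:
    """Cosine of the averaged word vectors of the search question and a candidate answer."""
    return cosine(mean_vector(question_tokens, embeddings, dim),
                  mean_vector(answer_tokens, embeddings, dim))
```

Unlike the TF-IDF metrics, this score can be positive for a question and an answer that share no tokens, as long as their words lie close together in the embedding space.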
For the function of each module in the devices of the embodiments of the present invention, refer to the corresponding description of the method above; details are not repeated here.
Fig. 6 is a structural block diagram of a search result ordering device according to an embodiment of the present invention. As shown in Fig. 6, the device includes a memory 910 and a processor 920, and the memory 910 stores a computer program that can run on the processor 920. The processor 920 implements the search result ordering method of the above embodiments when it executes the computer program. There may be one or more memories 910 and one or more processors 920.
The device further includes:
a communication interface 930, configured to communicate with external devices and exchange data.
The memory 910 may include high-speed RAM and may also include non-volatile memory, for example at least one disk storage device.
If the memory 910, the processor 920, and the communication interface 930 are implemented independently, they can be connected to one another by a bus and communicate with one another over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 6, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 910, the processor 920, and the communication interface 930 are integrated on one chip, they can communicate with one another through internal interfaces.
An embodiment of the present invention provides a computer-readable storage medium that stores a computer program; when the program is executed by a processor, it implements any of the methods in the above embodiments.
In the description of this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means two or more, unless otherwise clearly and specifically limited.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if they are implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art can be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the above embodiment methods can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and when executed it performs one of the steps of the method embodiments or a combination of them.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any person skilled in the art can readily conceive of various changes or replacements within the technical scope disclosed by the present invention, and these shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (14)

1. A search result ordering method, characterized by comprising:
obtaining a user request and candidate results from a first ranking result, wherein the user request includes a search question, and the candidate results include candidate questions and the candidate answer corresponding to each candidate question;
obtaining a first correlation metric between the search question and the candidate questions;
obtaining a second correlation metric between the search question and the candidate answers;
reordering the first ranking result according to the first correlation metric and the second correlation metric to obtain a second ranking result.
2. The method according to claim 1, characterized in that reordering the first ranking result according to the first correlation metric and the second correlation metric to obtain the second ranking result comprises:
determining, according to the first correlation metric, the candidate question-answer groups included in a high-priority list;
determining, according to the second correlation metric, the candidate question-answer groups included in a low-priority list;
merging the candidate question-answer groups in the high-priority list and the low-priority list, with high priority first and low priority after, to obtain the second ranking result.
3. The method according to claim 2, characterized in that determining, according to the first correlation metric, the candidate question-answer groups included in the high-priority list comprises:
if at least one first correlation metric of a candidate question-answer group is higher than a set threshold, adding the candidate question-answer group to the high-priority list.
4. The method according to claim 2, characterized in that determining, according to the second correlation metric, the candidate question-answer groups included in the low-priority list comprises:
if at least one second correlation metric of a candidate question-answer group is higher than a set threshold, adding the candidate question-answer group to the low-priority list.
5. The method according to claim 1, characterized in that obtaining the first correlation metric between the search question and the candidate questions comprises at least one of the following:
computing the word-level TF-IDF similarity between the search question and the candidate question;
computing the character-level TF-IDF similarity between the search question and the candidate question;
computing the pinyin-level TF-IDF similarity between the search question and the candidate question;
computing the deep question similarity between the search question and the candidate question;
computing the word-vector similarity between the search question and the candidate question;
computing the latent semantic indexing similarity between the search question and the candidate question.
6. The method according to claim 1, characterized in that obtaining the second correlation metric between the search question and the candidate answers comprises at least one of the following:
computing the deep question-answer correlation between the search question and the candidate answer;
computing the word-level TF-IDF correlation between the search question and the candidate answer;
computing the character-level TF-IDF correlation between the search question and the candidate answer;
computing the pinyin-level TF-IDF correlation between the search question and the candidate answer;
computing the word-vector correlation between the search question and the candidate answer;
computing the latent semantic indexing correlation between the search question and the candidate answer.
7. A search result ordering device, characterized by comprising:
a first sorting module, configured to obtain a user request and candidate results from a first ranking result, wherein the user request includes a search question, and the candidate results include candidate questions and the candidate answer corresponding to each candidate question;
a first correlation module, configured to obtain a first correlation metric between the search question and the candidate questions;
a second correlation module, configured to obtain a second correlation metric between the search question and the candidate answers;
a second sorting module, configured to reorder the first ranking result according to the first correlation metric and the second correlation metric to obtain a second ranking result.
8. The device according to claim 7, characterized in that the second sorting module comprises:
a high-priority submodule, configured to determine, according to the first correlation metric, the candidate question-answer groups included in a high-priority list;
a low-priority submodule, configured to determine, according to the second correlation metric, the candidate question-answer groups included in a low-priority list;
a merge-and-sort submodule, configured to merge the candidate question-answer groups in the high-priority list and the low-priority list, with high priority first and low priority after, to obtain the second ranking result.
9. The device according to claim 8, characterized in that the high-priority submodule is further configured to add a candidate question-answer group to the high-priority list if at least one first correlation metric of that group is higher than a set threshold.
10. The device according to claim 8, characterized in that the low-priority submodule is further configured to add a candidate question-answer group to the low-priority list if at least one second correlation metric of that group is higher than a set threshold.
11. The device according to claim 7, characterized in that the first correlation module comprises at least one of the following submodules:
a first word-level submodule, configured to compute the word-level TF-IDF similarity between the search question and the candidate question;
a first character-level submodule, configured to compute the character-level TF-IDF similarity between the search question and the candidate question;
a first pinyin-level submodule, configured to compute the pinyin-level TF-IDF similarity between the search question and the candidate question;
a deep question submodule, configured to compute the deep question similarity between the search question and the candidate question;
a first word-vector submodule, configured to compute the word-vector similarity between the search question and the candidate question;
a first latent semantic indexing submodule, configured to compute the latent semantic indexing similarity between the search question and the candidate question.
12. The device according to claim 7, characterized in that the second correlation module comprises at least one of the following submodules:
a deep question-answer submodule, configured to compute the deep question-answer correlation between the search question and the candidate answer;
a second word-level submodule, configured to compute the word-level TF-IDF correlation between the search question and the candidate answer;
a second character-level submodule, configured to compute the character-level TF-IDF correlation between the search question and the candidate answer;
a second pinyin-level submodule, configured to compute the pinyin-level TF-IDF correlation between the search question and the candidate answer;
a second word-vector submodule, configured to compute the word-vector correlation between the search question and the candidate answer;
a second latent semantic indexing submodule, configured to compute the latent semantic indexing correlation between the search question and the candidate answer.
13. A search result ordering device, characterized in that the device comprises:
one or more processors;
a storage device, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 6.
14. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN201810729232.9A 2018-07-05 2018-07-05 Search result ordering method and device Active CN109033244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810729232.9A CN109033244B (en) 2018-07-05 2018-07-05 Search result ordering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810729232.9A CN109033244B (en) 2018-07-05 2018-07-05 Search result ordering method and device

Publications (2)

Publication Number Publication Date
CN109033244A true CN109033244A (en) 2018-12-18
CN109033244B CN109033244B (en) 2020-10-16

Family

ID=65522449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810729232.9A Active CN109033244B (en) 2018-07-05 2018-07-05 Search result ordering method and device

Country Status (1)

Country Link
CN (1) CN109033244B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110825864A (en) * 2019-11-13 2020-02-21 北京香侬慧语科技有限责任公司 Method and device for obtaining answers to questions
CN110851484A (en) * 2019-11-13 2020-02-28 北京香侬慧语科技有限责任公司 Method and device for obtaining multi-index question answers
CN112784600A (en) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Information sorting method and device, electronic equipment and storage medium
CN113326420A (en) * 2021-06-15 2021-08-31 北京百度网讯科技有限公司 Question retrieval method, device, electronic equipment and medium
CN113761084A (en) * 2020-06-03 2021-12-07 北京四维图新科技股份有限公司 POI search ranking model training method, ranking device, method and medium
CN115203598A (en) * 2022-07-20 2022-10-18 贝壳找房(北京)科技有限公司 Information sorting method, electronic device and storage medium in real estate field
CN116013488A (en) * 2023-03-27 2023-04-25 中国人民解放军总医院第六医学中心 Intelligent security management system for medical records with self-adaptive data rearrangement function

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8412514B1 (en) * 2005-10-27 2013-04-02 At&T Intellectual Property Ii, L.P. Method and apparatus for compiling and querying a QA database
CN108153876A (en) * 2017-12-26 2018-06-12 爱因互动科技发展(北京)有限公司 Intelligent answer method and system
CN108170739A (en) * 2017-12-18 2018-06-15 深圳前海微众银行股份有限公司 Problem matching process, terminal and computer readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8412514B1 (en) * 2005-10-27 2013-04-02 At&T Intellectual Property Ii, L.P. Method and apparatus for compiling and querying a QA database
CN108170739A (en) * 2017-12-18 2018-06-15 深圳前海微众银行股份有限公司 Problem matching process, terminal and computer readable storage medium
CN108153876A (en) * 2017-12-26 2018-06-12 爱因互动科技发展(北京)有限公司 Intelligent answer method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周亦鹏 等: "《软件人主题分析和信息检索技术》", 31 August 2012, 北京邮电大学出版社 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110825864A (en) * 2019-11-13 2020-02-21 北京香侬慧语科技有限责任公司 Method and device for obtaining answers to questions
CN110851484A (en) * 2019-11-13 2020-02-28 北京香侬慧语科技有限责任公司 Method and device for obtaining multi-index question answers
CN113761084A (en) * 2020-06-03 2021-12-07 北京四维图新科技股份有限公司 POI search ranking model training method, ranking device, method and medium
CN113761084B (en) * 2020-06-03 2023-08-08 北京四维图新科技股份有限公司 POI search ranking model training method, ranking device, method and medium
CN112784600A (en) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Information sorting method and device, electronic equipment and storage medium
CN112784600B (en) * 2021-01-29 2024-01-16 北京百度网讯科技有限公司 Information ordering method, device, electronic equipment and storage medium
CN113326420A (en) * 2021-06-15 2021-08-31 北京百度网讯科技有限公司 Question retrieval method, device, electronic equipment and medium
CN113326420B (en) * 2021-06-15 2023-10-27 北京百度网讯科技有限公司 Question retrieval method, device, electronic equipment and medium
US11977567B2 (en) 2021-06-15 2024-05-07 Beijing Baidu Netcom Science Technology Co., Ltd. Method of retrieving query, electronic device and medium
CN115203598A (en) * 2022-07-20 2022-10-18 贝壳找房(北京)科技有限公司 Information sorting method, electronic device and storage medium in real estate field
CN115203598B (en) * 2022-07-20 2023-09-19 贝壳找房(北京)科技有限公司 Information ordering method in real estate field, electronic equipment and storage medium
CN116013488A (en) * 2023-03-27 2023-04-25 中国人民解放军总医院第六医学中心 Intelligent security management system for medical records with self-adaptive data rearrangement function

Also Published As

Publication number Publication date
CN109033244B (en) 2020-10-16

Similar Documents

Publication Publication Date Title
CN109033244A (en) Search result ordering method and device
Wang et al. K-adapter: Infusing knowledge into pre-trained models with adapters
US10628472B2 (en) Answering questions via a persona-based natural language processing (NLP) system
US10380149B2 (en) Question sentence generating device and computer program
RU2701110C2 (en) Studying and using contextual rules of extracting content to eliminate ambiguity of requests
US11481417B2 (en) Generation and utilization of vector indexes for data processing systems and methods
US11468238B2 (en) Data processing systems and methods
WO2018018626A1 (en) Conversation oriented machine-user interaction
CN116134432A (en) System and method for providing answers to queries
CN108846138B (en) Question classification model construction method, device and medium fusing answer information
US11455357B2 (en) Data processing systems and methods
CN114341841A (en) Building answers to queries by using depth models
WO2023236253A1 (en) Document retrieval method and apparatus, and electronic device
WO2022005573A1 (en) Interactive search training
WO2021092272A1 (en) Qa-bots for information search in documents using paraphrases
CN110717008B (en) Search result ordering method and related device based on semantic recognition
JP2017151588A (en) Image evaluation learning device, image evaluation device, image searching device, image evaluation learning method, image evaluation method, image searching method, and program
CN109657043B (en) Method, device and equipment for automatically generating article and storage medium
WO2016009321A1 (en) System for searching, recommending, and exploring documents through conceptual associations and inverted table for storing and querying conceptual indices
US20190318220A1 (en) Dispersed template-based batch interaction with a question answering system
CN113571196A (en) Method and device for constructing medical training sample and method for retrieving medical text
CN107784112A (en) Short text data Enhancement Method, system and detection authentication service platform
Secker et al. AISIID: An artificial immune system for interesting information discovery on the web
US10474726B2 (en) Generation of digital documents
EA002016B1 (en) A method of searching for fragments with similar text and/or semantic contents in electronic documents stored on a data storage devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant