CN105095385A - Method and device for outputting retrieval result - Google Patents

Info

Publication number
CN105095385A
Authority
CN
China
Prior art keywords
searching keyword
class
natural language
training
carried out
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510376979.7A
Other languages
Chinese (zh)
Other versions
CN105095385B (en)
Inventor
刘水
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510376979.7A
Publication of CN105095385A
Application granted
Publication of CN105095385B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/248: Presentation of query results
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/242: Query formulation
    • G06F16/2433: Query languages
    • G06F16/244: Grouping and aggregation

Abstract

The invention provides a method and a device for outputting a retrieval result. The method comprises: receiving a query keyword; performing term analysis on the query keyword according to a training model obtained by training on natural-language-type query keywords and term-type query keywords, to obtain an analysis result; and performing basic retrieval on the analysis result and outputting the retrieval results ranked by relevance. Because the training model is obtained from natural-language-type and term-type query keywords, and term analysis is performed according to that model, the term analysis is driven dynamically by user-behavior feedback. This improves the timeliness and accuracy of the term analysis, so that a user can obtain the required retrieval result simply and quickly.

Description

Method and device for outputting a retrieval result
Technical field
The present invention relates to the field of computers, and in particular to a method and a device for outputting a retrieval result.
Background art
Current techniques for computing the relevance of a term mainly use one of three kinds of method: 1) heuristic rule algorithms, 2) unsupervised algorithms, and 3) supervised algorithms.
Each has drawbacks. 1) A heuristic rule algorithm merely builds linguistic rules into the model; its extraction is static and cannot change with users' query habits or feedback, so retrieval results are neither timely nor accurate. 2) An unsupervised algorithm extracts terms in a way that is not intuitive and requires iterative computation, which is cumbersome. 3) A supervised algorithm requires a large annotated corpus, and the volume of data makes computation expensive.
Summary of the invention
One of the technical problems solved by the present invention is that accurate retrieval results cannot be obtained simply and in real time according to users' query habits or feedback.
According to an embodiment of one aspect of the present invention, a method for outputting a retrieval result is provided, comprising:
Receiving a query keyword;
Performing term analysis on the query keyword according to a training model obtained by training on natural-language-type query keywords and term-type query keywords, to obtain an analysis result;
Performing basic retrieval on the analysis result, and outputting the retrieval results ranked by relevance.
According to an embodiment of another aspect of the present invention, a device for outputting a retrieval result is provided, comprising:
A device for receiving a query keyword;
A device for performing term analysis on the query keyword according to a training model obtained by training on natural-language-type query keywords and term-type query keywords, to obtain an analysis result;
A device for performing basic retrieval on the analysis result and outputting the retrieval results ranked by relevance.
Because this embodiment obtains the training model by training on natural-language-type query keywords and term-type query keywords, and performs term analysis on the query keyword according to that model, the term analysis is driven dynamically by user-behavior feedback. This improves the timeliness and accuracy of the term analysis, so that users can obtain the retrieval results they need simply and quickly.
Those of ordinary skill in the art will understand that, although the detailed description below refers to illustrated embodiments and drawings, the present invention is not limited to these embodiments. Rather, the scope of the present invention is broad and is intended to be limited only by the appended claims.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
Fig. 1 shows a flowchart of the method for outputting a retrieval result according to an embodiment of the present invention.
Fig. 2 shows a flowchart of the process of building the training model in the method for outputting a retrieval result according to an alternative embodiment of the present invention.
Fig. 3 shows a block diagram of the device for outputting a retrieval result according to an embodiment of the present invention.
Fig. 4 shows a block diagram of the model building device in the device for outputting a retrieval result according to an embodiment of the present invention.
In the drawings, the same or similar reference numerals denote the same or similar components.
Detailed description of the embodiments
Before the exemplary embodiments are discussed in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram, and so on.
The "computer device" referred to in this context, also called a "computer", is an intelligent electronic device that can carry out predetermined processing such as numerical and/or logical computation by running predetermined programs or instructions. It may comprise a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined processing; or the processing may be carried out by hardware such as an ASIC, FPGA or DSP; or by a combination of the two. Computer devices include, but are not limited to, servers, personal computers, notebook computers, tablet computers and smartphones.
The computer device includes user equipment and network equipment. The user equipment includes, but is not limited to, computers, smartphones and PDAs. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The computer device can implement the present invention by running alone, or by accessing a network and interacting with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks and VPNs.
It should be noted that the user equipment, network equipment and networks above are only examples; other existing or future computer devices or networks, where applicable to the present invention, are also intended to fall within the scope of protection of the present invention and are incorporated herein by reference.
The methods discussed below (some of which are illustrated by flowcharts) can be implemented in hardware, software, firmware, middleware, microcode, a hardware description language, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments that perform the necessary tasks can be stored in a machine-readable or computer-readable medium (such as a storage medium). One or more processors can perform the necessary tasks.
The specific structural and functional details disclosed herein are merely representative and serve to describe exemplary embodiments of the present invention. The present invention can, however, be embodied in many alternative forms and should not be construed as limited to the embodiments set forth herein.
It should be understood that, although the terms "first", "second" and so on may be used herein to describe units, these units should not be limited by these terms. The terms are used only to distinguish one unit from another. For example, without departing from the scope of the exemplary embodiments, a first unit could be called a second unit, and similarly a second unit could be called a first unit. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It should be understood that when a unit is referred to as being "connected" or "coupled" to another unit, it can be directly connected or coupled to the other unit, or intermediate units may be present. By contrast, when a unit is referred to as being "directly connected" or "directly coupled" to another unit, no intermediate units are present. Other words used to describe the relationship between units should be interpreted in the same way (for example "between" versus "directly between", and "adjacent to" versus "directly adjacent to").
The terminology used herein serves only to describe specific embodiments and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" as used herein are intended to include the plural as well. It should also be understood that the terms "comprise" and/or "include" as used herein specify the presence of stated features, integers, steps, operations, units and/or components, and do not exclude the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.
It should also be noted that, in some alternative implementations, the functions/acts mentioned may occur in an order different from that indicated in the drawings. For example, depending on the functions/acts involved, two figures shown in succession may in fact be executed substantially simultaneously or sometimes in the reverse order.
The present invention is described in further detail below with reference to the drawings.
Fig. 1 is a flowchart of the method for outputting a retrieval result according to an embodiment of the present invention.
As shown in Fig. 1, the method for outputting a retrieval result in this embodiment comprises the following steps:
S100: receive a query keyword;
S110: perform term analysis on the query keyword according to a training model obtained by training on natural-language-type query keywords and term-type query keywords, to obtain an analysis result;
S120: perform basic retrieval on the analysis result, and output the retrieval results ranked by relevance.
Each step is described in further detail below.
In step S100, the query keyword (query) is the query, received by the retrieval system, that the user intends to search for.
In step S110, term analysis is performed on the query keyword according to the training model obtained by training on natural-language-type query keywords and term-type query keywords (terms), to obtain the analysis result.
As shown in Fig. 2, in an alternative embodiment of the present invention, the training model can be built as follows:
Step 1100: group the query keywords according to the retrieval history;
Step 1101: identify the type of each query keyword in every group, the types being natural-language-type query keywords and term-type query keywords;
Step 1102: obtain the training model after performing feature training on the natural-language-type query keywords and discretization training on the term-type query keywords.
In step 1100, queries are grouped according to the URL (Uniform Resource Locator) that the user clicked. Because the intent behind clicks on some URLs is ambiguous, URLs are generally filtered by the character length, field delimiters and address-structure separators of the clicked URL before the queries are grouped.
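The grouping in step 1100 can be sketched as follows. This is only an illustration under stated assumptions: the text names the three filtering criteria but gives no values, so the thresholds `MAX_URL_LENGTH`, `MAX_QUERY_FIELDS` and `MAX_PATH_DEPTH`, the function names, and the `(query, clicked_url)` log format are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical thresholds: the text names the criteria but gives no values.
MAX_URL_LENGTH = 120    # character length of the clicked URL
MAX_QUERY_FIELDS = 2    # fields separated by the "&" field delimiter
MAX_PATH_DEPTH = 4      # segments separated by the "/" address separator


def url_has_clear_intent(url):
    """Keep a URL only if it is short and structurally simple."""
    parts = urlsplit(url)
    field_count = len([f for f in parts.query.split("&") if f])
    path_depth = len([p for p in parts.path.split("/") if p])
    return (len(url) <= MAX_URL_LENGTH
            and field_count <= MAX_QUERY_FIELDS
            and path_depth <= MAX_PATH_DEPTH)


def group_queries_by_url(click_log):
    """click_log: iterable of (query, clicked_url) pairs from the history."""
    groups = defaultdict(list)
    for query, url in click_log:
        if url_has_clear_intent(url):
            groups[url].append(query)
    return dict(groups)
```

Queries that led to the same retained URL end up in one group, which the later steps treat as a set of queries with the same intent.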
Step 1101 is described taking one group as an example. The determination for each group comprises:
First, the natural-language-type query in the group is determined. This comprises: collecting statistics on the distribution of queries in the group, computing a confidence for each query from those statistics, and selecting the query with the highest confidence and the greatest length as the natural-language-type query. The confidence is generally determined from the titles of documents in an existing document library, but can also be determined in other ways. A query length greater than 7 words is generally most preferred, but the length is not strictly limited.
Then, the term-type queries in the group are determined. This comprises: taking each query in the group whose length ratio to the natural-language-type query falls within a set range, and which has a predetermined degree of overlap with that natural-language-type query, as a term-type query corresponding to the natural-language-type query. A set value of 2 for the length ratio between the term-type query and the natural-language-type query is generally most preferred, but it can be adjusted as appropriate to the required query precision.
Optionally, a group that contains no term-type query is discarded: a group that has only a natural-language-type query and no term-type query indicates that the grouping is unsuitable.
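A minimal sketch of the type identification in step 1101 follows. Several points are assumptions rather than the patent's method: confidence is approximated by a query's share of the group's traffic, length is counted in whitespace-separated words, the length ratio of 2 is read as "the natural-language query is at least twice as long", and the overlap threshold `MIN_OVERLAP` is a hypothetical value.

```python
from collections import Counter

MIN_NL_LENGTH = 7        # the text prefers natural-language queries longer than 7 words
MIN_LENGTH_RATIO = 2.0   # assumed reading of the preferred length ratio of 2
MIN_OVERLAP = 0.5        # hypothetical overlap threshold; no value is given


def identify_types(group):
    """group: list of queries (with repeats) that led to the same URL.

    Returns (natural_language_query, term_queries), or None when the
    group should be discarded.
    """
    counts = Counter(group)
    total = sum(counts.values())
    long_queries = [q for q in counts if len(q.split()) > MIN_NL_LENGTH]
    if not long_queries:
        return None
    # Assumed confidence: share of the group's traffic; ties go to the longer query.
    nl_query = max(long_queries,
                   key=lambda q: (counts[q] / total, len(q.split())))
    nl_tokens = nl_query.split()
    term_queries = []
    for q in counts:
        tokens = q.split()
        if q == nl_query or not tokens:
            continue
        ratio = len(nl_tokens) / len(tokens)
        overlap = sum(t in nl_tokens for t in tokens) / len(tokens)
        if ratio >= MIN_LENGTH_RATIO and overlap >= MIN_OVERLAP:
            term_queries.append(q)
    # A group without any term-type query is discarded.
    return (nl_query, term_queries) if term_queries else None
```

Returning `None` for a group with no term-type query mirrors the optional discard rule described above.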
Step 1102, in which the training model is obtained after feature training on the natural-language-type query and discretization training on the term-type queries, is again described taking one group as an example. It comprises:
First, the natural-language-type query in the group is turned into features after linguistic analysis. Two kinds of feature are selected: 1. Contextual features of the natural-language-type query, for which the part-of-speech tagging results of an n-gram model (a language model commonly used in large-vocabulary continuous speech recognition, also known as a Chinese language model) and the like can be used; that is, the part of speech of the current word, of the preceding word and/or of the two preceding words. 2. The distribution of each natural-language-type query in the group across the natural-language-type queries of all groups in the retrieval history, for which statistical weighting methods such as tf (term frequency) and/or idf (inverse document frequency) can be used.
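The two kinds of feature can be illustrated with the sketch below. It assumes the query has already been segmented and part-of-speech tagged by some external tagger, so the input is a list of `(word, pos)` pairs; the function names, the `"<s>"` padding symbol and the smoothed idf formula are illustrative choices, not details from the text.

```python
import math


def context_features(tagged_query, index):
    """Part of speech of the current word and of the one and two words before it.

    tagged_query: list of (word, pos) pairs from any part-of-speech tagger.
    """
    def pos_at(i):
        # "<s>" is a hypothetical padding tag for positions before the query start.
        return tagged_query[i][1] if i >= 0 else "<s>"

    return {
        "pos": pos_at(index),
        "pos-1": pos_at(index - 1),
        "pos-2": pos_at(index - 2),
    }


def tf_idf(word, query_words, all_queries):
    """tf inside one query times a smoothed idf over all natural-language queries.

    all_queries: list of word lists, one per natural-language-type query.
    """
    tf = query_words.count(word) / len(query_words)
    df = sum(1 for q in all_queries if word in q)
    idf = math.log((1 + len(all_queries)) / (1 + df)) + 1.0
    return tf * idf
```

A word that appears in many queries receives a lower weight than an equally frequent word that is rare across the history.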
Second, the frequency of the term-type queries in the group is counted, and the count is discretized to obtain the importance of each term-type query. That is, for the term-type queries in each group, their frequency among the term-type queries of all groups in the retrieval history is counted; according to the counts, the importance of a term-type query can be divided into 3 levels, denoted 0, 1 and 2, where 0 marks the most important term-type queries.
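The discretization can be sketched as below. The text fixes the three levels and that 0 is the most important, but not the bin boundaries; equal-width bins relative to the most frequent term, and counting individual words of the term-type queries, are assumptions made for illustration.

```python
from collections import Counter


def importance_levels(term_queries_by_group):
    """Map each term to level 0, 1 or 2 (0 = most important).

    term_queries_by_group: list of groups, each a list of term-type queries.
    Bin boundaries (thirds of the maximum count) are an assumed choice.
    """
    counts = Counter(term
                     for group in term_queries_by_group
                     for query in group
                     for term in query.split())
    if not counts:
        return {}
    top = max(counts.values())
    levels = {}
    for term, n in counts.items():
        share = n / top
        levels[term] = 0 if share > 2 / 3 else 1 if share > 1 / 3 else 2
    return levels
```

Binning by share of the maximum count keeps terms with equal counts on the same level, which a rank-based split would not guarantee.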
Finally, the features of the natural-language-type queries and the importance of the term-type queries are combined, and a regression model is obtained by training based on GBDT (Gradient Boosting Decision Tree). Optionally, the regression model can instead use linear regression, logistic regression, SVM-rank (a ranking support vector machine), or the like.
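To make the training step concrete, here is a deliberately small gradient-boosting sketch in plain Python. It is not the patent's implementation: it assumes squared-error loss, depth-1 trees (stumps), and feature vectors that have already been encoded as numbers (for example one-hot part-of-speech tags plus a tf-idf value), with the importance levels 0, 1 and 2 as regression targets. All function names are hypothetical.

```python
def fit_stump(X, residuals):
    """Best single-split regression tree (depth 1) under squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            error = (sum((r - lmean) ** 2 for r in left)
                     + sum((r - rmean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, lmean, rmean)
    return best[1:] if best else None


def fit_gbdt(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the residuals, i.e. the negative gradient of squared loss."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        j, threshold, lmean, rmean = stump
        predictions = [
            p + learning_rate * (lmean if row[j] <= threshold else rmean)
            for row, p in zip(X, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (lmean if row[j] <= threshold else rmean)
        for j, threshold, lmean, rmean in stumps
    )
```

In practice a library implementation of gradient-boosted trees would replace this sketch; the point here is only that each round fits a small tree to what the current model still gets wrong.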
The embodiment of the present invention can be applied in a large-scale retrieval system. It mainly analyses natural-language-type queries: the important term-type queries within a natural-language-type query are extracted and their importance computed, the terms of higher importance are used for the query, and the retrieval results are returned according to the importance analysis and thus output ranked by relevance. The embodiment falls within the category of word-importance analysis for natural-language-type queries. Based on the retrieval history of a massive number of users, it combines the respective advantages of natural-language-type and term-type queries: within a group of synonymous natural-language-type queries, the frequency in term-type queries of each term of the query is taken as the target, the syntactic context of the natural-language-type query is taken as the features, and a term-ranking regression model is trained under the GBDT ranking framework.
Compared with the prior art, the embodiment makes full use of the retrieval history of a massive number of users and identifies queries automatically, achieving dynamic feedback based on user behavior. Training corpora for the model are collected and trained on automatically using heuristic rules, which improves the accuracy of term analysis. Different models are trained for queries of different lengths, so that distribution differences caused by length are modeled separately, making the retrieval results more accurate. The output order of the retrieval results is based on the word importance of the natural-language-type query, so that users can quickly find the results they need.
The embodiment trains differently on the natural-language-type query keywords that users entered in the history and on the term-type query keywords derived from them, and continually updates the model according to users' real-time input, which makes the retrieval results more timely. In step S110 the model performs term analysis on the query keyword entered by the user, and step S120 then performs basic retrieval based on the result of the term analysis and outputs the retrieval results ranked by relevance, making the output more accurate.
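Steps S100 to S120 can be put together in a toy end-to-end sketch. Everything here is a stand-in: the trained model is replaced by a plain dictionary from term to importance level, "basic retrieval" is reduced to word matching over an in-memory document dictionary, and the weight `2 - level` (so that level-2 terms are ignored) is an assumed scoring rule.

```python
def analyze_terms(query, importance_model):
    """S110: assign each word of the query a level (0 = most important).

    importance_model: hypothetical dict standing in for the trained model;
    unknown words default to the least important level, 2.
    """
    return {word: importance_model.get(word, 2) for word in query.split()}


def retrieve(query, documents, importance_model):
    """S100 to S120: receive the query, analyze it, retrieve, rank by relevance."""
    levels = analyze_terms(query, importance_model)
    # Assumed weighting: level 0 -> 2, level 1 -> 1, level 2 -> 0 (ignored).
    weights = {word: 2 - level for word, level in levels.items()}
    scored = []
    for doc_id, text in documents.items():
        words = set(text.split())
        score = sum(w for word, w in weights.items() if word in words)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; document id breaks ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored]
```

With this weighting, a document that matches only low-importance words such as "how" or "to" is not returned at all, which reflects the idea of querying on the more important terms.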
Fig. 3 shows a device for outputting a retrieval result (hereinafter the "retrieval result output device") according to an embodiment of the present invention. The retrieval result output device can be a computer device or a device within a computer device. As shown in Fig. 3, the retrieval result output device comprises:
A device for receiving a query keyword (hereinafter the "receiving device") 300;
A device for performing term analysis on the query keyword according to a training model obtained by training on natural-language-type query keywords and term-type query keywords, to obtain an analysis result (hereinafter the "analysis device") 310;
A device for performing basic retrieval on the analysis result and outputting the retrieval results ranked by relevance (hereinafter the "output device") 320.
Each device is described in further detail below.
In the receiving device 300, the query keyword (query) is the query, received by the retrieval system, that the user intends to search for.
In the analysis device 310, term analysis is performed on the query keyword according to the training model obtained by training on natural-language-type query keywords and term-type query keywords (terms), to obtain the analysis result.
As shown in Fig. 4, an alternative embodiment of the present invention further comprises a device for building the training model obtained by training on natural-language-type query keywords and term-type query keywords (hereinafter the "model building device") 400, which comprises:
A device for grouping the query keywords according to the retrieval history (hereinafter the "grouping device") 401;
A device for identifying the type of each query keyword in every group, the types being natural-language-type query keywords and term-type query keywords (hereinafter the "type identification device") 402;
A device for obtaining the training model after performing feature training on the natural-language-type query keywords and discretization training on the term-type query keywords (hereinafter the "model training device") 403.
In the grouping device 401, queries are grouped according to the URL (Uniform Resource Locator) that the user clicked. Because the intent behind clicks on some URLs is ambiguous, URLs are generally filtered by the character length, field delimiters and address-structure separators of the clicked URL before the queries are grouped.
The type identification device 402 is described taking one group as an example. For the determination in each group, it comprises:
A device for determining the natural-language-type query (hereinafter the "natural-language-type query determining sub-device") 4021. This device determines the natural-language-type query by collecting statistics on the distribution of queries in the group, computing a confidence for each query from those statistics, and selecting the query with the highest confidence and the greatest length as the natural-language-type query. The confidence is generally determined from the titles of documents in an existing document library, but can also be determined in other ways. A query length greater than 7 words is generally most preferred, but the length is not strictly limited.
A device for determining the term-type queries (hereinafter the "term-type query determining sub-device") 4022. This device takes each query in the group whose length ratio to the natural-language-type query falls within a set range, and which has a predetermined degree of overlap with that natural-language-type query, as a term-type query corresponding to the natural-language-type query. A set value of 2 for the length ratio between the term-type query and the natural-language-type query is generally most preferred, but it can be adjusted as appropriate to the required query precision.
Optionally, a group that contains no term-type query is discarded: a group that has only a natural-language-type query and no term-type query indicates that the grouping is unsuitable.
The model training device 403 is described taking one group as an example. It comprises:
A device for turning the natural-language-type query in the group into features after linguistic analysis (hereinafter the "feature sub-device") 4031. Two kinds of feature are selected: 1. Contextual features of the natural-language-type query, for which the part-of-speech tagging results of an n-gram model (a language model commonly used in large-vocabulary continuous speech recognition, also known as a Chinese language model) and the like can be used; that is, the part of speech of the current word, of the preceding word and/or of the two preceding words. 2. The distribution of each natural-language-type query in the group across the natural-language-type queries of all groups in the retrieval history, for which statistical weighting methods such as tf (term frequency) and/or idf (inverse document frequency) can be used.
A device for counting the frequency of the term-type queries in the group and discretizing the count to obtain the importance of each term-type query (hereinafter the "importance determining sub-device") 4032. That is, for the term-type queries in each group, their frequency among the term-type queries of all groups in the retrieval history is counted; according to the counts, the importance of a term-type query can be divided into 3 levels, denoted 0, 1 and 2, where 0 marks the most important term-type queries.
A device for combining the features of the natural-language-type queries with the importance of the term-type queries and obtaining a regression model by training based on GBDT (Gradient Boosting Decision Tree) (hereinafter the "training sub-device") 4033. Optionally, the regression model can instead use linear regression, logistic regression, SVM-rank (a ranking support vector machine), or the like.
The embodiment of the present invention can be applied in a large-scale retrieval system. It mainly analyses natural-language-type queries: the important term-type queries within a natural-language-type query are extracted and their importance computed, the terms of higher importance are used for the query, and the retrieval results are returned according to the importance analysis and thus output ranked by relevance. The embodiment falls within the category of word-importance analysis for natural-language-type queries. Based on the retrieval history of a massive number of users, it combines the respective advantages of natural-language-type and term-type queries: within a group of synonymous natural-language-type queries, the frequency in term-type queries of each term of the query is taken as the target, the syntactic context of the natural-language-type query is taken as the features, and a term-ranking regression model is trained under the GBDT ranking framework.
Compared with the prior art, the embodiment makes full use of the retrieval history of a massive number of users and identifies queries automatically, achieving dynamic feedback based on user behavior. Training corpora for the model are collected and trained on automatically using heuristic rules, which improves the accuracy of term analysis. Different models are trained for queries of different lengths, so that distribution differences caused by length are modeled separately, making the retrieval results more accurate. The output order of the retrieval results is based on the word importance of the natural-language-type query, so that users can quickly find the results they need.
The embodiment trains differently on the natural-language-type query keywords that users entered in the history and on the term-type query keywords derived from them, and continually updates the model according to users' real-time input, which makes the retrieval results more timely. In the analysis device 310, the model performs term analysis on the query keyword entered by the user and identifies the keywords of high importance; the output device 320 then performs basic retrieval based on the result of the term analysis and outputs the retrieval results ranked by relevance, making the output more accurate.
It should be noted that the present invention can be implemented in software and/or a combination of software and hardware; for example, each device of the present invention can be implemented with an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present invention can be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that it can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced in the present invention. No reference numeral in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim can also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (10)

1. A method for outputting a retrieval result, comprising:
Receiving a query keyword;
Performing term analysis on the query keyword according to a training model obtained by training on natural-language-class query keywords and term-class query keywords, to obtain an analysis result;
Performing basic retrieval on the analysis result, and outputting retrieval results ranked by relevance.
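The three steps of claim 1 can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the names `analyze_terms`, `basic_retrieval`, `search`, and `toy_importance` are hypothetical, `toy_importance` merely stands in for the trained model, and relevance is taken to be the sum of matched term weights rather than whatever ranking function an actual implementation would use.

```python
from typing import Callable, Dict, List, Tuple

def analyze_terms(query: str,
                  importance_model: Callable[[str, str], float],
                  keep: int = 3) -> List[Tuple[str, float]]:
    """Term analysis: score each distinct term and keep the most important ones."""
    terms = list(dict.fromkeys(query.lower().split()))  # de-duplicate, keep order
    scored = [(term, importance_model(query, term)) for term in terms]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable for ties
    return scored[:keep]

def basic_retrieval(weighted_terms: List[Tuple[str, float]],
                    documents: Dict[str, str]) -> List[Tuple[str, float]]:
    """Basic retrieval: match the kept terms and rank documents by relevance."""
    results = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        # Assumption: relevance is the summed weight of the matched terms.
        score = sum(weight for term, weight in weighted_terms if term in words)
        if score > 0:
            results.append((doc_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results

def search(query: str,
           importance_model: Callable[[str, str], float],
           documents: Dict[str, str]) -> List[Tuple[str, float]]:
    """Receive a query, analyze its terms, retrieve, and output ranked results."""
    return basic_retrieval(analyze_terms(query, importance_model), documents)

def toy_importance(query: str, term: str) -> float:
    """Hypothetical stand-in for the trained model: longer terms score higher."""
    return len(term) / max(len(t) for t in query.split())
```

Calling `search(query, toy_importance, documents)` with a dictionary that maps document identifiers to text returns `(doc_id, score)` pairs in descending order of score; swapping in a trained importance model only requires passing a different callable.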
2. The method according to claim 1, wherein the process of establishing the training model obtained by training on natural-language-class query keywords and term-class query keywords comprises:
Grouping query keywords according to retrieval history records;
Performing type identification on the query keywords in each group, the types comprising natural-language-class query keywords and term-class query keywords;
Obtaining the training model after performing feature training on the natural-language-class query keywords and discretization training on the term-class query keywords.
3. The method according to claim 2, wherein grouping the query keywords according to the retrieval history records comprises:
Grouping the query keywords according to the character length, field delimiters, and address structure separators of the uniform resource locators (URLs) clicked by users.
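One possible reading of the grouping step in claim 3 is sketched below. The claim names the URL features but not how they are combined, so everything else here is an assumption: the hypothetical `url_signature` reduces a clicked URL to a tuple of its character length, the number of field delimiters (`&` and `=`) in its query string, and the number of address structure separators (`/`) in its path, and the hypothetical `group_queries` places queries whose clicked URLs share a signature in the same group.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit

def url_signature(url: str) -> Tuple[int, int, int]:
    """Reduce a clicked URL to (length, field delimiters, path separators)."""
    parts = urlsplit(url)
    # Assumption: '&' and '=' in the query string count as field delimiters.
    field_delimiters = parts.query.count("&") + parts.query.count("=")
    # Assumption: '/' in the path counts as an address structure separator.
    path_separators = parts.path.count("/")
    return (len(url), field_delimiters, path_separators)

def group_queries(click_log: Iterable[Tuple[str, str]]) -> Dict[Tuple[int, int, int], List[str]]:
    """Group queries from (query, clicked_url) records by URL signature."""
    groups: Dict[Tuple[int, int, int], List[str]] = defaultdict(list)
    for query, clicked_url in click_log:
        groups[url_signature(clicked_url)].append(query)
    return dict(groups)
```

A real system would more likely compare signatures with some tolerance instead of requiring exact equality; exact matching is used here only to keep the sketch short.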
4. The method according to claim 2, wherein performing type identification on the query keywords in each group comprises:
Determining the natural-language-class query keyword according to the confidence of the query keywords in each group;
Determining, in each group, the query keywords whose length ratio to the determined natural-language-class query keyword is within a set range and which have a preset degree of overlap with the determined natural-language-class query keyword, as the term-class query keywords corresponding to the natural-language-class query keyword.
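The type identification in claim 4 can be illustrated as follows. This is a sketch under assumptions, not the patented procedure: the confidence function, the set of function words, the length-ratio range `(0.2, 0.8)`, and the overlap threshold `0.5` are all hypothetical values chosen for the example, and overlap is measured as the share of a candidate's words that also occur in the natural-language-class query.

```python
from typing import Callable, List, Tuple

# Hypothetical function-word list used by the toy confidence score.
FUNCTION_WORDS = {"how", "do", "i", "my", "what", "is", "the", "to", "a"}

def toy_confidence(query: str) -> float:
    """Assumed confidence that a query is natural language: share of function words."""
    words = query.lower().split()
    return sum(w in FUNCTION_WORDS for w in words) / len(words) if words else 0.0

def pick_natural_language_query(group: List[str],
                                confidence: Callable[[str], float]) -> str:
    """Choose the query in the group with the highest confidence."""
    return max(group, key=confidence)

def overlap(candidate: str, nl_query: str) -> float:
    """Share of the candidate's words that also appear in the natural-language query."""
    cand, ref = set(candidate.lower().split()), set(nl_query.lower().split())
    return len(cand & ref) / len(cand) if cand else 0.0

def matching_term_queries(group: List[str], nl_query: str,
                          ratio_range: Tuple[float, float] = (0.2, 0.8),
                          min_overlap: float = 0.5) -> List[str]:
    """Keep queries whose length ratio and overlap with nl_query pass both checks."""
    if not nl_query:
        return []
    low, high = ratio_range
    matches = []
    for query in group:
        if query == nl_query:
            continue
        ratio = len(query) / len(nl_query)  # character-length ratio
        if low <= ratio <= high and overlap(query, nl_query) >= min_overlap:
            matches.append(query)
    return matches
```

Both thresholds would need tuning on real query logs; the values above only make the example behave sensibly on short English queries.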
5. The method according to claim 4, wherein obtaining the training model after performing feature training on the natural-language-class query keywords and discretization training on the term-class query keywords comprises:
Featurizing the natural-language-class query keywords in each group;
Compiling statistics on the term-class query keywords corresponding to the natural-language-class query keyword in each group to obtain statistical results, and discretizing the statistical results to obtain the importance of the natural-language-class query keyword in each group;
Combining the features of the natural-language-class queries with the importance of the natural-language-class queries, and training a gradient boosting decision tree (GBDT) to obtain a regression model.
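The training steps of claim 5 can be sketched end to end with the standard library alone. Several points are assumptions made for illustration: importance is computed per word as the fraction of corresponding term-class queries containing that word, bucketed into three levels; the features (word length and relative position) are hypothetical; and the booster is a toy gradient boosting of depth-1 regression trees under squared error, not a production GBDT library.

```python
from collections import Counter
from typing import Dict, List, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)

def term_importance(nl_query: str, term_queries: List[str], levels: int = 3) -> Dict[str, int]:
    """Count how often each word survives in the term-class queries, then discretize."""
    words = nl_query.split()
    counts: Counter = Counter()
    for term_query in term_queries:
        counts.update(set(term_query.split()) & set(words))
    total = max(len(term_queries), 1)
    # Bucket the fraction in [0, 1] into the integer levels 0 .. levels-1.
    return {w: min(levels - 1, int(levels * counts[w] / total)) for w in words}

def featurize(nl_query: str) -> List[List[float]]:
    """Hypothetical per-word features: word length and relative position."""
    words = nl_query.split()
    return [[float(len(w)), i / len(words)] for i, w in enumerate(words)]

def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Find the single split that minimizes squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for thr in values[:-1]:  # split as "<= thr" versus "> thr"
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    if best is None:  # no split possible: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def stump_predict(stump: Stump, row: List[float]) -> float:
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def fit_gbdt(X: List[List[float]], y: List[float], rounds: int = 30, lr: float = 0.5):
    """Each round fits a stump to the current residuals (the squared-error gradient)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [target - p for target, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps

def gbdt_predict(model, row: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)
```

In practice the regression model would be trained on features and importance labels pooled from many groups, typically with an established gradient boosting library; the single-query setting here only shows how the discretized statistics become regression targets.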
6. A device for outputting a retrieval result, comprising:
A device for receiving a query keyword;
A device for performing term analysis on the query keyword according to a training model obtained by training on natural-language-class query keywords and term-class query keywords, to obtain an analysis result;
A device for performing basic retrieval on the analysis result and outputting retrieval results ranked by relevance.
7. The device according to claim 6, further comprising an establishing device for the training model obtained by training on natural-language-class query keywords and term-class query keywords, the establishing device comprising:
A device for grouping query keywords according to retrieval history records;
A device for performing type identification on the query keywords in each group, the types comprising natural-language-class query keywords and term-class query keywords;
A device for obtaining the training model after performing feature training on the natural-language-class query keywords and discretization training on the term-class query keywords.
8. The device according to claim 7, wherein the device for grouping query keywords according to retrieval history records comprises:
A device for grouping the query keywords according to the character length, field delimiters, and address structure separators of the uniform resource locators (URLs) clicked by users.
9. The device according to claim 7, wherein the device for performing type identification on the query keywords in each group, the types comprising natural-language-class query keywords and term-class query keywords, comprises:
A device for determining the natural-language-class query keyword according to the confidence of the query keywords in each group;
A device for determining, in each group, the query keywords whose length ratio to the determined natural-language-class query keyword is within a set range and which have a preset degree of overlap with the determined natural-language-class query keyword, as the term-class query keywords corresponding to the natural-language-class query keyword.
10. The device according to claim 9, wherein the device for obtaining the training model after performing feature training on the natural-language-class query keywords and discretization training on the term-class query keywords comprises:
A device for featurizing the natural-language-class query keywords in each group;
A device for compiling statistics on the term-class query keywords corresponding to the natural-language-class query keyword in each group to obtain statistical results, and discretizing the statistical results to obtain the importance of the natural-language-class query keyword in each group;
A device for combining the features of the natural-language-class queries with the importance of the natural-language-class queries, and training a gradient boosting decision tree (GBDT) to obtain a regression model.
CN201510376979.7A 2015-06-30 2015-06-30 Method and device for outputting retrieval result Active CN105095385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510376979.7A CN105095385B (en) 2015-06-30 2015-06-30 Method and device for outputting retrieval result

Publications (2)

Publication Number Publication Date
CN105095385A true CN105095385A (en) 2015-11-25
CN105095385B CN105095385B (en) 2018-11-13

Family

ID=54575822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510376979.7A Active CN105095385B (en) 2015-06-30 2015-06-30 Method and device for outputting retrieval result

Country Status (1)

Country Link
CN (1) CN105095385B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783648A (en) * 2018-12-28 2019-05-21 北京声智科技有限公司 A method of ASR language model is improved using ASR recognition result
CN111597314A (en) * 2020-04-20 2020-08-28 科大讯飞股份有限公司 Reasoning question-answering method, device and equipment
CN112100202A (en) * 2020-11-12 2020-12-18 北京药联健康科技有限公司 Product identification and product information completion method, storage medium and robot

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101819578A (en) * 2010-01-25 2010-09-01 青岛普加智能信息有限公司 Retrieval method, method and device for establishing index and retrieval system
US20110078205A1 (en) * 2009-09-30 2011-03-31 Robin Salkeld Method and system for finding appropriate semantic web ontology terms from words
CN103049474A (en) * 2011-10-25 2013-04-17 微软公司 Search query and document-related data translation
CN103399891A (en) * 2013-07-22 2013-11-20 百度在线网络技术(北京)有限公司 Method, device and system for automatic recommendation of network content

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783648A (en) * 2018-12-28 2019-05-21 北京声智科技有限公司 A method of ASR language model is improved using ASR recognition result
CN109783648B (en) * 2018-12-28 2020-12-29 北京声智科技有限公司 Method for improving ASR language model by using ASR recognition result
CN111597314A (en) * 2020-04-20 2020-08-28 科大讯飞股份有限公司 Reasoning question-answering method, device and equipment
CN111597314B (en) * 2020-04-20 2023-01-17 科大讯飞股份有限公司 Reasoning question-answering method, device and equipment
CN112100202A (en) * 2020-11-12 2020-12-18 北京药联健康科技有限公司 Product identification and product information completion method, storage medium and robot

Also Published As

Publication number Publication date
CN105095385B (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN102402619B (en) Search method and device
CN109815308B (en) Method and device for determining intention recognition model and method and device for searching intention recognition
CN102184169B (en) Method, device and equipment used for determining similarity information among character string information
CN101876981B (en) Method and device for building a knowledge base
CN102193939B (en) Implementation method of information navigation, information navigation server, and information processing system
JP5721818B2 (en) Use of model information group in search
CN101694668B (en) Method and device for confirming web structure similarity
CN109542247B (en) Sentence recommendation method and device, electronic equipment and storage medium
CN104360994A (en) Natural language understanding method and natural language understanding system
CN103092943B (en) Advertisement scheduling method and advertisement scheduling server
CN108319627A (en) Keyword extracting method and keyword extracting device
CN105653701B (en) Model generating method and device, word weighting method and device
CN105045901A (en) Search keyword push method and device
CN106202514A (en) Accident based on Agent is across the search method of media information and system
CN103136228A (en) Image search method and image search device
CN105302810A (en) Information search method and apparatus
CN105159930A (en) Search keyword pushing method and apparatus
CN104268166A (en) Input method, device and electronic device
CN105069077A (en) Search method and device
CN106294661A (en) Extended search method and device
CN102982125B (en) Method and apparatus for determining synonym text
CN105373546A (en) Information processing method and system for knowledge services
CN103744887A (en) Method and device for people search and computer equipment
CN104503988A (en) Searching method and device
CN105389328B (en) Search ranking optimization method for large-scale open source software

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant