CN104424216A - Method and device for intention digging - Google Patents

Info

Publication number
CN104424216A
Authority
CN
China
Prior art keywords
intention
inquiry
similar
intent information
input inquiry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310371165.5A
Other languages
Chinese (zh)
Other versions
CN104424216B (en)
Inventor
黄耀海
张碧川
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to CN201310371165.5A
Publication of CN104424216A
Application granted
Publication of CN104424216B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and device for intent mining. A method for intent mining is disclosed. The method comprises the following steps: obtaining an input query; generating intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query; mining a group of intents for each intent-similar query, wherein each intent provides a subtopic for the corresponding intent-similar query; determining a similar intent information description set by using all the intent groups of the intent-similar queries; and mining the intents for the input query by using the similar intent information description set.

Description

Method and apparatus for intent mining
Technical field
The present invention relates to a method and apparatus for text mining. In particular, the present invention relates to a method and apparatus for mining intents. More particularly, the present invention relates to a method and apparatus for discovering the search intents behind a query submitted by a user.
Background art
With the development of computer and information technology, the rate at which information is produced worldwide keeps increasing. A great deal of information now exists in the world, such as personal, occupational, entertainment, scientific and technical, and government information. Because there is so much information, organizing and accessing it has become a problem.
To improve the user's experience when seeking information, methods and systems that help users access the information they are looking for are continually being developed. For example, Santos, et al. 2011. University of Glasgow at the NTCIR-9 Intent task: Experiments with Terrier on Subtopic Mining and Document Ranking. Proceedings of NTCIR-9 Workshop Meeting, 2011, Tokyo (non-patent literature 1) proposes attempting to understand the potential intents behind a query entered by a user. When a user enters a brief and ambiguous query, it is desirable to output the n (for example, n=10) best intent results, which should be both important and diverse. Table 1 shows an example.
Table 1: Example of an input query and its output
For example, as shown in Table 1, if the user enters the query "becoming a paralegal", several intents related to "becoming a paralegal" can be output for the user to choose from.
In intent mining, the quality of the intent mining results is usually evaluated with the following formula:
$$D\#\text{-}nDCG = \frac{I\text{-}rec + D\text{-}nDCG}{2} \qquad (1)$$
Here, I-rec (intent recall) is the ratio of the number of useful intents obtained (the correct results obtained) to the number of intents one hopes to obtain (all correct results), and is often used to measure the diversity of the intents. D-nDCG denotes the intent precision and stands for Diversified Normalized Discounted Cumulative Gain; it computes, on a position basis, the relevance of the result list returned by a search engine (see Sakai and Song, Evaluating Diversified Search Result Using Per-intent Graded Relevance, Proceedings of SIGIR '11, 2011, Beijing (non-patent literature 2)) and is used to measure the overall relevance of the intents. D#-nDCG denotes a linear combination of I-rec and D-nDCG.
In the above formula, I-rec, D-nDCG, and D#-nDCG are determined from the ground-truth data of the query (also called the model answer), normally by comparing the intent mining results with the ground-truth data. How these metrics are obtained is well known in the art and is therefore not described in detail.
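As a minimal sketch of how formula (1) combines its two inputs, the following Python computes intent recall from sets of intents and then averages it with a D-nDCG value. The function names are illustrative assumptions, and the D-nDCG value is taken as given rather than computed here.

```python
from typing import Iterable


def intent_recall(obtained: Iterable[str], ground_truth: Iterable[str]) -> float:
    """I-rec: share of ground-truth intents that appear among the obtained intents."""
    truth = set(ground_truth)
    if not truth:
        raise ValueError("ground_truth must not be empty")
    return len(truth & set(obtained)) / len(truth)


def d_sharp_ndcg(i_rec: float, d_ndcg: float) -> float:
    """D#-nDCG as the equal-weight combination of I-rec and D-nDCG, per formula (1)."""
    for name, value in (("i_rec", i_rec), ("d_ndcg", d_ndcg)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (i_rec + d_ndcg) / 2.0


# Hypothetical intents, used only to exercise the functions.
recall = intent_recall({"class", "degree", "jobs"}, {"class", "degree", "salary", "requirements"})
score = d_sharp_ndcg(recall, 0.7)
```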
As an example, in the prior art the ground-truth data of a query is obtained in ways such as the following. The ground-truth data can be set manually. Alternatively, the ground-truth data can be provided by annotators and produced by a vote among several people.
In the prior art, multiple intent candidates are usually mined from global external resources (such as search engines, Wikipedia, query logs, and anchor text), and the mined candidates are then ranked by parameters such as frequency to obtain the intents desired by the user.
For example, Xue, et al. 2011. THUIR at NTCIR-9 INTENT Task. Proceedings of NTCIR-9 Workshop Meeting, 2011, Tokyo (non-patent literature 3) discloses a method for intent mining. The method extracts search results that contain the input query, identifies intent candidates for the input query from those search results, and finally ranks the candidates by some criterion to obtain the intents desired by the user.
Fig. 1 shows a flowchart of the intent mining method used in non-patent literature 3 of the prior art. As shown in Fig. 1, in step S2100, the query entered by the user is obtained. Next, in step S2110, intent candidates for the query are mined from global external resources such as search engines, Wikipedia, and query logs. Next, in step S2120, duplicate intent candidates are removed from the obtained candidates. Then, in step S2130, the remaining candidates are ranked using parameters such as the frequency with which a candidate occurs, its co-occurrence frequency, muster data, and edit distance. Finally, in step S2140, the top-ranked intent candidates are selected according to the ranking result and output as the intents desired by the user.
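The deduplicate-then-rank part of this prior-art flow (steps S2120 to S2140) can be sketched as follows. This is a simplified illustration that ranks by occurrence frequency only; the other ranking parameters named above are omitted, and the function name and normalization rule are assumptions.

```python
from collections import Counter
from typing import Iterable, List


def rank_candidates(candidates: Iterable[str], top_n: int = 10) -> List[str]:
    """Merge duplicate candidates, rank by frequency, and keep the top_n."""
    # Treat candidates that differ only in case or spacing as duplicates (S2120).
    normalized = [" ".join(c.lower().split()) for c in candidates]
    counts = Counter(normalized)
    # Rank by descending frequency, breaking ties alphabetically (S2130).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Output the top-ranked candidates (S2140).
    return [candidate for candidate, _ in ranked[:top_n]]


top = rank_candidates(
    ["paralegal salary", "Paralegal  Salary", "paralegal degree"], top_n=2
)
```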
However, practice has shown those skilled in the art that, with the method disclosed in non-patent literature 3, when intent information (such as the user's query history) is scarce, the intents obtained may be inconsistent with the intents the user expects; that is, the method cannot accurately provide the intent candidates the user wishes to obtain. The intent mining performance of this method is therefore low.
In addition, US Patent No. 8,214,347 B2 (patent document 1) proposes another method for intent mining. In that method, high-frequency phrases are extracted from search results, and intents are then mined from these phrases using some predetermined rules.
Fig. 2 shows a flowchart of the intent mining method used in US 8,214,347 B2 of the prior art. As shown in Fig. 2, in step S2200, the query entered by the user is obtained. Next, in step S2210, search results are extracted for the query. Next, in step S2220, intent candidates are mined: phrases that contain the input query are identified in the search results, and the best phrases are determined as intent candidates using features such as the frequency with which a phrase occurs, its co-occurrence frequency, muster data, and edit distance. Then, in step S2230, the intent candidates are ranked. Finally, in step S2240, the top-ranked intent candidates are selected according to the ranking result and output as the intents desired by the user.
However, practice has shown those skilled in the art that, with the method disclosed in US 8,214,347 B2, when intent information (such as the user's query history) is scarce, the intents obtained may be inconsistent with the intents the user expects; that is, the method does not accurately provide the intent candidates the user wishes to obtain. The intent mining performance of this method is therefore also low.
A new technique is therefore needed to solve the above problems in the prior art.
Summary of the invention
One object of the present invention is to improve the accuracy of intent mining.
Another object of the present invention is to improve intent recall.
According to one aspect of the present invention, a method for intent mining is provided, the method comprising: obtaining an input query; generating intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query; mining a group of intents for each intent-similar query, wherein each intent provides a subtopic for the corresponding intent-similar query; determining a similar intent information description set by using all the intent groups of the intent-similar queries; and mining intents for the input query by using the similar intent information description set.
According to a further aspect of the present invention, a method for information retrieval is provided, comprising: receiving an input query expressed by a user in natural language; performing intent mining on the input query according to the above method for intent mining; and obtaining search results for the mined intents.
According to another aspect of the present invention, a method for question-answering assistance is provided, comprising: receiving an input query expressed by a user in natural language; mining topics from the input query according to the above method for intent mining; and obtaining answers for the mined topics.
According to another aspect of the present invention, an apparatus for intent mining is provided, the apparatus comprising: an input query acquiring unit, which obtains an input query; an intent-similar query generating unit, which generates intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query; a first intent mining unit, which mines a group of intents for each intent-similar query, wherein each intent provides a subtopic for the corresponding intent-similar query; a similar intent information description set determining unit, which determines a similar intent information description set by using all the intent groups of the intent-similar queries; and a second intent mining unit, which mines intents for the input query by using the similar intent information description set.
According to another aspect of the present invention, an apparatus for information retrieval is provided, comprising: an input query receiving unit, which receives an input query expressed by a user in natural language; the above apparatus for intent mining, which performs intent mining on the input query; and a search result obtaining unit, which obtains search results for the mined intents.
According to another aspect of the present invention, an apparatus for question-answering assistance is provided, comprising: an input query receiving unit, which receives an input query expressed by a user in natural language; the above apparatus for intent mining, which mines topics from the input query; and an answer obtaining unit, which obtains answers for the mined topics.
One advantage of the present invention is that the accuracy of intent mining is improved. In particular, even when intent information is scarce, the intent candidates desired by the user can be provided accurately.
Another advantage of the present invention is that intent recall is improved.
Further features of the present invention and their advantages will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which form a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
The present invention can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
Fig. 1 shows a flowchart of the intent mining method used in non-patent literature 3 of the prior art.
Fig. 2 shows a flowchart of the intent mining method used in US 8,214,347 B2 (patent document 1) of the prior art.
Fig. 3 is a block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the present invention can be implemented.
Fig. 4 shows a flowchart of a method of intent mining using intent-similar queries according to an embodiment of the present invention.
Fig. 5 shows a flowchart of a method of generating intent-similar queries according to an embodiment of the present invention.
Fig. 6 shows a flowchart of a method of generating intent-similar queries from an intent-similar query library according to an embodiment of the present invention.
Fig. 7 shows a flowchart of a method of generating intent-similar queries using a domain ontology according to an embodiment of the present invention.
Fig. 8 shows a flowchart of a method of generating intent-similar queries using intent-similarity indicators according to an embodiment of the present invention.
Fig. 9 shows a flowchart of a method of generating intent-similar queries for the input query according to an embodiment of the present invention.
Fig. 10 shows a flowchart of a method of identifying the core intent part and the modifier part of the input query according to an embodiment of the present invention.
Fig. 11 shows a flowchart of a method of determining the similar intent information description set by lexical analysis according to an embodiment of the present invention.
Fig. 12 shows a flowchart of a method of determining the similar intent information description set by syntactic analysis according to an embodiment of the present invention.
Fig. 13 shows a flowchart of a method of determining the similar intent information description set by semantic relation analysis according to an embodiment of the present invention.
Fig. 14 shows a flowchart of a method of determining the similar intent information description set by logical analysis according to an embodiment of the present invention.
Fig. 15 shows a flowchart of another method of intent mining using intent-similar queries according to an embodiment of the present invention.
Fig. 16 shows a flowchart of a method for information retrieval according to an embodiment of the present invention.
Fig. 17 shows a flowchart of a method for question-answering assistance according to an embodiment of the present invention.
Fig. 18 shows a functional block diagram of an apparatus 7000 for intent mining according to an embodiment of the present invention.
Fig. 19 shows a functional block diagram of an apparatus 8000 for information retrieval according to an embodiment of the present invention.
Fig. 20 shows a functional block diagram of an apparatus 9000 for question-answering assistance according to an embodiment of the present invention.
Detailed description of the embodiments
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention or its application or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification.
In all of the examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
Fig. 3 is a block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the present invention can be implemented.
As shown in Fig. 3, the computer system comprises a computer 1110. The computer 1110 comprises a processing unit 1120, a system memory 1130, a fixed non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190, and an output peripheral interface 1195, which are connected via a system bus 1121.
The system memory 1130 comprises a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136, and some program data 1137 reside in the RAM 1132.
A fixed non-volatile memory 1141, such as a hard disk, is connected to the fixed non-volatile memory interface 1140. The fixed non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146, and some program data 1147.
Removable non-volatile memories, such as a floppy disk drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a floppy disk 1152 can be inserted into the floppy disk drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.
Input devices such as a mouse 1161 and a keyboard 1162 are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.
The remote computer 1180 may comprise a memory 1181, such as a hard disk, which stores remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The output peripheral interface 1195 is connected to a printer 1196 and a loudspeaker 1197.
The computer system shown in Fig. 3 is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
The computer system shown in Fig. 3 can be incorporated in any embodiment, either as a stand-alone computer or as a processing system within a device. One or more unnecessary components can be removed from it, and one or more additional components can be added to it.
Fig. 4 shows a flowchart of a method of intent mining using intent-similar queries according to an embodiment of the present invention.
As shown in Fig. 4, first, in step S3100, the query entered by the user is obtained. Those skilled in the art will appreciate that the query can be in any of various languages, including but not limited to Chinese, English, Japanese, Korean, German, French, Russian, and Arabic.
For example, the query entered by the user can be "becoming a paralegal". The ground-truth data (that is, the model answer) that the user would wish to obtain for this query is shown in Table 2.
Table 2: Ground-truth data for the query "becoming a paralegal"
In Table 2, the "intent type" refers to the relation between an intent and the corresponding query. For clarity, Table 3 shows some examples of intent types.
Query                   Intent                                Intent type
becoming a paralegal    becoming a paralegal class            Course
becoming a paralegal    becoming a paralegal degree           Degree
becoming a engineer     becoming a engineer class             Course
becoming a engineer     Requirement of becoming a engineer    Require
Table 3: Examples of intent types
As shown in Table 3, if the input query is "becoming a paralegal" and the corresponding intent is "becoming a paralegal class", then the corresponding intent type is "course"; that is, "becoming a paralegal class" concerns information about courses. If the input query is "becoming a paralegal" and the corresponding intent is "becoming a paralegal degree", then the corresponding intent type is "degree"; that is, "becoming a paralegal degree" concerns information about degrees.
Continuing with Fig. 4, next, in step S3110, intent-similar queries are generated for the input query. Each intent-similar query has an intent type that is the same as or similar to that of the input query.
If queries are similar, they may have the same or similar intent types. This means that when a user searches for a query, the user is looking for a certain subtopic of that query, and when other users search for similar queries, the subtopics they look for may be the same. For example, when a user searches for "becoming a paralegal", one common intent is to find "the course of paralegal"; when a user searches for "becoming an engineer", one common intent is to find "the course of engineer". This intent is also common to other queries of the form "becoming a 'position'". Intent-similar queries can therefore be used to mine intents for the user's query.
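The idea of transferring intents from similar queries can be illustrated with a short sketch. It assumes two caller-supplied functions, `generate_similar` and `mine_group`, which stand in for the generation and mining steps, and it uses a simple support threshold to decide which intent descriptions are shared; that threshold is an assumption of this sketch and not the determination method of the embodiments.

```python
from collections import Counter
from typing import Callable, Iterable, List


def mine_with_similar_queries(
    query: str,
    generate_similar: Callable[[str], Iterable[str]],
    mine_group: Callable[[str], Iterable[str]],
    min_support: int = 2,
) -> List[str]:
    """Mine intents for `query` by reusing intent descriptions of similar queries."""
    support: Counter = Counter()
    for similar in generate_similar(query):
        # Abstract each intent into a description by replacing the similar
        # query with a placeholder, e.g. "becoming an engineer class" -> "* class".
        descriptions = {
            intent.replace(similar, "*")
            for intent in mine_group(similar)
            if similar in intent
        }
        support.update(descriptions)
    # Keep descriptions shared by at least min_support similar queries.
    description_set = sorted(d for d, n in support.items() if n >= min_support)
    # Instantiate the shared descriptions for the input query.
    return [d.replace("*", query) for d in description_set]


# Hypothetical toy data standing in for real generation and mining back ends.
SIMILAR = {"becoming a paralegal": ["becoming an engineer", "becoming an accountant"]}
GROUPS = {
    "becoming an engineer": ["becoming an engineer class", "becoming an engineer degree"],
    "becoming an accountant": ["becoming an accountant class", "becoming an accountant salary"],
}

intents = mine_with_similar_queries(
    "becoming a paralegal",
    lambda q: SIMILAR.get(q, []),
    lambda q: GROUPS.get(q, []),
)
```

With this toy data only the "class" description is shared by both similar queries, so it is the only one transferred to the input query.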
Fig. 5 shows a flowchart of a method of generating intent-similar queries according to an embodiment of the present invention. As shown in Fig. 5, first, in step S3210, multiple intent-similar queries are generated for the query entered by the user. As described below, several methods can be used to generate them. Next, in step S3220, the similarity between each of the intent-similar queries and the input query is calculated. The method of calculating this similarity is described in more detail below. Finally, in step S3230, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity is greater than a predetermined threshold, are selected from the intent-similar queries as the output.
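The selection in steps S3220 and S3230 can be sketched as below. The token-overlap (Jaccard) similarity is only a placeholder assumption so that the example runs; the similarity measure of the embodiments is described separately, and all names here are illustrative.

```python
from typing import Callable, Iterable, List, Optional


def token_jaccard(a: str, b: str) -> float:
    """Placeholder similarity: Jaccard overlap of lowercase word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_similar_queries(
    query: str,
    candidates: Iterable[str],
    similarity: Callable[[str, str], float] = token_jaccard,
    top_n: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Keep the top_n most similar candidates and/or those above a threshold."""
    scored = [(similarity(query, c), c) for c in candidates]
    if threshold is not None:
        scored = [(s, c) for s, c in scored if s > threshold]
    # Sort by descending similarity, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    if top_n is not None:
        scored = scored[:top_n]
    return [c for _, c in scored]


candidates = ["paralegal salary", "becoming a lawyer", "becoming a law clerk"]
best_two = select_similar_queries("becoming a paralegal", candidates, top_n=2)
above = select_similar_queries("becoming a paralegal", candidates, threshold=0.45)
```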
When the input query is a single word, the method shown in Fig. 6 can be used to generate multiple intent-similar queries. As shown in Fig. 6, in step S3310, intent-similar queries are generated by looking up an intent-similar query library. For example, if the intent-similar query library maintains a list of pop music stars, then when the input query concerns an emerging pop music star, pop music stars similar to that emerging star can be selected as intent-similar queries. Next, in step S3320, the similarity between each of the intent-similar queries and the input query is calculated. The method of calculating this similarity is described in more detail below. Finally, in step S3330, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity is greater than a predetermined threshold, are selected from the intent-similar queries as the output.
In addition, the method shown in Fig. 7 can be used to generate intent-similar queries. As shown in Fig. 7, in step S3410, multiple intent-similar queries are generated by looking up a domain ontology; that is, one or more sibling nodes of the input query in the domain ontology are obtained as the intent-similar queries. A "domain ontology" is a structured network of encyclopedic knowledge, such as Wikipedia. For example, suppose the input query is "Vanuatu". In a geography ontology, "Vanuatu" is an Oceanian country. The geography ontology can therefore be used to select "Fiji", "Indonesia", "Kiribati", "Marshall Islands", and so on as intent-similar queries. Next, in step S3420, the similarity between each of the intent-similar queries and the input query is calculated. The method of calculating this similarity is described in more detail below. Finally, in step S3430, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity is greater than a predetermined threshold, are selected from the intent-similar queries as the output.
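The sibling lookup of step S3410 can be sketched with a toy ontology stored as a child-to-parent mapping. The mapping, its category labels, and the function name are hypothetical; a real domain ontology would be far larger and may allow several parents per node.

```python
from typing import Dict, List


def sibling_queries(parent_of: Dict[str, str], query: str) -> List[str]:
    """Return the nodes that share a parent with `query` in the ontology."""
    parent = parent_of.get(query)
    if parent is None:
        return []  # the query is not in the ontology
    return sorted(
        node for node, p in parent_of.items() if p == parent and node != query
    )


# Hypothetical fragment of a geography ontology (child -> parent).
GEOGRAPHY = {
    "Vanuatu": "Oceanian countries",
    "Fiji": "Oceanian countries",
    "Kiribati": "Oceanian countries",
    "Marshall Islands": "Oceanian countries",
    "France": "European countries",
}

siblings = sibling_queries(GEOGRAPHY, "Vanuatu")
```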
Alternatively and/or additionally, concepts adjacent to the input query in a language dictionary can be obtained as the intent-similar queries.
Alternatively and/or additionally, one or more queries can be obtained from a query log as the intent-similar queries by calculating intent similarity based on the muster data associated with the input query.
In addition, the method shown in Fig. 8 can be used to generate intent-similar queries. As shown in Fig. 8, in step S3510, intent-similar queries are generated by using intent-similarity indicators. The intent-similarity indicators comprise at least one of the following: a coordination indicator, where the two phrases connected by the indicator serve as the same syntactic element in a sentence, such as "and" or "with"; a comparison indicator, where the first phrase in a sentence is compared with the second phrase that the indicator connects after it, such as "relative to", "compared to", or "vs"; and a choice indicator, where the two phrases connected by the indicator form an alternative in a sentence, such as "or", "among", or "between". An intent-similarity indicator shows that the phrases it links may be candidate intent-similar queries.
In other words, in step S3510, one or more query-phrase pairs are obtained from at least one data source, where each query-phrase pair comprises the input query, an intent-similarity indicator, and a third phrase; and the third phrase is extracted from each query-phrase pair as an intent-similar query.
For example, if the input query is "pressure-type cleaning machine", the following sentence segments can be obtained from the data source:
pressure-type cleaning machine vs cold high-pressure cleaning machine;
pressure-type cleaning machine vs pneumatic cleaning machine;
pressure-type cleaning machine and air compressor;
pressure-type cleaning machine and steam cleaner;
lawn mower or pressure-type cleaning machine.
Therefore, for the query "pressure-type cleaning machine", "cold high-pressure cleaning machine", "pneumatic cleaning machine", "air compressor", "steam cleaner", and "lawn mower" can be selected as intent-similar queries.
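The extraction in step S3510 can be sketched with regular expressions, as below. The sketch handles only segments of the form "query indicator phrase" or "phrase indicator query", uses a small assumed indicator list, and paraphrases the product names of the example; a real data source would need more robust parsing.

```python
import re
from typing import Iterable, List, Sequence

# Assumed, deliberately small list of intent-similarity indicators.
INDICATORS = ("vs", "and", "or")


def extract_similar_queries(
    query: str, segments: Iterable[str], indicators: Sequence[str] = INDICATORS
) -> List[str]:
    """Extract the phrase linked to `query` by an indicator in each segment."""
    alt = "|".join(re.escape(i) for i in indicators)
    q = re.escape(query)
    after = re.compile(rf"^{q}\s+(?:{alt})\s+(.+)$", re.IGNORECASE)
    before = re.compile(rf"^(.+?)\s+(?:{alt})\s+{q}$", re.IGNORECASE)
    found: List[str] = []
    for segment in segments:
        text = segment.strip().rstrip(";.")
        match = after.match(text) or before.match(text)
        if match:
            phrase = match.group(1).strip()
            if phrase not in found:  # keep first occurrence, preserve order
                found.append(phrase)
    return found


segments = [
    "pressure-type cleaning machine vs cold high-pressure cleaning machine;",
    "pressure-type cleaning machine vs pneumatic cleaning machine;",
    "pressure-type cleaning machine and air compressor;",
    "pressure-type cleaning machine and steam cleaner;",
    "lawn mower or pressure-type cleaning machine.",
]
similar = extract_similar_queries("pressure-type cleaning machine", segments)
```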
Next, in step S3520, the similarity between each of the intent-similar queries and the input query is calculated. The method of calculating this similarity is described in more detail below. Finally, in step S3530, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity is greater than a predetermined threshold, are selected from the intent-similar queries as the output.
In addition, when the input query is a multi-word query, the method shown in Fig. 9 can be used to generate intent-similar queries. As shown in Fig. 9, first, in step S3610, the core intent part and the modifier part of the multi-word input query are identified.
Fig. 10 shows a flowchart of a method of identifying the core intent part and the modifier part of the input query according to an embodiment of the present invention. As shown in Fig. 10, first, in step S3710, expanded queries are generated for each semantic unit of the input query. That is, the input query is parsed so as to divide it into multiple semantic units (multiple words); then, for each semantic unit of the input query, temporary intent-similar queries (expanded queries) are generated, each consisting of that semantic unit and a changing part, where the changing part is an intent-similar phrase generated for the other semantic units of the input query. In one embodiment, generating the intent-similar phrase (changing part) can comprise: obtaining one or more query-phrase pairs from at least one data source, where each query-phrase pair comprises the other semantic units of the input query, an intent-similarity indicator, and a third phrase; and extracting the third phrase from each query-phrase pair as the intent-similar phrase (changing part).
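Step S3710 can be sketched as follows: for each semantic unit, the unit itself is kept fixed while the other units are swapped for intent-similar phrases. The phrase lists are hypothetical inputs, and plain string replacement stands in for real parsing of the query.

```python
from typing import Dict, List, Sequence


def expanded_queries(
    query: str, units: Sequence[str], similar_phrases: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """For each unit, build queries that keep the unit and vary the other units."""
    result: Dict[str, List[str]] = {}
    for unit in units:
        variants: List[str] = []
        for other in units:
            if other == unit:
                continue
            # Swap the other unit for each of its intent-similar phrases.
            for phrase in similar_phrases.get(other, []):
                variants.append(query.replace(other, phrase))
        result[unit] = variants
    return result


# Hypothetical intent-similar phrases for each semantic unit.
phrases = {
    "becoming": ["training", "supervising"],
    "paralegal": ["engineer", "accountant"],
}
expanded = expanded_queries("becoming a paralegal", ["becoming", "paralegal"], phrases)
```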
Next, in step S3720, for each semantic primitive divided of described input inquiry, excavate one group of intention for the similar inquiry of each interim intention (expanding query), wherein each intention provides the sub-topics for the similar inquiry of corresponding interim intention.For each semantic primitive divided of described input inquiry, consistent degree is calculated by the intention group of the similar inquiry of interim intention of more corresponding semantic primitive, wherein said consistent degree is the homophylic tolerance of intention of the similar inquiry of interim intention for corresponding semantic primitive, if the intention type be present in the intention of the similar inquiry of interim intention is more general, then described consistent degree is higher.
Next, in step S3730, the semantic primitive in described input inquiry with the highest consistent degree is defined as the core intention part of described input inquiry, and other semantic primitive is defined as the modifier part of described input inquiry.
For example, for the input query "becoming a paralegal", expanded queries are generated for each word using the above method. Table 4 shows an example of the query words of a multi-word query and the corresponding expanded queries.
Table 4: Example of query words of a multi-word query and the corresponding expanded queries
Then, for each expanded query, intents are generated using a conventional method, and the consistency degree is calculated by comparing the intent groups mined for each semantic unit.
In one embodiment, the consistency degree can be calculated as follows:
$$\mathrm{Consi} = \frac{N_{PopIntent}}{N_{AllIntent}} \qquad (1)$$
where $N_{AllIntent}$ denotes the number of all intents obtained for the expanded queries of each semantic unit, and $N_{PopIntent}$ denotes the number of intent information descriptions that occur in more than 5 queries.
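Formula (1) can be illustrated with a minimal Python sketch. The function name `consistency_degree`, the input layout, and the choice to count distinct wildcard descriptions are assumptions made for illustration only; they are not specified in this description.

```python
from collections import Counter

def consistency_degree(intents_by_query, min_queries=5):
    """Return N_PopIntent / N_AllIntent for one semantic unit.

    intents_by_query maps each expanded query of the semantic unit to the
    wildcard intent descriptions mined for it (e.g. "becoming a * class").
    Assumption: N_AllIntent is the number of distinct descriptions.
    """
    # Count in how many expanded queries each description occurs.
    query_counts = Counter()
    for descriptions in intents_by_query.values():
        query_counts.update(set(descriptions))
    n_all = len(query_counts)
    if n_all == 0:
        return 0.0
    # A description is "popular" if it occurs in more than min_queries queries.
    n_pop = sum(1 for count in query_counts.values() if count > min_queries)
    return n_pop / n_all
```

The threshold of 5 queries follows the definition of $N_{PopIntent}$ above; it is exposed as a parameter because other embodiments might use a different value.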
For example, among the intents of "becoming a Engineer", "becoming a Accountant", "becoming a Law clerk" and the like, intent types such as "becoming a * class", "becoming a * degree" and "becoming a * training" are widespread. By contrast, among the intents of "training paralegal", "severing paralegal", "supervising a paralegal", "directing a paralegal" and the like, widespread intent types are rare. Therefore, for the input query "becoming a paralegal", the consistency degree of "becoming" is higher than that of "paralegal". In this example, data analysis gives a consistency degree of 0.81 for "becoming" and 0.03 for "paralegal". The core intent part of this query is therefore "becoming", the modifier part is "paralegal", and the intent of the query is determined mainly by "becoming".
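The selection rule of step S3730 can be sketched as follows, using the two consistency degrees reported in this example. The function name `split_core_and_modifier` is hypothetical.

```python
def split_core_and_modifier(consistency_by_unit):
    """Pick the semantic unit with the highest consistency degree as the
    core intent part; all other units form the modifier part."""
    core = max(consistency_by_unit, key=consistency_by_unit.get)
    modifiers = [unit for unit in consistency_by_unit if unit != core]
    return core, modifiers

# Consistency degrees taken from the "becoming a paralegal" example.
core, modifiers = split_core_and_modifier({"becoming": 0.81, "paralegal": 0.03})
```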
Referring back to Fig. 9, in step S3620, the intent-similar queries are generated by replacing the modifier part of the input query with multiple replacement parts, where each replacement part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type identical or similar to that of the modifier part of the input query. In one embodiment, generating the intent-similar phrases (replacement parts) comprises: obtaining one or more query phrases from at least one data source, where each query phrase comprises the modifier part, an intent-similarity indicator, and a third phrase; and extracting the third phrase from each query phrase as an intent-similar phrase (replacement part).
Next, in step S3630, the similarity between each of the intent-similar queries and the input query can be calculated. The method of calculating this similarity is described in more detail below. Finally, in step S3640, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, can be selected from the intent-similar queries as the output.
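The selection in step S3640 can be sketched as below. The function name, the parameter names and the example scores are hypothetical; allowing both criteria to be applied together is a design choice of this sketch, whereas the description presents them as alternatives.

```python
def select_similar_queries(scored, top_n=None, threshold=None):
    """Select intent-similar queries by similarity.

    scored maps each intent-similar query to its similarity with the input
    query. Keep queries scoring above `threshold` (if given), then keep the
    `top_n` highest-scoring ones (if given).
    """
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(query, score) for query, score in ranked if score > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [query for query, _ in ranked]

# Hypothetical similarity scores for illustration only.
scores = {"becoming a lawyer": 0.9, "becoming a engineer": 0.7, "becoming a chef": 0.2}
```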
Alternatively, when the input query is a multi-word query, intent-similar phrases can be generated only for the core intent part of the input query and used as the intent-similar queries. Specifically, when the input query is a multi-word query, generating the intent-similar queries for the input query can comprise: identifying the core intent part and the modifier part of the input query; and then generating intent-similar phrases for the core intent part of the input query as the intent-similar queries.
Then, the method described below can also be used to calculate the similarity between each of the intent-similar queries and the input query. Finally, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, can be selected from the intent-similar queries as the output.
Here, the core intent part and the modifier part of the input query can be identified by the method described with reference to Fig. 10. First, the input query is parsed and divided into multiple semantic units (multiple words); for each semantic unit so obtained, temporary intent-similar queries are generated, each consisting of that semantic unit and a variable part, where the variable part is an intent-similar phrase generated for the other semantic units of the input query. In one embodiment, generating the intent-similar phrases (variable parts) can comprise: obtaining one or more query phrases from at least one data source, where each query phrase comprises the other semantic units of the input query, an intent-similarity indicator, and a third phrase; and extracting the third phrase from each query phrase as an intent-similar phrase (variable part). Next, for each semantic unit of the input query, a group of intents is mined for each temporary intent-similar query, where each intent provides a subtopic of the corresponding temporary intent-similar query. For each semantic unit of the input query, a consistency degree is calculated by comparing the intent groups of the temporary intent-similar queries of that semantic unit; the more widespread the intent types among the intents of the temporary intent-similar queries, the higher the consistency degree. Finally, the semantic unit of the input query with the highest consistency degree is determined to be the core intent part of the input query, and the other semantic units are determined to be the modifier part of the input query. In addition, in one embodiment, generating the intent-similar phrases for the core intent part of the input query comprises: obtaining one or more query phrases from at least one data source, where each query phrase comprises the core intent part of the input query, an intent-similarity indicator, and a third phrase; and extracting the third phrase from each query phrase as an intent-similar query.
For example, if the input query is "black history", the core intent part of this input query can be determined to be "history". The modifier part "black" can be disregarded, and only the intent-similar phrases of "history" are generated, such as "history timeline", "study of history", "list of famous history", "resources history" and so on, as the intent-similar queries.
The method of calculating the similarity between a query among the intent-similar queries and the input query is described below. The similarity between each of the intent-similar queries and the input query is calculated from at least one of the following.
(1) the consistency degree between the query and the input query: the more similar the intent types of the intent-similar query and the input query, the higher the similarity between them;
(2) the lexical similarity between the query and the input query: the more similar the forms of the intent-similar query and the input query, the higher the similarity between the two queries; for example, "motorbike" is more similar to "motorscooter" and "bike" than to "car";
(3) the grammatical similarity between the query and the input query: the more similar the grammatical patterns of the intent-similar query and the input query within their context (snippet or document), the higher the similarity between the two queries; for example, relative to "ride a bike", the similarity between "drive a car" and "drive a motor" is higher;
(4) the semantic similarity between the query and the input query: the more similar the intent-similar query and the input query are in meaning, the higher the similarity between the two queries;
(5) the context similarity of the query and the input query in a prepared corpus: the more similar the contexts (snippets or documents) of the intent-similar query and the input query, the higher the similarity between the two queries;
(6) the co-occurrence rate of the query and the input query in query logs: the more frequently the intent-similar query and the input query occur together in the query logs, the higher the similarity between the two queries;
(7) the distance between the query and the input query in a domain ontology: for example, Britain, Japan and France are all countries, but because Britain and France are both European countries in the ontology, the similarity between Britain and France is higher than that between Britain and Japan; and
(8) the similarity of the muster data of the query and the input query: if the curves of the muster data of the intent-similar query and of the input query are similar, the two queries are similar.
In addition, the similarity between each of the intent-similar queries and the input query can also be calculated from at least one item of real-world information, the real-world information comprising at least: time, location, user model and environment.
For example, if the input query is "university of phoenix", the generated intent-similar queries can be as shown in Table 5.
Table 5: Intent-similar queries of "university of phoenix"
When a user searches in Beijing, the user may wish to obtain information about "university of phoenix" as a "university in the United States", whereas when a user searches in Mesa, Arizona, USA, the user may wish to obtain information about "university of phoenix" as a "university in Arizona". Therefore, for these two users in different locations, the similarity of each generated intent-similar query is different.
For the user in Beijing, the most similar queries may be Stanford University, Harvard University, Massachusetts Institute of Technology and University of Pennsylvania. For the user in Mesa, Arizona, USA, the most similar queries may be Western International University, Grand Canyon University, University of Arizona and Northern Arizona University.
In addition, those skilled in the art will appreciate that the similarity of the intent-similar queries for the input query also differs if the identity of the user or the device used (such as a computer, a mobile phone, a printer, etc.) differs.
In addition, those skilled in the art will appreciate that the various ways of generating intent-similar queries described above can be combined in any manner.
Returning to Fig. 4, in step S3120, a group of intents is mined for each intent-similar query by using a prior-art method, where each intent provides a subtopic of the corresponding intent-similar query.
Next, in step S3130, a similar intent information description set is determined by using all intent groups of the intent-similar queries. A similar intent information description is the linguistic form of the corresponding intent type. For example, as shown in Table 2, the intent type of "becoming a paralegal class" is "course"; however, the present invention does not need to identify the intent type of the intent, and only needs to extract the similar intent information description of the intent. For example, "* class" is extracted from "becoming a paralegal class".
A similar intent information description can be generated from queries; for example, "becoming a engineer class" and "steps on becoming a lawyer" are used to generate the similar intent information descriptions "becoming a paralegal class" and "steps on becoming a paralegal". In addition, a similar intent information description can also be expressed as a regular expression over the query. For example, the query "becoming a paralegal" has the intent "becoming a paralegal class", and the query "becoming a engineer" has the intent "becoming a engineer class"; the similar intent information description can therefore be expressed as "* class".
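The extraction of wildcard descriptions such as "* class" can be sketched as follows. The function names `to_description` and `collect_descriptions`, and the use of plain substring replacement, are assumptions made for illustration.

```python
from collections import Counter

def to_description(similar_query, intent, wildcard="*"):
    """Replace the intent-similar query inside one of its intents with a
    wildcard, e.g. "becoming a engineer class" -> "* class"."""
    if similar_query not in intent:
        return None
    # Normalise whitespace after the replacement.
    return " ".join(intent.replace(similar_query, wildcard).split())

def collect_descriptions(intents_by_similar_query):
    """Count how often each wildcard description occurs over all intent groups."""
    counts = Counter()
    for query, intents in intents_by_similar_query.items():
        for intent in intents:
            description = to_description(query, intent)
            if description is not None:
                counts[description] += 1
    return counts

mined = {
    "becoming a engineer": ["becoming a engineer class"],
    "becoming a lawyer": ["steps on becoming a lawyer", "becoming a lawyer class"],
}
description_counts = collect_descriptions(mined)
```

Here the description "* class" is supported by two intent-similar queries and "steps on *" by one; such counts could later feed a frequency-based confidence.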
According to one embodiment of the present invention, the similar intent information description set can be determined by the following steps: analyzing the linguistic form of each intent in all intent groups of the intent-similar queries; determining at least one query-intent relation between the linguistic form of the corresponding intent-similar query within that linguistic form and the remaining linguistic forms; transforming the linguistic form of each intent into a regular expression in accordance with the determined at least one query-intent relation; and adding the regular expression so obtained to the similar intent information description set.
Preferably, determining the similar intent information description set can further comprise expanding each intent group, which comprises: for each intent in the intent group, generating a synonym phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, where the at least one word is not in the corresponding intent-similar query; and adding the generated synonym phrase to the intent group.
Similar intent information descriptions can be of several types, for example vocabulary-type, grammar-type, semantic-type and logic-type similar intent information descriptions.
According to an embodiment of the present invention, any one or more of lexical analysis, grammatical analysis, semantic relation analysis and logic analysis can be performed (in any order) on each intent in all intent groups of the intent-similar queries, and the similar intent information descriptions so obtained can be combined, thereby determining the similar intent information description set.
Fig. 11 is a flowchart of a method, according to an embodiment of the present invention, for determining the similar intent information description set by lexical analysis.
As shown in Fig. 11, first, in step S4100, each intent in all intent groups of the intent-similar queries is parsed by lexical analysis to detect whether the corresponding intent-similar query satisfies at least one lexical rule. If the corresponding intent-similar query satisfies at least one lexical rule, then, in step S4110, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent is replaced with a wildcard to obtain a transformed intent. Next, in step S4120, a vocabulary-type similar intent information description in the form of vocabulary plus wildcard is determined: the transformed intent is taken as the vocabulary-type similar intent information description, this description is taken as the regular expression, and the regular expression is added to the similar intent information description set.
For example, if the input query is "scooter", the following exemplary vocabulary-type similar intent information descriptions can be generated:
*store
electronic*
online
cheap*
*motor
Fig. 12 is a flowchart of a method, according to an embodiment of the present invention, for determining the similar intent information description set by grammatical analysis.
As shown in Fig. 12, first, in step S4200, each intent in all intent groups of the intent-similar queries is parsed by grammatical analysis to detect whether the corresponding intent-similar query satisfies at least one grammar rule. If the corresponding intent-similar query satisfies at least one grammar rule, then, in step S4210, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent is replaced with a wildcard to obtain a transformed intent. Next, in step S4220, a grammar-type similar intent information description in the form of grammar rule plus wildcard is determined: the transformed intent is taken as the grammar-type similar intent information description, this description is taken as the regular expression, and the regular expression is added to the similar intent information description set.
For example, for the input query "scooter", the following exemplary grammar-type similar intent information descriptions can be generated:
*/prep/kids
how to/verb/*
*/prep/sale
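A grammar-type description such as "*/prep/kids" can be sketched as below. The tiny part-of-speech table `POS_TAGS` and the function name are hypothetical stand-ins for a real grammatical analyzer, and the sketch does not handle fixed multi-word phrases such as "how to".

```python
# Hypothetical part-of-speech lexicon; a real system would use a POS tagger.
POS_TAGS = {"for": "prep", "on": "prep", "with": "prep", "ride": "verb", "fix": "verb"}

def grammar_description(similar_query, intent, pos_tags=POS_TAGS):
    """Replace the intent-similar query with a wildcard and every word found
    in the lexicon with its part-of-speech tag, joining the parts with "/"."""
    tokens = intent.lower().replace(similar_query.lower(), "*").split()
    if "*" not in tokens:
        return None  # the query does not occur as a whole word in the intent
    return "/".join(pos_tags.get(token, token) for token in tokens)
```

With this sketch, the intent "scooter for kids" of the query "scooter" yields "*/prep/kids", and "scooter for sale" yields "*/prep/sale".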
Fig. 13 is a flowchart of a method, according to an embodiment of the present invention, for determining the similar intent information description set by semantic relation analysis.
As shown in Fig. 13, first, in step S4300, each intent in all intent groups of the intent-similar queries is parsed by semantic relation analysis to detect whether the corresponding intent-similar query satisfies at least one semantic relation. If the corresponding intent-similar query satisfies at least one semantic relation, then, in step S4310, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent is replaced with a wildcard, and the remaining linguistic forms of the intent are replaced with their semantic markers, to obtain a transformed intent. Next, in step S4320, a semantic-type similar intent information description in the form of semantic marker plus wildcard is determined: the transformed intent is taken as the semantic-type similar intent information description, this description is taken as the regular expression, and the regular expression is added to the similar intent information description set.
For example, for the input query "scooter", the following exemplary semantic-type similar intent information descriptions can be generated:
*<brand>
*<company>
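Steps S4310 and S4320 can be sketched as follows. The lexicon `SEMANTIC_LABELS`, its entries and the function name are hypothetical and serve only to illustrate the replacement of the remaining words by semantic markers.

```python
# Hypothetical semantic lexicon; labels and entries are illustrative only.
SEMANTIC_LABELS = {"razor": "brand", "vespa": "brand", "honda": "company"}

def semantic_description(similar_query, intent, labels=SEMANTIC_LABELS):
    """Replace the intent-similar query with a wildcard and each remaining
    word with its semantic marker, e.g. "scooter razor" -> "*<brand>"."""
    tokens = intent.lower().replace(similar_query.lower(), "*").split()
    if "*" not in tokens:
        return None
    parts = []
    for token in tokens:
        if token == "*":
            parts.append("*")
        elif token in labels:
            parts.append(f"<{labels[token]}>")
        else:
            return None  # a remaining word has no known semantic marker
    return "".join(parts)
```

Intents whose remaining words cannot be labeled are skipped in this sketch; another embodiment could fall back to a vocabulary-type description instead.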
Fig. 14 is a flowchart of a method, according to an embodiment of the present invention, for determining the similar intent information description set by logic analysis.
As shown in Fig. 14, first, in step S4400, each intent in all intent groups of the intent-similar queries is parsed by logic analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation. If the corresponding intent-similar query satisfies at least one logical relation, then, in step S4410, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent is replaced with a wildcard, and the remaining linguistic forms of the intent are replaced with their logic types, to obtain a transformed intent. Next, in step S4420, a logic-type similar intent information description in the form of logic type plus wildcard is determined: the transformed intent is taken as the logic-type similar intent information description, this description is taken as the regular expression, and the regular expression is added to the similar intent information description set.
For example, for the input query "scooter", the following exemplary logic-type similar intent information descriptions can be generated:
*[version of](Word)
(Word)[place of]*
As mentioned above, any one or more of lexical analysis, grammatical analysis, semantic relation analysis and logic analysis can be performed on each intent in all intent groups of the intent-similar queries. For example, only a single one of these analyses can be performed on each intent, or all four analyses can be performed on each intent. Accordingly, the resulting similar intent information description set can comprise one or more of vocabulary-type, grammar-type, semantic-type and logic-type similar intent information descriptions.
In addition, in one embodiment, determining the similar intent information description set can further comprise: calculating the confidence of each similar intent information description in the set; and selecting from the set either a specific number of similar intent information descriptions with the highest confidence or the similar intent information descriptions whose confidence exceeds a predetermined threshold.
In addition, the confidence can be calculated using at least one of the following: the frequency of the similar intent information description; the coverage of the similar intent information description; and the relevance of the similar intent information description to the input query.
In addition, the confidence can be calculated from at least one of the following: the similar intent information description set; a prepared intent training set; and prepared domain information.
In addition, calculating the confidence of the similar intent information descriptions from the similar intent information description set can further comprise: assigning different weights to the respective similar intent information descriptions in the set according to the popularity of the intent-similar queries; and/or assigning different weights to the respective similar intent information descriptions in the set according to the similarity between the intent-similar queries and the input query.
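One way to combine coverage with per-query weights is sketched below. The formula (the share of total query weight carried by the intent-similar queries that exhibit a description), the function name and the example weights are all assumptions made for illustration; the description does not fix a particular formula.

```python
from collections import defaultdict

def weighted_confidence(descriptions_by_query, query_weight):
    """Confidence of each description = weight of the intent-similar queries
    exhibiting it, divided by the total weight of all intent-similar queries."""
    totals = defaultdict(float)
    for query, descriptions in descriptions_by_query.items():
        for description in set(descriptions):
            totals[description] += query_weight.get(query, 0.0)
    norm = sum(query_weight.get(query, 0.0) for query in descriptions_by_query)
    if norm == 0:
        return {}
    return {description: total / norm for description, total in totals.items()}

# Hypothetical weights and descriptions, for illustration only.
weights = {"stanford university": 0.5, "harvard university": 0.25, "grand canyon university": 0.25}
found = {
    "stanford university": ["* tuition", "* admissions"],
    "harvard university": ["* tuition"],
    "grand canyon university": ["* mascot"],
}
confidence = weighted_confidence(found, weights)
```

In this example "* tuition" receives the highest confidence because it is exhibited by the two most heavily weighted intent-similar queries.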
Consider again the earlier query "university of phoenix". For the user in Beijing, because the similarity of Stanford University, Harvard University, Massachusetts Institute of Technology and University of Pennsylvania is high, higher weights can be assigned to these intent-similar queries. Table 6 shows the weight assigned to each intent-similar query of "university of phoenix".
Table 6: Example weights of the intent-similar queries of "university of phoenix"
Therefore, for the input query "university of phoenix", similar intent information descriptions of the form "university of *" obtain a higher weight.
Returning to Fig. 4, next, in step S3140, intents for the input query are mined by using the similar intent information description set. In one embodiment, intents for the input query can be generated by replacing the wildcard in each similar intent information description in the set with the input query. For example, if the input query is "becoming a paralegal" and the similar intent information description is "step to *", the new intent "step to becoming a paralegal" can be generated, and the generated intent can be output.
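The wildcard substitution of step S3140 can be sketched as follows; the function name `generate_intents` is hypothetical.

```python
def generate_intents(input_query, descriptions, wildcard="*"):
    """Generate intents by substituting the input query for the wildcard in
    each similar intent information description."""
    intents = []
    for description in descriptions:
        if wildcard not in description:
            continue
        # Pad with spaces, then normalise, so "step to*" and "step to *" agree.
        filled = description.replace(wildcard, f" {input_query} ")
        intents.append(" ".join(filled.split()))
    return intents

new_intents = generate_intents("becoming a paralegal", ["step to *", "* class"])
```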
Fig. 15 is a flowchart of another method, according to an embodiment of the present invention, for mining intents by using intent-similar queries. The method shown in Fig. 15 combines a prior-art intent mining method with the method according to the present invention in order to achieve more accurate intent mining. For brevity, detailed description of the steps in this embodiment that are identical to those of the embodiment described with reference to Fig. 4 is omitted.
As shown in Fig. 15, first, in step S5100, the query input by the user is obtained. Those skilled in the art will appreciate that the query input by the user can be in various languages, including but not limited to Chinese, English, Japanese, Korean, German, French, Russian, Arabic and so on. For example, the query input by the user can be "becoming a paralegal".
Next, in step S5110, a group of intent candidates for the input query is mined from global external resources such as search engines, Wikipedia and query logs by using a method well known in the prior art. Next, in step S5120, duplicate intent candidates are removed from the obtained intent candidates. Next, in step S5130, the intent candidates are sorted to obtain a first group of intents.
Table 7 shows the first group of intents obtained, by using a method well known in the prior art, for the user's input query "becoming a paralegal".
Table 7: First group of intents obtained for "becoming a paralegal"
Continuing with Fig. 15, a second group of intents for the input query is mined by using the method of the present invention. That is, in step S3110, intent-similar queries are generated for the input query, where each intent-similar query has an intent type identical or similar to that of the input query. In step S3120, a group of intents is mined for each intent-similar query, where each intent provides a subtopic of the corresponding intent-similar query. In step S3130, the similar intent information description set is determined by using all intent groups of the intent-similar queries. In step S3140, the second group of intents for the input query is mined by using the similar intent information description set.
Next, in step S5140, the combination of the first group of intents and the second group of intents is sorted. In one embodiment, intents that appear only in the first group of intents can be deleted.
In another embodiment, the second group of intents for the input query can also be mined by using both the similar intent information description set and the first group of intents. This embodiment comprises: generating at least one intent by replacing the wildcard in at least one similar intent information description in the set with the input query, where the at least one intent is not in the first group of intents; and adding the generated at least one intent to the first group of intents, the first group of intents with the generated at least one intent added being taken as the second group of intents.
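The embodiment just described, in which newly generated intents are appended to the first group, can be sketched as follows. The function name and the example data are hypothetical.

```python
def build_second_group(first_group, input_query, descriptions, wildcard="*"):
    """Return the first group of intents extended by every intent generated
    from a wildcard description that is not already in the first group."""
    second_group = list(first_group)
    seen = set(first_group)
    for description in descriptions:
        if wildcard not in description:
            continue
        filled = description.replace(wildcard, f" {input_query} ")
        candidate = " ".join(filled.split())
        if candidate not in seen:
            seen.add(candidate)
            second_group.append(candidate)
    return second_group

second = build_second_group(
    ["becoming a paralegal class"],   # hypothetical first group
    "becoming a paralegal",
    ["* class", "step to *"],         # hypothetical description set
)
```

Existing intents keep their original order in this sketch, so a subsequent sorting step such as S5140 can still rank the combined group.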
However, some queries may have peculiar intents that are not present among the intents of the intent-similar queries. In some embodiments of the present invention, these peculiar intents are handled specially. For example, for the input query "last supper painting", the first group of intents for this input query is shown in Table 8.
Table 8: First group of intents for "last supper painting"
As can be seen from Table 8, compared with the intent-similar queries for other paintings by Leonardo da Vinci, "last supper painting Jesus" and "Last Supper Painting Milan Italy" are peculiar to this query. It is desirable to retain such peculiar intents during intent mining. Therefore, according to an embodiment of the present invention, another implementation of mining the second group of intents for the input query by using the similar intent information description set and the first group of intents comprises: sorting the first group of intents for the input query by using the similar intent information description set. This implementation further comprises: identifying the peculiar intents in the first group of intents for the input query; and raising the weight of each peculiar intent in the sorting according to its degree of peculiarity, where the degree of peculiarity of a peculiar intent is calculated from at least one of the following: the co-occurrence rate of the input query and the peculiar intent in a prepared intent training set; the relation between the input query and the peculiar intent in domain knowledge; the frequency of the peculiar intent in the muster data; and the popularity of the peculiar intent in query logs.
Continue with reference to Figure 15, in step S5150, intention exports by the requirement according to user.Such as, the intention of specific quantity can be exported.Table 9 shows the intention exported for input inquiry " becominga paralegal ".Obviously, with reference to the time of day data (model answer) shown by table 2, the result obtained more meets the requirement of user compared to the first group of result obtained by prior art.
Table 9: Intentions output by the method of the present invention for the input query "becoming a paralegal"
The inventors tested the method of Figure 15 according to the present invention against prior-art methods. In testing, the method shown in Figure 1 was the best-performing prior-art method, so it was selected as the baseline for comparison with the method of the invention.
Using the prior-art method shown in Figure 1, intention candidates for each query were mined from global external resources such as search engines, Wikipedia, query logs, and anchor text, and the intention candidates were ranked by frequency of occurrence.
For comparison, intention mining was performed using the method of Figure 15 according to the present invention, and the intention candidates were likewise ranked by frequency of occurrence. The inventors tested 50 queries, including "furniture for small spaces", "Churchill downs", "becoming a paralegal", "internet phone service", "Arkansas", "battles in the civil war", "hobby stores", and "Ontario California airport". Table 10 shows the average test results.
Metric     Prior art   Present invention   Improvement
I-rec      0.3785      0.3933              0.0148
D-nDCG     0.3384      0.3715              0.0331
D#-nDCG    0.3584      0.3826              0.0242
Table 10: Performance comparison of the present invention and the prior art
As Table 10 shows, compared with the prior-art method, the method of Figure 15 according to the present invention improves both intention recall and intention accuracy. In addition, on D#-nDCG the method of the present invention improves on the prior-art method by 0.0242 in absolute terms, that is, 2.42 percentage points.
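The improvement figures in Table 10 are absolute differences between the two columns. The short check below recomputes them from the reported averages and also derives the corresponding relative change. The relative figures are not stated in the description and are computed here only for orientation; the variable and function names are illustrative.

```python
# Average scores reported in Table 10: (prior art, present invention).
table10 = {
    "I-rec": (0.3785, 0.3933),
    "D-nDCG": (0.3384, 0.3715),
    "D#-nDCG": (0.3584, 0.3826),
}


def improvements(table):
    """Return {metric: (absolute difference, relative change in percent)}."""
    result = {}
    for metric, (prior, present) in table.items():
        absolute = round(present - prior, 4)
        relative = round(100.0 * (present - prior) / prior, 2)
        result[metric] = (absolute, relative)
    return result


gains = improvements(table10)
```

The absolute differences reproduce the "Improvement" column, which confirms that the 2.42% quoted for D#-nDCG is a difference in score rather than a ratio.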
To show the effect of the present invention more intuitively, the input query "becoming a paralegal" is examined in detail. For this input, the top 10 results output by the present invention and by the prior art are compared. Table 11 shows the expected ground-truth data. Table 12 shows the respective outputs of the prior art and the present invention. Table 13 shows the test comparison between the prior art and the present invention; the results obtained by the present invention are clearly more accurate. That is, the present invention can improve the accuracy of intention mining.
Table 11: Expected ground-truth data
Table 12: Respective outputs of the prior art and the present invention
Metric     Prior art   Present invention
I-rec      0.1111      0.3333
D-nDCG     0.0734      0.5053
D#-nDCG    0.0922      0.4193
Table 13: Test comparison of the prior art and the present invention
The above test comparison further confirms that, compared with the prior art, the present invention performs intention mining more accurately and improves intention recall.
Figure 16 shows a flowchart of a method for information retrieval according to an embodiment of the present invention. As shown in Figure 16, in step S6100, an input query expressed by a user in natural language is received. Next, in step S6110, intentions are mined from the input query according to the method described herein that uses intention-similar queries. Next, in step S6120, search results for the mined intentions are obtained.
Figure 17 shows a flowchart of a method for question-answering assistance according to an embodiment of the present invention. As shown in Figure 17, in step S6200, an input query expressed by a user in natural language is received. Next, in step S6210, topics are mined from the input query according to the method described herein that uses intention-similar queries. Next, in step S6220, answers for the mined topics are obtained.
Figure 18 shows a functional block diagram of a device 7000 for intention mining according to an embodiment of the present invention. All functional modules of the device 7000 (that is, the various units included in the device 7000, whether or not they are shown in the figure) may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the invention. Those skilled in the art will understand that the functional modules described in Figure 18 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
As shown in Figure 18, according to one aspect of the present invention, the device 7000 for intention mining may comprise: an input query obtaining unit 7100, an intention-similar query generation unit 7200, a first intention mining unit 7300, a similar intent information description set determination unit 7400, and a second intention mining unit 7500. The input query obtaining unit 7100 is configured to obtain an input query. The intention-similar query generation unit 7200 is configured to generate intention-similar queries for the input query, wherein each intention-similar query has an intention type identical or similar to that of the input query. The first intention mining unit 7300 is configured to mine a group of intentions for each intention-similar query, wherein each intention provides a subtopic of the corresponding intention-similar query. The similar intent information description set determination unit 7400 is configured to determine a similar intent information description set by using all the intention groups of the intention-similar queries. The second intention mining unit 7500 is configured to mine intentions for the input query by using the similar intent information description set.
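The data flow between units 7200 to 7500 can be sketched as a composition of four functions. This is a hedged illustration, not the device itself: the function `mine_intentions`, its parameter names, and the toy stand-ins below are all hypothetical, and each stand-in replaces what would in practice be a substantial component (for example, mining from a search engine or a query log).

```python
from typing import Callable, Dict, List


def mine_intentions(
    input_query: str,
    generate_similar_queries: Callable[[str], List[str]],
    mine_group: Callable[[str], List[str]],
    build_description_set: Callable[[Dict[str, List[str]]], List[str]],
    mine_with_descriptions: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Chain the four processing stages in the order given in the description."""
    similar_queries = generate_similar_queries(input_query)   # role of unit 7200
    groups = {q: mine_group(q) for q in similar_queries}      # role of unit 7300
    descriptions = build_description_set(groups)              # role of unit 7400
    return mine_with_descriptions(input_query, descriptions)  # role of unit 7500


# Toy stand-ins with invented data, used only to exercise the pipeline.
def toy_similar(query):
    return ["becoming a lawyer", "becoming a nurse"]


def toy_mine(query):
    return [query + " salary", query + " requirements"]


def toy_build(groups):
    descriptions = []
    for query, intentions in groups.items():
        for intention in intentions:
            description = intention.replace(query, "*")
            if description not in descriptions:
                descriptions.append(description)
    return descriptions


def toy_apply(query, descriptions):
    return [d.replace("*", query) for d in descriptions]


result = mine_intentions(
    "becoming a paralegal", toy_similar, toy_mine, toy_build, toy_apply
)
```

Passing the stages as callables mirrors the statement that the functional modules may be combined or divided freely.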
In one embodiment, the intention-similar query generation unit 7200 may comprise: a query phrase pair obtaining unit, which obtains one or more query phrase pairs from at least one data source, wherein each query phrase pair comprises the input query, an intention-similar indicator, and a third phrase; and a third phrase extraction unit, which extracts the third phrase from each query phrase pair as an intention-similar query.
In one embodiment, the intention-similar indicator may comprise at least one of the following: a coordination indicator, wherein the two phrases connected by the coordination indicator serve as the same syntactic element in a sentence; a contrast indicator, wherein the first phrase in a sentence stands in a contrast relation with the second phrase that the contrast indicator connects after the first phrase; and a choice indicator, wherein the two phrases connected by the choice indicator form an alternative expression in a sentence.
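A minimal sketch of extracting the third phrase from text of the form "input query, indicator, third phrase" is given below. The indicator words in `INDICATORS` are assumptions chosen for illustration, since the description names only the three indicator categories. The function name `extract_similar_queries` and the example sentences are likewise hypothetical, and the sketch handles only the case where the third phrase follows the input query.

```python
import re

# Hypothetical indicator words for the three categories named in the text.
INDICATORS = {
    "coordination": ["and"],
    "contrast": ["vs", "versus"],
    "choice": ["or"],
}


def extract_similar_queries(input_query, sentences):
    """Return the phrases that follow '<input query> <indicator>' in the sentences."""
    words = [w for group in INDICATORS.values() for w in group]
    # Longer indicators first so that "versus" is tried before "vs".
    alternation = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    pattern = re.compile(
        r"\b" + re.escape(input_query)
        + r"\s+(?:" + alternation + r")\.?\s+([^,.;?!]+)",
        re.IGNORECASE,
    )
    found = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            phrase = match.group(1).strip()
            # Skip empty captures and case-insensitive duplicates.
            if phrase and phrase.lower() not in (p.lower() for p in found):
                found.append(phrase)
    return found


sentences = [
    "The last supper painting and mona lisa painting, both by Leonardo",
    "Last Supper painting vs. Sistine Chapel ceiling.",
]
similar = extract_similar_queries("last supper painting", sentences)
```

In practice the third phrase would need boundary detection better than "up to the next punctuation mark", for example by phrase chunking.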
In one embodiment, when the input query is a multi-word query, the intention-similar query generation unit 7200 may comprise: a core intention part and modifier part identification unit, which identifies the core intention part and the modifier part of the input query; and an intention-similar phrase generation unit, which generates intention-similar phrases for the core intention part of the input query as the intention-similar queries.
In one embodiment, when the input query is a multi-word query, the intention-similar query generation unit 7200 may comprise: a core intention part and modifier part identification unit, which identifies the core intention part and the modifier part of the input query; and a modifier part replacement unit, which generates the intention-similar queries by replacing the modifier part of the input query with a plurality of substitute parts, wherein each substitute part is an intention-similar phrase generated for the modifier part, and each intention-similar phrase has an intention type identical or similar to that of the modifier part of the input query.
In one embodiment, the core intention part and modifier part identification unit may comprise: an input query parsing unit, which parses the input query to divide it into a plurality of semantic units; a temporary intention-similar query generation unit which, for each semantic unit of the input query, generates temporary intention-similar queries each composed of that semantic unit and a variable part, wherein the variable part is an intention-similar phrase generated for the other semantic units of the input query; a third intention mining unit which, for each semantic unit of the input query, mines a group of intentions for each temporary intention-similar query, wherein each intention provides a subtopic of the corresponding temporary intention-similar query; a consistency degree calculation unit which, for each semantic unit of the input query, calculates a consistency degree by comparing the intention groups of the temporary intention-similar queries of that semantic unit, wherein the consistency degree measures the similarity in intention among the temporary intention-similar queries of that semantic unit, and the more common the intention types present in the intentions of the temporary intention-similar queries, the higher the consistency degree; and a core intention part determination unit, which determines the semantic unit with the highest consistency degree in the input query as the core intention part of the input query, and determines the other semantic units as the modifier part of the input query.
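The selection of the core intention part by consistency degree can be sketched as follows. The description does not fix a formula for the consistency degree; the average pairwise Jaccard overlap used here is an assumed stand-in, and the function names, the split into two semantic units, and the intention types in the example are all invented for illustration.

```python
from itertools import combinations


def consistency_degree(intention_type_sets):
    """Average pairwise Jaccard overlap of the intention types of the
    temporary intention-similar queries (an assumed measure)."""
    pairs = list(combinations(intention_type_sets, 2))
    if not pairs:
        return 0.0

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def split_core_and_modifier(unit_to_type_sets):
    """Pick the semantic unit with the highest consistency degree as the core
    intention part; all remaining units form the modifier part."""
    degrees = {
        unit: consistency_degree(sets)
        for unit, sets in unit_to_type_sets.items()
    }
    core = max(degrees, key=degrees.get)
    modifiers = [unit for unit in unit_to_type_sets if unit != core]
    return core, modifiers, degrees


# Invented intention types: one set per temporary intention-similar query,
# grouped by the semantic unit that was held fixed.
mined = {
    "becoming": [
        {"requirements", "salary", "schools"},
        {"requirements", "salary", "training"},
    ],
    "paralegal": [
        {"salary", "jobs"},
        {"definition", "software"},
    ],
}
core, modifiers, degrees = split_core_and_modifier(mined)
```

A unit whose temporary queries keep yielding the same kinds of intentions is the one that determines the intention type, which is why the maximum is taken.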
In one embodiment, the intention-similar query generation unit may comprise at least one of the following: a unit that obtains one or more queries, as the intention-similar queries, from a library storing intention-similar queries of the input query; a unit that obtains one or more sibling nodes of the input query in a domain ontology as the intention-similar queries; a unit that obtains neighboring concepts of the input query in a language dictionary as the intention-similar queries; and a unit that obtains one or more queries from a query log, as the intention-similar queries, by calculating intention similarity based on click-through data associated with the input query.
In one embodiment, the intention-similar query generation unit may further comprise: a similarity degree calculation unit, which calculates the similarity degree between each of the intention-similar queries and the input query; and an intention-similar query selection unit, which selects, from the intention-similar queries, a specific number of intention-similar queries with the highest similarity degree, or the intention-similar queries whose similarity degree is greater than a predetermined threshold.
In one embodiment, the similarity degree between each of the intention-similar queries and the input query may be calculated from at least one of the following: the consistency degree between the query and the input query; the lexical similarity between the query and the input query; the syntactic similarity between the query and the input query; the semantic similarity between the query and the input query; the contextual similarity of the query and the input query in a prepared corpus; the co-occurrence rate of the query and the input query in a query log; the distance between the query and the input query in a domain ontology; and the similarity between the click-through data of the query and that of the input query.
In one embodiment, the similarity degree between each of the intention-similar queries and the input query may be calculated from at least one item of real-world information, the real-world information comprising at least: time, location, user model, and environment.
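The selection step (a specific number of most similar queries, or those above a threshold) can be sketched with a single similarity feature. Token-level Jaccard overlap is used below as one possible lexical similarity; it is an assumption, since the description lists the kinds of similarity without fixing formulas. The function names and candidate queries are hypothetical.

```python
def lexical_similarity(query_a, query_b):
    """Jaccard overlap of lower-cased tokens (an assumed lexical measure)."""
    tokens_a = set(query_a.lower().split())
    tokens_b = set(query_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def select_similar_queries(similarities, top_n=None, threshold=None):
    """Keep queries above `threshold` and/or the `top_n` most similar ones."""
    ranked = sorted(similarities.items(), key=lambda kv: (-kv[1], kv[0]))
    if threshold is not None:
        ranked = [(q, s) for q, s in ranked if s > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [q for q, _ in ranked]


input_query = "becoming a paralegal"
candidates = ["becoming a lawyer", "paralegal salary", "becoming a nurse"]
similarities = {q: lexical_similarity(input_query, q) for q in candidates}

by_count = select_similar_queries(similarities, top_n=2)
by_threshold = select_similar_queries(similarities, threshold=0.3)
```

Several features (semantic similarity, ontology distance, click-through similarity, and so on) could be combined into one score before selection, for example as a weighted sum.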
In one embodiment, each similar intent information description may be represented by a regular expression of the input query.
In one embodiment, the similar intent information description set determination unit 7400 may comprise: a linguistic form analysis unit, which analyzes the linguistic form of each intention in all the intention groups of the intention-similar queries; a query-intention relation determination unit, which determines at least one query-intention relation between the linguistic form of the corresponding intention-similar query within that linguistic form and the remaining linguistic forms; a regular expression transformation unit, which transforms the linguistic form of each intention into a regular expression corresponding to the determined at least one query-intention relation; and a regular expression adding unit, which adds the regular expressions obtained by the transformation to the similar intent information description set.
In one embodiment, the similar intent information description set determination unit 7400 may further comprise an intention group expansion unit, which expands each intention group and comprises: a synonym phrase generation unit which, for each intention in the intention group, generates a synonym phrase by replacing at least one word in the intention with a synonym or near-synonym of that word, wherein the at least one word does not appear in the corresponding intention-similar query; and a synonym phrase adding unit, which adds the generated synonym phrases to the intention group.
In one embodiment, the similar intent information description set determination unit 7400 may further comprise: a first intention parsing unit, which parses each intention in all the intention groups of the intention-similar queries by lexical analysis, to detect whether the corresponding intention-similar query satisfies at least one lexical rule; a first wildcard replacement unit which, if the corresponding intention-similar query satisfies at least one lexical rule, replaces, for each intention in the intention group of that intention-similar query, the linguistic form of the intention-similar query within the linguistic form of the intention with a wildcard, and thereby obtains a transformed intention; a first regular expression generation unit, which takes the transformed intention as a lexical-type similar intent information description in the form of words and wildcards, and takes this lexical-type similar intent information description as the regular expression; and a first regular expression adding unit, which adds the regular expression to the similar intent information description set.
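The wildcard replacement for the lexical type can be sketched as below. This is an assumption-laden simplification: the lexical rule is reduced to "the intention literally contains the intention-similar query", the wildcard is written as `*`, and the function names and the example intention groups are invented.

```python
import re

WILDCARD = "*"


def to_lexical_description(similar_query, intention):
    """Replace the intention-similar query inside one of its intentions with a
    wildcard; return None when the query does not occur in the intention."""
    pattern = re.compile(
        r"\b" + re.escape(similar_query) + r"\b", re.IGNORECASE
    )
    if not pattern.search(intention):
        return None
    return pattern.sub(WILDCARD, intention)


def build_description_set(intention_groups):
    """Collect the distinct wildcard descriptions over all intention groups."""
    descriptions = []
    for similar_query, intentions in intention_groups.items():
        for intention in intentions:
            description = to_lexical_description(similar_query, intention)
            if description is not None and description not in descriptions:
                descriptions.append(description)
    return descriptions


# Invented intention groups for two intention-similar queries.
groups = {
    "mona lisa painting": [
        "mona lisa painting history",
        "mona lisa painting location",
    ],
    "vitruvian man": [
        "vitruvian man history",
        "meaning of vitruvian man",
    ],
}
description_set = build_description_set(groups)
```

The syntactic, semantic, and logical types described in the following embodiments would additionally replace the remaining words with rule names, semantic labels, or logical types.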
In one embodiment, the similar intent information description set determination unit 7400 may further comprise: a second intention parsing unit, which parses each intention in all the intention groups of the intention-similar queries by syntactic analysis, to detect whether the corresponding intention-similar query satisfies at least one syntactic rule; a second wildcard replacement unit which, if the corresponding intention-similar query satisfies at least one syntactic rule, replaces, for each intention in the intention group of that intention-similar query, the linguistic form of the intention-similar query within the linguistic form of the intention with a wildcard, and thereby obtains a transformed intention; a second regular expression generation unit, which takes the transformed intention as a syntactic-type similar intent information description in the form of syntactic rules and wildcards, and takes this syntactic-type similar intent information description as the regular expression; and a second regular expression adding unit, which adds the regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determination unit 7400 may further comprise: a third intention parsing unit, which parses each intention in all the intention groups of the intention-similar queries by semantic relation analysis, to detect whether the corresponding intention-similar query satisfies at least one semantic relation; a third wildcard replacement unit which, if the corresponding intention-similar query satisfies at least one semantic relation, for each intention in the intention group of that intention-similar query, replaces the linguistic form of the intention-similar query within the linguistic form of the intention with a wildcard, replaces the remaining linguistic forms of the intention with their semantic labels, and thereby obtains a transformed intention; a third regular expression generation unit, which takes the transformed intention as a semantic-type similar intent information description in the form of semantic labels and wildcards, and takes this semantic-type similar intent information description as the regular expression; and a third regular expression adding unit, which adds the regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determination unit 7400 may further comprise: a fourth intention parsing unit, which parses each intention in all the intention groups of the intention-similar queries by logical analysis, to detect whether the corresponding intention-similar query satisfies at least one logical relation; a fourth wildcard replacement unit which, if the corresponding intention-similar query satisfies at least one logical relation, for each intention in the intention group of that intention-similar query, replaces the linguistic form of the intention-similar query within the linguistic form of the intention with a wildcard, replaces the remaining linguistic forms of the intention with their logical types, and thereby obtains a transformed intention; a fourth regular expression generation unit, which takes the transformed intention as a logical-type similar intent information description in the form of logical types and wildcards, and takes this logical-type similar intent information description as the regular expression; and a fourth regular expression adding unit, which adds the regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determination unit may further comprise: a confidence calculation unit, which calculates the confidence of each similar intent information description in the similar intent information description set; and a similar intent information description selection unit, which selects, from the similar intent information description set, a specific number of similar intent information descriptions with the highest confidence, or the similar intent information descriptions whose confidence is greater than a predetermined threshold.
In one embodiment, the confidence may be calculated using at least one of the following: the frequency of the similar intent information description; the coverage of the similar intent information description; and the relevance of the similar intent information description to the input query.
In one embodiment, the confidence may be calculated from at least one of the following: the similar intent information description set; a prepared intention training set; and prepared domain information.
In one embodiment, the confidence calculation unit may further comprise: a first weight configuration unit, which assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the popularity of the intention-similar queries; and/or a second weight configuration unit, which assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the similarity degree between the intention-similar queries and the input query.
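One way to combine coverage with per-query weights into a confidence value is sketched below. The formula (the weighted share of intention-similar queries that yield a description) is an assumption, as are the function name, the weights, and the example descriptions; the description names the ingredients of the confidence but not how they are combined.

```python
from collections import defaultdict


def description_confidence(descriptions_by_query, query_weight):
    """Weighted coverage: the share of (weighted) intention-similar queries
    whose intention group yields a given description.

    descriptions_by_query: dict mapping intention-similar query -> descriptions
    query_weight: dict mapping intention-similar query -> weight, e.g. derived
                  from popularity or similarity degree (hypothetical values)
    """
    totals = defaultdict(float)
    for query, descriptions in descriptions_by_query.items():
        for description in set(descriptions):
            totals[description] += query_weight.get(query, 1.0)
    norm = sum(query_weight.get(q, 1.0) for q in descriptions_by_query)
    if norm == 0:
        return {}
    return {d: total / norm for d, total in totals.items()}


descriptions_by_query = {
    "becoming a lawyer": ["* salary", "* requirements"],
    "becoming a nurse": ["* salary"],
    "becoming a teacher": ["* salary", "* online"],
}
weights = {
    "becoming a lawyer": 0.5,
    "becoming a nurse": 0.3,
    "becoming a teacher": 0.2,
}
confidence = description_confidence(descriptions_by_query, weights)

# Keep the descriptions whose confidence exceeds a hypothetical threshold.
selected = [
    d for d, c in sorted(confidence.items(), key=lambda kv: -kv[1]) if c > 0.4
]
```

Omitting a query from `query_weight` gives it weight 1.0, so an empty weight table reduces the measure to plain coverage.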
In one embodiment, the second intention mining unit 7500 may comprise: an input query replacement unit, which generates a group of intentions by replacing the wildcards in the similar intent information descriptions of the similar intent information description set with the input query.
In one embodiment, the second intention mining unit 7500 may comprise: a first-group intention mining unit, which mines a first group of intentions for the input query from at least one data source; and a second-group intention mining unit, which mines a second group of intentions for the input query by using the similar intent information description set and the first group of intentions.
In one embodiment, the second-group intention mining unit may comprise: a unit that generates at least one intention by replacing the wildcard in at least one similar intent information description of the similar intent information description set with the input query, wherein the at least one intention is not in the first group of intentions; and a unit that adds the generated at least one intention to the first group of intentions.
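Replacing the wildcard with the input query and adding only the intentions missing from the first group can be sketched as follows. The function names, the `*` wildcard, and the example data are hypothetical; the comparison is case-insensitive by assumption.

```python
def generate_intentions(descriptions, input_query, wildcard="*"):
    """Instantiate each description by replacing its wildcard with the query."""
    return [
        d.replace(wildcard, input_query) for d in descriptions if wildcard in d
    ]


def second_group(first_group, descriptions, input_query):
    """Extend the first group with generated intentions it does not contain."""
    merged = list(first_group)
    seen = {intention.lower() for intention in merged}
    for intention in generate_intentions(descriptions, input_query):
        if intention.lower() not in seen:
            merged.append(intention)
            seen.add(intention.lower())
    return merged


first = ["becoming a paralegal online", "becoming a paralegal salary"]
descriptions = ["* salary", "* requirements", "how long does * take"]
merged = second_group(first, descriptions, "becoming a paralegal")
```

The original order of the first group is preserved, so a later ranking step can still distinguish directly mined intentions from generated ones.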
In one embodiment, the second-group intention mining unit may comprise: a ranking unit, which ranks the first group of intentions for the input query by using the similar intent information description set.
In one embodiment, the second-group intention mining unit may further comprise: a peculiar intention identification unit, which identifies the peculiar intentions in the first group of intentions for the input query; and a weight changing unit, which raises the weight of each peculiar intention in the ranking according to its peculiarity degree; wherein the peculiarity degree of a peculiar intention is calculated from at least one of the following: the co-occurrence rate of the input query and the peculiar intention in a prepared intention training set; the relation between the input query and the peculiar intention in domain knowledge; the frequency of the peculiar intention in click-through data; and the popularity of the peculiar intention in a query log.
Figure 19 shows a functional block diagram of a device 8000 for information retrieval according to an embodiment of the present invention. All functional modules of the device 8000 (that is, the various units included in the device 8000, whether or not they are shown in the figure) may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the invention. Those skilled in the art will understand that the functional modules described in Figure 19 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
As shown in Figure 19, the device 8000 for information retrieval comprises: an input query receiving unit 8100, the above-described device 7000 for intention mining, and a search result obtaining unit 8200. The input query receiving unit 8100 is configured to receive an input query expressed by a user in natural language. The device 7000 for intention mining is configured to mine intentions from the input query. The search result obtaining unit 8200 is configured to obtain search results for the mined intentions.
Figure 20 shows a functional block diagram of a device 9000 for question-answering assistance according to an embodiment of the present invention. All functional modules of the device 9000 (that is, the various units included in the device 9000, whether or not they are shown in the figure) may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the invention. Those skilled in the art will understand that the functional modules described in Figure 20 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
As shown in Figure 20, the device 9000 for question-answering assistance comprises: an input query receiving unit 9100, the above-described device 7000 for intention mining, and an answer obtaining unit 9200. The input query receiving unit 9100 is configured to receive an input query expressed by a user in natural language. The device 7000 for intention mining is configured to mine topics from the input query. The answer obtaining unit 9200 is configured to obtain answers for the mined topics.
The present invention may be implemented through the following schemes:
Scheme 1: A method for intention mining, the method comprising:
obtaining an input query;
generating intention-similar queries for the input query, wherein each intention-similar query has an intention type identical or similar to that of the input query;
mining a group of intentions for each intention-similar query, wherein each intention provides a subtopic of the corresponding intention-similar query;
determining a similar intent information description set by using all the intention groups of the intention-similar queries; and
mining intentions for the input query by using the similar intent information description set.
Scheme 2: The method of scheme 1, wherein generating intention-similar queries for the input query comprises:
obtaining one or more query phrase pairs from at least one data source, wherein each query phrase pair comprises the input query, an intention-similar indicator, and a third phrase; and
extracting the third phrase from each query phrase pair as an intention-similar query.
Scheme 3: The method of scheme 2, wherein the intention-similar indicator comprises at least one of the following:
a coordination indicator, wherein the two phrases connected by the coordination indicator serve as the same syntactic element in a sentence;
a contrast indicator, wherein the first phrase in a sentence stands in a contrast relation with the second phrase that the contrast indicator connects after the first phrase; and
a choice indicator, wherein the two phrases connected by the choice indicator form an alternative expression in a sentence.
Scheme 4: The method of scheme 1 or 2, wherein, when the input query is a multi-word query, generating intention-similar queries for the input query comprises:
identifying the core intention part and the modifier part of the input query; and
generating intention-similar phrases for the core intention part of the input query as the intention-similar queries.
Scheme 5: The method of scheme 1 or 2, wherein, when the input query is a multi-word query, generating intention-similar queries for the input query comprises:
identifying the core intention part and the modifier part of the input query; and
generating the intention-similar queries by replacing the modifier part of the input query with a plurality of substitute parts, wherein each substitute part is an intention-similar phrase generated for the modifier part, and each intention-similar phrase has an intention type identical or similar to that of the modifier part of the input query.
Scheme 6: The method of scheme 4, wherein identifying the core intention part and the modifier part of the input query comprises:
parsing the input query to divide it into a plurality of semantic units;
for each semantic unit of the input query, generating temporary intention-similar queries each composed of that semantic unit and a variable part, wherein the variable part is an intention-similar phrase generated for the other semantic units of the input query;
for each semantic unit of the input query, mining a group of intentions for each temporary intention-similar query, wherein each intention provides a subtopic of the corresponding temporary intention-similar query;
for each semantic unit of the input query, calculating a consistency degree by comparing the intention groups of the temporary intention-similar queries of that semantic unit, wherein the consistency degree measures the similarity in intention among the temporary intention-similar queries of that semantic unit, and the more common the intention types present in the intentions of the temporary intention-similar queries, the higher the consistency degree; and
determining the semantic unit with the highest consistency degree in the input query as the core intention part of the input query, and determining the other semantic units as the modifier part of the input query.
Scheme 7: The method of scheme 5, wherein identifying the core intention part and the modifier part of the input query comprises:
parsing the input query to divide it into a plurality of semantic units;
for each semantic unit of the input query, generating temporary intention-similar queries each composed of that semantic unit and a variable part, wherein the variable part is an intention-similar phrase generated for the other semantic units of the input query;
for each semantic unit of the input query, mining a group of intentions for each temporary intention-similar query, wherein each intention provides a subtopic of the corresponding temporary intention-similar query;
for each semantic unit of the input query, calculating a consistency degree by comparing the intention groups of the temporary intention-similar queries of that semantic unit, wherein the consistency degree measures the similarity in intention among the temporary intention-similar queries of that semantic unit, and the more common the intention types present in the intentions of the temporary intention-similar queries, the higher the consistency degree; and
determining the semantic unit with the highest consistency degree in the input query as the core intention part of the input query, and determining the other semantic units as the modifier part of the input query.
Scheme 8: The method of scheme 1 or 2, wherein generating intention-similar queries for the input query comprises at least one of the following:
obtaining one or more queries, as the intention-similar queries, from a library storing intention-similar queries of the input query;
obtaining one or more sibling nodes of the input query in a domain ontology as the intention-similar queries;
obtaining neighboring concepts of the input query in a language dictionary as the intention-similar queries; and
obtaining one or more queries from a query log, as the intention-similar queries, by calculating intention similarity based on click-through data associated with the input query.
Scheme 9: the method as described in scheme 1, wherein generating the intention-similar inquiries for the input inquiry further comprises:
Calculate the similar degree between each of the intention-similar inquiries and the input inquiry; and
Select, from the intention-similar inquiries, a specific number of intention-similar inquiries having the highest similar degree, or the intention-similar inquiries whose similar degree is greater than a predetermined threshold.
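The selection step can be illustrated with a short sketch. It assumes the similar degrees have already been computed and are passed in as a mapping; the function name `select_similar` and its parameters are hypothetical.

```python
def select_similar(scored, top_n=None, threshold=None):
    """scored maps each candidate inquiry to its similar degree.

    Keeps candidates above `threshold` (if given), then the `top_n`
    highest-scoring ones (if given). Either criterion can be used alone.
    """
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(q, s) for q, s in ranked if s > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [q for q, _ in ranked]

candidates = {"Nikon camera": 0.9, "Sony camera": 0.7, "camera bag": 0.3}
by_count = select_similar(candidates, top_n=2)
by_threshold = select_similar(candidates, threshold=0.8)
```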
Scheme 10: the method as described in scheme 9, each inquiry in the similar inquiry of wherein said intention and the similar degree between described input inquiry are calculated by least one item in the following:
The consistent degree of described inquiry and described input inquiry;
The vocabulary similarity of described inquiry and described input inquiry;
The grammer similarity of described inquiry and described input inquiry;
The semantic similarity of described inquiry and described input inquiry;
The context similarity of the inquiry and the input inquiry in a prepared corpus;
The co-occurrence rate of the inquiry and the input inquiry in an inquiry log;
The distance between the inquiry and the input inquiry in a domain ontology; and
The similarity of the muster data of described inquiry and described input inquiry.
Scheme 11: the method as described in scheme 9, each inquiry in the similar inquiry of wherein said intention and the similar degree between described input inquiry are calculated by least one real-world information, and described real-world information at least comprises: time, position, user model and environment.
Scheme 12: the method as described in scheme 1, wherein the similar intent information description is presented as a regular expression of the input inquiry.
Scheme 13: the method as described in scheme 12, wherein determining the similar intent information description set comprises:
Analyze the linguistic form of each intention in all intention groups of the intention-similar inquiries;
Determine at least one inquiry-intention relation between the linguistic form of the corresponding intention-similar inquiry within the linguistic form and the remaining linguistic forms;
Transform the linguistic form of each intention into a regular expression corresponding to the determined at least one inquiry-intention relation; and
Add the regular expression obtained by the transformation to the similar intent information description set.
Scheme 14: the method as described in scheme 13, wherein determining the similar intent information description set further comprises:
Expand each intention group, comprising:
For each intention in the intention group, generate a synonym phrase by replacing at least one word in the intention with a synonym or near-synonym of that word, wherein the at least one word does not appear in the corresponding intention-similar inquiry, and
Add the generated synonym phrase to the intention group.
Scheme 15: the method as described in scheme 13, wherein determining the similar intent information description set further comprises:
Parse each intention in all intention groups of the intention-similar inquiries by lexical analysis means, to detect whether the corresponding intention-similar inquiry satisfies at least one morphological rule;
If the corresponding intention-similar inquiry satisfies at least one morphological rule, then for each intention in the intention group of that intention-similar inquiry, replace the linguistic form of the corresponding intention-similar inquiry within the linguistic form of the intention with an asterisk wildcard, and obtain a transformed intention;
Describe the transformed intention as a vocabulary-type similar intent information description having a vocabulary-and-asterisk-wildcard form, and use the vocabulary-type similar intent information description as the regular expression; and
Add the regular expression to the similar intent information description set.
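The asterisk wildcard replacement described above can be sketched as below. This is a simplified illustration only: the morphological rule check is reduced to a plain substring test, which is an assumption, and the helper names are hypothetical.

```python
import re

def to_description(intent, similar_inquiry, wildcard="*"):
    """Replace the intention-similar inquiry inside an intention with a wildcard."""
    if similar_inquiry not in intent:
        return None  # stand-in for "no morphological rule satisfied"
    return intent.replace(similar_inquiry, wildcard, 1)

def description_to_regex(description, wildcard="*"):
    """Compile a vocabulary-and-wildcard description into a regular expression."""
    parts = [re.escape(p) for p in description.split(wildcard)]
    return re.compile("^" + "(.+)".join(parts) + "$")

def build_description_set(intent_groups):
    """intent_groups maps each intention-similar inquiry to its group of intentions."""
    descriptions = set()
    for inquiry, intents in intent_groups.items():
        for intent in intents:
            d = to_description(intent, inquiry)
            if d is not None:
                descriptions.add(d)
    return descriptions

groups = {
    "Nikon camera": ["Nikon camera price", "best lens for Nikon camera"],
    "Sony camera": ["Sony camera price"],
}
description_set = build_description_set(groups)
matched = description_to_regex("* price").match("Canon camera price")
```

Here the wildcard position becomes a capture group, so matching "Canon camera price" against "* price" recovers the inquiry part "Canon camera".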
Scheme 16: the method as described in scheme 13, wherein determine that described similar intent information describes collection and comprises further:
Resolved each intention in whole intention group of the similar inquiry of described intention by grammatical analysis means, whether meet at least one syntax rule to detect the similar inquiry of corresponding intention;
If the similar inquiry of corresponding intention meets at least one syntax rule, then for each intention in the intention group of the similar inquiry of this intention, replace the linguistic form of the similar inquiry of respective intent in the linguistic form of this intention with asterisk wildcard, and obtain the intention of conversion;
The intention of described conversion is described as the similar intent information of the grammer type with syntax rule and asterisk wildcard form, and the similar intent information of this grammer type is described as described regular expression; And
Add the regular expression to the similar intent information description set.
Scheme 17: the method as described in scheme 15, wherein determine that described similar intent information describes collection and comprises further:
Resolved each intention in whole intention group of the similar inquiry of described intention by grammatical analysis means, whether meet at least one syntax rule to detect the similar inquiry of corresponding intention;
If the similar inquiry of corresponding intention meets at least one syntax rule, then for each intention in the intention group of the similar inquiry of this intention, replace the linguistic form of the similar inquiry of respective intent in the linguistic form of this intention with asterisk wildcard, and obtain the intention of conversion;
The intention of described conversion is described as the similar intent information of the grammer type with syntax rule and asterisk wildcard form, and the similar intent information of this grammer type is described as described regular expression; And
Add the regular expression to the similar intent information description set.
Scheme 18: the method according to any one of schemes 13 and 15-17, wherein determining the similar intent information description set further comprises:
Parse each intention in all intention groups of the intention-similar inquiries by semantic relation analysis means, to detect whether the corresponding intention-similar inquiry satisfies at least one semantic relation;
If the corresponding intention-similar inquiry satisfies at least one semantic relation, then for each intention in the intention group of that intention-similar inquiry, replace the linguistic form of the corresponding intention-similar inquiry within the linguistic form of the intention with an asterisk wildcard, replace the remaining linguistic forms of the intention with their semantic markers, and obtain a transformed intention;
Describe the transformed intention as a semantic-type similar intent information description having a semantic-marker-and-asterisk-wildcard form, and use the semantic-type similar intent information description as the regular expression; and
Add the regular expression to the similar intent information description set.
Scheme 19: the method according to any one of schemes 13 and 15-17, wherein determining the similar intent information description set further comprises:
Parse each intention in all intention groups of the intention-similar inquiries by logic analysis means, to detect whether the corresponding intention-similar inquiry satisfies at least one logical relation;
If the corresponding intention-similar inquiry satisfies at least one logical relation, then for each intention in the intention group of that intention-similar inquiry, replace the linguistic form of the corresponding intention-similar inquiry within the linguistic form of the intention with an asterisk wildcard, replace the remaining linguistic forms of the intention with their logical types, and obtain a transformed intention;
Describe the transformed intention as a logical-type similar intent information description having a logical-type-and-asterisk-wildcard form, and use the logical-type similar intent information description as the regular expression; and
Add the regular expression to the similar intent information description set.
Scheme 20: the method as described in scheme 18, wherein determining the similar intent information description set further comprises:
Parse each intention in all intention groups of the intention-similar inquiries by logic analysis means, to detect whether the corresponding intention-similar inquiry satisfies at least one logical relation;
If the corresponding intention-similar inquiry satisfies at least one logical relation, then for each intention in the intention group of that intention-similar inquiry, replace the linguistic form of the corresponding intention-similar inquiry within the linguistic form of the intention with an asterisk wildcard, replace the remaining linguistic forms of the intention with their logical types, and obtain a transformed intention;
Describe the transformed intention as a logical-type similar intent information description having a logical-type-and-asterisk-wildcard form, and use the logical-type similar intent information description as the regular expression; and
Add the regular expression to the similar intent information description set.
Scheme 21: the method as described in scheme 13 or 14, wherein determining the similar intent information description set further comprises:
Calculate a degree of confidence for each similar intent information description in the similar intent information description set; and
Select, from the similar intent information description set, a specific number of similar intent information descriptions having the highest degree of confidence, or the similar intent information descriptions whose degree of confidence is greater than a predetermined threshold.
Scheme 22: the method as described in scheme 21, wherein the degree of confidence is calculated using at least one of the following:
The frequency of the similar intent information description;
The coverage rate of the similar intent information description; and
The relevance of the similar intent information description to the input inquiry.
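One possible way to combine frequency and coverage into a degree of confidence is sketched below. The definitions used here are assumptions: frequency is taken as the share of all extracted descriptions, coverage as the share of intention-similar inquiries that yield the description, and the two are averaged with equal weights. Relevance to the input inquiry is left out, and the name `confidence_scores` is hypothetical.

```python
from collections import Counter

def confidence_scores(descriptions_by_inquiry):
    """descriptions_by_inquiry maps each intention-similar inquiry to its descriptions."""
    n_inquiries = len(descriptions_by_inquiry)
    freq = Counter()   # how often each description occurs overall
    cover = Counter()  # how many inquiries produce each description
    for descs in descriptions_by_inquiry.values():
        freq.update(descs)
        cover.update(set(descs))
    total = sum(freq.values())
    return {
        d: 0.5 * freq[d] / total + 0.5 * cover[d] / n_inquiries
        for d in freq
    }

data = {
    "Nikon camera": ["* price", "* review"],
    "Sony camera": ["* price"],
    "Leica camera": ["* price", "* repair"],
}
conf = confidence_scores(data)
best = max(conf, key=conf.get)
```

The resulting scores can then be filtered by a count or a threshold, as in the selection step of scheme 21.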
Scheme 23: the method as described in scheme 21, wherein the degree of confidence is calculated from at least one of the following:
The similar intent information description set;
A prepared intention training set; and
Prepared domain information.
Scheme 24: the method as described in scheme 23, wherein calculating the degree of confidence of a similar intent information description from the similar intent information description set further comprises:
Assign different weights to the respective similar intent information descriptions in the similar intent information description set according to the popularity of the intention-similar inquiries; and/or
Assign different weights to the respective similar intent information descriptions in the similar intent information description set according to the similar degree between the intention-similar inquiries and the input inquiry.
Scheme 25: the method as described in scheme 1, wherein mining the intentions for the input inquiry comprises:
Produce a group of intentions by replacing, with the input inquiry, the asterisk wildcard in each similar intent information description in the similar intent information description set.
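As a minimal sketch of this step, assuming each description is a plain string in which `*` marks the wildcard position (the function name `generate_intents` is hypothetical):

```python
def generate_intents(description_set, input_inquiry, wildcard="*"):
    """Instantiate every description by substituting the input inquiry for the wildcard."""
    return [d.replace(wildcard, input_inquiry) for d in description_set]

intents = generate_intents(["* price", "best lens for *"], "Canon camera")
```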
Scheme 26: the method as described in scheme 1, wherein mining the intentions for the input inquiry comprises:
Mine a first group of intentions for the input inquiry from at least one data source; and
Mine a second group of intentions for the input inquiry by using the similar intent information description set and the first group of intentions.
Scheme 27: the method as described in scheme 26, wherein mining the second group of intentions for the input inquiry comprises:
Generate at least one intention by replacing, with the input inquiry, the asterisk wildcard in at least one similar intent information description in the similar intent information description set, wherein the at least one intention is not in the first group of intentions; and
Add the generated at least one intention to the first group of intentions.
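The expansion of the first group can be sketched as follows, again assuming string descriptions with `*` as the wildcard. The name `expand_intents` is hypothetical, and the original order of the first group is preserved by choice.

```python
def expand_intents(first_group, description_set, input_inquiry, wildcard="*"):
    """Append generated intentions that are not already in the first group."""
    merged = list(first_group)
    seen = set(first_group)
    for description in description_set:
        candidate = description.replace(wildcard, input_inquiry)
        if candidate not in seen:
            merged.append(candidate)
            seen.add(candidate)
    return merged

first = ["Canon camera price"]
expanded = expand_intents(first, ["* price", "* review"], "Canon camera")
```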
Scheme 28: the method as described in scheme 26, wherein mining the second group of intentions for the input inquiry comprises:
Rank the first group of intentions for the input inquiry by using the similar intent information description set.
Scheme 29: the method as described in scheme 28, wherein mining the second group of intentions for the input inquiry further comprises:
Identify peculiar intentions in the first group of intentions for the input inquiry;
Raise the weight of each peculiar intention in the ranking according to the peculiar degree of that peculiar intention;
Wherein the peculiar degree of a peculiar intention is calculated by at least one of the following:
The co-occurrence rate of the input inquiry and the peculiar intention in a prepared intention training set;
The relation between the input inquiry and the peculiar intention in domain knowledge;
The frequency of the peculiar intention in the muster data; and
The popularity of the peculiar intention in an inquiry log.
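A rough sketch of ranking with a boost for peculiar intentions is given below. Everything about the scoring is an assumption made for illustration: an intention's base score is the confidence of the description it matches (zero if none), peculiar degrees are supplied from outside rather than computed, and the boost is simply added. The name `rank_intents` and its parameters are hypothetical.

```python
def rank_intents(first_group, description_conf, input_inquiry,
                 peculiar_degree=None, boost=1.0, wildcard="*"):
    """Sort intentions by description confidence plus a boost for peculiar intentions."""
    peculiar_degree = peculiar_degree or {}

    def score(intent):
        # Recover the description form by masking the input inquiry.
        pattern = intent.replace(input_inquiry, wildcard, 1)
        base = description_conf.get(pattern, 0.0)
        return base + boost * peculiar_degree.get(intent, 0.0)

    return sorted(first_group, key=score, reverse=True)

first_group = ["Canon camera repair", "Canon camera price", "Canon camera EOS firmware"]
description_conf = {"* price": 0.8, "* repair": 0.3}
plain = rank_intents(first_group, description_conf, "Canon camera")
boosted = rank_intents(first_group, description_conf, "Canon camera",
                       peculiar_degree={"Canon camera EOS firmware": 0.9})
```

Without the boost, an intention that matches no shared description sinks to the bottom; with a peculiar degree supplied, it can rise above the shared ones.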
Scheme 30: a method for information retrieval, comprising:
Receive an input inquiry entered by a user in natural language;
Perform intention mining on the input inquiry by the method according to any one of schemes 1-29; and
Obtain search results for the mined intentions.
Scheme 31: a method for question-answering assistance, comprising:
Receive an input inquiry entered by a user in natural language;
Mine themes from the input inquiry by the method according to any one of schemes 1-29; and
Obtain answers for the mined themes.
Scheme 32: an equipment for intention mining, the equipment comprising:
Input inquiry acquiring unit, which obtains an input inquiry;
Intention-similar inquiry generation unit, which generates intention-similar inquiries for the input inquiry, wherein each intention-similar inquiry has an intention type identical or similar to that of the input inquiry;
First intention mining unit, which mines a group of intentions for each intention-similar inquiry, wherein each intention provides a sub-topic of the corresponding intention-similar inquiry;
Similar intent information description set determining unit, which determines a similar intent information description set by using all intention groups of the intention-similar inquiries; and
Second intention mining unit, which mines intentions for the input inquiry by using the similar intent information description set.
Scheme 33: the equipment as described in scheme 32, wherein the intention-similar inquiry generation unit comprises:
Inquiry pair phrase acquiring unit, which obtains one or more inquiry pair phrases from at least one data source, wherein each inquiry pair phrase comprises: the input inquiry, an intention-similar designator, and a third phrase; and
Third phrase extraction unit, which extracts the third phrase from each inquiry pair phrase as an intention-similar inquiry.
Scheme 34: the equipment as described in scheme 33, the similar designator of wherein said intention comprises at least one item in the following:
Coordination designator, two phrases wherein connected by described coordination designator are used as identical syntactic element in sentence;
Relativity designator, the first phrase wherein in sentence is in relativity with the second phrase be connected to after described first phrase by described relativity designator; And
Choice relation designator, two phrases wherein connected by described choice relation designator form selective expression in sentence.
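Extraction of the third phrase from inquiry pair phrases can be sketched with a regular expression. The English designator words below are assumed stand-ins for the three designator types, and each phrase is assumed to be already isolated so that everything after the designator is the third phrase; `DESIGNATORS` and `extract_third_phrases` are hypothetical names.

```python
import re

# Assumed English examples for the three designator types.
DESIGNATORS = {
    "coordination": ["and"],
    "relativity": ["vs", "versus", "compared with"],
    "choice": ["or"],
}

def extract_third_phrases(input_inquiry, phrases):
    """Return the phrase following '<input inquiry> <designator>' in each pair phrase."""
    words = [w for group in DESIGNATORS.values() for w in group]
    # Try longer designators first so "versus" is preferred over "vs".
    alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    pattern = re.compile(
        re.escape(input_inquiry) + r"\s+(?:" + alternation + r")\s+(.+)",
        re.IGNORECASE,
    )
    results = []
    for phrase in phrases:
        match = pattern.fullmatch(phrase.strip())
        if match:
            results.append(match.group(1))
    return results

pair_phrases = [
    "Canon camera vs Nikon camera",
    "Canon camera or Sony camera",
    "Canon camera and Leica camera",
    "Canon camera price",
]
similar = extract_third_phrases("Canon camera", pair_phrases)
```

The last phrase contains no designator after the input inquiry, so it contributes no intention-similar inquiry.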
Scheme 35: the equipment as described in scheme 32 or 33, wherein when the input inquiry is a multi-word inquiry, the intention-similar inquiry generation unit comprises:
Core intention part and modifier part recognition unit, which identifies the core intention part and the modifier part of the input inquiry; and
Intention-similar phrase generation unit, which generates intention-similar phrases of the core intention part of the input inquiry as the intention-similar inquiries.
Scheme 36: the equipment as described in scheme 32 or 33, wherein when the input inquiry is a multi-word inquiry, the intention-similar inquiry generation unit comprises:
Core intention part and modifier part recognition unit, which identifies the core intention part and the modifier part of the input inquiry; and
Modifier part replacement unit, which generates the intention-similar inquiries by replacing the modifier part of the input inquiry with multiple substitute parts, wherein each substitute part is an intention-similar phrase generated for the modifier part, and each intention-similar phrase has an intention type identical or similar to that of the modifier part of the input inquiry.
Scheme 37: the equipment as described in scheme 35, wherein said core intention part and modifier part recognition unit comprise:
Input inquiry resolution unit, resolves described input inquiry, so that described input inquiry is divided into multiple semantic primitive;
The similar query generation unit of interim intention, for each semantic primitive divided of described input inquiry, generate the similar inquiry of interim intention that is made up of divided semantic primitive and changing section, wherein said changing section is the similar phrase of intention generated for other semantic primitive of described input inquiry;
3rd intention excavates unit, and for each semantic primitive divided of described input inquiry, for the similar inquiry excavation of each interim intention one group of intention, wherein each intention provides the sub-topics for the similar inquiry of corresponding interim intention;
Consistent degree computing unit, which, for each divided semantic primitive of the input inquiry, calculates a consistent degree by comparing the intention groups of the interim intention-similar inquiries of the corresponding semantic primitive, wherein the consistent degree is a measure of the similarity among the intentions of the interim intention-similar inquiries for the corresponding semantic primitive, and the more common the intention types present in the intentions of the interim intention-similar inquiries, the higher the consistent degree; and
Core intention part determining unit, is defined as the core intention part of described input inquiry, and other semantic primitive is defined as the modifier part of described input inquiry by the semantic primitive in described input inquiry with the highest consistent degree.
Scheme 38: the equipment as described in scheme 36, wherein said core intention part and modifier part recognition unit comprise:
Input inquiry resolution unit, resolves described input inquiry, so that described input inquiry is divided into multiple semantic primitive;
The similar query generation unit of interim intention, for each semantic primitive divided of described input inquiry, generate the similar inquiry of interim intention that is made up of divided semantic primitive and changing section, wherein said changing section is the similar phrase of intention generated for other semantic primitive of described input inquiry;
3rd intention excavates unit, and for each semantic primitive divided of described input inquiry, for the similar inquiry excavation of each interim intention one group of intention, wherein each intention provides the sub-topics for the similar inquiry of corresponding interim intention;
Consistent degree computing unit, which, for each divided semantic primitive of the input inquiry, calculates a consistent degree by comparing the intention groups of the interim intention-similar inquiries of the corresponding semantic primitive, wherein the consistent degree is a measure of the similarity among the intentions of the interim intention-similar inquiries for the corresponding semantic primitive, and the more common the intention types present in the intentions of the interim intention-similar inquiries, the higher the consistent degree; and
Core intention part determining unit, is defined as the core intention part of described input inquiry, and other semantic primitive is defined as the modifier part of described input inquiry by the semantic primitive in described input inquiry with the highest consistent degree.
Scheme 39: the equipment as described in scheme 32 or 33, wherein the intention-similar inquiry generation unit comprises at least one of the following:
A unit which obtains one or more inquiries as the intention-similar inquiries from an intention-similar inquiry repository in which intention-similar inquiries of the input inquiry are stored;
A unit which obtains one or more sibling nodes of the input inquiry in a domain ontology as the intention-similar inquiries;
A unit which obtains neighboring concepts of the input inquiry in a language dictionary as the intention-similar inquiries; and
A unit which obtains one or more inquiries from an inquiry log as the intention-similar inquiries by calculating intention similarity based on the muster data associated with the input inquiry.
Scheme 40: the equipment as described in scheme 32, wherein the intention-similar inquiry generation unit further comprises:
Similar degree computing unit, which calculates the similar degree between each of the intention-similar inquiries and the input inquiry; and
Intention-similar inquiry selection unit, which selects, from the intention-similar inquiries, a specific number of intention-similar inquiries having the highest similar degree, or the intention-similar inquiries whose similar degree is greater than a predetermined threshold.
Scheme 41: the equipment as described in scheme 40, each inquiry in the similar inquiry of wherein said intention and the similar degree between described input inquiry are calculated by least one item in the following:
The consistent degree of described inquiry and described input inquiry;
The vocabulary similarity of described inquiry and described input inquiry;
The grammer similarity of described inquiry and described input inquiry;
The semantic similarity of described inquiry and described input inquiry;
The context similarity of the inquiry and the input inquiry in a prepared corpus;
The co-occurrence rate of the inquiry and the input inquiry in an inquiry log;
The distance between the inquiry and the input inquiry in a domain ontology; and
The similarity of the muster data of described inquiry and described input inquiry.
Scheme 42: the equipment as described in scheme 40, each inquiry in the similar inquiry of wherein said intention and the similar degree between described input inquiry are calculated by least one real-world information, and described real-world information at least comprises: time, position, user model and environment.
Scheme 43: the equipment as described in scheme 32, wherein the similar intent information description is presented as a regular expression of the input inquiry.
Scheme 44: the equipment as described in scheme 43, wherein said similar intent information describes collection determining unit and comprises:
Linguistic form analytic unit, analyzes the linguistic form of each intention in whole intention group of the similar inquiry of described intention;
Inquiry-intention relation determination unit, which determines at least one inquiry-intention relation between the linguistic form of the corresponding intention-similar inquiry within the linguistic form and the remaining linguistic forms;
Regular expression converter unit, which transforms the linguistic form of each intention into a regular expression corresponding to the determined at least one inquiry-intention relation; and
Regular expression adding unit, which adds the regular expression obtained by the transformation to the similar intent information description set.
Scheme 45: the equipment as described in scheme 44, wherein said similar intent information describes collection determining unit and comprises further:
Intention group expanding element, expand each intention group, comprising:
Synonym phrase generation unit, which, for each intention in the intention group, generates a synonym phrase by replacing at least one word in the intention with a synonym or near-synonym of that word, wherein the at least one word does not appear in the corresponding intention-similar inquiry, and
Synonym phrase adding device, adds to produced synonym phrase in this intention group.
Scheme 46: the equipment as described in scheme 44, wherein said similar intent information describes collection determining unit and comprises further:
First intention resolution unit, resolves each intention in whole intention group of the similar inquiry of described intention by lexical analysis means, whether meet at least one morphological rule to detect the similar inquiry of corresponding intention;
First asterisk wildcard replacement unit, if the similar inquiry of corresponding intention meets at least one morphological rule, then for each intention in the intention group of the similar inquiry of this intention, replace the linguistic form of the similar inquiry of respective intent in the linguistic form of this intention with asterisk wildcard, and obtain the intention of conversion;
First regular expression generation unit, describes the intention of described conversion as the similar intent information of the vocabulary type with vocabulary and asterisk wildcard form, and describes the similar intent information of this vocabulary type as described regular expression; And
First regular expression adding unit, which adds the regular expression to the similar intent information description set.
Scheme 47: the equipment as described in scheme 44, wherein said similar intent information describes collection determining unit and comprises further:
Second intention resolution unit, resolves each intention in whole intention group of the similar inquiry of described intention by grammatical analysis means, whether meet at least one syntax rule to detect the similar inquiry of corresponding intention;
Second asterisk wildcard replacement unit, if the similar inquiry of corresponding intention meets at least one syntax rule, then for each intention in the intention group of the similar inquiry of this intention, replace the linguistic form of the similar inquiry of respective intent in the linguistic form of this intention with asterisk wildcard, and obtain the intention of conversion;
Second regular expression generation unit, describes the intention of described conversion as the similar intent information of the grammer type with syntax rule and asterisk wildcard form, and describes the similar intent information of this grammer type as described regular expression; And
Second regular expression adding unit, which adds the regular expression to the similar intent information description set.
Scheme 48: the equipment as described in scheme 46, wherein said similar intent information describes collection determining unit and comprises further:
Second intention resolution unit, resolves each intention in whole intention group of the similar inquiry of described intention by grammatical analysis means, whether meet at least one syntax rule to detect the similar inquiry of corresponding intention;
Second asterisk wildcard replacement unit, if the similar inquiry of corresponding intention meets at least one syntax rule, then for each intention in the intention group of the similar inquiry of this intention, replace the linguistic form of the similar inquiry of respective intent in the linguistic form of this intention with asterisk wildcard, and obtain the intention of conversion;
Second regular expression generation unit, describes the intention of described conversion as the similar intent information of the grammer type with syntax rule and asterisk wildcard form, and describes the similar intent information of this grammer type as described regular expression; And
Second regular expression adding unit, which adds the regular expression to the similar intent information description set.
Scheme 49: the equipment according to any one of schemes 44 and 46-48, wherein the similar intent information description set determining unit further comprises:
Third intention resolution unit, which parses each intention in all intention groups of the intention-similar inquiries by semantic relation analysis means, to detect whether the corresponding intention-similar inquiry satisfies at least one semantic relation;
Third asterisk wildcard replacement unit, which, if the corresponding intention-similar inquiry satisfies at least one semantic relation, then for each intention in the intention group of that intention-similar inquiry, replaces the linguistic form of the corresponding intention-similar inquiry within the linguistic form of the intention with an asterisk wildcard, replaces the remaining linguistic forms of the intention with their semantic markers, and obtains a transformed intention;
Third regular expression generation unit, which describes the transformed intention as a semantic-type similar intent information description having a semantic-marker-and-asterisk-wildcard form, and uses the semantic-type similar intent information description as the regular expression; and
Third regular expression adding unit, which adds the regular expression to the similar intent information description set.
Scheme 50: the equipment according to any one of schemes 44 and 46-48, wherein the similar intent information description set determining unit further comprises:
Fourth intention resolution unit, which parses each intention in all intention groups of the intention-similar inquiries by logic analysis means, to detect whether the corresponding intention-similar inquiry satisfies at least one logical relation;
Fourth asterisk wildcard replacement unit, which, if the corresponding intention-similar inquiry satisfies at least one logical relation, then for each intention in the intention group of that intention-similar inquiry, replaces the linguistic form of the corresponding intention-similar inquiry within the linguistic form of the intention with an asterisk wildcard, replaces the remaining linguistic forms of the intention with their logical types, and obtains a transformed intention;
Fourth regular expression generation unit, which describes the transformed intention as a logical-type similar intent information description having a logical-type-and-asterisk-wildcard form, and uses the logical-type similar intent information description as the regular expression; and
Fourth regular expression adding unit, which adds the regular expression to the similar intent information description set.
Scheme 51: The apparatus according to scheme 49, wherein the similar intent information description set determining unit further comprises:
a fourth intention parsing unit, which parses each intention in all intention groups of the intention-similar queries by logic analysis, to detect whether the corresponding intention-similar query satisfies at least one logical relation;
a fourth wildcard replacement unit, which, if the corresponding intention-similar query satisfies at least one logical relation, for each intention in the intention group of that intention-similar query, replaces the linguistic form of the intention-similar query within the linguistic form of the intention with a wildcard, and replaces the remaining linguistic forms of the intention with their logical types, thereby obtaining a transformed intention;
a fourth regular expression generation unit, which describes the transformed intention as logical-type similar intent information in the form of logical types and a wildcard, and describes this logical-type similar intent information as the regular expression; and
a fourth regular expression adding unit, which adds the regular expression to the similar intent information description set.
Scheme 52: The apparatus according to scheme 44 or 45, wherein the similar intent information description set determining unit further comprises:
a confidence computing unit, which computes a confidence for each similar intent information description in the similar intent information description set; and
a similar intent information description selecting unit, which selects from the similar intent information description set either a specified number of descriptions with the highest confidence or the descriptions whose confidence is greater than a predetermined threshold.
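The selection step of scheme 52 can be sketched as below. The function name `select_descriptions` and the dictionary representation of the description set are assumptions made for illustration; the scheme itself only requires choosing either the highest-confidence descriptions or those above a threshold.

```python
def select_descriptions(confidences, top_n=None, threshold=None):
    """Select descriptions from a {description: confidence} mapping.

    top_n keeps the given number of highest-confidence descriptions;
    threshold keeps descriptions whose confidence is strictly greater
    than the threshold. Either criterion may be used on its own.
    """
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    if threshold is not None:
        ranked = [(desc, conf) for desc, conf in ranked if conf > threshold]
    return [desc for desc, _ in ranked]
```

A caller would pass `top_n` for the "specified number" variant and `threshold` for the "predetermined threshold" variant.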
Scheme 53: The apparatus according to scheme 52, wherein the confidence is computed using at least one of the following:
the frequency of the similar intent information description;
the coverage of the similar intent information description; and
the relevance of the similar intent information description to the input query.
Scheme 54: The apparatus according to scheme 52, wherein the confidence is computed from at least one of the following:
the similar intent information description set;
a prepared intention training set; and
prepared domain information.
Scheme 55: The apparatus according to scheme 54, wherein the confidence computing unit further comprises:
a first weight assigning unit, which assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the popularity of the intention-similar queries; and/or
a second weight assigning unit, which assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the degree of similarity between the intention-similar queries and the input query.
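One possible reading of the weighting in scheme 55 is a weighted frequency, sketched below. Multiplying the popularity weight by the similarity weight and normalizing the totals are assumptions made here; the scheme does not prescribe how the two weights are combined. The function name `weighted_confidence` is hypothetical.

```python
from collections import defaultdict


def weighted_confidence(observations):
    """Compute a normalized, weighted frequency per description.

    observations: iterable of (description, popularity, similarity) triples,
    one per intention-similar query in which the description was found.
    popularity is the weight of that query; similarity is its similarity
    to the input query. Both are assumed to be non-negative numbers.
    """
    scores = defaultdict(float)
    for description, popularity, similarity in observations:
        scores[description] += popularity * similarity
    total = sum(scores.values())
    if total == 0:
        return {}
    return {description: score / total for description, score in scores.items()}
```

With this sketch, a description observed under popular queries that closely resemble the input query receives a higher confidence than one observed equally often under rare or dissimilar queries.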
Scheme 56: The apparatus according to scheme 32, wherein the second intention mining unit comprises:
an input query replacement unit, which produces a group of intentions by replacing, with the input query, the wildcard in the similar intent information descriptions in the similar intent information description set.
Scheme 57: The apparatus according to scheme 32, wherein the second intention mining unit comprises:
a first-group intention mining unit, which mines a first group of intentions for the input query from at least one data source; and
a second-group intention mining unit, which mines a second group of intentions for the input query by using the similar intent information description set and the first group of intentions.
Scheme 58: The apparatus according to scheme 57, wherein the second-group intention mining unit comprises:
a unit that generates at least one intention by replacing, with the input query, the wildcard in at least one similar intent information description in the similar intent information description set, wherein the at least one intention is not in the first group of intentions; and
a unit that adds the generated at least one intention to the first group of intentions.
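The generation and merging described in schemes 56 and 58 can be sketched as follows, assuming the descriptions are plain strings containing a `*` wildcard. The function name `extend_first_group` and the string representation are illustrative assumptions.

```python
def extend_first_group(first_group, descriptions, input_query, wildcard="*"):
    """Instantiate each description with the input query and append the
    results that are not already in the first group of intentions."""
    merged = list(first_group)
    seen = set(first_group)
    for description in descriptions:
        candidate = description.replace(wildcard, input_query)
        if candidate not in seen:
            merged.append(candidate)
            seen.add(candidate)
    return merged
```

The original order of the first group is preserved, and newly generated intentions follow in the order of the description set.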
Scheme 59: The apparatus according to scheme 57, wherein the second-group intention mining unit comprises:
a ranking unit, which ranks the first group of intentions for the input query by using the similar intent information description set.
Scheme 60: The apparatus according to scheme 59, wherein the second-group intention mining unit further comprises:
a distinctive intention identifying unit, which identifies distinctive intentions in the first group of intentions for the input query; and
a weight changing unit, which raises the weight of a distinctive intention in the ranking according to the degree of distinctiveness of that intention;
wherein the degree of distinctiveness of a distinctive intention is computed from at least one of the following:
the co-occurrence rate of the input query and the distinctive intention in a prepared intention training set;
the relation between the input query and the distinctive intention in domain knowledge;
the frequency of the distinctive intention in the collection data; and
the popularity of the distinctive intention in query logs.
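The ranking of scheme 59 and the weight adjustment of scheme 60 might be combined as in the sketch below. The additive scoring, the `boost` parameter and the function name `rank_intentions` are assumptions; the schemes state only that the description set is used for ranking and that distinctive intentions are promoted according to their degree of distinctiveness.

```python
def rank_intentions(first_group, input_query, confidences, distinctiveness,
                    boost=1.0, wildcard="*"):
    """Rank intentions of the input query.

    confidences: {description: confidence} for the description set.
    distinctiveness: {intention: degree} for intentions identified as
    distinctive to the input query; other intentions count as 0.
    """
    def score(intention):
        # Map the intention back to its wildcard description.
        description = intention.replace(input_query, wildcard)
        base = confidences.get(description, 0.0)
        return base + boost * distinctiveness.get(intention, 0.0)

    return sorted(first_group, key=score, reverse=True)
```

Without the distinctiveness term, an intention that no intention-similar query shares would always rank last, even when it is the most characteristic aspect of the input query.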
Scheme 61: An apparatus for information retrieval, comprising:
an input query receiving unit, which receives an input query entered by a user in natural language;
the apparatus for intention mining according to any one of schemes 32-60, which mines intentions from the input query; and
a search result obtaining unit, which obtains search results for the mined intentions.
Scheme 62: An apparatus for question-answering assistance, comprising:
an input query receiving unit, which receives an input query entered by a user in natural language;
the apparatus for intention mining according to any one of schemes 32-60, which mines topics from the input query; and
an answer obtaining unit, which obtains answers for the mined topics.
Those skilled in the art will appreciate that the various embodiments of the present invention may be combined arbitrarily without departing from the scope of the present invention.
The method and system of the present invention may be implemented in many ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present invention are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments the present invention may also be embodied as programs recorded on a recording medium, the programs comprising machine-readable instructions for implementing the method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art should understand that the above embodiments may be modified without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (10)

1. A method for intention mining, the method comprising:
obtaining an input query;
generating intention-similar queries for the input query, wherein each intention-similar query has an intention type that is the same as or similar to that of the input query;
mining a group of intentions for each intention-similar query, wherein each intention provides a sub-topic of the corresponding intention-similar query;
determining a similar intent information description set by using all of the intention groups of the intention-similar queries; and
mining intentions for the input query by using the similar intent information description set.
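The steps of claim 1 can be sketched end to end as follows. This is a toy illustration, not the claimed implementation: the intention-similar queries are assumed to be given, `mine` is a hypothetical callable that returns the intentions of a query (for example from a query log), and a description is kept when at least `min_count` intention-similar queries share it.

```python
from collections import Counter


def mine_intentions(input_query, similar_queries, mine, min_count=2, wildcard="*"):
    """Mine intentions for input_query from the intentions of similar queries."""
    counts = Counter()
    for query in similar_queries:
        for intention in mine(query):
            if query in intention:
                # Abstract the intention into a wildcard description.
                counts[intention.replace(query, wildcard)] += 1
    # Keep descriptions shared by enough intention-similar queries.
    description_set = [d for d, c in counts.most_common() if c >= min_count]
    # Instantiate each retained description with the input query.
    return [d.replace(wildcard, input_query) for d in description_set]
```

The frequency cut-off stands in for the confidence-based selection described elsewhere in the document; any other scoring could be substituted.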
2. the method for claim 1, wherein generates the similar inquiry of intention for described input inquiry and comprises:
Obtain one or more inquiry to phrase from least one data source, wherein each inquiry comprises phrase: described input inquiry, be intended to similar designator and the 3rd phrase; And
From each inquiry to the 3rd phrase described in Phrase extraction, as the similar inquiry of described intention.
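The extraction in claim 2 can be sketched with a regular expression. The indicator list `INDICATORS` is a hypothetical example of intention-similar indicators (the claim does not enumerate them), and this sketch only handles phrases in which the input query comes first.

```python
import re

# Hypothetical intention-similar indicators; not taken from the patent.
INDICATORS = ["vs", "versus", "or", "similar to", "compared with"]


def extract_similar_queries(input_query, phrases, indicators=INDICATORS):
    """Return the third phrase of every "<input query> <indicator> <third phrase>"."""
    # Try longer indicators first so multi-word indicators are matched whole.
    alternation = "|".join(
        re.escape(i) for i in sorted(indicators, key=len, reverse=True))
    pattern = re.compile(
        rf"^{re.escape(input_query)}\s+(?:{alternation})\s+(?P<third>.+)$",
        re.IGNORECASE,
    )
    found = []
    for phrase in phrases:
        match = pattern.match(phrase.strip())
        if match and match.group("third") not in found:
            found.append(match.group("third"))
    return found
```

Phrases that contain the input query without an indicator, such as an ordinary query refinement, are ignored.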
3. the method for claim 1, the regular expression that wherein said similar intent information is described through input inquiry presents.
4. The method of claim 3, wherein determining the similar intent information description set comprises:
analyzing the linguistic form of each intention in all of the intention groups of the intention-similar queries;
determining at least one query-intention relation between the linguistic form of the corresponding intention-similar query within said linguistic form and the remaining linguistic forms;
transforming the linguistic form of each intention into a regular expression in accordance with the determined at least one query-intention relation; and
adding the regular expression obtained by the transformation to the similar intent information description set.
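A minimal sketch of the transformation in claim 4, assuming the simplest query-intention relation, in which the intention contains the intention-similar query verbatim. The named group `query` and the function `to_regex` are illustrative choices, not part of the claim.

```python
import re
from typing import Optional


def to_regex(intention: str, similar_query: str) -> Optional[str]:
    """Turn an intention into a regular expression whose `query` group
    stands in for the intention-similar query."""
    if similar_query not in intention:
        return None
    # Split around the first occurrence of the query.
    before, _, after = intention.partition(similar_query)
    return f"^{re.escape(before)}(?P<query>.+){re.escape(after)}$"
```

The resulting expression can then be matched against intentions of other queries, or instantiated by substituting the input query for the `query` group.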
5. An apparatus for intention mining, the apparatus comprising:
an input query obtaining unit, which obtains an input query;
an intention-similar query generating unit, which generates intention-similar queries for the input query, wherein each intention-similar query has an intention type that is the same as or similar to that of the input query;
a first intention mining unit, which mines a group of intentions for each intention-similar query, wherein each intention provides a sub-topic of the corresponding intention-similar query;
a similar intent information description set determining unit, which determines a similar intent information description set by using all of the intention groups of the intention-similar queries; and
a second intention mining unit, which mines intentions for the input query by using the similar intent information description set.
6. The apparatus of claim 5, wherein the intention-similar query generating unit comprises:
a query-pair phrase obtaining unit, which obtains one or more query-pair phrases from at least one data source, wherein each query-pair phrase comprises the input query, an intention-similar indicator and a third phrase; and
a third phrase extracting unit, which extracts the third phrase from each query-pair phrase as an intention-similar query.
7. The apparatus of claim 5, wherein the similar intent information description is presented as a regular expression of the input query.
8. The apparatus of claim 7, wherein the similar intent information description set determining unit comprises:
a linguistic form analyzing unit, which analyzes the linguistic form of each intention in all of the intention groups of the intention-similar queries;
a query-intention relation determining unit, which determines at least one query-intention relation between the linguistic form of the corresponding intention-similar query within said linguistic form and the remaining linguistic forms;
a regular expression transforming unit, which transforms the linguistic form of each intention into a regular expression in accordance with the determined at least one query-intention relation; and
a regular expression adding unit, which adds the regular expression obtained by the transformation to the similar intent information description set.
9. An apparatus for information retrieval, comprising:
an input query receiving unit, which receives an input query entered by a user in natural language;
the apparatus for intention mining according to any one of claims 5-8, which mines intentions from the input query; and
a search result obtaining unit, which obtains search results for the mined intentions.
10. An apparatus for question-answering assistance, comprising:
an input query receiving unit, which receives an input query entered by a user in natural language;
the apparatus for intention mining according to any one of claims 5-8, which mines topics from the input query; and
an answer obtaining unit, which obtains answers for the mined topics.
CN201310371165.5A 2013-08-23 2013-08-23 Method and apparatus for intention mining Active CN104424216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310371165.5A CN104424216B (en) 2013-08-23 2013-08-23 Method and apparatus for intention mining

Publications (2)

Publication Number Publication Date
CN104424216A (en) 2015-03-18
CN104424216B (en) 2018-01-23

Family

ID=52973214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310371165.5A Active CN104424216B (en) 2013-08-23 2013-08-23 Method and apparatus for intention mining

Country Status (1)

Country Link
CN (1) CN104424216B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant