CN104424216B - Method and apparatus for intent mining - Google Patents

Method and apparatus for intent mining

Info

Publication number
CN104424216B
Authority
CN
China
Prior art keywords
similar
inquiry
intention
intended
intent information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310371165.5A
Other languages
Chinese (zh)
Other versions
CN104424216A (en)
Inventor
黄耀海
张碧川
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to CN201310371165.5A
Publication of CN104424216A
Application granted
Publication of CN104424216B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation

Abstract

The present invention relates to a method and apparatus for intent mining. A method for intent mining is disclosed, the method comprising: obtaining an input query; generating intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query; mining a group of intents for each intent-similar query, wherein each intent provides a subtopic of the corresponding intent-similar query; determining a similar-intent information description set by using the intent groups of all the intent-similar queries; and mining intents for the input query by using the similar-intent information description set.

Description

Method and apparatus for intent mining
Technical field
The present invention relates to a method and apparatus for text mining. In particular, the present invention relates to a method and apparatus for mining intents. More particularly, the present invention relates to a method and apparatus for discovering the search intent behind a query submitted by a user.
Background art
With the continuing development of computers and information technology, the rate at which information is produced worldwide keeps increasing. A great deal of information now exists in the world, such as personal, professional, entertainment, scientific and technical, and government information. Because there is so much information, organizing and accessing it has become a problem.
To improve the user's experience when seeking information, methods and systems that help users access the information they are looking for are continually being developed. For example, Santos et al., 2011, University of Glasgow at the NTCIR-9 Intent task: Experiments with Terrier on Subtopic Mining and Document Ranking, Proceedings of NTCIR-9 Workshop Meeting, 2011, Tokyo (non-patent literature 1) proposes attempting to understand the underlying intent behind a query entered by a user. When the user enters a short and ambiguous query, it is desirable to output the n (for example, n = 10) best intent results that are both important and diverse. Table 1 shows an example.
Table 1: Example of an input query and output
For example, as shown in Table 1, if a user enters the query "becoming a paralegal", several intents related to "becoming a paralegal" can be output for the user to choose from.
In intent mining, the quality of the mining result is generally evaluated with the following formula:
where I-rec (intent recall) denotes the intent recall, i.e. the ratio of the number of correct intents among those obtained to the number of intents that should be obtained (all correct results), and is often used to measure intent diversity; D-nDCG denotes the intent precision. D-nDCG is the diversified normalized discounted cumulative gain, which computes, based on position, the relevance of the result list returned by a search engine (see Sakai and Song, Evaluating Diversified Search Results Using Per-intent Graded Relevance, Proceedings of SIGIR '11, 2011, Beijing (non-patent literature 2)), and is used to measure the overall relevance of the intents; and D#-nDCG denotes a linear combination of I-rec and D-nDCG.
In the above formula, I-rec, D-nDCG and D#-nDCG are determined from the ground-truth data of the query (also called the standard answer), typically by comparing the intent mining result with the ground-truth data. These metrics are well known in the art and are therefore not described in detail.
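The relationship between the three metrics can be illustrated with a minimal sketch. It assumes that the linear combination takes the weighted form gamma * I-rec + (1 - gamma) * D-nDCG; the weight `gamma`, its default of 0.5, the function names and the toy intent labels are all assumptions made for illustration and are not taken from the text above. D-nDCG is treated as an already computed input.

```python
def intent_recall(mined_intents, ground_truth_intents):
    """I-rec: fraction of ground-truth intents covered by the mined intents."""
    truth = set(ground_truth_intents)
    if not truth:
        return 0.0
    return len(truth & set(mined_intents)) / len(truth)


def d_sharp_ndcg(i_rec, d_ndcg, gamma=0.5):
    """Linear combination of I-rec and D-nDCG; gamma is an assumed weight."""
    return gamma * i_rec + (1.0 - gamma) * d_ndcg


# Hypothetical ground truth and mining result for one query.
truth = ["class", "degree", "salary", "requirement"]
mined = ["class", "degree", "online"]

i_rec = intent_recall(mined, truth)      # 2 of the 4 ground-truth intents found
score = d_sharp_ndcg(i_rec, d_ndcg=0.4)  # d_ndcg is a made-up input value
```

With these toy inputs, half of the ground-truth intents are recovered, so the combined score sits halfway between the recall and the assumed D-nDCG value.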
As an example, in the prior art the ground-truth data of a query can be obtained as follows. For example, the ground-truth data can be set manually. As another example, the ground-truth data is proposed by assessors and decided by a vote among several people.
In the prior art, multiple intent candidates are generally mined from global external resources (such as search engines, Wikipedia, query logs and anchor text), and the mined candidates are then ranked by parameters such as frequency to obtain the intents the user desires.
For example, Xue et al., 2011, THUIR at NTCIR-9 INTENT Task, Proceedings of NTCIR-9 Workshop Meeting, 2011, Tokyo (non-patent literature 3) discloses a method for intent mining. This method extracts search results that contain the input query, identifies intent candidates for the input query from the search results, and finally ranks the intent candidates by certain criteria to obtain the intents the user desires.
Fig. 1 shows a flowchart of the intent mining method used in non-patent literature 3 of the prior art. As shown in Fig. 1, in step S2100 the query entered by the user is obtained. Next, in step S2110, intent candidates for the query are mined from global external resources such as search engines, Wikipedia and query logs. Next, in step S2120, duplicate intent candidates are removed from the candidates obtained. Then, in step S2130, the candidates remaining after duplicate removal are ranked using parameters such as the frequency with which a candidate occurs, its co-occurrence frequency, click data and edit distance. Finally, in step S2140, the top-ranked intent candidates are selected according to the ranking result and output as the intents the user desires.
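The deduplicate, rank and select steps (S2120 to S2140) can be sketched as follows. This is a simplified illustration, not the cited method: it ranks by occurrence frequency only and leaves out the other ranking parameters, and the function name, the whitespace and case normalization and the sample candidates are assumptions.

```python
from collections import Counter


def rank_intent_candidates(candidates, top_n=10):
    """Deduplicate intent candidates and rank them by occurrence frequency."""
    # Normalize case and whitespace so that trivial variants count as duplicates.
    normalized = [" ".join(c.lower().split()) for c in candidates]
    counts = Counter(normalized)  # duplicates collapse into counts (S2120)
    # Most frequent first; ties are broken alphabetically (S2130).
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return ranked[:top_n]         # keep the top-ranked candidates (S2140)


# Hypothetical candidates mined from external resources.
mined = [
    "becoming a paralegal class",
    "Becoming a paralegal class",
    "becoming a paralegal degree",
    "becoming a paralegal  salary",
    "becoming a paralegal degree",
    "becoming a paralegal class",
]
top_two = rank_intent_candidates(mined, top_n=2)
```

Because the ranking depends entirely on how often a candidate was observed, a query with little history yields few, unreliable counts, which is the weakness discussed next.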
However, those skilled in the art have found in practice that, with the method disclosed in non-patent literature 3, when intent information (such as the user's query history) is scarce, the intents obtained may not match the intents the user expects, i.e. the method cannot accurately provide the intent candidates the user wants. The intent mining performance of that method is therefore low.
In addition, United States Patent US 8,214,347 B2 (patent document 1) proposes another method for intent mining. In that method, high-frequency phrases are extracted from search results, and intents are then mined from these phrases by applying some predetermined rules.
Fig. 2 shows a flowchart of the intent mining method used in US 8,214,347 B2 of the prior art. As shown in Fig. 2, in step S2200 the query entered by the user is obtained. Next, in step S2210, search results are extracted for the query entered by the user. Next, in step S2220, intent candidates are mined: phrases containing the input query are identified in the search results, and the best phrases are determined as intent candidates using features such as the frequency with which a phrase occurs, its co-occurrence frequency, click data and edit distance. Then, in step S2230, the intent candidates are ranked. Finally, in step S2240, the top-ranked intent candidates are selected according to the ranking result and output as the intents the user desires.
However, those skilled in the art have likewise found in practice that, with the method disclosed in US 8,214,347 B2, when intent information (such as the user's query history) is scarce, the intents obtained may not match the intents the user expects, i.e. the method cannot accurately provide the intent candidates the user wants. The intent mining performance of that method is therefore low.
It is therefore desirable to provide a new technique that solves the above problems of the prior art.
Summary of the invention
An object of the present invention is to improve the accuracy of intent mining.
Another object of the present invention is to improve intent recall.
According to one aspect of the present invention, there is provided a method for intent mining, the method comprising: obtaining an input query; generating intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query; mining a group of intents for each intent-similar query, wherein each intent provides a subtopic of the corresponding intent-similar query; determining a similar-intent information description set by using the intent groups of all the intent-similar queries; and mining intents for the input query by using the similar-intent information description set.
According to another aspect of the present invention, there is provided a method for information retrieval, comprising: receiving an input query expressed by a user in natural language; mining intents from the input query according to the above method for intent mining; and obtaining search results for the mined intents.
According to a further aspect of the present invention, there is provided a method for question-answering assistance, comprising: receiving an input query expressed by a user in natural language; mining topics from the input query according to the above method for intent mining; and obtaining answers to the mined topics.
According to a further aspect of the present invention, there is provided an apparatus for intent mining, the apparatus comprising: an input query acquisition unit that obtains an input query; an intent-similar query generation unit that generates intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query; a first intent mining unit that mines a group of intents for each intent-similar query, wherein each intent provides a subtopic of the corresponding intent-similar query; a similar-intent information description set determination unit that determines a similar-intent information description set by using the intent groups of all the intent-similar queries; and a second intent mining unit that mines intents for the input query by using the similar-intent information description set.
According to a further aspect of the present invention, there is provided an apparatus for information retrieval, comprising: an input query receiving unit that receives an input query expressed by a user in natural language; the above apparatus for intent mining, which mines intents from the input query; and a search result obtaining unit that obtains search results for the mined intents.
According to a further aspect of the present invention, there is provided an apparatus for question-answering assistance, comprising: an input query receiving unit that receives an input query expressed by a user in natural language; the above apparatus for intent mining, which mines topics from the input query; and an answer obtaining unit that obtains answers to the mined topics.
One advantage of the present invention is that the accuracy of intent mining is improved. In particular, even when intent information is scarce, the intent candidates the user desires can be provided accurately.
Another advantage of the present invention is that intent recall is improved.
Further features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the drawings.
Brief description of the drawings
The accompanying drawings, which form part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
The present invention can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 shows a flowchart of the intent mining method used in non-patent literature 3 of the prior art.
Fig. 2 shows a flowchart of the intent mining method used in US 8,214,347 B2 (patent document 1) of the prior art.
Fig. 3 is a block diagram showing the hardware configuration of a computer system 1000 capable of implementing embodiments of the present invention.
Fig. 4 shows a flowchart of a method of mining intents by using intent-similar queries according to an embodiment of the present invention.
Fig. 5 shows a flowchart of a method of generating intent-similar queries according to an embodiment of the present invention.
Fig. 6 shows a flowchart of a method of generating intent-similar queries from an intent-similar query library according to an embodiment of the present invention.
Fig. 7 shows a flowchart of a method of generating intent-similar queries using a domain ontology according to an embodiment of the present invention.
Fig. 8 shows a flowchart of a method of generating intent-similar queries using intent-similar indicators according to an embodiment of the present invention.
Fig. 9 shows a flowchart of a method of generating intent-similar queries for an input query according to an embodiment of the present invention.
Fig. 10 shows a flowchart of a method of identifying the core intent part and the modifier part of an input query according to an embodiment of the present invention.
Fig. 11 shows a flowchart of a method of determining the similar-intent information description set by lexical analysis according to an embodiment of the present invention.
Fig. 12 shows a flowchart of a method of determining the similar-intent information description set by syntactic analysis according to an embodiment of the present invention.
Fig. 13 shows a flowchart of a method of determining the similar-intent information description set by semantic relation analysis according to an embodiment of the present invention.
Fig. 14 shows a flowchart of a method of determining the similar-intent information description set by logical analysis according to an embodiment of the present invention.
Fig. 15 shows another flowchart of a method of mining intents by using intent-similar queries according to an embodiment of the present invention.
Fig. 16 shows a flowchart of a method for information retrieval according to an embodiment of the present invention.
Fig. 17 shows a flowchart of a method for question-answering assistance according to an embodiment of the present invention.
Fig. 18 shows a functional block diagram of an apparatus 7000 for mining intents according to an embodiment of the present invention.
Fig. 19 shows a functional block diagram of an apparatus 8000 for information retrieval according to an embodiment of the present invention.
Fig. 20 shows a functional block diagram of an apparatus 9000 for question-answering assistance according to an embodiment of the present invention.
Detailed description of the embodiments
Various exemplary embodiments of the present invention will now be described in detail with reference to the drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention, its application, or its uses.
Techniques, methods and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be considered part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
Fig. 3 is a block diagram showing the hardware configuration of a computer system 1000 capable of implementing embodiments of the present invention.
As shown in Fig. 3, the computer system includes a computer 1110. The computer 1110 includes a processing unit 1120, a system memory 1130, a fixed non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190 and a peripheral interface 1195, which are connected via a system bus 1121.
The system memory 1130 includes a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136 and some program data 1137 reside in the RAM 1132.
A fixed non-volatile memory 1141, such as a hard disk, is connected to the fixed non-volatile memory interface 1140. The fixed non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146 and some program data 1147.
Removable non-volatile memories, such as a floppy disk drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a floppy disk 1152 can be inserted into the floppy disk drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.
Input devices such as a mouse 1161 and a keyboard 1162 are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.
The remote computer 1180 may include a memory 1181, such as a hard disk, which stores remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The peripheral interface 1195 is connected to a printer 1196 and a loudspeaker 1197.
The computer system shown in Fig. 3 is merely illustrative and is in no way intended to limit the present invention, its application, or its uses.
The computer system shown in Fig. 3 can be incorporated in any embodiment, either as a stand-alone computer or as a processing system within a device; one or more unnecessary components can be removed from it, and one or more additional components can be added to it.
Fig. 4 shows a flowchart of a method of mining intents by using intent-similar queries according to an embodiment of the present invention.
As shown in Fig. 4, first, in step S3100, the query entered by the user is obtained. Those skilled in the art will appreciate that the query may be in any of various languages, including but not limited to Chinese, English, Japanese, Korean, German, French, Russian and Arabic.
For example, the query entered by the user may be "becoming a paralegal". The ground-truth data (i.e. the standard answer) that the user hopes to obtain for this query is shown in Table 2.
Table 2: Ground-truth data for the query "becoming a paralegal"
In Table 2, the "intent type" refers to the relationship between an intent and the corresponding query. For clarity, Table 3 shows some examples of intent types.
Query                | Intent                             | Intent type
becoming a paralegal | becoming a paralegal class         | Course
becoming a paralegal | becoming a paralegal degree        | Degree
becoming a engineer  | becoming a engineer class          | Course
becoming a engineer  | Requirement of becoming a engineer | Require
Table 3: Examples of intent types
As shown in Table 3, if the input query is "becoming a paralegal" and the corresponding intent is "becoming a paralegal class", then the corresponding intent type is "course", i.e. "becoming a paralegal class" concerns information about courses. If the input query is "becoming a paralegal" and the corresponding intent is "becoming a paralegal degree", then the corresponding intent type is "degree", i.e. "becoming a paralegal degree" concerns information about degrees.
Continuing with Fig. 4, next, in step S3110, intent-similar queries are generated for the input query, where each intent-similar query has an intent type that is the same as or similar to that of the input query.
If queries are similar, they are likely to have the same or similar intent types. That is, when a user searches for information about a query, he or she searches for certain subtopics of that query, and when other users search for similar queries, the subtopics they search for are likely to be the same. For example, when a user searches for "becoming a paralegal", a common intent is to find "the course of paralegal"; when a user searches for "becoming an engineer", a common intent is to find "the course of engineer". This intent is also common for other queries of the form "becoming a 'position'". Intent-similar queries can therefore be used to mine the intents of a user's query.
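The idea of transferring intents from similar queries to the input query can be illustrated with a rough sketch. It is not the procedure of the embodiments: the set of shared intent descriptions is approximated here by replacing each similar query with a wildcard and keeping the patterns that recur across similar queries. The function name, the `min_support` parameter, the `mine_intents` callable and the toy index are all hypothetical.

```python
from collections import Counter


def mine_intents_via_similar_queries(input_query, similar_queries,
                                     mine_intents, min_support=2):
    """Transfer intent patterns shared by similar queries to the input query."""
    pattern_counts = Counter()
    for sq in similar_queries:
        # Abstract each mined intent into a pattern, e.g. "* class".
        patterns = {intent.replace(sq, "*")
                    for intent in mine_intents(sq) if sq in intent}
        pattern_counts.update(patterns)  # each pattern counted once per query
    # Keep only the patterns shared by enough similar queries.
    shared = sorted(p for p, n in pattern_counts.items() if n >= min_support)
    # Instantiate the shared patterns with the input query.
    return [p.replace("*", input_query) for p in shared]


# Hypothetical stand-in for an intent miner backed by external resources.
toy_index = {
    "becoming a engineer": ["becoming a engineer class",
                            "becoming a engineer degree",
                            "becoming a engineer in Texas"],
    "becoming a accountant": ["becoming a accountant class",
                              "becoming a accountant degree"],
}
intents = mine_intents_via_similar_queries(
    "becoming a paralegal", list(toy_index), lambda q: toy_index.get(q, []))
```

The pattern seen for only one similar query ("* in Texas") is dropped, while the two shared patterns are applied to the input query even though the input query itself has no mined intents in the toy index.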
Fig. 5 shows a flowchart of a method of generating intent-similar queries according to an embodiment of the present invention. As shown in Fig. 5, first, in step S3210, multiple intent-similar queries are generated for the query entered by the user. As described below, several methods can be used to generate them. Next, in step S3220, the similarity between each of the intent-similar queries and the input query is calculated. The method of calculating this similarity is described in more detail below. Finally, in step S3230, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, are selected from the intent-similar queries as output.
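The selection in steps S3220 and S3230 can be sketched as below. Since the similarity measure is only described later, the sketch plugs in a token-overlap (Jaccard) measure as a placeholder; that measure, the function names and the candidate queries are assumptions for illustration.

```python
def jaccard(a, b):
    """Placeholder similarity: token overlap between two queries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_intent_similar_queries(candidates, input_query, similarity,
                                  top_k=None, threshold=None):
    """Keep the top_k most similar candidates and/or those above a threshold."""
    scored = [(similarity(c, input_query), c)
              for c in candidates if c != input_query]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most similar first
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] > threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [c for _, c in scored]


candidates = ["becoming an engineer", "becoming a nurse",
              "paralegal salary", "weather today"]
top_two = select_intent_similar_queries(
    candidates, "becoming a paralegal", jaccard, top_k=2)
above_threshold = select_intent_similar_queries(
    candidates, "becoming a paralegal", jaccard, threshold=0.1)
```

Passing only `top_k` gives the "specific number" variant and passing only `threshold` gives the threshold variant; the same selection step is reused by each of the generation methods described below.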
When the input query is a single word, multiple intent-similar queries can be generated using the method shown in Fig. 6. As shown in Fig. 6, in step S3310, intent-similar queries are generated by consulting an intent-similar query library. For example, the intent-similar query library maintains a list of pop music stars; when the input query concerns an emerging pop music star, pop music stars similar to that emerging star can be selected as intent-similar queries. Next, in step S3320, the similarity between each of the intent-similar queries and the input query is calculated. The method of calculating this similarity is described in more detail below. Finally, in step S3330, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, are selected from the intent-similar queries as output.
Intent-similar queries can also be generated using the method shown in Fig. 7. As shown in Fig. 7, in step S3410, multiple intent-similar queries are generated by consulting a domain ontology, i.e. one or more sibling nodes of the input query are obtained from the domain ontology as the intent-similar queries. A "domain ontology" is a structured network of encyclopedic knowledge, for example Wikipedia. Suppose, for example, that the input query is "Vanuatu". In a geography ontology, "Vanuatu" is a country in Oceania. Therefore "Fiji", "Indonesia", "Kiribati", "Marshall Islands" and so on can be selected from the geography ontology as intent-similar queries. Next, in step S3420, the similarity between each of the intent-similar queries and the input query is calculated. The method of calculating this similarity is described in more detail below. Finally, in step S3430, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, are selected from the intent-similar queries as output.
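The sibling-node lookup of step S3410 can be sketched with a toy ontology. The dictionary representation (parent concept mapped to child concepts), the function name and the contents of the ontology are assumptions; the entries under "Oceania" simply mirror the example above and are not taken from a real ontology.

```python
def sibling_queries(ontology, query):
    """Return the sibling nodes of `query` in a parent -> children mapping."""
    siblings = []
    for children in ontology.values():
        if query in children:
            for child in children:
                if child != query and child not in siblings:
                    siblings.append(child)
    return siblings


# Toy geography ontology mirroring the example in the text.
geography = {
    "Oceania": ["Vanuatu", "Fiji", "Indonesia", "Kiribati", "Marshall Islands"],
    "Europe": ["France", "Germany"],
}
similar = sibling_queries(geography, "Vanuatu")
```

The siblings returned here would then be scored and filtered in steps S3420 and S3430.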
Alternatively and/or additionally, neighboring concepts of the input query can be obtained from a language dictionary as the intent-similar queries.
Alternatively and/or additionally, one or more queries can be obtained from a query log as the intent-similar queries by calculating intent similarity based on the click data associated with the input query.
Intent-similar queries can also be generated using the method shown in Fig. 8. As shown in Fig. 8, in step S3510, intent-similar queries are generated by using intent-similar indicators. The intent-similar indicators include at least one of the following: a coordination indicator, where the two phrases connected by the coordination indicator serve as the same syntactic element in a sentence, such as "and" or "with"; a contrast indicator, where a first phrase in a sentence and a second phrase connected after the first phrase by the contrast indicator are in a contrastive relationship, such as "relative to", "compared to" or "vs"; and a choice indicator, where the two phrases connected by the choice indicator form an alternative expression in a sentence, such as "or", "among" or "between". An intent-similar indicator shows that the phrases it links may be candidate intent-similar queries.
In other words, in step S3510, one or more query-pair phrases are obtained from at least one data source, where each query-pair phrase comprises the input query, an intent-similar indicator and a third phrase; and the third phrase is extracted from each query-pair phrase as an intent-similar query.
For example, if the input query is "pressure washer", the following sentence segments can be obtained from the data source:
pressure washer vs cold high-pressure washer;
pressure washer vs pneumatic washer;
pressure washer and air compressor;
pressure washer and steam cleaner;
lawn mower or pressure washer.
Therefore, for the query "pressure washer", "cold high-pressure washer", "pneumatic washer", "air compressor", "steam cleaner" and "lawn mower" can be selected as intent-similar queries.
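Extracting the third phrase from such sentence segments can be sketched with a regular expression. The indicator list below is a small English subset chosen for illustration, the function name is hypothetical, and the segments are English renderings of the washer example; indicators that wrap both phrases (such as "between ... and ...") are not handled.

```python
import re

# Small, assumed subset of intent-similar indicators (longer forms first).
INDICATORS = ["compared to", "relative to", "versus", "vs", "and", "with", "or"]
PATTERN = re.compile(
    r"^(?P<left>.+?)\s+(?:%s)\s+(?P<right>.+)$"
    % "|".join(re.escape(i) for i in INDICATORS),
    re.IGNORECASE,
)


def extract_similar_queries(segments, input_query):
    """Return the phrase linked to the input query by an indicator."""
    target = input_query.lower()
    found = []
    for segment in segments:
        match = PATTERN.match(segment.strip().rstrip(";."))
        if not match:
            continue
        left, right = match.group("left").strip(), match.group("right").strip()
        if left.lower() == target:
            other = right
        elif right.lower() == target:
            other = left
        else:
            continue  # the segment does not involve the input query
        if other.lower() not in (f.lower() for f in found):
            found.append(other)
    return found


segments = [
    "pressure washer vs cold high-pressure washer;",
    "pressure washer vs pneumatic washer;",
    "pressure washer and air compressor;",
    "pressure washer and steam cleaner;",
    "lawn mower or pressure washer.",
]
similar = extract_similar_queries(segments, "pressure washer")
```

The input query may appear on either side of the indicator, which is why the last segment contributes "lawn mower".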
Next, in step S3520, the similarity between each of the intent-similar queries and the input query is calculated. The method of calculating this similarity is described in more detail below. Finally, in step S3530, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, are selected from the intent-similar queries as output.
In addition, when the input query is a multi-word query, intent-similar queries can be generated using the method shown in Fig. 9. As shown in Fig. 9, first, in step S3610, the core intent part and the modifier part of the multi-word input query are identified.
Fig. 10 shows a flowchart of a method of identifying the core intent part and the modifier part of an input query according to an embodiment of the present invention. As shown in Fig. 10, first, in step S3710, expanded queries are generated for each semantic unit of the input query. That is, the input query is parsed and divided into multiple semantic units (multiple words); for each semantic unit of the input query, temporary intent-similar queries (expanded queries) are generated, each consisting of that semantic unit and a changed part, where the changed part is an intent-similar phrase generated for the other semantic units of the input query. In one embodiment, generating the intent-similar phrase (changed part) can include: obtaining one or more query-pair phrases from at least one data source, where each query-pair phrase comprises the other semantic units of the input query, an intent-similar indicator and a third phrase; and extracting the third phrase from each query-pair phrase as the intent-similar phrase (changed part).
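Step S3710 can be sketched as follows: each semantic unit is kept in turn while the other units are replaced by intent-similar phrases. The function name, the dictionary of replacement phrases and the decision to attach the article "a" to the first unit are simplifying assumptions made for illustration.

```python
def expanded_queries(units, similar_phrases):
    """For each kept unit, build queries in which the other units are replaced.

    units: the semantic units of the input query, in order.
    similar_phrases: unit -> list of intent-similar replacement phrases.
    """
    result = {}
    for kept_index, kept in enumerate(units):
        variants = [[]]
        for index, unit in enumerate(units):
            # The kept unit stays fixed; every other unit is swapped out.
            options = [unit] if index == kept_index else similar_phrases.get(unit, [])
            variants = [v + [option] for v in variants for option in options]
        result[kept] = [" ".join(v) for v in variants]
    return result


# Hypothetical division of "becoming a paralegal" and replacement phrases.
units = ["becoming a", "paralegal"]
similar_phrases = {
    "becoming a": ["training a", "supervising a"],
    "paralegal": ["engineer", "accountant"],
}
expanded = expanded_queries(units, similar_phrases)
```

Each list of expanded queries is then used in the following steps to judge how consistent the intents are when the corresponding unit is kept.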
Next, in step S3720, for each semantic unit of the input query, a group of intents is mined for each temporary intent-similar query (expanded query), where each intent provides a subtopic of the corresponding temporary intent-similar query. For each semantic unit, a consistency degree is calculated by comparing the intent groups of that unit's temporary intent-similar queries. The consistency degree measures how similar the intents of the temporary intent-similar queries of a semantic unit are: the more common the intent types found among those intents, the higher the consistency degree.
Next, in step S3730, the semantic unit with the highest consistency degree in the input query is determined to be the core intent part of the input query, and the other semantic units are determined to be the modifier part of the input query.
For example, for the input query "becoming a paralegal", expanded queries are generated for each word using the above method. Table 4 shows an example of the query words and the corresponding expanded queries for a multi-word query.
Table 4: Example of query words and corresponding expanded queries for a multi-word query
Then, for each expanded query, intents are generated using a conventional method, and the consistency degree is calculated by comparing the intent groups mined for each semantic unit.
In one embodiment, the consistency degree can be calculated as follows:
where N_AllIntent denotes all intents obtained for the expanded queries of each semantic unit, and N_PopIntent denotes the intent information descriptions that appear in more than 5 queries.
For example, among the intents of queries such as "becoming a Engineer", "becoming a Accountant" and "becoming a Law clerk", intent types such as "becoming a * class", "becoming a * degree" and "becoming a * training" are common. However, among the intents of queries such as "training paralegal", "severing paralegal", "supervising a paralegal" and "directing a paralegal", there are few common intent types. Therefore, for the input query "becoming a paralegal", the consistency degree of "becoming" is higher than that of "paralegal". In this example, data analysis gives a consistency degree of 0.81 for "becoming" and 0.03 for "paralegal". Therefore, in this query the core intent part is "becoming" and the modifier part is "paralegal", and the intent of the query is determined mainly by "becoming".
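The consistency computation and the choice of the core intent part can be sketched as follows. The formula itself is not reproduced in the text above, so this sketch assumes one plausible reading: the ratio N_PopIntent / N_AllIntent taken over distinct intent information descriptions. That ratio, the function names and the data layout are assumptions of this illustration; only the "more than 5 queries" cutoff and the example values 0.81 and 0.03 come from the description.

```python
from collections import Counter


def consistency_degree(descriptions_per_query, min_queries=5):
    """Assumed reading: share of distinct descriptions that occur in more
    than `min_queries` expanded queries (N_PopIntent / N_AllIntent)."""
    counts = Counter()
    for descriptions in descriptions_per_query:
        counts.update(set(descriptions))  # count each query at most once
    n_all = len(counts)
    if n_all == 0:
        return 0.0
    n_pop = sum(1 for hits in counts.values() if hits > min_queries)
    return n_pop / n_all


def split_core_and_modifier(consistency_by_unit):
    """Pick the unit with the highest consistency as the core intent part."""
    core = max(consistency_by_unit, key=consistency_by_unit.get)
    modifiers = [unit for unit in consistency_by_unit if unit != core]
    return core, modifiers


# Consistency values quoted in the description for "becoming a paralegal".
core, modifiers = split_core_and_modifier({"becoming": 0.81, "paralegal": 0.03})
```

With the quoted values, `core` is "becoming" and `modifiers` is `["paralegal"]`, which agrees with the example in the text.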
Referring back to Fig. 9, in step S3620, the intent-similar queries are generated by replacing the modifier part of the input query with each of a plurality of replacement parts, where each replacement part is an intent-similar phrase generated for the modifier part and has an intent type identical or similar to that of the modifier part of the input query. In one embodiment, generating the intent-similar phrases (replacement parts) includes: obtaining one or more query pair phrases from at least one data source, where each query pair phrase includes the modifier part, an intent-similar indicator, and a third phrase; and extracting the third phrase from each query pair phrase as the intent-similar phrase (replacement part).
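Step S3620 can be illustrated with the following sketch. It assumes that a query pair phrase has the surface form "modifier, indicator, third phrase" and that words such as "vs", "or" and "and" act as intent-similar indicators; the text above does not list the indicators, so both the indicator list and the function names are hypothetical.

```python
import re

# Hypothetical intent-similar indicators; not specified in the description.
SIMILAR_INDICATORS = ("vs", "or", "and")


def extract_replacements(modifier, pair_phrases, indicators=SIMILAR_INDICATORS):
    """Extract the third phrase from '<modifier> <indicator> <third phrase>'."""
    alternatives = "|".join(re.escape(word) for word in indicators)
    pattern = re.compile(
        r"^" + re.escape(modifier) + r"\s+(?:" + alternatives + r")\s+(.+)$",
        re.IGNORECASE,
    )
    found = []
    for phrase in pair_phrases:
        match = pattern.match(phrase.strip())
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def generate_intent_similar_queries(query, modifier, replacements):
    """Replace the modifier part of the query with each replacement part."""
    return [query.replace(modifier, part) for part in replacements if part != modifier]


# Hypothetical phrases as they might be collected from a data source.
phrases = ["paralegal vs lawyer", "paralegal or legal assistant", "lawyer salary"]
parts = extract_replacements("paralegal", phrases)
similar = generate_intent_similar_queries("becoming a paralegal", "paralegal", parts)
```

A phrase that does not start with the modifier followed by an indicator, such as "lawyer salary" here, is ignored.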
Next, in step S3630, the similarity between each of the intent-similar queries and the input query can be calculated. The method for calculating this similarity is described in more detail below. Finally, in step S3640, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, can be selected from the intent-similar queries as the output.
Alternatively, when the input query is a multi-word query, intent-similar phrases may be generated only for the core intent part of the input query and used as the intent-similar queries. Specifically, when the input query is a multi-word query, generating intent-similar queries for the input query may include: identifying the core intent part and the modifier part of the input query; and then generating intent-similar phrases for the core intent part of the input query as the intent-similar queries.
Then, the similarity between each of the intent-similar queries and the input query can likewise be calculated using the method described below. Finally, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, can be selected from the intent-similar queries as the output.
Here, the core intent part and the modifier part of the input query can be identified by the method described with reference to Fig. 10. First, the input query is parsed and divided into multiple semantic units (multiple words); for each semantic unit thus obtained, temporary intent-similar queries are generated, each composed of that semantic unit and a changed part, where the changed part is an intent-similar phrase generated for the other semantic units of the input query. In one embodiment, generating the intent-similar phrase (changed part) may include: obtaining one or more query pair phrases from at least one data source, where each query pair phrase includes the other semantic units of the input query, an intent-similar indicator, and a third phrase; and extracting the third phrase from each query pair phrase as the intent-similar phrase (changed part). Next, for each semantic unit of the input query, a group of intents is mined for each temporary intent-similar query, where each intent provides a subtopic of the corresponding temporary intent-similar query.
For each semantic unit of the input query, a consistency degree is calculated by comparing the intent groups of that unit's temporary intent-similar queries. The consistency degree measures how similar the intents of the temporary intent-similar queries of a semantic unit are: the more common the intent types found among those intents, the higher the consistency degree. Finally, the semantic unit with the highest consistency degree in the input query is determined to be the core intent part of the input query, and the other semantic units are determined to be the modifier part of the input query. In addition, in one embodiment, generating the intent-similar phrases for the core intent part of the input query includes: obtaining one or more query pair phrases from at least one data source, where each query pair phrase includes the core intent part of the input query, an intent-similar indicator, and a third phrase; and extracting the third phrase from each query pair phrase as an intent-similar query.
For example, if the input query is "black history", the core intent part of the input query can be determined to be "history". The modifier part "black" can be disregarded, and only intent-similar phrases of "history", such as "history timeline", "study of history", "list of famous history" and "resources history", are generated as the intent-similar queries.
The method for calculating the similarity between an intent-similar query and the input query is described below. The similarity between each of the intent-similar queries and the input query is calculated from at least one of the following.
(1) The consistency degree between the query and the input query: the more similar the intent types of the intent-similar query and the input query, the higher the similarity between them;
(2) The lexical similarity between the query and the input query: the more similar the forms of the intent-similar query and the input query, the higher the similarity between the two queries; for example, the similarity among "car", "motorbike" and "motorscooter" is higher than the similarity between "motorbike" and "bike";
(3) The grammatical similarity between the query and the input query: the more similar the grammatical patterns of the intent-similar query and the input query in their context (a snippet or a document), the higher the similarity between the two queries; for example, relative to "ride a bike", the similarity between "drive a car" and "drive a motor" is higher;
(4) The semantic similarity between the query and the input query: the more similar the meanings of the intent-similar query and the input query, the higher the similarity between the two queries;
(5) The context similarity between the query and the input query in a prepared corpus: the more similar the contexts (snippets or documents) of the intent-similar query and the input query, the higher the similarity between the two queries;
(6) The co-occurrence rate of the query and the input query in a query log: the more frequently the intent-similar query and the input query occur together in the query log, the higher the similarity between the two queries;
(7) The distance between the query and the input query in a domain ontology: for example, Britain, Japan and France are all countries, but because Britain and France are both European countries in the ontology, the similarity between Britain and France is higher than that between Britain and Japan; and
(8) The similarity of the aggregated data of the query and the input query: if the curves of the aggregated data of the intent-similar query and the input query are similar, the two queries are similar.
In addition, the similarity between each of the intent-similar queries and the input query can also be calculated from at least one kind of real-world information, which includes at least: time, location, user model, and environment.
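The text states that the similarity is computed from at least one of the listed signals but does not say how several signals are combined. The sketch below assumes a weighted average purely for illustration; the function name, the signal names and the weights are hypothetical, and each individual score is taken as already computed on a 0 to 1 scale.

```python
def combined_similarity(scores, weights=None):
    """Weighted average of per-signal similarity scores (assumed combination).

    scores  -- dict mapping a signal name to a score in [0, 1]
    weights -- optional dict of non-negative weights; defaults to equal weights
    """
    if not scores:
        raise ValueError("at least one similarity signal is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical signal scores for one intent-similar query.
signals = {"lexical": 0.5, "semantic": 1.0}
equal = combined_similarity(signals)
semantic_heavy = combined_similarity(signals, {"lexical": 1.0, "semantic": 3.0})
```

Real-world information such as the user's location could enter the same way, as an additional signal or as a per-user adjustment of the weights.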
For example, if the input query is "university of phoenix", the generated intent-similar queries can be as shown in Table 5.
Table 5: Intent-similar queries for "university of phoenix"
When a user searches in Beijing, the user may want information about "university of phoenix" as "a university in the United States", whereas when a user searches in Mesa, Arizona, USA, the user may want information about "university of phoenix" as "a university in the State of Arizona". Therefore, for these two users in different locations, the similarity of each generated intent-similar query is different.
For the user in Beijing, the most similar queries are likely to be Stanford University, Harvard University, Massachusetts Institute of Technology and University of Pennsylvania, whereas for the user in Mesa, Arizona, USA, the most similar queries are likely to be Western International University, Grand Canyon University, University of Arizona and Northern Arizona University.
In addition, those skilled in the art will understand that the similarity of the intent-similar queries to the input query also differs depending on the identity of the user and the device used (for example, a computer, a mobile phone or a printer).
In addition, those skilled in the art will understand that any of the above ways of generating intent-similar queries can be combined in any manner.
Referring back to Fig. 4, in step S3120, a group of intents is mined for each intent-similar query using a prior-art method, where each intent provides a subtopic of the corresponding intent-similar query.
Next, in step S3130, a similar intent information description set is determined using all of the intent groups of the intent-similar queries. A similar intent information description is the linguistic form of the corresponding intent type. For example, as shown in Table 2, the intent type of "becoming a paralegal class" is "course", but in the present invention it is not necessary to identify the intent type of an intent; only the similar intent information description of the intent needs to be extracted. For example, for "becoming a paralegal class", "* class" is extracted.
A similar intent information description can be generated via the input query, for example by using "becoming a engineer class" and "steps on becoming a lawyer" to generate the similar intent information descriptions "becoming a paralegal class" and "steps on becoming a paralegal". In addition, a similar intent information description can be presented as a regular expression over the input query. For example, an intent of the intent-similar query "becoming a paralegal" is "becoming a paralegal class", and an intent of the query "becoming a engineer" is "becoming a engineer class"; the similar intent information description can therefore be expressed as "* class".
According to one embodiment of the present invention, the similar intent information description set can be determined by the following steps: analyzing the linguistic form of each intent in all of the intent groups of the intent-similar queries; determining at least one query-intent relation between the linguistic form of the corresponding intent-similar query contained in the linguistic form of the intent and the remaining linguistic form; transforming the linguistic form of each intent into a regular expression according to the determined at least one query-intent relation; and adding the regular expression thus obtained to the similar intent information description set.
Preferably, determining the similar intent information description set may further include expanding each intent group, which includes: for each intent in the intent group, generating a synonymous phrase by replacing at least one word of the intent with a synonym or near-synonym of that word, where the at least one word does not appear in the corresponding intent-similar query; and adding the generated synonymous phrase to the intent group.
Similar intent information descriptions can be of several types, for example lexical-type, grammar-type, semantic-type and logic-type similar intent information descriptions.
According to an embodiment of the present invention, each intent in all of the intent groups of the intent-similar queries can be subjected to any one or more of lexical analysis, syntactic analysis, semantic relation analysis and logical analysis (in any order), and the resulting similar intent information descriptions are combined to determine the similar intent information description set.
Fig. 11 is a flowchart of a method for determining the similar intent information description set by means of lexical analysis according to an embodiment of the present invention.
As shown in Fig. 11, first, in step S4100, each intent in all of the intent groups of the intent-similar queries is parsed by lexical analysis to detect whether the corresponding intent-similar query satisfies at least one lexical rule. If it does, then in step S4110, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query contained in the linguistic form of the intent is replaced with a wildcard, yielding a transformed intent. Next, in step S4120, a lexical-type similar intent information description in the form of words and a wildcard is determined: the transformed intent is taken as the lexical-type similar intent information description, this description is used as the regular expression, and the regular expression is added to the similar intent information description set.
For example, if the input query is "scooter", the following exemplary lexical-type similar intent information descriptions can be generated:
*store
electronic*
online
cheap*
*motor
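The wildcard substitution of steps S4110 and S4120 can be sketched as follows. The function name is an assumption, and the sample data (treating "bicycle" as an intent-similar query whose intents have already been mined) is hypothetical; the sketch covers only the replacement step and leaves out the lexical-rule check.

```python
import re


def lexical_descriptions(similar_query, intents):
    """Replace the intent-similar query inside each intent with '*'."""
    pattern = re.compile(r"\b" + re.escape(similar_query) + r"\b", re.IGNORECASE)
    descriptions = []
    for intent in intents:
        description, hits = pattern.subn("*", intent)
        description = " ".join(description.split())  # normalize whitespace
        # Skip intents that do not contain the query or that equal it.
        if hits and description != "*" and description not in descriptions:
            descriptions.append(description)
    return descriptions


# Hypothetical intents mined for the intent-similar query "bicycle".
mined = ["bicycle store", "cheap bicycle", "Bicycle store", "bicycle", "car wash"]
patterns = lexical_descriptions("bicycle", mined)
```

The word-boundary match is a design choice of this sketch: it keeps "bicycles" from being turned into "*s". The sketch writes a space next to the wildcard, whereas the listing above prints the wildcard flush against the word.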
Fig. 12 is a flowchart of a method for determining the similar intent information description set by means of syntactic analysis according to an embodiment of the present invention.
As shown in Fig. 12, first, in step S4200, each intent in all of the intent groups of the intent-similar queries is parsed by syntactic analysis to detect whether the corresponding intent-similar query satisfies at least one grammar rule. If it does, then in step S4210, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query contained in the linguistic form of the intent is replaced with a wildcard, yielding a transformed intent. Next, in step S4220, a grammar-type similar intent information description in the form of a grammar rule and a wildcard is determined: the transformed intent is taken as the grammar-type similar intent information description, this description is used as the regular expression, and the regular expression is added to the similar intent information description set.
For example, for the input query "scooter", the following exemplary grammar-type similar intent information descriptions can be generated:
*/prep/kids
how to/verb/*
*/prep/sale
Fig. 13 is a flowchart of a method for determining the similar intent information description set by means of semantic relation analysis according to an embodiment of the present invention.
As shown in Fig. 13, first, in step S4300, each intent in all of the intent groups of the intent-similar queries is parsed by semantic relation analysis to detect whether the corresponding intent-similar query satisfies at least one semantic relation. If it does, then in step S4310, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query contained in the linguistic form of the intent is replaced with a wildcard, and the remaining linguistic form of the intent is replaced with its semantic tag, yielding a transformed intent. Next, in step S4320, a semantic-type similar intent information description in the form of a semantic tag and a wildcard is determined: the transformed intent is taken as the semantic-type similar intent information description, this description is used as the regular expression, and the regular expression is added to the similar intent information description set.
For example, for the input query "scooter", the following exemplary semantic-type similar intent information descriptions can be generated:
*<brand>
*<company>
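The transformation of step S4310 can be sketched as follows: the intent-similar query becomes a wildcard and each remaining word is replaced by its semantic tag. How semantic tags are obtained is not described here, so the small lookup table below is a hypothetical stand-in for whatever semantic resource is actually used, and the function name is likewise an assumption.

```python
# Hypothetical word-to-tag lexicon; a real system would use a semantic resource.
SEMANTIC_TAGS = {"trek": "brand", "giant": "brand", "honda": "company"}


def semantic_description(similar_query, intent, tags=SEMANTIC_TAGS):
    """Return e.g. '* <brand>' for an intent, or None if it cannot be tagged."""
    marked = intent.replace(similar_query, " * ")
    parts = []
    for token in marked.split():
        if token == "*":
            parts.append("*")
            continue
        tag = tags.get(token.lower())
        if tag is None:
            return None  # a remaining word has no known semantic tag
        parts.append("<" + tag + ">")
    if "*" not in parts:
        return None  # the intent does not contain the intent-similar query
    return " ".join(parts)


# Hypothetical intents of the intent-similar query "bicycle".
tagged = semantic_description("bicycle", "bicycle Trek")
untagged = semantic_description("bicycle", "bicycle repair")
```

Intents whose remaining words carry no tag are left to the other analysis types, for example the purely lexical substitution.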
Fig. 14 is a flowchart of a method for determining the similar intent information description set by means of logical analysis according to an embodiment of the present invention.
As shown in Fig. 14, first, in step S4400, each intent in all of the intent groups of the intent-similar queries is parsed by logical analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation. If it does, then in step S4410, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query contained in the linguistic form of the intent is replaced with a wildcard, and the remaining linguistic form of the intent is replaced with its logical type, yielding a transformed intent. Next, in step S4420, a logic-type similar intent information description in the form of a logical type and a wildcard is determined: the transformed intent is taken as the logic-type similar intent information description, this description is used as the regular expression, and the regular expression is added to the similar intent information description set.
For example, for the input query "scooter", the following exemplary logic-type similar intent information descriptions can be generated:
*[version of](Word)
(Word)[place of]*
As stated above, each intent in all of the intent groups of the intent-similar queries can be subjected to any one or more of lexical analysis, syntactic analysis, semantic relation analysis and logical analysis. For example, each intent may be subjected to only a single one of these four analyses, or to all four of them. Accordingly, the resulting similar intent information description set may include one or more of lexical-type, grammar-type, semantic-type and logic-type similar intent information descriptions.
In addition, in one embodiment, determining the similar intent information description set may further include: calculating a confidence level for each similar intent information description in the set; and selecting from the set either a specific number of similar intent information descriptions with the highest confidence level, or the similar intent information descriptions whose confidence level exceeds a predetermined threshold.
The confidence level can be calculated using at least one of the following: the frequency of the similar intent information description; the coverage of the similar intent information description; and the relevance of the similar intent information description to the input query.
The confidence level can be calculated from at least one of the following: the similar intent information description set; a prepared intent training set; and prepared domain information.
Calculating the confidence level of a similar intent information description from the similar intent information description set may further include: assigning different weights to the corresponding similar intent information descriptions in the set according to the popularity of the intent-similar queries; and/or assigning different weights to the corresponding similar intent information descriptions in the set according to the similarity between the intent-similar queries and the input query.
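One way to read the weighting described above is as a weighted coverage: each intent-similar query contributes its weight to every description found among its intents. The sketch below follows that reading; the formula, the function name and the sample weights are assumptions of this illustration, and the weights themselves (from popularity or from similarity to the input query) are taken as given.

```python
from collections import defaultdict


def description_confidence(descriptions_by_query, query_weights):
    """Weighted share of intent-similar queries that exhibit each description."""
    totals = defaultdict(float)
    weight_sum = 0.0
    for query, descriptions in descriptions_by_query.items():
        weight = query_weights.get(query, 1.0)
        weight_sum += weight
        for description in set(descriptions):  # count once per query
            totals[description] += weight
    if weight_sum == 0:
        return {}
    return {description: total / weight_sum for description, total in totals.items()}


# Hypothetical descriptions per intent-similar query and hypothetical weights.
by_query = {
    "becoming a lawyer": ["* class", "steps on *"],
    "becoming a engineer": ["* class"],
    "training a paralegal": ["* jobs"],
}
weights = {"becoming a lawyer": 0.5, "becoming a engineer": 0.25, "training a paralegal": 0.25}
confidence = description_confidence(by_query, weights)
ranked = sorted(confidence, key=confidence.get, reverse=True)
```

A top-N cut or a threshold can then be applied to `ranked`, as in the selection step described above.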
Take again the earlier query "university of phoenix" as an example. For the user in Beijing, because the similarity of Stanford University, Harvard University, Massachusetts Institute of Technology and University of Pennsylvania is high, higher weights can be assigned to these intent-similar queries. Table 6 shows the weights assigned to each intent-similar query of "university of phoenix".
Table 6: Example weights of the intent-similar queries of "university of phoenix"
Therefore, for the input query "university of phoenix", similar intent information descriptions of the form "university of *" obtain a higher weight.
Referring back to Fig. 4, next, in step S3140, intents for the input query are mined using the similar intent information description set. In one embodiment, intents for the input query can be generated by replacing the wildcard in the similar intent information descriptions of the set with the input query. For example, if the input query is "becoming a paralegal" and a similar intent information description is "step to *", the new intent "step to becoming a paralegal" can be generated, and the generated intent can be output.
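The wildcard replacement of step S3140 is a plain string substitution, as the following sketch shows. The function name is an assumption; descriptions that carry semantic tags or logical types in addition to the wildcard would need a further filling step that is not covered here.

```python
def generate_intents(input_query, descriptions):
    """Fill the wildcard of each description with the input query."""
    return [
        description.replace("*", input_query)
        for description in descriptions
        if "*" in description
    ]


intents = generate_intents("becoming a paralegal", ["step to *", "* class"])
```

For the description "step to *" this yields "step to becoming a paralegal", the intent given in the example above.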
Fig. 15 is another flowchart of a method for intent mining using intent-similar queries according to an embodiment of the present invention. The method shown in Fig. 15 combines a prior-art intent mining method with the method according to the present invention to achieve more accurate intent mining. For brevity, detailed descriptions of steps in this embodiment that are identical to those of the embodiment described with reference to Fig. 4 are omitted.
As shown in Fig. 15, first, in step S5100, the query input by the user is obtained. Those skilled in the art will understand that the query input by the user can be in any of various languages, including but not limited to Chinese, English, Japanese, Korean, German, French, Russian and Arabic. For example, the query input by the user can be "becoming a paralegal".
Next, in step S5110, a group of intent candidates for the input query is mined from global external resources such as search engines, Wikipedia and query logs, using methods well known in the prior art. Next, in step S5120, duplicate intent candidates are removed from the intent candidates obtained. Next, in step S5130, the intent candidates are ranked to obtain a first group of intents.
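Steps S5120 and S5130 can be sketched as follows. The ranking criterion is not stated at this point in the text; the sketch assumes ranking by frequency of occurrence, which is the criterion the comparison test later in the description uses. The function name and the sample candidates are hypothetical.

```python
from collections import Counter


def first_intent_group(candidates):
    """Remove duplicate candidates and rank the rest by frequency."""
    # Normalize case and whitespace so trivially different copies collapse.
    counts = Counter(" ".join(candidate.lower().split()) for candidate in candidates)
    return [intent for intent, _ in counts.most_common()]


# Hypothetical candidates collected from several external resources.
candidates = [
    "paralegal salary",
    "Paralegal  salary",
    "paralegal jobs",
    "paralegal salary",
    "paralegal school",
    "paralegal jobs",
]
first_group = first_intent_group(candidates)
```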
Table 7 shows the first group of intents obtained for the user's input query "becoming a paralegal" using methods known in the prior art.
Table 7: First group of intents obtained for "becoming a paralegal"
With continued reference to Fig. 15, a second group of intents for the input query is mined using the method of the present invention. That is, in step S3110, intent-similar queries are generated for the input query, each of which has an intent type identical or similar to that of the input query. In step S3120, a group of intents is mined for each intent-similar query, where each intent provides a subtopic of the corresponding intent-similar query. In step S3130, the similar intent information description set is determined using all of the intent groups of the intent-similar queries. In step S3140, the second group of intents for the input query is mined using the similar intent information description set.
Next, in step S5140, the combination of the first group of intents and the second group of intents is ranked. In one embodiment, intents that appear only in the first group of intents can be deleted.
In another embodiment, the second group of intents for the input query can also be mined using both the similar intent information description set and the first group of intents. One implementation includes: generating at least one intent by replacing the wildcard in at least one similar intent information description of the set with the input query, where the at least one intent is not in the first group of intents; and adding the generated at least one intent to the first group of intents, the first group of intents with the generated at least one intent added being taken as the second group of intents.
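The implementation described above, in which generated intents that are missing from the first group are appended to it, can be sketched as follows. The function name, the case-insensitive comparison and the sample data are assumptions of this illustration.

```python
def second_intent_group(input_query, first_group, descriptions):
    """Append intents generated from descriptions that the first group lacks."""
    seen = {intent.lower() for intent in first_group}
    merged = list(first_group)
    for description in descriptions:
        if "*" not in description:
            continue  # nothing to fill in
        intent = description.replace("*", input_query)
        if intent.lower() not in seen:
            seen.add(intent.lower())
            merged.append(intent)
    return merged


# Hypothetical first group and description set.
second_group = second_intent_group(
    "becoming a paralegal",
    ["becoming a paralegal salary"],
    ["* salary", "* class", "step to *"],
)
```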
However, some queries may have peculiar intents that are not present among the intents of the intent-similar queries. In some embodiments of the present invention, these peculiar intents receive special handling. For example, for the input query "last supper painting", Table 8 shows the first group of intents for that input query.
Table 8: First group of intents for "last supper painting"
As can be seen from Table 8, when the intent-similar queries concern other oil paintings by Leonardo da Vinci, "last supper painting Jesus" and "Last Supper Painting Milan Italy" are peculiar to this query, and it is desirable to retain these peculiar intents during intent mining. Therefore, according to an embodiment of the present invention, another implementation of mining the second group of intents for the input query using the similar intent information description set and the first group of intents includes: ranking the first group of intents for the input query using the similar intent information description set. This implementation further includes: identifying the peculiar intents in the first group of intents for the input query; and raising the weight of each peculiar intent in the ranking according to its degree of peculiarity, where the degree of peculiarity of a peculiar intent is calculated from at least one of the following: the co-occurrence rate of the input query and the peculiar intent in a prepared intent training set; the relation between the input query and the peculiar intent in domain knowledge; the frequency of the peculiar intent in the aggregated data; and the popularity of the peculiar intent in a query log.
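The re-ranking with a boost for peculiar intents can be sketched as follows. The text does not specify the scoring function, so this illustration assumes an additive score: the confidence of the matching description plus a boost proportional to the degree of peculiarity. The function name, the additive form and all sample values are hypothetical, and the degree of peculiarity is taken as already computed.

```python
def rerank_first_group(input_query, first_group, description_confidence,
                       peculiar_degree=None, boost=1.0):
    """Rank first-group intents by description confidence plus peculiarity boost."""
    peculiar_degree = peculiar_degree or {}
    # Map each description to the intent it would produce for this query.
    expected = {
        description.replace("*", input_query).lower(): confidence
        for description, confidence in description_confidence.items()
    }

    def score(intent):
        base = expected.get(intent.lower(), 0.0)
        return base + boost * peculiar_degree.get(intent, 0.0)

    return sorted(first_group, key=score, reverse=True)


# Hypothetical first group, description confidences and peculiarity degrees.
group = [
    "last supper painting location",
    "last supper painting Jesus",
    "last supper painting history",
]
confidences = {"* history": 0.8, "* location": 0.5}
peculiar = {"last supper painting Jesus": 0.9}
boosted = rerank_first_group("last supper painting", group, confidences, peculiar)
plain = rerank_first_group("last supper painting", group, confidences)
```

Without the boost the peculiar intent drops to the end of the list because no description matches it, which is the loss the special handling is meant to prevent.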
With continued reference to Fig. 15, in step S5150, intents are output according to the requirements of the user. For example, a specific number of intents can be output. Table 9 shows the intents output for the input query "becoming a paralegal". Compared against the ground truth data (model answers) shown in Table 2, the result obtained clearly meets the requirements of the user better than the first group of results obtained by the prior art.
Table 9: Intents output by the method of the present invention for the input query "becoming a paralegal"
The inventors ran comparative tests between the method of Figure 15 according to the present invention and prior art methods. Testing showed that the method shown in Figure 1 performs best among the prior art methods. The method shown in Figure 1 was therefore selected as the baseline for the method of the invention.
The prior art method shown in Figure 1 mines intent candidates for a query from global external resources such as search engines, Wikipedia, query logs and anchor text, and ranks the intent candidates by frequency of occurrence.
For comparison, intent mining was performed with the method shown in Figure 15 according to the present invention, and the intent candidates were likewise ranked by frequency of occurrence. The inventors tested 50 queries, including "furniture for small spaces", "Churchill downs", "becoming a paralegal", "internet phone service", "Arkansas", "battles in the civil war", "hobby stores" and "Ontario California airport". Table 10 shows the averaged test results.
Metric | Prior art | Present invention | Improvement
I-rec | 0.3785 | 0.3933 | 0.0148
D-nDCG | 0.3384 | 0.3715 | 0.0331
D#-nDCG | 0.3584 | 0.3826 | 0.0242
Table 10: Performance comparison of the present invention and the prior art
As can be seen from Table 10, compared with the prior art method, the method of Figure 15 according to the present invention improves both intent recall and intent accuracy. In addition, in terms of D#-nDCG, the method of the present invention improves on the prior art method by 0.0242, that is, 2.42 percentage points.
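The improvement figures in Table 10 are absolute differences between the two columns. The short sketch below recomputes them from the tabulated values and also derives the relative gain, which the text does not report; the function and variable names are illustrative assumptions.

```python
# Values copied from Table 10: metric -> (prior art, present invention)
table_10 = {
    "I-rec": (0.3785, 0.3933),
    "D-nDCG": (0.3384, 0.3715),
    "D#-nDCG": (0.3584, 0.3826),
}


def improvements(table):
    """Return metric -> (absolute gain, relative gain), rounded to 4 places."""
    result = {}
    for metric, (prior, invention) in table.items():
        absolute = invention - prior
        result[metric] = (round(absolute, 4), round(absolute / prior, 4))
    return result


gains = improvements(table_10)
for metric, (absolute, relative) in gains.items():
    print(f"{metric}: +{absolute:.4f} absolute, +{relative:.2%} relative")
```

Reading the 2.42% figure as an absolute difference of 0.0242 is what matches the table; the relative gain over the prior art value is larger.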
To show the effect of the present invention more intuitively, the input query "becoming a paralegal" is described in detail as an example. For the input "becoming a paralegal", the top 10 results output by the present invention and by the prior art are compared. Table 11 shows the desired ground truth data. Table 12 shows the respective outputs of the prior art and the present invention. Table 13 shows the test comparison between the prior art and the present invention; the results obtained by the present invention are clearly more accurate. That is, the present invention can improve the accuracy of intent mining.
Table 11: Desired ground truth data
Table 12: Respective outputs of the prior art and the present invention
Metric | Prior art | Present invention
I-rec | 0.1111 | 0.3333
D-nDCG | 0.0734 | 0.5053
D#-nDCG | 0.0922 | 0.4193
Table 13: Test comparison of the prior art and the present invention
The comparative experiments above further confirm that, compared with the prior art, the present invention performs intent mining more accurately and improves intent recall.
Figure 16 shows a flowchart of a method for information retrieval according to an embodiment of the present invention. As shown in Figure 16, in step S6100, an input query expressed by a user in natural language is received. Next, in step S6110, intent mining is performed on the input query according to the method described herein that uses intent-similar queries. Next, in step S6120, search results for the mined intents are obtained.
Figure 17 shows a flowchart of a method for question answering assistance according to an embodiment of the present invention. As shown in Figure 17, in step S6200, an input query expressed by a user in natural language is received. Next, in step S6210, topics are mined from the input query according to the method described herein that uses intent-similar queries. Next, in step S6220, answers for the mined topics are obtained.
Figure 18 shows a functional block diagram of an apparatus 7000 for intent mining according to an embodiment of the present invention. All functional modules of the apparatus 7000 (that is, the various units included in the apparatus 7000, whether or not shown in the figure) may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the invention. Those skilled in the art will understand that the functional modules depicted in Figure 18 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
As shown in Figure 18, according to one aspect of the present invention, the apparatus 7000 for intent mining may include: an input query obtaining unit 7100, an intent-similar query generation unit 7200, a first intent mining unit 7300, a similar intent information description set determining unit 7400, and a second intent mining unit 7500. The input query obtaining unit 7100 is configured to obtain an input query. The intent-similar query generation unit 7200 is configured to generate intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query. The first intent mining unit 7300 is configured to mine a group of intents for each intent-similar query, wherein each intent provides a subtopic of the corresponding intent-similar query. The similar intent information description set determining unit 7400 is configured to determine a similar intent information description set by using all intent groups of the intent-similar queries. The second intent mining unit 7500 is configured to mine intents for the input query by using the similar intent information description set.
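The data flow between units 7200 to 7500 can be sketched as a chain of four functions. This is only an outline under stated assumptions: each stage is passed in as a callable, and the toy stages below (a fixed similar query, two fixed suffixes, plain string replacement with a "*" wildcard) are hypothetical stand-ins for the real mining steps.

```python
def mine_intents(input_query, generate_similar, mine_group,
                 build_descriptions, apply_descriptions):
    """Chain the four stages that follow the input query obtaining unit."""
    similar_queries = generate_similar(input_query)              # unit 7200
    intent_groups = {q: mine_group(q) for q in similar_queries}  # unit 7300
    descriptions = build_descriptions(intent_groups)             # unit 7400
    return apply_descriptions(input_query, descriptions)         # unit 7500


# Toy stand-ins, assumed for illustration only.
def toy_generate_similar(query):
    return ["mona lisa painting"]


def toy_mine_group(query):
    return [query + " history", query + " location"]


def toy_build_descriptions(groups):
    # Replace each similar query inside its own intents with a wildcard.
    return sorted({intent.replace(q, "*")
                   for q, intents in groups.items() for intent in intents})


def toy_apply_descriptions(query, descriptions):
    return [d.replace("*", query) for d in descriptions]


result = mine_intents("last supper painting", toy_generate_similar,
                      toy_mine_group, toy_build_descriptions,
                      toy_apply_descriptions)
print(result)
```

Passing the stages as parameters mirrors the statement that the functional modules may be combined or divided freely.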
In one embodiment, the intent-similar query generation unit 7200 may include: a query pair phrase obtaining unit that obtains one or more query pair phrases from at least one data source, wherein each query pair phrase includes the input query, an intent-similar indicator and a third phrase; and a third phrase extraction unit that extracts the third phrase from each query pair phrase as an intent-similar query.
In one embodiment, the intent-similar indicator may include at least one of the following: a coordination indicator, wherein two phrases connected by the coordination indicator serve as the same grammatical element in a sentence; a relative relation indicator, wherein a first phrase in a sentence stands in a relative relation to a second phrase that is connected after the first phrase by the relative relation indicator; and a choice indicator, wherein two phrases connected by the choice indicator form an alternative expression in a sentence.
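As a rough illustration of query pair phrases, the sketch below assumes short English snippets of the form "A indicator B" and treats "and" as a coordination indicator, "vs" or "versus" as a relative relation indicator, and "or" as a choice indicator. These indicator words, the snippet format and the function name are assumptions for the example; the text does not list concrete indicators.

```python
import re

# Assumed indicators: "and" (coordination), "vs"/"versus" (relative
# relation), "or" (choice). Group 1 is the first phrase, group 2 the third.
_PAIR = re.compile(r"^(.+?)\s+(?:and|or|vs\.?|versus)\s+(.+)$", re.IGNORECASE)


def extract_intent_similar_queries(input_query, snippets):
    """Return the third phrase of every snippet whose first phrase is the input query."""
    found = []
    for snippet in snippets:
        match = _PAIR.match(snippet.strip())
        if match and match.group(1).lower() == input_query.lower():
            third = match.group(2).strip()
            if third not in found:
                found.append(third)
    return found


snippets = [
    "paralegal vs legal assistant",
    "paralegal or lawyer",
    "paralegal salary",   # no indicator, ignored
    "nurse and doctor",   # first phrase is not the input query, ignored
]
similar = extract_intent_similar_queries("paralegal", snippets)
print(similar)
```

The lazy first group stops at the first indicator word, so an input query that itself contains "and" or "or" would need a more careful pattern.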
In one embodiment, when the input query is a multi-word query, the intent-similar query generation unit 7200 may include: a core intent part and modifier part identification unit that identifies the core intent part and the modifier part of the input query; and an intent-similar phrase generation unit that generates intent-similar phrases for the core intent part of the input query as the intent-similar queries.
In one embodiment, when the input query is a multi-word query, the intent-similar query generation unit 7200 may include: a core intent part and modifier part identification unit that identifies the core intent part and the modifier part of the input query; and a modifier part replacement unit that generates the intent-similar queries by replacing the modifier part of the input query with a plurality of substitute parts, wherein each substitute part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type that is the same as or similar to that of the modifier part of the input query.
In one embodiment, the core intent part and modifier part identification unit may include: an input query parsing unit that parses the input query to divide it into a plurality of semantic units; a temporary intent-similar query generation unit that, for each divided semantic unit of the input query, generates temporary intent-similar queries each composed of the divided semantic unit and a changed part, wherein the changed part is an intent-similar phrase generated for the other semantic units of the input query; a third intent mining unit that, for each divided semantic unit of the input query, mines a group of intents for each temporary intent-similar query, wherein each intent provides a subtopic of the corresponding temporary intent-similar query; a consistency calculation unit that, for each divided semantic unit of the input query, calculates a degree of consistency by comparing the intent groups of the temporary intent-similar queries of that semantic unit, wherein the degree of consistency is a measure of the intent similarity of the temporary intent-similar queries for that semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the degree of consistency; and a core intent part determining unit that determines the semantic unit with the highest degree of consistency in the input query as the core intent part of the input query, and determines the other semantic units as the modifier part of the input query.
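The core intent identification can be sketched as follows. The text does not fix a formula for the degree of consistency, so this example assumes the mean pairwise Jaccard overlap of intent types; the semantic units and intent types for "becoming a paralegal" are invented for illustration.

```python
from itertools import combinations


def consistency(intent_type_sets):
    """Mean pairwise Jaccard overlap of intent types (assumed measure)."""
    pairs = list(combinations(intent_type_sets, 2))
    if not pairs:
        return 0.0
    total = sum(len(a & b) / len(a | b) for a, b in pairs if a | b)
    return total / len(pairs)


def split_core_and_modifiers(unit_to_type_sets):
    """Pick the semantic unit with the highest consistency as the core part."""
    scores = {unit: consistency(sets) for unit, sets in unit_to_type_sets.items()}
    core = max(scores, key=scores.get)
    modifiers = [unit for unit in scores if unit != core]
    return core, modifiers, scores


# Hypothetical intent types mined for temporary intent-similar queries.
# "becoming" kept fixed: e.g. "becoming a lawyer", "becoming a nurse".
# "paralegal" kept fixed: e.g. "hiring a paralegal", "paralegal jobs".
unit_to_type_sets = {
    "becoming": [{"requirements", "salary", "schools"},
                 {"requirements", "schools", "online"}],
    "paralegal": [{"cost", "agency"}, {"salary", "listing"}],
}
core, modifiers, scores = split_core_and_modifiers(unit_to_type_sets)
print(core, modifiers, scores)
```

Any overlap measure that grows as intent types become more common across the temporary queries would serve the same role.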
In one embodiment, the intent-similar query generation unit may include at least one of the following: a unit that obtains one or more queries, as the intent-similar queries, from an intent-similar query repository in which the input query is stored; a unit that obtains one or more sibling nodes of the input query in a domain ontology as the intent-similar queries; a unit that obtains neighboring concepts of the input query in a linguistic dictionary as the intent-similar queries; and a unit that obtains one or more queries from query logs, as the intent-similar queries, by calculating intent similarity based on click data associated with the input query.
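Of the sources listed above, the sibling lookup in a domain ontology is the simplest to illustrate. The sketch assumes the ontology is available as a mapping from a parent concept to its child concepts; the representation and the sample concepts are hypothetical.

```python
def sibling_nodes(ontology, node):
    """Return the other children of the node's parent, or [] if none is found."""
    parent = next((p for p, children in ontology.items() if node in children), None)
    if parent is None:
        return []
    return [child for child in ontology[parent] if child != node]


# Hypothetical fragment of a domain ontology: parent -> children.
ontology = {
    "legal professions": ["paralegal", "lawyer", "judge"],
    "medical professions": ["nurse", "physician"],
}
siblings = sibling_nodes(ontology, "paralegal")
print(siblings)
```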
In one embodiment, the intent-similar query generation unit may further include: a similarity calculation unit that calculates the similarity between each of the intent-similar queries and the input query; and an intent-similar query selection unit that selects, from the intent-similar queries, a specified number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold.
In one embodiment, the similarity between each of the intent-similar queries and the input query may be calculated by at least one of the following: the degree of consistency between the query and the input query; the lexical similarity between the query and the input query; the syntactic similarity between the query and the input query; the semantic similarity between the query and the input query; the contextual similarity of the query and the input query in a prepared corpus; the co-occurrence rate of the query and the input query in query logs; the distance between the query and the input query in a domain ontology; and the similarity of the click data of the query and the input query.
In one embodiment, the similarity between each of the intent-similar queries and the input query may be calculated using at least one item of real-world information, the real-world information including at least: time, location, user model and environment.
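The selection step can be sketched with a single similarity measure. The example below uses word-level Jaccard overlap as a stand-in for lexical similarity, which is an assumption; any of the listed measures, or a combination of them, could be substituted. The candidate queries are hypothetical.

```python
def lexical_similarity(a, b):
    """Word-level Jaccard overlap, an assumed proxy for lexical similarity."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def select_similar(input_query, candidates, top_n=None, threshold=None):
    """Keep the top_n most similar candidates and/or those above the threshold."""
    scored = sorted(((lexical_similarity(input_query, c), c) for c in candidates),
                    reverse=True)
    if threshold is not None:
        scored = [(s, c) for s, c in scored if s > threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return [c for _, c in scored]


candidates = ["becoming a lawyer", "becoming a nurse practitioner", "paralegal salary"]
by_count = select_similar("becoming a paralegal", candidates, top_n=2)
by_threshold = select_similar("becoming a paralegal", candidates, threshold=0.3)
print(by_count, by_threshold)
```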
In one embodiment, the similar intent information description may be presented as a regular expression of the input query.
In one embodiment, the similar intent information description set determining unit 7400 may include: a linguistic form analysis unit that analyzes the linguistic form of each intent in all intent groups of the intent-similar queries; a query-intent relation determining unit that determines at least one query-intent relation between the linguistic form of the corresponding intent-similar query within that linguistic form and the remaining linguistic forms; a regular expression transformation unit that transforms the linguistic form of each intent corresponding to the determined at least one query-intent relation into a regular expression; and a regular expression adding unit that adds the resulting regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determining unit 7400 may further include an intent group expansion unit that expands each intent group and includes: a synonymous phrase generation unit that, for each intent in the intent group, generates a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, wherein the at least one word does not appear in the corresponding intent-similar query; and a synonymous phrase adding unit that adds the generated synonymous phrase to the intent group.
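The intent group expansion can be sketched as a word-by-word substitution that skips any word belonging to the intent-similar query itself. The synonym table and the sample intent are assumptions made for the example.

```python
def expand_intent_group(similar_query, intents, synonyms):
    """Add synonymous phrases, replacing only words that are not in the query."""
    query_words = set(similar_query.lower().split())
    expanded = list(intents)
    for intent in intents:
        words = intent.split()
        for i, word in enumerate(words):
            if word.lower() in query_words:
                continue  # words of the intent-similar query stay untouched
            for synonym in synonyms.get(word.lower(), []):
                phrase = " ".join(words[:i] + [synonym] + words[i + 1:])
                if phrase not in expanded:
                    expanded.append(phrase)
    return expanded


# Hypothetical synonym table; "painting" is part of the query, so it is skipped.
synonyms = {"price": ["cost"], "painting": ["picture"]}
expanded = expand_intent_group("mona lisa painting",
                               ["mona lisa painting price"], synonyms)
print(expanded)
```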
In one embodiment, the similar intent information description set determining unit 7400 may further include: a first intent parsing unit that parses each intent in all intent groups of the intent-similar queries by lexical analysis, to detect whether the corresponding intent-similar query satisfies at least one lexical rule; a first wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one lexical rule, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent; a first regular expression generation unit that takes the transformed intent as a lexical-type similar intent information description in the form of words and a wildcard, and takes the lexical-type similar intent information description as a regular expression; and a first regular expression adding unit that adds the regular expression to the similar intent information description set.
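A minimal sketch of the lexical-type case follows. It assumes the only lexical rule is that the intent-similar query appears verbatim (ignoring case) inside the intent, and it uses "*" as the wildcard; the sample queries and intents are hypothetical.

```python
import re
from collections import Counter


def to_description(similar_query, intent, wildcard="*"):
    """Replace the similar query inside the intent with a wildcard, else None."""
    pattern = re.compile(re.escape(similar_query), re.IGNORECASE)
    if not pattern.search(intent):
        return None  # assumed lexical rule not satisfied
    return pattern.sub(wildcard, intent, count=1)


def build_description_set(intent_groups):
    """Count how often each wildcard description occurs over all intent groups."""
    counts = Counter()
    for similar_query, intents in intent_groups.items():
        for intent in intents:
            description = to_description(similar_query, intent)
            if description is not None:
                counts[description] += 1
    return counts


# Hypothetical intent groups for two intent-similar queries.
intent_groups = {
    "mona lisa painting": ["mona lisa painting history",
                           "where is mona lisa painting"],
    "lady with an ermine painting": ["lady with an ermine painting history",
                                     "ermine symbolism"],
}
counts = build_description_set(intent_groups)
print(counts)
```

Keeping the counts makes it easy to rank descriptions later, for example by frequency.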
In one embodiment, the similar intent information description set determining unit 7400 may further include: a second intent parsing unit that parses each intent in all intent groups of the intent-similar queries by syntactic analysis, to detect whether the corresponding intent-similar query satisfies at least one syntactic rule; a second wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one syntactic rule, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent; a second regular expression generation unit that takes the transformed intent as a syntactic-type similar intent information description in the form of a syntactic rule and a wildcard, and takes the syntactic-type similar intent information description as a regular expression; and a second regular expression adding unit that adds the regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determining unit 7400 may further include: a third intent parsing unit that parses each intent in all intent groups of the intent-similar queries by semantic relation analysis, to detect whether the corresponding intent-similar query satisfies at least one semantic relation; a third wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one semantic relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard and replaces the remaining linguistic forms of the intent with their semantic tags, to obtain a transformed intent; a third regular expression generation unit that takes the transformed intent as a semantic-type similar intent information description in the form of semantic tags and a wildcard, and takes the semantic-type similar intent information description as a regular expression; and a third regular expression adding unit that adds the regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determining unit 7400 may further include: a fourth intent parsing unit that parses each intent in all intent groups of the intent-similar queries by logical analysis, to detect whether the corresponding intent-similar query satisfies at least one logical relation; a fourth wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard and replaces the remaining linguistic forms of the intent with their logical types, to obtain a transformed intent; a fourth regular expression generation unit that takes the transformed intent as a logical-type similar intent information description in the form of logical types and a wildcard, and takes the logical-type similar intent information description as a regular expression; and a fourth regular expression adding unit that adds the regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determining unit may further include: a confidence calculation unit that calculates the confidence of each similar intent information description in the similar intent information description set; and a similar intent information description selection unit that selects, from the similar intent information description set, a specified number of similar intent information descriptions with the highest confidence, or the similar intent information descriptions whose confidence exceeds a predetermined threshold.
In one embodiment, the confidence may be calculated using at least one of the following: the frequency of the similar intent information description; the coverage of the similar intent information description; and the relevance of the similar intent information description to the input query.
In one embodiment, the confidence may be calculated from at least one of the following: the similar intent information description set; a prepared intent training set; and prepared domain information.
In one embodiment, the confidence calculation unit may further include: a first weight assigning unit that assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the popularity of the intent-similar queries; and/or a second weight assigning unit that assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the similarity between the intent-similar queries and the input query.
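One way to read the confidence and weighting steps is as a weighted coverage score. In the sketch below, the confidence of a description is the weighted share of intent-similar queries whose intent group produced it, and the optional weights stand for the popularity of a query or its similarity to the input query. The formula, names and data are assumptions; the text only lists the factors that may be used.

```python
def confidence(descriptions_by_query, weights=None):
    """Weighted share of intent-similar queries that yield each description.

    weights, if given, must contain an entry for every query.
    """
    if weights is None:
        weights = {query: 1.0 for query in descriptions_by_query}
    total = sum(weights[query] for query in descriptions_by_query)
    scores = {}
    for query, descriptions in descriptions_by_query.items():
        for description in set(descriptions):
            scores[description] = scores.get(description, 0.0) + weights[query]
    return {description: score / total for description, score in scores.items()}


def select_descriptions(conf, top_n=None, threshold=None):
    """Keep descriptions above the threshold and/or the top_n by confidence."""
    ranked = sorted(conf.items(), key=lambda item: (-item[1], item[0]))
    if threshold is not None:
        ranked = [item for item in ranked if item[1] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [description for description, _ in ranked]


# Hypothetical descriptions produced by three intent-similar queries.
descriptions_by_query = {
    "q1": ["* history", "* price"],
    "q2": ["* history"],
    "q3": ["* history", "where is *"],
}
plain = confidence(descriptions_by_query)
weighted = confidence(descriptions_by_query, {"q1": 2.0, "q2": 1.0, "q3": 1.0})
print(select_descriptions(plain, threshold=0.5))
print(select_descriptions(weighted, top_n=2))
```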
In one embodiment, the second intent mining unit 7500 may include: an input query replacement unit that generates a group of intents by replacing the wildcard in the similar intent information descriptions of the similar intent information description set with the input query.
In one embodiment, the second intent mining unit 7500 may include: a first-group intent mining unit that mines a first group of intents for the input query from at least one data source; and a second-group intent mining unit that mines a second group of intents for the input query by using the similar intent information description set and the first group of intents.
In one embodiment, the second-group intent mining unit may include: a unit that generates at least one intent by replacing, with the input query, the wildcard in at least one similar intent information description of the similar intent information description set, wherein the at least one intent is not in the first group of intents; and a unit that adds the generated at least one intent to the first group of intents.
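The wildcard replacement and the merge with the first group of intents can be sketched in a few lines. The "*" wildcard, the case-insensitive duplicate check and the sample descriptions are assumptions for the example.

```python
def generate_intents(input_query, descriptions, wildcard="*"):
    """Instantiate each description by substituting the input query for the wildcard."""
    return [description.replace(wildcard, input_query) for description in descriptions]


def second_group(input_query, first_group, descriptions):
    """Append generated intents that are not already in the first group."""
    seen = {intent.lower() for intent in first_group}
    merged = list(first_group)
    for intent in generate_intents(input_query, descriptions):
        if intent.lower() not in seen:
            merged.append(intent)
            seen.add(intent.lower())
    return merged


# Hypothetical first group and description set for the running example.
first_group = ["becoming a paralegal salary"]
descriptions = ["* salary", "* requirements", "how long does * take"]
merged = second_group("becoming a paralegal", first_group, descriptions)
print(merged)
```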
In one embodiment, the second-group intent mining unit may include: a ranking unit that ranks the first group of intents for the input query by using the similar intent information description set.
In one embodiment, the second-group intent mining unit may further include: a unique intent identification unit that identifies the unique intents in the first group of intents for the input query; and a weight changing unit that raises the weight of each unique intent in the ranking according to its degree of uniqueness; wherein the degree of uniqueness of a unique intent is calculated by at least one of the following: the co-occurrence rate of the input query and the unique intent in a prepared intent training set; the relation between the input query and the unique intent in domain knowledge; the frequency of the unique intent in click data; and the popularity of the unique intent in query logs.
Figure 19 shows a functional block diagram of an apparatus 8000 for information retrieval according to an embodiment of the present invention. All functional modules of the apparatus 8000 (that is, the various units included in the apparatus 8000, whether or not shown in the figure) may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the invention. Those skilled in the art will understand that the functional modules depicted in Figure 19 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
As shown in Figure 19, the apparatus 8000 for information retrieval includes: an input query receiving unit 8100, the above-described apparatus 7000 for intent mining, and a search result obtaining unit 8200. The input query receiving unit 8100 is configured to receive an input query expressed by a user in natural language. The apparatus 7000 for intent mining is configured to perform intent mining on the input query. The search result obtaining unit 8200 is configured to obtain search results for the mined intents.
Figure 20 shows a functional block diagram of an apparatus 9000 for question answering assistance according to an embodiment of the present invention. All functional modules of the apparatus 9000 (that is, the various units included in the apparatus 9000, whether or not shown in the figure) may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the invention. Those skilled in the art will understand that the functional modules depicted in Figure 20 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
As shown in Figure 20, the apparatus 9000 for question answering assistance includes: an input query receiving unit 9100, the above-described apparatus 7000 for intent mining, and an answer obtaining unit 9200. The input query receiving unit 9100 is configured to receive an input query expressed by a user in natural language. The apparatus 7000 for intent mining is configured to mine topics from the input query. The answer obtaining unit 9200 is configured to obtain answers for the mined topics.
The present invention can be implemented by the following schemes:
Scheme 1: A method for intent mining, the method comprising:
obtaining an input query;
generating intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query;
mining a group of intents for each intent-similar query, wherein each intent provides a subtopic of the corresponding intent-similar query;
determining a similar intent information description set by using all intent groups of the intent-similar queries; and
mining intents for the input query by using the similar intent information description set.
Scheme 2: The method of Scheme 1, wherein generating intent-similar queries for the input query comprises:
obtaining one or more query pair phrases from at least one data source, wherein each query pair phrase includes the input query, an intent-similar indicator and a third phrase; and
extracting the third phrase from each query pair phrase as an intent-similar query.
Scheme 3: The method of Scheme 2, wherein the intent-similar indicator includes at least one of the following:
a coordination indicator, wherein two phrases connected by the coordination indicator serve as the same grammatical element in a sentence;
a relative relation indicator, wherein a first phrase in a sentence stands in a relative relation to a second phrase that is connected after the first phrase by the relative relation indicator; and
a choice indicator, wherein two phrases connected by the choice indicator form an alternative expression in a sentence.
Scheme 4: The method of Scheme 1 or 2, wherein, when the input query is a multi-word query, generating intent-similar queries for the input query comprises:
identifying the core intent part and the modifier part of the input query; and
generating intent-similar phrases for the core intent part of the input query as the intent-similar queries.
Scheme 5: The method of Scheme 1 or 2, wherein, when the input query is a multi-word query, generating intent-similar queries for the input query comprises:
identifying the core intent part and the modifier part of the input query; and
generating the intent-similar queries by replacing the modifier part of the input query with a plurality of substitute parts, wherein each substitute part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type that is the same as or similar to that of the modifier part of the input query.
Scheme 6: The method of Scheme 4, wherein identifying the core intent part and the modifier part of the input query comprises:
parsing the input query to divide it into a plurality of semantic units;
for each divided semantic unit of the input query, generating temporary intent-similar queries each composed of the divided semantic unit and a changed part, wherein the changed part is an intent-similar phrase generated for the other semantic units of the input query;
for each divided semantic unit of the input query, mining a group of intents for each temporary intent-similar query, wherein each intent provides a subtopic of the corresponding temporary intent-similar query;
for each divided semantic unit of the input query, calculating a degree of consistency by comparing the intent groups of the temporary intent-similar queries of that semantic unit, wherein the degree of consistency is a measure of the intent similarity of the temporary intent-similar queries for that semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the degree of consistency; and
determining the semantic unit with the highest degree of consistency in the input query as the core intent part of the input query, and determining the other semantic units as the modifier part of the input query.
Scheme 7: The method of Scheme 5, wherein identifying the core intent part and the modifier part of the input query comprises:
parsing the input query to divide it into a plurality of semantic units;
for each divided semantic unit of the input query, generating temporary intent-similar queries each composed of the divided semantic unit and a changed part, wherein the changed part is an intent-similar phrase generated for the other semantic units of the input query;
for each divided semantic unit of the input query, mining a group of intents for each temporary intent-similar query, wherein each intent provides a subtopic of the corresponding temporary intent-similar query;
for each divided semantic unit of the input query, calculating a degree of consistency by comparing the intent groups of the temporary intent-similar queries of that semantic unit, wherein the degree of consistency is a measure of the intent similarity of the temporary intent-similar queries for that semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the degree of consistency; and
determining the semantic unit with the highest degree of consistency in the input query as the core intent part of the input query, and determining the other semantic units as the modifier part of the input query.
Scheme 8: The method of Scheme 1 or 2, wherein generating intent-similar queries for the input query comprises at least one of the following:
obtaining one or more queries, as the intent-similar queries, from an intent-similar query repository in which the input query is stored;
obtaining one or more sibling nodes of the input query in a domain ontology as the intent-similar queries;
obtaining neighboring concepts of the input query in a linguistic dictionary as the intent-similar queries; and
obtaining one or more queries from query logs, as the intent-similar queries, by calculating intent similarity based on click data associated with the input query.
Scheme 9: The method according to Scheme 1, wherein generating intent-similar queries for the input query further includes:
Calculating the similarity degree between each of the intent-similar queries and the input query; and
Selecting, from the intent-similar queries, a specific number of intent-similar queries with the highest similarity degree, or the intent-similar queries whose similarity degree is greater than a predetermined threshold.
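The selection step can be sketched as follows, assuming the similarity degrees have already been computed and are held in a dictionary. The function name, parameters, and sample scores are hypothetical.

```python
def select_similar_queries(scores, top_n=None, threshold=None):
    """Keep the top_n highest-scoring queries and/or those above threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(q, s) for q, s in ranked if s > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [q for q, _ in ranked]

# Hypothetical similarity degrees for the input query "Canon".
scores = {"Nikon": 0.9, "Sony": 0.7, "printer": 0.2}
by_count = select_similar_queries(scores, top_n=2)
by_threshold = select_similar_queries(scores, threshold=0.5)
```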
Scheme 10: The method according to Scheme 9, wherein the similarity degree between each of the intent-similar queries and the input query is calculated from at least one of the following:
The consistency degree between the query and the input query;
The lexical similarity between the query and the input query;
The syntactic similarity between the query and the input query;
The semantic similarity between the query and the input query;
The context similarity of the query and the input query in a prepared corpus;
The co-occurrence rate of the query and the input query in a query log;
The distance between the query and the input query in a domain ontology; and
The similarity of the click data of the query and the input query.
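Two of the listed measures can be sketched in a few lines. The formulas below are assumptions chosen for illustration (word-level Jaccard overlap for lexical similarity, and a session-level ratio for the co-occurrence rate); the text does not prescribe them, and the query log structure is hypothetical.

```python
def lexical_similarity(a, b):
    """Word-level Jaccard overlap between two queries (assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def cooccurrence_rate(a, b, query_log):
    """Share of sessions containing both queries among sessions containing
    either; query_log is a list of sessions, each a list of queries."""
    both = sum(1 for session in query_log if a in session and b in session)
    either = sum(1 for session in query_log if a in session or b in session)
    return both / either if either else 0.0

# Hypothetical query log with three sessions.
log = [["canon", "nikon"], ["canon"], ["sony"]]
lex = lexical_similarity("canon camera", "nikon camera")
co = cooccurrence_rate("canon", "nikon", log)
```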
Scheme 11: The method according to Scheme 9, wherein the similarity degree between each of the intent-similar queries and the input query is calculated from at least one item of real-world information, the real-world information including at least: time, location, user model, and environment.
Scheme 12: The method according to Scheme 1, wherein the similar intent information description is presented as a regular expression of the input query.
Scheme 13: The method according to Scheme 12, wherein determining the similar intent information description set includes:
Analyzing the linguistic form of each intent in all intent groups of the intent-similar queries;
Determining at least one query-intent relationship between the linguistic form of the corresponding intent-similar query and the remaining linguistic form within that linguistic form;
Transforming the linguistic form of each intent into a regular expression corresponding to the determined at least one query-intent relationship; and
Adding the regular expression obtained by the transformation to the similar intent information description set.
Scheme 14: The method according to Scheme 13, wherein determining the similar intent information description set further includes:
Expanding each intent group, including:
For each intent in the intent group, generating a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, wherein the at least one word is not in the corresponding intent-similar query; and
Adding the generated synonymous phrase to the intent group.
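The expansion of an intent group can be sketched as below, assuming a simple synonym dictionary and whitespace tokenization. Both are illustrative assumptions, as are the function name and sample data.

```python
def expand_intent_group(intents, similar_query, synonyms):
    """Add synonymous phrases, replacing only words that are not part of
    the intent-similar query itself."""
    query_words = set(similar_query.split())
    expanded = list(intents)
    for intent in intents:
        words = intent.split()
        for i, word in enumerate(words):
            if word in query_words:
                continue  # words of the query are left untouched
            for syn in synonyms.get(word, []):
                phrase = " ".join(words[:i] + [syn] + words[i + 1:])
                if phrase not in expanded:
                    expanded.append(phrase)
    return expanded

# Hypothetical synonym dictionary; the entry for "Nikon" is ignored
# because "Nikon" is the intent-similar query.
synonyms = {"price": ["cost"], "Nikon": ["Nikon Corp"]}
group = expand_intent_group(["Nikon price"], "Nikon", synonyms)
```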
Scheme 15: The method according to Scheme 13, wherein determining the similar intent information description set further includes:
Parsing each intent in all intent groups of the intent-similar queries by lexical analysis means, to detect whether the corresponding intent-similar query satisfies at least one lexical rule;
If the corresponding intent-similar query satisfies at least one lexical rule, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, to obtain a transformed intent;
Taking the transformed intent as a lexical-type similar intent information description having the form of words and a wildcard, and using the lexical-type description as the regular expression of the similar intent information; and
Adding the regular expression to the similar intent information description set.
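The wildcard replacement can be sketched as follows. The sketch stands in for the lexical rule with a plain substring check, which is a simplifying assumption; the function names and sample intent groups are hypothetical.

```python
def to_template(intent, similar_query, wildcard="*"):
    """Replace the query's surface form inside the intent with a wildcard.
    Returns None when the query does not occur in the intent."""
    if similar_query not in intent:
        return None
    return intent.replace(similar_query, wildcard)

def build_description_set(intent_groups):
    """intent_groups maps each intent-similar query to its mined intents."""
    descriptions = set()
    for query, intents in intent_groups.items():
        for intent in intents:
            template = to_template(intent, query)
            if template is not None:
                descriptions.add(template)
    return descriptions

# Hypothetical intent groups for two intent-similar queries.
intent_groups = {
    "Nikon": ["Nikon lens price", "buy Nikon"],
    "Sony": ["Sony lens price"],
}
description_set = build_description_set(intent_groups)
```

Both "Nikon lens price" and "Sony lens price" collapse to the same template, which is what allows the description set to generalize across intent-similar queries.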
Scheme 16: The method according to Scheme 13, wherein determining the similar intent information description set further includes:
Parsing each intent in all intent groups of the intent-similar queries by syntactic analysis means, to detect whether the corresponding intent-similar query satisfies at least one syntax rule;
If the corresponding intent-similar query satisfies at least one syntax rule, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, to obtain a transformed intent;
Taking the transformed intent as a syntactic-type similar intent information description having the form of a syntax rule and a wildcard, and using the syntactic-type description as the regular expression of the similar intent information; and
Adding the regular expression to the similar intent information description set.
Scheme 17: The method according to Scheme 15, wherein determining the similar intent information description set further includes:
Parsing each intent in all intent groups of the intent-similar queries by syntactic analysis means, to detect whether the corresponding intent-similar query satisfies at least one syntax rule;
If the corresponding intent-similar query satisfies at least one syntax rule, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, to obtain a transformed intent;
Taking the transformed intent as a syntactic-type similar intent information description having the form of a syntax rule and a wildcard, and using the syntactic-type description as the regular expression of the similar intent information; and
Adding the regular expression to the similar intent information description set.
Scheme 18: The method according to any one of Schemes 13 and 15-17, wherein determining the similar intent information description set further includes:
Parsing each intent in all intent groups of the intent-similar queries by semantic relation analysis means, to detect whether the corresponding intent-similar query satisfies at least one semantic relation;
If the corresponding intent-similar query satisfies at least one semantic relation, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replacing the remaining linguistic form with the semantic tags of the remaining linguistic form of the intent, to obtain a transformed intent;
Taking the transformed intent as a semantic-type similar intent information description having the form of semantic tags and a wildcard, and using the semantic-type description as the regular expression of the similar intent information; and
Adding the regular expression to the similar intent information description set.
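A semantic-type description can be sketched in the same spirit. The tag dictionary, the tag notation, and the choice to leave untagged words unchanged are all assumptions made for illustration; the text does not define the tagging scheme.

```python
def to_semantic_template(intent, similar_query, tags, wildcard="*"):
    """Replace the query with a wildcard and the remaining words with their
    semantic tags; words without a tag are kept as they are (an assumption)."""
    if similar_query not in intent:
        return None
    before, _, after = intent.partition(similar_query)

    def tag_words(text):
        return [tags.get(word, word) for word in text.split()]

    return " ".join(tag_words(before) + [wildcard] + tag_words(after))

# Hypothetical semantic tags.
tags = {"Tokyo": "<LOCATION>", "2013": "<YEAR>"}
template = to_semantic_template("Nikon store in Tokyo", "Nikon", tags)
```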
Scheme 19: The method according to any one of Schemes 13 and 15-17, wherein determining the similar intent information description set further includes:
Parsing each intent in all intent groups of the intent-similar queries by logic analysis means, to detect whether the corresponding intent-similar query satisfies at least one logical relation;
If the corresponding intent-similar query satisfies at least one logical relation, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replacing the remaining linguistic form with the logical type of the remaining linguistic form of the intent, to obtain a transformed intent;
Taking the transformed intent as a logical-type similar intent information description having the form of a logical type and a wildcard, and using the logical-type description as the regular expression of the similar intent information; and
Adding the regular expression to the similar intent information description set.
Scheme 20: The method according to Scheme 18, wherein determining the similar intent information description set further includes:
Parsing each intent in all intent groups of the intent-similar queries by logic analysis means, to detect whether the corresponding intent-similar query satisfies at least one logical relation;
If the corresponding intent-similar query satisfies at least one logical relation, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replacing the remaining linguistic form with the logical type of the remaining linguistic form of the intent, to obtain a transformed intent;
Taking the transformed intent as a logical-type similar intent information description having the form of a logical type and a wildcard, and using the logical-type description as the regular expression of the similar intent information; and
Adding the regular expression to the similar intent information description set.
Scheme 21: The method according to Scheme 13 or 14, wherein determining the similar intent information description set further includes:
Calculating the confidence of each similar intent information description in the similar intent information description set; and
Selecting, from the similar intent information description set, a specific number of similar intent information descriptions with the highest confidence, or the similar intent information descriptions whose confidence is greater than a predetermined threshold.
Scheme 22: The method according to Scheme 21, wherein the confidence is calculated using at least one of the following:
The frequency of the similar intent information description;
The coverage of the similar intent information description; and
The relevance of the similar intent information description to the input query.
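A confidence based on frequency and coverage can be sketched as below. The definitions used here are assumptions: frequency is the share of all mined intents that produce a description, coverage is the share of intent-similar queries whose intent group contains it, and the two are averaged with equal weight. Names and data are hypothetical.

```python
from collections import Counter

def confidence_scores(templates_by_query):
    """templates_by_query maps each intent-similar query to the list of
    descriptions (templates) derived from its intent group."""
    total = sum(len(ts) for ts in templates_by_query.values())
    n_queries = len(templates_by_query)
    freq = Counter(t for ts in templates_by_query.values() for t in ts)
    cover = Counter(t for ts in templates_by_query.values() for t in set(ts))
    # Equal weighting of frequency and coverage is an illustrative choice.
    return {t: 0.5 * freq[t] / total + 0.5 * cover[t] / n_queries for t in freq}

# Hypothetical descriptions for two intent-similar queries.
templates = {"Nikon": ["* price", "* lens"], "Sony": ["* price"]}
confidence = confidence_scores(templates)
```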
Scheme 23: The method according to Scheme 21, wherein the confidence is calculated from at least one of the following:
The similar intent information description set;
A prepared intent training set; and
Prepared domain information.
Scheme 24: The method according to Scheme 23, wherein calculating the confidence of a similar intent information description from the similar intent information description set further includes:
Assigning different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the popularity of the intent-similar queries; and/or
Assigning different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the similarity degree between the intent-similar queries and the input query.
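The weighting can be sketched by letting each intent-similar query contribute its weight, rather than a flat count, to every description it supports. The normalization and the default weight of 1.0 are assumptions, and the weights may stand for either popularity or similarity degree.

```python
def weighted_confidence(templates_by_query, query_weight):
    """Sum the weight of every intent-similar query that supports a
    description, normalized by the total weight of all queries."""
    scores = {}
    for query, templates in templates_by_query.items():
        weight = query_weight.get(query, 1.0)
        for template in set(templates):
            scores[template] = scores.get(template, 0.0) + weight
    norm = sum(query_weight.get(q, 1.0) for q in templates_by_query)
    return {t: s / norm for t, s in scores.items()} if norm else {}

# Hypothetical weights, e.g. similarity degrees to the input query.
templates = {"Nikon": ["* price"], "Sony": ["* price", "* tv"]}
weights = {"Nikon": 0.9, "Sony": 0.3}
weighted = weighted_confidence(templates, weights)
```

Here "* tv" is supported only by the low-weight query and therefore ends up with a much lower confidence than "* price".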
Scheme 25: The method according to Scheme 1, wherein mining intents for the input query includes:
Generating a group of intents by replacing the wildcards in the similar intent information descriptions of the similar intent information description set with the input query.
Scheme 26: The method according to Scheme 1, wherein mining intents for the input query includes:
Mining a first group of intents for the input query from at least one data source; and
Mining a second group of intents for the input query by using the similar intent information description set and the first group of intents.
Scheme 27: The method according to Scheme 26, wherein mining the second group of intents for the input query includes:
Generating at least one intent by replacing the wildcard in at least one similar intent information description of the similar intent information description set with the input query, wherein the at least one intent is not in the first group of intents; and
Adding the generated at least one intent to the first group of intents.
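Filling the wildcard and merging the result with the first group can be sketched as follows. The function name, the "*" wildcard symbol, and the sample data are hypothetical.

```python
def mine_second_group(input_query, descriptions, first_group, wildcard="*"):
    """Instantiate each description with the input query and append the
    intents that the first group does not already contain."""
    merged = list(first_group)
    for description in descriptions:
        intent = description.replace(wildcard, input_query)
        if intent not in merged:
            merged.append(intent)
    return merged

# Hypothetical first group mined directly for the input query "Canon".
intents = mine_second_group("Canon", ["* price", "buy *"], ["Canon price"])
```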
Scheme 28: The method according to Scheme 26, wherein mining the second group of intents for the input query includes:
Ranking the first group of intents for the input query by using the similar intent information description set.
Scheme 29: The method according to Scheme 28, wherein mining the second group of intents for the input query further includes:
Identifying unique intents in the first group of intents for the input query;
Raising the weight of each unique intent in the ranking according to the uniqueness degree of the unique intent;
Wherein the uniqueness degree of the unique intent is calculated from at least one of the following:
The co-occurrence rate of the input query and the unique intent in a prepared intent training set;
The relationship between the input query and the unique intent in domain knowledge;
The frequency of the unique intent in click data; and
The popularity of the unique intent in a query log.
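One way to combine the ranking with the boost for unique intents is sketched below. It assumes that an intent matching a description is scored by that description's confidence, while an intent matching none is treated as unique and scored by a precomputed uniqueness degree. This scoring rule, the names, and the numbers are illustrative assumptions.

```python
def rank_intents(intents, input_query, template_confidence, uniqueness, wildcard="*"):
    """Sort intents by description confidence; intents that match no
    description fall back to their uniqueness degree."""
    def score(intent):
        template = intent.replace(input_query, wildcard)
        if template in template_confidence:
            return template_confidence[template]
        return uniqueness.get(intent, 0.0)
    return sorted(intents, key=score, reverse=True)

# Hypothetical confidences and uniqueness degrees for the query "Canon".
ranked = rank_intents(
    ["Canon history", "Canon EOS firmware", "Canon price"],
    "Canon",
    {"* price": 0.8, "* history": 0.3},
    {"Canon EOS firmware": 0.6},
)
```

Without the uniqueness degree, "Canon EOS firmware" would score zero and sink to the bottom even though it is specific to the input query.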
Scheme 30: A method for information retrieval, including:
Receiving an input query expressed by a user in natural language;
Performing intent mining on the input query according to the method of any one of Schemes 1-29; and
Obtaining search results for the mined intents.
Scheme 31: A method for question-answering assistance, including:
Receiving an input query expressed by a user in natural language;
Mining topics from the input query according to the method of any one of Schemes 1-29; and
Obtaining answers for the mined topics.
Scheme 32: An apparatus for intent mining, the apparatus including:
An input query acquisition unit that obtains an input query;
An intent-similar query generation unit that generates intent-similar queries for the input query, wherein each intent-similar query has an intent type identical or similar to that of the input query;
A first intent mining unit that mines a group of intents for each intent-similar query, wherein each intent provides a subtopic for the corresponding intent-similar query;
A similar intent information description set determination unit that determines a similar intent information description set by using all intent groups of the intent-similar queries; and
A second intent mining unit that mines intents for the input query by using the similar intent information description set.
Scheme 33: The apparatus according to Scheme 32, wherein the intent-similar query generation unit includes:
A query-pair phrase acquisition unit that obtains one or more query-pair phrases from at least one data source, wherein each query-pair phrase includes: the input query, an intent-similar indicator, and a third phrase; and
A third phrase extraction unit that extracts the third phrase from each query-pair phrase as the intent-similar query.
Scheme 34: The apparatus according to Scheme 33, wherein the intent-similar indicator includes at least one of the following:
A coordination indicator, wherein two phrases connected by the coordination indicator serve as the same syntactic element in a sentence;
A relativity indicator, wherein the first phrase in a sentence is in a relative relationship with the second phrase that is connected after the first phrase by the relativity indicator; and
A choice indicator, wherein two phrases connected by the choice indicator form an expression of choice in a sentence.
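Extracting the third phrase from query-pair phrases can be sketched with a regular expression. The indicator words below ("and", "vs", "or") are hypothetical examples of the three indicator kinds, and limiting the third phrase to a single word is a simplifying assumption.

```python
import re

# Hypothetical indicator words; the text does not enumerate them.
INDICATORS = ["and", "vs", "or"]

def extract_third_phrases(input_query, sentences, indicators=INDICATORS):
    """Find '<input query> <indicator> <third phrase>' and return the
    third phrases in order of first appearance."""
    query = re.escape(input_query)
    alternatives = "|".join(re.escape(i) for i in indicators)
    pattern = re.compile(rf"\b{query}\s+(?:{alternatives})\s+(\w+)", re.IGNORECASE)
    found = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            if match.group(1) not in found:
                found.append(match.group(1))
    return found

sentences = ["Canon vs Nikon lenses", "Should I buy Canon or Sony?", "Canon printers"]
similar_queries = extract_third_phrases("Canon", sentences)
```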
Scheme 35: The apparatus according to Scheme 32 or 33, wherein, when the input query is a multi-word query, the intent-similar query generation unit includes:
A core intent part and modifier part identification unit that identifies the core intent part and the modifier part of the input query; and
An intent-similar phrase generation unit that generates intent-similar phrases for the core intent part of the input query as the intent-similar queries.
Scheme 36: The apparatus according to Scheme 32 or 33, wherein, when the input query is a multi-word query, the intent-similar query generation unit includes:
A core intent part and modifier part identification unit that identifies the core intent part and the modifier part of the input query; and
A modifier part replacement unit that generates the intent-similar queries by replacing the modifier part of the input query with a plurality of replacement parts, wherein each replacement part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type identical or similar to that of the modifier part of the input query.
Scheme 37: The apparatus according to Scheme 35, wherein the core intent part and modifier part identification unit includes:
An input query parsing unit that parses the input query to divide the input query into a plurality of semantic units;
A temporary intent-similar query generation unit that, for each divided semantic unit of the input query, generates temporary intent-similar queries each composed of the divided semantic unit and a changed part, wherein the changed part is an intent-similar phrase generated for the other semantic units of the input query;
A third intent mining unit that, for each divided semantic unit of the input query, mines a group of intents for each temporary intent-similar query, wherein each intent provides a subtopic for the corresponding temporary intent-similar query;
A consistency degree calculation unit that, for each divided semantic unit of the input query, calculates a consistency degree by comparing the intent groups of the temporary intent-similar queries of that semantic unit, wherein the consistency degree is a measure of the intent similarity among the temporary intent-similar queries of that semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the consistency degree; and
A core intent part determination unit that determines the semantic unit with the highest consistency degree in the input query as the core intent part of the input query, and determines the other semantic units as the modifier part of the input query.
Scheme 38: The apparatus according to Scheme 36, wherein the core intent part and modifier part identification unit includes:
An input query parsing unit that parses the input query to divide the input query into a plurality of semantic units;
A temporary intent-similar query generation unit that, for each divided semantic unit of the input query, generates temporary intent-similar queries each composed of the divided semantic unit and a changed part, wherein the changed part is an intent-similar phrase generated for the other semantic units of the input query;
A third intent mining unit that, for each divided semantic unit of the input query, mines a group of intents for each temporary intent-similar query, wherein each intent provides a subtopic for the corresponding temporary intent-similar query;
A consistency degree calculation unit that, for each divided semantic unit of the input query, calculates a consistency degree by comparing the intent groups of the temporary intent-similar queries of that semantic unit, wherein the consistency degree is a measure of the intent similarity among the temporary intent-similar queries of that semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the consistency degree; and
A core intent part determination unit that determines the semantic unit with the highest consistency degree in the input query as the core intent part of the input query, and determines the other semantic units as the modifier part of the input query.
Scheme 39: The apparatus according to Scheme 32 or 33, wherein the intent-similar query generation unit includes at least one of the following:
A unit that obtains one or more queries from an intent-similar query library stored for the input query as the intent-similar queries;
A unit that obtains one or more sibling nodes of the input query in a domain ontology as the intent-similar queries;
A unit that obtains neighboring concepts of the input query in a language dictionary as the intent-similar queries; and
A unit that obtains one or more queries from a query log as the intent-similar queries by calculating intent similarity based on click data associated with the input query.
Scheme 40: The apparatus according to Scheme 32, wherein the intent-similar query generation unit further includes:
A similarity degree calculation unit that calculates the similarity degree between each of the intent-similar queries and the input query; and
An intent-similar query selection unit that selects, from the intent-similar queries, a specific number of intent-similar queries with the highest similarity degree, or the intent-similar queries whose similarity degree is greater than a predetermined threshold.
Scheme 41: The apparatus according to Scheme 40, wherein the similarity degree between each of the intent-similar queries and the input query is calculated from at least one of the following:
The consistency degree between the query and the input query;
The lexical similarity between the query and the input query;
The syntactic similarity between the query and the input query;
The semantic similarity between the query and the input query;
The context similarity of the query and the input query in a prepared corpus;
The co-occurrence rate of the query and the input query in a query log;
The distance between the query and the input query in a domain ontology; and
The similarity of the click data of the query and the input query.
Scheme 42: The apparatus according to Scheme 40, wherein the similarity degree between each of the intent-similar queries and the input query is calculated from at least one item of real-world information, the real-world information including at least: time, location, user model, and environment.
Scheme 43: The apparatus according to Scheme 32, wherein the similar intent information description is presented as a regular expression of the input query.
Scheme 44: The apparatus according to Scheme 43, wherein the similar intent information description set determination unit includes:
A linguistic form analysis unit that analyzes the linguistic form of each intent in all intent groups of the intent-similar queries;
A query-intent relationship determination unit that determines at least one query-intent relationship between the linguistic form of the corresponding intent-similar query and the remaining linguistic form within that linguistic form;
A regular expression transformation unit that transforms the linguistic form of each intent into a regular expression corresponding to the determined at least one query-intent relationship; and
A regular expression adding unit that adds the regular expression obtained by the transformation to the similar intent information description set.
Scheme 45: The apparatus according to Scheme 44, wherein the similar intent information description set determination unit further includes:
An intent group expansion unit that expands each intent group, including:
A synonymous phrase generation unit that, for each intent in the intent group, generates a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, wherein the at least one word is not in the corresponding intent-similar query; and
A synonymous phrase adding unit that adds the generated synonymous phrase to the intent group.
Scheme 46: The apparatus according to Scheme 44, wherein the similar intent information description set determination unit further includes:
A first intent parsing unit that parses each intent in all intent groups of the intent-similar queries by lexical analysis means, to detect whether the corresponding intent-similar query satisfies at least one lexical rule;
A first wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one lexical rule, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, to obtain a transformed intent;
A first regular expression generation unit that takes the transformed intent as a lexical-type similar intent information description having the form of words and a wildcard, and uses the lexical-type description as the regular expression of the similar intent information; and
A first regular expression adding unit that adds the regular expression to the similar intent information description set.
Scheme 47: The apparatus according to Scheme 44, wherein the similar intent information description set determination unit further includes:
A second intent parsing unit that parses each intent in all intent groups of the intent-similar queries by syntactic analysis means, to detect whether the corresponding intent-similar query satisfies at least one syntax rule;
A second wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one syntax rule, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, to obtain a transformed intent;
A second regular expression generation unit that takes the transformed intent as a syntactic-type similar intent information description having the form of a syntax rule and a wildcard, and uses the syntactic-type description as the regular expression of the similar intent information; and
A second regular expression adding unit that adds the regular expression to the similar intent information description set.
Scheme 48: The apparatus according to Scheme 46, wherein the similar intent information description set determination unit further includes:
A second intent parsing unit that parses each intent in all intent groups of the intent-similar queries by syntactic analysis means, to detect whether the corresponding intent-similar query satisfies at least one syntax rule;
A second wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one syntax rule, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, to obtain a transformed intent;
A second regular expression generation unit that takes the transformed intent as a syntactic-type similar intent information description having the form of a syntax rule and a wildcard, and uses the syntactic-type description as the regular expression of the similar intent information; and
A second regular expression adding unit that adds the regular expression to the similar intent information description set.
Scheme 49: The apparatus according to any one of Schemes 44 and 46-48, wherein the similar intent information description set determination unit further includes:
A third intent parsing unit that parses each intent in all intent groups of the intent-similar queries by semantic relation analysis means, to detect whether the corresponding intent-similar query satisfies at least one semantic relation;
A third wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one semantic relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replaces the remaining linguistic form with the semantic tags of the remaining linguistic form of the intent, to obtain a transformed intent;
A third regular expression generation unit that takes the transformed intent as a semantic-type similar intent information description having the form of semantic tags and a wildcard, and uses the semantic-type description as the regular expression of the similar intent information; and
A third regular expression adding unit that adds the regular expression to the similar intent information description set.
Scheme 50: The apparatus according to any one of Schemes 44 and 46-48, wherein the similar intent information description set determination unit further includes:
A fourth intent parsing unit that parses each intent in all intent groups of the intent-similar queries by logic analysis means, to detect whether the corresponding intent-similar query satisfies at least one logical relation;
A fourth wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replaces the remaining linguistic form with the logical type of the remaining linguistic form of the intent, to obtain a transformed intent;
A fourth regular expression generation unit that takes the transformed intent as a logical-type similar intent information description having the form of a logical type and a wildcard, and uses the logical-type description as the regular expression of the similar intent information; and
A fourth regular expression adding unit that adds the regular expression to the similar intent information description set.
Scheme 51: The apparatus according to Scheme 49, wherein the similar intent information description set determination unit further includes:
A fourth intent parsing unit that parses each intent in all intent groups of the intent-similar queries by logic analysis means, to detect whether the corresponding intent-similar query satisfies at least one logical relation;
A fourth wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replaces the remaining linguistic form with the logical type of the remaining linguistic form of the intent, to obtain a transformed intent;
A fourth regular expression generation unit that takes the transformed intent as a logical-type similar intent information description having the form of a logical type and a wildcard, and uses the logical-type description as the regular expression of the similar intent information; and
A fourth regular expression adding unit that adds the regular expression to the similar intent information description set.
Scheme 52: The apparatus of scheme 44 or 45, wherein the similar intent information description set determining unit further comprises:
a confidence calculation unit configured to calculate the confidence of each similar intent information description in the similar intent information description set; and
a similar intent information description selection unit configured to select, from the similar intent information description set, a specific number of similar intent information descriptions with the highest confidence, or the similar intent information descriptions whose confidence exceeds a predetermined threshold.
Scheme 53: The apparatus of scheme 52, wherein the confidence is calculated using at least one of the following:
the frequency of the similar intent information description;
the coverage of the similar intent information description; and
the relevance of the similar intent information description to the input query.
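The confidence calculation and selection of Schemes 52 and 53 can be sketched as follows. This is only an illustration under stated assumptions: the text does not define how frequency and coverage are measured or combined, so the weighted sum, the weights `w_freq` and `w_cov`, the `*` wildcard notation, and the function names are all hypothetical.

```python
from collections import Counter


def score_descriptions(descriptions_by_query, w_freq=0.5, w_cov=0.5):
    """Score each description by an assumed weighted sum of frequency and coverage.

    descriptions_by_query maps each intent-similar query to the list of
    wildcard descriptions derived from its intent group.
    """
    freq = Counter(d for ds in descriptions_by_query.values() for d in ds)
    total = sum(freq.values())
    n_queries = len(descriptions_by_query)
    scores = {}
    for description, count in freq.items():
        # Coverage: share of intent-similar queries that yield this description.
        covered = sum(1 for ds in descriptions_by_query.values() if description in ds)
        scores[description] = w_freq * count / total + w_cov * covered / n_queries
    return scores


def select_descriptions(scores, top_n=None, threshold=None):
    """Keep the top_n highest-scoring descriptions, or those above threshold."""
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    if top_n is not None:
        return ranked[:top_n]
    if threshold is not None:
        return [d for d in ranked if scores[d] > threshold]
    return ranked
```

Relevance to the input query, the third factor listed in Scheme 53, is omitted here because the text gives no hint of how it would be measured.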
Scheme 54: The apparatus of scheme 52, wherein the confidence is calculated from at least one of the following:
the similar intent information description set;
a prepared intent training set; and
prepared domain information.
Scheme 55: The apparatus of scheme 54, wherein the confidence calculation unit further comprises:
a first weight assigning unit configured to assign different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the popularity of the intent-similar queries; and/or
a second weight assigning unit configured to assign different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the similarity between the intent-similar queries and the input query.
Scheme 56: The apparatus of scheme 32, wherein the second intent mining unit comprises:
an input query replacement unit configured to produce a group of intents by replacing, with the input query, the wildcards in the similar intent information descriptions of the similar intent information description set.
Scheme 57: The apparatus of scheme 32, wherein the second intent mining unit comprises:
a first-group intent mining unit configured to mine a first group of intents for the input query from at least one data source; and
a second-group intent mining unit configured to mine a second group of intents for the input query by using the similar intent information description set and the first group of intents.
Scheme 58: The apparatus of scheme 57, wherein the second-group intent mining unit comprises:
a unit configured to generate at least one intent by replacing, with the input query, the wildcard in at least one similar intent information description of the similar intent information description set, wherein the at least one intent is not in the first group of intents; and
a unit configured to add the generated at least one intent to the first group of intents.
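The wildcard substitution of Schemes 56 and 58 can be illustrated with a short sketch. It assumes that a description is a plain string with `*` marking the wildcard slot; the function name `expand_intents` and the case-insensitive duplicate check are hypothetical choices, not taken from the text.

```python
def expand_intents(input_query, descriptions, first_group, wildcard="*"):
    """Fill each description's wildcard with the input query and append
    the resulting intents that the first group does not already contain."""
    merged = list(first_group)
    seen = {intent.lower() for intent in first_group}
    for description in descriptions:
        candidate = description.replace(wildcard, input_query)
        if candidate.lower() not in seen:
            merged.append(candidate)
            seen.add(candidate.lower())
    return merged


# Example: one intent was mined directly, one more is generated from a description.
print(expand_intents("canon", ["* lens price", "reviews of *"], ["canon lens price"]))
```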
Scheme 59: The apparatus of scheme 57, wherein the second-group intent mining unit comprises:
a ranking unit configured to rank the first group of intents for the input query by using the similar intent information description set.
Scheme 60: The apparatus of scheme 59, wherein the second-group intent mining unit further comprises:
a peculiar intent identification unit configured to identify a peculiar intent in the first group of intents for the input query; and
a weight changing unit configured to raise the weight of the peculiar intent in the ranking according to the degree of peculiarity of the peculiar intent;
wherein the degree of peculiarity of the peculiar intent is calculated from at least one of the following:
the co-occurrence rate of the input query and the peculiar intent in a prepared intent training set;
the relation between the input query and the peculiar intent in domain knowledge;
the frequency of the peculiar intent in muster data; and
the popularity of the peculiar intent in a query log.
Scheme 61: An apparatus for information retrieval, comprising:
an input query receiving unit configured to receive an input query entered by a user in natural language;
the apparatus for intent mining according to any one of schemes 32-60, configured to perform intent mining on the input query; and
a search result obtaining unit configured to obtain search results for the mined intents.
Scheme 62: An apparatus for question-answering assistance, comprising:
an input query receiving unit configured to receive an input query entered by a user in natural language;
the apparatus for intent mining according to any one of schemes 32-60, configured to mine topics from the input query; and
an answer obtaining unit configured to obtain answers for the mined topics.
Those skilled in the art will appreciate that the various embodiments of the present invention may be combined arbitrarily without departing from the scope of the invention.
The method and system of the present invention may be implemented in many ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The order of the method steps given above is for illustration only, and the steps of the method of the present invention are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present invention may also be embodied as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present invention. Those skilled in the art should also understand that the above embodiments may be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims.

Claims (58)

1. A method for intent mining, the method comprising:
obtaining an input query;
generating intent-similar queries for the input query, wherein each intent-similar query has an intent type identical or similar to that of the input query, and wherein generating the intent-similar queries for the input query comprises:
obtaining one or more query-pair phrases from at least one data source, wherein each query-pair phrase comprises the input query, an intent-similar indicator, and a third phrase; and
extracting the third phrase from each query-pair phrase as the intent-similar query;
wherein the intent-similar indicator comprises at least one of the following: a coordination indicator, a relativity indicator, and a choice-relation indicator;
mining a group of intents for each intent-similar query, wherein each intent provides a sub-topic of the corresponding intent-similar query;
determining a similar intent information description set by using all the intent groups of the intent-similar queries; and
mining intents for the input query by using the similar intent information description set.
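As an illustration of the query-pair phrase step in claim 1, the sketch below extracts the third phrase from phrases of the form "input query, indicator, third phrase". The claim names only the three indicator categories, so the concrete indicator words (`and`, `vs`, `versus`, `or`), the fixed word order, and the function name are assumptions made for this example.

```python
import re

# Hypothetical indicator words for each category named in the claim.
INDICATORS = {
    "coordination": ["and"],
    "relativity": ["vs", "versus"],
    "choice": ["or"],
}


def extract_intent_similar_queries(input_query, phrases):
    """Return the third phrase of every 'input_query <indicator> <third phrase>' match."""
    words = [w for group in INDICATORS.values() for w in group]
    # Longest alternatives first so that 'versus' is tried before 'vs'.
    alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    pattern = re.compile(
        rf"^{re.escape(input_query)}\s+(?:{alternation})\s+(.+)$", re.IGNORECASE
    )
    results = []
    for phrase in phrases:
        match = pattern.match(phrase.strip())
        if match:
            candidate = match.group(1).strip().lower()
            if candidate and candidate not in results:
                results.append(candidate)
    return results
```

A fuller version would also accept phrases in which the third phrase precedes the indicator; this sketch handles one order only to stay short.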
2. The method of claim 1,
wherein two phrases connected by the coordination indicator serve as the same syntactic element in a sentence;
wherein a first phrase in a sentence and a second phrase connected after the first phrase by the relativity indicator are in a relativity relation; and
wherein two phrases connected by the choice-relation indicator form a selective expression in a sentence.
3. The method of claim 1, wherein, when the input query is a multi-word query, generating the intent-similar queries for the input query comprises:
identifying a core intent part and a modifier part of the input query; and
generating intent-similar phrases for the core intent part of the input query as the intent-similar queries.
4. The method of claim 1, wherein, when the input query is a multi-word query, generating the intent-similar queries for the input query comprises:
identifying a core intent part and a modifier part of the input query; and
generating the intent-similar queries by replacing the modifier part of the input query with a plurality of replacement parts, wherein each replacement part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type identical or similar to that of the modifier part of the input query.
5. The method of claim 3 or 4, wherein identifying the core intent part and the modifier part of the input query comprises:
parsing the input query to divide the input query into a plurality of semantic units;
for each divided semantic unit of the input query, generating temporary intent-similar queries each consisting of the divided semantic unit and a variable part, wherein the variable part is an intent-similar phrase generated for the other semantic units of the input query;
for each divided semantic unit of the input query, mining a group of intents for each temporary intent-similar query, wherein each intent provides a sub-topic of the corresponding temporary intent-similar query;
for each divided semantic unit of the input query, calculating a consistency degree by comparing the intent groups of the temporary intent-similar queries of the corresponding semantic unit, wherein the consistency degree is a measure of the intent similarity of the temporary intent-similar queries for the corresponding semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries are, the higher the consistency degree is; and
determining the semantic unit with the highest consistency degree in the input query as the core intent part of the input query, and determining the other semantic units as the modifier part of the input query.
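The selection of the core intent part in claim 5 can be sketched as follows. The claim states only that the consistency degree rises as the intent types of the temporary intent-similar queries become more common to one another; the mean pairwise Jaccard overlap used here is one assumed way to realize that, and the function names and sample intent types are hypothetical.

```python
from itertools import combinations


def consistency_degree(intent_groups):
    """Mean pairwise Jaccard overlap of intent-type sets (an assumed measure)."""
    pairs = list(combinations(intent_groups, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        total += len(a & b) / len(union) if union else 0.0
    return total / len(pairs)


def split_core_and_modifier(groups_by_unit):
    """Pick the semantic unit whose temporary queries agree most as the core intent part.

    groups_by_unit maps each semantic unit to the intent-type sets mined for
    the temporary intent-similar queries that keep that unit fixed.
    """
    core = max(groups_by_unit, key=lambda unit: consistency_degree(groups_by_unit[unit]))
    modifiers = [unit for unit in groups_by_unit if unit != core]
    return core, modifiers


# Example for the query "cheap camera": keeping "camera" fixed yields
# temporary queries whose intents overlap, keeping "cheap" fixed does not.
groups = {
    "camera": [{"price", "review", "lens"}, {"price", "review", "battery"}],
    "cheap": [{"ink", "price"}, {"plan", "screen"}],
}
print(split_core_and_modifier(groups))
```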
6. The method of claim 1, wherein generating the intent-similar queries for the input query comprises at least one of the following:
obtaining one or more queries from an intent-similar query library of the input query as the intent-similar queries;
obtaining one or more sibling nodes of the input query in a domain ontology as the intent-similar queries;
obtaining neighboring concepts of the input query in a language dictionary as the intent-similar queries; and
obtaining one or more queries from a query log as the intent-similar queries by calculating intent similarity based on muster data associated with the input query.
7. The method of claim 1, wherein generating the intent-similar queries for the input query further comprises:
calculating the similarity between each of the intent-similar queries and the input query; and
selecting, from the intent-similar queries, a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold.
8. The method of claim 7, wherein the similarity between each of the intent-similar queries and the input query is calculated from at least one of the following:
the degree of consistency between the query and the input query;
the lexical similarity between the query and the input query;
the syntactic similarity between the query and the input query;
the semantic similarity between the query and the input query;
the context similarity between the query and the input query in a prepared corpus;
the co-occurrence rate of the query and the input query in a query log;
the distance between the query and the input query in a domain ontology; and
the similarity between the muster data of the query and that of the input query.
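The selection step of claim 7, using one of the measures listed in claim 8, might look like the sketch below. Lexical similarity is approximated here by token-level Jaccard overlap, which is an assumption; the claims do not define any of the measures, and the function names are hypothetical.

```python
def lexical_similarity(query_a, query_b):
    """Token-level Jaccard overlap, used as an assumed stand-in for lexical similarity."""
    tokens_a = set(query_a.lower().split())
    tokens_b = set(query_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def select_similar_queries(input_query, candidates, top_n=None, threshold=None):
    """Keep the top_n most similar candidates, or those whose similarity exceeds threshold."""
    scored = sorted(
        ((lexical_similarity(input_query, c), c) for c in candidates),
        key=lambda pair: (-pair[0], pair[1]),
    )
    if top_n is not None:
        scored = scored[:top_n]
    elif threshold is not None:
        scored = [pair for pair in scored if pair[0] > threshold]
    return [candidate for _, candidate in scored]
```

Several measures could be combined, for example as a weighted sum, but the claims leave the combination open.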
9. The method of claim 7, wherein the similarity between each of the intent-similar queries and the input query is calculated from at least one item of real-world information, the real-world information comprising at least: time, location, user model, and environment.
10. The method of claim 1, wherein the similar intent information description is represented by a regular expression of the input query.
11. The method of claim 10, wherein determining the similar intent information description set comprises:
analyzing the linguistic form of each intent in all the intent groups of the intent-similar queries;
determining at least one query-intent relation between the linguistic form of the corresponding intent-similar query within that linguistic form and the remaining linguistic form;
transforming the linguistic form of each intent into a regular expression corresponding to the determined at least one query-intent relation; and
adding the regular expression obtained by the transformation to the similar intent information description set.
12. The method of claim 11, wherein determining the similar intent information description set further comprises:
expanding each intent group, including:
for each intent in the intent group, generating a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, wherein the at least one word is not in the corresponding intent-similar query; and
adding the generated synonymous phrase to the intent group.
13. The method of claim 11, wherein determining the similar intent information description set further comprises:
parsing, by lexical analysis, each intent in all the intent groups of the intent-similar queries, so as to detect whether the corresponding intent-similar query satisfies at least one lexical rule;
if the corresponding intent-similar query satisfies at least one lexical rule, for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and obtaining a transformed intent;
taking the transformed intent as a lexical-type similar intent information description in the form of words and a wildcard, and using the lexical-type similar intent information description as a regular expression; and
adding the regular expression to the similar intent information description set.
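The lexical-type transformation of claim 13 can be sketched as follows, under the assumption that the "lexical rule" is simply that the intent-similar query appears verbatim as a whole word sequence inside the intent. The `*` wildcard notation and all function names are hypothetical.

```python
import re

WILDCARD = "*"


def to_description(intent, similar_query):
    """Replace the intent-similar query inside an intent with a wildcard.

    Returns None when the query does not occur in the intent as whole words.
    """
    pattern = re.compile(rf"\b{re.escape(similar_query)}\b", re.IGNORECASE)
    if not pattern.search(intent):
        return None
    return pattern.sub(WILDCARD, intent)


def description_to_regex(description):
    """Compile a wildcard description into a regex that captures the wildcard slot."""
    parts = [re.escape(part) for part in description.split(WILDCARD)]
    return re.compile("^" + "(.+)".join(parts) + "$", re.IGNORECASE)


def build_description_set(intent_groups):
    """intent_groups maps each intent-similar query to its mined intents."""
    descriptions = []
    for similar_query, intents in intent_groups.items():
        for intent in intents:
            description = to_description(intent, similar_query)
            if description is not None and description not in descriptions:
                descriptions.append(description)
    return descriptions
```

Keeping the descriptions as wildcard strings and compiling them on demand makes it easy both to match new intents against them and to fill the wildcard with another query.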
14. The method of claim 11, wherein determining the similar intent information description set further comprises:
parsing, by syntactic analysis, each intent in all the intent groups of the intent-similar queries, so as to detect whether the corresponding intent-similar query satisfies at least one syntactic rule;
if the corresponding intent-similar query satisfies at least one syntactic rule, for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and obtaining a transformed intent;
taking the transformed intent as a syntactic-type similar intent information description in the form of syntactic rules and a wildcard, and using the syntactic-type similar intent information description as a regular expression; and
adding the regular expression to the similar intent information description set.
15. The method of claim 13, wherein determining the similar intent information description set further comprises:
parsing, by syntactic analysis, each intent in all the intent groups of the intent-similar queries, so as to detect whether the corresponding intent-similar query satisfies at least one syntactic rule;
if the corresponding intent-similar query satisfies at least one syntactic rule, for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and obtaining a transformed intent;
taking the transformed intent as a syntactic-type similar intent information description in the form of syntactic rules and a wildcard, and using the syntactic-type similar intent information description as a regular expression; and
adding the regular expression to the similar intent information description set.
16. The method of any one of claims 11 and 13-15, wherein determining the similar intent information description set further comprises:
parsing, by semantic relation analysis, each intent in all the intent groups of the intent-similar queries, so as to detect whether the corresponding intent-similar query satisfies at least one semantic relation;
if the corresponding intent-similar query satisfies at least one semantic relation, for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, replacing the remaining linguistic form of the intent with its semantic tags, and obtaining a transformed intent;
taking the transformed intent as a semantic-type similar intent information description in the form of semantic tags and a wildcard, and using the semantic-type similar intent information description as a regular expression; and
adding the regular expression to the similar intent information description set.
17. The method of any one of claims 11 and 13-15, wherein determining the similar intent information description set further comprises:
parsing, by logical analysis, each intent in all the intent groups of the intent-similar queries, so as to detect whether the corresponding intent-similar query satisfies at least one logical relation;
if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, replacing the remaining linguistic form of the intent with its logical type, and obtaining a transformed intent;
taking the transformed intent as a logical-type similar intent information description in the form of logical types and a wildcard, and using the logical-type similar intent information description as a regular expression; and
adding the regular expression to the similar intent information description set.
18. The method of claim 16, wherein determining the similar intent information description set further comprises:
parsing, by logical analysis, each intent in all the intent groups of the intent-similar queries, so as to detect whether the corresponding intent-similar query satisfies at least one logical relation;
if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, replacing the remaining linguistic form of the intent with its logical type, and obtaining a transformed intent;
taking the transformed intent as a logical-type similar intent information description in the form of logical types and a wildcard, and using the logical-type similar intent information description as a regular expression; and
adding the regular expression to the similar intent information description set.
19. The method of claim 11 or 12, wherein determining the similar intent information description set further comprises:
calculating the confidence of each similar intent information description in the similar intent information description set; and
selecting, from the similar intent information description set, a specific number of similar intent information descriptions with the highest confidence, or the similar intent information descriptions whose confidence exceeds a predetermined threshold.
20. The method of claim 19, wherein the confidence is calculated using at least one of the following:
the frequency of the similar intent information description;
the coverage of the similar intent information description; and
the relevance of the similar intent information description to the input query.
21. The method of claim 19, wherein the confidence is calculated from at least one of the following:
the similar intent information description set;
a prepared intent training set; and
prepared domain information.
22. The method of claim 21, wherein calculating the confidence of a similar intent information description from the similar intent information description set further comprises:
assigning different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the popularity of the intent-similar queries; and/or
assigning different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the similarity between the intent-similar queries and the input query.
23. The method of claim 1, wherein mining the intents for the input query comprises:
producing a group of intents by replacing, with the input query, the wildcards in the similar intent information descriptions of the similar intent information description set.
24. The method of claim 1, wherein mining the intents for the input query comprises:
mining a first group of intents for the input query from at least one data source; and
mining a second group of intents for the input query by using the similar intent information description set and the first group of intents.
25. The method of claim 24, wherein mining the second group of intents for the input query comprises:
generating at least one intent by replacing, with the input query, the wildcard in at least one similar intent information description of the similar intent information description set, wherein the at least one intent is not in the first group of intents; and
adding the generated at least one intent to the first group of intents.
26. The method of claim 24, wherein mining the second group of intents for the input query comprises:
ranking the first group of intents for the input query by using the similar intent information description set.
27. The method of claim 26, wherein mining the second group of intents for the input query further comprises:
identifying a peculiar intent in the first group of intents for the input query; and
raising the weight of the peculiar intent in the ranking according to the degree of peculiarity of the peculiar intent;
wherein the degree of peculiarity of the peculiar intent is calculated from at least one of the following:
the co-occurrence rate of the input query and the peculiar intent in a prepared intent training set;
the relation between the input query and the peculiar intent in domain knowledge;
the frequency of the peculiar intent in muster data; and
the popularity of the peculiar intent in a query log.
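The ranking of claims 26 and 27 can be illustrated with the following sketch. Several points are assumptions rather than statements of the claims: an intent is treated as peculiar when it matches no similar intent information description, its degree of peculiarity is supplied externally (for example from query-log popularity), and the `boost` factor and function name are hypothetical.

```python
def rank_intents(input_query, first_group, description_conf, peculiarity,
                 boost=1.0, wildcard="*"):
    """Rank mined intents; intents matching no description are treated as peculiar.

    description_conf maps wildcard descriptions to confidence values.
    peculiarity maps peculiar intents to an externally computed degree.
    """
    # Fill each description's wildcard with the input query to get expected intents.
    expected = {d.replace(wildcard, input_query): c for d, c in description_conf.items()}
    scored = []
    for intent in first_group:
        if intent in expected:
            score = expected[intent]
        else:
            # Peculiar intent: its weight is raised in proportion to its peculiarity.
            score = boost * peculiarity.get(intent, 0.0)
        scored.append((score, intent))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [intent for _, intent in scored]
```

With this scheme, an intent that is specific to the input query can outrank intents shared with the intent-similar queries when its degree of peculiarity is high enough.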
28. A method for information retrieval, comprising:
receiving an input query entered by a user in natural language;
performing intent mining on the input query according to the method of any one of claims 1-27; and
obtaining search results for the mined intents.
29. A method for question-answering assistance, comprising:
receiving an input query entered by a user in natural language;
mining topics from the input query according to the method of any one of claims 1-27; and
obtaining answers for the mined topics.
30. An apparatus for intent mining, the apparatus comprising:
an input query obtaining unit configured to obtain an input query;
an intent-similar query generation unit configured to generate intent-similar queries for the input query, wherein each intent-similar query has an intent type identical or similar to that of the input query, and wherein the intent-similar query generation unit comprises:
a query-pair phrase obtaining unit configured to obtain one or more query-pair phrases from at least one data source, wherein each query-pair phrase comprises the input query, an intent-similar indicator, and a third phrase; and
a third phrase extraction unit configured to extract the third phrase from each query-pair phrase as the intent-similar query;
wherein the intent-similar indicator comprises at least one of the following: a coordination indicator, a relativity indicator, and a choice-relation indicator;
a first intent mining unit configured to mine a group of intents for each intent-similar query, wherein each intent provides a sub-topic of the corresponding intent-similar query;
a similar intent information description set determining unit configured to determine a similar intent information description set by using all the intent groups of the intent-similar queries; and
a second intent mining unit configured to mine intents for the input query by using the similar intent information description set.
31. The apparatus of claim 30,
wherein two phrases connected by the coordination indicator serve as the same syntactic element in a sentence;
wherein a first phrase in a sentence and a second phrase connected after the first phrase by the relativity indicator are in a relativity relation; and
wherein two phrases connected by the choice-relation indicator form a selective expression in a sentence.
32. The apparatus of claim 30, wherein, when the input query is a multi-word query, the intent-similar query generation unit comprises:
a core intent part and modifier part identification unit configured to identify a core intent part and a modifier part of the input query; and
an intent-similar phrase generation unit configured to generate intent-similar phrases for the core intent part of the input query as the intent-similar queries.
33. The apparatus of claim 30, wherein, when the input query is a multi-word query, the intent-similar query generation unit comprises:
a core intent part and modifier part identification unit configured to identify a core intent part and a modifier part of the input query; and
a modifier part replacement unit configured to generate the intent-similar queries by replacing the modifier part of the input query with a plurality of replacement parts, wherein each replacement part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type identical or similar to that of the modifier part of the input query.
34. The apparatus of claim 32 or 33, wherein the core intent part and modifier part identification unit comprises:
an input query parsing unit configured to parse the input query to divide the input query into a plurality of semantic units;
a temporary intent-similar query generation unit configured to generate, for each divided semantic unit of the input query, temporary intent-similar queries each consisting of the divided semantic unit and a variable part, wherein the variable part is an intent-similar phrase generated for the other semantic units of the input query;
a third intent mining unit configured to mine, for each divided semantic unit of the input query, a group of intents for each temporary intent-similar query, wherein each intent provides a sub-topic of the corresponding temporary intent-similar query;
a consistency degree calculation unit configured to calculate, for each divided semantic unit of the input query, a consistency degree by comparing the intent groups of the temporary intent-similar queries of the corresponding semantic unit, wherein the consistency degree is a measure of the intent similarity of the temporary intent-similar queries for the corresponding semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries are, the higher the consistency degree is; and
a core intent part determining unit configured to determine the semantic unit with the highest consistency degree in the input query as the core intent part of the input query, and to determine the other semantic units as the modifier part of the input query.
35. The apparatus of claim 30, wherein the intent-similar query generation unit comprises at least one of the following:
a unit configured to obtain one or more queries from an intent-similar query library of the input query as the intent-similar queries;
a unit configured to obtain one or more sibling nodes of the input query in a domain ontology as the intent-similar queries;
a unit configured to obtain neighboring concepts of the input query in a language dictionary as the intent-similar queries; and
a unit configured to obtain one or more queries from a query log as the intent-similar queries by calculating intent similarity based on muster data associated with the input query.
36. The apparatus as claimed in claim 30, wherein the intent-similar query generation unit further comprises:
A similarity calculation unit, configured to calculate the similarity between each of the intent-similar queries and the input query; and
An intent-similar query selection unit, configured to select, from the intent-similar queries, a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity is greater than a predetermined threshold.
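The two selection modes of claim 36 (a fixed number of best candidates, or all candidates above a threshold) can be sketched as below. The function name, parameters, and similarity values are hypothetical and only illustrate the selection step.

```python
def select_intent_similar_queries(scored, top_n=None, threshold=None):
    """scored: list of (query, similarity) pairs.

    Keeps the top_n most similar queries, or those whose similarity
    exceeds threshold; with neither set, returns all queries ranked.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if top_n is not None:
        return [query for query, _ in ranked[:top_n]]
    if threshold is not None:
        return [query for query, score in ranked if score > threshold]
    return [query for query, _ in ranked]

# Hypothetical candidates for the input query "Canon camera".
candidates = [("Nikon camera", 0.82), ("Canon printer", 0.41), ("Sony camera", 0.77)]
```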
37. The apparatus as claimed in claim 36, wherein the similarity between each of the intent-similar queries and the input query is calculated from at least one of the following:
The consistency degree between the query and the input query;
The lexical similarity between the query and the input query;
The syntactic similarity between the query and the input query;
The semantic similarity between the query and the input query;
The context similarity between the query and the input query in a prepared corpus;
The co-occurrence rate of the query and the input query in a query log;
The distance between the query and the input query in a domain ontology; and
The similarity between the muster data of the query and that of the input query.
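Two of the listed measures can be sketched in a few lines. The claim names the measures without defining them, so both definitions here are assumptions: lexical similarity is taken as the Jaccard overlap of word sets, and the co-occurrence rate as the share of log sessions containing the input query that also contain the candidate. The session data is hypothetical.

```python
def lexical_similarity(query, input_query):
    """Jaccard overlap of the lower-cased word sets (assumed definition)."""
    a, b = set(query.lower().split()), set(input_query.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def cooccurrence_rate(query, input_query, sessions):
    """Fraction of sessions containing input_query that also contain query."""
    with_input = [session for session in sessions if input_query in session]
    if not with_input:
        return 0.0
    return sum(1 for session in with_input if query in session) / len(with_input)

# Hypothetical query log, one set of queries per user session.
sessions = [
    {"canon camera", "nikon camera"},
    {"canon camera", "canon printer"},
    {"nikon camera", "sony camera"},
]
```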
38. The apparatus as claimed in claim 36, wherein the similarity between each of the intent-similar queries and the input query is calculated from at least one item of real-world information, the real-world information comprising at least: time, location, user model, and environment.
39. The apparatus as claimed in claim 30, wherein the similar intent information description is represented by a regular expression of the input query.
40. The apparatus as claimed in claim 39, wherein the similar intent information description set determining unit includes:
A linguistic form analysis unit, configured to analyze the linguistic form of each intent in all intent groups of the intent-similar queries;
A query-intent relation determining unit, configured to determine at least one query-intent relation between the linguistic form of the corresponding intent-similar query within the linguistic form of the intent and the remaining linguistic forms;
A regular expression transformation unit, configured to transform the linguistic form of each intent corresponding to the determined at least one query-intent relation into a regular expression; and
A regular expression adding unit, configured to add the regular expression obtained by the transformation to the similar intent information description set.
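A minimal sketch of the transformation in claim 40, assuming the simplest query-intent relation: the intent-similar query appears verbatim inside the intent. The `*` wildcard notation, the helper names, and the sample intents are assumptions made for illustration; the linguistic analysis the claim calls for is not modeled.

```python
import re

WILDCARD = "*"

def to_description(intent, similar_query):
    """Replace the intent-similar query inside an intent with a wildcard."""
    if similar_query not in intent:
        return None
    return intent.replace(similar_query, WILDCARD, 1)

def to_regex(description):
    """Turn a wildcard description such as '* price' into an anchored regex."""
    literal_parts = [re.escape(part) for part in description.split(WILDCARD)]
    return re.compile("^" + "(.+)".join(literal_parts) + "$")

def build_description_set(intent_groups):
    """intent_groups: {intent-similar query: [mined intents]} -> set of descriptions."""
    descriptions = set()
    for similar_query, intents in intent_groups.items():
        for intent in intents:
            description = to_description(intent, similar_query)
            if description is not None:
                descriptions.add(description)
    return descriptions

# Hypothetical intent groups mined for two intent-similar queries.
intent_groups = {
    "Nikon camera": ["Nikon camera price", "how to clean Nikon camera"],
    "Sony camera": ["Sony camera price"],
}
```

Descriptions produced from different intent-similar queries collapse into one entry when they share a form, which is what lets the set generalize to a new input query.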
41. The apparatus as claimed in claim 40, wherein the similar intent information description set determining unit further comprises:
An intent group expansion unit, configured to expand each intent group, including:
A synonymous phrase generation unit, configured to generate, for each intent in the intent group, a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, wherein the at least one word does not appear in the corresponding intent-similar query; and
A synonymous phrase adding unit, configured to add the generated synonymous phrase to the intent group.
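The expansion step of claim 41 can be sketched as follows. The synonym table is hypothetical (a real system would draw on a thesaurus or similar resource), and single-word replacement is a simplifying assumption.

```python
def expand_intent_group(intents, similar_query, synonyms):
    """Add synonymous phrases to an intent group.

    Words that belong to the intent-similar query itself are never replaced.
    """
    query_words = set(similar_query.lower().split())
    expanded = list(intents)
    for intent in intents:
        words = intent.split()
        for i, word in enumerate(words):
            if word.lower() in query_words:
                continue
            for alternative in synonyms.get(word.lower(), []):
                phrase = " ".join(words[:i] + [alternative] + words[i + 1:])
                if phrase not in expanded:
                    expanded.append(phrase)
    return expanded

# Hypothetical synonym table.
synonyms = {"price": ["cost"], "camera": ["camcorder"]}
```

Here "camera" has a synonym in the table but is left untouched because it is part of the intent-similar query; only "price" is varied.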
42. The apparatus as claimed in claim 40, wherein the similar intent information description set determining unit further comprises:
A first intent parsing unit, configured to parse each intent in all intent groups of the intent-similar queries by means of lexical analysis, so as to detect whether the corresponding intent-similar query conforms to at least one lexical rule;
A first wildcard replacement unit, configured to, if the corresponding intent-similar query conforms to at least one lexical rule, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, thereby obtaining a transformed intent;
A first regular expression generation unit, configured to take the transformed intent as a lexical-type similar intent information description in the form of words and a wildcard, and to describe the lexical-type similar intent information as a regular expression; and
A first regular expression adding unit, configured to add the regular expression to the similar intent information description set.
43. The apparatus as claimed in claim 40, wherein the similar intent information description set determining unit further comprises:
A second intent parsing unit, configured to parse each intent in all intent groups of the intent-similar queries by means of syntactic analysis, so as to detect whether the corresponding intent-similar query conforms to at least one syntax rule;
A second wildcard replacement unit, configured to, if the corresponding intent-similar query conforms to at least one syntax rule, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, thereby obtaining a transformed intent;
A second regular expression generation unit, configured to take the transformed intent as a syntactic-type similar intent information description in the form of syntax rules and a wildcard, and to describe the syntactic-type similar intent information as a regular expression; and
A second regular expression adding unit, configured to add the regular expression to the similar intent information description set.
44. The apparatus as claimed in claim 42, wherein the similar intent information description set determining unit further comprises:
A second intent parsing unit, configured to parse each intent in all intent groups of the intent-similar queries by means of syntactic analysis, so as to detect whether the corresponding intent-similar query conforms to at least one syntax rule;
A second wildcard replacement unit, configured to, if the corresponding intent-similar query conforms to at least one syntax rule, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, thereby obtaining a transformed intent;
A second regular expression generation unit, configured to take the transformed intent as a syntactic-type similar intent information description in the form of syntax rules and a wildcard, and to describe the syntactic-type similar intent information as a regular expression; and
A second regular expression adding unit, configured to add the regular expression to the similar intent information description set.
45. The apparatus as claimed in any one of claims 40 and 42-44, wherein the similar intent information description set determining unit further comprises:
A third intent parsing unit, configured to parse each intent in all intent groups of the intent-similar queries by means of semantic relation analysis, so as to detect whether the corresponding intent-similar query conforms to at least one semantic relation;
A third wildcard replacement unit, configured to, if the corresponding intent-similar query conforms to at least one semantic relation, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replace the remaining linguistic forms of the intent with their semantic labels, thereby obtaining a transformed intent;
A third regular expression generation unit, configured to take the transformed intent as a semantic-type similar intent information description in the form of semantic labels and a wildcard, and to describe the semantic-type similar intent information as a regular expression; and
A third regular expression adding unit, configured to add the regular expression to the similar intent information description set.
46. The apparatus as claimed in any one of claims 40 and 42-44, wherein the similar intent information description set determining unit further comprises:
A fourth intent parsing unit, configured to parse each intent in all intent groups of the intent-similar queries by means of logic analysis, so as to detect whether the corresponding intent-similar query conforms to at least one logical relation;
A fourth wildcard replacement unit, configured to, if the corresponding intent-similar query conforms to at least one logical relation, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replace the remaining linguistic forms of the intent with their logical types, thereby obtaining a transformed intent;
A fourth regular expression generation unit, configured to take the transformed intent as a logical-type similar intent information description in the form of logical types and a wildcard, and to describe the logical-type similar intent information as a regular expression; and
A fourth regular expression adding unit, configured to add the regular expression to the similar intent information description set.
47. The apparatus as claimed in claim 45, wherein the similar intent information description set determining unit further comprises:
A fourth intent parsing unit, configured to parse each intent in all intent groups of the intent-similar queries by means of logic analysis, so as to detect whether the corresponding intent-similar query conforms to at least one logical relation;
A fourth wildcard replacement unit, configured to, if the corresponding intent-similar query conforms to at least one logical relation, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replace the remaining linguistic forms of the intent with their logical types, thereby obtaining a transformed intent;
A fourth regular expression generation unit, configured to take the transformed intent as a logical-type similar intent information description in the form of logical types and a wildcard, and to describe the logical-type similar intent information as a regular expression; and
A fourth regular expression adding unit, configured to add the regular expression to the similar intent information description set.
48. The apparatus as claimed in claim 40 or 41, wherein the similar intent information description set determining unit further comprises:
A confidence calculation unit, configured to calculate the confidence of each similar intent information description in the similar intent information description set; and
A similar intent information description selection unit, configured to select, from the similar intent information description set, a specific number of similar intent information descriptions with the highest confidence, or the similar intent information descriptions whose confidence is greater than a predetermined threshold.
49. The apparatus as claimed in claim 48, wherein the confidence is calculated using at least one of the following:
The frequency of the similar intent information description;
The coverage of the similar intent information description; and
The relevance of the similar intent information description to the input query.
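A confidence score built from two of these signals, followed by the selection step of claim 48, might look like the sketch below. The definitions of frequency and coverage, the equal weighting, and the `*` wildcard notation are all assumptions; the claims name the signals but do not define how they combine.

```python
def confidence_scores(description_hits, num_similar_queries):
    """description_hits: {description: intent-similar queries whose intents produced it}."""
    total = sum(len(hits) for hits in description_hits.values())
    if total == 0:
        return {}
    scores = {}
    for description, hits in description_hits.items():
        frequency = len(hits) / total                      # share of all observations
        coverage = len(set(hits)) / num_similar_queries    # share of queries covered
        scores[description] = 0.5 * frequency + 0.5 * coverage  # equal weights assumed
    return scores

def select_descriptions(scores, top_n=None, threshold=None):
    """Keep the top_n most confident descriptions, or those above threshold."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if top_n is not None:
        return ranked[:top_n]
    return [d for d in ranked if threshold is None or scores[d] > threshold]

# Hypothetical observations over three intent-similar queries.
description_hits = {
    "* price": ["Nikon camera", "Sony camera", "Pentax camera"],
    "how to clean *": ["Nikon camera"],
}
```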
50. The apparatus as claimed in claim 48, wherein the confidence is calculated from at least one of the following:
The similar intent information description set;
A prepared intent training set; and
Prepared domain information.
51. The apparatus as claimed in claim 50, wherein the confidence calculation unit further comprises:
A first weight configuration unit, configured to assign different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the popularity of the intent-similar queries; and/or
A second weight configuration unit, configured to assign different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the similarity between the intent-similar queries and the input query.
52. The apparatus as claimed in claim 30, wherein the second intent mining unit includes:
An input query replacement unit, configured to produce a group of intents by replacing, with the input query, the wildcard in the similar intent information descriptions in the similar intent information description set.
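The replacement step of claim 52 reduces to a substitution. The sketch assumes descriptions are stored as strings with a single `*` wildcard, which is an illustrative notation rather than one the claims specify.

```python
def mine_intents(input_query, descriptions):
    """Substitute the input query for the wildcard in every description."""
    return [d.replace("*", input_query, 1) for d in descriptions if "*" in d]

# Hypothetical description set learned from intent-similar queries.
descriptions = ["* price", "how to clean *"]
```

For the input query "Canon camera", this yields the intents "Canon camera price" and "how to clean Canon camera" even if neither appears in any data source for that query.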
53. The apparatus as claimed in claim 30, wherein the second intent mining unit includes:
A first-group intent mining unit, configured to mine a first group of intents for the input query from at least one data source; and
A second-group intent mining unit, configured to mine a second group of intents for the input query by using the similar intent information description set and the first group of intents.
54. The apparatus as claimed in claim 53, wherein the second-group intent mining unit includes:
A unit that generates at least one intent by replacing, with the input query, the wildcard in at least one similar intent information description in the similar intent information description set, wherein the at least one intent is not in the first group of intents; and
A unit that adds the generated at least one intent to the first group of intents.
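Claims 53 and 54 together describe supplementing directly mined intents with intents generated from descriptions. A minimal sketch, again assuming `*`-wildcard descriptions and hypothetical data:

```python
def second_group(input_query, first_group, descriptions):
    """Extend the first intent group with description-generated intents it lacks."""
    merged = list(first_group)
    for description in descriptions:
        intent = description.replace("*", input_query, 1)
        if intent not in merged:
            merged.append(intent)
    return merged

# Hypothetical first group mined directly from a data source for "Canon camera".
first_group = ["Canon camera price", "Canon camera review"]
```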
55. The apparatus as claimed in claim 53, wherein the second-group intent mining unit includes:
A ranking unit, configured to rank the first group of intents for the input query by using the similar intent information description set.
56. The apparatus as claimed in claim 55, wherein the second-group intent mining unit further comprises:
A peculiar intent identification unit, configured to identify peculiar intents in the first group of intents for the input query;
A weight changing unit, configured to raise the weight of a peculiar intent in the ranking according to the peculiarity degree of that peculiar intent;
Wherein the peculiarity degree of a peculiar intent is calculated from at least one of the following:
The co-occurrence rate of the input query and the peculiar intent in a prepared intent training set;
The relation between the input query and the peculiar intent in domain knowledge;
The frequency of the peculiar intent in the muster data; and
The popularity of the peculiar intent in a query log.
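The ranking of claim 55 and the weight adjustment of claim 56 can be sketched together. The claims do not say how a description's score and a peculiarity degree combine, so the additive boost, the confidence values, and the precomputed peculiarity degrees below are assumptions for illustration.

```python
def rank_intents(input_query, first_group, description_scores, peculiarity):
    """Rank intents by matching-description confidence, boosted for peculiar intents."""
    def weight(intent):
        base = 0.0
        for description, score in description_scores.items():
            if description.replace("*", input_query, 1) == intent:
                base = max(base, score)
        return base + peculiarity.get(intent, 0.0)  # additive boost is an assumption
    return sorted(first_group, key=weight, reverse=True)

# Hypothetical confidences and a hypothetical precomputed peculiarity degree.
description_scores = {"* price": 0.9, "* review": 0.6}
peculiarity = {"Canon camera EF lens compatibility": 0.8}
first_group = [
    "Canon camera review",
    "Canon camera EF lens compatibility",
    "Canon camera price",
]
```

Without the boost, the lens-compatibility intent would rank last because no description learned from other queries matches it; the peculiarity degree keeps an intent specific to this query from being buried.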
57. An apparatus for information retrieval, including:
An input query receiving unit, configured to receive an input query that a user expresses in natural language;
The apparatus for intent mining according to any one of claims 30-56, configured to mine intents from the input query; and
A search result obtaining unit, configured to obtain search results for the mined intents.
58. An apparatus for question answering assistance, including:
An input query receiving unit, configured to receive an input query that a user expresses in natural language;
The apparatus for intent mining according to any one of claims 30-56, configured to mine topics from the input query; and
An answer obtaining unit, configured to obtain answers for the mined topics.
CN201310371165.5A 2013-08-23 2013-08-23 Method and apparatus for being intended to excavate Active CN104424216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310371165.5A CN104424216B (en) 2013-08-23 2013-08-23 Method and apparatus for being intended to excavate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310371165.5A CN104424216B (en) 2013-08-23 2013-08-23 Method and apparatus for being intended to excavate

Publications (2)

Publication Number Publication Date
CN104424216A CN104424216A (en) 2015-03-18
CN104424216B true CN104424216B (en) 2018-01-23

Family

ID=52973214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310371165.5A Active CN104424216B (en) 2013-08-23 2013-08-23 Method and apparatus for being intended to excavate

Country Status (1)

Country Link
CN (1) CN104424216B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776981B (en) * 2016-12-06 2020-12-15 广州同构科技有限公司 Intelligent retrieval method based on empirical knowledge
CN108287858B (en) * 2017-03-02 2021-08-10 腾讯科技(深圳)有限公司 Semantic extraction method and device for natural language
CN107704450B (en) * 2017-10-13 2020-12-04 威盛电子股份有限公司 Natural language identification device and natural language identification method
CN107679039B (en) * 2017-10-17 2020-12-29 北京百度网讯科技有限公司 Method and device for determining statement intention
CN108170859B (en) * 2018-01-22 2020-07-28 北京百度网讯科技有限公司 Voice query method, device, storage medium and terminal equipment
CN110309252B (en) * 2018-02-28 2023-11-24 阿里巴巴集团控股有限公司 Natural language processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101339551A (en) * 2007-07-05 2009-01-07 日电(中国)有限公司 Natural language query demand extension equipment and its method
CN102063469A (en) * 2010-12-03 2011-05-18 百度在线网络技术(北京)有限公司 Method and device for acquiring relevant keyword message and computer equipment
CN102096717A (en) * 2011-02-15 2011-06-15 百度在线网络技术(北京)有限公司 Search method and search engine
CN102722558A (en) * 2012-05-29 2012-10-10 百度在线网络技术(北京)有限公司 User question recommending method and device
CN103049495A (en) * 2012-12-07 2013-04-17 百度在线网络技术(北京)有限公司 Method, device and equipment for providing searching advice corresponding to inquiring sequence

Also Published As

Publication number Publication date
CN104424216A (en) 2015-03-18

Similar Documents

Publication Publication Date Title
CN104424216B (en) Method and apparatus for being intended to excavate
Gupta et al. A survey of text question answering techniques
US7925506B2 (en) Speech recognition accuracy via concept to keyword mapping
JP4650072B2 (en) Question answering system, data retrieval method, and computer program
EP2306451B1 (en) Speech recognition
KR100806936B1 (en) System and method for providing automatically completed recommended word by correcting and displaying the word
US8494839B2 (en) Apparatus, method, and recording medium for morphological analysis and registering a new compound word
CN102567509B (en) Method and system for instant messaging with visual messaging assistance
CN101425071A (en) Location expression detection device and computer readable medium
JP4737435B2 (en) LABELING SYSTEM, LABELING SERVICE SYSTEM, LABELING METHOD, AND LABELING PROGRAM
JP2006244262A (en) Retrieval system, method and program for answer to question
US20120078907A1 (en) Keyword presentation apparatus and method
CN101933017B (en) Document search device, document search system, and document search method
González et al. Siamese hierarchical attention networks for extractive summarization
Serigos Applying corpus and computational methods to loanword research: new approaches to Anglicisms in Spanish
JP2008077252A (en) Document ranking method, document retrieval method, document ranking device, document retrieval device, and recording medium
Aslam et al. Web-AM: An efficient boilerplate removal algorithm for Web articles
Fenogenova et al. A general method applicable to the search for anglicisms in russian social network texts
JP4783563B2 (en) Index generation program, search program, index generation method, search method, index generation device, and search device
CA2483805C (en) System and methods for improving accuracy of speech recognition
Mendes et al. Just. Ask—A multi-pronged approach to question answering
CH-Wang et al. Do Androids Know They're Only Dreaming of Electric Sheep?
US11734331B1 (en) Systems and methods to optimize search for emerging concepts
Nguyen et al. DCU and HCMUS at NTCIR-16 Lifelog-4
JP5182960B2 (en) Store name ambiguity resolving apparatus, method, program, and recording medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant