CN104424216B - Method and apparatus for intent mining - Google Patents
Method and apparatus for intent mining
- Publication number
- CN104424216B (CN201310371165.5A)
- Authority
- CN
- China
- Prior art keywords
- similar
- inquiry
- intention
- intended
- intent information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
Abstract
The present invention relates to a method and apparatus for intent mining. A method for intent mining is disclosed, the method comprising: obtaining an input query; generating intent-similar queries for the input query, each of which has an intent type identical or similar to that of the input query; mining a group of intents for each intent-similar query, where each intent provides a subtopic of the corresponding intent-similar query; determining a similar intent information description set by using the intent groups of all the intent-similar queries; and mining intents for the input query by using the similar intent information description set.
Description
Technical field
The present invention relates to methods and apparatus for text mining. In particular, it relates to methods and apparatus for mining intents and, more specifically, to methods and apparatus for discovering the search intents behind the queries that users submit.
Background
With the continued development of computer and information technology, the rate at which information is produced worldwide keeps increasing. Many kinds of information now exist, such as personal, occupational, entertainment, scientific and technical, and government information. Because there is so much information, organizing and accessing it has become a problem.
To improve the user's experience when seeking information, methods and systems that help users reach the information they are looking for are continually being developed. For example, Santos et al., 2011, "University of Glasgow at the NTCIR-9 Intent task: Experiments with Terrier on Subtopic Mining and Document Ranking", Proceedings of NTCIR-9 Workshop Meeting, 2011, Tokyo (Non-patent literature 1) proposes trying to understand the potential intents behind a query entered by a user. When the user enters a short and ambiguous query, it is desirable to output the n (for example, n = 10) best intent results, which should be both important and diverse. Table 1 shows an example.
Table 1: Example of an input query and its output
For example, as shown in Table 1, if a user enters the query "becoming a paralegal", several intents related to "becoming a paralegal" can be output for the user to choose from.
In intent mining, the quality of the mining results is usually evaluated with the following formula:
where I-rec (intent recall) is the ratio of the number of useful intents among those obtained (that is, the correct results obtained) to the number of intents one hopes to obtain (all correct results), and is commonly used to measure intent diversity; D-nDCG denotes intent precision, where D-nDCG is the diversified normalized discounted cumulative gain (Diversified-Normalized Discounted Cumulative Gain), which computes the relevance of the result list returned by a search engine based on position (see Sakai and Song, "Evaluating Diversified Search Results Using Per-intent Graded Relevance", Proceedings of SIGIR'11, 2011, Beijing (Non-patent literature 2)), and is used to measure the overall relevance of the intents; and D#-nDCG denotes a linear combination of I-rec and D-nDCG.
In the formula above, I-rec, D-nDCG and D#-nDCG are determined from the ground truth data of the query (also called the standard answer), generally by comparing the intent mining results with the ground truth data. These metrics are well known in the art and are therefore not described in detail.
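The two quantities that the text defines in words can be sketched in a few lines. This is a minimal illustration, not the patent's own code: the function names are hypothetical, D-nDCG is taken as an already computed number, and the weight `gamma` of the linear combination is an assumption (the text only says that D#-nDCG is a linear combination of I-rec and D-nDCG).

```python
def intent_recall(mined_intents, ground_truth_intents):
    """I-rec: share of the ground-truth intents that appear among the mined intents."""
    truth = set(ground_truth_intents)
    if not truth:
        return 0.0
    return len(truth & set(mined_intents)) / len(truth)


def d_sharp_ndcg(i_rec, d_ndcg, gamma=0.5):
    """Linear combination of I-rec and D-nDCG.

    gamma is an assumed mixing weight; the source does not state its value.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * i_rec + (1.0 - gamma) * d_ndcg
```

With four ground-truth intents of which two are found, `intent_recall` returns 0.5; combining that with a D-nDCG value then gives a single score that rewards both diversity and relevance.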
As an example, in the prior art the ground truth data for a query can be obtained in the following ways. For instance, the ground truth data can be set manually. As another instance, the ground truth data is provided by annotators and produced by a vote among several people.
In the prior art, multiple intent candidates are usually mined from global external resources (such as search engines, Wikipedia, query logs and anchor text), and the mined candidates are then ranked by parameters such as frequency to obtain the intents the user desires.
For example, Xue et al., 2011, "THUIR at NTCIR-9 INTENT Task", Proceedings of NTCIR-9 Workshop Meeting, 2011, Tokyo (Non-patent literature 3) discloses a method for intent mining. The method extracts search results that contain the input query, identifies intent candidates for the input query from those search results, and finally ranks the candidates according to some criterion to obtain the intents the user desires.
Fig. 1 shows a flowchart of the intent mining method used in prior-art Non-patent literature 3. As shown in Fig. 1, in step S2100 the query entered by the user is obtained. Next, in step S2110, intent candidates for the query are mined from global external resources such as search engines, Wikipedia and query logs. Next, in step S2120, duplicate intent candidates are removed from the candidates obtained. Then, in step S2130, the candidates that remain after deduplication are ranked using parameters such as the frequency with which a candidate occurs, co-occurrence frequency, click data and edit distance. Finally, in step S2140, the top-ranked intent candidates are selected according to the ranking and output as the intents the user desires.
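The deduplicate, rank and select steps of this prior-art flow (S2120 to S2140) can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, only occurrence frequency is used as the ranking parameter, and the other parameters named in the text (co-occurrence frequency, edit distance and so on) are left out.

```python
from collections import Counter


def rank_intent_candidates(candidates, top_n=10):
    """Deduplicate intent candidates and rank them by how often each one occurs."""
    # Normalise case and whitespace so trivially different strings count as duplicates.
    normalised = [" ".join(c.lower().split()) for c in candidates]
    counts = Counter(c for c in normalised if c)
    # most_common orders by descending count; equal counts keep first-seen order.
    return [candidate for candidate, _ in counts.most_common(top_n)]
```

Because the ranking depends only on what the external resources happen to contain for this one query, a query with little associated data yields few and poorly ranked candidates, which is the weakness discussed next.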
However, those skilled in the art have found in practice that, with the method disclosed in prior-art Non-patent literature 3, when intent information (such as the user's query history) is scarce, the intents obtained may not match the intents the user wants. That is, the method cannot accurately provide the intent candidates the user wishes to obtain, so its intent mining performance is low.
In addition, United States Patent US 8,214,347 B2 (Patent document 1) proposes another method for intent mining. In that method, high-frequency phrases are extracted from search results, and intents are then mined from these phrases by using a number of predetermined rules.
Fig. 2 shows a flowchart of the intent mining method used in prior-art US 8,214,347 B2.
As shown in Fig. 2, in step S2200 the query entered by the user is obtained. Next, in step S2210, search results are extracted for that query. Next, in step S2220, intent candidates are mined: phrases containing the input query are identified in the search results, and the best phrases are determined as intent candidates using features such as phrase frequency, co-occurrence frequency, click data and edit distance. Then, in step S2230, the intent candidates are ranked. Finally, in step S2240, the top-ranked intent candidates are selected according to the ranking and output as the intents the user desires.
However, those skilled in the art have likewise found in practice that, with the method disclosed in prior-art US 8,214,347 B2, when intent information (such as the user's query history) is scarce, the intents obtained may not match the intents the user wants. That is, the method cannot accurately provide the intent candidates the user wishes to obtain, so its intent mining performance is low.
A new technique is therefore needed to solve the above problems in the prior art.
Summary of the invention
An object of the present invention is to improve the accuracy of intent mining.
Another object of the present invention is to improve intent recall.
According to one aspect of the invention, there is provided a method for intent mining, the method comprising: obtaining an input query; generating intent-similar queries for the input query, each of which has an intent type identical or similar to that of the input query; mining a group of intents for each intent-similar query, where each intent provides a subtopic of the corresponding intent-similar query; determining a similar intent information description set by using the intent groups of all the intent-similar queries; and mining intents for the input query by using the similar intent information description set.
According to another aspect of the invention, there is provided a method for information retrieval, comprising: receiving an input query that a user expresses in natural language; mining intents from the input query according to the above method for intent mining; and obtaining search results for the mined intents.
According to another aspect of the invention, there is provided a method for question-answering assistance, comprising: receiving an input query that a user expresses in natural language; mining topics from the input query according to the above method for intent mining; and obtaining answers for the mined topics.
According to another aspect of the invention, there is provided an apparatus for intent mining, the apparatus comprising: an input query acquisition unit that obtains an input query; an intent-similar query generation unit that generates intent-similar queries for the input query, each of which has an intent type identical or similar to that of the input query; a first intent mining unit that mines a group of intents for each intent-similar query, where each intent provides a subtopic of the corresponding intent-similar query; a similar intent information description set determination unit that determines a similar intent information description set by using the intent groups of all the intent-similar queries; and a second intent mining unit that mines intents for the input query by using the similar intent information description set.
According to another aspect of the invention, there is provided an apparatus for information retrieval, comprising: an input query receiving unit that receives an input query that a user expresses in natural language; the above apparatus for intent mining, which mines intents from the input query; and a search result obtaining unit that obtains search results for the mined intents.
According to another aspect of the invention, there is provided an apparatus for question-answering assistance, comprising: an input query receiving unit that receives an input query that a user expresses in natural language; the above apparatus for intent mining, which mines topics from the input query; and an answer obtaining unit that obtains answers for the mined topics.
One advantage of the present invention is that the accuracy of intent mining is improved. In particular, even when intent information is scarce, the intent candidates the user wishes to obtain can still be provided accurately.
Another advantage of the present invention is that intent recall is improved.
Further features and advantages of the invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which form part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
The invention can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
Fig. 1 shows a flowchart of the intent mining method used in prior-art Non-patent literature 3.
Fig. 2 shows a flowchart of the intent mining method used in prior-art US 8,214,347 B2 (Patent document 1).
Fig. 3 is a block diagram showing the hardware configuration of a computer system 1000 capable of implementing embodiments of the invention.
Fig. 4 shows a flowchart of a method of intent mining using intent-similar queries according to an embodiment of the invention.
Fig. 5 shows a flowchart of a method of generating intent-similar queries according to an embodiment of the invention.
Fig. 6 shows a flowchart of a method of generating intent-similar queries from an intent-similar query library according to an embodiment of the invention.
Fig. 7 shows a flowchart of a method of generating intent-similar queries using a domain ontology according to an embodiment of the invention.
Fig. 8 shows a flowchart of a method of generating intent-similar queries using intent-similarity indicators according to an embodiment of the invention.
Fig. 9 shows a flowchart of a method of generating intent-similar queries for an input query according to an embodiment of the invention.
Fig. 10 shows a flowchart of a method of identifying the core intent part and the modifier part of an input query according to an embodiment of the invention.
Fig. 11 shows a flowchart of a method of determining the similar intent information description set by morphological analysis according to an embodiment of the invention.
Fig. 12 shows a flowchart of a method of determining the similar intent information description set by syntactic analysis according to an embodiment of the invention.
Fig. 13 shows a flowchart of a method of determining the similar intent information description set by semantic relation analysis according to an embodiment of the invention.
Fig. 14 shows a flowchart of a method of determining the similar intent information description set by logic analysis according to an embodiment of the invention.
Fig. 15 shows another flowchart of a method of intent mining using intent-similar queries according to an embodiment of the invention.
Fig. 16 shows a flowchart of a method for information retrieval according to an embodiment of the invention.
Fig. 17 shows a flowchart of a method for question-answering assistance according to an embodiment of the invention.
Fig. 18 shows a functional block diagram of an apparatus 7000 for intent mining according to an embodiment of the invention.
Fig. 19 shows a functional block diagram of an apparatus 8000 for information retrieval according to an embodiment of the invention.
Fig. 20 shows a functional block diagram of an apparatus 9000 for question-answering assistance according to an embodiment of the invention.
Detailed description
Various exemplary embodiments of the invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the invention.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application or its use.
Techniques, methods and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the specification.
In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following figures, so once an item has been defined in one figure it need not be discussed further in subsequent figures.
Fig. 3 is a block diagram showing the hardware configuration of a computer system 1000 capable of implementing embodiments of the invention.
As shown in Fig. 3, the computer system includes a computer 1110. The computer 1110 includes a processing unit 1120, a system memory 1130, a fixed non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190 and a peripheral interface 1195, all connected via a system bus 1121.
The system memory 1130 includes a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136 and some program data 1137 reside in the RAM 1132.
A fixed non-volatile memory 1141, such as a hard disk, is connected to the fixed non-volatile memory interface 1140. The fixed non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146 and some program data 1147.
Removable non-volatile memories, such as a floppy disk drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a floppy disk 1152 can be inserted into the floppy disk drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.
Input devices such as a mouse 1161 and a keyboard 1162 are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.
The remote computer 1180 can include a memory 1181, such as a hard disk, which stores remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The peripheral interface 1195 is connected to a printer 1196 and a loudspeaker 1197.
The computer system shown in Fig. 3 is merely illustrative and is in no way intended to limit the invention, its application or its use.
The computer system shown in Fig. 3 can be incorporated in any embodiment, either as a stand-alone computer or as a processing system within a device; one or more unnecessary components can be removed from it, and one or more additional components can be added to it.
Fig. 4 shows a flowchart of a method of intent mining using intent-similar queries according to an embodiment of the invention.
As shown in Fig. 4, first, in step S3100, the query entered by the user is obtained. Those skilled in the art will appreciate that the query can be in any of various languages, including but not limited to Chinese, English, Japanese, Korean, German, French, Russian and Arabic.
For example, the query entered by the user can be "becoming a paralegal". Table 2 shows the ground truth data (that is, the standard answer) that the user hopes to obtain for this query.
Table 2: Ground truth data for the query "becoming a paralegal"
In Table 2, the "intent type" refers to the relationship between an intent and the corresponding query. For clarity, Table 3 shows some examples of intent types.
Query | Intent | Intent type |
becoming a paralegal | becoming a paralegal class | course |
becoming a paralegal | becoming a paralegal degree | degree |
becoming a engineer | becoming a engineer class | course |
becoming a engineer | Requirement of becoming a engineer | require (requirement) |
Table 3: Examples of intent types
As shown in Table 3, if the input query is "becoming a paralegal" and the corresponding intent is "becoming a paralegal class", then the corresponding intent type is "course"; that is, "becoming a paralegal class" concerns information about courses. If the input query is "becoming a paralegal" and the corresponding intent is "becoming a paralegal degree", then the corresponding intent type is "degree"; that is, "becoming a paralegal degree" concerns information about degrees.
Continuing with Fig. 4, next, in step S3110, intent-similar queries are generated for the input query. Each intent-similar query has an intent type identical or similar to that of the input query.
If queries are similar, they are likely to have the same or similar intent types. This means that when a user searches for information about a query, he or she is looking for certain subtopics of that query, and when other users search for similar queries, the subtopics they look for may be the same. For example, when a user searches for "becoming a paralegal", a common intent is to find "the course of paralegal"; and when a user searches for "becoming an engineer", a common intent is to find "the course of engineer". This intent is also common for other queries of the form "becoming a <position>". Intent-similar queries can therefore be used to mine the intents of a user's query.
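The overall method (obtain the query, generate intent-similar queries, mine an intent group for each, build the similar intent information description set, then mine intents for the input query) can be outlined as a small orchestration function. This is only a structural sketch: the function name and the four callables are hypothetical placeholders for the individual steps, and nothing here fixes how any step is implemented.

```python
def mine_intents(input_query, generate_similar, mine_group,
                 build_description_set, mine_with_descriptions):
    """Chain the steps of the method; each step is supplied as a callable."""
    # Step 2: intent-similar queries for the input query.
    similar_queries = generate_similar(input_query)
    # Step 3: one group of intents (subtopics) per intent-similar query.
    intent_groups = {query: mine_group(query) for query in similar_queries}
    # Step 4: description set derived from all intent groups together.
    description_set = build_description_set(intent_groups)
    # Step 5: intents for the original input query, guided by that set.
    return mine_with_descriptions(input_query, description_set)
```

Passing the steps in as arguments keeps the outline independent of any particular generation or mining technique, several of which are described for the individual steps.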
Fig. 5 shows a flowchart of a method of generating intent-similar queries according to an embodiment of the invention. As shown in Fig. 5, first, in step S3210, multiple intent-similar queries are generated for the query entered by the user. As described below, various methods can be used to generate them. Next, in step S3220, the similarity between each of the intent-similar queries and the input query is computed; the method of computing this similarity is described in more detail below. Finally, in step S3230, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, are selected from the intent-similar queries and output.
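The final selection step (keep either the highest-scoring queries or those above a threshold) can be sketched as below. The function name and parameters are hypothetical, and the similarity scores are assumed to have been computed already; if both `top_k` and `threshold` are given, this sketch applies both, which is a design choice of the example rather than something the text specifies.

```python
def select_intent_similar_queries(scored_queries, top_k=None, threshold=None):
    """Select intent-similar queries from (query, similarity) pairs.

    top_k keeps the highest-scoring queries; threshold keeps queries whose
    similarity is strictly greater than the given value.
    """
    ranked = sorted(scored_queries, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [(query, score) for query, score in ranked if score > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [query for query, _ in ranked]
```

For example, `select_intent_similar_queries(pairs, top_k=5)` corresponds to the "specific number" variant and `select_intent_similar_queries(pairs, threshold=0.6)` to the threshold variant.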
When the input query is a single-word query, the method shown in Fig. 6 can be used to generate multiple intent-similar queries. As shown in Fig. 6, in step S3310, intent-similar queries are generated by consulting an intent-similar query library. For example, the library may maintain a list of pop music stars; when the input query concerns an emerging pop music star, pop music stars similar to that emerging star can be selected as intent-similar queries. Next, in step S3320, the similarity between each of the intent-similar queries and the input query is computed; the method of computing this similarity is described in more detail below. Finally, in step S3330, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, are selected from the intent-similar queries and output.
In addition, the method shown in Fig. 7 can be used to generate intent-similar queries. As shown in Fig. 7, in step S3410, multiple intent-similar queries are generated by consulting a domain ontology; that is, one or more sibling nodes of the input query are obtained from the domain ontology as the intent-similar queries. A "domain ontology" is a structured encyclopedic knowledge network, such as Wikipedia. For example, suppose the input query is "Vanuatu". In a geography ontology, "Vanuatu" is an Oceanian country. "Fiji", "Indonesia", "Kiribati", "Marshall Islands" and so on can therefore be selected from the geography ontology as intent-similar queries. Next, in step S3420, the similarity between each of the intent-similar queries and the input query is computed; the method of computing this similarity is described in more detail below. Finally, in step S3430, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, are selected from the intent-similar queries and output.
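The sibling-node lookup in step S3410 can be illustrated with a toy ontology. This is a sketch under assumptions: the dictionary layout (parent concept mapped to its child concepts) and the function name are hypothetical, and a real domain ontology would be far larger and richer than this single entry built from the example in the text.

```python
def sibling_nodes(ontology, term):
    """Return concepts that share a parent with `term`.

    `ontology` maps each parent concept to the list of its child concepts.
    """
    siblings = []
    for children in ontology.values():
        if term in children:
            for child in children:
                if child != term and child not in siblings:
                    siblings.append(child)
    return siblings


# Toy geography ontology; the single entry mirrors the example in the text.
geography = {
    "Oceanian country": ["Vanuatu", "Fiji", "Indonesia", "Kiribati", "Marshall Islands"],
}
print(sibling_nodes(geography, "Vanuatu"))
```

The siblings returned here are only candidates; the similarity computation and selection steps that follow decide which of them are kept.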
Alternatively and/or additionally, neighbouring concepts of the input query can be obtained from a language dictionary as the intent-similar queries.
Alternatively and/or additionally, one or more queries can be obtained from a query log as the intent-similar queries by computing intent similarity based on the click data associated with the input query.
In addition, the method shown in Fig. 8 can be used to generate intent-similar queries. As shown in Fig. 8, in step S3510, intent-similar queries are generated by using intent-similarity indicators. The intent-similarity indicators include at least one of the following: a coordination indicator, where the two phrases it connects serve as the same syntactic element in a sentence, such as "and" and "with"; a contrast indicator, where a first phrase in a sentence and a second phrase connected after it by the indicator stand in contrast, such as "relative to", "compared to" and "vs"; and a choice indicator, where the two phrases it connects form an alternative expression in a sentence, such as "or", "among" and "between". An intent-similarity indicator signals that the phrases it links may be candidate intent-similar queries.
In other words, in step S3510, one or more query-phrase pairs are obtained from at least one data source, where each query-phrase pair includes the input query, an intent-similarity indicator and a third phrase; and the third phrase is extracted from each query-phrase pair as an intent-similar query.
For example, if the input query is "pressure-type cleaning machine", the following sentence segments can be obtained from the data source:
Pressure-type cleaning machine vs cold high-pressure cleaning machine;
Pressure-type cleaning machine vs pneumatic cleaning machine;
Pressure-type cleaning machine and air compressor;
Pressure-type cleaning machine and steam cleaner;
Lawn mower or pressure-type cleaning machine.
Therefore, for the query "pressure-type cleaning machine", "cold high-pressure cleaning machine", "pneumatic cleaning machine", "air compressor", "steam cleaner" and "lawn mower" can be selected as intent-similar queries.
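The extraction of the "third phrase" around an intent-similarity indicator can be sketched with a pair of regular expressions. This is a minimal illustration, not the patent's implementation: the function name is hypothetical, only three indicators ("vs", "and", "or") are included, and each segment is assumed to consist of exactly the query, one indicator and one other phrase.

```python
import re

# A small, assumed subset of the indicators named in the text.
INDICATORS = ("vs", "and", "or")


def extract_intent_similar_queries(query, segments, indicators=INDICATORS):
    """Pull the phrase linked to `query` by an indicator out of each segment."""
    alternation = "|".join(re.escape(word) for word in indicators)
    q = re.escape(query)
    # "<query> <indicator> <phrase>" and "<phrase> <indicator> <query>".
    after = re.compile(rf"^{q}\s+(?:{alternation})\s+(.+)$", re.IGNORECASE)
    before = re.compile(rf"^(.+?)\s+(?:{alternation})\s+{q}$", re.IGNORECASE)
    found = []
    for segment in segments:
        text = segment.strip()
        match = after.match(text) or before.match(text)
        if match:
            phrase = match.group(1).strip()
            if phrase and phrase not in found:
                found.append(phrase)
    return found


segments = [
    "pressure-type cleaning machine vs pneumatic cleaning machine",
    "pressure-type cleaning machine and air compressor",
    "pressure-type cleaning machine and steam cleaner",
]
print(extract_intent_similar_queries("pressure-type cleaning machine", segments))
```

Real text would need sentence segmentation and phrase-boundary detection before this step; the anchored patterns here deliberately sidestep both.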
Next, in step S3520, the similarity between each of the intent-similar queries and the input query is computed; the method of computing this similarity is described in more detail below. Finally, in step S3530, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, are selected from the intent-similar queries and output.
In addition, when the input query is a multi-word query, the method shown in Fig. 9 can be used to generate intent-similar queries.
As shown in Fig. 9, first, in step S3610, the core intent part and the modifier part of the multi-word input query are identified.
Method flow chart.As shown in Figure 10, first, in step S3710, extension is generated for each semantic primitive of input inquiry
Inquiry.That is, the input inquiry is parsed, the input inquiry is divided into multiple semantic primitives(Multiple words);For described
The each semantic primitive divided of input inquiry, generate the interim intention that the semantic primitive by being divided and changing section are formed
Similar inquiry(Expanding query), wherein the changing section is the intention generated for other semantic primitives of the input inquiry
Similar phrase.In one embodiment, it is described to be intended to similar phrase(Changing section)Generation can include:From at least one
Data source obtains one or more inquiries to phrase, wherein each inquiry includes to phrase:Other semantemes of the input inquiry
Unit, it is intended to similar designator and the 3rd phrase;And from each inquiry to the 3rd phrase described in Phrase extraction, as institute
State the similar phrase of intention(Changing section).
Next, in step S3720, for each semantic unit of the input query, a group of intents is mined for each temporary intent-similar query (expansion query), where each intent provides a subtopic of the corresponding temporary intent-similar query. For each semantic unit of the input query, a consistency degree is then calculated by comparing the intent groups of the temporary intent-similar queries of that semantic unit. The consistency degree measures how similar the intents of these temporary intent-similar queries are to one another: the more common the intent types found among their intents, the higher the consistency degree.
Next, in step S3730, the semantic unit of the input query with the highest consistency degree is determined to be the core intent part of the input query, and the other semantic units are determined to be the modifier part of the input query.
For example, for the input query "becoming a paralegal", expansion queries are generated for each word using the above method. Table 4 shows an example of the query words of a multi-word query and the corresponding expansion queries.
Table 4: Example of the query words of a multi-word query and the corresponding expansion queries
Then, for each expansion query, intents are generated using a conventional method, and the consistency degree is calculated by comparing the intent groups mined for each semantic unit.
In one embodiment, the consistency degree can be calculated as follows:
where N_AllIntent denotes all the intents obtained for the expansion queries of each semantic unit, and N_PopIntent denotes the intent information descriptions that appear in more than 5 queries.
For example, among the intents of "becoming a Engineer", "becoming a Accountant", "becoming a Law clerk", and the like, intent types such as "becoming a * class", "becoming a * degree", and "becoming a * training" are common. By contrast, among the intents of "training paralegal", "severing paralegal", "supervising a paralegal", "directing a paralegal", and the like, there are few common intent types. Therefore, for the input query "becoming a paralegal", the consistency degree of "becoming" is higher than that of "paralegal". In this example, data analysis gives a consistency degree of 0.81 for "becoming" and 0.03 for "paralegal". Therefore, in this query, the core intent part is "becoming", the modifier part is "paralegal", and the intent of the query is mainly determined by "becoming".
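The consistency formula itself is not reproduced in this text. The following is a minimal sketch under one reading of the definitions of N_AllIntent and N_PopIntent, namely that the consistency degree is the share of distinct wildcard intent patterns that occur in more than a given number of expansion queries. The ratio, the function name `consistency_degree`, and the toy data are assumptions made for illustration, not the formula of the patent.

```python
from collections import Counter

def consistency_degree(intent_groups, min_queries=5):
    """Assumed reading: N_PopIntent / N_AllIntent over distinct intent patterns.

    intent_groups maps each expansion query of one semantic unit to the set of
    wildcard intent patterns mined for it (for example "* class").
    """
    # Count in how many expansion queries each pattern occurs.
    counts = Counter(
        pattern for patterns in intent_groups.values() for pattern in set(patterns)
    )
    if not counts:
        return 0.0
    # Patterns that occur in more than `min_queries` expansion queries.
    popular = sum(1 for c in counts.values() if c > min_queries)
    return popular / len(counts)

# Toy data (hypothetical); min_queries is lowered to 1 to keep the example small.
groups = {
    "becoming a engineer": {"* class", "* degree"},
    "becoming a accountant": {"* class", "* salary"},
    "becoming a law clerk": {"* class", "* degree"},
}
score = consistency_degree(groups, min_queries=1)
```

Under this reading, a semantic unit whose expansion queries share many intent patterns scores close to 1, and the unit with the highest score would be taken as the core intent part.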
Referring back to Fig. 9, in step S3620, the intent-similar queries are generated by replacing the modifier part of the input query with various replacement parts, where each replacement part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type identical or similar to that of the modifier part of the input query. In one embodiment, generating the intent-similar phrase (replacement part) includes: obtaining one or more query-pair phrases from at least one data source, where each query-pair phrase includes the modifier part, an intent-similarity indicator, and a third phrase; and extracting the third phrase from each query-pair phrase as the intent-similar phrase (replacement part).
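The replacement step can be sketched as follows. The text does not enumerate the intent-similarity indicators or fix the word order of a query-pair phrase, so the indicator list (`vs`, `versus`, `or`), the assumed shape "modifier, indicator, third phrase", and the function names are all hypothetical.

```python
import re

# Hypothetical intent-similarity indicators; the text does not list them.
INDICATORS = ("vs", "versus", "or")

def similar_phrases(modifier, pair_phrases, indicators=INDICATORS):
    """Extract the third phrase from phrases shaped '<modifier> <indicator> <third phrase>'."""
    alternatives = "|".join(re.escape(i) for i in indicators)
    pattern = re.compile(
        rf"^{re.escape(modifier)}\s+(?:{alternatives})\s+(.+)$", re.IGNORECASE
    )
    found = []
    for phrase in pair_phrases:
        match = pattern.match(phrase.strip())
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found

def intent_similar_queries(query, modifier, pair_phrases):
    """Replace the modifier part of the input query with each intent-similar phrase."""
    return [query.replace(modifier, p) for p in similar_phrases(modifier, pair_phrases)]

# Toy query-pair phrases (hypothetical), as might be collected from a query log.
pairs = ["paralegal vs lawyer", "paralegal or legal assistant", "lawyer vs paralegal"]
result = intent_similar_queries("becoming a paralegal", "paralegal", pairs)
```

Only phrases that begin with the modifier part are used here; a phrase such as "lawyer vs paralegal" is ignored because of the assumed word order.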
Next, in step S3630, the similarity between each of the intent-similar queries and the input query can be calculated. The method for calculating this similarity is described in more detail below. Finally, in step S3640, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, can be selected from the intent-similar queries as output.
Alternatively, when the input query is a multi-word query, intent-similar phrases can be generated only for the core intent part of the input query and used as the intent-similar queries. Specifically, when the input query is a multi-word query, generating intent-similar queries for the input query can include: identifying the core intent part and the modifier part of the input query; and then generating intent-similar phrases for the core intent part of the input query as the intent-similar queries.
The similarity between each of the intent-similar queries and the input query can then also be calculated using the method described below. Finally, either a specific number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold, can be selected from the intent-similar queries as output.
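The selection step that closes each of the generation methods above can be sketched as follows. The function name `select_similar` and the sample scores are hypothetical; the similarity values are assumed to have been computed already.

```python
def select_similar(scored, top_n=None, threshold=None):
    """Select intent-similar queries by highest similarity or by a threshold.

    scored is a list of (query, similarity) pairs.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if threshold is not None:
        # Keep only queries whose similarity exceeds the threshold.
        ranked = [item for item in ranked if item[1] > threshold]
    if top_n is not None:
        # Keep only the top_n most similar queries.
        ranked = ranked[:top_n]
    return [query for query, _ in ranked]

# Hypothetical similarity scores.
scored = [("becoming a nurse", 0.2), ("becoming a lawyer", 0.9), ("becoming a clerk", 0.5)]
by_count = select_similar(scored, top_n=2)
by_threshold = select_similar(scored, threshold=0.5)
```

The text offers the two criteria as alternatives; the sketch also allows both to be applied together, which is a design choice rather than something the text requires.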
The core intent part and the modifier part of the input query can be identified by the method described with reference to Fig. 10. First, the input query is parsed and divided into multiple semantic units (multiple words); for each semantic unit so obtained, temporary intent-similar queries are generated, each consisting of that semantic unit and a varying part, where the varying part is an intent-similar phrase generated for the other semantic units of the input query. In one embodiment, generating the intent-similar phrase (varying part) may include: obtaining one or more query-pair phrases from at least one data source, where each query-pair phrase includes the other semantic units of the input query, an intent-similarity indicator, and a third phrase; and extracting the third phrase from each query-pair phrase as the intent-similar phrase (varying part). Next, for each semantic unit of the input query, a group of intents is mined for each temporary intent-similar query, where each intent provides a subtopic of the corresponding temporary intent-similar query. For each semantic unit of the input query, a consistency degree is calculated by comparing the intent groups of the temporary intent-similar queries of that semantic unit. The consistency degree measures how similar the intents of these temporary intent-similar queries are to one another: the more common the intent types found among their intents, the higher the consistency degree. Finally, the semantic unit of the input query with the highest consistency degree is determined to be the core intent part of the input query, and the other semantic units are determined to be the modifier part of the input query. In addition, in one embodiment, generating the intent-similar phrases for the core intent part of the input query includes: obtaining one or more query-pair phrases from at least one data source, where each query-pair phrase includes the core intent part of the input query, an intent-similarity indicator, and a third phrase; and extracting the third phrase from each query-pair phrase as an intent-similar query.
For example, if the input query is "black history", the core intent part of the input query can be determined to be "history". The modifier part "black" can be disregarded, and only intent-similar phrases of "history" are generated, such as "history timeline", "study of history", "list of famous history", and "resources history", as the intent-similar queries.
The method for calculating the similarity between an intent-similar query and the input query is described below. The similarity between each of the intent-similar queries and the input query is calculated from at least one of the following.
(1) The consistency between the query and the input query: the more similar the intent type of the intent-similar query is to that of the input query, the higher the similarity between them;
(2) The lexical similarity between the query and the input query: the more similar the form of the intent-similar query is to that of the input query, the higher the similarity between the two queries; for example, the similarity among "car", "motorbike", and "motorscooter" is higher than the similarity between "motorbike" and "bike";
(3) The syntactic similarity between the query and the input query: the more similar the grammatical patterns of the intent-similar query and the input query are in their context (snippet or document), the higher the similarity between the two queries; for example, relative to "ride a bike", the similarity between "drive a car" and "drive a motor" is higher;
(4) The semantic similarity between the query and the input query: the more similar the intent-similar query and the input query are in meaning, the higher the similarity between the two queries;
(5) The context similarity between the query and the input query in a prepared corpus: the more similar the contexts (snippets or documents) of the intent-similar query and the input query, the higher the similarity between the two queries;
(6) The co-occurrence rate of the query and the input query in a query log: the more frequently the intent-similar query co-occurs with the input query in the query log, the higher the similarity between the two queries;
(7) The distance between the query and the input query in a domain ontology: for example, the United Kingdom, Japan, and France are all countries, but because the United Kingdom and France are both European countries in the ontology, the similarity between the United Kingdom and France is higher than that between the United Kingdom and Japan; and
(8) The similarity of the muster data of the query and the input query: if the curve of the muster data of the intent-similar query is similar to that of the input query, the two queries are similar.
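Two of the factors listed above, lexical similarity (2) and co-occurrence in a query log (6), can be sketched as follows. The text only says that at least one factor is used; the character-level ratio from `difflib`, the session-based co-occurrence rate, the weighted sum, and all names and sample data here are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def lexical_similarity(a, b):
    """Character-level similarity of the two query strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cooccurrence_rate(a, b, sessions):
    """Share of sessions containing either query that contain both.

    sessions is a list of sets of queries, a stand-in for a query log.
    """
    both = sum(1 for s in sessions if a in s and b in s)
    either = sum(1 for s in sessions if a in s or b in s)
    return both / either if either else 0.0

def combined_similarity(a, b, sessions, weights=(0.5, 0.5)):
    """Weighted sum of the two factors; the weights are hypothetical."""
    w_lex, w_co = weights
    return w_lex * lexical_similarity(a, b) + w_co * cooccurrence_rate(a, b, sessions)

# Toy query log (hypothetical).
sessions = [{"drive a car", "drive a motor"}, {"drive a car"}, {"ride a bike"}]
rate = cooccurrence_rate("drive a car", "drive a motor", sessions)
total = combined_similarity("drive a car", "drive a motor", sessions)
```

The remaining factors, such as ontology distance or context similarity, would need external resources and are not sketched here.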
In addition, the similarity between each of the intent-similar queries and the input query can also be calculated from at least one item of real-world information, where the real-world information includes at least: time, location, user model, and environment.
For example, if the input query is "university of phoenix", the generated intent-similar queries may be as shown in Table 5.
Table 5: Intent-similar queries of "university of phoenix"
When a user searches in Beijing, the user may want information on "university of phoenix" as a "university in the United States", whereas when a user searches in Mesa, Arizona, USA, the user may want information on "university of phoenix" as a "university in the state of Arizona". Thus, for these two users in different locations, the similarity of each generated intent-similar query is different.
For the user in Beijing, the most similar queries may be Stanford University, Harvard University, Massachusetts Institute of Technology, and University of Pennsylvania.
For the user in Mesa, Arizona, USA, the most similar queries may be Western International University, Grand Canyon University, University of Arizona, and Northern Arizona University.
In addition, those skilled in the art will appreciate that if the identity of the user or the device used (such as a computer, mobile phone, or printer) differs, the similarity of the intent-similar queries for the input query also differs.
In addition, those skilled in the art will appreciate that any of the above ways of generating intent-similar queries can be combined in any manner.
Referring back to Fig. 4, in step S3120, a group of intents is mined for each intent-similar query using a prior-art method, where each intent provides a subtopic of the corresponding intent-similar query.
Next, in step S3130, a similar-intent information description set is determined using all the intent groups of the intent-similar queries. A similar-intent information description is the linguistic form of the corresponding intent type. For example, as shown in Table 2, the intent type of "becoming a paralegal class" is "course", but in the present invention there is no need to identify the intent type of the intent; it is only necessary to extract the similar-intent information description of the intent. For example, for "becoming a paralegal class", "* class" is extracted.
A similar-intent information description can be instantiated with the input query; for example, from "becoming a engineer class" and "steps on becoming a lawyer", the descriptions "becoming a paralegal class" and "steps on becoming a paralegal" are generated. In addition, the similar-intent information description can be represented as a regular expression over the input query. For example, an intent of the query "becoming a paralegal" is "becoming a paralegal class", and an intent of the intent-similar query "becoming a engineers" is "becoming a engineer class"; therefore, the similar-intent information description can be expressed as "* class".
According to one embodiment of the present invention, the similar-intent information description set can be determined by the following steps: analyzing the linguistic form of each intent in all the intent groups of the intent-similar queries; determining at least one query-intent relation between the linguistic form of the corresponding intent-similar query contained in that linguistic form and the remaining linguistic form; transforming the linguistic form of each intent into a regular expression according to the determined at least one query-intent relation; and adding the resulting regular expression to the similar-intent information description set.
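The transformation of intents into wildcard descriptions can be sketched as follows, using the "* class" example from the text. Plain substring replacement stands in for the linguistic analysis, and the function names and sample intent groups are hypothetical.

```python
def to_description(intent, query, wildcard="*"):
    """Replace the intent-similar query inside an intent with a wildcard.

    Returns None when the intent does not contain the query.
    """
    if query not in intent:
        return None
    # Replace the first occurrence and normalize whitespace.
    return " ".join(intent.replace(query, wildcard, 1).split())

def description_set(intent_groups):
    """Build the description set from a mapping of query to mined intents."""
    descriptions = set()
    for query, intents in intent_groups.items():
        for intent in intents:
            description = to_description(intent, query)
            # Skip intents that are identical to the query itself.
            if description is not None and description != "*":
                descriptions.add(description)
    return descriptions

# Toy intent groups (hypothetical), one per intent-similar query.
groups = {
    "becoming a engineer": ["becoming a engineer class"],
    "becoming a lawyer": ["steps on becoming a lawyer", "law school"],
}
found = description_set(groups)
```

An intent that does not contain its query, such as "law school" above, yields no description in this simplified form.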
Preferably, determining the similar-intent information description set may further include expanding each intent group, which includes: for each intent in the intent group, generating a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, where the at least one word is not in the corresponding intent-similar query; and adding the generated synonymous phrase to the intent group.
Similar-intent information descriptions can be of several types, such as lexical-type, syntactic-type, semantic-type, and logical-type similar-intent information descriptions.
According to an embodiment of the present invention, any one or more of lexical analysis, syntactic analysis, semantic-relation analysis, and logical analysis can be performed (in any order) on each intent in all the intent groups of the intent-similar queries, and the resulting similar-intent information descriptions can be combined to determine the similar-intent information description set.
Fig. 11 is a flowchart of a method for determining the similar-intent information description set by lexical analysis according to an embodiment of the present invention.
As shown in Fig. 11, first, in step S4100, each intent in all the intent groups of the intent-similar queries is parsed by lexical analysis to detect whether the corresponding intent-similar query satisfies at least one lexical rule. If the corresponding intent-similar query satisfies at least one lexical rule, then, in step S4110, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query contained in the linguistic form of the intent is replaced with a wildcard, yielding a transformed intent. Next, in step S4120, a lexical-type similar-intent information description in the form of words and a wildcard is determined: the transformed intent is taken as the lexical-type similar-intent information description, this description is used as the regular expression, and the regular expression is added to the similar-intent information description set.
For example, if the input query is "scooter", the following example lexical-type similar-intent information descriptions can be generated:
*store
electronic*
online
cheap*
*motor
Fig. 12 is a flowchart of a method for determining the similar-intent information description set by syntactic analysis according to an embodiment of the present invention.
As shown in Fig. 12, first, in step S4200, each intent in all the intent groups of the intent-similar queries is parsed by syntactic analysis to detect whether the corresponding intent-similar query satisfies at least one syntactic rule. If the corresponding intent-similar query satisfies at least one syntactic rule, then, in step S4210, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query contained in the linguistic form of the intent is replaced with a wildcard, yielding a transformed intent. Next, in step S4220, a syntactic-type similar-intent information description in the form of a syntactic rule and a wildcard is determined: the transformed intent is taken as the syntactic-type similar-intent information description, this description is used as the regular expression, and the regular expression is added to the similar-intent information description set.
For example, for the input query "scooter", the following example syntactic-type similar-intent information descriptions can be generated:
*/prep/kids
how to/verb/*
*/prep/sale
Fig. 13 is a flowchart of a method for determining the similar-intent information description set by semantic-relation analysis according to an embodiment of the present invention.
As shown in Fig. 13, first, in step S4300, each intent in all the intent groups of the intent-similar queries is parsed by semantic-relation analysis to detect whether the corresponding intent-similar query satisfies at least one semantic relation. If the corresponding intent-similar query satisfies at least one semantic relation, then, in step S4310, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query contained in the linguistic form of the intent is replaced with a wildcard, and the remaining linguistic form of the intent is replaced with its semantic tag, yielding a transformed intent. Next, in step S4320, a semantic-type similar-intent information description in the form of a semantic tag and a wildcard is determined: the transformed intent is taken as the semantic-type similar-intent information description, this description is used as the regular expression, and the regular expression is added to the similar-intent information description set.
For example, for the input query "scooter", the following example semantic-type similar-intent information descriptions can be generated:
*<brand>
*<company>
Fig. 14 is a flowchart of a method for determining the similar-intent information description set by logical analysis according to an embodiment of the present invention.
As shown in Fig. 14, first, in step S4400, each intent in all the intent groups of the intent-similar queries is parsed by logical analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation. If the corresponding intent-similar query satisfies at least one logical relation, then, in step S4410, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query contained in the linguistic form of the intent is replaced with a wildcard, and the remaining linguistic form of the intent is replaced with its logical type, yielding a transformed intent. Next, in step S4420, a logical-type similar-intent information description in the form of a logical type and a wildcard is determined: the transformed intent is taken as the logical-type similar-intent information description, this description is used as the regular expression, and the regular expression is added to the similar-intent information description set.
For example, for the input query "scooter", the following example logical-type similar-intent information descriptions can be generated:
*[version of](Word)
(Word)[place of]*
As stated above, any one or more of lexical analysis, syntactic analysis, semantic-relation analysis, and logical analysis can be performed on each intent in all the intent groups of the intent-similar queries. For example, only a single one of these four analyses may be performed on each intent, or all four may be performed on each intent. Accordingly, the resulting similar-intent information description set may include one or more of lexical-type, syntactic-type, semantic-type, and logical-type similar-intent information descriptions.
In addition, in one embodiment, determining the similar-intent information description set may further include: calculating a confidence for each similar-intent information description in the set; and selecting from the set either a specific number of similar-intent information descriptions with the highest confidence, or the similar-intent information descriptions whose confidence exceeds a predetermined threshold.
In addition, the confidence can be calculated using at least one of the following: the frequency of the similar-intent information description; the coverage of the similar-intent information description; and the relevance of the similar-intent information description to the input query.
In addition, the confidence can be calculated from at least one of the following: the similar-intent information description set; a prepared intent training set; and prepared domain information.
In addition, calculating the confidence of a similar-intent information description from the similar-intent information description set may further include: assigning different weights to the corresponding similar-intent information descriptions in the set according to the popularity of the intent-similar queries; and/or assigning different weights to the corresponding similar-intent information descriptions in the set according to the similarity between the intent-similar queries and the input query.
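A weighted frequency is one simple way to combine these ideas. The sketch below assumes that the confidence of a description is the sum of the weights of the intent-similar queries whose intent groups produced it, divided by the total weight. This normalization, the function name `confidences`, and the weights and descriptions in the sample data are all hypothetical.

```python
def confidences(descriptions_by_query, query_weights):
    """Weighted frequency of each description over the intent-similar queries.

    descriptions_by_query maps each intent-similar query to the set of
    descriptions derived from its intent group; query_weights maps each query
    to a weight (for example its similarity to the input query).
    """
    total = sum(query_weights.get(q, 1.0) for q in descriptions_by_query)
    if not total:
        return {}
    scores = {}
    for query, descriptions in descriptions_by_query.items():
        weight = query_weights.get(query, 1.0)
        for description in set(descriptions):
            scores[description] = scores.get(description, 0.0) + weight
    return {d: s / total for d, s in scores.items()}

# Toy data (hypothetical): queries closer to the user's context get higher weights.
by_query = {
    "stanford university": {"* admissions", "* tuition"},
    "harvard university": {"* admissions"},
    "grand canyon university": {"* tuition"},
}
weights = {"stanford university": 2.0, "harvard university": 2.0, "grand canyon university": 1.0}
conf = confidences(by_query, weights)
```

Descriptions can then be kept by highest confidence or by a threshold, in the same way as the intent-similar queries are selected.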
Take again the earlier query "university of phoenix" as an example. For the user in Beijing, because Stanford University, Harvard University, Massachusetts Institute of Technology, and University of Pennsylvania have high similarity, these intent-similar queries can be assigned higher weights. Table 6 shows the weights assigned to each intent-similar query of "university of phoenix".
Table 6: Example weights of the intent-similar queries of "university of phoenix"
Therefore, for the input query "university of phoenix", similar-intent information descriptions of the form "university of *" obtain a higher weight.
Referring back to Fig. 4, next, in step S3140, intents for the input query are mined using the similar-intent information description set. In one embodiment, intents for the input query can be generated by replacing the wildcard in each similar-intent information description in the set with the input query. For example, if the input query is "becoming a paralegal" and the similar-intent information description is "step to *", a new intent "step to becoming a paralegal" can be generated, and the generated intent can be output.
Fig. 15 is another flowchart of a method for intent mining using intent-similar queries according to an embodiment of the present invention. The method shown in Fig. 15 combines a prior-art intent mining method with the method according to the present invention to achieve more accurate intent mining. For brevity, detailed descriptions of steps in this embodiment that are identical to those of the embodiment described with reference to Fig. 4 are omitted.
As shown in Fig. 15, first, in step S5100, the query input by the user is obtained. Those skilled in the art will understand that the query input by the user can be in various languages, including but not limited to Chinese, English, Japanese, Korean, German, French, Russian, and Arabic. For example, the query input by the user can be "becoming a paralegal".
Next, in step S5110, a group of intent candidates for the input query is mined from global external resources such as search engines, Wikipedia, and query logs, using methods well known in the prior art. Next, in step S5120, duplicate intent candidates are removed from the intent candidates obtained. Next, in step S5130, the intent candidates are sorted to obtain a first group of intents.
Table 7 shows the first group of intents obtained, using known prior-art methods, for the query "becoming a paralegal" input by the user.
Table 7: First group of intents obtained for "becoming a paralegal"
With continued reference to Fig. 15, a second group of intents for the input query is mined using the method of the present invention. That is, in step S3110, intent-similar queries are generated for the input query, where each intent-similar query has an intent type identical or similar to that of the input query. In step S3120, a group of intents is mined for each intent-similar query, where each intent provides a subtopic of the corresponding intent-similar query. In step S3130, a similar-intent information description set is determined using all the intent groups of the intent-similar queries. In step S3140, the second group of intents for the input query is mined using the similar-intent information description set.
Next, in step S5140, the combination of the first group of intents and the second group of intents is sorted. In one embodiment, intents that appear only in the first group of intents can be deleted.
In another embodiment, the second group of intents for the input query can also be mined using both the similar-intent information description set and the first group of intents. One such embodiment includes: generating at least one intent by replacing, with the input query, the wildcard in at least one similar-intent information description in the set, where the at least one intent is not in the first group of intents; and adding the at least one generated intent to the first group of intents, the first group of intents with the at least one generated intent added serving as the second group of intents.
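The wildcard substitution and the merge with the first group of intents can be sketched together as follows. The function name `second_group` and the sample first group are hypothetical, and the order in which new intents are appended is a simplification; the text sorts the intents in a separate step.

```python
def second_group(first_group, input_query, descriptions, wildcard="*"):
    """Extend the first group with intents generated from wildcard descriptions."""
    merged = list(first_group)
    seen = set(first_group)
    for description in descriptions:
        if wildcard not in description:
            # A description without a wildcard cannot be instantiated.
            continue
        # Replace the wildcard with the input query to form a new intent.
        intent = description.replace(wildcard, input_query)
        if intent not in seen:
            seen.add(intent)
            merged.append(intent)
    return merged

# Toy first group (hypothetical) and descriptions taken from the examples in the text.
first = ["becoming a paralegal class", "paralegal salary"]
merged = second_group(first, "becoming a paralegal", ["* class", "step to *"])
```

Here "* class" produces an intent already present in the first group, so only "step to becoming a paralegal" is added.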
However, some queries may have peculiar intents that are not present among the intents of the intent-similar queries. In some embodiments of the present invention, these peculiar intents receive special handling. For example, for the input query "last supper painting", the first group of intents for the input query is shown in Table 8.
Table 8: First group of intents for "last supper painting"
As can be seen from Table 8, when the intent-similar queries concern other oil paintings by Leonardo da Vinci, "last supper painting Jesus" and "Last Supper Painting Milan Italy" are peculiar to this query. In the intent mining process, it is desirable to retain these peculiar intents. Therefore, according to an embodiment of the present invention, another way of mining the second group of intents for the input query using the similar-intent information description set and the first group of intents includes: sorting the first group of intents for the input query using the similar-intent information description set. This embodiment further includes: identifying the peculiar intents in the first group of intents for the input query; and raising the weight of each peculiar intent in the sorting according to its degree of peculiarity, where the degree of peculiarity is calculated from at least one of the following: the co-occurrence rate of the input query and the peculiar intent in a prepared intent training set; the relation between the input query and the peculiar intent in domain knowledge; the frequency of the peculiar intent in muster data; and the popularity of the peculiar intent in the query log.
Continuing with Figure 15, in step S5150 the intents are output according to the user's requirements. For example, a specified number of intents may be output. Table 9 shows the intents output for the input query “becoming a paralegal”. Compared against the ground-truth data (reference answers) shown in Table 2, the result obtained meets the user's needs better than the first group of results obtained by the prior art.
Table 9: Intents output by the method of the present invention for the input query “becoming a paralegal”
The inventors compared the method of Figure 15 according to the present invention with prior-art methods. Testing showed that the method shown in Figure 1 performs best among the prior-art methods. The method shown in Figure 1 was therefore selected as the baseline for comparison with the method of the invention.
With the prior-art method shown in Figure 1, intent candidates for a query are mined from global external resources such as search engines, Wikipedia, query logs, and anchor text, and the intent candidates are ranked by frequency of occurrence.
For comparison, intent mining is performed using the method shown in Figure 15 according to the present invention, and the intent candidates are ranked by frequency of occurrence. The inventors tested 50 queries, including “furniture for small spaces”, “Churchill downs”, “becoming a paralegal”, “internet phone service”, “Arkansas”, “battles in the civil war”, “hobby stores”, and “Ontario California airport”. Table 10 shows the averaged test results.
Metric | Prior art | Present invention | Improvement |
I-rec | 0.3785 | 0.3933 | 0.0148 |
D-nDCG | 0.3384 | 0.3715 | 0.0331 |
D#-nDCG | 0.3584 | 0.3826 | 0.0242 |
Table 10: Performance comparison of the present invention and the prior art
As can be seen from Table 10, compared with the prior-art method, the method of Figure 15 according to the present invention improves both intent recall and intent accuracy. In addition, in terms of D#-nDCG, the method of the invention improves on the prior-art method by 0.0242, that is, 2.42 percentage points.
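The figures in Table 10 can be checked with a few lines of arithmetic. The sketch below assumes the commonly used definition of D#-nDCG as a linear blend of I-rec and D-nDCG with weight gamma = 0.5; this definition is not stated in the text, but the averages in Table 10 agree with it to within about 0.0002. The function name `d_sharp_ndcg` is illustrative.

```python
def d_sharp_ndcg(i_rec, d_ndcg, gamma=0.5):
    """D#-nDCG as a linear blend of intent recall and D-nDCG (assumed definition)."""
    return gamma * i_rec + (1.0 - gamma) * d_ndcg


# Average results copied from Table 10.
prior_art = {"I-rec": 0.3785, "D-nDCG": 0.3384, "D#-nDCG": 0.3584}
invention = {"I-rec": 0.3933, "D-nDCG": 0.3715, "D#-nDCG": 0.3826}

# Absolute gain is the difference in score; relative gain divides by the baseline.
absolute_gain = invention["D#-nDCG"] - prior_art["D#-nDCG"]
relative_gain = absolute_gain / prior_art["D#-nDCG"]

print(f"absolute gain: {absolute_gain:.4f}")
print(f"relative gain: {relative_gain:.2%}")
```

The absolute gain of 0.0242 corresponds to 2.42 percentage points; relative to the prior-art score of 0.3584, the gain is about 6.75%.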
To illustrate the effect of the present invention more intuitively, the input query “becoming a paralegal” is examined in detail as an example. For the input “becoming a paralegal”, the top 10 results output by the present invention and by the prior art are compared. Table 11 shows the desired ground-truth data. Table 12 shows the respective outputs of the prior art and the present invention. Table 13 shows the test comparison between the prior art and the present invention; the results obtained by the present invention are clearly more accurate. That is, the present invention can improve the accuracy of intent mining.
Table 11: Desired ground-truth data
Table 12: Respective outputs of the prior art and the present invention
Metric | Prior art | Present invention |
I-rec | 0.1111 | 0.3333 |
D-nDCG | 0.0734 | 0.5053 |
D#-nDCG | 0.0922 | 0.4193 |
Table 13: Test comparison of the prior art and the present invention
The comparative experiments above further confirm that, compared with the prior art, the present invention performs intent mining more accurately and improves intent recall.
Figure 16 shows a flowchart of a method for information retrieval according to an embodiment of the present invention. As shown in Figure 16, in step S6100, an input query expressed by a user in natural language is received. Next, in step S6110, intent mining is performed on the input query according to the method described herein that uses intent-similar queries. Next, in step S6120, search results for the mined intents are obtained.
Figure 17 shows a flowchart of a method for question-answering assistance according to an embodiment of the present invention. As shown in Figure 17, in step S6200, an input query expressed by a user in natural language is received. Next, in step S6210, topics are mined from the input query according to the method described herein that uses intent-similar queries. Next, in step S6220, answers for the mined topics are obtained.
Figure 18 shows a functional block diagram of an apparatus 7000 for intent mining according to an embodiment of the present invention. All functional modules of the apparatus 7000 (that is, the various units included in the apparatus 7000, whether or not they are shown in the figure) may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the invention. Those skilled in the art will understand that the functional modules described in Figure 18 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
As shown in Figure 18, according to one aspect of the present invention, the apparatus 7000 for intent mining may include: an input query acquisition unit 7100, an intent-similar query generation unit 7200, a first intent mining unit 7300, a similar intent information description set determination unit 7400, and a second intent mining unit 7500. The input query acquisition unit 7100 is configured to obtain an input query. The intent-similar query generation unit 7200 is configured to generate intent-similar queries for the input query, wherein each intent-similar query has an intent type identical or similar to that of the input query. The first intent mining unit 7300 is configured to mine a group of intents for each intent-similar query, wherein each intent provides a sub-topic of the corresponding intent-similar query. The similar intent information description set determination unit 7400 is configured to determine a similar intent information description set by using all intent groups of the intent-similar queries. The second intent mining unit 7500 is configured to mine intents for the input query by using the similar intent information description set.
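The data flow between units 7200 to 7500 can be sketched as a simple pipeline. This is only an illustration of how the units hand results to one another: the function `mine_intents`, its parameter names, and the toy stand-ins for each unit are hypothetical, and a real system would back each stage with the data sources described elsewhere in this document.

```python
def mine_intents(input_query, generate_similar, mine_group,
                 build_descriptions, apply_descriptions):
    """Chain the stages of the apparatus; each stage is passed in as a callable."""
    similar_queries = generate_similar(input_query)          # unit 7200
    groups = {q: mine_group(q) for q in similar_queries}     # unit 7300
    descriptions = build_descriptions(groups)                # unit 7400
    return apply_descriptions(input_query, descriptions)     # unit 7500


# Toy stand-ins so that the pipeline can run end to end (hypothetical data).
def toy_generate_similar(query):
    return ["Mona Lisa"]


def toy_mine_group(query):
    return [query + " history"]


def toy_build_descriptions(groups):
    # Replace the intent-similar query inside each intent with a wildcard.
    return sorted({intent.replace(q, "*")
                   for q, intents in groups.items() for intent in intents})


def toy_apply_descriptions(query, descriptions):
    # Substitute the input query for the wildcard.
    return [d.replace("*", query) for d in descriptions]


result = mine_intents("last supper painting", toy_generate_similar,
                      toy_mine_group, toy_build_descriptions,
                      toy_apply_descriptions)
```

Passing the stages as callables mirrors the statement that the functional modules may be combined, divided, or replaced without changing the overall flow.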
In one embodiment, the intent-similar query generation unit 7200 may include: a query-pair phrase acquisition unit, which obtains one or more query-pair phrases from at least one data source, wherein each query-pair phrase includes the input query, an intent-similarity indicator, and a third phrase; and a third phrase extraction unit, which extracts the third phrase from each query-pair phrase as an intent-similar query.
In one embodiment, the intent-similarity indicator includes at least one of the following: a parallel relation indicator, wherein the two phrases connected by the parallel relation indicator serve as the same grammatical element in a sentence; a comparative relation indicator, wherein a first phrase in a sentence is in a comparative relation with a second phrase that is connected after the first phrase by the comparative relation indicator; and an alternative relation indicator, wherein the two phrases connected by the alternative relation indicator form an expression of alternatives in a sentence.
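Extraction of the third phrase from query-pair phrases might look like the sketch below. The indicator words (“and”, “vs”, “versus”, “or”) are assumed examples, since the text names only the indicator types, and the capture rule (take the words up to the next punctuation mark) is a deliberate simplification. The function name and the sample sentences are hypothetical.

```python
import re

# Assumed example words for each indicator type; the source names only the types.
INDICATORS = {
    "parallel": ["and"],
    "comparative": ["vs", "versus"],
    "alternative": ["or"],
}


def extract_intent_similar_queries(input_query, sentences):
    """Return the phrases that follow '<input query> <indicator>' in the sentences."""
    words = [w for group in INDICATORS.values() for w in group]
    pattern = re.compile(
        r"\b" + re.escape(input_query) + r"\s+(?:" + "|".join(words) + r")\s+"
        r"([\w ]+?)\s*(?=[,.;!?]|$)",
        re.IGNORECASE,
    )
    found = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            phrase = match.group(1)
            # Keep the first spelling of each phrase, ignoring case.
            if phrase.lower() not in {p.lower() for p in found}:
                found.append(phrase)
    return found


sentences = [
    "Compare the Mona Lisa vs The Last Supper.",
    "Tickets for mona lisa and venus de milo, booked online",
]
similar = extract_intent_similar_queries("Mona Lisa", sentences)
```

In practice the third phrase would need proper phrase-boundary detection, for example with a chunker, rather than a punctuation cut-off.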
In one embodiment, when the input query is a multi-word query, the intent-similar query generation unit 7200 may include: a core intent part and modifier part identification unit, which identifies the core intent part and the modifier part of the input query; and an intent-similar phrase generation unit, which generates intent-similar phrases for the core intent part of the input query as the intent-similar queries.
In one embodiment, when the input query is a multi-word query, the intent-similar query generation unit 7200 may include: a core intent part and modifier part identification unit, which identifies the core intent part and the modifier part of the input query; and a modifier part replacement unit, which generates the intent-similar queries by replacing the modifier part of the input query with each of a plurality of substitute parts, wherein each substitute part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type identical or similar to that of the modifier part of the input query.
In one embodiment, the core intent part and modifier part identification unit may include: an input query parsing unit, which parses the input query and divides it into a plurality of semantic units; a temporary intent-similar query generation unit, which, for each divided semantic unit of the input query, generates temporary intent-similar queries formed by the divided semantic unit and a changed part, wherein the changed part is an intent-similar phrase generated for the other semantic units of the input query; a third intent mining unit, which, for each divided semantic unit of the input query, mines a group of intents for each temporary intent-similar query, wherein each intent provides a sub-topic of the corresponding temporary intent-similar query; a consistency degree calculation unit, which, for each divided semantic unit of the input query, calculates a consistency degree by comparing the intent groups of the temporary intent-similar queries of the corresponding semantic unit, wherein the consistency degree is a measure of the intent similarity of the temporary intent-similar queries of the corresponding semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the consistency degree; and a core intent part determination unit, which determines the semantic unit having the highest consistency degree in the input query as the core intent part of the input query, and determines the other semantic units as the modifier part of the input query.
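One way to picture the consistency degree is as the overlap between the intent types of the temporary intent-similar queries. The sketch below is an assumption-laden illustration: it treats the “intent type” as whatever remains of an intent after the query text is removed, and it uses mean pairwise Jaccard overlap as the consistency degree; neither choice is specified in the text. All mined intents shown are hypothetical.

```python
from itertools import combinations


def intent_types(query, intents):
    """Strip the query text from each intent, leaving the sub-topic words."""
    return {intent.replace(query, "").strip() for intent in intents}


def consistency(groups):
    """Mean pairwise Jaccard overlap of intent types across temporary queries.

    groups: temporary intent-similar query -> list of mined intents.
    """
    type_sets = [intent_types(q, intents) for q, intents in groups.items()]
    pairs = list(combinations(type_sets, 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) for a, b in pairs if a | b) / len(pairs)


def core_intent_part(groups_by_unit):
    """Pick the semantic unit whose temporary queries agree most on intent types."""
    return max(groups_by_unit, key=lambda unit: consistency(groups_by_unit[unit]))


# Hypothetical data for the input query "Ontario California airport".
groups_by_unit = {
    # "airport" is kept and the other unit is varied.
    "airport": {
        "Denver airport": ["Denver airport parking", "Denver airport flights"],
        "Boston airport": ["Boston airport parking", "Boston airport flights"],
    },
    # "Ontario California" is kept and the other unit is varied.
    "Ontario California": {
        "Ontario California hotels": ["Ontario California hotels cheap",
                                      "Ontario California hotels map"],
        "Ontario California weather": ["Ontario California weather forecast",
                                       "Ontario California weather radar"],
    },
}
core = core_intent_part(groups_by_unit)
```

Here the queries that keep “airport” share the same intent types, so “airport” has the higher consistency degree and is chosen as the core intent part.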
In one embodiment, the intent-similar query generation unit may include at least one of the following: a unit that obtains one or more queries as the intent-similar queries from an intent-similar query library in which the input query is stored; a unit that obtains one or more sibling nodes of the input query in a domain ontology as the intent-similar queries; a unit that obtains neighboring concepts of the input query in a language dictionary as the intent-similar queries; and a unit that obtains one or more queries from a query log as the intent-similar queries by calculating intent similarity based on click-through data associated with the input query.
In one embodiment, the intent-similar query generation unit may further include: a similarity calculation unit, which calculates the similarity between each of the intent-similar queries and the input query; and an intent-similar query selection unit, which selects, from the intent-similar queries, a specified number of intent-similar queries with the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold.
In one embodiment, the similarity between each of the intent-similar queries and the input query may be calculated from at least one of the following: the consistency degree between the query and the input query; the lexical similarity between the query and the input query; the syntactic similarity between the query and the input query; the semantic similarity between the query and the input query; the contextual similarity of the query and the input query in a prepared corpus; the co-occurrence rate of the query and the input query in a query log; the distance between the query and the input query in a domain ontology; and the similarity between the click-through data of the query and that of the input query.
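The selection of intent-similar queries by similarity can be sketched with one of the listed measures. The example below uses lexical similarity only and assumes it is computed as the Jaccard overlap of the two queries' word sets, which is one common choice and not necessarily the one intended here. The function names and candidate queries are hypothetical.

```python
def lexical_similarity(a, b):
    """Jaccard overlap of the lower-cased word sets of two queries (assumed measure)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def select_similar(input_query, candidates, top_k=None, threshold=None):
    """Keep the top_k most similar candidates and/or those above the threshold."""
    scored = sorted(
        ((lexical_similarity(input_query, c), c) for c in candidates),
        key=lambda pair: (-pair[0], pair[1]),
    )
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] > threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [candidate for _, candidate in scored]


candidates = ["becoming a nurse", "becoming a lawyer online", "hobby stores"]
best_one = select_similar("becoming a paralegal", candidates, top_k=1)
above_threshold = select_similar("becoming a paralegal", candidates, threshold=0.3)
```

Several of the listed measures could be combined, for example as a weighted sum, before the same top-k or threshold selection is applied.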
In one embodiment, the similarity between each of the intent-similar queries and the input query may be calculated using at least one item of real-world information, the real-world information including at least: time, location, user model, and environment.
In one embodiment, the similar intent information description may be presented as a regular expression over the input query.
In one embodiment, the similar intent information description set determination unit 7400 may include: a linguistic form analysis unit, which analyzes the linguistic form of each intent in all intent groups of the intent-similar queries; a query-intent relation determination unit, which determines at least one query-intent relation between the linguistic form of the corresponding intent-similar query within that linguistic form and the remaining linguistic forms; a regular expression conversion unit, which converts the linguistic form of each intent corresponding to the determined at least one query-intent relation into a regular expression; and a regular expression adding unit, which adds the regular expression obtained by the conversion to the similar intent information description set.
In one embodiment, the similar intent information description set determination unit 7400 may further include: an intent group expansion unit, which expands each intent group and includes: a synonymous phrase generation unit, which, for each intent in the intent group, generates a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, wherein the at least one word does not appear in the corresponding intent-similar query; and a synonymous phrase adding unit, which adds the generated synonymous phrase to the intent group.
In one embodiment, the similar intent information description set determination unit 7400 may further include: a first intent parsing unit, which parses each intent in all intent groups of the intent-similar queries by lexical analysis to detect whether the corresponding intent-similar query satisfies at least one lexical rule; a first wildcard replacement unit, which, if the corresponding intent-similar query satisfies at least one lexical rule, replaces, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent; a first regular expression generation unit, which takes the transformed intent as a lexical-type similar intent information description in the form of vocabulary and a wildcard, and uses the lexical-type similar intent information description as a regular expression; and a first regular expression adding unit, which adds the regular expression to the similar intent information description set.
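The wildcard replacement for lexical-type descriptions can be illustrated as follows. This is a minimal sketch under simplifying assumptions: the lexical rule check is reduced to a plain substring test, `*` is used as the wildcard symbol, and the function names and mined intents are hypothetical.

```python
import re

WILDCARD = "*"  # assumed wildcard symbol


def to_description(similar_query, intent):
    """Replace the intent-similar query inside an intent with the wildcard."""
    if similar_query not in intent:
        return None  # stand-in for "no lexical rule is satisfied"
    return intent.replace(similar_query, WILDCARD)


def description_to_regex(description):
    """Compile a description into a regular expression; the wildcard becomes a group."""
    literal_parts = [re.escape(part) for part in description.split(WILDCARD)]
    return re.compile("^" + "(.+)".join(literal_parts) + "$")


# Hypothetical intent group mined for the intent-similar query "Mona Lisa".
groups = {"Mona Lisa": ["Mona Lisa painting history", "where is Mona Lisa displayed"]}

description_set = []
for query, intents in groups.items():
    for intent in intents:
        description = to_description(query, intent)
        if description is not None and description not in description_set:
            description_set.append(description)

pattern = description_to_regex(description_set[1])
match = pattern.match("where is The Scream displayed")
```

Each description keeps the literal words of the intent and leaves a slot where the query was, so the same description can later be filled with the input query.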
In one embodiment, the similar intent information description set determination unit 7400 may further include: a second intent parsing unit, which parses each intent in all intent groups of the intent-similar queries by syntactic analysis to detect whether the corresponding intent-similar query satisfies at least one syntactic rule; a second wildcard replacement unit, which, if the corresponding intent-similar query satisfies at least one syntactic rule, replaces, for each intent in the intent group of that intent-similar query, the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent; a second regular expression generation unit, which takes the transformed intent as a syntactic-type similar intent information description in the form of a syntactic rule and a wildcard, and uses the syntactic-type similar intent information description as a regular expression; and a second regular expression adding unit, which adds the regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determination unit 7400 may further include: a third intent parsing unit, which parses each intent in all intent groups of the intent-similar queries by semantic relation analysis to detect whether the corresponding intent-similar query satisfies at least one semantic relation; a third wildcard replacement unit, which, if the corresponding intent-similar query satisfies at least one semantic relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard and replaces the remaining linguistic forms of the intent with their semantic tags to obtain a transformed intent; a third regular expression generation unit, which takes the transformed intent as a semantic-type similar intent information description in the form of semantic tags and a wildcard, and uses the semantic-type similar intent information description as a regular expression; and a third regular expression adding unit, which adds the regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determination unit 7400 may further include: a fourth intent parsing unit, which parses each intent in all intent groups of the intent-similar queries by logic analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation; a fourth wildcard replacement unit, which, if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard and replaces the remaining linguistic forms of the intent with their logical types to obtain a transformed intent; a fourth regular expression generation unit, which takes the transformed intent as a logical-type similar intent information description in the form of logical types and a wildcard, and uses the logical-type similar intent information description as a regular expression; and a fourth regular expression adding unit, which adds the regular expression to the similar intent information description set.
In one embodiment, the similar intent information description set determination unit may further include: a confidence calculation unit, which calculates the confidence of each similar intent information description in the similar intent information description set; and a similar intent information description selection unit, which selects, from the similar intent information description set, a specified number of similar intent information descriptions with the highest confidence, or the similar intent information descriptions whose confidence exceeds a predetermined threshold.
In one embodiment, the confidence may be calculated using at least one of the following: the frequency of the similar intent information description; the coverage of the similar intent information description; and the relevance of the similar intent information description to the input query.
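A confidence score built from frequency and coverage might be computed as in the sketch below. The definitions used here are assumptions made for illustration: frequency is the share of all mined intents that yield a description, coverage is the share of intent-similar queries whose intent group yields it, and the two are averaged with equal weight. Relevance to the input query is omitted. The intent groups are hypothetical.

```python
from collections import Counter


def description_confidence(groups):
    """Score each wildcard description by frequency and coverage (assumed formula).

    groups: intent-similar query -> list of mined intents.
    """
    frequency = Counter()  # how many intents produce the description
    coverage = Counter()   # how many intent-similar queries produce it
    for query, intents in groups.items():
        descriptions = [i.replace(query, "*") for i in intents if query in i]
        frequency.update(descriptions)
        coverage.update(set(descriptions))
    total = sum(frequency.values())
    query_count = len(groups)
    return {
        d: 0.5 * frequency[d] / total + 0.5 * coverage[d] / query_count
        for d in frequency
    }


# Hypothetical intent groups for two intent-similar queries.
groups = {
    "Mona Lisa": ["Mona Lisa history", "Mona Lisa location"],
    "The Scream": ["The Scream history", "The Scream price"],
}
confidence = description_confidence(groups)
most_confident = max(confidence, key=confidence.get)
```

Descriptions shared by several intent-similar queries score higher than those seen for only one, which matches the intuition that shared sub-topics are more likely to apply to the input query as well.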
In one embodiment, the confidence may be calculated from at least one of the following: the similar intent information description set; a prepared intent training set; and prepared domain information.
In one embodiment, the confidence calculation unit may further include: a first weight configuration unit, which assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the popularity of the intent-similar queries; and/or a second weight configuration unit, which assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the similarity between the intent-similar queries and the input query.
In one embodiment, the second intent mining unit 7500 may include: an input query replacement unit, which produces a group of intents by replacing the wildcard in the similar intent information descriptions of the similar intent information description set with the input query.
In one embodiment, the second intent mining unit 7500 may include: a first-group intent mining unit, which mines a first group of intents for the input query from at least one data source; and a second-group intent mining unit, which mines a second group of intents for the input query by using the similar intent information description set and the first group of intents.
In one embodiment, the second-group intent mining unit may include: a unit that generates at least one intent by replacing, with the input query, the wildcard in at least one similar intent information description of the similar intent information description set, wherein the at least one intent is not in the first group of intents; and a unit that adds the generated at least one intent to the first group of intents.
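Generating the second group of intents from the descriptions and the first group can be sketched in a few lines. The sketch assumes `*` as the wildcard symbol and compares intents case-insensitively; the function name and the sample descriptions are hypothetical.

```python
def expand_first_group(input_query, first_group, descriptions):
    """Add intents generated from descriptions that the first group lacks."""
    second_group = list(first_group)
    seen = {intent.lower() for intent in first_group}
    for description in descriptions:
        # Substitute the input query for the wildcard.
        intent = description.replace("*", input_query)
        if intent.lower() not in seen:
            seen.add(intent.lower())
            second_group.append(intent)
    return second_group


first_group = ["last supper painting Jesus"]
descriptions = ["* history", "* Jesus"]
second_group = expand_first_group("last supper painting", first_group, descriptions)
```

Intents already present in the first group are left in place, so only the new sub-topics contributed by the intent-similar queries are appended.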
In one embodiment, the second-group intent mining unit may include: a ranking unit, which ranks the first group of intents for the input query by using the similar intent information description set.
In one embodiment, the second-group intent mining unit may further include: a unique intent identification unit, which identifies the unique intents in the first group of intents for the input query; and a weight changing unit, which raises the weight of each unique intent in the ranking according to its degree of uniqueness; wherein the degree of uniqueness of a unique intent is calculated from at least one of the following: the co-occurrence rate of the input query and the unique intent in a prepared intent training set; the relation between the input query and the unique intent in domain knowledge; the frequency of the unique intent in click-through data; and the popularity of the unique intent in a query log.
Figure 19 shows a functional block diagram of an apparatus 8000 for information retrieval according to an embodiment of the present invention. All functional modules of the apparatus 8000 (that is, the various units included in the apparatus 8000, whether or not they are shown in the figure) may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the invention. Those skilled in the art will understand that the functional modules described in Figure 19 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
As shown in Figure 19, the apparatus 8000 for information retrieval includes: an input query receiving unit 8100, the above-described apparatus 7000 for intent mining, and a search result obtaining unit 8200. The input query receiving unit 8100 is configured to receive an input query expressed by a user in natural language. The apparatus 7000 for intent mining is configured to perform intent mining on the input query. The search result obtaining unit 8200 is configured to obtain search results for the mined intents.
Figure 20 shows a functional block diagram of an apparatus 9000 for question-answering assistance according to an embodiment of the present invention. All functional modules of the apparatus 9000 (that is, the various units included in the apparatus 9000, whether or not they are shown in the figure) may be implemented by hardware, software, or a combination of hardware and software that carries out the principles of the invention. Those skilled in the art will understand that the functional modules described in Figure 20 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
As shown in Figure 20, the apparatus 9000 for question-answering assistance includes: an input query receiving unit 9100, the above-described apparatus 7000 for intent mining, and an answer obtaining unit 9200. The input query receiving unit 9100 is configured to receive an input query expressed by a user in natural language. The apparatus 7000 for intent mining is configured to mine topics from the input query. The answer obtaining unit 9200 is configured to obtain answers for the mined topics.
The present invention may be implemented in the following schemes:
Scheme 1: A method for intent mining, the method comprising:
obtaining an input query;
generating intent-similar queries for the input query, wherein each intent-similar query has an intent type identical or similar to that of the input query;
mining a group of intents for each intent-similar query, wherein each intent provides a sub-topic of the corresponding intent-similar query;
determining a similar intent information description set by using all intent groups of the intent-similar queries; and
mining intents for the input query by using the similar intent information description set.
Scheme 2: The method of Scheme 1, wherein generating intent-similar queries for the input query comprises:
obtaining one or more query-pair phrases from at least one data source, wherein each query-pair phrase includes the input query, an intent-similarity indicator, and a third phrase; and
extracting the third phrase from each query-pair phrase as an intent-similar query.
Scheme 3: The method of Scheme 2, wherein the intent-similarity indicator includes at least one of the following:
a parallel relation indicator, wherein the two phrases connected by the parallel relation indicator serve as the same grammatical element in a sentence;
a comparative relation indicator, wherein a first phrase in a sentence is in a comparative relation with a second phrase that is connected after the first phrase by the comparative relation indicator; and
an alternative relation indicator, wherein the two phrases connected by the alternative relation indicator form an expression of alternatives in a sentence.
Scheme 4: The method of Scheme 1 or 2, wherein, when the input query is a multi-word query, generating intent-similar queries for the input query comprises:
identifying the core intent part and the modifier part of the input query; and
generating intent-similar phrases for the core intent part of the input query as the intent-similar queries.
Scheme 5: The method of Scheme 1 or 2, wherein, when the input query is a multi-word query, generating intent-similar queries for the input query comprises:
identifying the core intent part and the modifier part of the input query; and
generating the intent-similar queries by replacing the modifier part of the input query with each of a plurality of substitute parts, wherein each substitute part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type identical or similar to that of the modifier part of the input query.
Scheme 6: The method of Scheme 4, wherein identifying the core intent part and the modifier part of the input query comprises:
parsing the input query and dividing it into a plurality of semantic units;
for each divided semantic unit of the input query, generating temporary intent-similar queries formed by the divided semantic unit and a changed part, wherein the changed part is an intent-similar phrase generated for the other semantic units of the input query;
for each divided semantic unit of the input query, mining a group of intents for each temporary intent-similar query, wherein each intent provides a sub-topic of the corresponding temporary intent-similar query;
for each divided semantic unit of the input query, calculating a consistency degree by comparing the intent groups of the temporary intent-similar queries of the corresponding semantic unit, wherein the consistency degree is a measure of the intent similarity of the temporary intent-similar queries of the corresponding semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the consistency degree; and
determining the semantic unit having the highest consistency degree in the input query as the core intent part of the input query, and determining the other semantic units as the modifier part of the input query.
Scheme 7: The method of Scheme 5, wherein identifying the core intent part and the modifier part of the input query comprises:
parsing the input query and dividing it into a plurality of semantic units;
for each divided semantic unit of the input query, generating temporary intent-similar queries formed by the divided semantic unit and a changed part, wherein the changed part is an intent-similar phrase generated for the other semantic units of the input query;
for each divided semantic unit of the input query, mining a group of intents for each temporary intent-similar query, wherein each intent provides a sub-topic of the corresponding temporary intent-similar query;
for each divided semantic unit of the input query, calculating a consistency degree by comparing the intent groups of the temporary intent-similar queries of the corresponding semantic unit, wherein the consistency degree is a measure of the intent similarity of the temporary intent-similar queries of the corresponding semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the consistency degree; and
determining the semantic unit having the highest consistency degree in the input query as the core intent part of the input query, and determining the other semantic units as the modifier part of the input query.
Scheme 8: The method of Scheme 1 or 2, wherein generating intent-similar queries for the input query comprises at least one of the following:
obtaining one or more queries as the intent-similar queries from an intent-similar query library in which the input query is stored;
obtaining one or more sibling nodes of the input query in a domain ontology as the intent-similar queries;
obtaining neighboring concepts of the input query in a language dictionary as the intent-similar queries; and
obtaining one or more queries from a query log as the intent-similar queries by calculating intent similarity based on click-through data associated with the input query.
Scheme 9: The method of scheme 1, wherein generating intent-similar queries for the input query further includes:
calculating a similarity degree between each of the intent-similar queries and the input query; and
selecting, from the intent-similar queries, a specific number of intent-similar queries with the highest similarity degree, or the intent-similar queries whose similarity degree is greater than a predetermined threshold.
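The selection step of scheme 9 can be sketched as below. The token-overlap similarity is only a placeholder for whichever similarity degree is actually used, and the names `lexical_similarity` and `select_similar_queries` are hypothetical.

```python
from typing import Callable, Iterable, List, Optional


def lexical_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased tokens (a placeholder measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def select_similar_queries(
    input_query: str,
    candidates: Iterable[str],
    top_n: Optional[int] = None,
    threshold: Optional[float] = None,
    similarity: Callable[[str, str], float] = lexical_similarity,
) -> List[str]:
    """Keep the top-N candidates and/or those scoring above a threshold."""
    scored = [(similarity(c, input_query), c) for c in candidates]
    # sorted() is stable, so ties keep their original candidate order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] > threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return [query for _, query in scored]


candidates = ["canon camera", "nikon lens", "paris hotel"]
print(select_similar_queries("nikon camera", candidates, threshold=0.2))
print(select_similar_queries("nikon camera", candidates, top_n=1))
```

Passing a different `similarity` callable lets the same selection logic work with any other similarity degree.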
Scheme 10: The method of scheme 9, wherein the similarity degree between each of the intent-similar queries and the input query is calculated by at least one of the following:
the consistency degree between the query and the input query;
the lexical similarity between the query and the input query;
the grammatical similarity between the query and the input query;
the semantic similarity between the query and the input query;
the context similarity between the query and the input query in a prepared corpus;
the co-occurrence rate of the query and the input query in a query log;
the distance between the query and the input query in a domain ontology; and
the similarity between the muster data of the query and that of the input query.
Scheme 11: The method of scheme 9, wherein the similarity degree between each of the intent-similar queries and the input query is calculated by at least one kind of real-world information, the real-world information including at least: time, location, user model and environment.
Scheme 12: The method of scheme 1, wherein the similar-intent information description is presented as a regular expression of the input query.
Scheme 13: The method of scheme 12, wherein determining the similar-intent information description set includes:
analyzing the linguistic form of each intent in all the intent groups of the intent-similar queries;
determining at least one query-intent relation between the linguistic form of the corresponding intent-similar query within that linguistic form and the remaining linguistic form;
transforming the linguistic form of each intent into a regular expression corresponding to the determined at least one query-intent relation; and
adding the regular expression obtained by the transformation to the similar-intent information description set.
Scheme 14: The method of scheme 13, wherein determining the similar-intent information description set further includes:
expanding each intent group, including:
for each intent in the intent group, generating a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, wherein the at least one word does not appear in the corresponding intent-similar query; and
adding the generated synonymous phrase to the intent group.
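A minimal sketch of the intent-group expansion in scheme 14 follows. The synonym dictionary is assumed to be supplied by the caller, and the name `expand_intent_group` is hypothetical. Words that belong to the intent-similar query itself are left untouched, as the scheme requires.

```python
from typing import Dict, List


def expand_intent_group(
    similar_query: str,
    intents: List[str],
    synonyms: Dict[str, List[str]],
) -> List[str]:
    """Add synonymous phrases for words that are not part of the query."""
    query_words = set(similar_query.lower().split())
    expanded = list(intents)
    seen = set(intents)
    for intent in intents:
        words = intent.split()
        for i, word in enumerate(words):
            if word.lower() in query_words:
                continue  # never rewrite the intent-similar query itself
            for synonym in synonyms.get(word.lower(), []):
                phrase = " ".join(words[:i] + [synonym] + words[i + 1:])
                if phrase not in seen:
                    seen.add(phrase)
                    expanded.append(phrase)
    return expanded


synonyms = {"price": ["cost"], "nikon": ["nikkor"]}
print(expand_intent_group("Nikon", ["Nikon price"], synonyms))
```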
Scheme 15: The method of scheme 13, wherein determining the similar-intent information description set further includes:
parsing each intent in all the intent groups of the intent-similar queries by morphological analysis to detect whether the corresponding intent-similar query satisfies at least one morphological rule;
if the corresponding intent-similar query satisfies at least one morphological rule, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent;
treating the transformed intent as a lexical-type similar-intent information description in the form of words and a wildcard, and using the lexical-type description as the regular expression of the similar-intent information; and
adding the regular expression to the similar-intent information description set.
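The wildcard replacement of scheme 15 can be illustrated with the sketch below. It assumes the simplest possible rule, namely that the intent-similar query appears literally (as a whole word or phrase) inside the intent; the names `to_lexical_pattern` and `build_description_set` are hypothetical, and `*` is used as the wildcard symbol.

```python
import re
from typing import Dict, List, Optional, Set

WILDCARD = "*"


def to_lexical_pattern(similar_query: str, intent: str) -> Optional[str]:
    """Replace the intent-similar query inside an intent with a wildcard."""
    pattern = re.compile(r"\b" + re.escape(similar_query) + r"\b", re.IGNORECASE)
    if not pattern.search(intent):
        return None  # the assumed rule is not satisfied for this intent
    return pattern.sub(WILDCARD, intent)


def build_description_set(intent_groups: Dict[str, List[str]]) -> Set[str]:
    """Collect the wildcard descriptions over all intent groups."""
    descriptions: Set[str] = set()
    for similar_query, intents in intent_groups.items():
        for intent in intents:
            description = to_lexical_pattern(similar_query, intent)
            if description is not None:
                descriptions.add(description)
    return descriptions


intent_groups = {
    "Nikon": ["Nikon camera price", "Nikon repair center"],
    "Sony": ["Sony camera price"],
}
print(sorted(build_description_set(intent_groups)))
```

Because the word-boundary anchors rely on `\b`, this sketch is only reliable for queries that begin and end with word characters.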
Scheme 16: The method of scheme 13, wherein determining the similar-intent information description set further includes:
parsing each intent in all the intent groups of the intent-similar queries by syntactic analysis to detect whether the corresponding intent-similar query satisfies at least one syntax rule;
if the corresponding intent-similar query satisfies at least one syntax rule, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent;
treating the transformed intent as a grammar-type similar-intent information description in the form of a syntax rule and a wildcard, and using the grammar-type description as the regular expression of the similar-intent information; and
adding the regular expression to the similar-intent information description set.
Scheme 17: The method of scheme 15, wherein determining the similar-intent information description set further includes:
parsing each intent in all the intent groups of the intent-similar queries by syntactic analysis to detect whether the corresponding intent-similar query satisfies at least one syntax rule;
if the corresponding intent-similar query satisfies at least one syntax rule, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent;
treating the transformed intent as a grammar-type similar-intent information description in the form of a syntax rule and a wildcard, and using the grammar-type description as the regular expression of the similar-intent information; and
adding the regular expression to the similar-intent information description set.
Scheme 18: The method of any one of schemes 13 and 15-17, wherein determining the similar-intent information description set further includes:
parsing each intent in all the intent groups of the intent-similar queries by semantic relation analysis to detect whether the corresponding intent-similar query satisfies at least one semantic relation;
if the corresponding intent-similar query satisfies at least one semantic relation, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replacing the remaining linguistic form with the semantic tag of the remaining linguistic form of the intent, to obtain a transformed intent;
treating the transformed intent as a semantic-type similar-intent information description in the form of semantic tags and a wildcard, and using the semantic-type description as the regular expression of the similar-intent information; and
adding the regular expression to the similar-intent information description set.
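The semantic-type transformation of scheme 18 can be sketched as follows. The semantic tagger is reduced to a lookup table supplied by the caller, which is an assumption made purely for illustration; words without a tag are kept as they are. The name `to_semantic_pattern` and the tag `<LOCATION>` are hypothetical.

```python
from typing import Dict, Optional


def to_semantic_pattern(
    similar_query: str,
    intent: str,
    semantic_tags: Dict[str, str],
) -> Optional[str]:
    """Wildcard the query and replace the remaining words with semantic tags."""
    query_tokens = similar_query.lower().split()
    if not query_tokens:
        return None
    tokens = intent.split()
    lowered = [t.lower() for t in tokens]
    n = len(query_tokens)
    # Locate the query as a contiguous token sequence inside the intent.
    for start in range(len(tokens) - n + 1):
        if lowered[start:start + n] == query_tokens:
            break
    else:
        return None

    def tag(token: str) -> str:
        return semantic_tags.get(token.lower(), token)

    before = [tag(t) for t in tokens[:start]]
    after = [tag(t) for t in tokens[start + n:]]
    return " ".join(before + ["*"] + after)


tags = {"tokyo": "<LOCATION>"}
print(to_semantic_pattern("Nikon", "Nikon store Tokyo", tags))
```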
Scheme 19: The method of any one of schemes 13 and 15-17, wherein determining the similar-intent information description set further includes:
parsing each intent in all the intent groups of the intent-similar queries by logic analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation;
if the corresponding intent-similar query satisfies at least one logical relation, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replacing the remaining linguistic form with the logical type of the remaining linguistic form of the intent, to obtain a transformed intent;
treating the transformed intent as a logic-type similar-intent information description in the form of logical types and a wildcard, and using the logic-type description as the regular expression of the similar-intent information; and
adding the regular expression to the similar-intent information description set.
Scheme 20: The method of scheme 18, wherein determining the similar-intent information description set further includes:
parsing each intent in all the intent groups of the intent-similar queries by logic analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation;
if the corresponding intent-similar query satisfies at least one logical relation, then for each intent in the intent group of that intent-similar query, replacing the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replacing the remaining linguistic form with the logical type of the remaining linguistic form of the intent, to obtain a transformed intent;
treating the transformed intent as a logic-type similar-intent information description in the form of logical types and a wildcard, and using the logic-type description as the regular expression of the similar-intent information; and
adding the regular expression to the similar-intent information description set.
Scheme 21: The method of scheme 13 or 14, wherein determining the similar-intent information description set further includes:
calculating a confidence level for each similar-intent information description in the similar-intent information description set; and
selecting, from the similar-intent information description set, a specific number of similar-intent information descriptions with the highest confidence level, or the similar-intent information descriptions whose confidence level is greater than a predetermined threshold.
Scheme 22: The method of scheme 21, wherein the confidence level is calculated using at least one of the following:
the frequency of the similar-intent information description;
the coverage rate of the similar-intent information description; and
the correlation between the similar-intent information description and the input query.
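The confidence calculation and selection of schemes 21 and 22 can be sketched as below. The schemes do not say how frequency and coverage are combined, so the equal-weight average used here is an assumption, as are the names `confidence_scores` and `select_descriptions`.

```python
from collections import Counter
from typing import Dict, List, Optional


def confidence_scores(patterns_by_query: Dict[str, List[str]]) -> Dict[str, float]:
    """Average of relative frequency and query coverage (assumed weighting)."""
    frequency = Counter(p for ps in patterns_by_query.values() for p in ps)
    coverage = Counter(p for ps in patterns_by_query.values() for p in set(ps))
    total = sum(frequency.values())
    n_queries = len(patterns_by_query)
    return {
        p: 0.5 * frequency[p] / total + 0.5 * coverage[p] / n_queries
        for p in frequency
    }


def select_descriptions(
    scores: Dict[str, float],
    top_n: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Keep the top-N descriptions and/or those above a threshold."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if threshold is not None:
        ranked = [p for p in ranked if scores[p] > threshold]
    return ranked if top_n is None else ranked[:top_n]


patterns_by_query = {"Nikon": ["* price", "* repair"], "Sony": ["* price"]}
scores = confidence_scores(patterns_by_query)
print(select_descriptions(scores, top_n=1))
```

Here frequency counts every occurrence of a description, while coverage counts the number of intent-similar queries whose intent group produced it.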
Scheme 23: The method of scheme 21, wherein the confidence level is calculated from at least one of the following:
the similar-intent information description set;
a prepared intent training set; and
prepared domain information.
Scheme 24: The method of scheme 23, wherein calculating the confidence level of a similar-intent information description from the similar-intent information description set further includes:
assigning different weights to the corresponding similar-intent information descriptions in the set according to the popularity of the intent-similar queries; and/or
assigning different weights to the corresponding similar-intent information descriptions in the set according to the similarity degree between the intent-similar queries and the input query.
Scheme 25: The method of scheme 1, wherein mining intents for the input query includes:
generating a group of intents by replacing, with the input query, the wildcard in the similar-intent information descriptions of the similar-intent information description set.
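The substitution in scheme 25 amounts to filling the wildcard with the input query. A minimal sketch, assuming `*` is the wildcard symbol and using the hypothetical name `instantiate`:

```python
from typing import Iterable, List


def instantiate(descriptions: Iterable[str], input_query: str, wildcard: str = "*") -> List[str]:
    """Replace the wildcard in each description with the input query."""
    return [d.replace(wildcard, input_query) for d in descriptions]


print(instantiate(["* price", "how to repair *"], "Canon"))
```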
Scheme 26: The method of scheme 1, wherein mining intents for the input query includes:
mining a first group of intents for the input query from at least one data source; and
mining a second group of intents for the input query by using the similar-intent information description set and the first group of intents.
Scheme 27: The method of scheme 26, wherein mining the second group of intents for the input query includes:
generating at least one intent by replacing, with the input query, the wildcard in at least one similar-intent information description of the similar-intent information description set, wherein the at least one intent is not in the first group of intents; and
adding the generated at least one intent to the first group of intents.
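The merge described in scheme 27 can be sketched as follows. Comparing intents case-insensitively is an assumption of this sketch, and the name `extend_first_group` is hypothetical.

```python
from typing import Iterable, List


def extend_first_group(
    first_group: List[str],
    descriptions: Iterable[str],
    input_query: str,
    wildcard: str = "*",
) -> List[str]:
    """Append generated intents that the first group does not contain yet."""
    merged = list(first_group)
    seen = {intent.lower() for intent in first_group}
    for description in descriptions:
        candidate = description.replace(wildcard, input_query)
        if candidate.lower() not in seen:
            seen.add(candidate.lower())
            merged.append(candidate)
    return merged


print(extend_first_group(["Canon price"], ["* price", "* repair"], "Canon"))
```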
Scheme 28: The method of scheme 26, wherein mining the second group of intents for the input query includes:
ranking the first group of intents for the input query by using the similar-intent information description set.
Scheme 29: The method of scheme 28, wherein mining the second group of intents for the input query further includes:
identifying peculiar intents in the first group of intents for the input query; and
raising the weight of each peculiar intent in the ranking according to the peculiarity degree of that peculiar intent;
wherein the peculiarity degree of a peculiar intent is calculated by at least one of the following:
the co-occurrence rate of the input query and the peculiar intent in a prepared intent training set;
the relation between the input query and the peculiar intent in domain knowledge;
the frequency of the peculiar intent in the muster data; and
the popularity of the peculiar intent in a query log.
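The ranking of schemes 28 and 29 can be sketched as below. The scoring rule (description confidence plus an additive peculiarity bonus) is an assumption, since the schemes only state that the weight of a peculiar intent is raised; the name `rank_first_group` and all numbers are hypothetical.

```python
from typing import Dict, List, Optional


def rank_first_group(
    first_group: List[str],
    input_query: str,
    description_confidence: Dict[str, float],
    peculiarity: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Rank intents by matching description confidence plus a peculiarity bonus."""
    peculiarity = peculiarity or {}

    def score(intent: str) -> float:
        # Map the intent back to its wildcard description, e.g. "* price".
        description = intent.replace(input_query, "*")
        base = description_confidence.get(description, 0.0)
        return base + peculiarity.get(intent, 0.0)

    return sorted(first_group, key=score, reverse=True)


first_group = ["Canon repair", "Canon price", "Canon EOS firmware"]
confidence = {"* price": 0.8, "* repair": 0.4}
print(rank_first_group(first_group, "Canon", confidence))
print(rank_first_group(first_group, "Canon", confidence, {"Canon EOS firmware": 0.9}))
```

Without the bonus, an intent that no similar-intent description matches sinks to the bottom; with it, an intent peculiar to the input query can still rank first.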
Scheme 30: A method for information retrieval, including:
receiving an input query expressed by a user in natural language;
performing intent mining on the input query according to the method of any one of schemes 1-29; and
obtaining search results for the mined intents.
Scheme 31: A method for question-and-answer assistance, including:
receiving an input query expressed by a user in natural language;
mining topics from the input query according to the method of any one of schemes 1-29; and
obtaining answers for the mined topics.
Scheme 32: An apparatus for intent mining, the apparatus including:
an input query acquiring unit configured to obtain an input query;
an intent-similar query generating unit configured to generate intent-similar queries for the input query, wherein each intent-similar query has an intent type identical or similar to that of the input query;
a first intent mining unit configured to mine a group of intents for each intent-similar query, wherein each intent provides a subtopic for the corresponding intent-similar query;
a similar-intent information description set determining unit configured to determine a similar-intent information description set by using all the intent groups of the intent-similar queries; and
a second intent mining unit configured to mine intents for the input query by using the similar-intent information description set.
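How the units of scheme 32 fit together can be sketched as a small pipeline. The query generator and the intent miner are passed in as callables and backed by toy dictionaries here, so everything except the overall flow is an assumption; the name `mine_intents` is hypothetical and `*` stands for the wildcard.

```python
from typing import Callable, Dict, List


def mine_intents(
    input_query: str,
    generate_similar: Callable[[str], List[str]],
    mine_group: Callable[[str], List[str]],
) -> List[str]:
    """Similar queries -> intent groups -> wildcard descriptions -> intents."""
    similar_queries = generate_similar(input_query)
    intent_groups = {q: mine_group(q) for q in similar_queries}

    descriptions: List[str] = []
    for query, intents in intent_groups.items():
        for intent in intents:
            if query in intent:
                description = intent.replace(query, "*")
                if description not in descriptions:
                    descriptions.append(description)

    return [d.replace("*", input_query) for d in descriptions]


# Toy stand-ins for the generating and mining units.
SIMILAR: Dict[str, List[str]] = {"Canon": ["Nikon", "Sony"]}
INTENTS: Dict[str, List[str]] = {
    "Nikon": ["Nikon price", "Nikon repair"],
    "Sony": ["Sony price", "Sony firmware update"],
}
result = mine_intents(
    "Canon",
    lambda q: SIMILAR.get(q, []),
    lambda q: INTENTS.get(q, []),
)
print(result)
```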
Scheme 33: The apparatus of scheme 32, wherein the intent-similar query generating unit includes:
a query-to-phrase acquiring unit configured to obtain one or more query-to-phrases from at least one data source, wherein each query-to-phrase includes: the input query, an intent-similarity indicator and a third phrase; and
a third-phrase extracting unit configured to extract the third phrase from each query-to-phrase as the intent-similar query.
Scheme 34: The apparatus of scheme 33, wherein the intent-similarity indicator includes at least one of the following:
a coordination indicator, wherein two phrases connected by the coordination indicator serve as the same syntactic element in a sentence;
a relativity indicator, wherein a first phrase in a sentence is in a relative relation with a second phrase connected after the first phrase by the relativity indicator; and
a choice-relation indicator, wherein two phrases connected by the choice-relation indicator form a selective expression in a sentence.
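The extraction of schemes 33 and 34 can be illustrated with a regular-expression sketch. The indicator words ("and", "vs", "versus", "or") are assumed English examples of the three indicator kinds and are not taken from the source; limiting the third phrase to a single word is a further simplification, and the name `extract_third_phrases` is hypothetical.

```python
import re
from typing import Iterable, List, Sequence

# Assumed example indicators: coordination, relativity, choice.
INDICATORS = ("and", "vs", "versus", "or")


def extract_third_phrases(
    input_query: str,
    texts: Iterable[str],
    indicators: Sequence[str] = INDICATORS,
) -> List[str]:
    """Find '<input query> <indicator> <third phrase>' and return the phrases."""
    alternatives = "|".join(re.escape(word) for word in indicators)
    pattern = re.compile(
        r"\b" + re.escape(input_query) + r"\s+(?:" + alternatives + r")\s+(\w+)",
        re.IGNORECASE,
    )
    found: List[str] = []
    for text in texts:
        for match in pattern.finditer(text):
            phrase = match.group(1)
            if phrase.lower() != input_query.lower() and phrase not in found:
                found.append(phrase)
    return found


texts = [
    "Canon vs Nikon lens comparison",
    "Canon or Sony for beginners",
    "Canon printer drivers",
]
print(extract_third_phrases("Canon", texts))
```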
Scheme 35: The apparatus of scheme 32 or 33, wherein, when the input query is a multi-word query, the intent-similar query generating unit includes:
a core intent part and modifier part identifying unit configured to identify the core intent part and the modifier part of the input query; and
an intent-similar phrase generating unit configured to generate intent-similar phrases for the core intent part of the input query as the intent-similar queries.
Scheme 36: The apparatus of scheme 32 or 33, wherein, when the input query is a multi-word query, the intent-similar query generating unit includes:
a core intent part and modifier part identifying unit configured to identify the core intent part and the modifier part of the input query; and
a modifier part replacing unit configured to generate the intent-similar queries by replacing the modifier part of the input query with a plurality of replacement parts, wherein each replacement part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type identical or similar to that of the modifier part of the input query.
Scheme 37: The apparatus of scheme 35, wherein the core intent part and modifier part identifying unit includes:
an input query parsing unit configured to parse the input query and divide it into a plurality of semantic units;
a temporary intent-similar query generating unit configured to generate, for each divided semantic unit of the input query, temporary intent-similar queries each consisting of the divided semantic unit and a variable part, wherein the variable part is an intent-similar phrase generated for the other semantic units of the input query;
a third intent mining unit configured to mine, for each divided semantic unit of the input query, a group of intents for each temporary intent-similar query, wherein each intent provides a subtopic for the corresponding temporary intent-similar query;
a consistency degree calculating unit configured to calculate, for each divided semantic unit of the input query, a consistency degree by comparing the intent groups of the temporary intent-similar queries of that semantic unit, wherein the consistency degree measures the intent similarity among the temporary intent-similar queries of that semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the consistency degree; and
a core intent part determining unit configured to determine the semantic unit with the highest consistency degree in the input query as the core intent part of the input query, and to determine the other semantic units as the modifier part of the input query.
Scheme 38: The apparatus of scheme 36, wherein the core intent part and modifier part identifying unit includes:
an input query parsing unit configured to parse the input query and divide it into a plurality of semantic units;
a temporary intent-similar query generating unit configured to generate, for each divided semantic unit of the input query, temporary intent-similar queries each consisting of the divided semantic unit and a variable part, wherein the variable part is an intent-similar phrase generated for the other semantic units of the input query;
a third intent mining unit configured to mine, for each divided semantic unit of the input query, a group of intents for each temporary intent-similar query, wherein each intent provides a subtopic for the corresponding temporary intent-similar query;
a consistency degree calculating unit configured to calculate, for each divided semantic unit of the input query, a consistency degree by comparing the intent groups of the temporary intent-similar queries of that semantic unit, wherein the consistency degree measures the intent similarity among the temporary intent-similar queries of that semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the consistency degree; and
a core intent part determining unit configured to determine the semantic unit with the highest consistency degree in the input query as the core intent part of the input query, and to determine the other semantic units as the modifier part of the input query.
Scheme 39: The apparatus of scheme 32 or 33, wherein the intent-similar query generating unit includes at least one of the following:
a unit configured to obtain, as the intent-similar queries, one or more queries from an intent-similar query library stored for the input query;
a unit configured to obtain, as the intent-similar queries, one or more sibling nodes of the input query in a domain ontology;
a unit configured to obtain, as the intent-similar queries, neighboring concepts of the input query in a language dictionary; and
a unit configured to obtain, as the intent-similar queries, one or more queries from a query log by calculating intent similarity based on the muster data associated with the input query.
Scheme 40: The apparatus of scheme 32, wherein the intent-similar query generating unit further includes:
a similarity degree calculating unit configured to calculate a similarity degree between each of the intent-similar queries and the input query; and
an intent-similar query selecting unit configured to select, from the intent-similar queries, a specific number of intent-similar queries with the highest similarity degree, or the intent-similar queries whose similarity degree is greater than a predetermined threshold.
Scheme 41: The apparatus of scheme 40, wherein the similarity degree between each of the intent-similar queries and the input query is calculated by at least one of the following:
the consistency degree between the query and the input query;
the lexical similarity between the query and the input query;
the grammatical similarity between the query and the input query;
the semantic similarity between the query and the input query;
the context similarity between the query and the input query in a prepared corpus;
the co-occurrence rate of the query and the input query in a query log;
the distance between the query and the input query in a domain ontology; and
the similarity between the muster data of the query and that of the input query.
Scheme 42: The apparatus of scheme 40, wherein the similarity degree between each of the intent-similar queries and the input query is calculated by at least one kind of real-world information, the real-world information including at least: time, location, user model and environment.
Scheme 43: The apparatus of scheme 32, wherein the similar-intent information description is presented as a regular expression of the input query.
Scheme 44: The apparatus of scheme 43, wherein the similar-intent information description set determining unit includes:
a linguistic form analyzing unit configured to analyze the linguistic form of each intent in all the intent groups of the intent-similar queries;
a query-intent relation determining unit configured to determine at least one query-intent relation between the linguistic form of the corresponding intent-similar query within that linguistic form and the remaining linguistic form;
a regular expression transforming unit configured to transform the linguistic form of each intent into a regular expression corresponding to the determined at least one query-intent relation; and
a regular expression adding unit configured to add the regular expression obtained by the transformation to the similar-intent information description set.
Scheme 45: The apparatus of scheme 44, wherein the similar-intent information description set determining unit further includes:
an intent group expanding unit configured to expand each intent group, including:
a synonymous phrase generating unit configured to generate, for each intent in the intent group, a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, wherein the at least one word does not appear in the corresponding intent-similar query; and
a synonymous phrase adding unit configured to add the generated synonymous phrase to the intent group.
Scheme 46: The apparatus of scheme 44, wherein the similar-intent information description set determining unit further includes:
a first intent parsing unit configured to parse each intent in all the intent groups of the intent-similar queries by morphological analysis to detect whether the corresponding intent-similar query satisfies at least one morphological rule;
a first wildcard replacing unit configured to, if the corresponding intent-similar query satisfies at least one morphological rule, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent;
a first regular expression generating unit configured to treat the transformed intent as a lexical-type similar-intent information description in the form of words and a wildcard, and to use the lexical-type description as the regular expression of the similar-intent information; and
a first regular expression adding unit configured to add the regular expression to the similar-intent information description set.
Scheme 47: The apparatus of scheme 44, wherein the similar-intent information description set determining unit further includes:
a second intent parsing unit configured to parse each intent in all the intent groups of the intent-similar queries by syntactic analysis to detect whether the corresponding intent-similar query satisfies at least one syntax rule;
a second wildcard replacing unit configured to, if the corresponding intent-similar query satisfies at least one syntax rule, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent;
a second regular expression generating unit configured to treat the transformed intent as a grammar-type similar-intent information description in the form of a syntax rule and a wildcard, and to use the grammar-type description as the regular expression of the similar-intent information; and
a second regular expression adding unit configured to add the regular expression to the similar-intent information description set.
Scheme 48: The apparatus of scheme 46, wherein the similar-intent information description set determining unit further includes:
a second intent parsing unit configured to parse each intent in all the intent groups of the intent-similar queries by syntactic analysis to detect whether the corresponding intent-similar query satisfies at least one syntax rule;
a second wildcard replacing unit configured to, if the corresponding intent-similar query satisfies at least one syntax rule, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent;
a second regular expression generating unit configured to treat the transformed intent as a grammar-type similar-intent information description in the form of a syntax rule and a wildcard, and to use the grammar-type description as the regular expression of the similar-intent information; and
a second regular expression adding unit configured to add the regular expression to the similar-intent information description set.
Scheme 49: The apparatus of any one of schemes 44 and 46-48, wherein the similar-intent information description set determining unit further includes:
a third intent parsing unit configured to parse each intent in all the intent groups of the intent-similar queries by semantic relation analysis to detect whether the corresponding intent-similar query satisfies at least one semantic relation;
a third wildcard replacing unit configured to, if the corresponding intent-similar query satisfies at least one semantic relation, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replace the remaining linguistic form with the semantic tag of the remaining linguistic form of the intent, to obtain a transformed intent;
a third regular expression generating unit configured to treat the transformed intent as a semantic-type similar-intent information description in the form of semantic tags and a wildcard, and to use the semantic-type description as the regular expression of the similar-intent information; and
a third regular expression adding unit configured to add the regular expression to the similar-intent information description set.
Scheme 50: The apparatus of any one of schemes 44 and 46-48, wherein the similar-intent information description set determining unit further includes:
a fourth intent parsing unit configured to parse each intent in all the intent groups of the intent-similar queries by logic analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation;
a fourth wildcard replacing unit configured to, if the corresponding intent-similar query satisfies at least one logical relation, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replace the remaining linguistic form with the logical type of the remaining linguistic form of the intent, to obtain a transformed intent;
a fourth regular expression generating unit configured to treat the transformed intent as a logic-type similar-intent information description in the form of logical types and a wildcard, and to use the logic-type description as the regular expression of the similar-intent information; and
a fourth regular expression adding unit configured to add the regular expression to the similar-intent information description set.
Scheme 51: The apparatus of scheme 49, wherein the similar-intent information description set determining unit further includes:
a fourth intent parsing unit configured to parse each intent in all the intent groups of the intent-similar queries by logic analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation;
a fourth wildcard replacing unit configured to, if the corresponding intent-similar query satisfies at least one logical relation, replace, for each intent in the intent group of that intent-similar query, the linguistic form of the corresponding intent-similar query within the linguistic form of the intent with a wildcard, and replace the remaining linguistic form with the logical type of the remaining linguistic form of the intent, to obtain a transformed intent;
a fourth regular expression generating unit configured to treat the transformed intent as a logic-type similar-intent information description in the form of logical types and a wildcard, and to use the logic-type description as the regular expression of the similar-intent information; and
a fourth regular expression adding unit configured to add the regular expression to the similar-intent information description set.
Scheme 52: The apparatus according to Scheme 44 or 45, wherein the similar-intent information description set determination unit further comprises:
a confidence calculation unit, which calculates the confidence of each similar-intent information description in the similar-intent information description set; and
a similar-intent information description selection unit, which selects from the similar-intent information description set either a specified number of similar-intent information descriptions having the highest confidence, or the similar-intent information descriptions whose confidence exceeds a predetermined threshold.
Scheme 53: The apparatus according to Scheme 52, wherein the confidence is calculated using at least one of the following:
the frequency of the similar-intent information description;
the coverage of the similar-intent information description; and
the relevance of the similar-intent information description to the input query.
Scheme 54: The apparatus according to Scheme 52, wherein the confidence is calculated from at least one of the following:
the similar-intent information description set;
a prepared intent training set; and
prepared domain information.
Scheme 55: The apparatus according to Scheme 54, wherein the confidence calculation unit further comprises:
a first weight configuration unit, which assigns different weights to the corresponding similar-intent information descriptions in the similar-intent information description set according to the popularity of the intent-similar queries; and/or
a second weight configuration unit, which assigns different weights to the corresponding similar-intent information descriptions in the similar-intent information description set according to the similarity between the intent-similar queries and the input query.
Scheme 56: The apparatus according to Scheme 32, wherein the second intent mining unit comprises:
an input query replacement unit, which produces a group of intents by replacing the wildcard in the similar-intent information descriptions of the similar-intent information description set with the input query.
Scheme 57: The apparatus according to Scheme 32, wherein the second intent mining unit comprises:
a first-group intent mining unit, which mines a first group of intents for the input query from at least one data source; and
a second-group intent mining unit, which mines a second group of intents for the input query by using the similar-intent information description set and the first group of intents.
Scheme 58: The apparatus according to Scheme 57, wherein the second-group intent mining unit comprises:
a unit that generates at least one intent by replacing the wildcard in at least one similar-intent information description of the similar-intent information description set with the input query, wherein the at least one intent is not in the first group of intents; and
a unit that adds the generated at least one intent to the first group of intents.
Scheme 59: The apparatus according to Scheme 57, wherein the second-group intent mining unit comprises:
a ranking unit, which ranks the first group of intents for the input query by using the similar-intent information description set.
Scheme 60: The apparatus according to Scheme 59, wherein the second-group intent mining unit further comprises:
a unique intent identification unit, which identifies a unique intent in the first group of intents for the input query; and
a weight changing unit, which increases the weight of the unique intent in the ranking according to the degree of uniqueness of the unique intent;
wherein the degree of uniqueness of the unique intent is calculated from at least one of the following:
the co-occurrence rate of the input query and the unique intent in a prepared intent training set;
the relation between the input query and the unique intent in domain knowledge;
the frequency of the unique intent in click-through data; and
the popularity of the unique intent in a query log.
Scheme 61: An apparatus for information retrieval, comprising:
an input query receiving unit, which receives an input query entered by a user in natural language;
the apparatus for intent mining according to any one of Schemes 32-60, which performs intent mining on the input query; and
a search result obtaining unit, which obtains search results for the mined intents.
Scheme 62: An apparatus for question-answering assistance, comprising:
an input query receiving unit, which receives an input query entered by a user in natural language;
the apparatus for intent mining according to any one of Schemes 32-60, which mines topics from the input query; and
an answer obtaining unit, which obtains answers for the mined topics.
Those skilled in the art will appreciate that the various embodiments of the present invention may be combined arbitrarily without departing from the scope of the invention.
The method and system of the present invention may be implemented in many ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The order of the method steps given above is for illustration only; unless specifically stated otherwise, the steps of the method of the present invention are not limited to the order described in detail above. In addition, in some embodiments the present invention may also be embodied as programs recorded on a recording medium, these programs including machine-readable instructions for implementing the method according to the invention. The present invention therefore also covers a recording medium storing a program for executing the method according to the invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art should also understand that the above embodiments may be modified without departing from the scope and spirit of the invention. The scope of the present invention is defined by the appended claims.
Claims (58)
1. A method for intent mining, the method comprising:
obtaining an input query;
generating intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query, and wherein generating the intent-similar queries for the input query comprises:
obtaining one or more query-pair phrases from at least one data source, wherein each query-pair phrase includes the input query, an intent-similar indicator, and a third phrase; and
extracting the third phrase from each query-pair phrase as an intent-similar query;
wherein the intent-similar indicator includes at least one of the following: a coordinate-relation indicator, a comparative-relation indicator, and a choice-relation indicator;
mining a group of intents for each intent-similar query, wherein each intent provides a subtopic for the corresponding intent-similar query;
determining a similar-intent information description set by using all of the intent groups of the intent-similar queries; and
mining intents for the input query by using the similar-intent information description set.
2. The method according to claim 1,
wherein two phrases connected by the coordinate-relation indicator serve as the same syntactic element in a sentence;
wherein a first phrase in a sentence and a second phrase connected after the first phrase by the comparative-relation indicator are in a comparative relation; and
wherein two phrases connected by the choice-relation indicator form an expression of choice in a sentence.
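By way of illustration only, the extraction of the third phrase from query-pair phrases may be sketched as follows. The indicator words, the function name, and the restriction of the third phrase to a single token are assumptions made for this sketch; the claims name only the three categories of indicator.

```python
import re

# Hypothetical indicator words for each category named in the claims.
INDICATORS = {
    "coordinate": ["and"],
    "comparative": ["vs", "versus", "compared with"],
    "choice": ["or"],
}


def extract_intent_similar_queries(input_query, sentences):
    """Collect the third phrase from text shaped like
    '<input query> <indicator> <third phrase>'.

    Assumption: the third phrase is the single token after the indicator.
    """
    words = sorted(
        (w for group in INDICATORS.values() for w in group),
        key=len,
        reverse=True,  # try longer indicators first
    )
    pattern = re.compile(
        r"\b" + re.escape(input_query)
        + r"\s+(?:" + "|".join(re.escape(w) for w in words) + r")"
        + r"\s+([\w-]+)",
        re.IGNORECASE,
    )
    found = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            phrase = match.group(1)
            if phrase.lower() != input_query.lower() and phrase not in found:
                found.append(phrase)
    return found
```

In practice the data source could be web text or a query log, and the third phrase would normally be delimited by a parser rather than by a single-token rule.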
3. The method according to claim 1, wherein when the input query is a multi-word query, generating the intent-similar queries for the input query comprises:
identifying a core intent part and a modifier part of the input query; and
generating intent-similar phrases for the core intent part of the input query as the intent-similar queries.
4. The method according to claim 1, wherein when the input query is a multi-word query, generating the intent-similar queries for the input query comprises:
identifying a core intent part and a modifier part of the input query; and
generating the intent-similar queries by replacing the modifier part of the input query with a plurality of substitute parts, wherein each substitute part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type that is the same as or similar to that of the modifier part of the input query.
5. The method according to claim 3 or 4, wherein identifying the core intent part and the modifier part of the input query comprises:
parsing the input query to divide it into a plurality of semantic units;
for each divided semantic unit of the input query, generating temporary intent-similar queries each composed of the divided semantic unit and a changing part, wherein the changing part is an intent-similar phrase generated for the other semantic units of the input query;
for each divided semantic unit of the input query, mining a group of intents for each temporary intent-similar query, wherein each intent provides a subtopic for the corresponding temporary intent-similar query;
for each divided semantic unit of the input query, calculating a degree of consistency by comparing the intent groups of the temporary intent-similar queries of that semantic unit, wherein the degree of consistency measures the intent similarity of the temporary intent-similar queries for that semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the degree of consistency; and
determining the semantic unit having the highest degree of consistency in the input query as the core intent part of the input query, and determining the other semantic units as the modifier part of the input query.
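The identification of the core intent part can be sketched as below, purely as an illustration. The two callables `similar_phrases` and `mine_intent_types` are hypothetical stand-ins for the phrase generation and intent mining steps, and the average pairwise Jaccard overlap is only one possible choice of consistency measure; the claim does not prescribe a formula.

```python
from itertools import combinations


def consistency(intent_type_sets):
    """Average pairwise Jaccard overlap of intent-type sets (an assumed measure)."""
    pairs = list(combinations(intent_type_sets, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        total += len(a & b) / len(union) if union else 0.0
    return total / len(pairs)


def find_core_unit(units, similar_phrases, mine_intent_types):
    """Return (core unit, modifier units) for a query split into semantic units.

    similar_phrases(unit)    -> list of intent-similar phrases (hypothetical)
    mine_intent_types(query) -> set of intent types for a query (hypothetical)
    """
    scores = {}
    for i, unit in enumerate(units):
        # Keep this unit fixed and vary each of the other units.
        temporary_queries = []
        for j, other in enumerate(units):
            if j == i:
                continue
            for phrase in similar_phrases(other):
                variant = list(units)
                variant[j] = phrase
                temporary_queries.append(" ".join(variant))
        scores[unit] = consistency([mine_intent_types(q) for q in temporary_queries])
    core = max(scores, key=scores.get)
    return core, [u for u in units if u != core]
```

The unit whose temporary queries keep yielding the same intent types is the one that drives the intent, which is why it is taken as the core.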
6. The method according to claim 1, wherein generating the intent-similar queries for the input query comprises at least one of the following:
obtaining one or more queries from an intent-similar query library of the input query as the intent-similar queries;
obtaining one or more sibling nodes of the input query in a domain ontology as the intent-similar queries;
obtaining neighboring concepts of the input query in a language dictionary as the intent-similar queries; and
obtaining one or more queries from a query log as the intent-similar queries by calculating intent similarity based on click-through data associated with the input query.
7. The method according to claim 1, wherein generating the intent-similar queries for the input query further comprises:
calculating the similarity between each of the intent-similar queries and the input query; and
selecting from the intent-similar queries either a specified number of intent-similar queries having the highest similarity, or the intent-similar queries whose similarity exceeds a predetermined threshold.
8. The method according to claim 7, wherein the similarity between each of the intent-similar queries and the input query is calculated from at least one of the following:
the degree of consistency between the query and the input query;
the lexical similarity between the query and the input query;
the syntactic similarity between the query and the input query;
the semantic similarity between the query and the input query;
the context similarity of the query and the input query in a prepared corpus;
the co-occurrence rate of the query and the input query in a query log;
the distance between the query and the input query in a domain ontology; and
the similarity of the click-through data of the query and the input query.
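As a minimal sketch of one of the listed signals, the following computes a co-occurrence rate in a query log and then selects candidates either by count or by threshold. The representation of the log as a list of per-session query sets, and the ratio used as the rate, are assumptions of this example rather than anything stated in the claims.

```python
def cooccurrence_rate(query_a, query_b, sessions):
    """Share of sessions containing both queries among sessions containing either."""
    both = sum(1 for s in sessions if query_a in s and query_b in s)
    either = sum(1 for s in sessions if query_a in s or query_b in s)
    return both / either if either else 0.0


def select_similar(input_query, candidates, sessions, top_n=None, threshold=0.0):
    """Keep the top_n most similar candidates, or those above the threshold."""
    scored = [(c, cooccurrence_rate(input_query, c, sessions)) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    if top_n is not None:
        return [c for c, _ in scored[:top_n]]
    return [c for c, score in scored if score > threshold]
```

Other listed signals, such as ontology distance or click-through similarity, could be combined with this one through a weighted sum, although the claims only require at least one of them.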
9. The method according to claim 7, wherein the similarity between each of the intent-similar queries and the input query is calculated from at least one item of real-world information, the real-world information including at least: time, location, user model, and environment.
10. The method according to claim 1, wherein the similar-intent information description is presented as a regular expression of the input query.
11. The method according to claim 10, wherein determining the similar-intent information description set comprises:
analyzing the language form of each intent in all of the intent groups of the intent-similar queries;
determining at least one query-intent relation between the language form of the corresponding intent-similar query within that language form and the remaining language form;
transforming the language form of each intent into a regular expression corresponding to the determined at least one query-intent relation; and
adding the regular expressions obtained by the transformation to the similar-intent information description set.
12. The method according to claim 11, wherein determining the similar-intent information description set further comprises:
extending each intent group, including:
for each intent in the intent group, generating a synonymous phrase by replacing at least one word of the intent with a synonym or near-synonym of that word, wherein the at least one word is not in the corresponding intent-similar query, and
adding the generated synonymous phrase to the intent group.
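The extension of an intent group by synonyms may be illustrated as follows. The synonym table is a hypothetical input supplied by the caller, and whitespace tokenization is a simplification made for this sketch.

```python
def expand_group(intents, similar_query, synonyms):
    """Add synonymous phrases to an intent group.

    Words that belong to the intent-similar query itself are never replaced.
    `synonyms` maps a lower-cased word to its synonyms (hypothetical table).
    """
    query_words = set(similar_query.lower().split())
    expanded = list(intents)
    for intent in intents:
        words = intent.split()
        for i, word in enumerate(words):
            if word.lower() in query_words:
                continue  # the query part must stay intact
            for synonym in synonyms.get(word.lower(), []):
                phrase = " ".join(words[:i] + [synonym] + words[i + 1:])
                if phrase not in expanded:
                    expanded.append(phrase)
    return expanded
```

For instance, with the table `{"price": ["cost"]}`, the group for "Nikon" containing "Nikon price" gains the phrase "Nikon cost".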
13. The method according to claim 11, wherein determining the similar-intent information description set further comprises:
parsing each intent in all of the intent groups of the intent-similar queries by lexical analysis means, to detect whether the corresponding intent-similar query satisfies at least one lexical rule;
if the corresponding intent-similar query satisfies at least one lexical rule, for each intent in the intent group of that intent-similar query, replacing the language form of the intent-similar query within the language form of the intent with a wildcard, to obtain a transformed intent;
taking the transformed intent as a lexical-type similar-intent information description in the form of words and a wildcard, and using that lexical-type description as the regular expression; and
adding the regular expression to the similar-intent information description set.
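A minimal sketch of the wildcard transformation follows. It assumes that the lexical rule is simply "the intent-similar query appears verbatim in the intent", and it uses `*` as the wildcard character; both are choices made for this example, and the function names are hypothetical.

```python
import re

WILDCARD = "*"  # assumed wildcard symbol


def to_pattern(intent, similar_query):
    """Replace the intent-similar query inside an intent with the wildcard.

    Returns None when the query does not occur in the intent.
    """
    replaced, count = re.subn(
        re.escape(similar_query), WILDCARD, intent, flags=re.IGNORECASE
    )
    return replaced if count else None


def build_description_set(intent_groups):
    """Map each wildcard pattern to the number of intents that produced it.

    `intent_groups` maps an intent-similar query to its mined intents.
    """
    patterns = {}
    for similar_query, intents in intent_groups.items():
        for intent in intents:
            pattern = to_pattern(intent, similar_query)
            if pattern is not None:
                patterns[pattern] = patterns.get(pattern, 0) + 1
    return patterns


def to_regex(pattern):
    """Compile a wildcard pattern into an anchored regular expression."""
    body = re.escape(pattern).replace(re.escape(WILDCARD), "(.+)")
    return re.compile("^" + body + "$", re.IGNORECASE)
```

The counts kept by `build_description_set` are not required by the claim; they are retained here because a later selection step can use them as a frequency signal.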
14. The method according to claim 11, wherein determining the similar-intent information description set further comprises:
parsing each intent in all of the intent groups of the intent-similar queries by syntactic analysis means, to detect whether the corresponding intent-similar query satisfies at least one syntactic rule;
if the corresponding intent-similar query satisfies at least one syntactic rule, for each intent in the intent group of that intent-similar query, replacing the language form of the intent-similar query within the language form of the intent with a wildcard, to obtain a transformed intent;
taking the transformed intent as a syntactic-type similar-intent information description in the form of a syntactic rule and a wildcard, and using that syntactic-type description as the regular expression; and
adding the regular expression to the similar-intent information description set.
15. The method according to claim 13, wherein determining the similar-intent information description set further comprises:
parsing each intent in all of the intent groups of the intent-similar queries by syntactic analysis means, to detect whether the corresponding intent-similar query satisfies at least one syntactic rule;
if the corresponding intent-similar query satisfies at least one syntactic rule, for each intent in the intent group of that intent-similar query, replacing the language form of the intent-similar query within the language form of the intent with a wildcard, to obtain a transformed intent;
taking the transformed intent as a syntactic-type similar-intent information description in the form of a syntactic rule and a wildcard, and using that syntactic-type description as the regular expression; and
adding the regular expression to the similar-intent information description set.
16. The method according to any one of claims 11 and 13-15, wherein determining the similar-intent information description set further comprises:
parsing each intent in all of the intent groups of the intent-similar queries by semantic relation analysis means, to detect whether the corresponding intent-similar query satisfies at least one semantic relation;
if the corresponding intent-similar query satisfies at least one semantic relation, for each intent in the intent group of that intent-similar query, replacing the language form of the intent-similar query within the language form of the intent with a wildcard, and replacing the remaining language form of the intent with its semantic label, to obtain a transformed intent;
taking the transformed intent as a semantic-type similar-intent information description in the form of semantic labels and a wildcard, and using that semantic-type description as the regular expression; and
adding the regular expression to the similar-intent information description set.
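The semantic-type transformation differs from the lexical one in that the remaining words are generalized to semantic labels. The sketch below illustrates this under loud assumptions: the label lexicon is hypothetical, tokens are split on whitespace, and words without a known label are left unchanged, whereas the claim speaks of replacing the remaining language form by its semantic label.

```python
# Hypothetical lexicon of semantic labels.
SEMANTIC_LABELS = {
    "tokyo": "<LOCATION>",
    "osaka": "<LOCATION>",
    "2013": "<YEAR>",
}


def to_semantic_pattern(intent, similar_query, wildcard="*"):
    """Replace the query with a wildcard and known words with semantic labels.

    Returns None when the query does not occur in the intent.
    """
    query_tokens = similar_query.lower().split()
    if not query_tokens:
        return None
    tokens = intent.split()
    output = []
    found = False
    i = 0
    while i < len(tokens):
        window = [t.lower() for t in tokens[i:i + len(query_tokens)]]
        if window == query_tokens:
            output.append(wildcard)
            found = True
            i += len(query_tokens)
        else:
            output.append(SEMANTIC_LABELS.get(tokens[i].lower(), tokens[i]))
            i += 1
    return " ".join(output) if found else None
```

With this generalization, "Nikon store Tokyo" and "Sony store Osaka" both yield the single description `* store <LOCATION>`.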
17. The method according to any one of claims 11 and 13-15, wherein determining the similar-intent information description set further comprises:
parsing each intent in all of the intent groups of the intent-similar queries by logical analysis means, to detect whether the corresponding intent-similar query satisfies at least one logical relation;
if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replacing the language form of the intent-similar query within the language form of the intent with a wildcard, and replacing the remaining language form of the intent with its logical type, to obtain a transformed intent;
taking the transformed intent as a logical-type similar-intent information description in the form of logical types and a wildcard, and using that logical-type description as the regular expression; and
adding the regular expression to the similar-intent information description set.
18. The method according to claim 16, wherein determining the similar-intent information description set further comprises:
parsing each intent in all of the intent groups of the intent-similar queries by logical analysis means, to detect whether the corresponding intent-similar query satisfies at least one logical relation;
if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replacing the language form of the intent-similar query within the language form of the intent with a wildcard, and replacing the remaining language form of the intent with its logical type, to obtain a transformed intent;
taking the transformed intent as a logical-type similar-intent information description in the form of logical types and a wildcard, and using that logical-type description as the regular expression; and
adding the regular expression to the similar-intent information description set.
19. The method according to claim 11 or 12, wherein determining the similar-intent information description set further comprises:
calculating the confidence of each similar-intent information description in the similar-intent information description set; and
selecting from the similar-intent information description set either a specified number of similar-intent information descriptions having the highest confidence, or the similar-intent information descriptions whose confidence exceeds a predetermined threshold.
20. The method according to claim 19, wherein the confidence is calculated using at least one of the following:
the frequency of the similar-intent information description;
the coverage of the similar-intent information description; and
the relevance of the similar-intent information description to the input query.
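One way to read the confidence calculation and the subsequent selection is sketched below. The definitions are assumptions made for illustration: frequency is the share of transformed intents that produced a description, coverage is the share of intent-similar queries whose intent groups produced it, and the two are averaged with equal weight. Relevance to the input query is omitted because the claim requires only at least one signal.

```python
def score_descriptions(pairs):
    """Compute a confidence per description.

    `pairs` is a list of (description, intent_similar_query) tuples,
    one per transformed intent.
    """
    total = len(pairs)
    all_queries = {query for _, query in pairs}
    frequency = {}
    covered = {}
    for description, query in pairs:
        frequency[description] = frequency.get(description, 0) + 1
        covered.setdefault(description, set()).add(query)
    return {
        d: 0.5 * frequency[d] / total + 0.5 * len(covered[d]) / len(all_queries)
        for d in frequency
    }


def select_descriptions(scores, top_n=None, threshold=0.0):
    """Keep the top_n descriptions by confidence, or those above the threshold."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    if top_n is not None:
        return [d for d, _ in ranked[:top_n]]
    return [d for d, score in ranked if score > threshold]
```

A description such as `* lens` that recurs across several intent-similar queries scores higher than one seen for a single query, which matches the intuition that shared subtopics are more likely to apply to the input query as well.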
21. The method according to claim 19, wherein the confidence is calculated from at least one of the following:
the similar-intent information description set;
a prepared intent training set; and
prepared domain information.
22. The method according to claim 21, wherein calculating the confidence of a similar-intent information description from the similar-intent information description set further comprises:
assigning different weights to the corresponding similar-intent information descriptions in the similar-intent information description set according to the popularity of the intent-similar queries; and/or
assigning different weights to the corresponding similar-intent information descriptions in the similar-intent information description set according to the similarity between the intent-similar queries and the input query.
23. The method according to claim 1, wherein mining the intents for the input query comprises:
producing a group of intents by replacing the wildcard in the similar-intent information descriptions of the similar-intent information description set with the input query.
24. The method according to claim 1, wherein mining the intents for the input query comprises:
mining a first group of intents for the input query from at least one data source; and
mining a second group of intents for the input query by using the similar-intent information description set and the first group of intents.
25. The method according to claim 24, wherein mining the second group of intents for the input query comprises:
generating at least one intent by replacing the wildcard in at least one similar-intent information description of the similar-intent information description set with the input query, wherein the at least one intent is not in the first group of intents; and
adding the generated at least one intent to the first group of intents.
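The wildcard substitution and the merging with the first group can be sketched as follows. The `*` wildcard, the case-insensitive duplicate check, and the function name are assumptions made for this example.

```python
def expand_intents(input_query, first_group, descriptions, wildcard="*"):
    """Fill each description with the input query and append new intents.

    Intents already present in the first group (ignoring case) are skipped.
    """
    result = list(first_group)
    seen = {intent.lower() for intent in first_group}
    for description in descriptions:
        candidate = description.replace(wildcard, input_query)
        if candidate.lower() not in seen:
            seen.add(candidate.lower())
            result.append(candidate)
    return result
```

For the input query "Canon" with the first group `["Canon lens"]` and the descriptions `["* lens", "* repair"]`, only "Canon repair" is new and is appended.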
26. The method according to claim 24, wherein mining the second group of intents for the input query comprises:
ranking the first group of intents for the input query by using the similar-intent information description set.
27. The method according to claim 26, wherein mining the second group of intents for the input query further comprises:
identifying a unique intent in the first group of intents for the input query; and
increasing the weight of the unique intent in the ranking according to the degree of uniqueness of the unique intent;
wherein the degree of uniqueness of the unique intent is calculated from at least one of the following:
the co-occurrence rate of the input query and the unique intent in a prepared intent training set;
the relation between the input query and the unique intent in domain knowledge;
the frequency of the unique intent in click-through data; and
the popularity of the unique intent in a query log.
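As an illustration of ranking with a boost for unique intents, the sketch below scores each intent of the first group by the confidence of its matching description and adds a bonus proportional to its degree of uniqueness. The additive combination, the `boost` factor, and the precomputed `uniqueness` table are all assumptions; the claims do not fix how the weight is increased.

```python
def rank_intents(input_query, first_group, description_confidence,
                 uniqueness, boost=1.0, wildcard="*"):
    """Rank intents by description confidence plus a uniqueness bonus.

    description_confidence: description -> confidence (e.g. "* lens" -> 0.75)
    uniqueness: intent -> degree of uniqueness (hypothetical, precomputed)
    """
    def score(intent):
        description = intent.replace(input_query, wildcard)
        base = description_confidence.get(description, 0.0)
        return base + boost * uniqueness.get(intent, 0.0)

    return sorted(first_group, key=score, reverse=True)
```

Without the bonus, an intent that is specific to the input query and therefore matches no shared description would always sink to the bottom of the ranking.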
28. A method for information retrieval, comprising:
receiving an input query entered by a user in natural language;
performing intent mining on the input query by the method according to any one of claims 1-27; and
obtaining search results for the mined intents.
29. A method for question-answering assistance, comprising:
receiving an input query entered by a user in natural language;
mining topics from the input query by the method according to any one of claims 1-27; and
obtaining answers for the mined topics.
30. An apparatus for intent mining, the apparatus comprising:
an input query acquiring unit, which obtains an input query;
an intent-similar query generation unit, which generates intent-similar queries for the input query, wherein each intent-similar query has an intent type that is the same as or similar to that of the input query, and wherein the intent-similar query generation unit comprises:
a query-pair phrase acquiring unit, which obtains one or more query-pair phrases from at least one data source, wherein each query-pair phrase includes the input query, an intent-similar indicator, and a third phrase; and
a third phrase extraction unit, which extracts the third phrase from each query-pair phrase as an intent-similar query;
wherein the intent-similar indicator includes at least one of the following: a coordinate-relation indicator, a comparative-relation indicator, and a choice-relation indicator;
a first intent mining unit, which mines a group of intents for each intent-similar query, wherein each intent provides a subtopic for the corresponding intent-similar query;
a similar-intent information description set determination unit, which determines a similar-intent information description set by using all of the intent groups of the intent-similar queries; and
a second intent mining unit, which mines intents for the input query by using the similar-intent information description set.
31. The apparatus according to claim 30,
wherein two phrases connected by the coordinate-relation indicator serve as the same syntactic element in a sentence;
wherein a first phrase in a sentence and a second phrase connected after the first phrase by the comparative-relation indicator are in a comparative relation; and
wherein two phrases connected by the choice-relation indicator form an expression of choice in a sentence.
32. The apparatus according to claim 30, wherein when the input query is a multi-word query, the intent-similar query generation unit comprises:
a core intent part and modifier part recognition unit, which identifies a core intent part and a modifier part of the input query; and
an intent-similar phrase generation unit, which generates intent-similar phrases for the core intent part of the input query as the intent-similar queries.
33. The apparatus according to claim 30, wherein when the input query is a multi-word query, the intent-similar query generation unit comprises:
a core intent part and modifier part recognition unit, which identifies a core intent part and a modifier part of the input query; and
a modifier part replacement unit, which generates the intent-similar queries by replacing the modifier part of the input query with a plurality of substitute parts, wherein each substitute part is an intent-similar phrase generated for the modifier part, and each intent-similar phrase has an intent type that is the same as or similar to that of the modifier part of the input query.
34. The apparatus according to claim 32 or 33, wherein the core intent part and modifier part recognition unit comprises:
an input query parsing unit, which parses the input query to divide it into a plurality of semantic units;
a temporary intent-similar query generation unit, which, for each divided semantic unit of the input query, generates temporary intent-similar queries each composed of the divided semantic unit and a changing part, wherein the changing part is an intent-similar phrase generated for the other semantic units of the input query;
a third intent mining unit, which, for each divided semantic unit of the input query, mines a group of intents for each temporary intent-similar query, wherein each intent provides a subtopic for the corresponding temporary intent-similar query;
a consistency calculation unit, which, for each divided semantic unit of the input query, calculates a degree of consistency by comparing the intent groups of the temporary intent-similar queries of that semantic unit, wherein the degree of consistency measures the intent similarity of the temporary intent-similar queries for that semantic unit, and the more common the intent types present in the intents of the temporary intent-similar queries, the higher the degree of consistency; and
a core intent part determination unit, which determines the semantic unit having the highest degree of consistency in the input query as the core intent part of the input query, and determines the other semantic units as the modifier part of the input query.
35. The apparatus according to claim 30, wherein the intent-similar query generation unit comprises at least one of the following:
a unit that obtains one or more queries from an intent-similar query library of the input query as the intent-similar queries;
a unit that obtains one or more sibling nodes of the input query in a domain ontology as the intent-similar queries;
a unit that obtains neighboring concepts of the input query in a language dictionary as the intent-similar queries; and
a unit that obtains one or more queries from a query log as the intent-similar queries by calculating intent similarity based on click-through data associated with the input query.
36. The apparatus of claim 30, wherein the intent-similar query generation unit further comprises:
a similarity degree calculation unit that calculates a similarity degree between each of the intent-similar queries and the input query; and
an intent-similar query selection unit that selects, from the intent-similar queries, a specific number of intent-similar queries having the highest similarity degree, or the intent-similar queries whose similarity degree is greater than a predetermined threshold.
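The selection step recited above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `select_intent_similar_queries` and its parameters are hypothetical, and the similarity degrees are assumed to have been computed already. When both `top_k` and `threshold` are given, the sketch applies both, which is one possible reading rather than a requirement of the claim.

```python
def select_intent_similar_queries(scored, top_k=None, threshold=None):
    """Keep the candidates with the highest similarity degree.

    scored maps each candidate query to its similarity degree with the
    input query. top_k keeps a fixed number of best candidates; threshold
    keeps candidates whose similarity degree is strictly greater than it.
    """
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(query, score) for query, score in ranked if score > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [query for query, _ in ranked]
```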
37. The apparatus of claim 36, wherein the similarity degree between each of the intent-similar queries and the input query is calculated by at least one of the following:
the consistency degree between the query and the input query;
the lexical similarity between the query and the input query;
the syntactic similarity between the query and the input query;
the semantic similarity between the query and the input query;
the contextual similarity between the query and the input query in a prepared corpus;
the co-occurrence rate of the query and the input query in a query log;
the distance between the query and the input query in a domain ontology; and
the similarity between the muster data of the query and that of the input query.
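Two of the measures listed above, lexical similarity and the co-occurrence rate in a query log, can be sketched in Python as below. The concrete formulas are assumptions, since the claim names the measures without defining them: lexical similarity is taken here as token-set Jaccard overlap, the query log is modeled as a list of sessions (sets of query strings), and the weighted sum in `similarity_degree` is only one hypothetical way of combining measures.

```python
def lexical_similarity(query, input_query):
    """Jaccard overlap of whitespace tokens (an assumed definition)."""
    a = set(query.lower().split())
    b = set(input_query.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_rate(query, input_query, sessions):
    """Share of sessions containing either query that contain both."""
    both = sum(1 for s in sessions if query in s and input_query in s)
    either = sum(1 for s in sessions if query in s or input_query in s)
    return both / either if either else 0.0


def similarity_degree(query, input_query, sessions, weights=(0.5, 0.5)):
    """Hypothetical weighted combination of the two measures above."""
    w_lex, w_co = weights
    return (w_lex * lexical_similarity(query, input_query)
            + w_co * cooccurrence_rate(query, input_query, sessions))
```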
38. The apparatus of claim 36, wherein the similarity degree between each of the intent-similar queries and the input query is calculated from at least one item of real-world information, the real-world information including at least: time, location, user model, and environment.
39. The apparatus of claim 30, wherein the similar intent information description is presented as a regular expression of the input query.
40. The apparatus of claim 39, wherein the similar intent information description set determination unit includes:
a linguistic form analysis unit that analyzes the linguistic form of each intent in all intent groups of the intent-similar queries;
a query-intent relation determination unit that determines at least one query-intent relation between the linguistic form of the corresponding intent-similar query within that linguistic form and the remaining linguistic forms;
a regular expression transformation unit that transforms the linguistic form of each intent corresponding to the determined at least one query-intent relation into a regular expression; and
a regular expression adding unit that adds the regular expression obtained by the transformation to the similar intent information description set.
41. The apparatus of claim 40, wherein the similar intent information description set determination unit further comprises:
an intent group expansion unit that expands each intent group, including:
a synonymous phrase generation unit that, for each intent in the intent group, generates a synonymous phrase by replacing at least one word in the intent with a synonym or near-synonym of that word, wherein the at least one word does not appear in the corresponding intent-similar query, and
a synonymous phrase adding unit that adds the generated synonymous phrase to the intent group.
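The intent group expansion recited above can be sketched as follows. The function name `expand_intent_group` and the `synonyms` dictionary are hypothetical; the claim does not say where synonyms or near-synonyms come from, so a plain lookup table is assumed here. Words that appear in the intent-similar query itself are left untouched, as the claim requires.

```python
def expand_intent_group(intents, similar_query, synonyms):
    """Add synonymous phrases to an intent group.

    intents: list of intent phrases mined for similar_query.
    synonyms: dict mapping a word to a list of synonyms or near-synonyms
    (an assumed resource). Only words absent from similar_query are replaced.
    """
    query_words = set(similar_query.split())
    expanded = list(intents)
    seen = set(intents)
    for intent in intents:
        words = intent.split()
        for i, word in enumerate(words):
            if word in query_words:
                continue  # never rewrite the query part of the intent
            for synonym in synonyms.get(word, []):
                phrase = " ".join(words[:i] + [synonym] + words[i + 1:])
                if phrase not in seen:
                    seen.add(phrase)
                    expanded.append(phrase)
    return expanded
```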
42. The apparatus of claim 40, wherein the similar intent information description set determination unit further comprises:
a first intent parsing unit that parses each intent in all intent groups of the intent-similar queries by morphological analysis to detect whether the corresponding intent-similar query satisfies at least one morphological rule;
a first wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one morphological rule, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent;
a first regular expression generation unit that takes the transformed intent as a lexical-type similar intent information description in the form of words and a wildcard, and presents the lexical-type similar intent information description as a regular expression; and
a first regular expression adding unit that adds the regular expression to the similar intent information description set.
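The wildcard replacement and regular expression generation recited above can be sketched in Python. This is an illustration under assumptions: the names `to_lexical_pattern` and `pattern_to_regex`, the `*` wildcard symbol, and the plain substring test standing in for the morphological rule check are not taken from the patent.

```python
import re

WILDCARD = "*"  # assumed wildcard symbol


def to_lexical_pattern(intent, similar_query):
    """Replace the intent-similar query inside an intent with a wildcard.

    Returns None when the query does not occur in the intent. A plain
    substring test is a simplification of the rule check in the claim.
    """
    if similar_query not in intent:
        return None
    return intent.replace(similar_query, WILDCARD)


def pattern_to_regex(pattern):
    """Compile a words-plus-wildcard pattern into a regular expression
    whose capture groups stand for the wildcard positions."""
    literal_parts = [re.escape(part) for part in pattern.split(WILDCARD)]
    return re.compile("^" + "(.+)".join(literal_parts) + "$")
```

With these helpers, the intent `canon camera price` mined for the intent-similar query `canon camera` becomes the description `* price`, and the compiled expression matches other intents of the same shape.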
43. The apparatus of claim 40, wherein the similar intent information description set determination unit further comprises:
a second intent parsing unit that parses each intent in all intent groups of the intent-similar queries by syntactic analysis to detect whether the corresponding intent-similar query satisfies at least one syntactic rule;
a second wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one syntactic rule, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent;
a second regular expression generation unit that takes the transformed intent as a syntactic-type similar intent information description in the form of a syntactic rule and a wildcard, and presents the syntactic-type similar intent information description as a regular expression; and
a second regular expression adding unit that adds the regular expression to the similar intent information description set.
44. The apparatus of claim 42, wherein the similar intent information description set determination unit further comprises:
a second intent parsing unit that parses each intent in all intent groups of the intent-similar queries by syntactic analysis to detect whether the corresponding intent-similar query satisfies at least one syntactic rule;
a second wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one syntactic rule, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard to obtain a transformed intent;
a second regular expression generation unit that takes the transformed intent as a syntactic-type similar intent information description in the form of a syntactic rule and a wildcard, and presents the syntactic-type similar intent information description as a regular expression; and
a second regular expression adding unit that adds the regular expression to the similar intent information description set.
45. The apparatus of any one of claims 40 and 42-44, wherein the similar intent information description set determination unit further comprises:
a third intent parsing unit that parses each intent in all intent groups of the intent-similar queries by semantic relation analysis to detect whether the corresponding intent-similar query satisfies at least one semantic relation;
a third wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one semantic relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard, and replaces the remaining linguistic forms of the intent with their semantic tags, to obtain a transformed intent;
a third regular expression generation unit that takes the transformed intent as a semantic-type similar intent information description in the form of semantic tags and a wildcard, and presents the semantic-type similar intent information description as a regular expression; and
a third regular expression adding unit that adds the regular expression to the similar intent information description set.
46. The apparatus of any one of claims 40 and 42-44, wherein the similar intent information description set determination unit further comprises:
a fourth intent parsing unit that parses each intent in all intent groups of the intent-similar queries by logic analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation;
a fourth wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard, and replaces the remaining linguistic forms of the intent with their logical types, to obtain a transformed intent;
a fourth regular expression generation unit that takes the transformed intent as a logical-type similar intent information description in the form of logical types and a wildcard, and presents the logical-type similar intent information description as a regular expression; and
a fourth regular expression adding unit that adds the regular expression to the similar intent information description set.
47. The apparatus of claim 45, wherein the similar intent information description set determination unit further comprises:
a fourth intent parsing unit that parses each intent in all intent groups of the intent-similar queries by logic analysis to detect whether the corresponding intent-similar query satisfies at least one logical relation;
a fourth wildcard replacement unit that, if the corresponding intent-similar query satisfies at least one logical relation, for each intent in the intent group of that intent-similar query, replaces the linguistic form of the intent-similar query within the linguistic form of the intent with a wildcard, and replaces the remaining linguistic forms of the intent with their logical types, to obtain a transformed intent;
a fourth regular expression generation unit that takes the transformed intent as a logical-type similar intent information description in the form of logical types and a wildcard, and presents the logical-type similar intent information description as a regular expression; and
a fourth regular expression adding unit that adds the regular expression to the similar intent information description set.
48. The apparatus of claim 40 or 41, wherein the similar intent information description set determination unit further comprises:
a confidence calculation unit that calculates a confidence for each similar intent information description in the similar intent information description set; and
a similar intent information description selection unit that selects, from the similar intent information description set, a specific number of similar intent information descriptions having the highest confidence, or the similar intent information descriptions whose confidence is greater than a predetermined threshold.
49. The apparatus of claim 48, wherein the confidence is calculated using at least one of the following:
the frequency of the similar intent information description;
the coverage of the similar intent information description; and
the relevance of the similar intent information description to the input query.
50. The apparatus of claim 48, wherein the confidence is calculated from at least one of the following:
the similar intent information description set;
a prepared intent training set; and
prepared domain information.
51. The apparatus of claim 50, wherein the confidence calculation unit further comprises:
a first weight configuration unit that assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the popularity of the intent-similar queries; and/or
a second weight configuration unit that assigns different weights to the corresponding similar intent information descriptions in the similar intent information description set according to the similarity degree between the intent-similar queries and the input query.
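A frequency-based confidence with per-query weights, as recited above, can be sketched as below. The formula is an assumption: each occurrence of a description is weighted by a weight attached to the intent-similar query it was derived from (for example its popularity or its similarity degree with the input query), and the weighted counts are normalized to sum to one. The name `description_confidence` and its inputs are hypothetical.

```python
from collections import defaultdict


def description_confidence(observations, query_weight):
    """Weighted relative frequency of each similar intent description.

    observations: iterable of (description, similar_query) pairs, one per
    intent from which the description was derived.
    query_weight: dict mapping an intent-similar query to its weight;
    queries missing from the dict count with weight 1.0.
    """
    weighted = defaultdict(float)
    for description, similar_query in observations:
        weighted[description] += query_weight.get(similar_query, 1.0)
    total = sum(weighted.values())
    if not total:
        return {}
    return {description: value / total for description, value in weighted.items()}
```

Descriptions can then be filtered by a confidence threshold or cut to a fixed number of best entries.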
52. The apparatus of claim 30, wherein the second intent mining unit includes:
an input query replacement unit that generates a group of intents by replacing, with the input query, the wildcard in the similar intent information descriptions of the similar intent information description set.
53. The apparatus of claim 30, wherein the second intent mining unit includes:
a first-group intent mining unit that mines a first group of intents for the input query from at least one data source; and
a second-group intent mining unit that mines a second group of intents for the input query by using the similar intent information description set and the first group of intents.
54. The apparatus of claim 53, wherein the second-group intent mining unit includes:
a unit that generates at least one intent by replacing, with the input query, the wildcard in at least one similar intent information description of the similar intent information description set, wherein the at least one intent is not in the first group of intents; and
a unit that adds the generated at least one intent to the first group of intents.
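The wildcard substitution and the merge into the first group of intents recited above can be sketched as follows. The function names and the `*` wildcard symbol are assumptions, and descriptions are treated as plain strings rather than compiled regular expressions to keep the example short.

```python
def generate_intents(input_query, descriptions, wildcard="*"):
    """Instantiate each description by putting the input query in place
    of the wildcard; descriptions without a wildcard are skipped."""
    return [d.replace(wildcard, input_query) for d in descriptions if wildcard in d]


def expand_first_group(first_group, input_query, descriptions, wildcard="*"):
    """Append generated intents that are not already in the first group,
    preserving the original order of the first group."""
    merged = list(first_group)
    seen = set(first_group)
    for candidate in generate_intents(input_query, descriptions, wildcard):
        if candidate not in seen:
            seen.add(candidate)
            merged.append(candidate)
    return merged
```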
55. The apparatus of claim 53, wherein the second-group intent mining unit includes:
a ranking unit that ranks the first group of intents for the input query by using the similar intent information description set.
56. The apparatus of claim 55, wherein the second-group intent mining unit further comprises:
a distinctive intent identification unit that identifies a distinctive intent in the first group of intents for the input query; and
a weight changing unit that raises the weight of the distinctive intent in the ranking according to the distinctiveness degree of the distinctive intent;
wherein the distinctiveness degree of the distinctive intent is calculated by at least one of the following:
the co-occurrence rate of the input query and the distinctive intent in a prepared intent training set;
the relation between the input query and the distinctive intent in domain knowledge;
the frequency of the distinctive intent in muster data; and
the popularity of the distinctive intent in a query log.
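The ranking with a raised weight for distinctive intents recited above can be sketched as below. The scoring rule is an assumption made for illustration: each intent in the first group is mapped back to a wildcard description, scored by that description's confidence, and then boosted by a precomputed distinctiveness degree. The names `rank_intents`, `confidence`, `distinctiveness`, and `boost` are hypothetical.

```python
def rank_intents(first_group, input_query, confidence,
                 distinctiveness=None, boost=1.0, wildcard="*"):
    """Rank intents by description confidence plus a distinctiveness boost.

    confidence: dict mapping a wildcard description to its confidence.
    distinctiveness: optional dict mapping an intent to its
    distinctiveness degree; intents missing from it get no boost.
    """
    distinctiveness = distinctiveness or {}
    scored = []
    for intent in first_group:
        description = intent.replace(input_query, wildcard)
        score = confidence.get(description, 0.0)
        score += boost * distinctiveness.get(intent, 0.0)
        scored.append((intent, score))
    # sorted() is stable, so ties keep their original order
    scored = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [intent for intent, _ in scored]
```

An intent that matches no shared description would otherwise fall to the bottom of the ranking; the boost lets an intent that is specific to the input query move up.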
57. An apparatus for information retrieval, comprising:
an input query receiving unit that receives an input query entered by a user in natural language;
the apparatus for intent mining according to any one of claims 30-56, which mines intents from the input query; and
a search result obtaining unit that obtains search results for the mined intents.
58. An apparatus for question answering assistance, comprising:
an input query receiving unit that receives an input query entered by a user in natural language;
the apparatus for intent mining according to any one of claims 30-56, which mines topics from the input query; and
an answer obtaining unit that obtains answers for the mined topics.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310371165.5A CN104424216B (en) | 2013-08-23 | 2013-08-23 | Method and apparatus for intent mining |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104424216A CN104424216A (en) | 2015-03-18 |
CN104424216B (en) | 2018-01-23 |
Family
ID=52973214
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310371165.5A Active CN104424216B (en) | 2013-08-23 | 2013-08-23 | Method and apparatus for intent mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104424216B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776981B (en) * | 2016-12-06 | 2020-12-15 | 广州同构科技有限公司 | Intelligent retrieval method based on empirical knowledge |
CN108287858B (en) * | 2017-03-02 | 2021-08-10 | 腾讯科技(深圳)有限公司 | Semantic extraction method and device for natural language |
CN107704450B (en) * | 2017-10-13 | 2020-12-04 | 威盛电子股份有限公司 | Natural language identification device and natural language identification method |
CN107679039B (en) * | 2017-10-17 | 2020-12-29 | 北京百度网讯科技有限公司 | Method and device for determining statement intention |
CN108170859B (en) * | 2018-01-22 | 2020-07-28 | 北京百度网讯科技有限公司 | Voice query method, device, storage medium and terminal equipment |
CN110309252B (en) * | 2018-02-28 | 2023-11-24 | 阿里巴巴集团控股有限公司 | Natural language processing method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101339551A (en) * | 2007-07-05 | 2009-01-07 | 日电(中国)有限公司 | Natural language query demand extension equipment and its method |
CN102063469A (en) * | 2010-12-03 | 2011-05-18 | 百度在线网络技术(北京)有限公司 | Method and device for acquiring relevant keyword message and computer equipment |
CN102096717A (en) * | 2011-02-15 | 2011-06-15 | 百度在线网络技术(北京)有限公司 | Search method and search engine |
CN102722558A (en) * | 2012-05-29 | 2012-10-10 | 百度在线网络技术(北京)有限公司 | User question recommending method and device |
CN103049495A (en) * | 2012-12-07 | 2013-04-17 | 百度在线网络技术(北京)有限公司 | Method, device and equipment for providing searching advice corresponding to inquiring sequence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||