CN108733702A - Method, apparatus, electronic device, and medium for extracting hypernym-hyponym relations between user queries - Google Patents
- Publication number
- CN108733702A CN201710260844.3A
- Authority
- CN
- China
- Prior art keywords
- user
- query
- queries
- query pair
- organic result
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention provide a method, apparatus, electronic device, and medium for extracting hypernym-hyponym relations between user queries. They make it possible to extract user queries that stand in a hypernym-hyponym relation in the e-commerce field, thereby providing effective support for the recall of products and advertisements. The method includes: constructing candidate user query pairs; representing the candidate user query pairs as feature vectors using features set in advance according to observation indicators; manually labeling a preset number of the candidate user query pairs and then training a classifier by supervised learning; and using the trained classifier to judge whether each remaining candidate user query pair satisfies the hypernym-hyponym relation, and outputting the user query pairs that satisfy it as the extraction result.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method, apparatus, electronic device, and medium for extracting hypernym-hyponym relations between user queries.
Background art
The hypernym-hyponym relation is generally studied as a lexical semantic relation. Semantically similar words can stand in different types of relation; the hypernym-hyponym relation holds when the meaning of one word includes the meaning of another. The word whose meaning is included is called the hyponym, and the other word is called the hypernym. For example, "animal" is a hypernym of "cat", and "cat" is a hyponym of "animal".
In the e-commerce field, a user's search query (that is, the user's search condition, usually a short phrase) is typically a description of some product. Hypernym-hyponym relations also exist between user queries. For example, the user query "iPhone" is a hyponym of the user query "smartphone": a hyponym query is a specialization of its hypernym query. When a user enters a query, the hyponym queries of that query can be used to retrieve products and advertisements. Products or advertisements retrieved this way match the semantics of the hyponym query and therefore also match the semantics of the hypernym query, so the retrieval results are acceptable to the user. Thus, in the e-commerce field, extracting user queries that stand in a hypernym-hyponym relation is of considerable help to the recall of products and advertisements.
In the prior art, research on hypernym-hyponym relation extraction mainly covers the following aspects.
Methods for extracting hypernym-hyponym relations between words mainly include: 1. using path features of two words that co-occur in the same sentence, and deciding with templates or a classifier whether the relation holds; 2. using the contextual features of each occurrence of a word, which includes computing a directed inclusion degree between the two feature vectors under the distributional inclusion hypothesis, and training a classifier directly on the contextual feature vectors of the two words.
In the web search field, there is a small body of research on hypernym-hyponym relations between user queries. Compared with a single word, a user query carries richer semantics: the meaning of each word contained in the hypernym query must have a similar or more specific expression in the hyponym query. For example, "Samsung large-screen phone" and "Samsung large-screen smartphone" form a hypernym-hyponym relation, but "Samsung large-screen phone" and "Samsung 4G phone" do not. The more mature prior-art approach to extracting hypernym-hyponym relations between user queries obtains them by analyzing users' click data. This analysis rests on three hypotheses: 1. if two user queries are related, the sets of web pages clicked for the two queries must intersect or be similar; 2. if user query q_i is a hypernym of user query q_j, then most of the pages clicked for q_j are similar to pages clicked for q_i, while only part of the pages clicked for q_i are similar to pages clicked for q_j; 3. if a user query is a hyponym, the content of its clicked pages is more consistent. Hypothesis 1 is used to generate candidate hypernym-hyponym query pairs, hypothesis 2 to design an inclusion measure, and hypothesis 3 to design a measure of how general a user query is. Thresholds on these two measures are then used to judge whether a candidate hypernym-hyponym query pair is genuine.
In the course of making the present invention, the inventors found that the prior art has at least the following problems:
1. In both the web search field and the e-commerce platform field, the neighboring queries of a user query within a session are not sufficient to characterize the semantics of the query accurately, and there are no path features between user queries. Word-level hypernym-hyponym extraction techniques therefore cannot be applied directly to extracting hypernym-hyponym relations between user queries on e-commerce platforms.
2. In the web search field, few features are currently used when extracting hypernym-hyponym relations between user queries, and on an e-commerce platform strict consistency of page content is hard to judge (for example, every attribute of the products shown on the product pages would have to agree before the content could be judged consistent). Because these techniques are not optimized for the particular setting of an e-commerce platform, it is difficult to achieve high precision when a candidate user query pair is predicted positive (the proportion of samples judged positive by the classifier that are truly positive) while also achieving high recall of true positives (the proportion of truly positive samples that the classifier judges positive).
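The two measures defined in parentheses above can be made concrete with a short sketch. The function name and the boolean-list input format are illustrative assumptions, not part of the patent text.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for boolean predictions against boolean gold labels.

    precision: share of pairs judged positive that are truly positive.
    recall:    share of truly positive pairs that were judged positive.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A classifier that predicts few positives can score high on precision and low on recall, which is the trade-off the passage describes.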
Summary of the invention
In view of this, embodiments of the present invention provide a method, apparatus, electronic device, and medium for extracting hypernym-hyponym relations between user queries, which make it possible to extract user queries that stand in a hypernym-hyponym relation in the e-commerce field and thereby provide effective support for the recall of products and advertisements.
To achieve the above object, according to one aspect of the present invention, a method for extracting hypernym-hyponym relations between user queries is provided.
A method for extracting hypernym-hyponym relations between user queries according to an embodiment of the present invention includes: constructing candidate user query pairs; representing the candidate user query pairs as feature vectors using features set in advance according to observation indicators; manually labeling a preset number of the candidate user query pairs and then training a classifier by supervised learning; and using the trained classifier to judge whether each remaining candidate user query pair satisfies the hypernym-hyponym relation, and outputting the user query pairs that satisfy it as the extraction result.
Optionally, constructing candidate user query pairs includes: clustering user queries; and then combining the user queries within each cluster two by two to form candidate user query pairs.
Optionally, clustering user queries includes: building a graph from users' session data, in which the user queries are the nodes of the graph and an edge connects two user query nodes whose co-occurrence count within a window exceeds a preset threshold, the edge weight being the product of the following four values: the co-occurrence count of the user queries, the similarity of the word sets obtained by segmenting the user queries, the similarity of the embedding vectors of the user queries, and the similarity of the sets of organic results clicked for the user queries; and clustering the user query nodes on the graph using a label propagation algorithm.
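The edge weight described above can be sketched as follows. The text names the four factors but does not say how each similarity is computed, so Jaccard similarity for the two set-valued factors and cosine similarity for the embeddings are assumptions made for illustration, as are all function and parameter names and the window convention.

```python
import math
from collections import Counter


def cooccurrence_counts(sessions, window=2):
    """Count how often two distinct queries appear within `window` steps in a session."""
    counts = Counter()
    for queries in sessions:
        for i, q in enumerate(queries):
            for other in queries[i + 1 : i + 1 + window]:
                if other != q:
                    counts[frozenset((q, other))] += 1
    return counts


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def edge_weight(cooccurrence, tokens_a, tokens_b, emb_a, emb_b, clicks_a, clicks_b):
    """Product of the four factors named in the text."""
    return (cooccurrence
            * jaccard(tokens_a, tokens_b)
            * cosine(emb_a, emb_b)
            * jaccard(clicks_a, clicks_b))
```

Because the weight is a product, any factor equal to zero removes the edge's influence entirely; only pairs whose co-occurrence count exceeds the preset threshold would be given an edge in the first place.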
Optionally, the observation indicators include one or more of the following:
Observation indicator 1: the number of organic results displayed for the hypernym query is greater than the number of organic results displayed for the hyponym query.
Observation indicator 2: the degree to which the set of organic results displayed for the hypernym query contains the set of organic results displayed for the hyponym query is greater than the degree to which the set displayed for the hyponym query contains the set displayed for the hypernym query.
Observation indicator 3: the larger the overlap between the sets of organic results displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relation between the sets of displayed organic results.
Observation indicator 4: the number of organic results clicked for the hypernym query is greater than the number of organic results clicked for the hyponym query.
Observation indicator 5: the degree to which the set of organic results clicked for the hypernym query contains the set of organic results clicked for the hyponym query is greater than the degree to which the set clicked for the hyponym query contains the set clicked for the hypernym query.
Observation indicator 6: the larger the overlap between the sets of organic results displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relation between the sets of clicked organic results.
Optionally, the features include one or more of the following:
Features set according to observation indicator 1: the number of organic results displayed for the hypernym query of the candidate user query pair; and the number of organic results displayed for the hyponym query of the candidate user query pair.
Features set according to observation indicator 2: the degree to which the set of organic results displayed for the hypernym query of the candidate pair contains the set of organic results displayed for the hyponym query of the candidate pair, computed with unweighted WeedsPrec, and the same degree computed with unweighted balPrec; the degree to which the set of organic results displayed for the hyponym query contains the set displayed for the hypernym query, computed with unweighted WeedsPrec, and the same degree computed with unweighted balPrec; the difference between the two unweighted WeedsPrec results; and the difference between the two unweighted balPrec results.
Features set according to observation indicator 3: the number of organic results displayed for both the hypernym query and the hyponym query of the candidate pair; and the unweighted LIN score of the sets of organic results displayed for the hypernym query and the hyponym query, which reflects the proportion of the intersection.
Features set according to observation indicator 4: the number of organic results clicked for the hypernym query of the candidate pair; and the number of organic results clicked for the hyponym query of the candidate pair.
Features set according to observation indicator 5: the degree to which the set of organic results clicked for the hypernym query of the candidate pair contains the set of organic results clicked for the hyponym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, in each case weighted by click count; the degree to which the set clicked for the hyponym query contains the set clicked for the hypernym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, in each case weighted by click count; the difference between the two WeedsPrec results; the difference between the two ClarkeDE results; and the difference between the two balPrec results.
Features set according to observation indicator 6: the number of organic results clicked for both the hypernym query and the hyponym query of the candidate pair; and the LIN score of the sets of organic results clicked for the hypernym query and the hyponym query, which reflects the proportion of the intersection.
Other features: feature a: the degree to which the set of organic results displayed for the hypernym query of the candidate pair contains the set of organic results clicked for the hyponym query, computed with unweighted WeedsPrec; feature b: the degree to which the set of organic results displayed for the hyponym query of the candidate pair contains the set of organic results clicked for the hypernym query, computed with unweighted WeedsPrec; and the difference between feature a and feature b.
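The inclusion measures named above are not given as formulas in this part of the text. The sketch below uses the definitions commonly associated with these names in the distributional-inclusion literature, which is an assumption: WeedsPrec as the share of one set's weight that falls on shared items, ClarkeDE as the same with shared weight capped by the other set's weight, LIN as a symmetric overlap score, and balPrec as the geometric mean of LIN and WeedsPrec. Result sets are modeled as dictionaries mapping a result ID to its weight (click count); the helper `unweighted` and all names are illustrative.

```python
import math


def weeds_prec(narrow, broad):
    """Share of `narrow`'s total weight on items that also occur in `broad`.

    A high value means the result set `broad` largely contains `narrow`.
    """
    total = sum(narrow.values())
    shared = sum(w for item, w in narrow.items() if item in broad)
    return shared / total if total else 0.0


def clarke_de(narrow, broad):
    """Like weeds_prec, but each shared item counts only up to its weight in `broad`."""
    total = sum(narrow.values())
    shared = sum(min(w, broad[item]) for item, w in narrow.items() if item in broad)
    return shared / total if total else 0.0


def lin(a, b):
    """Symmetric overlap: weight on shared items over total weight of both sets."""
    total = sum(a.values()) + sum(b.values())
    shared = sum(a[item] + b[item] for item in a if item in b)
    return shared / total if total else 0.0


def bal_prec(narrow, broad):
    """Geometric mean of LIN and WeedsPrec (assumed definition of balPrec)."""
    return math.sqrt(lin(narrow, broad) * weeds_prec(narrow, broad))


def unweighted(items):
    """Turn a plain collection of result IDs into a weight-1 dictionary."""
    return {item: 1 for item in items}


# Hypothetical click counts for a hypernym query and a hyponym query.
hyper_clicks = {"p1": 5, "p2": 3, "p3": 2}
hypo_clicks = {"p1": 4, "p2": 1}

# Indicator-5 style feature: how much more the hypernym's clicked set contains
# the hyponym's clicked set than the other way round.
weeds_diff = weeds_prec(hypo_clicks, hyper_clicks) - weeds_prec(hyper_clicks, hypo_clicks)
```

For a genuine hypernym-hyponym pair the difference features would be expected to be positive, since the broader query's results should cover the narrower query's results more than the reverse. The unweighted variants follow by passing `unweighted(result_ids)` for both arguments.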
Optionally, manually labeling a preset number of the candidate user query pairs and then training a classifier by supervised learning further includes: selecting a preset number of user query pairs from the candidate user query pairs and manually labeling them, and then dividing these user query pairs into a training set, a validation set, and a test set according to a preset ratio; training a gradient boosting decision tree classifier on the training set represented as feature vectors, and then tuning the hyperparameters of the classifier on the validation set represented as feature vectors; and using the classifier to judge whether the user query pairs in the test set satisfy the hypernym-hyponym relation, and then computing precision and recall.
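The division into training, validation, and test sets can be sketched as below. The text speaks only of a "preset ratio", so the 6:2:2 default, the fixed random seed, and the function name are assumptions for illustration; classifier training itself is left out of the sketch.

```python
import random


def split_labeled_pairs(labeled_pairs, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle labeled query pairs and split them into train, validation, and test sets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(labeled_pairs)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains after train and validation
    return train, val, test
```

Keeping the test set untouched until hyperparameters have been fixed on the validation set is what makes the precision and recall computed on it a fair estimate.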
Optionally, training a classifier by supervised learning further includes: training the classifier using one or more of a gradient boosting decision tree classifier, a random forest classifier, and a support vector machine classifier.
To achieve the above object, according to another aspect of the present invention, an apparatus for extracting hypernym-hyponym relations between user queries is provided.
An apparatus for extracting hypernym-hyponym relations between user queries according to an embodiment of the present invention includes: a candidate module, configured to construct candidate user query pairs; a representation module, configured to represent the candidate user query pairs as feature vectors using features set in advance according to observation indicators; a training module, configured to train a classifier by supervised learning after a preset number of the candidate user query pairs have been manually labeled; and an extraction module, configured to use the trained classifier to judge whether each remaining candidate user query pair satisfies the hypernym-hyponym relation, and to output the user query pairs that satisfy it as the extraction result.
Optionally, the candidate module is further configured to: cluster user queries; and then combine the user queries within each cluster two by two to form candidate user query pairs.
Optionally, the candidate module is further configured to: build a graph from users' session data, in which the user queries are the nodes of the graph and an edge connects two user query nodes whose co-occurrence count within a window exceeds a preset threshold, the edge weight being the product of the following four values: the co-occurrence count of the user queries, the similarity of the word sets obtained by segmenting the user queries, the similarity of the embedding vectors of the user queries, and the similarity of the sets of organic results clicked for the user queries; and cluster the user query nodes on the graph using a label propagation algorithm.
Optionally, the observation indicators include one or more of the following:
Observation indicator 1: the number of organic results displayed for the hypernym query is greater than the number of organic results displayed for the hyponym query.
Observation indicator 2: the degree to which the set of organic results displayed for the hypernym query contains the set of organic results displayed for the hyponym query is greater than the degree to which the set displayed for the hyponym query contains the set displayed for the hypernym query.
Observation indicator 3: the larger the overlap between the sets of organic results displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relation between the sets of displayed organic results.
Observation indicator 4: the number of organic results clicked for the hypernym query is greater than the number of organic results clicked for the hyponym query.
Observation indicator 5: the degree to which the set of organic results clicked for the hypernym query contains the set of organic results clicked for the hyponym query is greater than the degree to which the set clicked for the hyponym query contains the set clicked for the hypernym query.
Observation indicator 6: the larger the overlap between the sets of organic results displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relation between the sets of clicked organic results.
Optionally, the features include one or more of the following:
Features set according to observation indicator 1: the number of organic results displayed for the hypernym query of the candidate user query pair; and the number of organic results displayed for the hyponym query of the candidate user query pair.
Features set according to observation indicator 2: the degree to which the set of organic results displayed for the hypernym query of the candidate pair contains the set of organic results displayed for the hyponym query of the candidate pair, computed with unweighted WeedsPrec, and the same degree computed with unweighted balPrec; the degree to which the set of organic results displayed for the hyponym query contains the set displayed for the hypernym query, computed with unweighted WeedsPrec, and the same degree computed with unweighted balPrec; the difference between the two unweighted WeedsPrec results; and the difference between the two unweighted balPrec results.
Features set according to observation indicator 3: the number of organic results displayed for both the hypernym query and the hyponym query of the candidate pair; and the unweighted LIN score of the sets of organic results displayed for the hypernym query and the hyponym query, which reflects the proportion of the intersection.
Features set according to observation indicator 4: the number of organic results clicked for the hypernym query of the candidate pair; and the number of organic results clicked for the hyponym query of the candidate pair.
Features set according to observation indicator 5: the degree to which the set of organic results clicked for the hypernym query of the candidate pair contains the set of organic results clicked for the hyponym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, in each case weighted by click count; the degree to which the set clicked for the hyponym query contains the set clicked for the hypernym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, in each case weighted by click count; the difference between the two WeedsPrec results; the difference between the two ClarkeDE results; and the difference between the two balPrec results.
Features set according to observation indicator 6: the number of organic results clicked for both the hypernym query and the hyponym query of the candidate pair; and the LIN score of the sets of organic results clicked for the hypernym query and the hyponym query, which reflects the proportion of the intersection.
Other features: feature a: the degree to which the set of organic results displayed for the hypernym query of the candidate pair contains the set of organic results clicked for the hyponym query, computed with unweighted WeedsPrec; feature b: the degree to which the set of organic results displayed for the hyponym query of the candidate pair contains the set of organic results clicked for the hypernym query, computed with unweighted WeedsPrec; and the difference between feature a and feature b.
Optionally, the training module is further configured to: select a preset number of user query pairs from the candidate user query pairs and, after they have been manually labeled, divide these user query pairs into a training set, a validation set, and a test set according to a preset ratio; train a gradient boosting decision tree classifier on the training set represented as feature vectors, and then tune the hyperparameters of the classifier on the validation set represented as feature vectors; and use the classifier to judge whether the user query pairs in the test set satisfy the hypernym-hyponym relation, and then compute precision and recall.
Optionally, the training module is further configured to: train the classifier using one or more of a gradient boosting decision tree classifier, a random forest classifier, and a support vector machine classifier.
To achieve the above object, according to a further aspect of the present invention, an electronic device is provided.
An electronic device according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for extracting hypernym-hyponym relations between user queries according to the embodiments of the present invention.
To achieve the above object, according to yet another aspect of the present invention, a computer-readable medium is provided.
A computer-readable medium according to an embodiment of the present invention stores a computer program which, when executed by a processor, implements the method for extracting hypernym-hyponym relations between user queries according to the embodiments of the present invention.
An embodiment of the above invention has the following advantages or beneficial effects. Features of multiple dimensions are designed around the characteristics of the e-commerce field, each directed user query pair is characterized by multiple features, and supervised binary classification training is performed, which helps the classifier learn more accurate decision conditions. This overcomes the technical problems that prior-art word-level hypernym-hyponym extraction is limited in scope and that prior-art web search approaches use few features and are hard to apply, and it achieves high recall of true positives while maintaining high precision when a candidate user query pair is predicted positive. Hypernym-hyponym relations between user queries in the e-commerce field are thus extracted accurately, which is of considerable help to the recall of products and advertisements in that field. By clustering user queries and forming candidate user query pairs from pairs of user queries within each cluster, candidate pairs are constructed in a reasonably sound way and the burden of manual labeling is reduced. Because the web pages retrieved in an e-commerce setting are specific product pages, multiple observation indicators are proposed, so that each candidate user query pair can be represented by numerous features that reflect these indicators, which helps to identify accurately whether a user query pair stands in a hypernym-hyponym relation. A gradient boosting decision tree classifier is trained on the training set, and the learned classifier makes a binary positive or negative judgment on the candidate user query pairs and on candidate pairs mined later, so that accurate hypernym-hyponym extraction results are obtained.
Further effects of the above non-conventional optional modes are described below in conjunction with the specific embodiments.
Description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the method for extracting hypernym-hyponym relations between user queries according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main modules of the apparatus for extracting hypernym-hyponym relations between user queries according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present application.
Detailed description
Exemplary embodiments of the present invention are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
Embodiments of the present invention provide a technical solution for extracting hypernym-hyponym relations between user queries in the e-commerce platform field. The solution concerns the semantic understanding of user queries and belongs to the field of natural language processing. To overcome the deficiencies of the prior art, the technical solution of the embodiments designs new features based on the characteristics of e-commerce platforms, characterizes each directed user query pair by multiple features, and performs supervised binary classification training, so that more accurate decision conditions are learned and higher recall is achieved while precision is maintained. The solution can readily be used to extract hypernym-hyponym relations between user queries in an e-commerce setting.
Fig. 1 is a schematic diagram of the main steps of a method for extracting hyponymy relations between user queries according to an embodiment of the present invention.
As shown in Fig. 1, a method for extracting hyponymy relations between user queries according to an embodiment of the present invention mainly includes the following steps:
Step S11: Construct candidate user query pairs. In this step, for each user query in the user query list, queries that may potentially form a hyponymy relation with it are extracted to form user query pairs. In the embodiment of the present invention, the user queries in the list may be clustered, and the queries within each cluster are then combined two by two to form candidate user query pairs.
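The pairing of queries within each cluster can be sketched as follows. This is a minimal illustration and not the embodiment's implementation: the function name `candidate_pairs` and the sample queries are hypothetical, and emitting both orders of each pair is an assumption, made because the later steps treat (q1, q2) as a directed pair.

```python
from itertools import permutations


def candidate_pairs(clusters):
    """Return directed (q1, q2) pairs of distinct queries sharing a cluster."""
    pairs = []
    for queries in clusters:
        # sorted(set(...)) removes duplicates and fixes the output order
        for q1, q2 in permutations(sorted(set(queries)), 2):
            pairs.append((q1, q2))
    return pairs


# Hypothetical clusters of user queries
clusters = [["phone", "smartphone", "android phone"], ["laptop"]]
pairs = candidate_pairs(clusters)
```

A cluster with a single query contributes no pairs, so only queries that were grouped together are ever sent on to manual labeling or classification.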
Clustering in the embodiment of the present invention may be, but is not limited to being, performed as follows. A graph is built from the users' session data, with each user query as a node. Two query nodes are connected by an edge when the number of times the two queries co-occur within a window exceeds a preset threshold. The edge weight is the product of the following four values: the co-occurrence count of the two queries, the similarity of their word sets after word segmentation, the similarity of their embedding vectors, and the similarity of the sets of natural results clicked for them. The query nodes of the graph are then clustered with a label propagation algorithm.
After the candidate pairs have been constructed in step S11, processing continues with step S12.
Step S12: Using features designed in advance according to observation indices, represent each candidate user query pair as a feature vector. Before a classifier is trained, every user query pair must be represented with the same data representation method; to this end, each user query pair is represented as a vector in a feature space. As stated above, the technical solution of the present invention designs new features based on the characteristics of e-commerce platforms and characterizes each directed user query pair by multiple features. In the embodiment of the present invention, features that help judge whether q2 is a hyponym of q1 are designed for six observation indices in a real e-commerce environment (six are used as an example, but the number is not limited to six: only some of them may be used, and other observation indices may be added).
The observation indices in the embodiment of the present invention include one or more of the following:
Observation index one: the number of natural results displayed for the hypernym query is greater than the number of SKUs displayed for the hyponym query. (The number of natural results displayed for a user query is the number of non-advertisement product items displayed for that query, counted by product SKU, with identical products counted only once. SKU is the abbreviation of the unified product number, and each product has a unique SKU number. In the embodiment of the present invention a natural result is therefore represented by the SKU of a product: the number of natural results is the number of SKUs, and the set of natural results is the set of SKUs. Below, "the number of natural results displayed for a user query" is accordingly written as "the number of SKUs displayed for a user query".)
Observation index two: the degree to which the SKU set displayed for the hypernym query contains the SKU set displayed for the hyponym query is greater than the degree to which the SKU set displayed for the hyponym query contains the SKU set displayed for the hypernym query.
Observation index three: the larger the overlap between the SKU sets displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relation between the displayed SKU sets.
Observation index four: the number of natural results clicked for the hypernym query is greater than the number of SKUs clicked for the hyponym query. (The number of natural results clicked for a user query is the number of specific product pages reached by clicking the natural results of that query. As with the displayed results above, in the embodiment of the present invention "the number of clicked natural results" is written as "the number of clicked SKUs" and "the set of clicked natural results" as "the set of clicked SKUs".)
Observation index five: the degree to which the SKU set clicked for the hypernym query contains the SKU set clicked for the hyponym query is greater than the degree to which the SKU set clicked for the hyponym query contains the SKU set clicked for the hypernym query.
Observation index six: the larger the overlap between the SKU sets clicked for the hypernym query and the hyponym query, the higher the confidence in the inclusion relation between the clicked SKU sets.
The features based on the above observation indices may include, but are not limited to, one or more of the following:
Features set according to observation index one: the number of SKUs displayed for the hypernym query of the candidate pair; the number of SKUs displayed for the hyponym query of the candidate pair.
Features set according to observation index two: the degree to which the SKU set displayed for the hypernym query of the candidate pair contains the SKU set displayed for the hyponym query, computed with unweighted WeedsPrec and with unweighted balPrec; the degree to which the SKU set displayed for the hyponym query contains the SKU set displayed for the hypernym query, computed with unweighted WeedsPrec and with unweighted balPrec; the difference between the two unweighted WeedsPrec results and the difference between the two unweighted balPrec results.
Features set according to observation index three: the number of overlapping SKUs displayed for the hypernym query and the hyponym query of the candidate pair; the unweighted LIN score of the SKU sets displayed for the two queries, which reflects the proportion of the intersection.
Features set according to observation index four: the number of SKUs clicked for the hypernym query of the candidate pair; the number of SKUs clicked for the hyponym query of the candidate pair.
Features set according to observation index five: the degree to which the SKU set clicked for the hypernym query contains the SKU set clicked for the hyponym query, computed with WeedsPrec, with ClarkeDE and with balPrec, the weight in each case being the click count; the degree to which the SKU set clicked for the hyponym query contains the SKU set clicked for the hypernym query, computed with WeedsPrec, with ClarkeDE and with balPrec, the weight in each case being the click count; the difference between the two WeedsPrec results, the difference between the two ClarkeDE results, and the difference between the two balPrec results.
Features set according to observation index six: the number of overlapping SKUs clicked for the hypernym query and the hyponym query of the candidate pair; the LIN score of the SKU sets clicked for the two queries, which reflects the proportion of the intersection.
Other features: feature a, the degree to which the SKU set displayed for the hypernym query contains the SKU set clicked for the hyponym query, computed with unweighted WeedsPrec; feature b, the degree to which the SKU set displayed for the hyponym query contains the SKU set clicked for the hypernym query, computed with unweighted WeedsPrec; the difference between the results of feature a and feature b.
Once the features representing user query pairs have been designed as above, the classifier can be trained.
Step S13: After a preset number of user query pairs among the candidate user query pairs have been manually labeled, train a classifier by supervised learning. The specific process may include: manually labeling a preset number of user query pairs selected from the candidate pairs, and dividing them into a training set, a validation set and a test set according to preset proportions; training the classifier, for example a gradient boosting decision tree classifier, on the training set represented as feature vectors, and then tuning the hyperparameters of the classifier on the validation set represented as feature vectors; and using the classifier to judge whether the user query pairs in the test set satisfy the hyponymy relation, and then computing precision and recall.
The classifier may be trained as one or more of a gradient boosting decision tree classifier, a random forest classifier and a support vector machine classifier.
Step S14: Use the trained classifier to judge whether the remaining user query pairs among the candidate pairs satisfy the hyponymy relation, and output the user query pairs that satisfy it as the extraction result.
The main steps of the method for extracting hyponymy relations between user queries according to the embodiment of the present invention have been described above. The detailed flow of the method is described below with reference to specific technical means.
The specific flow for extracting hyponymy relations between user queries is as follows:
Step 1: Obtain candidate user query pairs
For each user query qi in the user query list, queries that may potentially form a hyponymy relation with qi are first extracted from the list, forming different candidate user query pairs (q1, q2), where q1 is the hypernym query of the candidate pair and q2 is the hyponym query. Candidate pairs may be obtained by simple conditions, such as the two queries having clicks on the same specific product page, or a click inclusion measure exceeding a certain threshold. In the embodiment of the present invention, candidate pairs may be, but are not limited to being, formed by clustering user queries and pairing the queries within each cluster. More reasonable candidate pairs reduce the burden of manual labeling, because a larger proportion of them can be labeled as positive examples. Alternatively, any two user queries may be paired to form a candidate pair. The method used to construct candidate pairs does not affect the data representation or the classification performance of the classifier under that representation.
In the clustering process above, the clustering rule is as follows. From session data (data recording which queries a user issued consecutively), the number of times each pair of user queries co-occurs within a window is counted; queries whose co-occurrence count exceeds a preset threshold are connected by an edge, giving a graph with user queries as nodes. The edge weight is set to the product of the following four values: the co-occurrence count of the two queries, the similarity of their word sets after word segmentation, the similarity of their embedding vectors, and the similarity of the SKU sets clicked for them. After the graph is built, its nodes (user queries) are clustered with a label propagation algorithm.
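The clustering rule can be sketched as below. This is a simplified illustration under several assumptions that are not taken from the source: the window is read as "the next `window` queries in the same session", the hypothetical callable `similarity_product` stands in for the product of the three similarities (word set, embedding, clicked SKU set), and the label propagation uses a fixed update order with alphabetical tie-breaking so that the result is deterministic.

```python
from collections import Counter, defaultdict


def build_graph(sessions, window=1, min_cooccur=1,
                similarity_product=lambda a, b: 1.0):
    """Connect queries whose in-window co-occurrence count exceeds the threshold."""
    cooccur = Counter()
    for session in sessions:
        for i, a in enumerate(session):
            # queries issued within `window` positions after `a`
            for b in session[i + 1:i + 1 + window]:
                if a != b:
                    cooccur[frozenset((a, b))] += 1
    graph = defaultdict(dict)
    for pair, count in cooccur.items():
        if count > min_cooccur:
            a, b = sorted(pair)
            # edge weight: co-occurrence count times the three similarities
            weight = count * similarity_product(a, b)
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def label_propagation(graph, max_iters=20):
    """Each node repeatedly adopts the label with the largest incident weight."""
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(float)
            for neighbor, weight in graph[node].items():
                scores[labels[neighbor]] += weight
            # highest score wins; ties go to the alphabetically first label
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for node, lab in labels.items():
        clusters[lab].add(node)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical session data: each list is one user's consecutive queries
sessions = [
    ["phone", "smartphone"], ["phone", "smartphone"],
    ["smartphone", "android phone"], ["smartphone", "android phone"],
    ["laptop", "notebook"], ["laptop", "notebook"],
    ["phone", "laptop"],
]
graph = build_graph(sessions)
clusters = label_propagation(graph)
```

In this toy data the single co-occurrence of "phone" and "laptop" does not exceed the threshold, so no edge joins the two groups and they end up in separate clusters.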
It should be noted that clustering user queries is only one way of obtaining candidate pairs. Other ways exist, and the choice does not affect the subsequent idea of using a classifier to judge whether a candidate pair is in a hyponymy relation.
Step 2: Data representation and feature design
Before the classifier is trained, every user query pair must be represented with the same data representation method. Each candidate user query pair (q1, q2) is first represented as a vector in a feature space.
Given that the web pages retrieved in an e-commerce environment are specific product pages (SKUs), it is observed that, within a candidate pair, a true hypernym query has the following characteristics compared with a true hyponym query: 1. more SKUs are displayed; 2. the displayed SKU set usually contains more of the SKUs displayed for the true hyponym query; 3. the more the displayed SKU sets overlap, the higher the confidence in the inclusion relation between them; 4. more SKUs are clicked; 5. the clicked SKU set usually contains more of the SKUs clicked for the true hyponym query; 6. the more the clicked SKU sets overlap, the higher the confidence in the inclusion relation between them. Based on these observations, each candidate pair is represented by multiple features that reflect the characteristics above. A model is then trained on the training set with gradient boosting decision trees, and the learned model makes a binary positive/negative judgment on newly mined candidate pairs.
In the embodiment of the present invention, the following features, which help judge whether q2 is a hyponym of q1, are designed according to the six observation indices for a real e-commerce environment described above.
Based on observation index one (more SKUs are displayed for the hypernym query), the following are designed:
Feature 1: the number of SKUs displayed for q1 (as stated above, the number of natural results is counted by SKU, and the natural results displayed are the SKUs shown as the user pages through the results; the number of natural results displayed for q1 is the number of all SKUs displayed under the query q1, with the same SKU counted only once)
Feature 2: the number of SKUs displayed for q2
Based on observation index two (the SKU set displayed for the hypernym query usually contains more of the SKUs displayed for the true hyponym query), the following are designed:
Feature 3: the degree to which the SKU set displayed for q1 contains the SKU set displayed for q2, subdivided into two features:
Feature 3.1: the inclusion relation computed with unweighted WeedsPrec (the formula is given below)
Feature 3.2: the inclusion relation computed with unweighted balPrec
Feature 4: the degree to which the SKU set displayed for q2 contains the SKU set displayed for q1, subdivided into two features:
Feature 4.1: the inclusion relation computed with unweighted WeedsPrec
Feature 4.2: the inclusion relation computed with unweighted balPrec
Feature 5: the difference between feature 3 and feature 4, subdivided into two features:
Feature 5.1: the difference obtained when the inclusion relation is computed with unweighted WeedsPrec
Feature 5.2: the difference obtained when the inclusion relation is computed with unweighted balPrec
Based on observation index three (the more the displayed SKU sets overlap, the higher the confidence in the inclusion relation between them), the following are designed:
Feature 6: the number of overlapping SKUs displayed for q1 and q2
Feature 7: the unweighted LIN score of the SKU sets displayed for q1 and q2, reflecting the proportion of the intersection (the formula is given below)
Two features are proposed for observation index three: feature 6 is the size of the intersection, and feature 7 is the score computed by LIN. As the formula below shows, because no weights are used, the numerator of LIN is simply the size of the intersection and the denominator is the sum of the sizes of the two sets.
Features 6 and 7 are designed on the basis of observation index three for the following reason. When the SKUs displayed for q1 and q2 intersect in only a few items and q2 itself happens to display few SKUs, the score for "the SKUs displayed for q1 contain the SKUs displayed for q2" is likely to be high. If instead the intersection is large, neither q1 nor q2 displays few SKUs, and the features based on the other observation indices are less likely to deviate substantially. Introducing features 6 and 7 as confidence features based on observation index three helps the subsequent classification model pick out samples for which the other features yield a high inclusion score that is nevertheless not trustworthy (without confidence features, such samples are easily misjudged as positive examples).
For example, if the inclusion relation computed by the other features is high but the LIN score of feature 7 is very low, then either the SKU set of the hyponym query in the candidate pair is small, or the SKU set of the hyponym query is larger while the SKU set of the hypernym query is huge. Adding the intersection size (feature 6) lets the classifier that judges hyponymy between user queries tell which of the two cases applies: a large intersection indicates the latter, and a small intersection the former.
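The interplay of features 6 and 7 can be illustrated with a small sketch. The unweighted LIN score below follows the description given in the text (intersection size divided by the sum of the two set sizes); the function name `overlap_features` and the SKU sets are hypothetical.

```python
def overlap_features(shown_q1, shown_q2):
    """Return (feature 6, feature 7): intersection size and unweighted LIN."""
    inter = len(shown_q1 & shown_q2)
    total = len(shown_q1) + len(shown_q2)
    lin = inter / total if total else 0.0
    return inter, lin


# Case A: the hyponym query displays very few SKUs
q1_a = set(range(100))
q2_a = {0, 1}
inter_a, lin_a = overlap_features(q1_a, q2_a)

# Case B: the hyponym query displays many SKUs, the hypernym query a huge number
q1_b = set(range(10000))
q2_b = set(range(150)) | set(range(10000, 10050))
inter_b, lin_b = overlap_features(q1_b, q2_b)
```

Both cases give a low LIN score, but the intersection size separates them: it is small in case A and large in case B, which is the distinction the paragraph above attributes to feature 6.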
Based on observation index four (more SKUs are clicked for the hypernym query), the following are designed:
Feature 8: the number of SKUs clicked for q1
Feature 9: the number of SKUs clicked for q2
Based on observation index five (the SKU set clicked for the hypernym query usually contains more of the SKUs clicked for the true hyponym query), the following are designed:
Feature 10: the degree to which the SKU set clicked for q1 contains the SKU set clicked for q2, subdivided into three features:
Feature 10.1: the inclusion relation computed with WeedsPrec, weighted by click count
Feature 10.2: the inclusion relation computed with ClarkeDE, weighted by click count
Feature 10.3: the inclusion relation computed with balPrec, weighted by click count
Feature 11: the degree to which the SKU set clicked for q2 contains the SKU set clicked for q1, subdivided into three features:
Feature 11.1: the inclusion relation computed with WeedsPrec, weighted by click count
Feature 11.2: the inclusion relation computed with ClarkeDE, weighted by click count
Feature 11.3: the inclusion relation computed with balPrec, weighted by click count
Feature 12: the difference between feature 10 and feature 11, subdivided into three features:
Feature 12.1: the difference obtained when the inclusion relation is computed with WeedsPrec
Feature 12.2: the difference obtained when the inclusion relation is computed with ClarkeDE
Feature 12.3: the difference obtained when the inclusion relation is computed with balPrec
Based on observation index six (the more the clicked SKU sets overlap, the higher the confidence in the inclusion relation between them), the following are designed:
Feature 13: the number of overlapping SKUs clicked for q1 and q2
Feature 14: the LIN score of the SKU sets clicked for q1 and q2, reflecting the proportion of the intersection
In addition, some other features may be designed:
Feature 15: the degree to which the SKU set displayed for q1 contains the SKU set clicked for q2, computed with unweighted WeedsPrec
Feature 16: the degree to which the SKU set displayed for q2 contains the SKU set clicked for q1, computed with unweighted WeedsPrec
Feature 17: the difference between feature 15 and feature 16
The formulas used in the feature design above are as follows. Inclusion relation formulas over feature vectors: given the feature vector F_x of any x, where w_x(f) is the weight of x on feature f, the degree to which v contains u is:
Inclusion relation confidence formula:
Taking feature 3.1 as an example, the degree to which the SKU set displayed for q1 contains the SKU set displayed for q2 is computed with unweighted WeedsPrec as follows.
Suppose the SKU set displayed for q1 is {SKU1, SKU2, SKU3, SKU4} and the SKU set displayed for q2 is {SKU1, SKU2, SKU9}. Because no weights are used, every SKU has weight 1. Then WeedsPrec(q2, q1) = (weight of SKU1 + weight of SKU2) / (weight of SKU1 + weight of SKU2 + weight of SKU3 + weight of SKU4) = 0.5.
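The worked example can be reproduced with a short sketch. The two functions below use the commonly cited definitions of WeedsPrec and ClarkeDE from the distributional inclusion literature, which is an assumption, since the formulas themselves are not reproduced in this text. Which argument corresponds to which position in the notation WeedsPrec(q2, q1) is likewise an assumption: the call shown is simply the one whose denominator is the total weight of the SKUs displayed for q1, as in the example. The click counts are hypothetical.

```python
def weeds_prec(w_u, w_v):
    """Share of u's total weight carried by features that also occur in v."""
    total = sum(w_u.values())
    shared = sum(w for f, w in w_u.items() if f in w_v)
    return shared / total if total else 0.0


def clarke_de(w_u, w_v):
    """Like weeds_prec, but each shared feature counts min(w_u, w_v)."""
    total = sum(w_u.values())
    shared = sum(min(w, w_v[f]) for f, w in w_u.items() if f in w_v)
    return shared / total if total else 0.0


# Unweighted case from the example: every displayed SKU has weight 1
shown_q1 = {"SKU1": 1, "SKU2": 1, "SKU3": 1, "SKU4": 1}
shown_q2 = {"SKU1": 1, "SKU2": 1, "SKU9": 1}
value = weeds_prec(shown_q1, shown_q2)      # (1 + 1) / (1 + 1 + 1 + 1)
reverse = weeds_prec(shown_q2, shown_q1)    # (1 + 1) / (1 + 1 + 1)

# Weighted case: hypothetical click counts per SKU
clicks_q1 = {"SKU1": 5, "SKU2": 1}
clicks_q2 = {"SKU1": 2, "SKU3": 2}
weighted = clarke_de(clicks_q1, clicks_q2)  # min(5, 2) / (5 + 1)
```

Computing the measure in both directions and taking the difference gives the kind of asymmetry signal that features 5 and 12 are described as capturing.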
In the feature design above, features 1, 2, 8 and 9 reflect to some extent how specialized each user query is; features 3, 4, 5, 10, 11, 12, 15, 16 and 17 reflect the degree of semantic inclusion between user queries; and features 6, 7, 13 and 14 reflect the confidence of the features related to semantic inclusion.
In total, 26 features are designed. Each candidate pair (q1, q2) is therefore represented as a 26-dimensional vector; each dimension corresponds to one feature, and its value is the value of the candidate pair (q1, q2) on that feature. The observation indices and features of the technical solution are not limited to those enumerated in the embodiment of the present invention; when the solution is applied in practice, observation indices and features may be added or removed according to actual observation needs.
Step 3: Training
A classifier is used to judge whether a candidate pair (q1, q2) satisfies the hyponymy relation. The standard steps for training a classifier are as follows. An appropriate number of user query pairs is drawn from the set of candidate pairs and manually labeled as satisfying or not satisfying the hyponymy relation. ("Appropriate number" has the following meaning: following standard practice, a batch of data is first labeled manually and used to train the classifier, and the effect on the validation set is then checked; if the training error on the training set is small but the effect on the validation set is comparatively poor, the training data are insufficient and more candidate pairs must be drawn for manual labeling.) Each drawn candidate pair (q1, q2) is manually given a label: 1 indicates that q1 is a hypernym of q2, and 0 indicates that q1 is not a hypernym of q2. The labels guide the classifier in learning how to judge from the feature vector whether the hyponymy relation is satisfied. The drawn pairs are then divided into a training set, a validation set and a test set in appropriate proportions.
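The division of the labeled pairs can be sketched as follows. The 60/20/20 proportions, the fixed seed and the name `split_labeled` are illustrative assumptions; the text only calls for "appropriate proportions".

```python
import random


def split_labeled(examples, percents=(60, 20, 20), seed=0):
    """Shuffle labeled examples and cut them into train / validation / test."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * percents[0] // 100
    n_val = len(shuffled) * percents[1] // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical labeled pairs: ((q1, q2), label)
labeled = [((f"q{i}", f"q{i + 1}"), i % 2) for i in range(10)]
train, val, test = split_labeled(labeled)
```

Shuffling before cutting keeps positive and negative examples from being concentrated in one subset when the labeled data arrive in some systematic order.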
Each candidate pair in the training set and the validation set is represented as a 26-dimensional feature vector. A classifier, for example but not limited to a gradient boosting decision tree classifier (classifiers such as random forests or support vector machines may also be used in the embodiment of the present invention), is trained on the feature vectors of the training set, and its hyperparameters are tuned on the validation set to prevent it from overfitting the training set.
To quantify the classification performance of the classifier on unseen samples, the classifier judges whether the pairs in the test set satisfy the hyponymy relation, and precision and recall are then computed.
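The two evaluation measures can be computed as in the sketch below, where precision is TP/(TP+FP) and recall is TP/(TP+FN). The function name and the sample labels are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical test-set labels and classifier predictions
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1]
precision, recall = precision_recall(y_true, y_pred)  # TP=2, FP=1, FN=2
```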
Step 4: Prediction
In the prediction stage, the trained classifier is used to mine all user query pairs that satisfy the hyponymy relation. First, the remaining unlabeled candidate pairs are represented as 26-dimensional feature vectors, and the trained gradient boosting decision tree predicts whether each unlabeled pair is a positive example. The candidate pairs predicted positive, together with the pairs manually labeled positive, form the final output of the embodiment of the present invention, namely the user query pairs that satisfy the hyponymy relation.
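How the final output is assembled can be sketched as follows. The `featurize` and `predict` arguments are hypothetical stand-ins for the 26-feature representation and the trained classifier; the toy versions used here (query lengths and a length comparison) exist only to make the example runnable and have nothing to do with the real features.

```python
def extract_pairs(labeled, unlabeled, featurize, predict):
    """Union of manually labeled positives and classifier-predicted positives."""
    positives = [pair for pair, label in labeled if label == 1]
    for pair in unlabeled:
        if predict(featurize(pair)) == 1:
            positives.append(pair)
    return positives


# Toy stand-ins, NOT the features or classifier described in the text
toy_featurize = lambda pair: [len(pair[0]), len(pair[1])]
toy_predict = lambda vec: 1 if vec[0] < vec[1] else 0

labeled = [(("phone", "smartphone"), 1), (("phone", "laptop"), 0)]
unlabeled = [("shoes", "running shoes"), ("running shoes", "shoes")]
result = extract_pairs(labeled, unlabeled, toy_featurize, toy_predict)
```

Because the pairs are directed, the two orders of the same query pair are judged separately, and at most one of them would normally be expected in the output.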
According to the method for extracting hyponymy relations between user queries in the e-commerce field described above, the candidate pairs are classified as positive or negative by the trained gradient boosting decision tree model, and the user query pairs that satisfy the hyponymy relation are obtained as the output.
When implementing the technical solution of the present invention to extract hyponymy relations between user queries, the inventors trained on a manually labeled training set of 338 pairs and, on a validation set of 200 pairs, tuned the degree of negative-sample up-sampling and the hyperparameters of the model (the number of trees and their maximum depth). Among the candidate pairs predicted positive on the test set, precision TP/(TP+FP) was 93.2% and recall TP/(TP+FN) was 36.6%. If only one of the features is used, for example feature 12.1, the maximum precision obtained by adjusting the threshold on the validation set is lower than the 93.2% precision of the classifier, and at that threshold the precision on the test set is 88.9% while the recall is only 7.1%. Therefore, in practical applications, a candidate pair may be represented with all 26 features above to ensure precision and recall. If high precision or recall is not required, one or more of the 26 features may be selected to form the feature vector of a candidate pair. In summary, the method for extracting hyponymy relations between user queries according to the embodiment achieves a relatively high recall while maintaining high precision.
As can be seen from the method for extracting hyponymy relations between user queries according to the embodiment of the present invention, multi-dimensional features are designed based on the characteristics of the e-commerce field, each directed user query pair is characterized by multiple features, and supervised binary classification training is performed, which helps the classifier learn a more accurate decision condition. This overcomes the technical problems of the prior art, namely the limitations of extraction based on lexical hyponymy and the difficulty of judgment caused by the small number of features in the web search domain. As a result, when candidate pairs are predicted positive with high precision, a higher recall of true positive examples is obtained at the same time, achieving the technical effect of accurately extracting hyponymy relations between user queries in the e-commerce field and thereby greatly helping the recall of products and advertisements. By clustering user queries and pairing the queries within each cluster, candidate pairs are constructed more reasonably and the burden of manual labeling is reduced. By proposing multiple observation indices in view of the fact that the web pages retrieved in an e-commerce environment are specific product pages, candidate pairs can be represented by numerous features that reflect those indices, which helps identify accurately whether a user query pair is in a hyponymy relation. By training a classifier on the training set with gradient boosting decision trees and using the learned classifier to make a binary positive/negative judgment on the candidate pairs and on subsequently mined candidate pairs, an accurate hyponymy extraction result for user queries is obtained.
Fig. 2 is a schematic diagram of the main modules of an apparatus for extracting hyponymy relations between user queries according to an embodiment of the present invention.
As shown in Fig. 2, an apparatus 20 for extracting hyponymy relations between user queries according to an embodiment of the present invention mainly includes a candidate module 201, a representation module 202, a training module 203 and an extraction module 204.
The candidate module 201 is configured to construct candidate user query pairs. The representation module 202 is configured to represent each candidate user query pair as a feature vector using features designed in advance according to observation indices. The training module 203 is configured to train a classifier by supervised learning after a preset number of user query pairs among the candidate pairs have been manually labeled. The extraction module 204 is configured to use the trained classifier to judge whether the remaining user query pairs among the candidate pairs satisfy the hyponymy relation, and to output the user query pairs that satisfy it as the extraction result.
Wherein, candidate block 201 can be additionally used in:User's inquiry is clustered;Then user in class is inquired into group two-by-two
It closes, constitutes candidate user inquiry pair.
In addition, the candidate module 201 may further be configured to: construct a graph from users' session data, in which user queries serve as the nodes of the graph and user query nodes whose co-occurrence count within a window exceeds a preset threshold are connected by an edge of the graph, the edge weight being the product of the following four values: the co-occurrence count of the user queries, the similarity of the word sets obtained after word segmentation of the user queries, the similarity of the embedding vectors of the user queries, and the similarity of the sets of SKUs clicked for the user queries; and cluster the user query nodes on the graph using a label propagation algorithm.
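The graph construction and clustering just described can be sketched as below. The text does not say which similarity measures are used, so Jaccard similarity for the word sets and clicked-SKU sets and cosine similarity for the embeddings are assumptions, as are all function names, the window size, the threshold, and the simple asynchronous label propagation loop.

```python
import math
import random
from collections import Counter, defaultdict


def cooccurrence_counts(sessions, window=2):
    """Count query pairs that occur within `window` positions in a session."""
    counts = Counter()
    for session in sessions:
        for i, q1 in enumerate(session):
            for q2 in session[i + 1 : i + 1 + window]:
                if q1 != q2:
                    counts[tuple(sorted((q1, q2)))] += 1
    return counts


def jaccard(a, b):
    """Assumed set similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(u, v):
    """Assumed embedding similarity: cosine of the two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_graph(sessions, tokens, embeddings, clicked_skus, threshold=1, window=2):
    """Connect queries whose co-occurrence count exceeds `threshold`.

    The edge weight is the product of the four values named in the text.
    """
    graph = defaultdict(dict)
    for (q1, q2), n in cooccurrence_counts(sessions, window).items():
        if n <= threshold:
            continue
        weight = (
            n
            * jaccard(tokens[q1], tokens[q2])
            * cosine(embeddings[q1], embeddings[q2])
            * jaccard(clicked_skus[q1], clicked_skus[q2])
        )
        if weight > 0:
            graph[q1][q2] = weight
            graph[q2][q1] = weight
    return graph


def label_propagation(graph, rounds=10, seed=0):
    """Each node repeatedly adopts the label with the largest total edge weight."""
    rng = random.Random(seed)
    labels = {node: node for node in graph}
    nodes = sorted(graph)
    for _ in range(rounds):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            votes = defaultdict(float)
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight
            if not votes:
                continue
            # Sorting first makes tie-breaking deterministic.
            best = max(sorted(votes), key=votes.get)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Made-up session data for illustration only.
sessions = [["phone", "android phone"], ["phone", "android phone"],
            ["laptop", "gaming laptop"], ["laptop", "gaming laptop"]]
tokens = {"phone": ["phone"], "android phone": ["android", "phone"],
          "laptop": ["laptop"], "gaming laptop": ["gaming", "laptop"]}
embeddings = {"phone": [1.0, 0.0], "android phone": [0.9, 0.1],
              "laptop": [0.0, 1.0], "gaming laptop": [0.1, 0.9]}
clicked_skus = {"phone": {"s1", "s2"}, "android phone": {"s2"},
                "laptop": {"s3", "s4"}, "gaming laptop": {"s4"}}

graph = build_graph(sessions, tokens, embeddings, clicked_skus)
labels = label_propagation(graph)
```

Queries that end up with the same label form one cluster. Because the edge weight is a product, a pair with zero similarity on any one of the four signals gets no edge in this sketch, which is one possible reading of the text rather than a confirmed design choice.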
In an embodiment of the present invention, the observation indicators may include, but are not limited to, one or more of the following:
Observation indicator one: the number of SKUs displayed for the hypernym query is greater than the number of SKUs displayed for the hyponym query;
Observation indicator two: the degree to which the set of SKUs displayed for the hypernym query contains the set of SKUs displayed for the hyponym query is greater than the degree to which the set of SKUs displayed for the hyponym query contains the set of SKUs displayed for the hypernym query;
Observation indicator three: the larger the overlap between the sets of SKUs displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relationship between the sets of SKUs displayed for the queries;
Observation indicator four: the number of SKUs clicked for the hypernym query is greater than the number of SKUs clicked for the hyponym query;
Observation indicator five: the degree to which the set of SKUs clicked for the hypernym query contains the set of SKUs clicked for the hyponym query is greater than the degree to which the set of SKUs clicked for the hyponym query contains the set of SKUs clicked for the hypernym query;
Observation indicator six: the larger the overlap between the sets of SKUs displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relationship between the sets of SKUs clicked for the queries.
The aforementioned features may include, but are not limited to, one or more of the following, where "hypernym query" and "hyponym query" refer to the two queries of a candidate user query pair:
Features set according to observation indicator one: the number of SKUs displayed for the hypernym query, and the number of SKUs displayed for the hyponym query;
Features set according to observation indicator two: the degree to which the set of SKUs displayed for the hypernym query contains the set of SKUs displayed for the hyponym query, computed with unweighted WeedsPrec and with unweighted balPrec; the degree to which the set of SKUs displayed for the hyponym query contains the set of SKUs displayed for the hypernym query, computed with unweighted WeedsPrec and with unweighted balPrec; the difference between the two unweighted WeedsPrec results, and the difference between the two unweighted balPrec results;
Features set according to observation indicator three: the number of SKUs that overlap between those displayed for the hypernym query and those displayed for the hyponym query, and the unweighted LIN score of the two displayed SKU sets, which reflects the proportion of the intersection;
Features set according to observation indicator four: the number of SKUs clicked for the hypernym query, and the number of SKUs clicked for the hyponym query;
Features set according to observation indicator five: the degree to which the set of SKUs clicked for the hypernym query contains the set of SKUs clicked for the hyponym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, the weight in each case being the number of clicks; the degree to which the set of SKUs clicked for the hyponym query contains the set of SKUs clicked for the hypernym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, the weight in each case being the number of clicks; the difference between the two WeedsPrec results, the difference between the two ClarkeDE results, and the difference between the two balPrec results;
Features set according to observation indicator six: the number of SKUs that overlap between those clicked for the hypernym query and those clicked for the hyponym query, and the LIN score of the two clicked SKU sets, which reflects the proportion of the intersection.
In addition, some other features may also be included, for example: feature a: the degree to which the set of SKUs displayed for the hypernym query contains the set of SKUs clicked for the hyponym query, computed with unweighted WeedsPrec; feature b: the degree to which the set of SKUs displayed for the hyponym query contains the set of SKUs clicked for the hypernym query, computed with unweighted WeedsPrec; and the difference between the results of feature a and feature b.
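The inclusion measures named above can be sketched as follows. The text does not give formulas, so the definitions here follow the commonly cited forms of WeedsPrec, ClarkeDE, LIN, and balPrec from the distributional-inclusion literature and should be read as an assumption. Reading "the degree to which set A contains set B" as the share of B's weight that is covered by A is also an assumption, and all names and data are illustrative.

```python
import math


def weeds_prec(u, v):
    """Share of u's total weight that falls on items also present in v."""
    total = sum(u.values())
    return sum(w for k, w in u.items() if k in v) / total if total else 0.0


def clarke_de(u, v):
    """Like WeedsPrec, but each shared item counts only min(w_u, w_v)."""
    total = sum(u.values())
    return sum(min(w, v[k]) for k, w in u.items() if k in v) / total if total else 0.0


def lin(u, v):
    """Symmetric score: weight on shared items over total weight of both."""
    total = sum(u.values()) + sum(v.values())
    return sum(u[k] + v[k] for k in u if k in v) / total if total else 0.0


def bal_prec(u, v):
    """Geometric mean of LIN and WeedsPrec."""
    return math.sqrt(lin(u, v) * weeds_prec(u, v))


def unweighted(skus):
    """Turn a plain SKU set into a weight map with weight 1 per SKU."""
    return {sku: 1.0 for sku in skus}


def displayed_features(hyper_skus, hypo_skus):
    """Features for observation indicators one to three (displayed SKUs)."""
    hyper, hypo = unweighted(hyper_skus), unweighted(hypo_skus)
    # Assumed reading: "hypernym set contains hyponym set" is the share of
    # the hyponym set covered by the hypernym set, i.e. measure(hypo, hyper).
    w_fwd, w_bwd = weeds_prec(hypo, hyper), weeds_prec(hyper, hypo)
    b_fwd, b_bwd = bal_prec(hypo, hyper), bal_prec(hyper, hypo)
    return {
        "n_hyper": len(hyper_skus),
        "n_hypo": len(hypo_skus),
        "weeds_fwd": w_fwd,
        "weeds_bwd": w_bwd,
        "weeds_diff": w_fwd - w_bwd,
        "bal_fwd": b_fwd,
        "bal_bwd": b_bwd,
        "bal_diff": b_fwd - b_bwd,
        "overlap": len(set(hyper_skus) & set(hypo_skus)),
        "lin": lin(hyper, hypo),
    }


# Made-up SKU ids: the hyponym's displayed set is a subset of the hypernym's.
features = displayed_features({"a", "b", "c", "d"}, {"a", "b"})

# Click-weighted variant for observation indicator five: weights are click counts.
hyper_clicks = {"a": 3, "b": 1, "c": 2}
hypo_clicks = {"a": 1, "b": 1}
clarke_fwd = clarke_de(hypo_clicks, hyper_clicks)
clarke_bwd = clarke_de(hyper_clicks, hypo_clicks)
```

For a true hypernym-hyponym pair the forward scores are expected to exceed the backward ones, so the difference features give the classifier a direct signal about the direction of the relationship.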
In an embodiment of the present invention, the training module 203 may further be configured to: after a preset number of user query pairs selected from the candidate user query pairs have been manually labeled, divide the preset number of user query pairs into a training set, a validation set, and a test set according to a preset ratio; train a classifier on the training set, represented as feature vectors, using a gradient boosted decision tree classifier, and then tune the hyperparameters of the classifier using the validation set, likewise represented as feature vectors; and use the classifier to judge whether the user query pairs in the test set satisfy a hypernym-hyponym relationship, and then compute the precision and the recall.
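The split and the evaluation step can be sketched as below. The 60/20/20 ratio, the function names, and the sample labels are assumptions made for illustration; the text only says that a preset ratio is used. The classifier itself is left out of the sketch.

```python
import random


def split_labeled_pairs(labeled, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle labeled pairs and split them into train / validation / test."""
    items = list(labeled)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train : n_train + n_val]
    test = items[n_train + n_val :]
    return train, val, test


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Ten hypothetical labeled pairs: (pair id, label).
labeled = [(i, i % 2) for i in range(10)]
train, val, test = split_labeled_pairs(labeled)

# Hypothetical test-set labels and classifier predictions.
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a fuller pipeline, a gradient boosted decision tree classifier (for example scikit-learn's `GradientBoostingClassifier`, named here only as one possible choice) would be fitted on `train`, its hyperparameters chosen on `val`, and `precision_recall` applied once to its predictions on `test`.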
In addition, the training module 203 may further be configured to train the classifier using one or more of a gradient boosted decision tree classifier, a random forest classifier, and a support vector machine classifier.
As can be seen from the above, multi-dimensional features are designed on the basis of the characteristics of the e-commerce field, directed user query pairs are characterized by these features, and supervised binary classification training is performed, which helps the classifier learn more accurate decision conditions. This overcomes the limitations of lexical hypernym-hyponym extraction in the prior art, as well as the technical problem that the web search field in the prior art offers few features and makes judgment difficult. As a result, candidate user query pairs predicted as positive have high precision while the recall of true positives is also high, so that hypernym-hyponym relationships between user queries in the e-commerce field are extracted accurately, which substantially helps the recall of commodities and advertisements in the e-commerce field. By clustering user queries and pairing the queries within each cluster two by two to form candidate user query pairs, candidate user query pairs can be constructed in a reasonable way and the burden of manual labeling is reduced. By proposing multiple observation indicators based on the fact that the web pages retrieved in an e-commerce environment are specific commodity pages, candidate user query pairs can be represented by numerous features that take these observation indicators into account, which helps to accurately identify whether a user query pair is in a hypernym-hyponym relationship. By training a classifier on the training set with gradient boosted decision trees, and using the learned classifier to make binary positive/negative judgments on the candidate user query pairs and on candidate user query pairs newly mined later, accurate hypernym-hyponym extraction results for user queries can be obtained.
Referring now to Fig. 3, a schematic structural diagram is shown of a computer system 300 suitable for implementing a terminal device of an embodiment of the present application. The terminal device shown in Fig. 3 is merely an example and should not impose any limitation on the function or scope of use of the embodiments of the present application.
As shown in Fig. 3, the computer system 300 includes a central processing unit (CPU) 301, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage section 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the system 300. The CPU 301, the ROM 302, and the RAM 303 are connected to one another via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
The following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse, and the like; an output section 307 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card or a modem. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 310 as needed, so that a computer program read from it can be installed into the storage section 308 as needed.
In particular, according to the disclosed embodiments of the present invention, the process described above with reference to the main step diagram may be implemented as a computer software program. For example, the disclosed embodiments of the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the main step diagram. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 309, and/or installed from the removable medium 311. When the computer program is executed by the central processing unit (CPU) 301, the above-described functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF, and the like, or any suitable combination of the above.
The main step diagrams and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a main step diagram or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or main step diagrams, and combinations of blocks in the block diagrams or main step diagrams, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor including a candidate module, a representation module, a training module, and an extraction module. The names of these modules do not, under certain circumstances, constitute a limitation on the modules themselves; for example, the candidate module may also be described as "a module for constructing candidate user query pairs".
In another aspect, the present application further provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being incorporated into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: construct candidate user query pairs; represent the candidate user query pairs as feature vectors, using features set in advance according to observation indicators; train a classifier by supervised learning after a preset number of user query pairs among the candidate user query pairs have been manually labeled; and use the trained classifier to judge whether the remaining user query pairs among the candidate user query pairs satisfy a hypernym-hyponym relationship, and output the user query pairs that satisfy it as the extraction result.
According to the technical solution of the embodiments of the present invention, multi-dimensional features are designed on the basis of the characteristics of the e-commerce field, directed user query pairs are characterized by these features, and supervised binary classification training is performed, which helps the classifier learn more accurate decision conditions. This overcomes the limitations of lexical hypernym-hyponym extraction in the prior art, as well as the technical problem that the web search field in the prior art offers few features and makes judgment difficult. As a result, candidate user query pairs predicted as positive have high precision while the recall of true positives is also high, so that hypernym-hyponym relationships between user queries in the e-commerce field are extracted accurately, which substantially helps the recall of commodities and advertisements in the e-commerce field. By clustering user queries and pairing the queries within each cluster two by two to form candidate user query pairs, candidate user query pairs can be constructed in a reasonable way and the burden of manual labeling is reduced. By proposing multiple observation indicators based on the fact that the web pages retrieved in an e-commerce environment are specific commodity pages, candidate user query pairs can be represented by numerous features that take these observation indicators into account, which helps to accurately identify whether a user query pair is in a hypernym-hyponym relationship. By training a classifier on the training set with gradient boosted decision trees, and using the learned classifier to make binary positive/negative judgments on the candidate user query pairs and on candidate user query pairs newly mined later, accurate hypernym-hyponym extraction results for user queries can be obtained.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions may occur. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (16)
1. A method for extracting hypernym-hyponym relationships between user queries, characterized by comprising:
constructing candidate user query pairs;
representing the candidate user query pairs as feature vectors, using features set in advance according to observation indicators;
training a classifier by supervised learning after a preset number of user query pairs among the candidate user query pairs have been manually labeled;
using the trained classifier to judge whether the remaining user query pairs among the candidate user query pairs satisfy a hypernym-hyponym relationship, and outputting the user query pairs that satisfy the hypernym-hyponym relationship as the extraction result.
2. The method according to claim 1, characterized in that constructing candidate user query pairs comprises:
clustering the user queries;
then combining the user queries within each cluster two by two to form candidate user query pairs.
3. The method according to claim 2, characterized in that clustering the user queries comprises:
constructing a graph from users' session data, in which user queries serve as the nodes of the graph and user query nodes whose co-occurrence count within a window exceeds a preset threshold are connected by an edge of the graph, the edge weight being the product of the following four values: the co-occurrence count of the user queries, the similarity of the word sets obtained after word segmentation of the user queries, the similarity of the embedding vectors of the user queries, and the similarity of the sets of natural results clicked for the user queries; and
clustering the user query nodes on the graph using a label propagation algorithm.
4. The method according to claim 1, characterized in that the observation indicators include one or more of the following:
Observation indicator one: the number of natural results displayed for the hypernym query is greater than the number of natural results displayed for the hyponym query;
Observation indicator two: the degree to which the set of natural results displayed for the hypernym query contains the set of natural results displayed for the hyponym query is greater than the degree to which the set of natural results displayed for the hyponym query contains the set of natural results displayed for the hypernym query;
Observation indicator three: the larger the overlap between the sets of natural results displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relationship between the sets of natural results displayed for the queries;
Observation indicator four: the number of natural results clicked for the hypernym query is greater than the number of natural results clicked for the hyponym query;
Observation indicator five: the degree to which the set of natural results clicked for the hypernym query contains the set of natural results clicked for the hyponym query is greater than the degree to which the set of natural results clicked for the hyponym query contains the set of natural results clicked for the hypernym query;
Observation indicator six: the larger the overlap between the sets of natural results clicked for the hypernym query and the hyponym query, the higher the confidence in the inclusion relationship between the sets of natural results clicked for the queries.
5. The method according to claim 4, characterized in that the features include one or more of the following, where the hypernym query and the hyponym query are the two queries of a candidate user query pair:
Features set according to observation indicator one: the number of natural results displayed for the hypernym query, and the number of natural results displayed for the hyponym query;
Features set according to observation indicator two: the degree to which the set of natural results displayed for the hypernym query contains the set of natural results displayed for the hyponym query, computed with unweighted WeedsPrec and with unweighted balPrec; the degree to which the set of natural results displayed for the hyponym query contains the set of natural results displayed for the hypernym query, computed with unweighted WeedsPrec and with unweighted balPrec; the difference between the two unweighted WeedsPrec results, and the difference between the two unweighted balPrec results;
Features set according to observation indicator three: the number of natural results that overlap between those displayed for the hypernym query and those displayed for the hyponym query, and the unweighted LIN score of the two displayed natural result sets, which reflects the proportion of the intersection;
Features set according to observation indicator four: the number of natural results clicked for the hypernym query, and the number of natural results clicked for the hyponym query;
Features set according to observation indicator five: the degree to which the set of natural results clicked for the hypernym query contains the set of natural results clicked for the hyponym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, the weight in each case being the number of clicks; the degree to which the set of natural results clicked for the hyponym query contains the set of natural results clicked for the hypernym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, the weight in each case being the number of clicks; the difference between the two WeedsPrec results, the difference between the two ClarkeDE results, and the difference between the two balPrec results;
Features set according to observation indicator six: the number of natural results that overlap between those clicked for the hypernym query and those clicked for the hyponym query, and the LIN score of the two clicked natural result sets, which reflects the proportion of the intersection;
Other features: feature a: the degree to which the set of natural results displayed for the hypernym query contains the set of natural results clicked for the hyponym query, computed with unweighted WeedsPrec; feature b: the degree to which the set of natural results displayed for the hyponym query contains the set of natural results clicked for the hypernym query, computed with unweighted WeedsPrec; and the difference between the results of feature a and feature b.
6. The method according to claim 1, characterized in that training a classifier by supervised learning after a preset number of user query pairs among the candidate user query pairs have been manually labeled further comprises:
after a preset number of user query pairs selected from the candidate user query pairs have been manually labeled, dividing the preset number of user query pairs into a training set, a validation set, and a test set according to a preset ratio;
training a classifier on the training set, represented as feature vectors, using a gradient boosted decision tree classifier, and then tuning the hyperparameters of the classifier using the validation set, represented as feature vectors; and
using the classifier to judge whether the user query pairs in the test set satisfy a hypernym-hyponym relationship, and then computing the precision and the recall.
7. The method according to claim 1, characterized in that training a classifier by supervised learning further comprises: training the classifier using one or more of a gradient boosted decision tree classifier, a random forest classifier, and a support vector machine classifier.
8. An apparatus for extracting hypernym-hyponym relationships between user queries, characterized by comprising:
a candidate module, configured to construct candidate user query pairs;
a representation module, configured to represent the candidate user query pairs as feature vectors, using features set in advance according to observation indicators;
a training module, configured to train a classifier by supervised learning after a preset number of user query pairs among the candidate user query pairs have been manually labeled;
an extraction module, configured to use the trained classifier to judge whether the remaining user query pairs among the candidate user query pairs satisfy a hypernym-hyponym relationship, and to output the user query pairs that satisfy the hypernym-hyponym relationship as the extraction result.
9. The apparatus according to claim 8, characterized in that the candidate module is further configured to:
cluster the user queries;
then combine the user queries within each cluster two by two to form candidate user query pairs.
10. The apparatus according to claim 9, characterized in that the candidate module is further configured to:
construct a graph from users' session data, in which user queries serve as the nodes of the graph and user query nodes whose co-occurrence count within a window exceeds a preset threshold are connected by an edge of the graph, the edge weight being the product of the following four values: the co-occurrence count of the user queries, the similarity of the word sets obtained after word segmentation of the user queries, the similarity of the embedding vectors of the user queries, and the similarity of the sets of natural results clicked for the user queries; and
cluster the user query nodes on the graph using a label propagation algorithm.
11. The apparatus according to claim 8, characterized in that the observation indicators include one or more of the following:
Observation indicator one: the number of natural results displayed for the hypernym query is greater than the number of natural results displayed for the hyponym query;
Observation indicator two: the degree to which the set of natural results displayed for the hypernym query contains the set of natural results displayed for the hyponym query is greater than the degree to which the set of natural results displayed for the hyponym query contains the set of natural results displayed for the hypernym query;
Observation indicator three: the larger the overlap between the sets of natural results displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relationship between the sets of natural results displayed for the queries;
Observation indicator four: the number of natural results clicked for the hypernym query is greater than the number of natural results clicked for the hyponym query;
Observation indicator five: the degree to which the set of natural results clicked for the hypernym query contains the set of natural results clicked for the hyponym query is greater than the degree to which the set of natural results clicked for the hyponym query contains the set of natural results clicked for the hypernym query;
Observation indicator six: the larger the overlap between the sets of natural results displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relationship between the sets of natural results clicked for the queries.
12. The device according to claim 11, characterized in that the features include one or more of the following:
Features set according to observation index one: the number of natural results displayed for the hypernym user query of the candidate user query pair, and the number of natural results displayed for the hyponym user query of the candidate user query pair;
Features set according to observation index two: the degree, calculated with unweighted WeedsPrec, to which the natural result set displayed for the hypernym user query of the candidate user query pair contains the natural result set displayed for the hyponym user query of the pair; the same degree calculated with unweighted balPrec; the degree, calculated with unweighted WeedsPrec, to which the natural result set displayed for the hyponym user query of the pair contains the natural result set displayed for the hypernym user query of the pair; the same degree calculated with unweighted balPrec; the difference between the two unweighted WeedsPrec results; and the difference between the two unweighted balPrec results;
Features set according to observation index three: the number of overlapping natural results displayed for the hypernym user query and the hyponym user query of the candidate user query pair; and the unweighted LIN score of the natural result sets displayed for the hypernym user query and the hyponym user query of the pair, which reflects the proportion of the intersection;
Features set according to observation index four: the number of natural results clicked for the hypernym user query of the candidate user query pair, and the number of natural results clicked for the hyponym user query of the candidate user query pair;
Features set according to observation index five: the degree to which the natural result set clicked for the hypernym user query of the candidate user query pair contains the natural result set clicked for the hyponym user query of the pair, calculated with WeedsPrec, with ClarkeDE and with balPrec, the weight in each case being the number of clicks; the degree to which the natural result set clicked for the hyponym user query of the pair contains the natural result set clicked for the hypernym user query of the pair, calculated with WeedsPrec, with ClarkeDE and with balPrec, the weight in each case being the number of clicks; the difference between the two WeedsPrec results; the difference between the two ClarkeDE results; and the difference between the two balPrec results;
Features set according to observation index six: the number of overlapping natural results clicked for the hypernym user query and the hyponym user query of the candidate user query pair; and the LIN score of the natural result sets clicked for the hypernym user query and the hyponym user query of the pair, which reflects the proportion of the intersection;
Other features: feature a: the degree, calculated with unweighted WeedsPrec, to which the natural result set displayed for the hypernym user query of the candidate user query pair contains the natural result set clicked for the hyponym user query of the pair; feature b: the degree, calculated with unweighted WeedsPrec, to which the natural result set displayed for the hyponym user query of the pair contains the natural result set clicked for the hypernym user query of the pair; and the difference between feature a and feature b.
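The inclusion measures named in this claim can be sketched as below. The claim does not spell out the formulas, so this sketch assumes the definitions commonly used in the directional distributional similarity literature (such as the Kotlerman et al. paper listed under the non-patent citations), with balPrec taken as the geometric mean of LIN and WeedsPrec. It is also an assumption that "the degree to which set A contains set B" corresponds to calling a measure with B as the first argument and A as the second. Result sets are modelled as dictionaries mapping a result ID to a weight (click count), and the unweighted variants set every weight to 1.

```python
from math import sqrt


def weeds_prec(u, v):
    """Share of u's total weight that falls on items also present in v."""
    total = sum(u.values())
    if total == 0:
        return 0.0
    return sum(w for item, w in u.items() if item in v) / total


def clarke_de(u, v):
    """Like WeedsPrec, but each shared item counts only min(weight in u, weight in v)."""
    total = sum(u.values())
    if total == 0:
        return 0.0
    return sum(min(w, v[item]) for item, w in u.items() if item in v) / total


def lin(u, v):
    """Symmetric score: weight on shared items over the total weight of both sets."""
    total = sum(u.values()) + sum(v.values())
    if total == 0:
        return 0.0
    return sum(u[item] + v[item] for item in u.keys() & v.keys()) / total


def bal_prec(u, v):
    """Assumed definition: geometric mean of LIN and WeedsPrec."""
    return sqrt(lin(u, v) * weeds_prec(u, v))


def unweighted(items):
    """Turn a plain collection of result IDs into a weight-1 dictionary."""
    return {item: 1.0 for item in items}


# Hypothetical click counts for a hypernym query and a hyponym query.
upper_clicks = {"a": 3, "b": 1, "c": 2, "d": 4}
lower_clicks = {"a": 2, "b": 2}

upper_contains_lower = weeds_prec(lower_clicks, upper_clicks)
lower_contains_upper = weeds_prec(upper_clicks, lower_clicks)
difference = upper_contains_lower - lower_contains_upper
print(upper_contains_lower, lower_contains_upper, difference)
print(clarke_de(lower_clicks, upper_clicks), bal_prec(lower_clicks, upper_clicks))
```

In this toy case every result clicked for the hyponym query is also clicked for the hypernym query, so the first inclusion score is high and the reverse score is low; the difference between the two directions is the asymmetry that observation indexes two and five describe.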
13. The device according to claim 8, characterized in that the training module is further configured to:
after the preset number of user query pairs selected from the candidate user query pairs have been manually labeled, divide the preset number of user query pairs into a training set, a validation set and a test set according to a preset ratio;
train a classifier with a gradient boosting decision tree classifier using the training set represented as feature vectors, and then adjust the hyperparameters of the classifier using the validation set represented as feature vectors; and
use the classifier to judge whether the user query pairs in the test set satisfy the hyponymy relation, and then calculate the precision and the recall.
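The data split and the evaluation step of this claim can be sketched as follows. The 8:1:1 ratio, the fixed random seed, and the function names are assumptions for illustration; the claim only requires a preset ratio. Training the gradient boosting decision tree itself is omitted and would normally be delegated to a machine learning library.

```python
import random


def split_pairs(labeled_pairs, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle labeled pairs and cut them into train, validation and test sets."""
    items = list(labeled_pairs)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    train = items[:n_train]
    validation = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains goes to the test set
    return train, validation, test


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (hyponymy) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labeled data: (pair identifier, satisfies hyponymy?).
labeled = [(f"pair{i}", i % 2 == 0) for i in range(20)]
train, validation, test = split_pairs(labeled)
print(len(train), len(validation), len(test))

# Hypothetical gold labels and classifier predictions on a test set.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Keeping the validation set separate from the test set means the hyperparameters are tuned on data that is not used for the final precision and recall figures.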
14. The device according to claim 8, characterized in that the training module is further configured to train the classifier using one or more of a gradient boosting decision tree classifier, a random forest classifier and a support vector machine classifier.
15. An electronic device, characterized by comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710260844.3A CN108733702B (en) | 2017-04-20 | 2017-04-20 | Method, device, electronic equipment and medium for extracting upper and lower relation of user query |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710260844.3A CN108733702B (en) | 2017-04-20 | 2017-04-20 | Method, device, electronic equipment and medium for extracting upper and lower relation of user query |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108733702A (en) | 2018-11-02 |
CN108733702B (en) | 2020-09-29 |
Family ID: 63933408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710260844.3A Active CN108733702B (en) | 2017-04-20 | 2017-04-20 | Method, device, electronic equipment and medium for extracting upper and lower relation of user query |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108733702B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110968665A (en) * | 2019-11-08 | 2020-04-07 | 浙江工业大学 | Method for recognizing upper and lower level word relation based on gradient enhanced decision tree |
CN111288973A (en) * | 2020-01-23 | 2020-06-16 | 中山大学 | Method and device for obtaining flow rate of sea surface, computer equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103699568A (en) * | 2013-11-16 | 2014-04-02 | 西安交通大学城市学院 | Method for extracting hyponymy relation of field terms from wikipedia |
CN104615724A (en) * | 2015-02-06 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Establishing method of knowledge base and information search method and device based on knowledge base |
CN105654144A (en) * | 2016-02-29 | 2016-06-08 | 东南大学 | Social network body constructing method based on machine learning |
CN105808525A (en) * | 2016-03-29 | 2016-07-27 | 国家计算机网络与信息安全管理中心 | Domain concept hypernym-hyponym relation extraction method based on similar concept pairs |
US20160292149A1 (en) * | 2014-08-02 | 2016-10-06 | Google Inc. | Word sense disambiguation using hypernyms |
CN106569993A (en) * | 2015-10-10 | 2017-04-19 | 中国移动通信集团公司 | Method and device for mining hypernym-hyponym relation between domain-specific terms |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103699568A (en) * | 2013-11-16 | 2014-04-02 | 西安交通大学城市学院 | Method for extracting hyponymy relation of field terms from wikipedia |
US20160292149A1 (en) * | 2014-08-02 | 2016-10-06 | Google Inc. | Word sense disambiguation using hypernyms |
CN104615724A (en) * | 2015-02-06 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Establishing method of knowledge base and information search method and device based on knowledge base |
CN106569993A (en) * | 2015-10-10 | 2017-04-19 | 中国移动通信集团公司 | Method and device for mining hypernym-hyponym relation between domain-specific terms |
CN105654144A (en) * | 2016-02-29 | 2016-06-08 | 东南大学 | Social network body constructing method based on machine learning |
CN105808525A (en) * | 2016-03-29 | 2016-07-27 | 国家计算机网络与信息安全管理中心 | Domain concept hypernym-hyponym relation extraction method based on similar concept pairs |
Non-Patent Citations (2)
Title |
---|
LILI KOTLERMAN ET AL.,: "Directional Distributional Similarity for Lexical Expansion", 《PROCEEDINGS OF THE ACL-IJCNLP 2009 CONFERENCE SHORT PAPERS》 * |
付瑞吉: "开放域命名实体识别及其层次化类别获取", 《中国博士学位论文全文数据库信息科技辑》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110968665A (en) * | 2019-11-08 | 2020-04-07 | 浙江工业大学 | Method for recognizing upper and lower level word relation based on gradient enhanced decision tree |
CN110968665B (en) * | 2019-11-08 | 2022-09-23 | 浙江工业大学 | Method for recognizing upper and lower level word relation based on gradient enhanced decision tree |
CN111288973A (en) * | 2020-01-23 | 2020-06-16 | 中山大学 | Method and device for obtaining flow rate of sea surface, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108733702B (en) | 2020-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11347782B2 (en) | Internet text mining-based method and apparatus for judging validity of point of interest | |
CN103491205B (en) | The method for pushing of a kind of correlated resources address based on video search and device | |
CN107832468B (en) | Demand recognition methods and device | |
US11709999B2 (en) | Method and apparatus for acquiring POI state information, device and computer storage medium | |
CN107463704A (en) | Searching method and device based on artificial intelligence | |
CN110347894A (en) | Knowledge mapping processing method, device, computer equipment and storage medium based on crawler | |
CN109933660B (en) | API information search method towards natural language form based on handout and website | |
CN110134800A (en) | A kind of document relationships visible processing method and device | |
CN103324666A (en) | Topic tracing method and device based on micro-blog data | |
CN108959305A (en) | A kind of event extraction method and system based on internet big data | |
CN108628811A (en) | The matching process and device of address text | |
CN111026937A (en) | Method, device and equipment for extracting POI name and computer storage medium | |
CN109582788A (en) | Comment spam training, recognition methods, device, equipment and readable storage medium storing program for executing | |
CN106537387B (en) | Retrieval/storage image associated with event | |
CN110532352A (en) | Text duplicate checking method and device, computer readable storage medium, electronic equipment | |
EP3961426A2 (en) | Method and apparatus for recommending document, electronic device and medium | |
CN109828906A (en) | UI automated testing method, device, electronic equipment and storage medium | |
CN106407316A (en) | Topic model-based software question and answer recommendation method and device | |
CN110019849A (en) | A kind of video concern moment search method and device based on attention mechanism | |
CN117743543A (en) | Sentence generation method and device based on large language model and electronic equipment | |
CN108733702A (en) | User inquires method, apparatus, electronic equipment and the medium of hyponymy extraction | |
CN110516062A (en) | A kind of search processing method and device of document | |
US20210271637A1 (en) | Creating descriptors for business analytics applications | |
CN105095385B (en) | A kind of output method and device of retrieval result | |
CN116719915A (en) | Intelligent question-answering method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||