CN108733702A - Method, apparatus, electronic device, and medium for extracting hypernym-hyponym relationships between user queries - Google Patents


Info

Publication number
CN108733702A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710260844.3A
Other languages
Chinese (zh)
Other versions
CN108733702B (en)
Inventor
张俊浩
江雪
徐夙龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a method, apparatus, electronic device, and medium for extracting hypernym-hyponym relationships between user queries. In the e-commerce field, the method extracts user query pairs that stand in a hypernym-hyponym relationship, providing effective support for the recall of products and advertisements. The method includes: constructing candidate user query pairs; representing each candidate user query pair as a feature vector, using features set in advance according to observation indicators; manually labeling a preset number of the candidate user query pairs and then training a classifier by supervised learning; and using the trained classifier to judge whether each remaining candidate user query pair satisfies the hypernym-hyponym relationship, and outputting the user query pairs that satisfy it as the extraction result.

Description

Method, apparatus, electronic device, and medium for extracting hypernym-hyponym relationships between user queries
Technical field
The present invention relates to the field of computer technology, and in particular to a method, apparatus, electronic device, and medium for extracting hypernym-hyponym relationships between user queries.
Background
The hypernym-hyponym relationship (hyponymy) is generally studied as a lexical semantic relation. Semantically similar words can be related in different ways; hyponymy is the case in which the meaning of one word contains the meaning of another. The word whose meaning is contained is called the hyponym, and the other word is called the hypernym. For example, "animal" is a hypernym of "cat", and "cat" is a hyponym of "animal".
In the e-commerce field, a user's search query (the user's search condition, usually a short phrase) is typically a description of some product. Hypernym-hyponym relationships also exist between user queries: for example, the user query "iPhone" is a hyponym of the user query "smartphone", and a hyponym query is a specialization of its hypernym query. When a user enters a query, the hyponym queries of that query can be used to retrieve products and advertisements. The products or advertisements retrieved match the semantics of the hyponym query and therefore also match the semantics of the hypernym query, so the retrieval results are acceptable to the user. It follows that, in the e-commerce field, extracting user queries that stand in a hypernym-hyponym relationship can substantially help the recall of products and advertisements.
In the prior art, research on hypernym-hyponym relationship extraction mainly covers the following aspects:
Methods for extracting hypernym-hyponym relationships between words mainly fall into two groups: 1. methods based on the path features of two words when they co-occur in the same sentence, which use templates or a classifier to decide whether the relationship holds; 2. methods based on the contextual features of each occurrence of a word, which include computing a directional degree of inclusion between two feature vectors under the distributional inclusion hypothesis, and training a classifier directly on the context feature vectors of the two words.
In the web search field, relatively few studies analyze hypernym-hyponym relationships between user queries. Compared with a single word, a user query carries richer semantics: the meaning of each word contained in the hypernym query must have a similar or more specialized expression in the hyponym query. For example, "Samsung large-screen phone" and "Samsung large-screen smartphone" form a hypernym-hyponym relationship, but "Samsung large-screen phone" and "Samsung 4G phone" do not. The more mature prior-art approach to extracting hypernym-hyponym relationships between user queries obtains them by analyzing users' click data. Analyzing click data involves the following three hypotheses: 1. if two user queries are related, the sets of web pages clicked for the two queries must intersect or be similar; 2. if user query q_i is a hypernym of user query q_j, then most of the pages clicked for q_j are similar to the pages clicked for q_i, whereas only part of the pages clicked for q_i are similar to the pages clicked for q_j; 3. if a user query is a hyponym, the content of its clicked pages is more consistent. Hypothesis 1 is used to generate candidate hypernym-hyponym query pairs, hypothesis 2 to design an inclusion measure, and hypothesis 3 to design a measure of how general a user query is; thresholds on these two measures are then used to judge whether a candidate hypernym-hyponym query pair is genuine.
In the course of making the present invention, the inventors found that the prior art has at least the following problems:
1. In both the web search field and the e-commerce platform field, the neighboring queries in a user's session are not sufficient to characterize the semantics of a user query accurately, and there are no path features between user queries. Word-level hypernym-hyponym extraction techniques therefore cannot be applied directly to extracting hypernym-hyponym relationships between user queries on e-commerce platforms;
2. In the web search field, few features are currently used when extracting hypernym-hyponym relationships between user queries, and on an e-commerce platform strict consistency of page content is harder to judge (for example, every product attribute shown on the product pages would have to match before the content could be judged consistent). Because these techniques are not optimized for the specific scenario of an e-commerce platform, it is difficult to ensure that, while the candidate user query pairs predicted as positive have high precision (the proportion of samples the classifier judges positive that are truly positive), the recall of true positives (the proportion of truly positive samples that the classifier judges positive) is also high.
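The two evaluation quantities defined in problem 2 can be made concrete with a short sketch. The function name and the 0/1 label encoding are illustrative assumptions, not part of the patent text.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 marks a positive pair.

    precision: share of pairs predicted positive that are truly positive.
    recall:    share of truly positive pairs that are predicted positive.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A threshold-based method can usually raise one of the two values at the expense of the other; the problem stated above is that the prior art struggles to keep both high at once.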
Summary of the invention
In view of this, embodiments of the present invention provide a method, apparatus, electronic device, and medium for extracting hypernym-hyponym relationships between user queries, which can extract user queries that stand in a hypernym-hyponym relationship in the e-commerce field and thereby provide effective support for the recall of products and advertisements.
To achieve the above object, according to one aspect of the present invention, a method for extracting hypernym-hyponym relationships between user queries is provided.
A method for extracting hypernym-hyponym relationships between user queries according to an embodiment of the present invention includes: constructing candidate user query pairs; representing each candidate user query pair as a feature vector, using features set in advance according to observation indicators; manually labeling a preset number of the candidate user query pairs and then training a classifier by supervised learning; and using the trained classifier to judge whether each remaining candidate user query pair satisfies the hypernym-hyponym relationship, and outputting the user query pairs that satisfy it as the extraction result.
Optionally, constructing candidate user query pairs includes: clustering the user queries; and then combining the user queries within each cluster pairwise to form the candidate user query pairs.
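The pairwise combination within each cluster can be sketched as follows. This is a minimal illustration that assumes ordered (directed) pairs, since the features described later depend on which query is treated as the hypernym; the function name and the sample queries are hypothetical.

```python
from itertools import permutations


def candidate_pairs(clusters):
    """Build ordered candidate (hypernym_candidate, hyponym_candidate) pairs.

    `clusters` is an iterable of lists of query strings. Every ordered pair of
    distinct queries inside the same cluster becomes one candidate.
    """
    pairs = []
    for members in clusters:
        # sorted(set(...)) removes duplicates and makes the output order stable
        pairs.extend(permutations(sorted(set(members)), 2))
    return pairs
```

Restricting candidates to queries in the same cluster keeps the number of pairs far below the all-against-all count, which is what makes manual labeling of a sample tractable.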
Optionally, clustering the user queries includes: building a graph from users' session data, in which each user query is a node and an edge connects every two user-query nodes whose co-occurrence count within a window exceeds a preset threshold, the edge weight being the product of the following four values: the co-occurrence count of the two user queries, the similarity of the word sets obtained by segmenting the two user queries, the similarity of the embedding vectors of the two user queries, and the similarity of the sets of natural results clicked for the two user queries; and clustering the user-query nodes on the graph with a label propagation algorithm.
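The graph construction described above can be sketched as below. The text does not say which similarity functions are used, so this sketch assumes Jaccard similarity for the two set-valued terms and takes the embedding similarity as a caller-supplied function; all names, the window definition, and the default threshold are illustrative assumptions.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two collections, used here as an assumed set similarity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cooccurrence_counts(sessions, window=3):
    """Count how often two distinct queries fall within `window` consecutive queries."""
    counts = Counter()
    for queries in sessions:
        for i, q in enumerate(queries):
            for other in queries[i + 1 : i + window]:
                if other != q:
                    counts[frozenset((q, other))] += 1
    return counts


def build_edges(sessions, tokens, clicks, embedding_sim, min_cooccurrence=1, window=3):
    """Return {(query_a, query_b): weight} for pairs co-occurring more than the threshold.

    weight = co-occurrence count * token-set similarity
             * embedding similarity * clicked-result-set similarity
    """
    edges = {}
    for pair, n in cooccurrence_counts(sessions, window).items():
        if n <= min_cooccurrence:
            continue  # keep only pairs whose count exceeds the preset threshold
        a, b = sorted(pair)
        edges[(a, b)] = (
            n
            * jaccard(tokens[a], tokens[b])
            * embedding_sim(a, b)
            * jaccard(clicks[a], clicks[b])
        )
    return edges
```

Because the weight is a product, a pair that scores zero on any one of the four signals gets a zero-weight edge, so all four kinds of evidence must agree for two queries to be pulled into the same cluster.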
Optionally, the observation indicators include one or more of the following:
Observation indicator 1: the number of natural results displayed for the hypernym query is greater than the number of natural results displayed for the hyponym query.
Observation indicator 2: the degree to which the set of natural results displayed for the hypernym query contains the set of natural results displayed for the hyponym query is greater than the degree to which the set displayed for the hyponym query contains the set displayed for the hypernym query.
Observation indicator 3: the larger the overlap between the sets of natural results displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relation between the displayed natural result sets.
Observation indicator 4: the number of natural results clicked for the hypernym query is greater than the number of natural results clicked for the hyponym query.
Observation indicator 5: the degree to which the set of natural results clicked for the hypernym query contains the set of natural results clicked for the hyponym query is greater than the degree to which the set clicked for the hyponym query contains the set clicked for the hypernym query.
Observation indicator 6: the larger the overlap between the natural result set of the hypernym query and the natural result set displayed for the hyponym query, the higher the confidence in the inclusion relation between the clicked natural result sets.
Optionally, the features include one or more of the following, where "hypernym query" and "hyponym query" denote the hypernym and hyponym user queries of a candidate user query pair:
Features set according to observation indicator 1: the number of natural results displayed for the hypernym query; and the number of natural results displayed for the hyponym query.
Features set according to observation indicator 2: the degree to which the set of natural results displayed for the hypernym query contains the set displayed for the hyponym query, computed with unweighted WeedsPrec and with unweighted balPrec; the degree to which the set of natural results displayed for the hyponym query contains the set displayed for the hypernym query, computed with unweighted WeedsPrec and with unweighted balPrec; the difference between the two unweighted WeedsPrec results; and the difference between the two unweighted balPrec results.
Features set according to observation indicator 3: the number of natural results displayed for both the hypernym query and the hyponym query; and the unweighted LIN score of the sets of natural results displayed for the hypernym query and the hyponym query, which reflects the proportion of the intersection.
Features set according to observation indicator 4: the number of natural results clicked for the hypernym query; and the number of natural results clicked for the hyponym query.
Features set according to observation indicator 5: the degree to which the set of natural results clicked for the hypernym query contains the set clicked for the hyponym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, in each case weighted by click count; the degree to which the set of natural results clicked for the hyponym query contains the set clicked for the hypernym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, in each case weighted by click count; the difference between the two WeedsPrec results; the difference between the two ClarkeDE results; and the difference between the two balPrec results.
Features set according to observation indicator 6: the number of natural results clicked for both the hypernym query and the hyponym query; and the LIN score of the sets of natural results clicked for the hypernym query and the hyponym query, which reflects the proportion of the intersection.
Other features: feature a, the degree to which the set of natural results displayed for the hypernym query contains the set of natural results clicked for the hyponym query, computed with unweighted WeedsPrec; feature b, the degree to which the set of natural results displayed for the hyponym query contains the set of natural results clicked for the hypernym query, computed with unweighted WeedsPrec; and the difference between feature a and feature b.
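The features above rely on four inclusion and similarity measures whose formulas the text does not spell out. The following is a minimal sketch that assumes the definitions commonly given in the distributional-inclusion literature; the function names and the dictionary representation (result identifier mapped to weight) are illustrative assumptions.

```python
from math import sqrt


def weeds_prec(u, v):
    """Share of u's total weight that lies on items also present in v (u included in v)."""
    total = sum(u.values())
    return sum(w for item, w in u.items() if item in v) / total if total else 0.0


def clarke_de(u, v):
    """Like WeedsPrec, but each shared item counts only min(weight in u, weight in v)."""
    total = sum(u.values())
    return sum(min(w, v[item]) for item, w in u.items() if item in v) / total if total else 0.0


def lin(u, v):
    """Symmetric LIN score: weight on shared items over the total weight of both sets."""
    total = sum(u.values()) + sum(v.values())
    shared = sum(u[item] + v[item] for item in u if item in v)
    return shared / total if total else 0.0


def bal_prec(u, v):
    """Geometric mean of LIN and WeedsPrec, which damps scores from tiny overlaps."""
    return sqrt(lin(u, v) * weeds_prec(u, v))


def unweighted(items):
    """Represent a plain result set as a weight map with weight 1 per item."""
    return {item: 1.0 for item in items}
```

Under these definitions, "the degree to which the hypernym's set contains the hyponym's set" corresponds to calling a measure with the hyponym's results as `u` and the hypernym's results as `v`. For the click-based features, the weights would be click counts; for the display-based features, `unweighted` yields the unweighted variants.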
Optionally, manually labeling a preset number of the candidate user query pairs and then training a classifier by supervised learning further includes: after manually labeling the preset number of user query pairs selected from the candidate user query pairs, dividing the preset number of user query pairs into a training set, a validation set, and a test set according to a preset ratio; training the classifier on the training set, represented as feature vectors, using a gradient boosting decision tree classifier, and then tuning the hyperparameters of the classifier on the validation set, represented as feature vectors; and using the classifier to judge whether the user query pairs in the test set satisfy the hypernym-hyponym relationship, and then computing precision and recall.
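The division of the labeled pairs into three sets can be sketched as follows. The text only says "a preset ratio", so the 80/10/10 default, the fixed random seed, and the function name are assumptions made for illustration.

```python
import random


def split_labeled_pairs(labeled_pairs, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle labeled pairs and split them into (training, validation, test) lists.

    `ratios` is the assumed preset ratio; any remainder from rounding down
    goes to the test set.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(labeled_pairs)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train : n_train + n_val]
    test = shuffled[n_train + n_val :]
    return train, val, test
```

Keeping the test set untouched until hyperparameter tuning on the validation set is finished is what makes the precision and recall computed on it a fair estimate.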
Optionally, training a classifier by supervised learning further includes: training the classifier using one or more of a gradient boosting decision tree classifier, a random forest classifier, and a support vector machine classifier.
To achieve the above object, according to another aspect of the present invention, an apparatus for extracting hypernym-hyponym relationships between user queries is provided.
An apparatus for extracting hypernym-hyponym relationships between user queries according to an embodiment of the present invention includes: a candidate module, configured to construct candidate user query pairs; a representation module, configured to represent each candidate user query pair as a feature vector, using features set in advance according to observation indicators; a training module, configured to train a classifier by supervised learning after a preset number of the candidate user query pairs have been manually labeled; and an extraction module, configured to use the trained classifier to judge whether each remaining candidate user query pair satisfies the hypernym-hyponym relationship, and to output the user query pairs that satisfy it as the extraction result.
Optionally, the candidate module is further configured to: cluster the user queries; and then combine the user queries within each cluster pairwise to form the candidate user query pairs.
Optionally, the candidate module is further configured to: build a graph from users' session data, in which each user query is a node and an edge connects every two user-query nodes whose co-occurrence count within a window exceeds a preset threshold, the edge weight being the product of the following four values: the co-occurrence count of the two user queries, the similarity of the word sets obtained by segmenting the two user queries, the similarity of the embedding vectors of the two user queries, and the similarity of the sets of natural results clicked for the two user queries; and cluster the user-query nodes on the graph with a label propagation algorithm.
Optionally, the observation indicators include one or more of the following:
Observation indicator 1: the number of natural results displayed for the hypernym query is greater than the number of natural results displayed for the hyponym query.
Observation indicator 2: the degree to which the set of natural results displayed for the hypernym query contains the set of natural results displayed for the hyponym query is greater than the degree to which the set displayed for the hyponym query contains the set displayed for the hypernym query.
Observation indicator 3: the larger the overlap between the sets of natural results displayed for the hypernym query and the hyponym query, the higher the confidence in the inclusion relation between the displayed natural result sets.
Observation indicator 4: the number of natural results clicked for the hypernym query is greater than the number of natural results clicked for the hyponym query.
Observation indicator 5: the degree to which the set of natural results clicked for the hypernym query contains the set of natural results clicked for the hyponym query is greater than the degree to which the set clicked for the hyponym query contains the set clicked for the hypernym query.
Observation indicator 6: the larger the overlap between the natural result set of the hypernym query and the natural result set displayed for the hyponym query, the higher the confidence in the inclusion relation between the clicked natural result sets.
Optionally, the features include one or more of the following, where "hypernym query" and "hyponym query" denote the hypernym and hyponym user queries of a candidate user query pair:
Features set according to observation indicator 1: the number of natural results displayed for the hypernym query; and the number of natural results displayed for the hyponym query.
Features set according to observation indicator 2: the degree to which the set of natural results displayed for the hypernym query contains the set displayed for the hyponym query, computed with unweighted WeedsPrec and with unweighted balPrec; the degree to which the set of natural results displayed for the hyponym query contains the set displayed for the hypernym query, computed with unweighted WeedsPrec and with unweighted balPrec; the difference between the two unweighted WeedsPrec results; and the difference between the two unweighted balPrec results.
Features set according to observation indicator 3: the number of natural results displayed for both the hypernym query and the hyponym query; and the unweighted LIN score of the sets of natural results displayed for the hypernym query and the hyponym query, which reflects the proportion of the intersection.
Features set according to observation indicator 4: the number of natural results clicked for the hypernym query; and the number of natural results clicked for the hyponym query.
Features set according to observation indicator 5: the degree to which the set of natural results clicked for the hypernym query contains the set clicked for the hyponym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, in each case weighted by click count; the degree to which the set of natural results clicked for the hyponym query contains the set clicked for the hypernym query, computed with WeedsPrec, with ClarkeDE, and with balPrec, in each case weighted by click count; the difference between the two WeedsPrec results; the difference between the two ClarkeDE results; and the difference between the two balPrec results.
Features set according to observation indicator 6: the number of natural results clicked for both the hypernym query and the hyponym query; and the LIN score of the sets of natural results clicked for the hypernym query and the hyponym query, which reflects the proportion of the intersection.
Other features: feature a, the degree to which the set of natural results displayed for the hypernym query contains the set of natural results clicked for the hyponym query, computed with unweighted WeedsPrec; feature b, the degree to which the set of natural results displayed for the hyponym query contains the set of natural results clicked for the hypernym query, computed with unweighted WeedsPrec; and the difference between feature a and feature b.
Optionally, the training module is further configured to: after the preset number of user query pairs selected from the candidate user query pairs have been manually labeled, divide the preset number of user query pairs into a training set, a validation set, and a test set according to a preset ratio; train the classifier on the training set, represented as feature vectors, using a gradient boosting decision tree classifier, and then tune the hyperparameters of the classifier on the validation set, represented as feature vectors; and use the classifier to judge whether the user query pairs in the test set satisfy the hypernym-hyponym relationship, and then compute precision and recall.
Optionally, the training module is further configured to: train the classifier using one or more of a gradient boosting decision tree classifier, a random forest classifier, and a support vector machine classifier.
To achieve the above object, according to a further aspect of the present invention, an electronic device is provided.
An electronic device according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for extracting hypernym-hyponym relationships between user queries according to an embodiment of the present invention.
To achieve the above object, according to yet another aspect of the present invention, a computer-readable medium is provided.
A computer-readable medium according to an embodiment of the present invention has a computer program stored thereon which, when executed by a processor, implements the method for extracting hypernym-hyponym relationships between user queries according to an embodiment of the present invention.
An embodiment of the above invention has the following advantages or beneficial effects. Multi-dimensional features are designed around the characteristics of the e-commerce field, each directed user query pair is characterized by multiple features, and supervised binary classification training is performed, which helps the classifier learn more accurate decision criteria. This overcomes the technical problems of the prior art, namely the limitations of word-level hypernym-hyponym extraction and, in the web search field, the small number of features and the difficulty of judgment. As a result, the candidate user query pairs predicted as positive have high precision while the recall of true positives is also high, hypernym-hyponym relationships between user queries in the e-commerce field are extracted accurately, and the recall of products and advertisements in the e-commerce field is substantially helped. Forming candidate user query pairs from pairs of user queries within the same cluster constructs the candidates in a relatively reasonable way and reduces the burden of manual labeling. Because the pages retrieved in an e-commerce environment are pages for specific products, multiple observation indicators are proposed, so that candidate user query pairs can be represented by numerous features that take these indicators into account, which helps to identify accurately whether a user query pair is in a hypernym-hyponym relationship. A gradient boosting decision tree classifier is trained on the training set, and the learned classifier makes the binary positive or negative judgment on the candidate user query pairs and on candidate user query pairs mined later, so that accurate extraction results are obtained.
Further effects of the above non-conventional optional modes are described below in conjunction with the specific embodiments.
Description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the method for extracting hypernym-hyponym relationships between user queries according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main modules of the apparatus for extracting hypernym-hyponym relationships between user queries according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer system suitable for implementing the terminal device or server of an embodiment of the present application.
Detailed description
Exemplary embodiments of the present invention are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
Embodiments of the present invention provide a technical solution for extracting hypernym-hyponym relationships between user queries in the e-commerce platform field. The solution concerns the semantic understanding of user queries and belongs to the field of natural language processing. To overcome the shortcomings of the prior art, the technical solution designs new features based on the characteristics of e-commerce platforms, characterizes each directed user query pair by multiple features, and performs supervised binary classification training, thereby learning more accurate decision criteria and achieving higher recall while maintaining precision. The technical solution can be conveniently used to extract hypernym-hyponym relationships between user queries in an e-commerce environment.
Fig. 1 is the signal of the key step for the method that user according to the ... of the embodiment of the present invention inquires hyponymy extraction Figure.
As shown in Figure 1, the method that a kind of user of the embodiment of the present invention inquires hyponymy extraction includes mainly as follows Step:
Step S11: Construct candidate user query pairs. For each user query in the user query list, this step extracts user queries that may potentially stand in a hypernym-hyponym relation with it and forms pairs. In the embodiment of the present invention, the user queries in the user query list may be clustered, and the user queries within each cluster are then combined pairwise to form candidate user query pairs.
The clustering in the embodiment of the present invention may be, but is not limited to being, performed as follows. A graph is built from the users' session data, with each user query as a node. Two user query nodes are connected by an edge if the number of times they co-occur within a window exceeds a preset threshold. The edge weight is the product of four values: the co-occurrence count of the two user queries, the similarity of their word sets after word segmentation, the similarity of their embedding vectors, and the similarity of the sets of natural results clicked for them. The user query nodes on the graph are then clustered with a label propagation algorithm.
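The graph construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the window semantics (a fixed number of consecutive queries in a session), the Jaccard similarity over whitespace tokens standing in for word segmentation, and the constant placeholder similarities for embeddings and clicked results are all hypothetical.

```python
from collections import Counter

def cooccurrence_counts(sessions, window=2):
    """Count unordered query pairs that fall within `window` consecutive queries of a session."""
    counts = Counter()
    for queries in sessions:
        for i, first in enumerate(queries):
            for second in queries[i + 1:i + window]:
                if first != second:
                    counts[tuple(sorted((first, second)))] += 1
    return counts

def jaccard(a, b):
    """Jaccard similarity of two collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def build_edges(counts, threshold, embedding_sim, click_sim):
    """Keep pairs whose co-occurrence count exceeds the threshold; weight = product of four values."""
    edges = {}
    for (qa, qb), n in counts.items():
        if n > threshold:
            # Whitespace split is a stand-in for real word segmentation (assumption).
            lexical = jaccard(qa.split(), qb.split())
            edges[(qa, qb)] = n * lexical * embedding_sim(qa, qb) * click_sim(qa, qb)
    return edges

# Toy session data; the two similarity callables are placeholders that return 1.0.
sessions = [["phone", "apple phone", "phone case"], ["phone", "apple phone"], ["tv", "phone"]]
counts = cooccurrence_counts(sessions, window=2)
edges = build_edges(counts, threshold=1,
                    embedding_sim=lambda a, b: 1.0, click_sim=lambda a, b: 1.0)
```

In this toy run only the pair that co-occurs twice survives the threshold, and its weight is the co-occurrence count scaled by the three similarities.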
After step S11 has constructed the candidate pairs, processing continues with step S12.
Step S12: Represent each candidate user query pair as a feature vector, using features set in advance according to observation indicators. Before the classifier is trained, every user query pair must be represented with the same data representation; to this end, each user query pair is represented as a vector in a feature space. As stated above, the technical solution of the present invention designs new features based on the characteristics of an e-commerce platform and represents each directed user query pair by multiple features. In the embodiment of the present invention, features that help to judge whether q2 is a hyponym of q1 are designed for six observation indicators under a real e-commerce environment (the number of indicators is not limited to six; only some of them may be used, and other observation indicators may be added).
The observation indicators in the embodiment of the present invention include one or more of the following.
Observation indicator 1: the number of natural results displayed for the hypernym user query is greater than the number displayed for the hyponym user query. The number of natural results displayed for a user query is the number of non-advertisement product entries displayed for that query, counted by product SKU, with identical products counted only once. SKU (stock keeping unit) is the unified product number, and each product corresponds to a unique SKU. In the embodiment of the present invention a natural result is therefore represented by the product SKU: the number of natural results is the number of SKUs, and a set of natural results is a set of SKUs. Accordingly, "the number of natural results displayed for a user query" is written below as "the number of SKUs displayed for a user query".
Observation indicator 2: the degree to which the SKU set displayed for the hypernym user query includes the SKU set displayed for the hyponym user query is greater than the degree to which the SKU set displayed for the hyponym user query includes the SKU set displayed for the hypernym user query.
Observation indicator 3: the more the SKU sets displayed for the hypernym user query and the hyponym user query overlap, the higher the confidence of the inclusion relation between the displayed SKU sets.
Observation indicator 4: the number of SKUs clicked for the hypernym user query is greater than the number clicked for the hyponym user query. The number of natural results clicked for a user query is the number of specific product pages clicked among the natural results of that query. As with the displayed results, "the number of natural results clicked" is written as "the number of SKUs clicked", and "the set of natural results clicked" as "the set of SKUs clicked".
Observation indicator 5: the degree to which the SKU set clicked for the hypernym user query includes the SKU set clicked for the hyponym user query is greater than the degree to which the SKU set clicked for the hyponym user query includes the SKU set clicked for the hypernym user query.
Observation indicator 6: the more the SKU sets clicked for the hypernym user query and the hyponym user query overlap, the higher the confidence of the inclusion relation between the clicked SKU sets.
The features based on the above observation indicators may include, but are not limited to, one or more of the following. In each case the hypernym and hyponym user queries are those of the candidate user query pair.
Features set according to observation indicator 1: the number of SKUs displayed for the hypernym user query; the number of SKUs displayed for the hyponym user query.
Features set according to observation indicator 2: the degree to which the SKU set displayed for the hypernym user query includes the SKU set displayed for the hyponym user query, computed with unweighted WeedsPrec and with unweighted balPrec; the degree to which the SKU set displayed for the hyponym user query includes the SKU set displayed for the hypernym user query, computed with unweighted WeedsPrec and with unweighted balPrec; the difference between the two unweighted WeedsPrec results and the difference between the two unweighted balPrec results.
Features set according to observation indicator 3: the number of overlapping SKUs displayed for the hypernym and hyponym user queries; the unweighted LIN score of the SKU sets displayed for the hypernym and hyponym user queries, which reflects the proportion of the intersection.
Features set according to observation indicator 4: the number of SKUs clicked for the hypernym user query; the number of SKUs clicked for the hyponym user query.
Features set according to observation indicator 5: the degree to which the SKU set clicked for the hypernym user query includes the SKU set clicked for the hyponym user query, computed with WeedsPrec, with ClarkeDE and with balPrec, each weighted by click count; the degree to which the SKU set clicked for the hyponym user query includes the SKU set clicked for the hypernym user query, computed with WeedsPrec, with ClarkeDE and with balPrec, each weighted by click count; the difference between the two WeedsPrec results, the difference between the two ClarkeDE results, and the difference between the two balPrec results.
Features set according to observation indicator 6: the number of overlapping SKUs clicked for the hypernym and hyponym user queries; the LIN score of the SKU sets clicked for the hypernym and hyponym user queries, which reflects the proportion of the intersection.
Other features: feature a, the degree to which the SKU set displayed for the hypernym user query includes the SKU set clicked for the hyponym user query, computed with unweighted WeedsPrec; feature b, the degree to which the SKU set displayed for the hyponym user query includes the SKU set clicked for the hypernym user query, computed with unweighted WeedsPrec; and the difference between the results of feature a and feature b.
Once the features for representing user query pairs have been designed as above, the classifier can be trained.
Step S13: After a preset number of user query pairs among the candidate user query pairs have been manually labeled, train a classifier by supervised learning. The specific process may include: manually labeling a preset number of user query pairs selected from the candidate user query pairs, and dividing them into a training set, a validation set and a test set according to a preset ratio; training a gradient boosting decision tree classifier on the training set represented as feature vectors, and then tuning the classifier's hyperparameters on the validation set represented as feature vectors; and using the classifier to judge whether the user query pairs in the test set satisfy the hypernym-hyponym relation, and then computing precision and recall.
The classifier may be trained using one or more of a gradient boosting decision tree classifier, a random forest classifier and a support vector machine classifier.
Step S14: Use the trained classifier to judge whether the remaining candidate user query pairs satisfy the hypernym-hyponym relation, and output the user query pairs that satisfy it as the extraction result.
The main steps of the method for extracting hypernym-hyponym relations between user queries according to the embodiment of the present invention have been described above. The detailed flow of the method is described below with specific technical means.
The specific flow for extracting hypernym-hyponym relations between user queries is as follows:
Step 1: Obtain candidate user query pairs
For each user query qi in the user query list, user queries that may potentially stand in a hypernym-hyponym relation with qi are first extracted from the list, forming candidate user query pairs (q1, q2), where q1 is the hypernym user query of the pair and q2 is the hyponym user query. Candidate pairs can be obtained with simple conditions, for example that the two queries led to clicks on the same specific product page, or that a click inclusion relation exceeds a certain threshold. In the embodiment of the present invention, candidate pairs may be, but are not limited to being, formed by clustering user queries and pairing the user queries within each cluster. Reasonably constructed candidate pairs reduce the burden of manual labeling, because a larger proportion of them can be labeled as positive examples. Arbitrary pairs of user queries can also be used as candidate pairs. The method of constructing candidate pairs affects neither the data representation nor the classification performance of the classifier under that representation.
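Pairing the user queries within each cluster might look like the sketch below. It assumes that, because a candidate pair (q1, q2) is directed, both orderings of every two queries in a cluster are emitted; the text does not state this explicitly, and the function name is hypothetical.

```python
from itertools import permutations

def candidate_pairs(clusters):
    """Form directed candidate pairs (q1, q2) from every two distinct queries in a cluster."""
    pairs = []
    for cluster in clusters:
        # Sorting only makes the output order reproducible; it has no semantic role.
        pairs.extend(permutations(sorted(cluster), 2))
    return pairs

# Toy clusters: a singleton cluster yields no pair.
clusters = [{"phone", "apple phone", "phone case"}, {"tv"}]
pairs = candidate_pairs(clusters)
```

A cluster of n queries yields n(n-1) directed pairs, so cluster size directly controls how many pairs need labeling or prediction.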
In the clustering described above, the rule is as follows. From session data (which records the user queries that a user issued consecutively), the number of times user queries co-occur within a window is counted; user queries whose co-occurrence count exceeds a preset threshold are connected by an edge, forming a graph with user queries as nodes. The edge weight is set to the product of four indicators: the co-occurrence count of the user queries, the similarity of their word sets after word segmentation, the similarity of their embedding vectors, and the similarity of the SKU sets clicked for them. After the graph is built, its nodes (user queries) are clustered with a label propagation algorithm.
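A weighted label propagation pass over such a graph can be sketched as follows. This is a deterministic simplification written for illustration: nodes are visited in sorted order and ties are broken by the smallest label, whereas label propagation is usually run with random visiting order and random tie-breaking. The edge dictionary format and the function name are assumptions.

```python
def label_propagation(edges, max_iter=20):
    """Cluster nodes of a weighted undirected graph; `edges` maps (a, b) to a weight."""
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight
    labels = {node: node for node in neighbors}  # every node starts with its own label
    for _ in range(max_iter):
        changed = False
        for node in sorted(neighbors):
            # Sum the edge weights per neighboring label.
            scores = {}
            for other, weight in neighbors[node].items():
                scores[labels[other]] = scores.get(labels[other], 0.0) + weight
            # Highest total weight wins; ties go to the smallest label (assumption).
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = {}
    for node, lab in labels.items():
        clusters.setdefault(lab, set()).add(node)
    return list(clusters.values())

# Toy graph with two connected components.
toy_edges = {("a", "b"): 2.0, ("b", "c"): 2.0, ("a", "c"): 1.0, ("x", "y"): 3.0}
toy_clusters = label_propagation(toy_edges)
```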
Note that clustering user queries is only one way of obtaining candidate pairs; other ways exist, and the choice does not affect the subsequent approach of using a classifier to judge whether the user queries of a candidate pair are in a hypernym-hyponym relation.
Step 2: Data representation and feature design
Before the classifier is trained, every user query pair must be represented with the same data representation. Each candidate user query pair (q1, q2) is therefore first represented as a vector in a feature space.
Because the pages retrieved in an e-commerce environment are specific product pages (SKUs), it is observed that, compared with the true hyponym user query of a candidate pair, the true hypernym user query has the following characteristics: 1. more SKUs are displayed for it; 2. the SKU set displayed for it usually includes more of the SKUs displayed for the true hyponym user query; 3. the more the displayed SKU sets overlap, the higher the confidence of the inclusion relation between the displayed SKU sets; 4. more SKUs are clicked for it; 5. the SKU set clicked for it usually includes more of the SKUs clicked for the true hyponym user query; 6. the more the clicked SKU sets overlap, the higher the confidence of the inclusion relation between the clicked SKU sets. Based on these observations, each candidate user query pair is represented by multiple features that reflect the above characteristics; a model is then trained on the training set with gradient boosting decision trees, and the learned model makes a binary positive/negative judgment on newly mined candidate user query pairs.
In the embodiment of the present invention, the following features, which help to judge whether q2 is a hyponym of q1, are designed according to the six observation indicators under a real e-commerce environment described above.
Based on observation indicator 1 (more SKUs are displayed for the hypernym user query), the following are designed:
Feature 1: the number of SKUs displayed for q1. (As noted above, natural results are counted by SKU, and the displayed natural results are the SKUs shown to the user while paging through the results. The number for q1 is the number of all SKUs shown under query q1, with each SKU counted once.)
Feature 2: the number of SKUs displayed for q2.
Based on observation indicator 2 (the SKU set displayed for the hypernym user query usually includes more of the SKUs displayed for the true hyponym user query), the following are designed:
Feature 3: the degree to which the SKU set displayed for q1 includes the SKU set displayed for q2, subdivided into 2 features:
Feature 3.1: inclusion computed with unweighted WeedsPrec (the formulas are given below).
Feature 3.2: inclusion computed with unweighted balPrec.
Feature 4: the degree to which the SKU set displayed for q2 includes the SKU set displayed for q1, subdivided into 2 features:
Feature 4.1: inclusion computed with unweighted WeedsPrec.
Feature 4.2: inclusion computed with unweighted balPrec.
Feature 5: the difference between feature 3 and feature 4, subdivided into 2 features:
Feature 5.1: the difference of the inclusion values computed with unweighted WeedsPrec.
Feature 5.2: the difference of the inclusion values computed with unweighted balPrec.
Based on observation indicator 3 (the more the displayed SKU sets overlap, the higher the confidence of the inclusion relation between the displayed SKU sets), the following are designed:
Feature 6: the number of overlapping SKUs displayed for q1 and q2.
Feature 7: the unweighted LIN score of the SKU sets displayed for q1 and q2, which reflects the proportion of the intersection (the formula is given below).
For observation indicator 3, two features are proposed: feature 6 is the size of the intersection, and feature 7 is the score computed by LIN. As the formula below shows, because no weights are used, the numerator of the LIN score is the size of the intersection and the denominator is the sum of the sizes of the two sets.
Features 6 and 7 are designed from observation indicator 3 for the following reason. When the SKUs displayed for q1 and q2 have a small intersection and q2 itself happens to have few displayed SKUs, the score for the degree to which the SKUs displayed for q1 include those displayed for q2 is likely to be high. When the intersection is large, the numbers of SKUs displayed for q1 and for q2 cannot themselves be small, and the features based on the other observation indicators are then less likely to deviate strongly. Introducing features 6 and 7 as confidence features helps the subsequent classification model to identify samples for which the other features yield a high inclusion value that is nevertheless not trustworthy (without confidence features, such samples are easily misjudged as positive examples).
For example, if the inclusion computed by the other features is high but the LIN score of feature 7 is very low, then either the SKU set of the hyponym user query in the candidate pair is small, or that set is larger while the SKU set of the hypernym user query is huge. Adding the intersection size (feature 6) lets the classifier that judges the hypernym-hyponym relation tell which case applies: a large intersection indicates the latter, and a small intersection the former.
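Features 6 and 7 for displayed SKUs can be illustrated with a short sketch. It follows the textual description only (numerator: size of the intersection; denominator: sum of the two set sizes); the function name and the toy SKU identifiers are hypothetical.

```python
def overlap_features(shown_q1, shown_q2):
    """Return (feature 6, feature 7): intersection size and the unweighted LIN-style ratio."""
    set_q1, set_q2 = set(shown_q1), set(shown_q2)
    overlap = len(set_q1 & set_q2)      # feature 6: number of overlapping SKUs
    total = len(set_q1) + len(set_q2)   # sum of the two set sizes
    lin = overlap / total if total else 0.0  # feature 7, as described in the text
    return overlap, lin

# Toy data: q1 shows four SKUs, q2 shows three, and two SKUs are shared.
overlap, lin = overlap_features({"SKU1", "SKU2", "SKU3", "SKU4"}, {"SKU1", "SKU2", "SKU9"})
```

Here the overlap is 2 and the ratio is 2/7; a pair with the same inclusion value but a far larger hypernym set would receive a much lower ratio, which is the signal the confidence features are meant to carry.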
Based on observation indicator 4 (more SKUs are clicked for the hypernym user query), the following are designed:
Feature 8: the number of SKUs clicked for q1.
Feature 9: the number of SKUs clicked for q2.
Based on observation indicator 5 (the SKU set clicked for the hypernym user query usually includes more of the SKUs clicked for the true hyponym user query), the following are designed:
Feature 10: the degree to which the SKU set clicked for q1 includes the SKU set clicked for q2, subdivided into 3 features:
Feature 10.1: inclusion computed with WeedsPrec, weighted by click count.
Feature 10.2: inclusion computed with ClarkeDE, weighted by click count.
Feature 10.3: inclusion computed with balPrec, weighted by click count.
Feature 11: the degree to which the SKU set clicked for q2 includes the SKU set clicked for q1, subdivided into 3 features:
Feature 11.1: inclusion computed with WeedsPrec, weighted by click count.
Feature 11.2: inclusion computed with ClarkeDE, weighted by click count.
Feature 11.3: inclusion computed with balPrec, weighted by click count.
Feature 12: the difference between feature 10 and feature 11, subdivided into 3 features:
Feature 12.1: the difference of the inclusion values computed with WeedsPrec.
Feature 12.2: the difference of the inclusion values computed with ClarkeDE.
Feature 12.3: the difference of the inclusion values computed with balPrec.
Based on observation indicator 6 (the more the clicked SKU sets overlap, the higher the confidence of the inclusion relation between the clicked SKU sets), the following are designed:
Feature 13: the number of overlapping SKUs clicked for q1 and q2.
Feature 14: the LIN score of the SKU sets clicked for q1 and q2, which reflects the proportion of the intersection.
In addition, some other features can be designed:
Feature 15: the degree to which the SKU set displayed for q1 includes the SKU set clicked for q2, computed with unweighted WeedsPrec.
Feature 16: the degree to which the SKU set displayed for q2 includes the SKU set clicked for q1, computed with unweighted WeedsPrec.
Feature 17: the difference between feature 15 and feature 16.
The formulas used in the above feature design are as follows. Inclusion formulas over feature vectors: given the feature vector F_x of any x, where w_x(f) is the weight of x on feature f, the degree to which v includes u is:
Formula for the confidence of the inclusion relation:
Take feature 3.1, which uses unweighted WeedsPrec to compute the degree to which the SKU set displayed for q1 includes the SKU set displayed for q2, as an example of the calculation:
Feature 3.1 is the degree, computed with unweighted WeedsPrec, to which the SKU set displayed for q1 includes the SKU set displayed for q2. Suppose the SKU set displayed for q1 is {SKU1, SKU2, SKU3, SKU4}; because no weights are used, each SKU has weight 1. Suppose the SKU set displayed for q2 is {SKU1, SKU2, SKU9}, again with each SKU having weight 1. Then WeedsPrec(q2, q1) = (weight of SKU1 + weight of SKU2) / (weight of SKU1 + weight of SKU2 + weight of SKU3 + weight of SKU4) = 0.5.
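The arithmetic of this worked example can be reproduced with a few lines of code. The sketch mirrors the example only: the numerator sums the q1 weights of the shared SKUs and the denominator sums all q1 weights, exactly as in the calculation above. The function name and the dictionary representation of weights are assumptions, and the sketch is not a restatement of the general formulas.

```python
def example_inclusion_score(weights_q1, weights_q2):
    """Shared-SKU weight of q1 divided by the total weight of q1, as in the worked example."""
    shared = set(weights_q1) & set(weights_q2)
    numerator = sum(weights_q1[sku] for sku in shared)
    denominator = sum(weights_q1.values())
    return numerator / denominator if denominator else 0.0

# Unweighted case: every displayed SKU has weight 1.
shown_q1 = {sku: 1 for sku in ["SKU1", "SKU2", "SKU3", "SKU4"]}
shown_q2 = {sku: 1 for sku in ["SKU1", "SKU2", "SKU9"]}
score = example_inclusion_score(shown_q1, shown_q2)
```

For the click-based features the same structure would presumably apply with click counts as the weights instead of 1, which is how the text describes the weighted variants.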
In the above feature design, features 1, 2, 8 and 9 reflect, to some extent, how specific each user query is; features 3, 4, 5, 10, 11, 12, 15, 16 and 17 reflect the degree of semantic inclusion between the user queries; and features 6, 7, 13 and 14 reflect the confidence of the features related to the degree of semantic inclusion.
In total, 26 features are designed. Each candidate user query pair (q1, q2) is therefore represented as a 26-dimensional vector, in which each dimension corresponds to one feature and the value in that dimension is the value of the candidate pair (q1, q2) on that feature. The observation indicators and features of the technical solution of the present invention are not limited to those listed in the embodiment; when the technical solution is applied in practice, observation indicators and features can be added or removed according to actual observation needs.
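The count of 26 follows from the subdivided features, as the enumeration below confirms. The group names and the `to_vector` helper are hypothetical; only the feature numbers and their grouping come from the text.

```python
# Feature numbers grouped by what they reflect, following the description above.
FEATURE_GROUPS = {
    "query_specificity": ["1", "2", "8", "9"],
    "semantic_inclusion": [
        "3.1", "3.2", "4.1", "4.2", "5.1", "5.2",
        "10.1", "10.2", "10.3", "11.1", "11.2", "11.3",
        "12.1", "12.2", "12.3", "15", "16", "17",
    ],
    "inclusion_confidence": ["6", "7", "13", "14"],
}

# Fixed dimension order of the feature vector (the order itself is an assumption).
FEATURE_ORDER = [name for group in FEATURE_GROUPS.values() for name in group]

def to_vector(feature_values):
    """Turn a {feature number: value} mapping into a 26-dimensional list."""
    return [float(feature_values[name]) for name in FEATURE_ORDER]

vector = to_vector({name: 0 for name in FEATURE_ORDER})
```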
Step 3: Training
A classifier is used to judge whether a candidate user query pair (q1, q2) satisfies the hypernym-hyponym relation. The standard steps for training the classifier are as follows. An appropriate number of user query pairs are drawn from the set of candidate user query pairs and manually labeled as satisfying the hypernym-hyponym relation or not. ("Appropriate number" means the following: by standard practice, a batch of data is first labeled manually and used to train the classifier, and the result on the validation set is then examined; if the training error on the training set is small but the result on the validation set is comparatively poor, the training data are insufficient and more candidate user query pairs must be drawn and labeled manually.) Each drawn candidate pair (q1, q2) is manually assigned a label: 1 indicates that q1 is a hypernym of q2, and 0 indicates that it is not. The labels guide the classifier in learning how to judge from the feature vector whether the hypernym-hyponym relation is satisfied. The drawn pairs are then divided into a training set, a validation set and a test set in appropriate proportions.
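Dividing the labeled pairs into the three sets can be sketched as below. The 6:2:2 proportion, the fixed random seed and the function name are assumptions chosen for illustration; the text only requires appropriate proportions.

```python
import random

def split_labeled_pairs(labeled_pairs, parts=(6, 2, 2), seed=0):
    """Shuffle labeled pairs and split them into training, validation and test sets."""
    items = list(labeled_pairs)
    random.Random(seed).shuffle(items)  # seeded so that the split is reproducible
    total = sum(parts)
    n_train = len(items) * parts[0] // total
    n_val = len(items) * parts[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # the remainder goes to the test set
    return train, val, test

# Toy labeled data: integers stand in for ((q1, q2), label) records.
train, val, test = split_labeled_pairs(range(10))
```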
Each candidate user query pair in the training set and the validation set is represented as a 26-dimensional feature vector. A classifier, for example but not limited to a gradient boosting decision tree classifier (a random forest or support vector machine classifier may also be used), is trained on the feature vectors of the training set, and its hyperparameters are tuned on the validation set to prevent it from overfitting the training set.
To quantify the classifier's performance on unseen samples, the classifier is used to judge whether the pairs in the test set satisfy the hypernym-hyponym relation, and precision and recall are then computed.
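The two test-set metrics can be computed as in the sketch below, reading the accuracy rate in the text as precision, TP/(TP+FP), and the recall rate as TP/(TP+FN). The function name and the toy labels are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Precision TP/(TP+FP) and recall TP/(TP+FN) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels: 2 true positives, 1 false positive, 2 false negatives.
precision, recall = precision_recall([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 0])
```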
Step 4: Prediction
In the prediction stage, the trained classifier is used to mine all user query pairs that satisfy the hypernym-hyponym relation. First, the remaining unlabeled candidate user query pairs are represented as 26-dimensional feature vectors, and the trained gradient boosting decision tree predicts whether each unlabeled pair is a positive example. The candidate pairs predicted as positive, together with the pairs manually labeled as positive, form the final output of the embodiment of the present invention, namely the user query pairs that satisfy the hypernym-hyponym relation.
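The way the final output is assembled can be sketched as follows. The `featurize` and `predict` callables are hypothetical stand-ins for the 26-dimensional feature extraction and the trained classifier; the toy one-dimensional features and the 0.5 threshold exist only to make the example runnable.

```python
def extract_relation_pairs(labeled_pairs, unlabeled_pairs, featurize, predict):
    """Combine manually labeled positives with unlabeled pairs predicted as positive."""
    result = [pair for pair, label in labeled_pairs if label == 1]
    result += [pair for pair in unlabeled_pairs if predict(featurize(pair)) == 1]
    return result

# Toy stand-ins (assumptions): a lookup table as featurizer, a threshold as classifier.
labeled = [(("phone", "apple phone"), 1), (("apple phone", "phone"), 0)]
unlabeled = [("shoes", "running shoes"), ("running shoes", "shoes")]
toy_features = {("shoes", "running shoes"): [0.9], ("running shoes", "shoes"): [0.1]}
output = extract_relation_pairs(
    labeled, unlabeled,
    featurize=toy_features.get,
    predict=lambda vector: int(vector[0] > 0.5),
)
```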
According to the method for extracting hypernym-hyponym relations between user queries in the e-commerce domain described above, the trained gradient boosting decision tree model predicts whether each candidate user query pair is positive or negative, and the user query pairs that satisfy the hypernym-hyponym relation are obtained as output.
When the inventors applied the technical solution of the present invention to extract hypernym-hyponym relations between user queries, the model was trained on a training set of 338 manually labeled samples, and the degree of negative-sample upsampling and the model hyperparameters (the number of trees and the maximum depth) were tuned on a validation set of 200 samples. For the candidate pairs predicted as positive on the test set, the precision TP/(TP+FP) was 93.2% and the recall TP/(TP+FN) was 36.6%. By contrast, when only a single feature was used, for example feature 12.1, with its threshold tuned on the validation set, the highest precision reached on the validation set was below the classifier's 93.2%; the precision on the test set was then 88.9% and the recall only 7.1%. In practical applications, to ensure precision and recall, a candidate user query pair can therefore be represented with the 26 features described above. If high precision or recall is not required, the feature vector of a candidate pair can be built from one or more of the 26 features. In summary, the method for extracting hypernym-hyponym relations between user queries according to the embodiment achieves a comparatively high recall while maintaining high precision.
As can be seen from the method for extracting hypernym-hyponym relations between user queries according to the embodiment of the present invention, multi-dimensional features are designed around the characteristics of the e-commerce domain, each directed user query pair is represented by multiple features, and supervised binary classification training is performed, which helps the classifier learn more accurate decision conditions. This overcomes two technical problems of the prior art: the limitations of extracting hypernym-hyponym relations at the vocabulary level, and the difficulty of judging such relations in web search, where few features are available. As a result, candidate user query pairs predicted as positive have high precision while the recall of true positives is also comparatively high, so hypernym-hyponym relations between user queries in the e-commerce domain are extracted accurately, which substantially helps the recall of products and advertisements. Candidate user query pairs are formed from pairwise combinations of the user queries within each cluster obtained by clustering user queries, so candidate pairs are constructed in a reasonable way and the burden of manual labeling is reduced. Because the pages retrieved in an e-commerce environment are specific product pages, multiple observation indicators are proposed, and each candidate user query pair is represented by numerous features that reflect these indicators, which helps to identify accurately whether a user query pair is in a hypernym-hyponym relation. A classifier is trained on the training set using gradient boosting decision trees, and the learned classifier makes a binary positive/negative judgment on the candidate user query pairs and on candidate pairs mined later, so accurate extraction results for hypernym-hyponym relations between user queries are obtained.
Fig. 2 is the signal of the main modular for the device that user according to the ... of the embodiment of the present invention inquires hyponymy extraction Figure.
As shown in Fig. 2, the device 20 that a kind of user of the embodiment of the present invention inquires hyponymy extraction includes mainly:It waits Modeling block 201, representation module 202, training module 203 and extraction module 204.
Wherein, candidate block 201 is for constructing candidate user inquiry pair;Representation module 202 is used for using previously according to sight The feature for examining setup measures inquires the candidate user to being expressed as feature vector;Training module 203 is used for the time It selects user's inquiry of family inquiry centering preset quantity to carrying out after manually marking, grader is trained using supervised learning;It carries Whether modulus block 204 is used to judge the remaining user's inquiry of the candidate user inquiry centering to according with using trained grader Hyponymy is closed, output meets user's inquiry of hyponymy to as extraction result.
The candidate module 201 may further be used to: cluster user queries; and then combine the user queries within each cluster two by two to form candidate user query pairs.
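The pairing step can be sketched as follows. This is a minimal illustration only: the function name `build_candidate_pairs` and the sample queries are hypothetical, and the text does not say how the direction of each pair is handled, so the sketch emits unordered pairs that can later be evaluated in both directions.

```python
from itertools import combinations

def build_candidate_pairs(clusters):
    """Combine the queries inside each cluster two by two (hypothetical helper)."""
    pairs = []
    for queries in clusters:
        # Deduplicate and sort so the output is deterministic.
        unique_queries = sorted(set(queries))
        pairs.extend(combinations(unique_queries, 2))
    return pairs

# Hypothetical clusters; a singleton cluster yields no pair.
clusters = [["phone", "smartphone", "huawei phone"], ["milk"]]
candidate_pairs = build_candidate_pairs(clusters)
```

Restricting pairs to queries in the same cluster keeps the number of candidates far below the number of all possible query pairs, which is what reduces the labeling burden.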
In addition, the candidate module 201 may further be used to: construct a graph from users' session data, with user queries as the nodes of the graph, and connect user query nodes whose co-occurrence count within a window exceeds a preset threshold as the edges of the graph, the edge weight being the product of the following four values: the co-occurrence count of the user queries, the similarity of the word sets obtained after word segmentation of the user queries, the similarity of the embedding vectors of the user queries, and the similarity of the sets of SKUs clicked for the user queries; and cluster the user query nodes on the graph using a label propagation algorithm.
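A minimal sketch of this clustering step is given below, assuming the four per-edge values have already been computed. The function names, the input layout of `edge_stats`, the tie-breaking rule and the sample numbers are all assumptions made for illustration; the text names only the edge-weight product and the use of label propagation.

```python
import random
from collections import Counter, defaultdict

def build_graph(edge_stats, min_cooccurrence):
    """edge_stats maps (q1, q2) to (cooccurrence, token_sim, embedding_sim, clicked_sku_sim)."""
    graph = defaultdict(dict)
    for (q1, q2), stats in edge_stats.items():
        cooccurrence, token_sim, embedding_sim, clicked_sku_sim = stats
        if cooccurrence > min_cooccurrence:
            # Edge weight is the product of the four values named in the text.
            weight = cooccurrence * token_sim * embedding_sim * clicked_sku_sim
            graph[q1][q2] = weight
            graph[q2][q1] = weight
    return graph

def label_propagation(graph, max_iters=20, seed=0):
    """Asynchronous weighted label propagation; ties go to the smallest label."""
    rng = random.Random(seed)
    labels = {node: node for node in graph}
    nodes = sorted(graph)
    for _ in range(max_iters):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            votes = Counter()
            for neighbor, weight in graph[node].items():
                votes[labels[neighbor]] += weight
            if not votes:
                continue
            best = max(sorted(votes), key=votes.get)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

# Hypothetical statistics; the last edge falls below the threshold and is dropped.
edge_stats = {
    ("phone", "smartphone"): (5, 0.5, 0.5, 0.5),
    ("phone", "huawei phone"): (5, 0.5, 0.5, 0.5),
    ("smartphone", "huawei phone"): (5, 0.5, 0.5, 0.5),
    ("milk", "whole milk"): (4, 0.5, 0.5, 0.5),
    ("huawei phone", "milk"): (1, 0.1, 0.1, 0.1),
}
graph = build_graph(edge_stats, min_cooccurrence=2)
labels = label_propagation(graph)
```

Queries that end up with the same label form one cluster. Because the weight is a product, an edge is strong only when all four signals agree.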
In the embodiment of the present invention, the observation indicators may include, but are not limited to, one or more of the following:
Observation indicator one: the number of SKUs shown for the hypernym query is greater than the number of SKUs shown for the hyponym query;
Observation indicator two: the degree to which the set of SKUs shown for the hypernym query contains the set of SKUs shown for the hyponym query is greater than the degree to which the set of SKUs shown for the hyponym query contains the set of SKUs shown for the hypernym query;
Observation indicator three: the larger the overlap between the sets of SKUs shown for the hypernym query and for the hyponym query, the higher the confidence in the inclusion relation between the shown SKU sets;
Observation indicator four: the number of SKUs clicked for the hypernym query is greater than the number of SKUs clicked for the hyponym query;
Observation indicator five: the degree to which the set of SKUs clicked for the hypernym query contains the set of SKUs clicked for the hyponym query is greater than the degree to which the set of SKUs clicked for the hyponym query contains the set of SKUs clicked for the hypernym query;
Observation indicator six: the larger the overlap between the sets of SKUs clicked for the hypernym query and for the hyponym query, the higher the confidence in the inclusion relation between the clicked SKU sets.
The aforementioned features may include, but are not limited to, one or more of the following:
Features set according to observation indicator one: the number of SKUs shown for the hypernym query of the candidate user query pair, and the number of SKUs shown for the hyponym query of the pair;
Features set according to observation indicator two: the degree to which the set of SKUs shown for the hypernym query contains the set of SKUs shown for the hyponym query, computed with unweighted WeedsPrec and with unweighted balPrec; the degree to which the set of SKUs shown for the hyponym query contains the set of SKUs shown for the hypernym query, computed with unweighted WeedsPrec and with unweighted balPrec; the difference between the two unweighted WeedsPrec results, and the difference between the two unweighted balPrec results;
Features set according to observation indicator three: the number of overlapping SKUs shown for the hypernym query and the hyponym query of the pair, and the unweighted LIN score of the two shown SKU sets, which reflects the proportion of the intersection;
Features set according to observation indicator four: the number of SKUs clicked for the hypernym query of the pair, and the number of SKUs clicked for the hyponym query of the pair;
Features set according to observation indicator five: the degree to which the set of SKUs clicked for the hypernym query contains the set of SKUs clicked for the hyponym query, computed with WeedsPrec, with ClarkeDE and with balPrec, the weight in each case being the number of clicks; the degree to which the set of SKUs clicked for the hyponym query contains the set of SKUs clicked for the hypernym query, computed with WeedsPrec, with ClarkeDE and with balPrec, the weight in each case being the number of clicks; the difference between the two WeedsPrec results, the difference between the two ClarkeDE results, and the difference between the two balPrec results;
Features set according to observation indicator six: the number of overlapping SKUs clicked for the hypernym query and the hyponym query of the pair, and the LIN score of the two clicked SKU sets, which reflects the proportion of the intersection.
In addition, some other features may be included, for example: feature a: the degree to which the set of SKUs shown for the hypernym query contains the set of SKUs clicked for the hyponym query, computed with unweighted WeedsPrec; feature b: the degree to which the set of SKUs shown for the hyponym query contains the set of SKUs clicked for the hypernym query, computed with unweighted WeedsPrec; and the difference between the results of feature a and feature b.
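The inclusion measures named above can be sketched as follows. The text does not spell out their formulas, so the definitions below are an assumption: they follow the forms commonly given for WeedsPrec, ClarkeDE, LIN and balPrec in the literature on directional distributional similarity, applied here to SKU sets. Each query is represented as a hypothetical dictionary from SKU id to weight (click count, or 1 for the unweighted variants), and all SKU ids and counts are made up.

```python
import math

def weeds_prec(u, v):
    """Share of u's total weight that lies on SKUs also present in v."""
    total = sum(u.values())
    if total == 0:
        return 0.0
    return sum(w for sku, w in u.items() if sku in v) / total

def clarke_de(u, v):
    """Like weeds_prec, but each shared SKU counts only up to v's weight."""
    total = sum(u.values())
    if total == 0:
        return 0.0
    return sum(min(w, v[sku]) for sku, w in u.items() if sku in v) / total

def lin(u, v):
    """Symmetric score: weight on shared SKUs over the total weight of both sets."""
    total = sum(u.values()) + sum(v.values())
    if total == 0:
        return 0.0
    return sum(w + v[sku] for sku, w in u.items() if sku in v) / total

def bal_prec(u, v):
    """Geometric mean of LIN and WeedsPrec."""
    return math.sqrt(lin(u, v) * weeds_prec(u, v))

def inclusion_features(hyper, hypo, measure):
    """Return (hyper contains hypo, hypo contains hyper, difference) for one measure."""
    hyper_contains_hypo = measure(hypo, hyper)
    hypo_contains_hyper = measure(hyper, hypo)
    return hyper_contains_hypo, hypo_contains_hyper, hyper_contains_hypo - hypo_contains_hyper

# Hypothetical click counts per SKU for one candidate pair.
hyper_clicks = {"sku1": 4, "sku2": 3, "sku3": 2, "sku4": 1}
hypo_clicks = {"sku1": 2, "sku2": 2}
forward, backward, difference = inclusion_features(hyper_clicks, hypo_clicks, weeds_prec)
```

In this made-up example every SKU clicked for the hyponym query is also clicked for the hypernym query, so the forward score is high and the difference is positive, which is the asymmetry that observation indicator five looks for.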
In the embodiment of the present invention, the training module 203 may further be used to: after a preset number of user query pairs selected from the candidate user query pairs have been manually labeled, divide the preset number of user query pairs into a training set, a validation set and a test set according to a preset ratio; train a classifier with a gradient boosted decision tree classifier on the training set represented as feature vectors, and then tune the hyperparameters of the classifier on the validation set represented as feature vectors; and use the classifier to judge whether the user query pairs in the test set conform to a hypernym-hyponym relation, and then compute the accuracy and the recall.
In addition, the training module 203 may further be used to train the classifier using one or more of a gradient boosted decision tree classifier, a random forest classifier and a support vector machine classifier.
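The data split and the evaluation step can be sketched as follows. The 8:1:1 ratio, the function names and the sample labels are assumptions, since the text only says "preset ratio", and the sketch reads the accuracy on predicted positives as precision. Classifier training itself is left out; any gradient boosted decision tree implementation could be fitted on the training split.

```python
import random

def split_dataset(labeled_pairs, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the labeled pairs and cut them into train, validation and test sets."""
    items = list(labeled_pairs)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

def precision_recall(y_true, y_pred):
    """Precision and recall of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labeled pairs: (pair id, label), where 1 means hypernym-hyponym.
labeled_pairs = [(i, i % 2) for i in range(10)]
train_set, val_set, test_set = split_dataset(labeled_pairs)

# Hypothetical test labels and classifier predictions.
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```

Keeping the validation set separate from the test set means hyperparameter tuning does not leak into the reported figures.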
From the above description it can be seen that multi-dimensional features are designed around the characteristics of the e-commerce domain, each directed user query pair is represented by multiple features, and supervised binary classification training is performed, which helps the classifier learn more accurate decision conditions. This overcomes the limitations of lexical hypernym-hyponym extraction in the prior art, as well as the technical problem that web-search features in the prior art are few and make the relation hard to judge. As a result, accuracy remains high on the candidate user query pairs predicted as positive while the recall of true positive examples is also high, so that hypernym-hyponym relations between user queries in the e-commerce domain are extracted accurately, which considerably helps the retrieval of products and advertisements in e-commerce. By clustering user queries and pairing the queries within each cluster two by two to form candidate user query pairs, the candidate pairs are constructed in a relatively reasonable way and the manual labeling burden is reduced. Because the pages retrieved in an e-commerce setting are specific product pages, multiple observation indicators are proposed, so that a candidate user query pair can be represented by numerous features that take these indicators into account, which helps to accurately identify whether the pair is in a hypernym-hyponym relation. By training a classifier on the training set with gradient boosted decision trees, and using the learned classifier to make a positive or negative binary judgment on the candidate user query pairs and on subsequently mined candidate pairs, accurate hypernym-hyponym extraction results for user queries are obtained.
Referring now to Fig. 3, a schematic structural diagram of a computer system 300 suitable for implementing a terminal device of the embodiments of the present application is shown. The terminal device shown in Fig. 3 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present application.
As shown in Fig. 3, the computer system 300 includes a central processing unit (CPU) 301, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage section 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the system 300. The CPU 301, the ROM 302 and the RAM 303 are connected to one another through a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
The following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse and the like; an output section 307 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card or a modem. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 310 as needed, so that a computer program read from it can be installed into the storage section 308 as needed.
In particular, according to the embodiments disclosed by the present invention, the process described above with reference to the main-step diagram may be implemented as a computer software program. For example, the embodiments disclosed by the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the main-step diagram. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 309, and/or installed from the removable medium 311. When the computer program is executed by the central processing unit (CPU) 301, the above functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to: wireless, wire, optical cable, RF and the like, or any suitable combination of the above.
The main-step diagrams and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each box in a main-step diagram or block diagram may represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or main-step diagram, and any combination of boxes in a block diagram or main-step diagram, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, this may be described as: a processor includes a candidate module, a representation module, a training module and an extraction module. The names of these modules do not, under certain circumstances, constitute a limitation on the modules themselves; for example, the candidate module may also be described as "a module for constructing candidate user query pairs".
In another aspect, the present application also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist alone without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: construct candidate user query pairs; represent the candidate user query pairs as feature vectors using features set in advance according to observation indicators; train a classifier by supervised learning after a preset number of the candidate user query pairs have been manually labeled; and use the trained classifier to judge whether the remaining candidate user query pairs conform to a hypernym-hyponym relation, and output the user query pairs that conform to the relation as the extraction result.
According to the technical solution of the embodiment of the present invention, multi-dimensional features are designed around the characteristics of the e-commerce domain, each directed user query pair is represented by multiple features, and supervised binary classification training is performed, which helps the classifier learn more accurate decision conditions. This overcomes the limitations of lexical hypernym-hyponym extraction in the prior art, as well as the technical problem that web-search features in the prior art are few and make the relation hard to judge. As a result, accuracy remains high on the candidate user query pairs predicted as positive while the recall of true positive examples is also high, so that hypernym-hyponym relations between user queries in the e-commerce domain are extracted accurately, which considerably helps the retrieval of products and advertisements in e-commerce. By clustering user queries and pairing the queries within each cluster two by two to form candidate user query pairs, the candidate pairs are constructed in a relatively reasonable way and the manual labeling burden is reduced. Because the pages retrieved in an e-commerce setting are specific product pages, multiple observation indicators are proposed, so that a candidate user query pair can be represented by numerous features that take these indicators into account, which helps to accurately identify whether the pair is in a hypernym-hyponym relation. By training a classifier on the training set with gradient boosted decision trees, and using the learned classifier to make a positive or negative binary judgment on the candidate user query pairs and on subsequently mined candidate pairs, accurate hypernym-hyponym extraction results for user queries are obtained.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (16)

1. A method for extracting hypernym-hyponym relations between user queries, characterized by comprising:
constructing candidate user query pairs;
representing the candidate user query pairs as feature vectors using features set in advance according to observation indicators;
after manually labeling a preset number of the candidate user query pairs, training a classifier by supervised learning;
using the trained classifier to judge whether the remaining candidate user query pairs conform to a hypernym-hyponym relation, and outputting the user query pairs that conform to the relation as the extraction result.
2. The method according to claim 1, characterized in that constructing candidate user query pairs comprises:
clustering user queries;
then combining the user queries within each cluster two by two to form candidate user query pairs.
3. The method according to claim 2, characterized in that clustering user queries comprises:
constructing a graph from users' session data, with user queries as the nodes of the graph, and connecting user query nodes whose co-occurrence count within a window exceeds a preset threshold as the edges of the graph, the edge weight being the product of the following four values: the co-occurrence count of the user queries, the similarity of the word sets obtained after word segmentation of the user queries, the similarity of the embedding vectors of the user queries, and the similarity of the sets of natural results clicked for the user queries; and
clustering the user query nodes on the graph using a label propagation algorithm.
4. The method according to claim 1, characterized in that the observation indicators include one or more of the following:
Observation indicator one: the number of natural results shown for the hypernym query is greater than the number of natural results shown for the hyponym query;
Observation indicator two: the degree to which the set of natural results shown for the hypernym query contains the set of natural results shown for the hyponym query is greater than the degree to which the set of natural results shown for the hyponym query contains the set of natural results shown for the hypernym query;
Observation indicator three: the larger the overlap between the sets of natural results shown for the hypernym query and for the hyponym query, the higher the confidence in the inclusion relation between the shown natural result sets;
Observation indicator four: the number of natural results clicked for the hypernym query is greater than the number of natural results clicked for the hyponym query;
Observation indicator five: the degree to which the set of natural results clicked for the hypernym query contains the set of natural results clicked for the hyponym query is greater than the degree to which the set of natural results clicked for the hyponym query contains the set of natural results clicked for the hypernym query;
Observation indicator six: the larger the overlap between the sets of natural results clicked for the hypernym query and for the hyponym query, the higher the confidence in the inclusion relation between the clicked natural result sets.
5. The method according to claim 4, characterized in that the features include one or more of the following:
Features set according to observation indicator one: the number of natural results shown for the hypernym query of the candidate user query pair, and the number of natural results shown for the hyponym query of the pair;
Features set according to observation indicator two: the degree to which the set of natural results shown for the hypernym query contains the set of natural results shown for the hyponym query, computed with unweighted WeedsPrec and with unweighted balPrec; the degree to which the set of natural results shown for the hyponym query contains the set of natural results shown for the hypernym query, computed with unweighted WeedsPrec and with unweighted balPrec; the difference between the two unweighted WeedsPrec results, and the difference between the two unweighted balPrec results;
Features set according to observation indicator three: the number of overlapping natural results shown for the hypernym query and the hyponym query of the pair, and the unweighted LIN score of the two shown natural result sets, which reflects the proportion of the intersection;
Features set according to observation indicator four: the number of natural results clicked for the hypernym query of the pair, and the number of natural results clicked for the hyponym query of the pair;
Features set according to observation indicator five: the degree to which the set of natural results clicked for the hypernym query contains the set of natural results clicked for the hyponym query, computed with WeedsPrec, with ClarkeDE and with balPrec, the weight in each case being the number of clicks; the degree to which the set of natural results clicked for the hyponym query contains the set of natural results clicked for the hypernym query, computed with WeedsPrec, with ClarkeDE and with balPrec, the weight in each case being the number of clicks; the difference between the two WeedsPrec results, the difference between the two ClarkeDE results, and the difference between the two balPrec results;
Features set according to observation indicator six: the number of overlapping natural results clicked for the hypernym query and the hyponym query of the pair, and the LIN score of the two clicked natural result sets, which reflects the proportion of the intersection;
Other features: feature a: the degree to which the set of natural results shown for the hypernym query contains the set of natural results clicked for the hyponym query, computed with unweighted WeedsPrec; feature b: the degree to which the set of natural results shown for the hyponym query contains the set of natural results clicked for the hypernym query, computed with unweighted WeedsPrec; and the difference between the results of feature a and feature b.
6. The method according to claim 1, characterized in that, after manually labeling a preset number of the candidate user query pairs, training a classifier by supervised learning further comprises:
after manually labeling a preset number of user query pairs selected from the candidate user query pairs, dividing the preset number of user query pairs into a training set, a validation set and a test set according to a preset ratio;
training a classifier with a gradient boosted decision tree classifier on the training set represented as feature vectors, and then tuning the hyperparameters of the classifier on the validation set represented as feature vectors; and
using the classifier to judge whether the user query pairs in the test set conform to a hypernym-hyponym relation, and then computing the accuracy and the recall.
7. The method according to claim 1, characterized in that training a classifier by supervised learning further comprises: training the classifier using one or more of a gradient boosted decision tree classifier, a random forest classifier and a support vector machine classifier.
8. An apparatus for extracting hypernym-hyponym relations between user queries, characterized by comprising:
a candidate module, for constructing candidate user query pairs;
a representation module, for representing the candidate user query pairs as feature vectors using features set in advance according to observation indicators;
a training module, for training a classifier by supervised learning after a preset number of the candidate user query pairs have been manually labeled;
an extraction module, for using the trained classifier to judge whether the remaining candidate user query pairs conform to a hypernym-hyponym relation, and outputting the user query pairs that conform to the relation as the extraction result.
9. The apparatus according to claim 8, characterized in that the candidate module is further used to:
cluster user queries;
then combine the user queries within each cluster two by two to form candidate user query pairs.
10. The apparatus according to claim 9, characterized in that the candidate module is further used to:
construct a graph from users' session data, with user queries as the nodes of the graph, and connect user query nodes whose co-occurrence count within a window exceeds a preset threshold as the edges of the graph, the edge weight being the product of the following four values: the co-occurrence count of the user queries, the similarity of the word sets obtained after word segmentation of the user queries, the similarity of the embedding vectors of the user queries, and the similarity of the sets of natural results clicked for the user queries; and
cluster the user query nodes on the graph using a label propagation algorithm.
11. The device according to claim 8, characterized in that the observation indicators include one or more of the following:
Observation indicator one: the number of natural results displayed for the hypernym user query is greater than the number of natural results displayed for the hyponym user query;
Observation indicator two: the degree to which the set of natural results displayed for the hypernym user query contains the set of natural results displayed for the hyponym user query is greater than the degree to which the set of natural results displayed for the hyponym user query contains the set of natural results displayed for the hypernym user query;
Observation indicator three: the larger the overlap between the sets of natural results displayed for the hypernym user query and the hyponym user query, the higher the confidence in the containment relation of the set of natural results displayed for the hypernym user query;
Observation indicator four: the number of natural results clicked for the hypernym user query is greater than the number of natural results clicked for the hyponym user query;
Observation indicator five: the degree to which the set of natural results clicked for the hypernym user query contains the set of natural results clicked for the hyponym user query is greater than the degree to which the set of natural results clicked for the hyponym user query contains the set of natural results clicked for the hypernym user query;
Observation indicator six: the larger the overlap between the sets of natural results displayed for the hypernym user query and the hyponym user query, the higher the confidence in the containment relation of the set of natural results clicked for the hypernym user query.
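The asymmetry behind these observation indicators can be made concrete with a small Python sketch over plain sets. It assumes each query's natural results are available as a set of item identifiers; the helper names `inclusion` and `observe` and the sample identifiers are hypothetical. The same computation applies to displayed results (indicators one to three) and, as assumed here, to clicked results (indicators four to six).

```python
def inclusion(container, contained):
    """Fraction of `contained` that also appears in `container`."""
    if not contained:
        return 0.0
    return len(container & contained) / len(contained)


def observe(upper_results, lower_results):
    """Compute the quantities the indicators compare for one (upper, lower) pair."""
    upper, lower = set(upper_results), set(lower_results)
    return {
        "more_results": len(upper) > len(lower),      # indicators one / four
        "upper_contains_lower": inclusion(upper, lower),  # indicators two / five
        "lower_contains_upper": inclusion(lower, upper),
        "overlap": len(upper & lower),                # indicators three / six
    }


# Hypothetical displayed results for an upper query and a lower query
shown_upper = {"sku1", "sku2", "sku3", "sku4", "sku5", "sku6"}
shown_lower = {"sku2", "sku3", "sku7"}
obs = observe(shown_upper, shown_lower)
```

For a true upper/lower pair, the expectation expressed by the indicators is that `more_results` holds and `upper_contains_lower` exceeds `lower_contains_upper`.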
12. The device according to claim 11, characterized in that the features include one or more of the following:
Features set according to observation indicator one: the number of natural results displayed for the hypernym user query of the candidate user query pair, and the number of natural results displayed for the hyponym user query of the candidate user query pair;
Features set according to observation indicator two: the degree to which the set of natural results displayed for the hypernym user query of the candidate user query pair contains the set of natural results displayed for the hyponym user query of the pair, computed with unweighted WeedsPrec and computed with unweighted balPrec; the degree to which the set of natural results displayed for the hyponym user query of the pair contains the set of natural results displayed for the hypernym user query of the pair, computed with unweighted WeedsPrec and computed with unweighted balPrec; the difference between the two unweighted WeedsPrec results, and the difference between the two unweighted balPrec results;
Features set according to observation indicator three: the number of overlapping natural results displayed for the hypernym user query and the hyponym user query of the candidate user query pair; and the unweighted LIN score of the sets of natural results displayed for the hypernym user query and the hyponym user query of the pair, which reflects the proportion of the intersection;
Features set according to observation indicator four: the number of natural results clicked for the hypernym user query of the candidate user query pair, and the number of natural results clicked for the hyponym user query of the candidate user query pair;
Features set according to observation indicator five: the degree to which the set of natural results clicked for the hypernym user query of the candidate user query pair contains the set of natural results clicked for the hyponym user query of the pair, computed with WeedsPrec, with ClarkeDE and with balPrec, in each case with the number of clicks as the weight; the degree to which the set of natural results clicked for the hyponym user query of the pair contains the set of natural results clicked for the hypernym user query of the pair, computed with WeedsPrec, with ClarkeDE and with balPrec, in each case with the number of clicks as the weight; the difference between the two WeedsPrec results, the difference between the two ClarkeDE results, and the difference between the two balPrec results;
Features set according to observation indicator six: the number of overlapping natural results clicked for the hypernym user query and the hyponym user query of the candidate user query pair; and the LIN score of the sets of natural results clicked for the hypernym user query and the hyponym user query of the pair, which reflects the proportion of the intersection;
Other features: feature a: the degree, computed with unweighted WeedsPrec, to which the set of natural results displayed for the hypernym user query of the candidate user query pair contains the set of natural results clicked for the hyponym user query of the pair; feature b: the degree, computed with unweighted WeedsPrec, to which the set of natural results displayed for the hyponym user query of the pair contains the set of natural results clicked for the hypernym user query of the pair; and the difference between the results of feature a and feature b.
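The click-weighted features of claim 12 can be sketched in Python as below. The claim names the measures but does not spell out their formulas, so the definitions used here are the commonly cited ones from the distributional-inclusion literature and should be read as an assumption: WeedsPrec is the share of one set's weight that falls on shared items, ClarkeDE replaces shared weights with the minimum of the two weights, LIN is the symmetric share of shared weight, and balPrec is the geometric mean of LIN and WeedsPrec. The argument direction, function names and sample click counts are likewise hypothetical. The unweighted variants correspond to setting every weight to 1.

```python
from math import sqrt


def weeds_prec(inner, outer):
    """Share of `inner`'s weight on items that also occur in `outer`."""
    total = sum(inner.values())
    if total == 0:
        return 0.0
    return sum(w for item, w in inner.items() if item in outer) / total


def clarke_de(inner, outer):
    """Like WeedsPrec, but shared items count only up to the smaller weight."""
    total = sum(inner.values())
    if total == 0:
        return 0.0
    shared = sum(min(w, outer[item]) for item, w in inner.items() if item in outer)
    return shared / total


def lin(a, b):
    """Symmetric share of weight carried by the intersection."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    return sum(a[item] + b[item] for item in a if item in b) / total


def bal_prec(inner, outer):
    """Geometric mean of LIN and WeedsPrec."""
    return sqrt(lin(inner, outer) * weeds_prec(inner, outer))


def click_features(upper, lower):
    """Directional features for one pair; inputs map result id -> click count."""
    feats = {}
    for name, fn in (("weeds", weeds_prec), ("clarke", clarke_de), ("bal", bal_prec)):
        upper_has_lower = fn(lower, upper)  # how far upper's set contains lower's
        lower_has_upper = fn(upper, lower)  # the reverse direction
        feats[name + "_up"] = upper_has_lower
        feats[name + "_low"] = lower_has_upper
        feats[name + "_diff"] = upper_has_lower - lower_has_upper
    feats["overlap"] = len(upper.keys() & lower.keys())
    feats["lin"] = lin(upper, lower)
    return feats


# Hypothetical click counts for an upper query and a lower query
upper_clicks = {"sku1": 4, "sku2": 3, "sku3": 2, "sku4": 1}
lower_clicks = {"sku2": 2, "sku3": 2}
f = click_features(upper_clicks, lower_clicks)
```

A positive difference feature indicates that containment is stronger in the upper-contains-lower direction, which is the pattern observation indicator five describes.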
13. The device according to claim 8, characterized in that the training module is further configured to:
after the preset number of user query pairs selected from the candidate user query pairs have been manually labeled, divide the preset number of user query pairs into a training set, a validation set and a test set according to a preset ratio;
train the classifier with a gradient boosting decision tree classifier using the training set represented as feature vectors, and then adjust the hyperparameters of the classifier using the validation set represented as feature vectors; and
use the classifier to judge whether the user query pairs in the test set satisfy the hypernym-hyponym relation, and then calculate the accuracy rate and the recall rate.
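The data split and the final evaluation of claim 13 can be sketched in Python as follows. The 6:2:2 ratio, the fixed random seed and the sample labels are assumptions made for illustration; the claim only requires "a preset ratio". The sketch also reads the claim's "accuracy rate" as precision, which is an interpretation rather than something the claim states.

```python
import random


def split(labeled_pairs, ratios=(6, 2, 2), seed=0):
    """Shuffle labeled pairs and cut them into training, validation and test sets."""
    shuffled = list(labeled_pairs)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical manually labeled pairs: (pair id, label)
labeled = [("pair%d" % i, i % 2) for i in range(10)]
train, val, test = split(labeled)

# Hypothetical test-set labels and classifier outputs
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Keeping the validation set separate from the test set means hyperparameters are tuned on data that is not used for the reported metrics.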
14. The device according to claim 8, characterized in that the training module is further configured to: train the classifier using one or more of a gradient boosting decision tree classifier, a random forest classifier, and a support vector machine classifier.
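Of the classifier families named in this claim, gradient boosting can be illustrated without third-party libraries. The Python sketch below is a deliberately simplified variant, not the patent's implementation: each tree is a depth-1 stump fitted to the negative gradient of the log loss, leaf values are plain residual means, and the learning rate, round count, two-column feature layout and toy data are all assumptions. A practical system would more likely rely on an established library.

```python
import math

LEARNING_RATE = 0.5  # assumed value


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the (feature, threshold) split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            err = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, left_mean, right_mean)
    return None if best is None else best[1:]


def train_gbdt(X, y, n_rounds=20):
    """Fit stumps one after another to the residuals y - sigmoid(score)."""
    stumps = []
    scores = [0.0] * len(X)
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break  # no feature can split the data
        stumps.append(stump)
        j, t, left_mean, right_mean = stump
        scores = [s + LEARNING_RATE * (left_mean if row[j] <= t else right_mean)
                  for s, row in zip(scores, X)]
    return stumps


def predict(stumps, row):
    """Return 1 if the pair is judged to satisfy the relation, else 0."""
    score = sum(LEARNING_RATE * (lm if row[j] <= t else rm)
                for j, t, lm, rm in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy feature vectors: [containment difference, overlap count] (hypothetical)
X = [[0.6, 5], [0.5, 4], [0.4, 6], [-0.1, 1], [0.0, 2], [-0.3, 1]]
y = [1, 1, 1, 0, 0, 0]
stumps = train_gbdt(X, y)
```

Calling `predict(stumps, [0.55, 5])` classifies a new pair with the fitted stumps.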
15. An electronic device, characterized by comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201710260844.3A 2017-04-20 2017-04-20 Method, device, electronic equipment and medium for extracting upper and lower relation of user query Active CN108733702B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710260844.3A CN108733702B (en) 2017-04-20 2017-04-20 Method, device, electronic equipment and medium for extracting upper and lower relation of user query

Publications (2)

Publication Number Publication Date
CN108733702A (en) 2018-11-02
CN108733702B (en) 2020-09-29

Family

ID=63933408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710260844.3A Active CN108733702B (en) 2017-04-20 2017-04-20 Method, device, electronic equipment and medium for extracting upper and lower relation of user query

Country Status (1)

Country Link
CN (1) CN108733702B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699568A (en) * 2013-11-16 2014-04-02 西安交通大学城市学院 Method for extracting hyponymy relation of field terms from wikipedia
CN104615724A (en) * 2015-02-06 2015-05-13 百度在线网络技术(北京)有限公司 Establishing method of knowledge base and information search method and device based on knowledge base
CN105654144A (en) * 2016-02-29 2016-06-08 东南大学 Social network body constructing method based on machine learning
CN105808525A (en) * 2016-03-29 2016-07-27 国家计算机网络与信息安全管理中心 Domain concept hypernym-hyponym relation extraction method based on similar concept pairs
US20160292149A1 (en) * 2014-08-02 2016-10-06 Google Inc. Word sense disambiguation using hypernyms
CN106569993A (en) * 2015-10-10 2017-04-19 中国移动通信集团公司 Method and device for mining hypernym-hyponym relation between domain-specific terms

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LILI KOTLERMAN ET AL.,: "Directional Distributional Similarity for Lexical Expansion", 《PROCEEDINGS OF THE ACL-IJCNLP 2009 CONFERENCE SHORT PAPERS》 *
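FU RUIJI (付瑞吉): "Open-Domain Named Entity Recognition and Hierarchical Category Acquisition", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *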

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968665A (en) * 2019-11-08 2020-04-07 浙江工业大学 Method for recognizing upper and lower level word relation based on gradient enhanced decision tree
CN110968665B (en) * 2019-11-08 2022-09-23 浙江工业大学 Method for recognizing upper and lower level word relation based on gradient enhanced decision tree
CN111288973A (en) * 2020-01-23 2020-06-16 中山大学 Method and device for obtaining flow rate of sea surface, computer equipment and storage medium

Also Published As

Publication number Publication date
CN108733702B (en) 2020-09-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant