CN105159884A - Method and device for establishing industry dictionary and industry identification method and device - Google Patents

Publication number
CN105159884A
Authority
CN
China
Prior art keywords
search
industry
word
dictionary
search word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510613993.4A
Other languages
Chinese (zh)
Other versions
CN105159884B (en)
Inventor
郭涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510613993.4A priority Critical patent/CN105159884B/en
Publication of CN105159884A publication Critical patent/CN105159884A/en
Application granted granted Critical
Publication of CN105159884B publication Critical patent/CN105159884B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method and a device for establishing an industry dictionary and industry identification method and device. The method for establishing the industry dictionary comprises the following steps: obtaining a user search behavior log; extracting various search terms and corresponding clicked search results from the user search behavior log; and determining the industries to which the search terms belong according to the clicked search results, with the search terms as exact index terms, and establishing and storing entry pairs between the exact index terms and the corresponding industries to form an exact dictionary. The exact dictionary can be automatically established on the basis of analysis of the user search behavior log; the search requirements of the user are reflected by the clicked search result; and the industries to which the search terms belong are determined on the basis of the search requirements, so that the accuracy rate of the obtained entry pairs is high. All clicked search results corresponding to the exact index terms are analyzed, so that omission of the one-to-many corresponding relation between certain exact index terms and the industries is avoided; and the accuracy rate of the entry pairs of the exact index terms is improved.

Description

Method and device for establishing an industry dictionary, and industry identification method and device
Technical field
Embodiments of the present invention relate to the field of information identification technology, and in particular to a method and device for establishing an industry dictionary and to an industry identification method and device.
Background art
Existing industry identification for search behavior relies mainly on manually built vocabularies: the industry can be identified only when a search word hits the vocabulary.
This approach has two defects. First, a manually built vocabulary covers only a small share of search words. Second, for special search words that correspond to multiple industries, a manually built vocabulary maps each search word to only one industry, which lowers the accuracy of industry identification.
Summary of the invention
Embodiments of the present invention provide a method and device for establishing an industry dictionary, so that the industry dictionary is built automatically.
Embodiments of the present invention also provide an industry identification method and device, so as to improve the coverage of query strings and the accuracy of industry identification for query strings.
In a first aspect, an embodiment of the present invention provides a method for establishing an industry dictionary, comprising:
obtaining a user search behavior log;
extracting each search word, and its corresponding clicked search results, from the user search behavior log;
determining the industry to which each search word belongs according to the clicked search results, taking the search word as an exact index word, and establishing and storing entry pairs between the exact index word and the corresponding industry to form an exact dictionary.
In a second aspect, an embodiment of the present invention provides a device for establishing an industry dictionary, comprising:
a log acquisition module, configured to obtain a user search behavior log;
an extraction module, configured to extract each search word, and its corresponding clicked search results, from the user search behavior log;
an exact dictionary forming module, configured to determine the industry to which each search word belongs according to the clicked search results, take the search word as an exact index word, and establish and store entry pairs between the exact index word and the corresponding industry to form an exact dictionary.
In a third aspect, an embodiment of the present invention provides an industry identification method, implemented on the basis of a dictionary established by the method for establishing an industry dictionary provided by any embodiment of the present invention, comprising:
obtaining a query string input by a user;
exactly matching the query string in the pre-established exact dictionary, taking the industry corresponding to the successfully matched exact index word as the industry corresponding to the query string, and returning that industry.
In a fourth aspect, an embodiment of the present invention provides an industry identification device, implemented on the basis of a dictionary established by the device for establishing an industry dictionary provided by any embodiment of the present invention, comprising:
a query string acquisition module, configured to obtain a query string input by a user;
an industry identification module, configured to exactly match the query string in the pre-established exact dictionary, take the industry corresponding to the successfully matched exact index word as the industry corresponding to the query string, and return that industry.
With the method and device for establishing an industry dictionary provided by the embodiments of the present invention, the exact dictionary can be built automatically by analyzing the user search behavior log. As the log is updated, the entry pairs in the exact dictionary can be continually updated, which improves the exact dictionary's coverage of search words. Because clicked search results usually reflect the user's search need, determining the industry of a search word from its clicked search results yields entry pairs with high accuracy. In addition, for each exact index word in the exact dictionary, all of its corresponding clicked search results are analyzed, so one-to-many relations between an exact index word and industries are not missed, which further improves the accuracy of the entry pairs in the exact dictionary.
With the industry identification method and device provided by the embodiments of the present invention, the exact dictionary is built automatically by analyzing the user search behavior log, and as the log is updated, the entry pairs in the exact dictionary can be continually updated, which improves the exact dictionary's coverage of query strings. The exact dictionary enables industry identification for query strings, especially those with higher search frequency. The identification may follow a one-to-one relation or a one-to-many relation, which improves the accuracy of industry identification for query strings.
Brief description of the drawings
To describe the present invention more clearly, the drawings it uses are briefly introduced below. The drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic flowchart of a method for establishing an industry dictionary provided by embodiment one of the present invention;
Fig. 1b is a schematic flowchart of a method for forming the exact dictionary according to the clicked search results, within the method for establishing an industry dictionary provided by embodiment one of the present invention;
Fig. 2 is a schematic structural diagram of a device for establishing an industry dictionary provided by embodiment four of the present invention;
Fig. 3 is a schematic flowchart of an industry identification method provided by embodiment five of the present invention;
Fig. 4 is a schematic structural diagram of an industry identification device provided by embodiment six of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. The specific embodiments described here only explain the present invention and do not limit it. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention. Note also that, for ease of description, the drawings show only the parts related to the present invention rather than the full content.
Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. A process can be terminated when its operations are completed, but it can also have additional steps not included in the drawings. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
It should also be mentioned that, in some alternative implementations, the functions or actions mentioned can occur in an order different from that indicated in the drawings. For example, depending on the functions or actions involved, two figures shown in succession can in fact be executed substantially simultaneously or sometimes in the reverse order.
Embodiment one
Refer to Fig. 1a, a schematic flowchart of a method for establishing an industry dictionary provided by embodiment one of the present invention. The method of this embodiment can be performed by a device for establishing an industry dictionary that is implemented in hardware and/or software; the device is typically deployed in a server that provides an information search service.
The method comprises operations 110 to 130.
110. Obtain a user search behavior log.
Searching through a search engine is a common way for users to obtain information. For each search operation by a user, the search engine generates a corresponding user search behavior log entry, which contains not only the search word the user entered but also the corresponding search results and information such as the user's clicks on specific search results.
This operation does not limit the source of the user search behavior log: it may come from mobile terminals or from PCs. Further, for mobile and/or PC, it may be the log of a single vertical search channel (for example, the "webpage" vertical in the Baidu search engine), the log of several vertical search channels (for example, the "webpage" and "map" verticals in the Baidu search engine), or the log of the whole platform.
This operation usually obtains the user search behavior log for a set time period (for example, 3 months).
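As a minimal sketch of operation 110, the log can be modeled as a list of records that are filtered to the set time period. The record layout (`SearchLogRecord` and its fields) and the helper `records_in_window` are hypothetical names chosen for illustration; the source only says that a log entry holds the search word, the search results, and the user's clicks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SearchLogRecord:
    """One search operation (hypothetical layout, not from the source)."""
    search_word: str     # the search word the user entered
    results: tuple       # identifiers of the search results returned
    clicked: tuple       # identifiers of the results the user clicked
    timestamp: datetime  # when the search happened


def records_in_window(records, now, days=90):
    """Keep only the records from the last `days` days (the set time period)."""
    cutoff = now - timedelta(days=days)
    return [r for r in records if cutoff <= r.timestamp <= now]


now = datetime(2015, 9, 23)
log = [
    SearchLogRecord("KFC", ("r1", "r2"), ("r1",), datetime(2015, 9, 1)),
    SearchLogRecord("KFC", ("r1", "r2"), (), datetime(2015, 1, 1)),
]
recent = records_in_window(log, now)  # only the September record remains
```

The 90-day default stands in for the "3 months" example; any other window works the same way.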
120. Extract each search word, and its corresponding clicked search results, from the user search behavior log.
130. Determine the industry to which each search word belongs according to the clicked search results, take the search word as an exact index word, and establish and store entry pairs between the exact index word and the corresponding industry to form an exact dictionary.
Across repeated searches for the same search word, different users have different search needs, so some search results are clicked and others are not. Because clicked search results usually reflect the user's search need, the industry of a search word is determined from its clicked search results, and the resulting entry pairs are highly accurate.
For some search words, the clicked search results indicate only one industry; for others, they indicate several industries at once. Entry pairs in the exact dictionary therefore take two forms: one-to-one entry pairs and one-to-many entry pairs.
For example, suppose the search word is "KFC" and the clicked search results are: a search result containing KFC store information, a search result containing KFC online ordering, and a search result containing KFC group-buying information. From these clicked search results, the industry of the search word "KFC" can be determined to be "cuisines". The entry pair between the exact index word "KFC" and the corresponding industry "cuisines" is thus obtained; it is a one-to-one entry pair and is stored in the exact dictionary.
As another example, suppose the search word is "exhibition center" and the clicked search results are: a search result containing exhibition information of a museum, a search result containing exhibition information of an Art Museum, and a search result containing exhibition information of a science and technology center. From the first clicked result, the industry of "exhibition center" can be determined to be "museum"; from the second, "Art Museum"; from the third, "science and technology center". The entry pair between the exact index word "exhibition center" and the corresponding industries "museum", "Art Museum", and "science and technology center" is thus obtained; it is a one-to-many entry pair and is stored in the exact dictionary.
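The two examples above can be sketched as follows. This is an illustration under assumptions: the source does not say how the industry of a search result is determined, so the sketch takes a ready-made lookup table (`industry_of_result`), and the result identifiers are invented for the example.

```python
from collections import defaultdict


def build_exact_dictionary(click_log, industry_of_result):
    """Map each search word to the set of industries of its clicked results.

    click_log: iterable of (search_word, clicked_result_id) pairs.
    industry_of_result: hypothetical lookup from result id to industry.
    """
    exact = defaultdict(set)
    for search_word, clicked_result in click_log:
        industry = industry_of_result.get(clicked_result)
        if industry is not None:
            exact[search_word].add(industry)
    return dict(exact)


industry_of_result = {
    "kfc-store-info": "cuisines",
    "kfc-online-ordering": "cuisines",
    "kfc-group-buying": "cuisines",
    "museum-exhibitions": "museum",
    "art-museum-exhibitions": "Art Museum",
    "science-center-exhibitions": "science and technology center",
}
click_log = [
    ("KFC", "kfc-store-info"),
    ("KFC", "kfc-online-ordering"),
    ("KFC", "kfc-group-buying"),
    ("exhibition center", "museum-exhibitions"),
    ("exhibition center", "art-museum-exhibitions"),
    ("exhibition center", "science-center-exhibitions"),
]
exact = build_exact_dictionary(click_log, industry_of_result)
```

Storing a set per exact index word covers both forms of entry pair: a one-element set is a one-to-one pair, and a larger set is a one-to-many pair.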
The technical solution of this embodiment builds the exact dictionary automatically by analyzing the user search behavior log. As the log is updated, the entry pairs in the exact dictionary can be continually updated, which improves the exact dictionary's coverage of search words. Because clicked search results usually reflect the user's search need, determining the industry of a search word from its clicked search results yields entry pairs with high accuracy. In addition, for each exact index word in the exact dictionary, all of its corresponding clicked search results are analyzed, so one-to-many relations between an exact index word and industries are not missed, which further improves the accuracy of the entry pairs in the exact dictionary.
As a preferred implementation of forming the exact dictionary according to the clicked search results, refer to Fig. 1b; it may specifically comprise operations 131 and 132.
131. Count the search frequency of each search word and the click probability of each corresponding clicked search result.
As noted above, the log obtained usually covers a set time period. Within that period, each search operation by a user increments the total search count by 1. The total search count for the log can thus be tallied, as can the search count of each search word. For each search word, its search frequency is then computed from its search count and the total search count.
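The counting described above can be sketched in a few lines. The source says only that the search frequency is obtained from the per-word search count and the total search count; treating it as their ratio is an assumption of this sketch, and `search_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def search_frequencies(search_words):
    """Return search count / total search count for each search word.

    search_words: one entry per search operation in the set time period.
    The ratio is an assumed definition of 'search frequency'.
    """
    counts = Counter(search_words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


searches = (["KFC"] * 6
            + ["exhibition center"] * 3
            + ["China Merchants Bank near KFC"])
freq = search_frequencies(searches)
```

With these ten searches, "KFC" accounts for six tenths of the total and the long query for one tenth, which is the kind of gap the first threshold in operation 132 is meant to separate.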
Across repeated searches for the same search word, different users have different search needs, so some search results are clicked often, some are clicked rarely, and some are not clicked at all. Counting the click probability of each clicked search result corresponding to a search word helps reveal the user's search need.
132. For each search word whose search frequency is greater than a first threshold, when the click probability of a clicked search result corresponding to that search word is greater than a second threshold, take the search word as an exact index word, and determine the industry of each search result whose click probability is greater than the second threshold as an industry corresponding to the exact index word; establish and store entry pairs between the exact index word and the corresponding industry to form the exact dictionary.
For example, suppose the search frequency of the search word "KFC" is greater than the first threshold, and the click probabilities of the search result containing KFC store information, the search result containing KFC online ordering, and the search result containing KFC group-buying information are all greater than the second threshold. The industry of all three search results is determined to be "cuisines". The search word "KFC" is therefore taken as an exact index word with "cuisines" as its corresponding industry, and the entry pair between the exact index word "KFC" and the corresponding industry "cuisines" is stored in the exact dictionary.
Similarly, other one-to-one entry pairs in the exact dictionary can be obtained, for example the entry pair between the exact index word "China Merchants Bank" and the corresponding industry "bank".
As another example, suppose the search frequency of the search word "ABC hotel" is greater than the first threshold, and the click probabilities of the search result containing the hotel's check-in information and the search result containing the dishes the hotel offers are both greater than the second threshold. The industry of the search result containing the check-in information is determined to be "hotel", and the industry of the search result containing the dish information is determined to be "cuisines". The search word "ABC hotel" is therefore taken as an exact index word with both "hotel" and "cuisines" as its corresponding industries, and the entry pair between the exact index word "ABC hotel" and the corresponding industries "hotel" and "cuisines" is stored in the exact dictionary.
Similarly, other one-to-many entry pairs in the exact dictionary can be obtained, for example the entry pair between the exact index word "exhibition center" and the corresponding industries "museum", "Art Museum", and "science and technology center".
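Operations 131 and 132 can be sketched together as below. Several points are assumptions rather than statements from the source: search frequency is taken as the share of all searches, click probability is taken as the share of a word's searches in which a result was clicked, the industry of a result comes from a hypothetical lookup table, and the threshold values and result identifiers are invented.

```python
from collections import Counter, defaultdict


def build_thresholded_exact_dictionary(searches, industry_of_result,
                                       first_threshold, second_threshold):
    """searches: list of (search_word, clicked_result_ids), one per search."""
    search_counts = Counter(word for word, _ in searches)
    total = sum(search_counts.values())

    # For each search word, count the searches in which each result was clicked.
    click_counts = defaultdict(Counter)
    for word, clicked in searches:
        click_counts[word].update(set(clicked))

    exact = {}
    for word, n in search_counts.items():
        if n / total <= first_threshold:
            continue  # low-frequency words are not exact index words
        industries = {
            industry_of_result[result]
            for result, clicks in click_counts[word].items()
            if clicks / n > second_threshold and result in industry_of_result
        }
        if industries:
            exact[word] = industries
    return exact


industry_of_result = {"check-in": "hotel", "menu": "cuisines"}
searches = [
    ("ABC hotel", ["check-in"]),
    ("ABC hotel", ["menu"]),
    ("ABC hotel", ["check-in", "menu"]),
    ("ABC hotel", ["check-in"]),
    ("rare query", ["menu"]),
]
exact = build_thresholded_exact_dictionary(
    searches, industry_of_result, first_threshold=0.3, second_threshold=0.4)
```

In this toy log "ABC hotel" passes the first threshold and both of its results pass the second, so it becomes a one-to-many entry pair, while "rare query" is left out of the exact dictionary.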
The technical solution of this embodiment builds the exact dictionary automatically by analyzing the user search behavior log. As the log is updated, the entry pairs in the exact dictionary can be continually updated, which improves the exact dictionary's coverage of search words. For each entry pair in the exact dictionary, the search frequency of the exact index word is greater than the first threshold, so search words with higher search frequency are all covered by the exact dictionary, which benefits industry identification for such search words. Moreover, the click probability of every search result corresponding to an exact index word is analyzed statistically, and the industry of each search result whose click probability is greater than the second threshold is determined as an industry corresponding to the exact index word. One-to-many relations between an exact index word and industries are therefore not missed, which improves the accuracy of the entry pairs in the exact dictionary.
Embodiment two
This embodiment provides a method for establishing an industry dictionary. On the basis of the embodiment above, after the search frequency of each search word is counted, the method further comprises:
For each search word whose search frequency is less than or equal to the first threshold, split the search word using the exact dictionary to obtain the sub-search words corresponding to the search word and the industries corresponding to those sub-search words; take the sub-search words as fuzzy index words, and establish and store entry pairs between the fuzzy index words and the industries corresponding to the sub-search words, forming a fuzzy dictionary.
For example, suppose the search frequency of the search word "neighbouring China Merchants Bank" is less than or equal to the first threshold. Splitting this search word using the exact dictionary yields the sub-search word "China Merchants Bank" and its corresponding industry "bank". The sub-search word "China Merchants Bank" is therefore taken as a fuzzy index word, and the entry pair between the fuzzy index word "China Merchants Bank" and the industry "bank" is established and stored in the fuzzy dictionary.
Similarly, other one-to-one entry pairs in the fuzzy dictionary can be obtained, for example the entry pair between the fuzzy index word "KFC" and the corresponding industry "cuisines".
Note that some entry pairs in the exact dictionary may coincide with entry pairs in the fuzzy dictionary. For example, the entry pair "KFC" → "cuisines" can appear in both. In the exact dictionary, the search word in the user search behavior log exactly matches the exact index word "KFC"; how that entry pair is obtained is described in the embodiment above and is not repeated here. In the fuzzy dictionary, the search word in the log only fuzzily matches the fuzzy index word "KFC" (for example, the user's search word is "neighbouring KFC"), and the entry pair is obtained differently: by splitting the search word using the exact dictionary.
As another example, suppose the search frequency of the search word "China Merchants Bank near KFC" is less than or equal to the first threshold. Splitting this search word using the exact dictionary yields the sub-search words "KFC" and "China Merchants Bank", together with the industry "cuisines" corresponding to "KFC" and the industry "bank" corresponding to "China Merchants Bank". The sub-search words "KFC" and "China Merchants Bank" are therefore taken as fuzzy index words, and the entry pair "KFC", "China Merchants Bank" → "cuisines", "bank" is established and stored in the fuzzy dictionary.
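One possible reading of "splitting a search word using the exact dictionary" is to look for exact index words that occur inside the search word. The sketch below takes that reading as an assumption; the source does not specify the splitting algorithm. The function names are hypothetical, and the query "KFC nearby China Merchants Bank" is an English rendering chosen so that the sub-search words appear in the order the source lists them.

```python
def split_with_exact_dictionary(search_word, exact_dictionary):
    """Return (sub_search_word, industries) pairs ordered by position.

    Assumption: a sub-search word is any exact index word that occurs as a
    substring of the search word. Overlapping index words are not resolved.
    """
    found = []
    for index_word, industries in exact_dictionary.items():
        position = search_word.find(index_word)
        if position != -1:
            found.append((position, index_word, industries))
    found.sort(key=lambda item: item[0])
    return [(word, industries) for _, word, industries in found]


def build_fuzzy_dictionary(low_frequency_words, exact_dictionary):
    """Map each tuple of sub-search words to the tuple of their industries."""
    fuzzy = {}
    for search_word in low_frequency_words:
        parts = split_with_exact_dictionary(search_word, exact_dictionary)
        if not parts:
            continue  # nothing in the exact dictionary matches
        sub_words = tuple(word for word, _ in parts)
        industries = tuple(i for _, inds in parts for i in inds)
        fuzzy[sub_words] = industries
    return fuzzy


exact = {"KFC": ("cuisines",), "China Merchants Bank": ("bank",)}
low_frequency_words = [
    "neighbouring China Merchants Bank",
    "KFC nearby China Merchants Bank",
]
fuzzy = build_fuzzy_dictionary(low_frequency_words, exact)
```

Keying the fuzzy dictionary by a tuple of sub-search words keeps one-to-one entries (a one-element tuple) and many-to-many entries in the same structure.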
The technical solution of this embodiment has four benefits. First, the exact dictionary is built automatically by analyzing the user search behavior log, and as the log is updated, the entry pairs in the exact dictionary can be continually updated, which improves the exact dictionary's coverage of search words. Second, for each entry pair in the exact dictionary, the search frequency of the exact index word is greater than the first threshold, so search words with higher search frequency are all covered by the exact dictionary, which benefits industry identification for such search words. Third, the click probability of every search result corresponding to an exact index word is analyzed statistically, and the industry of each search result whose click probability is greater than the second threshold is determined as an industry corresponding to the exact index word, so one-to-many relations between an exact index word and industries are not missed and the accuracy of the entry pairs in the exact dictionary improves. Fourth, the fuzzy dictionary is built automatically through further analysis of the user search behavior log together with the exact dictionary. For search words with lower search frequency (for example, long-tail search words) that cannot be exactly matched in the exact dictionary, fuzzy matching against the fuzzy dictionary enables industry identification, so the exact dictionary and the fuzzy dictionary together further improve the coverage of search words.
Embodiment three
This embodiment provides a method for establishing an industry dictionary. On the basis of embodiment two, after the fuzzy dictionary is formed, the method further comprises:
When a search word has at least two corresponding sub-search words, determine the priority of the at least two sub-search words using a priority determination strategy, and establish and store an entry pair between the at least two sub-search words and the industry corresponding to the highest-priority sub-search word, forming a priority dictionary.
Take again the search word "China Merchants Bank near KFC" from embodiment two. As described above, the exact dictionary yields the entry pair "KFC", "China Merchants Bank" → "cuisines", "bank", which is stored in the fuzzy dictionary. This search word has two sub-search words, "KFC" and "China Merchants Bank". Using a priority determination strategy (several strategies exist and are described in detail below), the priority of the sub-search word "China Merchants Bank" is determined to be higher than that of the sub-search word "KFC". The industry corresponding to the highest-priority sub-search word "China Merchants Bank" is "bank", so the entry pair "KFC", "China Merchants Bank" → "bank" is established, forming the priority dictionary.
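The step above can be sketched as a pass over the fuzzy dictionary keys. The data layout (tuples of sub-search words as keys) and the names `build_priority_dictionary` and `last_position` are assumptions of this sketch; the strategy is passed in as a function so that any priority determination strategy can be plugged in.

```python
def build_priority_dictionary(fuzzy_keys, exact_dictionary, pick_highest):
    """For each key with at least two sub-search words, keep only the
    industries of the sub-search word that the strategy ranks highest."""
    priority = {}
    for sub_words in fuzzy_keys:
        if len(sub_words) < 2:
            continue  # one-to-one entries need no priority
        best = pick_highest(sub_words)
        priority[sub_words] = exact_dictionary[best]
    return priority


def last_position(sub_words):
    """Example strategy: the sub-search word listed last wins. It assumes
    the tuple is ordered by position in the original search word."""
    return sub_words[-1]


exact = {"KFC": ("cuisines",), "China Merchants Bank": ("bank",)}
fuzzy_keys = [("China Merchants Bank",), ("KFC", "China Merchants Bank")]
priority = build_priority_dictionary(fuzzy_keys, exact, last_position)
```

Only the two-word key produces a priority entry, and it maps to "bank", matching the example in the text.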
The technical solution of this embodiment builds the exact dictionary automatically by analyzing the user search behavior log and, through further analysis of the log together with the exact dictionary, builds the fuzzy dictionary automatically. The exact dictionary enables industry identification for search words with higher search frequency; the identification may follow a one-to-one relation or a one-to-many relation, which improves the accuracy of industry identification for such search words. For search words with lower search frequency (for example, long-tail search words) that cannot be exactly matched in the exact dictionary, fuzzy matching against the fuzzy dictionary enables industry identification, so the exact dictionary and the fuzzy dictionary together further improve the coverage of search words.
For some lower-frequency search words, a one-to-one entry pair in the fuzzy dictionary identifies the corresponding industry with high accuracy (for example, for the lower-frequency search word "neighbouring China Merchants Bank", the one-to-one entry pair "China Merchants Bank" → "bank" in the fuzzy dictionary identifies the industry as "bank"). For other lower-frequency search words, industry identification has to use a many-to-many entry pair in the fuzzy dictionary, which may yield several industries. In that case the priority dictionary helps: it takes into account the priority among the sub-search words of the search word, and that priority tends to reflect the user's search need. Using the many-to-many entry pairs in the fuzzy dictionary together with the priority dictionary therefore improves the accuracy of industry identification (for example, for the lower-frequency search word "China Merchants Bank near KFC", the many-to-many entry pair "KFC", "China Merchants Bank" → "cuisines", "bank" in the fuzzy dictionary identifies the industries "cuisines" and "bank", while the entry pair "KFC", "China Merchants Bank" → "bank" in the priority dictionary identifies the industry as "bank").
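The way the three dictionaries cooperate at identification time can be sketched as a cascade: exact match first, then fuzzy match, then the priority dictionary to narrow a many-to-many result. The fuzzy matching rule used here (every sub-search word of an entry occurs in the query, and the entry with the most sub-search words wins) is an assumption of this sketch, as are the function name and the data layout; the source does not spell out the matching procedure.

```python
def identify_industry(query, exact, fuzzy, priority):
    """Return the industries for a query string, or () if none is found."""
    # 1. Exact match of the whole query against the exact dictionary.
    if query in exact:
        return exact[query]
    # 2. Fuzzy match: entries whose sub-search words all occur in the query.
    candidates = [key for key in fuzzy if all(word in query for word in key)]
    if not candidates:
        return ()
    key = max(candidates, key=len)  # prefer the entry covering most words
    # 3. The priority dictionary, if it has this key, narrows the answer.
    return priority.get(key, fuzzy[key])


exact = {"KFC": ("cuisines",), "China Merchants Bank": ("bank",)}
fuzzy = {
    ("China Merchants Bank",): ("bank",),
    ("KFC", "China Merchants Bank"): ("cuisines", "bank"),
}
priority = {("KFC", "China Merchants Bank"): ("bank",)}

by_exact = identify_industry("KFC", exact, fuzzy, priority)
by_fuzzy = identify_industry(
    "neighbouring China Merchants Bank", exact, fuzzy, priority)
by_priority = identify_industry(
    "China Merchants Bank near KFC", exact, fuzzy, priority)
```

Without the priority dictionary the last query would return both "cuisines" and "bank"; with it, only "bank" is returned.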
In the above scheme, the operation of determining the priority of at least two sub-search words by using a priority determination strategy can be implemented in multiple ways.
The priority may be determined according to the positions of the at least two sub-search words in the corresponding search word: a sub-search word in a later position has a higher priority than a sub-search word in an earlier position. For example, the search word "China Merchants Bank near KFC" corresponds to two sub-search words, "KFC" and "China Merchants Bank". In the search word as entered, the sub-search word "China Merchants Bank" is positioned later and the sub-search word "KFC" is positioned earlier, so the priority of the sub-search word "China Merchants Bank" is higher than the priority of the sub-search word "KFC".
The priority may also be determined according to the word length of the at least two sub-search words: a longer sub-search word has a higher priority than a shorter sub-search word. For example, the search word "China Merchants Bank near KFC" corresponds to two sub-search words, "KFC" and "China Merchants Bank". The sub-search word "China Merchants Bank" is longer and the sub-search word "KFC" is shorter, so the priority of the sub-search word "China Merchants Bank" is higher than the priority of the sub-search word "KFC".
The priority may also be determined according to the click probability of the clicked search results of the search word corresponding to the at least two sub-search words: the sub-search word corresponding to the clicked search results with a higher click probability has a higher priority than the sub-search word corresponding to the clicked search results with a lower click probability. For example, statistics show that the clicked search results corresponding to the search word "China Merchants Bank near KFC" fall into two classes, and the click probability of the search results that include China Merchants Bank is far higher than the click probability of the search results that include KFC. Therefore, the priority of the sub-search word "China Merchants Bank" is higher than the priority of the sub-search word "KFC".
It should be noted that determining the priority among multiple sub-search words helps reflect the user's search need. For example, when a user searches for "China Merchants Bank near KFC", what the user really expects is search results about China Merchants Bank, not search results about KFC. Therefore, for some search words with a lower search rate, a many-to-many entry pair in the fuzzy dictionary is needed for industry identification, and several corresponding industries may be identified. In this case the priority dictionary is used: because it takes into account the priority among the sub-search words corresponding to the search word, and this priority helps reflect the user's search need, using the many-to-many entry pairs of the fuzzy dictionary together with the priority dictionary improves the accuracy of industry identification.
The above implementations can be performed separately or in combination; this embodiment does not limit this.
It should also be noted that, when performed separately, the three implementations differ in the accuracy of the priority they determine among multiple sub-search words. From high to low, the order is: determining the priority according to sub-search word position > determining the priority according to the click probability of the search results corresponding to the sub-search words > determining the priority according to sub-search word length.
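The three priority determination strategies above can be illustrated with a minimal Python sketch. The function names, the list-based representation of a search word (sub-search words in the order in which they appear in the search word as entered), and the click probabilities are assumptions made for illustration only; they are not taken from the embodiment.

```python
from typing import Dict, List


def rank_by_position(subs_in_query_order: List[str]) -> List[str]:
    """Strategy 1: a sub-search word in a later position ranks higher."""
    return list(reversed(subs_in_query_order))


def rank_by_length(subs: List[str]) -> List[str]:
    """Strategy 2: a longer sub-search word ranks higher."""
    return sorted(subs, key=len, reverse=True)


def rank_by_click_probability(subs: List[str],
                              click_prob: Dict[str, float]) -> List[str]:
    """Strategy 3: higher click probability of the matching results ranks higher."""
    return sorted(subs, key=lambda s: click_prob.get(s, 0.0), reverse=True)


# Sub-search words of "China Merchants Bank near KFC", listed in the order
# described in the text ("KFC" earlier, "China Merchants Bank" later).
subs = ["KFC", "China Merchants Bank"]
# Hypothetical click probabilities, made up for this example.
click_prob = {"KFC": 0.1, "China Merchants Bank": 0.9}

top_by_position = rank_by_position(subs)[0]
top_by_length = rank_by_length(subs)[0]
top_by_clicks = rank_by_click_probability(subs, click_prob)[0]
```

For this example all three strategies give "China Merchants Bank" the highest priority; on other search words they may disagree, which is why the text ranks their accuracy.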
Embodiment four
Fig. 2 is a structural schematic diagram of a device for establishing an industry dictionary provided by Embodiment four of the present invention. The device comprises: a log acquisition module 210, an extraction module 220 and an accurate dictionary forming module 230.
The log acquisition module 210 is configured to obtain a user search behavior log. The extraction module 220 is configured to extract each search word and its corresponding clicked search results from the user search behavior log. The accurate dictionary forming module 230 is configured to determine the industry to which a search word belongs according to the clicked search results, take the search word as an accurate index word, and establish and save entry pairs of the accurate index word and the corresponding industry to form an accurate dictionary.
In the above scheme, the accurate dictionary forming module may be specifically configured to: count the search rate of each search word and the click probability of the corresponding clicked search results; for each search word whose search rate is greater than a first threshold, when the click probability of the clicked search results corresponding to this search word is greater than a second threshold, take this search word as an accurate index word, and determine the industry to which the search results whose click probability is greater than the second threshold belong as the industry corresponding to the accurate index word; and establish and save entry pairs of the accurate index word and the corresponding industry to form the accurate dictionary.
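The two-threshold construction of the accurate dictionary can be sketched as follows. This is only a sketch under stated assumptions: each log record is assumed to be a pair of a search word and the industry of the search result that was clicked, the search rate is taken to be the number of records for a search word, and the log contents and threshold values are invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_accurate_dictionary(log: Iterable[Tuple[str, str]],
                              first_threshold: int,
                              second_threshold: float) -> Dict[str, str]:
    """Map high-frequency search words to the industry users mostly click."""
    search_rate: Counter = Counter()
    clicks: Dict[str, Counter] = defaultdict(Counter)
    for word, clicked_industry in log:
        search_rate[word] += 1
        clicks[word][clicked_industry] += 1

    accurate: Dict[str, str] = {}
    for word, rate in search_rate.items():
        if rate <= first_threshold:
            continue  # low search rate: left for the fuzzy dictionary
        industry, count = clicks[word].most_common(1)[0]
        if count / rate > second_threshold:
            accurate[word] = industry  # search word becomes an accurate index word
    return accurate


# Hypothetical log: (search word, industry of the clicked search result).
log = (
    [("China Merchants Bank", "bank")] * 5
    + [("KFC", "cuisines")] * 4
    + [("KFC", "shopping")]
    + [("China Merchants Bank near KFC", "bank")]
)
accurate_dictionary = build_accurate_dictionary(log, first_threshold=3,
                                                second_threshold=0.6)
```

With these made-up numbers, "China Merchants Bank" and "KFC" pass both thresholds, while the long-tail search word is skipped because its search rate does not exceed the first threshold.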
Further, the device may also comprise:
a fuzzy dictionary forming module, configured to, after the search rate of each search word is counted, for each search word whose search rate is less than or equal to the first threshold, split this search word by using the accurate dictionary to obtain the sub-search words corresponding to this search word and the industries corresponding to the sub-search words; and take the sub-search words corresponding to this search word as fuzzy index words, and establish and save entry pairs of the fuzzy index words and the industries corresponding to the sub-search words to form a fuzzy dictionary.
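A minimal sketch of how the fuzzy dictionary might be formed is given below. It assumes that "splitting by using the accurate dictionary" means finding the accurate index words contained in the low-frequency search word by plain substring matching; a real system would likely need proper word segmentation. The function names, the tuple-keyed dictionary layout and the search rates are hypothetical.

```python
from typing import Dict, List, Tuple

FuzzyDict = Dict[Tuple[str, ...], Tuple[str, ...]]


def split_with_accurate_dictionary(word: str,
                                   accurate: Dict[str, str]) -> List[str]:
    """Return the accurate index words contained in `word`, in order of appearance."""
    found = sorted((word.find(key), key) for key in accurate if key in word)
    return [key for _, key in found]


def build_fuzzy_dictionary(search_rate: Dict[str, int],
                           accurate: Dict[str, str],
                           first_threshold: int) -> FuzzyDict:
    """Entry pairs: sub-search words (fuzzy index words) -> their industries."""
    fuzzy: FuzzyDict = {}
    for word, rate in search_rate.items():
        if rate > first_threshold:
            continue  # high search rate: handled by the accurate dictionary
        subs = split_with_accurate_dictionary(word, accurate)
        if subs:
            fuzzy[tuple(subs)] = tuple(accurate[s] for s in subs)
    return fuzzy


accurate = {"China Merchants Bank": "bank", "KFC": "cuisines"}
# Hypothetical search rates.
search_rate = {
    "China Merchants Bank": 500,
    "neighbouring China Merchants Bank": 2,
    "China Merchants Bank near KFC": 1,
}
fuzzy_dictionary = build_fuzzy_dictionary(search_rate, accurate, first_threshold=3)
```

The result contains a one-to-one entry pair for "China Merchants Bank" and a many-to-many entry pair for the two sub-search words. The order of the sub-search words follows the English rendering of the search word used here.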
Further, the device may also comprise: a priority determination module and a priority dictionary forming module.
The priority determination module is configured to, after the fuzzy dictionary is formed, when the number of sub-search words corresponding to this search word is at least two, determine the priority of the at least two sub-search words by using a priority determination strategy. The priority dictionary forming module is configured to establish and save entry pairs of the at least two sub-search words and the industry corresponding to the sub-search word with the highest priority, to form a priority dictionary.
Specifically, the priority determination module may comprise at least one of the following submodules: a first priority determination submodule, a second priority determination submodule and a third priority determination submodule.
The first priority determination submodule is configured to determine, according to the positions of the at least two sub-search words in the corresponding search word, that a sub-search word in a later position has a higher priority than a sub-search word in an earlier position. The second priority determination submodule is configured to determine, according to the word length of the at least two sub-search words, that a longer sub-search word has a higher priority than a shorter sub-search word. The third priority determination submodule is configured to determine, according to the click probability of the clicked search results of the search word corresponding to the at least two sub-search words, that the sub-search word corresponding to the clicked search results with a higher click probability has a higher priority than the sub-search word corresponding to the clicked search results with a lower click probability.
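The formation of the priority dictionary from many-to-many fuzzy entry pairs might look like the following sketch. The tuple-keyed dictionary layout and the `rank` callback are assumptions; the word-length strategy is used here only because it needs no extra data, and any of the three strategies could be passed in instead.

```python
from typing import Callable, Dict, List, Tuple

FuzzyDict = Dict[Tuple[str, ...], Tuple[str, ...]]


def rank_by_length(subs: List[str]) -> List[str]:
    """Word-length strategy: a longer sub-search word ranks higher."""
    return sorted(subs, key=len, reverse=True)


def build_priority_dictionary(
        fuzzy: FuzzyDict,
        rank: Callable[[List[str]], List[str]]) -> Dict[Tuple[str, ...], str]:
    """For entries with at least two sub-search words, keep only the
    industry of the sub-search word with the highest priority."""
    priority: Dict[Tuple[str, ...], str] = {}
    for subs, industries in fuzzy.items():
        if len(subs) < 2:
            continue  # one-to-one entries need no priority decision
        industry_of = dict(zip(subs, industries))
        top = rank(list(subs))[0]
        priority[subs] = industry_of[top]
    return priority


fuzzy = {
    ("China Merchants Bank",): ("bank",),
    ("KFC", "China Merchants Bank"): ("cuisines", "bank"),
}
priority_dictionary = build_priority_dictionary(fuzzy, rank_by_length)
```

The resulting entry pair "KFC", "China Merchants Bank" → "bank" matches the example given for the priority dictionary in the description.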
The device for establishing an industry dictionary provided by the embodiment of the present invention can perform the method for establishing an industry dictionary provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the performed method.
Embodiment five
Fig. 3 is a schematic flowchart of an industry identification method provided by Embodiment five of the present invention. The method of this embodiment can be performed by an industry identification device implemented in hardware and/or software, and this device is typically configured in a server that can provide information industry identification services. The method of this embodiment is implemented based on the dictionaries established by the method for establishing an industry dictionary provided by any embodiment of the present invention.
The method comprises operations 310 to 320.
310. Obtain the query string input by the user.
320. Exactly match the query string in the pre-established accurate dictionary, take the industry corresponding to the successfully matched accurate index word as the industry corresponding to the query string, and return the industry corresponding to the query string.
In the technical scheme of this embodiment, the accurate dictionary is established automatically by analyzing the user search behavior log, so the entry pairs in the accurate dictionary can be continuously updated as the user search behavior log is updated, which improves the coverage of query strings by the accurate dictionary. The accurate dictionary enables industry identification for query strings, especially query strings with a higher search rate; the identification may be a one-to-one relationship or a many-to-one relationship, which improves the accuracy of industry identification for query strings.
In the above scheme, after the query string is exactly matched in the pre-established accurate dictionary, the method may also comprise:
if the matching fails, fuzzily matching the query string in the pre-established fuzzy dictionary, taking the industry corresponding to the successfully matched fuzzy index word as the industry corresponding to the query string, and returning the industry corresponding to the query string.
In this manner, for query strings with a lower search rate (such as long-tail search words) that cannot be exactly matched in the accurate dictionary, the fuzzy dictionary is used for fuzzy matching, so that the industry of these query strings can also be identified; together, the accurate dictionary and the fuzzy dictionary further increase the coverage of query strings. For some query strings with a lower search rate, a one-to-one entry pair in the fuzzy dictionary is enough to identify the corresponding industry, and the accuracy of industry identification is high.
Further, after the query string is fuzzily matched in the pre-established fuzzy dictionary, the method may also comprise:
when it is detected that the number of successfully matched fuzzy index words is at least two, using the pre-established priority dictionary to determine the industry corresponding to the at least two fuzzy index words, taking it as the industry corresponding to the query string, and returning the industry corresponding to the query string.
In this manner, for query strings with a lower search rate (such as long-tail search words) that cannot be exactly matched in the accurate dictionary, the fuzzy dictionary is used for fuzzy matching, so that the industry of these query strings can also be identified; together, the accurate dictionary and the fuzzy dictionary further increase the coverage of query strings. For some query strings with a lower search rate, a many-to-many entry pair in the fuzzy dictionary is needed for industry identification, and several corresponding industries may be identified. In this case the priority dictionary is used: because it takes into account the priority among the sub-search words corresponding to the query string, and this priority helps reflect the user's search need, using the many-to-many entry pairs of the fuzzy dictionary together with the priority dictionary improves the accuracy of industry identification.
In summary, this embodiment provides three ways of identifying the industry corresponding to a query string. The first uses the accurate dictionary for industry identification. The second uses the fuzzy dictionary for industry identification when identification with the accurate dictionary fails. The third uses the fuzzy dictionary together with the priority dictionary when identification with the accurate dictionary fails.
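The three identification paths summarized above can be combined into one lookup cascade, sketched below. The sketch assumes that fuzzy matching means finding the fuzzy entry whose sub-search words are all contained in the query string (preferring the entry with the most sub-search words); the dictionary layouts and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

FuzzyDict = Dict[Tuple[str, ...], Tuple[str, ...]]


def fuzzy_match(query: str, fuzzy: FuzzyDict) -> Optional[Tuple[str, ...]]:
    """Pick the fuzzy entry with the most sub-search words contained in the query."""
    candidates = [subs for subs in fuzzy if all(s in query for s in subs)]
    return max(candidates, key=len) if candidates else None


def identify_industry(query: str,
                      accurate: Dict[str, str],
                      fuzzy: FuzzyDict,
                      priority: Dict[Tuple[str, ...], str]) -> List[str]:
    # 1. Exact match in the accurate dictionary.
    if query in accurate:
        return [accurate[query]]
    # 2. Fuzzy match when the exact match fails.
    subs = fuzzy_match(query, fuzzy)
    if subs is None:
        return []
    # 3. At least two fuzzy index words matched: consult the priority dictionary.
    if len(subs) >= 2 and subs in priority:
        return [priority[subs]]
    return list(fuzzy[subs])


accurate = {"China Merchants Bank": "bank", "KFC": "cuisines"}
fuzzy = {
    ("China Merchants Bank",): ("bank",),
    ("KFC", "China Merchants Bank"): ("cuisines", "bank"),
}
priority = {("KFC", "China Merchants Bank"): "bank"}

exact_hit = identify_industry("KFC", accurate, fuzzy, priority)
fuzzy_hit = identify_industry("neighbouring China Merchants Bank",
                              accurate, fuzzy, priority)
priority_hit = identify_industry("China Merchants Bank near KFC",
                                 accurate, fuzzy, priority)
no_hit = identify_industry("weather tomorrow", accurate, fuzzy, priority)
```

If the priority dictionary has no entry for the matched sub-search words, this sketch falls back to returning all industries from the fuzzy entry; that fallback is a design choice of the example, not something stated in the embodiment.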
In the above industry identification methods, after the industry corresponding to the query string is returned, the method may also comprise:
truncating the search results corresponding to the query string according to the industry corresponding to the query string, to obtain the recall results corresponding to the query string;
returning the recall results.
This scheme can be applied to the back end of various vertical-category searches to filter search results and obtain recall results. For example, it can be applied to the back end of the vertical-category search "Baidu Map" for recall and truncation. The application works as follows: after industry identification, a general-demand search performs match-based recall according to the query string itself, and industry-based recall can also be performed according to the identified industry. The recall results are truncated according to the industry identification result, which in turn affects the recall results. The identification result can also be used to sort the recall results.
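As a rough illustration of truncating recall results by industry, the sketch below keeps only the search results whose industry matches the identified industry and caps their number. Treating truncation as a filter plus a length limit is an assumption, as are the record layout, the `limit` parameter and the sample results.

```python
from typing import Dict, List


def truncate_recall(results: List[Dict[str, str]],
                    industry: str,
                    limit: int = 10) -> List[Dict[str, str]]:
    """Keep results in the identified industry, preserving their order."""
    kept = [r for r in results if r.get("industry") == industry]
    return kept[:limit]


# Hypothetical search results for "China Merchants Bank near KFC".
results = [
    {"name": "result A", "industry": "cuisines"},
    {"name": "result B", "industry": "bank"},
    {"name": "result C", "industry": "bank"},
]
recall = truncate_recall(results, industry="bank", limit=1)
```

Here only "result B" is returned: "result A" is dropped because its industry differs, and "result C" is cut off by the limit.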
In the above industry identification methods, after the industry corresponding to the query string is returned, the method may also comprise:
performing information recommendation according to the industry corresponding to the query string;
or comprise:
determining recommendation information according to the industry corresponding to the query string, and selecting a display component;
processing the recommendation information according to the selected display component, and returning the processing result.
These two schemes can be applied to the front end of various vertical-category searches. Taking the front end of the vertical-category search "Baidu Map" as an example, information recommendation and display based on the industry identification result are briefly described below. For example, suppose the user inputs the query string "China Merchants Bank" in the search box of "Baidu Map" and the identified industry is "bank". According to this industry identification result, the corresponding recommendation information is determined, the display component corresponding to the industry "bank" is selected, and the recommendation information is displayed according to this display component; for instance, the recommendation information is presented as a list that shows the bank's business hours, whether the location is an ATM or a branch, and similar information. As another example, suppose the user inputs the query string "KFC" in the search box of "Baidu Map" and the identified industry is "cuisines". According to this industry identification result, the corresponding recommendation information is determined, the display component corresponding to the industry "cuisines" is selected, and the recommendation information is displayed according to this display component; for instance, the group purchase information of each KFC outlet, whether online ordering is supported, whether a delivery fee is charged, and similar information are shown. As a further example, suppose the user inputs the query string "Four Seasons Hotel" in the search box of "Baidu Map" and the identified industry is "hotel". According to this industry identification result, the corresponding recommendation information is determined, the display component corresponding to the industry "hotel" is selected, and the recommendation information is displayed according to this display component; for instance, the address of each Four Seasons Hotel, check-in and reservation information, and similar information are shown.
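Selecting a display component by industry can be pictured as a simple lookup, as in the sketch below. The field names loosely follow the three examples above, but the table layout, the function name and the sample record are all hypothetical; the embodiment does not specify how display components are represented.

```python
from typing import Dict, List, Optional

# Hypothetical display components: the fields each industry's component shows.
DISPLAY_COMPONENTS: Dict[str, List[str]] = {
    "bank": ["business_hours", "atm_or_branch"],
    "cuisines": ["group_purchase", "online_ordering", "delivery_fee"],
    "hotel": ["address", "check_in", "reservation"],
}


def render_recommendation(industry: str,
                          info: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Process recommendation information with the component chosen for the industry."""
    fields = DISPLAY_COMPONENTS.get(industry, [])
    # Only the fields of the selected component are shown; missing ones are None.
    return {field: info.get(field) for field in fields}


# Made-up recommendation information for the query string "China Merchants Bank".
info = {"business_hours": "09:00-17:00", "atm_or_branch": "branch",
        "address": "not shown for banks"}
shown = render_recommendation("bank", info)
```

The "address" value is left out of `shown` because the component chosen for "bank" in this sketch does not list that field.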
Embodiment six
Fig. 4 is a structural schematic diagram of an industry identification device provided by Embodiment six of the present invention. The device of this embodiment is implemented based on the dictionaries established by the device for establishing an industry dictionary provided by the embodiment of the present invention. The device comprises: a query string acquisition module 410 and an industry identification module 420.
The query string acquisition module 410 is configured to obtain the query string input by the user. The industry identification module 420 is configured to exactly match the query string in the pre-established accurate dictionary, take the industry corresponding to the successfully matched accurate index word as the industry corresponding to the query string, and return the industry corresponding to the query string.
In the above scheme, the industry identification module 420 may also be configured to, after the query string is exactly matched in the pre-established accurate dictionary, if the matching fails, fuzzily match the query string in the pre-established fuzzy dictionary, take the industry corresponding to the successfully matched fuzzy index word as the industry corresponding to the query string, and return the industry corresponding to the query string.
Further, the industry identification module 420 may also be configured to, after the query string is fuzzily matched in the pre-established fuzzy dictionary, when it is detected that the number of successfully matched fuzzy index words is at least two, use the pre-established priority dictionary to determine the industry corresponding to the at least two fuzzy index words, take it as the industry corresponding to the query string, and return the industry corresponding to the query string.
In the above scheme, the device may also comprise: a recall result acquisition module and a recall result returning module.
The recall result acquisition module is configured to, after the industry corresponding to the query string is returned, truncate the search results corresponding to the query string according to the industry corresponding to the query string, to obtain the recall results corresponding to the query string. The recall result returning module is configured to return the recall results.
In the above scheme, the device may also comprise:
an information recommendation module, configured to, after the industry corresponding to the query string is returned, perform information recommendation according to the industry corresponding to the query string;
or comprise: a recommendation element determination module and a display processing module.
The recommendation element determination module is configured to, after the industry corresponding to the query string is returned, determine recommendation information according to the industry corresponding to the query string, and select a display component. The display processing module is configured to process the recommendation information according to the selected display component and return the processing result.
The industry identification device provided by the embodiment of the present invention can perform the industry identification method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the performed method.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical scheme of the present invention and not to limit it; the embodiments described are preferred embodiments and are likewise not limiting. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (20)

1. A method for establishing an industry dictionary, characterized by comprising:
obtaining a user search behavior log;
extracting each search word and its corresponding clicked search results from the user search behavior log;
determining the industry to which the search word belongs according to the clicked search results, taking the search word as an accurate index word, and establishing and saving entry pairs of the accurate index word and the corresponding industry to form an accurate dictionary.
2. The method according to claim 1, characterized in that determining the industry to which the search word belongs according to the clicked search results, taking the search word as an accurate index word, and establishing and saving entry pairs of the accurate index word and the corresponding industry to form an accurate dictionary comprises:
counting the search rate of each search word and the click probability of the corresponding clicked search results;
for each search word whose search rate is greater than a first threshold, when the click probability of the clicked search results corresponding to this search word is greater than a second threshold, taking this search word as an accurate index word, and determining the industry to which the search results whose click probability is greater than the second threshold belong as the industry corresponding to the accurate index word; and establishing and saving entry pairs of the accurate index word and the corresponding industry to form the accurate dictionary.
3. The method according to claim 2, characterized in that, after the search rate of each search word is counted, the method further comprises:
for each search word whose search rate is less than or equal to the first threshold, splitting this search word by using the accurate dictionary to obtain the sub-search words corresponding to this search word and the industries corresponding to the sub-search words; and taking the sub-search words corresponding to this search word as fuzzy index words, and establishing and saving entry pairs of the fuzzy index words and the industries corresponding to the sub-search words to form a fuzzy dictionary.
4. The method according to claim 3, characterized in that, after the fuzzy dictionary is formed, the method further comprises:
when the number of sub-search words corresponding to this search word is at least two, determining the priority of the at least two sub-search words by using a priority determination strategy, and establishing and saving entry pairs of the at least two sub-search words and the industry corresponding to the sub-search word with the highest priority, to form a priority dictionary.
5. The method according to claim 4, characterized in that determining the priority of the at least two sub-search words by using a priority determination strategy comprises at least one of the following:
determining, according to the positions of the at least two sub-search words in the corresponding search word, that a sub-search word in a later position has a higher priority than a sub-search word in an earlier position;
determining, according to the word length of the at least two sub-search words, that a longer sub-search word has a higher priority than a shorter sub-search word;
determining, according to the click probability of the clicked search results of the search word corresponding to the at least two sub-search words, that the sub-search word corresponding to the clicked search results with a higher click probability has a higher priority than the sub-search word corresponding to the clicked search results with a lower click probability.
6. A device for establishing an industry dictionary, characterized by comprising:
a log acquisition module, configured to obtain a user search behavior log;
an extraction module, configured to extract each search word and its corresponding clicked search results from the user search behavior log;
an accurate dictionary forming module, configured to determine the industry to which the search word belongs according to the clicked search results, take the search word as an accurate index word, and establish and save entry pairs of the accurate index word and the corresponding industry to form an accurate dictionary.
7. The device according to claim 6, characterized in that the accurate dictionary forming module is specifically configured to:
count the search rate of each search word and the click probability of the corresponding clicked search results;
for each search word whose search rate is greater than a first threshold, when the click probability of the clicked search results corresponding to this search word is greater than a second threshold, take this search word as an accurate index word, and determine the industry to which the search results whose click probability is greater than the second threshold belong as the industry corresponding to the accurate index word; and establish and save entry pairs of the accurate index word and the corresponding industry to form the accurate dictionary.
8. The device according to claim 7, characterized in that the device further comprises:
a fuzzy dictionary forming module, configured to, after the search rate of each search word is counted, for each search word whose search rate is less than or equal to the first threshold, split this search word by using the accurate dictionary to obtain the sub-search words corresponding to this search word and the industries corresponding to the sub-search words; and take the sub-search words corresponding to this search word as fuzzy index words, and establish and save entry pairs of the fuzzy index words and the industries corresponding to the sub-search words to form a fuzzy dictionary.
9. The device according to claim 8, characterized in that the device further comprises:
a priority determination module, configured to, after the fuzzy dictionary is formed, when the number of sub-search words corresponding to this search word is at least two, determine the priority of the at least two sub-search words by using a priority determination strategy;
a priority dictionary forming module, configured to establish and save entry pairs of the at least two sub-search words and the industry corresponding to the sub-search word with the highest priority, to form a priority dictionary.
10. The device according to claim 9, characterized in that the priority determination module comprises at least one of the following submodules:
a first priority determination submodule, configured to determine, according to the positions of the at least two sub-search words in the corresponding search word, that a sub-search word in a later position has a higher priority than a sub-search word in an earlier position;
a second priority determination submodule, configured to determine, according to the word length of the at least two sub-search words, that a longer sub-search word has a higher priority than a shorter sub-search word;
a third priority determination submodule, configured to determine, according to the click probability of the clicked search results of the search word corresponding to the at least two sub-search words, that the sub-search word corresponding to the clicked search results with a higher click probability has a higher priority than the sub-search word corresponding to the clicked search results with a lower click probability.
11. An industry identification method, implemented based on the dictionary established by the method for establishing an industry dictionary according to any one of claims 1-5, characterized by comprising:
obtaining the query string input by the user;
exactly matching the query string in the pre-established accurate dictionary, taking the industry corresponding to the successfully matched accurate index word as the industry corresponding to the query string, and returning the industry corresponding to the query string.
12. The method according to claim 11, characterized in that, after the query string is exactly matched in the pre-established accurate dictionary, the method further comprises:
if the matching fails, fuzzily matching the query string in the pre-established fuzzy dictionary, taking the industry corresponding to the successfully matched fuzzy index word as the industry corresponding to the query string, and returning the industry corresponding to the query string.
13. The method according to claim 12, characterized in that, after the query string is fuzzily matched in the pre-established fuzzy dictionary, the method further comprises:
when it is detected that the number of successfully matched fuzzy index words is at least two, using the pre-established priority dictionary to determine the industry corresponding to the at least two fuzzy index words, taking it as the industry corresponding to the query string, and returning the industry corresponding to the query string.
14. The method according to any one of claims 11-13, characterized in that, after the industry corresponding to the query string is returned, the method further comprises:
truncating the search results corresponding to the query string according to the industry corresponding to the query string, to obtain the recall results corresponding to the query string;
returning the recall results.
15. The method according to any one of claims 11-13, characterized in that, after the industry corresponding to the query string is returned, the method further comprises:
performing information recommendation according to the industry corresponding to the query string; or
determining recommendation information according to the industry corresponding to the query string, and selecting a display component; and processing the recommendation information according to the selected display component, and returning the processing result.
16. An industry identification device, implemented based on the dictionary established by the device for establishing an industry dictionary according to any one of claims 6-10, characterized by comprising:
a query string acquisition module, configured to obtain the query string input by the user;
an industry identification module, configured to exactly match the query string in the pre-established accurate dictionary, take the industry corresponding to the successfully matched accurate index word as the industry corresponding to the query string, and return the industry corresponding to the query string.
17. The device according to claim 16, characterized in that the industry identification module is further configured to, after the query string is exactly matched in the pre-established accurate dictionary, if the matching fails, fuzzily match the query string in the pre-established fuzzy dictionary, take the industry corresponding to the successfully matched fuzzy index word as the industry corresponding to the query string, and return the industry corresponding to the query string.
18. The device according to claim 17, characterized in that the industry identification module is further configured to, after fuzzily matching the query string in the pre-established fuzzy dictionary, when it is detected that at least two fuzzy index words are successfully matched, determine, using a pre-established priority dictionary, the industry corresponding to the at least two fuzzy index words as the industry corresponding to the query string, and return the industry corresponding to the query string.
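The matching cascade described in claims 16-18 (exact match first, fuzzy match on failure, priority dictionary when several fuzzy index words hit) can be sketched as follows. This is only an illustration under stated assumptions: the dictionary contents are invented, fuzzy matching is approximated by substring containment, and the priority dictionary is assumed to map each industry to a number where a larger value wins. None of these details is confirmed by the claims.

```python
from typing import Optional

# Hypothetical dictionaries; in the patent they are built in advance
# by the dictionary-establishing device.
ACCURATE_DICT = {"iphone 6 price": "electronics", "flu symptoms": "medical"}
FUZZY_DICT = {"hotel": "travel", "hospital": "medical"}
PRIORITY_DICT = {"medical": 2, "travel": 1}  # assumed: larger = higher priority


def identify_industry(query: str) -> Optional[str]:
    """Return the industry for a query string, or None if nothing matches."""
    query = query.strip().lower()

    # Step 1: exact match against the accurate dictionary.
    if query in ACCURATE_DICT:
        return ACCURATE_DICT[query]

    # Step 2: fuzzy match, assumed here to mean "index word occurs in query".
    hits = [word for word in FUZZY_DICT if word in query]
    if not hits:
        return None
    if len(hits) == 1:
        return FUZZY_DICT[hits[0]]

    # Step 3: several fuzzy index words matched; resolve by priority.
    # Sorting first makes the choice deterministic when priorities tie.
    industries = sorted({FUZZY_DICT[word] for word in hits})
    return max(industries, key=lambda ind: PRIORITY_DICT.get(ind, 0))
```

For example, `identify_industry("hotel near hospital")` matches two fuzzy index words and falls through to the priority step, while `identify_industry("flu symptoms")` is resolved by the exact match alone.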
19. The device according to any one of claims 16-18, characterized in that the device further comprises:
A recall result acquisition module, configured to, after the industry corresponding to the query string is returned, truncate, according to the industry corresponding to the query string, the search results corresponding to the query string, to obtain a recall result corresponding to the query string;
A recall result returning module, configured to return the recall result.
20. The device according to any one of claims 16-18, characterized in that the device further comprises:
An information recommendation module, configured to, after the industry corresponding to the query string is returned, perform information recommendation according to the industry corresponding to the query string;
Or:
A recommendation element determination module, configured to, after the industry corresponding to the query string is returned, determine recommendation information and select a display component according to the industry corresponding to the query string;
A display processing module, configured to process the recommendation information according to the selected display component and return the processed result.
CN201510613993.4A 2015-09-23 2015-09-23 Method and device for establishing industry dictionary and industry identification method and device Active CN105159884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510613993.4A CN105159884B (en) 2015-09-23 2015-09-23 Method and device for establishing industry dictionary and industry identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510613993.4A CN105159884B (en) 2015-09-23 2015-09-23 Method and device for establishing industry dictionary and industry identification method and device

Publications (2)

Publication Number Publication Date
CN105159884A (en) 2015-12-16
CN105159884B (en) 2018-06-29

Family

ID=54800743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510613993.4A Active CN105159884B (en) 2015-09-23 2015-09-23 Method and device for establishing industry dictionary and industry identification method and device

Country Status (1)

Country Link
CN (1) CN105159884B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423362A (en) * 2017-06-20 2017-12-01 阿里巴巴集团控股有限公司 Industry determines method, Method of Get Remote Object and device, client, server
CN107612707A (en) * 2017-08-04 2018-01-19 上海斐讯数据通信技术有限公司 The preprocess method and system of the homologous sample data classification storage in Industry-oriented field
CN107832468A (en) * 2017-11-29 2018-03-23 百度在线网络技术(北京)有限公司 Demand recognition methods and device
CN108536800A (en) * 2018-04-03 2018-09-14 有米科技股份有限公司 File classification method, system, computer equipment and storage medium
CN110688558A (en) * 2019-09-10 2020-01-14 中国平安财产保险股份有限公司 Method and device for searching web page, electronic equipment and storage medium
WO2020147332A1 (en) * 2019-01-16 2020-07-23 苏宁云计算有限公司 Method and apparatus for expanding commodity search and recall
CN113268978A (en) * 2020-02-17 2021-08-17 北京搜狗科技发展有限公司 Information generation method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101140587A (en) * 2007-10-15 2008-03-12 深圳市迅雷网络技术有限公司 Searching method and apparatus
CN102169495A (en) * 2011-04-11 2011-08-31 趣拿开曼群岛有限公司 Industry dictionary generating method and device
CN103136339A (en) * 2013-02-01 2013-06-05 百度在线网络技术(北京)有限公司 Searching method, client-side and network server-side based on service information
CN103226601A (en) * 2013-04-25 2013-07-31 百度在线网络技术(北京)有限公司 Method and device for image search
US20150254230A1 (en) * 2012-09-28 2015-09-10 Alkis Papadopoullos Method and system for monitoring social media and analyzing text to automate classification of user posts using a facet based relevance assessment model

Also Published As

Publication number Publication date
CN105159884B (en) 2018-06-29

Similar Documents

Publication Publication Date Title
CN105159884A (en) Method and device for establishing industry dictionary and industry identification method and device
CN108228825B (en) Station address data cleaning method based on word segmentation
CN103368992A (en) Message push method and device
CN103514299A (en) Information searching method and device
CN105550171A (en) Error correction method and system for query information of vertical search engine
CN106022349B (en) Method and system for device type determination
CN111639253B (en) Data weight judging method, device, equipment and storage medium
CN101984422A (en) Fault-tolerant text query method and equipment
CN104468107A (en) Method and device for verification data processing
CN105608113A (en) Method and apparatus for judging POI data in text
KR102601545B1 (en) Geographic position point ranking method, ranking model training method and corresponding device
CN116881430B (en) Industrial chain identification method and device, electronic equipment and readable storage medium
CN105095464A (en) Method and device for detecting retrieval system
CN113051183A (en) Test data recommendation method and system, electronic device and storage medium
CN110427574B (en) Route similarity determination method, device, equipment and medium
US8463799B2 (en) System and method for consolidating search engine results
CN112183052B (en) Document repetition degree detection method, device, equipment and medium
CN105574091A (en) Information push method and device
CN111259058B (en) Data mining method, data mining device and electronic equipment
CN103970732A (en) Mining method and device of new word translation
CN111984876A (en) Interest point processing method, device, equipment and computer readable storage medium
CN116743474A (en) Decision tree generation method and device, electronic equipment and storage medium
CN115098362A (en) Page testing method and device, electronic equipment and storage medium
CN112861532B (en) Address standardization processing method, device, equipment and online searching system
CN107577667A (en) Entity word processing method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant