WO2017020454A1 - Retrieval method and apparatus - Google Patents

Retrieval method and apparatus

Info

Publication number
WO2017020454A1
Authority
WO
WIPO (PCT)
Prior art keywords
aging
search
search formula
keyword
preset
Prior art date
Application number
PCT/CN2015/096012
Other languages
English (en)
French (fr)
Inventor
邹红建
方高林
程军
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Priority to US15/534,373 (US10558694B2)
Publication of WO2017020454A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3349 Reuse of stored results of previous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • The present application relates to the field of computers, specifically to the field of search, and in particular to a retrieval method and apparatus.
  • When a user searches with a search formula, the user may expect the publishing time of the returned search results to be close to the current time.
  • Such a search formula is called an aging (time-sensitive) search formula.
  • Aging search formulas are recognized mainly in two ways: user behavior recognition and language model recognition.
  • In user behavior recognition, a search formula in the search log whose query frequency at a certain point in time is greater than a preset number threshold is recognized as an aging search formula.
  • In language model recognition, the score of a search formula is calculated on different language models, and a search formula whose score difference is greater than a preset score threshold is recognized as an aging search formula. With either approach, improving the recognition accuracy requires raising the corresponding threshold. Raising the threshold, however, lowers the recognition recall rate, which in turn degrades the recognition of aging search formulas.
  • The present application provides a retrieval method and apparatus for solving the technical problems described in the background above.
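  • The present application provides a retrieval method, which includes: finding a first aging search formula set from a search log, wherein the search log is used to record the search formulas used by users when searching, and an aging search formula is a search formula for which the difference between the publishing time of the search results returned for it and the current time is less than a preset time difference threshold.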
  • The present application provides a retrieval apparatus, the apparatus comprising: a lookup unit configured to find a first aging search formula set from a search log, wherein the search log is used to record the search formulas used by users when searching, and an aging search formula is a search formula for which the difference between the publishing time of the search results returned for it and the current time is less than a preset time difference threshold;
  • a selecting unit configured to select, based on the first aging search formula set, a search formula that satisfies one of the following selection conditions as a candidate aging search formula: it is located in the search log and is semantically associated with a first aging search formula in the first aging search formula set; or it is located in the search log and includes a preset keyword combination, wherein a preset keyword is a word whose number of occurrences in the first aging search formula set is greater than a preset threshold, and the preset keyword combination is generated by combining preset keywords;
  • a processing unit configured to perform a processing operation on the candidate aging search formulas to obtain a second aging search formula, the processing operation including one of the following: removing the candidate aging search formulas whose semantic similarity to the first aging search formula is less than a preset threshold; removing, from the words included in a candidate aging search formula, the words whose semantic relevance to that candidate aging search formula is less than a preset relevance threshold;
  • and a search unit configured to perform a search by using the second aging search formula when the search formula input by the user matches the second aging search formula.
  • In the retrieval method and apparatus provided by the present application, a first aging search formula set is found from a search log; candidate aging search formulas are selected from the search formulas in the search log, namely those that are semantically associated with a first aging search formula in the first aging search formula set and those that include a preset keyword combination; and the candidate aging search formulas are processed to obtain a second aging search formula.
  • In this way, a second aging search formula is obtained from the search formulas of the search log based on the first aging search formulas that have already been identified, so that the recognition recall rate is increased while the recognition accuracy is maintained, which in turn improves the recognition of aging search formulas.
  • FIG. 1 shows a flow chart of one embodiment of a retrieval method in accordance with the present application.
  • FIG. 2 shows a schematic diagram of finding a candidate aging search formula based on semantic keywords.
  • FIG. 3 shows a schematic diagram of finding a candidate aging search formula based on a core expression dictionary.
  • FIG. 4 shows a flow chart of another embodiment of a retrieval method in accordance with the present application.
  • FIG. 5 is a block diagram showing the structure of an embodiment of a retrieval device according to the present application.
  • FIG. 6 is a schematic structural diagram of a computer system according to an embodiment of the present invention.
  • The first aging search formula is a search formula that has already been identified as an aging search formula.
  • The second aging search formula is a new aging search formula obtained from the search log based on the first aging search formula, that is, a time-sensitive search formula recalled from the search log based on the first aging search formula.
  • FIG. 1 illustrates a flow 100 of one embodiment of a retrieval method in accordance with the present application.
  • the method includes the following steps:
  • Step 101: Find a first aging search formula set from the search log.
  • The search log may be used to record the search formulas used by users when searching.
  • The aging search formula may be a search formula for which the difference between the publishing time of the search results returned when searching and the current time is less than a preset threshold. For example, when a user searches with a search formula and expects to obtain pictures of the latest news event related to that search formula, that is, expects the publishing time of the search results to be close to the current time, the search formula input by the user may be called an aging search formula, one with a time-sensitive requirement. In the present embodiment, some words that characterize the timeliness of a search formula, such as "event", "occurrence", and "earnings", may be set in advance.
  • When the search formula input by the user includes such a time-sensitive word, the search formula may be recognized as a first aging search formula.
  • The number of queries of a search formula input by users within a certain period of time may also be counted.
  • When the number of queries is greater than a preset number threshold, the search formula may be identified as a first aging search formula.
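  • A minimal Python sketch of the two identification routes in step 101, assuming the search log is a plain list of query strings collected in one time window. The names `TIMELINESS_WORDS`, `QUERY_COUNT_THRESHOLD`, and `find_first_aging_formulas` are hypothetical, and whitespace splitting stands in for a real word segmenter.

```python
from collections import Counter

# Hypothetical preset timeliness words (taken from the examples in the text).
TIMELINESS_WORDS = {"event", "occurrence", "earnings"}
# Hypothetical preset number threshold for queries within the time window.
QUERY_COUNT_THRESHOLD = 3


def find_first_aging_formulas(search_log):
    """Return the search formulas recognized as first aging search formulas."""
    counts = Counter(search_log)
    first_set = set()
    for formula, count in counts.items():
        # Route 1: the formula contains a preset timeliness word.
        has_timeliness_word = bool(TIMELINESS_WORDS & set(formula.lower().split()))
        # Route 2: the formula was queried more often than the threshold.
        is_frequent = count > QUERY_COUNT_THRESHOLD
        if has_timeliness_word or is_frequent:
            first_set.add(formula)
    return first_set


log = ["nepal earthquake event"] + ["wang jianlin richest man"] * 4 + ["yao ming height"]
first_aging = find_first_aging_formulas(log)
```

  • Here the first query qualifies through a timeliness word and the second through its query count; the thresholds and word list would have to be tuned on real log data.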
  • Step 102: Based on the first aging search formula set, select a search formula that satisfies one of the following selection conditions as a candidate aging search formula: it is located in the search log and is semantically associated with a first aging search formula in the first aging search formula set; or it is located in the search log and contains a preset keyword combination.
  • A preset keyword is a word whose number of occurrences in the first aging search formula set is greater than a preset threshold.
  • The preset keyword combination is generated by combining preset keywords.
  • Based on the first aging search formula set that has been found, search formulas that have an association relationship with a first aging search formula in the set may be further looked up from the search log and used as candidate aging search formulas.
  • The association relationship may be that the search formula in the search log is semantically associated with the first aging search formula, or that the search formula in the search log includes a preset keyword combination composed of words whose number of occurrences in the first aging search formula set is greater than a preset threshold.
  • Selecting a search formula that is semantically associated with a first aging search formula may include: extracting a first semantic keyword from the first aging search formula and a second semantic keyword from the search formula in the search log, wherein the first semantic keyword is a word whose semantic relevance to the first aging search formula is greater than a first preset semantic relevance threshold, and the second semantic keyword is a word whose semantic relevance to the search formula in the search log is greater than a second preset semantic relevance threshold; determining whether the first semantic keyword matches the second semantic keyword; and if so, selecting the search formula in the search log as a candidate aging search formula.
  • A semantic keyword (also referred to as a semantic signature) is a word whose relevance to the semantics of the corresponding search formula is greater than a preset relevance threshold; that is, a semantic keyword reflects the true semantics of the search formula to which it belongs.
  • For example, the core semantics of the search formula "How high Yao Ming is" is how tall Yao Ming is.
  • Its semantic keywords are "Yao Ming" and "Height".
  • A first semantic keyword that reflects its true semantics may be extracted from the first aging search formula, and a second semantic keyword that reflects its true semantics may be extracted from the search formula in the search log.
  • The extracted first semantic keyword may be added to a preset vocabulary for storing first semantic keywords, which may be referred to as a semantic signature dictionary. The second semantic keyword may then be matched against the first semantic keywords in the semantic signature dictionary. When the second semantic keyword matches a first semantic keyword, the search formula in the search log may be determined to be a candidate aging search formula.
  • Extraction of the semantic keywords may be implemented with a semantic similarity algorithm.
  • The semantic similarity algorithm may use the Levenshtein distance algorithm and the Jaccard coefficient algorithm.
  • Using such a semantic similarity algorithm, the first semantic keyword and the second semantic keyword may be extracted from the first aging search formula and from the search formula in the search log, respectively.
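  • The text names the two measures without defining them. The following Python sketch gives their standard textbook definitions only; how the application applies them to keyword extraction is not specified here, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


distance = levenshtein("kitten", "sitting")
overlap = jaccard({"yao", "ming", "height"}, {"yao", "ming", "tall"})
```

  • A smaller Levenshtein distance and a larger Jaccard coefficient both indicate that two strings or word sets are literally closer to each other.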
  • FIG. 2 shows a schematic diagram of finding a candidate aging search formula based on semantic keywords.
  • The first semantic keyword of each first aging search formula in the first aging search formula set may be extracted in advance, and the semantic signatures of the first aging search formulas are then aggregated to generate a semantic signature dictionary.
  • The second semantic keyword of a search formula in the search log can then be extracted. By checking whether the second semantic keyword of the search formula in the search log is in the semantic signature dictionary, it can be determined whether that search formula is a candidate aging search formula.
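  • The dictionary lookup described for FIG. 2 can be sketched as follows. This is an illustration under stated assumptions: the per-word relevance scores in `relevance` are assumed to be precomputed by some other component, a single threshold is used for both the first and second semantic keywords, and all function names are hypothetical.

```python
def semantic_keywords(formula, relevance, threshold):
    """Words of a formula whose (assumed, precomputed) relevance exceeds the threshold."""
    return frozenset(w for w in formula.split() if relevance.get(w, 0.0) > threshold)


def build_signature_dictionary(first_aging_set, relevance, threshold):
    """Aggregate the semantic signatures of all first aging search formulas."""
    signatures = (semantic_keywords(f, relevance, threshold) for f in first_aging_set)
    return {sig for sig in signatures if sig}  # skip empty signatures


def select_candidates(search_log, signature_dict, relevance, threshold):
    """Keep search-log formulas whose signature is found in the dictionary."""
    return [f for f in search_log
            if semantic_keywords(f, relevance, threshold) in signature_dict]


relevance = {"yao": 0.9, "ming": 0.9, "height": 0.8, "wife": 0.8,
             "how": 0.1, "is": 0.05}
signature_dict = build_signature_dictionary(["yao ming height"], relevance, 0.5)
candidates = select_candidates(
    ["how is yao ming height", "yao ming wife"], signature_dict, relevance, 0.5)
```

  • Storing each signature as a `frozenset` makes the match independent of word order, which is one possible reading of "matches"; a stricter or fuzzier match could be substituted.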
  • The preset keyword combination may be generated as follows: finding co-occurrence keywords from the first aging search formula set and generating a co-occurrence keyword combination, where co-occurrence keywords are words that co-occur in the first aging search formula set and whose number of occurrences is greater than a preset threshold; obtaining, based on the relevance parameter of each co-occurrence keyword in the combination, the relevance parameter corresponding to the co-occurrence keyword combination, where the relevance parameter of a co-occurrence keyword indicates the semantic relevance between the co-occurrence keyword and the first aging search formula to which it belongs; determining whether the relevance parameter corresponding to the co-occurrence keyword combination is greater than a preset relevance threshold; and if so, using the co-occurrence keyword combination as a preset keyword combination.
  • The co-occurrence keywords can be found from the first aging search formula set in the following manner. First, the number of times words co-occur in the first aging search formulas (also called the number of co-occurrences) is counted within a certain period of time (for example, 1 day). Then, the words whose number of co-occurrences is greater than a preset number threshold (also called co-occurrence keywords) are found. After the co-occurrence keywords are found, the semantic relevance between each co-occurrence keyword and the first aging search formula (also referred to as the importance parameter) may be further calculated, and the importance parameters are normalized and then accumulated to obtain the importance value of the co-occurrence keyword combination. Finally, a co-occurrence keyword combination whose importance value is greater than a certain preset threshold is used as a preset keyword combination.
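  • A rough Python sketch of the co-occurrence route, under several assumptions that the text does not fix: combinations are restricted to word pairs, the importance parameters are supplied as a precomputed dictionary, and normalization divides by the largest importance value. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(first_aging_set, count_threshold):
    """Word pairs that appear together in more than count_threshold formulas."""
    pair_counts = Counter()
    for formula in first_aging_set:
        words = sorted(set(formula.split()))
        pair_counts.update(combinations(words, 2))
    return [pair for pair, n in pair_counts.items() if n > count_threshold]


def preset_combinations(pairs, importance, importance_threshold):
    """Keep pairs whose accumulated normalized importance exceeds the threshold."""
    top = max(importance.values(), default=1.0) or 1.0  # assumed normalization
    kept = []
    for pair in pairs:
        score = sum(importance.get(w, 0.0) / top for w in pair)
        if score > importance_threshold:
            kept.append(pair)
    return kept


first_aging_set = ["wang jianlin richest man",
                   "wang jianlin asia richest",
                   "wang jianlin news"]
pairs = cooccurrence_pairs(first_aging_set, count_threshold=2)
importance = {"wang": 0.8, "jianlin": 1.0, "richest": 0.6}
presets = preset_combinations(pairs, importance, importance_threshold=1.5)
```

  • Only the pair that co-occurs in all three formulas passes the count threshold, and it is kept because its accumulated importance is above the importance threshold.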
  • The preset keyword combination may also be generated as follows: finding an event keyword from the first aging search formula set, where an event keyword is a word whose number of occurrences in the first aging search formula set is greater than a preset threshold and for which both the number of first aging search formulas containing it and the number of search formulas in the search log containing it are greater than a preset number threshold; finding combination keywords from the first aging search formulas containing the event keyword, where a combination keyword is a word whose number of occurrences in the first aging search formulas containing the event keyword is greater than a preset threshold; and combining the event keyword with the combination keywords to generate preset keyword combinations.
  • The event keyword may be found from the first aging search formula set in the following manner. First, the number of times each word appears in the first aging search formulas is counted, and at the same time the number of first aging search formulas and search-log search formulas in which the word appears (also called the number of scatters) is counted. Then, the words whose number of occurrences and number of scatters are both greater than the preset number thresholds are selected as event keywords. After an event keyword is acquired, combination keywords may be further obtained from the first aging search formulas that include the event keyword.
  • The number of occurrences of each word in the first aging search formulas that include the event keyword may be counted, and the words whose number of occurrences is greater than a certain preset number threshold may be used as combination keywords. Finally, the combination keywords are combined with the event keyword to generate preset keyword combinations.
  • The first aging search formulas may be aggregated by event keyword to generate the first aging search formula set corresponding to a news event.
  • For example, for the event keyword "Wang Jianlin", combination keywords such as "the richest man", "Asia", "Li Ka-shing", and "recapture" can be further found.
  • The event keyword "Wang Jianlin" is then combined with these combination keywords to generate the preset keyword combinations, namely "Wang Jianlin" & "the richest man", "Wang Jianlin" & "Asia", "Wang Jianlin" & "Li Ka-shing", and "Wang Jianlin" & "recapture".
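  • The event keyword route can be sketched in Python as below. The sketch assumes that every search formula has already been segmented into a tuple of words, and it applies one scatter threshold to both the first aging search formulas and the search log; the function names and toy data are hypothetical.

```python
from collections import Counter


def event_keywords(first_aging_set, search_log, occ_threshold, scatter_threshold):
    """Words that occur often and are spread over enough distinct formulas."""
    occurrences = Counter(w for f in first_aging_set for w in f)
    scatter_first = Counter(w for f in set(first_aging_set) for w in set(f))
    scatter_log = Counter(w for f in set(search_log) for w in set(f))
    return {w for w, n in occurrences.items()
            if n > occ_threshold
            and scatter_first[w] > scatter_threshold
            and scatter_log[w] > scatter_threshold}


def keyword_combinations(first_aging_set, event_word, occ_threshold):
    """Pair the event keyword with frequent words from the formulas containing it."""
    containing = [f for f in first_aging_set if event_word in f]
    counts = Counter(w for f in containing for w in f if w != event_word)
    return {(event_word, w) for w, n in counts.items() if n > occ_threshold}


first_aging_set = [
    ("Wang Jianlin", "the richest man"),
    ("Wang Jianlin", "Asia", "the richest man"),
    ("Wang Jianlin", "Li Ka-shing"),
]
search_log = first_aging_set + [("Wang Jianlin", "recapture"), ("Yao Ming", "height")]
events = event_keywords(first_aging_set, search_log,
                        occ_threshold=2, scatter_threshold=2)
combos = keyword_combinations(first_aging_set, "Wang Jianlin", occ_threshold=1)
```

  • With these toy thresholds only "Wang Jianlin" qualifies as an event keyword, and only "the richest man" appears often enough to be combined with it.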
  • The preset keyword combination may also be generated as follows: pre-processing the first aging search formulas in the first aging search formula set to obtain the entity words in the first aging search formulas, where the pre-processing includes at least one of the following: a word segmentation operation, a part-of-speech tagging operation, and a named entity recognition operation; extracting, from the entity words, keywords associated with the template words in a preset template; and combining the keywords associated with the template words in the preset template to generate a preset keyword combination.
  • A template for expressing time-sensitive events may be set in advance, for example, a template including template words such as "** occurrence**", "** earthquake", and "** event".
  • The preset keyword combination may be generated in the following manner. First, pre-processing operations such as word segmentation, part-of-speech tagging, and named entity recognition are performed on the first aging search formula to obtain entity words, and the relevance between each entity word and the first aging search formula is calculated. Then, the entity words in the first aging search formula that are not in the preset template and whose relevance is lower than a preset relevance threshold are removed.
  • Finally, the first aging search formula from which those entity words have been removed is matched against the preset template, the keywords associated with the template words in the preset template are extracted, and these keywords are combined to generate a preset keyword combination.
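  • A simplified Python sketch of the template route. It assumes the entity words and their relevance scores are already available (segmentation, tagging, and named entity recognition are out of scope here), treats the template as a set of template words, and pairs each surviving entity word with each template word found in the formula. The names and the "Nepal" example are illustrative assumptions, not taken from the application.

```python
# Hypothetical preset template words, following the examples in the text.
TEMPLATE_WORDS = {"occurrence", "earthquake", "event"}


def template_combinations(entity_words, relevance, relevance_threshold):
    """Combine template words with the entity words that survive the relevance filter."""
    template_hits = [w for w in entity_words if w in TEMPLATE_WORDS]
    if not template_hits:
        return set()  # the formula does not match any template
    # Remove words that are not template words and whose relevance is too low.
    kept = [w for w in entity_words
            if w not in TEMPLATE_WORDS
            and relevance.get(w, 0.0) >= relevance_threshold]
    return {(k, t) for k in kept for t in template_hits}


combos = template_combinations(
    ["Nepal", "earthquake", "today"],
    relevance={"Nepal": 0.9, "today": 0.2},
    relevance_threshold=0.5,
)
```

  • The low-relevance word "today" is dropped before matching, so the only combination generated is the entity word paired with the template word.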
  • Corresponding keywords may be extracted from the first aging search formula set by the above approaches (co-occurrence keywords, event keywords, and preset templates) to generate the preset keyword combinations corresponding to the first aging search formula set; such a keyword combination is also called a core expression keyword combination. After a core expression keyword combination is generated, it may be stored in a preset core expression dictionary.
  • Keywords may likewise be extracted from a search formula in the search log by the same approaches (co-occurrence keywords, event keywords, and preset templates) to generate the core expression keyword combination corresponding to that search formula. It can then be determined whether the core expression keyword combination corresponding to the search formula in the search log is in the core expression dictionary; if so, the search formula in the search log is selected as a candidate aging search formula.
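  • The core expression dictionary lookup might look like the following Python sketch. As a simplifying stand-in for the three extraction approaches, it treats every unordered pair of words in a segmented search-log formula as that formula's keyword combinations; the dictionary contents and function names are hypothetical.

```python
from itertools import combinations


def core_expression_combinations(words):
    """All unordered word pairs of one segmented search formula (simplified stand-in)."""
    return {frozenset(pair) for pair in combinations(set(words), 2)}


def select_by_core_expression(search_log, core_expression_dictionary):
    """Keep search formulas with at least one combination found in the dictionary."""
    return [formula for formula in search_log
            if core_expression_combinations(formula) & core_expression_dictionary]


core_expression_dictionary = {
    frozenset({"Wang Jianlin", "Asia"}),
    frozenset({"Wang Jianlin", "Li Ka-shing"}),
}
search_log = [("Wang Jianlin", "Asia", "news"), ("Yao Ming", "height")]
selected = select_by_core_expression(search_log, core_expression_dictionary)
```

  • Keeping the dictionary as a set of `frozenset` objects gives constant-time membership tests, which matters when the search log is large.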
  • Step 103: Perform a processing operation on the candidate aging search formulas to obtain a second aging search formula.
  • After the candidate aging search formulas are selected, they may be further processed to obtain the second aging search formula.
  • The processing operation may be to remove, from the candidate aging search formulas, those whose semantic similarity to the first aging search formula is less than a preset threshold, and then to use the remaining candidate aging search formulas as second aging search formulas.
  • The processing operation may also be to remove, from the words included in a candidate aging search formula, the words whose semantic relevance to that candidate aging search formula is less than a preset relevance threshold, and then to use the candidate aging search formula from which those words have been removed as the second aging search formula.
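  • The two processing operations of step 103 can be sketched as follows. The Jaccard coefficient over words is used here only as an assumed stand-in for the semantic similarity measure, the relevance scores are assumed to be given, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Word-level Jaccard coefficient, used here as an assumed similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def filter_by_similarity(candidates, first_aging_set, similarity_threshold):
    """Drop candidates that are not similar enough to any first aging search formula."""
    return [c for c in candidates
            if max((jaccard(c, f) for f in first_aging_set), default=0.0)
            >= similarity_threshold]


def strip_low_relevance_words(candidate, relevance, relevance_threshold):
    """Drop the words of a candidate whose relevance is below the threshold."""
    return tuple(w for w in candidate if relevance.get(w, 0.0) >= relevance_threshold)


first_aging_set = [("Wang Jianlin", "the richest man")]
candidates = [("Wang Jianlin", "the richest man", "Asia"), ("Yao Ming", "height")]
kept = filter_by_similarity(candidates, first_aging_set, similarity_threshold=0.5)
stripped = strip_low_relevance_words(
    ("Wang Jianlin", "the richest man", "what"),
    {"Wang Jianlin": 0.9, "the richest man": 0.8, "what": 0.1},
    relevance_threshold=0.5,
)
```

  • Either operation alone yields second aging search formulas according to the text; the sketch keeps them as separate functions so that one or the other can be applied.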
  • Step 104: When the search formula input by the user matches the second aging search formula, perform the search by using the second aging search formula.
  • After the second aging search formula is obtained, when a user performs a search, it may be determined whether the search formula input by the user includes the second aging search formula; if it does, the second aging search formula may be used for the search so as to obtain search results whose publishing time is closer to the current time. For example, if the user wants news pictures related to a recent news event and the search formula input by the user includes a second aging search formula composed of the keywords that make up that news event, the search may be performed with the second aging search formula so as to return to the user news pictures about the news event that are closer to the current time.
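  • A minimal Python sketch of the matching in step 104, assuming that "includes" means every word of a second aging search formula appears in the segmented user input. The `run_query` callable is a hypothetical placeholder for the actual search backend, and the fallback to the raw user input is an assumption of this sketch.

```python
def matching_aging_formula(user_words, second_aging_set):
    """Return the first second aging search formula contained in the user input, or None."""
    user = set(user_words)
    for formula in second_aging_set:
        if set(formula) <= user:
            return formula
    return None


def search(user_words, second_aging_set, run_query):
    """Search with the matched second aging search formula, else with the raw input."""
    formula = matching_aging_formula(user_words, second_aging_set)
    return run_query(formula if formula is not None else tuple(user_words))


second_aging_set = [("Wang Jianlin", "the richest man")]
user_input = ["latest", "Wang Jianlin", "the richest man", "pictures"]
# An identity function stands in for the search backend in this example.
used_query = search(user_input, second_aging_set, run_query=lambda q: q)
```

  • In a real system the backend would additionally rank or filter results by publishing time once the aging search formula has been matched.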
  • FIG. 4 illustrates a flow 400 of another embodiment of a retrieval method in accordance with the present application.
  • the method includes the following steps:
  • Step 401: Find a first aging search formula set from the search log.
  • Some words indicating the timeliness of a search formula may be set in advance; when the search formula input by the user includes such a time-sensitive word, the search formula may be recognized as a first aging search formula.
  • The number of queries of a search formula input by users within a certain period of time may also be counted. When the number of queries is greater than a preset number threshold, the search formula may be identified as a first aging search formula.
  • Step 402: Based on the first aging search formula set, select a search formula that satisfies one of the following selection conditions as a candidate aging search formula: it is located in the search log and is semantically associated with a first aging search formula in the first aging search formula set; or it is located in the search log and contains a preset keyword combination.
  • Based on the first aging search formula set that has been found, search formulas that have an association relationship with a first aging search formula in the set may be further looked up from the search log and used as candidate aging search formulas.
  • The association relationship may be that the search formula in the search log is semantically associated with the first aging search formula, or that the search formula in the search log includes a preset keyword combination composed of words whose number of occurrences in the first aging search formula set is greater than a preset threshold.
  • Step 403: Obtain a second aging search formula from the candidate aging search formulas based on semantic similarity, historical aging search words, and preset verification words.
  • The semantic similarity between each candidate aging search formula and the first aging search formula may be calculated; the candidate aging search formulas whose semantic similarity to the first aging search formula is less than a preset threshold are removed; and the candidate aging search formulas that remain are used as second aging search formulas.
  • When a candidate aging search formula is semantically associated with a first aging search formula in the first aging search formula set, that is, when the semantic signature extracted from the candidate aging search formula matches that of the first aging search formula, the semantic similarity between the candidate aging search formula and the first aging search formula (also referred to as literal similarity) may be further calculated, and the candidate aging search formulas whose literal similarity is greater than a certain preset threshold are then used as second aging search formulas.
  • The words in a candidate aging search formula that match the preset verification words and the historical aging search words may be further removed. The candidate aging search formula from which those words have been removed is then used as the second aging search formula.
  • The historical aging search words may be keywords, such as co-occurrence keywords or event keywords, extracted from aging search formulas recognized in the past.
  • The preset verification words may be words that carry no timeliness requirement or cannot express the core semantics of an event, that is, words whose relevance to the first aging search formula is less than a preset threshold.
  • Each candidate aging search formula is thus further verified by the verification method that corresponds to the selection condition it satisfied, and the second aging search formula, that is, a new time-sensitive search formula recalled from the search log, is finally obtained from the search log. Therefore, in the process of recognizing the search formulas in the users' search log, the recall rate of aging search formula recognition is improved while the recognition accuracy is maintained.
  • Step 404: When the search formula input by the user matches the second aging search formula, perform the search by using the second aging search formula.
  • After the second aging search formula is obtained, when a user performs a search, it may be determined whether the search formula input by the user includes the second aging search formula; if it does, the second aging search formula may be used for the search so as to obtain search results whose publishing time is closer to the current time.
  • the apparatus 500 includes a lookup unit 501, a selection unit 502, a processing unit 503, and a search unit 504.
  • The lookup unit 501 is configured to find a first aging search formula set from the search log, where the search log is used to record the search formulas used by users when searching, and an aging search formula is a search formula for which the difference between the publishing time of the search results returned when searching and the current time is less than a preset time difference threshold.
  • The selecting unit 502 is configured to select, based on the first aging search formula set, a search formula that satisfies one of the following selection conditions as a candidate aging search formula: it is located in the search log and is semantically associated with a first aging search formula in the first aging search formula set; or it is located in the search log and includes a preset keyword combination, wherein a preset keyword is a word whose number of occurrences in the first aging search formula set is greater than a preset threshold, and the preset keyword combination is generated by combining preset keywords. The processing unit 503 is configured to perform a processing operation on the candidate aging search formulas to obtain a second aging search formula.
  • The processing operation includes one of the following: removing the candidate aging search formulas whose semantic similarity to the first aging search formula is less than a preset threshold; removing, from the words included in a candidate aging search formula, the words whose semantic relevance to that candidate aging search formula is less than a preset relevance threshold. The search unit 504 is configured to perform the search by using the second aging search formula when the search formula input by the user matches the second aging search formula.
  • The selecting unit 502 includes: an extracting subunit (not shown) configured to extract a first semantic keyword from the first aging search formula and a second semantic keyword from the search formula in the search log, wherein the first semantic keyword is a word whose semantic relevance to the first aging search formula is greater than a first preset semantic relevance threshold, and the second semantic keyword is a word whose semantic relevance to the search formula in the search log is greater than a second preset semantic relevance threshold; a determination subunit (not shown) configured to determine whether the first semantic keyword matches the second semantic keyword; and a candidate aging search formula selection subunit (not shown) configured to select the search formula in the search log as a candidate aging search formula when the first semantic keyword matches the second semantic keyword.
  • The apparatus 500 further includes a first preset keyword combination generating unit (not shown), which includes: a co-occurrence keyword lookup subunit (not shown) configured to find co-occurrence keywords from the first aging search formula set and generate a co-occurrence keyword combination, where co-occurrence keywords are words that co-occur in the first aging search formula set and whose number of occurrences is greater than a preset threshold; a relevance calculation subunit (not shown) configured to obtain, based on the relevance parameter of each co-occurrence keyword in the co-occurrence keyword combination, the relevance parameter corresponding to the co-occurrence keyword combination, where the relevance parameter of a co-occurrence keyword indicates the semantic relevance between the co-occurrence keyword and the first aging search formula to which it belongs; a relevance judging subunit (not shown) configured to determine whether the relevance parameter corresponding to the co-occurrence keyword combination is greater than a preset relevance threshold; and a first keyword combination generation subunit (not shown) configured to use the co-occurrence keyword combination as a preset keyword combination when the relevance parameter is greater than the preset relevance threshold.
  • the apparatus 500 further includes a second preset keyword combination generating unit (not shown), which includes: an event keyword lookup subunit (not shown) configured to find an event keyword in the first time-sensitive query set, the event keyword being a word that occurs in the first time-sensitive query set more than a preset threshold number of times and for which the number of first time-sensitive queries containing it and the number of queries in the search log containing it are both greater than a preset count threshold; a combination keyword lookup subunit (not shown) configured to find combination keywords in the first time-sensitive queries that contain the event keyword, a combination keyword being a word that occurs in those queries more than a preset threshold number of times; and a second keyword combination generation subunit (not shown) configured to combine the event keyword with the combination keywords to generate a preset keyword combination.
  • the processing unit 503 includes: a semantic similarity calculation subunit (not shown) configured to calculate the semantic similarity between a candidate time-sensitive query and a first time-sensitive query when the candidate is semantically associated with that first time-sensitive query in the first time-sensitive query set; a first removal subunit (not shown) configured to remove the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold; and a first determining subunit (not shown) configured to use the candidate time-sensitive queries that remain after that removal as second time-sensitive queries.
  • the processing unit 503 further includes: a second removal subunit (not shown) configured to, when a candidate time-sensitive query contains the preset keyword combination, remove from the words of that candidate the words that match a preset verification word or a historical time-sensitive search term, a preset verification word being a word whose semantic relevance to the first time-sensitive query is less than a preset threshold; and a second determining subunit (not shown) configured to use the candidate time-sensitive query, after the words matching the preset verification words and historical time-sensitive search terms are removed, as the second time-sensitive query.
  • retrieval device 500 also includes some other well-known structures, such as processors and memories, which are not shown in FIG. 5 in order not to unnecessarily obscure the embodiments of the present disclosure.
  • the unit or module involved in the embodiment of the present application may be implemented by software or by hardware.
  • the described unit or module may also be provided in the processor, for example, as a processor including a lookup unit, a selection unit, a processing unit, and a search unit.
  • the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • for example, the lookup unit may also be described as "a unit configured to find a first time-sensitive query set in the search log".
  • referring to FIG. 6, a block diagram of a computer system 600 suitable for implementing the apparatus of the embodiments of the present application is shown.
  • computer system 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 602 or a program loaded from storage portion 608 into random access memory (RAM) 603.
  • RAM 603 also stores the various programs and data required for the operation of the system 600.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage portion 608 including a hard disk or the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage portion 608 as needed.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for executing the method illustrated in the flowchart.
  • the computer program can be downloaded and installed from the network via communication portion 609, and/or installed from removable media 611.
  • each block in the flowchart or block diagram can represent a module, a program segment, or a portion of code, and the module, program segment, or portion of code includes one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may also occur in a different order than that illustrated in the drawings. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the present application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the foregoing embodiments, or a computer-readable storage medium that exists separately and is not assembled into a terminal.
  • the computer readable storage medium stores one or more programs that are used by one or more processors to perform the information push method described in this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

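A search method and apparatus. The search method includes: finding a first time-sensitive query set in a search log (101); based on the first time-sensitive query set, selecting as candidate time-sensitive queries those queries that satisfy one of the following selection conditions: the query is in the search log and is semantically associated with a first time-sensitive query in the first time-sensitive query set; or the query is in the search log and contains a preset keyword combination (102); performing a processing operation on the candidate time-sensitive queries to obtain second time-sensitive queries (103); and, when a query entered by a user matches a second time-sensitive query, searching with the second time-sensitive query (104). Second time-sensitive queries are thus derived from the queries in the search log on the basis of the first time-sensitive queries that have already been identified, which increases recall while preserving precision in the identification of time-sensitive queries and thereby improves how well time-sensitive queries are identified.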

Description

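Search method and apparatus
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. "201510481932.7", filed on August 3, 2015, the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of computers, specifically to the field of search, and in particular to a search method and apparatus.
Background
When searching, a user often expects the query they enter to return results published close to the current time. A query whose returned search results have publication times close to the current time can be called a time-sensitive query. If time-sensitive queries are identified in advance, then when a query entered by a user contains an identified time-sensitive query, that time-sensitive query can be used for the search, so that the returned results are more precise.
In known techniques, time-sensitive queries are identified mainly in two ways: user behavior identification and language model identification. User behavior identification treats a query in the search log as time-sensitive if its query frequency at some point in time exceeds a preset count threshold. Language model identification computes the score of a query under different language models and treats a query as time-sensitive if the score difference exceeds a preset score threshold. However, improving the precision of these approaches requires raising the corresponding thresholds, and raising the thresholds lowers recall, which in turn degrades the identification of time-sensitive queries.
Summary
This application provides a search method and apparatus to solve the technical problem described in the Background section above.
In a first aspect, this application provides a search method, including: finding a first time-sensitive query set in a search log, where the search log records the queries users have used when searching, and a time-sensitive query is a query for which the difference between the publication time of the returned search results and the current time is less than a preset time difference threshold; based on the first time-sensitive query set, selecting as candidate time-sensitive queries those queries that satisfy one of the following selection conditions: the query is in the search log and is semantically associated with a first time-sensitive query in the first time-sensitive query set; or the query is in the search log and contains a preset keyword combination, where a preset keyword is a word that occurs in the first time-sensitive query set more than a preset threshold number of times and a preset keyword combination is generated by combining preset keywords; performing a processing operation on the candidate time-sensitive queries to obtain second time-sensitive queries, the processing operation including one of the following: removing the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold; or removing, from the words of a candidate time-sensitive query, the words whose semantic relevance to that candidate is less than a preset relevance threshold; and, when a query entered by a user matches a second time-sensitive query, searching with the second time-sensitive query.
In a second aspect, this application provides a search apparatus, including: a lookup unit configured to find a first time-sensitive query set in a search log, where the search log records the queries users have used when searching, and a time-sensitive query is a query for which the difference between the publication time of the returned search results and the current time is less than a preset time difference threshold; a selection unit configured to select, based on the first time-sensitive query set, as candidate time-sensitive queries those queries that satisfy one of the following selection conditions: the query is in the search log and is semantically associated with a first time-sensitive query in the first time-sensitive query set; or the query is in the search log and contains a preset keyword combination, where a preset keyword is a word that occurs in the first time-sensitive query set more than a preset threshold number of times and a preset keyword combination is generated by combining preset keywords; a processing unit configured to perform a processing operation on the candidate time-sensitive queries to obtain second time-sensitive queries, the processing operation including one of the following: removing the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold; or removing, from the words of a candidate time-sensitive query, the words whose semantic relevance to that candidate is less than a preset relevance threshold; and a search unit configured to search with the second time-sensitive query when a query entered by a user matches a second time-sensitive query.
With the search method and apparatus provided by this application, a first time-sensitive query set is found in the search log; candidate time-sensitive queries that are semantically associated with a first time-sensitive query in the set, and candidate time-sensitive queries that contain a preset keyword combination, are selected from the queries in the search log; and a processing operation is performed on the candidates to obtain second time-sensitive queries. Second time-sensitive queries are thus derived from the queries in the search log on the basis of the first time-sensitive queries that have already been identified, which increases recall while preserving precision in the identification of time-sensitive queries and thereby improves how well time-sensitive queries are identified.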
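Brief description of the drawings
Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is a flowchart of one embodiment of the search method according to this application;
FIG. 2 is a schematic diagram of finding candidate time-sensitive queries based on semantic keywords;
FIG. 3 is a schematic diagram of finding candidate time-sensitive queries based on a core expression dictionary;
FIG. 4 is a flowchart of another embodiment of the search method according to this application;
FIG. 5 is a schematic structural diagram of one embodiment of the search apparatus according to this application;
FIG. 6 is a schematic structural diagram of a computer system provided by an embodiment of the present invention.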
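Detailed description
This application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments in this application and the features of the embodiments may be combined with one another. This application is described in detail below with reference to the drawings and in conjunction with the embodiments.
In the embodiments of this application, a first time-sensitive query is a query that has already been identified as time-sensitive, and a second time-sensitive query is a new time-sensitive query obtained from the search log on the basis of the first time-sensitive queries, that is, a time-sensitive query recalled from the search log starting from the first time-sensitive queries.
Referring to FIG. 1, a flow 100 of one embodiment of the search method according to this application is shown. The method includes the following steps:
Step 101: find a first time-sensitive query set in a search log.
In this embodiment, the search log may record the queries users have used when searching. A time-sensitive query may be a query for which the difference between the publication time of the returned search results and the current time is less than a preset threshold. Take a user searching with a query as an example: when the user wants pictures of the most recent news event related to the query, that is, wants results whose publication time is close to the current time, the query entered by the user can be called a time-sensitive query with a timeliness need. In this embodiment, some words that indicate the timeliness of a query may be set in advance, such as "事件" (event), "发生" (occur), and "财报" (financial report). When a query entered by a user contains such a timeliness word, the query may be identified as a first time-sensitive query. In this embodiment, the number of times a query is issued within a certain period may also be counted; when that count exceeds a preset count threshold, the query may be identified as a first time-sensitive query.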
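The two seed-identification rules in step 101 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes queries are already tokenized into tuples of words, and the cue-word set, the threshold value, and the function name `find_seed_queries` are hypothetical.

```python
from collections import Counter

# Hypothetical cue words and threshold. The text gives "event", "occur" and
# "financial report" as example cue words but fixes no concrete values.
CUE_WORDS = {"event", "occur", "financial report"}
MIN_QUERY_COUNT = 3


def find_seed_queries(log, cue_words=CUE_WORDS, min_count=MIN_QUERY_COUNT):
    """Return the queries treated as first time-sensitive queries.

    `log` is a list of tokenized queries (tuples of words) issued within one
    time window. A query is kept if it contains a cue word or if it was
    issued more than `min_count` times in the window.
    """
    counts = Counter(log)
    seeds = set()
    for query, n in counts.items():
        if n > min_count or any(tok in cue_words for tok in query):
            seeds.add(query)
    return seeds
```

Either rule alone is enough to admit a query, which mirrors the "may also" wording of the embodiment; a stricter variant could require both.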
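Step 102: based on the first time-sensitive query set, select as candidate time-sensitive queries those queries that satisfy one of the following selection conditions: the query is in the search log and is semantically associated with a first time-sensitive query in the first time-sensitive query set; or the query is in the search log and contains a preset keyword combination.
In this embodiment, a preset keyword is a word that occurs in the first time-sensitive query set more than a preset threshold number of times, and a preset keyword combination is generated by combining preset keywords. Based on the first time-sensitive query set already found, queries associated with a first time-sensitive query in the set can be further found in the search log and used as candidate time-sensitive queries. The association may be either of the following: the query in the search log is semantically associated with a first time-sensitive query, or the query in the search log contains a preset keyword combination made up of words that occur in the first time-sensitive query set more than a preset threshold number of times.
In some optional implementations of this embodiment, selecting, based on the first time-sensitive query set, the queries that are in the search log and semantically associated with a first time-sensitive query in the set as candidate time-sensitive queries includes: extracting a first semantic keyword from the first time-sensitive query and a second semantic keyword from a query in the search log, where the first semantic keyword is a word whose semantic relevance to the first time-sensitive query is greater than a first preset semantic relevance threshold and the second semantic keyword is a word whose semantic relevance to the query in the search log is greater than a second preset semantic relevance threshold; determining whether the first semantic keyword matches the second semantic keyword; and, if so, selecting the query in the search log as a candidate time-sensitive query.
In this embodiment, a semantic keyword (also called a semantic signature) is a word whose relevance to the meaning of its query exceeds a preset relevance threshold; that is, a semantic keyword reflects the true meaning of the query it belongs to. For example, the core meaning of the query "姚明有多高" ("How tall is Yao Ming") is Yao Ming's height, so its semantic keywords are "姚明" (Yao Ming) and "身高" (height). In this embodiment, first semantic keywords that reflect the true meaning of a first time-sensitive query can be extracted from it, and second semantic keywords that reflect the true meaning of a query in the search log can be extracted from that query. The extracted first semantic keywords can be added to a preset lexicon for storing first semantic keywords, which may also be called the semantic signature dictionary. The second semantic keywords can then be matched against the first semantic keywords in the semantic signature dictionary; when they match, the query in the search log can be determined to be a candidate time-sensitive query.
Optionally, because the intermediate result output by a semantic similarity algorithm is a set of semantic keywords that reflect the true meaning of a query, semantic keywords can be extracted by means of a semantic similarity algorithm. The semantic similarity algorithm may be the minimum edit distance (Levenshtein distance) algorithm or the Jaccard similarity coefficient (Jaccard coefficient) algorithm. Such an algorithm can be used to extract the first semantic keywords from the first time-sensitive queries and the second semantic keywords from the queries in the search log.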
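For reference, the two similarity measures named above have standard textbook definitions, sketched here. The text does not say how keywords are derived from their intermediate results, so this block only shows the measures themselves; applying Jaccard to word sets rather than character sets is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-element insertions, deletions and
    substitutions needed to turn sequence `a` into sequence `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections, as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)
```

Edit distance is sensitive to word order and spelling, while the Jaccard coefficient ignores order entirely, so the two can disagree on reordered queries.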
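Referring to FIG. 2, a schematic diagram of finding candidate time-sensitive queries based on semantic keywords is shown. In this embodiment, the first semantic keywords of each first time-sensitive query in the first time-sensitive query set can be extracted in advance, and the semantic signatures of the first time-sensitive queries are then aggregated into a semantic signature dictionary. Correspondingly, the second semantic keywords of the queries in the search log can be extracted. Whether a query in the search log is a candidate time-sensitive query can then be determined by checking whether its second semantic keywords are in the semantic signature dictionary.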
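The dictionary lookup of FIG. 2 can be sketched as below, under stated assumptions: queries are pre-tokenized, the per-word relevance scores are supplied by the caller (the text does not specify how they are computed), a signature is modeled as the set of words above the threshold, and all function names are hypothetical.

```python
def extract_signature(query, relevance, threshold=0.5):
    """Keep the words whose relevance to the query exceeds the threshold.

    `relevance` maps a word to a score; it is an input here because the
    scoring method is not given in the text.
    """
    return frozenset(w for w in query if relevance.get(w, 0.0) > threshold)


def build_signature_dictionary(seed_queries, relevance, threshold=0.5):
    """Aggregate the signatures of all first time-sensitive queries."""
    signatures = (extract_signature(q, relevance, threshold) for q in seed_queries)
    return {s for s in signatures if s}  # skip empty signatures


def select_candidates(log_queries, signature_dict, relevance, threshold=0.5):
    """Return the log queries whose signature is in the dictionary."""
    return [q for q in log_queries
            if extract_signature(q, relevance, threshold) in signature_dict]
```

Storing signatures as frozensets makes the membership test independent of word order, so "Yao Ming height" and "height Yao Ming" map to the same entry.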
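In some optional implementations of this embodiment, the preset keyword combination may be generated by the following steps: finding co-occurrence keywords in the first time-sensitive query set and generating a co-occurrence keyword combination, a co-occurrence keyword being a word that occurs together with others in the first time-sensitive query set more than a preset threshold number of times; obtaining the relevance parameter of the co-occurrence keyword combination from the relevance parameter of each co-occurrence keyword in the combination, where the relevance parameter indicates the semantic relevance of a co-occurrence keyword to the first time-sensitive query it belongs to; determining whether the relevance parameter of the co-occurrence keyword combination is greater than a preset relevance threshold; and, if so, using the co-occurrence keyword combination as the preset keyword combination.
In this embodiment, co-occurrence keywords can be found in the first time-sensitive query set as follows. First, count how many times words occur together in the first time-sensitive query set within a certain period (for example, one day); this count may be called the co-occurrence count. Then find the words whose co-occurrence count exceeds a preset count threshold; these may be called co-occurrence keywords. After the co-occurrence keywords are found, the semantic relevance of each co-occurrence keyword to the first time-sensitive query (which may also be called the importance parameter) can be computed, and the importance parameters are normalized and then summed to give the importance value of the co-occurrence keyword combination. Finally, the co-occurrence keyword combinations whose importance value exceeds a preset threshold are used as preset co-occurrence keyword combinations.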
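One way to read the procedure above is sketched here for word pairs. Several details are assumptions, since the text leaves them open: combinations are limited to pairs, importance is normalized by dividing by the largest importance value, and the importance scores are supplied by the caller.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_combinations(seed_queries, importance, min_count=1, min_score=1.0):
    """Return word pairs that co-occur often and are jointly important.

    `seed_queries` are tokenized first time-sensitive queries from one time
    window; `importance` maps a word to its relevance score (hypothetical
    input, not defined in the text).
    """
    # Count in how many seed queries each pair of words appears together.
    pair_counts = Counter()
    for query in seed_queries:
        for pair in combinations(sorted(set(query)), 2):
            pair_counts[pair] += 1

    # Normalize by the largest importance value (assumed normalization).
    max_imp = max(importance.values(), default=0.0) or 1.0

    result = []
    for pair, n in pair_counts.items():
        if n <= min_count:
            continue  # does not co-occur often enough
        score = sum(importance.get(w, 0.0) / max_imp for w in pair)
        if score > min_score:
            result.append(pair)
    return sorted(result)
```

With a pair-based reading the summed score ranges from 0 to 2, so a threshold above 1.0 requires both words to carry some importance.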
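In some optional implementations of this embodiment, the preset keyword combination is generated by the following steps: finding an event keyword in the first time-sensitive query set, the event keyword being a word that occurs in the first time-sensitive query set more than a preset threshold number of times and for which the number of first time-sensitive queries containing it and the number of queries in the search log containing it are both greater than a preset count threshold; finding combination keywords in the first time-sensitive queries that contain the event keyword, a combination keyword being a word that occurs in those queries more than a preset threshold number of times; and combining the event keyword with the combination keywords to generate the preset keyword combination.
In this embodiment, event keywords can be found in the first time-sensitive query set as follows. First, count the number of times a word occurs in the first time-sensitive query set and, at the same time, count the number of first time-sensitive queries and of queries in the search log that contain the word (this may be called the spread count). Then select the words whose occurrence count and spread count both exceed the preset count thresholds as event keywords. Once the event keywords have been obtained, combination keywords can be obtained from the first time-sensitive queries that contain an event keyword. For example, the occurrences of the words in the first time-sensitive queries containing the event keyword can be counted, and the words whose occurrence count exceeds a preset count threshold are used as combination keywords. Finally, the combination keywords are combined with the event keyword to generate preset keyword combinations.
The following takes the news event of Wang Jianlin becoming Asia's new richest person as an example of how preset keyword combinations are generated. First, first time-sensitive queries such as "Wang Jianlin becomes Asia's richest person", "Wang Jianlin regains the title of richest person in mainland China and becomes Asia's richest person", "Wang Jianlin overtakes Li Ka-shing to become Asia's richest person", and "Wang Jianlin richest person" are found. Then the event keyword that reflects the news event these queries have in common, namely Wang Jianlin becoming Asia's richest person, is found in them; that event keyword is "Wang Jianlin". With the event keyword "Wang Jianlin", the first time-sensitive queries above can be aggregated into a first time-sensitive query set corresponding to this news event. Combination keywords such as "richest person", "Asia", "Li Ka-shing", and "regains" can then be found in that set. Finally, the event keyword "Wang Jianlin" is combined with these combination keywords to generate preset keyword combinations, namely "Wang Jianlin" & "richest person", "Wang Jianlin" & "Asia", "Wang Jianlin" & "Li Ka-shing", "Wang Jianlin" & "regains", and so on.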
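The event-keyword procedure and the Wang Jianlin example can be sketched as below. The sketch assumes pre-tokenized queries and uses a single spread threshold for both the seed set and the search log; the function names and threshold parameters are hypothetical.

```python
from collections import Counter


def event_keywords(seed_queries, log_queries, min_occurrences, min_spread):
    """Words that occur often in the seed set and are spread over many
    seed queries and many log queries."""
    occurrences = Counter(w for q in seed_queries for w in q)
    seed_spread = Counter(w for q in seed_queries for w in set(q))
    log_spread = Counter(w for q in log_queries for w in set(q))
    return {w for w, n in occurrences.items()
            if n > min_occurrences
            and seed_spread[w] > min_spread
            and log_spread[w] > min_spread}


def keyword_combinations(seed_queries, event_word, min_occurrences):
    """Pair the event keyword with frequent words from the seed queries
    that contain it (the combination keywords)."""
    counts = Counter(w for q in seed_queries if event_word in q
                     for w in q if w != event_word)
    return sorted((event_word, w) for w, n in counts.items()
                  if n > min_occurrences)
```

With the threshold on combination keywords set to zero, every word that shares a seed query with the event keyword is paired with it, which is what the worked example in the text appears to do.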
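In this embodiment, preprocessing operations may also be performed on the first time-sensitive queries in the first time-sensitive query set to obtain the entity words in them, the preprocessing operations including at least one of the following: word segmentation, part-of-speech tagging, and named entity recognition; keywords associated with the template words of a preset template are extracted from the entity words; and the keywords associated with the template words of the preset template are combined to generate preset keyword combinations.
In this embodiment, templates for expressing time-sensitive events may be set in advance, for example templates containing template words such as "**发生**" ("** occurs **"), "**地震" ("** earthquake"), and "**事件" ("** event"). Preset keyword combinations can be generated as follows. First, preprocessing operations such as word segmentation, part-of-speech tagging, and named entity recognition are performed on a first time-sensitive query to obtain entity words, and the relevance of each entity word to the first time-sensitive query is computed. Then the entities in the first time-sensitive query that are not in the preset template and whose relevance is below a preset relevance threshold are removed. Finally, the first time-sensitive query, with those entities removed, is matched against the preset template; the keywords associated with the template words are extracted and combined to generate preset keyword combinations. Taking the first time-sensitive query "北京木材厂发生大火" ("a big fire occurs at a Beijing timber factory") as an example, the query can be matched against the template "**发生**", so that the entity words before the template word "发生" (occurs), namely "Beijing" and "timber factory", and the entity word after it, "big fire", are extracted. From these entity words, the word combinations "Beijing - big fire", "timber factory - big fire", and "Beijing timber factory - big fire" can be generated as preset keyword combinations.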
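The template step for a pattern of the form "** occurs **" can be sketched as follows. It assumes segmentation, entity recognition, and the relevance filter have already run, so the input is a list of entity words plus the template word; joining the leading entities with a space is an adaptation for English tokens, and the function name is hypothetical.

```python
def template_combinations(tokens, template_word):
    """Pair each entity before the template word with each entity after it.

    Mirrors the "Beijing timber factory" example: the entities before the
    template word are used individually and also joined into one phrase.
    """
    if template_word not in tokens:
        return []  # the query does not match this template
    i = tokens.index(template_word)
    before, after = tokens[:i], tokens[i + 1:]
    heads = list(before)
    if len(before) > 1:
        heads.append(" ".join(before))
    return [(head, tail) for head in heads for tail in after]
```

For example, `template_combinations(["Beijing", "timber factory", "occurs", "big fire"], "occurs")` yields the three combinations listed in the text.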
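Referring to FIG. 3, a schematic diagram of finding candidate time-sensitive queries based on a core expression dictionary is shown. In this embodiment, keywords can be extracted from the first time-sensitive query set by means of co-occurrence keywords, event keywords, preset templates, and the like, and the preset keyword combinations corresponding to the first time-sensitive query set are generated (such a keyword combination is also called a core expression keyword combination). After the core expression keyword combinations are generated, they can be stored in a preset core expression dictionary. Correspondingly, keywords can also be extracted from the queries in the search log by means of co-occurrence keywords, event keywords, preset templates, and the like, and the core expression keyword combination corresponding to each query in the search log is generated. It can then be determined whether the core expression keyword combination of a query in the search log is in the core expression dictionary; if so, that query is selected as a candidate time-sensitive query.
Step 103: perform a processing operation on the candidate time-sensitive queries to obtain second time-sensitive queries.
In this embodiment, after the candidate time-sensitive queries are obtained, a processing operation can be performed on them to obtain second time-sensitive queries. The processing operation may be to remove the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold and then use the candidates whose semantic similarity to the first time-sensitive query is greater than the preset threshold as second time-sensitive queries. In this embodiment, the processing operation may also be to remove, from the words of a candidate time-sensitive query, the words whose semantic relevance to that candidate is less than a preset relevance threshold, and then use the candidate with those words removed as a second time-sensitive query.
Step 104: when a query entered by a user matches a second time-sensitive query, search with the second time-sensitive query.
In this embodiment, when a user searches, it can be determined whether the query entered by the user contains a second time-sensitive query; if it does, the second time-sensitive query can be used for the search so as to obtain results close to the current time. For example, if a user wants news pictures related to a recent news event and the query the user enters includes a second time-sensitive query made up of keywords that characterize that news event, the second time-sensitive query can be used for the search, so that news pictures about the event that are close to the current time are returned to the user.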
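The containment check in step 104 can be sketched as below. The text only says the user's query "contains" a second time-sensitive query; treating containment as word-set inclusion and preferring the longest match are assumptions, and the function name is hypothetical.

```python
def match_time_sensitive(user_query, time_sensitive_queries):
    """Return the longest second time-sensitive query whose words all appear
    in the user's query, or None if there is no match."""
    user_words = set(user_query)
    best = None
    for query in time_sensitive_queries:
        if set(query) <= user_words and (best is None or len(query) > len(best)):
            best = query
    return best
```

A caller would run the freshness-oriented search with the returned query when it is not None and fall back to the ordinary search path otherwise.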
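Referring to FIG. 4, a flow 400 of another embodiment of the search method according to this application is shown. The method includes the following steps:
Step 401: find a first time-sensitive query set in a search log.
In this embodiment, some words that indicate the timeliness of a query may be set in advance; when a query entered by a user contains such a timeliness word, the query may be identified as a first time-sensitive query. The number of times a query is issued within a certain period may also be counted; when that count exceeds a preset count threshold, the query may be identified as a first time-sensitive query.
Step 402: based on the first time-sensitive query set, select as candidate time-sensitive queries those queries that satisfy one of the following selection conditions: the query is in the search log and is semantically associated with a first time-sensitive query in the first time-sensitive query set; or the query is in the search log and contains a preset keyword combination.
In this embodiment, based on the first time-sensitive query set already found, queries associated with a first time-sensitive query in the set can be further found in the search log and used as candidate time-sensitive queries. The association may be either of the following: the query in the search log is semantically associated with a first time-sensitive query, or the query in the search log contains a preset keyword combination made up of words that occur in the first time-sensitive query set more than a preset threshold number of times.
Step 403: obtain second time-sensitive queries from the candidate time-sensitive queries based on semantic similarity and on historical time-sensitive search terms and preset verification words.
In this embodiment, when a candidate time-sensitive query is semantically associated with a first time-sensitive query in the first time-sensitive query set, the semantic similarity between the candidate and the first time-sensitive query can be calculated; the candidates whose semantic similarity to the first time-sensitive query is less than a preset threshold are removed; and the candidates that remain after that removal are used as second time-sensitive queries. That is, when the semantic signature extracted from a candidate time-sensitive query is the same as the semantic signature extracted from a first time-sensitive query, the semantic similarity (which may also be called the literal similarity) between the candidate and the first time-sensitive query can be further calculated, and the candidates whose literal similarity exceeds a preset threshold are then used as second time-sensitive queries.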
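The similarity check in step 403 can be sketched as below. The text does not say how literal similarity is computed, so the word-set Jaccard coefficient is used here as a stand-in; the input is assumed to be pairs of a candidate and the first time-sensitive query whose signature it matched, and the names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two token collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def verify_candidates(pairs, min_similarity):
    """Keep the candidates that are similar enough to their matched seed.

    `pairs` is a list of (candidate, seed) tuples of tokenized queries.
    Candidates scoring below `min_similarity` are removed.
    """
    return [cand for cand, seed in pairs if jaccard(cand, seed) >= min_similarity]
```

This second check guards against candidates that share a signature with a seed query but carry many unrelated extra words.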
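In this embodiment, when a candidate time-sensitive query contains a preset keyword combination, the words in the candidate that match a preset verification word or a historical time-sensitive search term can be further removed. The candidate time-sensitive query with those words removed is then used as a second time-sensitive query. In this embodiment, historical time-sensitive search terms may be keywords, such as co-occurrence keywords and event keywords, extracted from queries that were identified as time-sensitive in the past. Preset verification words may be words that carry no timeliness need or cannot express the core meaning of an event, that is, words whose relevance to the first time-sensitive query is less than a preset threshold.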
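The word-removal check can be sketched as a simple filter. It assumes pre-tokenized candidates and caller-supplied word lists; the function name and the example words are hypothetical.

```python
def strip_verification_words(candidate, verification_words, historical_terms):
    """Drop the words that match a preset verification word or a historical
    time-sensitive search term, keeping the remaining words in order."""
    blocked = set(verification_words) | set(historical_terms)
    return tuple(w for w in candidate if w not in blocked)
```

For instance, `strip_verification_words(("what", "wang jianlin", "richest", "pictures"), {"what", "pictures"}, {"olympics"})` keeps only `("wang jianlin", "richest")`, which would then serve as the second time-sensitive query.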
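In this embodiment, each candidate time-sensitive query is verified with the verification method that corresponds to the selection condition it satisfied, and second time-sensitive queries are finally obtained from the search log, so that new time-sensitive queries are recalled from the search log. As a result, when the queries in users' search logs are identified, the recall of time-sensitive query identification is improved while its precision is preserved.
Step 404: when a query entered by a user matches a second time-sensitive query, search with the second time-sensitive query.
In this embodiment, when a user searches, it can be determined whether the query entered by the user contains a second time-sensitive query; if it does, the second time-sensitive query can be used for the search so as to obtain results close to the current time.
Referring to FIG. 5, a schematic structural diagram of one embodiment of the search apparatus according to this application is shown. As shown in FIG. 5, the apparatus 500 includes a lookup unit 501, a selection unit 502, a processing unit 503, and a search unit 504. The lookup unit 501 is configured to find a first time-sensitive query set in a search log, where the search log records the queries users have used when searching, and a time-sensitive query is a query for which the difference between the publication time of the returned search results and the current time is less than a preset time difference threshold. The selection unit 502 is configured to select, based on the first time-sensitive query set, as candidate time-sensitive queries those queries that satisfy one of the following selection conditions: the query is in the search log and is semantically associated with a first time-sensitive query in the first time-sensitive query set; or the query is in the search log and contains a preset keyword combination, where a preset keyword is a word that occurs in the first time-sensitive query set more than a preset threshold number of times and a preset keyword combination is generated by combining preset keywords. The processing unit 503 is configured to perform a processing operation on the candidate time-sensitive queries to obtain second time-sensitive queries, the processing operation including one of the following: removing the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold; or removing, from the words of a candidate time-sensitive query, the words whose semantic relevance to that candidate is less than a preset relevance threshold. The search unit 504 is configured to search with the second time-sensitive query when a query entered by a user matches a second time-sensitive query.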
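In some optional implementations of this embodiment, the selection unit 502 includes: an extracting subunit (not shown) configured to extract a first semantic keyword from the first time-sensitive query and a second semantic keyword from a query in the search log, where the first semantic keyword is a word whose semantic relevance to the first time-sensitive query is greater than a first preset semantic relevance threshold and the second semantic keyword is a word whose semantic relevance to the query in the search log is greater than a second preset semantic relevance threshold; a determination subunit (not shown) configured to determine whether the first semantic keyword matches the second semantic keyword; and a candidate time-sensitive query selection subunit (not shown) configured to select the query in the search log as a candidate time-sensitive query when the first semantic keyword matches the second semantic keyword.
In some optional implementations of this embodiment, the apparatus 500 further includes a first preset keyword combination generating unit (not shown), which includes: a co-occurrence keyword lookup subunit (not shown) configured to find co-occurrence keywords in the first time-sensitive query set and generate a co-occurrence keyword combination, a co-occurrence keyword being a word that occurs together with others in the first time-sensitive query set more than a preset threshold number of times; a relevance calculation subunit (not shown) configured to obtain the relevance parameter of the co-occurrence keyword combination from the relevance parameter of each co-occurrence keyword in the combination, where the relevance parameter indicates the semantic relevance of a co-occurrence keyword to the first time-sensitive query it belongs to; a relevance judging subunit (not shown) configured to determine whether the relevance parameter of the co-occurrence keyword combination is greater than a preset relevance threshold; and a first keyword combination generation subunit (not shown) configured to use the co-occurrence keyword combination as the preset keyword combination when its relevance parameter is greater than the preset relevance threshold.
In some optional implementations of this embodiment, the apparatus 500 further includes a second preset keyword combination generating unit (not shown), which includes: an event keyword lookup subunit (not shown) configured to find an event keyword in the first time-sensitive query set, the event keyword being a word that occurs in the first time-sensitive query set more than a preset threshold number of times and for which the number of first time-sensitive queries containing it and the number of queries in the search log containing it are both greater than a preset count threshold; a combination keyword lookup subunit (not shown) configured to find combination keywords in the first time-sensitive queries that contain the event keyword, a combination keyword being a word that occurs in those queries more than a preset threshold number of times; and a second keyword combination generation subunit (not shown) configured to combine the event keyword with the combination keywords to generate the preset keyword combination.
In some optional implementations of this embodiment, the processing unit 503 includes: a semantic similarity calculation subunit (not shown) configured to calculate the semantic similarity between a candidate time-sensitive query and a first time-sensitive query when the candidate is semantically associated with that first time-sensitive query in the first time-sensitive query set; a first removal subunit (not shown) configured to remove the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold; and a first determining subunit (not shown) configured to use the candidate time-sensitive queries that remain after that removal as second time-sensitive queries.
In some optional implementations of this embodiment, the processing unit 503 further includes: a second removal subunit (not shown) configured to, when a candidate time-sensitive query contains a preset keyword combination, remove from the words of that candidate the words that match a preset verification word or a historical time-sensitive search term, a preset verification word being a word whose semantic relevance to the first time-sensitive query is less than a preset threshold; and a second determining subunit (not shown) configured to use the candidate time-sensitive query, after the words matching the preset verification words and historical time-sensitive search terms are removed, as a second time-sensitive query.
Those skilled in the art will understand that the search apparatus 500 also includes some other well-known structures, such as processors and memories, which are not shown in FIG. 5 in order not to unnecessarily obscure the embodiments of the present disclosure.
The units or modules involved in the embodiments of this application may be implemented in software or in hardware. The described units or modules may also be provided in a processor; for example, this may be described as: a processor including a lookup unit, a selection unit, a processing unit, and a search unit. The names of these units do not, in some cases, limit the units themselves; for example, the lookup unit may also be described as "a unit configured to find a first time-sensitive query set in a search log".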
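Referring now to FIG. 6, a schematic structural diagram of a computer system 600 suitable for implementing the device of the embodiments of this application is shown.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 602 or a program loaded from a storage portion 608 into random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage portion 608 including a hard disk or the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 609 and/or installed from the removable medium 611.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
In another aspect, this application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the foregoing embodiments, or a computer-readable storage medium that exists separately and is not assembled into a terminal. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the information push method described in this application.
The above description covers only the preferred embodiments of this application and the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in this application.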

Claims (14)

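  1. A search method, characterized in that the method includes:
    finding a first time-sensitive query set in a search log, where the search log records the queries users have used when searching, and a time-sensitive query is a query for which the difference between the publication time of the returned search results and the current time is less than a preset time difference threshold;
    based on the first time-sensitive query set, selecting as candidate time-sensitive queries those queries that satisfy one of the following selection conditions: the query is in the search log and is semantically associated with a first time-sensitive query in the first time-sensitive query set; or the query is in the search log and contains a preset keyword combination, where a preset keyword is a word that occurs in the first time-sensitive query set more than a preset threshold number of times and a preset keyword combination is generated by combining preset keywords;
    performing a processing operation on the candidate time-sensitive queries to obtain second time-sensitive queries, the processing operation including one of the following: removing the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold; or removing, from the words of a candidate time-sensitive query, the words whose semantic relevance to that candidate is less than a preset relevance threshold;
    when a query entered by a user matches a second time-sensitive query, searching with the second time-sensitive query.
  2. The method according to claim 1, characterized in that selecting, based on the first time-sensitive query set, the queries that are in the search log and semantically associated with a first time-sensitive query in the first time-sensitive query set as candidate time-sensitive queries includes:
    extracting a first semantic keyword from the first time-sensitive query and a second semantic keyword from a query in the search log, where the first semantic keyword is a word whose semantic relevance to the first time-sensitive query is greater than a first preset semantic relevance threshold and the second semantic keyword is a word whose semantic relevance to the query in the search log is greater than a second preset semantic relevance threshold;
    determining whether the first semantic keyword matches the second semantic keyword;
    if so, selecting the query in the search log as a candidate time-sensitive query.
  3. The method according to claim 2, characterized in that the preset keyword combination is generated by the following steps:
    finding co-occurrence keywords in the first time-sensitive query set and generating a co-occurrence keyword combination, a co-occurrence keyword being a word that occurs together with others in the first time-sensitive query set more than a preset threshold number of times;
    obtaining the relevance parameter of the co-occurrence keyword combination from the relevance parameter of each co-occurrence keyword in the combination, where the relevance parameter indicates the semantic relevance of a co-occurrence keyword to the first time-sensitive query it belongs to;
    determining whether the relevance parameter of the co-occurrence keyword combination is greater than a preset relevance threshold;
    if so, using the co-occurrence keyword combination as the preset keyword combination.
  4. The method according to claim 2, characterized in that the preset keyword combination is generated by the following steps:
    finding an event keyword in the first time-sensitive query set, the event keyword being a word that occurs in the first time-sensitive query set more than a preset threshold number of times and for which the number of first time-sensitive queries containing it and the number of queries in the search log containing it are both greater than a preset count threshold;
    finding combination keywords in the first time-sensitive queries that contain the event keyword, a combination keyword being a word that occurs in those queries more than a preset threshold number of times;
    combining the event keyword with the combination keywords to generate the preset keyword combination.
  5. The method according to any one of claims 1 to 4, characterized in that performing a processing operation on the candidate time-sensitive queries to obtain second time-sensitive queries includes:
    when a candidate time-sensitive query is semantically associated with a first time-sensitive query in the first time-sensitive query set, calculating the semantic similarity between the candidate time-sensitive query and the first time-sensitive query;
    removing the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold;
    using the candidate time-sensitive queries that remain after that removal as second time-sensitive queries.
  6. The method according to claim 5, characterized in that performing a processing operation on the candidate time-sensitive queries to obtain second time-sensitive queries includes:
    when a candidate time-sensitive query contains a preset keyword combination, removing from the words of that candidate the words that match a preset verification word or a historical time-sensitive search term, a preset verification word being a word whose semantic relevance to the first time-sensitive query is less than a preset threshold;
    using the candidate time-sensitive query, after the words matching the preset verification words and historical time-sensitive search terms are removed, as a second time-sensitive query.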
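  7. A search apparatus, characterized in that the apparatus includes:
    a lookup unit configured to find a first time-sensitive query set in a search log, where the search log records the queries users have used when searching, and a time-sensitive query is a query for which the difference between the publication time of the returned search results and the current time is less than a preset time difference threshold;
    a selection unit configured to select, based on the first time-sensitive query set, as candidate time-sensitive queries those queries that satisfy one of the following selection conditions: the query is in the search log and is semantically associated with a first time-sensitive query in the first time-sensitive query set; or the query is in the search log and contains a preset keyword combination, where a preset keyword is a word that occurs in the first time-sensitive query set more than a preset threshold number of times and a preset keyword combination is generated by combining preset keywords;
    a processing unit configured to perform a processing operation on the candidate time-sensitive queries to obtain second time-sensitive queries, the processing operation including one of the following: removing the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold; or removing, from the words of a candidate time-sensitive query, the words whose semantic relevance to that candidate is less than a preset relevance threshold;
    a search unit configured to search with the second time-sensitive query when a query entered by a user matches a second time-sensitive query.
  8. The apparatus according to claim 7, characterized in that the selection unit includes:
    an extracting subunit configured to extract a first semantic keyword from the first time-sensitive query and a second semantic keyword from a query in the search log, where the first semantic keyword is a word whose semantic relevance to the first time-sensitive query is greater than a first preset semantic relevance threshold and the second semantic keyword is a word whose semantic relevance to the query in the search log is greater than a second preset semantic relevance threshold;
    a determination subunit configured to determine whether the first semantic keyword matches the second semantic keyword;
    a candidate time-sensitive query selection subunit configured to select the query in the search log as a candidate time-sensitive query when the first semantic keyword matches the second semantic keyword.
  9. The apparatus according to claim 8, characterized in that the apparatus further includes a first preset keyword combination generating unit, which includes:
    a co-occurrence keyword lookup subunit configured to find co-occurrence keywords in the first time-sensitive query set and generate a co-occurrence keyword combination, a co-occurrence keyword being a word that occurs together with others in the first time-sensitive query set more than a preset threshold number of times;
    a relevance calculation subunit configured to obtain the relevance parameter of the co-occurrence keyword combination from the relevance parameter of each co-occurrence keyword in the combination, where the relevance parameter indicates the semantic relevance of a co-occurrence keyword to the first time-sensitive query it belongs to;
    a relevance judging subunit configured to determine whether the relevance parameter of the co-occurrence keyword combination is greater than a preset relevance threshold;
    a first keyword combination generation subunit configured to use the co-occurrence keyword combination as the preset keyword combination when its relevance parameter is greater than the preset relevance threshold.
  10. The apparatus according to claim 8, characterized in that the apparatus further includes a second preset keyword combination generating unit, which includes:
    an event keyword lookup subunit configured to find an event keyword in the first time-sensitive query set, the event keyword being a word that occurs in the first time-sensitive query set more than a preset threshold number of times and for which the number of first time-sensitive queries containing it and the number of queries in the search log containing it are both greater than a preset count threshold;
    a combination keyword lookup subunit configured to find combination keywords in the first time-sensitive queries that contain the event keyword, a combination keyword being a word that occurs in those queries more than a preset threshold number of times;
    a second keyword combination generation subunit configured to combine the event keyword with the combination keywords to generate the preset keyword combination.
  11. The apparatus according to any one of claims 7 to 10, characterized in that the processing unit includes:
    a semantic similarity calculation subunit configured to calculate the semantic similarity between a candidate time-sensitive query and a first time-sensitive query when the candidate is semantically associated with that first time-sensitive query in the first time-sensitive query set;
    a first removal subunit configured to remove the candidate time-sensitive queries whose semantic similarity to the first time-sensitive query is less than a preset threshold;
    a first determining subunit configured to use the candidate time-sensitive queries that remain after that removal as second time-sensitive queries.
  12. The apparatus according to claim 11, characterized in that the processing unit further includes:
    a second removal subunit configured to, when a candidate time-sensitive query contains a preset keyword combination, remove from the words of that candidate the words that match a preset verification word or a historical time-sensitive search term, a preset verification word being a word whose semantic relevance to the first time-sensitive query is less than a preset threshold;
    a second determining subunit configured to use the candidate time-sensitive query, after the words matching the preset verification words and historical time-sensitive search terms are removed, as a second time-sensitive query.
  13. A device, including:
    a processor; and
    a memory,
    the memory storing computer-readable instructions executable by the processor, where, when the computer-readable instructions are executed, the processor performs the method according to any one of claims 1 to 6.
  14. A non-volatile computer storage medium storing computer-readable instructions executable by a processor, where, when the computer-readable instructions are executed by a processor, the processor performs the method according to any one of claims 1 to 6.
PCT/CN2015/096012 2015-08-03 2015-11-30 Search method and apparatus WO2017020454A1 (zh)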

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/534,373 US10558694B2 (en) 2015-08-03 2015-11-30 Search method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510481932.7 2015-08-03
CN201510481932.7A CN105159938B (zh) 2015-08-03 2015-08-03 Search method and apparatus

Publications (1)

Publication Number Publication Date
WO2017020454A1 true WO2017020454A1 (zh) 2017-02-09

Family

ID=54800795

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/096012 WO2017020454A1 (zh) 2015-08-03 2015-11-30 Search method and apparatus

Country Status (3)

Country Link
US (1) US10558694B2 (zh)
CN (1) CN105159938B (zh)
WO (1) WO2017020454A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177023A (zh) * 2021-04-19 2021-07-27 杭州海康威视系统技术有限公司 Log retrieval method and apparatus, and electronic device

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
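CN109829098A (zh) * 2017-08-28 2019-05-31 广东神马搜索科技有限公司 Search result optimization method, apparatus, and server
JP7069615B2 (ja) * 2017-09-26 2022-05-18 カシオ計算機株式会社 Information processing system, electronic device, information processing method, and program
CN110363605B (zh) * 2018-04-10 2024-07-26 北京京东尚科信息技术有限公司 Information search method and apparatus, and computer-readable storage medium
CN111241379B (zh) * 2018-11-28 2023-04-25 阿里巴巴集团控股有限公司 Search result processing method and apparatus, electronic device, and computer-readable medium
CN110245357B (zh) * 2019-06-26 2023-05-02 北京百度网讯科技有限公司 Main entity recognition method and apparatus
US11429879B2 (en) * 2020-05-12 2022-08-30 Ubs Business Solutions Ag Methods and systems for identifying dynamic thematic relationships as a function of time
CN112685540A (zh) * 2021-01-07 2021-04-20 深圳市欢太科技有限公司 Search method and apparatus, storage medium, and terminal
CN113806519A (zh) * 2021-09-24 2021-12-17 金蝶软件(中国)有限公司 Search recall method, apparatus, and medium
CN115033747B (zh) * 2022-06-24 2023-05-30 北京百度网讯科技有限公司 Method and apparatus for retrieving abnormal states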

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
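CN102012915A (zh) * 2010-11-22 2011-04-13 百度在线网络技术(北京)有限公司 Keyword recommendation method and system for a document sharing platform
CN102073684A (zh) * 2010-12-22 2011-05-25 百度在线网络技术(北京)有限公司 Method and apparatus for mining search logs, and method and apparatus for page search
CN102637171A (zh) * 2011-02-10 2012-08-15 北京百度网讯科技有限公司 Method and apparatus for optimizing search results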
US20130282754A1 (en) * 2012-04-23 2013-10-24 Estsoft Corp. System and method for extracting analogous queries

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8572069B2 (en) * 1999-03-31 2013-10-29 Apple Inc. Semi-automatic index term augmentation in document retrieval
US20090265328A1 (en) * 2008-04-16 2009-10-22 Yahoo! Inc. Predicting newsworthy queries using combined online and offline models
US8412699B1 (en) * 2009-06-12 2013-04-02 Google Inc. Fresh related search suggestions
US20120166439A1 (en) * 2010-12-28 2012-06-28 Yahoo! Inc. Method and system for classifying web sites using query-based web site models
CN102866992B (zh) * 2011-07-04 2015-12-02 阿里巴巴集团控股有限公司 Method and apparatus for displaying product information in a web page
CN103136210A (zh) * 2011-11-23 2013-06-05 北京百度网讯科技有限公司 Method and apparatus for mining queries with similar needs
CN102609458B (zh) * 2012-01-12 2015-08-05 北京搜狗信息服务有限公司 Picture recommendation method and apparatus
CN104216995B (zh) * 2014-09-10 2018-03-06 北京金山安全软件有限公司 Information processing method and apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177023A (zh) * 2021-04-19 2021-07-27 杭州海康威视系统技术有限公司 Log retrieval method and apparatus, and electronic device
CN113177023B (zh) * 2021-04-19 2023-07-25 杭州海康威视系统技术有限公司 Log retrieval method and apparatus, and electronic device

Also Published As

Publication number Publication date
CN105159938A (zh) 2015-12-16
US10558694B2 (en) 2020-02-11
US20180137195A1 (en) 2018-05-17
CN105159938B (zh) 2018-11-30

Similar Documents

Publication Publication Date Title
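WO2017020454A1 (zh) Search method and apparatus
US11544459B2 (en) Method and apparatus for determining feature words and server
KR100544514B1 (ko) Method and system for determining search query relevance
JP4494632B2 (ja) Information retrieval and speech recognition based on language models
US7769751B1 (en) Method and apparatus for classifying documents based on user inputs
WO2018157789A1 (zh) Speech recognition method, computer, storage medium, and electronic apparatus
WO2017091985A1 (zh) Stop word identification method and apparatus
US11526512B1 (en) Rewriting queries
WO2021051599A1 (zh) Method, apparatus, and device for locally optimizing keywords, and storage medium
CN113660541A (zh) Method and apparatus for generating summaries of news videos
TWI681304B (zh) System and method for adaptively adjusting related search words
CN109977397B (zh) News hotspot extraction method and system based on part-of-speech combinations, and storage medium
CN107239455B (zh) Core word identification method and apparatus
US9965766B2 (en) Method to expand seed keywords into a relevant social query
CN109344397B (zh) Method and apparatus for extracting text feature words, storage medium, and program product
US9953652B1 (en) Selective generalization of search queries
WO2023016267A1 (zh) Spam comment identification method, apparatus, device, and medium
CN113157946B (zh) Entity linking method and apparatus, electronic device, and storage medium
KR101614551B1 (ko) Keyword extraction system and method using category matching
US11314794B2 (en) System and method for adaptively adjusting related search words
CN114444491A (zh) New word identification method and apparatus
CN113268987B (zh) Entity name recognition method and apparatus, electronic device, and storage medium
CN118427308B (zh) Cloud computing-based document data detection method
JP7305077B2 (ja) Information processing apparatus, summary output method, and summary output program
KR100525616B1 (ko) Method and system for extracting related search queries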

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15900224

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 15534373

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15900224

Country of ref document: EP

Kind code of ref document: A1