WO2017193865A1 - Information search method and apparatus (一种信息搜索方法及装置) - Google Patents

Information search method and apparatus

Info

Publication number
WO2017193865A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
event
keyword
keyword group
search
Prior art date
Application number
PCT/CN2017/083032
Other languages
English (en)
French (fr)
Inventor
叶新
李前令
王刚
Original Assignee
广州神马移动信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州神马移动信息科技有限公司
Publication of WO2017193865A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • the present invention relates to the field of Internet communication technologies, and in particular, to an information search method and apparatus.
  • A search engine needs to search for the information a user requires according to the keyword group input by the user.
  • The related art provides an information search method that includes: querying and acquiring information matching the keyword group input by a user to obtain an information search result; calculating the relevance of each piece of information in the information search result to the keyword group; sorting all the information in the information search result by relevance; and transmitting the sorted information search result to the user.
  • However, searching directly according to the keyword group input by the user may return only a small amount of information, and it is quite possible that the information the user really needs cannot be found, which leads to low accuracy of the information search.
  • An object of the present invention is to provide an information search method and apparatus that correct the types of the keywords in a keyword group input by a user when the number of pieces of information found directly with that keyword group is small, so that the corrected keyword group is more in line with the user's search intent. Re-searching according to the corrected keyword group can increase the number of pieces of information found and improve the accuracy of the information search.
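  • In a first aspect, an embodiment of the present invention provides an information search method, where the method includes:
  • obtaining, according to a received keyword group, an information search result corresponding to the keyword group; determining, according to quality information of the information search result, whether a re-search condition is satisfied; and, if so, correcting the type of the keyword in the keyword group and obtaining the information search result corresponding to the corrected keyword group.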
  • the embodiment of the present invention provides the first possible implementation manner of the foregoing first aspect, wherein the quality information includes the number of pieces of information included in the information search result and the degree of matching between each piece of information and the keyword group.
  • the embodiment of the present invention provides the second possible implementation manner of the foregoing first aspect, wherein the correcting the type of the keyword in the keyword group includes:
  • the type of the keyword in the keyword group is corrected according to the necessary coefficient corresponding to the keyword of the necessary type.
  • the embodiment of the present invention provides a third possible implementation manner of the foregoing first aspect, wherein obtaining, according to the keyword group, an information event that meets the search intent condition from a pre-established information event library includes:
  • An information event having a degree of correlation with the keyword group greater than a preset relevance is determined as an information event that meets a search intent condition.
  • the embodiment of the present invention provides the fourth possible implementation manner of the foregoing first aspect, wherein separately calculating the relevance between each obtained information event and the keyword group includes:
  • the embodiment of the present invention provides the fifth possible implementation manner of the foregoing first aspect, wherein obtaining, according to the keyword group, an information event that meets the search intent condition from a pre-established information event library includes:
  • the two information events are determined as information events that meet the search intent condition.
  • the embodiment of the present invention provides the sixth possible implementation manner of the foregoing first aspect, wherein calculating the relevance between any two of the obtained information events includes:
  • the embodiment of the present invention provides the seventh possible implementation manner of the foregoing first aspect, wherein determining, according to the information event that meets the search intent condition, the necessary coefficient corresponding to the keyword of the necessary type includes:
  • the necessary coefficients corresponding to the keywords of the necessary type are calculated according to the determined number of documents included in the information event.
  • the embodiment of the present invention provides the eighth possible implementation manner of the foregoing first aspect, wherein correcting the type of the keyword in the keyword group according to the necessary coefficient corresponding to the keyword of the necessary type includes:
  • if the non-essential word set does not include all the necessary-type keywords of the keyword group, the type of the keywords in the non-essential word set is corrected to the non-essential type; if it does, the correction of the type of the keyword in the keyword group is stopped.
  • the embodiment of the present invention provides the ninth possible implementation manner of the foregoing first aspect, wherein, before obtaining, according to the keyword group, an information event that meets the search intent condition from the pre-established information event library, the method further includes:
  • the captured information documents are clustered into multiple information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords;
  • the information event library is established according to the plurality of information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
  • the embodiment of the present invention provides the tenth possible implementation manner of the foregoing first aspect, wherein obtaining, according to the keyword group, an information event that meets a preset keyword coverage condition from a pre-established information event library includes:
  • determining whether the number of keywords included in the keyword group is less than a preset number; if so, obtaining, from the pre-established information event library, the information events whose corresponding event keywords include all the keywords in the keyword group, and determining the obtained information events as information events that meet the preset keyword coverage condition;
  • if not, calculating the number of matching words according to the number of keywords, obtaining, from the pre-established information event library, the information events whose corresponding event keywords include at least that number of the keywords in the keyword group, and determining the obtained information events as information events that meet the preset keyword coverage condition.
  • an embodiment of the present invention provides an information search apparatus, where the apparatus includes:
  • An obtaining module configured to acquire, according to the received keyword group, an information search result corresponding to the keyword group
  • a determining module configured to determine, according to the quality information of the information search result, whether the re-search condition is met
  • a correction module configured to correct a type of the keyword in the keyword group when the determining module determines that the re-search condition is satisfied, and obtain an information search result corresponding to the corrected keyword group.
  • the embodiment of the present invention provides the first possible implementation manner of the foregoing second aspect, wherein the quality information includes the number of pieces of information included in the information search result and the degree of matching between each piece of information and the keyword group; the determining module includes:
  • a statistical unit configured to count the number of information included in the information search result
  • a calculating unit configured to separately calculate a matching degree between each information in the information search result and the keyword group
  • a determining unit configured to determine whether the number of the information is greater than a preset value, and determining, according to the matching degree corresponding to each information, whether the information search result includes information that the matching degree is greater than a preset threshold;
  • a determining unit configured to determine that the re-search condition is satisfied when the number of pieces of information is less than or equal to the preset value or the information search result does not include information whose matching degree is greater than the preset threshold, and otherwise to determine that the re-search condition is not satisfied.
  • the embodiment of the present invention provides the second possible implementation manner of the foregoing second aspect, wherein the correcting module includes:
  • An obtaining unit configured to acquire, according to the keyword group, an information event that meets a search intent condition from a pre-established information event library
  • a first determining unit configured to perform text analysis on the keyword group, and determine a type of each keyword included in the keyword group, where the type of the keyword includes a necessary type and a non-essential type;
  • a second determining unit configured to determine, according to the information event that meets the search intent condition, a necessary coefficient corresponding to a keyword of a necessary type
  • the correcting unit is configured to correct the type of the keyword in the keyword group according to the necessary coefficient corresponding to the keyword of the necessary type.
  • the embodiment of the present invention provides a third possible implementation manner of the foregoing second aspect, where the acquiring unit includes:
  • a first obtaining subunit configured to obtain, according to the keyword group, an information event that meets a preset keyword coverage condition from a pre-established information event library
  • a first calculating subunit configured to separately calculate a correlation between each acquired information event and the keyword group
  • a first determining subunit configured to determine an information event whose correlation with the keyword group is greater than the preset correlation as an information event that meets the search intent condition.
  • the embodiment of the present invention provides the fourth possible implementation manner of the foregoing second aspect, wherein the first calculating subunit is configured to: determine a phrase vector corresponding to the keyword group according to the keywords included in the keyword group; determine an event vector corresponding to each obtained information event according to the event keywords corresponding to that information event; and separately calculate the angle cosine between the event vector corresponding to each information event and the phrase vector corresponding to the keyword group, to obtain the correlation between each information event and the keyword group.
  • the embodiment of the present invention provides the fifth possible implementation manner of the foregoing second aspect, wherein the acquiring unit includes:
  • a second obtaining subunit configured to obtain, according to the keyword group, an information event that meets the preset keyword coverage condition from a pre-established information event library
  • a second calculating subunit configured to calculate a correlation between any two information events in each information event obtained
  • a second determining subunit configured to determine the two information events as information events that meet the search intent condition if the correlation between the two information events is greater than the preset relevance.
  • the embodiment of the present invention provides the sixth possible implementation manner of the foregoing second aspect, wherein the second calculating subunit is configured to: determine an event vector corresponding to each obtained information event according to the event keywords corresponding to that information event; and calculate the angle cosine between the event vectors corresponding to any two of the information events, to obtain the correlation between any two information events.
  • the embodiment of the present invention provides the seventh possible implementation manner of the foregoing second aspect, wherein the second determining unit includes:
  • a third determining subunit configured to determine, from the information event that meets the search intent condition, an information event that matches a keyword of a necessary type
  • a third calculating subunit configured to calculate a necessary coefficient corresponding to the keyword of the necessary type according to the determined number of documents included in the information event.
  • the correcting unit includes:
  • a first determining subunit configured to determine, respectively, whether a necessary coefficient corresponding to each necessary type of keyword included in the keyword group is less than a preset necessary threshold
  • a second determining subunit configured to determine whether the non-essential word set includes all the necessary types of keywords of the keyword group
  • a correcting subunit configured to, if not, correct the type of the keywords in the non-essential word set to the non-essential type, and, if so, stop the correction of the type of the keyword in the keyword group.
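The decision made by the subunits of the correcting unit can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the string labels for keyword types, and the dictionary inputs are hypothetical, and the step that places a keyword into the non-essential word set when its necessary coefficient falls below the threshold is inferred from the surrounding description.

```python
def correct_keyword_types(keyword_types, necessary_coefficients, necessary_threshold):
    """Return a corrected copy of keyword_types (keyword -> type label).

    keyword_types uses the hypothetical labels "necessary" and "non_essential".
    necessary_coefficients maps each necessary-type keyword to its coefficient.
    """
    necessary = {kw for kw, kind in keyword_types.items() if kind == "necessary"}

    # Assumption: a necessary-type keyword whose coefficient is below the
    # preset necessary threshold is collected into the non-essential word set.
    non_essential_set = {
        kw for kw in necessary if necessary_coefficients[kw] < necessary_threshold
    }

    corrected = dict(keyword_types)
    # If the set covers every necessary-type keyword, stop without correcting.
    if non_essential_set == necessary:
        return corrected

    # Otherwise demote only the keywords in the non-essential word set.
    for kw in non_essential_set:
        corrected[kw] = "non_essential"
    return corrected
```

Stopping when every necessary-type keyword would be demoted keeps at least one mandatory keyword in the re-search, which matches the "if so, stop" branch described above.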
  • the embodiment of the present invention provides the ninth possible implementation manner of the foregoing second aspect, wherein the device further includes:
  • An information event library establishing module configured to: crawl information documents by using a web crawler; extract the event keywords in each information document and determine the weights corresponding to the event keywords; cluster the crawled information documents into a plurality of information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords; and establish the information event library according to the plurality of information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
  • the embodiment of the present invention provides the tenth possible implementation manner of the foregoing second aspect, wherein the first obtaining subunit is configured to: determine whether the number of keywords included in the keyword group is less than a preset number; if so, obtain, from the pre-established information event library, the information events whose corresponding event keywords include all the keywords in the keyword group, and determine the obtained information events as information events that meet the preset keyword coverage condition; if not, calculate the number of matching words according to the number of keywords, obtain, from the pre-established information event library, the information events whose corresponding event keywords include at least that number of the keywords in the keyword group, and determine the obtained information events as information events that meet the preset keyword coverage condition.
  • an embodiment of the present invention provides an information search apparatus, where the apparatus includes: a processor, a memory, a bus, and a communication interface, where the processor, the communication interface, and the memory are connected by using the bus;
  • the memory is used to store a program
  • the processor configured to invoke a program stored in the memory by the bus, to perform the method of any of the above.
  • In the embodiments of the present invention, the information search result corresponding to the keyword group is obtained according to the received keyword group; whether the re-search condition is satisfied is determined according to the quality information of the information search result; and, when it is satisfied, the type of the keyword in the keyword group is corrected and the information search result corresponding to the corrected keyword group is obtained.
  • The invention judges whether the re-search condition is satisfied according to the information search result obtained the first time and, when it is satisfied, corrects the type of the keyword in the keyword group input by the user. This greatly reduces the influence of spelling errors, or of words unrelated to the user's search intent, on the information search, and makes the corrected keyword group more in line with the user's search intent.
  • Re-searching according to the corrected keyword group greatly increases the number of pieces of information found, increases the probability of finding the information the user really needs, and improves the accuracy of the information search.
  • FIG. 1A is a flowchart of an information search method according to Embodiment 1 of the present invention.
  • FIG. 1B is a schematic flowchart of a correction keyword group provided by Embodiment 1 of the present invention.
  • FIG. 2 is a schematic structural diagram of an information search apparatus according to Embodiment 2 of the present invention.
  • FIG. 3 is a schematic structural diagram of an information search apparatus according to Embodiment 3 of the present invention.
  • an embodiment of the present invention provides an information search method and apparatus. Description will be made below by way of examples.
  • an embodiment of the present invention provides an information search method.
  • the method specifically includes the following steps:
  • Step 101: Acquire an information search result corresponding to the keyword group according to the received keyword group.
  • the execution subject of the embodiment of the present invention may be a server of a search engine.
  • the terminal involved in the embodiment of the present invention may be, for example, an intelligent electronic device such as a mobile phone or a computer.
  • When a user searches for information through the search engine, the user submits, through the terminal, a keyword group expressing the user's search intent to the server; the keyword group includes one or more keywords.
  • After receiving the keyword group submitted by the user, the server performs text analysis on the keyword group: it performs word segmentation to determine each keyword included in the keyword group, and determines the type of each keyword according to its part of speech and meaning. There are three keyword types: the necessary type, the optional type, and the non-essential type.
  • Keywords of the necessary type are also called AND logic words, which are words that must be included in the search information.
  • For example, if the keyword group is “Shandong Industry”, the two keywords “Shandong” and “Industry” are both very important and are both AND logic words, so the searched information needs to include both keywords.
  • A keyword of the optional type is also called an OR logic word and is an expansion of a keyword.
  • The searched information only needs to include one of the OR logic words.
  • For example, if the keyword group is “Huang Huaweing and Yang Ying” and Yang Ying’s English name is “Angelababy”, the keyword “Yang Ying” is expanded to obtain the keyword “Angelababy”.
  • The keywords “Yang Ying” and “Angelababy” are OR logic words, and the searched information may contain only the keyword “Yang Ying” or only the keyword “Angelababy”.
  • A keyword of the non-essential type is also called a RANK logic word and is a word that the searched information does not have to include. For example, if the keyword group is “Beijing Guoan vs. Tianjin Teda”, the keyword “competition” is the RANK logic word, and the searched information may not include the keyword “competition”.
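The three keyword types described above can be illustrated with a small matching routine. This is a hedged sketch only: the `Keyword` class, the `group` field that ties OR-expansions together, and plain substring matching are all assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class KeywordType(Enum):
    NECESSARY = "AND"       # must appear in the searched information
    OPTIONAL = "OR"         # at least one expansion of the word must appear
    NON_ESSENTIAL = "RANK"  # may be absent from the searched information


@dataclass(frozen=True)
class Keyword:
    text: str
    type: KeywordType
    group: int = 0  # hypothetical id linking OR-expansions of the same word


def matches(document: str, keywords: list[Keyword]) -> bool:
    """Return True if the document satisfies the AND and OR constraints."""
    or_groups: dict[int, bool] = {}
    for kw in keywords:
        present = kw.text in document
        if kw.type is KeywordType.NECESSARY and not present:
            return False  # a missing AND word rules the document out
        if kw.type is KeywordType.OPTIONAL:
            or_groups[kw.group] = or_groups.get(kw.group, False) or present
        # RANK words never exclude a document.
    return all(or_groups.values())
```

For instance, a document mentioning only “Angelababy” satisfies an OR group made of “Yang Ying” and “Angelababy”, while a document missing “Industry” fails a query in which “Shandong” and “Industry” are both AND words.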
  • After the server determines the type of each keyword, it queries the Internet for information matching the keyword group according to the keywords included in the keyword group submitted by the user; the information matching the keyword group should include at least the necessary-type keywords of the keyword group.
  • The server fetches the queried information to the server's local storage and uses all the obtained information as the information search result corresponding to the keyword group.
  • To prevent the information search result obtained in step 101 from containing too little information and lacking the information the user really needs, after the information search result corresponding to the keyword group is obtained in the above manner, the operation of the following step 102 is performed to determine whether a re-search is needed.
  • Step 102: Determine whether the re-search condition is met according to the quality information of the information search result. If yes, execute step 103. If no, send the obtained information search result to the user terminal, and end the operation.
  • the above quality information includes the number of information included in the information search result and the degree of matching between each information and the keyword group.
  • the process of specifically determining whether the re-search condition is satisfied includes:
  • counting the number of pieces of information included in the information search result; separately calculating the matching degree between each piece of information in the information search result and the keyword group; determining whether the number of pieces of information is greater than a preset value, and determining, according to the matching degree corresponding to each piece of information, whether the information search result contains information whose matching degree is greater than a preset threshold. If the number of pieces of information included in the information search result is less than or equal to the preset value, or the information search result does not include information whose matching degree is greater than the preset threshold, it is judged that the re-search condition is satisfied; otherwise, the re-search condition is not satisfied.
  • the degree of matching between the above information and the keyword group is used to indicate the degree of correlation between the content of the information and the keywords included in the keyword group.
  • The preset value may be, for example, 0 or 5, and the preset threshold may be, for example, 3 or 4. The embodiment of the present invention does not specifically limit the values of the preset value and the preset threshold; they may be set according to requirements in practical applications.
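The re-search test of step 102 reduces to two comparisons. The sketch below assumes that the matching degrees have already been computed and are passed in as a list of numbers; the function name and the default values (taken from the example values 5 and 3 mentioned above) are illustrative only.

```python
def needs_re_search(match_degrees, preset_value=5, preset_threshold=3.0):
    """Return True if the re-search condition of step 102 is satisfied.

    match_degrees holds one matching degree per piece of information in the
    information search result.
    """
    count = len(match_degrees)
    has_good_match = any(degree > preset_threshold for degree in match_degrees)
    # Re-search when there are too few results OR none of them matches well.
    return count <= preset_value or not has_good_match
```

An empty result list therefore always triggers a re-search, and a long list triggers one only when no item exceeds the matching threshold.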
  • The quality information of the information search result may further include a quality score corresponding to each piece of information; the quality score may be calculated according to the matching degree between the information and the keyword group and the length and integrity of the information content.
  • In that case, when judging whether the re-search condition is satisfied, the number of pieces of information in the information search result whose quality score is less than a preset score is determined. If that number is greater than a preset number, it is judged that the re-search condition is satisfied; otherwise, it is judged that the re-search condition is not satisfied.
  • If it is judged that the re-search condition is satisfied, the information search needs to be performed again by the operation of the following step 103.
  • If the number of pieces of information is greater than the preset value and the information search result includes information whose matching degree is greater than the preset threshold, the quality of the information search result obtained in step 101 is considered high and the user's search needs are satisfied, so the information search is not performed again; the obtained information search result is sent directly to the user's terminal, and the operation ends.
  • Step 103: Correct the type of the keyword in the keyword group, and obtain the information search result corresponding to the corrected keyword group.
  • The keyword group input by the user may contain a misspelling or a word unrelated to the user's search intent, so that the information search result obtained directly according to the keywords submitted by the user satisfies the re-search condition. It is therefore necessary to correct the type of the keywords in the keyword group submitted by the user to eliminate the adverse effects of misspellings or of words unrelated to the user's search intent.
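  • In the embodiment of the present invention, before the type of the keyword in the keyword group is corrected, an information event library for information query and search is established. The specific establishment process includes:
  • crawling information documents by using a web crawler; extracting the event keywords in each information document and determining the weights corresponding to the event keywords; clustering the crawled information documents into a plurality of information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords; and establishing the information event library according to the plurality of information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.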
  • The above event keyword is a word whose frequency of occurrence in the information document is higher than a preset frequency, and the weight corresponding to the event keyword may be determined according to the frequency of occurrence of the event keyword and the position at which it appears in the information document.
  • The information documents containing the same event keywords are clustered into a document collection, which is the above information event. After the plurality of information events are obtained by clustering in this manner, a mapping relationship between each information event, the event keywords corresponding to that information event, and the weight corresponding to each event keyword is established, and the mapping relationship corresponding to each information event is stored in the information event library.
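The clustering of documents into information events can be sketched as below. This is a simplified illustration under stated assumptions: documents are whitespace-tokenized, the weight of an event keyword is just its raw frequency (the text also allows position to play a role), and the function names and the dictionary layout of the library are hypothetical.

```python
from collections import Counter, defaultdict


def extract_event_keywords(text, preset_frequency=1):
    """Return {word: frequency} for words occurring more than preset_frequency times."""
    counts = Counter(text.lower().split())
    return {word: n for word, n in counts.items() if n > preset_frequency}


def build_event_library(documents, preset_frequency=1):
    """Cluster documents sharing the same event keywords into information events.

    documents maps a document id to its text. Each returned event records its
    member documents and the summed weight of every event keyword.
    """
    clusters = defaultdict(list)
    for doc_id, text in documents.items():
        weights = extract_event_keywords(text, preset_frequency)
        if weights:  # documents without event keywords join no event
            clusters[frozenset(weights)].append((doc_id, weights))

    library = []
    for members in clusters.values():
        total = Counter()
        for _, weights in members:
            total.update(weights)  # adds the per-document frequencies
        library.append({
            "documents": [doc_id for doc_id, _ in members],
            "keyword_weights": dict(total),
        })
    return library
```

Each entry plays the role of one mapping relationship between an information event, its event keywords, and their weights.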
  • S1: Obtain an information event that meets the search intent condition from the pre-established information event library according to the keyword group.
  • the above search intent condition is used to determine whether the acquired information event conforms to the search intent of the user expressed by the keyword group.
  • The search intent condition may be embodied by a preset keyword coverage condition and the correlation between the information event and the keyword group. The preset keyword coverage condition defines the minimum number of keywords of the keyword group that the event keywords corresponding to an acquired information event must include. When an information event meets the preset keyword coverage condition and its correlation with the keyword group is greater than the preset correlation, the information event is considered to meet the search intent condition.
  • the specific process of obtaining the information event that meets the search intent condition includes:
  • obtaining, according to the keyword group, the information events that meet the preset keyword coverage condition from the pre-established information event library; separately calculating the correlation between each obtained information event and the keyword group; and determining an information event whose correlation with the keyword group is greater than the preset correlation as an information event that meets the search intent condition.
  • The preset keyword coverage condition is related to the number of keywords included in the keyword group.
  • When the keyword group includes few keywords, a high keyword coverage is required, that is, the event keywords corresponding to the information event should cover all the keywords in the keyword group as far as possible.
  • When the keyword group includes many keywords, the event keywords corresponding to the information event may cover only some of the keywords in the keyword group.
  • In the embodiment of the present invention, a preset number is set; the preset number may be, for example, 1 or 3.
  • If the number of keywords included in the keyword group is less than the preset number, the keyword group is considered to include few keywords, and high keyword coverage is required.
  • Otherwise, the keyword group is considered to include many keywords, and the keyword coverage requirement is lowered.
  • the information event that meets the preset keyword coverage condition is obtained from the pre-established information event library, and specifically includes:
  • determining whether the number of keywords included in the keyword group is less than the preset number; if so, obtaining, from the pre-established information event library, the information events whose corresponding event keywords include all the keywords in the keyword group, and determining the obtained information events as information events that meet the preset keyword coverage condition; if not, calculating the number of matching words according to the number of keywords, obtaining, from the pre-established information event library, the information events whose corresponding event keywords include at least that number of the keywords in the keyword group, and determining the obtained information events as information events that meet the preset keyword coverage condition.
  • For example, if the number of keywords in the keyword group is 10 and the matching coefficient is 5, the calculated number of matching words is 3, that is, the event keywords corresponding to an information event meeting the preset keyword coverage condition should include at least three keywords of the keyword group.
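The coverage check above can be sketched as follows. The text does not give the formula that turns the number of keywords and the matching coefficient into the number of matching words, so this sketch takes that rule as a caller-supplied function; `meets_coverage` and `matching_word_count` are hypothetical names.

```python
def meets_coverage(query_keywords, event_keywords, preset_number, matching_word_count):
    """Return True if an information event meets the keyword coverage condition.

    matching_word_count is a callable mapping the number of query keywords to
    the required number of matching words (the rule itself is an assumption
    left to the caller).
    """
    query = set(query_keywords)
    covered = len(query & set(event_keywords))
    if len(query) < preset_number:
        # Few keywords: the event keywords must cover all of them.
        return covered == len(query)
    # Many keywords: covering the computed number of matching words suffices.
    return covered >= matching_word_count(len(query))
```

With ten query keywords and a rule that returns 3, an event covering three of the keywords passes and an event covering two does not.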
  • The angle cosine between the event vector corresponding to each information event and the phrase vector corresponding to the keyword group is calculated to obtain the correlation between each information event and the keyword group.
  • the number of keywords included in the keyword group is determined as the number of dimensions of the phrase vector, and the element value in each dimension is the weight of the keyword corresponding to the dimension, and the weight of the keyword may be based on the key
  • the event vector corresponding to the information event is determined by determining the number of event keywords corresponding to the information event as the number of dimensions of the event vector, and the element value in each dimension is the weight of the event keyword corresponding to the dimension.
  • the phrase vector corresponding to the keyword group is V1
  • the information event corresponds to The event vector is V2
  • Besides the correlation between an information event and the keyword group, the information events that meet the search intent condition may also be determined by checking whether the correlation between any two of the information events that meet the preset keyword coverage condition is greater than the preset correlation. The specific determination process includes:
  • Information events that meet the preset keyword coverage condition are obtained from the pre-established information event library; the correlation between any two of the obtained information events is calculated; if the correlation between two information events is greater than the preset correlation, the two information events are determined to be information events that meet the search intent condition.
  • After the information events meeting the search intent condition are acquired in step S1, the types of the keywords in the keyword group submitted by the user are corrected by the following steps S2-S4.
  • S2: Perform text analysis on the keyword group to determine the keywords of the necessary type included in the keyword group.
  • Word segmentation is performed on the keyword group to obtain the keywords it includes, and the part of speech and meaning of each keyword are determined.
  • The part of speech is, for example, noun, verb, or adjective, and the meaning is the specific meaning of the keyword.
  • The keywords of the necessary type included in the keyword group are determined from the part of speech and meaning of each keyword; the part of speech of a keyword of the necessary type is usually a noun.
  • S3: Determine the necessary coefficient corresponding to each keyword of the necessary type according to the information events that meet the search intent condition.
  • The necessary coefficient is the total score obtained by scoring the keyword of the necessary type against each information event that meets the search intent condition.
  • The specific process of determining the necessary coefficient corresponding to a keyword of the necessary type includes:
  • From the information events that meet the search intent condition, the information events matching the keyword of the necessary type are determined; the necessary coefficient corresponding to the keyword is then calculated according to the number of documents included in the determined information events.
  • An information event that matches a keyword of the necessary type is an information event whose corresponding event keywords include that keyword.
  • When the number of documents included in a matching information event is greater than the preset number of documents, the score given to the keyword of the necessary type is the first preset value; when the number of documents is less than or equal to the preset number of documents, the score is the second preset value. The total of the scores over all matching information events is the necessary coefficient of the keyword.
  • The necessary coefficient corresponding to each keyword of the necessary type in the keyword group may be determined in this manner.
  • The specific process of correcting the types of the keywords in the keyword group includes: determining whether the necessary coefficient of each keyword of the necessary type is less than a preset necessary threshold; adding the keywords whose necessary coefficient is below the threshold to a non-essential word set; and determining whether the non-essential word set contains all keywords of the necessary type in the keyword group.
  • A keyword of the necessary type whose necessary coefficient is smaller than the preset necessary threshold is considered to contribute little to expressing the user's search intent, and is added to the non-essential word set.
  • If the non-essential word set contains all keywords of the necessary type in the keyword group, every such keyword is considered to contribute very little to expressing the search intent; that is, the keyword group submitted by the user is itself unclear and insufficient to express the user's search intent, so the correction of the types of the keywords in the keyword group is stopped and the operation ends.
  • In that case the server may further send prompt information to the user's terminal, prompting the user to re-enter a keyword group that better expresses the search intent.
  • If the non-essential word set contains only some of the keywords of the necessary type, the type of those keywords is modified to the non-essential type.
  • When the search is repeated with the corrected keyword group, the acquired information is no longer required to include those keywords. This reduces the number of keywords that the acquired information must include, so the amount of acquired information that matches the user's search intent increases accordingly, eliminating the negative impact of unrelated or misspelled keywords in the keyword group on the search results.
  • The search result obtained by re-searching is sent to the user's terminal, so that the user can browse the information that he or she really needs.
  • In the embodiment of the present invention, the information search result corresponding to the keyword group is obtained according to the received keyword group; whether the re-search condition is satisfied is determined according to the quality information of the information search result; and when the re-search condition is satisfied, the types of the keywords in the keyword group are corrected and the information search result corresponding to the corrected keyword group is obtained.
  • The invention determines, from the information search result obtained the first time, whether the re-search condition is satisfied and, when it is, corrects the types of the keywords in the keyword group entered by the user. This greatly reduces the weight given in the search to misspelled words and to words unrelated to the user's search intent, so that the corrected keyword group better matches the user's search intent.
  • Re-searching with the corrected keyword group greatly increases the amount of information found, raises the probability of finding the information that the user really needs, and improves the accuracy of the information search.
  • An embodiment of the present invention provides an information search apparatus, which is used to execute the information search method provided in Embodiment 1 above.
  • The apparatus specifically includes:
  • The obtaining module 201, configured to obtain, according to the received keyword group, an information search result corresponding to the keyword group;
  • The determining module 202, configured to determine, according to the quality information of the information search result, whether the re-search condition is satisfied;
  • The correcting module 203, configured to correct the types of the keywords in the keyword group when the determining module 202 determines that the re-search condition is satisfied, and to obtain the information search result corresponding to the corrected keyword group.
  • When the determining module 202 determines that the re-search condition is not satisfied, the quality of the information search result obtained by the obtaining module 201 is considered high enough to satisfy the user's search requirement, so the information search is not performed again; the obtained information search result is sent directly to the user's terminal and the operation ends.
  • The quality information includes the number of pieces of information included in the information search result and the matching degree between each piece of information and the keyword group. The determining module 202 determines whether the re-search condition is satisfied by means of the following statistical unit, calculating unit, determining unit, and judging unit.
  • A statistical unit, configured to count the number of pieces of information included in the information search result; a calculating unit, configured to calculate the matching degree between each piece of information in the information search result and the keyword group; a determining unit, configured to determine whether the number of pieces of information is greater than a preset value and, according to the matching degree of each piece of information, whether the information search result includes information whose matching degree is greater than a preset threshold; and a judging unit, configured to judge that the re-search condition is satisfied when the number of pieces of information is less than or equal to the preset value or the information search result includes no information whose matching degree is greater than the preset threshold, and otherwise to judge that the re-search condition is not satisfied.
  • The correcting module 203 corrects the keyword group submitted by the user by means of the following obtaining unit, first determining unit, second determining unit, and correcting unit.
  • An obtaining unit, configured to obtain, according to the keyword group, information events that meet the search intent condition from a pre-established information event library; a first determining unit, configured to perform text analysis on the keyword group and determine the type of each keyword in the keyword group, the keyword types including a necessary type and a non-essential type; a second determining unit, configured to determine the necessary coefficient corresponding to each keyword of the necessary type according to the information events that meet the search intent condition; and a correcting unit, configured to correct the types of the keywords in the keyword group according to the necessary coefficients corresponding to the keywords of the necessary type.
  • The obtaining unit determines the information events that meet the search intent condition by means of the following first obtaining subunit, first calculating subunit, and first determining subunit.
  • A first obtaining subunit, configured to obtain, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library; a first calculating subunit, configured to calculate the correlation between each obtained information event and the keyword group; and a first determining subunit, configured to determine the information events whose correlation with the keyword group is greater than the preset correlation to be information events that meet the search intent condition.
  • The first calculating subunit is configured to determine the phrase vector corresponding to the keyword group according to each keyword included in the keyword group; to determine the event vector corresponding to each information event according to the event keywords corresponding to each obtained information event; and to calculate the cosine of the angle between the event vector corresponding to each information event and the phrase vector corresponding to the keyword group, obtaining the correlation between each information event and the keyword group.
  • The obtaining unit may alternatively determine the information events that meet the search intent condition by means of the following second obtaining subunit, second calculating subunit, and second determining subunit.
  • A second obtaining subunit, configured to obtain, according to the keyword group, information events that meet the preset keyword coverage condition from the pre-established information event library; a second calculating subunit, configured to calculate the correlation between any two of the obtained information events; and a second determining subunit, configured to determine two information events to be information events that meet the search intent condition if the correlation between them is greater than the preset correlation.
  • The second calculating subunit is configured to determine the event vector corresponding to each information event according to the event keywords corresponding to each obtained information event, and to calculate the cosine of the angle between the event vectors corresponding to any two of the information events, obtaining the correlation between any two of the information events.
  • The second determining unit obtains the necessary coefficients corresponding to the keywords of the necessary type by means of the following third determining subunit and third calculating subunit.
  • A third determining subunit, configured to determine, from the information events that meet the search intent condition, the information events that match a keyword of the necessary type; and a third calculating subunit, configured to calculate the necessary coefficient corresponding to the keyword of the necessary type according to the number of documents included in the determined information events.
  • The correcting unit corrects the types of the keywords in the keyword group submitted by the user by means of the following first judging subunit, adding subunit, second judging subunit, and correcting subunit.
  • A first judging subunit, configured to judge whether the necessary coefficient corresponding to each keyword of the necessary type included in the keyword group is less than a preset necessary threshold; an adding subunit, configured to add the keywords of the necessary type whose necessary coefficient is less than the preset necessary threshold to a non-essential word set; a second judging subunit, configured to judge whether the non-essential word set contains all keywords of the necessary type in the keyword group; and a correcting subunit, configured to correct the type of the keywords in the non-essential word set to the non-essential type if not, and to stop correcting the types of the keywords in the keyword group if so.
  • Before the correcting module 203 corrects the types of the keywords in the keyword group submitted by the user, the apparatus further pre-establishes the information event library by means of the following information event library establishing module.
  • An information event library establishing module, configured to crawl information documents with a web crawler; extract the event keywords in each information document and determine the weight corresponding to each event keyword; cluster the crawled information documents into multiple information events according to the event keywords corresponding to each information document and the weights of those event keywords; and establish the information event library according to the multiple information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
  • The first obtaining subunit is configured to determine whether the number of keywords included in the keyword group is less than a preset number. If so, it obtains from the pre-established information event library the information events whose corresponding event keywords include all keywords in the keyword group, and determines the obtained information events to be information events that meet the preset keyword coverage condition. If not, it calculates the number of matching words from the number of keywords, obtains from the pre-established information event library the information events whose corresponding event keywords include at least that number of keywords from the keyword group, and determines the obtained information events to be information events that meet the preset keyword coverage condition.
  • An embodiment of the present invention provides an information search apparatus, which is used to execute the information search method provided in Embodiment 1 above.
  • The apparatus specifically includes: a processor 301, a memory 302, a bus 303, and a communication interface 304.
  • The processor 301, the communication interface 304, and the memory 302 are connected by the bus 303.
  • The memory 302 is used to store a program.
  • The processor 301 is configured to invoke, via the bus 303, the program stored in the memory 302 to execute the information search method provided in Embodiment 1.
  • When performing the information search method provided in Embodiment 1, the processor 301 obtains the information search result corresponding to the keyword group according to the received keyword group; determines whether the re-search condition is satisfied according to the quality information of the information search result; and, when the re-search condition is satisfied, corrects the types of the keywords in the keyword group and obtains the information search result corresponding to the corrected keyword group.
  • The information search apparatus provided by the embodiment of the present invention may be specific hardware on a device, or software or firmware installed on a device. Those skilled in the art will understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments.
  • The disclosed apparatus and method can be implemented in other ways.
  • The apparatus embodiments described above are merely illustrative.
  • The division into units is only a division by logical function, and other divisions are possible in an actual implementation.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, apparatus, or unit, and may be electrical, mechanical, or of another form.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • On this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions.
  • The instructions cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

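The present invention provides an information search method and apparatus. The method includes: obtaining, according to a received keyword group, an information search result corresponding to the keyword group; determining, according to quality information of the information search result, whether a re-search condition is satisfied; and, when the re-search condition is determined to be satisfied, correcting the types of the keywords in the keyword group and obtaining the information search result corresponding to the corrected keyword group. The invention determines from the first information search result whether the re-search condition is satisfied and, when it is, corrects the keyword group entered by the user so that the corrected keyword group better matches the user's search intent. Searching again with the corrected keyword group improves the accuracy of the information search.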

Description

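An Information Search Method and Apparatus
Technical Field
The present invention relates to the field of Internet communication technologies, and in particular to an information search method and apparatus.
Background
At present, users often search for information through search engines. When a user enters a keyword group into a search engine, the search engine needs to find the information the user needs according to that keyword group.
The related art provides an information search method that includes: querying and acquiring the information that matches the keyword group entered by the user to obtain an information search result; calculating the correlation between each piece of information in the result and the keyword group; sorting all the information in the result by correlation; and sending the sorted information search result to the user.
However, when the keyword group entered by the user contains spelling errors or words unrelated to the user's search intent, searching with that keyword group returns very little information and may well miss the information the user really needs, so the accuracy of the information search is low.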
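Summary of the Invention
An object of the embodiments of the present invention is to provide an information search method and apparatus that, when a direct search with the keyword group entered by the user returns little information, correct the types of the keywords in that keyword group so that the corrected keyword group better matches the user's search intent. Searching again with the corrected keyword group can increase the amount of information found and improve the accuracy of the information search.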
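In a first aspect, an embodiment of the present invention provides an information search method, the method including:
obtaining, according to a received keyword group, an information search result corresponding to the keyword group;
determining, according to quality information of the information search result, whether a re-search condition is satisfied;
when the re-search condition is determined to be satisfied, correcting the types of the keywords in the keyword group, and obtaining an information search result corresponding to the corrected keyword group.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which the quality information includes the number of pieces of information contained in the information search result and the matching degree between each piece of information and the keyword group; determining, according to the quality information of the information search result, whether the re-search condition is satisfied includes:
counting the number of pieces of information included in the information search result;
calculating the matching degree between each piece of information in the information search result and the keyword group;
determining whether the number of pieces of information is greater than a preset value, and determining, according to the matching degree of each piece of information, whether the information search result contains information whose matching degree is greater than a preset threshold;
when the number of pieces of information is less than or equal to the preset value, or the information search result contains no information whose matching degree is greater than the preset threshold, determining that the re-search condition is satisfied; otherwise, determining that the re-search condition is not satisfied.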
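With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which correcting the types of the keywords in the keyword group includes:
obtaining, according to the keyword group, information events that meet a search intent condition from a pre-established information event library;
performing text analysis on the keyword group to determine the type of each keyword included in the keyword group, the keyword types including a necessary type and a non-essential type;
determining, according to the information events that meet the search intent condition, the necessary coefficient corresponding to each keyword of the necessary type;
correcting the types of the keywords in the keyword group according to the necessary coefficients corresponding to the keywords of the necessary type.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which obtaining, according to the keyword group, information events that meet the search intent condition from the pre-established information event library includes:
obtaining, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
calculating the correlation between each obtained information event and the keyword group;
determining the information events whose correlation with the keyword group is greater than a preset correlation to be information events that meet the search intent condition.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which calculating the correlation between each obtained information event and the keyword group includes:
determining the phrase vector corresponding to the keyword group according to each keyword included in the keyword group;
determining the event vector corresponding to each information event according to the event keywords corresponding to each obtained information event;
calculating the cosine of the angle between the event vector corresponding to each information event and the phrase vector corresponding to the keyword group, to obtain the correlation between each information event and the keyword group.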
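With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which obtaining, according to the keyword group, information events that meet the search intent condition from the pre-established information event library includes:
obtaining, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
calculating the correlation between any two of the obtained information events;
if the correlation between two information events is greater than a preset correlation, determining the two information events to be information events that meet the search intent condition.
With reference to the fifth possible implementation of the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, in which calculating the correlation between any two of the obtained information events includes:
determining the event vector corresponding to each information event according to the event keywords corresponding to each obtained information event;
calculating the cosine of the angle between the event vectors corresponding to any two of the information events, to obtain the correlation between any two of the information events.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, in which determining, according to the information events that meet the search intent condition, the necessary coefficient corresponding to a keyword of the necessary type includes:
determining, from the information events that meet the search intent condition, the information events that match the keyword of the necessary type;
calculating the necessary coefficient corresponding to the keyword of the necessary type according to the number of documents contained in the determined information events.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides an eighth possible implementation of the first aspect, in which correcting the types of the keywords in the keyword group according to the necessary coefficients corresponding to the keywords of the necessary type includes:
determining, for each keyword of the necessary type included in the keyword group, whether its necessary coefficient is less than a preset necessary threshold;
adding the keywords whose necessary coefficient is less than the preset necessary threshold to a non-essential word set;
determining whether the non-essential word set contains all keywords of the necessary type in the keyword group;
if not, correcting the type of the keywords in the non-essential word set to the non-essential type; if so, stopping the correction of the types of the keywords in the keyword group.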
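With reference to the first aspect, an embodiment of the present invention provides a ninth possible implementation of the first aspect, in which, before obtaining, according to the keyword group, information events that meet the search intent condition from the pre-established information event library, the method further includes:
crawling information documents with a web crawler;
extracting the event keywords in each information document, and determining the weight corresponding to each event keyword;
clustering the crawled information documents into multiple information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords;
establishing the information event library according to the multiple information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a tenth possible implementation of the first aspect, in which obtaining, according to the keyword group, information events that meet the preset keyword coverage condition from the pre-established information event library includes:
determining whether the number of keywords included in the keyword group is less than a preset number;
if so, obtaining from the pre-established information event library the information events whose corresponding event keywords contain all keywords in the keyword group, and determining the obtained information events to be information events that meet the preset keyword coverage condition;
if not, calculating the number of matching words from the number of keywords, obtaining from the pre-established information event library the information events whose corresponding event keywords contain at least that number of keywords from the keyword group, and determining the obtained information events to be information events that meet the preset keyword coverage condition.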
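In a second aspect, an embodiment of the present invention provides an information search apparatus, the apparatus including:
an obtaining module, configured to obtain, according to a received keyword group, an information search result corresponding to the keyword group;
a determining module, configured to determine, according to quality information of the information search result, whether a re-search condition is satisfied;
a correcting module, configured to correct the types of the keywords in the keyword group when the determining module determines that the re-search condition is satisfied, and to obtain an information search result corresponding to the corrected keyword group.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the quality information includes the number of pieces of information contained in the information search result and the matching degree between each piece of information and the keyword group; the determining module includes:
a statistical unit, configured to count the number of pieces of information included in the information search result;
a calculating unit, configured to calculate the matching degree between each piece of information in the information search result and the keyword group;
a determining unit, configured to determine whether the number of pieces of information is greater than a preset value, and to determine, according to the matching degree of each piece of information, whether the information search result contains information whose matching degree is greater than a preset threshold;
a judging unit, configured to judge that the re-search condition is satisfied when the number of pieces of information is less than or equal to the preset value or the information search result contains no information whose matching degree is greater than the preset threshold, and otherwise to judge that the re-search condition is not satisfied.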
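With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the correcting module includes:
an obtaining unit, configured to obtain, according to the keyword group, information events that meet a search intent condition from a pre-established information event library;
a first determining unit, configured to perform text analysis on the keyword group and determine the type of each keyword included in the keyword group, the keyword types including a necessary type and a non-essential type;
a second determining unit, configured to determine, according to the information events that meet the search intent condition, the necessary coefficient corresponding to each keyword of the necessary type;
a correcting unit, configured to correct the types of the keywords in the keyword group according to the necessary coefficients corresponding to the keywords of the necessary type.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the obtaining unit includes:
a first obtaining subunit, configured to obtain, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
a first calculating subunit, configured to calculate the correlation between each obtained information event and the keyword group;
a first determining subunit, configured to determine the information events whose correlation with the keyword group is greater than a preset correlation to be information events that meet the search intent condition.
With reference to the third possible implementation of the second aspect, an embodiment of the present invention provides a fourth possible implementation of the second aspect, in which the first calculating subunit is configured to determine the phrase vector corresponding to the keyword group according to each keyword included in the keyword group; determine the event vector corresponding to each information event according to the event keywords corresponding to each obtained information event; and calculate the cosine of the angle between the event vector corresponding to each information event and the phrase vector corresponding to the keyword group, to obtain the correlation between each information event and the keyword group.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a fifth possible implementation of the second aspect, in which the obtaining unit includes:
a second obtaining subunit, configured to obtain, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
a second calculating subunit, configured to calculate the correlation between any two of the obtained information events;
a second determining subunit, configured to determine two information events to be information events that meet the search intent condition if the correlation between them is greater than a preset correlation.
With reference to the fifth possible implementation of the second aspect, an embodiment of the present invention provides a sixth possible implementation of the second aspect, in which the second calculating subunit is configured to determine the event vector corresponding to each information event according to the event keywords corresponding to each obtained information event, and to calculate the cosine of the angle between the event vectors corresponding to any two of the information events, to obtain the correlation between any two of the information events.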
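With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a seventh possible implementation of the second aspect, in which the second determining unit includes:
a third determining subunit, configured to determine, from the information events that meet the search intent condition, the information events that match a keyword of the necessary type;
a third calculating subunit, configured to calculate the necessary coefficient corresponding to the keyword of the necessary type according to the number of documents contained in the determined information events.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides an eighth possible implementation of the second aspect, in which the correcting unit includes:
a first judging subunit, configured to judge, for each keyword of the necessary type included in the keyword group, whether its necessary coefficient is less than a preset necessary threshold;
an adding subunit, configured to add the necessary keywords whose necessary coefficient is less than the preset necessary threshold to a non-essential word set;
a second judging subunit, configured to judge whether the non-essential word set contains all keywords of the necessary type in the keyword group;
a correcting subunit, configured to correct the type of the keywords in the non-essential word set to the non-essential type if not, and to stop correcting the types of the keywords in the keyword group if so.
With reference to the second aspect, an embodiment of the present invention provides a ninth possible implementation of the second aspect, in which the apparatus further includes:
an information event library establishing module, configured to crawl information documents with a web crawler; extract the event keywords in each information document and determine the weight corresponding to each event keyword; cluster the crawled information documents into multiple information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords; and establish the information event library according to the multiple information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
With reference to the third possible implementation of the second aspect, an embodiment of the present invention provides a tenth possible implementation of the second aspect, in which the first obtaining subunit is configured to determine whether the number of keywords included in the keyword group is less than a preset number; if so, to obtain from the pre-established information event library the information events whose corresponding event keywords contain all keywords in the keyword group, and determine the obtained information events to be information events that meet the preset keyword coverage condition; if not, to calculate the number of matching words from the number of keywords, obtain from the pre-established information event library the information events whose corresponding event keywords contain at least that number of keywords from the keyword group, and determine the obtained information events to be information events that meet the preset keyword coverage condition.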
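In a third aspect, an embodiment of the present invention provides an information search apparatus, the apparatus including a processor, a memory, a bus, and a communication interface, the processor, the communication interface, and the memory being connected by the bus;
the memory is configured to store a program;
the processor is configured to invoke, via the bus, the program stored in the memory to execute the method according to any one of the above.
In the method and apparatus provided by the embodiments of the present invention, an information search result corresponding to a received keyword group is obtained; whether a re-search condition is satisfied is determined according to the quality information of the information search result; and, when the re-search condition is determined to be satisfied, the types of the keywords in the keyword group are corrected and the information search result corresponding to the corrected keyword group is obtained. The invention determines from the first information search result whether the re-search condition is satisfied and, when it is, corrects the types of the keywords in the keyword group entered by the user. This greatly reduces the weight given in the search to misspelled words and to words unrelated to the user's search intent, so that the corrected keyword group better matches the user's search intent. Searching again with the corrected keyword group greatly increases the amount of information found, raises the probability of finding the information the user really needs, and improves the accuracy of the information search.
To make the above objects, features, and advantages of the present invention more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.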
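Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1A is a flowchart of an information search method provided by Embodiment 1 of the present invention;
FIG. 1B is a schematic flowchart of correcting a keyword group provided by Embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of an information search apparatus provided by Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of an information search apparatus provided by Embodiment 3 of the present invention.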
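Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention generally described and shown in the drawings herein may be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by a person skilled in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
When a user searches for information through a search engine, the keyword group entered by the user may contain spelling errors or words unrelated to the user's search intent. The related art searches only with the keyword group as entered, so little information is obtained, the information the user really needs may well be missed, and the accuracy of the information search is low. On this basis, the embodiments of the present invention provide an information search method and apparatus, described below by way of embodiments.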
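<Embodiment 1>
Referring to FIG. 1A, an embodiment of the present invention provides an information search method. The method specifically includes the following steps:
Step 101: Obtain, according to the received keyword group, the information search result corresponding to the keyword group.
The embodiment of the present invention may be executed by a server of a search engine. The terminal involved may be, for example, a smart electronic device such as a mobile phone or a computer. When a user searches for information through the search engine, the user submits to the server, through the terminal, a keyword group that expresses the user's search intent; the keyword group includes one or more keywords. After receiving the keyword group, the server performs text analysis on it: it segments the keyword group into words to identify the individual keywords, and determines the type of each keyword from its part of speech and meaning. There are three keyword types: the necessary type, the optional type, and the non-essential type.
A keyword of the necessary type, also called an AND logic word, is a word that the retrieved information must contain. For example, for the keyword group "Shandong industry" ("山东工业"), the two keywords "Shandong" and "industry" are both important and are both AND logic words; the retrieved information must contain both of them.
A keyword of the optional type, also called an OR logic word, is an expansion of some keyword; the retrieved information only needs to contain one of the OR logic words. For example, for the keyword group "Huang Xiaoming and Yang Ying" ("黄晓明和杨颖"), Yang Ying's English name is "Angelababy", so the keyword "Yang Ying" is expanded with the keyword "Angelababy". The keywords "Yang Ying" and "Angelababy" are OR logic words; the retrieved information may contain only "Yang Ying" or only "Angelababy".
A keyword of the non-essential type, also called a RANK logic word, is a word that the retrieved information need not contain. For example, for the keyword group "Beijing Guoan versus Tianjin Teda" ("北京国安对战天津泰达"), the keyword "versus" ("对战") is a RANK logic word; the retrieved information need not contain it.
After determining the type of each keyword, the server queries the Internet for information that matches the keyword group, based on the keywords it contains. Information that matches the keyword group should contain at least every keyword of the necessary type and one keyword from the keywords of the optional type. The server retrieves the queried information to local storage and takes all of it as the information search result corresponding to the keyword group.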
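The matching rule for the three keyword types can be illustrated with a short Python sketch. The `Keyword` class, the `matches` function, and the plain substring test are hypothetical names and simplifications introduced here for illustration; the patent does not specify how matching is implemented.

```python
from dataclasses import dataclass, field
from typing import List

# Labels for the three keyword types described in the text.
AND, OR, RANK = "necessary", "optional", "non-essential"


@dataclass
class Keyword:
    text: str
    kind: str  # AND, OR or RANK
    # Expansions of an OR word, e.g. an alternative name (illustrative field).
    alternatives: List[str] = field(default_factory=list)


def matches(document: str, keywords: List[Keyword]) -> bool:
    """Return True if the document contains every AND word and,
    for each OR word, the word itself or one of its expansions."""
    for kw in keywords:
        if kw.kind == AND and kw.text not in document:
            return False
        if kw.kind == OR and not any(
            t in document for t in [kw.text, *kw.alternatives]
        ):
            return False
        # RANK words never exclude a document.
    return True
```

Under these assumptions, a document that mentions "Huang Xiaoming" and "Angelababy" matches a group in which "Huang Xiaoming" is an AND word and "Yang Ying" is an OR word expanded with "Angelababy".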
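The information search result obtained in step 101 may contain too little information and lack what the user really needs. To guard against this, after the information search result corresponding to the keyword group is obtained as described above, step 102 below determines whether the search needs to be repeated.
Step 102: Determine, according to the quality information of the information search result, whether the re-search condition is satisfied. If so, perform step 103; if not, send the obtained information search result to the user's terminal and end the operation.
The quality information includes the number of pieces of information contained in the information search result and the matching degree between each piece of information and the keyword group. The specific process of determining whether the re-search condition is satisfied includes:
counting the number of pieces of information included in the information search result; calculating the matching degree between each piece of information in the result and the keyword group; determining whether the number of pieces of information is greater than a preset value and, according to the matching degree of each piece of information, whether the result contains information whose matching degree is greater than a preset threshold. If the number of pieces of information in the result is less than or equal to the preset value, or the result contains no information whose matching degree is greater than the preset threshold, the re-search condition is determined to be satisfied; otherwise, it is determined not to be satisfied.
The matching degree between a piece of information and the keyword group indicates how relevant the content of the information is to the keywords in the keyword group. The preset value may be, for example, 0 or 5, and the preset threshold may be, for example, 3 or 4. The embodiment of the present invention does not limit the specific values of the preset value and the preset threshold; in practice they can be set as required.
In the embodiment of the present invention, the quality information of the information search result may also include a quality score for each piece of information. The quality score may be calculated from the matching degree between the information and the keyword group and from the length and completeness of the information content. When determining whether the re-search condition is satisfied, the number of pieces of information in the result whose quality score is less than a preset score is determined. If that number is greater than a preset count, the re-search condition is determined to be satisfied; otherwise, it is determined not to be satisfied.
When the re-search condition is determined to be satisfied, the information search result obtained in step 101 is considered to contain too little information, or information of too poor quality, to meet the user's search requirement, so the information search is repeated through step 103 below. When the number of pieces of information in the obtained result is greater than the preset value and the result contains information whose matching degree is greater than the preset threshold, the quality of the result obtained in step 101 is considered high enough to meet the user's search requirement, so the search is not repeated; the obtained information search result is sent directly to the user's terminal and the operation ends.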
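The two variants of the re-search test in step 102 can be sketched in Python as follows. The function names, parameter names, and default values are assumptions chosen for illustration (the defaults 5 and 3 are taken from the example values mentioned in the text); how matching degrees and quality scores are computed is outside the sketch.

```python
from typing import Sequence


def needs_research(match_degrees: Sequence[float],
                   preset_value: int = 5,
                   preset_threshold: float = 3.0) -> bool:
    """First variant: re-search when there are too few results, or when
    no result has a matching degree above the preset threshold."""
    too_few = len(match_degrees) <= preset_value
    no_good_match = not any(d > preset_threshold for d in match_degrees)
    return too_few or no_good_match


def needs_research_by_quality(quality_scores: Sequence[float],
                              preset_score: float,
                              preset_count: int) -> bool:
    """Second variant: re-search when more than preset_count results
    have a quality score below preset_score."""
    low_quality = sum(1 for s in quality_scores if s < preset_score)
    return low_quality > preset_count
```

Note that an empty result list satisfies the first variant on both counts, which is consistent with a preset value of 0 meaning "re-search only when nothing was found".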
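Step 103: Correct the types of the keywords in the keyword group, and obtain the information search result corresponding to the corrected keyword group.
When the re-search condition is determined to be satisfied, the keyword group entered by the user is considered to contain spelling errors or words unrelated to the user's search intent, which is why the information search result obtained directly from the submitted keywords satisfies the re-search condition. The types of the keywords in the submitted keyword group therefore need to be corrected, to eliminate the adverse effect of misspelled words and of words unrelated to the user's search intent.
In the embodiment of the present invention, before the types of the keywords in the keyword group are corrected, an information event library for information query and search is established. The specific establishment process includes:
crawling information documents with a web crawler; extracting the event keywords in each information document and determining the weight corresponding to each event keyword; clustering the crawled information documents into multiple information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords; and establishing the information event library according to the multiple information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
An event keyword is a word whose frequency of occurrence in an information document is higher than a preset frequency. The weight of an event keyword may be determined from its frequency of occurrence and from the positions at which it appears in the information document. Information documents that contain the same event keywords are clustered into one document set, and that document set is an information event. After multiple information events are obtained by clustering in this way, a mapping is established for each information event between the information event, its event keywords, and the weight of each event keyword, and the mapping for each information event is stored in the information event library.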
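A minimal Python sketch of building such an event library is given below. It assumes the documents have already been crawled and tokenized, uses raw frequency as the weight (ignoring word position), and sums per-document weights within an event; all three are simplifying assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def extract_event_keywords(tokens: List[str], preset_freq: int) -> Dict[str, float]:
    """Keep the words that occur more often than preset_freq.
    The weight is simply the frequency (position is ignored here)."""
    counts = Counter(tokens)
    return {word: float(n) for word, n in counts.items() if n > preset_freq}


def build_event_library(docs: Dict[str, List[str]],
                        preset_freq: int = 1) -> Dict[FrozenSet[str], dict]:
    """Cluster documents with identical event-keyword sets into one
    information event, keyed by that keyword set."""
    library: Dict[FrozenSet[str], dict] = {}
    for doc_id, tokens in docs.items():
        keywords = extract_event_keywords(tokens, preset_freq)
        if not keywords:
            continue  # no word is frequent enough to describe an event
        event = library.setdefault(
            frozenset(keywords), {"docs": [], "weights": Counter()}
        )
        event["docs"].append(doc_id)
        event["weights"].update(keywords)  # sum per-document weights (assumption)
    return library
```

Each library entry then holds the mapping the text describes: the event (its document set), its event keywords (the dictionary key), and a weight per event keyword.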
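As shown in FIG. 1B, after the information event library has been established in advance as described above, the types of the keywords in the keyword group are corrected through the following steps S1-S4:
S1: Obtain, according to the keyword group, the information events that meet the search intent condition from the pre-established information event library.
The search intent condition is used to decide whether an obtained information event matches the user's search intent as expressed by the keyword group. In the embodiment of the present invention, the search intent condition can be embodied by a preset keyword coverage condition together with the correlation between the information event and the keyword group. The preset keyword coverage condition sets the minimum number of keywords of the keyword group that the event keywords of an obtained information event must contain. Only when an information event meets the preset keyword coverage condition and its correlation with the keyword group is also greater than a preset correlation is the information event considered to meet the search intent condition.
The specific process of obtaining the information events that meet the search intent condition includes:
obtaining, according to the keyword group, information events that meet the preset keyword coverage condition from the pre-established information event library; calculating the correlation between each obtained information event and the keyword group; and determining the information events whose correlation with the keyword group is greater than the preset correlation to be information events that meet the search intent condition.
The preset keyword coverage condition depends on the number of keywords in the keyword group. When the keyword group contains few keywords, the obtained information events need a high keyword coverage in order to match the user's search intent as fully and accurately as possible; that is, the event keywords of an information event should cover all keywords of the keyword group as far as possible. When the keyword group contains many keywords, it is likely to contain redundant information and spelling errors, so the keyword coverage can be lowered appropriately; that is, the event keywords of an obtained information event may cover only some of the keywords of the keyword group.
In the embodiment of the present invention, a preset number is set, which may be, for example, 1 or 3. When the number of keywords in the keyword group is less than the preset number, the keyword group is considered to contain few keywords and a high keyword coverage is required. When the number of keywords is greater than or equal to the preset number, the keyword group is considered to contain many keywords, so the keyword coverage is lowered.
Obtaining the information events that meet the preset keyword coverage condition from the pre-established information event library specifically includes:
determining whether the number of keywords included in the keyword group is less than the preset number; if so, obtaining from the pre-established information event library the information events whose corresponding event keywords contain all keywords in the keyword group, and determining the obtained information events to be information events that meet the preset keyword coverage condition; if not, calculating the number of matching words from the number of keywords, obtaining from the pre-established information event library the information events whose corresponding event keywords contain at least that number of keywords from the keyword group, and determining the obtained information events to be information events that meet the preset keyword coverage condition.
The embodiment of the present invention defines the number of matching words as: number of matching words = (number of keywords + matching coefficient) / matching coefficient, where the matching coefficient is a preset constant such as 4 or 5. For example, if the number of keywords in the keyword group is 10 and the matching coefficient is 5, the calculated number of matching words is 3; that is, the event keywords of an information event that meets the preset keyword coverage condition should contain at least 3 keywords of the keyword group.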
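The coverage condition and the matching-word formula can be sketched in Python as follows. The function names and the representation of the event library as a mapping from event id to a set of event keywords are assumptions for illustration. The sketch also assumes integer division, since the text gives only the example (10 + 5) / 5 = 3 and does not say how non-integer quotients are rounded.

```python
from typing import Dict, List, Set


def matching_word_count(num_keywords: int, coefficient: int = 5) -> int:
    # Integer (floor) division is an assumption; the text only gives
    # an example in which the quotient is exact.
    return (num_keywords + coefficient) // coefficient


def covering_events(keywords: List[str],
                    library: Dict[str, Set[str]],
                    preset_number: int = 3,
                    coefficient: int = 5) -> List[str]:
    """Return the ids of the events that meet the keyword coverage condition."""
    kws = set(keywords)
    if len(kws) < preset_number:
        required = len(kws)  # short query: every keyword must be covered
    else:
        required = matching_word_count(len(kws), coefficient)
    return [eid for eid, event_kws in library.items()
            if len(kws & event_kws) >= required]
```

With the defaults above, a two-keyword query must be fully covered, while a four-keyword query needs only (4 + 5) // 5 = 1 keyword in common with an event.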
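After the information events meeting the preset keyword coverage condition have been obtained, the relevance between each obtained information event and the keyword group is calculated as follows:
determining a phrase vector corresponding to the keyword group according to each keyword in the keyword group; determining an event vector for each obtained information event according to its event keywords; and calculating the cosine of the angle between each event vector and the phrase vector, to obtain the relevance between each information event and the keyword group.
When the phrase vector is determined, the number of keywords in the keyword group is taken as the number of dimensions of the phrase vector, and the element in each dimension is the weight of the keyword corresponding to that dimension. The weight of a keyword may be determined by its type. For example, assume that a keyword of the necessary type has weight 2, a keyword of the optional type has weight 1, and a keyword of the non-necessary type has weight 0. If the keyword group is "山东工业" ("Shandong industry") and both "山东" ("Shandong") and "工业" ("industry") are necessary keywords, the phrase vector of the keyword group is V1 = [2, 2].
Similarly, for the event vector of an information event, the number of event keywords of the information event is taken as the number of dimensions, and the element in each dimension is the weight of the event keyword corresponding to that dimension. Assuming that the phrase vector of the keyword group is V1 and the event vector of the information event is V2, the relevance between the information event and the keyword group = cos(angle between V1 and V2) = (V1 · V2) / (|V1| × |V2|).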
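A short sketch of the relevance computation follows. The text gives the two vectors different dimension counts, and a cosine needs both in one space, so this sketch assumes sparse vectors keyed by keyword, with a missing keyword treated as weight 0. `TYPE_WEIGHT`, `phrase_vector`, and `cosine` are hypothetical names, and the event weights are made-up illustration values.

```python
import math

# Type weights taken from the example in the text.
TYPE_WEIGHT = {"necessary": 2, "optional": 1, "non-necessary": 0}


def phrase_vector(keyword_types):
    """Map {keyword: type} to a sparse phrase vector {keyword: weight}."""
    return {k: TYPE_WEIGHT[t] for k, t in keyword_types.items()}


def cosine(v1, v2):
    """Cosine of the angle between two sparse vectors {keyword: weight}."""
    dot = sum(w * v2.get(k, 0.0) for k, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (n1 * n2)


v1 = phrase_vector({"山东": "necessary", "工业": "necessary"})  # {山东: 2, 工业: 2}
same_topic = {"山东": 1.0, "工业": 1.0}   # hypothetical event weights
half_match = {"山东": 1.0, "旅游": 1.0}   # hypothetical event weights
```

Here `cosine(v1, same_topic)` is approximately 1 and `cosine(v1, half_match)` is approximately 0.5, so a preset relevance between those values would keep the first event and reject the second.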
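In this embodiment, besides using the relevance between an information event and the keyword group, the information events that meet the search intent condition may also be determined by checking whether the relevance between any two of the information events that meet the preset keyword coverage condition is greater than the preset relevance. The process includes:
obtaining, according to the keyword group, the information events that meet the preset keyword coverage condition from the pre-established information event library; calculating the relevance between any two of the obtained information events; and, if the relevance between two information events is greater than the preset relevance, determining those two information events as information events that meet the search intent condition.
The process of obtaining the information events that meet the preset keyword coverage condition has been described above and is not repeated here. The relevance between any two information events is calculated as follows:
determining an event vector for each obtained information event according to its event keywords; and calculating the cosine of the angle between the event vectors of any two of the obtained information events, to obtain the relevance between those two information events.
The process of determining the event vector of an information event and the calculation of the cosine of the angle have both been described above and are not repeated here.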
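The pairwise variant can be sketched as follows, again assuming sparse event vectors keyed by keyword. The function names and the threshold value are illustrative assumptions.

```python
import math
from itertools import combinations


def cosine(v1, v2):
    """Cosine of the angle between two sparse vectors {keyword: weight}."""
    dot = sum(w * v2.get(k, 0.0) for k, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def select_by_pairwise_relevance(event_vectors, preset_relevance):
    """Keep every event that is in at least one pair above the threshold."""
    selected = set()
    for (i, a), (j, b) in combinations(enumerate(event_vectors), 2):
        if cosine(a, b) > preset_relevance:
            selected.update((i, j))
    return [event_vectors[i] for i in sorted(selected)]


events = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0}, {"c": 1.0}]
kept = select_by_pairwise_relevance(events, preset_relevance=0.8)
```

The comparison is quadratic in the number of candidate events, which is acceptable only because the coverage filter has already reduced the candidate set.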
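After the information events meeting the search intent condition have been obtained in step S1, the types of the keywords in the keyword group submitted by the user are corrected through the following steps S2-S4.
S2: Perform text analysis on the keyword group to determine the keywords of the necessary type in the keyword group.
Word segmentation is performed on the keyword group to obtain its individual keywords, and the part of speech and meaning of each keyword are determined. The part of speech may be noun, verb, adjective, and so on; the meaning is the specific sense of the keyword. The keywords of the necessary type are determined from the part of speech and meaning of each keyword; a keyword of the necessary type is usually a noun.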
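S3: Determine the necessity coefficient of each keyword of the necessary type according to the information events that meet the search intent condition.
The necessity coefficient is the total score obtained when each information event that meets the search intent condition scores the keyword of the necessary type. The process of determining the necessity coefficient includes:
determining, from the information events that meet the search intent condition, the information events that match the keyword of the necessary type; and calculating the necessity coefficient of the keyword according to the number of documents contained in the determined information events.
An information event matches a keyword of the necessary type if its event keywords contain that keyword. When a matching information event contains more documents than a preset number of documents, it scores the keyword with a first preset value; when it contains a number of documents less than or equal to the preset number, it scores the keyword with a second preset value. After every matching information event has scored the keyword, the accumulated total score is the necessity coefficient of that keyword.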
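The scoring rule can be sketched as below. The text does not give the first and second preset values or the preset document count, so the defaults here are placeholders chosen only for illustration, and the event layout (`keywords`, `docs`) is a hypothetical structure.

```python
def necessity_coefficient(keyword, events, preset_doc_count=100,
                          first_value=2.0, second_value=1.0):
    """Sum the scores given to `keyword` by every matching information event.

    events: list of dicts with "keywords" (iterable) and "docs" (list).
    """
    total = 0.0
    for event in events:
        if keyword not in event["keywords"]:
            continue  # the event does not match this keyword
        if len(event["docs"]) > preset_doc_count:
            total += first_value   # large event: first preset value
        else:
            total += second_value  # small event: second preset value
    return total


events = [
    {"keywords": {"quake", "japan"}, "docs": ["d1", "d2", "d3"]},
    {"keywords": {"quake"}, "docs": ["d4"]},
]
score = necessity_coefficient("quake", events, preset_doc_count=2)
```

In this toy input the first event exceeds the document threshold and the second does not, so `score` is the first value plus the second value.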
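The necessity coefficient of every keyword of the necessary type in the keyword group can be determined in this way.
S4: Correct the types of the keywords in the keyword group according to the necessity coefficients of the keywords of the necessary type.
The process of correcting the types of the keywords in the keyword group includes:
determining, for each keyword of the necessary type in the keyword group, whether its necessity coefficient is less than a preset necessity threshold; adding each keyword of the necessary type whose necessity coefficient is less than the preset necessity threshold to a non-necessary word set; determining whether the non-necessary word set contains all the keywords of the necessary type in the keyword group; if not, correcting the type of the keywords in the non-necessary word set to the non-necessary type; and if so, stopping the correction of the types of the keywords in the keyword group.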
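A keyword of the necessary type whose necessity coefficient is less than the preset necessity threshold is considered to contribute little to expressing the user's search intent, and is added to the non-necessary word set. After all keywords of the necessary type have been checked, it is determined whether the non-necessary word set contains all the keywords of the necessary type in the keyword group. If so, all the keywords of the necessary type are considered to contribute little to expressing the user's search intent; that is, the keyword group submitted by the user is itself unclear and insufficient to express the search intent. The correction of the types of the keywords in the keyword group is therefore stopped and the operation ends.
In addition, in this embodiment, when the non-necessary word set contains all the keywords of the necessary type in the keyword group, the server may also send the user's terminal a prompt to re-enter the keyword group, so that the user enters a keyword group that better expresses the search intent.
If the non-necessary word set contains only some of the keywords of the necessary type in the keyword group, the type of those keywords is changed to the non-necessary type. When the information search is performed again with the corrected keyword group, the retrieved information is no longer required to contain those keywords. This reduces the number of keywords that the retrieved information must contain, so the amount of retrieved information that matches the user's search intent increases accordingly, and the negative effect of irrelevant or misspelled keywords on the search result is eliminated.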
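The correction step S4 can be sketched as follows. The type labels, the function name, and the use of `None` to signal "stop and prompt the user" are assumptions made for this illustration.

```python
def correct_keyword_types(keyword_types, coefficients, necessity_threshold):
    """Demote weak necessary keywords, or return None if all of them are weak.

    keyword_types: {keyword: "necessary" | "optional" | "non-necessary"}
    coefficients:  {keyword: necessity coefficient}
    """
    necessary = [k for k, t in keyword_types.items() if t == "necessary"]
    # The non-necessary word set: necessary keywords below the threshold.
    weak = {k for k in necessary if coefficients.get(k, 0.0) < necessity_threshold}
    if necessary and len(weak) == len(necessary):
        # Every necessary keyword is weak: the query itself is unclear,
        # so correction stops and the caller may prompt for re-entry.
        return None
    return {k: ("non-necessary" if k in weak else t)
            for k, t in keyword_types.items()}


types = {"山东": "necessary", "工业": "necessary", "的": "non-necessary"}
corrected = correct_keyword_types(types, {"山东": 5.0, "工业": 0.5}, 1.0)
unclear = correct_keyword_types(types, {"山东": 0.2, "工业": 0.5}, 1.0)
```

Returning a new dictionary rather than mutating the input keeps the original keyword group available, which is convenient if the caller wants to fall back to the first search result.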
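As shown in FIG. 1A, in this embodiment, after the search has been performed again with the corrected keyword group, the information search result of the new search is sent to the user's terminal, so that the user can browse the information actually needed.
In this embodiment of the present invention, the information search result corresponding to a received keyword group is obtained; whether the re-search condition is satisfied is judged according to the quality information of that information search result; and when the re-search condition is satisfied, the types of the keywords in the keyword group are corrected and the information search result corresponding to the corrected keyword group is obtained. The present invention judges whether the re-search condition is satisfied from the first information search result and, when it is, corrects the types of the keywords in the keyword group input by the user. This greatly reduces the influence that misspelled words, or words unrelated to the user's search intent, have on the search, so that the corrected keyword group better matches the user's search intent. Searching again with the corrected keyword group increases the amount of information found, raises the probability of finding the information the user actually needs, and improves the accuracy of the information search.
<Embodiment 2>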
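Referring to FIG. 2, this embodiment of the present invention provides an information search apparatus configured to perform the information search method provided in Embodiment 1. The apparatus specifically includes:
an obtaining module 201, configured to obtain, according to a received keyword group, the information search result corresponding to the keyword group;
a judging module 202, configured to judge, according to the quality information of the information search result, whether the re-search condition is satisfied; and
a correction module 203, configured to, when the judging module 202 judges that the re-search condition is satisfied, correct the types of the keywords in the keyword group and obtain the information search result corresponding to the corrected keyword group.
When the judging module 202 judges that the re-search condition is not satisfied, the information search result obtained by the obtaining module 201 is considered to be of high quality and able to meet the user's search needs. No new search is performed; the obtained information search result is sent directly to the user's terminal and the operation ends.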
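In this embodiment, the quality information includes the number of pieces of information contained in the information search result and the degree of matching between each piece of information and the keyword group. The judging module 202 judges whether the re-search condition is satisfied by means of the following counting unit, calculating unit, determining unit, and judging unit.
The counting unit is configured to count the number of pieces of information in the information search result. The calculating unit is configured to calculate the degree of matching between each piece of information in the information search result and the keyword group. The determining unit is configured to determine whether the number of pieces of information is greater than a preset value and, according to the degree of matching of each piece of information, whether the information search result contains information whose degree of matching is greater than a preset threshold. The judging unit is configured to judge that the re-search condition is satisfied when the number of pieces of information is less than or equal to the preset value, or when the information search result contains no information whose degree of matching is greater than the preset threshold; otherwise, it judges that the re-search condition is not satisfied.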
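The decision made by the judging module can be sketched in a few lines. The default values for the preset value and the preset threshold are placeholders, since the text does not specify them, and the function name is hypothetical.

```python
def needs_re_search(match_scores, preset_value=10, preset_threshold=0.5):
    """Return True when the re-search condition is satisfied.

    match_scores: degree of matching of each result to the keyword group.
    """
    too_few = len(match_scores) <= preset_value
    no_good_match = not any(s > preset_threshold for s in match_scores)
    return too_few or no_good_match
```

For example, eleven results that all score 0.9 do not trigger a new search, while ten such results, or twenty results that all score 0.1, do.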
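The correction module 203 corrects the keyword group submitted by the user by means of the following obtaining unit, first determining unit, second determining unit, and correction unit.
The obtaining unit is configured to obtain, according to the keyword group, information events that meet the search intent condition from the pre-established information event library. The first determining unit is configured to perform text analysis on the keyword group and determine the type of each keyword in the keyword group, where the types of keywords include the necessary type and the non-necessary type. The second determining unit is configured to determine the necessity coefficient of each keyword of the necessary type according to the information events that meet the search intent condition. The correction unit is configured to correct the types of the keywords in the keyword group according to the necessity coefficients of the keywords of the necessary type.
The obtaining unit determines the information events that meet the search intent condition by means of a first obtaining subunit, a first calculating subunit, and a first determining subunit.
The first obtaining subunit is configured to obtain, according to the keyword group, information events that meet the preset keyword coverage condition from the pre-established information event library. The first calculating subunit is configured to calculate the relevance between each obtained information event and the keyword group. The first determining subunit is configured to determine the information events whose relevance to the keyword group is greater than the preset relevance as the information events that meet the search intent condition.
The first calculating subunit is configured to determine the phrase vector corresponding to the keyword group according to each keyword in the keyword group; determine an event vector for each obtained information event according to its event keywords; and calculate the cosine of the angle between each event vector and the phrase vector, to obtain the relevance between each information event and the keyword group.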
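In this embodiment, the obtaining unit may also determine the information events that meet the search intent condition by means of the following second obtaining subunit, second calculating subunit, and second determining subunit.
The second obtaining subunit is configured to obtain, according to the keyword group, information events that meet the preset keyword coverage condition from the pre-established information event library. The second calculating subunit is configured to calculate the relevance between any two of the obtained information events. The second determining subunit is configured to, if the relevance between two information events is greater than the preset relevance, determine those two information events as information events that meet the search intent condition.
The second calculating subunit is configured to determine an event vector for each obtained information event according to its event keywords, and to calculate the cosine of the angle between the event vectors of any two of the obtained information events, to obtain the relevance between those two information events.
In this embodiment, the second determining unit obtains the necessity coefficient of each keyword of the necessary type by means of the following third determining subunit and third calculating subunit.
The third determining subunit is configured to determine, from the information events that meet the search intent condition, the information events that match a keyword of the necessary type. The third calculating subunit is configured to calculate the necessity coefficient of the keyword of the necessary type according to the number of documents contained in the determined information events.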
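The correction unit corrects the types of the keywords in the keyword group submitted by the user by means of the following first judging subunit, adding subunit, second judging subunit, and correction subunit.
The first judging subunit is configured to determine, for each keyword of the necessary type in the keyword group, whether its necessity coefficient is less than the preset necessity threshold. The adding subunit is configured to add each keyword of the necessary type whose necessity coefficient is less than the preset necessity threshold to the non-necessary word set. The second judging subunit is configured to determine whether the non-necessary word set contains all the keywords of the necessary type in the keyword group. The correction subunit is configured to, if not, correct the type of the keywords in the non-necessary word set to the non-necessary type and, if so, stop the correction of the types of the keywords in the keyword group.
In this embodiment, before the correction module 203 corrects the types of the keywords in the keyword group submitted by the user, the apparatus also establishes the information event library in advance by means of the following information event library establishing module.
The information event library establishing module is configured to capture information documents by means of a web crawler; extract event keywords from each information document and determine the weight of each event keyword; cluster the captured information documents into a plurality of information events according to the event keywords of each information document and their weights; and establish the information event library according to the plurality of information events, the event keywords corresponding to each information event, and the weights of those event keywords.
In this embodiment, the first obtaining subunit is configured to determine whether the number of keywords in the keyword group is less than the preset number; if so, obtain from the pre-established information event library the information events whose event keywords contain all the keywords in the keyword group, and determine the obtained information events as meeting the preset keyword coverage condition; if not, calculate a number of matching words from the number of keywords, obtain from the library the information events whose event keywords contain at least that number of the keywords in the keyword group, and determine the obtained information events as meeting the preset keyword coverage condition.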
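In this embodiment of the present invention, the information search result corresponding to a received keyword group is obtained; whether the re-search condition is satisfied is judged according to the quality information of that information search result; and when the re-search condition is satisfied, the types of the keywords in the keyword group are corrected and the information search result corresponding to the corrected keyword group is obtained. The present invention judges whether the re-search condition is satisfied from the first information search result and, when it is, corrects the types of the keywords in the keyword group input by the user. This greatly reduces the influence that misspelled words, or words unrelated to the user's search intent, have on the search, so that the corrected keyword group better matches the user's search intent. Searching again with the corrected keyword group increases the amount of information found, raises the probability of finding the information the user actually needs, and improves the accuracy of the information search.
<Embodiment 3>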
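Referring to FIG. 3, this embodiment of the present invention provides an information search apparatus configured to perform the information search method provided in Embodiment 1. The apparatus specifically includes a processor 301, a memory 302, a bus 303, and a communication interface 304, where the processor 301, the communication interface 304, and the memory 302 are connected through the bus 303;
the memory 302 is configured to store a program; and
the processor 301 is configured to call, through the bus 303, the program stored in the memory 302 and perform the information search method provided in Embodiment 1.
When performing the information search method provided in Embodiment 1, the processor 301 obtains, according to a received keyword group, the information search result corresponding to the keyword group; judges, according to the quality information of the information search result, whether the re-search condition is satisfied; and, when the re-search condition is satisfied, corrects the types of the keywords in the keyword group and obtains the information search result corresponding to the corrected keyword group.
The details of how the processor 301 performs the method provided in Embodiment 1 are the same as those described in Embodiment 1 and are not repeated here.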
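In this embodiment of the present invention, the information search result corresponding to a received keyword group is obtained; whether the re-search condition is satisfied is judged according to the quality information of that information search result; and when the re-search condition is satisfied, the types of the keywords in the keyword group are corrected and the information search result corresponding to the corrected keyword group is obtained. The present invention judges whether the re-search condition is satisfied from the first information search result and, when it is, corrects the types of the keywords in the keyword group input by the user. This greatly reduces the influence that misspelled words, or words unrelated to the user's search intent, have on the search, so that the corrected keyword group better matches the user's search intent. Searching again with the corrected keyword group increases the amount of information found, raises the probability of finding the information the user actually needs, and improves the accuracy of the information search.
The information search apparatus provided by the embodiments of the present invention may be specific hardware on a device, or software or firmware installed on a device. A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments.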
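In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely a specific implementation of the present invention, but the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be subject to the scope of protection of the claims.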

Claims (23)

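  1. An information search method, characterized in that the method comprises:
    obtaining, according to a received keyword group, an information search result corresponding to the keyword group;
    judging, according to quality information of the information search result, whether a re-search condition is satisfied; and
    when the re-search condition is judged to be satisfied, correcting the types of the keywords in the keyword group, and obtaining an information search result corresponding to the corrected keyword group.
  2. The method according to claim 1, characterized in that the quality information comprises the number of pieces of information contained in the information search result and the degree of matching between each piece of information and the keyword group;
    and the judging, according to the quality information of the information search result, whether the re-search condition is satisfied comprises:
    counting the number of pieces of information in the information search result;
    calculating the degree of matching between each piece of information in the information search result and the keyword group;
    determining whether the number of pieces of information is greater than a preset value and, according to the degree of matching of each piece of information, whether the information search result contains information whose degree of matching is greater than a preset threshold; and
    when the number of pieces of information is less than or equal to the preset value, or the information search result contains no information whose degree of matching is greater than the preset threshold, judging that the re-search condition is satisfied, and otherwise judging that the re-search condition is not satisfied.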
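  3. The method according to claim 1 or 2, characterized in that the correcting the types of the keywords in the keyword group comprises:
    obtaining, according to the keyword group, information events that meet a search intent condition from a pre-established information event library;
    performing text analysis on the keyword group to determine the type of each keyword in the keyword group, the types of keywords comprising a necessary type and a non-necessary type;
    determining, according to the information events that meet the search intent condition, a necessity coefficient of each keyword of the necessary type; and
    correcting the types of the keywords in the keyword group according to the necessity coefficients of the keywords of the necessary type.
  4. The method according to claim 3, characterized in that the obtaining, according to the keyword group, information events that meet the search intent condition from the pre-established information event library comprises:
    obtaining, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
    calculating the relevance between each obtained information event and the keyword group; and
    determining the information events whose relevance to the keyword group is greater than a preset relevance as the information events that meet the search intent condition.
  5. The method according to claim 4, characterized in that the calculating the relevance between each obtained information event and the keyword group comprises:
    determining a phrase vector corresponding to the keyword group according to each keyword in the keyword group;
    determining an event vector for each obtained information event according to its event keywords; and
    calculating the cosine of the angle between each event vector and the phrase vector, to obtain the relevance between each information event and the keyword group.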
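  6. The method according to claim 3, characterized in that the obtaining, according to the keyword group, information events that meet the search intent condition from the pre-established information event library comprises:
    obtaining, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
    calculating the relevance between any two of the obtained information events; and
    if the relevance between two information events is greater than a preset relevance, determining the two information events as information events that meet the search intent condition.
  7. The method according to claim 6, characterized in that the calculating the relevance between any two of the obtained information events comprises:
    determining an event vector for each obtained information event according to its event keywords; and
    calculating the cosine of the angle between the event vectors of any two of the obtained information events, to obtain the relevance between the two information events.
  8. The method according to any one of claims 3-7, characterized in that the determining, according to the information events that meet the search intent condition, the necessity coefficient of each keyword of the necessary type comprises:
    determining, from the information events that meet the search intent condition, the information events that match the keyword of the necessary type; and
    calculating the necessity coefficient of the keyword of the necessary type according to the number of documents contained in the determined information events.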
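  9. The method according to any one of claims 3-8, characterized in that the correcting the types of the keywords in the keyword group according to the necessity coefficients of the keywords of the necessary type comprises:
    determining, for each keyword of the necessary type in the keyword group, whether its necessity coefficient is less than a preset necessity threshold;
    adding the keywords whose necessity coefficient is less than the preset necessity threshold to a non-necessary word set;
    determining whether the non-necessary word set contains all the keywords of the necessary type in the keyword group; and
    if not, correcting the type of the keywords in the non-necessary word set to the non-necessary type, and if so, stopping the correction of the types of the keywords in the keyword group.
  10. The method according to any one of claims 3-9, characterized in that, before the obtaining, according to the keyword group, information events that meet the search intent condition from the pre-established information event library, the method further comprises:
    capturing information documents by means of a web crawler;
    extracting event keywords from each information document, and determining the weight of each event keyword;
    clustering the captured information documents into a plurality of information events according to the event keywords of each information document and the weights of the event keywords; and
    establishing the information event library according to the plurality of information events, the event keywords corresponding to each information event, and the weights of the event keywords.
  11. The method according to any one of claims 4-7, characterized in that the obtaining, according to the keyword group, information events that meet the preset keyword coverage condition from the pre-established information event library comprises:
    determining whether the number of keywords in the keyword group is less than a preset number;
    if so, obtaining from the pre-established information event library the information events whose event keywords contain all the keywords in the keyword group, and determining the obtained information events as information events that meet the preset keyword coverage condition; and
    if not, calculating a number of matching words from the number of keywords, obtaining from the pre-established information event library the information events whose event keywords contain at least the number of matching words of the keywords in the keyword group, and determining the obtained information events as information events that meet the preset keyword coverage condition.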
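  12. An information search apparatus, characterized in that the apparatus comprises:
    an obtaining module, configured to obtain, according to a received keyword group, an information search result corresponding to the keyword group;
    a judging module, configured to judge, according to quality information of the information search result, whether a re-search condition is satisfied; and
    a correction module, configured to, when the judging module judges that the re-search condition is satisfied, correct the types of the keywords in the keyword group and obtain an information search result corresponding to the corrected keyword group.
  13. The apparatus according to claim 12, characterized in that the quality information comprises the number of pieces of information contained in the information search result and the degree of matching between each piece of information and the keyword group; and the judging module comprises:
    a counting unit, configured to count the number of pieces of information in the information search result;
    a calculating unit, configured to calculate the degree of matching between each piece of information in the information search result and the keyword group;
    a determining unit, configured to determine whether the number of pieces of information is greater than a preset value and, according to the degree of matching of each piece of information, whether the information search result contains information whose degree of matching is greater than a preset threshold; and
    a judging unit, configured to judge that the re-search condition is satisfied when the number of pieces of information is less than or equal to the preset value, or when the information search result contains no information whose degree of matching is greater than the preset threshold, and otherwise to judge that the re-search condition is not satisfied.
  14. The apparatus according to claim 12 or 13, characterized in that the correction module comprises:
    an obtaining unit, configured to obtain, according to the keyword group, information events that meet a search intent condition from a pre-established information event library;
    a first determining unit, configured to perform text analysis on the keyword group and determine the type of each keyword in the keyword group, the types of keywords comprising a necessary type and a non-necessary type;
    a second determining unit, configured to determine, according to the information events that meet the search intent condition, a necessity coefficient of each keyword of the necessary type; and
    a correction unit, configured to correct the types of the keywords in the keyword group according to the necessity coefficients of the keywords of the necessary type.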
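  15. The apparatus according to claim 14, characterized in that the obtaining unit comprises:
    a first obtaining subunit, configured to obtain, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
    a first calculating subunit, configured to calculate the relevance between each obtained information event and the keyword group; and
    a first determining subunit, configured to determine the information events whose relevance to the keyword group is greater than a preset relevance as the information events that meet the search intent condition.
  16. The apparatus according to claim 15, characterized in that the first calculating subunit is configured to determine a phrase vector corresponding to the keyword group according to each keyword in the keyword group; determine an event vector for each obtained information event according to its event keywords; and calculate the cosine of the angle between each event vector and the phrase vector, to obtain the relevance between each information event and the keyword group.
  17. The apparatus according to claim 14, characterized in that the obtaining unit comprises:
    a second obtaining subunit, configured to obtain, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
    a second calculating subunit, configured to calculate the relevance between any two of the obtained information events; and
    a second determining subunit, configured to, if the relevance between two information events is greater than a preset relevance, determine the two information events as information events that meet the search intent condition.
  18. The apparatus according to claim 17, characterized in that the second calculating subunit is configured to determine an event vector for each obtained information event according to its event keywords, and to calculate the cosine of the angle between the event vectors of any two of the obtained information events, to obtain the relevance between the two information events.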
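  19. The apparatus according to any one of claims 14-18, characterized in that the second determining unit comprises:
    a third determining subunit, configured to determine, from the information events that meet the search intent condition, the information events that match the keyword of the necessary type; and
    a third calculating subunit, configured to calculate the necessity coefficient of the keyword of the necessary type according to the number of documents contained in the determined information events.
  20. The apparatus according to any one of claims 14-19, characterized in that the correction unit comprises:
    a first judging subunit, configured to determine, for each keyword of the necessary type in the keyword group, whether its necessity coefficient is less than a preset necessity threshold;
    an adding subunit, configured to add the necessary keywords whose necessity coefficient is less than the preset necessity threshold to a non-necessary word set;
    a second judging subunit, configured to determine whether the non-necessary word set contains all the keywords of the necessary type in the keyword group; and
    a correction subunit, configured to, if not, correct the type of the keywords in the non-necessary word set to the non-necessary type and, if so, stop the correction of the types of the keywords in the keyword group.
  21. The apparatus according to any one of claims 14-20, characterized in that the apparatus further comprises:
    an information event library establishing module, configured to capture information documents by means of a web crawler; extract event keywords from each information document and determine the weight of each event keyword; cluster the captured information documents into a plurality of information events according to the event keywords of each information document and the weights of the event keywords; and establish the information event library according to the plurality of information events, the event keywords corresponding to each information event, and the weights of the event keywords.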
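  22. The apparatus according to any one of claims 15-22, characterized in that the first obtaining subunit is configured to determine whether the number of keywords in the keyword group is less than a preset number; if so, obtain from the pre-established information event library the information events whose event keywords contain all the keywords in the keyword group, and determine the obtained information events as information events that meet the preset keyword coverage condition; and if not, calculate a number of matching words from the number of keywords, obtain from the pre-established information event library the information events whose event keywords contain at least the number of matching words of the keywords in the keyword group, and determine the obtained information events as information events that meet the preset keyword coverage condition.
  23. An information search apparatus, characterized in that the apparatus comprises a processor, a memory, a bus, and a communication interface, the processor, the communication interface, and the memory being connected through the bus;
    the memory is configured to store a program; and
    the processor is configured to call, through the bus, the program stored in the memory and perform the method according to any one of claims 1-11.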
PCT/CN2017/083032 2016-05-09 2017-05-04 Information search method and apparatus WO2017193865A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610304432.0 2016-05-09
CN201610304432.0A CN105930505A (zh) 2016-05-09 2016-05-09 一种信息搜索方法及装置

Publications (1)

Publication Number Publication Date
WO2017193865A1 true WO2017193865A1 (zh) 2017-11-16

Family

ID=56835385

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/083032 WO2017193865A1 (zh) 2016-05-09 2017-05-04 一种信息搜索方法及装置

Country Status (2)

Country Link
CN (1) CN105930505A (zh)
WO (1) WO2017193865A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532393A (zh) * 2019-09-03 2019-12-03 腾讯科技(深圳)有限公司 文本处理方法、装置及其智能电子设备
CN110827108A (zh) * 2018-08-13 2020-02-21 阿里巴巴集团控股有限公司 信息搜索方法、搜索请求控制方法及系统

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930505A (zh) * 2016-05-09 2016-09-07 广州神马移动信息科技有限公司 一种信息搜索方法及装置
CN111177735B (zh) * 2019-07-30 2023-09-22 腾讯科技(深圳)有限公司 身份认证方法、装置、系统和设备以及存储介质
CN111259209B (zh) * 2020-01-10 2023-12-29 平安科技(深圳)有限公司 基于人工智能的用户意图预测方法、电子装置及存储介质
CN112379904B (zh) * 2020-11-16 2022-06-07 福建多多云科技有限公司 一种基于云手机的应用自动更新方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101206672A (zh) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 商品搜索无结果智能处理系统及方法
CN103336765A (zh) * 2013-06-20 2013-10-02 上海大学 一种文本关键词的马尔可夫矩阵离线修正方法
CN103530344A (zh) * 2013-10-09 2014-01-22 上海大学 一种基于改进的tf-idf方法的检索词实时修正方法
CN103838735A (zh) * 2012-11-21 2014-06-04 大连灵动科技发展有限公司 一种提高检索效率和质量的数据检索方法
CN105930505A (zh) * 2016-05-09 2016-09-07 广州神马移动信息科技有限公司 一种信息搜索方法及装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8176067B1 (en) * 2010-02-24 2012-05-08 A9.Com, Inc. Fixed phrase detection for search
JP5752073B2 (ja) * 2012-03-16 2015-07-22 三菱電機株式会社 データ修正装置
CN103366003B (zh) * 2013-07-19 2017-03-08 百度在线网络技术(北京)有限公司 基于用户反馈优化搜索结果的方法和设备
CN104036004B (zh) * 2014-06-17 2018-06-19 百度在线网络技术(北京)有限公司 搜索纠错方法和搜索纠错装置

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827108A (zh) * 2018-08-13 2020-02-21 阿里巴巴集团控股有限公司 信息搜索方法、搜索请求控制方法及系统
CN110827108B (zh) * 2018-08-13 2023-05-26 阿里巴巴集团控股有限公司 信息搜索方法、搜索请求控制方法及系统
CN110532393A (zh) * 2019-09-03 2019-12-03 腾讯科技(深圳)有限公司 文本处理方法、装置及其智能电子设备
CN110532393B (zh) * 2019-09-03 2023-09-26 腾讯科技(深圳)有限公司 文本处理方法、装置及其智能电子设备

Also Published As

Publication number Publication date
CN105930505A (zh) 2016-09-07

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17795483

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.03.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17795483

Country of ref document: EP

Kind code of ref document: A1