US8880512B2 - Method, apparatus and system, for rewriting search queries - Google Patents

Method, apparatus and system, for rewriting search queries

Info

Publication number
US8880512B2
US8880512B2 US12/863,482 US86348210A US8880512B2 US 8880512 B2 US8880512 B2 US 8880512B2 US 86348210 A US86348210 A US 86348210A US 8880512 B2 US8880512 B2 US 8880512B2
Authority
US
United States
Prior art keywords
search
present
term
search term
property
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US12/863,482
Other languages
English (en)
Other versions
US20110082860A1 (en)
Inventor
Fei Xing
Jing Dong
Ning Guo
Lei Hou
Qin Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Assigned to ALIBABA GROUP HOLDING LIMITED. Assignment of assignors interest (see document for details). Assignors: DONG, JING; GUO, NING; ZHANG, QIN; HOU, LEI; XING, FEI
Publication of US20110082860A1
Priority to US14/491,566 (US9576054B2)
Application granted
Publication of US8880512B2
Legal status: Active
Adjusted expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • G06F17/30672
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3338Query expansion
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99933Query processing, i.e. searching

Definitions

  • the present disclosure relates to a network data processing field and, more particularly, to search methods, apparatuses and systems.
  • a search engine is in general a system that provides search services to a user after collecting, organizing, and processing information on the Internet based on certain strategies implemented by designated computer programs. From a user's perspective, a search engine provides a web page that includes a search box. After a keyword is inputted in the search box and submitted to the search engine through a browser, the search engine returns a list of information that is relevant to the content inputted by the user. In this sense, the keyword inputted by the user becomes the search term. Specifically, the user searches for relevant contents of interest using the search term.
  • contents relevant to a user-input search term may occasionally be missed during retrieval.
  • search term of a user is “Black Lenovo Thinkpad Laptop X60”
  • the search engine may not find a completely matched search result because the input search keyword is too long. Therefore, the browser returns no search result.
  • search term is modified to be “Thinkpad Lenovo Laptop X60”
  • relevant results may then be retrieved from the search because the inputted search keyword is shorter.
  • rule-based search methods exist in existing technologies. One conventional method first segments the search term.
  • the method concludes certain rules according to the needs. For example, one rule may be that if phrases of two product types are close to each other, the latter has a higher weight. For instance, in “mobile phone charger”, “charger” may have a higher weight.
  • the original search term may be rewritten into a new search term.
  • a server of the search engine may subsequently perform a search using the new search term.
  • when performing a search, the search engine employs a method that rewrites the user-inputted search terms based on certain rules. Because each rewriting operation requires relevant rules that have been set up by staff in advance, and because search terms inputted by users through their respective browsers may be of numerous types, the error rate of this simple rule-based method of rewriting search terms tends to be relatively high. Moreover, because of ambiguity, a result obtained after rewriting a search term may sometimes be inaccurate. Results obtained in a search that is based on an inaccurate rewritten search term may not be what the user wants, thus degrading the user experience of the search engine.
  • the technical problem to be solved in the present disclosure is to provide a search method for solving the problem of inaccurate search results caused by searching after rule-based rewriting of search term in existing technologies, and for further improving relevance and recall rate.
  • the present disclosure further provides a search apparatus to ensure implementation and application of the above method in practice.
  • a search method performs acts including: obtaining, from a database, one or more search term candidates that are relevant to a present search term; retrieving properties of the present search term and the one or more search term candidates, the properties describing respective matching results of the present search term and the one or more search term candidates; determining whether or not the present search term needs to be rewritten based on the matching results; when it is determined that the present search term needs to be rewritten, rewriting, by a data rewriting system, the present search term to provide a rewritten present search term based on the matching results; and performing, by a search engine, a search based on the rewritten present search term.
  • obtaining from a pre-established database one or more search term candidates that are relevant to the present search term further may include obtaining at least two search term candidates from the database.
  • determining whether or not the present search term needs to be rewritten based on the matching results may include: assigning values to the properties based on the matching results, each property having a corresponding property value; processing the property values based on one or more predetermined rules to obtain at least two matching result values corresponding to the at least two search term candidates; and determining whether or not a maximum matching result value of the at least two matching result values is greater than a first threshold.
  • obtaining from a pre-established database one or more search term candidates that are relevant to the present search term may include obtaining one search term candidate from the database.
  • determining whether or not the present search term needs to be rewritten based on the matching results may include: assigning values to the properties of the one search term candidate and the present search term based on the matching results; processing the property values based on one or more predetermined rules to obtain one matching result value corresponding to the one search term candidate; and determining whether or not the matching result value is greater than a first threshold.
  • processing the property values based on one or more predetermined rules may include processing the property values based on a linear weighting approach or converting the property values into the matching result values based on a Maximum Entropy Model.
  • the database may include search results corresponding to historical search terms. Additionally, upon determining that the present search term needs to be rewritten based on the matching results, the method may further include: determining whether or not the search term candidates corresponding to the matching results have corresponding search results; and when it is determined that the search term candidates corresponding to the matching results have corresponding search results, rewriting the present search term based on the matching results.
  • obtaining search term candidates that are relevant to the present search term from a database may include: segmenting the present search term to provide a plurality of child search terms; establishing a respective identifier for each child search term; and performing matching based on the respective identifier of each child search term in the database to obtain search term candidates of the child search terms.
  • retrieving the properties of the present search term and the search term candidates may include: comparing the child search terms with the search term candidates of the child search terms to provide a comparison result; and obtaining the matching results of the child search term and the search term candidates of the child search terms based on the comparison result.
  • the method may further cause a search result to be displayed to a user client.
  • a search apparatus includes: an acquisition module that obtains, from a database, one or more search term candidates that are relevant to a present search term; a property retrieving module that retrieves properties of the present search term and the one or more search term candidates, the properties describing respective matching results of the present search term and the one or more search term candidates; a first determination module that determines whether or not the present search term needs to be rewritten based on the matching results; a rewriting module that rewrites the present search term based on the matching results; and a searching module that performs a search using the rewritten present search term.
  • the first determination module may include: a first value assigning sub-module that assigns values to the properties based on the matching results with each property having a corresponding property value; a first processing sub-module that processes the property values based on one or more predetermined rules to obtain at least two matching result values that correspond to the at least two search term candidates; and a first determination sub-module that determines whether or not a maximum matching result value of the at least two matching result values is greater than a first threshold.
  • the first determination module may include: a second value assigning sub-module that assigns values to the properties of the search term candidate and the present search term based on the matching results; a second processing sub-module that processes the property values based on one or more predetermined rules to obtain a matching result value that corresponds to the search term candidate; and a second determination sub-module that determines whether or not the matching result value is greater than a first threshold.
  • the first processing sub-module or the second processing sub-module may process the property values based on a linear weighting approach or convert the property values into the matching result values based on a Maximum Entropy Model.
  • the apparatus may further include: a second determination module that determines whether or not the search term candidates corresponding to the matching results have corresponding search results; and an execution module that rewrites the present search term based on the matching results when it is determined that the search term candidates corresponding to the matching results have corresponding search results.
  • the acquisition module may include: a word segmenting sub-module that segments the present search term into a plurality of child search terms and establishes a respective identifier for each child search term; and a matching sub-module that obtains the one or more search term candidates from the database based on the identifiers of the child search terms.
  • the acquisition module may further include: a comparison sub-module that compares the child search terms with the one or more search term candidates to provide a comparison result; and a matching result acquisition sub-module that obtains the matching results of the child search terms and the one or more search term candidates based on the comparison result.
  • the apparatus may further include a result displaying module that causes a search result to be displayed to a user client.
  • a search system includes a data rewriting system and a search engine.
  • the data rewriting system may: obtain, from a database, one or more search term candidates that are relevant to a present search term; retrieve properties of the present search term and the one or more search term candidates, the properties describing respective matching results of the present search term and the one or more search term candidates; determine whether or not the present search term needs to be rewritten based on the matching results; and rewrite the present search term based on the matching results to provide a rewritten present search term.
  • the search engine may perform a search based on the rewritten present search term.
  • a series of search term candidates may be matched through a pre-established database on a server end of a search engine.
  • the search term candidates are historical search terms that are relevant to the present search term.
  • Match results of the present search term and the search term candidates are obtained.
  • an optimal search term candidate is found, and the present search term is rewritten. Therefore, the server can use the rewritten present search term as a keyword for the search, which avoids the fixed rules that existing technologies use when rewriting the present search term prior to searching. This reduces the probability of ambiguity in the search process and improves search accuracy.
  • the disclosed method, apparatus and system can further improve relevancy and recall rate of a search result of the present search term. Any product implementing the present disclosure may not necessarily achieve all the above advantages at the same time.
  • FIG. 1 shows a flow chart of a first exemplary search method in accordance with the present disclosure.
  • FIG. 2 shows a flow chart of a second exemplary search method in accordance with the present disclosure.
  • FIG. 3 shows a flow chart of a third exemplary search method in accordance with the present disclosure.
  • FIG. 4 shows a schematic structural diagram of a first exemplary search apparatus in accordance with the present disclosure.
  • FIG. 5 shows a schematic structural diagram of a second exemplary search apparatus in accordance with the present disclosure.
  • FIG. 6 shows a schematic structural diagram of a third exemplary search apparatus in accordance with the present disclosure.
  • FIG. 7 shows a schematic structural diagram of an exemplary search system in accordance with the present disclosure.
  • FIG. 8 shows a schematic structural diagram of an exemplary search system in a practical application in accordance with the present disclosure.
  • the disclosed method, apparatus and system may be used in an environment or in a configuration of universal or specialized computer systems. Examples include a personal computer, a server computer, a handheld device or a portable device, a tablet device, a multi-processor system, and a distributed computing environment including any system or device above.
  • a program module includes routines, programs, objects, modules, data structures, etc., for executing specific tasks or implementing specific abstract data types.
  • the disclosed method and server may also be implemented in a distributed computing environment.
  • a task is executed by remote processing devices which are connected through a communication network.
  • the program module may be located in storage media (which include storage devices) of local and remote computers.
  • the disclosed system may structurally include a pre-established database, a search log, a data rewriting system, a search engine, and a user client.
  • Upon receiving a search term inputted by a user, referred to as the present search term, the search engine transmits the present search term to the data rewriting system.
  • the data rewriting system matches the present search term in the pre-established database to obtain one or more historical search terms that are relevant to the present search term (i.e., search term candidates), retrieves properties of the present search term and the search term candidates (where the properties are used for describing respective one or more matching results of the present search term and the search term candidates), and determines whether the matching results indicate that rewriting the present search term is required.
  • the present search term is rewritten according to the matching results.
  • the search engine then performs a search using the rewritten search term.
  • the pre-established database stores historical search terms of the user client, and may be implemented in a form of a search log or other approach.
  • the recall rate refers to the ratio between the number of relevant documents that have been found and the number of all relevant documents in a document repository.
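The recall rate defined above can be written as a short function. This is a minimal sketch: the function name, the document identifiers, and the choice to return 0 when the repository holds no relevant documents are assumptions made for illustration, not part of the disclosure.

```python
def recall_rate(retrieved: set[str], relevant: set[str]) -> float:
    """Ratio of relevant documents found to all relevant documents in the repository."""
    if not relevant:
        # Assumption: treat recall as 0 when nothing in the repository is relevant.
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical document identifiers.
relevant_docs = {"d1", "d2", "d3", "d4"}
retrieved_docs = {"d1", "d2", "d9"}
print(recall_rate(retrieved_docs, relevant_docs))  # 0.5
```

Two of the four relevant documents were found, so the recall rate is 0.5; the irrelevant document "d9" does not affect recall.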
  • That entity can be implemented using an acquisition module, a property retrieving module, a first determination module, a rewriting module, and a searching module.
  • the acquisition module matches and obtains, for a present search term, one or more search term candidates that are relevant to the present search term from a pre-established database.
  • the property retrieving module retrieves properties of the present search term and the search term candidates, where the properties describe respective matching results of the present search term and the search term candidates.
  • the first determination module determines whether the present search term needs to be rewritten in view of the matching results.
  • the rewriting module rewrites the present search term based on the matching results.
  • the searching module performs a search using the rewritten search term. Accuracy and recall rate of the associated results can be improved when a search is performed using a present search term that is rewritten into that entity.
  • FIG. 1 shows a flow chart of a first exemplary method 100 in accordance with the present disclosure.
  • the method 100 may include actions as described below.
  • a data rewriting system matches and obtains, from a pre-established database, one or more search term candidates that are relevant to the present search term.
  • the database stores historical search terms of a user client.
  • the one or more search term candidates are historical search terms relevant to the present search term.
  • the pre-established database may be implemented in a form of a search log of a search engine.
  • the search log refers to log information of search terms and search results of the user client that has been collected by the search engine, i.e., the historical search terms of the user client.
  • the database may further record detailed information such as click rates and exposure rates of the search results.
  • data content in the search log may be used for re-establishing a new database.
  • the data rewriting system obtains at least two search term candidates.
  • the content of each search term candidate that has been matched and the content of the present search term include at least one phrase or word that is in common.
  • the data rewriting system retrieves properties of the present search term and at least two search term candidates, where the properties describe respective matching results of the present search term and each of the at least two search term candidates.
  • the data rewriting system matches the present search term with each of the at least two search term candidates to retrieve properties of the present search term and each of the search term candidates.
  • a property may be referred to as “number of matches associated with brand” between the present search term and any one search term candidate, for instance.
  • An example is the number of matches associated with “Nokia”, i.e., whether the brand “Nokia” appears in the present search term and a search term candidate. If affirmative, a respective property value will be assigned to be 1 during subsequent value assignment. If not, the respective property value will be assigned to be 0.
  • Other examples include “number of matches associated with product” such as the number of matches associated with “mobile phone”, and so forth.
  • the data rewriting system assigns values to the properties based on the matching results with each property having a corresponding property value.
  • Values are assigned to the properties based on the matching results. For example, if the property “number of matches associated with brand” has a value of 1 for the present search term and a search term candidate, a certain brand name is included and appears once in both the present search term and the search term candidate. The property value of this property is therefore 1. Upon value assignment, each property has a corresponding property value.
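The value assignment described above can be sketched as follows. The brand and product lexicons, the property names, and the use of simple case-insensitive substring matching are all assumptions for illustration; the disclosure does not specify how brand or product names are recognized.

```python
# Hypothetical lexicons; a real system would use far larger dictionaries.
BRANDS = {"nokia", "lenovo"}
PRODUCTS = {"mobile phone", "laptop"}


def count_matches(present: str, candidate: str, lexicon: set[str]) -> int:
    """Count lexicon entries that appear in both search terms (substring match)."""
    p, c = present.lower(), candidate.lower()
    return sum(1 for entry in lexicon if entry in p and entry in c)


def property_values(present: str, candidate: str) -> dict[str, int]:
    """Assign a value to each property of a (present term, candidate) pair."""
    return {
        "brand_matches": count_matches(present, candidate, BRANDS),
        "product_matches": count_matches(present, candidate, PRODUCTS),
    }


print(property_values("red Nokia n95 mobile phone", "Nokia n95 mobile phone"))
# {'brand_matches': 1, 'product_matches': 1}
```

Both terms contain the brand "Nokia" and the product "mobile phone", so each property receives the value 1, matching the example in the text.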
  • the data rewriting system processes all the property values based on one or more predetermined rules to obtain at least two matching result values that correspond to the at least two search term candidates.
  • the predetermined rules may include a rule satisfying a certain linear weighting, or may be implemented using a Maximum Entropy Model, i.e., converting the property values into matching result values using a probability model such as the Maximum Entropy Model.
  • the predetermined rules may be designated in advance according to practical needs.
  • the data rewriting system processes the property values according to the predetermined rules. Specifically, property values associated with each search term candidate are computed in order to obtain a matching result value for each search term candidate.
  • the matching result value may be any arbitrary value, e.g., a decimal number such as 0.8 or 0.6, or an integer such as 2 or 5. In the present disclosure, a more optimal result may be obtained using the Maximum Entropy Model.
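A linear weighting rule of the kind mentioned above might look like the sketch below. The property names and the weights are hypothetical; the disclosure leaves the predetermined rules to practical needs and does not give concrete values.

```python
# Hypothetical weights; the disclosure does not specify any values.
WEIGHTS = {"brand_matches": 0.5, "product_matches": 0.3, "model_matches": 0.2}


def matching_result_value(props: dict[str, float]) -> float:
    """Weighted sum of property values for one search term candidate."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in props.items())


# Property values of each candidate against "red Nokia n95 mobile phone".
candidate_props = {
    "Nokia n95 mobile phone": {"brand_matches": 1, "product_matches": 1, "model_matches": 1},
    "red Nokia mobile phone": {"brand_matches": 1, "product_matches": 1, "model_matches": 0},
}
scores = {term: matching_result_value(p) for term, p in candidate_props.items()}
for term, score in scores.items():
    print(f"{term}: {score:.2f}")
```

Each candidate receives one matching result value, so two candidates yield two values that can later be compared against a threshold.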
  • the data rewriting system determines whether a maximum matching result value of the at least two matching result values is greater than a certain threshold. If affirmative, the process continues to 106 . Otherwise, no further processing is performed.
  • the data rewriting system determines whether or not rewriting the present search term is required.
  • a certain threshold may be set up in advance for the data rewriting system.
  • the data rewriting system determines whether or not a maximum matching result value of the matching result values is greater than that threshold. If affirmative, the search term candidate corresponding to that matching result value is more optimal than the present search term, where “more optimal than the present search term” may be interpreted to mean that the search term candidate has a relatively high degree of matching with the present search term and has fewer unnecessary words.
  • let the threshold be 0.9.
  • the present search term needs to be rewritten when a matching result value of a particular search term candidate and the present search term is a maximum and is greater than 0.9. Specifically, the present search term is rewritten so that the particular search term candidate becomes the rewritten present search term.
  • the threshold may be set up dynamically in response to the matching result values.
  • the data rewriting system rewrites the present search term so that the search term candidate becomes the rewritten present search term.
  • the search engine then conducts a search using the rewritten present search term.
  • the original present search term of the user client is rewritten into a search term candidate, where a matching result value of the search term candidate is a maximum and is greater than the threshold.
  • the search engine subsequently conducts a search using the rewritten present search term.
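The threshold decision and rewrite in this first embodiment can be sketched as below. The function name and the fallback of keeping the original term when no candidate qualifies are assumptions; the text only states that no further processing is performed in that case.

```python
def choose_rewrite(present: str, scores: dict[str, float], threshold: float = 0.9) -> str:
    """Return the candidate with the maximum matching result value if that value
    exceeds the threshold; otherwise keep the present search term (assumed fallback)."""
    if not scores:
        return present
    best_term, best_score = max(scores.items(), key=lambda item: item[1])
    return best_term if best_score > threshold else present


# Hypothetical matching result values for two candidates.
scores = {"Nokia n95 mobile phone": 0.95, "red Nokia mobile phone": 0.8}
print(choose_rewrite("red Nokia n95 mobile phone", scores))
# Nokia n95 mobile phone
```

The comparison is strict, so a maximum value exactly equal to the threshold does not trigger a rewrite.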
  • the technical scheme in the present embodiment no longer uses manually established and fixed rules, but rather creates a pre-established database directly using a search log of the search engine.
  • the user may set up and update the contents of that database himself.
  • various search terms may be rewritten using respective search term candidates that are matched therewith. Without the sole dependence on fixed rules, searching with the rewritten search term not only allows the disclosed search method to obtain higher accuracy and avoids generating ambiguity as a result of using the rules, but also improves recall rate of associated search results.
  • FIG. 2 shows a flow chart of a second exemplary search method 200 in accordance with the present disclosure.
  • the method may include certain actions as described below.
  • a data rewriting system matches and obtains one search term candidate that is relevant to the present search term from a pre-established database.
  • the data rewriting system matches and obtains only one search term candidate that is relevant to the present search term from the pre-established database.
  • the data rewriting system may obtain search results corresponding to that search term candidate at the same time, with the search results including information such as web page identification, for example.
  • the data rewriting system retrieves properties of the present search term and the search term candidate, with the properties describing a matching result of the present search term and the search term candidate.
  • the data rewriting system may match the present search term with that one search term candidate to obtain the properties of the present search term and the one search term candidate, e.g., number of matches associated with brand, number of matches associated with product, and so forth.
  • the data rewriting system assigns values to the properties of the present search term and the search term candidate based on the matching result.
  • the data rewriting system assigns values to the properties based on the matching result. For example, the number of matches associated with a product in the present search term and the search term candidate may be 1. Specifically, a specific product name, e.g., “mobile phone”, may be included and appear once in both the present search term and the search term candidate. A property value of the corresponding property is therefore 1. Upon value assignment, each property of the present search term and the search term candidate has a corresponding property value. A collection of all property values upon matching the search term candidate with the present search term is obtained.
  • the data rewriting system processes the property values according to pre-determined rules to obtain a matching result value corresponding to the search term candidate.
  • the data rewriting system may process the collection of the property values using linear weighting.
  • a probability model such as the Maximum Entropy Model, the Hidden Markov Model, the Maximum Entropy Markov Model, or the Conditional Random Field Model may be used.
  • the data rewriting system processes the property values using a linear weighting approach or converts the property values into a matching result value using, for example, the Maximum Entropy Model.
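As an illustration of converting property values into a matching result value with a probability model, the sketch below uses the two-class form of a maximum entropy classifier, which is mathematically the same as logistic regression. The weights and bias are hypothetical stand-ins for parameters that would be learned from training data; the disclosure does not describe the model's features or training.

```python
import math

# Hypothetical parameters; in practice these would be learned from data.
MAXENT_WEIGHTS = {"brand_matches": 2.0, "product_matches": 1.5, "model_matches": 1.5}
MAXENT_BIAS = -2.0


def maxent_matching_value(props: dict[str, float]) -> float:
    """Probability that the candidate is a good rewrite, from a two-class
    maximum entropy (logistic) model over the property values."""
    z = MAXENT_BIAS + sum(MAXENT_WEIGHTS.get(k, 0.0) * v for k, v in props.items())
    return 1.0 / (1.0 + math.exp(-z))


full_match = {"brand_matches": 1, "product_matches": 1, "model_matches": 1}
partial_match = {"brand_matches": 1, "product_matches": 0, "model_matches": 0}
print(round(maxent_matching_value(full_match), 3))
print(round(maxent_matching_value(partial_match), 3))
```

Unlike a raw weighted sum, the output always lies between 0 and 1, which makes a fixed threshold such as 0.9 easier to interpret.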
  • the data rewriting system determines whether or not the matching result value is greater than a certain threshold. If not, no further processing is performed. If affirmative, the process continues to 206 .
  • the search term candidate corresponding to the matching result value is more optimal than the present search term.
  • the data rewriting system determines whether or not a search result corresponding to the search term candidate exists in the database. If not, no further processing is performed. If affirmative, the process continues to 207 .
  • the data rewriting system may determine whether or not the search term candidate corresponding to the matching result value has a corresponding search result in the database. If a search result is found, relevant results can be retrieved for this search term candidate. Recall rate is therefore improved when the server performs a search using that search term candidate.
  • the data rewriting system rewrites the present search term into the search term candidate, which is then used by a search engine in a search.
  • only one search term candidate is obtained and matched from the pre-established database in a server of the search engine. Therefore, when applying the method, properties of the present search term and this particular search term candidate are retrieved and a matching result value between them is computed. Depending on whether or not the matching result value is greater than the predetermined threshold, it is determined whether or not the search term candidate is more optimal than the present search term. Furthermore, whether or not a corresponding search result exists for the search term candidate is also determined. If affirmative, the server of the search engine will by default perform a search using the search term candidate. By further making a determination regarding search results of the search term candidate, this method not only has a higher accuracy compared with the existing method that performs a search after rule-based rewriting of the search term, but also improves recall rate of associated search results.
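The two checks of this second embodiment (threshold, then existence of stored search results) can be sketched as follows. The dictionary used as the database, the page identifiers, and the fallback to the original term are assumptions for illustration.

```python
def rewrite_single(
    present: str,
    candidate: str,
    matching_value: float,
    results_db: dict[str, list[str]],
    threshold: float = 0.9,
) -> str:
    """Rewrite only if the candidate scores above the threshold AND has
    stored search results; otherwise keep the present term (assumed fallback)."""
    if matching_value <= threshold:
        return present
    if not results_db.get(candidate):
        return present
    return candidate


# Hypothetical database: historical search term -> web page identifiers.
results_db = {"Thinkpad Lenovo Laptop X60": ["page_17", "page_42"]}
print(rewrite_single("Black Lenovo Thinkpad Laptop X60",
                     "Thinkpad Lenovo Laptop X60", 0.93, results_db))
# Thinkpad Lenovo Laptop X60
```

Checking for stored results before rewriting is what guards against replacing the user's term with one that retrieves nothing.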
  • FIG. 3 shows a flow chart of a third exemplary search method 300 in accordance with the present disclosure.
  • the present embodiment may be considered as an exemplary search method in practice.
  • the method 300 may include certain actions as described below.
  • the data rewriting system segments a present search term of the user client into a plurality of child search terms, and sets up a respective identifier for each child search term.
  • the data rewriting system may use a word segmenter to segment the present search term into a plurality of child search terms, and then set up a respective identifier for each child search term thus obtained.
  • the present search term may be “red Nokia n95 mobile phone.”
  • After word segmentation and identifier setting, the result may be: “red (qualifier)/Nokia (brand)/n95 (model number)/mobile phone (product type)”, where, for example, “red” is a child search term and “qualifier” is the respective identifier for that child search term.
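  • One way to picture the segmentation and identifier-setting step is a dictionary-based, longest-match segmenter. The sketch below is only an assumption about how such a word segmenter could work; the `LEXICON` contents and the `"unknown"` label are hypothetical and do not come from the disclosure.

```python
# Hypothetical lexicon mapping known phrases (lowercase) to identifiers.
LEXICON = {
    "red": "qualifier",
    "nokia": "brand",
    "n95": "model number",
    "mobile phone": "product type",
}

def segment(term, lexicon=LEXICON, max_words=3):
    """Greedy longest-match segmentation.

    Returns a list of (child search term, identifier) pairs.
    """
    words = term.split()
    children = []
    i = 0
    while i < len(words):
        # Try the longest phrase first so "mobile phone" wins over "mobile".
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase.lower() in lexicon:
                children.append((phrase, lexicon[phrase.lower()]))
                i += n
                break
        else:
            # No lexicon entry starts here: keep the word but mark it unlabeled.
            children.append((words[i], "unknown"))
            i += 1
    return children

children = segment("red Nokia n95 mobile phone")
print(children)
```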
  • the data rewriting system performs matching in the pre-established database based on the identifiers of the child search terms to obtain two search term candidates.
  • a pre-established database is used to store historical search terms of a user client.
  • the search term candidates are historical search terms that are relevant to the present search term.
  • the data rewriting system performs matching in the pre-established database based on the identifiers of the child search terms and obtains search term candidates from the historical search terms: “Nokia n95 mobile phone” and “red Nokia mobile phone”, for example.
  • an exemplary method of storing “red Nokia n95” in the database may be similar to the one shown below.
  • the method of storing search term candidates in the database does not affect the implementation of the present exemplary embodiments.
  • Alternative storing methods may be used for storing the search term candidates.
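  • As a rough illustration of identifier-based matching, the sketch below stores each historical search term as a mapping from identifier to child search term and treats a historical term as a candidate when it agrees with the present term on a few key identifiers. The storage layout, the `HISTORY` contents, and the choice of `brand` and `product type` as key identifiers are assumptions made for this example; they are not the storage method referred to in the text.

```python
# Hypothetical storage layout: historical term -> {identifier: child term}.
HISTORY = {
    "Nokia n95 mobile phone": {
        "brand": "Nokia", "model number": "n95", "product type": "mobile phone",
    },
    "red Nokia mobile phone": {
        "qualifier": "red", "brand": "Nokia", "product type": "mobile phone",
    },
    "blue Sony headphones": {
        "qualifier": "blue", "brand": "Sony", "product type": "headphones",
    },
}

# The present search term after segmentation and identifier setting.
PRESENT = {
    "qualifier": "red", "brand": "Nokia",
    "model number": "n95", "product type": "mobile phone",
}

def find_candidates(present, history, key_ids=("brand", "product type")):
    """Return historical terms that agree with the present term on key_ids."""
    candidates = []
    for term, labeled in history.items():
        if all(
            k in present and k in labeled
            and labeled[k].lower() == present[k].lower()
            for k in key_ids
        ):
            candidates.append(term)
    return candidates

candidates = find_candidates(PRESENT, HISTORY)
print(candidates)
```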
  • the data rewriting system compares the child search terms of the present search term with the search term candidates to provide a comparison result.
  • comparing the child search terms with the search term candidates may refer to separately comparing “red”, “Nokia”, “mobile phone”, etc., with “Nokia n95 mobile phone” and “red Nokia mobile phone”, for example.
  • the data rewriting system obtains respective matching results between the child search terms and the two search term candidates separately.
  • properties of the present search term “red Nokia n95 mobile phone” and the search term candidate “Nokia n95 mobile phone” may be obtained, such as: “number of matches associated with qualifier”, “number of matches associated with brand”, “number of matches associated with model number”, and “number of matches associated with product type.” These properties may represent matching results between the present search term and the two search term candidates.
  • the data rewriting system assigns values to the properties based on the matching results with each property having a corresponding property value.
  • properties of the search term candidate “Nokia n95 mobile phone” include: number of matches associated with qualifier, number of matches associated with brand, number of matches associated with model number, and number of matches associated with product type, with the property values obtained after matching this search term candidate with the present search term being 1, 2, 1 and 1, respectively.
  • each property has a corresponding property value.
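  • The value-assignment step can be illustrated with a simple counting rule: for each identifier, count the child search terms that the present search term and the candidate have in common. This rule is an assumption made for the sketch; the disclosure does not spell out its counting rule, so the numbers produced here are not meant to reproduce the example values given above.

```python
IDENTIFIERS = ("qualifier", "brand", "model number", "product type")

def property_values(present, candidate):
    """Count, per identifier, the child search terms shared by both terms.

    Both arguments map an identifier to a list of child search terms.
    """
    values = {}
    for ident in IDENTIFIERS:
        present_children = {c.lower() for c in present.get(ident, [])}
        candidate_children = {c.lower() for c in candidate.get(ident, [])}
        shared = present_children & candidate_children
        values[f"number of matches associated with {ident}"] = len(shared)
    return values

present = {
    "qualifier": ["red"], "brand": ["Nokia"],
    "model number": ["n95"], "product type": ["mobile phone"],
}
candidate = {
    "brand": ["Nokia"], "model number": ["n95"], "product type": ["mobile phone"],
}

values = property_values(present, candidate)
print(values)
```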
  • the data rewriting system processes the property values based on one or more predetermined rules to obtain two matching result values that correspond to the two search term candidates.
  • the predetermined rules may refer to computing according to a simple linear model, i.e., computing the matching result value by weighting all property values.
  • a relatively complicated probability model such as the Maximum Entropy Model may be used.
  • the results obtained in this action represent respective matching result values of the two search term candidates. For example, the matching result value of “Nokia n95 mobile phone” computed according to the Maximum Entropy Model may be 0.95 while the matching result value of the second search term candidate “red Nokia mobile phone” may be 0.8.
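  • Both kinds of predetermined rule can be sketched in a few lines. The linear rule is a weighted sum of the property values. For the probability model, the sketch uses the fact that a two-class maximum entropy model reduces to a logistic function of the same weighted sum. The weights, bias, and property names below are hypothetical, and the sketch makes no attempt to reproduce the 0.95 and 0.8 values quoted above.

```python
import math

def linear_score(values, weights):
    """Simple linear model: weighted sum of all property values."""
    return sum(weights[name] * value for name, value in values.items())

def maxent_score(values, weights, bias=0.0):
    """Two-class maximum entropy model (equivalent to logistic regression).

    Returns a value in (0, 1) that can be read as the probability that the
    candidate is a good rewrite of the present search term.
    """
    z = bias + linear_score(values, weights)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights; in practice they would be tuned or learned from data.
weights = {"qualifier": 0.2, "brand": 1.0, "model number": 1.5, "product type": 1.0}
values = {"qualifier": 0, "brand": 1, "model number": 1, "product type": 1}

print(linear_score(values, weights))
print(maxent_score(values, weights, bias=-1.0))
```

  • The linear score is unbounded, so a threshold for it has to be chosen on the same scale as the weights; the maximum entropy score always lies between 0 and 1, which makes a threshold such as 0.9 easier to interpret.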
  • the data rewriting system determines whether or not the larger matching result value is greater than a certain threshold. If not, no further processing is performed. If affirmative, the process proceeds to 308 .
  • the data rewriting system may set up a threshold in advance, e.g., 0.9, in this exemplary embodiment. When a matching result value is greater than this threshold, the corresponding search term candidate is an optimal search term candidate.
  • a minimum threshold may further be set up. Specifically, when all matching result values are smaller than this minimum threshold, the present search term will not be rewritten. Moreover, when all matching result values are smaller than a certain maximum threshold, the present search term will not be rewritten.
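  • The threshold decision for several candidates can be sketched as follows, using the example values from this embodiment (0.95 and 0.8 against a threshold of 0.9). The function name `choose_rewrite` is an assumption introduced here.

```python
def choose_rewrite(present_term, scored_candidates, threshold=0.9):
    """Pick the candidate with the largest matching result value.

    The present term is kept unless that largest value exceeds the threshold.
    """
    if not scored_candidates:
        return present_term
    best, best_value = max(scored_candidates.items(), key=lambda kv: kv[1])
    return best if best_value > threshold else present_term

SCORES = {
    "Nokia n95 mobile phone": 0.95,
    "red Nokia mobile phone": 0.8,
}

rewritten = choose_rewrite("red Nokia n95 mobile phone", SCORES)
print(rewritten)
```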
  • the data rewriting system rewrites the present search term into the search term candidate.
  • the first search term candidate is more optimal than the present search term.
  • the present search term “red Nokia n95 mobile phone” is rewritten into “Nokia n95 mobile phone.”
  • a search engine performs a search using the rewritten present search term and causes the search results to be displayed to the user client.
  • the server of the search engine may directly perform a search using the rewritten search term, i.e., the first search term candidate “Nokia n95 mobile phone”, and causes those found results to be displayed to the user client.
  • the present disclosure further provides a first exemplary search apparatus 400 which is shown in FIG. 4 .
  • the apparatus 400 may include an acquisition module 401 , a property retrieving module 402 , a first value assigning sub-module 403 , a first processing sub-module 404 , a first determination sub-module 405 , a rewriting module 406 , and a searching module 407 .
  • the acquisition module 401 matches and obtains, from a pre-established database, at least two search term candidates that are relevant to the present search term.
  • a server end of a search engine may set up the pre-established database in advance, which is used for storing historical search terms of a user client.
  • the historical search terms in the database may be obtained through a search log.
  • the search log may be referred to as log information that the search engine uses to collect search terms and search results of the user client.
  • the database may further record detailed information such as, for example, click rates and exposure rates of the search results.
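  • A database of historical search terms could be derived from a search log along the following lines. The log record format `(search term, result id, was clicked)` and the per-term aggregation are assumptions made for this sketch; the disclosure only states that search terms, search results, click rates and exposure rates may be recorded.

```python
from collections import defaultdict

def build_term_database(log_records):
    """Aggregate hypothetical log records into per-term statistics.

    Each record is (search term, result id, was_clicked). One record is
    counted as one exposure of a result for that search term.
    """
    stats = defaultdict(lambda: {"exposures": 0, "clicks": 0})
    for term, _result_id, clicked in log_records:
        entry = stats[term]
        entry["exposures"] += 1
        entry["clicks"] += int(clicked)
    # Derive the click rate once all records have been counted.
    return {
        term: {**s, "click_rate": s["clicks"] / s["exposures"]}
        for term, s in stats.items()
    }

records = [
    ("Nokia n95 mobile phone", "item-1", True),
    ("Nokia n95 mobile phone", "item-2", False),
    ("red Nokia mobile phone", "item-3", False),
]

db = build_term_database(records)
print(db)
```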
  • the property retrieving module 402 retrieves properties of the present search term and the search term candidates, where the properties describe respective matching results of the present search term and the search term candidates.
  • the property retrieving module 402 matches the present search term with the at least two search term candidates upon obtaining the at least two search term candidates to obtain properties of the present search term and each of the search term candidates.
  • the properties may be, for example, a number of matches associated with brand and a number of matches associated with product in the present search term and a search term candidate.
  • the first value assigning sub-module 403 assigns values to the properties based on the matching results, with each property having a corresponding property value.
  • the first value assigning sub-module 403 assigns values to the properties based on the matching results. For example, in the present search term and one of the search term candidates, the number of matches associated with brand is 1, indicating that a certain brand name may be included and appears once in the present search term and that search term candidate. Therefore, the property value of that property is 1. After value assignment, each property has a corresponding property value.
  • the first processing sub-module 404 processes the property values based on one or more predetermined rules to obtain at least two matching result values that correspond to the at least two search term candidates.
  • the first processing sub-module 404 converts the property values into the matching result values based on one or more predetermined rules, which may be a certain linear weighting rule or a probability model such as, for example, the Maximum Entropy Model.
  • the predetermined rules may be designated in advance according to practical needs.
  • the first processing sub-module 404 may process the property values using a linear weighting approach or convert the property values into the matching result values using the Maximum Entropy Model.
  • the first determination sub-module 405 determines whether or not a maximum matching result value of the at least two matching result values is greater than a certain threshold.
  • if the maximum matching result value is greater than the threshold, the search term candidate corresponding to that matching result value is better than the present search term.
  • the rewriting module 406 rewrites the present search term based on the matching results.
  • the searching module 407 performs a search based on the rewritten present search term.
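  • The way the modules of apparatus 400 hand data to one another can be sketched as a small pipeline in which each module is a pluggable function. The class name `SearchApparatus`, the method `run`, and the stub functions are all hypothetical and serve only to show the wiring; the reference numerals in the comments map each step back to the modules listed above.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SearchApparatus:
    acquire: Callable[[str], List[str]]             # acquisition module 401
    retrieve_properties: Callable[[str, str], dict]  # modules 402 and 403
    score: Callable[[dict], float]                   # processing sub-module 404
    search: Callable[[str], List[str]]               # searching module 407
    threshold: float = 0.9                           # used by sub-module 405

    def run(self, present_term: str) -> List[str]:
        candidates = self.acquire(present_term)
        scored = {
            c: self.score(self.retrieve_properties(present_term, c))
            for c in candidates
        }
        term = present_term
        if scored:
            best = max(scored, key=scored.get)
            if scored[best] > self.threshold:        # determination sub-module 405
                term = best                          # rewriting module 406
        return self.search(term)

# Stub modules with fixed outputs, for illustration only.
STUB_SCORES = {"Nokia n95 mobile phone": 0.95, "red Nokia mobile phone": 0.8}

apparatus = SearchApparatus(
    acquire=lambda term: list(STUB_SCORES),
    retrieve_properties=lambda present, candidate: {"candidate": candidate},
    score=lambda props: STUB_SCORES[props["candidate"]],
    search=lambda term: [f"results for {term}"],
)

results = apparatus.run("red Nokia n95 mobile phone")
print(results)
```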
  • the exemplary apparatus may be integrated into the server of the search engine or may be a separate entity that is communicatively coupled with the server of the search engine. Furthermore, when the disclosed method is implemented in the form of software, the method may be executed as a new function for the server of the search engine or as a separate program. The present disclosure does not impose any limitation on how to implement the disclosed method or apparatus.
  • the apparatus rewrites the search term and performs a search using the rewritten search term without using manually-established and fixed rules.
  • the pre-established database may be created directly using a search log of the search engine.
  • the user may set up and update the contents of the database. This not only allows the search method to obtain a higher accuracy and avoids generating ambiguities as a result of using the rules, but also improves recall rate of associated search results.
  • the present disclosure further provides a second exemplary search apparatus 500 as shown in FIG. 5 .
  • the apparatus 500 may include an acquisition module 501 , a property retrieving module 502 , a second value assigning sub-module 503 , a second processing sub-module 504 , a second determination sub-module 505 , a second determination module 506 , an execution module 507 , and a searching module 508 .
  • the acquisition module 501 matches and obtains, from a pre-established database, one search term candidate that is relevant to the present search term.
  • the acquisition module 501 obtains only one search term candidate from the pre-established database.
  • the property retrieving module 502 retrieves properties of the present search term and the search term candidate, where the properties describe a matching result of the present search term and the one search term candidate.
  • the second value assigning sub-module 503 assigns values to the properties of the one search term candidate and the present search term based on the matching result.
  • the second processing sub-module 504 processes the property values based on one or more predetermined rules to obtain a matching result value that corresponds to the one search term candidate.
  • the second determination sub-module 505 determines whether or not the matching result value is greater than a certain threshold.
  • the second determination module 506 determines whether or not a search result that corresponds to the search term candidate exists in the database.
  • the second determination module 506 may determine whether a search result that corresponds to the search term candidate of the matching result value exists in the database. If a search result is found, relevant results can be found using this search term candidate.
  • the execution module 507 executes acts of rewriting the present search term into the search term candidate when a result of the second determination module is affirmative.
  • the searching module 508 performs a search using a result of the execution module 507 .
  • the second determination module 506 makes a determination on search results and allows rewriting of the present search term when the search term candidate has relevant search results. This not only achieves higher accuracy than the existing method of searching after rule-based rewriting of the search term, but also improves the recall rate of associated search results.
  • the present disclosure further provides a third exemplary search apparatus 600 which is shown in FIG. 6 .
  • the apparatus 600 may include a word segmenting module 601 , a matching sub-module 602 , a comparing sub-module 603 , a matching result acquisition sub-module 604 , a first determination module 605 , a rewriting module 606 , a searching module 607 , and a result displaying module 608 .
  • the word segmenting module 601 segments a present search term into a plurality of child search terms, and sets up a respective identifier for each child search term.
  • the word segmenting module 601 may be implemented by a word segmenter.
  • the matching sub-module 602 matches the identifiers of the child search terms in a pre-established database to obtain a search term candidate.
  • the comparing sub-module 603 compares the child search terms of the present search term with the search term candidate.
  • the matching result acquisition sub-module 604 obtains a matching result of the child search terms and the search term candidate based on a comparison result.
  • the first determination module 605 determines whether or not the matching result indicates that the present search term needs to be rewritten.
  • the rewriting module 606 rewrites the present search term into the search term candidate.
  • the searching module 607 performs a search using a result of the rewriting module.
  • the result displaying module 608 causes search results to be displayed to the user client.
  • When matching and obtaining a search term candidate, word segmentation of the present search term may be employed.
  • Various search term candidates may be matched and obtained based on the child search terms. Therefore, the various search term candidates can be more accurately matched and obtained in the database, thus facilitating subsequent rewriting of the present search term and searching. As such, search results are more accurate and their recall rate is improved.
  • the present disclosure further provides an exemplary search system 700 as shown in FIG. 7 .
  • the system 700 may include, at a server end, a database 701 , an acquisition module 702 , a property retrieving module 703 , a first determination module 704 , a rewriting module 705 , and a searching module 706 .
  • the database 701 stores historical search terms of a user client.
  • This pre-established database needs to be communicatively coupled to the server when acting as a separate entity, or may be integrated into the server, acting as a unit or a module of the server.
  • the acquisition module 702 matches and obtains, from the pre-established database, a search term candidate that is relevant to the present search term.
  • the property retrieving module 703 retrieves properties of the present search term and the search term candidate, where the properties describe a matching result of the present search term and the search term candidate.
  • the first determination module 704 determines whether or not the present search term needs to be rewritten based on the matching result.
  • the rewriting module 705 rewrites the present search term based on the matching result.
  • the searching module 706 performs a search using a result of the rewriting module 705 .
  • the system may further include a search log (not shown).
  • the search log is communicatively coupled to the pre-established database, and provides the historical search terms of the user client or provides search results to the server, etc.
  • FIG. 8 provides a reference for the structure of the various components of the search system 700 in a practical application.
  • the system may further include, at the user client end, a browser 707 that receives the present search term from a user and submits the present search term to the server.
  • the present exemplary embodiment describes work interaction scenarios among the user client and various devices of the server end when the user client interacts with the server.
  • the browser first receives a search term inputted by the user for a search and submits the present search term to the server.
  • any relational terms such as “first” and “second” in this document are only meant to distinguish one entity from another entity or one operation from another operation, and do not necessarily require or imply any real-world relationship or ordering between these entities or operations.
  • terms such as “include”, “have” or any other variants mean non-exclusively “comprising”. Therefore, processes, methods, articles or devices which individually include a collection of features may include not only those features, but also other features that are not listed, or any inherent features of these processes, methods, articles or devices.
  • a feature defined within the phrase “include a . . . ” does not exclude the possibility that the process, method, article or device that recites the feature may have other equivalent features.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
US12/863,482 2009-05-12 2010-04-30 Method, apparatus and system, for rewriting search queries Active 2030-06-20 US8880512B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/491,566 US9576054B2 (en) 2009-05-12 2014-09-19 Search method, apparatus and system based on rewritten search term

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN200910135276.X 2009-05-12
CN200910135276XA CN101887436B (zh) 2009-05-12 2009-05-12 一种检索方法和装置
CN200910135276 2009-05-12
PCT/IB2010/001094 WO2010131101A1 (en) 2009-05-12 2010-04-30 Search method, apparatus and system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2010/001094 A-371-Of-International WO2010131101A1 (en) 2009-05-12 2010-04-30 Search method, apparatus and system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/491,566 Continuation US9576054B2 (en) 2009-05-12 2014-09-19 Search method, apparatus and system based on rewritten search term

Publications (2)

Publication Number Publication Date
US20110082860A1 US20110082860A1 (en) 2011-04-07
US8880512B2 true US8880512B2 (en) 2014-11-04

Family

ID=43073362

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/863,482 Active 2030-06-20 US8880512B2 (en) 2009-05-12 2010-04-30 Method, apparatus and system, for rewriting search queries
US14/491,566 Active 2030-09-29 US9576054B2 (en) 2009-05-12 2014-09-19 Search method, apparatus and system based on rewritten search term

Country Status (6)

Country Link
US (2) US8880512B2 (ja)
EP (1) EP2430575A4 (ja)
JP (1) JP5698222B2 (ja)
CN (1) CN101887436B (ja)
HK (1) HK1148367A1 (ja)
WO (1) WO2010131101A1 (ja)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7536382B2 (en) * 2004-03-31 2009-05-19 Google Inc. Query rewriting with entity detection
US8185544B2 (en) * 2009-04-08 2012-05-22 Google Inc. Generating improved document classification data using historical search results
US8468143B1 (en) 2010-04-07 2013-06-18 Google Inc. System and method for directing questions to consultants through profile matching
JP5552448B2 (ja) * 2011-01-28 2014-07-16 株式会社日立製作所 検索式生成装置、検索システム、検索式生成方法
CN102915314B (zh) * 2011-08-05 2018-07-31 深圳市世纪光速信息技术有限公司 一种纠错对自动生成方法及系统
CN104166651B (zh) * 2013-05-16 2017-10-13 阿里巴巴集团控股有限公司 基于对同类数据对象整合的数据搜索的方法和装置
CN104239301B (zh) * 2013-06-06 2018-02-13 阿里巴巴集团控股有限公司 一种数据比对方法和装置
CN103617241B (zh) * 2013-11-26 2017-06-06 北京奇虎科技有限公司 搜索信息处理方法、浏览器终端与服务器
CN104750762A (zh) * 2013-12-31 2015-07-01 华为技术有限公司 一种信息检索方法及装置
CN103886039B (zh) * 2014-03-10 2018-01-19 百度在线网络技术(北京)有限公司 应用检索的优化方法和装置
CN104063433A (zh) * 2014-06-10 2014-09-24 百度在线网络技术(北京)有限公司 推荐内容的展现方法和装置
US9547690B2 (en) * 2014-09-15 2017-01-17 Google Inc. Query rewriting using session information
CN105574019B (zh) * 2014-10-14 2020-07-31 阿里巴巴(中国)有限公司 一种查询参数处理方法及装置
CN107491447B (zh) * 2016-06-12 2021-01-22 百度在线网络技术(北京)有限公司 建立查询改写判别模型、查询改写判别的方法和对应装置
CN107784014A (zh) * 2016-08-30 2018-03-09 广州市动景计算机科技有限公司 信息搜索方法、设备及电子设备
CN108153770A (zh) * 2016-12-05 2018-06-12 天脉聚源(北京)科技有限公司 一种搜索引擎加速的方法和系统
CN107958406A (zh) * 2017-11-30 2018-04-24 北京小度信息科技有限公司 查询数据的获取方法、装置及终端
WO2019228065A1 (en) * 2018-06-01 2019-12-05 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for processing queries
CN109241243B (zh) * 2018-08-30 2020-11-24 清华大学 候选文档排序方法及装置
CN111339759B (zh) * 2020-02-21 2023-07-25 北京百度网讯科技有限公司 领域要素识别模型训练方法、装置及电子设备

Family Cites Families (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2935877B2 (ja) * 1990-08-08 1999-08-16 株式会社リコー 文書検索装置
US5649221A (en) * 1995-09-14 1997-07-15 Crawford; H. Vance Reverse electronic dictionary using synonyms to expand search capabilities
US6144958A (en) * 1998-07-15 2000-11-07 Amazon.Com, Inc. System and method for correcting spelling errors in search queries
US8396859B2 (en) * 2000-06-26 2013-03-12 Oracle International Corporation Subject matter context search engine
US7392238B1 (en) * 2000-08-23 2008-06-24 Intel Corporation Method and apparatus for concept-based searching across a network
US20020059220A1 (en) * 2000-10-16 2002-05-16 Little Edwin Colby Intelligent computerized search engine
US20040249808A1 (en) * 2003-06-06 2004-12-09 Microsoft Corporation Query expansion using query logs
US7996419B2 (en) * 2004-03-31 2011-08-09 Google Inc. Query rewriting with entity detection
US7536382B2 (en) * 2004-03-31 2009-05-19 Google Inc. Query rewriting with entity detection
US7617176B2 (en) * 2004-07-13 2009-11-10 Microsoft Corporation Query-based snippet clustering for search result grouping
CN101601032A (zh) * 2005-01-18 2009-12-09 雅虎公司 结合万维网搜索技术和万维网内容的被赞助搜索条目的匹配和排名
US7636714B1 (en) * 2005-03-31 2009-12-22 Google Inc. Determining query term synonyms within query context
US7613664B2 (en) * 2005-03-31 2009-11-03 Palo Alto Research Center Incorporated Systems and methods for determining user interests
US7756855B2 (en) * 2006-10-11 2010-07-13 Collarity, Inc. Search phrase refinement by search term replacement
JP4143085B2 (ja) * 2005-12-15 2008-09-03 日本電信電話株式会社 同義語獲得方法及び装置及びプログラム及びコンピュータ読み取り可能な記録媒体
US9177124B2 (en) * 2006-03-01 2015-11-03 Oracle International Corporation Flexible authentication framework
US20070214158A1 (en) * 2006-03-08 2007-09-13 Yakov Kamen Method and apparatus for conducting a robust search
US7653618B2 (en) * 2007-02-02 2010-01-26 International Business Machines Corporation Method and system for searching and retrieving reusable assets
CN101276361B (zh) * 2007-03-28 2010-09-15 阿里巴巴集团控股有限公司 一种显示相关关键词的方法及系统
US20080294619A1 (en) * 2007-05-23 2008-11-27 Hamilton Ii Rick Allen System and method for automatic generation of search suggestions based on recent operator behavior
US7921069B2 (en) * 2007-06-28 2011-04-05 Yahoo! Inc. Granular data for behavioral targeting using predictive models
US20090037399A1 (en) * 2007-07-31 2009-02-05 Yahoo! Inc. System and Method for Determining Semantically Related Terms
CN101601038A (zh) * 2007-08-03 2009-12-09 松下电器产业株式会社 关联词语提示装置
US7788276B2 (en) * 2007-08-22 2010-08-31 Yahoo! Inc. Predictive stemming for web search with statistical machine translation models
US20090055386A1 (en) * 2007-08-24 2009-02-26 Boss Gregory J System and Method for Enhanced In-Document Searching for Text Applications in a Data Processing System
CN101398820B (zh) * 2007-09-24 2010-11-17 北京启明星辰信息技术股份有限公司 一种大规模关键词匹配方法
JP2009080577A (ja) * 2007-09-25 2009-04-16 Toshiba Corp 情報検索支援装置及び方法
US8583670B2 (en) * 2007-10-04 2013-11-12 Microsoft Corporation Query suggestions for no result web searches
CN101241512B (zh) * 2008-03-10 2012-01-11 北京搜狗科技发展有限公司 一种重新定义查询词的搜索方法及装置
US8095540B2 (en) * 2008-04-16 2012-01-10 Yahoo! Inc. Identifying superphrases of text strings
JP5355949B2 (ja) * 2008-07-16 2013-11-27 株式会社東芝 次検索キーワード提示装置、次検索キーワード提示方法、及び次検索キーワード提示プログラム
US20100205198A1 (en) * 2009-02-06 2010-08-12 Gilad Mishne Search query disambiguation

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Chinese Office Action mailed Feb. 13, 2012 for Chinese patent application No. 200910135276.X, a counterpart foreign application of U.S. Appl. No. 12/863,482, 12 pages.
Chinese Office Action mailed Jul. 27, 2012 for Chinese patent application No. 200910135276.X, a counterpart foreign application of U.S. Appl. No. 12/863,482, 15 pages.
Extended European Search Report mailed Jan. 22, 2013 for European patent application No. 10774604.2, 5 pages.
PCT Intl Search Report and Written Opinion for Application No. PCT/IB2010/001094, dated Sep. 30, 2010, 11 pgs.
Translated Copy of the Japanese Office Action mailed Jun. 17, 2014 for Japanese patent application No. 2012-510381, a counterpart foreign application of U.S. Appl. No. 12/863,482, 7 pages.
Translated Japanese Office Action mailed Dec. 3, 2013 for Japanese patent application No. 2012-510381, a counterpart foreign application of U.S. Appl. No. 12/863,482, 6 pages.

Also Published As

Publication number Publication date
HK1148367A1 (en) 2011-09-02
EP2430575A1 (en) 2012-03-21
JP2012527028A (ja) 2012-11-01
JP5698222B2 (ja) 2015-04-08
CN101887436B (zh) 2013-08-21
US20110082860A1 (en) 2011-04-07
EP2430575A4 (en) 2013-02-20
CN101887436A (zh) 2010-11-17
US9576054B2 (en) 2017-02-21
WO2010131101A1 (en) 2010-11-18
US20150074076A1 (en) 2015-03-12

Legal Events

Date Code Title Description
AS Assignment

Owner name: ALIBABA GROUP HOLDING LIMITED, CAYMAN ISLANDS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:XING, FEI;DONG, JING;GUO, NING;AND OTHERS;SIGNING DATES FROM 20100713 TO 20100714;REEL/FRAME:024704/0990

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8