WO2021053391A1 - Multilingual search queries and results - Google Patents

Publication number
WO2021053391A1
Authority
WO
WIPO (PCT)
Prior art keywords
search results
language
search
query
determining
Application number
PCT/IB2020/000750
Other languages
French (fr)
Inventor
Vanessia WU
Deepak Narayan BOTE
Yiming Xu
Neema EBRAHIM-ZADEH
Adam Cohen
Original Assignee
Google Llc
Application filed by Google Llc
Publication of WO2021053391A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3337Translation of the query language, e.g. Chinese to English

Definitions

  • This specification generally relates to obtaining and providing search results in response to executing a search query in multiple languages: a language in which the query is provided as well as one or more other languages.
  • a user device typically includes a user application, such as a web browser, via which the user device can submit a search query to a search engine.
  • the search engine identifies resources that are relevant to the search query.
  • the search engine identifies the resources in the form of search results and returns the search results to the user device on a search results page.
  • a search result is data generated by the search engine that identifies a resource, and includes a resource locator for the resource.
  • An example search result can include a web page title, a snippet of text extracted from the web page, and the URL of the web page.
  • a search engine executes a query in the language in which it is provided and provides a search results page that includes search results, which are also generally in the same language.
  • a user device can submit a separate, translated version of the query in another language, which generally results in the search engine providing a separate search results page including search results that are in that other language (or another language).
  • one innovative aspect of the subject matter described in this specification can be embodied in methods including the operations of receiving, from a user device associated with a user, a search query; determining, based on the search query, a first language that specifies a language in which the search query is to be executed; obtaining, for the search query, a first set of search results that are in the first language; determining to translate the search query into a second language that is different from the first language, wherein the determining comprises: determining, from a profile of the user, a set of languages and a confidence level for each language in the set of languages, wherein each language in the set of languages indicates a language of content that the user has previously accessed and wherein the confidence level for each language specifies a likelihood that the user understands the language; and identifying, based on the confidence levels, the second language from the set of languages; in response to determining to translate the search query into the second language, translating the search query into the second language to obtain a translated search query; obtaining, for
  • Another innovative aspect of the subject matter described in this specification can be embodied in methods including the operations of receiving, from a user device associated with a user, a search query; determining, based on the search query, a first language that specifies a language in which the search query is written; obtaining, for the search query, a first set of search results that are in the first language; determining, from a profile of the user, a set of languages in which the user has previously accessed content and a confidence level for each language in the set of languages, wherein the confidence level for a particular language specifies a likelihood that the user understands the particular language; identifying a second language from the set of languages, wherein the second language is different from the first language and the confidence level for the second language satisfies a first confidence threshold; translating the search query into the second language; obtaining, for the translated search query, a second set of search results that are in the second language; determining that a topic co-occurrence exists between a subset of search results from the second
  • determining to translate the search query into a second language that is different from the first language can include determining that a length of the search query is greater than a minimum query length threshold and is less than a maximum query length threshold.
  • determining to provide the second set of search results for presentation on the user device can be further based on a popularity of one or more search results in the second set of search results obtained for the translated search query, the popularity being based on the one or more search results being accessed a number of times that exceeds a popularity threshold.
  • identifying the second language from the set of languages can include determining that the confidence level for the second language does not satisfy a second confidence threshold.
  • methods can further include translating the second set of search results into the first language to obtain a translated second set of search results.
  • providing, for presentation on the user device, a search results page that includes the first set of search results and the second set of search results can include providing a search results page that includes the first set of search results and the translated second set of search results.
  • determining to translate the search query into the second language that is different from the first language can include one or more of: determining that the search query does not include any pornographic references; determining that the search query is not a navigational query; and determining, based on historical records of executed search queries, that a number of times the search query has been issued in the second language satisfies a particular threshold.
  • determining to provide the second set of search results for presentation on the user device can include one or more of: determining, based on historical records of executed search queries, that the first set of search results has not received a number of interactions that satisfies an interaction threshold; determining that the second set of search results includes a number of search results in a language other than the second language that satisfies a language threshold; determining, based on historical records of executed search queries, that a number of times the search query has been issued in the second language satisfies a particular threshold; or determining that the second set of search results includes a number of search results with an IR score that does not satisfy an IR threshold.
  • determining the topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results can include determining that the subset of search results from the second set of search results includes one or more entities that are present in the subset of search results from the first set of search results.
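  • As an illustration only, the entity-overlap test described above could be sketched as follows. The function name, the `min_shared` parameter, and the representation of each search result as a set of language-independent entity identifiers are assumptions made for this sketch; the specification does not prescribe how entities are extracted or compared.

```python
from typing import Iterable, Set


def has_topic_cooccurrence(
    first_subset: Iterable[Set[str]],
    second_subset: Iterable[Set[str]],
    min_shared: int = 1,  # hypothetical parameter, not from the specification
) -> bool:
    """Return True if the two subsets of results share at least `min_shared` entities.

    Each result is represented (by assumption) as a set of entity identifiers
    that do not depend on the language of the result.
    """
    first_entities = set().union(*first_subset)
    second_entities = set().union(*second_subset)
    return len(first_entities & second_entities) >= min_shared


# Example: one entity ("paris") appears in both subsets.
first = [{"eiffel_tower", "paris"}, {"seine"}]
second = [{"paris"}, {"louvre"}]
print(has_topic_cooccurrence(first, second))
```

    A stricter check is possible by raising `min_shared`, which is one way to require more than a single shared entity before the second set of results is considered topically related.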
  • providing, for presentation on the user device, a search results page that includes the first set of search results and the second set of search results can include providing, for presentation on the user device, the first set of search results in a first portion of the search results page; and providing, for presentation on the user device, the second set of search results in a second portion of the search results page, wherein the second portion is delineated from the first portion.
  • each search result in the second set of search results can include an image obtained from a resource to which the search result is linked.
  • providing a search results page that includes the first set of search results and the second set of search results can include providing the search results page with two groupings of search results, with a first group including the first set of search results and a second group including the second set of search results.
  • the techniques described in this specification enable proactive provision of search results in one or more language(s) (other than the language of the query) without having to submit another translated version of the query.
  • the techniques described herein minimize the computing and network resources that would otherwise be needed to generate such a query (e.g., by using a separate translation service to generate such a query) and separately transmit such translated queries over the network.
  • the second set of search results may be provided proactively based upon a determination to provide the second set of search results using objective measures so that the second set of search results are provided when search results objectively enhance the first set of search results.
  • the techniques described in this specification provide an improved user interface that can provide a first set of search results, which are received in response to the untranslated query, and a second set of search results, which are responsive to a translated version of the query, all on a single search results page.
  • This is an improvement over other user interfaces for a search results page, which, for example, provide a separate search results page for each set of search results (i.e., the untranslated and translated search results).
  • the improved user interface provides the first set of search results in a vertical, list-type format on the search results page and the second set of search results in a carousel that displays these search results and enables navigating them in a horizontal fashion.
  • a user of the client device has simultaneous access to both sets of search results all within the same user interface and does not need to toggle back-and-forth between the two separate search results pages with the two sets of search results. User interaction with the search results is therefore improved.
  • the techniques described in this specification can also provide a translation engine as a component on the user interface that, e.g., is associated with the various search results in the different language.
  • Such a technique is resource efficient because it avoids expending computing resources that would otherwise be required to repeatedly select portions of the search results text in the different language, insert this text into a separate translation service (e.g., in a separate interface) to obtain the translation, and then return to the search results interface to repeat this process for other text in the different language.
  • the improved user interface described in this specification enables a user to learn a language other than a language that a user understands.
  • an improved user interface for the search results page enables a user of a client device to access search results/content that a user typically would not access because such content may be in a language that the user does not understand. This results in an improvement in user experience and engagement on the content platform.
  • Figure 1 is a block diagram of an example environment in which digital content is distributed and provided for display on user devices.
  • Figure 2 is a flow diagram of an example process for translating a search query from a first language to a second language and providing search results in response to the search query in both languages.
  • Figures 3A-3D are example user interfaces showing a search results page being provided for display that includes the first and second sets of search results, which are provided in response to the submission of a search query in a first language and a second language, respectively.
  • Figure 4 is a block diagram of an example computer system that can be used to perform operations described.
  • a search engine can be configured to obtain a search query that is to be executed in a first language, determine to translate the search query into a second language that is different from the first language, obtain search results in response to the search query in both the first and the second languages, and provide these search results for display on a search results page.
  • the search engine can receive a search query from a user device associated with a user. Based on this received search query, the search engine can determine a first language that specifies a language in which the search query is to be executed. For example, the search engine can determine the first language by detecting the language in which the query is written or based on the language profile associated with the user device that submitted the query.
  • the search engine can also determine whether to translate the search query into another language (i.e., a language other than the first language).
  • the search engine can evaluate several factors (e.g., content preferences, languages of content previously consumed by the user device, length of query, references to illicit information in the query, etc.) in determining whether to translate the search query into another language. If the search engine determines to translate the search query into another language based on the evaluation of one or more of these (or other appropriate) factors, the search engine identifies a language into which to translate this query (i.e., a second language).
  • the search engine identifies the second language based on the languages of content previously accessed by the user and their respective confidence levels representing the expected proficiency of the user of the user device in those languages. In response to determining to translate the search query and identifying the second language as the language into which the query is to be translated, the search engine translates the search query into the second language to obtain a translated search query.
  • the search engine obtains two sets of search results: a first set of search results that is obtained in response to executing the search query in the first language and a second set of search results that is obtained in response to executing the translated search query.
  • the search engine can determine whether to provide the second set of search results on a search results page.
  • the search engine can determine this based on an evaluation of one or more factors, including but not limited to, (1) whether a topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results; (2) a popularity of the untranslated search query and/or the first set of search results received in response to the search query; (3) a popularity of one or more search results in the second set of search results obtained for the translated search query; (4) whether the second set of search results includes a number of search results in a language other than the second language; or (5) IR scores for one or more results in the second set of search results.
  • If the search engine determines to provide the second set of search results, the search engine then provides a search results page for presentation on the user device that includes the first set of search results and the second set of search results. On the other hand, if the search engine determines not to provide the second set of search results, the search engine provides a search results page for presentation on the user device that includes the first set of search results but not the second set of search results. The second set of search results is therefore provided when it objectively enhances the first set of search results.
  • Figure 1 is a block diagram of an example environment 100 in which digital content is distributed and provided for display on user devices.
  • a computer network 102 such as a local area network (LAN), wide area network (WAN), the Internet, or a combination thereof, connects publisher web sites 104, user devices 106, and the search engine 108.
  • the environment 100 may include many thousands of publisher web sites 104 and user devices 106.
  • a website 104 includes one or more resources 105 associated with a domain name and hosted by one or more servers.
  • An example website is a collection of web pages formatted in hypertext markup language (HTML) that can contain text, images, multimedia content, and programming elements, such as scripts.
  • Each website 104 is maintained by a content publisher, which is an entity that controls, manages, and/or owns the website 104.
  • a resource is any data that can be provided by the publisher 104 over the network 102 and that can be associated with a resource address.
  • resources include HTML pages, word processing documents, portable document format (PDF) documents, images, video, and feed sources, to name just a few.
  • the resources can include content (e.g., words, phrases, pictures) and may include embedded information (such as meta information and hyperlinks) and/or embedded instructions (such as scripts).
  • a user device 106 is an electronic device capable of requesting and receiving resources over the network 102.
  • Example user devices 106 include personal computers, mobile communication devices, and other devices that can send and receive data over the network 102.
  • a user device 106 typically includes a user application, such as a web browser or a native application, to facilitate the sending and receiving of data over the network 102.
  • the web browser can enable a user to display and interact with text, images, videos, music and other information typically located on a web page at a website on the world wide web or a local area network.
  • the search engine 108 identifies the resources by crawling the publisher web sites 104 and indexing the resources provided by the publisher web sites 104.
  • the indexed and, optionally, cached copies of the resources, are stored in an index 112.
  • the user devices 106 submit search queries to the search engine 108.
  • the search queries are submitted in the form of a search request that includes the search query and, optionally, a unique identifier that identifies the user device 106 that submits the request.
  • the unique identifier can be data from a cookie stored at the user device, or a user account identifier if the user maintains an account with the search engine 108, or some other identifier that identifies the user device 106 or the user using the user device.
  • the search engine 108 uses the index 112 to identify resources that are relevant to the queries.
  • the search engine 108 identifies the resources in the form of search results and returns the search results to the user devices 106 in a search results page resource.
  • a search result is data generated by the search engine 108 that identifies a resource that satisfies a particular search query, and includes a resource locator for the resource.
  • An example search result can include a web page title, a snippet of text extracted from the web page, and the URL of the web page.
  • the search results are ranked based on scores related to the resources identified by the search results, such as information retrieval (“IR”) scores, and optionally a separate ranking of each resource relative to other resources (e.g., an authority score).
  • the search results can be ordered according to these scores and provided to the user device 106 according to the order.
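  • A minimal sketch of a search result record and score-based ordering is shown below. The field names follow the example search result described above (title, snippet, URL); combining the IR score with the optional authority score by multiplication is an assumption of this sketch, since the specification does not state how the scores are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SearchResult:
    title: str
    snippet: str
    url: str                       # resource locator for the resource
    ir_score: float                # information retrieval ("IR") score
    authority_score: float = 1.0   # optional separate ranking; default is an assumption


def rank_results(results: List[SearchResult]) -> List[SearchResult]:
    """Order results from highest to lowest combined score.

    The product of the two scores is a hypothetical combination rule.
    """
    return sorted(results, key=lambda r: r.ir_score * r.authority_score, reverse=True)


results = [
    SearchResult("Zoo hours", "Open daily...", "https://example.com/a", ir_score=0.6),
    SearchResult("City zoo", "Visit the zoo...", "https://example.com/b", ir_score=0.9),
    SearchResult("Zoo map", "Find exhibits...", "https://example.com/c", ir_score=0.5, authority_score=2.0),
]
for result in rank_results(results):
    print(result.url)
```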
  • the user devices 106 receive the search results pages and render the pages for presentation to users. In response to the user selecting a search result at a user device 106, the user device 106 requests the resource identified by the resource locator included in the selected search result.
  • the publisher of the web site 104 hosting the resource receives the request for the resource from the user device 106 and provides the resource to the requesting user device 106.
  • the queries submitted from user devices 106 are stored in query logs 114.
  • Selection data (e.g., click data) are stored in selection logs 116.
  • the query logs 114 and the selection logs 116 define search history data 110 that include data from and related to previous search requests associated with unique identifiers.
  • the selection logs 116 define actions taken responsive to search results provided by the search engine 108.
  • the query logs 114 and selection logs 116 can be used to map queries submitted by the user devices 106 to web pages 104 that were identified in search results and the actions taken by users (i.e., that data are associated with the identifiers from the search requests so that a search history for each identifier can be accessed).
  • the selection logs 116 and query logs 114 can thus be used by the search engine 108 to determine the sequence of queries submitted by the user devices 106, the actions taken in response to the queries, and how often the queries are submitted.
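  • The use of the query logs 114 and the selection logs 116 described above could be sketched as follows. The record types, field names, and helper functions are hypothetical; the specification states only that logged data are associated with the unique identifiers from the search requests.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class QueryLogEntry:          # hypothetical shape of a query log 114 record
    identifier: str
    query: str


@dataclass(frozen=True)
class SelectionLogEntry:      # hypothetical shape of a selection log 116 record
    identifier: str
    query: str
    selected_url: str


def query_frequency(query_log: List[QueryLogEntry]) -> Counter:
    """Count how often each query was submitted across all identifiers."""
    return Counter(entry.query for entry in query_log)


def search_history(
    query_log: List[QueryLogEntry],
    selection_log: List[SelectionLogEntry],
    identifier: str,
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return the query sequence and per-query selections for one identifier."""
    queries = [e.query for e in query_log if e.identifier == identifier]
    selections: Dict[str, List[str]] = defaultdict(list)
    for e in selection_log:
        if e.identifier == identifier:
            selections[e.query].append(e.selected_url)
    return queries, dict(selections)


query_log = [
    QueryLogEntry("u1", "where is the zoo"),
    QueryLogEntry("u2", "where is the zoo"),
    QueryLogEntry("u1", "zoo tickets"),
]
selection_log = [SelectionLogEntry("u1", "zoo tickets", "https://example.com/tickets")]
print(query_frequency(query_log)["where is the zoo"])
print(search_history(query_log, selection_log, "u1"))
```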
  • Figure 2 is a flow diagram of an example process 200 for translating a search query from a first language to a second language and providing search results in response to the search query in both languages.
  • Operations of process 200 are described below as being performed by the components of the system described and depicted in Figure 1. Operations of the process 200 are described below for illustration purposes only. Operations of the process 200 can be performed by any appropriate device or system, e.g., any appropriate data processing apparatus. Operations of the process 200 can also be implemented as instructions stored on a non-transitory computer readable medium. Execution of the instructions causes one or more data processing apparatus to perform operations of the process 200.
  • a search engine 108 receives a search query from a user device 106 (at 205). As described above with reference to Figure 1, a user device 106 can submit a search query, which can be received and processed by the search engine 108.
  • the search engine 108 determines, based on the received search query, a first language specifying a language in which the search query is to be submitted/executed (at 210).
  • the search engine 108 can determine the language in which the search query is to be executed as the language in which the query is written.
  • a language detection service 118 (which can be part of the search engine 108 or separate and/or external from it) can be used to detect the first language in which the query is written.
  • the first language can be a predicted language in which the user of the user device 106 likely wants the results.
  • the search engine 108 can determine the first language using a machine learning model (e.g., a supervised or unsupervised machine learning model) to output the expected first language, where the machine learning model can be trained on previously-executed search queries, including the different parameters about the query (e.g., location of user device issuing query, entity identified in the query, language of content previously viewed by the user, etc.), and a corresponding label specifying the language in which the user wanted results.
  • the machine learning model may determine, e.g., based at least on the query’s location of Germany (among other parameters), that the first language (i.e., the language in which the query is to be executed) is German.
  • the search engine 108 obtains a first set of search results (at 215).
  • the search engine 108 executes the search query (as described with reference to Figure 1) and in response, retrieves a first set of search results, which are generally in the first language.
  • the search engine 108 determines whether to translate the search query into a second language that is different from the first language (at 220). In some implementations, the search engine 108 determines whether to translate the search query into the second language based at least on languages in which the user of the user device has previously accessed content. In such implementations, the search engine 108 determines, from a profile of the user of the user device 106, a set of languages corresponding to the languages of the content that the user has previously accessed and a confidence level for each language in the set of languages (as further described below).
  • the search engine 108 (or another content platform/publisher) can determine the language(s) of content that the user has previously accessed by parsing the attributes/metadata (e.g., a language attribute specifying the language of the content, a publication location, etc.) of the various content items accessed within a certain timeframe (e.g., one week, one month).
  • the search engine 108 (or another content platform/publisher) can include a set of rules to determine the language of the content item based on the identified attribute(s)/metadata of the content item and its/their corresponding value(s).
  • the search engine 108 obtains the corresponding value of the language parameter that indicates the language in which the content item is written or otherwise provided. In this example, the search engine 108 determines that the language of the content item is the language specified by the value of the language attribute. As another example, if the content item includes a publication location that specifies the country of publication of the content item as USA and/or a title/abstract attribute that is written in English (e.g., as detected by the language detection service 118), the search engine 108 determines the language of the content item is English.
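  • One possible form of such a rule set is sketched below. The metadata keys, the country-to-language table, and the `detect_language` callback (standing in for the language detection service 118) are all assumptions of this sketch and are not defined by the specification.

```python
from typing import Callable, Mapping, Optional

# Hypothetical lookup from publication location to a likely content language.
COUNTRY_DEFAULT_LANGUAGE = {"USA": "English", "Germany": "German"}


def language_from_metadata(
    metadata: Mapping[str, str],
    detect_language: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Apply the rules in order and return the first language that is found."""
    # Rule 1: an explicit language attribute determines the language.
    language = metadata.get("language")
    if language:
        return language
    # Rule 2: fall back to the country of publication, if it is known.
    by_country = COUNTRY_DEFAULT_LANGUAGE.get(metadata.get("publication_location", ""))
    if by_country:
        return by_country
    # Rule 3: detect the language of the title/abstract, if a detector is supplied.
    title = metadata.get("title")
    if title and detect_language is not None:
        return detect_language(title)
    return None


print(language_from_metadata({"language": "Spanish"}))
print(language_from_metadata({"publication_location": "USA", "title": "Zoo guide"}))
```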
  • the search engine 108 can also parse portions of the previously-accessed content items in determining the language of content that the user has previously accessed. For example, the search engine 108 can identify portions of a previously-accessed content item (e.g., at random, a specified number of words of the content item, or, if the content item is non-textual, a specified amount of playing time of the content item) and use the language detection service 118 to identify the language of those portions of the content item. Based on this language identification for a portion of a content item, the search engine 108 determines that the language of the content item as a whole is the same as the language of the portion of the content item.
  • the search engine 108 (or another content platform/publisher) can store a data structure (e.g., a table) that maintains a correlation between the content items previously accessed by the user of the user device and an identification of the determined language (e.g., determined using the example techniques described above or other appropriate techniques) for that content item.
  • the search engine can generate summary data specifying, e.g., a distribution of number (or percentage) of content items viewed/accessed by the user corresponding to the respective language of that content item.
  • such a distribution can be in the following forms: [ (English: 240)
  • the search engine 108 determines a confidence level for each of those languages.
  • the confidence level for a particular language specifies a likelihood that the user of the user device 106 understands that language.
  • the search engine 108 uses a rules-based engine, a machine learning model (e.g., a supervised or unsupervised model), or another appropriate statistical model to compute the confidence level for each language.
  • the rules-based engine defines a set of rules that are analyzed against the content and language data.
  • the rules-based engine can specify rules that can be used to determine the confidence levels, such as, e.g., (1) if 70% or more of the content accessed by the user is in a particular language, then there is a 90% confidence that the user understands that language; or (2) if 20% or less of the content accessed by the user is in a particular language, then there is a 50% confidence that the user understands that language.
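  • The two example rules above can be sketched as follows, together with the per-language share of accessed content on which they operate. The function names are hypothetical, and returning `None` for shares between 20% and 70% reflects that the example rules do not cover that range; a real rule set would presumably define it.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def language_distribution(content_languages: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of accessed content items in each language."""
    counts = Counter(content_languages)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {language: count / total for language, count in counts.items()}


def rule_based_confidence(share: float) -> Optional[float]:
    """Map a language's share of accessed content to a confidence level."""
    if share >= 0.70:   # rule (1): 70% or more of content -> 90% confidence
        return 0.90
    if share <= 0.20:   # rule (2): 20% or less of content -> 50% confidence
        return 0.50
    return None         # no example rule is given for the range in between


accessed = ["English"] * 8 + ["Spanish"] * 2   # made-up access history
shares = language_distribution(accessed)
print({language: rule_based_confidence(share) for language, share in shares.items()})
```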
  • the model can be trained on content/language datasets (specifying various parameters, e.g., language, type of content, length of content, etc.) for various users, and corresponding known labels specifying proficiency of the user in those languages (e.g., 1 indicating full proficiency/understanding of the language, 0 indicating no proficiency/understanding of the language, 0.5 indicating intermediate proficiency/understanding of the language).
  • the resulting/trained model can accept language/content data for the particular user as input and output a data structure identifying various languages and the corresponding confidence level indicating the expected proficiency of the user in each such language.
  • based on the set of languages and the corresponding confidence levels (as determined by the search engine and stored in a user profile for the user or as obtained from a content platform/publisher that determines the confidence levels), the search engine 108 identifies a second language into which the search query can be translated.
  • the search engine 108 can identify the second language based on the confidence levels determined for the set of languages. For example, the search engine can identify the language, other than the first language, that has the highest determined confidence level.
  • the search engine 108 ignores the confidence value for English (since English is the first language in which the query is submitted) and identifies the highest remaining confidence value/level (0.32 > 0.18 > 0.12) and the corresponding language (Spanish).
  • Spanish is identified as the second language.
  • the identified second language represents a language that the user is likely able to understand.
  • a confidence threshold is applied in evaluating whether to identify a particular language as the second language into which the query should be translated. If the language with the highest confidence value (other than the first language) satisfies (e.g., meets or exceeds) this threshold, then the search engine 108 identifies this language as the second language into which the search query is to be translated. On the other hand, if the language with the highest confidence value (other than the first language) does not satisfy (e.g., is less than) this threshold, then the search engine 108 determines not to identify this language as the second language into which the search query is to be translated. Using the above example, if the confidence threshold is 50% (0.5), the search engine 108 determines that the confidence level for Spanish (which is the language other than English with the highest confidence value/level) is less than the confidence threshold and thus, the query should not be translated into another language.
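  • The selection and thresholding steps described above could be sketched as follows. The function name and signature are assumptions; in the sample data, only the Spanish value (0.32) and the two lower values (0.18, 0.12) come from the example in the text, while the names of the other two languages and the English value are made up for illustration.

```python
from typing import Mapping, Optional


def select_second_language(
    confidences: Mapping[str, float],
    first_language: str,
    confidence_threshold: float,
) -> Optional[str]:
    """Pick the highest-confidence language other than the first language.

    Returns None when no candidate exists or when the best candidate does not
    meet the threshold, in which case the query is not translated.
    """
    candidates = {lang: c for lang, c in confidences.items() if lang != first_language}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best if candidates[best] >= confidence_threshold else None


confidences = {
    "English": 0.9,   # made-up value; ignored because English is the first language
    "Spanish": 0.32,
    "French": 0.18,   # language name is hypothetical
    "German": 0.12,   # language name is hypothetical
}
print(select_second_language(confidences, "English", confidence_threshold=0.5))
print(select_second_language(confidences, "English", confidence_threshold=0.3))
```

    With a threshold of 0.5 the function returns no language, matching the outcome in the example above; a lower threshold of 0.3 (an illustrative value) would select Spanish.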
  • one or more of the following additional factors/attributes may be evaluated by the search engine 108 in determining whether to translate the search query into the second language. These can include, among others, (1) query length, (2) references to pornographic or other illicit information in the query, (3) frequency/ number of times the user of the user device 106 has executed search queries in languages other than the first language, or (4) whether the query is a navigational query (as described below). Each of these attributes and the search engine 108’s processing with respect to these attributes in determining whether to translate the search query, is described in the following paragraphs.
  • the search engine 108 determines a length of the search query, which can indicate whether the query should be translated. For example, a short query (e.g., less than five characters such as “ball”) or a long query (e.g., greater than 200 characters) are generally not good candidates for translation.
  • the search engine 108 determines the length of the search query by counting the characters included in the search query. For example, for a query “where is the zoo,” the search engine 108 would count each character of the query (excluding spaces between words) and determine that the length of the query is 13. Alternatively, or additionally, the search engine 108 determines the length of the search query by counting the words included in the search query.
  • the search engine 108 can count each sequence of characters that is separated by a space as one word. In the above example query, the search engine 108 would determine that the query includes four words. Additionally, or alternatively, the search engine 108 can apply other techniques (e.g., spatial length) in determining the query length.
  • the length of the query is compared to a minimum query length threshold (e.g., two words and twenty characters, five words, thirty characters) and/or a maximum query length threshold (e.g., ten words and 140 characters, twenty words, 200 characters). If the query length is less than the minimum query length threshold or greater than the maximum query length threshold, the search engine 108 determines that the query should not be translated into another language. On the other hand, if the query length is the same as or greater than the minimum query length threshold and the same as or less than the maximum query length threshold (whichever of these thresholds are applied), the search engine 108 determines that the query should be translated into another language.
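The character and word counting described above, together with the threshold comparison, might look like the following sketch. The function names and the default thresholds (five and 200 characters, taken from the short-query and long-query examples above) are illustrative assumptions.

```python
from typing import Tuple


def query_length(query: str) -> Tuple[int, int]:
    """Return (character count excluding spaces, word count)."""
    # Each sequence of characters separated by whitespace counts as one word.
    words = query.split()
    chars = sum(len(word) for word in words)
    return chars, len(words)


def length_allows_translation(
    query: str, min_chars: int = 5, max_chars: int = 200
) -> bool:
    """True if the character count lies within [min_chars, max_chars]."""
    chars, _ = query_length(query)
    return min_chars <= chars <= max_chars
```

For the query "where is the zoo", `query_length` yields 13 characters and four words, matching the counts given above.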
  • the search engine 108 can determine whether to translate the search query based on whether the search query includes any pornographic or other illicit references.
  • the search engine 108 can maintain a list of words or phrases that are known to be pornographic (or other illicit) references, and can compare word(s) of the search query with this known list of words and/or phrases. If a match is found (e.g., an exact match between the search query and a word/phrase in the list, a partial textual match between some words in the query and a word/phrase in the list), the search engine 108 determines that the query should not be translated. On the other hand, if a match is not found (e.g., no exact match or partial textual match), the search engine 108 determines that the query should be translated into another language.
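A simple way to picture the list comparison above is a word-level phrase match. The sketch below is an assumption about how "exact" and "partial textual" matches could be checked; the function name, the case-insensitive comparison, and the whole-word matching rule are not specified by the text.

```python
from typing import Iterable


def matches_blocklist(query: str, blocklist: Iterable[str]) -> bool:
    """True if any listed word/phrase appears in the query as a run of whole words.

    An exact match is the special case where the phrase covers the whole query.
    """
    query_words = query.lower().split()
    for phrase in blocklist:
        phrase_words = phrase.lower().split()
        n = len(phrase_words)
        if n == 0:
            continue
        # Slide a window of n words across the query.
        for i in range(len(query_words) - n + 1):
            if query_words[i:i + n] == phrase_words:
                return True
    return False
```

When the function returns `True`, the query would not be translated; when it returns `False`, this factor does not block translation.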
  • the search engine 108 can determine whether to translate the search query based on whether the user device 106 has previously submitted search queries written in another language (i.e., a language other than the language of the search query). In some implementations, the search engine 108 determines, based on historical records of executed search queries by the user of the user device 106, whether the user of the user device 106 has submitted search queries in another language and if so, whether the number of times that search queries have been issued in one or more other languages (i.e., languages other than the first language) satisfies (e.g., meets or exceeds) a particular threshold.
  • If the number of search queries issued in other languages satisfies this threshold, the search engine 108 can determine that the search query should be translated.
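The history check can be sketched as a count over the languages of previously executed queries. The function name, the use of language codes, and the representation of the history as a flat list are hypothetical.

```python
from typing import Iterable


def has_multilingual_history(
    past_query_languages: Iterable[str], first_language: str, min_count: int
) -> bool:
    """True if at least min_count past queries were issued in a language
    other than first_language."""
    other = sum(1 for lang in past_query_languages if lang != first_language)
    return other >= min_count
```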
  • the search engine 108 can determine whether to translate the search query based on whether the search query is a navigational query.
  • a search query is navigational if it seeks a particular website or web page. For example, the name of a social networking website, if entered as a search query, would be considered navigational.
  • navigational queries are often the names of popular/well-known entities. Accordingly, in some implementations, the search engine 108 can maintain a list of popular entities and compare the entered search query against that list. If a match (e.g., an exact match) is found, the search engine 108 determines that the search query is navigational and determines that the search query should not be translated.
  • On the other hand, if no match is found, the search engine 108 determines that the search query is not navigational and determines that the search query should be translated (or evaluates one or more additional factors in determining whether to translate the search query).
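The navigational check can be illustrated as an exact lookup against a list of popular entity names. The normalization used here (lower-casing and collapsing whitespace) and the entity name in the usage note are assumptions made for the sketch.

```python
from typing import Iterable


def _normalize(text: str) -> str:
    # Lower-case and collapse runs of whitespace so trivial variants compare equal.
    return " ".join(text.lower().split())


def is_navigational(query: str, popular_entities: Iterable[str]) -> bool:
    """True if the whole query exactly matches a known popular entity name."""
    known = {_normalize(entity) for entity in popular_entities}
    return _normalize(query) in known
```

For a hypothetical list containing "example social", the query "  Example Social " is treated as navigational, while "example social login help" is not.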
  • the search engine 108 determines that the search query should be translated.
  • each of the above-analyzed factors can be assigned a score that can be combined (e.g., summed up) and/or normalized. If the normalized/ combined score satisfies (e.g., meets or exceeds) a predetermined threshold, the search engine 108 can determine that the search query should be translated. On the other hand, if the normalized score does not satisfy the predetermined threshold, the search engine 108 can determine that the search query should not be translated. Other techniques for combining the one or more of the above (or other) factors can be used to determine whether to translate the search query.
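One possible reading of the score combination above is to sum the per-factor scores and normalize by the number of factors. The factor names, the 0-to-1 score range, and the choice of the mean as the normalization are assumptions; the text notes that other combining techniques can be used.

```python
from typing import Dict


def should_translate(factor_scores: Dict[str, float], threshold: float) -> bool:
    """Sum the factor scores, normalize to a mean, and compare to the threshold."""
    if not factor_scores:
        return False
    combined = sum(factor_scores.values())
    normalized = combined / len(factor_scores)
    # "Satisfies" is read here as "meets or exceeds".
    return normalized >= threshold


# Hypothetical scores: 1.0 favors translation, 0.0 weighs against it.
scores = {"length": 1.0, "illicit": 1.0, "history": 0.0, "navigational": 1.0}
```

Here the normalized score is 0.75, so a threshold of 0.7 leads to translation and a threshold of 0.8 does not.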
  • the search engine 108 can determine the second language in which to translate the search query based on the set of languages in which the user has previously accessed content and the corresponding confidence levels for each of those languages, as described above.
  • the search engine 108 in response to determining to translate the search query into the second language, the search engine 108 translates the search query into the second language (at 225).
  • the search engine 108 includes a translation engine 120 (or interacts with a translation engine 120 that may be separate and/or external to the search engine 108) to translate the search query into the identified second language to obtain a translated search query in the second language.
  • the search engine 108 obtains a second set of search results for the translated search query (at 230).
  • the search engine 108 executes the translated search query and retrieves a second set of search results that are responsive to this translated search query. This second set of search results is generally in the second language; however, the search results may also be in one or more other languages.
  • the search engine 108 determines whether to provide the second set of search results along with the first set of search results (at 235). In some implementations, the search engine 108 determines whether to provide the second set of search results along with the first set of search results based on an evaluation of one or more of the following factors: (1) whether a topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results; (2) a popularity of the first search query and/or the first set of search results received in response to the first search query; (3) a popularity of one or more search results in the second set of search results obtained for the translated search query; (4) whether the second set of search results includes a number of search results in a language other than the second language; or (5) IR scores for one or more results in the second set of search results.
  • Each of these factors, and the corresponding operation of the search engine 108, is described below.
  • the search engine 108 determines whether a topic co-occurrence exists between a subset of search results from the second set of search results (e.g., the top-N search results in the second set of search results) and a subset of search results from the first set of search results (e.g., the top-N search results in the first set of search results). The search engine does this by determining that the subset of search results from the second set of search results includes one or more entities that are also present in the subset of search results from the first set of search results. In such implementations, the search engine 108 parses each subset of search results to identify the entity/entities identified in the results and compares the lists of entities for the two subsets of search results.
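The entity comparison can be sketched as a set intersection over the top-N results of each set. This sketch assumes that entity extraction has already happened and that each result is represented simply by the collection of entity names found in it; the function name and the `min_shared` parameter (standing in for a threshold number of matching entities) are hypothetical.

```python
from typing import Iterable, Sequence, Set


def topic_co_occurrence(
    first_results: Sequence[Iterable[str]],
    second_results: Sequence[Iterable[str]],
    top_n: int,
    min_shared: int,
) -> bool:
    """True if the top-N results of both sets share at least min_shared entities."""
    first_entities: Set[str] = set()
    for entities in first_results[:top_n]:
        first_entities.update(entities)

    second_entities: Set[str] = set()
    for entities in second_results[:top_n]:
        second_entities.update(entities)

    return len(first_entities & second_entities) >= min_shared
```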
  • If the subset of search results from the second set includes a threshold number of matching entities that are also included in the subset of search results from the first set, the search engine 108 determines to provide the second set of search results (or determines to evaluate additional factors that can indicate whether to provide the second set of search results). On the other hand, if the subset of search results from the second set does not include a threshold number of matching entities that are included in the subset of search results from the first set, the search engine 108 determines not to provide the second set of search results.
  • Additionally, or alternatively, the search engine 108 can determine whether to provide the second set of search results based on the popularity of the search query (in the first language) and/or the first set of search results.
  • the popularity can be based on the number of times that the search query has been submitted and/or the number of times that one or more results in the first set of search results has been interacted with by user devices 106.
  • the search engine 108 determines the popularity of the search query and/or the first set of search results by determining, based on historical records of executed search queries, whether the search query (in the first language) has been issued a threshold number of times and/or whether one or more results of the first set of search results provided in response to the search query have previously received a threshold number of interactions (e.g., user clicks, time spent on the one or more of the first set of search results, etc.).
  • If so, the search engine 108 can determine to provide the second set of search results.
  • Otherwise, the search engine 108 can determine not to provide the second set of search results.
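The popularity test on the original query can be pictured as two threshold comparisons. In this sketch the two conditions are combined with a logical OR, which is only one reading of the "and/or" in the description; the function name, parameter names, and the use of click counts as the interaction measure are assumptions.

```python
from typing import Iterable


def is_popular(
    times_issued: int,
    interactions_per_result: Iterable[int],
    min_issued: int,
    min_interactions: int,
) -> bool:
    """True if the query was issued often enough, or if any result in the
    first set received at least min_interactions interactions."""
    issued_ok = times_issued >= min_issued
    interaction_ok = any(
        count >= min_interactions for count in interactions_per_result
    )
    return issued_ok or interaction_ok
```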
  • the search engine 108 can determine whether to provide the second set of search results based on a popularity of the search query in the second language.
  • the search engine 108 can determine the popularity of the search query in the second language by determining, based on historical records of executed search queries, that the number of times the search query has been issued in the second language satisfies (e.g., meets or exceeds) a particular threshold.
  • this factor weighs in favor of providing the second set of search results when the number of times the search has been executed in French (e.g., by a user device 106 for which the first language is a language other than French) satisfies (e.g., meets or exceeds) a particular threshold.
  • the search engine 108 can determine whether to provide the second set of search results based on whether the second set of search results includes a number of search results in a language other than the second language. If this number satisfies (e.g., meets or exceeds) a language threshold (which can be dynamic or static), the search engine 108 can determine not to provide the second set of search results. On the other hand, if this number does not satisfy (e.g., is less than) a language threshold (which can be dynamic or static), the search engine 108 can determine to provide the second set of search results.
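The language-threshold check amounts to counting results whose detected language differs from the second language. The sketch below assumes each result has already been labeled with a language code (for example by a language detection service) and uses a static threshold; both the function name and the codes in the usage note are illustrative.

```python
from typing import Iterable


def passes_language_check(
    result_languages: Iterable[str], second_language: str, language_threshold: int
) -> bool:
    """True (provide the second set) if fewer than language_threshold results
    are in a language other than second_language."""
    off_language = sum(1 for lang in result_languages if lang != second_language)
    return off_language < language_threshold
```

For example, with French ("fr") as the second language and a threshold of 2, the labels `["fr", "fr", "en", "fr"]` pass, while `["fr", "en", "en"]` do not.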
  • the search engine 108 can determine whether to provide the second set of search results based on the search engine’s determined IR scores (or another similar metric) (as described above with reference to Figure 1) for the second set of search results. In some implementations, if the search engine 108 determines that a certain number of search results in the second set of search results has an IR score that satisfies (e.g., meets or exceeds) an IR threshold, the search engine 108 determines to provide the second set of search results.
  • On the other hand, if the search engine 108 determines that a certain number of search results in the second set of search results has an IR score that does not satisfy (e.g., is less than) the IR threshold, the search engine 108 determines not to provide the second set of search results. Alternatively, or additionally, in some implementations, the search engine 108 can determine whether to provide the second set of search results based on the search engine’s comparison between the determined IR scores (as described above with reference to Figure 1) of the first set of search results and the second set of search results.
  • For example, if the search engine 108 determines that a certain number of search results in the second set of search results has an IR score that is the same as (or greater than) the IR scores for a subset of the first set of search results, the search engine 108 determines to provide the second set of search results. On the other hand, if the search engine 108 determines that a certain number of search results in the second set of search results has an IR score that is less than the IR scores for a subset of the first set of search results, the search engine 108 determines not to provide the second set of search results.
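The IR-threshold variant of this check can be sketched as counting how many results in the second set meet the threshold. The function name, the `min_count` parameter (standing in for "a certain number"), and the numeric score range are assumptions; the comparison against the first set's scores is not shown.

```python
from typing import Iterable


def enough_high_ir_scores(
    second_set_scores: Iterable[float], ir_threshold: float, min_count: int
) -> bool:
    """True if at least min_count results have an IR score that meets or
    exceeds ir_threshold."""
    qualifying = sum(1 for score in second_set_scores if score >= ir_threshold)
    return qualifying >= min_count
```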
  • the search engine 108 can process/analyze multiple factors described above (or appropriate factors) in determining whether to provide the second set of search results.
  • If the search engine 108 determines not to provide the second set of search results based on the above-described processing, the search engine 108 provides, for presentation on a user device 106, a search results page that includes the first set of search results (but not the second set of search results) (at 240). On the other hand, if the search engine 108 determines to provide the second set of search results based on the above-described processing, the search engine 108 provides, for presentation on the user device 106, a search results page (such as those shown in Figures 3A-3D) that includes the first set of search results as well as the second set of search results (at 245).
  • the search engine 108 can translate (e.g., using the translation engine 120) one or more search results in the second set of search results into the first language prior to providing these for display in the search results page. For example, if the second language is French and some of the search results are in French (e.g., as determined by the language detection service 118), the search engine 108 can translate these French search results into the first language (e.g., English). Alternatively, instead of translating the search results in the second set of search results, the search engine 108 can provide a user-selectable translation interface component (e.g., a button or plug-in) in proximity to the appropriate search result(s) in the second language (or another language other than the first language).
  • When the user selects this user-selectable interface component (e.g., a button or a plug-in; as shown at element 342 in Figure 3D), the search engine 108 can invoke the translation engine 120 to provide a translation of the specific search result from the second language (or another language other than the first language) to the first language.
  • the search engine 108 can obtain a search query that is to be executed in a first language, determine to translate the search query into a second language (or one or more additional languages), obtain search results in response to the search query in the first language and the second language (and/or one or more other languages), and provide these search results for display within a search results page.
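The overall flow summarized above can be sketched as a small orchestration function. Every collaborator is passed in as a callable so that no particular search or translation API is implied; the function name, the parameter names, and the return shape (a pair of result lists) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def multilingual_search(
    query: str,
    search: Callable[[str], List[str]],
    choose_second_language: Callable[[str], Optional[str]],
    translate: Callable[[str, str], str],
    should_provide_second: Callable[[List[str], List[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Return (first_results, second_results); second_results is empty when
    the query is not translated or the second set is not provided."""
    first_results = search(query)

    # Decide whether, and into which language, to translate the query.
    second_language = choose_second_language(query)
    if second_language is None:
        return first_results, []

    translated_query = translate(query, second_language)
    second_results = search(translated_query)

    # Decide whether the second set accompanies the first set on the page.
    if not should_provide_second(first_results, second_results):
        return first_results, []
    return first_results, second_results
```

Passing the decision steps in as callables keeps the sketch independent of how each factor is actually evaluated.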
  • Figures 3A-3D show example user interfaces in which a search results page is provided for display on the user device 106 and includes the first and second sets of search results that are provided in response to the submission of a search query in a first language and a second language, respectively.
  • a search results page 308 is provided for the search query 306 of “brexit.”
  • the first set of search results (in English) are provided as search results 302-A and 302-B.
  • the second set of search results (in French) are provided in the carousel/panel 304.
  • the user of the user device 106 can horizontally scroll/navigate through the second set of search results provided in this carousel/panel 304.
  • the user of the user device 106 can vertically scroll/navigate through the first set of search results.
  • the first set of search results are provided in a first portion of the search results page and the second set of search results are provided in a second portion of the search results page that is delineated from the first portion.
  • a visually-separated user interface (UI)
  • Figure 3B shows another search results page 310 that is provided in response to the search query 312 of “causes of severe headache.”
  • the search results page 310 displays the translated search query (i.e., the search query in the second language).
  • This translated search query is a clickable/selectable link that, upon being clicked or selected, results in the search engine executing and providing a separate search results page with search results just for the translated query.
  • a user of the user device 106 may click on/select this link, e.g., to obtain additional search results for the translated search query (beyond the number of results provided in the carousel/panel 318).
  • Figure 3C shows another search results page 320 that is provided in response to the search query 322 of “brexit.”
  • the first set of search results (324 and 326) are shown in the same manner as in Figure 3A and the second set of search results are shown in a panel/carousel 328 (as in Figure 3A).
  • Each (or at least some) of the second set of search results provided in the carousel/panel 328 include an image (e.g., 330-A and 330-B).
  • the search engine 108 obtains each of the images from the underlying resource corresponding to the search result and, as shown, provides each such image for display in association with the respective search result.
  • Figure 3D shows another search results page 332 that is provided in response to the search query 334 of “brexit.”
  • the first set of search results (336) is shown in the same manner as in Figure 3A and the second set of search results are shown in a panel/carousel 338 (as in Figure 3A).
  • Each (or at least some) of the second set of search results provided in the carousel/panel 338 includes an image from the resource corresponding to the search result (as in Figure 3C).
  • the carousel/panel 338 is configured to enable provision of the second set of search results for queries translated into more than one language other than the first language (e.g., English).
  • the carousel/panel 338 can include a set of buttons 340, each button being associated with a particular country. Pressing a particular button from this set of buttons (340) results in the search query being translated into the language for the country associated with the selected button.
  • the search engine 108 executes the search query against Irish resources and provides a set of results in the carousel/panel 338.
  • the search engine 108 executes the search query against German resources and provides a set of results in the carousel/panel 338.
  • the search engine 108 provides a user-selectable translation interface component 342 (e.g., a button or plug-in) in proximity to the search result.
  • the search engine 108 can invoke the translation engine 120 to provide a translation of the specific search result from the detected language (e.g., German) to the first language (e.g., English) and provide the translated search result on the same search results page (e.g., in place of the original, untranslated search result).
  • the search engine 108 automatically invokes the translation engine 120 when it detects a search result is in a language other than the first language and obtains the translation in the first language of the search result. In such implementations, the search engine 108 automatically provides, within the carousel 338, the translated search result in the first language.
  • FIG. 4 is a block diagram of an example computer system 400 that can be used to perform operations described above.
  • the system 400 includes a processor 410, a memory 420, a storage device 430, and an input/output device 440.
  • Each of the components 410, 420, 430, and 440 can be interconnected, for example, using a system bus 450.
  • the processor 410 is capable of processing instructions for execution within the system 400.
  • In some implementations, the processor 410 is a single-threaded processor; in other implementations, the processor 410 is a multi-threaded processor.
  • the processor 410 is capable of processing instructions stored in the memory 420 or on the storage device 430.
  • the memory 420 stores information within the system 400.
  • the memory 420 is a computer-readable medium.
  • In some implementations, the memory 420 is a volatile memory unit; in other implementations, the memory 420 is a non-volatile memory unit.
  • the storage device 430 is capable of providing mass storage for the system 400.
  • the storage device 430 is a computer-readable medium.
  • the storage device 430 can include, for example, a hard disk device, an optical disk device, a storage device that is shared over a network by multiple computing devices (e.g., a cloud storage device), or some other large capacity storage device.
  • the input/output device 440 provides input/output operations for the system 400.
  • the input/output device 440 can include one or more of a network interface device, e.g., an Ethernet card, a serial communication device, e.g., an RS-232 port, and/or a wireless interface device, e.g., an 802.11 card.
  • the input/output device can include driver devices configured to receive input data and send output data to peripheral devices 460, e.g., keyboard, printer and display devices.
  • Other implementations, however, can also be used, such as mobile computing devices, mobile communication devices, set-top box television client devices, etc.
  • Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage media (or medium) for execution by, or to control the operation of, data processing apparatus.
  • the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • a computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).
  • The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
  • the term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing.
  • the apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • the apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them.
  • the apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
  • a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment.
  • a computer program may, but need not, correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors.
  • a processor will receive instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few.
  • Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network.
  • Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device).
  • Data generated at the client device e.g., a result of the user interaction

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining and providing search results in response to executing a search query in a first language as well as one or more other languages. In one aspect, a search query that is to be executed in a first language is received from a user device. A first set of search results that are in the first language is obtained. The search query can be translated into a second language, and a second set of search results is obtained for the translated query. A topic co-occurrence between search results from the second set of search results and search results from the first set of search results is determined. If a topic co-occurrence exists, the second set of search results is provided for display on the user device along with the first set of search results.

Description

MULTILINGUAL SEARCH QUERIES AND RESULTS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Patent Application No. 62/903,698, entitled “MULTILINGUAL SEARCH RESULTS,” filed September 20, 2019, which is incorporated herein by reference in its entirety.
BACKGROUND
[0002] This specification generally relates to obtaining and providing search results in response to executing a search query in multiple languages: a language in which the query is provided as well as one or more other languages.
[0003] A user device typically includes a user application, such as a web browser, via which the user device can submit a search query to a search engine. In response to the search query, the search engine identifies resources that are relevant to the search query. The search engine identifies the resources in the form of search results and returns the search results to the user device on a search results page. A search result is data generated by the search engine that identifies a resource, and includes a resource locator for the resource. An example search result can include a web page title, a snippet of text extracted from the web page, and the URL of the web page.
[0004] In general, a search engine executes a query in the language in which it is provided and provides a search results page that includes search results, which are also generally in the same language. A user device can submit a separate, translated version of the query in another language, which generally results in the search engine providing a separate search results page including search results that are in that other language (or another language).
SUMMARY
[0005] In general, one innovative aspect of the subject matter described in this specification can be embodied in methods including the operations of receiving, from a user device associated with a user, a search query; determining, based on the search query, a first language that specifies a language in which the search query is to be executed; obtaining, for the search query, a first set of search results that are in the first language; determining to translate the search query into a second language that is different from the first language, wherein the determining comprises: determining, from a profile of the user, a set of languages and a confidence level for each language in the set of languages, wherein each language in the set of languages indicates a language of content that the user has previously accessed and wherein the confidence level for each language specifies a likelihood that the user understands the language; and identifying, based on the confidence levels, the second language from the set of languages; in response to determining to translate the search query into the second language, translating the search query into the second language to obtain a translated search query; obtaining, for the translated search query, a second set of search results that are in the second language; determining a topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results; in response to determining that the topic co-occurrence exists, determining to provide the second set of search results for presentation on the user device; and providing, for presentation on the user device, a search results page that includes the first set of search results and the second set of search results.
[0006] In general, another innovative aspect of the subject matter described in this specification can be embodied in methods including the operations of receiving, from a user device associated with a user, a search query; determining, based on the search query, a first language that specifies a language in which the search query is written; obtaining, for the search query, a first set of search results that are in the first language; determining, from a profile of the user, a set of languages in which the user has previously accessed content and a confidence level for each language in the set of languages, wherein the confidence level for a particular language specifies a likelihood that the user understands the particular language; identifying a second language from the set of languages, wherein the second language is different from the first language and the confidence level for the second language satisfies a first confidence threshold; translating the search query into the second language; obtaining, for the translated search query, a second set of search results that are in the second language; determining that a topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results; in response to determining that the topic co-occurrence exists, determining to provide the second set of search results for presentation on the user device; and providing, for presentation on the user device, a search results page with two groupings of search results, with a first group including the first set of search results and a second group including the second set of search results.
[0007] Other embodiments of these aspects include corresponding methods, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices. These and other embodiments can each optionally include one or more of the following features.
[0008] In some implementations, determining to translate the search query into a second language that is different from the first language can include determining that a length of the search query is greater than a minimum query length threshold and is less than a maximum query length threshold.
[0009] In some implementations, determining to provide the second set of search results for presentation on the user device can be further based on a popularity of one or more search results in the second set of search results obtained for the translated search query, the popularity being based on the one or more search results being accessed a number of times that exceeds a popularity threshold.
[0010] In some implementations, identifying the second language from the set of languages can include determining that the confidence level for the second language does not satisfy a second confidence threshold.
[0011] In some implementations, methods can further include translating the second set of search results into the first language to obtain a translated second set of search results.
[0012] In some implementations, providing, for presentation on the user device, a search results page that includes the first set of search results and the second set of search results can include providing a search results page that includes the first set of search results and the translated second set of search results.
[0013] In some implementations, determining to translate the search query into the second language that is different from the first language, can include one or more of: determining that the search query does not include any pornographic references; determining that the search query is not a navigational query; and determining, based on historical records of executed search queries, that a number of times the search query has been issued in the second language satisfies a particular threshold.
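The query-level checks in paragraphs [0008] and [0013] can be sketched as a single gating function. This is a minimal sketch, not the claimed method: the threshold values, the choice to measure length in characters, and the names `QueryEligibilityConfig` and `should_translate` are all assumptions made for illustration. The specification says "one or more of" the checks can be used; the sketch applies all of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEligibilityConfig:
    # All values are hypothetical; the specification does not give numbers.
    min_length: int = 3                     # minimum query length threshold
    max_length: int = 60                    # maximum query length threshold
    min_second_language_issues: int = 100   # historical issue-count threshold


def should_translate(
    query: str,
    *,
    is_navigational: bool,
    has_adult_reference: bool,
    times_issued_in_second_language: int,
    config: QueryEligibilityConfig = QueryEligibilityConfig(),
) -> bool:
    """Return True only if every eligibility check passes."""
    length = len(query.strip())  # assumption: length counted in characters
    # Length must be strictly between the minimum and maximum thresholds.
    if not (config.min_length < length < config.max_length):
        return False
    # Navigational queries and queries with pornographic references are skipped.
    if is_navigational or has_adult_reference:
        return False
    # The query must have been issued often enough in the second language.
    return times_issued_in_second_language >= config.min_second_language_issues
```

The boolean inputs stand in for classifiers that the specification does not describe, so they are passed in rather than computed here.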
[0014] In some implementations, determining to provide the second set of search results for presentation on the user device can include one or more of: determining, based on historical records of executed search queries, that the first set of search results has not received a number of interactions that satisfies an interaction threshold; determining that the second set of search results includes a number of search results in a language other than the second language that satisfies a language threshold; determining, based on historical records of executed search queries, that a number of times the search query has been issued in the second language satisfies a particular threshold; or determining that the second set of search results includes a number of search results with an IR score that does not satisfy an IR threshold.
[0015] In some implementations, determining the topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results can include determining that the subset of search results from the second set of search results includes one or more entities that are present in the subset of search results from the first set of search results.
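The entity-based test in paragraph [0015] can be illustrated as a set intersection. The sketch below assumes each search result already carries a set of language-independent entity identifiers under an `"entities"` key, and that the "subset" is the top `top_k` results; the specification fixes neither, and the function name and parameters are hypothetical.

```python
from typing import Dict, List, Set


def topic_co_occurrence(
    first_results: List[Dict[str, Set[str]]],
    second_results: List[Dict[str, Set[str]]],
    top_k: int = 5,        # hypothetical subset size
    min_shared: int = 1,   # "one or more entities" in the specification
) -> bool:
    """Return True if the top results of both sets share enough entities."""
    # Collect the entities mentioned in the top-k results of each set.
    first_entities = set().union(*(r["entities"] for r in first_results[:top_k]))
    second_entities = set().union(*(r["entities"] for r in second_results[:top_k]))
    # A co-occurrence exists when the two entity sets overlap sufficiently.
    return len(first_entities & second_entities) >= min_shared
```

Entity identifiers need to be comparable across languages for the intersection to be meaningful, which is why the sketch assumes identifiers rather than surface strings.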
[0016] In some implementations, providing, for presentation on the user device, a search results page that includes the first set of search results and the second set of search results can include providing, for presentation on the user device, the first set of search results in a first portion of the search results page; and providing, for presentation on the user device, the second set of search results in a second portion of the search results page, wherein the second portion is delineated from the first portion.
[0017] In some implementations, each search result in the second set of search results can include an image obtained from a resource to which the search result is linked.
[0018] In some implementations, providing a search results page that includes the first set of search results and the second set of search results can include providing the search results page with two groupings of search results, with a first group including the first set of search results and a second group including the second set of search results.
[0019] The subject matter described in this specification can be implemented in particular embodiments to realize one or more of the following advantages.
[0020] The techniques described in this specification enable proactive provision of search results in one or more language(s) (other than the language of the query) without having to submit another translated version of the query. As a result, the techniques described herein minimize the computing and network resources that would otherwise be needed to generate such a query (e.g., by using a separate translation service to generate such a query) and separately transmit such translated queries over the network. The second set of search results may be provided proactively based upon a determination to provide the second set of search results using objective measures so that the second set of search results are provided when search results objectively enhance the first set of search results.
[0021] Moreover, the techniques described in this specification provide an improved user interface that can provide a first set of search results, which are received in response to the untranslated query, and a second set of search results, which are responsive to a translated version of the query, all on a single search results page. This is an improvement over other user interfaces for a search results page, which, for example, provide a separate search results page for each set of search results (i.e., the untranslated and translated search results). In some implementations (and as further described in this specification), the improved user interface provides the first set of search results in a vertical, list-type format on the search results page and the second set of search results in a carousel that displays these search results and enables navigating them in a horizontal fashion. As a result, a user of the client device has simultaneous access to both sets of search results all within the same user interface and does not need to toggle back-and-forth between the two separate search results pages with the two sets of search results. User interaction with the search results is therefore improved.
[0022] In addition to providing the second set of search results that are responsive to the search query in a different language (i.e., a language other than the one in which the query is written), the techniques described in this specification can also provide a translation engine as a component on the user interface that, e.g., is associated with the various search results in the different language. By enabling direct access to such a translation component for the different search results, the techniques described in this specification enable ready translation of search results on the search results page without having to invoke a separate translation service to achieve the translation. Such a technique is resource efficient because it avoids expending computing resources that would otherwise be required to repeatedly select portions of the search results text in the different language, inserting this text into a separate translation service (e.g., in a separate interface) to obtain the translation, and then return to the search results interface to repeat this process for other text in the different language.
[0023] Further still, the improved user interface described in this specification enables a user to learn a language other than a language that a user understands. For example, an improved user interface for the search results page enables a user of a client device to access search results/content that a user typically would not access because such content may be in a language that the user does not understand. This results in an improvement in user experience and engagement on the content platform.
[0024] The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] Figure 1 is a block diagram of an example environment in which digital content is distributed and provided for display on user devices.
[0026] Figure 2 is a flow diagram of an example process for translating a search query from a first language to a second language and providing search results in response to the search query in both languages.
[0027] Figures 3A-3D are example user interfaces showing a search results page being provided for display that includes the first and second sets of search results, which are provided in response to the submission of a search query in a first language and a second language, respectively.
[0028] Figure 4 is a block diagram of an example computer system that can be used to perform operations described.
[0029] Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
[0030] This specification generally relates to obtaining and providing search results in response to executing a search query in multiple languages: a language in which the query is provided as well as one or more other languages.
[0031] As summarized below and described in greater detail throughout this specification, a search engine can be configured to obtain a search query that is to be executed in a first language, determine to translate the search query into a second language that is different from the first language, obtain search results in response to the search query in both the first and the second languages, and provide these search results for display on a search results page.
[0032] In some implementations, the search engine can receive a search query from a user device associated with a user. Based on this received search query, the search engine can determine a first language that specifies a language in which the search query is to be executed. For example, the search engine can determine the first language by detecting the language in which the query is written or based on the language profile associated with the user device that submitted the query.
[0033] The search engine can also determine whether to translate the search query into another language (i.e., a language other than the first language). The search engine can evaluate several factors (e.g., content preferences, languages of content previously consumed by the user device, length of query, references to illicit information in the query, etc.) in determining whether to translate the search query into another language. If the search engine determines to translate the search query into another language based on the evaluation of one or more of these (or other appropriate) factors, the search engine identifies a language into which to translate this query (i.e., a second language). In some implementations, the search engine identifies the second language based on the languages of content previously accessed by the user and their respective confidence levels representing the expected proficiency of the user of the user device in those languages. In response to determining to translate the search query and identifying the second language as the language into which the query is to be translated, the search engine translates the search query into the second language to obtain a translated search query.
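One way to picture the second-language selection is as a filter over per-language confidence levels. The sketch below is an illustration under stated assumptions, not the specification's implementation: "satisfies the first confidence threshold" is read as greater-than-or-equal, "does not satisfy a second confidence threshold" (paragraph [0010]) is read as strictly below an optional upper bound, ties are broken by taking the highest confidence, and the function name and default values are hypothetical.

```python
from typing import Dict, Optional


def pick_second_language(
    confidences: Dict[str, float],
    first_language: str,
    first_threshold: float = 0.5,             # hypothetical value
    second_threshold: Optional[float] = None,  # hypothetical optional upper bound
) -> Optional[str]:
    """Pick a language other than the first whose confidence is in range."""
    candidates = {
        lang: conf
        for lang, conf in confidences.items()
        if lang != first_language
        and conf >= first_threshold
        and (second_threshold is None or conf < second_threshold)
    }
    if not candidates:
        return None  # no suitable language, so the query is not translated
    # Assumption: prefer the candidate the user most likely understands.
    return max(candidates, key=candidates.get)
```

For example, with confidences of 0.9 for English, 0.6 for French, and 0.5 for Chinese and an English query, the default thresholds select French.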
[0034] The search engine obtains two sets of search results: a first set of search results that is obtained in response to executing the search query in the first language and a second set of search results that is obtained in response to executing the translated search query.
[0035] The search engine can determine whether to provide the second set of search results on a search results page. The search engine can determine this based on an evaluation of one or more factors, including but not limited to, (1) whether a topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results; (2) a popularity of the untranslated search query and/or the first set of search results received in response to the search query; (3) a popularity of one or more search results in the second set of search results obtained for the translated search query; (4) whether the second set of search results includes a number of search results in a language other than the second language; or (5) IR scores for one or more results in the second set of search results.
[0036] If, based on the evaluation of one or more of these (or other) factors, the search engine determines to provide the second set of search results, the search engine then provides a search results page for presentation on the user device that includes the first set of search results and the second set of search results. On the other hand, if the search engine determines not to provide the second set of search results, the search engine provides a search results page for presentation on the user device that includes the first set of search results but not the second set of search results. The second set of search results are therefore provided when the second set of search results objectively enhance the first set of search results.
[0037] These and additional features are further described with reference to Figures 1-4.
[0038] Figure 1 is a block diagram of an example environment 100 in which digital content is distributed and provided for display on user devices.
[0039] A computer network 102, such as a local area network (LAN), wide area network (WAN), the Internet, or a combination thereof, connects publisher web sites 104, user devices 106, and the search engine 108. The environment 100 may include many thousands of publisher web sites 104 and user devices 106.
[0040] A website 104 includes one or more resources 105 associated with a domain name and hosted by one or more servers. An example website is a collection of web pages formatted in hypertext markup language (HTML) that can contain text, images, multimedia content, and programming elements, such as scripts. Each website 104 is maintained by a content publisher, which is an entity that controls, manages, and/or owns the website 104.
[0041] A resource is any data that can be provided by the publisher 104 over the network 102 and that can be associated with a resource address. Examples of resources include HTML pages, word processing documents, and portable document format (PDF) documents, images, video, and feed sources, to name just a few. The resources can include content (e.g., words, phrases, pictures) and may include embedded information (such as meta information and hyperlinks) and/or embedded instructions (such as scripts).
[0042] A user device 106 is an electronic device capable of requesting and receiving resources over the network 102. Example user devices 106 include personal computers, mobile communication devices, and other devices that can send and receive data over the network 102. A user device 106 typically includes a user application, such as a web browser or a native application, to facilitate the sending and receiving of data over the network 102. The web browser can enable a user to display and interact with text, images, videos, music and other information typically located on a web page at a website on the world wide web or a local area network.
[0043] To facilitate searching of these resources 105, the search engine 108 identifies the resources by crawling the publisher web sites 104 and indexing the resources provided by the publisher web sites 104. The indexed and, optionally, cached copies of the resources, are stored in an index 112.
[0044] The user devices 106 submit search queries to the search engine 108. The search queries are submitted in the form of a search request that includes the search query and, optionally, a unique identifier that identifies the user device 106 that submits the request. The unique identifier can be data from a cookie stored at the user device, or a user account identifier if the user maintains an account with the search engine 108, or some other identifier that identifies the user device 106 or the user using the user device.
[0045] In response to the search request, the search engine 108 uses the index 112 to identify resources that are relevant to the queries. The search engine 108 identifies the resources in the form of search results and returns the search results to the user devices 106 in a search results page resource. A search result is data generated by the search engine 108 that identifies a resource that satisfies a particular search query, and includes a resource locator for the resource. An example search result can include a web page title, a snippet of text extracted from the web page, and the URL of the web page.
[0046] The search results are ranked based on scores related to the resources identified by the search results, such as information retrieval (“IR”) scores, and optionally a separate ranking of each resource relative to other resources (e.g., an authority score). The search results can be ordered according to these scores and provided to the user device 106 according to the order.
[0047] The user devices 106 receive the search results pages and render the pages for presentation to users. In response to the user selecting a search result at a user device 106, the user device 106 requests the resource identified by the resource locator included in the selected search result. The publisher of the web site 104 hosting the resource receives the request for the resource from the user device 106 and provides the resource to the requesting user device 106.
[0048] In some implementations, the queries submitted from user devices 106 are stored in query logs 114. Selection data (e.g., click data) for the queries and the web pages referenced by the search results are stored in selection logs 116. The query logs 114 and the selection logs 116 define search history data 110 that include data from and related to previous search requests associated with unique identifiers. The selection logs 116 define actions taken responsive to search results provided by the search engine 108. The query logs 114 and selection logs 116 can be used to map queries submitted by the user devices 106 to web pages 104 that were identified in search results and the actions taken by users (i.e., the data are associated with the identifiers from the search requests so that a search history for each identifier can be accessed). The selection logs 116 and query logs 114 can thus be used by the search engine 108 to determine the sequence of queries submitted by the user devices 106, the actions taken in response to the queries, and how often the queries are submitted.
[0049] Figure 2 is a flow diagram of an example process 200 for translating a search query from a first language to a second language and providing search results in response to the search query in both languages. Operations of process 200 are described below as being performed by the components of the system described and depicted in Figure 1. Operations of the process 200 are described below for illustration purposes only. Operations of the process 200 can be performed by any appropriate device or system, e.g., any appropriate data processing apparatus. Operations of the process 200 can also be implemented as instructions stored on a non-transitory computer readable medium. Execution of the instructions causes one or more data processing apparatus to perform operations of the process 200.
[0050] A search engine 108 receives a search query from a user device 106 (at 205). As described above with reference to Figure 1, a user device 106 can submit a search query, which can be received and processed by the search engine 108.
[0051] The search engine 108 determines, based on the received search query, a first language specifying a language in which the search query is to be submitted/executed (at 210). In some implementations, the search engine 108 can determine the language in which the search query is to be executed as the language in which the query is written. In such implementations, a language detection service 118 (which can be part of the search engine 108 or separate and/or external from it) can be used to detect the first language in which the query is written. In some implementations, the first language can be a predicted language in which the user of the user device 106 likely wants the results. In such implementations, the search engine 108 can determine the first language using a machine learning model (e.g., a supervised or unsupervised machine learning model) to output the expected first language, where the machine learning model can be trained on previously-executed search queries, including the different parameters about the query (e.g., location of user device issuing query, entity identified in the query, language of content previously viewed by the user, etc.), and a corresponding label specifying the language in which the user wanted results. For example, although the language of a search query (“George Bush”) may be English, the machine learning model may determine, e.g., based at least on the query’s location of Germany (among other parameters), that the first language (i.e., the language in which the query is to be executed) is German.
[0052] For the search query received at operation 205, the search engine 108 obtains a first set of search results (at 215). In some implementations, based on the first language indicating the language in which the query is to be executed, the search engine 108 executes the search query (as described with reference to Figure 1) and in response, retrieves a first set of search results, which are generally in the first language.
[0053] The search engine 108 determines whether to translate the search query into a second language that is different from the first language (at 220). In some implementations, the search engine 108 determines whether to translate the search query into the second language based at least on languages in which the user of the user device has previously accessed content. In such implementations, the search engine 108 determines, from a profile of the user of the user device 106, a set of languages corresponding to the languages of the content that the user has previously accessed and a confidence level for each language in the set of languages (as further described below).
[0054] The search engine 108 (or another content platform/publisher) can determine the language(s) of content that the user has previously accessed by parsing the attributes/metadata (e.g., a language attribute specifying the language of the content, a publication location, etc.) of the various content items accessed within a certain timeframe (e.g., one week, one month). In some implementations, the search engine 108 (or another content platform/publisher) can include a set of rules to determine the language of the content item based on the identified attribute(s)/metadata of the content item and its/their corresponding value(s). For example, if the content item includes a language attribute that specifies the language of the content item, the search engine 108 obtains the corresponding value of the language attribute that indicates the language in which the content item is written or otherwise provided. In this example, the search engine 108 determines that the language of the content item is the language specified by the value of the language attribute. As another example, if the content item includes a publication location that specifies the country of publication of the content item as USA and/or a title/abstract attribute that is written in English (e.g., as detected by the language detection service 118), the search engine 108 determines the language of the content item is English.
[0055] Alternatively, or in addition to extracting and analyzing metadata/attributes of the content item, the search engine 108 (or another content platform/publisher) can also parse portions of the previously-accessed content items in determining the language of content that the user has previously accessed. For example, the search engine 108 can identify portions of a previously-accessed content item (e.g., at random, a specified number of words of the content item, or, if the content item is non-textual, a specified amount of playing time of the content item) and use the language detection service 118 to identify the language of those portions of the content item. Based on this language identification for a portion of a content item, the search engine 108 determines that the language of the content item as a whole is the same as the language of the portion of the content item.
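The metadata rules and the sampling fallback described in paragraphs [0054] and [0055] can be sketched as an ordered chain of checks. Everything named here is an assumption for illustration: the dictionary keys (`language`, `publication_country`, `text`), the `COUNTRY_DEFAULT_LANGUAGE` table, and the `detect` callable, which stands in for the language detection service 118 and is injected rather than implemented.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping from country of publication to a default language.
COUNTRY_DEFAULT_LANGUAGE: Dict[str, str] = {"USA": "en", "France": "fr"}


def content_language(
    item: Dict[str, str],
    detect: Callable[[str], Optional[str]],
    sample_words: int = 50,  # hypothetical sample size
) -> Optional[str]:
    """Determine a content item's language from metadata, then from a sample."""
    # Rule 1: an explicit language attribute wins.
    language = item.get("language")
    if language:
        return language
    # Rule 2: fall back to the country of publication, if it is mapped.
    country = item.get("publication_country")
    if country in COUNTRY_DEFAULT_LANGUAGE:
        return COUNTRY_DEFAULT_LANGUAGE[country]
    # Rule 3: detect the language of a leading portion of the text and
    # treat it as the language of the whole item.
    sample = " ".join(item.get("text", "").split()[:sample_words])
    return detect(sample) if sample else None
```

Passing the detector in as a callable keeps the sketch independent of any particular detection library.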
[0056] In some implementations, the search engine 108 (or another content platform/publisher) can store a data structure (e.g., a table) that maintains a correlation between the content items previously accessed by the user of the user device and an identification of the determined language (e.g., determined using the example techniques described above or other appropriate techniques) for that content item. In some implementations, the search engine can generate summary data specifying, e.g., a distribution of number (or percentage) of content items viewed/accessed by the user corresponding to the respective language of that content item. For example, such a distribution can be in the following forms: [ (English: 240) | (French: 10) | (Chinese:5) ] (representing actual counts); or [ (English:0.72) | (French:0.18) | (Chinese:0.10) ] (representing percentages).
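The summary data described above can be produced by counting the determined language of each accessed content item. This is a minimal sketch assuming the per-item languages are already available as a list of strings; the function name `language_distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def language_distribution(
    languages: Iterable[str],
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Return (counts, shares) of accessed content items per language."""
    counts = Counter(languages)
    total = sum(counts.values())
    # Shares are fractions of all accessed items; empty input yields no shares.
    shares = {lang: n / total for lang, n in counts.items()} if total else {}
    return dict(counts), shares
```

Both forms shown in the paragraph above (actual counts and fractional shares) come out of the same pass over the access history. The two example distributions in that paragraph appear to be independent illustrations, since the counts 240/10/5 would not yield the shares 0.72/0.18/0.10.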
[0057] Based on the data regarding the content items previously accessed by the user of the user device 106 and the languages of those items (also referred to as the set of languages), the search engine 108 (or another appropriate content platform/publisher) determines a confidence level for each of those languages. The confidence level for a particular language specifies a likelihood that the user of the user device 106 understands that language. In some implementations, the search engine 108 uses a rules-based engine, a machine learning model (e.g., a supervised or unsupervised model), or another appropriate statistical model to compute the confidence level for each language.
[0058] In implementations where a rules-based engine is used, the rules-based engine defines a set of rules that are analyzed against the content and language data. For example, the rules-based engine can specify rules that can be used to determine the confidence levels, such as, e.g., (1) if 70% or more of the content accessed by the user is in a particular language, then there is a 90% confidence that the user understands that language; or (2) if 20% or less of the content accessed by the user is in a particular language, then there is a 50% confidence that the user understands that language. In implementations where a machine learning model (e.g., a trained, supervised model) is used, the model can be trained on content/language datasets (specifying various parameters, e.g., language, type of content, length of content, etc.) for various users, and corresponding known labels specifying proficiency of the user in those languages (e.g., 1 indicating full proficiency/understanding of the language, 0 indicating no proficiency/understanding of the language, 0.5 indicating intermediate proficiency/understanding of the language). The resulting/trained model can accept language/content data for the particular user as input and output a data structure identifying various languages and the corresponding confidence level indicating the expected proficiency of the user in each such language.
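The two example rules above can be sketched as a small function. This is an illustrative sketch only: the name `rule_based_confidence` is hypothetical, and because the example rules do not cover shares between 20% and 70%, the sketch returns `None` for that range rather than inventing a value.

```python
def rule_based_confidence(share):
    """Map a language's share of accessed content (0.0-1.0) to a confidence.

    Encodes only the two example rules from the text:
      (1) share >= 70%  -> 90% confidence
      (2) share <= 20%  -> 50% confidence
    Shares between the two rules are not covered by the examples, so this
    sketch returns None for them (an assumption, not a stated rule).
    """
    if share >= 0.70:
        return 0.90
    if share <= 0.20:
        return 0.50
    return None


print(rule_based_confidence(0.72))  # 0.9
print(rule_based_confidence(0.18))  # 0.5
print(rule_based_confidence(0.40))  # None
```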
[0059] In some implementations, based on the set of languages and the corresponding confidence levels (as determined by the search engine and stored in a user profile for the user or as obtained from a content platform/publisher that determines the confidence levels), the search engine 108 identifies a second language into which the search query can be translated. The search engine 108 can identify the second language based on the confidence levels determined for the set of languages. For example, the search engine can identify the language, other than the first language, that has the highest determined confidence level. For example, assume that the first language of the query is English, and the set of languages and the corresponding confidence values are as follows: [English: 0.95 | Spanish: 0.32 | Chinese: 0.12 | Japanese: 0.18]. In this example, the search engine 108 ignores the confidence value for English (since English is the first language in which the query is submitted) and identifies the highest confidence value/level (0.32>0.18>0.12) and the corresponding language (Spanish). Thus, in this example, Spanish is identified as the second language. As such, the identified second language represents a language that the user is likely able to understand.
[0060] In some implementations, a confidence threshold is applied in evaluating whether to identify a particular language as the second language into which the query should be translated. If the language with the highest confidence value (other than the first language) satisfies (e.g., meets or exceeds) this threshold, then the search engine 108 identifies this language as the second language into which the search query is to be translated. On the other hand, if the language with the highest confidence value (other than the first language) does not satisfy (e.g., is less than) this threshold, then the search engine 108 determines not to identify this language as the second language into which the search query is to be translated. Using the above example, if the confidence threshold is 50% (0.5), the search engine 108 determines that the confidence level for Spanish (which is the language other than English with the highest confidence value/level) is less than the confidence threshold and thus, the query should not be translated into another language.
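As a non-limiting illustration, the selection of the second language and the optional confidence threshold can be sketched as follows. The function name `pick_second_language` and the dictionary input are assumptions made for this example; the confidence values are those of the English/Spanish/Chinese/Japanese example above.

```python
def pick_second_language(confidences, first_language, threshold=None):
    """Return the non-first language with the highest confidence.

    confidences: dict mapping language -> confidence level (0.0-1.0).
    threshold: optional minimum confidence; if the best candidate falls
    below it, no second language is identified and None is returned.
    """
    candidates = {
        lang: conf for lang, conf in confidences.items() if lang != first_language
    }
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    if threshold is not None and candidates[best] < threshold:
        return None
    return best


levels = {"English": 0.95, "Spanish": 0.32, "Chinese": 0.12, "Japanese": 0.18}
print(pick_second_language(levels, "English"))                 # Spanish
print(pick_second_language(levels, "English", threshold=0.5))  # None
```

With no threshold, Spanish is selected; with a threshold of 0.5, Spanish's 0.32 falls short and the query is not translated, matching the example in the text.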
[0061] In some implementations, one or more of the following additional factors/attributes may be evaluated by the search engine 108 in determining whether to translate the search query into the second language. These can include, among others, (1) query length, (2) references to pornographic or other illicit information in the query, (3) frequency/number of times the user of the user device 106 has executed search queries in languages other than the first language, or (4) whether the query is a navigational query (as described below). Each of these attributes, and the search engine 108’s processing with respect to these attributes in determining whether to translate the search query, is described in the following paragraphs.
[0062] With respect to the query length attribute, the search engine 108 determines a length of the search query, which can indicate whether the query should be translated. For example, a short query (e.g., less than five characters such as “ball”) or a long query (e.g., greater than 200 characters) are generally not good candidates for translation. In some implementations, the search engine 108 determines the length of the search query by counting the characters included in the search query. For example, for a query “where is the zoo,” the search engine 108 would count each character of the query (excluding spaces between words) and determine that the length of the query is 13. Alternatively, or additionally, the search engine 108 determines the length of the search query by counting the words included in the search query. In such implementations, the search engine 108 can count each sequence of characters that is separated by a space as one word. In the above example query, the search engine 108 would determine that the query includes four words. Additionally, or alternatively, the search engine 108 can apply other techniques (e.g., spatial length) in determining the query length.
[0063] In some implementations, the length of the query is compared to a minimum query length threshold and/or a maximum query length threshold. If the query length is less than the minimum query length threshold (e.g., two words and twenty characters, five words, thirty characters) and/or greater than the maximum query length threshold (e.g., ten words and 140 characters, twenty words, 200 characters), the search engine 108 determines that the query should not be translated into another language. On the other hand, if the query length is the same as or greater than the minimum query length threshold and/or the query length is less than or the same as the maximum query length threshold, the search engine 108 determines that the query should be translated into another language.
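The character and word counting described above, together with a minimum/maximum length check, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the default bounds of 5 and 200 characters are taken from the "short query" and "long query" examples in the text rather than from any required configuration.

```python
def query_length(query):
    """Return (character_count, word_count) for a query.

    Characters are counted excluding the spaces between words; each
    whitespace-separated sequence of characters counts as one word.
    """
    words = query.split()
    return sum(len(word) for word in words), len(words)


def length_allows_translation(query, min_chars=5, max_chars=200):
    """True if the character count lies within [min_chars, max_chars].

    The default bounds are illustrative assumptions.
    """
    chars, _ = query_length(query)
    return min_chars <= chars <= max_chars


print(query_length("where is the zoo"))               # (13, 4)
print(length_allows_translation("where is the zoo"))  # True
print(length_allows_translation("ball"))              # False
```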
[0064] Additionally, or alternatively, the search engine 108 can determine whether to translate the search query based on whether the search query includes any pornographic or other illicit references. In some implementations, the search engine 108 can maintain a list of words or phrases that are known to be pornographic (or other illicit) references, and can compare word(s) of the search query with this known list of words and/or phrases. If a match is found (e.g., an exact match between the search query and a word/phrase in the list, a partial textual match between some words in the query and a word/phrase in the list), the search engine 108 determines that the query should not be translated. On the other hand, if a match is not found (e.g., no exact match or partial textual match), the search engine 108 determines that the query should be translated into another language.
[0065] Additionally, or alternatively, the search engine 108 can determine whether to translate the search query based on whether the user device 106 has previously submitted search queries written in another language (i.e., a language other than the language of the search query). In some implementations, the search engine 108 determines, based on historical records of executed search queries by the user of the user device 106, whether the user of the user device 106 has submitted search queries in another language and if so, whether the number of times that search queries have been issued in one or more other languages (i.e., languages other than the first language) satisfies (e.g., meets or exceeds) a particular threshold. For example, if the first language is English and if the user device 106 has submitted a certain number or percentage of French search queries (i.e., search queries written or provided in French) that satisfies a particular threshold, the search engine 108 can determine that the search query should be translated.
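As a non-limiting illustration, the query-history factor can be sketched by computing the share of a user's past queries issued in a language other than the first language and comparing it to a threshold. The function names, the list-of-languages input, and the default `min_share` of 0.1 are all assumptions introduced for this example.

```python
def other_language_share(past_query_languages, first_language):
    """Fraction of past queries written in a language other than first_language."""
    if not past_query_languages:
        return 0.0
    other = sum(1 for lang in past_query_languages if lang != first_language)
    return other / len(past_query_languages)


def history_favors_translation(past_query_languages, first_language, min_share=0.1):
    """True if the share of other-language queries meets or exceeds min_share.

    min_share is a hypothetical threshold chosen for illustration.
    """
    return other_language_share(past_query_languages, first_language) >= min_share


history = ["English", "English", "French", "English", "French"]
print(other_language_share(history, "English"))        # 0.4
print(history_favors_translation(history, "English"))  # True
```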
[0066] Additionally, or alternatively, the search engine 108 can determine whether to translate the search query based on whether the search query is a navigational query. A search query is navigational if it seeks a particular website or web page. For example, the name of a social networking website, if entered as a search query, would be considered navigational. In general, navigational queries are the names of the popular/well-known entities. Accordingly, in some implementations, the search engine 108 can maintain a list of popular entities and compare the entered search query against that list. If a match (e.g., an exact match) is found, the search engine 108 determines that the search query is navigational and determines that the search query should not be translated. On the other hand, if a match (e.g., an exact match) is not found, the search engine 108 determines that the search query is not navigational and determines that the search query should be translated (or evaluates one or more additional factors in determining whether to translate the search query).
[0067] Thus, based on an evaluation of one or more of the above factors, the search engine 108 determines whether the search query should be translated. In some implementations, each of the above-analyzed factors can be assigned a score that can be combined (e.g., summed up) and/or normalized. If the normalized/combined score satisfies (e.g., meets or exceeds) a predetermined threshold, the search engine 108 can determine that the search query should be translated. On the other hand, if the normalized score does not satisfy the predetermined threshold, the search engine 108 can determine that the search query should not be translated. Other techniques for combining the one or more of the above (or other) factors can be used to determine whether to translate the search query.
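One possible way of combining per-factor scores is sketched below. This is an illustration only: the text does not specify a scoring scale or a normalization, so the sketch assumes each factor yields a score between 0.0 and 1.0 and normalizes the sum by the number of factors; the function name and the default threshold of 0.5 are likewise hypothetical.

```python
def should_translate(factor_scores, threshold=0.5):
    """Combine per-factor scores and compare against a threshold.

    factor_scores: dict mapping factor name -> score in [0.0, 1.0]
    (the scale is an assumption). Scores are summed and normalized by
    the number of factors, then tested against the threshold.
    """
    if not factor_scores:
        return False
    normalized = sum(factor_scores.values()) / len(factor_scores)
    return normalized >= threshold


scores = {"length": 1.0, "illicit": 1.0, "history": 0.0, "navigational": 1.0}
print(should_translate(scores))  # True (normalized score 0.75)
```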
[0068] Upon determining to translate the search query, the search engine 108 can determine the second language in which to translate the search query based on the set of languages in which the user has previously accessed content and the corresponding confidence levels for each of those languages, as described above.
[0069] Returning to Figure 2, in response to determining to translate the search query into the second language, the search engine 108 translates the search query into the second language (at 225). In some implementations, the search engine 108 includes a translation engine 120 (or interacts with a translation engine 120 that may be separate and/or external to the search engine 108) to translate the search query into the identified second language to obtain a translated search query in the second language.
[0070] The search engine 108 obtains a second set of search results for the translated search query (at 230). In some implementations, the search engine 108 executes the translated search query and retrieves a second set of search results that are responsive to this translated search query. This second set of search results is generally in the second language; however, the search results may also be in one or more other languages.
[0071] The search engine 108 determines whether to provide the second set of search results along with the first set of search results (at 235). In some implementations, the search engine 108 determines whether to provide the second set of search results along with the first set of search results based on an evaluation of one or more of the following factors: (1) whether a topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results; (2) a popularity of the first search query and/or the first set of search results received in response to the first search query; (3) a popularity of one or more search results in the second set of search results obtained for the translated search query; (4) whether the second set of search results includes a number of search results in a language other than the second language; or (5) IR scores for one or more results in the second set of search results. Each of these factors and the corresponding operation of the search engine 108 is described below.
[0072] In some implementations, the search engine 108 determines whether a topic co-occurrence exists between a subset of search results from the second set of search results (e.g., the top-N search results in the second set of search results) and a subset of search results from the first set of search results (e.g., the top-N search results in the first set of search results). The search engine does this by determining whether the subset of search results from the second set of search results includes one or more entities that are also present in the subset of search results from the first set of search results. In such implementations, the search engine 108 parses each subset of search results to identify the entity/entities identified in the results and compares the lists of entities for the two subsets of search results. In some implementations, if the subset of search results from the second set includes a threshold number of matching entities that are also included in the subset of search results for the first set, the search engine 108 determines to provide the second set of search results (or determines to evaluate additional factors that can indicate whether to provide the second set of search results). On the other hand, if the subset of search results from the second set does not include a threshold number of matching entities that are included in the subset of search results for the first set, the search engine 108 determines not to provide the second set of search results.
[0073] Additionally, or alternatively, the search engine 108 can determine whether to provide the second set of search results based on the popularity of the search query (in the first language) and/or the first set of search results. The popularity can be based on the number of times that the search query has been submitted and/or the number of times that one or more results in the first set of search results has been interacted with by user devices 106. In some implementations, the search engine 108 determines the popularity of the search query and/or the first set of search results by determining, based on historical records of executed search queries, whether the search query (in the first language) has been issued a threshold number of times and/or whether one or more results of the first set of search results provided in response to the search query has previously received a threshold number of interactions (e.g., user clicks, time spent on the one or more of the first set of search results, etc.). For example, if the search query (in the first language) has been issued a number of times that satisfies (e.g., meets or exceeds) a predetermined threshold and/or if the number of interactions with one or more search results in the first set of search results satisfies (e.g., meets or exceeds) an interaction threshold (which can be a dynamic or static threshold), the search engine can determine to provide the second set of search results. On the other hand, if the search query (in the first language) has been issued a number of times that does not satisfy (e.g., is less than) a predetermined threshold and/or if the number of interactions with one or more search results in the first set of search results does not satisfy (e.g., is less than) an interaction threshold (which can be a dynamic or static threshold), the search engine 108 can determine not to provide the second set of search results.
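As a non-limiting illustration, the topic co-occurrence test (comparing entities found in the top-N results of each set) can be sketched as follows. The sketch assumes that entity extraction has already been performed and that entities are represented as language-independent identifiers, so that an entity found in a French result can match the same entity in an English result; the function names and the defaults for `top_n` and `min_shared` are hypothetical.

```python
def shared_entities(first_results, second_results, top_n):
    """Entities present in the top-N results of both result sets.

    Each result is represented as a set of entity identifiers
    (an assumed, pre-extracted, language-independent form).
    """
    first = {entity for result in first_results[:top_n] for entity in result}
    second = {entity for result in second_results[:top_n] for entity in result}
    return first & second


def has_topic_cooccurrence(first_results, second_results, top_n=3, min_shared=1):
    """True if at least min_shared entities appear in both top-N subsets."""
    return len(shared_entities(first_results, second_results, top_n)) >= min_shared


first = [{"brexit", "eu"}, {"uk"}]
second = [{"brexit"}, {"france"}]
print(shared_entities(first, second, top_n=3))               # {'brexit'}
print(has_topic_cooccurrence(first, second))                 # True
print(has_topic_cooccurrence(first, second, min_shared=2))   # False
```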
[0074] Additionally, or alternatively, the search engine 108 can determine whether to provide the second set of search results based on a popularity of the search query in the second language. In some implementations, the search engine 108 can determine the popularity of the search query in the second language by determining, based on historical records of executed search queries, that the number of times the search query has been issued in the second language satisfies (e.g., meets or exceeds) a particular threshold. For example, assuming the second language is French, this factor weighs in favor of providing the second set of search results when the number of times the search has been executed in French (e.g., by a user device 106 for which the first language is a language other than French) satisfies (e.g., meets or exceeds) a particular threshold.
[0075] Additionally, or alternatively, the search engine 108 can determine whether to provide the second set of search results based on whether the second set of search results includes a number of search results in a language other than the second language. If this number satisfies (e.g., meets or exceeds) a language threshold (which can be dynamic or static), the search engine 108 can determine not to provide the second set of search results. On the other hand, if this number does not satisfy (e.g., is less than) a language threshold (which can be dynamic or static), the search engine 108 can determine to provide the second set of search results.
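The language-threshold check above can be sketched as a count of results whose detected language differs from the second language. This is a minimal sketch: the function name and the list-of-language-codes input are assumptions, and the language of each result is presumed to have been detected beforehand (e.g., by a language detection service).

```python
def too_many_off_language(result_languages, second_language, language_threshold):
    """True if the count of results not in second_language meets the threshold.

    result_languages: detected language of each result in the second set.
    A True return corresponds to determining NOT to provide the second set.
    """
    off_language = sum(1 for lang in result_languages if lang != second_language)
    return off_language >= language_threshold


langs = ["fr", "fr", "en", "de"]
print(too_many_off_language(langs, "fr", language_threshold=2))  # True
print(too_many_off_language(langs, "fr", language_threshold=3))  # False
```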
[0076] Additionally, or alternatively, the search engine 108 can determine whether to provide the second set of search results based on the search engine’s determined IR scores (or another similar metric) (as described above with reference to Figure 1) for the second set of search results. In some implementations, if the search engine 108 determines that a certain number of search results in the second set of search results has an IR score that satisfies (e.g., meets or exceeds) an IR threshold, the search engine 108 determines to provide the second set of search results. On the other hand, if the search engine 108 determines that a certain number of search results in the second set of search results has an IR score that does not satisfy (e.g., is less than) the IR threshold, the search engine 108 determines not to provide the second set of search results. Alternatively, or additionally, in some implementations, the search engine 108 can determine whether to provide the second set of search results based on the search engine’s comparison between the determined IR scores (as described above with reference to Figure 1) of the first set of search results and the second set of search results. For example, if the search engine 108 determines that a certain number of search results in the second set of search results has an IR score that is the same as (or greater than) the IR scores for a subset of the first set of search results, the search engine 108 determines to provide the second set of search results. On the other hand, if the search engine 108 determines that a certain number of search results in the second set of search results has an IR score that is less than the IR scores for a subset of the first set of search results, the search engine 108 determines not to provide the second set of search results.
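As a non-limiting illustration, the absolute IR-threshold variant can be sketched as follows. The function name, the representation of IR scores as plain floats, and the example values are assumptions introduced for this sketch; how IR scores are computed is outside its scope.

```python
def enough_relevant_results(ir_scores, ir_threshold, min_count):
    """True if at least min_count results have an IR score >= ir_threshold.

    ir_scores: IR score of each result in the second set (assumed floats).
    """
    qualifying = sum(1 for score in ir_scores if score >= ir_threshold)
    return qualifying >= min_count


second_set_scores = [0.9, 0.4, 0.7]
print(enough_relevant_results(second_set_scores, ir_threshold=0.6, min_count=2))  # True
print(enough_relevant_results(second_set_scores, ir_threshold=0.6, min_count=3))  # False
```

The comparative variant (second-set scores against first-set scores) could reuse the same function by deriving `ir_threshold` from the first set, for example from the lowest score in its top-N subset; that derivation is also an assumption.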
[0077] In some implementations, the search engine 108 can process/analyze multiple factors described above (or other appropriate factors) in determining whether to provide the second set of search results.
[0078] If the search engine 108 determines not to provide the second set of search results based on the above-described processing, the search engine provides, for presentation on a user device 106, a search results page that includes the first set of search results (but not the second set of search results) (at 240). On the other hand, if the search engine 108 determines to provide the second set of search results based on the above-described processing, the search engine 108 provides, for presentation on the user device 106, a search results page (such as those shown in Figures 3A-3D) that includes the first set of search results as well as the second set of search results (at 245).
[0079] In some implementations, the search engine 108 can translate (e.g., using the translation engine 120) one or more search results in the second set of search results into the first language prior to providing these for display in the search results page. For example, if the second language is French and some of the search results are in French (e.g., as determined by the language detection service 118), the search engine 108 can translate these French search results into the first language (e.g., English). Alternatively, instead of translating the search results in the second set of search results, the search engine 108 can provide a user-selectable translation interface component (e.g., a button or plug-in) in proximity to the appropriate search result(s) in the second language (or another language other than the first language). Upon selection of this user-selectable interface component (e.g., a button or a plug-in; as shown at element 342 in Figure 3D), the search engine 108 can invoke the translation engine 120 to provide a translation of the specific search result from the second language (or another language other than the first language) to the first language.
[0080] Although the operations of Figure 2 are described in the context of translating the search query into a second language and providing search results in the second language, the above operations can be used to translate the search query into multiple languages (other than the first language) and provide search results in each of those languages (e.g., as shown in Figure 3D).
[0081] In summary, the search engine 108 can obtain a search query that is to be executed in a first language, determine to translate the search query into a second language (or one or more additional languages), obtain search results in response to the search query in the first language and the second language (and/or one or more other languages), and provide these search results for display within a search results page.
[0082] Figures 3A-3D show example user interfaces in which a search results page is provided for display on the user device 106 and includes the first and second sets of search results that are provided in response to the submission of a search query in a first language and a second language, respectively.
[0083] As shown in Figure 3A, a search results page 308 is provided for the search query 306 of “brexit.” On this search results page 308, the first set of search results (in English) are provided as search results 302-A and 302-B. The second set of search results (in French) are provided in the carousel/panel 304. The user of the user device 106 can horizontally scroll/navigate through the second set of search results provided in this carousel/panel 304. The user of the user device 106 can vertically scroll/navigate through the first set of search results. In this manner, and as shown, the first set of search results are provided in a first portion of the search results page and the second set of search results are provided in a second portion of the search results page that is delineated from the first portion. In some implementations, other visually-separated user interface (UI) containers (i.e., other than the horizontally-navigable carousel) can be used to provide the second set of results.
[0084] Figure 3B shows another search results page 310 that is provided in response to the search query 312 of “causes of severe headache.” On this search results page 310, the first set of search results (314 and 316) are shown in the same manner as in Figure 3A and the second set of search results are shown in a panel/carousel 318 (as in Figure 3A). In some implementations, and as shown, the search results page 310 also displays the translated search query (i.e., the search query in the second language). This translated search query is a clickable/selectable link that, upon being clicked or selected, results in the search engine executing and providing a separate search results page with search results just for the translated query. A user of the user device 106 may click on/select this link, e.g., to obtain additional search results for the translated search query (beyond the number of results provided in the carousel/panel 318).
[0085] Figure 3C shows another search results page 320 that is provided in response to the search query 322 of “brexit.” On this search results page 320, the first set of search results (324 and 326) are shown in the same manner as in Figure 3A and the second set of search results are shown in a panel/carousel 328 (as in Figure 3A). Each (or at least some) of the second set of search results provided in the carousel/panel 328 include an image (e.g., 330-A and 330-B). The search engine 108 obtains each of the images from the underlying resource corresponding to the search result and, as shown, provides each such image for display in association with the respective search result.
[0086] Figure 3D shows another search results page 332 that is provided in response to the search query 334 of “brexit.” On this search results page 332, the first set of search results (336) is shown in the same manner as in Figure 3A and the second set of search results are shown in a panel/carousel 338 (as in Figure 3A). Each (or at least some) of the second set of search results provided in the carousel/panel 338 includes an image from the resource corresponding to the search result (as in Figure 3C). In addition, the carousel/panel 338 is configured to enable provision of the second set of search results for queries translated into more than one language other than the first language (e.g., English). For example, the carousel/panel 338 can include a set of buttons 340, each button being associated with a particular country. Pressing a particular button from this set of buttons (340) results in the search query being translated into the language for the country associated with the selected button. For example, in response to receiving the indication that the user device 106 selected the Ireland button, the search engine 108 executes the search query against Irish resources and provides a set of results in the carousel/panel 338. As another example, in response to receiving the indication that the user device 106 selected the Germany button, the search engine 108 executes the search query against German resources and provides a set of results in the carousel/panel 338.
[0087] As shown in carousel 338, one of the search results is not in the first language (English in this case). For this search result, the search engine 108 provides a user-selectable translation interface component 342 (e.g., a button or plug-in) in proximity to the search result. Upon selection of this user-selectable interface component 342, the search engine 108 can invoke the translation engine 120 to provide a translation of the specific search result from the detected language (e.g., German) to the first language (e.g., English) and provide the translated search result on the same search results page (e.g., in place of the original, untranslated search result). In some implementations, the search engine 108 automatically invokes the translation engine 120 when it detects a search result is in a language other than the first language and obtains the translation in the first language of the search result. In such implementations, the search engine 108 automatically provides, within the carousel 338, the translated search result in the first language.
[0088] Figure 4 is a block diagram of an example computer system 400 that can be used to perform operations described above. The system 400 includes a processor 410, a memory 420, a storage device 430, and an input/output device 440. Each of the components 410, 420, 430, and 440 can be interconnected, for example, using a system bus 450. The processor 410 is capable of processing instructions for execution within the system 400. In some implementations, the processor 410 is a single-threaded processor. In another implementation, the processor 410 is a multi-threaded processor. The processor 410 is capable of processing instructions stored in the memory 420 or on the storage device 430.
[0089] The memory 420 stores information within the system 400. In one implementation, the memory 420 is a computer-readable medium. In some implementations, the memory 420 is a volatile memory unit. In another implementation, the memory 420 is a non-volatile memory unit.
[0090] The storage device 430 is capable of providing mass storage for the system 400. In some implementations, the storage device 430 is a computer-readable medium. In various different implementations, the storage device 430 can include, for example, a hard disk device, an optical disk device, a storage device that is shared over a network by multiple computing devices (e.g., a cloud storage device), or some other large capacity storage device.
[0091] The input/output device 440 provides input/output operations for the system 400. In some implementations, the input/output device 440 can include one or more of a network interface device, e.g., an Ethernet card, a serial communication device, e.g., an RS-232 port, and/or a wireless interface device, e.g., an 802.11 card. In another implementation, the input/output device can include driver devices configured to receive input data and send output data to peripheral devices 460, e.g., keyboard, printer and display devices. Other implementations, however, can also be used, such as mobile computing devices, mobile communication devices, set-top box television client devices, etc.
[0092] Although an example processing system has been described in Figure 4, implementations of the subject matter and the functional operations described in this specification can be implemented in other types of digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
[0093] Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage media (or medium) for execution by, or to control the operation of, data processing apparatus. Alternatively, or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).
[0094] The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
[0095] The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
[0096] A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
[0097] The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
[0098] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
[0099] To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s client device in response to requests received from the web browser.
[00100] Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
[00101] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device). Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.
[00102] While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
[00103] Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
[00104] Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.
[00105] What is claimed is:

Claims

1. A computer implemented method comprising: receiving, from a user device associated with a user, a search query; determining, based on the search query, a first language that specifies a language in which the search query is to be executed; obtaining, for the search query, a first set of search results that are in the first language; determining to translate the search query into a second language that is different from the first language, wherein the determining comprises: determining, from a profile of the user, a set of languages and a confidence level for each language in the set of languages, wherein each language in the set of languages indicates a language of content that the user has previously accessed and wherein the confidence level for each language specifies a likelihood that the user understands the language; and identifying, based on the confidence levels, the second language from the set of languages; in response to determining to translate the search query into the second language, translating the search query into the second language to obtain a translated search query; obtaining, for the translated search query, a second set of search results that are in the second language; determining a topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results; in response to determining that the topic co-occurrence exists, determining to provide the second set of search results for presentation on the user device; and providing, for presentation on the user device, a search results page that includes the first set of search results and the second set of search results.
2. The computer implemented method of claim 1, wherein determining to translate the search query into a second language that is different from the first language further comprises: determining that a length of the search query is greater than a minimum query length threshold and is less than a maximum query length threshold.
3. The computer implemented method of claim 2, wherein determining to provide the second set of search results for presentation on the user device is further based on a popularity of one or more search results in the second set of search results obtained for the translated search query, the popularity being based on the one or more search results being accessed a number of times that exceeds a popularity threshold.
4. The computer implemented method of any preceding claim: wherein identifying the second language from the set of languages, comprises determining that the confidence level for the second language does not satisfy a second confidence threshold; the method further comprising translating the second set of search results into the first language to obtain a translated second set of search results; and wherein providing, for presentation on the user device, a search results page that includes the first set of search results and the second set of search results, comprises: providing a search results page that includes the first set of search results and the translated second set of search results.
5. The computer implemented method of any preceding claim, wherein determining to translate the search query into the second language that is different from the first language, further comprises one or more of: determining that the search query does not include any pornographic references; determining that the search query is not a navigational query; and determining, based on historical records of executed search queries, that a number of times the search query has been issued in the second language satisfies a particular threshold.
6. The computer implemented method of any preceding claim, wherein determining to provide the second set of search results for presentation on the user device further comprises one or more of: determining, based on historical records of executed search queries, that the first set of search results has not received a number of interactions that satisfies an interaction threshold; determining that the second set of search results includes a number of search results in a language other than the second language that satisfies a language threshold; determining, based on historical records of executed search queries, that a number of times the search query has been issued in the second language satisfies a particular threshold; or determining that the second set of search results includes a number of search results with an IR score that does not satisfy an IR threshold.
7. The computer implemented method of any preceding claim, wherein determining the topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results, comprises determining that the subset of search results from the second set of search results includes one or more entities that are present in the subset of search results from the first set of search results.
8. The computer implemented method of any preceding claim, wherein providing, for presentation on the user device, a search results page that includes the first set of search results and the second set of search results, comprises: providing, for presentation on the user device, the first set of search results in a first portion of the search results page; and providing, for presentation on the user device, the second set of search results in a second portion of the search results page, wherein the second portion is delineated from the first portion.
9. The computer implemented method of any preceding claim, wherein each search result in the second set of search results includes an image obtained from a resource to which the search result is linked.
10. The computer implemented method of any preceding claim, wherein providing a search results page that includes the first set of search results and the second set of search results comprises providing the search results page with two groupings of search results, with a first group including the first set of search results and a second group including the second set of search results.
11. A computer implemented method comprising: receiving, from a user device associated with a user, a search query; determining, based on the search query, a first language that specifies a language in which the search query is written; obtaining, for the search query, a first set of search results that are in the first language; determining, from a profile of the user, a set of languages in which the user has previously accessed content and a confidence level for each language in the set of languages, wherein the confidence level for a particular language specifies a likelihood that the user understands the particular language; identifying a second language from the set of languages, wherein the second language is different from the first language and the confidence level for the second language satisfies a first confidence threshold; translating the search query into the second language; obtaining, for the translated search query, a second set of search results that are in the second language; determining that a topic co-occurrence exists between a subset of search results from the second set of search results and a subset of search results from the first set of search results; in response to determining that the topic co-occurrence exists, determining to provide the second set of search results for presentation on the user device; and providing, for presentation on the user device, a search results page with two groupings of search results, with a first group including the first set of search results and a second group including the second set of search results.
12. A system, comprising: one or more processors; and one or more memories having stored thereon computer readable instructions configured to cause the one or more processors to carry out the method of any of claims 1-10.
13. A computer readable medium storing instructions that upon execution by one or more computers cause the one or more computers to perform operations of the method of any of claims 1-10.
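The decision steps recited in claims 1, 2, 4, 7 and 11 can be illustrated with a minimal sketch. It is not the applicant's implementation. The names `SearchResult`, `query_length_ok`, `pick_second_language`, `topic_cooccurrence` and `should_translate_results` are hypothetical, as are all threshold values. Three further assumptions are made for illustration only: query length is measured in whitespace-separated terms, the "subset" of each result set is the top few results, and the highest-confidence qualifying language is chosen as the second language.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchResult:
    """Hypothetical container for one search result and its detected entities."""
    title: str
    language: str
    entities: set[str] = field(default_factory=set)


def query_length_ok(query: str, min_terms: int, max_terms: int) -> bool:
    """Claim 2: length is greater than a minimum and less than a maximum threshold.

    Assumption: length is counted in whitespace-separated terms.
    """
    return min_terms < len(query.split()) < max_terms


def pick_second_language(profile: dict[str, float], first_language: str,
                         first_threshold: float = 0.5) -> Optional[str]:
    """Claims 1 and 11: identify a second language from the profile confidences.

    Assumption: the highest-confidence language (other than the first language)
    that meets the threshold is chosen; returns None if no language qualifies.
    """
    candidates = {lang: conf for lang, conf in profile.items()
                  if lang != first_language and conf >= first_threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def topic_cooccurrence(first: list[SearchResult], second: list[SearchResult],
                       top_k: int = 3) -> bool:
    """Claim 7: a subset of the second set shares one or more entities with
    a subset of the first set. Assumption: the subset is the top_k results."""
    first_entities = set().union(*(r.entities for r in first[:top_k]))
    second_entities = set().union(*(r.entities for r in second[:top_k]))
    return bool(first_entities & second_entities)


def should_translate_results(confidence: float, second_threshold: float = 0.8) -> bool:
    """Claim 4: translate the second set into the first language when the
    confidence for the second language does not satisfy a second threshold."""
    return confidence < second_threshold


if __name__ == "__main__":
    profile = {"en": 0.95, "hi": 0.7, "fr": 0.3}  # hypothetical user profile
    first = [SearchResult("Taj Mahal history", "en", {"Taj Mahal", "Agra"})]
    second = [SearchResult("Taj Mahal ka itihas", "hi", {"Taj Mahal", "Shah Jahan"})]

    query = "taj mahal history"
    if query_length_ok(query, min_terms=1, max_terms=10):
        lang = pick_second_language(profile, first_language="en")
        if lang and topic_cooccurrence(first, second):
            print("show second-language group:", lang,
                  "| translate results:", should_translate_results(profile[lang]))
```

In this sketch the two result sets are kept as separate lists so that they can be rendered as two delineated groups on the search results page, in the manner recited in claims 8 and 10.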
PCT/IB2020/000750 2019-09-20 2020-09-17 Multilingual search queries and results WO2021053391A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962903698P 2019-09-20 2019-09-20
US62/903,698 2019-09-20

Publications (1)

Publication Number Publication Date
WO2021053391A1 (en) 2021-03-25

Family

ID=72801757

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2020/000750 WO2021053391A1 (en) 2019-09-20 2020-09-17 Multilingual search queries and results

Country Status (1)

Country Link
WO (1) WO2021053391A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220335046A1 (en) * 2021-04-20 2022-10-20 Fujitsu Limited Computer-readable recording medium storing information generating program, information generating method, and information generating apparatus
US20230084294A1 (en) * 2021-09-15 2023-03-16 Google Llc Determining multilingual content in responses to a query

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090024599A1 (en) * 2007-07-19 2009-01-22 Giovanni Tata Method for multi-lingual search and data mining
US20110313995A1 (en) * 2010-06-18 2011-12-22 Abraham Lederman Browser based multilingual federated search
WO2014031214A1 (en) * 2012-08-23 2014-02-27 Google Inc. Providing content in multiple languages

Similar Documents

Publication Publication Date Title
US9177018B2 (en) Cross language search options
US9177046B2 (en) Refining image relevance models
US11580181B1 (en) Query modification based on non-textual resource context
US8914349B2 (en) Dynamic image display area and image display within web search results
EP3529714B1 (en) Animated snippets for search results
US9396413B2 (en) Choosing image labels
US8873867B1 (en) Assigning labels to images
US8856125B1 (en) Non-text content item search
US8600973B1 (en) Removing substitution rules
US9183577B2 (en) Selection of images to display next to textual content
US20150370833A1 (en) Visual refinements in image search
JP2019522852A (en) System and method for providing contextual information
US10698888B1 (en) Answer facts from structured content
WO2021053391A1 (en) Multilingual search queries and results
US9811592B1 (en) Query modification based on textual resource context
US10146849B2 (en) Triggering answer boxes
US8838621B1 (en) Location query processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20788869; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20788869; Country of ref document: EP; Kind code of ref document: A1)