DE102017125474A1

DE102017125474A1 - CONTEXTUAL DISAMBIGUATION OF QUERIES

Info

Publication number: DE102017125474A1
Application number: DE102017125474.9A
Authority: DE
Inventors: Ibrahim Badr; Nils Grimsmo; Gokhan H. Bakir; Kamil Anikiej; Aayush Kumar; Viacheslav Kuznetsov
Original assignee: Google LLC
Current assignee: Google LLC
Priority date: 2017-03-20
Filing date: 2017-10-30
Publication date: 2018-09-20
Also published as: DE202017106609U1; GB2560785A; EP3583514A1; WO2018174849A1; CN108628919A; GB201717984D0

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextually disambiguating queries are disclosed. In one aspect, a method includes receiving an image presented on a display of a computing device and a transcription of an utterance spoken by a user of the computing device, identifying a particular sub-image that is included in the image, and, based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image. The method also includes, based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image; based on the transcription, the first labels, and the second labels, generating a search query; and providing, for output, the search query.

Description

FIELD

The present specification relates to search engines.

BACKGROUND

In general, a search query includes one or more terms that are submitted to a search engine with a request to execute a search. For example, a user may enter the query terms of a search query by typing on a keyboard or, in the case of a voice query, by speaking the query terms into a microphone of a computing device. Voice queries may be processed using speech recognition technology.

SUMMARY

In some implementations, an image corresponding to a portion of a display of a computing device may be analyzed to aid a query processing system in answering a natural language query. For example, a user may ask a question about a photograph that the user is viewing on the computing device, such as "What is this?". The computing device may detect the user's utterance and capture a respective image of the computing device that the user is viewing. The computing device processes the utterance to generate a transcription of the utterance spoken by the user of the computing device. The computing device transmits the transcription and the image to a server.

The server receives the transcription and the image from the computing device. The server can identify visual and textual content in the image. The server generates labels for the image that correspond to content of the image, such as locations, entities, names, types of animals, etc. The server can identify a particular sub-image in the image. The particular sub-image may be a photograph or a drawing. In some aspects, the server identifies a portion of the particular sub-image that is likely to be of primary interest to the user, such as a historical landmark in the image. The server can perform image recognition on the particular sub-image to generate labels for the particular sub-image. The server can also generate labels for textual content in the image, such as comments that correspond to the particular sub-image, by performing text recognition on a portion of the image other than the particular sub-image. The server can generate a search query based on the received transcription and the generated labels. Further, the server may be configured to provide the search query for output to a search engine.

One innovative aspect of the subject matter described in this specification is embodied in methods that include the actions of receiving an image that is presented on, or corresponds to, at least a portion of a display of a computing device, and receiving a transcription of, or that corresponds to, an utterance spoken by a user of the computing device, typically at the time the image is being presented; identifying a particular sub-image that is included in the image; and, based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image. The methods also include, based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image; based on the transcription, the first labels, and the second labels, generating a search query; and providing, for output, the search query.

Such method steps, or other combinations of steps as described herein, may be carried out automatically and without further user intervention, for example in response to an automatic determination by the computing device that the method should be carried out at a particular time, or following a particular button press, spoken command, or other indication from a user of the computing device that such a method is to be carried out. The methods described here may therefore provide a more efficient user interface to the user device by reducing the input required of a user to achieve a desired or desirable search query generation.

Other implementations of this and other aspects include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.

Implementations may each optionally include one or more of the following features. For instance, the methods can include weighting the first labels differently than the second labels. The methods can also include generating the search query by substituting one or more of the first labels or the second labels for terms of the transcription. In some aspects, the methods include generating, for each of the first labels and the second labels, a label confidence score that indicates a likelihood that the label corresponds to a portion of the particular sub-image that is of primary interest to the user, and selecting one or more of the first labels and the second labels based on the respective label confidence scores, wherein the search query is generated based on the one or more selected first labels and second labels. Further, the methods can include accessing historical query data that includes previous search queries provided by other users; generating, based on the transcription, the first labels, and the second labels, one or more candidate search queries; comparing the historical query data to the one or more candidate search queries; and, based on comparing the historical query data to the one or more candidate search queries, selecting the search query from among the one or more candidate search queries.
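The label selection and the comparison against historical query data described above can be sketched as follows. This is a minimal illustration only: the `Label` type, the 0.5 threshold, the "what is ..." query template, and the frequency-based comparison are assumptions made for the example and are not specified by this description.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    text: str
    confidence: float  # hypothetical label confidence in [0, 1]


def select_labels(labels, threshold=0.5):
    """Keep labels whose confidence meets the (assumed) threshold."""
    return [label for label in labels if label.confidence >= threshold]


def pick_query(candidates, historical_queries):
    """Return the candidate seen most often in the historical queries.

    Case-insensitive exact matching is an assumption; a real system could
    compare candidates to historical data in many other ways.
    """
    counts = Counter(q.lower() for q in historical_queries)
    return max(candidates, key=lambda c: counts[c.lower()])


# First labels come from image recognition, second labels from text recognition.
first_labels = [Label("Eiffel Tower", 0.9), Label("dog", 0.4)]
second_labels = [Label("Paris", 0.7), Label("golden retriever", 0.3)]

selected = select_labels(first_labels + second_labels)
candidates = [f"what is {label.text}" for label in selected]
history = ["what is eiffel tower", "what is eiffel tower", "what is paris"]
query = pick_query(candidates, history)
```

With these illustrative inputs, only the two labels above the threshold produce candidate queries, and the candidate that appears most often in the historical data is chosen.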

The methods can include generating, based on the transcription, the first labels, and the second labels, one or more candidate search queries; determining, for each of the one or more candidate search queries, a query confidence score that indicates a likelihood that the candidate search query is an accurate rewrite of the transcription; and selecting, based on the query confidence scores, a particular candidate search query as the search query. Additionally, the methods can include identifying one or more images that are included in the image; generating, for each of the one or more images that are included in the image, an image confidence score that indicates a likelihood that the image is an image of primary interest to the user; and, based on the image confidence scores for the one or more images, selecting the particular sub-image. The methods can include receiving data indicating a selection of a control event at the computing device, wherein the control event identifies the particular sub-image. In some aspects, the computing device is configured to capture the image and to capture audio data that corresponds to the utterance in response to detecting a predefined hotword.
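Selecting the particular sub-image from image confidences can be sketched as below. The scoring heuristic (larger and more central images score higher) and the 0.7/0.3 weights are assumptions chosen for illustration; this passage only states that a confidence is generated per image and used for the selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubImage:
    name: str
    area_fraction: float   # share of the captured image covered, 0..1
    center_offset: float   # normalized distance from the display center, 0..1


def image_confidence(sub):
    """Hypothetical heuristic: larger, more central images score higher."""
    return 0.7 * sub.area_fraction + 0.3 * (1.0 - sub.center_offset)


def select_sub_image(sub_images):
    """Pick the sub-image with the highest image confidence."""
    return max(sub_images, key=image_confidence)


photo = SubImage("photograph", area_fraction=0.5, center_offset=0.1)
logo = SubImage("logo", area_fraction=0.02, center_offset=0.9)
chosen = select_sub_image([photo, logo])
```

The same `max`-over-scores pattern would apply to choosing among candidate search queries by query confidence.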

Further, the methods can include receiving an additional image of the computing device and an additional transcription of an additional utterance spoken by a user of the computing device; identifying an additional particular sub-image that is included in the additional image; based on performing image recognition on the additional particular sub-image, determining one or more additional first labels that indicate a context of the additional particular sub-image; based on performing text recognition on a portion of the additional image other than the additional particular sub-image, determining one or more additional second labels that indicate the context of the additional particular sub-image; based on the additional transcription, the additional first labels, and the additional second labels, generating a command; and performing the command. In this instance, performing the command can include performing one or more of storing the additional image in memory, storing the particular sub-image in the memory, uploading the additional image to a server, uploading the particular sub-image to the server, importing the additional image to an application of the computing device, and importing the particular sub-image to the application of the computing device. In certain aspects, the methods can include identifying metadata associated with the particular sub-image, wherein determining the one or more first labels that indicate the context of the particular sub-image is based further on the metadata associated with the particular sub-image.

Advantageous implementations can include one or more of the following features. The methods can determine a context of an image corresponding to a portion of a display of a computing device to aid in the processing of natural language queries. The context of the image may be determined through image and/or text recognition. Specifically, the context of the image may be used to rewrite a transcription of an utterance of a user. The methods may generate labels that refer to the context of the image and substitute the labels for portions of the transcription. For example, a user may be viewing a photograph on a computing device and ask "Where was this taken?". The methods may determine that the user is referring to the photo on the screen of the computing device. The methods can extract information about the photo to determine a context of the photo, as well as a context of other portions of the image that do not include the photo. In this instance, the context information may be used to determine a location at which the photo was taken. As such, the methods may use images corresponding to displays of computing devices to aid in the generation of search queries.
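Substituting a label for part of the transcription can be sketched as follows. The list of ambiguous terms and the choice to replace only the first match are assumptions for this example; the description does not prescribe how the portion to be replaced is found.

```python
import re

# Hypothetical list of context-dependent terms that a label may replace.
AMBIGUOUS_TERM = re.compile(r"\b(this|that|it)\b", re.IGNORECASE)


def rewrite_transcription(transcription, label):
    """Replace the first ambiguous term with the label.

    The transcription is returned unchanged when no such term is found.
    A function is used as the replacement so that backslashes in the
    label are inserted literally.
    """
    return AMBIGUOUS_TERM.sub(lambda _match: label, transcription, count=1)


rewritten = rewrite_transcription("What is this?", "the Eiffel Tower")
unchanged = rewrite_transcription("Show me the weather", "the Eiffel Tower")
```

Here `rewritten` is "What is the Eiffel Tower?", which no longer depends on what is shown on the display.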

In some aspects, the methods may identify a particular sub-image in the image that is a primary focus of the user. The methods may generate labels that correspond to the particular sub-image, and weight labels corresponding to the particular sub-image differently than other labels so that the context of the image may be determined more effectively. The methods may weight labels based on a prominence of the particular sub-image in the image, a frequency with which the particular sub-image labels appear in historical search queries, a frequency with which the particular sub-image labels appear in recent search queries, etc. Therefore, the methods may identify primary points of user interest in the image to determine a context of the image as a whole.
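Weighting labels from the particular sub-image differently than other labels can be sketched as below. The weights 1.0 and 0.5, the additive combination, and the alphabetical tie-break are assumptions for illustration; the factors named above (prominence, frequency in historical or recent queries) would feed into such weights in ways this passage does not detail.

```python
def rank_labels(first_labels, second_labels, first_weight=1.0, second_weight=0.5):
    """Combine label scores with per-source weights and rank the labels.

    `first_labels` and `second_labels` map label text to a raw score.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    scores = {}
    for label, score in first_labels.items():
        scores[label] = scores.get(label, 0.0) + first_weight * score
    for label, score in second_labels.items():
        scores[label] = scores.get(label, 0.0) + second_weight * score
    return sorted(scores, key=lambda label: (-scores[label], label))


# Image-recognition labels for the sub-image and text-recognition labels
# for the rest of the captured image (illustrative scores).
first = {"Eiffel Tower": 0.9, "dog": 0.6}
second = {"Paris": 0.8, "France": 0.8, "dog": 0.2}
ranking = rank_labels(first, second)
```

A label found by both recognizers, such as "dog" here, accumulates score from both sources.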

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features and advantages of the invention will become apparent from the description, the drawings, and the claims.

List of figures

FIG. 1 is a diagram of an example environment for contextually disambiguating a query.
FIG. 2 is a diagram of an example system for contextually disambiguating a query.
FIG. 3 is a flow chart illustrating an example process for contextually disambiguating a query.
FIG. 4 is a flow chart illustrating an example process for selecting a particular sub-image using confidence scores.
FIG. 5 is a flow chart illustrating an example process for generating a search query using selected labels.
FIG. 6 is a diagram of an example computing device and an example mobile computing device.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

FIG. 1 is a diagram of an example environment 100 for contextually disambiguating a query. The environment 100 includes a user 102 and a computing device 104. In the environment 100, the user 102 provides an utterance 103, such as a query, to the computing device 104. The user 102 may ask a question about one or more objects displayed on a graphical display of the computing device 104. For example, the utterance 103 may include a query such as "What is this?". In this instance, the user 102 may be referring to objects, such as an image, text, video, or any combination thereof, that are displayed on the graphical display of the computing device 104. The computing device 104 may include one or more computing devices, such as a laptop, a desktop, a smartphone, a tablet, or any other known computing device.

The utterance 103 of the user 102 may be contextually ambiguous. In this instance, the utterance 103 may not directly reference by name the content being displayed at the computing device 104. However, a context of the displayed objects may be determined, and the context may be used in combination with a transcription corresponding to the utterance 103 to disambiguate the query.

The computing device 104 may be configured to capture an image 106 being presented on a display of the computing device 104 when the utterance 103 of the user 102 is received. For example, the computing device 104 may capture a portion of the display that includes a photograph 108 and comments 116 that correspond to the photograph, but does not include a logo icon 120, such as a title of an application that the computing device 104 is running. In some examples, the image 106 corresponds to a screenshot of the computing device 104. Alternatively, or additionally, the computing device 104 may persistently capture the displayed content and transmit particular sub-images upon detection of the utterance 103. Further, the image 106 may be captured upon detection of a predefined hotword in the utterance 103. The computing device 104 may transcribe the utterance 103. In some implementations, the computing device 104 may transmit audio data corresponding to the utterance 103 to a speech recognition engine and receive a transcription of the utterance 103 from the speech recognition engine.

The transcription corresponding to the utterance 103 and the image 106 may be transmitted to a server over a network for processing (e.g., disambiguation of the utterance). The server may be configured to determine a context of the image 106 by analyzing the image 106. The server may determine the context of the image 106 by identifying and analyzing images of photographs in the image. For example, a photograph 108 may be analyzed to identify that the photograph 108 includes one or more entities. Referring to the example environment 100 of FIG. 1, the photograph 108 may be identified by the server and then analyzed to determine that the photograph 108 includes entities such as the Eiffel Tower 110 and a dog 112 in front of the Eiffel Tower 110.

In some examples, the server performs image recognition on the particular sub-image 108. The image recognition is performed to determine one or more first labels that indicate a context of the particular sub-image. For example, the server can perform image recognition on the photograph 108 and determine first labels corresponding to the photograph 108, such as Eiffel Tower, France, Paris, and dog. The image recognition can include determining entities that are in focus in the photograph 108, entities in the foreground and background of the photograph 108, relative sizes of entities in the photograph 108, and the like. In some examples, the server can identify metadata associated with the particular sub-image, or the photograph 108 in FIG. 1. The server can use the metadata to determine the first labels corresponding to the particular sub-image.

In addition, the server can perform text recognition on the image 106. The server can perform text recognition on a portion of the image 106 other than the photograph 108. The portion of the image 106 can include a title 114 of the photograph 108 and/or comments 116 that relate to the photograph 108. For example, the image 106 of FIG. 1 includes a title 114 indicating a location at which the photograph 108 was taken, such as Paris, France. The image 106 also includes comments 116 that relate to the photograph 108, such as "Dave ~ So cool, France is my favorite.", "Sarah ~ I didn't know you had a golden, I have one too!", and "Abby ~ I was just in Paris, when were you there?".

The title 114 and the comments 116 of the image 106 can be processed by the server via text recognition. By performing text recognition, the server can determine one or more second labels that further indicate the context of the particular sub-image. For example, the server can perform text recognition on the title 114 to verify that the location of the particular sub-image is Paris, France. Furthermore, the server can perform text recognition on the comments 116 to verify that the location of the particular sub-image is Paris, France (e.g., by performing text recognition on the phrase "I was just in Paris."). In addition, the server can perform text recognition on the comments 116 to determine that the dog 112 in the photograph 108 is a golden retriever (e.g., by performing text recognition on the phrase "I didn't know you had a golden ..."). As such, the server can generate one or more second labels, such as Paris, France, and golden retriever.

The server can be configured to generate a search query based on the received transcription, the first labels, and the second labels. The server can generate the search query automatically without further user intervention, for example in response to the computing device 104 automatically determining that the method should be carried out at a particular time, following a particular button press that precedes the utterance, following a spoken command/hotword included in the utterance, or following any other indication from the user 102 of the computing device 104 that such a method is to be carried out, before the transcription and the image are received by the server.

The search query can be generated by rewriting the transcription. In some aspects, the transcription can be rewritten by substituting one or more of the first and/or second labels into the transcription. For example, the transcription can include "What is this?". In this case, the phrase "the Eiffel Tower" can be substituted for the term "this" in the transcription. The search query can therefore be rewritten to include the following: "What is the Eiffel Tower?".
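The substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of placeholder terms and the function name rewrite_transcription are hypothetical and do not come from the disclosure, which does not specify how the term to be replaced is located.

```python
import re

# Hypothetical list of deictic terms that a label may stand in for.
PLACEHOLDERS = ("this", "that", "it")

_PLACEHOLDER_RE = re.compile(
    r"\b(" + "|".join(PLACEHOLDERS) + r")\b", re.IGNORECASE
)


def rewrite_transcription(transcription: str, label: str) -> str:
    """Substitute `label` for the first placeholder term in the transcription.

    The transcription is returned unchanged if no placeholder is found.
    """
    # A function replacement keeps backslashes in `label` from being
    # interpreted as regex escape sequences.
    return _PLACEHOLDER_RE.sub(lambda _match: label, transcription, count=1)


if __name__ == "__main__":
    print(rewrite_transcription("What is this?", "the Eiffel Tower"))
```

With the example from the description, the transcription "What is this?" and the label "the Eiffel Tower" yield "What is the Eiffel Tower?".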

In some aspects, the server is configured to generate a label confidence score for each of the first and second labels. In this case, the label confidence scores can indicate a relative likelihood that each label corresponds to a portion of the particular sub-image that is of primary interest to the user 102. For example, a first label can include "Eiffel Tower" with a confidence score of 0.8, and a second label can include "golden retriever" with a confidence score of 0.5. In this case, the confidence scores can indicate that the first label corresponds to an entity that is more likely to be of primary interest to the user 102, based on its larger label confidence score.

Labels can be selected to generate the search query based on the confidence scores. For example, a particular number of labels with the highest confidence scores can be selected to generate a search query in combination with the transcription. In another example, all labels that satisfy a particular label confidence score threshold can be used in combination with the transcription to generate the search query. In another example, the server can generate label confidence scores based on a frequency with which the labels appear in recent search queries, a frequency with which the labels appear in all historical search queries, and so on.
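The two selection strategies named above (a fixed number of highest-scoring labels, or all labels meeting a threshold) might look as follows. The function names are hypothetical, and the score for "Paris" is an illustrative value that is not given in the description; only the scores 0.8 and 0.5 come from the example above.

```python
from typing import Dict, List


def top_k_labels(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k labels with the highest confidence scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _score in ranked[:k]]


def labels_meeting_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return every label whose confidence score is at least `threshold`."""
    return [label for label, score in scores.items() if score >= threshold]


if __name__ == "__main__":
    # "Paris": 0.6 is an assumed value for illustration only.
    scores = {"Eiffel Tower": 0.8, "golden retriever": 0.5, "Paris": 0.6}
    print(top_k_labels(scores, 1))
    print(labels_meeting_threshold(scores, 0.6))
```

Either result would then be combined with the transcription to form the search query.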

The server can be configured to access historical search query data. The historical query data can include a number of previous search queries provided by the user 102 and/or other users. The server can generate one or more candidate search queries based on the transcription, the first labels, and the second labels, and compare the historical query data with the candidate search queries. Based on comparing the historical query data with the one or more candidate search queries, the server can select a particular candidate search query as the search query. For example, the server can select the particular candidate search query based on a comparison between a frequency with which the candidate search queries appear in recent search queries, such as queries entered by the user, and/or a frequency with which the candidate search queries appear in historical search queries, such as queries entered into a search engine by all users.
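A minimal sketch of the frequency comparison, assuming exact (case-insensitive) matching between candidates and logged queries and a hypothetical recent_weight parameter that favors the user's recent queries over the global history. The description does not prescribe a scoring formula, so the weighted sum here is an assumption.

```python
from collections import Counter
from typing import Iterable, Sequence


def select_query(
    candidates: Sequence[str],
    recent_queries: Iterable[str],
    historical_queries: Iterable[str],
    recent_weight: float = 2.0,  # assumed weighting, not from the disclosure
) -> str:
    """Pick the candidate that appears most often in the query logs.

    `candidates` must be non-empty. Ties go to the earliest candidate.
    """
    recent = Counter(q.lower() for q in recent_queries)
    historical = Counter(q.lower() for q in historical_queries)

    def score(candidate: str) -> float:
        key = candidate.lower()
        return recent_weight * recent[key] + historical[key]

    return max(candidates, key=score)


if __name__ == "__main__":
    candidates = ["What is the Eiffel Tower?", "What is golden retriever?"]
    history = [
        "what is the eiffel tower?",
        "what is the eiffel tower?",
        "what is golden retriever?",
    ]
    print(select_query(candidates, recent_queries=[], historical_queries=history))
```

A production system would likely match on normalized or partial queries rather than exact strings; exact matching is used here only to keep the sketch short.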

The server can be configured to provide the generated search query for output. For example, the server can be configured to provide the generated search query to a search engine. In another example, the server can generate the search query and send the search query to the computing device 104. In this case, the search query can be provided to the user 102 audibly or visually by the computing device 104 to verify that the server has accurately rewritten the query.

The server can further be configured to provide the generated search query for output and/or a search result to the computing device 104. In this case, the computing device 104 can be configured to receive the search query and to provide a search result corresponding to the search query for output 122, such as "You are looking at a photograph of the Eiffel Tower".

FIG. 2 is a diagram of an example system 200 for contextually disambiguating a query. The system 200 includes the user 102, the computing device 104, a server 206, an image recognition engine 208, and a text recognition engine 210. The computing device 104 is in communication with the server 206 over one or more networks. The computing device 104 can include a microphone or other detection mechanisms for detecting utterances of the user 102.

In one example, the user 102 can provide an utterance to the computing device 104. The utterance can be detected and transcribed by the computing device 104. As such, the computing device 104 can generate a transcription 204 corresponding to the utterance of the user 102. The computing device 104 can also be configured to capture an image 202 of a graphical display of the computing device 104. The computing device 104 can capture the image 202 upon detecting the utterance of the user 102 or upon transcribing the utterance. Additionally or alternatively, the computing device 104 can be configured to continuously capture the displayed contents of the computing device 104. In this case, a particular sub-image can be sent with the transcription 204 to the server 206 upon detection of the utterance.

In another example, the computing device 104 can be configured to send the utterance of the user 102 to the server 206. For example, the computing device 104 can be configured to detect a predefined hotword in the utterance and, upon detecting the hotword, send the utterance to the server 206. In this case, the server 206 is configured to generate a transcription corresponding to the utterance.

At event (A), the server 206 receives the transcription 204 and the image 202 from the computing device 104. The computing device 104 can send the transcription 204 and the image 202 to the server 206 automatically. The computing device 104 can also send the transcription 204 and the image 202 in response to user input. For example, the user can provide the utterance as well as a touch input at the graphical display of the computing device 104, indicating that the user requests that a transcription corresponding to the utterance and the image be sent to the server 206.

At event (B), the server 206 identifies a particular sub-image 207 of the image 202 and sends the particular sub-image 207 to an image recognition engine 208. In some aspects, the server 206 is in communication with the image recognition engine 208 over the network. In other aspects, the server 206 and the image recognition engine 208 are integrated into a single system.

In some examples, the image 202 can include multiple images. The server 206 can analyze the multiple images to determine the particular sub-image 207 that is likely to be of interest to the user 102. In addition, the server 206 can receive user input indicating that the particular sub-image 207 among the images in the image 202 is of primary interest to the user 102. The server 206 can generate an image confidence score for each of the multiple images in the image 202. The image confidence score can indicate a relative likelihood that an image is the image of primary interest to the user 102. The server 206 can determine the particular sub-image 207, or the image of primary interest to the user 102, based on the generated confidence scores. For example, the server 206 can identify that the display of the computing device 104 includes a first portion and a second portion. The first portion can include a photograph, and the second portion can include a logo image corresponding to a title of the application that the computing device is using. The server can be configured to generate a confidence score of 0.9 for the first portion and a confidence score of 0.3 for the second portion. In this case, the server 206 determines, based on the generated confidence scores, that the first portion is likely to be of primary interest to the user 102.

The server can be configured to determine the particular sub-image 207 based on receiving data indicating a selection of a control event. The control event can correspond to the user 102 providing input at the computing device 104. Specifically, the control event can correspond to the user 102 interacting with the display of the computing device 104. For example, the user 102 can interact with a portion of the display that corresponds to the particular sub-image 207. The server 206 can receive data indicating that the user 102 interacted with a portion of the display that corresponds to the particular sub-image 207, and can therefore determine that the portion of the display corresponds to the particular sub-image 207.
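One way to map such an interaction to a sub-image is a simple hit test of the touch coordinates against the bounding boxes of the sub-images on the display. The sketch below assumes that sub-images are axis-aligned rectangles in display pixel coordinates; the SubImage type, the function sub_image_at, and the layout values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SubImage:
    """Hypothetical bounding box of a sub-image, in display pixels."""
    name: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return (
            self.left <= x < self.left + self.width
            and self.top <= y < self.top + self.height
        )


def sub_image_at(
    sub_images: Sequence[SubImage], touch: Tuple[int, int]
) -> Optional[SubImage]:
    """Return the first sub-image containing the touch point, or None."""
    x, y = touch
    for sub_image in sub_images:
        if sub_image.contains(x, y):
            return sub_image
    return None


if __name__ == "__main__":
    # Assumed layout: a logo at the top left and a photograph below it.
    layout = [
        SubImage("app logo", left=0, top=0, width=100, height=50),
        SubImage("photograph", left=0, top=100, width=400, height=300),
    ]
    hit = sub_image_at(layout, (200, 250))
    print(hit.name if hit else "no sub-image at touch point")
```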

At event (C), the image recognition engine 208 performs image recognition on the particular sub-image 207. The image recognition engine 208 performs image recognition to generate labels 209 for the particular sub-image 207 that indicate a context of the particular sub-image. The labels 209 can correspond to entities in the particular sub-image 207, such as trees or a dog. The labels 209 can also correspond to entities that include specific locations or landmarks, such as the Eiffel Tower. The labels 209 can be used individually or in combination to determine a context of the particular sub-image 207.

The image recognition engine 208 can be configured to determine a portion of the particular sub-image 207 that is the primary focus of the user 102. For example, the image recognition engine 208 can analyze the particular sub-image 207 to determine that it includes entities such as the Eiffel Tower and a dog. The image recognition engine 208 can analyze the entities in the particular sub-image 207 and determine that the Eiffel Tower is larger in size than the dog. Based on determining that the Eiffel Tower is proportionally larger than the dog, the image recognition engine 208 can determine that the Eiffel Tower 110 is likely to be of primary interest to the user 102. Additionally or alternatively, the image recognition engine 208 can be configured to analyze other aspects of the particular sub-image 207, such as foreground versus background, entities that are in focus in the particular sub-image 207, and the like. For example, the image recognition engine 208 can determine that the Eiffel Tower is in focus in the particular sub-image 207 and that the dog is out of focus. As such, the image recognition engine 208 can determine that the Eiffel Tower is likely to be of primary interest to the user 102.
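The size and focus cues described above could be combined into a single heuristic score per entity. The following sketch is an assumption for illustration: the Entity fields, the additive focus_bonus, and the example area fractions are hypothetical, and the description does not state how the cues are weighted against each other.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Entity:
    """Hypothetical summary of an entity detected in the sub-image."""
    label: str
    area_fraction: float  # share of the sub-image covered by the entity
    in_focus: bool


def primary_entity(entities: Sequence[Entity], focus_bonus: float = 0.25) -> Entity:
    """Return the entity most likely to be of primary interest.

    Larger entities score higher, and in-focus entities receive an
    assumed fixed bonus. `entities` must be non-empty.
    """
    def score(entity: Entity) -> float:
        return entity.area_fraction + (focus_bonus if entity.in_focus else 0.0)

    return max(entities, key=score)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the description.
    detected = [
        Entity("Eiffel Tower", area_fraction=0.45, in_focus=True),
        Entity("dog", area_fraction=0.10, in_focus=False),
    ]
    print(primary_entity(detected).label)
```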

At event (D), the server 206 identifies one or more portions 211 of the image 202 that do not include the particular sub-image. The one or more portions 211 are sent to the text recognition engine 210. In some aspects, the server 206 is in communication with the text recognition engine 210 over the network. In other aspects, the server 206 and the text recognition engine 210 are integrated into a single system. Furthermore, the server 206, the image recognition engine 208, and the text recognition engine 210 can be integrated into a single system. In some examples, the one or more portions 211 can include a title included in the image 202, comments included in the image 202, or any content in the image 202 that does not include the particular sub-image 207.

At event (E), the text recognition engine 210 performs text recognition on the one or more portions 211 of the image 202 that do not include the particular sub-image 207. The text recognition engine 210 performs text recognition to generate labels 212 for the one or more portions 211 that indicate a context of the particular sub-image 207. For example, the portions 211 can include comments such as "Dave ~ So cool, France is my favorite.", "Sarah ~ I didn't know you had a golden, I have one too!", and "Abby ~ I was just in Paris, when were you there?". The labels 212 can correspond directly to text in the one or more portions 211. In this case, the labels 212 can include terms such as "France" or "Paris". The labels 212 can also be inferred from the text in the one or more portions 211. In this case, the labels 212 can be inferred to include the phrase "golden retriever". The labels 212 can be used individually or in combination to determine a context of the particular sub-image 207.
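The difference between labels taken directly from the recognized text and labels inferred from it can be illustrated with a small lookup-based sketch. It operates on text that has already been recognized, and both tables (KNOWN_TERMS and INFERENCE_RULES) are hypothetical stand-ins for whatever knowledge the text recognition engine actually uses. The inference rule shown assumes that the word "golden" implies "golden retriever" only when the image recognition labels already include "dog".

```python
import re
from typing import Iterable, List

# Hypothetical vocabulary of terms that become labels verbatim.
KNOWN_TERMS = {"paris": "Paris", "france": "France"}

# Hypothetical rules: (cue word in text, required image label) -> inferred label.
INFERENCE_RULES = {("golden", "dog"): "golden retriever"}


def labels_from_text(text: str, image_labels: Iterable[str]) -> List[str]:
    """Derive labels from recognized text, given the image recognition labels."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    image_label_set = {label.lower() for label in image_labels}

    # Direct labels: terms that appear literally in the text.
    labels = [canonical for term, canonical in KNOWN_TERMS.items() if term in words]

    # Inferred labels: a cue word in the text plus a supporting image label.
    for (cue, required_label), inferred in INFERENCE_RULES.items():
        if cue in words and required_label in image_label_set:
            labels.append(inferred)
    return labels


if __name__ == "__main__":
    comments = (
        "So cool, France is my favorite. "
        "I didn't know you had a golden, I have one too! "
        "I was just in Paris, when were you there?"
    )
    print(labels_from_text(comments, ["Eiffel Tower", "dog"]))
```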

By performing text recognition, the text recognition engine 210 may determine one or more labels 212 that further indicate the context of the particular sub-image 207. For example, the text recognition engine 210 may perform text recognition on the comments 116 to verify that the location of the particular sub-image is Paris, France (e.g., by performing text recognition on the phrase "I was just in Paris."). Additionally, the text recognition engine 210 may perform text recognition on the comments to determine that the dog in the particular sub-image 207 is a golden retriever (e.g., by performing text recognition on the phrase "I didn't know you had a golden ..."). As such, the text recognition engine 210 may generate one or more labels 212 such as Paris, France, and golden retriever.

At event (F), the server 206 generates a search query 213 using the transcription 204, the labels 209 from the image recognition engine 208, and the labels 212 from the text recognition engine 210. The server 206 may generate the search query 213 automatically without further user intervention, for example in response to the computing device 104 automatically determining that the method should be carried out at a particular time, following a particular button press that precedes the utterance, following a spoken command/hotword included in the utterance, or in response to any other indication from the user 102 of the computing device 104 that such a method is to be carried out, given before the transcription 204 and the image 202 are received by the server 206.

The server 206 may rewrite the transcription 204 as the search query 213. The server 206 may substitute a subset of the labels 209 from the image recognition engine 208 and the labels 212 from the text recognition engine 210 into the transcription 204 to generate the search query 213. For example, the server 206 may substitute the label "Eiffel Tower" into the transcription 204 so that the generated search query 213 includes "What is the Eiffel Tower?".
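Substituting a label into the transcription can be sketched as follows. This is a minimal Python illustration under stated assumptions: the transcription is assumed to contain an ambiguous word such as "this", and the pattern `AMBIGUOUS_WORD` and the function `rewrite_transcription` are hypothetical names that do not appear in the disclosure.

```python
import re

# Assumption: the ambiguous reference is one of these words.
AMBIGUOUS_WORD = re.compile(r"\b(this|that|it)\b", re.IGNORECASE)


def rewrite_transcription(transcription, label):
    """Replace the first ambiguous word with the label.

    The transcription is returned unchanged when no ambiguous word is found.
    """
    # A function is used as the replacement so that backslashes in the label
    # are not interpreted as regex escape sequences.
    return AMBIGUOUS_WORD.sub(lambda match: f"the {label}", transcription, count=1)


print(rewrite_transcription("What is this?", "Eiffel Tower"))
```

Only the first ambiguous word is replaced, so a transcription with several references would need a more careful choice of which word each label belongs to.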

Further, at event (F), the server 206 provides the generated search query 213 for output. For example, the server 206 may provide the search query 213 to a search engine. The server 206 may receive search results from the search engine and provide the search results to the computing device 104 over the network. In some aspects, the computing device 104 may receive the search results and provide the search results as audio or visual output. For example, the server 206 may generate the search query 213 "What is the Eiffel Tower?" and provide the generated search query 213 to the computing device 104. In this case, the computing device 104 may be configured to audibly output the generated search query 213 to the user 102 for verification before the search query 213 is input to a search engine.

In some examples, the server 206 generates the search query 213 according to generated weightings of the labels 209 and 212. In this case, the server 206 may generate a first weighting for the image labels 209 that differs from a second weighting for the text labels 212. For example, the server 206 may determine that the image labels 209 are more relevant to the transcription 204 than the text labels 212. As such, the server 206 may place greater emphasis on the image labels 209 by weighting the image labels 209 more heavily than the text labels 212.
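One way to picture the different weightings is a small Python sketch in which each label accumulates the weight of every source that produced it. The function `rank_labels` and the default weights 0.7 and 0.3 are assumptions chosen for illustration; the disclosure does not state how the weightings are computed or combined.

```python
def rank_labels(image_labels, text_labels, image_weight=0.7, text_weight=0.3):
    """Order labels by accumulated source weight, highest first.

    A label reported by both engines receives the sum of both weights.
    Ties are broken alphabetically so that the order is deterministic.
    """
    scores = {}
    for label in image_labels:
        scores[label] = scores.get(label, 0.0) + image_weight
    for label in text_labels:
        scores[label] = scores.get(label, 0.0) + text_weight
    return sorted(scores, key=lambda label: (-scores[label], label))


ranked = rank_labels(["Eiffel Tower", "dog"], ["Paris", "dog"])
print(ranked)
```

With these assumed weights, a label seen only in the image outranks a label seen only in the text, while a label confirmed by both sources ranks highest.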

The server 206 may be configured to receive an additional image of the computing device 104 and an additional transcription of an additional utterance spoken by a user of the computing device 104. The server 206 may identify an additional particular sub-image that is included in the additional image and send the additional particular sub-image to the image recognition engine 208 to perform image recognition on the additional particular sub-image. The image recognition engine 208 may be configured to generate one or more additional first labels for the additional particular sub-image that indicate a context of the additional particular sub-image. Likewise, the server may be configured to send a portion of the additional image that does not include the additional particular sub-image to the text recognition engine 210 to generate one or more additional second labels based on performing text recognition on the portion of the additional image other than the additional particular sub-image.

The server 206 may use the additional transcription, the additional first labels, and the additional second labels to generate a command or action. The command may be automatically performed by the server 206, provided to the computing device 104, and the like. In some examples, the command may include one or more actions such as storing the additional image in memory, storing the additional particular sub-image in the memory, uploading the additional image to the server 206, uploading the additional particular sub-image to the server 206, importing the additional image to an application of the computing device 104, and importing the particular sub-image to the application of the computing device 104. For example, the user 102 may be viewing visual and textual content in a messaging application on the display of the computing device 104. Using the received transcription and the generated labels, the server 206 may be configured to capture a portion of an image in the notes application or messaging application and upload the portion of the image to the cloud for storage.

In certain aspects, the server 206 provides the search query 213 to the computing device 104. In this case, the computing device 104 may provide the search query 213 for verification by the user 102 before providing the search query 213 as input to a search engine. As such, the search query 213 may be accepted, modified, or declined by the user 102. For example, in response to receiving the search query 213 at the computing device 104, the user may provide user input indicating that the search query 213 is to be provided to a search engine. In another example, the user 102 may provide user input indicating that the search query 213 is to be modified before being provided to the search engine. As such, the user may modify the search query 213 directly or ask the server 206 for another search query. In another example, the user 102 may provide user input indicating that the search query 213 is declined. As such, the user 102 may ask the server 206 for another search query or provide another utterance to be used in generating another search query.
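The three outcomes of user verification (accept, modify, decline) can be sketched as a small Python function. The name `resolve_query` and the string values used for the decision are hypothetical; the disclosure does not prescribe how the user input is encoded.

```python
def resolve_query(query, decision, modified_query=None):
    """Return the query to send to the search engine, or None if declined.

    decision is one of "accept", "modify" or "decline" (assumed encoding).
    """
    if decision == "accept":
        return query
    if decision == "modify":
        if not modified_query:
            raise ValueError("a modified query is required for 'modify'")
        return modified_query
    if decision == "decline":
        # Nothing is sent; the caller may request another query or utterance.
        return None
    raise ValueError(f"unknown decision: {decision!r}")


print(resolve_query("What is the Eiffel Tower?", "accept"))
```

Returning `None` for a declined query leaves it to the caller to ask the server for another search query or to prompt the user for another utterance.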

FIG. 3 is a flow chart illustrating an example process 300 for contextually disambiguating a query. The process 300 may be performed by one or more servers or other computing devices. For example, operations of the process 300 may be performed by the server 206 of FIG. 2. Operations of the process 300 may also be implemented as instructions stored on a non-transitory computer-readable medium, and when the instructions are executed by one or more servers (or other computing devices), the instructions cause the one or more servers to perform operations of the process 300.

At step 310, the server receives an image and a transcription of an utterance. The image may correspond to a graphical display of a computing device in communication with the server. For example, the computing device may capture the image upon receiving the utterance. In some aspects, the image may correspond to a graphical display of the computing device when the computing device is in a camera mode. As such, the image may correspond to a photograph that the computing device captures, or that is viewed through a camera in communication with the computing device. Further, the image may correspond to a video captured by the camera of the computing device or a video displayed on a display of the computing device. Additionally or alternatively, the computing device may send background noise captured while the utterance is received. In this case, the server may use the background noise to generate additional labels and/or to evaluate the generated labels.

The transcription may correspond to an utterance received by the computing device. In some aspects, the transcription is generated by the computing device based on the received utterance. In other aspects, the transcription corresponds to user input received by the computing device. For example, a user may enter a question via a keyboard or a user interface of the computing device. The computing device may generate the transcription based on the input and provide the transcription to the server.

At step 320, the server identifies a particular sub-image that is included in the image. The server is configured to identify a particular sub-image from one or more images in the image. The particular sub-image may be an image that is likely to be of primary focus or interest to a user. For example, the image may include a photograph as well as several other graphical icons. The server may be configured to analyze the image to determine that the photograph is of primary interest to the user, while the other graphical icons in the display are not of primary interest to the user.

At step 330, the server determines one or more first labels based on image recognition of the particular sub-image. The server may perform image recognition on the particular sub-image in the image to identify one or more entities in the particular sub-image and to generate respective labels for the one or more entities. Each of the one or more entities may correspond to one or more respective first labels. The first labels may be determined in part using metadata associated with the particular sub-image. The first labels may indicate a context of the particular sub-image. In certain aspects, the server is configured to perform image recognition over the entire image. In this case, the first labels may be generated for all entities identified by image recognition in the image.

At step 340, the server determines one or more second labels based on text recognition on a portion of the image other than the particular sub-image. The server may perform text recognition on the portion of the image other than the particular sub-image to identify textual content of the image for use in generating labels that indicate a context of the content. The textual content may be labeled using one or more second labels that indicate a context of the particular content.

The second labels may be determined in part using metadata associated with the portion. For example, the server may be configured to access and capture code relating to displaying content on the display of the computing device. In this case, the server may access markup code and capture the markup code to analyze it for metadata that may be used in generating the second labels. In some aspects, the server is configured to perform text recognition over the entire image. In this case, the second labels may be generated for all textual content identified by text recognition in the image.

At step 350, the server generates a search query based on the transcription, the first labels, and the second labels. Specifically, the server is configured to generate the search query based on the transcription and the labels. In some examples, the server is configured to generate multiple candidate search queries based on the transcription and the labels. The candidate search queries may be ranked based on historical query data. As such, a top-ranked candidate search query may be selected as the search query.
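Generating several candidate search queries and ranking them against historical query data can be sketched in Python as follows. The functions `candidate_queries` and `select_query`, the placeholder word "this", and the frequency table `history_counts` are all assumptions for illustration; the disclosure does not define the form of the historical query data or the ranking function, and the counts shown are made up.

```python
def candidate_queries(transcription, labels, placeholder="this"):
    """Build one candidate per label by substituting it for the placeholder word."""
    return [transcription.replace(placeholder, f"the {label}", 1) for label in labels]


def select_query(candidates, history_counts):
    """Pick the candidate that occurs most often in the (assumed) query history.

    history_counts maps lowercase query strings to how often they were issued.
    When several candidates tie, the earliest candidate is returned.
    """
    return max(candidates, key=lambda query: history_counts.get(query.lower(), 0))


# Illustrative, invented frequencies.
history_counts = {"what is the eiffel tower?": 120, "what is the dog?": 3}
candidates = candidate_queries("What is this?", ["dog", "Eiffel Tower"])
print(select_query(candidates, history_counts))
```

Raw frequency is only one possible ranking signal; a production system might instead use a learned model over the historical queries.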

At step 360, the server provides the search query for output. The selected search query may be provided directly to a search engine. In this case, the server may also be configured to receive one or more search results from the search engine and to provide the search results for output. For example, the server may provide the search query to the search engine, select a particular search result, and provide the search result to the computing device for audio or visual output.

In other aspects, the search query may be provided to the computing device. The computing device may provide the search query for audio or visual output. In this case, the search query may be verified by a user before being provided as input to a search engine.

FIG. 4 is a flow chart illustrating an example process 400 for selecting a particular sub-image using confidence scores. The process 400 may be performed by one or more servers or other computing devices. For example, operations of the process 400 may be performed by the server 206 of FIG. 2. Operations of the process 400 may also be implemented as instructions stored on a non-transitory computer-readable medium, and when the instructions are executed by one or more servers (or other computing devices), the instructions cause the one or more servers to perform operations of the process 400.

At step 410, the server identifies images that are included in an image. In certain aspects, the server receives an image from a computing device and identifies a plurality of images in the image. The server may be configured to perform image recognition on the image to identify the plurality of images. The images may include photographs, icons, drawings, pictures, and the like. The images may vary in size, shape, and type. In some aspects, the images correspond to a still frame of a video. For example, the image may be of a web page that includes multiple images and a video playing in the background. The image may correspond to a single captured frame of the video playing on the web page.

At step 420, the server generates a confidence score for each of the identified images. The confidence scores may each indicate a likelihood that an image is an image of primary interest to a user viewing the image. The confidence scores may be determined based on various features of the image. For example, the server may generate greater confidence scores for large images than for small images in the image. In another example, the server may generate greater confidence scores for images with a large number of identifiable entities in the image, such as landmarks, people, or animals, and vice versa.

At step 430, the server selects a particular sub-image based on the confidence scores. The server may be configured to select the particular sub-image based on the highest confidence score. As such, the confidence scores of the images may be compared to determine which image is associated with the greatest confidence score. In some examples, the server selects multiple images. In this case, the server may be configured to select images if each of the selected images satisfies a predetermined image confidence score threshold. This may be the case when multiple images in the image include similar entities or objects. For example, two images in an image may include the Eiffel Tower and a third image may not include the Eiffel Tower. As such, the two respective images that include the Eiffel Tower may be selected as the particular sub-images due to the similar content in each of the two images.
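Steps 420 and 430 can be illustrated with a minimal Python sketch that scores each identified image by its share of the screen area and by the number of identifiable entities, then selects either the single highest-scoring image or every image that meets a threshold. The `SubImage` class, the equal 0.5/0.5 weighting, the entity cap of 5, and the threshold value are assumptions made for illustration; the disclosure names the features but not a scoring formula.

```python
from dataclasses import dataclass


@dataclass
class SubImage:
    name: str
    width: int
    height: int
    entity_count: int  # number of identifiable entities (landmarks, people, animals)


def confidence(sub, screen_area, max_entities=5):
    """Assumed score in [0, 1]: larger and entity-rich images score higher."""
    area_share = min((sub.width * sub.height) / screen_area, 1.0)
    entity_share = min(sub.entity_count, max_entities) / max_entities
    return 0.5 * area_share + 0.5 * entity_share


def select_sub_images(subs, screen_area, threshold=None):
    """Return the top-scoring image, or all images meeting the threshold."""
    scored = [(confidence(sub, screen_area), sub) for sub in subs]
    if threshold is None:
        return [max(scored, key=lambda pair: pair[0])[1]]
    return [sub for score, sub in scored if score >= threshold]


screen_area = 1000 * 2000
subs = [
    SubImage("photo_a", 1000, 800, 2),
    SubImage("photo_b", 900, 700, 2),
    SubImage("icon", 100, 100, 0),
]
print([s.name for s in select_sub_images(subs, screen_area)])
print([s.name for s in select_sub_images(subs, screen_area, threshold=0.3)])
```

The threshold variant corresponds to the case where several images with similar content, such as two photographs of the Eiffel Tower, are selected together.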

FIG. 5 is a flowchart illustrating an example process 500 for generating a search query using selected labels. The process 500 can be performed by one or more servers or other computing devices. For example, operations of the process 500 can be performed by the server 206 of FIG. 2. Operations of the process 500 can also be implemented as instructions stored on a non-transitory computer-readable medium; when the instructions are executed by one or more servers (or other computing devices), they cause the one or more servers to perform the operations of the process 500.

At step 510, the server generates a confidence score for each of the first labels and the second labels. The first labels may correspond to a particular sub-image identified in an image, and the second labels may correspond to a portion of the image other than the particular sub-image. For example, the particular sub-image may be a photograph of the Eiffel Tower in the image, and the portion of the image other than the particular sub-image may contain comments about the photograph. The confidence scores for the first and second labels each indicate a likelihood that the respective label corresponds to a portion of the particular sub-image that is of primary interest to the user.

At step 520, the server selects one or more of the first labels and the second labels based on the confidence scores. For example, the server may select the single label with the greatest confidence score. In another example, the server is configured to select labels whose confidence scores satisfy a predetermined confidence score threshold. In another example, the server is configured to select a predetermined number of labels with the greatest confidence scores.
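
The three selection strategies named for step 520 can be sketched as follows. The function names, the `(text, score)` tuple representation, and the example labels and scores are hypothetical assumptions made for illustration.

```python
from typing import List, Tuple

Label = Tuple[str, float]  # (label text, confidence score)


def top_label(labels: List[Label]) -> List[str]:
    """Strategy 1: the single label with the greatest confidence score."""
    return [max(labels, key=lambda item: item[1])[0]] if labels else []


def labels_meeting_threshold(labels: List[Label], threshold: float) -> List[str]:
    """Strategy 2: every label whose score satisfies the threshold."""
    return [text for text, score in labels if score >= threshold]


def top_n_labels(labels: List[Label], n: int) -> List[str]:
    """Strategy 3: a predetermined number of labels with the greatest scores."""
    ranked = sorted(labels, key=lambda item: item[1], reverse=True)
    return [text for text, _ in ranked[:n]]


first_labels = [("Eiffel Tower", 0.95), ("tower", 0.70)]   # from image recognition
second_labels = [("Paris", 0.80), ("vacation", 0.40)]      # from text recognition
all_labels = first_labels + second_labels
```

Pooling the first and second labels before selection is one possible design. Selecting from each group separately would fit the description equally well.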

At step 530, the server generates a search query using a received transcription, the selected first labels, and the selected second labels. The server may be configured to provide the generated search query for output. For example, the server may be configured to provide the generated search query to a search engine. In another example, the server may generate the search query and send it to a computing device. In this case, the computing device may present the search query to a user audibly or visually.
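
One simple way to combine the three inputs of step 530 is sketched below. This passage does not state how the transcription and labels are merged, so appending the de-duplicated labels to the transcription is an assumption, and `generate_search_query` is a hypothetical name.

```python
def generate_search_query(transcription, first_labels, second_labels):
    """Append selected labels to the transcription (an illustrative assumption)."""
    parts = [transcription.strip()]
    seen = set()
    for label in list(first_labels) + list(second_labels):
        key = label.casefold()
        # Skip empty labels and case-insensitive duplicates.
        if key and key not in seen:
            seen.add(key)
            parts.append(label)
    return " ".join(part for part in parts if part)


query = generate_search_query(
    "how tall is this",
    ["Eiffel Tower"],
    ["Paris", "eiffel tower"],
)
# query == "how tall is this Eiffel Tower Paris"
```

The resulting string could then be passed to a search engine or returned to the computing device for display, as described for step 530.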

FIG. 6 is a diagram of an example computing device 600 and an example mobile computing device 650 that can be used with the techniques described herein. The computing device 600 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The mobile computing device 650 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the inventions described and/or claimed in this document.

The computing device 600 includes a processor 602, a memory 604, a storage device 606, a high-speed interface 608 connecting to the memory 604 and to high-speed expansion ports 610, and a low-speed interface 612 connecting to a low-speed bus 614 and to the storage device 606. Each of the components 602, 604, 606, 608, 610, and 612 is interconnected using various buses and may be mounted on a common motherboard or in other manners as appropriate. The processor 602 can process instructions for execution within the computing device 600, including instructions stored in the memory 604 or on the storage device 606, to display graphical information for a GUI on an external input/output device, such as a display 616 coupled to the high-speed interface 608. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Likewise, multiple computing devices 600 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).

The memory 604 stores information within the computing device 600. In one implementation, the memory 604 is a volatile memory unit or units. In another implementation, the memory 604 is a non-volatile memory unit or units. The memory 604 may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device 606 can provide mass storage for the computing device 600. In one implementation, the storage device 606 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 604, the storage device 606, or memory on the processor 602.

The high-speed controller 608 manages bandwidth-intensive operations for the computing device 600, while the low-speed controller 612 manages less bandwidth-intensive operations. Such an allocation of functions is exemplary only. In one implementation, the high-speed controller 608 is coupled to the memory 604, to the display 616 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 610, which may accept various expansion cards (not shown). In this implementation, the low-speed controller 612 is coupled to the storage device 606 and to the low-speed expansion port 614. The low-speed expansion port, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.

The computing device 600 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 620, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 624. In addition, it may be implemented in a personal computer such as a laptop computer 622. Alternatively, components of the computing device 600 may be combined with other components in a mobile device (not shown), such as the mobile computing device 650. Each of such devices may contain one or more of the computing devices 600, 650, and an entire system may be made up of multiple computing devices 600, 650 communicating with each other.

The mobile computing device 650 includes a processor 652, a memory 664, an input/output device such as a display 654, a communication interface 666, and a transceiver 668, among other components. The mobile computing device 650 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage. Each of the components 650, 652, 664, 654, 666, and 668 is interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.

The processor 652 can execute instructions within the mobile computing device 650, including instructions stored in the memory 664. The processor may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor may provide, for example, for coordination of the other components of the mobile computing device 650, such as control of user interfaces, applications run by the device 650, and wireless communication by the device 650.

The processor 652 may communicate with a user through a control interface 658 and a display interface 656 coupled to a display 654. The display 654 may be, for example, a TFT LCD (thin-film-transistor liquid crystal display), an OLED (organic light-emitting diode) display, or other appropriate display technology. The display interface 656 may comprise appropriate circuitry for driving the display 654 to present graphical and other information to a user. The control interface 658 may receive commands from a user and convert them for submission to the processor 652. In addition, an external interface 662 may be provided in communication with the processor 652 to enable short-range communication of the mobile computing device 650 with other devices. The external interface 662 may provide, for example, for wired communication in some implementations or for wireless communication in other implementations, and multiple interfaces may also be used.

The memory 664 stores information within the mobile computing device 650. The memory 664 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. An expansion memory 654 may also be provided and connected to the device 650 through the expansion interface 652, which may include, for example, a SIMM (Single In-line Memory Module) card interface. Such expansion memory 654 may provide extra storage space for the device 650, or may also store applications or other information for the device 650. Specifically, the expansion memory 654 may include instructions to carry out or supplement the processes described above, and may also include secure information. Thus, for example, the expansion memory 654 may serve as a security module for the device 650 and may be programmed with instructions that permit secure use of the device 650. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.

The memory may include, for example, flash memory and/or NVRAM memory, as discussed below. In one implementation, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 664, the expansion memory 654, memory on the processor 652, or a propagated signal that may be received, for example, over the transceiver 668 or the external interface 662.

The device 650 may communicate wirelessly through the communication interface 666, which may include digital signal processing circuitry where necessary. The communication interface 666 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through the radio-frequency transceiver 668. In addition, short-range communication may occur, such as using a Bluetooth, Wi-Fi, or other such transceiver (not shown). In addition, a GPS (Global Positioning System) receiver module 650 may provide additional navigation- and location-related data to the device 650, which may be used as appropriate by applications running on the device 650.

The device 650 may also communicate audibly using an audio codec 660, which may receive spoken information from a user and convert it to usable digital information. The audio codec 660 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the device 650. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.), and may also include sound generated by applications operating on the device 650.

The mobile computing device 650 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 680. It may also be implemented as part of a smartphone 682, a personal digital assistant, or another similar mobile device.

A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. For example, various forms of the flows shown above may be used, with steps reordered, added, or removed.

Embodiments of the invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the invention can be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, a data processing apparatus. The computer-readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including, by way of example, a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to a suitable receiver apparatus.

A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows may also be performed by, and an apparatus may also be implemented as, special-purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general-purpose and special-purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random-access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, e.g., a tablet computer, a mobile telephone, a personal digital assistant (PDA), a mobile audio player, or a Global Positioning System (GPS) receiver, to name just a few. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, by way of example, semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

To provide for interaction with a user, embodiments of the invention may be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input.

Embodiments of the invention may be implemented in a computing system that includes a back-end component, e.g., a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the invention, or any combination of one or more such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), e.g., the Internet.

The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs that run on the respective computers and that have a client-server relationship to each other.

While this specification contains many specifics, these should not be construed as limitations on the scope of the invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of the invention. Certain features that are described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations, and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.

In each instance where an HTML file is mentioned, other file types or formats may be substituted. For example, an HTML file may be replaced by an XML, JSON, plain text, or other type of file. Moreover, where a table or hash table is mentioned, other data structures (such as spreadsheets, relational databases, or structured files) may be used.

Particular embodiments of the invention have been described. Other embodiments are within the scope of the following claims. For example, the steps recited in the claims may be performed in a different order and still achieve desirable results.

Claims

Computer-implemented method comprising: receiving an image presented on a display of a computing device and a transcription of an utterance spoken by a user of the computing device; identifying a particular sub-image contained in the image; based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image; based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image; based on the transcription, the first labels, and the second labels, generating a search query; and providing, for output, the search query.

Method according to Claim 1, wherein generating the search query comprises substituting one or more of the first labels or the second labels for terms of the transcription.

Method according to one of the preceding claims, comprising: generating, for each of the first labels and the second labels, a label confidence value indicating a likelihood that the label corresponds to a portion of the particular sub-image that is of primary interest to the user; and selecting one or more of the first labels and the second labels based on the respective label confidence values, wherein the search query is generated based on the one or more selected first labels and second labels.
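As a minimal sketch of the label selection and term substitution recited above, the following Python snippet keeps the labels whose confidence values clear a threshold and substitutes the best one for an ambiguous term of the transcription. The `Label` type, the threshold, the limit, and the `AMBIGUOUS_TERMS` set are illustrative assumptions and are not taken from the claims.

```python
from dataclasses import dataclass

# Hypothetical set of terms that a label may stand in for.
AMBIGUOUS_TERMS = {"this", "that", "it"}


@dataclass(frozen=True)
class Label:
    text: str
    confidence: float  # likelihood that the label matches the region of primary interest
    source: str        # "image" for first labels, "text" for second labels


def select_labels(labels, threshold=0.5, limit=2):
    """Keep the highest-confidence labels at or above an assumed threshold."""
    kept = [label for label in labels if label.confidence >= threshold]
    kept.sort(key=lambda label: label.confidence, reverse=True)
    return kept[:limit]


def build_query(transcription, selected):
    """Substitute the best selected label for ambiguous terms of the transcription."""
    if not selected:
        return transcription
    best = selected[0].text
    words = []
    for word in transcription.split():
        core = word.strip("?.,!")  # compare without trailing punctuation
        if core.lower() in AMBIGUOUS_TERMS:
            words.append(word.replace(core, best))
        else:
            words.append(word)
    return " ".join(words)


labels = [
    Label("Eiffel Tower", 0.9, "image"),
    Label("Paris", 0.6, "text"),
    Label("sky", 0.2, "image"),
]
query = build_query("what is this?", select_labels(labels))
```

A fixed threshold is only one way to use the confidence values; ranking without a cutoff, or weighting image-derived and text-derived labels differently, would fit the claim language equally well.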

Method according to one of the preceding claims, wherein generating the search query comprises: accessing historical query data containing previous search queries provided by other users; generating, based on the transcription, the first labels, and the second labels, one or more candidate search queries; comparing the historical query data with the one or more candidate search queries; and, based on comparing the historical query data with the one or more candidate search queries, selecting the search query from among the one or more candidate search queries.
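One possible reading of the comparison with historical query data is sketched below: each candidate is scored by how often the same normalized query appears in a log of earlier queries, and the most frequent candidate is selected. Exact-match counting, the `normalize` helper, and the sample data are assumptions made for illustration; the claim does not prescribe how the comparison is carried out.

```python
from collections import Counter


def normalize(query):
    """Lower-case the query and collapse whitespace so trivially different queries match."""
    return " ".join(query.lower().split())


def select_query(candidates, historical_queries):
    """Return the candidate that occurs most often in the historical queries.

    Ties are resolved in favor of the earlier candidate. An empty candidate
    list raises ValueError, because max() has nothing to choose from.
    """
    counts = Counter(normalize(q) for q in historical_queries)
    return max(candidates, key=lambda c: counts[normalize(c)])


history = [
    "how tall is the eiffel tower",
    "How tall is the Eiffel Tower",
    "eiffel tower tickets",
]
candidates = ["how tall is paris", "how tall is the Eiffel Tower"]
chosen = select_query(candidates, history)
```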

Method according to one of Claims 1 to 3, comprising: based on the transcription, the first labels, and the second labels, generating one or more candidate search queries; determining, for each of the one or more candidate search queries, a query confidence value indicating a likelihood that the candidate search query is an accurate rewrite of the transcription; and selecting, based on the query confidence values, a particular candidate search query as the search query.

The method of any one of the preceding claims, wherein identifying the particular sub-image contained in the image comprises: identifying one or more sub-images contained in the image; generating, for each of the one or more sub-images contained in the image, an image confidence value indicating a likelihood that the sub-image is of primary interest to the user; and selecting the particular sub-image based on the image confidence values for the one or more sub-images.
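The selection of a sub-image by image confidence value could look roughly like the following sketch. The claim leaves open how the confidence value is computed; the heuristic here, an equal blend of relative area and closeness to the center of the screen, is purely an assumption, as are the `SubImage` type and the function names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SubImage:
    name: str
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


def image_confidence(sub, screen_w, screen_h):
    """Assumed heuristic: larger and more central sub-images score higher."""
    area_fraction = (sub.width * sub.height) / (screen_w * screen_h)
    center_x = sub.x + sub.width / 2
    center_y = sub.y + sub.height / 2
    distance = math.hypot(center_x - screen_w / 2, center_y - screen_h / 2)
    half_diagonal = math.hypot(screen_w / 2, screen_h / 2)
    centrality = 1.0 - distance / half_diagonal
    return 0.5 * area_fraction + 0.5 * centrality


def select_sub_image(sub_images, screen_w, screen_h):
    """Pick the sub-image with the highest image confidence value."""
    return max(sub_images, key=lambda s: image_confidence(s, screen_w, screen_h))


sub_images = [
    SubImage("icon", 0, 0, 100, 100),
    SubImage("photo", 100, 500, 800, 1000),
]
particular = select_sub_image(sub_images, 1000, 2000)
```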

Method according to one of Claims 1 to 5, wherein identifying the particular sub-image contained in the image comprises receiving data indicative of a selection of a control event at the computing device, the control event identifying the particular sub-image.

The method of any one of the preceding claims, wherein the computing device is configured to capture the image and to acquire audio data corresponding to the utterance in response to detecting a predefined hotword.

Method according to one of the preceding claims, comprising: receiving an additional image corresponding to at least one further portion of the display of the computing device and an additional transcription of an additional utterance spoken by a user of the computing device; identifying an additional particular sub-image contained in the additional image; based on performing image recognition on the additional particular sub-image, determining one or more additional first labels that indicate a context of the additional particular sub-image; based on performing text recognition on a portion of the additional image other than the additional particular sub-image, determining one or more additional second labels that indicate the context of the additional particular sub-image; based on the additional transcription, the additional first labels, and the additional second labels, generating a command; and performing the command.

Method according to Claim 9, wherein performing the command comprises performing one or more of: storing the additional image in a memory, storing the particular sub-image in the memory, uploading the additional image to a server, uploading the particular sub-image to the server, importing the additional image into an application of the computing device, and importing the particular sub-image into the application of the computing device.

Method according to one of the preceding claims, comprising: identifying metadata associated with the particular sub-image, wherein determining the one or more first labels that indicate the context of the particular sub-image is further based on the metadata associated with the particular sub-image.

System comprising: one or more computers and one or more memory devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an image presented on a display of a computing device and a transcription of an utterance spoken by a user of the computing device; identifying a particular sub-image contained in the image; based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image; based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image; based on the transcription, the first labels, and the second labels, generating a search query; and providing, for output, the search query.

System according to Claim 12, wherein the operation of generating the search query comprises weighting the first labels differently from the second labels.

System according to Claim 12 or 13, wherein the operation of generating the search query comprises substituting one or more of the first labels or the second labels for terms of the transcription.

System according to one of Claims 12 to 14, wherein the operations comprise: generating, for each of the first labels and the second labels, a label confidence value indicating a likelihood that the label corresponds to a portion of the particular sub-image that is of primary interest to the user; and selecting one or more of the first labels and the second labels based on the respective label confidence values, wherein the search query is generated based on the one or more selected first labels and second labels.

System according to one of Claims 12 to 15, wherein the operation of generating the search query comprises: accessing historical query data containing previous search queries provided by other users; generating, based on the transcription, the first labels, and the second labels, one or more candidate search queries; comparing the historical query data with the one or more candidate search queries; and, based on comparing the historical query data with the one or more candidate search queries, selecting the search query from among the one or more candidate search queries.

A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers that, upon such execution, cause the one or more computers to perform operations comprising: receiving an image presented on a display of a computing device and a transcription of an utterance spoken by a user of the computing device; identifying a particular sub-image contained in the image; based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image; based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image; based on the transcription, the first labels, and the second labels, generating a search query; and providing, for output, the search query.

Non-transitory computer-readable medium according to Claim 17, wherein the operation of generating the search query comprises weighting the first labels differently from the second labels.

Non-transitory computer-readable medium according to Claim 17 or 18, wherein the operation of generating the search query comprises substituting one or more of the first labels or the second labels for terms of the transcription.

Non-transitory computer-readable medium according to one of Claims 17 to 19, wherein the operations comprise: generating, for each of the first labels and the second labels, a label confidence value indicating a likelihood that the label corresponds to a portion of the particular sub-image that is of primary interest to the user; and selecting one or more of the first labels and the second labels based on the respective label confidence values, wherein the search query is generated based on the one or more selected first labels and second labels.