WO2021139667A1 - Input processing method and apparatus, and computing device - Google Patents

Input processing method and apparatus, and computing device

Info

Publication number
WO2021139667A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
input
text
target
auxiliary information
Prior art date
Application number
PCT/CN2021/070410
Other languages
French (fr)
Chinese (zh)
Inventor
葛妮瑜
方视菁
胡雪梅
李洁
Original Assignee
阿里巴巴集团控股有限公司
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2021139667A1 publication Critical patent/WO2021139667A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 - Querying
    • G06F16/535 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/42 - Data-driven translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/418 - Document matching, e.g. of document images

Definitions

  • the present invention relates to the field of computer technology, in particular to an input processing method, device and computing equipment.
  • computing devices such as mobile terminals and personal computers
  • users are becoming more and more accustomed to using computing devices to handle various daily affairs.
  • a user may wish to use a computing device to quickly locate part of the information the user pays attention to, or obtain auxiliary information of the part of the information the user pays attention to.
  • a typical application scenario is that when traveling in a region where different languages are spoken, and faced with a large amount of information in an unfamiliar language, the user hopes to use a computing device to locate part of the information that the user pays attention to and obtain the translation of this part of the information.
  • in a traditional solution, a user can use a camera installed in a computing device to capture an image that includes a large amount of information; the computing device recognizes all the information in the image and displays auxiliary information for all of it to the user.
  • the image may be an image of a map, which includes the names of many different places.
  • a user interface that displays the translations of all the place names may be tiring for the user, because the user needs to read them one by one to find the place he cares about or is interested in and obtain the translation for that place. This process is quite time-consuming and laborious, which reduces the user experience.
  • embodiments of the present invention provide an input processing method, device, and computing device to try to solve or at least alleviate the above problems.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; searching for a target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
  • the user's input includes at least one of the following: text, image, video, and audio.
  • the method further includes: acquiring the input text based on the user's input.
  • the step of searching for the target text in the image that matches the input includes: obtaining the text contained in the image; and performing text matching between the text contained in the image and the input text.
  • the step of matching the text contained in the image with the input text includes: in the case that the language of the input text is different from the language of the text contained in the image, obtaining a translation of the input text into the language of the text contained in the image; and matching the text contained in the image against the translation of the input text.
  • the step of displaying the auxiliary information includes: highlighting the auxiliary information.
  • the method further includes: for the found target text, obtaining display information of the target text in the image, the display information including at least the display area and/or display style of the target text.
  • the step of displaying or highlighting the auxiliary information includes: configuring the display area of the auxiliary information based on the display area of the target text, and the display area of the auxiliary information covers the display area of the target text.
  • the step of displaying or highlighting the auxiliary information includes: configuring the display style of the auxiliary information based on the display style of the target text.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; sending the input and the image to a server, so that the server finds the target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; searching for a target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
  • the method further includes: obtaining an input text based on the input, the input text being expressed in an input language; and the step of searching for the target text in the image that matches the input includes: obtaining the text contained in the image, the text contained in the image being expressed in a source language different from the input language; obtaining a translation of the input text into the source language; and matching the text contained in the image against the translation of the input text.
  • the translation includes the translation of the target text into the input language, or the translation of the target text into the target language specified by the user.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; sending the input and the image to a server, so that the server finds the target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; searching for a target image in the image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  • the user's input includes at least one of the following: text, audio, video, and image
  • the method further includes: acquiring an input image based on the input.
  • the step of searching for a target image in the image that matches the input includes: acquiring image features of the image and the input image; and performing image matching between the image and the input image based on the image features.
  • the step of displaying the auxiliary information includes: highlighting the auxiliary information.
  • the method further includes: for the searched target image, acquiring the display area of the target image in the image.
  • the step of displaying or highlighting the auxiliary information includes: configuring the display area of the auxiliary information based on the display area of the target image, and the display area of the auxiliary information covers the display area of the target image.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; sending the input and the image to a server, so that the server finds the target image in the image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  • an input processing method, including: receiving a user's input; obtaining a source object related to the input; searching for a target object in the source object that matches the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  • the user's input includes at least one of the following: text, audio, video, and image
  • the source object includes at least one of the following: text, image, and video.
  • the target object is a target text or a target image.
  • an input processing method, including: receiving a user's input; obtaining a source object related to the input; sending the input and the source object to a server, so that the server finds the target object in the source object that matches the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  • an input processing device, including: an interaction module adapted to receive a user's input; an image acquisition module adapted to acquire an image related to the input; and a text matching module adapted to search for the target text in the image that matches the input; wherein the interaction module is further adapted to, for the found target text, highlight the target text in the image and/or display auxiliary information of the target text.
  • an input processing device, including: an interaction module adapted to receive a user's input; an image acquisition module adapted to acquire an image related to the input; and an image matching module adapted to search for the target image in the image that matches the input; wherein the interaction module is further adapted to, for the found target image, highlight the target image in the image and/or display auxiliary information of the target image.
  • a computing device, including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing the input processing method according to the embodiment of the present invention.
  • a computer-readable storage medium storing one or more programs.
  • the one or more programs include instructions that, when executed by a computing device, cause the computing device to execute the input processing method according to the embodiment of the present invention.
  • according to the input processing solution of the embodiments of the present invention, after receiving the user's input and the source object (for example, an image) related to the input, the target object (for example, target text or a target image) that the user pays attention to is highlighted in the source object for the user, and/or the auxiliary information of the target object is displayed, which avoids the tedious operation of reading through the source object to find the target object the user pays attention to and improves the user experience.
  • moreover, the input processing solution according to the embodiments of the present invention only needs to obtain the auxiliary information of the target object included in the source object, which greatly reduces the workload.
  • Fig. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention
  • FIG. 2 shows a structural block diagram of an input processing device 200 according to an embodiment of the present invention
  • Fig. 3 shows a schematic diagram of an input-related image according to an embodiment of the present invention
  • FIGS. 4A-4C respectively show screenshots of a user interface for displaying the translation of target text in an image according to an embodiment of the present invention.
  • FIG. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention
  • FIG. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention
  • FIG. 7 shows a structural block diagram of an input processing device 700 according to an embodiment of the present invention.
  • FIG. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention.
  • FIG. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention.
  • an embodiment of the present invention discloses an input processing device, which can receive a user's input, obtain an image that is related to the input and carries a large amount of information, and highlight the target object that the user pays attention to in the image and/or display auxiliary information of the target object, so that the user can quickly locate the target object and/or obtain its auxiliary information.
  • Fig. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention.
  • the computing device 100 is an electronic device capable of collecting and/or displaying images, such as a personal computer, a mobile communication device (for example, a smart phone), a tablet computer, and other devices that can collect and/or display images.
  • the computing device 100 may include a memory interface 102, one or more processors 104, and a peripheral interface 106.
  • the memory interface 102, the one or more processors 104, and/or the peripheral interface 106 may be discrete components or integrated in one or more integrated circuits.
  • various elements may be coupled through one or more communication buses or signal lines. Sensors, devices, and subsystems can be coupled to the peripheral interface 106 to help achieve multiple functions.
  • the motion sensor 110, the light sensor 112, and the distance sensor 114 may be coupled to the peripheral interface 106 to facilitate functions such as orientation, lighting, and distance measurement.
  • Other sensors 116 can also be connected to the peripheral interface 106, such as a positioning system (such as a GPS receiver), a temperature sensor, a biometric sensor or other sensing devices, which can help implement related functions.
  • the camera subsystem 120 and the optical sensor 122 can be used to facilitate the realization of camera functions such as capturing images.
  • the camera subsystem 120 and the optical sensor 122 can be, for example, a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS).
  • the computing device 100 may help implement communication functions through one or more wireless communication subsystems 124, where the wireless communication subsystem 124 may include a radio frequency receiver and transmitter and/or an optical (for example, infrared) receiver and transmitter.
  • the specific design and implementation of the wireless communication subsystem 124 may depend on one or more communication networks supported by the computing device 100.
  • the computing device 100 may include a wireless communication subsystem 124 designed to support a GSM network, a GPRS network, an EDGE network, a Wi-Fi or WiMax network, and a BluetoothTM network.
  • the audio subsystem 126 may be coupled with the speaker 128 and the microphone 130 to help implement voice-enabled functions, such as voice recognition, voice reproduction, digital recording, and telephony functions.
  • the I/O subsystem 140 may include a display controller 142 and/or one or more other input controllers 144.
  • the display controller 142 may be coupled to the display 146.
  • the display 146 may be, for example, a liquid crystal display (LCD), a touch screen, or other types of displays.
  • the display 146 and the display controller 142 may use any one of a variety of touch sensing technologies to detect contact and its movement or interruption, where the sensing technologies include but are not limited to capacitive, resistive, infrared, and surface acoustic wave technologies.
  • One or more other input controllers 144 may be coupled to other input/control devices 148, such as one or more buttons, rocker switches, thumbwheels, infrared ports, USB ports, and/or pointing devices such as a stylus.
  • One or more of the buttons may include an up/down button for controlling the volume of the speaker 128 and/or the microphone 130.
  • the memory interface 102 may be coupled with the memory 150.
  • the memory 150 may include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (eg, NAND, NOR).
  • the memory 150 may store a program 154, and the program 154 runs on the operating system 152 stored in the memory 150.
  • the operating system 152 will be loaded from the memory 150 and executed by the processor 104.
  • when the program 154 is run, it is also loaded from the memory 150 and executed by the processor 104.
  • one of the programs is the input processing apparatus 200, 700 according to the embodiment of the present invention, and includes instructions configured to execute the input processing methods 500, 600, and 800 according to the embodiment of the present invention.
  • the following describes the input processing device 200 and the input processing methods 500 and 600 executed by it.
  • the input-related image displays multiple texts (also referred to as containing multiple texts), and the user wishes to locate the target text in the multiple texts displayed in the image.
  • the input processing device 200 may receive the user's input on a dish of interest, and obtain an image of a restaurant menu.
  • the image of the restaurant menu includes the logo text of multiple dishes, and the input processing device 200 can highlight the logo text of the dish that the user is interested in in the image, so that the user can quickly locate the dish of interest.
  • the evaluation or introduction of the dish can also be displayed for the user's reference.
  • the input processing device 200 may receive a user's input to a destination subway station, and obtain an image of a subway line.
  • the image of the subway line includes the logo text of multiple subway stations, and the input processing device 200 can highlight the logo text of the target subway station in the image, so that the user can quickly locate the target subway station.
  • the translation of the destination subway station can also be displayed for the user's reference.
  • the input processing device 200 may receive a user's input on a commodity that he wants to purchase, and obtain an image of a store shelf.
  • the image of the store shelf includes the logo text of multiple commodities, and the input processing device 200 can highlight the logo text of the commodity that the user wants to purchase in the image, so that the user can quickly locate the commodity.
  • the evaluation, introduction or reference price of the product can also be displayed for the user's reference.
  • Fig. 2 shows a schematic diagram of an input processing device 200 according to an embodiment of the present invention.
  • the input processing apparatus 200 includes an interaction module 210, which can receive a user's input, and the input can indicate a target text that the user pays attention to.
  • the interaction module 210 may receive user input via a user interface (which will be described in detail later).
  • the user's input can be in various forms, for example, it can include but is not limited to one of the following: text, image, video, and audio.
  • the interaction module 210 may send the input to the recognition module 220, and the recognition module 220 obtains text based on the user's input.
  • any image recognition technology and voice recognition technology can be used to obtain the text, which is not limited in the present invention.
  • herein, the text input by the user, the text obtained based on an image input by the user, the text obtained based on a video input by the user, and the text obtained based on audio input by the user are collectively referred to as input text.
  • the input text can be one or multiple input texts.
  • multiple input texts can be distinguished based on separators such as punctuation marks.
  • for example, the user's input is "Chaoyangmen; Dongqiao; Jintai Road", that is, the following three input texts separated by semicolons: "Chaoyangmen", "Dongqiao", and "Jintai Road" (a splitting sketch is given below).
  • if the user inputs an image, multiple input texts obtained based on the image can be distinguished according to the different text blocks of the input text in the image (which will be described in detail later).
  • if the user inputs audio, multiple input texts obtained based on the audio can be distinguished by pauses or by speaking a separator word such as "interval".
  • if the user inputs a video, since the video includes images and audio, processing can be performed with reference to the cases of image input and/or audio input.
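  • As an illustration only (a minimal sketch that is not part of the original disclosure; the separator set and the function name are assumptions), splitting a raw text input into multiple items of input text could look like this:

```python
import re

def split_input_texts(raw_input: str) -> list:
    """Split a raw text input into individual input texts using common
    punctuation separators (Latin and CJK semicolons/commas assumed here)."""
    parts = re.split(r"[;,；，、]+", raw_input)
    return [p.strip() for p in parts if p.strip()]

# Example from the description: three input texts separated by semicolons.
print(split_input_texts("Chaoyangmen; Dongqiao; Jintai Road"))
# -> ['Chaoyangmen', 'Dongqiao', 'Jintai Road']
```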
  • the text matching module 230 in the input processing device 200 may search for a target text matching the input in an image related to the input.
  • the input processing device 200 further includes an image acquisition module 240, which can acquire an image related to the user's input.
  • the camera of the computing device 100 can be used to collect related images, or the related images sent to the input processing device 200 can be received via the network, and the source of the related images is not limited in the present invention.
  • the image related to the input may include multiple text blocks.
  • the input-related image displays multiple texts; the image block corresponding to each text in the image is a text block, and the text contained in each text block is one item of text.
  • Fig. 3 shows a schematic diagram of an input-related image according to an embodiment of the present invention.
  • the image is an image of a subway line and includes the logo text of many subway stations (that is, the names of the subway stations).
  • the image block occupied by the name of each subway station in the image is a text block, and the name of the subway station displayed in each text block is a text.
  • the input-related images can be in various formats, for example, bitmap image formats such as JPEG, BMP, PNG, etc., and vector graphics formats such as SVG and SWF.
  • the present invention does not limit the format of the image.
  • in some cases, the image acquisition module 240 can directly acquire the text contained in the image (without image recognition).
  • in other cases, the image acquisition module 240 needs to send the image to the recognition module 220, and the recognition module 220 performs image recognition.
  • the recognition module 220 can obtain the text contained in the image, that is, obtain multiple texts contained in multiple text blocks.
  • the recognition module 220 may use optical character recognition (OCR, Optical Character Recognition) technology to analyze the image and recognize text in the image.
  • the recognition module 220 can also detect texts in multiple different languages.
  • the recognition module 220 may include an OCR engine capable of recognizing text in multiple languages, or an OCR engine for each of multiple different languages.
  • other image recognition technologies can also be used to recognize the text in the image, which is not limited in the present invention.
  • the recognition module 220 may also detect the display information of the text in the image.
  • the display information includes but is not limited to the display area and/or display style of the text in the image.
  • the display area indicates the position of the text in the image, such as the coordinates of the text block where the text is located.
  • the display style can include text color, background color, font size, font type, and so on. In some embodiments, this display information can be used to identify different text blocks in the image. For example, in the case where two parts of text have different font colors, different background colors, or are separated from each other (for example, separated by at least a threshold distance), the recognition module 220 may determine that the two parts of text in the image are two texts contained in two different text blocks.
  • the recognition module 220 can also use natural language processing (NLP) technology to correct errors generated in image recognition, such as segmentation errors, text errors, grammatical errors, and so on.
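  • For concreteness, recognizing the texts contained in an image together with their display areas could look like the following minimal sketch. It assumes the Tesseract engine accessed through the pytesseract library purely as an example; the embodiment does not prescribe a particular OCR engine, and the function name and confidence filter are assumptions.

```python
import pytesseract
from pytesseract import Output
from PIL import Image

def recognize_text_blocks(image_path: str, lang: str = "eng") -> list:
    """Return (text, bounding_box) pairs for text detected in the image.
    The bounding box (left, top, right, bottom) serves as the display area."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, lang=lang, output_type=Output.DICT)
    blocks = []
    for text, left, top, width, height, conf in zip(
            data["text"], data["left"], data["top"],
            data["width"], data["height"], data["conf"]):
        if text.strip() and float(conf) > 0:  # keep confident, non-empty detections
            blocks.append((text.strip(), (left, top, left + width, top + height)))
    return blocks
```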
  • the image acquisition module 240 sends the text contained in the image to the text matching module 230.
  • the text matching module 230 may perform text matching between the text contained in the image and the input text to find a target text in the image that matches the input text. Specifically, for each item of input text, the text contained in the image is matched against that item of input text to find the target text in the image that matches it.
  • the text matching module 230 can determine whether the input text and the text in the image acquired by the image acquisition module 240 use the same language.
  • in some cases, the language of the text in the image acquired by the image acquisition module 240 is the same as the language of the input text. That is, if the language of the text in the image is called the source language and the language of the input text is called the input language, the source language and the input language are the same language, and the text matching module 230 may directly find the target text that matches the input text among the texts contained in the image, for example, by directly searching, among the multiple texts contained in the image, for a target text that includes the item of input text or at least a part of it.
  • the language of the text in the image acquired by the image acquisition module 240 and the language of the input text are two different languages. That is, the source language and the input language are two different languages.
  • the text matching module 230 needs to first translate the text obtained from the image into the input language, or translate the input text into the source language.
  • the text matching module 230 may send the text to be translated to the translation engine 250 for translation.
  • the text matching module 230 performs text matching between the text contained in the image and the translation of the input text into the source language, so as to find the target text that matches the input text in the text contained in the image.
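  • A minimal sketch of this matching step follows. The function name and the `translate_to_source` callable standing in for the translation engine 250 are assumptions, and simple substring matching is used only as one possible matching criterion, not the prescribed one.

```python
def find_target_texts(image_texts, input_texts, translate_to_source=None):
    """For each item of input text, collect the image texts that contain it.
    If the input language differs from the image's source language,
    `translate_to_source` translates the query before matching."""
    matches = {}
    for query in input_texts:
        needle = translate_to_source(query) if translate_to_source else query
        matches[query] = [t for t in image_texts if needle.lower() in t.lower()]
    return matches

# e.g. find_target_texts(["King's Cross St Pancras", "Angel"], ["King's Cross"])
# -> {"King's Cross": ["King's Cross St Pancras"]}
```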
  • the user interface can be used to prompt the user. If the target text in the image that matches the item of input text is found, the vibration, ringing, blinking signal light, and/or user interface of the computing device 100 can be used to prompt the user.
  • the text matching module 230 may prominently display, in the image, the target text that matches the input text.
  • for example, various marks such as borders (for example, rectangular boxes), shapes (for example, arrows), or line segments (for example, wavy lines) may be added to the target text in the image so that the user can quickly locate it; the target text may also be highlighted in the text block where it is located in the image (for example, by covering the text block; refer to the description of highlighting auxiliary information below).
  • the text matching module 230 may obtain auxiliary information of the target text that matches the input text. It should be noted that the embodiment of the present invention does not limit the specific content of the auxiliary information, and any information related to the target text that can assist the user is within the protection scope of the present invention.
  • the auxiliary information may be information such as reviews, introductions, reference prices, purchase channels, etc.
  • the text matching module 230 may obtain such auxiliary information from various search engines, review websites, or shopping platforms.
  • the auxiliary information may be translation.
  • the text matching module 230 may send the target text matching the input text to the translation engine 250 to obtain the translation of the target text.
  • the translation engine 250 can translate the target text into different languages.
  • the translation engine 250 may translate the text into the target language specified by the user, or into the default language of the computing device 100 (for example, when the target language is not specified), or into the input language of the input text (for example, when the target language is not specified and the input language differs from the default language).
  • the user can use the user interface to specify the target language (will be described in detail later).
  • the interaction module 210 may display or highlight the auxiliary information using a user interface displaying an image related to the input.
  • the interaction module 210 may obtain the display information of the target text in the image, and display or highlight the auxiliary information of the target text based on the display information.
  • the display area of the auxiliary information can be configured based on the display area of the target text.
  • the display area of the auxiliary information can cover the corresponding text block of the target text in the image, and can also be close to the corresponding text block of the target text in the image (for example, displayed around the corresponding text block).
  • the display style of the auxiliary information can be configured based on the display style of the target text.
  • at least part of the display style of the auxiliary information may be configured to be consistent with the corresponding display style of the target text (for example, the font size and font type configuration are consistent).
  • at least part of the display style of the auxiliary information may be configured to be significantly different from the corresponding display style of the target text.
  • the background color (or text color) of the auxiliary information can be configured as a bright color or a contrast color of the background color of the text contained in the image to highlight the auxiliary information.
  • display styles such as underline, bold, italic, text background, text shading, and text border may be used to highlight the auxiliary information.
  • the embodiment of the present invention does not limit the specific display style used to display or highlight the auxiliary information.
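  • As a sketch of how auxiliary information might cover the display area of the target text, the following uses the Pillow library purely as an illustration; the colors, font, and function name are assumptions rather than a prescribed display style.

```python
from PIL import Image, ImageDraw, ImageFont

def overlay_auxiliary_info(image_path, box, auxiliary_text, out_path):
    """Cover the target text's display area (box = left, top, right, bottom)
    with an overlay and draw the auxiliary information (e.g. a translation)
    inside it, so the overlay occupies roughly the same area as the text block."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    draw.rectangle(box, fill="white", outline="blue", width=2)  # cover the text block
    font = ImageFont.load_default()                             # placeholder font choice
    draw.text((box[0] + 4, box[1] + 4), auxiliary_text, fill="blue", font=font)
    image.save(out_path)
```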
  • the mark of the target text matching the same input text in the image or the display style of its auxiliary information can be the same.
  • the mark of the target text or the display style of its auxiliary information that matches the input text of different items can be different.
  • the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copy and paste.
  • the interaction module 210 may also receive the user's zoom instruction or selection instruction on the displayed auxiliary information, and in response to that instruction, zoom in or out of the displayed auxiliary information accordingly to facilitate the user's reading.
  • the interaction module 210 may also receive a save instruction from the user, and in response to the save instruction, store the above-mentioned image displaying the auxiliary information in the computing device 100, so as to facilitate subsequent viewing by the user.
  • the interaction module 210 may also receive a user's selection instruction for other text blocks that do not display auxiliary information, and in response to the user's selection instruction for other text blocks that do not display auxiliary information, obtain and display The auxiliary information of the text contained in the selected text block.
  • the specific display configuration has been described in detail in the previous section, and will not be repeated here.
  • FIGS. 4A to 4C show screenshots of a user interface for displaying the translation of target text in an image according to an embodiment of the present invention.
  • the user interface 410 displays an image 411, and the image 411 may be collected in response to the user's selection of the camera button 412.
  • the image 411 includes a plurality of text blocks, and each text block includes one item of text.
  • each text block included in the image 411 includes an English text.
  • the text block 416 includes an English text "King's Cross St Pancras".
  • the user interface 410 may also enable the user to select a language for translation. Specifically, the user interface 410 enables the user to select the source language 413, so that multiple texts in the source language will be recognized in the image 411. The user interface 410 also enables the user to select the target language 414 into which the text will be translated. In the example of FIG. 4A, the user has selected the source language 413 as English and the target language 414 as Chinese. That is, the user wants to translate the English text recognized in the image 411 into Chinese text.
  • the user interface 410 may also enable the user to input.
  • the user interface includes an input button 415.
  • the user interface 420 as shown in FIG. 4B may be displayed.
  • the user interface 420 is similar to the user interface 410 and includes an image 411 and an input button 415, and also includes a control 421.
  • the control 421 can receive the user's input, and can also prompt the user to provide input in the target language 414. For example, input text in the target language 414, or an image/audio containing input text in the target language 414, may be entered. In the example of FIG. 4B, the control 421 prompts the user to input Chinese input text. After the user inputs, in response to the control 421 receiving the user's input, a user interface 430 as shown in FIG. 4C may be displayed.
  • the user interface 430 is similar to the user interface 410 and includes an image 411 and an input button 415, and also includes a control 431.
  • the control 431 can display one or more items of input text obtained from the user's input, and can also display the search result of at least one item of text that matches in the image 411 based on the one or more items of input text.
  • the control 431 displays an input text "King's Cross" and the search result of the input text "A King's Cross has been found".
  • the user interface 430 also displays an overlay 432 on top of the image 411.
  • the overlay 432 displays the translation 433 of the target text matching the user's input text in the image 411, and overlays the text block corresponding to the matching target text.
  • since the text "King's Cross St Pancras" contained in the text block 416 in the image 411 matches the user's input text "King's Cross", the overlay 432 displays the translation 433 of the text "King's Cross St Pancras" and covers the corresponding text block 416.
  • the user interface 430 may configure the display style of the translation of the target text based on the display style of the matched target text.
  • the background color of the translation 433 and that of the matching target text "King's Cross St Pancras" are similar (for example, both are white), and their text colors are similar (for example, both are blue), which helps the translation 433 to be displayed in coordination with the image 411.
  • the user interface 430 may also highlight the translation.
  • the background color of the text of the translation 433 is a bright color (for example, bright blue), which helps to highlight the translation 433.
  • the user can quickly and easily locate the target text of concern or interest in the image 411 and obtain its translation, without passively reading the translations of all the text in the image. For example, after capturing the image 411 and inputting "King's Cross" through the user interface 420, the user can easily locate the "King's Cross" subway station through the user interface 430 and see its translation.
  • the user interface 430 may also enable the user to download and store the image 411 with the translation 433 displayed.
  • the user interface 430 includes a download button 434.
  • the image 411 displaying the translation 433 can be downloaded and stored for the user to view later.
  • FIG. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention.
  • the input processing method 500 can be executed in the input processing device 200.
  • the input processing method 500 starts at step S510.
  • in step S510, the user's input is received.
  • in step S520, an image related to the user's input is acquired.
  • the user's input may include at least one of the following: text, image, video, and audio.
  • the input text may be obtained based on the user's input.
  • in step S530, the target text in the image that matches the user's input is searched for.
  • the text contained in the image can be obtained, and the text contained in the image can be matched with the input text to find the matching target text.
  • in the case where the language of the input text is different from the language of the text contained in the image, the translation of the input text into the language of the text contained in the image can be obtained, and the text contained in the image can be matched against the translation of the input text.
  • for the found target text, the target text may be highlighted in the input-related image, and/or auxiliary information of the target text may be displayed. In addition, the auxiliary information can also be highlighted.
  • the display information of the target text in the above image can be acquired, and the auxiliary information can be displayed or highlighted based on the display information.
  • the display information may at least include the display area and/or display style of the target text.
  • the display area of the auxiliary information may be configured based on the display area of the target text.
  • the display area of the auxiliary information may cover or be close to the display area of the target text.
  • the display style of the auxiliary information can also be configured based on the display style of the target text.
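  • Composing the helper sketches above (split_input_texts and recognize_text_blocks), one hypothetical end-to-end flow of method 500 (and, with translation, method 600) could look like the following. The `to_source` and `to_target` callables stand in for the translation engine and are assumptions, not a prescribed API, and the red highlight is only an illustrative display style.

```python
from PIL import Image, ImageDraw

def process_input(user_input, image_path, out_path, to_source=None, to_target=None):
    """Sketch of method 500/600: split the input (S510), recognize the texts in
    the related image acquired in S520, match them against the input texts (S530),
    then highlight the found target texts and draw their auxiliary information."""
    input_texts = split_input_texts(user_input)     # helper sketched earlier
    blocks = recognize_text_blocks(image_path)      # helper sketched earlier
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for query in input_texts:
        needle = (to_source(query) if to_source else query).lower()
        for text, box in blocks:
            if needle in text.lower():
                draw.rectangle(box, outline="red", width=3)       # highlight target text
                aux = to_target(text) if to_target else text      # auxiliary information
                draw.text((box[0], box[3] + 2), aux, fill="red")  # shown below the block
    image.save(out_path)
```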
  • Fig. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention.
  • the input processing method 600 can be executed in the input processing device 200. As shown in FIG. 6, the input processing method 600 starts at step S610.
  • in step S610, a user's input is received.
  • in step S620, an image related to the user's input is acquired.
  • the input text may also be obtained based on the input.
  • in step S630, the target text in the image that matches the user's input is searched for.
  • the text contained in the image can be obtained.
  • the input text is expressed in the input language
  • the text contained in the image is expressed in a source language different from the input language. Therefore, it is necessary to obtain the translation of the input text into the source language, and then perform text matching between the text contained in the image and the translation of the input text to find the matching target text.
  • the translation may include the translation of the target text into the input language, or the translation of the target text into the target language specified by the user.
  • the target text may be highlighted in the image, and/or the translation of the target text may be displayed.
  • although the input processing apparatus 200 is illustrated as including an interaction module 210, a recognition module 220, a text matching module 230, an image acquisition module 240, and a translation engine 250, one or more of these modules may be stored on and/or executed by other devices, such as a server that communicates with the input processing apparatus 200 (for example, a server that can perform image recognition, voice recognition, text matching, and language translation).
  • the input processing device 200 may receive user input, obtain an image related to the input, and send the input and the image to the server. The server searches for the target text that matches the input in the image, and returns the search result to the input processing device 200.
  • the input processing device 200 then highlights the target text found by the server in the image, and/or displays auxiliary information of the target text.
  • auxiliary information of the target text please refer to the relevant description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1 to 5, and will not be repeated here.
  • the following describes the input processing device 700 and the input processing method 800 executed by it.
  • the image related to the input may be considered to include multiple image blocks.
  • the user wants to locate the image block he focuses on among the multiple image blocks included in the image, that is, the target image.
  • the input processing device 700 may receive the user's input of a commodity that he wants to purchase, and obtain an image of a store shelf.
  • the image of the store shelf includes logo images of multiple commodities, and the input processing device 700 can highlight the logo images of the commodities that the user wants to purchase in the images, so that the user can quickly locate the commodities.
  • the evaluation, introduction or reference price of the product can also be displayed for the user's reference.
  • FIG. 7 shows a schematic diagram of an input processing device 700 according to an embodiment of the present invention.
  • the input processing device 700 includes an interaction module 710, an image acquisition module 720, and an image matching module 730.
  • the interaction module 710 may receive a user's input, and the input may indicate a target image that the user pays attention to.
  • the image acquisition module 720 may acquire an image related to the user's input.
  • the image matching module 730 is coupled to the interaction module 710 and the image acquisition module 720, and can search for a target image matching the user's input in the above-mentioned image. For the found target image, the interaction module 710 may highlight the target image in the above-mentioned image, and/or display auxiliary information of the target image.
  • FIG. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention. As shown in FIG. 8, the input processing method 800 is executed in the input processing device 700 and starts at step S810.
  • a user's input may be received.
  • a user's input may be received via a user interface, and the input may indicate a target image that the user focuses on.
  • the user's input can be in various forms; for example, it can include but is not limited to one of the following: text, image, video, and audio. If the user's input includes text, audio, or video, the input image needs to be obtained based on the user's input. Specifically, if the user inputs an image, that image is the input image. If the user inputs text, the input image can be obtained based on the text (for example, through a search engine). If the user inputs audio, text can be obtained based on the audio, and then the input image can be obtained based on that text. For example, if the user inputs the text "X doll" or speaks the words "X doll", an input image of the "X doll" can be obtained based on the input. If the user inputs a video, since the video includes images and audio, the input image can be obtained with reference to the cases of image input and audio input.
  • there can be one or more input images.
  • if the user inputs images, the multiple input images can naturally be distinguished from one another.
  • if the user inputs text, multiple texts can be obtained first (distinguished by separators such as punctuation marks), and then each input image is obtained based on each text.
  • if the user inputs audio, multiple texts can be obtained based on the audio first (texts obtained from the audio can be distinguished by pauses or by speaking a separator word such as "interval"), and then each input image is obtained based on each text.
  • if the user inputs a video, since the video includes continuous images and audio, it can be handled with reference to the cases of continuous image input and/or audio input.
  • an image related to the user's input may be acquired.
  • the camera of the computing device 100 can be used to collect related images, or the related images sent to the input processing device 700 can be received via the network, and the source of the related images is not limited in the present invention.
  • in step S830, a target image in the image that matches the user's input can be searched for.
  • specifically, for each input image, a target image matching that input image is searched for in the above-mentioned image.
  • the image features of the image and the input image(s) can be acquired first, and then, based on the acquired image features, the image can be matched with the input image so as to find the target image in the image that matches the input image.
  • any image feature extraction technology can be used to obtain image features
  • any image matching technology can be used to perform image matching, which is not limited in the present invention.
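  • As one concrete (but not prescribed) choice of feature extraction and matching, ORB keypoints with brute-force Hamming matching via OpenCV could be used; the function name and threshold values below are illustrative assumptions.

```python
import cv2

def input_image_matches(scene_path, query_path, min_good_matches=20):
    """Decide whether the input image (query) matches content in the acquired
    image (scene) using ORB features and brute-force Hamming matching."""
    scene = cv2.imread(scene_path, cv2.IMREAD_GRAYSCALE)
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    _, scene_desc = orb.detectAndCompute(scene, None)
    _, query_desc = orb.detectAndCompute(query, None)
    if scene_desc is None or query_desc is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(query_desc, scene_desc)
    good = [m for m in matches if m.distance < 50]  # crude distance threshold
    return len(good) >= min_good_matches
```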
  • the user interface can be used to prompt the user. If a target image in the image that matches the input image of the item is found, the vibration, ringing, blinking of the signal light and/or the user interface of the computing device 100 can be used to prompt the user.
  • the target image matching the input image and/or auxiliary information of the target image may be displayed prominently in the above-mentioned image.
  • the auxiliary information may be information such as introduction, evaluation, reference price, etc., which is not limited in the present invention.
  • auxiliary information related to the target image can be obtained from various search engines, review websites, or shopping platforms.
  • various marks such as borders (for example, rectangular frames), shapes (for example, arrows), line segments (for example, wavy lines), etc. may be added to the target image in the image, so as to highlight the target image. In this way, the user can quickly locate the target image without having to search one by one.
  • the display area of the target image in the above-mentioned image may be obtained, and the display area indicates the position of the target image in the image.
  • the display area of the auxiliary information is configured based on the display area of the target image.
  • the display area of the auxiliary information may cover or be close to the display area of the target image.
  • the display style of the auxiliary information can also be configured based on the display style of the text contained in the image.
  • at least part of the display style of the auxiliary information may be configured to be consistent with the corresponding display style of the text in the image (for example, the font size and font type configuration are consistent).
  • the display style of the auxiliary information may be configured to be significantly different from the corresponding display style of the text contained in the image.
  • the background color (or text color) of the auxiliary information can be configured as a bright color or a contrast color of the background color of the text contained in the image to highlight the auxiliary information.
  • display styles such as underline, bold, italic, text background, text shading, and text border may be used to highlight the auxiliary information.
  • the embodiment of the present invention does not limit the specific display style used to display or highlight the auxiliary information.
  • the marks and/or the display styles of the auxiliary information of target images matching the same input image in the above-mentioned image can be the same.
  • the marks and/or the display styles of the auxiliary information of target images matching different input images in the above-mentioned image may be different.
  • the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copy and paste.
  • it is also possible to receive the user's zoom or selection instruction on the displayed auxiliary information and zoom in or out of the auxiliary information accordingly to facilitate the user's reading. It is also possible to receive a save instruction from the user, and in response to the save instruction, store the above-mentioned image displaying the auxiliary information in the computing device 100 to facilitate subsequent viewing by the user.
  • the specific display configuration has been described in detail in the previous section, and will not be repeated here.
  • the input processing device 700 may receive user input, obtain an image related to the input, and send the input and the image to the server.
  • the server searches for a target image matching the input in the image, and returns the search result to the input processing device 700.
  • the input processing device 700 then highlights the target image found by the server in the image, and/or displays auxiliary information of the target image.
  • the embodiments of the present invention are not limited to locating target text or target images in images, but can also be extended to locating various target objects, such as target texts and target images, in various source objects such as text, images, and audio.
  • FIG. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention. As shown in FIG. 9, the input processing method 900 starts at step S910.
  • a user's input is received, and the user's input may include at least one of the following: text, audio, video, and image.
  • the source object may include at least one of the following: text, image, and video.
  • in step S930, a target object in the source object that matches the user's input can be searched for.
  • the target object may be highlighted in the source object, and/or auxiliary information of the target object may be displayed.
  • the source object may generally include multiple objects, for example, multiple pieces of text or multiple image blocks.
  • the target object may be an object that the user pays attention to among multiple objects included in the source object, for example, it may be a target text, a target image, or any other suitable objects.
  • the present invention does not limit the specific types of the target object and the source object; scenarios in which target text is searched for in an image, target text is searched for in text, a target image is searched for in an image, and so on, are all within the protection scope of the present invention.
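  • A hypothetical sketch of how method 900 might route different kinds of source objects to the matchers sketched earlier (the `source_kind` parameter and the routing logic are assumptions for illustration only, not part of the disclosed method):

```python
def find_target_objects(user_input, source_object, source_kind="image"):
    """Route to text matching or image matching depending on the kind of
    source object and the form of the user's input."""
    if source_kind == "text":
        # source object is a piece of text: match the input texts against it
        return find_target_texts([source_object], split_input_texts(user_input))
    if source_kind == "image" and isinstance(user_input, str):
        # text input against an image: recognize the image's texts, then match
        blocks = recognize_text_blocks(source_object)
        return find_target_texts([t for t, _ in blocks], split_input_texts(user_input))
    # image input against an image: feature-based matching
    return input_image_matches(source_object, user_input)
```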
  • one or more steps in the input processing method 900 may also be executed by other devices, such as a server that communicates with the input processing apparatus that executes the input processing method 900.
  • a user's input may be received, an image related to the input may be obtained, and the input and the image may be sent to the server.
  • the server searches for the target image that matches the input in the image, and returns the search result.
  • the input processing device that executes the input processing method 900 then highlights the target image found by the server in the image, and/or displays auxiliary information of the target image.
  • according to the input processing solution of the embodiments of the present invention, after receiving the user's input and the source object (for example, an image) related to the input, the target object (for example, target text or a target image) that the user pays attention to is highlighted in the source object for the user, and/or the auxiliary information of the target object is displayed, which avoids the tedious operation of reading through the source object to find the target object the user pays attention to and improves the user experience.
  • moreover, the input processing solution according to the embodiment of the present invention only needs to obtain the auxiliary information of the target object included in the source object, which greatly reduces the workload.
  • the various technologies described here can be implemented in hardware or software, or a combination of them. Therefore, the method and device of the embodiments of the present invention, or some aspects or parts thereof, may take the form of program code (i.e., instructions) embedded in a tangible medium, such as a removable hard disk, a USB flash drive, a floppy disk, a CD-ROM, or any other machine-readable storage medium, where, when the program is loaded into a machine such as a computer and executed by the machine, the machine becomes a device for practicing the embodiments of the present invention.
  • when the program code is executed on a programmable computer, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
  • the memory is configured to store program code; the processor is configured to execute the method of the embodiment of the present invention according to instructions in the program code stored in the memory.
  • readable media include readable storage media and communication media.
  • the readable storage medium stores information such as computer readable instructions, data structures, program modules, or other data.
  • Communication media generally embody computer-readable instructions, data structures, program modules or other data in modulated data signals such as carrier waves or other transmission mechanisms, and include any information delivery media. Combinations of any of the above are also included in the scope of readable media.
  • the algorithms and displays are not inherently related to any particular computer, virtual system or other equipment.
  • Various general-purpose systems can also be used with the examples of embodiments of the present invention. Based on the above description, the structure required to construct this type of system is obvious.
  • the embodiments of the present invention are not directed to any specific programming language. It should be understood that various programming languages can be used to implement the content of the embodiments of the present invention described herein, and the above description of specific languages is for the purpose of disclosing the best implementation of the embodiments of the present invention.
  • the modules, units, or components of the device in the examples disclosed herein can be arranged in the device as described in this embodiment, or alternatively can be located in one or more devices different from the device in this example.
  • the modules in the foregoing examples can be combined into one module or, in addition, can be divided into multiple sub-modules.
  • the modules, units, or components in the embodiments can be combined into one module, unit, or component, and in addition, they can be divided into multiple sub-modules, sub-units, or sub-components. Unless at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

Abstract

Disclosed are an input processing method and apparatus, and a computing device. The method comprises: receiving an input from a user; acquiring an image related to the input; searching the image for target text matching the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text. Further disclosed are a corresponding input processing apparatus, a computing device, and a storage medium.

Description

Input Processing Method, Apparatus, and Computing Device
Technical Field
The present invention relates to the field of computer technology, and in particular to an input processing method, apparatus, and computing device.
Background Art
With the development and popularization of computing devices such as mobile terminals and personal computers, users are increasingly accustomed to using computing devices to handle various daily tasks. For example, when faced with a large amount of information, a user may wish to use a computing device to quickly locate the part of the information that the user cares about, or to obtain auxiliary information for that part. A typical application scenario is travel in a region where a different language is spoken: faced with a large amount of information in an unfamiliar language, the user hopes to use a computing device to locate the part of the information that he or she cares about and to obtain a translation of that part.
In a conventional solution, a user can use a camera installed in a computing device to capture an image containing a large amount of information; the computing device recognizes all of the information in the image and displays auxiliary information for all of it to the user. However, this may be unfriendly to the user. For example, the image may be an image of a map that includes the names of many different places. A user interface that displays translations of all of the place names may be tiring for the user, because the user has to read them one by one to find the place he or she cares about or is interested in and to obtain the translation for that place. This process is time-consuming and laborious, which degrades the user experience.
Summary of the Invention
To this end, embodiments of the present invention provide an input processing method, apparatus, and computing device in an effort to solve, or at least alleviate, the above problems.
According to one aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; searching the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
Optionally, in the method according to the embodiments of the present invention, the user's input includes at least one of the following: text, image, video, and audio, and the method further comprises: obtaining input text based on the user's input.
Optionally, in the method according to the embodiments of the present invention, the step of searching the image for target text that matches the input comprises: obtaining the text contained in the image; and performing text matching between the text contained in the image and the input text.
Optionally, in the method according to the embodiments of the present invention, the step of performing text matching between the text contained in the image and the input text comprises: when the language of the input text differs from the language of the text contained in the image, obtaining a translation of the input text into the language of the text contained in the image; and performing text matching between the text contained in the image and the translation of the input text.
Optionally, in the method according to the embodiments of the present invention, the step of displaying the auxiliary information comprises: highlighting the auxiliary information.
Optionally, the method according to the embodiments of the present invention further comprises: for the found target text, obtaining display information of the target text in the image, the display information including at least a display region and/or a display style of the target text.
Optionally, in the method according to the embodiments of the present invention, the step of displaying or highlighting the auxiliary information comprises: configuring a display region of the auxiliary information based on the display region of the target text, the display region of the auxiliary information covering the display region of the target text.
Optionally, in the method according to the embodiments of the present invention, the step of displaying or highlighting the auxiliary information comprises: configuring a display style of the auxiliary information based on the display style of the target text.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; sending the input and the image to a server so that the server searches the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; searching the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
Optionally, the method according to the embodiments of the present invention further comprises: obtaining input text based on the input, the input text being expressed in an input language; and the step of searching the image for target text that matches the input comprises: obtaining the text contained in the image, the text contained in the image being expressed in a source language different from the input language; obtaining a translation of the input text into the source language; and performing text matching between the text contained in the image and the translation of the input text.
Optionally, in the method according to the embodiments of the present invention, the translation includes a translation of the target text into the input language, or a translation of the target text into a target language specified by the user.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; sending the input and the image to a server so that the server searches the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; searching the image for a target image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
Optionally, in the method according to the embodiments of the present invention, the user's input includes at least one of the following: text, audio, video, and image, and the method further comprises: obtaining an input image based on the input.
Optionally, in the method according to the embodiments of the present invention, the step of searching the image for a target image that matches the input comprises: obtaining image features of the image and of the input image; and performing image matching between the image and the input image based on the image features.
Optionally, in the method according to the embodiments of the present invention, the step of displaying the auxiliary information comprises: highlighting the auxiliary information.
Optionally, the method according to the embodiments of the present invention further comprises: for the found target image, obtaining a display region of the target image within the image.
Optionally, in the method according to the embodiments of the present invention, the step of displaying or highlighting the auxiliary information comprises: configuring a display region of the auxiliary information based on the display region of the target image, the display region of the auxiliary information covering the display region of the target image.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; sending the input and the image to a server so that the server searches the image for a target image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; obtaining a source object related to the input; searching the source object for a target object that matches the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
Optionally, in the method according to the embodiments of the present invention, the user's input includes at least one of the following: text, audio, video, and image, and the source object includes at least one of the following: text, image, and video.
Optionally, in the method according to the embodiments of the present invention, the target object is a target text or a target image.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; obtaining a source object related to the input; sending the input and the source object to a server so that the server searches the source object for a target object that matches the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
According to another aspect of the embodiments of the present invention, an input processing apparatus is provided, comprising: an interaction module adapted to receive an input from a user; an image acquisition module adapted to acquire an image related to the input; and a text matching module adapted to search the image for target text that matches the input; wherein the interaction module is further adapted to, for the found target text, highlight the target text in the image and/or display auxiliary information of the target text.
According to another aspect of the embodiments of the present invention, an input processing apparatus is provided, comprising: an interaction module adapted to receive an input from a user; an image acquisition module adapted to acquire an image related to the input; and an image matching module adapted to search the image for a target image that matches the input; wherein the interaction module is further adapted to, for the found target image, highlight the target image in the image and/or display auxiliary information of the target image.
According to another aspect of the embodiments of the present invention, a computing device is provided, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing the input processing methods according to the embodiments of the present invention.
According to yet another aspect of the embodiments of the present invention, a computer-readable storage medium storing one or more programs is provided. The one or more programs include instructions that, when executed by a computing device, cause the computing device to execute the input processing methods according to the embodiments of the present invention.
According to the input processing solutions of the embodiments of the present invention, by receiving the user's active input, the target object (for example, a target text or a target image) in the source object (for example, an image) that matches the input is highlighted for the user, and/or auxiliary information of the target object is displayed. This avoids the tedious operation of reading through the source object item by item to find the target object that the user cares about, and improves the user experience. Moreover, the input processing solutions of the embodiments of the present invention only need to obtain auxiliary information for the target object included in the source object, which greatly reduces the workload.
The above description is only an overview of the technical solutions of the embodiments of the present invention. In order to understand the technical means of the embodiments of the present invention more clearly, they can be implemented in accordance with the content of the specification; and in order to make the above and other objectives, features, and advantages of the embodiments of the present invention more apparent and easier to understand, specific implementations of the embodiments of the present invention are set forth below.
Brief Description of the Drawings
In order to achieve the above and related objects, certain illustrative aspects are described herein in conjunction with the following description and drawings. These aspects indicate various ways in which the principles disclosed herein can be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objectives, features, and advantages of the present disclosure will become more apparent by reading the following detailed description in conjunction with the accompanying drawings. Throughout this disclosure, the same reference numerals generally refer to the same parts or elements.
Fig. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention;
Fig. 2 shows a structural block diagram of an input processing apparatus 200 according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of an image related to an input according to an embodiment of the present invention;
Figs. 4A-4C show screenshots of a user interface for displaying translations of target text in an image according to an embodiment of the present invention;
Fig. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention;
Fig. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention;
Fig. 7 shows a structural block diagram of an input processing apparatus 700 according to an embodiment of the present invention;
Fig. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention; and
Fig. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention.
Detailed Description of Embodiments
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.
Understandably, when faced with a large amount of information, a user hopes to be able to quickly locate the part of the information that he or she cares about, and, when that part has been located, to also be shown corresponding auxiliary information (for example, a corresponding translation, review, or introduction). To this end, an embodiment of the present invention discloses an input processing apparatus that can receive a user's input, acquire an image that is related to the input and carries a large amount of information, and, by highlighting in the image the target object that the user cares about and/or displaying auxiliary information of the target object, enable the user to quickly locate the target object and/or obtain its auxiliary information.
The input processing apparatus according to the embodiments of the present invention can be implemented by the following computing device. Fig. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention. The computing device 100 is an electronic device capable of capturing and/or displaying images, such as a personal computer, a mobile communication device (for example, a smartphone), a tablet computer, or another device that can capture and/or display images.
As shown in Fig. 1, the computing device 100 may include a memory interface 102, one or more processors 104, and a peripheral interface 106. The memory interface 102, the one or more processors 104, and/or the peripheral interface 106 may be discrete components or may be integrated into one or more integrated circuits. In the computing device 100, the various components may be coupled by one or more communication buses or signal lines. Sensors, devices, and subsystems can be coupled to the peripheral interface 106 to help implement multiple functions.
For example, a motion sensor 110, a light sensor 112, and a distance sensor 114 may be coupled to the peripheral interface 106 to facilitate functions such as orientation, illumination, and ranging. Other sensors 116 can also be connected to the peripheral interface 106, such as a positioning system (for example, a GPS receiver), a temperature sensor, a biometric sensor, or another sensing device, to help implement related functions.
A camera subsystem 120 and an optical sensor 122 can be used to facilitate camera functions such as capturing images, where the camera subsystem 120 and the optical sensor 122 may employ, for example, a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) optical sensor.
The computing device 100 may help implement communication functions through one or more wireless communication subsystems 124, where the wireless communication subsystem 124 may include a radio-frequency receiver and transmitter and/or an optical (for example, infrared) receiver and transmitter. The specific design and implementation of the wireless communication subsystem 124 may depend on the one or more communication networks supported by the computing device 100. For example, the computing device 100 may include a wireless communication subsystem 124 designed to support a GSM network, a GPRS network, an EDGE network, a Wi-Fi or WiMax network, and a Bluetooth™ network.
An audio subsystem 126 may be coupled with a speaker 128 and a microphone 130 to help implement voice-enabled functions, such as voice recognition, voice reproduction, digital recording, and telephony functions.
To display images, an I/O subsystem 140 may include a display controller 142 and/or one or more other input controllers 144. The display controller 142 may be coupled to a display 146. The display 146 may be, for example, a liquid crystal display (LCD), a touch screen, or another type of display. In some implementations, the display 146 and the display controller 142 may use any of a variety of touch-sensing technologies to detect contact, movement, or pauses made with them, where the sensing technologies include, but are not limited to, capacitive, resistive, infrared, and surface acoustic wave technologies. The one or more other input controllers 144 may be coupled to other input/control devices 148, such as one or more buttons, rocker switches, thumb wheels, infrared ports, USB ports, and/or pointing devices such as a stylus. The one or more buttons (not shown) may include an up/down button for controlling the volume of the speaker 128 and/or the microphone 130.
The memory interface 102 may be coupled with a memory 150. The memory 150 may include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (for example, NAND or NOR).
The memory 150 may store programs 154 that run on an operating system 152 also stored in the memory 150. When the computing device runs, the operating system 152 is loaded from the memory 150 and executed by the processor 104. The programs 154, when running, are also loaded from the memory 150 and executed by the processor 104. One of the various programs 154 is the input processing apparatus 200 or 700 according to the embodiments of the present invention, and includes instructions configured to execute the input processing methods 500, 600, and 800 according to the embodiments of the present invention.
The input processing apparatus 200 and the input processing methods 500 and 600 that it executes are described below.
In some cases, the image related to the input displays multiple items of text (which can also be described as containing multiple items of text), and the user wishes to locate the target text among them. For example, the input processing apparatus 200 may receive the user's input for a dish of interest and acquire an image of a restaurant menu. The image of the restaurant menu includes the label text of multiple dishes, and the input processing apparatus 200 can highlight in the image the label text of the dish that the user is interested in, so that the user can quickly locate that dish. Furthermore, a review or introduction of the dish can also be displayed for the user's reference.
As another example, the input processing apparatus 200 may receive the user's input for a destination subway station and acquire an image of subway lines. The image of the subway lines includes the label text of multiple subway stations, and the input processing apparatus 200 can highlight the label text of the destination station in the image, so that the user can quickly locate the destination station. Furthermore, a translation of the destination station name can also be displayed for the user's reference.
As yet another example, the input processing apparatus 200 may receive the user's input for a product that he or she wants to buy and acquire an image of a store shelf. The image of the store shelf includes the label text of multiple products, and the input processing apparatus 200 can highlight in the image the label text of the product that the user wants to buy, so that the user can quickly locate that product. Furthermore, a review, an introduction, or a reference price of the product can also be displayed for the user's reference.
Fig. 2 shows a schematic diagram of an input processing apparatus 200 according to an embodiment of the present invention. As shown in Fig. 2, the input processing apparatus 200 includes an interaction module 210, which can receive an input from a user, the input indicating the target text that the user cares about. In some embodiments, the interaction module 210 may receive the user's input via a user interface (described in detail later). The user's input may take various forms and may include, but is not limited to, one of the following: text, image, video, and audio.
If the user's input includes an image, video, or audio, the interaction module 210 may send the input to a recognition module 220, which obtains text based on the user's input. Any image recognition technology or speech recognition technology may be used to obtain the text; the present invention places no restriction on this. For ease of description, text entered directly by the user, text obtained from an image input by the user, text obtained from a video input by the user, and text obtained from audio input by the user are all referred to herein as input text.
The input text may comprise one or more items of input text. When the user enters text directly, multiple items of input text can be distinguished by delimiters such as punctuation marks. For example, the user's input may be "朝阳门;东大桥;金台路", that is, the following three items of input text separated by semicolons: "朝阳门" (Chaoyangmen), "东大桥" (Dongdaqiao), and "金台路" (Jintai Road). When the user inputs an image, the multiple items of input text obtained from the image can be distinguished according to the different text blocks in which they appear in the image (described in detail later). When the user inputs audio, the multiple items of input text obtained from the audio can be distinguished by pauses or by speaking a separator word such as "interval". When the user inputs a video, since a video includes both images and audio, processing can proceed with reference to the image and/or audio cases.
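As a minimal illustration of how such delimiter-based splitting could work, the sketch below separates a raw text input into individual items of input text. The delimiter set and the function name are assumptions made for illustration only and are not part of the disclosed embodiments.

```python
import re

# Hypothetical helper: split a raw text input into individual input-text items.
# The delimiter set (Chinese and Western punctuation plus newlines) is an assumption.
def split_input_text(raw_input: str) -> list[str]:
    items = re.split(r"[;；,，、\n]+", raw_input)
    # Drop empty fragments and surrounding whitespace.
    return [item.strip() for item in items if item.strip()]

# Example: "朝阳门;东大桥;金台路" -> ["朝阳门", "东大桥", "金台路"]
print(split_input_text("朝阳门;东大桥;金台路"))
```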
After one or more items of input text have been obtained, the text matching module 230 in the input processing apparatus 200 can search the image related to the input for target text that matches the input.
As shown in Fig. 2, the input processing apparatus 200 further includes an image acquisition module 240, which can acquire an image related to the user's input. For example, the camera of the computing device 100 can be used to capture the related image, or the related image can be received over a network; the present invention places no restriction on the source of the related image.
The image related to the input may include multiple text blocks. As described above, the image displays multiple items of text; the image region corresponding to each item of text is a text block, and the text contained in each text block is one item of text.
Fig. 3 shows a schematic diagram of an image related to an input according to an embodiment of the present invention. The image is an image of subway lines and includes the label text of many subway stations (that is, the station names). The image region occupied by each station name is a text block, and the station name displayed in each text block is one item of text.
The image related to the input may be in various formats, for example bitmap image formats such as JPEG, BMP, and PNG, as well as vector graphics formats such as SVG and SWF. The present invention places no restriction on the image format.
For an image in a format such as SVG (Scalable Vector Graphics), the image acquisition module 240 can obtain the text contained in the image directly (without image recognition). For an image in a format such as JPEG, the image acquisition module 240 needs to send the image to the recognition module 220, which performs image recognition. The recognition module 220 can obtain the text contained in the image, that is, the multiple items of text contained in the multiple text blocks.
In some implementations, the recognition module 220 may use optical character recognition (OCR) technology to analyze the image and recognize the text in it. The recognition module 220 can also detect text in multiple different languages. For example, the recognition module 220 may include an OCR engine capable of recognizing text in multiple languages, or one OCR engine for each of multiple different languages. Of course, other image recognition technologies can also be used to recognize the text in the image; the present invention places no restriction on this.
The recognition module 220 may also detect display information of the text in the image, including but not limited to the display region and/or display style of the text in the image. The display region indicates the position of the text in the image, for example the coordinates of the text block in which the text is located. The display style may include the text color, background color, font size, font type, and so on. In some embodiments, this display information can be used to identify different text blocks in the image. For example, when two portions of text have different font colors, different background colors, or are separated from each other (for example, by at least a threshold distance), the recognition module 220 may determine that these two portions of text in the image are two items of text contained in two different text blocks.
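One possible way to obtain such per-block text together with its display region is sketched below using the open-source Tesseract engine via pytesseract. This is only an illustrative assumption, not the OCR engine of the embodiments, and the TextBlock structure and helper name are hypothetical.

```python
from dataclasses import dataclass
import pytesseract
from PIL import Image

@dataclass
class TextBlock:
    text: str                       # the item of text contained in the block
    box: tuple[int, int, int, int]  # display region: (left, top, width, height)

# Hypothetical helper: run OCR and group the recognized words into text blocks.
def extract_text_blocks(image_path: str, lang: str = "eng") -> list[TextBlock]:
    data = pytesseract.image_to_data(
        Image.open(image_path), lang=lang, output_type=pytesseract.Output.DICT
    )
    groups: dict[int, list[int]] = {}
    for i, word in enumerate(data["text"]):
        if word.strip():
            groups.setdefault(data["block_num"][i], []).append(i)
    blocks = []
    for indices in groups.values():
        left = min(data["left"][i] for i in indices)
        top = min(data["top"][i] for i in indices)
        right = max(data["left"][i] + data["width"][i] for i in indices)
        bottom = max(data["top"][i] + data["height"][i] for i in indices)
        text = " ".join(data["text"][i] for i in indices)
        blocks.append(TextBlock(text, (left, top, right - left, bottom - top)))
    return blocks
```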
In addition, according to other embodiments of the present invention, the recognition module 220 can also use natural language processing (NLP) technology to correct errors produced during image recognition, such as sentence-segmentation errors, character errors, and grammatical errors.
After obtaining the item(s) of text contained in the image, the image acquisition module 240 sends the text contained in the image to the text matching module 230. The text matching module 230 can perform text matching between the text contained in the image and the input text, so as to find target text in the image that matches the input text. For each item of input text, the text contained in the image is matched against that item of input text to find the target text in the image that matches it.
The process of finding target text that matches an item of input text is described below using one item of input text as an example. First, the text matching module 230 can determine whether the item of input text and the text in the image acquired by the image acquisition module 240 use the same language.
In some implementations, the language of the text in the image acquired by the image acquisition module 240 is the same as the language of the input text. That is, if the language of the text in the image is called the source language and the language of the input text is called the input language, the source language and the input language are the same language, and the text matching module 230 can directly search the text contained in the image for target text that matches the item of input text. For example, among the multiple items of text contained in the image, it can directly search for target text that includes the item of input text or includes at least a part of it.
In other implementations, the language of the text in the image acquired by the image acquisition module 240 and the language of the input text are two different languages. That is, the source language and the input language are two different languages. In this case, the text matching module 230 needs to first translate the text obtained from the image into the input language, or translate the item of input text into the source language.
The text matching module 230 can send the text to be translated to a translation engine 250 for translation. In some embodiments, one can choose to translate the text obtained from the image into the input language, but this requires translating every item of text in the image, which is a considerable amount of work. It is therefore preferable to translate the input text into the source language, that is, to send only the input text to the translation engine 250 for translation.
After translation, the text matching module 230 performs text matching between the text contained in the image and the translation of the item of input text into the source language, so as to find, among the text contained in the image, the target text that matches the item of input text. In some embodiments, if no target text matching the item of input text can be found in the image, the user interface can be used to prompt the user. If target text matching the item of input text is found in the image, vibration, a ring tone, a blinking indicator light of the computing device 100, and/or the user interface can be used to prompt the user.
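A minimal sketch of this matching step, assuming an external translate() callable and the TextBlock structure from the earlier sketch, might look as follows. The matching rule (substring containment after normalization) and all names are illustrative assumptions.

```python
from typing import Callable, Optional

# Hypothetical matcher: find the first text block whose text matches one item of
# input text. If the languages differ, only the input text is translated into the
# source language before matching, as described above.
def find_target_block(
    blocks: list,                               # list of TextBlock-like objects
    input_text: str,
    input_lang: str,
    source_lang: str,
    translate: Callable[[str, str, str], str],  # translate(text, from_lang, to_lang)
) -> Optional[object]:
    query = input_text
    if input_lang != source_lang:
        query = translate(input_text, input_lang, source_lang)
    query = query.casefold().strip()
    for block in blocks:
        candidate = block.text.casefold().strip()
        if not candidate or not query:
            continue
        # Match when the block contains the input text or at least a part of it.
        if query in candidate or candidate in query:
            return block
    return None  # the caller may prompt the user that nothing matched
```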
For each item of input text for which matching target text is found, in some implementations the text matching module 230 can cause the target text that matches that item of input text to be displayed prominently in the image. In some embodiments, the target text can be re-rendered prominently over the text block in which it is located in the image (for example, covering that text block); see the description of highlighting auxiliary information below.
In other embodiments, a marker such as a border (for example, a rectangular box), a shape (for example, an arrow), or a line segment (for example, a wavy line) can be added to the text block in which the target text is located in the image, thereby highlighting the target text. In this way, the user can quickly locate the target text without having to read and search item by item.
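For example, drawing a rectangular border around the matched text block could be done as in the sketch below with the Pillow library; the function name and styling choices are assumptions for illustration only.

```python
from PIL import Image, ImageDraw

# Hypothetical helper: draw a rectangular border around the matched text block
# so that the target text stands out in the image.
def highlight_block(image: Image.Image, box: tuple[int, int, int, int]) -> Image.Image:
    left, top, width, height = box
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    draw.rectangle([left, top, left + width, top + height], outline="red", width=3)
    return annotated
```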
In other implementations, for each item of input text for which matching target text is found, the text matching module 230 can obtain auxiliary information of the target text that matches that item of input text. It should be noted that the embodiments of the present invention place no restriction on the specific content of the auxiliary information; any information related to the target text that can assist the user falls within the protection scope of the present invention.
In some embodiments, the auxiliary information may be information such as reviews, introductions, reference prices, or purchase channels, and the text matching module 230 may obtain such auxiliary information from various search engines, review websites, or shopping platforms.
In other embodiments, the auxiliary information may be a translation. The text matching module 230 can send the target text that matches the item of input text to the translation engine 250 to obtain a translation of the target text. The translation engine 250 can translate the target text into different languages. For example, the translation engine 250 can translate the text into a target language specified by the user, or into the default language of the computing device 100 (for example, when no target language is specified), or into the input language of the input text (for example, when no target language is specified and the input language differs from the source language). The user can specify the target language using the user interface (described in detail later).
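The fallback order described above (a user-specified target language, then the device default, then the input language) could be expressed as a small helper like the following; it is only a sketch under those assumptions, and the function and parameter names are hypothetical.

```python
from typing import Optional

# Hypothetical helper: decide which language the target text should be translated
# into, following the fallback order described above.
def resolve_target_language(
    user_target: Optional[str],     # language explicitly chosen by the user, if any
    device_default: Optional[str],  # default language of the computing device
    input_lang: str,
    source_lang: str,
) -> str:
    if user_target:
        return user_target
    if device_default:
        return device_default
    # Fall back to the input language when it differs from the source language.
    if input_lang != source_lang:
        return input_lang
    return source_lang
```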
After obtaining the auxiliary information of the target text, the interaction module 210 can display, or prominently display, that auxiliary information in the user interface that displays the image related to the input.
Specifically, for the found target text, the interaction module 210 can obtain the display information of the target text in the image and, based on that display information, display or highlight the auxiliary information of the target text.
For example, the display region of the auxiliary information can be configured based on the display region of the target text. The display region of the auxiliary information can cover the text block corresponding to the target text in the image, or it can be placed near that text block (for example, displayed around it).
As another example, the display style of the auxiliary information can be configured based on the display style of the target text. In some cases, so that the displayed auxiliary information blends with the image, at least part of the display style of the auxiliary information can be configured to be consistent with the corresponding display style of the target text (for example, the same font size and font type). In other cases, in order to highlight the auxiliary information, at least part of its display style can be configured to differ markedly from the corresponding display style of the target text. For example, the background color (or text color) of the auxiliary information can be configured as a bright color or as a color contrasting with the background color of the text contained in the image, so as to highlight the auxiliary information. As another example, display styles such as underlining, bold, italics, text background color, text shading, and text borders can be used to highlight the auxiliary information. The embodiments of the present invention place no restriction on the specific display style used to display or highlight the auxiliary information.
In addition, since there may be multiple items of input text, in order to help the user distinguish the target texts that match different items of input text, the markers of target texts matching the same item of input text in the image, or the display styles of their auxiliary information, can be the same, while the markers of target texts matching different items of input text, or the display styles of their auxiliary information, can differ.
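As one possible rendering of these rules, the Pillow-based sketch below covers the target text block with an overlay and draws the auxiliary information (for example, a translation) on top of it. The colors, font handling, and function names are illustrative assumptions rather than the embodiments' actual rendering code.

```python
from PIL import Image, ImageDraw, ImageFont

# Hypothetical helper: cover the target text block with an overlay and draw the
# auxiliary information (e.g. a translation) in its place.
def overlay_auxiliary_info(
    image: Image.Image,
    box: tuple[int, int, int, int],  # display region of the target text
    auxiliary_text: str,
    background: str = "lightblue",   # bright color so the overlay stands out
    text_color: str = "blue",
) -> Image.Image:
    left, top, width, height = box
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    # The auxiliary information's display region covers the target text's region.
    draw.rectangle([left, top, left + width, top + height], fill=background)
    draw.text((left + 2, top + 2), auxiliary_text, fill=text_color,
              font=ImageFont.load_default())
    return annotated
```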
According to some embodiments of the present invention, the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copying and pasting.
According to other embodiments of the present invention, the interaction module 210 can also receive a zoom instruction or a selection instruction from the user with respect to the displayed auxiliary information and, in response, display the auxiliary information correspondingly reduced or enlarged, so as to facilitate the user's reading. The interaction module 210 can also receive a save instruction from the user and, in response to the save instruction, store the above image with the auxiliary information displayed on it in the computing device 100, so as to facilitate subsequent viewing by the user.
According to still other embodiments of the present invention, the interaction module 210 can also receive the user's selection instruction for another text block for which no auxiliary information is displayed and, in response to that selection instruction, obtain and display auxiliary information for the text contained in the selected text block. The specific display configuration has been described in detail above and is not repeated here.
The process in which the input processing solution according to an embodiment of the present invention locates the target text for the user and displays its translation is described below with reference to Figs. 4A to 4C.
Figs. 4A to 4C show screenshots of a user interface for displaying the translation of target text in an image according to an embodiment of the present invention. As shown in Fig. 4A, a user interface 410 displays an image 411, which may have been captured in response to the user's selection of a camera button 412. The image 411 includes multiple text blocks, each containing one item of text. In the example of Fig. 4A, every text block included in the image 411 contains one item of English text; for example, text block 416 contains the English text "King's Cross St Pancras".
The user interface 410 can also let the user select the languages used for translation. Specifically, the user interface 410 lets the user select a source language 413, so that the items of text in the source language will be recognized in the image 411. The user interface 410 also lets the user select the target language 414 into which the text will be translated. In the example of Fig. 4A, the user has selected English as the source language 413 and Chinese as the target language 414. That is, the user wants the English text recognized in the image 411 to be translated into Chinese.
The user interface 410 also lets the user provide an input. Specifically, the user interface includes an input button 415. When the user selects the input button 415, in response to that selection a user interface 420 as shown in Fig. 4B can be displayed.
The user interface 420 is similar to the user interface 410 in that it includes the image 411 and the input button 415, and it further includes a control 421. The control 421 can receive the user's input and can also prompt the user to provide the input in the target language 414. For example, the user may enter input text in the target language 414, or an image or audio containing input text in the target language 414. In the example of Fig. 4B, the control 421 prompts the user to enter Chinese input text. After the user provides the input, in response to the control 421 receiving the input, a user interface 430 as shown in Fig. 4C can be displayed.
The user interface 430 is similar to the user interface 410 in that it includes the image 411 and the input button 415, and it further includes a control 431. The control 431 can display the one or more items of input text obtained from the user's input, and can also display the result of searching the image 411, based on the one or more items of input text, for at least one matching item of text. In the example of Fig. 4C, the control 431 displays an item of input text, "国王十字" (King's Cross), and the search result for that input text, "已找到一处国王十字" ("One instance of King's Cross has been found").
The user interface 430 also displays an overlay 432 on top of the image 411. The overlay 432 displays the translation 433 of the target text in the image 411 that matches the user's input text, and covers the text block corresponding to the matching target text. In the example of Fig. 4C, the item of text "King's Cross St Pancras" contained in text block 416 of the image 411 matches the user's input text "国王十字" (King's Cross), and the overlay 432 displays the translation 433, "国王十字", of the text "King's Cross St Pancras" and covers the corresponding text block 416.
The user interface 430 can configure the display style of the translation of the target text based on the display style of the matching target text. In the example of Fig. 4C, the translation 433 has a background color similar to that of the matching target text "King's Cross St Pancras" (for example, both white) and a similar text color (for example, both blue), which helps display the translation 433 in harmony with the image 411. In addition, the user interface 430 can also display the translation prominently. In the example of Fig. 4C, the text background of the translation 433 is a bright color (for example, bright blue), which helps make the translation 433 stand out.
As shown in Figs. 4A-4C, the user can quickly and easily locate the target text in the image 411 that he or she cares about or is interested in and obtain its translation, without having to passively read the translations of all the text in the image. For example, after capturing the image 411 and entering "国王十字" (King's Cross) through the user interface 420, the user can easily locate the "King's Cross" subway station through the user interface 430 and see its translation.
The user interface 430 can also let the user download and store the image 411 with the translation 433 displayed on it. Specifically, the user interface 430 includes a download button 434. In response to the user's selection of the download button 434, the image 411 with the translation 433 displayed can be downloaded and stored for the user's subsequent viewing.
图5示出了根据本发明一个实施例的输入处理方法500的流程图。该输入方法500可以在输入处理装置200中执行。FIG. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention. The input method 500 can be executed in the input processing device 200.
如图5所示,输入处理方法500始于步骤S510。在步骤S510中,接收用户的输入。在步骤S520中,获取与用户的输入相关的图像。As shown in FIG. 5, the input processing method 500 starts at step S510. In step S510, the user's input is received. In step S520, an image related to the user's input is acquired.
在一些实施例中,用户的输入至少可以包括以下之一:文本、图像、视频和音频,在接收用户的输入之后,可以基于用户的输入获取输入文本。In some embodiments, the user's input may include at least one of the following: text, image, video, and audio. After receiving the user's input, the input text may be obtained based on the user's input.
在获取与用户的输入相关的图像之后,在步骤S530中,查找该图像中与用户的输入相匹配的目标文本。具体地,可以获取该图像包含的文本,再将该图像包含的文本与上述输入文本进行文本匹配,以查找相匹配的目标文本。其中,在输入文本的语言不同于该图像包含的文本的语言的情况下,还可以获取输入文本到该图像包含的文本的语言的翻译, 将该图像包含的文本与输入文本的翻译进行文本匹配。After acquiring the image related to the user's input, in step S530, search for target text in the image that matches the user's input. Specifically, the text contained in the image can be obtained, and the text contained in the image can be matched with the input text to find the matching target text. Wherein, when the language of the input text is different from the language of the text contained in the image, the translation of the input text into the language of the text contained in the image can also be obtained, and the text contained in the image can be matched with the translation of the input text. .
For the found target text, according to step S540, the target text may be highlighted in the image related to the input, and/or auxiliary information of the target text may be displayed. The auxiliary information may itself be displayed prominently.
According to some embodiments of the present invention, for the found target text, display information of the target text in the above-mentioned image may be obtained, and the auxiliary information may be displayed or highlighted based on that display information.
Specifically, the display information may include at least a display area and/or a display style of the target text. In some embodiments, the display area of the auxiliary information may be configured based on the display area of the target text; for example, the display area of the auxiliary information may cover or be close to the display area of the target text. The display style of the auxiliary information may also be configured based on the display style of the target text.
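A minimal sketch of such a configuration step is shown below, assuming the display information of the target text has been reduced to a bounding box plus a few style attributes; the field names are illustrative assumptions, not terms from the original disclosure.

```python
from dataclasses import dataclass

@dataclass
class TextDisplayInfo:
    x: int                  # top-left corner of the target text block in the image
    y: int
    width: int
    height: int
    font_size: int
    text_color: str
    background_color: str

def configure_auxiliary_display(target: TextDisplayInfo, cover: bool = True) -> dict:
    """Derive the display area and display style of the auxiliary information
    from the display information of the matched target text."""
    if cover:
        # Place the auxiliary information directly over the target text block.
        region = (target.x, target.y, target.width, target.height)
    else:
        # Or place it just below the target text block, i.e. "close to" it.
        region = (target.x, target.y + target.height + 4, target.width, target.height)
    return {
        "region": region,
        # Reuse the target text's style so the overlay blends with the image.
        "font_size": target.font_size,
        "text_color": target.text_color,
        "background_color": target.background_color,
    }

print(configure_auxiliary_display(TextDisplayInfo(120, 80, 200, 32, 14, "blue", "white")))
```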
For the detailed processing logic and implementation of each step in the input processing method 500, reference may be made to the foregoing description of the input processing apparatus 200 in conjunction with FIGS. 1-4C, which is not repeated here.
FIG. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention. The input processing method 600 may be executed in the input processing apparatus 200. As shown in FIG. 6, the input processing method 600 starts at step S610.
In step S610, a user's input is received. In step S620, an image related to the user's input is acquired. In some embodiments, after the user's input is received, input text may also be obtained based on the input.
Then, in step S630, the image is searched for target text that matches the user's input. Specifically, the text contained in the image may be obtained. Here, the input text is expressed in an input language and the text contained in the image is expressed in a source language different from the input language; therefore, a translation of the input text into the source language is obtained, and the text contained in the image is then text-matched against that translation to find the matching target text. The translation to be displayed may be a translation of the target text into the input language, or a translation of the target text into a target language specified by the user.
For the found target text, according to step S640, the target text may be highlighted in the image, and/or the translation of the target text may be displayed.
For the detailed processing logic and implementation of each step in the input processing method 600, reference may be made to the foregoing description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1-5, which is not repeated here.
Those skilled in the art should understand that although the input processing apparatus 200 is illustrated as including the interaction module 210, the recognition module 220, the text matching module 230, the image acquisition module 240, and the translation engine 250, one or more of these modules may be stored on and/or executed by another device, such as a server that communicates with the input processing apparatus 200 (for example, a server capable of image recognition, speech recognition, text matching, and language translation). In some embodiments, the input processing apparatus 200 may receive the user's input, acquire an image related to the input, and send the input and the image to the server. The server searches the image for target text that matches the input and returns the search result to the input processing apparatus 200. The input processing apparatus 200 then highlights, in the image, the target text found by the server, and/or displays auxiliary information of the target text. For the detailed processing logic and implementation of the server-side search for the target text, reference may be made to the foregoing description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1-5, which is not repeated here.
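The client side of such a split could look roughly like the following sketch, which forwards the input and the image to a matching server and returns the server's result; the endpoint URL and the JSON payload layout are assumptions made for illustration only.

```python
import base64
import json
from urllib import request

SERVER_URL = "http://example.com/match"   # hypothetical endpoint, for illustration only

def send_to_matching_server(user_input: str, image_bytes: bytes) -> dict:
    """Forward the user's input and the related image to a server that performs
    the matching, and return the server's search result (for example the matched
    target text and its display area)."""
    payload = json.dumps({
        "input": user_input,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }).encode("utf-8")
    req = request.Request(SERVER_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:        # network call; requires a reachable server
        return json.loads(resp.read().decode("utf-8"))
```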
The input processing apparatus 700 and the input processing method 800 executed by it are described below.
The image related to the input can be regarded as comprising multiple image blocks. In some cases, the user wants to locate, among the multiple image blocks included in the image, the image block of interest, that is, the target image. For example, the input processing apparatus 700 may receive the user's input describing a product the user wants to purchase and acquire an image of a store shelf. The image of the store shelf includes the logo images of multiple products, and the input processing apparatus 700 can highlight, in that image, the logo image of the product the user wants to purchase, so that the user can quickly locate the product. Further, reviews, an introduction, or a reference price of the product can also be displayed for the user's reference.
FIG. 7 shows a schematic diagram of an input processing apparatus 700 according to an embodiment of the present invention. As shown in FIG. 7, the input processing apparatus 700 includes an interaction module 710, an image acquisition module 720, and an image matching module 730. The interaction module 710 may receive a user's input, which may indicate a target image that the user pays attention to. The image acquisition module 720 may acquire an image related to the user's input. The image matching module 730 is coupled to the interaction module 710 and the image acquisition module 720, and may search the above-mentioned image for a target image that matches the user's input. For a found target image, the interaction module 710 may highlight the target image in the above-mentioned image, and/or display auxiliary information of the target image.
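A minimal skeleton of how these three modules might be composed is sketched below; the class and method names are illustrative assumptions and the bodies are placeholders, not an implementation of the disclosed apparatus.

```python
class InteractionModule:
    def receive_input(self) -> str:
        # In a real apparatus this would come from a user interface.
        return input("Enter the target you are looking for: ")

    def present(self, matches, auxiliary_info) -> None:
        # Placeholder for highlighting the matches and showing auxiliary info.
        print(f"{len(matches)} match(es) found; auxiliary info: {auxiliary_info}")

class ImageAcquisitionModule:
    def acquire(self, path: str) -> bytes:
        with open(path, "rb") as f:           # e.g. a camera frame saved to disk
            return f.read()

class ImageMatchingModule:
    def find_target(self, image_bytes: bytes, user_input: str) -> list:
        return []                             # the actual matching would go here

class InputProcessingApparatus700:
    """Sketch of the coupling between modules 710, 720 and 730 described above."""
    def __init__(self) -> None:
        self.interaction = InteractionModule()
        self.acquisition = ImageAcquisitionModule()
        self.matching = ImageMatchingModule()
```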
For the detailed processing logic and implementation of each module in the input processing apparatus 700, refer to the description of the input processing method 800 below in conjunction with FIG. 8.
FIG. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention. As shown in FIG. 8, the input processing method 800 is executed in the input processing apparatus 700 and starts at step S810.
In step S810, a user's input may be received. In some embodiments, the user's input may be received via a user interface, and the input may indicate a target image that the user pays attention to.
The user's input may take various forms, including but not limited to one of the following: text, an image, video, and audio. If the user's input includes text, audio, or video, an input image also needs to be obtained based on the user's input. If the user inputs an image, that image is the input image. If the user inputs text, an input image may be obtained based on the text (for example, through a search engine). If the user inputs audio, text may be obtained based on the audio, and an input image may then be obtained based on that text. For example, whether the user types the text "X doll" or speaks the words "X doll", an image of "X" can be obtained based on that input. If the user inputs video, since video includes images and audio, the input image can be obtained with reference to the image and audio cases.
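The normalization of these input forms into an input image could be sketched as follows; the image_search and speech_to_text callables are placeholders assumed for illustration (for example a search engine and a speech recognizer) rather than components named in the disclosure.

```python
def obtain_input_images(user_input, kind, image_search, speech_to_text):
    """Normalize a text / audio / image input into a list of input images.

    image_search:   placeholder callable, text -> image
    speech_to_text: placeholder callable, audio bytes -> text
    """
    if kind == "image":
        return [user_input]                    # the input itself is the input image
    if kind == "text":
        return [image_search(user_input)]      # look up an image for the text
    if kind == "audio":
        text = speech_to_text(user_input)      # audio -> text -> image
        return [image_search(text)]
    raise ValueError(f"unsupported input kind: {kind}")
```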
There may be one or more input images. If the user directly inputs several images in succession, the multiple input images are naturally distinguished from one another. If the user inputs text, multiple text items may first be obtained (the text items may be separated by delimiters such as punctuation marks), and an input image is then obtained for each text item. If the user inputs audio, multiple text items may first be obtained based on the audio (text items obtained from audio may be separated by pauses or by speaking a separator word such as "间隔" ("interval")), and an input image is then obtained for each text item. If the user inputs video, since video includes consecutive images and audio, it can be handled with reference to the cases of consecutive input images and/or input audio.
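For the text case, splitting the input into separate items is straightforward; the following sketch uses punctuation delimiters, with the exact delimiter set being an assumption for illustration.

```python
import re

def split_input_items(text: str):
    """Split a user's text input into separate items using common punctuation
    delimiters, so that one input image can be obtained per item."""
    items = re.split(r"[,，;；、\n]+", text)
    return [item.strip() for item in items if item.strip()]

# Hypothetical example: two products entered in one line.
print(split_input_items("X doll, Y chocolate"))   # -> ['X doll', 'Y chocolate']
```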
In step S820, an image related to the user's input may be acquired. For example, the camera of the computing device 100 may be used to capture the related image, or the related image may be received via a network after being sent to the input processing apparatus 700; the present invention does not limit the source of the related image.
Then, according to step S830, the image may be searched for a target image that matches the user's input. For each input image, the image is searched for a target image that matches that input image.
Taking the search for a target image matching one input image as an example, the image features of the image and of the input image may first be obtained, and then, based on the obtained image features, image matching may be performed between the image and the input image to find the target image in the image that matches the input image. Any image feature extraction technique may be used to obtain the image features, and any image matching technique may be used to perform the image matching; the present invention places no limitation on this.
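Since the disclosure leaves the feature extraction and matching techniques open, the following sketch shows just one possible instantiation of step S830 using ORB features and brute-force Hamming matching from OpenCV (the opencv-python package is assumed to be installed).

```python
import cv2  # requires the opencv-python package

def match_input_image(scene_path, template_path, min_matches=10):
    """One possible instantiation of step S830: ORB features plus brute-force
    Hamming matching between the acquired image (scene) and one input image
    (template). Returns an approximate bounding box of the target image, or
    None if no match is found."""
    scene = cv2.imread(scene_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create()
    kp_scene, des_scene = orb.detectAndCompute(scene, None)
    kp_tmpl, des_tmpl = orb.detectAndCompute(template, None)
    if des_scene is None or des_tmpl is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_tmpl, des_scene)
    if len(matches) < min_matches:
        return None                           # no matching target image found

    # The matched scene keypoints approximate the display area of the target image.
    points = [kp_scene[m.trainIdx].pt for m in matches]
    xs, ys = zip(*points)
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))
```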
In some embodiments, if no target image matching an input image can be found in the image, the user may be prompted via the user interface. If a target image matching the input image is found in the image, the user may be prompted by vibration, ringing, or blinking of an indicator light of the computing device 100, and/or via the user interface.
For each input image for which a matching target image is found, according to step S840, the target image matching that input image may be prominently displayed in the above-mentioned image, and/or auxiliary information of the target image may be displayed. The auxiliary information may be, for example, an introduction, reviews, or a reference price, which is not limited by the present invention. For example, auxiliary information related to the target image may be obtained from various search engines, review websites, or shopping platforms.
In some embodiments, marks such as borders (for example, rectangular frames), shapes (for example, arrows), or line segments (for example, wavy lines) may be added to the target image in the image so as to highlight the target image. In this way, the user can quickly locate the target image without having to search item by item.
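Adding such a mark is simple to sketch; the following example draws a rectangular border and a short label on the image using OpenCV, with colors and sizes being illustrative choices rather than values taken from the disclosure.

```python
import cv2

def highlight_target(image_path, box, label, out_path="highlighted.png"):
    """Draw a rectangular border around the matched target image and place a
    short auxiliary-information label just above it."""
    img = cv2.imread(image_path)
    x1, y1, x2, y2 = box
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 255), 2)          # red border
    cv2.putText(img, label, (x1, max(12, y1 - 8)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    cv2.imwrite(out_path, img)
    return out_path
```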
In other embodiments, the display area of the target image in the above-mentioned image may be obtained; the display area indicates the position of the target image in the image. The display area of the auxiliary information is then configured based on the display area of the target image; for example, the display area of the auxiliary information may cover or be close to the display area of the target image. Further, if the image related to the input contains text, the display style of the auxiliary information may also be configured based on the display style of the text contained in the image. In some cases, in order to display the auxiliary information in harmony with the image, at least part of the display style of the auxiliary information may be configured to be consistent with the corresponding display style of the text in the image (for example, the same font size and font type). In other cases, in order to highlight the auxiliary information, at least part of the display style of the auxiliary information may be configured to differ noticeably from the corresponding display style of the text contained in the image. For example, the background color (or text color) of the auxiliary information may be set to a bright color or to a color contrasting with the background color of the text contained in the image. As another example, display styles such as underlining, bold, italics, text highlight color, text shading, and text borders may be used to highlight the auxiliary information. The embodiments of the present invention do not limit the specific display style used to display or highlight the auxiliary information.
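As a small illustration of the contrasting-color option, the sketch below derives a highlight color from the background color of the text in the image; the specific rule (RGB complement with a luminance fallback) is an assumption chosen for illustration.

```python
def contrasting_color(rgb):
    """Pick a color that stands out against a given background color:
    use the RGB complement, falling back to black or white when the
    background is a mid-gray and its complement would be too similar."""
    r, g, b = rgb
    complement = (255 - r, 255 - g, 255 - b)
    luminance = 0.299 * r + 0.587 * g + 0.114 * b
    if all(abs(c - o) < 32 for c, o in zip(complement, rgb)):
        return (0, 0, 0) if luminance > 128 else (255, 255, 255)
    return complement

print(contrasting_color((255, 255, 255)))   # white background -> (0, 0, 0)
```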
In addition, since there may be multiple input images, in order to help the user distinguish the target images that match different input images, the marks and/or the display styles of the auxiliary information of target images matching the same input image may be kept the same, while the marks and/or the display styles of the auxiliary information of target images matching different input images may differ.
According to some embodiments of the present invention, the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copying and pasting. According to other embodiments of the present invention, a zoom instruction or a selection instruction from the user for the displayed auxiliary information may also be received, and in response the auxiliary information may be displayed correspondingly reduced or enlarged to make it easier to read. A save instruction from the user may also be received, and in response the above-mentioned image with the auxiliary information displayed may be stored on the computing device 100 for subsequent viewing.
According to still other embodiments of the present invention, a selection instruction from the user for another image block for which no auxiliary information is displayed may also be received, and in response auxiliary information of the selected image block may be obtained and displayed. The specific display configuration has been described in detail above and is not repeated here.
For part of the detailed processing logic and implementation of each step in the input processing method 800, reference may be made to the foregoing description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1-5, which is not repeated here.
Those skilled in the art should understand that one or more steps of the input processing method 800 executed by the input processing apparatus 700 described above may also be executed by another device, such as a server communicating with the input processing apparatus 700. In some embodiments, the input processing apparatus 700 may receive the user's input, acquire an image related to the input, and send the input and the image to the server. The server searches the image for a target image that matches the input and returns the search result to the input processing apparatus 700. The input processing apparatus 700 then highlights, in the image, the target image found by the server, and/or displays auxiliary information of the target image.
Those skilled in the art should also understand that the embodiments of the present invention are not limited to locating target text or a target image in an image, and can be extended to locating various target objects, such as target text or a target image, in various source objects such as text, images, and audio.
FIG. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention. As shown in FIG. 9, the input processing method 900 starts at step S910.
In step S910, a user's input is received; the input may include at least one of the following: text, audio, video, and an image. In step S920, a source object related to the user's input is obtained; the source object may include at least one of the following: text, an image, and video.
Then, in step S930, the source object may be searched for a target object that matches the user's input. For the found target object, according to step S940, the target object may be highlighted in the source object, and/or auxiliary information of the target object may be displayed.
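The generalization from images to arbitrary source objects can be pictured as a simple dispatch on the kind of source object, as in the sketch below; the matcher registry and the trivial text matcher are assumptions for illustration only.

```python
def find_target_object(source, source_kind, query, matchers):
    """Generic form of step S930: pick a matcher according to the kind of
    source object (text, image, video, ...) and return the matched target
    objects. `matchers` maps a kind to a callable (source, query) -> list."""
    if source_kind not in matchers:
        raise ValueError(f"no matcher registered for source kind: {source_kind}")
    return matchers[source_kind](source, query)

# Hypothetical example with a trivial text matcher.
matchers = {"text": lambda src, q: [line for line in src.splitlines() if q in line]}
print(find_target_object("Euston\nKing's Cross St Pancras", "text", "King's", matchers))
```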
For part of the detailed processing logic and implementation of each step in the input processing method 900, reference may be made to the foregoing description of the input processing apparatus 200, the input processing method 500, the input processing apparatus 700, and the input processing method 800 in conjunction with FIGS. 1-8, which is not repeated here.
Those skilled in the art should understand that a source object may generally include multiple objects, for example multiple text items or multiple image blocks. The target object may be the object, among the multiple objects included in the source object, that the user pays attention to; for example, it may be target text, a target image, or any other suitable object. The present invention does not limit the specific types of the target object and the source object; scenarios of finding a target object in a source object, such as finding target text in an image, finding target text in text, or finding a target image in an image, all fall within the scope of protection of the present invention.
Those skilled in the art should also understand that one or more steps of the input processing method 900 may be executed by another device, such as a server communicating with the input processing apparatus that executes the input processing method 900. In some embodiments, a user's input may be received, a source object related to the input may be obtained, and the input and the source object may be sent to the server. The server searches the source object for a target object that matches the input and returns the search result. The input processing apparatus that executes the input processing method 900 then highlights, in the source object, the target object found by the server, and/or displays auxiliary information of the target object.
According to the input processing solutions of the embodiments of the present invention, by receiving the user's active input, the target object (for example, target text or a target image) in the source object (for example, an image) that matches that input is highlighted for the user, and/or auxiliary information of the target object is displayed. This spares the user the tedious operation of reading through the source object item by item to find the target object of interest, and improves the user experience. Moreover, the input processing solutions according to the embodiments of the present invention only need to obtain auxiliary information for the target object included in the source object, which greatly reduces the workload.
The various techniques described here may be implemented in connection with hardware or software, or a combination thereof. Thus, the methods and devices of the embodiments of the present invention, or certain aspects or portions thereof, may take the form of program code (that is, instructions) embedded in a tangible medium such as a removable hard disk, a USB flash drive, a floppy disk, a CD-ROM, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes a device for practicing the embodiments of the present invention.
When the program code is executed on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute the methods of the embodiments of the present invention according to the instructions in the program code stored in the memory.
By way of example and not limitation, readable media include readable storage media and communication media. Readable storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media generally embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided here, the algorithms and displays are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the examples of the embodiments of the present invention. The structure required to construct such systems is apparent from the description above. In addition, the embodiments of the present invention are not directed to any particular programming language. It should be understood that various programming languages may be used to implement the content of the embodiments of the present invention described herein, and the above description of specific languages is given for the purpose of disclosing the best mode of the embodiments of the present invention.
In the description provided here, numerous specific details are set forth. It is understood, however, that the embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in the above description of exemplary embodiments of the present invention, various features of the embodiments are sometimes grouped together into a single embodiment, figure, or description thereof in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art should understand that the modules, units, or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may further be divided into multiple sub-modules.
Those skilled in the art can understand that the modules in the devices of the embodiments may be adaptively changed and arranged in one or more devices different from those of the embodiments. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may further be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the embodiments of the present invention and form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
Furthermore, some of the above embodiments are described herein as methods, or as combinations of method elements, that can be implemented by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for implementing such a method or method element therefore forms a means for implementing the method or method element. In addition, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinal terms "first", "second", "third", and so on to describe an ordinary object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
Although the embodiments of the present invention have been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the embodiments of the present invention so described. Furthermore, it should be noted that the language used in this specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the subject matter of the embodiments of the present invention. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the embodiments of the present invention, the disclosure made herein is illustrative and not restrictive, and the scope of the embodiments of the present invention is defined by the appended claims.

Claims (28)

  1. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    searching the image for target text that matches the input; and
    for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
  2. The method of claim 1, wherein the user's input comprises at least one of the following: text, an image, video, and audio, and the method further comprises:
    obtaining input text based on the user's input.
  3. The method of claim 2, wherein the step of searching the image for target text that matches the input comprises:
    obtaining the text contained in the image;
    performing text matching between the text contained in the image and the input text.
  4. The method of claim 3, wherein the step of performing text matching between the text contained in the image and the input text comprises:
    when the language of the input text is different from the language of the text contained in the image, obtaining a translation of the input text into the language of the text contained in the image;
    performing text matching between the text contained in the image and the translation of the input text.
  5. The method of claim 1, wherein the step of displaying the auxiliary information comprises:
    highlighting the auxiliary information.
  6. The method of claim 1 or 5, further comprising:
    for the found target text, obtaining display information of the target text in the image, the display information comprising at least a display area and/or a display style of the target text.
  7. The method of claim 6, wherein the step of displaying or highlighting the auxiliary information comprises:
    configuring a display area of the auxiliary information based on the display area of the target text, the display area of the auxiliary information covering the display area of the target text.
  8. The method of claim 6, wherein the step of displaying or highlighting the auxiliary information comprises:
    configuring a display style of the auxiliary information based on the display style of the target text.
  9. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    sending the input and the image to a server, so that the server searches the image for target text that matches the input; and
    for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
  10. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    searching the image for target text that matches the input; and
    for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
  11. The method of claim 10, further comprising:
    obtaining input text based on the input, the input text being expressed in an input language; and
    wherein the step of searching the image for target text that matches the input comprises:
    obtaining the text contained in the image, the text contained in the image being expressed in a source language different from the input language;
    obtaining a translation of the input text into the source language;
    performing text matching between the text contained in the image and the translation of the input text.
  12. The method of claim 11, wherein the translation comprises a translation of the target text into the input language, or a translation of the target text into a target language specified by the user.
  13. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    sending the input and the image to a server, so that the server searches the image for target text that matches the input; and
    for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
  14. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    searching the image for a target image that matches the input; and
    for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  15. The method of claim 14, wherein the user's input comprises at least one of the following: text, audio, video, and an image, and the method further comprises:
    obtaining an input image based on the input.
  16. The method of claim 15, wherein the step of searching the image for a target image that matches the input comprises:
    obtaining image features of the image and of the input image;
    performing image matching between the image and the input image based on the image features.
  17. The method of claim 14, wherein the step of displaying the auxiliary information comprises:
    highlighting the auxiliary information.
  18. The method of claim 14 or 17, further comprising:
    for the found target image, obtaining a display area of the target image in the image.
  19. The method of claim 18, wherein the step of displaying or highlighting the auxiliary information comprises:
    configuring a display area of the auxiliary information based on the display area of the target image, the display area of the auxiliary information covering the display area of the target image.
  20. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    sending the input and the image to a server, so that the server searches the image for a target image that matches the input; and
    for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  21. An input processing method, comprising:
    receiving a user's input;
    obtaining a source object related to the input;
    searching the source object for a target object that matches the input; and
    for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  22. The method of claim 21, wherein the user's input comprises at least one of the following: text, audio, video, and an image, and the source object comprises at least one of the following: text, an image, and video.
  23. The method of claim 21, wherein the target object is target text or a target image.
  24. An input processing method, comprising:
    receiving a user's input;
    obtaining a source object related to the input;
    sending the input and the source object to a server, so that the server searches the source object for a target object that matches the input; and
    for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  25. An input processing apparatus, comprising:
    an interaction module adapted to receive a user's input;
    an image acquisition module adapted to acquire an image related to the input;
    a text matching module adapted to search the image for target text that matches the input; wherein
    the interaction module is further adapted to, for the found target text, highlight the target text in the image and/or display auxiliary information of the target text.
  26. An input processing apparatus, comprising:
    an interaction module adapted to receive a user's input;
    an image acquisition module adapted to acquire an image related to the input;
    an image matching module adapted to search the image for a target image that matches the input; wherein
    the interaction module is further adapted to, for the found target image, highlight the target image in the image and/or display auxiliary information of the target image.
  27. A computing device, comprising:
    one or more processors;
    a memory; and
    a program, wherein the program is stored in the memory and configured to be executed by the one or more processors, the program comprising instructions for performing the input processing method according to any one of claims 1-24.
  28. A computer-readable storage medium storing a program, the program comprising instructions which, when executed by a computing device, cause the computing device to perform the input processing method according to any one of claims 1-24.
PCT/CN2021/070410 2020-01-08 2021-01-06 Input processing method and apparatus, and computing device WO2021139667A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010019082.XA CN113095090A (en) 2020-01-08 2020-01-08 Input processing method and device and computing equipment
CN202010019082.X 2020-01-08

Publications (1)

Publication Number Publication Date
WO2021139667A1 true WO2021139667A1 (en) 2021-07-15

Family

ID=76664046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070410 WO2021139667A1 (en) 2020-01-08 2021-01-06 Input processing method and apparatus, and computing device

Country Status (2)

Country Link
CN (1) CN113095090A (en)
WO (1) WO2021139667A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750003A (en) * 2012-05-30 2012-10-24 华为技术有限公司 Method and device for text input
US20140082550A1 (en) * 2012-09-18 2014-03-20 Michael William Farmer Systems and methods for integrated query and navigation of an information resource
CN110232111A (en) * 2019-05-30 2019-09-13 杨钦清 A kind of text display method, device and terminal device

Also Published As

Publication number Publication date
CN113095090A (en) 2021-07-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21738881; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21738881; Country of ref document: EP; Kind code of ref document: A1)