CN113095090A - Input processing method and device and computing equipment - Google Patents

Input processing method and device and computing equipment

Info

Publication number
CN113095090A
CN113095090A
Authority
CN
China
Prior art keywords
image
input
text
target
user
Prior art date
Legal status
Pending
Application number
CN202010019082.XA
Other languages
Chinese (zh)
Inventor
葛妮瑜
方视菁
胡雪梅
李洁
Current Assignee
Alibaba Singapore Holdings Pte Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010019082.XA
Priority to PCT/CN2021/070410 (published as WO2021139667A1)
Publication of CN113095090A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the invention discloses an input processing method that includes the following steps: receiving an input from a user; acquiring an image related to the input; searching the image for a target text matching the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text. Embodiments of the invention also disclose a corresponding input processing apparatus, a computing device, and a storage medium.

Description

Input processing method and device and computing equipment
Technical Field
The invention relates to the technical field of computers, and in particular to an input processing method, an input processing apparatus, and a computing device.
Background
With the development and popularization of computing devices such as mobile terminals and personal computers, users are becoming more accustomed to using computing devices to handle everyday matters. For example, when faced with a large amount of information, a user may wish to use a computing device to quickly locate the portion of information of interest, or to obtain auxiliary information about that portion. A typical scenario is traveling in a region where a different language is spoken: facing a large amount of information in an unfamiliar language, the user wishes to use a computing device to locate the portion of information of interest and obtain a translation of it.
In a conventional approach, a user captures an image containing a large amount of information using a camera installed in a computing device, and the computing device recognizes all of the information in the image and displays auxiliary information for all of it. This, however, can be unfriendly to the user. For example, the image may be an image of a map that includes the names of many different places. A user interface that displays translations of all the place names is tiring, because the user must read them one by one to find the place of interest and its translation. This process is quite time-consuming and labor-intensive, and degrades the user experience.
Disclosure of Invention
To this end, embodiments of the present invention provide an input processing method, apparatus, and computing device in an effort to solve, or at least alleviate, the problems described above.
According to one aspect of the embodiments of the present invention, there is provided an input processing method including: receiving an input from a user; acquiring an image related to the input; searching the image for a target text matching the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
Optionally, in the method according to an embodiment of the present invention, the user's input includes at least one of text, an image, video, and audio, and the method further includes: acquiring an input text based on the user's input.
Optionally, in the method according to an embodiment of the present invention, the step of searching the image for a target text matching the input includes: acquiring the text contained in the image; and text-matching the text contained in the image against the input text.
Optionally, in the method according to an embodiment of the present invention, the step of text-matching the text contained in the image against the input text includes: when the language of the input text differs from the language of the text contained in the image, acquiring a translation of the input text into the language of the text contained in the image; and text-matching the text contained in the image against the translation of the input text.
Optionally, in the method according to an embodiment of the present invention, the step of displaying the auxiliary information includes: highlighting the auxiliary information.
Optionally, the method according to an embodiment of the present invention further includes: for the found target text, acquiring display information of the target text in the image, the display information including at least a display area and/or a display style of the target text.
Optionally, in the method according to an embodiment of the present invention, the step of displaying or highlighting the auxiliary information includes: configuring the display area of the auxiliary information based on the display area of the target text, the display area of the auxiliary information covering the display area of the target text.
Optionally, in the method according to an embodiment of the present invention, the step of displaying or highlighting the auxiliary information includes: configuring the display style of the auxiliary information based on the display style of the target text.
According to another aspect of the embodiments of the present invention, there is provided an input processing method including: receiving an input from a user; acquiring an image related to the input; sending the input and the image to a server so that the server searches the image for a target text matching the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
According to another aspect of the embodiments of the present invention, there is provided an input processing method including: receiving an input from a user; acquiring an image related to the input; searching the image for a target text matching the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
Optionally, the method according to an embodiment of the present invention further includes: acquiring an input text based on the input, the input text being expressed in an input language; and the step of searching the image for a target text matching the input includes: acquiring the text contained in the image, the text being expressed in a source language different from the input language; acquiring a translation of the input text into the source language; and text-matching the text contained in the image against the translation of the input text.
Optionally, in the method according to an embodiment of the present invention, the displayed translation includes a translation of the target text into the input language, or a translation of the target text into a target language specified by the user.
According to another aspect of the embodiments of the present invention, there is provided an input processing method including: receiving an input from a user; acquiring an image related to the input; sending the input and the image to a server so that the server searches the image for a target text matching the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
According to another aspect of the embodiments of the present invention, there is provided an input processing method including: receiving an input from a user; acquiring an image related to the input; searching the image for a target image matching the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
Optionally, in the method according to an embodiment of the present invention, the user's input includes at least one of text, audio, video, and an image, and the method further includes: acquiring an input image based on the input.
Optionally, in the method according to an embodiment of the present invention, the step of searching the image for a target image matching the input includes: acquiring image features of the image and of the input image; and image-matching the image against the input image based on the image features.
Optionally, in the method according to an embodiment of the present invention, the step of displaying the auxiliary information includes: highlighting the auxiliary information.
Optionally, the method according to an embodiment of the present invention further includes: for the found target image, acquiring a display area of the target image in the image.
Optionally, in the method according to an embodiment of the present invention, the step of displaying or highlighting the auxiliary information includes: configuring the display area of the auxiliary information based on the display area of the target image, the display area of the auxiliary information covering the display area of the target image.
According to another aspect of the embodiments of the present invention, there is provided an input processing method including: receiving an input from a user; acquiring an image related to the input; sending the input and the image to a server so that the server searches the image for a target image matching the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
According to another aspect of the embodiments of the present invention, there is provided an input processing method including: receiving an input from a user; acquiring a source object related to the input; searching the source object for a target object matching the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
Optionally, in the method according to an embodiment of the present invention, the user's input includes at least one of text, audio, video, and an image, and the source object includes at least one of text, an image, and video.
Optionally, in the method according to an embodiment of the present invention, the target object is a target text or a target image.
According to another aspect of the embodiments of the present invention, there is provided an input processing method including: receiving an input from a user; acquiring a source object related to the input; sending the input and the source object to a server so that the server searches the source object for a target object matching the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
According to another aspect of the embodiments of the present invention, there is provided an input processing apparatus including: an interaction module adapted to receive an input from a user; an image acquisition module adapted to acquire an image related to the input; and a text matching module adapted to search the image for a target text matching the input; wherein the interaction module is further adapted, for the found target text, to highlight the target text in the image and/or display auxiliary information of the target text.
According to another aspect of the embodiments of the present invention, there is provided an input processing apparatus including: an interaction module adapted to receive an input from a user; an image acquisition module adapted to acquire an image related to the input; and an image matching module adapted to search the image for a target image matching the input; wherein the interaction module is further adapted, for the found target image, to highlight the target image in the image and/or display auxiliary information of the target image.
According to another aspect of an embodiment of the present invention, there is provided a computing device including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the input processing method according to an embodiment of the present invention.
According to a further aspect of embodiments of the present invention, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform an input processing method according to embodiments of the present invention.
According to the input processing scheme provided by the embodiments of the present invention, by receiving the user's active input, the target object (e.g., a target text or target image) in the source object (e.g., an image) that matches the input is highlighted for the user, and/or auxiliary information of the target object is displayed. This spares the user the tedious operation of reading through the source object item by item to find the target object of interest, improving the user experience. In addition, the scheme only needs to acquire auxiliary information for the target object included in the source object, rather than for everything it contains, which greatly reduces the workload.
The foregoing is merely an overview of the technical solutions of the embodiments of the present invention. To make the technical means of the embodiments more clearly understood so that they can be implemented according to this description, and to make the above and other objects, features, and advantages of the embodiments more readily apparent, the detailed description of the embodiments follows.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a computing device 100, according to one embodiment of the invention;
FIG. 2 shows a block diagram of an input processing device 200 according to an embodiment of the invention;
FIG. 3 illustrates a schematic diagram of an image associated with an input, according to one embodiment of the invention;
FIGS. 4A-4C respectively illustrate screenshots of a user interface for displaying a translation of target text in an image, in accordance with one embodiment of the present invention;
FIG. 5 shows a flow diagram of an input processing method 500 according to one embodiment of the invention;
FIG. 6 shows a flow diagram of an input processing method 600 according to one embodiment of the invention;
FIG. 7 shows a block diagram of an input processing device 700 according to an embodiment of the invention;
FIG. 8 shows a flow diagram of an input processing method 800 according to one embodiment of the invention; and
FIG. 9 shows a flow diagram of an input processing method 900 according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It will be appreciated that, when faced with a large amount of information, a user may wish to quickly locate the portion of information in which he or she is interested and, once it is located, also have corresponding auxiliary information displayed (e.g., translations, comments, introductions, etc.). To this end, embodiments of the invention disclose an input processing apparatus that can receive a user's input, acquire an image that is related to the input and carries a large amount of information, and, by highlighting in the image the target object the user cares about and/or displaying the target object's auxiliary information, enable the user to quickly locate the target object and/or obtain its auxiliary information.
The input processing apparatus according to an embodiment of the present invention can be implemented by the following computing device. FIG. 1 shows a schematic diagram of a computing device 100 according to one embodiment of the invention. Computing device 100 is an electronic device capable of capturing and/or displaying images, such as a personal computer, a mobile communication device (e.g., a smartphone), a tablet computer, or another device that can capture and/or display images.
As shown in fig. 1, computing device 100 may include a memory interface 102, one or more processors 104, and a peripheral interface 106. The memory interface 102, the one or more processors 104, and/or the peripherals interface 106 can be discrete components or can be integrated in one or more integrated circuits. In the computing device 100, the various elements may be coupled by one or more communication buses or signal lines. Sensors, devices, and subsystems can be coupled to peripheral interface 106 to facilitate a variety of functions.
For example, motion sensors 110, light sensors 112, and distance sensors 114 may be coupled to peripheral interface 106 to facilitate directional, lighting, and ranging functions. Other sensors 116 may also be coupled to the peripheral interface 106, such as a positioning system (e.g., a GPS receiver), a temperature sensor, a biometric sensor, or other sensing device, to facilitate related functions.
The camera subsystem 120 and the optical sensor 122 may be used to facilitate camera functions such as capturing images, where the optical sensor 122 may be, for example, a charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) optical sensor.
Computing device 100 may facilitate communication functions through one or more wireless communication subsystems 124, where the wireless communication subsystems 124 may include radio frequency receivers and transmitters and/or optical (e.g., infrared) receivers and transmitters. The particular design and implementation of the wireless communication subsystem 124 may depend on the communication network(s) supported by the computing device 100. For example, computing device 100 may include communication subsystems 124 designed to support GSM, GPRS, EDGE, Wi-Fi or WiMax, and Bluetooth networks.
The audio subsystem 126 may be coupled to a speaker 128 and a microphone 130 to facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording, and telephony functions.
To display images, the I/O subsystem 140 may include a display controller 142 and/or one or more other input controllers 144. The display controller 142 may be coupled to a display 146. The display 146 may be, for example, a Liquid Crystal Display (LCD), touch screen, or other type of display. In some implementations, the display 146 and display controller 142 can detect contact and movement or pauses made therewith using any of a variety of touch sensing technologies, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies. One or more other input controllers 144 may be coupled to other input/control devices 148 such as one or more buttons, rocker switches, thumbwheels, infrared ports, USB ports, and/or pointing devices such as styluses. One or more of the buttons (not shown) may include an up/down button for controlling the volume of the speaker 128 and/or microphone 130.
The memory interface 102 may be coupled with a memory 150. The memory 150 may include high speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (e.g., NAND, NOR).
Memory 150 may store programs 154, which run on an operating system 152 that is also stored in memory 150. While the computing device is running, the operating system 152 is loaded from memory 150 and executed by the processor 104; the programs 154 are likewise loaded from memory 150 and executed by the processor 104 at runtime. Among the programs 154 is an input processing apparatus 200, 700 according to an embodiment of the present invention, which includes instructions configured to perform the input processing methods 500, 600, and 800 according to embodiments of the present invention.
The following describes the input processing apparatus 200 and the input processing methods 500 and 600 that it performs.
In some cases, the image related to the input displays multiple items of text (it may also be said to include multiple items of text), and the user wishes to locate a target text among them. For example, the input processing apparatus 200 may receive a user's input naming a dish of interest and acquire an image of a restaurant menu. The menu image includes the label texts of a plurality of dishes, and the input processing apparatus 200 may highlight in the image the label text of the dish the user is interested in, so that the user can quickly locate that dish. In addition, a review or introduction of the dish can be displayed for the user's reference.
As another example, the input processing apparatus 200 may receive a user's input of a destination subway station and acquire an image of a subway line map. The image includes the label texts of a plurality of subway stations, and the input processing apparatus 200 can highlight the label text of the destination station in the image so that the user can quickly locate it. In addition, a translation of the destination station's name can be displayed for the user's reference.
As another example, the input processing apparatus 200 may receive a user's input of an item the user wants to purchase and acquire an image of a store shelf. The image includes the label texts of a plurality of products, and the input processing apparatus 200 may highlight the label text of the desired product in the image so that the user can quickly locate it. In addition, a review, introduction, or reference price of the product can be displayed for the user's reference.
FIG. 2 shows a block diagram of an input processing apparatus 200 according to one embodiment of the invention. As shown in FIG. 2, the input processing apparatus 200 includes an interaction module 210, which may receive a user's input; the input may indicate the target text the user cares about. In some embodiments, the interaction module 210 may receive the input via a user interface (described in detail below). The input may take various forms, including but not limited to at least one of: text, images, video, and audio.
If the user's input includes an image, video, or audio, the interaction module 210 may send the input to the recognition module 220, which obtains text based on it. Any image recognition or speech recognition technology may be used to obtain the text; the present invention is not limited in this respect. For convenience of description, text entered directly by the user and text obtained from an image, video, or audio entered by the user are all referred to herein as input text.
There may be one or more items of input text. Where the user enters text directly, multiple items of input text can be distinguished by delimiters such as punctuation marks. For example, if the user's input is "Chaoyangmen; Dongdaqiao; Jintai Road", it contains three items of input text separated by semicolons: "Chaoyangmen", "Dongdaqiao", and "Jintai Road". Where the user enters an image, multiple items of input text obtained from the image can be distinguished according to the different text blocks they occupy in the image (described in detail later). Where the user enters audio, multiple items of input text can be distinguished by pauses or by speaking a separator word such as "interval". Where the user enters video, since video comprises images and audio, processing can proceed as in the image and/or audio cases.
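As an illustration of the delimiter-based splitting just described, the following Python sketch separates a raw text input into items of input text. It is a minimal sketch only: the patent does not prescribe an implementation, and the delimiter set and function name are assumptions made here for illustration.

    import re

    def split_input_texts(raw_input: str) -> list[str]:
        # Split on common delimiters, including full-width Chinese
        # punctuation; the exact delimiter set is an assumption.
        items = re.split(r"[;,；，、\n]+", raw_input)
        return [item.strip() for item in items if item.strip()]

    # Example: three items of input text separated by semicolons.
    print(split_input_texts("Chaoyangmen; Dongdaqiao; Jintai Road"))
    # ['Chaoyangmen', 'Dongdaqiao', 'Jintai Road']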
After one or more items of input text are obtained, the text matching module 230 in the input processing apparatus 200 may search the image related to the input for a target text matching the input.
As shown in FIG. 2, the input processing apparatus 200 further includes an image acquisition module 240, which may acquire the image related to the user's input. For example, the image may be captured with a camera of the computing device 100, or an image sent to the input processing apparatus 200 may be received over a network.
The image related to the input may include a plurality of text blocks. As described above, the image displays multiple items of text; each item of text corresponds to one image block, i.e., one text block, and each text block contains one item of text.
FIG. 3 shows a schematic diagram of an image related to an input, according to one embodiment of the invention. The image is an image of a subway line map and includes the label texts (i.e., the names) of a plurality of subway stations. The image block occupied by each station name is a text block, and the station name displayed in each text block is an item of text.
The image related to the input may be in various formats, for example a bitmap format such as JPEG, BMP, or PNG, or a vector graphics format such as SVG or SWF. The invention does not limit the image format.
For an image in a format such as SVG (Scalable Vector Graphics), the image acquisition module 240 can extract the text contained in the image directly (without image recognition). For an image in a format such as JPEG, the image acquisition module 240 needs to send the image to the recognition module 220 for image recognition, whereby the recognition module 220 obtains the text contained in the image, i.e., the items of text contained in the text blocks.
In some implementations, the recognition module 220 can analyze the image using optical character recognition (OCR) techniques to recognize the text in it. The recognition module 220 may also detect text in multiple different languages; for example, it may include an OCR engine capable of recognizing text in multiple languages, or one OCR engine for each of several different languages. Of course, other image recognition techniques may be used to recognize text in an image; the invention is not limited in this respect.
The recognition module 220 may also detect display information of the text in the image, including but not limited to a display area and/or a display style of the text. The display area indicates the position of the text in the image, for example the coordinates of the text block in which the text is located. The display style may include text color, background color, font size, font type, and the like. In some embodiments, this display information may be used to identify different text blocks in the image. For example, where two portions of text have different font colors, different background colors, or are separated from each other (e.g., by at least a threshold distance), the recognition module 220 may determine that the two portions are two items of text contained in two different text blocks.
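The following Python sketch shows one way the text and its display areas could be extracted, using the Tesseract OCR engine via the pytesseract library. This is an assumption for illustration: the patent names no particular OCR engine, and the grouping here relies on Tesseract's own block numbering rather than the color and distance heuristics described above.

    from PIL import Image
    import pytesseract  # assumes a local Tesseract install

    def extract_text_blocks(image_path, lang="eng"):
        # Returns one {"text": ..., "box": (left, top, right, bottom)}
        # entry per text block recognized in the image.
        data = pytesseract.image_to_data(
            Image.open(image_path), lang=lang,
            output_type=pytesseract.Output.DICT)
        blocks = {}
        for i, word in enumerate(data["text"]):
            if not word.strip():
                continue
            box = (data["left"][i], data["top"][i],
                   data["left"][i] + data["width"][i],
                   data["top"][i] + data["height"][i])
            entry = blocks.setdefault(data["block_num"][i],
                                      {"words": [], "box": box})
            entry["words"].append(word)
            l, t, r, b = entry["box"]  # grow the box to cover this word
            entry["box"] = (min(l, box[0]), min(t, box[1]),
                            max(r, box[2]), max(b, box[3]))
        return [{"text": " ".join(v["words"]), "box": v["box"]}
                for v in blocks.values()]

The returned bounding box corresponds to the display area discussed above; display styles such as text color or background color would have to be sampled from the pixels inside each box, which is omitted here.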
Furthermore, according to other embodiments of the present invention, the recognition module 220 may also use natural language processing (NLP) techniques to correct errors produced during image recognition, such as sentence-segmentation errors, character errors, and grammar errors.
After the text contained in the image is acquired, the image acquisition module 240 sends it to the text matching module 230. The text matching module 230 may match the text contained in the image against the input text to find a target text in the image that matches it. Specifically, for each item of input text, the text contained in the image is matched against that item so as to find a target text in the image matching it.
The following describes the process of finding a matching target text, taking one item of input text as an example. First, the text matching module 230 may determine whether the input text and the text obtained from the image by the image acquisition module 240 are in the same language.
In some implementations, the two are in the same language. That is, calling the language of the text obtained from the image the source language and the language of the input text the input language, if the source language and the input language are the same, the text matching module 230 may directly search the text contained in the image for a target text matching the item of input text, for example a target text that includes the item of input text or at least part of it.
In other implementations, the two are in different languages. That is, the source language and the input language differ, and the text matching module 230 needs either to translate the text obtained from the image into the input language or to translate the input text into the source language.
The text matching module 230 may send the text to be translated to the translation engine 250. Translating the text obtained from the image into the input language is possible, but requires translating every item of text in the image, which is quite costly. It is therefore preferable to translate the input text into the source language, i.e., to send only the input text to the translation engine 250 for translation.
After translation, the text matching module 230 matches the text contained in the image against the source-language translation of the input text, so as to find a target text matching the item of input text. In some embodiments, if no matching target text is found in the image, the user may be prompted via the user interface. If a matching target text is found, the user may be prompted by vibration, a ring tone, a flashing light, and/or the user interface of the computing device 100.
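A sketch of the matching step follows. The translation engine is deliberately left abstract (the translate parameter), since the patent does not tie the scheme to a particular service; matching by mutual substring containment mirrors the "includes the item of input text or at least part of it" criterion described above. All names here are illustrative assumptions.

    def find_matching_blocks(input_text, blocks, translate):
        # translate: a placeholder callable mapping input-language text
        # to the source language of the image (not a real API).
        query = translate(input_text).casefold()
        return [blk for blk in blocks
                if query in blk["text"].casefold()
                or blk["text"].casefold() in query]

    # Usage sketch (names hypothetical):
    # blocks = extract_text_blocks("subway_map.jpg")
    # matches = find_matching_blocks("国王十字", blocks, mt_client.translate)
    # if not matches:
    #     prompt_user("No matching text found")  # e.g., via the UI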
For each item of input text for which a matching target text is found, in some implementations the text matching module 230 may highlight the matching target text in the image. In some embodiments, the target text may be rendered prominently above (e.g., overlaid on) the text block in which it is located in the image (see the description of highlighting auxiliary information below).
In other embodiments, marks such as borders (e.g., rectangular boxes), shapes (e.g., arrows), or line segments (e.g., wavy lines) may be added to the text block in which the target text is located, thereby highlighting the target text. The user can thus quickly locate the target text without reading and searching item by item.
In other implementations, for each item of input text for which a matching target text is found, the text matching module 230 may obtain auxiliary information for the matching target text. Note that the embodiments of the present invention do not limit the specific content of the auxiliary information; any information that relates to the target text and can assist the user falls within the scope of the invention.
In some embodiments, the auxiliary information may be a review, an introduction, a reference price, a purchase channel, or the like, and the text matching module 230 may obtain it from various search engines, review websites, or shopping platforms.
In other embodiments, the auxiliary information may be a translation. The text matching module 230 may send the matching target text to the translation engine 250 to obtain its translation. The translation engine 250 may translate the target text into a different language: for example, into a target language specified by the user, into a default language of the computing device 100 (e.g., when no target language is specified), or into the input language of the input text (e.g., when no target language is specified and the input language differs from the source language). The user may specify the target language via a user interface (described in detail below).
After the auxiliary information of the target text is obtained, the interaction module 210 may display or highlight it in the user interface in which the image related to the input is displayed.
Specifically, for the found target text, the interaction module 210 may obtain the display information of the target text in the image and display or highlight the auxiliary information based on that display information.
For example, the display area of the auxiliary information may be configured based on the display area of the target text: it may cover the target text's corresponding text block in the image, or be placed close to it (e.g., displayed around it).
For example, the display style of the auxiliary information may be configured based on the display style of the target text. In some cases, to keep the display of the auxiliary information visually consistent with the image, at least part of its display style may be configured to match the corresponding style of the target text (e.g., the same font size and font type). In other cases, to make the auxiliary information stand out, at least part of its display style may be made clearly different from the corresponding style of the target text. For example, the background color (or text color) of the auxiliary information may be configured as a bright color or as a color contrasting with the background color of the text contained in the image; styles such as underlining, bold, italics, text undertone, text shadow, and text borders may also be used. The embodiments of the present invention do not limit the specific display style used to display or highlight the auxiliary information.
In addition, since there may be multiple items of input text, to help the user distinguish the target texts matching different items, the marks or auxiliary information of target texts matching the same item of input text may share the same display style in the image, while those matching different items of input text may use different display styles.
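The following Pillow-based sketch ties these points together: each matched block is covered by a backing rectangle carrying its auxiliary information (here a translation), and a distinct outline color is cycled per item of input text so that matches for different items remain distinguishable. Fonts, colors, and data structures are illustrative choices, not prescribed by the patent.

    from itertools import cycle
    from PIL import Image, ImageDraw, ImageFont

    ITEM_COLORS = ["red", "blue", "green", "orange"]  # one per input item

    def overlay_auxiliary_info(image_path, matches_per_item, aux_info, out_path):
        # matches_per_item: one list of matched blocks per item of input
        # text; aux_info: maps a block's text to its auxiliary information.
        img = Image.open(image_path).convert("RGB")
        draw = ImageDraw.Draw(img)
        font = ImageFont.load_default()  # a CJK font would be needed for Chinese
        for item_blocks, color in zip(matches_per_item, cycle(ITEM_COLORS)):
            for blk in item_blocks:
                left, top, right, bottom = blk["box"]
                # The auxiliary info's display area covers the target block.
                draw.rectangle((left, top, right, bottom),
                               fill="white", outline=color, width=2)
                draw.text((left + 2, top + 2), aux_info[blk["text"]],
                          fill=color, font=font)
        img.save(out_path)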
According to some embodiments of the present invention, the displayed auxiliary information may be editable text to facilitate editing operations such as copy and paste by the user.
According to other embodiments of the present invention, the interaction module 210 may further receive a zoom or selection instruction from the user for the displayed auxiliary information and, in response, display the auxiliary information enlarged or reduced for easier reading. The interaction module 210 may also receive a save instruction from the user and, in response, store the image with the auxiliary information displayed on it to the computing device 100 for subsequent viewing.
According to still other embodiments of the present invention, the interaction module 210 may further receive a selection instruction from the user for another text block whose auxiliary information is not displayed and, in response, obtain and display the auxiliary information of the text contained in the selected text block. The display configuration is as described in detail above and is not repeated here.
The process of locating a target text for a user and displaying a translation of the target text according to an input processing scheme of an embodiment of the present invention is described below in conjunction with fig. 4A-4C.
FIGS. 4A-4C illustrate screenshots of a user interface for displaying a translation of target text in an image, according to one embodiment of the invention. As shown in FIG. 4A, the user interface 410 displays an image 411, which may be captured in response to a user selection of the camera button 412. The image 411 includes a plurality of text blocks, each text block including an item of text. In the example of FIG. 4A, each text block included in image 411 includes an item of English text, e.g., text block 416 includes an item of English text "King's Cross St Pancras".
The user interface 410 may also enable the user to select a language for translation. In particular, user interface 410 enables a user to select source language 413 such that multiple items of text in the source language will be identified in image 411. The user interface 410 also enables a user to select a target language 414 into which the text is to be translated. In the example of FIG. 4A, the user has selected the source language 413 as English and the target language 414 as Chinese. That is, the user wants to translate english text recognized in the image 411 into chinese text.
The user interface 410 may also enable a user to make inputs. In particular, the user interface includes an input button 415. When the user selects the input button 415, a user interface 420 as shown in fig. 4B may be displayed in response to the user's selection of the input button 415.
The user interface 420 includes an image 411 and input buttons 415, similar to the user interface 410, and also includes a control 421. The control 421 may receive input from the user and may also prompt the user for input in the target language 414. For example, input text in the target language 414 or an image/audio including the input text in the target language 414 may be input. In the example of FIG. 4B, control 421 prompts the user to enter input text in Chinese. After the user makes an input, in response to control 421 receiving the user's input, user interface 430 as shown in FIG. 4C may be displayed.
The user interface 430, similar to the user interface 410, includes the image 411 and the input button 415, and also includes a control 431. The control 431 may display one or more items of input text derived from the user's input, and may also display the result of searching the image 411 for matching text based on those items. In the example of FIG. 4C, the control 431 shows the item of input text "King Cross" and the search result "King Cross found".
The user interface 430 also displays an overlay 432 over the image 411. The overlay 432 displays a translation 433 of the target text in the image 411 that matches the user's input text, and covers the text block corresponding to the matching target text. In the example of FIG. 4C, the item of text "King's Cross St Pancras" contained in text block 416 of the image 411 matches the user's item of input text "King Cross"; the overlay 432 displays a translation 433 of the item of text "King's Cross St Pancras" and covers the corresponding text block 416.
The user interface 430 may configure the display style of the translation of the target text based on the display style of the matching target text. In the example of FIG. 4C, the translation 433 is similar in background color (e.g., both white) and text color (e.g., both blue) to the matching target text "King's Cross St Pancras", which helps the translation 433 display in harmony with the image 411. In addition, the user interface 430 may highlight the translation: in the example of FIG. 4C, the text of the translation 433 is given a light undertone (e.g., light blue), which helps the translation 433 stand out.
As shown in FIGS. 4A-4C, through such user interfaces a user can quickly and easily locate the target text of interest in the image 411 and obtain its translation, without having to passively read translations of all the text in the image. For example, after capturing the image 411 and entering "King Cross" through the user interface 420, the user can easily locate the King's Cross subway station and see its translation through the user interface 430.
User interface 430 may also enable a user to download and store image 411 with translation 433 displayed. In particular, user interface 430 includes a download button 434. In response to user selection of download button 434, image 411 with displayed translation 433 can be downloaded and stored for subsequent viewing by the user.
FIG. 5 shows a flow diagram of an input processing method 500 according to one embodiment of the invention. The input processing method 500 may be performed in the input processing apparatus 200.
As shown in fig. 5, the input processing method 500 begins at step S510. In step S510, an input of a user is received. In step S520, an image related to the input of the user is acquired.
In some embodiments, the user input may include at least one of: text, images, video, and audio, and after receiving the user input, the input text may be acquired based on the user input.
After the image related to the user's input is acquired, in step S530 the image is searched for a target text matching the user's input. Specifically, the text contained in the image may be acquired and then matched against the input text to find a matching target text. When the language of the input text differs from the language of the text contained in the image, a translation of the input text into the language of the text contained in the image may be acquired, and the text contained in the image may be matched against that translation.
For the found target text, according to step S540, the target text may be highlighted in the image related to the input and/or auxiliary information of the target text may be displayed. The auxiliary information may be displayed in a highlighted manner.
According to some embodiments of the present invention, for the found target text, the display information of the target text in the image may be acquired, and the auxiliary information may be displayed or highlighted based on that display information.
Specifically, the display information may include at least a display area and/or a display style of the target text. In some embodiments, the display area of the auxiliary information may be configured based on the display area of the target text; for example, it may cover or sit close to the display area of the target text. The display style of the auxiliary information may likewise be configured based on the display style of the target text.
For the detailed processing logic and implementation procedure of each step in the input processing method 500, reference may be made to the foregoing description of the input processing apparatus 200 in conjunction with fig. 1-4C, and details are not repeated here.
FIG. 6 shows a flow diagram of an input processing method 600 according to one embodiment of the invention. The input processing method 600 may be performed in the input processing apparatus 200. As shown in FIG. 6, the input processing method 600 begins at step S610.
In step S610, an input of a user is received. In step S620, an image related to the input of the user is acquired. In some embodiments, after receiving the user's input, the input text may also be retrieved based on the input.
Then, in step S630, the image is searched for a target text matching the user's input. Specifically, the text contained in the image may be acquired. The input text is expressed in an input language, while the text contained in the image is expressed in a source language different from the input language; a translation of the input text into the source language therefore needs to be acquired, after which the text contained in the image is matched against that translation to find a matching target text. The displayed translation may be a translation of the target text into the input language, or a translation of the target text into a target language specified by the user.
For the found target text, the target text may be highlighted in the image and/or a translation of the target text may be displayed according to step S640.
For the detailed processing logic and implementation procedure of each step in the input processing method 600, reference may be made to the foregoing description of the input processing apparatus 200 and the input processing method 500 in conjunction with fig. 1 to 5, which is not described herein again.
It will be understood by those skilled in the art that although the input processing apparatus 200 is illustrated as including the interaction module 210, the recognition module 220, the text matching module 230, the image acquisition module 240, and the translation engine 250, one or more of these modules may be stored on and/or executed by other devices, such as a server in communication with the input processing apparatus 200 (e.g., a server that performs image recognition, speech recognition, text matching, and language translation). In some embodiments, the input processing apparatus 200 may receive an input from a user, acquire an image related to the input, and send the input and the image to a server. The server searches the image for a target text matching the input and returns the result to the input processing apparatus 200, which then highlights the target text found by the server in the image and/or displays auxiliary information of the target text. For the detailed processing logic with which the server finds the target text, refer to the foregoing description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1-5, which is not repeated here.
The following describes the input processing apparatus 700 and the input processing method 800 that it performs.
The image related to the input may be regarded as comprising a plurality of image blocks, and in some cases the user wishes to locate the image block of interest, i.e., the target image, among them. For example, the input processing apparatus 700 may receive a user's input for an item the user wants to purchase and acquire an image of a store shelf. The shelf image includes label images of a plurality of products, and the input processing apparatus 700 may highlight the label image of the desired product in the image so that the user can quickly locate it. In addition, a review, introduction, or reference price of the product can be displayed for the user's reference.
FIG. 7 shows a block diagram of an input processing apparatus 700 according to one embodiment of the invention. As shown in FIG. 7, the input processing apparatus 700 includes an interaction module 710, an image acquisition module 720, and an image matching module 730. The interaction module 710 may receive a user's input, which may indicate the target image of interest to the user. The image acquisition module 720 may acquire an image related to the input. The image matching module 730, coupled to the interaction module 710 and the image acquisition module 720, may search the image for a target image matching the input. For the found target image, the interaction module 710 may highlight the target image in the image and/or display auxiliary information of the target image.
For the detailed processing logic and implementation of the modules in the input processing device 700, reference is made to the following description of the input processing method 800 in conjunction with fig. 8.
FIG. 8 shows a flow diagram of an input processing method 800 according to one embodiment of the invention. As shown in fig. 8, input processing method 800 is performed in input processing device 700 and begins at step S810.
In step S810, a user's input may be received. In some embodiments, the input may be received via a user interface and may indicate a target image of interest to the user.
The input may take various forms, including but not limited to at least one of: text, images, video, and audio. If the input includes text, audio, or video, an input image may also need to be acquired based on it. If the user inputs an image, that image is the input image. If the user inputs text, an input image may be retrieved based on the text (e.g., retrieved through a search engine). If the user inputs audio, text may first be obtained from the audio, and an input image retrieved based on that text. For example, a user may type "X boys" or speak the words "X boys", and an image of "X" may be acquired based on the input. If the user inputs video, since video comprises images and audio, the input image can be acquired as in the image and audio cases.
There may be one or more input images. Where the user directly inputs images in succession, the multiple input images are naturally distinct. Where the user inputs text, multiple texts may first be obtained (distinguished by delimiters such as punctuation marks), and an input image then acquired for each. Where the user inputs audio, multiple texts may be obtained from the audio (distinguished by pauses or by speaking a separator word such as "interval"), and an input image then acquired for each. Where the user inputs video, since video comprises continuous images and audio, processing can proceed as in the cases of continuous image and/or audio input.
In step S820, an image related to the input of the user may be acquired. For example, the camera of the computing device 100 may be used to capture the relevant image, or the relevant image sent to the input processing apparatus 700 may be received via a network.
Then, according to step S830, the image may be searched for a target image matching the user's input; for each input image, a target image matching that input image is sought in the image.
Taking one input image as an example, image features of the image and of the input image may first be obtained, and the two may then be matched based on those features to find a target image in the image that matches the input image. Any image feature extraction technique may be used to obtain the image features, and any image matching technique may be used for the matching; the invention is not limited in this respect.
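The following OpenCV sketch shows one common realization of such feature-based matching, using ORB features with a ratio test. It is illustrative only: the patent mandates no specific feature extractor or matcher, and the threshold values are assumptions.

    import cv2

    def contains_target_image(scene_path, query_path, min_good_matches=20):
        # Returns True if the input (query) image appears to match a
        # region of the scene image, based on ORB feature matches.
        scene = cv2.imread(scene_path, cv2.IMREAD_GRAYSCALE)
        query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
        orb = cv2.ORB_create()
        _, des_query = orb.detectAndCompute(query, None)
        _, des_scene = orb.detectAndCompute(scene, None)
        if des_query is None or des_scene is None:
            return False
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        pairs = matcher.knnMatch(des_query, des_scene, k=2)
        good = [p for p in pairs
                if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
        return len(good) >= min_good_matches

Locating the display area of the matched region (needed for the highlighting in step S840 below) could then be done by estimating a homography from the good matches, e.g., with cv2.findHomography.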
In some embodiments, if no target image matching the input image is found in the acquired image, the user may be prompted via the user interface. If a matching target image is found, the user may be prompted by a vibration, a ring tone, a flashing light, and/or the user interface of the computing device 100.
For each input image for which a matching target image is found, the target image may be highlighted in the acquired image and/or auxiliary information of the target image may be displayed, according to step S840. The auxiliary information may be an introduction, reviews, a reference price, or the like; the invention places no limit on it. For example, auxiliary information related to the target image may be obtained from search engines, review websites, or shopping platforms.
In some embodiments, marks such as borders (e.g., rectangular boxes), shapes (e.g., arrows), or line segments (e.g., wavy lines) may be added to the target image in the acquired image to highlight it, so that the user can locate the target image at once instead of scanning the image piece by piece. One way to draw such a mark is sketched below.
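As a sketch only, a rectangular border mark could be drawn with an OpenCV primitive; arrows, wavy lines, and the other marks mentioned above could be drawn the same way:

```python
import cv2

def highlight_region(image_bgr, box, color=(0, 0, 255), thickness=3):
    """Draw a rectangular border mark around the target's bounding box."""
    x1, y1, x2, y2 = box
    cv2.rectangle(image_bgr, (x1, y1), (x2, y2), color, thickness)
    return image_bgr

# Example: mark the region located by the matcher sketched earlier.
# box = find_target_region(scene, query)
# if box is not None:
#     highlight_region(scene, box)
```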
In other embodiments, a display area of the target image in the acquired image may be obtained, the display area indicating the position of the target image within the image. The display area of the auxiliary information is then configured based on the display area of the target image; for example, the auxiliary information may cover, or sit close to, the target image's display area. Further, if the acquired image contains text, the display style of the auxiliary information may be configured based on the display style of that text. In some cases, to make the auxiliary information blend with the image, at least part of its display style may be configured to match the corresponding display style of the text in the image (e.g., the same font size and font type). In other cases, to make the auxiliary information stand out, at least part of its display style may be configured to differ markedly from the corresponding display style of the text in the image. The auxiliary information may be highlighted, for example, by setting its background color (or text color) to a bright color or to a color contrasting with the background of the text in the image. Display styles such as underlining, bolding, italics, text background colors, text shading, and text borders may likewise be used. Embodiments of the invention do not limit the specific display style used to display or highlight the auxiliary information; one possible placement is sketched below.
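Purely as an illustration of one placement and styling choice (a bright background patch just outside the target's display area), and not a prescribed layout:

```python
import cv2

def draw_auxiliary_info(image_bgr, box, text):
    """Render auxiliary text near the target region on a bright background."""
    x1, y1, x2, y2 = box
    font, scale, thickness = cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1
    (tw, th), baseline = cv2.getTextSize(text, font, scale, thickness)
    # Place the label just above the target region, or below it if there
    # is no room above.
    ty = y1 - 6 if y1 - th - 6 > 0 else y2 + th + 6
    # Bright background patch so the label stands out against the image.
    cv2.rectangle(image_bgr, (x1, ty - th - baseline), (x1 + tw, ty + baseline),
                  (0, 255, 255), cv2.FILLED)
    cv2.putText(image_bgr, text, (x1, ty), font, scale, (0, 0, 0), thickness)
    return image_bgr
```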
In addition, since there may be multiple input images, the marks and/or auxiliary information of target images matching the same input image may share one display style, while those matching different input images may use different display styles, making it easy for the user to tell the matches apart, as in the small sketch below.
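One simple way to realize this, assuming a fixed palette purely for illustration, is to key the mark color to the index of the input image:

```python
# Targets matching the same input image share a color; targets matching
# different input images get different colors from the palette.
PALETTE = [(0, 0, 255), (0, 255, 0), (255, 0, 0), (0, 255, 255)]

def mark_color(input_index):
    return PALETTE[input_index % len(PALETTE)]

# e.g. highlight_region(scene, box, color=mark_color(i)) for the i-th input image
```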
According to some embodiments of the invention, the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copy and paste. According to other embodiments, a zoom or selection instruction from the user directed at the displayed auxiliary information may be received, and in response the auxiliary information is displayed reduced or enlarged for easier reading. A save instruction from the user may also be received, in response to which the image with the auxiliary information displayed on it is stored to the computing device 100 for later viewing.
According to still other embodiments of the invention, a selection instruction from the user directed at another image block whose auxiliary information is not displayed may be received, and in response the auxiliary information of the selected image block is acquired and displayed. The specific display configuration has been described in detail above and is not repeated here.
For part of the detailed processing logic and implementation of the steps in the input processing method 800, reference may be made to the descriptions of the input processing apparatus 200 and the input processing method 500 given in conjunction with FIGS. 1 to 5; these are not repeated here.
Those skilled in the art will appreciate that one or more steps of the input processing method 800 performed by the input processing apparatus 700 may instead be performed by another device, such as a server in communication with the input processing apparatus 700. In some embodiments, the input processing apparatus 700 may receive the user's input, acquire an image related to the input, and send both to a server. The server searches the image for a target image matching the input and returns the result to the input processing apparatus 700, which then highlights the target image found by the server in the image and/or displays its auxiliary information. A hypothetical client-side exchange is sketched below.
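The endpoint URL and JSON fields in the following sketch are assumptions made for illustration, since the disclosure does not fix a wire protocol:

```python
import base64
import requests

def match_on_server(user_input_text, image_bytes,
                    url="https://example.com/api/match"):
    """Send the input and the image to a server and return its matches."""
    payload = {
        "input": user_input_text,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    resp = requests.post(url, json=payload, timeout=10)
    resp.raise_for_status()
    # Assumed response shape: a list of matched regions with auxiliary
    # information, e.g. [{"box": [x1, y1, x2, y2], "info": "..."}].
    return resp.json()["matches"]
```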
It will also be appreciated by those skilled in the art that embodiments of the invention may be extended to locating various target objects, such as target text or target images, within various source objects, such as text, images, or audio, and are not limited to locating target text or target images in images.
FIG. 9 shows a flow diagram of an input processing method 900 according to one embodiment of the invention. As shown in FIG. 9, the input processing method 900 begins at step S910.
In step S910, an input from a user is received, and the input from the user may include at least one of the following: text, audio, video, and images. In step S920, a source object related to the input of the user is obtained, where the source object may include at least one of: text, images, and video.
Then, a target object matching the input of the user among the source objects may be searched for in step S930. For the searched target object, according to step S940, the target object may be highlighted in the source object and/or auxiliary information of the target object may be displayed.
For part of the detailed processing logic and implementation of the steps in the input processing method 900, reference may be made to the descriptions of the input processing apparatus 200 and the input processing method 500, and of the input processing apparatus 700 and the input processing method 800, given in conjunction with FIGS. 1 to 8; these are not repeated here.
It will be appreciated by those skilled in the art that a source object typically comprises multiple objects, for example multiple items of text or multiple image blocks. The target object is then the object of interest to the user among the objects comprised by the source object, for example target text, a target image, or any other suitable object. The invention does not limit the specific types of the target object and the source object; scenarios such as finding target text in an image, finding target text in text, and finding a target image in an image all fall within the scope of the invention. A generic dispatch over such combinations is sketched below.
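As an illustration only, the combinations could be dispatched through a small table; find_text_in_image and find_text_in_text are hypothetical stand-ins for the text-matching routines described earlier, and find_target_region is the image matcher sketched above:

```python
def find_text_in_image(image, query_text):
    # Hypothetical: OCR the image, then match the recognized text against
    # the query (see the text-matching embodiments described earlier).
    raise NotImplementedError

def find_text_in_text(source_text, query_text):
    # Hypothetical: substring or fuzzy matching within the source text.
    raise NotImplementedError

def find_targets(source_kind, target_kind, source, query):
    """Dispatch the search over (source object, target object) combinations."""
    finders = {
        ("image", "text"): find_text_in_image,   # target text in an image
        ("text", "text"): find_text_in_text,     # target text in text
        ("image", "image"): find_target_region,  # target image in an image
    }
    try:
        finder = finders[(source_kind, target_kind)]
    except KeyError:
        raise ValueError(f"unsupported combination: {source_kind}/{target_kind}")
    return finder(source, query)
```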
It will also be understood by those skilled in the art that one or more steps of the input processing method 900 may likewise be performed by another device, such as a server in communication with the input processing apparatus that performs the method. In some embodiments, the user's input is received, a source object related to the input is acquired, and both the input and the source object are sent to a server; the server searches the source object for a target object matching the input and returns the result. The input processing apparatus executing the input processing method 900 then highlights the target object found by the server in the source object and/or displays its auxiliary information.
According to the input processing scheme of embodiments of the invention, the user's active input is received, and the target objects (e.g., target text or target images) in the source object (e.g., an image) that match the input are highlighted for the user and/or their auxiliary information is displayed. This spares the user the tedious work of reading through the source object and hunting for the objects of interest one by one, improving the user experience. Moreover, the scheme only needs to acquire auxiliary information for the matched target objects rather than for every object in the source object, which greatly reduces the workload.
The various techniques described herein may be implemented in hardware, in software, or in a combination of the two. Thus, the methods and apparatus of embodiments of the invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media such as removable hard drives, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium; when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing embodiments of the invention.
In the case of program code executing on a programmable computer, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code, and the processor is configured to perform the methods of embodiments of the invention according to the instructions in that program code.
By way of example, and not limitation, readable media may comprise readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the examples of embodiments of the invention, and the structure required to construct such systems is apparent from the description above. Moreover, embodiments of the invention are not directed to any particular programming language; a variety of programming languages may be used to implement the teachings described herein, and the descriptions of specific languages above are given to disclose the best mode of the embodiments.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features that are included in other embodiments but not in others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the embodiments above are described herein as methods or combinations of method elements that can be performed by a processor of a computer system or by other means of carrying out the stated functions. A processor provided with the necessary instructions for carrying out such a method or method element thus forms a means for carrying out the method or method element. Likewise, the elements of the apparatus embodiments described herein are examples of means for carrying out the functions performed by those elements in order to carry out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While embodiments of the invention have been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the embodiments of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive embodiments. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present embodiments are disclosed by way of illustration and not limitation, the scope of embodiments of the invention being defined by the appended claims.

Claims (28)

1. An input processing method, comprising:
receiving input from a user;
acquiring an image related to the input;
searching the image for target text matching the input; and
for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
2. The method of claim 1, wherein the user input comprises at least one of: text, images, video and audio, the method further comprising:
obtaining input text based on the user input.
3. The method of claim 2, wherein searching the image for target text matching the input comprises:
acquiring text contained in the image; and
performing text matching between the text contained in the image and the input text.
4. The method of claim 3, wherein performing text matching between the text contained in the image and the input text comprises:
in a case where a language of the input text is different from a language of the text contained in the image, acquiring a translation of the input text into the language of the text contained in the image; and
performing text matching between the text contained in the image and the translation of the input text.
5. The method of claim 1, wherein displaying the auxiliary information comprises:
highlighting the auxiliary information.
6. The method of claim 1 or 5, further comprising:
for the found target text, acquiring display information of the target text in the image, wherein the display information comprises at least a display area and/or a display style of the target text.
7. The method of claim 6, wherein the step of displaying or highlighting the auxiliary information comprises:
configuring a display area of the auxiliary information based on a display area of the target text, the display area of the auxiliary information covering the display area of the target text.
8. The method of claim 6, wherein the step of displaying or highlighting the auxiliary information comprises:
configuring a display style of the auxiliary information based on a display style of the target text.
9. An input processing method, comprising:
receiving input from a user;
acquiring an image related to the input;
sending the input and the image to a server, so that the server searches the image for target text matching the input; and
for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
10. An input processing method, comprising:
receiving input from a user;
acquiring an image related to the input;
searching the image for target text matching the input; and
for the found target text, highlighting the target text in the image and/or displaying the translation of the target text.
11. The method of claim 10, further comprising:
obtaining input text based on the input, the input text being represented in an input language;
wherein searching the image for target text matching the input comprises:
acquiring text contained in the image, the text contained in the image being represented in a source language different from the input language;
obtaining a translation of the input text into the source language; and
performing text matching between the text contained in the image and the translation of the input text.
12. The method of claim 11, wherein the translation comprises a translation of the target text into the input language or a translation of the target text into a user-specified target language.
13. An input processing method, comprising:
receiving input from a user;
acquiring an image related to the input;
sending the input and the image to a server, so that the server searches the image for target text matching the input; and
for the found target text, highlighting the target text in the image and/or displaying the translation of the target text.
14. An input processing method, comprising:
receiving input from a user;
acquiring an image related to the input;
searching the image for a target image matching the input; and
for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
15. The method of claim 14, wherein the user input comprises at least one of: text, audio, video, and images, the method further comprising:
acquiring an input image based on the input.
16. The method of claim 15, wherein searching the image for a target image matching the input comprises:
acquiring image features of the image and the input image; and
performing image matching between the image and the input image based on the image features.
17. The method of claim 14, wherein displaying the auxiliary information comprises:
highlighting the auxiliary information.
18. The method of claim 14 or 17, further comprising:
for the found target image, acquiring a display area of the target image in the image.
19. The method of claim 18, wherein the step of displaying or highlighting the auxiliary information comprises:
configuring a display area of the auxiliary information based on a display area of the target image, the display area of the auxiliary information covering the display area of the target image.
20. An input processing method, comprising:
receiving input from a user;
acquiring an image related to the input;
sending the input and the image to a server, so that the server searches the image for a target image matching the input; and
for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
21. An input processing method, comprising:
receiving input from a user;
obtaining a source object associated with the input;
searching the source object for a target object matching the input; and
for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
22. The method of claim 21, wherein the user input comprises at least one of: text, audio, video and images, the source objects including at least one of: text, images, and video.
23. The method of claim 21, wherein the target object is a target text or a target image.
24. An input processing method, comprising:
receiving input from a user;
obtaining a source object associated with the input;
sending the input and the source object to a server, so that the server searches the source object for a target object matching the input; and
for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
25. An input processing apparatus comprising:
an interaction module adapted to receive input from a user;
an image acquisition module adapted to acquire an image related to the input;
a text matching module adapted to search the image for target text matching the input; wherein
the interaction module is further adapted to, for found target text, highlight the target text in the image and/or display auxiliary information of the target text.
26. An input processing apparatus comprising:
an interaction module adapted to receive input from a user;
an image acquisition module adapted to acquire an image related to the input;
an image matching module adapted to search the image for a target image matching the input; wherein
the interaction module is further adapted to, for a found target image, highlight the target image in the image and/or display auxiliary information of the target image.
27. A computing device, comprising:
one or more processors;
a memory; and
a program, wherein the program is stored in the memory and configured to be executed by the one or more processors, the program comprising instructions for performing the input processing method of any of claims 1-24.
28. A computer-readable storage medium storing a program, the program comprising instructions that, when executed by a computing device, cause the computing device to perform the input processing method of any of claims 1-24.
CN202010019082.XA (priority date 2020-01-08, filing date 2020-01-08, status: pending, published as CN113095090A): Input processing method and device and computing equipment

Priority Applications (2)

CN202010019082.XA (priority date 2020-01-08, filing date 2020-01-08): Input processing method and device and computing equipment
PCT/CN2021/070410 (filing date 2021-01-06): Input processing method and apparatus, and computing device

Publications (1)

Publication number: CN113095090A — publication date: 2021-07-09

Family ID: 76664046

Family Applications (1)

CN202010019082.XA (pending, published as CN113095090A): Input processing method and device and computing equipment — priority date and filing date: 2020-01-08

Country Status (2)

CN: CN113095090A
WO: WO2021139667A1

Also Published As

WO2021139667A1, published 2021-07-15

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
TA01: Transfer of patent application right

Effective date of registration: 2024-02-29

Address after: #01-21, Lazada One, 51 Bras Basah Road, Singapore
Applicant after: Alibaba Singapore Holdings Ltd.
Country or region after: Singapore

Address before: Fourth Floor, Capital Building, P.O. Box 847, Grand Cayman, Cayman Islands
Applicant before: ALIBABA GROUP HOLDING Ltd.
Country or region before: Cayman Islands