WO2021139667A1 - Input processing method and apparatus, and computing device - Google Patents

Input processing method and apparatus, and computing device

Info

Publication number
WO2021139667A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
input
text
target
auxiliary information
Prior art date
Application number
PCT/CN2021/070410
Other languages
French (fr)
Chinese (zh)
Inventor
葛妮瑜
方视菁
胡雪梅
李洁
Original Assignee
阿里巴巴集团控股有限公司
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2021139667A1 publication Critical patent/WO2021139667A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 - Querying
    • G06F16/535 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/42 - Data-driven translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/418 - Document matching, e.g. of document images

Definitions

  • the present invention relates to the field of computer technology, in particular to an input processing method, device and computing equipment.
  • computing devices such as mobile terminals and personal computers
  • users are becoming more and more accustomed to using computing devices to handle various daily affairs.
  • a user may wish to use a computing device to quickly locate part of the information the user pays attention to, or obtain auxiliary information of the part of the information the user pays attention to.
  • a typical application scenario is that when traveling in a region where different languages are spoken, and faced with a large amount of information in an unfamiliar language, the user hopes to use a computing device to locate part of the information that the user pays attention to and obtain the translation of this part of the information.
  • in a traditional solution, a user can use a camera installed in a computing device to capture an image that includes a large amount of information; the computing device recognizes all the information in the image and displays auxiliary information for all of it to the user.
  • the image may be an image of a map, which includes the names of many different places.
  • a user interface that displays the translations of all the place names may be tiring for the user, because the user needs to read them one by one to find the place he cares about or is interested in and obtain the translation for that place. This process is quite time-consuming and laborious, which reduces the user experience.
  • embodiments of the present invention provide an input processing method, device, and computing device to try to solve or at least alleviate the above problems.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; searching for a target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
  • the user's input includes at least one of the following: text, image, video, and audio.
  • the method further includes: acquiring the input text based on the user's input.
  • the step of searching for the target text in the image that matches the input includes: obtaining the text contained in the image; and performing text matching between the text contained in the image and the input text.
  • the step of matching the text contained in the image with the input text includes: in the case that the language of the input text is different from the language of the text contained in the image, obtaining a translation of the input text into the language of the text contained in the image; and matching the text contained in the image against the translation of the input text.
  • the step of displaying the auxiliary information includes: highlighting the auxiliary information.
  • the method further includes: for the found target text, obtaining display information of the target text in the image, the display information including at least the display area and/or display style of the target text.
  • the step of displaying or highlighting the auxiliary information includes: configuring the display area of the auxiliary information based on the display area of the target text, and the display area of the auxiliary information covers the display area of the target text.
  • the step of displaying or highlighting the auxiliary information includes: configuring the display style of the auxiliary information based on the display style of the target text.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; sending the input and the image to a server, so that the server finds the target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; searching for a target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
  • the method further includes: obtaining an input text based on the input, the input text being expressed in an input language; and the step of searching for the target text in the image that matches the input includes: obtaining the text contained in the image, the text contained in the image being expressed in a source language different from the input language; obtaining a translation of the input text into the source language; and matching the text contained in the image against the translation of the input text.
  • the translation includes the translation of the target text into the input language, or the translation of the target text into the target language specified by the user.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; sending the input and the image to a server, so that the server finds the target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; searching for a target image in the image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  • the user's input includes at least one of the following: text, audio, video, and image
  • the method further includes: acquiring an input image based on the input.
  • the step of searching for a target image in the image that matches the input includes: acquiring image features of the image and the input image; and performing image matching between the image and the input image based on the image features.
  • the step of displaying the auxiliary information includes: highlighting the auxiliary information.
  • the method further includes: for the searched target image, acquiring the display area of the target image in the image.
  • the step of displaying or highlighting the auxiliary information includes: configuring the display area of the auxiliary information based on the display area of the target image, and the display area of the auxiliary information covers the display area of the target image.
  • an input processing method, including: receiving a user's input; acquiring an image related to the input; sending the input and the image to a server, so that the server finds the target image in the image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  • an input processing method, including: receiving a user's input; obtaining a source object related to the input; searching for a target object in the source object that matches the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  • the user's input includes at least one of the following: text, audio, video, and image
  • the source object includes at least one of the following: text, image, and video.
  • the target object is a target text or a target image.
  • an input processing method, including: receiving a user's input; obtaining a source object related to the input; sending the input and the source object to a server, so that the server finds the target object in the source object that matches the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  • an input processing device, including: an interaction module adapted to receive a user's input; an image acquisition module adapted to acquire an image related to the input; and a text matching module adapted to search for the target text in the image that matches the input; wherein the interaction module is further adapted to, for the found target text, highlight the target text in the image and/or display auxiliary information of the target text.
  • an input processing device, including: an interaction module adapted to receive a user's input; an image acquisition module adapted to acquire an image related to the input; and an image matching module adapted to search for the target image in the image that matches the input; wherein the interaction module is further adapted to, for the found target image, highlight the target image in the image and/or display auxiliary information of the target image.
  • a computing device, including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing the input processing method according to the embodiment of the present invention.
  • a computer-readable storage medium storing one or more programs.
  • the one or more programs include instructions that, when executed by a computing device, cause the computing device to execute the input processing method according to the embodiment of the present invention.
  • according to the input processing solution of the embodiments of the present invention, after receiving the user's input and the source object (for example, an image) related to the input, the target object (for example, target text or a target image) that the user pays attention to is highlighted in the source object for the user, and/or the auxiliary information of the target object is displayed, which avoids the tedious operation of reading through the source object to find the target object the user pays attention to and improves the user experience.
  • moreover, the input processing solution according to the embodiments of the present invention only needs to obtain the auxiliary information of the target object included in the source object, which greatly reduces the workload.
  • Fig. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention
  • FIG. 2 shows a structural block diagram of an input processing device 200 according to an embodiment of the present invention
  • Fig. 3 shows a schematic diagram of an input-related image according to an embodiment of the present invention
  • FIGS. 4A-4C respectively show screenshots of a user interface for displaying the translation of target text in an image according to an embodiment of the present invention.
  • FIG. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention
  • FIG. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention
  • FIG. 7 shows a structural block diagram of an input processing device 700 according to an embodiment of the present invention.
  • FIG. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention.
  • FIG. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention.
  • an embodiment of the present invention discloses an input processing device, which can receive a user's input, obtain an image that is related to the input and carries a large amount of information, and highlight the target object that the user pays attention to in the image and/or display auxiliary information of the target object, so that the user can quickly locate the target object and/or obtain its auxiliary information.
  • Fig. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention.
  • the computing device 100 is an electronic device capable of collecting and/or displaying images, such as a personal computer, a mobile communication device (for example, a smart phone), a tablet computer, and other devices that can collect and/or display images.
  • the computing device 100 may include a memory interface 102, one or more processors 104, and a peripheral interface 106.
  • the memory interface 102, the one or more processors 104, and/or the peripheral interface 106 may be discrete components or integrated in one or more integrated circuits.
  • various elements may be coupled through one or more communication buses or signal lines. Sensors, devices, and subsystems can be coupled to the peripheral interface 106 to help achieve multiple functions.
  • the motion sensor 110, the light sensor 112, and the distance sensor 114 may be coupled to the peripheral interface 106 to facilitate functions such as orientation, lighting, and distance measurement.
  • Other sensors 116 can also be connected to the peripheral interface 106, such as a positioning system (such as a GPS receiver), a temperature sensor, a biometric sensor or other sensing devices, which can help implement related functions.
  • the camera subsystem 120 and the optical sensor 122 can be used to facilitate the realization of camera functions such as capturing images.
  • the camera subsystem 120 and the optical sensor 122 can be, for example, a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS).
  • the computing device 100 may help implement communication functions through one or more wireless communication subsystems 124, where the wireless communication subsystem 124 may include a radio frequency receiver and transmitter and/or an optical (for example, infrared) receiver and transmitter.
  • the specific design and implementation of the wireless communication subsystem 124 may depend on one or more communication networks supported by the computing device 100.
  • the computing device 100 may include a wireless communication subsystem 124 designed to support a GSM network, a GPRS network, an EDGE network, a Wi-Fi or WiMax network, and a BluetoothTM network.
  • the audio subsystem 126 may be coupled with the speaker 128 and the microphone 130 to help implement voice-enabled functions, such as voice recognition, voice reproduction, digital recording, and telephony functions.
  • the I/O subsystem 140 may include a display controller 142 and/or one or more other input controllers 144.
  • the display controller 142 may be coupled to the display 146.
  • the display 146 may be, for example, a liquid crystal display (LCD), a touch screen, or other types of displays.
  • the display 146 and the display controller 142 may use any one of a variety of touch sensing technologies to detect contact and its movement or interruption, where the sensing technologies include but are not limited to capacitive, resistive, infrared, and surface acoustic wave technologies.
  • One or more other input controllers 144 may be coupled to other input/control devices 148, such as one or more buttons, rocker switches, thumbwheels, infrared ports, USB ports, and/or pointing devices such as a stylus.
  • One or more of the buttons may include an up/down button for controlling the volume of the speaker 128 and/or the microphone 130.
  • the memory interface 102 may be coupled with the memory 150.
  • the memory 150 may include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (eg, NAND, NOR).
  • the memory 150 may store a program 154, and the program 154 runs on the operating system 152 stored in the memory 150.
  • the operating system 152 will be loaded from the memory 150 and executed by the processor 104.
  • when the program 154 is run, it is also loaded from the memory 150 and executed by the processor 104.
  • one of the programs is the input processing apparatus 200, 700 according to the embodiment of the present invention, and includes instructions configured to execute the input processing methods 500, 600, and 800 according to the embodiment of the present invention.
  • the following describes the input processing device 200 and the input processing methods 500 and 600 executed by it.
  • the input-related image displays multiple texts (also referred to as containing multiple texts), and the user wishes to locate the target text in the multiple texts displayed in the image.
  • the input processing device 200 may receive the user's input on a dish of interest, and obtain an image of a restaurant menu.
  • the image of the restaurant menu includes the logo text of multiple dishes, and the input processing device 200 can highlight the logo text of the dish that the user is interested in in the image, so that the user can quickly locate the dish of interest.
  • the evaluation or introduction of the dish can also be displayed for the user's reference.
  • the input processing device 200 may receive a user's input to a destination subway station, and obtain an image of a subway line.
  • the image of the subway line includes the logo text of multiple subway stations, and the input processing device 200 can highlight the logo text of the target subway station in the image, so that the user can quickly locate the target subway station.
  • the translation of the destination subway station can also be displayed for the user's reference.
  • the input processing device 200 may receive a user's input on a commodity that he wants to purchase, and obtain an image of a store shelf.
  • the image of the store shelf includes the logo text of multiple commodities, and the input processing device 200 can highlight the logo text of the commodity that the user wants to purchase in the image, so that the user can quickly locate the commodity.
  • the evaluation, introduction or reference price of the product can also be displayed for the user's reference.
  • Fig. 2 shows a schematic diagram of an input processing device 200 according to an embodiment of the present invention.
  • the input processing apparatus 200 includes an interaction module 210, which can receive a user's input, and the input can indicate a target text that the user pays attention to.
  • the interaction module 210 may receive user input via a user interface (which will be described in detail later).
  • the user's input can be in various forms, for example, it can include but is not limited to one of the following: text, image, video, and audio.
  • the interaction module 210 may send the input to the recognition module 220, and the recognition module 220 obtains text based on the user's input.
  • any image recognition technology and voice recognition technology can be used to obtain the text, which is not limited in the present invention.
  • herein, the text input by the user, the text obtained based on an image input by the user, the text obtained based on a video input by the user, and the text obtained based on audio input by the user are collectively referred to as input text.
  • the input text can be one or multiple input texts.
  • multiple input texts can be distinguished based on separators such as punctuation marks.
  • for example, the user's input is "Chaoyangmen; Dongqiao; Jintai Road", that is, the following three input texts separated by semicolons: "Chaoyangmen", "Dongqiao", and "Jintai Road" (a splitting sketch is given below).
  • if the user inputs an image, multiple input texts obtained based on the image can be distinguished according to the different text blocks of the input text in the image (which will be described in detail later).
  • if the user inputs audio, multiple input texts obtained based on the audio can be distinguished by pauses or by speaking a separator word such as "interval".
  • if the user inputs a video, since the video includes images and audio, processing can be performed with reference to the cases of image input and/or audio input.
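  • As an illustration only (a minimal sketch that is not part of the original disclosure; the separator set and the function name are assumptions), splitting a raw text input into multiple items of input text could look like this:

```python
import re

def split_input_texts(raw_input: str) -> list:
    """Split a raw text input into individual input texts using common
    punctuation separators (Latin and CJK semicolons/commas assumed here)."""
    parts = re.split(r"[;,；，、]+", raw_input)
    return [p.strip() for p in parts if p.strip()]

# Example from the description: three input texts separated by semicolons.
print(split_input_texts("Chaoyangmen; Dongqiao; Jintai Road"))
# -> ['Chaoyangmen', 'Dongqiao', 'Jintai Road']
```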
  • the text matching module 230 in the input processing device 200 may search for a target text matching the input in an image related to the input.
  • the input processing device 200 further includes an image acquisition module 240, which can acquire an image related to the user's input.
  • the camera of the computing device 100 can be used to collect related images, or the related images sent to the input processing device 200 can be received via the network, and the source of the related images is not limited in the present invention.
  • the image related to the input may include multiple text blocks.
  • the input-related image displays multiple texts; the image block corresponding to each text in the image is a text block, and the text contained in each text block is one item of text.
  • Fig. 3 shows a schematic diagram of an input-related image according to an embodiment of the present invention.
  • the image is an image of a subway line and includes the logo text of many subway stations (that is, the names of the subway stations).
  • the image block occupied by the name of each subway station in the image is a text block, and the name of the subway station displayed in each text block is a text.
  • the input-related images can be in various formats, for example, bitmap image formats such as JPEG, BMP, PNG, etc., and vector graphics formats such as SVG and SWF.
  • the present invention does not limit the format of the image.
  • in some cases, the image acquisition module 240 can directly acquire the text contained in the image (without image recognition).
  • in other cases, the image acquisition module 240 needs to send the image to the recognition module 220, and the recognition module 220 performs image recognition.
  • the recognition module 220 can obtain the text contained in the image, that is, obtain multiple texts contained in multiple text blocks.
  • the recognition module 220 may use optical character recognition (OCR, Optical Character Recognition) technology to analyze the image and recognize text in the image.
  • the recognition module 220 can also detect texts in multiple different languages.
  • the recognition module 220 may include an OCR engine capable of recognizing text in multiple languages, or an OCR engine for each of multiple different languages.
  • other image recognition technologies can also be used to recognize the text in the image, which is not limited in the present invention.
  • the recognition module 220 may also detect the display information of the text in the image.
  • the display information includes but is not limited to the display area and/or display style of the text in the image.
  • the display area indicates the position of the text in the image, such as the coordinates of the text block where the text is located.
  • the display style can include text color, background color, font size, font type, and so on. In some embodiments, this display information can be used to identify different text blocks in the image. For example, in the case where two parts of text have different font colors, different background colors, or are separated from each other (for example, separated by at least a threshold distance), the recognition module 220 may determine that the two parts of text in the image are two texts contained in two different text blocks.
  • the recognition module 220 can also use natural language processing (NLP) technology to correct errors generated in image recognition, such as segmentation errors, text errors, grammatical errors, and so on.
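  • For concreteness, recognizing the texts contained in an image together with their display areas could look like the following minimal sketch. It assumes the Tesseract engine accessed through the pytesseract library purely as an example; the embodiment does not prescribe a particular OCR engine, and the function name and confidence filter are assumptions.

```python
import pytesseract
from pytesseract import Output
from PIL import Image

def recognize_text_blocks(image_path: str, lang: str = "eng") -> list:
    """Return (text, bounding_box) pairs for text detected in the image.
    The bounding box (left, top, right, bottom) serves as the display area."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, lang=lang, output_type=Output.DICT)
    blocks = []
    for text, left, top, width, height, conf in zip(
            data["text"], data["left"], data["top"],
            data["width"], data["height"], data["conf"]):
        if text.strip() and float(conf) > 0:  # keep confident, non-empty detections
            blocks.append((text.strip(), (left, top, left + width, top + height)))
    return blocks
```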
  • the image acquisition module 240 sends the text contained in the image to the text matching module 230.
  • the text matching module 230 may perform text matching between the text contained in the image and the input text to find a target text in the image that matches the input text. Specifically, for each item of input text, the text contained in the image is matched against that item of input text to find the target text in the image that matches it.
  • the text matching module 230 can determine whether the input text and the text in the image acquired by the image acquisition module 240 use the same language.
  • in some cases, the language of the text in the image acquired by the image acquisition module 240 is the same as the language of the input text. That is, if the language of the text in the image is called the source language and the language of the input text is called the input language, the source language and the input language are the same language, and the text matching module 230 may directly find the target text that matches the input text among the texts contained in the image, for example, by directly searching, among the multiple texts contained in the image, for a target text that includes the item of input text or at least a part of it.
  • the language of the text in the image acquired by the image acquisition module 240 and the language of the input text are two different languages. That is, the source language and the input language are two different languages.
  • the text matching module 230 needs to first translate the text obtained from the image into the input language, or translate the input text into the source language.
  • the text matching module 230 may send the text to be translated to the translation engine 250 for translation.
  • the text matching module 230 performs text matching between the text contained in the image and the translation of the input text into the source language, so as to find the target text that matches the input text in the text contained in the image.
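  • A minimal sketch of this matching step follows. The function name and the `translate_to_source` callable standing in for the translation engine 250 are assumptions, and simple substring matching is used only as one possible matching criterion, not the prescribed one.

```python
def find_target_texts(image_texts, input_texts, translate_to_source=None):
    """For each item of input text, collect the image texts that contain it.
    If the input language differs from the image's source language,
    `translate_to_source` translates the query before matching."""
    matches = {}
    for query in input_texts:
        needle = translate_to_source(query) if translate_to_source else query
        matches[query] = [t for t in image_texts if needle.lower() in t.lower()]
    return matches

# e.g. find_target_texts(["King's Cross St Pancras", "Angel"], ["King's Cross"])
# -> {"King's Cross": ["King's Cross St Pancras"]}
```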
  • the user interface can be used to prompt the user. If the target text in the image that matches the item of input text is found, the vibration, ringing, blinking signal light, and/or user interface of the computing device 100 can be used to prompt the user.
  • the text matching module 230 may prominently display, in the image, the target text that matches the input text.
  • for example, various marks such as borders (for example, rectangular boxes), shapes (for example, arrows), or line segments (for example, wavy lines) may be added to the target text in the image so that the user can quickly locate it; the target text may also be highlighted in the text block where it is located in the image (for example, by covering the text block; refer to the description of highlighting auxiliary information below).
  • the text matching module 230 may obtain auxiliary information of the target text that matches the input text. It should be noted that the embodiment of the present invention does not limit the specific content of the auxiliary information, and any information related to the target text that can assist the user is within the protection scope of the present invention.
  • the auxiliary information may be information such as reviews, introductions, reference prices, purchase channels, etc.
  • the text matching module 230 may obtain such auxiliary information from various search engines, review websites, or shopping platforms.
  • the auxiliary information may be translation.
  • the text matching module 230 may send the target text matching the input text to the translation engine 250 to obtain the translation of the target text.
  • the translation engine 250 can translate the target text into different languages.
  • the translation engine 250 may translate the text into the target language specified by the user, or into the default language of the computing device 100 (for example, when the target language is not specified), or into the input language of the input text (for example, when the target language is not specified and the input language differs from the default language).
  • the user can use the user interface to specify the target language (will be described in detail later).
  • the interaction module 210 may display or highlight the auxiliary information using a user interface displaying an image related to the input.
  • the interaction module 210 may obtain the display information of the target text in the image, and display or highlight the auxiliary information of the target text based on the display information.
  • the display area of the auxiliary information can be configured based on the display area of the target text.
  • the display area of the auxiliary information can cover the corresponding text block of the target text in the image, and can also be close to the corresponding text block of the target text in the image (for example, displayed around the corresponding text block).
  • the display style of the auxiliary information can be configured based on the display style of the target text.
  • at least part of the display style of the auxiliary information may be configured to be consistent with the corresponding display style of the target text (for example, the font size and font type configuration are consistent).
  • at least part of the display style of the auxiliary information may be configured to be significantly different from the corresponding display style of the target text.
  • the background color (or text color) of the auxiliary information can be configured as a bright color or a contrast color of the background color of the text contained in the image to highlight the auxiliary information.
  • display styles such as underline, bold, italic, text background, text shading, and text border may be used to highlight the auxiliary information.
  • the embodiment of the present invention does not limit the specific display style used to display or highlight the auxiliary information.
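  • As a sketch of how auxiliary information might cover the display area of the target text, the following uses the Pillow library purely as an illustration; the colors, font, and function name are assumptions rather than a prescribed display style.

```python
from PIL import Image, ImageDraw, ImageFont

def overlay_auxiliary_info(image_path, box, auxiliary_text, out_path):
    """Cover the target text's display area (box = left, top, right, bottom)
    with an overlay and draw the auxiliary information (e.g. a translation)
    inside it, so the overlay occupies roughly the same area as the text block."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    draw.rectangle(box, fill="white", outline="blue", width=2)  # cover the text block
    font = ImageFont.load_default()                             # placeholder font choice
    draw.text((box[0] + 4, box[1] + 4), auxiliary_text, fill="blue", font=font)
    image.save(out_path)
```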
  • the mark of the target text matching the same input text in the image or the display style of its auxiliary information can be the same.
  • the mark of the target text or the display style of its auxiliary information that matches the input text of different items can be different.
  • the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copy and paste.
  • the interaction module 210 may also receive the user's zoom instruction or selection instruction on the displayed auxiliary information, and in response to that instruction, zoom in or out of the displayed auxiliary information accordingly to facilitate the user's reading.
  • the interaction module 210 may also receive a save instruction from the user, and in response to the save instruction, store the above-mentioned image displaying the auxiliary information in the computing device 100, so as to facilitate subsequent viewing by the user.
  • the interaction module 210 may also receive a user's selection instruction for other text blocks that do not display auxiliary information, and in response to the user's selection instruction for other text blocks that do not display auxiliary information, obtain and display The auxiliary information of the text contained in the selected text block.
  • the specific display configuration has been described in detail in the previous section, and will not be repeated here.
  • FIGS. 4A to 4C show screenshots of a user interface for displaying the translation of target text in an image according to an embodiment of the present invention.
  • the user interface 410 displays an image 411, and the image 411 may be collected in response to the user's selection of the camera button 412.
  • the image 411 includes a plurality of text blocks, and each text block includes one item of text.
  • each text block included in the image 411 includes an English text.
  • the text block 416 includes an English text "King's Cross St Pancras".
  • the user interface 410 may also enable the user to select a language for translation. Specifically, the user interface 410 enables the user to select the source language 413, so that multiple texts in the source language will be recognized in the image 411. The user interface 410 also enables the user to select the target language 414 into which the text will be translated. In the example of FIG. 4A, the user has selected the source language 413 as English and the target language 414 as Chinese. That is, the user wants to translate the English text recognized in the image 411 into Chinese text.
  • the user interface 410 may also enable the user to input.
  • the user interface includes an input button 415.
  • the user interface 420 as shown in FIG. 4B may be displayed.
  • the user interface 420 is similar to the user interface 410 and includes an image 411 and an input button 415, and also includes a control 421.
  • the control 421 can receive the user's input, and can also prompt the user to provide input in the target language 414. For example, input text in the target language 414, or an image/audio containing input text in the target language 414, may be entered. In the example of FIG. 4B, the control 421 prompts the user to input Chinese input text. After the user inputs, in response to the control 421 receiving the user's input, a user interface 430 as shown in FIG. 4C may be displayed.
  • the user interface 430 is similar to the user interface 410 and includes an image 411 and an input button 415, and also includes a control 431.
  • the control 431 can display one or more items of input text obtained from the user's input, and can also display the search result of at least one item of text that matches in the image 411 based on the one or more items of input text.
  • the control 431 displays an input text "King's Cross" and the search result of the input text "A King's Cross has been found".
  • the user interface 430 also displays an overlay 432 on top of the image 411.
  • the overlay 432 displays the translation 433 of the target text matching the user's input text in the image 411, and overlays the text block corresponding to the matching target text.
  • since the text "King's Cross St Pancras" contained in the text block 416 in the image 411 matches the user's input text "King's Cross", the overlay 432 displays the translation 433 of the text "King's Cross St Pancras" and covers the corresponding text block 416.
  • the user interface 430 may configure the display style of the translation of the target text based on the display style of the matched target text.
  • the background color of the translation 433 and that of the matching target text "King's Cross St Pancras" are similar (for example, both are white), and their text colors are similar (for example, both are blue), which helps the translation 433 to be displayed in coordination with the image 411.
  • the user interface 430 may also highlight the translation.
  • the background color of the text of the translation 433 is a bright color (for example, bright blue), which helps to highlight the translation 433.
  • the user can quickly and easily locate the target text of concern or interest in the image 411 and obtain its translation, without passively reading the translations of all the text in the image. For example, after capturing the image 411 and inputting "King's Cross" through the user interface 420, the user can easily locate the "King's Cross" subway station through the user interface 430 and see its translation.
  • the user interface 430 may also enable the user to download and store the image 411 with the translation 433 displayed.
  • the user interface 430 includes a download button 434.
  • the image 411 displaying the translation 433 can be downloaded and stored for the user to view later.
  • FIG. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention.
  • the input processing method 500 can be executed in the input processing device 200.
  • the input processing method 500 starts at step S510.
  • in step S510, the user's input is received.
  • in step S520, an image related to the user's input is acquired.
  • the user's input may include at least one of the following: text, image, video, and audio.
  • the input text may be obtained based on the user's input.
  • in step S530, the target text in the image that matches the user's input is searched for.
  • the text contained in the image can be obtained, and the text contained in the image can be matched with the input text to find the matching target text.
  • in the case where the language of the input text is different from the language of the text contained in the image, the translation of the input text into the language of the text contained in the image can be obtained, and the text contained in the image can be matched against the translation of the input text.
  • for the found target text, the target text may be highlighted in the input-related image, and/or auxiliary information of the target text may be displayed. In addition, the auxiliary information can also be highlighted.
  • the display information of the target text in the above image can be acquired, and the auxiliary information can be displayed or highlighted based on the display information.
  • the display information may at least include the display area and/or display style of the target text.
  • the display area of the auxiliary information may be configured based on the display area of the target text.
  • the display area of the auxiliary information may cover or be close to the display area of the target text.
  • the display style of the auxiliary information can also be configured based on the display style of the target text.
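  • Composing the helper sketches above (split_input_texts and recognize_text_blocks), one hypothetical end-to-end flow of method 500 (and, with translation, method 600) could look like the following. The `to_source` and `to_target` callables stand in for the translation engine and are assumptions, not a prescribed API, and the red highlight is only an illustrative display style.

```python
from PIL import Image, ImageDraw

def process_input(user_input, image_path, out_path, to_source=None, to_target=None):
    """Sketch of method 500/600: split the input (S510), recognize the texts in
    the related image acquired in S520, match them against the input texts (S530),
    then highlight the found target texts and draw their auxiliary information."""
    input_texts = split_input_texts(user_input)     # helper sketched earlier
    blocks = recognize_text_blocks(image_path)      # helper sketched earlier
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for query in input_texts:
        needle = (to_source(query) if to_source else query).lower()
        for text, box in blocks:
            if needle in text.lower():
                draw.rectangle(box, outline="red", width=3)       # highlight target text
                aux = to_target(text) if to_target else text      # auxiliary information
                draw.text((box[0], box[3] + 2), aux, fill="red")  # shown below the block
    image.save(out_path)
```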
  • Fig. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention.
  • the input processing method 600 can be executed in the input processing device 200. As shown in FIG. 6, the input processing method 600 starts at step S610.
  • in step S610, a user's input is received.
  • in step S620, an image related to the user's input is acquired.
  • the input text may also be obtained based on the input.
  • in step S630, the target text in the image that matches the user's input is searched for.
  • the text contained in the image can be obtained.
  • the input text is expressed in the input language
  • the text contained in the image is expressed in a source language different from the input language. Therefore, it is necessary to obtain the translation of the input text into the source language, and then perform text matching between the text contained in the image and the translation of the input text to find the matching target text.
  • the translation may include the translation of the target text into the input language, or the translation of the target text into the target language specified by the user.
  • the target text may be highlighted in the image, and/or the translation of the target text may be displayed.
  • although the input processing apparatus 200 is illustrated as including an interaction module 210, a recognition module 220, a text matching module 230, an image acquisition module 240, and a translation engine 250, one or more of these modules may be stored on and/or executed by other devices, such as a server that communicates with the input processing apparatus 200 (for example, a server that can perform image recognition, voice recognition, text matching, and language translation).
  • the input processing device 200 may receive user input, obtain an image related to the input, and send the input and the image to the server. The server searches for the target text that matches the input in the image, and returns the search result to the input processing device 200.
  • the input processing device 200 then highlights the target text found by the server in the image, and/or displays auxiliary information of the target text.
  • auxiliary information of the target text please refer to the relevant description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1 to 5, and will not be repeated here.
  • the following describes the input processing device 700 and the input processing method 800 executed by it.
  • the image related to the input may be considered to include multiple image blocks.
  • the user wants to locate the image block he focuses on among the multiple image blocks included in the image, that is, the target image.
  • the input processing device 700 may receive the user's input of a commodity that he wants to purchase, and obtain an image of a store shelf.
  • the image of the store shelf includes logo images of multiple commodities, and the input processing device 700 can highlight the logo images of the commodities that the user wants to purchase in the images, so that the user can quickly locate the commodities.
  • the evaluation, introduction or reference price of the product can also be displayed for the user's reference.
  • FIG. 7 shows a schematic diagram of an input processing device 700 according to an embodiment of the present invention.
  • the input processing device 700 includes an interaction module 710, an image acquisition module 720, and an image matching module 730.
  • the interaction module 710 may receive a user's input, and the input may indicate a target image that the user pays attention to.
  • the image acquisition module 720 may acquire an image related to the user's input.
  • the image matching module 730 is coupled to the interaction module 710 and the image acquisition module 720, and can search for a target image matching the user's input in the above-mentioned image. For the found target image, the interaction module 710 may highlight the target image in the above-mentioned image, and/or display auxiliary information of the target image.
  • FIG. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention. As shown in FIG. 8, the input processing method 800 is executed in the input processing device 700 and starts at step S810.
  • a user's input may be received.
  • a user's input may be received via a user interface, and the input may indicate a target image that the user focuses on.
  • the user's input can be in various forms; for example, it can include but is not limited to one of the following: text, image, video, and audio. If the user's input includes text, audio, or video, the input image needs to be obtained based on the user's input. Specifically, if the user inputs an image, that image is the input image. If the user inputs text, the input image can be obtained based on the text (for example, through a search engine). If the user inputs audio, text can be obtained based on the audio, and then the input image can be obtained based on that text. For example, if the user inputs the text "X doll" or speaks the words "X doll", an input image of the "X doll" can be obtained based on the input. If the user inputs a video, since the video includes images and audio, the input image can be obtained with reference to the cases of image input and audio input.
  • there can be one or more input images.
  • if the user inputs images, the multiple input images can naturally be distinguished from one another.
  • if the user inputs text, multiple texts can be obtained first (distinguished by separators such as punctuation marks), and then each input image is obtained based on each text.
  • if the user inputs audio, multiple texts can be obtained based on the audio first (texts obtained from the audio can be distinguished by pauses or by speaking a separator word such as "interval"), and then each input image is obtained based on each text.
  • if the user inputs a video, since the video includes continuous images and audio, it can be handled with reference to the cases of continuous image input and/or audio input.
  • an image related to the user's input may be acquired.
  • the camera of the computing device 100 can be used to collect related images, or the related images sent to the input processing device 700 can be received via the network, and the source of the related images is not limited in the present invention.
  • in step S830, a target image in the image that matches the user's input can be searched for.
  • specifically, for each input image, a target image matching that input image is searched for in the above-mentioned image.
  • the image features of the image and the input image(s) can be acquired first, and then, based on the acquired image features, the image can be matched with the input image so as to find the target image in the image that matches the input image.
  • any image feature extraction technology can be used to obtain image features
  • any image matching technology can be used to perform image matching, which is not limited in the present invention.
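  • As one concrete (but not prescribed) choice of feature extraction and matching, ORB keypoints with brute-force Hamming matching via OpenCV could be used; the function name and threshold values below are illustrative assumptions.

```python
import cv2

def input_image_matches(scene_path, query_path, min_good_matches=20):
    """Decide whether the input image (query) matches content in the acquired
    image (scene) using ORB features and brute-force Hamming matching."""
    scene = cv2.imread(scene_path, cv2.IMREAD_GRAYSCALE)
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    _, scene_desc = orb.detectAndCompute(scene, None)
    _, query_desc = orb.detectAndCompute(query, None)
    if scene_desc is None or query_desc is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(query_desc, scene_desc)
    good = [m for m in matches if m.distance < 50]  # crude distance threshold
    return len(good) >= min_good_matches
```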
  • the user interface can be used to prompt the user. If a target image in the image that matches the input image of the item is found, the vibration, ringing, blinking of the signal light and/or the user interface of the computing device 100 can be used to prompt the user.
  • the target image matching the input image and/or auxiliary information of the target image may be displayed prominently in the above-mentioned image.
  • the auxiliary information may be information such as introduction, evaluation, reference price, etc., which is not limited in the present invention.
  • auxiliary information related to the target image can be obtained from various search engines, review websites, or shopping platforms.
  • various marks such as borders (for example, rectangular frames), shapes (for example, arrows), line segments (for example, wavy lines), etc. may be added to the target image in the image, so as to highlight the target image. In this way, the user can quickly locate the target image without having to search one by one.
  • the display area of the target image in the above-mentioned image may be obtained, and the display area indicates the position of the target image in the image.
  • the display area of the auxiliary information is configured based on the display area of the target image.
  • the display area of the auxiliary information may cover or be close to the display area of the target image.
  • the display style of the auxiliary information can also be configured based on the display style of the text contained in the image.
  • at least part of the display style of the auxiliary information may be configured to be consistent with the corresponding display style of the text in the image (for example, the font size and font type configuration are consistent).
  • the display style of the auxiliary information may be configured to be significantly different from the corresponding display style of the text contained in the image.
  • the background color (or text color) of the auxiliary information can be configured as a bright color or a contrast color of the background color of the text contained in the image to highlight the auxiliary information.
  • display styles such as underline, bold, italic, text background, text shading, and text border may be used to highlight the auxiliary information.
  • the embodiment of the present invention does not limit the specific display style used to display or highlight the auxiliary information.
  • the marks and/or the display styles of the auxiliary information of target images matching the same input image in the above-mentioned image can be the same.
  • the marks and/or the display styles of the auxiliary information of target images matching different input images in the above-mentioned image may be different.
  • the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copy and paste.
  • it is also possible to receive the user's zoom or selection instruction on the displayed auxiliary information and zoom in or out of the auxiliary information accordingly to facilitate the user's reading. It is also possible to receive a save instruction from the user, and in response to the save instruction, store the above-mentioned image displaying the auxiliary information in the computing device 100 to facilitate subsequent viewing by the user.
  • the specific display configuration has been described in detail in the previous section, and will not be repeated here.
  • the input processing device 700 may receive user input, obtain an image related to the input, and send the input and the image to the server.
  • the server searches for a target image matching the input in the image, and returns the search result to the input processing device 700.
  • the input processing device 700 then highlights the target image found by the server in the image, and/or displays auxiliary information of the target image.
  • the embodiments of the present invention are not limited to locating target text or target images in images, but can also be extended to locating various target objects, such as target texts and target images, in various source objects such as text, images, and audio.
  • FIG. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention. As shown in FIG. 9, the input processing method 900 starts at step S910.
  • a user's input is received, and the user's input may include at least one of the following: text, audio, video, and image.
  • the source object may include at least one of the following: text, image, and video.
  • in step S930, a target object in the source object that matches the user's input can be searched for.
  • the target object may be highlighted in the source object, and/or auxiliary information of the target object may be displayed.
  • the source object may generally include multiple objects, for example, multiple pieces of text or multiple image blocks.
  • the target object may be an object that the user pays attention to among multiple objects included in the source object, for example, it may be a target text, a target image, or any other suitable objects.
  • the present invention does not limit the specific types of the target object and the source object; scenarios in which target text is searched for in an image, target text is searched for in text, a target image is searched for in an image, and so on, are all within the protection scope of the present invention.
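  • A hypothetical sketch of how method 900 might route different kinds of source objects to the matchers sketched earlier (the `source_kind` parameter and the routing logic are assumptions for illustration only, not part of the disclosed method):

```python
def find_target_objects(user_input, source_object, source_kind="image"):
    """Route to text matching or image matching depending on the kind of
    source object and the form of the user's input."""
    if source_kind == "text":
        # source object is a piece of text: match the input texts against it
        return find_target_texts([source_object], split_input_texts(user_input))
    if source_kind == "image" and isinstance(user_input, str):
        # text input against an image: recognize the image's texts, then match
        blocks = recognize_text_blocks(source_object)
        return find_target_texts([t for t, _ in blocks], split_input_texts(user_input))
    # image input against an image: feature-based matching
    return input_image_matches(source_object, user_input)
```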
  • one or more steps in the input processing method 900 may also be executed by other devices, such as a server that communicates with the input processing apparatus that executes the input processing method 900.
  • a user's input may be received, an image related to the input may be obtained, and the input and the image may be sent to the server.
  • the server searches for the target image that matches the input in the image, and returns the search result.
  • the input processing device that executes the input processing method 900 then highlights the target image found by the server in the image, and/or displays auxiliary information of the target image.
  • according to the input processing solution of the embodiments of the present invention, after receiving the user's input and the source object (for example, an image) related to the input, the target object (for example, target text or a target image) that the user pays attention to is highlighted in the source object for the user, and/or the auxiliary information of the target object is displayed, which avoids the tedious operation of reading through the source object to find the target object the user pays attention to and improves the user experience.
  • moreover, the input processing solution according to the embodiment of the present invention only needs to obtain the auxiliary information of the target object included in the source object, which greatly reduces the workload.
  • the various technologies described here can be implemented in hardware or software, or a combination of them. Therefore, the method and device of the embodiments of the present invention, or some aspects or parts thereof, may take the form of program code (i.e., instructions) embedded in a tangible medium, such as a removable hard disk, a USB flash drive, a floppy disk, a CD-ROM, or any other machine-readable storage medium, where, when the program is loaded into a machine such as a computer and executed by the machine, the machine becomes a device for practicing the embodiments of the present invention.
  • when the program code is executed on a programmable computer, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
  • the memory is configured to store program code; the processor is configured to execute the method of the embodiment of the present invention according to instructions in the program code stored in the memory.
  • readable media include readable storage media and communication media.
  • the readable storage medium stores information such as computer readable instructions, data structures, program modules, or other data.
  • Communication media generally embody computer-readable instructions, data structures, program modules or other data in modulated data signals such as carrier waves or other transmission mechanisms, and include any information delivery media. Combinations of any of the above are also included in the scope of readable media.
  • the algorithms and displays are not inherently related to any particular computer, virtual system or other equipment.
  • Various general-purpose systems can also be used with the examples of embodiments of the present invention. Based on the above description, the structure required to construct this type of system is obvious.
  • the embodiments of the present invention are not directed to any specific programming language. It should be understood that various programming languages can be used to implement the content of the embodiments of the present invention described herein, and the above description of specific languages is for the purpose of disclosing the best implementation of the embodiments of the present invention.
  • the modules, units, or components of the device in the examples disclosed herein can be arranged in the device as described in this embodiment, or alternatively can be located in one or more devices different from the device in this example.
  • the modules in the foregoing examples can be combined into one module or, in addition, can be divided into multiple sub-modules.
  • the modules, units, or components in the embodiments can be combined into one module, unit, or component, and in addition, they can be divided into multiple sub-modules, sub-units, or sub-components. Unless at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

Abstract

Disclosed are an input processing method and apparatus, and a computing device. The method comprises: receiving an input from a user; acquiring an image related to the input; searching the image for target text matching the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text. Further disclosed are a corresponding input processing apparatus, a computing device, and a storage medium.

Description

Input Processing Method, Apparatus, and Computing Device
Technical Field
The present invention relates to the field of computer technology, and in particular to an input processing method, apparatus, and computing device.
Background Art
With the development and popularization of computing devices such as mobile terminals and personal computers, users are increasingly accustomed to using computing devices to handle various daily tasks. For example, when faced with a large amount of information, a user may wish to use a computing device to quickly locate the part of the information that the user cares about, or to obtain auxiliary information for that part. A typical application scenario is travel in a region where a different language is spoken: faced with a large amount of information in an unfamiliar language, the user hopes to use a computing device to locate the part of the information that he or she cares about and to obtain a translation of that part.
In a conventional solution, a user can use a camera installed in a computing device to capture an image containing a large amount of information; the computing device recognizes all of the information in the image and displays auxiliary information for all of it to the user. However, this may be unfriendly to the user. For example, the image may be an image of a map that includes the names of many different places. A user interface that displays translations of all of the place names may be tiring for the user, because the user has to read them one by one to find the place he or she cares about or is interested in and to obtain the translation for that place. This process is time-consuming and laborious, which degrades the user experience.
Summary of the Invention
To this end, embodiments of the present invention provide an input processing method, apparatus, and computing device in an effort to solve, or at least alleviate, the above problems.
According to one aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; searching the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
Optionally, in the method according to the embodiments of the present invention, the user's input includes at least one of the following: text, image, video, and audio, and the method further comprises: obtaining input text based on the user's input.
Optionally, in the method according to the embodiments of the present invention, the step of searching the image for target text that matches the input comprises: obtaining the text contained in the image; and performing text matching between the text contained in the image and the input text.
Optionally, in the method according to the embodiments of the present invention, the step of performing text matching between the text contained in the image and the input text comprises: when the language of the input text differs from the language of the text contained in the image, obtaining a translation of the input text into the language of the text contained in the image; and performing text matching between the text contained in the image and the translation of the input text.
Optionally, in the method according to the embodiments of the present invention, the step of displaying the auxiliary information comprises: highlighting the auxiliary information.
Optionally, the method according to the embodiments of the present invention further comprises: for the found target text, obtaining display information of the target text in the image, the display information including at least a display region and/or a display style of the target text.
Optionally, in the method according to the embodiments of the present invention, the step of displaying or highlighting the auxiliary information comprises: configuring a display region of the auxiliary information based on the display region of the target text, the display region of the auxiliary information covering the display region of the target text.
Optionally, in the method according to the embodiments of the present invention, the step of displaying or highlighting the auxiliary information comprises: configuring a display style of the auxiliary information based on the display style of the target text.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; sending the input and the image to a server so that the server searches the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; searching the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
Optionally, the method according to the embodiments of the present invention further comprises: obtaining input text based on the input, the input text being expressed in an input language; and the step of searching the image for target text that matches the input comprises: obtaining the text contained in the image, the text contained in the image being expressed in a source language different from the input language; obtaining a translation of the input text into the source language; and performing text matching between the text contained in the image and the translation of the input text.
Optionally, in the method according to the embodiments of the present invention, the translation includes a translation of the target text into the input language, or a translation of the target text into a target language specified by the user.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; sending the input and the image to a server so that the server searches the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; searching the image for a target image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
Optionally, in the method according to the embodiments of the present invention, the user's input includes at least one of the following: text, audio, video, and image, and the method further comprises: obtaining an input image based on the input.
Optionally, in the method according to the embodiments of the present invention, the step of searching the image for a target image that matches the input comprises: obtaining image features of the image and of the input image; and performing image matching between the image and the input image based on the image features.
Optionally, in the method according to the embodiments of the present invention, the step of displaying the auxiliary information comprises: highlighting the auxiliary information.
Optionally, the method according to the embodiments of the present invention further comprises: for the found target image, obtaining a display region of the target image within the image.
Optionally, in the method according to the embodiments of the present invention, the step of displaying or highlighting the auxiliary information comprises: configuring a display region of the auxiliary information based on the display region of the target image, the display region of the auxiliary information covering the display region of the target image.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; acquiring an image related to the input; sending the input and the image to a server so that the server searches the image for a target image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; obtaining a source object related to the input; searching the source object for a target object that matches the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
Optionally, in the method according to the embodiments of the present invention, the user's input includes at least one of the following: text, audio, video, and image, and the source object includes at least one of the following: text, image, and video.
Optionally, in the method according to the embodiments of the present invention, the target object is a target text or a target image.
According to another aspect of the embodiments of the present invention, an input processing method is provided, comprising: receiving an input from a user; obtaining a source object related to the input; sending the input and the source object to a server so that the server searches the source object for a target object that matches the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
According to another aspect of the embodiments of the present invention, an input processing apparatus is provided, comprising: an interaction module adapted to receive an input from a user; an image acquisition module adapted to acquire an image related to the input; and a text matching module adapted to search the image for target text that matches the input; wherein the interaction module is further adapted to, for the found target text, highlight the target text in the image and/or display auxiliary information of the target text.
According to another aspect of the embodiments of the present invention, an input processing apparatus is provided, comprising: an interaction module adapted to receive an input from a user; an image acquisition module adapted to acquire an image related to the input; and an image matching module adapted to search the image for a target image that matches the input; wherein the interaction module is further adapted to, for the found target image, highlight the target image in the image and/or display auxiliary information of the target image.
According to another aspect of the embodiments of the present invention, a computing device is provided, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing the input processing methods according to the embodiments of the present invention.
According to yet another aspect of the embodiments of the present invention, a computer-readable storage medium storing one or more programs is provided. The one or more programs include instructions that, when executed by a computing device, cause the computing device to execute the input processing methods according to the embodiments of the present invention.
According to the input processing solutions of the embodiments of the present invention, by receiving the user's active input, the target object (for example, a target text or a target image) in the source object (for example, an image) that matches the input is highlighted for the user, and/or auxiliary information of the target object is displayed. This avoids the tedious operation of reading through the source object item by item to find the target object that the user cares about, and improves the user experience. Moreover, the input processing solutions of the embodiments of the present invention only need to obtain auxiliary information for the target object included in the source object, which greatly reduces the workload.
The above description is only an overview of the technical solutions of the embodiments of the present invention. In order to understand the technical means of the embodiments of the present invention more clearly, they can be implemented in accordance with the content of the specification; and in order to make the above and other objectives, features, and advantages of the embodiments of the present invention more apparent and easier to understand, specific implementations of the embodiments of the present invention are set forth below.
Brief Description of the Drawings
In order to achieve the above and related objects, certain illustrative aspects are described herein in conjunction with the following description and drawings. These aspects indicate various ways in which the principles disclosed herein can be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objectives, features, and advantages of the present disclosure will become more apparent by reading the following detailed description in conjunction with the accompanying drawings. Throughout this disclosure, the same reference numerals generally refer to the same parts or elements.
Fig. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention;
Fig. 2 shows a structural block diagram of an input processing apparatus 200 according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of an image related to an input according to an embodiment of the present invention;
Figs. 4A-4C show screenshots of a user interface for displaying translations of target text in an image according to an embodiment of the present invention;
Fig. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention;
Fig. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention;
Fig. 7 shows a structural block diagram of an input processing apparatus 700 according to an embodiment of the present invention;
Fig. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention; and
Fig. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention.
Detailed Description of Embodiments
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.
Understandably, when faced with a large amount of information, a user hopes to be able to quickly locate the part of the information that he or she cares about, and, when that part has been located, to also be shown corresponding auxiliary information (for example, a corresponding translation, review, or introduction). To this end, an embodiment of the present invention discloses an input processing apparatus that can receive a user's input, acquire an image that is related to the input and carries a large amount of information, and, by highlighting in the image the target object that the user cares about and/or displaying auxiliary information of the target object, enable the user to quickly locate the target object and/or obtain its auxiliary information.
The input processing apparatus according to the embodiments of the present invention can be implemented by the following computing device. Fig. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention. The computing device 100 is an electronic device capable of capturing and/or displaying images, such as a personal computer, a mobile communication device (for example, a smartphone), a tablet computer, or another device that can capture and/or display images.
As shown in Fig. 1, the computing device 100 may include a memory interface 102, one or more processors 104, and a peripheral interface 106. The memory interface 102, the one or more processors 104, and/or the peripheral interface 106 may be discrete components or may be integrated into one or more integrated circuits. In the computing device 100, the various components may be coupled by one or more communication buses or signal lines. Sensors, devices, and subsystems can be coupled to the peripheral interface 106 to help implement multiple functions.
For example, a motion sensor 110, a light sensor 112, and a distance sensor 114 may be coupled to the peripheral interface 106 to facilitate functions such as orientation, illumination, and ranging. Other sensors 116 can also be connected to the peripheral interface 106, such as a positioning system (for example, a GPS receiver), a temperature sensor, a biometric sensor, or another sensing device, to help implement related functions.
A camera subsystem 120 and an optical sensor 122 can be used to facilitate camera functions such as capturing images, where the camera subsystem 120 and the optical sensor 122 may employ, for example, a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) optical sensor.
The computing device 100 may help implement communication functions through one or more wireless communication subsystems 124, where the wireless communication subsystem 124 may include a radio-frequency receiver and transmitter and/or an optical (for example, infrared) receiver and transmitter. The specific design and implementation of the wireless communication subsystem 124 may depend on the one or more communication networks supported by the computing device 100. For example, the computing device 100 may include a wireless communication subsystem 124 designed to support a GSM network, a GPRS network, an EDGE network, a Wi-Fi or WiMax network, and a Bluetooth™ network.
An audio subsystem 126 may be coupled with a speaker 128 and a microphone 130 to help implement voice-enabled functions, such as voice recognition, voice reproduction, digital recording, and telephony functions.
To display images, an I/O subsystem 140 may include a display controller 142 and/or one or more other input controllers 144. The display controller 142 may be coupled to a display 146. The display 146 may be, for example, a liquid crystal display (LCD), a touch screen, or another type of display. In some implementations, the display 146 and the display controller 142 may use any of a variety of touch-sensing technologies to detect contact, movement, or pauses made with them, where the sensing technologies include, but are not limited to, capacitive, resistive, infrared, and surface acoustic wave technologies. The one or more other input controllers 144 may be coupled to other input/control devices 148, such as one or more buttons, rocker switches, thumb wheels, infrared ports, USB ports, and/or pointing devices such as a stylus. The one or more buttons (not shown) may include an up/down button for controlling the volume of the speaker 128 and/or the microphone 130.
The memory interface 102 may be coupled with a memory 150. The memory 150 may include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (for example, NAND or NOR).
The memory 150 may store programs 154 that run on an operating system 152 also stored in the memory 150. When the computing device runs, the operating system 152 is loaded from the memory 150 and executed by the processor 104. The programs 154, when running, are also loaded from the memory 150 and executed by the processor 104. One of the various programs 154 is the input processing apparatus 200 or 700 according to the embodiments of the present invention, and includes instructions configured to execute the input processing methods 500, 600, and 800 according to the embodiments of the present invention.
The input processing apparatus 200 and the input processing methods 500 and 600 that it executes are described below.
In some cases, the image related to the input displays multiple items of text (which can also be described as containing multiple items of text), and the user wishes to locate the target text among them. For example, the input processing apparatus 200 may receive the user's input for a dish of interest and acquire an image of a restaurant menu. The image of the restaurant menu includes the label text of multiple dishes, and the input processing apparatus 200 can highlight in the image the label text of the dish that the user is interested in, so that the user can quickly locate that dish. Furthermore, a review or introduction of the dish can also be displayed for the user's reference.
As another example, the input processing apparatus 200 may receive the user's input for a destination subway station and acquire an image of subway lines. The image of the subway lines includes the label text of multiple subway stations, and the input processing apparatus 200 can highlight the label text of the destination station in the image, so that the user can quickly locate the destination station. Furthermore, a translation of the destination station name can also be displayed for the user's reference.
As yet another example, the input processing apparatus 200 may receive the user's input for a product that he or she wants to buy and acquire an image of a store shelf. The image of the store shelf includes the label text of multiple products, and the input processing apparatus 200 can highlight in the image the label text of the product that the user wants to buy, so that the user can quickly locate that product. Furthermore, a review, an introduction, or a reference price of the product can also be displayed for the user's reference.
Fig. 2 shows a schematic diagram of an input processing apparatus 200 according to an embodiment of the present invention. As shown in Fig. 2, the input processing apparatus 200 includes an interaction module 210, which can receive an input from a user, the input indicating the target text that the user cares about. In some embodiments, the interaction module 210 may receive the user's input via a user interface (described in detail later). The user's input may take various forms and may include, but is not limited to, one of the following: text, image, video, and audio.
If the user's input includes an image, video, or audio, the interaction module 210 may send the input to a recognition module 220, which obtains text based on the user's input. Any image recognition technology or speech recognition technology may be used to obtain the text; the present invention places no restriction on this. For ease of description, text entered directly by the user, text obtained from an image input by the user, text obtained from a video input by the user, and text obtained from audio input by the user are all referred to herein as input text.
The input text may comprise one or more items of input text. When the user enters text directly, multiple items of input text can be distinguished by delimiters such as punctuation marks. For example, the user's input may be "朝阳门;东大桥;金台路", that is, the following three items of input text separated by semicolons: "朝阳门" (Chaoyangmen), "东大桥" (Dongdaqiao), and "金台路" (Jintai Road). When the user inputs an image, the multiple items of input text obtained from the image can be distinguished according to the different text blocks in which they appear in the image (described in detail later). When the user inputs audio, the multiple items of input text obtained from the audio can be distinguished by pauses or by speaking a separator word such as "interval". When the user inputs a video, since a video includes both images and audio, processing can proceed with reference to the image and/or audio cases.
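As a minimal illustration of how such delimiter-based splitting could work, the sketch below separates a raw text input into individual items of input text. The delimiter set and the function name are assumptions made for illustration only and are not part of the disclosed embodiments.

```python
import re

# Hypothetical helper: split a raw text input into individual input-text items.
# The delimiter set (Chinese and Western punctuation plus newlines) is an assumption.
def split_input_text(raw_input: str) -> list[str]:
    items = re.split(r"[;；,，、\n]+", raw_input)
    # Drop empty fragments and surrounding whitespace.
    return [item.strip() for item in items if item.strip()]

# Example: "朝阳门;东大桥;金台路" -> ["朝阳门", "东大桥", "金台路"]
print(split_input_text("朝阳门;东大桥;金台路"))
```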
After one or more items of input text have been obtained, the text matching module 230 in the input processing apparatus 200 can search the image related to the input for target text that matches the input.
As shown in Fig. 2, the input processing apparatus 200 further includes an image acquisition module 240, which can acquire an image related to the user's input. For example, the camera of the computing device 100 can be used to capture the related image, or the related image can be received over a network; the present invention places no restriction on the source of the related image.
The image related to the input may include multiple text blocks. As described above, the image displays multiple items of text; the image region corresponding to each item of text is a text block, and the text contained in each text block is one item of text.
Fig. 3 shows a schematic diagram of an image related to an input according to an embodiment of the present invention. The image is an image of subway lines and includes the label text of many subway stations (that is, the station names). The image region occupied by each station name is a text block, and the station name displayed in each text block is one item of text.
The image related to the input may be in various formats, for example bitmap image formats such as JPEG, BMP, and PNG, as well as vector graphics formats such as SVG and SWF. The present invention places no restriction on the image format.
For an image in a format such as SVG (Scalable Vector Graphics), the image acquisition module 240 can obtain the text contained in the image directly (without image recognition). For an image in a format such as JPEG, the image acquisition module 240 needs to send the image to the recognition module 220, which performs image recognition. The recognition module 220 can obtain the text contained in the image, that is, the multiple items of text contained in the multiple text blocks.
In some implementations, the recognition module 220 may use optical character recognition (OCR) technology to analyze the image and recognize the text in it. The recognition module 220 can also detect text in multiple different languages. For example, the recognition module 220 may include an OCR engine capable of recognizing text in multiple languages, or one OCR engine for each of multiple different languages. Of course, other image recognition technologies can also be used to recognize the text in the image; the present invention places no restriction on this.
The recognition module 220 may also detect display information of the text in the image, including but not limited to the display region and/or display style of the text in the image. The display region indicates the position of the text in the image, for example the coordinates of the text block in which the text is located. The display style may include the text color, background color, font size, font type, and so on. In some embodiments, this display information can be used to identify different text blocks in the image. For example, when two portions of text have different font colors, different background colors, or are separated from each other (for example, by at least a threshold distance), the recognition module 220 may determine that these two portions of text in the image are two items of text contained in two different text blocks.
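One possible way to obtain such per-block text together with its display region is sketched below using the open-source Tesseract engine via pytesseract. This is only an illustrative assumption, not the OCR engine of the embodiments, and the TextBlock structure and helper name are hypothetical.

```python
from dataclasses import dataclass
import pytesseract
from PIL import Image

@dataclass
class TextBlock:
    text: str                       # the item of text contained in the block
    box: tuple[int, int, int, int]  # display region: (left, top, width, height)

# Hypothetical helper: run OCR and group the recognized words into text blocks.
def extract_text_blocks(image_path: str, lang: str = "eng") -> list[TextBlock]:
    data = pytesseract.image_to_data(
        Image.open(image_path), lang=lang, output_type=pytesseract.Output.DICT
    )
    groups: dict[int, list[int]] = {}
    for i, word in enumerate(data["text"]):
        if word.strip():
            groups.setdefault(data["block_num"][i], []).append(i)
    blocks = []
    for indices in groups.values():
        left = min(data["left"][i] for i in indices)
        top = min(data["top"][i] for i in indices)
        right = max(data["left"][i] + data["width"][i] for i in indices)
        bottom = max(data["top"][i] + data["height"][i] for i in indices)
        text = " ".join(data["text"][i] for i in indices)
        blocks.append(TextBlock(text, (left, top, right - left, bottom - top)))
    return blocks
```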
In addition, according to other embodiments of the present invention, the recognition module 220 can also use natural language processing (NLP) technology to correct errors produced during image recognition, such as sentence-segmentation errors, character errors, and grammatical errors.
After obtaining the item(s) of text contained in the image, the image acquisition module 240 sends the text contained in the image to the text matching module 230. The text matching module 230 can perform text matching between the text contained in the image and the input text, so as to find target text in the image that matches the input text. For each item of input text, the text contained in the image is matched against that item of input text to find the target text in the image that matches it.
The process of finding target text that matches an item of input text is described below using one item of input text as an example. First, the text matching module 230 can determine whether the item of input text and the text in the image acquired by the image acquisition module 240 use the same language.
In some implementations, the language of the text in the image acquired by the image acquisition module 240 is the same as the language of the input text. That is, if the language of the text in the image is called the source language and the language of the input text is called the input language, the source language and the input language are the same language, and the text matching module 230 can directly search the text contained in the image for target text that matches the item of input text. For example, among the multiple items of text contained in the image, it can directly search for target text that includes the item of input text or includes at least a part of it.
In other implementations, the language of the text in the image acquired by the image acquisition module 240 and the language of the input text are two different languages. That is, the source language and the input language are two different languages. In this case, the text matching module 230 needs to first translate the text obtained from the image into the input language, or translate the item of input text into the source language.
The text matching module 230 can send the text to be translated to a translation engine 250 for translation. In some embodiments, one can choose to translate the text obtained from the image into the input language, but this requires translating every item of text in the image, which is a considerable amount of work. It is therefore preferable to translate the input text into the source language, that is, to send only the input text to the translation engine 250 for translation.
After translation, the text matching module 230 performs text matching between the text contained in the image and the translation of the item of input text into the source language, so as to find, among the text contained in the image, the target text that matches the item of input text. In some embodiments, if no target text matching the item of input text can be found in the image, the user interface can be used to prompt the user. If target text matching the item of input text is found in the image, vibration, a ring tone, a blinking indicator light of the computing device 100, and/or the user interface can be used to prompt the user.
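A minimal sketch of this matching step, assuming an external translate() callable and the TextBlock structure from the earlier sketch, might look as follows. The matching rule (substring containment after normalization) and all names are illustrative assumptions.

```python
from typing import Callable, Optional

# Hypothetical matcher: find the first text block whose text matches one item of
# input text. If the languages differ, only the input text is translated into the
# source language before matching, as described above.
def find_target_block(
    blocks: list,                               # list of TextBlock-like objects
    input_text: str,
    input_lang: str,
    source_lang: str,
    translate: Callable[[str, str, str], str],  # translate(text, from_lang, to_lang)
) -> Optional[object]:
    query = input_text
    if input_lang != source_lang:
        query = translate(input_text, input_lang, source_lang)
    query = query.casefold().strip()
    for block in blocks:
        candidate = block.text.casefold().strip()
        if not candidate or not query:
            continue
        # Match when the block contains the input text or at least a part of it.
        if query in candidate or candidate in query:
            return block
    return None  # the caller may prompt the user that nothing matched
```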
For each item of input text for which matching target text is found, in some implementations the text matching module 230 can cause the target text that matches that item of input text to be displayed prominently in the image. In some embodiments, the target text can be re-rendered prominently over the text block in which it is located in the image (for example, covering that text block); see the description of highlighting auxiliary information below.
In other embodiments, a marker such as a border (for example, a rectangular box), a shape (for example, an arrow), or a line segment (for example, a wavy line) can be added to the text block in which the target text is located in the image, thereby highlighting the target text. In this way, the user can quickly locate the target text without having to read and search item by item.
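For example, drawing a rectangular border around the matched text block could be done as in the sketch below with the Pillow library; the function name and styling choices are assumptions for illustration only.

```python
from PIL import Image, ImageDraw

# Hypothetical helper: draw a rectangular border around the matched text block
# so that the target text stands out in the image.
def highlight_block(image: Image.Image, box: tuple[int, int, int, int]) -> Image.Image:
    left, top, width, height = box
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    draw.rectangle([left, top, left + width, top + height], outline="red", width=3)
    return annotated
```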
In other implementations, for each item of input text for which matching target text is found, the text matching module 230 can obtain auxiliary information of the target text that matches that item of input text. It should be noted that the embodiments of the present invention place no restriction on the specific content of the auxiliary information; any information related to the target text that can assist the user falls within the protection scope of the present invention.
In some embodiments, the auxiliary information may be information such as reviews, introductions, reference prices, or purchase channels, and the text matching module 230 may obtain such auxiliary information from various search engines, review websites, or shopping platforms.
In other embodiments, the auxiliary information may be a translation. The text matching module 230 can send the target text that matches the item of input text to the translation engine 250 to obtain a translation of the target text. The translation engine 250 can translate the target text into different languages. For example, the translation engine 250 can translate the text into a target language specified by the user, or into the default language of the computing device 100 (for example, when no target language is specified), or into the input language of the input text (for example, when no target language is specified and the input language differs from the source language). The user can specify the target language using the user interface (described in detail later).
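The fallback order described above (a user-specified target language, then the device default, then the input language) could be expressed as a small helper like the following; it is only a sketch under those assumptions, and the function and parameter names are hypothetical.

```python
from typing import Optional

# Hypothetical helper: decide which language the target text should be translated
# into, following the fallback order described above.
def resolve_target_language(
    user_target: Optional[str],     # language explicitly chosen by the user, if any
    device_default: Optional[str],  # default language of the computing device
    input_lang: str,
    source_lang: str,
) -> str:
    if user_target:
        return user_target
    if device_default:
        return device_default
    # Fall back to the input language when it differs from the source language.
    if input_lang != source_lang:
        return input_lang
    return source_lang
```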
After obtaining the auxiliary information of the target text, the interaction module 210 can display, or prominently display, that auxiliary information in the user interface that displays the image related to the input.
Specifically, for the found target text, the interaction module 210 can obtain the display information of the target text in the image and, based on that display information, display or highlight the auxiliary information of the target text.
For example, the display region of the auxiliary information can be configured based on the display region of the target text. The display region of the auxiliary information can cover the text block corresponding to the target text in the image, or it can be placed near that text block (for example, displayed around it).
As another example, the display style of the auxiliary information can be configured based on the display style of the target text. In some cases, so that the displayed auxiliary information blends with the image, at least part of the display style of the auxiliary information can be configured to be consistent with the corresponding display style of the target text (for example, the same font size and font type). In other cases, in order to highlight the auxiliary information, at least part of its display style can be configured to differ markedly from the corresponding display style of the target text. For example, the background color (or text color) of the auxiliary information can be configured as a bright color or as a color contrasting with the background color of the text contained in the image, so as to highlight the auxiliary information. As another example, display styles such as underlining, bold, italics, text background color, text shading, and text borders can be used to highlight the auxiliary information. The embodiments of the present invention place no restriction on the specific display style used to display or highlight the auxiliary information.
In addition, since there may be multiple items of input text, in order to help the user distinguish the target texts that match different items of input text, the markers of target texts matching the same item of input text in the image, or the display styles of their auxiliary information, can be the same, while the markers of target texts matching different items of input text, or the display styles of their auxiliary information, can differ.
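As one possible rendering of these rules, the Pillow-based sketch below covers the target text block with an overlay and draws the auxiliary information (for example, a translation) on top of it. The colors, font handling, and function names are illustrative assumptions rather than the embodiments' actual rendering code.

```python
from PIL import Image, ImageDraw, ImageFont

# Hypothetical helper: cover the target text block with an overlay and draw the
# auxiliary information (e.g. a translation) in its place.
def overlay_auxiliary_info(
    image: Image.Image,
    box: tuple[int, int, int, int],  # display region of the target text
    auxiliary_text: str,
    background: str = "lightblue",   # bright color so the overlay stands out
    text_color: str = "blue",
) -> Image.Image:
    left, top, width, height = box
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    # The auxiliary information's display region covers the target text's region.
    draw.rectangle([left, top, left + width, top + height], fill=background)
    draw.text((left + 2, top + 2), auxiliary_text, fill=text_color,
              font=ImageFont.load_default())
    return annotated
```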
According to some embodiments of the present invention, the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copying and pasting.
According to other embodiments of the present invention, the interaction module 210 can also receive a zoom instruction or a selection instruction from the user with respect to the displayed auxiliary information and, in response, display the auxiliary information correspondingly reduced or enlarged, so as to facilitate the user's reading. The interaction module 210 can also receive a save instruction from the user and, in response to the save instruction, store the above image with the auxiliary information displayed on it in the computing device 100, so as to facilitate subsequent viewing by the user.
According to still other embodiments of the present invention, the interaction module 210 can also receive the user's selection instruction for another text block for which no auxiliary information is displayed and, in response to that selection instruction, obtain and display auxiliary information for the text contained in the selected text block. The specific display configuration has been described in detail above and is not repeated here.
The process in which the input processing solution according to an embodiment of the present invention locates the target text for the user and displays its translation is described below with reference to Figs. 4A to 4C.
Figs. 4A to 4C show screenshots of a user interface for displaying the translation of target text in an image according to an embodiment of the present invention. As shown in Fig. 4A, a user interface 410 displays an image 411, which may have been captured in response to the user's selection of a camera button 412. The image 411 includes multiple text blocks, each containing one item of text. In the example of Fig. 4A, every text block included in the image 411 contains one item of English text; for example, text block 416 contains the English text "King's Cross St Pancras".
The user interface 410 can also let the user select the languages used for translation. Specifically, the user interface 410 lets the user select a source language 413, so that the items of text in the source language will be recognized in the image 411. The user interface 410 also lets the user select the target language 414 into which the text will be translated. In the example of Fig. 4A, the user has selected English as the source language 413 and Chinese as the target language 414. That is, the user wants the English text recognized in the image 411 to be translated into Chinese.
The user interface 410 also lets the user provide an input. Specifically, the user interface includes an input button 415. When the user selects the input button 415, in response to that selection a user interface 420 as shown in Fig. 4B can be displayed.
The user interface 420 is similar to the user interface 410 in that it includes the image 411 and the input button 415, and it further includes a control 421. The control 421 can receive the user's input and can also prompt the user to provide the input in the target language 414. For example, the user may enter input text in the target language 414, or an image or audio containing input text in the target language 414. In the example of Fig. 4B, the control 421 prompts the user to enter Chinese input text. After the user provides the input, in response to the control 421 receiving the input, a user interface 430 as shown in Fig. 4C can be displayed.
The user interface 430 is similar to the user interface 410 in that it includes the image 411 and the input button 415, and it further includes a control 431. The control 431 can display the one or more items of input text obtained from the user's input, and can also display the result of searching the image 411, based on the one or more items of input text, for at least one matching item of text. In the example of Fig. 4C, the control 431 displays an item of input text, "国王十字" (King's Cross), and the search result for that input text, "已找到一处国王十字" ("One instance of King's Cross has been found").
The user interface 430 also displays an overlay 432 on top of the image 411. The overlay 432 displays the translation 433 of the target text in the image 411 that matches the user's input text, and covers the text block corresponding to the matching target text. In the example of Fig. 4C, the item of text "King's Cross St Pancras" contained in text block 416 of the image 411 matches the user's input text "国王十字" (King's Cross), and the overlay 432 displays the translation 433, "国王十字", of the text "King's Cross St Pancras" and covers the corresponding text block 416.
The user interface 430 can configure the display style of the translation of the target text based on the display style of the matching target text. In the example of Fig. 4C, the translation 433 has a background color similar to that of the matching target text "King's Cross St Pancras" (for example, both white) and a similar text color (for example, both blue), which helps display the translation 433 in harmony with the image 411. In addition, the user interface 430 can also display the translation prominently. In the example of Fig. 4C, the text background of the translation 433 is a bright color (for example, bright blue), which helps make the translation 433 stand out.
As shown in Figs. 4A-4C, the user can quickly and easily locate the target text in the image 411 that he or she cares about or is interested in and obtain its translation, without having to passively read the translations of all the text in the image. For example, after capturing the image 411 and entering "国王十字" (King's Cross) through the user interface 420, the user can easily locate the "King's Cross" subway station through the user interface 430 and see its translation.
The user interface 430 can also let the user download and store the image 411 with the translation 433 displayed on it. Specifically, the user interface 430 includes a download button 434. In response to the user's selection of the download button 434, the image 411 with the translation 433 displayed can be downloaded and stored for the user's subsequent viewing.
图5示出了根据本发明一个实施例的输入处理方法500的流程图。该输入方法500可以在输入处理装置200中执行。FIG. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention. The input method 500 can be executed in the input processing device 200.
如图5所示,输入处理方法500始于步骤S510。在步骤S510中,接收用户的输入。在步骤S520中,获取与用户的输入相关的图像。As shown in FIG. 5, the input processing method 500 starts at step S510. In step S510, the user's input is received. In step S520, an image related to the user's input is acquired.
在一些实施例中,用户的输入至少可以包括以下之一:文本、图像、视频和音频,在接收用户的输入之后,可以基于用户的输入获取输入文本。In some embodiments, the user's input may include at least one of the following: text, image, video, and audio. After receiving the user's input, the input text may be obtained based on the user's input.
在获取与用户的输入相关的图像之后,在步骤S530中,查找该图像中与用户的输入相匹配的目标文本。具体地,可以获取该图像包含的文本,再将该图像包含的文本与上述输入文本进行文本匹配,以查找相匹配的目标文本。其中,在输入文本的语言不同于该图像包含的文本的语言的情况下,还可以获取输入文本到该图像包含的文本的语言的翻译, 将该图像包含的文本与输入文本的翻译进行文本匹配。After acquiring the image related to the user's input, in step S530, search for target text in the image that matches the user's input. Specifically, the text contained in the image can be obtained, and the text contained in the image can be matched with the input text to find the matching target text. Wherein, when the language of the input text is different from the language of the text contained in the image, the translation of the input text into the language of the text contained in the image can also be obtained, and the text contained in the image can be matched with the translation of the input text. .
For the found target text, according to step S540, the target text may be highlighted in the image related to the input, and/or auxiliary information of the target text may be displayed. The auxiliary information may itself be displayed prominently.
According to some embodiments of the present invention, for the found target text, display information of the target text in the above-mentioned image may be obtained, and the auxiliary information may be displayed or highlighted based on that display information.
Specifically, the display information may include at least a display area and/or a display style of the target text. In some embodiments, the display area of the auxiliary information may be configured based on the display area of the target text; for example, the display area of the auxiliary information may cover or be close to the display area of the target text. The display style of the auxiliary information may also be configured based on the display style of the target text.
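A minimal sketch of such a configuration step is shown below, assuming the display information of the target text has been reduced to a bounding box plus a few style attributes; the field names are illustrative assumptions, not terms from the original disclosure.

```python
from dataclasses import dataclass

@dataclass
class TextDisplayInfo:
    x: int                  # top-left corner of the target text block in the image
    y: int
    width: int
    height: int
    font_size: int
    text_color: str
    background_color: str

def configure_auxiliary_display(target: TextDisplayInfo, cover: bool = True) -> dict:
    """Derive the display area and display style of the auxiliary information
    from the display information of the matched target text."""
    if cover:
        # Place the auxiliary information directly over the target text block.
        region = (target.x, target.y, target.width, target.height)
    else:
        # Or place it just below the target text block, i.e. "close to" it.
        region = (target.x, target.y + target.height + 4, target.width, target.height)
    return {
        "region": region,
        # Reuse the target text's style so the overlay blends with the image.
        "font_size": target.font_size,
        "text_color": target.text_color,
        "background_color": target.background_color,
    }

print(configure_auxiliary_display(TextDisplayInfo(120, 80, 200, 32, 14, "blue", "white")))
```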
For the detailed processing logic and implementation of each step in the input processing method 500, reference may be made to the foregoing description of the input processing apparatus 200 in conjunction with FIGS. 1-4C, which is not repeated here.
FIG. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention. The input processing method 600 may be executed in the input processing apparatus 200. As shown in FIG. 6, the input processing method 600 starts at step S610.
In step S610, a user's input is received. In step S620, an image related to the user's input is acquired. In some embodiments, after the user's input is received, input text may also be obtained based on the input.
Then, in step S630, the image is searched for target text that matches the user's input. Specifically, the text contained in the image may be obtained. Here, the input text is expressed in an input language and the text contained in the image is expressed in a source language different from the input language; therefore, a translation of the input text into the source language is obtained, and the text contained in the image is then text-matched against that translation to find the matching target text. The translation to be displayed may be a translation of the target text into the input language, or a translation of the target text into a target language specified by the user.
For the found target text, according to step S640, the target text may be highlighted in the image, and/or the translation of the target text may be displayed.
For the detailed processing logic and implementation of each step in the input processing method 600, reference may be made to the foregoing description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1-5, which is not repeated here.
Those skilled in the art should understand that although the input processing apparatus 200 is illustrated as including the interaction module 210, the recognition module 220, the text matching module 230, the image acquisition module 240, and the translation engine 250, one or more of these modules may be stored on and/or executed by another device, such as a server that communicates with the input processing apparatus 200 (for example, a server capable of image recognition, speech recognition, text matching, and language translation). In some embodiments, the input processing apparatus 200 may receive the user's input, acquire an image related to the input, and send the input and the image to the server. The server searches the image for target text that matches the input and returns the search result to the input processing apparatus 200. The input processing apparatus 200 then highlights, in the image, the target text found by the server, and/or displays auxiliary information of the target text. For the detailed processing logic and implementation of the server-side search for the target text, reference may be made to the foregoing description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1-5, which is not repeated here.
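The client side of such a split could look roughly like the following sketch, which forwards the input and the image to a matching server and returns the server's result; the endpoint URL and the JSON payload layout are assumptions made for illustration only.

```python
import base64
import json
from urllib import request

SERVER_URL = "http://example.com/match"   # hypothetical endpoint, for illustration only

def send_to_matching_server(user_input: str, image_bytes: bytes) -> dict:
    """Forward the user's input and the related image to a server that performs
    the matching, and return the server's search result (for example the matched
    target text and its display area)."""
    payload = json.dumps({
        "input": user_input,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }).encode("utf-8")
    req = request.Request(SERVER_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:        # network call; requires a reachable server
        return json.loads(resp.read().decode("utf-8"))
```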
The input processing apparatus 700 and the input processing method 800 executed by it are described below.
The image related to the input can be regarded as comprising multiple image blocks. In some cases, the user wants to locate, among the multiple image blocks included in the image, the image block of interest, that is, the target image. For example, the input processing apparatus 700 may receive the user's input describing a product the user wants to purchase and acquire an image of a store shelf. The image of the store shelf includes the logo images of multiple products, and the input processing apparatus 700 can highlight, in that image, the logo image of the product the user wants to purchase, so that the user can quickly locate the product. Further, reviews, an introduction, or a reference price of the product can also be displayed for the user's reference.
FIG. 7 shows a schematic diagram of an input processing apparatus 700 according to an embodiment of the present invention. As shown in FIG. 7, the input processing apparatus 700 includes an interaction module 710, an image acquisition module 720, and an image matching module 730. The interaction module 710 may receive a user's input, which may indicate a target image that the user pays attention to. The image acquisition module 720 may acquire an image related to the user's input. The image matching module 730 is coupled to the interaction module 710 and the image acquisition module 720, and may search the above-mentioned image for a target image that matches the user's input. For a found target image, the interaction module 710 may highlight the target image in the above-mentioned image, and/or display auxiliary information of the target image.
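A minimal skeleton of how these three modules might be composed is sketched below; the class and method names are illustrative assumptions and the bodies are placeholders, not an implementation of the disclosed apparatus.

```python
class InteractionModule:
    def receive_input(self) -> str:
        # In a real apparatus this would come from a user interface.
        return input("Enter the target you are looking for: ")

    def present(self, matches, auxiliary_info) -> None:
        # Placeholder for highlighting the matches and showing auxiliary info.
        print(f"{len(matches)} match(es) found; auxiliary info: {auxiliary_info}")

class ImageAcquisitionModule:
    def acquire(self, path: str) -> bytes:
        with open(path, "rb") as f:           # e.g. a camera frame saved to disk
            return f.read()

class ImageMatchingModule:
    def find_target(self, image_bytes: bytes, user_input: str) -> list:
        return []                             # the actual matching would go here

class InputProcessingApparatus700:
    """Sketch of the coupling between modules 710, 720 and 730 described above."""
    def __init__(self) -> None:
        self.interaction = InteractionModule()
        self.acquisition = ImageAcquisitionModule()
        self.matching = ImageMatchingModule()
```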
For the detailed processing logic and implementation of each module in the input processing apparatus 700, refer to the description of the input processing method 800 below in conjunction with FIG. 8.
FIG. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention. As shown in FIG. 8, the input processing method 800 is executed in the input processing apparatus 700 and starts at step S810.
In step S810, a user's input may be received. In some embodiments, the user's input may be received via a user interface, and the input may indicate a target image that the user pays attention to.
The user's input may take various forms, including but not limited to one of the following: text, an image, video, and audio. If the user's input includes text, audio, or video, an input image also needs to be obtained based on the user's input. If the user inputs an image, that image is the input image. If the user inputs text, an input image may be obtained based on the text (for example, through a search engine). If the user inputs audio, text may be obtained based on the audio, and an input image may then be obtained based on that text. For example, whether the user types the text "X doll" or speaks the words "X doll", an image of "X" can be obtained based on that input. If the user inputs video, since video includes images and audio, the input image can be obtained with reference to the image and audio cases.
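The normalization of these input forms into an input image could be sketched as follows; the image_search and speech_to_text callables are placeholders assumed for illustration (for example a search engine and a speech recognizer) rather than components named in the disclosure.

```python
def obtain_input_images(user_input, kind, image_search, speech_to_text):
    """Normalize a text / audio / image input into a list of input images.

    image_search:   placeholder callable, text -> image
    speech_to_text: placeholder callable, audio bytes -> text
    """
    if kind == "image":
        return [user_input]                    # the input itself is the input image
    if kind == "text":
        return [image_search(user_input)]      # look up an image for the text
    if kind == "audio":
        text = speech_to_text(user_input)      # audio -> text -> image
        return [image_search(text)]
    raise ValueError(f"unsupported input kind: {kind}")
```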
There may be one or more input images. If the user directly inputs several images in succession, the multiple input images are naturally distinguished from one another. If the user inputs text, multiple text items may first be obtained (the text items may be separated by delimiters such as punctuation marks), and an input image is then obtained for each text item. If the user inputs audio, multiple text items may first be obtained based on the audio (text items obtained from audio may be separated by pauses or by speaking a separator word such as "间隔" ("interval")), and an input image is then obtained for each text item. If the user inputs video, since video includes consecutive images and audio, it can be handled with reference to the cases of consecutive input images and/or input audio.
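For the text case, splitting the input into separate items is straightforward; the following sketch uses punctuation delimiters, with the exact delimiter set being an assumption for illustration.

```python
import re

def split_input_items(text: str):
    """Split a user's text input into separate items using common punctuation
    delimiters, so that one input image can be obtained per item."""
    items = re.split(r"[,，;；、\n]+", text)
    return [item.strip() for item in items if item.strip()]

# Hypothetical example: two products entered in one line.
print(split_input_items("X doll, Y chocolate"))   # -> ['X doll', 'Y chocolate']
```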
In step S820, an image related to the user's input may be acquired. For example, the camera of the computing device 100 may be used to capture the related image, or the related image may be received via a network after being sent to the input processing apparatus 700; the present invention does not limit the source of the related image.
Then, according to step S830, the image may be searched for a target image that matches the user's input. For each input image, the image is searched for a target image that matches that input image.
Taking the search for a target image matching one input image as an example, the image features of the image and of the input image may first be obtained, and then, based on the obtained image features, image matching may be performed between the image and the input image to find the target image in the image that matches the input image. Any image feature extraction technique may be used to obtain the image features, and any image matching technique may be used to perform the image matching; the present invention places no limitation on this.
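Since the disclosure leaves the feature extraction and matching techniques open, the following sketch shows just one possible instantiation of step S830 using ORB features and brute-force Hamming matching from OpenCV (the opencv-python package is assumed to be installed).

```python
import cv2  # requires the opencv-python package

def match_input_image(scene_path, template_path, min_matches=10):
    """One possible instantiation of step S830: ORB features plus brute-force
    Hamming matching between the acquired image (scene) and one input image
    (template). Returns an approximate bounding box of the target image, or
    None if no match is found."""
    scene = cv2.imread(scene_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create()
    kp_scene, des_scene = orb.detectAndCompute(scene, None)
    kp_tmpl, des_tmpl = orb.detectAndCompute(template, None)
    if des_scene is None or des_tmpl is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_tmpl, des_scene)
    if len(matches) < min_matches:
        return None                           # no matching target image found

    # The matched scene keypoints approximate the display area of the target image.
    points = [kp_scene[m.trainIdx].pt for m in matches]
    xs, ys = zip(*points)
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))
```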
In some embodiments, if no target image matching an input image can be found in the image, the user may be prompted via the user interface. If a target image matching the input image is found in the image, the user may be prompted by vibration, ringing, or blinking of an indicator light of the computing device 100, and/or via the user interface.
For each input image for which a matching target image is found, according to step S840, the target image matching that input image may be prominently displayed in the above-mentioned image, and/or auxiliary information of the target image may be displayed. The auxiliary information may be, for example, an introduction, reviews, or a reference price, which is not limited by the present invention. For example, auxiliary information related to the target image may be obtained from various search engines, review websites, or shopping platforms.
In some embodiments, marks such as borders (for example, rectangular frames), shapes (for example, arrows), or line segments (for example, wavy lines) may be added to the target image in the image so as to highlight the target image. In this way, the user can quickly locate the target image without having to search item by item.
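Adding such a mark is simple to sketch; the following example draws a rectangular border and a short label on the image using OpenCV, with colors and sizes being illustrative choices rather than values taken from the disclosure.

```python
import cv2

def highlight_target(image_path, box, label, out_path="highlighted.png"):
    """Draw a rectangular border around the matched target image and place a
    short auxiliary-information label just above it."""
    img = cv2.imread(image_path)
    x1, y1, x2, y2 = box
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 255), 2)          # red border
    cv2.putText(img, label, (x1, max(12, y1 - 8)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    cv2.imwrite(out_path, img)
    return out_path
```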
In other embodiments, the display area of the target image in the above-mentioned image may be obtained; the display area indicates the position of the target image in the image. The display area of the auxiliary information is then configured based on the display area of the target image; for example, the display area of the auxiliary information may cover or be close to the display area of the target image. Further, if the image related to the input contains text, the display style of the auxiliary information may also be configured based on the display style of the text contained in the image. In some cases, in order to display the auxiliary information in harmony with the image, at least part of the display style of the auxiliary information may be configured to be consistent with the corresponding display style of the text in the image (for example, the same font size and font type). In other cases, in order to highlight the auxiliary information, at least part of the display style of the auxiliary information may be configured to differ noticeably from the corresponding display style of the text contained in the image. For example, the background color (or text color) of the auxiliary information may be set to a bright color or to a color contrasting with the background color of the text contained in the image. As another example, display styles such as underlining, bold, italics, text highlight color, text shading, and text borders may be used to highlight the auxiliary information. The embodiments of the present invention do not limit the specific display style used to display or highlight the auxiliary information.
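As a small illustration of the contrasting-color option, the sketch below derives a highlight color from the background color of the text in the image; the specific rule (RGB complement with a luminance fallback) is an assumption chosen for illustration.

```python
def contrasting_color(rgb):
    """Pick a color that stands out against a given background color:
    use the RGB complement, falling back to black or white when the
    background is a mid-gray and its complement would be too similar."""
    r, g, b = rgb
    complement = (255 - r, 255 - g, 255 - b)
    luminance = 0.299 * r + 0.587 * g + 0.114 * b
    if all(abs(c - o) < 32 for c, o in zip(complement, rgb)):
        return (0, 0, 0) if luminance > 128 else (255, 255, 255)
    return complement

print(contrasting_color((255, 255, 255)))   # white background -> (0, 0, 0)
```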
In addition, since there may be multiple input images, in order to help the user distinguish the target images that match different input images, the marks and/or the display styles of the auxiliary information of target images matching the same input image may be kept the same, while the marks and/or the display styles of the auxiliary information of target images matching different input images may differ.
According to some embodiments of the present invention, the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copying and pasting. According to other embodiments of the present invention, a zoom instruction or a selection instruction from the user for the displayed auxiliary information may also be received, and in response the auxiliary information may be displayed correspondingly reduced or enlarged to make it easier to read. A save instruction from the user may also be received, and in response the above-mentioned image with the auxiliary information displayed may be stored on the computing device 100 for subsequent viewing.
According to still other embodiments of the present invention, a selection instruction from the user for another image block for which no auxiliary information is displayed may also be received, and in response auxiliary information of the selected image block may be obtained and displayed. The specific display configuration has been described in detail above and is not repeated here.
For part of the detailed processing logic and implementation of each step in the input processing method 800, reference may be made to the foregoing description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1-5, which is not repeated here.
Those skilled in the art should understand that one or more steps of the input processing method 800 executed by the input processing apparatus 700 described above may also be executed by another device, such as a server communicating with the input processing apparatus 700. In some embodiments, the input processing apparatus 700 may receive the user's input, acquire an image related to the input, and send the input and the image to the server. The server searches the image for a target image that matches the input and returns the search result to the input processing apparatus 700. The input processing apparatus 700 then highlights, in the image, the target image found by the server, and/or displays auxiliary information of the target image.
Those skilled in the art should also understand that the embodiments of the present invention are not limited to locating target text or a target image in an image, and can be extended to locating various target objects, such as target text or a target image, in various source objects such as text, images, and audio.
FIG. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention. As shown in FIG. 9, the input processing method 900 starts at step S910.
In step S910, a user's input is received; the input may include at least one of the following: text, audio, video, and an image. In step S920, a source object related to the user's input is obtained; the source object may include at least one of the following: text, an image, and video.
Then, in step S930, the source object may be searched for a target object that matches the user's input. For the found target object, according to step S940, the target object may be highlighted in the source object, and/or auxiliary information of the target object may be displayed.
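The generalization from images to arbitrary source objects can be pictured as a simple dispatch on the kind of source object, as in the sketch below; the matcher registry and the trivial text matcher are assumptions for illustration only.

```python
def find_target_object(source, source_kind, query, matchers):
    """Generic form of step S930: pick a matcher according to the kind of
    source object (text, image, video, ...) and return the matched target
    objects. `matchers` maps a kind to a callable (source, query) -> list."""
    if source_kind not in matchers:
        raise ValueError(f"no matcher registered for source kind: {source_kind}")
    return matchers[source_kind](source, query)

# Hypothetical example with a trivial text matcher.
matchers = {"text": lambda src, q: [line for line in src.splitlines() if q in line]}
print(find_target_object("Euston\nKing's Cross St Pancras", "text", "King's", matchers))
```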
For part of the detailed processing logic and implementation of each step in the input processing method 900, reference may be made to the foregoing description of the input processing apparatus 200, the input processing method 500, the input processing apparatus 700, and the input processing method 800 in conjunction with FIGS. 1-8, which is not repeated here.
Those skilled in the art should understand that a source object may generally include multiple objects, for example multiple text items or multiple image blocks. The target object may be the object, among the multiple objects included in the source object, that the user pays attention to; for example, it may be target text, a target image, or any other suitable object. The present invention does not limit the specific types of the target object and the source object; scenarios of finding a target object in a source object, such as finding target text in an image, finding target text in text, or finding a target image in an image, all fall within the scope of protection of the present invention.
Those skilled in the art should also understand that one or more steps of the input processing method 900 may be executed by another device, such as a server communicating with the input processing apparatus that executes the input processing method 900. In some embodiments, a user's input may be received, a source object related to the input may be obtained, and the input and the source object may be sent to the server. The server searches the source object for a target object that matches the input and returns the search result. The input processing apparatus that executes the input processing method 900 then highlights, in the source object, the target object found by the server, and/or displays auxiliary information of the target object.
According to the input processing solutions of the embodiments of the present invention, by receiving the user's active input, the target object (for example, target text or a target image) in the source object (for example, an image) that matches that input is highlighted for the user, and/or auxiliary information of the target object is displayed. This spares the user the tedious operation of reading through the source object item by item to find the target object of interest, and improves the user experience. Moreover, the input processing solutions according to the embodiments of the present invention only need to obtain auxiliary information for the target object included in the source object, which greatly reduces the workload.
The various techniques described here may be implemented in connection with hardware or software, or a combination thereof. Thus, the methods and devices of the embodiments of the present invention, or certain aspects or portions thereof, may take the form of program code (that is, instructions) embedded in a tangible medium such as a removable hard disk, a USB flash drive, a floppy disk, a CD-ROM, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes a device for practicing the embodiments of the present invention.
When the program code is executed on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute the methods of the embodiments of the present invention according to the instructions in the program code stored in the memory.
By way of example and not limitation, readable media include readable storage media and communication media. Readable storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media generally embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided here, the algorithms and displays are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the examples of the embodiments of the present invention. The structure required to construct such systems is apparent from the description above. In addition, the embodiments of the present invention are not directed to any particular programming language. It should be understood that various programming languages may be used to implement the content of the embodiments of the present invention described herein, and the above description of specific languages is given for the purpose of disclosing the best mode of the embodiments of the present invention.
In the description provided here, numerous specific details are set forth. It is understood, however, that the embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in the above description of exemplary embodiments of the present invention, various features of the embodiments are sometimes grouped together into a single embodiment, figure, or description thereof in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art should understand that the modules, units, or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may further be divided into multiple sub-modules.
Those skilled in the art can understand that the modules in the devices of the embodiments may be adaptively changed and arranged in one or more devices different from those of the embodiments. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may further be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the embodiments of the present invention and form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
Furthermore, some of the above embodiments are described herein as methods, or as combinations of method elements, that can be implemented by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for implementing such a method or method element therefore forms a means for implementing the method or method element. In addition, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinal terms "first", "second", "third", and so on to describe an ordinary object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
Although the embodiments of the present invention have been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the embodiments of the present invention so described. Furthermore, it should be noted that the language used in this specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the subject matter of the embodiments of the present invention. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the embodiments of the present invention, the disclosure made herein is illustrative and not restrictive, and the scope of the embodiments of the present invention is defined by the appended claims.

Claims (28)

  1. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    searching the image for target text that matches the input; and
    for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
  2. The method of claim 1, wherein the user's input comprises at least one of the following: text, an image, video, and audio, and the method further comprises:
    obtaining input text based on the user's input.
  3. The method of claim 2, wherein the step of searching the image for target text that matches the input comprises:
    obtaining the text contained in the image;
    performing text matching between the text contained in the image and the input text.
  4. The method of claim 3, wherein the step of performing text matching between the text contained in the image and the input text comprises:
    when the language of the input text is different from the language of the text contained in the image, obtaining a translation of the input text into the language of the text contained in the image;
    performing text matching between the text contained in the image and the translation of the input text.
  5. The method of claim 1, wherein the step of displaying the auxiliary information comprises:
    highlighting the auxiliary information.
  6. The method of claim 1 or 5, further comprising:
    for the found target text, obtaining display information of the target text in the image, the display information comprising at least a display area and/or a display style of the target text.
  7. The method of claim 6, wherein the step of displaying or highlighting the auxiliary information comprises:
    configuring a display area of the auxiliary information based on the display area of the target text, the display area of the auxiliary information covering the display area of the target text.
  8. The method of claim 6, wherein the step of displaying or highlighting the auxiliary information comprises:
    configuring a display style of the auxiliary information based on the display style of the target text.
  9. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    sending the input and the image to a server, so that the server searches the image for target text that matches the input; and
    for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
  10. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    searching the image for target text that matches the input; and
    for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
  11. The method of claim 10, further comprising:
    obtaining input text based on the input, the input text being expressed in an input language; and
    wherein the step of searching the image for target text that matches the input comprises:
    obtaining the text contained in the image, the text contained in the image being expressed in a source language different from the input language;
    obtaining a translation of the input text into the source language;
    performing text matching between the text contained in the image and the translation of the input text.
  12. The method of claim 11, wherein the translation comprises a translation of the target text into the input language, or a translation of the target text into a target language specified by the user.
  13. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    sending the input and the image to a server, so that the server searches the image for target text that matches the input; and
    for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
  14. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    searching the image for a target image that matches the input; and
    for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  15. The method of claim 14, wherein the user's input comprises at least one of the following: text, audio, video, and an image, and the method further comprises:
    obtaining an input image based on the input.
  16. The method of claim 15, wherein the step of searching the image for a target image that matches the input comprises:
    obtaining image features of the image and of the input image;
    performing image matching between the image and the input image based on the image features.
  17. The method of claim 14, wherein the step of displaying the auxiliary information comprises:
    highlighting the auxiliary information.
  18. The method of claim 14 or 17, further comprising:
    for the found target image, obtaining a display area of the target image in the image.
  19. The method of claim 18, wherein the step of displaying or highlighting the auxiliary information comprises:
    configuring a display area of the auxiliary information based on the display area of the target image, the display area of the auxiliary information covering the display area of the target image.
  20. An input processing method, comprising:
    receiving a user's input;
    acquiring an image related to the input;
    sending the input and the image to a server, so that the server searches the image for a target image that matches the input; and
    for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  21. An input processing method, comprising:
    receiving a user's input;
    obtaining a source object related to the input;
    searching the source object for a target object that matches the input; and
    for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  22. The method of claim 21, wherein the user's input comprises at least one of the following: text, audio, video, and an image, and the source object comprises at least one of the following: text, an image, and video.
  23. The method of claim 21, wherein the target object is target text or a target image.
  24. An input processing method, comprising:
    receiving a user's input;
    obtaining a source object related to the input;
    sending the input and the source object to a server, so that the server searches the source object for a target object that matches the input; and
    for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  25. An input processing apparatus, comprising:
    an interaction module adapted to receive a user's input;
    an image acquisition module adapted to acquire an image related to the input;
    a text matching module adapted to search the image for target text that matches the input; wherein
    the interaction module is further adapted to, for the found target text, highlight the target text in the image and/or display auxiliary information of the target text.
  26. An input processing apparatus, comprising:
    an interaction module adapted to receive a user's input;
    an image acquisition module adapted to acquire an image related to the input;
    an image matching module adapted to search the image for a target image that matches the input; wherein
    the interaction module is further adapted to, for the found target image, highlight the target image in the image and/or display auxiliary information of the target image.
  27. A computing device, comprising:
    one or more processors;
    a memory; and
    a program, wherein the program is stored in the memory and configured to be executed by the one or more processors, the program comprising instructions for performing the input processing method according to any one of claims 1-24.
  28. A computer-readable storage medium storing a program, the program comprising instructions which, when executed by a computing device, cause the computing device to perform the input processing method according to any one of claims 1-24.
PCT/CN2021/070410 2020-01-08 2021-01-06 Input processing method and apparatus, and computing device WO2021139667A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010019082.XA CN113095090A (en) 2020-01-08 2020-01-08 Input processing method and device and computing equipment
CN202010019082.X 2020-01-08

Publications (1)

Publication Number Publication Date
WO2021139667A1 true WO2021139667A1 (en) 2021-07-15

Family

ID=76664046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070410 WO2021139667A1 (en) 2020-01-08 2021-01-06 Input processing method and apparatus, and computing device

Country Status (2)

Country Link
CN (1) CN113095090A (en)
WO (1) WO2021139667A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750003A (en) * 2012-05-30 2012-10-24 华为技术有限公司 Method and device for text input
US20140082550A1 (en) * 2012-09-18 2014-03-20 Michael William Farmer Systems and methods for integrated query and navigation of an information resource
CN110232111A (en) * 2019-05-30 2019-09-13 杨钦清 A kind of text display method, device and terminal device

Also Published As

Publication number Publication date
CN113095090A (en) 2021-07-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21738881; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21738881; Country of ref document: EP; Kind code of ref document: A1)