WO2021139667A1 - Input processing method and apparatus, and computing device - Google Patents

Input processing method and apparatus, and computing device

Info

Publication number
WO2021139667A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
input
text
target
auxiliary information
Prior art date
Application number
PCT/CN2021/070410
Other languages
English (en)
Chinese (zh)
Inventor
葛妮瑜
方视菁
胡雪梅
李洁
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2021139667A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/418: Document matching, e.g. of document images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53: Querying
    • G06F16/535: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/42: Data-driven translation

Definitions

  • the present invention relates to the field of computer technology, in particular to an input processing method, device and computing equipment.
  • with the popularization of computing devices such as mobile terminals and personal computers, users are becoming more and more accustomed to using computing devices to handle various daily affairs.
  • a user may wish to use a computing device to quickly locate part of the information the user pays attention to, or obtain auxiliary information of the part of the information the user pays attention to.
  • a typical application scenario is that, when traveling in a region where a different language is spoken and faced with a large amount of information in an unfamiliar language, the user hopes to use a computing device to locate the part of the information he or she pays attention to and obtain a translation of that part.
  • a user can use a camera installed in a computing device to collect an image that includes a large amount of information; the computing device then recognizes all the information in the image and displays auxiliary information for all of it to the user.
  • the image may be an image of a map, which includes the names of many different places.
  • a user interface that displays the translations of all the place names may be tiring for the user, because the user needs to read the translations one by one to find the place he or she cares about or is interested in and then obtain the translation for that place. This process is time-consuming and laborious, which degrades the user experience.
  • embodiments of the present invention provide an input processing method, device, and computing device to try to solve or at least alleviate the above problems.
  • an input processing method including: receiving a user's input; acquiring an image related to the input; searching for target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information for the target text.
  • the user's input includes at least one of the following: text, image, video, and audio.
  • the method further includes: acquiring the input text based on the user's input.
  • the step of searching for the target text in the image that matches the input includes: obtaining the text contained in the image; and performing text matching between the text contained in the image and the input text.
  • the step of matching the text contained in the image with the input text includes: in the case that the language of the input text is different from the language of the text contained in the image, obtaining a translation of the input text into the language of the text contained in the image; and matching the text contained in the image with the translation of the input text.
  • the step of displaying the auxiliary information includes: highlighting the auxiliary information.
  • the method further includes: for the found target text, obtaining display information of the target text in the image, the display information including at least the display area and/or display style of the target text.
  • the step of displaying or highlighting the auxiliary information includes: configuring the display area of the auxiliary information based on the display area of the target text, and the display area of the auxiliary information covers the display area of the target text.
  • the step of displaying or highlighting the auxiliary information includes: configuring the display style of the auxiliary information based on the display style of the target text.
  • an input processing method including: receiving a user's input; acquiring an image related to the input; sending the input and the image to a server, so that the server searches the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information of the target text.
  • an input processing method including: receiving a user's input; acquiring an image related to the input; searching for target text in the image that matches the input; and, for the found target text, highlighting the target text in the image and/or showing a translation of the target text.
  • the method further includes: obtaining input text based on the input, the input text being expressed in an input language; and the step of searching for target text in the image that matches the input includes: obtaining the text contained in the image, the text contained in the image being expressed in a source language different from the input language; obtaining a translation of the input text into the source language; and matching the text contained in the image with the translation of the input text.
  • the translation includes the translation of the target text into the input language, or the translation of the target text into the target language specified by the user.
  • an input processing method including: receiving a user's input; acquiring an image related to the input; sending the input and the image to a server, so that the server searches the image for target text that matches the input; and, for the found target text, highlighting the target text in the image and/or displaying a translation of the target text.
  • an input processing method including: receiving a user's input; acquiring an image related to the input; searching for a target image in the image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  • the user's input includes at least one of the following: text, audio, video, and image.
  • the method further includes: acquiring an input image based on the input.
  • the step of searching for a target image in the image that matches the input includes: acquiring image features of the image and the input image; and performing image matching between the image and the input image based on the image features.
  • the step of displaying the auxiliary information includes: highlighting the auxiliary information.
  • the method further includes: for the searched target image, acquiring the display area of the target image in the image.
  • the step of displaying or highlighting the auxiliary information includes: configuring the display area of the auxiliary information based on the display area of the target image, and the display area of the auxiliary information covers the display area of the target image.
  • an input processing method including: receiving a user's input; acquiring an image related to the input; sending the input and the image to a server, so that the server searches the image for a target image that matches the input; and, for the found target image, highlighting the target image in the image and/or displaying auxiliary information of the target image.
  • an input processing method including: receiving a user's input; obtaining a source object related to the input; searching for a target object matching the input in the source object; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  • the user's input includes at least one of the following: text, audio, video, and image.
  • the source object includes at least one of the following: text, image, and video.
  • the target object is a target text or a target image.
  • an input processing method including: receiving a user's input; obtaining a source object related to the input; sending the input and the source object to a server, so that the server searches the source object for a target object that matches the input; and, for the found target object, highlighting the target object in the source object and/or displaying auxiliary information of the target object.
  • an input processing device including: an interaction module adapted to receive a user's input; an image acquisition module adapted to acquire an image related to the input; and a text matching module adapted to search for target text in the image that matches the input; wherein the interaction module is further adapted to, for the found target text, highlight the target text in the image and/or display auxiliary information of the target text.
  • an input processing device including: an interaction module adapted to receive a user's input; an image acquisition module adapted to acquire an image related to the input; and an image matching module adapted to search for a target image in the image that matches the input; wherein the interaction module is further adapted to, for the found target image, highlight the target image in the image and/or display auxiliary information of the target image.
  • a computing device including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing the input processing method according to the embodiments of the present invention.
  • a computer-readable storage medium storing one or more programs.
  • the one or more programs include instructions that, when executed by a computing device, cause the computing device to execute the input processing method according to the embodiments of the present invention.
  • according to the input processing solution of the embodiments of the present invention, after receiving the user's input for a source object (for example, an image), the target object (for example, target text or a target image) that matches the input is found in the source object, and the target object is highlighted for the user and/or auxiliary information of the target object is displayed. This spares the user the tedious operation of reading through the source object to find the target object he or she pays attention to, and improves the user experience. In addition, compared with obtaining auxiliary information for all objects in the source object, the solution only needs to obtain auxiliary information for the matched target object, which greatly reduces the workload.
  • FIG. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention.
  • FIG. 2 shows a structural block diagram of an input processing device 200 according to an embodiment of the present invention.
  • FIG. 3 shows a schematic diagram of an input-related image according to an embodiment of the present invention.
  • FIGS. 4A-4C respectively show screenshots of a user interface for displaying the translation of target text in an image according to an embodiment of the present invention.
  • FIG. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention.
  • FIG. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention.
  • FIG. 7 shows a structural block diagram of an input processing device 700 according to an embodiment of the present invention.
  • FIG. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention.
  • FIG. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention.
  • an embodiment of the present invention discloses an input processing device, which can receive a user's input, obtain an input-related image carrying a large amount of information, highlight the target object that the user pays attention to in the image, and/or display auxiliary information of the target object, so that the user can quickly locate the target object and/or obtain its auxiliary information.
  • Fig. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention.
  • the computing device 100 is an electronic device capable of collecting and/or displaying images, such as a personal computer, a mobile communication device (for example, a smart phone), a tablet computer, and other devices that can collect and/or display images.
  • the computing device 100 may include a memory interface 102, one or more processors 104, and a peripheral interface 106.
  • the memory interface 102, the one or more processors 104, and/or the peripheral interface 106 may be discrete components or integrated in one or more integrated circuits.
  • various elements may be coupled through one or more communication buses or signal lines. Sensors, devices, and subsystems can be coupled to the peripheral interface 106 to help achieve multiple functions.
  • the motion sensor 110, the light sensor 112, and the distance sensor 114 may be coupled to the peripheral interface 106 to facilitate functions such as orientation, lighting, and distance measurement.
  • Other sensors 116 can also be connected to the peripheral interface 106, such as a positioning system (such as a GPS receiver), a temperature sensor, a biometric sensor or other sensing devices, which can help implement related functions.
  • the camera subsystem 120 and the optical sensor 122 can be used to facilitate the realization of camera functions such as capturing images.
  • the optical sensor 122 may be, for example, a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) sensor.
  • the computing device 100 may help implement communication functions through one or more wireless communication subsystems 124, where the wireless communication subsystem 124 may include a radio frequency receiver and transmitter and/or an optical (for example, infrared) receiver and transmitter.
  • the specific design and implementation of the wireless communication subsystem 124 may depend on one or more communication networks supported by the computing device 100.
  • the computing device 100 may include a wireless communication subsystem 124 designed to support a GSM network, a GPRS network, an EDGE network, a Wi-Fi or WiMax network, and a Bluetooth™ network.
  • the audio subsystem 126 may be coupled with the speaker 128 and the microphone 130 to help implement voice-enabled functions, such as voice recognition, voice reproduction, digital recording, and telephony functions.
  • the I/O subsystem 140 may include a display controller 142 and/or one or more other input controllers 144.
  • the display controller 142 may be coupled to the display 146.
  • the display 146 may be, for example, a liquid crystal display (LCD), a touch screen, or other types of displays.
  • the display 146 and the display controller 142 may use any of a variety of touch-sensing technologies to detect contact and movement or pauses, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies.
  • One or more other input controllers 144 may be coupled to other input/control devices 148, such as one or more buttons, rocker switches, thumbwheels, infrared ports, USB ports, and/or pointing devices such as a stylus.
  • One or more of the buttons may include an up/down button for controlling the volume of the speaker 128 and/or the microphone 130.
  • the memory interface 102 may be coupled with the memory 150.
  • the memory 150 may include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (e.g., NAND, NOR).
  • the memory 150 may store a program 154, and the program 154 runs on the operating system 152 stored in the memory 150.
  • the operating system 152 will be loaded from the memory 150 and executed by the processor 104.
  • when the program 154 is run, it is also loaded from the memory 150 and executed by the processor 104.
  • one of the programs is the input processing apparatus 200 or 700 according to the embodiments of the present invention, which includes instructions for executing the input processing methods 500, 600, and 800 according to the embodiments of the present invention.
  • the following describes the input processing device 200 and the input processing methods 500 and 600 executed by it.
  • the input-related image displays multiple texts (also referred to as containing multiple texts), and the user wishes to locate the target text in the multiple texts displayed in the image.
  • the input processing device 200 may receive the user's input on a dish of interest, and obtain an image of a restaurant menu.
  • the image of the restaurant menu includes the logo text of multiple dishes, and the input processing device 200 can highlight the logo text of the dish that the user is interested in in the image, so that the user can quickly locate the dish of interest.
  • the evaluation or introduction of the dish can also be displayed for the user's reference.
  • the input processing device 200 may receive a user's input to a destination subway station, and obtain an image of a subway line.
  • the image of the subway line includes the logo text of multiple subway stations, and the input processing device 200 can highlight the logo text of the target subway station in the image, so that the user can quickly locate the target subway station.
  • the translation of the destination subway station can also be displayed for the user's reference.
  • the input processing device 200 may receive a user's input on a commodity that he wants to purchase, and obtain an image of a store shelf.
  • the image of the store shelf includes the logo text of multiple commodities, and the input processing device 200 can highlight the logo text of the commodity that the user wants to purchase in the image, so that the user can quickly locate the commodity.
  • the evaluation, introduction or reference price of the product can also be displayed for the user's reference.
  • Fig. 2 shows a schematic diagram of an input processing device 200 according to an embodiment of the present invention.
  • the input processing apparatus 200 includes an interaction module 210, which can receive a user's input, and the input can indicate a target text that the user pays attention to.
  • the interaction module 210 may receive user input via a user interface (which will be described in detail later).
  • the user's input can be in various forms, for example, it can include but is not limited to one of the following: text, image, video, and audio.
  • the interaction module 210 may send the input to the recognition module 220, and the recognition module 220 obtains text based on the user's input.
  • any image recognition technology and voice recognition technology can be used to obtain the text, which is not limited in the present invention.
  • herein, the text input by the user, the text obtained based on an image input by the user, the text obtained based on a video input by the user, and the text obtained based on audio input by the user are all referred to as input text.
  • there may be one item of input text or multiple items of input text.
  • multiple input texts can be distinguished based on separators such as punctuation marks.
  • the user's input is "Chaoyangmen; Dongqiao; Jintai Road", that is, the following three input texts separated by semicolons: “Chaoyangmen", “East Bridge” and "Jintai Road”.
  • if the user inputs an image, multiple input texts obtained based on the image can be distinguished according to the different text blocks in which they appear in the image (described in detail later).
  • if the user inputs audio, multiple input texts obtained based on the audio can be distinguished by pauses or by a spoken separator word such as "interval".
  • if the user inputs a video, since a video includes images and audio, processing can be performed with reference to the input-image and/or input-audio cases.
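  • To make the splitting concrete, the following minimal Python sketch (an illustration, not part of the patent; the separator set and function name are assumptions) splits one text input into multiple items of input text on punctuation separators:

```python
import re

# Punctuation separators assumed to delimit items of input text
# (semicolon/comma plus their full-width CJK variants).
SEPARATORS = r"[;,；，、]"

def split_input_text(raw_input):
    """Split a raw text input into individual items of input text."""
    items = re.split(SEPARATORS, raw_input)
    return [item.strip() for item in items if item.strip()]

# "Chaoyangmen; Dongqiao; Jintai Road" yields three items of input text.
print(split_input_text("Chaoyangmen; Dongqiao; Jintai Road"))
# -> ['Chaoyangmen', 'Dongqiao', 'Jintai Road']
```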
  • the text matching module 230 in the input processing device 200 may search for a target text matching the input in an image related to the input.
  • the input processing device 200 further includes an image acquisition module 240, which can acquire an image related to the user's input.
  • the camera of the computing device 100 can be used to collect related images, or the related images sent to the input processing device 200 can be received via the network, and the source of the related images is not limited in the present invention.
  • the image related to the input may include multiple text blocks.
  • the input-related image displays multiple texts; the image block corresponding to each text in the image is a text block, and the text contained in each text block is one item of text.
  • Fig. 3 shows a schematic diagram of an input-related image according to an embodiment of the present invention.
  • the image is an image of a subway line and includes the logo text of many subway stations (that is, the names of the subway stations).
  • the image block occupied by the name of each subway station in the image is a text block, and the name of the subway station displayed in each text block is a text.
  • the input-related images can be in various formats, for example, bitmap image formats such as JPEG, BMP, PNG, etc., and vector graphics formats such as SVG and SWF.
  • the present invention does not limit the format of the image.
  • the image acquisition module 240 can directly acquire the text contained in the image (without image recognition).
  • the image acquisition module 240 needs to send the image to the recognition module 220, and the recognition module 220 performs image recognition.
  • the recognition module 220 can obtain the text contained in the image, that is, obtain multiple texts contained in multiple text blocks.
  • the recognition module 220 may use optical character recognition (OCR, Optical Character Recognition) technology to analyze the image and recognize text in the image.
  • the recognition module 220 can also detect texts in multiple different languages.
  • the recognition module 220 may include an OCR engine capable of recognizing text in multiple languages, or an OCR engine for each of multiple different languages.
  • other image recognition technologies can also be used to recognize the text in the image, which is not limited in the present invention.
  • the recognition module 220 may also detect the display information of the text in the image.
  • the display information includes but is not limited to the display area and/or display style of the text in the image.
  • the display area indicates the position of the text in the image, such as the coordinates of the text block where the text is located.
  • the display style can include text color, background color, font size, font type, and so on. In some embodiments, this display information can be used to identify different text blocks in the image. For example, in the case where two parts of text have different font colors, different background colors, or are separated from each other (for example, by at least a threshold distance), the recognition module 220 may determine that the two parts of text in the image are two texts contained in two different text blocks.
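  • The patent does not mandate a particular OCR engine. As one possible realization (a sketch assuming the open-source Tesseract engine via the pytesseract package), the recognizer can return both the texts and their display areas, grouped into text blocks:

```python
import pytesseract
from PIL import Image

def recognize_text_blocks(image_path, lang="eng"):
    """Recognize text in an image; return one dict per text block with
    its text and display area (left, top, width, height)."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(
        image, lang=lang, output_type=pytesseract.Output.DICT
    )
    blocks = {}
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue
        # Group words by Tesseract's own block segmentation.
        b = blocks.setdefault(
            data["block_num"][i],
            {"words": [], "x0": 10**9, "y0": 10**9, "x1": 0, "y1": 0},
        )
        b["words"].append(word)
        b["x0"] = min(b["x0"], data["left"][i])
        b["y0"] = min(b["y0"], data["top"][i])
        b["x1"] = max(b["x1"], data["left"][i] + data["width"][i])
        b["y1"] = max(b["y1"], data["top"][i] + data["height"][i])
    return [{"text": " ".join(b["words"]),
             "box": (b["x0"], b["y0"], b["x1"] - b["x0"], b["y1"] - b["y0"])}
            for b in blocks.values()]
```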
  • the recognition module 220 can also use natural language processing (NLP) technology to correct errors generated in image recognition, such as segmentation errors, text errors, grammatical errors, and so on.
  • the image acquisition module 240 sends the text contained in the image to the text matching module 230.
  • the text matching module 230 may perform text matching between the text contained in the image and the input text to find target text in the image that matches the input text. Specifically, for each item of input text, the text contained in the image is matched against that item to find the target text in the image that matches it.
  • the text matching module 230 can determine whether the input text and the text in the image acquired by the image acquisition module 240 use the same language.
  • in some cases, the language of the text in the image acquired by the image acquisition module 240 is the same as the language of the input text. That is, if the language of the text in the image is called the source language and the language of the input text is called the input language, then when the source language and the input language are the same, the text matching module 230 may directly search the text contained in the image for target text that matches the input text, for example, directly searching, among the multiple texts contained in the image, for target text that contains the item of input text or at least a part of it.
  • in other cases, the language of the text in the image acquired by the image acquisition module 240 and the language of the input text are two different languages; that is, the source language and the input language are different.
  • the text matching module 230 needs to first translate the text obtained from the image into the input language, or translate the input text into the source language.
  • the text matching module 230 may send the text to be translated to the translation engine 250 for translation.
  • the text matching module 230 performs text matching between the text contained in the image and the translation of the input text into the source language, so as to find the target text that matches the input text in the text contained in the image.
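  • A minimal sketch of this matching step (containment matching is one simple choice; the translate_to_source callable is an assumed wrapper around whatever translation engine is available):

```python
def find_target_texts(input_text, image_texts, translate_to_source=None):
    """Return indices of image texts matching one item of input text.
    If the input language differs from the source language, the input
    text is first translated into the source language."""
    query = translate_to_source(input_text) if translate_to_source else input_text
    query = query.casefold()
    # A target text matches if it contains the (translated) input text.
    return [i for i, text in enumerate(image_texts) if query in text.casefold()]

texts = ["Angel", "King's Cross St Pancras", "Farringdon"]
print(find_target_texts("King's Cross", texts))  # -> [1]
```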
  • if no matching target text is found, the user interface can be used to prompt the user. If target text matching an item of input text is found, vibration, ringing, a blinking signal light, and/or the user interface of the computing device 100 can be used to prompt the user.
  • for the found target text, the text matching module 230 may mark the target text in the image. For example, the target text can be highlighted in the text block where it is located (for example, by covering the text block; refer to the description of highlighting auxiliary information below), or various marks such as borders (for example, rectangular boxes), shapes (for example, arrows), or line segments (for example, wavy lines) can be added to the target text, so as to highlight it.
  • the text matching module 230 may obtain auxiliary information of the target text that matches the input text. It should be noted that the embodiment of the present invention does not limit the specific content of the auxiliary information, and any information related to the target text that can assist the user is within the protection scope of the present invention.
  • the auxiliary information may be information such as reviews, introductions, reference prices, purchase channels, etc.
  • the text matching module 230 may obtain such auxiliary information from various search engines, review websites, or shopping platforms.
  • the auxiliary information may be translation.
  • the text matching module 230 may send the target text matching the input text to the translation engine 250 to obtain the translation of the target text.
  • the translation engine 250 can translate the target text into different languages.
  • the translation engine 250 may translate the text into the target language specified by the user, or into the default language of the computing device 100 (for example, when no target language is specified), or into the input language of the input text (for example, when no target language is specified and the input language differs from the source language).
  • the user can use the user interface to specify the target language (will be described in detail later).
  • the interaction module 210 may display or highlight the auxiliary information using a user interface displaying an image related to the input.
  • the interaction module 210 may obtain the display information of the target text in the image, and display or highlight the auxiliary information of the target text based on the display information.
  • the display area of the auxiliary information can be configured based on the display area of the target text.
  • the display area of the auxiliary information can cover the corresponding text block of the target text in the image, and can also be close to the corresponding text block of the target text in the image (for example, displayed around the corresponding text block).
  • the display style of the auxiliary information can be configured based on the display style of the target text.
  • at least part of the display style of the auxiliary information may be configured to be consistent with the corresponding display style of the target text (for example, the font size and font type configuration are consistent).
  • at least part of the display style of the auxiliary information may be configured to be significantly different from the corresponding display style of the target text.
  • the background color (or text color) of the auxiliary information can be configured as a bright color or a contrast color of the background color of the text contained in the image to highlight the auxiliary information.
  • display styles such as underline, bold, italic, text background, text shading, and text border may be used to highlight the auxiliary information.
  • the embodiment of the present invention does not limit the specific display style used to display or highlight the auxiliary information.
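  • As an illustration of configuring the display area and style, the following Pillow-based sketch covers the target text's block and draws the auxiliary information over it (the colors and default font are assumptions):

```python
from PIL import Image, ImageDraw, ImageFont

def overlay_auxiliary_info(image, box, aux_text,
                           bg=(255, 255, 255), fg=(0, 0, 255)):
    """Cover the target text's display area and draw auxiliary
    information (e.g. a translation) in its place."""
    draw = ImageDraw.Draw(image)
    x, y, w, h = box
    draw.rectangle([x, y, x + w, y + h], fill=bg)  # cover the text block
    font = ImageFont.load_default()  # a real implementation would size the font from h
    draw.text((x + 2, y + 2), aux_text, fill=fg, font=font)
    return image
```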
  • the marks of target texts matching the same item of input text, or the display styles of their auxiliary information, can be the same, while the marks or auxiliary-information display styles of target texts matching different items of input text can be different.
  • the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copy and paste.
  • the interaction module 210 may also receive the user's zoom instruction or selection instruction on the displayed auxiliary information and, in response, zoom in or zoom out the displayed auxiliary information accordingly, to facilitate the user's reading.
  • the interaction module 210 may also receive a save instruction from the user, and in response to the save instruction, store the above-mentioned image displaying the auxiliary information in the computing device 100, so as to facilitate subsequent viewing by the user.
  • the interaction module 210 may also receive the user's selection instruction for another text block whose auxiliary information is not displayed and, in response, obtain and display the auxiliary information of the text contained in the selected text block.
  • the specific display configuration has been described in detail in the previous section, and will not be repeated here.
  • FIGS. 4A to 4C show screenshots of a user interface for displaying the translation of target text in an image according to an embodiment of the present invention.
  • the user interface 410 displays an image 411, and the image 411 may be collected in response to the user's selection of the camera button 412.
  • the image 411 includes a plurality of text blocks, and each text block includes one item of text.
  • each text block included in the image 411 includes an English text.
  • the text block 416 includes an English text "King's Cross St Pancras".
  • the user interface 410 may also enable the user to select a language for translation. Specifically, the user interface 410 enables the user to select the source language 413, so that multiple texts in the source language will be recognized in the image 411. The user interface 410 also enables the user to select the target language 414 into which the text will be translated. In the example of FIG. 4A, the user has selected the source language 413 as English and the target language 414 as Chinese. That is, the user wants to translate the English text recognized in the image 411 into Chinese text.
  • the user interface 410 may also enable the user to input.
  • the user interface includes an input button 415.
  • the user interface 420 as shown in FIG. 4B may be displayed.
  • the user interface 420 is similar to the user interface 410 and includes an image 411 and an input button 415, and also includes a control 421.
  • the control 421 can receive the user's input and can also prompt the user to provide input in the target language 414. For example, input text in the target language 414, or an image/audio including input text in the target language 414, may be input. In the example of FIG. 4B, the control 421 prompts the user to input Chinese input text. After the user inputs, in response to the control 421 receiving the user's input, a user interface 430 as shown in FIG. 4C may be displayed.
  • the user interface 430 is similar to the user interface 410 and includes an image 411 and an input button 415, and also includes a control 431.
  • the control 431 can display one or more items of input text obtained from the user's input, and can also display the search result of at least one item of text that matches in the image 411 based on the one or more items of input text.
  • the control 431 displays an input text "King's Cross" and the search result of the input text "A King's Cross has been found".
  • the user interface 430 also displays an overlay 432 on top of the image 411.
  • the overlay 432 displays the translation 433 of the target text matching the user's input text in the image 411, and overlays the text block corresponding to the matching target text.
  • the text "King's Cross St Pancras" contained in the text block 416 in the image 411 matches the user's input text "King's Cross", covering 432 to display the text "King's Cross St Pancras" "The translation 433 "King's Cross” and overlays the corresponding text block 416.
  • the user interface 430 may configure the display style of the translation of the target text based on the display style of the matched target text.
  • the background color of the translation 433 is similar to that of the matched target text "King's Cross St Pancras" (for example, both are white), and the text colors are similar (for example, both are blue), which helps the translation 433 to be displayed in harmony with the image 411.
  • the user interface 430 may also highlight the translation.
  • the background color of the text of the translation 433 is a bright color (for example, bright blue), which helps to highlight the translation 433.
  • the user can quickly and easily locate the target text of interest in the image 411 and obtain its translation, without passively reading the translations of all the text in the image. For example, after capturing the image 411 and inputting "King's Cross" through the user interface 420, the user can easily locate the "King's Cross" subway station through the user interface 430 and see its translation.
  • the user interface 430 may also enable the user to download and store the image 411 with the translation 433 displayed.
  • the user interface 430 includes a download button 434.
  • the image 411 displaying the translation 433 can be downloaded and stored for the user to view later.
  • FIG. 5 shows a flowchart of an input processing method 500 according to an embodiment of the present invention.
  • the input processing method 500 can be executed in the input processing device 200.
  • the input processing method 500 starts at step S510.
  • in step S510, the user's input is received.
  • in step S520, an image related to the user's input is acquired.
  • the user's input may include at least one of the following: text, image, video, and audio.
  • the input text may be obtained based on the user's input.
  • in step S530, target text in the image that matches the user's input is searched for.
  • the text contained in the image can be obtained, and the text contained in the image can be matched with the input text to find the matching target text.
  • if the language of the input text is different from the language of the text contained in the image, a translation of the input text into the language of the text contained in the image can be obtained, and the text contained in the image can be matched with the translation of the input text.
  • for the found target text, the target text may be highlighted in the input-related image, and/or auxiliary information of the target text may be displayed. The auxiliary information itself can also be highlighted.
  • the display information of the target text in the above image can be acquired, and the auxiliary information can be displayed or highlighted based on the display information.
  • the display information may at least include the display area and/or display style of the target text.
  • the display area of the auxiliary information may be configured based on the display area of the target text.
  • the display area of the auxiliary information may cover or be close to the display area of the target text.
  • the display style of the auxiliary information can also be configured based on the display style of the target text.
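  • Under the assumptions of the earlier sketches, the steps of method 500 can be composed roughly as follows (the mapping of code to S-numbers is approximate, and the helper functions are the hypothetical ones sketched above):

```python
from PIL import Image

def process_input(user_input, image_path, translate_to_source=None):
    items = split_input_text(user_input)        # S510: receive the user's input
    blocks = recognize_text_blocks(image_path)  # S520: acquire and recognize the image
    image = Image.open(image_path)
    texts = [b["text"] for b in blocks]
    for item in items:                          # S530: match each item of input text
        for idx in find_target_texts(item, texts, translate_to_source):
            # Display auxiliary information over the matched block.
            image = overlay_auxiliary_info(image, blocks[idx]["box"], item)
    return image
```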
  • Fig. 6 shows a flowchart of an input processing method 600 according to an embodiment of the present invention.
  • the input processing method 600 can be executed in the input processing device 200. As shown in FIG. 6, the input processing method 600 starts at step S610.
  • in step S610, the user's input is received.
  • in step S620, an image related to the user's input is acquired.
  • the input text may also be obtained based on the input.
  • in step S630, target text in the image that matches the user's input is searched for.
  • the text contained in the image can be obtained.
  • the input text is expressed in an input language, while the text contained in the image is expressed in a source language different from the input language. Therefore, it is necessary to obtain a translation of the input text into the source language, and then match the text contained in the image against the translation of the input text to find the matching target text.
  • the translation may include the translation of the target text into the input language, or the translation of the target text into the target language specified by the user.
  • the target text may be highlighted in the image, and/or the translation of the target text may be displayed.
  • although the input processing apparatus 200 is illustrated as including an interaction module 210, a recognition module 220, a text matching module 230, an image acquisition module 240, and a translation engine 250, one or more of these modules may be stored on and/or executed by other devices, such as a server that communicates with the input processing apparatus 200 (for example, a server that can perform image recognition, voice recognition, text matching, and language translation).
  • the input processing device 200 may receive user input, obtain an image related to the input, and send the input and the image to the server. The server searches for the target text that matches the input in the image, and returns the search result to the input processing device 200.
  • the input processing device 200 then highlights the target text found by the server in the image, and/or displays auxiliary information of the target text.
  • for details of highlighting the target text and displaying its auxiliary information, please refer to the relevant description of the input processing apparatus 200 and the input processing method 500 in conjunction with FIGS. 1 to 5, which will not be repeated here.
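  • A sketch of the client side of this server-assisted variant (the endpoint URL and response format are assumptions, not defined by the patent):

```python
import requests

def match_on_server(user_input, image_path,
                    endpoint="https://example.com/api/match"):
    """Send the input and the image to a server that performs
    recognition, matching, and translation; return its result."""
    with open(image_path, "rb") as f:
        response = requests.post(endpoint,
                                 data={"input": user_input},
                                 files={"image": f},
                                 timeout=10)
    response.raise_for_status()
    # Assumed response: matched blocks with boxes and auxiliary info.
    return response.json()
```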
  • the following describes the input processing device 700 and the input processing method 800 executed by it.
  • the image related to the input may be considered to include multiple image blocks.
  • the user wants to locate the image block he focuses on among the multiple image blocks included in the image, that is, the target image.
  • the input processing device 700 may receive the user's input of a commodity that he wants to purchase, and obtain an image of a store shelf.
  • the image of the store shelf includes logo images of multiple commodities, and the input processing device 700 can highlight the logo images of the commodities that the user wants to purchase in the images, so that the user can quickly locate the commodities.
  • the evaluation, introduction or reference price of the product can also be displayed for the user's reference.
  • FIG. 7 shows a schematic diagram of an input processing device 700 according to an embodiment of the present invention.
  • the input processing device 700 includes an interaction module 710, an image acquisition module 720, and an image matching module 730.
  • the interaction module 710 may receive a user's input, and the input may indicate a target image that the user pays attention to.
  • the image acquisition module 720 may acquire an image related to the user's input.
  • the image matching module 730 is coupled to the interaction module 710 and the image acquisition module 720, and can search the above-mentioned image for a target image matching the user's input. For the found target image, the interaction module 710 may highlight the target image in the above-mentioned image and/or display auxiliary information of the target image.
  • FIG. 8 shows a flowchart of an input processing method 800 according to an embodiment of the present invention. As shown in FIG. 8, the input processing method 800 is executed in the input processing device 700 and starts at step S810.
  • a user's input may be received.
  • a user's input may be received via a user interface, and the input may indicate a target image that the user focuses on.
  • the user's input can be in various forms, for example, it can include but is not limited to one of the following: text, image, video, and audio. If the user's input includes text, audio, or video, the input image needs to be obtained based on the user's input. If the user inputs an image, that image is the input image. If the user inputs text, the input image can be obtained based on the text (for example, through a search engine). If the user inputs audio, text can be obtained based on the audio first, and then the input image can be obtained based on the text. For example, if the user inputs the text "X doll" or speaks "X doll", an image of "X doll" can be obtained based on the input. If the user inputs a video, since a video includes images and audio, the input image can be obtained with reference to the input-image and input-audio cases.
  • the input image can be one or more input images.
  • if the user inputs images, multiple input images are naturally distinguished from one another.
  • if the user inputs text, multiple texts can be obtained first (distinguished by separators such as punctuation marks), and then each input image is obtained based on each text.
  • if the user inputs audio, multiple texts can be obtained based on the audio first (distinguished by pauses or by a spoken separator such as "interval"), and then each input image is obtained based on each text.
  • if the user inputs a video, since a video includes continuous images and audio, it can be handled with reference to the cases of continuous image input and/or audio input.
  • an image related to the user's input may be acquired.
  • the camera of the computing device 100 can be used to collect related images, or the related images sent to the input processing device 700 can be received via the network, and the source of the related images is not limited in the present invention.
  • in step S830, a target image in the image that matches the user's input can be searched for.
  • specifically, for each input image, a target image matching that input image is searched for in the image. The image features of the image and the input image(s) can be acquired first, and then, based on the acquired image features, the image can be matched with the input image to find the target image in the image that matches the input image.
  • any image feature extraction technology can be used to obtain image features
  • any image matching technology can be used to perform image matching, which is not limited in the present invention.
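  • One concrete (and assumed) choice of feature extraction and matching is ORB via OpenCV; a production implementation could additionally estimate the target's display area with cv2.findHomography. A minimal sketch:

```python
import cv2

def contains_target_image(scene_path, query_path, min_matches=15):
    """Decide whether the input (query) image appears in the scene image,
    using ORB features and brute-force Hamming matching."""
    scene = cv2.imread(scene_path, cv2.IMREAD_GRAYSCALE)
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    kp_q, des_q = orb.detectAndCompute(query, None)
    kp_s, des_s = orb.detectAndCompute(scene, None)
    if des_q is None or des_s is None:
        return False  # no features detected in one of the images
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_q, des_s)
    good = [m for m in matches if m.distance < 50]  # assumed distance threshold
    return len(good) >= min_matches
```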
  • if no matching target image is found, the user interface can be used to prompt the user. If a target image matching an input image is found, vibration, ringing, a blinking signal light, and/or the user interface of the computing device 100 can be used to prompt the user.
  • the target image matching the input image and/or auxiliary information of the target image may be displayed prominently in the above-mentioned image.
  • the auxiliary information may be information such as introduction, evaluation, reference price, etc., which is not limited in the present invention.
  • auxiliary information related to the target image can be obtained from various search engines, review websites, or shopping platforms.
  • various marks such as borders (for example, rectangular frames), shapes (for example, arrows), line segments (for example, wavy lines), etc. may be added to the target image in the image, so as to highlight the target image. In this way, the user can quickly locate the target image without having to search one by one.
  • the display area of the target image in the above-mentioned image may be obtained, and the display area indicates the position of the target image in the image.
  • the display area of the auxiliary information is configured based on the display area of the target image.
  • the display area of the auxiliary information may cover or be close to the display area of the target image.
  • the display style of the auxiliary information can also be configured based on the display style of the text contained in the image.
  • at least part of the display style of the auxiliary information may be configured to be consistent with the corresponding display style of the text in the image (for example, the font size and font type configuration are consistent).
  • the display style of the auxiliary information may be configured to be significantly different from the corresponding display style of the text contained in the image.
  • the background color (or text color) of the auxiliary information can be configured as a bright color or a contrast color of the background color of the text contained in the image to highlight the auxiliary information.
  • display styles such as underline, bold, italic, text background, text shading, and text border may be used to highlight the auxiliary information.
  • the embodiment of the present invention does not limit the specific display style used to display or highlight the auxiliary information.
  • the marks and/or auxiliary-information display styles of target images matching the same input image can be the same, while the marks and/or auxiliary-information display styles of target images matching different input images can be different.
  • the displayed auxiliary information may be editable text, so that the user can perform editing operations such as copy and paste.
  • it is also possible to receive the user's zoom instruction or selection instruction on the displayed auxiliary information and, in response, zoom the displayed auxiliary information accordingly to facilitate the user's reading. It is also possible to receive a save instruction from the user and, in response, store the above-mentioned image displaying the auxiliary information in the computing device 100 to facilitate subsequent viewing by the user.
  • the specific display configuration has been described in detail in the previous section, and will not be repeated here.
  • the input processing device 700 may receive user input, obtain an image related to the input, and send the input and the image to the server.
  • the server searches for a target image matching the input in the image, and returns the search result to the input processing device 700.
  • the input processing device 700 then highlights the target image found by the server in the image, and/or displays auxiliary information of the target image.
  • the embodiments of the present invention are not limited to locating target text or target images in images; they can also be extended to locating various target objects, such as target texts and target images, in various source objects such as text, images, and video.
  • FIG. 9 shows a flowchart of an input processing method 900 according to an embodiment of the present invention. As shown in FIG. 9, the input processing method 900 starts at step S910.
  • a user's input is received, and the user's input may include at least one of the following: text, audio, video, and image.
  • the source object may include at least one of the following: text, image, and video.
  • in step S930, a target object in the source object that matches the user's input can be searched for.
  • the target object may be highlighted in the source object, and/or auxiliary information of the target object may be displayed.
  • the source object may generally include multiple objects, for example, multiple pieces of text or multiple image blocks.
  • the target object may be an object that the user pays attention to among multiple objects included in the source object, for example, it may be a target text, a target image, or any other suitable objects.
  • the present invention does not limit the specific types of the target object and the source object; scenarios in which target text is searched for in an image, target text is searched for in text, a target image is searched for in an image, and so on are all within the protection scope of the present invention.
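  • The generalization can be pictured as a dispatch table keyed by (input type, source type). This sketch registers only a simple text-in-text matcher; the OCR-based and feature-based matchers sketched earlier would be further (hypothetical) entries:

```python
def match_text_in_text(query, source):
    """Indices of lines of the source text containing the query."""
    q = query.casefold()
    return [i for i, line in enumerate(source.splitlines()) if q in line.casefold()]

# Other (input type, source type) combinations would be registered similarly.
MATCHERS = {("text", "text"): match_text_in_text}

def find_target_object(user_input, input_type, source_object, source_type):
    """Generalized lookup of a target object in a source object."""
    matcher = MATCHERS.get((input_type, source_type))
    if matcher is None:
        raise ValueError("unsupported combination: %s in %s" % (input_type, source_type))
    return matcher(user_input, source_object)

print(find_target_object("King's Cross", "text",
                         "Angel\nKing's Cross St Pancras", "text"))  # -> [1]
```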
  • one or more steps in the input processing method 900 may also be executed by other devices, such as a server that communicates with the input processing apparatus that executes the input processing method 900.
  • a user's input may be received, an image related to the input may be obtained, and the input and the image may be sent to the server.
  • the server searches for the target image that matches the input in the image, and returns the search result.
  • the input processing device that executes the input processing method 900 then highlights the target image found by the server in the image, and/or displays auxiliary information of the target image.
  • in summary, according to the input processing solution of the embodiments of the present invention, after receiving the user's input for a source object (for example, an image), the target object (for example, target text or a target image) matching the input is found in the source object, and the target object is highlighted for the user and/or auxiliary information of the target object is displayed. This avoids the tedious operation of reading through the source object to find the target object the user pays attention to, and improves the user experience. Moreover, the solution only needs to obtain auxiliary information for the target object contained in the source object, which greatly reduces the workload.
  • the various technologies described here can be implemented in hardware or software, or a combination of them. Thus, the method and device of the embodiments of the present invention, or certain aspects or parts thereof, may take the form of program code (that is, instructions) embedded in a tangible medium, such as a removable hard disk, USB flash drive, floppy disk, CD-ROM, or any other machine-readable storage medium, wherein, when the program is loaded into a machine such as a computer and executed by the machine, the machine becomes a device for practicing the embodiments of the present invention.
  • when the program code is executed on a programmable computer, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
  • the memory is configured to store program code; the processor is configured to execute the method of the embodiment of the present invention according to instructions in the program code stored in the memory.
  • readable media include readable storage media and communication media.
  • the readable storage medium stores information such as computer readable instructions, data structures, program modules, or other data.
  • Communication media generally embody computer-readable instructions, data structures, program modules or other data in modulated data signals such as carrier waves or other transmission mechanisms, and include any information delivery media. Combinations of any of the above are also included in the scope of readable media.
  • the algorithms and displays are not inherently related to any particular computer, virtual system or other equipment.
  • Various general-purpose systems can also be used with the examples of embodiments of the present invention. Based on the above description, the structure required to construct this type of system is obvious.
  • the embodiments of the present invention are not directed to any specific programming language. It should be understood that various programming languages can be used to implement the content of the embodiments of the present invention described herein, and the above description of specific languages is for the purpose of disclosing the best implementation of the embodiments of the present invention.
  • the modules, units, or components of the devices in the examples disclosed herein can be arranged in the device as described in the embodiments, or alternatively can be located in one or more devices different from the device in the examples.
  • the modules in the foregoing examples can be combined into one module or, in addition, can be divided into multiple sub-modules.
  • the modules, units, or components in the embodiments can be combined into one module, unit, or component, and in addition, they can be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention relates to an input processing method and apparatus, and a computing device. The method comprises: receiving a user's input; acquiring an image related to the input; searching the image for target text matching the input; and, for the found target text, highlighting the target text in the image and/or displaying auxiliary information about the target text. The invention also relates to a corresponding input processing apparatus, a computing device, and a storage medium.
PCT/CN2021/070410 2020-01-08 2021-01-06 Input processing method and apparatus, and computing device WO2021139667A1

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010019082.X 2020-01-08
CN202010019082.XA CN113095090A (zh) 2020-01-08 2020-01-08 Input processing method and apparatus, and computing device

Publications (1)

Publication Number Publication Date
WO2021139667A1

Family

ID=76664046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070410 WO2021139667A1 (fr) 2021-01-06 Input processing method and apparatus, and computing device

Country Status (2)

Country Link
CN (1) CN113095090A
WO (1) WO2021139667A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750003A (zh) * 2012-05-30 2012-10-24 华为技术有限公司 Text input method and apparatus
US20140082550A1 * 2012-09-18 2014-03-20 Michael William Farmer Systems and methods for integrated query and navigation of an information resource
CN110232111A (zh) * 2019-05-30 2019-09-13 杨钦清 Text display method and apparatus, and terminal device

Also Published As

Publication number Publication date
CN113095090A (zh) 2021-07-09

Similar Documents

Publication Publication Date Title
US11227326B2 (en) Augmented reality recommendations
US9922431B2 (en) Providing overlays based on text in a live camera view
US9760778B1 (en) Object recognition and navigation from ongoing video
US8943092B2 (en) Digital ink based contextual search
US20140253592A1 (en) Method for providing augmented reality, machine-readable storage medium, and portable terminal
CN109618222A (zh) 一种拼接视频生成方法、装置、终端设备及存储介质
US20200258146A1 (en) Electronic purchase order generation method and device, terminal and storage medium
CN109189879B (zh) 电子书籍显示方法及装置
KR20140030361A (ko) 휴대단말기의 문자 인식장치 및 방법
KR20120057799A (ko) 휴대단말에서 사전 기능 제공 방법 및 장치
US20220101638A1 (en) Image processing method, and electronic device supporting same
TWI710989B (zh) 應用於客戶端、服務端的業務執行方法、裝置以及設備
WO2014176938A1 (fr) Procédé et appareil de récupération d'informations
US20240119082A1 (en) Method, apparatus, device, readable storage medium and product for media content processing
CN109085982B (zh) 内容识别方法、装置及移动终端
CN108256071B (zh) 录屏文件的生成方法、装置、终端及存储介质
CN107679128B (zh) 一种信息展示方法、装置、电子设备及存储介质
US20130039535A1 (en) Method and apparatus for reducing complexity of a computer vision system and applying related computer vision applications
CN114067797A (zh) 一种语音控制方法、装置、设备以及计算机存储介质
WO2021139667A1 (fr) Procédé et appareil de traitement d'entrée, et dispositif informatique
CN106650727B (zh) 一种信息显示方法以及ar设备
EP4109334A1 (fr) Procédé et appareil de sélection de caractères employant une reconnaissance de caractères, et dispositif terminal
KR20120133149A (ko) 데이터 태깅 장치, 그의 데이터 태깅 방법 및 데이터 검색 방법
US10762344B2 (en) Method and system for using whiteboard changes as interactive directives for vectorization software
EP2784736A1 (fr) Procédé et système de fourniture d'accès à des données

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21738881

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21738881

Country of ref document: EP

Kind code of ref document: A1