US9514377B2 - Techniques for distributed optical character recognition and distributed machine language translation - Google Patents

Techniques for distributed optical character recognition and distributed machine language translation

Info

Publication number
US9514377B2
Authority
US
United States
Prior art keywords
ocr
computing device
mobile computing
text
complexity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14/264,327
Other versions
US20150310291A1 (en)
Inventor
Alexander Jay Cuthbert
Peng Xu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Assigned to GOOGLE INC. Assignment of assignors interest (see document for details). Assignors: CUTHBERT, ALEXANDER JAY; XU, PENG
Priority to US14/264,327 (US9514377B2)
Priority to CN201580029025.7A (CN106415605B)
Priority to EP15720892.7A (EP3138046B1)
Priority to PCT/US2015/027884 (WO2015168056A1)
Priority to KR1020167033289A (KR101856119B1)
Priority to EP16202790.8A (EP3168756B1)
Publication of US20150310291A1
Publication of US9514377B2
Application granted
Assigned to GOOGLE LLC. Change of name (see document for details). Assignors: GOOGLE INC.
Legal status: Active
Adjusted expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/10: Image acquisition
    • G06V10/17: Image acquisition using hand-held instruments
    • G06K9/18
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06K9/22
    • G06K9/325
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/96: Management of image or video recognition tasks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/142: Image acquisition using hand-held instruments; Constructional details of the instruments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/22: Character recognition characterised by the type of writing
    • G06V30/224: Character recognition characterised by the type of writing of printed characters having additional code marks or containing code marks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Definitions

  • the present disclosure generally relates to mobile computing devices and, more particularly, to techniques for distributed optical character recognition (OCR) and distributed machine language translation.
  • Optical character recognition involves the detection of a text in an image using a computing device, e.g., a server. OCR can provide for a faster way to obtain the text in a digital form at a user device, e.g., compared to manual input of the text to the user device by a user.
  • the text can be utilized in various ways. For example, the text may be processed by a computing device, stored at a memory, and/or transmitted to another computing device.
  • One example of processing the text is machine language translation, which involves translating the text from a source language to a different target language using a computing device.
  • a computer-implemented technique can include receiving, at a mobile computing device having one or more processors, an image of an object comprising a text in a source language.
  • the technique can include determining, at the mobile computing device, a degree of optical character recognition (OCR) complexity for performing OCR on the image to obtain the text.
  • the technique can include performing, at the mobile computing device, OCR on the image to obtain an OCR text.
  • the technique can include: (i) transmitting, from the mobile computing device, at least a portion of the image to a first server, and (ii) receiving, at the mobile computing device at least a portion of the OCR text from the first server.
  • the technique can include determining, at the mobile computing device, a degree of translation complexity for translating the OCR text from the source language to a target language.
  • the technique can include performing, at the mobile computing device, machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text in the target language.
  • the technique can include: (i) transmitting at least a portion of the OCR text to a second server, and (ii) receiving at least a portion of the translated OCR text from the second server.
  • the technique can also include outputting, at a display of the mobile computing device, the translated OCR text.
  • the technique can include: transmitting, from the mobile computing device, at least the portion of the image to the first server, and receiving, at the mobile computing device, at least the portion of the OCR text from the first server.
  • the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing and the first server is appropriate for performing.
  • the technique can include: transmitting, from the mobile computing device, all of the image to the first server, and receiving, at the mobile computing device, all of the OCR text from the first server.
  • the technique can include: transmitting, from the mobile computing device, at least the portion of the OCR text to the second server, and receiving, at the mobile computing device, at least the portion of the translated OCR text from the second server.
  • the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing and the second server is appropriate for performing.
  • the technique can include: transmitting, from the mobile computing device, all of the OCR text to the second server, and receiving, at the mobile computing device, all of the translated OCR text from the second server.
  • the translated OCR text includes first and second portions corresponding to machine language translation by the mobile computing device and the second server, respectively, and outputting the translated OCR text includes outputting, at the display of the mobile computing device, the first portion of the translated OCR text while awaiting the second portion of the translated OCR text from the second server.
  • the OCR text includes first and second portions corresponding to the first and second portions of the translated OCR text, respectively, and outputting the translated OCR text includes outputting, at the display of the mobile computing device, the first portion of the translated OCR text and the second portion of the OCR text while awaiting the second portion of the translated OCR text from the second server.
  • the technique further includes outputting, at the display of the mobile computing device, the first and second portions of the translated OCR text in response to receiving the second portion of the translated OCR text from the second server.
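The interim and final display states described in the preceding paragraphs can be sketched as follows. This is a minimal illustration only: the function name `compose_display` and the choice to join the two portions with a single space are assumptions, not details taken from the disclosure.

```python
from typing import Optional


def compose_display(first_translated: str,
                    second_ocr: str,
                    second_translated: Optional[str] = None) -> str:
    """Return the text to show on the display at the current moment.

    first_translated:  portion translated on the mobile computing device.
    second_ocr:        untranslated OCR text for the portion sent to the server.
    second_translated: the server's translation, or None while still awaited.
    """
    # While the server's portion is pending, fall back to the untranslated
    # OCR text so the display is not left with a gap.
    second = second_ocr if second_translated is None else second_translated
    # Joining with a space is an assumption made for this sketch.
    return f"{first_translated} {second}"
```

Calling the function once without the server result and again once the result has arrived yields the interim display followed by the fully translated display.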
  • a mobile computing device having one or more processors configured to perform operations is presented.
  • the operations can include receiving an image of an object comprising a text in a source language.
  • the operations can include determining a degree of OCR complexity for performing OCR on the image to obtain the text.
  • when the degree of OCR complexity is less than a first OCR complexity threshold, the first OCR complexity threshold representing a degree of OCR complexity that the mobile computing device is appropriate for performing, the operations can include performing OCR on the image to obtain an OCR text.
  • the operations can include: (i) transmitting, via a communication device, at least a portion of the image to a first server, and (ii) receiving, via the communication device, at least a portion of the OCR text from the first server.
  • the operations can include determining a degree of translation complexity for translating the OCR text from the source language to a target language.
  • when the degree of translation complexity is less than a first translation complexity threshold, the first translation complexity threshold representing a degree of translation complexity that the mobile computing device is appropriate for performing, the operations can include performing machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text in the target language.
  • the operations can include (i) transmitting, via the communication device, at least a portion of the OCR text to a second server, and (ii) receiving, via the communication device, at least a portion of the translated OCR text from the second server.
  • the operations can also include outputting the translated OCR text at a display of the mobile computing device.
  • the operations can further include: transmitting, via the communication device, at least the portion of the image to the first server, and receiving, via the communication device, at least the portion of the OCR text from the first server.
  • the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing and the first server is appropriate for performing.
  • the operations can further include: transmitting, via the communication device, all of the image to the first server, and receiving, via the communication device, all of the OCR text from the first server.
  • the operations can further include: transmitting, via the communication device, at least the portion of the OCR text to the second server, and receiving, via the communication device, at least the portion of the translated OCR text from the second server.
  • the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing and the second server is appropriate for performing.
  • the operations can further include: transmitting, via the communication device, all of the OCR text to the second server, and receiving, via the communication device, all of the translated OCR text from the second server.
  • the translated OCR text includes first and second portions corresponding to machine language translation by the mobile computing device and the second server, respectively, and outputting the translated OCR text at the display of the mobile computing device includes displaying the first portion of the translated OCR text while awaiting the second portion of the translated OCR text from the second server.
  • the OCR text includes first and second portions corresponding to the first and second portions of the translated OCR text, respectively, and outputting the translated OCR text at the display of the mobile computing device includes displaying the first portion of the translated OCR text and the second portion of the OCR text while awaiting the second portion of the translated OCR text from the second server.
  • the operations further include outputting, at the display of the mobile computing device, the first and second portions of the translated OCR text in response to receiving the second portion of the translated OCR text from the second server.
  • the technique can include receiving, at a mobile computing device having one or more processors, an image of an object comprising a text in a source language.
  • the technique can include determining, at the mobile computing device, a degree of OCR complexity for performing OCR on the image to obtain the text.
  • the technique can include transmitting, from the mobile computing device to a server, at least a portion of the image based on the degree of OCR complexity.
  • the technique can include receiving, at the mobile computing device from the server, OCR results.
  • the technique can include obtaining, at the mobile computing device, an OCR text based on the OCR results.
  • the technique can include obtaining, at the mobile computing device, a machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text.
  • the technique can also include outputting, at a display of the mobile computing device, the translated OCR text.
  • the technique further includes: performing, at the mobile computing device, OCR for the entire image when the degree of OCR complexity is less than a first OCR complexity threshold, and transmitting, from the mobile computing device to the server, at least the portion of the image when the degree of OCR complexity is greater than the first OCR complexity threshold.
  • the first OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is appropriate for performing itself.
  • the technique further includes transmitting, from the mobile computing device to the server, all of the image when the degree of OCR complexity is greater than a second OCR complexity threshold that is greater than the first OCR complexity threshold.
  • the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing itself.
  • when the degree of OCR complexity is between the first and second OCR complexity thresholds, the mobile computing device performs OCR for a first portion of the image and transmits a second portion of the image to the server, the first and second portions of the image collectively forming the entire image.
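The two-threshold decision described above amounts to a three-way routing rule. The sketch below is one possible reading, not the patented implementation: the names `Route` and `route_ocr` and the numeric scale of the complexity score are hypothetical, and the text does not say what happens when the score equals a threshold, so boundary values are treated here as the split case.

```python
from enum import Enum


class Route(Enum):
    DEVICE = "device"   # the mobile computing device performs OCR on the entire image
    SPLIT = "split"     # the device handles a first portion, the server a second portion
    SERVER = "server"   # all of the image is transmitted to the server


def route_ocr(complexity: float,
              first_threshold: float,
              second_threshold: float) -> Route:
    """Pick where OCR runs from the degree of OCR complexity."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first")
    if complexity < first_threshold:
        return Route.DEVICE
    if complexity > second_threshold:
        return Route.SERVER
    # Between the thresholds (boundary values included by assumption).
    return Route.SPLIT
```

The same rule applies unchanged to the translation stage if the OCR thresholds are replaced by the first and second translation complexity thresholds.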
  • the technique can include receiving, at a mobile computing device having one or more processors, an image of an object comprising a text in a source language.
  • the technique can include obtaining, at the mobile computing device, OCR results for the object and the text to obtain an OCR text.
  • the technique can include determining, at the mobile computing device, the source language of the OCR text.
  • the technique can include determining, at the mobile computing device, a degree of translation complexity for performing machine language translation of the OCR text from the source language to a target language.
  • the technique can include transmitting, from the mobile computing device to a server, at least a portion of the OCR text based on the degree of translation complexity.
  • the technique can include receiving, at the mobile computing device from the server, machine language translation results.
  • the technique can include obtaining, at the mobile computing device, a translated OCR text based on the machine language translation results.
  • the technique can also include outputting, at a display of the mobile computing device, the translated OCR text.
  • the technique further includes: performing, at the mobile computing device, machine language translation for the entire OCR text when the degree of translation complexity is less than a first translation complexity threshold, and transmitting, from the mobile computing device to the server, at least the portion of the OCR text when the degree of translation complexity is greater than the first translation complexity threshold.
  • the first translation complexity threshold represents a degree of translation complexity that the mobile computing device is appropriate for performing itself.
  • the technique further includes: transmitting, from the mobile computing device to the server, all of the OCR text when the degree of translation complexity is greater than a second translation complexity threshold that is greater than the first translation complexity threshold.
  • the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing itself.
  • when the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device performs machine language translation for a first portion of the OCR text and transmits a second portion of the OCR text to the server, the first and second portions of the OCR text collectively forming the entire OCR text.
  • machine language translation results for the first portion of the OCR text that are obtained by the mobile computing device are output to the display of the mobile computing device before the machine language translation results for the second portion of the OCR text are received from the server.
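The ordering described above, where the locally translated portion is displayed before the server's results are received, can be sketched with a background request. The callables `local_translate`, `server_translate`, and `display` are hypothetical stand-ins supplied by the caller; the disclosure does not prescribe a threading model, and a thread pool is used here only to make the overlap visible.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def translate_split(first_part: str,
                    second_part: str,
                    local_translate: Callable[[str], str],
                    server_translate: Callable[[str], str],
                    display: Callable[[str], None]) -> str:
    """Translate two portions of an OCR text, showing the local result first."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the server request first so the network round trip overlaps
        # with the on-device translation.
        pending = pool.submit(server_translate, second_part)
        first_translated = local_translate(first_part)
        # The device's portion is displayed before the server result is collected.
        display(first_translated)
        second_translated = pending.result()
    full = f"{first_translated} {second_translated}"
    # Once the server's portion has been received, show the complete translation.
    display(full)
    return full
```

Passing a list's `append` method as `display` records the sequence of displayed frames, which is a convenient way to check that the partial result always precedes the complete one.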
  • the technique can include receiving, at a mobile computing device having one or more processors, an image of an object comprising a text in a source language.
  • the technique can include determining, at the mobile computing device, a degree of OCR complexity for performing OCR on the image to obtain the text.
  • the technique can include transmitting, from the mobile computing device to a first server, at least a portion of the image based on the degree of OCR complexity.
  • the technique can include receiving, at the mobile computing device from the first server, OCR results.
  • the technique can include obtaining, at the mobile computing device, an OCR text based on the OCR results.
  • the technique can include determining, at the mobile computing device, a degree of translation complexity for performing machine language translation of the OCR text from the source language to a target language.
  • the technique can include transmitting, from the mobile computing device to a second server, at least a portion of the OCR text based on the degree of translation complexity.
  • the technique can include receiving, at the mobile computing device from the second server, machine language translation results.
  • the technique can include obtaining, at the mobile computing device, a translated OCR text based on the machine language translation results.
  • the technique can also include outputting, at a display of the mobile computing device, the translated OCR text.
  • the technique further includes: performing, at the mobile computing device, OCR for the entire image when the degree of OCR complexity is less than a first OCR complexity threshold, wherein the first OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is appropriate for performing itself, and transmitting, from the mobile computing device to the first server, at least the portion of the image when the degree of OCR complexity is greater than the first OCR complexity threshold.
  • the technique further includes: transmitting, from the mobile computing device to the first server, all of the image when the degree of OCR complexity is greater than a second OCR complexity threshold that is greater than the first OCR complexity threshold, wherein the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing itself, and when the degree of OCR complexity is between the first and second OCR complexity thresholds, the mobile computing device performs OCR for a first portion of the image and the mobile computing device transmits a second portion of the image to the first server, the first and second portions of the image collectively forming the entire image.
  • the technique further includes: performing, at the mobile computing device, machine language translation for the entire OCR text when the degree of translation complexity is less than a first translation complexity threshold, wherein the first translation complexity threshold represents a degree of translation complexity that the mobile computing device is appropriate for performing itself, and transmitting, from the mobile computing device to the second server, at least the portion of the OCR text when the degree of translation complexity is greater than the first translation complexity threshold.
  • the technique further includes: transmitting, from the mobile computing device to the second server, all of the OCR text when the degree of translation complexity is greater than a second translation complexity threshold that is greater than the first translation complexity threshold, wherein the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing itself, and when the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device performs machine language translation for a first portion of the OCR text and the mobile computing device transmits a second portion of the OCR text to the second server, the first and second portions of the OCR text collectively forming the entire OCR text.
  • the translated OCR text includes first and second portions corresponding to machine language translation by the mobile computing device and the second server, respectively, and outputting the translated OCR text at the display of the mobile computing device includes displaying the first portion of the translated OCR text while awaiting the second portion of the translated OCR text from the second server, and subsequently outputting, at the display of the mobile computing device, the first and second portions of the translated OCR text in response to receiving the second portion of the translated OCR text from the second server.
  • the technique can include obtaining, at a mobile computing device having one or more processors, a text in a source language.
  • the technique can include determining, at the mobile computing device, the source language of the text.
  • the technique can include determining, at the mobile computing device, a degree of translation complexity for performing machine language translation of the text from the source language to a target language.
  • the technique can include transmitting, from the mobile computing device to a server, at least a portion of the text based on the degree of translation complexity.
  • the technique can include receiving, at the mobile computing device from the server, machine language translation results.
  • the technique can include obtaining, at the mobile computing device, a translated text based on the machine language translation results.
  • the technique can also include outputting, at a display of the mobile computing device, the translated text.
  • the technique further includes: performing, at the mobile computing device, machine language translation for the entire text when the degree of translation complexity is less than a first translation complexity threshold, and transmitting, from the mobile computing device to the server, at least the portion of the text when the degree of translation complexity is greater than the first translation complexity threshold.
  • the first translation complexity threshold represents a degree of translation complexity that the mobile computing device is appropriate for performing itself.
  • the technique further includes transmitting, from the mobile computing device to the server, all of the text when the degree of translation complexity is greater than a second translation complexity threshold that is greater than the first translation complexity threshold.
  • the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing itself.
  • when the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device performs machine language translation for a first portion of the text and transmits a second portion of the text to the server, the first and second portions of the text collectively forming the entire text.
  • machine language translation results for the first portion of the text that are obtained by the mobile computing device are output to the display of the mobile computing device before the machine language translation results for the second portion of the text are received from the server.
  • FIG. 1 is a functional block diagram of a computing network including an example mobile computing device according to some implementations of the present disclosure;
  • FIG. 2 is a functional block diagram of the example mobile computing device of FIG. 1;
  • FIGS. 3A-3C are flow diagrams of example techniques for distributed optical character recognition (OCR) and/or distributed machine language translation according to some implementations of the present disclosure; and
  • FIGS. 4A-4D illustrate an example display of the example mobile computing device at various stages during execution of the distributed OCR and machine language translation techniques according to some implementations of the present disclosure.
  • Computer servers may have greater processing power than mobile computing devices (tablet computers, mobile phones, etc.) and thus they may generate better optical character recognition (OCR) results and machine language translation results. While computing servers can typically generate faster and/or more accurate results, obtaining these results at the mobile computing device can be slower due to network delays associated with transmitting data. Moreover, in simple/non-complex cases, the mobile computing device may be capable of generating the same results as a server. For example, an image may have a small quantity of text and/or very large text. Similarly, for example, a text for translation may have a small quantity of characters/words/sentences and/or may be a linguistically simple text.
  • the mobile computing device can receive an image of an object comprising a text in a source language.
  • the mobile computing device can determine a degree of OCR complexity for obtaining the text from the image. Based on this degree of OCR complexity, the mobile computing device and/or a server can perform OCR to obtain an OCR text.
  • the mobile computing device can then determine a degree of translation complexity for translating the OCR text from the source language to a target language.
  • the mobile computing device and/or a server can perform machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text.
  • the mobile computing device can then output the translated OCR text.
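The sequence just described (image, OCR, translation, output) can be sketched end to end. Everything named here is hypothetical: `run_stage` and `translate_image` are illustrative helpers, the complexity functions and thresholds are supplied by the caller, and each stage is simplified to a single threshold with no split between device and server.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_stage(payload: T,
              complexity: Callable[[T], float],
              threshold: float,
              on_device: Callable[[T], str],
              on_server: Callable[[T], str]) -> str:
    """Run one stage on the device when it is simple enough, else on a server."""
    if complexity(payload) < threshold:
        return on_device(payload)
    return on_server(payload)


def translate_image(image: bytes,
                    ocr: Callable[[bytes], str],
                    translate: Callable[[str], str],
                    display: Callable[[str], None]) -> str:
    """Chain the stages: image -> OCR text -> translated OCR text -> display."""
    ocr_text = ocr(image)
    translated = translate(ocr_text)
    display(translated)
    return translated
```

Because each stage is decided independently, a simple image with linguistically complex text could be recognized on the device and translated on a server, or the reverse.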
  • the distributed OCR techniques and the distributed machine language translation techniques of the present disclosure can be used individually (separately) or together.
  • the distributed OCR techniques of the present disclosure can be used to obtain an OCR text, and the translated text can be obtained from the OCR text.
  • an OCR text can be obtained from an image, and the distributed machine language translation techniques of the present disclosure can be performed to obtain a translated text.
  • an OCR text can be determined and stored/output using the distributed OCR techniques of the present disclosure, and/or a translation of an input text can be determined and stored/output using the distributed machine language translation techniques of the present disclosure.
  • The computing network 100 includes servers 104a and 104b (collectively “servers 104”).
  • Server 104a may be an OCR server and server 104b may be a machine language translation server.
  • The term “server” as used herein can refer to both a single hardware computer server and a plurality of similar servers operating in a parallel or distributed architecture.
  • The computing network 100 may include a single server 104 that performs both OCR and machine language translation, or the computing network 100 may include three or more servers that collectively perform OCR and machine language translation.
  • A mobile computing device 108 is configured to communicate with the servers 104 via a network 112.
  • Examples of the mobile computing device 108 include a laptop computer, a tablet computer, a mobile phone, and wearable technology, such as a smartwatch, eyewear, or other wearable objects that incorporate a computing device. It should be appreciated, however, that the techniques of the present disclosure could be implemented at any computing device having a display and a camera, e.g., a desktop computer.
  • The network 112 can include a local area network (LAN), a wide area network (WAN), e.g., the Internet, or a combination thereof.
  • The mobile computing device 108 can be associated with a user 116.
  • The user 116 can interact with the mobile computing device 108 via a display 120, e.g., a touch display.
  • The user 116 can use the mobile computing device 108 to interact with an object 124 having a text 128 thereon.
  • The object 124 can be any object suitable to display the text 128, including, but not limited to, a document, a sign, an advertisement, and a menu.
  • The user 116 may command the mobile computing device 108 to capture an image of the object 124 and its text 128 using a camera 212 (see FIG. 2) associated with the mobile computing device 108. OCR can be performed on the image to detect the text 128.
  • The text 128 can then be translated from its source language to a target language, such as a language understood/spoken by the user 116.
  • The mobile computing device 108 can include the display 120, a communication device 200, a processor 204, a memory 208, and the camera 212. It should be appreciated that the mobile computing device 108 can also include other suitable components, such as physical buttons, a microphone, and a speaker.
  • The communication device 200 can include any suitable components (such as a transceiver) that are configured to communicate with other devices, e.g., the servers 104, via the network 112.
  • The memory 208 can be any suitable storage medium (flash, hard disk, etc.) configured to store information at the mobile computing device 108.
  • the processor 204 can control operation of the mobile computing device 108 .
  • Example functions performed by the processor 204 include, but are not limited to, controlling transmission/reception of information via the communication device 200 and controlling read/write operations at the memory 208 .
  • the processor 204 can also process information received from the camera 212 and output information to the display 120 .
  • the camera 212 can be any suitable camera (charge-coupled device (CCD), complementary metal-oxide-semiconductor (CMOS), etc.) configured to capture an image of the object 124 and its text 128 .
  • the display 120 is a touch display configured to receive input from the user 116 .
  • the processor 204 can also be configured to execute at least a portion of the techniques of the present disclosure, which are now discussed in greater detail.
  • the processor 204 can receive an image of the object 124 and its text 128 from the camera 212 .
  • the image can be captured by the camera 212 by the user 116 positioning the camera 212 and providing an input to capture the image.
  • the processor 204 can determine a degree of OCR complexity for the text 128 .
  • the degree of OCR complexity may be determined in response to an OCR request that is (i) generated in response to an input by the user 116 or (ii) generated automatically in response to capturing the image with the camera 212 .
  • the degree of OCR complexity is indicative of a degree of difficulty for the processor 204 to perform the OCR itself. Based on the degree of OCR complexity, the processor 204 can determine whether to transmit the image to the server 104 a for OCR.
  • Example factors in determining the degree of OCR complexity include, but are not limited to, a resolution of the image, a size of the object 124 and/or its text 128 , a style/font of the text 128 , and/or an angle/view at which the image was captured. More specifically, when the image is captured at an angle, i.e., not straight-on or straight-forward, the text 128 in the image may be skewed.
  • a high resolution image may correspond to a lower degree of OCR complexity, whereas a low resolution image may correspond to a higher degree of OCR complexity.
  • a large, non-styled, and/or basic font may correspond to a lower degree of OCR complexity, whereas a small, styled, and/or complex font may correspond to a higher degree of OCR complexity. Little or no skew may correspond to a lower degree of OCR complexity, whereas highly skewed text may correspond to a higher degree of OCR complexity.
  • the processor 204 can determine whether to transmit the image to the server 104 a for OCR based on the degree of OCR complexity. For example, the processor 204 may compare the degree of OCR complexity to one or more OCR complexity thresholds.
  • the OCR complexity threshold(s) can be predefined or user-defined.
  • the degree of OCR complexity may indicate that the server 104 a is appropriate (or more appropriate than the processor 204 ) for performing OCR for at least a portion of the image. In these cases, the processor 204 may transmit at least the portion of the image to the server 104 a . In other cases, the processor 204 may transmit the entire image to the server 104 a for OCR, or may not transmit anything to the server 104 a and thus may perform the OCR entirely by itself.
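  • when the degree of OCR complexity is less than a first OCR complexity threshold, the processor 204 may perform the OCR entirely by itself.
  • when the degree of OCR complexity is greater than the first OCR complexity threshold and less than a second OCR complexity threshold, the processor 204 may transmit at least the portion of the image to the server 104 a .
  • when the degree of OCR complexity is greater than the second OCR complexity threshold, the processor 204 may transmit the entire image to the server 104 a .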
  • the processor 204 may determine that a lower resolution version of the image is sufficient for the server 104 a and thus the processor 204 may transmit at least a portion of the lower resolution version of the image to the server 104 a .
  • the server 104 a can return OCR results to the mobile computing device 108 .
  • the appropriateness of the server 104 a and the processor 204 to perform OCR on the image can refer to an expected level of accuracy and/or efficiency for the server 104 a and the processor 204 to perform the OCR, respectively.
  • the processor 204 can use any suitable OCR algorithms to perform the OCR itself. After obtaining the OCR results locally and/or from the server 104 a , the processor 204 can compile the OCR results to obtain an OCR text.
  • the OCR text represents OCR results for the object 124 having the text 128 thereon. Depending on the quality of the OCR results, the OCR text may be the same as the text 128 or different than the text 128 .
  • the processor 204 can determine a source language of the OCR text.
  • when the processor 204 is not confident in its determination of the source language, the processor 204 may send the OCR text to the server 104 b for this determination and, if requested, for machine language translation as well.
  • the OCR text can be translated from its source language to a target language. For example, the OCR text can be translated in response to a translation request that is (i) generated in response to an input from the user 116 or (ii) generated automatically in response to determining that the source language is not one of one or more languages preferred by the user 116 .
  • the processor 204 can determine a degree of translation complexity for translating the OCR text from the source language to the target language.
  • the degree of translation complexity is indicative of a degree of difficulty for the processor 204 to perform machine language translation of the OCR text itself.
  • Example factors in determining the degree of translation complexity include, but are not limited to, complexities of the source language and/or the target language and a number of characters, words, and/or sentences in the OCR text. Less complex (simple), more common, and/or more utilized languages may correspond to a lower degree of translation complexity, whereas more complex, less common, and/or less utilized languages may correspond to a higher degree of translation complexity. Fewer characters, words, and/or sentences may correspond to a lower degree of translation complexity, whereas more characters, words, and/or sentences may correspond to a higher degree of translation complexity. For example only, English may have a low degree of translation complexity and Russian may have a high degree of translation complexity.
  • the processor 204 can determine whether to transmit the OCR text to the server 104 b for machine language translation. For example, the processor 204 may compare the degree of translation complexity to one or more translation complexity thresholds.
  • the translation complexity threshold(s) can be predefined or user-defined.
  • the mobile computing device 108 may also have local language packs, e.g., stored at the memory 208 , and these local language packs may contain information that can be used by the processor 204 in performing machine language translation itself. The presence and type of these local language packs, therefore, may affect the translation complexity threshold(s).
  • the degree of translation complexity may indicate that the server 104 b is appropriate (or more appropriate than the processor 204 ) for performing machine language translation for at least a portion of the OCR text.
  • the processor 204 may transmit at least the portion of the OCR text to the server 104 b . In other cases, the processor 204 may transmit the entire OCR text to the server 104 b for machine language translation, or may not transmit anything to the server 104 b and thus may perform the machine language translation entirely by itself. More specifically, when the degree of translation complexity is less than a first translation complexity threshold, the processor 204 may perform the machine language translation entirely by itself. When the degree of translation complexity is greater than the first translation complexity threshold and less than a second translation complexity threshold, the processor 204 may transmit at least the portion of the OCR text to the server 104 b . When the degree of translation complexity is greater than the second translation complexity threshold, the processor 204 may transmit the entire OCR text to the server 104 b.
  • the processor 204 can use any suitable machine translation algorithms to perform the machine language translation itself. After obtaining the machine language translation results locally and/or from the server 104 b , the processor 204 can compile the machine language translation results to obtain a translated text.
  • the translated text represents machine language translation results for the OCR text. Depending on the quality of the machine language translation results, the translated text may or may not be an accurate translation of the OCR text from the source language to the target language. Similarly, depending on the quality of the OCR results as discussed above, the translated text may or may not be an accurate translation of the text 128 of the object 124 from the source language to the target language.
  • the processor 204 can output the translated text. For example, the processor 204 can output the translated text to the display 120 .
  • the mobile computing device 108 can receive an image of the object 124 comprising the text 128 in a source language.
  • the mobile computing device 108 can determine the degree of OCR complexity for performing OCR on the image to obtain the text 128 .
  • the mobile computing device 108 can compare the degree of OCR complexity to the first and second OCR complexity thresholds. When the degree of OCR complexity is less than the first OCR complexity threshold, the processor 204 can perform OCR on the image entirely by itself at 316 and then proceed to 332 .
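  • when the degree of OCR complexity is between the first and second OCR complexity thresholds, the mobile computing device 108 can transmit a portion of the image to the server 104 a for OCR at 320 and then proceed to 328 .
  • when the degree of OCR complexity is greater than the second OCR complexity threshold, the mobile computing device 108 can transmit the entire image to the server 104 a for OCR at 324 and then proceed to 328 .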
  • the mobile computing device 108 can receive OCR results from the server 104 a and obtain the OCR text.
  • the mobile computing device 108 can determine the source language of the OCR text. In some implementations, the mobile computing device 108 may transmit at least a portion of the OCR text to the server 104 b to determine the source language.
  • the mobile computing device 108 can determine a degree of translation complexity for translating the OCR text from the source language to a target language.
  • the mobile computing device 108 can compare the degree of translation complexity to the first and second translation complexity thresholds. When the degree of translation complexity is less than the first translation complexity threshold, the mobile computing device 108 can perform machine language translation of the OCR text entirely by itself at 344 and then proceed to 360 . When the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device 108 can transmit a portion of the OCR text to the server 104 b for machine language translation at 348 and then proceed to 356 .
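  • when the degree of translation complexity is greater than the second translation complexity threshold, the mobile computing device 108 can transmit the entire OCR text to the server 104 b for machine language translation at 352 and then proceed to 356 .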
  • the mobile computing device 108 can receive the machine language translation results from the server 104 b and obtain the translated OCR text.
  • the mobile computing device 108 can output the translated OCR text at the display 120 .
  • outputting the translated OCR text at the display includes outputting a portion of the translated OCR text obtained by the mobile computing device before outputting another portion of the translated OCR text obtained from the server 104 b .
  • the technique 300 can then end or return to 304 for one or more additional cycles.
  • the mobile computing device 108 can receive an image of the object 124 comprising the text 128 in the source language.
  • the mobile computing device 108 can determine the degree of OCR complexity for performing OCR on the image to obtain the text 128 .
  • the mobile computing device 108 can transmit at least a portion of the image to the server 104 a based on the degree of OCR complexity.
  • the mobile computing device 108 can receive OCR results from the server 104 a .
  • the mobile computing device 108 can obtain an OCR text using the OCR results.
  • the mobile computing device 108 can obtain a machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text.
  • the mobile computing device 108 can output the translated OCR text.
  • the technique 370 can then end or return to 371 for one or more additional cycles.
  • the mobile computing device 108 can receive an image of the object 124 comprising the text 128 in the source language.
  • the mobile computing device 108 can obtain an OCR text from the image.
  • the mobile computing device 108 can determine the source language of the OCR text.
  • the mobile computing device 108 can transmit at least a portion of the OCR text to the server 104 b to determine the source language.
  • the mobile computing device 108 can determine a degree of translation complexity for performing machine language translation of the OCR text from the source language to a target language.
  • the mobile computing device 108 can transmit at least a portion of the OCR text to the server 104 b based on the degree of translation complexity.
  • the mobile computing device 108 can receive machine language translation results from the server 104 b .
  • the mobile computing device 108 can obtain a translation of the OCR text from the source language to the target language based on the machine language translation results to obtain a translated OCR text.
  • the mobile computing device 108 can output the translated OCR text.
  • the technique 380 can then end or return to 381 for one or more additional cycles.
  • FIG. 4A illustrates an image of the object 124 , which for purposes of FIGS. 4A-4B is a menu in French.
  • the menu includes header/title text 404 (“La Menu”) that is larger than other text 408 .
  • the degree of OCR complexity can vary depending on text size, text style (bold, italics, etc.), and other similar factors.
  • the mobile computing device 108 performs OCR for the header/title text 404 and the server 104 a performs OCR for the other text 408 . More specifically, the mobile computing device 108 performs OCR for a first portion 412 containing the header/title text 404 and the server 104 a performs OCR for a second portion 416 containing the other text 408 .
  • FIG. 4B illustrates the results of the distributed OCR.
  • the mobile computing device 108 obtained a header/title OCR text 420 from the first portion 412 and the server 104 a obtained and provided another OCR text 424 .
  • These OCR texts 420 , 424 collectively represent the OCR text for the image.
  • the OCR texts 420 , 424 can be italicized or otherwise accented, e.g., outlined or bordered, to indicate that OCR is complete and/or machine language translation has not yet been performed.
  • the local OCR may be completed first and thus the header/title OCR text 420 may be accented before the other OCR text 424 .
  • the user 116 is an English speaking user that cannot read or understand French, so he/she requests machine language translation. This could be performed automatically, e.g., based on their language preferences, or in response to an input from the user 116 , e.g., selecting an icon, such as the camera button, or by pushing a physical button.
  • FIG. 4C illustrates the results of the local machine language translation from French to English.
  • the mobile computing device 108 has a French-English local language pack, e.g., stored at the memory 208 , and thus the mobile computing device 108 is capable of some French to English machine language translation. More specifically, the mobile computing device 108 is appropriate for performing machine language translation of a first portion 428 of the OCR text to obtain a first translated OCR text 432 .
  • This first portion 428 of the OCR text may include easy/simple words for French to English machine language translation.
  • the mobile computing device 108 may be incapable of, inaccurate in, or inefficient in performing machine language translation on a second portion 436 of the OCR text. More specifically, this second portion 436 of the OCR text includes a word 440 (Escargots) for which the mobile computing device 108 is not appropriate for performing machine language translation.
  • the mobile computing device 108 can determine that the second portion 436 of the OCR text should be sent to the server 104 b for machine language translation.
  • the styling/accenting of the text can be removed once machine language translation is complete to notify the user 116 .
  • In FIG. 4C , the first translated OCR text 432 has no styling or accenting.
  • FIG. 4D illustrates the results of the distributed machine language translation from French to English.
  • the server 104 b has obtained and provided a second translated OCR text 444 .
  • the first translated OCR text 432 and the second translated OCR text 444 collectively represent the translated OCR text for the image.
  • In response to receiving the second translated OCR text 444 from the server 104 b , the mobile computing device 108 displays the second translated OCR text 444 .
  • the mobile computing device 108 may overlay and/or fade from the word 440 to the second translated OCR text 444 .
  • Example embodiments are provided so that this disclosure will be thorough, and will fully convey the scope to those who are skilled in the art. Numerous specific details are set forth such as examples of specific components, devices, and methods, to provide a thorough understanding of embodiments of the present disclosure. It will be apparent to those skilled in the art that specific details need not be employed, that example embodiments may be embodied in many different forms and that neither should be construed to limit the scope of the disclosure. In some example embodiments, well-known procedures, well-known device structures, and well-known technologies are not described in detail.
  • although the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms may be only used to distinguish one element, component, region, layer or section from another region, layer or section. Terms such as “first,” “second,” and other numerical terms when used herein do not imply a sequence or order unless clearly indicated by the context. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the teachings of the example embodiments.
  • the term module may refer to, be part of, or include: an Application Specific Integrated Circuit (ASIC); an electronic circuit; a combinational logic circuit; a field programmable gate array (FPGA); a processor or a distributed network of processors (shared, dedicated, or grouped) and storage in networked clusters or datacenters that executes code or a process; other suitable components that provide the described functionality; or a combination of some or all of the above, such as in a system-on-chip.
  • the term module may also include memory (shared, dedicated, or grouped) that stores code executed by the one or more processors.
  • the term code, as used above, may include software, firmware, byte-code and/or microcode, and may refer to programs, routines, functions, classes, and/or objects.
  • the term shared, as used above, means that some or all code from multiple modules may be executed using a single (shared) processor. In addition, some or all code from multiple modules may be stored by a single (shared) memory.
  • the term group, as used above, means that some or all code from a single module may be executed using a group of processors. In addition, some or all code from a single module may be stored using a group of memories.
  • the techniques described herein may be implemented by one or more computer programs executed by one or more processors.
  • the computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium.
  • the computer programs may also include stored data.
  • Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.
  • the present disclosure also relates to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer.
  • a computer program may be stored in a tangible computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
  • the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
  • the present disclosure is well suited to a wide variety of computer network systems over numerous topologies.
  • the configuration and management of large networks comprise storage devices and computers that are communicatively coupled to dissimilar computers and storage devices over a network, such as the Internet.
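
As a rough illustration of the degree of OCR complexity discussed in the description above, the named factors (image resolution, text size, font styling, and skew) could be combined into a single score. The following is only a minimal sketch under stated assumptions: the `ImageFeatures` fields, the normalization constants, and the weights are all hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ImageFeatures:
    """Hypothetical measurements taken from a captured image."""
    resolution_mp: float   # image resolution in megapixels
    text_height_px: float  # approximate height of the text in pixels
    styled_font: bool      # True for a styled or complex font
    skew_degrees: float    # angle away from a straight-on capture


def ocr_complexity(features: ImageFeatures) -> float:
    """Return a score in [0, 1]; a higher score means harder OCR.

    The constants 8.0, 40.0, and 45.0 and the weights below are
    illustrative assumptions, not values taken from the disclosure.
    """
    # Lower resolution corresponds to a higher degree of OCR complexity.
    low_resolution = 1.0 - min(features.resolution_mp / 8.0, 1.0)
    # Smaller text corresponds to a higher degree of OCR complexity.
    small_text = 1.0 - min(features.text_height_px / 40.0, 1.0)
    # Styled or complex fonts correspond to a higher degree of OCR complexity.
    styled = 1.0 if features.styled_font else 0.0
    # Highly skewed text corresponds to a higher degree of OCR complexity.
    skew = min(abs(features.skew_degrees) / 45.0, 1.0)
    return 0.3 * low_resolution + 0.3 * small_text + 0.2 * styled + 0.2 * skew


if __name__ == "__main__":
    sign = ImageFeatures(resolution_mp=8.0, text_height_px=40.0,
                         styled_font=False, skew_degrees=0.0)
    menu = ImageFeatures(resolution_mp=0.5, text_height_px=8.0,
                         styled_font=True, skew_degrees=30.0)
    print(ocr_complexity(sign), ocr_complexity(menu))
```

A score of this kind could then be compared against the OCR complexity threshold(s), which the description says may be predefined or user-defined.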

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A technique for selectively distributing OCR and/or machine language translation tasks between a mobile computing device and server(s) includes receiving, at the mobile computing device, an image of an object comprising a text. The mobile computing device can determine a degree of optical character recognition (OCR) complexity for obtaining the text from the image. Based on this degree of OCR complexity, the mobile computing device and/or the server(s) can perform OCR to obtain an OCR text. The mobile computing device can then determine a degree of translation complexity for translating the OCR text from its source language to a target language. Based on this degree of translation complexity, the mobile computing device and/or the server(s) can perform machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text. The mobile computing device can then output the translated OCR text.

Description

FIELD
The present disclosure generally relates to mobile computing devices and, more particularly, to techniques for distributed optical character recognition (OCR) and distributed machine language translation.
BACKGROUND
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Optical character recognition (OCR) involves the detection of a text in an image using a computing device, e.g., a server. OCR can provide for a faster way to obtain the text in a digital form at a user device, e.g., compared to manual input of the text to the user device by a user. After obtaining the text in the image, the text can be utilized in various ways. For example, the text may be processed by a computing device, stored at a memory, and/or transmitted to another computing device. One example of processing the text is machine language translation, which involves translating the text from a source language to a different target language using a computing device.
SUMMARY
A computer-implemented technique is presented. The technique can include receiving, at a mobile computing device having one or more processors, an image of an object comprising a text in a source language. The technique can include determining, at the mobile computing device, a degree of optical character recognition (OCR) complexity for performing OCR on the image to obtain the text. When the degree of OCR complexity is less than a first OCR complexity threshold, the first OCR complexity threshold representing a degree of OCR complexity that the mobile computing device is appropriate for performing, the technique can include performing, at the mobile computing device, OCR on the image to obtain an OCR text. When the degree of OCR complexity is greater than the first OCR complexity threshold, the technique can include: (i) transmitting, from the mobile computing device, at least a portion of the image to a first server, and (ii) receiving, at the mobile computing device, at least a portion of the OCR text from the first server. The technique can include determining, at the mobile computing device, a degree of translation complexity for translating the OCR text from the source language to a target language. When the degree of translation complexity is less than a first translation complexity threshold, the first translation complexity threshold representing a degree of translation complexity that the mobile computing device is appropriate for performing, the technique can include performing, at the mobile computing device, machine language translation of the OCR text from the source language to the target language to obtain a translated OCR text in the target language. When the degree of translation complexity is greater than the first translation complexity threshold, the technique can include: (i) transmitting at least a portion of the OCR text to a second server, and (ii) receiving at least a portion of the translated OCR text from the second server. The technique can also include outputting, at a display of the mobile computing device, the translated OCR text.
In some embodiments, when the degree of OCR complexity is greater than the first OCR complexity threshold and less than a second OCR complexity threshold, the technique can include: transmitting, from the mobile computing device, at least the portion of the image to the first server, and receiving, at the mobile computing device, at least the portion of the OCR text from the first server.
In other embodiments, the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing and the first server is appropriate for performing.
In some embodiments, when the degree of OCR complexity is greater than the second OCR complexity threshold, the technique can include: transmitting, from the mobile computing device, all of the image to the first server, and receiving, at the mobile computing device, all of the OCR text from the first server.
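The two-threshold OCR embodiments above amount to a three-way routing decision. The following minimal sketch illustrates it; the names `Route` and `route_ocr` are hypothetical, the numeric scale is arbitrary, and the handling of a score exactly equal to a threshold is an assumption because the text only addresses "less than" and "greater than".

```python
from enum import Enum


class Route(Enum):
    LOCAL = "perform OCR entirely on the mobile computing device"
    PARTIAL = "transmit at least a portion of the image to the first server"
    FULL = "transmit all of the image to the first server"


def route_ocr(complexity: float, first_threshold: float,
              second_threshold: float) -> Route:
    """Choose where OCR runs from a degree of OCR complexity.

    Assumption: a score equal to a threshold is routed to the more
    server-heavy option; the source text does not specify ties.
    """
    if first_threshold > second_threshold:
        raise ValueError("first threshold must not exceed second threshold")
    if complexity < first_threshold:
        return Route.LOCAL
    if complexity < second_threshold:
        return Route.PARTIAL
    return Route.FULL


if __name__ == "__main__":
    for score in (0.2, 0.5, 0.9):
        print(score, route_ocr(score, 0.3, 0.7).name)
```

The same comparison applies to the translation complexity thresholds, with the OCR text and the second server in place of the image and the first server.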
In other embodiments, when the degree of translation complexity is greater than the first translation complexity threshold and less than a second translation complexity threshold, the technique can include: transmitting, from the mobile computing device, at least the portion of the OCR text to the second server, and receiving, at the mobile computing device, at least the portion of the translated OCR text from the second server.
In some embodiments, the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing and the second server is appropriate for performing.
In other embodiments, when the degree of translation complexity is greater than the second translation complexity threshold, the technique can include: transmitting, from the mobile computing device, all of the OCR text to the second server, and receiving, at the mobile computing device, all of the translated OCR text from the second server.
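The description notes that the number of words in the OCR text and the presence of local language packs can influence the translation decision. The sketch below shows one way this might look, assuming an integer score from 0 to 100. The function names, the per-language weight, the base thresholds, and the `pack_bonus` value are all hypothetical illustrations rather than values from the disclosure.

```python
from typing import Set, Tuple


def translation_complexity(ocr_text: str, language_weight: int) -> int:
    """Score in 0..100 that grows with word count and language weight.

    language_weight is an assumed per-language-pair difficulty factor.
    """
    word_count = len(ocr_text.split())
    return min(100, word_count * language_weight)


def translation_thresholds(source: str, target: str,
                           installed_packs: Set[Tuple[str, str]],
                           base: Tuple[int, int] = (30, 70),
                           pack_bonus: int = 20) -> Tuple[int, int]:
    """Return (first, second) translation complexity thresholds.

    Assumption: an installed local language pack for the language pair
    raises both thresholds, so more text is translated on the device.
    """
    first, second = base
    if (source, target) in installed_packs:
        first = min(first + pack_bonus, 100)
        second = min(second + pack_bonus, 100)
    return first, second


if __name__ == "__main__":
    packs = {("fr", "en")}
    score = translation_complexity("La soupe du jour", language_weight=10)
    first, second = translation_thresholds("fr", "en", packs)
    # With the French-English pack installed, this short text stays local.
    print(score, first, second, score < first)
```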
In some embodiments, the translated OCR text includes first and second portions corresponding to machine language translation by the mobile computing device and the second server, respectively, and outputting the translated OCR text includes outputting, at the display of the mobile computing device, the first portion of the translated OCR text while awaiting the second portion of the translated OCR text from the second server.
In other embodiments, the OCR text includes first and second portions corresponding to the first and second portions of the translated OCR text, respectively, and outputting the translated OCR text includes outputting, at the display of the mobile computing device, the first portion of the translated OCR text and the second portion of the OCR text while awaiting the second portion of the translated OCR text from the second server.
In some embodiments, the technique further includes outputting, at the display of the mobile computing device, the first and second portions of the translated OCR text in response to receiving the second portion of the translated OCR text from the second server.
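The display behavior in the preceding embodiments (show the locally translated first portion together with the untranslated second portion, then update once the second server responds) can be sketched with `asyncio`. This is only an illustration under stated assumptions: `fake_server_translate` is a hypothetical stand-in for the second server, the word-for-word dictionary is toy data, and each "frame" is simply a string representing what the display would show.

```python
import asyncio
from typing import Dict, List


async def fake_server_translate(text: str) -> str:
    """Hypothetical stand-in for the second server (no real network call)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"Escargots": "Snails"}.get(text, text)


async def translate_progressively(first_portion: str, second_portion: str,
                                  local_pack: Dict[str, str]) -> List[str]:
    """Return the successive display frames, in the order they are shown."""
    frames: List[str] = []
    # Start the server request first so it runs while the device works.
    pending = asyncio.create_task(fake_server_translate(second_portion))
    # Translate the first portion locally, word for word (toy approach).
    first_translated = " ".join(local_pack.get(w, w)
                                for w in first_portion.split())
    # Frame 1: translated first portion plus untranslated second portion.
    frames.append(f"{first_translated} | {second_portion}")
    # Frame 2: both translated portions, once the server result arrives.
    second_translated = await pending
    frames.append(f"{first_translated} | {second_translated}")
    return frames


if __name__ == "__main__":
    pack = {"Soupe": "Soup", "du": "of the", "jour": "day"}
    for frame in asyncio.run(
            translate_progressively("Soupe du jour", "Escargots", pack)):
        print(frame)
```

Starting the server request before the local work means the user sees partial results without waiting on the network, which matches the intent of outputting the first portion while awaiting the second.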
A mobile computing device having one or more processors configured to perform operations is presented. The operations can include receiving an image of an object comprising a text in a source language. The operations can include determining a degree of OCR complexity for performing OCR on the image to obtain the text. When the degree of OCR complexity is less than a first OCR complexity threshold, the first OCR complexity threshold representing a degree of OCR complexity that the mobile computing device is appropriate for performing, the operations can include performing OCR on the image to obtain an OCR text. When the degree of OCR complexity is greater than the first OCR complexity threshold, the operations can include: (i) transmitting, via a communication device, at least a portion of the image to a first server, and (ii) receiving, via the communication device, at least a portion of the OCR text from the first server. The operations can include determining a degree of translation complexity for translating the OCR text from the source language to a target language. When the degree of translation complexity is less than a first translation complexity threshold, the first translation complexity threshold representing a degree of translation complexity that the mobile computing device is appropriate for performing, the operations can include performing machine language translation of the OCR text from the source language to the target language to obtain a translated OCR text in the target language. When the degree of translation complexity is greater than the first translation complexity threshold, the operations can include: (i) transmitting, via the communication device, at least a portion of the OCR text to a second server, and (ii) receiving, via the communication device, at least a portion of the translated OCR text from the second server. The operations can also include outputting the translated OCR text at a display of the mobile computing device.
In some embodiments, when the degree of OCR complexity is greater than the first OCR complexity threshold and less than a second OCR complexity threshold, the operations can further include: transmitting, via the communication device, at least the portion of the image to the first server, and receiving, via the communication device, at least the portion of the OCR text from the first server.
In other embodiments, the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing and the first server is appropriate for performing.
In some embodiments, when the degree of OCR complexity is greater than the second OCR complexity threshold, the operations can further include: transmitting, via the communication device, all of the image to the first server, and receiving, via the communication device, all of the OCR text from the first server.
In other embodiments, when the degree of translation complexity is greater than the first translation complexity threshold and less than a second translation complexity threshold, the operations can further include: transmitting, via the communication device, at least the portion of the OCR text to the second server, and receiving, via the communication device, at least the portion of the translated OCR text from the second server.
In some embodiments, the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing and the second server is appropriate for performing.
In other embodiments, when the degree of translation complexity is greater than the second translation complexity threshold, the operations can further include: transmitting, via the communication device, all of the OCR text to the second server, and receiving, via the communication device, all of the translated OCR text from the second server.
In some embodiments, the translated OCR text includes first and second portions corresponding to machine language translation by the mobile computing device and the second server, respectively, and outputting the translated OCR text at the display of the mobile computing device includes displaying the first portion of the translated OCR text while awaiting the second portion of the translated OCR text from the second server.
In other embodiments, the OCR text includes first and second portions corresponding to the first and second portions of the translated OCR text, respectively, and outputting the translated OCR text at the display of the mobile computing device includes displaying the first portion of the translated OCR text and the second portion of the OCR text while awaiting the second portion of the translated OCR text from the second server.
In some embodiments, the operations further include outputting, at the display of the mobile computing device, the first and second portions of the translated OCR text in response to receiving the second portion of the translated OCR text from the second server.
Another computer-implemented technique is also presented. The technique can include receiving, at a mobile computing device having one or more processors, an image of an object comprising a text in a source language. The technique can include determining, at the mobile computing device, a degree of OCR complexity for performing OCR on the image to obtain the text. The technique can include transmitting, from the mobile computing device to a server, at least a portion of the image based on the degree of OCR complexity. The technique can include receiving, at the mobile computing device from the server, OCR results. The technique can include obtaining, at the mobile computing device, an OCR text based on the OCR results. The technique can include obtaining, at the mobile computing device, a machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text. The technique can also include outputting, at a display of the mobile computing device, the translated OCR text.
In some embodiments, the technique further includes: performing, at the mobile computing device, OCR for the entire image when the degree of OCR complexity is less than a first OCR complexity threshold, and transmitting, from the mobile computing device to the server, at least the portion of the image when the degree of OCR complexity is greater than the first OCR complexity threshold.
In other embodiments, the first OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is appropriate for performing itself.
In some embodiments, the technique further includes transmitting, from the mobile computing device to the server, all of the image when the degree of OCR complexity is greater than a second OCR complexity threshold that is greater than the first OCR complexity threshold.
In other embodiments, the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing itself.
In some embodiments, when the degree of OCR complexity is between the first and second OCR complexity thresholds, the mobile computing device performs OCR for a first portion of the image and the mobile computing device transmits a second portion of the image to the server, the first and second portions of the image collectively forming the entire image.
Another computer-implemented technique is also presented. The technique can include receiving, at a mobile computing device having one or more processors, an image of an object comprising a text in a source language. The technique can include obtaining, at the mobile computing device, OCR results for the object and the text to obtain an OCR text. The technique can include determining, at the mobile computing device, the source language of the OCR text. The technique can include determining, at the mobile computing device, a degree of translation complexity for performing machine language translation of the OCR text from the source language to a target language. The technique can include transmitting, from the mobile computing device to a server, at least a portion of the OCR text based on the degree of translation complexity. The technique can include receiving, at the mobile computing device from the server, machine language translation results. The technique can include obtaining, at the mobile computing device, a translated OCR text based on the machine language translation results. The technique can also include outputting, at a display of the mobile computing device, the translated OCR text.
In some embodiments, the technique further includes: performing, at the mobile computing device, machine language translation for the entire OCR text when the degree of translation complexity is less than a first translation complexity threshold, and transmitting, from the mobile computing device to the server, at least the portion of the OCR text when the degree of translation complexity is greater than the first translation complexity threshold.
In other embodiments, the first translation complexity threshold represents a degree of translation complexity that the mobile computing device is appropriate for performing itself.
In some embodiments, the technique further includes: transmitting, from the mobile computing device to the server, all of the OCR text when the degree of translation complexity is greater than a second translation complexity threshold that is greater than the first translation complexity threshold.
In other embodiments, the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing itself.
In some embodiments, when the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device performs machine language translation for a first portion of the OCR text and the mobile computing device transmits a second portion of the OCR text to the server, the first and second portions of the OCR text collectively forming the entire OCR text.
In other embodiments, machine language translation results for the first portion of the OCR text that are obtained by the mobile computing device are output to the display of the mobile computing device before the machine language translation results for the second portion of the OCR text are received from the server.
Another computer-implemented technique is also presented. The technique can include receiving, at a mobile computing device having one or more processors, an image of an object comprising a text in a source language. The technique can include determining, at the mobile computing device, a degree of OCR complexity for performing OCR on the image to obtain the text. The technique can include transmitting, from the mobile computing device to a first server, at least a portion of the image based on the degree of OCR complexity. The technique can include receiving, at the mobile computing device from the first server, OCR results. The technique can include obtaining, at the mobile computing device, an OCR text based on the OCR results. The technique can include determining, at the mobile computing device, a degree of translation complexity for performing machine language translation of the OCR text from the source language to a target language. The technique can include transmitting, from the mobile computing device to a second server, at least a portion of the OCR text based on the degree of translation complexity. The technique can include receiving, at the mobile computing device from the second server, machine language translation results. The technique can include obtaining, at the mobile computing device, a translated OCR text based on the machine language translation results. The technique can also include outputting, at a display of the mobile computing device, the translated OCR text.
In some embodiments, the technique further includes: performing, at the mobile computing device, OCR for the entire image when the degree of OCR complexity is less than a first OCR complexity threshold, wherein the first OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is appropriate for performing itself, and transmitting, from the mobile computing device to the first server, at least the portion of the image when the degree of OCR complexity is greater than the first OCR complexity threshold.
In other embodiments, the technique further includes: transmitting, from the mobile computing device to the first server, all of the image when the degree of OCR complexity is greater than a second OCR complexity threshold that is greater than the first OCR complexity threshold, wherein the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing itself, and when the degree of OCR complexity is between the first and second OCR complexity thresholds, the mobile computing device performs OCR for a first portion of the image and the mobile computing device transmits a second portion of the image to the first server, the first and second portions of the image collectively forming the entire image.
In some embodiments, the technique further includes: performing, at the mobile computing device, machine language translation for the entire OCR text when the degree of translation complexity is less than a first translation complexity threshold, wherein the first translation complexity threshold represents a degree of translation complexity that the mobile computing device is appropriate for performing itself, and transmitting, from the mobile computing device to the second server, at least the portion of the OCR text when the degree of translation complexity is greater than the first translation complexity threshold.
In other embodiments, the technique further includes: transmitting, from the mobile computing device to the second server, all of the OCR text when the degree of translation complexity is greater than a second translation complexity threshold that is greater than the first translation complexity threshold, wherein the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing itself, and when the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device performs machine language translation for a first portion of the OCR text and the mobile computing device transmits a second portion of the OCR text to the second server, the first and second portions of the OCR text collectively forming the entire OCR text.
In some embodiments, the translated OCR text includes first and second portions corresponding to machine language translation by the mobile computing device and the second server, respectively, and outputting the translated OCR text at the display of the mobile computing device includes displaying the first portion of the translated OCR text while awaiting the second portion of the translated OCR text from the second server, and subsequently outputting, at the display of the mobile computing device, the first and second portions of the translated OCR text in response to receiving the second portion of the translated OCR text from the second server.
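The display sequence described above can be sketched as follows. This is a minimal illustration only: the function `show_progressively`, its parameters, and the callables passed to it are hypothetical names that do not appear in the disclosure, and a thread pool stands in for the network request to the second server.

```python
from concurrent.futures import ThreadPoolExecutor


def show_progressively(first_src, second_src, translate_local, translate_remote, display):
    """Display the locally translated portion first, then both portions together.

    All arguments are assumptions for illustration: translate_local and
    translate_remote are callables mapping source text to translated text,
    and display is a callable that renders a string.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the (simulated) server request for the second portion.
        pending = pool.submit(translate_remote, second_src)
        # Translate the first portion on the device and show it immediately.
        first_out = translate_local(first_src)
        display(first_out)
        # Wait for the second portion from the server.
        second_out = pending.result()
    # Once the server result arrives, show both portions together.
    display(first_out + " " + second_out)
```

A caller could pass a function that updates the screen as `display`; in a test, a list's `append` method is enough to record the order of the two updates.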
Another computer-implemented technique is also presented. The technique can include obtaining, at a mobile computing device having one or more processors, a text in a source language. The technique can include determining, at the mobile computing device, the source language of the text. The technique can include determining, at the mobile computing device, a degree of translation complexity for performing machine language translation of the text from the source language to a target language. The technique can include transmitting, from the mobile computing device to a server, at least a portion of the text based on the degree of translation complexity. The technique can include receiving, at the mobile computing device from the server, machine language translation results. The technique can include obtaining, at the mobile computing device, a translated text based on the machine language translation results. The technique can also include outputting, at a display of the mobile computing device, the translated text.
In some embodiments, the technique further includes: performing, at the mobile computing device, machine language translation for the entire text when the degree of translation complexity is less than a first translation complexity threshold, and transmitting, from the mobile computing device to the server, at least the portion of the text when the degree of translation complexity is greater than the first translation complexity threshold.
In other embodiments, the first translation complexity threshold represents a degree of translation complexity that the mobile computing device is appropriate for performing itself.
In some embodiments, the technique further includes transmitting, from the mobile computing device to the server, all of the text when the degree of translation complexity is greater than a second translation complexity threshold that is greater than the first translation complexity threshold.
In other embodiments, the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing itself.
In some embodiments, when the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device performs machine language translation for a first portion of the text and the mobile computing device transmits a second portion of the text to the server, the first and second portions of the text collectively forming the entire text.
In other embodiments, machine language translation results for the first portion of the text that are obtained by the mobile computing device are output to the display of the mobile computing device before the machine language translation results for the second portion of the text are received from the server.
Further areas of applicability of the present disclosure will become apparent from the detailed description provided hereinafter. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure will become more fully understood from the detailed description and the accompanying drawings, wherein:
FIG. 1 is a functional block diagram of a computing network including an example mobile computing device according to some implementations of the present disclosure;
FIG. 2 is a functional block diagram of the example mobile computing device of FIG. 1;
FIGS. 3A-3C are flow diagrams of example techniques for distributed optical character recognition (OCR) and/or distributed machine language translation according to some implementations of the present disclosure; and
FIGS. 4A-4D illustrate an example display of the example mobile computing device at various stages during execution of the distributed OCR and machine language translation techniques according to some implementations of the present disclosure.
DETAILED DESCRIPTION
Computer servers may have greater processing power than mobile computing devices (tablet computers, mobile phones, etc.) and thus they may generate better optical character recognition (OCR) results and machine language translation results. While computing servers can typically generate faster and/or more accurate results, obtaining these results at the mobile computing device can be slower due to network delays associated with transmitting data. Moreover, in simple/non-complex cases, the mobile computing device may be capable of generating the same results as a server. For example, an image may have a small quantity of text and/or very large text. Similarly, for example, a text for translation may have a small quantity of characters/words/sentences and/or may be a linguistically simple text.
Accordingly, techniques are presented for distributed OCR and distributed language translation. These techniques involve selectively distributing OCR and/or machine language translation tasks between a mobile computing device and one or more servers based on degrees of complexity for the respective tasks. The mobile computing device can receive an image of an object comprising a text in a source language. The mobile computing device can determine a degree of OCR complexity for obtaining the text from the image. Based on this degree of OCR complexity, the mobile computing device and/or a server can perform OCR to obtain an OCR text. The mobile computing device can then determine a degree of translation complexity for translating the OCR text from the source language to a target language. Based on this degree of translation complexity, the mobile computing device and/or a server can perform machine language translation of the OCR text from the source language to the target language to obtain a translated OCR text. The mobile computing device can then output the translated OCR text. It will be appreciated that the distributed OCR techniques and the distributed machine language translation techniques of the present disclosure can be used individually (separately) or together. In one example implementation, the distributed OCR techniques of the present disclosure can be used to obtain an OCR text, and the translated text can be obtained from the OCR text. In another example implementation, an OCR text can be obtained from an image, and the distributed machine language translation techniques of the present disclosure can be performed to obtain a translated text. In other example implementations, an OCR text can be determined and stored/output using the distributed OCR techniques of the present disclosure, and/or a translation of an input text can be determined and stored/output using the distributed machine language translation techniques of the present disclosure.
Referring now to FIG. 1, a computing network 100 is illustrated. The computing network 100 includes servers 104 a and 104 b (collectively “servers 104”). For example, server 104 a may be an OCR server and server 104 b may be a machine language translation server. It should be appreciated, however, that the term “server” as used herein can refer to both a single hardware computer server and a plurality of similar servers operating in a parallel or distributed architecture. The computing network 100, therefore, may include a single server 104 that performs both OCR and machine language translation, or the computing network 100 may include three or more servers that collectively perform OCR and machine language translation.
A mobile computing device 108 is configured to communicate with the servers 104 via a network 112. Examples of the mobile computing device 108 include a laptop computer, a tablet computer, a mobile phone, and wearable technology, such as a smartwatch, eyewear, or other wearable objects that incorporate a computing device. It should be appreciated, however, that the techniques of the present disclosure could be implemented at any computing device having a display and a camera, e.g., a desktop computer. The network 112 can include a local area network (LAN), a wide area network (WAN), e.g., the Internet, or a combination thereof. The mobile computing device 108 can be associated with a user 116. For example, the user 116 can interact with the mobile computing device 108 via a display 120, e.g., a touch display.
The user 116 can use the mobile computing device 108 to interact with an object 124 having a text 128 thereon. The object 124 can be any object suitable to display the text 128, including, but not limited to, a document, a sign, an advertisement, and a menu. For example, the user 116 may command the mobile computing device 108 to capture an image of the object 124 and its text 128 using a camera 212 (see FIG. 2) associated with the mobile computing device 108. OCR can be performed on the image to detect the text 128. After obtaining the text 128, the text 128 can then be translated from its source language to a target language, such as a language understood/spoken by the user 116.
Referring now to FIG. 2, a functional block diagram of the example mobile computing device 108 is illustrated. The mobile computing device 108 can include the display 120, a communication device 200, a processor 204, a memory 208, and the camera 212. It should be appreciated that the mobile computing device 108 can also include other suitable components, such as physical buttons, a microphone, and a speaker. The communication device 200 can include any suitable components (such as a transceiver) that are configured to communicate with other devices, e.g., the servers 104, via the network 112. The memory 208 can be any suitable storage medium (flash, hard disk, etc.) configured to store information at the mobile computing device 108.
The processor 204 can control operation of the mobile computing device 108. Example functions performed by the processor 204 include, but are not limited to, controlling transmission/reception of information via the communication device 200 and controlling read/write operations at the memory 208. The processor 204 can also process information received from the camera 212 and output information to the display 120. The camera 212 can be any suitable camera (charge-coupled device (CCD), complementary metal-oxide-semiconductor (CMOS), etc.) configured to capture an image of the object 124 and its text 128. In one implementation, the display 120 is a touch display configured to receive input from the user 116. The processor 204 can also be configured to execute at least a portion of the techniques of the present disclosure, which are now discussed in greater detail.
The processor 204 can receive an image of the object 124 and its text 128 from the camera 212. The image can be captured by the camera 212 by the user 116 positioning the camera 212 and providing an input to capture the image. When OCR on the image is requested, the processor 204 can determine a degree of OCR complexity for the text 128. For example, the degree of OCR complexity may be determined in response to an OCR request that is (i) generated in response to an input by the user 116 or (ii) generated automatically in response to capturing the image with the camera 212. The degree of OCR complexity is indicative of a degree of difficulty for the processor 204 to perform the OCR itself. Based on the degree of OCR complexity, the processor 204 can determine whether to transmit the image to the server 104 a for OCR.
Example factors in determining the degree of OCR complexity include, but are not limited to, a resolution of the image, a size of the object 124 and/or its text 128, a style/font of the text 128, and/or an angle/view at which the image was captured. More specifically, when the image is captured at an angle, i.e., not straight-on or straight-forward, the text 128 in the image may be skewed. A high resolution image may correspond to a lower degree of OCR complexity, whereas a low resolution image may correspond to a higher degree of OCR complexity. A large, non-styled, and/or basic font may correspond to a lower degree of OCR complexity, whereas a small, styled, and/or complex font may correspond to a higher degree of OCR complexity. Little or no skew may correspond to a lower degree of OCR complexity, whereas highly skewed text may correspond to a higher degree of OCR complexity.
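One way to combine these factors into a single degree of OCR complexity is sketched below. The disclosure does not specify a formula, so everything here is an assumption for illustration: the `ImageFeatures` fields, their normalization to the range 0 to 1, and the equal weighting are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImageFeatures:
    """Hypothetical inputs, each assumed to be normalized to [0, 1]."""
    resolution: float       # 1.0 = high-resolution image
    text_size: float        # 1.0 = large text
    font_simplicity: float  # 1.0 = plain, non-styled font
    skew: float             # 0.0 = captured straight-on, 1.0 = highly skewed


def ocr_complexity(features: ImageFeatures) -> float:
    """Return a score in [0, 1]; a higher score means harder OCR.

    Equal weighting of the four factors is an assumption, not something
    stated in the disclosure.
    """
    penalties = [
        1.0 - features.resolution,       # low resolution raises complexity
        1.0 - features.text_size,        # small text raises complexity
        1.0 - features.font_simplicity,  # styled fonts raise complexity
        features.skew,                   # skewed text raises complexity
    ]
    return sum(penalties) / len(penalties)
```

Under these assumptions, a sharp, straight-on image of large plain text scores 0.0, and a blurry, heavily skewed image of small stylized text scores 1.0.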
As mentioned above, the processor 204 can determine whether to transmit the image to the server 104 a for OCR based on the degree of OCR complexity. For example, the processor 204 may compare the degree of OCR complexity to one or more OCR complexity thresholds. The OCR complexity threshold(s) can be predefined or user-defined. In some cases, the degree of OCR complexity may indicate that the server 104 a is appropriate (or more appropriate than the processor 204) for performing OCR for at least a portion of the image. In these cases, the processor 204 may transmit at least the portion of the image to the server 104 a. In other cases, the processor 204 may transmit the entire image to the server 104 a for OCR, or may not transmit anything to the server 104 a and thus may perform the OCR entirely by itself.
More specifically, when the degree of OCR complexity is less than a first OCR complexity threshold, the processor 204 may perform the OCR entirely by itself. When the degree of OCR complexity is greater than the first OCR complexity threshold and less than a second OCR complexity threshold, the processor 204 may transmit at least the portion of the image to the server 104 a. When the degree of OCR complexity is greater than the second OCR complexity threshold, the processor 204 may transmit the entire image to the server 104 a. Further, in some cases, the processor 204 may determine that a lower resolution version of the image is sufficient for the server 104 a and thus the processor 204 may transmit at least a portion of the lower resolution version of the image to the server 104 a. When at least the portion of the image is transmitted to the server 104 a, the server 104 a can return OCR results to the mobile computing device 108.
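The two-threshold comparison can be sketched as a small routing function. The names `OcrRoute` and `route_ocr` are hypothetical, and the handling of a score exactly equal to a threshold is an assumption, since the description only addresses the less-than and greater-than cases.

```python
from enum import Enum


class OcrRoute(Enum):
    LOCAL = "perform OCR entirely on the device"
    SPLIT = "send a portion of the image to the OCR server"
    SERVER = "send the entire image to the OCR server"


def route_ocr(complexity: float, first_threshold: float, second_threshold: float) -> OcrRoute:
    """Choose where OCR runs from the degree of OCR complexity.

    Assumption: a score equal to a threshold is treated as exceeding it.
    """
    if first_threshold > second_threshold:
        raise ValueError("first threshold must not exceed second threshold")
    if complexity < first_threshold:
        return OcrRoute.LOCAL
    if complexity < second_threshold:
        return OcrRoute.SPLIT
    return OcrRoute.SERVER
```

For example, with hypothetical thresholds of 0.3 and 0.7, a score of 0.5 falls between them and selects the split case.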
The appropriateness of the server 104 a and the processor 204 to perform OCR on the image can refer to an expected level of accuracy and/or efficiency for the server 104 a and the processor 204 to perform the OCR, respectively. The processor 204 can use any suitable OCR algorithms to perform the OCR itself. After obtaining the OCR results locally and/or from the server 104 a, the processor 204 can compile the OCR results to obtain an OCR text. The OCR text represents OCR results for the object 124 having the text 128 thereon. Depending on the quality of the OCR results, the OCR text may be the same as the text 128 or different than the text 128.
Once the OCR text is obtained, the processor 204 can determine the source language of the OCR text. When the processor 204 is not confident in its determination of the source language, the processor 204 may send the OCR text to the server 104 b for this determination and, if requested, for machine language translation as well. When machine language translation of the OCR text is requested, the OCR text can be translated from its source language to a target language. For example, the OCR text can be translated in response to a translation request that is (i) generated in response to an input from the user 116 or (ii) generated automatically in response to determining that the source language is not one of one or more languages preferred by the user 116. In response to this translation request, the processor 204 can determine a degree of translation complexity for translating the OCR text from the source language to the target language. The degree of translation complexity is indicative of a degree of difficulty for the processor 204 to perform machine language translation of the OCR text itself.
Example factors in determining the degree of translation complexity include, but are not limited to, complexities of the source language and/or the target language and a number of characters, words, and/or sentences in the OCR text. Less complex (simple), more common, and/or more utilized languages may correspond to a lower degree of translation complexity, whereas more complex, less common, and/or less utilized languages may correspond to a higher degree of translation complexity. Fewer characters, words, and/or sentences may correspond to a lower degree of translation complexity, whereas more characters, words, and/or sentences may correspond to a higher degree of translation complexity. For example only, English may have a low degree of translation complexity and Russian may have a high degree of translation complexity.
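A degree of translation complexity built from a language factor and a length factor can be sketched as follows. The disclosure gives no formula, so this is an illustration under stated assumptions: the function name, the caller-supplied `language_difficulty` value in the range 0 to 1, the `max_words` cap, and the equal weighting are all hypothetical.

```python
def translation_complexity(text: str, language_difficulty: float, max_words: int = 200) -> float:
    """Return a score in [0, 1]; a higher score means harder translation.

    language_difficulty is an assumed caller-supplied value in [0, 1] for the
    source/target language pair. Word count uses whitespace splitting, which
    is a simplification that does not suit languages written without spaces.
    """
    if not 0.0 <= language_difficulty <= 1.0:
        raise ValueError("language_difficulty must be in [0, 1]")
    word_count = len(text.split())
    # Longer texts raise the score, capped at max_words (an assumed limit).
    length_term = min(word_count / max_words, 1.0)
    # Equal weighting of the two factors is an assumption.
    return 0.5 * language_difficulty + 0.5 * length_term
```

With these assumptions, a two-word sign in a language pair rated 0.1 scores far lower than a multi-paragraph menu in a pair rated 0.9.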
Based on the degree of translation complexity, the processor 204 can determine whether to transmit the OCR text to the server 104 b for machine language translation. For example, the processor 204 may compare the degree of translation complexity to one or more translation complexity thresholds. The translation complexity threshold(s) can be predefined or user-defined. The mobile computing device 108 may also have local language packs, e.g., stored at the memory 208, and these local language packs may contain information that can be used by the processor 204 in performing machine language translation itself. The presence and type of these local language packs, therefore, may affect the translation complexity threshold(s). In some cases, the degree of translation complexity may indicate that the server 104 b is appropriate (or more appropriate than the processor 204) for performing machine language translation for at least a portion of the OCR text.
In these cases, the processor 204 may transmit at least the portion of the OCR text to the server 104 b. In other cases, the processor 204 may transmit the entire OCR text to the server 104 b for machine language translation, or may not transmit anything to the server 104 b and thus may perform the machine language translation entirely by itself. More specifically, when the degree of translation complexity is less than a first translation complexity threshold, the processor 204 may perform the machine language translation entirely by itself. When the degree of translation complexity is greater than the first translation complexity threshold and less than a second translation complexity threshold, the processor 204 may transmit at least the portion of the OCR text to the server 104 b. When the degree of translation complexity is greater than the second translation complexity threshold, the processor 204 may transmit the entire OCR text to the server 104 b.
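The translation decision, including the effect of a local language pack on the thresholds, can be sketched as follows. The disclosure says only that language packs may affect the thresholds; the direction and size of that effect, the base threshold values, and the function names are assumptions made for illustration.

```python
def translation_thresholds(has_language_pack: bool,
                           base_first: float = 0.3,
                           base_second: float = 0.7,
                           pack_bonus: float = 0.2) -> tuple[float, float]:
    """Return (first, second) translation complexity thresholds.

    Assumption: an installed language pack raises both thresholds, so the
    device handles more translation locally. All numeric values are hypothetical.
    """
    bonus = pack_bonus if has_language_pack else 0.0
    return min(base_first + bonus, 1.0), min(base_second + bonus, 1.0)


def plan_translation(complexity: float, has_language_pack: bool) -> str:
    """Return 'local', 'split', or 'server' for a degree of translation complexity."""
    first, second = translation_thresholds(has_language_pack)
    if complexity < first:
        return "local"   # device translates the entire OCR text
    if complexity < second:
        return "split"   # device translates one portion, server the other
    return "server"      # entire OCR text is sent to the translation server
```

In this sketch, the same text with a score of 0.4 is split between device and server when no language pack is installed, but is translated entirely on the device when one is.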
The processor 204 can use any suitable machine translation algorithms to perform the machine language translation itself. After obtaining the machine language translation results locally and/or from the server 104 b, the processor 204 can compile the machine language translation results to obtain a translated text. The translated text represents machine language translation results for the OCR text. Depending on the quality of the machine language translation results, the translated text may or may not be an accurate translation of the OCR text from the source language to the target language. Similarly, depending on the quality of the OCR results as discussed above, the translated text may or may not be an accurate translation of the text 128 of the object 124 from the source language to the target language. After obtaining the translated text, the processor 204 can output the translated text. For example, the processor 204 can output the translated text to the display 120.
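As a rough sketch of the compiling step, assume the OCR text has been divided into ordered segments and that each translation result is keyed by its segment index. The function name and this data layout are assumptions for illustration; the description does not define how results are represented.

```python
from typing import Dict, List


def compile_translation(segments: List[str],
                        local_results: Dict[int, str],
                        server_results: Dict[int, str]) -> str:
    """Merge local and server results into one translated text.

    Segments keep their original order. A segment with no result yet is
    left in the source language (an assumed fallback for this sketch).
    """
    compiled = []
    for index, segment in enumerate(segments):
        if index in local_results:
            compiled.append(local_results[index])
        elif index in server_results:
            compiled.append(server_results[index])
        else:
            compiled.append(segment)
    return " ".join(compiled)
```

For example, with segments ["Salade", "Escargots"], a local result {0: "Salad"}, and a server result {1: "Snails"}, the compiled text is "Salad Snails".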
Referring now to FIG. 3A, a flow diagram of an example technique 300 for distributed OCR and distributed machine language translation is illustrated. At 304, the mobile computing device 108 can receive an image of the object 124 comprising the text 128 in a source language. At 308, the mobile computing device 108 can determine the degree of OCR complexity for performing OCR on the image to obtain the text 128. At 312, the mobile computing device 108 can compare the degree of OCR complexity to the first and second OCR complexity thresholds. When the degree of OCR complexity is less than the first OCR complexity threshold, the processor 204 can perform OCR on the image entirely by itself at 316 and then proceed to 332.
When the degree of OCR complexity is between the first and second OCR complexity thresholds, the mobile computing device 108 can transmit a portion of the image to the server 104 a for OCR at 320 and then proceed to 328. When the degree of OCR complexity is greater than the second OCR complexity threshold, the mobile computing device 108 can transmit the entire image to the server 104 a for OCR at 324 and then proceed to 328. At 328, the mobile computing device 108 can receive OCR results from the server 104 a and obtain the OCR text. At 332, the mobile computing device 108 can determine the source language of the OCR text. In some implementations, the mobile computing device 108 may transmit at least a portion of the OCR text to the server 104 b to determine the source language.
At 336, the mobile computing device 108 can determine a degree of translation complexity for translating the OCR text from the source language to a target language. At 340, the mobile computing device 108 can compare the degree of translation complexity to the first and second translation complexity thresholds. When the degree of translation complexity is less than the first translation complexity threshold, the mobile computing device 108 can perform machine language translation of the OCR text entirely by itself at 344 and then proceed to 360. When the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device 108 can transmit a portion of the OCR text to the server 104 b for machine language translation at 348 and then proceed to 356.
When the degree of translation complexity is greater than the second translation complexity threshold, the mobile computing device 108 can transmit the entire OCR text to the server 104 b for machine language translation at 352 and then proceed to 356. At 356, the mobile computing device 108 can receive the machine language translation results from the server 104 b and obtain the translated OCR text. At 360, the mobile computing device 108 can output the translated OCR text at the display 120. In some implementations, outputting the translated OCR text at the display includes outputting a portion of the translated OCR text obtained by the mobile computing device before outputting another portion of the translated OCR text obtained from the server 104 b. The technique 300 can then end or return to 304 for one or more additional cycles.
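To make the branching of technique 300 easier to follow, the sketch below returns the sequence of flow-diagram reference numerals visited for a given pair of complexity values. The function name and parameters are hypothetical, and treating a value equal to a threshold as belonging to the higher branch is an assumption.

```python
from typing import List


def technique_300_steps(ocr_complexity: float, ocr_first: float, ocr_second: float,
                        mt_complexity: float, mt_first: float,
                        mt_second: float) -> List[int]:
    """Return the FIG. 3A step numerals visited in one pass of technique 300."""
    steps = [304, 308, 312]          # receive image, determine and compare OCR complexity
    if ocr_complexity < ocr_first:
        steps.append(316)            # OCR performed entirely on the device
    elif ocr_complexity < ocr_second:
        steps += [320, 328]          # portion of image sent, then OCR results received
    else:
        steps += [324, 328]          # entire image sent, then OCR results received
    steps += [332, 336, 340]         # source language, translation complexity, compare
    if mt_complexity < mt_first:
        steps.append(344)            # translation performed entirely on the device
    elif mt_complexity < mt_second:
        steps += [348, 356]          # portion of OCR text sent, then results received
    else:
        steps += [352, 356]          # entire OCR text sent, then results received
    steps.append(360)                # output translated OCR text at the display
    return steps
```

With both thresholds set to 0.3 and 0.7, an OCR complexity of 0.1 and a translation complexity of 0.5 would visit 304, 308, 312, 316, 332, 336, 340, 348, 356, and 360.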
Referring now to FIG. 3B, an example technique 370 for distributed OCR is presented. At 371, the mobile computing device 108 can receive an image of the object 124 comprising the text 128 in the source language. At 372, the mobile computing device 108 can determine the degree of OCR complexity for performing OCR on the image to obtain the text 128. At 373, the mobile computing device 108 can transmit at least a portion of the image to the server 104 a based on the degree of OCR complexity. At 374, the mobile computing device 108 can receive OCR results from the server 104 a. At 375, the mobile computing device 108 can obtain an OCR text using the OCR results. At 376, the mobile computing device 108 can obtain a machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text. At 377, the mobile computing device 108 can output the translated OCR text. The technique 370 can then end or return to 371 for one or more additional cycles.
Referring now to FIG. 3C, an example technique 380 for distributed machine language translation is presented. At 381, the mobile computing device 108 can receive an image of the object 124 comprising the text 128 in the source language. At 382, the mobile computing device 108 can obtain an OCR text from the image. At 383, the mobile computing device 108 can determine the source language of the OCR text. In some implementations, the mobile computing device 108 can transmit at least a portion of the OCR text to the server 104 b to determine the source language. At 384, the mobile computing device 108 can determine a degree of translation complexity for performing machine language translation of the OCR text from the source language to a target language. At 385, the mobile computing device 108 can transmit at least a portion of the OCR text to the server 104 b based on the degree of translation complexity. At 386, the mobile computing device 108 can receive machine language translation results from the server 104 b. At 387, the mobile computing device 108 can obtain a translation of the OCR text from the source language to the target language based on the machine language translation results to obtain a translated OCR text. At 388, the mobile computing device 108 can output the translated OCR text. The technique 380 can then end or return to 381 for one or more additional cycles.
Referring now to FIGS. 4A-4D, the display 120 of the example mobile computing device 108 at various stages during execution of the distributed OCR and machine language translation techniques is illustrated. FIG. 4A illustrates an image of the object 124, which for purposes of FIGS. 4A-4D is a menu in French. The menu includes header/title text 404 (“La Menu”) that is larger than other text 408. As previously discussed herein, the degree of OCR complexity can vary depending on text size, text style (bold, italics, etc.), and other similar factors. In this example, the mobile computing device 108 performs OCR for the header/title text 404 and the server 104 a performs OCR for the other text 408. More specifically, the mobile computing device 108 performs OCR for a first portion 412 containing the header/title text 404 and the server 104 a performs OCR for a second portion 416 containing the other text 408.
FIG. 4B illustrates the results of the distributed OCR. The mobile computing device 108 obtained a header/title OCR text 420 from the first portion 412 and the server 104 a obtained and provided another OCR text 424. These OCR texts 420, 424 collectively represent the OCR text for the image. In some implementations, the OCR texts 420, 424 can be italicized or otherwise accented, e.g., outlined or bordered, to indicate that OCR is complete and/or machine language translation has not yet been performed. For example, the local OCR may be completed first and thus the header/title OCR text 420 may be accented before the other OCR text 424. In this example, the user 116 is an English speaking user that cannot read or understand French, so he/she requests machine language translation. This could be performed automatically, e.g., based on their language preferences, or in response to an input from the user 116, e.g., selecting an icon, such as the camera button, or by pushing a physical button.
FIG. 4C illustrates the results of the local machine language translation from French to English. In this example, the mobile computing device 108 has a French-English local language pack, e.g., stored at the memory 208, and thus the mobile computing device 108 is capable of some French to English machine language translation. More specifically, the mobile computing device 108 is appropriate for performing machine language translation of a first portion 428 of the OCR text to obtain a first translated OCR text 432. This first portion 428 of the OCR text may include easy/simple words for French to English machine language translation. The mobile computing device 108, however, may be incapable of, inaccurate in, or inefficient in performing machine language translation on a second portion 436 of the OCR text. More specifically, this second portion 436 of the OCR text includes a word 440 (Escargots) for which the mobile computing device 108 is not appropriate for performing machine language translation.
Because the degree of translation complexity is too high, e.g., because the word 440 is not in the local language pack, the mobile computing device 108 can determine that the second portion 436 of the OCR text should be sent to the server 104 b for machine language translation. In some implementations, the styling/accenting of the text can be removed once machine language translation is complete to notify the user 116. As shown in FIG. 4C, the first translated OCR text 432 has no styling or accenting. Lastly, FIG. 4D illustrates the results of the distributed machine language translation from French to English. The server 104 b has obtained and provided a second translated OCR text 444. The first translated OCR text 432 and the second translated OCR text 444 collectively represent the translated OCR text for the image. In response to receiving the second translated OCR text 444 from the server 104 b, the mobile computing device 108 displays the second translated OCR text 444. For example, the mobile computing device 108 may overlay and/or fade from the word 440 to the second translated OCR text 444.
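The behavior shown in FIGS. 4C-4D can be sketched as follows, assuming a word-level local language pack represented as a dictionary. The function names, the pack contents, and the use of asterisks as a stand-in for italic or outlined accenting are all hypothetical choices for this example.

```python
from typing import Dict, List, Tuple


def split_by_language_pack(words: List[str],
                           pack: Dict[str, str]) -> Tuple[Dict[int, str], List[int]]:
    """Translate words found in the local pack; list the indices still pending."""
    translated: Dict[int, str] = {}
    pending: List[int] = []
    for index, word in enumerate(words):
        entry = pack.get(word.lower())
        if entry is not None:
            translated[index] = entry   # handled locally on the device
        else:
            pending.append(index)       # to be sent to the server
    return translated, pending


def render(words: List[str], translated: Dict[int, str]) -> str:
    """Show translated words plainly and untranslated words with accenting."""
    return " ".join(translated.get(i, f"*{w}*") for i, w in enumerate(words))


# Usage: the local result is shown first, then updated when the server responds.
words = ["Salade", "Escargots"]
pack = {"salade": "Salad"}              # assumed contents of a French-English pack
translated, pending = split_by_language_pack(words, pack)
first_view = render(words, translated)  # "Salad *Escargots*"
translated[pending[0]] = "Snails"       # stand-in for the server's result
final_view = render(words, translated)  # "Salad Snails"
```

Removing the accenting once a word is translated mirrors the description above, in which the first translated OCR text 432 is shown without styling while the word 440 awaits the server.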
Example embodiments are provided so that this disclosure will be thorough, and will fully convey the scope to those who are skilled in the art. Numerous specific details are set forth such as examples of specific components, devices, and methods, to provide a thorough understanding of embodiments of the present disclosure. It will be apparent to those skilled in the art that specific details need not be employed, that example embodiments may be embodied in many different forms and that neither should be construed to limit the scope of the disclosure. In some example embodiments, well-known procedures, well-known device structures, and well-known technologies are not described in detail.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms “a,” “an,” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. The term “and/or” includes any and all combinations of one or more of the associated listed items. The terms “comprises,” “comprising,” “including,” and “having,” are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It is also to be understood that additional or alternative steps may be employed.
Although the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms may be only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Terms such as “first,” “second,” and other numerical terms when used herein do not imply a sequence or order unless clearly indicated by the context. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the teachings of the example embodiments.
As used herein, the term module may refer to, be part of, or include: an Application Specific Integrated Circuit (ASIC); an electronic circuit; a combinational logic circuit; a field programmable gate array (FPGA); a processor or a distributed network of processors (shared, dedicated, or grouped) and storage in networked clusters or datacenters that executes code or a process; other suitable components that provide the described functionality; or a combination of some or all of the above, such as in a system-on-chip. The term module may also include memory (shared, dedicated, or grouped) that stores code executed by the one or more processors.
The term code, as used above, may include software, firmware, byte-code and/or microcode, and may refer to programs, routines, functions, classes, and/or objects. The term shared, as used above, means that some or all code from multiple modules may be executed using a single (shared) processor. In addition, some or all code from multiple modules may be stored by a single (shared) memory. The term group, as used above, means that some or all code from a single module may be executed using a group of processors. In addition, some or all code from a single module may be stored using a group of memories.
The techniques described herein may be implemented by one or more computer programs executed by one or more processors. The computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium. The computer programs may also include stored data. Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.
Some portions of the above description present the techniques described herein in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.
Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Certain aspects of the described techniques include process steps and instructions described herein in the form of an algorithm. It should be noted that the described process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer. Such a computer program may be stored in a tangible computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and operations presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the present disclosure is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure as described herein, and any references to specific languages are provided for disclosure of enablement and best mode of the present invention.
The present disclosure is well suited to a wide variety of computer network systems over numerous topologies. Within this field, the configuration and management of large networks comprise storage devices and computers that are communicatively coupled to dissimilar computers and storage devices over a network, such as the Internet.
The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.

Claims (27)

What is claimed is:
1. A computer-implemented method, comprising:
during an image capture mode, receiving, at a mobile computing device having one or more processors, an image capture request;
in response to receiving the image capture request:
capturing, at the mobile computing device, an image of an object comprising a text in a source language;
determining, at the mobile computing device, a degree of optical character recognition (OCR) complexity for performing OCR on the image to obtain the text;
transmitting, from the mobile computing device to a server, at least a portion of the image based on the degree of OCR complexity;
receiving, at the mobile computing device from the server, OCR results; and
obtaining, at the mobile computing device, an OCR text based on the OCR results;
in response to the image capture request and obtaining the OCR text, determining, at the mobile computing device, whether to translate the OCR text to a different target language; and
in response to determining to translate the OCR text to the target language:
obtaining, at the mobile computing device, a machine language translation of the OCR text from the source language to the target language to obtain a translated OCR text;
obtaining, at the mobile computing device, a modified image by modifying (i) the image to replace the text with the translated OCR text and (ii) a styling of the translated OCR text such that its styling differs from a styling of the text; and
outputting, at a display of the mobile computing device, the modified image.
2. The computer-implemented method of claim 1, further comprising:
performing, at the mobile computing device, OCR for the entire image when the degree of OCR complexity is less than a first OCR complexity threshold, wherein the first OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is appropriate for performing itself; and
transmitting, from the mobile computing device to the server, at least the portion of the image when the degree of OCR complexity is greater than the first OCR complexity threshold.
3. The computer-implemented method of claim 2, further comprising transmitting, from the mobile computing device to the server, all of the image when the degree of OCR complexity is greater than a second OCR complexity threshold that is greater than the first OCR complexity threshold, wherein the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing itself.
4. The computer-implemented method of claim 3, wherein when the degree of OCR complexity is between the first and second OCR complexity thresholds, the mobile computing device performs OCR for a first portion of the image and the mobile computing device transmits a second portion of the image to the server, the first and second portions of the image collectively forming the entire image.
5. The computer-implemented method of claim 4, wherein OCR results for the first portion of the image that are obtained by the mobile computing device are used to generate and display the modified image before the OCR results for the second portion of the image are subsequently received from the server and used to generate and display a further modified image.
6. The computer-implemented method of claim 1, wherein determining whether to translate the OCR text from the source language to the target language includes determining whether the source language is a preferred language of a user associated with the mobile computing device.
7. The computer-implemented method of claim 1, wherein the styling of the translated OCR text is at least one of (i) italics and (ii) outlined or bordered.
8. The computer-implemented method of claim 1, wherein the modified image is displayed during the image capture mode.
9. The computer-implemented method of claim 1, wherein the modified image is displayed during an image preview mode that is transitioned to after capturing the image during the image capture mode.
10. A computer-implemented method, comprising:
during an image capture mode, receiving, at a mobile computing device having one or more processors, an image capture request;
in response to receiving the image capture request:
capturing, at the mobile computing device, an image of an object comprising a text in a source language;
obtaining, at the mobile computing device, optical character recognition (OCR) results for the object and the text to obtain an OCR text; and
determining, at the mobile computing device, the source language of the OCR text;
in response to receiving the image capture request and determining the source language of the OCR text, determining, at the mobile computing device, whether to translate the OCR text to a different target language; and
in response to determining to translate the OCR text to the target language:
determining, at the mobile computing device, a degree of translation complexity for performing machine language translation of the OCR text from the source language to the target language;
transmitting, from the mobile computing device to a server, at least a portion of the OCR text based on the degree of translation complexity;
receiving, at the mobile computing device from the server, machine language translation results;
obtaining, at the mobile computing device, a translated OCR text based on the machine language translation results;
obtaining, at the mobile computing device, a modified image by modifying (i) the image to replace the text with the translated OCR text and (ii) a styling of the translated OCR text such that its styling differs from a styling of the text; and
outputting, at a display of the mobile computing device, the modified image.
11. The computer-implemented method of claim 10, further comprising:
performing, at the mobile computing device, machine language translation for the entire OCR text when the degree of translation complexity is less than a first translation complexity threshold, wherein the first translation complexity threshold represents a degree of translation complexity that the mobile computing device is appropriate for performing itself; and
transmitting, from the mobile computing device to the server, at least the portion of the OCR text when the degree of translation complexity is greater than the first translation complexity threshold.
12. The computer-implemented method of claim 11, further comprising transmitting, from the mobile computing device to the server, all of the OCR text when the degree of translation complexity is greater than a second translation complexity threshold that is greater than the first translation complexity threshold, wherein the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing itself.
13. The computer-implemented method of claim 12, wherein when the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device performs machine language translation for a first portion of the OCR text and the mobile computing device transmits a second portion of the OCR text to the server, the first and second portions of the OCR text collectively forming the entire OCR text.
14. The computer-implemented method of claim 13, wherein machine language translation results for the first portion of the OCR text that are obtained by the mobile computing device are used to generate and display the modified image before the machine language translation results for the second portion of the OCR text are subsequently received from the server and used to generate and display a further modified image.
15. The computer-implemented method of claim 10, wherein determining whether to translate the OCR text from the source language to the target language includes determining whether the source language is a preferred language of a user associated with the mobile computing device.
16. A computer-implemented method, comprising:
during an image capture mode, receiving, at a mobile computing device having one or more processors, an image capture request;
in response to receiving the image capture request:
capturing, at the mobile computing device, an image of an object comprising a text in a source language;
determining, at the mobile computing device, a degree of optical character recognition (OCR) complexity for performing OCR on the image to obtain the text;
transmitting, from the mobile computing device to a first server, at least a portion of the image based on the degree of OCR complexity;
receiving, at the mobile computing device from the first server, OCR results; and
obtaining, at the mobile computing device, an OCR text based on the OCR results;
in response to receiving the image capture request and obtaining the OCR text, determining, at the mobile computing device, whether to translate the OCR text to a different target language; and
in response to determining to translate the OCR text to the target language:
determining, at the mobile computing device, a degree of translation complexity for performing machine language translation of the OCR text from the source language to the target language;
transmitting, from the mobile computing device to a second server, at least a portion of the OCR text based on the degree of translation complexity;
receiving, at the mobile computing device from the second server, machine language translation results;
obtaining, at the mobile computing device, a translated OCR text based on the machine language translation results;
obtaining, at the mobile computing device, a modified image by modifying (i) the image to replace the text with the translated OCR text and (ii) a styling of the translated OCR text such that its styling differs from a styling of the text; and
outputting, at a display of the mobile computing device, the modified image.
17. The computer-implemented method of claim 16, further comprising:
performing, at the mobile computing device, OCR for the entire image when the degree of OCR complexity is less than a first OCR complexity threshold, wherein the first OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is appropriate for performing itself; and
transmitting, from the mobile computing device to the first server, at least the portion of the image when the degree of OCR complexity is greater than the first OCR complexity threshold.
18. The computer-implemented method of claim 17, further comprising:
transmitting, from the mobile computing device to the first server, all of the image when the degree of OCR complexity is greater than a second OCR complexity threshold that is greater than the first OCR complexity threshold, wherein the second OCR complexity threshold represents a degree of OCR complexity that the mobile computing device is not appropriate for performing itself, and
wherein when the degree of OCR complexity is between the first and second OCR complexity thresholds, the mobile computing device performs OCR for a first portion of the image and the mobile computing device transmits a second portion of the image to the first server, the first and second portions of the image collectively forming the entire image.
19. The computer-implemented method of claim 17, wherein the translated OCR text includes first and second portions corresponding to machine language translation by the mobile computing device and the second server, respectively, wherein the modified image includes the first portion of the translated OCR text in place of a corresponding portion of the text, wherein the modified image is displayed while awaiting the second portion of the translated OCR text from the second server, and further comprising:
modifying, at the mobile computing device, the modified image to include the second portion of the translated OCR text in place of a corresponding portion of the text to obtain a further modified image; and
outputting, at the display, the further modified image.
20. The computer-implemented method of claim 16, further comprising:
performing, at the mobile computing device, machine language translation for the entire OCR text when the degree of translation complexity is less than a first translation complexity threshold, wherein the first translation complexity threshold represents a degree of translation complexity that the mobile computing device is appropriate for performing itself; and
transmitting, from the mobile computing device to the second server, at least the portion of the OCR text when the degree of translation complexity is greater than the first translation complexity threshold.
21. The computer-implemented method of claim 20, further comprising:
transmitting, from the mobile computing device to the second server, all of the OCR text when the degree of translation complexity is greater than a second translation complexity threshold that is greater than the first translation complexity threshold, wherein the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing itself, and
wherein when the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device performs machine language translation for a first portion of the OCR text and the mobile computing device transmits a second portion of the OCR text to the second server, the first and second portions of the OCR text collectively forming the entire OCR text.
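Claim 21 requires only that the first and second portions collectively form the entire OCR text; it does not say how the split is chosen. The sketch below assumes the text has already been divided into segments and uses a hypothetical per-segment predicate (`short_enough`, based on word count) to decide which segments stay on the device. Both function names and the heuristic are illustrative assumptions.

```python
from typing import Callable


def partition_segments(
    segments: list[str],
    device_can_handle: Callable[[str], bool],
) -> tuple[dict[int, str], dict[int, str]]:
    """Split OCR text segments into a device portion and a server portion.

    Keys are the original segment positions, so results can be merged back in order.
    """
    device_part: dict[int, str] = {}
    server_part: dict[int, str] = {}
    for index, segment in enumerate(segments):
        target = device_part if device_can_handle(segment) else server_part
        target[index] = segment
    return device_part, server_part


def short_enough(segment: str, max_words: int = 3) -> bool:
    # Hypothetical heuristic: short phrases are translated on the device.
    return len(segment.split()) <= max_words
```

Because every segment is assigned to exactly one of the two dictionaries, the union of their keys always covers every position in `segments`, which is the "collectively forming the entire OCR text" property.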
22. The computer-implemented method of claim 16, wherein determining whether to translate the OCR text from the source language to the target language includes determining whether the source language is a preferred language of a user associated with the mobile computing device.
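One way to read the determination in claim 22 is sketched below: translation is requested only when the detected source language is not among the user's preferred languages. The claim itself states only that the determination includes checking whether the source language is a preferred language, so the direction of the decision, the function name `should_translate`, and the use of hyphenated language tags compared by primary subtag are assumptions.

```python
def should_translate(source_language: str, preferred_languages: set[str]) -> bool:
    """Return True when the source language is not one the user already prefers."""

    def primary(tag: str) -> str:
        # Compare only the primary subtag, so "en-US" matches "en".
        return tag.split("-")[0].lower()

    preferred = {primary(tag) for tag in preferred_languages}
    return primary(source_language) not in preferred
```

With a preferred set of `{"en"}`, Spanish text (`"es"`) would be translated, while text detected as `"en-US"` would be left as captured.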
23. A computer-implemented method, comprising:
during an image capture mode, receiving, at a mobile computing device having one or more processors, an image capture request;
in response to the image capture request:
capturing, at the mobile computing device, an image of an object comprising a text in a source language;
obtaining, at the mobile computing device, the text; and
determining, at the mobile computing device, the source language of the text;
in response to the image capture request and determining the source language of the text, determining, at the mobile computing device, whether to translate the text to a different target language; and
in response to determining to translate the text to the target language:
determining, at the mobile computing device, a degree of translation complexity for performing machine language translation of the text from the source language to the target language;
transmitting, from the mobile computing device to a server, at least a portion of the text based on the degree of translation complexity;
receiving, at the mobile computing device from the server, machine language translation results;
obtaining, at the mobile computing device, a translated text based on the machine language translation results;
obtaining, at the mobile computing device, a modified image by modifying (i) the image to replace the text with the translated text and (ii) a styling of the translated text such that its styling differs from a styling of the text; and
outputting, at a display of the mobile computing device, the modified image.
24. The computer-implemented method of claim 23, further comprising:
performing, at the mobile computing device, machine language translation for the entire text when the degree of translation complexity is less than a first translation complexity threshold, wherein the first translation complexity threshold represents a degree of translation complexity that the mobile computing device is appropriate for performing itself; and
transmitting, from the mobile computing device to the server, at least the portion of the text when the degree of translation complexity is greater than the first translation complexity threshold.
25. The computer-implemented method of claim 24, further comprising transmitting, from the mobile computing device to the server, all of the text when the degree of translation complexity is greater than a second translation complexity threshold that is greater than the first translation complexity threshold, wherein the second translation complexity threshold represents a degree of translation complexity that the mobile computing device is not appropriate for performing itself.
26. The computer-implemented method of claim 25, wherein when the degree of translation complexity is between the first and second translation complexity thresholds, the mobile computing device performs machine language translation for a first portion of the text and the mobile computing device transmits a second portion of the text to the server, the first and second portions of the text collectively forming the entire text.
27. The computer-implemented method of claim 26, wherein machine language translation results for the first portion of the text that are obtained by the mobile computing device are output to the display of the mobile computing device before the machine language translation results for the second portion of the text are received from the server.
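The ordering in claim 27 can be sketched with a background request to the server that overlaps with on-device translation. This is an illustration only: `translate_distributed` and its callback parameters are hypothetical, and a real client would use a network call rather than a local function for the server side.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def translate_distributed(
    device_portion: str,
    server_portion: str,
    translate_on_device: Callable[[str], str],
    translate_on_server: Callable[[str], str],
    display: Callable[[str], None],
) -> None:
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the server request first so it overlaps with the local work.
        pending = pool.submit(translate_on_server, server_portion)
        # The device result is displayed without waiting on the server.
        display(translate_on_device(device_portion))
        # Blocks until the server result is available, then displays it.
        display(pending.result())
```

The sketch guarantees only the order of the two `display` calls; how much earlier the device result appears depends on how long the server takes to respond.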
US14/264,327 2014-04-29 2014-04-29 Techniques for distributed optical character recognition and distributed machine language translation Active 2034-05-04 US9514377B2 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US14/264,327 US9514377B2 (en) 2014-04-29 2014-04-29 Techniques for distributed optical character recognition and distributed machine language translation
KR1020167033289A KR101856119B1 (en) 2014-04-29 2015-04-28 Techniques for distributed optical character recognition and distributed machine language translation
EP15720892.7A EP3138046B1 (en) 2014-04-29 2015-04-28 Techniques for distributed optical character recognition and distributed machine language translation
PCT/US2015/027884 WO2015168056A1 (en) 2014-04-29 2015-04-28 Techniques for distributed optical character recognition and distributed machine language translation
CN201580029025.7A CN106415605B (en) 2014-04-29 2015-04-28 Technology for distributed optical character identification and distributed machines language translation
EP16202790.8A EP3168756B1 (en) 2014-04-29 2015-04-28 Techniques for distributed optical character recognition and distributed machine language translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/264,327 US9514377B2 (en) 2014-04-29 2014-04-29 Techniques for distributed optical character recognition and distributed machine language translation

Publications (2)

Publication Number Publication Date
US20150310291A1 (en) 2015-10-29
US9514377B2 (en) 2016-12-06

Family

ID=53053145

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/264,327 Active 2034-05-04 US9514377B2 (en) 2014-04-29 2014-04-29 Techniques for distributed optical character recognition and distributed machine language translation

Country Status (5)

Country Link
US (1) US9514377B2 (en)
EP (2) EP3138046B1 (en)
KR (1) KR101856119B1 (en)
CN (1) CN106415605B (en)
WO (1) WO2015168056A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150074568A1 (en) * 2013-09-09 2015-03-12 Lg Electronic Inc. Mobile terminal and controlling method thereof
US10824917B2 (en) 2018-12-03 2020-11-03 Bank Of America Corporation Transformation of electronic documents by low-resolution intelligent up-sampling

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10579741B2 (en) * 2016-08-17 2020-03-03 International Business Machines Corporation Proactive input selection for improved machine translation
RU2634195C1 (en) * 2016-12-06 2017-10-24 Общество с ограниченной ответственностью "Аби Девелопмент" Method and device for determining document suitability for optical character recognition (ocr)
RU2632427C1 (en) 2016-12-09 2017-10-04 Общество с ограниченной ответственностью "Аби Девелопмент" Optimization of data exchange between client device and server
KR102478396B1 (en) * 2017-11-29 2022-12-19 삼성전자주식회사 The Electronic Device Recognizing the Text in the Image
JP6852666B2 (en) * 2017-12-26 2021-03-31 京セラドキュメントソリューションズ株式会社 Image forming device
KR102598104B1 (en) 2018-02-23 2023-11-06 삼성전자주식회사 Method for displaying text information on an object contained in an image by compensating for motion generated during time of receiving text information from an external electronic device and electronic device thereof
US10963723B2 (en) * 2018-12-23 2021-03-30 Microsoft Technology Licensing, Llc Digital image transcription and manipulation
CN112988011B (en) * 2021-03-24 2022-08-05 百度在线网络技术(北京)有限公司 Word-taking translation method and device
CN115543495A (en) * 2021-06-30 2022-12-30 腾讯科技(深圳)有限公司 Interface management method, device, equipment and readable storage medium
KR20230083971A (en) * 2021-12-03 2023-06-12 주식회사 오후랩스 A method for translating and editing text contained within an image and a device for performing the same
KR20240022732A (en) 2022-08-12 2024-02-20 주식회사 쿠자피에이에스 Text translation system embedded in images

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5546538A (en) 1993-12-14 1996-08-13 Intel Corporation System for processing handwriting written by user of portable computer by server or processing by the computer when the computer no longer communicate with server
US5848386A (en) * 1996-05-28 1998-12-08 Ricoh Company, Ltd. Method and system for translating documents using different translation resources for different portions of the documents
US6285978B1 (en) * 1998-09-24 2001-09-04 International Business Machines Corporation System and method for estimating accuracy of an automatic natural language translation
US6567547B1 (en) * 1999-03-05 2003-05-20 Hewlett-Packard Company System and method for dynamically switching OCR packages
US20030200078A1 (en) 2002-04-19 2003-10-23 Huitao Luo System and method for language translation of character strings occurring in captured image data
US20060181605A1 (en) 2000-11-06 2006-08-17 Boncyk Wayne C Data capture and identification system and process
US20060245005A1 (en) * 2005-04-29 2006-11-02 Hall John M System for language translation of documents, and methods
US20080118162A1 (en) * 2006-11-20 2008-05-22 Microsoft Corporation Text Detection on Mobile Communications Devices
US20080262828A1 (en) * 2006-02-17 2008-10-23 Google Inc. Encoding and Adaptive, Scalable Accessing of Distributed Models
US20110123115A1 (en) * 2009-11-25 2011-05-26 Google Inc. On-Screen Guideline-Based Selective Text Recognition
US8131536B2 (en) * 2007-01-12 2012-03-06 Raytheon Bbn Technologies Corp. Extraction-empowered machine translation
US20120075648A1 (en) 2010-09-24 2012-03-29 Keys Gregory C System and method for distributed optical character recognition processing
US8218020B2 (en) * 2008-11-21 2012-07-10 Beyo Gmbh Providing camera-based services using a portable communication device
US20120330646A1 (en) 2011-06-23 2012-12-27 International Business Machines Corporation Method For Enhanced Location Based And Context Sensitive Augmented Reality Translation
US20130114849A1 (en) 2011-11-04 2013-05-09 Microsoft Corporation Server-assisted object recognition and tracking for mobile devices
US8508787B2 (en) * 2009-11-23 2013-08-13 Xerox Corporation System and method for automatic translation of documents scanned by multifunctional printer machines
US20130289971A1 (en) 2012-04-25 2013-10-31 Kopin Corporation Instant Translation System
US8983190B2 (en) * 2013-08-13 2015-03-17 Bank Of America Corporation Dynamic service configuration during OCR capture

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9262409B2 (en) * 2008-08-06 2016-02-16 Abbyy Infopoisk Llc Translation of a selected text fragment of a screen
CN102622342B (en) * 2011-01-28 2018-09-28 上海肇通信息技术有限公司 Intermediate family of languages system, intermediate language engine, intermediate language translation system and correlation method
CN104573685B (en) * 2015-01-29 2017-11-21 中南大学 A kind of natural scene Method for text detection based on linear structure extraction

Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5546538A (en) 1993-12-14 1996-08-13 Intel Corporation System for processing handwriting written by user of portable computer by server or processing by the computer when the computer no longer communicate with server
US5848386A (en) * 1996-05-28 1998-12-08 Ricoh Company, Ltd. Method and system for translating documents using different translation resources for different portions of the documents
US6285978B1 (en) * 1998-09-24 2001-09-04 International Business Machines Corporation System and method for estimating accuracy of an automatic natural language translation
US6567547B1 (en) * 1999-03-05 2003-05-20 Hewlett-Packard Company System and method for dynamically switching OCR packages
US20060181605A1 (en) 2000-11-06 2006-08-17 Boncyk Wayne C Data capture and identification system and process
US20030200078A1 (en) 2002-04-19 2003-10-23 Huitao Luo System and method for language translation of character strings occurring in captured image data
US20060245005A1 (en) * 2005-04-29 2006-11-02 Hall John M System for language translation of documents, and methods
US20080262828A1 (en) * 2006-02-17 2008-10-23 Google Inc. Encoding and Adaptive, Scalable Accessing of Distributed Models
US20080118162A1 (en) * 2006-11-20 2008-05-22 Microsoft Corporation Text Detection on Mobile Communications Devices
US8131536B2 (en) * 2007-01-12 2012-03-06 Raytheon Bbn Technologies Corp. Extraction-empowered machine translation
US8218020B2 (en) * 2008-11-21 2012-07-10 Beyo Gmbh Providing camera-based services using a portable communication device
US8508787B2 (en) * 2009-11-23 2013-08-13 Xerox Corporation System and method for automatic translation of documents scanned by multifunctional printer machines
US20110123115A1 (en) * 2009-11-25 2011-05-26 Google Inc. On-Screen Guideline-Based Selective Text Recognition
US20120075648A1 (en) 2010-09-24 2012-03-29 Keys Gregory C System and method for distributed optical character recognition processing
US8625113B2 (en) * 2010-09-24 2014-01-07 Ricoh Company Ltd System and method for distributed optical character recognition processing
US20120330646A1 (en) 2011-06-23 2012-12-27 International Business Machines Corporation Method For Enhanced Location Based And Context Sensitive Augmented Reality Translation
US20130114849A1 (en) 2011-11-04 2013-05-09 Microsoft Corporation Server-assisted object recognition and tracking for mobile devices
US20130289971A1 (en) 2012-04-25 2013-10-31 Kopin Corporation Instant Translation System
US8983190B2 (en) * 2013-08-13 2015-03-17 Bank Of America Corporation Dynamic service configuration during OCR capture

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Berclaz, Jerome et al.: "Image-based mobile service: automatic text extraction and translation", Proceedings of SPIE, vol. 7542, Jan. 27, 2010, 12 pages, XP055200900, ISSN: 0277-786X.
Fragoso, Victor et al.: "TranslatAR: A mobile augmented reality translator", Applications of Computer Vision (WACV), 2011 IEEE Workshop on, IEEE, Jan. 5, 2011, pp. 497-502, XP031913615.
Hsueh, Michael: "Interactive Text Recognition and Translation on a Mobile Device", Electrical Engineering and Computer Sciences, University of California at Berkeley, Technical Report No. UCB/EECS-2011-57, May 13, 2011, pp. 1-13, XP055052714, Retrieved from the Internet: URL: http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-57.pdf.
International Search Report and Written Opinion for International Application No. PCT/US2015/027873 mailed Jul. 20, 2015, 11 pages.
Partial International Search Report for International Application No. PCT/US2015/027884 mailed Jul. 16, 2015, 5 pages.
PCT International Search Report and Written Opinion dated Sep. 28, 2015 for PCT International Application PCT/US2015/027884, 15 pages.
Zhang, Mi et al.: "OCRdroid: A Framework to Digitize Text Using Mobile Phones", In: "Mobile Computing, Applications, and Services", Jan. 1, 2010, Springer Berlin Heidelberg, Berlin, Heidelberg, XP055177757, ISBN: 978-3-64-212606-2, vol. 35, pp. 273-292.

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150074568A1 (en) * 2013-09-09 2015-03-12 Lg Electronic Inc. Mobile terminal and controlling method thereof
US9723124B2 (en) * 2013-09-09 2017-08-01 Lg Electronics Inc. Mobile terminal and controlling method thereof
US10547726B2 (en) 2013-09-09 2020-01-28 Lg Electronics Inc. Mobile terminal and controlling method thereof
US11064063B2 (en) 2013-09-09 2021-07-13 Lg Electronics Inc. Mobile terminal and controlling method thereof
US10824917B2 (en) 2018-12-03 2020-11-03 Bank Of America Corporation Transformation of electronic documents by low-resolution intelligent up-sampling

Also Published As

Publication number Publication date
KR20160147969A (en) 2016-12-23
US20150310291A1 (en) 2015-10-29
EP3168756B1 (en) 2021-08-25
EP3138046B1 (en) 2020-03-11
WO2015168056A1 (en) 2015-11-05
EP3168756A1 (en) 2017-05-17
CN106415605B (en) 2019-10-22
CN106415605A (en) 2017-02-15
KR101856119B1 (en) 2018-05-10
EP3138046A1 (en) 2017-03-08

Similar Documents

Publication Publication Date Title
US9514376B2 (en) Techniques for distributed optical character recognition and distributed machine language translation
US9514377B2 (en) Techniques for distributed optical character recognition and distributed machine language translation
US9436682B2 (en) Techniques for machine language translation of text from an image based on non-textual context information from the image
US9524293B2 (en) Techniques for automatically swapping languages and/or content for machine translation
US9400848B2 (en) Techniques for context-based grouping of messages for translation
US9836456B2 (en) Techniques for providing user image capture feedback for improved machine language translation
US10140293B2 (en) Coordinated user word selection for translation and obtaining of contextual information for the selected word
US9946712B2 (en) Techniques for user identification of and translation of media
US20170329745A1 (en) Textual message ordering based on message content
US11190653B2 (en) Techniques for capturing an image within the context of a document
WO2017066293A1 (en) Techniques for attaching media captured by a mobile computing device to an electronic document
US10386935B2 (en) Input method editor for inputting names of geographic locations
US11024305B2 (en) Systems and methods for using image searching with voice recognition commands

Legal Events

Date Code Title Description
AS Assignment

Owner name: GOOGLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CUTHBERT, ALEXANDER JAY;XU, PENG;REEL/FRAME:032777/0231

Effective date: 20140428

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: GOOGLE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044097/0658

Effective date: 20170929

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8