CN112183122A - Character recognition method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112183122A
Authority
CN
China
Prior art keywords: language, character, target, characters, text
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011140561.3A
Other languages
Chinese (zh)
Inventor
袁佳平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011140561.3A
Publication of CN112183122A
Legal status: Pending

Classifications

    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/40 Processing or translation of natural language)
    • G06V20/10: Terrestrial scenes (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING > G06V20/00 Scenes; Scene-specific elements)
    • G06V30/10: Character recognition (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING > G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition)

Abstract

The invention discloses a character recognition method and apparatus, a storage medium, and an electronic device. The method includes: performing character recognition on a target picture to obtain first text information; when the first text information includes text in at least two languages, and that text includes text whose language is not a target language, machine-translating the text whose language is not the target language in the first text information to obtain second text information; and displaying the second text information, wherein all of the text in the second text information is in the target language. The invention solves the technical problem in the prior art of low foreign-language translation efficiency caused by the manual input that translation software requires.

Description

Character recognition method and device, storage medium and electronic equipment
Technical Field
The invention relates to the field of computers, in particular to a character recognition method and device, a storage medium and electronic equipment.
Background
In daily life there is always foreign-language text that cannot be understood, such as the instructions on the packaging of imported goods; when traveling abroad one likewise encounters text that cannot be read, such as a foreign-language menu. Typically, such foreign-language text can be translated with translation software.
However, current translation software requires the user to type the foreign-language text in manually, which is relatively cumbersome. For a long passage the user must enter the characters one by one and can easily make input errors. Moreover, for scripts not composed of English characters, such as Thai, Korean, and Japanese, the user generally does not know how to input them at all.
For the problem in the related art of low foreign-language translation efficiency caused by the manual input that translation software requires, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a character recognition method and apparatus, a storage medium, and an electronic device, which at least solve the technical problem of low foreign-language translation efficiency caused by the manual input that translation software in the prior art requires.
According to one aspect of the embodiments of the present invention, a character recognition method is provided, including: performing character recognition on a target picture to obtain first text information; when the first text information includes text in at least two languages, and that text includes text whose language is not a target language, machine-translating the text whose language is not the target language in the first text information to obtain second text information; and displaying the second text information, wherein all of the text in the second text information is in the target language.
According to another aspect of the embodiments of the present invention, a character recognition apparatus is also provided, including: a recognition module configured to perform character recognition on a target picture to obtain first text information; a translation module configured to, when the first text information includes text in at least two languages and that text includes text whose language is not a target language, machine-translate the text whose language is not the target language in the first text information to obtain second text information; and a display module configured to display the second text information, wherein all of the text in the second text information is in the target language.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned character recognition method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the above-mentioned character recognition method through the computer program.
In the embodiments of the invention, character recognition is performed on a picture: the target picture is recognized to obtain first text information; when the first text information includes text in at least two languages, and that text includes text whose language is not the target language, the text whose language is not the target language is machine-translated to obtain second text information; and the second text information, all of it in the target language, is displayed. This achieves the purpose of translating foreign-language text without manual input, thereby realizing the technical effect of improving foreign-language translation efficiency and solving the technical problem of low translation efficiency caused by the manual input that translation software in the prior art requires.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative text recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of text recognition according to an embodiment of the present invention;
FIG. 3 is a first alternative interface diagram according to an embodiment of the present invention;
FIG. 4 is a second alternative interface diagram according to an embodiment of the present invention;
FIG. 5 is a first diagram illustrating an alternative target picture according to an embodiment of the present invention;
FIG. 6 is a second diagram illustrating an alternative target picture according to an embodiment of the present invention;
FIG. 7 is a schematic view of an alternative game interface according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an alternative playback voice setting interface according to an embodiment of the present invention;
FIG. 9 is an alternative overall flow diagram according to an embodiment of the invention;
FIG. 10 is a schematic structural diagram of an alternative text recognition apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, a character recognition method is provided. As an optional implementation, the method may be applied, but is not limited, to the character recognition system shown in fig. 1. The character recognition system includes: user equipment 102, network 110, and server 112.
The user equipment 102 may be a terminal device on which a target application is installed, and may include but is not limited to at least one of the following: a mobile phone (such as an Android or iOS phone), a notebook computer, a tablet computer, a palmtop computer, a MID (Mobile Internet Device), a PAD, a desktop computer, a smart television, and the like. The target application may be a translation application, a shopping application, an instant messaging application, a game application, and so on. The user equipment 102 includes a display 108, a processor 106, and a memory 104: the display displays the second text information, and the processor performs character recognition on the target picture to obtain the first text information and machine-translates the text in the first text information whose language is not the target language to obtain the second text information.
The network 110 may include, but is not limited to: a wired network, a wireless network, wherein the wired network comprises: a local area network, a metropolitan area network, and a wide area network, the wireless network comprising: bluetooth, WIFI, and other networks that enable wireless communication.
The server 112 includes: a database 114 and a processing engine 116, wherein the database 114 is used for storing information such as a target picture, first text information and second text information. The processing engine 116 is configured to process the target picture, including but not limited to performing text recognition on the target picture to obtain first text information, performing machine translation on a text in a language other than the target language in the first text information, and the like. The server may be a single server, a server cluster composed of a plurality of servers, or a cloud server.
The above is merely an example, and this is not limited in this embodiment.
Specifically, the character recognition system described above implements the following steps:
In step S102, character recognition is performed on the target picture to obtain first text information; in step S104, when the first text information includes text in at least two languages and that text includes text whose language is not the target language, the text whose language is not the target language in the first text information is machine-translated to obtain second text information; in step S106, the second text information is displayed, wherein all of the text in the second text information is in the target language.
As an optional implementation, as shown in fig. 2, the character recognition method includes:
Step S202: performing character recognition on a target picture to obtain first text information;
Step S204: when the first text information includes text in at least two languages and that text includes text whose language is not the target language, machine-translating the text whose language is not the target language in the first text information to obtain second text information;
Step S206: displaying the second text information, wherein all of the text in the second text information is in the target language.
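As an illustration only, the three steps may be sketched in TypeScript as follows; the per-segment language labels and the recognizeText and machineTranslate helpers are assumptions of the sketch, not part of the claimed method:

    interface TextSegment { text: string; language: string; }
    declare function recognizeText(picture: Blob): Promise<TextSegment[]>;
    declare function machineTranslate(text: string, target: string): Promise<string>;

    async function pictureToTargetLanguage(picture: Blob, target: string): Promise<string> {
      // Step S202: character recognition on the target picture.
      const first = await recognizeText(picture);
      // Step S204: translate only when at least two languages are present
      // and at least one of them is not the target language.
      const languages = new Set(first.map((s) => s.language));
      const needsTranslation =
        languages.size >= 2 && [...languages].some((l) => l !== target);
      const parts = await Promise.all(
        first.map((s) =>
          needsTranslation && s.language !== target
            ? machineTranslate(s.text, target)
            : Promise.resolve(s.text),
        ),
      );
      // Step S206: the caller displays this second text information.
      return parts.join(" ");
    }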
Through the above steps, character recognition is performed on the target picture to obtain first text information; when the first text information includes text in at least two languages and that text includes text whose language is not the target language, the text whose language is not the target language is machine-translated to obtain second text information; and the second text information, all of it in the target language, is displayed. This achieves the purpose of translating foreign-language text without manual input, thereby realizing the technical effect of improving foreign-language translation efficiency and solving the technical problem of low translation efficiency caused by the manual input that translation software in the prior art requires.
As an alternative embodiment, the target picture may be an image obtained by photographing foreign-language text, for example with the camera of a mobile phone. Specifically, translation software may be installed on the phone and authorized to use the camera; the user opens the software and photographs the foreign-language text with its photographing function to obtain the target picture. FIG. 3 is a first schematic diagram of an alternative interface according to an embodiment of the present invention; it may be the interface of translation software with a photographing function. A "photographing translation" touch key is provided in the interface shown in fig. 3; by touching it the user starts the photographing function and can photograph foreign-language text with the phone camera to obtain the target picture.
As an alternative embodiment, the target picture may contain foreign-language text in various languages, for example English, Japanese, Korean, Russian, Spanish, and so on. In the interface shown in fig. 3 the user may choose the language to be recognized, for example by touching the "select language" key, after which a pull-down menu as shown in fig. 4 is displayed and a language can be selected. In this embodiment the translation software also has a function for automatically detecting the language by AI; if this function is selected, the software automatically recognizes the language of the text contained in the target picture.
As an optional implementation, the text in the first text information may be a single word, a phrase, or a sentence, for example the word "you", the phrase "good luck", or the sentence "How are you".
As an alternative embodiment, the user may also select the target language of the translation in the interface shown in fig. 3, by touching "target language" and choosing from its pull-down menu, for example Chinese or English. In this embodiment, assume the target language is Chinese and that the first text information recognized from the target picture includes text in at least two languages; for example, if the first text information contains English, Korean, and Chinese, the English and Korean are translated into Chinese and combined with the Chinese already present to form the second text information, which is displayed in the translation software. For example, character recognition of the target picture yields the first text information "こんにちは, 안녕하세요, Hello, 你好", which includes Japanese, Korean, English, and Chinese. With Chinese as the target language, the non-Chinese text "こんにちは, 안녕하세요, Hello" is translated, giving the second text information "你好, 你好, 你好, 你好", which is displayed in the terminal interface. In this embodiment, translating text from a picture avoids the inefficiency of manual input, achieves the effect of improving translation efficiency, and brings convenience to daily life.
Optionally, when the first text information includes text in at least two languages and that text includes text whose language is not the target language, machine-translating the text whose language is not the target language in the first text information to obtain the second text information includes: when the text in the at least two languages includes first text whose language is not the target language and second text whose language is the target language, machine-translating the first text to obtain third text in the target language; and combining the second text and the third text into the second text information.
As an optional implementation, the first text information obtained by character recognition of the target picture may already contain text in the target language. Continuing the example above with Chinese as the target language, the first text information "こんにちは, 안녕하세요, Hello, 你好" includes first text whose language is not the target language, "こんにちは, 안녕하세요, Hello", and second text in the target language, "你好". The first text is translated to obtain the third text "你好, 你好, 你好", which is combined with the second text "你好" in the first text information to give the second text information "你好, 你好, 你好, 你好". In this embodiment, translating only the text that is not in the target language and combining the result with the target-language text already present in the target picture reduces translation time and improves translation efficiency.
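A minimal sketch of this combination, reusing the example above; the translate helper and the per-segment language labels are hypothetical:

    declare function translate(text: string, target: string): Promise<string>;

    const first = [
      { text: "こんにちは", language: "ja" },
      { text: "안녕하세요", language: "ko" },
      { text: "Hello", language: "en" },
      { text: "你好", language: "zh" }, // second text: already in the target language
    ];

    // Translate only the first text (non-target segments) into the third
    // text, then combine it with the second text in recognition order.
    async function secondTextInfo(target: string): Promise<string> {
      const parts = await Promise.all(
        first.map((s) => (s.language === target ? s.text : translate(s.text, target))),
      );
      return parts.join(", "); // with target "zh": "你好, 你好, 你好, 你好"
    }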
Optionally, when the first text information includes text in at least two languages and that text includes text whose language is not the target language, machine-translating the text whose language is not the target language in the first text information to obtain the second text information includes: when none of the at least two languages is the target language, machine-translating the text of each of the at least two languages to obtain fourth text in the target language, wherein the second text information includes the fourth text.
As an optional implementation, the first text information obtained by character recognition of the target picture may contain no text in the target language at all. For example, if the target language is Chinese and the first text information is "こんにちは, 안녕하세요, Hello", the whole of the first text information is translated to obtain the fourth text "你好, 你好, 你好". In this embodiment, recognizing the text in the target picture and translating what is recognized avoids the inefficiency of manual input and increases the translation rate.
Optionally, performing character recognition on the target picture to obtain the first text information includes: performing character recognition on the target picture to obtain the text recognized at a group of recognition positions in the target picture; and performing language identification on the text recognized at the group of recognition positions to obtain the language corresponding to the text at each recognition position, wherein the first text information includes the text at each recognition position and its corresponding language.
As an optional implementation, the target picture may also contain other elements, such as a background or large blank areas, that need no character recognition, so in this embodiment the position of the text in the image, i.e. the area it occupies, may be determined first. FIG. 5 is a first schematic diagram of an alternative target picture according to an embodiment of the present invention: a picture of a vase containing the vase's patterning and large blank regions. The positions of the text in the picture are identified first and the text is marked with labeling boxes, giving the schematic diagram shown in fig. 6; the labeling boxes in fig. 6 mark the text in the target picture, so its position can be determined accurately and the language of the text inside each box can then be identified. By first determining the positions of the text in the target picture and then performing language identification at those positions, the exact locations of all the text in the picture can be determined, which improves the accuracy of recognizing all the text in the target picture.
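This two-stage recognition may be sketched as follows; detectTextRegions and identifyLanguage are hypothetical helpers standing in for the labeling-box detection and language identification described above:

    interface TextRegion { box: [number, number, number, number]; text: string; }
    interface RecognizedSegment extends TextRegion { language: string; }
    declare function detectTextRegions(picture: Blob): Promise<TextRegion[]>;
    declare function identifyLanguage(text: string): Promise<string>;

    // First locate the text (the labeling boxes of fig. 6), then identify
    // the language of the text inside each box.
    async function recognizeWithPositions(picture: Blob): Promise<RecognizedSegment[]> {
      const regions = await detectTextRegions(picture);
      return Promise.all(
        regions.map(async (r) => ({ ...r, language: await identifyLanguage(r.text) })),
      );
    }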
Optionally, machine-translating the text in the first text information whose language is not the target language includes: acquiring, according to the language corresponding to the text at each recognition position, the text whose language is not the target language from the first text information; and machine-translating the acquired text to obtain text in the target language, wherein the second text information includes the text in the target language obtained by machine translation.
As an optional implementation, after language identification is performed on the text marked in the target picture, the text that is not in the target language can be determined accurately and then translated. For example, labeling the text in fig. 5 with labeling boxes gives the schematic diagram shown in fig. 6, and the languages of the text in the boxes are then recognized: "しょく" is identified as Japanese and "Flower" as English. Assuming the target language is Chinese, since neither of these languages is the target language, translating the text in fig. 6 yields the Chinese for "colored flowers". In this embodiment, the labeling boxes allow the positions of the text in the target picture to be identified accurately and the language at each position to be recognized, so the text that is not in the target language can be determined precisely and translated into the target language, achieving the technical effect of improved translation accuracy.
Optionally, machine-translating the text in the first text information whose language is not the target language includes: when the text whose language is not the target language includes fifth text in a first language and sixth text in a second language, and the fifth text is adjacent to the sixth text, machine-translating the sixth text according to the fifth text to obtain seventh text in the target language; or, when the text whose language is not the target language includes the fifth text in the first language and the sixth text in the second language, and the fifth text is adjacent to the sixth text, machine-translating the sixth text according to eighth text in the target language to obtain ninth text in the target language, wherein the eighth text is the text in the target language obtained by machine-translating the fifth text.
As an optional implementation, since the same text may have several different translations, in this embodiment the text in the target picture is translated with its context taken into account; specifically, translation may be performed according to the adjacent text. Suppose character recognition of the target picture yields the first text information "This food おいしい", in which the fifth text "food", in the first language English, is adjacent to the sixth text "おいしい", in the second language Japanese; "おいしい" can then be translated according to "food". In this embodiment, given "food", the sixth text "おいしい" is translated into the seventh text meaning "tasty".
As an alternative embodiment, translation may also be performed according to the translation result of the adjacent text: the adjacent text is translated based on the result of translating text in another language into the target language. For example, character recognition of the target picture yields the first text information "ようこそ, Open now". The fifth text "ようこそ", in the first language Japanese, is adjacent to the sixth text "Open now", in the second language English, so "Open now" can be translated according to the translation of "ようこそ". Assuming the target language is Chinese, "ようこそ" translates to "welcome"; using this result together with the context, "Open now" is translated into the ninth text meaning "now open for business". By translating the text in the target picture in context, the problem of inaccurate results caused by words with several meanings can be avoided, achieving the technical effect of a more accurate translation.
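The context-assisted translation of the last two paragraphs may be sketched as follows; machineTranslate and its context option are hypothetical:

    declare function machineTranslate(
      text: string,
      target: string,
      opts?: { context?: string },
    ): Promise<string>;

    // Translate the neighbouring text first (e.g. "ようこそ" -> "欢迎"), then
    // pass that result as context when translating the adjacent text
    // (e.g. "Open now"), yielding the ninth text.
    async function translateWithNeighbour(
      fifthText: string,
      sixthText: string,
      target: string,
    ): Promise<string> {
      const eighthText = await machineTranslate(fifthText, target);
      return machineTranslate(sixthText, target, { context: eighthText });
    }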
Optionally, machine-translating the text in the first text information whose language is not the target language includes: when the text in the at least two languages includes fifth text in a first language that is not the target language and tenth text in the target language, and the fifth text is adjacent to the tenth text, machine-translating the fifth text according to the tenth text to obtain eleventh text in the target language.
As an optional implementation, the first text information obtained by character recognition of the target picture may also contain text in the target language; in this embodiment the text in the other languages may be translated based on that target-language text, so that the translation reads more naturally in the target language. For example, suppose the target language is Chinese and the first text information is "Open now" together with the Chinese for "business hours: 9:00-17:00"; the first text information thus includes the fifth text "Open now" in the first language English and the tenth text "business hours: 9:00-17:00" in the target language. In this embodiment, "Open now" is translated in combination with the tenth text to obtain the eleventh text meaning "now open for business". Translating the other languages in combination with the target-language text in the target picture brings the result closer to the target language, avoids the inaccuracy of literal translation, and achieves the technical effect of improved translation accuracy.
Optionally, after the second text information is obtained, the method further includes: converting the text in the second text information into speech in the target language to obtain target voice information, and playing the target voice information.
As an alternative, language-barrier problems are often encountered when traveling abroad, for example when asking for directions or ordering in a restaurant. In this embodiment, suppose the user is traveling in Thailand and the desired destination is signposted in English, but the user knows neither English nor Thai. The user can photograph the destination name written in English to obtain a target picture, have the English in it translated into Thai, and then communicate with Thai locals through voice playback, avoiding the inconvenience caused by the language barrier.
Optionally, performing character recognition on the target picture to obtain the first text information includes performing the character recognition in a target client; displaying the second text information includes displaying it in the target client; and playing the target voice information includes playing it in the target client.
In this embodiment, the target client may be translation software, a game application, a communication application, a live-streaming application, and so on. With growing internationalization, people communicate with speakers of other languages in daily life; in a game, for example, players may come from different countries, and language barriers can arise during play. FIG. 7 is a schematic diagram of an alternative game interface according to an embodiment of the present invention. In the interface shown in fig. 7 players communicate by typing text, but because they come from different countries there is a language barrier. In this embodiment, photographing-translation and voice-playback buttons are provided below the text entered by each player. When a player touches the photographing-translation button, the game application captures the corresponding text as a target picture and then recognizes and translates the text in it, so the translation can be displayed below the player's text as shown in fig. 7. The player can also play the translation by clicking the voice-playback button; for example, the translation "left side" of "Lift side" can be played aloud. The player may set the target language in the settings interface, and the translation and the voice playback are then in that language. The game application is described here only as an example scenario: the picture-based text translation and voice playback of this application can be applied in many kinds of programs. By displaying the translation in the client and playing it as speech, the language barriers between countries can be overcome.
Optionally, playing the target voice information includes: acquiring a preset character timbre and speech speed; and playing the target voice information according to the character timbre and the speech speed.
As an alternative embodiment, the user may select the timbre and volume of the voice playback. FIG. 8 is a schematic diagram of an alternative voice-playback settings interface according to an embodiment of the present invention, in which the user may select different characters to speak the target voice information and may set the playback speed and volume by dragging the speed and volume progress bars. Configurable character timbre, volume, and speed can satisfy the needs of different users and improve the user experience.
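A sketch of mapping these settings to synthesis parameters; the field names anticipate the TTS parameters described in the embodiment below, and the wiring is illustrative only:

    interface PlaybackSettings {
      voiceType: number; // selected character timbre
      speed: number;     // value of the speed progress bar
      volume: number;    // value of the volume progress bar
    }

    // Build the parameters for the speech synthesis request.
    function toSynthesisParams(settings: PlaybackSettings, text: string) {
      return {
        Text: text,
        VoiceType: settings.voiceType,
        Speed: settings.speed,
        Volume: settings.volume,
      };
    }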
The present application is illustrated below by a specific embodiment. The technical terms involved include:
Canvas: part of HTML5; allows scripting languages to render bitmap images dynamically.
Audio: HTML5 defines a standard way of including audio through the audio element, which can be used to play sound.
OCR: Optical Character Recognition. Based on deep-learning technology, it intelligently recognizes the text content of a picture as editable text. OCR supports recognition of printed cards and bills such as identity cards and business cards, as well as handwriting such as waybills; it supports customized services and can effectively replace manual information entry.
TMT: machine translation. It combines the advantages of neural machine translation and statistical machine translation, automatically learns translation knowledge from large-scale bilingual corpora, and translates text from a source language into a target language; more than ten languages are currently supported.
TTS: speech synthesis (Text To Speech). It converts text into lifelike speech, closing the loop of human-machine interaction. It offers a variety of timbres, supports custom volume and speech speed, and provides personalized timbre customization, making pronunciation natural, professional, and suited to the scenario. Speech synthesis is widely used in voice navigation, audiobooks, robots, voice assistants, automatic news broadcasting, and similar scenarios, improving the human-machine interaction experience and the efficiency of building voice applications.
In this embodiment, fig. 9 is a flowchart illustrating an overall process according to an alternative embodiment of the present invention, which mainly includes the following steps:
Step S1: authorization on the user's mobile phone. Because of device permissions, when the user enters the translation software for the first time, the software must be authorized to use the phone's camera or to access its photo album; the original picture that the user wants translated is then obtained from the camera or the album.
Step S2: the OCR character recognition interface accepts Base64-encoded picture data, so the picture must first be converted to Base64. Specifically, the original picture may be converted to Base64 data with Canvas, as follows. Step S21: set the canvas size according to the original picture; the canvas width and height may be set equal to the picture's width and height, or larger by a preset threshold chosen according to the actual situation. Step S22: place the original picture on the Canvas. Step S23: convert the picture to Base64 data with Canvas's toDataURL method to obtain the target picture. Because the OCR interface accepts Base64 data of at most 3 MB, if the converted data is too large the picture quality can be reduced through the second parameter of toDataURL: the smaller the value, the blurrier the picture and the smaller the resulting Base64 data. The original picture may also be scaled down by a preset ratio determined by the actual situation, for example 0.9, 0.8, or 0.75.
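Steps S21 to S23 may be sketched for the browser as follows, assuming the original picture has already been loaded into an HTMLImageElement:

    function pictureToBase64(img: HTMLImageElement, quality = 0.9, scale = 1): string {
      const canvas = document.createElement("canvas");
      // Step S21: size the canvas from the original picture (optionally scaled down).
      canvas.width = img.naturalWidth * scale;
      canvas.height = img.naturalHeight * scale;
      const ctx = canvas.getContext("2d");
      if (!ctx) throw new Error("2D context unavailable");
      // Step S22: draw the original picture onto the canvas.
      ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
      // Step S23: export via toDataURL; a lower quality value yields a
      // blurrier picture and smaller Base64 data (the OCR interface
      // accepts at most 3 MB).
      const dataUrl = canvas.toDataURL("image/jpeg", quality);
      return dataUrl.split(",")[1]; // strip the "data:image/jpeg;base64," prefix
    }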
Step S3: recognize the text in the target picture. The Base64 data obtained above is sent to the OCR character recognition interface, which can recognize several languages at once, such as Chinese, English, Japanese, Korean, Spanish, French, German, and Portuguese.
And step S4, after the characters in the target picture are successfully recognized, the server returns a JSON data structure, the JSON data structure comprises target fields, and the target fields are the character contents recognized by the OCR. The target field may be a detetedtext field, and the detetedtext field is the text content recognized by the OCR.
Step S5: send the text recognized in step S4 to the TMT text translation interface and set the target language, such as the native language Chinese, or another language such as English, Japanese, or Korean. Different fields carry the text to be translated and the completed translation. For example, if the text to translate is "Hello", it is passed in the SourceText field, e.g. &SourceText=Hello; the target language is set with the Target field, e.g. &Target=zh for simplified Chinese. The TMT text translation interface can translate many languages, for example English, Japanese, Korean, French, and Spanish.
Step S6: after the TMT text translation succeeds, the server returns a JSON data structure in which the translated text is carried in the TargetText field. For the translation of "Hello" with Chinese as the target, the returned structure contains "TargetText": "你好"; that is, "Hello" translates into Chinese as 你好.
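Steps S5 and S6 may be sketched together using the fields named above; the endpoint URL and the query-string transport are placeholders, and authentication is omitted:

    // Hypothetical endpoint; SourceText and Target are the fields described above.
    async function translateText(sourceText: string, target: string): Promise<string> {
      const query = new URLSearchParams({ SourceText: sourceText, Target: target });
      const response = await fetch(`https://example.invalid/tmt?${query}`);
      const json = await response.json();
      return json.TargetText as string; // e.g. "你好" for SourceText=Hello, Target=zh
    }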
Step S7: convert the text into speech. The translated text from step S6 is sent to the TTS speech synthesis interface. The text to be converted is carried in the Text field; the Volume field sets the volume, the Speed field the speech speed, and the VoiceType field the character timbre. The characters may model real-world voices, for example various female and male timbres, which may be pre-recorded or synthesized by adjusting parameters such as the audio.
Step S8: after the TTS speech synthesis interface is called successfully, the server returns a JSON data structure from which the Audio field is extracted; it holds the Base64 data of the synthesized voice. This Base64 data is handed to an Audio element, which wraps it as playable audio, and playing it lets the user hear the speech.
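Steps S7 and S8 may be sketched together; the endpoint and the exact response shape are assumptions, with the Text, Volume, Speed, VoiceType, and Audio fields as described above:

    async function synthesizeAndPlay(
      text: string,
      voiceType: number,
      speed: number,
      volume: number,
    ): Promise<void> {
      const response = await fetch("https://example.invalid/tts", { // hypothetical endpoint
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ Text: text, VoiceType: voiceType, Speed: speed, Volume: volume }),
      });
      // The Audio field carries the Base64 data of the synthesized voice.
      const { Audio: audioBase64 } = await response.json();
      // Wrap it in an Audio element and play it (MP3 encoding assumed).
      const audio = new Audio(`data:audio/mpeg;base64,${audioBase64}`);
      await audio.play();
    }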
In this embodiment, the user can photograph unfamiliar foreign-language text, have the translation software recognize the text in the picture, and have it read out in the user's native language with a lifelike voice. This solves the problem of not understanding foreign text, spares the user the trouble of typing it into translation software, and frees the user's eyes and attention: the whole translation process runs automatically, the foreign text is understood by listening, and the convenience and experience of translating foreign text are greatly improved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, a character recognition apparatus for implementing the above character recognition method is also provided. As shown in fig. 10, the apparatus includes: a recognition module 1002 configured to perform character recognition on a target picture to obtain first text information; a translation module 1004 configured to, when the first text information includes text in at least two languages and that text includes text whose language is not the target language, machine-translate the text whose language is not the target language in the first text information to obtain second text information; and a display module 1006 configured to display the second text information, wherein all of the text in the second text information is in the target language.
Optionally, the apparatus is further configured to, when the text in the at least two languages includes first text whose language is not the target language and second text whose language is the target language, machine-translate the first text to obtain third text in the target language, and combine the second text and the third text into the second text information.
Optionally, the apparatus is further configured to, when none of the at least two languages is the target language, machine-translate the text of each of the at least two languages to obtain fourth text in the target language, wherein the second text information includes the fourth text.
Optionally, the apparatus is further configured to perform character recognition on the target picture to obtain the first text information by: performing character recognition on the target picture to obtain the text recognized at a group of recognition positions in the target picture; and performing language identification on the text recognized at the group of recognition positions to obtain the language corresponding to the text at each recognition position, wherein the first text information includes the text at each recognition position and its corresponding language.
Optionally, the apparatus is further configured to machine-translate the text in the first text information whose language is not the target language by: acquiring, according to the language corresponding to the text at each recognition position, the text whose language is not the target language from the first text information; and machine-translating the acquired text to obtain text in the target language, wherein the second text information includes the text in the target language obtained by machine translation.
Optionally, the apparatus is further configured to machine-translate the text in the first text information whose language is not the target language by: when the text whose language is not the target language includes fifth text in a first language and sixth text in a second language, and the fifth text is adjacent to the sixth text, machine-translating the sixth text according to the fifth text to obtain seventh text in the target language; or machine-translating the sixth text according to eighth text in the target language to obtain ninth text in the target language, wherein the eighth text is the text in the target language obtained by machine-translating the fifth text.
Optionally, the apparatus is further configured to machine-translate the text in the first text information whose language is not the target language by: when the text in the at least two languages includes fifth text in a first language that is not the target language and tenth text in the target language, and the fifth text is adjacent to the tenth text, machine-translating the fifth text according to the tenth text to obtain eleventh text in the target language.
Optionally, the apparatus is further configured to, after the second text information is obtained, convert the text in the second text information into speech in the target language to obtain target voice information, and to play the target voice information.
Optionally, the device is further configured to perform text recognition on the target picture in a target client to obtain the first text information; displaying the second text information in the target client; and playing the target voice information in the target client.
Optionally, the apparatus is further configured to play the target voice information by: acquiring a preset character timbre and speech speed; and playing the target voice information according to the character timbre and the speech speed.
According to another aspect of the embodiment of the present invention, there is also provided an electronic device for implementing the above-mentioned character recognition method, where the electronic device may be a terminal device or a server shown in fig. 1. The present embodiment takes the electronic device as a terminal device as an example for explanation. As shown in fig. 11, the electronic device comprises a memory 1102 and a processor 1104, wherein the memory 1102 stores a computer program and the processor 1104 is arranged to execute the steps of any of the above method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1: performing character recognition on the target picture to obtain first text information;
S2: when the first text information includes text in at least two languages and that text includes text whose language is not the target language, machine-translating the text whose language is not the target language in the first text information to obtain second text information;
S3: displaying the second text information, wherein all of the text in the second text information is in the target language.
Alternatively, those skilled in the art will understand that the structure shown in fig. 11 is only illustrative. The electronic device may also be a terminal device such as a smartphone (e.g., an Android or iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, or the like, and it may include more or fewer components (e.g., network interfaces) than shown in fig. 11, or have a different configuration.
The memory 1102 may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for character recognition in the embodiments of the present invention, and the processor 1104 executes various functional applications and data processing by operating the software programs and modules stored in the memory 1102, that is, the method for character recognition described above is implemented. The memory 1102 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1102 can further include memory located remotely from the processor 1104 and such remote memory can be coupled to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1102 may be specifically but not limited to be used for storing information such as a target picture, first text information, and second text information. As an example, as shown in fig. 11, the memory 1102 may include, but is not limited to, a recognition module 1002, a translation module 1004, and a display module 1006 in the character recognition device, and may further include, but is not limited to, other module units in the character recognition device, which is not described in detail in this example.
Optionally, the transmitting device 1106 is used for receiving or transmitting data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 1106 includes a Network adapter (NIC) that can be connected to a router via a Network cable to communicate with the internet or a local area Network. In one example, the transmission device 1106 is a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In addition, the electronic device further includes: a display 1108 for displaying the second text message; and a connection bus 1110 for connecting the respective module components in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through a network communication. Nodes can form a Peer-To-Peer (P2P, Peer To Peer) network, and any type of computing device, such as a server, a terminal, and other electronic devices, can become a node in the blockchain system by joining the Peer-To-Peer network.
According to an aspect of the application, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the storage medium and executes them, causing the device to perform the method provided in the various alternative implementations described above; the computer program is arranged to carry out the steps of any of the above method embodiments when run.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, performing character recognition on the target picture to obtain first text information;
S2, in a case that the first text information includes text in at least two languages and the text in the at least two languages includes text in a language other than the target language, performing machine translation on the text in the language other than the target language in the first text information to obtain second text information;
S3, displaying the second text information, wherein all text in the second text information is in the target language.
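For illustration only, a minimal sketch of steps S1 to S3 follows. It assumes pytesseract for character recognition and langdetect for language identification; translate() is a hypothetical placeholder for whatever machine-translation backend is used, and splitting the recognized text line by line is a simplification of the per-position recognition described elsewhere in this document.

```python
# A minimal sketch of steps S1-S3 (illustrative only). Assumes pytesseract
# for OCR and langdetect for language identification; translate() is a
# hypothetical placeholder for any machine-translation backend.
from PIL import Image
import pytesseract
from langdetect import detect

TARGET_LANG = "zh-cn"  # assumed target-language code as reported by langdetect

def translate(text: str, source_lang: str, target_lang: str) -> str:
    """Hypothetical machine-translation call; replace with a real backend."""
    raise NotImplementedError

def recognize_and_translate(picture_path: str) -> str:
    # S1: character recognition on the target picture -> first text information
    # (assumes the chi_sim and eng tesseract language data are installed)
    first_text = pytesseract.image_to_string(Image.open(picture_path),
                                             lang="chi_sim+eng")
    segments = [s for s in first_text.splitlines() if s.strip()]
    langs = [detect(s) for s in segments]  # very short segments may need care

    # S2: if at least two languages appear and some text is not in the target
    # language, machine-translate only the non-target segments
    if len(set(langs)) >= 2 and any(l != TARGET_LANG for l in langs):
        segments = [s if l == TARGET_LANG else translate(s, l, TARGET_LANG)
                    for s, l in zip(segments, langs)]

    # S3: the second text information, now entirely in the target language,
    # is returned for display
    return "\n".join(segments)
```

In a fuller implementation the recognition step would also return the position of each text segment, so that translated text can be rendered back at the corresponding location in the picture, as described in the claims below.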
Alternatively, in this embodiment, a person skilled in the art will understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing the relevant hardware of the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The serial numbers of the above embodiments of the present invention are for description only and do not imply any ranking of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention.
In the above embodiments of the present invention, each embodiment is described with its own emphasis. For parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take another form.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The foregoing are only preferred embodiments of the present invention. It should be noted that a person skilled in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (13)

1. A character recognition method, comprising:
performing character recognition on a target picture to obtain first text information;
in a case that the first text information comprises text in at least two languages and the text in the at least two languages comprises text in a language other than a target language, performing machine translation on the text in the language other than the target language in the first text information to obtain second text information; and
displaying the second text information, wherein all text in the second text information is in the target language.
2. The method according to claim 1, wherein, in a case that the first text information comprises text in at least two languages and the text in the at least two languages comprises text in a language other than the target language, the performing machine translation on the text in the language other than the target language in the first text information to obtain second text information comprises:
in a case that the text in the at least two languages comprises first text in a language other than the target language and second text in the target language, performing machine translation on the first text to obtain third text in the target language; and
combining the second text and the third text into the second text information.
3. The method according to claim 1, wherein, in a case that the first text information comprises text in at least two languages and the text in the at least two languages comprises text in a language other than the target language, the performing machine translation on the text in the language other than the target language in the first text information to obtain second text information comprises:
in a case that none of the at least two languages is the target language, performing machine translation on the text in each of the at least two languages to obtain fourth text in the target language, wherein the second text information comprises the fourth text.
4. The method according to claim 1, wherein the performing character recognition on the target picture to obtain the first text information comprises:
performing character recognition on the target picture to obtain text recognized at a group of recognition positions in the target picture; and
performing language identification on the text recognized at the group of recognition positions to obtain the language corresponding to the text at each recognition position, wherein the first text information comprises the text at each recognition position and the corresponding language.
5. The method according to claim 4, wherein the performing machine translation on the text in the language other than the target language in the first text information comprises:
acquiring, from the first text information, the text in the language other than the target language according to the language corresponding to the text at each recognition position; and
performing machine translation on the acquired text to obtain text in the target language, wherein the second text information comprises the text in the target language obtained through machine translation.
6. The method according to any one of claims 1 to 5, wherein the performing machine translation on the text in the language other than the target language in the first text information comprises:
in a case that the text in the language other than the target language comprises fifth text in a first language and sixth text in a second language, and the fifth text in the first language is adjacent to the sixth text in the second language, performing machine translation on the sixth text in the second language according to the fifth text in the first language to obtain seventh text in the target language; or
in a case that the text in the language other than the target language comprises fifth text in a first language and sixth text in a second language, and the fifth text in the first language is adjacent to the sixth text in the second language, performing machine translation on the sixth text in the second language according to eighth text in the target language to obtain ninth text in the target language, wherein the eighth text in the target language is text in the target language obtained by performing machine translation on the fifth text in the first language.
7. The method according to any one of claims 1 to 5, wherein the performing machine translation on the text in the language other than the target language in the first text information comprises:
in a case that the text in the language other than the target language comprises fifth text in a first language, the text in the at least two languages comprises tenth text in the target language, and the fifth text in the first language is adjacent to the tenth text in the target language, performing machine translation on the fifth text in the first language according to the tenth text in the target language to obtain eleventh text in the target language.
8. The method according to any one of claims 1 to 5, wherein, after obtaining the second text information, the method further comprises:
converting the text in the second text information into speech in the target language to obtain target speech information; and
playing the target speech information.
9. The method according to claim 8, wherein:
the performing character recognition on the target picture to obtain the first text information comprises: performing character recognition on the target picture in a target client to obtain the first text information;
the displaying the second text information comprises: displaying the second text information in the target client; and
the playing the target speech information comprises: playing the target speech information in the target client.
10. The method according to claim 8, wherein the playing the target speech information comprises:
acquiring a preset character timbre and a preset speech rate; and
playing the target speech information according to the character timbre and the speech rate.
11. A character recognition apparatus, comprising:
a recognition module, configured to perform character recognition on a target picture to obtain first text information;
a translation module, configured to, in a case that the first text information comprises text in at least two languages and the text in the at least two languages comprises text in a language other than a target language, perform machine translation on the text in the language other than the target language in the first text information to obtain second text information; and
a display module, configured to display the second text information, wherein all text in the second text information is in the target language.
12. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, performs the method according to any one of claims 1 to 10.
13. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the method according to any one of claims 1 to 10 by means of the computer program.
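For illustration of the context-aware translation described in claims 6 and 7, here is a minimal, non-authoritative sketch. translate() and translate_with_context() are hypothetical stand-ins for a machine-translation backend that can accept neighbouring text as context; they are not part of the claimed method.

```python
# A minimal sketch of the adjacent-text translation in claims 6 and 7
# (illustrative only). translate() and translate_with_context() are
# hypothetical stand-ins for a context-capable translation backend.

def translate(text: str, target_lang: str) -> str:
    """Hypothetical plain machine-translation call."""
    raise NotImplementedError

def translate_with_context(text: str, context: str, target_lang: str) -> str:
    """Hypothetical machine-translation call that uses adjacent text as context."""
    raise NotImplementedError

def translate_adjacent_pair(fifth_text: str, sixth_text: str,
                            target_lang: str) -> tuple[str, str]:
    # Claim 6, second branch: first translate the fifth text (first language)
    # into the target language, yielding the "eighth text" ...
    eighth_text = translate(fifth_text, target_lang)
    # ... then translate the adjacent sixth text (second language) using the
    # eighth text as context, yielding the "ninth text".
    ninth_text = translate_with_context(sixth_text, eighth_text, target_lang)
    return eighth_text, ninth_text

def translate_next_to_target(fifth_text: str, tenth_text: str,
                             target_lang: str) -> str:
    # Claim 7: the fifth text (first language) is adjacent to tenth text that
    # is already in the target language; use the tenth text as context.
    return translate_with_context(fifth_text, tenth_text, target_lang)
```

Using adjacent text as translation context in this way helps disambiguate short mixed-language fragments, which is the motivation these claims describe.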
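Similarly, a minimal sketch of the text-to-speech playback in claims 8 and 10 follows, using the pyttsx3 offline TTS library; treating an installed system voice as the preset "character timbre" is an assumption about the local environment, not a detail from the claims.

```python
# A minimal sketch of claims 8 and 10 (illustrative only): convert the second
# text information to speech and play it with a preset timbre and speech rate.
import pyttsx3

def play_target_speech(second_text: str, rate_wpm: int = 150) -> None:
    engine = pyttsx3.init()
    engine.setProperty("rate", rate_wpm)   # preset speech rate (words per minute)
    voices = engine.getProperty("voices")  # installed voices ~ "character timbre"
    if voices:
        engine.setProperty("voice", voices[0].id)  # assumed voice selection
    engine.say(second_text)                # convert text to target-language speech
    engine.runAndWait()                    # play the target speech information
```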
CN202011140561.3A (priority date 2020-10-22; filing date 2020-10-22) Character recognition method and device, storage medium and electronic equipment. Status: Pending. Published as CN112183122A (en).

Priority Applications (1)

Application Number: CN202011140561.3A
Publication: CN112183122A (en)
Priority Date / Filing Date: 2020-10-22 / 2020-10-22
Title: Character recognition method and device, storage medium and electronic equipment

Publications (1)

Publication Number: CN112183122A
Publication Date: 2021-01-05

Family

ID: 73923845

Family Applications (1)

Application Number: CN202011140561.3A (Pending)
Publication: CN112183122A (en)

Country Status (1)

Country: CN (1)
Publication: CN112183122A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060285748A1 (en) * 2005-06-15 2006-12-21 Fuji Xerox Co., Ltd. Document processing device
CN102375824A (en) * 2010-08-12 2012-03-14 富士通株式会社 Device and method for acquiring multilingual texts with mutually corresponding contents
CN105279152A (en) * 2014-06-24 2016-01-27 腾讯科技(深圳)有限公司 Method and device for word-fetching translation
CN106557226A (en) * 2015-09-25 2017-04-05 Lg电子株式会社 Mobile terminal and its control method
CN110012190A (en) * 2017-12-15 2019-07-12 京瓷办公信息系统株式会社 Image processing apparatus
CN111209461A (en) * 2019-12-30 2020-05-29 成都理工大学 Bilingual corpus collection system based on public identification words
CN111368562A (en) * 2020-02-28 2020-07-03 北京字节跳动网络技术有限公司 Method and device for translating characters in picture, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237468A (en) * 2021-12-08 2022-03-25 文思海辉智科科技有限公司 Translation method and device for text and picture, electronic equipment and readable storage medium
CN114237468B (en) * 2021-12-08 2024-01-16 文思海辉智科科技有限公司 Text and picture translation method and device, electronic equipment and readable storage medium

Similar Documents

Publication Title
CN104735468B (en) A kind of method and system that image is synthesized to new video based on semantic analysis
KR101992424B1 (en) Apparatus for making artificial intelligence character for augmented reality and service system using the same
US20150277686A1 (en) Systems and Methods for the Real-Time Modification of Videos and Images Within a Social Network Format
CN111611436A (en) Label data processing method and device and computer readable storage medium
CN107071554B (en) Method for recognizing semantics and device
CN108600632A (en) It takes pictures reminding method, intelligent glasses and computer readable storage medium
CN108959274A (en) A kind of interpretation method and server of application program
CN113408208B (en) Model training method, information extraction method, related device and storage medium
CN112270768A (en) Ancient book reading method and system based on virtual reality technology and construction method thereof
CN109255130A (en) A kind of method, system and the equipment of language translation and study based on artificial intelligence
KR20210090273A (en) Voice packet recommendation method, device, equipment and storage medium
CN108847066A (en) A kind of content of courses reminding method, device, server and storage medium
CN112183122A (en) Character recognition method and device, storage medium and electronic equipment
US20230326369A1 (en) Method and apparatus for generating sign language video, computer device, and storage medium
KR102194008B1 (en) Method for providing augmented reality contents based on image of goods
CN111078982A (en) Electronic page retrieval method, electronic device and storage medium
CN107221031B (en) AR-based outdoor expansion activity open platform and implementation method thereof
US20150039643A1 (en) System for storing and searching image files, and cloud server
US11532111B1 (en) Systems and methods for generating comic books from video and images
CN109462689A (en) Voice broadcast method and device, electronic device and computer readable storage medium
CN112862558B (en) Method and system for generating product detail page and data processing method
CN113655933A (en) Text labeling method and device, storage medium and electronic equipment
CN112001929A (en) Picture asset processing method and device, storage medium and electronic device
CN111695323A (en) Information processing method and device and electronic equipment
CN112652294B (en) Speech synthesis method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40038145
Country of ref document: HK