CN113052067A - Real-time translation method, device, storage medium and terminal equipment - Google Patents

Real-time translation method, device, storage medium and terminal equipment

Info

Publication number
CN113052067A
Authority
CN
China
Prior art keywords
terminal
translation
image
real
display mode
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110315357.9A
Other languages
Chinese (zh)
Inventor
吴心豪
苑杨
蒋意
范建军
陶纯玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110315357.9A
Publication of CN113052067A

Classifications

    • G06V 30/413 Classification of content, e.g. text, photographs or tables
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the present application provides a real-time translation method, apparatus, storage medium and terminal device. The method determines a preset target area on the current terminal picture, acquires the image displayed in that area, processes the image into character image information, recognizes the characters with a neural network, and translates them. Because only the preset target area is processed rather than the whole terminal picture, the method avoids the low translation efficiency caused by the complex background or blurred characters that the full picture often contains, improving both overall translation efficiency and translation accuracy. In addition, the method automatically acquires the image of the preset target area each time, so the user does not need to take screenshots manually, which further improves overall translation efficiency.

Description

Real-time translation method, device, storage medium and terminal equipment
Technical Field
The present application relates to the field of intelligent translation technologies, and in particular, to a real-time translation method and apparatus, a storage medium, and a terminal device.
Background
At present, with the rapid development of the game, comic and video industries, more and more people browse foreign comics, videos and web pages on mobile phone terminals and download foreign Android game applications. The pages of such foreign applications usually carry original foreign-language text and subtitles, which need to be translated.
However, in current products, when the text on a mobile phone terminal's screen is translated, image recognition and translation are often performed directly on the entire screen. Because the full screen often has a complex background or blurred characters, the recognition and translation process is usually inefficient.
Disclosure of Invention
Embodiments of the present application provide a real-time translation method, apparatus, storage medium and terminal device. The method obtains a translation result by performing image processing, recognition and translation only on the image corresponding to a preset target area of the terminal picture; the whole picture does not need to be processed, so overall translation efficiency is improved.
The embodiment of the application provides a real-time translation method, which comprises the following steps:
determining a preset target area on a current picture of the terminal;
acquiring an image displayed in a preset target area;
processing the image to obtain corresponding character image information;
identifying character image information through a neural network;
and translating the recognized characters.
An embodiment of the present application further provides a real-time translation apparatus, including:
the target area determining unit is used for determining a preset target area on a current picture of the terminal;
the image acquisition unit is used for acquiring an image displayed in a preset target area;
the character extraction unit is used for processing the image to obtain corresponding character image information;
the recognition unit is used for recognizing the character image information through a neural network;
and the translation unit is used for translating the recognized characters.
The embodiment of the application also provides a storage medium, wherein a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer is enabled to execute the real-time translation method.
The embodiment of the application further provides a terminal device, the terminal device comprises a processor and a memory, the memory stores a computer program, and the processor executes the real-time translation method by calling the computer program stored in the memory.
According to the real-time translation method, the preset target area on the current terminal picture is determined, the image displayed in that area is acquired, the image is processed to obtain corresponding character image information, the character image information is recognized through a neural network, and the recognized characters are translated. A translation result is thus obtained by processing, recognizing and translating only the image of the preset target area, not the whole terminal picture. This avoids the low translation efficiency caused by the complex background or blurred characters that the full picture often contains, and improves both overall translation efficiency and translation accuracy. In addition, the method automatically acquires the image of the preset target area each time, so no manual screenshot by the user is needed, which further improves overall translation efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a first flowchart of a real-time translation method according to an embodiment of the present application.
Fig. 2 is a schematic application interface diagram of a real-time translation method according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating a method for determining a preset target area on a current screen of a terminal according to an embodiment of the present application.
Fig. 4 is a flowchart illustrating a method for determining a preset target area on a current screen of a terminal according to a translation area in an embodiment of the present application.
Fig. 5 is a schematic flowchart of a method for recognizing a character image in an embodiment of the present application.
Fig. 6 is a schematic flow chart of a method for obtaining character image information in the embodiment of the present application.
Fig. 7 is a flowchart illustrating a method for extracting characters from a preprocessed image according to an embodiment of the present application.
Fig. 8 is a schematic flowchart of a method for extracting characters from a binarized target image according to an embodiment of the present application.
Fig. 9 is a second flowchart of a real-time translation method according to an embodiment of the present application.
Fig. 10 is a third flowchart illustrating a real-time translation method provided in an embodiment of the present application.
Fig. 11 is a block diagram of a real-time translation apparatus provided in an embodiment of the present application.
Fig. 12 is a block diagram of a target area determining unit provided in the embodiment of the present application.
Fig. 13 is a block diagram of a second region generation subunit provided in the embodiment of the present application.
Fig. 14 is a block diagram of a structure of an identification unit provided in an embodiment of the present application.
Fig. 15 is a block diagram of a structure of a character extraction unit provided in the embodiment of the present application.
Fig. 16 is a block diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present application.
Referring to fig. 1, an embodiment of the present application provides a real-time translation method that may be applied to a terminal device. The terminal device may be a smart phone, a tablet computer, a game device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook computer, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace or electronic clothing.
As shown in fig. 1, fig. 1 is a first schematic flow chart of a real-time translation method provided in an embodiment of the present application, where the real-time translation method includes the following steps:
and 110, determining a preset target area on the current picture of the terminal.
When a user uses a terminal, for example to browse videos, foreign comics or web pages, or to run foreign applications, text that needs to be translated often appears on the corresponding terminal page, and it usually appears in a specific area of the current page. The terminal system therefore treats the specific area in which text to be translated usually appears as the preset target area, and the text area to be translated on the current page is determined directly from this preset target area.
When the preset target area of the terminal is set up, a suitable area can be selected from the history records as the preset target area by means of a big data function, as sketched below.
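The patent does not spell out this selection logic; as a minimal illustrative sketch (the function name and the region layout below are assumptions), one simple reading is to take the translation region that occurs most often in the history records:
```python
from collections import Counter
from typing import List, Tuple

# Hypothetical region layout: (left, top, width, height) in screen pixels.
Rect = Tuple[int, int, int, int]

def pick_preset_target_area(history_regions: List[Rect]) -> Rect:
    """Return the translation region used most often in the history records.

    A minimal stand-in for the "big data" selection the text mentions; a real
    system might cluster near-identical rectangles instead of exact matches.
    """
    if not history_regions:
        raise ValueError("no historical translation records available")
    region, _count = Counter(history_regions).most_common(1)[0]
    return region

# Usage: three past translations, two over the same subtitle strip.
history = [(0, 980, 1080, 120), (0, 980, 1080, 120), (100, 200, 400, 80)]
print(pick_preset_target_area(history))  # -> (0, 980, 1080, 120)
```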
120: Acquire the image displayed in the preset target area.
When the current picture of the terminal is translated, once the preset target area has been determined, the image displayed in the preset target area can be acquired from the current picture through an automatic screenshot function.
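On a desktop Python environment this capture step could look like the sketch below, which uses the third-party mss library (an assumption; the patent names no API) to grab only the preset rectangle instead of the full screen. On an Android terminal the equivalent would be a system screenshot or screen-capture interface, and the example coordinates are made up.
```python
import numpy as np
import mss  # third-party screen-capture library, chosen here only for illustration

def grab_preset_area(left: int, top: int, width: int, height: int) -> np.ndarray:
    """Capture only the preset target area of the screen as a BGRA ndarray."""
    region = {"left": left, "top": top, "width": width, "height": height}
    with mss.mss() as sct:
        shot = sct.grab(region)  # grabs just this rectangle, not the whole screen
    return np.array(shot)        # shape (height, width, 4)

# Example: a hypothetical subtitle strip at the bottom of a 1920x1080 display.
subtitle_img = grab_preset_area(0, 960, 1920, 120)
```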
130: Process the image to obtain corresponding character image information.
After the image displayed in the preset target area is obtained, it can be processed using image processing techniques to obtain the corresponding character image information. These techniques include one or more of graying, character tilt correction, filtering and denoising, image enhancement, and binarization.
140: Recognize the character image information through a neural network.
After the character image information is obtained, it must be recognized to obtain the corresponding characters. A neural network can be used for this recognition, after which the recognized characters are translated. The neural network is usually trained in advance and can recognize multiple languages.
150: Translate the recognized characters.
In some embodiments, as shown in fig. 2 (a schematic diagram of an application interface of the real-time translation method provided in an embodiment of the present application), the method is applied to a terminal device 100, here a mobile phone terminal, and the preset target area is represented by the dashed box 11. When the mobile phone plays a video in horizontal screen mode (i.e., full-screen playback) and a foreign-language subtitle (denoted "XXXXXX") to be translated appears in the preset target area 11, the phone acquires the image displayed in the determined preset target area 11 on the current picture, processes the image to obtain corresponding character image information, recognizes the character image information through a neural network, and translates the recognized characters.
Referring to fig. 2, the translation result (indicated by the dashed box 12, "YYYYYY") can be displayed directly at the bottom of the current video frame.
As described above, only the image of the preset target area is processed, recognized and translated to obtain the translation result; the whole terminal picture does not need to be processed. This avoids the low translation efficiency caused by the complex background or blurred characters of the full picture, and improves both overall translation efficiency and translation accuracy. In addition, the method automatically acquires the image of the preset target area each time, so no manual screenshot is needed; it reduces repetitive work, adapts well to videos, games and web novels, and further improves overall translation efficiency.
In some embodiments, as shown in fig. 3 (a schematic flowchart of the method for determining the preset target area on the current picture of the terminal in an embodiment of the present application), 110 includes:
and 112, acquiring the historical translation record of the terminal.
When the terminal device performs a translation, it usually keeps a certain number of historical translation records. Each record generally includes the corresponding translation area, the translation language, the translation time, the display mode of the terminal device at the time of translation, and the size of the historical terminal picture area. These historical records form the basis for subsequently determining the preset target area of the current terminal picture.
114: Acquire, from the historical translation record, the translation area used to translate the historical terminal picture.
After a historical translation record is acquired, the size of the translation area used to translate the historical terminal picture, the terminal display mode corresponding to that translation, and the area size of the historical terminal picture can be extracted from the record.
116: Determine the preset target area on the current picture of the terminal according to the translation area.
After the translation region is obtained, a preset target region on the current picture of the terminal can be determined according to the size of the translation region.
In some embodiments, the historical translation record is the record created the first time translation is started by a given application installed on the terminal.
The terminal system provides a manual translation mode option and an automatic translation mode option. When translation is started for the first time after the terminal boots, the terminal first acquires the language the user selects for translation and then the translation mode the user selects. For the historically first translation, the terminal system defaults to the manual mode: it acquires the user's manual touch operation on the terminal display screen to determine the translation area on the current picture, generates the image corresponding to that area, processes the image to obtain corresponding character image information, recognizes the character image information through a neural network, and translates the recognized characters.
The terminal retains the translation record corresponding to this first translation, including the translation area, the translation language, the translation time, the display mode of the terminal device at the time of translation, and the area size of the terminal picture.
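A minimal sketch of such a record as a data structure; the field and type names are illustrative choices, not taken from the patent:
```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class DisplayMode(Enum):
    PORTRAIT = "vertical screen"     # vertical screen display mode
    LANDSCAPE = "horizontal screen"  # horizontal screen display mode

@dataclass
class TranslationRecord:
    """One historical translation record, mirroring the fields listed above."""
    app_id: str                        # records can be kept per application
    region: Tuple[int, int, int, int]  # translation area (left, top, width, height)
    language: str                      # translation language
    timestamp: float                   # translation time
    display_mode: DisplayMode          # display mode at the time of translation
    frame_size: Tuple[int, int]        # (width, height) of the terminal picture
```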
When the application performs its next translation and the user does not select the manual mode, the terminal system defaults to the automatic translation mode. It then acquires the translation area used for the historical terminal picture from the record of the application's first translation, determines the preset target area on the current picture according to that translation area, acquires the image displayed in the preset target area, processes the image to obtain corresponding character image information, recognizes the character image information through a neural network, and translates the recognized characters, so that a translation result is obtained by processing only the image of the preset target area.
Historical translation records can be maintained separately for the different types of applications installed on the terminal, and the corresponding first historical translation record can be determined independently when each type of application starts, which further improves the intelligence and translation accuracy of the real-time translation method.
In some embodiments, the preset target area is a rectangular selection area.
In this embodiment, the preset target area on the current picture is determined from the translation area recorded for the historical terminal picture. This overcomes the drawback of the traditional scheme, in which coordinate positions must be entered manually to determine the translation area: the image of the preset target area can be acquired automatically each time, the user does not need to enter coordinates, and overall translation efficiency is further improved.
In some embodiments, as shown in fig. 4 (a flowchart of the method for determining the preset target area on the current picture of the terminal according to the translation area in an embodiment of the present application), 116 includes:
and 1162, determining the current display mode of the terminal, wherein the display mode comprises a vertical screen display mode and a horizontal screen display mode.
When the preset target area on the current picture is determined from the translation area of the historical picture, the terminal display mode must also be taken into account: the preset target areas corresponding to the vertical and horizontal display modes differ in size and must be distinguished. The current display mode of the terminal is therefore determined first.
1164: Judge whether the current display mode is the same as the terminal display mode corresponding to the historical translation record; if yes, proceed to 1166; otherwise, proceed to 1168.
1166: Directly determine the translation area as the preset target area on the current picture of the terminal.
If the terminal display mode of the historical translation record is the same as the current display mode, for example both are the vertical or both the horizontal screen display mode, the historical picture and the current picture have the same size (and therefore the same coordinates), so the translation area can be determined directly as the preset target area on the current picture.
1168: Perform a conversion calculation on the translation area to obtain the preset target area corresponding to the current picture of the terminal in the current display mode.
If the display modes differ, for example the historical record corresponds to the horizontal screen mode while the current mode is the vertical screen mode, the historical picture and the current picture clearly differ in area size, so the translation area cannot be used directly as the preset target area. A conversion calculation must be applied to the translation area to obtain the preset target area corresponding to the current picture in the current display mode.
By further distinguishing different display modes of the terminal, the preset target area on the current picture of the terminal can be accurately determined when the terminal rotates and the display modes change.
In some embodiments, 1168 includes:
and calculating to obtain a preset target area corresponding to the current picture of the terminal in the current display mode according to the position coordinates of the translation area in the historical picture of the terminal and the position coordinates of the current picture of the terminal in the display screen of the terminal.
When the terminal display mode corresponding to the historical translation record is different from the current display mode, further conversion processing needs to be performed on the translation area.
In the display mode corresponding to the historical translation record, the position coordinates of the historical terminal picture within the terminal display screen and those of the translation area within the display screen can be read directly from the record; in the current display mode, the position coordinates of the current terminal picture are likewise directly available.
Therefore, the position coordinates of the translation area within the historical terminal picture can be calculated from the position coordinates of the historical picture within the display screen and those of the translation area within the display screen; combining these with the position coordinates of the current picture within the display screen, the preset target area corresponding to the current picture in the current display mode is obtained by mapping, as sketched below.
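The patent states the inputs of this conversion but not the formula. The sketch below assumes one plausible reading: the region keeps its relative position and size when remapped from the historical picture to the current picture.
```python
def remap_region(region, old_frame, new_frame):
    """Map a translation region from the historical picture to the current one.

    region:    (left, top, width, height) inside the historical picture
    old_frame: (width, height) of the historical terminal picture
    new_frame: (width, height) of the current terminal picture

    Assumption: the region occupies the same relative position and size in
    both pictures; the patent only says a conversion calculation maps the
    area, so proportional scaling is an illustrative choice.
    """
    sx = new_frame[0] / old_frame[0]
    sy = new_frame[1] / old_frame[1]
    left, top, width, height = region
    return (round(left * sx), round(top * sy), round(width * sx), round(height * sy))

# A landscape 1920x1080 record replayed on a portrait 1080x1920 picture.
print(remap_region((0, 960, 1920, 120), (1920, 1080), (1080, 1920)))
# -> (0, 1707, 1080, 213)
```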
In some embodiments, the neural network is based on a long short-term memory network. As shown in fig. 5 (a schematic flow chart of the character image recognition method in an embodiment of the present application), 140 includes:
and 142, training the preset sample based on the long-term and short-term memory network to obtain the artificial neural network model.
Before character images are recognized, preset samples can be trained on the basis of a long short-term memory (LSTM) network to obtain an artificial neural network model. An LSTM is a recurrent neural network designed specifically to solve the long-term dependency problem of ordinary recurrent neural networks.
144: Recognize the character image information using the artificial neural network model.
Training the preset samples on the long short-term memory network yields a trained artificial neural network model, which lays the foundation for accurate recognition of character images and improves the overall accuracy of character image information recognition.
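As a sketch of what training such a model might look like (the patent names neither a framework nor an architecture; PyTorch, the layer sizes and the class count are all assumptions), the model below reads a normalized character image row by row with an LSTM and classifies it:
```python
import torch
import torch.nn as nn

class LSTMCharRecognizer(nn.Module):
    """Classify a normalized character image by feeding its pixel rows,
    top to bottom, into an LSTM; an illustrative architecture only."""

    def __init__(self, img_size: int = 32, hidden: int = 128, num_classes: int = 100):
        super().__init__()
        self.lstm = nn.LSTM(input_size=img_size, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, img_size, img_size) grayscale; one image row per time step
        _, (h_n, _) = self.lstm(x)   # h_n: (num_layers, batch, hidden)
        return self.fc(h_n[-1])      # (batch, num_classes) class scores

model = LSTMCharRecognizer()
batch = torch.rand(8, 32, 32)                        # 8 normalized character images
logits = model(batch)                                # torch.Size([8, 100])
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 100, (8,)))
loss.backward()                                      # one illustrative training step
```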
In some embodiments, as shown in fig. 6 (a schematic flowchart of the method for obtaining character image information in an embodiment of the present application), 130 includes:
and 132, carrying out graying, character tilt correction, filtering and denoising and image enhancement on the image in sequence to obtain a preprocessed image.
During image preprocessing, the image displayed in the preset target area is usually first converted to grayscale; any character tilt in the grayscale image is then corrected with a tilt correction algorithm; the corrected image is filtered and denoised to remove interference noise; and character pixels are emphasized with an image enhancement algorithm, yielding the preprocessed image.
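An OpenCV-based sketch of this four-step chain; the concrete operators (Otsu thresholding, a minAreaRect-based deskew, a median filter and histogram equalization) are common choices assumed here, since the text names the steps but not the algorithms:
```python
import cv2
import numpy as np

def preprocess(bgr: np.ndarray) -> np.ndarray:
    """Graying, tilt correction, denoising and enhancement, in that order."""
    # 1. Graying
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)

    # 2. Character tilt correction: estimate the text angle from the minimum
    #    area rectangle around the ink pixels and rotate the image back.
    mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    coords = cv2.findNonZero(mask)
    if coords is not None:
        angle = cv2.minAreaRect(coords)[-1]
        if angle > 45:  # minAreaRect angle conventions vary across OpenCV versions
            angle -= 90
        h, w = gray.shape
        rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        gray = cv2.warpAffine(gray, rot, (w, h), flags=cv2.INTER_LINEAR,
                              borderMode=cv2.BORDER_REPLICATE)

    # 3. Filtering and denoising
    gray = cv2.medianBlur(gray, 3)

    # 4. Image enhancement: spread the histogram to emphasize character pixels
    return cv2.equalizeHist(gray)
```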
134: Extract characters from the preprocessed image to obtain corresponding character image information.
After the preprocessed image is obtained, the character image information cannot be read from it directly; character extraction must still be performed on the preprocessed image to obtain the character image information.
In some embodiments, as shown in fig. 7 (a schematic flowchart of the method for extracting characters from the preprocessed image in an embodiment of the present application), 134 includes:
1342: Binarize the preprocessed image to obtain a binarized target image containing a plurality of characters.
When extracting characters, the preprocessed image is usually first binarized with a threshold segmentation algorithm to obtain a binarized target image containing a plurality of characters.
1344: Sequentially apply morphological processing, character image segmentation and normalization to the binarized target image to extract the corresponding character image information.
After the binarized target image is obtained, it is further processed with a suitable morphological algorithm; the morphologically processed image is then segmented into character images, and the segmented character images are converted to a uniform size and format with a normalization algorithm, yielding the corresponding character image information.
In some embodiments, as shown in fig. 8 (a schematic flowchart of the method for extracting characters from the binarized target image in an embodiment of the present application), step 1344 includes:
1344a: Perform morphological processing on the binarized target image so that each character forms a complete connected domain, and draw the minimum circumscribed rectangle of each character connected domain to obtain a morphologically processed target image.
In the morphological processing of the binarized target image, a suitable morphological algorithm makes each character form a complete connected domain, and the minimum circumscribed rectangle of each character connected domain is then drawn to obtain the morphologically processed target image.
1344b: According to the minimum circumscribed rectangle of each character connected domain, perform character image segmentation and normalization on the morphologically processed target image to extract corresponding character image information.
Once the minimum circumscribed rectangle of each character connected domain has been determined, the morphologically processed target image is segmented into character images along these rectangles.
The segmented character images may differ in size and format; a normalization operation therefore adjusts them uniformly to images of the same size, yielding the corresponding character image information. A combined sketch of steps 1342 to 1344b follows.
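Continuing the sketch, steps 1342 to 1344b might be chained as below; Otsu thresholding, a 3x3 closing kernel, axis-aligned bounding rectangles from contours, and a 32x32 output size are all assumed substitutes for the unspecified "suitable" algorithms:
```python
import cv2
import numpy as np

def extract_characters(pre: np.ndarray, size: int = 32) -> list:
    """Binarize, close each glyph into one connected domain, cut out each
    domain's bounding rectangle, and normalize every crop to size x size."""
    # 1342: binarization by threshold segmentation (characters become white)
    binary = cv2.threshold(pre, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # 1344a: morphological closing so each character forms one connected domain
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

    # Minimum circumscribed rectangle of each domain (axis-aligned here);
    # findContours returns two values in OpenCV 4.x.
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = sorted(cv2.boundingRect(c) for c in contours)  # left-to-right order

    # 1344b: segment by rectangle and normalize to a uniform format size
    chars = []
    for x, y, w, h in boxes:
        if w * h < 20:  # drop tiny specks; the threshold is an arbitrary choice
            continue
        crop = closed[y:y + h, x:x + w]
        chars.append(cv2.resize(crop, (size, size), interpolation=cv2.INTER_AREA))
    return chars
```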
In some embodiments, as shown in fig. 9 (a second flowchart of the real-time translation method provided in an embodiment of the present application), the real-time translation method further includes:
160: Display the translated text information on the current picture of the terminal; or convert the translated text information into corresponding voice information and play the voice information.
The translated text information is usually displayed at a preset position on the current terminal page, and this position can generally be set according to the user's habits.
For example, the translated text can be displayed at the bottom of the terminal display screen as white text on a transparent background.
The translated text information can also be converted into corresponding voice information and played. In scenarios where the user is reading comics or web pages, the translation can then be played as speech instead of being shown as subtitles at the bottom, which makes the real-time translation method more intelligent and user-friendly.
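As one way to realize the speech branch in Python (purely illustrative; a real terminal would more likely call the platform's built-in text-to-speech engine), the offline pyttsx3 library can speak the translated text:
```python
import pyttsx3  # offline text-to-speech, used here only as an illustration

def speak_translation(text: str) -> None:
    """Convert translated text into voice information and play it."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()  # blocks until playback finishes

speak_translation("This is the translated subtitle.")
```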
In some embodiments, as shown in fig. 10 (a third flowchart of the real-time translation method provided in an embodiment of the present application), the real-time translation method includes:
and 112, acquiring the historical translation record of the terminal.
And 114, acquiring a translation area for translating the terminal history picture in the history translation record.
And 1162, determining the current display mode of the terminal, wherein the display mode comprises a vertical screen display mode and a horizontal screen display mode.
1164, judging whether the current display mode is the same as the terminal display mode corresponding to the history translation record, if yes, proceeding to 1166, otherwise, proceeding to 1168.
1166, directly determining the translation area as a preset target area on the current screen of the terminal.
1168, converting and calculating the translation area to obtain a preset target area corresponding to the current picture of the terminal in the current display mode.
And 120, acquiring an image displayed in a preset target area.
And 132, carrying out graying, character tilt correction, filtering and denoising and image enhancement on the image in sequence to obtain a preprocessed image.
1342, binarizing the preprocessed image to obtain a binarized target image containing a plurality of characters.
1344a, performing morphology processing on the binarized target image to enable each character to form a complete connected domain and draw the minimum circumscribed rectangle of each character connected domain to obtain a target image after the morphology processing.
1344b, according to the minimum circumscribed rectangle of each character connected domain, performing character image segmentation and normalization processing on the target image after morphological processing to extract corresponding character image information.
And 142, training the preset sample based on the long-term and short-term memory network to obtain the artificial neural network model.
144, recognizing the character image information by using an artificial neural network model.
150, the recognized character is translated.
For each step in this embodiment, refer to the explanation of the corresponding step in the foregoing embodiments; details are not repeated here.
In this embodiment, the historical translation record of the terminal is acquired; the translation area used to translate the historical terminal picture in the corresponding display mode is obtained from that record; the preset target area on the current picture is determined from the current display mode of the terminal, the display mode corresponding to the historical record, and the translation area; the image displayed in the preset target area is acquired and the image processing steps above are executed; preset samples are trained on the basis of a long short-term memory network to obtain an artificial neural network model; and finally the model recognizes the character image information and the recognized characters are translated to obtain the translation result.
This real-time translation method does not need to process, recognize and translate the whole terminal picture, which avoids the low translation efficiency caused by the complex background or blurred characters the full picture often contains, improving both overall translation efficiency and translation accuracy. In addition, the method automatically acquires the image of the preset target area each time, so no manual screenshot by the user is needed, further improving overall translation efficiency.
Referring to fig. 11, a block diagram of the real-time translation apparatus 200 provided in an embodiment of the present application, the real-time translation apparatus 200 may be integrated in an electronic device. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playback device, a video playback device, a laptop computer, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace or electronic clothing.
With reference to fig. 11, the real-time translation apparatus 200 includes:
a target area determining unit 210, configured to determine a preset target area on a current screen of the terminal;
an image obtaining unit 220, configured to obtain an image displayed in a preset target area;
a character extraction unit 230, configured to process the image to obtain corresponding character image information;
a recognition unit 240 for recognizing the character image information through a neural network;
and a translation unit 250 for translating the recognized character.
The real-time translation apparatus 200 can be built into the terminal as a general-purpose program and added to the sidebar in a user-defined manner, and the sidebar on the terminal display screen can be called up at any time through a sliding operation.
In some embodiments, as shown in fig. 12 (a block diagram of the target area determining unit 210 provided in an embodiment of the present application), the target area determining unit 210 includes:
a record acquisition subunit 212, configured to acquire a history translation record of the terminal;
a first area acquisition subunit 214, configured to acquire a translation area for translating a terminal history picture in the history translation record;
and a second region generating subunit 216, configured to determine a preset target region on the current screen of the terminal according to the translation region.
In some embodiments, as shown in fig. 13 (a block diagram of the second region generating subunit 216 provided in an embodiment of the present application), the second region generating subunit 216 includes:
a mode determination subunit 2162, configured to determine a current display mode of the terminal, where the display mode includes a vertical screen display mode or a horizontal screen display mode;
a mode judging subunit 2164, configured to judge whether the current display mode is the same as the terminal display mode corresponding to the historical translation record;
the first processing subunit 2166 is configured to, when the current display mode is the same as the terminal display mode corresponding to the historical translation record, directly determine the translation area as a preset target area on the current screen of the terminal;
the second processing subunit 2168 is configured to, when the current display mode is different from the terminal display mode corresponding to the historical translation record, perform conversion calculation processing on the translation area to obtain a preset target area corresponding to the current screen of the terminal in the current display mode.
In some embodiments, the second processing subunit 2168 is configured to calculate, according to the position coordinates of the translation area in the terminal history screen and the position coordinates of the terminal current screen in the terminal display screen, a preset target area corresponding to the terminal current screen in the current display mode.
In some embodiments, as shown in fig. 14 (a block diagram of the recognition unit 240 provided in an embodiment of the present application), the neural network is based on a long short-term memory network, and the recognition unit 240 includes:
a training subunit 242, configured to train preset samples based on the long short-term memory network to obtain an artificial neural network model;
and the identifying subunit 244 is configured to identify the character image information by using an artificial neural network model.
In some embodiments, as shown in fig. 15 (a block diagram of the character extraction unit 230 provided in an embodiment of the present application), the character extraction unit 230 includes:
the preprocessing subunit 232 is configured to perform graying, character tilt correction, filtering and denoising, and image enhancement on the image in sequence to obtain a preprocessed image;
and an extracting subunit 234, configured to perform character extraction on the preprocessed image to obtain corresponding character image information.
In specific implementation, the modules may be implemented as independent entities, or may be combined arbitrarily and implemented as one or several entities.
As can be seen from the above, in the real-time translation apparatus 200 provided in this embodiment, the target area determining unit 210 determines the preset target area on the current picture of the terminal, the image obtaining unit 220 acquires the image displayed in the preset target area, the character extraction unit 230 processes the image to obtain corresponding character image information, the recognition unit 240 recognizes the character image information through a neural network, and the translation unit 250 translates the recognized characters to obtain the translation result. The whole terminal picture does not need to be processed, recognized and translated, which avoids the low translation efficiency caused by the complex background or blurred characters of the full picture, improving both overall translation efficiency and translation accuracy. In addition, the real-time translation apparatus 200 automatically acquires the image of the preset target area each time, so the user does not need to take screenshots manually, further improving overall translation efficiency.
The present application also provides a storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the above real-time translation method.
As shown in fig. 16, fig. 16 is a block diagram of a terminal device 100 provided in an embodiment of the present application. The terminal device 100 may be a smart phone, a tablet computer, a game device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook computer, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace or electronic clothing.
Among them, referring to fig. 16, the terminal device 100 includes a processor 101 and a memory 102. The processor 101 is electrically connected to the memory 102.
The processor 101 is a control center of the terminal device 100, connects various parts of the entire terminal device with various interfaces and lines, and performs various functions of the terminal device 100 and processes data by running or calling a computer program stored in the memory 102 and calling data stored in the memory 102, thereby monitoring the terminal device 100 as a whole.
In this embodiment, the processor 101 in the terminal device 100 loads instructions corresponding to the processes of one or more computer programs into the memory 102, and runs the computer programs stored in the memory 102, thereby executing the following steps:
determining a preset target area on a current picture of the terminal;
acquiring an image displayed in the preset target area;
processing the image to obtain corresponding character image information;
recognizing the character image information through a neural network; and translating the recognized characters.
In some embodiments, when determining the preset target area on the current screen of the terminal, the processor 101 performs the following steps:
acquiring a historical translation record of a terminal;
acquiring a translation area for translating the terminal history picture in the history translation record;
and determining a preset target area on the current picture of the terminal according to the translation area.
In some embodiments, when determining the preset target area on the current screen of the terminal according to the translation area, the processor 101 performs the following steps:
determining a current display mode of the terminal, wherein the display mode comprises a vertical screen display mode and a horizontal screen display mode;
judging whether the current display mode is the same as the terminal display mode corresponding to the historical translation record or not;
if so, directly determining the translation area as a preset target area on the current picture of the terminal;
if not, performing conversion calculation processing on the translation area to obtain a preset target area corresponding to the current picture of the terminal in the current display mode.
In some embodiments, when performing conversion calculation processing on the translation area to obtain a preset target area corresponding to the current picture of the terminal in the current display mode, the processor 101 executes the following steps:
and calculating to obtain a preset target area corresponding to the current terminal picture in the current display mode according to the translation area, the terminal history picture and the position coordinates of the current terminal picture in a terminal display screen.
In some embodiments, the neural network is based on a long short-term memory network, and when recognizing the character image information through the neural network, the processor 101 performs the following steps:
training preset samples based on a long short-term memory network to obtain an artificial neural network model;
and identifying the character image information by adopting the artificial neural network model.
In some embodiments, when processing the image to obtain corresponding character image information, the processor 101 performs the following steps:
carrying out graying, character tilt correction, filtering and denoising and image enhancement on the image in sequence to obtain a preprocessed image;
and extracting characters from the preprocessed image to obtain corresponding character image information.
In some embodiments, when extracting characters from the preprocessed image to obtain corresponding character image information, the processor 101 performs the following steps:
carrying out binarization processing on the preprocessed image to obtain a binarization target image containing a plurality of characters;
and sequentially carrying out morphological processing, character image segmentation and normalization processing on the binarization target image so as to extract and obtain corresponding character image information.
In some embodiments, when the binarization target image is sequentially subjected to morphological processing, character image segmentation and normalization processing to extract corresponding character image information, the processor 101 performs the following steps:
performing morphological processing on the binarized target image so that each character forms a complete connected domain, and drawing the minimum circumscribed rectangle of each character connected domain to obtain a morphologically processed target image;
and according to the minimum circumscribed rectangle of each character connected domain, performing character image segmentation and normalization processing on the morphologically processed target image to extract corresponding character image information.
In some embodiments, after translating the text information, the processor 101 further performs the following steps:
displaying the translated text information on the current picture of the terminal; or converting the translated text information into corresponding voice information, and playing the voice information.
The memory 102 may be used to store computer programs and data. The computer programs stored in the memory 102 contain instructions executable by the processor and may constitute various functional modules. The processor 101 executes various functional applications and performs data processing by calling the computer programs stored in the memory 102.
In the description of the present application, it is to be understood that terms such as "first", "second", and the like are used merely to distinguish one similar element from another, and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated.
It should be noted that all or part of the steps in the methods of the above embodiments may be implemented by instructing relevant hardware through a computer program, which may be stored in a computer-readable storage medium, including but not limited to a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The real-time translation method, the real-time translation device, the storage medium and the terminal device provided by the embodiment of the application are described in detail above. The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (17)

1. A real-time translation method, characterized in that the real-time translation method comprises:
determining a preset target area on a current picture of the terminal;
acquiring an image displayed in the preset target area;
processing the image to obtain corresponding character image information;
recognizing the character image information through a neural network;
and translating the recognized characters.
2. The real-time translation method according to claim 1, wherein the step of determining the preset target area on the current screen of the terminal comprises:
acquiring a historical translation record of a terminal;
acquiring a translation area for translating the terminal history picture in the history translation record;
and determining a preset target area on the current picture of the terminal according to the translation area.
3. The real-time translation method according to claim 2, wherein the step of determining a preset target area on the current screen of the terminal according to the translation area comprises:
determining a current display mode of the terminal, wherein the display mode comprises a vertical screen display mode and a horizontal screen display mode;
judging whether the current display mode is the same as the terminal display mode corresponding to the historical translation record or not;
if so, directly determining the translation area as a preset target area on the current picture of the terminal;
if not, performing conversion calculation processing on the translation area to obtain a preset target area corresponding to the current picture of the terminal in the current display mode.
4. The real-time translation method according to claim 3, wherein the step of performing conversion calculation processing on the translation area to obtain a preset target area corresponding to the current picture of the terminal in the current display mode comprises:
and calculating to obtain a preset target area corresponding to the current terminal picture in the current display mode according to the translation area, the terminal history picture and the position coordinates of the current terminal picture in a terminal display screen.
5. The real-time translation method according to claim 1, wherein the neural network is based on a long short-term memory network, and the step of recognizing the character image information through the neural network comprises:
training preset samples based on a long short-term memory network to obtain an artificial neural network model;
and identifying the character image information by adopting the artificial neural network model.
6. The real-time translation method according to claim 1, wherein the step of processing the image to obtain the corresponding character image information comprises:
carrying out graying, character tilt correction, filtering and denoising and image enhancement on the image in sequence to obtain a preprocessed image;
and extracting characters from the preprocessed image to obtain corresponding character image information.
7. The real-time translation method according to claim 6, wherein said step of extracting characters from said preprocessed image to obtain corresponding character image information comprises:
carrying out binarization processing on the preprocessed image to obtain a binarization target image containing a plurality of characters;
and sequentially carrying out morphological processing, character image segmentation and normalization processing on the binarization target image so as to extract and obtain corresponding character image information.
8. The real-time translation method according to claim 7, wherein the step of sequentially performing morphological processing, character image segmentation and normalization processing on the binarization target image to extract corresponding character image information comprises:
performing morphological processing on the binarized target image so that each character forms a complete connected domain, and drawing the minimum circumscribed rectangle of each character connected domain to obtain a morphologically processed target image;
and according to the minimum circumscribed rectangle of each character connected domain, performing character image segmentation and normalization processing on the morphologically processed target image to extract corresponding character image information.
9. The real-time translation method according to claim 1, wherein the real-time translation method further comprises:
displaying the text information obtained by translation on the current picture of the terminal; or
converting the translated text information into corresponding voice information, and playing the voice information.
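For the voice branch of claim 9, any on-device text-to-speech engine will serve; the patent names none. A minimal sketch using the off-the-shelf pyttsx3 package:

```python
import pyttsx3  # third-party offline TTS engine (an assumed choice)

def play_translation(translated_text: str) -> None:
    """Convert the translated text to speech and play it."""
    engine = pyttsx3.init()
    engine.say(translated_text)
    engine.runAndWait()
```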
10. A real-time translation device based on terminal pictures is characterized by comprising:
the target area determining unit is used for determining a preset target area on a current picture of the terminal;
the image acquisition unit is used for acquiring an image displayed in the preset target area;
the character extraction unit is used for processing the image to obtain corresponding character image information;
the recognition unit is used for recognizing the character image information through a neural network;
and the translation unit is used for translating the recognized characters.
11. The real-time translation apparatus according to claim 10, wherein the target region determination unit includes:
the record acquisition subunit is used for acquiring the historical translation record of the terminal;
a first area acquisition subunit, configured to acquire a translation area for translating a terminal history picture in the history translation record;
and the second area generation subunit is used for determining a preset target area on the current picture of the terminal according to the translation area.
12. The real-time translation device according to claim 11, wherein the second region generation subunit comprises:
the mode determining subunit is used for determining a current display mode of the terminal, wherein the display mode comprises a vertical screen display mode and a horizontal screen display mode;
a mode judging subunit, configured to judge whether the current display mode is the same as a terminal display mode corresponding to the historical translation record;
the first processing subunit is used for directly determining the translation area as a preset target area on the current picture of the terminal when the current display mode is the same as the terminal display mode corresponding to the historical translation record;
and the second processing subunit is used for performing conversion calculation processing on the translation area when the current display mode is different from the terminal display mode corresponding to the historical translation record so as to obtain a preset target area corresponding to the current picture of the terminal in the current display mode.
13. The real-time translation device according to claim 12, wherein the second processing subunit is configured to calculate a preset target area corresponding to the current picture of the terminal in the current display mode according to the translation area, the terminal history picture, and the position coordinates of the current picture of the terminal in a terminal display screen.
14. The real-time translation device according to claim 10, wherein the neural network is based on a long short-term memory network, and the recognition unit comprises:
the training subunit is used for training on preset samples based on the long short-term memory network to obtain an artificial neural network model;
and the recognition subunit is used for recognizing the character image information by using the artificial neural network model.
15. The real-time translation apparatus according to claim 10, wherein the character extraction unit comprises:
the preprocessing subunit is used for sequentially carrying out graying, character tilt correction, filtering and denoising, and image enhancement on the image to obtain a preprocessed image;
and the extraction subunit is used for carrying out character extraction on the preprocessed image so as to obtain corresponding character image information.
16. A storage medium having stored therein a computer program which, when run on a computer, causes the computer to execute the real-time translation method of any one of claims 1 to 9.
17. A terminal device, characterized in that the terminal device comprises a processor and a memory, wherein a computer program is stored in the memory, and the processor executes the real-time translation method according to any one of claims 1 to 9 by calling the computer program stored in the memory.
CN202110315357.9A 2021-03-24 2021-03-24 Real-time translation method, device, storage medium and terminal equipment Pending CN113052067A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110315357.9A CN113052067A (en) 2021-03-24 2021-03-24 Real-time translation method, device, storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
CN113052067A 2021-06-29

Family

ID=76515085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110315357.9A Pending CN113052067A (en) 2021-03-24 2021-03-24 Real-time translation method, device, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN113052067A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729636A (en) * 2013-12-18 2014-04-16 小米科技有限责任公司 Method and device for cutting character and electronic device
CN107992483A (en) * 2016-10-26 2018-05-04 深圳超多维科技有限公司 The method, apparatus and electronic equipment of translation are given directions for gesture
CN108959274A (en) * 2018-06-27 2018-12-07 维沃移动通信有限公司 A kind of interpretation method and server of application program
CN109325464A (en) * 2018-10-16 2019-02-12 上海翎腾智能科技有限公司 A kind of finger point reading character recognition method and interpretation method based on artificial intelligence
CN111160047A (en) * 2018-11-08 2020-05-15 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN109992753A (en) * 2019-03-22 2019-07-09 维沃移动通信有限公司 A kind of translation processing method and terminal device
CN110276349A (en) * 2019-06-24 2019-09-24 腾讯科技(深圳)有限公司 Method for processing video frequency, device, electronic equipment and storage medium
CN110866530A (en) * 2019-11-13 2020-03-06 云南大学 Character image recognition method and device and electronic equipment
CN111340028A (en) * 2020-05-18 2020-06-26 创新奇智(北京)科技有限公司 Text positioning method and device, electronic equipment and storage medium
CN112328348A (en) * 2020-11-05 2021-02-05 深圳壹账通智能科技有限公司 Application program multi-language support method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HE Liuyi et al., "Recognition system for unevenly illuminated text images based on deep learning", Computer Applications and Software, vol. 6, no. 37
ZHANG Ziting; YUN Jing; XU Zhiwei; LIU Limin, "Research and application of real-scene English scene translation based on deep learning", Journal of Inner Mongolia University of Technology (Natural Science Edition), no. 01
LI Nianyong; LIANG Yanmei; ZHANG Shu; YANG Li; CHANG Shengjiang, "Text localization in complex color images based on BP neural network", Acta Photonica Sinica, no. 10

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947529A (en) * 2021-10-14 2022-01-18 万翼科技有限公司 Image enhancement method, model training method, component identification method and related equipment
CN113947529B (en) * 2021-10-14 2023-01-10 万翼科技有限公司 Image enhancement method, model training method, component identification method and related equipment

Similar Documents

Publication Publication Date Title
CN109242802B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111582241B (en) Video subtitle recognition method, device, equipment and storage medium
CN107862315B (en) Subtitle extraction method, video searching method, subtitle sharing method and device
CN111814520A (en) Skin type detection method, skin type grade classification method, and skin type detection device
Yang et al. Lecture video indexing and analysis using video OCR technology
CN110598686B (en) Invoice identification method, system, electronic equipment and medium
CN109344864B (en) Image processing method and device for dense object
CN111291661B (en) Method and equipment for identifying text content of icon in screen
WO2022089170A1 (en) Caption area identification method and apparatus, and device and storage medium
JP2023543640A (en) Liquor label identification method, liquor product information management method, and its apparatus, device, and storage medium
CN113436222A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN112052730A (en) 3D dynamic portrait recognition monitoring device and method
CN113052067A (en) Real-time translation method, device, storage medium and terminal equipment
CN110533020A (en) A kind of recognition methods of text information, device and storage medium
CN110348353B (en) Image processing method and device
CN110782392B (en) Image processing method, device, electronic equipment and storage medium
CN112052352A (en) Video sequencing method, device, server and storage medium
CN111107264A (en) Image processing method, image processing device, storage medium and terminal
CN113591437B (en) Game text translation method, electronic device and storage medium
CN114387315A (en) Image processing model training method, image processing device, image processing equipment and image processing medium
CN111798542B (en) Model training method, data processing device, model training apparatus, and storage medium
CN111062377B (en) Question number detection method, system, storage medium and electronic equipment
CN114202723A (en) Intelligent editing application method, device, equipment and medium through picture recognition
CN111582281A (en) Picture display optimization method and device, electronic equipment and storage medium
CN111753715A (en) Method and device for shooting test questions in click-to-read scene, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination