CN111507112A - Translation method and device and translation device - Google Patents

Translation method and device and translation device

Info

Publication number
CN111507112A
CN111507112A (application CN201910100754.7A; granted as CN111507112B)
Authority
CN
China
Prior art keywords
source language
language text
word
text line
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910100754.7A
Other languages
Chinese (zh)
Other versions
CN111507112B (en)
Inventor
张玉亭
马龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Sogou Hangzhou Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd, Sogou Hangzhou Intelligent Technology Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201910100754.7A priority Critical patent/CN111507112B/en
Publication of CN111507112A publication Critical patent/CN111507112A/en
Application granted granted Critical
Publication of CN111507112B publication Critical patent/CN111507112B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The embodiment of the invention provides a translation method, a translation apparatus, and a device for translation. The method specifically comprises the following steps: determining source language text line regions in an image; if adjacent source language text line regions are determined to comprise text content of the same paragraph, merging the adjacent source language text line regions to obtain a source language text segment region; and translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text. The embodiment of the invention can improve the accuracy of image translation.

Description

Translation method and device and translation device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a translation method and apparatus, and an apparatus for translation.
Background
With the continuous development of computer technology, text in images can be translated using a translation tool: the source language text in an image is translated into target language text, and the translated image is output.
For example, referring to FIG. 1, there is shown a schematic diagram of an image to be translated. The image includes source language text with English as the source language, to the effect of: "China is leading the world in facial recognition algorithms, with the best one able to identify 10 million people in under a second without a single mistake." Assuming the target language is Chinese, the corresponding translation result of the source language text (rendered back into English here) can be: "China's facial recognition algorithms are in the leading position in the world; the best algorithm can recognize ten million people in less than one second without any error."
However, since the source language text in the image shown in FIG. 1 is divided into a plurality of text lines, the translation tool translates each recognized text line separately and finally outputs the translated image shown in FIG. 2. The target language text that the user finally sees is, line by line: "china is leading in the face world", "best recognition algorithm", "algorithm capable of recognizing 10", "million population", "error of less than one second".
It can be seen that each line of target language text in FIG. 2 corresponds to a line of source language text in FIG. 1, but the semantic deviation between the target language text shown in FIG. 2 and the source language text shown in FIG. 1 is large, which not only reduces the accuracy of the translation but also makes it harder for the user to understand.
Disclosure of Invention
The embodiment of the invention provides a translation method, a translation apparatus, and a device for translation, which can improve the accuracy of image translation.
In order to solve the above problem, an embodiment of the present invention discloses a translation method, where the method includes:
determining a source language text line region in an image;
if the adjacent source language text line regions are determined to comprise the text content of the same paragraph, combining the adjacent source language text line regions to obtain a source language text segment region;
and translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
In another aspect, an embodiment of the present invention discloses a translation apparatus, where the apparatus includes:
the determining module is used for determining a source language text line region in the image;
the merging module is used for merging the adjacent source language text line regions to obtain a source language text segment region if the adjacent source language text line regions comprise the text content of the same paragraph;
and the translation module is used for translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
In yet another aspect, an embodiment of the present invention discloses an apparatus for translation, which includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs configured to be executed by the one or more processors include instructions for:
determining a source language text line region in an image;
if the adjacent source language text line regions are determined to comprise the text content of the same paragraph, combining the adjacent source language text line regions to obtain a source language text segment region;
and translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
In yet another aspect, embodiments of the invention disclose a machine-readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform a translation method as described in one or more of the preceding.
The embodiment of the invention has the following advantages:
after determining the source language text line regions in an image, the embodiment of the invention further judges whether adjacent source language text line regions comprise text content of the same paragraph. If they do, the adjacent source language text line regions are merged to obtain a source language text segment region, and the source language text in the source language text segment region is then translated to obtain a target language text corresponding to the source language text. Because the text content in the merged source language text segment region includes complete paragraphs, and the merged paragraphs include complete sentences, translating the source language text in the merged region yields a target language text whose semantics are expressed more accurately, thereby improving the accuracy of translation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic illustration of an image to be translated;
FIG. 2 is a schematic illustration of a translated image;
FIG. 3 is a flow chart of the steps of one embodiment of a translation method of the present invention;
FIG. 4 is a diagram of a source language text line region of the present invention;
FIG. 5 is a schematic illustration of another source language text line region of the present invention;
FIG. 6 is a block diagram of a translation apparatus according to an embodiment of the present invention;
FIG. 7 is a block diagram of an apparatus 800 for translation of the present invention; and
fig. 8 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Method embodiment
Referring to fig. 3, a flowchart illustrating steps of an embodiment of a translation method according to the present invention is shown, which may specifically include the following steps:
step 301, determining a source language text line region in an image;
step 302, if determining that the adjacent source language text line regions comprise the text content of the same paragraph, merging the adjacent source language text line regions to obtain a source language text segment region;
step 303, translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
The embodiment of the invention can be applied to a translation scenario: a translation client for this scenario can translate the source language text in an image into target language text according to the source language and target language set by the user. It can be understood that the embodiments of the present invention do not limit the types of the source and target languages; for example, the source language may be Chinese and the target language English, or the source language may be English and the target language Japanese, and so on.
The embodiment of the present invention does not limit the form of the translation client, for example, the translation client may be a translation APP (Application), and a user may download, install and use the APP in a terminal; alternatively, the translation client may be a web page online tool, and the user may open a web page, use an online translation client in the web page, and the like.
The translation client can run on a terminal, and the terminal specifically includes, but is not limited to, a smartphone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, a vehicle-mounted computer, a desktop computer, a set-top box, a smart television, a wearable device, and the like.
In the embodiment of the present invention, the image to be translated may be any type of image, such as a commodity image, a detail image, a newspaper cover, an advertisement, and the like, and the format of the image includes, but is not limited to, JPG (Joint Photographic Experts Group), PNG (Portable Network Graphics), TIFF (Tagged Image File Format), BMP (Bitmap), and the like. It can be understood that the embodiment of the present invention does not limit the manner of acquiring the image: the image may be downloaded from a web page or captured by a terminal device, for example, an image to be translated may be obtained by taking a picture with a mobile phone or a camera.
For the image to be translated, the embodiment of the present invention first determines the source language text line regions in the image, where a source language text line region is formed by a plurality of characters arranged along the same direction, and each source language text line region may include one line of characters. Referring to FIG. 4, a schematic diagram of a source language text line region of the present invention is shown. In FIG. 4, each rectangular frame represents a source language text line region obtained by recognition; it can be seen that FIG. 4 includes 5 source language text line regions.
It can be understood that, in the embodiment of the present invention, the direction of the source language text line region is not limited, and the direction of the source language text line region may be any direction, such as horizontal direction, vertical direction, and the like, according to the typesetting manner of the text in the image.
The embodiment of the present invention does not limit the specific manner of determining the source language text line region in the image. For example, the image may be thresholded to determine the source language text line regions in the image. Specifically, the thresholding process may select different thresholding methods according to actual conditions, such as a fixed thresholding method, an adaptive thresholding method, the Otsu method, an iterative method, and the like.
It is to be understood that the size of the source language text line region is not limited in this embodiment, for example, the source language text line region may be a minimum bounding rectangle containing the source language text line, that is, four sides of the minimum bounding rectangle are tangent to the uppermost end, the lowermost end, the leftmost end, and the rightmost end of the characters in the source language text line, respectively.
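The thresholding approach just described can be sketched in a few lines. The following is a minimal illustration, not the patent's implementation: it assumes a grayscale image held in a NumPy array, uses a fixed threshold, and locates horizontal text lines by row projection; the function name and parameter values are hypothetical.

```python
import numpy as np

def find_text_line_regions(gray, thresh=128):
    """Locate horizontal text line regions via fixed thresholding and
    row projection: a row belongs to a line if it contains any dark pixel.
    Returns (top, bottom, left, right) minimum bounding boxes, inclusive."""
    binary = gray < thresh                    # True where ink (dark) pixels are
    row_has_ink = binary.any(axis=1)
    regions, start = [], None
    for y, has_ink in enumerate(row_has_ink):
        if has_ink and start is None:
            start = y                         # a new text line begins
        elif not has_ink and start is not None:
            band = binary[start:y]
            cols = np.flatnonzero(band.any(axis=0))
            regions.append((start, y - 1, int(cols[0]), int(cols[-1])))
            start = None
    if start is not None:                     # line touching the bottom edge
        band = binary[start:]
        cols = np.flatnonzero(band.any(axis=0))
        regions.append((start, len(row_has_ink) - 1, int(cols[0]), int(cols[-1])))
    return regions

# Toy image: white (255) background with two dark "text lines".
img = np.full((10, 12), 255, dtype=np.uint8)
img[1:3, 2:9] = 0     # line 1
img[6:8, 1:11] = 0    # line 2
print(find_text_line_regions(img))   # → [(1, 2, 2, 8), (6, 7, 1, 10)]
```

The returned boxes are the minimum bounding rectangles mentioned above, tangent to the outermost ink pixels of each line. An adaptive or Otsu threshold would replace the fixed `thresh` in practice.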
However, thresholding is generally suitable only for images with simple backgrounds; text regions in images with complex backgrounds are difficult to locate accurately. Therefore, the embodiment of the present invention may instead determine the source language text line regions in the image using a convolutional neural network.
In an alternative embodiment of the invention, the source language text line region may be determined according to a convolutional neural network.
Specifically, a large number of images containing text content can be collected as sample data to train a convolutional network prediction model, and text line regions in images can then be identified by this model. For example, an initial convolutional network model is first constructed and initialized, including the number of convolutional layers, the number of upsampling layers, the convolution kernel sizes, the biases, and so on; the initial model can then be iteratively optimized with a gradient descent algorithm until it reaches a preset condition, at which point iteration stops and the last optimized model is used as the convolutional network prediction model.
In the embodiment of the present invention, the initial model may be a network model composed of 7 convolutional layers and 1 upsampling layer. Of course, in practical applications, the number of convolution layers and the number of upsampling layers may be set according to actual needs, which is not limited in the embodiment of the present invention.
Because the convolutional network prediction model is obtained by training according to a large amount of sample data, compared with a thresholding mode, the source language text line region determined by the convolutional neural network is more accurate under the condition of complex image background.
Since the text in a single source language text line region may not be a complete sentence, translating the text in each source language text line region separately can produce inaccurate semantics. Paragraphs, by contrast, are the most basic units of an article, and the content of a paragraph usually carries a relatively complete meaning. Therefore, the embodiment of the present invention further determines whether adjacent source language text line regions include text content of the same paragraph; if so, the adjacent source language text line regions may be merged, so that the text content in the resulting source language text segment region includes a complete paragraph and the merged paragraph includes complete sentences. Translating the source language text in the source language text segment region then yields a target language text whose semantics are expressed more accurately, improving the accuracy of translation.
One or more paragraphs may be included in the source language text segment region, and one paragraph may include one sentence or a plurality of sentences. It can be understood that the embodiment of the present invention does not limit the number of paragraphs in a source language text paragraph region, and the number of sentences in a paragraph.
Specifically, the embodiment of the present invention may determine whether adjacent source language text line regions include text contents of the same paragraph according to parameter information of the source language text line regions in the image. In an optional embodiment of the present invention, the determining that adjacent source language text line regions include text contents of the same paragraph specifically includes:
if the size difference of the adjacent source language text line regions is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line regions are the same, determining that the adjacent source language text line regions comprise the text content of the same paragraph; wherein the dimensions include: a height of the source language text line region, and/or a width of the source language text line region.
After determining the source language text line region in the image, the embodiment of the present invention may determine parameter information corresponding to each source language text line region, where the parameter information may at least include: the size (e.g., width, and/or height) of the source language text line regions, the line spacing (e.g., longitudinal distance) between the source language text line regions, the text direction.
In a specific application, the text content of the same paragraph usually has the same text format, such as the same font type, the same font size, the same text direction, and the like, and the text content of the same paragraph is usually located in a closer position area.
Therefore, if it is determined that the size difference of the adjacent source language text line regions is smaller than the preset difference value, the line spacing is smaller than the preset spacing value, and the text directions in the adjacent source language text line regions are the same, it can be determined that the adjacent source language text line regions include the text content of the same paragraph; wherein the dimensions may specifically include: a height of the source language text line region, and/or a width of the source language text line region.
It can be understood that the specific numerical values of the preset difference value and the preset distance value are not limited in the embodiment of the present invention, for example, the preset difference value and the preset distance value may be set to be smaller numerical values according to a conventional text typesetting experience.
In an example of an application of the present invention, referring to FIG. 5, a schematic diagram of another source language text line region of the present invention is shown. As shown in fig. 5, source language text line regions 501, 502, 503, 504, 505, 506, 507, and 508 are included.
In FIG. 5, the size difference between regions 503 and 504 is smaller than the preset difference value, their line spacing is smaller than the preset spacing value, and the text directions in 503 and 504 are the same, so it can be determined that 503 and 504 include text content of the same paragraph, and 503 and 504 can be merged. Similarly, 506, 507, and 508 may be merged. The merged result may be the following 5 source language text segment regions: paragraph region 1 (including 501), paragraph region 2 (including 502), paragraph region 3 (including 503 and 504), paragraph region 4 (including 505), and paragraph region 5 (including 506, 507, and 508).
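The merging rule above (near-equal size, small line spacing, identical text direction) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the `LineRegion` fields, the threshold values, and the greedy top-to-bottom grouping strategy are all assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LineRegion:
    top: int        # y coordinate of the region's top edge
    height: int
    width: int
    direction: str  # e.g. "horizontal" or "vertical"

def same_paragraph(a: LineRegion, b: LineRegion,
                   max_size_diff: int = 4, max_gap: int = 10) -> bool:
    """Pairwise test: size difference below a preset value, line spacing
    below a preset value, and the same text direction."""
    gap = b.top - (a.top + a.height)          # vertical distance between lines
    return (abs(a.height - b.height) < max_size_diff
            and 0 <= gap < max_gap
            and a.direction == b.direction)

def group_paragraphs(lines: List[LineRegion]) -> List[List[int]]:
    """Greedily extend the current paragraph while the pairwise condition
    holds against its last line; otherwise start a new paragraph."""
    groups: List[List[int]] = []
    for i, line in enumerate(lines):
        if groups and same_paragraph(lines[groups[-1][-1]], line):
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups

# A layout loosely mirroring FIG. 5 (0-based indices: 0↔501 ... 7↔508).
lines = [
    LineRegion(0,   20, 200, "horizontal"),   # 501: taller headline
    LineRegion(40,  10, 180, "horizontal"),   # 502: isolated line
    LineRegion(80,  10, 180, "horizontal"),   # 503 ┐ same paragraph
    LineRegion(95,  10, 175, "horizontal"),   # 504 ┘
    LineRegion(140, 10, 180, "horizontal"),   # 505
    LineRegion(180, 10, 180, "horizontal"),   # 506 ┐
    LineRegion(195, 10, 178, "horizontal"),   # 507 │ same paragraph
    LineRegion(210, 10, 180, "horizontal"),   # 508 ┘
]
print(group_paragraphs(lines))  # → [[0], [1], [2, 3], [4], [5, 6, 7]]
```

The output corresponds to the 5 paragraph regions of the FIG. 5 example: the headline and the isolated line each form their own region, while the closely spaced lines merge.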
It is to be understood that determining whether adjacent source language text line regions include text content of the same paragraph according to the size (e.g., width and/or height) of the source language text line regions, the line spacing (e.g., longitudinal distance) between them, and the text direction is only one application example of the present invention. The embodiment of the present invention does not limit the parameter information used in this determination; for example, the parameter information may further include: the coordinates of the top-left vertex, the coordinates of the top-right vertex, the angle (the included angle between the line connecting the top-left and top-right vertices and the x axis), the height ratio, the text color, and so on, of the source language text line region.
For example, when determining whether adjacent source language text line regions include text content of the same paragraph, the embodiment of the present invention may further determine whether an angle difference between the adjacent source language text line regions is smaller than a preset angle (for example, the preset angle is set to 3 degrees), and if the size difference between the adjacent source language text line regions is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the angle difference between the adjacent source language text line regions is smaller than 3 degrees, it may be determined that the adjacent source language text line regions include text content of the same paragraph.
For another example, when determining whether adjacent source language text line regions include text content of the same paragraph, the embodiment of the present invention may further determine whether colors of texts in the adjacent source language text line regions are the same, for example, if the size difference between the adjacent source language text line regions is smaller than the preset difference value, the line spacing is smaller than the preset spacing value, and directions and colors of texts in the adjacent source language text line regions are the same, it may be determined that the adjacent source language text line regions include text content of the same paragraph.
The embodiment of the present invention may also recognize the text in the source language text line regions, so as to determine whether adjacent source language text line regions include text content of the same paragraph according to the association between the texts in the adjacent regions, thereby further improving the accuracy of the judgment.
In an optional embodiment of the present invention, the determining that adjacent source language text line regions include text contents of the same paragraph specifically includes:
step S11, determining the end word of the text line in a first region and the start word of the text line in a second region; the first region and the second region are adjacent source language text line regions, the first region being located at a first position among the adjacent source language text line regions and the second region at a second position;
step S12, if it is determined that the end word and the start word satisfy the association condition, determining that the first area and the second area include text content of the same paragraph.
In practical application, text is laid out mainly in horizontal and vertical modes. For the horizontal layout, the first position may be the upper-line position among the adjacent source language text line regions, and the second position may be the lower-line position. For the vertical layout with text running from left to right, the first position may specifically be the left-line position among the adjacent source language text line regions, and the second position the right-line position. Similarly, for the vertical layout with text running from right to left, the first position may specifically be the right-line position, and the second position the left-line position. For convenience of description, the embodiments of the present invention are described taking the horizontal layout as an example.
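The layout-dependent choice of first and second positions, and the extraction of the boundary word pair fed to steps S11 and S12, can be sketched as below. The layout labels and function names are hypothetical, and whitespace tokenization stands in for real word segmentation (Chinese text would need a segmenter).

```python
def reading_order(layout):
    """Which geometric neighbor is the 'first' region for each layout
    described in the method (labels here are illustrative)."""
    return {
        "horizontal":  ("upper", "lower"),   # lines stacked top to bottom
        "vertical-lr": ("left",  "right"),   # columns read left to right
        "vertical-rl": ("right", "left"),    # columns read right to left
    }[layout]

def boundary_words(first_region_tokens, second_region_tokens):
    """End word of the first region and start word of the second region:
    the pair tested by the association condition."""
    return first_region_tokens[-1], second_region_tokens[0]

# The FIG. 1 example text, split across two lines:
upper = "China is leading the world in facial recognition algorithms".split()
lower = "with the best one able to identify 10 million people".split()
print(reading_order("horizontal"))    # → ('upper', 'lower')
print(boundary_words(upper, lower))   # → ('algorithms', 'with')
```

Only the ordering of the two regions changes per layout; the association test itself is identical in all three cases.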
Optionally, the embodiment of the present invention may provide the following three determination manners for determining whether the last word and the starting word satisfy the association condition.
Mode one
In an optional embodiment of the present invention, the determining that the end word and the start word satisfy an association condition specifically includes:
step S21, determining a first probability that the end word is a sentence-final word;
step S22, determining a second probability that the start word is a sentence-initial word;
step S23, determining a third probability that the start word appears given that the end word appears;
step S24, if the first probability is smaller than a first threshold, the second probability is smaller than a second threshold, and the third probability is greater than a third threshold, determining that the end word and the start word satisfy the association condition.
A language model is an abstract mathematical model of a language built from objective linguistic facts, establishing a mapping between the model and those facts; whether the association condition is satisfied between the end word and the start word in adjacent source language text line regions can be determined according to a language model. It should be noted that the embodiments of the present invention mainly take statistical language models as an example; non-statistical language models can be handled analogously.
Optionally, a statistical language model describes, in the form of a probability distribution, the probability that an arbitrary word sequence S belongs to a certain language; the word sequence S need not be syntactically complete, and the model can assign a probability value to any word sequence S. The corresponding calculation formula may be expressed as:

p(S) = p(w1, w2, w3, w4, w5, …, wn)
     = p(w1) p(w2|w1) p(w3|w1, w2) … p(wn|w1, w2, …, wn-1)    (1)

In formula (1), S includes n words, and wi denotes the i-th word in the word sequence. Optionally, training the language model amounts to estimating the model parameters P(wi|wi-n+1, …, wi-1), where P(wi|wi-n+1, …, wi-1) denotes the probability that the next word is wi given that the preceding n-1 words are wi-n+1, …, wi-1.
Following this concept, an existing statistical language model can process a preset corpus with statistical algorithms to give the probability of a word sequence or, given context data, to predict the most likely next word.
In an embodiment of the present invention, the statistical language model may specifically include: a context-free model, an N-gram model, a Hidden Markov Model (HMM), a Maximum Entropy Model, and a Recurrent Neural Network (RNN) model. The context-free model does not depend on context, whereas the N-gram, HMM, Maximum Entropy, and RNN models do, and the machine learning methods they use differ. The HMM, Maximum Entropy, and RNN models consider not only the relations among the preset corpora (i.e., the training texts) but also the sequential characteristics of the training texts, while the N-gram model need not consider relations between training texts, where N is a positive integer greater than or equal to 2.
Taking an N-gram language model as an example: after determining the source language text line regions in an image, the embodiment of the present invention may perform text recognition on the source language text line regions to determine the text in each region, segment the recognized text content into a corresponding word sequence, and then obtain the end word w1 of the text line in the first region (the upper line of a pair of adjacent regions) and the start word w2 of the text line in the second region (the lower line). According to the N-gram language model, it determines a first probability P(w1) that w1 is a sentence-final word, a second probability P(w2) that w2 is a sentence-initial word, and a third probability P(w2|w1) that the start word w2 appears given that the end word w1 appears. If the first probability P(w1) is smaller than the first threshold, the second probability P(w2) is smaller than the second threshold, and the third probability P(w2|w1) is greater than the third threshold, then the probability that w1 ends a sentence is small, the probability that w2 begins a sentence is small, and the probability that w1 and w2 occur in succession is large; it is therefore determined that the end word and the start word satisfy the association condition, that is, the first region and the second region include text content of the same paragraph, and the two regions may be merged.
It is to be understood that the specific values of the first threshold, the second threshold, and the third threshold are not limited in the embodiments of the present invention, for example, the first threshold and the second threshold may be the same value or different values, such as setting the first threshold and the second threshold to be 30% and setting the third threshold to be 80%.
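Mode one can be illustrated with a toy bigram model. Everything here is invented for the sketch: the bigram counts, the sentence-boundary markers `<s>`/`</s>`, and the threshold defaults (30%, 30%, 80% as in the text above); a real system would use a language model trained on a large corpus, typically with smoothing.

```python
from collections import Counter

# Toy bigram statistics from a corpus where every sentence is wrapped in
# <s> ... </s> markers; all counts are made up for illustration.
bigrams = Counter({
    ("<s>", "China"): 48, ("<s>", "everyone"): 2,
    ("mineral", "water"): 40, ("water", "surprise"): 12,
    ("surprise", "everyone"): 19, ("surprise", "</s>"): 1,
    ("everyone", "</s>"): 30,
})
unigrams = Counter()
for (w1, w2), c in bigrams.items():
    unigrams[w1] += c

def p(word2, word1):
    """Bigram probability P(word2 | word1), no smoothing."""
    return bigrams[(word1, word2)] / unigrams[word1] if unigrams[word1] else 0.0

def satisfies_association(end_word, start_word,
                          t1=0.30, t2=0.30, t3=0.80):
    """Mode one: P(end word is sentence-final) < t1,
    P(start word is sentence-initial) < t2, and P(start | end) > t3."""
    p_final   = p("</s>", end_word)        # first probability
    p_initial = p(start_word, "<s>")       # second probability
    p_joint   = p(start_word, end_word)    # third probability
    return p_final < t1 and p_initial < t2 and p_joint > t3

print(satisfies_association("surprise", "everyone"))  # → True
```

With these counts, "surprise" rarely ends a sentence (1/20), "everyone" rarely starts one (2/50), and "everyone" follows "surprise" 95% of the time, so the association condition holds and the two lines would be merged.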
In an application example of the present invention, it is assumed that a text region in a certain image is detected and recognized, and a first region located at an upper line in adjacent source language text line regions is obtained, and includes the following text contents: "10 open before the end of the year of ten high-speed rail new lines, 3 bottles of mineral water are surprised on the trial run car", and the second area that is located the downlinks includes following text content: "dumb people".
The text content in the first region is segmented to obtain the following word segmentation sequence: 10 / high-speed rail / new line / year end / front / open / trial run / vehicle on / 3 bottles / mineral water / surprise, so the end word w1 of the text line in the first region can be determined to be "surprise"; similarly, the start word w2 of the text line in the second region can be determined to be "stay". Assume that, according to the N-gram language model, the first probability P(w1) that the end word w1 "surprise" is a sentence-final word is 25%, the second probability P(w2) that the start word w2 "stay" is a sentence-initial word is 19%, and the third probability P(w2|w1) that the start word w2 appears given the end word w1 is 95%. Since the first probability P(w1) is less than the first threshold of 30%, the second probability P(w2) is less than the second threshold of 30%, and the third probability P(w2|w1) is greater than the third threshold of 80%, it can be determined that the first region and the second region include text content of the same paragraph. The two regions can therefore be merged, the source language text in the merged source language text segment region being the concatenation of the text contents of the first region and the second region; translating this merged text then improves the accuracy of the translation.
Mode Two
In an optional embodiment of the present invention, the determining that the end word and the start word satisfy an association condition specifically includes: if the part of speech of the end word matches a first preset part of speech, or if the part of speech of the start word matches a second preset part of speech, determining that the end word and the start word satisfy the association condition.
The part of speech is a characteristic of a word that serves as the basis for classifying words. Taking Chinese as an example, the words of modern Chinese can be divided into 14 parts of speech in two categories. One category is content words: nouns, verbs, adjectives, distinguishing words, pronouns, numerals, and measure words. The other category is function words: adverbs, prepositions, conjunctions, auxiliary words, modal particles, onomatopoeia, and interjections.
In practical applications, words with different parts of speech have different functions and occupy different positions in sentences. For example, a conjunction is a function word used to connect words with words, phrases with phrases, or sentences with sentences, and can express coordinate, successive, alternative, hypothetical, comparative, concessive, and other logical relationships. Thus, conjunctions typically occur in the middle of a sentence rather than at its end or beginning. Prepositions, articles, and determiners also typically do not appear at the end of a sentence. As another example, a modal particle is a function word that expresses mood; it is attached to the end of a word or sentence, or used at a pause within a sentence, to express various moods. Common Chinese modal particles include 吗, 呢, 吧, and 啊.
Based on the above characteristics of parts of speech, the embodiment of the present invention may identify the part of speech of the end word in the first region and of the start word in the second region; if it is determined that the part of speech of the end word matches the first preset part of speech, or that the part of speech of the start word matches the second preset part of speech, it may be determined that the end word and the start word satisfy the association condition.
Optionally, the first preset part of speech may include at least any one of the following: conjunctions, prepositions, articles, and determiners; the second preset part of speech may include at least any one of the following: conjunctions and modal particles.
Specifically, the embodiment of the present invention may label the parts of speech of the segmented words in a large number of collected sentences, train a part-of-speech recognition model on the labeled data, and use the trained model to recognize the part of speech of the end word in the first region and of the start word in the second region. The recognized parts of speech are then matched against the first preset part of speech and the second preset part of speech; if the part of speech of the end word matches the first preset part of speech, or the part of speech of the start word matches the second preset part of speech, it is determined that the end word and the start word satisfy the association condition.
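Once a part-of-speech model has supplied the tags, the test in this mode reduces to set membership. A sketch under the assumption that tags arrive as plain strings (the tag names and preset sets below are illustrative, following the optional presets named above):

```python
# Illustrative preset parts of speech: conjunctions, prepositions,
# articles, and determiners for the end word; conjunctions and modal
# particles for the start word.
FIRST_PRESET = {"conjunction", "preposition", "article", "determiner"}
SECOND_PRESET = {"conjunction", "modal_particle"}

def pos_association(end_word_pos, start_word_pos):
    """Association holds when the end word's part of speech matches the
    first preset, or the start word's matches the second preset."""
    return end_word_pos in FIRST_PRESET or start_word_pos in SECOND_PRESET
```

For example, a line ending in a preposition, or a line beginning with a modal particle, would be grouped with its neighbor.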
Mode Three
In an optional embodiment of the present invention, the determining that the end word and the start word satisfy an association condition specifically includes: if the format of the start word does not conform to the format of a sentence-initial word in the source language, determining that the end word and the start word satisfy the association condition.
The source language may further be identified to determine its language type, and it may then be judged whether the format of the start word conforms to the format of a sentence-initial word in that language. If it does not, the probability that the start word begins a sentence is low, and the start word and the end word of the previous line (the text line in the first region) may be determined to satisfy the association condition.
Taking English as the source language as an example, the first word of an English sentence usually begins with a capital letter. Therefore, if it is determined that the first letter of the start word is not a capital letter, the format of the start word does not conform to the format of a sentence-initial word in English; the start word and the end word of the previous line can then be determined to satisfy the association condition, and the second region can be merged with the text content of the previous line.
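For English, this format test can be sketched as a capitalization check (a simplification: it deliberately treats lines beginning with digits or punctuation as inconclusive rather than as continuations):

```python
def start_format_mismatch(start_word):
    """True when an English start word is NOT formatted like a
    sentence-initial word, i.e. it begins with a lower-case letter,
    suggesting the line continues the previous one."""
    first = start_word[:1]
    return first.isalpha() and first.islower()
```

So a second region beginning with "running" would be merged with the line above it, while one beginning with "The" would not.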
Mode Four

In an optional embodiment of the present invention, the determining that the end word and the start word satisfy an association condition specifically includes: if no punctuation mark follows the end word of the text line in the first region, determining that the end word and the start word satisfy the association condition.
In a specific application, a complete sentence usually ends with a punctuation mark. Therefore, if no punctuation mark follows the end word of the text line in the first region, the text content in the first region has not ended, and the probability that the end word is associated with the start word of the next line (the text line in the second region) is high; it may thus be determined that the end word and the start word satisfy the association condition.
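This punctuation test can be sketched as follows (the set of sentence-final marks is an assumption covering common English and Chinese punctuation; a production system would tailor it to the identified source language):

```python
SENTENCE_FINAL_MARKS = set(".!?;。！？；…")

def first_region_unfinished(first_region_text):
    """True when the text of the first region does not end with a
    sentence-final punctuation mark, i.e. the sentence runs on into
    the next line."""
    text = first_region_text.rstrip()
    return bool(text) and text[-1] not in SENTENCE_FINAL_MARKS
```

A line ending without such a mark is then merged with the line below it.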
It is understood that, in practical applications, the above schemes for determining whether adjacent source language text line regions include text content of the same paragraph may be used alone or in combination. For example, whether adjacent source language text line regions include text content of the same paragraph may be judged according to the parameter information corresponding to the source language text line regions; or it may be judged according to the association relationship between the end word in the first region and the start word in the second region. Alternatively, after determining the source language text line regions in the image, a preliminary paragraph grouping of adjacent regions may first be performed according to the parameter information corresponding to the source language text line regions; the text in the preliminarily grouped regions may then be recognized, and a further paragraph grouping may be performed according to the association relationship between the end word in the first region and the start word in the second region, to obtain more accurate source language text segment regions. Finally, the source language text in the source language text segment regions obtained after the second paragraph grouping is translated to obtain the target language text corresponding to the source language text.
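The combined two-stage grouping described above can be sketched as a single pass over vertically ordered line regions, with any of the modes discussed earlier plugged in as the stage-two test (the region data model and function names are assumptions, not part of the claimed method):

```python
def group_paragraphs(regions, layout_ok, association_ok):
    """Merge adjacent line regions into paragraph (segment) regions.

    regions        -- line regions in top-to-bottom order; each is a
                      dict with at least a 'text' field
    layout_ok      -- stage 1: parameter-information test (size, line
                      spacing, text direction) on two adjacent regions
    association_ok -- stage 2: end-word/start-word association test
    """
    paragraphs = []
    for region in regions:
        if (paragraphs
                and layout_ok(paragraphs[-1], region)
                and association_ok(paragraphs[-1], region)):
            # Same paragraph: append this line's text to the open region.
            paragraphs[-1] = {"text": paragraphs[-1]["text"] + region["text"]}
        else:
            paragraphs.append(dict(region))
    return paragraphs
```

With a trivial layout test and the punctuation heuristic as the association test, two lines of one sentence collapse into a single segment region while a fresh sentence starts a new one.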
It can be understood that the embodiment of the present invention does not limit when the text in the source language text line regions is recognized. For example, the source language text may be recognized after the source language text line regions in the image are determined and before adjacent regions are merged. Optionally, two convolutional neural networks may be trained: one to detect the source language text line regions in the image, and the other to perform text recognition on the detected regions to obtain the text content of each source language text line region. Alternatively, the source language text may be recognized after adjacent source language text line regions have been merged, i.e., on the merged source language text segment regions.
To sum up, after determining the source language text line regions in an image, the embodiment of the present invention further judges whether adjacent source language text line regions include text content of the same paragraph; if so, the adjacent regions are merged to obtain a source language text segment region, and the source language text in that region is translated to obtain the corresponding target language text. Because the text content in the merged source language text segment region comprises complete paragraphs, and those paragraphs comprise complete sentences, translating the source language text in the merged region yields a target language text whose semantics are expressed more accurately, thereby improving translation accuracy.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Device embodiment
Referring to fig. 6, a block diagram of a translation apparatus according to an embodiment of the present invention is shown, where the apparatus may specifically include:
a determining module 601, configured to determine a source language text line region in an image;
a merging module 602, configured to merge adjacent source language text line regions to obtain a source language text segment region if it is determined that the adjacent source language text line regions include text content of the same paragraph;
the translation module 603 is configured to translate the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
Optionally, the merging module 602 may specifically include:
the first determining submodule is used for determining that the adjacent source language text line regions comprise text contents of the same paragraph if the size difference of the adjacent source language text line regions is smaller than a preset difference value, the line spacing is smaller than a preset spacing value and the text directions in the adjacent source language text line regions are the same; wherein the dimensions include: a height of the source language text line region, and/or a width of the source language text line region.
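The layout test performed by this first determining submodule can be sketched as follows (the region fields and threshold values are assumptions; a real detector would supply pixel coordinates for each line region):

```python
def same_paragraph_by_layout(a, b, max_size_diff=4, max_line_gap=12):
    """True when two vertically adjacent line regions look like one
    paragraph: similar height, small line spacing, same text direction.

    a, b -- dicts with 'height', 'top', 'bottom', 'direction' fields
            (a is the upper region, b the lower one).
    """
    size_diff = abs(a["height"] - b["height"])
    line_gap = b["top"] - a["bottom"]
    return (size_diff < max_size_diff
            and line_gap < max_line_gap
            and a["direction"] == b["direction"])
```

The width of the regions could be compared in the same way as the height, per the "and/or" in the claim.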
Optionally, the merging module 602 may specifically include:
the second determining submodule is used for determining the last word of the text line in the first area and determining the start word of the text line in the second area; the first region and the second region are adjacent source language text line regions, the first region is located at a first position in the adjacent source language text line region, and the second region is located at a second position in the adjacent source language text line region;
and the third determining submodule is used for determining that the first area and the second area comprise the text content of the same paragraph if the last word and the start word meet the association condition.
Optionally, the third determining sub-module may specifically include:
a first determining unit, configured to determine a first probability that the end word is a sentence-final word;
a second determining unit, configured to determine a second probability that the start word is a sentence-initial word;
a third determining unit, configured to determine a third probability that the starting word appears if the last word appears;
a fourth determining unit, configured to determine that an association condition is satisfied between the end word and the start word if the first probability is smaller than a first threshold, the second probability is smaller than a second threshold, and the third probability is greater than a third threshold.
Optionally, the third determining sub-module may specifically include:
a fifth determining unit, configured to determine that the end word and the start word satisfy an association condition if the part of speech of the end word matches a first preset part of speech, or if the part of speech of the start word matches a second preset part of speech.
Optionally, the third determining sub-module may specifically include:
a sixth determining unit, configured to determine that the last word and the starting word satisfy an association condition if the format of the starting word does not match the format of the sentence start word corresponding to the source language.
Optionally, the source language text line region is determined according to a convolutional neural network.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present invention provides an apparatus for translation, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: determining a source language text line region in an image; if it is determined that adjacent source language text line regions include text content of the same paragraph, merging the adjacent source language text line regions to obtain a source language text segment region; and translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
Fig. 7 is a block diagram illustrating an apparatus 800 for translation, according to an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 7, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice information processing mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800, the relative positioning of the components, such as a display and keypad of the apparatus 800, the sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 8 is a schematic diagram of a server in some embodiments of the invention. The server 1900, which may vary widely in configuration or performance, may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and so forth.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the translation method shown in fig. 1.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a translation method, the method comprising: determining a source language text line region in an image; if it is determined that adjacent source language text line regions include text content of the same paragraph, merging the adjacent source language text line regions to obtain a source language text segment region; and translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
The embodiment of the invention discloses A1 and a translation method, which comprises the following steps:
determining a source language text line region in an image;
if it is determined that adjacent source language text line regions include text content of the same paragraph, merging the adjacent source language text line regions to obtain a source language text segment region;
and translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
A2, the method according to A1, wherein the determining that the adjacent source language text line regions include text content of the same paragraph includes:
if the size difference of the adjacent source language text line regions is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line regions are the same, determining that the adjacent source language text line regions comprise the text content of the same paragraph; wherein the dimensions include: a height of the source language text line region, and/or a width of the source language text line region.
A3, the method according to A1, wherein the determining that the adjacent source language text line regions include text content of the same paragraph includes:
determining the last word of the text line in the first area and determining the initial word of the text line in the second area; the first region and the second region are adjacent source language text line regions, the first region is located at a first position in the adjacent source language text line region, and the second region is located at a second position in the adjacent source language text line region;
and if it is determined that the end word and the start word satisfy the association condition, determining that the first region and the second region include text content of the same paragraph.
A4, according to the method of A3, the determining that the end word and the start word satisfy an association condition includes:
determining a first probability that the end word is a sentence-final word;
determining a second probability that the start word is a sentence-initial word;
determining a third probability that the start word appears given that the end word appears;
and if the first probability is smaller than a first threshold, the second probability is smaller than a second threshold, and the third probability is larger than a third threshold, determining that a correlation condition is met between the end word and the starting word.
A5, according to the method of A3, the determining that the end word and the start word satisfy an association condition includes:
and if the part of speech of the end word matches a first preset part of speech, or if the part of speech of the start word matches a second preset part of speech, determining that the end word and the start word satisfy the association condition.
A6, according to the method of A3, the determining that the end word and the start word satisfy an association condition includes:
and if the format of the start word does not conform to the format of a sentence-initial word in the source language, determining that the end word and the start word satisfy the association condition.
The embodiment of the invention discloses B7 and a translation device, wherein the translation device comprises:
the determining module is used for determining a source language text line region in the image;
the merging module is used for merging the adjacent source language text line regions to obtain a source language text segment region if the adjacent source language text line regions include the text content of the same paragraph;
and the translation module is used for translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
B8, the apparatus of B7, the merge module comprising:
the first determining submodule is used for determining that the adjacent source language text line regions comprise text contents of the same paragraph if the size difference of the adjacent source language text line regions is smaller than a preset difference value, the line spacing is smaller than a preset spacing value and the text directions in the adjacent source language text line regions are the same; wherein the dimensions include: a height of the source language text line region, and/or a width of the source language text line region.
B9, the apparatus of B7, the merge module comprising:
the second determining submodule is used for determining the last word of the text line in the first area and determining the start word of the text line in the second area; the first region and the second region are adjacent source language text line regions, the first region is located at a first position in the adjacent source language text line region, and the second region is located at a second position in the adjacent source language text line region;
and the third determining submodule is used for determining that the first area and the second area comprise the text content of the same paragraph if the last word and the start word meet the association condition.
B10, the apparatus of B9, the third determining submodule comprising:
a first determining unit, configured to determine a first probability that the end word is a sentence-final word;
a second determining unit, configured to determine a second probability that the start word is a sentence-initial word;
a third determining unit, configured to determine a third probability that the starting word appears if the last word appears;
a fourth determining unit, configured to determine that an association condition is satisfied between the end word and the start word if the first probability is smaller than a first threshold, the second probability is smaller than a second threshold, and the third probability is greater than a third threshold.
B11, the apparatus of B9, the third determining submodule comprising:
a fifth determining unit, configured to determine that the end word and the start word satisfy an association condition if the part of speech of the end word matches a first preset part of speech, or if the part of speech of the start word matches a second preset part of speech.
B12, the apparatus of B9, the third determining submodule comprising:
a sixth determining unit, configured to determine that the last word and the starting word satisfy an association condition if the format of the starting word does not match the format of the sentence start word corresponding to the source language.
The embodiment of the invention discloses C13, a device for translation, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs are configured to be executed by one or more processors and comprise instructions for:
determining a source language text line region in an image;
if it is determined that adjacent source language text line regions comprise text content of the same paragraph, merging the adjacent source language text line regions to obtain a source language text paragraph region;
and translating the source language text in the source language text paragraph region to obtain a target language text corresponding to the source language text.
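Taken together, the three instructions amount to a small pipeline: detect line texts, merge adjacent lines that pass a same-paragraph test, and translate each merged region as a whole. The sketch below is a hypothetical illustration that works on already-recognized line texts; `same_paragraph` and `translate` stand in for the merge test and the machine-translation engine:

```python
def merge_lines(lines, same_paragraph):
    """Merge adjacent OCR line texts into paragraph texts."""
    paragraphs = []
    for line in lines:
        if paragraphs and same_paragraph(paragraphs[-1], line):
            paragraphs[-1] = paragraphs[-1] + " " + line   # same paragraph: join
        else:
            paragraphs.append(line)                        # new paragraph
    return paragraphs

def translate_image_text(lines, same_paragraph, translate):
    # Translating whole paragraphs gives the engine complete sentences,
    # which is where the claimed accuracy gain over line-by-line
    # translation comes from.
    return [translate(p) for p in merge_lines(lines, same_paragraph)]
```

With a toy merge test that treats a lowercase starting word as a continuation, `["The quick brown", "fox jumps.", "New paragraph."]` merges into two paragraphs before translation.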
C14, the apparatus according to C13, wherein determining that the adjacent source language text line regions comprise text content of the same paragraph comprises:
if the size difference between the adjacent source language text line regions is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line regions are the same, determining that the adjacent source language text line regions comprise text content of the same paragraph; wherein the size comprises: a height of the source language text line region and/or a width of the source language text line region.
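This geometric test can be sketched as follows; the `Region` fields and the concrete threshold values are illustrative assumptions, since the patent only names "preset difference value" and "preset spacing value":

```python
from dataclasses import dataclass

@dataclass
class Region:
    """Hypothetical bounding box for a detected text line."""
    y: float         # top edge
    height: float
    width: float
    direction: int   # e.g. 0 = horizontal text, 1 = vertical text

def same_paragraph_geom(a: Region, b: Region,
                        max_size_diff: float = 4.0,
                        max_gap: float = 10.0) -> bool:
    """Merge test: sizes close, line spacing small, same text direction."""
    size_ok = (abs(a.height - b.height) < max_size_diff and
               abs(a.width - b.width) < max_size_diff)
    gap_ok = (b.y - (a.y + a.height)) < max_gap   # vertical line spacing
    return size_ok and gap_ok and a.direction == b.direction
```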
C15, the apparatus according to C13, wherein determining that the adjacent source language text line regions comprise text content of the same paragraph comprises:
determining the last word of the text line in a first region and determining the starting word of the text line in a second region; the first region and the second region are adjacent source language text line regions, the first region is located at a first position among the adjacent source language text line regions, and the second region is located at a second position among the adjacent source language text line regions;
and if it is determined that the last word and the starting word satisfy an association condition, determining that the first region and the second region comprise text content of the same paragraph.
C16, the apparatus of C15, wherein determining that the last word and the starting word satisfy the association condition comprises:
determining a first probability that the last word is a sentence-final word;
determining a second probability that the starting word is a sentence-initial word;
determining a third probability that the starting word appears given that the last word appears;
and if the first probability is smaller than a first threshold, the second probability is smaller than a second threshold, and the third probability is greater than a third threshold, determining that the last word and the starting word satisfy the association condition.
C17, the apparatus of C15, wherein determining that the last word and the starting word satisfy the association condition comprises:
if the part of speech of the last word matches a first preset part of speech, or if the part of speech of the starting word matches a second preset part of speech, determining that the last word and the starting word satisfy the association condition.
C18, the apparatus of C15, wherein determining that the last word and the starting word satisfy the association condition comprises:
if the format of the starting word does not conform to the format of a sentence-initial word in the source language, determining that the last word and the starting word satisfy the association condition.
An embodiment of the invention discloses D19, a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform a translation method as described in one or more of A1-A6.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art. The specification and examples are to be considered exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.
The translation method, the translation apparatus, and the apparatus for translation have been described in detail above. Specific examples are used herein to explain the principles and implementation of the present invention, and the description of these examples is intended only to aid understanding of the method and its core idea. A person skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application; in summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method of translation, the method comprising:
determining a source language text line region in an image;
if the adjacent source language text line regions are determined to comprise text content of the same paragraph, merging the adjacent source language text line regions to obtain a source language text paragraph region;
and translating the source language text in the source language text paragraph region to obtain a target language text corresponding to the source language text.
2. The method of claim 1, wherein determining that adjacent source language text line regions comprise text content of the same paragraph comprises:
if the size difference between the adjacent source language text line regions is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line regions are the same, determining that the adjacent source language text line regions comprise text content of the same paragraph; wherein the size comprises: a height of the source language text line region and/or a width of the source language text line region.
3. The method of claim 1, wherein determining that adjacent source language text line regions comprise text content of the same paragraph comprises:
determining the last word of the text line in a first region and determining the starting word of the text line in a second region; the first region and the second region are adjacent source language text line regions, the first region is located at a first position among the adjacent source language text line regions, and the second region is located at a second position among the adjacent source language text line regions;
and if it is determined that the last word and the starting word satisfy an association condition, determining that the first region and the second region comprise text content of the same paragraph.
4. The method of claim 3, wherein determining that the last word and the starting word satisfy the association condition comprises:
determining a first probability that the last word is a sentence-final word;
determining a second probability that the starting word is a sentence-initial word;
determining a third probability that the starting word appears given that the last word appears;
and if the first probability is smaller than a first threshold, the second probability is smaller than a second threshold, and the third probability is greater than a third threshold, determining that the last word and the starting word satisfy the association condition.
5. The method of claim 3, wherein determining that the last word and the starting word satisfy the association condition comprises:
if the part of speech of the last word matches a first preset part of speech, or if the part of speech of the starting word matches a second preset part of speech, determining that the last word and the starting word satisfy the association condition.
6. The method of claim 3, wherein determining that the last word and the starting word satisfy the association condition comprises:
if the format of the starting word does not conform to the format of a sentence-initial word in the source language, determining that the last word and the starting word satisfy the association condition.
7. A translation apparatus, the apparatus comprising:
a determining module, configured to determine a source language text line region in an image;
a merging module, configured to merge adjacent source language text line regions to obtain a source language text paragraph region if it is determined that the adjacent source language text line regions comprise text content of the same paragraph;
and a translation module, configured to translate the source language text in the source language text paragraph region to obtain a target language text corresponding to the source language text.
8. The apparatus of claim 7, wherein the merging module comprises:
a first determining submodule, configured to determine that the adjacent source language text line regions comprise text content of the same paragraph if the size difference between the adjacent source language text line regions is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line regions are the same; wherein the size comprises: a height of the source language text line region and/or a width of the source language text line region.
9. An apparatus for translation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
determining a source language text line region in an image;
if the adjacent source language text line regions are determined to comprise text content of the same paragraph, merging the adjacent source language text line regions to obtain a source language text paragraph region;
and translating the source language text in the source language text paragraph region to obtain a target language text corresponding to the source language text.
10. A machine-readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform a translation method as recited in one or more of claims 1-6.
CN201910100754.7A 2019-01-31 2019-01-31 Translation method and device for translation Active CN111507112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910100754.7A CN111507112B (en) 2019-01-31 2019-01-31 Translation method and device for translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910100754.7A CN111507112B (en) 2019-01-31 2019-01-31 Translation method and device for translation

Publications (2)

Publication Number Publication Date
CN111507112A true CN111507112A (en) 2020-08-07
CN111507112B CN111507112B (en) 2024-02-02

Family

ID=71877325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910100754.7A Active CN111507112B (en) 2019-01-31 2019-01-31 Translation method and device for translation

Country Status (1)

Country Link
CN (1) CN111507112B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070076A (en) * 2020-11-13 2020-12-11 深圳壹账通智能科技有限公司 Text paragraph structure reduction method, device, equipment and computer storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008257737A (en) * 2008-04-14 2008-10-23 Toshiba Corp Machine translation device and machine translation program
US20090030671A1 (en) * 2007-07-27 2009-01-29 Electronics And Telecommunications Research Institute Machine translation method for PDF file
CN101620680A (en) * 2008-07-03 2010-01-06 三星电子株式会社 Recognition and translation method of character image and device
CN102866991A (en) * 2007-03-22 2013-01-09 Sony Ericsson Mobile Communications AB Translation and display of text in picture
CN103678280A (en) * 2013-12-30 2014-03-26 武汉传神信息技术有限公司 Translation task fragmentization method
CN104750678A (en) * 2015-04-19 2015-07-01 王学庆 Image text recognizing translation glasses and method
US20160026899A1 (en) * 2014-07-22 2016-01-28 Adobe Systems Incorporated Text line detection in images
CN106649288A (en) * 2016-12-12 2017-05-10 北京百度网讯科技有限公司 Translation method and device based on artificial intelligence
CN107168958A (en) * 2017-05-15 2017-09-15 北京搜狗科技发展有限公司 A kind of interpretation method and device
CN108681393A (en) * 2018-04-16 2018-10-19 优视科技有限公司 Translation display methods, device, computing device and medium based on augmented reality
CN109165389A (en) * 2018-07-23 2019-01-08 北京搜狗科技发展有限公司 A kind of data processing method, device and the device for data processing


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070076A (en) * 2020-11-13 2020-12-11 深圳壹账通智能科技有限公司 Text paragraph structure reduction method, device, equipment and computer storage medium
CN112070076B (en) * 2020-11-13 2021-04-06 深圳壹账通智能科技有限公司 Text paragraph structure reduction method, device, equipment and computer storage medium
WO2022100376A1 (en) * 2020-11-13 2022-05-19 深圳壹账通智能科技有限公司 Text paragraph structure restoration method and apparatus, and device and computer storage medium

Also Published As

Publication number Publication date
CN111507112B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
CN107632980B (en) Voice translation method and device for voice translation
US9093072B2 (en) Speech and gesture recognition enhancement
CN107291704B (en) Processing method and device for processing
US11856277B2 (en) Method and apparatus for processing video, electronic device, medium and product
CN107274903B (en) Text processing method and device for text processing
CN108304412B (en) Cross-language search method and device for cross-language search
US11640503B2 (en) Input method, input device and apparatus for input
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN108628819B (en) Processing method and device for processing
CN111108468A (en) System and method for determining input characters based on sliding input
CN107424612B (en) Processing method, apparatus and machine-readable medium
US11514893B2 (en) Voice context-aware content manipulation
CN114154459A (en) Speech recognition text processing method and device, electronic equipment and storage medium
CN111754414B (en) Image processing method and device for image processing
CN111192586A (en) Voice recognition method and device, electronic equipment and storage medium
CN105683891B (en) Inputting tone and note symbols by gestures
CN111507112B (en) Translation method and device for translation
CN116912478A (en) Object detection model construction, image classification method and electronic equipment
CN109979435B (en) Data processing method and device for data processing
US11501762B2 (en) Compounding corrective actions and learning in mixed mode dictation
Chen Analysis of intelligent translation systems and evaluation systems for business English
CN110728137B (en) Method and device for word segmentation
CN114067781A (en) Method, apparatus and medium for detecting speech recognition result
CN113343720A (en) Subtitle translation method and device for subtitle translation
CN112199963A (en) Text processing method and device and text processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220728

Address after: 100084, Room 9, floor 01, cyber building, building 9, building 1, Zhongguancun East Road, Haidian District, Beijing

Applicant after: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 310018 room 1501, building 17, No.57, kejiyuan Road, Baiyang street, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Applicant before: SOGOU (HANGZHOU) INTELLIGENT TECHNOLOGY Co.,Ltd.

Applicant before: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant
GR01 Patent grant