Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort shall fall within the scope of the invention.
Method embodiment
Referring to FIG. 3, a flowchart illustrating the steps of an embodiment of a translation method of the present invention is shown; the method may specifically include the following steps:
step 301, determining source language text line regions in an image;
step 302, if it is determined that adjacent source language text line regions include text content of the same paragraph, merging the adjacent source language text line regions to obtain a source language text segment region;
step 303, translating the source language text in the source language text segment region to obtain a target language text corresponding to the source language text.
The embodiment of the invention can be applied to a translation scenario, in which a translation client can translate the source language text in an image into a target language text according to the source and target language types set by the user. It may be appreciated that the embodiments of the present invention do not limit the kinds of source language and target language; for example, the source language may be Chinese and the target language English, or the source language may be English and the target language Japanese, etc.
The embodiment of the invention does not limit the form of the translation client, for example, the translation client can be a translation APP (Application), and a user can download, install and use the APP in a terminal; alternatively, the translation client may be a web page online tool, the user may open a web page, use an online translation client in the web page, and so on.
The translation client may run on a terminal that specifically includes, but is not limited to: smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, car computers, desktop computers, set-top boxes, smart televisions, wearable devices, and the like.
In the embodiment of the present invention, the image to be translated may be any type of image, such as a commodity image, a detail image, a book cover image, an advertisement image, etc., and the format of the image includes, but is not limited to, JPG (Joint Photographic Experts Group), PNG (Portable Network Graphics), TIFF (Tagged Image File Format), BMP (Bitmap), etc. It can be understood that the embodiment of the present invention does not limit the method of acquiring the image: it may be downloaded from a web page, or acquired through a terminal device, for example, by taking a picture with a mobile phone or camera to obtain the image to be translated.
For the image to be translated, the embodiment of the invention first determines the source language text line regions in the image. A source language text line region is formed by a plurality of characters arranged along the same direction, and each source language text line region may include one line of characters. Referring to FIG. 4, a schematic diagram of source language text line regions of the present invention is shown. As shown in FIG. 4, each rectangular box may represent an identified source language text line region; it can be seen that FIG. 4 includes 5 source language text line regions.
It can be understood that the embodiment of the present invention does not limit the direction of the source language text line regions; depending on the typesetting of the characters in the image, the direction may be any direction, such as horizontal or vertical.
The embodiment of the invention does not limit the specific manner of determining the source language text line regions in the image. For example, the image may be thresholded to determine the source language text line regions. Specifically, different thresholding methods may be selected according to the actual situation, such as a fixed-threshold method, an adaptive-threshold method, Otsu's method, an iterative method, and the like.
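As an illustrative sketch of one of these options (not part of the claimed method), Otsu's method selects the threshold that maximizes the between-class variance over the grayscale histogram; the following pure-Python function, with hypothetical names, shows the computation:

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold maximizing between-class variance (Otsu)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg = 0          # background pixel count so far
    sum_bg = 0.0      # background intensity sum so far
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Pixels above the returned threshold would then be treated as candidate text (or background, depending on polarity), after which connected components can be grouped into line regions.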
It will be appreciated that the size of the source language text line region is not limited in this embodiment, for example, the source language text line region may be a minimum bounding rectangle containing the source language text line, that is, four sides of the minimum bounding rectangle are tangent to the uppermost end, the lowermost end, the leftmost end, and the rightmost end of the text in the source language text line, respectively.
However, the thresholding method is generally suitable for images with simple backgrounds; for images with complex backgrounds, it is difficult to accurately locate the text regions. Therefore, the embodiment of the invention may use a convolutional neural network to determine the source language text line regions in the image.
In an alternative embodiment of the present invention, the source language text line regions may be determined by a convolutional neural network.
Specifically, a large number of images containing text content may be collected as sample data and used to train a convolutional network prediction model, by which text line regions in images can be identified. For example, an initial convolutional network model is first constructed and initialized, including the number of convolutional layers, the number of upsampling layers, the convolution kernel sizes, the biases, etc.; then the initial model is iteratively optimized using a gradient descent algorithm until the optimized model meets a preset condition, at which point iteration stops and the last optimized model is taken as the convolutional network prediction model.
In the embodiment of the invention, the initial model may be a network model composed of 7 convolution layers and 1 upsampling layer. Of course, in practical application, the number of layers of the convolution layer and the number of layers of the upsampling layer may be set according to actual needs, which is not limited by the embodiment of the present invention.
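In practice such a model is trained with a deep learning framework; purely as an illustrative sketch of the two building blocks named above (a valid 2-D convolution and 2x nearest-neighbor upsampling, with hypothetical names and sizes), the operations can be written as:

```python
def conv2d(img, kernel):
    """Valid 2-D convolution (no padding) over a list-of-lists image."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += img[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out

def upsample2x(img):
    """Nearest-neighbor upsampling by a factor of 2 in both dimensions."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out
```

A real detection network would stack several such convolutions (with learned kernels and nonlinearities) and use the upsampling layer to restore the prediction map to image resolution.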
Because the convolutional network prediction model is trained on a large amount of sample data, the source language text line regions determined by the convolutional neural network are more accurate than those obtained by thresholding when the image background is complex.
Since the text in a single source language text line region may not form a complete sentence, translating each source language text line region separately may produce inaccurate semantic expressions. A paragraph, by contrast, is the most basic unit of an article, and its content generally has a relatively complete meaning. Therefore, the embodiment of the invention further determines whether adjacent source language text line regions include text content of the same paragraph; if so, the adjacent source language text line regions may be merged so that the merged region contains a complete paragraph, and thus complete sentences. Translating the source language text in the merged region then yields a target language text with more accurate semantics, improving translation accuracy.
The source language text segment region may include one or more paragraphs, and a paragraph may include one or more sentences. It will be appreciated that the embodiments of the present invention limit neither the number of paragraphs in a source language text segment region nor the number of sentences in a paragraph.
Specifically, the embodiment of the invention can judge whether the text content of the same paragraph is included in the adjacent text line area of the source language according to the parameter information of the text line area of the source language in the image. In an optional embodiment of the present invention, the determining that the text content of the same paragraph is included in the adjacent text line area of the source language may specifically include:
if the size difference of the adjacent source language text line areas is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line areas are the same, determining that the adjacent source language text line areas comprise text contents of the same paragraph; wherein the dimensions include: the height of the source language text line area, and/or the width of the source language text line area.
After determining the text line regions in the source language in the image, the embodiment of the invention may determine the parameter information corresponding to each text line region in the source language, where the parameter information may at least include: the size (e.g., width, and/or height) of the text line regions in the source language, the line spacing (e.g., longitudinal distance) between the text line regions in the source language, and the text direction.
In a particular application, the text content of the same paragraph is typically in the same text format, such as the same font type, the same font size, the same text direction, etc., and the text content of the same paragraph is typically located in a closer location area.
Therefore, if it is determined that the size difference of the adjacent source language text line areas is smaller than the preset difference value, the line spacing is smaller than the preset spacing value, and the text directions in the adjacent source language text line areas are the same, it is determined that the text contents of the same paragraph are included in the adjacent source language text line areas; wherein, the dimension specifically may include: the height of the source language text line area, and/or the width of the source language text line area.
It can be appreciated that the specific values of the preset difference value and the preset interval value are not limited in the embodiment of the present invention, and for example, the preset difference value and the preset interval value may be set to smaller values according to conventional text typesetting experience.
In an example application of the present invention, referring to FIG. 5, a schematic diagram of another source language text line region of the present invention is shown. As shown in fig. 5, source language text line regions 501, 502, 503, 504, 505, 506, 507, and 508 are included.
The size difference between 503 and 504 is small (smaller than the preset difference value), the line spacing between 503 and 504 is small (smaller than the preset spacing value), and the text directions in 503 and 504 are the same, so it can be determined that 503 and 504 include text content of the same paragraph, and 503 and 504 can therefore be merged. Similarly, 506, 507, and 508 may be merged, yielding the following 5 source language text segment regions: segment region 1 (including 501), segment region 2 (including 502), segment region 3 (including 503 and 504), segment region 4 (including 505), and segment region 5 (including 506, 507, and 508).
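The grouping illustrated with FIG. 5 can be sketched as follows; the coordinates, thresholds, and names are hypothetical, and only the height difference, line spacing, and direction are checked, consistent with the "and/or" size criterion above:

```python
from dataclasses import dataclass

@dataclass
class LineRegion:
    top: int        # y coordinate of the upper edge
    height: int
    width: int
    direction: str  # 'horizontal' or 'vertical'

# Hypothetical "preset difference value" and "preset spacing value"
MAX_HEIGHT_DIFF = 5
MAX_LINE_GAP = 12

def group_paragraphs(lines):
    """Greedily merge vertically adjacent lines that look like one paragraph."""
    groups = [[lines[0]]]
    for cur in lines[1:]:
        prev = groups[-1][-1]
        gap = cur.top - (prev.top + prev.height)  # line spacing
        if (abs(cur.height - prev.height) < MAX_HEIGHT_DIFF
                and gap < MAX_LINE_GAP
                and cur.direction == prev.direction):
            groups[-1].append(cur)   # same paragraph: extend current group
        else:
            groups.append([cur])     # start a new paragraph region
    return groups
```

With eight lines laid out like regions 501-508, the function yields five groups, matching the five segment regions described above.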
It will be appreciated that determining whether adjacent source language text line regions include text content of the same paragraph based on the size (e.g., width and/or height) of the regions, the line spacing (e.g., longitudinal distance) between them, and the text direction is merely one application example of the present invention. The embodiment of the present invention does not limit the parameter information used in this determination; for example, the parameter information may further include: the upper left vertex coordinates, upper right vertex coordinates, angle (the included angle between the line connecting the upper left and upper right vertices and the x-axis), height ratio, text color, etc. of the source language text line region.
For example, when determining whether the text content of the same paragraph is included in the adjacent text line area of the source language, the embodiment of the present invention may further determine whether the angle difference between the adjacent text line areas of the source language is smaller than a preset angle (for example, the preset angle is set to 3 degrees), if the size difference of the adjacent source language text line areas is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the angle difference of the adjacent source language text line areas is smaller than 3 degrees, determining that the adjacent source language text line areas comprise text content of the same paragraph.
For another example, when determining whether the text content of the same paragraph is included in the adjacent source language text line region, the embodiment of the present invention may further determine whether the text color in the adjacent source language text line region is the same, for example, if the size difference of the adjacent source language text line region is smaller than a preset difference value, and the line spacing is smaller than a preset spacing value, and the direction and color of the text in the adjacent source language text line region are the same, the text content of the same paragraph is included in the adjacent source language text line region may be determined.
The embodiment of the invention may also recognize the text in the source language text line regions, so as to determine, according to the association relationship between the texts in adjacent source language text line regions, whether the adjacent regions include text content of the same paragraph, thereby further improving the accuracy of the determination.
In an optional embodiment of the present invention, the determining that the text content of the same paragraph is included in the adjacent text line area of the source language may specifically include:
step S11, determining the end word of the text line in a first region and the start word of the text line in a second region; the first region and the second region are adjacent source language text line regions, the first region is located at a first position among the adjacent source language text line regions, and the second region is located at a second position among the adjacent source language text line regions;
step S12, if it is determined that the end word and the start word satisfy an association condition, determining that the first region and the second region include text content of the same paragraph.
In practical applications, text is typeset mainly in two ways, horizontal and vertical. For horizontal typesetting, the first position may specifically be the upper line among the adjacent source language text line regions, and the second position may specifically be the lower line. For vertical typesetting, if the text runs from left to right, the first position may specifically be the left column among the adjacent source language text line regions, and the second position the right column; similarly, if the text runs from right to left, the first position may be the right column and the second position the left column. For convenience of description, the embodiment of the invention is illustrated with horizontal typesetting as an example.
Optionally, the embodiment of the present invention may provide the following three judging manners for determining whether the association condition is satisfied between the end word and the start word.
Mode one
In an optional embodiment of the present invention, the determining that the end word and the start word meet the association condition may specifically include:
step S21, determining a first probability that the end word is a sentence-end word;
step S22, determining a second probability that the start word is a sentence-head word;
step S23, determining a third probability that the start word occurs given that the end word occurs;
step S24, if the first probability is smaller than a first threshold, the second probability is smaller than a second threshold, and the third probability is greater than a third threshold, determining that the association condition is satisfied between the end word and the start word.
A language model is an abstract mathematical model of a language built from objective linguistic facts, establishing a mapping between the model and those facts. It should be noted that the embodiment of the present invention is mainly described using a statistical language model as an example; non-statistical language models can be treated by analogy.
Optionally, a statistical language model describes, in the form of a probability distribution, the probability that an arbitrary word sequence S belongs to a certain language set; the word sequence S is not required to be grammatically complete, and the statistical language model can assign a probability to any word sequence S. The corresponding calculation formula may be expressed as:
p(S) = p(w_1, w_2, w_3, …, w_n) = p(w_1) p(w_2 | w_1) p(w_3 | w_1, w_2) … p(w_n | w_1, w_2, …, w_{n-1})    (1)
In formula (1), S includes n words, and w_i represents the i-th word in the word sequence. Optionally, training a language model is the process of estimating the model parameters P(w_i | w_{i-n+1}, …, w_{i-1}), where P(w_i | w_{i-n+1}, …, w_{i-1}) represents the probability that the next word is w_i given that the preceding n-1 words are w_{i-n+1}, …, w_{i-1}.
Based on this concept, an existing statistical language model may process a preset corpus with a statistical algorithm to give the probability of a word sequence or, given context data, predict the most likely next word.
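As an illustration of how such conditional probabilities can be estimated from a corpus (the corpus and function names here are hypothetical, and a real system would use a far larger corpus with smoothing), a toy bigram model may be sketched as:

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams

def cond_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
```

For example, trained on the two sentences "the cat sat" and "the cat ran", the model gives P(cat | the) = 1.0 and P(sat | cat) = 0.5.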
In an embodiment of the present invention, the statistical language model may specifically include: a context-free model, an N-gram model, a hidden Markov model (HMM), a maximum entropy model, and a recurrent neural network model (RNN). The context-free model does not depend on the context, whereas the N-gram model, HMM, maximum entropy model, and RNN model do. These models differ in the machine learning methods they use: the HMM, maximum entropy model, and RNN model consider not only the relations within the preset corpus (i.e., the training texts) but also the sequential characteristics of the training texts, while the N-gram model does not consider relations between training texts, where N is a positive integer greater than or equal to 2.
Taking an N-gram language model as an example, after determining the source language text line regions in an image, the embodiment of the invention may perform text recognition on the source language text line regions to determine the text in each region, and segment the recognized text content into words to obtain a corresponding word sequence. From this, the end word w1 of the text line in the first region and the start word w2 of the text line in the second region of the adjacent source language text line regions are obtained. A first probability P(w1) that w1 is a sentence-end word, a second probability P(w2) that w2 is a sentence-head word, and a third probability P(w2|w1) that the start word w2 occurs given that the end word w1 occurs are then determined. If P(w1) is smaller than a first threshold, P(w2) is smaller than a second threshold, and P(w2|w1) is larger than a third threshold, this indicates that w1 is unlikely to end a sentence, w2 is unlikely to begin one, and the two are likely to occur together; it is therefore determined that the end word and the start word satisfy the association condition, that is, the first region and the second region include text content of the same paragraph and may be merged.
It will be appreciated that the specific values of the first threshold, the second threshold, and the third threshold are not limited in this embodiment of the present invention; the first threshold and the second threshold may be the same value or different values, for example, the first threshold and the second threshold may both be set to 30%, and the third threshold to 80%.
In an application example of the present invention, suppose that a text region in an image is detected and its text recognized, and the first region, located on the upper line of the adjacent source language text line regions, contains the following text content: "10 new high-speed rail lines are opened before the year end, 3 bottles of mineral water on the test operation vehicle are frightened", while the second region, located on the lower line, contains the text content: "foolproof".
The text content in the first region is segmented into words, from which the end word w1 of the text line in the first region can be determined to be "frightened"; similarly, the start word w2 of the text line in the second region can be determined to be "foolproof". Assume that, according to the N-gram language model, the first probability P(w1) that the end word w1 is a sentence-end word is 25%, the second probability P(w2) that the start word w2 is a sentence-head word is 19%, and the third probability P(w2|w1) that w2 occurs given w1 is 95%. Since P(w1) is smaller than the first threshold of 30%, P(w2) is smaller than the second threshold of 30%, and P(w2|w1) is larger than the third threshold of 80%, it can be determined that the first region and the second region include text content of the same paragraph. The two regions can then be merged, and the source language text in the merged source language text segment region translated as a whole, which can improve translation accuracy.
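The threshold test of mode one can be sketched as follows; the default thresholds of 30%/30%/80% mirror the example values above, and all names are illustrative:

```python
def association_satisfied(p_end, p_head, p_next,
                          t_end=0.30, t_head=0.30, t_next=0.80):
    """Mode one: merge when the end word is unlikely to close a sentence,
    the start word is unlikely to open one, and the pair co-occurs often."""
    return p_end < t_end and p_head < t_head and p_next > t_next
```

With the example values P(w1) = 0.25, P(w2) = 0.19, and P(w2|w1) = 0.95, the condition is satisfied and the two regions are merged.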
Mode two
In an optional embodiment of the present invention, the determining that the end word and the start word satisfy the association condition may specifically include: if the part of speech of the end word matches a first preset part of speech, or the part of speech of the start word matches a second preset part of speech, determining that the end word and the start word satisfy the association condition.
Part of speech refers to the characteristics of words that serve as the basis for classifying them. Taking Chinese as an example, the words of modern Chinese can be divided into two classes comprising 14 parts of speech. One class is the content words: nouns, verbs, adjectives, distinguishing words, pronouns, numerals, and quantifiers; the other class is the function words: adverbs, prepositions, conjunctions, auxiliary words, modal particles, onomatopoeia, and interjections.
In practical applications, words with different parts of speech occupy different positions in a sentence because they play different roles. For example, a conjunction is a function word used to connect words with words, phrases with phrases, or sentences with sentences, and can express relations such as coordination, succession, transition, selection, hypothesis, comparison, and concession; a conjunction therefore typically occurs in the middle of a sentence, not at its end or beginning. Furthermore, prepositions, articles, and determiners generally do not appear at the end of a sentence. As another example, a modal particle is a function word expressing mood, often used at the end of a sentence or at a pause within it to express various tones; it attaches to the end of a word or sentence to convey the mood.
According to these part-of-speech characteristics, the embodiment of the invention can identify the part of speech of the end word of the text line in the first region and of the start word of the text line in the second region; if the part of speech of the end word matches the first preset part of speech, or the part of speech of the start word matches the second preset part of speech, it can be determined that the end word and the start word satisfy the association condition.
Optionally, the first preset part of speech may include at least any one of: conjunction, preposition, article, and determiner; the second preset part of speech may include at least any one of: conjunction and modal particle.
Specifically, the embodiment of the invention can label the parts of speech of the segmented words in a large number of collected sentences, train a part-of-speech recognition model on the labeled data, and use the trained model to recognize the part of speech of the end word of the text line in the first region and of the start word of the text line in the second region. The recognized part of speech is then matched against the first preset part of speech or the second preset part of speech; if the part of speech of the end word matches the first preset part of speech, or the part of speech of the start word matches the second preset part of speech, it is determined that the end word and the start word satisfy the association condition.
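Assuming the tagger outputs simple string labels (a hypothetical tag set, not a claimed interface), the matching step of mode two can be sketched as:

```python
# Hypothetical preset part-of-speech sets from the embodiment above.
FIRST_PRESET = {"conjunction", "preposition", "article", "determiner"}
SECOND_PRESET = {"conjunction", "modal_particle"}

def pos_association(end_word_pos, start_word_pos):
    """Mode two: association holds when the end word's POS matches the first
    preset set, or the start word's POS matches the second preset set."""
    return end_word_pos in FIRST_PRESET or start_word_pos in SECOND_PRESET
```

For instance, a line ending in a preposition, or a line beginning with a conjunction, triggers a merge; two lines meeting at a noun/verb boundary do not, by this criterion alone.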
Mode three
In an optional embodiment of the present invention, the determining that the end word and the start word satisfy the association condition may specifically include: if the format of the start word does not conform to the sentence-head word format corresponding to the source language, determining that the end word and the start word satisfy the association condition.
The embodiment of the invention can also identify the source language to determine its language type, and then judge whether the format of the start word conforms to the sentence-head word format of that language. If it does not, the probability that the start word is a sentence-head word is low, and the association condition between the start word and the end word of the previous line (the text line in the first region) may be satisfied.
Taking English as the source language as an example, the first letter of a sentence-head word in an English sentence is usually capitalized. Therefore, if the first letter of the start word is not a capital letter, it is determined that the format of the start word does not conform to the sentence-head word format of English; the association condition between the start word and the end word of the previous line can then be determined to be satisfied, and the start word can be merged with the text content of the previous line.
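The capitalization check for English source text can be sketched as follows; the function names are illustrative, and languages without capitalization would need a different cue:

```python
def head_word_format_ok(start_word, source_language="english"):
    """True when the start word looks like a sentence-head word.
    For English this means an initial capital letter."""
    if source_language == "english":
        return start_word[:1].isupper()
    return True  # other languages: no capitalization cue available here

def should_merge_lines(start_word, source_language="english"):
    # Merge with the previous line when the start word does NOT
    # look like a sentence head.
    return not head_word_format_ok(start_word, source_language)
```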
In an optional embodiment of the present invention, the determining that the end word and the start word satisfy the association condition may specifically include: if no punctuation mark follows the end word of the text line in the first region, determining that the end word and the start word satisfy the association condition.
In a specific application, a complete sentence will usually be terminated by punctuation marks, so if no punctuation mark exists after the end word of the text line in the first area, it is indicated that the text content in the first area is not terminated, the probability that the end word has an association relationship with the start word of the next line (text line in the second area) is higher, and it can be determined that the association condition between the end word and the start word is satisfied.
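A minimal sketch of this punctuation check follows; the punctuation set is illustrative, covering common Latin and CJK sentence-final marks:

```python
# Illustrative set of sentence-ending punctuation marks.
SENTENCE_END = set(".!?。！？")

def line_unterminated(line_text):
    """True when the line does not end with sentence-final punctuation,
    suggesting the sentence continues on the next line."""
    stripped = line_text.rstrip()
    return bool(stripped) and stripped[-1] not in SENTENCE_END
```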
It can be appreciated that, in practical applications, the above schemes for determining whether adjacent source language text line regions include text content of the same paragraph may be used alone or in combination. For example, the determination may be made according to the parameter information corresponding to the source language text line regions; or according to the association relationship between the end word in the first region and the start word in the second region; or the two may be combined: after determining the source language text line regions in the image, adjacent regions are first grouped preliminarily into paragraphs according to their parameter information, the text in the preliminarily grouped regions is then recognized, and whether adjacent regions belong to the same paragraph is judged again according to the association relationship between the end word in the first region and the start word in the second region, refining the preliminary grouping into more accurate source language text segment regions. Finally, the source language text in the resulting source language text segment regions is translated to obtain the corresponding target language text.
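One possible combination of the geometric and linguistic checks can be sketched as follows; the dictionary keys, thresholds, and cues are all hypothetical simplifications of the embodiments above:

```python
def should_merge(first_line, second_line, max_height_diff=5, max_gap=12):
    """Two-pass check: geometric grouping first, then linguistic cues.
    `first_line`/`second_line` are dicts with hypothetical keys
    'top', 'bottom', 'height', and 'text'."""
    # Pass 1: geometry (size difference and line spacing).
    gap = second_line["top"] - first_line["bottom"]
    geometric = (abs(first_line["height"] - second_line["height"]) < max_height_diff
                 and gap < max_gap)
    if not geometric:
        return False
    # Pass 2: linguistic cues (unterminated line, lowercase start word).
    no_end_punct = first_line["text"].rstrip()[-1:] not in ".!?。！？"
    lowercase_start = not second_line["text"][:1].isupper()
    return no_end_punct or lowercase_start
```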
It may be appreciated that the embodiment of the present invention does not limit the timing of recognizing the text in the source language text line areas. For example, the source language text in the source language text line areas may be recognized after the source language text line areas in the image are determined and before adjacent source language text line areas are grouped into paragraphs. Optionally, the embodiment of the invention may train two convolutional neural networks: one for detecting the source language text line areas in the image, and the other for performing text recognition on the detected source language text line areas to obtain the text content in each source language text line area. Alternatively, after adjacent source language text line areas have been grouped into paragraphs, the source language text in the resulting source language text segment areas may be recognized.
In summary, after determining the source language text line areas in an image, the embodiment of the invention further determines whether adjacent source language text line areas include text content of the same paragraph; if so, the adjacent source language text line areas are merged to obtain a source language text segment area, and the source language text in the source language text segment area is then translated to obtain the target language text corresponding to the source language text. Because the text content in the merged source language text segment area comprises a complete paragraph, and a complete paragraph comprises complete sentences, translating the source language text in the merged source language text segment area yields target language text with more accurate semantics, thereby improving translation accuracy.
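The merging of step 302 can be sketched as follows; `merge_lines` and the toy punctuation predicate are illustrative assumptions rather than the embodiment's actual implementation, which may use any of the judging schemes described above:

```python
def merge_lines(lines, same_paragraph):
    """Group adjacent recognized line texts into paragraph segments
    (step 302), given a predicate deciding whether two adjacent lines
    belong to the same paragraph."""
    segments = [[lines[0]]]
    for prev, cur in zip(lines, lines[1:]):
        if same_paragraph(prev, cur):
            segments[-1].append(cur)  # same paragraph: extend segment
        else:
            segments.append([cur])    # new paragraph: start new segment
    return [" ".join(seg) for seg in segments]

# Toy predicate for illustration: a line continues its paragraph
# if it lacks sentence-final punctuation.
continues = lambda prev, cur: not prev.rstrip().endswith((".", "!", "?"))

print(merge_lines(
    ["Paragraph one continues on", "the next line.", "Paragraph two."],
    continues))
# ['Paragraph one continues on the next line.', 'Paragraph two.']
```

Each merged segment could then be passed whole to the translation step (step 303), so that sentences split across lines are translated in context.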
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Device embodiment
Referring to FIG. 6, there is shown a block diagram of an embodiment of a translation apparatus of the present invention, which may include:
a determining module 601, configured to determine a text line area of a source language in an image;
a merging module 602, configured to, if it is determined that the adjacent source language text line area includes text content of the same paragraph, merge the adjacent source language text line areas to obtain a source language text segment area;
and the translation module 603 is configured to translate the source language text in the source language text segment area to obtain a target language text corresponding to the source language text.
Optionally, the merging module 602 may specifically include:
the first determining submodule is used for determining that the adjacent source language text line areas include text content of the same paragraph if the size difference of the adjacent source language text line areas is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line areas are the same; wherein the size includes: the height of the source language text line area, and/or the width of the source language text line area.
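A minimal sketch of these three conditions, assuming illustrative field names and threshold values (the embodiment does not specify the preset difference and spacing values):

```python
from dataclasses import dataclass

@dataclass
class Region:
    width: float
    height: float
    top: float      # y coordinate of the region's upper edge
    direction: str  # e.g. "horizontal" or "vertical"

def same_paragraph(a: Region, b: Region,
                   max_size_diff: float = 4.0,
                   max_spacing: float = 8.0) -> bool:
    """True if adjacent areas a (upper) and b (lower) satisfy the
    size-difference, line-spacing, and text-direction conditions."""
    size_ok = (abs(a.height - b.height) < max_size_diff
               and abs(a.width - b.width) < max_size_diff)
    spacing_ok = (b.top - (a.top + a.height)) < max_spacing
    return size_ok and spacing_ok and a.direction == b.direction

a = Region(width=200, height=16, top=0, direction="horizontal")
b = Region(width=198, height=15, top=20, direction="horizontal")
print(same_paragraph(a, b))  # True: similar size, small spacing, same direction
```

In practice the thresholds would likely be scaled to the line height rather than fixed in pixels.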
Optionally, the merging module 602 may specifically include:
the second determining submodule is used for determining an end word of the text line in the first area and determining a start word of the text line in the second area; wherein the first area and the second area are adjacent source language text line areas, the first area is located at a first position in the adjacent source language text line areas, and the second area is located at a second position in the adjacent source language text line areas;
and the third determining submodule is used for determining that the first area and the second area include text content of the same paragraph if it is determined that the association condition between the end word and the start word is satisfied.
Optionally, the third determining sub-module may specifically include:
the first determining unit is used for determining a first probability that the end word is a sentence-end word;
a second determining unit, configured to determine a second probability that the start word is a sentence head word;
a third determining unit configured to determine a third probability that the start word appears in a case where the end word appears;
and the fourth determining unit is used for determining that the end word and the start word meet the association condition if the first probability is smaller than a first threshold value, the second probability is smaller than a second threshold value and the third probability is larger than a third threshold value.
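These three threshold comparisons can be sketched as follows; the probability values and thresholds below are placeholders, since in practice the first, second, and third probabilities would be estimated from a language model or corpus statistics:

```python
def association_holds(p_end: float, p_head: float, p_cond: float,
                      t1: float = 0.5, t2: float = 0.5, t3: float = 0.1) -> bool:
    """p_end:  first probability that the end word is a sentence-end word
    p_head: second probability that the start word is a sentence-head word
    p_cond: third probability that the start word appears given the end word
    The association condition holds when the line break looks
    mid-sentence and the two words plausibly co-occur."""
    return p_end < t1 and p_head < t2 and p_cond > t3

# e.g. end word "the", start word "dog": "the" rarely ends a sentence,
# "dog" rarely starts one, and "dog" often follows "the".
print(association_holds(p_end=0.01, p_head=0.05, p_cond=0.3))  # True
```

A bigram language model over the source language corpus would be one plausible way to supply these probabilities.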
Optionally, the third determining sub-module may specifically include:
and a fifth determining unit, configured to determine that the association condition is satisfied between the end word and the start word if the part of speech of the end word matches with the first preset part of speech, or if the part of speech of the start word matches with the second preset part of speech.
Optionally, the third determining sub-module may specifically include:
and a sixth determining unit, configured to determine that the association condition is satisfied between the end word and the start word if the format of the start word does not conform to the format of the sentence head word corresponding to the source language.
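For an English source language, this sentence-head format check could be sketched as follows (the capitalization rule and function name are assumptions for illustration; other source languages would need their own sentence-head conventions):

```python
def start_word_breaks_head_format(start_word: str, source_lang: str = "en") -> bool:
    """True if the start word does not conform to the sentence-head
    format of the source language, suggesting the line continues the
    previous sentence and the association condition is satisfied."""
    if source_lang == "en":
        # An English sentence-head word is normally capitalized.
        return start_word[:1].islower()
    # Other languages would need their own sentence-head conventions.
    return False

print(start_word_breaks_head_format("dog"))  # True  -> association condition met
print(start_word_breaks_head_format("The"))  # False
```

This heuristic is unreliable for all-caps text or languages without case, which is one reason the embodiment allows several judging schemes to be combined.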
Optionally, the source language text line areas are determined by a convolutional neural network.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the method embodiments and will not be detailed here.
Embodiments of the present invention provide an apparatus for translation, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for: determining a text line area of a source language in an image; if the adjacent source language text line area is determined to comprise the text content of the same paragraph, merging the adjacent source language text line area to obtain a source language text segment area; and translating the source language text in the source language text segment area to obtain a target language text corresponding to the source language text.
Fig. 7 is a block diagram illustrating an apparatus 800 for translation, according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 7, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing element 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operational mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice information processing mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor assembly 814 may detect the on/off state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or of one component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including instructions executable by processor 820 of apparatus 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
Fig. 8 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. The memory 1932 and the storage media 1930 may provide transitory or persistent storage. The program stored in a storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processor 1922 may be configured to communicate with the storage medium 1930 and execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
A non-transitory computer readable storage medium storing instructions that, when executed by a processor of an apparatus (a server or a terminal), enable the apparatus to perform the translation method shown in fig. 1.
A non-transitory computer readable storage medium storing instructions that, when executed by a processor of an apparatus (a server or a terminal), cause the apparatus to perform a translation method, the method comprising: determining a text line area of a source language in an image; if it is determined that adjacent source language text line areas include text content of the same paragraph, merging the adjacent source language text line areas to obtain a source language text segment area; and translating the source language text in the source language text segment area to obtain a target language text corresponding to the source language text.
The embodiments of the invention disclose A1, a translation method, the method comprising:
determining a text line area of a source language in an image;
if the adjacent source language text line area is determined to comprise the text content of the same paragraph, merging the adjacent source language text line area to obtain a source language text segment area;
and translating the source language text in the source language text segment area to obtain a target language text corresponding to the source language text.
A2. The method according to A1, wherein the determining that the adjacent source language text line areas include text content of the same paragraph comprises:
if the size difference of the adjacent source language text line areas is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line areas are the same, determining that the adjacent source language text line areas include text content of the same paragraph; wherein the size includes: the height of the source language text line area, and/or the width of the source language text line area.
A3. The method according to A1, wherein the determining that the adjacent source language text line areas include text content of the same paragraph comprises:
determining an end word of the text line in the first area, and determining a start word of the text line in the second area; wherein the first area and the second area are adjacent source language text line areas, the first area is located at a first position in the adjacent source language text line areas, and the second area is located at a second position in the adjacent source language text line areas;
and if it is determined that the association condition between the end word and the start word is satisfied, determining that the first area and the second area include text content of the same paragraph.
A4. The method according to A3, wherein the determining that the end word and the start word satisfy the association condition comprises:
determining a first probability that the end word is a sentence-end word;
determining a second probability that the initial word is a sentence head word;
determining a third probability of occurrence of the start word in the case of occurrence of the end word;
and if the first probability is smaller than a first threshold value, the second probability is smaller than a second threshold value and the third probability is larger than a third threshold value, determining that the end word and the start word meet the association condition.
A5. The method according to A3, wherein the determining that the end word and the start word satisfy the association condition comprises:
and if the part of speech of the end word is matched with the first preset part of speech, or if the part of speech of the starting word is matched with the second preset part of speech, determining that the end word and the starting word meet the association condition.
A6. The method according to A3, wherein the determining that the end word and the start word satisfy the association condition comprises:
and if the format of the initial word does not accord with the format of the sentence head word corresponding to the source language, determining that the end word and the initial word meet the association condition.
The embodiments of the invention disclose B7, a translation apparatus, the apparatus comprising:
the determining module is used for determining a text line area of a source language in the image;
the merging module is used for merging the adjacent source language text line areas to obtain the source language text segment areas if the adjacent source language text line areas are determined to comprise the text content of the same paragraph;
and the translation module is used for translating the source language text in the source language text segment area to obtain a target language text corresponding to the source language text.
B8, the apparatus of B7, the merge module comprising:
the first determining submodule is used for determining that the adjacent source language text line areas include text content of the same paragraph if the size difference of the adjacent source language text line areas is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line areas are the same; wherein the size includes: the height of the source language text line area, and/or the width of the source language text line area.
B9, the device of B7, the merging module includes:
the second determining submodule is used for determining an end word of the text line in the first area and determining a start word of the text line in the second area; wherein the first area and the second area are adjacent source language text line areas, the first area is located at a first position in the adjacent source language text line areas, and the second area is located at a second position in the adjacent source language text line areas;
and the third determining submodule is used for determining that the first area and the second area include text content of the same paragraph if it is determined that the association condition between the end word and the start word is satisfied.
B10, the apparatus of B9, the third determination submodule comprising:
the first determining unit is used for determining a first probability that the end word is a sentence-end word;
a second determining unit, configured to determine a second probability that the start word is a sentence head word;
a third determining unit configured to determine a third probability that the start word appears in a case where the end word appears;
and the fourth determining unit is used for determining that the end word and the start word meet the association condition if the first probability is smaller than a first threshold value, the second probability is smaller than a second threshold value and the third probability is larger than a third threshold value.
B11, the apparatus of B9, the third determination submodule comprising:
and a fifth determining unit, configured to determine that the association condition is satisfied between the end word and the start word if the part of speech of the end word matches with the first preset part of speech, or if the part of speech of the start word matches with the second preset part of speech.
B12, the apparatus of B9, the third determination submodule comprising:
and a sixth determining unit, configured to determine that the association condition is satisfied between the end word and the start word if the format of the start word does not conform to the format of the sentence head word corresponding to the source language.
The embodiments of the invention disclose C13, an apparatus for translation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
determining a text line area of a source language in an image;
if the adjacent source language text line area is determined to comprise the text content of the same paragraph, merging the adjacent source language text line area to obtain a source language text segment area;
and translating the source language text in the source language text segment area to obtain a target language text corresponding to the source language text.
C14. The apparatus according to C13, wherein the determining that the adjacent source language text line areas include text content of the same paragraph comprises:
if the size difference of the adjacent source language text line areas is smaller than a preset difference value, the line spacing is smaller than a preset spacing value, and the text directions in the adjacent source language text line areas are the same, determining that the adjacent source language text line areas include text content of the same paragraph; wherein the size includes: the height of the source language text line area, and/or the width of the source language text line area.
C15. The apparatus according to C13, wherein the determining that the adjacent source language text line areas include text content of the same paragraph comprises:
determining an end word of the text line in the first area, and determining a start word of the text line in the second area; wherein the first area and the second area are adjacent source language text line areas, the first area is located at a first position in the adjacent source language text line areas, and the second area is located at a second position in the adjacent source language text line areas;
and if it is determined that the association condition between the end word and the start word is satisfied, determining that the first area and the second area include text content of the same paragraph.
C16. The apparatus according to C15, wherein the determining that the end word and the start word satisfy the association condition comprises:
determining a first probability that the end word is a sentence-end word;
determining a second probability that the initial word is a sentence head word;
determining a third probability of occurrence of the start word in the case of occurrence of the end word;
and if the first probability is smaller than a first threshold value, the second probability is smaller than a second threshold value and the third probability is larger than a third threshold value, determining that the end word and the start word meet the association condition.
C17. The apparatus according to C15, wherein the determining that the end word and the start word satisfy the association condition comprises:
and if the part of speech of the end word is matched with the first preset part of speech, or if the part of speech of the starting word is matched with the second preset part of speech, determining that the end word and the starting word meet the association condition.
C18. The apparatus according to C15, wherein the determining that the end word and the start word satisfy the association condition comprises:
and if the format of the initial word does not accord with the format of the sentence head word corresponding to the source language, determining that the end word and the initial word meet the association condition.
The embodiments of the invention disclose D19, a machine-readable medium having instructions stored thereon, which, when executed by one or more processors, cause an apparatus to perform the translation method according to one or more of A1 to A6.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
The foregoing is a detailed description of a translation method, a translation apparatus, and a device for translation provided by the present invention. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the above description of the embodiments is intended only to help understand the method and its core ideas. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope in accordance with the ideas of the present invention. In view of the above, the contents of this specification should not be construed as limiting the present invention.