WO2023103943A1

WO2023103943A1 - Image processing method and apparatus, and electronic device

Info

Publication number: WO2023103943A1
Application number: PCT/CN2022/136494
Authority: WO
Inventors: 刘池莉 (Liu Chili)
Original assignee: 维沃移动通信有限公司 (Vivo Mobile Communication Co., Ltd.)
Priority date: 2021-12-10
Filing date: 2022-12-05
Publication date: 2023-06-15
Also published as: CN114299525A

Abstract

The present application relates to the technical field of communications, and discloses an image processing method and apparatus, and an electronic device. The method comprises: obtaining N texts included in a target image and target information, the target information comprising at least one of a first completeness of the N texts and a second completeness of first translations corresponding to the N texts, N being an integer greater than 1; merging S texts that satisfy first semantic information among P texts to obtain a first text, the P texts being incomplete texts determined from the N texts according to the first completeness, and P and S both being integers greater than 1; and, when the first text and a second translation are both complete texts, outputting the first text, the second translation being a text obtained by merging the translations, among third translations, that correspond to the S texts, and the third translations being incomplete translations determined from the first translations according to the second completeness.

Description

Image processing method and apparatus, and electronic device

Cross-Reference to Related Applications

This application claims priority to Chinese Patent Application No. 202111509057.0 filed in China on December 10, 2021, the entire contents of which are hereby incorporated by reference.

Technical Field

The present application belongs to the technical field of communications, and in particular relates to an image processing method and apparatus, and an electronic device.

Background

With the development of electronic device technology, electronic devices are used ever more widely; for example, an electronic device can recognize and process text in images.

Currently, when an image contains multiple lines of text, the electronic device, while recognizing the image, can merge those lines according to the physical position coordinates of the text lines in the image and the text layout.

However, with this approach, when the text in the image includes complex text such as multi-column text, paginated text, or deformed and irregular text, the electronic device may be unable to merge the text according to the physical position coordinates and layout of the text lines. As a result, the electronic device's ability to process text in images is poor.

Summary

The purpose of the embodiments of the present application is to provide an image processing method and apparatus, and an electronic device, which can solve the problem that electronic devices have a poor ability to process text in images.

In a first aspect, an embodiment of the present application provides an image processing method, including: acquiring N texts included in a target image and target information, the target information including at least one of the following: a first completeness of the N texts, and a second completeness of first translations corresponding to the N texts, N being an integer greater than 1; merging S texts that satisfy first semantic information among P texts to obtain a first text, the P texts being incomplete texts determined from the N texts according to the first completeness, and P and S both being integers greater than 1; and, when the first text and a second translation are both complete texts, outputting the first text, the second translation being a text obtained by merging the translations corresponding to the S texts among third translations, and the third translations being incomplete translations determined from the first translations according to the second completeness.

In a second aspect, an embodiment of the present application provides an image processing apparatus, including an acquisition module, a processing module, and an output module. The acquisition module is configured to acquire N texts included in a target image and target information, the target information including at least one of the following: a first completeness of the N texts, and a second completeness of first translations corresponding to the N texts, N being an integer greater than 1. The processing module is configured to merge S texts that satisfy first semantic information among P texts to obtain a first text, the P texts being incomplete texts determined from the N texts according to the first completeness, and P and S both being integers greater than 1. The output module is configured to output the first text when the first text and a second translation are both complete texts, the second translation being a text obtained by merging the translations corresponding to the S texts among third translations, and the third translations being incomplete translations determined from the first translations according to the second completeness.

In a third aspect, an embodiment of the present application provides an electronic device including a processor and a memory, the memory storing a program or instructions executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.

In a fourth aspect, an embodiment of the present application provides a readable storage medium storing a program or instructions that, when executed by a processor, implement the steps of the method according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip including a processor and a communication interface coupled to the processor, where the processor is configured to run a program or instructions to implement the method according to the first aspect.

In a sixth aspect, an embodiment of the present application provides a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the method according to the first aspect.

In the embodiments of the present application, N texts included in a target image and target information are acquired, the target information including at least one of a first completeness of the N texts and a second completeness of first translations corresponding to the N texts, N being an integer greater than 1; S texts that satisfy first semantic information among P texts are merged to obtain a first text, the P texts being incomplete texts determined from the N texts according to the first completeness, and P and S both being integers greater than 1; and, when the first text and a second translation are both complete texts, the first text is output, the second translation being a text obtained by merging the translations corresponding to the S texts among third translations, and the third translations being incomplete translations determined from the first translations according to the second completeness. With this scheme, after multiple texts and the target information are obtained from the target image, the incomplete texts are identified according to the target information, and those incomplete texts that satisfy the semantic information are merged into a merged text. Consequently, when the text in the image includes complex text such as multi-column text, paginated text, or deformed and irregular text, that text can still be merged according to semantic information. Further, because the merged text is output only when both the merged text and its corresponding translation are complete texts, the resulting merged text reads more fluently. The ability to process text in images is thereby improved.

Brief Description of Drawings

FIG. 1 is a schematic diagram of an image processing method according to an embodiment of the present application;

FIG. 2(a) is a first schematic diagram of an image processing interface according to an embodiment of the present application;

FIG. 2(b) is a second schematic diagram of an image processing interface according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

FIG. 5 is a schematic diagram of the hardware of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described below clearly with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of this application fall within the protection scope of this application.

The terms "first", "second", and the like in the specification and claims of the present application are used to distinguish between similar objects and are not used to describe a specific order or sequence. It should be understood that terms so used are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first", "second", and the like are generally of one type, and the number of such objects is not limited; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.

The image processing method, device, and electronic device provided in the embodiments of the present application will be described in detail below through specific embodiments and application scenarios with reference to the accompanying drawings.

As shown in FIG. 1, an embodiment of the present application provides an image processing method, including the following S101 to S103.

S101. The image processing apparatus acquires N texts included in a target image and target information.

The target information includes at least one of the following: a first completeness of the N texts, and a second completeness of first translations corresponding to the N texts. N is an integer greater than 1.

Optionally, the target image may be any of the following: an image captured by the electronic device, a screenshot saved by the electronic device, or an online image obtained by the electronic device.

Optionally, in this embodiment of the present application, the target image may contain multiple texts, and the N texts are among these texts.

Optionally, the N texts may be in Chinese, English, Korean, Japanese, or another language.

In addition, each of the N texts may be a text in the conventional sense or a single text line; this may be determined according to actual usage and is not limited in this embodiment of the present application.

Further, when one of the N texts is a text line, the text line may be an independent single line, in which case that single line is treated as one text; alternatively, the text line may be one line within a text paragraph.

Optionally, in this embodiment of the present application, the text content contained in the target image may be recognized through image text recognition technology (for example, optical character recognition), and the text content may specifically include the text contained in the target image and the coordinates of that text.

Optionally, the first translations may be in a single language or in multiple languages; this may be determined according to actual usage and is not limited in this embodiment of the present application.

Optionally, the first completeness and the second completeness are determined according to semantic information; see the detailed description in the following embodiments, which is not repeated here.

S102. The image processing apparatus merges S texts that satisfy the first semantic information among P texts to obtain a first text.

The P texts are incomplete texts determined from the N texts according to the first completeness. P and S are both integers greater than 1.

Optionally, the determination that P of the N texts are incomplete may be based on any of the following scenarios:

Scenario 1: The semantics of the P texts are incomplete.

Scenario 2: The first sentence or the last sentence of each of the P texts lacks part of its sentence structure.

Scenario 3: The sentence-final word of each of the P texts cannot form a word on its own.

It should be noted that all three scenarios above use semantic information to determine that P of the N texts are incomplete. They are merely examples provided by the embodiments of this application; determining incompleteness through semantic information may also be implemented in other ways, which is not limited in this embodiment of the application.

Optionally, the first semantic information may include at least one of the following: sentence structure information, sentence component information, and phrase composition information.

For example, take the case in which the first semantic information is sentence structure information. The sentence structure information may include at least one of the following: a subject-predicate structure, a verb-object structure, a subject-verb-object structure, a subject-verb-object structure with attributive, adverbial, and complement modifiers, and the like.

For example, take the case in which the first semantic information is sentence component information. The sentence component information may include at least one of the following: subject, predicate, object, attributive, adverbial, complement, and the like.

For example, take the case in which the first semantic information is phrase composition information. The phrase composition information may include at least one of the following: sentence-initial words, sentence-final words, common words, phrases, and the like.

It should be noted that the above embodiment is only an exemplary description of the first semantic information, and of course the first semantic information may also include other information related to semantics, which is not limited in this embodiment of the present application.

In addition, the above description of the first semantic information is only one possible example, given for the case in which the N texts are Chinese. When the N texts are in another language, the semantic information may be defined according to the semantic or syntactic rules of that language, which is not limited in this embodiment of the present application.

Optionally, in this embodiment of the present application, in one possible case the P texts include only one group of texts conforming to the first semantic information, and the S texts are that group; in another possible case the P texts include multiple groups of texts conforming to the first semantic information, and the S texts are any one of those groups.

Further, when the P texts include multiple groups of texts conforming to the first semantic information, each group may be merged in the same way as described for the S texts, which is not repeated in this embodiment of the present application.

S103. When the first text and the second translation are both complete texts, the image processing apparatus outputs the first text.

The second translation is a text obtained by merging the translations corresponding to the S texts among the third translations, and the third translations are incomplete translations determined from the first translations according to the second completeness.

Example 1: Take a mobile phone as the image processing apparatus. As shown in FIG. 2(a), the mobile phone displays an image containing texts S1 to S8; as shown in FIG. 2(b), the translations corresponding to S1 to S8 are 01 to 08, that is, the first translations. The mobile phone can acquire texts S1 to S8 from the image, together with the completeness of S1 to S8 and the completeness of the first translations. Because S2 to S8 are the incomplete texts among the eight texts, the mobile phone can merge S2, S3, and S4, which satisfy the first semantic information among S2 to S8, to obtain the merged text S9. Then, when S9 and the translation obtained by merging 02 to 04, which correspond to S2 to S4, are both complete texts, S9 is output.

Further, S5 and S6, and S7 and S8, also satisfy the first semantic information among the eight texts. The above process can be executed cyclically to merge S5 with S6 and S7 with S8, finally outputting S10 (S5 and S6 merged) and S11 (S7 and S8 merged).

In this way, multiple groups of texts in the image that satisfy the semantic information can be merged, completing the processing of the text in the image.
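The following Python sketch illustrates S103 together with the cyclic merging in Example 1. It is a minimal illustration under stated assumptions, not the applicant's implementation: the helper names (`output_if_complete`, `merge_all_groups`), the toy Chinese texts and English translations, the groups (taken as already found in S102), and the completeness test (a string counts as complete if it ends with sentence-final punctuation) are all hypothetical stand-ins for the semantic analysis described for S101.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical completeness test standing in for the semantic analysis of S101:
# a string counts as "complete" if it ends with sentence-final punctuation.
SENTENCE_END = ("。", "！", "？", ".", "!", "?")


def ends_with_terminal_punctuation(s: str) -> bool:
    return s.rstrip().endswith(SENTENCE_END)


def output_if_complete(
    group: Sequence[str],
    texts: Dict[str, str],
    translations: Dict[str, str],
    is_complete: Callable[[str], bool],
) -> Optional[str]:
    """S103: merge the S texts and their translations; return the first text
    only if both the merged text and the merged (second) translation are complete."""
    first_text = "".join(texts[t] for t in group)  # Chinese source: no separator
    second_translation = " ".join(translations[t] for t in group)  # English: spaces
    if is_complete(first_text) and is_complete(second_translation):
        return first_text
    return None


def merge_all_groups(
    groups: List[List[str]],
    texts: Dict[str, str],
    translations: Dict[str, str],
    is_complete: Callable[[str], bool],
    first_label: int = 9,
) -> Dict[str, str]:
    """Run S103 once per group and label each output S9, S10, ... as in Example 1."""
    outputs: Dict[str, str] = {}
    label = first_label
    for group in groups:
        merged = output_if_complete(group, texts, translations, is_complete)
        if merged is not None:
            outputs[f"S{label}"] = merged
            label += 1
    return outputs


# Hypothetical toy data (not the content of FIG. 2).
texts = {"S2": "我有一个朋", "S3": "友，他", "S4": "开了一家新店。",
         "S5": "我知", "S6": "道那家店。", "S7": "但我不", "S8": "能去。"}
translations = {"S2": "I have a", "S3": "friend, he", "S4": "opened a new store.",
                "S5": "I", "S6": "know that store.", "S7": "but I", "S8": "cannot go."}
groups = [["S2", "S3", "S4"], ["S5", "S6"], ["S7", "S8"]]
print(merge_all_groups(groups, texts, translations, ends_with_terminal_punctuation))
# {'S9': '我有一个朋友，他开了一家新店。', 'S10': '我知道那家店。', 'S11': '但我不能去。'}
```

A group whose merged text or merged translation is still judged incomplete (for example S2 and S3 alone) yields no output, which is the gating role of S103.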

Optionally, after S102 and before S103, the image processing method provided in the embodiment of the present application may further include: the image processing apparatus determines, according to the first semantic information, that the first text is a complete text.

Further, for how to judge whether the first text is complete, refer to the detailed description of judging the completeness of the N texts in the following embodiments; details are not repeated here.

The embodiment of the present application provides an image processing method. After multiple texts and target information are obtained from the target image, those incomplete texts, determined according to the target information, that satisfy the semantic information can be merged to obtain a merged text. Consequently, when the text in the image includes complex text such as multi-column text, paginated text, or deformed and irregular text, that text can be merged according to semantic information. Further, because the merged text is output only when both the merged text and its corresponding translation are complete texts, the resulting merged text reads more fluently. The ability to process text in images is thereby improved.

Optionally, the first completeness includes the sentence completeness of a first target sentence of each of the N texts, and the second completeness includes the sentence completeness of a second target sentence of each of the first translations. Correspondingly, S101 may be implemented through the following S101A to S101C.

S101A. The image processing apparatus extracts the text included in the target image to obtain the N texts.

S101B. The image processing apparatus analyzes the sentence completeness of the first target sentence based on the first semantic information.

The first semantic information may include at least one of the following: sentence structure information, sentence component information, and phrase composition information. The first target sentence may include at least one of the following: the first sentence of the text and the last sentence of the text.

Optionally, analyzing the sentence completeness of the first target sentence based on the first semantic information may be implemented in either of the following two ways:

Implementation 1: Use the first semantic information as preset rules to analyze the sentence completeness of the first target sentence.

For example, judgments are made based on first semantic information such as sentence structure, phrase composition, and sentence components, and all possibly incomplete texts are screened out. Taking FIG. 2(a) as an example, the first sentence of S6 is "a newly opened store in Dao"; analysis of the sentence component information shows that this sentence lacks a subject, so S6 is considered incomplete.
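As a rough illustration of what the "first target sentence" is in practice, the sketch below splits a text at sentence-final punctuation and returns its first and last sentences. The regular expression, the punctuation set, and the function name are assumptions made for this example; a real system would need proper sentence segmentation (for instance, the period in "3.5" would be treated as a sentence break here).

```python
import re
from typing import Tuple

# Assumed sentence boundary: one of these Chinese or Western terminal marks.
_SENTENCE_RE = re.compile(r"[^。！？.!?]+[。！？.!?]?")


def first_and_last_sentence(text: str) -> Tuple[str, str]:
    """Return (first sentence, last sentence) of a text. The last sentence may
    lack terminal punctuation when the text is cut off mid-sentence."""
    sentences = [s.strip() for s in _SENTENCE_RE.findall(text) if s.strip()]
    if not sentences:
        return "", ""
    return sentences[0], sentences[-1]


print(first_and_last_sentence("很多东西。我很喜欢。我知"))
# ('很多东西。', '我知')
```

A single-sentence text returns the same sentence twice, so both ends of it are checked.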

For example, judgments are made based on first semantic information such as sentence-initial words, sentence-final words, and phrases. A vocabulary, a phrase table, and a sentence-final word list can be constructed for each language, and a weight can be set for each word in the vocabulary, for example according to the word's frequency of use. Based on the phrase composition information, it can then be judged whether the last word of the last sentence of a text is a common sentence-final word, or whether the first word and the last word of the text can each form a word or phrase on their own, to determine whether the text line or paragraph is complete. As shown in FIG. 2(a), the last character of S2 is "朋" (peng, the first half of "朋友", "friend"); the vocabulary indicates that the probability of "朋" forming a word on its own and serving as a sentence-final word is very low, so S2 is considered incomplete.
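The rule-based check of Implementation 1 can be sketched with two weighted character tables, one for sentence-initial and one for sentence-final characters. The tables, their weights, the neutral default for unlisted characters, and the 0.1 threshold below are invented for illustration; in the described method the weights would come from usage frequencies in a per-language vocabulary.

```python
# Hypothetical weights: roughly "how plausible is this character at a sentence
# boundary". Real tables would be built per language from corpus frequencies.
END_WEIGHT = {"了": 0.9, "的": 0.6, "店": 0.5, "朋": 0.01, "不": 0.05}
START_WEIGHT = {"我": 0.9, "他": 0.8, "友": 0.02, "和": 0.05}
DEFAULT_WEIGHT = 0.5  # characters missing from a table are treated as neutral
THRESHOLD = 0.1
TERMINAL = "。！？.!?"


def tail_is_complete(last_sentence: str) -> bool:
    """The last sentence is complete if it ends with terminal punctuation or
    with a character that plausibly ends a sentence."""
    s = last_sentence.strip()
    if not s:
        return False
    if s[-1] in TERMINAL:
        return True
    return END_WEIGHT.get(s[-1], DEFAULT_WEIGHT) >= THRESHOLD


def head_is_complete(first_sentence: str) -> bool:
    """The first sentence is complete if its first character plausibly starts one."""
    s = first_sentence.strip()
    return bool(s) and START_WEIGHT.get(s[0], DEFAULT_WEIGHT) >= THRESHOLD


print(tail_is_complete("我有一个朋"))      # False: "朋" rarely ends a sentence
print(head_is_complete("友开了一家店。"))  # False: "友" rarely starts a sentence
print(head_is_complete("我知道。"))        # True
```

Treating unlisted characters as neutral means only characters known to be poor boundaries flag a text as incomplete; a stricter default would flag more texts for merging.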

Implementation 2: Construct a semantic model corresponding to the first semantic information, input the N texts into the semantic model, and analyze the sentence completeness of the first target sentence.

Specifically, the semantic model can be trained on text data with features such as lexical structure, syntactic structure, and sentence-final words, with first semantic information such as the parts of speech and syntactic structures of different languages configured in the model. The semantic model can then be used directly to judge whether the target sentence of the current text is complete.

It should be noted that, when the semantic model is constructed, it may also be made to output information such as the morphology and syntactic structure of an incomplete sentence and the sentence components that may be missing. The embodiment of the present application does not limit the specific algorithm of the semantic model, as long as corresponding training data is constructed for each language.

S101C. The image processing apparatus analyzes the sentence completeness of the second target sentence based on second semantic information.

The second semantic information includes at least one of the following: sentence structure information, sentence component information, and phrase composition information.

Optionally, for a specific implementation manner of analyzing the sentence completeness of the second target sentence based on the second semantic information, reference may be made to the detailed description of S101B in the foregoing embodiment, which will not be repeated in this embodiment of the present application.

Optionally, after the above S101B and before S102, the image processing method provided in this embodiment of the present application may further include the following S104.

S104. The image processing apparatus determines the P texts from the N texts according to the sentence completeness of the first target sentence.

The P texts correspond to the third translations.

Example 2: With reference to Example 1, the following determinations are made according to the first semantic information. The last word of the last sentence of S2 is "朋" (the first character of "朋友", "friend"), which cannot serve as a sentence-final word, so that sentence is incomplete, that is, S2 is incomplete. The first word of the first sentence of S3 is "友" (the second character of "朋友"), which cannot serve as a sentence-initial word, so that sentence is incomplete, that is, S3 is incomplete. The first sentence of S4 is "a lot of things", which lacks a subject and predicate, so S4 is incomplete. The last sentence of S5 is "I know", which lacks an object, so S5 is incomplete. The first sentence of S6 is "Dao...", which lacks a subject, so S6 is incomplete. The last sentence of S7 is "cannot", which cannot serve as a sentence-final word, so S7 is incomplete. The first sentence of S8 is "and", which cannot serve as a sentence-initial word, so S8 is incomplete. In this way, the incomplete texts S2 to S8 are determined from S1 to S8.

The image processing method provided by the embodiment of the present application can extract the text included in the target image to obtain N texts, analyze the sentence completeness of the first target sentence based on the first semantic information, and analyze the sentence completeness of the second target sentence based on the second semantic information; that is, the completeness of the N texts and of their translations can be determined.

Further, because the P texts can be determined from the N texts according to the sentence completeness of the first target sentence, texts satisfying the first semantic information can conveniently be selected from the P texts for merging.
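S104 can be pictured as a filter over per-text completeness flags. In the sketch below, the record type and its field names are hypothetical; the flags follow Example 2 where it states them, and any flag that Example 2 does not mention is simply assumed to be True.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextCompleteness:
    """Completeness of one text's first and last sentences (hypothetical record)."""
    text_id: str
    first_sentence_complete: bool
    last_sentence_complete: bool

    @property
    def complete(self) -> bool:
        # A text is complete only if both its first and last sentences are complete.
        return self.first_sentence_complete and self.last_sentence_complete


def select_incomplete(items: List[TextCompleteness]) -> List[str]:
    """S104: the P texts are the N texts judged incomplete."""
    return [t.text_id for t in items if not t.complete]


# Flags from Example 2; flags it does not mention are assumed True.
n_texts = [
    TextCompleteness("S1", True, True),
    TextCompleteness("S2", True, False),   # last character cannot end a sentence
    TextCompleteness("S3", False, True),   # first character cannot start a sentence
    TextCompleteness("S4", False, True),   # first sentence lacks subject and predicate
    TextCompleteness("S5", True, False),   # last sentence lacks an object
    TextCompleteness("S6", False, True),   # first sentence lacks a subject
    TextCompleteness("S7", True, False),   # last word cannot end a sentence
    TextCompleteness("S8", False, True),   # first word cannot start a sentence
]
print(select_incomplete(n_texts))  # ['S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8']
```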

Optionally, after the above S101 and before S102, the image processing method provided in the embodiment of the present application may further include the following S105 to S108.

S105. The image processing apparatus acquires, from the P texts and according to the first semantic information, at least two texts that match a second text among the P texts.

Optionally, for the description of the first semantic information, reference may be made to the detailed description in the foregoing embodiments, which will not be repeated in this embodiment of the present application.

Optionally, the second text is any one of the P texts, for example, the text positioned first among the P texts.

Optionally, S105 may specifically include: the image processing apparatus judges, according to the first semantic information, whether the second text can be merged with each of the other texts among the P texts, so as to obtain at least two texts that match the second text.

Further, "at least two texts that match the second text" in S105 means that the second text, combined with each of the at least two texts, satisfies the first semantic information.

S106. The image processing apparatus merges the second text with each of the at least two texts to obtain at least two merged texts.

Optionally, S106 may be implemented in either of the following two ways:

(1) Directly merge the second text with each of the at least two texts to obtain at least two merged texts.

(2) Merge the last sentence of the second text with the first sentence of each of the at least two texts to obtain at least two merged sentences.
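The two merging modes of S106 differ only in how much of each text is joined. A minimal sketch, assuming sentences are split at sentence-final punctuation; the function names and the toy strings are hypothetical.

```python
import re
from typing import List

_SENTENCE_RE = re.compile(r"[^。！？.!?]+[。！？.!?]?")  # assumed sentence boundaries


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in _SENTENCE_RE.findall(text) if s.strip()]


def merge_direct(second_text: str, candidate: str, sep: str = "") -> str:
    """Mode (1): concatenate the whole second text with the whole candidate."""
    return second_text + sep + candidate


def merge_boundary(second_text: str, candidate: str, sep: str = "") -> str:
    """Mode (2): join only the last sentence of the second text with the
    first sentence of the candidate."""
    left = split_sentences(second_text)
    right = split_sentences(candidate)
    return (left[-1] if left else "") + sep + (right[0] if right else "")


a = "今天天气很好。我有一个朋"
b = "友开了一家店。卖很多东西。"
print(merge_direct(a, b))    # 今天天气很好。我有一个朋友开了一家店。卖很多东西。
print(merge_boundary(a, b))  # 我有一个朋友开了一家店。
```

Mode (2) keeps the string that is later scored in S107 short and focused on the join; mode (1) scores the full merged text.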

S107. The image processing apparatus determines the sentence perplexity of each of the at least two merged texts.

The sentence perplexity indicates how fluent the sentences in a merged text are.

It can be understood that determining the sentence perplexity of the at least two merged texts essentially means determining the sentence perplexity of the merged sentences included in each of them.

It should be noted that the lower the sentence perplexity, the more fluent the sentence and hence the higher its semantic correctness; conversely, the higher the sentence perplexity, the less fluent the sentence and the lower its semantic correctness.
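The description does not fix how sentence perplexity is computed. A common definition for a sequence of n tokens is PPL = exp(-(1/n) Σ log p(w_i | w_1, ..., w_{i-1})), that is, the inverse geometric mean of the per-token probabilities assigned by a language model. The sketch below applies it with a tiny add-one-smoothed character bigram model, so each context is just the previous character; the model, the two-line training corpus, and the class name are illustrative assumptions, and a practical system would use a properly trained language model.

```python
import math
from collections import Counter
from typing import Iterable, List


def perplexity(token_probs: List[float]) -> float:
    """exp of the negative mean log-probability; lower means more fluent."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


class CharBigramLM:
    """Toy add-one-smoothed character bigram model (illustrative stand-in)."""

    def __init__(self, corpus: Iterable[str]):
        self.contexts: Counter = Counter()
        self.bigrams: Counter = Counter()
        vocab = set()
        for line in corpus:
            chars = ["<s>"] + list(line)
            vocab.update(line)
            self.contexts.update(chars[:-1])  # each position that has a successor
            self.bigrams.update(zip(chars, chars[1:]))
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen characters

    def prob(self, prev: str, cur: str) -> float:
        return (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + self.vocab_size)

    def sentence_perplexity(self, sentence: str) -> float:
        chars = ["<s>"] + list(sentence)
        return perplexity([self.prob(a, b) for a, b in zip(chars, chars[1:])])


lm = CharBigramLM(["我有一个朋友。", "朋友开了一家新店。"])
print(lm.sentence_perplexity("我有一个朋友。") < lm.sentence_perplexity("我有一个朋店。"))  # True
```

The merge that reproduces patterns seen in training ("朋友") scores lower, which matches the document's rule that lower perplexity means a more fluent sentence.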

S108. The image processing apparatus determines the third text corresponding to a target merged text as the text to be merged with the second text.

The target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts. The S texts include the second text and the third text.

It should be noted that, because a lower sentence perplexity indicates a more fluent sentence, once the third text giving the lowest perplexity is determined as the text to be merged with the second text, the second text and the third text can be merged.
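Steps S105 to S108 amount to "try each matching candidate, score each merge, keep the best". The sketch below shows that selection. The `matches` predicate stands in for the first-semantic-information check of S105 and `perplexity` for the scorer of S107; both are passed in because the description fixes neither, and the stub scores in the usage lines are made-up values used only to exercise the code.

```python
from typing import Callable, Dict, Tuple


def pick_text_to_merge(
    second_text: str,
    candidates: Dict[str, str],
    matches: Callable[[str, str], bool],
    perplexity: Callable[[str], float],
    sep: str = "",
) -> Tuple[str, Dict[str, float]]:
    """Return (id of the third text, perplexity of every scored merge)."""
    # S105: keep only candidates that the semantic check allows after second_text.
    matched = {cid: c for cid, c in candidates.items() if matches(second_text, c)}
    if not matched:
        raise ValueError("no candidate matches the second text")
    # S106 and S107: merge with each candidate and score the merged text.
    scores = {cid: perplexity(second_text + sep + c) for cid, c in matched.items()}
    # S108: the merge with the lowest perplexity identifies the third text.
    best = min(scores, key=scores.__getitem__)
    return best, scores


# Hypothetical stub scores standing in for a real language model.
stub = {"我有一个朋友开了一家店。": 8.0, "我有一个朋道那家店。": 40.0}
best, scores = pick_text_to_merge(
    "我有一个朋",
    {"S3": "友开了一家店。", "S6": "道那家店。"},
    matches=lambda a, b: True,
    perplexity=lambda s: stub.get(s, 100.0),
)
print(best)  # S3
```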

In the image processing method provided by the embodiment of the present application, after at least two texts matching the second text are obtained from the P texts according to the first semantic information, the second text is merged with each of them to obtain at least two merged texts, and the sentence perplexity of each merged text is determined. The text that best matches the second text can therefore be selected from the at least two texts, according to these perplexities, as the text to be merged, which improves the correctness of text merging.

Optionally, after S101 and before S102, the image processing method provided in the embodiment of the present application may further include the following S109 and S110.

S109. The image processing apparatus determines Q adjacent texts from the P texts according to the position of each of the P texts.

Q is an integer greater than or equal to S.

It should be noted that the positions of the P texts indicate which pairs of texts could be merged, so texts that obviously cannot be merged into the same paragraph can be excluded; in this way, Q texts at adjacent positions are determined from the P texts.

Specifically, if another text lies between two texts, the two texts cannot be merged; that is, texts cannot be merged across intervening lines. The identifiers of texts that cannot be merged may also be recorded.

For example, as shown in FIG. 2(a), the image includes texts S1, S2, ..., S8. From the positions of these eight texts, because multiple text-line paragraphs lie between S2 and each of S6, S7, and S8, it can be determined that S2 cannot be merged with S7, S6, or S8, and the identifiers of these pairs can be recorded in the non-merge list not_merge_list=[S2_S7, S2_S6, S2_S8].

It can be understood that, because merging two texts is order-dependent, the order of the identifiers in the non-merge list represents the actual merging order. For example, S2_S7 in the non-merge list means that the sentence following S2 is not S7; it does not mean that the sentence following S7 cannot be S2.
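The non-merge list can be built from the texts' reading order and stored as ordered pairs, so that "S2_S7" forbids S7 from following S2 without forbidding S2 from following S7. This sketch assumes a simple one-dimensional reading order, whereas FIG. 2(a) involves a two-dimensional layout, so it is not expected to reproduce the exact list in the example above; the function names and the `max_gap` parameter are hypothetical.

```python
from typing import List, Sequence


def build_not_merge_list(order: Sequence[str], max_gap: int = 0) -> List[str]:
    """Record "A_B" for every later text B separated from A by more than
    max_gap other texts in reading order (max_gap=0: any text in between)."""
    not_merge_list = []
    for i, a in enumerate(order):
        for j in range(i + max_gap + 2, len(order)):
            not_merge_list.append(f"{a}_{order[j]}")
    return not_merge_list


def may_follow(prev_id: str, next_id: str, not_merge_list: Sequence[str]) -> bool:
    """Order matters: only the pair "prev_next" is looked up."""
    return f"{prev_id}_{next_id}" not in set(not_merge_list)


nm = build_not_merge_list(["S1", "S2", "S3", "S4"])
print(nm)                          # ['S1_S3', 'S1_S4', 'S2_S4']
print(may_follow("S2", "S4", nm))  # False
print(may_follow("S4", "S2", nm))  # True
```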

S110. The image processing device determines S texts that satisfy the first semantic information among the Q texts as texts to be merged.

Optionally, for the implementation of determining S texts satisfying the first semantic information from the Q texts, reference may be made to the detailed descriptions of S105 to S108 in the foregoing embodiment. Specifically, this may include:

(1) According to the first semantic information, obtain from the Q texts at least two texts matching text 1 of the Q texts.

(2) Merge text 1 with each of the at least two texts to obtain at least two merged texts.

(3) Determine the sentence perplexity of each of the at least two merged texts.

(4) Determine text 2, the text corresponding to merged text 1, as the text to be merged with text 1, where merged text 1 is the merged text with the lowest sentence perplexity among the at least two merged texts. The S texts include text 1 and text 2.
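Steps (1) to (4) select the merge partner by sentence perplexity, PP = exp(-(1/N) · Σ log p(w_i | w_{i-1})). The application does not specify the language model, so the sketch below substitutes a toy add-one-smoothed word bigram model purely for illustration; train_bigram, perplexity and pick_text_to_merge are hypothetical names, and a practical system would presumably use a much stronger language model.

```python
import math
from collections import Counter


def train_bigram(corpus: list[str]):
    """Toy model (assumption): count word contexts and bigrams with sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])  # every token that serves as a context
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {t for s in corpus for t in s.split()} | {"</s>"}
    return unigrams, bigrams, len(vocab)


def perplexity(sentence: str, model) -> float:
    """Add-one smoothed bigram perplexity: exp(-mean log p(w_i | w_{i-1}))."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


def pick_text_to_merge(text1: str, candidates: list[str], model) -> str:
    """Steps (2) to (4): merge text1 with each candidate and keep the lowest-perplexity one."""
    return min(candidates, key=lambda c: perplexity(f"{text1} {c}", model))
```

With a model trained on a few fluent sentences, a candidate whose words continue text 1 naturally yields a lower perplexity than a scrambled candidate and is therefore chosen as text 2.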

It should be noted that if no text matching text 2 is obtained according to the first semantic information, the S texts include only text 1 and text 2, and the S texts satisfying the semantic information can be determined from the Q texts through (1) to (4) above.

If other texts matching text 2 are obtained according to the first semantic information, the S texts also include texts other than text 1 and text 2. In this case, (1) to (4) above can be executed repeatedly in a loop to determine the further texts that follow text 1 and text 2.

In this way, through the above implementation manner, S texts satisfying the first semantic information can be obtained from the Q texts and determined as texts to be merged.
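The loop described above, in which (1) to (4) are repeated until no further matching text is found, might be organized as a greedy chain. In this sketch the merge cost score (for example, the perplexity of the merged pair) and the acceptance threshold max_score are assumptions standing in for "matching according to the first semantic information", and texts are referred to by identifiers such as S1 so that the not_merge_list convention of S109 can be reused.

```python
from typing import Callable


def build_merge_chain(
    start: str,
    texts: list[str],
    score: Callable[[str, str], float],
    not_merge: set[str],
    max_score: float,
) -> list[str]:
    """Greedily extend a chain of text identifiers to be merged, starting from `start`.

    score(a, b) is a hypothetical merge cost (lower is better, e.g. sentence perplexity).
    Text b may follow text a only if "a_b" is not in not_merge and score(a, b) <= max_score.
    The loop stops when the last text in the chain has no acceptable successor.
    """
    chain = [start]
    remaining = [t for t in texts if t != start]
    while remaining:
        last = chain[-1]
        allowed = [t for t in remaining if f"{last}_{t}" not in not_merge]
        if not allowed:
            break
        best = min(allowed, key=lambda t: score(last, t))
        if score(last, best) > max_score:
            break  # no remaining text matches well enough
        chain.append(best)
        remaining.remove(best)
    return chain
```

The resulting chain plays the role of the S texts; a greedy choice is used here only for simplicity, and a search over several candidate orders would be an equally valid reading of the text.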

It can be understood that, because the Q texts with adjacent distribution positions are determined from the P texts according to the distribution position of each of the P texts, texts whose positions rule out merging can be excluded, which reduces unnecessary merging operations on the electronic device. Further, because the S texts satisfying the first semantic information among the Q texts are determined as the texts to be merged, the texts to be merged are selected using the first semantic information after a coarse screening by distribution position, so the merged text reads more fluently.

Optionally, after the above S110 and before S102, the image processing method provided in this embodiment of the present application may further include the following S111. Correspondingly, the above S102 may specifically be implemented through the following S102A.

S111. The image processing apparatus determines a target arrangement sequence of the S texts according to the first semantic information.

It can be understood that, since the first semantic information includes sentence structure information, sentence component information and phrase composition information, etc., the arrangement order of the text can be determined according to the sentence component information and phrase composition information.

S102A. The image processing apparatus combines the S texts according to the target arrangement sequence to obtain the first text.

It should be noted that merging the S texts according to the target arrangement order essentially means: for each two adjacent texts among the S texts, merging the last sentence of the preceding text with the first sentence of the following text, and repeating this until all S texts are merged into the first text.

As an example, take the first semantic information as sentence structure information, sentence component information and phrase composition information. Assume that the last sentence of text A is "I zhi", which is a subject-predicate structure, and the first sentence of text B is "dao is a newly opened shop", which is a verb-object structure. From the sentence structure information and sentence component information, text A lacks an object and text B lacks a subject; from the phrase composition information, "zhi" and "dao" together form the word "zhidao" ("to know"). The arrangement order of text A and text B can therefore be determined as A_B; that is, text B is merged at the end of text A.

Exemplarily, take the first semantic information as phrase composition information. Suppose the last character of the last sentence of text C is "peng" and the first character of the first sentence of text D is "you". According to the phrase composition information, "peng" and "you" together form the word "pengyou" ("friend"), so the arrangement order of text C and text D can be determined as C_D; that is, text D is merged at the end of text C.

With the image processing method provided by the embodiments of the present application, the target arrangement order of the S texts can be determined according to the first semantic information. Therefore, after the S texts are merged in that order to obtain the first text, the semantics of the first text are more complete and semantic contradictions are less likely.
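The phrase-composition check in the two examples above, followed by the pairwise merging of S102A, can be sketched as follows. Representing each text as a list of romanized syllables and using a small lexicon set are illustrative assumptions; a real implementation would presumably rely on proper word segmentation and a dictionary, and would also weigh sentence structure and component information.

```python
from functools import reduce
from typing import Optional

Syllables = list[str]  # a text as a sequence of romanized syllables (illustration only)


def arrangement_order(a: Syllables, b: Syllables, lexicon: set[str]) -> Optional[str]:
    """Return "A_B" if a's tail and b's head form a known word, "B_A" for the reverse, else None."""
    if a[-1] + b[0] in lexicon:
        return "A_B"
    if b[-1] + a[0] in lexicon:
        return "B_A"
    return None


def merge_in_order(texts: list[Syllables]) -> Syllables:
    """S102A sketch: join texts already sorted into the target order, pair by pair."""
    return reduce(lambda left, right: left + right, texts)
```

For text A = ["wo", "zhi"] and text B = ["dao", "shi", "yi", "jia", "xin", "kai", "de", "dian"] with a lexicon containing "zhidao", the order is A_B, and merging yields A followed by B.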

Optionally, the image processing method provided in the embodiment of the present application may also include another possible implementation manner. The method may also include the following S112 to S115.

S112. Acquire M texts in the target image.

S113. In the case that T texts among the M texts are incomplete texts, merge L texts satisfying third semantic information among the T texts to obtain a fourth text.

Wherein, M, T and L are all integers greater than 1.

Optionally, for the description of the third semantic information, reference may be made to the relevant description of the first semantic information in the foregoing embodiments, which will not be repeated in this embodiment of the present application.

S114. If the fourth text is a complete text, the image processing device translates the fourth text to obtain a fourth translation.

Optionally, the fourth translation may include translations in one language type, or translations in multiple language types. The embodiment of the present application does not limit the number and language types of the fourth translations.

Exemplarily, the fourth text is a Chinese-type text, and the fourth translation is an English-type translation; or, the fourth text is an English-type text, and the fourth translation includes a Chinese-type translation and a Korean-type translation.

S115. In the case that both the fourth text and the fourth translation are complete texts, the image processing device outputs the fourth text and the fourth translation.

Exemplarily, assume that the fourth text is Chinese text. When the Chinese text is determined to be a complete text, it is translated to obtain an English translation. If the English translation is also a complete text, the image processing device can output the Chinese text and the English translation.

In the image processing method provided by the embodiments of the present application, after the M texts in the target image are obtained, the L texts satisfying the third semantic information among the T texts can be merged to obtain the fourth text, and the fourth text can be translated to obtain the fourth translation. The fourth text and the fourth translation are output only when both are complete texts, so whether to output the fourth text is decided by judging not only the completeness of the merged fourth text but also the completeness of the fourth translation, which improves the accuracy of paragraph merging. Further, because the fourth translation can also be output, a more accurate translation can be output in scenarios where the text in the target picture needs to be translated.

Optionally, after the above S114, the image processing method provided in the embodiment of the present application may further include the following S116 and S117.

S116. In the case that the fourth translation is an incomplete text, the image processing device merges R texts among the T texts to obtain a fifth text.

Wherein, the R texts include texts determined according to the semantic information of the fourth translation, and R is an integer greater than 1.

Optionally, the R texts may include all or some of the L texts, as determined by actual conditions; this is not limited in the embodiments of the present application.

It should be noted that the R texts are texts satisfying semantic information among the T texts.

Further, when the fourth translation is an incomplete text, other texts satisfying the semantic information of the fourth translation can be obtained from the T texts, and the fourth text can be merged with those other texts. It can be understood that the position at which the fourth text is merged with the other texts corresponds to the position of the semantically incomplete text in the fourth translation.

S117. In the case that both the fifth text and the fifth translation are complete paragraphs, the image processing device outputs the fifth text and the fifth translation.

Wherein, the above-mentioned fifth translation is the translation corresponding to the fifth text.

Optionally, for the description of judging that the fifth text and the fifth translation are complete texts, reference may be made to the description of the first text in the foregoing embodiment, which will not be repeated in this embodiment of the present application.

It should be noted that, because the purpose of translating the text in a picture is to obtain a semantically correct translation, the completeness of the translation is the focus of picture translation. If the translation is incomplete, then even if the paragraph obtained by merging the original text in the picture (also called the original paragraph) is complete, texts satisfying the semantic information need to be merged at the position in the original paragraph that corresponds to the semantically incomplete text in the translation. The completeness of the translation is then judged again after re-translation, to ensure that the final output text is complete.

It can be understood that merging at the level of the original paragraphs ensures the completeness of the original paragraphs. Only when an original paragraph is complete can the translation model produce an effective translation of it. Conversely, if only the translations are merged, it is difficult to obtain a translation that satisfies the semantic information.

Optionally, after S116 and before S117, the image processing method provided in the embodiments of the present application may further include: if the fifth text is a complete text, translating the fifth text to obtain the fifth translation. In this way, translation is performed only when the merged fifth text is complete, which avoids invalid translation operations on incomplete merged text and saves computing resources of the electronic device.

In the image processing method provided by the embodiments of the present application, when the fourth translation is incomplete, the R texts among the T texts can be merged to obtain the fifth text. The R texts satisfying the semantic information among the T texts are thus re-merged according to the incomplete fourth translation, which improves the accuracy of text merging. Further, because the fifth text and the fifth translation are output only when both are complete texts, a more accurate translation can be output.
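The control flow of S113 to S117, in which the merged original text is translated only when it is complete and is re-merged when its translation is incomplete, can be sketched as a loop. The callbacks is_complete, translate and extend are hypothetical placeholders for the completeness analysis, the translation model and the re-merging of S116, and the max_rounds bound is an added assumption that guarantees termination. The application does not say what happens when the merged text itself is incomplete, so this sketch simply returns None in that case.

```python
from typing import Callable, Optional


def merge_translate_output(
    merged_text: str,
    is_complete: Callable[[str], bool],
    translate: Callable[[str], str],
    extend: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> Optional[tuple[str, str]]:
    """Output (text, translation) only when both the merged text and its translation are complete.

    merged_text is the fourth text from S113. extend(text, translation) is a hypothetical
    callback that returns a re-merged text (the fifth text of S116), or None if no text fits.
    """
    text = merged_text
    for _ in range(max_rounds):
        if not is_complete(text):
            return None  # translate only complete original text (S114, and before S117)
        translation = translate(text)
        if is_complete(translation):
            return text, translation  # S115 / S117
        new_text = extend(text, translation)  # S116: re-merge guided by the incomplete translation
        if new_text is None:
            return None
        text = new_text
    return None
```

Passing the completeness check, the translator and the re-merging step as callables keeps the gating logic separate from any particular parser or translation model.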

The image processing method provided in the embodiments of the present application may be executed by an image processing apparatus. In the embodiments of the present application, the image processing apparatus executing the image processing method is taken as an example to describe the apparatus provided by the embodiments of the present application.

As shown in FIG. 3, an embodiment of the present application provides an image processing apparatus 200, which may include an acquisition module 201, a processing module 202 and an output module 203. The acquisition module 201 may be configured to acquire N texts and target information included in the target picture, where the target information includes at least one of the following: a first completeness of the N texts and a second completeness of first translations corresponding to the N texts, and N is an integer greater than 1. The processing module 202 may be configured to merge S texts satisfying first semantic information among P texts to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1. The output module 203 may be configured to output the first text when both the first text and a second translation are complete texts, where the second translation is the text obtained by merging the translations corresponding to the S texts among third translations, and the third translations are incomplete translations determined from the first translations according to the second completeness.

Optionally, the first completeness includes the sentence completeness of a first target sentence in each of the N texts, and the second completeness includes the sentence completeness of a second target sentence in each of the first translations. The acquisition module 201 is specifically configured to extract the text included in the target picture to obtain the N texts, analyze the sentence completeness of the first target sentence based on the first semantic information, and analyze the sentence completeness of the second target sentence based on second semantic information. The first target sentence and the second target sentence each include at least one of the following: the first sentence in the text and the last sentence in the text. The first semantic information and the second semantic information each include at least one of the following: sentence structure information, sentence component information and phrase composition information.
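The application analyzes sentence completeness with sentence structure, sentence component and phrase composition information, which would require a syntactic parser. As a much cruder stand-in, the sketch below checks only surface cues on the first and last sentences of a text: a first sentence starting with a lowercase letter is treated as a continuation, and a last sentence without terminal punctuation is treated as unfinished. The heuristic and the function names are assumptions for illustration only.

```python
import re

TERMINAL = (".", "!", "?", "。", "！", "？")


def split_sentences(text: str) -> list[str]:
    """Split after terminal punctuation, keeping a trailing fragment that has none."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text.strip())
    return [p for p in parts if p]


def sentence_completeness(text: str) -> dict[str, bool]:
    """Heuristic stand-in for the completeness of the first and last target sentences."""
    sentences = split_sentences(text)
    if not sentences:
        return {"first": False, "last": False}
    first, last = sentences[0], sentences[-1]
    return {
        "first": not first[0].islower(),  # a lowercase start suggests a continued sentence
        "last": last.endswith(TERMINAL),  # no terminal mark suggests the sentence continues
    }
```

Under this heuristic, a text for which either flag is False would be treated as incomplete and would therefore belong to the P texts.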

Optionally, the image processing apparatus may further include a determination module. The determination module can be used to determine P texts from the N texts according to the sentence integrity of the first target sentence, and the P texts correspond to the third translation.

Optionally, the image processing apparatus may further include a determination module. The acquisition module 201 may also be configured to obtain, from the P texts and according to the first semantic information, at least two texts matching a second text in the P texts. The processing module 202 may also be configured to merge the second text with each of the at least two texts to obtain at least two merged texts. The determination module may be configured to determine the third text corresponding to a target merged text as the text to be merged with the second text, where the target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts, and the S texts include the second text and the third text.

Optionally, the image processing apparatus may further include a determination module. The determination module may be configured to determine Q adjacent texts from the P texts according to the distribution position of each of the P texts, where Q is an integer greater than or equal to S, and to determine S texts satisfying the first semantic information among the Q texts as the texts to be merged.

Optionally, the determination module may also be configured to determine the target arrangement order of the S texts according to the first semantic information. The processing module may be specifically configured to combine the S texts according to the target arrangement order to obtain the first text.

An embodiment of the present application provides an image processing apparatus. After acquiring multiple texts and target information in the target picture, the apparatus can merge at least two texts that satisfy semantic information among the incomplete texts determined according to the target information, to obtain a merged text. Therefore, when the text in the picture includes complex text such as text split across columns or pages, or deformed and irregular text, such text can be merged according to semantic information. Further, because the merged text is output only when both the merged text and its corresponding translation are complete texts, the resulting merged text reads more fluently. In this way, the ability to process text in pictures is improved.

The image processing apparatus in the embodiments of the present application may be an electronic device, or a component of an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. Exemplarily, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, an in-vehicle electronic device, a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), or may be a server, network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine or the like, which is not specifically limited in the embodiments of the present application.

The image processing apparatus in the embodiments of the present application may be an apparatus with an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.

The image processing apparatus provided in the embodiments of the present application can implement each process implemented by the method embodiments of FIG. 1 and FIG. 2; to avoid repetition, details are not described here again.

Optionally, as shown in FIG. 4, an embodiment of the present application further provides an electronic device 300, including a processor 301 and a memory 302. The memory 302 stores programs or instructions executable on the processor 301. When executed by the processor 301, the programs or instructions implement each step of the above image processing method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.

It should be noted that the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.

FIG. 5 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 400 includes, but is not limited to: a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, a processor 410 and other components.

Those skilled in the art can understand that the electronic device 400 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 410 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system. The structure of the electronic device shown in FIG. 5 does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown in the figure, combine certain components, or use a different component arrangement, and details are not described here again.

The processor 410 may be configured to acquire N texts and target information included in the target picture, where the target information includes at least one of the following: a first completeness of the N texts and a second completeness of first translations corresponding to the N texts, and N is an integer greater than 1; to merge S texts satisfying first semantic information among P texts to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and to output the first text when both the first text and a second translation are complete texts, where the second translation is the text obtained by merging the translations corresponding to the S texts among third translations, and the third translations are incomplete translations determined from the first translations according to the second completeness.

Optionally, the first completeness includes the sentence completeness of a first target sentence in each of the N texts, and the second completeness includes the sentence completeness of a second target sentence in each of the first translations. The processor 410 is specifically configured to extract the text included in the target picture to obtain the N texts, analyze the sentence completeness of the first target sentence based on the first semantic information, and analyze the sentence completeness of the second target sentence based on second semantic information. The first target sentence and the second target sentence each include at least one of the following: the first sentence in the text and the last sentence in the text. The first semantic information and the second semantic information each include at least one of the following: sentence structure information, sentence component information and phrase composition information.

Optionally, the processor 410 may be configured to determine P texts from the N texts according to the sentence completeness of the first target sentence, and the P texts correspond to the third translation.

Optionally, the processor 410 may also be configured to obtain, from the P texts and according to the first semantic information, at least two texts matching a second text in the P texts; to merge the second text with each of the at least two texts to obtain at least two merged texts; and to determine the third text corresponding to a target merged text as the text to be merged with the second text, where the target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts, and the S texts include the second text and the third text.

Optionally, the processor 410 may be configured to determine Q adjacent texts from the P texts according to the distribution position of each of the P texts, where Q is an integer greater than or equal to S, and to determine S texts satisfying the first semantic information among the Q texts as the texts to be merged.

Optionally, the processor 410 may also be configured to determine a target arrangement order of the S texts according to the first semantic information; and to combine the S texts according to the target arrangement order to obtain the first text.

An embodiment of the present application provides an electronic device. After acquiring multiple texts and target information in the target picture, the electronic device can merge at least two texts that satisfy semantic information among the incomplete texts determined from the multiple texts according to the target information, to obtain a merged text. Therefore, when the text in the picture includes complex text such as text split across columns or pages, or deformed and irregular text, such text can be merged according to semantic information. Further, because the merged text is output only when both the merged text and its corresponding translation are complete texts, the resulting merged text reads more fluently. In this way, the ability to process text in pictures is improved.

It should be understood that, in the embodiments of the present application, the input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042. The graphics processing unit 4041 processes image data of still pictures or videos obtained by an image capture device (such as a camera). The display unit 406 may include a display panel 4061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode display or the like. The user input unit 407 includes at least one of a touch panel 4071 and other input devices 4072. The touch panel 4071 is also called a touch screen and may include two parts: a touch detection apparatus and a touch controller. The other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse and a joystick, which are not described here again.

The memory 409 may be configured to store software programs and various data. The memory 409 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system and application programs or instructions required by at least one function (such as a sound playback function or an image playback function). In addition, the memory 409 may include volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM) or flash memory. The volatile memory may be random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) or direct Rambus RAM (DRRAM). The memory 409 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.

The processor 410 may include one or more processing units. Optionally, the processor 410 integrates an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may alternatively not be integrated into the processor 410.

An embodiment of the present application further provides a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement each process of the above image processing method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.

Wherein, the processor is the processor in the electronic device in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory ROM, a random access memory RAM, a magnetic disk or an optical disk, and the like.

An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run programs or instructions to implement each process of the above image processing method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.

It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system or a system-on-chip.

An embodiment of the present application provides a computer program product. The program product is stored in a storage medium and is executed by at least one processor to implement each process of the above image processing method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.

It should be noted that, in this document, the terms "comprise", "include" or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article or apparatus that includes the element. In addition, it should be pointed out that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed, and may also include performing functions in a substantially simultaneous manner or in reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted or combined. In addition, features described with reference to certain examples may be combined in other examples.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present application may be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described above, which are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present application and the scope of protection of the claims, all of which fall within the protection of the present application.

Claims

An image processing method, the method comprising:

Acquiring N texts and target information included in the target picture, the target information including at least one of the following: the first completeness of the N texts, the second completeness of the first translation corresponding to the N texts, N is an integer greater than 1;

Merging S texts satisfying first semantic information among P texts to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and both P and S are integers greater than 1;

In the case that both the first text and a second translation are complete texts, outputting the first text, where the second translation is a text obtained by merging the translations corresponding to the S texts among third translations, and the third translations are incomplete translations determined from the first translations according to the second completeness.
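As a non-authoritative illustration of the flow in claim 1, the following Python sketch assumes a crude completeness test (a text counts as complete if it does not start with a lowercase letter and ends with sentence-final punctuation) and takes the group of S texts as already chosen. The names `Block`, `is_complete` and `merge_and_check` are hypothetical and are not taken from the application.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Assumption: sentence-final punctuation used by the toy completeness test.
TERMINALS = (".", "!", "?", "。", "！", "？")


def is_complete(text: str) -> bool:
    """Hypothetical completeness test: starts like a sentence and ends like one."""
    t = text.strip()
    return bool(t) and not t[0].islower() and t.endswith(TERMINALS)


@dataclass
class Block:
    text: str         # one of the N texts recognized in the picture
    translation: str  # its first translation


def merge_and_check(blocks: list[Block], group: list[int]) -> Optional[str]:
    """Merge the S texts at indices `group` and return the first text, or None.

    The merged text is output only if both it and the merged translation
    (the second translation) are complete.
    """
    # P texts: indices of the incomplete texts among the N texts
    incomplete = {i for i, b in enumerate(blocks) if not is_complete(b.text)}
    if len(group) < 2 or not set(group) <= incomplete:
        return None
    first_text = " ".join(blocks[i].text.strip() for i in group)
    second_translation = " ".join(blocks[i].translation.strip() for i in group)
    if is_complete(first_text) and is_complete(second_translation):
        return first_text
    return None
```

For instance, with the blocks "The quick brown" / "El rápido zorro" and "fox jumps." / "marrón salta.", merging the two yields "The quick brown fox jumps." because both merged strings pass the toy test. Joining with a space is itself an assumption; Chinese fragments would normally be joined without one.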
The method according to claim 1, wherein the first completeness includes the sentence completeness of a first target sentence of each of the N texts, and the second completeness includes the sentence completeness of a second target sentence of each of the first translations;

The acquiring N texts included in a target picture and target information includes:

Extracting the text included in the target picture to obtain the N texts;

analyzing the sentence completeness of the first target sentence based on the first semantic information;

analyzing the sentence completeness of the second target sentence based on the second semantic information;

Wherein, the first target sentence and the second target sentence respectively include at least one of the following: the first sentence in the text, the last sentence in the text;

The first semantic information and the second semantic information respectively include at least one of the following items: sentence structure information, sentence component information, and phrase composition information.
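To illustrate claim 2, here is a minimal Python sketch that checks only the first and the last sentence of a block. It uses hypothetical heuristics in place of the sentence structure, sentence component and phrase composition analysis, which the application does not specify: a lowercase opening marks the first sentence as a likely continuation, and missing final punctuation or a trailing function word from a small hypothetical list marks the last sentence as cut off.

```python
from __future__ import annotations

import re

# Assumption: sentence-final punctuation for English and Chinese.
TERMINALS = (".", "!", "?", "。", "！", "？")
# Hypothetical stand-in for sentence component information: English words
# that normally cannot end a sentence.
DANGLING = {"a", "an", "the", "and", "or", "of", "to", "in", "with"}


def split_sentences(text: str) -> list[str]:
    """Split after sentence-final punctuation and drop empty pieces."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text.strip())
    return [p for p in parts if p]


def target_sentence_completeness(text: str) -> tuple[bool, bool]:
    """Return (first sentence complete at its start, last sentence complete at its end)."""
    sents = split_sentences(text)
    if not sents:
        return (False, False)
    first, last = sents[0], sents[-1]
    # A first sentence that opens in lowercase likely continues another block.
    head_ok = not first[0].islower()
    # A last sentence needs final punctuation and must not end on a dangling word.
    words = re.findall(r"\w+", last.lower())
    tail_ok = last.endswith(TERMINALS) and bool(words) and words[-1] not in DANGLING
    return (head_ok, tail_ok)
```

The same function could be applied to each first translation to obtain the second completeness, provided the heuristics suit the target language.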
The method according to claim 2, wherein after the analyzing the sentence completeness of the first target sentence based on the first semantic information, and before the merging the S texts satisfying the first semantic information among the P texts to obtain the first text, the method further comprises:

Determining the P texts from the N texts according to the sentence completeness of the first target sentence, the P texts corresponding to the third translations.
The method according to claim 1, wherein before the merging the S texts satisfying the first semantic information among the P texts to obtain the first text, the method further comprises:

Obtaining, from the P texts according to the first semantic information, at least two texts that match a second text among the P texts;

Merging the second text with the at least two texts respectively to obtain at least two merged texts;

determining the sentence perplexity of the at least two merged texts, the sentence perplexity being used to indicate the smoothness of sentences in the merged text;

Determining a third text as the text to be merged with the second text, where the third text is the one of the at least two texts that forms a target merged text, and the target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts;

Wherein, the S texts include the second text and the third text.
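Claim 4 selects the partner text by sentence perplexity but does not say which language model computes it. As an assumption-laden sketch, the Python below uses a toy bigram model with add-one (Laplace) smoothing, where perplexity is the exponential of the average negative log-probability per predicted token. `train_bigram`, `perplexity` and `pick_third_text` are hypothetical names, and a practical system might instead use a trained neural language model.

```python
from __future__ import annotations

import math
from collections import Counter


def train_bigram(corpus: list[str]):
    """Count bigrams for a toy add-one smoothed language model (an assumption)."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for line in corpus:
        toks = ["<s>"] + line.lower().split() + ["</s>"]
        history.update(toks[:-1])            # tokens used as a conditioning context
        bigrams.update(zip(toks, toks[1:]))
        vocab.update(toks[1:])               # tokens the model can predict
    return history, bigrams, len(vocab)


def perplexity(sentence: str, model) -> float:
    """exp of the average negative log-probability per predicted token."""
    history, bigrams, v = model
    toks = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(toks, toks[1:]):
        p = (bigrams[(prev, cur)] + 1) / (history[prev] + v)  # Laplace smoothing
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(toks) - 1))


def pick_third_text(second_text: str, candidates: list[str], model) -> str:
    """Merge the second text with each candidate and keep the candidate whose
    merged text (the target merged text) has the lowest perplexity."""
    return min(candidates, key=lambda c: perplexity(f"{second_text} {c}", model))
```

With a model trained on "the quick brown fox jumps" and "the lazy dog sleeps", the second text "the quick brown" is paired with "fox jumps" rather than "dog sleeps", because the unseen bigram "brown dog" lowers the probability of the alternative merge.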
The method according to claim 1, wherein before the merging the S texts satisfying the first semantic information among the P texts to obtain the first text, the method further comprises:

Determining, according to the distribution position of each of the P texts, Q texts with adjacent distribution positions from the P texts, where Q is an integer greater than or equal to S;

Determining S texts satisfying the first semantic information among the Q texts as texts to be merged.
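The position-based pre-selection in claim 5 can be pictured with the following Python sketch. It assumes each recognized text comes with an axis-aligned bounding box and treats two boxes as adjacent when one starts within a small vertical gap below the other and their horizontal extents overlap. `TextBox`, `max_gap` and the `satisfies` predicate (a stand-in for the first semantic information) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class TextBox:
    text: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # width
    h: float  # height


def is_adjacent(upper: TextBox, lower: TextBox, max_gap: float) -> bool:
    """Hypothetical rule: `lower` starts just below `upper` and overlaps it horizontally."""
    vertical_gap = lower.y - (upper.y + upper.h)
    overlap = min(upper.x + upper.w, lower.x + lower.w) - max(upper.x, lower.x)
    return 0 <= vertical_gap <= max_gap and overlap > 0


def adjacent_groups(p_boxes: list[TextBox], max_gap: float = 10.0) -> list[list[TextBox]]:
    """Split the P incomplete boxes into runs of vertically adjacent boxes (candidate Q texts)."""
    groups: list[list[TextBox]] = []
    current: list[TextBox] = []
    for box in sorted(p_boxes, key=lambda b: (b.y, b.x)):
        if current and is_adjacent(current[-1], box, max_gap):
            current.append(box)
        else:
            if len(current) >= 2:
                groups.append(current)
            current = [box]
    if len(current) >= 2:
        groups.append(current)
    return groups


def texts_to_merge(q_boxes: list[TextBox], satisfies: Callable[[str], bool]) -> list[TextBox]:
    """Keep the S boxes whose text passes a caller-supplied semantic check."""
    kept = [b for b in q_boxes if satisfies(b.text)]
    return kept if len(kept) >= 2 else []
```

Sorting by (y, x) and comparing each box only with the previous one keeps the sketch short, but it would break runs in multi-column layouts, where a layout-aware reading order would be needed.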
The method according to claim 5, wherein, after determining the S texts satisfying the first semantic information among the Q texts as texts to be merged, the method further comprises:

Determining a target arrangement order of the S texts according to the first semantic information;

The merging of the S texts satisfying the first semantic information among the P texts to obtain the first text includes:

Merging the S texts according to the target arrangement order to obtain the first text.
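Claim 6 orders the S texts before merging them but leaves the ordering criterion to the first semantic information. As a hedged Python sketch, the code below scores every permutation with a hypothetical heuristic and merges the texts in the best-scoring order. The heuristic rewards an uppercase opening, a punctuated ending, and continuation joins where a fragment without final punctuation is followed by one starting in lowercase.

```python
from __future__ import annotations

from itertools import permutations

# Assumption: English sentence-final punctuation.
TERMINALS = (".", "!", "?")


def order_score(seq: tuple[str, ...]) -> int:
    """Hypothetical score for how sentence-like a candidate order is."""
    score = 0
    if seq[0].lstrip()[:1].isupper():
        score += 1  # the merged text should open like a sentence
    if seq[-1].rstrip().endswith(TERMINALS):
        score += 1  # ...and close like one
    for left, right in zip(seq, seq[1:]):
        # left breaks off mid-sentence and right looks like its continuation
        if not left.rstrip().endswith(TERMINALS) and right.lstrip()[:1].islower():
            score += 1
    return score


def merge_in_order(texts: list[str]) -> str:
    """Pick the best-scoring arrangement of the S texts and join them."""
    if not texts:
        return ""
    best = max(permutations(texts), key=order_score)
    return " ".join(t.strip() for t in best)
```

Exhaustive search grows as S!, so it is only practical when S is small. Ties go to the earliest permutation generated, so the input order wins whenever it is among the best.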
An image processing device, comprising an acquisition module, a processing module and an output module;

The acquisition module is configured to acquire N texts included in a target picture and target information, the target information including at least one of the following: a first completeness of the N texts and a second completeness of first translations corresponding to the N texts, where N is an integer greater than 1;

The processing module is configured to merge S texts satisfying first semantic information among P texts to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and both P and S are integers greater than 1;

The output module is configured to output the first text in the case that both the first text and a second translation are complete texts, where the second translation is a text obtained by merging the translations corresponding to the S texts among third translations, and the third translations are incomplete translations determined from the first translations according to the second completeness.
The device according to claim 7, wherein the first completeness includes the sentence completeness of a first target sentence of each of the N texts, and the second completeness includes the sentence completeness of a second target sentence of each of the first translations;

The acquisition module is specifically configured to: extract the text included in the target picture to obtain the N texts; analyze the sentence completeness of the first target sentence based on the first semantic information; and analyze the sentence completeness of the second target sentence based on the second semantic information;

Wherein, the first target sentence and the second target sentence respectively include at least one of the following: the first sentence in the text, the last sentence in the text;

The first semantic information and the second semantic information respectively include at least one of the following items: sentence structure information, sentence component information, and phrase composition information.
The device according to claim 8, wherein the image processing device further comprises a determining module;

The determining module is configured to determine the P texts from the N texts according to the sentence completeness of the first target sentence, the P texts corresponding to the third translations.
The device according to claim 7, wherein the image processing device further comprises a determining module;

The acquisition module is further configured to obtain, from the P texts according to the first semantic information, at least two texts that match a second text among the P texts;

The processing module is further configured to merge the second text with the at least two texts respectively to obtain at least two merged texts;

The determining module is configured to determine a third text as the text to be merged with the second text, where the third text is the one of the at least two texts that forms a target merged text, and the target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts;

Wherein, the S texts include the second text and the third text.
The device according to claim 7, wherein the image processing device further comprises a determining module;

The determining module is configured to: determine, according to the distribution position of each of the P texts, Q texts with adjacent distribution positions from the P texts, where Q is an integer greater than or equal to S; and determine S texts satisfying the first semantic information among the Q texts as texts to be merged.
The device according to claim 11, wherein the determining module is further configured to determine a target arrangement order of the S texts according to the first semantic information;

The processing module is specifically configured to combine the S texts according to the target arrangement order to obtain the first text.
An electronic device, comprising a processor and a memory, the memory storing programs or instructions executable on the processor, wherein the programs or instructions, when executed by the processor, implement the steps of the image processing method according to any one of claims 1-6.
A readable storage medium, storing programs or instructions which, when executed by a processor, implement the steps of the image processing method according to any one of claims 1-6.
A computer program product, wherein the program product is executed by at least one processor to implement the image processing method according to any one of claims 1-6.
An electronic device configured to execute the image processing method according to any one of claims 1-6.
A chip, comprising a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run programs or instructions to implement the image processing method according to any one of claims 1-6.