WO2023103943A1 - Image processing method and apparatus, and electronic device - Google Patents

Image processing method and apparatus, and electronic device


Publication number
WO2023103943A1
Authority
WO
WIPO (PCT)
Prior art keywords
texts
text
sentence
target
translation
Prior art date
Application number
PCT/CN2022/136494
Other languages
French (fr)
Chinese (zh)
Inventor
刘池莉
Original Assignee
维沃移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 维沃移动通信有限公司
Publication of WO2023103943A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Definitions

  • The present application belongs to the technical field of communications, and in particular relates to an image processing method and apparatus, and an electronic device.
  • Electronic devices are used more and more widely; for example, an electronic device can recognize and process the text in a picture.
  • While recognizing a picture, the electronic device can merge multiple lines of text in the picture according to the physical position coordinates and the text layout of the text lines.
  • However, the electronic device may be unable to merge the text in the picture according to the physical position coordinates of the text lines and the text layout. As a result, the electronic device has a poor ability to process the text in the picture.
  • The purpose of the embodiments of the present application is to provide an image processing method and apparatus, and an electronic device, which can solve the problem that the electronic device has a poor ability to process the text in a picture.
  • In a first aspect, an embodiment of the present application provides an image processing method. The method includes: acquiring N texts included in a target picture and target information, where the target information includes at least one of the following: a first completeness of the N texts, and a second completeness of a first translation corresponding to the N texts, N being an integer greater than 1; merging S texts that satisfy first semantic information among P texts to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and outputting the first text when the first text and a second translation are both complete texts, where the second translation is the text obtained by merging the translations corresponding to the S texts in a third translation, and the third translation is an incomplete translation determined from the first translation according to the second completeness.
  • an embodiment of the present application provides an image processing apparatus, and the image processing apparatus includes: an acquisition module, a processing module, and an output module.
  • The acquisition module is configured to acquire N texts included in the target picture and target information, where the target information includes at least one of the following: the first completeness of the N texts, and the second completeness of the first translation corresponding to the N texts; N is an integer greater than 1.
  • The processing module is configured to merge S texts that satisfy the first semantic information among the P texts to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1.
  • The output module is configured to output the first text when the first text and the second translation are both complete texts, where the second translation is the text obtained by merging the translations corresponding to the S texts in the third translation, and the third translation is an incomplete translation determined from the first translation according to the second completeness.
  • An embodiment of the present application provides an electronic device. The electronic device includes a processor and a memory, the memory stores programs or instructions that can run on the processor, and the steps of the method in the first aspect above are implemented when the programs or instructions are executed by the processor.
  • an embodiment of the present application provides a readable storage medium, on which a program or an instruction is stored, and when the program or instruction is executed by a processor, the steps of the method in the above first aspect are implemented.
  • the embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the method in the first aspect above.
  • an embodiment of the present application provides a computer program product, the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the method described in the first aspect.
  • In the embodiments of the present application, N texts included in the target picture and target information are acquired, where the target information includes at least one of the following: the first completeness of the N texts, and the second completeness of the first translation corresponding to the N texts, N being an integer greater than 1; S texts satisfying the first semantic information among the P texts are merged to obtain the first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and the first text is output when the first text and the second translation are both complete texts, where the second translation is the text obtained by merging the translations corresponding to the S texts in the third translation, and the third translation is an incomplete translation determined from the first translation according to the second completeness.
  • FIG. 1 is a schematic diagram of an image processing method provided in an embodiment of the present application.
  • FIG. 2(a) is the first schematic diagram of an image processing interface provided by an embodiment of the present application.
  • FIG. 2(b) is the second schematic diagram of an image processing interface provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an image processing device provided in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of hardware of an electronic device provided by an embodiment of the present application.
  • an embodiment of the present application provides an image processing method, and the method includes the following S101 to S103.
  • S101: The image processing apparatus acquires N texts included in a target picture and target information.
  • The target information includes at least one of the following: the first completeness of the N texts, and the second completeness of the first translation corresponding to the N texts. N is an integer greater than 1.
  • the above-mentioned target picture may be any of the following: a picture taken by the electronic device, a screenshot saved by the electronic device, and an online picture obtained by the electronic device.
  • the target picture may include multiple texts.
  • the N texts are texts in the plurality of texts.
  • the language types of the above N texts may be Chinese, English, Korean, Japanese and so on.
  • Each of the N texts may be a text in the traditional sense, or a text line. This may be determined according to actual usage conditions and is not limited in this embodiment of the present application.
  • The text line may be an independent single text line, in which case the single text line can be treated as a text; or the text line may be one line of text within a text paragraph.
  • the text content contained in the target picture may be identified through the picture text recognition technology, and the text content may specifically include: the text contained in the target picture, and the coordinates of the text.
  • the above-mentioned first translation may include translations in one language type, or translations in multiple language types. Specifically, it may be determined according to actual usage conditions, which is not limited in this embodiment of the present application.
  • The first completeness and the second completeness are determined according to semantic information.
  • For details of the semantic information, reference may be made to the detailed description of the following embodiments; the details are not repeated here.
  • S102: The image processing apparatus merges S texts that satisfy the first semantic information among the P texts to obtain the first text.
  • The P texts are incomplete texts determined from the N texts according to the first completeness. Both P and S are integers greater than 1.
  • Scenario 1: the semantics of the P texts are incomplete.
  • Scenario 2: the first sentence or the last sentence of each of the P texts has a missing sentence structure.
  • Scenario 3: the sentence-ending word in each of the P texts cannot form a word on its own.
  • the first semantic information may include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
  • the sentence structure information may include at least one of the following: subject-predicate structure, verb-object structure, subject-verb-object structure, subject-verb-object definite complement structure, and the like.
  • the sentence component information may include at least one of the following: subject, predicate, object, attributive, adverbial, complement and so on.
  • The phrase composition information may include at least one of the following: sentence-beginning words, sentence-ending words, common words, phrases, and the like.
  • first semantic information may also include other information related to semantics, which is not limited in this embodiment of the present application.
  • The above description of the first semantic information is only an exemplary case, enumerated for the situation in which the N texts are Chinese texts.
  • When the N texts are in another language, the semantic information can be interpreted according to the semantic rules and syntax of that language, which is not limited in this embodiment of the present application.
  • In one possible situation, the P texts include only one group of texts conforming to the first semantic information, in which case the S texts are that group; in another possible situation, the P texts include multiple groups of texts conforming to the first semantic information, and the S texts are any one of those groups.
  • When the P texts include multiple groups of texts conforming to the first semantic information, the processing of each group may refer to the description of the S texts, which is not repeated in this embodiment of the present application.
  • S103: When the first text and the second translation are both complete texts, the image processing apparatus outputs the first text.
  • The second translation is the text obtained by merging the translations corresponding to the S texts in the third translation, and the third translation is an incomplete translation determined from the first translation according to the second completeness.
  • Example 1 take the image processing device as a mobile phone as an example.
  • As shown in FIG. 2(a), the picture contains the texts S1 to S8; FIG. 2(b) shows the translations 01 to 08 corresponding to S1 to S8, that is, the first translation.
  • The mobile phone can acquire the texts S1 to S8 included in the picture, together with the completeness of S1 to S8 and the completeness of the first translation. Since S2 to S8 are the incomplete texts among the eight texts, the mobile phone can merge S2, S3 and S4, which satisfy the first semantic information among S2 to S8, to obtain the merged text S9. Afterwards, when S9 and the translation obtained by merging the corresponding translations among 02 to 08 are both complete texts, the mobile phone outputs S9.
  • the image processing method provided in the embodiment of the present application may further include: the image processing apparatus determines that the first text is a complete text according to the first semantic information.
  • An embodiment of the present application provides an image processing method. After multiple texts in the target picture and the target information are obtained, the texts that satisfy the semantic information among the incomplete texts determined according to the target information can be merged to obtain a merged text. Therefore, when the text in the picture includes complex text such as column text, text split across pages, or deformed and irregular text, this complex text can still be merged according to semantic information. Further, since the merged text is output only when the merged text and its corresponding translation are both complete texts, the semantics of the resulting merged text are more fluent. In this way, the ability to process the text in the picture is improved.
  • The first completeness includes the sentence completeness of the first target sentence of each of the N texts, and the second completeness includes the sentence completeness of the second target sentence of each translation in the first translation; correspondingly, the above S101 may specifically be implemented through the following S101A to S101C.
  • S101A: The image processing apparatus extracts the text included in the target picture to obtain the N texts.
  • S101B: The image processing apparatus analyzes the sentence completeness of the first target sentence based on the first semantic information.
  • first semantic information may include at least one of the following items: sentence structure information, sentence component information, and phrase composition information.
  • The first target sentence may include at least one of the following: the first sentence in the text, and the last sentence in the text.
  • Analyzing the sentence completeness of the first target sentence based on the first semantic information may include the following two possible implementations:
  • Implementation 1: use the first semantic information as a preset rule to analyze the sentence completeness of the first target sentence.
  • According to first semantic information such as sentence structure, phrase composition and sentence components, all texts that may be incomplete are screened out.
  • For example, the first sentence of S6 is "a newly opened store in Dao". Analysis of the sentence component information shows that this first sentence lacks a subject, so S6 is considered incomplete.
  • For first semantic information such as sentence-beginning words, sentence-ending words and phrases, a vocabulary, a phrase table and a sentence-ending vocabulary can be constructed for each type of language, and a weight can be set for each word in the vocabulary; the weight can be set according to how frequently the word is used.
  • According to the phrase composition information, whether the text is complete can be determined by judging whether the last word of the last sentence of the text is a common sentence-ending word, or whether the first word and the last word of the text can each independently form a word or phrase.
  • For example, the last word of S2 is "Peng", and the vocabulary shows that the probability of "Peng" forming a word on its own and serving as a sentence-ending word is very low, so S2 is considered incomplete.
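  • The rule-based check in Implementation 1 can be illustrated with a minimal sketch. It assumes whitespace-tokenized toy texts and a hypothetical sentence-ending vocabulary `ending_weights` whose weights stand in for usage frequency; the function names, the threshold and the sample data are illustrative assumptions, not part of the application.

```python
def last_word(text):
    """Return the final whitespace-separated token, ignoring trailing punctuation."""
    tokens = text.strip().rstrip(".!?").split()
    return tokens[-1] if tokens else ""


def ending_is_complete(text, ending_weights, threshold=0.5):
    """Treat the ending as complete when the last word is a likely sentence-ending word.

    ending_weights is a hypothetical vocabulary mapping a word to a weight in [0, 1];
    words missing from the vocabulary get weight 0 (an assumption of this sketch).
    """
    return ending_weights.get(last_word(text), 0.0) >= threshold


def incomplete_texts(texts, ending_weights, threshold=0.5):
    """Return the names of the texts whose ending fails the rule, in input order."""
    return [
        name
        for name, text in texts.items()
        if not ending_is_complete(text, ending_weights, threshold)
    ]


# Toy data: "Peng" rarely ends a sentence, "shop" often does.
ending_weights = {"shop": 0.9, "Peng": 0.05}
texts = {"S1": "we went to the shop.", "S2": "yesterday I met my Peng"}
print(incomplete_texts(texts, ending_weights))
```

  • A real system would apply the same kind of check to sentence-beginning words and sentence components as well; this sketch covers only the sentence-ending rule.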
  • Implementation method 2 construct a semantic model corresponding to the first semantic information, input N texts into the semantic model, and analyze the sentence integrity of the first target sentence.
  • text data with features such as lexical structure, syntactic structure, and sentence ending words can be used to train a semantic model, and the first semantic information such as part of speech and syntactic structure of different types of languages can be set through the semantic model.
  • the semantic model can be directly used to judge whether the target sentence of the current text is complete.
  • The semantic model can at the same time output information such as the morphology and syntactic structure of the incomplete sentence and the sentence components that may be missing.
  • the embodiment of the present application does not limit the specific algorithm of the semantic model, as long as the corresponding model training data is constructed according to different types of languages.
  • S101C: The image processing apparatus analyzes the sentence completeness of the second target sentence based on the second semantic information.
  • The second semantic information may include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
  • the image processing method provided in this embodiment of the present application may further include the following S104.
  • S104: The image processing apparatus determines the P texts from the N texts according to the sentence completeness of the first target sentence.
  • Example 2: in combination with Example 1 above, the first semantic information yields the following analysis. The last word of the last sentence of S2 is "friend", which cannot serve as a sentence-ending word, so the last sentence is incomplete, that is, S2 is incomplete. The first word of the first sentence of S3 is "friend", which cannot serve as a sentence-beginning word, so the first sentence is incomplete, that is, S3 is incomplete. The first sentence of S4 is "a lot of things", which lacks a subject and a predicate, so the first sentence is incomplete, that is, S4 is incomplete. The last sentence of S5 is "I know", which lacks an object, so the last sentence is incomplete, that is, S5 is incomplete. The first sentence of S6 is "Tao", which lacks a subject, so the first sentence is incomplete, that is, S6 is incomplete. The last sentence of S7 is "cannot", which cannot serve as a sentence-ending word, so the last sentence is incomplete, that is, S7 is incomplete.
  • With the image processing method provided by this embodiment of the present application, the text included in the target picture can be extracted to obtain the N texts, the sentence completeness of the first target sentence can be analyzed based on the first semantic information, and the sentence completeness of the second target sentence can be analyzed based on the second semantic information. In other words, both the completeness of the N texts and the completeness of the translations of the N texts can be determined.
  • Since the P texts can be determined from the N texts according to the sentence completeness of the first target sentence, it is convenient to select texts satisfying the first semantic information from the P texts for merging.
  • the image processing method provided in the embodiment of the present application may further include the following S105 to S108.
  • S105: According to the first semantic information, the image processing apparatus acquires, from the P texts, at least two texts that match a second text among the P texts.
  • The second text is any one of the P texts.
  • For example, the second text is the text positioned foremost among the P texts.
  • The above S105 may specifically include: the image processing apparatus judges, according to the first semantic information, whether the second text among the P texts can be merged with any other text among the P texts, so as to obtain at least two texts that match the second text.
  • That at least two texts among the P texts match the second text means that the second text and each of the at least two texts satisfy the first semantic information.
  • S106: The image processing apparatus merges the second text with each of the at least two texts to obtain at least two merged texts.
  • the above S106 may include the following two specific possible implementation manners:
  • S107: The image processing apparatus determines the sentence perplexity of the at least two merged texts.
  • The sentence perplexity indicates how fluent the sentences in a merged text are.
  • Determining the sentence perplexity of the at least two merged texts essentially means determining the sentence perplexity of the merged sentence included in each of the at least two merged texts.
  • S108: The image processing apparatus determines the third text, which corresponds to the target merged text, as the text to be merged with the second text.
  • The target merged text is the text with the lowest sentence perplexity among the at least two merged texts.
  • The S texts include the second text and the third text.
  • That is, the second text and the third text can be merged.
  • With the image processing method provided by this embodiment of the present application, after at least two texts matching the second text are obtained from the P texts according to the first semantic information, the second text can be merged with each of the at least two texts to obtain at least two merged texts, and the sentence perplexity of the at least two merged texts can be determined.
  • The text that best matches the second text can therefore be selected from the at least two texts according to the sentence perplexity of the merged texts, which improves the correctness of the text merging.
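  • The perplexity-based selection in S106 to S108 can be sketched as follows. The application does not specify how sentence perplexity is computed; this sketch assumes a toy bigram language model with add-one smoothing trained on a tiny hypothetical corpus, and all function names and sample sentences are illustrative.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(sentence, unigrams, bigrams):
    """Perplexity of a sentence under the add-one-smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    vocab_size = len(unigrams)
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(tokens) - 1))


def pick_text_to_merge(second_text, candidates, unigrams, bigrams):
    """Merge second_text with each candidate and keep the lowest-perplexity candidate."""
    scored = [
        (perplexity(second_text + " " + candidate, unigrams, bigrams), candidate)
        for candidate in candidates
    ]
    return min(scored)[1]


# Hypothetical training corpus and candidates.
unigrams, bigrams = train_bigram(
    ["i know a newly opened shop", "my friend likes the shop"]
)
print(pick_text_to_merge("i know", ["likes the shop", "a newly opened shop"], unigrams, bigrams))
```

  • In practice the perplexity would more likely come from a trained language model for the relevant language; the selection step (take the candidate whose merged text has the lowest perplexity) stays the same.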
  • the image processing method provided in the embodiment of the present application may further include the following S109 and S110. That is, the above details can be realized through S110 to S112.
  • S109: The image processing apparatus determines Q adjacent texts from the P texts according to the distribution position of each of the P texts.
  • Q is an integer greater than or equal to S.
  • The order of the identifiers in an entry of the unmergeable list represents the actual merging order.
  • For example, S2_S7 in the unmergeable list means that the sentence following S2 is not S7, but it does not mean that the sentence following S7 cannot be S2.
  • S110: The image processing apparatus determines the S texts that satisfy the first semantic information among the Q texts as the texts to be merged.
  • At least two texts matching text 1 in the Q texts are obtained from the Q texts.
  • the merged text 1 is the text with the lowest sentence perplexity among at least two merged texts.
  • the S texts include text 1 and text 2.
  • When the S texts also include texts other than text 1 and text 2, steps (1) to (4) in the above embodiment can continue to be executed in a loop to determine the other texts that match text 1 and text 2.
  • In this way, the S texts satisfying the first semantic information can be obtained from the Q texts and determined as the texts to be merged.
  • Since the Q adjacent texts can be determined from the P texts according to the distribution position of each of the P texts, texts whose positions rule out merging can be excluded, which reduces unnecessary text merging operations on the electronic device. Further, since the S texts satisfying the first semantic information among the Q texts are determined as the texts to be merged, the first semantic information is applied only after the coarse screening by distribution position, so the merged text has higher semantic fluency.
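  • The position-based coarse screening and the ordered unmergeable list can be sketched as below. The application does not define how adjacency is measured; this sketch assumes axis-aligned bounding boxes and a hypothetical `max_gap` threshold, and represents the unmergeable list as a set of ordered pairs so that ("S2", "S7") blocks S7 from following S2 without blocking S2 from following S7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    name: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def gap(a, b):
    """Largest axis-aligned gap between two boxes (0 if they touch or overlap)."""
    dx = max(a.x - (b.x + b.width), b.x - (a.x + a.width), 0.0)
    dy = max(a.y - (b.y + b.height), b.y - (a.y + a.height), 0.0)
    return max(dx, dy)


def adjacent_candidates(current, boxes, max_gap, unmergeable):
    """Names of texts that may follow `current`: close enough and not ruled out."""
    return [
        box.name
        for box in boxes
        if box.name != current.name
        and gap(current, box) <= max_gap
        and (current.name, box.name) not in unmergeable
    ]


# Hypothetical layout: S2, S3 and S7 stacked in one column, S8 far away.
s2 = TextBox("S2", 0, 0, 100, 20)
s3 = TextBox("S3", 0, 25, 100, 20)
s7 = TextBox("S7", 0, 50, 100, 20)
s8 = TextBox("S8", 500, 500, 100, 20)
boxes = [s2, s3, s7, s8]
unmergeable = {("S2", "S7")}  # ordered: S7 cannot be the next sentence of S2

print(adjacent_candidates(s2, boxes, max_gap=40, unmergeable=unmergeable))
print(adjacent_candidates(s7, boxes, max_gap=40, unmergeable=unmergeable))
```

  • Only the candidates that survive this screening would then be checked against the first semantic information.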
  • the image processing method provided in this embodiment of the present application may further include the following S111.
  • the above S102 may specifically be implemented through the following S102A.
  • S111: The image processing apparatus determines a target arrangement order of the S texts according to the first semantic information.
  • The arrangement order of the texts can be determined according to the sentence component information and the phrase composition information.
  • S102A: The image processing apparatus merges the S texts according to the target arrangement order to obtain the first text.
  • Merging the S texts according to the target arrangement order essentially means merging the last sentence of one text with the first sentence of the next text for each pair of adjacent texts among the S texts, and repeating this until all S texts are merged into the first text.
  • As an example, take the first semantic information to be sentence structure information and sentence component information. Assume that the last sentence of text A is "I know", which is a subject-predicate structure, and the first sentence of text B is "Dao is a newly opened shop", which is a verb-object structure. According to the sentence structure information, sentence component information and phrase composition information, text A lacks an object, text B lacks a subject, and "zhi" and "dao" conform to the phrase composition information, so the arrangement order of text A and text B can be determined to be A_B. That is, text B is merged at the end of text A.
  • As another example, take the first semantic information to be phrase composition information. The last word of the last sentence of text C is "friend", and the first word of the first sentence of text D is "friend". According to the phrase composition information, "friend" in text C and "friend" in text D conform to the phrase composition information, so the arrangement order of text C and text D can be determined to be C_D. That is, text D is merged at the end of text C.
  • Since the image processing method provided by this embodiment of the present application can determine the target arrangement order of the S texts according to the first semantic information, the first text obtained by merging the S texts in that order has more complete semantics and is less prone to semantic contradictions.
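  • The ordering and merging in S111 and S102A can be sketched with a hypothetical phrase table of (ending fragment, beginning fragment) pairs that together form a word. The English fragments below are stand-ins for split words such as the "zhi" and "dao" example; the function name, the table and the texts are assumptions made for illustration.

```python
def order_and_merge(texts, phrase_table):
    """Chain texts so that each text's last token and the next text's first token
    form a known word, then concatenate them in that order.

    Returns None when no single unambiguous chain covers all texts.
    Assumes the texts are distinct and whitespace-tokenized.
    """
    first = {t: t.split()[0] for t in texts}
    last = {t: t.split()[-1] for t in texts}

    # successor[a] = b means b should be merged at the end of a.
    successor = {}
    for a in texts:
        for b in texts:
            if a != b and (last[a], first[b]) in phrase_table:
                successor[a] = b

    # The head of the chain is the only text that follows no other text.
    heads = [t for t in texts if t not in successor.values()]
    if len(heads) != 1:
        return None

    ordered = [heads[0]]
    while ordered[-1] in successor and successor[ordered[-1]] not in ordered:
        ordered.append(successor[ordered[-1]])
    if len(ordered) != len(texts):
        return None

    # The split word is rejoined by concatenating without a separator.
    return "".join(ordered)


phrase_table = {("kn", "ow"), ("fri", "end")}  # hypothetical split words
texts = ["ow a newly opened shop", "end and I kn", "this is my fri"]
print(order_and_merge(texts, phrase_table))
```

  • The sketch uses phrase composition information only; sentence structure and sentence component checks would add further constraints on which text may follow which.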
  • the image processing method provided in the embodiment of the present application may also include another possible implementation manner.
  • the method may also include the following S112 to S115.
  • M, T and L are all integers greater than 1;
  • the image processing device translates the fourth text to obtain a fourth translation.
  • the fourth translation may include translations in one language type, or translations in multiple language types.
  • the embodiment of the present application does not limit the number and language types of the fourth translations.
  • For example, the fourth text is a Chinese-type text and the fourth translation is an English-type translation; or the fourth text is an English-type text and the fourth translation includes a Chinese-type translation and a Korean-type translation.
  • the image processing device outputs the fourth text and the fourth translation.
  • the second text is Chinese text.
  • the Chinese text is translated to obtain an English translation. If the English translation is a complete text, the image processing device can output the Chinese text and the English translation.
  • With the image processing method provided by this embodiment of the present application, the fourth text is obtained and translated to obtain the fourth translation, and the fourth text and the fourth translation are output when the fourth translation is a complete text. In this way, on top of judging whether the merged fourth text is complete, the completeness of the fourth translation is also used to decide whether to output the fourth text, which improves the accuracy of paragraph merging.
  • Further, since the fourth translation can also be output, a translation with higher accuracy can be output in scenarios where the text in the target picture needs to be translated.
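  • The output decision described above (translate the merged text, then output only when the text and its translation are both complete) can be sketched as a small control-flow function. The `translate` and `is_complete` callables are hypothetical stand-ins for the translation model and the semantic completeness check; the toy versions below exist only to make the sketch runnable.

```python
def output_if_complete(merged_text, translate, is_complete):
    """Return (text, translation) when both are complete, otherwise None."""
    if not is_complete(merged_text):
        return None  # skip translation for an incomplete merged text
    translation = translate(merged_text)
    if not is_complete(translation):
        return None  # the translation shows that the merge is still incomplete
    return merged_text, translation


# Toy stand-ins: "complete" means ending with a period, "translation" is upper-casing.
def ends_with_period(text):
    return text.endswith(".")


print(output_if_complete("I know a newly opened shop.", str.upper, ends_with_period))
print(output_if_complete("I kn", str.upper, ends_with_period))
```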
  • the image processing method provided in the embodiment of the present application may further include the following S116 and S117.
  • S116: When the fourth translation is an incomplete translation, the image processing apparatus merges R texts among the T texts to obtain a fifth text.
  • R texts include paragraphs determined according to the semantic information of the fourth translation, and R is an integer greater than 1.
  • the above R texts may include all of the L texts, or some of the L texts, which are determined according to actual conditions, which is not limited in this embodiment of the present application.
  • the R texts are texts satisfying semantic information among the T texts.
  • When the fourth translation is an incomplete text, other texts satisfying the semantic information can be obtained from the T texts, and the fourth text can be merged with those other texts. It can be understood that the position at which the fourth text is merged with the other texts corresponds to the position of the semantically incomplete text in the fourth translation.
  • S117: When the fifth text and the fifth translation are both complete texts, the image processing apparatus outputs the fifth text and the fifth translation.
  • the above-mentioned fifth translation is the translation corresponding to the fifth text.
  • The completeness of the translation is the focus of picture translation. If the translation is incomplete, then even if the paragraph obtained by merging the original text in the picture (also called the original text paragraph) is complete, it is still necessary to merge text that satisfies the semantic information at the corresponding position of the original text paragraph, according to the position of the semantically incomplete text in the translation. The completeness of the translation is therefore judged again after translation to ensure the completeness of the final output text.
  • merging from the original text paragraphs can ensure the integrity of the original text paragraphs. Only when the original paragraph is a complete paragraph can an effective translation be obtained after the original paragraph is translated by the translation model. On the contrary, if only the translation is merged, it is difficult to obtain a translation that satisfies the semantic information.
  • the image processing method provided in the embodiment of the present application may further include: if the fifth text is a complete text, translating the fifth text to obtain a fifth translation.
  • the translation is performed only when the merged fifth text is a complete text, which avoids invalid translation operations when the merged text is incomplete and also saves computing resources of the electronic device.
  • if the fourth translation is an incomplete translation, the R texts among the T texts can be merged to obtain the fifth text. In this way, the R texts satisfying the semantic information among the T texts are re-merged according to the incomplete fourth translation, which improves the accuracy of text merging. Further, since the fifth text and the fifth translation are output only when both are complete texts, a translation with higher accuracy can be output.
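  • the re-merge flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the function name `merge_until_translation_complete` and the callables `is_complete` and `translate` are hypothetical, the candidate texts are assumed to have already been filtered by semantic information, and candidates are simply appended at the end instead of being inserted at the position indicated by the incomplete translation.

```python
from typing import Callable, Optional, Sequence, Tuple


def merge_until_translation_complete(
    fourth_text: str,
    candidates: Sequence[str],
    is_complete: Callable[[str], bool],
    translate: Callable[[str], str],
) -> Optional[Tuple[str, str]]:
    """Append candidates until the merged source and its translation are complete.

    Returns (fifth_text, fifth_translation), or None if no merge succeeds.
    """
    merged = fourth_text
    for candidate in candidates:
        merged = f"{merged} {candidate}"
        # Translate only when the merged source text is itself complete,
        # so no translation call is spent on an incomplete text.
        if not is_complete(merged):
            continue
        translation = translate(merged)
        if is_complete(translation):
            return merged, translation
    return None


# Toy stand-ins (assumptions): "complete" means "ends with a full stop",
# and "translation" is plain upper-casing.
def ends_with_period(text: str) -> bool:
    return text.rstrip().endswith(".")


def fake_translate(text: str) -> str:
    return text.upper()


result = merge_until_translation_complete(
    "The cat sat on", ["the mat.", "Unrelated."], ends_with_period, fake_translate
)
print(result)
```

  • in a real system the completeness check and the translation step would be backed by language models; they are passed in as callables here so that the control flow stays visible.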
  • the image processing method provided in the embodiment of the present application may be executed by an image processing device.
  • the method of image processing performed by the image processing device is taken as an example to illustrate the method provided by the embodiment of the present application.
  • an embodiment of the present application provides an image processing apparatus 200 , and the image processing apparatus may include an acquisition module 201 , a processing module 202 and an output module 203 .
  • the acquisition module 201 may be configured to acquire N texts and target information included in the target picture, where the target information includes at least one of the following items: the first completeness of the N texts, and the second completeness of the first translation corresponding to the N texts, N being an integer greater than 1.
  • the processing module 202 may be configured to merge S texts satisfying the first semantic information among the P texts to obtain the first text, the P texts are incomplete texts determined from the N texts according to the first completeness, Both P and S are integers greater than 1.
  • the output module 203 may be configured to output the first text when the first text and the second translation are both complete texts, where the second translation is the text obtained after merging the translations corresponding to the S texts in the third translation, and the third translation is an incomplete translation determined from the first translation according to the second completeness.
  • the first completeness includes the sentence completeness of the first target sentence in each of the N texts
  • the second completeness includes the sentence completeness of the second target sentence in each of the first translations.
  • the acquisition module 201 is specifically configured to extract the text included in the target picture to obtain N texts; to analyze the sentence completeness of the first target sentence based on the first semantic information; and to analyze the sentence completeness of the second target sentence based on the second semantic information; wherein the first target sentence and the second target sentence each include at least one of the following: the first sentence in the text, the last sentence in the text; and the first semantic information and the second semantic information each include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
  • the image processing apparatus may further include a determination module.
  • the determination module can be used to determine P texts from the N texts according to the sentence integrity of the first target sentence, and the P texts correspond to the third translation.
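  • as a rough illustration of selecting the P incomplete texts, the sketch below checks only the first and last sentence of each text. The heuristics are assumptions made for the example: a lowercase first letter is taken as a sign that the first sentence continues an earlier fragment, and missing terminal punctuation as a sign that the last sentence is cut off. The embodiment itself relies on sentence structure, sentence component, and phrase composition information, which would need a real parser or language model; the capitalization check in particular does not apply to languages such as Chinese. All function names are hypothetical.

```python
import re
from typing import List

TERMINAL = (".", "!", "?", "。", "!", "?")


def split_sentences(text: str) -> List[str]:
    # Split after terminal punctuation, keeping the punctuation with its sentence.
    parts = re.split(r"(?<=[.!?。!?])\s*", text.strip())
    return [part for part in parts if part]


def first_sentence_complete(text: str) -> bool:
    sentences = split_sentences(text)
    # Heuristic: a lowercase start suggests a continuation of another text.
    return bool(sentences) and not sentences[0][0].islower()


def last_sentence_complete(text: str) -> bool:
    sentences = split_sentences(text)
    # Heuristic: a sentence without terminal punctuation is treated as cut off.
    return bool(sentences) and sentences[-1].endswith(TERMINAL)


def select_incomplete(texts: List[str]) -> List[str]:
    """Return the texts whose first or last sentence looks incomplete."""
    return [
        text
        for text in texts
        if not (first_sentence_complete(text) and last_sentence_complete(text))
    ]


n_texts = [
    "Text recognition is useful.",
    "devices can merge lines",
    "It works across columns and",
    "All good here!",
]
p_texts = select_incomplete(n_texts)
print(p_texts)
```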
  • the obtaining module 201 may also be configured to obtain at least two texts from the P texts that match the second text in the P texts according to the first semantic information.
  • the processing module 202 may also be configured to merge the second text with each of the at least two texts to obtain at least two merged texts.
  • the determination module may be configured to determine the third text corresponding to the target merged text as the text to be merged with the second text, where the target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts; wherein the S texts include the second text and the third text.
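  • the perplexity-based choice of a merge partner can be sketched as follows. The embodiment does not say which language model computes the sentence perplexity, so the add-one-smoothed bigram model below is purely a toy assumption, and `pick_merge_partner` and `make_bigram_perplexity` are hypothetical names. The point is only the selection rule: merge the second text with each candidate and keep the candidate whose merged text has the lowest perplexity.

```python
import math
from collections import Counter
from typing import Callable, Sequence, Tuple


def pick_merge_partner(
    second_text: str,
    candidates: Sequence[str],
    perplexity: Callable[[str], float],
) -> Tuple[str, str]:
    """Return (third_text, target_merged_text) with the lowest perplexity."""
    merged = [(f"{second_text} {candidate}", candidate) for candidate in candidates]
    best_merged, best_candidate = min(merged, key=lambda pair: perplexity(pair[0]))
    return best_candidate, best_merged


def make_bigram_perplexity(corpus: str) -> Callable[[str], float]:
    """Build a toy add-one-smoothed bigram perplexity function from a corpus."""
    tokens = corpus.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words

    def perplexity(sentence: str) -> float:
        words = sentence.lower().split()
        pairs = list(zip(words, words[1:]))
        if not pairs:
            return float("inf")
        log_prob = sum(
            math.log((bigrams[pair] + 1) / (unigrams[pair[0]] + vocab_size))
            for pair in pairs
        )
        return math.exp(-log_prob / len(pairs))

    return perplexity


toy_perplexity = make_bigram_perplexity(
    "the device merges the text lines in the picture and outputs the merged text"
)
third_text, target_merged = pick_merge_partner(
    "the device merges", ["the text lines", "picture the in"], toy_perplexity
)
print(third_text, "|", target_merged)
```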
  • the determination module may be configured to determine Q adjacent texts from the P texts according to the distribution position of each of the P texts, where Q is an integer greater than or equal to S; and to determine, as the texts to be merged, the S texts among the Q texts that satisfy the first semantic information.
  • the determination module may also be configured to determine the target arrangement order of the S texts according to the first semantic information.
  • the processing module may be specifically configured to combine the S texts according to the target arrangement order to obtain the first text.
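  • the position-based narrowing and the choice of an arrangement order might look like the sketch below. Everything concrete in it is an assumption: the `TextBlock` bounding-box representation, the `max_gap` distance threshold used to decide adjacency, and the brute-force search over permutations scored by a caller-supplied function (for example a perplexity function). The embodiment only states that adjacency follows from the distribution positions and that the order follows from the first semantic information.

```python
import math
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class TextBlock:
    text: str
    x: float
    y: float
    width: float
    height: float


def box_gap(a: TextBlock, b: TextBlock) -> float:
    """Distance between two axis-aligned boxes (0 if they touch or overlap)."""
    dx = max(b.x - (a.x + a.width), a.x - (b.x + b.width), 0.0)
    dy = max(b.y - (a.y + a.height), a.y - (b.y + b.height), 0.0)
    return math.hypot(dx, dy)


def adjacent_blocks(
    anchor: TextBlock, blocks: Sequence[TextBlock], max_gap: float
) -> List[TextBlock]:
    """Return the blocks lying within max_gap of the anchor block."""
    return [b for b in blocks if b is not anchor and box_gap(anchor, b) <= max_gap]


def best_order(texts: Sequence[str], score: Callable[[str], float]) -> List[str]:
    """Try every arrangement and keep the one with the lowest score.

    Brute force, so only sensible for a handful of texts.
    """
    return list(min(permutations(texts), key=lambda order: score(" ".join(order))))


# Toy score (assumption): 0 for text that starts with a capital and ends
# with a full stop, 1 otherwise.
def toy_score(text: str) -> float:
    return 0.0 if text[:1].isupper() and text.endswith(".") else 1.0


a = TextBlock("The cat sat", 0, 0, 100, 20)
b = TextBlock("on the mat.", 0, 25, 100, 20)
c = TextBlock("Far away.", 500, 500, 100, 20)

neighbours = adjacent_blocks(a, [a, b, c], max_gap=10)
ordered = best_order([b.text, a.text], toy_score)
print([n.text for n in neighbours], ordered)
```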
  • the embodiment of the present application provides an image processing device. After multiple texts and target information in the target picture are acquired, at least one text that satisfies the semantic information among the incomplete texts determined according to the target information can be merged to obtain a merged text. Therefore, when the text in the picture includes complex text such as column text, paginated text, or deformed and irregular text, such complex text can be merged according to semantic information. Further, since the merged text is output only when the merged text and its corresponding translation are both complete texts, the semantics of the resulting merged text are more fluent. In this way, the ability to process the text in the picture is improved.
  • the image processing apparatus in the embodiment of the present application may be an electronic device, or may be a component in the electronic device, such as an integrated circuit or a chip.
  • the electronic device may be a terminal, or a device other than a terminal.
  • the electronic device can be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle electronic device, a mobile Internet device (Mobile Internet Device, MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), etc.
  • the image processing device in the embodiment of the present application may be a device with an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in this embodiment of the present application.
  • the image processing apparatus provided in the embodiment of the present application can realize various processes realized by the method embodiments in FIG. 1 and FIG. 2 , and details are not repeated here to avoid repetition.
  • the embodiment of the present application also provides an electronic device 300, including a processor 301 and a memory 302.
  • the memory 302 stores programs or instructions that can run on the processor 301.
  • when the programs or instructions are executed by the processor 301, the steps of the above image processing method embodiments are implemented and the same technical effects can be achieved. To avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 5 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • the electronic device 400 includes, but is not limited to, components such as a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, and a processor 410.
  • the electronic device 400 can also include a power supply (such as a battery) for supplying power to the various components. The power supply can be logically connected to the processor 410 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
  • the structure of the electronic device shown in FIG. 5 does not constitute a limitation to the electronic device.
  • the electronic device may include more or fewer components than shown in the figure, or combine certain components, or arrange the components differently, and details will not be repeated here.
  • the processor 410 may be configured to acquire N texts and target information included in the target picture, where the target information includes at least one of the following items: the first completeness of the N texts, and the second completeness of the first translation corresponding to the N texts, N being an integer greater than 1; to merge S texts satisfying the first semantic information among the P texts to obtain the first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and to output the first text when the first text and the second translation are both complete texts, where the second translation is the text obtained after merging the translations corresponding to the S texts in the third translation, and the third translation is an incomplete translation determined from the first translation according to the second completeness.
  • the first completeness includes the sentence completeness of the first target sentence in each of the N texts
  • the second completeness includes the sentence completeness of the second target sentence in each of the first translations.
  • the processor 410 is specifically configured to extract the text included in the target picture to obtain N texts; to analyze the sentence completeness of the first target sentence based on the first semantic information; and to analyze the sentence completeness of the second target sentence based on the second semantic information; wherein the first target sentence and the second target sentence each include at least one of the following: the first sentence in the text, the last sentence in the text; and the first semantic information and the second semantic information each include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
  • the processor 410 may be configured to determine P texts from the N texts according to the sentence completeness of the first target sentence, and the P texts correspond to the third translation.
  • the processor 410 may also be configured to obtain, from the P texts and according to the first semantic information, at least two texts that match the second text among the P texts; to merge the second text with each of the at least two texts to obtain at least two merged texts; and to determine the third text corresponding to the target merged text as the text to be merged with the second text, where the target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts; wherein the S texts include the second text and the third text.
  • the processor 410 may be configured to determine Q adjacent texts from the P texts according to the distribution position of each of the P texts, where Q is an integer greater than or equal to S; and to determine, as the texts to be merged, the S texts among the Q texts that satisfy the first semantic information.
  • the processor 410 may also be configured to determine a target arrangement order of the S texts according to the first semantic information; and to combine the S texts according to the target arrangement order to obtain the first text.
  • An embodiment of the present application provides an electronic device. After acquiring multiple texts and target information in the target picture, at least one text that satisfies semantic information among the incomplete texts determined according to the target information among the multiple texts can be combined, A merged text is obtained, so when the text in the picture includes complex texts such as column text, page text, or deformed and irregular text, these complex texts can be merged according to semantic information. Further, since the combined text is output only when the combined text and its corresponding translation are both complete texts, the semantics of the resulting combined text are more fluent. In this way, the ability to process the text in the picture is improved.
  • the input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042, and the graphics processing unit 4041 processes image data of still pictures or video obtained by an image capture device (such as a camera).
  • the display unit 406 may include a display panel 4061, and the display panel 4061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 407 includes at least one of a touch panel 4071 and other input devices 4072 .
  • the touch panel 4071 is also called a touch screen.
  • the touch panel 4071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 4072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be repeated here.
  • the memory 409 can be used to store software programs as well as various data.
  • the memory 409 can mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area can store an operating system and the application programs or instructions required by at least one function (such as a sound playing function or an image playback function).
  • the memory 409 may include volatile memory or non-volatile memory, or the memory 409 may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
  • the volatile memory can be random access memory (Random Access Memory, RAM), static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), or direct Rambus random access memory (Direct Rambus RAM, DRRAM).
  • the processor 410 may include one or more processing units; optionally, the processor 410 integrates an application processor and a modem processor, wherein the application processor mainly handles operations related to the operating system, the user interface, and application programs, and the modem processor (such as a baseband processor) mainly handles wireless communication signals. It can be understood that the modem processor may also not be integrated into the processor 410.
  • the embodiment of the present application also provides a readable storage medium on which a program or instruction is stored. When the program or instruction is executed by a processor, each process of the above image processing method embodiment is implemented and the same technical effects can be achieved. To avoid repetition, details are not repeated here.
  • the processor is the processor in the electronic device in the above embodiment.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory ROM, a random access memory RAM, a magnetic disk or an optical disk, and the like.
  • the embodiment of the present application further provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the processes of the above image processing method embodiments and achieve the same technical effects. To avoid repetition, details are not repeated here.
  • the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-a-chip.
  • the embodiment of the present application provides a computer program product. The program product is stored in a storage medium and is executed by at least one processor to implement the processes in the above image processing method embodiment and achieve the same technical effects. To avoid repetition, details are not repeated here.
  • the terms “comprise”, “include”, or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent in the process, method, article, or apparatus. Without further limitation, an element defined by the phrase “comprising a ...” does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.
  • the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed, and may also include performing functions in a substantially simultaneous manner or in reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The present application relates to the technical field of communications, and discloses an image processing method and apparatus, and an electronic device. The method comprises: obtaining N texts and target information comprised in a target image, the target information comprising at least one of first integrity of the N texts and second integrity of first translations corresponding to the N texts, N being an integer greater than 1; combining S texts satisfying first semantic information among P texts to obtain a first text, the P texts being incomplete texts determined from the N texts according to the first integrity, and both P and S being integers greater than 1; and under the condition that the first text and a second translation are both complete texts, outputting the first text, the second translation being a text obtained by combining translations corresponding to the S texts among third translations, and the third translations being incomplete translations determined from the first translations according to the second integrity.

Description

Image processing method and apparatus, and electronic device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202111509057.0, filed in China on December 10, 2021, the entire contents of which are hereby incorporated by reference.
Technical field
The present application belongs to the technical field of communications, and in particular relates to an image processing method and apparatus, and an electronic device.
Background
With the development of electronic device technology, electronic devices are used more and more widely. For example, an electronic device can recognize and process the text in a picture.
Currently, when a picture includes multiple lines of text, the electronic device can, while recognizing the picture, merge the multiple lines of text according to the physical position coordinates of the text lines and the text layout in the picture.
However, with this approach, when the text in the picture includes complex text such as column text, paginated text, or deformed and irregular text, the electronic device may be unable to merge the text in the picture according to the physical position coordinates of the text lines and the text layout. As a result, the electronic device has a poor ability to process the text in the picture.
Summary of the invention
The purpose of the embodiments of the present application is to provide an image processing method and apparatus, and an electronic device, which can solve the problem that an electronic device has a poor ability to process the text in a picture.
In a first aspect, an embodiment of the present application provides an image processing method. The method includes: acquiring N texts and target information included in a target picture, where the target information includes at least one of the following: a first completeness of the N texts, and a second completeness of first translations corresponding to the N texts, N being an integer greater than 1; merging S texts that satisfy first semantic information among P texts to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and outputting the first text when the first text and a second translation are both complete texts, where the second translation is a text obtained by merging the translations corresponding to the S texts among third translations, and the third translations are incomplete translations determined from the first translations according to the second completeness.
In a second aspect, an embodiment of the present application provides an image processing apparatus. The image processing apparatus includes an acquisition module, a processing module, and an output module. The acquisition module is configured to acquire N texts and target information included in a target picture, where the target information includes at least one of the following: a first completeness of the N texts, and a second completeness of first translations corresponding to the N texts, N being an integer greater than 1. The processing module is configured to merge S texts that satisfy first semantic information among P texts to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1. The output module is configured to output the first text when the first text and a second translation are both complete texts, where the second translation is a text obtained by merging the translations corresponding to the S texts among third translations, and the third translations are incomplete translations determined from the first translations according to the second completeness.
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes a processor and a memory, the memory stores programs or instructions that can run on the processor, and the steps of the method in the first aspect are implemented when the programs or instructions are executed by the processor.
In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a program or instruction is stored, and the steps of the method in the first aspect are implemented when the program or instruction is executed by a processor.
In a fifth aspect, an embodiment of the present application provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the method in the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product. The program product is stored in a storage medium and is executed by at least one processor to implement the method described in the first aspect.
In the embodiments of the present application, N texts and target information included in a target picture are acquired, where the target information includes at least one of the following: a first completeness of the N texts, and a second completeness of first translations corresponding to the N texts, N being an integer greater than 1; S texts that satisfy first semantic information among P texts are merged to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and the first text is output when the first text and a second translation are both complete texts, where the second translation is a text obtained by merging the translations corresponding to the S texts among third translations, and the third translations are incomplete translations determined from the first translations according to the second completeness. With this solution, after multiple texts and target information in the target picture are acquired, at least one text that satisfies the semantic information among the incomplete texts determined from the multiple texts according to the target information can be merged to obtain a merged text. Therefore, when the text in the picture includes complex text such as column text, paginated text, or deformed and irregular text, such complex text can be merged according to semantic information. Further, since the merged text is output only when both the merged text and its corresponding translation are complete texts, the semantics of the resulting merged text are more fluent. In this way, the ability to process the text in the picture is improved.
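The overall flow summarized above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the claimed method: `process_picture_texts`, `is_complete`, and `translate` are hypothetical names, all incomplete texts are merged by plain concatenation in their given order, and the same toy completeness check is applied to source texts and translations alike.

```python
from typing import Callable, Optional, Sequence


def process_picture_texts(
    texts: Sequence[str],
    translate: Callable[[str], str],
    is_complete: Callable[[str], bool],
) -> Optional[str]:
    """Return the merged first text, or None when nothing can be output."""
    # Determine the incomplete texts (the "P texts") from the N texts.
    incomplete = [text for text in texts if not is_complete(text)]
    if len(incomplete) < 2:
        return None
    # Merge the texts (assumption: simple concatenation in the given order).
    first_text = " ".join(incomplete)
    # Second translation: the merged translations of the merged texts.
    second_translation = " ".join(translate(text) for text in incomplete)
    # Output only when both the merged text and its translation are complete.
    if is_complete(first_text) and is_complete(second_translation):
        return first_text
    return None


# Toy stand-ins (assumptions): complete means "starts with a capital letter
# and ends with a full stop"; translation is plain upper-casing.
def toy_is_complete(text: str) -> bool:
    return text[:1].isupper() and text.rstrip().endswith(".")


def toy_translate(text: str) -> str:
    return text.upper()


output = process_picture_texts(
    ["A full sentence.", "The cat sat", "on the mat."],
    toy_translate,
    toy_is_complete,
)
print(output)
```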
Brief description of the drawings
FIG. 1 is a schematic diagram of an image processing method provided in an embodiment of the present application;
FIG. 2(a) is the first schematic diagram of an image processing interface provided in an embodiment of the present application;
FIG. 2(b) is the second schematic diagram of an image processing interface provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of the hardware of an electronic device provided in an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some of the embodiments of the present application, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application fall within the protection scope of this application.
The terms "first", "second", and the like in the specification and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein. The objects distinguished by "first", "second", and the like are generally of one type, and the number of objects is not limited; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally means that the associated objects are in an "or" relationship.
The image processing method and apparatus, and the electronic device provided in the embodiments of the present application are described in detail below through specific embodiments and their application scenarios with reference to the accompanying drawings.
As shown in FIG. 1, an embodiment of the present application provides an image processing method, and the method includes the following S101 to S103.
S101. The image processing apparatus acquires N texts included in a target image, and target information.
The target information includes at least one of the following: a first completeness of the N texts, and a second completeness of a first translation corresponding to the N texts. N is an integer greater than 1.
Optionally, the target image may be any one of the following: an image captured by the electronic device, a screenshot saved by the electronic device, or an online image obtained by the electronic device.
Optionally, in this embodiment of the present application, the target image may include multiple texts. The N texts are texts among the multiple texts.
Optionally, the language of the N texts may be Chinese, English, Korean, Japanese, or the like.
In addition, each of the N texts may be a text in the conventional sense, or may be a text line. This may be determined according to actual usage, and is not limited in this embodiment of the present application.
Further, when one of the N texts is a text line, the text line may be an independent single text line, in which case the single text line may be treated as one text; alternatively, the text line may be one text line within a text paragraph.
Optionally, in this embodiment of the present application, the text content included in the target image may be recognized by using image text recognition technology. The text content may specifically include the texts included in the target image and the coordinates of the texts.
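As an illustration of the recognition result described above (each text together with its coordinates), the following is a minimal sketch in Python. The class name RecognizedText, its fields, and the reading-order rule are assumptions made for illustration; the application does not specify a data layout or any particular recognition library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedText:
    """One text (or text line) recognized in the image, with its coordinates."""
    text_id: str   # hypothetical label, e.g. "S1"
    content: str   # recognized characters
    x: int         # left edge of the bounding box, in pixels
    y: int         # top edge of the bounding box, in pixels
    width: int
    height: int


def in_reading_order(texts):
    """Sort top-to-bottom, then left-to-right (a simplifying assumption)."""
    return sorted(texts, key=lambda t: (t.y, t.x))


# Hypothetical recognition output, deliberately out of order.
recognized = [
    RecognizedText("S2", "second line", x=10, y=60, width=200, height=20),
    RecognizedText("S1", "first line", x=10, y=20, width=200, height=20),
]
ordered_ids = [t.text_id for t in in_reading_order(recognized)]
```

Keeping the coordinates next to each text is what later allows the distribution position of a text to be used when deciding which texts may be merged.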
Optionally, the first translation may include a translation in one language or translations in multiple languages. This may be determined according to actual usage, and is not limited in this embodiment of the present application.
Optionally, the first completeness and the second completeness are determined according to semantic information. For details, refer to the detailed description in the following embodiments; details are not repeated here.
S102. The image processing apparatus merges S texts that satisfy first semantic information among P texts, to obtain a first text.
The P texts are incomplete texts determined from the N texts according to the first completeness. P and S are both integers greater than 1.
Optionally, determining that P texts among the N texts are incomplete texts may cover the following scenarios:
Scenario 1: The semantics of the P texts are incomplete.
Scenario 2: The sentence structure of the first sentence or the last sentence of each of the P texts is incomplete.
Scenario 3: The sentence-ending word of each of the P texts cannot stand alone as a word.
It should be noted that all three scenarios use semantic information to determine that P texts among the N texts are incomplete texts. These three scenarios are merely examples provided in this embodiment of the present application. Determining, through semantic information, that P texts among the N texts are incomplete texts may also be implemented in other ways, which is not limited in this embodiment of the present application.
Optionally, the first semantic information may include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
For example, take the case in which the first semantic information is sentence structure information. The sentence structure information may include at least one of the following: a subject-predicate structure, a verb-object structure, a subject-predicate-object structure, a subject-predicate-object-attributive-adverbial-complement structure, and the like.
For example, take the case in which the first semantic information is sentence component information. The sentence component information may include at least one of the following: a subject, a predicate, an object, an attributive, an adverbial, a complement, and the like.
For example, take the case in which the first semantic information is phrase composition information. The phrase composition information may include at least one of the following: sentence-initial words, sentence-ending words, common words, word groups, phrases, and the like.
It should be noted that the above embodiments are merely examples of the first semantic information. The first semantic information may also include other semantics-related information, which is not limited in this embodiment of the present application.
In addition, the above description of the first semantic information only lists possible examples for the case in which the N texts are Chinese texts. When the N texts are in another language, the semantic information may be interpreted according to the semantic rules or grammar of that language, which is not limited in this embodiment of the present application.
Optionally, in this embodiment of the present application, in one possible case, the P texts include only one group of texts that conform to the first semantic information, that is, the S texts are that group of texts; in another possible case, the P texts include multiple groups of texts that conform to the first semantic information, and the S texts are any one of the multiple groups.
Further, when the P texts include multiple groups of texts that conform to the first semantic information, for how to merge the multiple groups, reference may be made to the detailed description of the S texts; details are not repeated in this embodiment of the present application.
S103. When both the first text and a second translation are complete texts, the image processing apparatus outputs the first text.
The second translation is a text obtained by merging the translations, in a third translation, that correspond to the S texts. The third translation is an incomplete translation determined from the first translation according to the second completeness.
Example 1: take the case in which the image processing apparatus is a mobile phone. As shown in FIG. 2(a), the mobile phone displays an image that includes texts S1 to S8; FIG. 2(b) shows the translations 01 to 08 corresponding to S1 to S8, that is, the first translation. The mobile phone can acquire the texts S1 to S8 included in the image, the completeness of S1 to S8, and the completeness of the first translation. Because S2 to S8 among these eight texts are incomplete texts, the mobile phone can merge S2, S3 and S4, which satisfy the first semantic information among S2 to S8, to obtain S9. Then, when both S9 and the translation obtained by merging, from among 02 to 08 (which correspond to S2 to S8), the translations that correspond to S2, S3 and S4 are complete texts, the mobile phone outputs S9.
Further, S5 and S6, and S7 and S8, among these eight texts also satisfy the first semantic information. The above process may be performed cyclically to merge S5 with S6 and S7 with S8 respectively, and finally to output S10, obtained by merging S5 and S6, and S11, obtained by merging S7 and S8.
In this way, through the above process, multiple groups of texts in the image that satisfy the semantic information can be merged, thereby completing the processing of the texts in the image.
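The merge-then-check flow of S102 and S103 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names, the punctuation-based completeness check, and all sample strings and translations are hypothetical, and the grouping of text identifiers is assumed to have been decided already by the semantic analysis.

```python
TERMINAL_PUNCTUATION = "。！？.!?"


def is_complete(text):
    """Toy completeness check (an assumption): the text ends with sentence-final punctuation."""
    return text.rstrip().endswith(tuple(TERMINAL_PUNCTUATION))


def merge_and_output(texts, translations, groups):
    """Merge each group of text ids and keep the result only if S103's condition holds.

    texts and translations map a text id to its content; each group lists the
    ids to merge, already in merge order.
    """
    outputs = []
    for group in groups:
        first_text = "".join(texts[i] for i in group)                 # S102: merge the texts
        second_translation = "".join(translations[i] for i in group)  # merge their translations
        # S103: output only when both the merged text and the merged translation are complete.
        if is_complete(first_text) and is_complete(second_translation):
            outputs.append(first_text)
    return outputs


# Hypothetical fragments; only the split characters follow the examples in the description.
texts = {"S2": "我有一个朋", "S3": "友。", "S5": "我知", "S6": "道一家"}
translations = {"S2": "I have a fri", "S3": "end.", "S5": "I kn", "S6": "ow a"}
result = merge_and_output(texts, translations, groups=[["S2", "S3"], ["S5", "S6"]])
```

In this sketch the second group is withheld because its merged text still does not end a sentence, which mirrors the requirement that both the merged text and its merged translation be complete before output.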
Optionally, after S102 and before S103, the image processing method provided in this embodiment of the present application may further include: the image processing apparatus determines, according to the first semantic information, that the first text is a complete text.
Further, for how to determine whether the first text is a complete text, reference may be made to the detailed description of determining the completeness of the N texts in the following embodiments; details are not repeated here.
An embodiment of the present application provides an image processing method. After multiple texts in a target image and target information are acquired, at least one text that satisfies semantic information, among the incomplete texts determined from the multiple texts according to the target information, can be merged to obtain a merged text. Therefore, when the texts in the image include complex texts such as multi-column text, paginated text, or distorted and irregular text, these complex texts can be merged according to the semantic information. Further, because the merged text is output only when both the merged text and its corresponding translation are complete texts, the semantics of the resulting merged text are more fluent. In this way, the capability of processing texts in an image is improved.
Optionally, the first completeness includes the sentence completeness of a first target sentence of each of the N texts, and the second completeness includes the sentence completeness of a second target sentence of each translation in the first translation. Correspondingly, S101 may specifically be implemented through the following S101A to S101C.
S101A. The image processing apparatus extracts the texts included in the target image, to obtain the N texts.
S101B. The image processing apparatus analyzes the sentence completeness of the first target sentence based on the first semantic information.
The first semantic information may include at least one of the following: sentence structure information, sentence component information, and phrase composition information. The first target sentence may include at least one of the following: the first sentence in a text, and the last sentence in a text.
Optionally, analyzing the sentence completeness of the first target sentence based on the first semantic information may be implemented in the following two possible ways:
Implementation 1: The first semantic information is used as a preset rule to analyze the sentence completeness of the first target sentence.
For example, judgment is made on first semantic information such as sentence structure, phrase composition, and sentence components, and all texts that may be incomplete are screened out. Take FIG. 2(a) as an example. The first sentence of S6 is "道一家新开业的店" (the character "道", which is the second character of "知道", "to know", followed by "a newly opened store"). Analysis based on the sentence component information shows that the first sentence of S6 lacks a subject, so S6 is considered incomplete.
For example, judgment is based on first semantic information such as sentence-initial words, sentence-ending words, and phrases. Word lists, phrase lists, and sentence-ending word lists may be built for different languages, and a weight may be set for each word in a word list; the weight may be set according to how frequently the word is used. In this way, based on the phrase composition information, it can be determined whether the last character of the last sentence of a text is a common sentence-ending word, or whether the first word and the last word of the text can each stand alone as a word or phrase, so as to determine whether the text (a text line or paragraph) is complete. As shown in FIG. 2(a), the last word of S2 is "朋" (the first character of "朋友", "friend"). The word list shows that the probability that "朋" stands alone as a word and serves as an ending word is very low, so S2 is considered incomplete.
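The weighted sentence-ending word list of Implementation 1 can be sketched as follows. The weights, the threshold, the treatment of unknown characters, and the sample sentences are all assumptions chosen for illustration; the application only states that weights may be set according to how frequently a word is used.

```python
# Hypothetical weights: how plausible each character is as the last character of a sentence.
SENTENCE_END_WEIGHTS = {"了": 0.9, "吗": 0.9, "的": 0.6, "店": 0.5, "朋": 0.01, "知": 0.05}
MIN_END_WEIGHT = 0.1            # assumed threshold
TERMINAL_PUNCTUATION = "。！？.!?"


def last_sentence_looks_complete(text):
    """Judge completeness from the weight of the final character of the text.

    Trailing punctuation is ignored, and characters missing from the word
    list get weight 0.0 (a deliberately strict assumption for this sketch).
    """
    core = text.rstrip().rstrip(TERMINAL_PUNCTUATION)
    if not core:
        return False
    return SENTENCE_END_WEIGHTS.get(core[-1], 0.0) >= MIN_END_WEIGHT


s2_complete = last_sentence_looks_complete("我有一个朋")          # ends with "朋"
store_complete = last_sentence_looks_complete("我知道一家新开业的店")  # ends with "店"
```

A rule of this kind is cheap to evaluate, but its quality depends entirely on the word list, which is one reason the description also offers a trained semantic model as a second implementation.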
Implementation 2: A semantic model corresponding to the first semantic information is constructed, the N texts are input into the semantic model, and the sentence completeness of the first target sentence is analyzed.
Specifically, the semantic model may be trained on text data carrying features such as lexical features, syntactic structures, and sentence-ending words, and the first semantic information, such as parts of speech and syntactic structures, of different languages may be set through the semantic model. In this way, the semantic model can be used directly to determine whether the target sentence of the current text is complete.
It should be noted that, when the semantic model is constructed, it may also output information such as the lexical and syntactic structure of an incomplete sentence and the sentence components that may be missing. This embodiment of the present application does not limit the specific algorithm of the semantic model, as long as corresponding model training data is constructed for each language.
S101C. The image processing apparatus analyzes the sentence completeness of the second target sentence based on second semantic information.
The second semantic information includes at least one of the following: sentence structure information, sentence component information, and phrase composition information.
Optionally, for the specific implementation of analyzing the sentence completeness of the second target sentence based on the second semantic information, reference may be made to the detailed description of S101B in the above embodiment; details are not repeated in this embodiment of the present application.
Optionally, after S101B and before S102, the image processing method provided in this embodiment of the present application may further include the following S104.
S104. The image processing apparatus determines the P texts from the N texts according to the sentence completeness of the first target sentence.
The P texts correspond to the third translation.
Example 2: continuing Example 1, according to the first semantic information, the last word of the last sentence of S2 is "朋", which cannot serve as a sentence-ending word, so the last sentence is incomplete, that is, S2 is incomplete. The first word of the first sentence of S3 is "友", which cannot serve as a sentence-initial word, so the first sentence is incomplete, that is, S3 is incomplete. The first sentence of S4 is "了很多东西" (the particle "了" followed by "a lot of things"), which lacks a subject and a predicate, so the first sentence is incomplete, that is, S4 is incomplete. The last sentence of S5 is "我知" ("I" plus the first character of "知道", "to know"), which lacks an object, so the last sentence is incomplete, that is, S5 is incomplete. The first sentence of S6 is "道...", which lacks a subject, so the first sentence is incomplete, that is, S6 is incomplete. The last sentence of S7 is "不能" ("cannot"), which cannot serve as a sentence-ending word, so the last sentence is incomplete, that is, S7 is incomplete. The first sentence of S8 is "和" ("and"), which cannot serve as a sentence-initial word, so the first sentence is incomplete, that is, S8 is incomplete. In this way, the incomplete paragraphs S2 to S8 can be determined from S1 to S8.
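The selection in S104 can be sketched as a filter over per-text analysis results. In the sketch below, the record type and function name are hypothetical. Only the sentence that Example 2 names for each text is marked incomplete; the other end of each text, and both ends of S1, are marked complete as placeholder assumptions because the description does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceCompleteness:
    first_sentence_complete: bool
    last_sentence_complete: bool


# Findings taken from Example 2; unstated values default to True (an assumption).
analysis = {
    "S1": SentenceCompleteness(True, True),
    "S2": SentenceCompleteness(True, False),   # last sentence ends with "朋"
    "S3": SentenceCompleteness(False, True),   # first sentence starts with "友"
    "S4": SentenceCompleteness(False, True),   # first sentence lacks subject and predicate
    "S5": SentenceCompleteness(True, False),   # last sentence "我知" lacks an object
    "S6": SentenceCompleteness(False, True),   # first sentence lacks a subject
    "S7": SentenceCompleteness(True, False),   # last sentence ends with "不能"
    "S8": SentenceCompleteness(False, True),   # first sentence starts with "和"
}


def select_incomplete(results):
    """Return ids of texts whose first or last sentence is incomplete (the P texts)."""
    return [
        text_id
        for text_id, r in results.items()
        if not (r.first_sentence_complete and r.last_sentence_complete)
    ]


p_texts = select_incomplete(analysis)
```

Recording which end of a text is incomplete, rather than a single flag, also hints at whether the text needs a predecessor or a successor when candidates are matched later.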
In the image processing method provided in this embodiment of the present application, the texts included in the target image can be extracted to obtain the N texts, the sentence completeness of the first target sentence can be analyzed based on the first semantic information, and the sentence completeness of the second target sentence can be analyzed based on the second semantic information. In this way, the completeness of the N texts and the completeness of the translations of the N texts can be determined.
Further, because the P texts can be determined from the N texts according to the sentence completeness of the first target sentence, it is convenient to subsequently select, from the P texts, the texts that satisfy the first semantic information for merging.
Optionally, after S101 and before S102, the image processing method provided in this embodiment of the present application may further include the following S105 to S108.
S105. The image processing apparatus acquires, from the P texts according to the first semantic information, at least two texts that match a second text among the P texts.
Optionally, for a description of the first semantic information, reference may be made to the detailed description in the above embodiments; details are not repeated in this embodiment of the present application.
Optionally, the second text is any one of the P texts. For example, the second text is the text whose distribution position is foremost among the P texts.
Optionally, S105 may specifically include: the image processing apparatus determines, according to the first semantic information, whether the second text among the P texts can be merged with any other text among the P texts, so as to acquire the at least two texts that match the second text.
Further, "at least two texts that match a second text among the P texts" in S105 means that the second text and each of the at least two texts together satisfy the first semantic information.
S106. The image processing apparatus merges the second text with each of the at least two texts, to obtain at least two merged texts.
Optionally, S106 may be implemented in the following two possible ways:
(1) The second text is directly merged with each of the at least two texts, to obtain at least two merged texts.
(2) The last sentence of the second text is merged with the first sentence of each of the at least two texts, to obtain at least two merged sentences.
S107. The image processing apparatus determines the sentence perplexity of the at least two merged texts.
The sentence perplexity indicates how fluent the sentences in a merged text are.
It can be understood that determining the sentence perplexity of the at least two merged texts essentially means determining the sentence perplexity of the merged sentence included in each of the at least two merged texts.
It should be noted that the lower the sentence perplexity, the more fluent the sentence and thus the higher the semantic correctness; conversely, the higher the sentence perplexity, the less fluent the sentence and thus the lower the semantic correctness.
S108. The image processing apparatus determines a third text corresponding to a target merged text as the text to be merged that corresponds to the second text.
The target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts. The S texts include the second text and the third text.
It should be noted that, because a lower sentence perplexity means a more fluent sentence, after the third text corresponding to the lowest sentence perplexity is determined as the text to be merged that corresponds to the second text, the second text and the third text can be merged.
In the image processing method provided in this embodiment of the present application, after at least two texts that match the second text among the P texts are acquired from the P texts according to the first semantic information, the second text can be merged with each of the at least two texts to obtain at least two merged texts, and the sentence perplexity of the at least two merged texts can be determined. Therefore, the text to be merged that better matches the second text can be selected from the at least two texts according to the perplexity of the at least two merged texts, which improves the correctness of text merging.
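Steps S106 to S108 can be sketched as follows. The application does not specify how sentence perplexity is computed, so this sketch substitutes a toy character-bigram language model with add-one smoothing as a stand-in; the function names, the three-sentence training corpus, and the candidate fragments are all hypothetical.

```python
import math
from collections import Counter

START = "<s>"


def train_bigram_model(corpus):
    """Count character bigrams and their left contexts over a list of sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        chars = [START] + list(sentence)
        vocab.update(chars)
        contexts.update(chars[:-1])
        bigrams.update(zip(chars, chars[1:]))
    return contexts, bigrams, len(vocab)


def perplexity(sentence, model):
    """Per-character perplexity under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    chars = [START] + list(sentence)
    if len(chars) < 2:
        raise ValueError("sentence must not be empty")
    log_prob = 0.0
    for prev, cur in zip(chars, chars[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(chars) - 1))


def pick_text_to_merge(second_text, candidates, model):
    """S106 to S108: merge with each candidate and keep the lowest-perplexity one."""
    scores = {tid: perplexity(second_text + text, model) for tid, text in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical corpus and fragments; only the split characters follow the description.
model = train_bigram_model(["我有一个朋友", "朋友买了很多东西", "我知道一家新开业的店"])
best_id, scores = pick_text_to_merge("我有一个朋", {"S3": "友买了", "S6": "道一家"}, model)
```

Here the candidate whose first character completes "朋友" yields the more fluent merged sentence and is therefore chosen. A production system would more plausibly score candidates with a trained language model for the relevant language.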
Optionally, after S101 and before S102, the image processing method provided in this embodiment of the present application may further include the following S109 and S110.
S109. The image processing apparatus determines Q adjacent texts from the P texts according to the distribution position of each of the P texts.
Q is an integer greater than or equal to S.
It should be noted that the distribution position of each of the P texts is used to identify the positions of texts that can be merged, thereby excluding texts that obviously cannot be merged to form the same paragraph. In this way, Q texts whose distribution positions are adjacent are determined from the P texts.
Specifically, if other texts lie between two texts, the two texts cannot be merged; that is, the two texts cannot be merged across lines. Of course, the numbers of the texts that cannot be merged may also be recorded.
For example, as shown in FIG. 2(a), the image includes texts S1, S2, ..., and S8. Judging from the distribution positions of these eight texts, because multiple other text lines or paragraphs lie between the two texts in each pair, it can be determined that S2 and S7, S2 and S6, and S2 and S8 cannot be merged. The numbers of the texts that cannot be merged can therefore be recorded in a non-merge list: not_merge_list=[S2_S7, S2_S6, S2_S8].
It can be understood that, because merging two texts is order-dependent, the order of the numbers in an entry of the non-merge list can represent the actual merging order. For example, S2_S7 in the non-merge list means that the text following S2 is not S7, but it does not mean that the text following S7 cannot be S2.
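The order-dependent meaning of the entries in not_merge_list can be sketched as follows. The helper names and the list of P text ids are assumptions for illustration; the entries themselves are the ones given in the example above.

```python
# Entries from the example above: "A_B" records that B may not directly follow A.
not_merge_list = ["S2_S7", "S2_S6", "S2_S8"]
blocked_pairs = set(not_merge_list)


def may_follow(first_id, next_id):
    """True unless the ordered pair first_id -> next_id is recorded as non-mergeable."""
    return f"{first_id}_{next_id}" not in blocked_pairs


def candidates_after(first_id, text_ids):
    """Texts that position-based filtering still allows to follow first_id."""
    return [tid for tid in text_ids if tid != first_id and may_follow(first_id, tid)]


p_ids = ["S2", "S3", "S4", "S5", "S6", "S7", "S8"]
candidates_for_s2 = candidates_after("S2", p_ids)
```

Because each entry is an ordered pair, the check for S7 followed by S2 is independent of the entry S2_S7, which matches the note that S2_S7 does not rule out S2 following S7.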
S110. The image processing apparatus determines, as the texts to be merged, S texts among the Q texts that satisfy the first semantic information.
Optionally, for a description of the first semantic information, reference may be made to the detailed description in the above embodiments; details are not repeated in this embodiment of the present application.
Optionally, for how to determine, from the Q texts, the S texts that satisfy the first semantic information, reference may be made to the detailed description of S105 to S108 in the above embodiment. This may specifically include:
(1) According to the first semantic information, at least two texts that match text 1 among the Q texts are acquired from the Q texts.
(2) Text 1 is merged with each of the at least two texts, to obtain at least two merged texts.
(3) The sentence perplexity of the at least two merged texts is determined.
(4) Text 2, which corresponds to merged text 1, is determined as the text to be merged that corresponds to text 1. Merged text 1 is the merged text with the lowest sentence perplexity among the at least two merged texts. The S texts include text 1 and text 2.
It should be noted that, if no text matching text 2 is acquired according to the first semantic information, the S texts include only text 1 and text 2, so that the S texts satisfying the semantic information can be determined from the Q texts through (1) to (4) above;
if other texts matching text 2 are acquired according to the first semantic information, the S texts further include texts other than text 1 and text 2, so that (1) to (4) above can be performed cyclically to determine the other texts that match text 1 and text 2.
In this way, through the above implementation, the S texts that satisfy the first semantic information can be obtained from the Q texts and determined as the texts to be merged.
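The cyclic execution of (1) to (4) can be sketched as a loop that keeps extending a chain of texts until no further match is found. This is a simplified sketch: the word list, the sample fragments, and the function names are hypothetical, and the partner search simply takes the first candidate whose first character forms a known word with the last character of the current text, standing in for the perplexity-based choice among at least two candidates.

```python
JOIN_WORDS = {"朋友", "买了"}  # hypothetical words that may be split across two texts


def find_partner(current_text, remaining_ids, texts):
    """Stand-in for steps (1) to (4): return the id of a matching text, or None.

    Texts are assumed to be non-empty.
    """
    for text_id in remaining_ids:
        if current_text[-1] + texts[text_id][0] in JOIN_WORDS:
            return text_id
    return None


def build_chain(start_id, texts):
    """Repeat the partner search from the newest text until nothing matches."""
    chain = [start_id]
    remaining = [tid for tid in texts if tid != start_id]
    while remaining:
        partner = find_partner(texts[chain[-1]], remaining, texts)
        if partner is None:
            break
        chain.append(partner)
        remaining.remove(partner)
    return chain


# Hypothetical fragments; only the split characters follow the description.
texts = {"S2": "我有一个朋", "S3": "友昨天买", "S4": "了很多东西。", "S5": "我知"}
s_texts = build_chain("S2", texts)
```

The loop stops at the first text that has no matching successor, so the resulting chain corresponds to one group of S texts; the remaining texts can be processed the same way starting from another text.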
可以理解的是,由于可以根据P个文本中每个文本的分布位置,从该P个文本中确定分布位置相邻的Q个文本,因此可以排除一些在分布位置上不存在合并可能的文本,从而减少了电子设备不必要的文本合并操作。进一步地,由于可以将Q个文本中满足第一语义信息的S个文本,确定为待合并文本,因此在通过分布位置的粗略筛选后,通过第一语义信息,从Q个文本中确定待合并文本,从而使得合并后的文本的语义通顺度较高。It can be understood that, since the Q texts adjacent to the distribution position can be determined from the P texts according to the distribution position of each text in the P texts, some texts that do not have the possibility of merging at the distribution position can be excluded, Therefore, unnecessary text merging operations of the electronic device are reduced. Further, since the S texts satisfying the first semantic information among the Q texts can be determined as the texts to be merged, after the rough screening of the distribution position, the first semantic information is used to determine the texts to be merged from the Q texts text, so that the semantic fluency of the merged text is higher.
可选地,在上述S110之后,S102之前,本申请实施例提供的图片处理方法还可以包括下述S111。相应地,上述S102具体可以通过下述S102A实现。Optionally, after the above S110 and before S102, the image processing method provided in this embodiment of the present application may further include the following S111. Correspondingly, the above S102 may specifically be implemented through the following S102A.
S111、图片处理装置根据第一语义信息,确定S个文本的目标排列顺序。S111. The image processing apparatus determines a target arrangement sequence of the S texts according to the first semantic information.
可以理解的是,由于第一语义信息中包括句型结构信息、句子成分信息和词组构成信息等,因此,根据句子的成分信息和词组构成信息,可以确定为文本的排列顺序。It can be understood that, since the first semantic information includes sentence structure information, sentence component information and phrase composition information, etc., the arrangement order of the text can be determined according to the sentence component information and phrase composition information.
S102A、图片处理装置按照该目标排列顺序,合并该S个文本,得到第一文本。S102A. The image processing apparatus combines the S texts according to the target arrangement sequence to obtain the first text.
需要说明的是,按照目标排列顺序,合并S个文本,本质上是:合并S个文本中排列顺序相邻的两个文本中的一个文本的最后一个句子和另一个文本的第一个句子,如此循环直至完成合并S个文本,以得到第一文本。It should be noted that merging S texts according to the target order is essentially: merging the last sentence of one text and the first sentence of the other text among the two adjacent texts in S texts, This loops until the S texts are merged to obtain the first text.
For example, take the first semantic information to be sentence structure information and sentence component information. Suppose the last sentence of text A is “我知” (the start of "I know", cut off in the middle of the word “知道”), which is a subject-predicate structure, and the first sentence of text B is “道一家新开业的店” (the remainder of "know a newly opened shop"), which is a verb-object structure. From the sentence structure information, sentence component information, and phrase composition information, it follows that text A lacks an object, text B lacks a subject, and “知” and “道” together conform to the phrase composition information. The arrangement order of text A and text B can therefore be determined as A_B; that is, text B is merged at the end of text A.
For example, take the first semantic information to be phrase composition information. Suppose the last word of the last sentence of text C is “朋”, and the first word of the first sentence of text D is “友”. According to the phrase composition information, “朋” in text C and “友” in text D together form the word “朋友” ("friend"), so the arrangement order of text C and text D can be determined as C_D; that is, text D is merged at the end of text C.
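The two examples above can be illustrated with a minimal sketch that orders fragments by checking whether the last character of one fragment and the first character of the next form a known word, and then merges them in that order. The tiny `LEXICON`, the function names, and the brute-force search over permutations are assumptions made for illustration; the application does not prescribe how the phrase composition information is represented or searched.

```python
from itertools import permutations
from typing import List, Sequence, Set

# Hypothetical toy lexicon standing in for the phrase composition information.
LEXICON: Set[str] = {"知道", "朋友"}


def joins_into_word(left: str, right: str, lexicon: Set[str] = LEXICON) -> bool:
    """True if the last character of `left` plus the first of `right` is a known word."""
    return bool(left) and bool(right) and (left[-1] + right[0]) in lexicon


def target_order(fragments: Sequence[str], lexicon: Set[str] = LEXICON) -> List[int]:
    """Return the index order whose fragment boundaries form the most known words."""
    best, best_score = tuple(range(len(fragments))), -1
    for perm in permutations(range(len(fragments))):
        score = sum(
            joins_into_word(fragments[a], fragments[b], lexicon)
            for a, b in zip(perm, perm[1:])
        )
        if score > best_score:
            best, best_score = perm, score
    return list(best)


def merge_in_order(fragments: Sequence[str], order: Sequence[int]) -> str:
    """Concatenate the fragments in the given order (the S102A step)."""
    return "".join(fragments[i] for i in order)
```

Trying every permutation costs factorial time, so this form is only reasonable for the handful of fragments that survive the earlier screening; a real system would presumably also use the sentence structure and sentence component information rather than a lexicon lookup alone.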
In the image processing method provided by this embodiment of the present application, because the target arrangement order of the S texts can be determined according to the first semantic information, the first text obtained by merging the S texts in that order is semantically more complete and less prone to semantic contradictions.
Optionally, the image processing method provided in this embodiment of the present application may have another possible implementation, which may include the following S112 to S115.
S112. Acquire M texts in the target picture.
S113. In a case where T texts among the M texts are incomplete texts, merge L texts among the T texts that satisfy third semantic information to obtain a fourth text.
Here, M, T, and L are all integers greater than 1.
Optionally, for a description of the third semantic information, refer to the description of the first semantic information in the foregoing embodiments; details are not repeated here.
S114. In a case where the fourth text is a complete text, the image processing apparatus translates the fourth text to obtain a fourth translation.
Optionally, the fourth translation may include a translation in one language or translations in multiple languages. This embodiment of the present application does not limit the number or the languages of the fourth translation.
For example, the fourth text is in Chinese and the fourth translation is in English; or the fourth text is in English and the fourth translation includes a Chinese translation and a Korean translation.
S115. In a case where both the fourth text and the fourth translation are complete texts, the image processing apparatus outputs the fourth text and the fourth translation.
For example, suppose the fourth text is a Chinese text. In a case where the Chinese text is determined to be a complete text, the Chinese text is translated to obtain an English translation. In a case where the English translation is a complete text, the image processing apparatus may output the Chinese text and the English translation.
In the image processing method provided by this embodiment of the present application, after the M texts in the target picture are acquired, the L texts among the T texts that satisfy the third semantic information can be merged into the fourth text, and the fourth text can be translated into the fourth translation. The fourth text and the fourth translation are output only in a case where the fourth text is a complete text and the fourth translation is also a complete text. Whether to output the fourth text is thus decided by first judging whether the merged fourth text is complete and then also judging the completeness of the fourth translation, which improves the accuracy of paragraph merging. Further, because the fourth translation can also be output, a translation of higher accuracy can be output in scenarios where the text in the target picture needs to be translated.
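The control flow of S113 to S115 can be sketched as a small gate: merge, check the source, translate, check the translation, and only then output. The completeness check and the translator below are toy stand-ins (a terminal-punctuation test and a lookup table) chosen so the sketch runs on its own; the application leaves both to semantic analysis and a translation model, and all function names here are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Toy stand-in for a translation model (assumption, for illustration only).
TOY_TABLE: Dict[str, str] = {"我知道一家新开业的店。": "I know a newly opened shop."}


def toy_translate(text: str) -> str:
    """Look the text up in the toy table; unknown input yields an empty string."""
    return TOY_TABLE.get(text, "")


def ends_with_terminal(text: str) -> bool:
    """Crude completeness proxy: the text ends with sentence-final punctuation."""
    return text.endswith(("。", "!", "?", ".", "!", "?"))


def merge_translate_output(
    fragments: Sequence[str],
    is_complete: Callable[[str], bool] = ends_with_terminal,
    translate: Callable[[str], str] = toy_translate,
) -> Optional[Tuple[str, str]]:
    """Return (fourth_text, fourth_translation), or None if either is incomplete."""
    fourth_text = "".join(fragments)          # S113: merge (order assumed resolved)
    if not is_complete(fourth_text):          # S114 runs only for a complete text
        return None
    fourth_translation = translate(fourth_text)
    if not is_complete(fourth_translation):   # S115 requires a complete translation
        return None
    return fourth_text, fourth_translation
```

Checking the source before translating mirrors the point made above: translation is attempted only when the merged text is already judged complete, and the translation's own completeness acts as a second check on the merge.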
Optionally, after S114, the image processing method provided in this embodiment of the present application may further include the following S116 and S117.
S116. In a case where the fourth translation is an incomplete text, the image processing apparatus merges R texts among the T texts to obtain a fifth text.
Here, the R texts include paragraphs determined according to the semantic information of the fourth translation, and R is an integer greater than 1.
Optionally, the R texts may include all of the L texts or only some of the L texts, depending on the actual situation; this is not limited in this embodiment of the present application.
It should be noted that the R texts are the texts among the T texts that satisfy the semantic information.
Further, in a case where the fourth translation is an incomplete text, other texts that satisfy the semantic information can be obtained from the T texts according to the semantic information of the fourth translation, and the fourth text can be merged with those other texts. It can be understood that the position at which the fourth text is merged with the other texts corresponds to the position of the semantically incomplete text in the fourth translation.
S117. In a case where both the fifth text and a fifth translation are complete paragraphs, the image processing apparatus outputs the fifth text and the fifth translation.
Here, the fifth translation is the translation corresponding to the fifth text.
Optionally, for how the fifth text and the fifth translation are judged to be complete texts, refer to the description of the first text in the foregoing embodiments; details are not repeated here.
It should be noted that the text in a picture is translated in order to obtain a semantically correct translation, so the completeness of the translation is the focus of picture translation. If the translation is incomplete, then even if the paragraph obtained by merging the original texts in the picture (also called the source paragraph) is complete, texts that satisfy the semantic information still need to be merged into the source paragraph at the position corresponding to the semantically incomplete text in the translation. The result is then translated again and the completeness of the new translation is judged, so as to ensure the completeness of the text that is finally output.
It can be understood that merging at the level of the source paragraph ensures the completeness of the source paragraph. Only when the source paragraph is a complete paragraph can a valid translation be obtained after a translation model translates it; conversely, if merging is performed only on the translation, it is difficult to obtain a translation that satisfies the semantic information.
Optionally, after S116 and before S117, the image processing method provided in this embodiment of the present application may further include: in a case where the fifth text is a complete text, translating the fifth text to obtain the fifth translation. In this way, translation is performed only when the merged fifth text is a complete text, which avoids invalid translation operations when the merged text is incomplete and saves running resources of the electronic device.
In the image processing method provided by this embodiment of the present application, because R texts among the T texts can be merged into the fifth text in a case where the fourth translation is incomplete, the R texts among the T texts that satisfy the semantic information can be merged again on the basis of the incomplete fourth translation, which improves the accuracy of text merging. Further, because the fifth text and the fifth translation are output only in a case where both are complete texts, a translation of higher accuracy is output.
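The fallback of S116 and S117 can be sketched as a single retry: if the fourth translation is incomplete, pick further text, extend the source, and translate again. Everything concrete below is an assumption for illustration: the `pick_first` selector (the application selects by the semantic information of the fourth translation), appending at the end (the application merges at the position matching the incomplete part of the translation), and the toy completeness checks and lookup-table translator.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy stand-ins so the sketch is runnable (assumptions, not the application's method).
TOY_TABLE: Dict[str, str] = {
    "因为下雨。": "Because it rained.",
    "因为下雨。我们取消了旅行。": "Because it rained, we cancelled the trip.",
}


def translate(text: str) -> str:
    return TOY_TABLE.get(text, "")


def source_ok(text: str) -> bool:
    return text.endswith("。")


def target_ok(text: str) -> bool:
    # Treat a lone "Because ..." clause without a main clause as incomplete.
    dangling = text.startswith("Because") and "," not in text
    return text.endswith(".") and not dangling


def pick_first(translation: str, remaining: List[str]) -> Optional[str]:
    """Hypothetical selector: take the next remaining fragment, if any."""
    return remaining[0] if remaining else None


def retry_with_more_text(
    fourth_text: str,
    remaining: List[str],
    pick_extra: Callable[[str, List[str]], Optional[str]] = pick_first,
) -> Optional[Tuple[str, str]]:
    fourth_translation = translate(fourth_text)
    if target_ok(fourth_translation):
        return fourth_text, fourth_translation      # S115: nothing to repair
    extra = pick_extra(fourth_translation, remaining)
    if extra is None:
        return None
    fifth_text = fourth_text + extra                # S116: extend the source text
    if not source_ok(fifth_text):                   # translate only a complete text
        return None
    fifth_translation = translate(fifth_text)
    if not target_ok(fifth_translation):
        return None
    return fifth_text, fifth_translation            # S117
```

The repair is applied to the source text and the translation is regenerated from it, which follows the reasoning above that merging only on the translation side rarely yields a semantically valid result.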
The image processing method provided in the embodiments of the present application may be executed by an image processing apparatus. In the embodiments of the present application, the image processing apparatus provided by the embodiments is described by taking an image processing apparatus executing the image processing method as an example.
As shown in FIG. 3, an embodiment of the present application provides an image processing apparatus 200, which may include an acquisition module 201, a processing module 202, and an output module 203. The acquisition module 201 may be configured to acquire N texts included in a target picture and target information, where the target information includes at least one of the following: a first completeness of the N texts, and a second completeness of first translations corresponding to the N texts, N being an integer greater than 1. The processing module 202 may be configured to merge S texts among P texts that satisfy first semantic information to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1. The output module 203 may be configured to output the first text in a case where the first text and a second translation are complete texts, where the second translation is the text obtained by merging the translations in a third translation that correspond to the S texts, and the third translation is an incomplete translation determined from the first translations according to the second completeness.
Optionally, the first completeness includes the sentence completeness of a first target sentence of each of the N texts, and the second completeness includes the sentence completeness of a second target sentence of each of the first translations. The acquisition module 201 is specifically configured to: extract the text included in the target picture to obtain the N texts; analyze the sentence completeness of the first target sentence based on the first semantic information; and analyze the sentence completeness of the second target sentence based on second semantic information. The first target sentence and the second target sentence each include at least one of the following: the first sentence of a text, and the last sentence of a text. The first semantic information and the second semantic information each include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
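The idea of judging a text by its first and last sentences can be sketched as below. The sketch only shows how the target sentences might be extracted and uses a punctuation test as a crude proxy for the completeness of the last sentence; the application bases the judgment on sentence structure, sentence component, and phrase composition information, which this sketch does not model. All names are hypothetical.

```python
import re
from typing import List, Tuple

# Sentence-final punctuation (full-width and ASCII) used by this toy splitter.
_SENTENCE = re.compile(r"[^。!?.!?]+[。!?.!?]?")
_TERMINALS = ("。", "!", "?", ".", "!", "?")


def split_sentences(text: str) -> List[str]:
    """Split on sentence-final punctuation, keeping the punctuation attached."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def target_sentences(text: str) -> Tuple[str, str]:
    """Return (first sentence, last sentence) of a text, or two empty strings."""
    sentences = split_sentences(text)
    if not sentences:
        return "", ""
    return sentences[0], sentences[-1]


def last_sentence_looks_complete(text: str) -> bool:
    """Proxy check: the last sentence ends with sentence-final punctuation."""
    _, last = target_sentences(text)
    return last.endswith(_TERMINALS)
```

Restricting the analysis to the first and last sentences reflects that a block cut by a column or page break is damaged at its edges, so the interior sentences need not be examined. A punctuation test cannot detect a block that starts mid-sentence, which is why the first sentence would need the semantic analysis described above.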
Optionally, the image processing apparatus may further include a determination module. The determination module may be configured to determine the P texts from the N texts according to the sentence completeness of the first target sentence, where the P texts correspond to the third translation.
Optionally, the image processing apparatus may further include a determination module. The acquisition module 201 may be further configured to obtain, from the P texts according to the first semantic information, at least two texts that match a second text among the P texts. The processing module 202 may be further configured to merge the second text with each of the at least two texts to obtain at least two merged texts. The determination module is configured to determine a third text corresponding to a target merged text as the text to be merged with the second text, where the target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts, and the S texts include the second text and the third text.
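Choosing the merge partner by lowest sentence perplexity can be sketched with a toy language model. The character-bigram model with add-one smoothing and the tiny training corpus below are assumptions made so the example is self-contained; the application does not state which language model computes the perplexity.

```python
import math
from collections import Counter
from typing import Iterable, List


class CharBigramLM:
    """Toy character-bigram language model with add-one smoothing."""

    def __init__(self, corpus: Iterable[str]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        self.vocab = set()
        for sentence in corpus:
            chars = ["<s>"] + list(sentence)
            self.vocab.update(chars)
            for a, b in zip(chars, chars[1:]):
                self.context_counts[a] += 1
                self.bigram_counts[(a, b)] += 1

    def perplexity(self, sentence: str) -> float:
        """exp of the negative mean log-probability per character."""
        if not sentence:
            raise ValueError("sentence must be non-empty")
        chars = ["<s>"] + list(sentence)
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        log_p = 0.0
        for a, b in zip(chars, chars[1:]):
            p = (self.bigram_counts[(a, b)] + 1) / (self.context_counts[a] + v)
            log_p += math.log(p)
        return math.exp(-log_p / (len(chars) - 1))


def pick_partner(second_text: str, candidates: List[str], lm: CharBigramLM) -> str:
    """Return the candidate whose merge with second_text has the lowest perplexity."""
    return min(candidates, key=lambda c: lm.perplexity(second_text + c))
```

For instance, with a model trained on `["我知道一家新开业的店", "我知道你的朋友"]`, `pick_partner("我知", ["友情很重要", "道一家新开业的店"], lm)` selects the second candidate, because every character transition in that merge was seen in training. Lower perplexity is used here as a proxy for a more fluent merged sentence.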
Optionally, the image processing apparatus may further include a determination module. The determination module may be configured to determine Q adjacent texts from the P texts according to the distribution position of each of the P texts, Q being an integer greater than or equal to S, and to determine the S texts among the Q texts that satisfy the first semantic information as the texts to be merged.
Optionally, the determination module may be further configured to determine a target arrangement order of the S texts according to the first semantic information. The processing module may be specifically configured to merge the S texts in the target arrangement order to obtain the first text.
An embodiment of the present application provides an image processing apparatus. After multiple texts in a target picture and target information are acquired, at least one text that satisfies the semantic information, among the incomplete texts determined from the multiple texts according to the target information, can be merged to obtain one merged text. Therefore, when the text in the picture includes complex text such as multi-column text, text split across pages, or distorted and irregular text, such text can be merged according to the semantic information. Further, because the merged text is output only in a case where both the merged text and its corresponding translation are complete texts, the resulting merged text is semantically more fluent. The ability to process the text in a picture is thereby improved.
The image processing apparatus in the embodiments of the present application may be an electronic device or a component of an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); it may also be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, or a self-service machine. This is not specifically limited in the embodiments of the present application.
The image processing apparatus in the embodiments of the present application may be an apparatus with an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system; this is not specifically limited in the embodiments of the present application.
The image processing apparatus provided in the embodiments of the present application can implement the processes implemented by the method embodiments of FIG. 1 and FIG. 2; to avoid repetition, details are not repeated here.
Optionally, as shown in FIG. 4, an embodiment of the present application further provides an electronic device 300, including a processor 301 and a memory 302. The memory 302 stores a program or instructions that can run on the processor 301. When the program or instructions are executed by the processor 301, the steps of the foregoing image processing method embodiments are implemented and the same technical effects can be achieved; to avoid repetition, details are not repeated here.
It should be noted that the electronic device in the embodiments of the present application includes the mobile electronic devices and non-mobile electronic devices described above.
FIG. 5 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 400 includes, but is not limited to, components such as a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, and a processor 410.
Those skilled in the art can understand that the electronic device 400 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 410 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The structure of the electronic device shown in FIG. 5 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine some components, or arrange the components differently. Details are not described here.
The processor 410 may be configured to: acquire N texts included in a target picture and target information, where the target information includes at least one of the following: a first completeness of the N texts, and a second completeness of first translations corresponding to the N texts, N being an integer greater than 1; merge S texts among P texts that satisfy first semantic information to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and output the first text in a case where the first text and a second translation are complete texts, where the second translation is the text obtained by merging the translations in a third translation that correspond to the S texts, and the third translation is an incomplete translation determined from the first translations according to the second completeness.
Optionally, the first completeness includes the sentence completeness of a first target sentence of each of the N texts, and the second completeness includes the sentence completeness of a second target sentence of each of the first translations. The processor 410 is specifically configured to: extract the text included in the target picture to obtain the N texts; analyze the sentence completeness of the first target sentence based on the first semantic information; and analyze the sentence completeness of the second target sentence based on second semantic information. The first target sentence and the second target sentence each include at least one of the following: the first sentence of a text, and the last sentence of a text. The first semantic information and the second semantic information each include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
Optionally, the processor 410 may be configured to determine the P texts from the N texts according to the sentence completeness of the first target sentence, where the P texts correspond to the third translation.
Optionally, the processor 410 may be further configured to: obtain, from the P texts according to the first semantic information, at least two texts that match a second text among the P texts; merge the second text with each of the at least two texts to obtain at least two merged texts; and determine a third text corresponding to a target merged text as the text to be merged with the second text, where the target merged text is the merged text with the lowest sentence perplexity among the at least two merged texts, and the S texts include the second text and the third text.
Optionally, the processor 410 may be configured to determine Q adjacent texts from the P texts according to the distribution position of each of the P texts, Q being an integer greater than or equal to S, and to determine the S texts among the Q texts that satisfy the first semantic information as the texts to be merged.
Optionally, the processor 410 may be further configured to determine a target arrangement order of the S texts according to the first semantic information, and to merge the S texts in the target arrangement order to obtain the first text.
An embodiment of the present application provides an electronic device. After multiple texts in a target picture and target information are acquired, at least one text that satisfies the semantic information, among the incomplete texts determined from the multiple texts according to the target information, can be merged to obtain one merged text. Therefore, when the text in the picture includes complex text such as multi-column text, text split across pages, or distorted and irregular text, such text can be merged according to the semantic information. Further, because the merged text is output only in a case where both the merged text and its corresponding translation are complete texts, the resulting merged text is semantically more fluent. The ability to process the text in a picture is thereby improved.
It should be understood that, in the embodiments of the present application, the input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042. The graphics processing unit 4041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 406 may include a display panel 4061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 407 includes at least one of a touch panel 4071 and other input devices 4072. The touch panel 4071, also called a touch screen, may include two parts: a touch detection device and a touch controller. The other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick; details are not described here.
The memory 409 may be used to store software programs and various data. The memory 409 may mainly include a first storage area that stores programs or instructions and a second storage area that stores data, where the first storage area may store an operating system and the application programs or instructions required by at least one function (such as a sound playback function or an image playback function). In addition, the memory 409 may include volatile memory or non-volatile memory, or the memory 409 may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), or direct Rambus random access memory (DRRAM). The memory 409 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 410 may include one or more processing units. Optionally, the processor 410 integrates an application processor and a modem processor, where the application processor mainly handles operations involving the operating system, the user interface, application programs, and the like, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may alternatively not be integrated into the processor 410.
An embodiment of the present application further provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, the processes of the foregoing image processing method embodiments are implemented and the same technical effects can be achieved; to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instructions to implement the processes of the foregoing image processing method embodiments, with the same technical effects; to avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.
An embodiment of the present application provides a computer program product stored in a storage medium. The program product is executed by at least one processor to implement the processes of the foregoing image processing method embodiments, with the same technical effects; to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element. In addition, it should be pointed out that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Furthermore, features described with reference to certain examples may be combined in other examples.
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以计算机软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例中的方法。Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is better implementation. Based on such an understanding, the technical solution of the present application can be embodied in the form of computer software products, which are stored in a storage medium (such as ROM/RAM, magnetic disk, etc.) , CD-ROM), including several instructions to make a terminal (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) execute the method in each embodiment of the present application.
上面结合附图对本申请的实施例进行了描述,但是本申请并不局限于上述的具体实施方式,上述的具体实施方式仅仅是示意性的,而不是限制性的,本领域的普通技术人员在本申请的启示下,在不脱离本申请宗旨和权利要求所保护的范围情况下,还可做出很多形式,均属于本申请的保护之内。The embodiments of the present application have been described above in conjunction with the accompanying drawings, but the present application is not limited to the above-mentioned specific implementations. The above-mentioned specific implementations are only illustrative and not restrictive. Those of ordinary skill in the art will Under the inspiration of this application, without departing from the purpose of this application and the scope of protection of the claims, many forms can also be made, all of which belong to the protection of this application.

Claims (17)

  1. An image processing method, the method comprising:
    acquiring N texts included in a target image and target information, the target information comprising at least one of the following: a first completeness of the N texts, a second completeness of first translations corresponding to the N texts, where N is an integer greater than 1;
    merging S texts that satisfy first semantic information among P texts to obtain a first text, the P texts being incomplete texts determined from the N texts according to the first completeness, where P and S are both integers greater than 1;
    in a case where the first text and a second translation are both complete texts, outputting the first text, the second translation being a text obtained by merging those translations, among third translations, that correspond to the S texts, and the third translations being incomplete translations determined from the first translations according to the second completeness.
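As a non-limiting illustration of the flow recited in claim 1, a minimal Python sketch follows. The `Block` type, the `is_complete` heuristic (capitalized start plus terminal punctuation), and the choice to merge every incomplete text in reading order are assumptions made for the example; the application does not prescribe them.

```python
from dataclasses import dataclass

TERMINALS = (".", "!", "?")


def is_complete(text: str) -> bool:
    """Toy completeness test (assumption): capitalized start and terminal punctuation."""
    stripped = text.strip()
    return bool(stripped) and stripped[0].isupper() and stripped.endswith(TERMINALS)


@dataclass
class Block:
    text: str         # one of the N texts extracted from the image
    translation: str  # the first translation corresponding to that text


def merge_incomplete(blocks):
    """Merge the incomplete texts; return the merged text only if it and the
    merged translation are both complete, otherwise return None."""
    # The P incomplete texts, selected by the (hypothetical) completeness test.
    incomplete = [b for b in blocks if not is_complete(b.text)]
    if len(incomplete) < 2:
        return None
    # Here S == P: every incomplete text is merged, in the given order.
    first_text = " ".join(b.text.strip() for b in incomplete)
    second_translation = " ".join(b.translation.strip() for b in incomplete)
    if is_complete(first_text) and is_complete(second_translation):
        return first_text
    return None


blocks = [
    Block("Title.", "Titre."),
    Block("The quick brown fox", "Le renard brun rapide"),
    Block("jumps over the lazy dog.", "saute par-dessus le chien paresseux."),
]
print(merge_incomplete(blocks))
```

Checking the merged translation as well as the merged source text acts as a second, independent signal that the fragments really belong together before anything is output.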
  2. The method according to claim 1, wherein the first completeness comprises a sentence completeness of a first target sentence of each of the N texts, and the second completeness comprises a sentence completeness of a second target sentence of each of the first translations;
    the acquiring N texts included in a target image and target information comprises:
    extracting the text included in the target image to obtain the N texts;
    analyzing the sentence completeness of the first target sentence based on the first semantic information;
    analyzing the sentence completeness of the second target sentence based on second semantic information;
    wherein the first target sentence and the second target sentence each comprise at least one of the following: the first sentence in a text, the last sentence in a text;
    the first semantic information and the second semantic information each comprise at least one of the following: sentence pattern structure information, sentence component information, phrase composition information.
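The analysis of the first and last sentences recited in claim 2 can be pictured with the following Python sketch. It is only an assumption-laden stand-in: capitalization and terminal punctuation replace the sentence pattern, sentence component, and phrase composition information named in the claim, and the function names are hypothetical.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows terminal punctuation (toy rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def target_sentence_completeness(text):
    """Return (first_sentence_ok, last_sentence_ok) for one extracted text."""
    sentences = split_sentences(text)
    if not sentences:
        return (False, False)
    first, last = sentences[0], sentences[-1]
    first_ok = first[0].isupper()   # opens like the start of a sentence
    last_ok = last[-1] in ".!?"     # closes with terminal punctuation
    return (first_ok, last_ok)


print(target_sentence_completeness("of the year. It sold well."))
print(target_sentence_completeness("The phone was released at the end"))
```

Only the first and last sentences are inspected because a text block cut by the layout can only be broken at its edges; interior sentences are bounded on both sides within the same block.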
  3. The method according to claim 2, wherein after the analyzing the sentence completeness of the first target sentence based on the first semantic information, and before the merging S texts that satisfy first semantic information among P texts to obtain a first text, the method further comprises:
    determining the P texts from the N texts according to the sentence completeness of the first target sentence, the P texts corresponding to the third translations.
  4. The method according to claim 1, wherein before the merging S texts that satisfy first semantic information among P texts to obtain a first text, the method further comprises:
    acquiring, from the P texts according to the first semantic information, at least two texts that match a second text among the P texts;
    merging the second text with each of the at least two texts to obtain at least two merged texts;
    determining a sentence perplexity of each of the at least two merged texts, the sentence perplexity indicating how fluent the sentences in a merged text are;
    determining a third text corresponding to a target merged text as the text to be merged that corresponds to the second text, the target merged text being the merged text with the lowest sentence perplexity among the at least two merged texts;
    wherein the S texts comprise the second text and the third text.
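The perplexity-based selection recited in claim 4 can be sketched in Python as follows. The add-one-smoothed bigram model, the tiny training corpus, and the names `train_bigram`, `perplexity`, and `pick_text_to_merge` are assumptions for illustration; the application does not state which language model computes the sentence perplexity.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigram contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def perplexity(sentence, model):
    """Perplexity of a sentence under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


def pick_text_to_merge(second_text, candidates, model):
    """Return the candidate whose merge with second_text has the lowest perplexity."""
    return min(candidates, key=lambda c: perplexity(f"{second_text} {c}", model))


model = train_bigram(["the cat sat on the mat", "the dog sat on the rug"])
print(pick_text_to_merge("the cat sat", ["mat the on", "on the mat"], model))
```

A lower perplexity means the model finds the merged sentence less surprising, which is used here as a proxy for the fluency that the claim refers to.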
  5. The method according to claim 1, wherein before the merging S texts that satisfy first semantic information among P texts to obtain a first text, the method further comprises:
    determining, from the P texts according to the distribution position of each of the P texts, Q texts whose distribution positions are adjacent, where Q is an integer greater than or equal to S;
    determining S texts that satisfy the first semantic information among the Q texts as texts to be merged.
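The two-stage selection recited in claim 5 (adjacency by position, then a semantic filter) might look like the Python sketch below. The `TextBox` type, the vertical-gap threshold `max_gap`, and the continuation rule (a fragment without terminal punctuation followed by one starting in lowercase) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    text: str
    top: float     # y coordinate of the upper edge in the image
    bottom: float  # y coordinate of the lower edge in the image


def group_adjacent(boxes, max_gap=20.0):
    """Group boxes whose vertical gap to the previous box is at most max_gap."""
    groups, current = [], []
    for box in sorted(boxes, key=lambda b: b.top):
        if current and box.top - current[-1].bottom > max_gap:
            groups.append(current)
            current = []
        current.append(box)
    if current:
        groups.append(current)
    # Only groups with at least two texts can yield something to merge.
    return [g for g in groups if len(g) >= 2]


def select_mergeable(group):
    """From Q adjacent boxes keep the S boxes that continue one another."""
    selected = []
    for prev, cur in zip(group, group[1:]):
        open_ended = not prev.text.rstrip().endswith((".", "!", "?"))
        continues = cur.text.lstrip()[:1].islower()
        if open_ended and continues:
            for box in (prev, cur):
                if box not in selected:
                    selected.append(box)
    return selected


boxes = [
    TextBox("far away fragment", 200, 210),
    TextBox("The weather today", 0, 10),
    TextBox("is sunny and warm.", 12, 22),
]
groups = group_adjacent(boxes)
print([b.text for b in select_mergeable(groups[0])])
```

Filtering by position first keeps the semantic comparison limited to texts that are near one another in the image, so Q is never smaller than S.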
  6. The method according to claim 5, wherein after the determining S texts that satisfy the first semantic information among the Q texts as texts to be merged, the method further comprises:
    determining a target arrangement order of the S texts according to the first semantic information;
    the merging S texts that satisfy first semantic information among P texts to obtain a first text comprises:
    merging the S texts in the target arrangement order to obtain the first text.
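The ordering step recited in claim 6 can be illustrated with a short Python sketch. The ranking rule is an assumption made for the example (a fragment that opens a sentence goes first, a fragment that closes one goes last, anything else stays in between); the application only states that the order is determined from the first semantic information.

```python
TERMINALS = (".", "!", "?")


def order_key(text):
    """Rank a fragment: 0 opens a sentence, 2 closes one, 1 is a middle part."""
    stripped = text.strip()
    if stripped[:1].isupper():
        return 0
    if stripped.endswith(TERMINALS):
        return 2
    return 1


def merge_in_target_order(texts):
    """Sort the S texts by the toy rank (stable sort) and join them."""
    ordered = sorted(texts, key=order_key)
    return " ".join(t.strip() for t in ordered)


print(merge_in_target_order(["jumps over the lazy dog.", "The quick", "brown fox"]))
```

Because the sort is stable, middle fragments keep their original relative order, so this toy rule only repositions the opening and closing fragments.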
  7. An image processing apparatus, the image processing apparatus comprising an acquisition module, a processing module, and an output module;
    the acquisition module is configured to acquire N texts included in a target image and target information, the target information comprising at least one of the following: a first completeness of the N texts, a second completeness of first translations corresponding to the N texts, where N is an integer greater than 1;
    the processing module is configured to merge S texts that satisfy first semantic information among P texts to obtain a first text, the P texts being incomplete texts determined from the N texts according to the first completeness, where P and S are both integers greater than 1;
    the output module is configured to output the first text in a case where the first text and a second translation are complete texts, the second translation being a text obtained by merging those translations, among third translations, that correspond to the S texts, and the third translations being incomplete translations determined from the first translations according to the second completeness.
  8. The apparatus according to claim 7, wherein the first completeness comprises a sentence completeness of a first target sentence of each of the N texts, and the second completeness comprises a sentence completeness of a second target sentence of each of the first translations;
    the acquisition module is specifically configured to extract the text included in the target image to obtain the N texts, to analyze the sentence completeness of the first target sentence based on the first semantic information, and to analyze the sentence completeness of the second target sentence based on second semantic information;
    wherein the first target sentence and the second target sentence each comprise at least one of the following: the first sentence in a text, the last sentence in a text;
    the first semantic information and the second semantic information each comprise at least one of the following: sentence pattern structure information, sentence component information, phrase composition information.
  9. The apparatus according to claim 8, wherein the image processing apparatus further comprises a determining module;
    the determining module is configured to determine the P texts from the N texts according to the sentence completeness of the first target sentence, the P texts corresponding to the third translations.
  10. The apparatus according to claim 7, wherein the image processing apparatus further comprises a determining module;
    the acquisition module is further configured to acquire, from the P texts according to the first semantic information, at least two texts that match a second text among the P texts;
    the processing module is further configured to merge the second text with each of the at least two texts to obtain at least two merged texts;
    the determining module is configured to determine a third text corresponding to a target merged text as the text to be merged that corresponds to the second text, the target merged text being the merged text with the lowest sentence perplexity among the at least two merged texts;
    wherein the S texts comprise the second text and the third text.
  11. The apparatus according to claim 7, wherein the image processing apparatus further comprises a determining module;
    the determining module is configured to determine, from the P texts according to the distribution position of each of the P texts, Q texts whose distribution positions are adjacent, where Q is an integer greater than or equal to S, and to determine S texts that satisfy the first semantic information among the Q texts as texts to be merged.
  12. The apparatus according to claim 11, wherein the determining module is further configured to determine a target arrangement order of the S texts according to the first semantic information;
    the processing module is specifically configured to merge the S texts in the target arrangement order to obtain the first text.
  13. An electronic device, comprising a processor and a memory, the memory storing a program or instructions executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the image processing method according to any one of claims 1 to 6.
  14. A readable storage medium, storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the steps of the image processing method according to any one of claims 1 to 6.
  15. A computer program product, the program product being executed by at least one processor to implement the image processing method according to any one of claims 1 to 6.
  16. An electronic device, the electronic device being configured to perform the image processing method according to any one of claims 1 to 6.
  17. A chip, the chip comprising a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a program or instructions to implement the image processing method according to any one of claims 1 to 6.
PCT/CN2022/136494 2021-12-10 2022-12-05 Image processing method and apparatus, and electronic device WO2023103943A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111509057.0 2021-12-10
CN202111509057.0A CN114299525A (en) 2021-12-10 2021-12-10 Picture processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023103943A1 true WO2023103943A1 (en) 2023-06-15

Family

ID=80967753

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/136494 WO2023103943A1 (en) 2021-12-10 2022-12-05 Image processing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN114299525A (en)
WO (1) WO2023103943A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299525A (en) * 2021-12-10 2022-04-08 维沃移动通信有限公司 Picture processing method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9659224B1 (en) * 2014-03-31 2017-05-23 Amazon Technologies, Inc. Merging optical character recognized text from frames of image data
CN111368562A (en) * 2020-02-28 2020-07-03 北京字节跳动网络技术有限公司 Method and device for translating characters in picture, electronic equipment and storage medium
CN113343720A (en) * 2021-06-30 2021-09-03 北京搜狗科技发展有限公司 Subtitle translation method and device for subtitle translation
CN113660432A (en) * 2021-08-17 2021-11-16 安徽听见科技有限公司 Translation subtitle production method and device, electronic equipment and storage medium
CN114299525A (en) * 2021-12-10 2022-04-08 维沃移动通信有限公司 Picture processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN114299525A (en) 2022-04-08

Similar Documents

Publication Publication Date Title
US10198506B2 (en) System and method of sentiment data generation
KR101130444B1 (en) System for identifying paraphrases using machine translation techniques
CN108334490B (en) Keyword extraction method and keyword extraction device
CN100452025C (en) System and method for auto-detecting collcation mistakes of file
AU2017408800B2 (en) Method and system of mining information, electronic device and readable storable medium
US8577882B2 (en) Method and system for searching multilingual documents
US10311113B2 (en) System and method of sentiment data use
JP5280642B2 (en) Translation system, translation program, and parallel translation data generation method
US20140316764A1 (en) Clarifying natural language input using targeted questions
KR20040025642A (en) Method and system for retrieving confirming sentences
JPWO2003065245A1 (en) Translation method, translation output method, storage medium, program, and computer apparatus
CN108920649B (en) Information recommendation method, device, equipment and medium
WO2022135474A1 (en) Information recommendation method and apparatus, and electronic device
WO2021159656A1 (en) Method, device, and equipment for semantic completion in a multi-round dialogue, and storage medium
WO2023103943A1 (en) Image processing method and apparatus, and electronic device
US20220365956A1 (en) Method and apparatus for generating patent summary information, and electronic device and medium
CN111950301A (en) English translation quality analysis method and system for Chinese translation and English translation
WO2003021391A2 (en) Method and apparatus for translating between two species of one generic language
Delecraz et al. Multimodal machine learning for natural language processing: disambiguating prepositional phrase attachments with images
CN110888940B (en) Text information extraction method and device, computer equipment and storage medium
JP2017015874A (en) Text reading comprehension support device, and annotation data creation device, annotation data creation method, and annotation data creation program
US20210263915A1 (en) Search Text Generation System and Search Text Generation Method
TWI376656B (en) Foreign-language learning method utilizing an original language to review corresponding foreign languages and foreign-language learning database system thereof
CN113157966B (en) Display method and device and electronic equipment
CN110020429A (en) Method for recognizing semantics and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22903365

Country of ref document: EP

Kind code of ref document: A1