WO2023103943A1 - Image processing method and apparatus, and electronic device - Google Patents

Image processing method and apparatus, and electronic device


Publication number
WO2023103943A1 (application PCT/CN2022/136494)
Authority
WO
WIPO (PCT)
Prior art keywords
texts, text, sentence, target, translation
Application number
PCT/CN2022/136494
Other languages
English (en)
Chinese (zh)
Inventor
刘池莉
Original Assignee
维沃移动通信有限公司
Application filed by 维沃移动通信有限公司
Publication of WO2023103943A1

Classifications

    • G06F 40/30 Semantic analysis (under G06F 40/00 Handling natural language data; G06F Electric digital data processing; G06 Computing; G Physics)
    • G06V 10/74 Image or video pattern matching; proximity measures in feature spaces (under G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning)
    • G06V 30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables (under G06V 30/41 Analysis of document content)
    • G06V 30/413 Classification of content, e.g. text, photographs or tables (under G06V 30/41 Analysis of document content)

Definitions

  • The present application belongs to the technical field of communications, and in particular relates to an image processing method and apparatus, and an electronic device.
  • Electronic devices are increasingly widely used; for example, an electronic device can recognize and process text in pictures.
  • When identifying a picture, the electronic device can merge the multiple lines of text in the picture according to the physical position coordinates and layout of the text lines in the picture.
  • However, in some cases the electronic device may be unable to merge the text in the picture according to the physical position coordinates of the text lines and the text layout. In such cases, the electronic device processes the text in pictures poorly.
  • The purpose of the embodiments of the present application is to provide an image processing method, apparatus and electronic device, which can solve the problem that electronic devices process text in pictures poorly.
  • An embodiment of the present application provides an image processing method. The method includes: acquiring N texts included in a target picture and target information, where the target information includes at least one of the following: a first completeness of the N texts and a second completeness of a first translation corresponding to the N texts, and N is an integer greater than 1; merging, among P texts, S texts that satisfy first semantic information to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and, in a case where both the first text and a second translation are complete texts, outputting the first text, where the second translation is a text obtained by merging the translations corresponding to the S texts in a third translation, and the third translation is an incomplete translation determined from the first translation according to the second completeness.
  • an embodiment of the present application provides an image processing apparatus, and the image processing apparatus includes: an acquisition module, a processing module, and an output module.
  • The acquisition module is configured to acquire N texts included in a target picture and target information, where the target information includes at least one of the following: a first completeness of the N texts and a second completeness of a first translation corresponding to the N texts, and N is an integer greater than 1.
  • The processing module is configured to merge, among P texts, S texts that satisfy first semantic information to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1.
  • The output module is configured to output the first text when both the first text and a second translation are complete texts.
  • The second translation is a text obtained by merging the translations corresponding to the S texts in a third translation.
  • The third translation is an incomplete translation determined from the first translation according to the second completeness.
  • An embodiment of the present application provides an electronic device. The electronic device includes a processor and a memory, where the memory stores programs or instructions that can run on the processor and that, when executed by the processor, implement the steps of the method in the first aspect above.
  • an embodiment of the present application provides a readable storage medium, on which a program or an instruction is stored, and when the program or instruction is executed by a processor, the steps of the method in the above first aspect are implemented.
  • the embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the method in the first aspect above.
  • an embodiment of the present application provides a computer program product, the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the method described in the first aspect.
  • In the embodiments of the present application, the N texts included in the target picture and the target information are acquired, where the target information includes at least one of the following: the first completeness of the N texts and the second completeness of the first translation corresponding to the N texts, and N is an integer greater than 1; among the P texts, the S texts satisfying the first semantic information are merged to obtain the first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and when both the first text and the second translation are complete texts, the first text is output, where the second translation is a text obtained by merging the translations corresponding to the S texts in the third translation, and the third translation is an incomplete translation determined from the first translation according to the second completeness.
  • FIG. 1 is a schematic diagram of an image processing method provided in an embodiment of the present application.
  • Fig. 2(a) is the first schematic diagram of an image processing interface provided by an embodiment of the present application.
  • Fig. 2(b) is the second schematic diagram of an image processing interface provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an image processing device provided in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of hardware of an electronic device provided by an embodiment of the present application.
  • an embodiment of the present application provides an image processing method, and the method includes the following S101 to S103.
  • The image processing apparatus acquires the N texts and the target information included in a target picture.
  • the above-mentioned target information includes at least one of the following items: the first completeness degree of the N texts, the second completeness degree of the first translation corresponding to the N texts, and N is an integer greater than 1.
  • N is an integer greater than 1.
  • the above-mentioned target picture may be any of the following: a picture taken by the electronic device, a screenshot saved by the electronic device, and an online picture obtained by the electronic device.
  • the target picture may include multiple texts.
  • the N texts are texts in the plurality of texts.
  • the language types of the above N texts may be Chinese, English, Korean, Japanese and so on.
  • Each of the above N texts may be a text in the traditional sense, or a text line; this may be determined according to actual usage conditions, which is not limited in this embodiment of the present application.
  • A text line may be an independent single text line, in which case the single text line can be used as a text; alternatively, the text line may be a line of text in a certain text paragraph.
  • the text content contained in the target picture may be identified through the picture text recognition technology, and the text content may specifically include: the text contained in the target picture, and the coordinates of the text.
  • the above-mentioned first translation may include translations in one language type, or translations in multiple language types. Specifically, it may be determined according to actual usage conditions, which is not limited in this embodiment of the present application.
  • The first completeness and the second completeness are determined according to semantic information.
  • For details of the semantic information, reference may be made to the detailed descriptions in the following embodiments, which are not repeated here.
  • the image processing device merges S texts satisfying the first semantic information among the P texts to obtain the first text.
  • P texts are incomplete texts determined from the N texts according to the first completeness degree. Both P and S are integers greater than 1.
  • Scenario 1: the semantics of the P texts are incomplete.
  • Scenario 2: the sentence structure of the first sentence or the last sentence of each of the P texts is missing.
  • Scenario 3: the sentence-ending words in each of the P texts cannot form separate words.
  • the first semantic information may include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
  • the sentence structure information may include at least one of the following: subject-predicate structure, verb-object structure, subject-verb-object structure, subject-verb-object definite complement structure, and the like.
  • the sentence component information may include at least one of the following: subject, predicate, object, attributive, adverbial, complement and so on.
  • The phrase composition information may include at least one of the following: sentence-beginning words, sentence-ending words, common words, phrases, and the like.
  • first semantic information may also include other information related to semantics, which is not limited in this embodiment of the present application.
  • The above description of the first semantic information is only a possible example enumerated for the case where the N texts are Chinese texts.
  • For other language types, the semantic information can be interpreted according to the semantic rules and syntax of those languages, which is not limited in this embodiment of the present application.
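As a purely illustrative sketch of how such rule-based completeness checks might look in code, the following toy Python example tests a text's first and last words against invented lexicons. The opener list and both rules are hypothetical stand-ins for the patent's "first semantic information"; nothing here is taken from the patent itself.

```python
# Illustrative sketch only: a toy rule-based completeness check in the spirit
# of the first semantic information (sentence structure, sentence components,
# phrase composition). The lexicon and rules below are invented.
COMMON_OPENERS = {"the", "i", "we", "a", "this", "he", "she", "it"}

def is_complete(text: str) -> bool:
    """A text counts as complete when it ends with a sentence terminator and
    its first word is a plausible sentence opener (both rules are toy rules)."""
    stripped = text.strip()
    if not stripped:
        return False
    ends_ok = stripped[-1] in ".!?"
    opener = stripped.split()[0].lower().strip('.!?"')
    return ends_ok and opener in COMMON_OPENERS
```

A real system would of course use far richer lexicons and per-language rules, as the surrounding text notes.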
  • In one possible situation, the P texts include only one group of texts conforming to the first semantic information, and the S texts are that group of texts; in another possible situation, the P texts include multiple groups of texts conforming to the first semantic information, and the S texts are any one group among the multiple groups.
  • The case where the P texts include multiple groups of texts conforming to the first semantic information is handled similarly and is not repeated in this embodiment of the present application.
  • the image processing device outputs the first text.
  • the above-mentioned second translation is a text obtained by merging the translations corresponding to the S texts in the third translation
  • the third translation is an incomplete translation determined from the first translation according to the second degree of completeness.
  • Example 1: take the case where the image processing apparatus is a mobile phone as an example.
  • As shown in Fig. 2(a), the picture contains texts S1 to S8; as shown in Fig. 2(b), the first translation consists of the translations 01 to 08 corresponding to S1 to S8.
  • The mobile phone can acquire the picture including texts S1 to S8, as well as the completeness of S1 to S8 and the completeness of the first translation. Since S2 to S8 are incomplete texts among the eight texts, the mobile phone can merge S2, S3 and S4, which satisfy the first semantic information among S2 to S8, to obtain S9, the result of merging S2, S3 and S4. Afterwards, when S9 and the translations obtained by merging 02 to 08 corresponding to S2 to S8 are all complete texts, S9 is output.
  • the image processing method provided in the embodiment of the present application may further include: the image processing apparatus determines that the first text is a complete text according to the first semantic information.
  • The embodiment of the present application provides an image processing method. After the multiple texts in the target picture and the target information are obtained, at least one text satisfying the semantic information among the incomplete texts determined according to the target information can be merged to obtain a merged text. Therefore, when the text in the picture includes complex text, such as column text, paged text, or deformed and irregular text, these complex texts can be merged according to semantic information. Further, since the merged text is output only when the merged text and its corresponding translation are both complete texts, the semantics of the resulting merged text are more fluent. In this way, the ability to process text in pictures is improved.
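The overall flow just summarized can be sketched end to end as follows. Every helper is a hypothetical callable supplied by the caller (the patent does not specify any of them); the sketch only shows the shape of the S101 to S103 pipeline: find the incomplete texts (the P texts), merge a matching group (the S texts), and output only when the merged text and its translation are both complete.

```python
# Minimal end-to-end sketch of the method's flow; is_complete, in_merge_group
# and translate are hypothetical stand-ins, not the patent's implementation.
def picture_pipeline(texts, is_complete, in_merge_group, translate):
    incomplete = [t for t in texts if not is_complete(t)]   # the P texts
    group = [t for t in incomplete if in_merge_group(t)]    # the S texts
    merged = " ".join(group)                                # candidate first text
    translation = translate(merged)
    if is_complete(merged) and is_complete(translation):
        return merged
    return None
```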
  • The first completeness includes the sentence completeness of a first target sentence of each of the N texts, and the second completeness includes the sentence completeness of a second target sentence of each translation in the first translation.
  • Correspondingly, the above S101 may specifically be implemented through the following S101A to S101C.
  • the image processing device extracts text included in the target image to obtain N texts.
  • the image processing device analyzes the sentence completeness of the first target sentence based on the first semantic information.
  • first semantic information may include at least one of the following items: sentence structure information, sentence component information, and phrase composition information.
  • The first target sentence may include at least one of the following: the first sentence in the text and the last sentence in the text.
  • analyzing the sentence integrity of the first target sentence based on the first semantic information may include the following two possible implementations:
  • Implementation method 1: use the first semantic information as preset rules to analyze the sentence completeness of the first target sentence.
  • Based on the first semantic information, such as sentence structure, phrase composition and sentence components, all possibly incomplete texts are screened out.
  • the first sentence of S6 is "a newly opened store in Dao". According to the analysis of sentence component information, the first sentence of S6 lacks a subject, so S6 is considered incomplete.
  • The first semantic information may also include the first word of a sentence, the word at the end of a sentence, and phrases.
  • Vocabularies, phrase tables and end-of-sentence vocabularies can be constructed for different language types, and a weight can be set for each word in a vocabulary, where the weight can be set according to the frequency of use of the word.
  • Based on the phrase composition information, it can be judged whether the last word of the last sentence of a text is a common sentence-ending word, or whether the first word and the last word of the text can independently form words or phrases, so as to determine whether the text line or paragraph is complete.
  • For example, the last word of S2 is "Peng"; it can be seen from the vocabulary that the probability of "Peng" forming a word alone and serving as an ending word is very low, so S2 is considered incomplete.
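The weighted end-of-sentence vocabulary described above might be sketched as follows. The entries and weights are invented for illustration (a real vocabulary would be built per language from usage frequencies, as the text says), and the threshold is an arbitrary assumption.

```python
# Hypothetical end-of-sentence vocabulary with usage-frequency weights, in the
# spirit of the passage above; all entries and weights are invented.
END_WORD_WEIGHTS = {
    "today": 0.9,
    "it": 0.8,
    "store": 0.7,
    "friend": 0.1,   # in this toy lexicon, rarely ends a sentence on its own
    "cannot": 0.05,
}

def ending_plausibility(text: str, default: float = 0.2) -> float:
    """Weight of the text's final word when read as a sentence-ending word."""
    last_word = text.strip().rstrip(".!?").split()[-1].lower()
    return END_WORD_WEIGHTS.get(last_word, default)

def ends_plausibly(text: str, threshold: float = 0.5) -> bool:
    """Treat the text's ending as complete when its weight clears a threshold."""
    return ending_plausibility(text) >= threshold
```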
  • Implementation method 2: construct a semantic model corresponding to the first semantic information, input the N texts into the semantic model, and analyze the sentence completeness of the first target sentence.
  • text data with features such as lexical structure, syntactic structure, and sentence ending words can be used to train a semantic model, and the first semantic information such as part of speech and syntactic structure of different types of languages can be set through the semantic model.
  • the semantic model can be directly used to judge whether the target sentence of the current text is complete.
  • The semantic model can also output information such as the morphology and syntactic structure of an incomplete sentence and its possibly missing sentence components.
  • the embodiment of the present application does not limit the specific algorithm of the semantic model, as long as the corresponding model training data is constructed according to different types of languages.
  • the image processing device analyzes the sentence completeness of the second target sentence based on the second semantic information.
  • The second semantic information includes at least one of the following: sentence structure information, sentence component information, and phrase composition information.
  • the image processing method provided in this embodiment of the present application may further include the following S104.
  • the image processing device determines P texts from the N texts according to the sentence completeness of the first target sentence.
  • Example 2: in combination with Example 1 above, according to the first semantic information: the last word of the last sentence of S2 is "friend", which cannot serve as a sentence-ending word, so the last sentence is incomplete, that is, S2 is incomplete; the first word of the first sentence of S3 is "friend", which cannot serve as a sentence-opening word, so the first sentence is incomplete, that is, S3 is incomplete; the first sentence of S4 is "a lot of things", which lacks a subject and a predicate, so the first sentence is incomplete, that is, S4 is incomplete; the last sentence of S5 is "I know", which lacks an object, so the last sentence is incomplete, that is, S5 is incomplete; the first sentence of S6 is "Dao", which lacks a subject, so the first sentence is incomplete, that is, S6 is incomplete; the last sentence of S7 is "cannot", which cannot serve as a sentence-ending word, so the last sentence is incomplete, that is, S7 is incomplete.
  • The image processing method provided in the embodiment of the present application can extract the text included in the target picture to obtain the N texts, analyze the sentence completeness of the first target sentence based on the first semantic information, and analyze the sentence completeness of the second target sentence based on the second semantic information; that is, the completeness of the N texts and the completeness of the translations of the N texts can be determined.
  • Since the P texts can be determined from the N texts according to the sentence completeness of the first target sentence, it is convenient to select texts satisfying the first semantic information from the P texts for merging.
  • the image processing method provided in the embodiment of the present application may further include the following S105 to S108.
  • the image processing apparatus acquires at least two texts from the P texts that match the second text in the P texts according to the first semantic information.
  • the above-mentioned second text is any one of the P texts.
  • For example, the second text may be the text whose distribution position is foremost among the P texts.
  • the above S105 may specifically include: the image processing device judges, according to the first semantic information, whether the second text among the P texts can be merged with any text other than the second text among the P texts, so as to obtain At least two texts that match the second text.
  • Here, "at least two texts matching the second text among the P texts" means that the second text and each of the at least two texts satisfy the first semantic information.
  • the image processing apparatus merges the second text with the at least two texts respectively to obtain at least two merged texts.
  • the above S106 may include the following two specific possible implementation manners:
  • the image processing device determines the sentence perplexity of the at least two merged texts.
  • the sentence perplexity is used to indicate the smoothness of the sentences in the merged text.
  • determining the sentence perplexity of the at least two merged texts is essentially determining the sentence perplexity of the merged sentences included in the at least two merged texts respectively.
  • the image processing apparatus determines the third text corresponding to the target merged text as the text to be merged corresponding to the second text.
  • the above-mentioned target merged text is the text with the lowest sentence perplexity among at least two merged texts.
  • the S texts include the second text and the third text.
  • That is, the second text can be merged with the third text.
  • In the image processing method provided by the embodiment of the present application, after at least two texts matching the second text are obtained from the P texts according to the first semantic information, the second text can be merged with each of the at least two texts to obtain at least two merged texts, and the sentence perplexity of the at least two merged texts can be determined. Therefore, the text that best matches the second text can be selected from the at least two texts as the text to be merged according to the perplexity of the merged texts, thereby improving the correctness of text merging.
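The lowest-perplexity selection can be sketched as below. The perplexity function here is a crude invented proxy (a real system would score fluency with a language model); only the selection logic reflects the steps described above.

```python
# Sketch of the selection in S106-S108: merge the second text with each
# candidate and keep the candidate whose merged result scores lowest.
# toy_perplexity is an invented stand-in for a language-model perplexity.
def toy_perplexity(sentence: str) -> float:
    """Crude fluency proxy: repeated words and extreme lengths score worse."""
    words = sentence.lower().split()
    if not words:
        return float("inf")
    repeats = len(words) - len(set(words))
    return repeats + abs(len(words) - 8) * 0.1

def best_merge(second_text: str, candidates: list[str]) -> tuple[str, str]:
    """Return (chosen_candidate, merged_text) minimizing the toy perplexity."""
    merged = [(c, second_text + " " + c) for c in candidates]
    return min(merged, key=lambda pair: toy_perplexity(pair[1]))
```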
  • the image processing method provided in the embodiment of the present application may further include the following S109 and S110. That is, the above details can be realized through S110 to S112.
  • the image processing device determines adjacent Q texts from the P texts according to the distribution position of each text in the P texts.
  • Q is an integer greater than or equal to S.
  • The order of the numbers in an entry of the unmergeable list represents the merging order that the entry rules out.
  • For example, S2_S7 in the unmergeable list means that the next sentence of S2 cannot be S7, but it does not mean that the next sentence of S7 cannot be S2.
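The directed nature of the unmergeable list can be captured with ordered pairs, as in this minimal bookkeeping sketch (the text identifiers and function names are hypothetical):

```python
# The unmergeable list records *directed* pairs: ("S2", "S7") only rules out
# S7 directly following S2, not the reverse order.
unmergeable: set[tuple[str, str]] = set()

def mark_unmergeable(prev_id: str, next_id: str) -> None:
    """Record that next_id cannot be the next sentence after prev_id."""
    unmergeable.add((prev_id, next_id))

def can_follow(prev_id: str, next_id: str) -> bool:
    """Check whether next_id is still allowed to follow prev_id."""
    return (prev_id, next_id) not in unmergeable
```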
  • the image processing device determines S texts that satisfy the first semantic information among the Q texts as texts to be merged.
  • At least two texts matching text 1 in the Q texts are obtained from the Q texts.
  • the merged text 1 is the text with the lowest sentence perplexity among at least two merged texts.
  • the S texts include text 1 and text 2.
  • In a possible case, the S texts also include other texts besides text 1 and text 2, so that steps (1) to (4) in the above embodiment can continue to be executed in a loop to determine the other texts matching text 1 and text 2.
  • S texts satisfying the first semantic information can be obtained from the Q texts and determined as texts to be merged.
  • Since the Q texts adjacent in distribution position can be determined from the P texts according to the distribution position of each of the P texts, some texts that have no possibility of merging by distribution position can be excluded, thereby reducing unnecessary text merging operations of the electronic device. Further, since the S texts satisfying the first semantic information among the Q texts can be determined as the texts to be merged, after the rough screening by distribution position, the first semantic information is used to determine the texts to be merged from the Q texts, so that the semantic fluency of the merged text is higher.
  • the image processing method provided in this embodiment of the present application may further include the following S111.
  • the above S102 may specifically be implemented through the following S102A.
  • the image processing apparatus determines a target arrangement sequence of the S texts according to the first semantic information.
  • the arrangement order of the text can be determined according to the sentence component information and phrase composition information.
  • the image processing apparatus combines the S texts according to the target arrangement sequence to obtain the first text.
  • Merging the S texts according to the target arrangement sequence is essentially: merging the last sentence of one text with the first sentence of the other text in each pair of adjacent texts among the S texts, looping in this way until the S texts are merged to obtain the first text.
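The ordering-then-merging of a pair of texts can be sketched as follows. The phrase table is a toy stand-in for the phrase composition information (the entries, including ("know", "dao") motivated by the example below, are invented), and real ordering would also weigh sentence structure and sentence components.

```python
# Hypothetical phrase table standing in for the phrase composition information;
# all entries are invented for illustration.
PHRASES = {("zhi", "dao"), ("know", "dao")}

def arrangement_order(text_a: str, text_b: str) -> list[str]:
    """Prefer the order whose junction words form a known phrase."""
    a_last = text_a.split()[-1].lower().strip(".!?")
    b_first = text_b.split()[0].lower()
    if (a_last, b_first) in PHRASES:
        return [text_a, text_b]
    return [text_b, text_a]

def merge_pair(text_a: str, text_b: str) -> str:
    """Merge two texts in the determined arrangement order."""
    return " ".join(arrangement_order(text_a, text_b))
```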
  • As an example, take the first semantic information as sentence structure information and sentence component information. Assume that the last sentence of text A is "I know", which is a subject-predicate structure, and the first sentence of text B is "Dao is a newly opened shop", which is a verb-object structure. According to the sentence structure information, sentence component information and phrase composition information, it can be known that text A lacks an object, text B lacks a subject, and "zhi" and "dao" conform to the phrase composition information, so the arrangement order of text A and text B can be determined as A_B; that is, text B is merged at the end of text A.
  • As another example, take the first semantic information as phrase composition information.
  • Assume that the last word of the last sentence of text C is "friend" and the first word of the first sentence of text D is "friend".
  • According to the phrase composition information, it can be known that "friend" in text C and "friend" in text D conform to the phrase composition information, so it can be determined that the arrangement order of text C and text D is C_D; that is, text D is merged at the end of text C.
  • the image processing method provided by the embodiment of the present application can determine the target arrangement order of the S texts according to the first semantic information, so after merging the S texts according to the target arrangement order to obtain the first text, the first The semantics of the text are more complete, and the problem of semantic contradictions is not easy to appear.
  • the image processing method provided in the embodiment of the present application may also include another possible implementation manner.
  • the method may also include the following S112 to S115.
  • M, T and L are all integers greater than 1;
  • the image processing device translates the fourth text to obtain a fourth translation.
  • the fourth translation may include translations in one language type, or translations in multiple language types.
  • the embodiment of the present application does not limit the number and language types of the fourth translations.
  • For example, if the fourth text is a Chinese-language text, the fourth translation may be an English-language translation; or, if the fourth text is an English-language text, the fourth translation may include a Chinese-language translation and a Korean-language translation.
  • the image processing device outputs the fourth text and the fourth translation.
  • For example, assume the second text is a Chinese text.
  • The Chinese text is translated to obtain an English translation; if the English translation is a complete text, the image processing apparatus can output the Chinese text and the English translation.
  • In this way, the fourth text is obtained and then translated to obtain the fourth translation, and output occurs only when the fourth translation is a complete text. Thus, the judgment of whether the merged fourth text is complete is combined with the judgment of the completeness of the fourth translation to determine whether to output the fourth text, thereby improving the accuracy of paragraph merging.
  • Since the fourth translation can also be output, in a scenario where the text in the target picture needs to be translated, a translation with higher accuracy can be output.
  • the image processing method provided in the embodiment of the present application may further include the following S116 and S117.
  • the image processing device merges R texts among the T texts to obtain a fifth text.
  • R texts include paragraphs determined according to the semantic information of the fourth translation, and R is an integer greater than 1.
  • the above R texts may include all of the L texts, or some of the L texts, which are determined according to actual conditions, which is not limited in this embodiment of the present application.
  • the R texts are texts satisfying semantic information among the T texts.
  • the fourth translation is an incomplete text
  • other texts satisfying the semantic information can be obtained from the T texts, and the fourth text can be combined with the other texts. It can be understood that the text merging position of the fourth text and the other text corresponds to the semantically incomplete text position in the fourth translation.
  • The image processing apparatus outputs the fifth text and the fifth translation.
  • the above-mentioned fifth translation is the translation corresponding to the fifth text.
  • It should be noted that the completeness of the translation is the focus of picture translation. If the translation is incomplete, then even if the paragraph merged from the original text in the picture (also called the original text paragraph) is complete, the texts satisfying the semantic information at the corresponding positions of the original paragraph still need to be merged according to the positions of the semantically incomplete text in the translation. Therefore, the completeness of the translation is judged again after translation, to ensure the completeness of the final output text.
  • Merging starting from the original text paragraphs can ensure the completeness of the original text paragraphs. Only when the original paragraph is a complete paragraph can an effective translation be obtained after the original paragraph is translated by the translation model; conversely, if only the translations are merged, it is difficult to obtain a translation that satisfies the semantic information.
  • the image processing method provided in the embodiment of the present application may further include: if the fifth text is a complete text, translating the fifth text to obtain a fifth translation.
  • the translation is performed only when the merged fifth text is a complete text, thereby avoiding an invalid translation operation when the merged text is incomplete, and also saving the operating resources of the electronic device.
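As a non-limiting illustration of the step above, the following Python sketch merges the R texts into a fifth text and calls the translator only when the merged result is complete. The functions `is_complete` and `translate` are hypothetical stand-ins: the embodiment does not specify how completeness is judged or which translation model is used.

```python
# Non-limiting sketch: merge the R texts into a fifth text, and invoke
# the translation step only when the merged text is a complete text.
# `is_complete` and `translate` are assumed placeholders, not APIs
# defined by this application.

def is_complete(text: str) -> bool:
    """Toy completeness check: the text ends with terminal punctuation."""
    return text.rstrip().endswith((".", "!", "?"))

def translate(text: str) -> str:
    """Placeholder for the translation model; it simply echoes its input."""
    return text

def remerge_and_translate(r_texts):
    """Merge the R texts; translate only if the merged fifth text is complete."""
    fifth_text = " ".join(t.strip() for t in r_texts)
    if not is_complete(fifth_text):
        # incomplete merged text: skip the invalid translation operation
        return fifth_text, None
    return fifth_text, translate(fifth_text)

text, translation = remerge_and_translate(
    ["The quick brown fox", "jumps over the lazy dog."]
)
```

Skipping the translator call when the merged text is incomplete is what saves the operating resources mentioned above.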
  • the fourth translation is an incomplete translation
  • the R texts among the T texts can be combined to obtain the fifth text. Therefore, according to the incomplete fourth translation, the R texts satisfying the semantic information among the T texts are re-merged, thereby improving the accuracy of text merging. Further, since the fifth text and the fifth translation are output only when both are complete texts, it can be ensured that a translation with higher accuracy is output.
  • the image processing method provided in the embodiment of the present application may be executed by an image processing device.
  • an image processing method performed by an image processing device is taken as an example to illustrate the method provided by the embodiments of the present application.
  • an embodiment of the present application provides an image processing apparatus 200 , and the image processing apparatus may include an acquisition module 201 , a processing module 202 and an output module 203 .
  • the acquiring module 201 may be configured to acquire N texts included in the target picture and target information, where the target information includes at least one of the following: a first completeness of the N texts, a second completeness of first translations corresponding to the N texts; N is an integer greater than 1.
  • the processing module 202 may be configured to merge S texts satisfying the first semantic information among P texts to obtain a first text, where the P texts are incomplete texts determined from the N texts according to the first completeness; both P and S are integers greater than 1.
  • the output module 203 may be configured to output the first text when the first text and a second translation are both complete texts, where the second translation is the text obtained after merging the translations corresponding to the S texts in a third translation, and the third translation is an incomplete translation determined from the first translations according to the second completeness.
  • the first completeness includes the sentence completeness of the first target sentence in each of the N texts
  • the second completeness includes the sentence completeness of the second target sentence in each of the first translations.
  • the acquisition module 201 is specifically configured to extract the text included in the target picture to obtain the N texts; to analyze the sentence completeness of the first target sentence based on the first semantic information; and to analyze the sentence completeness of the second target sentence based on the second semantic information. The first target sentence and the second target sentence each include at least one of the following: the first sentence in the text, the last sentence in the text. The first semantic information and the second semantic information each include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
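The embodiment leaves the sentence-completeness analysis open. One possible realization, shown here only as an illustrative sketch, uses surface sentence-structure cues: a first sentence that opens with a capital letter and a last sentence that ends with terminal punctuation. The function name and the specific cues are assumptions, not the application's method.

```python
# Illustrative heuristic for analyzing the sentence completeness of the
# first and last target sentences of a text, based only on simple
# sentence-structure cues. This is one assumed realization; the
# embodiment does not fix a particular analysis method.

def analyze_completeness(text: str) -> dict:
    stripped = text.strip()
    # first sentence looks complete if it opens with a capital (or a
    # non-letter such as a digit or quotation mark)
    starts_ok = bool(stripped) and (stripped[0].isupper() or not stripped[0].isalpha())
    # last sentence looks complete if it ends with terminal punctuation
    ends_ok = stripped.endswith((".", "!", "?"))
    return {
        "first_sentence_complete": starts_ok,
        "last_sentence_complete": ends_ok,
    }
```

A text failing either check would be counted among the P incomplete texts.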
  • the image processing apparatus may further include a determination module.
  • the determination module can be used to determine P texts from the N texts according to the sentence integrity of the first target sentence, and the P texts correspond to the third translation.
  • the image processing apparatus may further include a determination module.
  • the obtaining module 201 may also be configured to obtain, according to the first semantic information, at least two texts from the P texts that match a second text in the P texts.
  • the processing module 202 may also be configured to merge the second text with each of the at least two texts to obtain at least two merged texts.
  • a determination module configured to determine the third text corresponding to the target merged text as the text to be merged with the second text, where the target merged text is the text with the lowest sentence perplexity among the at least two merged texts; the S texts include the second text and the third text.
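The lowest-perplexity selection above can be sketched as follows. A real system would score each candidate merge with a language model; the `pseudo_perplexity` function here is a stand-in that merely rewards bigrams seen in a tiny reference corpus, so that the selection logic is concrete. All names and the scoring scheme are assumptions for illustration only.

```python
# Sketch of choosing the merge partner (the "third text") whose merge
# with the second text has the lowest sentence perplexity. The scorer
# below is a toy stand-in for a language model.

from collections import Counter

REFERENCE_BIGRAMS = Counter()
for sent in ["the cat sat on the mat", "the dog sat on the rug"]:
    words = sent.split()
    REFERENCE_BIGRAMS.update(zip(words, words[1:]))

def pseudo_perplexity(text: str) -> float:
    """Lower when the text contains more familiar bigrams."""
    words = text.lower().split()
    if len(words) < 2:
        return float("inf")
    score = sum(REFERENCE_BIGRAMS[(a, b)] for a, b in zip(words, words[1:]))
    return 1.0 / (1.0 + score)

def pick_merge_partner(second_text, candidates):
    """Merge the second text with each candidate; keep the merge with
    the lowest perplexity and return (target merged text, third text)."""
    merged = [(second_text + " " + c, c) for c in candidates]
    merged.sort(key=lambda pair: pseudo_perplexity(pair[0]))
    return merged[0]

best_merge, third_text = pick_merge_partner(
    "the cat sat", ["on the mat", "banana keyboard"]
)
```

The candidate yielding the more fluent merge ("on the mat") is selected as the text to be merged.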
  • the image processing apparatus may further include a determination module.
  • the determining module can be used to determine Q adjacent texts from the P texts according to the distribution position of each of the P texts, where Q is an integer greater than or equal to S; and to determine, from the Q texts, the S texts satisfying the first semantic information as the texts to be merged.
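Selecting adjacent texts by distribution position might be realized as in the sketch below, where each text block carries a top-left (x, y) coordinate and adjacency is judged by vertical proximity. The coordinate model and the vertical-span criterion are assumptions made for illustration; the embodiment does not prescribe them.

```python
# Sketch of picking Q adjacent texts from the P incomplete texts based
# on their distribution positions in the picture. Each entry is a
# (text, (x, y)) tuple; the Q texts packed most tightly in the vertical
# direction are treated as adjacent.

def adjacent_group(p_texts, q):
    ordered = sorted(p_texts, key=lambda item: item[1][1])  # sort by y
    best_span, best_start = float("inf"), 0
    # slide a window of size q and keep the window with the smallest
    # vertical extent
    for i in range(len(ordered) - q + 1):
        span = ordered[i + q - 1][1][1] - ordered[i][1][1]
        if span < best_span:
            best_span, best_start = span, i
    return [text for text, _ in ordered[best_start:best_start + q]]

group = adjacent_group(
    [("col A line", (0, 10)), ("col A next", (0, 30)), ("footer", (0, 300))],
    2,
)
```

The S texts satisfying the first semantic information would then be chosen from within this adjacent group.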
  • the determination module may also be configured to determine the target arrangement order of the S texts according to the first semantic information.
  • the processing module may be specifically configured to combine the S texts according to the target arrangement order to obtain the first text.
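Determining a target arrangement order before merging can be sketched as below: every permutation of the S texts is scored and the concatenation that reads best is kept. The scoring cue used here (a fragment without terminal punctuation followed by a lowercase continuation) is an assumed proxy for the first semantic information; the embodiment does not fix a particular scorer.

```python
# Sketch of determining a target arrangement order for the S texts and
# merging them in that order. The fluency cue is an assumption.

from itertools import permutations

def join_score(order):
    """Count boundaries where an unterminated fragment is followed by a
    lowercase continuation -- a crude semantic-fit cue."""
    score = 0
    for left, right in zip(order, order[1:]):
        if not left.rstrip().endswith((".", "!", "?")) and right[:1].islower():
            score += 1
    return score

def merge_in_target_order(s_texts):
    best = max(permutations(s_texts), key=join_score)
    return " ".join(best)

merged = merge_in_target_order(
    ["jumps over the lazy dog.", "The quick brown fox"]
)
```

Exhaustive permutation is only viable for small S; a practical system would prune candidates, for example using the adjacency selection described earlier.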
  • the embodiment of the present application provides an image processing device. After acquiring multiple texts and target information in the target picture, the texts that satisfy the semantic information among the incomplete texts determined according to the target information can be combined to obtain a merged text. Thus, when the picture contains complex text such as column text, paginated text, or deformed and irregular text, these complex texts can be merged according to semantic information. Further, since the merged text is output only when both it and its corresponding translation are complete texts, the semantics of the resulting merged text are more fluent. In this way, the ability to process the text in the picture is improved.
  • the image processing apparatus in the embodiment of the present application may be an electronic device, or may be a component in the electronic device, such as an integrated circuit or a chip.
  • the electronic device may be a terminal, or other devices other than the terminal.
  • the electronic device can be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted electronic device, a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), etc.
  • the image processing device in the embodiment of the present application may be a device with an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in this embodiment of the present application.
  • the image processing apparatus provided in the embodiment of the present application can realize various processes realized by the method embodiments in FIG. 1 and FIG. 2 , and details are not repeated here to avoid repetition.
  • the embodiment of the present application also provides an electronic device 300, including a processor 301 and a memory 302.
  • the memory 302 stores programs or instructions that can run on the processor 301.
  • when the programs or instructions are executed by the processor 301, the various steps of the above image processing method embodiments are implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 5 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • the electronic device 400 includes, but is not limited to: a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, a processor 410, and other components.
  • the electronic device 400 can also include a power supply (such as a battery) for supplying power to the various components, and the power supply can be logically connected to the processor 410 through a power management system, so that functions such as charging, discharging, and power-consumption management are realized through the power management system.
  • the structure of the electronic device shown in FIG. 5 does not constitute a limitation on the electronic device.
  • the electronic device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently; details are not repeated here.
  • the processor 410 may be configured to acquire N texts included in the target picture and target information, where the target information includes at least one of the following: the first completeness of the N texts, the second completeness of the first translations corresponding to the N texts; N is an integer greater than 1. The processor 410 is further configured to merge S texts satisfying the first semantic information among the P texts to obtain the first text, where the P texts are incomplete texts determined from the N texts according to the first completeness, and P and S are both integers greater than 1; and to output the first text when the first text and the second translation are both complete texts, where the second translation is the text obtained after merging the translations corresponding to the S texts in the third translation, and the third translation is an incomplete translation determined from the first translations according to the second completeness.
  • the first completeness includes the sentence completeness of the first target sentence in each of the N texts
  • the second completeness includes the sentence completeness of the second target sentence in each of the first translations.
  • the processor 410 is specifically configured to extract the text included in the target picture to obtain the N texts; to analyze the sentence completeness of the first target sentence based on the first semantic information; and to analyze the sentence completeness of the second target sentence based on the second semantic information. The first target sentence and the second target sentence each include at least one of the following: the first sentence in the text, the last sentence in the text. The first semantic information and the second semantic information each include at least one of the following: sentence structure information, sentence component information, and phrase composition information.
  • the processor 410 may be configured to determine P texts from the N texts according to the sentence completeness of the first target sentence, and the P texts correspond to the third translation.
  • the processor 410 may also be configured to obtain, according to the first semantic information, at least two texts from the P texts that match the second text in the P texts; to merge the second text with each of the at least two texts to obtain at least two merged texts; and to determine the third text corresponding to the target merged text as the text to be merged with the second text, where the target merged text is the text with the lowest sentence perplexity among the at least two merged texts; the S texts include the second text and the third text.
  • the processor 410 may be configured to determine Q adjacent texts from the P texts according to the distribution position of each of the P texts, where Q is an integer greater than or equal to S; and to determine, from the Q texts, the S texts satisfying the first semantic information as the texts to be merged.
  • the processor 410 may also be configured to determine a target arrangement order of the S texts according to the first semantic information; and to combine the S texts according to the target arrangement order to obtain the first text.
  • An embodiment of the present application provides an electronic device. After acquiring multiple texts and target information in the target picture, the texts that satisfy the semantic information among the incomplete texts determined according to the target information can be combined to obtain a merged text. Thus, when the picture contains complex text such as column text, paginated text, or deformed and irregular text, these complex texts can be merged according to semantic information. Further, since the merged text is output only when both it and its corresponding translation are complete texts, the semantics of the resulting merged text are more fluent. In this way, the ability to process the text in the picture is improved.
  • the input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042, and the graphics processing unit 4041 processes image data of still pictures or video obtained by an image capture device (such as a camera).
  • the display unit 406 may include a display panel 4061, and the display panel 4061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 407 includes at least one of a touch panel 4071 and other input devices 4072 .
  • the touch panel 4071 is also called a touch screen.
  • the touch panel 4071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 4072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be repeated here.
  • the memory 409 can be used to store software programs as well as various data.
  • the memory 409 can mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area can store an operating system, and an application program or instructions required by at least one function (such as a sound playing function, an image playing function, etc.).
  • the memory 409 may include volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • the volatile memory can be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), or direct Rambus random access memory (DRRAM).
  • the processor 410 may include one or more processing units; optionally, the processor 410 integrates an application processor and a modem processor, where the application processor mainly handles operations related to the operating system, user interface, and application programs, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may also not be integrated into the processor 410.
  • the embodiment of the present application also provides a readable storage medium on which a program or instructions are stored. When the program or instructions are executed by a processor, each process of the above image processing method embodiments is realized, and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
  • the processor is the processor in the electronic device in the above embodiment.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • the embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions to realize the various processes of the above image processing method embodiments and achieve the same technical effect; to avoid repetition, details are not repeated here.
  • the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.
  • the embodiment of the present application provides a computer program product, where the program product is stored in a storage medium and is executed by at least one processor to implement the various processes of the above image processing method embodiments and achieve the same technical effect; to avoid repetition, details are not repeated here.
  • the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent in that process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.
  • the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The present application relates to the technical field of communications, and provides an image processing method and apparatus, and an electronic device. The method comprises: acquiring N texts included in a target picture and target information, the target information comprising a first completeness of the N texts and/or a second completeness of first translations corresponding to the N texts, N being an integer greater than 1; merging S texts satisfying first semantic information among P texts to obtain a first text, the P texts being incomplete texts determined from the N texts according to the first completeness, and P and S both being integers greater than 1; and, provided that the first text and a second translation are both complete texts, outputting the first text, the second translation being a text obtained by merging the translations corresponding to the S texts among third translations, and the third translations being incomplete translations determined from the first translations according to the second completeness.
PCT/CN2022/136494 2021-12-10 2022-12-05 Procédé et appareil de traitement d'image, et dispositif électronique WO2023103943A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111509057.0A CN114299525A (zh) 2021-12-10 2021-12-10 图片处理方法、装置及电子设备
CN202111509057.0 2021-12-10

Publications (1)

Publication Number Publication Date
WO2023103943A1 true WO2023103943A1 (fr) 2023-06-15

Family

ID=80967753

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/136494 WO2023103943A1 (fr) 2021-12-10 2022-12-05 Procédé et appareil de traitement d'image, et dispositif électronique

Country Status (2)

Country Link
CN (1) CN114299525A (fr)
WO (1) WO2023103943A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299525A (zh) * 2021-12-10 2022-04-08 维沃移动通信有限公司 图片处理方法、装置及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9659224B1 (en) * 2014-03-31 2017-05-23 Amazon Technologies, Inc. Merging optical character recognized text from frames of image data
CN111368562A (zh) * 2020-02-28 2020-07-03 北京字节跳动网络技术有限公司 翻译图片中的文字的方法、装置、电子设备、及存储介质
CN113343720A (zh) * 2021-06-30 2021-09-03 北京搜狗科技发展有限公司 一种字幕翻译方法、装置和用于字幕翻译的装置
CN113660432A (zh) * 2021-08-17 2021-11-16 安徽听见科技有限公司 翻译字幕制作方法、装置、电子设备与存储介质
CN114299525A (zh) * 2021-12-10 2022-04-08 维沃移动通信有限公司 图片处理方法、装置及电子设备

Also Published As

Publication number Publication date
CN114299525A (zh) 2022-04-08

Similar Documents

Publication Publication Date Title
US10198506B2 (en) System and method of sentiment data generation
CN108334490B (zh) 关键词提取方法以及关键词提取装置
KR101130444B1 (ko) 기계번역기법을 이용한 유사문장 식별 시스템
AU2017408800B2 (en) Method and system of mining information, electronic device and readable storable medium
US8577882B2 (en) Method and system for searching multilingual documents
JP5280642B2 (ja) 翻訳システム及び翻訳プログラム、並びに、対訳データ生成方法
US10311113B2 (en) System and method of sentiment data use
JPWO2003065245A1 (ja) 翻訳方法、翻訳文の出力方法、記憶媒体、プログラムおよびコンピュータ装置
WO2022135474A1 (fr) Procédé et appareil de recommandation d'informations et dispositif électronique
CN108920649B (zh) 一种信息推荐方法、装置、设备和介质
WO2023103943A1 (fr) Procédé et appareil de traitement d'image, et dispositif électronique
US20220365956A1 (en) Method and apparatus for generating patent summary information, and electronic device and medium
CN111950301A (zh) 一种中译英的英语译文质量分析方法及系统
WO2003021391A2 (fr) Procede et appareil de traduction d'un genre en un autre genre d'une langue generique
CN108984600B (zh) 交互处理方法、装置、计算机设备及可读介质
Delecraz et al. Multimodal machine learning for natural language processing: disambiguating prepositional phrase attachments with images
CN110020429A (zh) 语义识别方法及设备
CN110888940B (zh) 文本信息提取方法、装置、计算机设备及存储介质
JP2017015874A (ja) 文章読解支援装置、並びに、注釈データ作成装置、注釈データ作成方法及び注釈データ作成プログラム
US20210263915A1 (en) Search Text Generation System and Search Text Generation Method
WO2021097629A1 (fr) Procédé et appareil de traitement de données, et dispositif électronique et support de stockage
TWI376656B (en) Foreign-language learning method utilizing an original language to review corresponding foreign languages and foreign-language learning database system thereof
CN113157966B (zh) 显示方法、装置及电子设备
CN112036135A (zh) 一种文本处理方法和相关装置
KR101501459B1 (ko) 자동 번역 기술을 이용한 작문 시스템 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22903365

Country of ref document: EP

Kind code of ref document: A1