CN115797959A - Picture processing method and device, electronic equipment and storage medium

Info

Publication number
CN115797959A
CN115797959A
Authority
CN
China
Prior art keywords
sample
special effect
cross entropy
picture
prediction mask
Prior art date
Legal status
Pending
Application number
CN202211658626.2A
Other languages
Chinese (zh)
Inventor
赵瑞书
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202211658626.2A
Publication of CN115797959A

Abstract

The embodiment of the invention relates to a picture processing method, a picture processing device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a picture to be processed, and inputting the picture to be processed into a trained detection model to obtain a prediction mask output by the detection model, the prediction mask at least comprising a character prediction mask corresponding to the character area in the picture to be processed and a special effect prediction mask corresponding to the special effect area of the character special effect in the picture to be processed; and merging the character prediction mask and the special effect prediction mask to obtain a target prediction mask of the character region and the special effect region in the picture to be processed. In this way, the detection of characters and character special effects in the picture to be processed can be performed rapidly and accurately, the target prediction mask of the character region and the special effect region output by the detection model is obtained, and the efficiency of detecting the picture to be processed is improved.

Description

Picture processing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a picture processing method and device, electronic equipment and a storage medium.
Background
A poster for a movie or television drama often carries large artistic words, such as the title of the work and the names of actors. When the poster needs to be displayed, too many oversized characters can spoil the viewing experience, so the oversized artistic words in the poster may need to be erased.
However, before the oversized artistic words in a poster can be erased, they must first be identified, so a solution capable of identifying oversized artistic words in posters is urgently needed.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a picture processing method and apparatus, an electronic device, and a storage medium, so as to identify oversized artistic words in a poster and obtain the mask data of those artistic words.
In a first aspect, an embodiment of the present invention provides a method for processing an image, where the method includes:
acquiring a picture to be processed, and inputting the picture to be processed into a trained detection model to obtain a prediction mask output by the detection model;
the prediction mask includes at least: the character prediction mask corresponds to a character area in the picture to be processed, and the special effect prediction mask corresponds to a special effect area of a character special effect in the picture to be processed;
and merging the character prediction mask and the special effect prediction mask to obtain a target prediction mask of the character region and the special effect region in the picture to be processed.
In an optional embodiment, the merging the text prediction mask and the special effect prediction mask to obtain a target prediction mask of the text region and the special effect region in the picture to be processed includes:
merging the character prediction mask and the special effect prediction mask to obtain a merged mask of the character region and the special effect region in the picture to be processed;
and performing pixel expansion processing on the merging mask, and determining the expanded merging mask as a target prediction mask of the character region and the special effect region in the picture to be processed.
In an optional embodiment, the word prediction masks include a first word prediction mask and a second word prediction mask, the first word prediction mask corresponds to words in the picture to be processed, and the second word prediction mask corresponds to edges of words in the picture to be processed;
the special effect prediction mask comprises a first special effect prediction mask and a second special effect prediction mask, the first special effect prediction mask corresponds to a character special effect in the picture to be processed, and the second special effect prediction mask corresponds to a character special effect edge in the picture to be processed.
In an alternative embodiment, the trained detection model is obtained by:
acquiring a sample picture, and inputting the sample picture into a preset initial detection model to obtain a sample prediction mask output by the initial detection model;
the sample prediction mask includes at least: a sample text prediction mask corresponding to a sample text region in the sample picture and a sample special effect prediction mask corresponding to a sample special effect region of text in the sample picture;
determining the character cross entropy of the sample character area according to the sample character prediction mask;
determining a special effect cross entropy of the sample special effect region according to the sample special effect prediction mask;
and performing back propagation processing on the character cross entropy and the special effect cross entropy to finish the training of an initial detection model.
In an optional embodiment, the sample text prediction mask at least includes: a first prediction mask and a first edge prediction mask; the first prediction mask corresponds to sample text in the sample text region, the first edge prediction mask corresponds to sample text edges in the sample text region;
the sample special effect prediction mask includes at least: a second prediction mask and a second edge prediction mask; the second prediction mask corresponds to a sample text special effect in the sample special effect region, and the second edge prediction mask corresponds to a sample text special effect edge in the sample special effect region;
the text cross entropy at least comprises: a first cross entropy and a second cross entropy; the first cross entropy corresponds to the cross entropy of the sample text in the sample text region, and the second cross entropy corresponds to the cross entropy of the sample text edge in the sample text region;
the special effect cross entropy comprises at least: a third cross entropy and a fourth cross entropy; the third cross entropy corresponds to a cross entropy of a sample text special effect in the sample special effect region, and the fourth cross entropy corresponds to a cross entropy of a sample text special effect edge in the sample special effect region.
In an optional embodiment, after the obtaining the sample picture, the method further includes:
marking a sample text area of the sample picture through a first marking mask, and marking a sample special effect area of the sample picture through a second marking mask;
acquiring a sample text edge from the sample text area, and marking through a third marking mask;
and acquiring a sample character special effect edge from the sample special effect area, and marking through a fourth marking mask.
In an optional embodiment, the determining the word cross entropy of the sample word region according to the sample word prediction mask includes:
calculating the first mark mask and the first prediction mask by adopting a preset cross entropy algorithm to obtain a first cross entropy of the sample characters in the sample character area;
calculating the third mark mask and the first edge prediction mask by adopting a preset cross entropy algorithm to obtain a second cross entropy of the sample character edge in the sample character area;
determining the first cross entropy and the second cross entropy as the text cross entropy of the sample text region.
In an optional embodiment, the determining the special effect cross entropy of the sample special effect region according to the sample special effect prediction mask includes:
calculating the second mark mask and the second prediction mask by adopting a preset cross entropy algorithm to obtain a third cross entropy of the special effect in the sample special effect region;
calculating the fourth mark mask and the second edge prediction mask by adopting a preset cross entropy algorithm to obtain a fourth cross entropy of the sample character special effect edge in the sample special effect region;
determining the third cross entropy and the fourth cross entropy as a special effect cross entropy of the sample special effect region.
In an optional embodiment, the performing back propagation processing on the text cross entropy and the special effect cross entropy to complete training of an initial detection model includes:
performing summation operation on the first cross entropy, the second cross entropy, the third cross entropy and the fourth cross entropy by adopting a preset summation algorithm to obtain a target cross entropy;
and performing back propagation processing on the target cross entropy to finish the training of the initial detection model.
In a second aspect, an embodiment of the present invention provides an image processing apparatus, where the apparatus includes:
the image acquisition module is used for acquiring an image to be processed, inputting the image to be processed into the trained detection model and obtaining a prediction mask output by the detection model;
the prediction mask includes at least: the character prediction mask corresponds to a character area in the picture to be processed, and the special effect prediction mask corresponds to a special effect area of a character special effect in the picture to be processed;
and the mask determining module is used for merging the character prediction mask and the special effect prediction mask to obtain a target prediction mask of the character region and the special effect region in the picture to be processed.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor and a memory, wherein the processor is configured to execute a picture processing program stored in the memory to implement the picture processing method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a storage medium, where the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the picture processing method according to any one of the first aspects.
According to the technical scheme provided by the embodiment of the invention, the picture to be processed is acquired and input into the trained detection model to obtain the prediction mask output by the detection model, wherein the prediction mask at least comprises a character prediction mask corresponding to the character area in the picture to be processed and a special effect prediction mask corresponding to the special effect area of the character special effect in the picture to be processed. The character prediction mask and the special effect prediction mask are merged to obtain the target prediction mask of the character region and the special effect region in the picture to be processed. Therefore, the detection of the characters and the character special effects in the picture to be processed can be realized quickly and accurately, the target prediction mask of the character region and the special effect region output by the detection model is obtained, the efficiency of detecting the picture to be processed is improved, and a basis is laid for the subsequent operation of erasing the characters and the character special effects in the picture to be processed.
Drawings
Fig. 1 is a flowchart illustrating an embodiment of a method for processing an image according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating another method for processing pictures according to an embodiment of the present invention;
FIG. 3 is a flowchart of an embodiment of a model training method according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a data annotation method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another data annotation method according to an embodiment of the present invention;
fig. 6 is a flowchart of an embodiment of a target cross entropy determination method according to an embodiment of the present invention;
FIG. 7 is a schematic view of a visualization interface shown in an embodiment of the present invention;
FIG. 8 is a schematic view of another visualization interface shown in an embodiment of the present invention;
FIG. 9 is a flowchart illustrating another exemplary method for training a model according to an embodiment of the present invention;
FIG. 10 is a block diagram of an embodiment of a picture processing apparatus according to the present invention;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A common poster for a movie or television drama usually presents the title in large artistic words. When some of these posters are displayed on the iQIYI main site, excessive text in the poster is undesirable, so the artistic words need to be erased by technical means. The existing approach generally relies on the PS (Photoshop) technique. However, the PS technique is complicated to operate and demands a high level of skill; if a large amount of data needs to be processed, considerable labor and time costs are required, and the quality of the retouched pictures may also suffer. Before the artistic words in a poster can be erased, they must first be detected. When detecting them, some artistic words carry large shadows or strokes; if only the artistic words are detected but the shadows or strokes are not, then in the actual use of the detection result, for example when erasing the artistic words, the resulting erasing effect is not ideal.
Therefore, the embodiment of the invention proposes to cover the full range of the artistic words in the poster as completely as possible with as small a mask as possible, laying a good foundation for the subsequent character erasing and finally realizing the erasure of the artistic words in the poster.
Specifically, in order to ensure the effect of erasing the artistic words and their special effects in the poster, the special effects corresponding to the artistic words, such as shadows and strokes, also need to be detected while the artistic words are detected. If the shadows and strokes are treated together with the artistic words as a single whole when the special effect mask is generated and trained, some content that merely resembles a special effect but contains no artistic words may be falsely detected as artistic words because part of the special effect is too large.
In order to solve this technical problem, the embodiment of the invention provides a pre-trained detection model that detects the artistic words and the special effects of the artistic words in the poster separately, which ensures the separation of the artistic words from their special effects and can effectively suppress the problem that other objects are mistakenly recognized as artistic words or artistic word special effects because they are visually similar to those special effects.
The picture processing method provided by the present invention is further explained below with specific embodiments, which do not limit the embodiments of the present invention.
Referring to fig. 1, a flowchart of an embodiment of a picture processing method according to an embodiment of the present invention is shown. As shown in fig. 1, the process may include the following steps:
step 101, obtaining a picture to be processed, inputting the picture to be processed into a trained detection model, and obtaining a prediction mask output by the detection model.
Step 102, the prediction mask at least comprises: the character prediction mask corresponds to a character area in the picture to be processed, and the special effect prediction mask corresponds to a special effect area of a character special effect in the picture to be processed.
Step 101 and step 102 are described collectively below:
in the embodiment of the invention, in order to realize the rapid detection of the characters in the picture to be processed and the character special effects corresponding to the characters, a detection model is trained in advance, and the prediction mask output by the detection model can be obtained by inputting the picture to be processed into the trained detection model. For how to train the detection model, reference may be made to the following description in the flowchart, which is not detailed here.
Specifically, the prediction mask at least includes: the character prediction mask corresponds to a character area in the picture to be processed, and the special effect prediction mask corresponds to a special effect area of a character special effect in the picture to be processed. It can be understood that the to-be-processed picture is a picture including characters and a character special effect, and the number, the size and other picture information of the to-be-processed picture are not particularly limited in the embodiment of the present invention.
And 103, merging the character prediction mask and the special effect prediction mask to obtain a target prediction mask of a character region and a special effect region in the picture to be processed.
As can be seen from the above description, in the embodiment of the present invention, after the word prediction mask and the special effect prediction mask are obtained, merging processing (or union processing) may be performed on the word prediction mask and the special effect prediction mask, so that prediction masks (hereinafter referred to as target prediction masks) of a word region and a special effect region in an image to be processed may be obtained.
By this processing, the picture to be processed can be detected quickly and accurately, the target prediction mask of the character region and the special effect region of the character special effect in the picture to be processed is obtained, and a basis is provided for the subsequent character erasing operation on the picture to be processed, so that the character erasing can be better realized.
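For illustration, the following is a minimal sketch of steps 101-103, assuming a PyTorch detection model that outputs the four mask layers described later in this document (word, word edge, special effect, special effect edge); the function name, the output layout and the 0.5 threshold are illustrative assumptions rather than details from the patent:

```python
import torch

def predict_target_mask(detector: torch.nn.Module,
                        image: torch.Tensor,
                        threshold: float = 0.5) -> torch.Tensor:
    """image: (1, 3, H, W) tensor; returns an (H, W) boolean target prediction mask."""
    detector.eval()
    with torch.no_grad():
        # four output layers; the two edge masks may be ignored at inference
        word_mask, _word_edge, effect_mask, _effect_edge = detector(image)
    word = word_mask[0, 0] > threshold      # word prediction mask
    effect = effect_mask[0, 0] > threshold  # special effect prediction mask
    # merging (union) the two masks yields the target prediction mask
    return torch.logical_or(word, effect)
```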
So far, the description about the flow shown in fig. 1 is completed.
As can be seen from the flow shown in fig. 1, in the technical solution of the present invention, the picture to be processed is obtained and input to the trained detection model to obtain the prediction mask output by the detection model, where the prediction mask at least comprises a character prediction mask corresponding to the character area in the picture to be processed and a special effect prediction mask corresponding to the special effect area of the character special effect in the picture to be processed. The character prediction mask and the special effect prediction mask are merged to obtain the target prediction mask of the character region and the special effect region in the picture to be processed. Therefore, the detection of the characters and the character special effects in the picture to be processed can be realized quickly and accurately, the target prediction mask of the character region and the special effect region output by the detection model is obtained, the efficiency of detecting the picture to be processed is improved, and a basis is laid for the subsequent operation of erasing the characters and the character special effects in the picture to be processed.
Referring to fig. 2, a flowchart of another embodiment of a method for processing a picture according to an embodiment of the present invention is shown. The flow shown in fig. 2 specifically describes how to implement the processing operation on the picture to be processed and the training operation on the detection model based on the flow shown in fig. 1. As shown in fig. 2, the process may include the following steps:
step 201, obtaining a picture to be processed, inputting the picture to be processed to the trained detection model, and obtaining a prediction mask output by the detection model.
Step 202, the prediction mask at least comprises: the character prediction mask corresponds to a character area in the picture to be processed, and the special effect prediction mask corresponds to a special effect area of a character special effect in the picture to be processed.
Step 201 and step 202 are described collectively below:
based on the related description of the flow shown in fig. 1, it can be known that the prediction mask at least comprises a character prediction mask and a special effect prediction mask, wherein the character prediction mask corresponds to the character area in the picture to be processed, and the special effect prediction mask corresponds to the special effect area of the character special effect in the picture to be processed. In the embodiment of the invention, in order to better distinguish the characters from the character special effects in the picture to be processed, the edges of the characters and the edges of the character special effects can also be determined.
Based on this, it can be seen that the word prediction mask may include a first word prediction mask and a second word prediction mask. The first word prediction mask corresponds to the characters in the picture to be processed, and the second word prediction mask corresponds to the edges of the characters in the picture to be processed. The special effect prediction mask may include a first special effect prediction mask and a second special effect prediction mask. The first special effect prediction mask corresponds to the character special effect in the picture to be processed, and the second special effect prediction mask corresponds to the character special effect edge in the picture to be processed. In this way, the prediction masks of the characters and character edges in the character area, and of the character special effect and character special effect edges in the special effect area, can be obtained for the picture to be processed.
In practical application, the characters and the character special effect in the picture to be processed can already be well distinguished by the difference between the first prediction mask of the characters and the second prediction mask of the character special effect. Therefore, the first edge prediction mask of the character edge and the second edge prediction mask of the character special effect edge may be ignored; accordingly, the word prediction mask may be the first prediction mask, and the special effect prediction mask may be the second prediction mask.
In an embodiment, the trained detection model may be obtained by: specifically, referring to fig. 3, it is a flowchart of an embodiment of a model training method according to an embodiment of the present invention. As shown in fig. 3, the process may include the following steps:
step 301, obtaining a sample picture, inputting the sample picture into a preset initial detection model, and obtaining a sample prediction mask output by the initial detection model.
Step 302, the sample prediction mask at least comprises a sample character prediction mask and a sample special effect prediction mask, wherein the sample character prediction mask corresponds to the sample character area in the sample picture, and the sample special effect prediction mask corresponds to the sample special effect area of the characters in the sample picture.
Step 301 and step 302 are described collectively below:
in the embodiment of the invention, the sample picture can be obtained in advance, and the sample prediction mask in the sample picture is obtained, so that the training of the initial detection model according to the sample prediction mask is realized, and the detection model is obtained.
In an embodiment, after the sample picture is obtained, a sample text region corresponding to the sample text in the sample picture and a sample special effect region corresponding to the sample text special effect may be determined, and the sample text edge of the sample text region and the sample text special effect edge of the sample special effect region may also be determined. In practical application, for convenient data labeling, two kinds of labels can be made: 1. labeling based on the outer edge and the inner edge of the characters; 2. labeling based on the outer edge and the inner edge of the characters plus the character special effects. Specifically, referring to fig. 4, a schematic diagram of a data annotation method according to an embodiment of the present invention, the labeling scheme for the pictures shown in fig. 4 is as follows: fig. 4 (a) shows the original picture, in which the "second season" in the first line is text without a character special effect, and the "kukuo-resistant man" in the second line is text with a stereoscopic special effect. In the data processing here, the characters without character special effects are labeled individually, represented as the dark gray mask (hereinafter referred to as the first mark mask) in fig. 4 (b); for the part containing the character special effect, the mask of the combined annotation of the special effect and the artistic words is taken and the mask of the characters is subtracted from it, represented as the light gray mask (hereinafter referred to as the second mark mask) in fig. 4 (b).
The representation on the original picture can be seen in fig. 4 (c), where the light gray mask represents the artistic words without special effects and the dark gray mask represents the special effect part of the artistic words.
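To make the subtraction-based labeling concrete, the following is a minimal NumPy sketch of the second labeling scheme, using toy 0/255 annotations in place of real ones; the array names and sizes are illustrative assumptions:

```python
import numpy as np

# toy 0/255 annotations: the word mask and the combined word + special effect mask
first_mark_mask = np.zeros((64, 64), np.uint8)
first_mark_mask[20:44, 20:44] = 255        # artistic words only
combined_mark_mask = np.zeros((64, 64), np.uint8)
combined_mark_mask[16:48, 16:48] = 255     # artistic words + special effect

# special effect part = combined annotation minus the word annotation
second_mark_mask = combined_mark_mask - first_mark_mask
```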
It should be noted that the above processing scheme for character special effects is especially effective for animation, movie and television drama pictures, which often contain large characters and character special effects. Its main feature is that, by splitting the character special effects from the characters, the model can effectively learn, during training, the features and incompleteness of the special effects surrounding the characters, and can effectively distinguish them from the features and completeness of other special effects that contain no characters. Specifically, referring to fig. 5, a schematic diagram of another data annotation method provided in the embodiment of the present invention: fig. 5 applies the data labeling method illustrated in fig. 4 to label the characters and character special effects of an animation-type picture; the specific labeling process is detailed in the description of fig. 4 and is not repeated here.
Furthermore, the sample text area of the sample picture can be marked by the first marking mask, and the sample special effect area of the sample picture can be marked by the second marking mask. Therefore, the sample characters and the sample character special effects in the sample picture can be distinguished, and in order to better distinguish the sample characters and the sample character special effects, the first mark mask and the second mark mask can be masks of different layers.
Acquiring a sample character edge from the sample character area, and marking it through a third marking mask; and acquiring a sample character special effect edge from the sample special effect area, and marking it through a fourth marking mask. In this way, the sample character edge in the sample character area and the sample character special effect edge in the sample special effect area can both be marked by masks.
Specifically, after the first mark mask of the sample text region is determined, the first mark mask may be subjected to pixel erosion processing to obtain the third mark mask. Similarly, after the second mark mask of the sample special effect region is determined, the second mark mask may be subjected to pixel erosion processing to obtain the fourth mark mask. For example, eroding by 3 pixels yields the sample text edge, and a 3-pixel erosion likewise yields the sample text special effect edge. The embodiment of the present invention does not specifically limit the number of pixels eroded.
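As an illustration, the following OpenCV sketch derives an edge mark mask by pixel erosion; the patent only states that the mark mask is eroded (for example by 3 pixels), so reading the edge as the band removed by the erosion, and the square kernel shape, are assumptions here:

```python
import cv2
import numpy as np

def edge_mark_mask(mark_mask: np.ndarray, pixels: int = 3) -> np.ndarray:
    """mark_mask: uint8 binary mask (0/255); returns the eroded-away edge band."""
    kernel = np.ones((2 * pixels + 1, 2 * pixels + 1), np.uint8)
    eroded = cv2.erode(mark_mask, kernel, iterations=1)
    return cv2.subtract(mark_mask, eroded)  # pixels removed by the erosion

# e.g. third mark mask (sample text edge) from a toy first mark mask
first_mark_mask = np.zeros((64, 64), np.uint8)
first_mark_mask[16:48, 16:48] = 255
third_mark_mask = edge_mark_mask(first_mark_mask, pixels=3)
```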
Optionally, in order to better distinguish the sample text region from the sample special effect region in the sample picture, after the sample text region of the sample picture is marked by the first marking mask and the sample special effect region is marked by the second marking mask, a preset number of pixels between the sample text region and the sample special effect region may further be taken as an interval region. The preset number may be any positive integer, for example 2, and the embodiment of the present invention is not limited thereto.
Specifically, taking a preset number of 2 as an example, pixel expansion processing may be performed on the sample text edge in the sample text region to obtain an interval region expanded by 2 pixels, and the interval region may be marked by a fifth mark mask and a sixth mark mask, where the fifth mark mask corresponds to the sample character interval and the sixth mark mask corresponds to the sample character special effect interval. It should be noted that the 2-pixel interval region can be represented by the value 255, and the corresponding region without any text in the sample picture can be represented by the value 0. In this way, the sample text region and the sample special effect region in the sample picture can be accurately distinguished.
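A minimal OpenCV sketch of marking such an interval region follows; the 2-pixel width and the 255/0 values follow the description above, while the dilation-then-subtraction bookkeeping and the square kernel are illustrative assumptions:

```python
import cv2
import numpy as np

def interval_region(region_mask: np.ndarray, pixels: int = 2) -> np.ndarray:
    """region_mask: uint8 binary mask (0/255); returns the ring just outside its edge."""
    kernel = np.ones((2 * pixels + 1, 2 * pixels + 1), np.uint8)
    dilated = cv2.dilate(region_mask, kernel, iterations=1)
    interval = cv2.subtract(dilated, region_mask)
    # 255 marks the interval region, 0 marks area without any text
    return np.where(interval > 0, 255, 0).astype(np.uint8)
```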
Still further, the sample prediction mask output by the initial detection model can be obtained by inputting the sample picture into the preset initial detection model, wherein the sample prediction mask at least comprises a sample text prediction mask and a sample special effect prediction mask.
Specifically, the sample text prediction mask at least includes: a first prediction mask and a first edge prediction mask. The first prediction mask corresponds to a sample word in the sample word region and the first edge prediction mask corresponds to a sample word edge in the sample word region. The sample special effect prediction mask at least includes: a second prediction mask and a second edge prediction mask. The second prediction mask corresponds to a sample text special effect in the sample special effect region, and the second edge prediction mask corresponds to a sample text special effect edge in the sample special effect region.
Optionally, the specific implementation of determining the first prediction mask of the sample text and the second prediction mask of the sample text special effect may include: performing feature extraction processing on the sample picture through a feature extraction algorithm to obtain the picture features of the sample picture, and processing the picture features through a preset mask algorithm to predict the prediction masks of the sample text and the sample text special effect in the sample picture. The first prediction mask of the sample text and the second prediction mask of the sample text special effect obtained by this prediction may be prediction masks of different output layers.
The first prediction mask and the second prediction mask of the different output layers may be relatively independent. The output layer of each prediction mask has the same size as the original picture, and each predicted mask value can be represented by a floating-point number in [0, 1]: a value close to 1 indicates that there is text or a special effect at that position, whereas a value close to 0 indicates that there is none.
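The following is a minimal PyTorch sketch of four such relatively independent output layers; the patent does not specify the network, so the decoder feature shape, the 1x1 convolution heads and the class name are assumptions:

```python
from typing import List

import torch
import torch.nn as nn

class MaskHeads(nn.Module):
    """Four independent mask output layers over shared decoder features."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        # one 1x1 conv head per mask: text, text edge, effect, effect edge
        self.heads = nn.ModuleList(
            [nn.Conv2d(in_channels, 1, kernel_size=1) for _ in range(4)]
        )

    def forward(self, features: torch.Tensor) -> List[torch.Tensor]:
        # features: (N, C, H, W) decoder output already at the input resolution;
        # sigmoid maps every pixel of every mask into [0, 1]
        return [torch.sigmoid(head(features)) for head in self.heads]
```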
And step 303, determining the character cross entropy of the sample character area according to the sample character prediction mask.
And step 304, determining the special effect cross entropy of the sample special effect region according to the sample special effect prediction mask.
Step 303 and step 304 are collectively described below:
wherein, the above-mentioned word cross entropy includes at least: a first cross entropy and a second cross entropy. The first cross entropy corresponds to the cross entropy of the sample text in the sample text region and the second cross entropy corresponds to the cross entropy of the sample text edge in the sample text region.
The special effect cross entropy at least comprises: a third cross entropy and a fourth cross entropy. The third cross entropy corresponds to the cross entropy of the sample text special effect in the sample special effect region, and the fourth cross entropy corresponds to the cross entropy of the sample text special effect edge in the sample special effect region.
In an embodiment, the specific implementation of determining the word cross entropy of the sample word region according to the sample word prediction mask may include: and operating the first mark mask and the first prediction mask by adopting a preset cross entropy algorithm to obtain a first cross entropy of the sample characters in the sample character area. And calculating the third mark mask and the first edge prediction mask by adopting a preset cross entropy algorithm to obtain a second cross entropy of the sample character edge in the sample character area. As such, the first cross entropy and the second cross entropy may be determined to be text cross entropies for the sample text region. The cross entropy algorithm may be a loss function algorithm, and the specific algorithm type of the loss function algorithm is not limited in this embodiment of the present invention.
For example, taking the first cross entropy as an example, the first mark mask and the first prediction mask of the sample text are represented as pictures of the same size. Assume that the value of the first mark mask gt_mask1 of the sample text is 0 or 1, and the value of the first prediction mask pred_mask is a probability value within the interval [0, 1]. For each specific location, the corresponding first cross entropy calculation process may be as follows:

Assume that the first prediction mask pred_mask is 0.9 and the corresponding standard value, i.e. the first mark mask, is 1. Then, operating on the first mark mask and the first prediction mask with the preset cross entropy algorithm as described above, the first cross entropy of the sample text at that location is −(1 × ln 0.9 + 0 × ln 0.1) ≈ 0.105. The character cross entropy and the special effect cross entropy of the whole sample picture are calculated by the same method.
In an embodiment, the specific implementation of determining the special effect cross entropy of the sample special effect region according to the sample special effect prediction mask may include: and calculating the second marking mask and the second prediction mask by adopting a preset cross entropy algorithm to obtain a third cross entropy of the special effect in the sample special effect region. And calculating the fourth mark mask and the second edge prediction mask by adopting a preset cross entropy algorithm to obtain a fourth cross entropy of the sample character special effect edge in the sample special effect region. As such, the third cross entropy and the fourth cross entropy may be determined to be the special effect cross entropy of the sample special effect region.
It should be further noted that the first cross entropy may be used to regress a region range of a sample text in the sample picture, and the second cross entropy may be used to regress an edge of the sample text in the sample picture. The third cross entropy may be used to regress a region range of the sample text special effect in the sample picture, and the fourth cross entropy may be used to regress a sample text special effect edge in the sample picture.
And 305, performing back propagation processing on the character cross entropy and the special effect cross entropy to finish the training of the initial detection model.
Back propagation is a method for recursively calculating the gradients of an expression by using the chain rule.
Referring to the related description of step 303 and step 304, in an embodiment of the present invention, the text cross entropy at least includes a first cross entropy and a second cross entropy, and the effect cross entropy at least includes a third cross entropy and a fourth cross entropy. In this way, model training may be performed according to the first cross entropy, the second cross entropy, the third cross entropy, and the fourth cross entropy.
Based on the above description, in an embodiment, the performing back propagation processing on the text cross entropy and the special effect cross entropy to complete the training of the initial detection model may include: and performing summation operation on the first cross entropy, the second cross entropy, the third cross entropy and the fourth cross entropy by adopting a preset summation algorithm to obtain a target cross entropy, and performing back propagation processing on the target cross entropy to finish the training of the initial detection model. Therefore, the detection model can be obtained through training so as to realize the subsequent detection processing of the picture to be processed.
Referring to fig. 6 in detail, a flowchart of an embodiment of a target cross entropy determination method according to an embodiment of the present invention is shown. As shown in fig. 6, image is the sample picture, feature extract denotes the picture features obtained by feature extraction on the sample picture, and the picture features are processed through a preset mask algorithm to predict the masks of the sample text and the sample text special effect in the sample picture. That is, Pred mask1 is the first prediction mask of the sample text, Pred edge mask1 is the first edge prediction mask of the sample text edge, Pred mask2 is the second prediction mask of the sample text special effect, and Pred edge mask2 is the second edge prediction mask of the sample text special effect edge.

In fig. 6, gt mask1 is the first mark mask of the sample text, gt edge mask1 is the third mark mask of the sample text edge in the sample text region, gt mask2 is the second mark mask of the sample text special effect in the sample special effect region, and gt edge mask2 is the fourth mark mask of the sample text special effect edge.
And calculating gt edge mask1 and Pred edge mask1 by adopting a preset cross entropy algorithm to obtain the second cross entropy of the sample text edge in the sample text region, which is edge_loss1.

And calculating gt mask1 and Pred mask1 by adopting a preset cross entropy algorithm to obtain the first cross entropy of the sample text in the sample text region, which is mask_loss1.

And calculating gt mask2 and Pred mask2 by adopting a preset cross entropy algorithm to obtain the third cross entropy of the sample text special effect in the sample special effect region, which is mask_loss2.

And calculating gt edge mask2 and Pred edge mask2 by adopting a preset cross entropy algorithm to obtain the fourth cross entropy of the sample text special effect edge in the sample special effect region, which is edge_loss2.

And then, performing a summation operation on the first cross entropy mask_loss1, the second cross entropy edge_loss1, the third cross entropy mask_loss2 and the fourth cross entropy edge_loss2 by adopting a preset summation algorithm to obtain the target cross entropy, which is total loss.
It should be noted that the back propagation needs to add all the cross entropies to obtain a comprehensive cross entropy (i.e. the target cross entropy); back propagation is then performed through the target cross entropy to obtain the partial derivatives of the network loss function with respect to the neuron output values in each mask output layer. Then, according to an optimization algorithm, the gradient value of each neuron is calculated and each parameter is updated. Thus, the training of the initial detection model is completed, and the trained detection model is obtained.
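Putting the pieces of fig. 6 together, the following is a sketch of one training step, assuming PyTorch; the optimizer choice and the function signature are illustrative assumptions, while the four cross entropies, their summation into the target cross entropy, and the back propagation follow the description above:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, image,
               gt_mask1, gt_edge_mask1, gt_mask2, gt_edge_mask2):
    """One training step; gt_* are float tensors with values in {0, 1}."""
    pred_mask1, pred_edge_mask1, pred_mask2, pred_edge_mask2 = model(image)
    mask_loss1 = F.binary_cross_entropy(pred_mask1, gt_mask1)            # first cross entropy
    edge_loss1 = F.binary_cross_entropy(pred_edge_mask1, gt_edge_mask1)  # second cross entropy
    mask_loss2 = F.binary_cross_entropy(pred_mask2, gt_mask2)            # third cross entropy
    edge_loss2 = F.binary_cross_entropy(pred_edge_mask2, gt_edge_mask2)  # fourth cross entropy
    total_loss = mask_loss1 + edge_loss1 + mask_loss2 + edge_loss2       # target cross entropy
    optimizer.zero_grad()
    total_loss.backward()  # back propagation of the target cross entropy
    optimizer.step()       # update each parameter from the computed gradients
    return total_loss.item()
```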
So far, the description about the flow shown in fig. 3 is completed.
In the flow shown in fig. 3, a sample picture is obtained and input into a preset initial detection model to obtain the sample prediction mask output by the initial detection model, wherein the sample prediction mask at least comprises a sample character prediction mask and a sample special effect prediction mask, the sample character prediction mask corresponds to the sample character area in the sample picture, and the sample special effect prediction mask corresponds to the sample special effect area of the characters in the sample picture. The character cross entropy of the sample character area is determined according to the sample character prediction mask, and the special effect cross entropy of the sample special effect area is determined according to the sample special effect prediction mask. Back propagation processing is then performed on the character cross entropy and the special effect cross entropy to complete the training of the initial detection model. In this way, the sample prediction mask of the sample picture can be determined, the character cross entropy of the sample character region and the special effect cross entropy of the sample special effect region can be determined from it, and the training of the initial detection model is realized by back propagation through the character cross entropy and the special effect cross entropy. The picture to be processed can then be rapidly detected by the detection model obtained through training, yielding an accurate detection result and improving the working efficiency and the detection effect.
And 203, merging the character prediction mask and the special effect prediction mask to obtain a merged mask of a character region and a special effect region in the picture to be processed.
And 204, performing pixel expansion processing on the merging mask, and determining the expanded merging mask as a target prediction mask of a character region and a special effect region in the picture to be processed.
Step 203 and step 204 are described collectively below:
as can be seen from the above description, in the embodiment of the present invention, the text prediction mask and the special effect prediction mask belong to different mask output layers, and therefore, the combined mask of the text region and the special effect region in the picture to be processed can be determined by combining the text prediction mask and the special effect prediction mask.
Furthermore, if detection omissions occur when mask detection is performed on the picture to be processed, then, when characters are subsequently erased according to the prediction masks of the character region and the special effect region in the picture to be processed, the omitted characters will not be erased completely, the erasing effect will be poor, and the user experience will suffer.
Therefore, the character area and the special effect area in the picture to be processed can be expanded. Specifically, after the combined mask of the text region and the special effect region in the picture to be processed is determined, pixel expansion processing is performed on the text region and the special effect region of the different merged mask output layers, for example by expanding a preset number of pixels as the expansion region of the text region and the special effect region. The preset number may be any positive integer, for example 5, which is not limited in this embodiment of the present invention. Specifically, referring to fig. 7, a schematic diagram of a visualization interface according to an embodiment of the present invention is shown. Fig. 7 includes fig. 7 (a), 7 (b), and 7 (c). Fig. 7 (a) is the picture to be processed, and fig. 7 (b) is the picture of the text region and the special effect region output after the text prediction mask and the special effect prediction mask are merged. Fig. 7 (c) is the picture of the character region and the special effect region output after pixel expansion processing is performed on the merged mask.
Specifically, refer to fig. 8, which is a schematic diagram of another visualization interface according to an embodiment of the present invention. Fig. 8 includes fig. 8 (a) and 8 (b). Fig. 8 (a) is the picture to be processed, and fig. 8 (b) is the picture after the text region and the special effect region in the picture to be processed are expanded by 5 pixels. It can be seen from the pictures that, after the pixel expansion processing, the detected merged mask effectively covers all the text areas without introducing excessive useless area.
In the embodiment of the invention, the merging mask is subjected to pixel expansion, and the expanded merging mask is determined as the target prediction mask of a character area and a special effect area in the picture to be processed.
By the processing mode, the character area and the special effect area in the picture to be processed can be expanded, and detection omission is avoided when mask detection is carried out on the picture to be processed.
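A minimal OpenCV sketch of this pixel expansion follows; the 5-pixel width matches the example of fig. 8, and the square kernel is an illustrative choice:

```python
import cv2
import numpy as np

def target_prediction_mask(merged_mask: np.ndarray, pixels: int = 5) -> np.ndarray:
    """merged_mask: uint8 binary mask (0/255) of the merged text and effect regions."""
    kernel = np.ones((2 * pixels + 1, 2 * pixels + 1), np.uint8)
    return cv2.dilate(merged_mask, kernel, iterations=1)
```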
So far, the description about the flow shown in fig. 2 is completed.
By this processing, the detection of characters and character special effects in the picture to be processed can be realized quickly and accurately, and expanding the character region and the special effect region in the picture to be processed avoids detection omission when mask detection is performed. The target prediction mask of the character region and the special effect region in the picture to be processed output by the detection model can then be obtained, the efficiency of detecting the picture to be processed is improved, and a basis is laid for the subsequent operation of erasing the characters and character special effects in the picture to be processed.
Referring to fig. 9, a flowchart of an embodiment of another model training method according to an embodiment of the present invention is provided. As shown in fig. 9, the process may include the following steps:
and step 901, obtaining a sample picture.
Step 902, marking a sample text area of the sample picture by a first marking mask and marking a sample special effect area of the sample picture by a second marking mask.
Step 903, obtaining a sample text edge from the sample text area, and marking through a third marking mask.
And 904, acquiring a sample text special effect edge from the sample special effect area, and marking through a fourth marking mask.
Step 905, inputting the sample picture into a preset initial detection model to obtain a sample prediction mask output by the initial detection model.
Step 906, the sample prediction mask at least comprises a sample character prediction mask and a sample special effect prediction mask, wherein the sample character prediction mask corresponds to the sample character area in the sample picture, and the sample special effect prediction mask corresponds to the sample special effect area of the characters in the sample picture.
Step 907, performing operation on the first mark mask and the first prediction mask by using a preset cross entropy algorithm to obtain a first cross entropy of the sample text in the sample text region.
And 908, operating the third mark mask and the first edge prediction mask by using a preset cross entropy algorithm to obtain a second cross entropy of the sample character edge in the sample character area.
And 909, operating the second mark mask and the second prediction mask by using a preset cross entropy algorithm to obtain a third cross entropy of the special effect in the sample special effect region.
And 910, operating the fourth mark mask and the second edge prediction mask by adopting a preset cross entropy algorithm to obtain a fourth cross entropy of the sample character special effect edge in the sample special effect area.
Step 911, performing summation operation on the first cross entropy, the second cross entropy, the third cross entropy and the fourth cross entropy by using a preset summation algorithm to obtain a target cross entropy.
And 912, performing back propagation processing on the target cross entropy to finish the training of the initial detection model.
For the detailed description of steps 901 to 912, refer to the related description of the flow shown in fig. 3, which is not repeated here.
Corresponding to the foregoing embodiment of the image processing method, the present invention further provides an embodiment block diagram of an apparatus.
Referring to fig. 10, a block diagram of an embodiment of a picture processing apparatus according to an embodiment of the present invention is shown. As shown in fig. 10, the apparatus includes:
the image obtaining module 1001 is configured to obtain an image to be processed, input the image to be processed to a trained detection model, and obtain a prediction mask output by the detection model;
the prediction mask includes at least: the character prediction mask corresponds to a character area in the picture to be processed, and the special effect prediction mask corresponds to a special effect area of a character special effect in the picture to be processed;
a mask determining module 1002, configured to perform merging processing on the text prediction mask and the special effect prediction mask to obtain a target prediction mask of the text region and the special effect region in the to-be-processed picture.
In an optional implementation manner, the mask determining module 1002 is specifically configured to:
merging the character prediction mask and the special effect prediction mask to obtain a merged mask of the character region and the special effect region in the picture to be processed;
and performing pixel expansion processing on the merging mask, and determining the expanded merging mask as a target prediction mask of the character region and the special effect region in the picture to be processed.
In an optional embodiment, the word prediction masks include a first word prediction mask and a second word prediction mask, the first word prediction mask corresponds to words in the picture to be processed, and the second word prediction mask corresponds to edges of words in the picture to be processed;
the special effect prediction mask comprises a first special effect prediction mask and a second special effect prediction mask, the first special effect prediction mask corresponds to a character special effect in the picture to be processed, and the second special effect prediction mask corresponds to a character special effect edge in the picture to be processed.
In an alternative embodiment, the device further comprises (not shown in the figures):
a model training module, configured to obtain the trained detection model through the following units:
the mask determining unit is used for acquiring a sample picture, inputting the sample picture into a preset initial detection model and obtaining a sample prediction mask output by the initial detection model;
the sample prediction mask includes at least: a sample text prediction mask corresponding to a sample text region in the sample picture and a sample special effect prediction mask corresponding to a sample special effect region of text in the sample picture;
a word cross entropy determining unit, configured to determine a word cross entropy of the sample word region according to the sample word prediction mask;
a special effect cross entropy determining unit, configured to determine a special effect cross entropy of the sample special effect region according to the sample special effect prediction mask;
and the model training unit is used for carrying out back propagation processing on the character cross entropy and the special effect cross entropy so as to finish the training of the initial detection model.
In an optional embodiment, the sample text prediction mask at least includes: a first prediction mask and a first edge prediction mask; the first prediction mask corresponds to sample text in the sample text region, the first edge prediction mask corresponds to sample text edges in the sample text region;
the sample special effect prediction mask includes at least: a second prediction mask and a second edge prediction mask; the second prediction mask corresponds to a sample text special effect in the sample special effect region, and the second edge prediction mask corresponds to a sample text special effect edge in the sample special effect region;
the text cross entropy at least comprises: a first cross entropy and a second cross entropy; the first cross entropy corresponds to the cross entropy of the sample text in the sample text region, and the second cross entropy corresponds to the cross entropy of the sample text edge in the sample text region;
the special effect cross entropy comprises at least: a third cross entropy and a fourth cross entropy; the third cross entropy corresponds to a cross entropy of a sample text special effect in the sample special effect region, and the fourth cross entropy corresponds to a cross entropy of a sample text special effect edge in the sample special effect region.
In an alternative embodiment, the device further comprises (not shown in the figures):
a first marking module, used for marking, after the sample picture is acquired, a sample text region of the sample picture through a first marking mask, and marking a sample special effect region of the sample picture through a second marking mask;
a second marking module, used for acquiring a sample text edge from the sample text region and marking the sample text edge through a third marking mask;
and a third marking module, used for acquiring a sample text special effect edge from the sample special effect region and marking the sample text special effect edge through a fourth marking mask.
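The embodiments do not specify how the edge marking masks are derived from the region annotations. One common assumption, shown here purely as a sketch, is a morphological gradient over the filled region mask:

    import cv2
    import numpy as np

    def edge_mask_from_region(region_mask: np.ndarray, thickness: int = 3) -> np.ndarray:
        """Derive an edge marking mask (e.g. the third or fourth marking mask)
        from a filled region marking mask (the first or second marking mask)
        via a morphological gradient (dilation minus erosion)."""
        kernel = np.ones((thickness, thickness), np.uint8)
        dilated = cv2.dilate(region_mask, kernel)
        eroded = cv2.erode(region_mask, kernel)
        return dilated - eroded  # 1 only on a band around the region boundary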
In an optional embodiment, the text cross entropy determining unit is specifically configured to:
calculating the first marking mask and the first prediction mask by adopting a preset cross entropy algorithm to obtain a first cross entropy of the sample text in the sample text region;
calculating the third marking mask and the first edge prediction mask by adopting a preset cross entropy algorithm to obtain a second cross entropy of the sample text edge in the sample text region;
determining the first cross entropy and the second cross entropy as the text cross entropy of the sample text region.
In an optional implementation manner, the special effect cross entropy determining unit is specifically configured to:
calculating the second marking mask and the second prediction mask by adopting a preset cross entropy algorithm to obtain a third cross entropy of the sample text special effect in the sample special effect region;
calculating the fourth marking mask and the second edge prediction mask by adopting a preset cross entropy algorithm to obtain a fourth cross entropy of the sample text special effect edge in the sample special effect region;
determining the third cross entropy and the fourth cross entropy as a special effect cross entropy of the sample special effect region.
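Both determining units reduce to the same per-mask computation. A sketch of the "preset cross entropy algorithm" as ordinary binary cross entropy (an assumption; the embodiments do not pin down the exact formula):

    import torch
    import torch.nn.functional as F

    def mask_cross_entropy(pred_mask: torch.Tensor, marking_mask: torch.Tensor) -> torch.Tensor:
        """Binary cross entropy between one prediction mask and its marking mask."""
        return F.binary_cross_entropy(pred_mask, marking_mask.float())

    # Hypothetical usage for the four mask pairs of these embodiments:
    # first_ce  = mask_cross_entropy(first_pred, first_marking)        # sample text
    # second_ce = mask_cross_entropy(first_edge_pred, third_marking)   # sample text edge
    # third_ce  = mask_cross_entropy(second_pred, second_marking)      # sample text special effect
    # fourth_ce = mask_cross_entropy(second_edge_pred, fourth_marking) # special effect edge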
In an optional embodiment, the model training unit is specifically configured to:
performing summation operation on the first cross entropy, the second cross entropy, the third cross entropy and the fourth cross entropy by adopting a preset summation algorithm to obtain a target cross entropy;
and performing back propagation processing on the target cross entropy to complete the training of the initial detection model.
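Under the same assumptions, the "preset summation algorithm" and the back propagation step can be as simple as the following (continuing the hypothetical names from the previous sketch):

    def target_cross_entropy(first_ce, second_ce, third_ce, fourth_ce):
        # Taken here as a plain sum; a weighted sum would fit the
        # embodiment equally well.
        return first_ce + second_ce + third_ce + fourth_ce

    # loss = target_cross_entropy(first_ce, second_ce, third_ce, fourth_ce)
    # loss.backward()   # back propagation processing on the target cross entropy
    # optimizer.step()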
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 1100 shown in Fig. 11 includes: at least one processor 1101, a memory 1102, at least one network interface 1104, and a user interface 1103. The various components in the electronic device 1100 are coupled together by a bus system 1105. It is understood that the bus system 1105 serves to implement the communication connections between these components. In addition to a data bus, the bus system 1105 includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are all labeled in Fig. 11 as the bus system 1105.
The user interface 1103 may include a display, a keyboard, a pointing device (e.g., a mouse or a trackball), a touch pad, or a touch screen, among others.
It is to be understood that the memory 1102 in embodiments of the present invention can be volatile memory or non-volatile memory, or can include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced Synchronous SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 1102 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 1102 stores elements, executable units or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system 11021 and application programs 11022.
The operating system 11021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application programs 11022 include various applications, such as a media player, a browser, and the like, for implementing various application services. A program implementing the method of the embodiments of the present invention may be included in the application programs 11022.
In the embodiment of the present invention, by calling a program or an instruction stored in the memory 1102, specifically, a program or an instruction stored in the application 11022, the processor 1101 is configured to execute the method steps provided by the method embodiments, for example, including:
acquiring a picture to be processed, and inputting the picture to be processed into a trained detection model to obtain a prediction mask output by the detection model;
the prediction mask includes at least a text prediction mask and a special effect prediction mask; the text prediction mask corresponds to a text region in the picture to be processed, and the special effect prediction mask corresponds to a special effect region of a text special effect in the picture to be processed;
and merging the text prediction mask and the special effect prediction mask to obtain a target prediction mask of the text region and the special effect region in the picture to be processed.
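Tying these method steps together, an end-to-end inference sketch might look as follows; the 0.5 threshold, the channel order, and the function names are illustrative assumptions, and the dilation from the earlier sketch would follow:

    import torch

    def process_picture(model, picture: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
        """End-to-end inference sketch: picture in, merged mask out."""
        with torch.no_grad():
            pred = model(picture.unsqueeze(0))[0]            # (4, H, W) probabilities
        text_mask = (pred[0] > threshold).to(torch.uint8)    # text prediction mask
        effect_mask = (pred[2] > threshold).to(torch.uint8)  # special effect prediction mask
        merged = torch.maximum(text_mask, effect_mask)       # merge the two masks
        return merged  # pixel expansion (dilation) would follow, as sketched earlier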
The methods disclosed in the embodiments of the present invention described above may be applied to the processor 1101 or implemented by the processor 1101. The processor 1101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 1101 or by instructions in the form of software. The processor 1101 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present invention may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software units in the decoding processor. The software units may be located in RAM, flash memory, ROM, PROM, or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 1102, and the processor 1101 reads the information in the memory 1102 and completes the steps of the above methods in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
The electronic device provided in this embodiment may be the electronic device shown in Fig. 11, and may execute all the steps of the picture processing method shown in Figs. 1-2, thereby achieving the technical effects of the picture processing method shown in Figs. 1-2; for brevity, details are not described herein again.
The embodiment of the present invention also provides a storage medium (a computer readable storage medium). The storage medium stores one or more programs. The storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; it may also include a combination of the above kinds of memory.
When the one or more programs in the storage medium are executed by one or more processors, the picture processing method executed on the electronic device side is implemented.
The processor is configured to execute the picture processing program stored in the memory to implement the following steps of the picture processing method executed on the electronic device side:
acquiring a picture to be processed, and inputting the picture to be processed into a trained detection model to obtain a prediction mask output by the detection model;
the prediction mask includes at least a text prediction mask and a special effect prediction mask; the text prediction mask corresponds to a text region in the picture to be processed, and the special effect prediction mask corresponds to a special effect region of a text special effect in the picture to be processed;
and merging the text prediction mask and the special effect prediction mask to obtain a target prediction mask of the text region and the special effect region in the picture to be processed.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A picture processing method, characterized in that the method comprises:
acquiring a picture to be processed, and inputting the picture to be processed into a trained detection model to obtain a prediction mask output by the detection model;
the prediction mask includes at least a text prediction mask and a special effect prediction mask; the text prediction mask corresponds to a text region in the picture to be processed, and the special effect prediction mask corresponds to a special effect region of a text special effect in the picture to be processed;
and merging the text prediction mask and the special effect prediction mask to obtain a target prediction mask of the text region and the special effect region in the picture to be processed.
2. The method according to claim 1, wherein the merging the text prediction mask and the special effect prediction mask to obtain a target prediction mask of the text region and the special effect region in the picture to be processed comprises:
merging the text prediction mask and the special effect prediction mask to obtain a merged mask of the text region and the special effect region in the picture to be processed;
and performing pixel expansion processing on the merged mask, and determining the expanded merged mask as the target prediction mask of the text region and the special effect region in the picture to be processed.
3. The method of claim 1, wherein the text prediction mask comprises a first text prediction mask corresponding to text in the picture to be processed and a second text prediction mask corresponding to text edges in the picture to be processed;
the special effect prediction mask comprises a first special effect prediction mask and a second special effect prediction mask; the first special effect prediction mask corresponds to a text special effect in the picture to be processed, and the second special effect prediction mask corresponds to a text special effect edge in the picture to be processed.
4. The method of claim 1, wherein the trained detection model is obtained by:
acquiring a sample picture, and inputting the sample picture into a preset initial detection model to obtain a sample prediction mask output by the initial detection model;
the sample prediction mask includes at least: a sample text prediction mask corresponding to a sample text region in the sample picture and a sample special effect prediction mask corresponding to a sample special effect region of text in the sample picture;
determining a text cross entropy of the sample text region according to the sample text prediction mask;
determining a special effect cross entropy of the sample special effect region according to the sample special effect prediction mask;
and performing back propagation processing on the text cross entropy and the special effect cross entropy to complete the training of the initial detection model.
5. The method of claim 4, wherein the sample text prediction mask comprises at least: a first prediction mask and a first edge prediction mask; the first prediction mask corresponds to the sample text in the sample text region, and the first edge prediction mask corresponds to the sample text edge in the sample text region;
the sample special effect prediction mask includes at least: a second prediction mask and a second edge prediction mask; the second prediction mask corresponds to a sample text special effect in the sample special effect region, and the second edge prediction mask corresponds to a sample text special effect edge in the sample special effect region;
the text cross entropy includes at least: a first cross entropy and a second cross entropy; the first cross entropy corresponds to a cross entropy of the sample text in the sample text region, and the second cross entropy corresponds to a cross entropy of the sample text edge in the sample text region;
the special effect cross entropy comprises at least: a third cross entropy and a fourth cross entropy; the third cross entropy corresponds to a cross entropy of a sample text special effect in the sample special effect region, and the fourth cross entropy corresponds to a cross entropy of a sample text special effect edge in the sample special effect region.
6. The method of claim 4, further comprising, after said obtaining the sample picture:
marking a sample text region of the sample picture through a first marking mask, and marking a sample special effect region of the sample picture through a second marking mask;
acquiring a sample text edge from the sample text region, and marking the sample text edge through a third marking mask;
and acquiring a sample text special effect edge from the sample special effect region, and marking the sample text special effect edge through a fourth marking mask.
7. The method of claim 6, wherein the determining the text cross entropy of the sample text region according to the sample text prediction mask comprises:
calculating the first marking mask and the first prediction mask by adopting a preset cross entropy algorithm to obtain a first cross entropy of the sample text in the sample text region;
calculating the third marking mask and the first edge prediction mask by adopting a preset cross entropy algorithm to obtain a second cross entropy of the sample text edge in the sample text region;
determining the first cross entropy and the second cross entropy as the text cross entropy of the sample text region.
8. The method of claim 6, wherein determining the special effect cross entropy for the sample special effect region from the sample special effect prediction mask comprises:
calculating the second marking mask and the second prediction mask by adopting a preset cross entropy algorithm to obtain a third cross entropy of the sample text special effect in the sample special effect region;
calculating the fourth marking mask and the second edge prediction mask by adopting a preset cross entropy algorithm to obtain a fourth cross entropy of the sample text special effect edge in the sample special effect region;
determining the third cross entropy and the fourth cross entropy as a special effect cross entropy of the sample special effect region.
9. The method of claim 5, wherein the performing back propagation processing on the text cross entropy and the special effect cross entropy to complete the training of the initial detection model comprises:
performing summation operation on the first cross entropy, the second cross entropy, the third cross entropy and the fourth cross entropy by adopting a preset summation algorithm to obtain a target cross entropy;
and performing back propagation processing on the target cross entropy to complete the training of the initial detection model.
10. A picture processing apparatus, characterized in that the apparatus comprises:
a picture acquisition module, used for acquiring a picture to be processed, and inputting the picture to be processed into a trained detection model to obtain a prediction mask output by the detection model;
the prediction mask includes at least a text prediction mask and a special effect prediction mask; the text prediction mask corresponds to a text region in the picture to be processed, and the special effect prediction mask corresponds to a special effect region of a text special effect in the picture to be processed;
and a mask determining module, used for merging the text prediction mask and the special effect prediction mask to obtain a target prediction mask of the text region and the special effect region in the picture to be processed.
11. An electronic device, comprising: a processor and a memory, the processor being configured to execute a picture processing program stored in the memory to implement the picture processing method according to any one of claims 1 to 9.
12. A storage medium storing one or more programs executable by one or more processors to implement the picture processing method according to any one of claims 1 to 9.
CN202211658626.2A 2022-12-22 2022-12-22 Picture processing method and device, electronic equipment and storage medium Pending CN115797959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211658626.2A CN115797959A (en) 2022-12-22 2022-12-22 Picture processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115797959A (en)

Family

ID=85426403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211658626.2A Pending CN115797959A (en) 2022-12-22 2022-12-22 Picture processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115797959A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination