CN112258416A - Image processing method and device and electronic equipment - Google Patents

Image processing method and device and electronic equipment

Info

Publication number
CN112258416A
CN112258416A (application CN202011166429.XA)
Authority
CN
China
Prior art keywords
image
target
target area
background
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011166429.XA
Other languages
Chinese (zh)
Inventor
金敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202011166429.XA priority Critical patent/CN112258416A/en
Publication of CN112258416A publication Critical patent/CN112258416A/en
Pending legal-status Critical Current

Classifications

    • G06T5/70

Abstract

The application discloses an image processing method and device, belonging to the technical field of image processing. The method comprises the following steps: dividing a target image on a pixel basis to generate a plurality of image areas; for a target area selected by a user from the plurality of image areas, identifying the color of a background image of the target area; in the case that the color is a single color, replacing pixels of the target area with pixels of an image area adjacent to the target area among the plurality of image areas; and in the case that the color comprises at least two colors, simulating, with an adversarial network, a target image feature matching the background image of the target area, and replacing the image feature of the target area with the target image feature. When removing noise information from a target area of an image, the method and device improve the accuracy, completeness and efficiency of the removal.

Description

Image processing method and device and electronic equipment
Technical Field
The application belongs to the technical field of image processing, and particularly relates to an image processing method, an image processing device and electronic equipment.
Background
With the development of image processing technology, users can process images with an application on a mobile terminal. Specifically, when an image contains noise information specified by the user, the user can use a tool in the mobile terminal's editing software (for example, a cropping tool or an eraser tool) to cut out or erase the noise information so that it no longer appears in the processed image.
However, with such a scheme, whether cropping or erasing is used, not only the noise information but also image information other than the noise information in the original image (for example, the image information around the noise information) is cropped or erased. In addition, when the noise information is removed by erasing, the user manually selects the area to be erased, so incomplete removal of the noise information easily occurs.
Therefore, a user often transfers a picture from which noise information needs to be removed from the mobile terminal to a personal computer (PC) and uses professional retouching software on the PC to remove the noise information from the picture.
As a result, when removing noise information from an image, image processing schemes in the related art generally find it difficult to remove the noise information accurately, completely and efficiently.
Disclosure of Invention
Embodiments of the present application aim to provide an image processing method, an image processing apparatus and an electronic device, which can solve the problem that image processing schemes in the related art find it difficult to remove noise information from an image accurately, completely and efficiently.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides an image processing method, including:
dividing a target image on a pixel basis to generate a plurality of image areas;
for a target area selected by a user from the plurality of image areas, identifying the color of a background image of the target area;
in the case that the color is a single color, replacing pixels of the target area with pixels of an image area adjacent to the target area among the plurality of image areas;
and in the case that the color comprises at least two colors, simulating, with an adversarial network, a target image feature matching the background image of the target area, and replacing the image feature of the target area with the target image feature.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
a dividing module, configured to divide a target image on a pixel basis to generate a plurality of image areas;
an identification module, configured to identify, for a target area selected by a user from the plurality of image areas, the color of a background image of the target area;
a first replacement module, configured to replace pixels of the target area with pixels of an image area adjacent to the target area among the plurality of image areas in the case that the color is a single color;
and a second replacement module, configured to simulate, with an adversarial network, a target image feature matching the background image of the target area, and to replace the image feature of the target area with the target image feature, in the case that the color comprises at least two colors.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiment of the application, when noise information is removed from a target image, the target image can be divided on a pixel basis to generate a plurality of image areas; then, for a target area that the user selects from the plurality of image areas as the area whose noise information is to be removed, the color of the background image of the target area is identified. Because the target area is determined at pixel granularity, its position and the color of its background image are determined more accurately. In the case that the color is a single color, that is, the target area contains noise information but its background color is a pure color, the pixels of an image area adjacent to the target area among the plurality of image areas may be used to replace the pixels of the target area. Since the adjacent image area and the target area come from the same target image, the background color of the adjacent image area is highly likely to coincide with the background color of the target area; the noise information in the replaced target area is therefore replaced with image features consistent with the background color of the target area, achieving accurate and complete removal of the noise information. In the case that the background color of the target area is not a pure color, that is, the color of the background image of the target area comprises at least two colors, an adversarial network can be used to simulate a target image feature matching the background image of the target area, and the target image feature is then used to replace the image feature of the target area, so that the noise information in the target area is replaced with features similar to those of the background image of the target area. This not only removes the noise information in the target area effectively and accurately, but also keeps the background features of the updated target area substantially consistent with those of the target area before the update. The replacement operation therefore removes only the foreground features that constitute the noise information in the target area, removes them completely, and does not remove image information other than the noise information, ensuring the accuracy and completeness of noise removal. In addition, removing the noise information from the target image does not involve transferring the image back and forth between multiple terminals, which improves the immediacy and efficiency of noise removal.
Drawings
FIG. 1 is a flow diagram of an image processing method according to one embodiment of the present application;
FIG. 2 is a flow chart of an image processing method according to another embodiment of the present application;
FIG. 3 is a flow chart of an image processing method of yet another embodiment of the present application;
FIG. 4 is a first schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 5 is a second schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 6 is a third schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 7 is a fourth schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 8 is a fifth schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 9 is a sixth schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 10 is a seventh schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 11 is an eighth schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 12 is a ninth schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 13 is a tenth schematic diagram of an image in the image processing method according to an embodiment of the present application;
FIG. 14 is an eleventh schematic diagram of an image in the image processing method according to an embodiment of the present application;
fig. 15 is a block diagram of an image processing apparatus of still another embodiment of the present application;
FIG. 16 is a diagram of a hardware configuration of an electronic device according to an embodiment of the present application;
fig. 17 is a hardware configuration diagram of an electronic device according to another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequence or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein. Moreover, the terms "first", "second" and the like are generally used in a generic sense and do not limit the number of the items they qualify; for example, a first item can be one item or more than one item. In addition, "and/or" in the description and claims denotes at least one of the connected objects, and the character "/" generally indicates that the related objects before and after it are in an "or" relationship.
The image processing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Referring to fig. 1, a flow chart of an image processing method of an embodiment of the present application is shown.
Optionally, the method may be applied to a mobile terminal.
The method shown in fig. 1 may specifically include the following steps:
step 101, dividing a target image based on pixels to generate a plurality of image areas;
the target image is divided in units of pixels, and the generated image areas may have the same size as the target image or a size slightly larger than the target image.
Step 102, for a target area selected by the user from the plurality of image areas, identifying the color of a background image of the target area;
the user may select an area in the target image, namely a target area, where the target area may be one or more of the plurality of image areas.
Here, the coordinate position within the target image of the target area selected by the user can be identified.
The target image is an image from which noise information needs to be removed, and the target area is the image area of the target image in which the noise information is included.
The target area may be an area designated through a user input (specifically, an input made by the user on the target image is received, the input includes the selected target area, and the coordinate position of the target area in the target image is identified in response to the input), or an area containing preset noise information that is automatically identified by the system.
After the position of the target area is determined, the color of the background image of the target area may be identified to determine whether the color of the background image of the target area is a single color or includes at least two colors.
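As a minimal sketch of this color check (the helper name and the quantization step are illustrative assumptions, not part of the patent), the pixels of the background image within the target area could be quantized and the distinct colors counted:

```python
import numpy as np

def background_is_single_color(background_pixels: np.ndarray,
                               quantization: int = 8) -> bool:
    """Return True if the background of the target area is a single
    (pure) color.

    background_pixels: array of shape (N, 3), the RGB values of the
    pixels identified as background within the target area.
    quantization: bucket width used to tolerate sensor noise; two
    pixels whose channels fall in the same bucket count as one color.
    """
    # Quantize each channel so near-identical shades collapse together.
    buckets = background_pixels // quantization
    distinct = np.unique(buckets, axis=0)
    return len(distinct) == 1
```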
The terms foreground image, background image, foreground feature and background feature of an image (or of an image area within an image) are explained here:
an image may be composed of a foreground image and a background image, where the foreground image contains the subject object of the image (e.g., the focused object) and the background image contains the objects other than the subject object.
Both the foreground image and the background image of an image can be designated by the user, or identified automatically by the system (where the system may include a pre-trained neural network model that automatically identifies the foreground image and the background image of an image);
the foreground features are the image features identified from the foreground image, and the background features are the image features identified from the background image; when identifying these image features, a pre-trained neural network model for image feature recognition may be used.
Step 103, in the case that the color is a single color, replacing pixels of the target area with pixels of an image area adjacent to the target area among the plurality of image areas;
when the color of the background image of the target area is a single color, the pixels of the target area may be replaced with the pixels of an image area adjacent to the coordinate position (which is an area) occupied by the target area in the target image.
Step 104, in the case that the color comprises at least two colors, simulating, with an adversarial network, a target image feature matching the background image of the target area, and replacing the image feature of the target area with the target image feature.
The target area is an image area within the target image, so it can likewise be divided into a foreground image and a background image. The foreground image may be the noise information to be removed from the target area, and the target image feature matched with the background image in this step may be an image feature of the background image (i.e., a background feature). Therefore, in order to remove the noise information in the target area completely and accurately, an adversarial network can be used to simulate a target image feature matching the background image of the target area; the background feature of the background image in the target area (i.e., the target image feature) is then used to update the image features of the target area, so that the foreground image belonging to the noise information is replaced with the target image feature while the parts of the target area that do not belong to the noise information keep their original appearance.
In the embodiment of the application, when noise information is removed from a target image, the target image can be divided on a pixel basis to generate a plurality of image areas; then, for a target area that the user selects from the plurality of image areas as the area whose noise information is to be removed, the color of the background image of the target area is identified. Because the target area is determined at pixel granularity, its position and the color of its background image are determined more accurately. In the case that the color is a single color, that is, the target area contains noise information but its background color is a pure color, the pixels of an image area adjacent to the target area among the plurality of image areas may be used to replace the pixels of the target area. Since the adjacent image area and the target area come from the same target image, the background color of the adjacent image area is highly likely to coincide with the background color of the target area; the noise information in the replaced target area is therefore replaced with image features consistent with the background color of the target area, achieving accurate and complete removal of the noise information. In the case that the background color of the target area is not a pure color, that is, the color of the background image of the target area comprises at least two colors, an adversarial network can be used to simulate a target image feature matching the background image of the target area, and the target image feature is then used to replace the image feature of the target area, so that the noise information in the target area is replaced with features similar to those of the background image of the target area. This not only removes the noise information in the target area effectively and accurately, but also keeps the background features of the updated target area substantially consistent with those of the target area before the update. The replacement operation therefore removes only the foreground features that constitute the noise information in the target area, removes them completely, and does not remove image information other than the noise information, ensuring the accuracy and completeness of noise removal. In addition, removing the noise information from the target image does not involve transferring the image back and forth between multiple terminals, which improves the immediacy and efficiency of noise removal.
Referring to fig. 2, a flow chart of an image processing method of another embodiment of the present application is shown. The flow of fig. 2 may be described in conjunction with a specific example of the image processing method shown in fig. 3. Referring to fig. 2, the method may specifically include the following steps:
step 201, performing pixel division on a target image according to pixel units with preset sizes to generate a plurality of pixel units matched with the target image;
The preset size may comprise one size or a plurality of sizes.
When the preset size is one size, the generated pixel units all have that same size, for example pixel units composed of 2 × 2 pixels; when the preset size includes a plurality of sizes, for example the two sizes 2 × 2 and 4 × 6 (both expressed in pixels), the generated pixel units may include pixel units of 2 × 2 pixels and pixel units of 4 × 6 pixels.
Whether the preset size includes one size or several, and how large each size is, may be determined according to the pixel size of the target image and the image content of the target image.
For example, if the pixel size of the target image is less than 750 × 750, the pixel units are divided according to a 2 × 2 size; if the pixel size of the target image is greater than or equal to 750 × 750, the pixel units are divided according to a 3 × 3 size.
If a certain large area of the target image has a single color, that large area may be divided into pixel units of size 4 × 6; a small area of the target image that is rich in color may instead be divided into pixel units of a size smaller than 4 × 6, for example 2 × 2. A size-selection sketch following these example thresholds is given below.
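A minimal sketch of the resolution-based choice (the function name is an illustrative assumption; only the 750 × 750 thresholds above come from the text, and the content-dependent 4 × 6 case is omitted):

```python
def choose_unit_size(width: int, height: int) -> int:
    """Pick the pixel-unit size from the image resolution, following the
    example thresholds above: 2x2 below 750x750, otherwise 3x3.
    Content-dependent sizes (e.g. 4x6 for a large single-color area)
    would need an extra analysis pass and are not modeled here."""
    return 2 if (width < 750 and height < 750) else 3
```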
Further, the reason why this step generates a plurality of pixel units matched with the target image is as follows:
when the target image is divided on a pixel basis, the number of pixels at an edge of the target image may not be enough for one pixel unit (e.g., 3 × 3); the edge of the target image then needs further pixel expansion during the division, so that the total size of the generated pixel units may be equal to or larger than the total pixel size of the target image.
For example, suppose the preset size is 3 × 3 and the pixel size of the target image is 4 × 4. After one 3 × 3 pixel unit has been divided out of the target image, the remaining 1 pixel along the right edge and the remaining 1 pixel along the lower edge are not enough to make up 3 pixels. The remaining 1 pixel of the right edge can then be copied rightwards to expand by 2 pixels (the two new pixels are identical to that remaining pixel), and the remaining 1 pixel of the lower edge can be copied downwards to expand by 2 pixels in the same way, so that the pixel size of the target image is expanded to 6 × 6. Dividing the expanded target image according to the preset 3 × 3 size then generates 4 pixel units of 3 × 3 matched with the target image.
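A minimal NumPy sketch of this division with edge expansion, under the assumption that replicate-padding is an acceptable realization of the pixel copying described above (the function name is illustrative):

```python
import numpy as np

def divide_into_pixel_units(image: np.ndarray, unit: int = 3):
    """Split an H x W x C image into unit x unit pixel units, replicating
    edge pixels when H or W is not a multiple of `unit` (as in the
    4 x 4 -> 6 x 6 example above)."""
    h, w = image.shape[:2]
    pad_h = (-h) % unit  # extra rows needed along the lower edge
    pad_w = (-w) % unit  # extra columns needed along the right edge
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    units = [
        padded[r:r + unit, c:c + unit]
        for r in range(0, padded.shape[0], unit)
        for c in range(0, padded.shape[1], unit)
    ]
    return units, padded.shape[:2]
```

For a 4 × 4 RGB image and unit = 3, this returns the 4 pixel units of the example together with the expanded size (6, 6).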
In addition, the division of the target image into pixel units in this step may be an actual image division operation performed on the target image, or the pixel unit information may simply be stored in the background (that is, the division information of the target image is stored in the background while the target image itself remains the original image).
In one example, referring to fig. 3, a user imports a picture 11 (i.e., picture A) shown in fig. 4 into a target application and wants to crop it. Specifically, the user can mark a region box 12 in the picture 11 to represent the cropping area, after which the target application pops up an "edit" button and a "complete" button outside the picture 11. If no noise information (i.e., no noise image) is included in the region box 12, the user can click the "complete" button and the target application crops the image region inside the region box 12 out of the picture 11. If, on the other hand, the image region in the region box 12 includes noise information, for example the text "good" in the region box 12 is regarded as noise information (the text may be designated as noise information by the user, or automatically recognized as noise information by the system), the user may click the "edit" button, and the target application then automatically executes steps 201, 202 and 203 in response to that click. This corresponds to "automatically divide picture A into MxN small squares by pixels" in fig. 3 (i.e., the plurality of pixel units generated in step 201), where the area in which the noise is located is the area surrounded by the coordinates (t, g), (t, h), (k, g), (k, h) (M > h > g > 0, N > k > t > 0). The picture A may be placed in the fourth quadrant of a rectangular plane coordinate system and divided into N parts along its X-axis direction and M parts along its Y-axis direction, generating MxN small squares. After the user clicks the "edit" button in fig. 3, the divided small squares may be displayed in the picture 11 (i.e., picture A) as shown in fig. 4, or the data of the divided small squares may be stored in the background and not displayed in the picture 11.
Step 202, taking the area where each pixel unit is located as an image area, and generating a plurality of image areas;
In the embodiment of the application, the target image can be divided into pixels according to pixel units of a preset size to generate a plurality of pixel units matched with the target image, and the area where each pixel unit is located is used as an image area, generating a plurality of image areas. Since the granularity of a pixel unit is much smaller than that of a single image area, both the target image feature matched with the background image of the target area and the pixels of the image area adjacent to the target area also match the granularity of a pixel unit, and can therefore express more accurately the image features of the background image in which the noise image is located within the target area. As a result, the image features of the target area after replacement by those pixels or by the target image feature blend to a higher degree with the background features of the image surrounding the target area.
Step 203, identifying, among the plurality of pixel units, a coordinate area formed by a plurality of first pixel units corresponding to a target area selected by the user in the target image;
for example, the target area is an area marked by the user as containing noise information. After the user clicks the "edit" button in fig. 4, the target area 13 in which the noise information is located may be circled manually, as shown in fig. 5. The method of the embodiment of the present application may then identify the small squares occupied by the target area 13 (i.e., the noise area Z) among the MxN small squares; since each of the MxN small squares has coordinates, the coordinate area of the target area 13 may be determined from the coordinates of the small squares it occupies.
It should be noted that the specific positions of the small squares may not be marked in the target image, e.g., the picture 11, so when the target area 13 is circled manually its boundary may cut across some small squares, occupying only part of them. When determining the small squares that constitute the target area, the coordinates of every complete small square to which such a partially occupied square belongs may all be counted as coordinates within the target area; that is, the minimum division unit of the target area 13 is one whole small square, and no partial small squares exist.
For example, if the coordinates of the four vertices of the noise area Z within the MxN small squares are (t, g), (t, h), (k, g) and (k, h), the coordinate area of the target area is the coordinate area surrounded by these four vertex coordinates.
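A minimal sketch of snapping a freehand selection to whole squares (image-style row/column coordinates are assumed here for simplicity, whereas the patent places the picture in the fourth quadrant; the function and variable names are illustrative):

```python
import math

def selection_to_grid_region(x0: float, y0: float, x1: float, y1: float,
                             unit: int = 3):
    """Convert a selection rectangle given in pixel coordinates into
    grid coordinates, expanding every partially covered square to a
    whole one so the minimum division unit is one small square."""
    t = math.floor(min(x0, x1) / unit)   # leftmost column index
    k = math.ceil(max(x0, x1) / unit)    # rightmost column index (exclusive)
    g = math.floor(min(y0, y1) / unit)   # topmost row index
    h = math.ceil(max(y0, y1) / unit)    # bottommost row index (exclusive)
    return t, g, k, h
```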
Step 204, identifying the color of the background image of the coordinate area;
after steps 201 to 203, step 204 may be performed. For example, as shown in fig. 3, it may be determined whether the background color of the target region is a pure color, and the determining step may be performed according to a user input (for example, the user may input an instruction 1 indicating that the background color is a pure color, or an instruction 2 indicating that the background color is not a pure color), or may be performed automatically by the target program (first, the target program includes a neural network model that is trained in advance and identifies a foreground image and a background image, and then, for the background image identified by the neural network model, extracts a color feature, thereby determining whether the color of the background image of the target region is a single color, and the single color is a pure color, otherwise, the single color is not a pure color).
In the embodiment of the application, a plurality of pixel units matched with the target image can be generated by dividing the target image into pixel units of a preset size. Then, to determine the position of the target area in the target image, it is only necessary to identify, among the plurality of pixel units, the first pixel units corresponding to the target area, determine the coordinate area formed by those first pixel units as the position of the target area, and identify the color of the background image of that coordinate area. Since the granularity of a pixel unit is smaller than that of an image area, the pixels of the image area adjacent to the target area and the target image feature matched with the background image of the target area also match the granularity of a pixel unit, so they can express more accurately the image features of the background image in which the noise image is located within the target area. Consequently, the image features of the target area after pixel or target-image-feature replacement blend more fully with the background features of the surrounding image, and the identified color of the background image of the target area is more accurate.
Step 205, in a case that the color of the background image of the target area is a single color, identifying, among the plurality of pixel units, a second pixel unit whose coordinate position is adjacent to a coordinate area formed by the plurality of first pixel units;
wherein a first pixel unit is a pixel unit matched with (e.g., lying within) the target area, such as one of the small squares, and the second pixel unit is a pixel unit that lies outside the target area and whose coordinate position in the target image is adjacent to the target area.
Further, as fig. 5 shows, there are several pixel units whose coordinates are adjacent to the target area 13, and the number of second pixel units may likewise be one or more: when the number is one, the second pixel unit may be a pixel unit randomly selected from the pixel units adjacent to the coordinates of the target area 13; when the number is more than one, the second pixel units may be several pixel units randomly selected from those adjacent pixel units.
Step 206, replacing the pixels of the plurality of first pixel units with the pixels of the second pixel unit.
When the number of second pixel units is one, each image pixel in the target area can be replaced directly with the image pixels of that second pixel unit (i.e., with the image features of the second pixel unit);
when the number of second pixel units is more than one, the average of the color features of the image pixels of those second pixel units may be used to replace the color-feature value of each image pixel in the target area. A sketch of both cases follows.
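A minimal NumPy sketch of this replacement (the function and parameter names are illustrative assumptions; since the background here is a pure color, filling with a unit's mean color is equivalent to copying its pixels):

```python
import numpy as np

def fill_with_neighbor_units(image: np.ndarray, region, neighbor_units):
    """Replace every pixel of the target area with pixels taken from one
    or several adjacent pixel units.

    image: H x W x 3 array, modified in place.
    region: (top, left, bottom, right) pixel bounds of the target area.
    neighbor_units: list of unit x unit x 3 arrays cut from squares
    adjacent to the target area.
    """
    top, left, bottom, right = region
    if len(neighbor_units) == 1:
        # One second pixel unit: use its color directly (its mean equals
        # its pixel value when the background is a pure color).
        fill = neighbor_units[0].mean(axis=(0, 1))
    else:
        # Several second pixel units: average their color features.
        fill = np.stack(neighbor_units).mean(axis=(0, 1, 2))
    image[top:bottom, left:right] = fill.astype(image.dtype)
    return image
```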
In the example of fig. 3, when the background color is a pure color, that is, when the color of the background image of the target region is a single color, a small square adjacent to the target region 13, for example the small square with coordinates (t-1, g), is randomly selected from the image region of the picture 11 outside the target region 13, as shown in fig. 4. Each image pixel of the target region 13 (i.e., the noise region Z) surrounded by the four vertices (t, g), (t, h), (k, g), (k, h) is then replaced with an image pixel of the small square with coordinates (t-1, g), yielding the processed image 14 (i.e., image B) shown in fig. 6, in which the noise information (the text "good") of the target region 13 in fig. 5 has been removed. In this example a plurality of divided small squares are displayed on the picture 11 of fig. 4, so the small squares displayed in the picture 11 of fig. 5 also need to be hidden or removed to generate the image 14 shown in fig. 6. Optionally, the image 14 with the noise information removed may then be cropped and the cropped picture saved.
In the embodiment of the present application, when the color of the background image of the target region containing noise information in the target image is a single color, the pixels of the first pixel units within the target area may be replaced with the pixels of a second pixel unit whose coordinate position is adjacent to the coordinate area of the target area, since the target image has already been divided into a plurality of pixel units. When the background color of the target area is a single color, the image characteristics (e.g., color) of such an adjacent second pixel unit are close or equal to the image characteristics (e.g., color) of the background image of the target area. The noise information of the target area in the processed target image is therefore replaced with the background characteristics (e.g., background color) of the target area, ensuring accurate and complete removal of the noise information. Moreover, because the granularity of the pixel units is small, the image characteristics of the second pixel unit adjacent to the target area are all the more likely to be close to the background characteristics of the target area, so that the image characteristics of the target area of the processed target image can remain substantially consistent with the background characteristics outside the target area.
Step 207, in the case that the color of the background image of the target area comprises at least two colors, training an adversarial network to learn the background features of the target image and generating an adversarial network model;
in this case the color of the background image of the target area is at least two colors, i.e., not a single color; for example, a watermark or a background picture is present in the background image of the target area.
In one example, similar to fig. 5, as shown in fig. 7, the noise information in the target image of fig. 7, i.e., the picture 11', is the text "good", and the user circles the target area 13' in which the noise information is located. Fig. 8 differs from fig. 7 only in that fig. 7 shows the small squares divided within the target image, while fig. 8 does not.
As shown in figs. 7 and 8, since the color of the background image of the target area 13' is not a single color, the adversarial network model M can be generated by training the adversarial network to learn the background feature of the image 11' (here, the tree feature).
Optionally, in an embodiment, when the adversarial network is trained, a training sample set is first acquired, where the training sample set includes multiple pairs of samples and each pair of samples includes a positive sample and a negative sample;
the pairs of samples can be constructed by the following steps 301 and 302:
step 301: in the region of the target image outside the target region (i.e., the noise region), which contains the background image of the target image, randomly select a region T composed of k × f small squares (M > k > 0, N > f > 0) and mask the image of the region T with a special pixel (e.g., black) to obtain a negative sample; the target image itself is the positive sample, and together they form a pair of samples. Step 301 is then repeated P times, yielding P pairs of samples. In addition, to increase the number of samples, step 302 may apply at least one of several data enhancement operations, such as rotation, cutting and color change, to the P pairs of samples, finally generating N pairs of samples (N > P > 0, where N and P are both integers).
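A minimal sketch of step 301 (NumPy, with illustrative names; the k × f block is aligned to whole squares and re-drawn whenever it would overlap the noise region, and the step-302 augmentations are left out for brevity):

```python
import random
import numpy as np

def make_sample_pairs(target_image: np.ndarray, noise_region,
                      unit: int = 3, p_pairs: int = 100, mask_value: int = 0):
    """Build (positive, negative) sample pairs by masking a random block
    of k x f small squares outside the noise region.

    target_image: H x W x 3 array, used as the positive sample.
    noise_region: (top, left, bottom, right) pixel bounds of the noise
    area, which must never be chosen for masking (assumed not to cover
    the whole image, otherwise no valid block exists).
    """
    h, w = target_image.shape[:2]
    nt, nl, nb, nr = noise_region
    pairs = []
    while len(pairs) < p_pairs:
        # Random block size, a whole number of small squares.
        kk = random.randint(1, max(1, h // unit // 4)) * unit
        ff = random.randint(1, max(1, w // unit // 4)) * unit
        # Random position aligned to the square grid.
        top = random.randrange(0, (h - kk) // unit + 1) * unit
        left = random.randrange(0, (w - ff) // unit + 1) * unit
        # Reject blocks that overlap the noise region.
        if not (top + kk <= nt or top >= nb or left + ff <= nl or left >= nr):
            continue
        negative = target_image.copy()
        negative[top:top + kk, left:left + ff] = mask_value  # black mask
        pairs.append((target_image, negative))
    return pairs
```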
In one example, figs. 9 and 10 are a pair of samples, where fig. 9 is the positive sample and fig. 10 the negative sample: fig. 9 shows the image 11' (i.e., the target image), and the region 16 selected in fig. 9 is masked with black pixels, resulting in the negative sample shown in fig. 10, the image 15. Figs. 11 and 12 are likewise a pair of samples, with fig. 11 the positive sample and fig. 12 the negative sample.
It should also be noted that the image sizes of each pair of samples in the training sample set are exactly the same. Taking figs. 9 and 10 as an example, through the masked region 16 marked in the input negative sample of fig. 10, the adversarial network can find in fig. 9 the position that the region 16 occupied before it was masked with black pixels. Thus, when training on an input pair of samples, the adversarial network can learn, with reference to the positive sample (e.g., fig. 9), the background feature of the region masked in the negative sample (e.g., the region 16 in fig. 10). Because the masked positions differ between different pairs of samples, after the learning converges, the generated adversarial network model M has learned the background features of the target image (such as the tree features in fig. 9).
Optionally, when the area of the region in which the noise information is located is relatively large compared with the target picture, or when it is difficult for the target application to collect a sufficient number of samples (for example, the number of finally obtained sample pairs is less than N), the method of the embodiment of the present application may further perform the following operations:
an artificial intelligence (AI) algorithm can be used to search locally on the mobile terminal (e.g., a mobile phone) for a group of pictures E whose background images are similar to the background image of the target image; steps 301 and 302 above are then performed on the group of pictures E, continuing the sampling until the number of sample pairs reaches the required number N.
If no group of pictures E with background images similar to the background image of the target image is found locally, or such a group is found but the number of sample pairs is still less than N after continued sampling, the target picture can be submitted to an Internet search engine to search for pictures with similar background images; if, for example, a group of pictures F is found, steps 301 and 302 above are performed on the group of pictures F to continue the sampling until the number of sample pairs is greater than or equal to N.
Step 208, covering the target area of the target image with a preset color to generate a first image;
in one example, the target area 13' in the target image 11' shown in fig. 8 may be masked in black, i.e., each pixel within the target area 13' is replaced with a black pixel, thereby generating the first image 17 shown in fig. 13.
Step 209, simulating, with the adversarial network model and based on the background feature (e.g., the background feature of the image 11' in figs. 7, 8 or 9, here the tree feature), a target image feature in the first image that matches the background image of the target area, and replacing the image feature of the target area of the first image with the target image feature using the adversarial network model.
Here, since the adversarial network model M has learned the background features of the image 11' in figs. 7, 8 and 9 (here, the tree features), when the first image 17 shown in fig. 13 is input into the model M, the model can simulate the background features that the region 18 hidden by black pixels had before it was hidden; that is, it simulates the background features (i.e., the target image features) in the first image 17 that match the background image of the target area 13' in fig. 8. The image features of the region 18 of the first image 17 in fig. 13 are then replaced with the simulated target image features, generating the image 19 shown in fig. 14. Optionally, the user may subsequently crop the image 19 from which the noise information has been removed and save the cropped image.
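A minimal sketch of steps 208 and 209 together (the `generator` callable stands in for the trained adversarial network model M; its interface, mapping a masked H x W x 3 image to a completed one, is an assumption for illustration):

```python
import numpy as np

def remove_noise_with_gan(image: np.ndarray, noise_region, generator,
                          mask_value: int = 0):
    """Cover the target area with a preset color (step 208), let a
    trained generator simulate the hidden background (step 209), and
    paste the simulated features back into the target area only."""
    top, left, bottom, right = noise_region
    first_image = image.copy()
    first_image[top:bottom, left:right] = mask_value  # preset color mask
    completed = generator(first_image)
    result = image.copy()
    # Only the target area is replaced; all other pixels stay original.
    result[top:bottom, left:right] = completed[top:bottom, left:right]
    return result
```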
In the embodiment of the application, when the color of the background image of the target area containing the noise information comprises at least two colors, the adversarial network can be trained to learn the background features of the target image, generating an adversarial network model; the target area of the target image is then covered with a preset color to generate a first image, and the adversarial network model simulates, based on the learned background features, a target image feature in the first image that matches the background image of the target area and replaces the image feature of the target area of the first image with the target image feature. Because the adversarial network model learns the background features of the background image in the target image, the background features of the covered target area can be simulated from the features the model has learned, and the simulated target image features are substituted into the target area of the target image. As a result, not only is the noise information removed from the target area of the replaced image, but the image features of the noise-free target area also blend and match with the background features of the image outside the target area, improving the accuracy of noise removal.
Optionally, when the adversarial network is trained to learn the background features of the target image and generate an adversarial network model, the adversarial network may also be trained to learn the background features of the target image according to semantic features, where the semantic features are extracted from a target text, the target text being a text describing the background image of the target region of the target image.
In one example, in order to enable the generated adversarial network model to learn more accurate background features of the target image, especially when the background image in the target image has obvious semantics, the adversarial network model can be further optimized in combination with natural language processing technology.
When the adversarial network is trained in this way, each pair of samples in the training sample set comprises not only the positive sample and the negative sample described above but also text information describing the background image in the positive sample;
when a pair of samples contains only the two image data, namely a positive sample and a negative sample, the adversarial network can be trained directly in the manner described in the above embodiment to learn the background features of the target image and generate an adversarial network model;
if a pair of samples includes not only the positive sample and the negative sample but also text information, the text information can be input into a natural language model to extract its semantic features; the image data (the positive and negative samples) and the semantic features of the text information are then spliced into new sample data, the adversarial network is trained with the new sample data, and the network is trained to learn the background features in the positive sample according to the semantic features, thereby generating an adversarial network model. A sketch of this splicing follows.
In the embodiment of the present application, if the adversarial network model learns only the background features of the background image in the target image, the accuracy and reliability of the target image features generated for the masked noise region may be low. Therefore, when the adversarial network is trained, a text description of the background image in the target image (i.e., a target text) is added, semantic features are extracted from the target text, and the adversarial network is trained to learn the background features of the target image according to those semantic features. The sample information describing the background features of the masked noise region then includes not only image features but also semantic features, which improves the accuracy and reliability of the trained adversarial network model.
Furthermore, after the adversarial network has been trained in this way to learn the background features of the target image according to the semantic features, the data input to the trained adversarial network model includes not only the first image described above but also target semantic features describing the text content of the background image in the target image (for example, after the user clicks the "edit" button in fig. 8, a text box may be shown for the user to enter text describing the content of the background image of the noise region in the target picture). Semantics can be extracted from the text content by a natural language processing model to generate the target semantic features. The adversarial network model may then simulate a target image feature in the first image that matches the background image of the target region, based on the previously learned background features of the target image that match the target semantic features, and replace the image feature of the target region of the first image with the target image feature, as sketched below.
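A minimal sketch of this text-conditioned removal (the `generator` and `text_encoder` interfaces are illustrative assumptions: the generator is assumed to accept a masked image together with a text embedding):

```python
import numpy as np

def remove_noise_with_text(image: np.ndarray, noise_region, generator,
                           text_encoder, description: str,
                           mask_value: int = 0):
    """Variant of the removal step that also conditions the generator on
    target semantic features extracted from a user-supplied description
    of the background (the target text)."""
    top, left, bottom, right = noise_region
    masked = image.copy()
    masked[top:bottom, left:right] = mask_value
    semantic = text_encoder(description)   # target semantic features
    completed = generator(masked, semantic)
    result = image.copy()
    result[top:bottom, left:right] = completed[top:bottom, left:right]
    return result
```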
Compared with removing the noise information using an adversarial network model trained on samples without text information, removing it with a model trained on samples that include both image information and text information allows the target image features written into the target region to match and blend better with the background features of the target image.
It should be noted that the execution subject of the image processing method provided in the embodiment of the present application may be an image processing apparatus, or a control module, within the image processing apparatus, for executing the image processing method. The image processing apparatus provided in the embodiment of the present application is described here taking as an example an image processing apparatus that executes the image processing method.
Referring to fig. 15, a block diagram of an image processing apparatus according to an embodiment of the present application is shown. The image processing apparatus includes:
a dividing module 31, configured to divide the target image based on pixels to generate a plurality of image areas;
the identification module 32 is configured to identify, for a target area selected by a user in the plurality of image areas, a color of a background image of the target area;
a first replacing module 33, configured to replace a pixel of the target area with a pixel of an image area adjacent to the target area in the plurality of image areas if the color is a single color;
and a second replacing module 34, configured to simulate, with an adversarial network, a target image feature matching the background image of the target area, and to replace the image feature of the target area with the target image feature, in the case that the colors comprise at least two colors.
Optionally, the dividing module 31 includes:
the dividing submodule is used for carrying out pixel division on a target image according to a pixel unit with a preset size to generate a plurality of pixel units matched with the target image;
and the first generation submodule is used for taking the area where each pixel unit is positioned as an image area and generating a plurality of image areas.
Optionally, the identification module 32 includes:
the first identification submodule is used for identifying a coordinate area formed by a plurality of first pixel units corresponding to a target area selected by a user in the target image in the plurality of pixel units;
and the second identification submodule is used for identifying the color of the background image of the coordinate area.
Optionally, the first replacement module 33 includes:
a third identifying submodule configured to identify, in the plurality of pixel units, a second pixel unit whose coordinate position is adjacent to the coordinate area, in a case where the color is a single color;
and the first replacement submodule is used for replacing the pixels of the plurality of first pixel units by adopting the pixels of the second pixel unit.
Optionally, the second replacement module 34 includes:
a training sub-module, used for training an adversarial network to learn the background features of the target image and generating an adversarial network model in the case that the colors comprise at least two colors;
a second generation sub-module, used for masking the target area of the target image with a preset color to generate a first image;
and a second replacing sub-module, used for simulating, with the adversarial network model and based on the background features, a target image feature in the first image that matches the background image of the target area, and replacing the image feature of the target area of the first image with the target image feature using the adversarial network model.
Optionally, the training sub-module further comprises:
and a training unit, used for training the adversarial network to learn the background features of the target image according to semantic features to generate an adversarial network model, where the semantic features are extracted from a target text, the target text being a text describing the background image of the target area of the target image.
In the embodiment of the application, when noise information is removed from a target image, the target image can be divided on a pixel basis to generate a plurality of image areas; then, for a target area that the user selects from the plurality of image areas as the area whose noise information is to be removed, the color of the background image of the target area is identified. Because the target area is determined at pixel granularity, its position and the color of its background image are determined more accurately. In the case that the color is a single color, that is, the target area contains noise information but its background color is a pure color, the pixels of an image area adjacent to the target area among the plurality of image areas may be used to replace the pixels of the target area. Since the adjacent image area and the target area come from the same target image, the background color of the adjacent image area is highly likely to coincide with the background color of the target area; the noise information in the replaced target area is therefore replaced with image features consistent with the background color of the target area, achieving accurate and complete removal of the noise information. In the case that the background color of the target area is not a pure color, that is, the color of the background image of the target area comprises at least two colors, an adversarial network can be used to simulate a target image feature matching the background image of the target area, and the target image feature is then used to replace the image feature of the target area, so that the noise information in the target area is replaced with features similar to those of the background image of the target area. This not only removes the noise information in the target area effectively and accurately, but also keeps the background features of the updated target area substantially consistent with those of the target area before the update. The replacement operation therefore removes only the foreground features that constitute the noise information in the target area, removes them completely, and does not remove image information other than the noise information, ensuring the accuracy and completeness of noise removal. In addition, removing the noise information from the target image does not involve transferring the image back and forth between multiple terminals, which improves the immediacy and efficiency of noise removal.
The image processing apparatus in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus can be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), and the non-mobile electronic device may be a server, a personal computer (PC), a television (TV), a teller machine, a self-service machine, and the like; the embodiments of the present application are not specifically limited in this respect.
The image processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system (Android), an iOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.
The image processing apparatus provided in the embodiment of the present application can implement each process implemented by the foregoing method embodiment, and is not described here again to avoid repetition.
Optionally, as shown in fig. 16, an electronic device 2000 is further provided in this embodiment of the present application, and includes a processor 2002, a memory 2001, and a program or an instruction stored in the memory 2001 and executable on the processor 2002, where the program or the instruction implements each process of the above-described embodiment of the image processing method when executed by the processor 2002, and can achieve the same technical effect, and no further description is provided here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 17 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may further comprise a power source (e.g., a battery) for supplying power to the various components; the power source may be logically connected to the processor 1010 through a power management system, which manages charging, discharging, and power consumption. The electronic device structure shown in fig. 17 does not constitute a limitation of the electronic device: the electronic device may include more or fewer components than those shown, combine some components, or arrange the components differently, which is not described in detail here.
The processor 1010 is configured to: divide the target image based on pixels to generate a plurality of image areas; identify, for a target area selected by a user from the plurality of image areas, a color of a background image of the target area; replace, when the color is a single color, pixels of the target area with pixels of an image area adjacent to the target area among the plurality of image areas; and, in the case that the color includes at least two colors, simulate a target image feature matching the background image of the target area by using an adversarial network, and replace the image feature of the target area with the target image feature.
The technical effects of this embodiment are the same as those described above for the image processing method and are not repeated here to avoid repetition.
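For illustration, the following is a minimal Python sketch of the dispatch the processor 1010 performs, written with NumPy/SciPy. It is a sketch under assumptions, not the patent's implementation: the ring width, the exact-color test, and the names background_is_solid and remove_noise are invented for this example, and the two branch implementations, neighbor_fill and gan_inpaint, are sketched after the corresponding paragraphs below.

```python
import numpy as np
from scipy import ndimage

def background_is_solid(image: np.ndarray, mask: np.ndarray) -> bool:
    """Sample a thin ring of background just outside the user's boolean
    selection mask and test whether it contains exactly one distinct color."""
    ring = ndimage.binary_dilation(mask, iterations=4) & ~mask
    colors = np.unique(image[ring].reshape(-1, image.shape[-1]), axis=0)
    return len(colors) == 1

def remove_noise(image: np.ndarray, mask: np.ndarray,
                 neighbor_fill, gan_inpaint) -> np.ndarray:
    """Dispatch between the two claimed branches: neighbor-pixel replacement
    for a solid background, adversarial inpainting otherwise. The branch
    implementations are supplied by the caller."""
    if background_is_solid(image, mask):
        return neighbor_fill(image, mask)
    return gan_inpaint(image, mask)
```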
Optionally, the processor 1010 is configured to perform pixel division on the target image according to a pixel unit of a preset size to generate a plurality of pixel units matching the target image, and to take the area where each pixel unit is located as an image area, thereby generating a plurality of image areas.
In the embodiment of the application, the target image may be divided according to pixel units of a preset size to generate a plurality of pixel units matching the target image, and the area where each pixel unit is located is taken as an image area, thereby generating a plurality of image areas. Since the granularity of a pixel unit is much smaller than that of a single image area, both the target image feature matching the background image of the target area and the pixels of the image area adjacent to the target area are likewise determined at pixel-unit granularity, and can therefore express more accurately the image features of the background image in which the noise is located within the target area. As a result, after the pixels or the target image feature have been substituted in, the image features of the target area blend to a higher degree with the background features of the image surrounding the target area.
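One way to realize this division step is sketched below: the image is cut into fixed-size blocks keyed by block coordinates. The unit size of 8 pixels and the function name divide_into_units are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def divide_into_units(image: np.ndarray, unit: int = 8) -> dict:
    """Divide the image into unit x unit pixel blocks ("pixel units") and
    return a dict mapping block coordinates (row, col) to pixel slices."""
    h, w = image.shape[:2]
    units = {}
    for r in range(0, h, unit):
        for c in range(0, w, unit):
            units[(r // unit, c // unit)] = image[r:r + unit, c:c + unit]
    return units
```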
Optionally, the processor 1010 is configured to identify, among the plurality of pixel units, a coordinate area formed by a plurality of first pixel units corresponding to a target area selected by a user in the target image, and to identify a color of a background image of the coordinate area.
In the embodiment of the application, a plurality of pixel units matching the target image can be generated by dividing the target image according to pixel units of a preset size. When the position of the target area in the target image is determined, it therefore suffices to identify, among the plurality of pixel units, the plurality of first pixel units corresponding to the target area, determine the coordinate area formed by these first pixel units as the position of the target area, and then identify the color of the background image of the coordinate area. Since the granularity of a pixel unit is smaller than that of an image area, the pixels of the image area adjacent to the target area and the target image feature matching the background image of the target area are also determined at pixel-unit granularity, so they express more accurately the image features of the background image in which the noise is located within the target area. The image features of the target area after replacement therefore blend more highly with the background features of the surrounding image, and the identified color of the target area's background image is more accurate.
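A possible realization of this mapping step: walk the selected pixels and collect the block coordinates of the "first pixel units" they fall into; that set is the coordinate area of the target region. The function name and the unit size are placeholders of this sketch.

```python
import numpy as np

def units_covering_selection(mask: np.ndarray, unit: int = 8) -> set:
    """Map each selected pixel to the (row, col) coordinate of the pixel
    unit containing it; the resulting set of first-pixel-unit coordinates
    is the coordinate area of the target region."""
    ys, xs = np.nonzero(mask)
    return {(y // unit, x // unit) for y, x in zip(ys, xs)}
```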
Optionally, the processor 1010 is configured to identify, in a case where the color is a single color, a second pixel unit whose coordinate position is adjacent to the coordinate area among the plurality of pixel units, and to replace the pixels of the plurality of first pixel units with the pixels of the second pixel unit.
In the embodiment of the present application, in the case where the color of the background image of the target region containing noise information is a single color, and since the target image has already been divided into a plurality of pixel units, the pixels of the plurality of first pixel units within the target area may be replaced with the pixels of a second pixel unit whose coordinate position is adjacent to the coordinate area of the target area. When the background of the target area is a single color, the image characteristics (e.g., color) of such an adjacent second pixel unit are close or equal to those of the target area's background image; the noise information of the target area in the processed target image is therefore replaced by the background characteristics (such as the background color) of the target area, ensuring accurate and complete removal of the noise information. Moreover, because the granularity of pixel units is small, the image characteristics of a second pixel unit adjacent to the target area are all the more likely to be close to the background characteristics of the target area, so the image characteristics of the target area in the processed target image remain essentially consistent with the background characteristics outside the target area.
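The single-color branch can then be sketched as a block-level copy: for each covered "first pixel unit", take the pixels of an adjacent "second pixel unit" that lies outside the coverage. The four-neighbor search order below is an arbitrary choice of this sketch, not a detail from the patent.

```python
import numpy as np

def neighbor_fill(image: np.ndarray, covered: set, unit: int = 8) -> np.ndarray:
    """For each covered first pixel unit, copy in the pixels of the first
    adjacent second pixel unit found outside the selection."""
    result = image.copy()
    for br, bc in covered:
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = br + dr, bc + dc
            if nr < 0 or nc < 0 or (nr, nc) in covered:
                continue
            src = image[nr * unit:(nr + 1) * unit, nc * unit:(nc + 1) * unit]
            dst = result[br * unit:(br + 1) * unit, bc * unit:(bc + 1) * unit]
            if src.size and src.shape == dst.shape:  # skip ragged edge blocks
                dst[...] = src
                break
    return result
```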
Optionally, the processor 1010 is configured to train an adversarial network to learn the background features of the target image and generate an adversarial network model if the colors include at least two colors; cover the target area of the target image with a preset color to generate a first image; and use the adversarial network model to simulate, based on the background features, a target image feature matching the background image of the target area in the first image, and to replace the image feature of the target area of the first image with the target image feature.
In the embodiment of the application, in the case that the color of the background image of the target area containing the noise information includes at least two colors, an adversarial network may be trained to learn the background features of the target image, generating an adversarial network model; the target area of the target image is then covered with a preset color to generate a first image; and the adversarial network model simulates, based on the learned background features, a target image feature matching the background image of the target area in the first image, and replaces the image feature of the target area of the first image with the target image feature. Because the adversarial network model learns the background features of the background image in the target image, the background features of the covered target area can be simulated from what the model has learned, and the simulated target image feature is substituted into the target area. As a result, not only is the noise information removed from the target area of the replaced image, but the image features of the now noise-free target area also blend with and match the background features of the image outside the target area, improving the accuracy of removing noise from the image.
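The multi-color branch can be sketched in PyTorch as follows. This is a sketch under assumptions only: the tiny convolutional generator stands in for a model that has actually been trained adversarially (against a discriminator) on the target image's background; the fourth input channel carrying the mask, the preset cover value of 0.5, and all names are inventions of this example.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in for a trained adversarial-network generator; a real model
    would be trained against a discriminator on the background features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),  # image in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def gan_inpaint(image: torch.Tensor, mask: torch.Tensor,
                generator: nn.Module, preset: float = 0.5) -> torch.Tensor:
    """Cover the target area with a preset color (the "first image"), let the
    generator simulate matching background features, and replace only the
    masked pixels. image: (B, 3, H, W) in [0, 1]; mask: (B, 1, H, W)."""
    covered = image * (1 - mask) + preset * mask       # the first image
    inp = torch.cat([covered, mask], dim=1)            # mask as 4th channel
    generated = generator(inp)
    return image * (1 - mask) + generated * mask
```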
Optionally, the processor 1010 is configured to train an adversarial network to learn the background features of the target image according to semantic features, and to generate an adversarial network model, where the semantic features are semantic features extracted from a target text, and the target text is a text describing the background image of the target area of the target image.
In the embodiment of the present application, if the adversarial network model learns only the background features of the background image in the target image, the accuracy and reliability of the target image features generated for the covered noise region may be low. Therefore, when the adversarial network is trained, a text description of the background image in the target image (i.e., a target text) is added, semantic features are extracted from the target text, and the adversarial network is then trained to learn the background features of the target image according to these semantic features. The sample information describing the background features of the covered noise region thus includes not only image features but also semantic features, which improves the accuracy and reliability of the trained adversarial network model.
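One way to add this semantic conditioning is to encode the target text into a fixed-length embedding and broadcast it over the spatial grid before the generator's first convolution, as in the sketch below. The EmbeddingBag is only a stand-in for a real text encoder, and the vocabulary size and embedding width are invented for illustration.

```python
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Sketch of a generator conditioned on semantic features extracted from
    a text describing the target area's background."""
    def __init__(self, vocab_size: int = 10000, text_dim: int = 64):
        super().__init__()
        self.text_encoder = nn.EmbeddingBag(vocab_size, text_dim)  # placeholder encoder
        self.net = nn.Sequential(
            nn.Conv2d(4 + text_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # x: (B, 4, H, W) covered image + mask; token_ids: (B, N) target text.
        sem = self.text_encoder(token_ids)                  # (B, text_dim)
        b, _, h, w = x.shape
        sem_map = sem[:, :, None, None].expand(-1, -1, h, w)
        return self.net(torch.cat([x, sem_map], dim=1))
```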
It should be understood that in the embodiment of the present application, the input Unit 1004 may include a Graphics Processing Unit (GPU) 10041 and a microphone 10042, and the Graphics Processing Unit 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 may include two parts, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 1009 may be used to store software programs as well as various data, including but not limited to application programs and operating systems. Processor 1010 may integrate an application processor that handles primarily operating systems, user interfaces, applications, etc. and a modem processor that handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1010.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the embodiment of the image processing method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the embodiment of the image processing method, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (13)

1. An image processing method, characterized in that the method comprises:
dividing a target image based on pixels to generate a plurality of image areas;
identifying, for a target area selected by a user from the plurality of image areas, a color of a background image of the target area;
replacing, when the color is a single color, pixels of the target area with pixels of an image area adjacent to the target area among the plurality of image areas; and
in a case that the color includes at least two colors, simulating, by using an adversarial network, a target image feature matching the background image of the target area, and replacing the image feature of the target area with the target image feature.
2. The method of claim 1, wherein the dividing a target image based on pixels to generate a plurality of image areas comprises:
according to a pixel unit with a preset size, performing pixel division on a target image to generate a plurality of pixel units matched with the target image;
and taking the area where each pixel unit is positioned as an image area to generate a plurality of image areas.
3. The method of claim 2, wherein the identifying, for a target area selected by a user from the plurality of image areas, a color of a background image of the target area comprises:
identifying, among the plurality of pixel units, a coordinate area formed by a plurality of first pixel units corresponding to a target area selected by a user in the target image;
identifying a color of a background image of the coordinate region.
4. The method according to claim 3, wherein the replacing, when the color is a single color, pixels of the target area with pixels of an image area adjacent to the target area among the plurality of image areas comprises:
in a case where the color is a single color, identifying, among the plurality of pixel units, a second pixel unit whose coordinate position is adjacent to the coordinate area;
and replacing the pixels of the plurality of first pixel units with the pixels of the second pixel unit.
5. The method according to claim 1 or 2, wherein the simulating, by using an adversarial network, a target image feature matching the background image of the target area and replacing the image feature of the target area with the target image feature in the case that the colors include at least two colors comprises:
training an adversarial network to learn background features of the target image in the case that the colors include at least two colors, and generating an adversarial network model;
covering the target area of the target image with a preset color to generate a first image;
and simulating, by using the adversarial network model and based on the background features, a target image feature matching the background image of the target area in the first image, and replacing, by using the adversarial network model, the image feature of the target area of the first image with the target image feature.
6. The method of claim 5, wherein the training an adversarial network to learn background features of the target image and generating an adversarial network model comprises:
training an adversarial network to learn the background features of the target image according to semantic features, and generating an adversarial network model, wherein the semantic features are semantic features extracted from a target text, and the target text is a text describing the background image of the target area of the target image.
7. An image processing apparatus, characterized in that the apparatus comprises:
the dividing module is used for dividing the target image on the basis of pixels to generate a plurality of image areas;
the identification module is used for identifying, for a target area selected by a user from the plurality of image areas, the color of a background image of the target area;
the first replacement module is used for replacing pixels of the target area with pixels of an image area adjacent to the target area among the plurality of image areas when the color is a single color;
and the second replacement module is used for simulating, by using an adversarial network, a target image feature matching the background image of the target area in the case that the colors include at least two colors, and replacing the image feature of the target area with the target image feature.
8. The apparatus of claim 7, wherein the partitioning module comprises:
the dividing submodule is used for carrying out pixel division on a target image according to a pixel unit with a preset size to generate a plurality of pixel units matched with the target image;
and the first generation submodule is used for taking the area where each pixel unit is positioned as an image area and generating a plurality of image areas.
9. The apparatus of claim 8, wherein the identification module comprises:
the first identification submodule is used for identifying, among the plurality of pixel units, a coordinate area formed by a plurality of first pixel units corresponding to a target area selected by a user in the target image;
and the second identification submodule is used for identifying the color of the background image of the coordinate area.
10. The apparatus of claim 9, wherein the first replacement module comprises:
the third identification submodule is used for identifying, among the plurality of pixel units, a second pixel unit whose coordinate position is adjacent to the coordinate area when the color is a single color;
and the first replacement submodule is used for replacing the pixels of the plurality of first pixel units with the pixels of the second pixel unit.
11. The apparatus of claim 7 or 8, wherein the second replacement module comprises:
the training submodule is used for training an adversarial network to learn the background features of the target image in the case that the colors include at least two colors, and generating an adversarial network model;
the second generation submodule is used for covering the target area of the target image with a preset color to generate a first image;
and the second replacement submodule is used for simulating, by using the adversarial network model and based on the background features, a target image feature matching the background image of the target area in the first image, and replacing, by using the adversarial network model, the image feature of the target area of the first image with the target image feature.
12. The apparatus of claim 11, wherein the training submodule further comprises:
and the training unit is used for training an adversarial network to learn the background features of the target image according to semantic features and generating an adversarial network model, wherein the semantic features are semantic features extracted from a target text, and the target text is a text describing the background image of the target area of the target image.
13. An electronic device comprising a processor, a memory and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the image processing method according to any one of claims 1 to 6.
CN202011166429.XA 2020-10-27 2020-10-27 Image processing method and device and electronic equipment Pending CN112258416A (en)

Priority Applications (1)

Application Number: CN202011166429.XA
Priority Date / Filing Date: 2020-10-27
Title: Image processing method and device and electronic equipment

Publications (1)

Publication Number: CN112258416A
Publication Date: 2021-01-22

Family ID: 74261329


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination