WO2018177237A1

WO2018177237A1 - Image processing method and device, and storage medium

Info

Publication number: WO2018177237A1
Application number: PCT/CN2018/080446
Authority: WO
Inventors: 朱晓龙; 郑永森; 王浩; 黄凯宁; 罗文寒; 高雨; 杨之华; 华园; 曾毅榕; 吴发强; 黄祥瑞
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2017-03-29
Filing date: 2018-03-26
Publication date: 2018-10-04
Also published as: CN107025457B; CN107025457A

Abstract

An embodiment of the present invention discloses an image processing method and device. The method comprises: after receiving a processing request for an image, acquiring, according to an indication in the request, a semantic segmentation model corresponding to an element type to be replaced; estimating, according to the model, the probability that each pixel in the image belongs to the element type, to obtain a preliminary probability map; and optimizing the preliminary probability map based on conditional random fields, and using the segmented image obtained after the optimization to fuse the image with a preset element material, thereby replacing the part of the image that belongs to the element type with the preset element material.

Description

Image processing method, device and storage medium

The present application claims priority to Chinese Patent Application No. 200910199165.X filed on March 29, 2017, the entire disclosure of which is incorporated herein by reference.

Technical field

The present application relates to the field of computer technologies, and in particular, to an image processing method, apparatus, and storage medium.

Background of the invention

With the popularity of smart mobile terminals, shooting and recording anytime and anywhere has gradually become a way of life. At the same time, image processing, such as image beautification or special effects, has become increasingly popular.

Summary of the invention

The embodiments of the present application provide an image processing method and apparatus that can improve the accuracy of segmentation and the fusion effect.

The embodiment of the present application provides an image processing method, which is applied to an image processing apparatus, and the method includes:

Receiving an image processing request indicating an image to be processed, and an element type to be replaced;

Obtaining a semantic segmentation model corresponding to the element type, the semantic segmentation model being trained by a deep neural network;

Determining, according to the semantic segmentation model, a probability that each pixel in the image belongs to the element type, and obtaining an initial probability map;

Fusing the image with the preset element material according to the initial probability map to obtain a processed image.

An embodiment of the present application further provides an image processing apparatus. The apparatus includes a processor and a memory, wherein the memory stores instructions executable by the processor, and when the instructions are executed, the processor is configured to perform the steps of the above method.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1a is a schematic diagram of a scenario of an image processing method according to an embodiment of the present application;

FIG. 1b is a flowchart of an image processing method provided by an embodiment of the present application;

FIG. 2a is another flowchart of an image processing method provided by an embodiment of the present application;

FIG. 2b is a diagram showing an example of an interface of an image processing request in an image processing method according to an embodiment of the present application;

FIG. 2c is a diagram showing an example of sky segmentation in an image processing method provided by an embodiment of the present application;

FIG. 2d is a process flow diagram of an image processing method provided by an embodiment of the present application;

FIG. 3a is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;

FIG. 3b is another schematic structural diagram of an image processing apparatus according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a server provided by an embodiment of the present application.

Implementation

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative effort fall within the scope of the present application.

Element replacement is one of the most common techniques in special effects processing. Taking the sky element as an example, a threshold judgment is generally made based on the color and position of the sky in the image, the image is then segmented according to the judgment result, and the sky region obtained by the segmentation is replaced with other elements, such as fireworks, reindeer, or a two-dimensional space, so that the processed image achieves a special effect. However, this approach easily causes false detection and missed detection, which greatly affects the accuracy of the segmentation and the fusion effect of the image, for example causing distortion or insufficiently smooth results.

The embodiment of the present application provides an image processing method and apparatus, where the image processing apparatus may be specifically integrated in a device such as a server.

For example, referring to FIG. 1a, when a user needs to process an image, an image processing request may be sent to the server through the terminal, wherein the image processing request indicates information such as the image to be processed and the element type to be replaced. After receiving the image processing request, the server may acquire a semantic segmentation model corresponding to the element type (the semantic segmentation model is trained by a deep neural network), and then, according to the semantic segmentation model, predict the probability that each pixel in the image belongs to the element type to obtain a segmentation probability map. For convenience of description, in the embodiments of the present application, the segmentation probability map is referred to as an initial probability map. Thereafter, the server may also optimize the initial probability map by means of a conditional random field to obtain a finer segmentation result (that is, a segmentation effect map), and then fuse the image with the preset element material according to the segmentation result. For example, a fusion method can be used to combine a first color portion (such as a white portion) of the segmentation effect map with a replaceable element material, and to combine a second color portion (such as a black portion) of the segmentation effect map with the image. The two combined results are then synthesized, and the synthesized processed image is supplied to the terminal, and so on.

The details are described below separately. It should be noted that the serial numbers of the following embodiments are not intended to limit the order of the embodiments.

This embodiment will be described from the perspective of an image processing apparatus, which may be integrated in a device such as a server.

An image processing method comprises: receiving an image processing request indicating an image to be processed and an element type to be replaced; acquiring a semantic segmentation model corresponding to the element type, the semantic segmentation model being trained by a deep neural network; predicting, according to the semantic segmentation model, the probability that each pixel in the image belongs to the element type, to obtain an initial probability map; optimizing the initial probability map based on a conditional random field to obtain a segmentation effect map; and fusing the image with the preset element material according to the segmentation effect map to obtain a processed image.

As shown in FIG. 1b, the specific process of the image processing method can be as follows:

101. Receive an image processing request.

For example, the image processing apparatus (e.g., a server) may receive the image processing request sent by a terminal or another network-side device. The image processing request may indicate information such as the image to be processed and the element type to be replaced.

The element type refers to the category of an element, and an element refers to a basic unit that can carry visual information. For example, if the image processing request indicates that the element type to be replaced is "sky", all sky parts in the image need to be replaced; if the image processing request indicates that the element type to be replaced is "portrait", all portrait parts in the image need to be replaced, and so on.

102. Obtain a semantic segmentation model corresponding to the element type, and the semantic segmentation model is trained by a deep neural network.

For example, if the image processing request received (e.g., by the server) in step 101 indicates that the element type to be replaced is "sky", then a semantic segmentation model corresponding to "sky" may be acquired; if the image processing request received in step 101 indicates that the element type to be replaced is "portrait", then a semantic segmentation model corresponding to "portrait" may be acquired, and so on.

The semantic segmentation model may be pre-stored in the image processing apparatus or another storage device and acquired by the image processing apparatus when needed, or the semantic segmentation model may be established by the image processing apparatus itself. That is, before the step of "acquiring the semantic segmentation model corresponding to the element type", the image processing method may further include:

Establishing a semantic segmentation model corresponding to the element type. For example, this may be done as follows:

Training data containing the element type is obtained, and a preset semantic segmentation initial model is trained on the training data by using a deep neural network to obtain the semantic segmentation model corresponding to the element type.

For example, to establish a semantic segmentation model corresponding to "sky", a certain number (such as 8000) of images containing the sky can be collected, and the preset semantic segmentation initial model is then fine-tuned on these images by using a deep neural network. The resulting model is the semantic segmentation model corresponding to "sky".

It should be noted that the preset semantic segmentation initial model may be preset according to the requirements of the actual application. For example, a pre-trained semantic segmentation model for 20 categories of general scenes may be used, and the like.
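The acquisition described in step 102 can be sketched as a lookup keyed by element type that falls back to building a model when none is stored. This is a minimal illustration only: the names `register_model`, `get_model`, and the `build` callback are hypothetical and do not come from the application, and the training itself is left to the caller.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: element type -> segmentation model.
# A "model" here is any callable that maps an image to per-pixel probabilities.
Model = Callable[[list], list]

_MODEL_REGISTRY: Dict[str, Model] = {}


def register_model(element_type: str, model: Model) -> None:
    """Store a pre-trained model under its element type."""
    _MODEL_REGISTRY[element_type] = model


def get_model(element_type: str,
              build: Optional[Callable[[str], Model]] = None) -> Model:
    """Return the stored model; if absent, build (train) one and cache it."""
    if element_type in _MODEL_REGISTRY:
        return _MODEL_REGISTRY[element_type]
    if build is None:
        raise KeyError(f"no semantic segmentation model for {element_type!r}")
    model = build(element_type)  # e.g. fine-tune a preset initial model
    _MODEL_REGISTRY[element_type] = model
    return model
```

A request for "sky" would then call `get_model("sky")`, while an unseen type such as "portrait" would need a `build` callback that performs the training.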

103. Predict, according to the semantic segmentation model, the probability that each pixel in the image belongs to the element type, to obtain an initial probability map. For example, this may be done as follows:

(1) (e.g., the server) imports the image into the semantic segmentation model to predict the probability that each pixel in the image belongs to the element type.

For example, taking the element type as "sky" as an example, at this time, the image can be imported into the semantic segmentation model corresponding to "sky" to predict the probability that each pixel in the image belongs to "sky".

For another example, taking the element type "portrait" as an example, the image may be imported into the semantic segmentation model corresponding to "portrait" to predict the probability that each pixel in the image belongs to "portrait", and so on.

(2) (e.g., the server) sets the color of the corresponding pixel on the preset mask according to the probability to obtain an initial probability map.

For example, it may be determined whether the probability is greater than a preset threshold. If so, the color of the corresponding pixel on the preset mask is set to the first color; if not, the color of the corresponding pixel on the preset mask is set to the second color. After all the pixels in the image have been set on the preset mask, the colored preset mask is output to obtain the initial probability map. In an embodiment of the present application, the mask refers to the part outside the selected area, which protects the content of the selection.

That is, a mask containing the first color and the second color is obtained at this time, wherein the first color in the mask indicates that the probability that the corresponding pixel belongs to the element type is high, for example greater than the preset threshold, and the second color indicates that the probability that the corresponding pixel belongs to the element type is low, for example not greater than the preset threshold. For convenience of description, in the embodiments of the present application, the output mask is referred to as the initial probability map.

The preset threshold may be set according to the requirements of the actual application. For example, taking a preset threshold of 80% and the element type "sky" as an example, if the probability that a certain pixel A belongs to "sky" is greater than 80%, the color of pixel A on the preset mask can be set to the first color; otherwise, if the probability that pixel A belongs to "sky" is less than or equal to 80%, the color of pixel A on the preset mask can be set to the second color, and so on.

The first color and the second color may also be determined according to actual application requirements. For example, the first color may be set to white and the second color to black, or the first color may be set to pink and the second color to green, and so on. For convenience of description, in the embodiments of the present application, the first color is white and the second color is black.
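The thresholding in step 103 can be illustrated with a short sketch. It assumes the model output is available as a NumPy array of per-pixel probabilities, and it uses the example values from the text (threshold 80%, white as the first color, black as the second); the function name `probability_to_mask` is hypothetical.

```python
import numpy as np

WHITE, BLACK = 255, 0  # first and second colors from the example in the text


def probability_to_mask(prob: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Return a mask that is WHITE where prob > threshold and BLACK elsewhere."""
    return np.where(prob > threshold, WHITE, BLACK).astype(np.uint8)


# Toy 2x2 probability map: a pixel at exactly 80% is NOT greater than the threshold.
prob = np.array([[0.95, 0.80],
                 [0.10, 0.81]])
mask = probability_to_mask(prob)
```

Note that the comparison is strict, matching the rule that a probability less than or equal to the threshold receives the second color.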

104. Fuse the image with the preset element material according to the initial probability map to obtain a processed image. For example, this may be done as follows:

(1) (e.g., the server) obtains replaceable element material according to a preset policy.

The preset policy may be set according to the requirements of the actual application. For example, a material selection instruction triggered by the user may be received, and the corresponding material may then be obtained from a material library according to the material selection instruction as the replaceable element material.

In order to increase the diversity of the element material, the element material can also be obtained by random cropping. That is, the step of "obtaining replaceable element material according to a preset policy" may also include:

Acquiring a candidate image, randomly cropping the candidate image, and using the cropped image as the replaceable element material.

The candidate image may be acquired from the network, uploaded by the user, or captured directly by the user from the terminal screen or a webpage and then provided to the image processing apparatus, and so on; details are not described here.

(2) (e.g., the server) combines the first color portion of the initial probability map with the acquired element material by a fusion method to obtain a first combination map.

Since the pixels of the first color portion have a higher probability of belonging to the element type to be replaced, this portion can be combined with the acquired element material by the fusion method; that is, the pixels of this portion are replaced with the acquired element material.

(3) (e.g., the server) combines the second color portion of the initial probability map with the image by a fusion method to obtain a second combination map.

Since the pixels of the second color portion have a lower probability of belonging to the element type to be replaced, this portion can be combined with the original image by the fusion method; that is, the pixels of this portion are retained.

It should be noted that, in order to improve the fusion effect or to achieve other special effects, the image may be pre-processed before the second color portion is combined with it, for example by color conversion, contrast adjustment, brightness adjustment, saturation adjustment, and/or adding other special effect masks. The second color portion is then combined with the pre-processed image by the fusion method to obtain the second combination map.

(4) (e.g., the server) synthesizes the first combination map and the second combination map to obtain the processed image.

In this way, the elements in the image that need to be replaced can be replaced with the element material, for example replacing the "sky" in the image with "space", and so on; details are not described here.
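Sub-steps (2) to (4) of step 104 can be sketched as follows. The sketch assumes a hard white/black mask and same-sized NumPy arrays for the image and the element material; the function name `fuse` is hypothetical, and the application does not prescribe this particular implementation of the "fusion method".

```python
import numpy as np


def fuse(image: np.ndarray, material: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Fuse image and material using a mask.

    image, material: HxWx3 uint8 arrays of the same shape.
    mask: HxW uint8 array, 255 (first color) or 0 (second color).
    """
    select = (mask == 255)[..., None]         # HxWx1 boolean, broadcast over channels
    first = np.where(select, material, 0)     # first combination map: material in white area
    second = np.where(select, 0, image)       # second combination map: image in black area
    return (first + second).astype(np.uint8)  # synthesis of the two combination maps
```

Because the two combination maps are non-zero on disjoint pixels, adding them cannot overflow the 8-bit range.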

In addition, to further improve the quality of the processed image obtained from the initial probability map, the initial probability map in the above example may be optimized. The specific optimization is as follows:

The initial probability map is optimized based on conditional random fields (CRF or CRFs) to obtain a segmentation effect map.

For example, the pixels in the initial probability map may be mapped to nodes in the conditional random field, the similarity of the edge constraints between the nodes may be determined, and the segmentation result of the pixels in the initial probability map may be adjusted according to the similarity of the edge constraints to obtain the segmentation effect map.

The conditional random field is a discriminative probability model and a kind of random field. Like the Markov random field, the conditional random field is an undirected graph model: the nodes (that is, vertices) in the graph model represent random variables, and the connections between nodes represent the dependencies between random variables. Conditional random fields can express long-distance dependencies and overlapping features, can better solve the problem of labeling (classification) bias, and allow all features to be globally normalized so that a globally optimal solution can be obtained. Therefore, the conditional random field can be used to optimize the initial probability map and thereby optimize the segmentation result.

It should be noted that since the segmentation effect map is obtained by optimizing the initial probability map, the segmentation effect map is also a mask containing the first color and the second color.
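The idea of adjusting each pixel according to the similarity of its edge constraints can be illustrated with a deliberately simplified sketch. It is an assumption-laden stand-in, not the CRF of this application: it uses only 4-connected neighbors, a color-similarity edge weight, and a few parallel mean-field updates for a two-label problem. The function name `refine_with_grid_crf` and the parameters `weight` and `sigma` are hypothetical.

```python
import numpy as np


def refine_with_grid_crf(prob, image, iterations=5, weight=2.0, sigma=20.0):
    """Refine HxW probabilities using color similarity between neighboring pixels.

    prob: HxW array of probabilities for the element type.
    image: HxWx3 uint8 array. Returns refined HxW probabilities.
    """
    eps = 1e-6
    img = image.astype(np.float64)
    p = np.clip(prob.astype(np.float64), eps, 1 - eps)
    unary = np.log(p) - np.log(1 - p)          # log-odds from the network output
    q = p.copy()
    for _ in range(iterations):
        message = np.zeros_like(q)
        for axis in (0, 1):
            for shift in (1, -1):
                neighbor_q = np.roll(q, shift, axis=axis)
                neighbor_img = np.roll(img, shift, axis=axis)
                # Edge weight: close to 1 for similar colors, close to 0 across edges.
                dist2 = np.sum((img - neighbor_img) ** 2, axis=-1)
                similarity = np.exp(-dist2 / (2 * sigma ** 2))
                # np.roll wraps around, so ignore the wrapped border row/column.
                valid = np.ones_like(q)
                index = [slice(None), slice(None)]
                index[axis] = 0 if shift == 1 else -1
                valid[tuple(index)] = 0.0
                message += weight * valid * similarity * (2 * neighbor_q - 1)
        q = 1.0 / (1.0 + np.exp(-(unary + message)))
    return q
```

With this update, an isolated low-probability pixel surrounded by similarly colored high-probability pixels is pulled toward its neighbors, while pixels on opposite sides of a strong color edge barely influence each other. The refined probabilities can then be thresholded again to obtain a white/black segmentation effect map.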

In this case, step 104, "fusing the image with the preset element material according to the initial probability map to obtain a processed image", comprises: fusing the image with the preset element material according to the segmentation effect map to obtain a processed image. For example, this may be done as follows:

(1) (e.g., the server) obtains replaceable element material according to a preset policy. As above, in order to increase the diversity of the element material, the element material can also be obtained by random cropping.

(2) (e.g., the server) combines the first color portion of the segmentation effect map with the acquired element material by a fusion method to obtain a first combination map.

(3) (e.g., the server) combines the second color portion of the segmentation effect map with the image by a fusion method to obtain a second combination map.

In order to make the fusion result more realistic and to avoid noise or missing regions caused by inaccurate probability prediction, the segmentation effect map can be processed before fusion so that the segmentation boundary is smoother and the color transition at the junction of the replaced region is more natural. That is, before the step of "fusing the image with the preset element material according to the segmentation effect map to obtain a processed image", the image processing method may further include:

Processing the segmentation effect map by an appearance model (Appearance Model) method and/or image morphology operations to obtain a processed segmentation effect map.

In this case, the step of "fusing the image with the preset element material according to the segmentation effect map to obtain a processed image" may include: fusing the image with the preset element material according to the processed segmentation effect map, for example by transparency (Alpha) blending, to obtain the processed image.

The appearance model method is a feature point extraction method widely used in the field of pattern recognition. It statistically models the texture and further fuses the two statistical models of shape and texture into an appearance model. The image morphology operations may include processing such as noise reduction and/or connected domain analysis. A segmentation effect map processed by the appearance model method or the image morphology operations has a smoother segmentation boundary, and the color transition at the junction of the replaced region can be more natural.
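As one example of an image morphology operation for noise reduction, the following sketch applies a morphological opening (erosion followed by dilation) with a 3x3 structuring element to a boolean mask. The choice of opening and of the 3x3 element is an assumption made for illustration; the application names only "noise reduction and/or connected domain analysis". All function names are hypothetical.

```python
import numpy as np


def _neighborhood(mask: np.ndarray, pad_value: bool) -> np.ndarray:
    """Stack the nine 3x3-neighborhood shifts of a boolean HxW mask on axis 0."""
    padded = np.pad(mask, 1, constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])


def erode(mask: np.ndarray) -> np.ndarray:
    # Pad with True so that the image border itself does not erode the mask.
    return _neighborhood(mask, True).all(axis=0)


def dilate(mask: np.ndarray) -> np.ndarray:
    # Pad with False so that nothing grows in from outside the image.
    return _neighborhood(mask, False).any(axis=0)


def open_mask(mask: np.ndarray) -> np.ndarray:
    """Morphological opening: removes specks smaller than the 3x3 element."""
    return dilate(erode(mask))
```

Applied to `segmentation_map == 255`, the opening removes isolated white specks while leaving larger white regions unchanged; the symmetric operation (closing) would fill small black holes instead.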

It should be noted that "Alpha blending" in the embodiments of the present application refers to fusion based on the Alpha value, where Alpha specifies the transparency level of a pixel. In general, 8 bits are reserved for the alpha portion of each pixel, so the effective value of alpha lies in the range [0, 255], which corresponds to an opacity of [0%, 100%]. Therefore, an alpha of 0 means the pixel is completely transparent, an alpha of 128 means approximately 50% transparency, and an alpha of 255 means the pixel is completely opaque.

It can be seen that, after receiving the image processing request, this embodiment acquires, according to the indication in the request, the semantic segmentation model corresponding to the element type to be replaced, predicts according to the model the probability that each pixel in the image belongs to the element type to obtain an initial probability map, optimizes the initial probability map based on the conditional random field, and uses the segmentation effect map obtained by the optimization to fuse the image with the preset element material, thereby replacing the part of the image that belongs to the element type with the preset element material. Because the semantic segmentation model in this scheme is trained by a deep neural network, and the semantic segmentation performed with the model is not based only on information such as color and position but on predicting the probability that each pixel belongs to the element type, the probability of false detection and missed detection can be greatly reduced compared with existing schemes. In addition, since the scheme also optimizes the initial probability map by using the conditional random field, a finer segmentation result can be obtained, which greatly improves the accuracy of the segmentation, helps to reduce image distortion, and improves the image fusion effect.
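The 8-bit alpha convention can be written out as a small sketch: each output pixel is `a * material + (1 - a) * image` with `a = alpha / 255`. The sketch assumes the alpha map is derived from the (possibly smoothed) segmentation effect map; the function name `alpha_blend` is hypothetical.

```python
import numpy as np


def alpha_blend(material: np.ndarray, image: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Blend material over image.

    material, image: HxWx3 uint8 arrays of the same shape.
    alpha: HxW uint8 array; 255 shows only the material, 0 keeps only the image.
    """
    a = alpha.astype(np.float32)[..., None] / 255.0
    out = a * material.astype(np.float32) + (1.0 - a) * image.astype(np.float32)
    return np.rint(out).astype(np.uint8)
```

With a hard white/black mask this reduces to plain replacement; intermediate alpha values along a smoothed boundary give the gradual color transition described above.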

Based on the method described in the above embodiment, a more detailed example is given below.

In this embodiment, the image processing apparatus is integrated in a server, and the element to be replaced is "sky".

As shown in FIG. 2a and FIG. 2d, an image processing method may proceed as follows:

201. The terminal sends an image processing request to the server, where the image processing request may indicate information such as the image to be processed and the element type to be replaced.

The image processing request may be triggered in various manners, for example, by clicking or sliding a trigger button on a webpage or a client interface, or by triggering a preset instruction, and the like.

For example, taking triggering by a trigger button as an example, see FIG. 2b. When the user needs to replace the sky part of picture A with other elements, such as replacing it with a "space" element or adding "clouds", the user can upload picture A and click the trigger button "play once" to trigger generation of the image processing request and send the image processing request to the server, wherein the image processing request indicates that the image to be processed is picture A and the element type to be replaced is "sky".

It should be noted that this embodiment takes "sky" as the element type to be replaced only as an example. It should be understood that the element type to be replaced may also be another type, such as "portrait", "eyes", or "plant"; the implementation is similar and is not repeated here.

202. After receiving the image processing request, the server acquires a semantic segmentation model corresponding to “sky”, and the semantic segmentation model is trained by a deep neural network.

The semantic segmentation model may be pre-stored in the image processing apparatus or another storage device and acquired by the image processing apparatus when needed, or the semantic segmentation model may be established by the image processing apparatus itself. For example, training data containing the element type may be obtained, such as by collecting a certain number of images containing the sky, and the preset semantic segmentation initial model is then trained on the training data (that is, the images containing the sky) by using a deep neural network to obtain the semantic segmentation model corresponding to "sky".

203. The server imports the image into the semantic segmentation model to predict a probability that each pixel in the image belongs to the “sky”.

For example, if the image processing request received in step 202 indicates that the image to be processed is picture A, then picture A may be imported, in the form of a three-channel color image, into the semantic segmentation model corresponding to "sky" to predict the probability that each pixel in picture A belongs to "sky", after which step 204 is performed.

204. The server sets the color of the corresponding pixel on the preset mask according to the probability, to obtain an initial probability map.

For example, it may be determined whether the probability is greater than a preset threshold. If so, the color of the corresponding pixel on the preset mask is set to the first color; if not, the color of the corresponding pixel on the preset mask is set to the second color. After all the pixels in the image have been set on the preset mask, the colored preset mask is output to obtain the initial probability map.

The preset threshold may be set according to the requirements of the actual application. For example, with a preset threshold of 80%, if the probability that a certain pixel K belongs to "sky" is greater than 80%, the color of pixel K on the preset mask may be set to the first color; otherwise, if the probability that pixel K belongs to "sky" is less than or equal to 80%, the color of pixel K on the preset mask may be set to the second color, and so on.

The first color and the second color may also be determined according to actual application requirements. For example, the first color may be set to white and the second color to black, or the first color may be set to pink and the second color to green, and so on.

For example, if the first color is set to white and the second color is set to black, then after the picture A is imported into the semantic segmentation model, an initial probability map as shown in FIG. 2c can be obtained.

205. The server optimizes the initial probability map based on the conditional random field to obtain a segmentation effect map.

For example, the server may map the pixels in the initial probability map to nodes in the conditional random field, determine the similarity of the edge constraints between the nodes, and adjust the segmentation result of the pixels in the initial probability map according to the similarity of the edge constraints to obtain the segmentation effect map.

Since the conditional random field is an undirected graph model, each pixel in the image can correspond to a node in the conditional random field, and prior information including parameters such as color, texture, and position can be preset so that nodes with similar edge constraints have similar segmentation results. Therefore, the segmentation results of the pixels in the initial probability map can be adjusted according to the similarity of the edge constraints so that the sky segmentation result is finer. For example, see FIG. 2c: after the initial probability map is optimized based on the conditional random field, a segmentation effect map with a finer segmentation result is obtained.
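The description does not give the energy function of the conditional random field. One widely used form for image segmentation, shown here only as an assumption and not as the formula of this application, is the fully connected CRF, in which the unary term comes from the network probability and the pairwise term penalizes different labels on pixels with similar color and position:

```latex
E(\mathbf{x}) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j),
\qquad \psi_u(x_i) = -\log P(x_i),

\psi_p(x_i, x_j) = [x_i \neq x_j] \left(
  w_1 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\theta_\alpha^2}
                   -\frac{\lVert I_i - I_j \rVert^2}{2\theta_\beta^2} \right)
  + w_2 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\theta_\gamma^2} \right)
\right)
```

Here $x_i$ is the label of pixel $i$ (for example "sky" or "not sky"), $P(x_i)$ is the probability predicted by the semantic segmentation model, $p_i$ and $I_i$ are the position and color of pixel $i$, and $w_1$, $w_2$, $\theta_\alpha$, $\theta_\beta$, $\theta_\gamma$ are hypothetical tuning parameters. Minimizing $E(\mathbf{x})$ gives nearby pixels of similar color the same label, which corresponds to the "similar edge constraints, similar segmentation results" behavior described above.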

206. The server processes the segmentation effect map by an appearance model method and/or image morphology operations to obtain a processed segmentation effect map, and then performs step 207.

The image morphology operations may include processing such as noise reduction and/or connected domain analysis. A segmentation effect map processed by the appearance model method or the image morphology operations has a smoother segmentation boundary, and the color transition at the junction of the replaced region can be more natural.

It should be noted that step 206 is optional. If step 206 is not performed, step 207 may be performed directly after step 205, and in step 208 the segmentation effect map, the image, and the element material are fused by a fusion method to obtain the processed image.
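The control flow around the optional step 206 can be sketched as a small driver in which each stage is passed in as a function. The function name `run_pipeline` and all parameter names are hypothetical; the stages themselves (prediction, CRF optimization, post-processing, fusion) are supplied by the caller.

```python
from typing import Any, Callable, Optional


def run_pipeline(image: Any,
                 material: Any,
                 predict: Callable[[Any], Any],
                 refine: Callable[[Any, Any], Any],
                 fuse: Callable[[Any, Any, Any], Any],
                 postprocess: Optional[Callable[[Any], Any]] = None) -> Any:
    """Run steps 203-208; step 206 is skipped when postprocess is None."""
    prob_map = predict(image)                # steps 203-204: initial probability map
    seg_map = refine(prob_map, image)        # step 205: CRF-based optimization
    if postprocess is not None:
        seg_map = postprocess(seg_map)       # step 206 (optional)
    return fuse(image, material, seg_map)    # step 208: fusion
```

Acquiring the element material (step 207) is assumed to have happened before the call, since `material` is an argument.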

207. The server obtains replaceable element material according to a preset policy.

In order to increase the diversity of the element material, the element material can also be obtained by random cropping. For example, the server can acquire a candidate image, randomly crop the candidate image, and use the cropped image as the replaceable element material, and so on.
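Random cropping of a candidate image can be sketched as follows, assuming the candidate is a NumPy array and the crop should match the size of the image to be processed. The function name `random_crop` and the optional `rng` argument are hypothetical.

```python
import numpy as np


def random_crop(candidate: np.ndarray, height: int, width: int, rng=None) -> np.ndarray:
    """Return a random height x width crop of an HxWxC candidate image."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = candidate.shape[:2]
    if height > h or width > w:
        raise ValueError("crop size exceeds candidate image size")
    top = int(rng.integers(0, h - height + 1))    # upper bound is exclusive
    left = int(rng.integers(0, w - width + 1))
    return candidate[top:top + height, left:left + width].copy()
```

Passing a seeded generator, for example `np.random.default_rng(0)`, makes the crop reproducible, which can be useful when testing.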

208. The server fuses the processed segmentation effect image, the image, and the element material by a fusion method to obtain a processed image.

For example, suppose the first color is white and the second color is black. The server can combine the white portion of the segmentation effect map with the acquired element material by the fusion method to obtain a first combination map, combine the black portion of the segmentation effect map with image A by the fusion method to obtain a second combination map, and then synthesize the first combination map and the second combination map to obtain a processed image.

Pixels in the white portion have a high probability of belonging to "sky", so these pixels can be replaced with the acquired element material by the fusion method. Pixels in the black portion have a low probability of belonging to "sky", so these pixels can be combined with the original image A by the fusion method, that is, they are retained. Synthesizing the first combination map and the second combination map then replaces the "sky" in the original image A with the corresponding element material, for example, replacing the "sky" in image A with "the night sky of Christmas"; see FIG. 2d, and details are not described herein again.

It should be noted that, as shown in FIG. 2d, in order to improve the fusion effect or implement other special effects, image A may undergo certain pre-processing before the black portion (that is, the second color portion) is combined with it, such as color conversion, contrast adjustment, brightness adjustment, saturation adjustment, and/or adding other special effect masks. The black portion is then combined with the pre-processed image A by the fusion method to obtain the second combination map, and details are not described herein again.
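The two combination maps and their synthesis can be sketched as follows. The text does not name a specific fusion method, so this example assumes simple per-pixel masking on single-channel images, with the mask stored as 1 for white and 0 for black; the function name is hypothetical.

```python
def fuse_with_mask(image, material, mask):
    """Replace the white (1) region of the mask with material; keep image elsewhere."""
    h, w = len(image), len(image[0])
    # First combination map: element material where the mask is white.
    first = [[mask[y][x] * material[y][x] for x in range(w)] for y in range(h)]
    # Second combination map: original image where the mask is black.
    second = [[(1 - mask[y][x]) * image[y][x] for x in range(w)] for y in range(h)]
    # The two maps cover complementary regions, so adding them gives the result.
    return [[first[y][x] + second[y][x] for x in range(w)] for y in range(h)]


image_a = [[10, 20], [30, 40]]
material = [[200, 210], [220, 230]]
mask = [[1, 1], [0, 0]]          # top row is "sky"
result = fuse_with_mask(image_a, material, mask)
```

The top row of `result` comes from the material and the bottom row from image A. With a soft mask (values between 0 and 1) the same arithmetic would blend the two sources near the boundary.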

209. The server sends the processed image to the terminal.

For example, the processed image can be displayed on the interface of the corresponding client. The server may also provide a corresponding save path and/or sharing interface for the user to save and/or share the processed image. For example, the processed image may be saved in the cloud or locally (that is, in the terminal), shared to Weibo or the circle of friends, and/or inserted into the chat dialog interface of an instant chat tool, and so on; details are not described herein again.

As can be seen from the above, after receiving the image processing request, this embodiment may acquire the semantic segmentation model corresponding to "sky" according to the indication in the request, predict according to the model the probability that each pixel in the image belongs to "sky" to obtain an initial probability map, optimize the initial probability map based on the conditional random field, and fuse the image with the preset element material by using the segmentation effect map obtained by the optimization, thereby replacing the "sky" part of the image with the preset element material. The semantic segmentation model in this scheme is mainly trained by a deep neural network, and semantic segmentation with the model is not based only on information such as color and position, but on predicting the probability that each pixel belongs to the element type; therefore, compared with existing schemes, the probability of false detection and missed detection can be greatly reduced. In addition, since the scheme can also use the conditional random field to optimize the initial probability map obtained by segmentation, a finer segmentation result can be obtained, which greatly improves segmentation accuracy, helps reduce image distortion, and improves the image fusion effect.

In order to better implement the above method, the embodiment of the present application further provides an image processing apparatus, which may be integrated into a device such as a server.

As shown in FIG. 3a, the image processing apparatus includes a receiving unit 301, an obtaining unit 302, a prediction unit 303, an optimization unit 304, and a fusion unit 305, as follows:

(1) receiving unit 301;

The receiving unit 301 is configured to receive an image processing request, where the image processing request indicates an image that needs to be processed, and information such as an element type that needs to be replaced.

(2) obtaining unit 302;

The obtaining unit 302 is configured to obtain a semantic segmentation model corresponding to the element type, and the semantic segmentation model is trained by a deep neural network.

For example, if the image processing request received by the receiving unit 301 indicates that the element type to be replaced is "sky", the obtaining unit 302 can acquire the semantic segmentation model corresponding to "sky"; if the image processing request received by the receiving unit 301 indicates that the element type to be replaced is "portrait", the obtaining unit 302 can acquire the semantic segmentation model corresponding to "portrait", and so on, which are not enumerated here.

The semantic segmentation model may be pre-stored in the image processing apparatus or another storage device and acquired by the image processing apparatus when needed, or the semantic segmentation model may be established by the image processing apparatus itself; that is, as shown in FIG. 3b, the image processing apparatus may further include a model establishing unit 306, as follows:
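Selecting a model by element type amounts to a lookup keyed on the type named in the request. The sketch below assumes a simple in-memory registry; the registry contents and the function name are placeholders for illustration and do not refer to real model files.

```python
# Hypothetical registry: element type -> identifier of its segmentation model.
MODEL_REGISTRY = {
    "sky": "sky_segmentation_model",
    "portrait": "portrait_segmentation_model",
}


def get_segmentation_model(element_type):
    """Return the model registered for the element type, or fail clearly."""
    try:
        return MODEL_REGISTRY[element_type]
    except KeyError:
        raise ValueError(
            f"no semantic segmentation model for element type {element_type!r}"
        ) from None


selected = get_segmentation_model("sky")
```

Failing with an explicit error for an unknown element type is a design choice of this sketch; the text does not say how an unsupported type is handled.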

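The model establishing unit 306 can be used to establish a semantic segmentation model corresponding to the element type. For example, it may obtain training data containing the element type and, according to the training data, train a preset semantic segmentation initial model by using a deep neural network to obtain the semantic segmentation model corresponding to the element type.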

The preset semantic segmentation initial model may be preset according to actual application requirements. For example, a pre-trained semantic segmentation model for 20 categories of general scenes may be used, and the like.

(3) prediction unit 303;

The prediction unit 303 is configured to predict, according to the semantic segmentation model, a probability that each pixel in the image belongs to the element type, and obtain an initial probability map.

For example, the prediction unit 303 can include a prediction subunit and a setting subunit, as follows:

A prediction subunit that can be used to import the image into the semantic segmentation model to predict the probability that each pixel in the image belongs to the element type.

For example, taking the element type as “sky” as an example, at this time, the prediction subunit can import the image into the semantic segmentation model corresponding to “sky” to predict the probability that each pixel in the image belongs to “sky”.

The setting subunit can be used to set the color of the corresponding pixel on the preset mask according to the probability to obtain an initial probability map.

For example, the setting subunit may be specifically configured to determine whether the probability is greater than a preset threshold; if yes, set the color of the corresponding pixel on the preset mask to a first color; if not, set the color of the corresponding pixel on the preset mask to a second color; and, after determining that the colors of all pixels in the image have been set on the preset mask, output the colored preset mask to obtain an initial probability map.

The preset threshold may be set according to the requirements of the actual application, and the first color and the second color may also be determined according to actual application requirements. For example, the first color may be set to white, and the second color may be set to black, and so on.
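The thresholding rule can be written in a few lines. In this sketch the threshold of 0.5 and the values 255 and 0 for white and black are assumptions chosen for illustration, since the text leaves them to the application.

```python
WHITE = 255  # first color: probability above the threshold
BLACK = 0    # second color: probability at or below the threshold


def probabilities_to_mask(probs, threshold=0.5):
    """Color each mask pixel WHITE if its probability exceeds the threshold, else BLACK."""
    return [[WHITE if p > threshold else BLACK for p in row] for row in probs]


probs = [[0.9, 0.7, 0.2],
         [0.8, 0.5, 0.1]]
initial_map = probabilities_to_mask(probs)
```

A probability exactly equal to the threshold is colored black, which matches the "less than or equal to" branch stated in the claims.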

(4) optimization unit 304;

The optimization unit 304 is configured to optimize the initial probability map based on the conditional random field to obtain a segmentation effect map.

For example, the optimization unit 304 may be specifically configured to map the pixels in the initial probability map to the nodes in the conditional random field, determine the similarity of the edge constraints between the nodes, and adjust the segmentation results of the pixels in the initial probability map according to the similarity of the edge constraints to obtain a segmentation effect map.

(5) fusion unit 305;

The fusion unit 305 is configured to fuse the image with the preset element material according to the segmentation effect map to obtain a processed image.

For example, the fusion unit 305 can include a material acquisition subunit, a first fusion subunit, a second fusion subunit, and a synthesis subunit, as follows:

The material acquisition sub-unit is used to obtain a replaceable element material according to a preset policy.

The preset policy may be set according to the requirements of the actual application. For example, the material acquisition subunit may be specifically configured to receive a material selection instruction triggered by the user, and obtain the corresponding material from the material library according to the material selection instruction as the replaceable element material, and so on.

In order to increase the diversity of the element material, the element material can also be obtained by random interception, namely:

The material acquisition sub-unit is specifically configured to acquire a candidate image, randomly intercept the candidate image, and use the intercepted image as a replaceable element material.

The first fusion subunit may be configured to combine the first color portion in the segmentation effect map with the acquired element material by a fusion method to obtain a first combination map.

The second fusion subunit may be configured to combine the second color portion in the segmentation effect map with the image by a fusion method to obtain a second combination map.

The synthesis subunit can be used to synthesize the first combination map and the second combination map to obtain a processed image.

In order to make the fusion result more realistic and avoid noise or missing regions caused by inaccurate probability prediction, the segmentation effect map can be processed before fusion, so that the segmentation boundary is smoother and the color transition at the junction of the replaced region is more natural; that is, as shown in FIG. 3b, the image processing apparatus may further include a pre-processing unit 307, as follows:

The pre-processing unit 307 can be configured to perform an appearance model method and/or an image morphology operation process on the segmentation effect map to obtain a processed segmentation effect map.

At this time, the fusion unit 305 may be specifically configured to fuse the image with the preset element material according to the processed segmentation effect map to obtain a processed image.

The image morphological operation processing may include processing such as noise reduction processing and/or connected domain analysis, and details are not described herein again.

In specific implementation, the foregoing units may be implemented as separate entities, or may be combined arbitrarily and implemented as the same entity or several entities. For the specific implementation of the foregoing units, refer to the foregoing method embodiments; details are not described herein again.

As can be seen from the above, after an image processing request is received, the obtaining unit 302 of the image processing apparatus may acquire, according to the indication in the request, the semantic segmentation model corresponding to the element type that needs to be replaced, and the prediction unit 303 predicts according to the model the probability that each pixel in the image belongs to the element type to obtain an initial probability map. The optimization unit 304 then optimizes the initial probability map based on the conditional random field, and the fusion unit 305 uses the segmentation effect map obtained by the optimization to fuse the image with the preset element material, thereby replacing the part of the image that belongs to a certain element type with the preset element material. The semantic segmentation model in this scheme is mainly trained by a deep neural network, and semantic segmentation with the model is not based only on information such as color and position, but on predicting the probability that each pixel belongs to the element type; therefore, compared with existing schemes, the probability of false detection and missed detection can be greatly reduced. In addition, since the scheme can also use the conditional random field to optimize the initial probability map obtained by segmentation, a finer segmentation result can be obtained, which greatly improves segmentation accuracy, helps reduce image distortion, and improves the image fusion effect.

The embodiment of the present application further provides a server, as shown in FIG. 4, which shows a schematic structural diagram of a server involved in the embodiment of the present application, specifically:

The server may include a processor 401 with one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403, and an input unit 404. It will be understood by those skilled in the art that the server structure illustrated in FIG. 4 does not constitute a limitation on the server, which may include more or fewer components than illustrated, combine some components, or use a different component arrangement. Wherein:

The processor 401 is the control center of the server. It connects the various parts of the entire server by using various interfaces and lines, and performs the various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the server as a whole. The processor 401 may include one or more processing cores; the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 401.

The memory 402 can be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, and the like. Moreover, the memory 402 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 can also include a memory controller to provide the processor 401 with access to the memory 402.

The server also includes a power supply 403 for powering various components. The power supply 403 can be logically coupled to the processor 401 through a power management system to manage functions such as charging, discharging, and power management through the power management system. The power supply 403 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.

The server can also include an input unit 404 that can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.

Although not shown, the server may further include a display unit or the like, and details are not described herein again. Specifically, in this embodiment, the processor 401 in the server loads the executable files corresponding to the processes of one or more applications into the memory 402 according to the following instructions, and runs the applications stored in the memory 402 to perform the methods shown in FIG. 1b and FIG. 2a and the operations of the devices shown in FIG. 3a and FIG. 3b, as follows:

Receive an image processing request indicating an image to be processed and an element type to be replaced; acquire a semantic segmentation model corresponding to the element type, the semantic segmentation model being trained by a deep neural network; predict, according to the semantic segmentation model, the probability that each pixel in the image belongs to the element type to obtain an initial probability map; optimize the initial probability map based on the conditional random field to obtain a segmentation effect map; and fuse the image with the preset element material according to the segmentation effect map to obtain a processed image.
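The order of these operations can be sketched as a small pipeline in which each stage is a caller-supplied function. Everything named here (`process_image`, the request dictionary keys, and the `toy_*` stand-ins) is hypothetical; the stand-ins exist only so the sketch runs and are neither a trained model nor a conditional random field.

```python
def process_image(request, get_model, refine, get_material, fuse):
    """Run the operations in order; each stage is a caller-supplied callable."""
    image = request["image"]
    model = get_model(request["element_type"])   # model for the requested element type
    initial_map = model(image)                   # per-pixel probabilities
    effect_map = refine(initial_map, image)      # e.g. a conditional random field step
    material = get_material(request)             # replaceable element material
    return fuse(image, material, effect_map)     # processed image


# Toy stand-ins, for illustration only.
def toy_get_model(element_type):
    # Pretends that bright pixels belong to the element type.
    return lambda image: [[1.0 if px > 128 else 0.0 for px in row] for row in image]


def toy_refine(prob_map, image):
    return [[1 if p > 0.5 else 0 for p in row] for row in prob_map]


def toy_get_material(request):
    return [[7 for _ in row] for row in request["image"]]


def toy_fuse(image, material, mask):
    return [[m if k else i for i, m, k in zip(irow, mrow, krow)]
            for irow, mrow, krow in zip(image, material, mask)]


request = {"image": [[200, 50], [220, 30]], "element_type": "sky"}
processed = process_image(request, toy_get_model, toy_refine, toy_get_material, toy_fuse)
```

Passing the stages in as functions keeps the ordering visible and lets an optional step, such as mask pre-processing, be composed into `refine` without changing the pipeline itself.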

For example, the replaceable element material may be obtained according to a preset strategy, the first color portion in the segmentation effect map is combined with the acquired element material by a fusion method to obtain a first combination map, the second color portion in the segmentation effect map is combined with the image by the fusion method to obtain a second combination map, and then the first combination map and the second combination map are synthesized to obtain a processed image.

The semantic segmentation model may be pre-stored in the image processing apparatus or another storage device and acquired by the image processing apparatus when needed, or the semantic segmentation model may be established by the image processing apparatus itself; that is, the processor 401 can also run an application stored in the memory 402 to implement the following functions:

In order to make the fusion result more realistic and avoid noise or missing regions caused by inaccurate probability prediction, the segmentation effect map can be processed before fusion, so that the segmentation boundary is smoother and the color transition at the junction of the replaced region is more natural; that is, the processor 401 can also run an application stored in the memory 402 to implement the following functions:

The segmentation effect map is processed by an appearance model method and/or an image morphology operation to obtain a processed segmentation effect map, so that, during fusion, the image and the preset element material can be fused according to the processed segmentation effect map to obtain the processed image. For details, refer to the foregoing embodiments; details are not described herein again.

For the specific implementation of the foregoing operations, refer to the foregoing embodiments, and details are not described herein again.

It can be seen that, after receiving the image processing request, the server of this embodiment may acquire, according to the indication in the request, the semantic segmentation model corresponding to the element type that needs to be replaced, predict according to the model the probability that each pixel in the image belongs to the element type to obtain an initial probability map, optimize the initial probability map based on the conditional random field, and use the segmentation effect map obtained by the optimization to fuse the image with the preset element material, thereby replacing the part of the image that belongs to a certain element type with the preset element material. The semantic segmentation model in this scheme is mainly trained by a deep neural network, and semantic segmentation with the model is not based only on information such as color and position, but on predicting the probability that each pixel belongs to the element type; therefore, compared with existing schemes, the probability of false detection and missed detection can be greatly reduced. In addition, since the scheme can also use the conditional random field to optimize the initial probability map obtained by segmentation, a finer segmentation result can be obtained, which greatly improves segmentation accuracy, helps reduce image distortion, and improves the image fusion effect.

A person skilled in the art may understand that all or part of the various steps of the foregoing embodiments may be performed by a program to instruct related hardware. The program may be stored in a computer readable storage medium, and the storage medium may include: Read Only Memory (ROM), Random Access Memory (RAM), disk or optical disk.

The image processing method and apparatus provided by the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims

An image processing method is applied to an image processing apparatus, the method comprising:

Receiving an image processing request indicating an image to be processed, and an element type to be replaced;

Obtaining a semantic segmentation model corresponding to the element type, the semantic segmentation model being trained by a deep neural network;

Determining, according to the semantic segmentation model, a probability that each pixel in the image belongs to the element type, and obtaining an initial probability map;

The image is merged with the preset element material according to the initial probability map to obtain a processed image.
The method according to claim 1, wherein the probability of each pixel in the image belonging to the element type is predicted according to the semantic segmentation model, and an initial probability map is obtained, including:

Importing the image into the semantic segmentation model, predicting a probability that each pixel in the image belongs to the element type;

The color of the corresponding pixel on the preset mask is set according to the probability, and an initial probability map is obtained.
The method according to claim 2, wherein the color of the corresponding pixel on the preset mask is set according to the probability, and an initial probability map is obtained, including:

Determining whether the probability is greater than a preset threshold;

When the probability is greater than the preset threshold, setting a color of the corresponding pixel on the preset mask to a first color;

When the probability is less than or equal to the threshold, setting a color of the corresponding pixel on the preset mask to a second color;

After determining that all the pixels in the image are set on the preset mask, the preset mask after setting the color is output, and an initial probability map is obtained.
The method of claim 1 further comprising:

The initial probability map is optimized based on the conditional random field to obtain a segmentation effect map;

The image is merged with the preset element material according to the initial probability map to obtain a processed image, including:

The image is merged with the preset element material according to the segmentation effect map to obtain a processed image.
The method according to claim 4, wherein optimizing the initial probability map based on the conditional random field to obtain a segmentation effect map includes:

Mapping pixels in the initial probability map to nodes in a conditional random field;

Determining the similarity of edge constraints between nodes;

The segmentation result of the pixels in the initial probability map is adjusted according to the similarity of the edge constraints to obtain a segmentation effect map.
The method according to claim 4 or 5, wherein the image is merged with the preset element material according to the segmentation effect map to obtain a processed image, including:

Obtaining replaceable element material according to a preset strategy;

Combining the first color portion in the segmentation effect map with the acquired element material by a fusion method to obtain a first combination map;

Combining the second color portion in the segmentation effect map with the image by a fusion method to obtain a second combination map;

The first combination map and the second combination map are combined to obtain a processed image.
The method according to claim 6, wherein the obtaining the replaceable element material according to the preset policy comprises:

Obtaining a candidate image, randomly intercepting the candidate image, and using the intercepted image as a replaceable element material; or

Receiving a user-triggered material selection instruction, and acquiring a corresponding material from the material library according to the material selection instruction as a replaceable element material.
The method according to claim 4 or 5, wherein before the image is merged with the preset element material according to the segmentation effect map to obtain the processed image, the method further includes:

Performing an appearance model method and/or an image morphology operation process on the segmentation effect map to obtain a processed segmentation effect map;

The merging the image with the preset element material according to the segmentation effect map to obtain the processed image includes: merging the image with the preset element material according to the processed segmentation effect map to obtain the processed image.
The method according to any one of claims 1 to 5, before the obtaining a semantic segmentation model corresponding to the element type, further comprising:

Obtaining training data containing the type of the element;

According to the training data, the preset semantic segmentation initial model is trained by using a deep neural network to obtain a semantic segmentation model corresponding to the element type.
An image processing apparatus, the apparatus comprising: a processor and a memory, wherein the memory stores instructions executable by the processor, and when the instructions are executed, the processor is configured to:

Receiving an image processing request indicating an image to be processed, and an element type to be replaced;

Obtaining a semantic segmentation model corresponding to the element type, the semantic segmentation model being trained by a deep neural network;

Determining, according to the semantic segmentation model, a probability that each pixel in the image belongs to the element type, and obtaining an initial probability map;

The image is merged with the preset element material according to the initial probability map to obtain a processed image.
The apparatus of claim 10, the processor further configured to:

Importing the image into the semantic segmentation model, predicting a probability that each pixel in the image belongs to the element type;

The color of the corresponding pixel on the preset mask is set according to the probability, and an initial probability map is obtained.
The apparatus of claim 11 wherein said processor is further configured to:

Determining whether the probability is greater than a preset threshold;

When the probability is greater than the preset threshold, setting a color of the corresponding pixel on the preset mask to a first color;

When the probability is less than or equal to the threshold, setting a color of the corresponding pixel on the preset mask to a second color;

After determining that all the pixels in the image are set on the preset mask, the preset mask after setting the color is output, and an initial probability map is obtained.
The apparatus of claim 10, the processor further configured to:

The initial probability map is optimized based on the conditional random field to obtain a segmentation effect map;

Wherein the processor is further configured to:

The image is merged with the preset element material according to the segmentation effect map to obtain a processed image.
The apparatus of claim 13 wherein said processor is further configured to:

Mapping the pixels in the initial probability map to the nodes in the conditional random field, determining the similarity of the edge constraints between the nodes, and adjusting the segmentation results of the pixels in the initial probability map according to the similarity of the edge constraints to obtain a segmentation effect map.
The apparatus of claim 13 or 14, the processor further configured to:

Obtaining replaceable element material according to a preset strategy;

Combining the first color portion in the segmentation effect map with the acquired element material by a fusion method to obtain a first combination map;

Combining the second color portion in the segmentation effect map with the image by a fusion method to obtain a second combination map;

The first combination map and the second combination map are combined to obtain a processed image.
The apparatus according to claim 15, the processor is further configured to: acquire a candidate image, randomly intercept the candidate image, and use the intercepted image as a replaceable element material; or

Receiving a user-triggered material selection instruction, and acquiring a corresponding material from the material library according to the material selection instruction as a replaceable element material.
The apparatus of claim 13 or 14, the processor further configured to:

Performing an appearance model method and/or an image morphology operation process on the segmentation effect map to obtain a processed segmentation effect map;

According to the processed segmentation effect map, the image is merged with the preset element material to obtain a processed image.
The apparatus according to any one of claims 10 to 14, the processor further configured to:

The training data including the element type is obtained, and according to the training data, the preset semantic segmentation initial model is trained by using a deep neural network to obtain a semantic segmentation model corresponding to the element type.
A non-transitory computer readable storage medium storing a computer program for performing the method of any one of claims 1 to 9.