CN110148102B - Image synthesis method, advertisement material synthesis method and device - Google Patents

Image synthesis method, advertisement material synthesis method and device

Info

Publication number
CN110148102B
Authority
CN
China
Prior art keywords
image
matting
model
original image
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810146131.9A
Other languages
Chinese (zh)
Other versions
CN110148102A (en)
Inventor
吕建超
谢奕
徐秋泓
瞿佳
朱海文
袁燊星
程诚
张韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810146131.9A priority Critical patent/CN110148102B/en
Publication of CN110148102A publication Critical patent/CN110148102A/en
Application granted granted Critical
Publication of CN110148102B publication Critical patent/CN110148102B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0276Advertisement creation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention discloses an image synthesis method, an advertisement material synthesis method and an advertisement material synthesis device, and belongs to the field of digital image processing. The method comprises the following steps: obtaining an original image to be matted, performing saliency detection on the original image to generate a saliency map, calculating a matting mask from the saliency map by using a depth matting model, matting out a target material in the original image by using the matting mask, and synthesizing the target material with an original template to be synthesized to obtain a target image. According to the invention, the original image is matted automatically by means of saliency detection and the depth matting model, which avoids the repeated trimap calibration the related art requires of the user, simplifies the matting operation, realizes a fully automatic process from matting the original image to synthesizing the matted target material with the original template, and improves the production efficiency of image synthesis.

Description

Image synthesis method, advertisement material synthesis method and device
Technical Field
The embodiment of the invention relates to the field of digital image processing, in particular to an image synthesis method, an advertisement material synthesis method and an advertisement material synthesis device.
Background
The image synthesis method is generally a process of synthesizing a target material in an original image with an original template to be synthesized to obtain a target image. The process of obtaining the target material in the original image is the digital matting process. The digital matting process is the process of solving the matting equation:
I_i = α_i·F_i + (1 - α_i)·B_i
where I_i is the color value of pixel i; α_i is a number between 0 and 1, called the transparency value of the digital image, alpha mask, or unknown mask (English: mask); F_i is the foreground color of pixel i; and B_i is the background color of pixel i. The alpha matrix of the original image is used to represent the matting result of the original image. When α_i is 1, pixel i belongs to the foreground; when α_i is 0, pixel i belongs to the background; and when α_i is a number between 0 and 1, pixel i belongs to the foreground-background blending region.
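For illustration only (not part of the patent), the matting equation above can be sketched in Python/NumPy as a per-pixel composite; the function name and array layout are assumptions:

    import numpy as np

    def composite(foreground, background, alpha):
        # Matting equation: I_i = alpha_i * F_i + (1 - alpha_i) * B_i
        # foreground, background: float arrays of shape (H, W, 3)
        # alpha: float array of shape (H, W) with values in [0, 1]
        a = alpha[..., np.newaxis]  # broadcast alpha over the RGB channels
        return a * foreground + (1.0 - a) * background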
In the related art, the α_i values of most pixels in a digital image are calibrated through the user's manual calibration, i.e., a trimap. As shown in fig. 1, for an original image 100, the calibrated original image includes: a foreground region 12 where the user calibrates α_i to 1, a background region 14 where the user calibrates α_i to 0, and an unknown region 16 where α_i is left undetermined; the unknown region 16 is the region that the matting algorithm needs to estimate. After the user manually calibrates the original image, a closed-form matting (English: closed-form matting) algorithm is adopted to estimate the foreground pixels and background pixels in the unknown region 16 according to the foreground region 12 and the background region 14 specified by the user, obtaining the α_i value of each pixel in the unknown region.
Because the user can hardly specify the trimap required by the closed-form matting algorithm accurately, obtaining an accurate matting result requires the user to repeatedly re-calibrate the trimap for the next matting pass according to the current matting result; an accurate matting result is obtained only after digital matting is performed many times.
Disclosure of Invention
The embodiment of the invention provides an image synthesis method, an advertisement material synthesis method and an advertisement material synthesis device, which can solve the problem of low production efficiency of image synthesis in the related art. The technical scheme is as follows:
In a first aspect, there is provided an image synthesis method, the method comprising:
acquiring an original image to be matted;
performing saliency detection on the original image to generate a saliency map;
calculating a matting mask from the saliency map by using a depth matting model, wherein the depth matting model represents a matting rule obtained by training on sample images;
adopting the matting mask to matte out a target material in the original image;
and synthesizing the target material and the original template to be synthesized to obtain a target image.
In a second aspect, there is provided a method for synthesizing advertisement material, the method comprising:
acquiring a first trigger operation corresponding to an image uploading entry in a target application program, wherein the target application program is an application program with an advertisement material processing function;
acquiring an uploaded original image according to the first trigger operation;
opening a matting function;
carrying out automatic matting on the original image to obtain a target material in the original image, wherein the target material is at least one element of plants, animals and still life;
and synthesizing the target material and the advertisement template to be synthesized to obtain a target advertisement image.
In a third aspect, there is provided an image synthesizing apparatus, the apparatus comprising:
the acquisition module is used for acquiring an original image to be subjected to matting;
the generating module is used for carrying out significance detection on the original image to generate a significance map;
the calculation module is used for calculating to obtain a matting mask by adopting a depth matting model according to the saliency map, and the depth matting model is used for representing a matting rule obtained based on sample image training;
the matting module is used for matting out a target material in the original image by adopting the matting mask;
and the synthesis module is used for synthesizing the target material and the original template to be synthesized to obtain a target image.
In a fourth aspect, there is provided an advertisement material synthesizing apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a first trigger operation corresponding to an image uploading entrance in a target application program, the target application program being an application program with an advertisement material processing function;
the second acquisition module is used for acquiring the uploaded original image according to the first trigger operation;
the opening module is used for opening the image matting function;
the matting module is used for carrying out automatic matting on the original image to obtain a target material in the original image, wherein the target material is at least one element of plants, animals and still life;
And the synthesis module is used for synthesizing the target material and the advertisement template to be synthesized to obtain a target advertisement image.
In a fifth aspect, a terminal is provided, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the image synthesis method as provided in the first aspect.
In a sixth aspect, there is provided a terminal comprising a processor and a memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method of synthesizing advertising material as provided by the second aspect.
In a seventh aspect, there is provided a computer readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the image synthesis method as provided in the first aspect.
In an eighth aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the method of synthesizing advertising material as provided by the second aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
an original image to be matted is obtained, saliency detection is performed on the original image to generate a saliency map, a matting mask is calculated from the saliency map by using a depth matting model, a target material in the original image is matted out by using the matting mask, and the target material is synthesized with an original template to be synthesized to obtain a target image. The terminal can thereby matte the original image automatically by means of saliency detection and the depth matting model, which avoids the repeated trimap calibration the related art requires of the user, simplifies the matting operation, realizes a fully automatic process from matting the original image to synthesizing the matted target material with the original template, and improves the production efficiency of image synthesis.
Drawings
Fig. 1 is a schematic diagram of a calibrated original image related to the related art;
FIG. 2 is a schematic diagram of an image synthesis system according to an embodiment of the present invention;
FIG. 3 is a flow chart of an image synthesis method provided by an embodiment of the invention;
FIG. 4 is a flow chart of an image synthesis method provided by another embodiment of the invention;
FIG. 5 is a schematic diagram of an interface involved in an image synthesis method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an interface involved in an image synthesis method according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of an interface involved in an image synthesis method according to another embodiment of the present invention;
FIG. 8 is a schematic diagram of a filter processing procedure involved in an image synthesis method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an image synthesis method provided by an embodiment of the invention;
FIG. 10 is a schematic diagram illustrating a saliency detection algorithm involved in an image synthesis method according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a depth matting model training process involved in an image synthesis method according to an embodiment of the present invention;
FIG. 12 is a flow diagram of a method for synthesizing advertising material according to one embodiment of the present invention;
Fig. 13 is a schematic structural diagram of an image synthesizing apparatus according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of an advertisement material synthesizing apparatus provided by an embodiment of the present invention;
fig. 15 is a block diagram of a terminal according to an exemplary embodiment of the present invention;
fig. 16 is a schematic structural diagram of a server according to an exemplary embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
First, some terms related to the embodiments of the present invention are explained:
saliency detection in the embodiment of the invention, saliency detection is based on Visual saliency detection, also called Visual saliency detection (English). Visual saliency detection is a technology for predicting a Visual Attention area of human eyes by simulating a human Visual Attention Mechanism (VA) through an intelligent algorithm. The visual attention mechanism is that when facing a scene, a human automatically processes the interested area and selectively ignores the uninteresting area, wherein the interested area of the human is called as a salient area.
Matting technology: an image processing technology that separates the target material in an original image to be matted from the other elements of the image.
Trimap (English: trimap): an image obtained by roughly dividing a digital image into a foreground region, a background region and an unknown region.
Super-resolution processing: a technique for reconstructing a corresponding high-resolution digital image from an observed low-resolution digital image, used to improve the resolution of a digital image by hardware or software methods.
Depth matting model: a mathematical model for determining a matting mask from input data.
Optionally, the depth matting models include, but are not limited to: at least one of a Convolutional Neural Network (CNN) model, a Deep Neural Network (DNN) model, a Recurrent Neural Network (RNN) model, an embedding (embedding) model, a Gradient Boosting Decision Tree (GBDT) model, and a Logistic Regression (LR) model.
The CNN model is a feedforward neural network, generally consisting of two or more convolutional layers and a top fully connected layer. Optionally, the convolutional neural network model further comprises an associated weight and a pooling layer. In practical applications, the CNN model may be trained using a back propagation algorithm.
The DNN model is a deep learning framework. The DNN model includes an input layer, at least one hidden layer (or intermediate layer), and an output layer. Optionally, the input layer, the at least one hidden layer (or intermediate layer), and the output layer each include at least one neuron for processing the received data. Alternatively, the number of neurons between different layers may be the same; alternatively, it may be different.
The RNN model is a neural network with a feedback structure. In the RNN model, the output of a neuron can be directly applied to itself at the next time stamp, i.e., the input of the i-th layer neuron at time m includes its own output at time (m-1) in addition to the output of the (i-1) layer neuron at that time.
The embedding model represents entities and relationships as distributed vectors, treating the relationship in each triplet instance as a translation from the entity head to the entity tail. A triplet instance comprises a subject, a relation and an object, and can be expressed as (subject, relation, object); the subject is the entity head and the object is the entity tail. For example: "Xiao Zhang's dad is Da Zhang" is represented by the triplet instance (Xiao Zhang, dad, Da Zhang).
The GBDT model is an iterative decision tree algorithm that consists of a number of decision trees, with the results of all trees summed up as the final result. Taking age prediction as an example, each node of a decision tree yields a predicted value, which is the average age of all samples belonging to that node.
The LR model is a model built by applying a logistic function on the basis of linear regression.
In the related art, the α values of most pixels in a digital image are determined through the user's manual calibration. Because the user can hardly specify the trimap required by the closed-form matting algorithm accurately, obtaining an accurate matting result requires the user to repeatedly re-calibrate the trimap for the next matting pass according to the current matting result, and an accurate matting result is obtained only after digital matting is performed many times. Therefore, the embodiments of the present invention provide an image synthesis method, an advertisement material synthesis method and an advertisement material synthesis device. The method comprises: obtaining an original image to be matted, performing saliency detection on the original image to generate a saliency map, calculating a matting mask from the saliency map by using a depth matting model, matting out a target material in the original image by using the matting mask, and synthesizing the target material with an original template to be synthesized to obtain a target image. The terminal can thereby matte the original image automatically by means of saliency detection and the depth matting model, which avoids the repeated trimap calibration the related art requires of the user, simplifies the matting operation, realizes a fully automatic process from matting the original image to synthesizing the matted target material with the original template to obtain the target image, and improves the production efficiency of image synthesis.
Referring to fig. 2, a schematic structural diagram of an image synthesis system according to an embodiment of the invention is shown. The image composition system includes a user terminal 220, a server cluster 230, and a designer terminal 240.
The user terminal 220 may be a mobile phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop portable computer, a desktop computer, and the like.
The user terminal 220 has a target application running therein, and the target application is an application having a matting function and an image synthesis function. Optionally, the target application is a recommendation information generation application. The recommendation information is information with recommendation value, such as advertisement information, multimedia information or consultation information. For example, the targeted application is an advertising application.
The user terminal 220 and the server cluster 230 are connected through a communication network. Optionally, the communication network is a wired network or a wireless network.
The server cluster 230 is a server, or several servers, or a virtualization platform, or a cloud computing service center.
The server cluster 230 includes an external access stratum, a service stratum, a data stratum, and an internal access stratum.
Optionally, a first access server 231 and a nginnx reverse proxy server 232 are deployed in the outer access stratum. The first access server 231 is configured to receive an original image uploaded by the user terminal 220.
A Hypertext Preprocessor (PHP) 7+ nginx server cluster 233, a Go server cluster 234, and a python server cluster 235 are deployed in the business layer. The python server cluster 235 has stored therein a TensorFlow software library. The TensorFlow software library is an open source software library which adopts data flow graphs (English: data flow graphs) and is used for numerical calculation. Optionally, the tensrflow software library in the python server cluster 235 is used to perform automatic matting on the received original image to obtain a target material in the original image.
A Common Database (CDB) cluster 236a, a REDIS database cluster 236b, a File Storage (CFS) server cluster 236c and an ElasticSearch server cluster 236d are deployed in the data layer, and a second access server 237 is deployed in the internal access layer.
The designer terminal 240 may be a mobile phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop computer, a desktop computer, and the like.
The designer terminal 240 is connected to the server cluster 230 via a communication network. Optionally, the communication network is a wired network or a wireless network.
Designer terminal 240 comprises an information delivery device. The information delivery device in the designer terminal 240 is configured to make an original template to be synthesized, and deliver the made original template to the user terminal 220. When the original template is an advertisement template, the information delivery device in the designer terminal 240 is an advertisement template delivery device.
Referring to fig. 3, a flowchart of an image synthesis method according to an embodiment of the present invention is shown. The present embodiment is exemplified by applying the image synthesizing method to the image synthesizing system shown in fig. 2. The image synthesis method comprises the following steps:
step 301, obtaining an original image to be subjected to matting.
The original image is a frame of digital image. In general, an original image is an image including a background region and a foreground region. For example, the original image is a static advertisement image.
Optionally, the raw image is a digital image using Red Green Blue (RGB) color standards. The original image includes M × N pixels, and each pixel is represented by three RGB color components. It should be noted that the embodiments of the present invention are also applicable to black and white images or images of other color standards, and are not limited thereto.
The original image comprises N image elements, N being a positive integer. Wherein the image elements are constituent elements of the original image. Optionally, the image element includes: picture elements and/or text elements.
Step 302, performing saliency detection on the original image to generate a saliency map.
After the user terminal acquires the original image, it performs saliency detection on the original image and generates the saliency map corresponding to the original image. The saliency map is the image output after saliency detection is performed on the original image to be matted. The value of each pixel in the saliency map indicates the predicted degree of human visual attention to that pixel.
It should be noted that the process in which the user terminal performs saliency detection on the original image to generate the saliency map is detailed in the following embodiments and is not described here.
And 303, calculating to obtain a cutout mask by adopting a depth cutout model according to the saliency map, wherein the depth cutout model is used for expressing a cutout rule obtained based on sample image training.
And the user terminal acquires the depth cutout model and calculates to obtain a cutout mask by adopting the depth cutout model according to the saliency map. Optionally, the depth matting model is a model obtained by training a neural network by using a sample image.
The user terminal obtains a self-stored depth sectional image model, or obtains the depth sectional image model from the server cluster. This embodiment does not limit this.
The matting mask is used to indicate the transparency values of the individual pixels of the original image. Optionally, the matting mask is a transparency value matrix of the original image, that is, an α matrix, and the transparency value matrix is a matrix for matting the input image.
And step 304, adopting a matting mask to matte out the target material in the original image.
The user terminal uses the matting mask to matte out the target material in the original image, wherein the target material is at least one image element of the N image elements constituting the original image.
It should be noted that the process in which the user terminal mattes out the target material in the original image by using the matting mask is detailed in the following embodiments and is not described here.
And 305, synthesizing the target material and the original template to be synthesized to obtain a target image.
The user terminal stores a plurality of original templates, and when the user terminal detects a selection operation corresponding to an original template, the target material obtained by matting and the original template are synthesized to obtain a target image. Wherein the target image comprises the target material.
It should be noted that the process in which the user terminal synthesizes the target material with the original template to be synthesized to obtain the target image is detailed in the following embodiments and is not described here.
It should be noted that steps 301 to 304 can be separately implemented as an automatic matting method used to matte out the target material in the original image; step 305 is an image synthesis method that synthesizes the target material matted out in steps 301 to 304 with the original template to be synthesized to obtain the target image.
Alternatively, the steps 301 to 305 are usually performed by a user terminal or a server cluster having an image processing function. This embodiment is not limited in this regard. For convenience of description, the present embodiment is described with the execution subject as the user terminal.
In summary, in the embodiment of the present invention, an original image to be matted is obtained, saliency detection is performed on the original image to generate a saliency map, a matting mask is calculated from the saliency map by using a depth matting model, the target material in the original image is matted out by using the matting mask, and the target material is synthesized with an original template to be synthesized to obtain a target image. The user terminal can matte the original image automatically by means of saliency detection and the depth matting model, which avoids the repeated trimap calibration the related art requires of the user, simplifies the matting operation, realizes a fully automatic process from matting the original image to synthesizing the matted target material with the original template to obtain the target image, and improves the production efficiency of image synthesis.
Referring to fig. 4, a flowchart of an image synthesis method according to another embodiment of the invention is shown. The present embodiment is exemplified in that the image synthesizing method is applied to the image synthesizing system shown in fig. 2.
The image synthesis method comprises the following steps:
step 401, obtaining an original template to be synthesized and an original image to be matted.
Optionally, when the user terminal receives a second trigger operation corresponding to the template selection entry in the target application program, the M candidate templates stored in the user terminal are displayed, and when the user terminal receives a third trigger operation corresponding to one candidate template of the M candidate templates, the candidate template is determined as the original template to be synthesized, and the original template to be synthesized is acquired and displayed. Wherein M is a positive integer.
Optionally, the template selection entry is an operable control for displaying the M candidate templates. Illustratively, the type of template selection entry includes at least one of a button, a manipulable entry, and a slider.
In one illustrative example, as shown in FIG. 5, the user interface is an advertising authoring interface 51 for authoring "XX" for the target application. The ad production interface 51 includes a template selection entry 52. When the user terminal receives a click operation corresponding to the template selection entry 52, M candidate templates stored in the user terminal are displayed in thumbnail form on the left side of the advertisement production interface 51. When the user terminal receives a click operation corresponding to one candidate template 53 among the M candidate templates, the candidate template 53 is determined as an original template to be synthesized, and the original template 53 to be synthesized is displayed in the form of a tile on the right side of the advertisement production interface 51.
Optionally, after the original template selected by the user is obtained, when a first trigger operation corresponding to an image upload entry in the target application program is obtained, the user terminal obtains an original image to be subjected to matting and starts a matting function.
Optionally, the image upload entry is an operable control for uploading an original image to be matted. Illustratively, the type of the image upload entry includes at least one of a button, a manipulable item, and a slider.
The trigger operation (e.g., the first trigger operation, the second trigger operation, or the third trigger operation) includes any one or a combination of a click operation, a slide operation, a press operation, and a long press operation.
In other possible implementations, the above-mentioned trigger operation may also be implemented in voice form. Taking the first trigger operation as an example, the user inputs a voice signal into the user terminal; after acquiring the voice signal, the user terminal analyzes it to obtain the voice content, and when the voice content contains a keyword matching the preset information of the image upload entry, the user terminal determines that the image upload entry is triggered.
In an illustrative example, the advertisement production interface 51 based on the target application program "XX production" provided in fig. 5, as shown in fig. 6, further includes a portal 62 for uploading an image in the advertisement production interface 51. When the user terminal acquires a click operation corresponding to the image upload entry 62, the uploaded original image 64 is acquired, and the matting function is started.
Step 402, performing saliency detection on the original image to generate a saliency map.
After the user terminal starts the matting function, saliency detection is performed on the obtained original image to generate a saliency map. It should be noted that the saliency detection process is detailed in the following embodiments and is not described here.
And step 403, performing edge detection on the saliency map to obtain a corresponding trimap, wherein the trimap comprises a foreground region, a background region and an unknown region of the saliency map.
Optionally, the user terminal separates the foreground region and the background region by using spatial filtering and a threshold segmentation algorithm, and obtains the trimap by combining morphological operations.
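The patent does not disclose a concrete implementation of this step; the following sketch, assuming OpenCV and a saliency map normalized to [0, 1], illustrates one plausible threshold-plus-morphology pipeline of the kind described (the threshold and band values are assumptions):

    import cv2
    import numpy as np

    def trimap_from_saliency(saliency, lo=0.1, hi=0.9, band=10):
        # Spatial filtering, then threshold segmentation into confident
        # foreground/background, then erosion so the boundary becomes unknown.
        s = cv2.GaussianBlur(saliency.astype(np.float32), (5, 5), 0)
        fg = (s >= hi).astype(np.uint8)
        bg = (s <= lo).astype(np.uint8)
        kernel = np.ones((band, band), np.uint8)   # morphological operation
        fg = cv2.erode(fg, kernel)
        bg = cv2.erode(bg, kernel)
        trimap = np.full(s.shape, 128, np.uint8)   # 128 marks the unknown region
        trimap[fg == 1] = 255                      # foreground region
        trimap[bg == 1] = 0                        # background region
        return trimap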
And step 404, calculating to obtain a matting mask of the original image by adopting a depth matting model according to the original image and the trimap image.
Optionally, the user terminal obtains the depth matting model, inputs the original image and the trimap image into the depth matting model, and calculates to obtain a matting mask of the original image.
Wherein, the depth matting model is obtained by training according to at least one group of sample data groups, and each group of sample data groups comprises: a sample image, a sample trimap, and a pre-labeled correct matting mask.
It should be noted that, the training process of the depth matting model can refer to the related description in the following embodiments, which are not introduced here.
Optionally, the user terminal adds the original image, the corresponding trimap and the matting mask to the training sample set to obtain an updated training sample set, and trains the depth matting model according to the updated training sample set to obtain an updated depth matting model.
Wherein the process of training the depth matting model according to the updated training sample set to obtain the updated depth matting model may refer to the training process of the depth matting model and is not repeated herein.
And 405, using the matting mask to matte out the target material in the original image.
Optionally, for each pixel of the original image adopting the RGB color standard, the user terminal multiplies the brightness value of each color component thereof by the transparency value at the corresponding position indicated by the matting mask to obtain a matting result of the original image, and separates out the target material therefrom.
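A minimal sketch of this per-pixel multiplication (an illustrative stand-in, not the patent's code); it returns an RGBA cut-out whose color channels are premultiplied by the matting mask:

    import numpy as np

    def extract_material(image_rgb, alpha):
        # image_rgb: uint8 array (H, W, 3); alpha: float array (H, W) in [0, 1]
        rgb = image_rgb.astype(np.float32) * alpha[..., np.newaxis]  # premultiply
        a = (alpha * 255.0).astype(np.float32)     # alpha as a fourth channel
        return np.dstack([rgb, a]).astype(np.uint8)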
With continued reference to fig. 6, based on the original image 64 provided in fig. 6, after the user terminal starts the matting function, the user terminal extracts the target material 66 in the original image 64 according to the matting method, where the target material 66 is a picture material of a person.
Step 406, obtaining the pre-labeled main body elements in the original template.
Optionally, the original template is a structurally layered template. The element level of the original template comprises a main body layer and other levels, and the other levels comprise at least one of a background layer, a decoration layer, a text layer and an interaction layer. The main body layer in the original template corresponds to a main body element, and the main body element is a pre-marked element which can be replaced in the original template.
Optionally, the obtaining, by the user terminal, the pre-labeled main element in the original template includes: after the user terminal obtains the element hierarchy of the original template, the main body element in the main body layer of the original template is obtained.
Optionally, step 406 and step 401 may be executed in parallel, and step 406 may also be executed before step 401, which is not limited in this embodiment.
And step 407, replacing the main elements in the original template according to the target material to obtain a candidate image.
Optionally, step 407 is executed after steps 405 and 406, and after the user terminal extracts the target material in the original image, the target material is stored in the database, and the main elements in the original template are replaced according to the original template selected by the user, that is, the target material is automatically placed in the original template.
Optionally, the user terminal replaces the main elements in the original template with the target materials, which includes but is not limited to the following two possible implementation manners:
in a first possible implementation manner, the terminal replaces the main elements in the original template with the target material to obtain a candidate image after replacement.
In a second possible implementation manner, the terminal performs scaling processing on the target material to obtain a scaled target material, and an absolute value of a difference between the scaled target material and the size of the main element is smaller than a preset threshold; and replacing the main elements in the original template by the zoomed target material to obtain a candidate image.
Optionally, the user terminal compares the first size of the target material with the second size of the main element, and if the first size of the target material is smaller than the second size of the main element, performs amplification processing on the target material; if the first size of the target material is larger than the second size of the main body element, the target material is subjected to reduction processing.
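The comparison-and-scaling logic can be sketched as follows; the pixel tolerance and the aspect-ratio handling are assumptions, since the patent only requires the size difference to fall below a preset threshold:

    import cv2

    def fit_material_to_slot(material, slot_w, slot_h, tol=8):
        # Scale the cut-out until its size differs from the body element's
        # slot by less than the preset threshold tol (in pixels).
        h, w = material.shape[:2]
        if abs(w - slot_w) < tol and abs(h - slot_h) < tol:
            return material                        # already close enough
        scale = min(slot_w / w, slot_h / h)        # keep the aspect ratio
        interp = cv2.INTER_CUBIC if scale > 1 else cv2.INTER_AREA
        new_size = (max(1, round(w * scale)), max(1, round(h * scale)))
        return cv2.resize(material, new_size, interpolation=interp)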
And step 408, performing post-processing on the candidate image to generate a target image.
Wherein the post-processing comprises super-resolution processing and/or style filter processing.
In an illustrative example, based on the original template 53 provided in fig. 5 and the target material 66 provided in fig. 6, as shown in fig. 7, the user terminal acquires the body element 72 labeled in advance in the original template 53, replaces the body element 72 in the original template 53 with the target material 66 to obtain a candidate image, and then performs post-processing on the candidate image to generate a target image 74.
Optionally, if the target material in the candidate image is subjected to the amplification processing, the resolution of the target material may be reduced, and in order to improve the resolution of the amplified target material, the amplified target material is subjected to super-resolution processing, so that the situation that the target material is blurred after being amplified is alleviated to a certain extent.
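As one possible realization of this super-resolution step (an assumption, not the patent's prescription), OpenCV's contrib dnn_superres module can upsample the enlarged material with a pre-trained model; the model file name below is a hypothetical local path:

    import cv2

    enlarged_material = cv2.imread("material.png")   # hypothetical input image
    sr = cv2.dnn_superres.DnnSuperResImpl_create()   # requires opencv-contrib-python
    sr.readModel("EDSR_x4.pb")                       # hypothetical pre-trained model
    sr.setModel("edsr", 4)                           # 4x upscaling
    upscaled = sr.upsample(enlarged_material)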
Optionally, the user terminal performs style filter processing on the candidate image by using a three-dimensional look-up table (3D LUT) algorithm to obtain a target image.
Optionally, the style of the target image includes one of a vivid style, a lightness style, a fresh style, an morning style, a warm style, and a high-cool style.
Optionally, before the user terminal performs the style filter processing on the candidate image by using the 3D LUT algorithm, the designer terminal needs to design an LUT template. Referring to fig. 8, the style filter design process includes: and the designer terminal performs visual design, color matching, LUT template derivation and LUT template uploading, so as to obtain the designed LUT template. The style filter processing process comprises the following steps: and the user terminal acquires the candidate image, acquires the designed LUT template, inputs the candidate image into the LUT template by adopting a 3D LUT algorithm, and outputs the candidate image to obtain a target image.
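A simplified sketch of applying a 3D LUT to an RGB image; it uses nearest-neighbour lookup for brevity (real style filters typically interpolate trilinearly), and the table layout is an assumption:

    import numpy as np

    def apply_3d_lut(image_rgb, lut):
        # lut: an (S, S, S, 3) table mapping quantized RGB to output RGB
        size = lut.shape[0]
        idx = np.rint(image_rgb.astype(np.float32) / 255.0 * (size - 1)).astype(int)
        return lut[idx[..., 0], idx[..., 1], idx[..., 2]]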
In an illustrative example, as shown in fig. 9, the image synthesis method provided by the embodiment of the present invention includes, but is not limited to, three steps: 1. Acquiring the target material: the user uploads the shot picture 91 to an advertisement production application program of the user terminal; correspondingly, the user terminal activates the matting function and automatically mattes the uploaded picture 91 to obtain the target material 92 in the picture 91. 2. Disassembling and replacing the original template: the user terminal acquires the main body element 94 in the main body layer of the original template 93, and replaces the main body element 94 with the target material 92. 3. Post-processing the candidate image to obtain the target image: the user terminal performs filter processing and the like on the candidate image to obtain the target image 95.
In summary, the embodiment of the present invention further obtains the pre-labeled main elements in the original template, replaces the main elements in the original template with the target material to obtain the candidate image, and generates the target image according to the candidate image; the situation that a user needs to manually synthesize the images in the related technology is avoided, the automation of image synthesis is realized, and the production efficiency of image synthesis is further improved.
It should be noted that, in the embodiment of the present invention, the visual attention mechanism of the saliency detection may adopt a bottom-up visual attention model, and the basic structure of the visual attention model is shown in fig. 10.
For an original image, the visual attention model performs linear filtering on the original image to extract primary visual features: color (English: RGBY), brightness and direction. Center-surround operations at multiple scales generate a plurality of feature maps indicating saliency measures; the feature maps are combined across scales and normalized to obtain single-feature-dimension saliency maps; the single-feature-dimension saliency maps are linearly combined to obtain a combined saliency map; the most salient attention area in the original image is obtained by using the biological winner-take-all (English: winner-take-all) competition mechanism; and finally the shift of the attention focus is completed by an inhibition-of-return (English: inhibition of return) method to obtain the final saliency map.
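A condensed, intensity-only sketch of this pipeline (an illustration, not the patent's code); the full model adds the RGBY color and orientation channels, winner-take-all competition and inhibition of return:

    import cv2
    import numpy as np

    def itti_style_saliency(image_bgr, levels=5):
        # Center-surround differences over a Gaussian pyramid, normalized
        # and combined across scales into a single saliency map.
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
        pyramid = [gray]
        for _ in range(levels):
            pyramid.append(cv2.pyrDown(pyramid[-1]))   # linear filtering per scale
        h, w = gray.shape
        saliency = np.zeros((h, w), np.float32)
        for c in (0, 1):                               # "center" scales
            for delta in (2, 3):                       # "surround" offsets
                center = cv2.resize(pyramid[c], (w, h))
                surround = cv2.resize(pyramid[c + delta], (w, h))
                fmap = np.abs(center - surround)       # one feature map
                fmap /= fmap.max() + 1e-8              # normalization
                saliency += fmap                       # cross-scale combination
        return saliency / (saliency.max() + 1e-8)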
Another point to be explained is that before the user terminal obtains the depth matting model, the depth matting model needs to be trained. Optionally, the training process of the depth matting model includes: acquiring a training sample set, wherein the training sample set comprises at least one group of sample data groups; and training an original parameter model by adopting an error back propagation algorithm according to the at least one group of sample data groups to obtain the depth matting model.
Each set of sample data sets includes: sample images, sample trimap images, and pre-labeled correct matting masks.
The user terminal trains the original parameter model by adopting an error back propagation algorithm according to the at least one group of sample data groups to obtain the depth matting model, including but not limited to the following steps:
1. and for each group of sample data groups in the at least one group of sample data groups, inputting the sample images and the sample trimap images into the original parameter model to obtain a training result.
Optionally, the original parametric model is built according to a neural network model, such as: the original parametric model is built according to a DNN model or an RNN model.
Illustratively, for each group of sample data groups, the user terminal creates an input-output pair corresponding to the group, where the input parameters are the sample image and the sample trimap in the group, and the output parameter is the correct matting mask in the group; the user terminal then inputs the input parameters into the prediction model to obtain a training result.
For example, the sample data group includes a sample image A1, a sample trimap A2 and a pre-labeled correct matting mask X1, and the input-output pair created by the user terminal is: (sample image A1, sample trimap A2) -> (correct matting mask X1), where (sample image A1, sample trimap A2) is the input parameter and (correct matting mask X1) is the output parameter.
Alternatively, the input-output pairs are represented by feature vectors.
2. Comparing the training result with the correct matting mask to obtain a calculation loss, wherein the calculation loss is used for indicating the error between the training result and the correct matting mask.
Optionally, the calculation loss is expressed by cross-entropy (English: cross-entropy).
Optionally, the user terminal calculates the calculation loss H(p, q) by the following formula:
H(p, q) = -Σ_x p(x)·log q(x)
where p(x) and q(x) are discrete distribution vectors of equal length; p(x) represents the training result, q(x) represents the output parameter, and x is a vector in the training result or the output parameter (a small numerical check of this formula is given after step 3 below).
3. And training by adopting an error back propagation algorithm according to the respective corresponding calculation loss of at least one group of sample data groups to obtain a depth matting model.
Optionally, the user terminal determines a gradient direction of the depth matting model according to the computation loss through a back propagation algorithm, and updates the model parameters in the depth matting model layer by layer from an output layer of the depth matting model.
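For concreteness, the numerical check of the cross-entropy formula promised above (values illustrative only):

    import numpy as np

    def cross_entropy(p, q, eps=1e-12):
        # H(p, q) = -sum_x p(x) * log q(x), for equal-length distributions
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return -np.sum(p * np.log(q + eps))

    print(cross_entropy([0.7, 0.3], [0.6, 0.4]))  # ~0.63: distributions close, small loss
    print(cross_entropy([0.7, 0.3], [0.1, 0.9]))  # ~1.64: distributions far apart, larger loss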
Illustratively, as shown in fig. 11, the process of the user terminal training the depth matting model includes: the user terminal obtains a training sample set, the training sample set comprising at least one group of sample data groups, each group comprising a sample image, a sample trimap and a pre-labeled correct matting mask. For each group of sample data groups, the user terminal inputs the sample image and the sample trimap into the original parameter model, outputs a training result, compares the training result with the correct matting mask to obtain a calculation loss, and trains with an error back propagation algorithm according to the calculation losses corresponding to the at least one group of sample data groups to obtain the depth matting model. After the trained depth matting model is obtained, the user terminal stores it. When the user terminal starts the matting function, it acquires the original image and the corresponding trimap, acquires the trained depth matting model, inputs the original image and the corresponding trimap into the depth matting model, outputs the matting mask of the original image, and uses the matting mask to matte out the target material in the original image.
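A minimal training sketch, assuming TensorFlow/Keras (the patent's server side stores a TensorFlow library but does not disclose model code); the tiny convolutional network, the 4-channel input layout and the per-pixel binary cross-entropy loss are assumptions standing in for the original parameter model and the loss H(p, q):

    import tensorflow as tf

    def build_matting_model():
        # 4-channel input: RGB sample image stacked with its normalized trimap
        inp = tf.keras.Input(shape=(None, None, 4))
        x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
        x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
        alpha = tf.keras.layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
        return tf.keras.Model(inp, alpha)

    model = build_matting_model()
    # Per-pixel cross-entropy between the predicted alpha map and the
    # pre-labeled correct matting mask; the optimizer realizes the
    # error back-propagation update.
    model.compile(optimizer="adam", loss="binary_crossentropy")
    # images: (N, H, W, 3), trimaps: (N, H, W, 1), masks: (N, H, W, 1), all in [0, 1]
    # model.fit(tf.concat([images, trimaps], axis=-1), masks, epochs=10)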
When the image synthesizing method is applied to the field of advertisement material production, the image synthesizing method is also referred to as an advertisement material synthesizing method. In the method for synthesizing the advertisement material, the target application program is an application program with an advertisement material processing function, the target material in the original image can be at least one element of plants, animals and still life, the original template to be synthesized is an advertisement template, and the synthesized target image is a target advertisement image.
Optionally, the method for synthesizing the advertisement material includes, but is not limited to, the following steps, as shown in fig. 12:
step 1201, a first trigger operation corresponding to an image uploading entry in a target application program is obtained, and the target application program is an application program with an advertisement material processing function.
Optionally, when the user terminal receives a click operation corresponding to the template selection entry in the advertisement material manufacturing application program, the stored M candidate advertisement templates are displayed, and when the user terminal receives a click operation corresponding to one candidate advertisement template of the M candidate advertisement templates, the candidate advertisement template is determined as the advertisement template to be synthesized, and the advertisement template is obtained and displayed.
Optionally, after obtaining the advertisement template selected by the user, the user terminal detects whether a click operation corresponding to the image upload entry exists, and if so, obtains a click operation corresponding to the image upload entry.
Step 1202, acquiring the uploaded original image according to a first trigger operation.
Optionally, the terminal obtains an original image to be subjected to matting according to a click operation corresponding to the image uploading entry.
Step 1203, starting a matting function.
Optionally, after the terminal obtains the original image to be subjected to matting, the matting function of the target application program is started.
And 1204, performing automatic matting on the original image to obtain a target material in the original image, wherein the target material is at least one element of plants, animals and still life.
The user terminal performs saliency detection on the original image to generate a saliency map, and calculates a matting mask from the saliency map by using a depth matting model, the depth matting model representing a matting rule obtained by training on sample images; the target material in the original image is matted out by using the matting mask.
It should be noted that the process in which the user terminal obtains the matting mask by using saliency detection and the depth matting model may refer to the relevant details in the above embodiments and is not repeated here.
And step 1205, synthesizing the target material and the advertisement template to be synthesized to obtain a target advertisement image.
A user terminal acquires a pre-marked main element in an advertisement template; replacing main elements in the advertisement template according to the target material to obtain candidate advertisement images; and performing post-processing on the candidate advertisement images to generate target advertisement images.
Optionally, the user terminal performs scaling processing on the target material to obtain a scaled target material, and an absolute value of a difference between the scaled target material and the size of the main element is smaller than a preset threshold; and replacing the main body elements in the advertisement template by the zoomed target material to obtain a candidate advertisement image.
Optionally, the user terminal performs style filter processing on the candidate advertisement image by using a three-dimensional lookup table algorithm to obtain a target advertisement image.
It should be noted that the advertisement material synthesizing method can be similar to the details related to the image synthesizing method provided by the above method embodiment, and will not be described herein again.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 13, a schematic structural diagram of an image combining apparatus according to an embodiment of the invention is shown. The image synthesizing apparatus may be implemented by a dedicated hardware circuit, or a combination of hardware and software, as all or a part of an image synthesizing system, the image synthesizing apparatus including: an acquisition module 1310, a generation module 1320, a computation module 1330, a matting module 1340, and a composition module 1350.
An obtaining module 1310, configured to implement step 301 and/or step 401 described above.
A generating module 1320, configured to implement step 302 and/or step 402 described above.
A calculating module 1330 configured to implement the step 303.
A matting module 1340 for implementing the step 304 and/or the step 405 described above.
A synthesis module 1350, configured to implement the step 305.
Optionally, the calculating module 1330 includes: a detection unit and a calculation unit.
A detection unit, configured to implement step 403 described above.
A computing unit configured to implement the step 404.
Optionally, the computing unit is further configured to obtain a depth matting model, where the depth matting model is obtained by training according to at least one group of sample data groups, and each group of sample data groups comprises: a sample image, a sample trimap and a pre-labeled correct matting mask; and to input the original image and the trimap into the depth matting model and calculate the matting mask of the original image.
Optionally, the computing unit is further configured to obtain a training sample set, where the training sample set includes at least one group of sample data groups; and training the original parameter model by adopting an error back propagation algorithm according to at least one group of sample data set to obtain a depth sectional drawing model.
Optionally, the calculating unit is further configured to, for each sample data group of the at least one group of sample data groups, input the sample image and the sample trimap into the original parameter model to obtain a training result. And comparing the training result with the correct matting mask to obtain a calculation loss, wherein the calculation loss is used for indicating the error between the training result and the correct matting mask. And training by adopting an error back propagation algorithm to obtain a depth sectional drawing model according to the respective corresponding calculation loss of at least one group of sample data sets.
Optionally, the obtaining module 1310 includes: a first acquisition unit and a second acquisition unit.
A first acquisition unit, configured to acquire a first trigger operation corresponding to an image upload entry in a target application program, where the target application program is an application program with a matting function.
A second acquisition unit, configured to acquire, according to the first trigger operation, the original image to be matted, and to start the matting function.
Optionally, the synthesizing module 1350 includes: a third acquisition unit, a replacement unit and a generation unit.
The third acquisition unit is configured to acquire the pre-marked main element in the original template.
The replacement unit is configured to replace the main element in the original template with the target material to obtain a candidate image.
The generation unit is configured to post-process the candidate image to generate the target image.
Optionally, the replacement unit is further configured to scale the target material to obtain a scaled target material, where the absolute value of the difference between the size of the scaled target material and the size of the main element is smaller than a preset threshold, and to replace the main element in the original template with the scaled target material to obtain the candidate image.
Optionally, the generation unit is further configured to perform style-filter processing on the candidate image by using a three-dimensional lookup table algorithm to obtain the target image.
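Putting the replacement and scaling steps together, a Pillow sketch of how the pre-marked main element might be swapped for the matted material; the bounding-box format `(x, y, w, h)` and the centering policy are assumptions, and `scale_to_subject` is the scaling sketch shown earlier.

```python
# Hedged sketch: replace the template's main element with the RGBA material.
from PIL import Image

def replace_main_element(template, material_rgba, box):
    """box: assumed (x, y, w, h) annotation of the main element."""
    x, y, w, h = box
    candidate = template.copy()
    scaled = scale_to_subject(material_rgba, (w, h))  # earlier sketch
    # Center the scaled material inside the main element's bounding box.
    ox = x + (w - scaled.width) // 2
    oy = y + (h - scaled.height) // 2
    candidate.paste(scaled, (ox, oy), mask=scaled)    # alpha-aware paste
    return candidate
```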
For further details, reference may be made to the method embodiments described above with reference to fig. 3 to 12. The obtaining module 1310 is further configured to implement any other implicit or disclosed function related to the obtaining steps in the above method embodiments; the generating module 1320 is further configured to implement any other implicit or disclosed function related to the generating steps in the above method embodiments; the calculating module 1330 is further configured to implement any other implicit or disclosed function related to the calculating steps in the above method embodiments; the matting module 1340 is further configured to implement any other implicit or disclosed function related to the matting steps in the above method embodiments; and the synthesizing module 1350 is further configured to implement any other implicit or disclosed function related to the synthesizing steps in the above method embodiments.
Referring to fig. 14, a schematic structural diagram of an advertisement material synthesizing apparatus according to an embodiment of the present invention is shown. The advertisement material synthesizing apparatus may be implemented, by a dedicated hardware circuit or a combination of hardware and software, as all or part of an image synthesis system, and includes: a first obtaining module 1410, a second obtaining module 1420, a starting module 1430, a matting module 1440, and a synthesizing module 1450.
A first obtaining module 1410, configured to implement step 1201 described above.
A second obtaining module 1420, configured to implement step 1202 described above.
A starting module 1430, configured to implement step 1203 described above.
A matting module 1440, configured to implement step 1204 described above.
A synthesizing module 1450, configured to implement step 1205 described above.
Optionally, the matting module 1440 includes: a first generation unit, a calculation unit, and a matting unit (a combined code sketch follows the unit descriptions below).
The first generation unit is configured to perform saliency detection on the original image to generate a saliency map;
the calculation unit is configured to calculate a matting mask from the saliency map by using a deep matting model, where the deep matting model represents matting rules obtained by training on sample images;
and the matting unit is configured to matte out the target material in the original image by using the matting mask.
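The sketch below, assuming OpenCV, strings the three units together: the saliency map is thresholded into confident foreground and background, a morphological band around the boundary becomes the unknown region of a trimap (matching the claim language further below), and the predicted mask is then applied to cut out the target material. The thresholds and kernel size are illustrative assumptions.

```python
# Hedged pipeline sketch: saliency map -> trimap -> matted RGBA cut-out.
import cv2
import numpy as np

def trimap_from_saliency(saliency, lo=64, hi=192, ksize=15):
    """saliency: HxW uint8. Returns a trimap: 0 = background,
    128 = unknown region, 255 = foreground."""
    _, fg = cv2.threshold(saliency, hi, 255, cv2.THRESH_BINARY)
    _, bg = cv2.threshold(saliency, lo, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (ksize, ksize))
    sure_fg = cv2.erode(fg, kernel)    # shrink to confident foreground
    maybe_fg = cv2.dilate(bg, kernel)  # grow to cover uncertain edges
    trimap = np.full_like(saliency, 128)  # everything starts unknown
    trimap[maybe_fg == 0] = 0             # confident background
    trimap[sure_fg == 255] = 255          # confident foreground
    return trimap

def matte_out(image_bgr, alpha):
    """Apply the matting mask: returns an RGBA cut-out of the material."""
    rgba = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2BGRA)
    rgba[..., 3] = alpha  # HxW uint8 mask predicted by the matting model
    return rgba
```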
Optionally, the synthesizing module 1450 includes: an acquisition unit, a replacement unit, and a second generation unit.
The acquisition unit is configured to acquire the pre-marked main element in the advertisement template;
the replacement unit is configured to replace the main element in the advertisement template with the target material to obtain a candidate advertisement image;
and the second generation unit is configured to post-process the candidate advertisement image to generate the target advertisement image.
Optionally, the replacement unit is further configured to scale the target material to obtain a scaled target material, where the absolute value of the difference between the size of the scaled target material and the size of the main element is smaller than a preset threshold, and to replace the main element in the advertisement template with the scaled target material to obtain the candidate advertisement image.
Optionally, the second generation unit is further configured to perform style-filter processing on the candidate advertisement image by using a three-dimensional lookup table algorithm to obtain the target advertisement image.
For further details, reference may be made to the method embodiments described above with reference to fig. 3 to 12. The first obtaining module 1410 and the second obtaining module 1420 are further configured to implement any other implicit or disclosed function related to the obtaining steps in the above method embodiments; the starting module 1430 is further configured to implement any other implicit or disclosed function related to the starting step in the above method embodiments; the matting module 1440 is further configured to implement any other implicit or disclosed function related to the matting steps in the above method embodiments; and the synthesizing module 1450 is further configured to implement any other implicit or disclosed function related to the synthesizing steps in the above method embodiments.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, the division of each functional module is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiments, which are not described herein again.
An embodiment of the present invention provides an image synthesis system, which includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the image synthesis method or the advertisement material synthesis method provided in the foregoing method embodiments.
Fig. 15 shows a block diagram of a terminal 1500 according to an exemplary embodiment of the present invention. The terminal 1500 may be the user terminal 220 in the image composition system provided in fig. 2, and may also be the designer terminal 240.
In general, terminal 1500 includes: a processor 1501 and a memory 1502.
Processor 1501 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1501 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). Processor 1501 may also include a main processor and a coprocessor, where the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, the processor 1501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, processor 1501 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 1502 may include one or more computer-readable storage media, which may be non-transitory. The memory 1502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1502 is used to store at least one instruction for execution by the processor 1501 to implement the image composition methods or advertisement material composition methods provided by the various embodiments described above.
In some embodiments, the terminal 1500 may further optionally include: a peripheral device interface 1503 and at least one peripheral device. The processor 1501, memory 1502, and peripheral interface 1503 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 1503 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of a radio frequency circuit 1504, a touch screen display 1505, a camera 1506, an audio circuit 1507, and a power supply 1509.
The peripheral device interface 1503 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1501 and the memory 1502. In some embodiments, the processor 1501, memory 1502, and peripheral interface 1503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1501, the memory 1502, and the peripheral device interface 1503 may be implemented on separate chips or circuit boards, which is not limited by the present embodiment.
The radio frequency circuitry 1504 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuitry 1504 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1504 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1504 can communicate with other terminals via at least one wireless communication protocol, over networks including, but not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1504 may also include NFC (Near Field Communication) related circuits, which is not limited in the present invention.
The display 1505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 1505 is a touch display screen, the display 1505 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 1501 as a control signal for processing. In this case, the display 1505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 1505, disposed on the front panel of terminal 1500; in other embodiments, there may be at least two displays 1505, each disposed on a different surface of terminal 1500 or in a folded design; in still other embodiments, the display 1505 may be a flexible display disposed on a curved surface or a folded surface of terminal 1500. The display 1505 may even be configured in a non-rectangular irregular pattern, i.e., an irregularly shaped screen. The display 1505 may be an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like.
The camera assembly 1506 is used to capture images or video. Optionally, the camera assembly 1506 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fusion shooting functions. In some embodiments, the camera assembly 1506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuitry 1507 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, convert them into electrical signals, and input them to the processor 1501 for processing or to the radio frequency circuit 1504 for voice communication. There may be multiple microphones, disposed at different portions of the terminal 1500, for stereo sound collection or noise reduction. The microphone may also be an array microphone or an omnidirectional acquisition microphone. The speaker is used to convert electrical signals from the processor 1501 or the radio frequency circuit 1504 into sound waves. The speaker may be a traditional film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert an electrical signal not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuitry 1507 may also include a headphone jack.
A power supply 1509 is used to supply power to the various components in terminal 1500. The power supply 1509 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power supply 1509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery can also be used to support fast charge technology.
In some embodiments, the terminal 1500 also includes one or more sensors 1510. The one or more sensors 1510 include, but are not limited to: acceleration sensor 1511, gyro sensor 1512, pressure sensor 1513, optical sensor 1515, and proximity sensor 1516.
The acceleration sensor 1511 may detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 1500. For example, the acceleration sensor 1511 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1501 may control the touch screen display 1505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1511. The acceleration sensor 1511 may also be used for acquisition of motion data of a game or a user.
The gyroscope sensor 1512 may detect a body direction and a rotation angle of the terminal 1500, and the gyroscope sensor 1512 and the acceleration sensor 1511 may cooperate to collect a 3D motion of the user on the terminal 1500. The processor 1501, based on the data collected by the gyroscope sensor 1512, may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization while shooting, game control, and inertial navigation.
Pressure sensor 1513 may be disposed on a side bezel of terminal 1500 and/or underneath touch display 1505. When the pressure sensor 1513 is disposed on the side frame of the terminal 1500, the holding signal of the user to the terminal 1500 may be detected, and the processor 1501 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1513. When the pressure sensor 1513 is disposed at the lower layer of the touch display 1505, the processor 1501 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 1505. The operability control comprises at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 1515 is used to collect ambient light intensity. In one embodiment, processor 1501 may control the brightness of the display on touch screen 1505 based on the ambient light intensity collected by optical sensor 1515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1505 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1505 is turned down. In another embodiment, the processor 1501 may also dynamically adjust the shooting parameters of the camera assembly 1506 based on the ambient light intensity collected by the optical sensor 1515.
A proximity sensor 1516, also known as a distance sensor, is typically disposed on the front panel of the terminal 1500. The proximity sensor 1516 is used to collect the distance between the user and the front surface of the terminal 1500. In one embodiment, when the proximity sensor 1516 detects that the distance between the user and the front surface of the terminal 1500 gradually decreases, the processor 1501 controls the touch display 1505 to switch from the bright-screen state to the off-screen state; when the proximity sensor 1516 detects that the distance between the user and the front surface of the terminal 1500 gradually increases, the processor 1501 controls the touch display 1505 to switch from the off-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 15 is not limiting of terminal 1500 and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components may be used.
Referring to fig. 16, a schematic structural diagram of a server 1600 according to an exemplary embodiment of the present invention is shown. The server 1600 may be any one of the servers 230 in the image composition system shown in fig. 2. In particular: the server 1600 includes a central processing unit (CPU) 1601, a system memory 1604 including a random access memory (RAM) 1602 and a read-only memory (ROM) 1603, and a system bus 1605 connecting the system memory 1604 and the central processing unit 1601. The server 1600 also includes a basic input/output system (I/O system) 1606, which facilitates the transfer of information between devices within the computer, and a mass storage device 1607 for storing an operating system 1613, application programs 1614, and other program modules 1615.
The basic input/output system 1606 includes a display 1608 for displaying information and an input device 1609 such as a mouse, keyboard, etc. for a user to input information. Wherein the display 1608 and the input device 1609 are both connected to the central processing unit 1601 by way of an input-output controller 1610 which is connected to the system bus 1605. The basic input/output system 1606 may also include an input/output controller 1610 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, an input-output controller 1610 may also provide output to a display screen, a printer, or other type of output device.
The mass storage device 1607 is connected to the central processing unit 1601 through a mass storage controller (not shown) connected to the system bus 1605. The mass storage device 1607 and its associated computer-readable media provide non-volatile storage for the server 1600. That is, the mass storage device 1607 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 1604 and the mass storage device 1607 described above may be collectively referred to as memory.
According to various embodiments of the present invention, the server 1600 may also operate by being connected, through a network such as the Internet, to a remote computer on the network. That is, the server 1600 may be connected to the network 1612 through a network interface unit 1611 coupled to the system bus 1605, or the network interface unit 1611 may be used to connect to other types of networks or remote computer systems (not shown).
Optionally, the memory has at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by the processor to implement the steps executed by the server in the image synthesis method or the advertisement material synthesis method provided by the above-mentioned embodiments of the method.
The serial numbers of the above embodiments of the present invention are merely for description and do not indicate any preference among the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the methods of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (8)

1. An image synthesis method, characterized in that the method comprises:
in response to receiving a second trigger operation corresponding to a template selection entry in a target application program, displaying at least one candidate advertisement template, wherein the target application program is an application program with a matting function;
in response to receiving a third trigger operation on any one of the at least one candidate advertisement template, determining the selected candidate advertisement template as an advertisement template to be synthesized, and acquiring and displaying the advertisement template to be synthesized;
acquiring a first trigger operation corresponding to an image uploading entrance in the target application program;
acquiring an original image to be subjected to matting according to the first trigger operation, and starting the matting function;
inputting the original image into a visual attention model, and performing saliency detection to generate a saliency map of the original image;
separating a foreground region and a background region in the saliency map by using spatial-domain filtering and a threshold segmentation algorithm, and obtaining a trimap corresponding to the saliency map through morphological operations, wherein the trimap comprises the foreground region, the background region, and an unknown region of the saliency map;
calculating a matting mask of the original image by using a deep matting model according to the original image and the trimap, wherein the deep matting model represents matting rules obtained by training on sample images;
matting out a target material in the original image by using the matting mask;
acquiring a pre-marked main element in the advertisement template to be synthesized;
replacing the main element with the target material to obtain a candidate advertisement image after replacement;
and performing post-processing on the candidate advertisement image to generate a target advertisement image.
2. The method according to claim 1, wherein the calculating a matting mask of the original image by using a deep matting model according to the original image and the trimap comprises:
obtaining the deep matting model, wherein the deep matting model is obtained by training on at least one group of sample data, and each group of sample data comprises: a sample image, a sample trimap, and a pre-labeled correct matting mask;
and inputting the original image and the trimap into the deep matting model to calculate the matting mask of the original image.
3. The method of claim 2, wherein the obtaining the deep matting model comprises:
acquiring a training sample set, wherein the training sample set comprises the at least one group of sample data;
and training an original parameter model with an error back-propagation algorithm on the at least one group of sample data to obtain the deep matting model.
4. The method of claim 3, wherein the training an original parameter model with an error back-propagation algorithm on the at least one group of sample data to obtain the deep matting model comprises:
for each group of sample data in the at least one group of sample data, inputting the sample image and the sample trimap into the original parameter model to obtain a training result;
comparing the training result with the correct matting mask to obtain a calculation loss, wherein the calculation loss is used for indicating an error between the training result and the correct matting mask;
and training with the error back-propagation algorithm, according to the calculation loss corresponding to each group of sample data, to obtain the deep matting model.
5. The method according to any one of claims 1 to 4, wherein the replacing the main element in the advertisement template to be synthesized with the target material to obtain a candidate advertisement image after replacement comprises:
scaling the target material to obtain a scaled target material, wherein the absolute value of the difference between the size of the scaled target material and the size of the main element is smaller than a preset threshold;
and replacing the main element in the advertisement template to be synthesized with the scaled target material to obtain the replaced candidate advertisement image.
6. The method of any of claims 1 to 4, wherein the post-processing the candidate advertisement image to generate the target advertisement image comprises:
performing style-filter processing on the candidate advertisement image by using a three-dimensional lookup table algorithm to obtain the target advertisement image.
7. A terminal, characterized in that it comprises a processor and a memory in which at least one instruction, at least one program, set of codes or set of instructions is stored, which is loaded and executed by the processor to implement the image synthesis method according to any one of claims 1 to 6.
8. A computer readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the image synthesis method according to any one of claims 1 to 6.
CN201810146131.9A 2018-02-12 2018-02-12 Image synthesis method, advertisement material synthesis method and device Active CN110148102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810146131.9A CN110148102B (en) 2018-02-12 2018-02-12 Image synthesis method, advertisement material synthesis method and device

Publications (2)

Publication Number Publication Date
CN110148102A CN110148102A (en) 2019-08-20
CN110148102B true CN110148102B (en) 2022-07-15

Family

ID=67588140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810146131.9A Active CN110148102B (en) 2018-02-12 2018-02-12 Image synthesis method, advertisement material synthesis method and device

Country Status (1)

Country Link
CN (1) CN110148102B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784726A (en) * 2019-09-25 2020-10-16 北京沃东天骏信息技术有限公司 Image matting method and device
CN110675356B (en) * 2019-09-30 2022-02-22 中国科学院软件研究所 Embedded image synthesis method based on user intention inference
CN112700513A (en) * 2019-10-22 2021-04-23 阿里巴巴集团控股有限公司 Image processing method and device
CN110852942B (en) * 2019-11-19 2020-12-18 腾讯科技(深圳)有限公司 Model training method, and media information synthesis method and device
CN111161288B (en) * 2019-12-26 2023-04-14 郑州阿帕斯数云信息科技有限公司 Image processing method and device
CN111179159B (en) * 2019-12-31 2024-02-20 北京金山云网络技术有限公司 Method and device for eliminating target image in video, electronic equipment and storage medium
CN111369581B (en) * 2020-02-18 2023-08-08 Oppo广东移动通信有限公司 Image processing method, device, equipment and storage medium
CN111507889A (en) * 2020-04-13 2020-08-07 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113628105A (en) * 2020-05-07 2021-11-09 阿里巴巴集团控股有限公司 Image processing method, device, storage medium and processor
CN111640123B (en) * 2020-05-22 2023-08-11 北京百度网讯科技有限公司 Method, device, equipment and medium for generating background-free image
CN111724407A (en) * 2020-05-25 2020-09-29 北京市商汤科技开发有限公司 Image processing method and related product
CN113706372A (en) * 2020-06-30 2021-11-26 稿定(厦门)科技有限公司 Automatic cutout model establishing method and system
CN111857515B (en) * 2020-07-24 2024-03-19 深圳市欢太科技有限公司 Image processing method, device, storage medium and electronic equipment
CN113379665A (en) * 2021-06-28 2021-09-10 展讯通信(天津)有限公司 Matting correction method and apparatus
CN113743281A (en) * 2021-08-30 2021-12-03 上海明略人工智能(集团)有限公司 Program advertisement material identification method, system, computer device and storage medium
CN114253451A (en) * 2021-12-21 2022-03-29 咪咕音乐有限公司 Screenshot method and device, electronic equipment and storage medium
CN114615520B (en) * 2022-03-08 2024-01-02 北京达佳互联信息技术有限公司 Subtitle positioning method, subtitle positioning device, computer equipment and medium
CN115543161B (en) * 2022-11-04 2023-08-15 广东保伦电子股份有限公司 Image matting method and device suitable for whiteboard integrated machine

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714540A (en) * 2013-12-21 2014-04-09 浙江传媒学院 SVM-based transparency estimation method in digital image matting processing
CN105488784A (en) * 2015-11-23 2016-04-13 广州一刻影像科技有限公司 Automatic portrait matting method
CN105809666A (en) * 2014-12-30 2016-07-27 联芯科技有限公司 Image matting method and device
CN106355583A (en) * 2016-08-30 2017-01-25 成都丘钛微电子科技有限公司 Image processing method and device
CN107481261A (en) * 2017-07-31 2017-12-15 中国科学院长春光学精密机械与物理研究所 A kind of color video based on the tracking of depth prospect scratches drawing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deep Image Matting; Ning Xu et al.; Computer Vision and Pattern Recognition 2017; 2017-03-10; page 1, column 2, paragraph 2 to page 8, column 1, penultimate paragraph *
Research on Image Matting Algorithms Based on Optimized Global Sampling; Cheng Jun; 2014-06-15; full text *

Also Published As

Publication number Publication date
CN110148102A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN110148102B (en) Image synthesis method, advertisement material synthesis method and device
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN111739035B (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN110147805B (en) Image processing method, device, terminal and storage medium
CN109727303B (en) Video display method, system, computer equipment, storage medium and terminal
CN111541907B (en) Article display method, apparatus, device and storage medium
CN109670397A (en) Detection method, device, electronic equipment and the storage medium of skeleton key point
CN107820020A (en) Method of adjustment, device, storage medium and the mobile terminal of acquisition parameters
CN109543714A (en) Acquisition methods, device, electronic equipment and the storage medium of data characteristics
CN110135336B (en) Training method, device and storage medium for pedestrian generation model
CN107392933B (en) Image segmentation method and mobile terminal
CN110570460B (en) Target tracking method, device, computer equipment and computer readable storage medium
CN111062981A (en) Image processing method, device and storage medium
EP3905662A1 (en) Image processing method and apparatus, electronic device and storage medium
CN112036331A (en) Training method, device and equipment of living body detection model and storage medium
CN112749613B (en) Video data processing method, device, computer equipment and storage medium
CN108701355A (en) GPU optimizes and the skin possibility predication based on single Gauss online
CN111091166A (en) Image processing model training method, image processing device, and storage medium
CN111192262A (en) Product defect classification method, device, equipment and medium based on artificial intelligence
CN105574834B (en) Image processing method and device
CN113706440A (en) Image processing method, image processing device, computer equipment and storage medium
CN111950570A (en) Target image extraction method, neural network training method and device
CN114511864B (en) Text information extraction method, target model acquisition method, device and equipment
CN110807769B (en) Image display control method and device
CN110490389B (en) Click rate prediction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant