CN112785495A - Image processing model training method, image generation method, device and equipment


Info

Publication number
CN112785495A
Authority
CN
China
Prior art keywords: feature vector, image, attribute, adjustable, attribute feature
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202110113482.1A
Other languages: Chinese (zh)
Inventor: Zheng Ziqiang (郑自强)
Current Assignee: Uisee Technology Zhejiang Co Ltd
Original Assignee: Yushi Technology Nanjing Co Ltd
Application filed by Yushi Technology Nanjing Co Ltd
Priority to CN202110113482.1A
Priority to PCT/CN2021/074381 (WO2022160239A1)
Publication of CN112785495A

Classifications

    • G06T3/04
    • G06N3/045 Neural networks; architectures; combinations of networks
    • G06N3/08 Neural networks; learning methods

Abstract

The embodiments of the disclosure relate to an image processing model training method, an image generation method, a device, and equipment. The training method comprises the following steps: determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image; performing vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, and inputting the fused vector into a generator of a generative adversarial network to generate a target image; determining a second adjustable attribute feature vector and a second fixed attribute feature vector of the target object on the target image; determining an adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector; determining a fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector; and adjusting the network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree. The embodiments of the disclosure achieve continuous processing of target object attributes and state control of a specific attribute of the target object.

Description

Image processing model training method, image generation method, device and equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing model training method, an image generation method, an apparatus, and a device.
Background
Currently, image generation techniques are widely studied as an effective way to synthesize new images. Manipulating specific attributes of object instances in an image facilitates not only more controllable image editing but also many image understanding tasks, such as state estimation and recognition of object instances in images in the field of unmanned or autonomous driving.
Manipulating a specific attribute of an object instance in an image can be viewed as a transition of that attribute from one state to another; if each state is treated as a domain/type, attribute manipulation can be modeled as a domain/type conversion. Existing model-based attribute conversion schemes have two shortcomings. First, because of limits on model size and on the number of training samples per attribute state (for example, training only on sample images of object instances in a few specific attribute states), the trained model can process only a limited number of discrete states of the object instance attribute. Second, because the model fits only discrete attribute states, the attribute to be adjusted cannot be cleanly separated from the other attribute information during model processing, and therefore state control and image generation for a single specific attribute cannot be achieved.
Disclosure of Invention
In order to solve, or at least partially solve, the above technical problems, embodiments of the present disclosure provide an image processing model training method, an image generation method, an apparatus, and a device.
In a first aspect, an embodiment of the present disclosure provides an image processing model training method, where the image processing model includes a generative adversarial network. The method includes:
determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
performing vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the fused vector into a generator of the generative adversarial network, and generating a target image corresponding to the sample image;
determining a second adjustable attribute feature vector and a second fixed attribute feature vector of the target object on the target image;
determining an adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector;
determining a fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector;
and adjusting the network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree.
In a second aspect, an embodiment of the present disclosure further provides an image generation method. The image generation method is implemented based on a pre-trained image processing model obtained by any one of the image processing model training methods provided in the embodiments of the present disclosure; the image processing model includes a generative adversarial network. The image generation method includes:
determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on an image to be processed; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
and performing vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the fused vector into a generator of the generative adversarial network, and generating a target image corresponding to the image to be processed.
In a third aspect, an embodiment of the present disclosure further provides an image processing model training apparatus, where the image processing model includes a generative adversarial network. The apparatus includes:
the first feature vector determining module is used for determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
the target image generation module is used for performing vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the fused vector into a generator of the generative adversarial network, and generating a target image corresponding to the sample image;
the second feature vector determination module is used for determining a second adjustable attribute feature vector and a second fixed attribute feature vector of the target object on the target image;
an adjustable attribute difference degree determination module, configured to determine an adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector;
a fixed attribute difference degree determination module, configured to determine a fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector;
and the network parameter adjusting module is used for adjusting the network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree.
In a fourth aspect, an embodiment of the present disclosure further provides an image generating apparatus. The image generating apparatus is implemented based on a pre-trained image processing model obtained by any one of the image processing model training methods provided in the embodiments of the present disclosure; the image processing model includes a generative adversarial network. The image generating apparatus includes:
the characteristic vector determining module is used for determining a first adjustable attribute characteristic vector and a first fixed attribute characteristic vector of a target object on an image to be processed; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
and the target image generation module is used for performing vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the fused vector into a generator of the generative adversarial network, and generating a target image corresponding to the image to be processed.
In a fifth aspect, the present disclosure further provides an electronic device, including a memory and a processor, where the memory stores a computer program that, when executed by the processor, causes the electronic device to implement any one of the image processing model training methods or image generation methods provided in the embodiments of the present disclosure.
In a sixth aspect, the present disclosure further provides a computer-readable storage medium storing a computer program that, when executed by a computing device, causes the computing device to implement any one of the image processing model training methods or image generation methods provided in the embodiments of the present disclosure.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have at least the following advantages. In the embodiments of the present disclosure, the adjustable attribute and the fixed attribute of the target object are separated based on the attribute adjustment requirement of the target object on the sample image, and the first adjustable attribute feature vector of the target object is determined based on that requirement; the first adjustable attribute feature vector is then fused with the first fixed attribute feature vector of the target object, and the fused vector is input into the generator of the generative adversarial network in the image processing model to obtain a target image corresponding to the sample image; finally, the adjustable attribute difference degree and the fixed attribute difference degree are obtained from the obtained adjustable attribute feature vectors and fixed attribute feature vectors, and the network parameters of the image processing model are adjusted accordingly, where multiple cycles of parameter adjustment can be performed by repeating these operations until an image processing model meeting the image generation requirement is obtained. This solves the problems that existing models in the field of image processing can process only a limited number of discrete states of a target object attribute on an image and cannot effectively separate the adjustable attribute from the fixed attributes of the target object during processing, and therefore cannot achieve state control and image generation for a single specific attribute of the target object. An image processing model trained according to the technical solutions of the embodiments of the present disclosure can process target object attributes on an image continuously, is not limited to processing discrete attribute states, and can achieve state control and image generation for a specific attribute of the target object; for example, an image meeting the attribute adjustment requirement of the target object can be generated by controlling the state of a specific attribute of the target object.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; it will be obvious to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
FIG. 1 is a flowchart of an image processing model training method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a training architecture of an image processing model provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of another image processing model training method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a training architecture of another image processing model provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of another image processing model training method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a training architecture of another image processing model provided by an embodiment of the present disclosure;
FIG. 7 is a flowchart of another image processing model training method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a training architecture of another image processing model provided by an embodiment of the present disclosure;
FIG. 9 is a flowchart of an image generation method provided by an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an image processing model training apparatus provided by an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of an image generating apparatus provided by an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure; however, the present disclosure may also be practiced in ways other than those described herein. It should be understood that the embodiments described in the specification are only some, not all, of the embodiments of the present disclosure.
FIG. 1 is a flowchart of an image processing model training method provided by an embodiment of the present disclosure, which may be applied to scenarios in which model training is performed based on an attribute adjustment requirement of a target object on an image. The image processing model training method can be executed by an image processing model training apparatus, which can be implemented in software and/or hardware and integrated on any electronic device with computing capability, such as a terminal or a server. Further, the electronic device may be integrated into other devices with image processing requirements, such as an autonomous vehicle, an unmanned vehicle, or an unmanned aerial vehicle, and may be deployed according to actual conditions.
It should be understood that there is no strict limitation on the execution sequence among the operations shown in FIG. 1; the sequence shown in FIG. 1 should not be understood as a specific limitation to the embodiment of the present disclosure. The execution sequence among the operations may be adjusted according to the actual processing situation, and different operations may be executed in parallel or in series.
FIG. 2 is a schematic diagram of a training architecture of an image processing model provided by an embodiment of the present disclosure, which is used, together with FIG. 1, to exemplarily describe the embodiment of the present disclosure, but should not be construed as a specific limitation to it. As shown in FIG. 2, the image processing model may include a generative adversarial network (GAN), which comprises a generator and a discriminator. For the technical principles of the GAN itself, reference may be made to the prior art; the embodiments of the present disclosure are not specifically limited in this regard.
As shown in FIG. 1, an image processing model training method provided by an embodiment of the present disclosure may include:
s101, determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image; and determining the first adjustable attribute feature vector according to the attribute adjustment requirement of the target object.
In the model training process, the sample image may be any type of image, such as a person image, a building image, or a landscape image, and the target object may be the target photographic subject on the sample image, such as a person in the person image, a building in the building image, or a scene in the landscape image. The adjustable attribute of the target object on the sample image is an attribute whose state needs to be controlled (i.e., adjusted), and a fixed attribute is an attribute whose state needs to remain unchanged.
As shown in FIG. 2, taking the sample image as a person image as an example, the target object is the person on the sample image. The face orientation of the person may be used as the adjustable attribute, and the person's other attributes, such as skin color, hair style, clothing color, facial expression, and the background, may be used as fixed attributes. By controlling continuous state values of the face orientation (different state values correspond to different adjustable attribute feature vectors), any target image with a changed face orientation and unchanged fixed attributes can be generated.
In some embodiments, after the adjustable attribute and the fixed attribute of the target object on the sample image are determined, the first fixed attribute feature vector may be determined from the fixed attributes using any available image feature sampling (i.e., feature extraction) method, and the first adjustable attribute feature vector corresponding to the adjustable attribute may be generated according to the adjustment requirement for that attribute (for example, rotating the face orientation of the person 15 degrees to the left). For example, the correspondence between the state values of the adjustable attribute of the target object in the attribute space and the vector modulus values may be predetermined; the required state value is then determined according to the attribute adjustment requirement of the target object, the vector modulus value corresponding to that state value is determined, and finally the first adjustable attribute feature vector is generated from the determined vector modulus value using any available feature vector generation manner.
Continuing with FIG. 2 and taking a face image as an example, assume that the face orientation is the adjustable attribute of the target object, and that a correspondence between the face-orientation state values [0°, 90°] in the attribute space and the vector modulus values [0, 0.5] is predetermined. The initial state value of the face orientation on the face image is 0° (i.e., facing directly forward). If, according to the face orientation adjustment requirement of the target object, the currently required state value is 90° (e.g., rotated 90° toward the left shoulder, i.e., a profile view), the corresponding vector modulus value is 0.5, and the first adjustable attribute feature vector is generated based on that modulus value. It should be understood that when the adjustable attribute state takes any other value, the corresponding vector modulus value and first adjustable attribute feature vector can likewise be determined; that is, the embodiments of the present disclosure can achieve continuous control of the adjustable attribute state and are not limited to particular discretized attribute states.
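To make the state-value-to-modulus mapping above concrete, the following Python sketch maps a face-orientation angle to a vector modulus value and scales a unit direction accordingly. The ranges [0°, 90°] and [0, 0.5] follow the example above; the vector dimension, the fixed direction, and the helper names are illustrative assumptions, since in later embodiments this vector is produced by a decoder.

```python
import torch

# Hypothetical illustration of the state-value-to-modulus mapping. The ranges
# follow the example in the text; everything else is an assumption.
STATE_RANGE = (0.0, 90.0)     # face orientation state values, in degrees
MODULUS_RANGE = (0.0, 0.5)    # corresponding vector modulus values

def state_to_modulus(state_deg: float) -> float:
    """Linearly map an attribute state value to a vector modulus value."""
    t = (state_deg - STATE_RANGE[0]) / (STATE_RANGE[1] - STATE_RANGE[0])
    return MODULUS_RANGE[0] + t * (MODULUS_RANGE[1] - MODULUS_RANGE[0])

def make_adjustable_vector(state_deg: float, dim: int = 16) -> torch.Tensor:
    """Build a first adjustable attribute feature vector with the required modulus.

    A fixed unit direction is scaled to the target modulus; this direct
    construction is only a stand-in for the decoder of the later embodiments.
    """
    direction = torch.ones(dim) / dim ** 0.5       # unit-norm direction
    return state_to_modulus(state_deg) * direction

vec = make_adjustable_vector(90.0)                 # profile-view request
print(torch.linalg.vector_norm(vec))               # tensor(0.5000)
```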
S102, performing vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the fused vector into a generator of the generative adversarial network, and generating a target image corresponding to the sample image.
As shown in FIG. 2, after vector fusion (fuse) is performed on the first adjustable attribute feature vector and the first fixed attribute feature vector, the fused vector is input into the generator of the generative adversarial network to generate the target image. Meanwhile, the discriminator in the generative adversarial network can discriminate the target image generated by the generator against the sample image; that is, the generative adversarial network can use the sample image as real image data to achieve a self-supervised image generation effect.
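The following minimal sketch illustrates the fuse-then-generate step. Concatenation is used as the vector fusion operation and a toy MLP stands in for the generator; both choices, and all layer sizes, are assumptions for illustration rather than the architecture fixed by the embodiment.

```python
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """A stand-in generator: maps a fused vector to an image tensor."""
    def __init__(self, fused_dim: int, img_size: int = 64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size), nn.Tanh(),
        )

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        out = self.net(fused)
        return out.view(-1, 3, self.img_size, self.img_size)

adjustable = torch.randn(1, 16) * 0.1   # first adjustable attribute feature vector
fixed = torch.randn(1, 128)             # first fixed attribute feature vector
fused = torch.cat([adjustable, fixed], dim=1)   # vector fusion (fuse)
generator = ToyGenerator(fused_dim=fused.shape[1])
target_image = generator(fused)         # image handed to the discriminator
print(target_image.shape)               # torch.Size([1, 3, 64, 64])
```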
S103, determining a second adjustable attribute feature vector and a second fixed attribute feature vector of the target object on the target image.
For example, any available image feature extraction manner may be adopted to determine the second adjustable attribute feature vector and the second fixed attribute feature vector of the target object on the target image; the embodiments of the present disclosure are not specifically limited in this regard.
S104, determining the adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector.
The adjustable attribute difference degree may be used to measure the difference between the ideal adjustable attribute state value of the target object (i.e., the state value predetermined based on the attribute adjustment requirement of the target object) and the actual adjustable attribute state value of the target object on the target image, that is, the difference between the required effect and the actually generated effect. Any algorithm that can determine the difference between two vectors may be used to calculate the adjustable attribute difference degree.
In some embodiments, determining the adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector comprises: determining the modulus value of the second adjustable attribute feature vector, determining the difference or quotient between that modulus value and the modulus value corresponding to the first adjustable attribute feature vector, and taking the difference or quotient as the adjustable attribute difference degree. The modulus value corresponding to the first adjustable attribute feature vector is predetermined according to the attribute adjustment requirement of the target object, and the differences mentioned in the embodiments of the present disclosure may be taken as positive values.
A vector modulus value usually lies between 0 and 1; its specific calculation may be implemented with reference to the prior art and is not specifically limited in the embodiments of the present disclosure. It should be noted that the modulus calculation used when establishing the correspondence between the state values of the adjustable attribute in the attribute space and the vector modulus values should be the same as the modulus calculation used for the second adjustable attribute feature vector when determining the adjustable attribute difference degree, so that the calculation accuracy of the adjustable attribute difference degree can be guaranteed.
S105, determining the fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector.
The fixed attribute difference degree may be used to measure whether the fixed attributes of the target object change between the sample image and the target image, i.e., whether the fixed attributes change before and after the attribute adjustment. The fixed attribute difference degree can be obtained in the same calculation manner as the adjustable attribute difference degree.
Illustratively, determining the fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector comprises: determining the modulus values of the first fixed attribute feature vector and the second fixed attribute feature vector respectively, and taking the difference or quotient between the two modulus values as the fixed attribute difference degree.
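A small sketch of the two difference degree computations above, assuming the modulus is the 2-norm and taking differences as positive values, with the quotient variant included as the text allows; the function names are illustrative.

```python
import torch

def adjustable_difference(required_modulus: float,
                          second_adjustable: torch.Tensor,
                          use_quotient: bool = False) -> torch.Tensor:
    """Difference (or quotient) between the second vector's modulus and the
    modulus predetermined from the attribute adjustment requirement."""
    m2 = torch.linalg.vector_norm(second_adjustable)
    return m2 / required_modulus if use_quotient else torch.abs(m2 - required_modulus)

def fixed_difference(first_fixed: torch.Tensor,
                     second_fixed: torch.Tensor) -> torch.Tensor:
    """Positive difference between the modulus values of the two fixed vectors."""
    m1 = torch.linalg.vector_norm(first_fixed)
    m2 = torch.linalg.vector_norm(second_fixed)
    return torch.abs(m1 - m2)

print(adjustable_difference(0.5, torch.full((16,), 0.12)))  # tensor(0.0200)
print(fixed_difference(torch.randn(128), torch.randn(128)))
```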
S106, adjusting the network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree.
In the model training process, difference thresholds corresponding to the adjustable attribute difference degree and the fixed attribute difference degree can be preset according to the required model training precision (the values can be set flexibly). If both difference degrees are smaller than their corresponding thresholds, the difference between the ideal adjustable attribute state value and the actual adjustable attribute state value of the target object on the target image is within the allowable error range, the fixed attributes of the target object do not change between the sample image and the target image, the precision of the image currently output by the model meets the requirement, and the training of the model can end. If either difference degree is greater than or equal to its corresponding threshold, the ideal adjustable attribute state value differs too much from the actual adjustable attribute state value on the target image, or the fixed attributes of the target object have changed between the sample image and the target image; the precision of the currently output image does not meet the requirement, and the network parameters of the image processing model need to be adjusted according to the two difference degrees to optimize the model, repeating the model training operations until both difference degrees fall below their corresponding thresholds. The specific network parameters may be determined according to the concrete model structure and can be configured reasonably by those skilled in the art; the embodiments of the present disclosure are not specifically limited.
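A schematic rendering of the threshold logic above, under the assumptions that the two difference degrees are combined with equal weights and that the thresholds take the illustrative values below; the embodiment only requires that parameters be adjusted until both difference degrees fall below their thresholds.

```python
import torch

ADJ_THRESHOLD = 0.05   # illustrative threshold for the adjustable difference
FIX_THRESHOLD = 0.05   # illustrative threshold for the fixed difference

def training_loop(optimizer, difference_fn, max_steps: int = 10_000) -> None:
    """difference_fn() -> (adjustable_diff, fixed_diff) as scalar tensors."""
    for _ in range(max_steps):
        adj_diff, fix_diff = difference_fn()
        if adj_diff.item() < ADJ_THRESHOLD and fix_diff.item() < FIX_THRESHOLD:
            break                       # model precision meets the requirement
        loss = adj_diff + fix_diff      # equal weighting assumed
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Toy usage: drive two learnable scalars toward zero difference.
w = torch.nn.Parameter(torch.tensor([0.3, 0.4]))
opt = torch.optim.Adam([w], lr=0.01)
training_loop(opt, lambda: (w[0].abs(), w[1].abs()))
print(w)  # both entries end up below the thresholds
```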
In the embodiments of the present disclosure, the adjustable attribute and the fixed attribute of the target object are separated based on the attribute adjustment requirement of the target object on the sample image, and the first adjustable attribute feature vector of the target object is determined based on that requirement; the first adjustable attribute feature vector is then fused with the first fixed attribute feature vector of the target object, and the fused vector is input into the generator of the generative adversarial network in the image processing model to obtain a target image corresponding to the sample image; finally, the adjustable attribute difference degree and the fixed attribute difference degree are obtained from the obtained adjustable attribute feature vectors and fixed attribute feature vectors, and the network parameters of the image processing model are adjusted accordingly, where multiple cycles of parameter adjustment can be performed by repeating these operations until an image processing model meeting the image generation requirement is obtained. This solves the problems that existing models in the field of image processing can process only a limited number of discrete states of a target object attribute on an image and cannot effectively separate the adjustable attribute from the fixed attributes of the target object during processing, and therefore cannot achieve state control and image generation for a single specific attribute of the target object. An image processing model trained according to the technical solutions of the embodiments of the present disclosure can process target object attributes on an image continuously, is not limited to processing discrete attribute states, and can achieve state control and image generation for a specific attribute of the target object; for example, an image meeting the attribute adjustment requirement of the target object can be generated by controlling the state of a specific attribute of the target object.
FIG. 3 is a flowchart of another image processing model training method provided by an embodiment of the present disclosure, which is further optimized and expanded on the basis of the above technical solutions and can be combined with the above optional embodiments. It should be understood that there is no strict limitation on the execution sequence among the operations shown in FIG. 3; the sequence shown in FIG. 3 should not be understood as a specific limitation to the embodiment of the present disclosure. The execution sequence among the operations may be adjusted according to the actual processing situation, and different operations may be executed in parallel or in series.
FIG. 4 is a schematic diagram of a training architecture of another image processing model provided by an embodiment of the present disclosure, which takes the sample image as a person image as an example and is used, together with FIG. 3, to exemplarily describe the embodiment of the present disclosure, but should not be construed as a specific limitation to it. As shown in FIG. 4, the image processing model may include a generative adversarial network (comprising a generator and a discriminator), a capsule network, and a multi-branch estimator. The multi-branch estimator includes convolutional layers and a plurality of branch capsule networks respectively connected to the convolutional layers; the number of convolutional layers may be determined as needed and is not specifically limited in the embodiments of the present disclosure. Both the capsule network and the multi-branch estimator may be used to extract image features.
It should be noted that, in FIG. 4, in order to show the processing flow of the image data, the part of the image processing model containing the capsule network is drawn again on the right side of the target image. This does not mean that there are two image processing models in the embodiment of the present disclosure; the intent is to show clearly that both the second fixed attribute feature vector and the first fixed attribute feature vector of the target object are input into the capsule network for processing. In the image processing model, the number of capsule networks may be one or more according to the processing requirements of the feature vectors; for example, the image processing model may include two capsule networks for processing the second fixed attribute feature vector and the first fixed attribute feature vector respectively. The parameters of multiple capsule networks may be the same or different and may be determined according to the training requirements of the model.
As shown in FIG. 3, an image processing model training method provided by an embodiment of the present disclosure may include:
s301, determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image; and determining the first adjustable attribute feature vector according to the attribute adjustment requirement of the target object.
S302, after vector fusion is carried out on the first adjustable attribute feature vector and the first fixed attribute feature vector, the first adjustable attribute feature vector and the first fixed attribute feature vector are input into a generator in a generation countermeasure network, and a target image corresponding to the sample image is generated.
S303, inputting the target image and the image label corresponding to the target image into the multi-branch estimator to obtain a plurality of second adjustable attribute feature vectors of the target object on the target image, wherein the image label is used to mark the adjustable attribute.
The multi-branch estimator comprises convolutional layers and a plurality of branch capsule networks respectively connected to the convolutional layers. Each branch extracts features from the target image based on the corresponding image label, yielding a plurality of second adjustable attribute feature vectors of the target object on the target image. Using multi-branch estimation to extract features of the adjustable attribute allows different aspects of the attribute state on the target image to be attended to, and the feature extraction results of the branches are fused when computing the adjustable attribute difference degree, which helps improve the accuracy with which the adjustable attribute difference degree is determined.
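A minimal sketch of this multi-branch structure follows: shared convolutional layers feeding several branch heads. Plain linear heads stand in for the branch capsule networks, the image-label conditioning is omitted for brevity, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class MultiBranchEstimator(nn.Module):
    """Shared convolutional backbone with one head per branch."""
    def __init__(self, num_branches: int = 4, attr_dim: int = 16):
        super().__init__()
        self.backbone = nn.Sequential(            # shared convolutional layers
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.branches = nn.ModuleList(             # stand-ins for branch capsule nets
            [nn.Linear(64, attr_dim) for _ in range(num_branches)]
        )

    def forward(self, image: torch.Tensor) -> list[torch.Tensor]:
        feat = self.backbone(image)
        # each branch yields one second adjustable attribute feature vector
        return [branch(feat) for branch in self.branches]

est = MultiBranchEstimator()
vectors = est(torch.randn(1, 3, 64, 64))
print(len(vectors), vectors[0].shape)  # 4 torch.Size([1, 16])
```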
S304, determining a second fixed attribute feature vector of the target object on the target image.
S305, determining a vector difference value between the first adjustable attribute feature vector and each second adjustable attribute feature vector.
S306, determining the adjustable attribute difference degree based on the plurality of vector difference values.
In some embodiments, the difference or quotient between the modulus value of the first adjustable attribute feature vector and the modulus value of each second adjustable attribute feature vector may be determined as the vector difference value between the first adjustable attribute feature vector and that second adjustable attribute feature vector; the obtained vector difference values may then be averaged, with the average taken as the final adjustable attribute difference degree, or summed, with the sum taken as the final adjustable attribute difference degree, and so on. It should be appreciated that other methods of determining the adjustable attribute difference degree from the plurality of vector difference values may also be used.
Taking the difference (taken as a positive value) between the modulus value of the first adjustable attribute feature vector and the modulus value of each second adjustable attribute feature vector as the vector difference value, and summing the vector difference values to obtain the final adjustable attribute difference degree, the adjustable attribute detection loss function associated with the multi-branch estimator, denoted $\mathcal{L}_{adj}$, can be expressed as follows:

$$\mathcal{L}_{adj} = \sum_{d} \left\| \psi_d(I_t) - \alpha_t \right\|_1$$

where $\|\cdot\|_1$ denotes the 1-norm, $\psi_d(I_t)$ denotes the modulus value of the second adjustable attribute feature vector output by the d-th branch of the multi-branch estimator, $\alpha_t$ denotes the modulus value corresponding to the first adjustable attribute feature vector, and $I_t$ denotes the target image.
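The loss above, sketched in Python under the assumption that the branch outputs are the second adjustable attribute feature vectors and that the modulus is the 2-norm; random vectors stand in for actual estimator outputs.

```python
import torch

def adjustable_attr_loss(branch_vectors: list[torch.Tensor],
                         alpha_t: float) -> torch.Tensor:
    """Sum over branches of |psi_d(I_t) - alpha_t| (1-norm of a scalar)."""
    loss = torch.zeros(())
    for v in branch_vectors:
        psi_d = torch.linalg.vector_norm(v)      # modulus of this branch's output
        loss = loss + torch.abs(psi_d - alpha_t)
    return loss

branches = [torch.randn(16) * 0.1 for _ in range(4)]
print(adjustable_attr_loss(branches, alpha_t=0.5))
```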
The modulus value corresponding to the first adjustable attribute feature vector is predetermined according to the attribute adjustment requirement of the target object, and the modulus value of each second adjustable attribute feature vector can also be output directly by the multi-branch estimator; the specific calculation of a modulus value may be implemented with reference to the prior art and is not specifically limited in the embodiments of the present disclosure. For example, for a capsule network, the modulus value of a capsule vector may be compressed to within 1 by the following calculation:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$

where $v_j$ denotes the vector obtained after the modulus value of the capsule vector is compressed to within 1, $s_j$ denotes any one capsule vector, the factor $\frac{s_j}{\|s_j\|}$ preserves the original direction of the vector, and the factor $\frac{\|s_j\|^2}{1 + \|s_j\|^2}$ is a scalar less than 1.
S307, inputting the first fixed attribute feature vector and the second fixed attribute feature vector into the capsule network respectively, and determining the fixed attribute difference degree according to the output result of the capsule network.
The capsule network has a self-perception mechanism and supports stronger feature representation; its vector representation can model the multimodal distribution of image information and regress complex continuous signals, so more accurate image features can be extracted with a capsule network. As shown in FIG. 4, the first fixed attribute feature vector and the second fixed attribute feature vector are respectively input into the capsule network and converted into high-dimensional feature vectors, which helps improve the calculation accuracy of the fixed attribute difference degree.
The output result of the capsule network may be the high-dimensional capsule vectors corresponding to the first fixed attribute feature vector and the second fixed attribute feature vector, or the modulus values of those high-dimensional capsule vectors.
Taking the difference (taken as a positive value) between the modulus values of the two high-dimensional capsule vectors as the fixed attribute difference degree, the fixed attribute detection loss function associated with the capsule network, denoted $\mathcal{L}_{fix}$, can be expressed as follows:

$$\mathcal{L}_{fix} = \left\| \, \|\sigma(I_s)\|_2 - \|\sigma(I_t)\|_2 \, \right\|_1$$

where $\sigma(I_s)$ denotes the capsule vector obtained after the first fixed attribute feature vector of the target object on the sample image $I_s$ is input into the capsule network, $\sigma(I_t)$ denotes the capsule vector obtained after the second fixed attribute feature vector of the target object on the target image $I_t$ is input into the capsule network, $\|\cdot\|_1$ denotes the 1-norm, and $\|\cdot\|_2$ denotes the 2-norm (corresponding to the modulus value).
S308, adjusting the network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree.
Continuing with FIG. 4, in some embodiments, the generative adversarial network in the image processing model further includes a multi-branch discriminator (i.e., the discriminator in the generative adversarial network is implemented as a multi-branch discriminator). The multi-branch discriminator and the multi-branch estimator have the same number of branches and share convolutional layers (this structural relationship is not shown in FIG. 4); each branch of the multi-branch discriminator is used to discriminate the target image against the sample image. Considering that gradient vanishing and mode collapse often trouble the training of a generative adversarial network, the multi-branch discriminator allows the discriminator to focus on different image features, reducing gradient vanishing and mode collapse during training.
In the embodiments of the present disclosure, the calculation accuracy of the fixed attribute difference degree is ensured by inputting the first fixed attribute feature vector and the second fixed attribute feature vector of the target object into the capsule network and determining the fixed attribute difference degree from the capsule network's output; the determination accuracy of the adjustable attribute difference degree is ensured by inputting the target image and its image label into the multi-branch estimator to obtain a plurality of second adjustable attribute feature vectors of the target object on the target image and then combining them with the first adjustable attribute feature vector. Finally, the network parameters of the image processing model are adjusted based on the adjustable attribute difference degree and the fixed attribute difference degree, so that the trained image processing model can generate images that meet the attribute adjustment requirement of the target object. This solves the problems that existing models in the field of image processing can process only a limited number of discrete states of a target object attribute on an image and cannot effectively separate the adjustable attribute from the fixed attributes of the target object during processing, and therefore cannot achieve state control and image generation for a single specific attribute of the target object. An image processing model trained according to the technical solutions of the embodiments of the present disclosure can process target object attributes on an image continuously, is not limited to processing discrete attribute states, can achieve state control and image generation for a specific attribute of the target object, and improves the generation efficiency of the required images.
FIG. 5 is a flowchart of another image processing model training method provided by an embodiment of the present disclosure, which is further optimized and expanded on the basis of the above technical solutions and can be combined with the above optional embodiments. It should be understood that there is no strict limitation on the execution sequence among the operations shown in FIG. 5; the sequence shown in FIG. 5 should not be understood as a specific limitation to the embodiment of the present disclosure. The execution sequence among the operations may be adjusted according to the actual processing situation, and different operations may be executed in parallel or in series.
FIG. 6 is a schematic diagram of a training architecture of another image processing model provided by an embodiment of the present disclosure, which takes the sample image as a person image as an example and is used, together with FIG. 5, to exemplarily describe the embodiment of the present disclosure, but should not be construed as a specific limitation to it. As shown in FIG. 6, the image processing model may include a generative adversarial network, a capsule network, a multi-branch estimator, an encoder, and a decoder, wherein the generative adversarial network includes a generator and a multi-branch discriminator.
It should be noted that, in FIG. 6, in order to show the processing flow of the image data, the part of the image processing model containing the capsule network and the encoder is drawn again on the right side of the target image. This does not mean that there are two image processing models in the embodiment of the present disclosure; the intent is to show clearly that both the second fixed attribute feature vector and the first fixed attribute feature vector of the target object are input into the capsule network for processing, and that features can be extracted from both the target image and the sample image using the encoder. In the image processing model, the number of capsule networks may be one or more according to the processing requirements of the feature vectors; for example, the image processing model may include two capsule networks for processing the second fixed attribute feature vector and the first fixed attribute feature vector respectively. Similarly, the number of encoders may be one or more; for example, the image processing model may include two encoders for extracting features from the sample image and the target image respectively. The parameters of multiple capsule networks or multiple encoders may be the same or different and may be determined according to the training requirements of the model.
As shown in FIG. 5, an image processing model training method provided by an embodiment of the present disclosure may include:
s501, inputting the sample image into an encoder to obtain a first fixed attribute feature vector of the target object.
The encoder has the function of extracting image features; its specific implementation structure may follow the prior art and is not specifically limited in the embodiments of the present disclosure. Within the allowable range of image feature extraction error, the sample image can be input directly into the encoder, and the output vector of the encoder taken as the first fixed attribute feature vector of the target object.
S502, inputting a modulus value predetermined according to the attribute adjustment requirement of the target object into a decoder to obtain a first adjustable attribute feature vector of the target object.
The decoder has the function of generating a feature vector from a vector modulus value; its specific implementation structure may follow the prior art and is not specifically limited in the embodiments of the present disclosure. For example, the attribute state value of the current attribute adjustment requirement and its corresponding vector modulus value may be determined from the predetermined correspondence between the state values of the adjustable attribute of the target object in the attribute space and the vector modulus values, and the vector modulus value is then input into the decoder to obtain the first adjustable attribute feature vector of the target object.
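A minimal sketch of the decoder step, assuming a small MLP that maps the required modulus value to a direction which is then rescaled so the output modulus equals the requested value; the architecture and the explicit rescaling are assumptions, not the structure fixed by the embodiment.

```python
import torch
import torch.nn as nn

class ModulusDecoder(nn.Module):
    """Maps a requested modulus value to a feature vector with that modulus."""
    def __init__(self, attr_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                                 nn.Linear(64, attr_dim))

    def forward(self, modulus: torch.Tensor) -> torch.Tensor:
        raw = self.net(modulus.unsqueeze(-1))
        unit = raw / torch.linalg.vector_norm(raw, dim=-1, keepdim=True)
        return modulus.unsqueeze(-1) * unit  # output modulus == requested value

decoder = ModulusDecoder()
vec = decoder(torch.tensor([0.5]))            # vector for a 90-degree request
print(torch.linalg.vector_norm(vec, dim=-1))  # tensor([0.5000])
```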
S503, performing vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the fused vector into the generator of the generative adversarial network, and generating a target image corresponding to the sample image.
S504, inputting the target image and the image label corresponding to the target image into the multi-branch estimator to obtain a plurality of second adjustable attribute feature vectors of the target object on the target image, wherein the image label is used to mark the adjustable attribute.
S505, inputting the target image into the encoder to obtain a second fixed attribute feature vector of the target object.
S506, determining a vector difference value between the first adjustable attribute feature vector and each second adjustable attribute feature vector.
S507, determining the adjustable attribute difference degree based on the plurality of vector difference values.
S508, inputting the first fixed attribute feature vector and the second fixed attribute feature vector into the capsule network respectively, and determining the fixed attribute difference degree according to the output result of the capsule network.
S509, adjusting the network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree.
The technical solution of this embodiment likewise solves the problems that existing models in the field of image processing can process only a limited number of discrete states of a target object attribute on an image and cannot effectively separate the adjustable attribute from the fixed attributes of the target object during processing, and therefore cannot achieve state control and image generation for a single specific attribute of the target object. An image processing model trained according to this solution can process target object attributes on an image continuously, is not limited to processing discrete attribute states, can achieve state control and image generation for a specific attribute of the target object, and improves the generation efficiency of the required images.
FIG. 7 is a flowchart of another image processing model training method provided by an embodiment of the present disclosure, which is further optimized and expanded on the basis of the above technical solutions and can be combined with the above optional embodiments. It should be understood that there is no strict limitation on the execution sequence among the operations shown in FIG. 7; the sequence shown in FIG. 7 should not be understood as a specific limitation to the embodiment of the present disclosure. The execution sequence among the operations may be adjusted according to the actual processing situation, and different operations may be executed in parallel or in series.
FIG. 8 is a schematic diagram of a training architecture of another image processing model provided by an embodiment of the present disclosure, which takes the sample image as a person image as an example and is used, together with FIG. 7, to exemplarily describe the embodiment of the present disclosure, but should not be construed as a specific limitation to it. As shown in FIG. 8, the image processing model may include a generative adversarial network, a capsule network, a multi-branch estimator, an encoder, and a decoder, wherein the generative adversarial network includes a generator and a multi-branch discriminator.
It should be noted that, in FIG. 8, in order to show the processing flow of the image data, the image processing model comprising the generative adversarial network, the capsule network, the multi-branch estimator, the encoder, and the decoder is drawn again on the right side of the target image. This does not mean that there are two image processing models in the embodiment of the present disclosure; the intent is to show the data stream clearly. Similarly to FIG. 4 or FIG. 6, the number of generative adversarial networks, capsule networks, multi-branch estimators, encoders, or decoders in the image processing model may be one or more according to the processing requirements of the feature vectors or images; for example, it may be two. The parameters of modules with the same function in the image processing model may be the same or different and may be determined according to the training requirements of the model.
As shown in FIG. 7, an image processing model training method provided by an embodiment of the present disclosure may include:
s701, determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image; and determining the first adjustable attribute feature vector according to the attribute adjustment requirement of the target object.
And S702, after vector fusion is carried out on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the vector into a generator in the generation countermeasure network, and generating a target image corresponding to the sample image.
S703, determining a second adjustable attribute feature vector and a second fixed attribute feature vector of the target object on the target image.
S704, determining the adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector.
S705, determining the fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector.
S706, acquiring an initial adjustable attribute feature vector of the target object on the sample image.
The initial adjustable attribute feature vector of the target object on the sample image is the feature vector corresponding to the adjustable attribute state value of the target object on the sample image before the attribute is adjusted. As shown in FIG. 8, taking the sample image as a face image and the adjustable attribute as the face orientation as an example, the face orientation of the person is 0° before adjustment, and this attribute state value corresponds to the initial adjustable attribute feature vector.
As shown in fig. 8, in some embodiments, obtaining an initial adjustable attribute feature vector of a target object on a sample image comprises:
acquiring an initial module length value corresponding to the adjustable attribute of the target object;
and inputting the initial module length value into a decoder in the image processing model to obtain an initial adjustable attribute feature vector.
For example, an initial attribute state value of the adjustable attribute of the target object on the sample image and its corresponding initial module length value may be determined according to the corresponding relationship between the state value of the adjustable attribute of the target object and the vector module length value in the attribute space, and the initial module length value may then be input into the decoder to obtain the initial adjustable attribute feature vector.
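As a minimal sketch of this step (Python with PyTorch assumed), suppose a simple linear correspondence between the attribute state value and the vector module length; both the mapping and the Decoder below are illustrative placeholders rather than the networks of this disclosure:

    import torch
    import torch.nn as nn

    def state_to_module_length(state, lo=-90.0, hi=90.0):
        # Assumed linear mapping, e.g. a face orientation in
        # [-90, 90] degrees mapped to a module length in [0, 1].
        return (state - lo) / (hi - lo)

    class Decoder(nn.Module):
        # Minimal stand-in: expands a scalar module length into a
        # direction vector, then rescales so the output's norm equals
        # the requested module length. A real decoder would be learned.
        def __init__(self, dim=16):
            super().__init__()
            self.fc = nn.Linear(1, dim)

        def forward(self, m):                            # m: (B, 1)
            v = self.fc(m)
            v = v / (v.norm(dim=-1, keepdim=True) + 1e-8) # unit direction
            return v * m                                  # output norm equals m

    decoder = Decoder()
    m0 = torch.tensor([[state_to_module_length(0.0)]])    # initial state: 0 degrees
    initial_adjustable_vector = decoder(m0)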
Further, obtaining an initial module length value corresponding to the adjustable attribute of the target object includes:
inputting the sample image and an image label corresponding to the sample image into a multi-branch estimator in the image processing model, and determining the initial module length value corresponding to the adjustable attribute of the target object according to the output result of the multi-branch estimator, wherein the image label is used for marking the adjustable attribute.
For example, the initial module length value corresponding to the adjustable attribute of the target object may be output directly by the multi-branch estimator, or may be calculated from a plurality of initial vectors output by the multi-branch estimator; for instance, the module length values of the plurality of initial vectors may be averaged, and the resulting average used as the initial module length value.
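A brief sketch of the averaging variant (Python with PyTorch assumed; the branch count and vector dimension are placeholders, and the random tensors merely stand in for branch outputs):

    import torch

    def initial_module_length(branch_vectors):
        # Average the module length (norm) of each initial vector
        # output by the branches of the multi-branch estimator.
        return torch.stack([v.norm() for v in branch_vectors]).mean()

    branch_vectors = [torch.randn(16) for _ in range(4)]  # stand-in outputs
    m_init = initial_module_length(branch_vectors)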
S707, after vector fusion is performed on the initial adjustable attribute feature vector and the second fixed attribute feature vector, inputting the fused vector into a generator in the generation countermeasure network to generate a verification image corresponding to the target image.
That is, the verification image is an image generated again by restoring the state of the adjustable attribute of the target object on the basis of the generated target image, and may be used to verify the quality of the generated target image and whether the target image meets the attribute adjustment requirement of the target object.
S708, determining a third adjustable attribute feature vector of the target object on the verification image.
As shown in fig. 8, after vector fusion is performed on the initial adjustable attribute feature vector and the second fixed attribute feature vector, the fused vector is input into the generator in the generation countermeasure network to generate the verification image. Meanwhile, the verification image generated by the generator can be discriminated against the sample image; that is, the generation countermeasure network can use the sample image as real image data, achieving a self-supervised image generation effect.
After the verification image is obtained, a third adjustable attribute feature vector of the target object on the verification image can be determined by any available image feature extraction manner. For example, the verification image and an image label corresponding to the verification image may be input into the multi-branch estimator to obtain a plurality of third adjustable attribute feature vectors of the target object on the verification image, wherein the image label is used for marking the adjustable attribute.
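Steps S707 and S708 can be pictured together as the following sketch (Python with PyTorch assumed; generator and estimator stand for the pre-built modules above, and fusion by concatenation is an assumption, since the disclosure does not fix the fusion operator):

    import torch

    def fuse(adjustable_vector, fixed_vector):
        # Assumed fusion by concatenation; other operators
        # (e.g., element-wise addition) are equally possible.
        return torch.cat([adjustable_vector, fixed_vector], dim=-1)

    def verification_round(generator, estimator,
                           initial_adjustable_vector, second_fixed_vector,
                           image_label):
        z = fuse(initial_adjustable_vector, second_fixed_vector)
        verification_image = generator(z)                            # S707
        third_vectors = estimator(verification_image, image_label)   # S708
        return verification_image, third_vectors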
S709, determining an adjustable attribute verification value based on the initial adjustable attribute feature vector and the third adjustable attribute feature vector.
The adjustable attribute verification value can be used to measure the difference between the adjustable attribute state value of the target object on the verification image and the initial adjustable attribute state value of the target object on the sample image, and thus to evaluate the generation quality of the target image and the training effect of the image processing model. The smaller the adjustable attribute verification value is, the smaller this difference is, indicating that the sample image state can be successfully restored from the target image; the quality of the generated target image is therefore better, and the current image processing model has, with high probability, achieved a good training effect. Otherwise, the quality of the generated target image is poorer, and the image processing model still needs to be trained.
In some embodiments, a difference value or a quotient value between the initial module length value corresponding to the initial adjustable attribute feature vector and the module length value of the third adjustable attribute feature vector may be used as the adjustable attribute verification value. Alternatively, a vector difference value between the initial adjustable attribute feature vector and each third adjustable attribute feature vector may be determined, and the adjustable attribute verification value may be determined based on the plurality of vector difference values, for example, through a summation or averaging process.
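Both options can be sketched as follows (Python with PyTorch assumed; initial_vector and third_vectors denote the initial and third adjustable attribute feature vectors, and taking absolute values and norms is one concrete reading of the text, not mandated by it):

    import torch

    def adjustable_attribute_verification_value(initial_vector, third_vectors,
                                                mode="vector_diff"):
        if mode == "module_length_diff":
            # difference between the initial module length and each third
            # vector's module length, averaged over the branches
            diffs = [(initial_vector.norm() - v.norm()).abs()
                     for v in third_vectors]
        else:
            # norm of the per-branch vector difference, averaged
            diffs = [(initial_vector - v).norm() for v in third_vectors]
        return torch.stack(diffs).mean()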
S710, adjusting the network parameters of the image processing model based on the adjustable attribute difference degree, the fixed attribute difference degree, and the adjustable attribute verification value.
Specifically, in the model training process, the network parameters of the image processing model can be adjusted based on the adjustable attribute difference degree, the fixed attribute difference degree, and the adjustable attribute verification value simultaneously, so as to improve the model training precision. The network parameters are adjusted until the adjustable attribute difference degree, the fixed attribute difference degree, and the adjustable attribute verification value are all smaller than their corresponding thresholds (whose values can be set flexibly); the model is then considered to have achieved the training effect, and training is finished. Otherwise, the model training operation continues.
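For concreteness, such a stopping rule might be written as below; the three threshold values are assumed here for illustration and can be set flexibly:

    ADJ_DIFF_THRESH = 0.05   # adjustable attribute difference degree threshold
    FIX_DIFF_THRESH = 0.05   # fixed attribute difference degree threshold
    VERIFY_THRESH = 0.05     # adjustable attribute verification value threshold

    def training_finished(adj_diff, fix_diff, verify_value):
        # Training ends only when all three metrics fall below their
        # corresponding thresholds; otherwise training continues.
        return (adj_diff < ADJ_DIFF_THRESH
                and fix_diff < FIX_DIFF_THRESH
                and verify_value < VERIFY_THRESH)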
In the embodiment of the present disclosure, a target image may be generated based on the sample image and the attribute adjustment requirement of the target object, and a verification image may be generated based on the target image and the attribute state restoration requirement of the target object; combined with the extracted image feature vectors, the adjustable attribute difference degree, the fixed attribute difference degree, and the adjustable attribute verification value are obtained and used to adjust the network parameters of the image processing model, so that the training precision of the model and the effect of generating the required image can be improved.
Fig. 9 is a flowchart of an image generation method provided in an embodiment of the present disclosure, which is applicable to image generation scenarios. The image generation method may be performed by an image generation apparatus, which may be implemented in software and/or hardware, and may be integrated on any electronic device with computing capability, such as a terminal or a server. Further, the electronic device may be integrated into other devices having image processing requirements, such as an autonomous vehicle, an unmanned vehicle, or an unmanned aerial vehicle, and may be deployed according to actual conditions.
The image generation method provided by the embodiment of the present disclosure is implemented based on a pre-trained image processing model. The image processing model is obtained by any image processing model training method provided by the embodiments of the present disclosure and includes a generation countermeasure network; for details not described in the following embodiments, reference may be made to the descriptions in the above embodiments.
As shown in fig. 9, an image generation method provided by an embodiment of the present disclosure may include:
S901, determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on an image to be processed; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object.
The image to be processed may be any type of image, such as a person image, a building image, or a landscape image, and the target object may include the target photographic object on the image to be processed, such as a person on a person image, a building on a building image, or a scene on a landscape image. The adjustable attribute of the target object on the image to be processed refers to an attribute whose state can be controlled (or adjusted) in the image generation process, and the fixed attribute refers to an attribute whose state remains unchanged in the image generation process. Taking a person on a person image as an example, the face orientation of the person can be used as the adjustable attribute, and the other attributes of the person except the face orientation can be used as fixed attributes, such as skin color, hair style, clothes color, facial expression, and the background of the person.
In some embodiments, after the adjustable attribute and the fixed attribute of the target object on the image to be processed are determined according to the attribute adjustment requirement of the target object, a first fixed attribute feature vector of the target object on the image to be processed may be determined based on the fixed attribute using any available image feature extraction manner, and a first adjustable attribute feature vector corresponding to the adjustable attribute may be generated according to the adjustment requirement of the adjustable attribute. For example, the corresponding relationship between the state value of the adjustable attribute of the target object and the vector module length value in the attribute space may be predetermined; the required state value is then determined according to the attribute adjustment requirement of the target object, the vector module length value corresponding to that state value is determined, and the first adjustable attribute feature vector is finally generated from the determined vector module length value in any available feature vector generation manner.
In some embodiments, if the image processing model further comprises an encoder and a decoder, determining the first adjustable attribute feature vector and the first fixed attribute feature vector of the target object on the image to be processed may comprise: inputting the image to be processed into the encoder to obtain a first fixed attribute feature vector of the target object; and inputting a module length value predetermined according to the attribute adjustment requirement of the target object into the decoder to obtain a first adjustable attribute feature vector of the target object.
S902, after vector fusion is performed on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the fused vector into a generator in the generation countermeasure network to generate a target image corresponding to the image to be processed.
For the explanation of the image processing model, reference may be made to the above embodiments. With the image processing model, a target image meeting the attribute adjustment requirement can be generated rapidly based on the first adjustable attribute feature vector and the first fixed attribute feature vector of the target object on the image to be processed, thereby realizing effective state control of a specific attribute of the target object on the image and improving the generation efficiency of the required image.
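Putting S901 and S902 together, one possible end-to-end inference sketch is shown below (Python with PyTorch assumed; all modules are taken to be pre-trained, state_to_module_length is the assumed state-to-module-length correspondence discussed earlier, and fusion by concatenation is likewise an assumption):

    import torch

    @torch.no_grad()
    def generate_target_image(image, required_state, encoder, decoder,
                              generator, state_to_module_length):
        # image: tensor of shape (B, C, H, W), the image to be processed;
        # required_state: required adjustable-attribute state value,
        # e.g. a face orientation of 30.0 degrees (an assumed example).
        fixed_vector = encoder(image)                   # first fixed attribute feature vector
        m = torch.tensor([[state_to_module_length(required_state)]])
        adjustable_vector = decoder(m)                  # first adjustable attribute feature vector
        z = torch.cat([adjustable_vector, fixed_vector], dim=-1)  # assumed fusion
        return generator(z)                             # target image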
Fig. 10 is a schematic structural diagram of an image processing model training apparatus provided in an embodiment of the present disclosure, which may be implemented by software and/or hardware, and may be integrated on any electronic device with computing capability, such as a terminal or a server. Further, the electronic device may be integrated into other devices having image processing requirements, such as an autonomous vehicle, an unmanned vehicle, or an unmanned aerial vehicle, and may be deployed according to actual conditions.
In an embodiment of the present disclosure, the image processing model may include a generation countermeasure network, and as shown in fig. 10, the image processing model training apparatus 1000 provided in an embodiment of the present disclosure may include a first feature vector determination module 1001, a target image generation module 1002, a second feature vector determination module 1003, an adjustable attribute difference degree determining module 1004, a fixed attribute difference degree determining module 1005, and a network parameter adjustment module 1006, where:
a first feature vector determination module 1001, configured to determine a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
a target image generation module 1002, configured to perform vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, input the fused vector into a generator in the generation countermeasure network, and generate a target image corresponding to the sample image;
a second feature vector determination module 1003, configured to determine a second adjustable attribute feature vector and a second fixed attribute feature vector of the target object on the target image;
an adjustable attribute difference degree determining module 1004, configured to determine an adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector;
a fixed attribute difference degree determining module 1005, configured to determine a fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector;
and a network parameter adjustment module 1006, configured to adjust the network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree.
In some embodiments, the image processing model may also include a capsule network; correspondingly, the fixed attribute difference degree determining module 1005 is specifically configured to:
respectively input the first fixed attribute feature vector and the second fixed attribute feature vector into the capsule network, and determine the fixed attribute difference degree according to the output result of the capsule network. That is, the fixed attribute difference degree determining module 1005 may correspond to the capsule network in the image processing model; the capsule network has the function of performing high-dimensional feature extraction on the sample image or the target image, and its output result can be used to determine the fixed attribute difference degree.
In some embodiments, the image processing model may further include a multi-branch estimator including a convolution layer and a plurality of branch capsule networks respectively connected to the convolution layer;
accordingly, the second feature vector determination module 1003 includes:
The second fixed attribute feature vector determining unit is used for determining a second fixed attribute feature vector of the target object on the target image;
The second adjustable attribute feature vector determining unit is used for inputting the target image and the image label corresponding to the target image into the multi-branch estimator to obtain a plurality of second adjustable attribute feature vectors of the target object on the target image, wherein the image label is used for marking the adjustable attribute; that is, the second adjustable attribute feature vector determining unit may correspond to the multi-branch estimator in the image processing model, the multi-branch estimator having the function of generating the second adjustable attribute feature vectors of the target object on the target image;
The adjustable attribute difference degree determining module 1004 includes:
a vector difference value determining unit for determining a vector difference value between the first adjustable attribute feature vector and each of the second adjustable attribute feature vectors;
and an adjustable attribute difference degree determining unit for determining the adjustable attribute difference degree based on the plurality of vector difference values.
In some embodiments, the image processing model may further include an encoder and a decoder;
accordingly, the first feature vector determination module 1001 includes:
the first fixed attribute feature vector determining unit is used for inputting the sample image into the encoder to obtain a first fixed attribute feature vector of the target object; that is, the first fixed-attribute feature vector determination unit may correspond to an encoder in the image processing model, the encoder having a function of generating a first fixed-attribute feature vector of the target object on the sample image;
the first adjustable attribute feature vector determining unit is used for inputting a modular length value predetermined according to the attribute adjustment requirement of the target object into the decoder to obtain a first adjustable attribute feature vector of the target object; that is, the first tunable attribute feature vector determination unit may correspond to a decoder in the image processing model, the decoder having a function of generating a first tunable attribute feature vector of a target object on the sample image;
the second fixed attribute feature vector determination unit is specifically configured to:
and inputting the target image into the encoder to obtain a second fixed attribute feature vector of the target object, namely the encoder can also be used for generating the second fixed attribute feature vector of the target object on the target image.
In some embodiments, the adjustable attribute difference degree determining module 1004 is specifically configured to:
determining a module length value of the second adjustable attribute feature vector, determining a difference value between the module length value and a module length value corresponding to the first adjustable attribute feature vector, and taking the difference value as the adjustable attribute difference degree; the module length value corresponding to the first adjustable attribute feature vector is predetermined according to the attribute adjustment requirement of the target object;
the fixed attribute difference degree determining module 1005 is specifically configured to:
respectively determining the module length values of the first fixed attribute feature vector and the second fixed attribute feature vector, and taking the difference value between the two module length values as the fixed attribute difference degree.
In some embodiments, the image processing model training apparatus 1000 provided by the embodiments of the present disclosure further includes:
the initial adjustable attribute feature vector acquisition module is used for acquiring an initial adjustable attribute feature vector of a target object on a sample image;
the verification image generation module is used for performing vector fusion on the initial adjustable attribute feature vector and the second fixed attribute feature vector, inputting the vector into a generator in a generation countermeasure network, and generating a verification image corresponding to the target image;
the third feature vector determining module is used for determining a third adjustable attribute feature vector of the target object on the verification image;
the adjustable attribute verification value determining module is used for determining an adjustable attribute verification value based on the initial adjustable attribute feature vector and the third adjustable attribute feature vector;
the network parameter adjustment module 1006 is specifically configured to:
and adjusting the network parameters of the image processing model based on the adjustable attribute difference degree, the fixed attribute difference degree and the adjustable attribute verification value.
In some embodiments, the image processing model may further include a decoder;
correspondingly, the initial adjustable attribute feature vector acquisition module comprises:
The initial module length value acquisition unit is used for acquiring an initial module length value corresponding to the adjustable attribute of the target object;
The initial adjustable attribute feature vector acquisition unit is used for inputting the initial module length value into the decoder to obtain an initial adjustable attribute feature vector; that is, the initial adjustable attribute feature vector acquisition unit may also correspond to the decoder in the image processing model, and the decoder may also be configured to generate the initial adjustable attribute feature vector of the target object on the sample image.
In some embodiments, the image processing model may further include a multi-branch estimator including a convolution layer and a plurality of branch capsule networks respectively connected to the convolution layer; the initial module length value acquisition unit is specifically configured to:
inputting the sample image and the image label corresponding to the sample image into the multi-branch estimator, and determining the initial module length value corresponding to the adjustable attribute of the target object according to the output result of the multi-branch estimator; that is, the initial module length value acquisition unit may also correspond to the multi-branch estimator in the image processing model, and the multi-branch estimator may also be configured to output the initial module length value corresponding to the adjustable attribute of the target object;
the third feature vector determination module is specifically configured to:
inputting the verification image and an image label corresponding to the verification image into a multi-branch estimator to obtain a third adjustable attribute feature vector of the target object on the verification image;
wherein the image label is used for marking the adjustable attribute.
In some embodiments, the generation countermeasure network further includes a multi-branch discriminator, the multi-branch discriminator and the multi-branch estimator having the same number of branches, the multi-branch discriminator and the multi-branch estimator sharing convolutional layers;
each branch of the multi-branch discriminator is used for discriminating the target image according to the sample image.
The image processing model training apparatus provided by the embodiment of the present disclosure can execute any image processing model training method provided by the embodiments of the present disclosure, and has the corresponding functional modules and beneficial effects of the executed method. For details not described in the apparatus embodiments of the present disclosure, reference may be made to the description of any method embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of an image generating apparatus provided in an embodiment of the present disclosure, where the apparatus may be implemented by software and/or hardware, and may be integrated on any electronic device with computing capability, such as a terminal or a server. Further, the electronic device may be integrated into other devices having image processing requirements, such as an autonomous vehicle, an unmanned vehicle, or an unmanned aerial vehicle, and may be deployed according to actual conditions.
The image generation apparatus provided by the embodiment of the present disclosure is implemented based on a pre-trained image processing model. The image processing model is obtained by any image processing model training method provided by the embodiments of the present disclosure and includes a generation countermeasure network; for details, reference may be made to the descriptions in the above method embodiments.
As shown in fig. 11, an image generation apparatus 1100 provided by the present disclosure includes a feature vector determination module 1101 and a target image generation module 1102, where:
a feature vector determining module 1101, configured to determine a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on an image to be processed; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
and the target image generation module 1102 is configured to perform vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, input the vector into a generator in the generation countermeasure network, and generate a target image corresponding to the image to be processed.
The image generation apparatus provided by the embodiment of the present disclosure can execute any image generation method provided by the embodiments of the present disclosure, and has the corresponding functional modules and beneficial effects of the executed method. For details not described in the apparatus embodiments of the present disclosure, reference may be made to the description of any method embodiment of the present disclosure.
FIG. 12 is a schematic block diagram of an electronic device suitable for implementing embodiments of the present disclosure. As shown in fig. 12, the electronic apparatus 1200 includes a Central Processing Unit (CPU) 1201, which can execute the various processes of the foregoing embodiments according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the electronic apparatus 1200 are also stored. The CPU 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as necessary, so that a computer program read therefrom is installed into the storage section 1208 as needed.
In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing any of the image processing model training methods or image generation methods provided by the embodiments of the present disclosure. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules referred to in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
In addition, the embodiment of the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer-readable storage medium stores one or more programs that, when executed by one or more processors, cause an electronic device to implement any of the image processing model training methods or image generation methods provided by embodiments of the present disclosure.
In summary, the embodiments of the present disclosure provide an image processing model training method, an image generation method, an apparatus, a device, and a medium. In the embodiments of the present disclosure, the adjustable attribute and the fixed attribute of a target object are separated based on the attribute adjustment requirement of the target object on a sample image, and a first adjustable attribute feature vector of the target object is determined based on that requirement. The first adjustable attribute feature vector is then fused with the first fixed attribute feature vector of the target object, and the fused vector is input into the generator of the generation countermeasure network in the image processing model to obtain a target image corresponding to the sample image. Finally, an adjustable attribute difference degree and a fixed attribute difference degree are obtained based on the obtained adjustable attribute feature vectors and fixed attribute feature vectors, and the network parameters of the image processing model are adjusted; by repeating these operations, multiple rounds of network parameter adjustment can be executed until an image processing model meeting the image generation requirement is obtained. The embodiments thus solve the problems that existing models in the field of image processing can only process a limited number of discrete states of a target object attribute on an image, and that the adjustable attribute and the fixed attribute of the target object cannot be effectively separated during model processing, so that state control and image generation for a specific attribute of the target object cannot be realized. With the image processing model trained by the technical solution of the embodiments of the present disclosure, continuous processing of a target object attribute on an image can be realized rather than being limited to a number of discrete states, and state control and image generation for a specific attribute of the target object can be achieved; for example, an image meeting the attribute adjustment requirement of the target object can be generated by controlling the state of a specific attribute, improving the generation efficiency of the required image.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method of image processing model training, wherein an image processing model includes a generation countermeasure network, the method comprising:
determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
after vector fusion is carried out on the first adjustable attribute feature vector and the first fixed attribute feature vector, the first adjustable attribute feature vector and the first fixed attribute feature vector are input into a generator in the generation countermeasure network, and a target image corresponding to the sample image is generated;
determining a second adjustable attribute feature vector and a second fixed attribute feature vector of a target object on the target image;
determining an adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector;
determining a fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector;
and adjusting the network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree.
2. The method of claim 1, wherein the image processing model further comprises a capsule network;
correspondingly, the determining a fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector includes:
and respectively inputting the first fixed attribute feature vector and the second fixed attribute feature vector into the capsule network, and determining the fixed attribute difference degree according to the output result of the capsule network.
3. The method of claim 1, wherein the image processing model further comprises a multi-branch estimator comprising a convolution layer and a plurality of branch capsule networks respectively connected to the convolution layer;
correspondingly, the determining a second adjustable attribute feature vector of the target object on the target image includes:
inputting the target image and an image label corresponding to the target image into the multi-branch estimator to obtain a plurality of second adjustable attribute feature vectors of a target object on the target image; wherein the image label is used to mark the adjustable attribute;
determining an adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector, comprising:
determining a vector difference value between the first adjustable attribute feature vector and each second adjustable attribute feature vector;
determining the adjustable attribute difference degree based on a plurality of the vector difference values.
4. The method of claim 1, wherein the image processing model further comprises an encoder and a decoder;
correspondingly, the determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image includes:
inputting the sample image into the encoder to obtain a first fixed attribute feature vector of the target object;
inputting a module length value predetermined according to the attribute adjustment requirement of the target object into the decoder to obtain a first adjustable attribute feature vector of the target object;
the determining a second fixed attribute feature vector of a target object on the target image comprises:
and inputting the target image into the encoder to obtain a second fixed attribute feature vector of the target object.
5. The method of claim 1, wherein determining an adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector comprises:
determining a module length value of the second adjustable attribute feature vector, determining a difference value between the module length value and a module length value corresponding to the first adjustable attribute feature vector, and taking the difference value as the adjustable attribute difference degree; the module length value corresponding to the first adjustable attribute feature vector is predetermined according to the attribute adjustment requirement of the target object;
determining a fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector comprises:
and respectively determining the module length values of the first fixed attribute feature vector and the second fixed attribute feature vector, and taking the difference value between the two module length values as the fixed attribute difference degree.
6. The method of claim 1, further comprising:
acquiring an initial adjustable attribute feature vector of a target object on the sample image;
after vector fusion is carried out on the initial adjustable attribute feature vector and the second fixed attribute feature vector, the initial adjustable attribute feature vector and the second fixed attribute feature vector are input into a generator in the generation countermeasure network, and a verification image corresponding to the target image is generated;
determining a third adjustable attribute feature vector of the target object on the verification image;
determining an adjustable attribute verification value based on the initial adjustable attribute feature vector and the third adjustable attribute feature vector;
adjusting network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree comprises:
and adjusting the network parameters of the image processing model based on the adjustable attribute difference degree, the fixed attribute difference degree and the adjustable attribute verification value.
7. The method of claim 6, wherein the image processing model further comprises a decoder;
correspondingly, the obtaining of the initial adjustable attribute feature vector of the target object on the sample image includes:
acquiring an initial module length value corresponding to the adjustable attribute of the target object;
and inputting the initial module length value into the decoder to obtain the initial adjustable attribute feature vector.
8. The method of claim 7, wherein the image processing model further comprises a multi-branch estimator comprising a convolution layer and a plurality of branch capsule networks respectively connected to the convolution layer;
correspondingly, the obtaining an initial module length value corresponding to the adjustable attribute of the target object includes:
inputting the sample image and an image label corresponding to the sample image into the multi-branch estimator, and determining an initial module length value corresponding to the adjustable attribute of the target object according to an output result of the multi-branch estimator;
the determining a third adjustable attribute feature vector of a target object on the verification image comprises:
inputting the verification image and an image label corresponding to the verification image into the multi-branch estimator to obtain a third adjustable attribute feature vector of a target object on the verification image;
wherein the image label is used to mark the adjustable attribute.
9. The method of claim 3, wherein the generation countermeasure network further comprises a multi-branch discriminator, the multi-branch discriminator and the multi-branch estimator having the same number of branches, the multi-branch discriminator and the multi-branch estimator sharing the convolution layer;
each branch of the multi-branch discriminator is used for discriminating the target image according to the sample image.
10. An image generation method, wherein the image generation method is implemented based on a pre-trained image processing model, the image processing model is obtained based on the image processing model training method according to any one of claims 1-9 and includes a generation countermeasure network, and the image generation method comprises:
determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on an image to be processed; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
and after vector fusion is carried out on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the vector into a generator in the generation countermeasure network, and generating a target image corresponding to the image to be processed.
11. An apparatus for training an image processing model, wherein the image processing model includes a generation countermeasure network, the apparatus comprising:
the first feature vector determining module is used for determining a first adjustable attribute feature vector and a first fixed attribute feature vector of a target object on a sample image; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
the target image generation module is used for performing vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the vector into a generator in the generation countermeasure network, and generating a target image corresponding to the sample image;
the second feature vector determination module is used for determining a second adjustable attribute feature vector and a second fixed attribute feature vector of the target object on the target image;
an adjustable attribute difference degree determining module, configured to determine an adjustable attribute difference degree based on the first adjustable attribute feature vector and the second adjustable attribute feature vector;
a fixed attribute difference degree determining module, configured to determine a fixed attribute difference degree based on the first fixed attribute feature vector and the second fixed attribute feature vector;
and the network parameter adjusting module is used for adjusting the network parameters of the image processing model based on the adjustable attribute difference degree and the fixed attribute difference degree.
12. An image generation apparatus, wherein the image generation apparatus is implemented based on a pre-trained image processing model, the image processing model is obtained based on the image processing model training method according to any one of claims 1-9 and includes a generation countermeasure network, and the image generation apparatus comprises:
the characteristic vector determining module is used for determining a first adjustable attribute characteristic vector and a first fixed attribute characteristic vector of a target object on an image to be processed; the first adjustable attribute feature vector is determined according to the attribute adjustment requirement of the target object;
and the target image generation module is used for performing vector fusion on the first adjustable attribute feature vector and the first fixed attribute feature vector, inputting the vector into a generator in the generation countermeasure network, and generating a target image corresponding to the image to be processed.
13. An electronic device, comprising a memory and a processor, wherein the memory has stored therein a computer program that, when executed by the processor, causes the electronic device to implement the image processing model training method of any one of claims 1-9 or the image generation method of claim 10.
14. A computer-readable storage medium, characterized in that a computer program is stored in the storage medium, which, when executed by a computing device, causes the computing device to implement the image processing model training method of any one of claims 1-9, or to implement the image generation method of claim 10.
CN202110113482.1A 2021-01-27 2021-01-27 Image processing model training method, image generation method, device and equipment Pending CN112785495A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110113482.1A CN112785495A (en) 2021-01-27 2021-01-27 Image processing model training method, image generation method, device and equipment
PCT/CN2021/074381 WO2022160239A1 (en) 2021-01-27 2021-01-29 Image processing model training method and apparatus, image generation method and apparatus, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110113482.1A CN112785495A (en) 2021-01-27 2021-01-27 Image processing model training method, image generation method, device and equipment

Publications (1)

Publication Number Publication Date
CN112785495A (en) 2021-05-11

Family

ID=75759130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110113482.1A Pending CN112785495A (en) 2021-01-27 2021-01-27 Image processing model training method, image generation method, device and equipment

Country Status (2)

Country Link
CN (1) CN112785495A (en)
WO (1) WO2022160239A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536947A (en) * 2021-06-21 2021-10-22 中山市希道科技有限公司 Face attribute analysis method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816048A (en) * 2019-02-15 2019-05-28 聚时科技(上海)有限公司 A kind of image composition method based on attribute migration
CN110135336A (en) * 2019-05-14 2019-08-16 腾讯科技(深圳)有限公司 Training method, device and the storage medium of pedestrian's generation model
CN110163267A (en) * 2019-05-09 2019-08-23 厦门美图之家科技有限公司 A kind of method that image generates the training method of model and generates image
WO2019200702A1 (en) * 2018-04-20 2019-10-24 平安科技(深圳)有限公司 Descreening system training method and apparatus, descreening method and apparatus, device, and medium
CN111145080A (en) * 2019-12-02 2020-05-12 北京达佳互联信息技术有限公司 Training method of image generation model, image generation method and device
US20200160178A1 (en) * 2018-11-16 2020-05-21 Nvidia Corporation Learning to generate synthetic datasets for traning neural networks
CN111783603A (en) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, image face changing method and video face changing method and device
WO2020215915A1 (en) * 2019-04-24 2020-10-29 腾讯科技(深圳)有限公司 Identity verification method and apparatus, computer device and storage medium
CN112258381A (en) * 2020-09-29 2021-01-22 北京达佳互联信息技术有限公司 Model training method, image processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2022160239A1 (en) 2022-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20211014
Address after: No.1 Factory building, no.299, Hongye Road, Dayun Town, Jiashan County, Jiaxing City, Zhejiang Province
Applicant after: UISEE TECHNOLOGY (ZHEJIANG) Co.,Ltd.
Address before: 211100 2nd floor, block B4, Jiulonghu international enterprise headquarters park, 19 Suyuan Avenue, Jiangning Development Zone, Nanjing City, Jiangsu Province (Jiangning Development Zone)
Applicant before: Yushi Technology (Nanjing) Co.,Ltd.