WO2020108336A1 - Image processing method, apparatus, device, and storage medium - Google Patents

Image processing method, apparatus, device, and storage medium

Info

Publication number
WO2020108336A1
WO2020108336A1, PCT/CN2019/119087, CN2019119087W
Authority
WO
WIPO (PCT)
Prior art keywords
network
transformation
image
feature map
tensor
Prior art date
Application number
PCT/CN2019/119087
Other languages
English (en)
French (fr)
Inventor
揭泽群 (Zequn Jie)
Original Assignee
Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited
Publication of WO2020108336A1
Priority claimed by US application 17/191,611 (US11798145B2)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 5/00 Image enhancement or restoration
            • G06T 5/20 Image enhancement or restoration using local operators
          • G06T 11/00 2D [Two Dimensional] image generation
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20081 Training; Learning
              • G06T 2207/20084 Artificial neural networks [ANN]
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 Pattern recognition
            • G06F 18/20 Analysing
              • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
                • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F 18/24 Classification techniques
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/045 Combinations of networks
                • G06N 3/047 Probabilistic or stochastic networks
                • G06N 3/048 Activation functions
              • G06N 3/08 Learning methods

Definitions

  • Embodiments of the present application relate to the Internet field, and in particular, to an image processing method, apparatus, device, and storage medium.
  • GAN: generative adversarial network
  • an image processing method, apparatus, device, and storage medium are provided.
  • an image processing method which is executed by a computer device and includes:
  • the decoding network is used to extract the features of the image
  • each transform network is used for image transform processing
  • the second feature map is input to a reconstruction network to output a target image.
  • the reconstruction network is used to reconstruct the input feature map into a two-dimensional image.
  • an image processing method which is executed by a computer device and includes:
  • the adversarial network includes an image processing network and a plurality of discriminator networks; the image processing network includes a decoding network, a plurality of transformation networks, and a reconstruction network;
  • the original image is input to the trained image processing network, and the image-processed target image is output.
  • an image processing device including:
  • the decoding module is used to input the original image into the decoding network according to the image transformation instruction and output the first feature map of the original image.
  • the decoding network is used to extract the features of the image;
  • a transformation module for sequentially inputting the first feature map into a plurality of transformation networks corresponding to at least one transformation demand information, and outputting a second feature map, each transformation network being used for image transformation processing;
  • the reconstruction module is used to input the second feature map into a reconstruction network and output a target image.
  • the reconstruction network is used to reconstruct the input feature map into a two-dimensional image.
  • an image processing device including:
  • a building module for building an initial adversarial network, which includes an image processing network and a plurality of discriminator networks; the image processing network includes a decoding network, a plurality of transformation networks, and a reconstruction network;
  • a training module for training the multiple discriminator networks based on multiple image sets, and iteratively training the adversarial network based on the training results of the multiple discriminator networks;
  • a processing module for inputting the original image into the trained image processing network when an image transformation instruction is received, and outputting the image-processed target image.
  • In one aspect, a computer device is provided, including a memory and a processor.
  • the memory stores computer-readable instructions.
  • when the processor executes the computer-readable instructions, the processor performs the image processing method in any of the possible implementation manners described above.
  • In one aspect, one or more non-volatile storage media storing computer-readable instructions are provided.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors perform the operations performed by the image processing method in any of the possible implementation manners described above.
  • FIG. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an image processing method provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a transformation network provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an adversarial network provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a computer device 102 provided by an embodiment of the present application.
  • AI: Artificial Intelligence
  • Artificial Intelligence is a theory, method, technology, and application system that uses digital computers or digital computer-controlled machines to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology in computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machine has the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive subject, covering a wide range of fields, both hardware-level technology and software-level technology.
  • Basic technologies of artificial intelligence generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology and machine learning/deep learning.
  • Computer Vision is a science that studies how to make a machine "see"; that is, it uses cameras and computers instead of human eyes to identify, track, and measure targets, and further performs graphics processing so that the result becomes more suitable for human observation or for transmission to instruments for inspection. As a scientific discipline, computer vision studies related theories and technologies and attempts to establish artificial intelligence systems that can obtain information from images or multidimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
  • Machine learning is a multidisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and so on. It specializes in studying how computers simulate or realize human learning behavior to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence, and is the fundamental way to make computers intelligent, and its applications are in various fields of artificial intelligence.
  • Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
  • Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, intelligent medical care, and intelligent customer service. With the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
  • FIG. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of the present application.
  • The implementation environment may include at least one user equipment 101 and a computer device 102, where an application client may be installed on the at least one user equipment 101.
  • the application client may be any client capable of image processing.
  • when the user equipment 101 detects a trigger operation of an image transformation instruction, it sends the image transformation instruction carrying the original image to the computer device 102, so that the computer device 102 performs image processing involving multiple feature transformations on the original image according to the image transformation instruction.
  • the computer device 102 may be a server capable of providing image processing services. The server may train the processing capabilities of the GAN on multiple image sets, so as to perform image processing through the trained GAN. The computer device 102 may maintain a training database: whenever an image transformation instruction is received, the original image carried by the instruction is stored into the image sets in the training database, for maintenance and storage of training data.
  • the computer device 102 may be a terminal.
  • the GAN includes a decoding network, a transformation network, a reconstruction network, and a discrimination network.
  • by adjusting the parameters of each network, an output image with a certain feature transformation can be obtained from the input image through the GAN.
  • when the GAN model is used for image processing and multiple features of the input image are to be transformed, a GAN can be trained for each single feature, and the trained GANs can then be applied to the input image in turn: the input image is first decoded by the decoding network, then transformed by the transformation network, and finally reconstructed by the reconstruction network into the output image with the multiple feature transformations.
  • FIG. 2 is a flowchart of an image processing method provided by an embodiment of the present application. Referring to FIG. 2, taking the computer device 102 as a server providing an image processing service as an example for illustration, this embodiment includes:
  • the server inputs the original image into the decoding network according to the image transformation instruction, and outputs the first feature map of the original image.
  • the decoding network is used to extract the features of the image.
  • the original image refers to an image to be processed.
  • the image transformation instruction is used to instruct the server to perform image transformation on the carried original image, for example, transforming the facial features and hair color of the original image.
  • the image transformation instruction may be sent by the user equipment through the application client, or may be triggered by the server by default during training.
  • the embodiment of the present application does not specifically limit the acquisition method of the image transformation instruction.
  • the image transformation instruction may carry the image to be processed, and the server uses the image to be processed as the original image; the original image may have multiple transformable features, for example, hair, facial features, accessories, and so on.
  • the server may also obtain the original image from a locally stored database, either randomly or according to a preset rule.
  • the embodiment of the present application does not specifically limit the method of acquiring the original image; optionally, the preset rule may be to obtain high-resolution images first, or to obtain portraits first.
  • the embodiment of the present application does not specifically limit the preset rule.
  • the decoding network can extract the features of the image through a first target number of convolutional layers.
  • the first target number can be preset or can be adjusted during the training of the decoding network.
  • the embodiment of the present application does not specifically limit the source of the first target number.
  • assume that the height of the original image is H0, the width is W0, and the depth is D0, where H0, W0, and D0 are positive integers.
  • the depth D0 may be the number of channels of the original image; for example, D0 = 3 for the red, green, and blue channels.
  • the hyperparameters of each convolutional layer include the convolution kernel size F and the number of convolution kernels K.
  • each convolution kernel (filter) is used to indicate the weight when weighting the features of the original image.
  • each convolution kernel may be a weight matrix of size F*F*D0, and the depth of each convolution kernel matches the depth of the original image, where F is a positive integer less than the minimum of H0 and W0; the number of convolution kernels K is a positive integer, and K indicates the number of features that the first convolutional layer can extract, that is, the number of feature maps output by the first convolutional layer is also K.
  • the parameters of each convolution kernel are adjusted according to the deviation indicated by each network loss function, and the final matrix value of each convolution kernel can be obtained after the training is completed.
  • the original image may also be a single channel, and the embodiments of the present application do not limit the number of channels of the original image.
  • the stride S can also be specified during initialization, so that when performing the convolution operation, the stride S can be any positive integer greater than or equal to 1 and less than or equal to the minimum of H0 and W0; optionally, to extract the edge information of the original image more accurately, a boundary padding P can also be specified during initialization.
  • the first convolutional layer performs convolution weighting operations on each channel of the original image with stride S according to the K convolution kernels, thereby obtaining K first-layer feature maps; the K first-layer feature maps are used as the input of the second convolutional layer, that is, the depth of the input of the second convolutional layer is K, so the depth of each convolution kernel in the second convolutional layer is also K, and so on: the output of each convolutional layer is used as the input of the next convolutional layer until, after feature extraction by the first target number of convolutional layers, a first feature map of height H, width W, and depth D is obtained, where H, W, and D are positive integers, and the depth D of the first feature map can be used to indicate the number of features extracted by the decoding network.
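The spatial arithmetic implied by the hyperparameters above (kernel size F, stride S, boundary padding P) follows the standard convolution output-size formula; a minimal sketch, with a hypothetical helper name that is not part of the patent:

```python
def conv_output_size(h0, w0, f, s=1, p=0):
    # Standard convolution arithmetic: output height/width for an input of
    # height h0 and width w0, kernel size f, stride s, and boundary padding p.
    h = (h0 - f + 2 * p) // s + 1
    w = (w0 - f + 2 * p) // s + 1
    return h, w

# With s = 1 and p = (f - 1) // 2, the spatial size is preserved, which is how
# the output feature map can keep the original image's height and width.
print(conv_output_size(128, 128, 3, 1, 1))  # → (128, 128)
```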
  • the server obtains multiple transformation networks corresponding to at least one piece of transformation requirement information.
  • the server may obtain, from the existing transformation networks, the multiple transformation networks corresponding to the at least one piece of transformation requirement information.
  • the at least one piece of transformation requirement information may be information carried by the image transformation instruction, or may be information set by the server by default; the embodiment of the present application does not specifically limit the method for obtaining the at least one piece of transformation requirement information.
  • the server may acquire all the transformation networks corresponding to the at least one piece of transformation requirement information from the existing transformation networks at once; optionally, the server may also acquire the corresponding transformation network each time a piece of transformation requirement information is processed.
  • the embodiment of the present application does not specifically limit the manner of acquiring the plurality of transformation networks.
  • the server may also sort the multiple transformation networks corresponding to the at least one piece of transformation requirement information; optionally, the server may sort them according to the degree of detail of the transformation requirement information, for example, processing features with low detail requirements first.
  • the embodiment of the present application does not specifically limit the sorting rules of the multiple transformation networks; the above sorting may refer to using the output of one transformation network as the input of the next transformation network.
  • Each transformation demand information is used to indicate the demand for transformation based on a feature category.
  • each piece of transformation requirement information can correspond to one or more transformation networks, and each trained transformation network can perform feature transformation according to the corresponding transformation requirement information. For example, if the transformation requirement information C i is to change the hair color to green, then the feature category is hair color, and the transformation network T i is used to transform the hair color in the input feature map to green.
  • FIG. 3 is a schematic diagram of an image processing method provided by an embodiment of the present application. Taking the number of acquired transformation networks as N as an example, and referring to FIG. 3, after passing through the N transformation networks the original image yields a target image with N transformed features. The following steps 203-209 take the acquired i-th transformation network as an example, where i is a positive integer less than or equal to N, and explain how the i-th transformation network performs the i-th feature transformation.
  • i is a positive integer less than or equal to N
  • a second feature map is output, and the following step 211 is performed.
  • the server obtains the i-th condition vector, where the i-th condition vector is a row vector or a column vector.
  • the i-th transformation network may include n convolutional layers and a target convolutional layer, where n is a positive integer, and the i-th condition vector is used to indicate the target transformation requirement information for the i-th feature category.
  • the i-th condition vector may be an externally input parameter, or may be generated according to the target transformation requirement information, and the i-th condition vector is a non-zero vector.
  • the embodiment of the present application does not limit the manner of acquiring the i-th condition vector.
  • the condition vector can be used to represent the type of transformation required by the image transformation.
  • the length of the condition vector can be used to indicate the number of colors into which the GAN can transform hair color; for example, the GAN can transform hair color into five colors.
  • in that case the length of the condition vector di is 5, and each bit of the condition vector can be used to indicate a color: if the third bit indicates green, the third bit of the condition vector is set to 1 and all other bits are set to 0, that is, the condition vector is [0,0,1,0,0]. The length of the condition vector is not specifically limited in the embodiment of the present application.
  • the server expands the i-th condition vector to match the width Wi-1 and height Hi-1 of the feature map output by the (i-1)-th transformation network, obtaining the i-th condition tensor.
  • for example, the i-th condition vector can first be copied Wi-1 times along the width direction, expanding it into a two-dimensional matrix of size di*Wi-1, and the two-dimensional matrix can then be copied Hi-1 times along the height direction, expanding it into a three-dimensional tensor of size di*Hi-1*Wi-1, which is the i-th condition tensor, where di, Hi-1, and Wi-1 are positive integers.
  • alternatively, the i-th condition vector can be copied and expanded in the height direction first and the resulting two-dimensional matrix then copied and expanded in the width direction, or the i-th condition vector can be copied and expanded in the width and height directions at the same time.
  • in some embodiments, the condition information may not be a row vector or a column vector, as long as it can represent the target transformation requirement information of the i-th feature category; for example, it can be the condition tensor itself, that is, an externally input three-dimensional matrix can be used directly as the condition tensor without expanding a condition vector.
  • the embodiment of the present application does not limit the manner of acquiring the i-th condition tensor.
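The copy-and-expand operation described above can be sketched with NumPy tiling; this is an illustration rather than the patent's implementation, and the helper name and sizes are made up:

```python
import numpy as np

def expand_condition_vector(c, h, w):
    # Tile a length-d condition vector into a d x h x w condition tensor whose
    # spatial size (h, w) matches the previous transformation network's output.
    d = c.shape[0]
    return np.tile(c.reshape(d, 1, 1), (1, h, w))

# A hair-color one-hot vector [0, 0, 1, 0, 0] expanded to a 5 x 4 x 4 tensor:
cond = expand_condition_vector(np.array([0, 0, 1, 0, 0]), 4, 4)
```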
  • the server concatenates the i-th condition tensor and the feature map output by the (i-1)-th transformation network in the depth direction to obtain the i-th extended tensor.
  • in some embodiments, the width of the i-th condition tensor is the same as that of the feature map output by the (i-1)-th transformation network, and its height is also the same as that of the feature map, so the condition tensor can be connected directly to the feature map output by the (i-1)-th transformation network in the depth direction; assuming that the depth of the feature map output by the (i-1)-th transformation network is Di-1, the size of the extended tensor thus obtained is (Di-1+di)*Hi-1*Wi-1, where Di-1 is a positive integer.
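The depth-wise connection can likewise be sketched as a channel-axis concatenation; the shapes below are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Assumed sizes: a previous feature map of depth D = 32 and a condition tensor
# of depth d = 5, both sharing the same 4 x 4 spatial size.
feature_map = np.zeros((32, 4, 4))
cond_tensor = np.ones((5, 4, 4))

# Connecting in the depth direction is a concatenation along the channel axis,
# giving an extended tensor of size (D + d) x H x W.
extended = np.concatenate([feature_map, cond_tensor], axis=0)
```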
  • the server sequentially inputs the i-th extended tensor into the multiple convolutional layers until the depth of the i-th extended tensor is reduced to the same depth as the feature map output by the (i-1)-th transformation network, obtaining the i-th dimension-reduction tensor.
  • in some embodiments, the depth of the i-th extended tensor is reduced from Di-1+di to Di-1 by the multiple convolutional layers in the i-th transformation network, yielding the i-th dimension-reduction tensor of size Di-1*Hi-1*Wi-1.
  • the internal structure of the multiple convolutional layers may be similar to the structure of the convolutional layers in the decoding network in step 201 above, that is, each convolutional layer includes convolution kernels, and the hyperparameters of each convolutional layer may be the same or different, which will not be repeated here.
  • in the i-th transformation network, a residual block can also be introduced. For example, if the input of the j-th convolutional layer is x j (the output of the (j-1)-th convolutional layer), then the output of the j-th convolutional layer can be expressed as f j (x j ). The j-th residual block introduced between the j-th convolutional layer and the (j+1)-th convolutional layer can be expressed as f j (x j )+x j , and the output of the j-th residual block is taken as the input of the (j+1)-th convolutional layer. By introducing residual blocks, the degradation problem of deep neural networks is mitigated, so that the deeper the convolutional layers of the i-th transformation network, the better the image processing effect.
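A minimal sketch of the residual connection described above, with a made-up per-channel linear map standing in for the convolutional layer f j (the weights and shapes are hypothetical, for illustration only):

```python
import numpy as np

def residual_block(x, f):
    # The block outputs f(x) + x, so the stacked layers only need to learn the
    # residual; this is what mitigates the degradation problem in deep stacks.
    return f(x) + x

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)) * 0.1   # hypothetical 1x1-conv weights
x = rng.standard_normal((8, 4, 4))      # a depth-8 feature map
f_j = lambda t: np.maximum(np.einsum('od,dhw->ohw', w, t), 0.0)  # conv + ReLU
y = residual_block(x, f_j)
# The skip connection requires f(x) to keep x's shape.
```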
  • the number of the multiple convolutional layers may be a hyperparameter preset by the server, or may be a value adjusted during GAN training; the embodiment of the present application does not specifically limit how this number is obtained.
  • the server inputs the i-th dimension reduction tensor to the target convolution layer of the i-th transformation network, and performs convolution processing on the i-th dimension reduction tensor.
  • the internal structure of the target convolutional layer is also similar to the structure of the aforementioned multiple convolutional layers, and will not be described again.
  • the server inputs the convolution-processed tensor to the activation function and outputs the ith mask.
  • the activation function is used to perform nonlinear processing on the input tensor.
  • the i-th mask is used to indicate the transformation area corresponding to the i-th transformation requirement information in the feature map output by the (i-1)-th transformation network; based on the above example, the i-th mask is used to indicate the area representing the hair in the feature map output by the (i-1)-th transformation network.
  • the i-th mask may be a two-dimensional matrix of size Hi-1*Wi-1, and the i-th mask may be expanded to the same depth as the i-th dimension-reduction tensor; alternatively, the parameters of the target convolutional layer can be adjusted to output the expanded mask directly.
  • the activation function may be Sigmoid, tanh, or ReLU, etc., which can perform nonlinear processing on the output of the target convolutional layer, thereby improving the ability to express details of the GAN transformation.
  • the embodiment of the present application does not specifically limit the function expression of the activation function.
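With a Sigmoid activation, for instance, each output value is squashed into (0, 1), so each spatial position becomes a soft mask value; a pure-Python sketch with made-up logits:

```python
import math

def sigmoid(x):
    # Squashes the target convolutional layer's output into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# Large positive logits -> mask near 1 (area to transform);
# large negative logits -> mask near 0 (area to keep unchanged).
logits = [[8.0, -8.0], [0.0, 4.0]]
mask = [[sigmoid(v) for v in row] for row in logits]
```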
  • the server retains the area corresponding to the i-th mask in the i-th dimension-reduction tensor, and replaces the area outside the i-th mask in the i-th dimension-reduction tensor with the corresponding area in the feature map output by the (i-1)-th transformation network, obtaining the feature map output by the i-th transformation network.
  • denote the feature map output by the (i-1)-th transformation network as f i-1 , the i-th dimension-reduction tensor as f i ′, and the i-th mask as g i ; the feature map output by the i-th transformation network can then be expressed as f i = g i ⊙f i ′+(1-g i )⊙f i-1 , so that only the area corresponding to the i-th transformation requirement information in the feature map undergoes feature transformation.
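The keep/replace rule above amounts to blending the transformed values inside the masked region with the previous feature map outside it; a NumPy sketch with illustrative shapes:

```python
import numpy as np

def blend_with_mask(f_prev, f_reduced, mask):
    # f_i = g_i * f_i' + (1 - g_i) * f_{i-1}: keep the transformed values
    # where the mask is 1 and the previous feature map where it is 0.
    return mask * f_reduced + (1.0 - mask) * f_prev

f_prev = np.zeros((2, 3, 3))            # feature map from network i-1
f_red = np.ones((2, 3, 3))              # i-th dimension-reduction tensor
g = np.zeros((3, 3)); g[1, 1] = 1.0     # mask selects one spatial position
out = blend_with_mask(f_prev, f_red, g[None, :, :])  # broadcast mask over depth
```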
  • the server repeatedly executes the above steps 203-209 until the transformation corresponding to each piece of transformation requirement information has been performed, and outputs a second feature map.
  • the above steps 203-209 illustrate how the i-th transformation network performs the transformation corresponding to the i-th piece of transformation requirement information. After the i-th transformation, the image transformation can be continued based on the (i+1)-th transformation network; through the serial connection of the transformation networks, the output of the previous transformation network is used as the input of the next transformation network until the transformation network corresponding to each piece of transformation requirement information has performed its feature transformation, and the feature map output by the last transformation network is the second feature map.
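The serial chaining of transformation networks can be sketched as function composition; the two stand-in "networks" below are hypothetical string-tagging functions, not trained models:

```python
from functools import reduce

# Hypothetical stand-ins for trained transformation networks: each takes a
# feature map (here just a string tag) and returns the transformed feature map.
hair_color = lambda fmap: fmap + "+hair"
expression = lambda fmap: fmap + "+smile"

def apply_transformations(first_feature_map, networks):
    # Serially chain the networks: each network's output is the next network's
    # input; the last network's output is the second feature map.
    return reduce(lambda fmap, net: net(fmap), networks, first_feature_map)

second = apply_transformations("f0", [hair_color, expression])
# second == "f0+hair+smile"
```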
  • the server inputs the second feature map to the reconstruction network and outputs a target image.
  • the reconstruction network is used to reconstruct the input feature map into a two-dimensional image.
  • the second feature map can be reconstructed into a target image through multiple convolutional layers in the reconstruction network, and the target image is the original image after being processed according to the at least one piece of transformation requirement information.
  • with the method provided by the embodiments of the present application, the server inputs the original image into the decoding network for feature extraction according to the image transformation instruction, then passes it sequentially through multiple transformation networks to transform multiple features, and finally through the reconstruction network to reconstruct the target image, so that no matter how many features there are, the whole pipeline requires only one decoding and one reconstruction, keeping the image processing process simple and smooth; further, by introducing condition vectors and masks, each transformation network can perform the transformation for its requirement information without a discriminant network participating, simplifying the GAN architecture; further, by setting the stride S and the boundary padding P to appropriate values, the height and width of the output second feature map can be kept the same as the original image, avoiding loss of detail.
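The stride/padding claim can be checked with the usual convolution output-size arithmetic O = (I - F + 2P)/S + 1: with S = 1 and P = (F - 1)/2 (odd F), the output keeps the input's spatial size. A quick check:

```python
def conv_output_size(i, f, s=1, p=0):
    # standard convolution output-size formula O = (I - F + 2P) / S + 1
    return (i - f + 2 * p) // s + 1

# with stride 1 and padding (F - 1) / 2, height/width are unchanged
for f in (3, 5, 7):
    p = (f - 1) // 2
    print(conv_output_size(64, f, s=1, p=p))  # 64 each time
```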
  • FIG. 5 is a flowchart of an image processing method provided by an embodiment of the present application. Referring to FIG. 5, this embodiment includes:
  • the server constructs an initial confrontation network, which includes a decoding network, multiple transformation networks, a reconstruction network, and multiple discriminant networks.
  • FIG. 6 is a schematic structural diagram of an adversarial network provided by an embodiment of the present application.
  • when constructing the initialized adversarial network, the multiple transformation networks may be connected in parallel after the decoding network, the reconstruction network after the multiple transformation networks, and the multiple discriminant networks in parallel after the reconstruction network, where each discriminant network corresponds one-to-one with a transformation network among the plurality of transformation networks.
  • because the transformation networks and discriminant networks are connected to the same decoding network and reconstruction network, the training data of the decoding network and the reconstruction network can be shared during the training process, shortening the data needed for training and optimizing the resource allocation of the adversarial network.
  • when constructing the initialized adversarial network, the hyperparameters of each network may also be preset; different transformation networks may be set to the same or different initialization values, and the same is true for different discriminant networks, which will not be repeated here.
  • the embodiments of the present application do not specifically limit the initialization values of hyperparameters of each network in the initialization process.
  • the server inputs the i-th image set to the i-th discrimination network, and adjusts the parameters of the i-th discrimination network according to the value of the loss function of the i-th discrimination network.
  • the i-th discriminant network is any discriminant network among multiple discriminant networks.
  • in steps 502-504, the training of one branch of the adversarial network is described as an example, namely the branch composed of the decoding network, the i-th transformation network, the reconstruction network, and the i-th discriminant network.
  • any branch of the adversarial network has similar training steps, and every branch of the adversarial network shares the training data of the decoding network and the reconstruction network, which will not be repeated here.
  • each image set may correspond to a feature category
  • each feature category may correspond to a discriminant network
  • each image set may include a true sample set and a false sample set; take the image set corresponding to a first feature category as an example
  • the true sample set may be images that have undergone the transformation of the first feature category
  • the false sample set may be images that have undergone the transformation of a second feature category, where the second feature category may be any feature category in the transformable category group other than the first feature category
  • through training of the discriminant network, the discriminant network can output 1 for the true sample set and 0 for the false sample set, thereby achieving the discrimination function.
  • in training the GAN, the i-th discriminant network can be used to judge the output results after processing by the decoding network, the i-th transformation network, and the reconstruction network, so that the parameters of each network are adjusted jointly to obtain an optimized GAN; once training is complete, the multiple discriminant networks need not be used during image processing.
  • the loss functions may include three types; the first type may be the function L_adv,i with which the i-th discriminant network D_i judges whether the image output after processing by the decoding network E, the i-th transformation network T_i, and the reconstruction network R is real, and this function may take a standard adversarial form such as L_adv,i = E_y[log D_i(y)] + E_x[log(1 - D_i(R(T_i(E(x)))))]
  • where y is an image in the true sample set and x is the image generated by the GAN network; the more realistic the image reconstructed by the GAN, the smaller the value of L_adv,i, i.e., the smaller the loss.
  • the second type of loss function may be the function L^r_cls,i that classifies the features of the images in the true sample set during discrimination
  • this function may take a form such as L^r_cls,i = E_y[-log D_cls,i(c_i | y)]
  • where c_i is the feature category corresponding to the i-th discriminant network, so that the more accurately the discriminant network classifies the features, the smaller the value of this loss function, i.e., the smaller the loss.
  • the third type of loss function may be the function L^f_cls,i that classifies the features of the images generated by the GAN network during discrimination
  • this function may take a form such as L^f_cls,i = E_x[-log D_cls,i(c_i | R(T_i(E(x))))]
  • the more accurately the discriminant network classifies the features, the smaller the value of this loss function, i.e., the smaller the loss.
  • the server adjusts the parameters of the decoding network, the reconstruction network, and the i-th transformation network according to the values of the loss functions of the decoding network, the reconstruction network, and the i-th transformation network.
  • the loss function L_rec of the reconstruction network and the decoding network may take a form such as L_rec = E_x[||R(E(x)) - x||_1]
  • this loss function represents the loss between the original image and the image obtained by passing the original image through the decoding network and then directly into the reconstruction network; the more accurate the reconstruction and decoding networks, the smaller its value.
  • the loss function L_cyc,i of the i-th transformation network may take a form such as L_cyc,i = E_x[||T_i(E(x)) - E(R(T_i(E(x))))||_1], representing the loss between the feature map output by the i-th transformation network and the feature map obtained by reconstructing that output and decoding it again; the more accurate the i-th transformation network, the smaller its value.
  • the above steps 502-503 are repeated until the difference between the value of each network's loss function and the ideal value is smaller than a preset value; the preset value is a default parameter of the server, and may also be a manually set value.
  • the above steps 502-504 first adjust the parameters of the i-th discriminant network using the i-th image set; since adjusting the parameters of the discriminant network affects the values of the loss functions of the decoding network, the reconstruction network, and the i-th transformation network, the parameters of those networks are adjusted in turn, which in turn affects the value of the loss function of the i-th discriminant network, so that this linked parameter adjustment is repeated to achieve iterative training of one branch of the adversarial network.
  • Each branch in the adversarial network can perform the operations performed in steps 502-504 to realize the training of the initialization network, thereby obtaining a neural network capable of performing multiple feature transformations for subsequent image processing.
  • all the above loss functions may also be weighted and summed to obtain the loss function L_G of the adversarial network, for example L_G = Σ_i L_adv,i + μ_cls Σ_i L^f_cls,i + μ_cyc (L_rec + Σ_i L_cyc,i)
  • the weight of the third type of loss function of the discriminant network is μ_cls
  • the weight of the sum of the loss functions of the decoding network and the multiple transformation networks is μ_cyc
  • when the difference between the value of L_G and the ideal value is smaller than a preset value, the training of the adversarial network is considered complete, where the weight of each loss function may be a value preset by the server.
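Assuming, as one possibility, a weighted total of the form L_G = Σ L_adv,i + μ_cls Σ L^f_cls,i + μ_cyc (L_rec + Σ L_cyc,i), combining the per-branch losses is simple arithmetic; the loss values and weights below are hypothetical:

```python
def total_loss(adv_losses, cls_f_losses, rec_loss, cyc_losses,
               mu_cls=1.0, mu_cyc=10.0):
    # L_G = sum(L_adv,i) + mu_cls * sum(L_cls,i^f)
    #       + mu_cyc * (L_rec + sum(L_cyc,i))
    return (sum(adv_losses)
            + mu_cls * sum(cls_f_losses)
            + mu_cyc * (rec_loss + sum(cyc_losses)))

# hypothetical loss values for a two-branch adversarial network
print(total_loss([0.5, 0.4], [0.2, 0.1], 0.05, [0.02, 0.03]))
```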
  • with the method provided in the embodiments of the present application, an initial adversarial network is constructed, the multiple discriminant networks are trained based on multiple image sets, and the adversarial network is then iteratively trained based on the training results of the multiple discriminant networks; after the adversarial network is trained, when an image transformation instruction is received, the original image is input into the trained adversarial network and the image-processed target image is output.
  • because the multiple discriminant networks and multiple transformation networks share the training data of the decoding network and the reconstruction network, the training data required for a GAN that performs multiple feature transformations is reduced, thereby shortening the time to train the GAN; further, by adjusting according to the value of each network's loss function, more accurate parameters of the adversarial network can be obtained, achieving precise feature transformation; further, when an image transformation instruction is received, only the transformation networks corresponding to the transformation requirement information are selected to perform image processing, optimizing the GAN's network architecture and resource configuration during both training and use.
  • steps in the embodiments of the present application are not necessarily executed in the order indicated by the step numbers. Unless clearly stated in this article, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least a part of the steps in each embodiment may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed and completed at the same time, but may be executed at different times. The execution of these sub-steps or stages The order is not necessarily sequential, but may be executed in turn or alternately with at least a part of other steps or sub-steps or stages of other steps.
  • FIG. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • the device includes:
  • the decoding module 701 is used to input the original image into the decoding network according to the image transformation instruction and output the first feature map of the original image, and the decoding network is used to extract the features of the image;
  • the transformation module 702 is configured to sequentially input the first feature map to multiple transformation networks corresponding to at least one transformation requirement information, and output a second feature map, and each transformation network is used for image transformation processing;
  • the reconstruction module 703 is configured to input the second feature map into a reconstruction network and output a target image.
  • the reconstruction network is used to reconstruct the input feature map into a two-dimensional image.
  • the device provided by the embodiment of the present application inputs the original image to the decoding network for feature extraction according to the image transformation instruction, and then sequentially inputs multiple transformation networks to realize transformation of multiple features, and then inputs the reconstruction network to reconstruct the target image, so that The larger the number of features, the whole need only undergo one decoding and one reconstruction, making the image processing process simple and smooth.
  • the transformation module 702 includes:
  • the determining unit is configured to determine, for each transformation network, a conditional tensor based on the transformation demand information corresponding to the transformation network, the conditional tensor having the same width and height as the input feature map corresponding to the transformation network;
  • the transformation unit is configured to transform the region corresponding to the transformation network in the feature map output from the previous transformation network based on the conditional tensor corresponding to the transformation network, and output the feature map of the transformation network.
  • the determining unit is further used to obtain a condition vector, where the condition vector is a row vector or a column vector;
  • the condition vector is extended to the same width and height as the input feature map to obtain the condition tensor.
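The expansion the determining unit performs can be sketched with `np.tile`: a length-d condition vector is copied at every spatial position of the input feature map (the one-hot vector and sizes here are hypothetical):

```python
import numpy as np

def expand_condition_vector(cond_vec, height, width):
    # copy the length-d vector at every spatial position,
    # giving a d x H x W condition tensor
    d = cond_vec.shape[0]
    return np.tile(cond_vec.reshape(d, 1, 1), (1, height, width))

cond = np.array([0, 0, 1, 0, 0], dtype=float)  # e.g. "third color" one-hot
tensor = expand_condition_vector(cond, height=4, width=6)
print(tensor.shape)  # (5, 4, 6)
```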
  • the transformation unit includes:
  • the connected subunit is used to connect the conditional tensor with the input feature map in the depth direction to obtain an extended tensor
  • the dimension reduction subunit is used to sequentially input the extended tensor into multiple convolutional layers until the depth of the extended tensor is reduced to the same depth as the input feature map, and the dimension reduced tensor is output;
  • a first obtaining subunit configured to obtain a mask according to the dimension reduction tensor, and the mask is used to indicate an area indicated by the transformation demand information in the input feature map;
  • the second obtaining subunit is configured to obtain the output feature map of the transformation network according to the dimension reduction tensor, the mask and the input feature map.
  • the first acquisition subunit is further configured to input the dimension reduction tensor into a target convolutional layer of the transformation network, and perform convolution processing on the dimension reduction tensor ;
  • the convolution processed tensor is input into an activation function, and the mask is output.
  • the activation function is used to perform nonlinear processing on the input tensor.
  • the second obtaining subunit is further configured to retain the area of the dimension-reduction tensor corresponding to the mask, and to replace the area of the dimension-reduction tensor outside the mask with the corresponding area in the input feature map, to obtain the output feature map.
  • FIG. 8 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application. Referring to Figure 8, the device includes:
  • the building module 801 is used to build an initial confrontation network, which includes a decoding network, multiple transformation networks, a reconstruction network, and multiple discriminant networks;
  • the training module 802 is used to train the multiple discriminant networks based on multiple image sets, and iteratively train the adversarial network based on the training results of the multiple discriminant networks;
  • the processing module 803 is used for inputting the original image into the trained adversarial network when receiving the image transformation instruction, and outputting the target image after image processing.
  • the device provided by the embodiment of the present application builds an initial confrontation network to train the multiple discriminant networks based on multiple image sets, and then iteratively trains the adversarial network according to the training results of the multiple discriminant networks until the confrontation After the network training is completed, when receiving the image transformation instruction, the original image is input into the trained adversarial network, and the target image after image processing is output. Because multiple discriminant networks and multiple transformation networks share the training data of the decoding network and the reconstruction network , Which shortens the training data required for the GAN that performs multiple feature transformations, thereby shortening the time to train the GAN.
  • the processing module 803 is further configured to, when receiving the image transformation instruction, obtain multiple transformation networks corresponding to the at least one transformation demand information according to at least one transformation demand information;
  • the original image is sequentially input to the decoding network, the plurality of transformation networks corresponding to the at least one transformation requirement information, and the reconstruction network, and the target image is output.
  • the training module 802 is further used to input an image set corresponding to the discriminant network for each discriminant network, and adjust the parameters of the discriminant network according to the value of the discriminant network's loss function;
  • the image processing apparatus provided in the above embodiments is illustrated only with the division of the above functional modules as an example.
  • in practice, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • the image processing apparatus and the image processing method embodiments provided in the above embodiments belong to the same concept. For the specific implementation process, see the image processing method embodiments, and details are not described here.
  • FIG. 9 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device includes a processor, memory, and network interface connected by a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system and may also store a computer program.
  • when the computer program is executed by the processor, the processor may be caused to implement an image processing method.
  • a computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor may be caused to execute the image processing method
  • a computer-readable storage medium is also provided, for example, a memory including instructions, which can be executed by a processor in the terminal to complete the image processing method in the foregoing embodiments.
  • the computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, or the like.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory-bus (Rambus) direct RAM (RDRAM), direct memory-bus dynamic RAM (DRDRAM), and memory-bus dynamic RAM (RDRAM).


Abstract

An embodiment of this application discloses an image processing method, apparatus, device, and storage medium. The method includes: inputting an original image into a decoding network according to an image transformation instruction and outputting a first feature map of the original image; sequentially inputting the first feature map into multiple transformation networks corresponding to at least one piece of transformation requirement information and outputting a second feature map; and inputting the second feature map into a reconstruction network and outputting a target image.

Description

Image processing method, apparatus, device, and storage medium
This application claims priority to Chinese Patent Application No. 201811457745.5, entitled "Image processing method, apparatus, device, and storage medium", filed with the China Patent Office on November 30, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of the Internet, and in particular to an image processing method, apparatus, device, and storage medium.
Background
With the rapid development and wide application of multimedia and network technologies, people use large amounts of image information in daily life and production activities. In many cases the images need to be processed, for example by transforming their colors.
At present, image processing can be implemented with a deep learning model based on generative adversarial networks (GAN). With GANs, when multiple features of an input image are to be transformed, a separate GAN is usually trained for each single feature, and the trained GANs are then applied to the input image one after another.
However, in the above process, the more features there are, the more training data is needed and the longer it takes to train the multiple GANs; multiple rounds of decoding and reconstruction are performed, making the image processing procedure tedious and lengthy.
Summary
According to various embodiments provided in this application, an image processing method, apparatus, device, and storage medium are provided.
In one aspect, an image processing method is provided, executed by a computer device, including:
inputting an original image into a decoding network according to an image transformation instruction and outputting a first feature map of the original image, the decoding network being used to extract features of an image;
sequentially inputting the first feature map into multiple transformation networks corresponding to at least one piece of transformation requirement information and outputting a second feature map, each transformation network being used for image transformation processing; and
inputting the second feature map into a reconstruction network and outputting a target image, the reconstruction network being used to reconstruct the input feature map into a two-dimensional image.
In one aspect, an image processing method is provided, executed by a computer device, including:
constructing an initialized adversarial network, the adversarial network including an image processing network and multiple discriminant networks, the image processing network including a decoding network, multiple transformation networks, and a reconstruction network;
training the multiple discriminant networks based on multiple image sets, and iteratively training the adversarial network based on the training results of the multiple discriminant networks; and
when an image transformation instruction is received, inputting an original image into the trained image processing network and outputting a target image after image processing.
In one aspect, an image processing apparatus is provided, including:
a decoding module, configured to input an original image into a decoding network according to an image transformation instruction and output a first feature map of the original image, the decoding network being used to extract features of an image;
a transformation module, configured to sequentially input the first feature map into multiple transformation networks corresponding to at least one piece of transformation requirement information and output a second feature map, each transformation network being used for image transformation processing; and
a reconstruction module, configured to input the second feature map into a reconstruction network and output a target image, the reconstruction network being used to reconstruct the input feature map into a two-dimensional image.
In one aspect, an image processing apparatus is provided, including:
a construction module, configured to construct an initialized adversarial network, the adversarial network including an image processing network and multiple discriminant networks, the image processing network including a decoding network, multiple transformation networks, and a reconstruction network;
a training module, configured to train the multiple discriminant networks based on multiple image sets and iteratively train the adversarial network based on the training results of the multiple discriminant networks; and
a processing module, configured to, when an image transformation instruction is received, input an original image into the trained image processing network and output a target image after image processing.
In one aspect, a computer device is provided, including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the image processing method of any of the above possible implementations.
In one aspect, one or more non-volatile storage media storing computer-readable instructions are provided, the computer-readable instructions, when executed by one or more processors, causing the one or more processors to perform the operations of the image processing method of any of the above possible implementations.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the drawings needed for describing the embodiments. Obviously, the drawings in the following description are only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of this application;
FIG. 2 is a flowchart of an image processing method provided by an embodiment of this application;
FIG. 3 is a schematic diagram of an image processing method provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of a transformation network provided by an embodiment of this application;
FIG. 5 is a flowchart of an image processing method provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of an adversarial network provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application; and
FIG. 9 is a schematic structural diagram of a computer device 102 provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are further described in detail below with reference to the drawings.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, AI is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making.
AI technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include several major directions such as computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to machine vision in which cameras and computers replace human eyes to recognize, track, and measure targets, with further graphics processing so that the computer output becomes an image more suitable for human eyes to observe or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build AI systems that can obtain information from images or multi-dimensional data. Computer vision technologies usually include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
Machine learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures to keep improving their own performance. Machine learning is the core of AI and the fundamental way to make computers intelligent, and its applications pervade all fields of AI. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of AI technology, AI has been researched and applied in many fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that with the development of technology, AI will be applied in more fields and play an increasingly important role.
The solutions provided by the embodiments of this application involve technologies such as the computer vision technology of AI, and are specifically described through the following embodiments:
FIG. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of this application. Referring to FIG. 1, the implementation environment may include at least one user device 101 and a computer device 102. An application client may be installed on the at least one user device 101; the application client may be any client capable of image processing. When the user device 101 detects a trigger operation of an image transformation instruction, it sends an image transformation instruction carrying an original image to the computer device 102, so that the computer device 102 performs image processing with multiple feature transformations on the original image according to the instruction.
The computer device 102 may be a server capable of providing an image processing service. The server may train the processing capability of a GAN with multiple image sets and then implement image processing with the trained GAN. The computer device 102 may maintain a training database; whenever it receives an image transformation instruction, it stores the original image carried by the instruction into an image set in the training database, to maintain and accumulate training data. The computer device 102 may also be a terminal.
In some embodiments, a GAN includes a decoding network, a transformation network, a reconstruction network, and a discriminant network. By adjusting the parameters of each network, an output image with a certain feature transformed can be obtained from an input image through the GAN. When a GAN model is used for image processing and multiple features of the input image are to be transformed, a GAN can be trained for each single feature, and the trained GANs applied to the input image one after another; that is, the input image is first decoded by the decoding network, then transformed by the transformation network, and finally reconstructed by the reconstruction network, after which it enters the next GAN to repeat the above process, until an output image with all the above feature transformations applied is obtained. With this method, however, the more features there are, the more training data is needed, the longer training the multiple GANs takes, and multiple rounds of decoding and reconstruction are performed, making image processing tedious and lengthy. Therefore, the original image can be input into a decoding network for feature extraction, then sequentially into multiple transformation networks to transform multiple features, and then into a reconstruction network to be reconstructed into a target image, so that no matter how many features there are, only one decoding and one reconstruction are needed overall, making the image processing procedure simple and smooth.
FIG. 2 is a flowchart of an image processing method provided by an embodiment of this application. Referring to FIG. 2, the computer device 102 is described below as a server providing an image processing service. The embodiment includes:
201. The server inputs an original image into a decoding network according to an image transformation instruction and outputs a first feature map of the original image, the decoding network being used to extract features of an image.
In a possible implementation, the original image is the image to be processed. The image transformation instruction instructs the server to perform image transformation on the original image it carries, for example transforming the shape of facial features or the color of hair. The image transformation instruction may be sent by a user device through an application client, or triggered by default by the server during training; the embodiments of this application do not specifically limit how the image transformation instruction is obtained.
In some embodiments, the image transformation instruction may carry an image to be processed, which the server takes as the original image. The original image may have multiple transformable features; for example, the features may be hair, facial features, accessories, and so on.
In some embodiments, the server may also obtain the original image from a locally stored database, randomly or according to a preset rule; the embodiments of this application do not specifically limit how the original image is obtained. Optionally, the preset rule may be to obtain high-resolution original images first, or to obtain portraits first; the embodiments of this application do not specifically limit the preset rule.
The decoding network may extract the features of the image through a first target number of convolutional layers; the first target number may be preset, or adjusted during training of the decoding network, and the embodiments of this application do not specifically limit its source.
Taking the first convolutional layer the original image enters as an example, assume the original image has height H_0, width W_0, and depth D_0, where H_0, W_0, and D_0 are positive integers. The depth D_0 may be the number of channels of the original image; for example, with RGB channels D_0 = 3, namely the red, green, and blue channels. At initialization, at least the following hyperparameters are preset for the first convolutional layer: kernel size F and number of kernels K. Each convolution kernel (filter) indicates the weights used when weighting the features of the original image, and may be a weight matrix of size F*F*D_0; the depth of each kernel matches the depth of the original image. F is a positive integer smaller than the minimum of H_0 and W_0, and the number of kernels K is a positive integer indicating the number of features the first convolutional layer can extract, i.e., the number of feature maps output by the first convolutional layer is also K. During GAN training, the parameters of each kernel are adjusted according to the deviations indicated by the loss functions of the networks, and the final matrix values of the kernels are obtained when training is complete. In some embodiments, the original image may also be single-channel; the embodiments of this application do not limit the number of channels of the original image.
Optionally, to increase the processing speed of the convolution operation, a stride S may also be specified at initialization, where S may be any positive integer greater than or equal to 1 and less than or equal to the minimum of H_0 and W_0. Optionally, to extract the edge information of the original image more accurately, a boundary padding P may also be specified at initialization, where P is an integer greater than or equal to 0 indicating the number of zero-padding layers around the original image; when S and P are not specified, S = 1 and P = 0 by default.
Based on the above parameters, the first convolutional layer performs a weighted convolution with stride S over each channel of the original image using the K kernels, yielding K first-layer feature maps. These K first-layer feature maps serve as the input of the second convolutional layer, i.e., the input depth of the second layer is K, so the depth of each kernel in the second layer is also K, and so on: the output image of each convolutional layer serves as the input image of the next, until after feature extraction by the first target number of convolutional layers a first feature map of height H, width W, and depth D is obtained, where H, W, and D are positive integers; the depth D of the first feature map may indicate the number of pieces of feature information extracted by the decoding network.
In a possible implementation, when presetting the hyperparameters, setting stride S = 1 and boundary padding P = (F - 1)/2 makes the feature map output by the first convolutional layer the same height and width as the original image; setting every convolutional layer in the decoding network to S = 1 and P = (F - 1)/2 makes the first feature map output by the decoding network satisfy H = H_0 and W = W_0.
202. The server obtains multiple transformation networks corresponding to at least one piece of transformation requirement information.
The server may obtain, from the existing transformation networks and according to at least one piece of transformation requirement information, the multiple transformation networks corresponding to that information. The at least one piece of transformation requirement information may be information carried by the image transformation instruction, information set by default by the server, or at least one piece of transformation requirement information corresponding to some transformation demand; the embodiments of this application do not specifically limit how it is obtained.
In the above process of obtaining multiple transformation networks, the server may obtain, in one pass and based on the at least one piece of transformation requirement information, all the transformation networks corresponding to it; optionally, the server may instead obtain the corresponding transformation network each time it processes one piece of transformation requirement information. The embodiments of this application do not specifically limit how the multiple transformation networks are obtained.
In some embodiments, after obtaining in one pass the multiple transformation networks corresponding to the at least one piece of transformation requirement information, the server may also order them. Optionally, the server may order them by the level of detail of the transformation requirement information, for example processing features with lower detail requirements first; the embodiments of this application do not specifically limit the ordering rule. The ordering may mean that the output of one transformation network serves as the input of the next.
Each piece of transformation requirement information indicates a demand to transform based on one feature category, and may correspond to one or more transformation networks; each trained transformation network can perform feature transformation according to the corresponding transformation requirement information. For example, if the transformation requirement information C_i is to turn the hair color green, the feature category is hair color and the transformation network T_i is used to turn the hair color in the input feature map green.
FIG. 3 is a schematic diagram of an image processing method provided by an embodiment of this application. Taking N obtained transformation networks as an example, referring to FIG. 3, the original image passes through the N transformation networks and a target image with N feature transformations is obtained. Steps 203-209 below take the i-th obtained transformation network as an example, where i is a positive integer less than or equal to N, and explain how the i-th transformation network performs the i-th feature transformation; during image processing, every transformation network has a similar feature transformation process, which will not be repeated here, until all the transformation requirement information has been transformed accordingly, the second feature map is output, and step 211 below is performed.
203. The server obtains an i-th condition vector, which is a row vector or a column vector.
FIG. 4 is a schematic structural diagram of a transformation network provided by an embodiment of this application. Referring to FIG. 4, the i-th transformation network may include n convolutional layers and a target convolutional layer, where n is a positive integer. The i-th condition vector indicates the target transformation requirement information for the i-th feature category; it may be an externally input parameter or generated from the target transformation requirement information, and it is a non-zero vector. The embodiments of this application do not limit how the i-th condition vector is obtained.
Based on the above example, the condition vector can represent the transformation type demanded by the image transformation. When the target transformation requirement information is to turn the hair color green, the length of the condition vector may indicate the number of hair colors the GAN can produce; for example, if the GAN can transform hair into 5 colors, the length of the condition vector is d_i = 5 and each position of the vector indicates one color. If the third position indicates green, the third entry of the condition vector is set to 1 and all other entries to 0, i.e., the condition vector is [0, 0, 1, 0, 0]. The embodiments of this application do not specifically limit the length of the condition vector.
204. The server expands the i-th condition vector to the same width W_{i-1} and height H_{i-1} as the feature map output by the (i-1)-th transformation network, obtaining an i-th condition tensor.
In a possible implementation, the i-th condition vector may first be copied W_{i-1} times along the width, expanding it into a two-dimensional matrix of size d_i*W_{i-1}, and this matrix then copied H_{i-1} times along the height, expanding it into a three-dimensional tensor of size d_i*H_{i-1}*W_{i-1}, i.e., the i-th condition tensor, where d_i, H_{i-1}, and W_{i-1} are positive integers.
Optionally, the i-th condition vector may instead first be expanded along the height and the resulting two-dimensional matrix then expanded along the width, or the i-th condition vector may be expanded along the width and height simultaneously. In some embodiments, the condition information need not be a row or column vector, as long as it can indicate the target transformation requirement information of the i-th feature category; for example, it may be the condition tensor itself, i.e., an externally input three-dimensional matrix may be obtained directly as the condition tensor without expanding a condition vector. The embodiments of this application do not limit how the i-th condition tensor is obtained.
205. The server connects the i-th condition tensor with the feature map output by the (i-1)-th transformation network in the depth direction, obtaining an i-th extension tensor.
Since the i-th condition tensor has the same width as the feature map output by the (i-1)-th transformation network, and the same height as well, the condition tensor and that feature map can be connected directly in the depth direction. Assuming the depth of the feature map output by the (i-1)-th transformation network is D_{i-1}, the size of the resulting extension tensor is (D_{i-1}+d_i)*H_{i-1}*W_{i-1}, where D_{i-1} is a positive integer.
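The depth-direction connection described above is a plain concatenation along the channel axis; a NumPy sketch with hypothetical sizes:

```python
import numpy as np

# (D_{i-1}, H, W) feature map and (d_i, H, W) condition tensor
feature_map = np.zeros((16, 8, 8))
cond_tensor = np.ones((5, 8, 8))

# connecting them in the depth direction yields a
# (D_{i-1} + d_i, H, W) extension tensor
extended = np.concatenate([feature_map, cond_tensor], axis=0)
print(extended.shape)  # (21, 8, 8)
```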
206. The server inputs the i-th extension tensor sequentially into multiple convolutional layers until the depth of the i-th extension tensor is reduced to the same depth as the feature map output by the (i-1)-th transformation network, and outputs an i-th dimension-reduction tensor.
In the above step, through the multiple convolutional layers of the i-th transformation network, the depth of the i-th extension tensor is reduced from D_{i-1}+d_i to D_{i-1}, yielding an i-th dimension-reduction tensor of size D_{i-1}*H_{i-1}*W_{i-1}. The internal structure of these convolutional layers may be similar to that of the convolutional layers of the decoding network in step 201, i.e., each contains convolution kernels; the hyperparameters of the layers may be the same or different, which will not be repeated here.
Optionally, residual blocks may also be introduced between the convolutional layers of the transformation network. For example, if the input of the j-th convolutional layer, i.e., the output of the (j-1)-th convolutional layer, is x_j, then the output of the j-th convolutional layer can be written as f_j(x_j); between the j-th and (j+1)-th convolutional layers a j-th residual block, expressed as f_j(x_j)+x_j, is introduced and used as the input of the (j+1)-th convolutional layer. Introducing residual blocks addresses the degradation problem of neural networks, so that the deeper the convolutional layers of the i-th transformation network, the better the image processing effect.
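The residual block f_j(x_j) + x_j described above can be sketched as follows; `conv_layer` is a hypothetical stand-in for the j-th convolutional layer (here a simple scaling, just to show the skip connection):

```python
import numpy as np

def residual_block(x, conv_layer):
    # the j-th residual block feeds f_j(x_j) + x_j to layer j+1,
    # so the layer only has to learn a correction to the identity
    return conv_layer(x) + x

# hypothetical convolutional layer: a scaling, for shape/flow illustration only
conv_layer = lambda x: 0.1 * x
x = np.ones((4, 8, 8))
out = residual_block(x, conv_layer)
print(out[0, 0, 0])  # 1.1
```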
In a possible implementation, the number of these convolutional layers may be a hyperparameter preset by the server, or a value adjusted during GAN training; the embodiments of this application do not specifically limit how this number is obtained.
207. The server inputs the i-th dimension-reduction tensor into the target convolutional layer of the i-th transformation network and performs convolution processing on the i-th dimension-reduction tensor.
In some embodiments, the internal structure of the target convolutional layer is also similar to that of the convolutional layers above, which again will not be repeated. Optionally, when the parameters of the multiple convolutional layers and the target convolutional layer of each transformation network are preset so that stride S = 1 and boundary padding P = (F - 1)/2, the second feature map output after the first feature map has been transformed by the multiple transformation networks keeps the same height and width as the original image.
208. The server inputs the convolved tensor into an activation function and outputs an i-th mask, the activation function being used to perform nonlinear processing on the input tensor.
The i-th mask indicates, in the feature map output by the (i-1)-th transformation network, the transformation region corresponding to the i-th piece of transformation requirement information; based on the above example, the i-th mask indicates the region representing hair in the feature map output by the (i-1)-th transformation network.
Optionally, the i-th mask may be a two-dimensional matrix of size H_{i-1}*W_{i-1}; it can then be expanded to the same depth as the i-th dimension-reduction tensor to facilitate the subsequent transformation. In some embodiments, the expanded mask may also be obtained directly by adjusting the parameters of the target convolutional layer.
Optionally, the activation function may be Sigmoid, tanh, ReLU, or the like, which performs nonlinear processing on the output of the target convolutional layer and thereby improves the GAN's ability to express transformation details; the embodiments of this application do not specifically limit the expression of the activation function.
209. The server retains the region of the i-th dimension-reduction tensor corresponding to the i-th mask, and replaces the region of the i-th dimension-reduction tensor outside the i-th mask with the corresponding region of the feature map output by the (i-1)-th transformation network, obtaining the feature map output by the i-th transformation network.
In the above step, denoting the feature map output by the (i-1)-th transformation network as f_{i-1}, the i-th dimension-reduction tensor as f_i', and the i-th mask as g_i, the feature map output by the i-th transformation network can be written as f_i = g_i*f_i' + (1 - g_i)*f_{i-1}; that is, only the region of f_{i-1} corresponding to the i-th piece of transformation requirement information undergoes feature transformation.
210. The server repeats the above steps 203-209 until every piece of transformation requirement information has been transformed accordingly, and outputs a second feature map.
Steps 203-209 above illustrate how the i-th transformation network applies the i-th piece of transformation requirement information; after steps 203 to 209 complete, image transformation can continue with the (i+1)-th transformation network. Through the serial connection of the transformation networks, the output of the previous transformation network serves as the input of the next, until the transformation network corresponding to every piece of transformation requirement information has performed its feature transformation; the image output by the last transformation network is the second feature map.
211. The server inputs the second feature map into a reconstruction network and outputs a target image, the reconstruction network being used to reconstruct the input feature map into a two-dimensional image.
In a possible implementation, the second feature map can be reconstructed into the target image through multiple convolutional layers in the reconstruction network; the target image is the original image after being processed according to the at least one piece of transformation requirement information.
With the method provided by this embodiment of this application, the server inputs the original image into the decoding network for feature extraction according to the image transformation instruction, then sequentially into multiple transformation networks to transform multiple features, and then into the reconstruction network to reconstruct the target image, so that no matter how many features there are, only one decoding and one reconstruction are needed overall, making the image processing process simple and smooth; further, by introducing condition vectors and masks, each transformation network can perform the transformation for its requirement information without a discriminant network participating, simplifying the GAN's network architecture; further, by setting the stride S and boundary padding P to appropriate values, the height and width of the output second feature map can be kept the same as the original image, avoiding loss of detail.
All the above optional technical solutions may be combined in any way to form optional embodiments of the present disclosure, which will not be described one by one here.
FIG. 5 is a flowchart of an image processing method provided by an embodiment of this application. Referring to FIG. 5, this embodiment includes:
501. The server constructs an initialized adversarial network, which includes a decoding network, multiple transformation networks, a reconstruction network, and multiple discriminant networks.
The decoding network, the multiple transformation networks, and the reconstruction network may be called an image processing network, which is used to process images. Optionally, FIG. 6 is a schematic structural diagram of an adversarial network provided by an embodiment of this application. Referring to FIG. 6, when constructing the initialized adversarial network, multiple transformation networks may be connected in parallel after the decoding network, the reconstruction network after the multiple transformation networks, and the multiple discriminant networks in parallel after the reconstruction network, where each discriminant network corresponds one-to-one with a transformation network among the multiple transformation networks.
In the above process, since the multiple transformation networks and multiple discriminant networks are connected to the same decoding network and reconstruction network during training, the training data of the decoding network and the reconstruction network can be shared during training, shortening the data required for training and optimizing the resource allocation of the adversarial network.
Optionally, when constructing the initialized adversarial network, the hyperparameters of each network may also be preset; different transformation networks may be set to the same or different initialization values, and likewise for different discriminant networks, which will not be repeated here. The embodiments of this application do not specifically limit the initialization values of the hyperparameters of each network in the initialization process.
502. The server inputs an i-th image set into an i-th discriminant network, and adjusts the parameters of the i-th discriminant network according to the value of the loss function of the i-th discriminant network.
Optionally, the i-th discriminant network is any one of the multiple discriminant networks. Steps 502-504 describe, as an example, the training of one branch of the adversarial network, namely the branch composed of the decoding network, the i-th transformation network, the reconstruction network, and the i-th discriminant network; any branch of the adversarial network has similar training steps, and every branch of the adversarial network shares the training data of the decoding network and the reconstruction network, which will not be repeated here.
In some embodiments, each image set may correspond to a feature category, and each feature category to a discriminant network; each image set may include a true sample set and a false sample set. Taking the image set corresponding to a first feature category as an example, the true sample set may be images that have undergone the transformation of the first feature category, and the false sample set may be images that have undergone the transformation of a second feature category, where the second feature category may be any feature category in the transformable category group other than the first feature category. Through training of the discriminant network, the discriminant network can output 1 for the true sample set and 0 for the false sample set, thereby achieving the discrimination function.
In training the GAN, the i-th discriminant network may be used to judge the output after processing by the decoding network, the i-th transformation network, and the reconstruction network, so that the parameters of each network are adjusted jointly to obtain an optimized GAN; once training is complete, the multiple discriminant networks need not be used during image processing.
Optionally, for the i-th discriminant network the loss functions may include three types. The first type may be the function L_adv,i with which the i-th discriminant network D_i judges whether the image output after processing by the decoding network E, the i-th transformation network T_i, and the reconstruction network R is real; this function may take a standard adversarial form such as
L_adv,i = E_y[log D_i(y)] + E_x[log(1 - D_i(R(T_i(E(x)))))] ,
where y is an image in the true sample set and x is the image generated by the GAN network; the more realistic the image reconstructed by the GAN, the smaller the value of L_adv,i, i.e., the smaller the loss.
The second type of loss function may be the function L^r_cls,i that classifies the features of the images in the true sample set during discrimination, which may take a form such as
L^r_cls,i = E_y[-log D_cls,i(c_i | y)] ,
where c_i is the feature category corresponding to the i-th discriminant network; the more accurately the discriminant network classifies the features, the smaller the value of L^r_cls,i, i.e., the smaller the loss.
The third type of loss function may be the function L^f_cls,i that classifies the features of the images generated by the GAN network during discrimination, which may take a form such as
L^f_cls,i = E_x[-log D_cls,i(c_i | R(T_i(E(x))))] ;
the more accurately the discriminant network classifies the features, the smaller the value of L^f_cls,i, i.e., the smaller the loss.
503. The server adjusts the parameters of the decoding network, the reconstruction network, and the i-th transformation network according to the values of the loss functions of the decoding network, the reconstruction network, and the i-th transformation network.
Optionally, the loss function L_rec of the reconstruction network and the decoding network may take a form such as
L_rec = E_x[||R(E(x)) - x||_1] ,
which represents the loss between the original image and the image obtained by passing the original image through the decoding network and then directly into the reconstruction network; the more accurate the reconstruction and decoding networks, the smaller the value of L_rec.
Optionally, the loss function L_cyc,i of the i-th transformation network may take a form such as
L_cyc,i = E_x[||T_i(E(x)) - E(R(T_i(E(x))))||_1] ,
which represents the loss between the feature map output by the i-th transformation network and the feature map obtained by reconstructing that output and decoding it again; the more accurate the i-th transformation network, the smaller the value of L_cyc,i.
504. Steps 502-503 above are repeated until the difference between the value of each network's loss function and the ideal value is smaller than a preset value.
Optionally, the preset value is a default parameter of the server, and may also be a manually set value. Steps 502-504 above first adjust the parameters of the i-th discriminant network using the i-th image set; since adjusting the parameters of the discriminant network affects the values of the loss functions of the decoding network, the reconstruction network, and the i-th transformation network, the parameters of those networks are adjusted in turn, which in turn affects the value of the loss function of the i-th discriminant network, so that this linked parameter adjustment is repeated to achieve iterative training of one branch of the adversarial network. For every branch of the adversarial network, the operations of steps 502-504 can be performed to train the initialized network, yielding a neural network capable of multiple feature transformations for subsequent image processing.
In some embodiments, all the above loss functions may also be weighted and summed to obtain the loss function L_G of the adversarial network, for example
L_G = Σ_i L_adv,i + μ_cls Σ_i L^f_cls,i + μ_cyc (L_rec + Σ_i L_cyc,i) ,
where the weight of the third type of discriminant loss is μ_cls and the weight of the summed losses of the decoding network and the multiple transformation networks is μ_cyc. In a possible implementation, when the difference between the value of L_G and the ideal value is smaller than a preset value, the adversarial network is considered trained; the weight of each loss function may be a value preset by the server.
505. When an image transformation instruction is received, obtain, according to at least one piece of transformation requirement information, the multiple transformation networks corresponding to that information.
506. Input the original image sequentially into the decoding network, the multiple transformation networks corresponding to the at least one piece of transformation requirement information, and the reconstruction network, and output the target image.
In steps 505-506 above, image processing with multiple feature transformations is performed on the original image through the trained GAN, outputting the target image; an optional implementation process has been detailed in the previous embodiment and will not be repeated here.
With the method provided by this embodiment of this application, an initialized adversarial network is constructed, the multiple discriminant networks are trained based on multiple image sets, and the adversarial network is then iteratively trained based on the training results of the multiple discriminant networks; after the adversarial network is trained, when an image transformation instruction is received, the original image is input into the trained adversarial network and the image-processed target image is output. Because the multiple discriminant networks and multiple transformation networks share the training data of the decoding network and the reconstruction network, the training data required for a GAN performing multiple feature transformations is reduced, shortening the time to train the GAN. Further, by adjusting according to the values of each network's loss function, more accurate parameters of the adversarial network can be obtained, achieving precise feature transformation. Further, when an image transformation instruction is received, the transformation networks corresponding to the transformation requirement information are selected to perform image processing, optimizing the GAN's network architecture and resource configuration during both training and use.
All the above optional technical solutions may be combined in any way to form optional embodiments of the present disclosure, which will not be described one by one here.
应该理解的是,本申请各实施例中的各个步骤并不是必然按照步骤标号指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,各实施例中至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
图7是本申请实施例提供的图像处理装置的结构示意图。参见图7,该装置包括:
解码模块701,用于根据图像变换指令,将原始图像输入解码网络,输出该原始图像的第一特征图,该解码网络用于提取图像的特征;
变换模块702,用于将该第一特征图依次输入与至少一个变换需求信息对应的多个变换网络,输出第二特征图,每个变换网络用于进行图像变换处理;
重建模块703,用于将该第二特征图输入重建网络,输出目标图像,该重建网络用于将输入的特征图重建为二维图像。
本申请实施例提供的装置，根据图像变换指令，将原始图像输入解码网络进行特征提取，再依次输入多个变换网络，实现对多个特征的变换，再输入重建网络重建为目标图像。即使特征数量较多时，整体也只需经过一次解码和一次重建，使得图像处理的过程变得简洁流畅。
在一种可能实施方式中,基于图7的装置组成,该变换模块702包括:
确定单元,用于对于每个变换网络,根据该变换网络对应的变换需求信息,确定条件张量,该条件张量与该变换网络对应的输入特征图的宽度和高度相同;
变换单元,用于基于该变换网络对应的条件张量,对上一个变换网络输出的特征图中该变换网络对应的区域进行变换,输出该变换网络的特征图。
在一种可能实施方式中,该确定单元还用于获取条件向量,该条件向量为行向量或列向量;
将该条件向量拓展到与该输入特征图的宽度和高度相同,得到该条件张量。
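确定单元所做的"将条件向量拓展到与输入特征图的宽度和高度相同"，可以用numpy示意如下（make_condition_tensor为本示例假设的函数名）：

```python
import numpy as np

def make_condition_tensor(cond_vec, height, width):
    # 将长度为 d 的条件向量沿空间维度复制，
    # 得到形状为 (H, W, d)、与输入特征图宽高相同的条件张量
    cond_vec = np.asarray(cond_vec, dtype=float)
    return np.tile(cond_vec.reshape(1, 1, -1), (height, width, 1))
```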
在一种可能实施方式中,基于图7的装置组成,该变换单元包括:
相连子单元,用于将该条件张量与该输入特征图在深度方向上相连,得到扩展张量;
降维子单元,用于将该扩展张量依次输入多个卷积层,直到该扩展张量的深度降维到与该输入特征图的深度相同,输出降维张量;
第一获取子单元,用于根据该降维张量获取掩膜,该掩膜用于指示在该输入特征图中该变换需求信息所指示的区域;
第二获取子单元,用于根据该降维张量、该掩膜和该输入特征图,获取该变换网络的输出特征图。
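相连子单元在深度方向（通道维）上拼接条件张量与输入特征图的操作，可以示意为如下numpy草图（函数名为假设）：

```python
import numpy as np

def concat_depth(input_fm, cond_tensor):
    # 在最后一维（深度方向）将输入特征图与条件张量相连，得到扩展张量；
    # 后续多个卷积层再将其深度降回输入特征图的深度（此处不展开）
    return np.concatenate([input_fm, cond_tensor], axis=-1)
```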
在一种可能实施方式中,基于图7的装置组成,该第一获取子单元还用于将该降维张量输入该变换网络的目标卷积层,对该降维张量进行卷积处理;
将卷积处理后的张量输入激活函数,输出该掩膜,该激活函数用于对输入的张量进行非线性处理。
在一种可能实施方式中,基于图7的装置组成,该第二获取子单元还用于将该降维张量中与该掩膜对应的区域保留,将该降维张量中除了该掩膜外的区域替换为该输入特征图中的相应区域,得到该输出特征图。
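第一获取子单元与第二获取子单元的配合——先经激活函数得到取值在(0,1)的掩膜，再按掩膜融合降维张量与输入特征图——可以示意如下；两个函数均为假设性草图，省略了目标卷积层的卷积处理：

```python
import numpy as np

def sigmoid(x):
    # 假设的激活函数：对目标卷积层输出做非线性处理，得到 (0,1) 区间的掩膜
    return 1.0 / (1.0 + np.exp(-x))

def merge_with_mask(reduced, mask, input_fm):
    # 掩膜指示的区域保留降维张量，其余区域替换回输入特征图的相应区域
    return mask * reduced + (1.0 - mask) * input_fm
```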
上述所有可选技术方案,可以采用任意结合形成本公开的可选实施例,在此不再一一赘述。
图8是本申请实施例提供的图像处理装置的结构示意图。参见图8,该装置包括:
构建模块801，用于构建初始化的对抗网络，该对抗网络包括解码网络、多个变换网络、重建网络和多个判别网络；
训练模块802,用于根据多个图像集,训练该多个判别网络,根据该多个判别网络的训练结果,迭代训练该对抗网络;
处理模块803,用于当接收图像变换指令时,将原始图像输入训练完毕的对抗网络,输出经过图像处理的目标图像。
本申请实施例提供的装置，通过构建初始化的对抗网络，从而根据多个图像集，训练该多个判别网络，再根据该多个判别网络的训练结果，迭代训练该对抗网络；直到对该对抗网络训练完毕后，当接收图像变换指令时，将原始图像输入训练完毕的对抗网络，输出经过图像处理的目标图像。由于多个判别网络和多个变换网络共享了解码网络和重建网络的训练数据，减少了进行多个特征变换的GAN所需的训练数据，从而缩短了训练GAN的时间。
在一种可能实施方式中,该处理模块803还用于当接收该图像变换指令时,根据至少一个变换需求信息,获取与该至少一个变换需求信息对应的多个变换网络;
将该原始图像依次输入该解码网络、该与该至少一个变换需求信息对应的多个变换网络和该重建网络,输出该目标图像。
在一种可能实施方式中,该训练模块802还用于对每个判别网络,输入与该判别网络对应的图像集,根据该判别网络的损失函数的数值,调整该判别网络的参数;
根据该解码网络、该重建网络以及与该判别网络对应的变换网络的损失函数的数值,调整该解码网络、该重建网络以及该变换网络的参数;
重复执行上述调整该判别网络的参数,以及调整该解码网络、该重建网络以及该变换网络的参数的步骤,直到各个网络的损失函数的数值与理想值的差值小于预设值。
上述所有可选技术方案,可以采用任意结合形成本公开的可选实施例,在此不再一一赘述。
需要说明的是：上述实施例提供的图像处理装置在进行图像处理时，仅以上述各功能模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能模块完成，即将设备的内部结构划分成不同的功能模块，以完成以上描述的全部或者部分功能。另外，上述实施例提供的图像处理装置与图像处理方法实施例属于同一构思，其具体实现过程详见图像处理方法实施例，这里不再赘述。
图9示出了一个实施例中计算机设备的内部结构图。如图9所示，该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中，存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统，还可存储有计算机程序，该计算机程序被处理器执行时，可使得处理器实现图像处理方法。该内存储器中也可储存有计算机程序，该计算机程序被处理器执行时，可使得处理器执行图像处理方法。
在示例性实施例中，还提供了一种计算机可读存储介质，例如包括指令的存储器，上述指令可由终端中的处理器执行以完成上述实施例中的图像处理方法。例如，该计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
本领域普通技术人员可以理解，实现上述实施例方法中的全部或部分流程，可以通过硬件来完成，也可以通过计算机程序指令相关的硬件来完成，计算机程序可存储于一非易失性计算机可读取存储介质中，该计算机程序在执行时，可包括如上述各方法的实施例的流程。其中，本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用，均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限，RAM以多种形式可得，诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDR SDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上所述仅为本申请的较佳实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (21)

  1. 一种图像处理方法,由计算机设备执行,包括:
    根据图像变换指令,将原始图像输入解码网络,输出所述原始图像的第一特征图,所述解码网络用于提取图像的特征;
    将所述第一特征图依次输入与至少一个变换需求信息对应的多个变换网络,输出第二特征图,每个变换网络用于进行图像变换处理;及
    将所述第二特征图输入重建网络,输出目标图像,所述重建网络用于将输入的特征图重建为二维图像。
  2. 根据权利要求1所述的方法,其特征在于,所述将所述第一特征图依次输入与至少一个变换需求信息对应的多个变换网络,输出第二特征图包括:
    对于每个变换网络,根据所述变换网络对应的变换需求信息,确定条件张量;及
    基于所述变换网络对应的条件张量,对上一个变换网络输出的特征图中所述变换网络对应的区域进行变换,输出所述变换网络的特征图。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述变换网络对应的变换需求信息,确定条件张量包括:
    获取条件向量;及
    将所述条件向量拓展到与所述变换网络对应的输入特征图的宽度和高度相同,得到所述条件张量。
  4. 根据权利要求2所述的方法,其特征在于,所述基于所述变换网络对应的条件张量,对上一个变换网络输出的特征图中所述变换网络对应的区域进行变换,输出所述变换网络的特征图包括:
    将所述条件张量与所述输入特征图在深度方向上相连,得到扩展张量;
    将所述扩展张量依次输入多个卷积层,直到所述扩展张量的深度降维到与所述输入特征图的深度相同,输出降维张量;
    根据所述降维张量获取掩膜,所述掩膜用于指示在所述输入特征图中所述变换需求信息所指示的区域;及
    根据所述降维张量、所述掩膜和所述输入特征图,获取所述变换网络的输出特征图。
  5. 根据权利要求4所述的方法，其特征在于，所述根据所述降维张量获取掩膜包括：
    将所述降维张量输入所述变换网络的目标卷积层,对所述降维张量进行卷积处理;及
    将卷积处理后的张量输入激活函数,输出所述掩膜,所述激活函数用于对输入的张量进行非线性处理。
  6. 根据权利要求4所述的方法,其特征在于,所述根据所述降维张量、所述掩膜和所述输入特征图,获取所述变换网络的输出特征图包括:
    将所述降维张量中与所述掩膜对应的区域保留,将所述降维张量中除了所述掩膜外的区域替换为所述输入特征图中的相应区域,得到所述输出特征图。
  7. 一种图像处理方法,包括:
    构建初始化的对抗网络,所述对抗网络包括图像处理网络以及多个判别网络,所述图像处理网络包括解码网络、多个变换网络和重建网络;
    根据多个图像集,训练所述多个判别网络,根据所述多个判别网络的训练结果,迭代训练所述对抗网络;及
    当接收图像变换指令时,将原始图像输入训练完毕的图像处理网络,输出经过图像处理的目标图像。
  8. 根据权利要求7所述的方法,其特征在于,所述当接收图像变换指令时,将原始图像输入训练完毕的图像处理网络,输出经过图像变换的目标图像包括:
    当接收所述图像变换指令时,根据至少一个变换需求信息,获取与所述至少一个变换需求信息对应的多个变换网络;及
    将所述原始图像依次输入所述解码网络、所述与所述至少一个变换需求信息对应的多个变换网络和所述重建网络,输出所述目标图像。
  9. 根据权利要求7所述的方法,其特征在于,所述根据多个图像集,训练所述多个判别网络,根据所述多个判别网络的训练结果,迭代训练所述对抗网络包括:
    对每个判别网络,输入与所述判别网络对应的图像集,根据所述判别网络的损失函数的数值,调整所述判别网络的参数;
    根据所述解码网络、所述重建网络以及与所述判别网络对应的变换网络的损失函数的数值,调整所述解码网络、所述重建网络以及所述变换网络的参数;及
    重复执行上述调整所述判别网络的参数,以及调整所述解码网络、所述重建网络以及所述变换网络的参数的步骤,直到各个网络的损失函数的数值与理想值的差值小于预设值。
  10. 一种图像处理装置,包括:
    解码模块,用于根据图像变换指令,将原始图像输入解码网络,输出所述原始图像的第一特征图,所述解码网络用于提取图像的特征;
    变换模块,用于将所述第一特征图依次输入与至少一个变换需求信息对应的多个变换网络,输出第二特征图,每个变换网络用于进行图像变换处理;及
    重建模块,用于将所述第二特征图输入重建网络,输出目标图像,所述重建网络用于将输入的特征图重建为二维图像。
  11. 一种图像处理装置,包括:
    构建模块,用于构建初始化的对抗网络,所述对抗网络包括图像处理网络以及多个判别网络,所述图像处理网络包括解码网络、多个变换网络和重建网络;
    训练模块,用于根据多个图像集,训练所述多个判别网络,根据所述多个判别网络的训练结果,迭代训练所述对抗网络;及
    处理模块,用于当接收图像变换指令时,将原始图像输入训练完毕的图像处理网络,输出经过图像处理的目标图像。
  12. 一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行以下步骤:
    根据图像变换指令,将原始图像输入解码网络,输出所述原始图像的第一特征图,所述解码网络用于提取图像的特征;
    将所述第一特征图依次输入与至少一个变换需求信息对应的多个变换网络,输出第二特征图,每个变换网络用于进行图像变换处理;及
    将所述第二特征图输入重建网络,输出目标图像,所述重建网络用于将输入的特征图重建为二维图像。
  13. 根据权利要求12所述的设备,其特征在于,所述将所述第一特征图依次输入与至少一个变换需求信息对应的多个变换网络,输出第二特征图包括:
    对于每个变换网络，根据所述变换网络对应的变换需求信息，确定条件张量；及
    基于所述变换网络对应的条件张量,对上一个变换网络输出的特征图中所述变换网络对应的区域进行变换,输出所述变换网络的特征图。
  14. 根据权利要求13所述的设备,其特征在于,所述根据所述变换网络对应的变换需求信息,确定条件张量包括:
    获取条件向量;及
    将所述条件向量拓展到与所述变换网络对应的输入特征图的宽度和高度相同,得到所述条件张量。
  15. 根据权利要求13所述的设备,其特征在于,所述基于所述变换网络对应的条件张量,对上一个变换网络输出的特征图中所述变换网络对应的区域进行变换,输出所述变换网络的特征图包括:
    将所述条件张量与所述输入特征图在深度方向上相连,得到扩展张量;
    将所述扩展张量依次输入多个卷积层,直到所述扩展张量的深度降维到与所述输入特征图的深度相同,输出降维张量;
    根据所述降维张量获取掩膜,所述掩膜用于指示在所述输入特征图中所述变换需求信息所指示的区域;及
    根据所述降维张量、所述掩膜和所述输入特征图,获取所述变换网络的输出特征图。
  16. 根据权利要求15所述的设备,其特征在于,所述根据所述降维张量获取掩膜包括:
    将所述降维张量输入所述变换网络的目标卷积层,对所述降维张量进行卷积处理;及
    将卷积处理后的张量输入激活函数,输出所述掩膜,所述激活函数用于对输入的张量进行非线性处理。
  17. 根据权利要求15所述的设备,其特征在于,所述根据所述降维张量、所述掩膜和所述输入特征图,获取所述变换网络的输出特征图包括:
    将所述降维张量中与所述掩膜对应的区域保留,将所述降维张量中除了所述掩膜外的区域替换为所述输入特征图中的相应区域,得到所述输出特征图。
  18. 一种计算机设备，包括存储器和处理器，所述存储器中存储有计算机可读指令，所述计算机可读指令被所述处理器执行时，使得所述处理器执行以下步骤：
    构建初始化的对抗网络,所述对抗网络包括图像处理网络以及多个判别网络,所述图像处理网络包括解码网络、多个变换网络和重建网络;
    根据多个图像集,训练所述多个判别网络,根据所述多个判别网络的训练结果,迭代训练所述对抗网络;及
    当接收图像变换指令时,将原始图像输入训练完毕的图像处理网络,输出经过图像处理的目标图像。
  19. 根据权利要求18所述的设备,其特征在于,所述当接收图像变换指令时,将原始图像输入训练完毕的图像处理网络,输出经过图像变换的目标图像包括:
    当接收所述图像变换指令时,根据至少一个变换需求信息,获取与所述至少一个变换需求信息对应的多个变换网络;及
    将所述原始图像依次输入所述解码网络、所述与所述至少一个变换需求信息对应的多个变换网络和所述重建网络,输出所述目标图像。
  20. 根据权利要求18所述的设备,其特征在于,所述根据多个图像集,训练所述多个判别网络,根据所述多个判别网络的训练结果,迭代训练所述对抗网络包括:
    对每个判别网络,输入与所述判别网络对应的图像集,根据所述判别网络的损失函数的数值,调整所述判别网络的参数;
    根据所述解码网络、所述重建网络以及与所述判别网络对应的变换网络的损失函数的数值,调整所述解码网络、所述重建网络以及所述变换网络的参数;及
    重复执行上述调整所述判别网络的参数,以及调整所述解码网络、所述重建网络以及所述变换网络的参数的步骤,直到各个网络的损失函数的数值与理想值的差值小于预设值。
  21. 一个或多个存储有计算机可读指令的非易失性存储介质，所述计算机可读指令被一个或多个处理器执行时，使得一个或多个处理器执行如权利要求1至6或权利要求7至9中任一项所述的图像处理方法。
PCT/CN2019/119087 2018-11-30 2019-11-18 图像处理方法、装置、设备及存储介质 WO2020108336A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/191,611 US11798145B2 (en) 2018-11-30 2021-03-03 Image processing method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811457745.5A CN109361934B (zh) 2018-11-30 2018-11-30 图像处理方法、装置、设备及存储介质
CN201811457745.5 2018-11-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/191,611 Continuation US11798145B2 (en) 2018-11-30 2021-03-03 Image processing method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020108336A1 true WO2020108336A1 (zh) 2020-06-04

Family

ID=65330739

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119087 WO2020108336A1 (zh) 2018-11-30 2019-11-18 图像处理方法、装置、设备及存储介质

Country Status (3)

Country Link
US (1) US11798145B2 (zh)
CN (2) CN113902921B (zh)
WO (1) WO2020108336A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666919A (zh) * 2020-06-24 2020-09-15 腾讯科技(深圳)有限公司 一种对象识别方法、装置、计算机设备和存储介质
CN112785687A (zh) * 2021-01-25 2021-05-11 Oppo广东移动通信有限公司 图像处理方法、装置、电子设备和可读存储介质
CN113159295A (zh) * 2021-04-27 2021-07-23 瀚博半导体(上海)有限公司 基于硬件加速器的张量处理方法和系统

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN113902921B (zh) * 2018-11-30 2022-11-25 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质
CN110458164A (zh) * 2019-08-07 2019-11-15 深圳市商汤科技有限公司 图像处理方法、装置、设备及计算机可读存储介质
CN110868598B (zh) * 2019-10-17 2021-06-22 上海交通大学 基于对抗生成网络的视频内容替换方法及系统
CN111414852A (zh) * 2020-03-19 2020-07-14 驭势科技(南京)有限公司 图像预测及车辆行为规划方法、装置和系统及存储介质
US11972348B2 (en) * 2020-10-30 2024-04-30 Apple Inc. Texture unit circuit in neural network processor
US20230196087A1 (en) * 2021-10-26 2023-06-22 Tencent America LLC Instance adaptive training with noise robust losses against noisy labels
CN114993677B (zh) * 2022-05-11 2023-05-02 山东大学 不平衡小样本数据的滚动轴承故障诊断方法及系统

Citations (6)

Publication number Priority date Publication date Assignee Title
US20180018539A1 (en) * 2016-07-12 2018-01-18 Beihang University Ranking convolutional neural network constructing method and image processing method and apparatus thereof
CN108305238A (zh) * 2018-01-26 2018-07-20 腾讯科技(深圳)有限公司 图像处理方法、装置、存储介质和计算机设备
CN108564127A (zh) * 2018-04-19 2018-09-21 腾讯科技(深圳)有限公司 图像转换方法、装置、计算机设备及存储介质
CN108596330A (zh) * 2018-05-16 2018-09-28 中国人民解放军陆军工程大学 一种并行特征全卷积神经网络及其构建方法
CN108765261A (zh) * 2018-04-13 2018-11-06 北京市商汤科技开发有限公司 图像变换方法和装置、电子设备、计算机存储介质、程序
CN109361934A (zh) * 2018-11-30 2019-02-19 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质

Family Cites Families (18)

Publication number Priority date Publication date Assignee Title
CN101640803B (zh) * 2009-09-04 2012-05-09 中国科学技术大学 一种用于多光谱图像的渐进的分布式编解码方法及装置
CN103905522B (zh) * 2013-07-12 2017-08-25 青岛龙凯信息科技有限公司 基于云计算环境的图像身份对比监测识别方法
US20160180214A1 (en) * 2014-12-19 2016-06-23 Google Inc. Sharp discrepancy learning
US9589210B1 (en) * 2015-08-26 2017-03-07 Digitalglobe, Inc. Broad area geospatial object detection using autogenerated deep learning models
CN105872555B (zh) * 2016-03-25 2019-01-15 中国人民武装警察部队工程大学 一种针对h.264视频运动矢量信息嵌入的隐写分析算法
CN107103590B (zh) * 2017-03-22 2019-10-18 华南理工大学 一种基于深度卷积对抗生成网络的图像反射去除方法
CN106951867B (zh) * 2017-03-22 2019-08-23 成都擎天树科技有限公司 基于卷积神经网络的人脸识别方法、装置、系统及设备
CN106952239A (zh) * 2017-03-28 2017-07-14 厦门幻世网络科技有限公司 图像生成方法和装置
CN107154023B (zh) * 2017-05-17 2019-11-05 电子科技大学 基于生成对抗网络和亚像素卷积的人脸超分辨率重建方法
CN107330954A (zh) * 2017-07-14 2017-11-07 深圳市唯特视科技有限公司 一种基于衰减网络通过滑动属性操纵图像的方法
CN107437077A (zh) * 2017-08-04 2017-12-05 深圳市唯特视科技有限公司 一种基于生成对抗网络的旋转面部表示学习的方法
CN107886491A (zh) * 2017-11-27 2018-04-06 深圳市唯特视科技有限公司 一种基于像素最近邻的图像合成方法
CN107945282B (zh) * 2017-12-05 2021-01-29 洛阳中科信息产业研究院(中科院计算技术研究所洛阳分所) 基于对抗网络的快速多视角三维合成和展示方法及装置
KR102421856B1 (ko) * 2017-12-20 2022-07-18 삼성전자주식회사 영상의 상호작용 처리 방법 및 장치
CN108122249A (zh) * 2017-12-20 2018-06-05 长沙全度影像科技有限公司 一种基于gan网络深度学习模型的光流估计方法
CN108596267B (zh) * 2018-05-03 2020-08-28 Oppo广东移动通信有限公司 一种图像重建方法、终端设备及计算机可读存储介质
CN108765340B (zh) * 2018-05-29 2021-06-25 Oppo(重庆)智能科技有限公司 模糊图像处理方法、装置及终端设备
CN108898579B (zh) * 2018-05-30 2020-12-01 腾讯科技(深圳)有限公司 一种图像清晰度识别方法、装置和存储介质

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
US20180018539A1 (en) * 2016-07-12 2018-01-18 Beihang University Ranking convolutional neural network constructing method and image processing method and apparatus thereof
CN108305238A (zh) * 2018-01-26 2018-07-20 腾讯科技(深圳)有限公司 图像处理方法、装置、存储介质和计算机设备
CN108765261A (zh) * 2018-04-13 2018-11-06 北京市商汤科技开发有限公司 图像变换方法和装置、电子设备、计算机存储介质、程序
CN108564127A (zh) * 2018-04-19 2018-09-21 腾讯科技(深圳)有限公司 图像转换方法、装置、计算机设备及存储介质
CN108596330A (zh) * 2018-05-16 2018-09-28 中国人民解放军陆军工程大学 一种并行特征全卷积神经网络及其构建方法
CN109361934A (zh) * 2018-11-30 2019-02-19 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN111666919A (zh) * 2020-06-24 2020-09-15 腾讯科技(深圳)有限公司 一种对象识别方法、装置、计算机设备和存储介质
CN111666919B (zh) * 2020-06-24 2023-04-07 腾讯科技(深圳)有限公司 一种对象识别方法、装置、计算机设备和存储介质
CN112785687A (zh) * 2021-01-25 2021-05-11 Oppo广东移动通信有限公司 图像处理方法、装置、电子设备和可读存储介质
CN113159295A (zh) * 2021-04-27 2021-07-23 瀚博半导体(上海)有限公司 基于硬件加速器的张量处理方法和系统

Also Published As

Publication number Publication date
CN113902921B (zh) 2022-11-25
US11798145B2 (en) 2023-10-24
US20210192701A1 (en) 2021-06-24
CN113902921A (zh) 2022-01-07
CN109361934A (zh) 2019-02-19
CN109361934B (zh) 2021-10-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19890609

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19890609

Country of ref document: EP

Kind code of ref document: A1