CN114283050A - Image processing method, device, equipment and storage medium

Info

Publication number: CN114283050A
Application number: CN202111114829.0A
Authority: CN (China)
Prior art keywords: image, tensor, image processing, style, output tensor
Other languages: Chinese (zh)
Inventor: 宋奕兵
Original and current assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111114829.0A
Publication of CN114283050A
Legal status: Pending

Abstract

The application discloses an image processing method, apparatus, device and storage medium, and relates to computer vision technology. The method can be applied to a variety of scenarios such as cloud technology, artificial intelligence and intelligent transportation. The image processing method comprises the following steps: inputting a base image and a reference image into an image processing network; extracting image content features of the base image and image style features of the reference image based on the image processing network; performing image style migration on the image content features of the base image at the feature level based on the image style features of the reference image to obtain image fusion features; and reconstructing the image fusion features into a target image based on the image processing network, the target image having the image content of the base image and the image style of the reference image. With this method, the original image content is not lost during image processing, so a target image that is rich in image content and has the style of the reference image can be generated, improving the quality of the target image and ensuring the image processing effect.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present application relates to computer vision technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a storage medium.
Background
Computer vision technology covers many branches, such as image processing, image recognition, image semantic understanding, image retrieval, video processing, and video semantic understanding. Image processing converts one image into another image having desired characteristics. Illustratively, image style migration is one such image processing technique.
Image style migration, also called image style transfer, aims to migrate the style of one image into another image. In other words, given a base image and a reference image, a target image with both the original image content and a new style can be obtained through image style migration. Illustratively, the style of an image includes, but is not limited to, color, texture, and the like.
However, if the image content of the base image is lost during image style migration, part of the image content will be missing from the target image, which reduces image quality and affects the image processing effect. Therefore, a new image processing method is needed to realize high-quality image style migration.
Disclosure of Invention
The embodiments of the present application provide an image processing method, apparatus, device and storage medium. The original image content is not lost during the image processing, so a target image that is rich in image content and has the style of the reference image can be generated, improving the quality of the target image and ensuring the image processing effect. The technical solution is as follows:
in one aspect, there is provided an image processing method, the method comprising:
inputting a base image and a reference image into an image processing network;
extracting image content features of the base image and image style features of the reference image based on the image processing network;
performing image style migration on the image content features of the base image at a feature level based on the image style features of the reference image to obtain image fusion features; and
reconstructing the image fusion features into a target image based on the image processing network, the target image having the image content of the base image and the image style of the reference image.
In some embodiments, the training process of the image processing network comprises:
inputting a sample base image and a sample reference image into an initial network; obtaining, based on the initial network, a sample target image having the image content of the sample base image and the image style of the sample reference image;
constructing an image content loss function based on the sample base image and the sample target image;
constructing an image style loss function based on the sample reference image and the sample target image; and
updating the network parameters of the initial network according to the image content loss function and the image style loss function to obtain the image processing network.
In another aspect, there is provided an image processing apparatus, the apparatus including:
a feature extraction module configured to input a base image and a reference image into an image processing network, and extract image content features of the base image and image style features of the reference image based on the image processing network;
a style migration module configured to perform image style migration on the image content features of the base image at a feature level based on the image style features of the reference image to obtain image fusion features;
an image reconstruction module configured to reconstruct the image fusion features into a target image based on the image processing network, the target image having the image content of the base image and the image style of the reference image.
In some embodiments, the image processing network is a reversible model;
the feature extraction module is configured to: extract image content features of the base image and image style features of the reference image based on the forward calculation processes of all levels in the image processing network;
the image reconstruction module is configured to: reconstruct the image fusion features into the target image based on the reverse calculation processes of all levels in the image processing network;
wherein the forward calculation process and the reverse calculation process are mutually inverse calculations.
In some embodiments, the image processing network comprises an image compression layer and a plurality of stream units, each of the stream units comprising a normalization layer, a reversible convolution layer, and a coupling layer; the feature extraction module comprises:
a first processing unit configured to, for any stream unit not connected with the image compression layer, perform a linear transformation on the content feature tensor output by the previous stream unit based on the normalization layer of the stream unit to obtain a first output tensor;
a second processing unit configured to perform a convolution operation on the first output tensor based on the reversible convolution layer of the stream unit to obtain a second output tensor;
a third processing unit configured to segment the second output tensor based on the coupling layer of the stream unit to obtain a first sub-tensor and a second sub-tensor; perform a nonlinear transformation on the first sub-tensor, and add the obtained nonlinear transformation result to the second sub-tensor to obtain a third sub-tensor; perform feature splicing on the first sub-tensor and the third sub-tensor to obtain a third output tensor; and input the third output tensor into the next stream unit.
In some embodiments, the first processing unit is configured to:
performing point multiplication on each element in the content feature tensor output by the previous stream unit and a first parameter to obtain a first intermediate result; and adding the first intermediate result and a second parameter to obtain the first output tensor.
In some embodiments, the second processing unit is configured to:
multiplying each element in the first output tensor by a first weight matrix respectively to obtain a second output tensor; wherein the first output tensor and the second output tensor have the same number of channels; the first weight matrix has a size of c × c, c being a positive integer, c referring to the number of channels of the first output tensor and the second output tensor.
In some embodiments, the image reconstruction module is configured to:
for any stream unit which is not connected with the image compression layer, segmenting the fusion feature tensor output by the previous stream unit based on the coupling layer of the stream unit to obtain a fourth sub-tensor and a fifth sub-tensor; performing a nonlinear transformation on the fourth sub-tensor, and subtracting the obtained nonlinear transformation result from the fifth sub-tensor to obtain a sixth sub-tensor; performing feature splicing on the fourth sub-tensor and the sixth sub-tensor to obtain a fourth output tensor;
performing inverse transformation of convolution operation on the fourth output tensor based on the reversible convolution layer of the stream unit to obtain a fifth output tensor;
performing inverse transformation of linear transformation on the fifth output tensor based on the normalization layer of the stream unit to obtain a sixth output tensor; inputting the sixth output tensor into a next stream unit.
In some embodiments, the image reconstruction module is configured to:
multiplying each element in the fourth output tensor by a second weight matrix respectively to obtain a fifth output tensor;
wherein the fourth output tensor and the fifth output tensor have the same number of channels; the second weight matrix is an inverse of a weight matrix used in a forward calculation process for the invertible convolutional layers of the stream units.
In some embodiments, the image reconstruction module is configured to:
subtracting the second parameter from each element in the fifth output tensor to obtain a second intermediate result; and taking the ratio of the second intermediate result to the first parameter as the sixth output tensor.
In some embodiments, the image processing network comprises an image compression layer and a plurality of stream units; the device further comprises:
the image compression module is configured to perform image compression processing on the basic image based on an image compression layer of the image processing network to obtain a first compressed image, and input the first compressed image into a flow unit connected with the image compression layer; the first compressed image is reduced in size and deepened in number of channels as compared to the base image;
the image compression module is further configured to perform image compression processing on the reference image based on the image compression layer to obtain a second compressed image, and input the second compressed image into a streaming unit connected with the image compression layer; the second compressed image is reduced in size and deepened in channel number as compared to the reference image.
In some embodiments, the style migration module is configured to:
acquiring a first mean value and a first variance of the image content features on a channel dimension;
acquiring a second mean value and a second variance of the image style characteristics on a channel dimension;
obtaining a third intermediate result based on the image content features, the first mean and the first variance; multiplying the second variance with the third intermediate result to obtain a fourth intermediate result;
and adding the fourth intermediate result and the second average value to obtain the image fusion characteristic.
In some embodiments, the style migration module is configured to:
acquiring a first covariance matrix of the image content features, and performing matrix decomposition on the first covariance matrix to obtain a first matrix decomposition result; determining the converted image content features according to the first matrix decomposition result and the image content features;
acquiring a second covariance matrix of the image style features, and performing matrix decomposition on the second covariance matrix to obtain a second matrix decomposition result; acquiring the image fusion features according to the second matrix decomposition result and the converted image content features;
wherein the third covariance matrix of the image fusion features is the same as the second covariance matrix of the image style features.
In another aspect, a computer device is provided, the device comprising a processor and a memory, the memory having stored therein at least one program code, the at least one program code being loaded and executed by the processor to implement the image processing method described above.
In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, the at least one program code being loaded and executed by a processor to implement the image processing method described above.
In another aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising computer program code, the computer program code being stored in a computer-readable storage medium, the computer program code being read by a processor of a computer device from the computer-readable storage medium, the computer program code being executed by the processor such that the computer device performs the image processing method described above.
The embodiments of the present application realize image style migration based on an image processing network. In detail, both image feature extraction and image reconstruction can be performed based on the image processing network; that is, the image processing network is a technical framework that integrates the image feature extraction function and the image reconstruction function. Integrating these two functions ensures that the image processing network is information-lossless, so image style migration is realized without losing the original image content, and a target image that is rich in image content and has the style of the reference image can be generated, improving the quality of the target image and ensuring the image processing effect.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram of a base image, a reference image and a target image provided by an embodiment of the present application;
fig. 2 is a schematic diagram of an implementation environment of an image processing method according to an embodiment of the present application;
fig. 3 is a flowchart of an image processing method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a possible network architecture of an image processing network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an image processing flow provided by an embodiment of the present application;
FIG. 6 is a flow chart of another image processing method provided by the embodiments of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of another computer device provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first", "second", and the like in this application are used to distinguish between identical or similar items whose functions and purposes are substantially the same. It should be understood that "first", "second", and "nth" have no logical or temporal dependency and do not limit the quantity or the order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by these terms.
These terms are only used to distinguish one element from another. For example, a first element could be termed a second element and, similarly, a second element could be termed a first element, without departing from the scope of the various examples. The first element and the second element may both be elements and, in some cases, may be separate and distinct elements.
"At least one" means one or more; for example, at least one element may be any integer number of elements greater than or equal to one, such as one element, two elements, or three elements. "At least two" means two or more; for example, at least two elements may be any integer number of elements greater than or equal to two, such as two elements or three elements.
In some embodiments, the image processing scheme provided by the embodiments of the present application involves artificial intelligence techniques.
Artificial intelligence is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer vision (CV) technology is the science of studying how to make machines "see": using cameras and computers in place of human eyes to identify, track and measure targets, and further performing image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric technologies such as face recognition and fingerprint recognition.
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The image processing scheme provided in the embodiment of the present application may relate to computer vision technology of artificial intelligence, machine learning, and other technologies, and is specifically described in the following embodiments.
Some key terms or abbreviations that may be involved in embodiments of the present application are described below.
Image style migration: also called image style transfer, it is a type of image generation task that aims to migrate the style of a reference image (also called a style image) into a base image (also called a content image). In other words, image style migration is the process of changing the style of an image while preserving its content. That is, given a content image and a style image, a target image with both the original image content and a new style can be obtained through image style migration.
In some embodiments, the style of the image includes, but is not limited to, image color, image texture, and the like.
In other embodiments, the style image is often a work of art, such as a painting. Image style migration then means learning the style of the painting and applying it to the content image, so that an ordinary photograph can be converted into a new image with the same style as the painting.
Referring to fig. 1, the content image is a photograph taken by the user with a shooting device, and the style image is an oil painting; through image style migration, the user's photograph can be converted into a target image that retains the original image content and has the style of the oil painting.
The following describes an implementation environment related to an image processing method provided by an embodiment of the present application.
In some embodiments, the image processing method is performed by the terminal alone; in other embodiments, the image processing method is jointly executed by the terminal and the server. For example, the terminal uploads the base image and the reference image to the server, the server completes image style migration based on the image processing method, and the obtained target image is returned to the terminal.
Taking joint execution by the terminal and the server as an example, fig. 2 is a schematic diagram of an implementation environment of the image processing method provided in the embodiments of the present application. Referring to fig. 2, the implementation environment includes a terminal 201 and a server 202.
The terminal 201 has installed and runs a target application that supports image processing. The terminal 201 is the terminal used by the user. The server 202 is used to provide background services for the target application.
In some embodiments, the target application may be a stand-alone application. The terminal 201 can log in the target application program based on the account information input by the user, and the interaction between the user and the terminal 201 is realized through the target application program. In addition, the target application program may also be a sub-application running in another application, for example, the sub-application may be an applet, which is not specifically limited in this embodiment of the present application.
In some embodiments, the server 202 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The terminal 201 may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device, a smart household appliance, or a vehicle-mounted terminal. The terminal 201 and the server 202 may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
An application scenario of the image processing method provided in the embodiment of the present application is described below.
The image processing scheme provided by the embodiments of the present application is suitable for image beautification: for a base image provided by a user, a target image that retains the original image content and has the style of a reference image (either built into the target application or provided by the user) can be generated quickly. Image beautification is applied in a wide range of fields, including but not limited to film and television, photography, fashion, e-commerce and short video, and can even extend to fields such as maps, the Internet of Vehicles and intelligent transportation. In all of these fields, the image processing scheme provided by the embodiments of the present application can be applied to image beautification, converting one image into another image with the characteristics desired by the user.
The image processing scheme provided by the embodiments of the present application is a new, universal image style migration scheme that can realize high-quality image style migration at any resolution. "High quality" means the scheme does not lose the image content of the base image during style migration, so the finally generated target image does not suffer from partially missing content. "Universal" means the scheme works on any base image, without training a separate image style migration algorithm for each individual base image. The following embodiments describe the image processing method provided in the embodiments of the present application in detail.
Fig. 3 is a flowchart of an image processing method according to an embodiment of the present application. The method is performed by a computer device of the type referred to in the implementation environment above. Referring to fig. 3, in some embodiments, the method flow includes the following steps.
301. The computer device inputs the base image and the reference image into the image processing network.
The reference image is also called a style image, and the base image is also called a content image. In some embodiments, the base image may be an image taken by the user, and the reference image may be provided by the user or built into the system; for example, the user may select one of multiple system built-in images as the reference image, which is not limited here.
The first point to be described is that the image processing network is responsible for image feature extraction, image style migration, and image reconstruction (restoring image features to an image). The task of the image feature extraction part is to extract the image features of the base image and the reference image; the task of the image style migration part is to perform image style migration at the feature level based on the image features extracted in the previous step; the task of the image reconstruction part is to reconstruct the image features that have undergone image style migration into an image.
The second point to be described is that the computation process of the image processing network is reversible; that is, the image processing network is a reversible model, and the computation process (also referred to as the inference process) of each level in the image processing network is reversible. Because the computation process of the image processing network is reversible, the image processing network is information-lossless. In some embodiments, the image processing network is a flow model based on reversible operations, for example a Glow model (a reversible generative model), which is not limited here.
302. The computer device extracts image content features of the base image and image style features of the reference image based on the image processing network.
In some embodiments, image content features of the base image and image style features of the reference image are extracted based on forward computational processes at various levels in the image processing network.
Fig. 4 shows one possible network architecture of the image processing network. As shown in fig. 4, the image processing network includes an image compression layer 401, a plurality of stream units 402, and an image style migration module 403. Wherein each stream unit includes a normalization layer (Actnorm), a reversible convolutional layer, and a coupling layer. The functional roles of the image compression layer 401, the layers in the stream unit 402, and the image style migration module 403 are described in detail later.
In fig. 4, Nx indicates that N stream units are stacked. The value of N is a positive integer, for example 8, which is not limited in the embodiments of this application.
The first point to be noted is that the rightward black arrows in fig. 4 represent the forward calculation process of the image processing network. In the forward calculation process, for each stream unit, the output of the normalization layer is passed to the reversible convolution layer, and the output of the reversible convolution layer is passed to the coupling layer. The leftward black arrows in fig. 4 represent the reverse calculation process of the image processing network. In the reverse calculation process, for each stream unit, the output of the coupling layer is passed to the reversible convolution layer, and the output of the reversible convolution layer is passed to the normalization layer. That is, the forward calculation process and the reverse calculation process of the image processing network are mutually inverse.
The second point to be described is that image content features are features of the content contained in an image; the persons, animals, plants, natural scenery and the like appearing in an image all belong to the image content. The image style includes, but is not limited to, the color and texture of the image; accordingly, image style features are features of the image's color, texture, and the like.
303. The computer device performs image style migration on the image content features of the base image at the feature level based on the image style features of the reference image to obtain image fusion features.
This step is accomplished by the image style migration module 403 in the image processing network. In some embodiments, the image style migration module 403 is an AdaIN (Adaptive Instance Normalization) model or a WCT (Whitening and Coloring Transform) model. The image style migration module 403 can also be a model other than the AdaIN and WCT models, which is not limited here, provided it can be mathematically verified that the model is unbiased, that is, that it performs image style migration without losing image content.
304. The computer device reconstructs the image fusion feature into a target image based on an image processing network, the target image having the image content of the base image and the image style of the reference image.
In some embodiments, the image fusion features are reconstructed into the target image based on a reverse computational process at each level in the image processing network.
In the embodiments of the present application, the forward computation process of the image processing network is used for image feature extraction, and the reverse computation process is used for image reconstruction. That is, as shown in fig. 5, image feature extraction and image reconstruction are completed by one model: the image processing network is a technical framework that combines the image feature extraction function with the image reconstruction function. After the base image and the reference image are input into the image processing network, a target image with rich image content can be obtained through the image processing network, realizing image feature extraction and image restoration without loss of the original image content, and thus high-quality image style migration.
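To make this division of labor concrete, the following is a minimal Python (PyTorch-style) sketch of the overall pipeline. The network object with forward_flow and inverse_flow methods and the adain function (sketched later in this description) are illustrative assumptions, not code published with the patent:

```python
import torch

def run_style_transfer(net, base: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    # Forward computation: map both images into the feature space.
    f_c = net.forward_flow(base)   # image content features of the base image
    f_s = net.forward_flow(ref)    # image style features of the reference image
    # Feature-level image style migration (AdaIN is one suitable choice).
    f_cs = adain(f_c, f_s)
    # Reverse computation: the same network, run backwards, reconstructs the
    # image fusion features into the target image without information loss.
    return net.inverse_flow(f_cs)
```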
The embodiments of the present application realize image style migration based on an image processing network. In detail, both image feature extraction and image reconstruction can be performed based on the image processing network; that is, the image processing network is a technical framework that integrates the image feature extraction function and the image reconstruction function. Integrating these two functions ensures that the image processing network is information-lossless, so image style migration is realized without losing the original image content, and a target image that is rich in image content and has the style of the reference image can be generated, improving the quality of the target image and ensuring the image processing effect.
Fig. 6 is a flowchart of another image processing method according to an embodiment of the present application. The method is performed by a computer device of the type referred to in the implementation environment above. Referring to fig. 6, in some embodiments, the method flow includes the following steps.
601. The computer device inputs the base image and the reference image into an image processing network whose computational process is reversible.
In the embodiments of the present application, the image processing network is trained based on a sample data set. The sample data set comprises sample base images and sample reference images, where a sample base image is a content image used for training and a sample reference image is a style image used for training.
In some embodiments, the training process of the image processing network comprises: inputting a sample base image and a sample reference image into an initial network; obtaining, based on the initial network, a sample target image having the image content of the sample base image and the image style of the sample reference image; constructing an image content loss function based on the sample base image and the sample target image; constructing an image style loss function based on the sample reference image and the sample target image; and updating the network parameters of the initial network according to the image content loss function and the image style loss function to obtain the image processing network.
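As an illustration of this training process, here is a hedged sketch of one training step. Using a fixed feature extractor (for example a pretrained VGG) for the perceptual losses, and matching channel-wise means and variances for the style loss, are assumptions common in style-transfer work; the patent only specifies that a content loss and a style loss are constructed and used to update the initial network:

```python
import torch
import torch.nn.functional as F

def training_step(initial_net, feat, optimizer, sample_base, sample_ref,
                  style_weight: float = 10.0):
    # initial_net(base, ref) -> sample target image; feat(img) -> feature map.
    # Both are assumed callables; the names are illustrative, not from the patent.
    target = initial_net(sample_base, sample_ref)

    # Image content loss: the target should keep the base image's content.
    content_loss = F.mse_loss(feat(target), feat(sample_base))

    # Image style loss: the target should match the reference image's style
    # statistics (channel-wise mean and variance as a minimal stand-in).
    f_t, f_r = feat(target), feat(sample_ref)
    style_loss = (F.mse_loss(f_t.mean(dim=(2, 3)), f_r.mean(dim=(2, 3)))
                  + F.mse_loss(f_t.var(dim=(2, 3)), f_r.var(dim=(2, 3))))

    # Update the network parameters of the initial network.
    loss = content_loss + style_weight * style_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```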
It should be noted that the computer device for training the image processing network and the computer device for executing the image processing method may be the same device or different devices, and the present application is not limited herein.
602. The computer device performs image compression processing on the base image based on the image compression layer of the image processing network to obtain a first compressed image, and inputs the first compressed image into the stream unit connected with the image compression layer; it also performs image compression processing on the reference image based on the image compression layer to obtain a second compressed image, and inputs the second compressed image into the stream unit connected with the image compression layer.
In some embodiments, the image processing network includes multiple image compression layers; the architecture shown in fig. 4 includes two. Each image compression layer has the same function; only its input and output differ. The image compression layer in this step refers to the first image compression layer of the image processing network: after the base image and the reference image are input into the image processing network, the first image compression layer performs image compression processing on them.
In other embodiments, the compression function used by the image compression layer is a squeeze function, which is responsible for reducing the size of the image and increasing its number of channels; that is, the first compressed image has a smaller size and more channels than the base image, and the second compressed image has a smaller size and more channels than the reference image. Illustratively, assuming an image originally has size h × w × c, after compression by the squeeze function its size is h/2 × w/2 × 4c, where h denotes the image height, w denotes the image width, and c denotes the number of image channels.
It should be noted that, for another image compression layer (also called a second image compression layer, located between two stream units) shown in fig. 4, the processing flow is similar to that of the first image compression layer.
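A minimal sketch of such a squeeze operation is given below, assuming NCHW tensors with even height and width; this is the standard space-to-depth rearrangement used in flow models (PyTorch's F.pixel_unshuffle computes an equivalent rearrangement):

```python
import torch

def squeeze(x: torch.Tensor) -> torch.Tensor:
    """Halve the spatial size and quadruple the channel count:
    (n, c, h, w) -> (n, 4c, h/2, w/2). Assumes h and w are even."""
    n, c, h, w = x.shape
    x = x.view(n, c, h // 2, 2, w // 2, 2)        # expose 2x2 spatial blocks
    x = x.permute(0, 1, 3, 5, 2, 4).contiguous()  # move block offsets to channels
    return x.view(n, 4 * c, h // 2, w // 2)

# Example: a 256x256 RGB image becomes 128x128 with 12 channels.
img = torch.randn(1, 3, 256, 256)
assert squeeze(img).shape == (1, 12, 128, 128)
```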
603. The computer device extracts image content features of the first compressed image and image style features of the second compressed image based on forward computation processes of the levels in the image processing network.
The first compressed image is a base image subjected to image compression processing, and the second compressed image is a reference image subjected to image compression processing.
Taking the image feature extraction of the base image as an example, the method extracts the image content features of the compressed base image based on the forward computing process of the image processing network, and includes the following steps.
6031. And for any stream unit which is not connected with the image compression layer, carrying out linear transformation on the content characteristic tensor output by the previous stream unit based on the normalization layer of the stream unit to obtain a first output tensor.
The content feature tensor output by the previous stream unit is the input tensor of the current stream unit.
In some embodiments, linearly transforming the content feature tensor output by the previous stream unit based on the normalization layer of the stream unit to obtain a first output tensor comprises: performing point multiplication on each element in the content feature tensor output by the previous stream unit and a first parameter to obtain a first intermediate result; and adding the first intermediate result and the second parameter to obtain a first output tensor. Accordingly, the calculation formula is expressed as follows:
y_{i,j} = w ⊙ x_{i,j} + b
where ⊙ denotes the dot product (element-wise multiplication) operation, w denotes the first parameter, b denotes the second parameter, and x_{i,j} denotes the element at position (i, j) in the input tensor of the current stream unit, with i and j being positive integers. The first parameter and the second parameter are both parameters to be learned.
In this step, each element in the input tensor of the current stream unit is multiplied by w, and then b is added as a bias term to obtain the first output tensor.
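A minimal sketch of this normalization layer follows, including its inverse (which anticipates step 6053 below). Treating w and b as per-channel learnable parameters is an assumption borrowed from the Actnorm layer of Glow-style models:

```python
import torch

class ActNorm(torch.nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # w: first parameter; b: second parameter; both are to be learned.
        self.w = torch.nn.Parameter(torch.ones(1, channels, 1, 1))
        self.b = torch.nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y_{i,j} = w ⊙ x_{i,j} + b: point-multiply by w, then add b.
        return self.w * x + self.b

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        # Inverse of the linear transform: subtract b, then divide by w.
        return (y - self.b) / self.w
```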
6032. And performing convolution operation on the first output tensor based on the reversible convolution layer of the stream unit to obtain a second output tensor.
In some embodiments, convolving the first output tensor based on the invertible convolution layer of the stream unit to obtain a second output tensor comprises: multiplying each element in the first output tensor by the first weight matrix respectively to obtain a second output tensor; accordingly, the calculation formula is expressed as follows:
Y_{i,j} = W × X_{i,j}
where the first output tensor and the second output tensor have the same number of channels; W denotes the first weight matrix, a parameter to be learned, of size c × c, with c a positive integer denoting the number of channels of the first output tensor and the second output tensor; and X_{i,j} denotes the element at position (i, j) in the first output tensor.
6033. Based on the coupling layer of the stream unit, segment the second output tensor in the channel dimension to obtain a first sub-tensor and a second sub-tensor; perform a nonlinear transformation on the first sub-tensor, and add the obtained nonlinear transformation result to the second sub-tensor to obtain a third sub-tensor; perform feature splicing on the first sub-tensor and the third sub-tensor in the channel dimension to obtain a third output tensor; and input the third output tensor into the next stream unit. Accordingly, the calculation formulas are expressed as follows:
x_a, x_b = split(x)
y_b = NN(x_a) + x_b
y = concat(x_a, y_b)
where x denotes the output tensor of the reversible convolution layer of the stream unit, i.e., the second output tensor; x_a denotes the first sub-tensor and x_b the second sub-tensor; split(·) denotes the segmentation operation, concat(·) the feature splicing operation, and NN(·) a nonlinear transformation; y_b denotes the third sub-tensor and y the third output tensor. That is, the third output tensor, as the content feature tensor output by the current stream unit, is input into the next stream unit connected to it.
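A sketch of the coupling layer's forward computation and its inverse is given below. The patent leaves NN() unspecified beyond being a nonlinear transformation, so the small two-convolution network here is an illustrative assumption:

```python
import torch

class AdditiveCoupling(torch.nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2  # assumes an even channel count
        self.nn = torch.nn.Sequential(          # NN(): nonlinear transformation
            torch.nn.Conv2d(half, half, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(half, half, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xa, xb = torch.chunk(x, 2, dim=1)   # split in the channel dimension
        yb = self.nn(xa) + xb               # third sub-tensor
        return torch.cat([xa, yb], dim=1)   # feature splicing: third output tensor

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        ya, yb = torch.chunk(y, 2, dim=1)   # fourth and fifth sub-tensors
        xb = yb - self.nn(ya)               # sixth sub-tensor: undo the addition
        return torch.cat([ya, xb], dim=1)   # fourth output tensor
```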
It should be noted that, for the forward calculation process, the flow of the stream units connected to the image compression layer is similar to the above steps 6031-6033, except that the input of the stream units is the output of the image compression layer or the output of the stream units is the input of the image compression layer.
604. And the computer equipment performs image style migration on the image content characteristics at the characteristic level based on the image style characteristics to obtain image fusion characteristics.
In the embodiments of the present application, the image content features of the base image are denoted by f_c, and the image style features of the reference image are denoted by f_s. In this step, image style migration is performed at the feature level by the image style migration module 403 shown in fig. 4. Exemplarily, denoting the image style migration operation by a function T(·), image style migration in the feature space can be represented by the following formula:
f_{cs} = T(f_c, f_s)
where f_{cs} denotes the image features after image style migration output by the image style migration module, i.e., the image fusion features.
Taking the image style migration module 403 as an AdaIN model as an example, performing image style migration at a feature level based on image content features of a base image and image style features of a reference image, including:
6041. acquiring a first mean value and a first variance of image content characteristics on a channel dimension; and acquiring a second mean and a second variance of the image style features in the channel dimension.
It should be noted that, for ease of distinction, the mean and variance of the image content features f_c in the channel dimension are referred to as the first mean and the first variance, and the mean and variance of the image style features f_s in the channel dimension are referred to as the second mean and the second variance.
The AdaIN model matches the channel-wise mean and variance of the image content features to the channel-wise mean and variance of the image style features; in other words, the mean and variance of each channel's feature map of the base image are aligned to those of the reference image. That is, the AdaIN model implements image style migration at the feature level by changing the data distribution of the features.
6042. Acquiring a third intermediate result based on the image content characteristics, the first mean value and the first variance; multiplying the second variance by the third intermediate result to obtain a fourth intermediate result; and adding the fourth intermediate result and the second average value to obtain the image fusion characteristic.
This step 6042 can be expressed by the following formula:
f_{cs} = σ(f_s) ⊙ ((f_c − μ(f_c)) / σ(f_c)) + μ(f_s)
where μ(·) and σ(·) denote the channel-wise mean and variance. The formula can be understood as first de-stylizing the base image, that is, subtracting the image content features' own mean and dividing by their own variance, and then stylizing with the style of the reference image by multiplying by the variance of the image style features and adding the corresponding mean.
Taking the image style migration module 403 being a WCT model as an example, performing image style migration at the feature level based on the image content features of the base image and the image style features of the reference image includes two parts: a whitening operation and a coloring operation (color transfer).
The whitening operation is: acquiring a first covariance matrix of the image content features, and performing matrix decomposition on the first covariance matrix to obtain a first matrix decomposition result; and determining the converted image content features according to the first matrix decomposition result and the image content features.
For ease of distinction, the covariance matrix of the image content features is referred to as the first covariance matrix, the covariance matrix of the image style features as the second covariance matrix, and the covariance matrix of the image fusion features as the third covariance matrix.
In detail, this step obtains the orthogonal eigenvectors of the first covariance matrix and the diagonal matrix formed by its eigenvalues, both obtained by performing matrix decomposition on the first covariance matrix. In some embodiments, the matrix decomposition is a singular value decomposition, which is not limited here. The image content features are then whitened according to the orthogonal eigenvectors and the diagonal matrix.
The coloring operation is: acquiring a second covariance matrix of the image style features, and performing matrix decomposition on the second covariance matrix to obtain a second matrix decomposition result; and acquiring the image fusion features according to the second matrix decomposition result and the converted image content features. The second matrix decomposition result is obtained similarly to the first matrix decomposition result, and comprises the orthogonal eigenvectors of the second covariance matrix and the diagonal matrix formed by its eigenvalues. In addition, the third covariance matrix of the image fusion features is the same as the second covariance matrix of the image style features.
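A hedged sketch of the whitening and coloring operations follows, using an eigendecomposition of each covariance matrix (the patent also permits singular value decomposition). Flattening the features to a (C, H·W) matrix is an assumption about the layout:

```python
import torch

def wct(f_c: torch.Tensor, f_s: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # f_c, f_s: (C, N) matrices (channels x spatial positions), centered below.
    mu_c = f_c.mean(dim=1, keepdim=True)
    fc = f_c - mu_c
    cov_c = fc @ fc.t() / (fc.shape[1] - 1)  # first covariance matrix
    d_c, e_c = torch.linalg.eigh(cov_c)      # first matrix decomposition result
    # Whitening: remove the content features' own second-order statistics.
    whitened = e_c @ torch.diag((d_c + eps).rsqrt()) @ e_c.t() @ fc

    mu_s = f_s.mean(dim=1, keepdim=True)
    fs = f_s - mu_s
    cov_s = fs @ fs.t() / (fs.shape[1] - 1)  # second covariance matrix
    d_s, e_s = torch.linalg.eigh(cov_s)      # second matrix decomposition result
    # Coloring: apply the style features' covariance and add the style mean, so
    # the fused features' covariance matches the second covariance matrix.
    colored = e_s @ torch.diag((d_s + eps).sqrt()) @ e_s.t() @ whitened
    return colored + mu_s
```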
605. The computer device reconstructs the image fusion features into a target image based on the reverse calculation process of each level in the image processing network, the target image having the image content of the base image and the image style of the reference image; the forward calculation process and the reverse calculation process are mutually inverse.
This step utilizes the inverse computation process of the image processing network to restore f_{cs} to the image space, completing image reconstruction and obtaining the target image, which not only retains the image content of the base image but also has the style of the reference image.
Wherein, the reverse calculation process and the forward calculation process of the image processing network are reciprocal. In some embodiments, reconstructing the image fusion feature into the target image based on a reverse calculation process of the image processing network includes the following steps.
6051. For any stream unit which is not connected with the image compression layer, segment the fusion feature tensor output by the previous stream unit in the channel dimension based on the coupling layer of the stream unit to obtain a fourth sub-tensor and a fifth sub-tensor; perform a nonlinear transformation on the fourth sub-tensor, and subtract the obtained nonlinear transformation result from the fifth sub-tensor to obtain a sixth sub-tensor; and perform feature splicing on the fourth sub-tensor and the sixth sub-tensor in the channel dimension to obtain a fourth output tensor.
6052. And performing inverse transformation of convolution operation on the fourth output tensor based on the reversible convolution layer of the stream unit to obtain a fifth output tensor.
In some embodiments, inverse transforming the convolution operation on the fourth output tensor based on the invertible convolution layer of the stream unit to obtain a fifth output tensor comprises: multiplying each element in the fourth output tensor by the second weight matrix respectively to obtain a fifth output tensor; the channel number of the fourth output tensor is the same as that of the fifth output tensor; the second weight matrix is the inverse of the weight matrix used in the forward calculation of the invertible convolution layer of the stream unit.
6053. Performing inverse transformation of linear transformation on the fifth output tensor based on the normalization layer of the stream unit to obtain a sixth output tensor; the sixth output tensor is input to the next stream unit.
In some embodiments, performing the inverse of the linear transformation on the fifth output tensor based on the normalization layer of the stream unit to obtain a sixth output tensor comprises: subtracting the second parameter from each element in the fifth output tensor to obtain a second intermediate result; and taking the ratio of the second intermediate result to the first parameter as the sixth output tensor.
It should be noted that, for the reverse calculation process, the processing flow for other stream units connected to the image compression layer is similar to the above-mentioned steps 6051 to 6053. The difference is that either the input of these stream units is the output of the image compression layer or the output of these stream units is the input of the image compression layer.
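Putting the pieces together, one stream unit composes the three layers; its inverse runs the inverted layers in the opposite order, so a round trip recovers the input up to floating-point error. This sketch reuses the ActNorm, InvertibleConv1x1 and AdditiveCoupling classes sketched above:

```python
import torch

class FlowUnit(torch.nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.actnorm = ActNorm(channels)
        self.invconv = InvertibleConv1x1(channels)
        self.coupling = AdditiveCoupling(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Forward: normalization -> reversible convolution -> coupling.
        return self.coupling(self.invconv(self.actnorm(x)))

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        # Reverse: the same layers, inverted, in the opposite order.
        return self.actnorm.inverse(self.invconv.inverse(self.coupling.inverse(y)))

# Round-trip check of the reversibility (information-lossless) claim:
unit = FlowUnit(4)
x = torch.randn(1, 4, 8, 8)
assert torch.allclose(unit.inverse(unit(x)), x, atol=1e-4)
```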
In the embodiments of the present application, image style migration is realized based on an image processing network whose computation process is reversible. In detail, image features are extracted based on the forward computation process of the image processing network, and the image is reconstructed based on its reverse computation process; that is, the image processing network is a technical framework that integrates the image feature extraction function and the image reconstruction function. Because the image processing network is reversible, it is information-lossless, so the original image content is not lost during image processing, a target image that is rich in image content and has the style of the reference image can be generated, the quality of the target image is improved, and the image processing effect is ensured.
Fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. Referring to fig. 7, the apparatus includes:
a feature extraction module 701 configured to input a base image and a reference image into an image processing network, and extract image content features of the base image and image style features of the reference image based on the image processing network;
a style migration module 702 configured to perform image style migration on the image content features of the base image at a feature level based on the image style features of the reference image to obtain image fusion features;
an image reconstruction module 703 configured to reconstruct the image fusion features into a target image based on the image processing network, the target image having the image content of the base image and the image style of the reference image.
The embodiments of the present application realize image style migration based on an image processing network. In detail, both image feature extraction and image reconstruction can be performed based on the image processing network; that is, the image processing network is a technical framework that integrates the image feature extraction function and the image reconstruction function. Integrating these two functions ensures that the image processing network is information-lossless, so image style migration is realized without losing the original image content, and a target image that is rich in image content and has the style of the reference image can be generated, improving the quality of the target image and ensuring the image processing effect.
In some embodiments, the image processing network is a reversible model;
the feature extraction module configured to: extract image content features of the base image and image style features of the reference image based on forward calculation processes of all levels in the image processing network;
the image reconstruction module configured to: reconstruct the image fusion features into the target image based on reverse calculation processes of all levels in the image processing network;
and the forward calculation process and the reverse calculation process are mutually reversible calculation.
In some embodiments, the image processing network comprises an image compression layer and a plurality of stream units, each of the stream units comprising a normalization layer, a reversible convolution layer, and a coupling layer; the feature extraction module comprises:
the first processing unit is configured to perform linear transformation on a content feature tensor output by a previous flow unit based on a normalization layer of the flow unit to obtain a first output tensor for any flow unit which is not connected with the image compression layer;
a second processing unit configured to perform a convolution operation on the first output tensor to obtain a second output tensor based on the reversible convolution layer of the stream unit;
a third processing unit configured to segment the second output tensor to obtain a first sub-tensor and a second sub-tensor based on a coupling layer of the stream unit; performing nonlinear transformation on the first sub-tensor, and adding an obtained nonlinear transformation result and the second sub-tensor to obtain a third sub-tensor; performing feature splicing on the first sub tensor and the third sub tensor to obtain a third output tensor; inputting the third output tensor into a next stream unit.
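As a minimal sketch of this coupling computation, assuming the segmentation splits the tensor into two equal halves along the channel axis (the embodiment does not fix the split) and using tanh as a stand-in for the learned nonlinear transformation:

```python
import numpy as np

def coupling_forward(x, nonlinear=np.tanh):
    # Segment the input along the channel axis (assumed even) into the
    # first and second sub-tensors.
    x1, x2 = np.split(x, 2, axis=1)
    # Nonlinearly transform the first sub-tensor and add the result to
    # the second sub-tensor, giving the third sub-tensor.
    x3 = nonlinear(x1) + x2
    # Splice the first and third sub-tensors into the third output tensor.
    return np.concatenate([x1, x3], axis=1)
```

The first sub-tensor passes through unchanged, which is what makes the layer exactly invertible: the reverse calculation can recompute the identical nonlinear result from it and subtract it back out.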
In some embodiments, the first processing unit is configured to:
performing point multiplication on each element in the content feature tensor output by the previous stream unit and a first parameter to obtain a first intermediate result; and adding the first intermediate result and a second parameter to obtain the first output tensor.
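A sketch of this linear transformation, with per-channel parameters assumed (the embodiment only speaks of a first parameter and a second parameter; the names scale and bias are illustrative):

```python
import numpy as np

def actnorm_forward(x, scale, bias):
    # Point-multiply each element by the first parameter (scale), then
    # add the second parameter (bias) to obtain the first output tensor.
    return x * scale + bias

x = np.random.randn(1, 4, 8, 8)           # (batch, channels, height, width)
scale = np.random.rand(1, 4, 1, 1) + 0.5  # illustrative per-channel values
bias = np.random.randn(1, 4, 1, 1)
y = actnorm_forward(x, scale, bias)
```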
In some embodiments, the second processing unit is configured to:
multiplying each element in the first output tensor by a first weight matrix respectively to obtain a second output tensor; wherein the first output tensor and the second output tensor have the same number of channels; the first weight matrix has a size of c × c, c being a positive integer, c referring to the number of channels of the first output tensor and the second output tensor.
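This is the 1×1 invertible convolution familiar from flow-based models: a c×c matrix is applied to the channel vector at every spatial position. A sketch, with an orthogonal initialization assumed so the weight matrix is guaranteed invertible:

```python
import numpy as np

def invertible_conv(x, W):
    # x: (batch, c, h, w); W: (c, c). Multiply the channel vector at
    # every spatial position by W; the channel count is preserved.
    return np.einsum('ij,bjhw->bihw', W, x)

c = 4
W, _ = np.linalg.qr(np.random.randn(c, c))  # orthogonal, hence invertible
y = invertible_conv(np.random.randn(2, c, 8, 8), W)  # second output tensor
```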
In some embodiments, the image reconstruction module is configured to:
for any stream unit that is not connected with the image compression layer, segmenting the fusion feature tensor output by the previous stream unit based on the coupling layer of the stream unit to obtain a fourth sub-tensor and a fifth sub-tensor; performing nonlinear transformation on the fourth sub-tensor, and subtracting the obtained nonlinear transformation result from the fifth sub-tensor to obtain a sixth sub-tensor; performing feature splicing on the fourth sub-tensor and the sixth sub-tensor to obtain a fourth output tensor;
performing the inverse of the convolution operation on the fourth output tensor based on the reversible convolution layer of the stream unit to obtain a fifth output tensor;
performing the inverse of the linear transformation on the fifth output tensor based on the normalization layer of the stream unit to obtain a sixth output tensor; inputting the sixth output tensor into the next stream unit.
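A sketch of this reverse pass through the coupling layer, mirroring the forward sketch given earlier (tanh again stands in for the learned nonlinear transformation):

```python
import numpy as np

def coupling_inverse(y, nonlinear=np.tanh):
    # Segment into the fourth and fifth sub-tensors; the fourth is the
    # half the forward pass left unchanged.
    y1, y3 = np.split(y, 2, axis=1)
    # Recover the sixth sub-tensor by subtracting the nonlinear
    # transformation result from the fifth sub-tensor.
    y2 = y3 - nonlinear(y1)
    # Splice the fourth and sixth sub-tensors into the fourth output tensor.
    return np.concatenate([y1, y2], axis=1)
```

With the forward sketch above, coupling_inverse(coupling_forward(x)) returns x exactly, up to floating-point rounding.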
In some embodiments, the image reconstruction module is configured to:
multiplying each element in the fourth output tensor by a second weight matrix respectively to obtain a fifth output tensor;
wherein the fourth output tensor and the fifth output tensor have the same number of channels; the second weight matrix is the inverse of the weight matrix used by the reversible convolution layer of the stream unit in the forward calculation process.
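A sketch, reusing the einsum form of the forward convolution sketch; np.linalg.inv supplies the second weight matrix:

```python
import numpy as np

def invertible_conv_inverse(y, W):
    # The second weight matrix is the inverse of the forward weight
    # matrix W, so this exactly undoes the forward convolution while
    # preserving the channel count.
    return np.einsum('ij,bjhw->bihw', np.linalg.inv(W), y)
```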
In some embodiments, the image reconstruction module is configured to:
subtracting the second parameter from each element in the fifth output tensor to obtain a second intermediate result; and taking the ratio of the second intermediate result to the first parameter as the sixth output tensor.
In some embodiments, the image processing network comprises an image compression layer and a plurality of stream units; the device further comprises:
the image compression module is configured to perform image compression processing on the basic image based on an image compression layer of the image processing network to obtain a first compressed image, and input the first compressed image into a flow unit connected with the image compression layer; the first compressed image is reduced in size and deepened in number of channels as compared to the base image;
the image compression module is further configured to perform image compression processing on the reference image based on the image compression layer to obtain a second compressed image, and input the second compressed image into a streaming unit connected with the image compression layer; the second compressed image is reduced in size and deepened in channel number as compared to the reference image.
In some embodiments, the style migration module is configured to:
acquiring a first mean value and a first variance of the image content features on a channel dimension;
acquiring a second mean value and a second variance of the image style characteristics on a channel dimension;
obtaining a third intermediate result based on the image content features, the first mean and the first variance; multiplying the second variance with the third intermediate result to obtain a fourth intermediate result;
and adding the fourth intermediate result and the second average value to obtain the image fusion characteristic.
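This is the adaptive-instance-normalization form of feature-level style migration. A sketch follows; it normalizes with the standard deviation, which is the usual reading of the per-channel "variance" statistics above:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    # Per-channel statistics over the spatial dimensions.
    c_mean = content.mean(axis=(2, 3), keepdims=True)  # first mean
    c_std = content.std(axis=(2, 3), keepdims=True)    # first variance, as std
    s_mean = style.mean(axis=(2, 3), keepdims=True)    # second mean
    s_std = style.std(axis=(2, 3), keepdims=True)      # second variance, as std
    normalized = (content - c_mean) / (c_std + eps)    # third intermediate result
    return s_std * normalized + s_mean                 # image fusion features
```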
In some embodiments, the style migration module is configured to:
acquiring a first covariance matrix of the image content characteristics, and performing matrix decomposition on the first covariance matrix to obtain a first matrix decomposition result; determining the converted image content characteristics according to the first matrix decomposition result and the image content characteristics;
acquiring a second covariance matrix of the image style characteristics, and performing matrix decomposition on the second covariance matrix to obtain a second matrix decomposition result; acquiring the image fusion characteristic according to the second matrix decomposition result and the converted image content characteristic;
wherein the third covariance matrix of the image fusion features is the same as the second covariance matrix of the image style features.
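This is a whitening-and-coloring style of transfer. The sketch below uses eigendecomposition as the matrix decomposition (one common choice; the embodiment does not fix which decomposition is used) on features flattened to a (channels, positions) matrix:

```python
import numpy as np

def wct(content, style, eps=1e-5):
    # content, style: (c, n) feature matrices flattened over space.
    fc = content - content.mean(axis=1, keepdims=True)
    fs = style - style.mean(axis=1, keepdims=True)
    c = fc.shape[0]
    # First covariance matrix and its decomposition: whiten the content.
    cv, cE = np.linalg.eigh(fc @ fc.T / (fc.shape[1] - 1) + eps * np.eye(c))
    whitened = cE @ np.diag(cv ** -0.5) @ cE.T @ fc  # converted content features
    # Second covariance matrix and its decomposition: color with the style.
    sv, sE = np.linalg.eigh(fs @ fs.T / (fs.shape[1] - 1) + eps * np.eye(c))
    colored = sE @ np.diag(sv ** 0.5) @ sE.T @ whitened
    # Re-add the style mean; the fused features now share the style
    # features' covariance structure.
    return colored + style.mean(axis=1, keepdims=True)
```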
In some embodiments, the training process of the image processing network comprises:
inputting a sample base image and a sample reference image into an initial network, and obtaining, based on the initial network, a sample target image having the image content of the sample base image and the image style of the sample reference image;
constructing an image content loss function based on the sample base image and the sample target image;
constructing an image style loss function based on the sample reference image and the sample target image;
and updating the network parameters of the initial network according to the image content loss function and the image style loss function to obtain the image processing network.
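The embodiment does not fix the form of the two loss functions. A minimal sketch under common assumptions (mean-squared content loss on features, Gram-matrix style loss; the feature arrays are random stand-ins for features extracted from the sample images):

```python
import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)

def gram(f):
    # Gram matrix over flattened spatial positions, a common style statistic.
    c = f.shape[0]
    f = f.reshape(c, -1)
    return f @ f.T / f.shape[1]

# Stand-ins for features of the sample base, sample target, and sample
# reference images (e.g. from a fixed pretrained feature extractor).
base_feats = np.random.randn(64, 32, 32)
target_feats = np.random.randn(64, 32, 32)
ref_feats = np.random.randn(64, 32, 32)

content_loss = mse(base_feats, target_feats)           # image content loss
style_loss = mse(gram(ref_feats), gram(target_feats))  # image style loss
total_loss = content_loss + style_loss                 # weighting omitted
```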
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that: in the image processing apparatus provided in the above embodiment, when processing an image, only the division of the above functional modules is taken as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the above described functions. In addition, the image processing apparatus and the image processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
Fig. 8 shows a block diagram of a computer device 800 provided in an exemplary embodiment of the present application.
Generally, the computer device 800 includes: a processor 801 and a memory 802.
The processor 801 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 802 is used to store at least one program code for execution by the processor 801 to implement the image processing methods provided by the method embodiments herein.
In some embodiments, the computer device 800 may further optionally include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a display screen 805, a camera assembly 806, an audio circuit 807, a positioning assembly 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The radio frequency circuit 804 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 804 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other terminals via at least one wireless communication protocol, including, but not limited to, the World Wide Web, metropolitan area networks, intranets, successive generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or Wi-Fi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 805 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 805 is a touch display screen, it also has the ability to capture touch signals on or above its surface; such a touch signal may be input to the processor 801 as a control signal for processing. At this point, the display screen 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 805, disposed on the front panel of the computer device 800; in other embodiments, there may be at least two display screens 805, each disposed on a different surface of the computer device 800 or in a folded design; in still other embodiments, the display screen 805 may be a flexible display screen disposed on a curved or folded surface of the computer device 800. The display screen 805 may even be arranged in a non-rectangular irregular pattern, that is, a shaped screen. The display screen 805 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the device, and the rear camera is disposed on its rear surface. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fusion shooting functions. In some embodiments, the camera assembly 806 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert them into electrical signals, and input the electrical signals to the processor 801 for processing or to the radio frequency circuit 804 to realize voice communication. For stereo capture or noise reduction purposes, there may be multiple microphones located at different parts of the computer device 800. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a traditional diaphragm speaker or a piezoelectric ceramic speaker; a piezoelectric ceramic speaker can not only convert an electrical signal into sound waves audible to humans, but also convert an electrical signal into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 807 may also include a headphone jack.
The positioning component 808 is used to locate the current geographic location of the computer device 800 to implement navigation or LBS (Location Based Service). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 809 is used to supply power to the various components in the computer device 800. The power supply 809 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery charged through a wired line or a wireless rechargeable battery charged through a wireless coil. The rechargeable battery may also be used to support fast charging technology.
In some embodiments, the computer device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.
The acceleration sensor 811 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the computer apparatus 800. For example, the acceleration sensor 811 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 801 may control the display 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 812 may detect a body direction and a rotation angle of the computer device 800, and the gyro sensor 812 may cooperate with the acceleration sensor 811 to acquire a 3D motion of the user with respect to the computer device 800. From the data collected by the gyro sensor 812, the processor 801 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 813 may be disposed on a side bezel of the computer device 800 and/or underneath the display screen 805. When the pressure sensor 813 is disposed on the side bezel of the computer device 800, it can detect the user's holding signal on the computer device 800, and the processor 801 performs left-right hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at the lower layer of the display screen 805, the processor 801 controls the operability controls on the UI according to the pressure operation of the user on the display screen 805. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 814 is used to collect the user's fingerprint, and the processor 801 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the user's identity according to the collected fingerprint. Upon identifying the user's identity as a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 814 may be disposed on the front, back, or side of the computer device 800. When a physical button or a vendor logo is provided on the computer device 800, the fingerprint sensor 814 may be integrated with the physical button or vendor logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, processor 801 may control the display brightness of display 805 based on the ambient light intensity collected by optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the display screen 805 is increased; when the ambient light intensity is low, the display brightness of the display 805 is reduced. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.
The proximity sensor 816, also known as a distance sensor, is typically disposed on the front panel of the computer device 800. The proximity sensor 816 is used to capture the distance between the user and the front of the computer device 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front of the computer device 800 gradually decreases, the processor 801 controls the display screen 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front of the computer device 800 gradually increases, the processor 801 controls the display screen 805 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration illustrated in FIG. 8 is not intended to be limiting of the computer device 800 and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components may be employed.
Fig. 9 is a schematic structural diagram of a computer device 900 according to an embodiment of the present application. The computer device 900 may be a server. The computer device 900 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 901 and one or more memories 902, where the memory 902 stores at least one program code, and the at least one program code is loaded and executed by the processor 901 to implement the image processing method provided by the above method embodiments. Certainly, the computer device 900 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the computer device 900 may also include other components for implementing device functions, which are not described herein again.
In some embodiments, there is also provided a computer readable storage medium, such as a memory, comprising program code executable by a processor in a computer device to perform the image processing method in the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In some embodiments, there is also provided a computer program product or a computer program comprising computer program code stored in a computer-readable storage medium; a processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, causing the computer device to perform the image processing method described above.
In some embodiments, the computer program according to the embodiments of the present application may be deployed to be executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed at multiple sites and interconnected by a communication network; the multiple computer devices distributed at multiple sites and interconnected by a communication network may constitute a blockchain system.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. An image processing method, characterized in that the method comprises:
inputting a base image and a reference image into an image processing network;
extracting image content features of the base image and image style features of the reference image based on the image processing network;
performing image style migration on the image content features of the base image at a feature level based on the image style features of the reference image to obtain image fusion features;
reconstructing the image fusion features into a target image based on the image processing network, the target image having the image content of the base image and the image style of the reference image.
2. The method of claim 1, wherein the image processing network is a reversible model;
the extracting the image content features of the base image and the image style features of the reference image based on the image processing network comprises:
extracting the image content features of the base image and the image style features of the reference image based on forward calculation processes of all levels in the image processing network;
the reconstructing the image fusion features into a target image based on the image processing network includes:
reconstructing the image fusion features into the target image based on reverse calculation processes of all levels in the image processing network;
and the forward calculation process and the reverse calculation process are mutually reversible calculation.
3. The method of claim 2, wherein the image processing network comprises an image compression layer and a plurality of stream units, each stream unit comprising a normalization layer, a reversible convolution layer, and a coupling layer;
the image content feature extraction of the basic image based on the forward computing process of each layer in the image processing network comprises the following steps:
for any flow unit which is not connected with the image compression layer, carrying out linear transformation on a content feature tensor output by the previous flow unit based on a normalization layer of the flow unit to obtain a first output tensor;
performing convolution operation on the first output tensor based on the reversible convolution layer of the stream unit to obtain a second output tensor;
segmenting the second output tensor to obtain a first sub-tensor and a second sub-tensor based on a coupling layer of the stream unit; performing nonlinear transformation on the first sub-tensor, and adding an obtained nonlinear transformation result and the second sub-tensor to obtain a third sub-tensor; performing feature splicing on the first sub tensor and the third sub tensor to obtain a third output tensor; inputting the third output tensor into a next stream unit.
4. The method of claim 3, wherein linearly transforming the content feature tensor output by the previous stream unit based on the normalization layer of the stream unit to obtain a first output tensor comprises:
performing point multiplication on each element in the content feature tensor output by the previous stream unit and a first parameter to obtain a first intermediate result;
and adding the first intermediate result and a second parameter to obtain the first output tensor.
5. The method of claim 3, wherein convolving the first output tensor based on the invertible convolutional layer of the stream unit to obtain a second output tensor comprises:
multiplying each element in the first output tensor by a first weight matrix respectively to obtain a second output tensor;
wherein the first output tensor and the second output tensor have the same number of channels; the first weight matrix has a size of c × c, c being a positive integer, c referring to the number of channels of the first output tensor and the second output tensor.
6. The method of claim 2, wherein the image processing network comprises an image compression layer and a plurality of stream units, each stream unit comprising a normalization layer, a reversible convolution layer, and a coupling layer;
the reconstructing the image fusion features into a target image based on reverse calculation processes of all levels in the image processing network comprises:
for any stream unit that is not connected with the image compression layer, segmenting the fusion feature tensor output by the previous stream unit based on the coupling layer of the stream unit to obtain a fourth sub-tensor and a fifth sub-tensor; performing nonlinear transformation on the fourth sub-tensor, and subtracting the obtained nonlinear transformation result from the fifth sub-tensor to obtain a sixth sub-tensor; performing feature splicing on the fourth sub-tensor and the sixth sub-tensor to obtain a fourth output tensor;
performing the inverse of the convolution operation on the fourth output tensor based on the reversible convolution layer of the stream unit to obtain a fifth output tensor;
performing the inverse of the linear transformation on the fifth output tensor based on the normalization layer of the stream unit to obtain a sixth output tensor; inputting the sixth output tensor into the next stream unit.
7. The method of claim 6, wherein the performing the inverse of the convolution operation on the fourth output tensor based on the reversible convolution layer of the stream unit to obtain a fifth output tensor comprises:
multiplying each element in the fourth output tensor by a second weight matrix respectively to obtain a fifth output tensor;
wherein the fourth output tensor and the fifth output tensor have the same number of channels; the second weight matrix is the inverse of the weight matrix used by the reversible convolution layer of the stream unit in the forward calculation process.
8. The method of claim 6, wherein the performing the inverse of the linear transformation on the fifth output tensor based on the normalization layer of the stream unit to obtain a sixth output tensor comprises:
subtracting a second parameter from each element in the fifth output tensor to obtain a second intermediate result; and taking the ratio of the second intermediate result to a first parameter as the sixth output tensor.
9. The method of claim 1, wherein the image processing network comprises an image compression layer and a plurality of stream units; the method further comprises the following steps:
performing, based on the image compression layer of the image processing network, image compression processing on the base image to obtain a first compressed image, and inputting the first compressed image into a stream unit connected with the image compression layer; the first compressed image is reduced in size and deepened in channel number as compared to the base image;
based on the image compression layer, performing image compression processing on the reference image to obtain a second compressed image, and inputting the second compressed image into a stream unit connected with the image compression layer; the second compressed image is reduced in size and deepened in channel number as compared to the reference image.
10. The method of claim 1, wherein the performing image style migration on the image content features of the base image at a feature level based on the image style features of the reference image to obtain image fusion features comprises:
acquiring a first mean value and a first variance of the image content features on a channel dimension;
acquiring a second mean value and a second variance of the image style characteristics on a channel dimension;
obtaining a third intermediate result based on the image content features, the first mean and the first variance; multiplying the second variance with the third intermediate result to obtain a fourth intermediate result;
and adding the fourth intermediate result and the second average value to obtain the image fusion characteristic.
11. The method of claim 1, wherein the performing image style migration on the image content features of the base image at a feature level based on the image style features of the reference image to obtain image fusion features comprises:
acquiring a first covariance matrix of the image content characteristics, and performing matrix decomposition on the first covariance matrix to obtain a first matrix decomposition result; determining the converted image content characteristics according to the first matrix decomposition result and the image content characteristics;
acquiring a second covariance matrix of the image style characteristics, and performing matrix decomposition on the second covariance matrix to obtain a second matrix decomposition result; acquiring the image fusion characteristic according to the second matrix decomposition result and the converted image content characteristic;
wherein the third covariance matrix of the image fusion features is the same as the second covariance matrix of the image style features.
12. An image processing apparatus, characterized in that the apparatus comprises:
a feature extraction module configured to input a base image and a reference image into an image processing network, and to extract image content features of the base image and image style features of the reference image based on the image processing network;
a style migration module configured to perform image style migration on the image content features of the base image at a feature level based on the image style features of the reference image to obtain image fusion features;
an image reconstruction module configured to reconstruct the image fusion features into a target image based on the image processing network, the target image having image content of the base image and an image style of the reference image.
13. A computer device, characterized in that the device comprises a processor and a memory, in which at least one program code is stored, which is loaded and executed by the processor to implement the image processing method according to any of claims 1 to 11.
14. A computer-readable storage medium, characterized in that at least one program code is stored in the storage medium, which is loaded and executed by a processor to implement the image processing method according to any one of claims 1 to 11.
15. A computer program product or a computer program, characterized in that the computer program product or the computer program comprises computer program code, which is stored in a computer-readable storage medium, from which a processor of a computer device reads the computer program code, the processor executing the computer program code, causing the computer device to perform the image processing method according to any one of claims 1 to 11.
CN202111114829.0A 2021-09-23 2021-09-23 Image processing method, device, equipment and storage medium Pending CN114283050A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111114829.0A CN114283050A (en) 2021-09-23 2021-09-23 Image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114283050A (en) 2022-04-05

Family

ID=80868559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111114829.0A Pending CN114283050A (en) 2021-09-23 2021-09-23 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114283050A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116012248A (en) * 2022-12-30 2023-04-25 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and computer storage medium
CN116012248B (en) * 2022-12-30 2024-03-26 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and computer storage medium
CN116193242A (en) * 2023-04-24 2023-05-30 北京城建智控科技股份有限公司 Image analysis and transmission method of camera device
CN116193242B (en) * 2023-04-24 2023-07-14 北京城建智控科技股份有限公司 Image analysis and transmission method of camera device
CN116664719A (en) * 2023-07-28 2023-08-29 腾讯科技(深圳)有限公司 Image redrawing model training method, image redrawing method and device
CN116664719B (en) * 2023-07-28 2023-12-29 腾讯科技(深圳)有限公司 Image redrawing model training method, image redrawing method and device
CN116912353A (en) * 2023-09-13 2023-10-20 上海蜜度信息技术有限公司 Multitasking image processing method, system, storage medium and electronic device
CN116912353B (en) * 2023-09-13 2023-12-19 上海蜜度信息技术有限公司 Multitasking image processing method, system, storage medium and electronic device
CN117576265A (en) * 2024-01-15 2024-02-20 腾讯科技(深圳)有限公司 Method, device, computer equipment and storage medium for generating style image


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination