WO2022156621A1 - Artificial intelligence-based image coloring method, apparatus, electronic device, computer-readable storage medium, and computer program product - Google Patents

Artificial intelligence-based image coloring method, apparatus, electronic device, computer-readable storage medium, and computer program product

Info

Publication number
WO2022156621A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
colored
feature
color
prior information
Application number
PCT/CN2022/072298
Other languages
English (en)
French (fr)
Inventor
邬彦泽
李昱
王鑫涛
张宏伦
赵珣
单瀛
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2022156621A1
Priority to US17/971,279 (published as US20230040256A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G06T11/001: Texturing; Colouring; Generation of texture or colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/02: Affine transformations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/90: Determination of colour characteristics

Definitions

  • the present application relates to image processing technology, and in particular, to an artificial intelligence-based image coloring method, apparatus, electronic device, computer-readable storage medium, and computer program product.
  • Artificial intelligence is a comprehensive technology of computer science that studies the design principles and implementation methods of various intelligent machines so that machines have the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive subject covering a wide range of fields, such as natural language processing and machine learning/deep learning. With the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.
  • Image processing is an important application of artificial intelligence; typically, a corresponding colored image can be generated based on a grayscale image.
  • However, in the process of generating the colored image, related technologies are prone to problems such as color bleeding and color fading, which greatly affect the quality of the generated colored image; the image colors then have to be repaired again, reducing the image processing efficiency of the electronic device.
  • Embodiments of the present application provide an artificial intelligence-based image coloring method, device, electronic device, computer-readable storage medium, and computer program product, which can accurately colorize an image to be colored, thereby improving the image processing accuracy and image processing efficiency of electronic equipment.
  • An embodiment of the present application provides an artificial intelligence-based image coloring method, executed by an electronic device, the method including:
  • acquiring first color prior information of an image to be colored; transforming the first color prior information to obtain second color prior information aligned with the image to be colored; downsampling the image to be colored to obtain a first image feature; performing modulation coloring processing on the first image feature based on the second color prior information to obtain a second image feature; and performing upsampling processing on the second image feature based on the second color prior information to obtain a first colored image, wherein the first colored image is aligned with the image to be colored.
  • the embodiment of the present application provides an artificial intelligence-based image coloring device, including:
  • an acquisition module configured to acquire the first color prior information of the image to be colored
  • a transformation module configured to perform transformation processing on the first color prior information to obtain second color prior information aligned with the to-be-colored image
  • a processing module configured to downsample the image to be colored to obtain a first image feature; configured to perform modulation coloring processing on the first image feature based on the second color prior information to obtain a second image feature; and configured to perform upsampling processing on the second image feature based on the second color prior information to obtain a first colored image, wherein the first colored image is aligned with the image to be colored.
  • the embodiment of the present application provides an electronic device, including:
  • a memory configured to store executable instructions; and a processor configured to implement the artificial intelligence-based image coloring method provided by the embodiments of the present application when executing the executable instructions stored in the memory.
  • the embodiments of the present application provide a computer-readable storage medium storing executable instructions for implementing the artificial intelligence-based image coloring method provided by the embodiments of the present application when executed by a processor.
  • the embodiments of the present application provide a computer program product, including computer programs or instructions, for implementing the artificial intelligence-based image coloring method provided by the embodiments of the present application when executed by a processor.
  • FIG. 1 is a schematic diagram of the architecture of an artificial intelligence-based coloring system 10 provided by an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of a terminal 400 provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the composition and structure of the coloring system 10 provided by the embodiment of the present application.
  • FIG. 4 is a schematic diagram of image coloring provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of an artificial intelligence-based image coloring method provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an artificial intelligence-based image coloring method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a coloring effect provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a coloring effect provided by an embodiment of the present application.
  • The terms "first/second/third" are only used to distinguish similar objects and do not represent a specific ordering of objects. It is understood that, where permitted, the specific order or sequence of "first/second/third" may be interchanged so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.
  • Color prior information: the color-related experience and historical data that can be known before the image is processed; it can be expressed, for example, in the form of a feature map.
  • In the embodiments of the present application, the color prior information can be a feature map including the intermediate layer features of a generative adversarial network.
  • Affine transformation: a transformation from a two-dimensional vector to a two-dimensional vector that can be realized by composing a series of atomic transformations such as translation, scaling, flipping, rotation, and shearing.
  • Generative adversarial network (GAN): a deep learning model that includes a generator and a discriminator. The generator and the discriminator play against each other and learn to produce reasonably good outputs; the discriminator performs classification prediction based on the input variables, while the generator randomly generates observation data from given implicit information.
  • Foreground: the person or thing in front of, or close to the front of, the subject in a shot.
  • Downsampling: further compressing a feature map by reducing features through maximum pooling or average pooling; in effect, features with little effect or with redundant information are filtered out while key information is retained.
  • Colorizing an image: adding color to a grayscale image.
  • Related technologies colorize images based on deep learning.
  • These methods can be divided into two types: fully automatic coloring and coloring based on a reference image.
  • The advantage of fully automatic coloring is that it is simple and convenient: it only requires designing a loss function for end-to-end training and testing. However, it easily generates defective colored images, with artifacts such as color bleeding and color fading.
  • To colorize according to a reference image, a color reference image with content similar to the image to be colored must first be provided; the colors of the reference image are then transferred to the image to be colored according to the matching between the two images.
  • The coloring effect of coloring from a reference image therefore depends largely on the quality of the reference image.
  • The embodiments of the present application provide an artificial intelligence-based image coloring method, device, electronic device, computer-readable storage medium, and computer program product, which can accurately colorize the image to be colored and realize diversified coloring, thereby improving the image processing accuracy and image processing efficiency of electronic devices.
  • the following describes an exemplary application of the artificial intelligence-based image coloring method provided by the embodiments of the present application.
  • The artificial intelligence-based image coloring method provided by the embodiments of the present application may be implemented by various electronic devices; for example, it may be implemented by a terminal alone, or collaboratively by a server and a terminal.
  • The terminal alone may execute the artificial intelligence-based image coloring method described below; alternatively, the terminal and the server execute it together, for example, the terminal sends the image to be colored to the server, and the server performs the artificial intelligence-based image coloring method on the received image.
  • The electronic device for image coloring may be various types of terminal devices or servers. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • the terminal can be a smart phone, a tablet computer, a notebook computer, a desktop computer, etc., but is not limited to this.
  • the terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • Artificial intelligence cloud services are generally also called AI as a Service (AIaaS).
  • An AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model is similar to an AI-themed mall: all users can access one or more of the artificial intelligence services provided by the AIaaS platform through application programming interfaces.
  • one of the artificial intelligence cloud services may be an image coloring service, that is, a server in the cloud is encapsulated with the image coloring program provided by the embodiment of the present application.
  • The terminal sends an image coloring request carrying the image to be colored to the server in the cloud; the server in the cloud invokes the packaged image coloring program and generates a first colored image based on the image to be colored.
  • The first colored image is returned to the terminal, so that the terminal can display the first colored image.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence-based coloring system 10 provided by an embodiment of the present application.
  • the terminal 400 is connected to the server 200 through the network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
  • For example, the terminal 400 runs an electronic photo album application.
  • The terminal 400 receives the user's repair operation on a photo (the image to be colored), that is, the photo needs to be colored, and sends an image coloring request to the server 200.
  • The server 200 receives the image coloring request from the terminal 400; the image coloring request carries the image to be colored.
  • The server 200 obtains the first color prior information of the image to be colored and transforms it to obtain the second color prior information aligned with the image to be colored; it then colorizes the image to be colored based on the second color prior information, obtains a first colored image aligned with the image to be colored, and sends the first colored image to the terminal 400 to display the first colored image in the terminal 400.
  • the image coloring method provided in the application embodiment can also colorize the video frame in the video file, thereby realizing video restoration.
  • the terminal implements the artificial intelligence-based image coloring method provided by the embodiment of the present application by running a computer program
  • The computer program may be a native program or software module in the operating system; it may be a native application (APP, Application), that is, an artificial intelligence-based image coloring program that needs to be installed in the operating system to run; it may also be an applet, that is, an artificial intelligence-based image coloring applet that only needs to be downloaded into the browser environment of any client to run.
  • the above-mentioned computer program can be any application, module or plug-in in any form.
  • FIG. 2 is a schematic structural diagram of a terminal 400 provided by an embodiment of the present application.
  • the terminal 400 shown in FIG. 2 includes: at least one processor 410 , a memory 450 , at least one network interface 420 and a user interface 430 .
  • the various components in terminal 400 are coupled together by bus system 440 .
  • bus system 440 is used to implement the connection communication between these components.
  • the bus system 440 also includes a power bus, a control bus, and a status signal bus.
  • For clarity, the various buses are labeled as bus system 440 in FIG. 2.
  • the processor 410 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., where a general-purpose processor may be a microprocessor or any conventional processor or the like.
  • User interface 430 includes one or more output devices 431 that enable presentation of media content, including one or more speakers and/or one or more visual display screens.
  • User interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, and other input buttons and controls.
  • Memory 450 may be removable, non-removable, or a combination thereof.
  • Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like.
  • Memory 450 optionally includes one or more storage devices that are physically remote from processor 410 .
  • Memory 450 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory).
  • the memory 450 described in the embodiments of the present application is intended to include any suitable type of memory.
  • memory 450 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
  • the operating system 451 includes system programs for processing various basic system services and performing hardware-related tasks, such as framework layer, core library layer, driver layer, etc., for implementing various basic services and processing hardware-based tasks;
  • a presentation module 453 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., a display screen, speakers, etc.) associated with the user interface 430;
  • An input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
  • the artificial intelligence-based image coloring device provided by the embodiments of the present application may be implemented in software.
  • FIG. 2 shows the artificial intelligence-based image coloring device 455 stored in the memory 450, which may be software in the form of programs and plug-ins, including the following software modules: an acquisition module 4551, a transformation module 4552, a processing module 4553, and a training module 4554. These modules are logical, and therefore can be arbitrarily combined or further split according to the functions they realize. The function of each module is explained below.
  • FIG. 3 is a schematic diagram of the composition and structure of the coloring system 10 provided by the embodiment of the present application.
  • the coloring system 10 includes an encoder, a pre-trained GAN, a transformation network, and a coloring network.
  • the encoder is used to obtain the encoding vector of the image to be colored, and the encoder can be a generator in a generative adversarial network, an encoder part of an automatic encoder, or a convolutional neural network.
  • the pre-trained GAN is the generator of a trained GAN and is used to generate the second colored image and the first color prior information of the image to be colored.
  • the transformation network is used to transform the first color prior information based on the image to be colored and the second colored image to obtain the second color prior information.
  • the coloring network is used to generate the first colored image based on the image to be colored and the second color prior information.
  • FIG. 4 is a schematic diagram of image coloring provided by an embodiment of the present application.
  • the coloring network includes a downsampling module, a residual module and an upsampling module.
  • the downsampling module is composed of multiple downsampling layers and is configured to downsample the image to be colored to obtain the first image feature;
  • the residual module is composed of multiple residual blocks and is configured to perform modulation coloring processing on the first image feature based on the second color prior information to obtain the second image feature;
  • the upsampling module is composed of multiple upsampling layers and is configured to perform upsampling processing on the second image feature based on the second color prior information to obtain the first colored image aligned with the image to be colored. A minimal sketch of this three-stage structure is given below.
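  • The following is a minimal PyTorch sketch of this three-stage coloring network, assuming strided convolutions for downsampling and externally supplied modulation parameters (α, β); the channel widths, layer counts, and the layout of the `mods` argument are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

def modulate(f, alpha, beta):
    # Linear modulation of the feature f to be modulated: f' = alpha * f + beta.
    return alpha * f + beta

class ColoringNetwork(nn.Module):
    """Sketch: downsampling -> modulated residual blocks -> modulated upsampling."""
    def __init__(self, ch=64, num_res=3):
        super().__init__()
        # Downsampling module: convolutions that extract the first image feature.
        self.down = nn.Sequential(
            nn.Conv2d(1, ch, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, 2, 1), nn.ReLU())
        # Residual module convolutions; their outputs are modulated by the prior.
        self.res_convs = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, 1, 1) for _ in range(num_res))
        # Upsampling module: transposed convolutions back to full resolution,
        # ending in the two missing color (A and B) channels.
        self.up1 = nn.ConvTranspose2d(ch, ch, 4, 2, 1)
        self.up2 = nn.ConvTranspose2d(ch, 2, 4, 2, 1)

    def forward(self, x_l, mods):
        # mods: per-stage (alpha, beta) pairs derived from the aligned color prior.
        f = self.down(x_l)                                   # first image feature
        for conv, (a, b) in zip(self.res_convs, mods["res"]):
            f = f + torch.relu(modulate(conv(f), a, b))      # modulation coloring
        f = torch.relu(modulate(self.up1(f), *mods["up1"]))  # modulated upsampling
        return self.up2(f)                                   # predicted A/B channels
```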
  • The execution subject of the following method may be a terminal or a server; specifically, it may be implemented by the terminal or the server running the various computer programs described above. Of course, from the following description, it is not difficult to see that the artificial intelligence-based image coloring method provided by the embodiments of the present application can also be implemented collaboratively by the terminal and the server.
  • FIG. 5 is a schematic flowchart of an artificial intelligence-based image coloring method provided by an embodiment of the present application, which will be described in conjunction with the steps shown in FIG. 5, the parts of the coloring system shown in FIG. 3, and FIG. 4 .
  • step 101 first color prior information of the image to be colored is obtained.
  • the image to be colored is a grayscale image in the LAB color mode, that is, the grayscale image has only a luminance channel (L) and lacks color channels (A and B). If the image to be colored is in RGB color mode, it needs to be converted to LAB color mode first.
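  • For concreteness, the following is a minimal sketch of this color-mode handling using OpenCV; the file name and the neutral placeholder channels are illustrative.

```python
import cv2
import numpy as np

# Convert an image to the LAB color mode; the L channel is the luminance-only
# "image to be colored", and A/B are the color channels the method predicts.
bgr = cv2.imread("photo.png")                 # OpenCV loads images as BGR
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)    # convert to the LAB color mode
l_channel = lab[:, :, 0]                      # the grayscale input x_l

# After the missing channels are predicted, LAB is converted back for display.
predicted_ab = np.full_like(lab[:, :, 1:], 128)  # 128 = neutral A/B placeholder
recolored = cv2.cvtColor(np.dstack([l_channel, predicted_ab]), cv2.COLOR_LAB2BGR)
```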
  • the first color prior information is color prior information related to the image to be colored, for example, the color prior information related to the image to be colored in the pre-training GAN, that is, the intermediate layer feature of the GAN.
  • the encoding vector of the to-be-colored image may be obtained through the encoder first.
  • the encoder can be replaced with other convolutional neural networks.
  • colorize the to-be-colored image through pre-training GAN to obtain a second colorized image.
  • the pre-trained GAN can be a trained BigGAN or a trained StyleGAN.
  • the generator of BigGAN includes multiple residual blocks.
  • the encoding vector is linearly transformed and sent to the first residual block.
  • Each residual block includes batch normalization (BN, Batch Normalization) layers, activation layers, and convolutional layers.
  • Each residual block is skip-connected through 1 ⁇ 1 convolutions to achieve identity mapping to encoded vectors.
  • Identity mapping can directly pass the output of the previous layer (also the input of the latter layer) to the output of the latter layer, so that the output of the latter layer is approximated to its input, so as to keep the accuracy in the latter layer without causing a drop in accuracy .
  • BigGAN generates a second colored image.
  • the residual blocks can also be skip-connected through non-1 ⁇ 1 convolution.
  • the size of the feature maps corresponding to the output features of each residual block is different, that is, the scales of the output features are different.
  • the output features (multi-scale features) of different residual blocks are combined to obtain the first color prior information.
  • The identity mapping processing can effectively improve the output accuracy of the multi-scale features, that is, improve the acquisition accuracy of the first color prior information, so that the image to be colored can be accurately colorized, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
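  • The sketch below illustrates collecting such multi-scale intermediate features while the generator runs; the `input_linear`, `blocks`, and `to_rgb` attributes stand in for whatever a concrete BigGAN/StyleGAN implementation exposes and are assumptions, not a real library API.

```python
import torch
import torch.nn as nn

def first_color_prior(generator: nn.Module, z: torch.Tensor):
    """Run the pre-trained generator on encoding vector z, collecting the output
    feature of every residual block as the multi-scale first color prior."""
    features = []
    h = generator.input_linear(z)         # linear transform of the encoding vector
    for block in generator.blocks:        # assumed list of residual blocks
        h = block(h)                      # each block changes scale, so features differ
        features.append(h)                # intermediate layer feature at this scale
    second_colored = generator.to_rgb(h)  # the (unaligned) second colored image
    return features, second_colored      # the features together form F_prior
```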
  • step 102 transform processing is performed on the first color prior information to obtain second color prior information aligned with the image to be colored.
  • Here, alignment means that the same part (corresponding to one or more pixels) is in the same position in different images; for example, the multiple pixels that make up a chicken's tail are in the same position in different images.
  • the color prior information is aligned with the image to be colored, and its essence is that the position of the same object in the two is consistent. Since the color prior information is expressed in the form of a feature map, the alignment of the color prior information and the image to be colored means that the positions of the same object in the image to be colored and the color prior information are consistent.
  • The background part and the foreground part of the second colored image and of the image to be colored shown in the figure are not in one-to-one positional correspondence, that is, the corresponding pixels in the two images are not aligned.
  • the position of the chicken tail in the second colored image is obviously inconsistent with the position of the chicken tail in the image to be colored.
  • The multi-scale features in the first color prior information and the image features corresponding to the image to be colored are not in one-to-one correspondence; there is a deviation between them. Therefore, it is necessary to transform the first color prior information to obtain the second color prior information aligned with the image to be colored, that is, color prior information aligned with the image features corresponding to the image to be colored; at this point, the colored image corresponding to the second color prior information is aligned with the image to be colored.
  • Performing transformation processing on the first color prior information in step 102 to obtain the second color prior information aligned with the image to be colored is realized by the transformation network in the coloring system 10, and the realization process is shown in steps 1021 to 1023 in FIG. 6.
  • step 1021 the similarity matrix between the image to be colored and the second colored image is determined, and the second colored image is obtained by colorizing the image to be colored and is not aligned with the image to be colored.
  • the first position feature of the image to be colored and the second position feature of the second color image can be extracted by the feature extractor, respectively.
  • the first position feature includes the position feature of each pixel in the to-be-colored image
  • the second position feature includes the position feature of each pixel in the second color image.
  • non-local processing is performed on the first position feature and the second position feature to obtain a similarity matrix between the image to be colored and the second colored image.
  • the similarity matrix includes the similarity between each pixel in the image to be colored and each pixel in the second colored image.
  • Non-local processing is used to obtain, by calculation, the similarity between a pixel in the image to be colored and any pixel in the second colored image; the calculation methods include dot product, splicing, and bilinear similarity measures.
  • When the similarity is calculated by dot product, the similarity of two positions can be obtained by calculating the dot product of the position vectors (position features) at the corresponding positions of the image to be colored and the second colored image.
  • When the similarity is calculated by splicing, the position vectors of the corresponding positions in the two images are spliced and sent to a perceptron to predict the similarity between the two.
  • the similarity matrix can be normalized by the softmax function, so that the sum of the elements of each row in the similarity matrix is 1.
  • the obtained normalized similarity matrix is used as the similarity matrix between the image to be colored and the second colored image.
  • Determining the similarity matrix between the image to be colored and the second colored image (which is not aligned with the image to be colored) based on the positional features of the pixels can improve the accuracy of the similarity matrix, thereby improving the image processing accuracy and image processing efficiency of electronic equipment.
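  • A sketch of this computation under the dot-product variant, with the softmax row normalization described above (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def similarity_matrix(feat_gray: torch.Tensor, feat_color: torch.Tensor):
    """feat_gray:  position features of the image to be colored,  [N, C, H, W]
    feat_color: position features of the second colored image, [N, C, H, W]
    Returns the row-normalized similarity matrix M, shape [N, H*W, H*W]."""
    a = feat_gray.flatten(2).transpose(1, 2)  # [N, H*W, C], one vector per pixel
    b = feat_color.flatten(2)                 # [N, C, H*W]
    m = torch.bmm(a, b)                       # dot-product similarity per pixel pair
    return F.softmax(m, dim=-1)               # each row sums to 1
```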
  • step 1022 affine transformation is performed on the first color prior information based on the similarity matrix to obtain multi-scale features aligned with the image to be colored.
  • The first color prior information includes the multi-scale features obtained in the process of colorizing the image to be colored; performing affine transformation on the multi-scale features in the first color prior information, that is, multiplying the similarity matrix with the multi-scale features in the first color prior information, yields the multi-scale features aligned with the image to be colored.
  • step 1023 the multi-scale features aligned with the to-be-colored image are used as the second color prior information.
  • In this way, the similarity matrix between the image to be colored and the second colored image is obtained based on the similarity of the positional features at the corresponding positions of the two images, and the first color prior information is transformed by the similarity matrix.
  • The second color prior information aligned with the image to be colored can thus be obtained, which provides an accuracy guarantee for the subsequent generation of the first colored image aligned with the image to be colored, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
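  • Steps 1022 and 1023 then reduce to multiplying M with each scale of the prior. In this sketch, the interpolation to a common resolution is an added assumption for shape compatibility:

```python
import torch
import torch.nn.functional as F

def align_prior(m: torch.Tensor, prior_feats, size):
    """Affine-transform each multi-scale prior feature into alignment with the
    image to be colored by multiplying it with the similarity matrix M."""
    aligned = []
    for feat in prior_feats:                    # feat: [N, C, h, w] at one scale
        feat = F.interpolate(feat, size=size)   # bring to the reference H x W
        n, c, hh, ww = feat.shape
        flat = feat.flatten(2).transpose(1, 2)  # [N, H*W, C], one row per pixel
        warped = torch.bmm(m, flat)             # weighted mix over source pixels
        aligned.append(warped.transpose(1, 2).reshape(n, c, hh, ww))
    return aligned                              # the second color prior information
```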
  • step 103 the image to be colored is down-sampled to obtain a first image feature.
  • The image to be colored is downsampled by the downsampling module in the coloring network.
  • the down-sampling module includes multiple down-sampling layers, and in each down-sampling layer, the input features are convolved to obtain corresponding image features, and the obtained image features represent the position information and semantic information of the image to be colored.
  • the obtained image features are pooled to obtain the corresponding pooling results, and the pooling results are used as the input features of the next layer. Take the output of the last downsampling layer as the first image feature.
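  • A sketch of one such downsampling layer (channel widths are illustrative):

```python
import torch.nn as nn

# One downsampling layer: the convolution extracts position and semantic
# features, and the pooling compresses them; the pooled result feeds the next
# layer, and the last layer's output is the first image feature.
down_layer = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),  # or nn.AvgPool2d(2), per the definition above
)
```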
  • step 104 modulation and coloring processing is performed on the first image feature based on the second color prior information to obtain the second image feature.
  • the residual module and the upsampling module of the colorization network are respectively controlled by the multi-scale features aligned with the image to be colored.
  • Different scale features in the multi-scale features aligned with the image to be colored correspond to different parts of the colorization network.
  • For example, when the upsampling module of the coloring network includes two upsampling layers, there are a total of three scale features in the multi-scale features aligned with the image to be colored, corresponding respectively to the residual module, the first upsampling layer, and the second upsampling layer.
  • A first modulation parameter is determined based on the multi-scale features aligned with the image to be colored in the second color prior information. That is, among the multi-scale features aligned with the image to be colored, the first scale feature corresponding to the residual module in the coloring network is determined, and convolution processing is performed on the first scale feature to obtain the first modulation parameter corresponding to the residual module. Because the residual module is generally composed of at least two residual blocks, multiple different convolution processes are performed on the first scale feature in parallel to obtain the first modulation parameters (α and β, where α represents the weight and β represents the bias); the dimension of each first modulation parameter is consistent with the dimension of the feature f to be modulated in the corresponding residual block.
  • Each residual block has multiple layers, and each layer consists of a convolution layer, a spatially adaptive normalization (SPADE, Spatially-Adaptive Normalization) layer, and an activation layer.
  • The feature f to be modulated is the feature obtained after the input feature is convolved by the convolution layer in each residual block. For example, when there are six residual blocks in the coloring network, six different convolution processes are performed on the first scale feature in parallel through different convolutional neural networks, obtaining six first modulation parameters corresponding to the six residual blocks: (α1, β1), (α2, β2), (α3, β3), (α4, β4), (α5, β5), (α6, β6).
  • Like the BN layer, the SPADE layer is used for regularization and modulates features with learned modulation parameters.
  • However, the SPADE layer is a conditional regularization layer, that is, its modulation parameters are obtained externally; moreover, the modulation parameters in the SPADE layer are tensors, rather than the vectors used in the BN layer.
  • the SPADE layer can better preserve the semantic information, so that the colorization network can generate the first colorized image with real texture.
  • the first image feature is modulated and colored by using the first modulation parameter to obtain the second image feature.
  • Convolution processing is performed on the first image feature through the convolution layer in the residual block to obtain a corresponding convolution result.
  • the obtained convolution result is linearly transformed by the first modulation parameter.
  • the formula for the linear transformation is shown in formula (1):
  • f′ = α ⊙ f + β (1)
  • where f is the feature to be modulated, α and β are the first modulation parameters, ⊙ denotes element-wise multiplication, and f′ is the feature obtained by modulating the feature to be modulated with the first modulation parameter, that is, the result of the linear transformation.
  • the first image feature is modulated and colored by using the first modulation parameter, so that the process of modulating and coloring has a reference and basis, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
  • In this way, the first modulation parameter is obtained so that semantic information can be better preserved, thereby improving the image processing accuracy and image processing efficiency of electronic equipment.
  • The first linear transformation result is mapped to a high-dimensional nonlinear interval; finally, the mapped first linear transformation result and the first image feature are added, and the obtained summation result is used as the second image feature.
  • When the residual block is an identity mapping, the mapped first linear transformation result is directly added to the first image feature; when the residual block is a non-identity mapping, the first image feature is enlarged or reduced before being added to the mapped first linear transformation result.
  • the summation processing result of the previous residual block is the input of the latter residual block, and the summation processing result of the last residual block is taken as the second image feature.
  • the image information can be retained as much as possible in the second image feature, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
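  • Putting the pieces of this modulation together, the following sketches one modulated residual block; the convolutions that produce α and β from the first scale feature are an assumption in the spirit of SPADE rather than the patent's exact layers:

```python
import torch
import torch.nn as nn

class ModulatedResBlock(nn.Module):
    """One residual block: convolve, modulate with (alpha, beta) derived from
    the aligned color prior, activate, then add the identity skip connection."""
    def __init__(self, ch, prior_ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        # Convolutions turning the first scale feature of the prior into the
        # first modulation parameters alpha (weight) and beta (bias).
        self.to_alpha = nn.Conv2d(prior_ch, ch, 3, padding=1)
        self.to_beta = nn.Conv2d(prior_ch, ch, 3, padding=1)

    def forward(self, f_in, prior):
        # prior: first scale feature, assumed spatially matched to f_in.
        f = self.conv(f_in)                   # the feature f to be modulated
        alpha, beta = self.to_alpha(prior), self.to_beta(prior)
        f = alpha * f + beta                  # formula (1): f' = alpha * f + beta
        f = torch.relu(f)                     # map into a nonlinear interval
        return f_in + f                       # identity-mapping summation
```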
  • step 105 an up-sampling process is performed on the second image feature based on the second color prior information to obtain a first colored image aligned with the image to be colored.
  • the second modulation parameter is determined based on the multi-scale features in the second color prior that are aligned with the image to be colored. That is, among the multi-scale features aligned with the image to be colored, the second scale feature corresponding to the upsampling module in the coloring network is determined. For example, when the upsampling module includes two upsampling layers, there are second scale features corresponding to the two upsampling layers respectively in the multi-scale features. Convolution processing is performed on the second scale feature through a convolutional neural network to obtain a second modulation parameter corresponding to the upsampling module.
  • the linear transformation result corresponding to the previous upsampling layer is the input of the latter upsampling layer, and the linear transformation result of the last upsampling layer is the predicted color image.
  • the first colored image can be accurately generated, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
  • In this way, the second modulation parameter is obtained so that semantic information can be better preserved, thereby improving the image processing accuracy and image processing efficiency of electronic equipment.
  • a conversion process may be performed on the encoded vector to obtain a conversion vector.
  • One can control the modification of the encoding vector by adding noise vectors to the encoding vector, by changing the class of its input when training the pre-trained GAN, or by finding directions related to color changes through unsupervised learning and then changing the encoding vector along these directions.
  • third color prior information aligned with the to-be-colored image is determined based on the conversion vector. That is, the conversion vector is used as the input vector of the pre-training GAN to obtain the third color prior information (ie, the intermediate layer feature of the pre-training GAN) in the process of generating the corresponding colored image by the pre-training GAN.
  • The image to be colored is modulated and colored based on the third color prior information to obtain a third colored image aligned with the image to be colored.
  • the modulation and coloring process is similar to that described above, and will not be repeated here.
  • The third colored image includes at least one of the following: an image in which the background of the image to be colored is colorized, an image in which the foreground of the image to be colored is colorized, and an image in which the saturation of the image to be colored is adjusted. A sketch of these controls is given below.
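  • A sketch of the three control strategies on the encoding vector z described above; the dimensions, class id, and direction vector are all placeholders:

```python
import torch

z = torch.randn(1, 128)                # encoding vector produced by the encoder

# 1) Add a noise vector to the encoding vector.
z_noisy = z + 0.1 * torch.randn_like(z)

# 2) For a class-conditional GAN such as BigGAN, change the input class.
class_id = torch.tensor([14])          # e.g., switch to a different bird class

# 3) Move z along directions (found, e.g., by unsupervised learning) that are
#    related to color changes: background color, foreground color, saturation.
color_direction = torch.randn(1, 128)  # placeholder for a discovered direction
z_shifted = z + 2.0 * color_direction

# Each modified vector yields different third color prior information and thus
# a differently colored result for the same grayscale input.
```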
  • In this way, the embodiments of the present application can not only automatically generate colored images with vivid colors that are highly aligned with the original image, but can also generate colored images with different coloring effects by controlled modification of the encoding vector, realizing diversified coloring and thereby improving the image processing accuracy and image processing efficiency of electronic devices.
  • the pre-trained GAN is pre-trained and its parameters are fixed.
  • The error between the image features of the colored image generated by the generator of the pre-trained GAN and the image features of the actual color image corresponding to the image to be colored is determined, and the error is back-propagated in the encoder to update the parameters of the encoder.
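  • A sketch of this first training stage; `discriminator.features` is a hypothetical feature-extraction hook, since the text only states that the discriminator features of the two images are constrained to be close:

```python
import torch
import torch.nn.functional as F

def encoder_loss(encoder, generator, discriminator, x_l, x_rgb):
    """Stage 1: train only the encoder; the pre-trained GAN stays frozen. The
    error between discriminator features of the generated colored image and of
    the actual color image x_rgb is back-propagated into the encoder."""
    z = encoder(x_l)                               # encoding vector for x_l
    generated = generator(z)                       # second colored image
    feat_fake = discriminator.features(generated)  # assumed feature hook
    feat_real = discriminator.features(x_rgb)
    return F.mse_loss(feat_fake, feat_real)        # pull the features together
```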
  • the total loss function is determined based on the adversarial loss function, perceptual loss function, domain alignment loss function, and context loss function corresponding to the coloring network.
  • the adversarial loss function is used to make the first colored image generated by the coloring network more realistic;
  • the perceptual loss function is used to make the first colored image generated by the coloring network look more realistic and reasonable;
  • the domain alignment loss function is used to map the image to be colored and the second colored image to the same feature space;
  • the context loss function is used to measure the similarity between the two unaligned images (the first colored image and the second colored image).
  • the to-be-colored image sample is processed by the coloring system 10 to obtain a first colored image sample aligned with the to-be-colored image sample, a second colored image sample not aligned with the to-be-colored image sample, and a predicted color image sample.
  • The first colored image sample and the second colored image sample are both image samples in the RGB color mode, while the predicted color image sample is an image sample in the LAB color mode.
  • By performing color mode conversion on the predicted color image sample in the LAB color mode, an image sample in the RGB color mode can be obtained; that is, the first colored image sample is obtained by performing color mode conversion on the predicted color image sample.
  • The adversarial loss value is determined based on the error between the predicted color image sample and the corresponding first actual color image;
  • the perceptual loss value is determined based on the error between the second colored image sample and the corresponding second actual color image;
  • the domain alignment loss value is determined based on the error between the image sample to be colored and the second colored image sample;
  • the context loss value is determined based on the error between the first colored image sample and the second colored image sample.
  • the first actual color image is an actual color image sample of the LAB color mode corresponding to the image sample to be colored
  • the predicted color image sample is obtained by predicting the missing two color channels of the image sample to be colored
  • The second colored image sample is a predicted color image sample in the RGB color mode, and the second actual color image is the actual color image sample in the RGB color mode corresponding to the image sample to be colored.
  • Color mode conversion is performed on the first actual color image to obtain the second actual color image.
  • the adversarial loss value, perceptual loss value, domain alignment loss value, and context loss value are weighted and summed to obtain the total loss value.
  • The total loss value is back-propagated in the coloring network based on the total loss function to update the parameters of the coloring network.
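  • A sketch of the resulting second-stage objective; the loss weights are hypothetical, since the text does not specify them:

```python
def total_loss(adv, perc, domain_align, context,
               w_adv=1.0, w_perc=0.1, w_da=1.0, w_ctx=0.5):
    """Weighted sum of the adversarial, perceptual, domain alignment, and
    context loss values; the result is back-propagated through the coloring
    network only (the pre-trained GAN and encoder stay fixed)."""
    return w_adv * adv + w_perc * perc + w_da * domain_align + w_ctx * context

# loss = total_loss(adv_v, perc_v, da_v, ctx_v)
# loss.backward()   # update only the coloring network's parameters
# optimizer.step()
```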
  • the accuracy of coloring processing of the coloring network can be improved, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
  • The embodiment of the present application determines the second color prior information aligned with the image to be colored and, based on the second color prior information, performs modulation coloring processing and upsampling processing on the first image feature corresponding to the image to be colored to obtain the first colored image. Because the second color prior information is aligned with the image to be colored, the first colored image generated based on the second color prior information is aligned with the image to be colored. In this way, accurate colorization of the image to be colored is achieved, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
  • In response to a user's coloring operation on a grayscale video file, the terminal sends a coloring request carrying the grayscale video file to the cloud server.
  • The cloud server decodes the grayscale video file to obtain multiple video frames, each of which is an image to be colored; it then colorizes the multiple video frames (images to be colored) to obtain multiple first colored images.
  • a plurality of first colored images are encoded to obtain a new video file in color, and the new video file is sent to the terminal to present the new video in the terminal.
  • In FIG. 3, an encoder (for example, a GAN encoder) receives the grayscale image x_l (the image to be colored) and outputs the encoding vector z.
  • The pre-trained GAN receives z to generate the second colored image and the related first color prior information (i.e., the intermediate layer features F_prior). Since the first color prior information is related to x_l rather than fully aligned with it (as shown in FIG. 4, the position of the chicken tail in the second colored image is not consistent with its position in x_l), the positional correspondence between x_l and the second colored image needs to be determined.
  • This correspondence is the similarity matrix M between the two; M represents the positional similarity between pixel points and is used to align the first color prior information with x_l.
  • In this way, the second color prior information is obtained, and some parameters in the coloring network are controlled by the second color prior information, so as to achieve the purpose of using the color prior information to guide the coloring.
  • Finally, the coloring network outputs the first colored image based on the image to be colored.
  • The color prior information related to x_l needs to be found in the pre-trained GAN.
  • To this end, an encoder, which is a neural network, is introduced to receive x_l and output z. After the z corresponding to x_l is determined by the encoder, the pre-trained GAN receives z and outputs a second colored image whose content is as similar to x_l as possible.
  • The multi-scale feature F_prior formed by the features of multiple intermediate layers of the pre-trained GAN is then the first color prior information most related to x_l.
  • During training, the actual color image x_rgb corresponding to x_l and the generated second colored image are constrained so that their features in the pre-trained GAN's discriminator are as close as possible.
  • the feature f to be modulated represents the image feature obtained by convolution in the residual block of the coloring network and the image feature obtained by convolution in the upsampling layer, and f' is the modulated feature. After the feature f to be modulated is modulated, it enters the next layer for processing, and finally, the coloring network generates a first coloring image that is aligned with the image to be colored.
  • The pre-trained GAN can be a BigGAN (or, alternatively, a StyleGAN), pre-trained on the ImageNet dataset.
  • The whole training is divided into two stages: the first stage trains the encoder; the second stage trains the entire model except the pre-trained GAN and the encoder, because by the second stage both have already been trained and their parameters are fixed. The loss functions used in the second stage include the adversarial loss function, perceptual loss function, domain alignment loss function, and context loss function.
  • different color priors may be used to guide the coloring.
  • The first color prior information can be changed by changing z, for example by adding a noise vector to the encoding vector, by changing the class of the input when training BigGAN (when the pre-trained GAN is a BigGAN), or by finding directions related to color changes through unsupervised learning and then changing z along those directions; this makes the final colored image produce different colorings.
  • FIG. 7 is a schematic diagram of the coloring effect provided by the embodiment of the present application.
  • The first row in FIG. 7 shows the input images to be colored, and the second row shows the colored images (results) obtained by the artificial intelligence-based image coloring method proposed by the embodiments of the present application.
  • The third row shows that, for an input grayscale image containing birds, changing the bird category colors the grayscale image with different colors, yielding diverse results.
  • FIG. 8 is a schematic diagram of a coloring effect provided by an embodiment of the present application, and FIG. 8 shows an image in which z is changed along some directions to generate a variety of coloring effects.
  • The directions shown in FIG. 8 are related to the background color, the color of the foreground (e.g., vase, truck), and the color saturation.
  • The first row (in which the first image is the image to be colored) shows the different images obtained after the background of the image to be colored is colorized;
  • the second and third rows show the different images obtained after the foreground of the image to be colored is colorized;
  • the fourth to sixth rows show the different images obtained after the saturation of the image to be colored is adjusted.
  • The embodiments of the present application guide coloring through color prior information, can automatically and conveniently generate high-quality colored images with vivid colors, and can also obtain different coloring effects by controlled modification of the color prior information, achieving diversified coloring and thereby improving the image processing accuracy and image processing efficiency of electronic equipment.
  • The artificial intelligence-based image coloring device 455 stored in the memory 450 may include: an obtaining module 4551 configured to obtain the first color prior information of the image to be colored; a transformation module 4552 configured to transform the first color prior information to obtain the second color prior information aligned with the image to be colored; and a processing module 4553 configured to downsample the image to be colored to obtain the first image feature, to perform modulation coloring processing on the first image feature based on the second color prior information to obtain the second image feature, and to perform upsampling processing on the second image feature based on the second color prior information to obtain the first colored image, wherein the first colored image is aligned with the image to be colored.
  • The obtaining module 4551 is further configured to obtain an encoding vector of the image to be colored; perform identity mapping processing on the encoding vector to obtain a second colored image, wherein the second colored image is not aligned with the image to be colored; and use the multi-scale features as the first color prior information, wherein the multi-scale features are obtained in the process of obtaining the second colored image through the identity mapping.
  • The transformation module 4552 is further configured to determine a similarity matrix between the image to be colored and the second colored image, wherein the second colored image is obtained by colorizing the image to be colored and is not aligned with the image to be colored; perform affine transformation on the first color prior information based on the similarity matrix to obtain multi-scale features aligned with the image to be colored, wherein the first color prior information includes the multi-scale features obtained in the process of colorizing the image to be colored; and use the multi-scale features aligned with the image to be colored as the second color prior information.
  • The transformation module 4552 is further configured to obtain a first position feature of the image to be colored and a second position feature of the second colored image, wherein the first position feature includes the position feature of each pixel in the image to be colored, and the second position feature includes the position feature of each pixel in the second colored image; and determine, based on the first position feature and the second position feature, the similarity matrix between the image to be colored and the second colored image, wherein the similarity matrix includes the similarity between each pixel in the image to be colored and each pixel in the second colored image.
  • The transformation module 4552 is further configured to perform non-local processing on the first position feature and the second position feature to obtain a similarity matrix corresponding to the non-local processing, and to perform normalization processing on that matrix to obtain the similarity matrix between the image to be colored and the second colored image.
  • the processing module 4553 is further configured to determine a first modulation parameter based on the multi-scale features aligned with the to-be-colored image in the second color prior information; modulate the first image feature by using the first modulation parameter The coloring process is performed to obtain the second image feature.
  • The modulation coloring process is implemented by a coloring network, and the coloring network includes a residual module; the processing module 4553 is further configured to determine, among the multi-scale features aligned with the image to be colored, the first scale feature corresponding to the residual module in the coloring network, and to perform convolution processing on the first scale feature to obtain the first modulation parameter corresponding to the residual module.
  • The processing module 4553 is further configured to perform convolution processing on the first image feature to obtain a convolution result; perform first linear transformation processing on the convolution result using the first modulation parameter to obtain a first linear transformation result; and perform summation processing on the first linear transformation result and the first image feature, using the obtained summation result as the second image feature.
  • The processing module 4553 is further configured to determine the second modulation parameter based on the multi-scale features aligned with the image to be colored in the second color prior information; perform deconvolution processing on the second image feature to obtain a deconvolution result; perform second linear transformation processing on the deconvolution result using the second modulation parameter to obtain a second linear transformation result; perform activation processing on the second linear transformation result to obtain a predicted color image aligned with the image to be colored; and perform color mode conversion processing on the predicted color image to obtain the first colored image.
  • The modulation coloring process is implemented by a coloring network, and the coloring network includes an upsampling module; the processing module 4553 is further configured to determine, among the multi-scale features aligned with the image to be colored, the second scale feature corresponding to the upsampling module in the coloring network, and to perform convolution processing on the second scale feature to obtain the second modulation parameter corresponding to the upsampling module.
  • The processing module 4553 is further configured to perform conversion processing on the encoding vector to obtain a conversion vector; determine third color prior information aligned with the image to be colored based on the conversion vector; and perform modulation coloring processing on the image to be colored based on the third color prior information to obtain a third colored image aligned with the image to be colored, wherein the third colored image includes at least one of the following: an image in which the background of the image to be colored is colorized, an image in which the foreground of the image to be colored is colorized, and an image in which the saturation of the image to be colored is adjusted.
  • Downsampling, modulation coloring, and upsampling are implemented by a coloring network, and the artificial intelligence-based image coloring apparatus further includes a training module 4554 configured to train the coloring network by: determining a total loss function based on the adversarial loss function, perceptual loss function, domain alignment loss function, and context loss function corresponding to the coloring network; calling the coloring network to colorize an image sample to be colored to obtain a first colored image sample, a second colored image sample, and a predicted color image sample, wherein the first colored image sample is obtained by converting the predicted color image sample and is aligned with the image sample to be colored, and the second colored image sample is not aligned with the image sample to be colored; determining an adversarial loss value based on the error between the predicted color image sample and the first actual color image corresponding to the predicted color image sample, determining a perceptual loss value based on the error between the second colored image sample and the second actual color image corresponding to the second colored image sample, determining a domain alignment loss value based on the error between the image sample to be colored and the second colored image sample, and determining a context loss value based on the error between the first colored image sample and the second colored image sample, wherein the second actual color image is obtained by converting the first actual color image; performing weighted summation of the adversarial loss value, perceptual loss value, domain alignment loss value, and context loss value to obtain a total loss value; and back-propagating the total loss value in the coloring network based on the total loss function to update the parameters of the coloring network.
  • the embodiments of the present application provide a computer-readable storage medium storing executable instructions, wherein the executable instructions are stored, and when the executable instructions are executed by a processor, the processor will cause the processor to execute the artificial intelligence-based artificial intelligence provided by the embodiments of the present application.
  • the image coloring method of for example, the artificial intelligence-based image coloring method shown in Figure 5.
  • the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; it may also be any of various devices including one of the above memories or any combination thereof.
  • executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • executable instructions may, but do not necessarily, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (for example, files that store one or more modules, subroutines, or code sections).
  • executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
  • the embodiments of the present application determine second color prior information aligned with the image to be colored, and perform modulation coloring processing and upsampling processing on the first image feature corresponding to the image to be colored based on the second color prior information, thereby obtaining a first colored image. Because the second color prior information is aligned with the image to be colored, the first colored image generated based on the second color prior information is also aligned with the image to be colored; in this way, automatic and precise coloring of the image to be colored is realized.
  • in addition, colored images with different coloring effects can be generated by controlling and modifying the color prior information, so as to realize diversified coloring.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The present application provides an artificial intelligence-based image coloring method, apparatus, electronic device, computer-readable storage medium, and computer program product. The method includes: acquiring first color prior information of an image to be colored; performing transformation processing on the first color prior information to obtain second color prior information aligned with the image to be colored; downsampling the image to be colored to obtain a first image feature; performing modulation coloring processing on the first image feature based on the second color prior information to obtain a second image feature; and performing upsampling processing on the second image feature based on the second color prior information to obtain a first colored image, wherein the first colored image is aligned with the image to be colored.

Description

Artificial intelligence-based image coloring method, apparatus, electronic device, computer-readable storage medium, and computer program product
Cross-reference to related applications
The embodiments of the present application are based on the Chinese patent application No. 202110075873.9 filed on January 20, 2021, and claim priority to that Chinese patent application, the entire content of which is incorporated into the embodiments of the present application by reference.
Technical field
The present application relates to image processing technology, and in particular, to an artificial intelligence-based image coloring method, apparatus, electronic device, computer-readable storage medium, and computer program product.
Background
Artificial intelligence (AI) is a comprehensive technology of computer science. By studying the design principles and implementation methods of various intelligent machines, machines are given the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, such as natural language processing and machine learning/deep learning. With the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.
Image processing is an important application of artificial intelligence; typically, a corresponding colored image can be generated from a grayscale image. However, in the related art, problems such as color bleeding and color fading are prone to occur in the process of generating a colored image, which greatly affects the quality of the generated colored image and makes it necessary to repair the colors of the image again, thereby reducing the image processing efficiency of the electronic device.
Summary
The embodiments of the present application provide an artificial intelligence-based image coloring method, apparatus, electronic device, computer-readable storage medium, and computer program product, which can accurately color an image to be colored, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
The technical solutions of the embodiments of the present application are implemented as follows:
An embodiment of the present application provides an artificial intelligence-based image coloring method, the method being executed by an electronic device, the method including:
acquiring first color prior information of an image to be colored;
performing transformation processing on the first color prior information to obtain second color prior information aligned with the image to be colored;
downsampling the image to be colored to obtain a first image feature;
performing modulation coloring processing on the first image feature based on the second color prior information to obtain a second image feature;
performing upsampling processing on the second image feature based on the second color prior information to obtain a first colored image, wherein the first colored image is aligned with the image to be colored.
An embodiment of the present application provides an artificial intelligence-based image coloring apparatus, including:
an acquisition module configured to acquire first color prior information of an image to be colored;
a transformation module configured to perform transformation processing on the first color prior information to obtain second color prior information aligned with the image to be colored;
a processing module configured to downsample the image to be colored to obtain a first image feature; to perform modulation coloring processing on the first image feature based on the second color prior information to obtain a second image feature; and to perform upsampling processing on the second image feature based on the second color prior information to obtain a first colored image, wherein the first colored image is aligned with the image to be colored.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
a processor for implementing the artificial intelligence-based image coloring method provided by the embodiments of the present application when executing the executable instructions stored in the memory.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the artificial intelligence-based image coloring method provided by the embodiments of the present application.
An embodiment of the present application provides a computer program product, including a computer program or instructions which, when executed by a processor, implement the artificial intelligence-based image coloring method provided by the embodiments of the present application.
The embodiments of the present application have the following beneficial effects:
Second color prior information aligned with the image to be colored is determined, and modulation coloring processing and upsampling processing are performed on the first image feature corresponding to the image to be colored based on the second color prior information, thereby obtaining a first colored image. Because the second color prior information is aligned with the image to be colored, the first colored image generated based on the second color prior information is aligned with the image to be colored. In this way, the image to be colored is colored accurately, the colors of the image do not need to be repaired again, and the image processing accuracy and image processing efficiency of the electronic device are improved.
Brief description of the drawings
Figure 1 is a schematic architecture diagram of the artificial intelligence-based coloring system 10 provided by an embodiment of the present application;
Figure 2 is a schematic structural diagram of the terminal 400 provided by an embodiment of the present application;
Figure 3 is a schematic diagram of the composition of the coloring system 10 provided by an embodiment of the present application;
Figure 4 is a schematic diagram of image coloring provided by an embodiment of the present application;
Figure 5 is a schematic flowchart of the artificial intelligence-based image coloring method provided by an embodiment of the present application;
Figure 6 is a schematic flowchart of the artificial intelligence-based image coloring method provided by an embodiment of the present application;
Figure 7 is a schematic diagram of coloring effects provided by an embodiment of the present application;
Figure 8 is a schematic diagram of coloring effects provided by an embodiment of the present application.
Detailed description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application will be described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments; it can be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third" are merely used to distinguish similar objects and do not represent a specific ordering of objects. It can be understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described here can be implemented in an order other than that illustrated or described here.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
Before the embodiments of the present application are described in further detail, the nouns and terms involved in the embodiments of the present application are explained; the nouns and terms involved in the embodiments of the present application are subject to the following interpretations.
1) Color prior information: color-related experience and historical data that can be obtained before an image is processed, which can be expressed, for example, in the form of feature maps. For example, when a generative adversarial network can generate colorful images, the generative adversarial network is considered to contain sufficient color prior information, and the color prior information may be a feature map including features of the intermediate layers of the generative adversarial network.
2) Affine transformation: a linear transformation from a two-dimensional vector to a two-dimensional vector. An affine transformation can be realized by compounding a series of atomic transformations, such as translation, scaling, flipping, rotation, and shearing.
3) Generative Adversarial Network (GAN): a deep learning model including a generator and a discriminator. The generator and the discriminator learn by playing a game against each other, thereby producing fairly good output. The discriminator performs classification prediction based on input variables, and the generator randomly generates observation data from given implicit information.
4) Foreground: the person or object located in front of or near the front of the subject in a shot.
5) Downsampling: further compressing a feature map, reducing features through max pooling or average pooling; it actually filters out features with little effect and redundant information, and retains key information.
Image coloring means dyeing a grayscale image with colors. The related art colors images based on deep learning, and such methods fall into two categories: fully automatic coloring, and coloring according to a reference image. The advantage of fully automatic coloring lies in its simplicity and convenience: once the loss function is designed, training and testing can be performed end to end, but it is prone to generating flawed colored images, for example, colored images with color bleeding and color fading. Coloring according to a reference image first requires providing a color reference image whose content is similar to that of the image to be colored, and then transferring the colors of the reference image to the image to be colored according to the matching between the two images. The coloring effect of coloring according to a reference image largely depends on the quality of the reference image: if the two images have similar content, the coloring effect will be good; but if the two images are dissimilar, the coloring effect will be poor. Therefore, coloring according to a reference image requires a lot of effort to select the reference image. Moreover, both methods have difficulty achieving diversified coloring.
In view of the above technical problems, the embodiments of the present application provide an artificial intelligence-based image coloring method, apparatus, electronic device, computer-readable storage medium, and computer program product, which can accurately and diversely color an image to be colored, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
Exemplary applications of the artificial intelligence-based image coloring method provided by the embodiments of the present application are described below. The artificial intelligence-based image coloring method provided by the embodiments of the present application can be implemented by various electronic devices; for example, it can be implemented by a terminal alone, or cooperatively by a server and a terminal. For example, the terminal alone executes the artificial intelligence-based image coloring method described below; or the terminal and the server execute the artificial intelligence-based image coloring method described below, for example, the terminal sends the image to be colored to the server, and the server executes the artificial intelligence-based image coloring method according to the received image to be colored.
The electronic device for image coloring provided by the embodiments of the present application may be various types of terminal devices or servers. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, etc., but is not limited thereto. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
Taking the server as an example, it may be, for example, a server cluster deployed in the cloud that opens an artificial intelligence cloud service (AI as a Service, AIaaS) to users. The AIaaS platform splits several common types of AI services and provides independent or packaged services in the cloud. This service model is similar to an AI-themed mall: all users can access one or more artificial intelligence services provided by the AIaaS platform through application programming interfaces.
For example, one of the artificial intelligence cloud services may be an image coloring service, that is, a server in the cloud encapsulates the image coloring program provided by the embodiments of the present application. In response to an image coloring trigger operation, the terminal sends an image coloring request carrying the image to be colored to the server in the cloud; the server in the cloud calls the encapsulated image coloring program, generates the first colored image based on the image to be colored, and returns the first colored image to the terminal, so that the terminal displays the first colored image.
In some embodiments, an exemplary coloring system is described by taking the server and the terminal cooperatively implementing the artificial intelligence-based image coloring method provided by the embodiments of the present application as an example. Referring to Figure 1, Figure 1 is a schematic architecture diagram of the artificial intelligence-based coloring system 10 provided by an embodiment of the present application. The terminal 400 is connected to the server 200 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal runs an electronic photo album; the terminal 400 receives a user's repair operation on a photo (the image to be colored), that is, the photo needs to be colored. The server 200 receives an image coloring request from the terminal 400, the image coloring request carrying the image to be colored. In response to the image coloring request, the server 200 acquires the first color prior information of the image to be colored, transforms the first color prior information to obtain the second color prior information aligned with the image to be colored, colors the image to be colored through the second color prior information to obtain the first colored image aligned with the image to be colored, and sends the first colored image to the terminal 400 so that the first colored image is displayed in the terminal 400. The image coloring method provided by the embodiments of the present application can also color video frames in a video file, thereby realizing video restoration.
In some embodiments, taking the electronic device provided by the embodiments of the present application being a terminal as an example, the terminal implements the artificial intelligence-based image coloring method provided by the embodiments of the present application by running a computer program. The computer program may be a native program or software module in an operating system; it may be a native application (APP), that is, an artificial intelligence-based image coloring program that needs to be installed in the operating system to run; or it may be a mini program, that is, an artificial intelligence-based image coloring mini program that only needs to be downloaded into the browser environment of any client to run. In short, the above computer program may be an application, module, or plug-in in any form.
The following description takes the electronic device provided by the embodiments of the present application being the terminal 400 described above as an example. Referring to Figure 2, Figure 2 is a schematic structural diagram of the terminal 400 provided by an embodiment of the present application. The terminal 400 shown in Figure 2 includes: at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The components in the terminal 400 are coupled together through a bus system 440. It can be understood that the bus system 440 is used to realize connection and communication between these components. In addition to a data bus, the bus system 440 also includes a power bus, a control bus, and a status signal bus. However, for the sake of clarity, the various buses are all labeled as the bus system 440 in Figure 2.
The processor 410 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 430 includes one or more output devices 431 that enable the presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch-screen display, camera, and other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard disk drives, optical disc drives, etc. The memory 450 optionally includes one or more storage devices physically remote from the processor 410.
The memory 450 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), and the volatile memory may be a random access memory (RAM). The memory 450 described in the embodiments of the present application is intended to include any suitable type of memory.
In some embodiments, the memory 450 can store data to support various operations; examples of these data include programs, modules, and data structures or subsets or supersets thereof, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, used to implement various basic services and handle hardware-based tasks;
a network communication module 452 for reaching other computing devices via one or more (wired or wireless) network interfaces 420; exemplary network interfaces 420 include Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
a presentation module 453 for enabling the presentation of information (for example, a user interface for operating peripheral devices and displaying content and information) via one or more output devices 431 (for example, display screens, speakers, etc.) associated with the user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the artificial intelligence-based image coloring apparatus provided by the embodiments of the present application may be implemented in software. Figure 2 shows the artificial intelligence-based image coloring apparatus 455 stored in the memory 450, which may be software in the form of programs, plug-ins, etc., including the following software modules: an acquisition module 4551, a transformation module 4552, a processing module 4553, and a training module 4554. These modules are logical, and therefore may be arbitrarily combined or further split according to the functions implemented. The functions of each module will be described below.
Referring to Figure 3, Figure 3 is a schematic diagram of the composition of the coloring system 10 provided by an embodiment of the present application. The coloring system 10 includes an encoder, a pre-trained GAN, a transformation network, and a coloring network. The encoder is used to obtain the coding vector of the image to be colored; the encoder may be a generator in a generative adversarial network, the encoder part of an autoencoder, or a convolutional neural network. The pre-trained GAN is the generator of a trained GAN and is used to generate the second colored image and the first color prior information of the image to be colored. The transformation network is used to transform the first color prior information based on the image to be colored and the second colored image to obtain the second color prior information. The coloring network is used to generate the first colored image based on the image to be colored and the second color prior information.
Referring to Figure 4, Figure 4 is a schematic diagram of image coloring provided by an embodiment of the present application. As shown in Figure 4, the coloring network includes a downsampling module, a residual module, and an upsampling module. The downsampling module is composed of multiple downsampling layers and is configured to downsample the image to be colored to obtain the first image feature; the residual module is composed of multiple residual blocks and is configured to perform modulation coloring processing on the first image feature based on the second color prior information to obtain the second image feature; the upsampling module is composed of multiple upsampling layers and is configured to perform upsampling processing on the second image feature based on the second color prior information to obtain the first colored image aligned with the image to be colored.
The artificial intelligence-based image coloring method provided by the embodiments of the present application is described below in combination with the components of the coloring system 10 described above. The executing entity of the following method may be a terminal or a server; specifically, it may be implemented by the terminal or the server running the various computer programs described above. Of course, from an understanding of what follows, it is not difficult to see that the artificial intelligence-based image coloring method provided by the embodiments of the present application may also be implemented cooperatively by the terminal and the server.
Referring to Figure 5, Figure 5 is a schematic flowchart of the artificial intelligence-based image coloring method provided by an embodiment of the present application, which will be described in combination with the steps shown in Figure 5, the parts of the coloring system shown in Figure 3, and Figure 4.
In step 101, first color prior information of an image to be colored is acquired.
In some embodiments, the image to be colored is a grayscale image in the LAB color mode, that is, the grayscale image has only the luminance channel (L) and lacks the color channels (A and B). If the image to be colored is in the RGB color mode, it needs to be converted to the LAB color mode first. The first color prior information is the color prior information related to the image to be colored, for example, the color prior information related to the image to be colored in the pre-trained GAN, that is, the intermediate-layer features of the GAN.
In some embodiments, as shown in Figure 4, the coding vector of the image to be colored may first be obtained through the encoder, where the encoder may be replaced by another convolutional neural network. Then, the image to be colored is colored through the pre-trained GAN to obtain the second colored image. The pre-trained GAN may be a trained BigGAN or a trained StyleGAN. Taking BigGAN as an example, the generator of BigGAN includes multiple residual blocks; the coding vector is linearly transformed and then sent to the first residual block, and each residual block includes a batch normalization (BN) layer, an activation layer, and a convolution layer. Each residual block has a skip connection through a 1×1 convolution, thereby realizing an identity mapping of the coding vector. The identity mapping can pass the output of the previous layer (which is also the input of the next layer) directly to the output of the next layer, so that the output of the next layer approximates its input, ensuring that accuracy does not degrade in later layers. Finally, BigGAN generates the second colored image. Alternatively, the residual blocks may all have skip connections through non-1×1 convolutions.
In the process of generating the second colored image, the feature maps corresponding to the output features of the residual blocks have different sizes, that is, the output features have different scales. The output features (multi-scale features) of the different residual blocks are merged to obtain the first color prior information.
By performing identity mapping processing on the coding vector, the second colored image is obtained, and the multi-scale features obtained in this process are used as the first color prior information. The identity mapping processing can effectively improve the output accuracy of the multi-scale features, that is, improve the acquisition accuracy of the first color prior information, so that the image to be colored can be colored accurately, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
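As a purely illustrative sketch (the class name, channel widths, and layer choices below are hypothetical and are not taken from the present application), a generator residual block of the kind described above, with a batch normalization layer, an activation layer, a convolution layer, and a 1×1-convolution skip connection approximating an identity mapping, might be written in PyTorch as follows:

import torch
import torch.nn as nn

class GeneratorResBlock(nn.Module):
    # Hypothetical sketch: BN -> activation -> convolution, repeated twice,
    # with a 1x1-convolution skip connection so that the block output
    # stays close to its input (an approximate identity mapping).
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # 1x1 skip connection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.skip(x)

The feature maps output by successive blocks of this kind have different sizes, and merging them yields the multi-scale first color prior information.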
In step 102, transformation processing is performed on the first color prior information to obtain second color prior information aligned with the image to be colored.
In some embodiments, alignment means that the position of the same part (corresponding to one or more pixels) is consistent in different images. For example, the multiple pixels constituting a chicken's tail occupy consistent positions in different images. The essence of color prior information being aligned with the image to be colored is that the position of the same object is consistent in both. Since the color prior information is expressed in the form of feature maps, the alignment of the color prior information with the image to be colored means that the position of the same object in the image to be colored and in the color prior information is consistent. However, the positions of the background and foreground parts of the second colored image and the image to be colored do not correspond one-to-one, that is, the corresponding pixels in the two images are not aligned in position. As shown in Figure 4, the position of the chicken's tail in the second colored image is obviously inconsistent with its position in the image to be colored. Correspondingly, the multi-scale features in the first color prior information do not correspond one-to-one with the image features corresponding to the image to be colored; a deviation exists. Therefore, it is also necessary to transform the first color prior information to obtain the second color prior information aligned with the image to be colored, that is, color prior information aligned with the image features corresponding to the image to be colored; at this time, the colored image corresponding to the second color prior information is aligned with the image to be colored.
In some embodiments, in step 102, the transformation processing performed on the first color prior information to obtain the second color prior information aligned with the image to be colored is implemented through the transformation network in the coloring system 10, and its implementation process is shown in steps 1021 to 1023 of Figure 6.
In step 1021, a similarity matrix between the image to be colored and the second colored image is determined, where the second colored image is obtained by coloring the image to be colored and is not aligned with the image to be colored.
As shown in Figure 4, the first position feature of the image to be colored and the second position feature of the second colored image can be extracted separately through a feature extractor. The first position feature includes the position feature of each pixel in the image to be colored, and the second position feature includes the position feature of each pixel in the second colored image. Then, non-local processing is performed on the first position feature and the second position feature to obtain the similarity matrix between the image to be colored and the second colored image. The similarity matrix includes the similarity between each pixel in the image to be colored and each pixel in the second colored image. Non-local processing is used to compute the similarity between a pixel in the image to be colored and any pixel in the second colored image; computation methods include dot product, concatenation, and bilinear similarity measures. When the similarity is computed by dot product, the similarity of two positions can be obtained by computing the dot product of the position vectors (position features) at the corresponding positions of the image to be colored and the second colored image. When the similarity is computed by concatenation, the position vectors of corresponding positions in the two images are concatenated and fed into a perceptron to predict their similarity. Finally, the similarity matrix can be normalized through the softmax function so that the sum of the elements of each row in the similarity matrix is 1. The obtained normalized similarity matrix is used as the similarity matrix between the image to be colored and the second colored image.
Determining the similarity matrix between the image to be colored and the second colored image that is not aligned with it based on the position features of pixels can improve the accuracy of the similarity matrix, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
Through non-local processing and normalization processing, global information can be introduced, thereby improving the accuracy of the similarity matrix and, in turn, the image processing accuracy and image processing efficiency of the electronic device.
In step 1022, affine transformation processing is performed on the first color prior information based on the similarity matrix to obtain multi-scale features aligned with the image to be colored.
In some embodiments, the first color prior information includes the multi-scale features obtained in the process of coloring the image to be colored. Performing an affine transformation on the multi-scale features in the first color prior information, that is, multiplying the similarity matrix with the multi-scale features in the first color prior information through matrix multiplication, yields the multi-scale features aligned with the image to be colored.
In step 1023, the multi-scale features aligned with the image to be colored are used as the second color prior information.
It can be seen that the similarity matrix between the image to be colored and the second colored image is obtained based on the similarity of the position features at corresponding positions of the two images, and by performing an affine transformation on the first color prior information through the similarity matrix, the second color prior information aligned with the image to be colored can be obtained, which guarantees the accuracy of subsequently generating the first colored image aligned with the image to be colored, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
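The similarity computation of step 1021 and the affine transformation of step 1022 can be illustrated with a minimal sketch, assuming dot-product similarity, softmax row normalization, and position features whose spatial size matches the prior feature (all function and variable names below are hypothetical):

import torch
import torch.nn.functional as F

def align_prior(feat_gray: torch.Tensor, feat_color: torch.Tensor,
                prior: torch.Tensor) -> torch.Tensor:
    # feat_gray:  (N, C, H, W) position features of the image to be colored
    # feat_color: (N, C, H, W) position features of the second colored image
    # prior:      (N, C2, H, W) one scale of the first color prior information
    n, c, h, w = feat_gray.shape
    q = feat_gray.flatten(2).transpose(1, 2)   # (N, HW, C)
    k = feat_color.flatten(2)                  # (N, C, HW)
    sim = torch.bmm(q, k)                      # dot-product similarity, (N, HW, HW)
    sim = F.softmax(sim, dim=-1)               # normalize so each row sums to 1
    v = prior.flatten(2).transpose(1, 2)       # (N, HW, C2)
    aligned = torch.bmm(sim, v)                # affine transform by matrix multiplication
    return aligned.transpose(1, 2).reshape(n, -1, h, w)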
In step 103, the image to be colored is downsampled to obtain a first image feature.
In some embodiments, the image to be colored is downsampled through the downsampling module in the coloring network. The downsampling module includes multiple downsampling layers; in each downsampling layer, convolution processing is performed on the input feature to obtain the corresponding image feature, and the obtained image feature represents information such as the position information and semantic information of the image to be colored. Pooling processing is performed on the obtained image feature to obtain the corresponding pooling result, and the pooling result is used as the input feature of the next layer. The output of the last downsampling layer is used as the first image feature.
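A minimal sketch of such a downsampling module, assuming two layers with average pooling (the channel widths shown are illustrative only and are not from the present application), might be:

import torch.nn as nn

class DownsampleModule(nn.Module):
    # Hypothetical sketch: each layer performs convolution (extracting position
    # and semantic information) followed by pooling; the pooling result is the
    # input of the next layer, and the last output is the first image feature.
    def __init__(self, channels=(1, 64, 128)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AvgPool2d(kernel_size=2),
            )
            for cin, cout in zip(channels[:-1], channels[1:])
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x  # first image feature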
In step 104, modulation coloring processing is performed on the first image feature based on the second color prior information to obtain a second image feature.
In some embodiments, in order to realize multi-scale control, the residual module and the upsampling module of the coloring network are controlled separately through the multi-scale features aligned with the image to be colored. Different scale features among the multi-scale features aligned with the image to be colored correspond to different parts of the coloring network. For example, when the upsampling module of the coloring network includes two upsampling layers, there are a total of three scales of features among the multi-scale features aligned with the image to be colored, corresponding to the residual module, the first upsampling layer, and the second upsampling layer, respectively.
In some possible examples, first, a first modulation parameter is determined based on the multi-scale features aligned with the image to be colored in the second color prior information. That is, among the multi-scale features aligned with the image to be colored, the first scale feature corresponding to the residual module in the coloring network is determined, and convolution processing is performed on the first scale feature to obtain the first modulation parameter corresponding to the residual module. Because a residual module generally consists of at least two residual blocks, multiple different convolution operations are performed in parallel on the first scale feature to obtain the first modulation parameter corresponding to each residual block (α and β, where α represents the weight and β represents the bias); the dimensions of each first modulation parameter are consistent with the dimensions of the feature f to be modulated in the corresponding residual block. Each residual block has multiple layers, and each layer consists of a convolution layer, a spatially-adaptive normalization (SPADE) layer, and an activation layer. The feature f to be modulated is the feature obtained after the convolution layer in each residual block performs convolution processing on its input feature. For example, when there are 6 residual blocks in the coloring network, 6 different convolution operations are performed in parallel on the first scale feature through different convolutional neural networks to obtain 6 first modulation parameters corresponding to the 6 residual blocks respectively: (α1, β1), (α2, β2), (α3, β3), (α4, β4), (α5, β5), (α6, β6).
The SPADE layer, similar to the BN layer, is also used for normalization and performs modulation using the learned modulation parameters. Unlike the BN layer, the SPADE layer is a conditional normalization layer, that is, its modulation parameters are obtained externally; and the modulation parameters in the SPADE layer are tensors rather than the vectors in the BN layer. Compared with common normalization layers, the SPADE layer can better preserve semantic information, so that the coloring network generates a first colored image with realistic texture.
Then, modulation coloring processing is performed on the first image feature through the first modulation parameter to obtain the second image feature. Convolution processing is performed on the first image feature through the convolution layer in the residual block to obtain the corresponding convolution result. In the SPADE layer, a linear transformation is performed on the obtained convolution result through the first modulation parameter. The formula of the linear transformation is shown in formula (1):
f′ = f * α + β     (1)
where f′ is the feature obtained by modulating the feature to be modulated with the first modulation parameter, that is, the linear transformation result.
Performing modulation coloring processing on the first image feature through the first modulation parameter gives the modulation coloring process a reference and basis, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
By determining the first scale feature corresponding to the residual module and performing convolution processing on the first scale feature to obtain the first modulation parameter, semantic information can be better preserved, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
In the activation layer, the first linear transformation result is mapped to a high-dimensional nonlinear interval. Finally, the mapped first linear transformation result and the first image feature are summed, and the obtained summation result is used as the second image feature. When the residual block is an identity mapping, the mapped first linear transformation result and the first image feature are summed directly; when the residual block is not an identity mapping, the first image feature is enlarged/reduced and then added to the mapped first linear transformation result. When there are multiple residual blocks, the summation result of the previous residual block is the input of the next residual block, and the summation result of the last residual block is taken as the second image feature.
Through the processing of the activation layer, image information can be preserved as much as possible in the second image feature, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
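The modulation of formula (1) can be illustrated with a sketch of a SPADE-style layer, assuming the aligned scale feature is resized to the spatial size of the feature to be modulated and that two convolutions predict the tensor-valued parameters α and β (the class name and layer choices are hypothetical, not from the present application):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpadeModulation(nn.Module):
    # Hypothetical sketch: conditional normalization whose modulation parameters
    # are tensors predicted from the aligned scale feature, applied as
    # f' = f * alpha + beta (formula (1)).
    def __init__(self, prior_ch: int, feat_ch: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)  # stand-in normalization
        self.to_alpha = nn.Conv2d(prior_ch, feat_ch, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(prior_ch, feat_ch, kernel_size=3, padding=1)

    def forward(self, f: torch.Tensor, scale_feat: torch.Tensor) -> torch.Tensor:
        scale_feat = F.interpolate(scale_feat, size=f.shape[-2:], mode="nearest")
        alpha = self.to_alpha(scale_feat)  # weight tensor, same dimensions as f
        beta = self.to_beta(scale_feat)    # bias tensor, same dimensions as f
        return self.norm(f) * alpha + beta

In a residual block of the kind described above, the output of such a layer would pass through the activation layer and then be summed with the block input.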
In step 105, upsampling processing is performed on the second image feature based on the second color prior information to obtain a first colored image aligned with the image to be colored.
In some embodiments, first, a second modulation parameter is determined based on the multi-scale features aligned with the image to be colored in the second color prior information. That is, among the multi-scale features aligned with the image to be colored, the second scale feature corresponding to the upsampling module in the coloring network is determined. For example, when the upsampling module includes 2 upsampling layers, there are second scale features corresponding to the 2 upsampling layers respectively among the multi-scale features. Convolution processing is performed on the second scale feature through a convolutional neural network to obtain the second modulation parameter corresponding to the upsampling module.
Then, deconvolution processing (that is, upsampling) is performed on the second image feature; the deconvolution result, as the feature to be modulated, is substituted together with the second modulation parameter into the linear transformation formula (1) to obtain the second linear transformation result (that is, the modulated feature); activation processing is performed on the second linear transformation result to obtain the predicted color image in the LAB color mode corresponding to the image to be colored. The predicted color image contains not only the luminance channel of the image to be colored but also the two color channels missing from the image to be colored. By performing color mode conversion on the predicted color image, the corresponding image in the RGB color mode can be obtained, that is, the first colored image aligned with the image to be colored.
When there are multiple upsampling layers, the linear transformation result corresponding to the previous upsampling layer is the input of the next upsampling layer, and the linear transformation result of the last upsampling layer is the predicted color image.
Through deconvolution processing, linear transformation processing, activation processing, and color mode conversion processing, the first colored image can be generated accurately, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
By determining the second scale feature corresponding to the upsampling module and performing convolution processing on the second scale feature to obtain the second modulation parameter, semantic information can be better preserved, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
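One upsampling layer of step 105 can likewise be sketched, assuming a transposed convolution for the deconvolution and externally supplied modulation parameters that match the deconvolved feature in shape (all names below are hypothetical):

import torch
import torch.nn as nn

class UpsampleLayer(nn.Module):
    # Hypothetical sketch: deconvolution (upsampling), linear modulation with
    # the second modulation parameters per formula (1), then activation.
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, f, alpha, beta):
        f = self.deconv(f)      # deconvolution processing (upsampling)
        f = f * alpha + beta    # second linear transformation
        return self.act(f)

The last layer's output would give the predicted color channels; combined with the input luminance channel and converted from the LAB color mode to the RGB color mode (for example, with skimage.color.lab2rgb), this yields the first colored image.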
In some embodiments, in order to obtain colored images with diversified coloring effects, conversion processing may be performed on the coding vector to obtain a conversion vector. For example, the coding vector may be controlled and modified in the following ways: adding a noise vector to the coding vector; or changing the input category when training the pre-trained GAN; or finding directions related to color changes through unsupervised learning and then changing the coding vector along these directions. Then, third color prior information aligned with the image to be colored is determined based on the conversion vector. That is, the conversion vector is used as the input vector of the pre-trained GAN, and the third color prior information (that is, the intermediate-layer features of the pre-trained GAN) in the process of the pre-trained GAN generating the corresponding colored image is acquired. Finally, modulation coloring processing is performed on the image to be colored based on the third color prior information to obtain a third colored image aligned with the image to be colored; the process of modulation coloring processing is similar to that described above and will not be repeated here. The third colored image includes at least one of the following: an image in which the background of the image to be colored is colored, an image in which the foreground of the image to be colored is colored, and an image in which the saturation of the image to be colored is adjusted.
It can be seen that the embodiments of the present application can not only automatically generate colored images with vivid colors that are highly aligned with the original image, but also generate colored images with different coloring effects by controlling and modifying the coding vector, realizing diversified coloring and thereby improving the image processing accuracy and image processing efficiency of the electronic device.
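A minimal sketch of the coding-vector editing just described, assuming simple additive noise and an optional color-related direction found elsewhere (all names are hypothetical):

from typing import Optional
import torch

def diversify_code(z: torch.Tensor, noise_scale: float = 0.2,
                   direction: Optional[torch.Tensor] = None,
                   step: float = 0.0) -> torch.Tensor:
    # Add a noise vector to the coding vector, and optionally move it along a
    # color-related direction (e.g., one found through unsupervised learning).
    z_new = z + noise_scale * torch.randn_like(z)
    if direction is not None:
        z_new = z_new + step * direction
    return z_new

Feeding each such conversion vector to the pre-trained GAN would produce different third color prior information and hence differently colored results.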
In some embodiments, the pre-trained GAN is trained in advance and its parameters are fixed. In the process of training the encoder, the error between the image features of the colored image generated by the generator of the pre-trained GAN and the image features of the actual color image corresponding to the image to be colored is determined, and the error is back-propagated in the encoder to update the parameters of the encoder.
After the encoder is trained, the coloring network is trained. First, the total loss function is determined based on the adversarial loss function, perceptual loss function, domain alignment loss function, and context loss function corresponding to the coloring network. The adversarial loss function is used to make the first colored image generated by the coloring network more realistic; the perceptual loss function is used to make the first colored image generated by the coloring network feel more real and reasonable; the domain alignment loss function is used to map the image to be colored and the second colored image to the same feature space; and the context loss function is used to measure the similarity between two unaligned images (the first colored image and the second colored image).
Then, the image sample to be colored is processed through the coloring system 10 to obtain a first colored image sample aligned with the image sample to be colored, a second colored image sample not aligned with the image sample to be colored, and a predicted color image sample. In some possible examples, the first colored image sample and the second colored image sample are both image samples in the RGB color mode, and the predicted color image sample is an image sample in the LAB color mode; by converting the predicted color image sample in the LAB color mode, an image sample in the RGB color mode can be obtained. The first colored image sample is obtained by performing color mode conversion on the predicted color image sample.
After that, the adversarial loss value is determined based on the error between the predicted color image sample and the corresponding first actual color image; the perceptual loss value is determined based on the error between the second colored image sample and the corresponding second actual color image; the domain alignment loss value is determined based on the error between the image sample to be colored and the second colored image sample; and the context loss value is determined based on the error between the first colored image sample and the second colored image sample.
In some possible examples, the first actual color image is the actual color image sample in the LAB color mode corresponding to the image sample to be colored; the predicted color image sample is the color image sample in the LAB color mode obtained by predicting the two missing color channels of the image sample to be colored; the second colored image sample is the predicted color image sample in the RGB color mode; and the second actual color image is the actual color image sample in the RGB color mode corresponding to the image sample to be colored, obtained by performing color mode conversion on the first actual color image.
After the loss values are determined, the adversarial loss value, perceptual loss value, domain alignment loss value, and context loss value are weighted and summed to obtain the total loss value. Finally, the total loss value is back-propagated in the coloring network based on the total loss function to update the parameters of the coloring network.
By performing weighted summation on multiple loss values, the accuracy of the coloring processing of the coloring network can be improved, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
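The weighted summation of the four loss values can be sketched as follows; the weights shown are illustrative hyperparameters only and are not values from the present application:

def total_loss(adv_loss, perceptual_loss, domain_align_loss, context_loss,
               weights=(1.0, 1.0, 1.0, 1.0)):
    # Weighted summation of the adversarial, perceptual, domain alignment,
    # and context loss values to obtain the total loss value.
    w_adv, w_perc, w_dom, w_ctx = weights
    return (w_adv * adv_loss + w_perc * perceptual_loss
            + w_dom * domain_align_loss + w_ctx * context_loss)

In a PyTorch-style training loop, calling backward() on the returned total loss value and stepping an optimizer over the coloring network's parameters would realize the back-propagation described above.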
It can be seen that the embodiments of the present application determine the second color prior information aligned with the image to be colored, and perform modulation coloring processing and upsampling processing on the first image feature corresponding to the image to be colored based on the second color prior information, thereby obtaining the first colored image. Because the second color prior information is aligned with the image to be colored, the first colored image generated based on the second color prior information is aligned with the image to be colored. In this way, accurate coloring of the image to be colored is realized, thereby improving the image processing accuracy and image processing efficiency of the electronic device.
In the following, an exemplary application of the embodiments of the present application in an actual application scenario will be described.
In a video application, in response to a user's coloring operation on a grayscale video file, the terminal sends a coloring request carrying the grayscale video file to the cloud server. After receiving the coloring request, the cloud server decodes the grayscale video file to obtain multiple video frames, each of which is an image to be colored. Then, the multiple video frames (images to be colored) are colored to obtain multiple first colored images. The multiple first colored images are encoded to obtain a new color video file, and the new video file is sent to the terminal to present the new video in the terminal.
The process of coloring a video frame (image to be colored) is introduced below. As shown in Figure 4, the image to be colored x_l (a grayscale image) is first encoded by an encoder (such as a GAN encoder) to obtain the coding vector z. The pre-trained GAN then receives z and generates the second colored image as well as the first color prior information related to the second colored image (that is, the intermediate-layer features F_prior). Since the first color prior information is related to x_l but not fully aligned with x_l (as shown in Figure 4, the position of the chicken's tail in the second colored image is inconsistent with its position in x_l), the positional correspondence between the two needs to be determined from x_l and the second colored image. The similarity matrix M between x_l and the second colored image is determined, where M represents the positional similarity between the pixels of the two, and M is used to align the first color prior information with x_l. After alignment, the second color prior information is obtained, and some of the parameters in the coloring network are controlled through the second color prior information, so as to use the color prior information to guide the coloring. Finally, the coloring network outputs the first colored image based on the image to be colored.
The above coloring process is described in detail below.
(a) Color prior information related to x_l needs to be found in the pre-trained GAN. However, considering that the problem of "retrieving" the relevant color prior information in the pre-trained GAN based on x_l cannot be defined and optimized, an encoder that receives x_l and outputs z is introduced; this encoder is a neural network. After the z corresponding to x_l is determined through the encoder, the pre-trained GAN receives z and outputs a second colored image having as much content similar to x_l as possible. At this point, the multi-scale features F_prior composed of the features of multiple intermediate layers of the pre-trained GAN are the first color prior information most relevant to x_l. To optimize the encoder, the actual color image x_rgb corresponding to x_l and the second colored image are constrained so that their features in the discriminator of the pre-trained GAN are as close as possible.
(b) The first color prior information F_prior is transformed so that it is aligned with x_l. Since F_prior and x_l are usually not spatially aligned, the two first need to be aligned so that F_prior can better guide the coloring. x_l and the second colored image are each passed through the same feature extractor to obtain the position features corresponding to their feature vectors (position features) at all spatial positions; the similarity matrix M between the two is obtained from the dot products of the corresponding position features of x_l and the second colored image, where M(u,v) represents the similarity between position u of x_l and position v of the second colored image (the similarity between the corresponding pixels). M is normalized so that M satisfies ∑_j M(i,j) = 1. Next, an affine transformation is performed on F_prior according to M to obtain the second color prior information aligned with x_l.
(c) The second color prior information aligned with x_l is used to guide the coloring. The coloring network is composed of two downsampling layers, six residual blocks, and two upsampling layers stacked sequentially. Convolution processing is performed on the second color prior information to obtain parameters α and β with the same dimensions as the feature f to be modulated, and the feature f to be modulated is modulated through the parameters α and β with the modulation formula f′ = f * α + β, where the feature f to be modulated represents the image features obtained through convolution processing in the residual blocks of the coloring network and the image features obtained through convolution processing in the upsampling layers, and f′ is the modulated feature. After being modulated, the feature enters the next layer for processing; finally, the coloring network generates the first colored image aligned with the image to be colored.
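Assembling the pieces sketched earlier, the two-downsampling / six-residual-block / two-upsampling layout could be expressed as follows (a sketch under the assumption that each modulated block accepts its feature together with its α and β; the interface is hypothetical):

import torch.nn as nn

class ColoringNetwork(nn.Module):
    # Hypothetical sketch of the layout described above: two downsampling
    # layers, six modulated residual blocks, and two upsampling layers
    # stacked sequentially.
    def __init__(self, down_layers, res_blocks, up_layers):
        super().__init__()
        self.down = nn.ModuleList(down_layers)  # two downsampling layers
        self.res = nn.ModuleList(res_blocks)    # six residual blocks
        self.up = nn.ModuleList(up_layers)      # two upsampling layers

    def forward(self, x, res_params, up_params):
        for layer in self.down:
            x = layer(x)
        for block, (alpha, beta) in zip(self.res, res_params):
            x = block(x, alpha, beta)           # modulated per f' = f*alpha + beta
        for layer, (alpha, beta) in zip(self.up, up_params):
            x = layer(x, alpha, beta)
        return x  # predicted LAB color image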
In some embodiments, the pre-trained GAN may be BigGAN (or StyleGAN), pre-trained on the ImageNet dataset. The whole training is divided into two stages: the first stage trains the encoder; the second stage trains the whole model (except for the pre-trained GAN and the encoder, because in the second stage both are already trained with fixed parameters). The loss functions used in the second stage include the adversarial loss function, the perceptual loss function, the domain alignment loss function, and the context loss function.
In some embodiments, in order to perform diversified coloring, different color prior information can be used to guide the coloring. The first color prior information can be changed by changing z, for example by adding a noise vector to the coding vector, by changing the input category when training BigGAN (when the pre-trained GAN is BigGAN), or by finding directions related to color changes through unsupervised learning and then changing z along these directions, so that the final colored image produces different coloring effects.
As shown in Figure 7, Figure 7 is a schematic diagram of the coloring effects provided by an embodiment of the present application. The first row in Figure 7 shows the input images to be colored; the second row shows the colored images (results) obtained through the artificial intelligence-based image coloring method proposed by the embodiments of the present application; the third row shows that, by changing the category of a bird, an input grayscale image including a bird can be colored with different colors to obtain diversified results.
As shown in Figure 8, Figure 8 is a schematic diagram of the coloring effects provided by an embodiment of the present application. Figure 8 shows images with diversified coloring effects generated by changing z along certain directions. The directions shown in Figure 8 include directions related to the background color, directions related to the foreground color (such as a vase or a truck), and directions related to color saturation. The first row in Figure 8 (where the first image is the image to be colored) shows different images obtained after coloring the background of the image to be colored; the second and third rows (where the first image is the image to be colored) show different images obtained after coloring the foreground of the image to be colored; and the fourth to sixth rows (where the first image is the image to be colored) show different images obtained after adjusting the saturation of the image to be colored.
It can be seen that the embodiments of the present application guide the coloring through color prior information, can automatically and conveniently generate high-quality colored images with vivid colors, and can also obtain different coloring effects by controlling and modifying the color prior information, realizing diversified coloring and thereby improving the image processing accuracy and image processing efficiency of the electronic device.
The following continues to describe an exemplary structure in which the artificial intelligence-based image coloring apparatus 455 provided by the embodiments of the present application is implemented as software modules. In some embodiments, as shown in Figure 2, the software modules in the artificial intelligence-based image coloring apparatus 455 stored in the memory 450 may include: an acquisition module 4551 configured to acquire first color prior information of an image to be colored; a transformation module 4552 configured to perform transformation processing on the first color prior information to obtain second color prior information aligned with the image to be colored; and a processing module 4553 configured to downsample the image to be colored to obtain a first image feature, to perform modulation coloring processing on the first image feature based on the second color prior information to obtain a second image feature, and to perform upsampling processing on the second image feature based on the second color prior information to obtain a first colored image, where the first colored image is aligned with the image to be colored.
In some embodiments, the acquisition module 4551 is further configured to acquire a coding vector of the image to be colored; to perform identity mapping processing on the coding vector to obtain a second colored image, where the second colored image is not aligned with the image to be colored; and to use multi-scale features as the first color prior information, where the multi-scale features are obtained in the process of obtaining the second colored image through the identity mapping.
In some embodiments, the transformation module 4552 is further configured to determine a similarity matrix between the image to be colored and the second colored image, where the second colored image is obtained by coloring the image to be colored and is not aligned with the image to be colored; to perform affine transformation processing on the first color prior information based on the similarity matrix to obtain multi-scale features aligned with the image to be colored, where the first color prior information includes the multi-scale features obtained in the process of coloring the image to be colored; and to use the multi-scale features aligned with the image to be colored as the second color prior information.
In some embodiments, the transformation module 4552 is further configured to acquire a first position feature of the image to be colored and a second position feature of the second colored image, where the first position feature includes the position feature of each pixel in the image to be colored and the second position feature includes the position feature of each pixel in the second colored image; and to determine the similarity matrix between the image to be colored and the second colored image based on the first position feature and the second position feature, where the similarity matrix includes the similarity between each pixel in the image to be colored and each pixel in the second colored image.
In some embodiments, the transformation module 4552 is further configured to perform non-local processing on the first position feature and the second position feature to obtain a similarity matrix corresponding to the non-local processing, and to perform normalization processing on the similarity matrix corresponding to the non-local processing to obtain the similarity matrix between the image to be colored and the second colored image.
In some embodiments, the processing module 4553 is further configured to determine a first modulation parameter based on the multi-scale features aligned with the image to be colored in the second color prior information, and to perform modulation coloring processing on the first image feature through the first modulation parameter to obtain the second image feature.
In some embodiments, the modulation coloring processing is implemented through a coloring network, and the coloring network includes a residual module; the processing module 4553 is further configured to determine, among the multi-scale features aligned with the image to be colored, a first scale feature corresponding to the residual module in the coloring network, and to perform convolution processing on the first scale feature to obtain the first modulation parameter corresponding to the residual module.
In some embodiments, the processing module 4553 is further configured to perform convolution processing on the first image feature to obtain a convolution result; to perform first linear transformation processing on the convolution result through the first modulation parameter to obtain a first linear transformation result; and to sum the first linear transformation result and the first image feature, using the obtained summation result as the second image feature.
In some embodiments, the processing module 4553 is further configured to determine a second modulation parameter based on the multi-scale features aligned with the image to be colored in the second color prior information; to perform deconvolution processing on the second image feature to obtain a deconvolution result; to perform second linear transformation processing on the deconvolution result through the second modulation parameter to obtain a second linear transformation result; to perform activation processing on the second linear transformation result to obtain a predicted color image aligned with the image to be colored; and to perform color mode conversion processing on the predicted color image to obtain the first colored image.
In some embodiments, the modulation coloring processing is implemented through a coloring network, and the coloring network includes an upsampling module; the processing module 4553 is further configured to determine, among the multi-scale features aligned with the image to be colored, a second scale feature corresponding to the upsampling module in the coloring network, and to perform convolution processing on the second scale feature to obtain the second modulation parameter corresponding to the upsampling module.
In some embodiments, the processing module 4553 is further configured to perform conversion processing on the coding vector to obtain a conversion vector; to determine third color prior information aligned with the image to be colored based on the conversion vector; and to perform modulation coloring processing on the image to be colored based on the third color prior information to obtain a third colored image aligned with the image to be colored, where the third colored image includes at least one of the following: an image in which the background of the image to be colored is colored, an image in which the foreground of the image to be colored is colored, and an image in which the saturation of the image to be colored is adjusted.
In some embodiments, the downsampling, the modulation coloring processing, and the upsampling processing are implemented through a coloring network; the artificial intelligence-based image coloring apparatus further includes a training module 4554 configured to train the coloring network in the following manner: determining a total loss function based on the adversarial loss function, perceptual loss function, domain alignment loss function, and context loss function corresponding to the coloring network; calling the coloring network to perform coloring processing on an image sample to be colored to obtain a first colored image sample, a second colored image sample, and a predicted color image sample, where the first colored image sample is obtained by converting the predicted color image sample and is aligned with the image sample to be colored, and the second colored image sample is not aligned with the image sample to be colored; determining an adversarial loss value based on the error between the predicted color image sample and a first actual color image corresponding to the predicted color image sample, determining a perceptual loss value based on the error between the second colored image sample and a second actual color image corresponding to the second colored image sample, determining a domain alignment loss value based on the error between the image sample to be colored and the second colored image sample, and determining a context loss value based on the error between the first colored image sample and the second colored image sample, where the second actual color image is obtained by converting the first actual color image; performing weighted summation on the adversarial loss value, perceptual loss value, domain alignment loss value, and context loss value to obtain a total loss value; and back-propagating the total loss value in the coloring network based on the total loss function to update the parameters of the coloring network.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to execute the artificial intelligence-based image coloring method provided by the embodiments of the present application, for example, the artificial intelligence-based image coloring method shown in Figure 5.
In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; it may also be any of various devices including one of the above memories or any combination thereof.
In some embodiments, the executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
As an example, the executable instructions may, but do not necessarily, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (for example, files that store one or more modules, subroutines, or code sections).
As an example, the executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
In summary, the embodiments of the present application determine the second color prior information aligned with the image to be colored, and perform modulation coloring processing and upsampling processing on the first image feature corresponding to the image to be colored based on the second color prior information, thereby obtaining the first colored image. Because the second color prior information is aligned with the image to be colored, the first colored image generated based on the second color prior information is also aligned with the image to be colored; in this way, automatic and accurate coloring of the image to be colored is realized. In addition, the embodiments of the present application can also generate colored images with different coloring effects by controlling and modifying the color prior information, realizing diversified coloring.
The above descriptions are merely embodiments of the present application and are not intended to limit the protection scope of the present application. Any modifications, equivalent replacements, and improvements made within the spirit and scope of the present application are all included in the protection scope of the present application.

Claims (16)

  1. An artificial intelligence-based image coloring method, the method being executed by an electronic device, the method comprising:
    acquiring first color prior information of an image to be colored;
    performing transformation processing on the first color prior information to obtain second color prior information aligned with the image to be colored;
    downsampling the image to be colored to obtain a first image feature;
    performing modulation coloring processing on the first image feature based on the second color prior information to obtain a second image feature;
    performing upsampling processing on the second image feature based on the second color prior information to obtain a first colored image, wherein the first colored image is aligned with the image to be colored.
  2. The method according to claim 1, wherein the acquiring first color prior information of an image to be colored comprises:
    acquiring a coding vector of the image to be colored;
    performing identity mapping processing on the coding vector to obtain a second colored image, wherein the second colored image is not aligned with the image to be colored;
    using multi-scale features as the first color prior information, wherein the multi-scale features are obtained in the process of obtaining the second colored image through the identity mapping.
  3. The method according to claim 1, wherein the performing transformation processing on the first color prior information to obtain second color prior information aligned with the image to be colored comprises:
    determining a similarity matrix between the image to be colored and a second colored image, wherein the second colored image is obtained by performing coloring processing on the image to be colored, and the second colored image is not aligned with the image to be colored;
    performing affine transformation processing on the first color prior information based on the similarity matrix to obtain multi-scale features aligned with the image to be colored, wherein the first color prior information comprises multi-scale features obtained in the process of performing coloring processing on the image to be colored;
    using the multi-scale features aligned with the image to be colored as the second color prior information.
  4. The method according to claim 3, wherein the determining a similarity matrix between the image to be colored and a second colored image comprises:
    acquiring a first position feature of the image to be colored and a second position feature of the second colored image;
    wherein the first position feature comprises a position feature of each pixel in the image to be colored, and the second position feature comprises a position feature of each pixel in the second colored image;
    determining the similarity matrix between the image to be colored and the second colored image based on the first position feature and the second position feature;
    wherein the similarity matrix comprises a similarity between each pixel in the image to be colored and each pixel in the second colored image.
  5. The method according to claim 4, wherein the determining the similarity matrix between the image to be colored and the second colored image based on the first position feature and the second position feature comprises:
    performing non-local processing on the first position feature and the second position feature to obtain a similarity matrix corresponding to the non-local processing;
    performing normalization processing on the similarity matrix corresponding to the non-local processing to obtain the similarity matrix between the image to be colored and the second colored image.
  6. The method according to claim 1, wherein the performing modulation coloring processing on the first image feature based on the second color prior information to obtain a second image feature comprises:
    determining a first modulation parameter based on multi-scale features aligned with the image to be colored in the second color prior information;
    performing modulation coloring processing on the first image feature through the first modulation parameter to obtain the second image feature.
  7. The method according to claim 6, wherein the modulation coloring processing is implemented through a coloring network, and the coloring network comprises a residual module;
    the determining a first modulation parameter based on multi-scale features aligned with the image to be colored in the second color prior information comprises:
    determining, among the multi-scale features aligned with the image to be colored, a first scale feature corresponding to the residual module in the coloring network;
    performing convolution processing on the first scale feature to obtain the first modulation parameter corresponding to the residual module.
  8. The method according to claim 6, wherein the performing modulation coloring processing on the first image feature through the first modulation parameter to obtain the second image feature comprises:
    performing convolution processing on the first image feature to obtain a convolution result;
    performing first linear transformation processing on the convolution result through the first modulation parameter to obtain a first linear transformation result;
    summing the first linear transformation result and the first image feature, and using the obtained summation result as the second image feature.
  9. The method according to claim 1, wherein the performing upsampling processing on the second image feature based on the second color prior information to obtain a first colored image comprises:
    determining a second modulation parameter based on multi-scale features aligned with the image to be colored in the second color prior information;
    performing deconvolution processing on the second image feature to obtain a deconvolution result;
    performing second linear transformation processing on the deconvolution result through the second modulation parameter to obtain a second linear transformation result;
    performing activation processing on the second linear transformation result to obtain a predicted color image aligned with the image to be colored;
    performing color mode conversion processing on the predicted color image to obtain the first colored image.
  10. The method according to claim 9, wherein the modulation coloring processing is implemented through a coloring network, and the coloring network comprises an upsampling module;
    the determining a second modulation parameter based on multi-scale features aligned with the image to be colored in the second color prior information comprises:
    determining, among the multi-scale features aligned with the image to be colored, a second scale feature corresponding to the upsampling module in the coloring network;
    performing convolution processing on the second scale feature to obtain the second modulation parameter corresponding to the upsampling module.
  11. The method according to claim 2, wherein the method further comprises:
    performing conversion processing on the coding vector to obtain a conversion vector;
    determining third color prior information aligned with the image to be colored based on the conversion vector;
    performing modulation coloring processing on the image to be colored based on the third color prior information to obtain a third colored image aligned with the image to be colored;
    wherein the third colored image comprises at least one of the following: an image in which the background of the image to be colored is colored, an image in which the foreground of the image to be colored is colored, and an image in which the saturation of the image to be colored is adjusted.
  12. The method according to claim 1, wherein the downsampling, the modulation coloring processing, and the upsampling processing are implemented through a coloring network;
    before the acquiring first color prior information of an image to be colored, the method further comprises:
    training the coloring network in the following manner:
    determining a total loss function based on an adversarial loss function, a perceptual loss function, a domain alignment loss function, and a context loss function corresponding to the coloring network;
    calling the coloring network to perform coloring processing on an image sample to be colored to obtain a first colored image sample, a second colored image sample, and a predicted color image sample;
    wherein the first colored image sample is obtained by converting the predicted color image sample and is aligned with the image sample to be colored, and the second colored image sample is not aligned with the image sample to be colored;
    determining an adversarial loss value based on an error between the predicted color image sample and a first actual color image corresponding to the predicted color image sample, determining a perceptual loss value based on an error between the second colored image sample and a second actual color image corresponding to the second colored image sample, determining a domain alignment loss value based on an error between the image sample to be colored and the second colored image sample, and determining a context loss value based on an error between the first colored image sample and the second colored image sample;
    wherein the second actual color image is obtained by converting the first actual color image;
    performing weighted summation on the adversarial loss value, the perceptual loss value, the domain alignment loss value, and the context loss value to obtain a total loss value;
    back-propagating the total loss value in the coloring network based on the total loss function, and updating the parameters of the coloring network.
  13. An artificial intelligence-based image coloring apparatus, comprising:
    an acquisition module configured to acquire first color prior information of an image to be colored;
    a transformation module configured to perform transformation processing on the first color prior information to obtain second color prior information aligned with the image to be colored;
    a processing module configured to downsample the image to be colored to obtain a first image feature; configured to perform modulation coloring processing on the first image feature based on the second color prior information to obtain a second image feature; and configured to perform upsampling processing on the second image feature based on the second color prior information to obtain a first colored image, wherein the first colored image is aligned with the image to be colored.
  14. An electronic device, comprising:
    a memory for storing executable instructions;
    a processor for implementing the artificial intelligence-based image coloring method according to any one of claims 1 to 12 when executing the executable instructions stored in the memory.
  15. A computer-readable storage medium storing executable instructions for implementing the artificial intelligence-based image coloring method according to any one of claims 1 to 12 when executed by a processor.
  16. A computer program product, comprising a computer program or instructions which, when executed by a processor, implement the artificial intelligence-based image coloring method according to any one of claims 1 to 12.
PCT/CN2022/072298 2021-01-20 2022-01-17 Artificial intelligence-based image coloring method and apparatus, electronic device, computer-readable storage medium, and computer program product WO2022156621A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/971,279 US20230040256A1 (en) 2021-01-20 2022-10-21 Image coloring method and apparatus based on artificial intelligence, electronic device, and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110075873.9 2021-01-20
CN202110075873.9A CN113570678A (zh) 2021-01-20 2021-01-20 Artificial intelligence-based image coloring method and apparatus, and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/971,279 Continuation US20230040256A1 (en) 2021-01-20 2022-10-21 Image coloring method and apparatus based on artificial intelligence, electronic device, and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2022156621A1 true WO2022156621A1 (zh) 2022-07-28

Family

ID=78160922

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072298 WO2022156621A1 (zh) 2021-01-20 2022-01-17 基于人工智能的图像上色方法、装置、电子设备、计算机可读存储介质及计算机程序产品

Country Status (3)

Country Link
US (1) US20230040256A1 (zh)
CN (1) CN113570678A (zh)
WO (1) WO2022156621A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113570678A (zh) 2021-01-20 2021-10-29 腾讯科技(深圳)有限公司 Artificial intelligence-based image coloring method and apparatus, and electronic device
US20230316606A1 (en) * 2022-03-21 2023-10-05 Adobe Inc. Generating and modifying digital images using a joint feature style latent space of a generative neural network
US12020364B1 (en) * 2022-04-07 2024-06-25 Bentley Systems, Incorporated Systems, methods, and media for modifying the coloring of images utilizing machine learning
CN116643497B (zh) * 2023-05-29 2024-05-10 汕头市鼎泰丰实业有限公司 Dyeing control system for cheese yarn and method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150269747A1 (en) * 2014-03-24 2015-09-24 Apple Inc. Palette generation using user-selected images
CN108921916A (zh) * 2018-07-03 2018-11-30 广东工业大学 Coloring method, apparatus, device, and storage medium for multiple target regions in a picture
CN111583097A (zh) * 2019-02-18 2020-08-25 北京三星通信技术研究有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN113570678A (zh) * 2021-01-20 2021-10-29 腾讯科技(深圳)有限公司 Artificial intelligence-based image coloring method and apparatus, and electronic device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7400767B2 (en) * 2005-07-15 2008-07-15 Siemens Medical Solutions Usa, Inc. System and method for graph cuts image segmentation using a shape prior
KR102161052B1 (ko) * 2013-08-27 2020-09-29 삼성전자주식회사 영상에서 객체를 분리하는 방법 및 장치.
TWI775006B (zh) * 2019-11-01 2022-08-21 財團法人工業技術研究院 Method and system for generating photorealistic virtual human faces, and face recognition method and system applying the same

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150269747A1 (en) * 2014-03-24 2015-09-24 Apple Inc. Palette generation using user-selected images
CN108921916A (zh) * 2018-07-03 2018-11-30 广东工业大学 Coloring method, apparatus, device, and storage medium for multiple target regions in a picture
CN111583097A (zh) * 2019-02-18 2020-08-25 北京三星通信技术研究有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN113570678A (zh) * 2021-01-20 2021-10-29 腾讯科技(深圳)有限公司 Artificial intelligence-based image coloring method and apparatus, and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NIU WENFEI: "Research on Image Segmentation Method Based on Shape Prior and Graph Cut", CHINESE MASTER'S THESES FULL-TEXT DATABASE, 1 May 2013 (2013-05-01), pages 1 - 70, XP055952255, ISSN: 1674-0246 *

Also Published As

Publication number Publication date
CN113570678A (zh) 2021-10-29
US20230040256A1 (en) 2023-02-09

Similar Documents

Publication Publication Date Title
WO2022156621A1 (zh) Artificial intelligence-based image coloring method and apparatus, electronic device, computer-readable storage medium, and computer program product
He et al. InSituNet: Deep image synthesis for parameter space exploration of ensemble simulations
US11966839B2 (en) Auto-regressive neural network systems with a soft attention mechanism using support data patches
US20200134778A1 (en) Image style transform methods and apparatuses, devices and storage media
US11544880B2 (en) Generating modified digital images utilizing a global and spatial autoencoder
CN113994384A (zh) Image colorization using machine learning
US11348203B2 (en) Image generation using subscaling and depth up-scaling
CN114008663A (zh) Real-time video super-resolution
CN107729948A (zh) Image processing method and apparatus, computer product, and storage medium
US11983903B2 (en) Processing images using self-attention based neural networks
CN113205449A (zh) Training method and apparatus for expression transfer model, and expression transfer method and apparatus
CN111742345A (zh) Visual tracking by colorization
US20240096001A1 (en) Geometry-Free Neural Scene Representations Through Novel-View Synthesis
CN112132106A (zh) Artificial intelligence-based image augmentation processing method, apparatus, device, and storage medium
CN113822794A (zh) Image style conversion method and apparatus, computer device, and storage medium
CN116109892A (zh) Training method for virtual try-on model and related apparatus
Liao et al. Deep Learning‐Based Application of Image Style Transfer
Abbas et al. Improving deep learning-based image super-resolution with residual learning and perceptual loss using SRGAN model
CN112115744A (zh) Point cloud data processing method and apparatus, computer storage medium, and electronic device
KR20220012785A (ko) Apparatus and method for training an object analysis model based on data augmentation
CN111898544A (zh) Text-image matching method, apparatus, and device, and computer storage medium
CN115170418B Degradation-consistent low-rank high-dimensional image inpainting model and inpainting method and system thereof
Cosmo et al. Multiple sequential regularized extreme learning machines for single image super resolution
JP2024521645A (ja) Unsupervised learning of object representations from video sequences using spatio-temporal attention
JP2023508639A (ja) Apparatus and method for training a data-augmentation-based spatial analysis model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22742100

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.11.2023)