CN114549674A - Method, system, equipment and storage medium for toning image and video

Info

Publication number
CN114549674A
CN114549674A (application CN202011329971.2A)
Authority
CN
China
Prior art keywords
image
color
reference image
neural network
lens
Prior art date
Legal status
Pending
Application number
CN202011329971.2A
Other languages
Chinese (zh)
Inventor
欧阳雯琪
杨涛
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202011329971.2A
Publication of CN114549674A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 - 2D [Two Dimensional] image generation
    • G06T11/001 - Texturing; Colouring; Generation of texture or colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The embodiments of the present application provide a toning method, system, device, and storage medium for images and videos. In the image toning method, when there is a toning requirement for a first image, a deep neural network can be trained with the first image as training data under the supervision of a reference image matched with the first image, so as to obtain a toning model for the first image. Because the reference image serves as the learning target during training, the deep neural network can specifically learn the model parameters required for toning the first image according to the color information contained in the reference image. Toning the first image with this toning model makes the toning operation more targeted, gives the toned first image a visual effect highly similar to that of the reference image, realizes refined intelligent toning based on the reference image, and makes it easy to meet personalized toning requirements.

Description

Method, system, equipment and storage medium for toning image and video
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a method, a system, a device, and a storage medium for color matching of images and videos.
Background
Image toning is a common image processing means, and is mainly used for performing color adjustment on a local area or a whole area on an image. Conventional image toning methods rely on manual adjustment of the color of an image. In recent years, with the development of computer technology, a variety of automatic toning tools have gradually appeared.
However, the existing automatic color matching tool cannot realize highly targeted fine color matching on the image to be subjected to color matching. Therefore, a new solution is yet to be proposed.
Disclosure of Invention
Aspects of the present application provide a method, system, device and storage medium for color matching of images and videos, so as to achieve highly targeted fine color matching.
The embodiment of the application provides an image toning method, which comprises the following steps: responding to a calling request of a client to a first interface, and acquiring a first image to be color-mixed, which is contained in interface parameters; determining a reference image that matches the first image, the reference image including color information; taking the first image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the first image; and returning the color mixing model of the first image to the client so that the client performs color mixing on the first image by using the color mixing model of the first image.
The embodiment of the application provides an image toning method, which comprises the following steps: acquiring a first image to be toned; determining a reference image that matches the first image, the reference image including color information; and taking the first image as training data, and carrying out supervised training on the deep neural network under the supervision of the reference image to obtain a color matching model of the first image.
The embodiment of the application provides an image toning method, which comprises the following steps: acquiring a first image to be toned; determining a reference image that matches the first image, the reference image including color information; taking the first image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the first image; and inputting the first image into the color matching model to obtain a color-matched target image.
The embodiment of the present application further provides a video toning method, including: acquiring a video to be subjected to color mixing, wherein the video comprises at least one lens; for any lens in the at least one lens, determining a key frame image meeting a set condition from a plurality of frame images contained in the lens; determining a reference image that matches the key frame image, the reference image including color information; taking the key frame image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the lens; and respectively inputting the multi-frame images contained in the lens into the color matching model to obtain the color-matched target images corresponding to the multi-frame images contained in the lens.
The embodiment of the present application further provides an image toning method, including: responding to a color matching request, and acquiring a first image to be subjected to color matching; sending the first image to a server so that the server trains a deep neural network under the supervision of a reference image by taking the first image as training data to obtain a color mixing model, and performing color transformation on the first image based on the color mixing model to obtain a color-mixed target image; and receiving the color-mixed target image returned by the server, and displaying the target image.
The embodiment of the present application further provides an image toning method, including: receiving a first image sent by terminal equipment; determining a reference image that matches the first image, the reference image including color information; taking the first image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the first image; inputting the first image into the color matching model to obtain a color-matched target image; and sending the target image after color mixing to the terminal equipment for displaying.
An embodiment of the present application further provides a terminal device, including: a memory and a processor; the memory is to store one or more computer instructions; the processor is to execute the one or more computer instructions to: the steps in the method provided by the embodiment of the application are executed.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program, where the computer program can implement the steps in the method provided in the embodiments of the present application when executed by a processor.
In the embodiments of the present application, when there is a toning requirement for the first image, the first image may be used as training data to train a deep neural network under the supervision of a reference image matched with the first image, so as to obtain a color matching model for the first image. Because the reference image serves as the learning target during training, the deep neural network can specifically learn the model parameters required for toning the first image according to the color information contained in the reference image. Toning the first image with this color matching model makes the toning operation more targeted, gives the toned first image a visual effect highly similar to that of the reference image, realizes refined intelligent toning based on the reference image, and makes it easy to meet personalized toning requirements.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an image toning system according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a training process of a palette model provided in an exemplary embodiment of the present application;
FIG. 3 is a schematic flowchart of an image toning method executed on a terminal device according to an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of an image toning method performed on a server side according to an exemplary embodiment of the present application;
FIG. 5 is a schematic flowchart of an image toning method performed on a terminal side according to another exemplary embodiment of the present disclosure;
fig. 6 is a schematic flowchart of a video toning method executed on a terminal side according to an exemplary embodiment of the present application;
FIG. 7 is a schematic flow chart of a toning method provided by another exemplary embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to an exemplary embodiment of the present application;
fig. 9 is a schematic structural diagram of a server according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Image toning is a common image processing means. Conventional image toning methods rely on manually adjusting the color of an image. For example, toning tools such as DaVinci Resolve and Photoshop do not have sample-based intelligent toning capability and require the user to adjust the color of an image manually.
In recent years, with the development of computer technology, a variety of automatic toning tools have gradually appeared. Currently, there are some toning methods that do not rely on deep learning, such as toning methods based on Pitié or bennel color transfer. In such methods, the transformed colors cannot be finely controlled, which leads to color distortion in local areas (e.g., human face skin color). Among end-to-end color transformation methods that rely on deep neural networks, some introduce flaws, distortion, and similar artifacts into the toned image, and when applied to video toning they often produce toning results that are discontinuous between frames.
In view of the above technical problems, embodiments of the present application provide an image toning method and an image toning system, which will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of an image toning system according to an exemplary embodiment of the present application, and as shown in fig. 1, the image toning system 100 includes: a terminal device 10 and a server 20.
In the present embodiment, the terminal device 10 refers to a device capable of interacting with a user to provide the user with an entry for an image toning operation, and having a communication function. The implementation form of the terminal device 10 may be different in different application scenarios. In some scenarios, the terminal device 10 may be represented by a mobile phone, a tablet computer, a computer device, or the like on the user side, and the user may initiate a color matching operation for an image or a video through a plug-in, an application program, or a browser, or the like provided by the terminal device 10.
In the image toning system 100, the server 20 may communicate with the terminal device 10 and may provide data support related to image toning and calculation support to the terminal device 10. The implementation form of the server 20 may include a conventional server, a cloud host, a virtual center, and other devices, which is not limited in this embodiment. The server device mainly includes a processor, a hard disk, a memory, a system bus, and the like, and is similar to a general computer architecture, and is not described in detail.
In this embodiment, the terminal device 10 is mainly configured to: in response to the toning request, a first image to be toned is acquired and sent to the server 20. After the server 20 completes the color matching operation for the first image, the terminal device 10 may receive the color-matched target image returned by the server 20 and display the target image. In some alternative embodiments, terminal device 10 may include an electronic display screen through which a user may initiate an image toning request. The electronic display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP), among others. If the electronic display screen includes a touch panel, the electronic display screen may be implemented as a touch screen that may receive input signals from a user to obtain a user's image toning request. Of course, in other alternative embodiments, the terminal device 10 may include a physical key or a voice input device for providing the user with an image toning operation, and the like, which will not be described herein.
In the embodiments of the present application, the first image refers to an arbitrary image to be toned. It should be understood that the word "first" in "first image" is used only for convenience of description and distinction, and does not limit the order or number of the images so described.
The first image may be provided by the user to the terminal device 10, or may be obtained by the terminal device 10 according to an access path or a URL (uniform resource locator) provided by the user, which is not limited in this embodiment.
The server 20 is mainly configured to: receive the first image transmitted by the terminal device 10 and determine a reference image matching the first image, the reference image containing color information. Then, under the supervision of the reference image, the first image is used as training data to perform supervised training on a deep neural network, so as to obtain a color matching model of the first image. Based on the color matching model, the first image can be toned to obtain a toned target image, and the toned image is sent to the terminal device 10 for display.
Wherein, the reference image is matched with the first image, which can be expressed as: the reference image comprises a portion of the image content similar to a portion of the image content comprised by the first image. For example, the first image may include contents such as a person and a building, and the reference image may include contents such as a person and a building. Further, the reference image may provide color reference information required for toning to the first image.
In some embodiments, the reference image may be provided by a user, and the user may select an image with a specific color and style as the reference image according to the color matching requirement. In other embodiments, the reference image may be actively provided by the image toning system 100. In the following embodiments, an alternative embodiment in which the image toning system 100 actively recommends the reference image matching the first image will be described in detail, and details are not repeated here.
The optional implementations in which the server 20 trains the deep neural network based on the reference image and the first image will be described in detail in the following embodiments and are not described here.
In some exemplary embodiments, the terminal device 10 and the server 20 may communicate with each other in a wired or wireless manner. The wireless communication modes include short-distance modes such as Bluetooth, ZigBee, infrared, and WiFi (Wireless Fidelity), long-distance modes such as LoRa, and wireless communication based on a mobile network. When communication is established through a mobile network, the network format of the mobile network may be any one of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), 5G, WiMAX, and the like.
In this embodiment, when there is a toning requirement for the first image, the deep neural network may be trained with the first image as training data under the supervision of a reference image matched with the first image, so as to obtain a color matching model for the first image. Because the reference image serves as the learning target during training, the deep neural network can specifically learn the model parameters required for toning the first image according to the color information contained in the reference image. Toning the first image with this color matching model makes the toning operation more targeted, gives the toned first image a visual effect highly similar to that of the reference image, realizes refined intelligent toning based on the reference image, and makes it easy to meet personalized toning requirements.
The above embodiments describe an implementation in which the deep neural network is supervised-trained with the first image to be toned as training data under the supervision of the reference image. The supervised training process is further illustrated below with reference to fig. 2.
In some embodiments, the reference image is provided by a user. In such an embodiment, the terminal device 10 may acquire an image provided by the user for serving as a reference for toning as a reference image, and transmit the reference image to the server.
In such an embodiment, the user may specify, according to the toning requirement, a reference image to be used as the learning target of the deep neural network. When there is a need to switch to a different hue or style, the user can switch the reference image used as the toning reference so as to quickly meet different toning requirements.
Based on such an embodiment in which the reference image is provided by the user, the image toning system 100 can flexibly provide personalized image toning services to the user.
In other embodiments, the reference image may be provided autonomously by server 20, as will be described in greater detail below.
In such an embodiment, a plurality of images containing color information may be collected in advance, and the features of each image may be extracted by the server 20 based on a deep learning model. Then, each image and its extracted features are saved correspondingly in an image library, as shown in FIG. 2.
Based on the above, after the server 20 acquires the first image transmitted by the terminal device 10, the features of the first image may be extracted by the deep learning model in the same manner, and an image matching the first image may be selected from the image library as the reference image according to the extracted features of the first image.
The deep learning model may be implemented based on feature extraction networks such as VGGNet, ResNet (Residual Network), ZFNet, AlexNet, and the like, which is not limited in this embodiment. The features extracted from an image by the deep learning model may include at least one of the depth features, GIST features (a macroscopic description of scene characteristics), and global features of the image.
The deep learning model can comprise a multi-level feature extraction network so as to extract multi-level deep features of each image. The multi-level depth features may include shallow features and deep features of the image. The shallow feature is mainly used for expressing detail information in the image. The high-level feature extraction network can extract deep features in the image, and the deep features are mainly used for expressing semantic information in the image.
When multiple features are extracted for each image, the multiple features can be fused to obtain the fused features corresponding to each image. The fused features of the image include both shallow features and deep features of the image. And image matching is carried out based on the features of different levels, a reference image with higher matching degree with the first image can be searched from the image library, and the effect of color matching of subsequent images is improved.
The image library storing the corresponding relationship between the image and the feature may be stored in the server 20, or may be located on another device communicatively connected to the server 20, which is not limited in this embodiment.
In this embodiment, when selecting an image matching the first image, the distance between the feature of the first image and each feature in the image library may be calculated, where the distance between features expresses the similarity between them. Then, according to the calculated distances, a feature whose distance from the feature of the first image meets a set condition is determined from the image library, and the image corresponding to that feature is used as the reference image. For example, the feature closest to the feature of the first image may be searched for in the image library, and the image corresponding to that feature may be used as the reference image.
Based on this embodiment, the server 20 may automatically recommend, according to the features of the first image, an image that can serve as a toning reference for the first image, and can avoid a large deviation between the reference image and the first image, which would otherwise distort the toning result.
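As an illustration of this retrieval step, the following is a minimal Python sketch of selecting a reference image by nearest-feature search. It assumes, purely for illustration, that the deep learning model is a pretrained torchvision ResNet-50 with its classification head removed and that the image library is an in-memory mapping from image paths to stored feature vectors; neither assumption is taken from this application.

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Assumed feature extractor: a pretrained ResNet-50 with its classification head removed.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(path):
    # Extract a feature vector for one image; the image library stores one such vector per image.
    img = Image.open(path).convert("RGB")
    return backbone(preprocess(img).unsqueeze(0)).squeeze(0)

def select_reference(first_image_path, library):
    # library: dict mapping image path -> stored feature tensor.
    # Return the library image whose feature is closest (L2 distance) to the first image's feature.
    query = extract_feature(first_image_path)
    return min(library, key=lambda name: torch.dist(query, library[name]).item())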
In some exemplary embodiments, the process of server 20 performing supervised training on the deep neural network according to the first image under the supervision of the reference image may be described as the following steps:
and S1, inputting the first image and the reference image into a deep neural network.
And S2, performing color transformation on the first image based on the deep neural network to obtain a second image.
And S3, acquiring the color transformation loss of the deep neural network according to the second image and the reference image.
And S4, optimizing the deep neural network according to the color transformation loss, and repeatedly executing the step S2 until the color transformation loss of the deep neural network converges to a specified range.
When the color transformation loss of the deep neural network converges to a specified range, the deep neural network can be output as the color matching model of the first image. A minimal sketch of this training loop is given below.
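The sketch below renders steps S1 to S4 in Python/PyTorch. The function name, argument names, and hyperparameters (learning rate, step count, convergence tolerance) are illustrative assumptions; `model` is a deep neural network such as the deep linear network described next, and `loss_fn` is a joint color transformation loss such as the one sketched later in this description.

import torch

def train_toning_model(model, first_lab, reference_lab, loss_fn,
                       max_steps=500, tol=1e-3, lr=1e-3):
    # first_lab / reference_lab: 1 x 3 x H x W tensors in Lab color space (S1).
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for step in range(max_steps):
        second = model(first_lab)                         # S2: color-transform the first image
        loss = loss_fn(second, first_lab, reference_lab)  # S3: color transformation loss
        optimizer.zero_grad()
        loss.backward()                                   # S4: optimize the network and repeat
        optimizer.step()
        if loss.item() < tol:                             # loss "converges to a specified range"
            break
    return model                                          # output as the color matching model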
Optionally, the deep neural network adopted in each embodiment of the present application may be implemented as a deep linear neural network, a deep nonlinear neural network, or another color space transformation network, which is not limited in this embodiment. In fig. 2, the deep neural network is implemented as a deep linear neural network.
As shown in fig. 2, the deep linear neural network may be composed of n linear transformation modules, forming a nested structure of multiple layers of linear transformation modules. In fig. 2, mi denotes the dimension of the parameter matrix of the i-th layer linear transformation module, and i is 1, 2, 3 … n. The number of input channels in the first layer and the number of output channels in the last layer are 3, which indicates three channels L, a, and b. The number of channels of the linear transformation modules nested in the middle of the first layer and the last layer is gradually increased and then reduced.
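The nested linear transformation modules can be read as per-pixel linear maps. A minimal sketch follows; treating each module as a 1x1 convolution without activation functions, and the particular channel widths m_i (3, 8, 16, 8, 3), are assumptions made only for illustration.

import torch.nn as nn

class DeepLinearToningNet(nn.Module):
    # A stack of per-pixel linear transformation modules: 1x1 convolutions with no
    # non-linearities, so the whole network remains a linear color transformation.
    def __init__(self, widths=(3, 8, 16, 8, 3)):
        super().__init__()
        layers = [nn.Conv2d(c_in, c_out, kernel_size=1)
                  for c_in, c_out in zip(widths[:-1], widths[1:])]
        self.net = nn.Sequential(*layers)

    def forward(self, lab_image):
        # lab_image: N x 3 x H x W tensor holding the L, a, and b channels.
        return self.net(lab_image)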
Optionally, in step S1, before the first image and the reference image are input into the deep neural network, the first image and the reference image may be further transformed from an RGB (red, green, blue) color space to an Lab color space.
The Lab color model consists of three components: L, a, and b. L represents lightness, while a and b are two opposing color channels: the a channel runs from dark green (low values) through gray (middle values) to bright pinkish red (high values), and the b channel runs from bright blue (low values) through gray (middle values) to yellow (high values). The Lab model is independent of light and pigment, theoretically covers all colors visible to the human eye, and thus makes up for deficiencies of the RGB color model.
The first image and the reference image are converted into a Lab color space and then trained, so that the wide color gamut and rich colors can be reserved as much as possible, and the image color matching effect is improved.
Correspondingly, after the color matching model is obtained through training, when the first image to be color-matched is used for color matching, the first image to be color-matched can be converted from an RGB color space to a Lab color space, so that the color matching model can calculate conveniently.
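A small sketch of the color space conversion is given below, assuming scikit-image is used for the RGB/Lab transforms; the library choice and helper names are assumptions, not part of this application.

import numpy as np
import torch
from skimage import color

def rgb_to_lab_tensor(rgb_uint8):
    # H x W x 3 uint8 RGB image -> 1 x 3 x H x W float Lab tensor for the network.
    lab = color.rgb2lab(rgb_uint8)          # L in [0, 100], a and b roughly in [-128, 127]
    return torch.from_numpy(lab).permute(2, 0, 1).unsqueeze(0).float()

def lab_tensor_to_rgb(lab_tensor):
    # Inverse conversion applied to the toned output of the color matching model.
    lab = lab_tensor.squeeze(0).permute(1, 2, 0).detach().cpu().numpy()
    rgb = color.lab2rgb(lab)                # float image in [0, 1]
    return (np.clip(rgb, 0.0, 1.0) * 255).astype(np.uint8)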
Optionally, in this embodiment, the color transformation loss of the deep neural network may include at least one of the following losses: a loss at the pixel level between the input image and the output image, a loss in color distribution between the input image and the output image, and a loss in color distribution between local regions of the input image and local regions of the output image.
Where a loss at the pixel level between the input image and the output image is determined, a first loss function loss1 may be calculated based on a mean-square error (MSE) between pixels in the first image and pixels in the reference image.
In determining the loss in color distribution of the input image and the output image, a second loss function loss2 may be determined based on the loss between the color distribution function of the first image and the color distribution function of the reference image.
In determining the loss in color distribution of the local regions of the input image and the local regions of the output image, a third loss function loss3 may be determined based on the loss between the color distribution function of the first target region in the first image and the color distribution function of the second target region in the reference image.
The color distribution function may include a color distribution histogram, a color distance, or another function used to describe statistical color distribution characteristics.
The first loss function may be an L2 loss function, and the second loss function and the third loss function may be KL (KL divergence) loss functions, which is not limited in this embodiment.
In some embodiments, the joint loss function may be determined based on the first loss function loss1, the second loss function loss2, and the third loss function loss3 as the loss function loss of the palette model, as shown in the following equation:
loss=a1*loss1+a2*loss2+a3*loss3
wherein a1, a2 and a3 are adjustment factors among the loss functions and are used for distributing the proportion of the three different loss functions.
The first target area and the second target area may be target areas designated by a user, or target areas actively identified by the deep neural network based on a target detection algorithm. When the loss function used in the training includes loss3 corresponding to the target region, the color distribution of the second target region on the reference image can be used to control the color distribution of the first target region in the output result, so as to avoid the color distortion phenomenon of the local region.
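A sketch of the joint loss in Python/PyTorch follows. The choice of which images each term compares is an assumption made so that the loss depends on the network output: loss1 keeps the toned output close to the original first image at the pixel level, while loss2 and loss3 pull the global and target-region color distributions of the output toward those of the reference image. Per-channel histograms stand in for the color distribution function; a differentiable soft histogram would be needed in practice for gradients to flow through loss2 and loss3.

import torch
import torch.nn.functional as F

def color_histogram(img, bins=32, lo=-128.0, hi=128.0):
    # Normalized per-channel histogram used as a simple color distribution function.
    # Note: torch.histc is not differentiable; it is shown only to illustrate the loss terms.
    hists = []
    for c in range(img.shape[1]):
        h = torch.histc(img[:, c], bins=bins, min=lo, max=hi)
        hists.append(h / (h.sum() + 1e-8))
    return torch.cat(hists)

def joint_loss(second, first, reference, toned_region=None, ref_region=None,
               a1=1.0, a2=1.0, a3=1.0):
    # loss = a1*loss1 + a2*loss2 + a3*loss3
    loss1 = F.mse_loss(second, first)                                   # pixel-level L2 loss
    loss2 = F.kl_div(torch.log(color_histogram(second) + 1e-8),
                     color_histogram(reference), reduction="sum")       # KL divergence loss
    loss3 = torch.zeros(())
    if toned_region is not None and ref_region is not None:
        # KL divergence between the color distributions of the first target region in the
        # toned output and the second target region in the reference image.
        loss3 = F.kl_div(torch.log(color_histogram(toned_region) + 1e-8),
                         color_histogram(ref_region), reduction="sum")
    return a1 * loss1 + a2 * loss2 + a3 * loss3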
In some exemplary embodiments, the deep neural network further comprises a target detection network, which is used for object detection, i.e., detecting a specified target in an input image.
After the first image and the reference image are input into the deep neural network, the deep neural network can perform target detection on the first image based on a target detection network to obtain a first target area in the first image; and performing target detection on the reference image based on the target detection network to obtain a second target area in the reference image.
The target detection network performs target detection learning in advance according to the training samples and the label values marked on the training samples. Based on the learned target detection parameters, the target detection network may extract a region where the target object is located from the input image as a target region. Under different scenes, the target detection network can learn model parameters for detecting different types of targets based on different types of training samples and sample labels.
For example, in some embodiments, a large number of face images may be acquired as training samples, and the region where the face is located may be marked on the face images as a supervision signal. Based on the training samples and the supervision signals, the face detection network can be trained. Based on the face detection network, a face region may be detected from the first image as a first target region, and a face region may be detected from the reference image as a second target region based on the face detection network.
For another example, in some embodiments, a large number of scenic images containing sky may be acquired as training samples, and the area where the sky is located may be marked on the scenic images as a supervision signal. Based on the training samples and the supervision signals, a sky detection network can be trained. A sky region may be detected from the first image as a first target region based on a sky detection network, and a sky region may be detected from the reference image as a second target region based on the sky detection network.
Of course, in other color-mixing scenes, different types of target detection networks can be obtained through training according to the requirements on color-mixing key areas. For example, a vehicle detection network, an animal detection network, a road detection network, a building detection network, etc. may be trained, but the embodiment is not limited.
Based on the target detection network and the third loss function, the color distribution of the finally output color mixing result can be controlled by using the loss of the color distribution after the color mixing of the target area, so that the phenomenon of color distortion of a local area (such as a human face area) can be effectively avoided, and the fine color control of the local area is realized.
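As a stand-in for the target detection network, the sketch below extracts a face region with an off-the-shelf OpenCV Haar cascade. The application trains its own detection network from annotated samples, so this is only an assumed illustration of how the first and second target regions could be obtained.

import cv2

_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_region(bgr_image):
    # Return the largest detected face crop, or None if no face is found.
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    faces = _face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])
    return bgr_image[y:y + h, x:x + w]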
On the basis of the above embodiments, the image toning system 100 provided in the embodiments of the present application can also be used for toning a video, which will be described in the following as an example.
Alternatively, in a video toning scene, the terminal device 10 may acquire a video to be toned and transmit the video to the server 20. The first image may be any frame image in the video to be toned.
Optionally, after the server 20 obtains the video to be color-matched, the lens detection may be performed on the video to obtain at least one lens included in the video.
The "lens" in this embodiment is not a lens in the physical or optical sense, but a shot, i.e., a segment of picture that carries images and can form a scene on screen. A shot is a basic unit of a video or film: several shots constitute a paragraph or scene, and several paragraphs or scenes constitute a film. A shot is the sum of the pictures the camera captures continuously from the moment it starts recording until it stops; during this period the shooting angle of the camera is unchanged and the captured contents are similar.
As can be seen from the above, since the multiple frames of images included in one shot have high similarity, shot detection can be performed by finding, in the video, the two adjacent frames at which an abrupt switch occurs. The difference between two such adjacent frames often exceeds a certain threshold in dimensions such as chrominance and luminance. Therefore, in some embodiments, shot detection can be realized by detecting the pair of adjacent frames at which a shot switch occurs from the change in chrominance and luminance between adjacent frames, as sketched below. Alternatively, in other embodiments, the background difference between two adjacent frames can be detected; if that background difference is greater than a set threshold, it is determined that a shot switch occurs between the two frames.
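A minimal sketch of such shot detection follows, thresholding the chrominance/luminance histogram difference between adjacent frames with OpenCV; the histogram size, distance measure, and threshold value are assumptions made for illustration.

import cv2

def detect_shots(video_path, threshold=0.5):
    # Split a video into shots: a shot boundary is declared wherever two adjacent
    # frames differ sharply in their HSV (chrominance/luminance) histograms.
    cap = cv2.VideoCapture(video_path)
    shots, start, prev_hist, idx = [], 0, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1, 2], None, [16, 16, 16],
                            [0, 180, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None and \
           cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            shots.append((start, idx - 1))    # the previous shot ends at the prior frame
            start = idx
        prev_hist = hist
        idx += 1
    cap.release()
    shots.append((start, idx - 1))
    return shots    # list of (first_frame_index, last_frame_index) pairs, one per shot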
In this embodiment, to ensure that the colors of the multiple frames of images corresponding to the same shot have continuity, a color matching model may be trained for each shot. The following description will be exemplified by taking any shot contained in a video as an example. For convenience of description, the any lens will be described as a first lens hereinafter.
Alternatively, a key frame image satisfying a set condition may be determined from a plurality of frame images included in the first shot, as the first image used for training the color matching model in the foregoing embodiments.
The key frame image may be one or more representative frames of images in the first shot. Optional ways of determining, from the multiple frames of images contained in the first shot, a key frame image meeting a set condition to serve as the first image include the following:
embodiment A: at least one image located at the middle position is selected from multiple frames of images contained in the first lens to be used as a key frame image. The middle position refers to a middle position in the multi-frame images when the multi-frame images are arranged according to the time sequence.
Embodiment B: at least one frame image with the image quality meeting the specified quality requirement is selected from the multi-frame images contained in the first lens to be used as a key frame image. The image quality may be evaluated from a plurality of different angles, such as image brightness, image exposure, image noise, etc., and the embodiment is not limited thereto.
Embodiment C: at least one frame image containing a specified object is selected from the multiple frame images contained in the first shot as a key frame image. The designated object may be a person, an animal, a vehicle, or a building. In some cases, when the plurality of frame images each contain a specified object, an image whose shooting angle of the specified object is superior may be selected as the key frame image from among the plurality of frame images. For example, when the multiple frames of images all contain a human face, an image with a better human face angle can be selected from the multiple frames of images as the first image.
The above embodiments A, B, and C can be executed alone or in any combination, which is not limited in this embodiment. For example, when embodiments B and C are combined, frames whose image quality satisfies the specified quality requirement may first be screened out from the multiple frames of images included in the first shot, and an image containing a human face may then be selected from the qualifying frames as the key frame image. Alternatively, frames containing a face may be screened out first, and the frame with the better image quality may then be selected from them as the key frame image; this is not described again. A toy combination of embodiments A and B is sketched below.
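The sketch restricts attention to frames near the middle of the shot and keeps the sharpest one, using Laplacian variance as an assumed stand-in for "image quality meeting a specified quality requirement".

import cv2

def pick_key_frame(frames):
    # frames: list of BGR images belonging to one shot, in temporal order.
    n = len(frames)
    middle = frames[n // 3: max(n // 3 + 1, (2 * n) // 3)]   # frames around the middle position
    def sharpness(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()          # simple image-quality proxy
    return max(middle, key=sharpness)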
After the key frame image corresponding to the first shot is acquired, a reference image matched with the key frame image can be determined, and a deep neural network is trained based on the key frame image and the reference image to obtain a color matching model, which can be referred to the description of the foregoing embodiment.
After the color matching model obtained based on the key frame image training is obtained, the color matching model corresponding to the key frame image can be used as the color matching model of the first lens. Namely, the color matching model obtained based on the key frame image and the reference image training is used as the color matching model for matching colors of the multi-frame image included in the first shot. Then, the multi-frame images included in the first lens can be respectively input into the color matching model, so as to obtain the color-matched target images corresponding to the multi-frame images included in the first lens.
Similarly, based on the method described in the foregoing embodiment, the respective color matching model for each shot included in the video may be obtained through training, and based on the respective color matching model for each shot, the color matching may be performed on multiple frames of images included in each shot. And obtaining the color matching result of the multi-frame images contained in the plurality of lenses respectively to obtain the video after color matching.
In this implementation, video toning is decomposed into shot-by-shot toning, and the multiple frames of images under each shot share the same transformation, so that discontinuity in inter-frame color style within a shot is avoided; the toned video is free of such artifacts, and its overall look has a high similarity to the reference image while local colors such as human face skin tone remain undistorted.
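Tying the earlier sketches together, a shot-by-shot toning loop could look like the following; every helper passed in or called here (finding the reference, training a model per key frame, applying a model to a frame) is a hypothetical name standing for the steps described above, not an interface defined by this application.

def tone_video(frames, shots, find_reference, train_model_for, apply_model_to):
    # frames: all decoded frames of the video; shots: (first, last) index pairs per shot.
    toned = list(frames)
    for first_idx, last_idx in shots:
        shot_frames = frames[first_idx:last_idx + 1]
        key_frame = pick_key_frame(shot_frames)          # key frame image for this shot
        reference = find_reference(key_frame)            # reference image matched to it
        model = train_model_for(key_frame, reference)    # the shot's color matching model
        for i in range(first_idx, last_idx + 1):
            toned[i] = apply_model_to(model, frames[i])  # same transformation for every frame
    return toned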
In addition to the image toning system described in the foregoing embodiments, the embodiments of the present application provide an image toning method, which will be exemplarily described below.
Fig. 3 is a schematic flowchart of an image toning method executed on a terminal device side according to an exemplary embodiment of the present application, and as shown in fig. 3, the method mainly includes:
step 301, responding to the color matching request, and acquiring a first image to be color-matched.
Step 302, sending the first image to a server, so that the server trains a deep neural network under the supervision of the reference image by using the first image as training data to obtain a color matching model, and performing color transformation on the first image based on the color matching model to obtain a color-matched target image.
And 303, receiving the color-mixed target image returned by the server, and displaying the target image.
In some exemplary embodiments, the method further comprises: acquiring an image which is provided by a user and used as a color matching reference basis, and taking the image as the reference image; and sending the reference image to the server.
In some exemplary embodiments, the method further comprises: and acquiring a video to be toned provided by a user, and sending the video to the server. After receiving the video, the server may perform shot detection on the video, and train a color matching model for each shot, which may specifically refer to the records of the foregoing embodiments, and is not described herein again.
In this embodiment, after receiving a toning request from a user, the terminal device sends the first image to be toned to the server, and the server trains a deep neural network using the first image as training data under the supervision of a reference image, so as to obtain a color matching model for the first image. Toning the first image with this color matching model makes the toning operation more targeted, gives the toned first image a visual effect highly similar to that of the reference image, realizes refined intelligent toning based on the reference image, and makes it easy to meet personalized toning requirements.
Fig. 4 is a schematic flowchart of an image toning method executed on a server side according to an exemplary embodiment of the present application, and as shown in fig. 4, the method mainly includes:
step 401, receiving a first image sent by a terminal device.
Step 402, determining a reference image matched with the first image, wherein the reference image comprises color information.
And 403, taking the first image as training data, and performing supervised training on the deep neural network under the supervision of the reference image to obtain a color matching model of the first image.
And step 404, inputting the first image into the color matching model to obtain a color-matched target image.
And 405, sending the target image after color mixing to the terminal equipment for displaying.
In some exemplary embodiments, a way of determining a reference image matching the first image, the reference image containing color information, includes: acquiring an image which is sent by the terminal equipment and used as a color matching reference basis, and taking the image as the reference image; or extracting the feature of the first image, and selecting an image matched with the first image from an image library as the reference image according to the feature of the first image.
In some exemplary embodiments, a way of performing supervised training on a deep neural network under the supervision of the reference image using the first image as training data to obtain a color matching model of the first image includes: inputting the first image and the reference image into the deep neural network; performing color transformation on the first image based on the deep neural network to obtain a second image; acquiring the color transformation loss of the deep neural network according to the second image and the reference image; and optimizing the deep neural network according to the color transformation loss until the color transformation loss converges to a specified range.
In some exemplary embodiments, before inputting the first image and the reference image into the deep neural network, the method further comprises: transforming the first image and the reference image from an RGB color space to a Lab color space.
In some exemplary embodiments, one way to obtain the color transformation loss of the deep neural network from the second image and the reference image may include: determining a first loss function based on a mean square error between pixels in the first image and pixels in the reference image; determining a second loss function according to the loss between the color distribution function of the first image and the color distribution function of the reference image; determining a third loss function according to a loss between the color distribution function of the first target region in the first image and the color distribution function of the second target region in the reference image; determining a joint loss function as a loss function of the palette model based on the first loss function, the second loss function, and the third loss function.
In some exemplary embodiments, the deep neural network further comprises: a target detection network; the method further comprises the following steps: performing target detection on the first image and the reference image based on the target detection network to determine the first target region in the first image and the second target region in the reference image.
In some exemplary embodiments, the first target region comprises: a facial region in the first image; the second target region comprising: a face region in the reference image.
In some exemplary embodiments, one way to obtain a first image to be toned may include: acquiring a video to be color-matched, which is sent by the terminal equipment; performing shot detection on the video to obtain at least one shot contained in the video; for a first lens of the at least one lens, determining a key frame image satisfying a set condition from a plurality of frame images included in the first lens as the first image.
In some exemplary embodiments, determining, from among a plurality of frame images included in the first shot, a key frame image that satisfies a set condition as a manner of the first image may include: selecting at least one frame image positioned at the middle position from a plurality of frame images contained in the first lens as the first image; and/or selecting at least one frame of image with image quality meeting specified quality requirements from a plurality of frames of images contained in the first lens as the first image; and/or selecting at least one frame of image containing a specified object from a plurality of frames of images contained in the first lens as the first image.
In some exemplary embodiments, after obtaining the palette model for the first image, the method further comprises: taking the color mixing model of the first image as the color mixing model of the first lens; and respectively inputting the multi-frame images contained in the first lens into the color matching model to obtain the color-matched target images corresponding to the multi-frame images contained in the first lens.
Based on the above steps, a color matching model corresponding to at least one lens included in the video to be color-matched can be obtained, the color of the image included in each lens can be respectively matched to obtain a color-matched video, and the color-matched video is returned to the terminal equipment.
In this embodiment, the server takes the reference image as the learning target while training the deep neural network, so that the deep neural network can specifically learn the model parameters required for toning the first image according to the color information contained in the reference image. Toning the first image with this color matching model makes the toning operation more targeted, gives the toned first image a visual effect highly similar to that of the reference image, realizes refined intelligent toning based on the reference image, and makes it easy to meet personalized toning requirements.
It should be appreciated that in some alternative embodiments, the deep neural network based image toning process may be implemented as a "remote computing" process as described in the above embodiments, which may be described as "cloud computing" when the server 20 is implemented as a cloud server. In other alternative embodiments, the image toning process based on the neural network model may also be implemented as "edge computing" to provide near-end services to the user.
In an embodiment of "edge calculation", the deep neural network is deployed on a terminal device, and the terminal device may train the deep neural network based on a reference image required for color matching and an image to be color-matched, and perform color matching on the image to be color-matched based on the deep neural network, which will be described in the following as an example.
Fig. 5 is a schematic flowchart of an image toning method executed on a terminal side according to another exemplary embodiment of the present application. As shown in fig. 5, the method includes:
step 501, obtaining a first image to be toned.
Step 502, determining a reference image matched with the first image, wherein the reference image comprises color information.
Step 503, taking the first image as training data, and performing supervised training on the deep neural network under the supervision of the reference image to obtain a color matching model of the first image.
And 504, inputting the first image into the color matching model to obtain a color-matched target image.
In some exemplary embodiments, determining a reference image that matches the first image, one way in which the reference image contains color information, may include: acquiring an image which is provided by a user and used as a color matching reference basis, and taking the image as the reference image; or extracting the characteristics of the first image, and selecting an image matched with the first image from an image library as the reference image according to the characteristics of the first image.
In some exemplary embodiments, one way of performing supervised training on a deep neural network under the supervision of the reference image using the first image as training data to obtain a color matching model of the first image may include: inputting the first image and the reference image into the deep neural network; performing color transformation on the first image based on the deep neural network to obtain a second image; acquiring the color transformation loss of the deep neural network according to the second image and the reference image; and optimizing the deep neural network according to the color transformation loss until the color transformation loss is converged to a specified range.
In some exemplary embodiments, before inputting the first image and the reference image into the deep neural network, the method further includes: transforming the first image and the reference image from an RGB color space to a Lab color space.
In some exemplary embodiments, one way to obtain the color transformation loss of the deep neural network from the second image and the reference image may include: determining a first loss function based on a mean square error between pixels in the first image and pixels in the reference image; determining a second loss function according to the loss between the color distribution function of the first image and the color distribution function of the reference image; determining a third loss function according to a loss between the color distribution function of the first target region in the first image and the color distribution function of the second target region in the reference image; determining a joint loss function as a loss function of the palette model based on the first loss function, the second loss function, and the third loss function.
In some exemplary embodiments, the deep neural network further comprises: a target detection network; the method further comprises the following steps: performing target detection on the first image and the reference image based on the target detection network to determine the first target region in the first image and the second target region in the reference image.
In some exemplary embodiments, the first target region comprises: a facial region in the first image; the second target region comprising: a face region in the reference image.
In some exemplary embodiments, one way to obtain a first image to be toned may include: acquiring a video to be color-matched; performing shot detection on the video to obtain at least one shot contained in the video; for a first lens of the at least one lens, determining a key frame image satisfying a set condition from a plurality of frame images included in the first lens as the first image.
In some exemplary embodiments, determining, as the first image, a key frame image that satisfies a set condition from among a plurality of frame images included in the first shot, includes: selecting at least one frame image positioned at the middle position from a plurality of frame images contained in the first lens as the first image; and/or selecting at least one frame of image with image quality meeting specified quality requirements from a plurality of frames of images contained in the first lens as the first image; and/or selecting at least one frame of image containing a specified object from a plurality of frames of images contained in the first shot as the first image.
In some exemplary embodiments, after obtaining the color mixing model of the first image, further comprising: taking the color mixing model of the first image as the color mixing model of the first lens; and respectively inputting the multi-frame images contained in the first lens into the color matching model to obtain the color-matched target images corresponding to the multi-frame images contained in the first lens.
In this embodiment, the terminal device may take the reference image as the learning target, so that the deep neural network can specifically learn the model parameters required for toning the first image according to the color information contained in the reference image, and thereby obtain a toning model; based on this toning model, the toning operation on the image or video is carried out. When a video is toned in this way, the video does not need to be uploaded to or downloaded from a remote server, which greatly increases the toning speed and reduces the network traffic cost of video toning.
Fig. 6 is a schematic flowchart of a video toning method executed on a terminal side according to an exemplary embodiment of the present application. As shown in fig. 6, the method includes:
step 601, obtaining a video to be toned, wherein the video comprises at least one lens.
Step 602, for any shot in the at least one shot, determining a key frame image meeting a set condition from a plurality of frame images included in the shot.
Step 603, determining a reference image matched with the key frame image, wherein the reference image comprises color information.
And step 604, taking the key frame image as training data, and performing supervised training on the deep neural network under the supervision of the reference image to obtain the color matching model of the lens.
And 605, inputting the multiple frames of images contained in the lens into the color matching model respectively to obtain the color-matched target images corresponding to the multiple frames of images contained in the lens.
In this embodiment, after the video to be toned is acquired, shot detection may be performed on the video to determine one or more shots included in the video; for details, reference may be made to the description of the foregoing embodiments, which is not repeated here.
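For illustration, shot detection could be approximated by splitting the video wherever the color histograms of consecutive frames change abruptly; the Bhattacharyya distance and the 0.5 threshold below are assumptions of this sketch, not the shot detection method of the foregoing embodiments.

```python
import cv2

def detect_shots(video_path, threshold=0.5):
    """Sketch: split a video into shots at large jumps of the frame color histogram."""
    cap = cv2.VideoCapture(video_path)
    shots, current, prev_hist = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None and \
                cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            shots.append(current)   # large histogram jump -> assume a shot boundary
            current = []
        current.append(frame)
        prev_hist = hist
    if current:
        shots.append(current)
    cap.release()
    return shots
```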
In this embodiment, a corresponding color matching model may be trained for each shot. When training the color matching model of a shot, a key frame image to be used as training data can be determined from the multiple frame images included in the shot; for details, reference may be made to the descriptions of embodiment A, embodiment B, and embodiment C.
The reference image can be provided by a user, or can be retrieved automatically by the terminal device from a pre-constructed image library. For an optional implementation of performing supervised training on the deep neural network under the supervision of the reference image with the key frame image as training data, reference may be made to the optional implementation, described in the foregoing embodiments, of training the deep neural network according to the first image and the reference image; details are not repeated here.
Based on the key frame image of each lens and the reference image corresponding to each key frame image, the respective color matching model of each lens can be obtained through training, and then color matching can be respectively carried out on each lens.
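Putting the steps together, a minimal per-shot toning pipeline might be sketched as follows; detect_shots, select_key_frame, find_reference and train_toning_model stand for the operations described above and are placeholder callables, not functions defined by this application.

```python
def tone_video(video_path, detect_shots, select_key_frame,
               find_reference, train_toning_model):
    """Sketch: train one toning model per shot and apply it to every frame of that shot."""
    toned_shots = []
    for shot_frames in detect_shots(video_path):                    # shots of the video
        key_frame = select_key_frame(shot_frames)                   # key frame of the shot (step 602)
        reference = find_reference(key_frame)                       # matched reference image (step 603)
        model = train_toning_model(key_frame, reference)            # toning model of the shot (step 604)
        toned_shots.append([model(frame) for frame in shot_frames]) # toned frames (step 605)
    return toned_shots
```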
In this embodiment, the video toning is decomposed into lens-by-lens (shot-by-shot) toning, and the multiple frames of images under each lens adopt the same set of transformation, so that discontinuity of the inter-frame color style among the frames of the same lens can be avoided; the toned video is thus free of such defects, and its overall visual style has a higher similarity with the reference image.
Fig. 7 is a schematic flowchart of a color matching method according to another exemplary embodiment of the present application, and as shown in fig. 7, the method includes:
Step 701, responding to a call request of the client to the first interface, and acquiring the first image to be color-mixed, which is contained in the interface parameters.
Step 702, determining a reference image matched with the first image, wherein the reference image comprises color information.
Step 703, taking the first image as training data, and performing supervised training on the deep neural network under the supervision of the reference image to obtain a color matching model of the first image.
Step 704, returning the color matching model of the first image to the client, so that the client uses the color matching model of the first image to match colors of the first image.
The execution subject of this embodiment may be a server device, such as a conventional server or a cloud server. The client may be implemented as a user-side device such as a mobile phone, a computer, or a tablet computer.
In this embodiment, the image toning method provided in each of the foregoing embodiments may be packaged as a software tool that can be used by a third party, such as a SaaS (Software-as-a-Service) tool. The SaaS tool may be implemented as a plug-in or an application. The plug-in or application may be deployed on a server-side device and may expose a specified interface to a third-party user, such as a client. For convenience of description, in this embodiment the specified interface is referred to as the first interface. A third-party user such as a client can then conveniently access and use the image toning method provided by the server-side device by calling the first interface.
For example, in some scenarios, the SaaS tool may be deployed on a cloud server, and a third-party user may invoke a first interface provided by the cloud server to use the SaaS tool online. When the third-party user calls the first interface, input data required for training the image color matching model, that is, the first image to be color matched according to this embodiment, may be provided to the SaaS tool by configuring interface parameters of the first interface. In some embodiments, when the color matching model is used for color matching of the video, the third party user may provide the video to be color matched by configuring the interface parameters of the first interface.
After the SaaS tool receives a call request for the first interface, the first image or the video to be toned provided by the client can be obtained by parsing the interface parameters of the first interface. If the first image is obtained by parsing the interface parameters of the first interface, the SaaS tool can determine a reference image matched with the first image and, under the supervision of the reference image, train with the first image as training data to obtain the color matching model of the first image. After the color matching model of the first image is obtained, it can be returned to the client through the first interface or another communication mode for the client to use.
If a video to be color-matched is obtained by parsing the interface parameters of the first interface, the SaaS tool can perform shot detection on the video and select, for each shot, a key frame image for training the color matching model corresponding to that shot. For each shot, the SaaS tool can determine a reference image matched with the key frame image of the shot and, under the supervision of the reference image, train with the key frame image as training data to obtain the color matching model corresponding to the shot. After the color matching model corresponding to each shot is obtained, it can be returned to the client through the first interface or another communication mode for the client to use.
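For illustration only, a client-side call to the first interface might look like the hypothetical HTTP request below; the endpoint URL, field name and response format are assumptions of this sketch, since this application only specifies that the first image (or the video) is passed through the interface parameters of the first interface.

```python
import requests

def request_toning_model(image_path,
                         endpoint="https://example.com/api/v1/first-interface"):
    """Hypothetical invocation of the first interface exposed by the SaaS tool."""
    with open(image_path, "rb") as f:
        resp = requests.post(endpoint, files={"first_image": f})
    resp.raise_for_status()
    # The response is assumed to carry the trained toning model (or a handle to it).
    return resp.json()
```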
In this embodiment, the server device may provide the client with the color matching model training service based on the SaaS tool running thereon, and the client user may use the color matching model training service provided by the server device by calling an interface provided by the SaaS tool. Based on the interaction between the client and the server equipment, the client can completely submit the training operation of the color matching model to the server equipment for execution, and further, the model training operation with low cost and high efficiency can be realized by means of the strong computing capability of the server equipment.
It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may serve as the execution subjects of the methods. For example, the execution subject of steps 601 to 603 may be device A; for another example, the execution subject of steps 601 and 602 may be device A, and the execution subject of step 603 may be device B; and so on.
In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 601, 602, etc., are merely used for distinguishing different operations, and the sequence numbers themselves do not represent any execution order. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.
It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
Fig. 8 is a schematic structural diagram of a terminal device provided in an exemplary embodiment of the present application, where the terminal device is suitable for the image toning system provided in the foregoing embodiment. As shown in fig. 8, the terminal device includes: memory 801, processor 802, and communications component 803.
The memory 801 is configured to store a computer program, and may be configured to store various other data to support operations on the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, contact data, phonebook data, messages, pictures, videos, etc.
A processor 802, coupled to the memory 801, for executing computer programs in the memory 801 for: responding to a color matching request, and acquiring a first image to be subjected to color matching; sending the first image to a server through a communication component 803, so that the server trains a deep neural network under the supervision of a reference image by using the first image as training data to obtain a color matching model, and performing color transformation on the first image based on the color matching model to obtain a color-matched target image; and receiving the target image after color mixing returned by the server through a communication component 803, and displaying the target image.
Further optionally, the processor 802 is further configured to: and acquiring an image which is provided by a user and used as a toning reference basis, taking the image as the reference image, and sending the reference image to the server.
Further, as shown in fig. 8, the terminal device further includes: a display component 804, a power component 805, an audio component 806, and other components. Only some of the components are schematically shown in fig. 8, which does not mean that the terminal device includes only the components shown in fig. 8.
In this embodiment, after receiving a color matching request from a user, a terminal device sends a first image to be color-matched to a server, and the server trains a deep neural network by using the first image as training data under the supervision of a reference image, so as to obtain a color matching model for the first image. The first image is subjected to color mixing based on the color mixing model, the pertinence of color mixing operation is improved, the visual effect of the first image after color mixing has higher similarity with a reference image, refined intelligent color mixing based on the reference image is realized, and the personalized color mixing requirement is convenient to meet.
In addition to the foregoing embodiments, the terminal device illustrated in fig. 8 may also execute the following image toning logic: the processor 802 obtains a first image to be toned; determining a reference image that matches the first image, the reference image including color information; taking the first image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the first image; and inputting the first image into the color matching model to obtain a color-matched target image.
Further optionally, when determining the reference image matched with the first image, the processor 802 is specifically configured to: acquiring an image which is provided by a user and used as a color matching reference basis, and taking the image as the reference image; or extracting the feature of the first image, and selecting an image matched with the first image from an image library as the reference image according to the feature of the first image.
Further optionally, the processor 802 uses the first image as training data, and performs supervised training on the deep neural network under the supervision of the reference image to obtain the color matching model of the first image, specifically configured to: inputting the first image and the reference image into the deep neural network; performing color transformation on the first image based on the deep neural network to obtain a second image; acquiring the color transformation loss of the deep neural network according to the second image and the reference image; and optimizing the deep neural network according to the color transformation loss until the color transformation loss converges to a specified range.
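A minimal PyTorch-style sketch of this training loop is given below; the network object, the Adam optimizer, the learning rate and the convergence threshold are assumptions chosen only to illustrate the transform / compute-loss / optimize cycle described above.

```python
import torch

def train_toning_model(net, first_image, reference_image, color_loss,
                       lr=1e-3, tol=1e-4, max_steps=2000):
    """Sketch: optimize the deep neural network under the supervision of the
    reference image until the color transformation loss converges."""
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(max_steps):
        second_image = net(first_image)        # color transformation of the first image
        loss = color_loss(second_image, reference_image)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < tol:                  # loss converged to the specified range
            break
    return net                                 # the trained toning model
```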
Further optionally, the processor 802, before inputting the first image and the reference image into the deep neural network, is further configured to: transforming the first image and the reference image from an RGB color space to a Lab color space.
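The RGB-to-Lab transformation could, for example, be performed with an off-the-shelf routine such as skimage.color.rgb2lab; this particular library choice is an assumption of the sketch and is not prescribed by this application.

```python
import numpy as np
from skimage import color

def to_lab(image_rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB image to the Lab color space before it is fed
    to the deep neural network."""
    if image_rgb.dtype == np.uint8:
        image_rgb = image_rgb.astype(np.float64) / 255.0   # rgb2lab expects floats in [0, 1]
    return color.rgb2lab(image_rgb)                        # L in [0, 100], a/b roughly in [-128, 127]
```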
Further optionally, when obtaining the color transformation loss of the deep neural network according to the second image and the reference image, the processor 802 is specifically configured to: determine a first loss function based on a mean square error between pixels in the first image and pixels in the reference image; determine a second loss function according to the loss between the color distribution function of the first image and the color distribution function of the reference image; determine a third loss function according to a loss between the color distribution function of the first target region in the first image and the color distribution function of the second target region in the reference image; and determine a joint loss function, based on the first loss function, the second loss function, and the third loss function, as the loss function of the color matching model.
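As a non-authoritative sketch, the joint loss could be formed as a weighted sum of the three terms; the differentiable soft histogram standing in for the color distribution function, the L1 comparison of distributions, the weights w1-w3, and the use of the color-transformed image for the pixel-wise term are all assumptions of this illustration (images and regions are assumed to be resized to a common resolution).

```python
import torch
import torch.nn.functional as F

def soft_histogram(x, bins=64, lo=-128.0, hi=128.0, sigma=2.0):
    """Differentiable stand-in for a color distribution function (assumed form)."""
    centers = torch.linspace(lo, hi, bins, device=x.device)
    weights = torch.exp(-0.5 * ((x.reshape(-1, 1) - centers) / sigma) ** 2)
    hist = weights.sum(dim=0)
    return hist / hist.sum()

def joint_loss(transformed, reference, transformed_region, reference_region,
               w1=1.0, w2=1.0, w3=1.0):
    # First loss: mean square error between corresponding pixels.
    l1 = F.mse_loss(transformed, reference)
    # Second loss: distance between the global color distribution functions.
    l2 = F.l1_loss(soft_histogram(transformed), soft_histogram(reference))
    # Third loss: distance between the color distributions of the target regions
    # (for example, the face regions located by the target detection network).
    l3 = F.l1_loss(soft_histogram(transformed_region), soft_histogram(reference_region))
    # Joint loss: weighted combination used as the loss function of the color matching model.
    return w1 * l1 + w2 * l2 + w3 * l3
```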
Further optionally, the deep neural network further comprises: a target detection network; the processor 802 is further configured to: performing target detection on the first image and the reference image based on the target detection network to determine the first target region in the first image and the second target region in the reference image.
Further optionally, the first target region comprises: a face region in the first image; and the second target region comprises: a face region in the reference image.
Further optionally, when acquiring the first image to be toned, the processor 802 is specifically configured to: acquiring a video to be color-matched; performing shot detection on the video to obtain at least one shot contained in the video; for a first lens of the at least one lens, determining a key frame image satisfying a set condition from a plurality of frame images included in the first lens as the first image.
Further optionally, when determining, as the first image, a key frame image that meets a set condition from among multiple frame images included in the first shot, the processor 802 is specifically configured to: selecting at least one image located at a middle position from a plurality of images contained in the first lens as the first image; and/or selecting at least one frame of image with image quality meeting specified quality requirements from a plurality of frames of images contained in the first lens as the first image; and/or selecting at least one frame of image containing a specified object from a plurality of frames of images contained in the first shot as the first image.
Further optionally, the processor 802, after obtaining the color-mixing model of the first image, is further configured to: taking the color mixing model of the first image as the color mixing model of the first lens; and respectively inputting the multi-frame images contained in the first lens into the color matching model to obtain the color-matched target images corresponding to the multi-frame images contained in the first lens.
In this embodiment, the terminal device may train the deep neural network based on the reference image and the first image to be color-matched to obtain a color matching model, and then realize the color matching operation on the image or the video based on the color matching model. In this embodiment, when the video is subjected to color matching, the video does not need to be uploaded to or downloaded from a remote server, so that the color matching speed of the video is greatly improved and the traffic cost required for color matching the video is reduced.
In addition to the foregoing embodiments, the terminal device illustrated in fig. 8 may also execute the following video toning logic: the processor 802 obtains a video to be toned, the video including at least one shot; for any lens in the at least one lens, determining a key frame image meeting a set condition from a plurality of frame images contained in the lens; determining a reference image that matches the key frame image, the reference image including color information; taking the key frame image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the lens; and respectively inputting the multi-frame images contained in the lens into the color matching model to obtain the color-matched target images corresponding to the multi-frame images contained in the lens.
In this embodiment, the video toning is decomposed into lens-by-lens (shot-by-shot) toning, and the multiple frames of images under each lens adopt the same set of transformation, so that discontinuity of the inter-frame color style among the frames of the same lens can be avoided; the toned video is thus free of such defects, and its overall visual style has a higher similarity with the reference image.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program which, when executed, can implement the steps that can be executed by the terminal device in the foregoing method embodiments.
Fig. 9 is a schematic structural diagram of a server provided in an exemplary embodiment of the present application, and the server is suitable for the image toning system provided in the foregoing embodiment. As shown in fig. 9, the server includes: memory 901, processor 902, and communications component 903.
The memory 901 is configured to store a computer program, and may be configured to store various other data to support operations on the server. Examples of such data include instructions for any application or method operating on the server, contact data, phonebook data, messages, pictures, videos, and so forth.
A processor 902, coupled to the memory 901, for executing the computer program in the memory 901 for: receiving a first image sent by the terminal equipment through a communication component 903; determining a reference image that matches the first image, the reference image including color information; taking the first image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the first image; inputting the first image into the color matching model to obtain a color-matched target image; and sending the target image after color mixing to the terminal equipment for displaying through a communication component 903.
Further optionally, when determining the reference image matching the first image, the processor 902 is specifically configured to: acquiring an image which is provided by a user and used as a color matching reference basis, and taking the image as the reference image; or extracting the feature of the first image, and selecting an image matched with the first image from an image library as the reference image according to the feature of the first image.
Further optionally, the processor 902 takes the first image as training data, and performs supervised training on the deep neural network under the supervision of the reference image to obtain a color matching model of the first image, which is specifically configured to: inputting the first image and the reference image into the deep neural network; performing color transformation on the first image based on the deep neural network to obtain a second image; acquiring the color transformation loss of the deep neural network according to the second image and the reference image; and optimizing the deep neural network according to the color transformation loss until the color transformation loss converges to a specified range.
Further optionally, the processor 902, before inputting the first image and the reference image into the deep neural network, is further configured to: transforming the first image and the reference image from an RGB color space to a Lab color space.
Further optionally, when obtaining the color transformation loss of the deep neural network according to the second image and the reference image, the processor 902 is specifically configured to: determine a first loss function based on a mean square error between pixels in the first image and pixels in the reference image; determine a second loss function according to the loss between the color distribution function of the first image and the color distribution function of the reference image; determine a third loss function according to a loss between the color distribution function of the first target region in the first image and the color distribution function of the second target region in the reference image; and determine a joint loss function, based on the first loss function, the second loss function, and the third loss function, as the loss function of the color matching model.
Further optionally, the deep neural network further comprises: a target detection network; the processor 902 is further configured to: performing target detection on the first image and the reference image based on the target detection network to determine the first target region in the first image and the second target region in the reference image.
Further optionally, the first target region comprises: a face region in the first image; and the second target region comprises: a face region in the reference image.
Further optionally, when acquiring the first image to be toned, the processor 902 is specifically configured to: acquiring a video to be color-mixed sent by terminal equipment; performing shot detection on the video to obtain at least one shot contained in the video; for a first lens of the at least one lens, determining a key frame image satisfying a set condition from a plurality of frame images included in the first lens as the first image.
Further optionally, when determining, as the first image, a key frame image that meets a set condition from among multiple frame images included in the first shot, the processor 902 is specifically configured to: selecting at least one frame image positioned at the middle position from a plurality of frame images contained in the first lens as the first image; and/or selecting at least one frame of image with image quality meeting specified quality requirements from a plurality of frames of images contained in the first lens as the first image; and/or selecting at least one frame of image containing a specified object from a plurality of frames of images contained in the first lens as the first image.
Further optionally, the processor 902, after obtaining the color matching model of the first image, is further configured to: take the color matching model of the first image as the color matching model of the first lens; and respectively input the multiple frames of images contained in the first lens into the color matching model to obtain the color-matched target images corresponding to the multiple frames of images contained in the first lens.
Based on the above steps, the color matching model corresponding to each of the at least one lens contained in the video to be color-matched can be obtained, the images contained in each lens can be color-matched to obtain the color-matched video, and the color-matched video is returned to the terminal device.
Further, as shown in fig. 9, the server further includes: a power supply component 904, and the like. Only some of the components are schematically shown in fig. 9, which does not mean that the server includes only the components shown in fig. 9.
In this embodiment, the server takes the reference image as a learning target in the process of training the deep neural network, so that the deep neural network can specifically learn the model parameters required for toning the first image according to the color information included in the reference image. The first image is subjected to color mixing based on the color mixing model, the pertinence of color mixing operation is improved, the visual effect of the first image after color mixing has higher similarity with a reference image, refined intelligent color mixing based on the reference image is realized, and the personalized color mixing requirement is convenient to meet.
In addition to the foregoing embodiments, the server illustrated in FIG. 9 may also perform the following image toning logic: acquiring a first image to be toned, which is contained in interface parameters, in response to a calling request of a client to a first interface; determining a reference image that matches the first image, the reference image including color information; taking the first image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the first image; and returning the color mixing model of the first image to the client so that the client performs color mixing on the first image by using the color mixing model of the first image.
In addition to the foregoing embodiments, the server illustrated in FIG. 9 may also perform the following image toning logic: acquiring a first image to be toned; determining a reference image that matches the first image, the reference image including color information; and taking the first image as training data, and carrying out supervised training on the deep neural network under the supervision of the reference image to obtain a color matching model of the first image.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program which, when executed, can implement the steps that can be executed by the server in the foregoing method embodiments.
The memories of fig. 8 and 9 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The communication components of fig. 8 and 9 described above are configured to facilitate communication between the device in which the communication component is located and other devices in a wired or wireless manner. The device in which the communication component is located may access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, or 5G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component may be implemented based on Near Field Communication (NFC) technology, Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The display in fig. 8 described above includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The power supply components of fig. 8 and 9 described above provide power to the various components of the device in which the power supply components are located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (19)

1. An image toning method, comprising:
responding to a calling request of a client to a first interface, and acquiring a first image to be toned, wherein the first image is contained in interface parameters;
determining a reference image that matches the first image, the reference image including color information;
taking the first image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the first image;
and returning the color mixing model of the first image to the client so that the client performs color mixing on the first image by using the color mixing model of the first image.
2. An image toning method, comprising:
acquiring a first image to be toned;
determining a reference image that matches the first image, the reference image including color information;
and taking the first image as training data, and carrying out supervised training on the deep neural network under the supervision of the reference image to obtain a color matching model of the first image.
3. An image toning method, comprising:
acquiring a first image to be toned;
determining a reference image that matches the first image, the reference image including color information;
taking the first image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the first image;
and inputting the first image into the color matching model to obtain a color-matched target image.
4. The method of claim 3, wherein determining a reference image that matches the first image, the reference image containing color information comprises:
acquiring an image which is provided by a user and used as a color matching reference basis, and taking the image as the reference image; or,
extracting the characteristics of the first image, and selecting an image matched with the first image from an image library as the reference image according to the characteristics of the first image.
5. The method of claim 3, wherein performing supervised training on a deep neural network under the supervision of the reference image using the first image as training data to obtain a color matching model of the first image comprises:
inputting the first image and the reference image into the deep neural network;
performing color transformation on the first image based on the deep neural network to obtain a second image;
acquiring the color transformation loss of the deep neural network according to the second image and the reference image;
and optimizing the deep neural network according to the color transformation loss until the color transformation loss converges to a specified range.
6. The method of claim 5, wherein before inputting the first image and the reference image into a deep neural network, further comprising:
transforming the first image and the reference image from an RGB color space to a Lab color space.
7. The method of claim 5, wherein obtaining the color transformation loss of the deep neural network from the second image and the reference image comprises:
determining a first loss function based on a mean square error between pixels in the first image and pixels in the reference image;
determining a second loss function according to the loss between the color distribution function of the first image and the color distribution function of the reference image;
determining a third loss function according to a loss between the color distribution function of the first target region in the first image and the color distribution function of the second target region in the reference image;
determining a joint loss function as a loss function of the color matching model based on the first loss function, the second loss function, and the third loss function.
8. The method of claim 7, wherein the deep neural network further comprises: a target detection network; the method further comprises the following steps:
performing target detection on the first image and the reference image based on the target detection network to determine the first target region in the first image and the second target region in the reference image.
9. The method of claim 7, wherein the first target region comprises: a facial region in the first image; the second target region comprising: a face region in the reference image.
10. The method according to any of claims 3-9, wherein obtaining a first image to be toned comprises:
acquiring a video to be color-matched;
performing shot detection on the video to obtain at least one shot contained in the video;
for a first lens of the at least one lens, determining a key frame image satisfying a set condition from a plurality of frame images included in the first lens as the first image.
11. The method according to claim 10, wherein determining, as the first image, a key frame image that satisfies a set condition from among a plurality of frame images included in the first shot, comprises:
selecting at least one frame image positioned at the middle position from a plurality of frame images contained in the first lens as the first image; and/or,
selecting at least one frame of image with image quality meeting specified quality requirements from a plurality of frames of images contained in the first lens as the first image; and/or,
at least one frame image containing a specified object is selected from the multiple frame images contained in the first shot as the first image.
12. The method of claim 10, after obtaining the color matching model of the first image, further comprising:
taking the color mixing model of the first image as the color mixing model of the first lens;
and respectively inputting the multi-frame images contained in the first lens into the color matching model to obtain the color-matched target images corresponding to the multi-frame images contained in the first lens.
13. A video toning method, comprising:
acquiring a video to be subjected to color mixing, wherein the video comprises at least one lens;
for any lens in the at least one lens, determining a key frame image meeting a set condition from a plurality of frame images contained in the lens;
determining a reference image that matches the key frame image, the reference image including color information;
taking the key frame image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the lens;
and respectively inputting the multi-frame images contained in the lens into the color matching model to obtain the color-matched target images corresponding to the multi-frame images contained in the lens.
14. An image toning method, comprising:
responding to the color mixing request, and acquiring a first image to be mixed with colors;
sending the first image to a server so that the server trains a deep neural network under the supervision of a reference image by taking the first image as training data to obtain a color mixing model, and performing color transformation on the first image based on the color mixing model to obtain a color-mixed target image;
and receiving the color-mixed target image returned by the server, and displaying the target image.
15. The method of claim 14, further comprising:
acquiring an image which is provided by a user and used as a color matching reference basis, and taking the image as the reference image;
and sending the reference image to the server.
16. An image toning method, comprising:
receiving a first image sent by terminal equipment;
determining a reference image that matches the first image, the reference image including color information;
taking the first image as training data, and carrying out supervised training on a deep neural network under the supervision of the reference image to obtain a color matching model of the first image;
inputting the first image into the color matching model to obtain a color-matched target image;
and sending the target image after color mixing to the terminal equipment for displaying.
17. A terminal device, comprising: a memory and a processor; the memory is to store one or more computer instructions; the processor is to execute the one or more computer instructions to: performing the steps of the method of any one of claims 3-15.
18. A server, comprising: a memory and a processor;
the memory is to store one or more computer instructions;
the processor is to execute the one or more computer instructions to: performing the steps of the method of claim 1, 2 or 16.
19. A computer-readable storage medium storing a computer program, wherein the computer program is capable of performing the steps of the method of any one of claims 3-15 or the steps of the method of claim 1, 2 or 16 when executed by a processor.
CN202011329971.2A 2020-11-24 2020-11-24 Method, system, equipment and storage medium for toning image and video Pending CN114549674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011329971.2A CN114549674A (en) 2020-11-24 2020-11-24 Method, system, equipment and storage medium for toning image and video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011329971.2A CN114549674A (en) 2020-11-24 2020-11-24 Method, system, equipment and storage medium for toning image and video

Publications (1)

Publication Number Publication Date
CN114549674A true CN114549674A (en) 2022-05-27

Family

ID=81660585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011329971.2A Pending CN114549674A (en) 2020-11-24 2020-11-24 Method, system, equipment and storage medium for toning image and video

Country Status (1)

Country Link
CN (1) CN114549674A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination