WO2023178648A1 - Video processing method and apparatus, electronic device, and computer-readable storage medium - Google Patents

Video processing method and apparatus, electronic device, and computer-readable storage medium

Info

Publication number
WO2023178648A1
WO2023178648A1 · PCT/CN2022/083016 · CN2022083016W
Authority
WO
WIPO (PCT)
Prior art keywords: model, color, color enhancement, image, video
Application number: PCT/CN2022/083016
Other languages: English (en), French (fr)
Inventors: 陈冠男, 朱丹
Original Assignee: 京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 京东方科技集团股份有限公司
Priority to PCT/CN2022/083016 (WO2023178648A1)
Priority to CN202280000545.5A (CN117157665A)
Publication of WO2023178648A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation

Definitions

  • the present disclosure relates to the fields of image processing technology and artificial intelligence, and specifically to video processing methods and devices, electronic equipment, and computer-readable storage media.
  • Color enhancement is an important technology in the field of image processing. This technology adjusts the color of pictures and videos to make the images richer and more realistic, and greatly improves the video quality. This technology can be applied to computer vision scenarios such as video color creation and film source image quality enhancement.
  • Embodiments of the present disclosure provide a video processing method and device, electronic equipment, and computer-readable storage media.
  • embodiments of the present disclosure provide a video processing method, which includes: obtaining a video frame of a video to be processed; and performing color adjustment on the video frame using at least one color enhancement model in a color enhancement model pool to obtain a color-adjusted video output frame; wherein the color enhancement model pool includes a collection of models for color adjustment of videos with different style requirements of multiple video content types.
  • in some embodiments, using at least one color enhancement model in the color enhancement model pool to perform color adjustment on the video frame to obtain a color-adjusted video output frame includes: receiving information including the video content type and style requirement of the video frame; selecting, from the color enhancement model pool, a color enhancement model matching the video content type and style requirement; and processing the video frame with the selected color enhancement model to obtain the color-adjusted video output frame.
  • in some embodiments, using at least one color enhancement model in the color enhancement model pool to perform color adjustment on the video frame to obtain a color-adjusted video output frame includes: processing the video frame with each color enhancement model in the color enhancement model pool to obtain the color-adjusted video frame output by each color enhancement model; processing the video frame based on a content analysis model to generate a weight value corresponding to each color enhancement model and the video frame, wherein the content analysis model is configured to determine the weight value of each color enhancement model in the color enhancement model pool according to the video frame; and performing weighted fusion, using the weight value corresponding to each color enhancement model and the video frame, on the color-adjusted video frames output by the color enhancement models to generate the video output frame.
  • in some embodiments, the color enhancement model includes an image reconstruction branch model and a color enhancement branch model; using the selected color enhancement model to process the video frame to obtain the color-adjusted video output frame includes: performing feature extraction on the video frame to obtain a first image feature; filtering the video frame and performing feature extraction on the filtered video frame to obtain a second image feature; inputting the first image feature into the image reconstruction branch model and the second image feature into the color enhancement branch model, so as to perform image reconstruction and color adjustment according to the first image feature and the second image feature; performing feature fusion on the output result of the image reconstruction branch model and the output result of the color enhancement branch model to obtain a fusion result; and generating the color-adjusted video output frame according to the fusion result.
  • in some embodiments, the image reconstruction branch model and the color enhancement branch model each include N dense residual connection network modules (RDBs), and the color enhancement model further includes N fully connected layers, where N is an integer greater than or equal to 1; inputting the first image feature into the image reconstruction branch model and the second image feature into the color enhancement branch model, so as to perform image reconstruction and color adjustment according to the first image feature and the second image feature, includes: obtaining the first RDB in the image reconstruction branch model and the first RDB in the color enhancement branch model as the current RDB of the first branch model and the current RDB of the second branch model; processing the first image feature with the current RDB of the first branch model and the second image feature with the current RDB of the second branch model, to obtain the feature map output by the current RDB of the first branch model and the low-frequency color feature output by the current RDB of the second branch model; processing the low-frequency color feature with a fully connected layer to obtain a feature vector, and multiplying the feature vector channel-wise with the feature map output by the current RDB of the first branch model to obtain a feature map adjusted by local color features; taking the feature map adjusted by local color features as the new first image feature and the low-frequency color feature output by the current RDB of the second branch model as the new second image feature; and obtaining the next RDB in the image reconstruction branch model and the next RDB in the color enhancement branch model as the new current RDB of the first branch model and the new current RDB of the second branch model, until the number of obtained RDBs equals N, at which point the most recently obtained feature map adjusted by local color features is used as the output result of the image reconstruction branch model and the most recently obtained low-frequency color feature is used as the output result of the color enhancement branch model.
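  • The following is a minimal sketch of the branch interaction described above, assuming a PyTorch-style implementation in which an RDB module is already defined; the class and variable names (DualBranchBody, rdb_factory, fc_layers) and the use of spatial mean pooling to obtain the width-1 feature vector are illustrative assumptions, not details taken from the publication.

```python
import torch
import torch.nn as nn

class DualBranchBody(nn.Module):
    """Sketch: N RDBs per branch, plus one fully connected layer per stage that turns
    low-frequency color features into a channel-wise modulation vector."""
    def __init__(self, rdb_factory, n_stages=3, channels=64):
        super().__init__()
        self.ir_rdbs = nn.ModuleList([rdb_factory() for _ in range(n_stages)])  # image reconstruction branch
        self.ce_rdbs = nn.ModuleList([rdb_factory() for _ in range(n_stages)])  # color enhancement branch
        self.fc_layers = nn.ModuleList([nn.Linear(channels, channels) for _ in range(n_stages)])

    def forward(self, first_feat, second_feat):
        for ir_rdb, ce_rdb, fc in zip(self.ir_rdbs, self.ce_rdbs, self.fc_layers):
            feat_map = ir_rdb(first_feat)                  # feature map from the current RDB of the first branch
            low_freq = ce_rdb(second_feat)                 # low-frequency color feature from the second branch
            vec = fc(low_freq.mean(dim=(2, 3)))            # width-1 feature vector (spatial pooling is an assumption)
            first_feat = feat_map * vec[:, :, None, None]  # channel-wise multiplication: local color adjustment
            second_feat = low_freq                         # low-frequency feature becomes the new second image feature
        return first_feat, second_feat                     # outputs of the two branches after the N-th stage
```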
  • in some embodiments, the RDB includes a densely connected network layer and a local feature fusion layer; the local feature fusion layer is used to add, using local residual learning, the input feature of the RDB and the output feature of the densely connected network layer to obtain the output feature of the RDB; the densely connected network layer includes M dense connection modules and one connection module, where M is an integer greater than or equal to 1;
  • the input of the first dense connection module is the input feature of the RDB;
  • the input features of the i-th dense connection module are: the input feature of the RDB and the output feature of each dense connection module located before the i-th dense connection module, where i is an integer greater than 1 and less than or equal to M;
  • the inputs of the connection module are: the input feature of the RDB and the output feature of each dense connection module.
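  • A minimal sketch of such an RDB follows, assuming 3 × 3 convolutions with ReLU activations inside each dense connection module and a 1 × 1 convolution as the connection module; the channel counts, growth rate and module count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RDB(nn.Module):
    """Residual Dense Block sketch: M densely connected convolution modules, one 1x1
    connection module, and a local residual connection back to the block input."""
    def __init__(self, channels=64, growth=32, m_modules=4):
        super().__init__()
        self.dense_modules = nn.ModuleList()
        in_ch = channels
        for _ in range(m_modules):
            self.dense_modules.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            in_ch += growth                                    # each later module also sees all earlier outputs
        self.connection = nn.Conv2d(in_ch, channels, kernel_size=1)  # connection module: 1x1 local feature fusion

    def forward(self, x):
        feats = [x]                                            # input feature of the RDB
        for module in self.dense_modules:
            feats.append(module(torch.cat(feats, dim=1)))      # dense connection: concat of RDB input + earlier outputs
        fused = self.connection(torch.cat(feats, dim=1))       # fuse all levels of features
        return x + fused                                       # local residual learning: add back the RDB input
```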
  • in some embodiments, each color enhancement model in the color enhancement model pool has the same model structure; the color enhancement models in the color enhancement model pool are trained color enhancement models obtained in advance by training an initial color enhancement model with different color enhancement data pairs for the color styles of different video content types; a color enhancement data pair includes an untoned image block and a toned image block of the same image size selected from the same image position in an untoned image and a first toned image, where the first toned image is a pre-acquired image obtained by toning the untoned image based on image color style characteristics.
  • Training the initial color enhancement model includes: using the initial color enhancement model to process an untoned image block selected from the untoned image to obtain a color adjustment result of the untoned image block, and obtaining from the color enhancement data pair the toned image block corresponding to the untoned image block as a color adjustment reference result; constructing the loss function of the initial color enhancement model based on the color adjustment result and the color adjustment reference result; using the loss function of the initial color enhancement model to adjust the model parameters of the initial color enhancement model to obtain an updated color enhancement model; and taking the updated color enhancement model as a new initial color enhancement model and using the new initial color enhancement model to process the untoned image blocks selected from the untoned image, until the training process of the initial color enhancement model meets a first training end condition, to obtain the trained color enhancement model; wherein the first training end condition includes at least one of the following conditional items: the training loss calculated according to the loss function of the initial color enhancement model is less than or equal to a set first loss threshold, or the number of training times of the initial color enhancement model reaches a set first count threshold.
  • constructing the loss function of the initial color enhancement model based on the color adjustment result and the color adjustment reference result includes: calculating the minimized square error between the color adjustment result and the color adjustment reference result to obtain a minimized square error loss; calculating the Gaussian filter output of the color adjustment result and the Gaussian filter output of the color adjustment reference result to obtain a color loss; and performing a weighted sum of the minimized square error loss and the color loss to generate the loss function of the initial color enhancement model.
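  • A minimal sketch of a loss with this construction is given below; the Gaussian-blur helper, the kernel size, the sigma value and the color-loss weight are illustrative assumptions rather than values prescribed by the publication.

```python
import torch
import torch.nn.functional as F

def gaussian_blur(img, kernel_size=21, sigma=5.0):
    """Depthwise Gaussian filtering of a (B, C, H, W) tensor."""
    half = kernel_size // 2
    coords = torch.arange(kernel_size, dtype=img.dtype, device=img.device) - half
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    kernel2d = torch.outer(g, g)
    kernel = kernel2d.view(1, 1, kernel_size, kernel_size).repeat(img.shape[1], 1, 1, 1)
    return F.conv2d(img, kernel, padding=half, groups=img.shape[1])

def color_enhancement_loss(pred, target, color_weight=0.5):
    mse_loss = F.mse_loss(pred, target)                                   # minimized square error loss
    color_loss = F.mse_loss(gaussian_blur(pred), gaussian_blur(target))   # color loss on Gaussian-filtered outputs
    return mse_loss + color_weight * color_loss                           # weighted sum of the two terms
```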
  • in some embodiments, the content analysis model includes a convolutional network of K stacked convolution structures connected in sequence and two fully connected layers; each of the first K-1 stacked convolution structures includes a convolution layer, an activation layer and a pooling layer, and the K-th stacked convolution structure includes a convolution layer and an activation layer.
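  • A minimal sketch of a content analysis network with this shape follows; the value of K, the channel progression, the spatial pooling before the fully connected layers and the softmax normalization of the output weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContentAnalysisNet(nn.Module):
    """K stacked convolution structures (the first K-1 with max pooling) followed by
    two fully connected layers that output one weight per color enhancement model."""
    def __init__(self, num_models, k_blocks=5, in_ch=3, base_ch=64, fc_dim=1024):
        super().__init__()
        blocks, ch = [], in_ch
        for i in range(k_blocks):
            out_ch = base_ch * (2 ** min(i, 3))
            layers = [nn.Conv2d(ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            if i < k_blocks - 1:
                layers.append(nn.MaxPool2d(2))         # first K-1 structures: conv + activation + pooling
            blocks.append(nn.Sequential(*layers))
            ch = out_ch
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)            # collapse spatial dimensions (assumption)
        self.fc1 = nn.Linear(ch, fc_dim)               # first fully connected layer (e.g. 1 x 1024 vector)
        self.fc2 = nn.Linear(fc_dim, num_models)       # second fully connected layer: one value per pool model

    def forward(self, x):
        feat = self.pool(self.blocks(x)).flatten(1)
        weights = self.fc2(torch.relu(self.fc1(feat)))
        return torch.softmax(weights, dim=1)           # weight value per color enhancement model and frame
```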
  • in some embodiments, the content analysis model is a trained content analysis model obtained in advance by training an initial content analysis model with image data pairs of different styles, based on each color enhancement model in the pre-trained color enhancement model pool; an image data pair includes an untoned image block and a toned image block of the same image size selected from the same image position in an untoned image and a second toned image, where the second toned image is a pre-acquired image obtained by toning the untoned image based on different video content types.
  • the training of the initial content analysis model includes:
  • using each color enhancement model in the pre-trained color enhancement model pool to process an untoned image block selected from the untoned image, to obtain the color adjustment result of each color enhancement model for the untoned image block; using the initial content analysis model to process the untoned image block to obtain a predetermined number of weight values, the predetermined number being the same as the number of color enhancement models in the color enhancement model pool;
  • performing weighted fusion, using the weight values, on the color adjustment results of the color enhancement models for the untoned image block to obtain a color enhancement result of the untoned image block, and obtaining from the image data pair the toned image block corresponding to the untoned image block as a color enhancement reference result; constructing the loss function of the initial content analysis model based on the color enhancement result of the untoned image block and the color enhancement reference result; using the loss function of the initial content analysis model to adjust the model parameters of the content analysis model, so as to obtain an updated content analysis model with the adjusted model parameters; and taking the updated content analysis model as a new initial content analysis model and using the new initial content analysis model to process the untoned image blocks, until the training process of the initial content analysis model satisfies a second training end condition, to obtain the trained content analysis model; wherein the second training end condition includes at least one of the following conditional items: the training loss calculated according to the loss function of the initial content analysis model is less than or equal to a set second loss threshold, or the number of training times of the initial content analysis model reaches a set second count threshold.
  • the loss function of the initial content analysis model is a loss function constructed based on the minimized square error between the color enhancement result of the untoned image block and the color enhancement reference result.
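  • A minimal sketch of one such training step follows: the pre-trained color enhancement models are held fixed, their outputs on the untoned block are fused with the weights predicted by the content analysis model, and the fused result is compared against the toned reference block; the optimizer, tensor shapes and freezing strategy are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def content_analysis_train_step(content_model, color_models, optimizer, untoned_block, toned_block):
    """untoned_block / toned_block: (B, 3, H, W) crops taken from the same image position."""
    with torch.no_grad():                               # the color enhancement models are already trained and frozen
        pool_outputs = torch.stack([m(untoned_block) for m in color_models], dim=1)   # (B, L, 3, H, W)
    weights = content_model(untoned_block)              # (B, L): one weight per color enhancement model
    fused = (weights[:, :, None, None, None] * pool_outputs).sum(dim=1)   # weighted fusion of the pool outputs
    loss = F.mse_loss(fused, toned_block)               # minimized square error vs. the color enhancement reference
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                    # only the content analysis model's parameters are updated
    return loss.item()
```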
  • embodiments of the present disclosure provide a video processing device, including: an acquisition module for acquiring a video frame of a video to be processed; and a processing module for performing color adjustment on the video frame using at least one color enhancement model in a color enhancement model pool to obtain a color-adjusted video output frame; wherein the color enhancement model pool includes a collection of models for color adjustment of videos with different style requirements of multiple video content types.
  • embodiments of the present disclosure provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, any video processing method of the embodiments of the present disclosure is implemented.
  • embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, any video processing method of the embodiments of the present disclosure is implemented.
  • Figure 1 is a flow chart of a video processing method in some embodiments of the present disclosure
  • Figure 2 is a flow chart of color adjustment processing in some embodiments of the present disclosure
  • Figure 3 is a flow chart of color adjustment processing in some embodiments of the present disclosure.
  • Figure 4a is a schematic structural diagram of a video processing system according to some embodiments of the present disclosure.
  • Figure 4b is a schematic structural diagram of a video processing system according to some embodiments of the present disclosure.
  • Figure 5 is a schematic diagram of the specific architecture of the color enhancement model according to an embodiment of the present disclosure.
  • Figure 6 is a specific flow chart of using a color enhancement model to adjust the color of video frames in an embodiment of the present disclosure
  • Figure 7 is a specific flow chart for image reconstruction and color adjustment in an embodiment of the present disclosure.
  • Figure 8 is a specific architecture diagram of the dense residual connection network module in the embodiment of the present disclosure.
  • Figure 9 is a schematic diagram of the specific architecture of the content analysis model in the embodiment of the present disclosure.
  • Figure 10 is a schematic structural diagram of a video processing device according to an embodiment of the present disclosure.
  • FIG. 11 is a block diagram of an electronic device implementing the video processing method according to an embodiment of the present disclosure.
  • examples include HDR high-definition TVs at the display end, networks connected by new-generation broadband mobile communication technology at the transmission end, ultra-high-definition TV stations, and the like.
  • the resolution of the display end may include Standard Definition (SD), High Definition (HD), Full High Definition (FHD), Ultra High Definition (Ultra HD) and other display formats.
  • the standard definition resolution may be 480 × 320 pixels or 640 × 480 pixels
  • the high definition resolution may be 1024 × 720 pixels
  • the full HD resolution may be 1920 × 1080 pixels
  • the ultra-high-definition resolution may be, for example, 4K resolution.
  • the 4K resolution can be, for example, 3840 × 2160 pixels, that is, the entire display screen can display 3840 pixels in the horizontal direction and 2160 pixels in the vertical direction; since ultra-high-definition resolution can reach four times the full HD resolution, ultra-HD resolution is also called 4K resolution.
  • the 8k resolution can be 7680 × 4320 pixels, that is, the entire display screen can display 7680 pixels in the horizontal direction and 4320 pixels in the vertical direction.
  • the 8k resolution may also be named ultra-high definition resolution.
  • the new-generation broadband mobile communication technology connection network may be, for example, a communication network of the fifth-generation mobile communication technology (5th Generation Mobile Communication Technology, 5G) or a higher network standard.
  • the communication network with high-speed, low-latency characteristics provided by the new generation of broadband mobile communication technology provides better support for the popularization of ultra-high-definition video.
  • dynamic range refers to the ratio from the "brightest" to the "darkest" contained in an image, that is, the number of grayscale levels between the "brightest" and the "darkest" in the image.
  • the larger the dynamic range, the richer the levels the image can express and the wider the color space it contains; high dynamic range means that the ratio from "brightest" to "darkest" can reach a very high value (exceeding a predetermined threshold); images with such a high dynamic range of grayscale levels are high dynamic range images, i.e. HDR images.
  • an HDR TV is a TV set that can support HDR technology; the use of HDR technology can improve the rendering of details in the picture, improve picture quality, and maintain the authenticity of image colors.
  • ultra-high-definition equipment platforms are developing rapidly, but the production of ultra-high-definition video content lags behind.
  • the inventory of 4K/8K video sources often cannot meet the demand for ultra-high-definition playback, while there is a large inventory of standard-definition and high-definition videos that cannot be played on ultra-high-definition display platforms; therefore, ultra-high-definition remastering of standard-definition and high-definition film sources is the fastest and most direct way to address the shortage of ultra-high-definition film sources.
  • ultra-high-definition video adopts the color space defined in the TV display standard of the new generation of ultra-high-definition video production and display systems
  • 4K/8K video can adopt the BT.2020 color gamut, which specifies the maximum color space that can be achieved using the three primary colors so far.
  • the color gamut gives a larger space for color creation.
  • current video color creation is often done manually, and ultra-high-definition remastering of low-definition film sources often relies on manual processing and cannot be automated, resulting in long film-source production cycles, high labor costs, and low processing efficiency.
  • Embodiments of the present disclosure provide a video processing method and device, electronic equipment, and computer-readable storage media.
  • the method can be executed by electronic equipment such as terminal equipment or servers.
  • the terminal equipment may include, but is not limited to: personal computers, smart phones, tablet computers, personal digital assistants (Personal Digital Assistant, PDA), servers, etc.; the video processing method can be implemented by the processor calling computer-readable program instructions stored in the memory; alternatively, the method can be executed by the server.
  • FIG. 1 is a flowchart of a video processing method in some embodiments of the present disclosure. As shown in Figure 1, the video processing method includes the following steps.
  • S110 Obtain a video frame of a video to be processed.
  • the execution device can receive the video to be processed through a communication network, read the video to be processed from an internal storage device or an external storage device of the execution device, or download the video to be processed from the Internet through a search engine on the execution device.
  • the video to be processed may also be captured by a shooting device on the execution device, such as a camera.
  • alternatively, the video frame of the video to be processed carried in a video processing instruction can be obtained from the instruction; the video frame can also be obtained in other ways, and the specific manner is not limited here.
  • S120 Use at least one color enhancement model in the color enhancement model pool to perform color adjustment on the video frame to obtain a color-adjusted video output frame.
  • the color enhancement model pool contains a collection of models that perform color adjustment on videos with different style requirements for multiple video content types.
  • the color enhancement model pool may include at least one color enhancement model, and each color enhancement model in the color enhancement model pool may be used to perform color adjustment on video frames with one style requirement of one video content type.
  • video content types may include at least one of the following types: TV series, documentaries, movies, animations, sports, news, dramas, entertainment and education; in practical application scenarios, the video content type can also be other types, which are not specifically limited here.
  • style requirements refer to the requirements for the style features of the video to be processed; for videos of a certain video content type, the video content generally has relatively uniform style features, and the style features may include at least one of the following feature items: style, image quality, tone, brightness.
  • style refers to the overall audio-visual experience and visual impression that watching the video forms for viewers
  • image quality refers to picture quality, including indicators such as clarity, sharpness, lens distortion, dispersion, resolution, color gamut, color purity (or color brightness), and color balance
  • tone refers to the relative lightness and darkness of the colors in the image
  • brightness refers to the brightness of the image.
  • style features of video content may also include other feature items, which are not specifically limited here.
  • in the embodiments of the present disclosure, the color enhancement model pool contains a set of models for color adjustment of videos with the style requirements of multiple video content types; performing color adjustment on the video frame of the obtained video to be processed according to at least one color enhancement model in the model pool can automatically generate a color adjustment effect corresponding to the video content type and style requirement of the video frame.
  • the video processing process can be automated by the execution device calling the model, so that video color creation can be automated and ultra-high-definition remastering of low-definition film sources can be completed automatically and at low cost, shortening the production cycle of film sources, saving labor costs, and improving video processing efficiency.
  • FIG. 2 illustrates a flowchart of color adjustment processing in some embodiments of the present disclosure. As shown in Figure 2, in some embodiments, step S120 may specifically include the following steps.
  • S21 Receive information including video content type and style requirements of the video frame.
  • in step S21, when a video frame of a video to be processed is received, information including the video content type and style requirement of the video frame may also be received, so that a corresponding model can be selected from the color enhancement model pool based on the video content type and style requirement.
  • the execution order of step S21 may be before or after step S110.
  • Step S21 may also be executed synchronously with step S110; that is, when receiving the video frame of the video to be processed, the execution device may synchronously receive the information containing the video content type and style requirement of the video frame.
  • S22 Select, from the color enhancement model pool, a color enhancement model matching the video content type and style requirement. Different video content types may have different style requirements, and the style requirements of the same video content type may also differ; therefore, according to the received information including the video content type and style requirement of the video frame, a color enhancement model matching the video content type and style requirement can be selected from the color enhancement model pool.
  • S23 Process the video frame with the selected color enhancement model to obtain the color-adjusted video output frame. After a color enhancement model matching the video content type and style requirement is selected from the color enhancement model pool, the video frame is processed with the matching color enhancement model to obtain the color-adjusted video output frame.
  • in some embodiments, at least two color enhancement models may be selected; each of the at least two selected color enhancement models is used to process the video frame to obtain the output result of each selected color enhancement model, and the output results of the selected color enhancement models are weighted and fused according to a preset weight value corresponding to each color enhancement model and the video frame, to generate the video output frame.
  • in the embodiments of the present disclosure, information containing the video content type and style requirement of the video frame may be received first; based on this information, a matching color enhancement model is selected from the color enhancement model pool, and the selected color enhancement model is then used to process the video frames of the video to be processed, generating a color adjustment effect corresponding to the received video content type and style requirement; this realizes automatic adjustment of the video to be processed, meets the color style requirements of the corresponding video content type, and improves the data processing efficiency of video color adjustment.
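  • A minimal sketch of this selection step, assuming the pool is held as a dictionary keyed by (video content type, style requirement); the key names, identity stand-ins and helper names are illustrative only.

```python
from typing import Callable, Dict, Tuple

import torch

# Hypothetical pool layout: one trained color enhancement model per
# (video content type, style requirement) pair; identity stubs stand in
# for the trained networks in this sketch.
ColorModel = Callable[[torch.Tensor], torch.Tensor]
color_enhancement_pool: Dict[Tuple[str, str], ColorModel] = {
    ("tv_series", "style_1"): lambda frame: frame,      # stand-in for "TV series style model 1"
    ("documentary", "style_1"): lambda frame: frame,    # stand-in for "documentary style model 1"
}

def color_adjust(video_frame: torch.Tensor, content_type: str, style: str) -> torch.Tensor:
    """Select the color enhancement model matching the received video content type and
    style requirement, then process the frame with it."""
    model = color_enhancement_pool[(content_type, style)]
    return model(video_frame)                            # color-adjusted video output frame

# Example: adjust one RGB frame of a TV-series video with one style requirement.
output_frame = color_adjust(torch.rand(1, 3, 256, 256), "tv_series", "style_1")
```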
  • FIG 3 illustrates a flowchart of color adjustment processing in some embodiments of the present disclosure.
  • step S120 may specifically include the following steps.
  • S31 Use each color enhancement model in the color enhancement model pool to process the video frame of the video to be processed, and obtain the color adjustment result of each color enhancement model for the video frame according to the style requirement of its corresponding video content type.
  • S32 process the video frame based on the content analysis model, and generate a weight value corresponding to each color enhancement model and the video frame; wherein the content analysis model is configured to determine the weight value of each color enhancement model in the color enhancement model pool based on the video frame.
  • in some embodiments, the content analysis model can assign a weight value (also called a weight coefficient) to each color enhancement model in the color enhancement model pool by analyzing the video content type of the video frame, so that the weight value corresponding to each color enhancement model and the video frame output by the content analysis model can be used to weight the output results of the models in the model pool.
  • S33 Use the weight value corresponding to each color enhancement model and the video frame to perform weighted fusion on the color-adjusted video frames output by the color enhancement models, to generate the video output frame.
  • in some embodiments, the weighted fusion process includes: performing a matrix multiplication operation on the weight value corresponding to each color enhancement model and the video frame and the color-adjusted video frame output by the corresponding color enhancement model; and then performing matrix addition on the matrix multiplication results to obtain the video output frame in matrix form.
  • in the embodiments of the present disclosure, the weight value corresponding to each color enhancement model and the video frame output by the content analysis model can be used to assign a weight to each color enhancement model in the color enhancement model pool, and finally to output the fused color adjustment result of the processing results of the color enhancement models in the model pool.
  • this color adjustment result is based on an analysis of the video content type of the video to be processed, yielding a video color adjustment result that is more consistent with the video content type.
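  • A minimal sketch of this inference-time weighted fusion (weight times the corresponding model output, then accumulation), assuming the content analysis model and the pool of color enhancement models are already loaded; names and tensor shapes are illustrative assumptions.

```python
import torch

def fuse_pool_outputs(video_frame, color_models, content_model):
    """Weighted fusion of the color-adjusted frames produced by every model in the pool."""
    adjusted = [m(video_frame) for m in color_models]   # color-adjusted frame from each pool model, (B, 3, H, W)
    weights = content_model(video_frame)                # (B, L): weight per color enhancement model and frame
    output = torch.zeros_like(adjusted[0])
    for i, frame in enumerate(adjusted):
        output = output + weights[:, i].view(-1, 1, 1, 1) * frame   # weight x frame, then matrix addition
    return output                                       # fused, color-adjusted video output frame
```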
  • Figure 4a shows a schematic structural diagram of a video processing system according to some embodiments of the present disclosure
  • Figure 4b shows a schematic structural diagram of a video processing system according to some embodiments of the present disclosure
  • the same numbers in Figures 4a and 4b may represent the same structure.
  • the video processing system includes: video frame input module 41, color enhancement model pool processing module 42, content analysis model processing module 43, weight value output module 44, fusion processing module 45 and video frame output module 46.
  • the video frame input module 41 may be used to obtain input video frames.
  • the execution device may obtain the video frame carried in the received video processing instruction from the instruction.
  • the color enhancement model pool processing module 42 is used to perform color adjustment on the video frame using at least one color enhancement model in the color enhancement model pool to obtain a color-adjusted video output frame.
  • the color enhancement model pool processing module may be referred to as the color enhancement model pool for short.
  • a color enhancement model corresponding to the video content type and style requirements may be selected from the color enhancement model pool.
  • the selected color enhancement model is then used to process the video frame to obtain the color-adjusted video output frame.
  • alternatively, each color enhancement model in the color enhancement model pool can be used to process the video frame to obtain the color-adjusted video frame output by each color enhancement model.
  • the content analysis model processing module 43 is used to process video frames based on the content analysis model and generate a weight value corresponding to each color enhancement model and the video frame.
  • in some embodiments, the weight value corresponding to each color enhancement model in the color enhancement model pool and the video frame can be set in advance; for example, when processing a large batch of videos to be processed of the same video content type and the same style requirement, a weight value can be assigned in advance to each color enhancement model in the color enhancement model pool based on multiple test results or practical experience.
  • when the video frame is processed based on the content analysis model, the weight value corresponding to each color enhancement model and the video frame can be generated automatically, reducing manual involvement; this is conducive to automating video color creation, greatly reducing remastering costs, and improving video processing efficiency.
  • the weight value output module 44 is used to output the weight value corresponding to each color enhancement model and the video frame.
  • the fusion processing module 45 is configured to use the weight value corresponding to each color enhancement model and the video frame to perform weighted fusion processing on the color-adjusted video frames output by each color enhancement model to obtain a fusion result.
  • the video frame output module 46 is used to generate a color-adjusted video output frame according to the fusion result.
  • in the embodiments of the present disclosure, the acquired video frame can be color-adjusted according to at least one color enhancement model in the color enhancement model pool, automatically generating a color adjustment effect corresponding to the video content type and style requirement of the video frame; furthermore, through the weight value corresponding to each color enhancement model and the video frame output by the content analysis model, a weight can be assigned to each color enhancement model in the color enhancement model pool, and the fused color adjustment result of the processing results of the color enhancement models can be output, obtaining a video color adjustment result more in line with the style requirements of the video content type. Automatic adjustment of video color is thus achieved, reducing video remastering costs and improving video processing efficiency.
  • the video processing system is basically the same as the video processing system in Figure 4a, except that Figure 4b shows multiple color enhancement models in the color enhancement model pool.
  • the color enhancement model pool includes at least one group of color enhancement models, and each group of color enhancement models corresponds to a video content type; each group of color enhancement models includes at least one color enhancement model, and each color enhancement model corresponds to A style requirement.
  • the first group of color enhancement models is a group of models corresponding to TV series videos, and the group of models includes N1 models, such as TV series style model 1, TV series style model 2,..., TV series style model N1;
  • Each model in the group of models can correspond to different style requirements of TV series videos;
  • the second group of color enhancement models is a group of models corresponding to documentary videos.
  • This group of models includes N2 models, such as documentary style model 1, Documentary style model 2,..., documentary style model N2, each model in this group of models can correspond to different style requirements of documentary videos;...;
  • the third group of color enhancement models is a group of models corresponding to movie videos; this group includes N3 models, such as movie style model 1, movie style model 2, ..., movie style model N3; each model in this group can correspond to a different style requirement of movie videos.
  • N1, N2 and N3 are all integers greater than or equal to 1.
  • the color enhancement model pool may also include other model groups corresponding to at least one style requirement of other video content types; the specific settings may be based on actual conditions, and are not specifically limited in the embodiments of the present disclosure.
  • Figure 4b shows the specific processing units in the fusion module, where the symbol ⊗ represents the matrix multiplication operation and the symbol ⊕ represents the matrix addition operation.
  • the above-mentioned fusion module can be specifically configured to: perform a matrix multiplication operation on the weight value corresponding to each color enhancement model and the video frame output by the content analysis model and the color-adjusted video frame output by the corresponding color enhancement model; and perform matrix addition on the matrix multiplication results to obtain a matrix addition result, which is used to generate the video output frame.
  • the video processing method can be automatically implemented by the execution device using an artificial intelligence (Artificial Intelligence, AI) model.
  • AI models can be models implemented based on convolutional neural networks, and can be used in image processing fields such as image detection, image classification, and image content analysis. Both the color enhancement model and the content analysis model in the embodiments of the present disclosure are AI models.
  • the models in the color enhancement model pool have the same model structure.
  • the specific structure and working principle of the color enhancement model are described in detail below through Figures 5 to 8.
  • Figure 5 shows a schematic diagram of the specific architecture of a color enhancement model according to an embodiment of the present disclosure.
  • the color enhancement model includes 2N dense residual connection network modules (Residual Dense Block, RDB) 51 and N fully connected (Fully Connected) layers 52, where N is an integer greater than or equal to 1.
  • the color enhancement model includes: at least two dense residual connection network modules (Residual Dense Block, RDB) 51 and at least one fully connected layer 52.
  • the above 2N RDBs can be arranged as two task branch models: an image reconstruction (Image Reconstruction, IR) branch model and a color enhancement (Color Enhancement) branch model, and each task branch model includes N RDBs 51.
  • the color enhancement model may also include: a first convolution layer 53 , a filter unit 54 , a second convolution layer 55 , a third convolution layer 56 , and a fourth convolution layer 57 .
  • Figure 6 shows a specific flow chart for color adjustment of video frames using a color enhancement model in an embodiment of the present disclosure
  • Figure 7 shows a specific flow chart for image reconstruction and color adjustment in an embodiment of the present disclosure.
  • the step of using the selected color enhancement model to process the video frame in step S23 to obtain a color-adjusted video output frame may specifically include the following steps.
  • S601 Perform feature extraction on the video frame to obtain the first image feature.
  • feature extraction can be performed on the video frame input to the color enhancement model through the first convolution layer 53 in the color enhancement model to obtain the first image features.
  • S602 Perform filtering on the video frame, and perform feature extraction on the filtered video frame to obtain second image features.
  • the filtering unit 54 can be used to filter the video frames input to the color enhancement model, and the second convolution layer 55 can be used to extract features of the filtered video frames to obtain second image features.
  • the filtering unit 54 may be one of a Gaussian filtering unit, a mean filtering unit, and a median filtering unit.
  • the Gaussian filter unit is used to perform Gaussian filtering on video frames
  • the mean filtering unit is used to perform mean filtering on video frames
  • the median filtering unit is used to perform median filtering on video frames.
  • Gaussian filtering is a linear smoothing filter, suitable for eliminating Gaussian noise; a Gaussian smoothing filter is very effective at suppressing noise that follows a normal distribution. Mean filtering takes the pixel mean of the local image information around an image pixel in the video frame as the processed pixel value of that pixel; the filtering algorithm is simple and the computation is fast. Median filtering sorts the grayscale values of the image pixels in the video frame from large to small and uses the middle grayscale value as the grayscale value of the pixel; in some scenes, median filtering helps better preserve the boundaries of the image when denoising.
  • filtering the video frame through the filtering unit is not only beneficial to removing the noise of the video frame and achieving smooth transitions in the image of the video frame, but is also beneficial to enhancing the low-frequency features of the image in the video frame and obtaining the low-frequency information of the image in the video frame.
  • if the pixel values of adjacent pixels in an area of the image differ little (not exceeding a pixel value threshold), the information in that area is low-frequency information; if the difference in pixel values of adjacent pixels in an area of the image is large (exceeding the pixel value threshold), the information in that area is high-frequency information.
  • the low-frequency information can be the color information, grayscale information, etc. of the image
  • the high-frequency information can refer to the texture information, edge information, etc. of the image; after filtering by the filtering unit, subsequent extraction of image features is less affected by high-frequency information such as image noise and image texture, thus retaining more low-frequency color information (also called low-frequency color features).
  • the filtering unit can also be other types of filtering units, used to perform other types of filtering processing on video frames; specifically, the type of filtering unit can be selected according to actual needs, and is not specifically limited in this embodiment of the disclosure.
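  • A minimal sketch of the three filter-unit options named above, using OpenCV purely as an illustrative implementation (the publication does not prescribe a library); the 21 × 21 kernel size matches the example given later for the filter unit.

```python
import cv2
import numpy as np

def extract_low_frequency(frame_bgr: np.ndarray, mode: str = "gaussian", ksize: int = 21) -> np.ndarray:
    """Suppress high-frequency noise/texture and keep the low-frequency color information."""
    if mode == "gaussian":
        return cv2.GaussianBlur(frame_bgr, (ksize, ksize), 0)   # linear smoothing, good for Gaussian noise
    if mode == "mean":
        return cv2.blur(frame_bgr, (ksize, ksize))              # local pixel mean, simple and fast
    if mode == "median":
        return cv2.medianBlur(frame_bgr, ksize)                 # middle grayscale value, preserves boundaries
    raise ValueError(f"unknown filter mode: {mode}")

# Example: low-frequency version of one frame, as input to the color enhancement branch.
low_freq = extract_low_frequency(np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8))
```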
  • Image reconstruction is a technology that processes existing low-resolution (LR) images and converts them into high-resolution (High-resolution, HR) images.
  • S603 Input the first image feature into the image reconstruction branch model and the second image feature into the color enhancement branch model, so as to perform image reconstruction and color adjustment according to the first image feature and the second image feature.
  • the image reconstruction branch model extracts features from the first image feature of the video frame through its RDBs and performs coarse-precision reconstruction of the output image; coarse-precision reconstruction takes the video frame of the video to be processed as the initial image, and the feature information of the initial image is obtained after feature extraction by the convolution layer.
  • the RDB is used to extract the feature information of the initial image.
  • the image reconstructed based on the feature information of the initial image can be called a coarse-precision reconstructed image.
  • in the color enhancement branch model, the second image features (i.e., the low-frequency color features obtained after the filtering process of the filtering unit) are analyzed and extracted, so that subsequent color adjustment can be performed based on the analyzed and extracted features.
  • S604 Perform feature fusion processing on the output results of the image reconstruction branch model and the output results of the color enhancement branch model to obtain a fusion result.
  • in some embodiments, the output results of the image reconstruction branch model and the output results of the color enhancement branch model are multiplied over corresponding feature channels, and the feature-channel multiplication results are then added over feature channels to obtain the fusion result.
  • S605 Generate a color-adjusted video output frame according to the fusion result.
  • in some embodiments, the third convolution layer 56 is used to perform feature extraction on the output results of the image reconstruction branch model, and the fourth convolution layer 57 is used to perform feature extraction on the output results of the color enhancement branch model; the feature extraction result of the third convolution layer 56 and the feature extraction result of the fourth convolution layer 57 are multiplied over the corresponding feature channels, fusing the features of the two task branch models to obtain the color adjustment output result of the color enhancement model.
  • in the embodiments of the present disclosure, when the color enhancement model is used to process video frames, the input video frames can be image-reconstructed and color-adjusted through the image reconstruction branch model and the color enhancement branch model, and the output result of the image reconstruction branch model and the output result of the color enhancement branch model are subjected to feature fusion to obtain the color adjustment output result of the color enhancement model.
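  • A minimal sketch of the overall forward pass described in steps S601 to S605 follows; the mean-filter stand-in for the filter unit, the 3 × 3 kernels, the 64-channel width and the 21 × 21 filter size follow the examples given below, while the placeholder branch body and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ColorEnhancementModel(nn.Module):
    """Sketch of S601-S605: conv feature extraction, filtered low-frequency path,
    dual-branch processing, and channel-wise fusion back to an RGB output frame."""
    def __init__(self, branch_body: nn.Module, channels: int = 64, blur_kernel: int = 21):
        super().__init__()
        self.first_conv = nn.Conv2d(3, channels, 3, padding=1)      # S601: first image feature
        self.filter_unit = nn.AvgPool2d(blur_kernel, stride=1, padding=blur_kernel // 2)  # mean-filter stand-in
        self.second_conv = nn.Conv2d(3, channels, 3, padding=1)     # S602: second (low-frequency) image feature
        self.branch_body = branch_body                               # S603: IR branch + color enhancement branch
        self.third_conv = nn.Conv2d(channels, 3, 3, padding=1)      # feature extraction on the IR branch output
        self.fourth_conv = nn.Conv2d(channels, 3, 3, padding=1)     # feature extraction on the CE branch output

    def forward(self, frame):
        first_feat = self.first_conv(frame)
        second_feat = self.second_conv(self.filter_unit(frame))
        ir_out, ce_out = self.branch_body(first_feat, second_feat)
        return self.third_conv(ir_out) * self.fourth_conv(ce_out)   # S604/S605: channel-wise fusion -> output frame

class IdentityBranches(nn.Module):
    """Placeholder for the two RDB branches, so the sketch runs end to end."""
    def forward(self, first_feat, second_feat):
        return first_feat, second_feat

out_frame = ColorEnhancementModel(IdentityBranches())(torch.rand(1, 3, 256, 256))
```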
  • the size (ie, width and height) of the convolution kernel in each RDB51 is the same.
  • for example, the size of the convolution kernel in each RDB 51 is 3 × 3, the number of output feature channels is 64, and the size of the filter kernel in the filter unit 54 is 21 × 21.
  • in some embodiments, the size of the first convolution layer 53 is the same as the size of the second convolution layer 55, for example both are 3 × 64; the size of the third convolution layer 56 is the same as the size of the fourth convolution layer 57, for example both are 64 × 3.
  • the number and size of the convolution kernels used to extract features, the number and size of the convolution kernels in each RDB 51, and the number and size of the filter kernels in the filter unit 54 can all be set according to actual requirements and are not specifically limited in the embodiments of this disclosure.
  • N dense residual connection network modules RDB can be included in each branch model.
  • the color enhancement model also includes N fully connected layers, where N is an integer greater than or equal to 1; the image reconstruction and color adjustment processing based on the model architecture of the two task branch models is described below through specific embodiments.
  • as shown in Figure 7, the step of inputting the first image feature into the image reconstruction branch model and the second image feature into the color enhancement branch model, so as to perform image reconstruction and color adjustment based on the first image feature and the second image feature, may specifically include the following steps.
  • S701 Obtain the first RDB in the image reconstruction branch model and the first RDB in the color enhancement branch model as the current RDB of the first branch model and the current RDB of the second branch model.
  • S702 Process the first image feature through the current RDB of the first branch model and the second image feature through the current RDB of the second branch model, to obtain the feature map output by the current RDB of the first branch model and the low-frequency color feature output by the current RDB of the second branch model.
  • S703 Process the low-frequency color feature through a fully connected layer to obtain a feature vector, and multiply the feature vector channel-wise with the feature map output by the current RDB of the first branch model to obtain a feature map adjusted by local color features.
  • in some embodiments, image feature extraction can be performed through the current RDB in the image reconstruction branch model (for example, with a 3 × 3 convolution kernel and 64 output feature channels); the current RDB in the color enhancement branch model (for example, with a 3 × 3 convolution kernel and 64 output feature channels) analyzes and extracts the low-frequency color features of the image, and the extracted low-frequency color features of each layer are converted through a fully connected layer into a feature vector of width 1 (for example, a feature vector of size 1 × 64); each element of the feature vector output by the fully connected layer is multiplied with the corresponding feature channel of the feature map output by the current RDB in the image reconstruction branch model, which achieves fine adjustment of local color features.
  • S704 Use the feature map adjusted by the local color features as the new first image feature, and use the low-frequency color feature output by the current RDB of the second branch model as the new second image feature.
  • after the low-frequency color features extracted by the last RDB in the color enhancement branch model are obtained, these low-frequency color features are converted into a feature vector of width 1 through the last fully connected layer, and each element of the feature vector output by the last fully connected layer is then multiplied with the corresponding feature channel of the feature map output by the last RDB in the image reconstruction branch model, to obtain the output result of the image reconstruction branch model in the color enhancement model; the low-frequency color features extracted by the last RDB in the color enhancement branch model are used as the output result of the color enhancement branch model in the color enhancement model; feature fusion is then performed on the output result of the image reconstruction branch model and the output result of the color enhancement branch model to obtain the color adjustment output result of the color enhancement model, which achieves global color style enhancement.
  • FIG 8 shows a specific architecture diagram of the dense residual connection network module in the embodiment of the present disclosure.
  • each RDB includes a dense connection network layer (Dense Net) 511 and a local feature fusion layer 512.
  • the local feature fusion layer 512 is used to perform feature fusion processing on the input features of the RDB and the output features of the densely connected network layer 511 using local residual learning to obtain the output features of the RDB.
  • the densely connected network layer 511 includes M dense connection modules 5111, where M is an integer greater than or equal to 1; the input of the first dense connection module 5111 is the input feature of the RDB; the input features of the i-th dense connection module 5111 are the input feature of the RDB and the output features of each dense connection module 5111 located before the i-th dense connection module 5111, where i is an integer greater than 1 and less than or equal to M.
  • in some embodiments, the inputs of the local feature fusion layer 512 include the input feature of the RDB (which may be denoted F_{d-1}) and the output features of each dense connection module 5111 (which may be denoted F_{d,1}, ..., F_{d,C}); the local feature fusion layer 512 may include a merging (Concat) layer and a convolution layer with a size of 1 × 1, where the merging layer fuses the features input to it by concatenation and the 1 × 1 convolution layer adaptively fuses this series of features from different levels to generate the output features of local feature fusion (which may be denoted F_{d,LF}); the output features of local feature fusion are added channel-wise to the input feature of the RDB to generate the global feature fusion result (which may be denoted F_d), and the global feature fusion result is used as the output feature of the RDB.
  • the RDB is a network structure that combines a residual network with a densely connected network; under the joint action of dense connections, feature fusion and residual learning, information sharing is achieved by retaining low-level information, feature fusion adaptively combines the information obtained from different convolution layers, and the structure performs well for image processing tasks. Specifically, the middle layers of the RDB can use the dense connection mode of the dense connection modules, and the connection module of the last layer outputs the processing result of the RDB through a residual skip connection.
  • each dense connection module in the RDB can transfer the feature information that needs to be saved to the subsequent dense connection modules.
  • this network structure can fully ensure the transfer and flow of feature information between layers; while achieving feature reuse, it reduces the number of parameters, and even when the network has a large number of parameters, a good processing speed can be guaranteed.
  • in some embodiments, each color enhancement model in the color enhancement model pool has the same model structure; the color enhancement models in the color enhancement model pool are trained color enhancement models obtained in advance by training an initial color enhancement model with different color enhancement data pairs for the color styles of different video content types; a color enhancement data pair includes an untoned image block and a toned image block of the same image size selected from the same image position in an untoned image and a first toned image, where the first toned image is a pre-acquired image obtained by toning the untoned image based on image color style characteristics.
  • in some embodiments, the initial color enhancement model and each color enhancement model in the color enhancement model pool have the same model structure; the initial values of the model parameters of the initial color enhancement model can be random values of the model parameters or empirical values of the model parameters; the first toned image can be a pre-acquired image that has been color adjusted by a colorist; the training data of the color enhancement network model includes image blocks of the same image size selected from the same image position in the untoned image and the first toned image. For example, during the training of each color enhancement model, an image block of size 256 × 256 at a certain position in the image can be randomly extracted and input into the network for training.
  • the size of the image blocks required during the training process can be selected according to actual needs, and is not specifically limited in the embodiment of the present disclosure.
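  • A minimal sketch of assembling one such color enhancement data pair: a randomly positioned 256 × 256 block is cropped from the same image position in the untoned image and the first toned image; the array layout and function name are illustrative assumptions.

```python
import numpy as np

def sample_training_pair(untoned: np.ndarray, toned: np.ndarray, block: int = 256):
    """untoned / toned: (H, W, 3) images of the same size, where `toned` has been graded
    for one video content type and style; returns blocks from the same random position."""
    assert untoned.shape == toned.shape
    h, w = untoned.shape[:2]
    top = np.random.randint(0, h - block + 1)
    left = np.random.randint(0, w - block + 1)
    untoned_block = untoned[top:top + block, left:left + block]
    toned_block = toned[top:top + block, left:left + block]   # color adjustment reference for this block
    return untoned_block, toned_block
```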
  • in some embodiments, training the initial color enhancement model may include: using the initial color enhancement model to process an untoned image block selected from an untoned image to obtain a color adjustment result of the untoned image block, and obtaining from the color enhancement data pair the toned image block corresponding to the untoned image block as the color adjustment reference result; constructing the loss function of the initial color enhancement model based on the color adjustment result and the color adjustment reference result; using the loss function of the initial color enhancement model to adjust the model parameters of the initial color enhancement model to obtain an updated color enhancement model; and taking the updated color enhancement model as a new initial color enhancement model and using the new initial color enhancement model to process the untoned image blocks selected from the untoned image, until the training process of the initial color enhancement model meets the first training end condition, to obtain the trained color enhancement model.
  • the first training end condition includes at least one of the following conditions: the training loss calculated according to the loss function of the initial color enhancement model is less than or equal to the set first loss threshold, or the number of training times of the initial color enhancement model reaches a set first count threshold.
  • in some embodiments, the loss function can be used to estimate the gap between the result of each round of model training and the training target; for the initial color enhancement model, the result of each round of training is the color adjustment result of the untoned image block output by the initial color enhancement model, and the training target is the corresponding color adjustment reference result; the first loss threshold and the first count threshold can be customized according to actual training requirements.
  • in some embodiments, the step of constructing a loss function of the initial color enhancement model based on the color adjustment result and the color adjustment reference result may specifically include: calculating the minimized square error between the color adjustment result and the color adjustment reference result to obtain the minimized square error loss; calculating the Gaussian filter output of the color adjustment result and the Gaussian filter output of the color adjustment reference result to obtain the color loss; and performing a weighted sum of the minimized square error loss and the color loss to generate the loss function of the initial color enhancement model.
  • in some embodiments, the loss function of the initial color enhancement model can be expressed as the following expression (1):
  • Loss = ||I_out - I_gt1||_2 + a · ||Gauss(I_out) - Gauss(I_gt1)||_2    (1)
  • where I_out is the color adjustment result of each round of model training, I_gt1 is the color adjustment reference result, Gauss() is the function that calculates the Gaussian filter output (the size of its filter kernel is the filter kernel size of the filter unit in the trained color enhancement model), ||Gauss(I_out) - Gauss(I_gt1)||_2 is the L2 norm of the difference between the Gaussian filter output of the color adjustment result of each round of training and the Gaussian filter output of the color adjustment reference result, i.e. the color loss of each round of training, ||I_out - I_gt1||_2 is the L2 loss, and a is the weighting coefficient of the color loss.
  • a network optimizer such as the Adam optimizer can be used to ensure that the output result (color adjustment result) of each model training is closer to the target of model training (color adjustment reference result).
  • different color enhancement data pairs can be used to train color enhancement models for the style requirements of different video content types, so that video frames are color adjusted according to those different style requirements; a collection of models with different color grading effects is thereby generated to form the color enhancement model pool, which can then be used to perform automatic color adjustment on the video frames of the video to be processed, improving color adjustment efficiency.
  • Figure 9 shows a schematic diagram of the specific architecture of the content analysis model in the embodiment of the present disclosure.
  • the content analysis model includes: a convolutional network of K stacked convolution structures 91 connected in sequence and two fully connected layers.
  • each of the first K-1 stacked convolution structures 91 includes a convolution layer 911, an activation layer (Rectified Linear Unit, ReLU) 912 and a pooling layer (Max Pooling) 913
  • the Kth stacked convolution structure 92 includes a convolution layer 911 and an activation layer 912; the two fully connected layers are the fully connected layer 93 and the fully connected layer 94.
  • the activation layer 912 can be used to perform nonlinear mapping of the corresponding convolution layer output results to speed up the convergence speed of the model;
  • the pooling layer 913 can be used to perform image downsampling (Under Sampling). Downsampling can reduce image feature dimensions, improve accuracy and avoid overfitting.
  • as an example, the size of the input image is 256 × 256 × 3 (height, width and number of channels), and in each stacked convolution structure 91 of the content analysis model the size of the convolution kernel may be 3 × 3; taking K = 5 as an example, for the five sequentially connected stacked convolution structures 91, the first stacked convolution structure 91 may scale the input to a feature map of size 128 × 128 × 32, the second to 64 × 64 × 64, the third to 32 × 32 × 128, the fourth to 16 × 16 × 256, and the fifth to 8 × 8 × 512.
  • the fully connected layer 93 converts the output of the last stacked convolution structure 91 into a one-dimensional vector of a specified dimension (for example, 1 × 1024), and the fully connected layer 94 converts that one-dimensional vector into a one-dimensional vector of dimension L, where L is the total number of models in the color enhancement model pool.
  • the color enhancement model pool in Figure 9 has the same model structure as the color enhancement model pool described in conjunction with Figures 4a and 4b in the above embodiments.
  • the stacked convolution network structure in the content analysis model can analyze and extract image features by stacking convolution layers; feature extraction is performed through the convolution layers, the output of the final convolution layer is converted by the two fully connected layers into L weight values, where L is an integer greater than or equal to 1, each weight value is multiplied by the output of the corresponding color enhancement model, and the multiplication results are then added together along the corresponding feature channels to generate the final color enhancement result; in this way, by analyzing the content of the video frame, different weights are assigned to different color enhancement models to achieve content-adaptive color adjustment, as shown in the sketch below.
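  • A minimal PyTorch-style sketch of such a content analysis network and of the weighted fusion is shown below (class and variable names, the pooling before the first fully connected layer, and the exact downsampling schedule are illustrative assumptions, not part of the described embodiments):

```python
import torch
import torch.nn as nn

class ContentAnalysisNet(nn.Module):
    """K = 5 stacked conv structures (conv 3x3 + ReLU, max pooling in the first K-1),
    followed by two fully connected layers that output one weight per color enhancement model."""
    def __init__(self, num_models: int):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 512]
        layers = []
        for i in range(5):
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1), nn.ReLU(inplace=True)]
            if i < 4:
                layers.append(nn.MaxPool2d(2))      # downsampling in the first K-1 structures
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(8)          # assumption: reduce to 8 x 8 before the FC layers
        self.fc1 = nn.Linear(512 * 8 * 8, 1024)      # 1 x 1024 intermediate vector
        self.fc2 = nn.Linear(1024, num_models)       # L weight values

    def forward(self, x):
        f = self.pool(self.features(x)).flatten(1)
        return self.fc2(self.fc1(f))

# Weighted fusion of the pool outputs at inference time (sketch):
# weights = analysis_net(frame)                                  # (B, L)
# outs = torch.stack([m(frame) for m in model_pool], dim=1)      # (B, L, 3, H, W)
# fused = (weights[:, :, None, None, None] * outs).sum(dim=1)    # color-adjusted output frame
```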
  • the content analysis model is a trained content analysis model obtained by training an initial content analysis model with image data pairs of different styles, based on each color enhancement model in the pre-trained color enhancement model pool; the image data pair includes: an untoned image block and a toned image block of the same image size selected from the same image position in the untoned image and the second toned image, where the second toned image is a pre-acquired image obtained by color grading the untoned image according to different video content types.
  • the model structure of the initial content analysis model is the same as that of the content analysis model;
  • the initial values of the model parameters of the initial content analysis model can be random values or empirical values of the model parameters;
  • the second toned image may be a pre-acquired image that has been color-adjusted by a colorist; the second toned image is different from the first toned image in the above embodiment.
  • the training data of the content analysis model includes: image blocks of the same image size selected from the same image position in the untoned image and the second toned image.
  • for example, during the training of each color enhancement model, an image block of size 256 × 256 at a random position in the image can be extracted and input into the network for training; random extraction helps ensure that the model does not over-fit to a particular style.
  • the size of the image blocks required during the training process can be selected according to actual needs, and is not specifically limited in the embodiment of the present disclosure.
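  • Purely as an illustration of this patch-sampling step (the function and its parameter names are assumptions, not part of the described embodiments), matched 256 × 256 blocks can be cut from the same position of an untoned/toned image pair:

```python
import random
import torch

def sample_paired_patch(untoned: torch.Tensor, toned: torch.Tensor, size: int = 256):
    """Cut a size x size block from the same random position of two aligned (C, H, W) images."""
    _, h, w = untoned.shape
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    crop = lambda img: img[:, top:top + size, left:left + size]
    return crop(untoned), crop(toned)
```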
  • training the initial content analysis model includes: using each color enhancement model in the pre-trained color enhancement model pool to process the untoned image block selected from the untoned image, obtaining each color enhancement model's color adjustment result for the untoned image block; using the initial content analysis model to process the untoned image block to obtain a predetermined number of weight values, the predetermined number being the same as the number of color enhancement models in the color enhancement model pool; performing, according to the predetermined number of weight values, weighted fusion of the color adjustment results of the untoned image block produced by each color enhancement model to obtain the color enhancement result of the untoned image block, and obtaining from the image data pair the toned image block corresponding to the untoned image block as the color enhancement reference result; constructing the loss function of the initial content analysis model based on the color enhancement result of the untoned image block and the color enhancement reference result; adjusting the model parameters of the initial content analysis model using this loss function to obtain an updated content analysis model; and taking the updated content analysis model as a new initial content analysis model and processing the untoned image block with it, until the training process of the initial content analysis model satisfies the second training end condition, thereby obtaining the trained content analysis model.
  • the second training end condition includes at least one of the following conditions: the training loss calculated according to the loss function of the initial content analysis model is less than or equal to a set second loss threshold, or the number of training iterations of the initial content analysis model reaches a set second count threshold.
  • for the initial content analysis model, the result of each training iteration is the color enhancement result of the untoned image block output via the content analysis model, and the training target is the corresponding color enhancement reference result; the loss function of the initial content analysis model is constructed from the result of each training iteration and the training target; it should be understood that the second loss threshold and the second count threshold can be customized according to actual training requirements.
  • the loss function of the initial content analysis model is a loss function constructed based on the minimized square error of the color enhancement result of the untoned image block and the color enhancement reference result.
  • the loss function of the initial content analysis model can be expressed as the following expression (2):
  • Loss = ||I_out - I_gt1||_2^2    (2)
  • where I_out is the color enhancement result and I_gt1 is the color enhancement reference result; ||I_out - I_gt1||_2^2 is the L2 norm of the difference between the color enhancement result and the color enhancement reference result, that is, the L2 loss of the image features for each training iteration of the initial content analysis model.
  • a network optimizer such as the Adam optimizer can also be used to ensure that the output result of each model training (color enhancement result) is closer to the target of model training (color enhancement reference result) .
  • during the training of the color enhancement models and of the content analysis model, the learning rate of model training can be set; the learning rate determines whether, and how quickly, the loss function converges to a local minimum; as an example, the learning rate can be the same for the training of the color enhancement model and the training of the content analysis model, for example 1e-4 for both; in other examples, the learning rates of the two models during training can differ, and the specific value of the learning rate can be set according to the actual situation.
  • different color enhancement data pairs can be used to train the initial content analysis model according to the style requirements of different video content types; the trained content analysis model can then output, for each color enhancement model in the color enhancement model pool, a weight value corresponding to the video frame, and these weight values are combined with the output results of the corresponding color enhancement models through weighted fusion to achieve adaptive color style enhancement of the video frame and improve video color enhancement efficiency; a sketch of this training procedure follows.
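  • A compressed sketch of this training setup is given below (frozen pre-trained color enhancement models, a trainable content analysis network, the minimized squared error of expression (2), and Adam with the 1e-4 learning rate mentioned above); all names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def train_content_analysis(analysis_net, model_pool, loader, epochs=1, lr=1e-4):
    """loader yields (untoned_patch, toned_patch) pairs of shape (B, 3, 256, 256)."""
    for m in model_pool:                      # the color enhancement models are pre-trained
        m.eval()                              # and kept frozen here
    opt = torch.optim.Adam(analysis_net.parameters(), lr=lr)
    for _ in range(epochs):
        for untoned, toned in loader:
            with torch.no_grad():
                outs = torch.stack([m(untoned) for m in model_pool], dim=1)  # (B, L, 3, H, W)
            w = analysis_net(untoned)                                        # (B, L) weight values
            fused = (w[:, :, None, None, None] * outs).sum(dim=1)            # weighted fusion
            loss = F.mse_loss(fused, toned)    # expression (2): minimized squared error
            opt.zero_grad(); loss.backward(); opt.step()
    return analysis_net
```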
  • the present disclosure also provides video processing devices, electronic equipment, and computer-readable storage media, all of which can be used to implement any video processing method provided by the present disclosure.
  • Embodiments of the present disclosure provide a video processing device, which is a corresponding device for implementing the video processing method provided by the above embodiments of the present disclosure.
  • the device can be implemented in software and/or hardware, and can generally be integrated into electronic equipment.
  • Figure 10 is a schematic structural diagram of a video processing device according to an embodiment of the present disclosure.
  • a video processing device 1000 provided by an embodiment of the present disclosure includes the following modules.
  • the acquisition module 1010 is used to acquire video frames to be processed.
  • the processing module 1020 is configured to perform color adjustment on the video frame using at least one color enhancement model in the color enhancement model pool to obtain a color-adjusted video output frame.
  • the color enhancement model pool contains a collection of models that perform color adjustment on videos with different style requirements for multiple video content types.
  • the acquired video frame of the video to be processed is color adjusted according to at least one color enhancement model in the model pool, which can automatically generate a color grading effect corresponding to the video content type and style requirements of that video frame; the video processing process can be automated by the execution device calling the model, so that video color creation can be automated, labor costs can be saved, and video processing efficiency can be improved.
  • the processing module 1020 may specifically include: a receiving unit, configured to receive information including the video content type and style requirements of the video frame; a selecting unit, configured to select, from the color enhancement model pool, the color enhancement model corresponding to the video content type and style requirements; and a model processing unit, configured to process the video frame using the selected color enhancement model to obtain a color-adjusted video output frame.
  • the processing module 1020 may specifically include: a color adjustment unit, configured to process the video frame using each color enhancement model in the color enhancement model pool to obtain the color-adjusted video frames output by each color enhancement model; a content analysis unit, configured to process the video frame based on the content analysis model and generate a weight value corresponding to each color enhancement model and the video frame, wherein the content analysis model is configured to determine the weight value of each color enhancement model in the color enhancement model pool according to the video frame; and a fusion processing unit, configured to use the weight value corresponding to each color enhancement model and the video frame to perform weighted fusion processing on the color-adjusted video frames output by each color enhancement model to generate the video output frame.
  • the color enhancement model includes: an image reconstruction branch model and a color enhancement branch model; the model processing unit may specifically include: an extraction subunit, used to extract features of the video frame to obtain first image features; a filter subunit, used to filter the video frame and extract features from the filtered video frame to obtain second image features; a branch model processing subunit, used to input the first image features into the image reconstruction branch model and the second image features into the color enhancement branch model, so as to perform image reconstruction and color adjustment according to the first image features and the second image features; a result fusion subunit, used to perform feature fusion processing on the output result of the image reconstruction branch model and the output result of the color enhancement branch model to obtain a fusion result; and a result generation subunit, used to generate the color-adjusted video output frame based on the fusion result.
  • both the image reconstruction branch model and the color enhancement branch model include N dense residual connection network modules (RDB); the color enhancement model also includes N fully connected layers, where N is an integer greater than or equal to 1; the branch model processing subunit is specifically used to: obtain the first RDB in the image reconstruction branch model and the first RDB in the color enhancement branch model as the current RDB of the first branch model and the current RDB of the second branch model; process the first image features through the current RDB of the first branch model and the second image features through the current RDB of the second branch model, obtaining the feature map output by the current RDB of the first branch model and the low-frequency color features output by the current RDB of the second branch model; process the low-frequency color features output by the current RDB of the second branch model through one fully connected layer to obtain a feature vector, and multiply the feature vector with the feature map output by the current RDB of the first branch model to obtain a feature map adjusted by local color features; use the feature map adjusted by local color features as the new first image features and the low-frequency color features output by the current RDB of the second branch model as the new second image features; obtain the next RDB in the image reconstruction branch model and the next RDB in the color enhancement branch model as the new current RDB of the first branch model and the new current RDB of the second branch model, until the number of acquisitions equals N; then use the newly obtained feature map adjusted by local color features as the output result of the image reconstruction branch model and the newly obtained low-frequency color features as the output result of the color enhancement branch model; a sketch of this two-branch iteration is given after this paragraph.
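  • The following is a minimal sketch of that two-branch iteration (how the low-frequency feature map is reduced to a width-1 vector before the fully connected layer is not fixed here, so global average pooling is assumed; module and variable names are illustrative):

```python
import torch

def two_branch_forward(ir_rdbs, ce_rdbs, fcs, feat_ir, feat_ce):
    """Iterate N paired RDBs: the color enhancement branch modulates the image
    reconstruction branch channel-wise via a fully connected layer.
    ir_rdbs, ce_rdbs, fcs are lists of length N; feat_ir / feat_ce are (B, C, H, W) tensors."""
    for rdb_ir, rdb_ce, fc in zip(ir_rdbs, ce_rdbs, fcs):
        fmap = rdb_ir(feat_ir)                     # feature map of the image reconstruction branch
        low_freq = rdb_ce(feat_ce)                 # low-frequency color features
        vec = fc(low_freq.mean(dim=(2, 3)))        # (B, C) modulation vector; pooling is an assumption
        feat_ir = fmap * vec[:, :, None, None]     # feature map adjusted by local color features
        feat_ce = low_freq
    return feat_ir, feat_ce                        # branch outputs to be fused downstream
```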
  • the RDB includes a densely connected network layer and a local feature fusion layer; the local feature fusion layer is used to add, using local residual learning, the input features of the RDB and the output features of the densely connected network layer to obtain the output features of the RDB; the densely connected layer includes M densely connected modules and one connection module, where M is an integer greater than or equal to 1; the input of the first densely connected module is the input features of the RDB; the input features of the i-th densely connected module are the input features of the RDB and the output features of every densely connected module located before the i-th densely connected module, where i is an integer greater than 1 and less than or equal to M; the input of the connection module is the input features of the RDB and the output features of every densely connected module; an illustrative RDB sketch follows.
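  • For illustration only, a residual dense block along these lines might look as follows (the channel count, growth rate and the 1 × 1 fusion convolution are common residual-dense-block choices assumed here):

```python
import torch
import torch.nn as nn

class RDB(nn.Module):
    """Residual Dense Block: M densely connected conv modules, a 1x1 fusion
    (connection) module, and a local residual skip from input to output."""
    def __init__(self, channels: int = 64, growth: int = 32, num_dense: int = 3):
        super().__init__()
        self.dense = nn.ModuleList()
        in_ch = channels
        for _ in range(num_dense):
            self.dense.append(nn.Sequential(nn.Conv2d(in_ch, growth, 3, padding=1),
                                            nn.ReLU(inplace=True)))
            in_ch += growth                         # each module also sees all earlier outputs
        self.fuse = nn.Conv2d(in_ch, channels, 1)   # 1x1 local feature fusion (connection module)

    def forward(self, x):
        feats = [x]
        for block in self.dense:
            feats.append(block(torch.cat(feats, dim=1)))
        fused = self.fuse(torch.cat(feats, dim=1))
        return x + fused                            # local residual learning
```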
  • each color enhancement model in the color enhancement model pool has the same model structure; the color enhancement models in the color enhancement model pool are trained color enhancement models obtained in advance by training an initial color enhancement model with different color enhancement data pairs for the color styles of different video content types; the color enhancement data pair includes: an untoned image block and a toned image block of the same image size selected from the same image position in the untoned image and the first toned image, where the first toned image is a pre-acquired image obtained by toning the untoned image based on image color style characteristics.
  • the video processing device 1000 further includes a color enhancement model training module for training an initial color enhancement model;
  • the color enhancement model training module includes: a first adjustment result obtaining unit, used to process, with the initial color enhancement model, the untoned image block selected from the untoned image to obtain the color adjustment result of the untoned image block; a first reference result acquisition unit, used to obtain from the color enhancement data pair the toned image block corresponding to the untoned image block as the color adjustment reference result; a first loss function construction unit, used to construct the loss function of the initial color enhancement model based on the color adjustment result and the color adjustment reference result; a first parameter adjustment unit, used to adjust the model parameters of the initial color enhancement model using the loss function of the initial color enhancement model to obtain an updated color enhancement model; and a first iterative training unit, used to take the updated color enhancement model as a new initial color enhancement model and process, with the new initial color enhancement model, the untoned image blocks selected from the untoned image, until the training process of the initial color enhancement model meets the first training end condition, obtaining the trained color enhancement model; the first training end condition includes at least one of the following: the training loss calculated according to the loss function of the initial color enhancement model is less than or equal to a set first loss threshold, or the number of training iterations of the initial color enhancement model reaches a set first count threshold.
  • the color enhancement model training module, when used to construct the loss function of the initial color enhancement model based on the color adjustment result and the color adjustment reference result, is specifically used to: calculate the minimized squared error between the color adjustment result and the color adjustment reference result to obtain a minimized squared error loss; calculate the difference between the Gaussian-filtered output of the color adjustment result and the Gaussian-filtered output of the color adjustment reference result to obtain a color loss; and perform a weighted sum of the minimized squared error loss and the color loss to generate the loss function of the initial color enhancement model.
  • the content analysis model includes: a convolutional network of K sequentially connected stacked convolution structures and two fully connected layers; each of the first K-1 stacked convolution structures includes a convolution layer, an activation layer and a pooling layer, and the K-th stacked convolution structure includes a convolution layer and an activation layer.
  • the content analysis model is a pre-trained content analysis model obtained by training an initial content analysis model with image data pairs of different styles, based on each color enhancement model in the pre-trained color enhancement model pool; the image data pair includes: an untoned image block and a toned image block of the same image size selected from the same image position in the untoned image and the second toned image, where the second toned image is a pre-acquired image obtained by color grading the untoned image according to different video content types.
  • the video processing device 1000 further includes a content analysis model training module for training an initial content analysis model;
  • the content analysis model training module includes: a second adjustment result acquisition unit, used to process, with each color enhancement model in the pre-trained color enhancement model pool, the untoned image block selected from the untoned image to obtain each color enhancement model's color adjustment result for the untoned image block; a weight value acquisition unit, used to process the untoned image block with the initial content analysis model to obtain a predetermined number of weight values, the predetermined number being the same as the number of color enhancement models in the color enhancement model pool; an enhancement result acquisition unit, used to perform, according to the predetermined number of weight values, weighted fusion of each color enhancement model's color adjustment result for the untoned image block to obtain the color enhancement result of the untoned image block; a second reference result acquisition unit, used to obtain from the image data pair the toned image block corresponding to the untoned image block as the color enhancement reference result; a second loss function construction unit, used to construct the loss function of the initial content analysis model based on the color enhancement result of the untoned image block and the color enhancement reference result; a second parameter adjustment unit, used to adjust the model parameters of the initial content analysis model using the loss function of the initial content analysis model to obtain an updated content analysis model; and a second iterative training unit, used to take the updated content analysis model as a new initial content analysis model and process the untoned image block with it, until the training process of the initial content analysis model meets the second training end condition, obtaining the trained content analysis model; the second training end condition includes at least one of the following: the training loss calculated according to the loss function of the initial content analysis model is less than or equal to a set second loss threshold, or the number of training iterations of the initial content analysis model reaches a set second count threshold.
  • the loss function of the initial content analysis model is a loss function constructed based on minimizing the squared error of the color enhancement result of the untoned image block and the color enhancement reference result.
  • the color enhancement models in the color enhancement model pool and the content analysis model can be used to process the input video frame respectively, so that the weight value, output by the content analysis model, of each color enhancement model in the color enhancement model pool with respect to the video frame can be used to assign a frame-specific weight to each color enhancement model in the pool; the finally output color adjustment result is the weighted fusion of the processing results of all color enhancement models in the pool and, being based on an analysis of the video content type of the video to be processed, better matches that video content type; in this way, the weight distribution of the content analysis model automatically adjusts the video to be processed to meet the color style requirements of the corresponding video content type and improves the data processing efficiency of video color adjustment.
  • Figure 11 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides an electronic device, which includes: at least one processor 1101; at least one memory 1102; and one or more I/O interfaces 1103 connected between the processor 1101 and the memory 1102; the memory 1102 stores one or more computer programs that can be executed by the at least one processor 1101, and the one or more computer programs are executed by the at least one processor 1101 so that the at least one processor 1101 can execute the above-mentioned video processing method.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above video processing method when executed by the processor/processing core.
  • Computer-readable storage media may be volatile or non-volatile computer-readable storage media.
  • Embodiments of the present disclosure also provide a computer program product, which includes computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above video processing method.
  • Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable program instructions, data structures, program modules or other data.
  • Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
  • Computer program instructions for performing operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • in some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized using the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the disclosure.
  • the computer program product described here may be implemented specifically through hardware, software, or a combination thereof.
  • in one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK), and so on.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, thereby producing a machine such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause the computer, the programmable data processing apparatus and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other equipment to implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions that contains one or more executable instructions for implementing the specified logical functions; the functions noted in the blocks may also occur out of the order noted in the figures; for example, two consecutive blocks may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or can be implemented using a combination of special-purpose hardware and computer instructions.
  • Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a general illustrative sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless expressly stated otherwise, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

本公开涉及一种视频处理方法及装置、电子设备和存储介质,视频处理方法包括:获取待处理视频的视频帧;利用色彩增强模型池中的至少一个色彩增强模型对视频帧进行色彩调整,得到色彩调整后的视频输出帧;其中,色彩增强模型池包含针对多种视频内容类型的不同风格需求的视频进行色彩调整的模型集合。

Description

视频处理方法及装置、电子设备、计算机可读存储介质 技术领域
本公开涉及图像处理技术和人工智能领域,具体涉及视频处理方法及装置、电子设备、计算机可读存储介质。
背景技术
色彩增强是图像处理领域的重要技术,该技术通过调整图片和视频的画面色彩,使图像画面更加丰富和逼真,使视频画质得到较大提升。该技术可以应用于视频色彩创作、片源画质增强等计算机视觉场景。
发明内容
本公开实施例提出了一种视频处理方法及装置、电子设备、计算机可读存储介质。
第一方面,本公开实施例提供一种视频处理方法,包括:获取待处理视频的视频帧;利用色彩增强模型池中的至少一个色彩增强模型对所述视频帧进行色彩调整,得到色彩调整后的视频输出帧;其中,所述色彩增强模型池包含针对多种视频内容类型的不同风格需求的视频进行色彩调整的模型集合。
其中,所述利用色彩增强模型池中的至少一个色彩增强模型对所述视频帧进行色彩调整,得到色彩调整后的视频输出帧,包括:
接收包含所述视频帧的视频内容类型和风格需求的信息;从所述色彩增强模型池中选择与所述视频内容类型和风格需求对应的色彩增强模型;利用选择的色彩增强模型对所述视频帧进行处理,得到所述色彩调整后的视频输出帧。
其中,所述利用色彩增强模型池中的至少一个色彩增强模型对所述视频帧进行色彩调整,得到色彩调整后的视频输出帧,包括:利用色彩增强模型池中的每个色彩增强模型处理所述视频帧,得到所述每个色彩增强模型输出的色彩调整后的视频帧;基于内容分析模型处理所述视频帧,生成所述每个色彩增强模型与所述视频帧对应的权重值;其中,所述内容分析模型配置为根据所述视频帧确定所述色彩增强模型池中的每个色彩增强模型的权重值;使用所述每个色彩增强模型与所述视频帧对应的权重值,对所述每个色彩增强模型输出的色彩调整后的视频帧进行加权融合处理,生成所述视频输出帧。
其中,所述色彩增强模型中包括:图像重建分支模型和色彩增强分支模型;所述利用选择的色彩增强模型对所述视频帧进行处理,得到所述色彩调整后的视频输出帧,包括:
对所述视频帧进行特征提取,得到第一图像特征;对所述视频帧进行滤波处理,并对滤波处理后的视频帧进行特征提取,得到第二图像特征;将所述第一图像特征输入所述图像重建分支模型,所述第二图像特征输入所述色彩增强分支模型,以根据所述第一图像特征和所述第二图像特征进行图像重建和色彩调整;对所述图像重建分支模型的输出结果和所述色彩增强分支模型的输出结果进行特征融合处理,得到融合结果;根据所述融合结果,生成色彩调整后的视频输出帧。
其中,所述图像重建分支模型和所述色彩增强分支模型中均包括:N个密集残差连接网络模块RDB;所述色彩增强模型中还包括N个全连接层;其中,N为大于或等于1的整数;所述将所述第一图像特征输入所述图像重建分支模型,所述第二图像特征输入所述色彩增强分支模型,以根据所述第一图像特征和所述第二图像特征进行图像重建和色彩调整,包括:
获取所述图像重建分支模型中的首个RDB和所述色彩增强分支模 型中的首个RDB,作为第一分支模型当前RDB和第二分支模型当前RDB;通过所述第一分支模型当前RDB处理所述第一图像特征,通过所述第二分支模型当前RDB处理所述第二图像特征,得到所述第一分支模型当前RDB输出的特征图和所述第二分支模型当前RDB输出的低频色彩特征;对于所述第二分支模型当前RDB输出的低频色彩特征,经一个全连接层处理后得到特征向量,将所述特征向量与所述第一分支模型当前RDB输出的特征图对应相乘,得到经局部色彩特征调整的特征图;将所述经局部色彩特征调整的特征图作为新的第一图像特征,将所述第二分支模型当前RDB输出的低频色彩特征作为新的第二图像特征;获取所述图像重建分支模型中的下一个RDB和所述色彩增强分支模型中的下一个RDB,作为新的第一分支模型当前RDB和新的第二分支模型当前RDB,直到获取次数等于N,将新得到的经局部色彩特征调整的特征图作为所述图像重建分支模型的输出结果,并将新得到的低频色彩特征作为所述色彩增强分支模型的输出结果。
其中,所述RDB中包括密集连接网络层和局部特征融合层;所述局部特征融合层用于:利用局部残差学习将所述RDB的输入特征与所述密集连接网络层的输出特征进行特征相加,得到所述RDB的输出特征;所述密集连接层包括:M个密集连接模块和1个连接模块,M为大于或等于1的整数;
其中,首个密集连接模块的输入为所述RDB的输入特征;第i个密集连接模块的输入特征为:所述RDB的输入特征和位于第i个密集连接模块之前的每个密集连接模块的输出特征,i为大于1且小于或等于M的整数;所述连接模块的输入为:所述RDB的输入特征和每个密集连接模块的输出特征。
其中,所述色彩增强模型池中的每个所述色彩增强模型具有相同的模型结构;所述色彩增强模型池中的色彩增强模型,是预先针对不同视 频内容类型的色彩风格采用不同的色彩增强数据对,对初始色彩增强模型进行训练得到的训练好的色彩增强模型;所述色彩增强数据对包括:从未调色图像和第一已调色图像中的相同图像位置选取的具有相同图像尺寸的未调色图像块和已调色图像块,所述第一已调色图像是预先获取的基于图像色彩风格特征对所述未调色图像进行调色后的图像。
对所述初始色彩增强模型进行的训练包括:使用所述初始色彩增强模型处理从所述未调色图像中选取的未调色图像块,得到所述未调色图像块的色彩调整结果,并从所述色彩增强数据对中,获取与所述未调色图像块相对应的已调色图像块,作为色彩调整参考结果;基于所述色彩调整结果和所述色彩调整参考结果,构建初始色彩增强模型的损失函数;利用初始色彩增强模型的损失函数调整所述初始色彩增强模型的模型参数,得到更新的色彩增强模型;将所述更新的色彩增强模型作为新的初始色彩增强模型,并使用所述新的初始色彩增强模型处理从所述未调色图像中选取的未调色图像块,直到所述初始色彩增强模型的训练过程满足第一训练结束条件,得到所述训练好的色彩增强模型;其中,所述第一训练结束条件包括如下条件项的至少一项:根据所述初始色彩增强模型的损失函数计算的训练损失小于或者等于设定的第一损失阈值、对所述初始色彩增强模型的训练次数达到设定的第一次数阈值。
其中,所述基于所述色彩调整结果和所述色彩调整参考结果,构建初始色彩增强模型的损失函数,包括:计算所述色彩调整结果和所述色彩调整参考结果之间的最小化平方误差,得到最小化平方误差损失;计算所述色彩调整结果的高斯滤波输出结果和所述色彩调整参考结果的高斯滤波输出结果,得到色彩损失;对所述最小化平方误差损失和所述色彩损失进行加权求和,生成所述初始色彩增强模型的损失函数。
其中,所述内容分析模型包括:依次连接的K个层叠卷积结构的卷积网和两个全连接层,前K-1个层叠卷积结构中的每个层叠卷积结构中 均包括一个卷积层、一个激活层和一个池化层,第K层叠卷积结构中包括一个卷积层和一个激活层。
其中,所述内容分析模型,是基于预先训练好的所述色彩增强模型池中的每个色彩增强模型,预先采用不同风格的图像数据对,对初始内容分析模型进行训练得到的训练好的内容分析模型;其中,所述图像数据对包括:从未调色图像和第二已调色图像中的相同图像位置选取的具有相同图像尺寸的未调色图像块和已调色图像块,所述第二已调色图像是预先获取的基于不同视频内容类型对未调色图像进行调色后的图像。
其中,对所述初始内容分析模型进行的训练包括:
利用预先训练好的色彩增强模型池中的每个色彩增强模型,处理从所述未调色图像中选取的未调色图像块,得到所述每个色彩增强模型对所述未调色图像块的色彩调整结果;使用初始内容分析模型对所述未调色图像块进行处理,得到预定数量的权重值,所述预定数量与所述色彩增强模型池中的色彩增强模型的数量相同;
根据所述预定数量的权重值,对所述每个色彩增强模型对所述未调色图像块的色彩调整结果进行加权融合处理,得到所述未调色图像块的色彩增强结果,并从所述图像数据对中获取与所述未调色图像块相对应的已调色图像块,作为所述色彩增强参考结果;基于所述未调色图像块的色彩增强结果和所述色彩增强参考结果,构建所述初始内容分析模型的损失函数;利用初始内容分析模型的损失函数调整所述内容分析模型的模型参数,以利用调整后的模型参数,得到更新的内容分析模型;将所述更新的内容分析模型作为新的初始内容分析模型,并使用所述新的初始内容分析模型对所述未调色图像块进行处理,直到所述初始内容分析模型的训练过程满足第二训练结束条件,得到所述训练好的内容分析模型;其中,所述第二训练结束条件包括如下条件项的至少一项:根据所述初始内容分析模型的损失函数计算的训练损失小于或者等于设定的 第二损失阈值、所述初始内容分析模型的训练次数达到设定的第二次数阈值。
其中,所述初始内容分析模型的损失函数,是根据所述未调色图像块的色彩增强结果和所述色彩增强参考结果的最小化平方误差构建的损失函数。
第二方面,本公开实施例提供一种视频处理装置,包括:获取模块,用于获取待处理的视频帧;处理模块,用于利用色彩增强模型池中的至少一个色彩增强模型对所述视频帧进行色彩调整,得到色彩调整后的视频输出帧;其中,所述色彩增强模型池包含针对多种视频内容类型的不同风格需求的视频进行色彩调整的模型集合。
第三方面,本公开实施例提供一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现本公开实施例任意一种视频处理方法。
第四方面,本公开实施例提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现本公开实施例任意一种视频处理方法。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。根据下面参考附图对示例性实施例的详细说明,本公开的其它特征及方面将变得清楚。
附图说明
附图是用来提供对本公开的进一步理解,并且构成说明书的一部分,与下面的具体实施方式一起用于解释本公开,但并不构成对本公开的限制。在附图中:
图1为本公开的一些实施例中的视频处理方法的流程图;
图2为本公开一些实施例中的进行色彩调整处理的流程图;
图3为本公开一些实施例中的进行色彩调整处理的流程图;
图4a为本公开一些实施例的视频处理系统的结构示意图;
图4b为本公开一些实施例的视频处理系统的结构示意图;
图5为本公开实施例的色彩增强模型的具体架构示意图;
图6为本公开实施例中利用色彩增强模型对视频帧进行色彩调整的具体流程图;
图7为本公开实施例中进行图像重建和色彩调整的具体流程图;
图8为本公开实施例中密集残差连接网络模块的具体架构图;
图9为本公开实施例中内容分析模型的具体架构示意图;
图10为本公开实施例的视频处理装置的结构示意图;
图11为实现本公开实施例的视频处理方法的电子设备的框图。
具体实施方式
以下结合附图对本公开的具体实施方式进行详细说明。应当理解的是,此处所描述的具体实施方式仅用于说明和解释本公开,并不用于限制本公开。
在实际应用场景中,随着图像显示技术的发展,用户对收视体验提出了更高的要求,视频画质增强的显示应用例如超高清显示应用逐渐出现在生活场景中。目前,我国超高清显示的产业链也在不断完善中。为了获得更佳的视频画质,对视频图像质量和显示终端的硬件能力都提出了更高的要求,从采集端的超高清摄像机,显示端的超高清分辨率大屏幕,高动态范围(High Dynamic Range,HDR)电视,传输端新一代宽带移动通讯技术连接网络,超高清电视台等,都有大量的企事业单位在其中布局,推动超高清显示等画质增强显示技术的快速发展。
在本公开实施例中,显示端的分辨率可以包括标清(Standard Definition,SD)、高清(High Definition,HD)、全高清(Full High Definition, FHD)和超高清(Ultra High-Definition,Ultra HD)等多种显示格式。
示例性地,标清分辨率例如可以是480×320个像素或640×480个像素,高清分辨率例如可以是1024×720个像素,全高清分辨率例如可以是1920×1080个像素,而超高清分辨率例如可以是3840×2160个像素,即整个显示屏幕上水平方向可以显示3840个像素,垂直方向可以显示2160个像素,由于超高清分辨率可以达到全高清分辨率的四倍,因此也可以将超高清分辨率称为4k分辨率。
示例性地,8k分辨率例如可以是7680×4320个像素,即整个显示屏幕上水平方向可以显示7680个像素,垂直方向可以显示4320个像素。在本公开实施例中,可以将8k分辨率也命名为超高清分辨率。
示例性地,新一代宽带移动通讯技术连接网络例如可以是第五代移动通信技术(5th Generation Mobile Communication Technology,5G)或更高网络制式的通信网络。随着显示技术和移动通信技术的提升,借助新一代宽带移动通讯技术提供的具有高速率和低时延特性的通信网络,为超清视频普及提供了更好的支持。
示例性地,动态范围是指图像中所包含的从“最亮”至“最暗”的比值,也就是图像从“最亮”到“最暗”之间灰度划分的等级数,动态范围越大,所能表示的层次越丰富,所包含的色彩空间也越广;高动态范围是指从“最亮”到“最暗”可以达到非常高(超过预定阈值)的比值;通过灰度的量化,高动态范围图像(即HDR图像)可以提供更多的动态范围和图像细节。HDR电视是可以支持HDR技术的电池,HDR技术的使用可增加画面中细节的表现,提高画质,保持图像色彩的真实度。
在一些应用场景中,超高清设备平台的技术发展较快,而超高清视频内容的生产制作却较为落后。4K/8K的片源存量往往不能满足超高清播放需求,但标清、高清视频却有大量的库存无法在超高清显示平台上 播放。因此,对标清、高清片源进行超高清重制是可以解决超高清片源不足最快速最直接的手段。
由于超高清视频采用新一代超高清视频制作与显示系统的电视显示标准中色彩空间的定义,例如4K/8K视频可以采用BT.2020色域,该色域规定了迄今为止使用三原色可以达到的最大色域,使得色彩创作有了更大的空间。但目前的视频色彩创作往往靠人工完成,低清片源的超高清重制往往依靠人工处理,无法自动化,导致片源生产周期长,人力成本高,处理效率低。
本公开实施例提供一种视频处理方法及装置、电子设备、计算机可读存储介质,该方法可以由终端设备或服务器等电子设备执行,终端设备可以包括但不限于:个人电脑、智能手机、平板电脑、个人数字助理(Personal Digital Assistant,PDA)、服务器等;该视频处理方法可以通过处理器调用存储器中存储的计算机可读程序指令的方式来实现;或者,可通过服务器执行该方法。
为了更好地理解本公开,下面将结合附图,详细描述根据本公开实施例的视频处理方法,应注意,这些实施例并不是用来限制本公开公开的范围。
图1为本公开的一些实施例中的视频处理方法的流程图。如图1所示,该视频处理方法包括如下步骤。
S110,获取待处理视频的视频帧。
在该步骤中,执行设备可以通过通信网络接收待处理视频,可以从执行设备的内部存储装置或外部存储装置中读取待处理视频,也可以通过执行设备上的搜索引擎从互联网上下载得到待处理视频,还可以通过执行设备上的拍摄装置,例如照相机、摄像头等拍摄得到待处理视频,更可以从视频处理指令中获取该指令携带的待处理视频的视频帧。应理解,本公开实施例中的待处理视频的获取方式还可以是其它获取方式, 具体内容在此处不作限定。
S120,利用色彩增强模型池中的至少一个色彩增强模型对视频帧进行色彩调整,得到色彩调整后的视频输出帧。
其中,色彩增强模型池包含针对多种视频内容类型的不同风格需求的视频进行色彩调整的模型集合。
在一些实施例中,色彩增强模型池可以包括至少一个色彩增强模型,色彩增强模型池中的每个色彩增强模型可用于对一种视频内容类型的一种风格需求的视频帧进行色彩调整。
作为示例,视频内容类型可以包括如下类型中的至少一种:电视剧类、记录片类、电影类、动漫类、体育类、新闻类、戏剧类、娱乐类和教育类;在实际应用场景中,视频内容类型还可以是其它类型,在此不做具体限定。
作为示例,风格需求是指对待处理视频的风格特征的需求;对于某一种视频内容类型的视频而言,其所包括的视频内容一般具有较为统一的风格特征,风格特征可以包括如下特征项中的至少一种:画风、画质、色调、亮度。
其中,画风是指观看视频给观看者形成的一种整体视听感受和视觉风格;画质即画面质量,包括清晰度、锐度、镜头畸变、色散度、分辨率、色域范围、色彩纯度(或色彩艳度)、色彩平衡等指标;色调是指图像的相对明暗颜色;亮度是指图像画面的明亮程度。在实际应用场景中,视频内容的风格特征还可以包括其它特征项,在此不做具体限定。
根据本公开实施例的视频处理方法,色彩增强模型池包含针对多种视频种类的风格需求的视频进行调整的模型集合,根据该模型池中的至少一个色彩增强模型对获取到的待处理视频的视频帧进行色彩调整,可以自动生成与该视频帧的视频内容类型和风格需求对应的调色效果,视频处理过程可以由执行设备调用模型进行自动化实现,从而可以自动化 进行视频色彩创作,自动化完成低清片源的超高清重制,缩短片源生产周期,节约人力成本,提高视频处理效率。
图2示出本公开一些实施例中的进行色彩调整处理的流程图。如图2所示,在一些实施例中,步骤S120具体可以包括如下步骤。
S21,接收包含视频帧的视频内容类型和风格需求的信息。
在一些实施例中,在接收到待处理视频的视频帧的情况下,还可以接收包含视频帧的视频内容类型和风格需求的信息,以用于根据该视频内容类型和风格需求从色彩增强模型池中进行相应模型的选择。示例性地,步骤S21执行顺序可以是在执行步骤S110之前或之后,步骤S21也可以与步骤S110同步执行,即执行设备可以在接收待处理视频的视频帧时,同步接收包含该视频帧的视频内容类型和风格需求的信息;在实际应用场景中,只要在选择色彩增强模型之前,获取到待处理视频的视频帧和视频内容类型和风格需求即可,步骤S110和步骤S21的执行先后顺序可以根据实际需要进行灵活设置。
S22,从色彩增强模型池中选择与视频内容类型和风格需求对应的色彩增强模型。
在一些实施例中,不同视频内容类型可以具有不同风格需求,同一视频内容类型的风格需求也可以有所不同;因此,根据接收到的包含视频帧的视频内容类型和风格需求的信息,可以从色彩增强模型池中选择一个与该视频内容类型和风格需求相匹配的色彩增强模型。
S23,利用选择的色彩增强模型对视频帧进行处理,得到色彩调整后的视频输出帧。
在一些实施例中,根据接收到的包含视频帧的视频内容类型和风格需求的信息,可以从色彩增强模型池中选择出一个与该视频内容类型和风格需求相匹配的色彩增强模型,则可以根据该相匹配的色彩增强模型对视频帧进行处理,得到色彩调整后的视频输出帧。
在另一些实施例中,若从色彩增强模型池中选择至少两个色彩增强模型,该至少两个色彩增强模型的其中一个模型是与该视频内容类型和风格需求相匹配的模型,则可以分别使用该至少两个色彩增强模型中的每个模型对视频帧进行处理,得到选择的每个色彩增强模型的输入结果,并根据预设的每个色彩增强模型与视频帧对应的权重值,对选择的每个色彩增强模型的输入结果进行加权融合处理,生成视频输出帧。
在本公开实施例中,在利用色彩增强模型池中的至少一个色彩增强模型对视频帧进行色彩调整时,对于获取到的待处理视频的视频帧,可以先根据接收到的包含视频帧的视频内容类型和风格需求的信息,从色彩增强模型池中选择相匹配的色彩增强模型,再利用该选择的色彩增强模型对待处理视频的视频帧进行处理,生成与接收的视频内容类型和风格需求对应的调色效果,实现对待处理视频的自动化调整,满足相应视频内容类型的色彩风格需求,提高视频色彩调整的数据处理效率。
图3示出本公开一些实施例中的进行色彩调整处理的流程图。如图3所示,在一些实施例中,步骤S120具体可以包括如下步骤。
S31,利用色彩增强模型池中的每个色彩增强模型处理视频帧,得到每个色彩增强模型输出的色彩调整后的视频帧。
在该步骤中,色彩增强模型池中的每个色彩增强模型均对待处理视频的视频帧进行处理,得到每个色彩增强模型根据自个对应的视频内容类型的风格需求对视频帧的色彩调整结果。
S32,基于内容分析模型处理视频帧,生成每个色彩增强模型与视频帧对应的权重值;其中,内容分析模型配置为根据视频帧确定色彩增强模型池中的每个色彩增强模型的权重值。
在该步骤中,内容分析模型可以通过分析视频帧的视频内容类型,为色彩增强模型池中的每个色彩增强模型分配权重值(也可以称为是权重系数),从而通过内容分析模型输出的每个色彩增强模型与视频帧对 应的权重值对模型池中每个模型的输出结果进行权重分配。
S33,使用每个色彩增强模型与视频帧对应的权重值,对每个色彩增强模型输出的色彩调整后的视频帧进行加权融合处理,生成视频输出帧。
在该步骤中,加权融合处理包括:将每个色彩增强模型与视频帧对应的权重值,与相应色彩增强模型输出的色彩调整后的视频帧进行矩阵相乘运算;再将每个矩阵相乘运算结果进行矩阵相加,得到矩阵形式的视频输出帧。
在本公开实施例的视频处理方法中,可以通过内容分析模型输出的每个色彩增强模型与视频帧对应的权重值,对色彩增强模型池中的每个色彩增强模型进行权重分配,最后输出该模型池中每个色彩增强模型的处理结果的融合后的色彩调整结果,该色彩调整结果基于待处理视频的视频内容类型分析,得到更加符合视频内容类型的视频色彩调整结果。
下面结合图4a和图4b,描述本公开示例性实施例的视频处理系统和相应的视频处理方法。图4a示出本公开一些实施例的视频处理系统的结构示意图;图4b示出本公开一些实施例的视频处理系统的结构示意图;图4a和图4b中相同的标号可以表示相同的结构。
如图4a所示,在一些实施例中,视频处理系统包括:视频帧输入模块41、色彩增强模型池处理模块42、内容分析模型处理模块43、权重值输出模块44、融合处理模块45和视频帧输出模块46。
在一些实施例中,视频帧输入模块41可以用于获取输入的视频帧。
在一些实施例中,执行设备可以从接收到的视频处理指令中,获取该指令中携带的视频帧。
色彩增强模型池处理模块42,用于利用色彩增强模型池中的至少一个色彩增强模型对视频帧进行色彩调整,得到色彩调整后的视频输出帧。
在一些实施例中,色彩增强模型池处理模块可以简称为色彩增强模型池。
在一些实施例中,在获取输入的视频帧的情况下,若还接收包含该视频帧的视频内容类型和风格需求的信息,则可以从色彩增强模型池中选择与视频内容类型和风格需求对应的色彩增强模型,利用选择的色彩增强模型对视频帧进行处理,得到色彩调整后的视频输出帧。
在一些实施例中,无论是否接收到包含该视频帧的视频内容类型和风格需求的信息,均可以利用色彩增强模型池中的每个色彩增强模型处理视频帧,得到每个色彩增强模型输出的色彩调整后的视频帧。
内容分析模型处理模块43,用于基于内容分析模型处理视频帧,生成每个色彩增强模型与视频帧对应的权重值。
在一些实施例中,可以预先设置色彩增强模型池中每个色彩增强模型与视频帧对应的权重值;例如,在大批量的处理同一视频内容类型的相同风格需求的待处理视频时,可以预先根据多次测试结果或实际经验为色彩增强模型池中每个色彩增强模型分配的权重值。
在另一些实施例中,基于内容分析模型处理视频帧,可自动生成每个色彩增强模型与视频帧对应的权重值,减少人工处理的参与,有利于实现对视频进行色彩创作自动化,大幅降低视频重制成本,提高视频处理效率。
权重值输出模块44,用于输出每个色彩增强模型与视频帧对应的权重值。
融合处理模块45,用于使用每个色彩增强模型与视频帧对应的权重值,对每个色彩增强模型输出的色彩调整后的视频帧进行加权融合处理,得到融合结果。
视频帧输出模块46,用于根据融合结果生成色彩调整后的视频输出帧。
在本公开实施例中,可以根据色彩增强模型池中的至少一个色彩增强模型对获取的视频帧进行色彩调整,自动生成与该视频帧的视频内容 类型和风格需求对应的调色效果;并可以通过内容分析模型输出的每个色彩增强模型与视频帧对应的权重值,对色彩增强模型池中的每个色彩增强模型进行权重分配,输出每个色彩增强模型的处理结果的融合后的色彩调整结果,得到更加符合视频内容类型的风格需求的视频色彩调整结果,实现对视频色彩的自动化调整,从而可以降低视频重制成本,提高视频处理效率。
如图4b所示,该视频处理系统与图4a的视频处理系统基本相同,不同之处在于,图4b中示出了色彩增强模型池中的多个色彩增模型。
在图4中,色彩增强模型池中包括至少一组色彩增强模型,每组色彩增强模型对应于一个视频内容类型;每组色彩增强模型中包括至少一个色彩增强模型,每个色彩增强模型对应于一个风格需求。
示例性地,第一组色彩增强模型是与电视剧类视频对应的一组模型,该组模型中包括N1个模型,例如电视剧风格模型1,电视剧风格模型2,……,电视剧风格模型N1;该组模型中的每个模型可以对应于电视剧类视频的不同风格需求;第二组色彩增强模型是与纪录片类视频对应的一组模型,该组模型中包括N2个模型,例如纪录片风格模型1,纪录片风格模型2,……,纪录片风格模型N2,该组模型中的每个模型可以对应于纪录片类视频的不同风格需求;……;第三组色彩增强模型是与电影类视频对应的一组模型,该组模型中包括N3个模型,例如电影风格模型1,电影风格模型2,……,电影风格模型N3;该组模型中的每个模型可以对应于电影类视频的不同风格需求。N1、N2和N3均为大于或等于1的整数。
应理解,色彩增强模型池中还可以包括其它对应于其它视频内容类型的至少一种风格需求的模型组;具体可以根据实际情况进行设置,本公开实施例不做具体限定。
继续参考图4b，图4b中示出了融合模块中的具体处理单元，其中，符号⊗表示矩阵乘法运算，符号∑表示矩阵加法运算。在一些实施例中，上述融合模块具体可以用于：将内容分析模型输出的每个色彩增强模型与视频帧对应的权重值，与相应色彩增强模型输出的色彩调整后的视频帧进行矩阵相乘运算；以及，将每个矩阵相乘运算结果进行矩阵相加，得到矩阵相加结果，以用于根据该矩阵相加结果生成视频输出帧。
在本公开实施例中,视频处理方法可以由执行设备使用人工智能(Artificial Intelligence,AI)模型进行自动化实现。AI模型可以是基于卷积神经网络实现的模型,并可以用于图像检测、图像分类、图像内容分析等图像处理领域。本公开实施例中的色彩增强模型和内容分析模型均为AI模型。
在一些实施例中,色彩增强模型池中的模型具有相同的模型结构。下面通过图5-图8,详细描述色彩增强模型的具体结构和工作原理。
图5示出本公开实施例的色彩增强模型的具体架构示意图。
在图5中,色彩增强模型包括2N个密集残差连接网络模块(Residual Dense Block,RDB)51和N个全连接(Fully Connected)层52,N为大于或等于1的整数。示例性地,色彩增强模型包括:至少两个密集残差连接网络模块(Residual Dense Block,RDB)51和至少一个全连接层52。
在图5示出的色彩增强模型的网络结构中,上述2N个RDB可以被设置为两个任务分支模型:图像重建(Image Reconstruction,IR)分支模型和色彩增强(Color Enhancement)分支模型,每个任务分支模型中均包括N个RDB51。
继续参考图5,色彩增强模型中还可以包括:第一卷积层53、滤波单元54、第二卷积层55、第三卷积层56、第四卷积层57。
下面通过图6和图7,介绍如何通过图5中的色彩增强模型进行视频帧的色彩调整。图6示出本公开实施例中利用色彩增强模型对视频帧进行色彩调整的具体流程图;图7示出本公开实施例中进行图像重建和 色彩调整的具体流程图。
如图6所示,上述步骤S23中利用选择的色彩增强模型对视频帧进行处理,得到色彩调整后的视频输出帧的步骤,具体可以包括如下步骤。
S601,对视频帧进行特征提取,得到第一图像特征。
在该步骤中,可以通过色彩增强模型中的第一卷积层53对输入色彩增强模型视频帧进行特征提取,得到第一图像特征。
S602,对视频帧进行滤波处理,并对滤波处理后的视频帧进行特征提取,得到第二图像特征。
在该步骤中,可以通过滤波单元54对输入色彩增强模型的视频帧进行滤波处理,并通过第二卷积层55对滤波处理后的视频帧进行特征提取,得到第二图像特征。
在一些实施例中,滤波单元54可以是高斯滤波单元、均值滤波单元和中值滤波单元中的一种。其中,高斯滤波单元用于对视频帧进行高斯滤波;均值滤波单元用于对视频帧进行均值滤波;中值滤波单元用于对视频帧进行中值滤波。
具体地,高斯滤波是一种线性平滑滤波,适用于消除高斯噪声,在一些场景中,高斯平滑滤波器对于抑制服从正态分布的噪声非常有效;均值滤波可以将视频帧中的图像像素点的局部图像信息的像素均值,作为该图像像素点处理后的像素值,滤波算法简单,计算速度快;中值滤波可以将视频帧中的像素点的视频帧中的图像像素点的灰度从大到小顺序排列,将灰度中间值作为该图像像素点的灰度值;在一些场景中,中值滤波在对图像降噪时,有利于更好地保留图像的边界。
本公开实施例中,通过滤波单元对视频帧进行滤波处理,不仅有利于去除视频帧的噪声,实现视频帧中图像的平滑过渡,同时也有利于对视频帧中的图像进行低频特征增强,得到视频帧中的图像的低频信息。
在一些实施例中,若图像中的一个区域内相邻像素的像素值相差较 小(小于或等于像素值阈值),则该区域的信息为低频信息;若图像中的一个区域内相邻像素的像素值相差较大(超过像素值阈值),则该区域的信息为高频信息。
在一些实施例中,低频信息可以是图像的颜色信息、灰度信息等,高频信息可以是指图像的纹理信息、边缘信息等;经滤波单元的滤波处理后,有利于后续提取的图像特征,较少的受到图像噪声和图像纹理等高频信息的影响,从而保留较多的低频颜色特征(也可以称为是低频色彩特征)。
应理解,滤波单元还可以是其它类型滤波单元,以用于对视频帧进行其它类型的滤波处理;具体可以根据实际需要进行滤波单元类型的选择,本公开实施例不做具体限定。
S603,将第一图像特征输入图像重建分支模型,第二图像特征输入色彩增强分支模型,以根据第一图像特征和第二图像特征进行图像重建和色彩调整。
图像重建(Image Reconstruction,IR)是将已有的低分辨率(Low-resolution,LR)图像进行处理,转化成高分辨率(High-resolution,HR)图像的技术。
在该步骤中,图像重建分支模型通过RDB对视频帧的第一图像特征进行特征提取,并对输出图像进行粗精度重建;粗精度重建是将待处理视频的视频帧作为初始图像,初始图像经过卷积层的特征提取后得到初始图像的特征信息,利用RDB提取初始图像的特征信息,根据该初始图像的特征信息进行重建得到的图像可以称为是粗精度重建的图像。
在该步骤中,对滤波单元的滤波处理后得到的第二图像特征(即低频色彩特征)进行分析提取,以用于后续根据分析提取的特征进行色彩调整。
S604,对图像重建分支模型的输出结果和色彩增强分支模型的输出 结果进行特征融合处理,得到融合结果。
在该步骤中,对图像重建分支模型的输出结果和色彩增强分支模型的输出结果进行对应特征通道相乘,再将特征通道相乘运算结果进行特征通道相加,得到融合结果。
S605,根据融合结果,生成色彩调整后的视频输出帧。
在该步骤中,根据融合结果得到输入视频帧的全局的色彩调整输出结果。
示例性地,使用第三卷积层56对图像重建分支模型的输出结果进行特征提取,以及使用第四卷积层57对图像重新分支模型进行特征提取;并将第三卷积层56的特征提取结果和第四卷积层57的特征提取结果进行对应特征通道相乘,以及进行两个任务分支模型的特征融合,得到该色彩增强模型的色彩调整输出结果。
在本公开实施例中,在使用色彩增强模型对视频帧进行处理时,可以通过图像重建分支模型和色彩增强分支模型对输入的视频帧进行图像重建和色彩调整,并对图像重建分支模型的输出结果和色彩增强分支模型的输出结果进行特征融合处理,得到该色彩增强模型的色彩调整输出结果。
在图5中,每个RDB51中卷积核的尺寸(即,宽度和高度)相同,例如RDB51中卷积核的尺寸为3×3,输出特征通道数为64,滤波单元54中滤波核的尺寸为21×21,第一卷积层53的尺寸和第二卷积层55的尺寸相同,例如二者的尺寸均为3×64,第三卷积层56的尺寸和第四卷积层57的尺寸相同,例如二者的尺寸均为64×3。
需要说明的是,本公开实施例中用于提取特征的卷积核的数量和尺寸、每个RDB51中的卷积核的数量和尺寸,以及滤波单元54中滤波核的数量和尺寸均可以根据需要进行设置,本公开实施例不做具体限定。
在一些实施例中,对于图5示出的色彩增强模型中的两个任务分支 模型:图像重建分支模型和色彩增强分支模型,每个分支模型中均可以包括N个密集残差连接网络模块RDB;色彩增强模型中还包括N个全连接层;其中,N为大于或等于1的整数;下面通过具体实施例描述基于该两个任务分支模型的模型架构所进行的图像重建和色彩调整处理过程。
如图7所示,上述步骤S603中,将第一图像特征输入图像重建分支模型,第二图像特征输入色彩增强分支模型,以根据第一图像特征和第二图像特征进行图像重建和色彩调整的步骤,具体可以包括如下步骤。
S701,获取图像重建分支模型中的首个RDB和色彩增强分支模型中的首个RDB,作为第一分支模型当前RDB和第二分支模型当前RDB。
S702,通过第一分支模型当前RDB处理第一图像特征,通过第二分支模型当前RDB处理第二图像特征,得到第一分支模型当前RDB输出的特征图和第二分支模型当前RDB输出的低频色彩特征。
S703,对于第二分支模型当前RDB输出的低频色彩特征,经一个全连接层处理后得到特征向量,将特征向量与第一分支模型当前RDB输出的特征图对应相乘,得到经局部色彩特征调整的特征图。
在该步骤中,可以通过图像重建分支模型中的当前RDB(例如卷积核为3×3,输出特征通道64)进行图像特征提取;通过色彩增强分支模型中的当前RDB(例如卷积核为3×3,输出特征通道64)对图像的低频色彩特征进行分析提取,并将提取的低频色彩特征的各层特征通过全连接层转换成宽度为1的特征向量(例如尺寸为1×64的特征向量),经该全连接层输出的特征向量中各元素分别与图像重建分支模型中当前RDB输出的特征图进行对应特征通道的相乘,可实现局部色彩特征的精细化调整。
S704,将经局部色彩特征调整的特征图作为新的第一图像特征,将第二分支模型当前RDB输出的低频色彩特征作为新的第二图像特征。
S705,获取图像重建分支模型中的下一个RDB和色彩增强分支模型中的下一个RDB,作为新的第一分支模型当前RDB和新的第二分支模型当前RDB,直到获取次数等于N,将新得到的经局部色彩特征调整的特征图作为图像重建分支模型的输出结果,并将新得到的低频色彩特征作为色彩增强分支模型的输出结果。
通过上述步骤S701-S705,得到色彩增强分支模型中最后一个RDB提取的低频色彩特征,将该低频色彩特征通过最后一个全连接层转换成宽度为1的特征向量,再将该最后一个全连接层输出的特征向量中各元素分别与图像重建分支模型中最后一个RDB输出的特征图进行对应特征通道的相乘,得到色彩增强模型中图像重建分支模型的输出结果;将色彩增强分支模型中最后一个RDB提取的低频色彩特征作为色彩增强模型中色彩增强分支模型的输出结果;基于图像重建分支模型的输出结果和色彩增强分支模型的输入结果进行特征融合处理,得到色彩增强模型的色彩调整输出结果,从而实现全局的色彩风格增强。
图8示出本公开实施例中密集残差连接网络模块的具体架构图。如图8所示,在一些实施例中,每个RDB中包括密集连接网络层(Dense Net)511和局部特征融合层512。
在该实施例中,局部特征融合层512用于:利用局部残差学习将RDB的输入特征与密集连接网络层511的输出特征进行特征融合处理,得到RDB的输出特征。
在一些实施例中,密集连接网络层511包括:M个密集连接模块5111,M为大于或等于1的整数;其中,首个密集连接模块5111的输入为RDB的输入特征;第i个密集连接模块5111的输入特征为:RDB的输入特征和位于第i个密集连接模块5111之前的每个密集连接模块5111的输出特征,i为大于1且小于或等于M的整数。
在图8中,示出在M=3的情况下,密集连接网络层511中包含的3 个密集连接模块5111。在实际应用场景中,密集连接模块5111的数量可以根据实际需要进行设置。
继续参考图8,局部特征融合层512的输入包括:RDB的输入特征(例如可以记为F d-1)和每个密集连接模块5111的输出特征(例如可以记为F d,1、F d,C等);局部特征融合层512中可以包括一个合并(Concat)层和一个尺寸为1×1的卷积层;其中,合并层采用连结的方式对输入该合并层的特征融合,1×1的卷积层用于自适应融合一系列不同层次的特征,生成局部特征融合的输出特征(例如可以记为F d,LF);将局部特征融合的输出特征与该RDB的输入特征进行对应通道的特征相加,生成全局特征融合结果(例如可以记为F d),将该全局特征融合结果作为该RDB的输出特征。
在本公开实施例中,RDB是一种将残差网络和密集连接网络相结合的网络结构,在密集连接、特征融合和残差学习的共同作用下,通过保留低层级信息实现信息共享,通过特征融合自适应的获取不同卷积层获取的信息,在图像处理方向上有较好的性能体现。具体地,RDB的中间层可以使用密集连接模块5121的密集连接模式,最后一层的连接模块5122通过残差跳接的方式输出该RDB的处理结果。
在该实施例中,RDB中每个密集连接模块都可以向后面的密集连接模块传递需保存的特征信息,该网络结构可以充分保证特征信息在每一层之间的传递和流通,在实现特征复用的同时减少了参数量,并且在网络参数量较大的情况下,也可以保证更优的处理速度。
在一些实施例中,色彩增强模型池中的每个色彩增强模型具有相同的模型结构;色彩增强模型池中的色彩增强模型,是预先针对不同视频内容类型的色彩风格采用不同的色彩增强数据对,对初始色彩增强模型进行训练得到的训练好的色彩增强模型;色彩增强数据对包括:从未调色图像和第一已调色图像中的相同图像位置选取的具有相同图像尺寸的 未调色图像块和已调色图像块,第一已调色图像是预先获取的基于图像色彩风格特征对未调色图像进行调色后的图像。
在该实施例中,初始色彩增强模型与色彩增强模型池中的每个色彩增强模型具有相同的模型结构;初始色彩增强模型的模型参数的初始值,可以是该模型的参数随机值,也可以是该模型的参数经验值;第一已调色图像可以是预先获取的经调色师进行色彩调整的图像;色彩增强网络模型的训练数据包括:从未调色图像和第一已调色图像中的相同图像位置选取的具有相同图像尺寸的图像块。示例性地,可以在每个色彩增强模型的训练过程中随机抽取图像中某一位置的尺寸为256×256图像块输入网络进行训练。
应理解,训练过程中所需图像块的尺寸可以根据实际需要进行选择,本公开实施例不做具体限定。
在一些实施例中,对初始色彩增强模型即色彩增强模型进行的训练可以包括:使用初始色彩增强模型处理从未调色图像中选取的未调色图像块,得到未调色图像块的色彩调整结果,并从色彩增强数据对中,获取与未调色图像块相对应的已调色图像块,作为色彩调整参考结果;基于色彩调整结果和色彩调整参考结果,构建初始色彩增强模型的损失函数;利用初始色彩增强模型的损失函数调整初始色彩增强模型的模型参数,得到更新的色彩增强模型;将更新的色彩增强模型作为新的初始色彩增强模型,并使用新的初始色彩增强模型处理从未调色图像中选取的未调色图像块,直到初始色彩增强模型的训练过程满足第一训练结束条件,得到训练好的色彩增强模型。
其中,第一训练结束条件包括如下条件项的至少一项:根据初始色彩增强模型的损失函数计算的训练损失小于或者等于设定的第一损失阈值、对初始色彩增强模型的训练次数达到设定的第一次数阈值。
在本公开实施例中,损失函数可以用于估计每次模型训练的结果与 训练目标之间的差距;对于初始色彩增强模型,每次模型训练的结果是初始色彩增强模型输出的未调色图像块的色彩调整结果,训练目标是相应的色彩调整参考结果;第一损失阈值和第一次数阈值可以根据实际训练要求进行自定义设置。
在一些实施例中,色彩增强模型的训练过程中,基于色彩调整结果和色彩调整参考结果,构建初始色彩增强模型的损失函数的步骤,具体可以包括:计算色彩调整结果和色彩调整参考结果之间的最小化平方误差,得到最小化平方误差损失;计算色彩调整结果的高斯滤波输出结果和色彩调整参考结果的高斯滤波输出结果,得到色彩损失;对最小化平方误差损失和色彩损失进行加权求和,生成初始色彩增强模型的损失函数。
示例性地，初始色彩增强模型的损失函数可以表示为下面的表达式(1)：
Loss = a·||I_out - I_gt1||_2^2 + b·||Gauss(I_out) - Gauss(I_gt1)||_2^2    (1)
其中，I_out为色彩调整结果，I_gt1为色彩调整参考结果，Gauss()为计算高斯滤波输出结果的函数，该滤波核的尺寸即被训练的色彩增强模型中的滤波单元的滤波核尺寸；||I_out - I_gt1||_2^2为色彩调整结果与色彩调整参考结果的差值的L2范数，即初始色彩增强模型的每次模型训练的图像特征的L2损失；||Gauss(I_out) - Gauss(I_gt1)||_2^2为初始色彩增强模型的每次模型训练的色彩调整结果的高斯滤波输出结果和色彩调整参考结果的高斯滤波输出结果的差值的L2范数，即每次模型训练的色彩损失；a为该L2损失的权值，b为该色彩损失的权值；a和b可以根据实际训练需要或经验值进行预先设置。
在一些实施例中,该模型的训练过程中,可以采用网络优化器例如Adam优化器,保证每次模型训练的输出结果(色彩调整结果)更接近 于模型训练的目标(色彩调整参考结果)。
在本公开实施例中,针对不同视频内容类型的风格需求可以采用不同的色彩增强数据对,对色彩增强模型进行训练,得到针对根据不同视频内容类型的不同风格需求对视频帧进行色彩调整,从而生成不同调色效果的模型集合,得到色彩增强模型池,以用于根据色彩增强模型池对待处理视频的视频帧进行自动化的色彩调整,提高色彩调整效率。
图9示出本公开实施例中内容分析模型的具体架构示意图。如图9所示,在一些实施例中,内容分析模型包括:依次连接的K个层叠卷积结构91的卷积网和两个全连接层。
其中,前K-1个层叠卷积结构91中的每个层叠卷积结构91中均包括一个卷积层911、一个激活层(The Rectified Linear Unit,ReLU)912和一个池化层(Max Pooling)913,第K层叠卷积结构92中包括一个卷积层911和一个激活层912;两个全连接层为全连接层93和全连接层94。
在本公开实施例中,激活层912可以用于将相应卷积层输出结果做非线性映射,用于加快模型的收敛速度;池化层913可以用于进行图像下采样(Under Sampling),通过下采样可以降低图像特征维度,提高精度和避免过拟合。
在图9中,作为示例,输入图像的尺寸是256×256×3(长度、宽度和高度),内容分析模型汇中每个层叠卷积结构91中,卷积层的尺寸可以是3×3;以K=5为例,对于依次连接的5个层叠卷积结构91,第1个层叠卷积结构91中可以将输入图像的尺寸缩放到128×128×32,第2个层叠卷积结构91中可以将输入图像的尺寸缩放到64×64×64,第3个层叠卷积结构91中可以将输入图像的尺寸缩放到32×32×128,第4个层叠卷积结构91中可以将输入图像的尺寸缩放到16×16×256,第5个层叠卷积结构91中可以将输入图像的尺寸缩放到8×8×512。
继续参考图9,全连接层93将最后一个层叠卷积结构91的输出结 果转换为指定维度的一维向量(例如1×1024),全连接层94用于将该一维向量转化为维度为L的一维向量,L为色彩增强模型池中的模型总数量。
在一些实施例中,图9中的色彩增强模型池与上述实施例中结合图4a和图4b描述的色彩增强模型池具有相同的模型结构。
在本公开实施例中,内容分析模型中的层叠卷积网络结构可以通过层叠卷积层的方式分析和提取图像特征,通过卷积层进行特征提取,最后的卷积层输出通过两个全连接层转换为L个权重值,L为大于或等于1的整数,每个权重值与相应的色彩增强模型的输出结果相乘,再将每个相乘运算结果进行相应特征通道的特征相加求和,生成最终的色彩增强结果,从而通过对视频帧内容的分析,将不同权重分配给不同的色彩增强模型,以达到根据内容进行色彩自适应调整的目的。
在一些实施例中,内容分析模型,是基于预先训练好的色彩增强模型池中的每个色彩增强模型,采用不同风格的图像数据对,对初始内容分析模型进行训练得到的训练好的内容分析模型;其中,图像数据对包括:从未调色图像和第二已调色图像中的相同图像位置选取的具有相同图像尺寸的未调色图像块和已调色图像块,第二已调色图像是预先获取的基于不同视频内容类型对未调色图像进行调色后的图像。
在该实施例中,初始内容分析模型的模型结构与内容分析模型的模型结构相同;初始内容分析模型的模型参数的初始值,可以是该模型的参数随机值,也可以是该模型的参数经验值;第二已调色图像可以是预先获取的经调色师进行色彩调整的图像;第二已调色图像与上述实施例中的第一已调色图像不同。
其中,内容分析模型的训练数据包括:从未调色图像和第二已调色图像中的相同图像位置选取的具有相同图像尺寸的图像块。示例性地,可以在每个色彩增强模型的训练过程中随机抽取图像中某一位置的尺寸 为256×256图像块输入网络进行训练,随机抽取的方式可以保证模型不会出现对某种风格过拟合的情况。
应理解,训练过程中所需图像块的尺寸可以根据实际需要进行选择,本公开实施例不做具体限定。
在一些实施例中,对初始内容分析模型进行的训练包括:利用预先训练好的色彩增强模型池中的每个色彩增强模型,处理从未调色图像中选取的未调色图像块,得到每个色彩增强模型对未调色图像块的色彩调整结果;使用初始内容分析模型对未调色图像块进行处理,得到预定数量的权重值,预定数量与色彩增强模型池中的色彩增强模型的数量相同;根据预定数量的权重值,对每个色彩增强模型对未调色图像块的色彩调整结果进行加权融合处理,得到未调色图像块的色彩增强结果,并从图像数据对中获取与未调色图像块相对应的已调色图像块,作为色彩增强参考结果;基于未调色图像块的色彩增强结果和色彩增强参考结果,构建初始内容分析模型的损失函数;利用初始内容分析模型的损失函数调整初始内容分析模型的模型参数,得到更新的内容分析模型;将更新的内容分析模型作为新的初始内容分析模型,并使用新的初始内容分析模型对未调色图像块进行处理,直到初始内容分析模型的训练过程满足第二训练结束条件,得到训练好的内容分析模型。
其中,第二训练结束条件包括如下条件项的至少一项:根据初始内容分析模型的损失函数计算的训练损失小于或者等于设定的第二损失阈值、初始内容分析模型的训练次数达到设定的第二次数阈值。
在该实施例中,对于初始内容分析模型,每次模型训练的结果是内容分析模型输出的未调色图像块的色彩增强结果,训练目标是相应的色彩增强参考结果;并根据初始内容分析模型的每次模型训练的结果和训练目标构建初始内容分析模型的损失函数;应理解,第二损失阈值和第二次数阈值可以根据实际训练要求进行自定义设置。
在一些实施例中,内容分析模型的训练过程中,初始内容分析模型的损失函数,是根据未调色图像块的色彩增强结果和色彩增强参考结果的最小化平方误差构建的损失函数。
示例性地，初始内容分析模型的损失函数可以表示为下面的表达式(2)：
Loss = ||I_out - I_gt1||_2^2    (2)
其中，I_out为色彩增强结果，I_gt1为色彩增强参考结果，||I_out - I_gt1||_2^2为色彩增强结果与色彩增强参考结果的差值的L2范数，即初始内容分析模型的每次模型训练的图像特征的L2损失。
在一些实施例中,该模型的训练过程中,也可以采用网络优化器例如Adam优化器,保证每次模型训练的输出结果(色彩增强结果)更接近于模型训练的目标(色彩增强参考结果)。
在本公开色彩增强模型和内容分析模型的训练过程中,可以设置模型训练的学习率(Learning Rate);学习率决定了损失函数能否收敛到局部最小值以及何时收敛到最小值;作为示例,对于色彩增强模型的训练过程和内容分析模型的训练过程,学习率可以相同例如均为1e-4;在另一些示例中,该两个模型的模型训练过程中的学习率也可以不同,学习率的具体取值可以根据实际情况进行设置。
在本公开实施例中,针对不同视频内容类型的风格需求可以采用不同的色彩增强数据对,对初始内容分析模型进行训练,训练后的内容分析模型可以输出色彩增强模型池中每个色彩增强模型与视频帧对应的权重值,将色彩增强模型池中每个色彩增强模型与视频帧对应的权重值与相应色彩增强模型的输出结果,通过加权融合处理,实现对视频帧进行自适应色彩风格增强处理,提升视频色彩增强效率。
可以理解,本公开提及的上述各个方法实施例,在不违背原理逻辑 的情况下,均可以彼此相互结合形成结合后的实施例,限于篇幅,本公开不再赘述。本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。
此外,本公开还提供了视频处理装置、电子设备、计算机可读存储介质,上述均可用来实现本公开提供的任一种视频处理方法,相应技术方案和描述和参见方法部分的相应记载,不再赘述。
本公开实施例提供一种视频处理装置,其是实现本公开上述实施例提供的视频处理方法的相应装置,该装置可采用软件和/或硬件的方式实现,并一般可集成于电子设备中。
图10为本公开实施例的视频处理装置的结构示意图。
参照图10,本公开实施例提供的一种视频处理装置1000包括如下模块。
获取模块1010,用于获取待处理的视频帧。
处理模块1020,用于利用色彩增强模型池中的至少一个色彩增强模型对视频帧进行色彩调整,得到色彩调整后的视频输出帧。
其中,色彩增强模型池包含针对多种视频内容类型的不同风格需求的视频进行色彩调整的模型集合。
根据本公开实施例的视频处理装置,根据该模型池中的至少一个色彩增强模型对获取到的待处理视频的视频帧进行色彩调整,可以自动生成与该视频帧的视频内容类型和风格需求对应的调色效果,视频处理过程可以由执行设备调用模型进行自动化实现,从而可以自动化进行视频色彩创作,节约人力成本,提高视频处理效率。
在一些实施例中,处理模块1020具体可以包括:接收单元,用于接收包含视频帧的视频内容类型和风格需求的信息;选择单元,用于从色彩增强模型池中选择与视频内容类型和风格需求对应的色彩增强模型;模型处理单元,用于利用选择的色彩增强模型对视频帧进行处理,得到 色彩调整后的视频输出帧。
在一些实施例中,处理模块1020具体可以包括:色彩调整单元,用于利用色彩增强模型池中的每个色彩增强模型处理视频帧,得到每个色彩增强模型输出的色彩调整后的视频帧;内容分析单元,用于基于内容分析模型处理视频帧,生成每个色彩增强模型与视频帧对应的权重值;其中,内容分析模型配置为根据视频帧确定色彩增强模型池中的每个色彩增强模型的权重值;融合处理单元,用于使用每个色彩增强模型与视频帧对应的权重值,对每个色彩增强模型输出的色彩调整后的视频帧进行加权融合处理,生成视频输出帧。
在一些实施例中,色彩增强模型中包括:图像重建分支模型和色彩增强分支模型;模型处理单元具体可以包括:提取子单元,用于对视频帧进行特征提取,得到第一图像特征;滤波子单元,用于对视频帧进行滤波处理,并对滤波处理后的视频帧进行特征提取,得到第二图像特征;分支模型处理子单元,用于将第一图像特征输入图像重建分支模型,第二图像特征输入色彩增强分支模型,以根据第一图像特征和第二图像特征进行图像重建和色彩调整;结果融合子单元,用于对图像重建分支模型的输出结果和色彩增强分支模型的输出结果进行特征融合处理,得到融合结果;结果生成子单元,用于根据融合结果,生成色彩调整后的视频输出帧。
在一些实施例中,图像重建分支模型和色彩增强分支模型中均包括:N个密集残差连接网络模块RDB;色彩增强模型中还包括N个全连接层;其中,N为大于或等于1的整数;分支模型处理子单元具体用于:获取图像重建分支模型中的首个RDB和色彩增强分支模型中的首个RDB,作为第一分支模型当前RDB和第二分支模型当前RDB;通过第一分支模型当前RDB处理第一图像特征,通过第二分支模型当前RDB处理第二图像特征,得到第一分支模型当前RDB输出的特征图和第二分支模型 当前RDB输出的低频色彩特征;对于第二分支模型当前RDB输出的低频色彩特征,经一个全连接层处理后得到特征向量,将特征向量与第一分支模型当前RDB输出的特征图对应相乘,得到经局部色彩特征调整的特征图;将经局部色彩特征调整的特征图作为新的第一图像特征,将第二分支模型当前RDB输出的低频色彩特征作为新的第二图像特征;获取图像重建分支模型中的下一个RDB和色彩增强分支模型中的下一个RDB,作为新的第一分支模型当前RDB和新的第二分支模型当前RDB,直到获取次数等于N,将新得到的经局部色彩特征调整的特征图作为图像重建分支模型的输出结果,并将新得到的低频色彩特征作为色彩增强分支模型的输出结果。
在一些实施例中,RDB中包括密集连接网络层和局部特征融合层;局部特征融合层用于:利用局部残差学习将RDB的输入特征与密集连接网络层的输出特征进行特征相加,得到RDB的输出特征;密集连接层包括:M个密集连接模块和1个连接模块,M为大于或等于1的整数;其中,首个密集连接模块的输入为RDB的输入特征;第i个密集连接模块的输入特征为:RDB的输入特征和位于第i个密集连接模块之前的每个密集连接模块的输出特征,i为大于1且小于或等于M的整数;连接模块的输入为:RDB的输入特征和每个密集连接模块的输出特征。
在一些实施例中,色彩增强模型池中的每个色彩增强模型具有相同的模型结构;色彩增强模型池中的色彩增强模型,是预先针对不同视频内容类型的色彩风格采用不同的色彩增强数据对,对初始色彩增强模型进行训练得到的训练好的色彩增强模型;色彩增强数据对包括:从未调色图像和第一已调色图像中的相同图像位置选取的具有相同图像尺寸的未调色图像块和已调色图像块,第一已调色图像是预先获取的基于图像色彩风格特征对未调色图像进行调色后的图像。
在一些实施例中,视频处理装置1000还包括色彩增强模型训练模块, 用于对初始色彩增强模型进行训练;色彩增强训练模块包括:第一调整结果获得单元,用于使用初始色彩增强模型处理从未调色图像中选取的未调色图像块,得到未调色图像块的色彩调整结果;第一参考结果获取单元,用于从色彩增强数据对中,获取与未调色图像块相对应的已调色图像块,作为色彩调整参考结果;第一损失函数构建单元,用于基于色彩调整结果和色彩调整参考结果,构建初始色彩增强模型的损失函数;第一参数调整单元,用于利用初始色彩增强模型的损失函数调整初始色彩增强模型的模型参数,得到更新的色彩增强模型;第一迭代训练单元,用于将更新的色彩增强模型作为新的初始色彩增强模型,并使用新的初始色彩增强模型处理从未调色图像中选取的未调色图像块,直到初始色彩增强模型的训练过程满足第一训练结束条件,得到训练好的色彩增强模型;其中,第一训练结束条件包括如下条件项的至少一项:根据初始色彩增强模型的损失函数计算的训练损失小于或者等于设定的第一损失阈值、对初始色彩增强模型的训练次数达到设定的第一次数阈值。
In some embodiments, when constructing the loss function of the initial color enhancement model based on the color adjustment result and the color adjustment reference result, the color enhancement training module is specifically configured to: calculate the minimized squared error between the color adjustment result and the color adjustment reference result to obtain a minimized squared error loss; calculate the Gaussian-filtered output of the color adjustment result and the Gaussian-filtered output of the color adjustment reference result to obtain a color loss; and perform a weighted summation of the minimized squared error loss and the color loss to generate the loss function of the initial color enhancement model.
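One way this composite loss could be sketched is shown below. The hand-rolled Gaussian kernel, the use of a squared error between the two Gaussian-filtered outputs to obtain the color loss, and the weight of the weighted sum are all assumptions made for illustration; the original text does not fix these specifics.

```python
import torch
import torch.nn.functional as F

def gaussian_blur(img, sigma=1.5, ksize=5):
    # Depthwise Gaussian filtering (illustrative stand-in for the Gaussian filter step).
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    k1d = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    kernel = (k1d[:, None] * k1d[None, :])
    kernel = (kernel / kernel.sum()).repeat(img.shape[1], 1, 1, 1)
    return F.conv2d(img, kernel, padding=ksize // 2, groups=img.shape[1])

def enhancement_loss(result, reference, color_weight=0.5):
    mse = F.mse_loss(result, reference)                                   # minimized squared error loss
    color = F.mse_loss(gaussian_blur(result), gaussian_blur(reference))   # color loss on filtered outputs
    return mse + color_weight * color                                     # weighted sum (weight is a placeholder)
```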
In some embodiments, the content analysis model includes a convolutional network of K stacked convolution structures connected in sequence and two fully connected layers; each of the first K−1 stacked convolution structures includes one convolutional layer, one activation layer, and one pooling layer, and the K-th stacked convolution structure includes one convolutional layer and one activation layer.
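The layer layout could be assembled as in the sketch below. The channel widths, the value of K, the adaptive pooling inserted before the fully connected layers, and the activation between the two fully connected layers are assumptions; the output size is set to the number of color enhancement models in the pool so that one weight per model can be produced.

```python
import torch
import torch.nn as nn

def build_content_analyzer(k=4, channels=32, num_enhancers=4, fc_dim=128):
    """K stacked convolution structures (first K-1 with pooling) followed by two
    fully connected layers. Illustrative sketch only."""
    layers, in_ch = [], 3
    for i in range(k):
        layers += [nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU()]  # conv + activation
        if i < k - 1:
            layers.append(nn.MaxPool2d(2))                               # pooling in the first K-1 structures
        in_ch = channels
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),                    # pooling before the FCs is an assumption
               nn.Linear(channels, fc_dim), nn.ReLU(),
               nn.Linear(fc_dim, num_enhancers)]                         # two fully connected layers
    return nn.Sequential(*layers)

weights = build_content_analyzer()(torch.rand(1, 3, 64, 64))  # one weight per pooled model
```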
In some embodiments, the content analysis model is a trained content analysis model obtained in advance by training an initial content analysis model with image data pairs of different styles, based on each color enhancement model in the pre-trained color enhancement model pool. The image data pair includes an untoned image block and a toned image block that have the same image size and are selected from the same image position in an untoned image and a second toned image, where the second toned image is a pre-acquired image obtained by toning the untoned image based on different video content types.
In some embodiments, the video processing apparatus 1000 further includes a content analysis model training module configured to train the initial content analysis model. The content analysis model training module includes: a second adjustment result obtaining unit configured to process an untoned image block selected from the untoned image with each color enhancement model in the pre-trained color enhancement model pool, to obtain the color adjustment result of each color enhancement model for the untoned image block; a weight value obtaining unit configured to process the untoned image block with the initial content analysis model to obtain a predetermined number of weight values, the predetermined number being the same as the number of color enhancement models in the color enhancement model pool; an enhancement result obtaining unit configured to perform weighted fusion processing on the color adjustment results of each color enhancement model for the untoned image block according to the predetermined number of weight values, to obtain the color enhancement result of the untoned image block; a second reference result obtaining unit configured to obtain, from the image data pair, the toned image block corresponding to the untoned image block as a color enhancement reference result; a second loss function construction unit configured to construct the loss function of the initial content analysis model based on the color enhancement result of the untoned image block and the color enhancement reference result; a second parameter adjustment unit configured to adjust the model parameters of the initial content analysis model by using the loss function of the initial content analysis model, to obtain an updated content analysis model; and a second iterative training unit configured to take the updated content analysis model as a new initial content analysis model and process the untoned image block with the new initial content analysis model, until the training process of the initial content analysis model satisfies a second training end condition, so as to obtain a trained content analysis model. The second training end condition includes at least one of the following: the training loss calculated from the loss function of the initial content analysis model is less than or equal to a set second loss threshold, and the number of training iterations of the initial content analysis model reaches a set second count threshold.
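A sketch of this training flow under simplifying assumptions: the pooled color enhancement models are kept fixed, the data pairs are supplied as (untoned patch, toned patch) tensors with a batch size of one, and the names and thresholds are placeholders.

```python
import torch

def train_content_analyzer(analyzer, enhancer_pool, data_pairs,
                           max_steps=10000, loss_threshold=1e-3):
    """The analyzer produces one weight per pooled model, the per-model color
    adjustment results are fused with those weights, and the L2 loss of
    expression (2) against the toned patch drives the update. Sketch only."""
    optimizer = torch.optim.Adam(analyzer.parameters(), lr=1e-4)
    for step, (untoned_patch, toned_patch) in enumerate(data_pairs):
        with torch.no_grad():  # the pre-trained color enhancement models are not updated
            results = torch.stack([m(untoned_patch) for m in enhancer_pool], dim=0)
        w = analyzer(untoned_patch).view(len(enhancer_pool), 1, 1, 1, 1)
        fused = (w * results).sum(dim=0)             # color enhancement result of the patch
        loss = torch.norm(fused - toned_patch, p=2)  # expression (2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Second training end condition: loss threshold or count threshold.
        if loss.item() <= loss_threshold or step + 1 >= max_steps:
            break
    return analyzer
```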
In some embodiments, the loss function of the initial content analysis model is a loss function constructed from the minimized squared error between the color enhancement result of the untoned image block and the color enhancement reference result.
According to the video processing apparatus of the embodiment of the present disclosure, the color enhancement models in the color enhancement model pool and the content analysis model can each process the input video frame. The weight values output by the content analysis model, one for each color enhancement model in the pool and specific to the video frame, are used to assign a per-frame weight to each color enhancement model, and the final output is the color adjustment result obtained by weighted fusion of the processing results of all the pooled models. Because this result is based on an analysis of the video content type of the video to be processed, the color adjustment better matches the video content type; the automatic adjustment driven by the weight assignment of the content analysis model thus satisfies the color style requirements of the corresponding video content type and improves the data processing efficiency of video color adjustment.
It should be made clear that the present disclosure is not limited to the specific configurations and processing described in the above embodiments and shown in the figures. For convenience and brevity of description, detailed descriptions of known methods are omitted here; for the specific working processes of the systems, modules, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
FIG. 11 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Referring to FIG. 11, an embodiment of the present disclosure provides an electronic device including: at least one processor 1101; at least one memory 1102; and one or more I/O interfaces 1103 connected between the processor 1101 and the memory 1102. The memory 1102 stores one or more computer programs executable by the at least one processor 1101, and the one or more computer programs are executed by the at least one processor 1101 to enable the at least one processor 1101 to perform the video processing method described above.
It can be understood that the above implementations are merely exemplary implementations adopted to illustrate the principles of the present disclosure, and the present disclosure is not limited thereto. For those of ordinary skill in the art, various modifications and improvements can be made without departing from the spirit and essence of the present disclosure, and such modifications and improvements are also regarded as falling within the protection scope of the present disclosure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor/processing core, implements the video processing method described above. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the video processing method described above.
Those of ordinary skill in the art can understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and apparatuses, can be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
As is well known to those of ordinary skill in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information (such as computer-readable program instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable program instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions used to carry out the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by using state information of the computer-readable program instructions; this electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
The computer program product described herein can be implemented specifically in hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and cause a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device, so that a series of operational steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless expressly stated otherwise, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments. Accordingly, those skilled in the art will understand that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (16)

  1. A video processing method, comprising:
    obtaining a video frame of a video to be processed;
    performing color adjustment on the video frame by using at least one color enhancement model in a color enhancement model pool to obtain a color-adjusted video output frame;
    wherein the color enhancement model pool comprises a collection of models for color adjustment of videos with different style requirements of multiple video content types.
  2. The method according to claim 1, wherein performing color adjustment on the video frame by using at least one color enhancement model in the color enhancement model pool to obtain the color-adjusted video output frame comprises:
    receiving information containing the video content type and style requirement of the video frame;
    selecting, from the color enhancement model pool, a color enhancement model corresponding to the video content type and style requirement; and
    processing the video frame with the selected color enhancement model to obtain the color-adjusted video output frame.
  3. The method according to claim 1, wherein performing color adjustment on the video frame by using at least one color enhancement model in the color enhancement model pool to obtain the color-adjusted video output frame comprises:
    processing the video frame with each color enhancement model in the color enhancement model pool to obtain the color-adjusted video frame output by each color enhancement model;
    processing the video frame based on a content analysis model to generate a weight value corresponding to each color enhancement model and the video frame, wherein the content analysis model is configured to determine the weight value of each color enhancement model in the color enhancement model pool according to the video frame; and
    performing weighted fusion processing on the color-adjusted video frames output by each color enhancement model by using the weight value corresponding to each color enhancement model and the video frame, to generate the video output frame.
  4. The method according to claim 2, wherein the color enhancement model comprises an image reconstruction branch model and a color enhancement branch model, and processing the video frame with the selected color enhancement model to obtain the color-adjusted video output frame comprises:
    performing feature extraction on the video frame to obtain a first image feature;
    filtering the video frame and performing feature extraction on the filtered video frame to obtain a second image feature;
    inputting the first image feature into the image reconstruction branch model and the second image feature into the color enhancement branch model, to perform image reconstruction and color adjustment according to the first image feature and the second image feature;
    performing feature fusion processing on the output result of the image reconstruction branch model and the output result of the color enhancement branch model to obtain a fusion result; and
    generating the color-adjusted video output frame according to the fusion result.
  5. The method according to claim 4, wherein the image reconstruction branch model and the color enhancement branch model each comprise N residual dense blocks (RDBs), the color enhancement model further comprises N fully connected layers, and N is an integer greater than or equal to 1;
    inputting the first image feature into the image reconstruction branch model and the second image feature into the color enhancement branch model, to perform image reconstruction and color adjustment according to the first image feature and the second image feature, comprises:
    taking the first RDB in the image reconstruction branch model and the first RDB in the color enhancement branch model as a current RDB of the first branch model and a current RDB of the second branch model;
    processing the first image feature with the current RDB of the first branch model and processing the second image feature with the current RDB of the second branch model, to obtain a feature map output by the current RDB of the first branch model and a low-frequency color feature output by the current RDB of the second branch model;
    processing the low-frequency color feature output by the current RDB of the second branch model with one fully connected layer to obtain a feature vector, and multiplying the feature vector correspondingly with the feature map output by the current RDB of the first branch model, to obtain a feature map adjusted by local color features;
    taking the feature map adjusted by the local color features as a new first image feature, and taking the low-frequency color feature output by the current RDB of the second branch model as a new second image feature; and
    taking the next RDB in the image reconstruction branch model and the next RDB in the color enhancement branch model as a new current RDB of the first branch model and a new current RDB of the second branch model, until the number of acquisitions equals N, and taking the newly obtained feature map adjusted by the local color features as the output result of the image reconstruction branch model and the newly obtained low-frequency color feature as the output result of the color enhancement branch model.
  6. The method according to claim 5, wherein
    the RDB comprises a densely connected network layer and a local feature fusion layer;
    the local feature fusion layer is configured to add the input features of the RDB and the output features of the densely connected network layer through local residual learning, to obtain the output features of the RDB;
    the densely connected layer comprises M densely connected modules and one connection module, M being an integer greater than or equal to 1; wherein
    an input of the first densely connected module is the input feature of the RDB;
    input features of the i-th densely connected module are the input feature of the RDB and the output features of every densely connected module preceding the i-th densely connected module, i being an integer greater than 1 and less than or equal to M; and
    an input of the connection module is the input feature of the RDB and the output features of every densely connected module.
  7. The method according to any one of claims 1-6, wherein each color enhancement model in the color enhancement model pool has the same model structure;
    the color enhancement models in the color enhancement model pool are trained color enhancement models obtained in advance by training an initial color enhancement model with different color enhancement data pairs for the color styles of different video content types; and
    the color enhancement data pair comprises an untoned image block and a toned image block that have the same image size and are selected from the same image position in an untoned image and a first toned image, the first toned image being a pre-acquired image obtained by toning the untoned image based on image color style features.
  8. The method according to claim 7, wherein the training of the initial color enhancement model comprises:
    processing an untoned image block selected from the untoned image with the initial color enhancement model to obtain a color adjustment result of the untoned image block, and obtaining, from the color enhancement data pair, the toned image block corresponding to the untoned image block as a color adjustment reference result;
    constructing a loss function of the initial color enhancement model based on the color adjustment result and the color adjustment reference result;
    adjusting the model parameters of the initial color enhancement model by using the loss function of the initial color enhancement model to obtain an updated color enhancement model; and
    taking the updated color enhancement model as a new initial color enhancement model and processing, with the new initial color enhancement model, an untoned image block selected from the untoned image, until the training process of the initial color enhancement model satisfies a first training end condition, to obtain the trained color enhancement model; wherein
    the first training end condition comprises at least one of the following: the training loss calculated from the loss function of the initial color enhancement model is less than or equal to a set first loss threshold, and the number of training iterations of the initial color enhancement model reaches a set first count threshold.
  9. The method according to claim 8, wherein constructing the loss function of the initial color enhancement model based on the color adjustment result and the color adjustment reference result comprises:
    calculating the minimized squared error between the color adjustment result and the color adjustment reference result to obtain a minimized squared error loss;
    calculating the Gaussian-filtered output of the color adjustment result and the Gaussian-filtered output of the color adjustment reference result to obtain a color loss; and
    performing a weighted summation of the minimized squared error loss and the color loss to generate the loss function of the initial color enhancement model.
  10. The method according to claim 3, wherein the content analysis model comprises a convolutional network of K stacked convolution structures connected in sequence and two fully connected layers, each of the first K−1 stacked convolution structures comprises one convolutional layer, one activation layer, and one pooling layer, and the K-th stacked convolution structure comprises one convolutional layer and one activation layer.
  11. The method according to claim 3 or 10, wherein
    the content analysis model is a trained content analysis model obtained by training an initial content analysis model with image data pairs of different styles, based on each color enhancement model in the pre-trained color enhancement model pool; and
    the image data pair comprises an untoned image block and a toned image block that have the same image size and are selected from the same image position in an untoned image and a second toned image, the second toned image being a pre-acquired image obtained by toning the untoned image based on different video content types.
  12. The method according to claim 11, wherein the training of the initial content analysis model comprises:
    processing an untoned image block selected from the untoned image with each color enhancement model in the pre-trained color enhancement model pool, to obtain the color adjustment result of each color enhancement model for the untoned image block;
    processing the untoned image block with the initial content analysis model to obtain a predetermined number of weight values, the predetermined number being the same as the number of color enhancement models in the color enhancement model pool;
    performing weighted fusion processing on the color adjustment results of each color enhancement model for the untoned image block according to the predetermined number of weight values, to obtain a color enhancement result of the untoned image block, and obtaining, from the image data pair, the toned image block corresponding to the untoned image block as a color enhancement reference result;
    constructing the loss function of the initial content analysis model based on the color enhancement result of the untoned image block and the color enhancement reference result;
    adjusting the model parameters of the content analysis model by using the loss function of the initial content analysis model, to obtain an updated content analysis model with the adjusted model parameters; and
    taking the updated content analysis model as a new initial content analysis model and processing the untoned image block with the new initial content analysis model, until the training process of the initial content analysis model satisfies a second training end condition, to obtain the trained content analysis model;
    wherein the second training end condition comprises at least one of the following: the training loss calculated from the loss function of the initial content analysis model is less than or equal to a set second loss threshold, and the number of training iterations of the initial content analysis model reaches a set second count threshold.
  13. The method according to claim 12, wherein
    the loss function of the initial content analysis model is a loss function constructed from the minimized squared error between the color enhancement result of the untoned image block and the color enhancement reference result.
  14. A video processing apparatus, comprising:
    an obtaining module configured to obtain a video frame to be processed;
    a processing module configured to perform color adjustment on the video frame by using at least one color enhancement model in a color enhancement model pool to obtain a color-adjusted video output frame;
    wherein the color enhancement model pool comprises a collection of models for color adjustment of videos with different style requirements of multiple video content types.
  15. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the computer program, when executed by the processor, implements the video processing method according to any one of claims 1 to 13.
  16. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the video processing method according to any one of claims 1 to 13.
PCT/CN2022/083016 2022-03-25 2022-03-25 Video processing method and apparatus, electronic device, and computer-readable storage medium WO2023178648A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/083016 WO2023178648A1 (zh) 2022-03-25 2022-03-25 Video processing method and apparatus, electronic device, and computer-readable storage medium
CN202280000545.5A CN117157665A (zh) 2022-03-25 2022-03-25 Video processing method and apparatus, electronic device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/083016 WO2023178648A1 (zh) 2022-03-25 2022-03-25 Video processing method and apparatus, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2023178648A1 true WO2023178648A1 (zh) 2023-09-28

Family

ID=88099482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083016 WO2023178648A1 (zh) 2022-03-25 2022-03-25 Video processing method and apparatus, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN117157665A (zh)
WO (1) WO2023178648A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257600A (zh) * 2018-11-28 2019-01-22 福建帝视信息科技有限公司 Deep-learning-based adaptive removal method for video compression artifacts
CN111031346A (zh) * 2019-10-28 2020-04-17 网宿科技股份有限公司 Method and device for enhancing video image quality
CN113228660A (zh) * 2018-12-18 2021-08-06 杜比实验室特许公司 Machine-learning-based dynamic composing in enhanced standard dynamic range video (SDR+)
CN113298740A (zh) * 2021-05-27 2021-08-24 中国科学院深圳先进技术研究院 Image enhancement method and apparatus, terminal device, and storage medium
CN113313650A (zh) * 2021-06-09 2021-08-27 北京百度网讯科技有限公司 Image quality enhancement method, apparatus, device, and medium
CN113780252A (zh) * 2021-11-11 2021-12-10 深圳思谋信息科技有限公司 Training method for video processing model, video processing method and apparatus


Also Published As

Publication number Publication date
CN117157665A (zh) 2023-12-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22932710

Country of ref document: EP

Kind code of ref document: A1