WO2022242122A1 - Video optimization method and apparatus, terminal device, and storage medium - Google Patents

Video optimization method and apparatus, terminal device, and storage medium

Info

Publication number
WO2022242122A1
WO2022242122A1 (PCT/CN2021/137583)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
video
feature
video frame
frames
Prior art date
Application number
PCT/CN2021/137583
Other languages
English (en)
Chinese (zh)
Inventor
刘翼豪
赵恒远
董超
乔宇
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2022242122A1 publication Critical patent/WO2022242122A1/fr

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Definitions

  • the present application relates to the technical field of deep learning, and in particular to a video optimization method, device, terminal equipment and storage medium.
  • Video optimization generally includes optimization operations such as video denoising, video rain removal, video super-resolution, video color correction, and black-and-white video colorization.
  • image optimization models such as image denoising models, image deraining models, super-resolution models, image color toning models, black and white image coloring models, etc.
  • in order to optimize a video, image optimization models are often used to extract the intermediate features of each video frame in the video and to perform feature estimation on the intermediate features of each video frame, obtaining the optimized image corresponding to each video frame, so as to realize the optimization of the video.
  • this method of independently optimizing each video frame in the video based on the image optimization model may cause different video frames to have different optimization effects, affecting the continuity of the optimized video.
  • the present application provides a video optimization method, device, terminal equipment, and storage medium, so as to improve the continuity of the optimized video.
  • the present application provides a video optimization method, including:
  • the video frame sequence includes N video frames;
  • the M anchor frames include the first video frame of the video frame sequence, and M is a positive integer greater than 2 and less than N;
  • the trained optical flow network is used to respectively determine the forward optical flow parameters and reverse optical flow parameters of the N-M intermediate frames, where the forward optical flow parameters of an intermediate frame describe the transformation from the previous frame of the intermediate frame to the intermediate frame, and the reverse optical flow parameters of an intermediate frame describe the transformation from the subsequent frame of the intermediate frame to the intermediate frame;
  • the intermediate frames are the video frames other than the anchor frames in the video to be optimized; according to the forward optical flow parameters and reverse optical flow parameters of the N-M intermediate frames and the intermediate features of the M anchor frames, the intermediate features of the N-M intermediate frames are determined; the trained feature estimation network is then used to respectively perform feature estimation on the intermediate features of each frame of the video frame sequence to obtain N optimized images, and the N optimized images constitute the optimized video of the video frame sequence.
  • the determining of the intermediate features of the N-M intermediate frames includes:
  • for the i-th video frame in the video frame sequence, where i takes values in {1, 2, ..., N-1, N}, when the i-th video frame is an intermediate frame: using the forward optical flow parameters of the i-th video frame to transform the shape of the intermediate features of the (i-1)-th video frame to obtain the forward feature of the i-th video frame; using the reverse optical flow parameters of the i-th video frame to transform the shape of the reverse feature of the (i+1)-th video frame to obtain the reverse feature of the i-th video frame; and performing feature fusion on the forward feature of the i-th video frame and the reverse feature of the i-th video frame to obtain the intermediate features of the i-th video frame; wherein, if the (i+1)-th video frame is an anchor frame, the reverse feature of the (i+1)-th video frame takes the value of the intermediate features of the (i+1)-th video frame.
  • feature fusion is performed on the forward feature of the i-th video frame and the reverse feature of the i-th video frame to obtain the intermediate features of the i-th video frame, including:
  • the (i-1)-th video frame, the i-th video frame, the (i+1)-th video frame, the forward feature of the i-th video frame, the reverse feature of the i-th video frame, the forward feature of the (i-1)-th video frame and the reverse feature of the (i+1)-th video frame are input into the trained FFM model for fusion processing to obtain the intermediate features of the i-th video frame; wherein, if the (i-1)-th video frame is an anchor frame, the forward feature of the (i-1)-th video frame takes the value of the intermediate features of the (i-1)-th video frame.
  • fusion processing includes:
  • the forward feature of the frame and the reverse feature of the (i+1)-th video frame are convolved to obtain supplementary features; the supplementary features and the weighted features are superimposed to obtain the intermediate features of the i-th video frame.
  • the method also includes:
  • constructing a video optimization initial model, which includes a feature extraction initial network, an optical flow initial network, a feature estimation initial network and an FFM initial model; using a preset loss function and a training set to conduct unsupervised training on the video optimization initial model to obtain the trained feature extraction network, optical flow network, feature estimation network and FFM model; wherein the training set includes a plurality of video frame sequence samples to be optimized.
  • the feature extraction network and the feature estimation network are obtained by splitting a preset image optimization model, and the image optimization model is used to perform image optimization on a two-dimensional image.
  • the image optimization model is an image coloring model
  • the video frame sequence includes N frames of grayscale images; for the i-th grayscale image in the video frame sequence, where i takes values in {1, 2, ..., N-1, N}, the feature estimation network is used to perform feature estimation on the intermediate features of the i-th grayscale image, and obtaining the optimized image of the i-th grayscale image includes:
  • the present application provides a video optimization device, including:
  • the extraction unit is used to utilize the trained feature extraction network to respectively extract the intermediate features of the M anchor frames in the video frame sequence to be optimized, where the video frame sequence includes N video frames, the M anchor frames include the first video frame and the Nth video frame of the video frame sequence, and M is a positive integer greater than 2 and less than N;
  • the determining unit is used to determine the forward optical flow parameters and reverse optical flow parameters of the N-M intermediate frames respectively by using the trained optical flow network, where the forward optical flow parameters of an intermediate frame describe the transformation from the previous frame of the intermediate frame to the intermediate frame, the reverse optical flow parameters of an intermediate frame describe the transformation from the subsequent frame of the intermediate frame to the intermediate frame, and the intermediate frames are the video frames other than the anchor frames in the video to be optimized;
  • the determination unit is also used to determine the intermediate features of the N-M frame intermediate frames according to the forward optical flow parameters and reverse optical flow parameters of the N-M frame intermediate frames, and the intermediate features of the M frame anchor frame;
  • the estimation unit is used to use the trained feature estimation network to perform feature estimation processing on the intermediate features of the N frames of the video frame sequence to obtain N frames of optimized images, and the N frames of optimized images constitute the optimized video of the video frame sequence.
  • the present application provides a terminal device, including: a memory and a processor, where the memory is used to store a computer program; and the processor is used to execute the method described in any one of the above first aspects when calling the computer program.
  • the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method described in any one of the above-mentioned first aspects is implemented.
  • an embodiment of the present application provides a computer program product, which, when the computer program product runs on a processor, causes the processor to execute the method described in any one of the above-mentioned first aspects.
  • the intermediate features of the anchor frame are extracted by using the feature extraction network.
  • the optical flow parameters between each intermediate frame and the two frames adjacent to it are obtained through the optical flow network (that is, the forward optical flow parameters describing the transformation from the previous frame of the intermediate frame to the intermediate frame, and the reverse optical flow parameters describing the transformation from the subsequent frame of the intermediate frame to the intermediate frame).
  • the intermediate features of the intermediate frames are then calculated using the optical flow parameters and the intermediate features of the anchor frames located before and after the intermediate frame.
  • the intermediate features of the intermediate frames are obtained by forward-propagating and back-propagating the intermediate features of the anchor frames across the intermediate frames, so the intermediate features of the intermediate frames retain the frame-to-frame transformation information; therefore, the optimized video obtained after feature estimation based on the intermediate features of each frame improves the continuity to a certain extent.
  • FIG. 1 is a schematic diagram of a network structure of a video optimization model provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a video optimization method provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a network structure of an FFM model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a video optimization device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the image optimization model is often directly used to individually optimize each frame in the video to achieve video optimization.
  • This method of independently optimizing each video frame in the video based on the image optimization model may cause different video frames to have different optimization effects, affecting the continuity of the optimized video.
  • the present application provides a video optimization method: after the intermediate features of the anchor frames in the sequence of video frames to be optimized are extracted, the intermediate features of the anchor frames are forward-propagated and back-propagated across the intermediate frames (the video frames located between the anchor frames) to calculate the intermediate features of the intermediate frames.
  • the intermediate features of the intermediate frames retain the transformation information between frames. Therefore, the optimized video obtained after performing feature estimation on the intermediate features of each frame can guarantee the continuity to a certain extent.
  • a video optimization model provided by this application is exemplarily introduced with reference to FIG. 1 .
  • the video optimization model is deployed in a video processing device, and the video processing device can process a sequence of video frames to be optimized based on the video optimization model, so as to implement the video optimization method provided in this application.
  • the video processing device may be a mobile terminal device such as a smart phone, a tablet computer, or a video camera, or may be a terminal device capable of processing video data such as a desktop computer, a robot, or a server.
  • the video optimization model provided by the present application includes a feature extraction network G_E, an optical flow network (FlowNet) and a feature estimation network G_C.
  • the feature extraction network is used to extract the intermediate features of the input image, and the size of the intermediate features matches the input size required by the feature estimation network.
  • the feature estimation network is used to perform feature estimation on the input intermediate features (including feature mapping, feature reconstruction, etc.), and the output is an optimized image.
  • the feature extraction network and the feature estimation network can be obtained by splitting an image optimization model for image optimization on 2D images.
  • the feature extraction network and feature estimation network are obtained by splitting the image coloring model.
  • the image coloring model can be any network model capable of automatically coloring black and white images, for example, Pix2Pix model, colornet.t7 model, colornet_imagenet.t7 model, etc.
  • the image coloring model generally extracts the intermediate features of the input grayscale image (that is, a black-and-white image, which can be regarded as the L-channel image in the Lab domain) through network layers such as convolutional layers, activation layers and/or pooling layers applied layer by layer, then performs color mapping or color reconstruction on the finally extracted intermediate features to obtain an a-channel image and a b-channel image, and finally constructs the color image corresponding to the grayscale image in the Lab domain from the a-channel image, the b-channel image and the input grayscale image.
  • the sub-network whose input is a grayscale image and whose output is an intermediate feature is defined as a feature extraction network;
  • the sub-network whose input is an intermediate feature and whose output is a color image is defined as a feature estimation network.
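  • As an illustration of the split just described, the following is a minimal sketch in which a toy colorization network is divided into a feature extraction part and a feature estimation part; the layer sizes and the split point are assumptions made for illustration and do not correspond to any particular model named above.

```python
import torch.nn as nn

# Toy colorization model (assumed architecture): grayscale L channel in, a/b channels out.
colorizer = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),   # output here = intermediate features
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 2, 3, padding=1),                  # predicted a/b channels
)

# Split at the layer whose output is the intermediate feature (index chosen for illustration).
layers = list(colorizer.children())
G_E = nn.Sequential(*layers[:6])   # feature extraction network: grayscale image -> intermediate features
G_C = nn.Sequential(*layers[6:])   # feature estimation network: intermediate features -> a/b channels

# At inference time, G_C(G_E(gray)) reproduces the original colorizer's output; the color image
# in the Lab domain is then assembled from the input L channel and the predicted a/b channels.
```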
  • the feature extraction network and feature estimation network are obtained by splitting the super-resolution model.
  • the super-resolution model can be any network model capable of mapping low-resolution images to high-resolution images, for example, FSRCNN model, CARN model, SRResNet model, RCAN model, etc.
  • the super-resolution model generally extracts the intermediate features of the input low-resolution image through layer-by-layer convolutional layers, residual layers, pooling layers and/or deconvolution layers, and then upsamples the finally extracted intermediate features (i.e., image reconstruction) to obtain the corresponding high-resolution image.
  • the sub-network whose input is a low-resolution image and whose output is an intermediate feature is defined as a feature extraction network
  • the sub-network whose input is an intermediate feature and whose output is a high-resolution image is defined as a feature estimation network.
  • video optimization scenarios such as video rain removal, video defogging, and video color adjustment may also be included.
  • the optical flow network is used to estimate the optical flow parameters of two adjacent video frames, that is, the amount of movement of the same object from one video frame to the other, which describes the transformation relationship from one video frame to the other.
  • FlowNet2.0 may be used as the optical flow network in this application.
  • the video processing device obtains the video frame sequence to be optimized, and after determining the M anchor frames and the N-M intermediate frames in the video frame sequence, the video frame sequence can be input into the trained video optimization model for processing to obtain the optimized video.
  • the video frame sequence to be optimized may be a video segment cut out from a video, or a complete video.
  • the video frame sequence includes N video frames.
  • among the N video frames there are M anchor frames, including the first video frame and the Nth video frame, where M is a positive integer greater than 2 and less than N.
  • the M anchor frames may be designated manually, or may be identified by the video processing device from the N video frames according to a preset anchor frame extraction rule (a sketch of such a rule is given after this passage). For example, if the interval between anchor frames is set to 10 intermediate frames, the video processing device can start from the first video frame, identify the first video frame as the first anchor frame, identify the twelfth video frame (after an interval of 10 intermediate frames) as the second anchor frame, and so on, until the Nth video frame is identified as the Mth anchor frame. It can be understood that the number of intermediate frames between the Mth anchor frame and the (M-1)-th anchor frame may be less than 10.
  • an intermediate frame is a video frame located between two adjacent anchor frames among the N video frames; for example, if the first video frame and the 12th video frame are two adjacent anchor frames, the 2nd to 11th video frames located between them are intermediate frames.
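  • A minimal sketch of such an extraction rule, assuming the interval of 10 intermediate frames used in the example above (indices here are 0-based):

```python
def anchor_indices(n_frames: int, gap: int = 10) -> list[int]:
    """Pick anchor frames: the first frame, then one frame after every `gap` intermediate
    frames, and always the last frame. The last two anchors may therefore be separated
    by fewer than `gap` intermediate frames."""
    idx = list(range(0, n_frames, gap + 1))
    if idx[-1] != n_frames - 1:
        idx.append(n_frames - 1)
    return idx

# Example: 30 frames -> anchors at the 1st, 12th, 23rd and 30th frames (1-based counting).
print(anchor_indices(30))  # [0, 11, 22, 29]
```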
  • the video processing device performs video optimization on the video frame sequence to be optimized based on the video optimization model, as shown in FIG. 2 , including:
  • the first video frame x1 and the fourth video frame x4 are anchor frames
  • the second video frame x2 and the third video frame x3 are intermediate frames.
  • the video processing device inputs x1 and x4 respectively into the feature extraction network G_E for processing to obtain the intermediate feature F1 of x1 and the intermediate feature F4 of x4.
  • the forward optical flow parameters of the intermediate frame are used to describe the transformation relationship from the previous frame of the intermediate frame to the intermediate frame
  • the reverse optical flow parameters of the intermediate frame are used to describe the transition from the next frame of the intermediate frame to the intermediate frame.
  • the video processing device inputs x1 and x2 into the optical flow network and obtains the forward optical flow parameter f1→2 of x2 (used to describe the transformation relationship from x1 to x2).
  • the video processing device inputs x2 and x3 into the optical flow network to obtain the forward optical flow parameter f2→3 of x3 (used to describe the transformation relationship from x2 to x3).
  • the video processing device inputs x4 and x3 into the optical flow network to obtain the reverse optical flow parameter f4→3 of x3 (used to describe the transformation relationship from x4 to x3).
  • the optical flow parameters are used to make the intermediate features of the two anchor frames propagate across the intermediate frames. That is to say, the optical flow parameters between each intermediate frame and its two adjacent video frames are calculated through the optical flow network, and based on each optical flow parameter, the intermediate features of the anchor frames are propagated forward or backward frame by frame, such that the intermediate features of the intermediate frames are aligned with the intermediate features of the anchor frames.
  • for the i-th video frame, where i takes values in {1, 2, ..., N-1, N}, when the i-th video frame is an intermediate frame:
  • the video processing device can use the forward optical flow parameters of the i-th video frame to perform shape transformation on the intermediate features of the (i-1)-th video frame to obtain the forward feature of the i-th video frame; use the reverse optical flow parameters of the i-th video frame to perform shape transformation on the reverse feature of the (i+1)-th video frame to obtain the reverse feature of the i-th video frame; and perform feature fusion on the forward feature of the i-th video frame and the reverse feature of the i-th video frame to obtain the intermediate features of the i-th video frame.
  • the intermediate feature of an anchor frame can also be used as the reverse feature and the forward feature of that anchor frame, that is, the intermediate feature, reverse feature and forward feature of an anchor frame have the same value. In other words, if the (i+1)-th video frame is an anchor frame, the reverse feature of the (i+1)-th video frame takes the value of the intermediate feature of the (i+1)-th video frame extracted by the feature extraction network.
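  • The shape-transformation (warp) operation used below can be sketched with PyTorch's grid_sample; this is a common way to realize flow-based warping and is offered as an assumption about the implementation rather than the patent's exact operator.

```python
import torch
import torch.nn.functional as F

def warp(feature: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a feature map (B, C, H, W) with an optical flow field (B, 2, H, W).

    flow[:, 0] and flow[:, 1] are assumed to hold the horizontal and vertical
    displacements, in pixels, that align the source feature with the target frame.
    """
    b, _, h, w = feature.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feature.device)     # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                                   # displaced sampling positions
    # Normalize to [-1, 1] as required by grid_sample, then sample bilinearly.
    norm_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    norm_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((norm_x, norm_y), dim=-1)                        # (B, H, W, 2)
    return F.grid_sample(feature, grid, align_corners=True)
```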
  • the intermediate feature F4 of x4 is back-propagated to obtain the reverse features of x3 and x2. That is, the reverse optical flow parameter f4→3 of x3 is used to perform a shape-change (warp) operation on F4 to obtain the reverse feature of x3; after that, the reverse optical flow parameter f3→2 of x2 is used to perform a warp operation on the reverse feature of x3 to obtain the reverse feature of x2.
  • the intermediate feature F1 of x1 is forward-propagated to obtain the intermediate features of x2 and x3. That is, the forward optical flow parameter f1→2 of x2 is used to perform a warp operation on F1 to obtain the forward feature of x2, and the forward feature of x2 is then fused with the reverse feature of x2 to obtain the intermediate feature F2 of x2. After obtaining F2, the forward optical flow parameter f2→3 of x3 is used to perform a warp operation on F2 to obtain the forward feature of x3, which is then fused with the reverse feature of x3 to obtain the intermediate feature F3 of x3.
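  • Generalizing the x1-x4 example, the sketch below propagates the intermediate features of two anchor frames across the intermediate frames between them; warp is the flow-based warping sketched earlier, and flow_net, G_E and fuse stand in for the trained optical flow, feature extraction and fusion networks. All names are illustrative, not the patent's code.

```python
def propagate_between_anchors(frames, a, b, G_E, flow_net, fuse, warp):
    """Compute intermediate features for frames a+1 .. b-1, where frames[a] and frames[b]
    are anchor frames (illustrative sketch under the assumptions stated above)."""
    feats = {a: G_E(frames[a]), b: G_E(frames[b])}

    # Backward pass: propagate the right anchor's intermediate feature towards the left,
    # e.g. F4 -> reverse feature of x3 -> reverse feature of x2.
    reverse = {b: feats[b]}
    for i in range(b - 1, a, -1):
        rev_flow = flow_net(frames[i + 1], frames[i])      # f_{i+1 -> i}
        reverse[i] = warp(reverse[i + 1], rev_flow)

    # Forward pass with fusion: propagate the left anchor's feature to the right and
    # fuse it with the already computed reverse feature at every intermediate frame.
    forward = feats[a]
    for i in range(a + 1, b):
        fwd_flow = flow_net(frames[i - 1], frames[i])      # f_{i-1 -> i}
        forward = warp(forward, fwd_flow)
        feats[i] = fuse(forward, reverse[i])               # intermediate feature F_i
        forward = feats[i]                                 # the fused feature keeps propagating
    return feats
```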
  • the intermediate features of one anchor frame are first propagated backward, and the intermediate features of the other anchor frame are then propagated forward.
  • this two-way transmission of information can compensate for the information loss caused by the optical flow network and the warp operation in a single transmission direction, and improves the temporal continuity of the intermediate features of each frame, which is more conducive to the subsequent video optimization effect.
  • since the intermediate features of an intermediate frame are calculated from the intermediate features of the anchor frames located on both sides of it, when a scene change occurs in the video frame sequence, its influence is confined to that time interval (that is, to the span between the two anchor frames) and does not affect the accuracy of the intermediate features of intermediate frames in other time intervals.
  • when performing feature fusion on the forward feature of the i-th video frame and the reverse feature of the i-th video frame, the fusion can be performed by numerical calculation, or a feature fusion network can be set in the video optimization model to perform the feature fusion.
  • the feature fusion network may be a conventional feature fusion network with field-aware capabilities, for example, a field-aware factorization machine (FFM), a factorization machine (FM), and the like.
  • this embodiment of the present application provides an improved FFM model, which takes as input the (i-1)-th video frame x_{i-1}, the i-th video frame x_i, the (i+1)-th video frame x_{i+1}, the forward feature of the i-th video frame, the reverse feature of the i-th video frame, the forward feature of the (i-1)-th video frame and the reverse feature of the (i+1)-th video frame, performs a feature fusion operation, and outputs the intermediate feature F_i of the i-th video frame. That is, as shown in FIG. 1, the video optimization model provided in the embodiment of the present application also includes the FFM model provided in the embodiment of the present application.
  • a convolutional layer is first used to extract features from x_{i-1}, x_i and x_{i+1} respectively.
  • the extracted features are concatenated (concat) to obtain a merged feature.
  • the merged feature is then fed into a weight estimation network and a feature compensation network respectively.
  • the weight estimation network and the feature compensation network are each composed of multiple convolutional layers.
  • the weight estimation network takes the merged feature together with the forward feature and the reverse feature of the i-th video frame as input and, after multi-layer convolution, outputs a weight matrix W.
  • using W to weight the forward feature and the reverse feature selects, pixel by pixel, between the two propagated features and yields a weighted (fused) feature.
  • the feature compensation network likewise takes the merged feature and the propagated forward and reverse features as input and, after multi-layer convolution, outputs a supplementary feature corresponding to the weighted feature.
  • the supplementary feature can restore the information that was lost, in the process of computing the forward and reverse features, due to the optical flow network and the warp operation.
  • by superimposing the supplementary feature and the weighted feature, the intermediate feature F_i of the i-th video frame is obtained.
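  • A minimal PyTorch sketch of such a fusion module is given below. The layer sizes, the exact inputs to each branch, the sigmoid on the weight matrix, and the omission of the neighbouring frames' propagated features are all simplifying assumptions made for illustration, not the patent's verified architecture.

```python
import torch
import torch.nn as nn

class FFMSketch(nn.Module):
    """Illustrative fusion module: selects per pixel between the forward and reverse
    features and adds a supplementary feature compensating for warping losses."""

    def __init__(self, frame_ch: int = 3, feat_ch: int = 64, hid: int = 64):
        super().__init__()
        self.frame_conv = nn.Conv2d(3 * frame_ch, hid, 3, padding=1)   # x_{i-1}, x_i, x_{i+1} concatenated
        # Weight estimation network: merged feature + both propagated features -> weight matrix W.
        self.weight_net = nn.Sequential(
            nn.Conv2d(hid + 2 * feat_ch, hid, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hid, feat_ch, 3, padding=1), nn.Sigmoid(),
        )
        # Feature compensation network: predicts the supplementary feature lost to flow/warp.
        self.comp_net = nn.Sequential(
            nn.Conv2d(hid + 2 * feat_ch, hid, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hid, feat_ch, 3, padding=1),
        )

    def forward(self, x_prev, x_cur, x_next, f_fwd, f_bwd):
        frame_feat = self.frame_conv(torch.cat([x_prev, x_cur, x_next], dim=1))
        merged = torch.cat([frame_feat, f_fwd, f_bwd], dim=1)
        w = self.weight_net(merged)                     # per-pixel selection weight W
        weighted = w * f_fwd + (1.0 - w) * f_bwd        # weighted (fused) feature
        supplement = self.comp_net(merged)              # supplementary feature
        return weighted + supplement                    # intermediate feature F_i
```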
  • when constructing F_i of the i-th video frame, the FFM model provided by this application refers not only to the (i-1)-th video frame x_{i-1} and the (i+1)-th video frame x_{i+1}, but also to the forward feature of the (i-1)-th video frame and the reverse feature of the (i+1)-th video frame. That is, the information of the previous and subsequent frames is considered, so that F_i of the i-th video frame is temporally more continuous with the intermediate features of the previous and subsequent frames, while the information lost to the optical flow network and the warp operation can be supplemented. Therefore, the continuity of the intermediate features of the intermediate frames can be further improved.
  • the feature estimation network is used to perform feature estimation on the intermediate features of the i-th frame grayscale image
  • the optimized image of the i-th frame grayscale image includes:
  • the anchor frames are selected, and after the intermediate features of the anchor frames are extracted, these features are forward-propagated and back-propagated across the intermediate frames to compute the intermediate features of the intermediate frames.
  • the intermediate features of the intermediate frames retain the transformation information between frames. Therefore, the optimized video obtained after performing feature estimation on the intermediate features of each frame can guarantee the continuity to a certain extent.
  • an initial model for video optimization which includes an initial network for feature extraction, an initial network for optical flow, and an initial network for feature estimation.
  • a corresponding image optimization model can be selected based on a specific video optimization scenario. Then the image optimization model is split to obtain the corresponding feature extraction initial network and feature estimation initial network.
  • the feature fusion network can also be set in the initial model of video optimization.
  • the improved FFM initial model provided by the present application may be set in the video optimization initial model.
  • the trained video optimization model includes the above-mentioned trained feature extraction network, optical flow network, feature estimation network and FFM model.
  • the training set includes a plurality of video frame sequence samples to be optimized. Since unsupervised training is adopted, the training set may not need to collect corresponding color video frame sequences.
  • the loss function can be designed based on the actual video optimization scenario. For example, taking the video optimization scenario of black-and-white video colorization as an example, the loss function can be designed in terms of the following quantities:
  • M is the occlusion matrix
  • N is the number of frames in the video frame sequence samples
  • d is the interval between adjacent frames
  • a gradient descent algorithm may be used during training.
  • the parameters of the network are learned through iteration.
  • the initial learning rate can be set to 1e-4, and every 50,000 iterations, the learning rate is decayed by half until the network converges.
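  • The patent's loss formula is not reproduced in this text. Purely as a hedged illustration, the sketch below implements a generic warping-based temporal consistency loss over the quantities defined above (occlusion matrix M, N frames, interval d between compared frames), together with the stated learning-rate schedule; it should not be read as the patent's actual loss.

```python
import torch

def temporal_consistency_loss(outputs, flows, occlusion, d, warp):
    """Generic warping-consistency loss (assumed form, not the patent's formula).

    outputs:   list of N optimized frames produced by the model
    flows:     flows[i] is the optical flow mapping frame i-d onto frame i
    occlusion: occlusion matrices M_i masking pixels where the flow is unreliable
    """
    n = len(outputs)
    loss = 0.0
    for i in range(d, n):
        warped_prev = warp(outputs[i - d], flows[i])
        loss = loss + (occlusion[i] * (outputs[i] - warped_prev).abs()).mean()
    return loss / (n - d)

# Learning-rate schedule described above: start at 1e-4 and halve every 50,000 iterations.
params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder for the video optimization model's parameters
optimizer = torch.optim.Adam(params, lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50_000, gamma=0.5)
# call optimizer.step() and scheduler.step() once per training iteration
```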
  • the video optimization model and training method provided by this application are general-purpose and can be applied to any video optimization task, or to any task that uses the video optimization effect as an evaluation index.
  • the embodiment of the present application provides a video optimization device.
  • the device embodiment corresponds to the aforementioned method embodiment.
  • the details will not be described one by one here, but it should be clear that the device in this embodiment can correspondingly implement all the content of the foregoing method embodiments.
  • FIG. 4 is a schematic structural diagram of a video optimization device provided by an embodiment of the present application.
  • the video optimization device provided by this embodiment includes: an extraction unit 401 , a determination unit 402 and an estimation unit 403 .
  • the extraction unit 401 is used to utilize the trained feature extraction network to respectively extract the intermediate features of the M anchor frames in the video frame sequence to be optimized, where the video frame sequence includes N video frames, the M anchor frames include the 1st video frame and the Nth video frame of the video frame sequence, and M is a positive integer greater than 2 and less than N.
  • the determination unit 402 is used to determine the forward optical flow parameters and reverse optical flow parameters of the N-M intermediate frames respectively by using the trained optical flow network, where the forward optical flow parameters of an intermediate frame describe the transformation from the previous frame of the intermediate frame to the intermediate frame, the reverse optical flow parameters of an intermediate frame describe the transformation from the subsequent frame of the intermediate frame to the intermediate frame, and the intermediate frames are the video frames other than the anchor frames in the video to be optimized.
  • the determination unit 402 is further configured to determine the intermediate features of the N-M intermediate frames according to the forward optical flow parameters and reverse optical flow parameters of the N-M intermediate frames, and the intermediate features of the M anchor frames.
  • the estimation unit 403 is configured to use the trained feature estimation network to perform feature estimation processing on the intermediate features of N frames of the video frame sequence to obtain N frames of optimized images, and the N frames of optimized images constitute an optimized video of the video frame sequence.
  • the video optimization device provided in this embodiment can execute the above-mentioned method embodiment, and its implementation principle and technical effect are similar, and details are not repeated here.
  • FIG. 5 is a schematic structural diagram of a terminal device provided in an embodiment of the present application.
  • the terminal device provided in this embodiment includes: a memory 501 and a processor 502, where the memory 501 is used to store a computer program, and the processor 502 is used to execute the methods described in the above method embodiments when the computer program is called.
  • the computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 501 and executed by the processor 502 to complete the methods described in the embodiments of the present application.
  • the one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program in the terminal device.
  • FIG. 5 is only an example of a terminal device and does not constitute a limitation on the terminal device; the terminal device may include more or fewer components than those shown in the figure, or combine certain components, or have different components; for example, the terminal device may also include an input/output device, a network access device, a bus, and the like.
  • the processor 502 can be a central processing unit (Central Processing Unit, CPU), and can also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the storage 501 may be an internal storage unit of the terminal device, such as a hard disk or memory of the terminal device.
  • the memory 501 may also be an external storage device of the terminal device, such as a plug-in hard disk equipped on the terminal device, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), etc.
  • the memory 501 may also include both an internal storage unit of the terminal device and an external storage device.
  • the memory 501 is used to store the computer program and other programs and data required by the terminal device.
  • the memory 501 can also be used to temporarily store data that has been output or will be output.
  • the terminal device provided in this embodiment can execute the foregoing method embodiment, and its implementation principle and technical effect are similar, and details are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method described in the foregoing method embodiment is implemented.
  • the embodiment of the present application further provides a computer program product, which, when the computer program product runs on a terminal device, enables the terminal device to implement the method described in the foregoing method embodiments when executed.
  • if the above-mentioned integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can be completed by instructing the relevant hardware through computer programs, and the computer programs can be stored in a computer-readable storage medium.
  • when the computer program is executed by a processor, the steps in the above-mentioned method embodiments can be realized.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable storage medium may at least include: any entity or device capable of carrying computer program codes to a photographing device/terminal device, a recording medium, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access Memory (Random Access Memory, RAM), electrical carrier signal, telecommunication signal and software distribution medium.
  • for example, a USB flash drive (U disk), a mobile hard disk, a magnetic disk or an optical disk, etc.
  • in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
  • the disclosed device/device and method can be implemented in other ways.
  • the device/apparatus embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the term “if” may be construed, depending on the context, as “when” or “once” or “in response to determining” or “in response to detecting”.
  • the phrase “if determined” or “if [the described condition or event] is detected” may be construed, depending on the context, to mean “once determined” or “in response to the determination” or “once [the described condition or event] is detected” or “in response to detection of [the described condition or event]”.
  • references to "one embodiment” or “some embodiments” or the like in the specification of the present application means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases “in one embodiment,” “in some embodiments,” “in other embodiments,” “in other embodiments,” etc. in various places in this specification are not necessarily All refer to the same embodiment, but mean “one or more but not all embodiments” unless specifically stated otherwise.
  • the terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless specifically stated otherwise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Video optimization method and apparatus, terminal device, and storage medium, relating to the technical field of deep learning and capable of improving the continuity of video optimization. The video optimization method comprises: using a trained feature extraction network to respectively extract intermediate features of M anchor frames in a video frame sequence to be optimized (S201), the video frame sequence comprising N video frames and the M anchor frames comprising a first video frame and an Nth video frame of the video frame sequence; respectively determining a forward optical flow parameter and a reverse optical flow parameter of each of the N-M intermediate frames by using a trained optical flow network (S202); determining intermediate features of the N-M intermediate frames according to the forward optical flow parameters and reverse optical flow parameters of the N-M intermediate frames and the intermediate features of the M anchor frames (S203); and performing feature estimation on the intermediate features of the N video frames of the video frame sequence by using a trained feature estimation network to obtain N optimized images, the N optimized images constituting an optimized video of the video frame sequence (S204).
PCT/CN2021/137583 2021-05-21 2021-12-13 Video optimization method and apparatus, terminal device, and storage medium WO2022242122A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110557336.8 2021-05-21
CN202110557336.8A CN113298728B (zh) Video optimization method and apparatus, terminal device and storage medium

Publications (1)

Publication Number Publication Date
WO2022242122A1 true WO2022242122A1 (fr) 2022-11-24

Family

ID=77323598

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137583 WO2022242122A1 (fr) Video optimization method and apparatus, terminal device, and storage medium

Country Status (2)

Country Link
CN (1) CN113298728B (fr)
WO (1) WO2022242122A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117455798A (zh) * 2023-11-17 2024-01-26 北京同力数矿科技有限公司 Lightweight video denoising method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298728B (zh) 2021-05-21 2023-01-24 中国科学院深圳先进技术研究院 Video optimization method and apparatus, terminal device and storage medium
CN116823973B (zh) * 2023-08-25 2023-11-21 湖南快乐阳光互动娱乐传媒有限公司 Black-and-white video colorization method and apparatus, and computer-readable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109756690A (zh) * 2018-12-21 2019-05-14 西北工业大学 Lightweight video interpolation method based on feature-level optical flow
US20190295228A1 (en) * 2018-03-21 2019-09-26 Nvidia Corporation Image in-painting for irregular holes using partial convolutions
CN112104830A (zh) * 2020-08-13 2020-12-18 北京迈格威科技有限公司 Video frame interpolation method, model training method and corresponding apparatus
CN112584077A (zh) * 2020-12-11 2021-03-30 北京百度网讯科技有限公司 Video frame interpolation method and apparatus, and electronic device
CN113298728A (zh) 2021-05-21 2021-08-24 中国科学院深圳先进技术研究院 Video optimization method and apparatus, terminal device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10176845B2 (en) * 2016-09-23 2019-01-08 Apple Inc. Seamless forward-reverse video loops
US10776688B2 (en) * 2017-11-06 2020-09-15 Nvidia Corporation Multi-frame video interpolation using optical flow

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190295228A1 (en) * 2018-03-21 2019-09-26 Nvidia Corporation Image in-painting for irregular holes using partial convolutions
CN109756690A (zh) * 2018-12-21 2019-05-14 西北工业大学 Lightweight video interpolation method based on feature-level optical flow
CN112104830A (zh) * 2020-08-13 2020-12-18 北京迈格威科技有限公司 Video frame interpolation method, model training method and corresponding apparatus
CN112584077A (zh) * 2020-12-11 2021-03-30 北京百度网讯科技有限公司 Video frame interpolation method and apparatus, and electronic device
CN113298728A (zh) 2021-05-21 2021-08-24 中国科学院深圳先进技术研究院 Video optimization method and apparatus, terminal device and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117455798A (zh) * 2023-11-17 2024-01-26 北京同力数矿科技有限公司 Lightweight video denoising method and system

Also Published As

Publication number Publication date
CN113298728B (zh) 2023-01-24
CN113298728A (zh) 2021-08-24

Similar Documents

Publication Publication Date Title
WO2022242122A1 (fr) Video optimization method and apparatus, terminal device, and storage medium
CN109493350B (zh) 人像分割方法及装置
CN108416327B (zh) 一种目标检测方法、装置、计算机设备及可读存储介质
EP3716198A1 (fr) Procédé et dispositif de reconstruction d'image
WO2021164234A1 (fr) Procédé de traitement d'image et dispositif de traitement d'image
CN108960261B (zh) 一种基于注意力机制的显著物体检测方法
US11755889B2 (en) Method, system and apparatus for pattern recognition
CN112001914A (zh) 深度图像补全的方法和装置
CN112560980A (zh) 目标检测模型的训练方法、装置及终端设备
CN109815931B (zh) 一种视频物体识别的方法、装置、设备以及存储介质
WO2022166258A1 (fr) Procédé et appareil de reconnaissance de comportement, dispositif de terminal et support de stockage lisible par ordinateur
CN114821488B (zh) 基于多模态网络的人群计数方法、系统及计算机设备
JP2024018938A (ja) 周波数領域における自己注意機構に基づく夜間オブジェクト検出、訓練方法及び装置
CN114170167A (zh) 基于注意力引导上下文校正的息肉分割方法和计算机设备
CN114782759A (zh) 一种基于YOLOv5网络对密集遮挡鱼类的检测方法
TWI817896B (zh) 機器學習方法以及裝置
CN113658050A (zh) 一种图像的去噪方法、去噪装置、移动终端及存储介质
JP6963038B2 (ja) 画像処理装置および画像処理方法
CN116385369A (zh) 深度图像质量评价方法、装置、电子设备及存储介质
CN115937121A (zh) 基于多维度特征融合的无参考图像质量评价方法及系统
CN115880176A (zh) 多尺度非成对水下图像增强方法
US20220366262A1 (en) Method and apparatus for training neural network model
Fan et al. EGFNet: Efficient guided feature fusion network for skin cancer lesion segmentation
CN111612690B (zh) 一种图像拼接方法及系统
US10832076B2 (en) Method and image processing entity for applying a convolutional neural network to an image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21940565

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21940565

Country of ref document: EP

Kind code of ref document: A1