WO2023179360A1 - Video processing method and apparatus, electronic device, and storage medium - Google Patents

Video processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023179360A1
WO2023179360A1 (PCT/CN2023/080197)
Authority
WO
WIPO (PCT)
Prior art keywords
aliasing
operator
processed
video frame
video
Prior art date
Application number
PCT/CN2023/080197
Other languages
English (en)
French (fr)
Inventor
杨定东
雷凯翔
尹淳骥
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023179360A1 publication Critical patent/WO2023179360A1/zh

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 - Details of television systems
    • H04N 5/222 - Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 - Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 - Mixing
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23424 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 - Control of cameras or camera modules
    • H04N 23/68 - Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present disclosure relates to the field of image processing technology, for example, to video processing methods and apparatuses, electronic devices, and storage media.
  • Application software can provide users with video processing functions; this can be understood as integrating multiple pre-built models into the application. After a video is processed by these models, corresponding results are obtained, for example, giving the video frames a particular style or color.
  • The present disclosure provides a video processing method, apparatus, electronic device, and storage medium that can effectively avoid "jitter" in the output video when adjacent frames of the original video change significantly; while solving the picture "jitter" problem, it does not reduce image quality or clarity, and it improves the user experience.
  • The present disclosure provides a video processing method, including: acquiring a video frame to be processed;
  • inputting the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed,
  • and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator;
  • and obtaining a target video by splicing multiple target video frames.
  • the present disclosure also provides a video processing device, including:
  • a to-be-processed video frame acquisition module, configured to acquire a video frame to be processed;
  • a target video frame determination module, configured to input the video frame to be processed into the image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator;
  • a target video generation module, configured to obtain a target video by splicing multiple target video frames.
  • the present disclosure also provides an electronic device, which includes:
  • one or more processors;
  • a storage device configured to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above video processing method.
  • the present disclosure also provides a storage medium containing computer-executable instructions, which when executed by a computer processor are used to perform the above-mentioned video processing method.
  • the present disclosure also provides a computer program product, including a computer program carried on a non-transitory computer-readable medium, where the computer program includes program code for executing the above video processing method.
  • Figure 1 is a schematic flow chart of a video processing method provided in Embodiment 1 of the present disclosure.
  • Figure 2 is a schematic structural diagram of an anti-aliasing operator provided in Embodiment 1 of the present disclosure.
  • Figure 3 is a schematic flow chart of a video processing method provided by Embodiment 2 of the present disclosure.
  • Figure 4 is a schematic structural diagram of a video processing device provided by Embodiment 3 of the present disclosure.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present disclosure.
  • the term "include" and its variations are open-ended, i.e., "including but not limited to."
  • the term “based on” means “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • FIG. 1 is a schematic flowchart of a video processing method provided in Embodiment 1 of the present disclosure.
  • This embodiment of the present disclosure is suitable for processing acquired video frames to be processed based on an image processing model that includes an anti-aliasing operator, thereby avoiding "jitter" in the output video.
  • The method can be performed by a video processing apparatus, which can be implemented in software and/or hardware, for example, by an electronic device such as a mobile terminal, a personal computer (PC), or a server.
  • The first implementation inputs multiple video frames into the neural network simultaneously for training and inference. Because the image data of the preceding frames must be stored, this approach severely increases resource overhead and latency at inference time, cannot be applied in real time on mobile devices, and image jitter still remains.
  • The second implementation mainly blurs the input image. This approach not only fails to eliminate the jitter but also blurs the output image, greatly reducing its clarity and severely damaging its texture.
  • The third implementation augments the input and output data to imitate jitter when training the pix2pix network, hoping to make the network adapt to jitter in the input pictures.
  • However, judging from the images actually obtained, considerable jitter remains in the picture.
  • In short, the above data processing approaches still suffer from severe jitter in the output images.
  • With the technical solution of the embodiments of the present disclosure, the acquired video frames to be processed can be processed based on an image processing model that includes an anti-aliasing operator, thereby avoiding "jitter" in the output video.
  • the method includes:
  • The apparatus executing the video processing method provided by the embodiments of the present disclosure can be integrated into application software that supports video processing,
  • and the software can be installed on an electronic device such as a mobile terminal or a PC.
  • The application software can be any type of image/video processing software; it is not described in detail here, as long as it can perform image/video processing.
  • It can also be a specially developed application that performs video processing and displays the output video, or it can be integrated into a corresponding page through which users on a PC can process effect videos.
  • the user can shoot videos in real time based on the camera device of the mobile terminal, or actively upload videos based on pre-developed controls in the application.
  • The real-time videos captured by the application or the videos actively uploaded by the user are the videos to be processed.
  • By parsing the video to be processed with a pre-written program, multiple video frames to be processed can be obtained.
  • Those skilled in the art should understand that if, while the user is shooting a video, the shooting angle of the camera shifts or rotates within a short time, then after a traditional image processing model processes the video, the corresponding frames will look "shaky", and the quality and clarity of the resulting images will be unsatisfactory.
  • The processing of the embodiments of the present disclosure can first determine the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator, then determine the anti-aliasing operator based on these mappings, and finally integrate the anti-aliasing operator into the image processing model and train the model. Once training is complete, the model can be used to process input images; in this process, the image processing model containing the anti-aliasing operator does not expand the spectrum of the image, thereby ensuring the quality and clarity of the output images.
  • each video frame to be processed can be input into the image processing model.
  • The image processing model may be a pre-trained neural network model, for example, a bandwidth-strict neural network.
  • The pixel-to-pixel network pixel2pixel can be abbreviated pix2pix.
  • This technology performs style transfer and image generation: after an image is input to the neural network, the network outputs an image accordingly, and the output image can meet the user's expectations, for example, changing the real people in the input image into a cartoon style or a painting style, or changing the color and brightness of the image.
  • the image processing model at least includes an anti-aliasing operator for processing the video frame to be processed.
  • The anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator.
  • An operator is a mapping from one function space to another function space in the model; any operation on a function in the model can be regarded as an operator.
  • The anti-aliasing upsampling operator corresponds to the operation of collecting samples of an analog signal: under sampling pulses, a signal that is continuous in both time and amplitude is converted into a signal that is discrete in time and amplitude.
  • For this reason, the upsampling process is also called the discretization of the waveform.
  • The anti-aliasing downsampling operator corresponds to the operation of taking one sample every few samples of a sample sequence to obtain a new sequence, i.e., a decimation process; the anti-aliasing nonlinear operator, also called a nonlinear mapping, is an operator that does not satisfy the linearity conditions.
  • Strict bandwidth means that the operators in the model impose strict bandwidth limits on the spectrum; that is, if s denotes the sampling frequency of the video frame to be processed that serves as input, no frequency exceeding half the sampling frequency (s/2) is introduced. Correspondingly, anti-aliasing is equivalent to this bandwidth requirement: only when no frequency of the continuous signal exceeds half the sampling frequency can the sampled signal be restored to the true signal, achieving anti-aliasing; otherwise, aliasing occurs.
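  • In standard sampling-theory notation (a restatement for clarity, not wording from the patent), this is the Nyquist condition:

      f_max < s / 2

    where s is the sampling frequency of the input frame and f_max is the highest frequency present in the continuous signal; every operator in a bandwidth-strict network must keep the spectrum of its output below s/2 so that the signal can be reconstructed without aliasing.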
  • The anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator can be spliced and integrated to obtain the anti-aliasing operator, which can then be introduced into the image processing model of the embodiments of the present disclosure.
  • When the image processing model processes the video frame to be processed, the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator process the video frame in turn to obtain a target video frame with the target effect.
  • When processing of the video frame to be processed is detected, the current tensor information corresponding to the video frame to be processed is used as the input of the anti-aliasing upsampling operator, which interpolates the current tensor information to obtain the first preprocessed tensor.
  • The anti-aliasing nonlinear operator expands the signal spectrum corresponding to the first preprocessed tensor by at least a factor of two to obtain the target signal spectrum corresponding to the first preprocessed image.
  • The anti-aliasing downsampling operator then downsamples the target signal spectrum, and the downsampling frequency is controlled to a preset fraction of the original sampling frequency.
  • The sampling frequency is the size of the current image (that is, the current operation result); for example, when a square has side length L, the application determines the sampling frequency to be L.
  • The cutoff frequency is the highest frequency that the information contained in the image can reach; continuing with the square example, under the strict bandwidth limits of this embodiment, with no aliasing, the cutoff frequency should be less than L/2.
  • A tensor is a multilinear mapping defined on the Cartesian product of some vector spaces and some dual spaces, in which each component is a function of the coordinates; when a coordinate transformation is performed, these components also transform linearly according to certain rules.
  • A tensor, as a geometric entity, can include scalars, vectors, and linear operators, and can be expressed in a coordinate system.
  • The following takes the application's processing of a video frame to be processed corresponding to the current moment as an example.
  • After the application determines the tensor information of the video frame to be processed at the current moment, it can input it into the image processing model, where the anti-aliasing upsampling operator processes it to obtain the first preprocessed tensor.
  • Zero insertion is performed on the current tensor information in the spatial dimension to obtain tensor information to be processed; a convolution kernel constructed from the interpolation function then interpolates the tensor information to be processed, yielding the first preprocessed tensor.
  • For example, zeros can be inserted at intervals in the spatial dimension and an ideal interpolation function then used to perform the interpolation; concretely, a convolution kernel constructed from the sinc function convolves the zero-inserted tensor information to obtain the first preprocessed tensor, as the sketch below illustrates.
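  • As an illustration only (not code from the patent), the zero-insertion-plus-sinc interpolation just described can be sketched in PyTorch as follows; the tap count and the Kaiser taper are assumed values:

      import torch
      import torch.nn.functional as F

      def kaiser_sinc_kernel(n_taps: int = 12, cutoff: float = 0.25, beta: float = 6.0):
          # Windowed ideal low-pass: sinc at the normalized cutoff
          # (0.5 = Nyquist of the upsampled grid), tapered by a Kaiser window.
          t = torch.arange(n_taps, dtype=torch.float32) - (n_taps - 1) / 2
          k = 2 * cutoff * torch.sinc(2 * cutoff * t)
          k = k * torch.kaiser_window(n_taps, periodic=False, beta=beta)
          return k / k.sum()

      def antialias_upsample_2x(x: torch.Tensor, n_taps: int = 12) -> torch.Tensor:
          # x: (B, C, H, W). Insert zeros between samples, then convolve with a
          # separable sinc kernel; the factor 2 per axis compensates for the zeros.
          b, c, h, w = x.shape
          up = torch.zeros(b, c, h * 2, w * 2, dtype=x.dtype, device=x.device)
          up[:, :, ::2, ::2] = x
          k = kaiser_sinc_kernel(n_taps) * 2.0
          kh = k.view(1, 1, -1, 1).repeat(c, 1, 1, 1)
          kw = k.view(1, 1, 1, -1).repeat(c, 1, 1, 1)
          p = (n_taps - 1) // 2
          up = F.conv2d(F.pad(up, (0, 0, p, n_taps - 1 - p), mode="reflect"), kh, groups=c)
          up = F.conv2d(F.pad(up, (p, n_taps - 1 - p, 0, 0), mode="reflect"), kw, groups=c)
          return up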
  • The first preprocessed tensor can be used as input and processed by the anti-aliasing nonlinear operator in the image processing model, yielding the target signal spectrum corresponding to the first preprocessed tensor.
  • The target signal spectrum is short for the target signal frequency spectral density and can be a frequency distribution curve. Since the anti-aliasing nonlinear operator in the image processing model expands the signal spectrum corresponding to the first preprocessed tensor by at least a factor of two, an operator that upsamples by a factor of two can be used, the nonlinear operation performed element by element, and downsampling finally applied to restore the image to its original size.
  • The target signal spectrum can then be used as input and downsampled by the anti-aliasing downsampling operator, with the downsampling frequency controlled to a preset fraction of the original sampling frequency, where the original sampling frequency is consistent with the sampling frequency of the current tensor and the preset value corresponds to the expansion factor of the signal spectrum. Since the anti-aliasing downsampling operator halves the sampling frequency, in this embodiment an operator with a bandwidth of one quarter of the original sampling frequency also needs to be introduced into the image processing model.
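  • Putting the three steps together, a minimal sketch of the anti-aliased nonlinearity (reusing the helpers from the upsampling sketch above; the choice of LeakyReLU is illustrative, not the patent's specification):

      def antialias_leaky_relu(x: torch.Tensor, n_taps: int = 12) -> torch.Tensor:
          # Up-sample 2x for spectral headroom, apply the pointwise nonlinearity
          # on the finer grid, then low-pass and decimate back to the input size.
          c = x.shape[1]
          y = antialias_upsample_2x(x, n_taps)
          y = F.leaky_relu(y, 0.2)
          k = kaiser_sinc_kernel(n_taps, cutoff=0.25)   # quarter-bandwidth low-pass
          kh = k.view(1, 1, -1, 1).repeat(c, 1, 1, 1)
          kw = k.view(1, 1, 1, -1).repeat(c, 1, 1, 1)
          p = (n_taps - 1) // 2
          y = F.conv2d(F.pad(y, (0, 0, p, n_taps - 1 - p), mode="reflect"), kh, groups=c)
          y = F.conv2d(F.pad(y, (p, n_taps - 1 - p, 0, 0), mode="reflect"), kw, groups=c)
          return y[:, :, ::2, ::2]                       # decimate back to the input size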
  • After the tensor information corresponding to the video frame to be processed has been processed by the multiple operators in the image processing model, the corresponding target video frame is obtained.
  • The processing of the embodiments of the present disclosure can at least give the image a specific target effect, and the target effect is consistent with the jitter-free effect. For example, when multiple consecutive frames of the video to be processed change significantly, the corresponding consecutive video frames output by the image processing model will not show a "shaky" visual effect; that is, they are consistent with the jitter-free effect.
  • After the image processing model processes the multiple video frames to be processed and outputs the corresponding target video frames,
  • the application can splice the multiple images according to the timestamp corresponding to each target video frame to obtain the target video,
  • so that the processed frames are displayed in a jitter-free, coherent form, as sketched below.
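  • A minimal sketch of this splicing step (imageio with its ffmpeg backend is an assumed dependency; any video writer would do, and the frame rate is illustrative):

      import imageio.v2 as imageio

      def splice_frames(frames_with_ts, path="target_video.mp4", fps=30):
          # frames_with_ts: list of (timestamp, HxWx3 uint8 array) pairs.
          frames = [f for _, f in sorted(frames_with_ts, key=lambda p: p[0])]
          writer = imageio.get_writer(path, fps=fps)
          for frame in frames:
              writer.append_data(frame)
          writer.close()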
  • After the application determines the target video, it can either play the video directly to display the processed frames on the display interface, or store the target video in a specific space according to a preset path.
  • In the technical solution of the embodiments of the present disclosure, a video frame to be processed is acquired and then input into an image processing model that includes an anti-aliasing operator, yielding the target video frame corresponding to the video frame to be processed, where the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator; the target video is obtained by splicing multiple target video frames.
  • FIG. 3 is a schematic flow chart of a video processing method provided in Embodiment 2 of the present disclosure.
  • On the basis of the preceding embodiment, the anti-aliasing upsampling operator is optimized and the resulting target anti-aliasing operator is deployed into the image processing model to be trained, which is then trained. This not only avoids "jitter" during video processing but also reduces the computing resource overhead, making it easier to deploy the model on mobile devices; at the same time, the neural network operators are given a low-pass transformation from the frequency perspective, reducing the number of convolution kernels and making the constructed model more universal.
  • For implementations, refer to the technical solution of this embodiment; technical terms identical or corresponding to those in the above embodiments are not repeated here.
  • the method includes the following steps:
  • S220: Determine the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator, and deploy the anti-aliasing operator into the image processing model to be trained.
  • The image processing model to be trained is trained with multiple training samples from a training sample set to obtain the image processing model.
  • In this embodiment, before processing the video frame to be processed based on the image processing model, the application first needs to determine the pre-constructed anti-aliasing upsampling operator, anti-aliasing nonlinear operator, and anti-aliasing downsampling operator. After splicing these operators according to the architecture designed in advance by the staff, the anti-aliasing operator is obtained and deployed into the image training model. This process is explained below.
  • In determining the target anti-aliasing operator, the convolution kernel to be used can be determined based on the original sampling frequency, the cutoff frequency corresponding to the anti-aliasing downsampling operator, the filtering frequency corresponding to the filter, the interpolation function, and the width of the preset window; two convolution kernels to be applied are determined by separating the convolution kernel to be used; and the anti-aliasing upsampling operator is determined based on the two convolution kernels to be applied.
  • In addition to the original sampling frequency corresponding to the video frame to be processed, the application also needs to determine the shape of the filter's spectrum, i.e., the two filter parameters w_c and w_s, where w_c is the cutoff frequency, the frequency that the filter allows to pass effectively, and w_s determines the length of the transition band.
  • The length of the transition band marks the performance and accuracy of the filter; the smaller it is, the better the filter performs.
  • In practice, a window also needs to be deployed in advance, for example a Kaiser window, whose width can be denoted N.
  • The Kaiser window is a locally optimized window function with strong capabilities, implemented using the modified zero-order Bessel function; the embodiments of the present disclosure do not elaborate further.
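  • For illustration, a length-N low-pass with cutoff w_c and transition band w_s can be designed with SciPy's standard Kaiser-design helpers along these lines (the concrete numbers are assumptions, not the patent's values):

      import numpy as np
      from scipy import signal

      def design_lowpass(w_c=0.25, w_s=0.1, N=24, fs=1.0):
          # Attenuation implied by a transition band of width w_s for N taps,
          # then the Kaiser beta that achieves it.
          atten = signal.kaiser_atten(N, w_s / (0.5 * fs))
          beta = signal.kaiser_beta(atten)
          # firwin builds the windowed-sinc low-pass (pass_zero=True by default).
          return signal.firwin(N, cutoff=w_c, window=("kaiser", beta), fs=fs)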
  • the application can determine the convolution kernel to be used.
  • In image processing, given an input image, each pixel of the output image is a weighted sum of the pixels in a small region of the input image; the weights are defined by a function, and that function is the convolution kernel.
  • There may be one or more convolution kernels to be used, which are at least used to process the tensor corresponding to the video frame to be processed; the convolution kernel to be used also includes multiple values to be used.
  • Therefore, the convolution kernel to be used can be split into two convolution kernels to be applied, and the anti-aliasing upsampling operator is then determined based on the two convolution kernels to be applied.
  • By combining the two convolution kernels to be applied, at least four convolution kernels to be deployed are obtained, and these at least four convolution kernels to be deployed serve as the anti-aliasing upsampling operator.
  • For example, four N×N convolution kernels can be constructed from the two kernels, so that N×N convolution is performed on the video frame at its original size; at the same time, no zeros need to be inserted in the spatial intervals during this process.
  • Processing the current tensor information corresponding to the video frame to be processed with these kernels yields the first preprocessed tensor; for example, the results are concatenated (concat) and the PixelShuffle method is executed, completing the operation of determining the anti-aliasing upsampling operator and obtaining the corresponding first preprocessed tensor.
  • The PixelShuffle method can effectively enlarge a reduced feature map and can replace interpolation or deconvolution to achieve upscaling, as the sketch below illustrates.
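  • A sketch of this optimized path (an illustrative reconstruction; the tap alignment and the phase-to-channel ordering are assumptions):

      import torch
      import torch.nn.functional as F

      def polyphase_upsample_2x(x: torch.Tensor, k1d: torch.Tensor) -> torch.Tensor:
          # x: (B, C, H, W); k1d: 1-D interpolation kernel of even length N' = 2N.
          c = x.shape[1]
          s = [k1d[0::2] * 2.0, k1d[1::2] * 2.0]   # even/odd taps s1, s2; gain 2 per axis
          n = s[0].numel()
          # One N x N kernel per output phase (row offset i, column offset j).
          ks = torch.stack([torch.outer(s[i], s[j]) for i in (0, 1) for j in (0, 1)])
          weight = ks.repeat(c, 1, 1).view(4 * c, 1, n, n)
          p = (n - 1) // 2
          y = F.conv2d(F.pad(x, (p, n - 1 - p, p, n - 1 - p)), weight, groups=c)  # (B, 4C, H, W)
          return F.pixel_shuffle(y, 2)                                            # (B, C, 2H, 2W)

  • Per the standard polyphase identity, this computes the same result as zero insertion followed by the separable 1×N′ and N′×1 convolutions, but at the original resolution and with contiguous memory access.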
  • When the video frame to be processed is handled, the upsampling process of the embodiments of the present disclosure can be implemented based on the four convolution kernels to be deployed in the image processing model; the processing results are then input to the anti-aliasing nonlinear operator, and subsequent image processing is performed according to the method of Embodiment 1 of the present disclosure.
  • Finally, the image processing model to be trained is trained based on the training sample set to obtain the image processing model, so that the image processing model can be deployed to terminal devices whose computing power is less than a preset computing power threshold.
  • the training sample set can be image data containing input and corresponding output.
  • During training, the image data can be subjected to loss processing based on the loss function corresponding to the model, so that the model parameters of the image processing model are corrected according to the multiple loss values obtained; with convergence of the loss function as the training objective, the trained image processing model can be obtained.
  • After the image processing model to be trained processes the multiple input images in the training set and obtains the corresponding outputs, it can determine the corresponding multiple loss values based on those outputs and the images serving as outputs in the training set.
  • When the model parameters are corrected with the multiple loss values and the loss function, the training error of the loss function, i.e., the loss parameter,
  • can be used as the condition for detecting whether the loss function has converged, for example, whether the training error is less than a preset error, whether the error trend has stabilized, or whether the current number of iterations equals a preset number.
  • If the convergence condition is detected to be met, for example, the training error of the loss function is less than the preset error or the error trend has stabilized, this indicates that the training of the image processing model to be trained is complete, and the iterative training can be stopped. If it is detected that the convergence condition has not been reached, other training sets can be obtained to continue training the model until the training error of the loss function is within the preset range.
  • The trained model can then be used as the image processing model and deployed into the application, as the sketch below illustrates.
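  • A minimal training-loop sketch matching this description (the model, L1 loss, optimizer, and thresholds are placeholders, not the patent's specification):

      import torch
      import torch.nn.functional as F

      def train(model, loader, lr=1e-4, eps=1e-3, patience=5, max_iters=100_000):
          opt = torch.optim.Adam(model.parameters(), lr=lr)
          history = []
          for it, (x, y) in enumerate(loader):
              loss = F.l1_loss(model(x), y)
              opt.zero_grad()
              loss.backward()
              opt.step()
              history.append(loss.item())
              # Stop when the error is small, its trend has stabilized,
              # or a preset iteration count is reached.
              stable = len(history) > patience and \
                  max(history[-patience:]) - min(history[-patience:]) < eps
              if history[-1] < eps or stable or it + 1 >= max_iters:
                  break
          return model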
  • S230: Process the first preprocessed tensor sequentially based on the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator in the target anti-aliasing operator to obtain the target video frame.
  • In the technical solution of this embodiment, the anti-aliasing upsampling operator in the anti-aliasing operator is optimized, the resulting target anti-aliasing operator is deployed into the image processing model to be trained, and the model is trained. This not only avoids "jitter" during video processing but also reduces the computing resource overhead, making it easier to deploy the model on mobile devices.
  • At the same time, the neural network operators are low-pass transformed from the frequency perspective, reducing the number of convolution kernels and making the constructed model more universal.
  • Figure 4 is a schematic structural diagram of a video processing apparatus provided in Embodiment 3 of the present disclosure. As shown in Figure 4, the apparatus includes: a to-be-processed video frame acquisition module 310, a target video frame determination module 320, and a target video generation module 330.
  • The to-be-processed video frame acquisition module 310 is configured to acquire a video frame to be processed.
  • The target video frame determination module 320 is configured to input the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator.
  • the target video generation module 330 is configured to obtain the target video by splicing multiple target video frames.
  • The target video frame determination module 320 is further configured to, when the video frame to be processed is processed based on the image processing model, process the video frame to be processed sequentially with the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator to obtain a target video frame with the target effect, where the target effect is consistent with the jitter-free effect.
  • the target video frame determination module 320 includes a first preprocessing tensor determination unit, a target signal spectrum determination unit and a downsampling processing unit.
  • The first preprocessing tensor determination unit is configured to, when processing of the video frame to be processed is detected, use the current tensor information corresponding to the video frame to be processed as the input of the anti-aliasing upsampling operator, and interpolate the current tensor information based on the anti-aliasing upsampling operator to obtain the first preprocessed tensor.
  • a target signal spectrum determination unit, configured to expand the signal spectrum corresponding to the first preprocessed tensor by at least a factor of two based on the anti-aliasing nonlinear operator, to obtain the target signal spectrum corresponding to the first preprocessed image.
  • a downsampling processing unit, configured to downsample the target signal spectrum based on the anti-aliasing downsampling operator and control the downsampling frequency to a preset fraction of the original sampling frequency, where the original sampling frequency is consistent with the sampling frequency of the current tensor, and the preset value corresponds to the expansion factor of the signal spectrum.
  • The first preprocessing tensor determination unit is further configured to perform zero insertion on the current tensor information in the spatial dimension to obtain tensor information to be processed, and to interpolate the tensor information to be processed with a convolution kernel constructed from the interpolation function to obtain the first preprocessed tensor.
  • the video processing device also includes a model training module.
  • The model training module is configured to determine the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator, and to deploy the anti-aliasing operator into the image processing model to be trained, so that the image processing model to be trained is trained with multiple training samples from the training sample set to obtain the image processing model.
  • the video processing device also includes a target anti-aliasing operator determination module and an image processing model determination module.
  • The target anti-aliasing operator determination module is configured to optimize the anti-aliasing upsampling operator in the anti-aliasing operator while keeping the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator unchanged, to obtain the target anti-aliasing operator, and to deploy the target anti-aliasing operator in the image processing model to be trained.
  • The image processing model determination module is configured to train the image processing model to be trained based on the training sample set to obtain the image processing model, so that the image processing model can be deployed to a terminal device whose computing power is less than a preset computing power threshold.
  • the target anti-aliasing operator determination module includes a convolution kernel determination unit to be used, a convolution kernel determination unit to be applied, and an anti-aliasing upsampling operator determination unit.
  • The to-be-used convolution kernel determination unit is configured to determine the convolution kernel to be used based on the original sampling frequency, the cutoff frequency corresponding to the anti-aliasing downsampling operator, the filtering frequency corresponding to the filter, the interpolation function, and the width of the preset window, where the convolution kernel to be used includes multiple values to be used.
  • the convolution kernel determination unit to be applied is configured to determine two convolution kernels to be applied by separating the convolution kernels to be used.
  • the anti-aliasing upsampling operator determination unit is configured to determine the anti-aliasing upsampling operator based on two convolution kernels to be applied.
  • The anti-aliasing upsampling operator determination unit is further configured to obtain at least four convolution kernels to be deployed by combining the two convolution kernels to be applied, and to use the at least four convolution kernels to be deployed as the anti-aliasing upsampling operator.
  • The target video frame determination module 320 is further configured to process the current tensor information corresponding to the video frame to be processed based on the at least four convolution kernels to be deployed in the anti-aliasing upsampling operator to obtain the first preprocessed tensor, and to process the first preprocessed tensor sequentially with the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator in the target anti-aliasing operator.
  • In the technical solution provided by this embodiment, a video frame to be processed is acquired and then input into an image processing model that includes an anti-aliasing operator, yielding the target video frame corresponding to the video frame to be processed, where the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator.
  • the video processing device provided by the embodiments of the present disclosure can execute the video processing method provided by any embodiment of the present disclosure, and has functional modules and effects corresponding to the execution method.
  • The multiple units and modules included in the above apparatus are only divided according to functional logic but are not limited to that division, as long as the corresponding functions can be achieved; in addition, the names of the multiple functional units are only for convenient mutual distinction and are not used to limit the protection scope of the embodiments of the present disclosure.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present disclosure.
  • Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (such as vehicle navigation terminals), and fixed terminals such as digital televisions (TVs) and desktop computers.
  • the electronic device 400 shown in FIG. 5 is only an example and should not bring any limitations to the functions and usage scope of the embodiments of the present disclosure.
  • The electronic device 400 may include a processing device (such as a central processing unit or a graphics processor) 401, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400.
  • the processing device 401, ROM 402 and RAM 403 are connected to each other via a bus 404.
  • An input/output (I/O) interface 405 is also connected to the bus 404.
  • The following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 408 including, for example, magnetic tape and hard disks; and a communication device 409.
  • the communication device 409 may allow the electronic device 400 to communicate wirelessly or wiredly with other devices to exchange data.
  • Although FIG. 5 illustrates the electronic device 400 with various means, it is not required that all of the illustrated means be implemented or provided; more or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via communication device 409, or from storage device 408, or from ROM 402.
  • When the computer program is executed by the processing device 401, the above functions defined in the method of the embodiments of the present disclosure are performed.
  • The electronic device provided by the embodiments of the present disclosure and the video processing method provided by the above embodiments belong to the same concept; technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same effects as the above embodiments.
  • Embodiments of the present disclosure provide a computer storage medium on which a computer program is stored; when the program is executed by a processor, the video processing method provided by the above embodiments is implemented.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof.
  • Examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code contained on a computer-readable medium can be transmitted using any appropriate medium, including but not limited to: wires, optical cables, radio frequency (RF), etc., or any suitable combination of the above.
  • The client and server can communicate using any currently known or future developed network protocol, such as HyperText Transfer Protocol (HTTP), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
  • The above computer-readable medium may be included in the above electronic device, or it may exist independently without being assembled into the electronic device.
  • The above computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device: acquires a video frame to be processed; inputs the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator; and obtains a target video by splicing multiple target video frames.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a LAN or WAN, or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • It should also be noted that the functions noted in the blocks may occur in an order different from that noted in the figures; for example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by combinations of special-purpose hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure can be implemented in software or hardware.
  • the name of the unit does not constitute a limitation on the unit itself.
  • the first acquisition unit can also be described as "the unit that acquires at least two Internet Protocol addresses.”
  • Exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), etc.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard drive, RAM, ROM, EPROM or flash memory, optical fiber, CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • Example 1 provides a video processing method, which includes: acquiring a video frame to be processed;
  • inputting the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed,
  • and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator;
  • and obtaining a target video by splicing multiple target video frames.
  • Example 2 provides a video processing method, which further includes:
  • when processing the video frame to be processed based on the image processing model, processing the video frame to be processed sequentially with the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator to obtain a target video frame with a target effect,
  • where the target effect is consistent with the jitter-free effect.
  • Example 3 provides a video processing method, which further includes:
  • using the current tensor information corresponding to the video frame to be processed as the input of the anti-aliasing upsampling operator, and interpolating the current tensor information based on the anti-aliasing upsampling operator to obtain the first preprocessed tensor;
  • expanding the signal spectrum corresponding to the first preprocessed tensor by at least a factor of two based on the anti-aliasing nonlinear operator to obtain the target signal spectrum; and downsampling the target signal spectrum based on the anti-aliasing downsampling operator, controlling the downsampling frequency to a preset fraction of the original sampling frequency, where the original sampling frequency is consistent with the sampling frequency of the current tensor, and the preset value corresponds to the expansion factor of the signal spectrum.
  • Example 4 provides a video processing method, which further includes:
  • performing zero insertion on the current tensor information in the spatial dimension to obtain tensor information to be processed, and interpolating the tensor information to be processed with a convolution kernel constructed from the interpolation function to obtain the first preprocessed tensor.
  • Example 5 provides a video processing method, which further includes:
  • training the image processing model to be trained with multiple training samples from the training sample set to obtain the image processing model.
  • Example 6 provides a video processing method, which further includes:
  • the image processing model to be trained is trained based on the training sample set to obtain the image processing model, so as to deploy the image processing model to a terminal device whose computing power is less than a preset computing power threshold.
  • Example 7 provides a video processing method, which further includes:
  • determining the convolution kernel to be used based on the original sampling frequency, the cutoff frequency corresponding to the anti-aliasing downsampling operator, the filtering frequency corresponding to the filter, the interpolation function, and the width of the preset window, where the convolution kernel to be used includes multiple values to be used;
  • determining two convolution kernels to be applied by separating the convolution kernel to be used;
  • and determining the anti-aliasing upsampling operator based on the two convolution kernels to be applied.
  • Example 8 provides a video processing method, which further includes:
  • obtaining at least four convolution kernels to be deployed by combining the two convolution kernels to be applied, and using the at least four convolution kernels to be deployed as the anti-aliasing upsampling operator.
  • Example 9 provides a video processing method, which further includes:
  • processing the current tensor information corresponding to the video frame to be processed based on the at least four convolution kernels to be deployed in the anti-aliasing upsampling operator to obtain the first preprocessed tensor;
  • and processing the first preprocessed tensor sequentially with the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator in the target anti-aliasing operator.
  • Example 10 provides a video processing apparatus, which includes:
  • a to-be-processed video frame acquisition module, configured to acquire a video frame to be processed;
  • a target video frame determination module, configured to input the video frame to be processed into the image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator;
  • a target video generation module, configured to obtain a target video by splicing multiple target video frames.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Image Processing (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The present disclosure provides a video processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a video frame to be processed; inputting the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator; and obtaining a target video by splicing multiple target video frames.

Description

Video processing method and apparatus, electronic device, and storage medium
This application claims priority to Chinese Patent Application No. 202210303579.3, filed with the Chinese Patent Office on March 24, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of image processing technology, for example, to a video processing method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of network technology, more and more applications have entered users' lives, especially a series of apps for shooting short videos, which are deeply loved by users.
In the related art, application software can provide users with video processing functions. This can be understood as integrating multiple pre-built models into the application; after the video is processed by these models, the corresponding results are obtained, for example, giving the video frames a particular style or color.
However, when an application processes a video, if several adjacent frames change significantly (for example, the picture shifts or rotates slightly), the processed video frames output by the model change considerably, giving the processed video a "jittery" visual effect. The above processing methods not only fail to completely solve the picture jitter problem but also reduce image quality and clarity, degrading the user experience.
Summary
The present disclosure provides a video processing method and apparatus, an electronic device, and a storage medium, which can effectively avoid "jitter" in the output video when adjacent frames of the original video change significantly; while solving the picture "jitter" problem, they do not reduce image quality or clarity, improving the user experience.
In a first aspect, the present disclosure provides a video processing method, including:
acquiring a video frame to be processed;
inputting the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator; and
obtaining a target video by splicing multiple target video frames.
In a second aspect, the present disclosure further provides a video processing apparatus, including:
a to-be-processed video frame acquisition module, configured to acquire a video frame to be processed;
a target video frame determination module, configured to input the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator; and
a target video generation module, configured to obtain a target video by splicing multiple target video frames.
In a third aspect, the present disclosure further provides an electronic device, including:
one or more processors; and
a storage apparatus, configured to store one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the above video processing method.
In a fourth aspect, the present disclosure further provides a storage medium containing computer-executable instructions, which, when executed by a computer processor, are used to perform the above video processing method.
In a fifth aspect, the present disclosure further provides a computer program product, including a computer program carried on a non-transitory computer-readable medium, where the computer program includes program code for performing the above video processing method.
Brief Description of the Drawings
Figure 1 is a schematic flowchart of a video processing method provided in Embodiment 1 of the present disclosure;
Figure 2 is a schematic structural diagram of an anti-aliasing operator provided in Embodiment 1 of the present disclosure;
Figure 3 is a schematic flowchart of a video processing method provided in Embodiment 2 of the present disclosure;
Figure 4 is a schematic structural diagram of a video processing apparatus provided in Embodiment 3 of the present disclosure;
Figure 5 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described below with reference to the accompanying drawings. Although some embodiments of the present disclosure are shown in the drawings, the present disclosure can be implemented in many forms; these embodiments are provided for the understanding of the present disclosure. The drawings and embodiments of the present disclosure are for illustration only.
The multiple steps recorded in the method embodiments of the present disclosure can be performed in different orders and/or in parallel. In addition, the method embodiments can include additional steps and/or omit performing the steps shown. The scope of the present disclosure is not limited in this respect.
The term "include" used herein and its variants are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
Concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order or interdependence of the functions performed by these apparatuses, modules, or units. The modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context indicates otherwise, they should be understood as "one or more".
Embodiment 1
Figure 1 is a schematic flowchart of a video processing method provided in Embodiment 1 of the present disclosure. This embodiment is suitable for processing acquired video frames to be processed based on an image processing model that includes an anti-aliasing operator, thereby avoiding "jitter" in the output video. The method can be performed by a video processing apparatus, which can be implemented in software and/or hardware, for example, by an electronic device such as a mobile terminal, a personal computer (PC), or a server.
Before introducing the technical solution, the application scenario of the embodiments of the present disclosure is explained. When an application generates a target video from a video to be processed, three solutions are commonly adopted. In the first, multiple video frames are input into a neural network simultaneously for training and inference; because the image data of the preceding frames must be stored, this severely increases resource overhead and latency at inference time, cannot be applied in real time on mobile devices, and image jitter still remains. The second mainly blurs the input image; this not only fails to eliminate the jitter but also blurs the output image, greatly reducing its clarity and severely damaging its texture. The third augments the input and output data to imitate jitter when training a pix2pix network, hoping to make the network adapt to jitter in the input pictures; however, judging from the images actually obtained, considerable jitter remains in the picture. It follows that these data processing approaches still suffer from severe jitter in the output images. In this case, with the technical solution of the embodiments of the present disclosure, the acquired video frames to be processed can be processed based on an image processing model that includes an anti-aliasing operator, avoiding "jitter" in the output video.
As shown in Figure 1, the method includes:
S110: Acquire a video frame to be processed.
The apparatus performing the video processing method provided by the embodiments of the present disclosure can be integrated into application software that supports video processing, and the software can be installed on an electronic device such as a mobile terminal or a PC. The application software can be any type of image/video processing software; it is not described in detail here, as long as it can perform image/video processing. It can also be a specially developed application that performs video processing and displays the output video, or it can be integrated into a corresponding page through which users on a PC can process effect videos.
In this embodiment, the user can shoot video in real time with the camera of a mobile terminal, or actively upload a video via a pre-developed control in the application; the real-time video captured by the application or the video actively uploaded by the user is the video to be processed. By parsing the video to be processed with a pre-written program, multiple video frames to be processed are obtained. Those skilled in the art should understand that if, while the user is shooting a video, the shooting angle of the camera shifts or rotates within a short time, then after a traditional image processing model processes the video, the corresponding frames will look "jittery", and the quality and clarity of the resulting images will be unsatisfactory.
S120: Input the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed.
When the video frame to be processed is passed through convolution layers, the schemes adopted in the related art expand the spectrum of the image, causing the final output video to exhibit a "jittery" visual effect. The processing of the embodiments of the present disclosure can instead first determine the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator, then determine the anti-aliasing operator based on these mappings, and finally integrate the anti-aliasing operator into the image processing model and train the model. Once training is complete, the model can be used to process input images; in this process, the image processing model containing the anti-aliasing operator does not expand the spectrum of the image, thereby ensuring the quality and clarity of the output images.
In this embodiment, after the application obtains the multiple video frames to be processed, each of them can be input into the image processing model. The image processing model can be a pre-trained neural network model, for example, a bandwidth-strict neural network. To introduce the image processing of the present disclosure, the bandwidth-strict pixel-to-pixel neural network introduced in the embodiments of the present disclosure is first explained below.
In this embodiment, the pixel-to-pixel network pixel2pixel can be abbreviated pix2pix. This technology performs style transfer and image generation: after an image is input to the neural network, the network outputs an image accordingly, and the output image can meet the user's expectations, for example, changing the real people in the input image into a cartoon style or a painting style, or changing the color and brightness of the image.
To solve the possible "jitter" of the output video, the image processing model includes at least an anti-aliasing operator for processing the video frame to be processed; the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator. An operator is a mapping from one function space to another function space in the model; any operation on a function in the model can be regarded as an operator. In this embodiment, the anti-aliasing upsampling operator corresponds to the operation of collecting samples of an analog signal: based on the anti-aliasing upsampling operator, a signal that is continuous in both time and amplitude can be converted, under sampling pulses, into a signal that is discrete in time and amplitude; hence the upsampling process is also called the discretization of the waveform. The anti-aliasing downsampling operator corresponds to the operation of taking one sample every few samples of a sample sequence to obtain a new sequence, i.e., a decimation process. The anti-aliasing nonlinear operator, also called a nonlinear mapping, is an operator that does not satisfy the linearity conditions.
In this embodiment, bandwidth-strict means that the operators in the model impose strict bandwidth limits on the spectrum; that is, if s denotes the sampling frequency of the video frame to be processed that serves as input, no frequency exceeding half the sampling frequency (s/2) is introduced. Correspondingly, anti-aliasing is equivalent to the above bandwidth requirement: only when no frequency of the continuous signal exceeds half the sampling frequency can the sampled signal be restored to the true signal, achieving anti-aliasing; otherwise, aliasing occurs.
On this basis, the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator can be spliced and integrated to obtain the anti-aliasing operator, which is introduced into the image processing model of the embodiments of the present disclosure. In the process of generating the target video frame based on this image processing model, when the image processing model processes the video frame to be processed, the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator process the video frame in turn to obtain a target video frame with the target effect.
As can be seen from Figure 2, when a video frame to be processed is handled, its corresponding tensor information can first be input to the anti-aliasing upsampling operator for processing; the result of the anti-aliasing upsampling operator is then input to the anti-aliasing nonlinear operator, and finally the result of the anti-aliasing nonlinear operator is input to the anti-aliasing downsampling operator, yielding the target video frame corresponding to the video frame to be processed. This is the process of processing a video frame to be processed based on an image processing model that contains the anti-aliasing operator. The network layers connected before and after the anti-aliasing operator can be of many types; the embodiments of the present disclosure do not limit this.
When processing of the video frame to be processed is detected, the current tensor information corresponding to the video frame to be processed is used as the input of the anti-aliasing upsampling operator, which interpolates the current tensor information to obtain a first preprocessed tensor; the anti-aliasing nonlinear operator expands the signal spectrum corresponding to the first preprocessed tensor by at least a factor of two to obtain the target signal spectrum corresponding to the first preprocessed image; the anti-aliasing downsampling operator downsamples the target signal spectrum, and the downsampling frequency is controlled to a preset fraction of the original sampling frequency.
The sampling frequency is the size of the current image (that is, the current operation result); for example, when a square has side length L, the application determines the sampling frequency to be L. Correspondingly, the cutoff frequency is the highest frequency that the information contained in the image can reach; continuing with the square example, under the strict bandwidth limits of this embodiment, with no aliasing, the cutoff frequency should be less than L/2. A tensor is a multilinear mapping defined on the Cartesian product of some vector spaces and some dual spaces, in which each component is a function of the coordinates; when a coordinate transformation is performed, these components also transform linearly according to certain rules. Therefore, for each video frame to be processed in the embodiments of the present disclosure, a tensor, as a geometric entity, can include scalars, vectors, and linear operators, and can be expressed in a coordinate system. The application's processing of one video frame to be processed corresponding to the current moment is taken as an example below.
在本实施例中,当应用确定出当前时刻待处理视频帧的张量信息后,即可将其输入图像处理模型中,由抗混叠上采样算子对其进行处理,从而得到第一预处理张量。在空间维度上对当前张量信息进行插零处理,得到待处理张量信息;基于插值函数构建的卷积核对待处理张量信息进行插值处理,得到第一预处理张量。例如,基于抗混叠上采样算子,可以在空间维度上间隔插入0,再利用理想的插值函数执行插值操作,其中,插值函数即是最后使用sinc函数构造卷积核对插0操作后的张量信息进行卷积处理,即可得到第一预处理张量。
The first preprocessed tensor can be taken as input and processed by the anti-aliasing nonlinear operator in the image processing model, thereby obtaining the target signal spectrum corresponding to the first preprocessed tensor. The target signal spectrum, short for the target signal frequency spectral density, may be a distribution curve over frequency. Since the anti-aliasing nonlinear operator in the image processing model can expand the signal spectrum corresponding to the first preprocessed tensor by at least a factor of two, an operator that upsamples by a factor of two can be used, the element-wise nonlinear operation is then performed, and finally downsampling restores the image to its original size.
Finally, after the target signal spectrum is obtained, it can be taken as input and downsampled by the anti-aliasing downsampling operator, with the downsampling frequency controlled to be a preset value relative to the original sampling frequency, where the original sampling frequency is consistent with the sampling frequency of the current tensor, and the preset value corresponds to the expansion factor of the signal spectrum. Since the anti-aliasing downsampling operator reduces the sampling frequency by a factor of two, in this embodiment an operator whose bandwidth is one quarter of the original sampling frequency also needs to be introduced into the image processing model. Meanwhile, to preserve rotation invariance, the convolution kernel corresponding to the downsampling operator can be constructed in advance using the jinc function. After convolving on the input image at its original size, removing the corresponding features at intervals in the spatial dimensions implements the bandwidth-limited downsampling.
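For the rotation-invariant downsampling filter, a jinc-profile kernel can be sketched as below (illustrative; scipy's first-order Bessel function j1 is used, and the tap count, window, and 0.25 cutoff are assumptions consistent with the “quarter of the original sampling frequency” bandwidth mentioned above):

```python
import numpy as np
from scipy.special import j1  # first-order Bessel function of the first kind

def jinc_kernel(taps: int = 13, cutoff: float = 0.25) -> np.ndarray:
    """Radially symmetric low-pass kernel built from the jinc profile,
    windowed and normalized; used before spatial decimation."""
    ax = np.arange(taps) - (taps - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    r = np.hypot(xx, yy)
    r_safe = np.where(r == 0.0, 1.0, r)           # avoid division by zero
    k = cutoff * j1(2.0 * np.pi * cutoff * r_safe) / r_safe
    k[r == 0.0] = np.pi * cutoff ** 2             # limit of the profile at r = 0
    win = np.kaiser(taps, 6.0)
    k *= np.outer(win, win)                       # soften the truncation
    return k / k.sum()                            # unity gain at DC
```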
In this embodiment, after the tensor information corresponding to the video frame to be processed is processed by the multiple operators in the image processing model, the corresponding target video frame can be obtained. The processing approach of this embodiment of the present disclosure can at least make the image exhibit a specific target effect, and the target effect is consistent with a jitter-free effect. For example, when the pictures of multiple consecutive frames among the video frames to be processed change substantially, the corresponding multiple consecutive video frames output by the image processing model will not exhibit a “jittering” visual effect, i.e., they are consistent with a jitter-free effect.
Although the above description concerns the processing of one video frame to be processed, those skilled in the art should understand that the other video frames to be processed can likewise be input one by one into the image processing model for processing in the above manner of the embodiments of the present disclosure, thereby obtaining the corresponding multiple target video frames, which is not repeated here.
S130: Obtain a target video by stitching multiple target video frames.
In this embodiment, since each target video frame carries a timestamp consistent with that of the corresponding video frame to be processed, after the image processing model has processed the multiple video frames to be processed and output the corresponding target video frames, the application can stitch the multiple images according to the timestamp of each target video frame, thereby obtaining the target video. By stitching the multiple frames into the target video, the processed pictures can be presented in a jitter-free, continuous form.
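A minimal sketch of this stitching step might look like the following (OpenCV is an assumed dependency here, and the codec and frame rate are illustrative):

```python
import cv2  # assumed available for video writing

def stitch_frames(frames_with_ts, out_path: str, fps: float = 30.0) -> None:
    """Order processed frames by the timestamps carried over from their
    source frames, then write them out as one continuous video."""
    ordered = [frame for _, frame in sorted(frames_with_ts, key=lambda p: p[0])]
    h, w = ordered[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in ordered:
        writer.write(frame)
    writer.release()
```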
After the application determines the target video, it may either directly play the video to present the processed pictures on the display interface, or store the target video in a specific space according to a preset path, which is not limited in the embodiments of the present disclosure.
In the technical solution of this embodiment of the present disclosure, a video frame to be processed is acquired and then input into an image processing model that includes an anti-aliasing operator, thereby obtaining a target video frame corresponding to the video frame to be processed, where the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator; the target video is obtained by stitching multiple target video frames. When several adjacent frames of the original video change substantially, “jitter” in the output video pictures can be effectively avoided; while solving the picture “jitter” problem, the quality and sharpness of the images are not reduced, improving the user experience.
Embodiment Two
FIG. 3 is a schematic flowchart of a video processing method provided in Embodiment Two of the present disclosure. On the basis of the foregoing embodiment, the anti-aliasing upsampling operator in the anti-aliasing operator is optimized, the resulting target anti-aliasing operator is deployed into the image processing model to be trained, and the model is trained. This not only avoids “jitter” during video processing but also reduces the overhead of computing resources, facilitating deployment of the model on mobile terminals; meanwhile, the neural network operators are given a low-pass retrofit from the perspective of frequency, reducing the number of convolution kernels and making the constructed model more universal. For its implementation, reference may be made to the technical solution of this embodiment. Technical terms identical or corresponding to those in the above embodiments are not repeated here.
As shown in FIG. 3, the method includes the following steps:
S210: Acquire a video frame to be processed.
S220: Determine the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator, and deploy the anti-aliasing operator into an image processing model to be trained, so as to train the image processing model to be trained based on multiple training samples in a training sample set to obtain the image processing model.
In this embodiment, before processing the video frame to be processed based on the image processing model, the application first needs to determine the pre-constructed anti-aliasing upsampling operator, anti-aliasing nonlinear operator, and anti-aliasing downsampling operator. After these operators are concatenated according to the architecture pre-designed by the developers, the anti-aliasing operator is obtained and deployed into the image processing model to be trained. This process is described below.
The anti-aliasing upsampling operator in the anti-aliasing operator is optimized while the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator are kept unchanged, obtaining the target anti-aliasing operator, and the target anti-aliasing operator is deployed in the image processing model to be trained.
In the process of determining the target anti-aliasing operator, a convolution kernel to be used can be determined based on the original sampling frequency, the cutoff frequency corresponding to the anti-aliasing downsampling operator, the filtering frequency corresponding to the filter, the interpolation function, and the width of a preset window; two convolution kernels to be applied are determined by splitting the convolution kernel to be used; and the anti-aliasing upsampling operator is determined based on the two convolution kernels to be applied.
In this embodiment, besides the original sampling frequency corresponding to the video frame to be processed, the application also needs to determine the shape of the filter's spectrum, i.e., the two filter parameters w_c and w_s, where w_c is the cutoff frequency, namely the frequency the filter effectively allows to pass, and w_s determines the length of the transition band. The length of the transition band indicates the performance and precision of the filter; the smaller it is, the better the filter performs. In the actual filter design process, a window also needs to be deployed in advance, for example a Kaiser window, whose width can be denoted by N. Those skilled in the art should understand that the Kaiser window is a locally optimized window function with strong capability, implemented with the modified zeroth-order Bessel function, which is not elaborated here.
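As a sketch of this design step (using scipy's standard Kaiser-window design helpers; the 60 dB stop-band attenuation is an assumed figure, not one given in the disclosure):

```python
from scipy.signal import firwin, kaiserord

def design_lowpass(fs: float, w_c: float, w_s: float, atten_db: float = 60.0):
    """Design the 1-D low-pass prototype: w_c is the cutoff frequency,
    (w_s - w_c) the transition band, and the Kaiser design formula yields
    the window length N and shape parameter beta."""
    width = (w_s - w_c) / (fs / 2.0)           # transition width, normalized to Nyquist
    n_taps, beta = kaiserord(atten_db, width)  # smaller width -> larger N
    return firwin(n_taps, w_c, window=("kaiser", beta), fs=fs)
```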
Based on the above parameters, the application can determine the convolution kernel to be used. Those skilled in the art should understand that, in image processing, given an input image, each corresponding pixel of the output image is the weighted sum of the pixels in a small region of the input image, where the weights are defined by a function, and this function is the convolution kernel. In this embodiment, there may be one or more convolution kernels to be used, which are at least used to process the tensor corresponding to the video frame to be processed, and the convolution kernel to be used includes multiple values to be used.
In this embodiment, if only a single upsampling process were deployed, zeros would be inserted at intervals in the spatial dimensions, followed by convolutions separable in the x and y directions (1×N′, N′×1, N′=N*2). If the input tensor is denoted x, the computation amounts to x.nelement()*4*(N′+N′); however, if the 1×N′ and N′×1 convolutions are not optimized, data storage and access are inefficient and the actual running speed of the model is slow. Therefore, in this embodiment, the convolution kernel to be used can be split into two convolution kernels to be applied, and the anti-aliasing upsampling operator is then determined based on these two kernels. For example, the application can split the convolution kernel to be used into s1=[k1,k3,k5,k7,…,kN′-1] and s2=[k2,k4,k6,k8,…,kN′], thereby obtaining the two convolution kernels to be applied.
At least four convolution kernels to be deployed are obtained by combining the two convolution kernels to be applied, and the at least four convolution kernels to be deployed serve as the anti-aliasing upsampling operator. Continuing with the above example, once the two kernels s1 and s2 are obtained, four N×N convolution kernels can be constructed from them, so that the N×N convolutions are performed on the video frame to be processed at its original size, and in this process there is no need to insert zeros at intervals in the spatial dimensions. By processing the current tensor information corresponding to the video frame to be processed with the at least four convolution kernels to be deployed in the anti-aliasing upsampling operator, the first preprocessed tensor can be obtained; for example, the results are concatenated (concat) and the PixelShuffle method is executed, completing the determination of the anti-aliasing upsampling operator and the derivation of the corresponding first preprocessed tensor. Those skilled in the art should understand that the PixelShuffle method can effectively enlarge a reduced feature map and can replace interpolation or deconvolution to realize upscaling.
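A sketch of this optimized path follows (the even/odd split, the phase ordering, and the “same” padding are simplified here; exact alignment with the zero-stuffing convention would need care in a real implementation):

```python
import torch
import torch.nn.functional as F

def polyphase_upsample_2x(x: torch.Tensor, k1d: torch.Tensor) -> torch.Tensor:
    """Split the length-N' 1-D kernel into phases s1, s2, build four
    separable kernels, convolve at the original resolution, and let
    PixelShuffle interleave the four outputs into a 2x larger image."""
    s1, s2 = k1d[0::2], k1d[1::2]                      # the two polyphase components
    phases = [torch.outer(a, b) for a in (s1, s2) for b in (s1, s2)]
    b, c, h, w = x.shape
    outs = [
        F.conv2d(x, p[None, None].repeat(c, 1, 1, 1).to(x), padding="same", groups=c)
        for p in phases
    ]
    y = torch.stack(outs, dim=2).reshape(b, c * 4, h, w)  # 4 phases per channel
    return F.pixel_shuffle(y, 2)                          # (B, C, 2H, 2W)
```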
When processing the video frame to be processed, the upsampling process of this embodiment of the present disclosure can be realized based on the four convolution kernels to be deployed in the image processing model; the processing result is then input into the anti-aliasing nonlinear operator, and the subsequent image processing is performed in the manner of Embodiment One of the present disclosure.
Finally, the image processing model to be trained is trained based on the training sample set to obtain the image processing model, so that the image processing model can be deployed to terminal devices whose computing power is below a preset computing-power threshold. The training sample set may be picture data containing inputs and corresponding outputs. In the process of training the image processing model to be trained, loss processing can be performed on the picture data based on the loss function corresponding to the model, so that the model parameters of the image processing model to be trained are corrected according to the multiple loss values obtained; meanwhile, taking convergence of the loss function as the training objective, the trained image processing model is obtained.
After the image processing model to be trained has processed the multiple input images in the training set and produced the corresponding outputs, the corresponding multiple loss values can be determined based on those outputs and the images serving as outputs in the training set. When correcting the model parameters using the multiple loss values and the loss function, the training error of the loss function, i.e., the loss parameter, can serve as the condition for checking whether the loss function has currently converged, for example, whether the training error is smaller than a preset error, whether the error trend has stabilized, or whether the current number of iterations equals a preset number. If the convergence condition is detected to be met, for example the training error of the loss function is smaller than the preset error or the error trend has stabilized, the image processing model to be trained has finished training, and the iterative training can be stopped. If the convergence condition is detected not to be met, other training sets can be acquired to continue training the model until the training error of the loss function falls within the preset range. When the training error of the loss function converges, the trained model can be taken as the image processing model to be used and deployed into the application.
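The convergence check described above can be sketched as follows (the Adam optimizer, the L1 loss, and the thresholds are assumptions for illustration):

```python
import torch

def train(model, loader, lr=1e-4, eps=1e-4, max_epochs=100):
    """Iterate until the average loss drops below a preset error or the
    epoch-to-epoch change stabilizes, per the convergence conditions above."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.L1Loss()
    prev = float("inf")
    for _ in range(max_epochs):                  # preset iteration cap
        total = 0.0
        for inputs, targets in loader:
            opt.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            opt.step()
            total += loss.item()
        avg = total / len(loader)
        if avg < eps or abs(prev - avg) < eps:   # converged or plateaued
            break
        prev = avg
    return model
```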
S230: Process the first preprocessed tensor in sequence based on the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator in the target anti-aliasing operator to obtain the target video frame.
S240: Obtain a target video by stitching multiple target video frames.
In the technical solution of this embodiment, the anti-aliasing upsampling operator in the anti-aliasing operator is optimized, the resulting target anti-aliasing operator is deployed into the image processing model to be trained, and the model is trained. This not only avoids “jitter” during video processing but also reduces the overhead of computing resources, facilitating deployment of the model on mobile terminals; meanwhile, the neural network operators are given a low-pass retrofit from the perspective of frequency, reducing the number of convolution kernels and making the constructed model more universal.
Embodiment Three
FIG. 4 is a schematic structural diagram of a video processing apparatus provided in Embodiment Three of the present disclosure. As shown in FIG. 4, the apparatus includes: a to-be-processed video frame acquisition module 310, a target video frame determination module 320, and a target video generation module 330.
The to-be-processed video frame acquisition module 310 is configured to acquire a video frame to be processed.
The target video frame determination module 320 is configured to input the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator.
The target video generation module 330 is configured to obtain a target video by stitching multiple target video frames.
On the basis of the above technical solution, the target video frame determination module 320 is further configured to, when performing nonlinear processing on the video frame to be processed based on the image processing model, process the video frame to be processed in sequence based on the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator, so as to obtain a target video frame having a target effect, where the target effect is consistent with a jitter-free effect.
On the basis of the above technical solution, the target video frame determination module 320 includes a first preprocessed tensor determination unit, a target signal spectrum determination unit, and a downsampling processing unit.
The first preprocessed tensor determination unit is configured to, upon detecting that nonlinear processing is to be performed on the video frame to be processed, use current tensor information corresponding to the video frame to be processed as the input of the anti-aliasing upsampling operator, and interpolate the current tensor information based on the anti-aliasing upsampling operator to obtain a first preprocessed tensor.
The target signal spectrum determination unit is configured to expand, based on the anti-aliasing nonlinear operator, the signal spectrum corresponding to the first preprocessed tensor by at least a factor of two, obtaining a target signal spectrum corresponding to the first preprocessed image.
The downsampling processing unit is configured to downsample the target signal spectrum based on the anti-aliasing downsampling operator, and control the downsampling frequency to be a preset value relative to the original sampling frequency, where the original sampling frequency is consistent with the sampling frequency of the current tensor, and the preset value corresponds to the expansion factor of the signal spectrum.
On the basis of the above technical solution, the first preprocessed tensor determination unit is further configured to perform zero-insertion on the current tensor information in the spatial dimensions to obtain tensor information to be processed, and interpolate the tensor information to be processed based on a convolution kernel constructed from an interpolation function to obtain the first preprocessed tensor.
On the basis of the above technical solution, the video processing apparatus further includes a model training module.
The model training module is configured to determine the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator, and deploy the anti-aliasing operator into an image processing model to be trained, so as to train the image processing model to be trained based on multiple training samples in a training sample set to obtain the image processing model.
On the basis of the above technical solution, the video processing apparatus further includes a target anti-aliasing operator determination module and an image processing model determination module.
The target anti-aliasing operator determination module is configured to optimize the anti-aliasing upsampling operator in the anti-aliasing operator while keeping the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator unchanged, obtaining a target anti-aliasing operator, and deploy the target anti-aliasing operator in the image processing model to be trained.
The image processing model determination module is configured to train the image processing model to be trained based on the training sample set to obtain the image processing model, so as to deploy the image processing model to terminal devices whose computing power is below a preset computing-power threshold.
On the basis of the above technical solution, the target anti-aliasing operator determination module includes a to-be-used convolution kernel determination unit, a to-be-applied convolution kernel determination unit, and an anti-aliasing upsampling operator determination unit.
The to-be-used convolution kernel determination unit is configured to determine a convolution kernel to be used based on the original sampling frequency, the cutoff frequency corresponding to the anti-aliasing downsampling operator, the filtering frequency corresponding to the filter, the interpolation function, and the width of the preset window, where the convolution kernel to be used includes multiple values to be used.
The to-be-applied convolution kernel determination unit is configured to determine two convolution kernels to be applied by splitting the convolution kernel to be used.
The anti-aliasing upsampling operator determination unit is configured to determine the anti-aliasing upsampling operator based on the two convolution kernels to be applied.
On the basis of the above technical solution, the anti-aliasing upsampling operator determination unit is further configured to obtain at least four convolution kernels to be deployed by combining the two convolution kernels to be applied, and use the at least four convolution kernels to be deployed as the anti-aliasing upsampling operator.
On the basis of the above technical solution, the target video frame determination module 320 is further configured to process the current tensor information corresponding to the video frame to be processed based on the at least four convolution kernels to be deployed in the anti-aliasing upsampling operator to obtain a first preprocessed tensor, so as to process the first preprocessed tensor in sequence based on the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator in the target anti-aliasing operator.
In the technical solution provided by this embodiment, a video frame to be processed is acquired and then input into an image processing model that includes an anti-aliasing operator, thereby obtaining a target video frame corresponding to the video frame to be processed, where the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator; the target video is obtained by stitching multiple target video frames. When several adjacent frames of the original video change substantially, “jitter” in the output video pictures can be effectively avoided; while solving the picture “jitter” problem, the quality and sharpness of the images are not reduced, improving the user experience.
The video processing apparatus provided by this embodiment of the present disclosure can perform the video processing method provided by any embodiment of the present disclosure, and has functional modules and effects corresponding to the performed method.
The multiple units and modules included in the above apparatus are divided only according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the names of the multiple functional units are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of the embodiments of the present disclosure.
Embodiment Four
FIG. 5 is a schematic structural diagram of an electronic device provided in Embodiment Four of the present disclosure. Referring now to FIG. 5, it shows a schematic structural diagram of an electronic device (e.g., the terminal device or server in FIG. 5) 400 suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital televisions (TVs) and desktop computers. The electronic device 400 shown in FIG. 5 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 5, the electronic device 400 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 401, which can perform multiple appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage apparatus 408 into a random access memory (RAM) 403. The RAM 403 also stores multiple programs and data required for the operation of the electronic device 400. The processing apparatus 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following apparatuses may be connected to the I/O interface 405: an input apparatus 406 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 407 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 408 including, for example, a magnetic tape and a hard disk; and a communication apparatus 409. The communication apparatus 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 5 shows the electronic device 400 with multiple apparatuses, it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
According to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication apparatus 409, or installed from the storage apparatus 408, or installed from the ROM 402. When the computer program is executed by the processing apparatus 401, the above functions defined in the methods of the embodiments of the present disclosure are performed.
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The electronic device provided by this embodiment of the present disclosure belongs to the same concept as the video processing method provided by the above embodiments; technical details not exhaustively described in this embodiment may be found in the above embodiments, and this embodiment has the same effects as the above embodiments.
Embodiment Five
An embodiment of the present disclosure provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the video processing method provided by the above embodiments.
The computer-readable medium of the present disclosure described above may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take multiple forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium may send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: electric wires, optical cables, radio frequency (RF), etc., or any suitable combination of the above.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol such as HyperText Transfer Protocol (HTTP), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internet (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
acquire a video frame to be processed; input the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator; and obtain a target video by stitching multiple target video frames.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the “C” language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a LAN or a WAN, or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not, in one case, constitute a limitation on the unit itself; for example, a first acquisition unit may also be described as a “unit for acquiring at least two Internet Protocol addresses”.
The functions described above herein may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in combination with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. Examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an EPROM or flash memory, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, [Example One] provides a video processing method, including:
acquiring a video frame to be processed;
inputting the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator; and
obtaining a target video by stitching multiple target video frames.
According to one or more embodiments of the present disclosure, [Example Two] provides a video processing method, further including:
when performing nonlinear processing on the video frame to be processed based on the image processing model, processing the video frame to be processed in sequence based on the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator, so as to obtain a target video frame having a target effect;
where the target effect is consistent with a jitter-free effect.
According to one or more embodiments of the present disclosure, [Example Three] provides a video processing method, further including:
upon detecting that nonlinear processing is to be performed on the video frame to be processed, using current tensor information corresponding to the video frame to be processed as the input of the anti-aliasing upsampling operator, and interpolating the current tensor information based on the anti-aliasing upsampling operator to obtain a first preprocessed tensor;
expanding, based on the anti-aliasing nonlinear operator, the signal spectrum corresponding to the first preprocessed tensor by at least a factor of two to obtain a target signal spectrum corresponding to the first preprocessed image;
downsampling the target signal spectrum based on the anti-aliasing downsampling operator, and controlling the downsampling frequency to be a preset value relative to the original sampling frequency;
where the original sampling frequency is consistent with the sampling frequency of the current tensor, and the preset value corresponds to the expansion factor of the signal spectrum.
According to one or more embodiments of the present disclosure, [Example Four] provides a video processing method, further including:
performing zero-insertion on the current tensor information in the spatial dimensions to obtain tensor information to be processed;
interpolating the tensor information to be processed based on a convolution kernel constructed from an interpolation function to obtain the first preprocessed tensor.
According to one or more embodiments of the present disclosure, [Example Five] provides a video processing method, further including:
determining the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator, and deploying the anti-aliasing operator into an image processing model to be trained, so as to train the image processing model to be trained based on multiple training samples in a training sample set to obtain the image processing model.
According to one or more embodiments of the present disclosure, [Example Six] provides a video processing method, further including:
optimizing the anti-aliasing upsampling operator in the anti-aliasing operator while keeping the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator unchanged, obtaining a target anti-aliasing operator, and deploying the target anti-aliasing operator in the image processing model to be trained;
training the image processing model to be trained based on the training sample set to obtain the image processing model, so as to deploy the image processing model to terminal devices whose computing power is below a preset computing-power threshold.
According to one or more embodiments of the present disclosure, [Example Seven] provides a video processing method, further including:
determining a convolution kernel to be used based on the original sampling frequency, the cutoff frequency corresponding to the anti-aliasing downsampling operator, the filtering frequency corresponding to the filter, the interpolation function, and the width of the preset window, where the convolution kernel to be used includes multiple values to be used;
determining two convolution kernels to be applied by splitting the convolution kernel to be used;
determining the anti-aliasing upsampling operator based on the two convolution kernels to be applied.
According to one or more embodiments of the present disclosure, [Example Eight] provides a video processing method, further including:
obtaining at least four convolution kernels to be deployed by combining the two convolution kernels to be applied, and using the at least four convolution kernels to be deployed as the anti-aliasing upsampling operator.
According to one or more embodiments of the present disclosure, [Example Nine] provides a video processing method, further including:
processing the current tensor information corresponding to the video frame to be processed based on the at least four convolution kernels to be deployed in the anti-aliasing upsampling operator to obtain a first preprocessed tensor, so as to process the first preprocessed tensor in sequence based on the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator in the target anti-aliasing operator.
According to one or more embodiments of the present disclosure, [Example Ten] provides a video processing apparatus, including:
a to-be-processed video frame acquisition module, configured to acquire a video frame to be processed;
a target video frame determination module, configured to input the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed, where the image processing model includes an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator includes an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator;
a target video generation module, configured to obtain a target video by stitching multiple target video frames.
In addition, although multiple operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains multiple implementation details, these should not be construed as limiting the scope of the present disclosure. Some features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, multiple features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination.

Claims (13)

  1. A video processing method, comprising:
    acquiring a video frame to be processed;
    inputting the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed; wherein the image processing model comprises an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator comprises an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator;
    obtaining a target video by stitching multiple target video frames.
  2. The method according to claim 1, wherein inputting the video frame to be processed into the image processing model to obtain the target video frame corresponding to the video frame to be processed comprises:
    in a case of performing nonlinear processing on the video frame to be processed based on the image processing model, processing the video frame to be processed in sequence based on the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator, so as to obtain a target video frame having a target effect;
    wherein the target effect is consistent with a jitter-free effect.
  3. The method according to claim 2, wherein processing the video frame to be processed in sequence based on the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator comprises:
    in a case where it is detected that nonlinear processing is to be performed on the video frame to be processed, using current tensor information corresponding to the video frame to be processed as an input of the anti-aliasing upsampling operator, and interpolating the current tensor information based on the anti-aliasing upsampling operator to obtain a first preprocessed tensor;
    expanding, based on the anti-aliasing nonlinear operator, a signal spectrum corresponding to the first preprocessed tensor by at least a factor of two to obtain a target signal spectrum corresponding to the first preprocessed image;
    downsampling the target signal spectrum based on the anti-aliasing downsampling operator, and controlling a downsampling frequency to be a preset value relative to an original sampling frequency;
    wherein the original sampling frequency is consistent with a sampling frequency of the current tensor, and the preset value corresponds to an expansion factor of the signal spectrum.
  4. The method according to claim 3, wherein interpolating the current tensor information based on the anti-aliasing upsampling operator to obtain the first preprocessed tensor comprises:
    performing zero-insertion on the current tensor information in spatial dimensions to obtain tensor information to be processed;
    interpolating the tensor information to be processed based on a convolution kernel constructed from an interpolation function to obtain the first preprocessed tensor.
  5. The method according to claim 1, further comprising:
    determining the anti-aliasing upsampling operator, the anti-aliasing nonlinear operator, and the anti-aliasing downsampling operator in the anti-aliasing operator, and deploying the anti-aliasing operator into an image processing model to be trained, so as to train the image processing model to be trained based on multiple training samples in a training sample set to obtain the image processing model.
  6. The method according to claim 5, further comprising:
    optimizing the anti-aliasing upsampling operator in the anti-aliasing operator while keeping the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator unchanged, obtaining a target anti-aliasing operator, and deploying the target anti-aliasing operator in the image processing model to be trained;
    training the image processing model to be trained based on the training sample set to obtain the image processing model, so as to deploy the image processing model to a terminal device whose computing power is below a preset computing-power threshold.
  7. The method according to claim 6, wherein optimizing the anti-aliasing upsampling operator in the anti-aliasing operator comprises:
    determining a convolution kernel to be used based on an original sampling frequency, a cutoff frequency corresponding to the anti-aliasing downsampling operator, a filtering frequency corresponding to a filter, an interpolation function, and a width of a preset window; wherein the convolution kernel to be used comprises multiple values to be used;
    determining two convolution kernels to be applied by splitting the convolution kernel to be used;
    determining the anti-aliasing upsampling operator based on the two convolution kernels to be applied.
  8. The method according to claim 7, wherein determining the anti-aliasing upsampling operator based on the two convolution kernels to be applied comprises:
    obtaining at least four convolution kernels to be deployed by combining the two convolution kernels to be applied, and using the at least four convolution kernels to be deployed as the anti-aliasing upsampling operator.
  9. The method according to claim 8, wherein processing the video frame to be processed based on the anti-aliasing operator comprises:
    processing the current tensor information corresponding to the video frame to be processed based on the at least four convolution kernels to be deployed in the anti-aliasing upsampling operator to obtain a first preprocessed tensor, and processing the first preprocessed tensor in sequence based on the anti-aliasing nonlinear operator and the anti-aliasing downsampling operator in the target anti-aliasing operator.
  10. A video processing apparatus, comprising:
    a to-be-processed video frame acquisition module, configured to acquire a video frame to be processed;
    a target video frame determination module, configured to input the video frame to be processed into an image processing model to obtain a target video frame corresponding to the video frame to be processed; wherein the image processing model comprises an anti-aliasing operator for processing the video frame to be processed, and the anti-aliasing operator comprises an anti-aliasing upsampling operator, an anti-aliasing nonlinear operator, and an anti-aliasing downsampling operator;
    a target video generation module, configured to obtain a target video by stitching multiple target video frames.
  11. An electronic device, comprising:
    at least one processor;
    a storage apparatus configured to store at least one program,
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the video processing method according to any one of claims 1 to 9.
  12. A storage medium containing computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, are used to perform the video processing method according to any one of claims 1 to 9.
  13. A computer program product, comprising a computer program carried on a non-transitory computer-readable medium, wherein the computer program contains program code for performing the video processing method according to any one of claims 1 to 9.
PCT/CN2023/080197 2022-03-24 2023-03-08 Video processing method and apparatus, electronic device, and storage medium WO2023179360A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210303579.3A CN114640796B (zh) 2022-03-24 2022-03-24 Video processing method and apparatus, electronic device, and storage medium
CN202210303579.3 2022-03-24

Publications (1)

Publication Number Publication Date
WO2023179360A1 (zh)


Also Published As

Publication number Publication date
CN114640796A (zh) 2022-06-17
CN114640796B (zh) 2024-02-09
