WO2024027639A1 - Picture group length determination method, device, computer equipment and readable medium - Google Patents



Publication number
WO2024027639A1
WO2024027639A1 (PCT/CN2023/110200, CN2023110200W)
Authority
WO
WIPO (PCT)
Prior art keywords
scene
length
picture group
preset
image
Prior art date
Application number
PCT/CN2023/110200
Other languages
English (en)
French (fr)
Inventor
杨维
徐科
孔德辉
曹洲
陈杰
Original Assignee
深圳市中兴微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司 filed Critical 深圳市中兴微电子技术有限公司
Publication of WO2024027639A1


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/142 - Detection of scene cut or scene change
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object
    • H04N19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, the region being a picture, frame or field
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a pixel
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Definitions

  • the present disclosure relates to the technical field of video encoding and decoding, and specifically relates to a method, device, computer equipment and readable medium for determining the length of a picture group.
  • Each frame is further divided into slices, tiles, macroblocks, prediction units and transform units, and the video is compressed using technologies such as prediction, transform, quantization, filtering and entropy coding.
  • the encoding process is shown in Figure 1, where prediction removes spatial and temporal redundancy: similar reconstructed blocks, either elsewhere in the same frame or in other frames, are subtracted from the pixels of the original block to obtain the residual block. The residual block is input to the transform to obtain transform coefficients whose non-zero values are concentrated in a certain area.
  • the transform coefficients are sent to the quantization module, which leaves only a few quantized values; these, combined with other information such as motion vectors and intra-frame prediction modes, are sent to the entropy coding module for encoding to obtain a highly compressed code stream. As the technology evolves, higher quality can be achieved at lower bit rates.
  • the present disclosure provides a method, device, computer equipment and readable medium for determining the length of a picture group.
  • a method for determining the length of a picture group includes: respectively obtaining the coding information, optical flow motion vectors and feature parameters of the video stream; using a preset neural network model to determine, according to the coding information, the motion vectors and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs, where the scenes include at least two preset scenes; and, under the current judgment scene, for each frame image, determining the length of the picture group based on the number of pixels in the image belonging to the current judgment scene and the total number of pixels of the image, and, if the length of the picture group cannot be determined, determining the length of the picture group under the next judgment scene until all the preset scenes are traversed; wherein the current judgment scene is one of the preset scenes and is determined according to the preset scene judgment sequence.
  • a picture group length determination device including a processing module, a scene determination module and a picture group length determination module; the processing module is configured to respectively obtain the coding information, optical flow motion vectors and feature parameters of the video stream;
  • the scene determination module is configured to use a preset neural network model to determine the scene to which each pixel of each frame image in the video stream belongs based on the encoding information, the motion vector and the characteristic parameters, the Scenes include at least two preset scenes;
  • the picture group length determination module is configured to, under the current judgment scene, for each frame image, determine the length of the picture group according to the number of pixels in the image belonging to the current judgment scene and the total number of pixels of the image; if the length of the picture group cannot be determined, the length of the picture group is determined under the next judgment scene until all the preset scenes are traversed; wherein the current judgment scene is one of the preset scenes and is determined according to the preset scene judgment sequence.
  • a computer device including: one or more processors; and a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors implement the picture group length determination method described above.
  • a computer-readable medium on which a computer program is stored, wherein when the program is executed, the picture group length determination method as described above is implemented.
  • Figure 1 is a schematic diagram of the H.266 hybrid encoding framework in the related art
  • Figure 2 is a schematic flowchart of a method for determining the length of a picture group according to an embodiment of the present disclosure
  • Figure 3 is a schematic flowchart of determining the scene to which each pixel of each frame image in the video stream belongs according to an embodiment of the present disclosure
  • Figure 4 is a schematic flowchart 1 of determining the length of a picture group according to an embodiment of the present disclosure
  • Figure 5 is a schematic flowchart 2 of determining the length of a picture group according to an embodiment of the present disclosure
  • Figure 6 is a schematic flowchart of determining the scene to which each pixel of each frame image in the video stream belongs according to an embodiment of the present disclosure
  • Figure 7 is a schematic flowchart of determining the length of a picture group according to a specific example of the present disclosure
  • FIG. 8 is a schematic structural diagram of a device for determining the length of a picture group according to an embodiment of the present disclosure.
  • Embodiments described herein may be described with reference to plan and/or cross-sectional illustrations, with the aid of idealized schematic illustrations of the present disclosure. Accordingly, the example illustrations may be modified based on manufacturing techniques and/or tolerances, and the embodiments are not limited to those shown in the drawings but include modifications of configurations formed based on the manufacturing process. The regions illustrated in the figures are therefore schematic in nature; the shapes of the regions shown are illustrative of the specific shapes of regions of an element and are not intended to be limiting.
  • I frames can be reset according to scene switching, but because the scene detection method is not reliable enough, the final effect is not satisfactory.
  • since secondary or multiple encoding in the field of encoding and decoding can provide information from previous encoding passes for subsequent encoding, research on secondary or multiple encoding has also made certain progress. Combining neural networks with secondary coding to classify the scenes of the video content and finally determine the length of the picture group would be a promising direction.
  • An embodiment of the present disclosure provides a method for determining the length of a picture group. As shown in Figure 2, the method includes the following steps S11 to S13.
  • step S11 the coding information, optical flow motion vector and feature parameters of the video stream are obtained respectively.
  • each frame image in the video stream can be encoded at least once to obtain, for example, 5 channels of encoding information.
  • the coding information may include: prediction method, prediction block division, intra prediction mode, and inter prediction motion vector (Motion Vector, MV).
  • the encoded picture group adopts a fixed length, and the information produced during the first encoding pass is retained. For example, whether intra-frame or inter-frame prediction was used for encoding is represented by 0 and 1 respectively, forming one channel; the division of prediction blocks, using 1 to represent a prediction block boundary and 0 to represent a non-boundary, forms one channel; the intra-frame prediction mode forms one channel; and the motion vector of inter-frame prediction forms two channels.
  • the encoding information can also be obtained through multiple encodings. It should be noted that the encoding information obtained by encoding does not necessarily only include the content corresponding to the above five channels; any encoding information that the encoder can produce can be used.
  • the optical flow motion vector is extracted from each frame of video image in the video stream to obtain the optical flow motion vector in the horizontal and vertical directions.
  • the optical flow motion vector in each direction forms a channel respectively.
  • the optical flow of the video can be obtained by using a traditional algorithm or a neural network algorithm to obtain the optical flow motion vector, forming two channels (including horizontal and vertical motion vectors).
  • a feature extraction network (such as the VGG16 feature part network of the classification network, etc.) can be used to extract features from each frame of the video image in the video stream to obtain feature parameters of a preset number (C) of channels.
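As an illustrative sketch only (not the patented implementation), the (C+7)-channel input described above, i.e. 5 coding-information channels, 2 optical-flow channels and C feature channels, can be assembled as follows; the function and variable names are my own assumptions:

```python
import numpy as np

def build_segmentation_input(coding_info, optical_flow, features):
    """Stack per-pixel inputs along the channel axis.

    coding_info : (5, H, W) - prediction method flag, prediction block
                  boundaries, intra prediction mode, inter MV x, inter MV y
    optical_flow: (2, H, W) - horizontal and vertical optical flow vectors
    features    : (C, H, W) - feature extraction network output

    Returns a (C + 7, H, W) array to feed the semantic segmentation model.
    """
    assert coding_info.shape[0] == 5 and optical_flow.shape[0] == 2
    return np.concatenate([coding_info, optical_flow, features], axis=0)
```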
  • Step S12 Use a preset neural network model to determine the scene to which each pixel of each frame image in the video stream belongs based on the coding information, motion vectors and feature parameters.
  • the scene includes at least two preset scenes.
  • the pixel-level classification of the image is equivalent to semantic segmentation of the image. In this step, it is necessary to determine the classification of each pixel (that is, the scene to which the pixel belongs).
  • Figure 3 shows the process of extracting pixel-level classification of images according to an embodiment of the present disclosure.
  • the (C+7) channels of data obtained in step S11 (i.e., the encoding information, optical flow motion vectors and feature parameters) are input into a semantic segmentation model, which is a neural network model, to obtain a probability feature map composed of 4 channels with exactly the same size as the images in the video stream.
  • Each channel represents the probability that the pixel is a certain preset scene.
  • the scene with the highest probability is the scene to which the pixel belongs.
  • each pixel of each frame has a corresponding scene type; a frame does not necessarily contain only one scene, and up to four scenes may exist within it.
  • the labels for this semantic segmentation task can be manually calibrated; that is, multiple videos are collected, and each pixel in the videos is pre-labeled with the category of the scene it is in. It should be noted that the labels for model training need not be manually calibrated; the results produced by better algorithms can also be used as labels.
  • the loss function of the semantic segmentation model during training can be cross entropy or an improved method of cross entropy.
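A minimal sketch of the per-pixel scene assignment described above: the semantic segmentation model outputs a 4-channel probability map, and the scene with the highest probability is taken as the scene to which each pixel belongs. The names are illustrative, not from the patent:

```python
import numpy as np

def assign_scenes(prob_map):
    """prob_map: (4, H, W) per-pixel probabilities for the four preset
    scenes (e.g. still, switching, mixed, high variance).

    Returns an (H, W) map of scene indices: for each pixel, the preset
    scene with the highest probability is the scene it belongs to.
    """
    return np.argmax(prob_map, axis=0)
```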
  • Step S13: Under the current judgment scene, for each frame image, determine the length of the picture group according to the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image. If the length of the picture group cannot be determined, the length of the picture group is determined under the next judgment scene until all preset scenes are traversed; the current judgment scene is one of the preset scenes and is determined according to the preset scene judgment sequence.
  • the length of the picture group is determined based on the judgment scene. If the length of the picture group cannot be determined in the current judgment scene, the length of the picture group is determined in the next judgment scene. Within one judgment scene, the judgment is made frame by frame: the length of the picture group is determined according to the pixel ratio of the current judgment scene in each image and the preset threshold, where the pixel ratio of the current judgment scene in an image is the ratio of the number of pixels belonging to the current judgment scene in the image to the total number of pixels in the image.
  • the method for determining the length of the picture group respectively obtains the coding information, optical flow motion vectors and feature parameters of the video stream; uses a preset neural network model to determine, based on the coding information, motion vectors and feature parameters, the scene to which each pixel of each frame image in the video stream belongs, where the scenes include at least two preset scenes; and, under the current judgment scene, for each frame image, determines the length of the picture group according to the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image. If the length of the picture group cannot be determined, the length of the picture group is determined under the next judgment scene until all preset scenes are traversed; the current judgment scene is one of the preset scenes.
  • Embodiments of the present disclosure combine adaptive coding of video content information with neural networks to perform scene classification on the video content to determine the length of the picture group.
  • the length of the picture group can change as the video content changes, thereby improving the encoding quality and bit-rate behavior and reducing the size of the compressed code stream.
  • each judgment scenario corresponds to a preset threshold.
  • the length of the picture group is determined based on the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image (ie step S13), including the following steps S131 and S132.
  • step S131: within the image range from the Kth frame to the (K+Gmax)th frame, for each frame image, calculate the ratio of the number of pixels belonging to the current judgment scene in the image to the total number of pixels of the image, obtaining the pixel ratio of the current judgment scene in the image; here, the Kth frame image is the frame following the last frame of the previous picture group, Gmax is a preset length (default value), and Gmax is greater than or equal to the number of preset scenes.
  • the length of the picture group is determined within the image range of the preset length Gmax.
  • the frame image range starts from the K-th frame image and ends at the (K+Gmax)-th frame image.
  • the K frame image is the next frame image of the last frame image in the previous picture group, and is also the first frame image of the current picture group.
  • the preset length Gmax is the maximum length of the picture group, and the minimum value of Gmax is the number of preset scenes; the maximum value of Gmax can be infinite, and in some embodiments, can be set to 64.
  • the pixel ratio is calculated as p = Pi / P, where p is the pixel ratio of the current judgment scene i in the image, Pi is the number of pixels in the image belonging to the current judgment scene i, and P is the total number of pixels in the image.
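The ratio above can be computed directly from a per-pixel scene map; a minimal sketch, with names of my own choosing:

```python
import numpy as np

def pixel_ratio(scene_map, scene_id):
    """p = Pi / P: the fraction of the frame's pixels whose assigned
    scene equals scene_id. scene_map is an (H, W) integer map of
    per-pixel scene indices."""
    return np.count_nonzero(scene_map == scene_id) / scene_map.size
```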
  • step S132 the length of the current picture group is determined based on the pixel ratio and the preset threshold of the current judgment scene.
  • in step S132, the length of the current picture group is determined based on the pixel ratio and the preset threshold of the current judgment scene through the following steps S1321 to S1323.
  • step S1321: within the image range from the Kth frame to the (K+Gmax)th frame, in the order of the frame images in the video stream, the pixel ratio of the current judgment scene in the current frame image is compared with the preset threshold of the current judgment scene. When the pixel ratio is less than or equal to the preset threshold, the pixel ratio of the current judgment scene in the next frame image is compared with the preset threshold, and so on, until the comparison stops when a pixel ratio is greater than the preset threshold.
  • step S1322 if the pixel ratio of the current judgment scene is greater than the preset threshold, the length of the current picture group is determined based on the comparison result between the pixel ratio of the current judgment scene and the preset threshold.
  • the following operations are performed frame by frame: the pixel ratio of the current judgment scene in the current frame image is compared with the preset threshold of the current judgment scene. If the former is less than or equal to the latter, the same comparison is performed for the next frame image, and so on, until the comparison stops when a pixel ratio is greater than the preset threshold.
  • in this case, the length of the current picture group can be determined based on the comparison results between the pixel ratios of the current judgment scene and the preset threshold; that is, the length of the current picture group is determined in the current judgment scene, and there is no need to perform the above processing steps in the next judgment scene.
  • determining the length of the current picture group based on the comparison results between the pixel ratios of the current judgment scene and the preset threshold includes the following step: determining the number of pixel ratios of the current judgment scene that are less than or equal to the preset threshold as the length of the current picture group. That is to say, the number of frames whose pixel ratio did not exceed the preset threshold before the comparison stopped is taken as the length of the current picture group.
  • in some embodiments, determining the length of the picture group in the next judgment scene includes step S1323:
  • step S1323: when the pixel ratios of the current judgment scene in all frame images are less than the preset threshold, the next judgment scene is determined according to the preset scene judgment sequence.
  • in the next judgment scene, for each frame image, the length of the picture group is determined according to the number of pixels in the image belonging to the next judgment scene and the total number of pixels in the image.
  • that is, the length of the picture group is determined in the next judgment scene according to the above-mentioned steps S131 and S1321 to S1322.
  • the method for determining the length of the picture group further includes the following step: if no picture group length can be determined after traversing all preset scenes, the length of the picture group is determined to be the preset length (Gmax). That is to say, if, after traversing all preset scenes according to step S1323, it is found that in every scene the pixel ratio of that scene in all frame images is less than the corresponding threshold, the length of the current picture group is set to the preset length (Gmax).
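Putting steps S1321 to S1323 and the Gmax fallback together, a hedged sketch of the overall scene traversal; the dictionary layout, scene keys and function name are my own assumptions, not the patented implementation:

```python
def determine_gop_length(scene_ratios, thresholds, judgment_order, gmax):
    """Traverse the preset scenes in judgment order (steps S1321-S1323).

    scene_ratios[s]: per-frame pixel ratios of scene s for frames
    K .. K+Gmax; thresholds[s]: preset threshold of scene s.

    Within each scene, count the leading frames whose ratio is <= the
    threshold; the count becomes the picture group length as soon as a
    ratio exceeds the threshold. If no scene decides the length, fall
    back to the preset length Gmax.
    """
    for scene in judgment_order:
        count = 0
        for ratio in scene_ratios[scene]:
            if ratio > thresholds[scene]:
                return count  # length determined in this judgment scene
            count += 1
        # all ratios below threshold: move on to the next judgment scene
    return gmax
```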
  • a preset neural network model is used to determine the scene to which each pixel of each frame image in the video stream belongs based on the coding information, motion vectors and feature parameters (ie step S12) , including the following steps S121 and S122.
  • step S121 the encoding information, optical flow motion vector and feature parameters are input into the neural network model to obtain the probability of each pixel of each frame image in the video stream under the preset scene.
  • step S122 the scene to which each pixel belongs is determined based on probability.
  • the maximum value of the probability of the pixel in each preset scene is determined.
  • the preset scene corresponding to the maximum value is the scene to which the pixel belongs.
  • the preset scene includes at least two of the following scenes: still scene, switching scene, mixed scene, and high variance scene.
  • scenes are divided into four categories: static scenes, switching scenes, mixed scenes, and high variance scenes.
  • a static scene means that there is basically no change or a very small change between frames
  • a switching scene means a sudden change in content
  • a mixed scene means a mixture of two scenes, one of which is gradually getting darker while the other is gradually getting brighter
  • a high variance scene refers to a scene that is unstable, changes greatly and is complex between frames; it is a changing scene other than a switching scene or a static scene, and its variance is very large.
  • the preset scene judgment order is: switching scenes, still scenes, high variance scenes, and mixed scenes. Determining the length of the picture group according to the above scene judgment sequence can quickly and efficiently determine the length of the picture group and improve processing efficiency.
  • the picture group length determination process of the embodiment of the present disclosure will be described in detail through a specific example with reference to FIG. 7.
  • the threshold corresponding to the switching scene is ⁇ 1
  • the threshold corresponding to the stationary scene is ⁇ 2
  • the threshold corresponding to the high variance scene is ⁇ 3
  • the threshold corresponding to the mixed scene is ⁇ 4.
  • the pixel ratio of the switching scene from the Kth frame image to the (K+A-1)th frame image is less than ⁇ 1
  • the pixel ratio of the switching scene of the (K+A)th frame image is greater than ⁇ 1
  • then the length of the current picture group G is A. Otherwise, if judgment proceeds to the still scene and, starting from the Kth frame image to the (K+Gmax)th frame image, the pixel ratio of the still scene in all Gmax frame images is less than θ2, it means that the length of the current picture group cannot be determined in the still scene, and the length of the current picture group is determined in the high variance scene.
  • the range of images for the next picture group is from the (K+G)th frame image to the (K+G+Gmax)th frame image.
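As a trivial illustration of how the next judgment window follows from the determined length G (names are mine):

```python
def next_gop_window(k, g, gmax):
    """After a picture group of length g starting at frame k, the next
    judgment window spans frames (k + g) .. (k + g + gmax)."""
    return (k + g, k + g + gmax)
```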
  • the encoding information of the video stream is obtained through multiple encodings
  • the results of the earlier encodings are used as input for the subsequent final encoding; therefore, this approach is not suitable for encoding applications that require high real-time performance, but it can be used for compressed transmission scenarios such as video on demand.
  • the encoder used in the embodiment of the present disclosure must encode in picture groups, and the environment also needs to support operations related to neural networks, such as convolution and fully connected layers.
  • Embodiments of the present disclosure first encode the original video to obtain, for each macroblock of each frame, the prediction method (intra prediction or inter prediction), the encoded prediction block division, the intra prediction mode, and the inter prediction motion vector. The original video stream is then fed into the optical flow network and the feature extraction network to obtain optical flow features (i.e., optical flow motion vectors) and other feature parameters.
  • the above coding information, optical flow features and other feature parameters are concatenated and fed into the semantic segmentation model of the neural network to obtain a feature map with the same resolution as the original video.
  • the number of channels is 4.
  • the values of this feature map represent the probability that each pixel is classified as a static scene, switching scene, mixed scene or high variance scene; that is, pixel-level semantic segmentation.
  • the scene category corresponding to the highest probability value is regarded as the scene to which the pixel belongs, and the proportion of each scene in the frame image is then counted.
  • the picture group length determination device includes a processing module 101, a scene determination module 102 and a picture group length determination module 103.
  • the processing module 101 is used to obtain coding information, optical flow motion vectors and feature parameters of the video stream respectively.
  • the scene determination module 102 is configured to use a preset neural network model to determine the scene to which each pixel of each frame image in the video stream belongs based on the coding information, the motion vectors and the feature parameters; the scenes include at least two preset scenes.
  • the picture group length determination module 103 is used to, under the current judgment scene, for each frame image, determine the length of the picture group according to the number of pixels in the image belonging to the current judgment scene and the total number of pixels of the image; if the length of the picture group cannot be determined, the length of the picture group is determined under the next judgment scene until all the preset scenes are traversed; wherein the current judgment scene is one of the preset scenes and is determined according to the preset scene judgment sequence.
  • in some embodiments, the picture group length determination module 103 is used to calculate, for each frame image in the range from the Kth frame to the (K+Gmax)th frame, the ratio of the number of pixels belonging to the current judgment scene to the total number of pixels of the image, obtaining the pixel ratio of the current judgment scene in the image, where the Kth frame image is the frame following the last frame of the previous picture group, Gmax is a preset length, and Gmax is greater than or equal to the number of the preset scenes; and to determine the length of the current picture group according to the pixel ratio and the preset threshold of the current judgment scene.
  • in some embodiments, the picture group length determination module 103 is configured to, within the image range from the Kth frame to the (K+Gmax)th frame and in the order of the frame images in the video stream, compare the pixel ratio of the current judgment scene in the current frame image with the preset threshold of the current judgment scene; if the pixel ratio is less than or equal to the preset threshold, the pixel ratio of the current judgment scene in the next frame image is compared with the preset threshold, and the comparison stops when a pixel ratio is greater than the preset threshold; when a pixel ratio of the current judgment scene is greater than the preset threshold, the length of the current picture group is determined according to the comparison results between the pixel ratios of the current judgment scene and the preset threshold.
  • the picture group length determination module 103 is configured to determine the number of pixel ratios of the current judgment scene that are smaller than the preset threshold as the length of the current picture group.
  • in some embodiments, the picture group length determination module 103 is configured to, when the pixel ratios of the current judgment scene in all frame images are less than the preset threshold, determine the next judgment scene according to the preset scene judgment sequence; in the next judgment scene, for each frame image, the length of the picture group is determined according to the number of pixels in the image belonging to the next judgment scene and the total number of pixels in the image.
  • the picture group length determination module 103 is further configured to determine the length of the picture group to be the preset length when the length of the picture group cannot be determined by traversing all the preset scenes.
  • in some embodiments, the scene determination module 102 is configured to input the encoding information, the optical flow motion vectors and the feature parameters into the neural network model to obtain the probability of each pixel of each frame image in the video stream under each preset scene, and to determine the scene to which each pixel belongs based on the probability.
  • the preset scene includes at least two of the following scenes: still scene, switching scene, mixed scene, and high variance scene.
  • the preset scene judgment order is: switching scene, still scene, high variance scene, mixed scene.
  • Embodiments of the present disclosure also provide a computer device.
  • the computer device includes: one or more processors and a storage device, wherein one or more programs are stored on the storage device; when the one or more programs are executed by the one or more processors, the one or more processors implement the picture group length determination method provided in the foregoing embodiments.
  • Embodiments of the present disclosure also provide a computer-readable medium on which a computer program is stored, wherein when the computer program is executed, the method for determining the length of a picture group as provided in the foregoing embodiments is implemented.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, as would be apparent to one skilled in the art, features, characteristics and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the invention as set forth in the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides a picture group length determination method: obtaining encoding information, optical flow motion vectors, and feature parameters of a video stream; using a preset neural network model to determine, from the encoding information, the motion vectors, and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs, the scenes including at least two preset scenes; under the current judgment scene, for each frame image, determining the length of the picture group from the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image; and, when the length of the picture group cannot be determined, determining it under the next judgment scene until all the preset scenes have been traversed. The present disclosure also provides a picture group length determination apparatus, a computer device, and a readable medium.

Description

Picture group length determination method and apparatus, computer device, and readable medium
Cross-reference to related applications
This application claims priority to Chinese patent application CN202210919825.8, entitled "图片组长度确定方法、装置、计算机设备及可读介质" (Picture group length determination method and apparatus, computer device, and readable medium), filed on August 1, 2022, the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of video encoding and decoding, and in particular to a picture group length determination method and apparatus, a computer device, and a readable medium.
Background
With the arrival of the information age, carriers of video, audio, documents, and other information have multiplied. For video in particular, current capture devices already support resolutions up to 8K; at such resolutions, transmitting or storing a raw video file at its original size places very high demands on storage and transmission, so the requirements on video codec technology keep rising. Traditional video coding has evolved for more than thirty years, from the earliest H.261 and MPEG-1 to the latest H.266, MPEG (Moving Picture Experts Group)-5, AV1, and AVS3. All of these are codec protocols based on a hybrid framework: the video is split into multiple GOPs (Groups of Pictures), each GOP consists of multiple frames, and each frame is further divided into slices, tiles, macroblocks, prediction units, and transform units; techniques such as prediction, transform, quantization, filtering, and entropy coding are used to compress the video, with the encoding pipeline shown in Fig. 1. Prediction removes spatial and temporal redundancy: a similar reconstructed block elsewhere in the same frame, or in another frame, is subtracted from the original block to obtain a residual block; the residual block is transformed to obtain transform coefficients whose non-zero values are concentrated in a limited region, which are then quantized to a few quantized values and, together with other information such as motion vectors and intra-prediction modes, passed to the entropy coding module to produce the compressed bitstream. As technology advances, higher quality can be achieved at lower bitrates.
However, most existing video encoders use a fixed GOP length, which results in poor bitrate and encoding quality metrics, such as low peak signal-to-noise ratio and low VMAF (Video Multimethod Assessment Fusion).
Summary
The present disclosure provides a picture group length determination method and apparatus, a computer device, and a readable medium.
In one aspect of the present disclosure, a picture group length determination method is provided. The method includes: obtaining encoding information, optical flow motion vectors, and feature parameters of a video stream; using a preset neural network model to determine, from the encoding information, the motion vectors, and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs, the scenes including at least two preset scenes; and, under the current judgment scene, for each frame image, determining the length of a picture group from the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image, and, when the length of the picture group cannot be determined, determining it under the next judgment scene until all the preset scenes have been traversed; wherein the current judgment scene is one of the preset scenes, determined according to a preset scene judgment order.
In another aspect of the present disclosure, a picture group length determination apparatus is provided, including a processing module, a scene determination module, and a picture group length determination module. The processing module is configured to obtain encoding information, optical flow motion vectors, and feature parameters of a video stream;
the scene determination module is configured to use a preset neural network model to determine, from the encoding information, the motion vectors, and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs, the scenes including at least two preset scenes;
the picture group length determination module is configured to, under the current judgment scene, for each frame image, determine the length of a picture group from the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image, and, when the length of the picture group cannot be determined, determine it under the next judgment scene until all the preset scenes have been traversed; wherein the current judgment scene is one of the preset scenes, determined according to a preset scene judgment order.
In yet another aspect of the present disclosure, a computer device is provided, including: one or more processors; and a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors implement the picture group length determination method described above.
In still another aspect of the present disclosure, a computer-readable medium is provided, having a computer program stored thereon, wherein the program, when executed, implements the picture group length determination method described above.
Brief description of the drawings
Fig. 1 is a schematic diagram of the H.266 hybrid coding framework in the related art;
Fig. 2 is a schematic flowchart of a picture group length determination method according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of determining the scene to which each pixel of each frame image in a video stream belongs, according to an embodiment of the present disclosure;
Fig. 4 is a first schematic flowchart of determining the length of a picture group according to an embodiment of the present disclosure;
Fig. 5 is a second schematic flowchart of determining the length of a picture group according to an embodiment of the present disclosure;
Fig. 6 is a schematic flowchart of determining the scene to which each pixel of each frame image in a video stream belongs, according to an embodiment of the present disclosure;
Fig. 7 is a schematic flowchart of picture group length determination according to a specific example of the present disclosure;
Fig. 8 is a schematic structural diagram of a picture group length determination apparatus according to an embodiment of the present disclosure.
Detailed description
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but they may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the present disclosure to those skilled in the art.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will further be understood that the terms "comprises" and/or "made of", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
Embodiments described herein may be described with reference to plan and/or cross-sectional views by way of idealized schematic illustrations of the present disclosure. Accordingly, the example illustrations may be modified in accordance with manufacturing techniques and/or tolerances. Therefore, the embodiments are not limited to those shown in the drawings, but include modifications of configurations formed on the basis of manufacturing processes. Thus, the regions illustrated in the figures have schematic properties, and the shapes of the regions shown in the figures illustrate specific shapes of regions of elements, but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will further be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning that is consistent with their meaning in the context of the related art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Most existing video encoders use a fixed picture group length: each encoder applies empirical GOP-length values to video sources in different scenes, so the GOP length is the same for every situation within a video. This fixed GOP length prevents the encoder from fully exploiting the information in the video, resulting in poor bitrate and encoding quality metrics.
In the related art, the I-frame can be reset on scene switches, but because scene detection methods are not reliable enough, the final results are still not ideal. For applications without real-time requirements, two-pass or multi-pass encoding can provide the encoding information of previous passes to subsequent passes, and research on two-pass and multi-pass encoding has made some progress. Combining neural networks with two-pass encoding to classify the scenes of the video content and finally determine the picture group length is therefore a promising direction.
An embodiment of the present disclosure provides a picture group length determination method. As shown in Fig. 2, the method includes the following steps S11 to S13.
In step S11, the encoding information, optical flow motion vectors, and feature parameters of the video stream are obtained.
In this step, each frame image in the video stream may be encoded at least once to obtain, for example, five channels of encoding information. The encoding information may include: the prediction type, the prediction block partitioning, the intra-prediction mode, and the motion vector (MV) of inter prediction. In some embodiments, the picture groups used for this encoding pass have a fixed length, and information from the first encoding pass is retained: whether intra or inter prediction was used, represented by 0 and 1 respectively, forms one channel; the prediction block partitioning, with 1 marking a prediction block boundary and 0 a non-boundary, forms one channel; the intra-prediction modes form one channel; and the inter-prediction motion vectors form two channels. It should be noted that the encoding information may also be obtained through multiple encoding passes, and that it is not limited to the contents of the above five channels: any encoding information the encoder can produce may be used.
In this step, optical flow motion vectors are extracted for each frame image in the video stream, yielding horizontal and vertical optical flow motion vectors, with each direction forming one channel. The optical flow of the video may be computed with a traditional algorithm or a neural network algorithm to obtain the optical flow motion vectors, forming two channels (containing the horizontal and vertical motion vectors).
In this step, a feature extraction network (such as the feature portion of the VGG16 classification network) may be used to extract features from each frame image in the video stream, yielding feature parameters with a preset number (C) of channels.
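As a rough illustration of how the three inputs described above might be assembled into a single (C+7)-channel tensor per frame, the sketch below stacks five encoding-information channels, two optical-flow channels, and C feature channels. The array shapes and the helper name are assumptions for illustration, not mandated by the disclosure.

```python
import numpy as np

def assemble_input(enc_info, flow, features):
    """Stack per-frame inputs into one (C+7, H, W) tensor.

    enc_info: (5, H, W) -- prediction type, block boundaries, intra mode,
              and the two inter motion-vector components.
    flow:     (2, H, W) -- horizontal and vertical optical flow.
    features: (C, H, W) -- feature-extraction network output.
    """
    assert enc_info.shape[0] == 5 and flow.shape[0] == 2
    return np.concatenate([enc_info, flow, features], axis=0)

H, W, C = 4, 6, 3
x = assemble_input(np.zeros((5, H, W)), np.zeros((2, H, W)), np.zeros((C, H, W)))
print(x.shape)  # (10, 4, 6), i.e. C + 7 channels
```

The concatenated tensor is what the semantic segmentation model described below would consume.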
In step S12, a preset neural network model is used to determine, from the encoding information, the motion vectors, and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs, the scenes including at least two preset scenes.
Pixel-level classification of an image is semantic segmentation of the image; in this step, the classification of every pixel (i.e. the scene the pixel belongs to) must be determined.
Fig. 3 shows the pixel-level classification process of an embodiment of the present disclosure. As shown in Fig. 3, the (C+7) channels of data obtained in step S11 (i.e. the encoding information, optical flow motion vectors, and feature parameters) are input into a semantic segmentation model, which is a neural network model, to obtain a four-channel probability feature map with exactly the same size as the images in the video stream. Each channel represents the probability that the pixel belongs to one of the preset scenes, and the scene with the largest probability is the scene the pixel belongs to. It should be noted that every pixel of every frame has a corresponding scene type: a frame does not contain only one scene, and all four scenes may be present.
When training the semantic segmentation model, training labels must be determined. The labels for this segmentation task may be annotated manually, i.e. multiple videos are collected and the scene category of every pixel in the videos is specified in advance. It should be noted that the labels need not be annotated manually; the results of a sufficiently good algorithm may also be used as labels. The loss function used when training this semantic segmentation model may be cross entropy or an improved variant of cross entropy.
In step S13, under the current judgment scene, for each frame image, the length of the picture group is determined from the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image; when the length of the picture group cannot be determined, it is determined under the next judgment scene until all the preset scenes have been traversed; the current judgment scene is one of the preset scenes, determined according to a preset scene judgment order.
In this step, the length of the picture group is determined on the basis of the judgment scene: if the length cannot be determined under the current judgment scene, it is determined under the next judgment scene. Under a given judgment scene, the judgment proceeds frame by frame, and the length of the picture group is determined from the pixel ratio of the current judgment scene in the image and a preset threshold, where the pixel ratio of the current judgment scene in an image is the ratio of the number of pixels in the image belonging to the current judgment scene to the total number of pixels in the image.
The picture group length determination method provided by the embodiments of the present disclosure obtains the encoding information, optical flow motion vectors, and feature parameters of a video stream; uses a preset neural network model to determine, from these inputs, the scene to which each pixel of each frame image belongs, the scenes including at least two preset scenes; and, under the current judgment scene, determines the length of the picture group for each frame image from the number of pixels belonging to the current judgment scene and the total number of pixels in the image, falling back to the next judgment scene when the length cannot be determined, until all the preset scenes have been traversed; the current judgment scene is one of the preset scenes, determined according to a preset scene judgment order. The embodiments of the present disclosure combine content-adaptive encoding with neural networks, classifying the scenes of the video content to determine the picture group length, so that the GOP length changes with the video content, improving encoding quality and bitrate and reducing the size of the compressed bitstream.
In some embodiments, as shown in Fig. 4, each judgment scene corresponds to its own preset threshold. Determining, for each frame image, the length of the picture group from the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image (i.e. step S13) includes the following steps S131 and S132.
In step S131, within the image range from the Kth frame to the (K+Gmax)th frame, for each frame image, the ratio of the number of pixels in the image belonging to the current judgment scene to the total number of pixels in the image is computed to obtain the pixel ratio of the current judgment scene in the image; the Kth frame image is the frame following the last frame image of the previous picture group, Gmax is a preset length (a preset value), and Gmax is greater than or equal to the number of preset scenes.
In this step, for each judgment scene, the length of the picture group is determined within an image range of preset length Gmax, which starts at the Kth frame image and ends at the (K+Gmax)th frame image; the Kth frame image is the frame following the last frame image of the previous picture group, and is also the first frame image of the current picture group. The preset length Gmax is the maximum length of a picture group; its minimum value is the number of preset scenes, and its maximum value may be unbounded; in some embodiments it may be set to 64.
For each frame image, the pixel ratio of the current judgment scene in the image is computed according to the following formula (1):
p = Pi / P          (1)
where p is the pixel ratio of the current judgment scene i in the image, Pi is the number of pixels in the image belonging to the current judgment scene i, and P is the total number of pixels in the image.
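Formula (1) can be sketched directly from a per-pixel scene-label map; the label values and map shape below are illustrative assumptions.

```python
import numpy as np

def pixel_ratio(label_map, scene_id):
    """p = Pi / P: fraction of pixels in the frame labelled scene_id."""
    total = label_map.size                       # P, total pixels in the image
    count = int((label_map == scene_id).sum())   # Pi, pixels of the scene
    return count / total

# 2x4 toy label map: scene 0 occupies 6 of the 8 pixels
labels = np.array([[0, 0, 1, 0],
                   [0, 2, 0, 0]])
print(pixel_ratio(labels, 0))  # 0.75
```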
In step S132, the length of the current picture group is determined from the pixel ratio and the preset threshold of the current judgment scene.
In some embodiments, as shown in Fig. 5, determining the length of the current picture group from the pixel ratio and the preset threshold of the current judgment scene (i.e. step S132) includes the following steps S1321 to S1323.
In step S1321, within the image range from the Kth frame to the (K+Gmax)th frame, in the order of the frame images in the video stream, the pixel ratio of the current judgment scene in the current frame image is compared with the preset threshold of the current judgment scene; if the pixel ratio is less than or equal to the preset threshold, the pixel ratio of the current judgment scene in the next frame image is compared with the preset threshold, and the comparison stops once the pixel ratio is greater than the preset threshold.
In step S1322, when the pixel ratio of the current judgment scene is greater than the preset threshold, the length of the current picture group is determined from the result of comparing the pixel ratio of the current judgment scene with the preset threshold.
In other words, within the image range starting at the Kth frame image and ending at the (K+Gmax)th frame image, the following operation is performed frame by frame: the pixel ratio of the current judgment scene in the current frame image is compared with the preset threshold of the current judgment scene; if the former is less than or equal to the latter, the comparison is repeated for the next frame image, and so on, until the pixel ratio is greater than the preset threshold, at which point the comparison stops. The length of the current picture group can then be determined from the comparison results, i.e. the length is determined under the current judgment scene, and the above processing need not be repeated under the next judgment scene.
In some embodiments, determining the length of the current picture group from the result of comparing the pixel ratio of the current judgment scene with the preset threshold (i.e. step S1322) includes the following step: determining the number of pixel ratios of the current judgment scene that are smaller than the preset threshold as the length of the current picture group. In other words, the number of pixel ratios of the current judgment scene preceding the one that exceeds the preset threshold is determined as the length of the current picture group.
In some embodiments, as shown in Fig. 6, determining the length of the picture group under the next judgment scene when the length cannot be determined (i.e. step S132) includes step S1323:
In step S1323, when the pixel ratios of the current judgment scene in all frame images are less than the preset threshold, the next judgment scene is determined according to the preset scene judgment order; under the next judgment scene, for each frame image, the length of the picture group is determined from the number of pixels in the image belonging to the next judgment scene and the total number of pixels in the image.
Within the image range starting at the Kth frame image and ending at the (K+Gmax)th frame image, if the pixel ratios of the current judgment scene in all frame images are less than the preset threshold, the length of the picture group cannot be determined under the current judgment scene; it is then determined under the next judgment scene according to steps S131 and S1321-S1322 above.
In some embodiments, after determining the scene to which each pixel of each frame image in the video stream belongs (i.e. step S12), the picture group length determination method further includes the following step: when the length of the picture group cannot be determined after traversing all the preset scenes, determining the length of the picture group to be the preset length (Gmax). In other words, if after traversing all the preset scenes according to step S1323 it is found that, for every scene, the pixel ratios of that scene in all frame images are less than the corresponding threshold, the length of the current picture group is set to the preset length (Gmax).
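Steps S131 through S1323, including the Gmax fallback, can be sketched as follows. The per-frame pixel ratios are assumed to be precomputed for the candidate window, and the scene names and thresholds are placeholders for illustration.

```python
def gop_length(ratios_per_scene, thresholds, scene_order, g_max):
    """Return the current GOP length.

    ratios_per_scene[s][j]: pixel ratio of scene s in the j-th frame of
    the candidate window (frames K .. K+g_max).
    """
    for scene in scene_order:                   # preset judgment order
        thr = thresholds[scene]
        for j, ratio in enumerate(ratios_per_scene[scene][:g_max + 1]):
            if ratio > thr:                     # first frame exceeding the threshold
                return j                        # count of below-threshold ratios
        # all ratios below threshold: fall through to the next judgment scene
    return g_max                                # no scene decided: preset length

ratios = {
    "switch": [0.0, 0.1, 0.6, 0.0],  # exceeds 0.5 at window index 2
    "still":  [0.2, 0.2, 0.2, 0.2],
}
thr = {"switch": 0.5, "still": 0.8}
print(gop_length(ratios, thr, ["switch", "still"], 3))  # 2
```

When no frame in the window exceeds its scene threshold for any scene, the function returns g_max, mirroring the fallback described above.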
In some embodiments, as shown in Fig. 5, using the preset neural network model to determine, from the encoding information, the motion vectors, and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs (i.e. step S12) includes the following steps S121 and S122.
In step S121, the encoding information, the optical flow motion vectors, and the feature parameters are input into the neural network model to obtain, for each pixel of each frame image in the video stream, its probability under each preset scene.
In step S122, the scene to which each pixel belongs is determined from the probabilities.
For each pixel, the maximum of the pixel's probabilities over the preset scenes is found; the preset scene corresponding to that maximum is the scene the pixel belongs to.
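Steps S121 and S122 amount to a per-pixel argmax over the four-channel probability map produced by the segmentation model. The channel-to-scene mapping below is an assumption for illustration.

```python
import numpy as np

# Assumed channel order: 0=still, 1=switching, 2=mixed, 3=high variance.
SCENES = ("still", "switching", "mixed", "high_variance")

def pixel_scenes(prob_map):
    """prob_map: (4, H, W) per-scene probabilities -> (H, W) scene indices."""
    return np.argmax(prob_map, axis=0)

probs = np.zeros((4, 2, 2))
probs[1, :, :] = 0.7       # switching dominates everywhere...
probs[0, 0, 0] = 0.9       # ...except the top-left pixel, which is still
labels = pixel_scenes(probs)
print(labels)  # [[0 1], [1 1]]
```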
In some embodiments, the preset scenes include at least two of the following scenes: still scene, switching scene, mixed scene, high variance scene.
In embodiments of the present disclosure, scenes fall into four categories: still scene, switching scene, mixed scene, and high variance scene. A still scene means there is essentially no change, or only very small change, between frames. A switching scene means the content changes abruptly. A mixed scene is a blend of two scenes, one growing darker and the other growing brighter. A high variance scene is an unstable scene whose frames change greatly and in complex ways; it is a changing scene other than a switching or still scene, with a large variance.
In some embodiments, when the preset scenes include a still scene, a switching scene, a mixed scene, and a high variance scene, the preset scene judgment order is: switching scene, still scene, high variance scene, mixed scene. Determining the picture group length in this judgment order allows the length to be determined quickly and efficiently, improving processing efficiency.
To clearly illustrate the solution of the embodiments of the present disclosure, the picture group length determination process of an embodiment is described in detail below through a specific example with reference to Fig. 7. In this example, the threshold for the switching scene is λ1, the threshold for the still scene is λ2, the threshold for the high variance scene is λ3, and the threshold for the mixed scene is λ4.
As shown in Fig. 7, a maximum picture group length Gmax is first determined. Under the switching scene, within the range starting at the Kth frame image and ending at the (K+Gmax)th frame image, if the pixel ratios of the switching scene from the Kth frame image to the (K+A-1)th frame image are all less than λ1, and the pixel ratio of the switching scene in the (K+A)th frame image is greater than λ1, the length of the current picture group is G=A. If, within that range, the pixel ratios of the switching scene in all Gmax frame images are less than λ1, the length of the current picture group cannot be determined under the switching scene, and it is determined under the still scene.
Under the still scene, within the range starting at the Kth frame image and ending at the (K+Gmax)th frame image, if the pixel ratios of the still scene from the Kth frame image to the (K+B-1)th frame image are all less than λ2, and the pixel ratio of the still scene in the (K+B)th frame image is greater than λ2, the length of the current picture group is G=B. If, within that range, the pixel ratios of the still scene in all Gmax frame images are less than λ2, the length of the current picture group cannot be determined under the still scene, and it is determined under the high variance scene.
Under the high variance scene, within the range starting at the Kth frame image and ending at the (K+Gmax)th frame image, if the pixel ratios of the high variance scene from the Kth frame image to the (K+C-1)th frame image are all less than λ3, and the pixel ratio of the high variance scene in the (K+C)th frame image is greater than λ3, the length of the current picture group is G=C. If, within that range, the pixel ratios of the high variance scene in all Gmax frame images are less than λ3, the length of the current picture group cannot be determined under the high variance scene, and it is determined under the mixed scene.
Under the mixed scene, within the range starting at the Kth frame image and ending at the (K+Gmax)th frame image, if the pixel ratios of the mixed scene from the Kth frame image to the (K+D-1)th frame image are all less than λ4, and the pixel ratio of the mixed scene in the (K+D)th frame image is greater than λ4, the length of the current picture group is G=D. If, within that range, the pixel ratios of the mixed scene in all Gmax frame images are less than λ4, the length of the current picture group cannot be determined under the mixed scene, and the length of the current picture group is G=Gmax.
Finally, the length of the current picture group is set to G, and K is updated to K+G to determine the length of the next picture group; the image range of the next picture group is (the (K+G)th frame image, the (K+G+Gmax)th frame image).
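The cascade over the four scenes and the K = K + G update can be sketched as a loop that partitions a whole stream into GOP lengths. The thresholds and ratio values are illustrative placeholders, and the `max(1, ...)` guard is an assumption to keep the loop advancing when the first frame of a window already exceeds its threshold.

```python
ORDER = ("switching", "still", "high_variance", "mixed")

def next_gop(ratios, thresholds, k, g_max, n_frames):
    """Length of the GOP starting at frame K (the cascade of Fig. 7)."""
    end = min(k + g_max, n_frames - 1)
    for scene in ORDER:                       # preset judgment order
        for frame in range(k, end + 1):
            if ratios[scene][frame] > thresholds[scene]:
                return frame - k              # G = count of below-threshold frames
    return g_max                              # all scenes inconclusive: G = Gmax

def partition(ratios, thresholds, g_max, n_frames):
    k, lengths = 0, []
    while k < n_frames:
        g = max(1, next_gop(ratios, thresholds, k, g_max, n_frames))  # guard G >= 1
        lengths.append(g)
        k += g                                # K = K + G for the next picture group
    return lengths

thr = {s: 0.5 for s in ORDER}
ratios = {s: [0.0] * 10 for s in ORDER}
ratios["switching"][3] = 0.9                  # a scene cut at frame 3
print(partition(ratios, thr, 4, 10))  # [3, 1, 4, 4]
```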
When the encoding information of the video stream is obtained through multiple encoding passes, the embodiments of the present disclosure use the results of those passes as input to the subsequent final encoding; they are therefore not suitable for encoding applications with strict real-time requirements, but can be used for local storage or compressed transmission without time constraints, such as video on demand.
The encoder to which the embodiments of the present disclosure are applied must encode in units of picture groups, and the environment must support neural network operations such as convolution and fully connected layers.
The embodiments of the present disclosure first encode the original video to obtain the prediction type of each macroblock in each frame (intra or inter prediction), the prediction unit partitioning of the encoding, the intra-prediction modes, and the inter-prediction motion vectors. The original video stream is then fed into an optical flow network and a feature extraction network to obtain optical flow features (i.e. the optical flow motion vectors) and other feature parameters. The encoding information, optical flow features, and other feature parameters are concatenated and fed into the semantic segmentation model of the neural network to obtain a feature map with the same resolution as the original video and four channels, whose values represent the probabilities of classification as a still scene, switching scene, mixed scene, and high variance scene respectively, i.e. pixel-level semantic segmentation. The scene category corresponding to the largest probability is taken as the scene the pixel belongs to, and the proportion each scene occupies in its frame is then computed. A maximum picture group length Gmax is set; from the Kth frame to the (K+Gmax)th frame, if the pixel ratio of the switching scene in the (K+A)th frame is greater than the threshold corresponding to that scene, the length of the picture group is A; otherwise, under the still scene, if at the (K+B)th frame the pixel ratio of the corresponding scene in the frame exceeds the corresponding threshold, the length of the picture group is B; and similarly for the high variance and mixed scenes.
Based on the same technical concept, embodiments of the present disclosure further provide a picture group length determination apparatus. As shown in Fig. 8, the apparatus includes a processing module 101, a scene determination module 102, and a picture group length determination module 103. The processing module 101 is configured to obtain the encoding information, optical flow motion vectors, and feature parameters of a video stream.
The scene determination module 102 is configured to use a preset neural network model to determine, from the encoding information, the motion vectors, and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs, the scenes including at least two preset scenes.
The picture group length determination module 103 is configured to, under the current judgment scene, for each frame image, determine the length of the picture group from the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image, and, when the length of the picture group cannot be determined, determine it under the next judgment scene until all the preset scenes have been traversed; the current judgment scene is one of the preset scenes, determined according to a preset scene judgment order.
In some embodiments, the picture group length determination module 103 is configured to, within the image range from the Kth frame to the (K+Gmax)th frame, for each frame image, compute the ratio of the number of pixels in the image belonging to the current judgment scene to the total number of pixels in the image to obtain the pixel ratio of the current judgment scene in the image, where the Kth frame image is the frame following the last frame image of the previous picture group, Gmax is a preset length, and Gmax is greater than or equal to the number of preset scenes; and to determine the length of the current picture group from the pixel ratio and the preset threshold of the current judgment scene.
In some embodiments, the picture group length determination module 103 is configured to, within the image range from the Kth frame to the (K+Gmax)th frame, in the order of the frame images in the video stream, compare the pixel ratio of the current judgment scene in the current frame image with the preset threshold of the current judgment scene; if the pixel ratio is less than or equal to the preset threshold, compare the pixel ratio of the current judgment scene in the next frame image with the preset threshold, stopping when the pixel ratio is greater than the preset threshold; and, when the pixel ratio of the current judgment scene is greater than the preset threshold, determine the length of the current picture group from the result of comparing the pixel ratio of the current judgment scene with the preset threshold.
In some embodiments, the picture group length determination module 103 is configured to determine the number of pixel ratios of the current judgment scene that are smaller than the preset threshold as the length of the current picture group.
In some embodiments, the picture group length determination module 103 is configured to, when the pixel ratios of the current judgment scene in all frame images are less than the preset threshold, determine the next judgment scene according to the preset scene judgment order and, under the next judgment scene, for each frame image, determine the length of the picture group from the number of pixels in the image belonging to the next judgment scene and the total number of pixels in the image.
In some embodiments, the picture group length determination module 103 is further configured to determine the length of the picture group to be the preset length when the length of the picture group cannot be determined after traversing all the preset scenes.
In some embodiments, the scene determination module 102 is configured to input the encoding information, the optical flow motion vectors, and the feature parameters into the neural network model to obtain, for each pixel of each frame image in the video stream, its probability under each preset scene, and to determine the scene each pixel belongs to from the probabilities.
In some embodiments, the preset scenes include at least two of the following scenes: still scene, switching scene, mixed scene, high variance scene.
In some embodiments, when the preset scenes include a still scene, a switching scene, a mixed scene, and a high variance scene, the preset scene judgment order is: switching scene, still scene, high variance scene, mixed scene.
Embodiments of the present disclosure further provide a computer device, including one or more processors and a storage device storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the picture group length determination method provided in the foregoing embodiments.
Embodiments of the present disclosure further provide a computer-readable medium having a computer program stored thereon, wherein the computer program, when executed, implements the picture group length determination method provided in the foregoing embodiments.
Those of ordinary skill in the art will understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the apparatus, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, it is well known to those of ordinary skill in the art that communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, as would be apparent to one skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the invention as set forth in the appended claims.

Claims (12)

  1. A picture group length determination method, the method comprising:
    obtaining encoding information, optical flow motion vectors, and feature parameters of a video stream;
    using a preset neural network model to determine, from the encoding information, the motion vectors, and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs, the scenes comprising at least two preset scenes; and
    under the current judgment scene, for each frame image, determining a length of a picture group from the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image, and, when the length of the picture group cannot be determined, determining the length of the picture group under the next judgment scene until all the preset scenes have been traversed; wherein the current judgment scene is one of the preset scenes, determined according to a preset scene judgment order.
  2. The method of claim 1, wherein each judgment scene corresponds to a respective preset threshold, and determining, for each frame image, the length of the picture group from the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image comprises:
    within an image range from a Kth frame to a (K+Gmax)th frame, for each frame image, computing the ratio of the number of pixels in the image belonging to the current judgment scene to the total number of pixels in the image to obtain the pixel ratio of the current judgment scene in the image; wherein the Kth frame image is the frame following the last frame image of the previous picture group, Gmax is a preset length, and Gmax is greater than or equal to the number of the preset scenes; and
    determining the length of the current picture group from the pixel ratio and the preset threshold of the current judgment scene.
  3. The method of claim 2, wherein determining the length of the current picture group from the pixel ratio and the preset threshold of the current judgment scene comprises:
    within the image range from the Kth frame to the (K+Gmax)th frame, in the order of the frame images in the video stream, comparing the pixel ratio of the current judgment scene in the current frame image with the preset threshold of the current judgment scene; when the pixel ratio is less than or equal to the preset threshold, comparing the pixel ratio of the current judgment scene in the next frame image with the preset threshold, and stopping the comparison when the pixel ratio is greater than the preset threshold; and
    when the pixel ratio of the current judgment scene is greater than the preset threshold, determining the length of the current picture group from the result of comparing the pixel ratio of the current judgment scene with the preset threshold.
  4. The method of claim 3, wherein determining the length of the current picture group from the result of comparing the pixel ratio of the current judgment scene with the preset threshold comprises:
    determining the number of pixel ratios of the current judgment scene that are smaller than the preset threshold as the length of the current picture group.
  5. The method of claim 3, wherein determining the length of the picture group under the next judgment scene when the length of the picture group cannot be determined comprises:
    when the pixel ratios of the current judgment scene in all the frame images are less than the preset threshold, determining the next judgment scene according to the preset scene judgment order, and, under the next judgment scene, for each frame image, determining the length of the picture group from the number of pixels in the image belonging to the next judgment scene and the total number of pixels in the image.
  6. The method of claim 2, wherein, after determining the scene to which each pixel of each frame image in the video stream belongs, the method further comprises:
    when the length of the picture group cannot be determined after traversing all the preset scenes, determining the length of the picture group to be the preset length.
  7. The method of claim 1, wherein using the preset neural network model to determine, from the encoding information, the motion vectors, and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs comprises:
    inputting the encoding information, the optical flow motion vectors, and the feature parameters into the neural network model to obtain, for each pixel of each frame image in the video stream, a probability under each of the preset scenes; and
    determining the scene to which each pixel belongs from the probabilities.
  8. The method of any one of claims 1-7, wherein the preset scenes comprise at least two of the following scenes: a still scene, a switching scene, a mixed scene, and a high variance scene.
  9. The method of claim 8, wherein, when the preset scenes comprise a still scene, a switching scene, a mixed scene, and a high variance scene, the preset scene judgment order is: switching scene, still scene, high variance scene, mixed scene.
  10. A picture group length determination apparatus, comprising a processing module, a scene determination module, and a picture group length determination module,
    wherein the processing module is configured to obtain encoding information, optical flow motion vectors, and feature parameters of a video stream;
    the scene determination module is configured to use a preset neural network model to determine, from the encoding information, the motion vectors, and the feature parameters, the scene to which each pixel of each frame image in the video stream belongs, the scenes comprising at least two preset scenes; and
    the picture group length determination module is configured to, under the current judgment scene, for each frame image, determine a length of a picture group from the number of pixels in the image belonging to the current judgment scene and the total number of pixels in the image, and, when the length of the picture group cannot be determined, determine the length of the picture group under the next judgment scene until all the preset scenes have been traversed; wherein the current judgment scene is one of the preset scenes, determined according to a preset scene judgment order.
  11. A computer device, comprising:
    one or more processors; and
    a storage device having one or more programs stored thereon;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the picture group length determination method of any one of claims 1-9.
  12. A computer-readable medium having a computer program stored thereon, wherein the program, when executed, implements the picture group length determination method of any one of claims 1-9.
PCT/CN2023/110200 2022-08-01 2023-07-31 图片组长度确定方法、装置、计算机设备及可读介质 WO2024027639A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210919825.8 2022-08-01
CN202210919825.8A CN117544770A (zh) 2022-08-01 2022-08-01 图片组长度确定方法、装置、计算机设备及可读介质

Publications (1)

Publication Number Publication Date
WO2024027639A1 true WO2024027639A1 (zh) 2024-02-08

Family

ID=89790538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110200 WO2024027639A1 (zh) 2022-08-01 2023-07-31 图片组长度确定方法、装置、计算机设备及可读介质

Country Status (2)

Country Link
CN (1) CN117544770A (zh)
WO (1) WO2024027639A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170099485A1 (en) * 2011-01-28 2017-04-06 Eye IO, LLC Encoding of Video Stream Based on Scene Type
CN108574843A (zh) * 2017-03-14 2018-09-25 安讯士有限公司 确定用于视频编码的gop长度的方法和编码器系统
CN112347996A (zh) * 2020-11-30 2021-02-09 上海眼控科技股份有限公司 一种场景状态判断方法、装置、设备及存储介质
CN113496208A (zh) * 2021-05-20 2021-10-12 华院计算技术(上海)股份有限公司 视频的场景分类方法及装置、存储介质、终端
WO2022042156A1 (zh) * 2020-08-27 2022-03-03 百果园技术(新加坡)有限公司 基于场景切换的图像组划分方法及装置、视频编码方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170099485A1 (en) * 2011-01-28 2017-04-06 Eye IO, LLC Encoding of Video Stream Based on Scene Type
CN108574843A (zh) * 2017-03-14 2018-09-25 安讯士有限公司 确定用于视频编码的gop长度的方法和编码器系统
WO2022042156A1 (zh) * 2020-08-27 2022-03-03 百果园技术(新加坡)有限公司 基于场景切换的图像组划分方法及装置、视频编码方法及装置
CN112347996A (zh) * 2020-11-30 2021-02-09 上海眼控科技股份有限公司 一种场景状态判断方法、装置、设备及存储介质
CN113496208A (zh) * 2021-05-20 2021-10-12 华院计算技术(上海)股份有限公司 视频的场景分类方法及装置、存储介质、终端

Also Published As

Publication number Publication date
CN117544770A (zh) 2024-02-09

Similar Documents

Publication Publication Date Title
CN108632625B (zh) 一种视频编码方法、视频解码方法和相关设备
EP3598758B1 (en) Encoder decisions based on results of hash-based block matching
EP3389276B1 (en) Hash-based encoder decisions for video coding
US11743475B2 (en) Advanced video coding method, system, apparatus, and storage medium
IL227674A (en) Encoding a video stream based on scene type
US12041243B2 (en) Systems and methods for compressing video
US20130235935A1 (en) Preprocessing method before image compression, adaptive motion estimation for improvement of image compression rate, and method of providing image data for each image type
Peixoto et al. Fast H. 264/AVC to HEVC transcoding based on machine learning
CA2689441C (en) A system and method for time optimized encoding
US20130251033A1 (en) Method of compressing video frame using dual object extraction and object trajectory information in video encoding and decoding process
US20150249829A1 (en) Method, Apparatus and Computer Program Product for Video Compression
CN114157870A (zh) 编码方法、介质及电子设备
WO2024027639A1 (zh) 图片组长度确定方法、装置、计算机设备及可读介质
TW202133618A (zh) 編解碼器及編解碼方法、電腦程式產品、非暫態性存儲介質
CN108024111A (zh) 一种帧类型判定方法及装置
US20150304686A1 (en) Systems and methods for improving quality of color video streams
EP4447441A1 (en) Video processing method and related device
RU2801266C2 (ru) Способ для кодирования изображения на основе внутреннего прогнозирования с использованием mpm-списка и оборудование для этого
Wang et al. High Performant AV1 for VOD Applications
Asfandyar et al. Accelerated CU decision based on enlarged CU sizes for HEVC UHD videos
US20230269380A1 (en) Encoding method, decoding method, encoder, decoder and storage medium
EP3606070A1 (en) Systems, methods, and apparatuses for video processing
JP2024535840A (ja) ループフィルタリング方法、ビデオ復号方法、ビデオ符号化方法、ループフィルタリング装置、ビデオ復号装置、ビデオ符号化装置、電子機器及びコンピュータプログラム
CN117319688A (zh) 视频数据的编码方法、计算机设备及存储介质
CN111901605A (zh) 视频处理方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23849346

Country of ref document: EP

Kind code of ref document: A1