WO2021174878A1 - Video encoding method and apparatus, computer device and storage medium - Google Patents
Video encoding method and apparatus, computer device and storage medium
- Publication number: WO2021174878A1 (PCT/CN2020/124536)
- Authority: WO - WIPO (PCT)
- Prior art keywords: video, video segment, image processing, original, original video
Classifications
- H04N21/234381 - Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
- H04N19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/124 - Quantisation
- H04N19/117 - Filters, e.g. for pre-processing or post-processing
- H04N19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
- H04N19/154 - Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
- H04N19/179 - Adaptive coding characterised by the coding unit, the unit being a scene or a shot
- H04N19/186 - Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
- H04N19/85 - Pre-processing or post-processing specially adapted for video compression
- H04N21/4402 - Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/8456 - Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain
Definitions
- the embodiments of the present application relate to the field of video processing technologies, and in particular, to a video encoding method, device, computer equipment, and storage medium.
- When the video server on the network side provides network video to the user terminal, it can encode the original video data and push it to the user terminal.
- the video server can encode the original video data through perceptual coding technology. For example, the video server divides the original video data into video segments, and then determines appropriate encoding parameters for each video segment, and encodes the corresponding video segments according to the determined encoding parameters.
- the solution shown in the related art directly encodes video segments divided from the original video data.
- When the original video quality is too high or too low, video coding efficiency and encoded video quality cannot both be taken into account.
- the embodiments of the present application provide a video encoding method, device, computer equipment, and storage medium, which can take into account both video encoding efficiency and encoded video quality.
- the technical solution is as follows:
- In one aspect, a video encoding method is provided, which is applied to a computer device, and the method includes:
- segmenting original video data to obtain original video segments; performing video content analysis on the original video segments to obtain video image processing parameters corresponding to the original video segments; performing image processing on the video images in the original video segments based on the video image processing parameters to obtain processed video segments; acquiring encoding parameters of the processed video segments based on image feature data of the processed video segments; and encoding the processed video segments according to the encoding parameters to obtain encoded video segments.
- a video encoding device in another aspect, includes:
- the video segmentation module is used to segment the original video data to obtain the original video segment
- a video content analysis module configured to perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment
- a video processing module configured to perform image processing on the video image in the original video segment based on the video image processing parameter to obtain a processed video segment
- An encoding parameter acquisition module configured to acquire the encoding parameters of the processed video segment based on the image feature data of the processed video segment;
- the encoding module is configured to encode the processed video segment according to the encoding parameter to obtain an encoded video segment.
- the video content analysis module is used to: input the original video segment and target image quality data into a video analysis model, and obtain the video image processing parameters output by the video analysis model;
- the video analysis model is a machine learning model obtained by training with video segment samples, target image quality data of the video segment samples, and video image processing parameters of the video segment samples.
- the target image quality data includes an image quality level.
- the video image processing parameter includes at least one of the following:
- a target frame rate, a target quantization bit depth, and a brightness adjustment curve.
- the video processing module includes:
- a frame rate up-sampling unit, configured to, in response to the video image processing parameters including the target frame rate and the target frame rate being higher than the frame rate of the original video segment, perform super-frame (frame rate up-sampling) processing on the original video segment to obtain the processed video segment;
- a frame rate cropping unit, configured to, in response to the video image processing parameters including the target frame rate and the target frame rate being lower than the frame rate of the original video segment, perform frame-dropping (frame rate down-sampling) processing on the original video segment to obtain the processed video segment;
- a frame rate maintaining unit, configured to, in response to the video image processing parameters including the target frame rate and the target frame rate being equal to the frame rate of the original video segment, maintain the frame rate of the original video segment to obtain the processed video segment.
- the video processing module includes:
- a down-sampling quantization unit, configured to, in response to the video image processing parameter including the target quantization bit depth and the target quantization bit depth being lower than the quantization bit depth of the original video segment, perform down-sampling quantization on the original video segment to obtain the processed video segment;
- an inverse quantization unit, configured to, in response to the video image processing parameter including the target quantization bit depth and the target quantization bit depth being higher than the quantization bit depth of the original video segment, perform high-precision inverse quantization on the original video segment to obtain the processed video segment;
- a quantization holding unit, configured to, in response to the video image processing parameter including the target quantization bit depth and the target quantization bit depth being equal to the quantization bit depth of the original video segment, maintain the quantization bit depth of the original video segment to obtain the processed video segment.
- the video processing module includes:
- a tone mapping unit, configured to, in response to the video image processing parameter including the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being inconsistent with the brightness range of the original video segment, perform tone mapping on the original video segment based on the brightness adjustment curve to obtain the processed video segment;
- a tone preserving unit, configured to, in response to the video image processing parameter including the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being consistent with the brightness range of the original video segment, maintain the tone of the original video segment to obtain the processed video segment.
- the encoding parameter acquisition module is configured to: input the image feature data of the processed video segment into a coding parameter determination model, and obtain the coding parameters output by the coding parameter determination model;
- the coding parameter determination model is a machine learning model obtained through training with image feature data samples and coding parameters corresponding to the image feature data samples.
- the image feature data includes at least one of the following: frame rate, quantization bit depth, maximum brightness, minimum brightness, image type, motion vector, and target image quality data.
- the encoding parameter includes a code rate.
- the video segmentation module is configured to segment the original video data according to a specified dimension to obtain the original video segment;
- the specified dimension includes at least one of the following: the distribution characteristics of the dark part and the highlight part in the image, the trajectory and the degree of motion of the motion area, the color distribution and intensity, and the details of the picture.
- the device further includes:
- the merging module is used for merging each coded video segment according to the division order of the corresponding original video segment to obtain the coded video data.
- a computer device in another aspect, includes a processor and a memory.
- the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the video encoding method as described above.
- In another aspect, a computer-readable storage medium is provided, and the storage medium stores at least one instruction, at least one program, a code set, or an instruction set; the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the video encoding method as described above.
- Before the encoding parameters are determined, image processing is performed on the original video segment, and then the encoding parameters corresponding to the processed video segment are determined and the processed video segment is encoded accordingly.
- FIG. 1 is a system configuration diagram of a video service system related to various embodiments of the present application
- Figure 2 is a schematic diagram of the ultra-high-definition dimension decomposition involved in this application.
- Fig. 3 is a flowchart showing a video encoding method according to an exemplary embodiment
- FIG. 4 is a schematic diagram of a video encoding process involved in the embodiment shown in FIG. 3;
- Fig. 5 is a flowchart showing a video encoding method according to an exemplary embodiment
- FIG. 6 is a schematic diagram of the input and output of the video analysis model involved in the embodiment shown in FIG. 5;
- FIG. 7 is a schematic diagram of the input and output of the coding parameter determination model involved in the embodiment shown in FIG. 5;
- Fig. 8 is a block diagram showing the structure of a video encoding device according to an exemplary embodiment
- Fig. 9 is a block diagram showing the structure of a video encoding device according to an exemplary embodiment
- Fig. 10 is a schematic structural diagram showing a computer device according to an exemplary embodiment
- Fig. 11 is a schematic diagram showing the structure of a computer device according to an exemplary embodiment.
- the embodiment of the present application proposes a video coding scheme, which can better match image quality and coding parameters based on artificial intelligence (AI), and take into account the efficiency of video coding and the quality of the encoded video.
- Shot segmentation refers to dividing the input original film source (to be encoded) into a number of temporally consecutive segments; the segments do not overlap, and all the segments can be combined in order to restore the original film source.
- Images with a continuous time range and similar content are grouped into segments called "shots", and subsequent processing is performed in units of the video image segments represented by these "shots".
- AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
- Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
- Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
- Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
- Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, and so on.
- artificial intelligence technology will be applied in more fields and exert more and more important value.
- Machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
- Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
- Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from teaching.
- FIG. 1 shows a system configuration diagram of a video service system involved in various embodiments of the present application.
- the system includes a server 120, a database 140, and a number of terminals 160.
- the server 120 is a server, or a server cluster composed of several servers, or a virtualization platform, or a cloud computing service center.
- the server 120 may be a server that provides background support for video service applications.
- the server 120 may be composed of one or more functional units.
- the server 120 may include an interface unit 120a, an encoding unit 120b, and a pushing unit 120c.
- the interface unit 120a is used for information interaction with video service applications installed in the terminal 160 to obtain user-related information corresponding to the terminal 160, such as user account information and user operation information.
- the encoding unit 120b is used for encoding unencoded video data to obtain an encoded video.
- the pushing unit 120c is configured to push the encoded video to the terminal 160 corresponding to each user.
- The aforementioned database 140 may be a Redis database, or may be another type of database. The database 140 is used to store various types of data, such as the user information of each user, as well as various unencoded original video data, encoded video data, and so on.
- After the interface unit 120a obtains the relevant information of the user corresponding to each terminal, it stores the relevant information of the user in the database 140; the encoding unit 120b encodes the original video data stored in the database 140 and stores the result back in the database 140; and when the pushing unit 120c pushes the video to the user, it extracts the encoded video data from the database 140 and pushes it to the terminal corresponding to the user.
- the foregoing video encoding may also be performed by the terminal 160.
- the terminal 160 may record original video data through an image acquisition component or screen recording software, and encode the recorded original video data and upload it to the network side, so that other terminals can obtain the encoded and uploaded video data from the network side.
- the terminal 160 encodes the original video data and uploads the encoded video data to the server 120.
- After the server 120 stores the encoded video data in the database 140, when the server 120 receives requests sent by other terminals, the encoded video data can be obtained from the database 140 and pushed to the other terminals, or the encoded video data can be obtained from the database 140 and sent to a content distribution network, from which the other terminals can pull the encoded video data.
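- As a rough, hypothetical illustration of this division of labor, the sketch below models the interface unit, encoding unit, and pushing unit as thin wrappers around a Redis store using a redis-py client; the class name, method names, and key layout are illustrative assumptions and are not defined by this application.

```python
# Minimal sketch of the server-side units described above, assuming a redis-py
# client; all class, method, and key names are illustrative only.
import redis

class VideoServiceServer:
    def __init__(self, host="localhost", port=6379):
        self.db = redis.Redis(host=host, port=port)  # stands in for database 140

    # interface unit 120a: store user-related information
    def store_user_info(self, user_id: str, info: bytes) -> None:
        self.db.set(f"user:{user_id}:info", info)

    # encoding unit 120b: encode original video data and store the result back
    def encode_stored_video(self, video_id: str, encode_fn) -> None:
        original = self.db.get(f"video:{video_id}:original")
        if original is not None:
            self.db.set(f"video:{video_id}:encoded", encode_fn(original))

    # pushing unit 120c: fetch encoded data and hand it to a push channel
    def push_video(self, video_id: str, send_fn) -> None:
        encoded = self.db.get(f"video:{video_id}:encoded")
        if encoded is not None:
            send_fn(encoded)
```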
- the terminal 160 may be a terminal device with a network connection function and installed with a video service application corresponding to the server 120.
- For example, the terminal 160 may be a smart phone, a tablet computer, an e-book reader, smart glasses, a smart watch, an MP3 (Moving Picture Experts Group Audio Layer III) player, or an MP4 (Moving Picture Experts Group Audio Layer IV) player.
- the terminal 160 may also be called a user equipment, a portable terminal, a laptop terminal, a desktop terminal, and so on.
- The above-mentioned video service application programs may include any application programs that provide continuous image frame streams, including but not limited to traditional video playback applications, live video streaming applications, game applications, communication applications, and browser applications.
- the terminal 160 and the server 120 are connected through a communication network.
- the communication network is a wired network or a wireless network.
- the system may further include a management device (not shown in FIG. 1), and the management device and the server 120 are connected through a communication network.
- the communication network is a wired network or a wireless network.
- the aforementioned wireless network or wired network uses standard communication technologies and/or protocols.
- the network is usually the Internet, but it can also be any network, including but not limited to Local Area Network (LAN), Metropolitan Area Network (MAN), Wide Area Network (WAN), mobile, wired or wireless Any combination of network, private network, or virtual private network.
- Technologies and/or formats including HyperText Markup Language (HTML) and Extensible Markup Language (XML) are used to represent the data exchanged over the network. In addition, all or some of the links can be encrypted using conventional encryption technologies such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec).
- customized and/or dedicated data communication technologies can also be used to replace or supplement the aforementioned data communication technologies.
- The above-mentioned wireless network may be, for example, a 5G NR (5th-Generation New Radio) network.
- In recent years, video services that provided high-definition (HD) video represented by 720P/1080P (mostly named "Ultra Definition" or "Blu-ray" in commercial channels) have begun to provide ultra-high-definition (UHD) video.
- Different from the upgrade in the HD era, in addition to the most intuitive improvement in image definition, the upgrade of ultra-high-definition video also involves a total of five typical dimensions, such as frame rate, dynamic range, color gamut, and bit depth (please refer to Figure 2, which shows a schematic diagram of the ultra-high-definition dimension decomposition involved in this application).
- VVC Versatile Video Coding
- AV1/AV2 video compression standards
- AVS3 Audio Video coding Standard 3, the third-generation audio and video coding standard
- CAE Content Aware Encoding
- Fig. 3 is a flowchart showing a video coding method according to an exemplary embodiment.
- the video coding method may be used in a computer device, such as a server or a terminal of the system shown in Fig. 1 above.
- the video encoding method may include the following steps:
- Step 31 Segment the original video data to obtain original video fragments.
- the foregoing segmentation of the original video data may be the segmentation of the original video data by means of shot segmentation.
- Step 32 Perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment.
- the video image processing parameters obtained by analyzing the video content may be processing parameters corresponding to one or more dimensions of the ultra-high-definition video shown in FIG. 2 above.
- Step 33 Perform image processing on the video image in the original video segment based on the video image processing parameter to obtain a processed video segment.
- the computer device may adjust the parameters of one or more dimensions of the original video segment according to the above-mentioned video image processing parameters to obtain the processed video segment.
- Step 34 Obtain the encoding parameters of the processed video segment based on the image feature data of the processed video segment.
- Step 35 Encode the processed video segment according to the encoding parameter to obtain an encoded video segment.
- After encoding, the computer device can further merge the encoded video segments according to the time sequence of the division to obtain the encoded video data and push it, for example, to a server/content distribution network, or push it to other terminals.
- FIG. 4 shows a schematic diagram of a video encoding process involved in an embodiment of the present application.
- The original video data undergoes the shot segmentation of step S1 to obtain original video segments; each original video segment undergoes the video content analysis of step S2 to obtain the corresponding video image processing parameters, and then undergoes the image processing of step S3 to obtain a processed video segment; the AI encoding parameter acquisition of step S4 and the video segment encoding of step S5 then produce the encoded video segment; finally, through the video segment synthesis of step S6, the encoded video data is obtained.
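- The S1-S6 flow can be summarized as a simple per-segment loop. The sketch below is a minimal illustration in which each step is passed in as a callable; the function name and its parameters are hypothetical stand-ins for the steps of Figure 4 rather than an implementation disclosed by this application.

```python
# Minimal sketch of the S1-S6 flow of Figure 4; every callable argument is a
# hypothetical placeholder for the step with the same name.
from typing import Callable

def encode_video(original_video,
                 target_quality: int,
                 shot_split: Callable,       # S1: shot segmentation
                 analyze_content: Callable,  # S2: video content analysis
                 apply_processing: Callable, # S3: image processing
                 predict_params: Callable,   # S4: AI encoding parameter acquisition
                 encode_segment: Callable,   # S5: video segment encoding
                 merge_segments: Callable):  # S6: video segment synthesis
    """Encode one original video by processing and encoding each shot in turn."""
    encoded = []
    for seg in shot_split(original_video):
        params = analyze_content(seg, target_quality)
        processed = apply_processing(seg, params)
        enc_params = predict_params(processed)
        encoded.append(encode_segment(processed, enc_params))
    return merge_segments(encoded)
```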
- In the solution shown in the embodiment of the present application, the original video segment is first image-processed, and then the encoding parameters corresponding to the processed video segment are determined and used for encoding, so that both the video quality and the encoding parameters can be controlled during the encoding process.
- When the video quality of the original video clip is too high, it can be appropriately reduced to shorten the encoding time and improve encoding efficiency; correspondingly, when the video quality of the original video clip is low, it can be appropriately improved to ensure the encoded video quality.
- In order to adapt to the multi-dimensional characteristics of an ultra-high-definition video source, the computer device can determine the above parameters based on AI methods to better match the video quality and the encoding parameters.
- the process can be as shown in subsequent embodiments.
- Fig. 5 is a flowchart showing a video encoding method according to an exemplary embodiment.
- the video encoding method may be used in a computer device.
- the computer device may be a server or a terminal in the system shown in FIG. 1 above.
- the video encoding method may include the following steps:
- Step 501 Segment the original video data according to a specified dimension to obtain original video segments.
- the specified dimension includes at least one of the following: the distribution characteristics of the dark part and the highlight part in the image, the trajectory and the degree of motion of the motion area, the color distribution and intensity, and the details of the picture.
- the server may divide the original video data into multiple original video clips that are connected end to end according to time.
- When the server performs shot segmentation, it may consider more dimensions that affect the visual experience of ultra-high-definition video, rather than being limited to the similarity of image textures.
- Specifically, when the server performs shot segmentation, it can determine the split points in the original video data based on one or more of the following specified dimensions: the distribution characteristics of the dark and highlight parts in the image, the trajectory and degree of motion of the motion area, the color distribution and intensity, and the details of the picture.
- The server can determine a segmentation point in the original video data through a single one of the above four dimensions. For example, the server can analyze the distribution characteristics of the dark and highlight parts of several video frames before a certain video frame in the original video data, and the distribution characteristics of the dark and highlight parts of several video frames after that video frame; if the difference between these distribution characteristics meets a preset condition, the video frame can be determined as a segmentation point. Alternatively, the server can analyze the trajectory and degree of motion of the motion area of several video frames before a certain video frame, and the trajectory and degree of motion of the motion area of several video frames after that video frame; if the difference in the trajectory and degree of motion meets a preset condition, the video frame can be determined as a segmentation point, and so on.
- the server may also combine multiple dimensions among the above-mentioned four specified dimensions to comprehensively determine the division points in the original video data.
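- As a concrete (hypothetical) illustration of the single-dimension case described above, the sketch below scores each frame by the proportion of dark and highlight pixels and marks a frame as a split point when the statistics of the windows before and after it differ by more than a threshold; the window size, the 0.2/0.8 luma cut-offs, and the threshold are assumed values, not parameters specified by this application.

```python
import numpy as np

def dark_highlight_profile(frame: np.ndarray) -> np.ndarray:
    """Fraction of dark (<0.2) and highlight (>0.8) pixels in a normalized luma frame."""
    return np.array([(frame < 0.2).mean(), (frame > 0.8).mean()])

def find_split_points(luma_frames, window: int = 8, threshold: float = 0.15):
    """Return frame indices whose preceding/following windows differ in dark/highlight statistics."""
    profiles = np.array([dark_highlight_profile(f) for f in luma_frames])
    splits = []
    for i in range(window, len(profiles) - window):
        before = profiles[i - window:i].mean(axis=0)
        after = profiles[i:i + window].mean(axis=0)
        if np.abs(before - after).sum() > threshold:  # the "preset condition"
            splits.append(i)
    return splits
```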
- Step 502 Input the original video segment and the target image quality data to the video analysis model, and obtain the video image processing parameters output by the video analysis model.
- the video analysis model is a machine learning model obtained by training video clip samples, target image quality data of the video clip samples, and video image processing parameters of the video clip samples.
- the developer can pre-label the training data set of the video analysis model.
- the training data set consists of several video clip samples, the target image quality data of the video clip samples, and the video image processing parameters of the video clip samples.
- the model training device can train the video analysis model through the training data set of the video analysis model.
- the target image quality data includes an image quality level.
- the above-mentioned image quality level may be a designated quality level, which may be a level standard parameter designated by the service.
- For example, the image quality level may be expressed on a scale of 1 to 10, where 10 indicates the highest quality and 1 the lowest.
- the video image processing parameter includes at least one of the following: a target frame rate, a target quantization bit depth, and a brightness adjustment curve.
- the server can analyze the video content through a machine learning model, and output the most suitable image parameters under a specified visual quality factor for subsequent further preprocessing.
- Its working principle is as follows: for ultra-high-definition video, because image content characteristics differ, higher values in the dimensions shown in Figure 2 are not always better; moreover, because research on the principles of human vision is still imperfect at the current stage, each dimension cannot be calculated and quantified through precise mathematical modeling.
- Therefore, the server can analyze the input shot (that is, the input original video segment) to obtain appropriate ultra-high-definition dimension video image processing parameters.
- FIG. 6 shows a schematic diagram of input and output of a video analysis model involved in an embodiment of the present application.
- the server inputs the original video clips and the desired target image quality data into the AI network model (that is, the above-mentioned video analysis model).
- After performing AI inference on the input original image and the specified quality factor, the AI network model outputs the most suitable ultra-high-definition dimension video image processing parameters.
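- A minimal sketch of such an AI network model is given below, assuming a small fully connected network in PyTorch whose input concatenates per-segment statistics with the target image quality level and whose three outputs are read as a target frame rate, a target quantization bit depth, and a brightness-curve parameter; the architecture, feature choices, and output interpretation are illustrative assumptions, not the model disclosed by this application.

```python
import torch
import torch.nn as nn

class VideoAnalysisModel(nn.Module):
    """Hypothetical stand-in for the video analysis model: per-segment features plus
    the target image quality level -> UHD-dimension video image processing parameters."""
    def __init__(self, num_features: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features + 1, 64), nn.ReLU(),
            nn.Linear(64, 3),  # [target frame rate, target bit depth, brightness-curve parameter]
        )

    def forward(self, segment_features: torch.Tensor, quality_level: torch.Tensor):
        x = torch.cat([segment_features, quality_level], dim=-1)
        return self.net(x)

# Usage sketch: the features could be mean luma, motion energy, etc. (illustrative only).
model = VideoAnalysisModel()
features = torch.rand(1, 8)            # per-segment statistics
quality = torch.tensor([[8.0]])        # desired image quality level (1-10 scale)
frame_rate, bit_depth, curve_param = model(features, quality)[0]
```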
- Step 503 Perform image processing on the video image in the original video segment based on the video image processing parameter to obtain a processed video segment.
- the manner of performing image processing on the video image in the original video segment may be as follows:
- In response to the video image processing parameters including the target frame rate: when the target frame rate is higher than the frame rate of the original video segment, super-frame (frame rate up-sampling) processing is performed on the original video segment to obtain the processed video segment; when it is lower, frame-dropping (frame rate down-sampling) processing is performed; when it is equal, the frame rate of the original video segment is maintained.
- In response to the video image processing parameter including the target quantization bit depth: when the target quantization bit depth is lower than the quantization bit depth of the original video segment, down-sampling quantization is performed on the original video segment to obtain the processed video segment; when the target quantization bit depth is higher than the quantization bit depth of the original video segment, high-precision inverse quantization is performed on the original video segment to obtain the processed video segment.
- When the target quantization bit depth is equal to the quantization bit depth of the original video segment, the quantization bit depth of the original video segment is maintained to obtain the processed video segment.
- In response to the video image processing parameter including the brightness adjustment curve: when the brightness range corresponding to the brightness adjustment curve is inconsistent with the brightness range of the original video segment, tone mapping is performed on the original video segment based on the brightness adjustment curve to obtain the processed video segment.
- When the brightness range corresponding to the brightness adjustment curve is consistent with the brightness range of the original video segment, the tone of the original video segment is maintained to obtain the processed video segment.
- When the above-mentioned video image processing parameters include two or more of the above types, the corresponding processing modes may be combined.
- For example, the server can perform super-frame processing on the original video clip according to the target frame rate, perform high-precision inverse quantization on the original video clip according to the target quantization bit depth, and maintain the tones of the original video clip, so as to obtain the processed video clip.
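- The three kinds of processing can be sketched as plain array operations. The version below assumes frames are given as numpy luma arrays and uses naive frame duplication/dropping, linear requantization, and a gamma-style brightness curve purely for illustration; real super-frame interpolation and tone mapping would be considerably more elaborate.

```python
import numpy as np

def adjust_frame_rate(frames, src_fps: float, target_fps: float):
    """Naive frame-rate change: duplicate frames to up-sample, drop frames to down-sample."""
    n_out = max(1, int(round(len(frames) * target_fps / src_fps)))
    idx = np.linspace(0, len(frames) - 1, n_out).round().astype(int)
    return [frames[i] for i in idx]

def requantize(frame: np.ndarray, src_bits: int, target_bits: int) -> np.ndarray:
    """Map integer samples from src_bits to target_bits (down-sampling quantization
    when target_bits < src_bits, simple inverse quantization when it is higher)."""
    scale = (2 ** target_bits - 1) / (2 ** src_bits - 1)
    return np.round(frame.astype(np.float64) * scale).astype(np.uint16)

def tone_map(frame: np.ndarray, bits: int, gamma: float) -> np.ndarray:
    """Apply a simple gamma-style brightness adjustment curve to an integer luma frame."""
    peak = 2 ** bits - 1
    return np.round(((frame / peak) ** gamma) * peak).astype(frame.dtype)
```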
- Step 504 Input the image feature data of the processed video segment into the coding parameter determination model, and obtain the coding parameters output by the coding parameter determination model.
- the coding parameter determination model is a machine learning model obtained through training of image feature data samples and coding parameters corresponding to the image feature data samples.
- the image feature data includes at least one of the following: frame rate, quantization bit depth, maximum brightness, minimum brightness, image type, motion vector, and target image quality data.
- the encoding parameter includes a code rate.
- the server may use a pre-trained AI model (ie, the aforementioned encoding parameter determination model) to determine the encoding parameters of the processed video segment based on the image feature data of the processed video segment obtained in step 503.
- the developer can pre-label the training data set of the coding parameter determination model.
- the training data set is composed of several image feature data samples and the coding parameters of the image feature data samples.
- The model training device can perform training with the training data set of the coding parameter determination model to obtain the above coding parameter determination model.
- FIG. 7 shows a schematic diagram of the input and output of the encoding parameter determination model involved in the embodiment of the present application.
- The AI model used in the embodiments of the present application can be a neural network model trained in advance using a calibration data set; its input is a number of items of image feature data, and its output is the encoding parameter, such as the bit rate, under a given image quality factor (that is, the target image quality data).
- The input of the model shown in the embodiment of this application contains several dimensions directly related to the ultra-high-definition film source, such as frame rate, quantization bit depth, maximum brightness, minimum brightness, image type, motion vector, target image quality data, and more.
- The image type and motion vector of the above-mentioned processed video segment can be obtained through a fast 1-pass encoding, for example, encoding with a relatively fast constant rate factor (CRF) or a fixed quantization parameter (QP).
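- The sketch below shows how the feature vector described here might be assembled and fed to a pre-trained regressor to obtain a bit rate; the feature names, their order, and the linear placeholder weights are assumptions made for illustration and do not represent the trained coding parameter determination model of this application.

```python
import numpy as np

# Assumed feature order; "intra_ratio" and "motion_magnitude" stand in for the
# image type and motion vector statistics taken from a fast 1-pass probe.
FEATURE_ORDER = ["frame_rate", "bit_depth", "max_luma", "min_luma",
                 "intra_ratio", "motion_magnitude", "target_quality"]

def build_feature_vector(features: dict) -> np.ndarray:
    """Arrange the image feature data into a fixed-order vector."""
    return np.array([features[name] for name in FEATURE_ORDER], dtype=np.float64)

def predict_bitrate_kbps(features: dict, weights: np.ndarray, bias: float) -> float:
    """Placeholder linear regressor standing in for the coding parameter determination model."""
    return float(build_feature_vector(features) @ weights + bias)

# Illustrative usage with made-up weights (a real deployment would load a trained model).
example = {"frame_rate": 60, "bit_depth": 10, "max_luma": 1000, "min_luma": 0.01,
           "intra_ratio": 0.1, "motion_magnitude": 4.2, "target_quality": 8}
bitrate = predict_bitrate_kbps(example, weights=np.full(7, 100.0), bias=500.0)
```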
- Step 505 Encode the processed video segment according to the encoding parameter to obtain an encoded video segment.
- the server may perform a compression encoding operation on the processed video clip according to the encoding parameters obtained in step 504 above.
- The algorithm used in the above compression encoding operation can adopt common industry standards, for example: H.264/AVC (Advanced Video Coding), HEVC (High Efficiency Video Coding), VP9, AVS2 (Audio Video coding Standard 2, the second-generation audio and video coding standard), etc.
- the compression coding algorithm tool used in the embodiment of the present application may adopt a standard that includes algorithm tools supporting ultra-high-definition characteristics, such as HEVC, VVC, AV1, AVS3, and so on.
- Step 506 Combine the encoded video fragments according to the division order of the corresponding original video fragments to obtain encoded video data.
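- As a hedged example of steps 505-506, the helpers below invoke ffmpeg (assumed to be installed with libx265 support) to encode each processed segment at the predicted bit rate and then concatenate the encoded segments in their original segmentation order; the file names and bit-rate value are placeholders.

```python
import subprocess

def encode_segment(input_path: str, output_path: str, bitrate_kbps: int) -> None:
    """Compress one processed segment with an HEVC encoder at the predicted bit rate."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", input_path,
         "-c:v", "libx265", "-b:v", f"{bitrate_kbps}k", output_path],
        check=True)

def merge_segments(segment_paths, merged_path: str) -> None:
    """Concatenate encoded segments in segmentation order without re-encoding."""
    with open("segments.txt", "w") as f:
        for p in segment_paths:
            f.write(f"file '{p}'\n")
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", "segments.txt", "-c", "copy", merged_path],
        check=True)
```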
- In summary, the solution shown in the embodiments of the present application considers the multiple dimensions of ultra-high-definition video when splitting shots, and improves the distinction between different shots in terms of dark/high brightness, color, fast motion, and detailed information, thereby improving the accuracy of video clip segmentation and further optimizing the compression efficiency of subsequent shots.
- In addition, the solution shown in the embodiments of the present application introduces the multi-dimensional features of ultra-high-definition video into the processes of acquiring the video image processing parameters and acquiring the encoding parameters, and takes into account the associated influence of image characteristics such as frame rate, brightness range, and quantization bit depth on image quality, thereby improving the image coding effect.
- Fig. 8 is a block diagram showing the structure of a video encoding device according to an exemplary embodiment.
- the video encoding device can perform all or part of the steps in the embodiment shown in FIG. 3 or FIG. 5.
- the device can be a computer device, or it can be set in a computer device.
- the video encoding device may include:
- the video segmentation module 801 is configured to segment original video data to obtain original video segments
- the video content analysis module 802 is configured to perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment;
- the video processing module 803 is configured to perform image processing on the video image in the original video segment based on the video image processing parameter to obtain a processed video segment;
- the encoding parameter acquisition module 804 is configured to acquire the encoding parameters of the processed video segment based on the image feature data of the processed video segment;
- the encoding module 805 is configured to encode the processed video segment according to the encoding parameter to obtain an encoded video segment.
- the video content analysis module 802 is configured to: input the original video segment and target image quality data into a video analysis model, and obtain the video image processing parameters output by the video analysis model;
- the video analysis model is a machine learning model obtained by training with video segment samples, target image quality data of the video segment samples, and video image processing parameters of the video segment samples.
- the target image quality data includes an image quality level.
- the video image processing parameter includes at least one of the following:
- a target frame rate, a target quantization bit depth, and a brightness adjustment curve.
- the video processing module 803 includes:
- the frame rate up-sampling unit 8031 is configured to, in response to the video image processing parameters including the target frame rate and the target frame rate being higher than the frame rate of the original video segment, perform super-frame (frame rate up-sampling) processing on the original video segment to obtain the processed video segment;
- the frame rate cropping unit 8032 is configured to, in response to the video image processing parameters including the target frame rate and the target frame rate being lower than the frame rate of the original video segment, perform frame-dropping (frame rate down-sampling) processing on the original video segment to obtain the processed video segment;
- the frame rate maintaining unit 8033 is configured to, in response to the video image processing parameters including the target frame rate and the target frame rate being equal to the frame rate of the original video segment, maintain the frame rate of the original video segment to obtain the processed video segment.
- the video processing module 803 includes:
- the down-sampling quantization unit 8034 is configured to, in response to the video image processing parameter including the target quantization bit depth and the target quantization bit depth being lower than the quantization bit depth of the original video segment, perform down-sampling quantization on the original video segment to obtain the processed video segment;
- the inverse quantization unit 8035 is configured to, in response to the video image processing parameter including the target quantization bit depth and the target quantization bit depth being higher than the quantization bit depth of the original video segment, perform high-precision inverse quantization on the original video segment to obtain the processed video segment;
- the quantization holding unit 8036 is configured to, in response to the video image processing parameter including the target quantization bit depth and the target quantization bit depth being equal to the quantization bit depth of the original video segment, maintain the quantization bit depth of the original video segment to obtain the processed video segment.
- the video processing module 803 includes:
- the tone mapping unit 8037 is configured to, in response to the video image processing parameter including the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being inconsistent with the brightness range of the original video clip, perform tone mapping on the original video segment based on the brightness adjustment curve to obtain the processed video segment;
- the tone preserving unit 8038 is configured to, in response to the video image processing parameter including the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being consistent with the brightness range of the original video segment, maintain the tone of the original video segment to obtain the processed video segment.
- the encoding parameter acquisition module 804 is configured to: input the image feature data of the processed video segment into a coding parameter determination model, and obtain the coding parameters output by the coding parameter determination model;
- the coding parameter determination model is a machine learning model obtained through training with image feature data samples and coding parameters corresponding to the image feature data samples.
- the image feature data includes at least one of the following: frame rate, quantization bit depth, maximum brightness, minimum brightness, image type, motion vector, and target image quality data.
- the encoding parameter includes a code rate.
- the video segmentation module 801 is configured to segment the original video data according to a specified dimension to obtain the original video segment;
- the specified dimension includes at least one of the following: the distribution characteristics of the dark part and the highlight part in the image, the trajectory and the degree of motion of the motion area, the color distribution and intensity, and the details of the picture.
- the device further includes:
- the merging module 806 is configured to merge the encoded video segments according to the corresponding original video segment segmentation order to obtain encoded video data.
- In summary, the solution shown in the embodiments of the present application considers the multiple dimensions of ultra-high-definition video when splitting shots, and improves the distinction between different shots in terms of dark/high brightness, color, fast motion, and detailed information, thereby improving the accuracy of video clip segmentation and further optimizing the compression efficiency of subsequent shots.
- In addition, the solution shown in the embodiments of the present application introduces the multi-dimensional features of ultra-high-definition video into the processes of acquiring the video image processing parameters and acquiring the encoding parameters, and takes into account the associated influence of image characteristics such as frame rate, brightness range, and quantization bit depth on image quality, thereby improving the image coding effect.
- Fig. 10 is a schematic diagram showing the structure of a computer device according to an exemplary embodiment.
- the computer device can be implemented as a server on the network side.
- the server may be the server 120 shown in FIG. 1.
- The computer device 1000 includes a central processing unit (CPU) 1001, a system memory 1004 including a random access memory (RAM) 1002 and a read-only memory (ROM) 1003, and a system bus 1005 connecting the system memory 1004 and the central processing unit 1001.
- The computer device 1000 also includes a basic input/output system (I/O system) 1006 that helps to transfer information between various devices in the computer, and a mass storage device 1007 used to store an operating system 1013, application programs 1014, and other program modules 1015.
- the mass storage device 1007 is connected to the central processing unit 1001 through a mass storage controller (not shown) connected to the system bus 1005.
- the mass storage device 1007 and its associated computer-readable medium provide non-volatile storage for the computer device 1000. That is, the mass storage device 1007 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
- the computer-readable media may include computer storage media and communication media.
- Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state storage technologies, CD-ROM, DVD (Digital Versatile Disc) or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices.
- the computer device 1000 may be connected to the Internet or other network devices through the network interface unit 1011 connected to the system bus 1005.
- The memory further includes one or more programs, the one or more programs are stored in the memory, and the central processing unit 1001 executes the one or more programs to implement all or part of the steps of the methods shown in FIG. 3 or FIG. 5.
- FIG. 11 shows a structural block diagram of a computer device 1100 provided by an exemplary embodiment of the present application.
- the computer device 1100 may be a terminal, and the terminal may be the terminal 160 shown in FIG. 1.
- the computer device 1100 includes a processor 1101 and a memory 1102.
- the processor 1101 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
- The processor 1101 can be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
- the processor 1101 may also include a main processor and a co-processor.
- the processor 1101 may be integrated with a GPU (Graphics Processing Unit).
- the processor 1101 may further include an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
- the memory 1102 may include one or more computer-readable storage media, which may be non-transitory.
- the memory 1102 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
- the non-transitory computer-readable storage medium in the memory 1102 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 1101 to implement the method provided in the method embodiment of the present application.
- the computer device 1100 may optionally further include: a peripheral device interface 1103 and at least one peripheral device.
- the processor 1101, the memory 1102, and the peripheral device interface 1103 may be connected by a bus or a signal line.
- Each peripheral device can be connected to the peripheral device interface 1103 through a bus, a signal line, or a circuit board.
- the peripheral device includes at least one of the following: a radio frequency circuit 1104, a touch screen 1105, a camera 1106, an audio circuit 1107, a positioning component 1108, and a power supply 1109.
- the peripheral device interface 1103 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1101 and the memory 1102.
- the radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals.
- the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
- the radio frequency circuit 1104 may also include a circuit related to NFC (Near Field Communication), which is not limited in this application.
- the display screen 1105 is used to display a UI (User Interface).
- the UI can include graphics, text, icons, videos, and any combination thereof.
- the display screen 1105 also has the ability to collect touch signals on or above the surface of the display screen 1105.
- the display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
- the camera assembly 1106 is used to capture images or videos.
- the camera assembly 1106 includes a front camera and a rear camera.
- the camera assembly 1106 may also include a flash.
- the audio circuit 1107 may include a microphone and a speaker. For stereo capture or noise reduction, there may be multiple microphones, arranged at different positions on the computer device 1100. In some embodiments, the audio circuit 1107 may also include a headphone jack.
- the positioning component 1108 is used to locate the current geographic location of the computer device 1100 to implement navigation or LBS (Location Based Service).
- the power supply 1109 is used to supply power to various components in the computer device 1100.
- the computer device 1100 further includes one or more sensors 1110.
- the one or more sensors 1110 include, but are not limited to: an acceleration sensor 1111, a gyroscope sensor 1112, a pressure sensor 1113, a fingerprint sensor 1114, an optical sensor 1115, and a proximity sensor 1116.
- the structure shown in FIG. 11 does not constitute a limitation on the computer device 1100, which may include more or fewer components than shown in the figure, combine some components, or adopt a different component arrangement.
- in an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is further provided, such as a memory including a computer program (instructions), which can be executed by a processor of a computer device to complete all or part of the steps of the methods shown in the embodiments of the present application.
- the non-transitory computer-readable storage medium may be a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), magnetic tapes, floppy disks and optical data storage devices, etc.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Claims (15)
- A video encoding method, applied to a computer device, the method comprising: segmenting original video data to obtain original video segments; performing video content analysis on the original video segment to obtain a video image processing parameter corresponding to the original video segment; performing image processing on video images in the original video segment based on the video image processing parameter to obtain a processed video segment; acquiring an encoding parameter of the processed video segment based on image feature data of the processed video segment; and encoding the processed video segment according to the encoding parameter to obtain an encoded video segment. (Illustrative, non-normative sketches of these steps follow the claims.)
- The method according to claim 1, wherein the performing video content analysis on the original video segment to obtain the video image processing parameter corresponding to the original video segment comprises: inputting the original video segment and target image quality data into a video analysis model to obtain the video image processing parameter output by the video analysis model; the video analysis model being a machine learning model trained with video segment samples, target image quality data of the video segment samples, and video image processing parameters of the video segment samples.
- The method according to claim 2, wherein the target image quality data comprises an image quality level.
- The method according to any one of claims 1 to 3, wherein the video image processing parameter comprises at least one of the following: a target frame rate, a target quantization bit depth, and a brightness adjustment curve.
- The method according to claim 4, wherein the performing image processing on the video images in the original video segment based on the video image processing parameter to obtain the processed video segment comprises: in response to the video image processing parameter comprising the target frame rate and the target frame rate being higher than the frame rate of the original video segment, performing super-frame processing on the original video segment through frame-rate up-sampling to obtain the processed video segment; in response to the video image processing parameter comprising the target frame rate and the target frame rate being lower than the frame rate of the original video segment, performing trimming processing on the original video segment through frame-rate down-sampling to obtain the processed video segment; and in response to the video image processing parameter comprising the target frame rate and the target frame rate being equal to the frame rate of the original video segment, keeping the frame rate of the original video segment to obtain the processed video segment. (An illustrative sketch follows the claims.)
- The method according to claim 4, wherein the performing image processing on the video images in the original video segment based on the video image processing parameter to obtain the processed video segment comprises: in response to the video image processing parameter comprising the target quantization bit depth and the target quantization bit depth being lower than the quantization bit depth of the original video segment, performing down-sampling quantization on the original video segment to obtain the processed video segment; in response to the video image processing parameter comprising the target quantization bit depth and the target quantization bit depth being higher than the quantization bit depth of the original video segment, performing reverse-direction high-precision inverse quantization on the original video segment to obtain the processed video segment; and in response to the video image processing parameter comprising the target quantization bit depth and the target quantization bit depth being equal to the quantization bit depth of the original video segment, keeping the quantization bit depth of the original video segment to obtain the processed video segment. (An illustrative sketch follows the claims.)
- The method according to claim 4, wherein the performing image processing on the video images in the original video segment based on the video image processing parameter to obtain the processed video segment comprises: in response to the video image processing parameter comprising the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being inconsistent with the brightness range of the original video segment, performing tone mapping on the original video segment based on the brightness adjustment curve to obtain the processed video segment; and in response to the video image processing parameter comprising the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being consistent with the brightness range of the original video segment, keeping the tone of the original video segment to obtain the processed video segment. (An illustrative sketch follows the claims.)
- The method according to claim 1, wherein the acquiring the encoding parameter of the processed video segment based on the image feature data of the processed video segment comprises: inputting the image feature data of the processed video segment into an encoding parameter determination model to obtain the encoding parameter output by the encoding parameter determination model; the encoding parameter determination model being a machine learning model trained with image feature data samples and encoding parameters corresponding to the image feature data samples. (An illustrative sketch follows the claims.)
- The method according to claim 8, wherein the image feature data comprises at least one of the following: frame rate, quantization bit depth, maximum brightness, minimum brightness, image type, motion vector, and target image quality data.
- The method according to claim 8 or 9, wherein the encoding parameter comprises a bit rate.
- The method according to claim 1, wherein the segmenting the original video data to obtain the original video segments comprises: segmenting the original video data according to specified dimensions to obtain the original video segments; wherein the specified dimensions comprise at least one of the following: the distribution characteristics of dark areas and highlight areas in the image, the trajectory and degree of motion of moving regions, the color distribution and its intensity, and picture details. (An illustrative sketch follows the claims.)
- The method according to claim 1, wherein the method further comprises: merging the encoded video segments according to the segmentation order of the corresponding original video segments to obtain encoded video data.
- A video encoding apparatus, the apparatus comprising: a video segmentation module, configured to segment original video data to obtain original video segments; a video content analysis module, configured to perform video content analysis on the original video segment to obtain a video image processing parameter corresponding to the original video segment; a video processing module, configured to perform image processing on video images in the original video segment based on the video image processing parameter to obtain a processed video segment; an encoding parameter acquisition module, configured to acquire an encoding parameter of the processed video segment based on image feature data of the processed video segment; and an encoding module, configured to encode the processed video segment according to the encoding parameter to obtain an encoded video segment.
- A computer device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by the processor to implement the video encoding method according to any one of claims 1 to 12.
- A computer-readable storage medium, storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by a processor to implement the video encoding method according to any one of claims 1 to 12.
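The following is a minimal, non-normative Python sketch of the segment-level pipeline recited in claims 1, 12 and 13. Every name in it (Segment, analyze_content, process_segment, predict_encoding_params, encode_segment) is an illustrative placeholder chosen for this sketch, not an element defined by the application; each placeholder stands for one claimed step.

```python
# Non-normative sketch of the claimed pipeline: segment -> analyze -> process
# -> determine encoding parameters -> encode -> merge in segmentation order.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    index: int          # segmentation order, used when merging (claim 12)
    frames: List[list]  # decoded frames of the original video segment


def analyze_content(seg: Segment, target_quality: int) -> Dict:
    # Placeholder for the video analysis model of claim 2.
    return {"target_fps": 30, "target_bit_depth": 8, "brightness_curve": None}


def process_segment(seg: Segment, params: Dict) -> Segment:
    # Placeholder for the image processing step of claims 5-7.
    return seg


def predict_encoding_params(seg: Segment) -> Dict:
    # Placeholder for the encoding-parameter model of claims 8-10.
    return {"bitrate_kbps": 2500}


def encode_segment(seg: Segment, enc_params: Dict) -> bytes:
    # Placeholder for the actual encoder invocation.
    return bytes(len(seg.frames))


def encode_video(original_segments: List[Segment], target_quality: int) -> bytes:
    encoded: List[Tuple[int, bytes]] = []
    for seg in original_segments:
        params = analyze_content(seg, target_quality)     # content analysis
        processed = process_segment(seg, params)          # image processing
        enc_params = predict_encoding_params(processed)   # encoding parameters
        encoded.append((seg.index, encode_segment(processed, enc_params)))
    encoded.sort(key=lambda item: item[0])                # merge in segmentation order
    return b"".join(data for _, data in encoded)
```

A caller would split the original video data into Segment objects (for example with boundary detection along the lines sketched for claim 11 below), run encode_video, and obtain the merged encoded video data.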
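Claim 5 distinguishes three cases by comparing the target frame rate with the frame rate of the original video segment. The sketch below models frame-rate up-sampling as naive frame repetition and down-sampling as frame dropping; a real super-frame implementation would typically use motion-compensated interpolation, which is not prescribed here. Timestamp handling is simplified as an assumption.

```python
# Non-normative sketch of the three frame-rate branches in claim 5.
from typing import List


def adjust_frame_rate(frames: List, src_fps: float, target_fps: float) -> List:
    if target_fps > src_fps:
        # Up-sampling: insert extra frames (here: nearest-neighbour repetition).
        n_out = round(len(frames) * target_fps / src_fps)
        return [frames[min(int(i * src_fps / target_fps), len(frames) - 1)]
                for i in range(n_out)]
    if target_fps < src_fps:
        # Down-sampling: keep only the frames closest to the target timeline.
        step = src_fps / target_fps
        return [frames[int(i * step)] for i in range(int(len(frames) / step))]
    # Equal frame rates: keep the original segment unchanged.
    return frames
```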
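Claim 6 branches on the target quantization bit depth. In the sketch below, down-sampling quantization is a right shift that discards low-order bits, and the reverse-direction high-precision inverse quantization is approximated by rescaling samples to the wider range; both simplifications, and the assumption of at most 16 bits per sample, are illustration choices rather than requirements of the application.

```python
# Non-normative sketch of the bit-depth branches in claim 6 for one integer frame.
import numpy as np


def change_bit_depth(frame: np.ndarray, src_bits: int, target_bits: int) -> np.ndarray:
    if target_bits < src_bits:
        # Lower target depth: down-sampling quantization (drop least-significant bits).
        return (frame >> (src_bits - target_bits)).astype(np.uint16)
    if target_bits > src_bits:
        # Higher target depth: inverse quantization by rescaling to the wider range.
        scale = (2 ** target_bits - 1) / (2 ** src_bits - 1)
        return np.round(frame.astype(np.float64) * scale).astype(np.uint16)
    # Equal depths: keep the original quantization.
    return frame
```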
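Claim 7 applies tone mapping only when the brightness range of the adjustment curve differs from that of the segment. The sketch below models the brightness adjustment curve as a one-dimensional look-up table over normalized luminance; that representation is an assumption made for illustration, since the application does not fix how the curve is stored.

```python
# Non-normative sketch of the tone-mapping branch in claim 7.
import numpy as np


def tone_map(frame: np.ndarray, curve_lut: np.ndarray,
             curve_range: tuple, segment_range: tuple) -> np.ndarray:
    """frame: float array in [0, 1]; curve_lut: 1-D LUT sampling the curve."""
    if curve_range == segment_range:
        return frame  # ranges already consistent: keep the original tone
    # Map each normalized sample through the brightness adjustment curve.
    idx = np.clip((frame * (len(curve_lut) - 1)).astype(int), 0, len(curve_lut) - 1)
    return curve_lut[idx]
```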
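Claims 8 to 10 feed the image feature data of the processed segment into a trained model that outputs an encoding parameter such as the bit rate. The sketch below uses scikit-learn's GradientBoostingRegressor purely as one example of "a machine learning model"; the feature names and the choice of regressor are assumptions for illustration, not requirements of the application.

```python
# Non-normative sketch of an encoding-parameter determination model (claims 8-10).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

FEATURES = ["frame_rate", "bit_depth", "max_brightness", "min_brightness",
            "image_type", "motion_magnitude", "target_quality"]


def train_bitrate_model(samples: np.ndarray, bitrates: np.ndarray):
    """samples: (n, len(FEATURES)) feature matrix; bitrates: (n,) target bit rates."""
    model = GradientBoostingRegressor()
    return model.fit(samples, bitrates)


def predict_bitrate(model, feature_row: dict) -> float:
    # Assemble the image feature data into a single feature vector and predict.
    x = np.array([[feature_row[name] for name in FEATURES]])
    return float(model.predict(x)[0])
```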
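Claim 11 segments the original video data along content dimensions such as brightness distribution, motion, and color. A rough sketch of how such segment boundaries might be detected is below; the specific per-frame statistics and the 0.15 threshold are assumptions chosen only for illustration and are not values taken from the application.

```python
# Non-normative sketch of content-driven segmentation boundaries (claim 11).
import numpy as np


def segment_boundaries(frames: np.ndarray, threshold: float = 0.15) -> list:
    """frames: array of shape (n, h, w, 3) with values in [0, 1]; returns cut indices."""
    cuts = [0]
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        brightness_shift = abs(cur.mean() - prev.mean())       # dark/highlight distribution
        motion = np.mean(np.abs(cur - prev))                    # crude motion magnitude
        color_shift = np.mean(np.abs(cur.mean(axis=(0, 1)) -    # colour distribution change
                                     prev.mean(axis=(0, 1))))
        if max(brightness_shift, motion, color_shift) > threshold:
            cuts.append(i)
    return cuts
```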
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/678,931 US20220256140A1 (en) | 2020-03-02 | 2022-02-23 | Video encoding method and apparatus, computer device, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010135358.0A CN110996131B (zh) | 2020-03-02 | 2020-03-02 | 视频编码方法、装置、计算机设备及存储介质 |
CN202010135358.0 | 2020-03-02 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/678,931 Continuation US20220256140A1 (en) | 2020-03-02 | 2022-02-23 | Video encoding method and apparatus, computer device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021174878A1 (zh) | 2021-09-10 |
Family
ID=70081336
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/124536 WO2021174878A1 (zh) | 2020-03-02 | 2020-10-28 | 视频编码方法、装置、计算机设备及存储介质 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20220256140A1 (zh) |
CN (1) | CN110996131B (zh) |
WO (1) | WO2021174878A1 (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110996131B (zh) * | 2020-03-02 | 2020-11-10 | 腾讯科技(深圳)有限公司 | 视频编码方法、装置、计算机设备及存储介质 |
CN111598768B (zh) * | 2020-07-23 | 2020-10-30 | 平安国际智慧城市科技股份有限公司 | 图像优化处理方法、装置、计算机设备及存储介质 |
CN113170054A (zh) * | 2020-07-28 | 2021-07-23 | 深圳市大疆创新科技有限公司 | 视频传输方法、可移动平台及计算机可读存储介质 |
US11995153B2 (en) * | 2020-09-24 | 2024-05-28 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and storage medium |
CN113099132B (zh) * | 2021-04-19 | 2023-03-21 | 深圳市帧彩影视科技有限公司 | 视频处理方法、装置、电子设备、存储介质及程序产品 |
CN116320529A (zh) * | 2021-12-10 | 2023-06-23 | 深圳市中兴微电子技术有限公司 | 视频码率控制方法及装置、计算机可读存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1459981A (zh) * | 2002-05-22 | 2003-12-03 | 三星电子株式会社 | 自适应编码和解码运动图像的方法及其装置 |
CN105264888A (zh) * | 2014-03-04 | 2016-01-20 | 微软技术许可有限责任公司 | 用于对色彩空间、色彩采样率和/或比特深度自适应切换的编码策略 |
CN107409221A (zh) * | 2015-03-20 | 2017-11-28 | 杜比实验室特许公司 | 信号整形逼近 |
CN109286825A (zh) * | 2018-12-14 | 2019-01-29 | 北京百度网讯科技有限公司 | 用于处理视频的方法和装置 |
WO2020036502A1 (en) * | 2018-08-14 | 2020-02-20 | Huawei Technologies Co., Ltd | Machine-learning-based adaptation of coding parameters for video encoding using motion and object detection |
CN110996131A (zh) * | 2020-03-02 | 2020-04-10 | 腾讯科技(深圳)有限公司 | 视频编码方法、装置、计算机设备及存储介质 |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101272489B (zh) * | 2007-03-21 | 2011-08-10 | 中兴通讯股份有限公司 | 视频图像质量增强的编解码装置与编解码方法 |
GB201603144D0 (en) * | 2016-02-23 | 2016-04-06 | Magic Pony Technology Ltd | Training end-to-end video processes |
US10460231B2 (en) * | 2015-12-29 | 2019-10-29 | Samsung Electronics Co., Ltd. | Method and apparatus of neural network based image signal processor |
US10402932B2 (en) * | 2017-04-17 | 2019-09-03 | Intel Corporation | Power-based and target-based graphics quality adjustment |
US10798399B1 (en) * | 2017-12-11 | 2020-10-06 | Amazon Technologies, Inc. | Adaptive video compression |
CN109729439B (zh) * | 2019-01-11 | 2022-06-17 | 北京世纪好未来教育科技有限公司 | 实时视频传输方法 |
CN110418177B (zh) * | 2019-04-19 | 2021-06-11 | 腾讯科技(深圳)有限公司 | 视频编码方法、装置、设备和存储介质 |
CN110149554B (zh) * | 2019-05-31 | 2021-06-15 | Oppo广东移动通信有限公司 | 视频图像处理的方法、装置、电子设备以及存储介质 |
CN110290381B (zh) * | 2019-08-01 | 2020-10-30 | 字节跳动(香港)有限公司 | 视频质量评估方法、装置、电子设备及计算机存储介质 |
- 2020
  - 2020-03-02 CN CN202010135358.0A patent/CN110996131B/zh active Active
  - 2020-10-28 WO PCT/CN2020/124536 patent/WO2021174878A1/zh active Application Filing
- 2022
  - 2022-02-23 US US17/678,931 patent/US20220256140A1/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1459981A (zh) * | 2002-05-22 | 2003-12-03 | 三星电子株式会社 | 自适应编码和解码运动图像的方法及其装置 |
CN105264888A (zh) * | 2014-03-04 | 2016-01-20 | 微软技术许可有限责任公司 | 用于对色彩空间、色彩采样率和/或比特深度自适应切换的编码策略 |
CN107409221A (zh) * | 2015-03-20 | 2017-11-28 | 杜比实验室特许公司 | 信号整形逼近 |
WO2020036502A1 (en) * | 2018-08-14 | 2020-02-20 | Huawei Technologies Co., Ltd | Machine-learning-based adaptation of coding parameters for video encoding using motion and object detection |
CN109286825A (zh) * | 2018-12-14 | 2019-01-29 | 北京百度网讯科技有限公司 | 用于处理视频的方法和装置 |
CN110996131A (zh) * | 2020-03-02 | 2020-04-10 | 腾讯科技(深圳)有限公司 | 视频编码方法、装置、计算机设备及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN110996131B (zh) | 2020-11-10 |
US20220256140A1 (en) | 2022-08-11 |
CN110996131A (zh) | 2020-04-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021174878A1 (zh) | 视频编码方法、装置、计算机设备及存储介质 | |
Chiariotti | A survey on 360-degree video: Coding, quality of experience and streaming | |
CN109844736B (zh) | 概括视频内容 | |
US11978178B2 (en) | Electronic device, control method thereof, and system | |
US10534525B1 (en) | Media editing system optimized for distributed computing systems | |
US20210392392A1 (en) | Bitrate Optimizations For Immersive Multimedia Streaming | |
CN102474661A (zh) | 根据输送协议封装三维视频数据 | |
EP4373086A1 (en) | Image processing method and apparatus, medium, and electronic device | |
EP3917131A1 (en) | Image deformation control method and device and hardware device | |
WO2023273536A1 (zh) | 重光照图像的生成方法、装置及电子设备 | |
US20220236782A1 (en) | System and method for intelligent multi-application and power management for multimedia collaboration applications | |
US20130121573A1 (en) | Hybrid codec for compound image compression | |
CN110689478B (zh) | 图像风格化处理方法、装置、电子设备及可读介质 | |
CN114783459A (zh) | 一种语音分离方法、装置、电子设备和存储介质 | |
CN117095006B (zh) | 图像美学评估方法、装置、电子设备及存储介质 | |
US20210358087A1 (en) | A method and device for enhancing video image quality | |
CN111696034B (zh) | 图像处理方法、装置及电子设备 | |
KR102623148B1 (ko) | 전자 장치 및 전자 장치의 제어 방법 | |
CN114827567B (zh) | 视频质量分析方法、设备和可读介质 | |
Tang et al. | A cloud-edge collaborative gaming framework using AI-Powered foveated rendering and super resolution | |
CN117242421A (zh) | 用于基于场景的沉浸式媒体的流式传输的智能客户端 | |
KR20200080369A (ko) | 디스플레이장치, 그 제어방법 및 기록매체 | |
CN113780252A (zh) | 视频处理模型的训练方法、视频处理方法和装置 | |
US20210327034A1 (en) | A method and device for enhancing video image quality | |
Xie et al. | Bandwidth-Aware Adaptive Codec for DNN Inference Offloading in IoT |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20922889; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20922889; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20-02-2023) |