WO2021174878A1 - Video encoding method, apparatus, computer device, and storage medium - Google Patents

Video encoding method, apparatus, computer device, and storage medium

Info

Publication number
WO2021174878A1
WO2021174878A1 · PCT/CN2020/124536 · CN2020124536W
Authority
WO
WIPO (PCT)
Prior art keywords
video
video segment
image processing
original
original video
Prior art date
Application number
PCT/CN2020/124536
Other languages
English (en)
French (fr)
Inventor
李大龙
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2021174878A1 publication Critical patent/WO2021174878A1/zh
Priority to US17/678,931 priority Critical patent/US20220256140A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117Filters, e.g. for pre-processing or post-processing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/179Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Definitions

  • the embodiments of the present application relate to the field of video processing technologies, and in particular, to a video encoding method, device, computer equipment, and storage medium.
  • When the video server on the network side provides network video to a user terminal, it can encode the original video data and then push it to the user terminal.
  • the video server can encode the original video data through perceptual coding technology. For example, the video server divides the original video data into video segments, and then determines appropriate encoding parameters for each video segment, and encodes the corresponding video segments according to the determined encoding parameters.
  • However, the solution shown in the related art directly encodes the video segments divided from the original video data. When the original video quality is too high or too low, video coding efficiency and encoded video quality cannot both be taken into account.
  • the embodiments of the present application provide a video encoding method, device, computer equipment, and storage medium, which can take into account both video encoding efficiency and encoded video quality.
  • the technical solution is as follows:
  • In one aspect, a video encoding method is provided, which is applied to a computer device. The method includes: segmenting original video data to obtain an original video segment; performing video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment; performing image processing on a video image in the original video segment based on the video image processing parameters to obtain a processed video segment; acquiring encoding parameters of the processed video segment based on image feature data of the processed video segment; and encoding the processed video segment according to the encoding parameters to obtain an encoded video segment.
  • In another aspect, a video encoding device is provided, which includes:
  • a video segmentation module, configured to segment the original video data to obtain an original video segment;
  • a video content analysis module configured to perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment
  • a video processing module configured to perform image processing on the video image in the original video segment based on the video image processing parameter to obtain a processed video segment
  • An encoding parameter acquisition module configured to acquire the encoding parameters of the processed video segment based on the image feature data of the processed video segment;
  • the encoding module is configured to encode the processed video segment according to the encoding parameter to obtain an encoded video segment.
  • In a possible implementation, the video content analysis module is configured to: input the original video segment and target image quality data into a video analysis model, and obtain the video image processing parameters output by the video analysis model;
  • the video analysis model is a machine learning model obtained by training with video segment samples, target image quality data of the video segment samples, and video image processing parameters of the video segment samples.
  • the target image quality data includes an image quality level.
  • the video image processing parameter includes at least one of the following:
  • Target frame rate, target quantization bit depth, and brightness adjustment curve.
  • the video processing module includes:
  • a frame rate up-sampling unit, configured to perform super-frame (frame-rate up-sampling) processing on the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target frame rate and the target frame rate being higher than the frame rate of the original video segment;
  • a frame rate cropping unit, configured to perform frame-rate down-sampling (cropping) on the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target frame rate and the target frame rate being lower than the frame rate of the original video segment;
  • a frame rate maintaining unit, configured to maintain the frame rate of the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target frame rate and the target frame rate being equal to the frame rate of the original video segment.
  • the video processing module includes:
  • a down-sampling quantization unit, configured to perform down-sampling quantization on the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being lower than the quantization bit depth of the original video segment;
  • an inverse quantization unit, configured to perform high-precision inverse quantization on the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being higher than the quantization bit depth of the original video segment;
  • a quantization holding unit, configured to maintain the quantization bit depth of the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being equal to the quantization bit depth of the original video segment.
  • the video processing module includes:
  • a tone mapping unit, configured to perform tone mapping on the original video segment based on the brightness adjustment curve to obtain the processed video segment, in response to the video image processing parameters including the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being inconsistent with the brightness range of the original video segment;
  • a tone preserving unit, configured to maintain the tone of the original video segment to obtain the processed video segment, in response to the video image processing parameters including the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being consistent with the brightness range of the original video segment.
  • In a possible implementation, the encoding parameter acquisition module is configured to: input the image feature data of the processed video segment into an encoding parameter determination model, and obtain the encoding parameters output by the encoding parameter determination model;
  • the encoding parameter determination model is a machine learning model obtained by training with image feature data samples and the encoding parameters corresponding to the image feature data samples.
  • the image feature data includes at least one of the following: frame rate, quantization bit depth, maximum brightness, minimum brightness, image type, motion vector, and target image quality data.
  • the encoding parameter includes a code rate.
  • the video segmentation module is configured to segment the original video data according to a specified dimension to obtain the original video segment;
  • the specified dimension includes at least one of the following: the distribution characteristics of the dark part and the highlight part in the image, the trajectory and the degree of motion of the motion area, the color distribution and intensity, and the details of the picture.
  • the device further includes:
  • a merging module, configured to merge the encoded video segments according to the division order of the corresponding original video segments to obtain encoded video data.
  • In another aspect, a computer device is provided, which includes a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the video encoding method described above.
  • In another aspect, a computer-readable storage medium is provided, which stores at least one instruction, at least one program, a code set, or an instruction set, and the instruction, program, code set, or instruction set is loaded and executed by a processor to implement the video encoding method described above.
  • Before the encoding parameters are determined, image processing is performed on the original video segment; the encoding parameters of the processed video segment are then determined, and the segment is encoded accordingly, so that both video encoding efficiency and encoded video quality can be taken into account.
  • Fig. 1 is a system configuration diagram of a video service system related to various embodiments of the present application;
  • Fig. 2 is a schematic diagram of the ultra-high-definition dimension decomposition involved in this application;
  • Fig. 3 is a flowchart showing a video encoding method according to an exemplary embodiment;
  • Fig. 4 is a schematic diagram of a video encoding process involved in the embodiment shown in Fig. 3;
  • Fig. 5 is a flowchart showing a video encoding method according to an exemplary embodiment;
  • Fig. 6 is a schematic diagram of the input and output of the video analysis model involved in the embodiment shown in Fig. 5;
  • Fig. 7 is a schematic diagram of the input and output of the encoding parameter determination model involved in the embodiment shown in Fig. 5;
  • Fig. 8 is a block diagram showing the structure of a video encoding device according to an exemplary embodiment;
  • Fig. 9 is a block diagram showing the structure of a video encoding device according to an exemplary embodiment;
  • Fig. 10 is a schematic structural diagram of a computer device according to an exemplary embodiment;
  • Fig. 11 is a schematic structural diagram of a computer device according to an exemplary embodiment.
  • the embodiment of the present application proposes a video coding scheme, which can better match image quality and coding parameters based on artificial intelligence (AI), and take into account the efficiency of video coding and the quality of the encoded video.
  • Shot segmentation refers to dividing the input original film source (to be encoded) into a number of consecutive, non-overlapping segments in time, such that all the segments, combined in order, reproduce the original film source.
  • Images with a continuous time range and similar content are divided into segments called "shots", and subsequent processing is performed in units of the video image fragments represented by these "shots".
  • AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, and drones.
  • artificial intelligence technology will be applied in more fields and exert more and more important value.
  • Machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or realize human learning behaviors to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and demonstration learning.
  • FIG. 1 shows a system configuration diagram of a video service system involved in various embodiments of the present application.
  • the system includes a server 120, a database 140, and a number of terminals 160.
  • The server 120 is a single server, a server cluster composed of several servers, a virtualization platform, or a cloud computing service center.
  • the server 120 may be a server that provides background support for video service applications.
  • the server 120 may be composed of one or more functional units.
  • the server 120 may include an interface unit 120a, an encoding unit 120b, and a pushing unit 120c.
  • the interface unit 120a is used for information interaction with video service applications installed in the terminal 160 to obtain user-related information corresponding to the terminal 160, such as user account information and user operation information.
  • the encoding unit 120b is used for encoding unencoded video data to obtain an encoded video.
  • the pushing unit 120c is configured to push the encoded video to the terminal 160 corresponding to each user.
  • The aforementioned database 140 may be a Redis database or another type of database. The database 140 is used to store various types of data, such as user information of each user, various unencoded original video data, encoded video data, and so on.
  • After the interface unit 120a obtains the relevant information of the user corresponding to each terminal, it stores that information in the database 140; the encoding unit 120b encodes the original video data stored in the database 140 and stores the result back in the database 140; and when the pushing unit 120c pushes video to a user, it extracts the encoded video data from the database 140 and pushes it to the terminal corresponding to that user.
  • the foregoing video encoding may also be performed by the terminal 160.
  • the terminal 160 may record original video data through an image acquisition component or screen recording software, and encode the recorded original video data and upload it to the network side, so that other terminals can obtain the encoded and uploaded video data from the network side.
  • After the terminal 160 encodes the original video data and uploads the encoded video data to the server 120, the server 120 stores the encoded video data in the database 140. When the server 120 receives a video acquisition request sent by another terminal, it can obtain the encoded video data from the database 140 and push it to that terminal, or send the encoded video data to a content distribution network, from which other terminals can pull it.
  • the terminal 160 may be a terminal device with a network connection function and installed with a video service application corresponding to the server 120.
  • The terminal 160 may be a smartphone, a tablet computer, an e-book reader, smart glasses, a smart watch, an MP3 player (Moving Picture Experts Group Audio Layer III), or an MP4 player (Moving Picture Experts Group Audio Layer IV).
  • the terminal 160 may also be called a user equipment, a portable terminal, a laptop terminal, a desktop terminal, and so on.
  • The above-mentioned video service applications may include any applications that provide continuous image frame streams, including but not limited to traditional video playback applications, live video streaming applications, game applications, communication applications, and browser applications.
  • the terminal 160 and the server 120 are connected through a communication network.
  • the communication network is a wired network or a wireless network.
  • the system may further include a management device (not shown in FIG. 1), and the management device and the server 120 are connected through a communication network.
  • the communication network is a wired network or a wireless network.
  • the aforementioned wireless network or wired network uses standard communication technologies and/or protocols.
  • the network is usually the Internet, but it can also be any network, including but not limited to Local Area Network (LAN), Metropolitan Area Network (MAN), Wide Area Network (WAN), mobile, wired or wireless Any combination of network, private network, or virtual private network.
  • technologies and/or formats including HyperText Mark-up Language (HTML), Extensible Markup Language (XML), etc. are used to represent data exchanged over the network.
  • In some embodiments, all or some of the links can also be encrypted using conventional encryption technologies such as SSL (Secure Sockets Layer), TLS (Transport Layer Security), VPN (Virtual Private Network), and IPsec (Internet Protocol Security).
  • customized and/or dedicated data communication technologies can also be used to replace or supplement the aforementioned data communication technologies.
  • With the network upgrade represented by 5G NR (5th-Generation New Radio), video services that started from high-definition (HD) video represented by 720P/1080P (commercial channels are mostly named "Ultra Definition" or "Blu-ray") began to provide ultra-high-definition (UHD) video.
  • Different from the upgrade in the HD era, in addition to the most intuitive improvement in image definition, the upgrade to ultra-high-definition video also covers a total of five typical dimensions such as frame rate, dynamic range, color gamut, and bit depth (please refer to Figure 2, which shows a schematic diagram of the ultra-high-definition dimension decomposition involved in this application); a possible representation of these dimensions is sketched below.
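  • For orientation only, the following sketch shows one way these dimensions might be carried through a processing pipeline as a small record; the field names, types, and example values are assumptions of this rewrite, not definitions from the application:

```python
from dataclasses import dataclass

@dataclass
class UhdDimensions:
    """Illustrative record of the ultra-high-definition dimensions (Figure 2)."""
    width: int          # spatial resolution, e.g. 3840
    height: int         # e.g. 2160
    frame_rate: float   # frames per second, e.g. 50.0 or 120.0
    bit_depth: int      # quantization bit depth, e.g. 10 or 12
    dynamic_range: str  # e.g. "SDR" or "HDR"
    color_gamut: str    # e.g. "BT.709" or "BT.2020"

print(UhdDimensions(3840, 2160, 50.0, 10, "HDR", "BT.2020"))
```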
  • VVC: Versatile Video Coding
  • AV1/AV2: video compression standards
  • AVS3: Audio Video coding Standard 3, the third-generation audio and video coding standard
  • CAE: Content Aware Encoding
  • Fig. 3 is a flowchart showing a video coding method according to an exemplary embodiment.
  • the video coding method may be used in a computer device, such as a server or a terminal of the system shown in Fig. 1 above.
  • the video encoding method may include the following steps:
  • Step 31: Segment the original video data to obtain original video segments.
  • The foregoing segmentation of the original video data may be performed by means of shot segmentation.
  • Step 32: Perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment.
  • the video image processing parameters obtained by analyzing the video content may be processing parameters corresponding to one or more dimensions of the ultra-high-definition video shown in FIG. 2 above.
  • Step 33: Perform image processing on the video image in the original video segment based on the video image processing parameters to obtain a processed video segment.
  • the computer device may adjust the parameters of one or more dimensions of the original video segment according to the above-mentioned video image processing parameters to obtain the processed video segment.
  • Step 34: Obtain the encoding parameters of the processed video segment based on the image feature data of the processed video segment.
  • Step 35: Encode the processed video segment according to the encoding parameters to obtain an encoded video segment.
  • The computer device can further merge the encoded video segments according to the time order of the division to obtain the encoded video data, and push it, for example, to a server or a content distribution network, or to other terminals.
  • FIG. 4 shows a schematic diagram of a video encoding process involved in an embodiment of the present application.
  • The original video data undergoes shot segmentation in step S1 to obtain original video segments; each original video segment undergoes video content analysis in step S2 to obtain corresponding video image processing parameters, and then image processing in step S3 to obtain a processed video segment; the encoding parameters are then obtained through AI in step S4, and the video segment is encoded in step S5 to obtain an encoded video segment; finally, video segment synthesis in step S6 yields the encoded video data. A minimal sketch of this pipeline follows.
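  • The Python sketch below only illustrates the S1-S6 data flow described above; every function is a hypothetical stub standing in for the corresponding step, not an implementation defined by this application:

```python
from typing import Any, List

# Hypothetical stubs for stages S1-S5; each returns placeholder data so the
# sketch runs end to end. Real implementations are described in the text.
def shot_segmentation(video: List[Any]) -> List[List[Any]]:
    return [video]  # S1: trivially one shot here

def analyze_content(segment: List[Any], target_quality: int) -> dict:
    return {"target_fps": 50, "target_bit_depth": 10}  # S2: AI video analysis

def apply_image_processing(segment: List[Any], params: dict) -> List[Any]:
    return segment  # S3: frame-rate / bit-depth / tone-mapping adjustments

def predict_encoding_params(segment: List[Any]) -> dict:
    return {"bitrate_kbps": 8000}  # S4: AI encoding-parameter model

def encode(segment: List[Any], enc_params: dict) -> tuple:
    return ("encoded", enc_params["bitrate_kbps"], len(segment))  # S5

def encode_video(original_video: List[Any]) -> list:
    encoded = []
    for seg in shot_segmentation(original_video):          # S1
        params = analyze_content(seg, target_quality=8)    # S2
        processed = apply_image_processing(seg, params)    # S3
        enc_params = predict_encoding_params(processed)    # S4
        encoded.append(encode(processed, enc_params))      # S5
    return encoded  # S6: segments kept in division order for synthesis

print(encode_video(["frame0", "frame1"]))
```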
  • The original video segment is first image-processed, and then the encoding parameters corresponding to the processed video segment are determined and used for encoding, so that the relationship between video quality and encoding parameters can be controlled during the encoding process.
  • When the video quality of the original video segment is too high, it can be appropriately reduced to shorten the encoding time and improve encoding efficiency; correspondingly, when the video quality of the original video segment is low, it can be appropriately improved to ensure the encoded video quality.
  • In order to adapt to the multi-dimensional considerations of ultra-high-definition film sources, the computer device can determine the various parameters based on AI methods to better match video quality and encoding parameters.
  • the process can be as shown in subsequent embodiments.
  • Fig. 5 is a flowchart showing a video encoding method according to an exemplary embodiment.
  • the video encoding method may be used in a computer device.
  • the computer device may be a server or a terminal in the system shown in FIG. 1 above.
  • the video encoding method may include the following steps:
  • Step 501: Segment the original video data according to a specified dimension to obtain original video segments.
  • the specified dimension includes at least one of the following: the distribution characteristics of the dark part and the highlight part in the image, the trajectory and the degree of motion of the motion area, the color distribution and intensity, and the details of the picture.
  • the server may divide the original video data into multiple original video clips that are connected end to end according to time.
  • the server when the server performs shot segmentation, it may consider more dimensions that affect the visual experience of the ultra-high-definition video, rather than being limited to the similarity of image textures.
  • When the server performs shot segmentation, it can determine the split points in the original video data based on one or more of the following specified dimensions: the distribution characteristics of the dark and highlight parts in the image, the trajectory and degree of motion of the motion area, the color distribution and intensity, and the picture details.
  • In a possible implementation, the server can determine the segmentation points in the original video data through a single one of the above four dimensions. For example, the server can analyze the distribution characteristics of the dark and highlight parts in several video frames before a certain video frame in the original video data and in several video frames after that video frame; if the difference between the two distributions meets a preset condition, the video frame can be determined as a segmentation point. Alternatively, the server can analyze the trajectory and degree of motion of the motion area in several video frames before the video frame and in several video frames after it; if the difference meets a preset condition, the video frame can be determined as a segmentation point, and so on.
  • In another possible implementation, the server may also combine multiple of the above four specified dimensions to comprehensively determine the division points in the original video data. A single-dimension sketch of this split-point detection is given below.
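  • As a hedged, single-dimension illustration, the sketch below flags candidate split points using only the dark/highlight distribution dimension: it compares average luminance histograms in windows before and after each frame. The window size, bin count, and threshold are illustrative assumptions; a real splitter would also suppress neighboring candidates and combine the other dimensions.

```python
import numpy as np

def luma_histogram(frame: np.ndarray, bins: int = 16) -> np.ndarray:
    """Normalized luminance histogram of one (grayscale) frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def split_points(frames: list, window: int = 5, threshold: float = 0.5) -> list:
    """Frames where the dark/highlight distribution changes sharply."""
    points = []
    for i in range(window, len(frames) - window):
        before = np.mean([luma_histogram(f) for f in frames[i - window:i]], axis=0)
        after = np.mean([luma_histogram(f) for f in frames[i:i + window]], axis=0)
        if np.abs(before - after).sum() > threshold:  # L1 histogram distance
            points.append(i)
    return points

# Synthetic usage: a dark first half and a bright second half should yield
# candidate split points around the transition (index 20).
frames = [np.full((8, 8), 30, np.uint8)] * 20 + [np.full((8, 8), 220, np.uint8)] * 20
print(split_points(frames))
```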
  • Step 502: Input the original video segment and the target image quality data into the video analysis model, and obtain the video image processing parameters output by the video analysis model.
  • the video analysis model is a machine learning model obtained by training video clip samples, target image quality data of the video clip samples, and video image processing parameters of the video clip samples.
  • the developer can pre-label the training data set of the video analysis model.
  • the training data set consists of several video clip samples, the target image quality data of the video clip samples, and the video image processing parameters of the video clip samples.
  • the model training device can train the video analysis model through the training data set of the video analysis model.
  • the target image quality data includes an image quality level.
  • The above-mentioned image quality level may be a designated quality level, which may be a level standard parameter designated by the service. For example, the image quality level may use 1 to 10 to indicate quality, with 10 indicating the highest quality and 1 the lowest.
  • the video image processing parameter includes at least one of the following: a target frame rate, a target quantization bit depth, and a brightness adjustment curve.
  • the server can analyze the video content through a machine learning model, and output the most suitable image parameters under a specified visual quality factor for subsequent further preprocessing.
  • Its working principle is as follows: for ultra-high-definition video, due to differences in image content characteristics, higher values along the dimensions shown in Figure 2 are not better in all cases; and because research on the principles of human vision is still imperfect at the current stage, each dimension cannot be calculated and quantified through precise mathematical modeling.
  • Therefore, the server can analyze the input shot (that is, the input original video segment) to obtain appropriate ultra-high-definition dimensional video image processing parameters.
  • FIG. 6 shows a schematic diagram of input and output of a video analysis model involved in an embodiment of the present application.
  • the server inputs the original video clips and the desired target image quality data into the AI network model (that is, the above-mentioned video analysis model).
  • After performing AI inference on the input original image and the specified quality factor, the AI network model outputs the most suitable ultra-high-definition-dimension video image processing parameters. A stand-in for this model interface is sketched below.
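  • The sketch below only stands in for that interface: per-segment features plus the specified quality factor go in, and raw ultra-high-definition-dimension parameters come out. The network architecture, feature dimension, and output layout are assumptions of this sketch, not the application's actual model.

```python
import torch
import torch.nn as nn

class VideoAnalysisModel(nn.Module):
    """Stand-in network: segment features + quality level -> UHD parameters."""
    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim + 1, 128), nn.ReLU(),
            # Illustrative output layout: [target_fps, target_bit_depth, gamma]
            nn.Linear(128, 3),
        )

    def forward(self, features: torch.Tensor, quality: torch.Tensor) -> torch.Tensor:
        x = torch.cat([features, quality.unsqueeze(-1)], dim=-1)
        return self.net(x)

model = VideoAnalysisModel()
features = torch.randn(1, 64)    # placeholder per-segment feature vector
quality = torch.tensor([8.0])    # quality level on the 1-10 scale
print(model(features, quality))  # raw (untrained) parameter predictions
```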
  • Step 503: Perform image processing on the video image in the original video segment based on the video image processing parameters to obtain a processed video segment.
  • the manner of performing image processing on the video image in the original video segment may be as follows:
  • In response to the video image processing parameters including the target frame rate: if the target frame rate is higher than the frame rate of the original video segment, super-frame (frame-rate up-sampling) processing is performed on the original video segment to obtain the processed video segment; if it is lower, frame-rate down-sampling (cropping) is performed; if it is equal, the frame rate of the original video segment is maintained.
  • In response to the video image processing parameters including the target quantization bit depth: if the target quantization bit depth is lower than the quantization bit depth of the original video segment, down-sampling quantization is performed on the original video segment to obtain the processed video segment; if it is higher, high-precision inverse quantization is performed; if it is equal, the quantization bit depth of the original video segment is maintained.
  • In response to the video image processing parameters including the brightness adjustment curve: if the brightness range corresponding to the brightness adjustment curve is inconsistent with the brightness range of the original video segment, tone mapping is performed on the original video segment based on the brightness adjustment curve to obtain the processed video segment; if it is consistent, the tone of the original video segment is maintained.
  • When the above-mentioned video image processing parameters include two or more types, the corresponding processing modes above may be combined.
  • For example, the server can perform super-frame processing on the original video clip according to the target frame rate, perform high-precision inverse quantization on it according to the target quantization bit depth, and maintain the original tones, so as to obtain the processed video clip. A dispatch sketch of this step follows.
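  • The dispatch logic of step 503 might look like the sketch below, which compares each returned parameter with the segment's current property and decides whether to up-convert, down-convert, or keep it; the dictionary representation and helper semantics are assumptions of this sketch, with the real media operations elided.

```python
def process_segment(seg: dict, params: dict) -> dict:
    """Apply step-503 style adjustments; actual media operations are elided."""
    out = dict(seg)
    fps = params.get("target_fps")
    if fps is not None and fps != seg["fps"]:
        # super-frame (up-sample) if higher, crop (down-sample) if lower
        out["fps"] = fps
    depth = params.get("target_bit_depth")
    if depth is not None and depth != seg["bit_depth"]:
        # high-precision inverse quantization if higher,
        # down-sampling quantization if lower
        out["bit_depth"] = depth
    curve_range = params.get("brightness_range")
    if curve_range is not None and curve_range != seg["brightness_range"]:
        out["brightness_range"] = curve_range  # tone-map to the curve's range
    return out

segment = {"fps": 25, "bit_depth": 8, "brightness_range": "SDR"}
print(process_segment(segment, {"target_fps": 50, "target_bit_depth": 10}))
# -> {'fps': 50, 'bit_depth': 10, 'brightness_range': 'SDR'}
```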
  • Step 504: Input the image feature data of the processed video segment into the encoding parameter determination model, and obtain the encoding parameters output by the encoding parameter determination model.
  • the coding parameter determination model is a machine learning model obtained through training of image feature data samples and coding parameters corresponding to the image feature data samples.
  • the image feature data includes at least one of the following: frame rate, quantization bit depth, maximum brightness, minimum brightness, image type, motion vector, and target image quality data.
  • the encoding parameter includes a code rate.
  • The server may use a pre-trained AI model (i.e., the aforementioned encoding parameter determination model) to determine the encoding parameters of the processed video segment based on the image feature data of the processed video segment obtained in step 503.
  • the developer can pre-label the training data set of the coding parameter determination model.
  • The training data set is composed of several image feature data samples and the encoding parameters corresponding to those samples. The model training device can train on this training data set to obtain the above encoding parameter determination model.
  • FIG. 7 shows a schematic diagram of the input and output of the encoding parameter determination model involved in the embodiment of the present application.
  • The AI model used in the embodiments of the present application can be a neural network model trained in advance using a calibration data set. Its input is a set of image feature data, and its output is the encoding parameters, such as the bit rate, under a given image quality factor (that is, the target image quality data).
  • The input of the model shown in the embodiments of this application contains several dimensions directly related to the ultra-high-definition film source, such as frame rate, quantization bit depth, maximum brightness, minimum brightness, image type, motion vector, and target image quality data.
  • The image type and motion vector of the above-mentioned processed video segment can be obtained through 1-pass encoding, for example, encoding with a relatively fast constant rate factor (CRF) or a fixed quantization parameter (QP); a probe sketch is given below.
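  • One hedged way to realize such a 1-pass probe is to run a fast CRF encode with ffmpeg and then read the per-frame picture types back with ffprobe, as sketched below; the codec, preset, CRF value, and file names are illustrative choices, not mandated by this application.

```python
import json
import subprocess

def one_pass_probe(src: str, tmp: str = "probe.mp4") -> list:
    """Fast CRF encode, then per-frame picture types via ffprobe."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", "libx264",
         "-preset", "ultrafast", "-crf", "30", tmp],
        check=True)
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-select_streams", "v:0",
         "-show_frames", "-print_format", "json", tmp],
        check=True, capture_output=True, text=True)
    frames = json.loads(out.stdout)["frames"]
    return [f.get("pict_type") for f in frames]  # e.g. ['I', 'P', 'B', ...]

# Usage (requires ffmpeg/ffprobe and a source file):
# print(one_pass_probe("segment.mp4"))
```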
  • Step 505: Encode the processed video segment according to the encoding parameters to obtain an encoded video segment.
  • the server may perform a compression encoding operation on the processed video clip according to the encoding parameters obtained in step 504 above.
  • The algorithm used in the above compression encoding operation can adopt common industry standards, for example: H.264/AVC (Advanced Video Coding), HEVC (High Efficiency Video Coding), VP9, AVS2 (Audio Video coding Standard 2, the second-generation audio and video coding standard), and so on.
  • the compression coding algorithm tool used in the embodiment of the present application may adopt a standard that includes algorithm tools supporting ultra-high-definition characteristics, such as HEVC, VVC, AV1, AVS3, and so on.
  • Step 506: Merge the encoded video segments according to the division order of the corresponding original video segments to obtain the encoded video data. A possible realization of this merging step is sketched below.
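  • The sketch below uses ffmpeg's concat demuxer with stream copy, so segments are joined in their division order without re-encoding; the file names are illustrative, and the concat demuxer is one possible tool rather than the method prescribed by this application.

```python
import subprocess

def merge_segments(segment_files: list, out_file: str = "encoded_full.mp4") -> None:
    """Concatenate encoded segments in division order via the concat demuxer."""
    with open("concat.txt", "w") as f:
        for path in segment_files:  # order must match the original division
            f.write(f"file '{path}'\n")
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", "concat.txt", "-c", "copy", out_file],
        check=True)

# Usage (requires ffmpeg and the encoded segment files):
# merge_segments(["seg_000.mp4", "seg_001.mp4", "seg_002.mp4"])
```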
  • In summary, the solution shown in the embodiments of the present application considers the multi-dimensionality of ultra-high-definition video when splitting shots, and improves the distinction between different shots in terms of dark/high brightness, color, fast motion, and detail information, thereby improving the accuracy of video segmentation and further optimizing the compression efficiency of subsequent shots.
  • In addition, the solution shown in the embodiments of the present application introduces the multi-dimensional features of ultra-high-definition video into the processes of acquiring video image processing parameters and acquiring encoding parameters, and considers the associated influence of image characteristics such as frame rate, brightness range, and quantization bit depth on image quality and image coding, thereby improving the coding effect.
  • Fig. 8 is a block diagram showing the structure of a video encoding device according to an exemplary embodiment.
  • the video encoding device can perform all or part of the steps in the embodiment shown in FIG. 3 or FIG. 5.
  • the device can be a computer device, or it can be set in a computer device.
  • the video encoding device may include:
  • the video segmentation module 801 is configured to segment original video data to obtain original video segments
  • the video content analysis module 802 is configured to perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment;
  • the video processing module 803 is configured to perform image processing on the video image in the original video segment based on the video image processing parameter to obtain a processed video segment;
  • the encoding parameter acquisition module 804 is configured to acquire the encoding parameters of the processed video segment based on the image feature data of the processed video segment;
  • the encoding module 805 is configured to encode the processed video segment according to the encoding parameter to obtain an encoded video segment.
  • In a possible implementation, the video content analysis module 802 is configured to: input the original video segment and target image quality data into a video analysis model, and obtain the video image processing parameters output by the video analysis model;
  • the video analysis model is a machine learning model obtained by training with video segment samples, target image quality data of the video segment samples, and video image processing parameters of the video segment samples.
  • the target image quality data includes an image quality level.
  • the video image processing parameter includes at least one of the following:
  • Target frame rate, target quantization bit depth, and brightness adjustment curve.
  • the video processing module 803 includes:
  • The frame rate up-sampling unit 8031 is configured to perform super-frame (frame-rate up-sampling) processing on the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target frame rate and the target frame rate being higher than the frame rate of the original video segment;
  • the frame rate cropping unit 8032 is configured to perform frame-rate down-sampling (cropping) on the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target frame rate and the target frame rate being lower than the frame rate of the original video segment;
  • the frame rate maintaining unit 8033 is configured to maintain the frame rate of the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target frame rate and the target frame rate being equal to the frame rate of the original video segment.
  • the video processing module 803 includes:
  • The down-sampling quantization unit 8034 is configured to perform down-sampling quantization on the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being lower than the quantization bit depth of the original video segment;
  • the inverse quantization unit 8035 is configured to perform high-precision inverse quantization on the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being higher than the quantization bit depth of the original video segment;
  • the quantization holding unit 8036 is configured to maintain the quantization bit depth of the original video segment to obtain the processed video segment, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being equal to the quantization bit depth of the original video segment.
  • the video processing module 803 includes:
  • The tone mapping unit 8037 is configured to perform tone mapping on the original video segment based on the brightness adjustment curve to obtain the processed video segment, in response to the video image processing parameters including the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being inconsistent with the brightness range of the original video segment;
  • the tone preserving unit 8038 is configured to maintain the tone of the original video segment to obtain the processed video segment, in response to the video image processing parameters including the brightness adjustment curve and the brightness range corresponding to the brightness adjustment curve being consistent with the brightness range of the original video segment.
  • In a possible implementation, the encoding parameter acquisition module 804 is configured to: input the image feature data of the processed video segment into an encoding parameter determination model, and obtain the encoding parameters output by the encoding parameter determination model;
  • the encoding parameter determination model is a machine learning model obtained by training with image feature data samples and the encoding parameters corresponding to the image feature data samples.
  • the image feature data includes at least one of the following: frame rate, quantization bit depth, maximum brightness, minimum brightness, image type, motion vector, and target image quality data.
  • the encoding parameter includes a code rate.
  • the video segmentation module 801 is configured to segment the original video data according to a specified dimension to obtain the original video segment;
  • the specified dimension includes at least one of the following: the distribution characteristics of the dark part and the highlight part in the image, the trajectory and the degree of motion of the motion area, the color distribution and intensity, and the details of the picture.
  • the device further includes:
  • The merging module 806 is configured to merge the encoded video segments according to the division order of the corresponding original video segments to obtain encoded video data.
  • In summary, the solution shown in the embodiments of the present application considers the multi-dimensionality of ultra-high-definition video when splitting shots, and improves the distinction between different shots in terms of dark/high brightness, color, fast motion, and detail information, thereby improving the accuracy of video segmentation and further optimizing the compression efficiency of subsequent shots.
  • In addition, the solution shown in the embodiments of the present application introduces the multi-dimensional features of ultra-high-definition video into the processes of acquiring video image processing parameters and acquiring encoding parameters, and considers the associated influence of image characteristics such as frame rate, brightness range, and quantization bit depth on image quality and image coding, thereby improving the coding effect.
  • Fig. 10 is a schematic diagram showing the structure of a computer device according to an exemplary embodiment.
  • the computer device can be implemented as a server on the network side.
  • the server may be the server 120 shown in FIG. 1.
  • the computer device 1000 includes a central processing unit (Central Processing Unit, CPU) 1001, a system memory 1004 including a random access memory (Random Access Memory, RAM) 1002 and a read-only memory (Read-Only Memory, ROM) 1003, and A system bus 1005 connecting the system memory 1004 and the central processing unit 1001.
  • The computer device 1000 also includes a basic input/output system (I/O system) 1006 that helps to transfer information between the various devices in the computer, and a mass storage device 1007 for storing an operating system 1013, application programs 1014, and other program modules 1015.
  • the mass storage device 1007 is connected to the central processing unit 1001 through a mass storage controller (not shown) connected to the system bus 1005.
  • the mass storage device 1007 and its associated computer-readable medium provide non-volatile storage for the computer device 1000. That is, the mass storage device 1007 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
  • the computer-readable media may include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state storage technologies, CD-ROM, DVD (Digital Versatile Disc) or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices.
  • the computer device 1000 may be connected to the Internet or other network devices through the network interface unit 1011 connected to the system bus 1005.
  • The memory further includes one or more programs, which are stored in the memory; the central processing unit 1001 executes the one or more programs to implement all or part of the steps of the methods shown in FIG. 3 or FIG. 5.
  • FIG. 11 shows a structural block diagram of a computer device 1100 provided by an exemplary embodiment of the present application.
  • the computer device 1100 may be a terminal, and the terminal may be the terminal 160 shown in FIG. 1.
  • the computer device 1100 includes a processor 1101 and a memory 1102.
  • the processor 1101 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • The processor 1101 can be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 1101 may also include a main processor and a co-processor.
  • The processor 1101 may be integrated with a GPU (Graphics Processing Unit).
  • the processor 1101 may further include an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
  • the memory 1102 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 1102 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 1102 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 1101 to implement the method provided in the method embodiment of the present application.
  • the computer device 1100 may optionally further include: a peripheral device interface 1103 and at least one peripheral device.
  • the processor 1101, the memory 1102, and the peripheral device interface 1103 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 1103 through a bus, a signal line, or a circuit board.
  • the peripheral device includes at least one of the following: a radio frequency circuit 1104, a touch screen 1105, a camera 1106, an audio circuit 1107, a positioning component 1108, and a power supply 1109.
  • the peripheral device interface 1103 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1101 and the memory 1102.
  • the radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
  • the radio frequency circuit 1104 may also include a circuit related to NFC (Near Field Communication), which is not limited in this application.
  • the display screen 1105 is used to display UI (User Interface, user interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • the display screen 1105 also has the ability to collect touch signals on or above the surface of the display screen 1105.
  • the display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera assembly 1106 is used to capture images or videos.
  • the camera assembly 1106 includes a front camera and a rear camera.
  • the camera assembly 1106 may also include a flash.
  • the audio circuit 1107 may include a microphone and a speaker. For the purpose of stereo collection or noise reduction, there may be multiple microphones, which are respectively arranged in different parts of the computer device 1100. In some embodiments, the audio circuit 1107 may also include a headphone jack.
  • the positioning component 1108 is used to locate the current geographic location of the computer device 1100 to implement navigation or LBS (Location Based Service).
  • the power supply 1109 is used to supply power to various components in the computer device 1100.
  • the computer device 1100 further includes one or more sensors 1110.
  • the one or more sensors 1110 include, but are not limited to: an acceleration sensor 1111, a gyroscope sensor 1112, a pressure sensor 1113, a fingerprint sensor 1114, an optical sensor 1115, and a proximity sensor 1116.
  • the structure shown in FIG. 11 does not constitute a limitation on the computer device 1100, which may include more or fewer components than shown in the figure, combine some components, or adopt a different component arrangement.
  • a non-transitory computer-readable storage medium including instructions is provided, such as a memory including a computer program (instructions), which can be executed by a processor of a computer device to complete all or part of the steps of the methods shown in the embodiments of the present application.
  • the non-transitory computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This application relates to a video encoding method and apparatus, a computer device, and a storage medium. The method includes: segmenting original video data to obtain original video segments; performing video content analysis on an original video segment to obtain video image processing parameters; performing image processing on the video images in the original video segment based on the video image processing parameters to obtain a processed video segment; acquiring encoding parameters of the processed video segment based on image feature data of the processed video segment; and encoding the processed video segment according to the encoding parameters to obtain an encoded video segment. This solution can control the relationship between video quality and encoding parameters by means of artificial intelligence (AI), thereby balancing encoding efficiency against the quality of the encoded video and improving the encoding result.

Description

Video encoding method and apparatus, computer device, and storage medium
This application claims priority to Chinese Patent Application No. 202010135358.0, filed on March 2, 2020 and entitled "Video encoding method and apparatus, computer device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of video processing technologies, and in particular to a video encoding method and apparatus, a computer device, and a storage medium.
Background
With the continuous development of network and computer technologies, users' requirements on the quality of experience of online video are becoming ever higher.
When a video server on the network side provides online video to user terminals, it can encode the original video data and then push it to the terminals. In the related art, the video server can encode the original video data through perceptual encoding technology. For example, the video server splits the original video data into video segments, determines suitable encoding parameters for each video segment, and encodes the corresponding video segment according to the determined encoding parameters.
However, the solution in the related art encodes directly on the video segments split from the original video data; when the quality of the original video is too high or too low, it cannot balance video encoding efficiency and the quality of the encoded video.
Summary
The embodiments of this application provide a video encoding method and apparatus, a computer device, and a storage medium, which can balance video encoding efficiency and the quality of the encoded video. The technical solution is as follows:
In one aspect, a video encoding method is provided, applied to a computer device, the method including:
segmenting original video data to obtain an original video segment;
performing video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment;
performing image processing on video images in the original video segment based on the video image processing parameters to obtain a processed video segment;
acquiring encoding parameters of the processed video segment based on image feature data of the processed video segment; and
encoding the processed video segment according to the encoding parameters to obtain an encoded video segment.
In another aspect, a video encoding apparatus is provided, the apparatus including:
a video segmentation module, configured to segment original video data to obtain an original video segment;
a video content analysis module, configured to perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment;
a video processing module, configured to perform image processing on video images in the original video segment based on the video image processing parameters to obtain a processed video segment;
an encoding parameter acquisition module, configured to acquire encoding parameters of the processed video segment based on image feature data of the processed video segment; and
an encoding module, configured to encode the processed video segment according to the encoding parameters to obtain an encoded video segment.
In a possible implementation, the video content analysis module is configured to:
input the original video segment and target image quality data into a video analysis model to obtain the video image processing parameters output by the video analysis model;
the video analysis model is a machine learning model trained with video segment samples, target image quality data of the video segment samples, and video image processing parameters of the video segment samples.
In a possible implementation, the target image quality data includes an image quality level.
In a possible implementation, the video image processing parameters include at least one of the following:
a target frame rate, a target quantization bit depth, and a luminance adjustment curve.
In a possible implementation, the video processing module includes:
a frame rate up-sampling unit, configured to perform, in response to the video image processing parameters including the target frame rate and the target frame rate being higher than the frame rate of the original video segment, super frame-rate processing on the original video segment through frame rate up-sampling to obtain the processed video segment;
a frame rate clipping unit, configured to perform, in response to the video image processing parameters including the target frame rate and the target frame rate being lower than the frame rate of the original video segment, clipping processing on the original video segment through frame rate down-sampling to obtain the processed video segment;
a frame rate holding unit, configured to keep, in response to the video image processing parameters including the target frame rate and the target frame rate being equal to the frame rate of the original video segment, the frame rate of the original video segment to obtain the processed video segment.
In a possible implementation, the video processing module includes:
a down-sampling quantization unit, configured to perform, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being lower than the quantization bit depth of the original video segment, down-sampling quantization on the original video segment to obtain the processed video segment;
an inverse quantization unit, configured to perform, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being higher than the quantization bit depth of the original video segment, reverse high-precision inverse quantization on the original video segment to obtain the processed video segment;
a quantization holding unit, configured to keep, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being equal to the quantization bit depth of the original video segment, the quantization bit depth of the original video segment to obtain the processed video segment.
In a possible implementation, the video processing module includes:
a tone mapping unit, configured to perform, in response to the video image processing parameters including the luminance adjustment curve and the luminance range corresponding to the luminance adjustment curve being inconsistent with the luminance range of the original video segment, tone mapping on the original video segment based on the luminance adjustment curve to obtain the processed video segment;
a tone holding unit, configured to keep, in response to the video image processing parameters including the luminance adjustment curve and the luminance range corresponding to the luminance adjustment curve being consistent with the luminance range of the original video segment, the tone of the original video segment to obtain the processed video segment.
In a possible implementation, the encoding parameter acquisition module is configured to:
input the image feature data of the processed video segment into an encoding parameter determination model to obtain the encoding parameters output by the encoding parameter determination model;
the encoding parameter determination model is a machine learning model trained with image feature data samples and the encoding parameters corresponding to the image feature data samples.
In a possible implementation, the image feature data includes at least one of the following: frame rate, quantization bit depth, maximum luminance, minimum luminance, image type, motion vector, and target image quality data.
In a possible implementation, the encoding parameters include a bitrate.
In a possible implementation, the video segmentation module is configured to segment the original video data according to specified dimensions to obtain the original video segment;
the specified dimensions include at least one of the following: the distribution characteristics of dark and highlighted parts in the image, the trajectory and degree of motion of moving regions, color distribution and intensity, and picture details.
In a possible implementation, the apparatus further includes:
a merging module, configured to merge the encoded video segments according to the segmentation order of the corresponding original video segments to obtain encoded video data.
In another aspect, a computer device is provided, the computer device including a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, and the code set or instruction set being loaded and executed by the processor to implement the video encoding method described above.
In another aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, and the code set or instruction set being loaded and executed by a processor to implement the video encoding method described above.
The technical solution provided in this application may include the following beneficial effects:
Before the encoding parameters are determined, image processing is first performed on the original video segment, and then the encoding parameters corresponding to the processed video segment are determined and used for encoding. By controlling the video image quality of the segment to be encoded within a reasonable quality range and preventing it from being too high or too low, the relationship between video quality and encoding parameters can be controlled, thereby balancing encoding efficiency against the quality of the encoded video and improving the encoding result.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit this application.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this application and, together with the specification, serve to explain the principles of this application.
FIG. 1 is a system configuration diagram of a video service system involved in the embodiments of this application;
FIG. 2 is a schematic diagram of the decomposition of the ultra-high-definition dimensions involved in this application;
FIG. 3 is a flowchart of a video encoding method according to an exemplary embodiment;
FIG. 4 is a schematic flowchart of the video encoding involved in the embodiment shown in FIG. 3;
FIG. 5 is a flowchart of a video encoding method according to an exemplary embodiment;
FIG. 6 is a schematic diagram of the input and output of the video analysis model involved in the embodiment shown in FIG. 5;
FIG. 7 is a schematic diagram of the input and output of the encoding parameter determination model involved in the embodiment shown in FIG. 5;
FIG. 8 is a structural block diagram of a video encoding apparatus according to an exemplary embodiment;
FIG. 9 is a structural block diagram of a video encoding apparatus according to an exemplary embodiment;
FIG. 10 is a schematic structural diagram of a computer device according to an exemplary embodiment;
FIG. 11 is a schematic structural diagram of a computer device according to an exemplary embodiment.
Detailed Description
Exemplary embodiments are described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
The embodiments of this application propose a video encoding solution that, based on Artificial Intelligence (AI), can better match image quality with encoding parameters, balancing the efficiency of video encoding against the quality of the encoded video. For ease of understanding, the terms involved in this application are explained below.
1) Shot segmentation
In this application, shot segmentation refers to splitting the input original source (to be encoded) into several "shots": segments that are continuous in time, do not overlap with one another, and can be merged in order to reconstruct the original source. To keep overall system performance optimal, images that are continuous in time and similar in content are generally split into one segment, called a "shot". Subsequent processing is performed in units of the video image segments represented by these "shots".
2) AI
AI is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, AI is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
AI technology is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Basic AI technologies generally include technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include several major directions such as computer vision, speech processing, natural language processing, and machine learning/deep learning.
With the research and progress of AI technology, AI has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, smart customer service, and smart video services. With the development of technology, AI will be applied in more fields and play an increasingly important role.
3) Machine Learning (ML)
ML is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. ML is the core of AI and the fundamental way to endow computers with intelligence, and it is applied across all fields of AI. ML and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The solutions provided in the embodiments of this application involve AI technologies such as machine learning, which are described through the following embodiments.
Please refer to FIG. 1, which shows a system configuration diagram of a video service system involved in the embodiments of this application. As shown in FIG. 1, the system includes a server 120, a database 140, and several terminals 160.
The server 120 is a single server, a server cluster composed of several servers, a virtualization platform, or a cloud computing service center.
The server 120 may be a server providing background support for video service applications. The server 120 may be composed of one or more functional units.
In a possible implementation, as shown in FIG. 1, the server 120 may include an interface unit 120a, an encoding unit 120b, and a push unit 120c.
The interface unit 120a is configured to exchange information with the video service application installed in the terminal 160, so as to acquire information about the user of the terminal 160, such as user account information and user operation information.
The encoding unit 120b is configured to encode unencoded video data to obtain encoded video.
The push unit 120c is configured to push the encoded video to the terminals 160 corresponding to the users.
The database 140 may be a Redis database, or another type of database. The database 140 is used to store various types of data, such as the user information of each user, the original unencoded video data, and the encoded video data.
For example, after the interface unit 120a acquires the information about the users of the terminals, it stores that information in the database 140; the encoding unit 120b encodes the original video data stored in the database 140 and stores the result back in the database 140; and when pushing videos to a user, the push unit 120c extracts the encoded video data from the database 140 and pushes it to the terminal corresponding to the user.
Alternatively, in another possible implementation, the video encoding may also be performed by the terminal 160. For example, the terminal 160 may record original video data through an image acquisition component or screen recording software, encode the recorded original video data, and upload it to the network side, so that other terminals can obtain the encoded video data from the network side.
For example, the terminal 160 encodes the original video data and uploads the encoded video data to the server 120, which stores it in the database 140. When the server 120 receives a request sent by another terminal to obtain the encoded video data, it may retrieve the data from the database 140 and push it to that terminal, or retrieve it from the database 140 and send it to a content delivery network, from which the other terminal pulls the encoded video data.
The terminal 160 may be a terminal device that has a network connection function and has the video service application corresponding to the server 120 installed. For example, the terminal 160 may be a smartphone, a tablet computer, an e-book reader, smart glasses, a smart watch, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer. Optionally, the terminal 160 may also be referred to as user equipment, a portable terminal, a laptop terminal, a desktop terminal, and so on.
In the embodiments of this application, the video service application may include any application that provides a continuous stream of image frames, including but not limited to traditional video playback applications, live video streaming applications, game applications, communication applications, or browser applications.
The terminal 160 and the server 120 are connected through a communication network. Optionally, the communication network is a wired or wireless network.
Optionally, the system may further include a management device (not shown in FIG. 1), which is connected to the server 120 through a communication network. Optionally, the communication network is a wired or wireless network.
Optionally, the above wireless or wired network uses standard communication technologies and/or protocols. The network is usually the Internet, but may be any network, including but not limited to any combination of a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired, or wireless network, a private network, or a virtual private network. In some embodiments, technologies and/or formats such as Hyper Text Markup Language (HTML) and Extensible Markup Language (XML) are used to represent the data exchanged over the network. In addition, conventional encryption technologies such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec) can be used to encrypt all or some links. In other embodiments, customized and/or dedicated data communication technologies can also be used to replace or supplement the above data communication technologies.
As smartphones supporting 5th-Generation New Radio (5G NR) network connections gradually enter the market, 5G networks are about to enter full commercialization. Their high-throughput, low-latency, and highly reliable technical characteristics provide a solid foundation for the next generation of Internet services; businesses that strongly depend on the quality of network infrastructure, such as video-on-demand services, low-latency live streaming, cloud gaming, and the Internet of Things, are bound to develop enormously.
As people's demand for audio-visual enjoyment keeps rising and the related equipment in the industry chain keeps upgrading, consumer video services will inevitably move towards higher definition and a better experience. Accordingly, High Definition (HD) video represented by 720P/1080P (often marketed as "Ultra Clear" or "Blu-ray") has begun to upgrade towards Ultra High Definition (UHD) video. Unlike the upgrade in the HD era, the upgrade to UHD video includes, in addition to the most intuitive improvement in image definition, upgrades in five typical dimensions in total, covering frame rate, dynamic range, color gamut, and bit depth (please refer to FIG. 2, which shows a schematic diagram of the decomposition of the ultra-high-definition dimensions involved in this application). As shown in FIG. 2, the improvement in each dimension of UHD video leads to a significant increase in video bitrate; even relying on the high bandwidth of 5G, efficient compression remains a great challenge. At present, there are two main approaches to the compression of UHD video:
1) Looking for algorithm tools with a higher compression ratio, such as the Versatile Video Coding (VVC) video compression standard, the AV1/AV2 video compression standards, and the AVS3 (Audio Video Coding Standard 3) video compression standard.
2) Not relying on a specific video compression standard, but combining the machine learning and AI capabilities that have matured in recent years to improve the efficiency of bitrate allocation during encoding, that is, allocating more bitrate to regions that are important or more sensitive to subjective visual perception, and less bitrate to others, thereby saving bitrate while keeping the overall subjective quality unchanged.
The above 1) and 2) can be combined, which is the Content Aware Encoding (CAE) technique. Current CAE technology is mainly used for traditional video, with the premise of maintaining the same subjective visual quality and the purpose of saving as much bitrate as possible. It does not fully consider the characteristics of UHD video in multiple dimensions such as frame rate and quantization bit depth, and naturally cannot combine the multi-dimensional parameters of UHD video to achieve the "best" trade-off between visual quality and bitrate.
To solve the above problem, the subsequent embodiments of this application propose a video encoding solution that can better balance the efficiency of video encoding and the quality of the encoded video.
FIG. 3 is a flowchart of a video encoding method according to an exemplary embodiment. The video encoding method can be used in a computer device, such as the server or the terminal of the system shown in FIG. 1. As shown in FIG. 3, the video encoding method may include the following steps:
Step 31: Segment original video data to obtain original video segments.
The segmentation of the original video data may be performed by means of shot segmentation.
Step 32: Perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment.
In the embodiments of this application, the video image processing parameters obtained from the video content analysis may be processing parameters corresponding to one or more of the dimensions of UHD video shown in FIG. 2.
Step 33: Perform image processing on the video images in the original video segment based on the video image processing parameters to obtain a processed video segment.
In the embodiments of this application, the computer device may adjust one or more dimensions of the parameters of the original video segment according to the video image processing parameters to obtain the processed video segment.
Step 34: Acquire encoding parameters of the processed video segment based on image feature data of the processed video segment.
Step 35: Encode the processed video segment according to the encoding parameters to obtain an encoded video segment.
After encoding each video segment to obtain the encoded video segments, the computer device may further merge the encoded video segments in the chronological order of segmentation to obtain the encoded video data and push it, for example, to a server/content delivery network, or to other terminals.
For example, please refer to FIG. 4, which shows a schematic flowchart of the video encoding involved in the embodiments of this application. As shown in FIG. 4, the original video data goes through shot segmentation in step S1 to obtain original video segments; each original video segment goes through video content analysis in step S2 to obtain the corresponding video image processing parameters, and then through image processing in step S3 to obtain the processed video segments; then, through AI encoding parameter acquisition in step S4 and video segment encoding in step S5, encoded video segments are obtained, and after video segment synthesis in step S6, the encoded video data can be obtained. A minimal sketch of this pipeline follows.
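To make the S1-S6 flow concrete, the following is a minimal runnable sketch of the pipeline in Python; every helper in it (shot_split, analyze_content, preprocess, predict_encoding_params, encode_segment) is a hypothetical stub standing in for the corresponding step, not the implementation disclosed here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    frames: List[bytes]  # raw frames of one "shot"

def shot_split(frames: List[bytes], size: int = 48) -> List[Segment]:
    # Stand-in for content-aware shot segmentation (step S1): fixed-size chunks.
    return [Segment(frames[i:i + size]) for i in range(0, len(frames), size)]

def analyze_content(seg: Segment, quality: int) -> dict:
    # Stand-in for the video analysis model of step S2.
    return {"target_fps": 30, "target_bit_depth": 10, "luma_curve": None}

def preprocess(seg: Segment, params: dict) -> Segment:
    # Step S3 would resample frame rate, requantize bit depth, or tone-map here.
    return seg

def predict_encoding_params(seg: Segment) -> dict:
    # Stand-in for the encoding parameter determination model of step S4.
    return {"bitrate_kbps": 8000}

def encode_segment(seg: Segment, enc_params: dict) -> bytes:
    # Step S5 would call a real HEVC/VVC/AV1/AVS3 encoder.
    return b"".join(seg.frames)

def encode_video(frames: List[bytes], quality: int) -> bytes:
    encoded = []
    for seg in shot_split(frames):                       # S1: shot segmentation
        params = analyze_content(seg, quality)           # S2: content analysis
        processed = preprocess(seg, params)              # S3: image processing
        enc_params = predict_encoding_params(processed)  # S4: parameter acquisition
        encoded.append(encode_segment(processed, enc_params))  # S5: encoding
    return b"".join(encoded)                             # S6: merge in split order
```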
Through the above solution, after the original video segments are obtained by segmentation, image processing is first performed on them, and then the encoding parameters corresponding to the processed video segments are determined and used for encoding. This makes it possible to control the relationship between video quality and encoding parameters during parameter determination. For example, when the video quality of an original video segment is too high, it can be appropriately reduced to shorten the encoding time and improve encoding efficiency; correspondingly, when the video quality of an original video segment is low, it can be appropriately raised to guarantee the quality of the encoded video.
In summary, in the solution shown in the embodiments of this application, before the encoding parameters are determined, image processing is first performed on the original video segment, and then the encoding parameters corresponding to the processed video segment are determined and used for encoding. By controlling the video image quality of the segment to be encoded within a reasonable quality range and preventing it from being too high or too low, the relationship between video quality and encoding parameters can be controlled, thereby balancing encoding efficiency against the quality of the encoded video and improving the encoding result.
In the solution shown in FIG. 3 above, to accommodate the multi-dimensional considerations of UHD video sources, the computer device can determine the various parameters in an AI-based manner, so as to better match video quality with encoding parameters. This process can be as shown in the following embodiment.
FIG. 5 is a flowchart of a video encoding method according to an exemplary embodiment. The video encoding method can be used in a computer device, which may be the server or the terminal of the system shown in FIG. 1. As shown in FIG. 5, taking the case where the computer device is the server of the system shown in FIG. 1 as an example, the video encoding method may include the following steps:
Step 501: Segment the original video data according to specified dimensions to obtain original video segments.
The specified dimensions include at least one of the following: the distribution characteristics of dark and highlighted parts in the image, the trajectory and degree of motion of moving regions, color distribution and intensity, and picture details.
The server may divide the original video data chronologically into multiple consecutive, end-to-end original video segments.
In the embodiments of this application, when performing shot segmentation, the server can consider more of the dimensions that affect the visual experience of UHD video, rather than only the similarity of image textures. Optionally, the specified dimensions for shot segmentation may include:
1) the distribution characteristics of dark and highlighted parts in the image;
2) the trajectory and degree of motion of moving regions;
3) color distribution and intensity;
4) picture details, such as the sense of gradation in the transitions of smooth regions.
In the embodiments of this application, when performing shot segmentation, the server can determine the split points in the original video data according to one or more of the distribution characteristics of dark and highlighted parts in the image, the trajectory and degree of motion of moving regions, color distribution and intensity, and picture details.
For example, the server may determine a split point in the original video data through a single one of the above four dimensions. For instance, the server may analyze the distribution characteristics of dark and highlighted parts in the several video frames before a certain video frame and in the several video frames after it; if the difference between the two sets of distribution characteristics satisfies a preset condition, that video frame may be determined as a split point. Alternatively, the server may analyze the trajectory and degree of motion of the moving regions in the several frames before and after a certain frame; if the difference between the two satisfies a preset condition, that frame may be determined as a split point, and so on.
Alternatively, the server may also combine multiple of the above four specified dimensions to comprehensively determine the split points in the original video data. A sketch of such split-point detection follows.
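As an illustration only, the sketch below detects split points by comparing simple per-frame statistics (dark-pixel share, highlight share, and a horizontal-gradient proxy for picture detail) in windows before and after each candidate frame; the features, window size, and threshold are assumptions chosen for the example, and a motion dimension could be added from frame differences in the same way.

```python
import numpy as np

def frame_features(frame: np.ndarray) -> np.ndarray:
    # Per-frame statistics for three of the dimensions, on 8-bit grayscale input.
    dark = np.mean(frame < 32)
    bright = np.mean(frame > 224)
    detail = np.abs(np.diff(frame.astype(np.int16), axis=1)).mean() / 255.0
    return np.array([dark, bright, detail])

def find_split_points(frames, window=5, threshold=0.15):
    feats = [frame_features(f) for f in frames]
    splits = []
    for i in range(window, len(frames) - window):
        before = np.mean(feats[i - window:i], axis=0)
        after = np.mean(feats[i:i + window], axis=0)
        # A frame becomes a split point when the windows around it differ
        # enough in any considered dimension (the "preset condition").
        if np.max(np.abs(after - before)) > threshold:
            splits.append(i)
    return splits
```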
Step 502: Input the original video segment and target image quality data into a video analysis model to obtain the video image processing parameters output by the video analysis model.
In the embodiments of this application, the video analysis model is a machine learning model trained with video segment samples, target image quality data of the video segment samples, and video image processing parameters of the video segment samples.
In the embodiments of this application, developers can annotate in advance a training data set for the video analysis model, which consists of several video segment samples, the target image quality data of the video segment samples, and the video image processing parameters of the video segment samples; a model training device can train the video analysis model with this training data set, along the lines of the sketch below.
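The description does not disclose a concrete architecture or feature set, so the following is only a toy sketch of such supervised training, assuming each sample has been reduced to a 128-dimensional feature vector plus a 1-to-10 quality level and labeled with two normalized processing parameters; all sizes and the fabricated tensors are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Toy calibration set: 128-dim segment features + quality level -> 2 labels
# (e.g., normalized target frame rate and target bit depth).
features = torch.randn(256, 128)
quality = torch.randint(1, 11, (256, 1)).float()
labels = torch.randn(256, 2)

model = nn.Sequential(nn.Linear(129, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):
    pred = model(torch.cat([features, quality], dim=1))  # features + quality in
    loss = loss_fn(pred, labels)                         # regress the parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```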
Optionally, the target image quality data includes an image quality level.
The image quality level may be a specified quality level, which can be a standard level parameter specified by the business. For example, in the embodiments of this application, the image quality level may be indicated by 1 to 10, with 10 indicating the highest quality and 1 the lowest.
Optionally, the video image processing parameters include at least one of the following: a target frame rate, a target quantization bit depth, and a luminance adjustment curve.
In the solution shown in the embodiments of this application, the server can analyze the video content through a machine learning model and, under a specified visual quality factor, output the most suitable image parameters for subsequent preprocessing. The working principle is as follows: for ultra-high-resolution video, because image content characteristics differ, the dimensions shown in FIG. 2 are not "the higher the better" in all cases; and because research on the principles of human vision is still incomplete at this stage, the dimensions cannot be computed and quantified through precise mathematical modeling.
For example:
1) For e-sports game live-streaming pictures, the higher the frame rate, the better the experience; but for ordinary naturally captured images, such as indoor scene shots of films and TV series without intense motion, a high frame rate does not bring a significant improvement in the viewing experience, so encoding such shots at a higher bitrate is effectively a waste.
2) For pictures with simple textures and delicate luminance transitions (such as blue sky, white clouds, water surfaces, and shadows of indoor architecture), a higher quantization bit depth is needed to avoid artifacts such as banding and blotching when rendering on a UHD display; but pictures with complex textures (such as jungles and scattered rocks) inherently contain a lot of edge information, and using a higher quantization bit depth on them is a waste.
Based on the above principle, the server can analyze the input shot (i.e., the input original video segment) to obtain suitable UHD-dimension video image processing parameters.
For example, please refer to FIG. 6, which shows a schematic diagram of the input and output of the video analysis model involved in the embodiments of this application. As shown in FIG. 6, the server inputs the original video segment and the desired target image quality data into the AI network model (i.e., the video analysis model); trained in advance on a calibration data set, the AI network model performs AI inference on the input original images and the specified quality factor, and outputs the most suitable UHD-dimension video image processing parameters. A sketch of this inference step follows.
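Continuing the toy model from the training sketch above, inference amounts to one forward pass followed by decoding the raw outputs into concrete UHD-dimension parameters; the decoding rules here (snapping to a standard frame rate, choosing 8-bit versus 10-bit) are illustrative assumptions, not disclosed behavior.

```python
import torch

def infer_processing_params(model, segment_features, quality_level):
    # segment_features is assumed to be the same 128-dim vector used in training.
    x = torch.cat([segment_features, torch.tensor([float(quality_level)])])
    fps_raw, depth_raw = model(x.unsqueeze(0)).squeeze(0).tolist()
    return {
        # snap the raw regression output to a standard frame rate
        "target_fps": min((24, 30, 60, 120), key=lambda v: abs(v - fps_raw * 120.0)),
        "target_bit_depth": 10 if depth_raw > 0.5 else 8,
        "luma_curve": None,  # a tone curve could be predicted the same way
    }
```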
Step 503: Perform image processing on the video images in the original video segment based on the video image processing parameters to obtain a processed video segment.
Optionally, the image processing on the video images in the original video segment may be performed as follows:
1) In response to the video image processing parameters including the target frame rate and the target frame rate being higher than the frame rate of the original video segment, super frame-rate processing is performed on the original video segment through frame rate up-sampling to obtain the processed video segment; in response to the video image processing parameters including the target frame rate and the target frame rate being lower than the frame rate of the original video segment, clipping processing is performed on the original video segment through frame rate down-sampling to obtain the processed video segment. Optionally, in response to the video image processing parameters including the target frame rate and the target frame rate being equal to the frame rate of the original video segment, the frame rate of the original video segment is kept to obtain the processed video segment.
2) In response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being lower than the quantization bit depth of the original video segment, down-sampling quantization is performed on the original video segment to obtain the processed video segment; in response to the target quantization bit depth being higher than the quantization bit depth of the original video segment, reverse high-precision inverse quantization is performed on the original video segment to obtain the processed video segment. Optionally, in response to the target quantization bit depth being equal to the quantization bit depth of the original video segment, the quantization bit depth of the original video segment is kept to obtain the processed video segment.
3) In response to the video image processing parameters including the luminance adjustment curve and the luminance range corresponding to the luminance adjustment curve being inconsistent with the luminance range of the original video segment, tone mapping is performed on the original video segment based on the luminance adjustment curve to obtain the processed video segment. Optionally, in response to the luminance range corresponding to the luminance adjustment curve being consistent with the luminance range of the original video segment, the tone of the original video segment is kept to obtain the processed video segment.
Optionally, when the video image processing parameters include two or more of the above, the corresponding processing methods of the three cases above can be combined. For example, suppose the video image processing parameters include the target frame rate, the target quantization bit depth, and the luminance adjustment curve, where the target frame rate is higher than the frame rate of the original video segment, the target quantization bit depth is higher than the quantization bit depth of the original video segment, and the luminance range corresponding to the luminance adjustment curve is consistent with the luminance range of the original video segment; then the server can perform super frame-rate processing on the original video segment according to the target frame rate, perform high-precision inverse quantization on it according to the target quantization bit depth, and keep its tone, thereby obtaining the processed video segment. A sketch implementing these branches follows.
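The branch structure of step 503 can be sketched as below, with frames represented as flat lists of integer samples; the frame duplication/dropping used for frame-rate changes and the linear rescaling used for bit-depth changes are crude stand-ins for real interpolation and high-precision inverse quantization.

```python
def preprocess_segment(frames, fps, bit_depth, params):
    # frames: list of frames, each a flat list of integer samples.
    target_fps = params.get("target_fps")
    if target_fps is not None and target_fps != fps:
        if target_fps > fps:
            # super frame-rate via up-sampling (crude frame duplication)
            frames = [f for f in frames for _ in range(round(target_fps / fps))]
        else:
            # clipping via down-sampling (crude frame dropping)
            frames = frames[::round(fps / target_fps)]
        fps = target_fps
    target_depth = params.get("target_bit_depth")
    if target_depth is not None and target_depth != bit_depth:
        # down-sampling quantization or high-precision inverse quantization,
        # both approximated here by linear rescaling of sample values
        scale = (2 ** target_depth - 1) / (2 ** bit_depth - 1)
        frames = [[round(v * scale) for v in f] for f in frames]
        bit_depth = target_depth
    luma_curve = params.get("luma_curve")
    if luma_curve is not None:
        # tone-map when the curve's luminance range differs from the source's
        frames = [[luma_curve(v) for v in f] for f in frames]
    return frames, fps, bit_depth
```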
Step 504: Input the image feature data of the processed video segment into an encoding parameter determination model to obtain the encoding parameters output by the encoding parameter determination model.
Optionally, the encoding parameter determination model is a machine learning model trained with image feature data samples and the encoding parameters corresponding to the image feature data samples.
Optionally, the image feature data includes at least one of the following: frame rate, quantization bit depth, maximum luminance, minimum luminance, image type, motion vector, and target image quality data.
Optionally, the encoding parameters include a bitrate.
In the embodiments of this application, the server can use a pre-trained AI model (i.e., the encoding parameter determination model) to decide the encoding parameters of the processed video segment according to the image feature data of the processed video segment obtained in step 503.
In the embodiments of this application, developers can annotate in advance a training data set for the encoding parameter determination model, which consists of several image feature data samples and the encoding parameters of the image feature data samples; a model training device can train the encoding parameter determination model with this training data set.
Optionally, please refer to FIG. 7, which shows a schematic diagram of the input and output of the encoding parameter determination model involved in the embodiments of this application. As shown in FIG. 7, the AI model used in the embodiments of this application can be a neural network model trained in advance with a calibration data set; its input is several items of image feature data, and its output is the encoding parameters under a given image quality factor (i.e., the target image quality data), such as the bitrate.
As shown in FIG. 7, the input of the model in the embodiments of this application contains several dimensions directly associated with the UHD source, such as frame rate, quantization bit depth, maximum luminance, minimum luminance, image type, motion vector, and target image quality data.
The image type and motion vectors of the processed video segment can be obtained through 1-pass encoding, for example using a relatively fast constant rate factor (CRF) encode, or an encode with a fixed quantization parameter (QP). A sketch of such a probe follows.
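For instance, with ffmpeg/ffprobe installed, a fast 1-pass CRF probe of a segment and the extraction of per-frame picture types might look like the sketch below; the preset and CRF value are arbitrary choices for speed, and motion statistics would require a decoder that exports motion vectors, which is omitted here.

```python
import json
import os
import subprocess
import tempfile

def probe_image_types(src: str) -> dict:
    # Fast 1-pass CRF encode of the segment; the output file is only a probe.
    tmp = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False).name
    subprocess.run(
        ["ffmpeg", "-y", "-v", "error", "-i", src,
         "-c:v", "libx264", "-preset", "ultrafast", "-crf", "30", tmp],
        check=True)
    # Read back the per-frame picture types (I/P/B) chosen by the encoder.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_frames", "-show_entries", "frame=pict_type", "-of", "json", tmp],
        capture_output=True, check=True, text=True)
    os.unlink(tmp)
    types = [f.get("pict_type") for f in json.loads(out.stdout)["frames"]]
    return {"n_frames": len(types),
            "i_frame_ratio": types.count("I") / max(len(types), 1)}
```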
Step 505: Encode the processed video segment according to the encoding parameters to obtain an encoded video segment.
In the embodiments of this application, the server can perform a compression encoding operation on the processed video segment according to the encoding parameters obtained in step 504.
Optionally, the algorithm used in the compression encoding operation can adopt an industry-standard codec, for example H.264/AVC (Advanced Video Coding), HEVC (High Efficiency Video Coding), VP9, or AVS2 (Audio Video Coding Standard 2).
Optionally, considering that UHD sources are targeted, the compression encoding algorithm tools used in the embodiments of this application can adopt standards that contain algorithm tools supporting UHD characteristics, such as HEVC, VVC, AV1, and AVS3. A sketch of such an encoding call follows.
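As a sketch, encoding a processed segment at the model-chosen bitrate with an HEVC encoder could be a single ffmpeg call (assuming ffmpeg is built with libx265); a real system would likely add rate-control tuning such as a VBV buffer, which is omitted here.

```python
import subprocess

def encode_segment(src: str, dst: str, bitrate_kbps: int) -> None:
    # Encode one processed segment with HEVC at the model-chosen bitrate.
    subprocess.run(
        ["ffmpeg", "-y", "-v", "error", "-i", src,
         "-c:v", "libx265", "-b:v", f"{bitrate_kbps}k", dst],
        check=True)
```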
Step 506: Merge the encoded video segments according to the segmentation order of the corresponding original video segments to obtain the encoded video data. A sketch of this merge step follows.
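Since the segments were cut to be non-overlapping and time-continuous, merging reduces to concatenation in the original segmentation order; below is a sketch using ffmpeg's concat demuxer with stream copy.

```python
import subprocess

def merge_segments(paths, dst):
    # paths must already be in the original segmentation order.
    with open("segments.txt", "w") as f:
        for p in paths:
            f.write(f"file '{p}'\n")
    # Stream copy: concatenate the encoded segments without re-encoding.
    subprocess.run(
        ["ffmpeg", "-y", "-v", "error", "-f", "concat", "-safe", "0",
         "-i", "segments.txt", "-c", "copy", dst],
        check=True)
```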
In summary, in the solution shown in the embodiments of this application, before the encoding parameters are determined, image processing is first performed on the original video segment, and then the encoding parameters corresponding to the processed video segment are determined and used for encoding. By controlling the video image quality of the segment to be encoded within a reasonable quality range and preventing it from being too high or too low, the relationship between video quality and encoding parameters can be controlled, thereby balancing encoding efficiency against the quality of the encoded video and improving the encoding result.
In addition, the solution shown in the embodiments of this application considers multiple dimensions of UHD video during shot segmentation, improving the distinction between different shots in terms of dark parts/high luminance, color, fast motion, and detail information, thereby improving the accuracy of video segment segmentation and further optimizing the compression efficiency within subsequent shots.
Moreover, the solution shown in the embodiments of this application introduces the multi-dimensional characteristics of UHD video into the acquisition of the video image processing parameters and the encoding parameters, and considers the associated influence of image features such as frame rate, luminance range, and quantization bit depth on image quality, improving the image encoding result.
FIG. 8 is a structural block diagram of a video encoding apparatus according to an exemplary embodiment. The video encoding apparatus can perform all or part of the steps in the embodiments shown in FIG. 3 or FIG. 5. The apparatus may be a computer device, or may be set in a computer device. The video encoding apparatus may include:
a video segmentation module 801, configured to segment original video data to obtain an original video segment;
a video content analysis module 802, configured to perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment;
a video processing module 803, configured to perform image processing on video images in the original video segment based on the video image processing parameters to obtain a processed video segment;
an encoding parameter acquisition module 804, configured to acquire encoding parameters of the processed video segment based on image feature data of the processed video segment;
an encoding module 805, configured to encode the processed video segment according to the encoding parameters to obtain an encoded video segment.
In a possible implementation, the video content analysis module 802 is configured to:
input the original video segment and target image quality data into a video analysis model to obtain the video image processing parameters output by the video analysis model;
the video analysis model is a machine learning model trained with video segment samples, target image quality data of the video segment samples, and video image processing parameters of the video segment samples.
In a possible implementation, the target image quality data includes an image quality level.
In a possible implementation, the video image processing parameters include at least one of the following:
a target frame rate, a target quantization bit depth, and a luminance adjustment curve.
In a possible implementation, as shown in FIG. 9, the video processing module 803 includes:
a frame rate up-sampling unit 8031, configured to perform, in response to the video image processing parameters including the target frame rate and the target frame rate being higher than the frame rate of the original video segment, super frame-rate processing on the original video segment through frame rate up-sampling to obtain the processed video segment;
a frame rate clipping unit 8032, configured to perform, in response to the video image processing parameters including the target frame rate and the target frame rate being lower than the frame rate of the original video segment, clipping processing on the original video segment through frame rate down-sampling to obtain the processed video segment;
a frame rate holding unit 8033, configured to keep, in response to the video image processing parameters including the target frame rate and the target frame rate being equal to the frame rate of the original video segment, the frame rate of the original video segment to obtain the processed video segment.
In a possible implementation, as shown in FIG. 9, the video processing module 803 includes:
a down-sampling quantization unit 8034, configured to perform, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being lower than the quantization bit depth of the original video segment, down-sampling quantization on the original video segment to obtain the processed video segment;
an inverse quantization unit 8035, configured to perform, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being higher than the quantization bit depth of the original video segment, reverse high-precision inverse quantization on the original video segment to obtain the processed video segment;
a quantization holding unit 8036, configured to keep, in response to the video image processing parameters including the target quantization bit depth and the target quantization bit depth being equal to the quantization bit depth of the original video segment, the quantization bit depth of the original video segment to obtain the processed video segment.
In a possible implementation, the video processing module 803 includes:
a tone mapping unit 8037, configured to perform, in response to the video image processing parameters including the luminance adjustment curve and the luminance range corresponding to the luminance adjustment curve being inconsistent with the luminance range of the original video segment, tone mapping on the original video segment based on the luminance adjustment curve to obtain the processed video segment;
a tone holding unit 8038, configured to keep, in response to the video image processing parameters including the luminance adjustment curve and the luminance range corresponding to the luminance adjustment curve being consistent with the luminance range of the original video segment, the tone of the original video segment to obtain the processed video segment.
In a possible implementation, the encoding parameter acquisition module 804 is configured to:
input the image feature data of the processed video segment into an encoding parameter determination model to obtain the encoding parameters output by the encoding parameter determination model;
the encoding parameter determination model is a machine learning model trained with image feature data samples and the encoding parameters corresponding to the image feature data samples.
In a possible implementation, the image feature data includes at least one of the following: frame rate, quantization bit depth, maximum luminance, minimum luminance, image type, motion vector, and target image quality data.
In a possible implementation, the encoding parameters include a bitrate.
In a possible implementation, the video segmentation module 801 is configured to segment the original video data according to specified dimensions to obtain the original video segment;
the specified dimensions include at least one of the following: the distribution characteristics of dark and highlighted parts in the image, the trajectory and degree of motion of moving regions, color distribution and intensity, and picture details.
In a possible implementation, as shown in FIG. 9, the apparatus further includes:
a merging module 806, configured to merge the encoded video segments according to the segmentation order of the corresponding original video segments to obtain encoded video data.
In summary, in the solution shown in the embodiments of this application, before the encoding parameters are determined, image processing is first performed on the original video segment, and then the encoding parameters corresponding to the processed video segment are determined and used for encoding. By controlling the video image quality of the segment to be encoded within a reasonable quality range and preventing it from being too high or too low, the relationship between video quality and encoding parameters can be controlled, thereby balancing encoding efficiency against the quality of the encoded video and improving the encoding result.
In addition, the solution shown in the embodiments of this application considers multiple dimensions of UHD video during shot segmentation, improving the distinction between different shots in terms of dark parts/high luminance, color, fast motion, and detail information, thereby improving the accuracy of video segment segmentation and further optimizing the compression efficiency within subsequent shots.
Moreover, the solution shown in the embodiments of this application introduces the multi-dimensional characteristics of UHD video into the acquisition of the video image processing parameters and the encoding parameters, and considers the associated influence of image features such as frame rate, luminance range, and quantization bit depth on image quality, improving the image encoding result.
FIG. 10 is a schematic structural diagram of a computer device according to an exemplary embodiment. The computer device can be implemented as the server on the network side, which may be the server 120 shown in FIG. 1. The computer device 1000 includes a Central Processing Unit (CPU) 1001, a system memory 1004 including a Random Access Memory (RAM) 1002 and a Read-Only Memory (ROM) 1003, and a system bus 1005 connecting the system memory 1004 and the CPU 1001. The computer device 1000 further includes a basic Input/Output (I/O) system 1006 that helps transfer information between the components in the computer, and a mass storage device 1007 for storing an operating system 1013, application programs 1014, and other program modules 1015.
The mass storage device 1007 is connected to the CPU 1001 through a mass storage controller (not shown) connected to the system bus 1005. The mass storage device 1007 and its associated computer-readable media provide non-volatile storage for the computer device 1000. That is, the mass storage device 1007 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state storage technologies, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Certainly, those skilled in the art know that the computer storage media are not limited to the above. The system memory 1004 and the mass storage device 1007 may be collectively referred to as the memory.
The computer device 1000 can be connected to the Internet or other network devices through a network interface unit 1011 connected to the system bus 1005.
The memory further includes one or more programs stored in the memory, and the CPU 1001 implements all or part of the steps of the methods shown in FIG. 3 or FIG. 5 by executing the one or more programs.
FIG. 11 shows a structural block diagram of a computer device 1100 provided by an exemplary embodiment of this application. The computer device 1100 may be a terminal, which may be the terminal 160 shown in FIG. 1.
Generally, the computer device 1100 includes a processor 1101 and a memory 1102.
The processor 1101 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1101 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a co-processor. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit). In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
The memory 1102 may include one or more computer-readable storage media, which may be non-transitory. The memory 1102 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1102 is used to store at least one instruction, which is to be executed by the processor 1101 to implement the methods provided in the method embodiments of this application.
In some embodiments, the computer device 1100 may optionally further include a peripheral device interface 1103 and at least one peripheral device. The processor 1101, the memory 1102, and the peripheral device interface 1103 may be connected through a bus or a signal line. Each peripheral device may be connected to the peripheral device interface 1103 through a bus, a signal line, or a circuit board. The peripheral device includes at least one of the following: a radio frequency circuit 1104, a touch display screen 1105, a camera 1106, an audio circuit 1107, a positioning component 1108, and a power supply 1109.
The peripheral device interface 1103 may be used to connect at least one Input/Output (I/O)-related peripheral device to the processor 1101 and the memory 1102.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. In some embodiments, the radio frequency circuit 1104 may also include circuits related to NFC (Near Field Communication), which is not limited in this application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, videos, and any combination thereof. When the display screen 1105 is a touch display screen, it also has the ability to collect touch signals on or above its surface. The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 1106 is used to capture images or videos. Optionally, the camera assembly 1106 includes a front camera and a rear camera. In some embodiments, the camera assembly 1106 may also include a flash.
The audio circuit 1107 may include a microphone and a speaker. For the purpose of stereo collection or noise reduction, there may be multiple microphones, arranged at different parts of the computer device 1100. In some embodiments, the audio circuit 1107 may also include a headphone jack.
The positioning component 1108 is used to locate the current geographic position of the computer device 1100 to implement navigation or LBS (Location Based Service).
The power supply 1109 is used to supply power to the components in the computer device 1100.
In some embodiments, the computer device 1100 further includes one or more sensors 1110, including but not limited to: an acceleration sensor 1111, a gyroscope sensor 1112, a pressure sensor 1113, a fingerprint sensor 1114, an optical sensor 1115, and a proximity sensor 1116.
Those skilled in the art can understand that the structure shown in FIG. 11 does not constitute a limitation on the computer device 1100, which may include more or fewer components than shown in the figure, combine some components, or adopt a different component arrangement.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is further provided, such as a memory including a computer program (instructions), which can be executed by a processor of a computer device to complete all or part of the steps of the methods shown in the embodiments of this application. For example, the non-transitory computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
After considering the specification and practicing the invention disclosed here, those skilled in the art will easily think of other implementations of this application. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common general knowledge or customary technical means in the technical field not disclosed in this application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this application indicated by the following claims.
It should be understood that this application is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims (15)

  1. A video encoding method, applied to a computer device, the method comprising:
    segmenting original video data to obtain an original video segment;
    performing video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment;
    performing image processing on video images in the original video segment based on the video image processing parameters to obtain a processed video segment;
    acquiring encoding parameters of the processed video segment based on image feature data of the processed video segment; and
    encoding the processed video segment according to the encoding parameters to obtain an encoded video segment.
  2. The method according to claim 1, wherein the performing video content analysis on the original video segment to obtain the video image processing parameters corresponding to the original video segment comprises:
    inputting the original video segment and target image quality data into a video analysis model to obtain the video image processing parameters output by the video analysis model;
    wherein the video analysis model is a machine learning model trained with video segment samples, target image quality data of the video segment samples, and video image processing parameters of the video segment samples.
  3. The method according to claim 2, wherein the target image quality data comprises an image quality level.
  4. The method according to any one of claims 1 to 3, wherein the video image processing parameters comprise at least one of the following:
    a target frame rate, a target quantization bit depth, and a luminance adjustment curve.
  5. The method according to claim 4, wherein the performing image processing on the video images in the original video segment based on the video image processing parameters to obtain the processed video segment comprises:
    in response to the video image processing parameters comprising the target frame rate and the target frame rate being higher than the frame rate of the original video segment, performing super frame-rate processing on the original video segment through frame rate up-sampling to obtain the processed video segment;
    in response to the video image processing parameters comprising the target frame rate and the target frame rate being lower than the frame rate of the original video segment, performing clipping processing on the original video segment through frame rate down-sampling to obtain the processed video segment;
    in response to the video image processing parameters comprising the target frame rate and the target frame rate being equal to the frame rate of the original video segment, keeping the frame rate of the original video segment to obtain the processed video segment.
  6. The method according to claim 4, wherein the performing image processing on the video images in the original video segment based on the video image processing parameters to obtain the processed video segment comprises:
    in response to the video image processing parameters comprising the target quantization bit depth and the target quantization bit depth being lower than the quantization bit depth of the original video segment, performing down-sampling quantization on the original video segment to obtain the processed video segment;
    in response to the video image processing parameters comprising the target quantization bit depth and the target quantization bit depth being higher than the quantization bit depth of the original video segment, performing reverse high-precision inverse quantization on the original video segment to obtain the processed video segment;
    in response to the video image processing parameters comprising the target quantization bit depth and the target quantization bit depth being equal to the quantization bit depth of the original video segment, keeping the quantization bit depth of the original video segment to obtain the processed video segment.
  7. The method according to claim 4, wherein the performing image processing on the video images in the original video segment based on the video image processing parameters to obtain the processed video segment comprises:
    in response to the video image processing parameters comprising the luminance adjustment curve and the luminance range corresponding to the luminance adjustment curve being inconsistent with the luminance range of the original video segment, performing tone mapping on the original video segment based on the luminance adjustment curve to obtain the processed video segment;
    in response to the video image processing parameters comprising the luminance adjustment curve and the luminance range corresponding to the luminance adjustment curve being consistent with the luminance range of the original video segment, keeping the tone of the original video segment to obtain the processed video segment.
  8. The method according to claim 1, wherein the acquiring the encoding parameters of the processed video segment based on the image feature data of the processed video segment comprises:
    inputting the image feature data of the processed video segment into an encoding parameter determination model to obtain the encoding parameters output by the encoding parameter determination model;
    wherein the encoding parameter determination model is a machine learning model trained with image feature data samples and the encoding parameters corresponding to the image feature data samples.
  9. The method according to claim 8, wherein the image feature data comprises at least one of the following:
    frame rate, quantization bit depth, maximum luminance, minimum luminance, image type, motion vector, and target image quality data.
  10. The method according to claim 8 or 9, wherein the encoding parameters comprise a bitrate.
  11. The method according to claim 1, wherein the segmenting the original video data to obtain the original video segment comprises:
    segmenting the original video data according to specified dimensions to obtain the original video segment;
    wherein the specified dimensions comprise at least one of the following: the distribution characteristics of dark and highlighted parts in the image, the trajectory and degree of motion of moving regions, color distribution and intensity, and picture details.
  12. The method according to claim 1, wherein the method further comprises:
    merging the encoded video segments according to the segmentation order of the corresponding original video segments to obtain encoded video data.
  13. A video encoding apparatus, the apparatus comprising:
    a video segmentation module, configured to segment original video data to obtain an original video segment;
    a video content analysis module, configured to perform video content analysis on the original video segment to obtain video image processing parameters corresponding to the original video segment;
    a video processing module, configured to perform image processing on video images in the original video segment based on the video image processing parameters to obtain a processed video segment;
    an encoding parameter acquisition module, configured to acquire encoding parameters of the processed video segment based on image feature data of the processed video segment; and
    an encoding module, configured to encode the processed video segment according to the encoding parameters to obtain an encoded video segment.
  14. A computer device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, and the code set or instruction set are loaded and executed by the processor to implement the video encoding method according to any one of claims 1 to 12.
  15. A computer-readable storage medium, storing at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, and the code set or instruction set are loaded and executed by a processor to implement the video encoding method according to any one of claims 1 to 12.
PCT/CN2020/124536 2020-03-02 2020-10-28 Video encoding method and apparatus, computer device, and storage medium WO2021174878A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/678,931 US20220256140A1 (en) 2020-03-02 2022-02-23 Video encoding method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010135358.0A 2020-03-02 Video encoding method and apparatus, computer device, and storage medium
CN202010135358.0 2020-03-02

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/678,931 Continuation US20220256140A1 (en) 2020-03-02 2022-02-23 Video encoding method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021174878A1 true WO2021174878A1 (zh) 2021-09-10

Family

ID=70081336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124536 2020-03-02 2020-10-28 Video encoding method and apparatus, computer device, and storage medium

Country Status (3)

Country Link
US (1) US20220256140A1 (zh)
CN (1) CN110996131B (zh)
WO (1) WO2021174878A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110996131B (zh) * 2020-03-02 2020-11-10 Tencent Technology (Shenzhen) Co., Ltd. Video encoding method and apparatus, computer device, and storage medium
CN111598768B (zh) * 2020-07-23 2020-10-30 Ping An International Smart City Technology Co., Ltd. Image optimization processing method and apparatus, computer device, and storage medium
CN113170054A (zh) * 2020-07-28 2021-07-23 SZ DJI Technology Co., Ltd. Video transmission method, movable platform, and computer-readable storage medium
US11995153B2 (en) * 2020-09-24 2024-05-28 Canon Kabushiki Kaisha Information processing apparatus, information processing method, and storage medium
CN113099132B (zh) * 2021-04-19 2023-03-21 Shenzhen Zhencai Film and Television Technology Co., Ltd. Video processing method and apparatus, electronic device, storage medium, and program product
CN116320529A (zh) * 2021-12-10 2023-06-23 Shenzhen ZTE Microelectronics Technology Co., Ltd. Video bitrate control method and apparatus, and computer-readable storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101272489B * 2007-03-21 2011-08-10 ZTE Corporation Encoding/decoding apparatus and encoding/decoding method for video image quality enhancement
GB201603144D0 (en) * 2016-02-23 2016-04-06 Magic Pony Technology Ltd Training end-to-end video processes
US10460231B2 (en) * 2015-12-29 2019-10-29 Samsung Electronics Co., Ltd. Method and apparatus of neural network based image signal processor
US10402932B2 (en) * 2017-04-17 2019-09-03 Intel Corporation Power-based and target-based graphics quality adjustment
US10798399B1 (en) * 2017-12-11 2020-10-06 Amazon Technologies, Inc. Adaptive video compression
CN109729439B * 2019-01-11 2022-06-17 Beijing Century TAL Education Technology Co., Ltd. Real-time video transmission method
CN110418177B * 2019-04-19 2021-06-11 Tencent Technology (Shenzhen) Co., Ltd. Video encoding method, apparatus, device, and storage medium
CN110149554B * 2019-05-31 2021-06-15 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video image processing method and apparatus, electronic device, and storage medium
CN110290381B * 2019-08-01 2020-10-30 ByteDance (HK) Limited Video quality assessment method and apparatus, electronic device, and computer storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1459981A (zh) * 2002-05-22 2003-12-03 三星电子株式会社 自适应编码和解码运动图像的方法及其装置
CN105264888A (zh) * 2014-03-04 2016-01-20 微软技术许可有限责任公司 用于对色彩空间、色彩采样率和/或比特深度自适应切换的编码策略
CN107409221A (zh) * 2015-03-20 2017-11-28 杜比实验室特许公司 信号整形逼近
WO2020036502A1 (en) * 2018-08-14 2020-02-20 Huawei Technologies Co., Ltd Machine-learning-based adaptation of coding parameters for video encoding using motion and object detection
CN109286825A (zh) * 2018-12-14 2019-01-29 北京百度网讯科技有限公司 用于处理视频的方法和装置
CN110996131A (zh) * 2020-03-02 2020-04-10 腾讯科技(深圳)有限公司 视频编码方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN110996131B (zh) 2020-11-10
US20220256140A1 (en) 2022-08-11
CN110996131A (zh) 2020-04-10

Similar Documents

Publication Publication Date Title
WO2021174878A1 (zh) Video encoding method and apparatus, computer device, and storage medium
Chiariotti A survey on 360-degree video: Coding, quality of experience and streaming
CN109844736B (zh) Summarizing video content
US11978178B2 (en) Electronic device, control method thereof, and system
US10534525B1 (en) Media editing system optimized for distributed computing systems
US20210392392A1 (en) Bitrate Optimizations For Immersive Multimedia Streaming
CN102474661A (zh) 根据输送协议封装三维视频数据
EP4373086A1 (en) Image processing method and apparatus, medium, and electronic device
EP3917131A1 (en) Image deformation control method and device and hardware device
WO2023273536A1 (zh) Method and apparatus for generating relighted image, and electronic device
US20220236782A1 (en) System and method for intelligent multi-application and power management for multimedia collaboration applications
US20130121573A1 (en) Hybrid codec for compound image compression
CN110689478B Image stylization processing method and apparatus, electronic device, and readable medium
CN114783459A Speech separation method and apparatus, electronic device, and storage medium
CN117095006B Image aesthetics assessment method and apparatus, electronic device, and storage medium
US20210358087A1 (en) A method and device for enhancing video image quality
CN111696034B Image processing method and apparatus, and electronic device
KR102623148B1 Electronic device and control method of electronic device
CN114827567B Video quality analysis method, device, and readable medium
Tang et al. A cloud-edge collaborative gaming framework using AI-Powered foveated rendering and super resolution
CN117242421A Smart client for streaming of scene-based immersive media
KR20200080369A Display device, control method thereof, and recording medium
CN113780252A Training method of video processing model, and video processing method and apparatus
US20210327034A1 (en) A method and device for enhancing video image quality
Xie et al. Bandwidth-Aware Adaptive Codec for DNN Inference Offloading in IoT

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20922889

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20922889

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20-02-2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20922889

Country of ref document: EP

Kind code of ref document: A1