CN115396683A - Video optimization processing method and device, electronic equipment and computer readable medium - Google Patents

Info

Publication number
CN115396683A
Authority
CN
China
Prior art keywords
transcoding
video
sample
parameter
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211008975.XA
Other languages
Chinese (zh)
Other versions
CN115396683B (en)
Inventor
宋怡君
巢娅
于和新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Boguan Information Technology Co Ltd
Original Assignee
Guangzhou Boguan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Boguan Information Technology Co Ltd
Priority to CN202211008975.XA
Publication of CN115396683A
Application granted
Publication of CN115396683B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The disclosure relates to a video optimization processing method, a video optimization processing device, electronic equipment and a computer readable medium, and belongs to the technical field of video data processing. The method comprises the following steps: acquiring an original video to be processed, and determining the video type of the original video; determining target transcoding parameters corresponding to the original video according to a transcoding parameter model corresponding to the video type of the original video; and transcoding the original video according to the target transcoding parameters to obtain a transcoded video corresponding to the original video. Because the transcoding parameter model is trained with a deep learning algorithm and different target transcoding parameters are adapted to different video types, the definition of the transcoded video can be improved while the original bitrate is maintained.

Description

Video optimization processing method and device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of video data processing technologies, and in particular, to a video optimization processing method, a video optimization processing apparatus, an electronic device, and a computer-readable medium.
Background
In a live streaming system, to meet the viewing needs of users under different network conditions, video streams at several lower definition levels are generated by re-encoding the original-quality video pushed by the streamer; this process is called transcoding.
Traditional transcoding uses a few encoding configuration presets recommended by the encoder to encode videos of all categories. This is simple and easy to maintain, but it cannot maximize the definition achievable at a given bitrate configuration. Raising the video specification instead, for example by increasing the frame rate or the bitrate, inevitably increases bandwidth cost.
In view of this, there is a need in the art for a video optimization method that can improve the definition of the transcoded video while maintaining the original bitrate.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a video optimization method, a video optimization device, an electronic device, and a computer-readable medium, so as to improve the definition of a video after transcoding at least to a certain extent while maintaining an original bitrate.
According to a first aspect of the present disclosure, there is provided a video optimization processing method, including:
acquiring an original video to be processed, and determining the video type of the original video;
determining target transcoding parameters corresponding to the original video according to the video type of the original video, wherein the target transcoding parameters are obtained through a transcoding parameter model corresponding to the video type;
and transcoding the original video according to the target transcoding parameters to obtain a transcoded video corresponding to the original video.
In an exemplary embodiment of the present disclosure, the method for training the transcoding parameter model includes:
acquiring a training sample set and transcoding sample parameters corresponding to the video type, wherein the training sample set comprises a plurality of original sample videos corresponding to the video type;
constructing the transcoding parameter model according to the transcoding sample parameters, and obtaining a plurality of groups of transcoding sample parameter sets through the transcoding parameter model;
transcoding the original sample video according to each group of the transcoding sample parameter sets respectively to obtain a transcoding sample video corresponding to each group of the transcoding sample parameter sets;
and determining a transcoding quality score function of the transcoding sample video corresponding to each group of transcoding sample parameter sets, constructing an objective function by taking the maximized transcoding quality score function as a target, and training the transcoding parameter model.
In an exemplary embodiment of the disclosure, the transcoding parameter model includes an attention layer therein, the method further comprising:
adjusting attention weights of the transcoding sample parameters in the transcoding parameter model by the attention layer in the transcoding parameter model.
In an exemplary embodiment of the present disclosure, the adjusting, by the attention layer in the transcoding parameter model, attention weights of the transcoding sample parameters in the transcoding parameter model includes:
and adjusting the attention weight of each transcoding sample parameter in the transcoding parameter model according to the attention score of each transcoding sample parameter in the transcoding parameter model in the attention layer.
In an exemplary embodiment of the present disclosure, the obtaining, by the transcoding parameter model, a plurality of sets of transcoding sample parameter sets includes:
and combining the values of the transcoding sample parameters in the transcoding parameter model according to the attention weight of each transcoding sample parameter to obtain a plurality of groups of transcoding sample parameter sets.
In an exemplary embodiment of the present disclosure, the determining a transcoding quality score function of a transcoded sample video corresponding to each set of the transcoding sample parameter set includes:
and determining the transcoding quality score function of the transcoding sample video corresponding to each group of transcoding sample parameter set according to the video quality score function and the file size score function of the transcoding sample video.
In an exemplary embodiment of the present disclosure, the determining, according to the video quality score function and the file size score function of the transcoded sample video, a transcoded quality score function of the transcoded sample video corresponding to each set of the transcoded sample parameter sets includes:
when the transcoding time corresponding to the transcoding sample parameter set is larger than a transcoding time threshold value or the size of a video file of the transcoding sample video is larger than that of the original sample video, determining a function value of a transcoding quality score function of the transcoding sample video as an unqualified score;
otherwise, weighting the video quality score function and the file size score function of the transcoded sample video according to a preset score weight to obtain the transcoded quality score function of the transcoded sample video.
In an exemplary embodiment of the present disclosure, the method further comprises:
in each iteration of the transcoding parameter model, selecting, in a preset proportion, the transcoding sample parameter sets for the next iteration round from the transcoding sample parameter sets of the current iteration round, according to the function value of the transcoding quality score function corresponding to each group of transcoding sample parameter sets.
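As an illustration only, the selection step described above could be sketched in Python as follows. The function name `select_for_next_round`, the `keep_ratio` argument, and the rule of ranking by score and keeping the top fraction are assumptions made for this sketch; the disclosure only states that sets are selected in a preset proportion according to their score values.

```python
import math
from typing import Dict, List, Tuple

ParamSet = Dict[str, float]


def select_for_next_round(scored_sets: List[Tuple[ParamSet, float]],
                          keep_ratio: float) -> List[ParamSet]:
    """Keep the best-scoring fraction of parameter sets (assumed selection rule)."""
    # Rank the current round's parameter sets by their transcoding quality score.
    ranked = sorted(scored_sets, key=lambda item: item[1], reverse=True)
    # Always keep at least one set so the next round is never empty.
    keep = max(1, math.ceil(len(ranked) * keep_ratio))
    return [params for params, _ in ranked[:keep]]


# Hypothetical scores for four parameter sets in one iteration round.
current_round = [
    ({"qcomp": 0.5}, 0.61),
    ({"qcomp": 0.6}, 0.74),
    ({"qcomp": 0.7}, 0.0),   # an unqualified set
    ({"qcomp": 0.8}, 0.70),
]
survivors = select_for_next_round(current_round, keep_ratio=0.5)
```

With a proportion of 0.5, the two highest-scoring sets are carried into the next round and the unqualified set is dropped.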
According to a second aspect of the present disclosure, there is provided a video optimization processing apparatus including:
the original video acquisition module is used for acquiring an original video to be processed and determining the video type of the original video;
the transcoding parameter determining module is used for determining target transcoding parameters corresponding to the original video according to the video type of the original video, wherein the target transcoding parameters are obtained through a transcoding parameter model corresponding to the video type;
and the video transcoding processing module is used for transcoding the original video according to the target transcoding parameters to obtain a transcoded video corresponding to the original video.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any one of the video optimization processing methods described above via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the video optimization processing method of any one of the above.
The exemplary embodiments of the present disclosure may have the following advantageous effects:
in the video optimization processing method of the exemplary embodiments of the present disclosure, the video type of an original video is determined, target transcoding parameters corresponding to the original video are then determined according to the transcoding parameter model corresponding to that video type, and the original video is finally transcoded according to the target transcoding parameters to obtain the corresponding transcoded video. On the one hand, the transcoding parameter model is trained with a deep learning algorithm and different target transcoding parameters are adapted to different video types, so the definition of the transcoded video can be improved while the original bitrate is maintained; online deployment requires no additional operating bandwidth or computing resources, and the user perceives clearer image quality without any additional traffic or stuttering burden. On the other hand, the transcoding parameter model iterates quickly and can be deployed at large scale, so it can be rolled out across the entire platform.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 is a schematic diagram illustrating a video optimization processing method according to a related embodiment of the disclosure;
FIG. 2 shows a flow diagram of a video optimization processing method of an example embodiment of the present disclosure;
FIG. 3 shows a flowchart of a method of training a transcoding parameter model according to an example embodiment of the present disclosure;
FIG. 4 is a flow diagram illustrating a method for video optimization processing in accordance with one embodiment of the present disclosure;
FIG. 5 is a graph comparing results of a video quality score function obtained by a video optimization processing method according to an embodiment of the present disclosure;
FIG. 6 is a graph comparing another result of a video quality score function obtained by a video optimization processing method according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of a video optimization processing apparatus of an example embodiment of the present disclosure;
FIG. 8 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
With the continuous development of the live streaming industry, video services are applied ever more widely, and users' expectations of the viewing experience keep rising. Clearer image quality and smoother playback have therefore become the most popular directions for technical optimization in the industry. As shown in FIG. 1, the commonly used image quality improvement methods currently fall into two categories:
1) Raising the video specification to achieve better image quality, for example by increasing the frame rate, resolution, or bitrate.
2) Enhancing the image quality with additional post-processing algorithms, such as noise reduction, super-resolution, and HDR (High Dynamic Range Imaging) algorithms.
However, both approaches usually come with higher cost and reduced smoothness: they may increase bandwidth cost, increase transmission delay, and even cause playback to stutter for users. In addition, collecting feedback on video quality often requires substantial manual effort, so a quantitative index is needed to evaluate the result of image quality improvement objectively. In general, research on image quality enhancement faces the following two problems:
1) Enhancing image quality increases bandwidth cost: raising the video specification inevitably increases bandwidth cost.
2) Enhancing image quality reduces viewing smoothness: post-processing algorithms place certain requirements on the user's viewing device, and introducing them may cause playback to stutter.
Therefore, the current mainstream image quality enhancement methods need a higher bitrate to support the definition, which raises the operating bandwidth cost, places higher demands on the user's network and device, and increases the risk of stuttering on the user side.
Based on the above problem, the present exemplary embodiment first provides a video optimization processing method, which can improve video definition while maintaining an original bitrate. Referring to fig. 2, the video optimization processing method may include the following steps:
Step S210, acquiring an original video to be processed, and determining the video type of the original video.
Step S220, determining target transcoding parameters corresponding to the original video according to the transcoding parameter model corresponding to the video type of the original video.
Step S230, transcoding the original video according to the target transcoding parameters to obtain a transcoded video corresponding to the original video.
In the video optimization processing method of the exemplary embodiments of the present disclosure, the video type of an original video is determined, target transcoding parameters corresponding to the original video are then determined according to the transcoding parameter model corresponding to that video type, and the original video is finally transcoded according to the target transcoding parameters to obtain the corresponding transcoded video. On the one hand, the transcoding parameter model is trained with a deep learning algorithm and different target transcoding parameters are adapted to different video types, so the definition of the transcoded video can be improved while the original bitrate is maintained; online deployment requires no additional operating bandwidth or computing resources, and the user perceives clearer image quality without any additional traffic or stuttering burden. On the other hand, the transcoding parameter model iterates quickly and can be deployed at large scale, so it can be rolled out across the entire platform.
The above steps of the present exemplary embodiment will be described in more detail with reference to fig. 3 to 6.
In step S210, an original video to be processed is acquired, and a video type of the original video is determined.
In this example embodiment, the original video to be processed may include the original video of a live push stream. The video types may include an entertainment category, such as live video of a real person, as well as live videos corresponding to different game types.
Because picture scenes of different types (different games, entertainment) have different characteristics, research on image quality enhancement in the transcoding stage shows that videos of different scenes are often suited to different transcoding parameters, and that some transcoding parameters affect the quality of the transcoded video more noticeably than others. By studying how the transcoding stage works and testing configurations of various transcoding parameters, it can be found that customizing transcoding parameters for different scene types improves video quality significantly. The original video can therefore be transcoded according to its video type.
In step S220, a target transcoding parameter corresponding to the original video is determined according to the transcoding parameter model corresponding to the video type of the original video.
Transcoding refers to converting a video signal from one format to another, for example changing the bitrate, resolution, or container format. Transcoding parameters are the parameters used in the video transcoding process; H264 transcoding parameters, for example, may include the adaptive quantization strength, the quantizer curve factor, the chroma offset, and the like.
In this exemplary embodiment, a target transcoding parameter to be used in the transcoding process of the original video may be determined according to a transcoding parameter model corresponding to the video type of the original video, and a method for training the transcoding parameter model is described in detail in the following steps.
In step S230, transcoding the original video according to the target transcoding parameter to obtain a transcoded video corresponding to the original video.
And finally, transcoding the original video according to the target transcoding parameters corresponding to the video type to obtain the transcoded video.
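Steps S210 to S230 can be illustrated with a short Python sketch. Everything concrete here is an assumption made for illustration: the lookup table `TARGET_PARAMS` and its placeholder values, reading the video type from stream metadata, the file names, and the choice of ffmpeg with libx264 as the transcoder. The x264 option names `aq-strength`, `qcomp`, and `chroma-qp-offset` are used as plausible counterparts of the adaptive quantization strength, quantizer curve factor, and chroma offset; the disclosure does not name a specific encoder option set.

```python
from typing import Dict, List

# Hypothetical per-type parameter table. In the method described above these
# values would come from the trained transcoding parameter model of each video
# type; the numbers below are placeholders, not results from the disclosure.
TARGET_PARAMS: Dict[str, Dict[str, str]] = {
    "entertainment": {"aq-strength": "1.0", "qcomp": "0.60", "chroma-qp-offset": "0"},
    "game_moba": {"aq-strength": "0.8", "qcomp": "0.70", "chroma-qp-offset": "-2"},
}


def determine_video_type(metadata: Dict[str, str]) -> str:
    """S210: determine the video type (assumed to be available as metadata)."""
    return metadata.get("category", "entertainment")


def target_transcoding_params(video_type: str) -> Dict[str, str]:
    """S220: look up the target parameters associated with this video type."""
    return TARGET_PARAMS[video_type]


def build_transcode_command(src: str, dst: str, bitrate_kbps: int,
                            params: Dict[str, str]) -> List[str]:
    """S230: build an ffmpeg/libx264 command line at an unchanged target bitrate."""
    x264_params = ":".join(f"{key}={value}" for key, value in params.items())
    return ["ffmpeg", "-i", src, "-c:v", "libx264", "-b:v", f"{bitrate_kbps}k",
            "-x264-params", x264_params, "-c:a", "copy", dst]


video_type = determine_video_type({"category": "game_moba"})
cmd = build_transcode_command("in.flv", "out.flv", 2000,
                              target_transcoding_params(video_type))
```

The sketch only builds the command line; running it (for example with `subprocess.run(cmd, check=True)`) requires an ffmpeg build with libx264. Only the encoder parameters vary by video type, while the bitrate argument stays the same.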
The video optimization processing method provided in this exemplary embodiment is an end-to-end transcoding image quality enhancement algorithm: the result at the output is obtained directly from the data at the input, without intermediate-layer calculation and processing.
Next, this example embodiment further provides a method for training the transcoding parameter model. The method uses a deep learning algorithm to build a neural network that trains the transcoding parameters, and uses the video transcoding quality score as the objective function, so that the transcoded video obtained with the trained parameters achieves the best image quality at the same bitrate. Referring to FIG. 3, the method for training the transcoding parameter model may specifically include the following steps:
Step S310, acquiring a training sample set and transcoding sample parameters corresponding to the video type, wherein the training sample set comprises a plurality of original sample videos corresponding to the video type.
In this example embodiment, a number of key parameters of the video transcoding process, such as key H264 transcoding parameters, may be selected as the transcoding sample parameters. Based on an investigation of the meaning of each transcoding parameter, the following 12 key transcoding parameters may be selected for training: average bitrate, maximum bitrate, key frame interval, adaptive quantization strength, quantizer curve factor, chroma offset, quantization matrix, number of reference frames available to a P frame, macroblock tree (mbtree) bitrate control parameter, mbtree strength, maximum number of B frames, and P frame or B frame placement mode. The transcoding sample parameters used for training may also be increased or decreased according to actual conditions, which is not specifically limited in this exemplary embodiment.
Step S320, constructing a transcoding parameter model according to the transcoding sample parameters, and obtaining a plurality of groups of transcoding sample parameter sets through the transcoding parameter model.
In this example embodiment, the transcoding parameter model may be constructed by modeling the transcoding sample parameters on the transcoding side (decoding-encoding).
Because the process of generating a transcoded video from transcoding parameters is not differentiable, a single-layer neural network of dimension N × M may be used for training, where M is the number of key transcoding sample parameters and N is the number of candidate values generated for each transcoding sample parameter. The recommended parameter values of the transcoding sample parameters are randomly cross-combined to obtain Q transcoding sample parameter sets.
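The random cross combination can be pictured as drawing one candidate value per parameter, Q times. The Python sketch below is a minimal illustration under stated assumptions: the three parameter names, their candidate values, and the function `random_cross_combinations` are hypothetical, and the N × M layer itself is represented only by its table of candidate values (M = 3 parameters, N = 4 values each).

```python
import random
from typing import Dict, List, Sequence

# Hypothetical N x M candidate table: M = 3 parameters, N = 4 values each.
CANDIDATES: Dict[str, Sequence[float]] = {
    "aq_strength": [0.6, 0.8, 1.0, 1.2],
    "qcomp": [0.5, 0.6, 0.7, 0.8],
    "chroma_offset": [-4, -2, 0, 2],
}


def random_cross_combinations(candidates: Dict[str, Sequence[float]],
                              q: int, seed: int = 0) -> List[Dict[str, float]]:
    """Draw Q parameter sets by picking one candidate value per parameter."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        {name: rng.choice(list(values)) for name, values in candidates.items()}
        for _ in range(q)
    ]


parameter_sets = random_cross_combinations(CANDIDATES, q=5)
```

Each of the Q dictionaries is one transcoding sample parameter set that would then be used to transcode the original sample video.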
Meanwhile, in this exemplary embodiment, an attention mechanism may also be introduced in the transcoding parameter generation stage to learn how important each transcoding sample parameter is for improving the quality of the transcoded video. Specifically, an attention layer may be added to the transcoding parameter model, and the attention weight of each transcoding sample parameter in the transcoding parameter model is adjusted through this attention layer.
Because the transcoding sample parameters affect video quality to different degrees, generating the transcoding sample parameter sets purely by random cross combination makes it difficult for some important parameters to be trained effectively over the whole training process, which wastes computing resources.
The attention mechanism stems from the study of human vision: because of bottlenecks in information processing, humans selectively focus on part of the available information while ignoring the rest. An attention layer of dimension M can therefore be introduced to adjust the attention weight of each transcoding sample parameter in the transcoding parameter model so that each parameter exerts its maximum effect.
After the attention layer is introduced, the attention weight of each transcoding sample parameter in the transcoding parameter model can be adjusted according to the attention score of that parameter in the attention layer, and the values of the transcoding sample parameters are combined according to their attention weights to obtain a plurality of groups of transcoding sample parameter sets. Transcoding sample parameters with high attention scores thus receive higher selection weights when the transcoding sample parameter sets are generated, rather than being combined completely at random.
In addition, for transcoding sample parameters that have a large influence in this scheme, such as the chroma offset and the quantizer curve factor, a higher initial attention weight can be set during training so that the model achieves the best training effect.
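One possible reading of the attention-weighted combination is sketched below in Python. The disclosure does not specify how attention weights enter the combination, so the following is an assumption: attention scores are turned into weights with a softmax, and parameters with larger weights are more likely to be the ones varied away from a default value. All names (`softmax`, `attention_weighted_set`, `CANDIDATES`, `DEFAULTS`, `SCORES`) and all numbers are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def softmax(scores: List[float]) -> List[float]:
    """Convert attention scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighted_set(candidates: Dict[str, Sequence[float]],
                           defaults: Dict[str, float],
                           scores: Dict[str, float],
                           n_vary: int,
                           rng: random.Random) -> Dict[str, float]:
    """Vary n_vary parameters, chosen with probability given by attention weight."""
    names = list(candidates)
    weights = softmax([scores[name] for name in names])
    chosen = set()
    while len(chosen) < n_vary:  # n_vary must not exceed the parameter count
        chosen.add(rng.choices(names, weights=weights, k=1)[0])
    params = dict(defaults)
    for name in chosen:
        params[name] = rng.choice(list(candidates[name]))
    return params


CANDIDATES = {"aq_strength": [0.6, 0.8, 1.2], "qcomp": [0.5, 0.7, 0.8],
              "chroma_offset": [-4, -2, 2]}
DEFAULTS = {"aq_strength": 1.0, "qcomp": 0.6, "chroma_offset": 0}
# Higher initial scores for parameters assumed to matter more.
SCORES = {"aq_strength": 0.0, "qcomp": 1.0, "chroma_offset": 1.0}
param_set = attention_weighted_set(CANDIDATES, DEFAULTS, SCORES, n_vary=2,
                                   rng=random.Random(0))
```

Under this reading, giving the chroma offset and the quantizer curve factor higher initial scores means they are explored more often than the remaining parameter from the first iteration on.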
Step S330, transcoding the original sample video according to each group of transcoding sample parameter sets to obtain the transcoding sample video corresponding to each group of transcoding sample parameter sets.
In each iteration process, after a plurality of groups of transcoding sample parameter sets are obtained by combining the values of the transcoding sample parameters, transcoding processing is respectively carried out on the original sample video on the basis of the transcoding sample parameter sets, and the transcoded sample video of the original sample video corresponding to each group of transcoding sample parameter sets after transcoding is obtained.
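As a hedged illustration of how one transcoding sample parameter set might be applied to an original sample video, the sketch below builds an ffmpeg command line. The disclosure does not name a transcoder; the use of ffmpeg with the libx264 encoder, the mapping of "chroma offset" to the x264 option `chroma-qp-offset`, the mapping of "quantization value curve factor" to `qcomp`, and the `bitrate_kbps` key are assumptions.

```python
def build_transcode_command(src, dst, params):
    """Return an ffmpeg argument list for one transcoding sample parameter set.

    Assumes ffmpeg with libx264; only the keys shown here are recognized.
    """
    # x264 private options are passed as key=value pairs separated by ':'.
    x264_params = ":".join(
        f"{key}={params[key]}" for key in ("chroma-qp-offset", "qcomp") if key in params
    )
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264"]
    if "bitrate_kbps" in params:
        cmd += ["-b:v", f"{params['bitrate_kbps']}k"]
    if x264_params:
        cmd += ["-x264-params", x264_params]
    cmd += ["-c:a", "copy", dst]  # leave the audio stream untouched
    return cmd
```

The returned list could be passed to `subprocess.run(cmd, check=True)` and timed, once per parameter set, so that the transcoding time of each set is available for the subsequent scoring step.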
Step S340, determining transcoding quality score functions of the transcoding sample videos corresponding to the transcoding sample parameter sets, constructing an objective function by taking the maximized transcoding quality score function as a target, and training a transcoding parameter model.
In this example embodiment, the transcoding quality score function of the transcoded sample video corresponding to each set of transcoding sample parameter sets may be determined according to the video quality score function and the file size score function of the transcoded sample video.
Specifically, when the transcoding time corresponding to the transcoding sample parameter set is greater than the transcoding time threshold value, or the size of the video file of the transcoding sample video is greater than that of the original sample video, determining a function value of a transcoding quality score function of the transcoding sample video as an unqualified score, such as 0 point; otherwise, weighting the video quality score function and the file size score function of the transcoded sample video according to the preset score weight to obtain the transcoded quality score function of the transcoded sample video.
In this example embodiment, for a given piece of original sample video, the video enhancement process of the transcoding layer may be modeled as the following optimization objective function:
H(x)=Max(Score(W(x)))
wherein x represents an original sample video, W(x) represents the transcoding sample parameters, trained by the neural network, acting on the original sample video x, and Score(W(x)) represents the transcoding quality score function of the transcoded sample video. The optimization goal is to train the transcoding sample parameters so that the transcoding quality score function of the transcoded sample video is maximized without introducing additional cost.
In this example embodiment, the transcoding quality score function Score(W(x)) of the transcoded sample video may be obtained by weighting the video quality score function Score_vq of the transcoded sample video and the file size score function Score_fs of the transcoded sample video according to the preset score weights α and β. Meanwhile, to ensure that image quality is enhanced without introducing additional cost or a higher stuttering rate, the file size score function Score_fs of the transcoded sample video (which reflects the real bitrate) and the transcoding time score Score_t corresponding to the transcoding sample parameter set can be used as monitoring parameters when calculating the video transcoding quality score function, so as to search for the transcoding sample parameters that produce the best image quality at the same bitrate and transcoding efficiency. The specific calculation rule is as follows:
Figure BDA0003810042900000101
In particular, Score_fs(W(x)) = 0 when the video file size of the transcoded sample video exceeds the file size of the original sample video. For the transcoding time, a corresponding empirical threshold can be set according to the video duration; when the transcoding time exceeds the transcoding time threshold, the set of transcoding sample parameters is considered unusable in a real-time scenario, that is, Score_t(W(x)) = 0. Therefore, on the premise of ensuring the bitrate and the transcoding efficiency, the video quality score function and the file size score function of the transcoded sample video are weighted to obtain the final transcoding quality score function of the transcoded sample video, namely the transcoding quality score function corresponding to the set of transcoding sample parameters.
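The scoring rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the video quality score is taken to be normalized to [0, 1], the file size score is assumed to be the fraction of file size saved relative to the original, and the default weights 0.7 and 0.3 are placeholders for the preset score weights α and β, whose values the disclosure does not give.

```python
def transcoding_quality_score(score_vq, original_size, transcoded_size,
                              transcode_seconds, time_threshold,
                              alpha=0.7, beta=0.3, fail_score=0.0):
    """Score one transcoded sample video.

    Returns fail_score when the transcoding time exceeds the threshold or the
    transcoded file is larger than the original; otherwise returns the
    weighted sum alpha * Score_vq + beta * Score_fs.
    """
    if transcode_seconds > time_threshold or transcoded_size > original_size:
        return fail_score
    # Assumed form of Score_fs: 0 when the size is unchanged, approaching 1 as the file shrinks.
    score_fs = 1.0 - transcoded_size / original_size
    return alpha * score_vq + beta * score_fs
```

A parameter set that is too slow or that enlarges the file therefore can never outrank a set that satisfies both constraints, regardless of its image quality.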
In the present exemplary embodiment, in order to objectively quantify the effect of the enhancement algorithm and ensure the fairness of scoring, the scoring algorithm VMAF (Video Multimethod Assessment Fusion), which is widely used and regarded as authoritative in the industry, may be used as the quantitative index of the video quality score function. VMAF takes the original picture stream as a reference video and obtains the video quality score function of the transcoded video by comparing the spatial information and the local temporal information of the images. It is an algorithm for automatically assessing video quality whose results are perceptually consistent with human subjective assessment. With a complete set of quantification tools supplying the video transcoding quality score function, a video quality evaluation platform can be built to provide objective quantitative indexes for image quality enhancement.
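As one possible way to obtain such a score, the sketch below builds an ffmpeg command that uses the `libvmaf` filter to compare a transcoded video against its reference. It assumes an ffmpeg build with libvmaf enabled and that both videos have the same resolution and frame alignment; the function name and the default log file name are hypothetical, and the disclosure does not state which tool is actually used.

```python
def build_vmaf_command(distorted, reference, log_path="vmaf.json"):
    """Return an ffmpeg argument list that computes VMAF.

    The first input is the transcoded (distorted) video and the second is the
    original (reference) video; per-frame and pooled scores are written to
    log_path as JSON, and no output video is produced.
    """
    return [
        "ffmpeg", "-i", distorted, "-i", reference,
        "-lavfi", f"libvmaf=log_path={log_path}:log_fmt=json",
        "-f", "null", "-",
    ]
```

The VMAF value read from the log, which lies on a 0 to 100 scale, could then be divided by 100 if a normalized video quality score is wanted.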
In the exemplary embodiment, in the aspect of model training, the global optimal solution may be retained while performing local optimization. Specifically, in each iteration process of the transcoding parameter model, according to the function value of the transcoding quality score function corresponding to each set of transcoding sample parameter sets, the transcoding sample parameter set in the next iteration round may be selected from each set of transcoding sample parameter sets in the current iteration round according to a preset proportion.
In a specific implementation process, an optimal solution can be sought for the local transcoding parameters through gradient backpropagation of a loss function, while the global optimal solution is retained in each iteration (epoch). That is, according to the function value of the transcoding quality score function corresponding to each group of transcoding sample parameter sets, the best transcoding sample parameter sets are selected from the groups of the current iteration round according to a preset proportion, such as the top 10%, and these sets then enter the next iteration round. This prevents the model from falling into a local optimal solution, accelerates the search for the global optimal solution, and helps the model converge. Moreover, this approach also has the largest influence on the prediction effect of the model.
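The retention of the best parameter sets between iteration rounds can be sketched as a simple top-fraction selection. The function name, the `(params, score)` pair representation, and the rule of always keeping at least one set are assumptions made for illustration.

```python
def select_elites(scored_sets, keep_ratio=0.10):
    """Keep the best-scoring fraction of parameter sets for the next round.

    scored_sets is a list of (params, score) pairs; the result is the list of
    params ordered from highest to lowest score, with at least one entry
    whenever the input is non-empty.
    """
    if not scored_sets:
        return []
    n_keep = max(1, round(len(scored_sets) * keep_ratio))
    ranked = sorted(scored_sets, key=lambda item: item[1], reverse=True)
    return [params for params, _ in ranked[:n_keep]]
```

With 20 scored sets and the default ratio, the two highest-scoring sets would be carried into the next iteration round, where new sets can be generated alongside them.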
Fig. 4 is a complete flowchart of a video optimization processing method in an embodiment of the present disclosure, which is an illustration of the above steps in the present embodiment, wherein the transcoding parameter model is a neural network with attention mechanism, and the transcoding parameter model is trained and evaluated by a transcoding quality score function of a video to obtain an optimal transcoding parameter, so as to achieve an effect of improving video quality.
Fig. 5 is a graph comparing results of the video quality score function obtained by the video optimization processing method according to an embodiment of the present disclosure; it compares the VMAF scores of a plurality of videos at 720p and a bitrate of 2000 on each live broadcast platform. The left histogram is the video VMAF score of the live broadcast platform H, the middle histogram is the video VMAF score of the current live broadcast platform C, and the right histogram is the video VMAF score of the live broadcast platform C obtained by using the video optimization processing method in the exemplary embodiment of the present disclosure.
Fig. 6 is a graph showing another comparison of results of the video quality score function obtained by the video optimization processing method according to an embodiment of the present disclosure; it compares the VMAF scores of a plurality of videos at 1080p and a bitrate of 4000 on each live broadcast platform. The left histogram is the video VMAF score of the current live broadcast platform C, and the right histogram is the video VMAF score of the live broadcast platform C obtained by using the video optimization processing method in the exemplary embodiment of the present disclosure.
In the present exemplary embodiment, in order to ensure the fairness of the scoring, the conventional algorithm VMAF is selected in the in-house tool to quantify and compare the video quality of 12 different scenes. According to the comparison result, at the same bitrate in the ultra-clear tier, the video quality score function after optimization is 11% higher than before optimization and exceeds the video quality score function of the live broadcast platform H by 25%. Further experiments show that the image quality of the live broadcast picture of a certain game is improved by 14%, and the image quality of the offline effect of entertainment products is improved by 9%.
In practical application, the video optimization processing method in the exemplary embodiment clearly improves the image quality of videos of different types and contents, with significant enhancement in three aspects: background details, person details, and text details. For a live game picture, the method noticeably enhances the icons, text details, and character outlines in the game. For live broadcast pictures of entertainment products, the makeup effect and the hair of the real-person anchor are noticeably enhanced. Therefore, the video optimization processing method in the exemplary embodiment of the present disclosure has strong versatility and does not increase operating bandwidth or computing resources, so that the user perceives clearer image quality without any additional traffic or stuttering burden.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Furthermore, the present disclosure also provides a video optimization processing apparatus. Referring to fig. 7, the video optimization processing apparatus may include an original video obtaining module 710, a transcoding parameter determining module 720, and a video transcoding processing module 730. Wherein:
the original video obtaining module 710 may be configured to obtain an original video to be processed, and determine a video type of the original video;
the transcoding parameter determining module 720 may be configured to determine, according to the video type of the original video, a target transcoding parameter corresponding to the original video, where the target transcoding parameter is obtained through a transcoding parameter model corresponding to the video type;
the video transcoding processing module 730 may be configured to perform transcoding processing on the original video according to the target transcoding parameter, so as to obtain a transcoded video corresponding to the original video.
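The cooperation of the three modules can be sketched as a small pipeline. This is only an illustration: the class name, the callable-based interfaces, and the representation of videos as file paths are assumptions and do not limit how modules 710, 720, and 730 are implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VideoOptimizer:
    """Wires together stand-ins for modules 710, 720, and 730."""

    classify: Callable[[str], str]            # module 710: video path -> video type
    models: Dict[str, Callable[[str], dict]]  # module 720: one parameter model per video type
    transcode: Callable[[str, dict], str]     # module 730: (path, params) -> transcoded path

    def process(self, path: str) -> str:
        video_type = self.classify(path)
        # The model is chosen by video type, then queried for target parameters.
        target_params = self.models[video_type](path)
        return self.transcode(path, target_params)
```

Keeping one parameter model per video type in a dictionary mirrors the statement that the target transcoding parameter is obtained through the transcoding parameter model corresponding to the video type.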
In some exemplary embodiments of the present disclosure, a video optimization processing apparatus provided by the present disclosure may further include a transcoding parameter model training module, which may include a training sample obtaining unit, a transcoding sample parameter set determining unit, a transcoding sample video determining unit, and an objective function determining unit. Wherein:
the training sample acquisition unit can be used for acquiring a training sample set and transcoding sample parameters corresponding to the video type, wherein the training sample set comprises a plurality of original sample videos corresponding to the video type;
the transcoding sample parameter group determining unit can be used for constructing a transcoding parameter model according to the transcoding sample parameters and obtaining a plurality of groups of transcoding sample parameter groups through the transcoding parameter model;
the transcoding sample video determining unit can be used for transcoding the original sample video according to each group of transcoding sample parameter sets to obtain transcoding sample videos corresponding to each group of transcoding sample parameter sets;
the objective function determining unit may be configured to determine a transcoding quality score function of the transcoded sample video corresponding to each set of transcoding sample parameter sets, construct an objective function with the maximized transcoding quality score function as a target, and train a transcoding parameter model.
In some exemplary embodiments of the present disclosure, the transcoding parameter model training module may further include a parameter attention weight adjusting unit, which may be configured to adjust attention weights of individual transcoding sample parameters in the transcoding parameter model by an attention layer in the transcoding parameter model.
In some exemplary embodiments of the present disclosure, the parameter attention weight adjusting unit may include an attention score adjusting unit, which may be configured to adjust the attention weight of each transcoding sample parameter in the transcoding parameter model according to the attention score of each transcoding sample parameter in the transcoding parameter model in the attention layer.
In some exemplary embodiments of the present disclosure, the transcoding sample parameter set determining unit may include a parameter value combining unit, and may be configured to combine values of each transcoding sample parameter in the transcoding parameter model according to the attention weight of each transcoding sample parameter, so as to obtain multiple sets of transcoding sample parameter sets.
In some example embodiments of the present disclosure, the objective function determining unit may include a transcoding quality score function determining unit, which may be configured to determine, according to a video quality score function and a file size score function of the transcoded sample video, a transcoding quality score function of the transcoded sample video corresponding to each set of transcoding sample parameter sets.
In some example embodiments of the present disclosure, the transcoding quality score function determining unit may include an unqualified score determining unit and a transcoding quality score function calculating unit. Wherein:
the unqualified score determining unit can be used for determining a function value of a transcoding quality score function of the transcoded sample video as an unqualified score when the transcoding time corresponding to the transcoding sample parameter group is larger than the transcoding time threshold or the size of a video file of the transcoded sample video is larger than that of the original sample video;
the transcoding quality score function calculation unit may be configured to weight the video quality score function and the file size score function of the transcoded sample video according to a preset score weight, so as to obtain the transcoding quality score function of the transcoded sample video.
In some exemplary embodiments of the present disclosure, the transcoding parameter model training module may further include a transcoding sample parameter set filtering unit, which is configured to, in each iteration process of the transcoding parameter model, select, according to a preset ratio, a transcoding sample parameter set in a next iteration round from each set of transcoding sample parameter sets in a current iteration round according to a function value of a transcoding quality score function corresponding to each set of transcoding sample parameter set.
The details of each module/unit in the video optimization processing apparatus have been described in detail in the corresponding method embodiment section, and are not described herein again.
FIG. 8 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
It should be noted that the computer system 800 of the electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiment of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for system operation are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. When the computer program is executed by the Central Processing Unit (CPU) 801, various functions defined in the system of the present application are executed.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments above.
It should be noted that although several modules of the device for action execution are mentioned in the above detailed description, this division is not mandatory. Indeed, the features and functionality of two or more of the modules described above may be embodied in one module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module described above may be further divided so as to be embodied by a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A video optimization processing method is characterized by comprising the following steps:
acquiring an original video to be processed, and determining the video type of the original video;
determining target transcoding parameters corresponding to the original video according to a transcoding parameter model corresponding to the video type of the original video;
and transcoding the original video according to the target transcoding parameters to obtain a transcoded video corresponding to the original video.
2. The video optimization processing method according to claim 1, wherein the method for training the transcoding parameter model comprises:
acquiring a training sample set and transcoding sample parameters corresponding to the video type, wherein the training sample set comprises a plurality of original sample videos corresponding to the video type;
constructing the transcoding parameter model according to the transcoding sample parameters, and obtaining a plurality of groups of transcoding sample parameter sets through the transcoding parameter model;
transcoding the original sample video according to the transcoding sample parameter sets to obtain transcoding sample videos corresponding to the transcoding sample parameter sets;
and determining a transcoding quality score function of the transcoding sample video corresponding to each group of transcoding sample parameter sets, constructing an objective function by taking the maximized transcoding quality score function as a target, and training the transcoding parameter model.
3. The video optimization processing method according to claim 2, wherein an attention layer is included in the transcoding parameter model, the method further comprising:
adjusting attention weights of the transcoding sample parameters in the transcoding parameter model by the attention layer in the transcoding parameter model.
4. The video optimization processing method according to claim 3, wherein said adjusting attention weights of the transcoding sample parameters in the transcoding parameter model by the attention layer in the transcoding parameter model comprises:
and adjusting the attention weight of each transcoding sample parameter in the transcoding parameter model according to the attention score of each transcoding sample parameter in the transcoding parameter model in the attention layer.
5. The method of claim 3, wherein obtaining a plurality of sets of transcoding sample parameter sets through the transcoding parameter model comprises:
and combining the values of the transcoding sample parameters in the transcoding parameter model according to the attention weight of each transcoding sample parameter to obtain a plurality of groups of transcoding sample parameter sets.
6. The method of claim 2, wherein the determining the transcoding quality score function of the transcoded sample video corresponding to each set of the transcoded sample parameters comprises:
and determining the transcoding quality score function of the transcoding sample video corresponding to each group of transcoding sample parameter set according to the video quality score function and the file size score function of the transcoding sample video.
7. The method of claim 6, wherein the determining the transcoding quality score function of the transcoded sample video corresponding to each set of the transcoded sample parameters according to the video quality score function and the file size score function of the transcoded sample video comprises:
when the transcoding time corresponding to the transcoding sample parameter set is larger than a transcoding time threshold value or the size of a video file of the transcoding sample video is larger than that of the original sample video, determining a function value of a transcoding quality score function of the transcoding sample video as an unqualified score;
otherwise, weighting the video quality score function and the file size score function of the transcoded sample video according to a preset score weight to obtain the transcoded quality score function of the transcoded sample video.
8. The video optimization processing method according to claim 2, further comprising:
and in each iteration process of the transcoding parameter model, selecting the transcoding sample parameter set in the next iteration round according to a preset proportion from each group of transcoding sample parameter sets in the current iteration round according to the function value of the transcoding quality score function corresponding to each group of transcoding sample parameter sets.
9. A video optimization processing apparatus, comprising:
the original video acquisition module is used for acquiring an original video to be processed and determining the video type of the original video;
the transcoding parameter determining module is used for determining a target transcoding parameter corresponding to the original video according to a transcoding parameter model corresponding to the video type of the original video;
and the video transcoding processing module is used for transcoding the original video according to the target transcoding parameters to obtain a transcoded video corresponding to the original video.
10. An electronic device, comprising:
a processor; and
a memory for storing one or more programs that, when executed by the processor, cause the processor to implement the video optimization processing method of any of claims 1 to 8.
11. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the video optimization processing method according to any one of claims 1 to 8.
CN202211008975.XA 2022-08-22 2022-08-22 Video optimization processing method and device, electronic equipment and computer readable medium Active CN115396683B (en)


Publications (2)

Publication Number Publication Date
CN115396683A true CN115396683A (en) 2022-11-25
CN115396683B CN115396683B (en) 2024-04-09

Family

ID=84119909


Country Status (1)

Country Link
CN (1) CN115396683B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190335186A1 (en) * 2017-04-27 2019-10-31 Tencent Technology (Shenzhen) Company Limited Image transcoding method and apparatus
CN111314737A (en) * 2018-12-11 2020-06-19 阿里巴巴集团控股有限公司 Video transcoding method and device
CN111314706A (en) * 2018-12-11 2020-06-19 阿里巴巴集团控股有限公司 Video transcoding method and device
CN111510740A (en) * 2020-04-03 2020-08-07 咪咕文化科技有限公司 Transcoding method, transcoding device, electronic equipment and computer readable storage medium
CN111918066A (en) * 2020-09-08 2020-11-10 北京字节跳动网络技术有限公司 Video encoding method, device, equipment and storage medium
CN113596467A (en) * 2020-04-30 2021-11-02 北京达佳互联信息技术有限公司 Transcoding service detection method and device, electronic equipment and storage medium
US20210409741A1 (en) * 2020-06-29 2021-12-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for transcoding video and related electronic device
CN113873291A (en) * 2021-09-24 2021-12-31 广州虎牙科技有限公司 Video coding parameter combination determination method and device and server
CN114298199A (en) * 2021-12-23 2022-04-08 北京达佳互联信息技术有限公司 Transcoding parameter model training method, video transcoding method and device
CN114693812A (en) * 2022-03-28 2022-07-01 上海哔哩哔哩科技有限公司 Video processing method and device

Also Published As

Publication number Publication date
CN115396683B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN110363716B (en) High-quality reconstruction method for composite degraded images based on a conditional generative adversarial network
CN109286825B (en) Method and apparatus for processing video
TWI826321B (en) A method for enhancing quality of media
CN110634108B (en) Enhancement method for composite-degraded live-streaming video based on a meta-cycle-consistency adversarial network
Moorthy et al. Video quality assessment on mobile devices: Subjective, behavioral and objective studies
CN112954312B (en) No-reference video quality assessment method integrating spatio-temporal features
CN112995652B (en) Video quality evaluation method and device
CN112102212B (en) Video restoration method, device, equipment and storage medium
CN111901532B (en) Video stabilization method based on recurrent neural network iteration strategy
Shang et al. Study of the subjective and objective quality of high motion live streaming videos
CN110751649A (en) Video quality evaluation method and device, electronic equipment and storage medium
Zadtootaghaj et al. Demi: Deep video quality estimation model using perceptual video quality dimensions
CN110827380A (en) Image rendering method and device, electronic equipment and computer readable medium
CN113452944B (en) Picture display method of cloud mobile phone
CN110717868A (en) Video high dynamic range inverse tone mapping model construction and mapping method and device
CN113487564B (en) Two-stream temporal adaptive selection video quality evaluation method for user-generated video
Xie et al. Modeling the perceptual quality of viewport adaptive omnidirectional video streaming
Zhang et al. Quality-of-experience evaluation for digital twins in 6G network environments
CA3182110A1 (en) Reinforcement learning based rate control
CN113822954A (en) Deep learning image coding method for man-machine cooperation scene under resource constraint
Saha et al. Study of subjective and objective quality assessment of mobile cloud gaming videos
CN117478886A (en) Multimedia data encoding method, device, electronic equipment and storage medium
CN115396683B (en) Video optimization processing method and device, electronic equipment and computer readable medium
CN116416216A (en) Quality evaluation method based on self-supervision feature extraction, storage medium and terminal
CN116471262A (en) Video quality evaluation method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant