CN118101956A - Video processing method, apparatus, electronic device and computer readable medium - Google Patents
- Publication number
- CN118101956A (application CN202410350844.2A)
- Authority
- CN
- China
- Prior art keywords
- frame
- video
- video frame
- initial
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
Embodiments of the present disclosure disclose video processing methods, apparatuses, electronic devices, and computer readable media. One embodiment of the method comprises: performing video frame encoding on a first frame image of a video to be processed to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image; determining an initial reconstructed frame image group from the initial video frame sequence and the reconstructed frame corresponding to the first frame image; for each initial video frame in the initial video frame sequence, generating a registration reference image group corresponding to that initial video frame from the initial video frame and the initial reconstructed frame image group; and performing intelligent detection on the registration-debounced video frame sequence to obtain a detection result, which is displayed on a user interface. This embodiment reduces image quality loss, reduces wasted computing resources, and shortens video processing time.
Description
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a video processing method, apparatus, electronic device, and computer readable medium.
Background
With the widespread use of video media, video codec technology has received broad attention and application. Video processing is widely used in Internet video, mobile video, high-definition television, video surveillance, and similar fields. Well-designed video processing can reduce wasted computing resources and shorten the processing cycle; registering and debouncing the video, and reasonably selecting reference images when determining encoding results, can likewise reduce resource waste and improve picture quality. Currently, video processing is generally performed as follows: the video is first compressed, and processing is then carried out on the compressed video.
However, when the above manner is adopted, there are often the following technical problems:
First, video compression is often required to reduce the amount of data, which may cause a loss of image quality. Encoding and decoding high-frame-rate video consumes large amounts of computing resources and time, resulting in wasted resources and long processing cycles.
Second, noise may be introduced during registration and debouncing, degrading picture quality and harming the user experience. Moreover, the degree of registration debouncing cannot be guaranteed, so over-processing is possible; computing resources are therefore wasted in adjusting the degree of debouncing, and the processing cycle lengthens.
Third, during video encoding each frame must typically be compared with a reference image, which lowers encoding efficiency; and because the video may contain complex detail, picture quality may be lost during compression.
The information disclosed in this Background section is only for enhancement of understanding of the background of the inventive concept and therefore may contain information that does not constitute prior art already known to a person of ordinary skill in the art in this country.
Disclosure of Invention
This Summary is provided to introduce concepts in simplified form that are further described below in the Detailed Description. It is not intended to identify key or essential features of the claimed subject matter, nor to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose video processing methods, apparatuses, electronic devices, and computer readable media to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a video processing method, the method comprising: performing video frame encoding on a first frame image of a video to be processed to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image; determining an initial video frame sequence from the video to be processed, the initial video frame sequence being the video frame sequence with the first frame image removed; determining an initial reconstructed frame image group from the initial video frame sequence and the reconstructed frame corresponding to the first frame image; for each initial video frame in the initial video frame sequence, performing the following processing steps: generating a registration reference image group corresponding to the initial video frame from the initial video frame and the initial reconstructed frame image group; determining an initial video frame encoding result from the pixel units corresponding to the initial video frame, to the initial reconstructed frame image group, and to the registration reference image group; determining an encoding result corresponding to the video to be processed from the first frame encoding result and the obtained set of initial video frame encoding results, to obtain a to-be-processed video encoding result; performing video frame decoding on the to-be-processed video encoding result to obtain a decoded video frame sequence; performing registration debouncing on the decoded video frame sequence to obtain a registration-debounced video frame sequence; and performing intelligent detection on the registration-debounced video frame sequence to obtain a detection result, which is displayed on a user interface.
In a second aspect, some embodiments of the present disclosure provide a video processing apparatus, the apparatus comprising: an encoding unit configured to perform video frame encoding on a first frame image of a video to be processed to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image; a first determining unit configured to determine an initial video frame sequence from the video to be processed, the initial video frame sequence being the video frame sequence with the first frame image removed; a second determining unit configured to determine an initial reconstructed frame image group from the initial video frame sequence and the reconstructed frame corresponding to the first frame image; an execution unit configured to perform, for each initial video frame in the initial video frame sequence, the following processing steps: generating a registration reference image group corresponding to the initial video frame from the initial video frame and the initial reconstructed frame image group; determining an initial video frame encoding result from the pixel units corresponding to the initial video frame, to the initial reconstructed frame image group, and to the registration reference image group; a third determining unit configured to determine an encoding result corresponding to the video to be processed from the first frame encoding result and the obtained set of initial video frame encoding results, to obtain a to-be-processed video encoding result; a decoding unit configured to perform video frame decoding on the to-be-processed video encoding result to obtain a decoded video frame sequence; a registration debouncing unit configured to perform registration debouncing on the decoded video frame sequence to obtain a registration-debounced video frame sequence; and a detection unit configured to perform intelligent detection on the registration-debounced video frame sequence to obtain a detection result and display the detection result on a user interface.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect above.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
The above embodiments of the present disclosure have the following advantages: the video processing method of some embodiments of the present disclosure reduces image quality loss, reduces wasted computing resources, and shortens video processing time. Specifically, image quality is lost, computing resources are wasted, and processing cycles are long because video compression is often required to reduce the amount of data, which may cause a loss of image quality, and because encoding and decoding high-frame-rate video consumes large amounts of computing resources and time. Based on this, in the video processing method of some embodiments of the present disclosure, video frame encoding is first performed on the first frame image of the video to be processed to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image; this can improve encoding efficiency. Then, an initial video frame sequence is determined from the video to be processed, the initial video frame sequence being the video frame sequence with the first frame image removed; this facilitates subsequent processing. Next, an initial reconstructed frame image group is determined from the initial video frame sequence and the reconstructed frame corresponding to the first frame image; determining the initial reconstructed frame image group reduces the image quality loss incurred during encoding. Thereafter, for each initial video frame in the initial video frame sequence, the following processing steps are performed: first, a registration reference image group corresponding to the initial video frame is generated from the initial video frame and the initial reconstructed frame image group.
Second, an initial video frame encoding result is determined from the pixel units corresponding to the initial video frame, to the initial reconstructed frame image group, and to the registration reference image group. Determining the encoding result of each initial video frame in this way provides more accurate motion estimation for subsequent frames, reducing the amount of computation in encoding and decoding and thus the waste of computing resources. Then, the encoding result corresponding to the video to be processed is determined from the first frame encoding result and the obtained set of initial video frame encoding results, yielding the to-be-processed video encoding result. Next, video frame decoding is performed on this encoding result to obtain a decoded video frame sequence, which avoids consuming large amounts of computing resources and time in encoding and decoding high-frame-rate video. The decoded video frame sequence is then subjected to registration debouncing, improving its image quality. Finally, intelligent detection is performed on the registration-debounced video frame sequence to obtain a detection result, which is displayed on a user interface, improving the user experience. In this way, image quality loss is reduced and wasted computing resources are reduced, shortening video processing time.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of some embodiments of a video processing method according to the present disclosure;
Fig. 2 is a schematic structural diagram of some embodiments of a video processing apparatus according to the present disclosure;
Fig. 3 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a," "one," and "a plurality" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows a flow 100 of some embodiments of the video processing method of the present disclosure. The video processing method comprises the following steps:
step 101, video frame encoding is carried out on a first frame image of a video to be processed, and a first frame encoding result and a corresponding reconstruction frame of the first frame image are obtained.
In some embodiments, the execution body of the video processing method (for example, a computing device) may perform video frame encoding on the first frame image of a video to be processed, acquired via a wired or wireless connection, to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image.
Here, the above-described video to be processed may refer to video that has not been subjected to encoding processing. For example, the video to be processed may refer to a video captured by a camera. Here, the first frame encoding result may refer to a compressed data stream result outputted after encoding the first frame image. The reconstructed frame corresponding to the first frame image may refer to a video frame similar to the first frame image generated in the encoding process.
As an example, the executing body may perform video frame encoding on a first frame image of a video to be processed by using the h.264/AVC coding standard, so as to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image.
Optionally, the executing body may perform video frame encoding on a first frame image of the video to be processed through the following steps to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image:
The first step, macro block division is carried out on pixel data corresponding to a first frame image of the video to be processed, and a divided macro block set is obtained.
As an example, the execution body may perform macroblock division on pixel data corresponding to a first frame of a video to be processed by using Intra Prediction coding (Intra Prediction), to obtain a divided macroblock set.
And secondly, comparing the pixel value of each divided macro block in the divided macro block set with the pixel value of a preset macro block to generate comparison difference information, and obtaining a comparison difference information set.
Here, the preset macroblock pixel value may refer to the pixel value of a preset adjacent macroblock, i.e., a macroblock adjacent to the divided macroblock within the divided macroblock set. The comparison refers to computing the difference between the two sets of pixel values.
And thirdly, performing frequency domain conversion on each comparison difference information item in the comparison difference information set to obtain converted frequency domain information.
As an example, the execution body may apply a discrete cosine transform (Discrete Cosine Transform, DCT) to each comparison difference information item in the comparison difference information set to obtain the converted frequency domain information.
And fourthly, performing entropy coding on the converted frequency domain information to obtain a coding result as a first frame coding result.
As an example, the execution body may entropy-encode the converted frequency domain information using Huffman coding, taking the encoding result as the first frame encoding result.
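As a concrete illustration of this entropy coding step, the sketch below builds a Huffman prefix code over a run of quantized coefficients. This is a minimal stand-in rather than the entropy coder the patent actually uses; the `huffman_codes` helper and the sample coefficient list are hypothetical.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a prefix code from symbol frequencies (shorter codes for frequent symbols)."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # heap items: (frequency, unique tiebreak, {symbol: partial code})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Hypothetical quantized frequency-domain coefficients (many zeros, as after DCT).
coeffs = [0, 0, 0, 0, 5, 0, 0, -3, 0, 5, 5, 0]
codes = huffman_codes(coeffs)
bitstream = "".join(codes[c] for c in coeffs)
```

Because zero dominates, it receives the shortest code, so the bitstream is shorter than a fixed-length encoding of the same coefficients.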
And fifthly, decoding the first frame coding result to obtain a decoded first frame image.
Here, the decoded first frame image may refer to the image restored from the first frame encoding result.
As an example, the execution body may use a decoder (Decoder) to restore the decoded first frame image from the first frame encoding result and the comparison difference information set.
And sixthly, filtering the decoded first frame image to obtain a processed first frame image which is used as a corresponding reconstructed frame of the first frame image.
As an example, the execution body may filter the decoded first frame image with a Gaussian filter to obtain the processed first frame image, which is used as the reconstructed frame corresponding to the first frame image.
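The encode-then-reconstruct steps above can be sketched end to end. The block below is a simplified illustration, assuming an orthonormal 8×8 DCT with uniform scalar quantization in place of a full standard codec; `encode_reconstruct_block` and the quantization step `q_step` are illustrative assumptions, and the neighbor-difference, entropy coding, and Gaussian filtering stages are omitted for brevity.

```python
import numpy as np

N = 8  # macroblock size

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix: row = frequency, column = sample."""
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

D = dct_matrix(N)

def encode_reconstruct_block(block, q_step=10.0):
    """Forward DCT, uniform quantization, then inverse transform: the
    decoder-side reconstruction that serves as the reconstructed-frame reference."""
    coeffs = D @ block @ D.T                  # frequency-domain conversion
    quantized = np.round(coeffs / q_step)     # lossy step (what gets entropy-coded)
    recon = D.T @ (quantized * q_step) @ D    # inverse transform
    return quantized, recon

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(16, 16)).astype(float)
recon_frame = np.zeros_like(frame)
for i in range(0, 16, N):                     # macroblock division
    for j in range(0, 16, N):
        _, recon_frame[i:i+N, j:j+N] = encode_reconstruct_block(frame[i:i+N, j:j+N])
```

The reconstruction error is bounded by the quantization step, which is the image-quality loss the method's reconstructed-frame references are meant to keep small.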
Step 102, determining an initial video frame sequence according to the video to be processed.
In some embodiments, the execution body may determine an initial video frame sequence from the video to be processed, the initial video frame sequence being the video frame sequence of the video to be processed with the first frame image removed.
As an example, the execution body may first remove the first frame image from the video to be processed to obtain a trimmed video. Then, image enhancement is performed on each video frame in the video frame sequence corresponding to the trimmed video, and the enhanced video frame sequence is used as the initial video frame sequence.
And step 103, determining an initial reconstructed frame image group according to the initial video frame sequence and the reconstructed frame corresponding to the first frame image.
In some embodiments, the executing body may determine an initial reconstructed frame image group according to the initial video frame sequence and the reconstructed frame corresponding to the first frame image.
Here, the initial reconstructed frame image group may refer to a combination of each reconstructed frame corresponding to the initial video frame sequence and a reconstructed frame corresponding to the first frame image.
Optionally, the executing body may determine the initial reconstructed frame image group according to the initial video frame sequence and the reconstructed frame corresponding to the first frame image by:
First, video frame encoding is performed on each initial video frame in the initial video frame sequence to obtain at least one initial reconstructed frame image.
As an example, the executing body may perform video frame encoding on each initial video frame in the initial video frame sequence by using an encoder to obtain at least one initial reconstructed frame image.
Second, the at least one initial reconstructed frame image and the reconstructed frame corresponding to the first frame image are together determined as the initial reconstructed frame image group.
Step 104, for each initial video frame in the initial video frame sequence, performing the following processing steps:
step 1041, generating a registration reference image group corresponding to the initial video frame according to the initial video frame and the initial reconstructed frame image group.
In some embodiments, the executing body may generate a registration reference image set corresponding to the initial video frame according to the initial video frame and the initial reconstructed frame image set.
As an example, the execution body may first perform feature extraction on the initial video frame and on each initial reconstructed frame image in the initial reconstructed frame image group using computer vision techniques, obtaining an initial reconstructed frame feature set. Each initial reconstructed frame feature in the set is then compared with a preset feature descriptor to obtain a comparison result set. Then, the motion transformation relation corresponding to each comparison result is determined using an optical flow method, and the resulting motion transformation relation group is used as the registration reference image group. Feature extraction here may refer to edge feature extraction; the preset feature descriptor may refer to a predetermined descriptor of a feature, for example a Scale-Invariant Feature Transform (SIFT) descriptor.
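To make the registration idea concrete, the sketch below estimates the global translation between a frame and a reference by phase correlation. This is a hedged stand-in for the feature-and-optical-flow pipeline described above (the patent does not specify phase correlation); `estimate_shift` and the synthetic jitter are illustrative assumptions.

```python
import numpy as np

def estimate_shift(ref, moving):
    """Return the integer (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1))
    aligns `moving` with `ref`, estimated by phase correlation."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(moving)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12           # keep phase only
    corr = np.fft.ifft2(cross).real          # impulse at the relative shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    if dy > h // 2: dy -= h                  # wrap to signed shifts
    if dx > w // 2: dx -= w
    return dy, dx

rng = np.random.default_rng(1)
ref = rng.random((64, 64))
moving = np.roll(ref, shift=(3, -5), axis=(0, 1))  # simulate camera jitter
dy, dx = estimate_shift(ref, moving)
```

A pure translation yields a sharp correlation peak, so the recovered shift exactly undoes the simulated jitter; real frames with rotation or parallax need the richer per-feature matching the text describes.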
Optionally, the executing body may generate the registration reference image group corresponding to the initial video frame according to the initial video frame and the initial reconstructed frame image group by the following steps:
First, the motion vector between each initial reconstructed frame image in the initial reconstructed frame image group and the initial video frame is determined, and the resulting video frame motion vectors form a video frame motion vector group.
Here, the motion vector may be a vector formed by moving distance information and direction information generated during a motion.
Second, a motion compensation video frame group is determined from the video frame motion vector group and the initial reconstructed frame image group.
As an example, the executing body may perform association analysis on the video frame motion vector in the video frame motion vector set and the initial reconstructed frame image in the initial reconstructed frame image set, to obtain an analyzed video frame image set. And then, performing motion compensation on the analyzed video frame image group to obtain a compensated video frame image group which is used as a motion compensation video frame group. Here, the above-described motion compensation may refer to image resampling.
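The motion vector search and motion compensation steps can be sketched with full-search block matching under a sum-of-absolute-differences cost. The block size, search radius, and helper names below are assumptions for illustration; practical codecs use much faster search strategies and sub-pixel refinement.

```python
import numpy as np

BLOCK, SEARCH = 8, 4  # illustrative block size and search radius

def block_motion_vector(ref, cur, by, bx):
    """Full-search block matching: best (dy, dx) offset into `ref` for the
    current-frame block at (by, bx)."""
    h, w = ref.shape
    block = cur[by:by+BLOCK, bx:bx+BLOCK]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-SEARCH, SEARCH + 1):
        for dx in range(-SEARCH, SEARCH + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and y + BLOCK <= h and 0 <= x and x + BLOCK <= w:
                cand = ref[y:y+BLOCK, x:x+BLOCK]
                sad = np.abs(block - cand).sum()   # sum of absolute differences
                if sad < best:
                    best, best_mv = sad, (dy, dx)
    return best_mv

def motion_compensate(ref, mvs, h, w):
    """Predict the current frame by copying motion-shifted blocks from `ref`."""
    pred = np.zeros((h, w))
    for (by, bx), (dy, dx) in mvs.items():
        pred[by:by+BLOCK, bx:bx+BLOCK] = ref[by+dy:by+dy+BLOCK, bx+dx:bx+dx+BLOCK]
    return pred

rng = np.random.default_rng(2)
ref = rng.random((32, 32))
cur = np.roll(ref, shift=(2, 2), axis=(0, 1))      # global 2-pixel motion
mvs = {(by, bx): block_motion_vector(ref, cur, by, bx)
       for by in range(0, 32, BLOCK) for bx in range(0, 32, BLOCK)}
pred = motion_compensate(ref, mvs, 32, 32)
```

Interior blocks recover the true (-2, -2) displacement exactly, so their motion-compensated prediction matches the current frame with zero error.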
Third, an error value between each motion-compensated video frame in the motion compensation video frame group and the initial video frame is determined, yielding an error value set.
Fourth, the motion-compensated video frames whose error values are smaller than a preset error threshold are sorted by error value, yielding a sorted motion compensation video frame group.
As an example, the execution body may sort those motion-compensated video frames in order of increasing error value to obtain the sorted motion compensation video frame group.
Fifth, the sorted motion-compensated video frames satisfying a preset condition are screened from the sorted motion compensation video frame group, and the resulting screened motion compensation video frame group serves as the registration reference image group corresponding to the initial video frame.
Here, the preset condition may be, for example, that the error value is less than 0.5.
As an example, the execution body may determine the sorted motion-compensated video frames whose error value is smaller than a preset threshold (for example, 0.5) as the screened motion compensation video frame group, and use that group as the registration reference image group corresponding to the initial video frame.
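Steps three through five (error scoring, sorting, and screening) can be sketched as follows. The mean-absolute-error score, the 0.5 threshold on [0, 1]-normalized pixels, and the synthetic candidates are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

def select_registration_references(initial_frame, compensated_frames, error_threshold=0.5):
    """Score each motion-compensated candidate against the current frame
    (mean absolute error), keep those under the threshold, and return their
    indices sorted best-first."""
    scored = []
    for idx, cand in enumerate(compensated_frames):
        err = np.abs(initial_frame - cand).mean()
        if err < error_threshold:            # screening condition
            scored.append((err, idx))
    scored.sort()                            # ascending error = best first
    return [idx for _, idx in scored]

rng = np.random.default_rng(3)
frame = rng.random((16, 16))
candidates = [
    frame + rng.normal(0.0, 0.05, frame.shape),  # well-compensated candidate
    frame + rng.normal(0.0, 0.2, frame.shape),   # rougher candidate
    frame + 1.0,                                  # badly exposed outlier
]
refs = select_registration_references(frame, candidates)
```

The outlier's error (1.0) exceeds the threshold, so only the two usable candidates survive, ordered from lowest to highest error.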
Step 1042, determining an initial video frame coding result according to the pixel unit corresponding to the initial video frame, the pixel unit corresponding to the initial reconstructed frame image group and the pixel unit corresponding to the registration reference image group.
In some embodiments, the executing body may determine the initial video frame encoding result according to the pixel unit corresponding to the initial video frame, the pixel unit corresponding to the initial reconstructed frame image group, and the pixel unit corresponding to the registration reference image group.
Optionally, the executing body may determine the initial video frame encoding result according to the pixel unit corresponding to the initial video frame, the pixel unit corresponding to the initial reconstructed frame image group, and the pixel unit corresponding to the registration reference image group by:
in the first step, in response to determining that the number of registration reference images in the registration reference image group equals a preset number, the coding flag corresponding to the registration reference image group is set to the value corresponding to the preset number, yielding a first registration coding flag.
Here, the preset number is predetermined; for example, 0. The first registration coding flag may refer to the coding flag, equal to 0, corresponding to the registration reference images in the registration reference image group.
A second step of, in response to determining that the number of registered reference images in the registered reference image group is not a preset number, performing the following processing steps:
And a first sub-step of generating a rate distortion result set according to the pixel units corresponding to the initial reconstructed frame image group and the pixel units corresponding to the registration reference image group.
As an example, the execution body may determine a distortion measure of a pixel unit corresponding to the initial reconstructed frame image group and a pixel unit corresponding to the registration reference image group by using a mean square error, to obtain a rate distortion result set.
And a second sub-step of determining a prediction result corresponding to the minimum rate distortion in the rate distortion result set.
And a third sub-step of, in response to determining that the prediction result represents a result of predicting the corresponding pixel unit by the registration reference image group, setting the coding flag corresponding to the registration reference image group to a preset flag value, obtaining a second registration coding flag.
Here, the preset flag value is a predetermined value; for example, it may be 1. The second registration coding flag may be a coding flag indicating that the flag corresponding to each registration reference image in the registration reference image group is 1.
And a fourth sub-step of, in response to determining that the prediction result represents a result predicted by the pixel unit corresponding to the initial reconstructed frame image group, setting the coding flag corresponding to the registration reference image group to the value corresponding to the preset number, obtaining a third registration coding flag.
Here, the third registration coding flag may refer to a coding flag of 0 corresponding to the registration reference image in the registration reference image group under the prediction of the pixel unit corresponding to the initial reconstructed frame image group.
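As an illustrative sketch only (not part of the claimed method), the minimum-distortion selection described in the sub-steps above could be implemented as follows; the function names and the 0/1 flag convention are assumptions drawn from the examples in the text:

```python
import numpy as np

def select_prediction(pixel_unit, recon_candidates, reg_candidates):
    # Evaluate every candidate prediction with mean squared error and
    # keep the one with the smallest distortion.
    best = None  # (source, index, mse)
    for source, group in (("reconstructed", recon_candidates),
                          ("registration", reg_candidates)):
        for i, cand in enumerate(group):
            mse = float(np.mean(
                (pixel_unit.astype(np.float64) - cand.astype(np.float64)) ** 2))
            if best is None or mse < best[2]:
                best = (source, i, mse)
    return best

def registration_coding_flag(source):
    # Flag convention taken from the text: 1 when the registration
    # reference group yields the best prediction, 0 when the
    # reconstructed frame group does.
    return 1 if source == "registration" else 0
```

The rate distortion result set here is reduced to a pure distortion (MSE) comparison; a full rate-distortion cost would also weigh the bit rate of each candidate.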
And thirdly, carrying out macro block division on pixel units corresponding to the initial video frame to obtain a divided macro block unit set.
As an example, the execution body may divide the pixel unit corresponding to the initial video frame into macro blocks according to a preset macro block size, to obtain a set of divided macro block units. The above-mentioned preset macroblock size may refer to a macroblock of size 8×8.
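A minimal sketch of the 8×8 macro block division described above; zero-padding at the frame edges is an assumption, since the text does not specify how non-multiple-of-8 dimensions are handled:

```python
import numpy as np

def divide_into_macroblocks(pixel_unit, size=8):
    # Zero-pad the frame so both dimensions are multiples of `size`,
    # then slice it into size x size macro block units.
    h, w = pixel_unit.shape
    pad_h = (size - h % size) % size
    pad_w = (size - w % size) % size
    padded = np.pad(pixel_unit, ((0, pad_h), (0, pad_w)))
    return [padded[y:y + size, x:x + size]
            for y in range(0, padded.shape[0], size)
            for x in range(0, padded.shape[1], size)]
```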
And fourthly, adding the first registration coding flag and its index information, the second registration coding flag and its index information, and the third registration coding flag to the code stream corresponding to each divided macro block unit, to obtain a code stream set corresponding to the added macro block units.
And fifthly, combining the added macro block unit corresponding code streams in the added macro block unit corresponding code stream set to obtain a combined video frame coding result, and taking the combined video frame coding result as an initial video frame coding result.
As an example, the execution body may combine the respective added macroblock unit corresponding code streams in the added macroblock unit corresponding code stream set to obtain a combined video frame encoding result, which is used as the initial video frame encoding result.
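As an illustrative sketch of the fourth and fifth steps, the flags could be prepended to each macro block code stream and the results concatenated; the 1-byte flag encoding and the header layout are assumptions, since the text does not define a bitstream syntax:

```python
def assemble_frame_bitstream(macroblock_streams, flag_records):
    # flag_records: list of (flag_value, index_bytes) pairs, e.g. the
    # first, second, and third registration coding flags with their
    # index information.
    header = b"".join(bytes([flag]) + index for flag, index in flag_records)
    # Add the flag header to each macro block code stream, then combine
    # (concatenate) into one frame-level encoding result.
    tagged = [header + stream for stream in macroblock_streams]
    return b"".join(tagged)
```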
The related content in the first step to the fifth step above is taken as an invention point of the present disclosure, and solves the third technical problem mentioned in the background art: in the video encoding process, each frame image of the video often needs to be compared with a reference image, so the encoding efficiency is low; and because the video may contain relatively complex details, the quality of the video image may be lost when the video is compressed. If these factors are addressed, the loss of quality of video images can be reduced. To achieve this, in the first step, in response to determining that the number of registration reference images in the registration reference image group is the preset number, the coding flag corresponding to the registration reference image group is set to the value corresponding to the preset number, obtaining a first registration coding flag. Thus, convenience can be provided for subsequent processing. In the second step, in response to determining that the number of registration reference images in the registration reference image group is not the preset number, the following processing steps are performed: a first sub-step of generating a rate distortion result set according to the pixel units corresponding to the initial reconstructed frame image group and the pixel units corresponding to the registration reference image group.
Therefore, the situation in which each frame image of the video must be compared with a reference image during video encoding can be avoided, and the encoding efficiency can be improved through the rate distortion result set. A second sub-step of determining the prediction result corresponding to the minimum rate distortion in the rate distortion result set. A third sub-step of, in response to determining that the prediction result represents a result of predicting the corresponding pixel unit by the registration reference image group, setting the coding flag corresponding to the registration reference image group to a preset flag value, obtaining a second registration coding flag. Thus, the video can be processed according to different prediction results. A fourth sub-step of, in response to determining that the prediction result represents a result predicted by the pixel unit corresponding to the initial reconstructed frame image group, setting the coding flag corresponding to the registration reference image group to the value corresponding to the preset number, obtaining a third registration coding flag. In the third step, macro block division is performed on the pixel units corresponding to the initial video frame to obtain a divided macro block unit set. Thus, a complicated video frame can be simplified. In the fourth step, the first registration coding flag and its index information, the second registration coding flag and its index information, and the third registration coding flag are added to the code stream corresponding to each divided macro block unit, to obtain a code stream set corresponding to the added macro block units. Thus, the loss of video quality can be reduced.
In the fifth step, the code streams in the code stream set corresponding to the added macro block units are combined to obtain a combined video frame encoding result, which is taken as the initial video frame encoding result. Therefore, the quality loss of the video image can be reduced.
Step 105, determining the encoding result corresponding to the video to be processed according to the first frame encoding result and the obtained initial video frame encoding result set, and obtaining the video encoding result to be processed.
In some embodiments, the executing body may determine an encoding result corresponding to the video to be processed according to the first frame encoding result and the obtained initial video frame encoding result set, so as to obtain a video encoding result to be processed.
Here, the above-mentioned video encoding result to be processed may refer to an encoding result of the video to be processed.
As an example, the execution body may combine the first frame encoding result and the obtained initial video frame encoding result set to obtain a combined encoding result, which is used as the video encoding result to be processed.
And 106, performing video frame decoding on the video coding result to be processed to obtain a decoded video frame sequence.
In some embodiments, the executing body may perform video frame decoding on the video encoding result to be processed to obtain a decoded video frame sequence.
As an example, the execution body may decode the video frame from the video encoding result to be processed by using a decoder (decoder) to obtain a decoded video frame sequence.
And step 107, performing registration debouncing processing on the decoded video frame sequence to obtain a registration debounced video frame sequence.
In some embodiments, the executing body may perform a registration de-jitter process on the decoded video frame sequence to obtain a registration de-jittered video frame sequence.
As an example, the executing body may perform a registration de-jitter process on the decoded video frame sequence by using a video anti-jitter algorithm, to obtain a registration de-jittered video frame sequence.
Optionally, the executing body may perform a registration de-jitter process on the decoded video frame sequence to obtain a registration de-jittered video frame sequence by:
And firstly, carrying out resolution normalization processing on the decoded video frame sequence to obtain a normalized video frame sequence.
As an example, the executing body may normalize the resolution of each decoded video frame in the decoded video frame sequence to generate a normalized video frame, and obtain a normalized video frame sequence.
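A minimal sketch of the resolution normalization step; nearest-neighbor index mapping is an assumption (the text only requires that all frames end up at one common resolution):

```python
import numpy as np

def normalize_resolution(frames, target_h, target_w):
    # Resample every decoded frame to one common resolution by mapping
    # each output pixel to its nearest source pixel.
    normalized = []
    for frame in frames:
        h, w = frame.shape[:2]
        ys = np.arange(target_h) * h // target_h
        xs = np.arange(target_w) * w // target_w
        normalized.append(frame[ys][:, xs])
    return normalized
```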
And secondly, extracting video frame characteristics of each normalized video frame in the normalized video frame sequence to generate a video frame characteristic group, and obtaining a video frame characteristic group set.
Here, the above-described video frame feature may refer to a video frame edge feature.
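As an illustrative sketch, the edge features mentioned above could be computed as a gradient magnitude map; the central-difference operator is an assumption, since the text only names "edge features":

```python
import numpy as np

def edge_feature_map(frame):
    # Central-difference gradients in both axes, combined into a
    # per-pixel edge magnitude.
    f = frame.astype(np.float64)
    gy = np.zeros_like(f)
    gx = np.zeros_like(f)
    gy[1:-1, :] = f[2:, :] - f[:-2, :]
    gx[:, 1:-1] = f[:, 2:] - f[:, :-2]
    return np.hypot(gx, gy)
```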
And thirdly, performing multi-scale registration debouncing processing on the normalized video frames corresponding to the video frame feature sets in the normalized video frame sequence according to each video frame feature set in the video frame feature set so as to generate the video frames after registration debouncing processing and obtain a video frame set after registration debouncing processing.
And fourthly, inputting the video frame set subjected to the registration de-jitter processing to a pre-trained video frame enhancement model to obtain an enhanced video frame set.
Here, the pre-trained video frame enhancement model may be a generative adversarial network (GAN) model. The pre-trained video frame enhancement model may be used as a model for image enhancement of video frames.
And fifthly, according to each video frame characteristic set in the video frame characteristic set, carrying out registration de-jitter parameter adjustment on the enhanced video frame set to obtain an adjusted video frame set.
As an example, the executing body may perform registration de-jitter parameter adjustment on the enhanced video frame set according to each video frame feature set in the video frame feature set through adaptive adjustment, to obtain an adjusted video frame set.
And sixthly, denoising the adjusted video frame set to obtain a denoised video frame set.
And seventh, carrying out smoothing treatment on the denoised video frame set to obtain a smoothed denoised video frame set.
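The smoothing treatment above can be sketched as a temporal moving average across neighboring frames; the filter choice and window size are assumptions, as the text does not specify them:

```python
import numpy as np

def temporal_smooth(frames, window=3):
    # Replace each frame with the mean of the frames inside a sliding
    # temporal window centered on it (truncated at sequence boundaries).
    stack = np.stack([f.astype(np.float64) for f in frames])
    half = window // 2
    smoothed = []
    for i in range(len(frames)):
        lo, hi = max(0, i - half), min(len(frames), i + half + 1)
        smoothed.append(stack[lo:hi].mean(axis=0))
    return smoothed
```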
And eighth step, registering and debouncing correction is carried out on the smoothed and denoised video frame set, and a corrected video frame set is obtained.
Here, the above-described registration debounce correction may characterize either an increase or a decrease in the degree of registration debounce.
And ninth, carrying out inter-frame consistency analysis on each corrected video frame in the corrected video frame set to obtain an analyzed result.
And tenth, in response to determining that the analyzed results represent that the analysis is consistent, video frame combination is performed on each corrected video frame in the corrected video frame set corresponding to the analyzed results, to obtain a combined video frame sequence, which is taken as the registration de-jittered video frame sequence.
Here, the above combination may refer to merging.
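A minimal sketch of the ninth and tenth steps; using adjacent-frame MSE as the consistency metric and a fixed threshold are both assumptions, since the text does not define the analysis:

```python
import numpy as np

def frames_consistent(frames, mse_threshold=100.0):
    # The sequence is declared consistent when every pair of adjacent
    # frames stays below the MSE threshold.
    for prev, cur in zip(frames, frames[1:]):
        mse = float(np.mean(
            (prev.astype(np.float64) - cur.astype(np.float64)) ** 2))
        if mse >= mse_threshold:
            return False
    return True

def combine_if_consistent(frames, mse_threshold=100.0):
    # Merge (combine) the corrected frames into one sequence only when
    # the inter-frame consistency analysis passes.
    return list(frames) if frames_consistent(frames, mse_threshold) else None
```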
The related content in the first step to the tenth step above is taken as an invention point of the present disclosure, and solves the second technical problem mentioned in the background art: noise may be introduced in the process of registering and debouncing the video, so that the quality of the video picture is lost and the user experience is affected; and because the degree of registration debounce cannot be guaranteed, excessive processing may result, so that computing resources are wasted when adjusting the degree of registration debounce and the video processing period becomes longer. If these factors are addressed, the waste of computing resources can be reduced and the video processing period shortened. To achieve this effect, in the first step, the decoded video frame sequence is subjected to resolution normalization processing to obtain a normalized video frame sequence. Thus, the problem that different video frames may have different resolutions can be solved. In the second step, video frame features are extracted from each normalized video frame in the normalized video frame sequence to generate a video frame feature group, obtaining a video frame feature group set. Thus, convenience can be provided for subsequent processing.
And thirdly, performing multi-scale registration debouncing processing on the normalized video frames corresponding to the video frame feature sets in the normalized video frame sequence according to each video frame feature set in the video frame feature set so as to generate the video frames after registration debouncing processing and obtain a video frame set after registration debouncing processing. Thus, the need for registering de-jittered video frames can be met at different scales. And fourthly, inputting the video frame set subjected to the registration de-jitter processing to a pre-trained video frame enhancement model to obtain an enhanced video frame set. Thereby, the quality of the video picture can be improved. And fifthly, according to each video frame characteristic set in the video frame characteristic set, carrying out registration de-jitter parameter adjustment on the enhanced video frame set to obtain an adjusted video frame set. Thus, a problem that the degree of registration debounce cannot be ensured, possibly resulting in excessive processing, can be avoided. And sixthly, denoising the adjusted video frame set to obtain a denoised video frame set. Thus, the possible introduction of noise into the video during the registration de-jitter process can be avoided. And seventh, carrying out smoothing treatment on the denoised video frame set to obtain a smoothed denoised video frame set. And eighth step, registering and debouncing correction is carried out on the smoothed and denoised video frame set, and a corrected video frame set is obtained. Therefore, the excessive processing can be avoided, so that the calculation resources can be saved in the follow-up processing of the registration debouncing, and the video processing period can be shortened. And ninth, carrying out inter-frame consistency analysis on each corrected video frame in the corrected video frame set to obtain an analyzed result. 
And tenth, in response to determining that the analyzed results represent that the analysis is consistent, video frame combination is performed on each corrected video frame in the corrected video frame set corresponding to the analyzed results, to obtain a combined video frame sequence, which is taken as the registration de-jittered video frame sequence. Thus, visual continuity between video frames can be maintained, reducing abrupt changes from frame to frame. Therefore, the waste of computing resources is reduced, and the period of video processing is shortened.
And step 108, performing intelligent detection on the video frame sequence subjected to the registration and debouncing to obtain a detection result, and displaying the detection result to a user interface.
In some embodiments, the executing body may perform intelligent detection on the video frame sequence after the registration and debouncing to obtain a detection result, and display the detection result to a user interface.
Here, the above-described user interface may refer to an interface of a user device. The user device may be a mobile phone. The interface may be, for example, the home page of an application.
As an example, the executing body may intelligently detect the registered and debounced video frame sequence through an image target detection algorithm to obtain a detection result, and display the detection result in a video form to a user interface.
The above embodiments of the present disclosure have the following advantages: by the video processing method of some embodiments of the present disclosure, image quality loss is reduced, waste of computing resources is reduced, and video processing time is shortened. Specifically, the reasons why image quality is lost, computing resources are wasted, and the video processing period is long are as follows: video compression is often required to reduce the amount of data, which may result in a loss of image quality; and encoding and decoding high frame rate video consumes a large amount of computing resources and time, resulting in wasted computing resources and a long video processing period. Based on this, in the video processing method of some embodiments of the present disclosure, first, video frame encoding is performed on the first frame image of a video to be processed to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image. Thus, the encoding efficiency can be improved. Then, an initial video frame sequence is determined according to the video to be processed, where the initial video frame sequence is the video frame sequence with the first frame image removed. Thereby, subsequent processing can be facilitated. Next, an initial reconstructed frame image group is determined according to the initial video frame sequence and the reconstructed frame corresponding to the first frame image. Thus, the loss of image quality caused during encoding can be reduced by determining the initial reconstructed frame image group. Thereafter, for each initial video frame in the sequence of initial video frames, the following processing steps are performed: first, a registration reference image group corresponding to the initial video frame is generated according to the initial video frame and the initial reconstructed frame image group.
Second, an initial video frame encoding result is determined according to the pixel units corresponding to the initial video frame, the pixel units corresponding to the initial reconstructed frame image group, and the pixel units corresponding to the registration reference image group. Therefore, determining the encoding results of the initial video frames can provide more accurate motion estimation for subsequent frames, thereby reducing the amount of calculation in the encoding and decoding processes and reducing the waste of computing resources. Then, the encoding result corresponding to the video to be processed is determined according to the first frame encoding result and the obtained initial video frame encoding result set, to obtain the video encoding result to be processed. Next, video frame decoding is performed on the video encoding result to be processed to obtain a decoded video frame sequence. Thus, consuming a large amount of computing resources and time in encoding and decoding high frame rate video can be avoided. Next, registration debouncing processing is performed on the decoded video frame sequence to obtain a registration de-jittered video frame sequence. Thereby, the image quality of the decoded video frame sequence can be improved. Finally, intelligent detection is performed on the registration de-jittered video frame sequence to obtain a detection result, and the detection result is displayed on a user interface. Thus, the user experience can be improved. Therefore, the loss of image quality is reduced and the waste of computing resources is reduced, thereby shortening the time of video processing.
With further reference to fig. 2, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of a video processing apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 1, and the apparatus is particularly applicable to various electronic devices.
As shown in fig. 2, the video processing apparatus 200 of some embodiments includes: an encoding unit 201, a first determination unit 202, a second determination unit 203, an execution unit 204, a third determination unit 205, a decoding unit 206, a registration de-dithering unit 207, and a detection unit 208. The encoding unit 201 is configured to perform video frame encoding on a first frame image of a video to be processed, so as to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image; a first determining unit 202 configured to determine an initial video frame sequence according to the video to be processed, wherein the initial video frame sequence is an initial video frame sequence from which the first frame image is removed; a second determining unit 203 configured to determine an initial reconstructed frame image group based on the initial video frame sequence and the reconstructed frame corresponding to the first frame image; an execution unit 204 configured to execute, for each initial video frame in the sequence of initial video frames, the following processing steps: generating a registration reference image group corresponding to the initial video frame according to the initial video frame and the initial reconstruction frame image group; determining an initial video frame coding result according to the pixel units corresponding to the initial video frame, the pixel units corresponding to the initial reconstructed frame image group and the pixel units corresponding to the registration reference image group; a third determining unit 205 configured to determine, according to the first frame encoding result and the obtained initial video frame encoding result set, an encoding result corresponding to the video to be processed, and obtain a video encoding result to be processed; a decoding unit 206 configured to decode the video frame of the video encoding result to be processed, so as to obtain a decoded video 
frame sequence; a registration de-dithering unit 207 configured to perform registration de-dithering processing on the decoded video frame sequence to obtain a registration de-dithered video frame sequence; the detecting unit 208 is configured to intelligently detect the video frame sequence after the registration and the de-jitter to obtain a detection result, and display the detection result to the user interface.
It will be appreciated that the elements described in the apparatus 200 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and resulting benefits described above for the method are equally applicable to the apparatus 200 and the units contained therein, and are not described in detail herein.
Referring now to FIG. 3, a schematic diagram of an electronic device (e.g., computing device) 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with programs stored in a read-only memory (ROM) 302 or loaded from a storage device 308 into a random access memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic device 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other through a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from ROM 302. The computer program, when executed by the processing means 301, performs the functions defined in the methods of some embodiments of the present disclosure.
It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: performing video frame coding on a first frame image of a video to be processed to obtain a first frame coding result and a corresponding reconstructed frame of the first frame image; determining an initial video frame sequence according to the video to be processed, wherein the initial video frame sequence is an initial video frame sequence from which a first frame image is removed; determining an initial reconstructed frame image group according to the initial video frame sequence and the reconstructed frame corresponding to the first frame image; for each initial video frame in the sequence of initial video frames, performing the following processing steps: generating a registration reference image group corresponding to the initial video frame according to the initial video frame and the initial reconstruction frame image group; determining an initial video frame coding result according to the pixel units corresponding to the initial video frame, the pixel units corresponding to the initial reconstructed frame image group and the pixel units corresponding to the registration reference image group; determining a coding result corresponding to the video to be processed according to the first frame coding result and the obtained initial video frame coding result set to obtain a video coding result to be processed; performing video frame decoding on the video coding result to be processed to obtain a decoded video frame sequence; performing registration debouncing processing on the decoded video frame sequence to obtain a registration debounced video frame sequence; and performing intelligent detection on the video frame sequence subjected to registration and debouncing to obtain a 
detection result, and displaying the detection result to a user interface.
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software or by means of hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor comprising an encoding unit, a first determining unit, a second determining unit, an execution unit, a third determining unit, a decoding unit, a registration de-jittering unit, and a detection unit. The names of these units do not, in some cases, constitute a limitation of the units themselves; for example, the encoding unit may also be described as "a unit that performs video frame coding on a first frame image of a video to be processed to obtain a first frame coding result and a reconstructed frame corresponding to the first frame image".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The above description is merely illustrative of some preferred embodiments of the present disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention involved in the embodiments of the present disclosure is not limited to the specific combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.
Claims (7)
1. A video processing method, comprising:
performing video frame coding on a first frame image of a video to be processed to obtain a first frame coding result and a reconstructed frame corresponding to the first frame image;
determining an initial video frame sequence according to the video to be processed, wherein the initial video frame sequence is the frame sequence of the video to be processed with the first frame image removed;
determining an initial reconstructed frame image group according to the initial video frame sequence and the reconstructed frame corresponding to the first frame image;
for each initial video frame in the sequence of initial video frames, performing the following processing steps:
generating a registration reference image group corresponding to the initial video frame according to the initial video frame and the initial reconstructed frame image group;
determining an initial video frame coding result according to the pixel units corresponding to the initial video frame, the pixel units corresponding to the initial reconstructed frame image group and the pixel units corresponding to the registration reference image group;
determining a coding result corresponding to the video to be processed according to the first frame coding result and the obtained initial video frame coding result set to obtain a video coding result to be processed;
performing video frame decoding on the video coding result to be processed to obtain a decoded video frame sequence;
performing registration de-jittering processing on the decoded video frame sequence to obtain a registration de-jittered video frame sequence;
performing intelligent detection on the registration de-jittered video frame sequence to obtain a detection result, and displaying the detection result on a user interface.
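The overall flow of the method in claim 1 can be summarized in a minimal sketch. This is an illustrative data-flow skeleton only, not the patented codec: `encode_frame` is an identity stand-in, frames are plain integer lists, and the registration reference group, de-jittering, and detection stages are elided; all names are assumptions.

```python
# Sketch of the claimed pipeline shape (claim 1). The point is the data
# flow: intra-code the first frame, then code every remaining frame
# against a growing group of reconstructed reference frames.

def encode_frame(frame, refs=None):
    """Stand-in codec: returns (coding_result, reconstructed_frame)."""
    return list(frame), list(frame)

def process_video(frames):
    # Step 1: intra-code the first frame; keep its reconstructed frame.
    first_code, first_recon = encode_frame(frames[0])
    # Steps 2-3: remaining frames + the initial reconstructed-frame group.
    initial_seq, recon_group = frames[1:], [first_recon]
    codes = [first_code]
    # Step 4: code each remaining frame against the reference group,
    # which grows as each frame's reconstruction is appended.
    for frame in initial_seq:
        code, recon = encode_frame(frame, refs=recon_group)
        codes.append(code)
        recon_group.append(recon)
    return codes

def decode_video(codes):
    """Stand-in decoder: recovers the frame sequence from the codes."""
    return [list(c) for c in codes]
```

Because the stand-in codec is lossless, decoding the result of `process_video` returns the original frame sequence, which makes the round trip easy to check.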
2. The method of claim 1, wherein the performing video frame encoding on the first frame image of the video to be processed to obtain a first frame encoding result and a reconstructed frame corresponding to the first frame image includes:
performing macroblock division on pixel data corresponding to the first frame image of the video to be processed to obtain a divided macroblock set;
comparing the pixel value of each divided macroblock in the divided macroblock set with the pixel value of a preset macroblock to generate comparison difference information, so as to obtain a comparison difference information set;
performing frequency domain conversion on each piece of comparison difference information in the comparison difference information set to obtain converted frequency domain information;
performing entropy coding on the converted frequency domain information to obtain a coding result as the first frame coding result;
decoding the first frame coding result to obtain a decoded first frame image;
and filtering the decoded first frame image to obtain a processed first frame image as the reconstructed frame corresponding to the first frame image.
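The shape of claim 2's intra-coding steps can be illustrated with a toy sketch: a flat pixel list is divided into fixed-size macroblocks, differenced against a preset predictor value, and entropy-coded with a run-length stand-in. The frequency-domain transform and in-loop filter are elided, so this round trip is lossless; all names and the 4-sample macroblock size are illustrative assumptions, not the patented implementation.

```python
# Toy sketch of claim 2: first-frame intra coding over a flat pixel list.

MB_SIZE = 4  # illustrative macroblock size

def split_macroblocks(pixels):
    """Macroblock division: fixed-size slices of the pixel data."""
    return [pixels[i:i + MB_SIZE] for i in range(0, len(pixels), MB_SIZE)]

def diff_vs_predictor(block, predictor=128):
    """Comparison difference info: residual against a preset predictor value."""
    return [p - predictor for p in block]

def run_length_encode(values):
    """Stand-in entropy coder: [value, run] pairs."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

def run_length_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

def encode_first_frame(pixels, predictor=128):
    """Divide into macroblocks, take residuals, entropy-code them."""
    residuals = [diff_vs_predictor(mb, predictor) for mb in split_macroblocks(pixels)]
    flat = [r for mb in residuals for r in mb]  # transform step elided
    return run_length_encode(flat)

def decode_first_frame(code, predictor=128):
    """Decode the residuals and add the predictor back (filtering elided)."""
    return [r + predictor for r in run_length_decode(code)]
```

For example, a mostly flat frame compresses to a couple of run-length pairs, and decoding recovers the original pixels exactly.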
3. The method of claim 1, wherein the determining an initial reconstructed frame image set from the initial video frame sequence and the corresponding reconstructed frame of the first frame image comprises:
performing video frame coding on each initial video frame in the initial video frame sequence to obtain at least one initial reconstructed frame image;
and determining the at least one initial reconstructed frame image and the reconstructed frame corresponding to the first frame image as an initial reconstructed frame image group.
4. The method of claim 1, wherein the generating a set of registered reference images corresponding to an initial video frame from the initial video frame and the initial set of reconstructed frame images comprises:
Determining a motion vector between each initial reconstructed frame image in the initial reconstructed frame image group and an initial video frame to generate a video frame motion vector, and obtaining a video frame motion vector group;
determining a motion-compensated video frame group according to the video frame motion vector group and the initial reconstructed frame image group;
determining an error value between each motion-compensated video frame in the motion-compensated video frame group and the initial video frame, so as to obtain an error value set;
sorting, by error value, the motion-compensated video frames corresponding to the at least one error value in the error value set that is smaller than a preset error threshold, to obtain a sorted motion-compensated video frame group;
screening the sorted motion-compensated video frames in the sorted motion-compensated video frame group that satisfy a preset condition, to obtain a screened motion-compensated video frame group as the registration reference image group corresponding to the initial video frame.
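Claim 4's selection of registration reference frames can be sketched with one-dimensional frames and a shift-only motion search, using the sum of absolute differences (SAD) as the error value. The function names, the ±2-sample search range, the error threshold, and the "keep the best k" screening condition are all illustrative assumptions, not the patented method.

```python
# Toy sketch of claim 4: choose registration reference frames by
# motion-compensated error against the current frame.

def shift(frame, mv):
    """Motion-compensate a 1-D frame by shifting mv samples (edge-padded)."""
    if mv == 0:
        return list(frame)
    if mv > 0:
        return [frame[0]] * mv + frame[:-mv]
    return frame[-mv:] + [frame[-1]] * (-mv)

def sad(a, b):
    """Error value: sum of absolute differences between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_motion_vector(ref, cur, search=2):
    """Exhaustive search over small shifts; returns the mv minimising SAD."""
    return min(range(-search, search + 1), key=lambda mv: sad(shift(ref, mv), cur))

def select_references(recon_group, cur, err_threshold, max_refs=2):
    # Steps 1-2: motion vector + motion-compensated frame per reconstructed frame.
    compensated = [shift(r, best_motion_vector(r, cur)) for r in recon_group]
    # Step 3: error value per motion-compensated frame.
    scored = [(sad(c, cur), c) for c in compensated]
    # Step 4: keep frames under the error threshold, sorted by error.
    kept = sorted((e, c) for e, c in scored if e < err_threshold)
    # Step 5: screen by a preset condition (here: the max_refs best frames).
    return [c for _, c in kept[:max_refs]]
```

A reconstructed frame that aligns well after a small shift survives the threshold and becomes a reference; a frame with large residual error is discarded.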
5. A video processing apparatus comprising:
an encoding unit configured to perform video frame coding on a first frame image of a video to be processed to obtain a first frame coding result and a reconstructed frame corresponding to the first frame image;
a first determining unit configured to determine an initial video frame sequence according to the video to be processed, wherein the initial video frame sequence is the frame sequence of the video to be processed with the first frame image removed;
a second determining unit configured to determine an initial reconstructed frame image group from the initial video frame sequence and the reconstructed frame corresponding to the first frame image;
An execution unit configured to, for each initial video frame in the sequence of initial video frames, perform the following processing steps: generating a registration reference image group corresponding to the initial video frame according to the initial video frame and the initial reconstructed frame image group; determining an initial video frame coding result according to the pixel units corresponding to the initial video frame, the pixel units corresponding to the initial reconstructed frame image group and the pixel units corresponding to the registration reference image group;
a third determining unit configured to determine a coding result corresponding to the video to be processed according to the first frame coding result and the obtained initial video frame coding result set, so as to obtain a video coding result to be processed;
a decoding unit configured to perform video frame decoding on the video coding result to be processed to obtain a decoded video frame sequence;
a registration de-jittering unit configured to perform registration de-jittering processing on the decoded video frame sequence to obtain a registration de-jittered video frame sequence;
and a detection unit configured to perform intelligent detection on the registration de-jittered video frame sequence to obtain a detection result, and display the detection result on a user interface.
6. An electronic device, comprising:
one or more processors;
A storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 4.
7. A computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410350844.2A CN118101956A (en) | 2024-03-26 | 2024-03-26 | Video processing method, apparatus, electronic device and computer readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118101956A true CN118101956A (en) | 2024-05-28 |
Family
ID=91157032
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410350844.2A Pending CN118101956A (en) | 2024-03-26 | 2024-03-26 | Video processing method, apparatus, electronic device and computer readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118101956A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |