CN115348448B - Filter training method and device, electronic equipment and storage medium


Info

Publication number
CN115348448B
Authority
CN
China
Prior art keywords
video frame
reconstruction
coding tree
current video
filter
Prior art date
Legal status
Active
Application number
CN202211276714.6A
Other languages
Chinese (zh)
Other versions
CN115348448A (en)
Inventor
郭磊
黄跃
陈宇聪
闻兴
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202211276714.6A
Publication of CN115348448A
Application granted
Publication of CN115348448B
Legal status: Active

Classifications

    • H ELECTRICITY
        • H03 ELECTRONIC CIRCUITRY
            • H03H IMPEDANCE NETWORKS, e.g. RESONANT CIRCUITS; RESONATORS
                • H03H21/00 Adaptive networks
                    • H03H21/0012 Digital adaptive filters
                        • H03H21/0043 Adaptive algorithms
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
                    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
                        • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
                            • H04N19/117 Filters, e.g. for pre-processing or post-processing
                    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
                    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
                        • H04N19/96 Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The disclosure relates to a filter training method, a filter training apparatus, an electronic device and a storage medium. The method includes the following steps: encoding and reconstructing a first preset number of coding tree units in a current video frame to obtain a first preset number of first reconstructed coding tree units; in the case that a last associated video frame of the current video frame exists, obtaining a second preset number of second reconstructed coding tree units corresponding to the last associated video frame, the first reconstructed coding tree units and the second reconstructed coding tree units being located at different video frame coding positions; and performing filter training on a preset filter according to the first reconstructed coding tree units, the second reconstructed coding tree units and the current video frame, and determining that the filter training is completed in the case that the output result of the filter satisfies a preset filtering condition. With this method, the efficiency of video frame encoding is improved.

Description

Filter training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a filter training method and apparatus, an electronic device, and a storage medium.
Background
In the process of video encoding, the ALF (Adaptive Loop Filter) module is a post-processing module of a versatile video encoder (e.g., a VVC encoder) and is usually implemented as an adaptive filter based on wiener filtering.
The versatile video encoder encodes and reconstructs the video frames of a video one frame at a time. Only after every CTU (Coding Tree Unit) in the current video frame has been encoded, reconstructed and initially filtered does the ALF module train a filter, using the pixels of all encoded, reconstructed and initially filtered CTUs together with the original pixels of the original video frame, to obtain the trained filter. The trained filter is then used to perform ALF filtering on all encoded, reconstructed and initially filtered CTU pixels of the current video frame, yielding the final reconstructed video frame of the current video frame.
However, in the above approach to training the filter during video encoding, the ALF module must wait until the encoding, reconstruction and initial filtering of every CTU in the current video frame are complete before it can train the filter, which reduces the efficiency of the overall video frame encoding process.
Disclosure of Invention
The present disclosure provides a filter training method, an apparatus, an electronic device and a storage medium, to at least solve the problem in the related art that filter training cannot begin until all CTUs in the current video frame have been encoded and reconstructed. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a filter training method, the method including:
encoding and reconstructing a first preset number of coding tree units in a current video frame to obtain a first preset number of first reconstructed coding tree units;
under the condition that a last associated video frame of the current video frame exists, acquiring a second preset number of second reconstructed coding tree units corresponding to the last associated video frame; the first reconstruction coding tree unit and the second reconstruction coding tree unit are located at different video frame coding positions;
and performing filter training on a preset filter according to the first reconstructed coding tree units, the second reconstructed coding tree units and the current video frame, and determining that the filter training is completed in the case that the output result of the filter satisfies a preset filtering condition.
In one embodiment, after encoding and reconstructing the first preset number of coding tree units in the current video frame according to the preset encoding and reconstruction strategy to obtain the first preset number of first reconstructed coding tree units, the method further includes:
extracting the characteristics of the current video frame to obtain the target position characteristics of the current video frame;
according to the target position characteristics, determining whether matched position characteristics matched with the target position characteristics exist in the position characteristics corresponding to each video frame which is subjected to coding reconstruction;
and if the matched position characteristics exist, determining the video frame corresponding to the matched position characteristics as a last associated video frame of the current video frame.
In one embodiment, after encoding and reconstructing the first preset number of coding tree units in the current video frame according to the preset encoding and reconstruction strategy to obtain the first preset number of first reconstructed coding tree units, the method further includes:
and in the case that no last associated video frame of the current video frame exists, performing filter training on a preset filter according to the first preset number of first reconstructed coding tree units and the current video frame, and determining that the filter training is completed in the case that the output result of the filter satisfies a preset filtering condition.
In one embodiment, after encoding and reconstructing the first preset number of coding tree units in the current video frame according to the preset encoding and reconstruction strategy to obtain the first preset number of first reconstructed coding tree units, the method further includes:
according to a preset coding reconstruction strategy, coding reconstruction is carried out on other coding tree units except the first preset number in the current video frame, and all first reconstruction coding tree units corresponding to the current video frame are obtained;
and after the filter training is finished, filtering all the first reconstruction coding tree units according to the trained filter to obtain a filtered reconstruction pixel region corresponding to the current video frame.
In one embodiment, the filtering all the first reconstruction coding tree units according to the trained filter includes:
according to the trained filter and a preset filtering processing sequence, sequentially carrying out filtering processing on each first reconstruction coding tree unit corresponding to the current video frame;
after the filtering processing is performed on all the first reconstructed coding tree units according to the trained filter, the method further includes:
adding a processed identifier to the first reconstruction coding tree unit, wherein the processed identifier is used to trigger encoding and reconstruction of a target coding unit in a next to-be-processed video frame corresponding to the current video frame; the target coding unit is the coding unit in the next to-be-processed video frame whose video frame coding position is the same as that of the first reconstruction coding tree unit to which the processed identifier is added; and the next to-be-processed video frame is a video frame whose encoding and reconstruction depend on the reconstructed video frame of the current video frame.
In one embodiment, the obtaining a second preset number of second reconstructed coding tree units corresponding to a last associated video frame of the current video frame when the last associated video frame exists includes:
calculating the difference value between the number of all coding tree units contained in the current video frame and the first preset number, and determining a second preset number;
and under the condition that a last associated video frame of the current video frame exists, acquiring a second reconstructed coding tree unit in the last associated video frame according to the second preset number.
In one embodiment, the performing filter training on a preset filter according to the first reconstructed coding tree unit, the second reconstructed coding tree unit and the current video frame includes:
extracting a first reconstruction pixel feature from the first reconstruction coding tree unit, extracting a second reconstruction pixel feature from the second reconstruction coding tree unit, extracting an original pixel feature from the current video frame, and performing feature fusion on the first reconstruction pixel feature and the second reconstruction pixel feature to obtain a fused reconstruction pixel feature;
and performing filtering training on the filter according to the fusion reconstruction pixel characteristics and the original pixel characteristics.
According to a second aspect of embodiments of the present disclosure, there is provided a filter training apparatus, the apparatus comprising:
the encoding reconstruction unit is configured to perform encoding reconstruction on a first preset number of encoding tree units in a current video frame to obtain a first preset number of first reconstruction encoding tree units;
the obtaining unit is configured to obtain a second preset number of second reconstruction coding tree units corresponding to a last related video frame of the current video frame under the condition that the last related video frame exists; the first reconstruction coding tree unit and the second reconstruction coding tree unit are located at different video frame coding positions;
a training unit configured to perform filter training on a preset filter according to the first reconstruction coding tree unit, the second reconstruction coding tree unit and the current video frame, and determine that the filter training is completed if an output result of the filter satisfies a preset filtering condition.
In one embodiment, the apparatus further comprises:
the characteristic extraction unit is configured to perform characteristic extraction on the current video frame to obtain a target position characteristic of the current video frame;
a first determining unit, configured to determine whether there is a matching position feature matching the target position feature in the position features corresponding to the video frames after the encoding reconstruction is completed according to the target position feature;
a second determining unit configured to determine, if the matching position feature exists, a video frame corresponding to the matching position feature as a last associated video frame of the current video frame.
In one embodiment, the apparatus further comprises:
the training unit is configured to perform, in the absence of a last associated video frame of the current video frame, filter training on a preset filter according to the first preset number of first reconstruction coding tree units and the current video frame, and determine that the filter training is completed in the case that an output result of the filter satisfies a preset filtering condition.
In one embodiment, the apparatus further comprises:
the encoding reconstruction unit is configured to perform encoding reconstruction on other encoding tree units in the current video frame except the first preset number according to a preset encoding reconstruction strategy to obtain all the first reconstruction encoding tree units corresponding to the current video frame;
and the filtering processing unit is configured to perform filtering processing on all the first reconstruction coding tree units according to the trained filter after the filter training is completed, so as to obtain a filtered reconstruction pixel region corresponding to the current video frame.
In one embodiment, the filtering processing unit further includes:
the filtering processing subunit is configured to perform filtering processing on each first reconstruction coding tree unit corresponding to the current video frame in sequence according to the trained filter and a preset filtering processing sequence;
the device further comprises: an identification subunit configured to add a processed identifier to the first reconstruction coding tree unit, wherein the processed identifier is used to trigger encoding and reconstruction of a target coding unit in a next to-be-processed video frame corresponding to the current video frame; the target coding unit is the coding unit in the next to-be-processed video frame whose video frame coding position is the same as that of the first reconstruction coding tree unit to which the processed identifier is added; and the next to-be-processed video frame is a video frame whose encoding and reconstruction depend on the reconstructed video frame of the current video frame.
In one embodiment, the obtaining unit further includes:
a determining subunit configured to perform a difference calculation according to the number of all coding tree units included in the current video frame and the first preset number, and determine a second preset number;
an obtaining subunit, configured to, in a case where there is a previous associated video frame of the current video frame, obtain, according to the second preset number, a second reconstructed coding tree unit in the previous associated video frame.
In one embodiment, the training unit further comprises:
a feature extraction subunit, configured to perform extraction of a first reconstructed pixel feature in the first reconstructed coding tree unit, extraction of a second reconstructed pixel feature in the second reconstructed coding tree unit, and extraction of an original pixel feature in the current video frame, and perform feature fusion on the first reconstructed pixel feature and the second reconstructed pixel feature to obtain a fused reconstructed pixel feature;
a training subunit configured to perform filter training on the filter according to the fused reconstructed pixel features and the original pixel features.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the filter training method described in any one of the first aspect above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the filter training method according to any one of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, which when executed by a processor of an electronic device, enables the electronic device to perform the method of filter training of any one of the above-mentioned first aspects.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
according to a preset encoding and reconstruction strategy, a first preset number of coding tree units in a current video frame are encoded and reconstructed to obtain a first preset number of first reconstructed coding tree units; in the case that a last associated video frame of the current video frame exists, a second preset number of second reconstructed coding tree units corresponding to the last associated video frame are obtained, the first and second reconstructed coding tree units being located at different video frame coding positions; and a preset filter is trained according to the first reconstructed coding tree units, the second reconstructed coding tree units and the current video frame, the training being determined complete when the output result of the filter satisfies a preset filtering condition. With this method, the filter is trained using the first preset number of reconstructed coding tree units of the current video frame, the second preset number of reconstructed coding tree units of the associated video frame, and the current video frame itself; the time spent waiting for all coding tree units in the current video frame to finish encoding and reconstruction is reduced, and the encoding efficiency for video frames is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow chart illustrating a method of filter training in accordance with an exemplary embodiment.
Fig. 2 is a flowchart illustrating the step of determining an associated video frame according to an exemplary embodiment.
Fig. 3 is a flow diagram illustrating a filtering process resulting in filtered reconstructed video frames in accordance with an exemplary embodiment.
FIG. 4 is a flowchart illustrating the addition of a processed identification step in the filtering process according to an exemplary embodiment.
Fig. 5 is a schematic diagram illustrating an encoding process between video frames with association relationship according to an exemplary embodiment.
Fig. 6 is a schematic diagram illustrating an association relationship between video frames in a video frame sequence according to an exemplary embodiment.
FIG. 7 is a flowchart illustrating steps for obtaining a second reconstructed coding tree unit according to an example embodiment.
FIG. 8 is a flowchart illustrating filter training steps according to an exemplary embodiment.
FIG. 9 is a flowchart illustrating a method of filter training in accordance with an exemplary embodiment.
FIG. 10 is a block diagram illustrating a filter training apparatus in accordance with an exemplary embodiment.
FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be further noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Fig. 1 is a flowchart illustrating a filter training method according to an exemplary embodiment. As shown in fig. 1, the filter training method is used in an electronic device and includes the following steps.
In step S110, a first preset number of coding tree units in the current video frame are coded and reconstructed to obtain a first preset number of first reconstructed coding tree units.
In an implementation, a piece of video data may be viewed as a time-ordered sequence of video frames. In the process of encoding the video data, a versatile video encoder (for example, a VVC encoder) therefore encodes each video frame in the sequence frame by frame according to a preset encoding and reconstruction policy. Specifically, for the current video frame, the encoder divides the frame into M rows × L columns of coding tree units and then encodes and reconstructs a first preset number of them in raster-scan order (i.e., from left to right, from top to bottom). For example, if the first preset number is the number of coding tree units in the first N rows of the current video frame (N < M; by default the first N rows × L columns), the encoder encodes and reconstructs the coding tree units of the first N rows to obtain the first reconstructed coding tree units corresponding to those rows.
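By way of illustration, the following Python sketch partitions a frame into an M × L grid of coding tree units and collects the first N rows in raster-scan order. It is only a minimal sketch of the partitioning just described: the 128-pixel CTU size, the 512 × 512 frame and all function names are assumptions for illustration, not values taken from the disclosure.

```python
import numpy as np

CTU_SIZE = 128  # assumed CTU edge length (VVC allows up to 128x128 luma)

def partition_into_ctus(frame, ctu=CTU_SIZE):
    """Split a frame (H x W luma array) into an M-row x L-column grid of CTU views."""
    h, w = frame.shape
    m, l = h // ctu, w // ctu  # assumes frame dimensions are CTU-aligned
    return [[frame[r * ctu:(r + 1) * ctu, c * ctu:(c + 1) * ctu]
             for c in range(l)] for r in range(m)]

def first_preset_ctus(grid, n):
    """First preset number = the CTUs of the first N rows, in raster-scan order."""
    return [ctu for row in grid[:n] for ctu in row]

# usage: a 512x512 frame gives a 4x4 CTU grid; the first 2 rows (8 CTUs)
# are encoded and reconstructed early for filter training
frame = np.zeros((512, 512), dtype=np.uint8)
early = first_preset_ctus(partition_into_ctus(frame), n=2)
```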
Optionally, besides the VVC encoding standard, the video encoder may be based on other encoding standards that use an ALF module, such as AVS3; the embodiments of the present disclosure are not limited in this respect.
In step S120, in the case that a last associated video frame of the current video frame exists, a second preset number of second reconstructed coding tree units in the reconstructed video frame of the last associated video frame are obtained.
The first reconstruction coding tree unit and the second reconstruction coding tree unit are located at different video frame coding positions.
In implementation, any two video frames in a video sequence either have or do not have an association relationship. An association relationship between two video frames means that they belong to the same coding temporal level and that their temporal distance is the smallest. Moreover, of two associated video frames, by the time the current video frame is being encoded and reconstructed, the last associated video frame has necessarily finished the reconstruction of all of its coding units. For example, numbering the video frames in a video sequence and dividing the sequence into sub-sequences of 8 frames per video processing cycle gives A: {0, 1, 2, 3, 4, 5, 6, 7, 8}, B: {0, 1, 2, 3, 4, 5, 6, 7, 8}, C: {0, 1, 2, 3, 4, 5, 6, 7, 8}. Video frames with the same number in the nearest sub-sequences are associated; for example, video frame 4 in sub-sequence A is associated with video frame 4 in sub-sequence B, and video frame 4 in sub-sequence A is the last associated video frame of video frame 4 in sub-sequence B. By the time video frame 4 in sub-sequence B is being encoded and reconstructed, all coding units of video frame 4 in sub-sequence A have been encoded and reconstructed.
Therefore, while encoding the current video frame, it is necessary to query whether the current video frame has a last associated video frame. If it exists, the coding order between two associated video frames guarantees that the last associated video frame has already been encoded and reconstructed, so a second preset number of already reconstructed second reconstructed coding tree units corresponding to the last associated video frame of the current video frame are obtained.
To avoid duplication between the encoded reconstruction information characterized by the first reconstructed coding tree units and by the second reconstructed coding tree units, the second reconstructed coding tree units are selected so that their video frame coding positions differ from those of the first reconstructed coding tree units, with no overlapping portion. A video frame coding position is the position of a coding tree unit within the M rows × L columns of coding tree units into which the video frame is divided. For example, if the first reconstructed coding tree units correspond to the video frame coding positions of the first N rows × L columns of the current video frame, the selected second reconstructed coding tree units may correspond to the positions of the last M-N rows × L columns of the last associated video frame; the embodiments of the present disclosure are not limited thereto.
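Continuing that illustration, the complementary selection can be sketched as follows, assuming the first reconstructed coding tree units occupy the first N rows of the current frame and the second reconstructed coding tree units are taken from the last M-N rows of the last associated frame; the helper name and the list-of-lists layout are hypothetical.

```python
def select_training_regions(curr_grid, prev_grid, n):
    """Non-overlapping coding positions: rows [0, N) of the current frame's
    reconstruction plus rows [N, M) of the last associated frame's
    reconstruction cover every CTU position exactly once."""
    m, l = len(curr_grid), len(curr_grid[0])
    first = [ctu for row in curr_grid[:n] for ctu in row]    # N x L units
    second = [ctu for row in prev_grid[n:] for ctu in row]   # (M - N) x L units
    assert len(first) + len(second) == m * l                 # matches the second preset number
    return first, second
```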
In step S130, a filter training is performed on a preset filter according to the first reconstructed coding tree unit, the second reconstructed coding tree unit and the current video frame, and it is determined that the filter training is completed when an output result of the filter satisfies a preset filtering condition.
In implementation, the electronic device trains the preset filter using the first preset number of first reconstructed coding tree units obtained by encoding and reconstructing the current video frame, the second preset number of second reconstructed coding tree units obtained from the last associated video frame, and the current video frame itself. Two training methods are possible. In the first, the filter filters the first and second reconstructed coding tree units to produce an output result, namely the filtered reconstructed coding tree units (also called the filtered reconstructed pixels). The electronic device then computes the mean square error between the filtered reconstructed pixels and the original pixels of the current video frame and uses it as the condition for adjusting the filter parameters: if the output result does not satisfy the preset filtering condition, the filter parameters are adaptively adjusted according to the filter's own algorithm (ALF, adaptive loop filtering) and the first and second reconstructed coding tree units are filtered again with the adjusted parameters; when the output result satisfies the preset filtering condition, training is determined to be complete. In the second, according to a preset SSE (sum of squares due to error) matrix operation method, the mean square error between the filter's output and the original pixels of the current video frame is estimated; that is, the estimated mean square error is obtained without actually filtering the first and second reconstructed coding tree units. This mean square error then serves as the parameter-adjustment condition as above, and training is determined to be complete when the mean square error satisfies the preset filtering condition (for example, is smaller than a preset mean square error threshold).
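The first training variant can be sketched as a loop that filters, measures the mean square error against the original pixels, and adjusts the filter until the preset filtering condition is met. The sketch below is a deliberately simplified stand-in: it "adjusts parameters" by walking a small bank of 1-D candidate kernels, whereas the actual ALF adapts coefficients via wiener filtering, and the threshold value is an assumption.

```python
import numpy as np

def train_filter(recon, orig, mse_thresh=4.0):
    """Filter the reconstructed pixels, compute the MSE against the original
    pixels, and keep adjusting the filter until the output satisfies the
    preset filtering condition (MSE below the threshold)."""
    x = recon.astype(np.float64)
    y = orig.astype(np.float64)
    candidates = [
        np.array([0.00, 1.00, 0.00]),   # identity (no filtering)
        np.array([0.25, 0.50, 0.25]),   # light smoothing
        np.array([1/3, 1/3, 1/3]),      # stronger smoothing
    ]
    best, best_mse = candidates[0], float("inf")
    for coeff in candidates:            # the "parameter adjustment" loop
        out = np.convolve(x, coeff, mode="same")
        mse = float(np.mean((out - y) ** 2))
        if mse < best_mse:
            best, best_mse = coeff, mse
        if mse < mse_thresh:            # preset filtering condition satisfied
            return coeff, mse           # filter training is complete
    return best, best_mse               # otherwise keep the best candidate seen

# toy usage: noise around a smooth ramp favours the smoothing kernels
rng = np.random.default_rng(0)
orig_px = np.linspace(0.0, 255.0, 4096)
recon_px = orig_px + rng.normal(0.0, 3.0, orig_px.shape)
coeff, mse = train_filter(recon_px, orig_px)
```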
With the above filter training method, the electronic device trains a preset filter and determines that training is complete when the output result of the filter satisfies the preset filtering condition. Because the filter is trained using the first preset number of reconstructed coding tree units of the current video frame, the second preset number of reconstructed coding tree units of the associated video frame, and the current video frame itself, the time spent waiting for all coding tree units in the current video frame to finish encoding and reconstruction is reduced, and the encoding efficiency for video frames is improved.
In an exemplary embodiment, during the encoding of the current video frame, it is necessary to query whether the current video frame has a last associated video frame. As shown in fig. 2, after step S110, the method further includes the following steps:
step S210, feature extraction is carried out on the current video frame to obtain the target position feature of the current video frame.
In implementation, the electronic device performs feature extraction on the current video frame according to a preset feature extraction algorithm (for example, a Local Binary Pattern feature extraction algorithm) to obtain a target position feature of the current video frame.
Step S220, determining whether there is a matching position feature matching the target position feature in the position features of the video frames that have completed encoding reconstruction according to the target position feature.
In implementation, the electronic device performs feature extraction on each video frame that has completed encoding and reconstruction to obtain the position feature of each such video frame, matches the target position feature of the current video frame against each of these position features to obtain a matching degree for each, and then judges whether any matching degree satisfies a preset similarity condition. If the preset similarity condition is satisfied, it is determined that a matching position feature matching the target position feature of the current video frame exists. For example, the preset similarity condition may be a preset threshold, and the judgment proceeds as follows: if there is a position feature whose distance from the target position feature of the current video frame is smaller than the preset threshold, that position feature satisfies the preset similarity condition and is a matching position feature. If none of the position features satisfies the preset similarity condition with respect to the target position feature of the current video frame, it is determined that no matching position feature exists.
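A minimal sketch of this lookup, assuming a Euclidean distance between position features and an arbitrary threshold; the function name and the (frame id, feature) bookkeeping are hypothetical.

```python
import numpy as np

def find_last_associated_frame(target_feature, reconstructed, dist_thresh=0.1):
    """Scan the position features of frames that have completed encoding and
    reconstruction, in coding order, and return the most recent frame whose
    feature matches the target position feature (None if no match)."""
    match = None
    for frame_id, feature in reconstructed:      # [(frame_id, np.ndarray), ...]
        if np.linalg.norm(feature - target_feature) < dist_thresh:
            match = frame_id                     # keep the latest matching frame
    return match
```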
In step S230, if there is a matching position feature, determining the reconstructed video frame corresponding to the matching position feature as a previous associated video frame of the current video frame.
In implementation, if a matching position feature matching the target position feature of the current video frame exists, the electronic device determines the video frame corresponding to the matching position feature as an associated video frame of the current video frame. Because the query is performed among the video frames that have already completed encoding and reconstruction, the electronic device determines, based on the encoding order of the video frames, that this associated video frame is the last associated video frame of the current video frame.
In this embodiment, the target position feature of the current video frame is extracted and matched against the position feature of each video frame that has completed encoding and reconstruction to determine whether a matching position feature exists, and thereby whether a last associated video frame of the current video frame exists; the training data of the filter are then determined according to the result of this determination.
In an exemplary embodiment, after step S110, the filter training method further includes the following steps:
and under the condition that the last related video frame of the current video frame does not exist, performing filtering training on a preset filter according to a first preset number of first reconstruction coding tree units and the current video frame, and determining that the filter training is finished under the condition that the output result of the filter meets a preset filtering condition.
In implementation, if no last associated video frame of the current video frame exists, there is no encoded and reconstructed video frame whose coding information can be referred to before the current video frame. In this case, the electronic device trains the preset filter using only the current video frame and the part of it (for example, the first preset number of coding tree units) that has already been encoded and reconstructed, and determines that training is complete when the output result of the filter satisfies the preset filtering condition. The training process is similar to that of step S130, except that, because only the first preset number of first reconstructed coding tree units are input, the output result of the filter consists of the first preset number of filtered first reconstructed coding tree units; the mean square error is then computed between this output and the corresponding first preset number of original coding tree units of the current video frame, and serves as the criterion for judging whether the output result satisfies the preset filtering condition. The training process is therefore not described again here.
Optionally, the first preset number of reconstructed coding tree units that have completed encoding and reconstruction may be the first N rows × L columns of reconstructed coding tree units of the current video frame, or the reconstructed coding tree units from row N+1 to row N+H (× L columns), where N+H is less than M, or the first N rows × (L-1) columns of reconstructed coding tree units. That is, among the coding tree units of the current video frame that have completed encoding and reconstruction, the electronic device may select a partial region as the first preset number of first reconstructed coding tree units with which to train the preset filter.
In this embodiment, in the absence of a last associated video frame of the current video frame, the preset filter is trained using the part (the first preset number) of the current video frame that has completed encoding and reconstruction together with the current video frame, so there is no need to wait for all coding tree units in the current video frame to finish encoding and reconstruction before training the filter; the waiting time before training is reduced and the efficiency of video frame encoding is improved.
Optionally, in the case that no last associated video frame of the current video frame exists, another training approach may be adopted: all coding tree units of the current video frame are encoded and reconstructed to obtain all first reconstructed coding tree units, and the preset filter is then trained using all of the first reconstructed coding tree units and the current video frame. Once filter training for the current video frame is complete, all first reconstructed coding tree units of the current video frame (which together form its reconstructed video frame) are filtered with the trained filter, and the next to-be-processed video frame after the current video frame executes the filter training method of steps S110 to S130. The next to-be-processed video frame is a video frame whose encoding and reconstruction depend on the reconstructed video frame of the current video frame.
In an exemplary embodiment, while training the filter, the electronic device may synchronously complete the encoding and reconstruction of the coding tree units of the current video frame that have not yet been encoded and reconstructed. As shown in fig. 3, after step S110, the filter training method further includes the following steps:
in step S310, according to a preset encoding and reconstructing strategy, encoding and reconstructing other encoding tree units in the current video frame except for the first preset number, so as to obtain all first reconstructed encoding tree units corresponding to the current video frame.
In implementation, while performing the training process on the filter, the electronic device may continue, according to the preset encoding and reconstruction strategy, to encode and reconstruct the coding tree units of the current video frame beyond the first preset number, thereby obtaining the first reconstructed coding tree units corresponding to all coding tree units of the current video frame.
In step S320, after the filter training is completed, filtering all the first reconstruction coding tree units according to the trained filter, so as to obtain a filtered reconstruction pixel region corresponding to the current video frame.
In implementation, after the filter training is completed, the electronic device filters, with the trained filter, all first reconstructed coding tree units of the current video frame whose encoding and reconstruction have been completed, reducing the mean square error between the first reconstructed coding tree units and the original-pixel coding tree units of the current video frame and obtaining filtered coding tree units. When all first reconstructed coding tree units have been filtered, the resulting filtered coding tree units form the reconstructed pixel region corresponding to the current video frame. If the number of filtered coding tree units equals the total number of coding tree units contained in the video frame, the reconstructed pixel region constitutes the reconstructed video frame; if it is smaller, the reconstructed pixel region is a partial region of the reconstructed video frame.
In this embodiment, in the process of training the filter, the coding reconstruction processing on the remaining coding tree units in the current video frame is synchronously completed, and after the filter training is completed, the trained filter is applied to perform filtering processing on all the first reconstruction coding tree units of the current video frame, so as to obtain a filtered reconstruction pixel region, reduce coding processing delay, and improve the efficiency of coding processing on the video frame.
In an exemplary embodiment, since the video frames must be encoded in the order of their positions in the video sequence, an encoding processing policy may be set between the current video frame and its next to-be-processed video frame when the reconstructed coding tree units of the current video frame are filtered with the trained filter. As shown in fig. 4, in step S320, filtering all first reconstructed coding tree units according to the trained filter specifically includes the following steps:
step S321, sequentially filtering each first reconstructed coding tree unit corresponding to the current video frame according to the trained filter and a preset filtering processing sequence.
In implementation, the electronic device sequentially filters each first reconstructed coding tree unit corresponding to the current video frame according to the trained filter and a preset filtering order (for example, left to right and top to bottom within the video frame), obtaining the filtered coding tree unit for each reconstructed coding tree unit.
After step S320, the method further comprises:
step S322, adding the processed flag to the first reconstructed coding tree unit.
The processed identifier is used to trigger encoding and reconstruction of a target coding unit in the next to-be-processed video frame corresponding to the current video frame; the target coding unit is the coding unit in the next to-be-processed video frame whose video frame coding position is the same as that of the first reconstructed coding tree unit to which the processed identifier was added. The next to-be-processed video frame is a video frame whose encoding and reconstruction depend on the reconstructed video frame of the current video frame. That is, a dependency exists between the current video frame and the next to-be-processed video frame: the video encoding of the next to-be-processed video frame proceeds on the basis of the encoding result of the current video frame, so the two frames are encoded in accordance with the position of the current video frame in the video sequence.
In an implementation, each time the filter finishes filtering a first reconstructed coding tree unit, the processed identifier may be added to it. As shown in fig. 5, after the first reconstructed coding tree units of row 1 of the current video frame have been filtered, the processed identifier may be added to them. Triggered by the processed identifier, the video encoder encodes and reconstructs the coding tree units of row 1 of the next to-be-processed video frame. Thus, while the first reconstructed coding tree units of the next row of the current video frame (row 2) are being filtered, the electronic device can encode and reconstruct the coding tree units of row 1 of the next to-be-processed video frame; once the number of coding tree units of the next to-be-processed video frame that have completed encoding and reconstruction reaches the first preset number, the filter training method of steps S110 to S130 can in turn be performed on the next to-be-processed video frame.
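This row-level hand-off can be sketched as a producer/consumer pipeline in which putting a row index on a queue plays the role of the processed identifier. The queue mechanism, thread structure and row count are assumptions for illustration; the disclosure specifies only that the identifier triggers encoding of the co-located row in the dependent frame.

```python
import queue
import threading

processed = queue.Queue()  # carries "processed identifiers" between frames

def filter_current_frame(num_ctu_rows):
    """Filter the current frame's reconstructed CTUs row by row in the preset
    order, marking each finished row as processed."""
    for row in range(num_ctu_rows):
        # ... apply the trained ALF filter to CTU row `row` here ...
        processed.put(row)      # add the processed identifier for this row
    processed.put(None)         # sentinel: current frame fully filtered

def encode_next_frame():
    """Encode each CTU row of the next to-be-processed frame only once the
    co-located row of the current frame carries a processed identifier."""
    while (row := processed.get()) is not None:
        pass  # encode and reconstruct CTU row `row` of the dependent frame

producer = threading.Thread(target=filter_current_frame, args=(4,))
consumer = threading.Thread(target=encode_next_frame)
producer.start(); consumer.start()
producer.join(); consumer.join()
```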
Optionally, two video frames without an association relationship may be encoded in parallel. As shown in fig. 6, within one video frame sequence, video frame 1 and video frame 3 have no association relationship, and neither do video frame 2 and video frame 6; video frames 1 and 3, and likewise video frames 2 and 6, can therefore execute the video frame encoding process in parallel, i.e., the processing of steps S110 to S130 and of steps S310 to S320, improving the overall video frame processing efficiency.
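Frame-level parallelism for non-associated frames can likewise be sketched with a thread pool; the executor-based structure is an assumption, and encode_frame is only a placeholder for the per-frame pipeline of steps S110 to S130 and S310 to S320.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_frame(frame_id):
    """Placeholder for the whole per-frame pipeline applied to one frame."""
    return f"frame {frame_id} done"

# frames 1 and 3 share no association relationship (fig. 6), so their
# encoding pipelines may run in parallel; likewise frames 2 and 6
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(encode_frame, [1, 3]))
```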
In an exemplary embodiment, as shown in fig. 7, in step S120, in the case that there is a previous associated video frame of the current video frame, the specific processing procedure of obtaining a second preset number of second reconstructed coding tree units corresponding to the previous associated video frame includes:
step S710, performing a difference calculation according to the number of all coding tree units included in the current video frame and a first preset number, and determining a second preset number.
In implementation, the electronic device performs a difference calculation between the number of all coding tree units contained in the current video frame and the first preset number to determine the second preset number, i.e., the number of second reconstructed coding tree units that need to be obtained from the last associated video frame of the current video frame. Specifically, with (M × L) coding tree units in the current video frame, the electronic device calculates (M × L) - (N × L) to obtain the second preset number (M-N) × L.
Step S720, in the case that there is a previous associated video frame of the current video frame, acquiring a second reconstructed coding tree unit in the previous associated video frame according to a second preset number.
In implementation, when it is determined that a last associated video frame of the current video frame exists, the electronic device obtains, according to the calculated second preset number, that number of second reconstructed coding tree units from the last associated video frame; the first and second reconstructed coding tree units are located at different video frame coding positions.
In this embodiment, since the sum of the first preset number and the second preset number is equal to the number of all coding tree units in the current video frame, when the first reconstructed coding tree unit and the second reconstructed coding tree unit are used as training data of the filter, reconstruction coding information corresponding to all coding tree units in the current video frame is included, and the training accuracy of the filter is improved.
In an exemplary embodiment, as shown in fig. 8, in step S130, a specific process of performing filter training on a preset filter according to the first reconstructed coding tree unit, the second reconstructed coding tree unit and the current video frame includes:
step S810, extracting a first reconstruction pixel feature from the first reconstruction coding tree unit, extracting a second reconstruction pixel feature from the second reconstruction coding tree unit, extracting an original pixel feature from the current video frame, and performing feature fusion on the first reconstruction pixel feature and the second reconstruction pixel feature to obtain a fused reconstruction pixel feature.
In implementation, the electronic device extracts a first reconstructed pixel feature included in the first reconstructed coding tree unit, extracts a second reconstructed pixel feature included in the second reconstructed coding tree unit, and extracts an original pixel feature from the current video frame according to a preset feature extraction algorithm, and then the electronic device performs fusion processing on the first reconstructed pixel feature and the second reconstructed pixel feature to obtain a fused reconstructed pixel feature.
Step S820, performing filtering training on the filter according to the fusion reconstructed pixel feature and the original pixel feature of the current video frame.
In implementation, the electronic device may also perform feature extraction on the current video frame according to a preset feature extraction algorithm to obtain the original pixel feature corresponding to the current video frame, and then, according to a preset SSE (sum of squares due to error) matrix operation method and the fused reconstructed pixel feature, estimate the mean square error between the output result of the filter and the original pixels of the current video frame; that is, the estimated mean square error is obtained without actually filtering the fused reconstructed pixel feature. The electronic device then decides, according to the relationship between this mean square error and a mean square error threshold (also called the minimum mean square error value), whether to adaptively adjust the parameters of the filter, thereby implementing the training process.
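Assuming that the SSE matrix operation method refers to the standard wiener-filter normal equations, the estimate-without-filtering step can be made concrete as follows: accumulate an autocorrelation matrix and a cross-correlation vector, solve for the coefficients in closed form, and read off the minimum SSE directly. The shapes, tap count and synthetic data below are illustrative assumptions.

```python
import numpy as np

def sse_statistics(patches, targets):
    """Accumulate the statistics behind the SSE matrix method: the
    autocorrelation matrix R = X'X over reconstructed-pixel neighbourhoods
    and the cross-correlation p = X'y against the original pixels."""
    R = patches.T @ patches
    p = patches.T @ targets
    return R, p, float(targets @ targets)

def estimate_min_sse(R, p, yy):
    """Closed-form wiener solution c = R^{-1} p; the minimum SSE equals
    y'y - p'c, so the error is known without actually filtering."""
    c = np.linalg.solve(R, p)
    return c, yy - float(p @ c)

# toy usage on synthetic data (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 7))   # one 7-tap neighbourhood per pixel
y = X @ rng.normal(size=7) + 0.1 * rng.normal(size=1000)
coeff, sse = estimate_min_sse(*sse_statistics(X, y))
```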
In this embodiment, the filter is trained through the fusion reconstruction pixel characteristics between the first reconstruction coding tree unit and the second reconstruction coding tree unit and the original pixel characteristics of the current video frame, so that it is not necessary to wait for all coding tree units in the current video frame to complete coding reconstruction, and then train the filter, thereby reducing the waiting time before training and improving the coding processing efficiency of the video frame.
Optionally, besides fusing the first and second reconstructed pixel features in the feature domain, the fusion may be performed in the pixel domain: the first preset number of first reconstructed coding tree units and the second preset number of second reconstructed coding tree units are first spliced to obtain a spliced reconstructed video frame, and feature extraction is then performed on the spliced reconstructed video frame to obtain the fused reconstructed pixel feature.
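A minimal sketch of this pixel-domain splicing, assuming as before that the first reconstructed region is the top N CTU rows of the current frame's reconstruction and the remainder comes from the last associated frame's reconstruction; the 128-pixel CTU size and the frame size are assumed.

```python
import numpy as np

def splice_reconstructions(curr_recon, prev_recon, boundary_px):
    """Pixel-domain fusion: the first N CTU rows (boundary_px pixels) come
    from the current frame's reconstruction and the remaining rows from the
    last associated frame's reconstruction; features are then extracted
    from the spliced frame as a whole."""
    spliced = prev_recon.copy()
    spliced[:boundary_px] = curr_recon[:boundary_px]
    return spliced

# usage with N = 2 CTU rows of an assumed 128-pixel CTU size
curr = np.zeros((512, 512), dtype=np.uint8)
prev = np.full((512, 512), 255, dtype=np.uint8)
fused = splice_reconstructions(curr, prev, boundary_px=2 * 128)
```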
In one embodiment, as shown in fig. 9, there is provided a filter training method, including:
step S910, according to a preset encoding reconstruction policy, encoding and reconstructing a first preset number of encoding tree units in a current video frame to obtain a first preset number of first reconstructed encoding tree units.
Step S920, performing filter training on a preset filter according to a first preset number of first reconstructed coding tree units and the current video frame, and determining that the filter training is completed when an output result of the filter satisfies a preset filter condition.
It should be understood that, although the steps in the flowcharts of fig. 1 to 4 and fig. 7 to 9 are shown in a sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1 to 4 and fig. 7 to 9 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
It is understood that the same/similar parts between the embodiments of the method described above in this specification can be referred to each other, and each embodiment focuses on the differences from the other embodiments, and it is sufficient that the relevant points are referred to the descriptions of the other method embodiments.
FIG. 10 is a block diagram illustrating a filter training apparatus according to an exemplary embodiment. Referring to fig. 10, the apparatus 1000 includes an encoding reconstruction unit 1010, an obtaining unit 1020 and a training unit 1030.
The encoding reconstruction unit 1010 is configured to perform encoding reconstruction on a first preset number of encoding tree units in a current video frame according to a preset encoding reconstruction policy, so as to obtain a first reconstructed encoding tree unit with the first preset number.
The obtaining unit 1020 is configured to obtain a second preset number of second reconstructed coding tree units corresponding to a last associated video frame of the current video frame if the last associated video frame exists; the first reconstruction coding tree unit and the second reconstruction coding tree unit are located at different video frame coding positions.
The training unit 1030 is configured to perform filter training on a preset filter according to the first reconstructed coding tree units, the second reconstructed coding tree units, and the current video frame, and to determine that the filter training is completed when an output result of the filter satisfies a preset filtering condition.
In one embodiment, the apparatus 1000 further comprises:
the feature extraction unit is configured to perform feature extraction on the current video frame to obtain a target position feature of the current video frame;
a first determining unit, configured to determine, according to the target position feature, whether a matching position feature that matches the target position feature exists among the position features corresponding to the video frames for which encoding reconstruction has been completed;
a second determining unit, configured to determine, in a case where the matching position feature exists, the video frame corresponding to the matching position feature as the last associated video frame of the current video frame.
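The matching criterion used by the first determining unit is not fixed by the embodiment; the sketch below assumes each position feature is a global feature vector and compares it to the stored features by cosine similarity against an illustrative threshold.

```python
import numpy as np

def find_last_associated(target_feat, history, sim_threshold=0.95):
    """Return the frame id of the best match among already-reconstructed
    frames, or None if no match exists. `history` maps frame_id to a
    position feature vector. Cosine similarity and the 0.95 threshold
    are illustrative assumptions, not specified by the embodiment."""
    best_id, best_sim = None, sim_threshold
    for frame_id, feat in history.items():
        sim = float(np.dot(target_feat, feat) /
                    (np.linalg.norm(target_feat) * np.linalg.norm(feat) + 1e-12))
        if sim >= best_sim:
            best_id, best_sim = frame_id, sim
    return best_id
```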
In one embodiment, the apparatus 1000 further comprises:
the training unit is further configured to perform, in a case where no last associated video frame of the current video frame exists, filter training on the preset filter according to the first preset number of first reconstructed coding tree units and the current video frame, and to determine that the filter training is completed when the output result of the filter satisfies the preset filtering condition.
In one embodiment, the apparatus 1000 further comprises:
the encoding reconstruction unit is further configured to perform encoding reconstruction on the coding tree units in the current video frame other than the first preset number according to the preset encoding reconstruction policy, so as to obtain all the first reconstructed coding tree units corresponding to the current video frame;
and the filtering processing unit is configured to perform, after the filter training is completed, filtering processing on all the first reconstructed coding tree units according to the trained filter, so as to obtain a filtered reconstructed pixel region corresponding to the current video frame.
In one embodiment, the filtering processing unit further includes:
the filtering processing subunit is configured to perform filtering processing on each first reconstructed coding tree unit corresponding to the current video frame in sequence, according to the trained filter and a preset filtering processing order;
the identification subunit is configured to add a processed identifier to a first reconstructed coding tree unit after the filtered coding tree unit corresponding to that first reconstructed coding tree unit is obtained, where the processed identifier is used to trigger encoding reconstruction, at the same video frame coding position, in a next to-be-processed video frame corresponding to the current video frame; the next to-be-processed video frame is a video frame whose encoding reconstruction depends on the reconstructed video frame of the current video frame.
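A minimal sketch of this processed-identifier mechanism follows, assuming one flag per CTU position and a callback that releases encoding of the co-located coding tree unit in the next to-be-processed video frame; the class and method names are illustrative.

```python
class CtuScheduler:
    """Sketch: a processed flag per CTU position lets the next dependent
    frame start encoding a co-located CTU as soon as the reference CTU
    has been filtered, instead of waiting for the whole frame."""

    def __init__(self, num_ctus):
        self.processed = [False] * num_ctus

    def mark_filtered(self, pos, on_ready):
        self.processed[pos] = True   # add the processed identifier
        on_ready(pos)                # trigger the co-located CTU of the next frame

    def can_encode(self, pos):
        return self.processed[pos]   # is the reference CTU already filtered?
```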
In one embodiment, the obtaining unit 1020 further includes:
a determining subunit, configured to determine the second preset number by calculating the difference between the total number of coding tree units contained in the current video frame and the first preset number;
an obtaining subunit, configured to obtain, in a case where a last associated video frame of the current video frame exists, the second preset number of second reconstructed coding tree units in the last associated video frame.
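The difference calculation performed by the determining subunit reduces to the following, assuming the borrowed second reconstructed coding tree units fill exactly the positions not covered by the first preset number:

```python
def second_preset_number(total_ctus, first_preset_number):
    """Difference rule of the determining subunit: the borrowed second
    reconstructed CTUs fill exactly the positions of the current frame
    that have not yet been encoded and reconstructed."""
    return total_ctus - first_preset_number

# e.g. a frame of 120 CTUs with 45 already reconstructed borrows 75
# co-positioned CTUs from the last associated video frame.
assert second_preset_number(120, 45) == 75
```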
In one embodiment, the training unit 1030 further comprises:
a feature extraction subunit, configured to extract a first reconstructed pixel feature from the first reconstructed coding tree units, extract a second reconstructed pixel feature from the second reconstructed coding tree units, extract an original pixel feature from the current video frame, and perform feature fusion on the first reconstructed pixel feature and the second reconstructed pixel feature to obtain a fused reconstructed pixel feature;
a training subunit, configured to perform filter training on the filter according to the fused reconstructed pixel feature and the original pixel feature.
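A sketch of the feature-domain path through these subunits is given below. The embodiment does not fix the fusion operator or the extractor; channel concatenation after a shared convolutional extractor is one plausible reading, with all layer shapes assumed for illustration.

```python
import torch
import torch.nn as nn

extract = nn.Conv2d(1, 8, 3, padding=1)   # shared feature extractor (assumed)
head = nn.Conv2d(16, 8, 3, padding=1)     # filter head over the fused features

def fused_feature_loss(first_ctus, second_ctus, orig_ctus):
    """first_ctus, second_ctus, orig_ctus: (N, 1, H, W) luma batches.
    Channel concatenation stands in for the unspecified fusion operator."""
    f1 = extract(first_ctus)               # first reconstructed pixel feature
    f2 = extract(second_ctus)              # second reconstructed pixel feature
    fused = torch.cat([f1, f2], dim=1)     # fused reconstructed pixel feature
    target = extract(orig_ctus)            # original pixel feature
    return nn.functional.mse_loss(head(fused), target)
```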
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 11 is a block diagram illustrating an electronic device 1100 for a filter training method in accordance with an example embodiment. For example, the electronic device 1100 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a gaming console, a tablet device, a medical device, a fitness device, a personal digital assistant, and so forth.
Referring to fig. 11, electronic device 1100 may include one or more of the following components: processing component 1102, memory 1104, power component 1106, multimedia component 1108, audio component 1110, input/output (I/O) interface 1112, sensor component 1114, and communications component 1116.
The processing component 1102 generally controls the overall operation of the electronic device 1100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1102 may include one or more processors 1120 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 1102 may include one or more modules that facilitate interaction between the processing component 1102 and other components. For example, the processing component 1102 may include a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.
The memory 1104 is configured to store various types of data to support operations at the electronic device 1100. Examples of such data include instructions for any application or method operating on the electronic device 1100, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1104 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, optical disk, or graphene memory.
The power component 1106 provides power to the various components of the electronic device 1100. The power component 1106 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1100.
The multimedia component 1108 includes a screen that provides an output interface between the electronic device 1100 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 1108 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 1100 is in an operation mode, such as a photographing mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 1110 is configured to output and/or input audio signals. For example, the audio component 1110 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1100 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1104 or transmitted via the communication component 1116. In some embodiments, audio component 1110 further includes a speaker for outputting audio signals.
The I/O interface 1112 provides an interface between the processing component 1102 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1114 includes one or more sensors for providing status assessments of various aspects of the electronic device 1100. For example, the sensor assembly 1114 may detect the open/closed state of the electronic device 1100 and the relative positioning of components, such as the display and keypad of the electronic device 1100. The sensor assembly 1114 may also detect a change in position of the electronic device 1100 or of a component of the electronic device 1100, the presence or absence of user contact with the electronic device 1100, the orientation or acceleration/deceleration of the electronic device 1100, and a change in temperature of the electronic device 1100. The sensor assembly 1114 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 1114 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1114 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1116 is configured to facilitate wired or wireless communication between the electronic device 1100 and other devices. The electronic device 1100 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 1116 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1116 also includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components, for performing the above-described methods.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory 1104 comprising instructions, executable by the processor 1120 of the electronic device 1100 to perform the method described above is also provided. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which includes instructions executable by the processor 1120 of the electronic device 1100 to perform the above-described method.
It should be noted that the apparatus, electronic device, computer-readable storage medium, computer program product, and the like described above in accordance with the method embodiments may also have other implementations; for their specific implementations, reference may be made to the descriptions of the related method embodiments, which are not repeated here.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for filter training, the method comprising:
encoding and reconstructing a first preset number of coding tree units in a current video frame to obtain a first preset number of first reconstructed coding tree units;
performing feature extraction on the current video frame to obtain a target position feature of the current video frame;
determining, according to the target position feature, whether a matching position feature that matches the target position feature exists among the position features corresponding to the video frames for which encoding reconstruction has been completed;
if the matching position feature exists, determining the video frame corresponding to the matching position feature as a last associated video frame of the current video frame;
under the condition that a last associated video frame of the current video frame exists, acquiring a second preset number of second reconstructed coding tree units corresponding to the last associated video frame; the first reconstruction coding tree unit and the second reconstruction coding tree unit are located at different video frame coding positions;
and performing filter training on a preset filter according to the first reconstructed coding tree units, the second reconstructed coding tree units, and the current video frame, and determining that the filter training is completed under the condition that an output result of the filter satisfies a preset filtering condition.
2. The method of claim 1, wherein after the encoding and reconstructing the first preset number of coding tree units in the current video frame to obtain the first preset number of first reconstructed coding tree units, the method further comprises:
and under the condition that no last associated video frame of the current video frame exists, performing filter training on the preset filter according to the first preset number of first reconstructed coding tree units and the current video frame, and determining that the filter training is completed under the condition that the output result of the filter satisfies the preset filtering condition.
3. The method of claim 1, wherein after the encoding and reconstructing the first preset number of coding tree units in the current video frame to obtain the first preset number of first reconstructed coding tree units, the method further comprises:
performing, according to a preset encoding reconstruction policy, encoding reconstruction on the coding tree units in the current video frame other than the first preset number, to obtain all first reconstructed coding tree units corresponding to the current video frame;
and after the filter training is finished, filtering all the first reconstruction coding tree units according to the trained filter to obtain a filtered reconstruction pixel region corresponding to the current video frame.
4. The filter training method according to claim 3, wherein the filtering all the first reconstructed coding tree units according to the trained filter comprises:
according to the trained filter and a preset filtering processing sequence, sequentially carrying out filtering processing on each first reconstruction coding tree unit corresponding to the current video frame;
after the filtering processing is performed on all the first reconstructed coding tree units according to the trained filter, the method further includes:
adding a processed identifier to the first reconstruction coding tree unit, where the processed identifier is used to trigger coding and reconstruction of a target coding unit in a next to-be-processed video frame corresponding to the current video frame, and the target coding unit is a coding unit in the next to-be-processed video frame at the same video frame coding position as that of the first reconstruction coding tree unit to which the processed identifier is added; and the next video frame to be processed is a video frame which is coded and reconstructed depending on the reconstructed video frame of the current video frame.
5. The method according to claim 1, wherein the obtaining, under the condition that a last associated video frame of the current video frame exists, a second preset number of second reconstructed coding tree units corresponding to the last associated video frame comprises:
determining the second preset number by calculating the difference between the number of all coding tree units contained in the current video frame and the first preset number;
and under the condition that a last associated video frame of the current video frame exists, acquiring a second reconstructed coding tree unit in the last associated video frame according to the second preset number.
6. The method according to claim 1, wherein the filter training of the preset filter according to the first reconstructed coding tree unit, the second reconstructed coding tree unit and the current video frame comprises:
extracting a first reconstruction pixel feature from the first reconstruction coding tree unit, extracting a second reconstruction pixel feature from the second reconstruction coding tree unit, extracting an original pixel feature from the current video frame, and performing feature fusion on the first reconstruction pixel feature and the second reconstruction pixel feature to obtain a fused reconstruction pixel feature;
and performing filter training on the filter according to the fused reconstructed pixel feature and the original pixel feature.
7. A filter training apparatus, the apparatus comprising:
the encoding reconstruction unit is configured to perform encoding reconstruction on a first preset number of encoding tree units in a current video frame to obtain a first preset number of first reconstruction encoding tree units;
the feature extraction unit is configured to perform feature extraction on the current video frame to obtain a target position feature of the current video frame;
a first determining unit, configured to determine, according to the target position feature, whether a matching position feature that matches the target position feature exists among the position features corresponding to the video frames for which encoding reconstruction has been completed;
a second determining unit, configured to determine, if the matching position feature exists, a video frame corresponding to the matching position feature as a last associated video frame of the current video frame;
the obtaining unit is configured to obtain, under the condition that a last associated video frame of the current video frame exists, a second preset number of second reconstructed coding tree units corresponding to the last associated video frame;
a training unit configured to perform filter training on a preset filter according to the first reconstruction coding tree unit, the second reconstruction coding tree unit and the current video frame, and determine that the filter training is completed if an output result of the filter satisfies a preset filtering condition.
8. The filter training apparatus according to claim 7, wherein the encoding reconstruction unit is further configured to perform encoding reconstruction on the coding tree units in the current video frame other than the first preset number according to a preset encoding reconstruction policy, so as to obtain all the first reconstructed coding tree units corresponding to the current video frame;
and the apparatus further comprises a filtering processing unit configured to perform, after the filter training is completed, filtering processing on all the first reconstructed coding tree units according to the trained filter, so as to obtain a filtered reconstructed pixel region corresponding to the current video frame.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the filter training method of any one of claims 1 to 6.
10. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the filter training method of any one of claims 1-6.
CN202211276714.6A 2022-10-19 2022-10-19 Filter training method and device, electronic equipment and storage medium Active CN115348448B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211276714.6A CN115348448B (en) 2022-10-19 2022-10-19 Filter training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115348448A CN115348448A (en) 2022-11-15
CN115348448B true CN115348448B (en) 2023-02-17

Family

ID=83957278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211276714.6A Active CN115348448B (en) 2022-10-19 2022-10-19 Filter training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115348448B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110291791A (en) * 2018-04-02 2019-09-27 北京大学 Method and apparatus for coding and decoding video
CN110971915A (en) * 2018-09-28 2020-04-07 杭州海康威视数字技术股份有限公司 Filtering method and device
CN111711824A (en) * 2020-06-29 2020-09-25 腾讯科技(深圳)有限公司 Loop filtering method, device and equipment in video coding and decoding and storage medium
CN112514401A (en) * 2020-04-09 2021-03-16 北京大学 Method and device for loop filtering
CN114208203A (en) * 2019-09-20 2022-03-18 英特尔公司 Convolutional neural network loop filter based on classifier

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529776B (en) * 2019-09-19 2023-04-07 中移(苏州)软件技术有限公司 Training method of image processing model, image processing method and device
US11902561B2 (en) * 2020-04-18 2024-02-13 Alibaba Group Holding Limited Convolutional-neutral-network based filter for video coding

Similar Documents

Publication Publication Date Title
CN110708559B (en) Image processing method, device and storage medium
CN105095881B (en) Face recognition method, face recognition device and terminal
CN107948510B (en) Focal length adjusting method and device and storage medium
CN110650370A (en) Video coding parameter determination method and device, electronic equipment and storage medium
CN107341509B (en) Convolutional neural network training method and device and readable storage medium
CN112188091B (en) Face information identification method and device, electronic equipment and storage medium
CN114025105A (en) Video processing method and device, electronic equipment and storage medium
CN107507128B (en) Image processing method and apparatus
CN112634160A (en) Photographing method and device, terminal and storage medium
CN108154093B (en) Face information identification method and device, electronic equipment and machine-readable storage medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN115052150A (en) Video encoding method, video encoding device, electronic equipment and storage medium
CN110796012B (en) Image processing method and device, electronic equipment and readable storage medium
CN109120929B (en) Video encoding method, video decoding method, video encoding device, video decoding device, electronic equipment and video encoding system
CN110611820A (en) Video coding method and device, electronic equipment and storage medium
CN112734627B (en) Training method of image style migration model, image style migration method and device
CN111953980B (en) Video processing method and device
CN115348448B (en) Filter training method and device, electronic equipment and storage medium
CN114549327A (en) Video super-resolution method, device, electronic equipment and storage medium
CN110751223B (en) Image matching method and device, electronic equipment and storage medium
CN114140389A (en) Video detection method and device, electronic equipment and storage medium
CN115641269A (en) Image repairing method and device and readable storage medium
CN112565763A (en) Abnormal image sample generation method and device, and image detection method and device
CN112861592A (en) Training method of image generation model, image processing method and device
CN115937247B (en) Method, apparatus and storage medium for object tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant