Background
In recent years, with the rapid development of internet and multimedia technologies, video signals have become widely used in daily life because they carry a large amount of information and convey it intuitively and efficiently. However, video information leaks are not rare in real life. Leaked private information can cause great loss and harm to victims if it is used illegally, and some leaked information even involves trade secrets and national security. Video encryption is therefore of great significance for safeguarding the information security of enterprises and countries.
HEVC, the new-generation video coding standard, offers excellent compression performance and good network adaptability. Compared with the previous-generation H.264 standard, it can reduce the video bit rate by about 50% at the same video quality. Owing to this performance, the HEVC standard has been widely applied in video surveillance, online conferencing and similar fields, so research on HEVC-based video encryption has great application value.
At present, most encryption methods for HEVC (High Efficiency Video Coding) encrypt the full frame without distinguishing its content. Full-frame encryption generally has high computational complexity and a relatively long encryption time. In some application scenarios, encrypting only the region of interest of the video suits the requirements better. For example, a video surveillance system with a privacy protection function must protect private information that reveals personal identity while still allowing the behavior of monitored persons to be recognized. It is therefore of great practical significance to design a video region-of-interest encryption method based on the HEVC standard.
Most current methods for encrypting video regions of interest are designed for the H.264 standard and rely on its flexible macroblock ordering (FMO) mechanism. The HEVC coding standard, however, does not use FMO; it introduces a recursive quadtree coding structure, and the design of its entropy coding part also differs greatly from that of previous coding standards. Region-of-interest encryption methods designed for H.264 therefore cannot be used for HEVC-coded video. Pixel-domain encryption methods can be combined well with different coding standards, but they break the image correlation of the original video region, so the bit rate increases during subsequent coding, and the original information is difficult to recover after the ciphertext of the region of interest has been lossy-coded.
Disclosure of Invention
In view of the problems in the prior art, the technical problem to be solved by the invention is to provide a region-of-interest-based HEVC video encryption method that can encrypt the region of interest in HEVC video in real time. The encrypted video shows only a small bit-rate increase, the encryption process has low computational complexity, and the result is format compatible: after region-of-interest encryption, the video can be played directly with a standard HEVC decoder.
In order to achieve the purpose, the specific technical scheme of the invention is as follows: an HEVC video encryption method based on a region of interest comprises the following steps:
1) initializing the to-be-encrypted Tile index variable TileEncryption_idx to 9, the to-be-encrypted Tile index variable TileEncryption_idx2 to 9, and the full-frame encryption flag variable FrameEncryption_flag to 0;
2) reading a frame in an original YUV video sequence to be coded in sequence, and defining the frame as a current frame; wherein the frame height is recorded as FrameHeight, and the frame width is recorded as FrameWidth;
3) judging whether the current frame has a region of interest; if so, going to step 4); otherwise, going to step 7);
4) carrying out Tile division on the current frame according to the position of the region of interest;
5) determining the Tile indices to be encrypted, i.e. the index variable TileEncryption_idx is set to the index of the Tile unit in which the CTU $G(m_1, m_2)$ is located, and the index variable TileEncryption_idx2 is set to the index of the Tile unit in which the CTU $G(n_1, n_2)$ is located;
6) determining whether the coding of the current frame requires a new picture parameter set (PPS);
7) judging whether the influence of error drift needs to be considered when the current frame is coded, namely judging whether the current frame is a non-I frame and whether any frame among the IDR frame closest to the current frame and the frames following that IDR frame has undergone region-of-interest encryption; if so, going to the next step; otherwise, going to step 9);
8) in the motion estimation and motion prediction stages of HEVC coding, restricting the motion estimation and MV prediction modes of the Tile units in the non-encrypted region;
9) in the entropy coding stage, encrypting some of the syntax elements of the selected Tile, wherein the syntax elements include: coeff_abs_level_remaining, indicating the remaining part of the absolute value of the transform coefficient level; mvd_sign_flag, indicating the sign bit of the motion vector difference; coeff_sign_flag, indicating the sign bit of the transform coefficient; and mvp index, indicating the index of the candidate predicted motion vector;
10) ending the coding of the current frame and outputting a current frame code stream;
11) judging whether all the frames are processed, if so, turning to the step 12), otherwise, turning to the step 1);
12) the video encoding is ended.
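The drift check of step 7) can be sketched in Python as follows. This is only an illustration under one possible reading of the step: the current frame must be a non-I frame, and at least one frame from the closest preceding IDR frame up to and including the current frame must have been region-of-interest encrypted. The function name `needs_drift_control`, the frame-type labels and the per-frame boolean list are hypothetical and are not taken from the method or from any encoder API.

```python
def needs_drift_control(frame_types, roi_encrypted, k):
    """Return True if error drift must be considered for frame k.

    frame_types   -- list of "IDR", "I", "P" or "B", one per frame (assumed labels)
    roi_encrypted -- list of bools, True where a frame was ROI-encrypted
    k             -- index of the current frame
    """
    if frame_types[k] in ("IDR", "I"):
        return False  # step 7) only applies to non-I frames
    # Walk back to the closest IDR frame (fall back to frame 0 if none is found).
    start = 0
    for t in range(k, -1, -1):
        if frame_types[t] == "IDR":
            start = t
            break
    # Assumed reading: any encrypted frame from that IDR up to the current frame.
    return any(roi_encrypted[start:k + 1])


types = ["IDR", "P", "P", "P", "IDR", "P"]
encrypted = [False, False, True, False, False, False]
print([needs_drift_control(types, encrypted, k) for k in range(len(types))])
# [False, False, True, True, False, False]
```

In this example the second IDR frame resets the check, so the last P frame needs no restriction even though an earlier frame was encrypted.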
Further, the Tile partitioning of the current frame in the step 4) includes the following steps:
4.1) dividing the current frame into a plurality of non-overlapping coding tree units (CTUs) of 64x64 pixels, wherein the CTU in the i-th column and j-th row is denoted by $G(i, j)$;
4.2) expanding the region of interest into a horizontal rectangular Area: determining that the set of CTUs covered by the region of interest is $G(x_1, y_1), G(x_2, y_2), \ldots, G(x_n, y_n)$, and letting $m_1 = \min\{x_1, x_2, \ldots, x_n\}$, $m_2 = \min\{y_1, y_2, \ldots, y_n\}$, $n_1 = \max\{x_1, x_2, \ldots, x_n\}$, $n_2 = \max\{y_1, y_2, \ldots, y_n\}$; the upper-left corner CTU of the horizontal rectangular Area is then $G(m_1, m_2)$ and the lower-right corner CTU is $G(n_1, n_2)$;
4.3) judging whether the horizontal rectangular Area meets the Tile dividing condition, if not, turning to the next step, otherwise, turning to the step 4.5); the Tile dividing condition is that the horizontal distance from the left and right boundaries of a horizontal rectangular Area to the boundary of a video frame is more than or equal to 4 CTUs, and the length of the horizontal rectangular Area in the horizontal direction is more than or equal to 4 CTUs;
4.4) adjusting the horizontal rectangle Area;
4.5) carrying out Tile partition on the current frame according to a horizontal rectangle Area.
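Steps 4.1) to 4.3) can be illustrated with a short Python sketch. It assumes that the region of interest is given as a pixel box with inclusive corners, that CTU indices are 1-based, that the distance from the Area to a frame boundary is counted in whole CTU columns, and that the number of CTU columns is the frame width divided by 64 and rounded up. The helper names `covered_ctus`, `bounding_ctu_rect` and `meets_tile_condition` are hypothetical.

```python
from math import ceil

CTU_SIZE = 64  # CTU edge length in pixels, as in step 4.1)


def covered_ctus(x0, y0, x1, y1):
    """1-based (column, row) indices of all CTUs touched by a pixel box."""
    cols = range(x0 // CTU_SIZE + 1, x1 // CTU_SIZE + 2)
    rows = range(y0 // CTU_SIZE + 1, y1 // CTU_SIZE + 2)
    return [(i, j) for j in rows for i in cols]


def bounding_ctu_rect(ctus):
    """Step 4.2): return (m1, m2, n1, n2) for a set of CTUs G(x, y)."""
    xs = [x for x, _ in ctus]
    ys = [y for _, y in ctus]
    return min(xs), min(ys), max(xs), max(ys)


def meets_tile_condition(m1, n1, frame_width):
    """Step 4.3): horizontal Tile dividing condition, counted in CTUs."""
    cols = ceil(frame_width / CTU_SIZE)  # assumed number of CTU columns
    left_gap = m1 - 1       # CTU columns to the left of the Area
    right_gap = cols - n1   # CTU columns to the right of the Area
    width = n1 - m1 + 1     # Area width in CTUs
    return left_gap >= 4 and right_gap >= 4 and width >= 4


m1, m2, n1, n2 = bounding_ctu_rect(covered_ctus(700, 300, 900, 500))
print((m1, m2, n1, n2), meets_tile_condition(m1, n1, 1920))
# (11, 5, 15, 8) True
```

A box close to the left frame edge, for example `covered_ctus(100, 100, 150, 150)`, fails the condition and would be passed on to the adjustment of step 4.4).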
Further, the adjustment of the horizontal rectangular Area in the above step 4.4) is realized by adjusting the positions of the points $G(m_1, m_2)$ and $G(n_1, n_2)$, and comprises the following steps:
4.4.1) judging whether $m_1$ satisfies [formula missing]; if false, going to the next step; otherwise, letting $m_1 = 1$ and going to the next step;
4.4.2) judging whether $n_1$ is greater than [formula missing]; if false, going to the next step; otherwise, letting [formula missing] and going to the next step;
4.4.3) judging whether $n_1 - m_1$ is less than 3; if true, going to the next step; otherwise, going to step 4.5);
4.4.4) judging whether [formula missing] is greater than $3 - (n_1 - m_1)$; if false, going to the next step; otherwise, letting $n_1 = m_1 + 3$ and going to step 4.5);
Further, the Tile partitioning of the current frame according to the horizontal rectangular Area in the step 4.5) includes the following steps:
4.5.1) judging whether the conditions that $m_1$ is equal to 1, $m_2$ is equal to 1, $n_1$ is equal to [formula missing] and $n_2$ is equal to [formula missing] are all satisfied; if false, going to the next step; otherwise, letting FrameEncryption_flag be equal to 1 and going to step 6);
4.5.2) judging whether $m_1$ is equal to 1 and $n_1$ is equal to [formula missing]; if false, going to the next step; otherwise, setting the boundary line on which the left boundary of [formula missing] is located as a Tile column boundary, and going to step 4.5.4);
4.5.3) judging whether $m_2$ is equal to 1 and $n_2$ is equal to [formula missing]; if false, going to the next step; otherwise, setting the boundary line on which the upper boundary of [formula missing] is located as a Tile row boundary, and going to the next step;
4.5.4) judging whether $m_1$ is equal to 1; if true, going to the next step; otherwise, setting the boundary line on which the left boundary of the horizontal rectangular Area is located as a Tile column boundary, and going to the next step;
4.5.5) judging whether $m_2$ is equal to 1; if true, going to the next step; otherwise, setting the boundary line on which the upper boundary of the horizontal rectangular Area is located as a Tile row boundary, and going to the next step;
4.5.6) judging whether $n_1$ is equal to [formula missing]; if true, going to the next step; otherwise, setting the boundary line on which the right boundary of the horizontal rectangular Area is located as a Tile column boundary, and going to the next step;
4.5.7) judging whether $n_2$ is equal to [formula missing]; if true, going to the next step; otherwise, setting the boundary line on which the lower boundary of the horizontal rectangular Area is located as a Tile row boundary.
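The boundary placement of steps 4.5.4) to 4.5.7), together with the Tile index lookup of step 5), might look like the following Python sketch. Two points are assumptions rather than statements of the method: the values against which $n_1$ and $n_2$ are compared are taken to be the numbers of CTU columns and rows of the frame (`cols`, `rows`), and Tiles are numbered from 0 in raster-scan order. The function names are hypothetical.

```python
from bisect import bisect_right


def tile_boundaries(m1, m2, n1, n2, cols, rows):
    """Return (col_starts, row_starts): 1-based CTU indices where Tiles begin."""
    col_starts = [1]
    if m1 != 1:
        col_starts.append(m1)       # left boundary of the Area, step 4.5.4)
    if n1 != cols:
        col_starts.append(n1 + 1)   # right boundary of the Area, step 4.5.6)
    row_starts = [1]
    if m2 != 1:
        row_starts.append(m2)       # upper boundary of the Area, step 4.5.5)
    if n2 != rows:
        row_starts.append(n2 + 1)   # lower boundary of the Area, step 4.5.7)
    return col_starts, row_starts


def tile_index(i, j, col_starts, row_starts):
    """Raster-scan Tile index (0-based, assumed) of the CTU G(i, j)."""
    tile_col = bisect_right(col_starts, i) - 1
    tile_row = bisect_right(row_starts, j) - 1
    return tile_row * len(col_starts) + tile_col


col_starts, row_starts = tile_boundaries(11, 5, 15, 8, cols=30, rows=17)
print(col_starts, row_starts)                       # [1, 11, 16] [1, 5, 9]
print(tile_index(11, 5, col_starts, row_starts))    # 4
print(tile_index(15, 8, col_starts, row_starts))    # 4
```

For an Area in the interior of the frame this yields a 3x3 Tile grid in which both corner CTUs of the Area fall into the central Tile.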
Further, determining in the above step 6) whether the coding of the current frame requires a new picture parameter set (PPS) comprises the following steps:
6.1) because the region of interest normally changes little from frame to frame, and in order to reduce how often the PPS is set during coding, judging whether the Tile division of the current frame is the same as that of the previous frame; if so, going to step 6.2); otherwise, going to step 6.3);
6.2) not resetting PPS, the encoder uses the Tile division of the previous frame, and then the step 7) is carried out;
6.3) set the PPS using the Tile partition of the current frame.
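The saving obtained by steps 6.1) to 6.3) can be illustrated with a small Python sketch that counts how often a new PPS would be set over a sequence of frames. Representing a Tile division as a pair of tuples (column starts, row starts) and the function name `count_pps_updates` are assumptions made only for this example.

```python
def count_pps_updates(partitions):
    """Count how often a new PPS is set for a sequence of Tile divisions."""
    updates = 0
    previous = None
    for current in partitions:
        if current != previous:
            # step 6.3): the division changed, so a new PPS is set
            updates += 1
            previous = current
        # step 6.2): otherwise the previous PPS and Tile division are kept
    return updates


a = ((1, 11, 16), (1, 5, 9))   # hypothetical division for frames 0 to 2
b = ((1, 12, 17), (1, 5, 9))   # the Area moved one CTU column to the right
print(count_pps_updates([a, a, a, b, b]))  # 2
```

Five frames need only two PPS updates here, instead of one per frame.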
Further, the motion estimation and MV prediction mode for limiting Tile units in the non-encryption region in step 8) includes the following steps:
8.1) restricting the motion estimation of the Tile units in the non-encrypted region: during motion estimation of P frames and B frames, judging whether all or some of the pixels of the reference block, obtained by offsetting the current block of the current coding frame by the motion vector found in the motion estimation search, are located in the Tile unit corresponding to the encrypted region; if so, going to the next step; otherwise, going to step 8.3);
8.2) setting the RDCost (rate-distortion cost) corresponding to the motion vector to the maximum selectable value, so that the encoder cannot select the motion vector as the optimal motion vector, and going to step 8.4);
8.3) using the original rate-distortion cost model of the HEVC encoder for the motion vector, calculated as: $\mathrm{RDCost} = J + \lambda R$,
wherein $J$ represents the estimation error produced by the current motion vector, $R$ represents the number of bits required to code the motion information, and $\lambda$ represents the scaling coefficient between distortion and number of bits, namely the Lagrange factor;
8.4) restricting the MV prediction mode of the Tile units in the non-encrypted region: in the MV prediction stage, the HEVC encoder builds an MVP candidate list, examines the candidate MVs in the list, and determines whether the prediction block corresponding to each predicted motion vector lies wholly or partly in the Tile unit corresponding to the region of interest; if true, going to the next step; otherwise, going to step 8.6);
8.5) setting the RDCost corresponding to the predicted motion vector to the maximum selectable value, so that the predicted motion vector cannot be selected, and going to step 9);
8.6) using the original rate-distortion cost model of the HEVC encoder for the motion vector.
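The restriction of steps 8.1) to 8.3) can be sketched in Python as follows. The sketch assumes integer-pel motion vectors, rectangles given as inclusive pixel corners, and `math.inf` as a stand-in for the "maximum selectable value"; the function names `overlaps` and `rd_cost` are hypothetical and do not correspond to functions of the HM encoder.

```python
import math


def overlaps(block, region):
    """True if two pixel rectangles (x0, y0, x1, y1) share at least one pixel."""
    bx0, by0, bx1, by1 = block
    rx0, ry0, rx1, ry1 = region
    return bx0 <= rx1 and rx0 <= bx1 and by0 <= ry1 and ry0 <= by1


def rd_cost(block, mv, j, r, lam, encrypted_tile):
    """RDCost = J + lambda * R, or infinity if the reference block touches the Tile."""
    dx, dy = mv
    x0, y0, x1, y1 = block
    reference = (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
    if overlaps(reference, encrypted_tile):
        return math.inf          # step 8.2): this vector can never be selected
    return j + lam * r           # step 8.3): unmodified cost model


tile = (640, 256, 959, 511)      # assumed pixel extent of the encrypted Tile
block = (600, 300, 615, 315)     # 16x16 block just left of the Tile
candidates = [((30, 0), 80.0, 5), ((10, 0), 100.0, 5)]  # (mv, J, R)
best = min(candidates, key=lambda c: rd_cost(block, c[0], c[1], c[2], 4.0, tile))
print(best[0])  # (10, 0)
```

The vector (30, 0) has the lower unrestricted cost, but its reference block reaches into the encrypted Tile, so the search settles on (10, 0) instead.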
Further, the encrypting the part of the syntax elements of the selected Tile in step 9) includes the following steps:
9.1) judging whether FrameEncryption_flag is 1; if false, going to the next step; otherwise, all Tiles currently need to be encrypted, and going to step 9.3);
9.2) judging whether the current Tile index is equal to TileEncryption_idx or TileEncryption_idx2; if true, going to the next step; otherwise, coding with a standard HEVC entropy encoder and going to step 10);
9.3) generating a binary chaotic sequence from the key by using a Logistic chaotic system, for the subsequent encryption of syntax elements;
9.4) in the CABAC entropy coding stage, encrypting the selected syntax elements.
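One common way to obtain a binary sequence from a Logistic chaotic system, as required by step 9.3), is to iterate the logistic map and threshold each value. The Python sketch below follows that approach; the text does not specify how the key is mapped to the system, so treating the key as the initial value `x0`, the control parameter `mu = 3.99`, the threshold of 0.5 and the discarded warm-up iterations are all assumptions.

```python
def logistic_bits(x0, n, mu=3.99, burn_in=100):
    """Generate n bits from the logistic map x <- mu * x * (1 - x)."""
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    x = x0
    for _ in range(burn_in):      # discard the transient (assumed warm-up)
        x = mu * x * (1.0 - x)
    bits = []
    for _ in range(n):
        x = mu * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)   # assumed thresholding rule
    return bits


keystream = logistic_bits(0.3141592653, 16)
print(len(keystream), set(keystream) <= {0, 1})  # 16 True
```

The same key always reproduces the same sequence, which is what allows a decoder holding the key to regenerate it for decryption.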
The invention has the following beneficial effects: 1) compared with existing HEVC region-of-interest methods, the method locates the encryption target region more accurately, which reduces the amount of data to be encrypted, improves the real-time performance of encryption, and does not affect normal information acquisition from the region of interest; 2) the encryption process has little influence on the video bit stream, which helps to balance the computational complexity of encrypting the video region of interest against the visual security of the encrypted video.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and specific embodiments. It should be noted that the technical solution and design principles of the present invention are described in detail below using only one preferred technical solution, but the scope of protection of the present invention is not limited thereto.
The present invention is not limited to the above-described embodiments, and any obvious improvements, substitutions or modifications can be made by those skilled in the art without departing from the spirit of the present invention.
In the experimental embodiment, the region-of-interest-based HEVC video encryption method is verified by a region-of-interest encryption experiment on the standard test video sequence Kimono1_1920x1080_24.yuv. The video codec platform is the HM16.7 standard codec program, run in the Microsoft Visual Studio 2017 integrated development environment. In HEVC coding, a frame is divided into a plurality of non-overlapping CTUs (coding tree units); in the method of the present invention the CTU size is 64 × 64 pixels. A frame is further divided horizontally and vertically, in units of CTUs, into several rectangular regions, each of which is a Tile unit. The method separates the regions to be encrypted from the video frame mainly through dynamic Tile division and performs the encryption in the entropy coding stage. In addition, to keep the encryption from spreading, error drift must be suppressed in the inter-prediction stage; the other stages use the normal HEVC encoder.
In the method, the coeff_abs_level_remaining syntax element represents the remaining part of the absolute value of the transform coefficient level, the mvd_sign_flag syntax element represents the sign bit of the motion vector difference, the coeff_sign_flag syntax element represents the sign bit of the transform coefficient, and the mvp index syntax element represents the candidate predicted motion vector index. FrameWidth represents the width of the video frame, FrameHeight represents its height, TileEncryption_idx and TileEncryption_idx2 represent the Tile indices to be encrypted, and FrameEncryption_flag indicates whether full-frame encryption is needed.
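As a worked example of the CTU grid for a 1920 × 1080 frame with 64 × 64 CTUs, and relying on the usual HEVC convention that a partial CTU at the frame border counts as a full CTU, the numbers of CTU columns and rows are:

$$
\left\lceil \frac{1920}{64} \right\rceil = 30, \qquad \left\lceil \frac{1080}{64} \right\rceil = \lceil 16.875 \rceil = 17,
$$

so each frame contains $30 \times 17 = 510$ CTUs, and the last CTU row covers only $1080 - 16 \times 64 = 56$ rows of picture samples.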
As shown in fig. 1, the present invention is a HEVC video encryption method based on regions of interest, including the following steps:
1) initializing the to-be-encrypted Tile index variable TileEncryption_idx to 9, the to-be-encrypted Tile index variable TileEncryption_idx2 to 9, and the full-frame encryption flag variable FrameEncryption_flag to 0;
2) reading a frame in an original YUV video sequence to be coded in sequence, and defining the frame as a current frame; wherein, the frame height is marked as FrameHeight, and the frame width is marked as FrameWidth;
3) judging whether the current frame has a region of interest; if so, going to step 4); if not, going to step 7); in the specific embodiment of the invention, whether the current frame has a region of interest is judged by detecting whether a face is present in the current frame using a face detection method based on a Gaussian model, and a detected face is taken as the region of interest;
4) determining the Tile partition of the video frame according to the position of the region of interest, as a preferred embodiment of the present invention, performing the Tile partition on the video frame comprises the following steps:
4.1) dividing the current frame into a plurality of non-overlapping coding tree units (CTUs) of 64x64 pixels, wherein the CTU in the i-th column and j-th row is denoted by $G(i, j)$;
4.2) expanding the region of interest into a horizontal rectangular Area: determining that the set of CTUs covered by the region of interest is $G(x_1, y_1), G(x_2, y_2), \ldots, G(x_n, y_n)$, and letting $m_1 = \min\{x_1, x_2, \ldots, x_n\}$, $m_2 = \min\{y_1, y_2, \ldots, y_n\}$, $n_1 = \max\{x_1, x_2, \ldots, x_n\}$, $n_2 = \max\{y_1, y_2, \ldots, y_n\}$; the upper-left corner CTU of the horizontal rectangular Area is then $G(m_1, m_2)$ and the lower-right corner CTU is $G(n_1, n_2)$;
4.3) judging whether the horizontal rectangular Area meets the Tile dividing condition, if not, turning to the next step, otherwise, turning to the step 4.5). The Tile dividing condition is that the horizontal distance from the left and right boundaries of a horizontal rectangular Area to the boundary of a video frame is more than or equal to 4 CTUs, and the length of the horizontal rectangular Area in the horizontal direction is more than or equal to 4 CTUs;
4.4) adjusting the horizontal rectangular Area, mainly by adjusting the positions of the points $G(m_1, m_2)$ and $G(n_1, n_2)$, comprising the following steps:
4.4.1) judging whether $m_1$ satisfies [formula missing]; if false, going to the next step; otherwise, letting $m_1 = 1$ and going to the next step;
4.4.2) judging whether $n_1$ is greater than [formula missing]; if false, going to the next step; otherwise, letting [formula missing] and going to the next step;
4.4.3) judging whether $n_1 - m_1$ is less than 3; if true, going to the next step; otherwise, going to step 4.5);
4.4.4) judging whether [formula missing] is greater than $3 - (n_1 - m_1)$; if false, going to the next step; otherwise, letting $n_1 = m_1 + 3$ and going to step 4.5);
4.5) carrying out Tile division according to the horizontal rectangular Area; as a preferred embodiment of the present invention, Tile division includes the following steps:
4.5.1) judging whether the conditions that $m_1$ is equal to 1, $m_2$ is equal to 1, $n_1$ is equal to [formula missing] and $n_2$ is equal to [formula missing] are all satisfied; if false, going to the next step; otherwise, letting FrameEncryption_flag be equal to 1 and going to step 6);
4.5.2) judging whether $m_1$ is equal to 1 and $n_1$ is equal to [formula missing]; if false, going to the next step; otherwise, setting the boundary line on which the left boundary of [formula missing] is located as a Tile column boundary, and going to step 4.5.4);
4.5.3) judging whether $m_2$ is equal to 1 and $n_2$ is equal to [formula missing]; if false, going to the next step; otherwise, setting the boundary line on which the upper boundary of [formula missing] is located as a Tile row boundary, and going to the next step;
4.5.4) judging whether $m_1$ is equal to 1; if true, going to the next step; otherwise, setting the boundary line on which the left boundary of the horizontal rectangular Area is located as a Tile column boundary, and going to the next step;
4.5.5) judging whether $m_2$ is equal to 1; if true, going to the next step; otherwise, setting the boundary line on which the upper boundary of the horizontal rectangular Area is located as a Tile row boundary, and going to the next step;
4.5.6) judging whether $n_1$ is equal to [formula missing]; if true, going to the next step; otherwise, setting the boundary line on which the right boundary of the horizontal rectangular Area is located as a Tile column boundary, and going to the next step;
4.5.7) judging whether $n_2$ is equal to [formula missing]; if true, going to the next step; otherwise, setting the boundary line on which the lower boundary of the horizontal rectangular Area is located as a Tile row boundary;
5) determining the Tile indices to be encrypted, i.e. TileEncryption_idx is set to the index of the Tile unit in which the CTU $G(m_1, m_2)$ is located, and TileEncryption_idx2 is set to the index of the Tile unit in which the CTU $G(n_1, n_2)$ is located;
6) determining whether the coding of the current frame requires a new picture parameter set (PPS); as a preferred embodiment of the invention, this comprises the following steps:
6.1) because the region of interest normally changes little from frame to frame, and in order to reduce how often the PPS is set during coding, judging whether the Tile division of the current frame is the same as that of the previous frame; if so, going to step 6.2); otherwise, going to step 6.3);
6.2) not resetting PPS, the encoder uses the Tile division of the previous frame, and then the step 7) is carried out;
6.3) setting PPS by using Tile division of the current frame;
7) judging whether the influence of error drift needs to be considered when the current frame is coded, namely judging whether the current frame is a non-I frame and whether any frame among the IDR frame closest to the current frame and the frames following that IDR frame has undergone region-of-interest encryption; if so, going to the next step; otherwise, going to step 9);
8) in the motion estimation and motion prediction stages of HEVC coding, limiting the motion estimation and MV prediction modes of a Tile unit in a non-encrypted region; as a preferred embodiment of the invention, the method comprises the following steps:
8.1) restricting the motion estimation of the Tile units in the non-encrypted region: during motion estimation of P frames and B frames, judging whether all or some of the pixels of the reference block, obtained by offsetting the current block of the current coding frame by the motion vector found in the motion estimation search, are located in the Tile unit corresponding to the encrypted region; if so, going to the next step; otherwise, going to step 8.3);
8.2) setting the RDCost (rate-distortion cost) corresponding to the motion vector to the maximum selectable value, so that the encoder cannot select the motion vector as the optimal motion vector, and going to step 8.4);
8.3) using the original rate-distortion cost model of the HEVC encoder for the motion vector, calculated as: $\mathrm{RDCost} = J + \lambda R$,
wherein $J$ represents the estimation error produced by the current motion vector, $R$ represents the number of bits required to code the motion information, and $\lambda$ represents the scaling coefficient between distortion and number of bits, namely the Lagrange factor;
8.4) restricting the MV prediction mode of the Tile units in the non-encrypted region: in the MV prediction stage, the HEVC encoder builds an MVP candidate list, examines the candidate MVs in the list, and determines whether the prediction block corresponding to each predicted motion vector lies wholly or partly in the Tile unit corresponding to the region of interest; if true, going to the next step; otherwise, going to step 8.6);
8.5) setting the RDCost corresponding to the predicted motion vector to the maximum selectable value, so that the predicted motion vector cannot be selected, and going to step 9);
8.6) using the original rate-distortion cost model of the HEVC encoder for the motion vector;
9) in the entropy coding stage, encrypting some of the syntax elements of the selected Tile; as a preferred embodiment of the present invention, the encryption method is as shown in fig. 2: a binary sequence is generated from the key by a Logistic chaotic system; in the CABAC entropy coding stage the binarized data is XORed bit by bit with this sequence, and the encrypted binary sequence of equal length replaces the original data and enters the arithmetic coder; the specific steps are as follows:
9.1) judging whether FrameEncryption_flag is 1; if false, going to the next step; otherwise, all Tiles currently need to be encrypted, and going to step 9.3);
9.2) judging whether the current Tile index is equal to TileEncryption_idx or TileEncryption_idx2; if true, going to the next step; otherwise, coding with a standard HEVC entropy encoder and going to step 10);
9.3) generating a binary chaotic sequence from the key by using a Logistic chaotic system, for the subsequent encryption of syntax elements;
9.4) in the CABAC entropy coding stage, encrypting the selected syntax elements, wherein in HEVC coding the coeff_abs_level_remaining syntax element represents the remaining part of the absolute value of the transform coefficient level, the mvd_sign_flag syntax element represents the sign bit of the motion vector difference, the coeff_sign_flag syntax element represents the sign bit of the transform coefficient, and the mvp index syntax element represents the candidate predicted motion vector index; as a preferred embodiment of the invention, this comprises the following steps:
9.4.1) XORing, bit by bit, the binary data obtained by binarizing the suffix of the coeff_abs_level_remaining syntax element with the 0th-order exponential Golomb code (Exp-Golomb 0th, EG0) with the binary string generated from the chaotic sequence, replacing the original data with the ciphertext of equal bit length, and then entering bypass coding;
9.4.2) for the syntax elements mvd_sign_flag and coeff_sign_flag, which are binary syntax elements, directly extracting the sign bit and XORing it with the binary string generated from the chaotic sequence, replacing the original sign bit with the encrypted data, and then entering bypass coding;
9.4.3) XORing, bit by bit, the data obtained by binarizing the mvp index syntax element with the truncated unary (TU) code with the chaotic sequence, and performing regular coding after the bits are replaced;
10) ending the coding of the current frame and outputting a current frame code stream;
11) judging whether all the frames are processed, if so, turning to the step 12), otherwise, turning to the step 1);
12) the video encoding is ended.
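The bitwise XOR substitution of steps 9.4.1) to 9.4.3) can be illustrated with the following Python sketch. It operates on already-binarized bins represented as lists of 0/1 values and uses a fixed, hypothetical keystream in place of the chaotic sequence; the function name `encrypt_bins` and all bin values are assumptions chosen for the example, and the CABAC engine itself is not modeled.

```python
def encrypt_bins(bins, keystream):
    """XOR each bin with the next keystream bit; the output has the same length."""
    return [b ^ next(keystream) for b in bins]


key_bits = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical keystream bits

# Example bins: a coeff_sign_flag, an EG0 suffix and an mvd_sign_flag.
plain = [[0], [1, 0, 1], [1]]

ks = iter(key_bits)
cipher = [encrypt_bins(bins, ks) for bins in plain]
print(cipher)      # [[1], [1, 1, 0], [1]]

# XOR with the same keystream is its own inverse, so decryption reuses the function.
ks = iter(key_bits)
recovered = [encrypt_bins(bins, ks) for bins in cipher]
print(recovered == plain)  # True
```

Because every bin string keeps its length, the substituted bins can be passed to the arithmetic coder in place of the originals, which is the property the method relies on for format compatibility.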