CN113453012B - Encoding and decoding method and device and electronic equipment - Google Patents

Encoding and decoding method and device and electronic equipment

Info

Publication number
CN113453012B
CN113453012B (Application No. CN202110711430.4A)
Authority
CN
China
Prior art keywords
region
video image
image
shielded
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110711430.4A
Other languages
Chinese (zh)
Other versions
CN113453012A (en)
Inventor
俞海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202110711430.4A priority Critical patent/CN113453012B/en
Publication of CN113453012A publication Critical patent/CN113453012A/en
Application granted granted Critical
Publication of CN113453012B publication Critical patent/CN113453012B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • H04N19/467Embedding additional information in the video signal during the compression process characterised by the embedded information being invisible, e.g. watermarking

Abstract

The application provides an encoding and decoding method, an apparatus, and an electronic device, wherein the encoding method comprises the following steps: acquiring position information of a region to be shielded in a first video image; copying the pixel values of the region to be shielded in the first video image to a preset image according to the position information to obtain a second video image; when a plurality of regions to be shielded exist in the first video image, copying the pixel values in the plurality of regions to be shielded in the same first video image to the same preset image to obtain the second video image; shielding the region to be shielded in the first video image to obtain a shielded image; encoding the shielded image to obtain a first code stream, and encoding and encrypting the second video image to obtain a second code stream; and encapsulating the first code stream, the second code stream and the position information of the region to be shielded to obtain an encapsulated data stream. The method can reduce the performance requirement for encoding the image data of the region to be shielded, and ensures that de-shielding remains possible when the required conditions are met.

Description

Encoding and decoding method and device and electronic equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a coding and decoding method, device and electronic device.
Background
At present, with the rise of new-generation computer vision and artificial intelligence technologies such as deep learning, intelligence-based privacy protection methods have begun to be widely applied in addition to the original privacy protection methods based on designated areas. Such methods use intelligent means such as computer vision to blur or cover the human or vehicle regions detected in a video image, so as to prevent privacy disclosure.
Because the privacy mask is usually irreversible, and there is a need for a high-authority user to view image data before the privacy mask in practical applications, how to ensure information acquisition of the high-authority user while implementing the privacy mask becomes an urgent technical problem to be solved.
Disclosure of Invention
In view of the above, the present application provides a coding and decoding method, device and electronic device.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of embodiments of the present application, there is provided an encoding method, including:
acquiring position information of a region to be shielded in a first video image;
copying the pixel value of the area to be shielded in the first video image to a preset image according to the position information to obtain a second video image; when a plurality of regions to be shielded exist in the first video image, copying pixel values in the plurality of regions to be shielded in the same first video image to the same preset image to obtain a second video image; the initial value of each position in the preset image is a preset value;
shielding the area to be shielded in the first video image to obtain a shielded image;
coding the shielding image to obtain a first code stream, and coding and encrypting the second video image to obtain a second code stream;
and packaging the first code stream, the second code stream and the position information of the area to be shielded to obtain a packaged data stream.
According to a second aspect of embodiments of the present application, there is provided a decoding method based on the encoding method of the first aspect, including:
acquiring a packaged data stream, and decapsulating the packaged data stream to obtain the first code stream, the second code stream and the position information of the area to be shielded;
for an accessor whose unmasking authority is of a first type authority level, decoding the first code stream to obtain a masked image, and displaying the masked image; wherein the first type authority level is a level without masked-area viewing authority;
for an accessor whose unmasking authority is of a second type authority level, decoding the first code stream to obtain a masked image, and decrypting and decoding the second code stream to obtain a second video image; performing de-masking processing on a designated masked region in the masked image according to the second video image and the position information of the region to be masked, and displaying the video image after the de-masking processing of the designated masked region; wherein the second type authority level is a level with viewing authority for the designated masked region, the designated masked region comprises at least one masked region in the masked image, and a masked region is a region obtained by masking a region to be masked in the first video image.
According to a third aspect of embodiments of the present application, there is provided an encoding apparatus including:
the device comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring the position information of a region to be shielded in a first video image;
the preprocessing unit is used for copying the pixel values of the area to be shielded in the first video image to a preset image according to the position information to obtain a second video image; when a plurality of areas to be shielded exist in the first video image, copying pixel values in the plurality of areas to be shielded in the same first video image to the same preset image to obtain the second video image; the initial value of each position in the preset image is a preset value;
the shielding unit is used for shielding the area to be shielded in the first video image to obtain a shielded image;
the encoding unit is used for encoding the shielding image to obtain a first code stream, and encoding and encrypting the second video image to obtain a second code stream;
and the packaging unit is used for packaging the first code stream, the second code stream and the position information of the area to be shielded to obtain a packaged data stream.
According to a fourth aspect of embodiments of the present application, there is provided a decoding apparatus comprising:
an acquisition unit configured to acquire an encapsulated data stream;
a decapsulation unit, configured to decapsulate the encapsulated data stream to obtain the first code stream, the second code stream, and the location information of the to-be-masked area;
the decoding unit is used for decoding the first code stream to obtain a mask image aiming at an accessor with a de-masking authority of a first type authority level; wherein the first type permission level is a level without obscured area viewing permission;
a display unit configured to display the mask image;
the decoding unit is further configured to decode the first code stream to obtain a mask image for an accessor whose unmasking authority is a second type authority level, and decrypt and decode the second code stream to obtain a second video image; the second type permission level is a level with a designated occlusion region viewing permission, the designated occlusion region comprises at least one occlusion region in the occlusion image, and the occlusion region is a region which is obtained by occluding a region to be occluded in the first video image;
the de-occlusion unit is used for carrying out de-occlusion processing on the specified occlusion region in the occlusion image according to the second video image and the position information of the region to be occluded;
the display unit is further used for displaying the video image subjected to the de-occlusion processing of the designated occlusion area.
According to a fifth aspect of embodiments of the present application, there is provided an encoding end device, including a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions capable of being executed by the processor, and the processor is configured to execute the machine-executable instructions to implement the encoding method of the first aspect.
According to a sixth aspect of embodiments of the present application, there is provided a decoding-side device, including a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions capable of being executed by the processor, and the processor is configured to execute the machine-executable instructions to implement the decoding method of the second aspect.
With the encoding method of the embodiments of the application, a second video image is obtained by copying the pixel values of the region to be shielded in the first video image to a preset image, so that the original image data of the region to be shielded is recorded by the second video image. When a plurality of regions to be shielded exist in the first video image, the pixel values in the plurality of regions to be shielded in the same first video image are copied to the same preset image, which reduces the difficulty of managing the image data of the regions to be shielded. Furthermore, the image data of the regions to be shielded in the same video image can be encoded in a single-channel encoding mode, without encoding different regions to be shielded in different channels, which reduces the performance requirement for encoding the image data of the regions to be shielded. Moreover, encoding and encrypting the second video image ensures the security of the original image data of the regions to be shielded. Finally, the first code stream, the second code stream and the position information of the region to be shielded are encapsulated into an encapsulated data stream, which ensures that de-shielding remains possible when the required conditions are met.
Drawings
Fig. 1 is a schematic flowchart of an encoding method according to an exemplary embodiment of the present application;
FIG. 2 is a flowchart illustrating a decoding method according to an exemplary embodiment of the present application;
FIG. 3A is a schematic view of a video occlusion process flow provided by an exemplary embodiment of the present application;
FIG. 3B is a schematic diagram illustrating the effect of a masking process and a de-masking process according to an exemplary embodiment of the present application;
fig. 4 is a schematic structural diagram of an encoding apparatus according to an exemplary embodiment of the present application;
fig. 5 is a schematic structural diagram of a decoding apparatus according to an exemplary embodiment of the present application;
fig. 6 is a schematic diagram of a hardware structure of an encoding-side device according to an exemplary embodiment of the present application;
fig. 7 is a schematic hardware structure diagram of a decoding-side device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make the technical solutions provided in the embodiments of the present application better understood and make the above objects, features, and advantages of the embodiments of the present application more obvious and understandable by those skilled in the art, the technical solutions in the embodiments of the present application are further described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of an encoding method provided in an embodiment of the present application is shown. As shown in fig. 1, the encoding method may include the following steps:
step S100, position information of a region to be shielded in the first video image is obtained.
In the embodiment of the present application, the first video image does not refer to a fixed video image, but may refer to any frame of video image in the original video.
For example, the region to be masked may include, but is not limited to, a region to be masked determined by the target detection method and/or a predetermined region to be masked.
For example, the area to be masked includes an area to be masked determined by the target detection method and a preset area to be masked. If a window is included in the monitored scene, the area corresponding to the window in the video image can be set as the area needing to be shielded in advance, so that the workload of target detection is reduced under the condition of ensuring the comprehensiveness of shielding.
For example, the region to be masked determined by the target detection method may include, but is not limited to, a human face region and/or a license plate region.
For example, when the region to be shielded is the region to be shielded determined by the target detection method, the position information of the region to be shielded may be determined by the target detection method.
When the region to be shielded is a preset region to be shielded, the position information of the region to be shielded can be preset.
It should be noted that, in an actual scene, there may be a situation that the front-end video capture device moves, for example, the cloud platform camera moves along with the movement of the cloud platform, in this case, the position of the preset region to be masked in the video image changes accordingly, and if the encoding end still determines the region to be masked in the video image according to the position of the preset region to be masked in the video image when the cloud platform moves, the problem of mask misalignment may be caused.
Based on the consideration, in the embodiment of the application, the preset position information of the region to be shielded can be updated according to the actual image acquisition condition, so that the shielding accuracy is improved.
For example, updating the preset position information of the region to be shielded according to the actual image acquisition condition may include: calculating the actual position information of the preset region to be shielded in the video image according to the movement of the front-end video acquisition equipment.
For example, the area to be shielded may be a rectangular area, and the position information of the area to be shielded may include, but is not limited to, coordinate information of diagonal vertices (e.g., top left vertex and bottom right vertex, top right vertex and bottom left vertex, etc.) of the area to be shielded, coordinate information of one of the vertices of the area to be shielded, and width and height information of the area to be shielded, or coordinate information of a center point of the area to be shielded, and width and height information of the area to be shielded, and the like.
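The three rectangle representations listed above are interconvertible. The following is a minimal sketch, not part of the patent disclosure, with hypothetical helper names, normalizing each representation to the common (x, y, width, height) form:

```python
# Illustrative only: converting the rectangle representations mentioned above
# into a single (left, top, width, height) form. Names are hypothetical.

def rect_from_diagonal(x1, y1, x2, y2):
    """Two diagonal vertices (in either order) -> (x, y, width, height)."""
    left, top = min(x1, x2), min(y1, y2)
    return (left, top, abs(x2 - x1), abs(y2 - y1))

def rect_from_center(cx, cy, w, h):
    """Center point plus width and height -> (x, y, width, height)."""
    return (cx - w // 2, cy - h // 2, w, h)
```

Either helper yields the same rectangle for equivalent inputs, so downstream steps (copying, masking, encapsulation) only need to handle one form.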
It should be noted that, in the embodiment of the present application, for the case where the region to be masked is determined by the target detection method, when the contour of the target is obtained by the target detection method, the position information of the minimum bounding rectangle of the contour of the target may also be used as the position information of the region to be masked.
In addition, in the embodiments of the present application, the mentioned coordinate information refers to coordinates in a coordinate system with the top left vertex of the image as the origin, the horizontal right direction as the x-axis forward direction, and the vertical downward direction as the y-axis forward direction.
Step S110, copying the pixel values of the region to be shielded in the first video image to a preset image according to the position information of the region to be shielded in the first video image to obtain a second video image; when a plurality of regions to be shielded exist in the first video image, copying the pixel values in the plurality of regions to be shielded in the same first video image to the same preset image to obtain the second video image; the initial value of each position in the preset image is a preset value.
In the embodiment of the present application, considering that the masking of the video image is usually irreversible, in order to implement the de-masking of the video image in the subsequent process, the original image data of the area to be masked may be saved before the area to be masked is subjected to the masking processing.
For example, the original image data of the regions to be masked in the same video image may be uniformly stored in the same preset image (also referred to herein as a target storage space); that is, when there are multiple regions to be masked in a first video image, the pixel values in the multiple regions to be masked in the same first video image are copied to the same preset image to obtain a second video image, so that the difficulty of managing the image data of the regions to be masked may be reduced. In addition, the image data of the regions to be masked in the same video image can be encoded and transmitted in a single-channel mode, without encoding and transmitting different regions to be masked in different channels, which lowers the performance requirement for encoding the image data of the regions to be masked.
When the decoding end decodes, the image data of a plurality of shielding areas of the same shielding image can be obtained from the same second video image, and the image data of different shielding areas in the same shielding image does not need to be obtained from a plurality of different images respectively, so that the efficiency of the de-shielding processing of the decoding end is improved.
For example, the initial value of each position in the preset image may be a preset value.
For example, the preset value may be a pixel value which is less frequently present in the actual image, so that when the pixel value of the region to be masked is copied to the preset image to obtain the second video image, the position of the region to be masked in the second video image may be determined according to the pixel value difference of each pixel position in the second video image, thereby improving the reliability of the de-masking process.
For example, the preset value may be 0, 127, 128, or 255.
In one example, the initial values of the positions in the preset image are the same, so that the code rate of the code stream obtained by encoding the second video image can be greatly reduced, and the storage space and the transmission flow are saved.
In this embodiment of the application, when the position information of the area to be masked in the first video image is obtained in the manner in step S100, the pixel value of the area to be masked in the first video image may be copied to the preset image according to the position information of the area to be masked in the first video image, so as to obtain the second video image.
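As an illustrative sketch (not part of the disclosure), the preset image of step S110 and the per-region pixel copy might look like the following in pure Python, assuming a single-channel image stored as nested lists and a preset value of 128:

```python
PRESET = 128  # assumed preset value; the text also mentions 0, 127 and 255

def make_preset_image(width, height, value=PRESET):
    # Every position starts at the same preset value, so the second video
    # image is mostly uniform and compresses to a very small bitstream.
    return [[value] * width for _ in range(height)]

def copy_region(src, dst, x, y, w, h):
    # Copy the pixels of one to-be-masked region into the preset image at
    # the same position (the simplest position mapping).
    for row in range(y, y + h):
        for col in range(x, x + w):
            dst[row][col] = src[row][col]
```

With several regions, `copy_region` is simply called once per region on the same `dst`, which is the single-preset-image behavior described above.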
Step S120, a region to be shielded in the first video image is shielded to obtain a shielded image.
In this embodiment of the application, after the pixel value of the region to be masked in the first video image is copied to the preset image in step S110, the region to be masked in the first video image may be masked to obtain a masked first video image (referred to as a masked image herein).
It should be noted that, for any region to be masked in the first video image, after copying the pixel value of the region to be masked in the first video image to the preset image, the region to be masked in the first video image may be masked; or after the pixel values of all the regions to be masked in the first video image are copied to the preset image, the regions to be masked in the first video image can be masked.
For example, the masking processing on the region to be masked in the first video image may adopt a multi-tap filtering manner.
For example, the masking processing may be performed on the region to be masked by multi-tap average filtering and downsampling (i.e., the region to be masked may be mosaicked).
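The average-filter-and-downsample masking described above can be approximated by a simple block-mean mosaic. This is an illustrative sketch only, not the patent's exact filter:

```python
def mosaic(img, x, y, w, h, block=4):
    # Replace each block-by-block tile of the region with its mean value,
    # producing the familiar mosaic effect. `block` is an assumed tile size.
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            tile = [img[r][c]
                    for r in range(by, min(by + block, y + h))
                    for c in range(bx, min(bx + block, x + w))]
            mean = sum(tile) // len(tile)
            for r in range(by, min(by + block, y + h)):
                for c in range(bx, min(bx + block, x + w)):
                    img[r][c] = mean
    return img
```

Because the mean irreversibly discards detail, de-masking at the decoder must restore pixels from the second video image rather than invert the filter, which is why the copy in step S110 happens first.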
Step S130, the first video image after shielding is coded to obtain a first code stream, and the second video image is coded and encrypted to obtain a second code stream.
Step S140, the first code stream, the second code stream and the position information of the area to be shielded are packaged to obtain a packaged data stream.
In the embodiment of the application, when the second video image and the masked first video image are obtained in the above manner, the masked image may be encoded to obtain the first code stream, and the second video image may be encoded to obtain the second code stream.
Illustratively, in order to improve the security of the image data of the region to be masked in the first video image, when the second video image is encoded, the code stream obtained by encoding may also be encrypted.
In the embodiment of the application, the encoding end can encapsulate the first code stream, the second code stream and the position information of the area to be shielded to obtain an encapsulated data stream.
For example, for the obtained encapsulated data stream, the encoding end device may send to the decoding end for decoding and displaying, or may perform storage processing, for example, store the encapsulated data stream in a specified storage space.
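For illustration only, the encapsulation of the two code streams plus the position information could use a length-prefixed container such as the following. The patent does not specify a concrete container format; this layout is a hypothetical stand-in:

```python
import struct

def encapsulate(first_stream: bytes, second_stream: bytes, regions):
    # Hypothetical layout: [len][first stream][len][second stream]
    #                      [count][x, y, w, h per region]
    out = struct.pack(">I", len(first_stream)) + first_stream
    out += struct.pack(">I", len(second_stream)) + second_stream
    out += struct.pack(">I", len(regions))
    for x, y, w, h in regions:
        out += struct.pack(">4I", x, y, w, h)
    return out

def decapsulate(data: bytes):
    off = 0
    (n,) = struct.unpack_from(">I", data, off); off += 4
    first = data[off:off + n]; off += n
    (n,) = struct.unpack_from(">I", data, off); off += 4
    second = data[off:off + n]; off += n
    (cnt,) = struct.unpack_from(">I", data, off); off += 4
    regions = [struct.unpack_from(">4I", data, off + 16 * i) for i in range(cnt)]
    return first, second, regions
```

A decoder without masked-area viewing authority would simply ignore the second stream and the region list after decapsulation.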
In the method flow shown in fig. 1, a second video image is obtained by copying the pixel values of the region to be masked in the first video image to a preset image, so that the original image data of the region to be masked is recorded by the second video image. When a plurality of regions to be masked exist in the first video image, the pixel values in the plurality of regions to be masked in the same first video image are copied to the same preset image, which reduces the difficulty of managing the image data of the regions to be masked. Furthermore, the image data of the regions to be masked in the same video image can be encoded in a single-channel encoding mode, without encoding different regions to be masked in different channels, which reduces the performance requirement for encoding the image data of the regions to be masked. Moreover, encoding and encrypting the second video image ensures the security of the original image data of the regions to be masked. Finally, the first code stream, the second code stream and the position information of the region to be masked are encapsulated into an encapsulated data stream, which ensures that de-masking remains possible when the required conditions are met.
In some embodiments, the area to be masked comprises a first type of area to be masked, and/or a second type of area to be masked.
Illustratively, the first type of region to be occluded is determined by performing target detection on the first video image, and the second type of region to be occluded is determined in a preset manner.
In one example, when the area to be masked includes a first type area to be masked, the first type area to be masked is obtained by performing boundary expansion on a dynamic target area, where the dynamic target area is an area to be masked determined by performing target detection on the first video image.
For example, in order to optimize the occlusion effect of the occlusion processing, for the region to be occluded (referred to as a dynamic target region herein) in the first video image determined by the target detection method, the boundary of the dynamic target region may be expanded to obtain the region actually subjected to the occlusion processing in the subsequent process (i.e., the first type region to be occluded), so as to improve the coverage of the first type region to be occluded, optimize the occlusion effect of the occlusion processing, and prevent the information leakage of the occluded target in the occlusion image due to the fact that the region to be occluded is too small.
It should be noted that, for the region to be masked determined in a preset manner, the boundary extension processing may also be performed as needed.
In an example, the performing the boundary extension on the dynamic target area may include:
extending at least one boundary of the dynamic target region outward by n1 pixels, where n1 is a positive integer.
For example, when the boundary of the dynamic target region is expanded, at least one boundary of the dynamic target region may be expanded outward by at least 1 pixel (herein, by n1 pixels, where n1 is a positive integer), so as to increase the flexibility of the boundary extension.
As an example, the dynamic target region is a rectangular region, and the distance from the expanded boundary to the designated boundary is an integer multiple of 2 to the power n2; wherein the designated boundary is the boundary, among the left boundary and the upper boundary of the first video image, that is parallel to the expanded boundary of the target region, and n2 is a positive integer.
For example, it is considered that when a video image is encoded, the video image is usually divided into a plurality of image blocks, and each image block is encoded separately, and the width and height of each image block are usually an integer power of 2.
In addition, since the width and height of the region to be masked are not necessarily integer powers of 2, expanding the region does not necessarily make the width and height of the expanded block exactly integer powers of 2. Moreover, even if the width and height of the region to be masked are integer powers of 2, if the position of the region to be masked is not aligned to an integer power of 2, the expanded boundary will not fall exactly on an integer-power-of-2 position, and the region to be masked will cross block boundaries when the image is divided into image blocks.
Therefore, in order to reduce cases where the boundary of the region to be masked crosses image-block boundaries when the video image is divided into image blocks, to save bits in the encoded code stream, and to simplify mode selection during image compression encoding, when the boundary of the target region is expanded outward, the distance from the expanded boundary to the designated boundary may be an integer multiple of 2 to the power n2.
Illustratively, the designated boundary is the boundary, among the left boundary and the upper boundary of the first video image, that is parallel to the expanded boundary of the target region.
Exemplarily, n2 is a positive integer.
For example, n2 may be a natural number greater than 3 and less than 8.
For example, taking the case of expanding the left boundary of the target region outward (expanding to the left), the distance from the expanded left boundary of the target region to the left boundary of the first video image may be an integer multiple of 2 to the power n2.
As another example, taking the case of expanding the upper boundary of the target region outward (expanding upward), the distance from the expanded upper boundary of the target region to the upper boundary of the first video image may be an integer multiple of 2 to the power n2.
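The alignment rule above — expanding a boundary so that its distance to the image's left or upper boundary is an integer multiple of 2 to the power n2 — can be sketched as follows. The margin value and the function name are assumptions for illustration:

```python
def expand_aligned(x, y, w, h, n2=4, margin=2):
    # Expand the region by an assumed margin, then snap the left/top edges
    # down to the nearest multiple of 2**n2 measured from the image's
    # left/top boundary, per the alignment rule described above.
    step = 1 << n2
    left = max(0, x - margin) // step * step
    top = max(0, y - margin) // step * step
    right = x + w + margin
    bottom = y + h + margin
    return left, top, right - left, bottom - top
```

With n2 = 4 the expanded left and top edges land on multiples of 16, so the region tends to coincide with image-block boundaries during compression.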
In some embodiments, in step S110, copying the pixel values of the region to be masked in the first video image to the preset image according to the position information of the region to be masked in the first video image may include:
for any region to be shielded in the first video image, determining second position information of the region to be shielded in a preset image according to first position information of the region to be shielded in the first video image and a preset position mapping relation; the preset position mapping relation comprises a mapping relation between the position of the area to be shielded in the first video image and the position of the area to be shielded in the preset image;
and copying the pixel value of the area to be shielded in the first video image to a second position in the preset image, wherein the second position is a position matched with the second position information in the preset image.
For example, a mapping relationship between a position of the region to be masked in the first video image and a position in the preset image (which may be referred to as a preset position mapping relationship) may be set in advance.
When the position information of the region to be masked in the first video image is obtained in the manner in step S100, for any region to be masked in the first video image, the position information (referred to as second position information herein) of the region to be masked in the preset image may be determined according to the position information (referred to as first position information herein) of the region to be masked in the first video image and the preset position mapping relationship, and the pixel value of the region to be masked in the first video image is copied to a position (referred to as a target position herein) in the preset image, where the position information matches the second position information.
For example, when there are a plurality of regions to be masked in the first video image, the copying of the pixel values of the respective regions to be masked may be performed sequentially or in parallel.
In one example, the same region to be occluded may be located in the same position in the first video image as in the second video image.
In another example, the position of the same region to be masked in the first video image may have a preset offset from the position in the second video image.
For example, assume that the top-left vertex of region to be masked 1 has the coordinate (x1, y1) in the first video image. When the pixel values of region to be masked 1 are copied to the target storage space, region to be masked 1 may be shifted by 2 pixels to the right and downward, respectively; that is, the coordinate of its top-left vertex in the target storage space becomes (x1 + 2, y1 + 2).
It should be noted that the above position mapping relations for the regions to be masked are only specific examples in the embodiments of the present application and do not limit the scope of the present application. In the embodiments of the present application, the mapping relation between the regions to be masked in the first video image and those in the second video image may also be determined in other ways. For example, the regions to be masked may be copied into the second video image and arranged in ascending order of the distance from each region's top-left vertex to the origin of coordinates in the first video image, and each region to be masked may be separated from the others by one or more pixels in the second video image in both the horizontal and vertical directions.
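The alternative arrangement just described (copying regions into the second video image in ascending order of the distance of each top-left vertex from the origin, with a pixel gap between them) might be sketched as follows; numpy arrays stand in for the image buffers and the row-packing layout is an illustrative assumption:

```python
import numpy as np

def pack_regions(first_image, regions, gap=1):
    # regions: list of (x, y, w, h) regions to be masked in the first image
    preset = np.zeros_like(first_image)
    # ascending order of squared distance of the top-left vertex from origin
    order = sorted(regions, key=lambda r: r[0] ** 2 + r[1] ** 2)
    mapping, cx = {}, 0
    for x, y, w, h in order:
        # place each region along one row, leaving `gap` pixels between them
        preset[gap:gap + h, cx + gap:cx + gap + w] = first_image[y:y + h, x:x + w]
        mapping[(x, y)] = (cx + gap, gap)   # first-image pos -> preset pos
        cx += w + gap
    return preset, mapping
```

The returned mapping plays the role of the preset position mapping relation that the decoding end would need in order to locate each region.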
In one example, the resolution of the preset image is the same as the resolution of the first video image.
The copying the pixel value of the region to be masked in the first video image to the target position in the preset image may include:
for any pixel position in the area to be shielded in the first video image, copying the pixel value of the pixel position to a position which is the same as the coordinate of the pixel position in a preset image; and the corresponding coordinates of the first position information in the first video image are the same as the corresponding coordinates of the second position information in the preset image.
For example, in order to improve the matching efficiency of the region to be masked in the first video image and the region to be masked in the second video image in the subsequent de-masking process flow, for the same region to be masked, the position of the same region to be masked in the first video image may be the same as the position of the same region to be masked in the second video image, that is, the corresponding coordinate of the first position information in the first video image is the same as the corresponding coordinate of the second position information in the preset image.
Meanwhile, in order to reduce resource consumption of transmission and storage of the second video image, the resolution of the target storage space may be the same as that of the first video image.
For any region to be masked in the first video image, when the pixel values of that region are copied to the preset image, for any pixel position of the region in the first video image, the pixel value of that pixel position is copied to the position in the preset image having the same coordinates.
It should be appreciated that the second video image generation method described in the foregoing embodiment is only a specific example in the embodiment of the present application, and is not limited to the scope of the present application, that is, in the embodiment of the present application, the second video image generation may also be implemented in other ways, for example, for any region to be masked in the first video image, the region may be shifted in a certain way and then copied to a preset image, so as to obtain the second video image.
For example, assume that the top-left vertex of region to be masked 1 has the coordinate (x1, y1) in the first video image. When the pixel values of region to be masked 1 are copied to the preset image, region to be masked 1 may be shifted by 2 pixels to the right and downward, respectively; that is, the coordinate of its top-left vertex in the preset image becomes (x1 + 2, y1 + 2).
In addition, in the embodiment of the present application, the resolution of the preset image may also be greater than the resolution of the first video image.
In some embodiments, in step S140, encapsulating the first code stream, the second code stream, and the position information of the area to be masked may further include:
and performing composite encapsulation on a second code stream corresponding to the second video image and the position information of the area to be shielded as the attached private data of the first code stream corresponding to the first video image.
Illustratively, in order to improve the efficiency of the deblocking process in the decoding process, when the encoding end encapsulates the first code stream and the second code stream, the encoding end may also encapsulate the position information of the region to be masked in the first video image.
Illustratively, when the encoding end obtains the first code stream and the second code stream according to the above manner, the second code stream and the position information of the region to be masked in the first video image may be used as the attached private data of the first code stream to perform composite encapsulation, so as to ensure that the position information of the region to be masked in the first video image and the time axis of the second video image are aligned, and improve the accuracy of de-masking in the decoding process.
In an example, before encapsulating the first code stream, the second code stream, and the position information of the area to be masked, the method may further include:
obtaining a classification identifier of a privacy area;
the encapsulating the first code stream, the second code stream, and the position information of the region to be masked may include:
and performing composite encapsulation on a second code stream corresponding to the second video image, the position information of the area to be shielded in the first video image and the classification identifier of the area to be shielded in the first video image as the attached private data of the first code stream corresponding to the first video image.
Illustratively, in order to improve the flexibility of viewing image content, the regions to be masked may be subjected to permission classification, and the regions to be masked that can be viewed at different permission levels may differ.
Illustratively, the permission of the region to be shielded can be graded according to the category of the region to be shielded.
For example, a user at privilege level 1 can view the content of the region to be masked of category 1; the user at the authority level 2 can view the contents of the to-be-shielded areas of the category 1 and the category 2; permission level 2 is higher than permission level 1.
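The two-level example above can be captured in a small lookup table; the table values follow the example in the text, and the function name is an illustrative assumption:

```python
# Categories of regions to be masked viewable at each permission level:
# level 1 sees category 1; level 2 (the higher level) sees categories 1 and 2.
VIEWABLE_CATEGORIES = {1: {1}, 2: {1, 2}}

def can_view(permission_level, region_category):
    # a region is viewable if its classification identifier is in the
    # set of categories granted to the visitor's permission level
    return region_category in VIEWABLE_CATEGORIES.get(permission_level, set())
```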
For example, when performing object detection on the first video image to determine the position information of the region to be occluded (the first-type occlusion region described above) in the first video image, the classification identifier of each first-type occlusion region in the first video image may also be determined.
For example, the classification flag of the region to be masked (i.e., the second type) determined by the preset method may be preset.
When the classification identifier of each region to be shielded in the first video image is obtained, the classification identifier of each region to be shielded in the first video image can be packaged when the first code stream, the second code stream and the position information of the region to be shielded in the first video image are packaged.
For example, the second code stream corresponding to the second video image, the position information of the region to be masked in the first video image, and the classification identifier of the region to be masked in the first video image may be used as the accompanying private data of the first code stream corresponding to the first video image to perform composite encapsulation.
In some embodiments, in the case of performing permission classification on the region to be masked, the region to be masked in the first video image may be masked according to the classification identifier of the region to be masked.
For example, to improve the flexibility of the masking processing, the region to be masked in the first video image may be masked according to the classification identifier of the region to be masked.
In one example, the different classes of regions to be masked are not treated in exactly the same way.
For example, for different types of regions to be masked, different masking processing modes can be adopted; alternatively, the same masking processing manner may be adopted for some regions to be masked of different categories, and another masking processing manner may be adopted for another region to be masked of another category.
For example, assuming that the types of the regions to be shielded include an animal, a license plate, and a sensitive field, different shielding processing manners may be respectively adopted for the regions to be shielded of different types, that is, a first shielding processing manner is adopted for the animal; aiming at the license plate, a second shielding processing mode is adopted; and aiming at the sensitive field, adopting a third shielding processing mode.
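Per-category dispatch of the masking mode can be sketched as below; the three concrete modes (mean fill, 2x2 mosaic, black fill) are assumptions standing in for the unspecified first, second, and third masking processing modes:

```python
import numpy as np

def mean_fill(r):
    return np.full_like(r, int(r.mean()))      # first mode: flatten to the mean

def mosaic(r):
    down = r[::2, ::2]                         # second mode: 2x2 mosaic
    return down.repeat(2, axis=0).repeat(2, axis=1)[:r.shape[0], :r.shape[1]]

def black_fill(r):
    return np.zeros_like(r)                    # third mode: black out

MASK_MODES = {"animal": mean_fill, "license_plate": mosaic,
              "sensitive_field": black_fill}

def mask_region(category, region):
    # choose the masking processing mode by the region's category
    return MASK_MODES[category](region)
```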
In another example, different classes of regions to be masked are masked in the same manner.
In some embodiments, the frame rate of the second video image is less than or equal to the frame rate of the first video image.
Illustratively, the frame rate of the first video image (i.e. the frame rate of the original video data) is usually a frame rate suitable for viewing by human eyes, which is usually higher than the frame rate required for information acquisition. The second video image is mainly used for de-masking the masked first video image (i.e. the mask image) in specific scenarios to achieve specific information acquisition. Therefore, the frame rate of the second video image can be set lower than that of the first video image, so as to reduce the data amount of the second video image and save network bandwidth and device encoding and decoding resources while still meeting the information acquisition requirement.
For example, the frame rate of the second video image may be less than or equal to the frame rate of the first video image.
In one example, the frame rate of the second video image is 1/m of the frame rate of the first video image; wherein m is a positive integer.
For example, when the frame rate of the second video image is less than the frame rate of the first video image, for the original video data, one frame of the first video image may be extracted every several frames, and the pixel value of the region to be masked in the first video image is copied to the preset image in the manner described in the above embodiment, so as to obtain the corresponding second video image.
For example, the frame rate of the second video image may be 1/m of the frame rate of the first video image, where m is an integer greater than 1; that is, for the original video data, one frame of the first video image may be extracted out of every m frames (skipping the m - 1 frames in between), and the pixel values of the regions to be masked in that frame are copied to the preset image in the manner described in the above embodiments, so as to obtain the corresponding second video image.
Illustratively, m may be a natural number greater than 1 and less than 100.
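The 1/m frame-rate relationship amounts to keeping one frame and skipping the next m - 1, which might be sketched as:

```python
def decimate_frames(frames, m):
    # keep one frame of the first video out of every m frames,
    # so the second video's frame rate is 1/m of the first's
    return [f for k, f in enumerate(frames) if k % m == 0]
```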
Referring to fig. 2, a flowchart of a decoding method according to an embodiment of the present application is shown. As shown in fig. 2, the decoding method may include the following steps:
step S200, obtaining the encapsulated data stream, and de-encapsulating the encapsulated data stream to obtain the first code stream, the second code stream and the position information of the area to be shielded.
For example, the generation of the encapsulated data stream may refer to the relevant description in the method embodiment shown in fig. 1, and details of the embodiment of the present application are not repeated herein.
For example, the decoding end may receive the encapsulated data stream sent by the encoding end, or the decoding end may obtain the encapsulated data stream stored by the encoding end from a specified storage device.
For example, for the obtained encapsulated data stream, the decoding end may decapsulate the encapsulated data stream to obtain the first code stream, the second code stream, and the position information of the region to be masked.
Step S210, aiming at the visitor of which the unmasking authority is in the first type authority level, decoding the first code stream to obtain a mask image, and displaying the mask image; wherein the first type permission level is a level without a masked area viewing permission.
Step S220, aiming at the visitor with the unmasked authority of the second type authority level, decoding the first code stream to obtain an unmasked image, and decrypting and decoding the second code stream to obtain a second video image; according to the second video image and the position information of the area to be shielded, carrying out de-shielding processing on the appointed shielding area in the shielding image, and displaying the video image subjected to the de-shielding processing on the appointed shielding area; the second type authority level is a level with a designated shading area viewing authority, the designated shading area comprises at least one shading area in the shading image, and the shading area is an area obtained after shading an area to be shaded in the first video image.
It should be noted that, in the embodiment of the present application, for the same video image, the encoding-end processing flow initially uses the first video image, whose regions to be masked are as yet unmasked. The encoding end obtains the mask image by masking the regions to be masked in the first video image, and each region to be masked in the first video image is thereby converted into a mask region (or occluded region) in the mask image.
Similarly, for a preset image, the encoding end needs to copy pixels in the region to be masked in the first video image to the preset image to obtain a second video image.
In the decoding end processing flow, the obtained occlusion image and the second video image are subjected to de-occlusion processing on part or all of occlusion areas in the occlusion image under the condition that specific conditions are determined to be met according to requirements.
For any first video image and its corresponding occlusion image, the regions to be occluded in the first video image correspond one-to-one with the occlusion regions in the occlusion image: any region to be occluded in the first video image has a corresponding occlusion region at the same position in the occlusion image. The position information of the region to be masked that the encoding end encapsulates in the encapsulated data stream can therefore also be used to indicate the position of the occlusion region in the occlusion image.
In the embodiment of the present application, in order to meet the information acquisition requirement under specific conditions while ensuring the information security of the masked regions, different permission levels (referred to herein as de-masking permissions) may be set in advance for visitors to the video data; visitors whose de-masking permissions are at different levels have different rights to acquire the information of the masked regions.
Illustratively, the de-masking permission may include a level without masked-region viewing permission (referred to herein as the first type permission level) and a level with designated-masked-region viewing permission (referred to herein as the second type permission level).
For example, the designated occlusion region may include at least one occlusion region in the occlusion image, i.e., may include some or all of the occlusion region in the occlusion image.
In the embodiment of the application, when the decoding end acquires the encapsulated data stream and decapsulates the encapsulated data stream to obtain the first code stream, the second code stream and the position information of the region to be masked, corresponding decoding processing can be performed according to the masking-removing permission of the visitor.
For example, for a visitor whose de-masking permission is at the first type permission level, that is, a visitor without masked-region viewing permission, the decoding end may decode the first code stream to obtain the mask image and display it; that is, the visitor views the mask image produced by the masking processing at the encoding end.
Illustratively, for an accessor with a de-masking authority of a second type authority level, namely the accessor has a designated masking region viewing authority, on one hand, a decoding end can decode a first code stream to obtain a masking image; on the other hand, the second code stream may be decrypted and decoded to obtain a second video image.
The decoding end may determine the position of the designated mask region in the mask image according to the second video image and the position information of the region to be masked (i.e., the position information of the region to be masked in the first video image).
Illustratively, any region to be masked in the first video image has a corresponding mask region at the same position in the mask image.
The decoding end can determine the corresponding position of the designated occlusion region in the second video image according to the position of the designated occlusion region in the occlusion image, copy the pixels of the corresponding position in the second video image into the occlusion image, perform de-occlusion processing on the designated occlusion region in the occlusion image, and display the video image subjected to de-occlusion processing on the designated occlusion region.
In some embodiments, in step S220, performing a de-occlusion process on the designated occlusion region in the occlusion image according to the second video image and the position information of the region to be occluded may include:
determining the position information of the appointed shielding region in the shielding image according to the position information of the region to be shielded; for any region to be shielded, the position information of the region to be shielded in the first video image is the same as the position information of the shielded region obtained by shielding the region to be shielded in the shielded image;
determining the position of the appointed shielding area in the second video image according to the position information of the appointed shielding area in the shielding image and a preset position mapping relation; the preset position mapping relation comprises a mapping relation between the position of the shielding area in the shielding image and the position of the shielding area in the second video image;
and copying the pixel value of the designated occlusion region in the second video image to the corresponding position in the occlusion image according to the position of the designated occlusion region in the second video image to obtain the video image subjected to the de-occlusion processing by the designated occlusion region.
For example, the decoding end processes the second video image and the mask image, and when performing the de-masking processing it needs to de-mask the designated mask region according to the pixels at the corresponding positions in the second video image. The decoding end may therefore use a mapping relation between the position of a mask region in the mask image and its position in the second video image; that is, for the decoding end, the preset position mapping relation is converted from a mapping between the position of a region to be masked in the first video image and its position in the preset image into a mapping between the position of a mask region in the mask image and its position in the second video image.
The decoding end may determine the position information of the designated mask region in the mask image according to the position information of the region to be masked (i.e., the position information of the region to be masked in the first video image).
Illustratively, for any region to be masked, the position information of the region in the first video image is the same as the position information of the masked region obtained by masking the region to be masked in the mask image.
For example, the decoding end may determine the position of the designated occlusion region in the second video image according to the position of the designated occlusion region in the occlusion image and the preset position mapping relationship, and copy the pixel value of the designated occlusion region in the second video image to the corresponding position in the occlusion image according to the position of the designated occlusion region in the second video image, so as to obtain the video image with the designated occlusion region subjected to the occlusion removal processing.
For example, assuming that the position of a mask region in the mask image is the same as its position in the second video image, then for any pixel position (x2, y2) of any designated mask region, the decoding end may copy the pixel value at position (x2, y2) in the second video image to position (x2, y2) in the mask image.
For another example, assuming that the position of a mask region in the second video image is offset from its position in the mask image by 2 pixels in both the x-axis and y-axis directions (in the positive x and y directions, respectively), then for any pixel position (x2, y2) of any designated mask region in the second video image, the decoding end may copy the pixel value at position (x2, y2) in the second video image to position (x2 - 2, y2 - 2) in the mask image.
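The two offset cases above can share one copy-back routine; (dx, dy) is the assumed offset of the mask region's position in the second video image relative to the mask image, (0, 0) for the same-position case and (2, 2) for the shifted case:

```python
import numpy as np

def unmask_region(mask_image, second_image, region, dx=0, dy=0):
    # region: (x, y, w, h) of the designated mask region in the mask image;
    # the same region sits at (x + dx, y + dy) in the second video image
    x, y, w, h = region
    mask_image[y:y + h, x:x + w] = \
        second_image[y + dy:y + dy + h, x + dx:x + dx + w]
    return mask_image
```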
In one example, the second type of permission level may include at least two different permission levels, the different permission levels allowing the categories of obscured regions viewed to be not identical;
before determining the position information of the designated occlusion region in the occlusion image according to the position information of the region to be occluded, the method may further include:
acquiring a classification identifier of a region to be shielded;
the determining the position information of the designated occlusion region in the occlusion image according to the position information of the region to be occluded may include:
determining the target category of the shielding region matched with the de-shielding authority level according to the de-shielding authority level; for any region to be shielded, the type of the region to be shielded is the same as that of a shielded region obtained by shielding the region to be shielded;
determining the shielding area of the target category as a designated shielding area according to the target category and the classification identification of each shielding area in the shielding area;
and determining the position information of the appointed shielding region in the shielding image according to the position information of the region to be shielded.
For example, to improve the flexibility of the de-masking processing, visitors with masked-region viewing permission can be divided into at least two different permission levels; that is, the second type permission level may include at least two different permission levels, and the categories of masked regions that the different permission levels are allowed to view are not exactly the same.
Illustratively, under the condition that the regions to be masked are subjected to authority classification, when the masked image is subjected to de-masking processing, the classification identifier of each region to be masked can be obtained, and the category of each masked region is determined according to the classification identifier of each region to be masked.
Illustratively, for any region to be shielded, the category of the region to be shielded is the same as that of the shielded region obtained by shielding the region to be shielded.
For example, the decoding end may determine a class (referred to as a target class herein) of the occlusion region matching the de-occlusion permission level according to the de-occlusion permission level of the visitor, and determine the occlusion region of the target class as the designated occlusion region according to the target class and the classification identifier of each occlusion region in the occlusion region.
The decoding end can determine the position information of the appointed shielding region in the shielding image according to the position information of the region to be shielded in the first video image (which is consistent with the position information of the shielding region in the shielding image).
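Selecting the designated mask regions for a given de-masking permission level reduces to a filter over the classification identifiers; the level-to-category lookup table and the dictionary shape of the regions are illustrative assumptions:

```python
def designated_regions(mask_regions, level, level_to_categories):
    # mask_regions: dicts carrying a 'category' classification identifier
    # level_to_categories: target categories viewable per permission level
    target = level_to_categories.get(level, set())
    return [r for r in mask_regions if r["category"] in target]
```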
In some embodiments, the decoding method provided in the embodiments of the present application may further include:
acquiring a packaged data stream, and decapsulating the packaged data stream to obtain a first code stream and a second code stream;
for an accessor with the unmasked authority of the first type authority level, decoding the first code stream to obtain an masked image, and displaying the masked image;
for an accessor with the de-occlusion permission at the second type permission level, determining the position information of a rectangular area formed by pixel positions with pixel values not being preset values as the position information of an occlusion area in a second video image according to the pixel values of the pixel positions in the second video image;
copying the pixel value of the appointed shielding area in the second video image to the corresponding position in the shielding image according to the position information of the shielding area in the second video image and the preset position mapping relation to obtain the video image of the appointed shielding area after the shielding processing is removed, and displaying the video image of the appointed shielding area after the shielding processing is removed; the preset position mapping relation comprises a mapping relation between the position of the shielding area in the shielding image and the position of the shielding area in the second video image.
For example, during transmission of the encapsulated data stream, the position information of the region to be masked, being attached information of the encapsulated data stream, may be discarded by incompatible intermediate devices. In that case, when the decoding end acquires and decapsulates the encapsulated data stream, it does not obtain the position information of the region to be masked.
For example, considering that the initial value of each position in the preset image is a preset value, that is, the pixel value of the other region except the occlusion region in the second video image is the preset value, when the decoding end does not obtain the position information of the region to be occluded from the encapsulated data stream, the rectangular region formed by the pixel position of which the pixel value is not the preset value may be determined as the occlusion region in the second video image according to the pixel value of each pixel position in the second video image.
For a visitor whose de-masking permission is of the second type permission level, the decoding end may determine, according to the pixel value of each pixel position in the second video image, the position information of the rectangular region formed by the pixel positions whose pixel values are not the preset value as the position information of the mask region in the second video image.
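Recovering the mask-region position from the second video image itself reduces to taking the bounding rectangle of the non-preset pixels; a minimal sketch, assuming a preset value of 0 and a single mask region:

```python
import numpy as np

def find_mask_region(second_image, preset_value=0):
    # pixels differing from the preset (initial) value belong to the region
    ys, xs = np.nonzero(second_image != preset_value)
    if xs.size == 0:
        return None                    # no mask region present
    x, y = int(xs.min()), int(ys.min())
    # return (x, y, w, h) of the bounding rectangle
    return (x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1)
```

With several disjoint regions, a connected-component pass would be needed instead of a single bounding rectangle.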
Illustratively, when the decoding end determines the position information of the mask region in the second video image, the pixel values of the designated mask region in the second video image may be copied to the corresponding position in the mask image according to the position information of the mask region in the second video image and the preset position mapping relation, so as to obtain the video image in which the designated mask region has been de-masked, and the de-masked video image is displayed.
For example, for an visitor whose unmasking authority is the first type of authority, the decoding end may decode the first code stream to obtain an masked image, and display the masked image.
It should be noted that, the foregoing embodiments are merely specific examples of implementations of the embodiments of the present application, and are not intended to limit the scope of the present application, and based on the foregoing embodiments, new embodiments may be obtained through combination between the embodiments or modification of the embodiments, which all belong to the scope of the present application.
In addition, in the embodiment of the present application, the implementation flows of the encoding end and the decoding end may be mutually referred to.
In order to enable those skilled in the art to better understand the technical solutions provided in the embodiments of the present application, the following describes the technical solutions provided in the embodiments of the present application with reference to specific application scenarios.
In this embodiment, the video occlusion processing system may include a front-end processing device (i.e., the encoding end) and a back-end processing device (i.e., the decoding end).
Illustratively, the front-end processing device may include a detection module, a matting module, an occlusion module, an encoding module, an encryption module, and an encapsulation module.
The back-end processing device may include a decapsulation module, a decryption module, a decoding module, and a composition module.
For example, the video masking processing flow of each module can be as shown in fig. 3A. The overall scheme ensures that the masked low-permission data stream can still be played by all existing standard players, without affecting compatibility.
The functions of the respective modules will be described below with reference to fig. 3A.
1. Front-end processing apparatus
1.1 detection Module
The input of the detection module is an original video image; by means of computer-vision target detection and the like, it outputs the coordinates of one or more dynamic target regions.
Illustratively, this set of coordinates may be denoted as x.
For example, the content of the dynamic target area may include a human face or license plate information.
Illustratively, the dynamic target area is a rectangle, and the coordinate information of the dynamic target area can be represented as two-dimensional coordinates of two vertexes of the upper left corner and the lower right corner of the image pixel lattice, or the coordinates of the upper left vertex and the width, height and the like of the area.
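The two rectangle representations mentioned (two opposite vertices, versus top-left vertex plus width and height) are interchangeable, e.g.:

```python
def corners_to_xywh(x1, y1, x2, y2):
    # (top-left, bottom-right) vertices -> (top-left, width, height)
    return x1, y1, x2 - x1, y2 - y1

def xywh_to_corners(x, y, w, h):
    # (top-left, width, height) -> (top-left, bottom-right) vertices
    return x, y, x + w, y + h
```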
For example, the processing frame rate of the detection module is related to the computational power available in the actual system, and is not limited in the embodiment of the present application.
For example, the detection module may further output a classification identifier for each detected dynamic target region; the classification identifier is used later, in conjunction with viewing-permission rating control, to control which content can be viewed.
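As a minimal sketch of the detection module's output, the coordinate set x can be represented as a list of region records. The field names (`x1`, `y1`, `w`, `h`, `class_id`) and the stub detector are illustrative assumptions, not definitions from this embodiment.

```python
def detect_dynamic_targets(frame):
    """Stub detector: a real system would run face / license-plate
    detection (computer-vision target detection) on the frame here.

    Each returned record holds the upper-left vertex plus width/height
    of one dynamic target region, and a classification identifier used
    later for viewing-permission rating control.
    """
    return [
        {"x1": 120, "y1": 80, "w": 64, "h": 48, "class_id": 0},   # e.g. a face
        {"x1": 300, "y1": 200, "w": 96, "h": 32, "class_id": 1},  # e.g. a plate
    ]

x = detect_dynamic_targets(frame=None)
```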
1.2 Matting Module
The input of the matting module is the original video and the coordinate set x output by the detection module.
Illustratively, the matting module can expand the boundary of each dynamic target region, extending the rectangle outward by n1 pixels in at least one direction, where n1 is a natural number.
Illustratively, the upper and left boundaries of the dynamic target rectangle may be extended up and left, respectively, and aligned to positions that are integer multiples of 2 to the power n2, with the upper-left corner of the image as the origin.
Illustratively, the lower and right boundaries of the dynamic target rectangle may be extended down and right, respectively, and aligned to positions that are integer multiples of 2 to the power n2, with the upper-left corner of the image as the origin.
Illustratively, n2 is a natural number greater than 3 and smaller than 8.
Illustratively, the extended set of coordinates of the area to be masked may be referred to as y.
The matting module guarantees full coverage of the occlusion by expanding the boundary of each region to be masked. When performing the boundary expansion, making the distance from each expanded boundary to the corresponding image boundary an integer multiple of 2 to the power n2 improves coding efficiency.
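The expansion and alignment steps above can be sketched as follows; the clamping at the image origin and the concrete values of n1 and n2 are illustrative assumptions.

```python
def expand_and_align(x1, y1, x2, y2, n1=4, n2=4):
    """Expand a target rectangle by n1 pixels on every side, then align
    the top/left edges outward (toward the origin) and the bottom/right
    edges outward to integer multiples of 2**n2, with the image's
    upper-left corner as the origin.
    """
    step = 1 << n2  # 2**n2, e.g. 16 when n2 = 4
    # Step 1: expand outward by n1 pixels (clamped at the image origin).
    x1, y1 = max(0, x1 - n1), max(0, y1 - n1)
    x2, y2 = x2 + n1, y2 + n1
    # Step 2: align each edge to an integer multiple of 2**n2.
    x1 = (x1 // step) * step      # floor: move left edge further left
    y1 = (y1 // step) * step      # floor: move top edge further up
    x2 = -(-x2 // step) * step    # ceiling: move right edge further right
    y2 = -(-y2 // step) * step    # ceiling: move bottom edge further down
    return x1, y1, x2, y2

print(expand_and_align(123, 77, 200, 150, n1=4, n2=4))  # → (112, 64, 208, 160)
```

Aligning outward (floor for the top/left edges, ceiling for the bottom/right) guarantees the aligned rectangle still covers the expanded one.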
It should be noted that, in addition to the region coordinate set x output by the detection module, in practical applications there may also be preset coordinates, such as fixed-window coordinates, that are likewise treated as regions to be masked (i.e., the second type of region to be masked) in subsequent processing; these region coordinates are therefore also added to the coordinate set y.
The union set z is taken over the region coordinate set y in the original video pixels: a pixel belongs to the set z if it falls into any rectangular coordinate region defined in the set y. The values of the pixels in the union set z are copied to another block of memory space (i.e., the preset image mentioned above).
Illustratively, the preset image has the same width, height, and resolution as the original video frame, and is uniformly assigned the initial value a.
Exemplarily, the initial value a = 0, 127, 128, or 255.
Based on the regions to be masked obtained after boundary expansion, their pixels are extracted from the original video image data and copied to the corresponding coordinate positions of the preset image, yielding another video path (also called the second video path). Because the two video paths have the same resolution, post-processing synthesis remains simple.
The preset image is filled with a specific preset value, so the code rate of the second video path can be greatly reduced, saving storage space and transmission traffic.
For example, when the pixel values in the set z are copied from the original video to the preset image, each pixel value can be copied to the position with the same pixel coordinate. The part of the preset image outside the regions to be masked is not assigned and retains the initial value a. After the copying is completed, the preset images form video 2.
Illustratively, pixel values in a plurality of areas to be masked in the same original video image are copied to the same preset image.
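A minimal sketch of building one video-2 frame, assuming frames are NumPy arrays and regions to be masked are axis-aligned rectangles (x1, y1, x2, y2); all regions of one original image are copied to identical coordinates in a single preset image pre-filled with the initial value a.

```python
import numpy as np

def build_second_image(frame, regions, init_value=128):
    """Build a video-2 frame: same resolution as the original frame,
    uniformly filled with init_value; the pixels of every region to be
    masked from the same original image are copied to identical
    coordinates of this one preset image."""
    preset = np.full_like(frame, init_value)
    for (x1, y1, x2, y2) in regions:
        preset[y1:y2, x1:x2] = frame[y1:y2, x1:x2]
    return preset

frame = np.arange(64, dtype=np.uint8).reshape(8, 8)
second = build_second_image(frame, [(2, 2, 5, 5)], init_value=0)
```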
Illustratively, the frame rate at which the matting module processes the output video 2 is less than the frame rate of the original video.
Illustratively, the frame rate of video 2 is 1/m of the frame rate of the original video, which reduces the requirement on system coding performance and facilitates adaptation to products of different performance levels.
Illustratively, m is a natural number greater than 1 and less than 100.
Alternatively, the frame rate at which the matting module processes the output video 2 can be equal to the original video frame rate.
1.3 Shielding Module
The input of the shielding module is an original video, and each region to be shielded in the video image is shielded.
For example, the video occlusion processing mode may employ multi-tap filtering.
For example, multi-tap average filtering and downsampling may be employed.
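One concrete (assumed) realization of average filtering with downsampling is a block-mean mosaic: each tile of the region is replaced by its mean, which irreversibly destroys detail. The block size stands in for the filter taps, which the text does not fix.

```python
import numpy as np

def mask_region(frame, x1, y1, x2, y2, block=8):
    """Mask a rectangular region by replacing each block x block tile
    with its mean value (a mosaic), one simple realization of
    multi-tap average filtering plus downsampling."""
    out = frame.copy()
    roi = out[y1:y2, x1:x2].astype(np.float32)
    h, w = roi.shape
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = roi[by:by + block, bx:bx + block]
            tile[...] = tile.mean()  # every pixel in the tile gets the tile mean
    out[y1:y2, x1:x2] = roi.astype(frame.dtype)
    return out

frame = np.arange(256, dtype=np.uint8).reshape(16, 16)
video1_frame = mask_region(frame, 0, 0, 16, 16, block=8)
```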
Illustratively, the original video is masked and then output as video 1.
1.4 Coding Module
The input of the coding module is the video 1 and the video 2, and the video 1 and the video 2 are subjected to video coding compression so as to reduce bandwidth consumption of transmission and storage and respectively obtain a code stream 1 and a code stream 2 (which are not encrypted).
For example, the encoding module may use various standards such as h.264 or h.265 to perform video encoding compression on video 1 and video 2.
The code stream corresponding to the masked image (i.e., code stream 1) is a standard code stream, so all standard playback terminals can decode and play it with full compatibility. In addition, the frame rates of video 1 and video 2 may differ, i.e., code stream 1 and code stream 2 need no frame-rate synchronization, so various encoding and decoding terminals can be flexibly combined and adapted.
1.5 Encryption Module
The input of the encryption module is a code stream 2 (not encrypted) and an authority key, and the output is an encrypted code stream 2.
For example, the encryption algorithm and the management method of the authority key used by the encryption module to encrypt the code stream 2 are not limited in the embodiment of the present application.
By encrypting the code stream 2, the safety of the image information in the shielding area is ensured.
In addition, code stream 2 contains only the image information of the masked regions; the remaining positions hold the fixed preset value, so the extra bandwidth required for transmission and storage is low.
1.6 Encapsulation Module
The input of the encapsulation module is a code stream 1 and a code stream 2.
Illustratively, the coordinate set y of the region to be masked, the code stream 1 and the code stream 2, may be composited together in a time axis aligned manner for output.
Illustratively, the coordinate sets y of the regions to be masked that participate in the composition are aligned one-to-one with the frames of code stream 2.
For example, the coordinate set y of the region to be masked and the code stream 2 may be compounded as the accompanying private data of the temporally corresponding frame in the code stream 1.
Illustratively, the classification identifier of each region to be masked and the coordinate set y can also be added into the code stream.
For example, the specific data definition and implementation of the encapsulation module may refer to standards such as ISO/IEC 13818 or RFC3984, which is not limited in the embodiments of the present application.
For example, the encapsulated data output by the encapsulation module may be used for network transmission or data storage.
By compositing and encapsulating code stream 1 and code stream 2 together, the decoding end can use a dedicated player for authenticated playback, achieving information protection while still guaranteeing that high-permission users can obtain the information.
When the coordinate set y of the regions to be masked is encapsulated into the data stream, the back-end synthesis can be completed precisely; when it is not, the preset-value filling still makes identifying the masked regions at the back end simple. The back end synthesizes the two video paths at low decoding and display cost, and high permission levels can view the unmasked images essentially without loss.
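The composite encapsulation can be pictured as attaching per-frame private data to code stream 1; the record layout below is an illustrative assumption (a real implementation would follow ISO/IEC 13818 private data or an RFC 3984-style payload format).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class EncapsulatedFrame:
    """One code-stream-1 frame plus its accompanying private data,
    aligned on the time axis via the presentation timestamp."""
    pts: int                                  # presentation timestamp
    stream1_payload: bytes                    # standard, always-playable masked frame
    stream2_payload: Optional[bytes] = None   # encrypted matting data; may be absent,
                                              # since video 2 can run at a lower frame rate
    regions_y: List[Tuple[int, int, int, int]] = field(default_factory=list)  # set y
    class_ids: List[int] = field(default_factory=list)  # one identifier per region

plain = EncapsulatedFrame(pts=0, stream1_payload=b"\x00\x01")
with_mask = EncapsulatedFrame(pts=40, stream1_payload=b"\x02",
                              stream2_payload=b"\x03",
                              regions_y=[(112, 64, 208, 160)], class_ids=[0])
```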
2. Back-end processing device
2.1 Decapsulation Module
The input of the decapsulation module is a code stream of composite encapsulation, and the code stream 1 and the code stream 2 are output.
Illustratively, when the code stream subjected to composite encapsulation further includes a region coordinate set y to be masked, the decapsulation module may further output the code stream corresponding to the region coordinate set y to be masked.
Illustratively, when the code stream subjected to composite encapsulation further includes a coordinate set y of the area to be masked and a classification identifier of the area to be masked, the decapsulation module may further output the code stream corresponding to the coordinate set y of the area to be masked and the classification identifier of the area to be masked.
2.2 Decryption Module
The input of the decryption module is a code stream 2 (after encryption) and an authority key, the code stream 2 is decrypted according to the input authority key, and the code stream 2 is output (not encrypted).
Illustratively, the decryption permission level (i.e., the unmasked permission level described above) may also be entered at the same time as the permission key is entered.
2.3 Decoding Module
The input of the decoding module comprises a code stream 1 and a code stream 2 (not encrypted), and the decoding module can decode the code stream 1 and the code stream 2 respectively and output a video 1 and a video 2.
Illustratively, when the input further includes a code stream corresponding to the coordinate set y of the area to be masked, the decoding module may decode the code stream corresponding to the coordinate set y of the area to be masked to obtain the coordinate set y of the area to be masked.
Illustratively, when the input further includes a code stream corresponding to the coordinate set y of the region to be masked and the classification identifier of the region to be masked, the decoding module may decode the code stream corresponding to the coordinate set y of the region to be masked and the classification identifier of the region to be masked, to obtain the coordinate set y of the region to be masked and the classification identifier of the region to be masked.
For example, for any original video image (i.e., the first video image) and its corresponding masked image, the regions to be masked in the original video image correspond one-to-one to the masked regions in the masked image: any region to be masked in the original video image has a corresponding masked region at the same position in the masked image. The position information of the regions to be masked that the encoding end encapsulates into the data stream can therefore also indicate the positions of the masked regions in the masked image.
2.4 Synthesis Module
The input of the synthesis module comprises a video 1 and a video 2, and the synthesis module can perform video synthesis processing on the video 2 and the video 1 according to the frame rate of the video 2 so as to perform de-occlusion processing on at least one region to be occluded in the video 1.
For example, if the input of the synthesis module includes the coordinate set y of the regions to be masked, the synthesis module may determine the portions of video 2 belonging to the masked regions according to the coordinate set y, and copy the pixel values of at least one masked region to the corresponding positions in video 1, overwriting the masked data in video 1.
If the input of the synthesis module further includes the classification identifiers of the regions to be masked, the synthesis module may determine, according to the decryption permission level, the class of masked region (i.e., the target class mentioned above) matching that level, and copy the pixel values of the masked regions of the target class in video 2 to the corresponding positions in video 1.
If the input of the synthesis module does not include the coordinate set y, the synthesis module can search for the pixel positions in video 2 whose values deviate from the preset value a, determine the rectangular range covering those positions as the masked region, and copy the pixel values of that masked region in video 2 to the corresponding positions in video 1.
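The preset-value search just described can be sketched as follows for a single masked region; handling several disjoint regions would additionally need connected-component labelling, which this sketch omits.

```python
import numpy as np

def compose_without_coordinates(video1_frame, video2_frame, preset=128):
    """De-occlude when no coordinate set y was transmitted: find the
    pixels of the video-2 frame deviating from the preset value a, take
    their bounding rectangle as the masked region, and copy that region
    into the video-1 frame."""
    ys, xs = np.nonzero(video2_frame != preset)
    if ys.size == 0:
        return video1_frame          # nothing to de-occlude
    y1, y2 = ys.min(), ys.max() + 1
    x1, x2 = xs.min(), xs.max() + 1
    out = video1_frame.copy()
    out[y1:y2, x1:x2] = video2_frame[y1:y2, x1:x2]
    return out

v2 = np.full((8, 8), 128, dtype=np.uint8)
v2[2:4, 3:6] = 7                     # the only non-preset pixels
v1 = np.zeros((8, 8), dtype=np.uint8)
out = compose_without_coordinates(v1, v2)
```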
Thus, a partially or fully de-occluded composite video (which may be referred to as a de-occluded video) can be viewed by high-permission users.
By performing the permission classification on the occlusion region, different video viewing effects can be set for different de-occlusion permissions (e.g., partial occlusion region de-occlusion or full occlusion region de-occlusion).
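The permission-classified de-occlusion reduces to filtering the masked regions by classification identifier; the mapping from a decryption permission level to the set of allowed classes is an assumed input here.

```python
def regions_for_permission(regions, class_ids, allowed_classes):
    """Keep only the masked regions whose classification identifier is
    among the classes granted by the viewer's de-occlusion permission."""
    return [r for r, c in zip(regions, class_ids) if c in allowed_classes]

# e.g. a level may de-occlude license plates (class 1) but not faces (class 0)
visible = regions_for_permission(
    [(0, 0, 16, 16), (32, 32, 48, 48)], [0, 1], allowed_classes={1})
```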
For example, the playback frame rate of the de-occluded video may be less than or equal to the frame rate of video 2.
For example, fig. 3B shows a schematic diagram of the effect of the masking and de-masking processing in this embodiment. As shown in fig. 3B, the encoding end performs masking and matting on the original video image to obtain two video paths: one is the video corresponding to the masked image obtained after the regions to be masked are masked, and the other is the video corresponding to the image that stores the original image data of the masked regions.
As shown in fig. 3B, for the same original video image, original image data of a plurality of regions to be masked in the original video image are stored in the same image.
The encoding end can encode the two paths of videos, encrypt code streams corresponding to original image data of the shielding area, and package and transmit the two paths of videos.
At the decoding end, an ordinary-permission user decodes the first video path to obtain the masked image for playback and display. An advanced-permission user, on the one hand, decodes the first video path to obtain the masked image; on the other hand, decodes and decrypts the original image data of the masked regions, and synthesizes it with the masked image to de-occlude it, obtaining the complete image for playback and display.
The methods provided herein are described above. The following describes the apparatus provided in the present application:
referring to fig. 4, a schematic structural diagram of an encoding apparatus provided in an embodiment of the present application is shown in fig. 4, where the encoding apparatus may include:
an obtaining unit 410, configured to obtain position information of a region to be masked in a first video image;
the preprocessing unit 420 is configured to copy the pixel value of the region to be masked in the first video image to a preset image according to the position information to obtain a second video image; when a plurality of areas to be shaded exist in the first video image, copying pixel values in the areas to be shaded in the same first video image to the same preset image to obtain a second video image; the initial value of each position in the preset image is a preset value;
the masking unit 430 is configured to mask the region to be masked in the first video image to obtain a masked image;
the encoding unit 440 is configured to encode the mask image to obtain a first code stream, and encode and encrypt the second video image to obtain a second code stream;
the encapsulating unit 450 is configured to encapsulate the first code stream, the second code stream, and the position information of the area to be masked, so as to obtain an encapsulated data stream.
In some embodiments, the region to be masked comprises a first type of region to be masked, and/or a second type of region to be masked; the first type of region to be shielded is determined by a target detection mode of the first video image, and the second type of region to be shielded is determined by a preset mode.
In some embodiments, when the region to be masked includes the first type region to be masked, the first type region to be masked is obtained by performing boundary expansion on a dynamic target region, where the dynamic target region is a region to be masked determined by performing a target detection method on the first video image.
In some embodiments, the performing the boundary extension on the dynamic target region includes:
and expanding n1 pixels outwards from at least one boundary of the dynamic target area, wherein n1 is a positive integer.
In some embodiments, the dynamic target region is a rectangular region, and the distance from the expanded boundary to the specified boundary is an integer multiple of the n2 power of 2; the specified boundary is a boundary parallel to the boundary expanded by the target area in the left boundary and the upper boundary of the first video image, and n2 is a positive integer.
In some embodiments, the pre-processing unit 420 copies the pixel values of the region to be masked in the first video image to a preset image according to the position information, including:
for any region to be shielded in the first video image, determining second position information of the region to be shielded in a preset image according to first position information of the region to be shielded in the first video image and a preset position mapping relation; the preset position mapping relation comprises a mapping relation between the position of the area to be shielded in the first video image and the position of the area to be shielded in the preset image;
copying the pixel value of the area to be shielded in the first video image to a target position in the preset image, wherein the target position is a position in the preset image matched with the second position information.
In some embodiments, the preset image has the same resolution as the first video image;
the pre-processing unit 420 copies the pixel value of the region to be masked in the first video image to the target position in the preset image, and includes:
for any pixel position in the area to be shielded in the first video image, copying the pixel value of the pixel position to a position, which is the same as the coordinate of the pixel position, in the preset image; and the corresponding coordinates of the first position information in the first video image are the same as the corresponding coordinates of the second position information in the preset image.
In some embodiments, the encapsulating unit 450 encapsulates the first code stream, the second code stream, and the position information of the area to be masked, and further includes:
and performing composite packaging on the second code stream corresponding to the second video image and the position information of the area to be shielded as the attached private data of the first code stream corresponding to the first video image.
In some embodiments, the obtaining unit 410 is further configured to obtain a classification identifier of the region to be masked;
the encapsulating unit 450 encapsulates the first code stream, the second code stream, and the position information of the area to be masked, including:
and performing composite packaging on the second code stream corresponding to the second video image, the position information of the area to be shielded and the classification identifier of the area to be shielded as the attached private data of the first code stream corresponding to the first video image.
In some embodiments, a frame rate of the second video image is less than or equal to a frame rate of the first video image.
In some embodiments, the frame rate of the second video image is 1/m of the frame rate of the first video image; wherein m is a positive integer.
Referring to fig. 5, a schematic structural diagram of a decoding apparatus according to an embodiment of the present application is shown in fig. 5, where the decoding apparatus may include:
an obtaining unit 510, configured to obtain an encapsulated data stream;
a decapsulation unit 520, configured to decapsulate the encapsulated data stream to obtain the first code stream, the second code stream, and the location information of the area to be masked;
a decoding unit 530, configured to decode the first code stream to obtain a mask image for an accessor whose unmasking permission is a first type permission level; wherein the first type permission level is a level without obscured area viewing permission;
a display unit 540, configured to display the mask image;
the decoding unit 530 is further configured to decode the first code stream to obtain a mask image and decrypt and decode the second code stream to obtain a second video image for an accessor whose unmasking authority is at the second type authority level; the second type authority level is a level with a designated occlusion region viewing authority, the designated occlusion region comprises at least one occlusion region in the occlusion image, and the occlusion region is a region obtained after the occlusion region to be occluded in the first video image is occluded;
a de-occlusion unit 550, configured to perform de-occlusion processing on the specified occlusion region in the occlusion image according to the second video image and the position information of the region to be occluded;
the display unit 540 is further configured to display the video image after performing the de-occlusion processing on the designated occlusion region.
In some embodiments, the de-occlusion unit 550 performs de-occlusion processing on the designated occlusion region in the occlusion image according to the second video image and the position information of the region to be occluded, including:
determining the position information of the appointed shielding region in the shielding image according to the position information of the region to be shielded; for any region to be shielded, the position information of the region in the first video image is the same as the position information of the shielded region in the shielded image obtained by shielding the region to be shielded;
determining the position of the designated shielding region in the second video image according to the position information of the designated shielding region in the shielding image and a preset position mapping relation; the preset position mapping relation comprises a mapping relation between the position of the shielding area in the shielding image and the position of the shielding area in the second video image;
copying the pixel values of the specified occlusion region in the second video image to the corresponding position in the occlusion image according to the position of the specified occlusion region in the second video image to obtain the video image with the specified occlusion region subjected to the occlusion removal processing.
In some embodiments, the second type of permission level includes at least two different permission levels, the different permission levels allowing the categories of obscured regions viewed to be not identical;
the obtaining unit 510 is further configured to obtain a classification identifier of the region to be shielded;
the de-occlusion unit 550 determines the position information of the designated occlusion region in the occlusion image according to the position information of the region to be occluded, including:
determining the target class of the shielding region matched with the de-shielding authority level according to the de-shielding authority level; for any region to be shielded, the type of the region to be shielded is the same as that of a shielded region obtained by shielding the region to be shielded;
determining the shielding area of the target category as the designated shielding area according to the target category and the classification identification of each shielding area in the shielding areas;
and determining the position information of the appointed shielding region in the shielding image according to the position information of the region to be shielded.
In some embodiments, the obtaining unit 510 is further configured to obtain an encapsulated data stream;
the decapsulation unit 520 is further configured to decapsulate the encapsulated data stream to obtain the first code stream and the second code stream;
the decoding unit 530 is further configured to decode the first code stream to obtain a mask image for an accessor whose unmasking authority is at the first type authority level, and display the mask image;
the decoding unit 530 is further configured to, for a visitor whose unmasking authority is at the second type authority level, determine, according to the pixel value of each pixel position in the second video image, the position information of a rectangular region formed by the pixel positions whose pixel values are not the preset value, as the position information of an occlusion region in the second video image;
the de-occlusion unit 550 is further configured to copy the pixel value of the designated occlusion region in the second video image to a corresponding position in the occlusion image according to the position information of the occlusion region in the second video image and a preset position mapping relationship, so as to obtain a video image after de-occlusion processing of the designated occlusion region; the preset position mapping relation comprises a mapping relation between the position of a shielding region in the shielding image and the position of the shielding region in the second video image;
the display unit 540 is further configured to display the video image after performing the de-occlusion processing on the designated occlusion region.
Please refer to fig. 6, which is a schematic diagram of a hardware structure of an encoder device according to an embodiment of the present disclosure. The encoding end device may include a processor 601, a machine-readable storage medium 602 storing machine-executable instructions. The processor 601 and the machine-readable storage medium 602 may communicate via a system bus 603. Also, the processor 601 may perform the encoding method described above by reading and executing machine executable instructions in the machine readable storage medium 602 corresponding to the encoding control logic.
The machine-readable storage medium 602 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid-state drive, any type of storage disk (e.g., an optical disk or a DVD), a similar storage medium, or a combination thereof.
In some embodiments, there is also provided a machine-readable storage medium having stored therein machine-executable instructions that, when executed by a processor, implement the encoding method described above. For example, the machine-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 7 is a schematic diagram of a hardware structure of a decoding-side device according to an embodiment of the present disclosure. The decode-side device may include a processor 701, a machine-readable storage medium 702 having stored thereon machine-executable instructions. The processor 701 and the machine-readable storage medium 702 may communicate via a system bus 703. Also, the processor 701 may perform the above-described decoding method by reading and executing machine-executable instructions corresponding to the decoding control logic in the machine-readable storage medium 702.
The machine-readable storage medium 702 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid-state drive, any type of storage disk (e.g., an optical disk or a DVD), a similar storage medium, or a combination thereof.
In some embodiments, there is also provided a machine-readable storage medium having stored therein machine-executable instructions which, when executed by a processor, implement the decoding method described above. For example, the machine-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so forth.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (19)

1. An encoding method applied to a front-end video acquisition device, the method comprising:
acquiring position information of a region to be shielded in a first video image;
copying the pixel value of the area to be shielded in the first video image to a preset image according to the position information to obtain a second video image; when a plurality of areas to be shielded exist in the first video image, copying pixel values in the plurality of areas to be shielded in the same first video image to the same preset image to obtain the second video image; the initial value of each position in the preset image is a preset value;
shielding the area to be shielded in the first video image to obtain a shielded image;
coding the shielding image to obtain a first code stream, and coding and encrypting the second video image to obtain a second code stream;
and packaging the first code stream, the second code stream and the position information of the area to be shielded to obtain a packaged data stream.
2. The method according to claim 1, characterized in that the area to be masked comprises an area to be masked of a first type and/or an area to be masked of a second type; the first type of region to be shielded is determined by a target detection mode of the first video image, and the second type of region to be shielded is determined by a preset mode.
3. The method according to claim 2, wherein when the region to be masked includes the first type region to be masked, the first type region to be masked is obtained by performing boundary extension on a dynamic target region, and the dynamic target region is a region to be masked determined by performing a target detection on the first video image.
4. The method according to claim 3, wherein performing the boundary extension on the dynamic target region comprises:
extending at least one boundary of the dynamic target region outward by n1 pixels, where n1 is a positive integer.
5. The method according to claim 4, wherein the dynamic target region is a rectangular region, and the distance from an extended boundary to a designated boundary is an integer multiple of 2 to the power of n2; wherein the designated boundary is the boundary, among the left boundary and the top boundary of the first video image, that is parallel to the extended boundary of the dynamic target region, and n2 is a positive integer.
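Claims 4 and 5 together describe extending a detected rectangle outward by n1 pixels and then aligning its left/top edges to a 2^n2 grid relative to the image origin (alignment of this kind typically matches coding-block boundaries). A hypothetical sketch, with clamping at the image origin as an assumption:

```python
def extend_rect(x, y, w, h, n1, n2):
    """Extend a rectangle (x, y, w, h) outward by n1 pixels on every side,
    then snap its left/top edges so their distance to the image's left/top
    boundary is an integer multiple of 2**n2 (clamped at the image origin)."""
    step = 1 << n2
    left = max(0, x - n1)
    top = max(0, y - n1)
    left -= left % step   # distance to left image boundary: multiple of 2**n2
    top -= top % step     # distance to top image boundary: multiple of 2**n2
    right, bottom = x + w + n1, y + h + n1
    return left, top, right - left, bottom - top
```

Snapping only ever moves an edge outward (toward the origin), so the extended rectangle always contains the original detection.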
6. The method according to any one of claims 1 to 5, wherein copying the pixel values of the region to be masked in the first video image into the preset image according to the position information comprises:
for any region to be masked in the first video image, determining second position information of the region in the preset image according to first position information of the region in the first video image and a preset position mapping relation; wherein the preset position mapping relation comprises a mapping between the position of the region to be masked in the first video image and its position in the preset image;
copying the pixel values of the region to be masked in the first video image to a target position in the preset image, the target position being the position in the preset image that matches the second position information.
7. The method according to claim 6, wherein the preset image has the same resolution as the first video image;
copying the pixel values of the region to be masked in the first video image to the target position in the preset image comprises:
for any pixel position in the region to be masked in the first video image, copying the pixel value at that pixel position to the position in the preset image having the same coordinates; wherein the coordinates corresponding to the first position information in the first video image are the same as the coordinates corresponding to the second position information in the preset image.
8. The method according to any one of claims 1 to 5, wherein encapsulating the first code stream, the second code stream, and the position information of the region to be masked comprises:
compositely encapsulating the second code stream corresponding to the second video image and the position information of the region to be masked as appended private data of the first code stream corresponding to the first video image.
9. The method according to claim 8, wherein, before encapsulating the first code stream, the second code stream, and the position information of the region to be masked, the method further comprises:
acquiring a classification identifier of the region to be masked;
and encapsulating the first code stream, the second code stream and the position information of the region to be masked comprises:
compositely encapsulating the second code stream corresponding to the second video image, the position information of the region to be masked, and the classification identifier of the region to be masked as appended private data of the first code stream corresponding to the first video image.
10. The method according to any of claims 1-5, wherein a frame rate of the second video image is less than or equal to a frame rate of the first video image.
11. The method of claim 10, wherein the frame rate of the second video image is 1/m of the frame rate of the first video image; wherein m is a positive integer.
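The frame-rate relation of claims 10 and 11 amounts to producing a second video image only for every m-th frame of the first stream. A trivial illustrative sketch (the function name is an assumption):

```python
def second_stream_frame_indices(total_frames, m):
    """Indices of first-stream frames for which a second video image is
    produced when the second stream runs at 1/m of the first frame rate."""
    return [i for i in range(total_frames) if i % m == 0]
```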
12. A decoding method, based on the encoding method of any one of claims 1 to 11, the method comprising:
acquiring an encapsulated data stream, and decapsulating the encapsulated data stream to obtain the first code stream, the second code stream, and the position information of the region to be masked;
for a visitor whose unmasking authority is of a first-type authority level, decoding the first code stream to obtain a masked image, and displaying the masked image; wherein the first-type authority level is a level without viewing authority for any masked region;
for a visitor whose unmasking authority is of a second-type authority level, decoding the first code stream to obtain the masked image, and decrypting and decoding the second code stream to obtain the second video image; unmasking a designated masked region in the masked image according to the second video image and the position information of the region to be masked, and displaying the video image in which the designated masked region has been unmasked; wherein the second-type authority level is a level with viewing authority for the designated masked region, the designated masked region comprises at least one masked region in the masked image, and a masked region is a region obtained by masking a region to be masked in the first video image.
13. The method according to claim 12, wherein unmasking the designated masked region in the masked image according to the second video image and the position information of the region to be masked comprises:
determining position information of the designated masked region in the masked image according to the position information of the region to be masked; wherein, for any region to be masked, its position information in the first video image is the same as the position information, in the masked image, of the masked region obtained by masking it;
determining the position of the designated masked region in the second video image according to the position information of the designated masked region in the masked image and a preset position mapping relation; wherein the preset position mapping relation comprises a mapping between the position of a masked region in the masked image and its position in the second video image;
copying the pixel values of the designated masked region in the second video image to the corresponding position in the masked image according to the position of the designated masked region in the second video image, to obtain the video image in which the designated masked region has been unmasked.
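The copy-back step of claim 13 can be sketched as below. This is illustrative only; it assumes the identity position mapping of claim 7, i.e. a region occupies the same coordinates in both images.

```python
def unmask_region(masked_image, second_image, region):
    """Restore one designated masked region by copying its pixels back from
    the second video image, assuming the identity position mapping of
    claim 7 (same coordinates in both images)."""
    x, y, w, h = region
    restored = [row[:] for row in masked_image]   # leave other regions masked
    for j in range(y, y + h):
        for i in range(x, x + w):
            restored[j][i] = second_image[j][i]
    return restored
```

Because only the designated region is copied, regions the visitor is not authorised to view stay masked in the displayed image.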
14. The method according to claim 13, wherein the second-type authority level comprises at least two different authority levels, and the categories of masked regions that different authority levels are allowed to view are not identical;
before determining the position information of the designated masked region in the masked image according to the position information of the region to be masked, the method further comprises:
acquiring the classification identifier of the region to be masked;
and determining the position information of the designated masked region in the masked image according to the position information of the region to be masked comprises:
determining, according to the visitor's unmasking authority level, a target category of masked region matching that authority level; wherein, for any region to be masked, its category is the same as that of the masked region obtained by masking it;
determining the masked regions of the target category as the designated masked region according to the target category and the classification identifier of each masked region;
and determining the position information of the designated masked region in the masked image according to the position information of the region to be masked.
15. The method according to claim 12, further comprising:
acquiring an encapsulated data stream, and decapsulating the encapsulated data stream to obtain the first code stream and the second code stream;
for a visitor whose unmasking authority is of the first-type authority level, decoding the first code stream to obtain the masked image, and displaying the masked image;
for a visitor whose unmasking authority is of the second-type authority level, determining, according to the pixel value at each pixel position in the second video image, the position information of the rectangular region formed by the pixel positions whose pixel values are not the preset value, as the position information of the masked region in the second video image;
copying the pixel values of the designated masked region in the second video image to the corresponding position in the masked image according to the position information of the masked region in the second video image and a preset position mapping relation, to obtain the video image in which the designated masked region has been unmasked, and displaying it; wherein the preset position mapping relation comprises a mapping between the position of the masked region in the masked image and its position in the second video image.
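Claim 15 recovers the region position without transmitted position information, by finding the pixels of the second video image that differ from the preset value. A hedged sketch, assuming (as the claim implies) that genuine region pixels never equal the preset value and that one rectangular region is present:

```python
PRESET = 0  # assumed preset value

def find_masked_region(second_image):
    """Bounding rectangle (x, y, w, h) of all pixels differing from the
    preset value, or None if every pixel equals the preset value."""
    xs = [i for row in second_image for i, v in enumerate(row) if v != PRESET]
    ys = [j for j, row in enumerate(second_image) if any(v != PRESET for v in row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
```

In practice an encoder using this fallback would need to keep the preset value out of the copied pixel data (or tolerate a slightly loose bounding box).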
16. An encoding apparatus, deployed in a front-end video capture device, the apparatus comprising:
an acquisition unit, configured to acquire position information of a region to be masked in a first video image;
a preprocessing unit, configured to copy pixel values of the region to be masked in the first video image into a preset image according to the position information, to obtain a second video image; wherein, when a plurality of regions to be masked exist in the first video image, the pixel values of all regions to be masked in the same first video image are copied into the same preset image to obtain the second video image; and the initial value at each position of the preset image is a preset value;
a masking unit, configured to mask the region to be masked in the first video image to obtain a masked image;
an encoding unit, configured to encode the masked image to obtain a first code stream, and to encode and encrypt the second video image to obtain a second code stream;
and an encapsulation unit, configured to encapsulate the first code stream, the second code stream and the position information of the region to be masked to obtain an encapsulated data stream.
17. A decoding apparatus corresponding to the encoding apparatus of claim 16, the apparatus comprising:
an acquisition unit, configured to acquire an encapsulated data stream;
a decapsulation unit, configured to decapsulate the encapsulated data stream to obtain a first code stream, a second code stream, and position information of a region to be masked;
a decoding unit, configured to, for a visitor whose unmasking authority is of a first-type authority level, decode the first code stream to obtain a masked image; wherein the first-type authority level is a level without viewing authority for any masked region;
a display unit, configured to display the masked image;
the decoding unit being further configured to, for a visitor whose unmasking authority is of a second-type authority level, decode the first code stream to obtain the masked image, and decrypt and decode the second code stream to obtain a second video image; wherein the second-type authority level is a level with viewing authority for a designated masked region, the designated masked region comprises at least one masked region in the masked image, and a masked region is a region obtained by masking a region to be masked in the first video image;
an unmasking unit, configured to unmask the designated masked region in the masked image according to the second video image and the position information of the region to be masked;
and the display unit being further configured to display the video image in which the designated masked region has been unmasked.
18. An encoding end device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being configured to execute the machine-executable instructions to implement the method of any one of claims 1-11.
19. A decode-side device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor for executing the machine-executable instructions to implement the method of any one of claims 12-15.
CN202110711430.4A 2021-06-25 2021-06-25 Encoding and decoding method and device and electronic equipment Active CN113453012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110711430.4A CN113453012B (en) 2021-06-25 2021-06-25 Encoding and decoding method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113453012A CN113453012A (en) 2021-09-28
CN113453012B true CN113453012B (en) 2023-02-28

Family

ID=77812860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110711430.4A Active CN113453012B (en) 2021-06-25 2021-06-25 Encoding and decoding method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113453012B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114268807B (en) * 2021-12-24 2023-08-01 杭州当虹科技股份有限公司 Automatic testing method for real-time intelligent station-covering mark
CN114205587A (en) * 2021-12-29 2022-03-18 杭州海康威视数字技术股份有限公司 Image display method, device, system and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106358069A (en) * 2016-10-31 2017-01-25 维沃移动通信有限公司 Video data processing method and mobile terminal
CN108206930A (en) * 2016-12-16 2018-06-26 杭州海康威视数字技术股份有限公司 The method and device for showing image is covered based on privacy
CN108965882A (en) * 2018-06-12 2018-12-07 浙江大华技术股份有限公司 A kind of decoding method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102281384A (en) * 2011-07-11 2011-12-14 成都索贝数码科技股份有限公司 Method and device for obscuring abandoned image regions, and method and system for processing video
CN102595145B (en) * 2012-03-13 2014-08-06 宁波大学 Method for error concealment of whole frame loss of stereoscopic video
CN110796133A (en) * 2018-08-01 2020-02-14 北京京东尚科信息技术有限公司 Method and device for identifying file area
JP7404640B2 (en) * 2019-04-03 2023-12-26 沖電気工業株式会社 Encoding device, decoding device, and image processing method
CN111080628B (en) * 2019-12-20 2023-06-20 湖南大学 Image tampering detection method, apparatus, computer device and storage medium
CN111432218B (en) * 2020-04-30 2021-12-14 广州酷狗计算机科技有限公司 Video encoding and decoding method, device, terminal and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant