WO2016161674A1 - Method, device, and system for video image compression and reading

Info

Publication number: WO2016161674A1
Application number: PCT/CN2015/077729
Authority: WO
Inventors: 武晓阳; 浦世亮; 沈林杰; 俞海
Original assignee: 杭州海康威视数字技术股份有限公司
Priority date: 2015-04-08
Filing date: 2015-04-28
Publication date: 2016-10-13
Also published as: CN106162190A

Abstract

Disclosed are a method, device, and system for video image compression and reading. The method comprises: extracting a background layer and a target layer from images to be encoded, where a target in the target layer is a region of interest in the images to be encoded; encoding the target layer and the background layer separately to produce respective streams; and compounding the stream of the target layer and the stream of the background layer. Because the background layer and the target layer are extracted from the images to be encoded, encoded separately, and their streams then compounded, the compound stream can be decoded so that images containing a target object are retrieved directly, which increases the utilization rate of computing resources.

Description

Method, device and system for video image compression and reading

Technical field

The present invention relates to the field of image processing, and in particular, to a method, device and system for video image compression and reading.

Background technique

Digital video compression standards date back to the 1980s. After more than 30 years of development, the existing standards include the ITU-T series H.261 and H.263, the ISO standards MPEG-1 and MPEG-4, and the standards developed jointly by the two organizations: MPEG-2/H.262, H.264/AVC, and HEVC (the newest, released in 2013). There are also standards from other organizations, such as China's AVS, Microsoft's VC-1, and Google's VP8. All of these standards use a block-based hybrid coding framework that combines predictive coding, transform coding, and entropy coding.

The encoding process of the block-based hybrid coding framework is shown in FIG. 1. The image to be encoded is first divided into 16x16 blocks called macroblocks (in HEVC the block size can vary from 8x8 to 64x64, and the largest block is called the largest coding unit, LCU). As shown in FIG. 3, macroblocks are encoded in scanning order, from left to right and top to bottom. Each macroblock first undergoes predictive coding, using the reconstructed image of a previous frame or the already coded portion around the macroblock as a reference, to obtain the prediction residual data. The residual data is then transform coded: a DCT or ICT, chosen according to the block size, transforms the residual data into transform coefficients in the frequency domain. After the transform coefficients are quantized, they are sent to entropy coding to obtain the final code stream. So that the next frame can be encoded effectively, the current quantized data must also be inversely processed, that is, inverse quantized and inverse transformed, and then added to the prediction data to obtain a decoded image, that is, a reconstructed image. The reconstructed image is placed in the reference buffer and serves as the reference image when the next frame is encoded. The decoding process of the block-based hybrid coding framework is shown in FIG. 2. The encoded code stream is entropy decoded, inverse quantized, and inverse transformed, and then added to the predicted image to obtain the decoded image (video signal). The decoded image must be stored for use as a reference image when decoding subsequent frames.
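The predict, transform, quantize, and reconstruct loop described above can be sketched on a single one-dimensional block. This is only an illustrative sketch: the function names, the 4-sample block, and the quantization step are assumptions, entropy coding is left out, and a real codec works on two-dimensional blocks with integer transforms.

```python
import math

def _scale(k, n):
    # Orthonormal DCT scaling factor for coefficient k of an n-point transform.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    return [
        _scale(k, n) * sum(samples[i] * math.cos(math.pi * (i + 0.5) * k / n)
                           for i in range(n))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct(): orthonormal 1-D DCT-III."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n))
        for i in range(n)
    ]

def encode_block(block, prediction, qstep):
    """Predict, transform, and quantize; the returned levels would go to entropy coding."""
    residual = [b - p for b, p in zip(block, prediction)]
    return [round(c / qstep) for c in dct(residual)]

def reconstruct_block(levels, prediction, qstep):
    """Inverse quantize, inverse transform, and add the prediction back.

    The encoder runs this too, so that its reference buffer holds exactly
    what the decoder will see.
    """
    residual = idct([level * qstep for level in levels])
    return [p + r for p, r in zip(prediction, residual)]
```

Because the encoder reconstructs from the quantized levels rather than from the original samples, the encoder and the decoder predict the next frame from the same reference data.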

Predictive coding is an important coding technique for video compression. According to the source of the prediction data, coded images can be divided into I frames (intra-predicted frames, Intra), P frames (inter-predicted frames, Prediction), and B frames (bidirectionally predicted frames, Bi-Prediction). As shown in FIG. 4, when an I frame is predictively coded, only data of the current frame is used for prediction, so it can be decoded independently without relying on other frames. When a P frame is predictively coded, the reconstructed image of a previously encoded frame is used as a reference, so the reference frame must be decoded before the P frame can be decoded. When a B frame is predictively coded, a previous frame and a subsequent frame can be used as references at the same time, which makes them bidirectional reference frames; a B frame can be decoded only after both the previous and the subsequent reference frame have been successfully decoded. Besides using other frames as references, P frames and B frames may also use data of the current frame as a reference, as I frames do, and choose whichever of the two gives the better result. Because I frames can be decoded independently, they are usually used for random access. For example, digital TV requires an I frame to be inserted every 1 to 1.5 seconds so that the user sees an image as soon as possible after switching channels. However, I frames compress less efficiently and their code rate is relatively large, usually 4 to 10 times that of a P frame or even more. In terms of compression efficiency, the usual order is I frame < P frame < B frame. In terms of computational complexity, the usual order is likewise I frame < P frame < B frame.
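The decoding dependencies described above can be made concrete with a small sketch that computes which frames must be decoded before a given frame can be shown. The function name and the example reference table are hypothetical and are not taken from FIG. 4.

```python
def frames_needed(refs, target):
    """Return the set of frame indices that must be decoded to display `target`.

    `refs` maps each frame index to the list of frames it references;
    an I frame has an empty list.
    """
    needed = set()
    stack = [target]
    while stack:
        frame = stack.pop()
        if frame in needed:
            continue
        needed.add(frame)
        stack.extend(refs[frame])  # every reference must be decoded as well
    return needed

# Hypothetical sequence in display order: I0 B1 P2 B3 P4
example_refs = {0: [], 1: [0, 2], 2: [0], 3: [2, 4], 4: [2]}
```

In this example the I frame depends on nothing, while frame 3 (a B frame) pulls in frames 0, 2, and 4, which is why searching a conventionally coded stream for one picture can require decoding many others.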

When performing inter-frame prediction, multiple previously reconstructed images may be used as reference frames. FIG. 5 shows a P-frame multi-frame reference case: when the second P frame is encoded, the two preceding frames are used as references. FIG. 6 shows a B-frame multi-frame reference case: the B frame has two forward reference frames and one backward reference frame. Multi-frame reference can improve compression efficiency, but it increases computational complexity.

In practical applications, especially video surveillance, users tend to be interested in specific targets in the picture, such as people, cars, and entrances and exits, and want the picture quality of these areas to be clear; this is region-of-interest coding. The image shown in FIG. 7 contains 3 regions of interest. In addition, because surveillance video comes from many monitoring points and covers long periods of time, the amount of data is large, and users want to locate a target quickly by retrieval instead of viewing the entire video.

In existing video coding, region-of-interest coding is implemented by assigning different quantization coefficients to the coding blocks of the region of interest. Usually the quantization coefficients there are smaller than in other regions, so the picture quality is higher. However, the order of the code stream, the dependencies between a block and its adjacent blocks, and the dependence of a block on reference image blocks are unchanged. Consequently, if the user needs to search the video, every picture must be decoded to obtain the pictures of the region of interest. Normally there are few moving objects in a surveillance picture, and the periods in which objects move are also short, so fully decoding all images for retrieval seriously wastes computing resources.
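The existing approach described above amounts to a per-block quantization map. The sketch below is illustrative only: the function name, the rectangle format (in block units), and the two quantization values are assumptions rather than values from any standard.

```python
def qp_map(width_blocks, height_blocks, roi_rects, base_qp=32, roi_qp=24):
    """Build a per-block quantization map in which ROI blocks get the smaller value.

    `roi_rects` is a list of (x, y, w, h) rectangles measured in blocks.
    base_qp and roi_qp are hypothetical example values.
    """
    qp = [[base_qp] * width_blocks for _ in range(height_blocks)]
    for (bx, by, bw, bh) in roi_rects:
        for y in range(max(by, 0), min(by + bh, height_blocks)):
            for x in range(max(bx, 0), min(bx + bw, width_blocks)):
                qp[y][x] = roi_qp  # finer quantization, higher picture quality
    return qp
```

Only the quantization changes here; the blocks are still written in the same order and with the same dependencies, which is why this approach does not by itself speed up retrieval.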

Summary of the invention

It is an object of the present invention to provide a video image compression and reading method, apparatus, and system that extract a background layer and a target layer from the image to be encoded, encode the background layer and the target layer separately to generate code streams, and then combine the code streams. When decoding, the composite code stream is decoded and the image containing the target object is retrieved directly, thereby improving the utilization of computing resources.

In order to achieve the above objectives, the following technical solutions are specifically adopted:

The first aspect adopts a video image compression method, including:

Extracting a background layer and a target layer from the image to be encoded, wherein the target in the target layer is a portion of the region of interest in the image to be encoded;

Generating a code stream for each of the target layer and the background layer respectively;

Combine the code stream of the target layer with the code stream of the background layer.

Wherein, combining the code stream of the target layer and the code stream of the background layer is specifically:

Adding header information to the code stream corresponding to the image to be encoded, and recording the code stream of the target layer and the code stream of the background layer after the header information.

Wherein, generating a code stream for each of the target layer and the background layer respectively includes:

Filling the area outside the target in the target layer with a fixed value;

Filling the area in the background layer corresponding to the target in the target layer with a fixed value;

Generating a code stream separately for each of the filled target layer and the filled background layer.

Wherein, the location information of the target in the target layer is recorded in the header information.

Wherein, when no target layer can be extracted from the image to be encoded, the location information of the target in the target layer is recorded as empty in the header information.

Wherein, a separation identifier is inserted between the code stream of the target layer and the code stream of the background layer after the header information.

The second aspect adopts a video image compression device, including:

a layer extracting unit, configured to extract a background layer and a target layer from the image to be encoded, where the target in the target layer is a portion of the region of interest in the image to be encoded;

a layer coding unit, configured to respectively generate a code stream for each of the target layer and the background layer;

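a code stream combining unit, configured to combine the code stream of the target layer and the code stream of the background layer.

The code stream combining unit is specifically configured to add header information to the code stream corresponding to the image to be encoded, and to record the code stream of the target layer and the code stream of the background layer after the header information.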

The layer coding unit includes:

a first filling module, configured to fill an area other than the target in the target layer with a fixed value;

a second filling module, configured to fill, in the background layer, an area corresponding to the target in the target layer with a fixed value;

The layer coding module is configured to respectively generate a code stream for each of the filled target layer and the background layer.

The third aspect adopts a video image reading method, including:

Obtaining a video code stream, where the video code stream is composed of a code stream of a target layer and a code stream of a background layer; wherein the target in the target layer is a portion of the interest region in the image;

Confirm the target video frame where the decoding target is located;

The associated video codestream is decoded starting from the target video frame.

Header information is added to the video code stream, and the header information records the location information of the target in the target layer;

Decoding the related video code stream starting from the target video frame is specifically:

Decoding a region-of-interest portion from the target video frame, and compositing that portion onto the background layer according to the location information.

The fourth aspect adopts a video image reading device, including:

a code stream obtaining unit, configured to acquire a video code stream, where the video code stream is formed by combining a code stream of a target layer and a code stream of a background layer, and the target in the target layer is a portion of the region of interest in the image;

a target confirmation unit, configured to confirm a target video frame where the decoding target is located;

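a code stream decoding unit, configured to decode the related video code stream starting from the target video frame.

The code stream decoding unit is specifically configured to decode a region-of-interest portion from the target video frame, and to composite that portion onto the background layer according to the location information.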

A video image processing system is also adopted, comprising the video image compression device described in any of the above and the video image reading device described in any of the above.

The beneficial effects of the invention are as follows: the background layer and the target layer are extracted from the image to be encoded and encoded separately to generate code streams, and the code streams are then combined; when decoding, the composite code stream is decoded and the image containing the target object is retrieved directly, which improves the utilization of computing resources.

Drawings

FIG. 1 is a schematic flow chart of encoding in the block-based hybrid coding framework in the prior art;

FIG. 2 is a schematic flow chart of decoding in the block-based hybrid coding framework in the prior art;

FIG. 3 is a schematic diagram showing the scanning order of macroblocks in block-based hybrid coding in the prior art;

FIG. 4 is a schematic diagram of the inter-frame reference relationship in block-based hybrid coding in the prior art;

FIG. 5 is a schematic diagram of the reference relationship of a P-frame multi-frame reference in block-based hybrid coding in the prior art;

FIG. 6 is a schematic diagram of the reference relationship of a B-frame multi-frame reference in block-based hybrid coding in the prior art;

FIG. 7 is a schematic illustration of regions of interest in an image in the prior art;

FIG. 8 is a flowchart of a first embodiment of a video image compression method according to an embodiment of the present invention;

FIG. 9 is a flowchart of a second embodiment of a video image compression method according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of an image layer and a background layer in a second embodiment of a video image compression method according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of the code stream organization in a second embodiment of a video image compression method according to an embodiment of the present invention;

FIG. 12 is a structural block diagram of a first embodiment of a video image compression apparatus according to an embodiment of the present invention;

FIG. 13 is a structural block diagram of a second embodiment of a video image compression apparatus according to an embodiment of the present invention;

FIG. 14 is a flowchart of a first embodiment of a video image reading method according to an embodiment of the present invention;

FIG. 15 is a structural block diagram of a first embodiment of a video image reading apparatus according to an embodiment of the present invention;

FIG. 16 is a structural block diagram of a first embodiment of a video image processing system according to an embodiment of the present invention.

Detailed description

The present invention will be further described in detail below with reference to the specific embodiments thereof and the accompanying drawings. It is to be understood that the description is not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted in the following description in order to avoid unnecessarily obscuring the inventive concept.

Please refer to FIG. 8, which is a flowchart of a first embodiment of a video image compression method according to an embodiment of the present invention. The method in this embodiment is mainly used for storing various videos, especially surveillance videos. As shown in the figure, the method includes:

Step S101: extracting a background layer and a target layer from the image to be encoded, where the target in the target layer is a portion of the region of interest in the image to be encoded.

In this solution, especially for surveillance video, users are interested in specific targets in the picture, such as people, cars, and entrances and exits. Because surveillance video comes from many monitoring points and covers long periods of time, it is generally desirable to locate these areas quickly without watching the entire video. The image to be encoded is therefore divided into a background layer and a target layer, and the targets in the video are assigned to the target layer. When the video is viewed, the target layer is searched directly, so the target to be retrieved is found quickly and computing efficiency is improved.

Step S102: respectively generating a code stream for each of the target layer and the background layer.

So that the layers can be decoded separately at decoding time, the target layer and the background layer are encoded separately at encoding time to form their own code streams; the encoding itself is carried out according to the coding standard adopted.

Step S103: Combine the code stream of the target layer and the code stream of the background layer.

The code stream of the target layer is combined with the code stream of the background layer. Compared with the prior art, the combined code stream allows more accurate positioning and direct access to the image of the identified target, which improves decoding efficiency.

In summary, the background layer and the target layer are extracted from the image to be encoded and encoded separately to generate code streams, and the code streams are then combined. When decoding, the composite code stream is decoded and images containing target objects are retrieved directly, which improves the utilization of computing resources.
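Steps S102 and S103 can be illustrated with a minimal sketch in which `zlib` stands in for a real video encoder, purely to show that each layer gets its own independently decodable stream. The function names and the tuple used as the "composite" are assumptions, and the layers are assumed to have been extracted already (step S101).

```python
import zlib

def compress_frame(target_layer: bytes, background_layer: bytes):
    """Encode each layer on its own, then combine the two streams.

    zlib is only a stand-in for a video encoder in this sketch.
    """
    target_stream = zlib.compress(target_layer)          # step S102, target layer
    background_stream = zlib.compress(background_layer)  # step S102, background layer
    return (target_stream, background_stream)            # step S103, combined

def read_target_only(composite):
    """Decode the target layer without touching the background stream."""
    target_stream, _unused_background = composite
    return zlib.decompress(target_stream)
```

Since neither stream refers to the other, a search can decode only the target stream and leave the usually much larger background stream alone.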

Please refer to FIG. 9 , which is a flowchart of a method for a second embodiment of a video image compression method according to an embodiment of the present invention. As shown in the figure, the method includes:

Step S201: extracting a background layer and a target layer from the image to be encoded, where the target in the target layer is a portion of the region of interest in the image to be encoded.

The background layer and the target layer are extracted by image recognition or image analysis; the extent of the target layer can also be selected through settings on the imaging device. The specific techniques are available in the prior art and are not described further here.

Step S202: Filling the area other than the target in the target layer with a fixed value.

Step S203: Filling the area in the background layer corresponding to the target in the target layer with a fixed value.

So that the target layer sits at its original position in the image when decoded, the area outside the target in the target layer is filled with a fixed value, and the area in the background layer corresponding to the target in the target layer is also filled with a fixed value. The target layer and the background layer therefore have the same image size and resolution when encoded, which makes the subsequent compositing more accurate. The filling is shown in FIG. 10: two layers are extracted from the image, and in each layer the positions belonging to the other layer are filled, which is equivalent to obtaining two sub-image frames with the same resolution. The two layers are then encoded separately.
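The filling in steps S202 and S203 can be sketched as follows. The source does not give the fill value or the shape of a target, so the value 128, the rectangular (x, y, w, h) targets, and the function name are all assumptions made for illustration.

```python
FILL = 128  # hypothetical fixed fill value; the source does not specify one

def split_layers(image, targets, fill=FILL):
    """Split an image (a list of pixel rows) into a target layer and a background layer.

    `targets` is a list of (x, y, w, h) rectangles; rectangular targets are
    an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    inside = [[False] * width for _ in range(height)]
    for (tx, ty, tw, th) in targets:
        for y in range(max(ty, 0), min(ty + th, height)):
            for x in range(max(tx, 0), min(tx + tw, width)):
                inside[y][x] = True
    # Step S202: keep target pixels, fill everything else.
    target_layer = [[image[y][x] if inside[y][x] else fill for x in range(width)]
                    for y in range(height)]
    # Step S203: keep background pixels, fill the target area.
    background_layer = [[fill if inside[y][x] else image[y][x] for x in range(width)]
                        for y in range(height)]
    return target_layer, background_layer
```

Both layers keep the size of the original image, so a target pixel decoded later already sits at its original coordinates.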

Step S204: respectively generating a code stream for each of the filled target layer and the background layer.

When the target layer and the background layer are encoded, each block is still encoded in scanning order, from left to right and top to bottom; only when a filled portion is encountered is it skipped without processing. The code streams generated for the layers are then combined.
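One way to picture the scan in step S204 is a raster walk over blocks that skips every block consisting entirely of the fill value. The block size, the fill value, and the function name are assumptions, and the sketch would also skip a genuine block whose pixels all happen to equal the fill value.

```python
def blocks_to_encode(layer, block=2, fill=128):
    """List the (x, y) origins of blocks to encode, in raster order.

    Blocks made up only of the fill value are skipped. `block` and `fill`
    are hypothetical example values.
    """
    height, width = len(layer), len(layer[0])
    selected = []
    for by in range(0, height, block):          # top to bottom
        for bx in range(0, width, block):       # left to right
            pixels = [layer[y][x]
                      for y in range(by, min(by + block, height))
                      for x in range(bx, min(bx + block, width))]
            if all(p == fill for p in pixels):
                continue                        # filled portion: skip without processing
            selected.append((bx, by))
    return selected
```

For a target layer that is mostly fill, the list is short, so the target stream stays small.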

Step S205: Add header information to the code stream corresponding to the image to be encoded, and record the code stream of the target layer and the code stream of the background layer after the header information.

The location information of the target in the target layer is recorded in the header information. A separation identifier is inserted between the code stream of the target layer and the code stream of the background layer after the header information. Specifically, the separation identifier may be a start code identifier: the code stream of each layer is given a start code identifier so that the start position of each code stream can be distinguished when decoding. The organization of the code stream is shown in FIG. 11. The header information is added before the video stream and records the location of the target in the target layer; when the video is searched, the header information is used directly for accurate positioning, which improves data processing efficiency. The relative order of the target-layer code stream and the background-layer code stream is not limited; the layer 1 code stream and the layer 2 code stream in FIG. 11 each correspond to one of them.

In summary, the background layer and the target layer are extracted from the image to be encoded and encoded separately to generate code streams, and the code streams are then combined. When decoding, the composite code stream is decoded and images containing target objects are retrieved directly, which improves the utilization of computing resources. In addition, header information is added, and the code stream of the target layer and the code stream of the background layer are recorded after it; the location information of the target in the target layer is recorded in the header information, and a separation identifier is inserted between the code stream of the target layer and the code stream of the background layer to distinguish the two, which achieves ordered storage and fast retrieval of the two code streams.
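The code stream organization described for step S205 could be laid out as in the sketch below. The byte layout is entirely hypothetical: a 16-bit target count, four 16-bit fields per target rectangle, and a three-byte start code before each layer stream. The source does not define any of these, and the sketch assumes the layer payloads never contain the start code (real codecs use emulation prevention for that).

```python
import struct

START_CODE = b"\x00\x00\x01"  # hypothetical separation identifier

def pack_frame(target_boxes, target_stream, background_stream):
    """Pack one frame as: header, start code, target stream, start code, background stream.

    `target_boxes` is a list of (x, y, w, h) tuples; an empty list records
    the location information as empty (no target was extracted).
    """
    header = struct.pack(">H", len(target_boxes))
    for (x, y, w, h) in target_boxes:
        header += struct.pack(">4H", x, y, w, h)
    return header + START_CODE + target_stream + START_CODE + background_stream
```

Here the target stream is written first, but as the text notes the relative order of the two layer streams is not limited; a reader only needs the header to locate targets and the start codes to find where each layer stream begins.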

The following are embodiments of a video image compression device provided in the embodiments of the present invention. The device embodiments are implemented on the basis of the method embodiments described above; for anything not explained in the device embodiments, please refer to the foregoing embodiments of the video image compression method.

Please refer to FIG. 12, which is a structural block diagram of a first embodiment of a video image compression apparatus according to an embodiment of the present invention. As shown, the apparatus includes:

a layer extracting unit 310, configured to extract a background layer and a target layer from the image to be encoded, where the target in the target layer is a portion of the region of interest in the image to be encoded;

By dividing the image to be encoded into a background layer and a target layer, the targets in the video are assigned to the target layer; when the video is viewed, the target layer is searched directly, so the target to be retrieved is found quickly and computing efficiency is improved.

a layer coding unit 320, configured to separately generate a code stream for each of the target layer and the background layer;

The code stream combining unit 330 is configured to combine the code stream of the target layer and the code stream of the background layer.

In summary, through the cooperation of the above units, the background layer and the target layer are extracted from the image to be encoded and encoded separately to generate code streams, and the code streams are then combined. When decoding, the composite code stream is decoded and the image containing the target object is retrieved directly, which improves the utilization of computing resources.

Please refer to FIG. 13 , which is a structural block diagram of a second embodiment of a video image compression apparatus according to an embodiment of the present invention. As shown, the apparatus includes:

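The code stream combining unit 330 is specifically configured to add header information to the code stream corresponding to the image to be encoded, and to record the code stream of the target layer and the code stream of the background layer after the header information.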

The layer coding unit 320 includes:

a first filling module 321 , configured to fill an area other than the target in the target layer with a fixed value;

a second filling module 322, configured to fill, in the background layer, an area corresponding to the target in the target layer with a fixed value;

The layer coding module 323 is configured to respectively generate a code stream for each of the filled target layer and the background layer.

A separation identifier is inserted between the code stream of the target layer and the code stream of the background layer after the header information. Specifically, the code stream of each layer is given a start code identifier so that the start position of each code stream can be distinguished when decoding.

In summary, through the cooperation of the above functional modules, the background layer and the target layer are extracted from the image to be encoded and encoded separately to generate code streams, and the code streams are then combined. When decoding, the composite code stream is decoded and the image containing the target object is retrieved directly, which improves the utilization of computing resources. In addition, header information is added, and the code stream of the target layer and the code stream of the background layer are recorded after it; the location information of the target in the target layer is recorded in the header information, and a separation identifier is inserted between the code stream of the target layer and the code stream of the background layer to distinguish the two, which achieves ordered storage and fast retrieval of the two code streams.

The following is an embodiment of a video image reading method provided in the specific embodiment of the present invention. The solution in this embodiment is used to read the video code stream obtained in the foregoing embodiment. Please refer to FIG. 14 , which is a flowchart of a method for a first embodiment of a video image reading method according to an embodiment of the present invention. As shown in the figure, the method includes:

Step S401: Acquire a video code stream, where the video code stream is formed by combining a code stream of a target layer and a code stream of a background layer; wherein the target in the target layer is a portion of the interest region in the image.

The video code stream is a composite of the code stream of the target layer and the code stream of the background layer, so a read can go straight to the target information that is wanted.

Step S402: Confirm, as the target video frame, the frame whose header information records the location information of the decoding target.

Header information is added to the video code stream and records the location information of the target in the target layer, so the stream can be accessed directly at the location recorded in the header information.

The target video frame in which the decoding target is located can also be identified by other schemes, for example by accessing the video frame by frame without using header information.
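Step S402 can be sketched as a scan over per-frame header records that never touches the layer payloads. The record format (a list of (x, y, w, h) rectangles per frame, empty when no target was extracted) and the optional query rectangle are assumptions of this sketch; the source only says that the header records the target's location information.

```python
def intersects(a, b):
    """True if rectangles a and b, each given as (x, y, w, h), overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def find_target_frames(headers, query=None):
    """Return the indices of frames whose header records a target.

    `headers` holds one list of target rectangles per frame. If `query`
    is given, only frames with a target overlapping that rectangle match
    (a hypothetical extension, not described in the source).
    """
    hits = []
    for index, boxes in enumerate(headers):
        if not boxes:
            continue  # empty location information: no target in this frame
        if query is None or any(intersects(box, query) for box in boxes):
            hits.append(index)
    return hits
```

Only the frames returned here need to be decoded, which is where the saving over decoding every picture comes from.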

Step S403: Decode a region-of-interest portion from the target video frame, and composite that portion onto the background layer according to the location information.

A separation identifier is inserted between the code stream of the target layer and the code stream of the background layer after the header information. When the video image is accessed, the code stream of the target layer can be accessed directly by using the separation identifier as the boundary.

In summary, reading the composite video code stream gives fast access to the target layer, which improves decoding efficiency and reduces computational complexity.
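The reading side of step S403 can be sketched in two parts: splitting a frame at the separation identifiers, and pasting decoded target pixels onto the background at the recorded location. The byte layout is the same hypothetical one as before (16-bit target count, four 16-bit fields per rectangle, a three-byte start code before each layer stream), the payloads are assumed not to contain the start code, and the layers passed to `composite` are assumed to be already decoded into pixel rows.

```python
import struct

START_CODE = b"\x00\x00\x01"  # hypothetical separation identifier

def parse_frame(data):
    """Split a packed frame into (boxes, target_stream, background_stream)."""
    (count,) = struct.unpack_from(">H", data, 0)
    boxes = [struct.unpack_from(">4H", data, 2 + 8 * i) for i in range(count)]
    pos = 2 + 8 * count  # the header is parsed by length, not by searching
    if data[pos:pos + 3] != START_CODE:
        raise ValueError("missing target-layer start code")
    target_start = pos + 3
    sep = data.find(START_CODE, target_start)
    if sep < 0:
        raise ValueError("missing background-layer start code")
    return boxes, data[target_start:sep], data[sep + 3:]

def composite(background, target, boxes):
    """Copy the target pixels inside each (x, y, w, h) box onto a copy of the background."""
    out = [row[:] for row in background]
    for (x, y, w, h) in boxes:
        for row in range(y, y + h):
            out[row][x:x + w] = target[row][x:x + w]
    return out
```

Because the start codes mark where each layer stream begins, a reader that only wants targets can stop after extracting the target stream.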

The following is an embodiment of a video image reading device provided in a specific embodiment of the present invention. The device embodiment is implemented on the basis of the video image reading method embodiment described above; for anything not explained in the device embodiment, please refer to the foregoing embodiment of the video image reading method.

Please refer to FIG. 15 , which is a structural block diagram of an apparatus for video image reading according to a specific embodiment of the present invention. As shown in the figure, the apparatus includes:

a code stream obtaining unit 510, configured to acquire a video code stream, where the video code stream is formed by combining a code stream of a target layer and a code stream of a background layer, and the target in the target layer is a portion of the region of interest in the image;

a target confirmation unit 520, configured to confirm a target video frame where the decoding target is located;

The code stream decoding unit 530 is configured to start decoding the related video code stream from the target video frame.

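The code stream decoding unit 530 is specifically configured to decode a region-of-interest portion from the target video frame, and to composite that portion onto the background layer according to the location information.

In summary, through the cooperation of the above units, reading the composite video code stream gives fast access to the target layer, which improves decoding efficiency and reduces computational complexity.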

Finally, an embodiment of a video image processing system is also provided in the embodiments of the present invention. The video image processing system comprises the above-described video image compression device 30 and video image reading device 50. Specifically, as shown in FIG. 16, the video image compression device 30 includes the layer extracting unit 310, the layer coding unit 320, and the code stream combining unit 330.

The video image reading device 50 includes the code stream obtaining unit 510, the target confirmation unit 520, and the code stream decoding unit 530.

In summary, through the cooperation of the above units, the background layer and the target layer are extracted from the image to be encoded and encoded separately to generate code streams, and the code streams are then combined. When decoding, the composite code stream is decoded and the image containing the target object is retrieved directly, which improves the utilization of computing resources. Reading the composite video code stream gives fast access to the target layer, which improves decoding efficiency and reduces computational complexity.

The above-described embodiments of the present invention are intended to be illustrative only and not to limit the invention. Therefore, any modifications, equivalent substitutions, improvements, and the like that are made without departing from the spirit and scope of the invention are intended to be included within the scope of the invention. The appended claims are intended to cover all such modifications and variations.

Although the embodiments of the present invention have been described in detail, it is understood that various modifications, changes, and variations may be made to the embodiments of the present invention without departing from the spirit and scope of the invention.

Claims

A video image compression method includes:

Extracting a background layer and a target layer from the image to be encoded, wherein the target in the target layer is a portion of the region of interest in the image to be encoded;

Generating a code stream for each of the target layer and the background layer respectively;

Combining the code stream of the target layer with the code stream of the background layer.
The video image compression method according to claim 1, wherein combining the code stream of the target layer with the code stream of the background layer comprises:

Adding header information to the code stream corresponding to the image to be encoded, and recording the code stream of the target layer and the code stream of the background layer after the header information.
The video image compression method according to claim 1, wherein generating a code stream for each of the target layer and the background layer respectively comprises:

Filling the area outside the target in the target layer with a fixed value;

Filling the area in the background layer corresponding to the target in the target layer with a fixed value;

Generating a code stream separately for each of the filled target layer and the filled background layer.
A video image compression method according to claim 2, wherein position information of the target in the target layer is recorded in the header information.
A video image compression method according to claim 4, wherein when extraction of the target layer from the image to be encoded fails, the position information of the target in the target layer is recorded as empty in the header information.
A video image compression method according to claim 2, wherein a separation identifier is inserted between the code stream of the target layer after the header information and the code stream of the background layer.
A video image compression device includes:

a layer extracting unit, configured to extract a background layer and a target layer from the image to be encoded, where the target in the target layer is a portion of the region of interest in the image to be encoded;

a layer coding unit, configured to respectively generate a code stream for each of the target layer and the background layer;

a code stream combining unit, configured to combine the code stream of the target layer and the code stream of the background layer.
A video image compression device according to claim 7, wherein said code stream combining unit is configured to:

Add header information to the code stream corresponding to the image to be encoded, and record the code stream of the target layer and the code stream of the background layer after the header information.
The video image compression device according to claim 7, wherein the layer coding unit comprises:

a first filling module, configured to fill an area other than the target in the target layer with a fixed value;

a second filling module, configured to fill, in the background layer, an area corresponding to the target in the target layer with a fixed value;

a layer coding module, configured to generate a code stream separately for each of the filled target layer and the filled background layer.
A video image compression device according to claim 8, wherein position information of a target in the target layer is recorded in said header information.
A video image compression device according to claim 10, wherein when extraction of the target layer from the image to be encoded fails, the position information of the target in the target layer is recorded as empty in the header information.
A video image compression device according to claim 8, wherein a separation identifier is inserted between the code stream of the target layer after the header information and the code stream of the background layer.
A video image reading method includes:

Obtaining a video code stream, where the video code stream is formed by combining a code stream of a target layer and a code stream of a background layer, and the target in the target layer is a portion of the region of interest in the image;

Confirming the target video frame in which the decoding target is located;

Decoding the associated video code stream starting from the target video frame.
A video image reading method according to claim 13, wherein header information is added to said video code stream, and position information of a target in the target layer is recorded in said header information;

Decoding the associated video code stream starting from the target video frame comprises:

Decoding the region-of-interest portion from the target video frame, and compositing the region-of-interest portion onto the background layer according to the position information.
A video image reading method according to claim 14, wherein a separation identifier is inserted between the code stream of the target layer after the header information and the code stream of the background layer.
A video image reading device comprising:

a code stream obtaining unit, configured to acquire a video code stream, where the video code stream is formed by combining a code stream of a target layer and a code stream of a background layer, and the target in the target layer is a portion of the region of interest in the image;

a target confirmation unit, configured to confirm a target video frame where the decoding target is located;

a code stream decoding unit, configured to decode the associated video code stream starting from the target video frame.
A video image reading device according to claim 16, wherein header information is added to said video code stream, and position information of a target in the target layer is recorded in said header information;

The code stream decoding unit is configured to:

Decode the region-of-interest portion from the target video frame, and composite the region-of-interest portion onto the background layer according to the position information.
A video image reading device according to claim 17, wherein a separation identifier is inserted between the code stream of the target layer after the header information and the code stream of the background layer.
A video image processing system comprising a video image compression device and a video image reading device;

The video image compression device includes:

a layer extracting unit, configured to extract a background layer and a target layer from the image to be encoded, where the target in the target layer is a portion of the region of interest in the image to be encoded;

a layer coding unit, configured to respectively generate a code stream for each of the target layer and the background layer;

a code stream combining unit, configured to combine a code stream of a target layer and a code stream of a background layer;

The video image reading device includes:

a code stream obtaining unit, configured to acquire a video code stream, where the video code stream is formed by combining a code stream of a target layer and a code stream of a background layer, and the target in the target layer is a portion of the region of interest in the image;

a target confirmation unit, configured to confirm a target video frame where the decoding target is located;

a code stream decoding unit, configured to decode the associated video code stream starting from the target video frame.
A video image processing system according to claim 19, wherein said code stream combining unit is configured to:

Add header information to the code stream corresponding to the image to be encoded, and record the code stream of the target layer and the code stream of the background layer after the header information.
A video image processing system according to claim 19, wherein said layer coding unit comprises:

a first filling module, configured to fill an area other than the target in the target layer with a fixed value;

a second filling module, configured to fill, in the background layer, an area corresponding to the target in the target layer with a fixed value;

a layer coding module, configured to generate a code stream separately for each of the filled target layer and the filled background layer.
A video image processing system according to claim 20, wherein position information of a target in the target layer is recorded in said header information.
A video image processing system according to claim 22, wherein when extraction of the target layer from the image to be encoded fails, the position information of the target in the target layer is recorded as empty in the header information.
A video image processing system according to claim 20, wherein a separation identifier is inserted between the code stream of the target layer after the header information and the code stream of the background layer.
A video image processing system according to claim 19, wherein header information is added to said video code stream, and position information of a target in the target layer is recorded in said header information;

The code stream decoding unit is configured to:

Decode the region-of-interest portion from the target video frame, and composite the region-of-interest portion onto the background layer according to the position information.
A video image processing system according to claim 25, wherein a separation identifier is inserted between the code stream of the target layer after the header information and the code stream of the background layer.
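
For illustration only, the reading side described in the claims above might be sketched in Python as follows: the headers of a composite code stream are scanned without decoding any pixels, the first frame whose header records a target position is taken as the target video frame, and only then are the two layers decoded and the target composited onto the background. The record layout `HEADER`, the `SEPARATOR` bytes, the fixed frame size, the helper `make_record`, and the use of `zlib` in place of a real video decoder are all assumptions that do not come from the claims. The sketch also treats frames as independently decodable, which a real inter-coded stream would not be.

```python
import struct
import zlib

HEADER = struct.Struct(">BHHHHII")   # assumed: has_box, x, y, w, h, target_len, background_len
SEPARATOR = b"\x00\x00\x01\xB9"      # hypothetical separation identifier
WIDTH, HEIGHT = 4, 4                 # assumed frame size, known to both sides


def make_record(target, background, box):
    """Build one frame record; stands in for the compression side."""
    t = zlib.compress(bytes(target))
    b = zlib.compress(bytes(background))
    x, y, w, h = box if box is not None else (0, 0, 0, 0)
    head = HEADER.pack(0 if box is None else 1, x, y, w, h, len(t), len(b))
    return head + t + SEPARATOR + b


def iter_headers(stream):
    """Yield (index, box or None, target span, background span) per frame."""
    pos, index = 0, 0
    while pos < len(stream):
        has_box, x, y, w, h, t_len, b_len = HEADER.unpack_from(stream, pos)
        t_start = pos + HEADER.size
        b_start = t_start + t_len + len(SEPARATOR)
        if stream[t_start + t_len:b_start] != SEPARATOR:
            raise ValueError("separation identifier not found")
        box = (x, y, w, h) if has_box else None
        yield index, box, (t_start, t_start + t_len), (b_start, b_start + b_len)
        pos = b_start + b_len            # skip to the next record without decoding
        index += 1


def find_target_frame(stream):
    """Return the first frame whose header records a target position."""
    for entry in iter_headers(stream):
        if entry[1] is not None:
            return entry
    return None


def decode_and_composite(stream, box, t_span, b_span):
    """Decode both layers and paste the target region onto the background."""
    target = zlib.decompress(stream[t_span[0]:t_span[1]])
    frame = bytearray(zlib.decompress(stream[b_span[0]:b_span[1]]))
    x, y, w, h = box
    for r in range(y, y + h):
        start = r * WIDTH + x
        frame[start:start + w] = target[start:start + w]
    return bytes(frame)
```

A caller would pass the tuple returned by `find_target_frame` on to `decode_and_composite`; frames before the target frame are skipped using only the lengths stored in their headers.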