CN114422788A - Digital retina video joint coding method, decoding method, device and electronic equipment

Digital retina video joint coding method, decoding method, device and electronic equipment

Info

Publication number
CN114422788A
Authority
CN
China
Prior art keywords
video, coding, interest, point, parameters
Prior art date
Legal status
Pending
Application number
CN202210321529.8A
Other languages
Chinese (zh)
Inventor
滕波
方赟
向国庆
焦立欣
牛梅梅
陆嘉瑶
洪一帆
张羿
章卿妹
Current Assignee
Zhejiang Smart Video Security Innovation Center Co Ltd
Original Assignee
Zhejiang Smart Video Security Innovation Center Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Smart Video Security Innovation Center Co Ltd
Priority to CN202210321529.8A
Publication of CN114422788A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a digital retina video joint encoding method, a decoding method, an apparatus, and an electronic device. The digital retina video joint encoding method comprises video compression processing and video feature processing, wherein the video feature processing comprises determining point-of-interest location information, and the video compression processing comprises determining encoding parameters based on the point-of-interest location information and performing video encoding with those parameters. In this method, the video is encoded and compressed with the video encoding parameters to obtain a compressed video stream, the video feature information is compressed to obtain a video feature code stream, and both streams are sent when video data is transmitted. The amount of compressed video data is thereby greatly reduced, which facilitates transmission, storage, reception, and decoding, and well meets the needs of practical applications.

Description

Digital retina video joint coding method, decoding method, device and electronic equipment
Technical Field
The application relates to the technical field of computer vision, in particular to a digital retina video joint coding method, a decoding method, a device and electronic equipment.
Background
In the field of computer vision, a traditional camera merely compresses the captured video data and uploads it to a server for storage, where it is then analyzed, recognized, and processed. Because the amount of compressed video data is large, the transmission and decompression workload is heavy; the video data is inconvenient to store, transmit, and decompress, and the requirements of practical applications are difficult to meet.
Disclosure of Invention
The application aims to provide a digital retina video joint encoding method, a decoding method, an apparatus, and an electronic device. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview, and it is intended neither to identify key or critical elements nor to delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description presented later.
According to a first aspect of embodiments of the present application, there is provided a digital retinal video joint encoding method, including a video compression process and a video feature process; wherein the video feature processing comprises determining point of interest location information; the video compression processing includes: and determining coding parameters based on the interest point position information, and carrying out video coding by using the coding parameters.
In some embodiments of the present application, the point of interest location information is sent only through a video feature stream.
In some embodiments of the present application, said determining encoding parameters based on said point of interest location information comprises: a set of encoding parameters is assigned to a point of interest, wherein locations of the point of interest are distributed across different levels of block-based encoding structures.
In some embodiments of the present application, the block-based coding structures include at least one of macroblocks, sub-blocks, slices, slice groups, sequences, and pictures.
In some embodiments of the present application, the encoding parameters include at least one of: the algorithm selected for entropy encoding/decoding (such as Golomb coding, CAVLC, or CABAC) and quantization parameters.
In some embodiments of the present application, the quantization parameter comprises a quantization parameter of a luminance component and/or a chrominance component.
In some embodiments of the present application, when the point-of-interest coding parameters and the video stream coding parameters act on the same coding block at the same time, one set of parameters is selected for encoding.
In some embodiments of the present application, the selection is made by giving priority to the point-of-interest parameters, giving priority to the video stream parameters, or according to provided parameter selection indication information.
According to a second aspect of the embodiments of the present application, there is provided a digital retinal video joint decoding method, including video decoding processing and video feature processing; wherein the video feature processing comprises determining point-of-interest location information; the video decoding processing comprises determining encoding parameters for video decoding based on the point-of-interest location information; the point-of-interest location information used for video decoding comes from a video feature stream.
In some embodiments of the present application, said determining encoding parameters based on said point of interest location information comprises: a set of encoding parameters is assigned to a point of interest, wherein locations of the point of interest are distributed across different levels of block-based encoding structures.
In some embodiments of the present application, the block-based coding structures include at least one of macroblocks, sub-blocks, slices, slice groups, sequences, and pictures.
In some embodiments of the present application, the encoding parameters include at least one of: the algorithm selected for entropy encoding/decoding (such as Golomb coding, CAVLC, or CABAC) and quantization parameters.
In some embodiments of the present application, the quantization parameter comprises a quantization parameter of a luminance component and/or a chrominance component.
In some embodiments of the present application, when the point-of-interest coding parameters and the video stream coding parameters act on the same coding block at the same time, one set of parameters is selected for decoding.
In some embodiments of the present application, the selection is made by giving priority to the point-of-interest parameters, giving priority to the video stream parameters, or according to provided parameter selection indication information.
According to a third aspect of the embodiments of the present application, there is provided a digital retinal video joint encoding apparatus, including a video compression processing module and a video feature processing module; wherein the video feature processing module comprises a determining unit; the determining unit is used for determining the position information of the interest point;
the video compression processing module comprises:
the coding parameter determining unit is used for determining coding parameters based on the interest point position information;
and the coding unit is used for carrying out video coding by utilizing the coding parameters.
According to a fourth aspect of the embodiments of the present application, there is provided a digital retinal video joint decoding apparatus, comprising a video decoding processing module and a video feature processing module; wherein the video feature processing module comprises a determining unit; the determining unit is used for determining the point-of-interest location information;
the video decoding processing module comprises:
a decoding parameter determining unit, used for determining encoding parameters based on the point-of-interest location information;
a decoding unit, used for performing video decoding according to the encoding parameters;
wherein the point-of-interest location information for video decoding comes from a video feature stream.
According to a fifth aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, and a first computer program and/or a second computer program stored in the memory and executable on the processor, wherein the processor executes the first computer program and/or the second computer program to implement the digital retinal video joint encoding method according to the first aspect and/or the digital retinal video joint decoding method according to the second aspect.
According to a sixth aspect of embodiments of the present application, there is provided a computer-readable storage medium on which a first computer program and/or a second computer program is stored, the first computer program and/or the second computer program being executed by a processor to implement the digital retinal video joint encoding method of the first aspect and/or the digital retinal video joint decoding method of the second aspect.
The technical scheme provided by one aspect of the embodiment of the application can have the following beneficial effects:
the digital retina video joint coding method provided by the embodiment of the application determines video coding parameters based on the video characteristic information, utilizes the video coding parameters to code and compress videos to obtain video compression streams, compresses the video characteristic information to obtain video characteristic code streams, and sends the video compression streams and the video characteristic code streams when video data are transmitted, so that the compressed video data volume is greatly reduced, the digital retina video joint coding method is convenient to transmit, store, receive, decode and decompress, and the requirements of practical application can be well met.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description, the claims, and the appended drawings.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 shows a flow diagram of a digital retinal video joint encoding method of one embodiment of the present application;
FIG. 2 shows a flow diagram of a digital retinal video joint encoding method of another embodiment of the present application;
fig. 3 shows a flowchart of step S10 in fig. 2;
FIG. 4 shows a schematic diagram of step S10 in FIG. 2;
fig. 5 shows a flowchart of step S40 in fig. 2;
FIG. 6 is a block diagram of a digital retinal video joint encoding apparatus according to another embodiment of the present application;
FIG. 7 shows a flow diagram of a digital retinal video joint decoding method of another embodiment of the present application;
fig. 8 shows a flowchart of step S200 in fig. 7;
FIG. 9 is a block diagram of a digital retinal video processing system according to another embodiment of the present application;
FIG. 10 shows a schematic diagram of a computer-readable storage medium of another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
A more efficient camera, designed with reference to the biological property that the human retina performs both image encoding and feature encoding, is called a digital retinal camera, or digital retina for short. The core of the digital retina is "one camera, dual data streams": the compressed video stream serves storage and offline viewing, while the compact feature stream serves big data analysis and search. An important feature of digital retina technology is the simultaneous transmission of two or more streams, namely the video stream and the visual feature stream, possibly even including a summarized video stream, which facilitates video retrieval, video analysis, and storage. Digital retina technology can be widely applied in video processing fields such as video retrieval and video analysis.
Video retrieval is a common video analysis application, and the international standard MPEG CDVS (Compact Descriptors for Visual Search) specifies the compression, storage, and transmission of feature descriptors in video retrieval applications. CDVS coding comprises the following parts: interest point identification, local feature selection, local feature descriptor compression, local feature descriptor aggregation, local feature location compression, and so on.
Interest points extracted in video analysis applications often carry rich visual features, so a large loss of detail through video compression is usually undesirable. For example, in some prior art, a smaller quantization parameter may be set for a region of interest, which helps the region retain higher picture quality in the video reconstructed after coding. In block-based coding techniques, such as the H.264/H.265/H.266 video codec standards, corresponding coding parameters, including quantization parameters, can be configured for structures related to a region of interest, including but not limited to macroblocks, sub-blocks, slices (Slice), slice groups (Slice group), sequences, and pictures.
In video analysis, interest points in an image can be found by various algorithms, including but not limited to the LoG (Laplacian of Gaussian) or BFLoG (Block-based Frequency Domain) algorithms. However, the interest point regions determined by these algorithms cannot be directly matched to the block partition structures of existing video codecs. In other words, the interest point regions determined by an interest point detection algorithm may be spread over parts of several macroblocks, sub-blocks, slices, or slice groups, structures that are nested level by level (a slice group contains slices, a slice contains macroblocks, and a macroblock contains sub-blocks). For example, interest points may fall in one sub-block under Slice 1 and one sub-block under Slice 2; that is, the interest point distribution does not occupy the entirety (i.e., all sub-blocks) of a coding structure such as a slice. In the embodiments of the present application, this case is called distribution of interest point locations across coding structures of different levels. When interest point locations are distributed this way, setting dedicated coding parameters for the interest points incurs a very large overhead, because the positions of the coding structures scattered at multiple places and the corresponding coding parameters would all have to be encoded and transmitted in the video bitstream.
Considering that, in the digital retina architecture, the interest point location information is encoded and transmitted in the feature stream, the decoding end can determine the positions of the multiple scattered video coding structures using only the interest point location information indicated in the feature stream, and the same information does not need to be transmitted again in the compressed video stream.
As shown in fig. 1, an embodiment of the present application provides a digital retinal video joint encoding method, which includes a video compression process and a video feature process; wherein the video feature processing comprises determining point of interest location information; the video compression processing includes: and determining coding parameters based on the interest point position information, and carrying out video coding by using the coding parameters. The point of interest location information is sent only through the video feature stream.
In some embodiments, determining encoding parameters based on the point of interest location information comprises: a set of encoding parameters is assigned to a point of interest, wherein locations of the point of interest are distributed across different levels of block-based encoding structures.
In some embodiments, the block-based coding structures include at least one of macroblocks, sub-blocks, slices, slice groups, sequences, and pictures.
The encoding parameters include at least one of the following: the algorithm selected for entropy encoding/decoding (such as Golomb coding, CAVLC, or CABAC) and quantization parameters.
The quantization parameter includes a quantization parameter of a luminance component and/or a chrominance component.
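To make the entropy-coding options above concrete, the following is a minimal Python sketch of order-0 exponential-Golomb coding of a non-negative integer, the ue(v) code used for many syntax elements in H.264/H.265. It is a generic textbook implementation offered for illustration only, not code from this application; the function name is hypothetical.

def exp_golomb_encode(value: int) -> str:
    """Order-0 exponential-Golomb (ue(v)) code for a non-negative integer:
    write code_num = value + 1 in binary, preceded by len-1 zero bits."""
    assert value >= 0
    bits = bin(value + 1)[2:]            # binary representation of value + 1
    return "0" * (len(bits) - 1) + bits  # zero prefix, then the binary digits

# 0 -> '1', 1 -> '010', 2 -> '011', 3 -> '00100', ...
assert [exp_golomb_encode(v) for v in range(4)] == ["1", "010", "011", "00100"]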
In some embodiments, when the point-of-interest coding parameters and the video stream coding parameters act on the same coding block at the same time, one set of parameters is selected for encoding.
In some embodiments, the selection is made by giving priority to the point-of-interest parameters, giving priority to the video stream parameters, or according to provided parameter selection indication information.
Referring to fig. 2, in some embodiments, in the digital retinal video joint encoding method, the step of determining the point of interest location information includes step S10, and the step of video compression processing includes steps S20 to S50.
Specifically, the step of determining the location information of the point of interest includes the steps of:
and S10, extracting the video characteristic information of the collected monitoring video.
Video feature information is extracted from a surveillance video captured by a video acquisition device such as a camera. The video feature information includes at least the location information of the interest points. Interest points are pixels at key positions in the image with distinctive properties, such as corners, contours, edge endpoints, and extreme points that stand out from the target image.
In the embodiments of the application, the video features may comprise at least one of visual features and machine learning features. The visual features comprise information about colors, patterns, textures, gray levels, and the like in the video frame; the machine learning features may include results of pedestrian recognition, license plate recognition, traffic accident detection, traffic direction recognition, and the like.
As shown in fig. 3, in some embodiments, step S10 includes:
s101, converting each frame of image of the monitoring video to form a plurality of characteristic values.
As shown in fig. 4, the transformed image 102 includes a plurality of points 1021 corresponding to feature values.
And S102, dividing each frame of image after transformation into a plurality of areas.
The image 104 is a segmented image and includes a plurality of regions 1022.
S103, identifying interest points and determining the location information of the interest points according to the feature values contained in each region; the interest points are the pixels whose feature values exceed a preset threshold.
The image 106 is an image after the interest point is identified, and a plurality of regions 1023 containing the interest point exist in the image 106. The image 108 is an image after determining the position information of the interest points, and each region containing the interest points is marked with a number representing the number of the interest points of the region.
The identification of interest points may employ, but is not limited to, the LoG (Laplacian of Gaussian) or BFLoG (Block-based Frequency Domain) algorithms, the Harris corner detection algorithm, and the like.
The interest point locations may be distributed across different levels of the block-based coding structure. The coding structures defined in block-based coding techniques include, but are not limited to, macroblocks, sub-blocks, slices, slice groups, sequences, pictures, Coding Units, and Prediction Units, and the interest point location information determined by the feature extraction method cannot be directly matched to the block partition structures of existing video codecs.
The interest point regions determined by the detection algorithm in video analysis may be spread over several coding structures that are nested level by level. For example, interest points may fall in one sub-block under Slice 1 and one sub-block under Slice 2; that is, the interest point distribution does not occupy the entirety (i.e., all sub-blocks) of a coding structure such as a slice. In the embodiments of the present application, this case is called distribution of interest point locations across coding structures of different levels.
Generally, the result of interest point identification includes both the local feature values of the interest points, as determined by the chosen algorithm, and the locations of the interest points.
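The identification just described can be sketched as follows in Python, assuming a per-pixel feature map for one frame has already been computed in step S101 (for example, a LoG response). The grid size, threshold, and all names are illustrative assumptions, not values from this application.

import numpy as np

def find_interest_points(feature_map: np.ndarray, grid=(8, 8), threshold=0.5):
    """Split a per-pixel feature map into grid regions (step S102) and locate
    interest points, i.e. pixels whose feature value exceeds the threshold
    (step S103). Returns their (row, col) coordinates and per-region counts."""
    h, w = feature_map.shape
    n, m = grid
    rows, cols = np.nonzero(feature_map > threshold)   # salient pixels

    # Map each salient pixel to its region and count per region.
    counts = np.zeros((n, m), dtype=int)
    region_r = np.minimum(rows * n // h, n - 1)
    region_c = np.minimum(cols * m // w, m - 1)
    np.add.at(counts, (region_r, region_c), 1)
    return list(zip(rows.tolist(), cols.tolist())), counts

# Usage with a synthetic feature map standing in for the transformed frame:
rng = np.random.default_rng(0)
points, counts = find_interest_points(rng.random((64, 64)), grid=(4, 4), threshold=0.99)

The per-region counts correspond to the numbers marked on image 108 in fig. 4.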
In some embodiments, extracting the video feature information may further include: local feature selection, local feature descriptor compression, local feature descriptor aggregation, local feature location compression, and the like.
In some embodiments, the step of video compression processing specifically includes the steps of:
and S20, determining video coding parameters based on the video characteristic information.
In an embodiment of the present application, video feature information is used to determine video coding parameters. The video feature information includes at least location information of the point of interest. The video coding parameters include a quantized value of a luminance component and/or a quantized value of a chrominance component.
In one embodiment consistent with the present application, a video compression processing unit assigns a set of dedicated encoding parameters to a point of interest.
The relevant encoding parameters include at least quantization parameters, namely quantization values for the luminance and/or chrominance components. Applying this set of encoding parameters preserves more visual detail in the region-of-interest image than in neighboring regions outside it, which is very beneficial for improving the subjective and objective quality of the video image. In terms of the coding parameters, this means the quantization values for the region of interest are relatively small.
In some embodiments, dedicated encoding parameters may be set only for interest point pixels whose feature values exceed a certain threshold, or only for interest point pixels exceeding a threshold for some of the feature value types. The advantage is that, under a limited coding bandwidth, the video details most needed by the video analysis task are preserved.
S30, coding and compressing the video by using the video coding parameters to obtain a video compressed stream; the video compression stream includes the video coding parameters.
Video coding compression reduces the video data rate while preserving the visual effect as far as possible, which facilitates transmission. For encoding and compression, the H.264, H.265, AVS2, or AVS3 coding technology can be adopted.
And S40, compressing the video characteristic information to obtain a video characteristic code stream.
In order to reduce the transmission bandwidth and the data storage space, the local feature values and the positions of the interest points need to be compressed.
In some embodiments, compression of local feature values generally involves quantization followed by encoding. The algorithm for compressing interest point location information may, for example, divide each frame of the video into N x M regions and count, for each region, the number of pixels whose feature values exceed a preset threshold; the feature points exceeding the threshold are the salient feature points of the region of interest. In many cases, only the positions and counts of these salient feature points are encoded and transmitted.
As shown in fig. 5, in some embodiments, step S40 includes:
s401, dividing the video characteristic information by a preset constant to obtain quantized data;
the preset constant is preset and can be adjusted according to actual needs.
S402, coding the quantized data to obtain the video characteristic code stream.
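A minimal sketch of steps S401 and S402, assuming the video feature information arrives as a flat array of non-negative numeric values and reusing order-0 exponential-Golomb codes as the entropy stage; the preset constant and all names are hypothetical, and any other entropy coder could take its place.

QUANT_CONSTANT = 16.0  # the preset constant; adjustable to actual needs (hypothetical value)

def exp_golomb(value: int) -> str:
    bits = bin(value + 1)[2:]
    return "0" * (len(bits) - 1) + bits

def compress_features(feature_values):
    """S401: divide each feature value by the preset constant and round.
    S402: entropy-encode the quantized data into a bitstring."""
    quantized = [round(v / QUANT_CONSTANT) for v in feature_values]
    return "".join(exp_golomb(q) for q in quantized)

bitstream = compress_features([33.0, 7.5, 128.0])  # -> codes for 2, 0, 8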
When interest point locations are distributed across coding structures of different levels, setting dedicated coding parameters for the interest points incurs a very large overhead, because the positions of the coding structures scattered at multiple places and the corresponding coding parameters would have to be encoded and transmitted in the video bitstream. To avoid this unnecessary overhead, the interest point location information is not transmitted in the compressed video stream but only in the video feature code stream. Since the digital retina data receiving end receives the video feature stream and the compressed video stream simultaneously, the interest point feature information (including location and count information) recovered from the feature code stream can be used in decoding the compressed video data.
Video coding parameters can be set for any video coding structure, for example via the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) of the HEVC standard. The SPS holds a set of global parameters for a coded video sequence. A picture may be segmented into one or more slice segments, and all slice segments of the same picture use the same PPS. Some parameters in the PPS duplicate those in the SPS, and the PPS may override the corresponding SPS values; that is, a slice segment may use the PPS values for decoding. In the embodiments of the present application, however, since interest point locations may be distributed across different levels of block-based coding structures, coding parameters need to be designed specifically for different interest points and different feature types. In some embodiments, the dedicated encoding parameters are applied only to regions of interest determined from particular feature types. In some embodiments, they are applied only to regions of interest determined from feature points of a particular feature type whose feature values exceed a certain threshold.
In one embodiment, the following data structure (ROI coding parameters) is designed:
{
    flag: 1 bit             // whether dedicated coding parameters exist for the ROI
    mode_select: 1 bit      // which parameters prevail when the ROI coding parameters conflict with other coding parameters
    roi_id: 8 bit           // identifies the ROI
    feature_id: 8 bit       // identifies the feature type associated with the ROI
    chroma_qp_delta: 8 bit  // chroma quantization offset relative to the PPS
    luma_qp_delta: 8 bit    // luma quantization offset relative to the PPS
}
Whether the current coding block lies within the coverage of the interest points is determined from the interest point location distribution. If so, the current coding block is encoded according to the ROI parameters. When the interest point coding parameters and other coding parameters of the video stream act on the same coding block and conflict, either the ROI coding parameters or the other video stream parameters can be stipulated to prevail, or one of them can be selected according to the mode_select indication; if mode_select is "1", the ROI coding parameters prevail. In this example, chroma_qp_delta/luma_qp_delta are defined relative to the quantization values in the PPS parameter set; they could equally be defined with reference to other parameter values or in other ways.
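The coverage test and conflict rule just described might be sketched as follows; the field names mirror the illustrative data structure above, while the rectangle representation, the overlap test, and the QP derivation are assumptions of this sketch rather than anything mandated here.

from dataclasses import dataclass

@dataclass
class RoiParams:
    flag: bool            # dedicated ROI coding parameters present
    mode_select: int      # 1: ROI parameters prevail on conflict; 0: stream parameters prevail
    roi_id: int
    feature_id: int
    luma_qp_delta: int    # luma QP offset relative to the PPS
    chroma_qp_delta: int  # chroma QP offset relative to the PPS

def block_luma_qp(block, roi_rects, roi: RoiParams, pps_qp: int, stream_qp: int) -> int:
    """Choose the luma QP for one coding block: if the block overlaps the
    interest point coverage and the ROI parameters prevail, apply the ROI
    delta to the PPS base QP; otherwise keep the video stream's QP."""
    def overlaps(a, b):  # rectangles as (x0, y0, x1, y1)
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    in_roi = roi.flag and any(overlaps(block, r) for r in roi_rects)
    if in_roi and roi.mode_select == 1:
        return pps_qp + roi.luma_qp_delta
    return stream_qp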
In some embodiments, the same video processing unit may support multiple video analytics (including video retrieval) applications, while one video analytics application may include multiple video features.
And S50, sending the video compressed stream and the video characteristic code stream.
Data may be transmitted over a wired network, such as an Ethernet interface, or over a wireless network such as WiFi or a 4G/5G cellular network. The video feature information is sent only through the video feature code stream.
In some embodiments, generation and compression of the summarized video stream may be further included.
The digital retina video joint encoding method provided by the embodiments of the present application determines video encoding parameters based on the video feature information, encodes and compresses the video with those parameters to obtain a compressed video stream, compresses the video feature information to obtain a video feature code stream, and sends both streams when transmitting video data. The amount of compressed video data is thereby greatly reduced, making the data convenient to transmit, store, receive, decode, and decompress, so that the needs of practical applications can be well met.
Another embodiment of the present application provides a digital retinal video joint encoding apparatus, which includes a video compression processing module and a video feature processing module;
in some embodiments, the video compression processing module comprises:
the coding parameter determining unit is used for determining coding parameters based on the interest point position information;
and the coding unit is used for carrying out video coding by utilizing the coding parameters.
The video feature processing module comprises a determining unit; the determination unit is used for determining the position information of the interest point.
As shown in fig. 6, the determination unit includes:
the extraction subunit is used for extracting the video characteristic information of the acquired monitoring video; the video characteristic information at least comprises interest point position information;
a determining subunit, configured to determine a video coding parameter based on the video feature information.
As shown in fig. 6, the video compression processing module includes:
the first compression submodule is used for coding and compressing the video by utilizing the video coding parameters to obtain a video compression stream; the video compression stream comprises the video coding parameters;
the second compression submodule is used for compressing the video characteristic information to obtain a video characteristic code stream;
and the sending submodule is used for sending the video compressed stream and the video characteristic code stream.
In some embodiments, the video feature information includes at least location information of the point of interest; the step of extracting the video characteristic information of the collected monitoring video, which is executed by the extraction subunit, comprises the following steps:
converting each frame of image of the monitoring video to form a plurality of characteristic values;
dividing each frame of image after transformation into a plurality of areas;
according to the characteristic value contained in each region, identifying interest points and determining the position information of the interest points; the interest points are pixel points corresponding to characteristic values exceeding a preset threshold value.
In some embodiments, the compressing, performed by the second compression sub-module, the video feature information to obtain the video feature code stream includes the following steps:
dividing the video characteristic information by a preset constant to obtain quantized data;
and coding the quantized data to obtain the video characteristic code stream.
Another embodiment of the present application provides a digital retinal video joint decoding method, which includes video decoding processing and video feature processing; wherein the video feature processing comprises determining point-of-interest location information; the video decoding processing comprises determining encoding parameters for video decoding based on the point-of-interest location information; the point-of-interest location information used for video decoding comes from a video feature stream.
In some embodiments, determining encoding parameters based on the point-of-interest location information comprises: assigning a set of encoding parameters to a point of interest whose locations are distributed across different levels of the block-based coding structures. The block-based coding structures include at least one of macroblocks, sub-blocks, slices, slice groups, sequences, and pictures. The encoding parameters include at least one of: the algorithm selected for entropy encoding/decoding (such as Golomb coding, CAVLC, or CABAC) and quantization parameters. The quantization parameters include quantization parameters for the luminance and/or chrominance components.
In some embodiments, when the point-of-interest coding parameters and the video stream coding parameters act on the same coding block at the same time, one set of parameters is selected for decoding. The selection is made by giving priority to the point-of-interest parameters, giving priority to the video stream parameters, or according to provided parameter selection indication information.
As shown in fig. 7, the step of determining the location information of the point of interest includes the following steps:
s100, receiving a video compression stream and a video characteristic code stream corresponding to the video compression stream; the video compression stream contains video coding parameters.
The video compressed stream and the video feature stream may be received simultaneously. The video compression stream and the video feature code stream can be received in a wired mode, a wireless mode (WIFI, a cellular network and the like) and the like. Different video characteristic code streams can have a uniform coding and decoding and/or characteristic matching mode, and can also adopt an independent coding and decoding and/or characteristic matching mode.
S200, decompressing the video characteristic code stream to obtain video characteristic information; the video characteristic information includes point of interest location information.
Video feature information is recovered from the video feature code stream. It includes at least the interest point location information, and may further include the positional distribution of the salient feature values, the feature value types, and the number of nonzero or salient feature values. This information is used in decoding the compressed video.
In some implementations, determining encoding parameters for video decoding based on the point-of-interest location information includes:
s300, decoding and decompressing the video compression stream by combining the video characteristic information and the video coding parameters to obtain a monitoring video.
In some embodiments, when the compressed video stream indicates that ROI coding parameters are in use, the compressed data portions that need to be decoded with the ROI coding parameters can be determined from the interest point location information (and possibly also the feature type, feature value information, and so on), and the corresponding coding parameters are then determined to decode that data.
In some embodiments, the dedicated encoding parameters are applied only to regions of interest determined from particular feature types. In some embodiments, they are applied only to regions determined from interest points of a particular feature type whose feature values exceed a certain threshold; the threshold may be preset in the video coding unit or obtained from information carried in the compressed video stream. When the interest point coding parameters and other coding parameters of the video stream act on the same coding block and conflict, either the ROI coding parameters or the other video stream parameters can be stipulated to prevail, or one of them can be selected according to the mode_select indication.
As shown in fig. 8, in some embodiments, the video feature information includes at least location information of the point of interest; the step S200 includes:
s2001, decoding the video characteristic code stream to obtain quantized data.
The quantized data is obtained by dividing the video feature information by a predetermined constant.
And S2002, multiplying the quantized data by a preset constant to obtain the video characteristic information.
The video feature information includes at least location information of the point of interest.
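For symmetry, here is a decoder-side sketch of steps S2001 and S2002 that inverts the feature compression sketched earlier. It assumes the same preset constant as the encoder (shared out of band) and a well-formed bitstring; the rounding loss introduced by quantization is, of course, not recoverable.

QUANT_CONSTANT = 16.0  # must equal the encoder's preset constant (assumed)

def exp_golomb_decode_all(bits: str):
    """S2001: decode a concatenation of order-0 exponential-Golomb codes."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":          # count the zero prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2) - 1)
        i += zeros + 1
    return values

def decompress_features(bits: str):
    """S2002: multiply the quantized data by the preset constant."""
    return [q * QUANT_CONSTANT for q in exp_golomb_decode_all(bits)]

# Round trip with the encoder sketch: [33.0, 7.5, 128.0] comes back as [32.0, 0.0, 128.0].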
Another embodiment of the present application provides a digital retinal video joint decoding device, which includes a video decoding processing module and a video feature processing module; wherein the video feature processing module comprises a determining unit; the determining unit is used for determining the point-of-interest location information;
the video decoding processing module comprises:
a decoding parameter determining unit, used for determining encoding parameters based on the point-of-interest location information;
a decoding unit, used for performing video decoding according to the encoding parameters;
wherein the point-of-interest location information for video decoding comes from a video feature stream.
In some embodiments, the determining unit comprises:
the receiving subunit is used for receiving a video compressed stream and a video characteristic code stream corresponding to the video compressed stream; the video compression stream comprises video coding parameters;
the first decompression subunit is used for decompressing the video characteristic code stream to obtain video characteristic information; the video characteristic information includes point of interest location information.
In some embodiments, the encoding parameter determination unit includes:
and the second decompression sub-unit is used for decoding and decompressing the video compressed stream by combining the video characteristic information and the video coding parameters to obtain the monitoring video.
In some embodiments, the decompressing performed by the first decompressing subunit to decompress the video feature code stream to obtain the video feature information specifically includes the following steps:
and decoding the video characteristic code stream to obtain quantized data.
And multiplying the quantized data by a preset constant to obtain the video characteristic information.
Another embodiment of the present application provides a terminal, which includes a first memory, a first processor, and a first computer program stored in the first memory and executable on the first processor, where the first processor executes the first computer program to implement the digital retinal video joint encoding method according to any one of the foregoing embodiments.
Another embodiment of the present application provides a server, which includes a second memory, a second processor, and a second computer program stored in the second memory and executable on the second processor, and the second processor executes the second computer program to implement the digital retinal video joint decoding method according to any one of the foregoing embodiments.
As shown in fig. 9, another embodiment of the present application provides a digital retinal video processing system, including a terminal and a server; the terminal comprises a first memory, a first processor and a first computer program which is stored on the first memory and can run on the first processor, wherein the first processor executes the first computer program to realize the digital retinal video joint coding method of any one of the above embodiments; the server comprises a second memory, a second processor and a second computer program stored on the second memory and executable on the second processor, wherein the second processor executes the second computer program to implement the digital retinal video joint decoding method of any one of the above embodiments.
Another embodiment of the present application provides an electronic device, which includes a memory, a processor, and a first computer program and/or a second computer program stored in the memory and executable on the processor, where the processor executes the first computer program and/or the second computer program to implement the digital retinal video joint encoding method of any of the above embodiments and/or the digital retinal video joint decoding method of any of the above embodiments.
Another embodiment of the present application provides a computer-readable storage medium, on which a first computer program and/or a second computer program is stored, the first computer program and/or the second computer program being executed by a processor to implement the digital retinal video joint encoding method of any of the above embodiments and/or the digital retinal video joint decoding method of any of the above embodiments.
Referring to fig. 10, the computer-readable storage medium is an optical disc 20, on which a computer program (i.e., a program product) is stored, and when the computer program is executed by a processor, the computer program performs the joint encoding method of the digital retina video according to any of the above embodiments and/or the joint decoding method of the digital retina video according to any of the above embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memories (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above-mentioned embodiments of the present application and the method provided by the embodiments of the present application have the same advantages as the method adopted, executed or implemented by the application program stored in the computer-readable storage medium.
It should be noted that:
the term "module" is not intended to be limited to a particular physical form. Depending on the particular application, a module may be implemented as hardware, firmware, software, and/or combinations thereof. Furthermore, different modules may share common components or even be implemented by the same component. There may or may not be clear boundaries between the various modules.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may also be used with the examples based on this disclosure. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
The above-mentioned embodiments only express the embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (19)

1. A digital retina video joint coding method is characterized by comprising video compression processing and video characteristic processing; wherein the video feature processing comprises determining point of interest location information; the video compression processing includes: and determining coding parameters based on the interest point position information, and carrying out video coding by using the coding parameters.
2. The method of claim 1, wherein the point of interest location information is sent only via a video feature stream.
3. The method of claim 1, wherein the determining encoding parameters based on the point of interest location information comprises: a set of encoding parameters is assigned to a point of interest, wherein locations of the point of interest are distributed across different levels of block-based encoding structures.
4. The method of claim 3, wherein the block-based coding structures comprise at least one of macroblocks, sub-blocks, slices, slice groups, sequences, and pictures.
5. The method of claim 1, wherein the encoding parameters comprise at least one of: the algorithm selected for entropy encoding/decoding (such as Golomb coding, CAVLC, or CABAC) and quantization parameters.
6. The method of claim 5, wherein the quantization parameter comprises a quantization parameter for a luma component and/or a chroma component.
7. The method of claim 1, wherein one of the parameters is selected for encoding when the point-of-interest coding parameter and the video stream coding parameter are applied to the same coding block at the same time.
8. The method of claim 7, wherein the selection is made by giving priority to the point-of-interest parameters, giving priority to the video stream parameters, or according to provided parameter selection indication information.
9. A digital retina video joint decoding method, characterized by comprising video decoding processing and video feature processing; wherein the video feature processing comprises determining point-of-interest location information; the video decoding processing comprises determining encoding parameters for video decoding based on the point-of-interest location information; the point-of-interest location information for video decoding comes from a video feature stream.
10. The method of claim 9, wherein the determining encoding parameters based on the point of interest location information comprises: a set of encoding parameters is assigned to a point of interest, wherein locations of the point of interest are distributed across different levels of block-based encoding structures.
11. The method of claim 10, wherein the block-based coding structures comprise at least one of macroblocks, sub-blocks, slices, slice groups, sequences, and pictures.
12. The method of claim 9, wherein the encoding parameters comprise at least one of: the algorithm selected for entropy encoding/decoding (such as Golomb coding, CAVLC, or CABAC) and quantization parameters.
13. The method of claim 12, wherein the quantization parameter comprises a quantization parameter for a luma component and/or a chroma component.
14. The method of claim 9, wherein one of the parameters is selected for decoding when the point-of-interest coding parameter and the video stream coding parameter are simultaneously applied to the same coding block.
15. The method of claim 14, wherein the selection is made by giving priority to the point-of-interest parameters, giving priority to the video stream parameters, or according to provided parameter selection indication information.
16. A digital retina video joint coding device is characterized by comprising a video compression processing module and a video feature processing module; wherein the video feature processing module comprises a determining unit; the determining unit is used for determining the position information of the interest point;
the video compression processing module comprises:
the coding parameter determining unit is used for determining coding parameters based on the interest point position information;
and the coding unit is used for carrying out video coding by utilizing the coding parameters.
17. A digital retina video joint decoding device, characterized by comprising a video decoding processing module and a video feature processing module; wherein the video feature processing module comprises a determining unit; the determining unit is used for determining the position information of the interest point;
the video decoding processing module comprises:
a decoding parameter determining unit, used for determining encoding parameters based on the interest point position information;
a decoding unit, used for performing video decoding according to the encoding parameters;
wherein the interest point position information for video decoding comes from a video feature stream.
18. An electronic device comprising a memory, a processor and a first computer program and/or a second computer program stored on the memory and executable on the processor, the processor executing the first computer program and/or the second computer program to implement the digital retinal video joint encoding method according to any one of claims 1 to 8 and/or the digital retinal video joint decoding method according to any one of claims 9 to 15.
19. A computer-readable storage medium having a first computer program and/or a second computer program stored thereon, characterized in that the first computer program, when executed by a processor, implements the digital retina video joint coding method according to any one of claims 1 to 8, and/or the second computer program, when executed by a processor, implements the digital retina video joint decoding method according to any one of claims 9 to 15.
CN202210321529.8A 2022-03-30 2022-03-30 Digital retina video joint coding method, decoding method, device and electronic equipment Pending CN114422788A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210321529.8A CN114422788A (en) 2022-03-30 2022-03-30 Digital retina video joint coding method, decoding method, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114422788A 2022-04-29

Family

ID=81263002

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210321529.8A Pending CN114422788A (en) 2022-03-30 2022-03-30 Digital retina video joint coding method, decoding method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114422788A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114584787A (en) * 2022-05-05 2022-06-03 浙江智慧视频安防创新中心有限公司 Coding method and device based on digital retina sensitive video content hiding

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101282479A (en) * 2008-05-06 2008-10-08 武汉大学 Method for encoding and decoding airspace with adjustable resolution based on interesting area
CN103974071A (en) * 2013-01-29 2014-08-06 富士通株式会社 Video coding method and equipment on basis of regions of interest
CN104065964A (en) * 2014-06-19 2014-09-24 上海交通大学 Coding-decoding method of region-of-interest information and video coding-decoding device
WO2015003573A1 (en) * 2013-07-12 2015-01-15 华为技术有限公司 Video codec method, device and system
CN104427337A (en) * 2013-08-21 2015-03-18 杭州海康威视数字技术股份有限公司 Region of interest (ROI) video coding method and apparatus based on object detection
US20190007690A1 (en) * 2017-06-30 2019-01-03 Intel Corporation Encoding video frames using generated region of interest maps
US10939126B1 (en) * 2019-12-09 2021-03-02 Guangzhou Zhijing Technology Co., Ltd Method of adding encoded range-of-interest location, type and adjustable quantization parameters per macroblock to video stream
US20210168376A1 (en) * 2019-06-04 2021-06-03 SZ DJI Technology Co., Ltd. Method, device, and storage medium for encoding video data base on regions of interests
CN113473142A (en) * 2021-09-03 2021-10-01 浙江智慧视频安防创新中心有限公司 Video encoding method, video decoding method, video encoding device, video decoding device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10771813B2 (en) Reference frame encoding method and apparatus, and reference frame decoding method and apparatus
US10045023B2 (en) Cross component prediction in video coding
CN101222644B (en) Moving image encoding/decoding device and moving image encoding/decoding method
JP4895400B2 (en) Improved compression in the representation of non-frame edge blocks of image frames
WO2015096822A1 (en) Image coding and decoding methods and devices
CN114424525A (en) Quantization parameter identification in video processing
CN113170202B (en) Encoder, decoder and corresponding methods for constructing MPM list of block applying multi-hypothesis prediction
CN113785573A (en) Encoder, decoder and corresponding methods using an adaptive loop filter
CN114040205A (en) Method for selecting intra chroma prediction mode, image processing apparatus, and storage apparatus
US20130235935A1 (en) Preprocessing method before image compression, adaptive motion estimation for improvement of image compression rate, and method of providing image data for each image type
CN113711606A (en) Video or image compilation based on signaling of zoom list data
CN117897952A (en) Method and system for performing combined inter and intra prediction
CN111316641B (en) Method and apparatus for decoding image using transform according to block size
CN114902670B (en) Method and apparatus for signaling sub-image division information
CN114422788A (en) Digital retina video joint coding method, decoding method, device and electronic equipment
CN115836525A (en) Method and system for prediction from multiple cross components
US20230269385A1 (en) Systems and methods for improving object tracking in compressed feature data in coding of multi-dimensional data
US20220329840A1 (en) High level syntax signaling method and device for image/video coding
CN117616751A (en) Video encoding and decoding of moving image group
CN115917611A (en) Method and system for video coding and decoding using reference region
RU2783334C1 (en) Video or image encoding based on the conversion of a brightness signal with scaling of a chromaticity signal
US11627341B2 (en) Method and device for signaling information relating to slice type in picture header in image/video coding system
RU2777721C1 (en) Method for encoding an image based on internal prediction using an mpm list and equipment therefor
RU2793777C1 (en) Method and device for internal prediction based on internal subsegments in image coding system
EP4354862A1 (en) Systems and methods for end-to-end feature compression in coding of multi-dimensional data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220429