CN112351252A - Monitoring video coding and decoding device

Info

Publication number: CN112351252A
Application number: CN202011162454.0A
Authority: CN
Inventors: 张韵东; 昝劲文
Original assignee: Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd
Current assignee: Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd
Priority date: 2020-10-27
Filing date: 2020-10-27
Publication date: 2021-02-09
Anticipated expiration: 2040-10-27
Also published as: CN112351252B

Abstract

Embodiments of the present disclosure disclose a surveillance video codec device. One embodiment of the device comprises a video acquisition device, a feature extraction device, a machine vision device, a coding parameter setting device, and a video coding device, wherein: the video acquisition device is used for acquiring video information within a monitoring range and sending the video information to the feature extraction device and the video coding device; the feature extraction device is used for receiving the video information sent by the video acquisition device and performing feature extraction processing on the video information to generate feature extraction information; the machine vision device is used for receiving the feature extraction information sent by the feature extraction device; the coding parameter setting device is used for receiving the feature extraction information sent by the feature extraction device; and the video coding device is used for receiving the video information acquired by the video acquisition device and encoding the video information to generate a video coding stream. This embodiment improves the performance and efficiency of the system in processing video information.

Description

Monitoring video coding and decoding device

Technical Field

Embodiments of the present disclosure relate to the technical field of information transmission, and in particular to a surveillance video codec device.

Background

Machine vision technology has made breakthroughs in recent years, and its strong performance on a variety of vision tasks has accelerated its adoption in many application areas, including video surveillance. A user may view and interpret video data by means of a machine.

However, the following technical problems are generally encountered when a user views and interprets video data through a machine:

First, existing standard video codec algorithms usually treat human viewers as the content receivers, and their evaluation of video coding is inaccurate; as a result, when such an algorithm is applied directly to a video surveillance system based on machine vision processing, the system's video coding and decoding performance is reduced;

Second, conventional standard codec algorithms generally assume that a human user is the receiver of the video information and are not designed or optimized for machine vision, so system performance cannot reach an optimal state.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure provide a surveillance video codec device to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide a surveillance video codec device, including a video acquisition device, a feature extraction device, a machine vision device, a coding parameter setting device, and a video coding device, wherein: the video acquisition device is used for acquiring video information within a monitoring range and sending the video information to the feature extraction device and the video coding device; the feature extraction device is in communication connection with the video acquisition device, and is used for receiving the video information sent by the video acquisition device and performing feature extraction processing on the video information to generate feature extraction information; the machine vision device is in communication connection with the feature extraction device, and is used for receiving the feature extraction information sent by the feature extraction device; the coding parameter setting device is in communication connection with the feature extraction device, and is used for receiving the feature extraction information sent by the feature extraction device; and the video coding device is in communication connection with the video acquisition device, and is used for receiving the video information acquired by the video acquisition device and encoding the video information to generate a video coding stream.

In some embodiments, the surveillance video codec device further includes a storage and transmission device, wherein: the video coding device is in communication connection with the storage transmission device, and is further used for sending the video coding stream to the storage transmission device; the storage and transmission device is used for receiving the video coding stream and storing and transmitting the video coding stream.

In some embodiments, the feature extraction device is further configured to send the feature extraction information to the machine vision device and the encoding parameter setting device, wherein the video encoding device is communicatively connected to the machine vision device and configured to send the video encoded stream to the machine vision device.

In some embodiments, the machine vision device is further configured to receive a video encoding stream sent by the video encoding device, and perform feature extraction processing on the video encoding stream to generate feature-extracted video encoding information.

In some embodiments, the machine vision device is further configured to perform an evaluation process on the generated feature extraction video coding information according to a preset feature parameter to generate evaluated feature extraction video coding information.

In some embodiments, the machine vision device is communicatively coupled to the encoding parameter setting device, and is configured to send the evaluated feature-extracted video encoding information to the encoding parameter setting device.

In some embodiments, the encoding parameter setting device is further configured to receive the evaluated feature-extracted video encoding information sent by the machine vision device, and to determine, according to preset parameter information, a parameter error between the feature extraction information and the evaluated feature-extracted video encoding information as an encoding parameter.

In some embodiments, the encoding parameter setting device is communicatively connected to the video encoding device, and the encoding parameter setting device is configured to send the encoding parameters to the video encoding device.

In some embodiments, the video encoding device is further configured to receive the encoding parameter sent by the encoding parameter setting device, and to adjust a preset parameter in the video encoding device according to the encoding parameter.

In some embodiments, the video encoding device is further configured to perform optimization processing on the video encoding stream according to the adjusted preset parameter, and send the video encoding stream after the optimization processing to the storage and transmission device and the machine vision device.

In some embodiments, the machine vision device supports a neural network comprising a convolutional layer, a pooling layer, and a fully-connected layer, wherein: the convolutional layer is used for extracting features of the video information; the pooling layer is used for performing dimensionality reduction on the extracted features; and the fully-connected layer is used for representing the relationship between the video information and the feature extraction information.

The above embodiments of the present disclosure have the following advantages. First, video information can be captured by the video acquisition device and sent to the feature extraction device and the video coding device. The video coding device can then encode the received video information, which provides data support for optimizing the video information. The feature extraction device can perform feature extraction processing on the video information and send the resulting feature extraction information to the machine vision device and the coding parameter setting device, which provides data support for evaluating errors in the video information. The video coding device can then send the video coding stream to the machine vision device, so that the machine vision device can perform feature extraction processing on the video coding stream, evaluate the feature-extracted video coding information according to the feature extraction information, and provide a reference for the subsequent adjustment of the coding parameters. Next, the coding parameter setting device determines, according to preset parameter information, a parameter error between the received feature extraction information and the evaluated feature-extracted video coding information, and sends the parameter error to the video coding device so that the video coding device can adjust the parameters used for the video coding stream. In this way, the acquired video information can be optimized and the quality of the video information output by the system is improved, which in turn improves the performance and efficiency of the system in processing video information.
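The data flow summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a "frame" is a flat list of intensities, the encoder is a toy uniform quantizer, the features are two simple statistics, and the function names are hypothetical. The disclosure does not specify any of these.

```python
from typing import List, Sequence

Frame = List[float]      # toy "video information": a flat list of pixel intensities
Features = List[float]   # toy "feature extraction information"


def extract_features(frame: Sequence[float]) -> Features:
    """Toy features: mean intensity and mean absolute neighbour difference."""
    mean = sum(frame) / len(frame)
    edges = sum(abs(a - b) for a, b in zip(frame, frame[1:])) / (len(frame) - 1)
    return [mean, edges]


def encode(frame: Sequence[float], step: float) -> Frame:
    """Toy lossy encoder (assumption): uniform quantization with the given step."""
    return [round(p / step) * step for p in frame]


def feature_error(reference: Features, evaluated: Features) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((r - e) ** 2 for r, e in zip(reference, evaluated)) ** 0.5


def run_once(frame: Frame, step: float) -> float:
    reference = extract_features(frame)         # role of the feature extraction device
    stream = encode(frame, step)                # role of the video coding device
    evaluated = extract_features(stream)        # role of the machine vision device
    return feature_error(reference, evaluated)  # role of the coding parameter setting device


frame = [10.0, 12.0, 40.0, 41.0, 90.0, 95.0, 20.0, 22.0]
coarse = run_once(frame, step=16.0)  # coarse quantization distorts the features
fine = run_once(frame, step=1.0)     # step 1 leaves these integer-valued samples unchanged
```

In this sketch a coarser quantization step yields a larger feature error, which is the kind of signal the coding parameter setting device could feed back to the video coding device.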

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

Fig. 1 is an exemplary system architecture diagram of a surveillance video codec device according to some embodiments of the present disclosure;

Fig. 2 is a schematic structural diagram of one embodiment of a surveillance video codec device according to some embodiments of the present disclosure;

Fig. 3 is a schematic structural diagram of yet another embodiment of a surveillance video codec device according to some embodiments of the present disclosure;

Fig. 4 is a schematic diagram of a network structure in which a machine vision device in a surveillance video codec device supports a neural network including a convolutional layer, a pooling layer, and a fully-connected layer according to some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 illustrates an exemplary system architecture 100 of a surveillance video codec device according to an embodiment of the disclosure.

Referring to fig. 1, the system architecture 100 may include a video capture device 101, a feature extraction device 102, a machine vision device 103, an encoding parameter setting device 104, a video encoding device 105, a storage transmission device 106, a network 107, a network 108, a network 109, a network 110, and a network 111. The network 107 serves as a medium for providing a communication link between the video capture device 101 and the feature extraction device 102. The network 108 serves as a medium for providing a communication link between the feature extraction device 102 and the machine vision device 103. The network 109 serves as a medium for providing a communication link between the video capture device 101 and the video encoding device 105. The network 110 serves as a medium for providing a communication link between the feature extraction device 102 and the machine vision device 103. The network 110 also serves as a medium for providing a communication link between the machine vision device 103 and the video encoding device 105, and as a medium for providing a communication link between the encoding parameter setting device 104 and the video encoding device 105. The network 111 serves as a medium for providing a communication link between the video encoding device 105 and the storage transmission device 106. Network 107, network 108, network 109, network 110, and network 111 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The video capture device 101 may interact with the feature extraction device 102 via the network 107 to transmit video information. The video capture device 101 may be any of a variety of video information capture devices including, but not limited to, optical image sensors, infrared sensors, electronic radars, and the like.

Video capture device 101 may interact with video encoding device 105 via network 109 to transmit video information. The video coding device 105 may include, but is not limited to, an audio video compression codec chip, an information input channel, an information output channel, a network interface, an audio video interface, a protocol interface control, a serial communication interface, embedded software, and the like.

The feature extraction device 102 may interact with the encoding parameter setting device 104 via the network 108 to send feature extraction information. The feature extraction device 102 may support a neural network comprising at least one of a convolutional layer, a pooling layer, and a fully-connected layer.

The feature extraction device 102 may interact with the machine vision device 103 over the network 110 to send feature extraction information. The machine vision device 103 may support a neural network that includes convolutional layers, pooling layers, and fully-connected layers.

The machine vision device 103 may interact with the encoding parameter setting device 104 via the network 110 to send the evaluated feature extracted video encoding information.

The video encoding device 105 may interact with the machine vision device 103 over the network 110 to transmit a video encoded stream.

The encoding parameter setting device 104 may interact with the video encoding device 105 through the network 110 to transmit the encoding parameters. The encoding parameter setting device 104 may support machine learning evaluation in order to perform error evaluation on the feature extraction information and the evaluated feature-extracted video encoding information.

Video encoding device 105 may interact with storage transmission device 106 via network 111 to transmit video encoded streams. The storage transmission device 106 may be any of a variety of electronic devices that include: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to perform functions as described in the first aspect.
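As a reading aid, the device-to-device interactions listed in the preceding paragraphs can be written down as a small directed link table. The sketch below is only an illustration: the `Link` type and helper names are hypothetical, and the pairings are taken from the per-device paragraphs above (the sentences beginning "may interact with"), not from any figure.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    network: int    # reference numeral of the network carrying the link
    sender: int     # reference numeral of the sending device
    receiver: int   # reference numeral of the receiving device
    payload: str    # what is transmitted


# Pairings as stated in the per-device paragraphs of the description.
LINKS: List[Link] = [
    Link(107, 101, 102, "video information"),
    Link(109, 101, 105, "video information"),
    Link(108, 102, 104, "feature extraction information"),
    Link(110, 102, 103, "feature extraction information"),
    Link(110, 103, 104, "evaluated feature-extracted video encoding information"),
    Link(110, 105, 103, "video encoded stream"),
    Link(110, 104, 105, "encoding parameters"),
    Link(111, 105, 106, "video encoded stream"),
]


def senders_to(device: int) -> List[int]:
    """Devices that send something to `device`."""
    return sorted({link.sender for link in LINKS if link.receiver == device})


def receivers_from(device: int) -> List[int]:
    """Devices that receive something from `device`."""
    return sorted({link.receiver for link in LINKS if link.sender == device})
```

Read this way, the table makes the feedback path visible: 105 sends to 103, 103 sends to 104, and 104 sends back to 105.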

It should be understood that the number of video capture devices, feature extraction devices, machine vision devices, encoding parameter setting devices, video encoding devices, storage transmission devices, and networks in fig. 1 are merely illustrative. Any number of video acquisition devices, feature extraction devices, machine vision devices, encoding parameter setting devices, video encoding devices, storage and transmission devices, and networks may be provided as desired.

With continued reference to fig. 2, a schematic structural diagram of an embodiment of the surveillance video codec device provided in the present disclosure is shown. As shown in fig. 2, the surveillance video codec device of this embodiment may include a video acquisition device 1, a feature extraction device 2, a machine vision device 3, a coding parameter setting device 4, and a video coding device 5.

In some embodiments, the video capture device 1 may be any of various video information capture devices including, but not limited to, optical image sensors, infrared sensors, electronic radars, and the like. Here, the video capture device 1 may be configured to capture video information within a monitoring range and transmit the video information to the feature extraction device 2 and the video encoding device 5. Here, the video information may be a video signal.

In some embodiments, the feature extraction device 2 may receive video information sent by the video capture device 1, and perform feature extraction processing on the video information to generate feature extraction information. The feature extraction device 2 is further configured to send feature extraction information to the machine vision device 3 and the encoding parameter setting device 4. Here, the feature extraction information includes, but is not limited to, spatial information, temporal information, color information, character information, vehicle information, motion information, and the like. Here, the feature extraction device 2 may support an extraction manner using machine learning.
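To make the kinds of feature extraction information listed above more concrete, the following sketch computes toy versions of color, spatial, and motion information from two consecutive RGB frames. The specific statistics, the `extract` function, and the frame representation are assumptions chosen for illustration; the disclosure leaves the extraction method open (for example, machine learning may be used instead).

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Frame = List[List[Pixel]]      # rows of pixels


def luma(p: Pixel) -> float:
    """Approximate brightness using the BT.601 luma weights."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def extract(prev: Frame, curr: Frame) -> Dict[str, float]:
    """Toy color, spatial, and motion features for the current frame."""
    pixels = [p for row in curr for p in row]
    prev_pixels = [p for row in prev for p in row]
    n = len(pixels)
    # Color information: mean value per channel.
    mean_r, mean_g, mean_b = (sum(p[c] for p in pixels) / n for c in range(3))
    # Spatial information: mean absolute horizontal luma gradient.
    grads = [abs(luma(a) - luma(b)) for row in curr for a, b in zip(row, row[1:])]
    spatial = sum(grads) / len(grads) if grads else 0.0
    # Motion (temporal) information: mean absolute luma change between frames.
    motion = sum(abs(luma(a) - luma(b)) for a, b in zip(prev_pixels, pixels)) / n
    return {"mean_r": mean_r, "mean_g": mean_g, "mean_b": mean_b,
            "spatial": spatial, "motion": motion}


black: Frame = [[(0, 0, 0)] * 2 for _ in range(2)]
white: Frame = [[(255, 255, 255)] * 2 for _ in range(2)]
static = extract(black, black)    # nothing changes between frames
changed = extract(black, white)   # every pixel changes
```

A static scene gives a motion feature of zero, while a scene change gives a large one; a uniform frame gives a spatial feature of zero.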

In some embodiments, the machine vision device 3 may receive a video encoding stream transmitted by the video encoding device 5. Here, the video encoding stream may refer to an encoded video signal. The machine vision device 3 is also used to perform a feature extraction operation on the video encoding stream and to send the extracted feature information as an input signal to the encoding parameter setting device 4. Here, the feature information includes, but is not limited to, spatial information, temporal information, color information, character information, vehicle information, motion information, and the like of the video. Here, the machine vision device 3 may perform the feature extraction operation on the video encoding stream by means including, but not limited to, machine learning.

In some embodiments, the encoding parameter setting device 4 may receive the feature extraction information sent by the feature extraction device 2 and the evaluated feature-extracted video encoding information sent by the machine vision device 3. Optionally, the encoding parameter setting device 4 may further determine, according to preset parameter information, a parameter error between the feature extraction information and the evaluated feature-extracted video encoding information as an encoding parameter. Here, the preset parameter information may include, but is not limited to, a norm value between feature information, a component-weighted distance measure between multidimensional features, and the like. Here, the parameter information may be acquired by machine learning. Here, the generation of the encoding parameters may be achieved by, but is not limited to, machine learning. The encoding parameter setting device 4 is also used to send the encoding parameters to the video encoding device 5.
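The two examples of preset parameter information named above, a norm value between feature information and a component-weighted distance between multidimensional features, can be sketched as follows. The function names, the choice of the p-norm family, and the weighted Euclidean form are assumptions for illustration; the disclosure does not fix a particular formula.

```python
import math
from typing import Sequence


def norm_distance(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """p-norm of the difference between two feature vectors (p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Component-weighted Euclidean distance; larger weights emphasize a component."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


reference = [0.0, 0.0]   # e.g. features of the original video information
evaluated = [3.0, 4.0]   # e.g. features recovered from the video encoding stream
plain = norm_distance(reference, evaluated)                      # Euclidean distance
first_only = weighted_distance(reference, evaluated, [1.0, 0.0])  # ignores 2nd component
```

Weights make it possible to penalize errors in features that matter more to the downstream vision task (for example, motion over color), which is one plausible reason for the weighted variant.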

In some embodiments, the video encoding device 5 may receive the video information sent by the video capture device 1 and the encoding parameters sent by the encoding parameter setting device 4. Optionally, the video encoding device 5 may encode the video information sent by the video capture device 1 by using the encoding parameters sent by the encoding parameter setting device 4. Optionally, the video encoding device 5 may also send the video encoding stream to the machine vision device 3 for evaluation. By evaluating the video encoding stream, the machine vision device 3 enables the encoding parameters to be further adjusted through the encoding parameter setting device 4, improving the performance of the system.
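The feedback described above, in which the encoder is re-parameterized until the features seen by the machine vision device are close enough to the reference features, can be sketched as a simple loop. This is a sketch under stated assumptions: the quantization step stands in for the encoder's preset parameter, halving it is an arbitrary update rule, and `tune`, `features`, and `encode` are hypothetical names. The disclosure does not prescribe any of these.

```python
from typing import List, Sequence, Tuple


def features(frame: Sequence[float]) -> List[float]:
    """Toy features: mean intensity and mean absolute neighbour difference."""
    mean = sum(frame) / len(frame)
    edges = sum(abs(a - b) for a, b in zip(frame, frame[1:])) / (len(frame) - 1)
    return [mean, edges]


def encode(frame: Sequence[float], step: float) -> List[float]:
    """Toy encoder (assumption): uniform quantization with the given step."""
    return [round(p / step) * step for p in frame]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def tune(frame: Sequence[float], step: float = 32.0, target: float = 0.5,
         rounds: int = 10) -> Tuple[float, float]:
    """Halve the quantization step until the feature error is within `target`."""
    reference = features(frame)                  # from the feature extraction device
    err = float("inf")
    for _ in range(rounds):
        stream = encode(frame, step)             # video encoding device
        err = distance(reference, features(stream))  # machine vision + parameter setting
        if err <= target:
            break
        step = max(1.0, step / 2.0)              # adjust the preset parameter
    return step, err


frame = [10.0, 12.0, 40.0, 41.0, 90.0, 95.0, 20.0, 22.0]
final_step, final_err = tune(frame)
```

A real encoder would expose different knobs (for example, a quantization parameter or bitrate target), and the update rule could itself be learned, as the description suggests.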

The surveillance video codec device in this embodiment has the following beneficial effects. First, video information can be captured by the video acquisition device and sent to the feature extraction device and the video coding device. The video coding device can then encode the received video information, which provides data support for optimizing the video information. The feature extraction device can perform feature extraction processing on the video information and send the resulting feature extraction information to the machine vision device and the coding parameter setting device, which provides data support for evaluating errors in the video information. The video coding device can then send the video coding stream to the machine vision device, so that the machine vision device can perform feature extraction processing on the video coding stream, evaluate the feature-extracted video coding information according to the feature extraction information, and provide a reference for the subsequent adjustment of the coding parameters. Next, the coding parameter setting device determines, according to preset parameter information, a parameter error between the received feature extraction information and the evaluated feature-extracted video coding information, and sends the parameter error to the video coding device so that the video coding device can adjust the parameters used for the video coding stream. In this way, the acquired video information can be optimized and the quality of the video information output by the system is improved, which in turn improves the performance and efficiency of the system in processing video information.

With continued reference to fig. 3, a schematic structural diagram of yet another embodiment of the surveillance video codec device provided in the present disclosure is shown. As with the surveillance video codec device in the embodiment of fig. 2, the surveillance video codec device in this embodiment may include a video acquisition device 1, a feature extraction device 2, a machine vision device 3, a coding parameter setting device 4, and a video coding device 5. For the specific structural relationships, reference may be made to the related description of the embodiment of fig. 2, which is not repeated here.

Different from the monitoring video codec device in the embodiment of fig. 2, the monitoring video codec device in this embodiment further includes a storage and transmission device 6, where: the video coding device is in communication connection with the storage transmission device and is also used for sending the video coding stream to the storage transmission device; the storage and transmission device is used for receiving the video coding stream and storing and transmitting the video coding stream.

Unlike the monitoring video codec device in the embodiment of fig. 2, the video encoding device 5 in this embodiment is further configured to receive the encoding parameters sent by the encoding parameter setting device 4, and adjust the preset parameters in the video encoding device according to the encoding parameters.

Different from the monitoring video codec device in the embodiment of fig. 2, the video encoding device 5 in this embodiment is further configured to perform optimization processing on the video encoding stream according to the adjusted preset parameter, and to send the optimized video encoding stream to the storage and transmission device 6 and the machine vision device 3.

Unlike the monitoring video codec device in the embodiment of fig. 2, the machine vision device 3 in this embodiment supports a neural network including a convolutional layer, a pooling layer, and a fully-connected layer, where: the convolutional layer is used for extracting features of the video information; the pooling layer is used for performing dimensionality reduction on the extracted features; and the fully-connected layer is used to characterize the relationship between the video information and the feature extraction information.

The above embodiments of the present disclosure have the following advantages. First, video information can be captured by the video acquisition device and sent to the feature extraction device and the video coding device. The video coding device can then encode the received video information, which provides data support for optimizing the video information. The feature extraction device can perform feature extraction processing on the video information and send the resulting feature extraction information to the machine vision device and the coding parameter setting device, which provides data support for evaluating errors in the video information. The video coding device can then send the video coding stream to the machine vision device, so that the machine vision device can perform feature extraction processing on the video coding stream, evaluate the feature-extracted video coding information according to the feature extraction information, and provide a reference for the subsequent adjustment of the coding parameters. Next, the coding parameter setting device determines, according to preset parameter information, a parameter error between the received feature extraction information and the evaluated feature-extracted video coding information, and sends the parameter error to the video coding device so that the video coding device can adjust the parameters used for the video coding stream. In this way, the acquired video information can be optimized and the quality of the video information output by the system is improved, which in turn improves the performance and efficiency of the system in processing video information.

The machine vision device 3 and the encoding parameter setting device 4 are an inventive point of the present disclosure and solve the second technical problem mentioned in the background, namely that conventional standard codec algorithms generally assume that a human user is the receiver of the video information and are not designed or optimized for machine vision, so system performance cannot reach an optimal state. The factor that prevents design and optimization for machine vision tends to be the following: the user cannot design and optimize the video information. If this factor is addressed, design and optimization for machine vision, and thus improved system performance, can be achieved. To this end, the present disclosure introduces the machine vision device 3 and the encoding parameter setting device 4. The machine vision device 3 evaluates the video encoding stream by means of machine learning, performing preliminary feature extraction on the video information processed by the video encoding device. The feature information of the input video and of the encoded video is thereby compared and evaluated, and the result of this preliminary evaluation can be provided to the encoding parameter setting device 4 so that it can adjust the encoding parameters. The encoding parameter setting device 4, in turn, evaluates the feature information from the feature extraction device and from the machine vision device, and calculates the resulting error by means of machine learning to generate the encoding parameters. Finally, the encoding parameters are sent to the video encoding device for encoding the video information. By introducing a machine-learning-based intelligent information processing method into the machine vision device 3 and the encoding parameter setting device 4, the video information is evaluated and encoded, the problem that the video information cannot be designed and optimized is solved, and the quality of the video information output by the system is improved.

With continued reference to fig. 4, a schematic diagram of a network structure in which a machine vision device in a surveillance video codec device provided by the present disclosure supports a neural network including a convolutional layer, a pooling layer, and a fully-connected layer is shown.

As shown in fig. 4, the machine vision device supports a neural network comprising a convolutional layer, a pooling layer, and a fully-connected layer, wherein: the convolutional layer is used for extracting features of the video information; the pooling layer is used for performing dimensionality reduction on the extracted features; and the fully-connected layer is used for representing the relationship between the video information and the feature extraction information. As an example, a convolution operation may be performed on the video information to generate the first-layer feature map C1. Pooling is then performed on feature map C1 to generate the second-layer feature map S2. Repeating the convolution and pooling operations may generate further feature maps (e.g., feature map C3 and feature map S4). Performing a fully-connected operation on the feature maps may generate the recognition result C5.
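The C1, S2, C3, S4, C5 sequence above can be traced with a small forward pass written in plain Python. This is an illustrative sketch only: it uses a single channel, a fixed hand-picked kernel and weights instead of learned ones, max pooling, and a ReLU activation, none of which are specified in the disclosure.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, k: Matrix) -> Matrix:
    """Valid (no padding) 2-D cross-correlation of x with kernel k."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + di][j + dj] * k[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    return [[max(x[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]


def fully_connected(x: Matrix, weights: Matrix, bias: List[float]) -> List[float]:
    """Flatten x and apply one dense layer (one weight row per output)."""
    flat = [v for row in x for v in row]
    return [sum(w * v for w, v in zip(ws, flat)) + b for ws, b in zip(weights, bias)]


# Deterministic 16x16 toy "frame" and a vertical-edge kernel (both assumptions).
image = [[float((i * j) % 7) for j in range(16)] for i in range(16)]
kernel = [[1.0, 0.0, -1.0]] * 3

c1 = relu(conv2d(image, kernel))   # C1: 14x14 feature map
s2 = max_pool(c1)                  # S2: 7x7 after 2x2 pooling
c3 = relu(conv2d(s2, kernel))      # C3: 5x5
s4 = max_pool(c3)                  # S4: 2x2
c5 = fully_connected(s4, [[0.25] * 4, [-0.25] * 4], [0.0, 1.0])  # C5: 2 outputs
```

Each pooling stage shrinks the feature map, which is the dimensionality reduction the description attributes to the pooling layer; the final dense layer maps the remaining features to the output.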

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A surveillance video codec device, wherein the surveillance video codec device comprises: video acquisition device, feature extraction device, machine vision device, coding parameter setting device, video coding device, wherein:

the video acquisition device is used for acquiring video information in a monitoring range and sending the video information to the feature extraction device and the video coding device;

the feature extraction device is in communication connection with the video acquisition device, and is used for receiving video information sent by the video acquisition device and performing feature extraction processing on the video information to generate feature extraction information;

the machine vision device is in communication connection with the feature extraction device, and is used for receiving feature extraction information sent by the feature extraction device;

the coding parameter setting device is in communication connection with the feature extraction device, wherein the coding parameter setting device is used for receiving feature extraction information sent by the feature extraction device;

the video coding device is in communication connection with the video acquisition device, wherein the video coding device is used for receiving video information acquired by the video acquisition device and performing coding processing on the video information to generate a video coding stream.

2. The surveillance video codec device according to claim 1, further comprising a storage and transmission device, wherein:

the video coding device is in communication connection with the storage and transmission device, and is further used for sending the video coding stream to the storage and transmission device;

the storage and transmission device is used for receiving the video coding stream and storing and transmitting the video coding stream.

3. The surveillance video codec device according to claim 2, wherein the feature extraction device is further used for sending the feature extraction information to the machine vision device and the coding parameter setting device; and the video coding device is in communication connection with the machine vision device, and is used for sending the video coding stream to the machine vision device.

4. The surveillance video codec device according to claim 3, wherein the machine vision device is further used for receiving the video coding stream sent by the video coding device, and performing feature extraction processing on the video coding stream to generate feature extraction video coding information.

5. The surveillance video codec device according to claim 4, wherein the machine vision device is further used for performing evaluation processing on the generated feature extraction video coding information according to a preset feature parameter to generate evaluated feature extraction video coding information.

6. The surveillance video codec device according to claim 5, wherein the machine vision device is in communication connection with the coding parameter setting device, and is used for sending the evaluated feature extraction video coding information to the coding parameter setting device.

7. The surveillance video codec device according to claim 6, wherein the coding parameter setting device is further used for receiving the evaluated feature extraction video coding information sent by the machine vision device, and determining, according to preset parameter information, a parameter error between the feature extraction information and the evaluated feature extraction video coding information as a coding parameter.

8. The surveillance video codec device according to claim 7, wherein the coding parameter setting device is in communication connection with the video coding device, and is used for sending the coding parameter to the video coding device;

the video coding device is further used for receiving the coding parameter sent by the coding parameter setting device, and adjusting the preset parameters in the video coding device according to the coding parameter.

9. The surveillance video codec device according to claim 8, wherein the video coding device is further used for performing optimization processing on the video coding stream according to the adjusted preset parameters, and sending the optimized video coding stream to the storage and transmission device and the machine vision device.

10. The surveillance video codec device according to any one of claims 1 to 9, wherein the machine vision device supports a neural network comprising a convolutional layer, a pooling layer, and a fully-connected layer, wherein:

the convolutional layer is used for extracting features of the video information;

the pooling layer is used for performing dimensionality reduction processing on the extracted features;

the fully-connected layer is used for representing the relationship between the video information and the feature extraction information.