CN112351252B - Monitoring video coding and decoding device - Google Patents

Monitoring video coding and decoding device

Info

Publication number
CN112351252B
Authority
CN
China
Prior art keywords
video
information
feature extraction
coding
machine vision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011162454.0A
Other languages
Chinese (zh)
Other versions
CN112351252A (en)
Inventor
张韵东
昝劲文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd
Original Assignee
Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd filed Critical Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd
Priority to CN202011162454.0A priority Critical patent/CN112351252B/en
Publication of CN112351252A publication Critical patent/CN112351252A/en
Application granted granted Critical
Publication of CN112351252B publication Critical patent/CN112351252B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The embodiments of the present disclosure disclose a monitoring video encoding and decoding device. One embodiment of the device comprises a video acquisition device, a feature extraction device, a machine vision device, an encoding parameter setting device, and a video encoding device, wherein: the video acquisition device is used for acquiring video information within a monitoring range and transmitting the video information to the feature extraction device and the video encoding device; the feature extraction device is used for receiving the video information sent by the video acquisition device and performing feature extraction processing on the video information to generate feature extraction information; the machine vision device is used for receiving the feature extraction information sent by the feature extraction device; the encoding parameter setting device is used for receiving the feature extraction information sent by the feature extraction device; the video encoding device is used for receiving the video information acquired by the video acquisition device and encoding the video information to generate a video encoding stream. This embodiment improves the performance and efficiency of the system in processing video information.

Description

Monitoring video coding and decoding device
Technical Field
The embodiment of the disclosure relates to the technical field of information transmission, in particular to a monitoring video encoding and decoding device.
Background
With the breakthrough progress of machine vision technology in recent years, the strong performance of machines on various visual tasks has accelerated their adoption in a large number of application fields, including video surveillance. Users can now view and interpret video data through machines.
However, when users view and interpret video data through machines, the following technical problems generally arise:
First, existing standard video codec algorithms usually take human viewers as the content receivers, so their evaluation of video coding quality is inaccurate for machine consumption; when such algorithms are applied directly to a video surveillance system based on machine vision processing, the codec performance of the system degrades.
Second, conventional standard codec algorithms generally assume a human receiver of the video information by default and are not designed or optimized for machine vision, so system performance cannot reach an optimal state.
Disclosure of Invention
This summary is provided to introduce concepts in a simplified form that are further described below in the detailed description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a surveillance video codec device to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a surveillance video codec device, comprising: a video acquisition device, a feature extraction device, a machine vision device, an encoding parameter setting device, and a video encoding device, wherein: the video acquisition device is used for acquiring video information within a monitoring range and sending the video information to the feature extraction device and the video encoding device; the feature extraction device is communicatively connected to the video acquisition device, and is used for receiving the video information sent by the video acquisition device and performing feature extraction processing on the video information to generate feature extraction information; the machine vision device is communicatively connected to the feature extraction device, and is used for receiving the feature extraction information sent by the feature extraction device; the encoding parameter setting device is communicatively connected to the feature extraction device, and is used for receiving the feature extraction information sent by the feature extraction device; the video encoding device is communicatively connected to the video acquisition device, and is used for receiving the video information acquired by the video acquisition device and encoding the video information to generate a video encoding stream.
In some embodiments, the surveillance video codec device further includes a storage and transmission device, wherein: the video encoding device is communicatively connected to the storage and transmission device, and is further used for sending the video encoding stream to the storage and transmission device; the storage and transmission device is used for receiving the video encoding stream and storing and transmitting the video encoding stream.
In some embodiments, the feature extraction device is further configured to send the feature extraction information to the machine vision device and the encoding parameter setting device, where the video encoding device is communicatively connected to the machine vision device and is configured to send the video encoded stream to the machine vision device.
In some embodiments, the machine vision device is further configured to receive a video encoded stream sent by the video encoding device, and perform feature extraction processing on the video encoded stream to generate feature extracted video encoding information.
In some embodiments, the machine vision device is further configured to evaluate the generated feature extraction video coding information according to a preset feature parameter to generate evaluated feature extraction video coding information.
In some embodiments, the machine vision device is communicatively connected to the encoding parameter setting device, and is configured to send the evaluated feature extraction video encoding information to the encoding parameter setting device.
In some embodiments, the encoding parameter setting device is further configured to receive the evaluated feature extraction video encoding information sent by the machine vision device, and to determine, according to preset parameter information, the parameter error between the feature extraction information and the evaluated feature extraction video encoding information as the encoding parameter.
In some embodiments, the encoding parameter setting device is communicatively coupled to the video encoding device, and the encoding parameter setting device is configured to send the encoding parameter to the video encoding device.
In some embodiments, the video encoding device is further configured to receive the encoding parameter sent by the encoding parameter setting device, and adjust a preset parameter in the video encoding device according to the encoding parameter.
In some embodiments, the video encoding device is further configured to perform optimization processing on the video encoding stream according to the adjusted preset parameter, and send the optimized video encoding stream to the storage and transmission device and the machine vision device.
In some embodiments, the machine vision device supports a neural network comprising a convolutional layer, a pooling layer, and a fully-connected layer, wherein: the convolutional layer is used for extracting the features of the video information; the pooling layer is used for performing dimension reduction processing on the extracted features; the fully-connected layer is used for representing the relationship between the video information and the feature extraction information.
The above embodiments of the present disclosure have the following advantageous effects. First, the video information may be acquired by the video acquisition device and sent to the feature extraction device and the video encoding device. The video encoding device may then encode the received video information, which provides data support for optimizing the video information. Next, the feature extraction device may perform feature extraction processing on the video information and send the feature extraction information to the machine vision device and the encoding parameter setting device, which provides data support for evaluating errors in the video information. The video encoding device may then send the video encoding stream to the machine vision device, so that the machine vision device can perform feature extraction processing on the video encoding stream, evaluate the resulting feature extraction video encoding information against the feature extraction information, and provide a reference for the subsequent adjustment of the encoding parameters. The encoding parameter setting device then determines, according to preset parameter information, the parameter error between the received feature extraction information and the evaluated feature extraction video encoding information, and sends it to the video encoding device so that the video encoding device can adjust the parameters applied to the video encoding stream. In this way the acquired video information can be optimized and the quality of the video information output by the system is improved, which in turn improves the performance and efficiency of the system in processing video information.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is an exemplary system architecture diagram of a surveillance video codec device according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram of one embodiment of a surveillance video codec device according to some embodiments of the present disclosure;
FIG. 3 is a schematic structural diagram of yet another embodiment of a surveillance video codec device according to some embodiments of the present disclosure;
FIG. 4 is a network architecture diagram of a machine vision device, in a surveillance video codec device according to some embodiments of the present disclosure, supporting a neural network including a convolutional layer, a pooling layer, and a fully-connected layer.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 for monitoring video codec devices according to an embodiment of the present disclosure.
Referring to fig. 1, a system architecture 100 may include a video acquisition device 101, a feature extraction device 102, a machine vision device 103, an encoding parameter setting device 104, a video encoding device 105, a storage and transmission device 106, a network 107, a network 108, a network 109, a network 110, and a network 111. The network 107 serves as a medium for providing a communication link between the video acquisition device 101 and the feature extraction device 102. The network 108 serves as a medium for providing a communication link between the feature extraction device 102 and the encoding parameter setting device 104. The network 109 serves as a medium for providing a communication link between the video acquisition device 101 and the video encoding device 105. The network 110 serves as a medium for providing a communication link between the feature extraction device 102 and the machine vision device 103. The network 110 also serves as a medium for providing a communication link between the machine vision device 103 and the video encoding device 105, and between the encoding parameter setting device 104 and the video encoding device 105. The network 111 serves as a medium for providing a communication link between the video encoding device 105 and the storage and transmission device 106. The networks 107, 108, 109, 110, and 111 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The video capture device 101 may interact with the feature extraction device 102 over the network 107 to transmit video information. The video capture device 101 may be a variety of video information capture devices including, but not limited to, optical image sensors, infrared sensors, electronic radar, and the like.
The video capture device 101 may interact with the video encoding device 105 over the network 109 to transmit video information. The video encoding device 105 may include, but is not limited to, an audio video compression codec chip, an information input channel, an information output channel, a network interface, an audio video interface, a protocol interface control, a serial communication interface, embedded software, and the like.
The feature extraction device 102 may interact with the encoding parameter setting device 104 via the network 108 to send feature extraction information. The feature extraction device 102 may support a neural network including at least one of a convolutional layer, a pooling layer, and a fully-connected layer.
The feature extraction device 102 may interact with the machine vision device 103 over the network 110 to send feature extraction information. The machine vision device 103 may support a neural network that includes a convolutional layer, a pooling layer, and a fully-connected layer.
The machine vision device 103 may interact with the encoding parameter setting device 104 via the network 110 to send the evaluated feature extraction video encoding information.
The video encoding device 105 may interact with the machine vision device 103 over the network 110 to transmit the video encoded stream.
The encoding parameter setting device 104 may interact with the video encoding device 105 via the network 110 to send encoding parameters. The encoding parameter setting device 104 may support a machine learning-based evaluation method to perform error evaluation on the feature extraction information and the evaluated feature extraction video encoding information.
The video encoding device 105 may interact with the storage transmission device 106 over the network 111 to transmit the video encoded stream. The storage and transmission device 106 may be various electronic devices including: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to perform the functions as described in the first aspect.
It should be understood that the numbers of video acquisition devices, feature extraction devices, machine vision devices, encoding parameter setting devices, video encoding devices, storage transmission devices, and networks in fig. 1 are merely illustrative. There may be any number of video acquisition devices, feature extraction devices, machine vision devices, encoding parameter setting devices, video encoding devices, storage transmission devices, and networks, as desired for implementation.
With continued reference to fig. 2, a schematic structural diagram of one embodiment of a surveillance video codec device provided by the present disclosure is shown. As shown in fig. 2, the surveillance video codec device of the present embodiment may include: a video acquisition device 1, a feature extraction device 2, a machine vision device 3, a coding parameter setting device 4 and a video coding device 5.
In some embodiments, the video capture device 1 may be a variety of video information capture devices including, but not limited to, optical image sensors, infrared sensors, electronic radar, and the like. Here, the video capturing apparatus 1 may be used to capture video information within a monitoring range and transmit the video information to the feature extraction apparatus 2 and the video encoding apparatus 5. Here, the video information may be a video signal.
In some embodiments, the feature extraction device 2 may receive the video information sent by the video acquisition device 1, and perform feature extraction processing on the video information to generate feature extraction information. The feature extraction device 2 is also used to send the feature extraction information to the machine vision device 3 and the encoding parameter setting device 4. Here, the feature extraction information includes, but is not limited to, spatial information, temporal information, color information, character information, vehicle information, motion information, and the like. Here, the feature extraction device 2 may support an extraction method based on machine learning.
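As a rough illustration of how such feature extraction information might be carried between the devices, the Python sketch below groups the feature categories named above into a single record. This is not part of the patent text; the class name, field names, and the choice of a flat numeric vector per category are assumptions made only for illustration.

from dataclasses import dataclass, field
from typing import List

@dataclass
class FeatureExtractionInfo:
    """Hypothetical container for the feature extraction information named in the
    description; field names and vector representation are illustrative assumptions."""
    spatial: List[float] = field(default_factory=list)   # spatial information
    temporal: List[float] = field(default_factory=list)  # temporal information
    color: List[float] = field(default_factory=list)     # color information
    persons: List[float] = field(default_factory=list)   # character (person) information
    vehicles: List[float] = field(default_factory=list)  # vehicle information
    motion: List[float] = field(default_factory=list)    # motion information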
In some embodiments, the machine vision device 3 may receive the video encoding stream sent by the video encoding device 5. Here, a video encoding stream may refer to an encoded video signal. The machine vision device 3 is further configured to perform a feature extraction operation on the video encoding stream and to send the extracted feature information as an input signal to the encoding parameter setting device 4. Here, the feature information includes, but is not limited to, spatial information, temporal information, color information, character information, vehicle information, motion information, and the like of the video. Here, the machine vision device 3 may perform the feature extraction operation on the video encoding stream in a manner including, but not limited to, machine learning.
In some embodiments, the encoding parameter setting device 4 may receive the feature extraction information sent by the feature extraction device 2 and the evaluated feature extraction video encoding information sent by the machine vision device 3. Optionally, the encoding parameter setting device 4 may also determine, according to preset parameter information, the parameter error between the feature extraction information and the evaluated feature extraction video encoding information as the encoding parameter. Here, the preset parameter information may include, but is not limited to, a norm value between feature information, a component-weighted distance measure between multidimensional features, and the like. Here, the parameter information may be acquired by means of machine learning, and the generation of the encoding parameters may also be achieved by, but is not limited to, machine learning. The encoding parameter setting device 4 is also configured to send the encoding parameters to the video encoding device 5.
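The description names only a norm value between feature information and a component-weighted distance measure between multidimensional features, without fixing a formula. The Python sketch below shows one plausible reading of that error computation; the function name, the use of an L2 norm, and the equal mixing of the two measures are assumptions, not something stated in the patent.

import numpy as np

def parameter_error(source_features, coded_features, weights=None):
    """Error between the feature extraction information of the source video and the
    evaluated feature extraction video encoding information of the coded stream
    (hypothetical helper; the exact measure is not fixed by the patent)."""
    src = np.asarray(source_features, dtype=float)
    cod = np.asarray(coded_features, dtype=float)
    if weights is None:
        weights = np.ones_like(src)          # default: all feature components weighted equally
    diff = src - cod
    norm_error = np.linalg.norm(diff)        # "norm value between feature information"
    weighted_error = np.sqrt(np.sum(np.asarray(weights, dtype=float) * diff ** 2))  # component-weighted distance
    # Equal mixing of the two measures is an assumption made only for this sketch.
    return 0.5 * norm_error + 0.5 * weighted_error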
In some embodiments, the video encoding device 5 may receive the video information sent by the video acquisition device 1 and the encoding parameters sent by the encoding parameter setting device 4. Optionally, the video encoding device 5 may encode the video information sent by the video acquisition device 1 according to the encoding parameters sent by the encoding parameter setting device 4. Optionally, the video encoding device 5 may also send the video encoding stream to the machine vision device 3 for evaluation. By evaluating the video encoding stream, the machine vision device 3 enables the encoding parameter setting device 4 to further adjust the parameters and thereby improve the performance of the system.
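Taken together, devices 1-5 form a closed loop: encode the video, extract features from the coded stream, compare them with the features of the source video, and feed the error back into the encoder parameters. The sketch below is a minimal outline of that loop; every function argument is a hypothetical stand-in for one of the devices in fig. 2, and parameter_error refers to the illustrative helper above.

def codec_feedback_loop(frame, extract_features, encode, extract_from_stream,
                        update_params, params, iterations=3):
    """Minimal sketch of the feedback loop of fig. 2 (hypothetical interfaces).

    frame               -- video information from the video acquisition device 1
    extract_features    -- stand-in for the feature extraction device 2
    encode              -- stand-in for the video encoding device 5
    extract_from_stream -- stand-in for the machine vision device 3
    update_params       -- stand-in for the parameter adjustment driven by device 4
    """
    source_features = extract_features(frame)                     # device 2: features of the raw video
    stream = None
    for _ in range(iterations):
        stream = encode(frame, params)                             # device 5: encode with current parameters
        coded_features = extract_from_stream(stream)               # device 3: features of the coded stream
        error = parameter_error(source_features, coded_features)   # device 4: see sketch above
        params = update_params(params, error)                      # adjust the preset encoding parameters
    return stream, params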
The monitoring video encoding and decoding device in this embodiment has the following beneficial effects. First, the video information may be acquired by the video acquisition device and sent to the feature extraction device and the video encoding device. The video encoding device may then encode the received video information, which provides data support for optimizing the video information. Next, the feature extraction device may perform feature extraction processing on the video information and send the feature extraction information to the machine vision device and the encoding parameter setting device, which provides data support for evaluating errors in the video information. The video encoding device may then send the video encoding stream to the machine vision device, so that the machine vision device can perform feature extraction processing on the video encoding stream, evaluate the resulting feature extraction video encoding information against the feature extraction information, and provide a reference for the subsequent adjustment of the encoding parameters. The encoding parameter setting device then determines, according to preset parameter information, the parameter error between the received feature extraction information and the evaluated feature extraction video encoding information, and sends it to the video encoding device so that the video encoding device can adjust the parameters applied to the video encoding stream. In this way the acquired video information can be optimized and the quality of the video information output by the system is improved, which in turn improves the performance and efficiency of the system in processing video information.
With continued reference to fig. 3, a schematic structural diagram of yet another embodiment of a surveillance video codec device provided by the present disclosure is shown. As with the surveillance video codec device in the embodiment of fig. 2, the surveillance video codec device in this embodiment may also include a video acquisition device 1, a feature extraction device 2, a machine vision device 3, an encoding parameter setting device 4, and a video encoding device 5. For the specific structural relationships, reference may be made to the embodiment of fig. 2, and details are not repeated here.
Unlike the surveillance video codec device in the embodiment of fig. 2, the surveillance video codec device in this embodiment further includes a storage and transmission device 6, wherein: the video encoding device is communicatively connected to the storage and transmission device and is further used for sending the video encoding stream to the storage and transmission device; the storage and transmission device is used for receiving the video encoding stream and storing and transmitting the video encoding stream.
Unlike the surveillance video codec device in the embodiment of fig. 2, the video encoding device 5 in this embodiment is further configured to receive the encoding parameters sent by the encoding parameter setting device 4, and to adjust the preset parameters in the video encoding device according to the encoding parameters.
Unlike the surveillance video codec device in the embodiment of fig. 2, the video encoding device 5 in the present embodiment is further configured to perform optimization processing on the video encoded stream according to the adjusted preset parameter, and send the optimized video encoded stream to the storage transmission device 6 and the machine vision device 3.
Unlike the surveillance video codec device in the embodiment of fig. 2, the machine vision device 3 in this embodiment supports a neural network including a convolutional layer, a pooling layer, and a fully-connected layer, wherein: the convolutional layer is used for extracting the features of the video information; the pooling layer is used for performing dimension reduction processing on the extracted features; the fully-connected layer is used for representing the relationship between the video information and the feature extraction information.
The above embodiments of the present disclosure have the following advantageous effects. First, the video information may be acquired by the video acquisition device and sent to the feature extraction device and the video encoding device. The video encoding device may then encode the received video information, which provides data support for optimizing the video information. Next, the feature extraction device may perform feature extraction processing on the video information and send the feature extraction information to the machine vision device and the encoding parameter setting device, which provides data support for evaluating errors in the video information. The video encoding device may then send the video encoding stream to the machine vision device, so that the machine vision device can perform feature extraction processing on the video encoding stream, evaluate the resulting feature extraction video encoding information against the feature extraction information, and provide a reference for the subsequent adjustment of the encoding parameters. The encoding parameter setting device then determines, according to preset parameter information, the parameter error between the received feature extraction information and the evaluated feature extraction video encoding information, and sends it to the video encoding device so that the video encoding device can adjust the parameters applied to the video encoding stream. In this way the acquired video information can be optimized, the quality of the video information output by the system is improved, and the performance and efficiency of the system in processing video information are improved. The machine vision device 3 and the encoding parameter setting device 4 are the key points of application of the present disclosure, and they address the second technical problem mentioned in the background: conventional standard codec algorithms assume a human receiver of the video information by default and are not designed or optimized for machine vision, so system performance cannot reach an optimal state. The factor that prevents such design and optimization is, in general, that the user cannot design and optimize the processing of the video information for the machine; if this factor is removed, the codec can be designed and optimized for machine vision and the system performance can be improved. To achieve this effect, the present disclosure introduces the machine vision device 3 and the encoding parameter setting device 4. The machine vision device 3 is introduced to evaluate the video encoding stream by means of machine learning, performing a preliminary feature extraction on the video information processed by the video encoding device; the feature information of the input video and of the encoded video can thus be compared and evaluated, and the result of this preliminary evaluation is supplied to the encoding parameter setting device 4 so that it can adjust the encoding parameters. The encoding parameter setting device 4 is introduced to evaluate the feature information from the feature extraction device and the machine vision device, calculate the resulting error by means of machine learning, and generate the encoding parameters, which are finally sent to the video encoding device for encoding the video information.
By introducing a machine learning-based intelligent information processing method into the machine vision device 3 and the encoding parameter setting device 4 to evaluate and encode the video information, the problem that the video information cannot be designed and optimized for machine vision is solved, and the quality of the video information output by the system is improved.
With continued reference to fig. 4, a schematic diagram of a network structure of a machine vision device supporting a neural network including a convolutional layer, a pooling layer, and a fully-connected layer in a surveillance video codec device provided by the present disclosure is shown.
As shown in fig. 4, the machine vision device supports a neural network comprising a convolutional layer, a pooling layer, and a fully-connected layer, wherein: the convolutional layer is used for extracting the features of the video information; the pooling layer is used for performing dimension reduction processing on the extracted features; the fully-connected layer is used for representing the relationship between the video information and the feature extraction information. As an example, a convolution operation may be performed on the video information to generate a feature map C1 of the first layer. A pooling operation is then performed on the feature map C1 to generate a feature map S2 of the second layer. Repeating the convolution and pooling operations may generate further feature maps (e.g., feature map C3 and feature map S4). Performing a fully-connected operation on the final feature map may generate a recognition result C5.
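As a concrete illustration of this C1-C5 structure, the Python sketch below assembles two convolution/pooling stages and a fully-connected layer in PyTorch. The class name, channel counts, kernel sizes, the assumed 64x64 input resolution, and the output dimension are illustrative assumptions; the patent fixes only the layer types and their roles.

import torch
import torch.nn as nn

class MachineVisionNet(nn.Module):
    """Illustrative network matching fig. 4: two convolution/pooling stages
    (feature maps C1, S2, C3, S4) followed by a fully-connected layer (C5)."""
    def __init__(self, num_outputs=10):
        super().__init__()
        self.c1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # convolution -> feature map C1
        self.s2 = nn.MaxPool2d(2)                               # pooling -> feature map S2
        self.c3 = nn.Conv2d(16, 32, kernel_size=3, padding=1)   # convolution -> feature map C3
        self.s4 = nn.MaxPool2d(2)                               # pooling -> feature map S4
        self.flatten = nn.Flatten()
        self.c5 = nn.Linear(32 * 16 * 16, num_outputs)          # fully-connected -> recognition result C5

    def forward(self, x):                                       # x: (N, 3, 64, 64) batch of video frames
        x = torch.relu(self.c1(x))
        x = self.s2(x)
        x = torch.relu(self.c3(x))
        x = self.s4(x)
        return self.c5(self.flatten(x))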
The above description is only of the preferred embodiments of the present application and an illustration of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (1)

1. A surveillance video codec apparatus, wherein the surveillance video codec apparatus comprises: video acquisition device, feature extraction device, machine vision device, coding parameter setting device, video encoding device, wherein:
the video acquisition device is used for acquiring video information in a monitoring range and sending the video information to the feature extraction device and the video coding device;
the feature extraction device is in communication connection with the video acquisition device, and is used for receiving video information sent by the video acquisition device and carrying out feature extraction processing on the video information to generate feature extraction information;
the machine vision device is in communication connection with the feature extraction device, wherein the machine vision device is used for receiving feature extraction information sent by the feature extraction device;
the coding parameter setting device is in communication connection with the feature extraction device, wherein the coding parameter setting device is used for receiving feature extraction information sent by the feature extraction device;
the video coding device is in communication connection with the video acquisition device, and is used for receiving video information acquired by the video acquisition device and performing coding processing on the video information to generate a video coding stream;
the monitoring video encoding and decoding device further comprises a storage transmission device, wherein:
the video coding device is in communication connection with the storage transmission device, and is further used for sending the video coding stream to the storage transmission device;
the storage and transmission device is used for receiving the video coding stream and storing and transmitting the video coding stream;
the feature extraction device is further configured to send the feature extraction information to the machine vision device and the encoding parameter setting device, where the video encoding device is communicatively connected to the machine vision device, and is configured to send the video encoding stream to the machine vision device;
the machine vision device is also used for receiving the video coding stream sent by the video coding device and carrying out feature extraction processing on the video coding stream to generate feature extraction video coding information;
the machine vision device is further used for evaluating the generated feature extraction video coding information according to preset feature parameters so as to generate evaluated feature extraction video coding information;
the machine vision device is in communication connection with the coding parameter setting device and is used for sending the evaluated feature extraction video coding information to the coding parameter setting device;
the coding parameter setting device is further used for receiving the evaluated feature extraction video coding information sent by the machine vision device and determining, according to preset parameter information, the parameter error between the feature extraction information and the evaluated feature extraction video coding information as the coding parameter, wherein the preset parameter information comprises a norm value between feature information and a component-weighted distance measure between multidimensional features;
the coding parameter setting device is in communication connection with the video coding device and is used for sending the coding parameters to the video coding device;
the video coding device is also used for receiving the coding parameters sent by the coding parameter setting device and adjusting preset parameters in the video coding device according to the coding parameters;
the video coding device is also used for carrying out optimization processing on the video coding stream according to the adjusted preset parameters and sending the video coding stream after the optimization processing to the storage transmission device and the machine vision device;
the machine vision device supports a neural network comprising a convolutional layer, a pooling layer and a fully-connected layer, wherein: the convolutional layer is used for extracting the features of the video information; the pooling layer is used for performing dimension reduction processing on the extracted features; the fully-connected layer is used for representing the relation between the video information and the feature extraction information.
CN202011162454.0A 2020-10-27 2020-10-27 Monitoring video coding and decoding device Active CN112351252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011162454.0A CN112351252B (en) 2020-10-27 2020-10-27 Monitoring video coding and decoding device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011162454.0A CN112351252B (en) 2020-10-27 2020-10-27 Monitoring video coding and decoding device

Publications (2)

Publication Number Publication Date
CN112351252A CN112351252A (en) 2021-02-09
CN112351252B true CN112351252B (en) 2023-10-20

Family

ID=74358678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011162454.0A Active CN112351252B (en) 2020-10-27 2020-10-27 Monitoring video coding and decoding device

Country Status (1)

Country Link
CN (1) CN112351252B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004104930A2 (en) * 2003-05-20 2004-12-02 Amt Advanced Multimedia Technology Ab Hybrid video compression method
CN1841429A (en) * 2005-03-31 2006-10-04 索尼公司 Image-comparing apparatus, image-comparing method, image-retrieving apparatus and image-retrieving method
JP2009017191A (en) * 2007-07-04 2009-01-22 Casio Comput Co Ltd Image compression apparatus and program
CN102893605A (en) * 2010-05-12 2013-01-23 日本电信电话株式会社 Code amount control method and apparatus
CN103369349A (en) * 2012-03-28 2013-10-23 中国移动通信集团公司 Digital video quality control method and device thereof
CN104767999A (en) * 2015-04-22 2015-07-08 福州大学 HEVC rate control model parameter updating algorithm based on distortion measurement
CN107948649A (en) * 2016-10-12 2018-04-20 北京金山云网络技术有限公司 A kind of method for video coding and device based on subjective quality model
JP2019080315A (en) * 2018-11-02 2019-05-23 サクサ株式会社 Image compression apparatus and program
JP2019080247A (en) * 2017-10-26 2019-05-23 サクサ株式会社 Image compression apparatus and program
JP2019080314A (en) * 2018-11-02 2019-05-23 サクサ株式会社 Image compression apparatus and program
JP2019083491A (en) * 2017-10-31 2019-05-30 日本電信電話株式会社 Code amount estimation device and code amount estimation program
CN111050175A (en) * 2018-10-15 2020-04-21 华为技术有限公司 Method and apparatus for video encoding
CN111163318A (en) * 2020-01-09 2020-05-15 北京大学 Human-machine vision coding method and device based on feedback optimization
CN111277826A (en) * 2020-01-22 2020-06-12 腾讯科技(深圳)有限公司 Video data processing method and device and storage medium
JP2020154514A (en) * 2019-03-19 2020-09-24 株式会社エヌ・ティ・ティ・データ Learning device, learning method, retrieval device, retrieval method and program

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10419785B2 (en) * 2017-07-21 2019-09-17 Cisco Technology, Inc. Distributed learning video encoder/decoder
US10979718B2 (en) * 2017-09-01 2021-04-13 Apple Inc. Machine learning video processing systems and methods

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004104930A2 (en) * 2003-05-20 2004-12-02 Amt Advanced Multimedia Technology Ab Hybrid video compression method
CN1841429A (en) * 2005-03-31 2006-10-04 索尼公司 Image-comparing apparatus, image-comparing method, image-retrieving apparatus and image-retrieving method
JP2009017191A (en) * 2007-07-04 2009-01-22 Casio Comput Co Ltd Image compression apparatus and program
CN102893605A (en) * 2010-05-12 2013-01-23 日本电信电话株式会社 Code amount control method and apparatus
CN103369349A (en) * 2012-03-28 2013-10-23 中国移动通信集团公司 Digital video quality control method and device thereof
CN104767999A (en) * 2015-04-22 2015-07-08 福州大学 HEVC rate control model parameter updating algorithm based on distortion measurement
CN107948649A (en) * 2016-10-12 2018-04-20 北京金山云网络技术有限公司 A kind of method for video coding and device based on subjective quality model
JP2019080247A (en) * 2017-10-26 2019-05-23 サクサ株式会社 Image compression apparatus and program
JP2019083491A (en) * 2017-10-31 2019-05-30 日本電信電話株式会社 Code amount estimation device and code amount estimation program
CN111050175A (en) * 2018-10-15 2020-04-21 华为技术有限公司 Method and apparatus for video encoding
JP2019080315A (en) * 2018-11-02 2019-05-23 サクサ株式会社 Image compression apparatus and program
JP2019080314A (en) * 2018-11-02 2019-05-23 サクサ株式会社 Image compression apparatus and program
JP2020154514A (en) * 2019-03-19 2020-09-24 株式会社エヌ・ティ・ティ・データ Learning device, learning method, retrieval device, retrieval method and program
CN111163318A (en) * 2020-01-09 2020-05-15 北京大学 Human-machine vision coding method and device based on feedback optimization
CN111277826A (en) * 2020-01-22 2020-06-12 腾讯科技(深圳)有限公司 Video data processing method and device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Machine Learning-Based Fast Angular Prediction Mode Decision Technique in Video Coding; S. Ryu and J. Kang; IEEE Transactions on Image Processing; Vol. 27, No. 11; full text *

Also Published As

Publication number Publication date
CN112351252A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN110324621B (en) Video encoding method, video encoding device, electronic equipment and storage medium
CN106209322B (en) Method and device for hybrid data transmission in video signal
CN105472477A (en) Data transmission method, device and equipment
WO2008116400A1 (en) A terminal, method and system for realizing video communication
CN112565777B (en) Deep learning model-based video data transmission method, system, medium and device
US20100131445A1 (en) Method of data transmission with differential data fusion
CN101141610A (en) Apparatus and method for video mixing and computer readable medium
CN114143700A (en) Audio processing method, device, equipment, medium and program product
CN112351252B (en) Monitoring video coding and decoding device
CN116996675A (en) Instant messaging system and information processing method
CN111476866A (en) Video optimization and playing method and system, electronic equipment and storage medium
CN114827617B (en) Video coding and decoding method and system based on perception model
CN114466224B (en) Video data encoding and decoding method and device, storage medium and electronic equipment
CN116366852A (en) Video coding and decoding method, device, equipment and medium for machine vision task
CN111148086B (en) Bluetooth pairing method and device, storage medium and electronic equipment
CN114374841A (en) Optimization method and device for video coding rate control and electronic equipment
CN110275455B (en) Control method based on electroencephalogram signals, central control equipment, cloud server and system
CN112037741A (en) Karaoke method, device, equipment and storage medium
CN112887293A (en) Streaming media processing method and device and electronic equipment
CN212936044U (en) Conference display terminal based on real-time monitoring of Internet of things
CN115842597B (en) Black broadcast detection method and system based on multi-source data fusion
CN117319815B (en) Video stream identification method and device based on image sensor, equipment and medium
CN113542746B (en) Video encoding method and device, computer readable medium and electronic equipment
CN115550690B (en) Frame rate adjusting method, device, equipment and storage medium
CN115225961B (en) No-reference network video quality evaluation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant