CN114445735A - Vehicle-end multi-channel video stream reasoning analysis method and system - Google Patents

Vehicle-end multi-channel video stream reasoning analysis method and system

Info

Publication number
CN114445735A
Authority
CN
China
Prior art keywords
target detection
vehicle
detection model
framework
agx
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111622296.7A
Other languages
Chinese (zh)
Inventor
肖朝穗
常思垚
李汉玢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heading Data Intelligence Co Ltd
Original Assignee
Heading Data Intelligence Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heading Data Intelligence Co Ltd filed Critical Heading Data Intelligence Co Ltd
Priority to CN202111622296.7A priority Critical patent/CN114445735A/en
Publication of CN114445735A publication Critical patent/CN114445735A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a vehicle-end multi-channel video stream inference analysis method and system, wherein the method comprises the following steps: training the constructed target detection network based on image data perceived by the vehicle end to generate a target detection model, embedding the target detection model into the DeepStream framework, and deploying it into the AGX device; and inputting the vehicle-end multi-channel video streams into the DeepStream framework embedded with the target detection model, and acquiring the detected target results in the output multi-channel video streams. By building a target detection model on the DeepStream framework, the method can perform real-time inference analysis on the video streams captured by multiple vehicle-end cameras, reduce the front-end and back-end processing development time for the development team, and shorten the overall time from capturing image data to obtaining and feeding back information.

Description

Vehicle-end multi-channel video stream reasoning analysis method and system
Technical Field
The invention relates to the field of data reasoning and image recognition, in particular to a vehicle-end multi-channel video stream reasoning analysis method and system.
Background
In the field of image recognition, the inference analysis processing time of multiple video streams is an important factor in vehicle-end perception processing, and the overall end-to-end processing time is critical. At present, both ends of a typical visual AI application (preprocessing and post-processing) as well as model optimization are implemented and deployed by the development team itself, so the team spends a great deal of time on program deployment, and the processing time from input to output cannot be optimal.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a vehicle-end multi-channel video stream inference analysis method and system, which are intended to reduce the development time of the development team and the time required to perform real-time inference analysis on multiple video streams simultaneously.
According to a first aspect of the present invention, a vehicle-end multi-channel video stream inference analysis method is provided, including: training the constructed target detection network based on the image data perceived by the vehicle end to generate a target detection model; embedding the target detection model into the DeepStream framework and deploying it into the AGX device; and inputting the vehicle-end multi-channel video streams into the DeepStream framework embedded with the target detection model, and acquiring the detected target results in the output multi-channel video streams.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the training of the constructed target detection network based on the image data perceived by the vehicle end to generate the target detection model includes: acquiring images of different types of lanes at the vehicle end to obtain corresponding image data, and labeling the target information in each piece of image data to form a training data set; and training the constructed target detection network based on the training data set to generate the target detection model, wherein the target detection network is a lightweight yolov5s network.
Optionally, each piece of image data in the training data set has a size of 1280 × 1280 px.
Optionally, the embedding of the target detection model into the DeepStream framework and deploying it into the AGX device includes: after training of the constructed target detection network is completed, acquiring the model file in the format produced by training; converting the model file into the format required by the DeepStream framework on the AGX device; and deploying the converted model file to the DeepStream framework of the AGX device, so that the target detection model is embedded into the DeepStream framework.
Optionally, the model file produced by training is an FP32 best.pt model file, and the converting of the model file into the format required by the DeepStream framework on the AGX device includes: converting the FP32 best.pt model file into an FP16 best.engine model file by using the TensorRT model conversion plug-in provided with the DeepStream framework.
Optionally, a plurality of threads are deployed on the AGX device; correspondingly, the inputting of the vehicle-end multi-channel video streams into the DeepStream framework embedded with the target detection model and acquiring the detected target results in the output multi-channel video streams includes: processing the input multi-channel video streams in parallel on the AGX device based on the plurality of threads to obtain the target detection result in each video stream; and integrating and outputting the target detection results of the multiple video streams.
According to a second aspect of the present invention, there is provided a vehicle-end multi-channel video stream inference analysis system, including: a training module, configured to train the constructed target detection network based on the image data perceived by the vehicle end to generate a target detection model; a deployment module, configured to embed the target detection model into the DeepStream framework and deploy it into the AGX device; and an acquisition module, configured to input the vehicle-end multi-channel video streams into the DeepStream framework embedded with the target detection model and acquire the detected target results in the output multi-channel video streams.
Optionally, the deployment module, configured to embed the target detection model into the DeepStream framework and deploy it into the AGX device, includes: after training of the constructed target detection network is completed, acquiring the model file in the format produced by training; converting the model file into the format required by the DeepStream framework on the AGX device; and deploying the converted model file to the DeepStream framework of the AGX device, so that the target detection model is embedded into the DeepStream framework.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory and a processor, the processor being configured to implement the steps of the vehicle-end multi-channel video stream inference analysis method when executing a computer program stored in the memory.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the vehicle-end multi-channel video stream inference analysis method.
The invention provides a vehicle-end multi-channel video stream inference analysis method and system, in which the constructed target detection network is trained to generate a target detection model, and the target detection model is embedded into the DeepStream framework and deployed into the AGX device; the vehicle-end multi-channel video streams are then input into the DeepStream framework embedded with the target detection model, and the detected target results in the output multi-channel video streams are acquired. By building a target detection model on the DeepStream framework, real-time inference analysis can be performed on the video streams captured by multiple vehicle-end cameras, the front-end and back-end processing development time can be reduced for the development team, and the overall time from capturing image data to obtaining and feeding back information is shortened.
Drawings
FIG. 1 is a flow chart of the vehicle-end multi-channel video stream inference analysis method provided by the invention;
FIG. 2 is a schematic diagram of the flow of the vehicle-end multi-channel video stream inference analysis method;
FIG. 3 is a schematic structural diagram of the vehicle-end multi-channel video stream inference analysis system provided by the present invention;
FIG. 4 is a schematic diagram of a hardware structure of a possible electronic device provided in the present invention;
fig. 5 is a schematic diagram of a hardware structure of a possible computer-readable storage medium according to the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
At present, both ends of a typical visual AI application (preprocessing and post-processing) as well as model optimization are implemented and deployed by the development team itself; the team spends a great deal of time on program deployment, and the processing time from input to output cannot be optimal. The DeepStream framework can greatly accelerate the pre-processing, post-processing and intermediate model inference analysis across the whole pipeline, so real-time inference analysis of vehicle-end multi-channel video streams based on the DeepStream framework is a highly suitable approach for deploying vehicle-end perception processing.
Example one
As shown in FIG. 1, the vehicle-end multi-channel video stream inference analysis method mainly comprises the following steps:
S1, training the constructed target detection network based on the image data perceived by the vehicle end to generate a target detection model.
As an embodiment, the training of the constructed target detection network based on the image data perceived by the vehicle end to generate the target detection model includes: acquiring images of different types of lanes at a vehicle end to obtain corresponding image data, and marking target information in each image data to form a training data set; training the constructed target detection network based on the training data set to generate a target detection model, wherein the target detection network is a lightweight yolov5s network.
Specifically, the vehicle-end camera acquires images of different types of lanes to obtain a plurality of pieces of image data, and the target information in each piece of image data is labeled, including the class and position of each target in the image, for example the class and position of each vehicle in the image data. The image data labeled with target information is then used as the training data set to train the constructed target detection network and generate the target detection model.
Since deployment of the target detection model at the vehicle end requires a lightweight network, the vehicle-end target detection model is trained with yolov5s, the smallest model in the YOLOv5 family. The training data used in the present invention are 1280 x 1280px images together with their labeled XML annotation files.
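As a concrete illustration of this data preparation step, the following is a minimal sketch (in Python, not part of the original disclosure) that converts one Pascal-VOC-style XML annotation file into the normalized text label format used by YOLOv5, and notes the corresponding yolov5s training command; the class names, file paths and dataset YAML name are assumptions for illustration only.

    import xml.etree.ElementTree as ET
    from pathlib import Path

    CLASSES = ["car", "truck", "pedestrian"]  # hypothetical label set

    def voc_xml_to_yolo_txt(xml_path: Path, out_dir: Path) -> None:
        """Convert one labeled XML file into a YOLO-format .txt label file."""
        root = ET.parse(xml_path).getroot()
        w = float(root.find("size/width").text)   # expected to be 1280
        h = float(root.find("size/height").text)  # expected to be 1280
        lines = []
        for obj in root.iter("object"):
            cls = CLASSES.index(obj.find("name").text)
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (float(box.find(t).text)
                                      for t in ("xmin", "ymin", "xmax", "ymax"))
            # YOLO label format: class x_center y_center width height, normalized to [0, 1]
            lines.append(f"{cls} {(xmin + xmax) / 2 / w:.6f} {(ymin + ymax) / 2 / h:.6f} "
                         f"{(xmax - xmin) / w:.6f} {(ymax - ymin) / h:.6f}")
        (out_dir / (xml_path.stem + ".txt")).write_text("\n".join(lines))

    # Training would then use the YOLOv5 repository, e.g. (hypothetical dataset YAML):
    #   python train.py --img 1280 --data vehicle_end.yaml --weights yolov5s.pt --epochs 300

The 1280px training resolution in the command matches the labeled image size described above.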
S2, embedding the target detection model into the DeepStream framework and deploying it into the AGX device.
As an embodiment, the embedding of the target detection model into the DeepStream framework and deploying it into the AGX device includes: after training of the constructed target detection network is completed, acquiring the model file in the format produced by training; converting the model file into the format required by the DeepStream framework on the AGX device; and deploying the converted model file to the DeepStream framework of the AGX device, so that the target detection model is embedded into the DeepStream framework.
The converting of the model file into the format required by the DeepStream framework on the AGX device includes: converting the FP32 best.pt model file into an FP16 best.engine model file by using the TensorRT model conversion plug-in provided with the DeepStream framework.
Specifically, the DeepStream framework is provided with a TensorRT model conversion plug-in, but converting the model while the DeepStream framework is running consumes a large amount of time, and the device places strict constraints on the parameter size of the network model, so the resource consumption must be kept to a minimum. In the invention, FP16 already meets the real-time requirement of vehicle-end target detection; INT8 (integer) gives a smaller model, a smaller memory footprint and faster inference than FP16, but sacrifices a certain amount of precision, so the FP16 quantization mode is finally adopted.
On the AGX device, the FP32 best.pt model file is converted into the FP16 best.engine model file based on the TensorRT model conversion plug-in.
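The conversion can also be reproduced outside DeepStream. The following minimal sketch (an illustration under assumptions, not the patented procedure itself) uses the TensorRT 8.x Python API to build an FP16 engine from an ONNX export of best.pt; it presumes best.pt has first been exported to ONNX, for example with the YOLOv5 repository's export script, and the file names are hypothetical.

    import tensorrt as trt  # TensorRT Python bindings, assumed installed on the AGX device

    TRT_LOGGER = trt.Logger(trt.Logger.INFO)

    def build_fp16_engine(onnx_path: str, engine_path: str) -> None:
        """Parse an ONNX model and serialize an FP16 TensorRT engine to disk."""
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(
            1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        parser = trt.OnnxParser(network, TRT_LOGGER)
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError(str(parser.get_error(0)))

        config = builder.create_builder_config()
        config.set_flag(trt.BuilderFlag.FP16)   # half-precision build, as described above
        config.max_workspace_size = 1 << 30     # 1 GiB scratch space (pre-TensorRT-8.5 API)

        serialized = builder.build_serialized_network(network, config)
        with open(engine_path, "wb") as f:
            f.write(serialized)

    # e.g. build_fp16_engine("best.onnx", "best_fp16.engine")

In practice, DeepStream's nvinfer element can also perform this build automatically on first run when pointed at the model file, caching the resulting .engine file for subsequent runs.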
S3, inputting the vehicle-end multi-channel video streams into the DeepStream framework embedded with the target detection model, and acquiring the detected target results in the output multi-channel video streams.
As an embodiment, the AGX device is deployed with a plurality of threads, and correspondingly, the method for inputting the multiple video streams at the vehicle end into the DeepStream framework embedded with the target detection model and obtaining the detection target result in the multiple output video streams includes: the method comprises the steps that multiple paths of input video streams are processed in parallel on the basis of multiple threads on AGX equipment, and a target detection result in each path of video stream is obtained; and integrating and outputting the target detection results of the multiple paths of video streams.
Specifically, in step S2 the trained target detection model is embedded into the DeepStream framework and deployed on the AGX device; in this step, the multi-channel video streams captured at the vehicle end are input into the DeepStream framework embedded with the target detection model, so as to acquire the detected target results in the output multi-channel video streams.
The whole vehicle-end perception process adopts the DeepStream framework and a yolov5-based neural network target detection model. For the video streams captured by multiple cameras, multiple threads execute in parallel: the video streams are passed to the GPU hardware (HW) decoder, the inference analysis tasks are processed in batches by the TensorRT inference engine, and the inference results are then packaged, output and fed back. This shortens the development time of the preprocessing before inference analysis for developers, is more efficient than traditional end-to-end processing (1-3 ms faster than the traditional end-to-end processing time), and performs well in real time.
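To make this pipeline concrete, the following minimal sketch (an illustration under assumptions, not the exact deployment) builds a four-camera DeepStream pipeline with the Python GStreamer bindings: uridecodebin performs GPU hardware decoding, nvstreammux batches the frames, and nvinfer runs the TensorRT engine described above. The RTSP addresses and the nvinfer configuration file name are hypothetical.

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst, GLib  # DeepStream GStreamer plug-ins assumed installed

    # Hypothetical vehicle-end camera sources.
    CAMERA_URIS = [
        "rtsp://camera0/stream", "rtsp://camera1/stream",
        "rtsp://camera2/stream", "rtsp://camera3/stream",
    ]

    def build_pipeline() -> Gst.Element:
        """Assemble a batched multi-stream detection pipeline."""
        Gst.init(None)
        sources = " ".join(
            f"uridecodebin uri={uri} ! mux.sink_{i}"      # decoded on the GPU HW decoder
            for i, uri in enumerate(CAMERA_URIS)
        )
        desc = (
            "nvstreammux name=mux batch-size=4 width=1280 height=1280 "
            "batched-push-timeout=40000 ! "
            "nvinfer config-file-path=config_infer_yolov5s.txt batch-size=4 ! "  # TensorRT inference
            "nvmultistreamtiler rows=2 columns=2 ! nvdsosd ! fakesink "
            + sources
        )
        return Gst.parse_launch(desc)

    pipeline = build_pipeline()
    pipeline.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()  # detection metadata would normally be read from a pad probe

In a real deployment, the detection results for each stream would be extracted from the batch metadata (for example in a probe on the tiler's sink pad), then integrated and fed back, as described above.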
Through the above steps, development time can be saved for developers, and the efficiency of real-time vehicle-end target detection on the AGX device can be improved.
Example two
A vehicle-end multi-channel video stream inference analysis method, referring to FIG. 2. The inference analysis method comprises: first, collecting and labeling vehicle-end target image data; then, feeding the prepared data into the yolov5s network model for training, converting the trained model file into the format required by DeepStream on the vehicle-end AGX device, and deploying it on the device; next, decoding and reading the image data captured by the multiple vehicle-end cameras through the DeepStream framework and performing inference analysis; and finally, integrating the inferred information and feeding it back in real time. Through these steps, the front-end and back-end processing development time can be reduced for the development team, the overall time from capturing image data to obtaining and feeding back information is shortened, and developers can focus more on developing the core of the vehicle-end perception application.
EXAMPLE III
Referring to fig. 3, the inference analysis system for vehicle-end multi-channel video streams includes a training module 301, a deployment module 302, and an acquisition module 303.
The training module 301 is configured to train the constructed target detection network to generate a target detection model based on image data perceived by the vehicle end;
a deployment module 302, configured to embed the target detection model into a DeepStream framework and deploy the target detection model into an AGX device;
the obtaining module 303 is configured to input the multiple channels of video streams at the vehicle end into a DeepStream frame embedded with a target detection model, and obtain a detection target result in the output multiple channels of video streams.
It can be understood that the vehicle-end multi-channel video stream inference analysis system provided by the present invention corresponds to the vehicle-end multi-channel video stream inference analysis method provided in the foregoing embodiments, and the relevant technical features of the vehicle-end multi-channel video stream inference analysis system may refer to the relevant technical features of the vehicle-end multi-channel video stream inference analysis method, and are not described herein again.
Example four
Referring to fig. 4, fig. 4 is a schematic view of an embodiment of an electronic device according to an embodiment of the invention. As shown in fig. 4, an embodiment of the present invention provides an electronic device 400, which includes a memory 410, a processor 420, and a computer program 411 stored on the memory 410 and executable on the processor 420. When the processor 420 executes the computer program 411, the following steps are implemented: training the constructed target detection network based on the image data perceived by the vehicle end to generate a target detection model; embedding the target detection model into the DeepStream framework and deploying it into the AGX device; and inputting the vehicle-end multi-channel video streams into the DeepStream framework embedded with the target detection model, and acquiring the detected target results in the output multi-channel video streams.
EXAMPLE five
Referring to fig. 5, fig. 5 is a schematic diagram of an embodiment of a computer-readable storage medium according to the present invention. As shown in fig. 5, the present embodiment provides a computer-readable storage medium 500 having a computer program 511 stored thereon. When executed by a processor, the computer program 511 implements the following steps: training the constructed target detection network based on the image data perceived by the vehicle end to generate a target detection model; embedding the target detection model into the DeepStream framework and deploying it into the AGX device; and inputting the vehicle-end multi-channel video streams into the DeepStream framework embedded with the target detection model, and acquiring the detected target results in the output multi-channel video streams.
The embodiment of the invention provides a vehicle-end multi-channel video stream inference analysis method and system, in which vehicle-end target image data is first collected and labeled; the prepared data is then fed into the yolov5s network model for training, and the trained model file is converted into the format required by DeepStream on the vehicle-end AGX device and deployed on the device; next, the image data captured by the multiple vehicle-end cameras is decoded, read and analyzed by inference through the DeepStream framework; and finally, the inferred information is integrated and fed back in real time. Through these steps, the front-end and back-end processing development time can be reduced for the development team, the overall time from capturing image data to obtaining and feeding back information is shortened, and developers can focus more on developing the core of the vehicle-end perception application.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A vehicle-end multi-channel video stream reasoning analysis method is characterized by comprising the following steps:
training the constructed target detection network to generate a target detection model based on the image data sensed by the vehicle end;
embedding the target detection model into the DeepStream framework and deploying the target detection model into the AGX device;
and inputting the vehicle-end multi-channel video streams into the DeepStream framework embedded with the target detection model, and acquiring the detected target results in the output multi-channel video streams.
2. The vehicle-end multi-channel video stream reasoning analysis method of claim 1, wherein the training of the constructed target detection network to generate the target detection model based on the image data sensed by the vehicle end comprises:
acquiring images of different types of lanes at a vehicle end to obtain corresponding image data, and marking target information in each image data to form a training data set;
training the constructed target detection network based on the training data set to generate a target detection model, wherein the target detection network is a lightweight yolov5s network.
3. The vehicle-end multichannel video stream reasoning analysis method according to claim 2, wherein each image data in the training data set is 1280 x 1280px in size.
4. The vehicle-end multi-channel video stream reasoning analysis method according to claim 1, wherein the embedding of the target detection model into the DeepStream framework and the deployment of the target detection model into the AGX device comprises:
after the constructed target detection network is trained, acquiring a format model file corresponding to the training;
converting the format model file into a format required by a DeepStream framework on an AGX device;
and deploying the converted format model file to the DeepStream framework of the AGX device, so that the target detection model is embedded into the DeepStream framework.
5. The vehicle-end multi-channel video stream reasoning analysis method according to claim 4, wherein the corresponding format model file is an FP32 best.pt model file, and the converting of the format model file into the format required by the DeepStream framework on the AGX device comprises:
converting the FP32 best.pt model file into an FP16 best.engine model file by using the TensorRT model conversion plug-in provided with the DeepStream framework.
6. The reasoning analysis method for the vehicle-end multi-channel video streams according to claim 1, wherein a plurality of threads are deployed on the AGX device, and accordingly, the method for inputting the vehicle-end multi-channel video streams into a DeepStream framework embedded with a target detection model to obtain detection target results in the output multi-channel video streams includes:
the method comprises the steps that multiple paths of input video streams are processed in parallel on the basis of multiple threads on AGX equipment, and a target detection result in each path of video stream is obtained;
and integrating and outputting the target detection results of the multiple paths of video streams.
7. A vehicle-end multipath video stream reasoning analysis system is characterized by comprising:
the training module is used for training the constructed target detection network to generate a target detection model based on the image data sensed by the vehicle end;
the deployment module is used for embedding the target detection model into the DeepStream framework and deploying the target detection model into the AGX device;
and the acquisition module is used for inputting the vehicle-end multi-channel video streams into the DeepStream framework embedded with the target detection model and acquiring the detected target results in the output multi-channel video streams.
8. The vehicle-end multi-channel video stream reasoning analysis system according to claim 7, wherein the deployment module, configured to embed the target detection model into the DeepStream framework and deploy it into the AGX device, includes:
after the constructed target detection network is trained, acquiring a format model file corresponding to the training;
converting the format model file into a format required by a DeepStream framework on an AGX device;
and deploying the converted format model file to the DeepStream framework of the AGX device, so that the target detection model is embedded into the DeepStream framework.
9. An electronic device, comprising a memory and a processor, wherein the processor is configured to implement the steps of the vehicle-end multi-channel video stream inference analysis method according to any one of claims 1 to 6 when executing a computer program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the vehicle-end multi-channel video stream inference analysis method according to any one of claims 1 to 6.
CN202111622296.7A 2021-12-28 2021-12-28 Vehicle-end multi-channel video stream reasoning analysis method and system Pending CN114445735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111622296.7A CN114445735A (en) 2021-12-28 2021-12-28 Vehicle-end multi-channel video stream reasoning analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111622296.7A CN114445735A (en) 2021-12-28 2021-12-28 Vehicle-end multi-channel video stream reasoning analysis method and system

Publications (1)

Publication Number Publication Date
CN114445735A (en) 2022-05-06

Family

ID=81365896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111622296.7A Pending CN114445735A (en) 2021-12-28 2021-12-28 Vehicle-end multi-channel video stream reasoning analysis method and system

Country Status (1)

Country Link
CN (1) CN114445735A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528961A (en) * 2020-12-28 2021-03-19 山东巍然智能科技有限公司 Video analysis method based on Jetson Nano
CN112528961B (en) * 2020-12-28 2023-03-10 山东巍然智能科技有限公司 Video analysis method based on Jetson Nano
CN115131730A (en) * 2022-06-28 2022-09-30 苏州大学 Intelligent video analysis method and system based on edge terminal
CN115131730B (en) * 2022-06-28 2023-09-12 苏州大学 Intelligent video analysis method and system based on edge terminal

Similar Documents

Publication Publication Date Title
CN114445735A (en) Vehicle-end multi-channel video stream reasoning analysis method and system
JP2021152948A (en) Scene understanding and generation using neural networks
US11900646B2 (en) Methods for generating a deep neural net and for localising an object in an input image, deep neural net, computer program product, and computer-readable storage medium
CN110675329B (en) Image deblurring method based on visual semantic guidance
CN110659573B (en) Face recognition method and device, electronic equipment and storage medium
CN109711380A (en) A kind of timing behavior segment generation system and method based on global context information
CN112200057A (en) Face living body detection method and device, electronic equipment and storage medium
CN104504651A (en) Preview generation method and device
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
CN112434794A (en) Computer vision data set semi-automatic labeling method and system based on deep learning
CN116797248B (en) Data traceability management method and system based on block chain
CN114863539A (en) Portrait key point detection method and system based on feature fusion
CN114693930A (en) Example segmentation method and system based on multi-scale features and context attention
CN110232338A (en) Lightweight Web AR recognition methods and system based on binary neural network
CN111951260B (en) Partial feature fusion based convolutional neural network real-time target counting system and method
CN101620734B (en) Motion detecting method, motion detecting device, background model establishing method and background model establishing device
CN112529160A (en) High-dimensional simulation learning method for video image data recorded by camera equipment
CN111597847A (en) Two-dimensional code identification method, device and equipment and readable storage medium
CN113361520A (en) Transmission line equipment defect detection method based on sample offset network
CN113840169A (en) Video processing method and device, computing equipment and storage medium
CN110958449A (en) Three-dimensional video subjective perception quality prediction method
Zhou et al. Integrated channel attention method for Siamese tracker
CN116071376B (en) Image segmentation method, related device, equipment and storage medium
CN117649374A (en) Intelligent detection method and device for pin loss of transformer substation
CN116385267A (en) Image processing method, apparatus, program product, computer device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination