Disclosure of Invention
In order to solve at least one technical problem in the background art, the invention provides a deep-learning-based multi-module cooperative object recognition system and method that realize real-time object recognition through the cooperation of multiple modules.
In order to achieve this purpose, the invention adopts the following technical scheme.
A first aspect of the invention provides a deep-learning-based multi-module cooperative object recognition system.
A multi-module cooperative object recognition system based on deep learning comprises a video input module, a video processing subsystem module, an intelligent video engine module, a neural network acceleration engine module, a video graphics subsystem module and a video output module which are integrated into a whole and cooperatively work;
the video input module is used for receiving real-time video data and storing the real-time video data into a specified memory area;
the video processing subsystem module is used for calling original video data of the memory area and decomposing the original video data into basic video data and extended video data;
the intelligent video engine module is used for converting image frame data in the current extended video data into frame data in an image format matched with the neural network model;
the neural network acceleration engine module is used for acquiring the frame data after format conversion and obtaining, through the neural network model, the category of the object and the four-point coordinate position information of its contour;
the video graphics subsystem module is used for acquiring basic video data and then drawing a contour frame for identifying the object in the basic video data based on the category of the object and the position information of four-point coordinates of the contour;
and the video output module is used for outputting the video image data with the outline frame of the identified object.
As an embodiment, the base video data maintains the resolution of the original video data.
The technical scheme has the advantage that consistency with the original data is guaranteed, so that the object contour frame identified later can be restored onto the original video data more accurately.
In one embodiment, the resolution of the extended video data is matched to a neural network model within the neural network acceleration engine module.
The technical scheme has the advantage that the extended video data matches the input requirements of the neural network model and provides a data basis for it.
As an implementation mode, the video input module, the video processing subsystem module, the intelligent video engine module, the neural network acceleration engine module, the video graphics subsystem module and the video output module are all started after receiving a starting command and all perform initialization operation at the same time.
The technical scheme has the advantages that the modules are initialized, and the accuracy of post-data processing can be guaranteed.
As an embodiment, during the initialization operation, the initialization of the neural network acceleration engine module includes loading a trained neural network model in a specific format.
The technical scheme has the advantage that, before loading, the format of the neural network model trained on a computer is converted in advance into a specific format that the neural network acceleration engine module can load, which improves the efficiency of video image data processing.
As an embodiment, the video processing subsystem module, the intelligent video engine module, the neural network acceleration engine module, the video graphics subsystem module and the video output module perform multi-thread parallel operation; the VitoVo thread operation is carried out among the video processing subsystem module, the neural network acceleration engine module, the video graphics subsystem module and the video output module; and a detect thread operation is carried out between the intelligent video engine module and the neural network acceleration engine module.
The technical scheme has the advantages that the processing efficiency of video data can be improved and the real-time property of object identification can be guaranteed by parallel thread operation.
As an embodiment, in the VitoVo thread operation, frame data are extracted from the extended video frame data and placed into a frame data linked list; in the detect thread operation, frame data are taken out of the frame data linked list in sequence, whether each frame is to be recognized is judged, a flag bit is defined according to the judgment result, and the flag bit and the frame number are stored into a flag bit linked list.
In the VitoVo thread operation, object recognition results and frame numbers are sequentially extracted from a recognition result linked list, the object recognition results and frame numbers having been obtained by the neural network acceleration engine module in the detect thread and stored into the recognition result linked list.
The technical scheme has the advantage that the flag bits coordinate the corresponding threads, guaranteeing the processing order of the video images within each thread and avoiding omission of video images.
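The frame data linked list, flag bit linked list, and recognition result linked list described above can be pictured with a small Python sketch; the `deque` containers, the alternating recognition policy, and the "person" category are illustrative assumptions, not part of the invention:

```python
from collections import deque

# Hypothetical model of the three shared linked lists: frame data,
# flag bits, and recognition results, keyed by frame number.
frame_list = deque()    # filled by the VitoVo thread
flag_list = deque()     # filled by the detect thread
result_list = deque()   # filled by the acceleration engine

# VitoVo thread side: extract frame data into the frame data list.
for frame_no in range(5):
    frame_list.append({"frame_no": frame_no, "data": b"..."})

# detect thread side: judge per frame whether it is recognized, then
# store the flag bit together with the frame number; recognized frames
# additionally produce a (frame number, category, contour) result.
while frame_list:
    frame = frame_list.popleft()
    recognized = frame["frame_no"] % 2 == 0   # illustrative policy
    flag_list.append((frame["frame_no"], recognized))
    if recognized:
        result_list.append((frame["frame_no"], "person",
                            [(0, 0), (10, 0), (10, 20), (0, 20)]))
```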
The second aspect of the invention provides a multi-module cooperative object identification method based on deep learning.
A recognition method of a multi-module cooperative object recognition system based on deep learning comprises the following steps:
receiving a starting command and initializing a video input module, a video processing subsystem module, an intelligent video engine module, a neural network acceleration engine module, a video graphics subsystem module and a video output module;
receiving real-time video data by using a video input module and storing the real-time video data into a specified memory area;
calling original video data in a memory area by using a video processing subsystem module and decomposing the original video data into basic video data and extended video data;
converting image frame data in the current extended video data into frame data in an image format matched with the neural network model by using the intelligent video engine module;
acquiring the frame data after format conversion by using the neural network acceleration engine module, and obtaining, through the neural network model, the category of the object and the four-point coordinate position information of its contour;
acquiring basic video data by using a video graphics subsystem module, and drawing a contour frame for identifying an object in the basic video data based on the category of the object and the position information of four-point coordinates of the contour;
and outputting the video image data with the outline frame of the identified object by using a video output module.
As one embodiment, the identification method of the deep-learning-based multi-module cooperative object identification system further comprises a detect thread and a VitoVo thread, and the two threads are executed in parallel.
Compared with the prior art, the invention has the beneficial effects that:
the invention realizes real-time object identification through multi-module cooperation, solving the problem that camera images currently need to be uploaded to a server for identification processing, and avoiding the time delay caused by network latency or the limitation of network bandwidth;
the video processing subsystem module, the intelligent video engine module, the neural network acceleration engine module, the video graphics subsystem module and the video output module perform multi-thread parallel operation, and the intelligent video engine module is adopted to convert the input image format into the image format required by the model, which reduces the utilization rate of the CPU (Central Processing Unit), shortens the image processing time, and ensures the real-time performance of object identification in the image.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Interpretation of terms:
the VitoVo thread: input to the output thread.
detect thread: and (4) identifying and detecting the object.
Example one
Referring to fig. 1, the deep learning-based multi-module cooperative object recognition system of the present embodiment includes a video input module, a video processing subsystem module, an intelligent video engine module, a neural network acceleration engine module, a video graphics subsystem module, and a video output module, which are integrated into a whole and cooperate with each other.
In specific implementation, in order to ensure the accuracy of the post-data processing, the video input module, the video processing subsystem module, the intelligent video engine module, the neural network acceleration engine module, the video graphics subsystem module and the video output module are all started after receiving a start command and are all initialized.
During an initialization operation, the initialization of the neural network acceleration engine module includes loading a trained neural network model in a particular format. Before loading, format conversion needs to be carried out on a neural network model trained in a computer in advance, and the neural network model is converted into a specific format which can be loaded by a neural network acceleration engine module, so that the efficiency of video image data processing is improved.
As a specific embodiment, the training process of the neural network model is as follows:
collecting pictures containing an object to be detected in different scenes, carrying out unification treatment on the pictures, and labeling the class of the object to be detected to form a sample set; the sample set is divided into a training set and a testing set;
selecting one sample (Ai, Bi) of the training set, wherein Bi is the data and Ai is the label;
sending the data into the network and calculating the actual output Y of the network; at this point, the weights in the network are random;
calculating the error D = Ai − Y (the difference between the label Ai and the actual output Y);
adjusting the weight matrix W according to the error D;
the above process is repeated for each sample until the error does not exceed the specified range for the entire training set.
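The sample-by-sample procedure above resembles a basic error-driven (delta-rule) weight update. A minimal sketch follows, in which the linear single-output model, the learning rate, the tolerance value, and the tiny training set are all illustrative assumptions, not taken from the description:

```python
import random

# Minimal sketch of the training loop above: for each sample (Ai, Bi),
# compute the output Y, the error D = Ai - Y, and adjust the weights W.
# The linear single-output model and learning rate are illustrative.
random.seed(0)
training_set = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0),
                ([1.0, 1.0], 1.0), ([0.0, 0.0], 0.0)]
W = [random.uniform(-0.5, 0.5) for _ in range(2)]  # random initial weights
lr, tolerance = 0.1, 0.25

for _ in range(100):                        # repeat over the training set
    max_error = 0.0
    for Bi, Ai in training_set:             # Bi: data, Ai: label
        Y = sum(w * x for w, x in zip(W, Bi))        # actual output
        D = Ai - Y                          # error between label and output
        W = [w + lr * D * x for w, x in zip(W, Bi)]  # adjust weight matrix
        max_error = max(max_error, abs(D))
    if max_error <= tolerance:              # error within specified range
        break
```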
In the present embodiment, the scenes include, but are not limited to, day, night, rainy day, snowy day, foggy day, and the like.
The yolov3 neural network model is used in this embodiment, and model training is performed under the Caffe framework. The trained model is in a format consistent with the Caffe deep learning framework supported by the front-end chip, and is converted into a format that the neural network acceleration engine module can support.
It should be noted here that the neural network model may also be other existing network structures, and those skilled in the art can specifically select the neural network model according to actual situations, and the details are not described here.
Specifically, the video input module is used for receiving real-time video data and storing the real-time video data into a specified memory area.
For example: the video input module receives, through an MIPI (Mobile Industry Processor Interface), real-time video data shot by a camera, and processes the received original video image data to realize video data acquisition; the video input module then stores the received data into the designated memory area.
Specifically, the video processing subsystem module is configured to retrieve original video data of the memory area and decompose the original video data into basic video data and extended video data.
Wherein the base video data maintains the resolution of the original video data. The resolution of the extended video data is matched to a neural network model within the neural network acceleration engine module.
In this way, consistency with the original data is guaranteed, so that the object contour frame identified later can be restored onto the original video data more accurately; meanwhile, the resolution of the extended video data matches the neural network model and provides a data basis for it.
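The decomposition into basic and extended video data can be sketched as keeping the original frame unchanged while producing a resized copy for the model; the nearest-neighbour resize and the tiny toy frame below are illustrative assumptions:

```python
def decompose(frame, model_size):
    """Split an original frame into basic data (original resolution
    preserved) and extended data (resized to the model's input
    resolution). The nearest-neighbour resize is an illustrative
    stand-in for the hardware scaler."""
    h, w = len(frame), len(frame[0])
    basic = frame                                   # resolution preserved
    mh, mw = model_size
    extended = [[frame[i * h // mh][j * w // mw]    # nearest neighbour
                 for j in range(mw)] for i in range(mh)]
    return basic, extended

# Toy 8x4 "frame" whose pixels record their own (row, col) position.
original = [[(y, x) for x in range(8)] for y in range(4)]
basic, extended = decompose(original, model_size=(2, 2))
```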
Specifically, the intelligent video engine module is used for converting image frame data in the current extended video data into frame data in an image format matched with the neural network model.
In this embodiment, the image frame data in the current extended video data are in YUV format, and the image format matched with the neural network model is RGB.
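A per-pixel conversion of this kind can be sketched with the common full-range BT.601-style coefficients; this pure-Python stand-in is only illustrative of the format conversion that the intelligent video engine module performs in hardware:

```python
def yuv_to_rgb(y, u, v):
    """Convert one full-range YUV pixel to RGB using the common
    BT.601-style coefficients; an illustrative stand-in for the
    hardware format conversion of the intelligent video engine."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda c: max(0, min(255, int(round(c))))  # clip to 8 bits
    return clamp(r), clamp(g), clamp(b)
```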
Specifically, the neural network acceleration engine module is used for acquiring frame data after format conversion, and obtaining the category and four-point coordinate position information of the outline of the object through neural network model identification.
Specifically, the video graphics subsystem module is used for acquiring basic video data, and then drawing a contour frame for identifying the object in the basic video data based on the category of the object and the position information of four-point coordinates of the contour.
Specifically, the video output module is used for outputting video image data with a contour frame of the identified object.
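The module-by-module data flow described above can be pictured with a small pure-Python sketch; the dictionary frame representation, the "person" category, and the placeholder function bodies are illustrative assumptions standing in for the hardware engines:

```python
# Illustrative sketch of one frame's path through the six cooperating
# modules; the pure-Python stand-ins below only mirror the data flow
# described above, not real hardware engines.

def video_input(raw_frame, memory):
    # Video input module: store real-time video data in a memory area.
    memory["original"] = raw_frame
    return memory

def video_processing(memory):
    # Video processing subsystem: split into basic and extended data.
    original = memory["original"]
    return {"basic": original, "extended": dict(original)}

def intelligent_video_engine(extended):
    # Intelligent video engine: convert to the model's image format.
    extended["format"] = "rgb"
    return extended

def neural_network_engine(frame):
    # Acceleration engine: return category plus four contour points.
    return {"category": "person",
            "contour": [(0, 0), (10, 0), (10, 20), (0, 20)]}

def video_graphics(basic, result):
    # Graphics subsystem: draw the contour frame on the basic data.
    basic["overlay"] = result
    return basic

def video_output(frame):
    # Video output module: emit the annotated video image data.
    return frame

memory = video_input({"format": "yuv", "pixels": "..."}, {})
split = video_processing(memory)
converted = intelligent_video_engine(split["extended"])
result = neural_network_engine(converted)
annotated = video_graphics(split["basic"], result)
out = video_output(annotated)
```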
In specific implementation, the video processing subsystem module, the intelligent video engine module, the neural network acceleration engine module, the video graphics subsystem module and the video output module perform multi-thread parallel operation; the VitoVo thread operation is carried out among the video processing subsystem module, the neural network acceleration engine module, the video graphics subsystem module and the video output module; and a detect thread operation is carried out between the intelligent video engine module and the neural network acceleration engine module. The parallel thread operation can improve the processing efficiency of video data and ensure the real-time property of object identification.
In the VitoVo thread:
after the video processing subsystem module decomposes the video data collected by the video input module into basic video data and extended video data, extracting frame data from the extended video frame data, and putting the frame data into a frame data linked list;
sequentially taking out the flag bits and the frame numbers of the frame data from the flag bit linked list, the flag bits having been stored into the flag bit linked list in the detect thread;
the flag bit indicates whether the frame data is used for object identification; because the object recognition frame rate of the neural network acceleration engine module is lower than the sampling frame rate of the video input module, this embodiment adopts a frame extraction recognition mode, and a flag bit marks whether each frame of data is used for object recognition;
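The frame extraction mode can be sketched as simple rate decimation; the concrete 25 fps capture rate and 5 fps recognition rate below are illustrative assumptions, since the description only states that the recognition frame rate is lower than the sampling frame rate:

```python
def select_frames_for_recognition(capture_fps, recognition_fps, frame_numbers):
    """Attach to each frame number a flag bit saying whether that frame
    is sent for object recognition, decimating the capture rate down to
    the recognition rate. The concrete fps figures are illustrative."""
    step = max(1, capture_fps // recognition_fps)   # keep 1 in every `step`
    return [(n, n % step == 0) for n in frame_numbers]

# With 25 fps capture and 5 fps recognition, every 5th frame is flagged.
flags = select_frames_for_recognition(25, 5, range(10))
```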
sequentially extracting the object identification result and the frame number of the frame data from the identification result linked list, the object identification result and frame number having been obtained by the neural network acceleration engine module in the detect thread and stored into the identification result linked list; the identification result comprises the object category and the four-point coordinate position information of the contour;
the video graphics subsystem module acquires basic video data, and draws a contour frame for identifying an object in the basic video data according to contour four-point coordinate position information acquired by the neural network acceleration engine module;
the video output module outputs video image data with the outline frame of the identified object.
In the detect thread:
taking out frame data from the frame data linked list in sequence, judging whether each frame is to be recognized, defining a flag bit according to the judgment result, and storing the flag bit and the frame number into the flag bit linked list;
adopting the intelligent video engine module to convert the frame data in the input image format into frame data in the image format required by the model;
acquiring frame data after format conversion by adopting a neural network acceleration engine module, and identifying to obtain four-point coordinate position information of the class and the outline of the object through a neural network model; and storing the coordinate position information and the frame number of the four points of the object type and the outline into an identification result linked list.
In this embodiment, the flag bits coordinate the corresponding threads, which guarantees the processing order of the video images within each thread and avoids omission of video images.
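The parallel cooperation of the VitoVo and detect threads can be modeled with Python threads and queues; the queues stand in for the linked lists, and the placeholder recognizer and sentinel-based shutdown are illustrative assumptions:

```python
import queue
import threading

# Toy model of the two parallel threads: VitoVo produces frame data,
# detect consumes it, recognizes, and posts results back. Queues stand
# in for the linked lists; the recognizer is a placeholder.
frame_q = queue.Queue()
result_q = queue.Queue()
STOP = object()   # sentinel telling the detect thread to shut down

def vitovo_thread():
    for frame_no in range(3):          # extract frames from extended data
        frame_q.put(frame_no)
    frame_q.put(STOP)

def detect_thread():
    while True:
        frame_no = frame_q.get()
        if frame_no is STOP:
            break
        # placeholder recognition: category plus four contour points
        result_q.put((frame_no, "person", [(0, 0), (1, 0), (1, 1), (0, 1)]))

producer = threading.Thread(target=vitovo_thread)
consumer = threading.Thread(target=detect_thread)
producer.start(); consumer.start()
producer.join(); consumer.join()
results = [result_q.get() for _ in range(result_q.qsize())]
```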
In the embodiment, the real-time identification of the object is realized by utilizing the cooperation of multiple modules, the problem that the camera image needs to be uploaded to a server for identification processing at present is solved, the time delay caused by network delay or network bandwidth limitation is avoided, and the real-time identification is realized.
The recognition method of the deep learning-based multi-module collaborative object recognition system comprises the following steps:
s101: receiving a starting command and initializing a video input module, a video processing subsystem module, an intelligent video engine module, a neural network acceleration engine module, a video graphics subsystem module and a video output module;
s102: receiving real-time video data by using a video input module and storing the real-time video data into a specified memory area;
s103: calling original video data in a memory area by using a video processing subsystem module and decomposing the original video data into basic video data and extended video data;
s104: converting image frame data in the current extended video data into frame data in an image format matched with the neural network model by using the intelligent video engine module;
s105: acquiring the frame data after format conversion by using the neural network acceleration engine module, and obtaining, through the neural network model, the category of the object and the four-point coordinate position information of its contour;
s106: acquiring basic video data by using a video graphics subsystem module, and drawing a contour frame for identifying an object in the basic video data based on the category of the object and the position information of four-point coordinates of the contour;
s107: and outputting the video image data with the outline frame of the identified object by using a video output module.
In some embodiments, the identification method of the deep-learning-based multi-module cooperative object identification system further comprises a detect thread and a VitoVo thread, which are executed in parallel.
In this embodiment, the video processing subsystem module, the intelligent video engine module, the neural network acceleration engine module, the video graphics subsystem module and the video output module perform multi-thread parallel operation, and the intelligent video engine module is adopted to convert the input image format into the image format required by the model, which reduces the utilization rate of the CPU (Central Processing Unit), shortens the image processing time, and ensures the real-time performance of object identification in the image.
In addition, this embodiment can perform not only object recognition on moving images but also recognition on still images; the input may be either a still image (picture) or a moving image (video).
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.