CN115550607A - Model reasoning accelerator realized based on FPGA and intelligent visual perception terminal - Google Patents


Publication number
CN115550607A
Authority
CN
China
Prior art keywords
intelligent
operation unit
fpga
model
perception
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211111792.0A
Other languages
Chinese (zh)
Inventor
南柄飞
王凯
郭志杰
陈凯
李森
李首滨
荣耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Meike Tianma Automation Technology Co Ltd
Beijing Tianma Intelligent Control Technology Co Ltd
Original Assignee
Beijing Meike Tianma Automation Technology Co Ltd
Beijing Tianma Intelligent Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Meike Tianma Automation Technology Co Ltd, Beijing Tianma Intelligent Control Technology Co Ltd filed Critical Beijing Meike Tianma Automation Technology Co Ltd
Priority to CN202211111792.0A priority Critical patent/CN115550607A/en
Publication of CN115550607A publication Critical patent/CN115550607A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models


Abstract

The invention provides a model inference accelerator implemented on an FPGA (field-programmable gate array) and an intelligent visual perception terminal. In the provided technical scheme, an intelligent computing processing unit architecture is designed based on the idea of hardware-algorithm co-design, and a neural network model suited to the different hardware environments found above ground and underground in a mine is constructed, achieving a higher level of automation and intelligence.

Description

Model reasoning accelerator realized based on FPGA and intelligent visual perception terminal
This application is a divisional application of Chinese patent application No. 202011035175.8, filed on September 27, 2020, and entitled "Intelligent visual perception terminal, perception method, storage medium and electronic equipment for underground coal mines".
Technical Field
The invention relates to the technical field of intelligent coal mining, and in particular to an FPGA-based model inference accelerator and an intelligent visual perception terminal.
Background
To promote the integration of artificial intelligence with the coal industry, raise the level of intelligent unmanned coal mining, and drive high-quality development of the industry, coal mining enterprises and equipment suppliers have invested large amounts of manpower and material resources in research on intelligent coal mine technology.
At present, related enterprises have carried out early exploratory research on intelligent video monitoring for mines, forming remote visual intervention-type intelligent mining control systems and advancing the intelligent unmanned mining process. In a typical technical implementation, a large amount of underground video image sample data is collected and organized into a machine learning training database of a certain scale. A data processing center equipped with high-performance GPU hardware is then built on the surface, and intelligent perception algorithm models for underground equipment and environment targets are constructed through supervised training and offline iterative optimization based on deep learning. Finally, the resulting robust perception model is deployed on a high-performance surface workstation and connected to the underground real-time production video monitoring system through a reliable communication network with guaranteed bandwidth, performing real-time inference on the state of equipment and environment; the perception results feed early-warning intelligent monitoring for efficient and safe coal production. Such ground-data-center-based remote visual intelligent monitoring systems can, to a certain extent, free surface monitoring staff from long, monotonous manual remote monitoring and improve the level of intelligent management and control of remote visual monitoring.
However, as unmanned mining advances and the level of underground intelligent technology rises, the real-time requirements of intelligent mining control systems grow ever stricter, and the real-time remote-control shortcomings of ground-data-center-based remote visual monitoring systems become increasingly prominent, seriously constraining the underground intelligent unmanned mining process. Because of the rigid underground intrinsic-safety requirements, the high-power-consumption hardware of the ground data processing center architecture is difficult to deploy underground, so perception algorithm models that demand such computing resources are hard to bring into service there. Meanwhile, constrained by mine site conditions, the reliable, fast-responding, sufficiently high-bandwidth data communication network that the ground data center architecture presupposes is difficult to roll out across large mines in the short term. The deficiencies of existing ground-data-center-based remote visual intelligent monitoring systems in improving the automation and intelligence of coal mine production management are therefore gradually becoming apparent.
Disclosure of Invention
The invention aims to provide an FPGA-based model inference accelerator and an intelligent visual perception terminal, so as to solve the technical problem of the low degree of automation and intelligence of remote visual intelligent monitoring video systems in the prior art.
To this end, an embodiment of the present invention provides a model inference accelerator implemented on an FPGA, including:
a convolution operation unit, configured to perform matrix-matrix multiplication between the surveillance video image and the model network weight kernels to obtain a high-bit-width convolution feature map;
a data conversion unit, configured to convert the high-bit-width convolution feature map data into low-bit-width feature map data;
a pooling operation unit, configured to receive the low-bit-width feature map output by the data conversion unit and compute a result from a given function (for example, max or average) over each pooling kernel window to serve as the pooled output feature map;
a translation operation unit, which translates the pooled output feature map along a plurality of directions, directly copying neighborhood values to the center position and yielding as many channel feature maps as there are displacement directions;
a random reorganization operation unit, configured to receive the plurality of channel feature maps output by the translation operation unit and randomly change their write-back memory addresses, realizing diversified combinations of the channel feature maps and obtaining randomly reorganized channel feature maps;
and a fully connected unit, configured to receive the channel feature maps output by the random reorganization operation unit and map them to the convolution operation unit for computation, obtaining a feature vector corresponding to the perception information.
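The dataflow through the first three of these units (convolution, bit-width conversion, pooling) can be sketched in NumPy as follows. The function names, the int32-to-int8 conversion, and the right-shift scaling are illustrative assumptions for the sketch, not the patent's actual implementation:

```python
import numpy as np

def conv_unit(image, kernel):
    """Convolution unit sketch: multiply each image patch with the
    weight kernel and accumulate into a high-bit-width (int32) map."""
    h, w = image.shape
    k = kernel.shape[0]  # e.g. a 3x3 kernel
    out = np.zeros((h - k + 1, w - k + 1), dtype=np.int32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = int(np.sum(image[i:i + k, j:j + k] * kernel))
    return out

def convert_unit(fm32):
    """Data conversion unit sketch: reduce bit width (int32 -> int8)
    by an assumed right-shift rescale plus clamping."""
    return np.clip(fm32 >> 8, -128, 127).astype(np.int8)

def pool_unit(fm, k=2):
    """Pooling unit sketch: max over each k x k pooling window."""
    h, w = fm.shape
    return fm[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).max(axis=(1, 3))
```

Chaining them as `pool_unit(convert_unit(conv_unit(image, kernel)))` mirrors the unit order in the claim.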
In some embodiments of the FPGA-based model inference accelerator, the data conversion unit is implemented by, but not limited to, a comparator.
In some embodiments of the FPGA-based model inference accelerator, the comparator performs bit-width conversion on the result of each convolution operation in the convolution operation unit, reducing the bit width of the convolution result to obtain a conversion result, and maps the conversion into a binary tree structure.
In some embodiments of the FPGA-based model inference accelerator, the pooling operation unit and the translation operation unit are implemented with a line-buffer (linear cache) design.
In some embodiments of the FPGA-based model inference accelerator, the random reorganization operation performed by the random reorganization unit includes serially combining the data received this time with the data received last time.
Some embodiments of the application further provide an underground coal mine intelligent visual perception terminal, including a camera and an intelligent perception computing processing unit based on visual content. The camera sends the captured surveillance video images to the intelligent perception computing processing unit for processing, where the intelligent perception computing processing unit includes any of the FPGA-based model inference accelerators described above.
In some embodiments, the underground coal mine intelligent visual perception terminal further includes:
a wired transmission interface and a wireless transmission interface, used to input surveillance video images and output perception result information; the wired transmission interface includes a wired network interface, an HDMI video output interface, 2 USB interfaces, a Micro-SD card slot and SDRAM memory; the wireless transmission interface includes a dual-band wireless network interface and a Bluetooth interface.
In some embodiments, the underground coal mine intelligent visual perception terminal further includes:
an intrinsically safe power supply management module, used to supply electric energy to the camera and the visual-content-based intelligent perception computing processing unit. Compared with the prior art, the technical scheme provided by the embodiments of the invention has at least the following beneficial effects:
the model inference accelerator and the intelligent visual perception terminal which are realized based on the FPGA provided by the embodiment of the invention are based on the collaborative thought of the algorithm model and the hardware processing architecture, design and construct a neural network model suitable for hardware environments of different scenes above and below the well, provide the intelligent terminal with the edge computing and acceleration reasoning hardware architecture, solve the problems of effective deployment and inference of a deep learning algorithm model on a device terminal under the scene of limited computing and processing resources of an unmanned working surface of a deep well, realize the problems of real-time perception digitization of the environment space position of the unmanned working surface of the deep well, on-site perception and on-site cognition related to the detection and identification of the condition of the device target, and provide a reliable solution for the construction of an unmanned intelligent production control system of a coal mine under the environment of a three-dimensional space below the unmanned working surface of the deep well, underground in-situ intelligent monitoring of a key device target object, underground and above-well bidirectional intelligent decision management and automatic and autonomous execution.
Drawings
FIG. 1 is a block diagram of an FPGA-based model inference accelerator according to an embodiment of the present application;
FIG. 2 is a diagram illustrating an underground intelligent visual perception terminal for a coal mine according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an underground coal mine intelligent visual perception terminal according to another embodiment of the present invention;
FIG. 4 is a flow chart of a method for intelligent visual perception of an underground coal mine according to an embodiment of the invention;
FIG. 5 is a deployment flow chart of an intelligent visual perception algorithm for an underground coal mine according to an embodiment of the invention;
fig. 6 is a schematic diagram of a hardware connection relationship of an electronic device for executing the method for intelligent visual perception of an underground coal mine according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be further described with reference to the accompanying drawings. In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description of the present invention, and do not indicate or imply that the device or assembly referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Wherein the terms "first position" and "second position" are two different positions.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; the two components can be directly connected or indirectly connected through an intermediate medium, and the two components can be communicated with each other. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art. In the following embodiments provided in the present application, unless mutually contradictory, different technical solutions may be mutually combined, and technical features thereof may be mutually replaced.
The embodiment of the invention provides a model inference accelerator realized based on an FPGA (field programmable gate array), which comprises the following components as shown in figure 1:
and the convolution operation unit is used for performing matrix and matrix multiplication operation on the monitoring video image and the model grid weight kernel to obtain a high-bit-width convolution operation characteristic diagram.
And the data conversion unit is used for converting the high bit width convolution operation characteristic diagram data to obtain low bit width characteristic diagram data.
And the pooling operation unit is used for receiving the low bit width characteristic diagram output by the data conversion unit and obtaining a result as a pooling output characteristic diagram according to a certain type of function within a pooling core range.
And the translation operation unit translates the pooled output feature map along a plurality of directions so as to directly copy the neighborhood quantity to the position of the central point and obtain the channel feature maps with the same number as the displacement directions.
And the random reorganization operation unit is used for receiving the plurality of channel characteristic diagrams output by the translation operation unit, randomly changing the write-back memory addresses of the channel characteristic diagrams, realizing the diversified combination of the channel characteristic diagrams and obtaining the channel characteristic diagrams after random reorganization.
And the full-connection unit is used for receiving the channel characteristic diagram output by the random recombination operation unit and mapping the channel characteristic diagram to the convolution operation unit for operation to obtain a characteristic vector corresponding to the perception information.
This scheme follows a co-design of algorithm model and hardware processing architecture: it designs and constructs a neural network model suited to the different hardware environments above ground and underground, provides an intelligent terminal with an edge-computing and accelerated-inference hardware architecture, and solves the problem of effectively deploying and running deep learning models on device terminals in deep-well unmanned-workface scenarios with limited computing resources.
Some embodiments of the present invention provide an underground coal mine intelligent visual perception terminal which, as shown in fig. 2 and fig. 3, includes a camera 101 and a visual-content-based intelligent perception computing processing unit 100. The camera 101 sends captured surveillance video images to the intelligent perception computing processing unit 100 for processing, where the intelligent perception computing processing unit 100 includes an FPGA-based model inference accelerator 102 as in the foregoing embodiments. The FPGA-based model inference accelerator 102 includes:
and the convolution operation unit is used for performing matrix and matrix multiplication operation on the monitoring video image and the model grid weight kernel to obtain a high-bit-width convolution operation characteristic diagram.
And the data conversion unit is used for converting the high bit width convolution operation characteristic diagram data to obtain low bit width characteristic diagram data. The data conversion unit is realized by a comparator, but is not limited to be realized by the comparator, the comparator performs byte width conversion on each convolution operation result in the convolution operation unit, the conversion result is mapped into a binary tree structure after the byte width of the convolution operation result is reduced to obtain the conversion result, and the convolution operation characteristic diagram data is converted from a high byte width to a low byte width, so that the calculation process can be simplified.
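One way a comparator stage can map bit-width reduction onto a binary tree is a binary search over sorted quantization thresholds, with one comparator level per tree depth. The patent does not spell out the comparator tree, so the threshold values and the search formulation below are illustrative assumptions:

```python
def quantize_via_comparator_tree(x, thresholds):
    """Reduce a high-bit-width value to a low-bit-width code through a
    binary tree of comparisons: each loop iteration corresponds to one
    comparator level, as the stages might be arranged on an FPGA.
    `thresholds` must be sorted ascending."""
    lo, hi = 0, len(thresholds)   # candidate code range [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if x >= thresholds[mid]:  # one comparator decision
            lo = mid + 1
        else:
            hi = mid
    return lo                     # low-bit-width output code
```

With three thresholds this yields a 2-bit code in at most two comparator levels, rather than three sequential comparisons.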
The pooling operation unit is implemented with a line-buffer (linear cache) design for fast computation; it receives the low-bit-width feature map output by the data conversion unit and computes a result from a given function over each pooling kernel window as the pooled output feature map.
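A line-buffer pooling stage buffers one row so that a 2x2 window can be reduced as rows stream in, which is the essence of the linear cache design described above. This generator is a behavioral sketch of that idea, not the FPGA implementation:

```python
def streaming_max_pool_2x2(rows):
    """2x2 max pooling over a row stream using a single line buffer:
    even rows are buffered, and when the following odd row arrives the
    two rows are merged and reduced pairwise along the width."""
    line_buffer = None
    for r, row in enumerate(rows):
        if r % 2 == 0:
            line_buffer = row  # store even row in the line buffer
        else:
            merged = [max(a, b) for a, b in zip(line_buffer, row)]
            yield [max(merged[c], merged[c + 1])
                   for c in range(0, len(merged) - 1, 2)]
```

Only one row of storage is needed regardless of image height, which is why line buffers suit FPGA block RAM.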
The translation operation unit translates the pooled output feature map along a plurality of directions, directly copying neighborhood values to the center position and yielding as many channel feature maps as there are displacement directions. The unit is implemented with a line-buffer design: a row of zero-valued padding is added to the visual feature data output by the pooling operation unit in the horizontal direction and a column of zero-valued padding in the vertical direction, giving padded visual feature data of width +1 and height +1. The displacement direction is determined by the channel index value, so different channels move in different directions, and after each move a value is obtained by a sliding-window computation.
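The translation (shift) operation can be modeled as zero-padding followed by a per-channel copy of one neighbor into the center position, with the channel index selecting the direction. For simplicity this sketch pads one zero on every side (the text pads one row and one column), and the direction table is an illustrative assumption:

```python
import numpy as np

# Identity plus the eight neighborhood directions; channel index selects one.
DIRECTIONS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1),
              (-1, -1), (-1, 1), (1, -1), (1, 1)]

def shift_unit(fm):
    """Translation unit sketch: zero-pad the feature map, then for each
    output channel copy the neighbor selected by that channel's
    direction into the center position (no multiplications needed)."""
    h, w = fm.shape
    padded = np.pad(fm, 1)  # zero padding on all sides (assumption)
    out = np.zeros((len(DIRECTIONS), h, w), dtype=fm.dtype)
    for c, (dy, dx) in enumerate(DIRECTIONS):
        out[c] = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out
```

Because each channel is a pure copy, the unit produces nine channel feature maps from one input map using only memory moves.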
The random reorganization operation unit receives the plurality of channel feature maps output by the translation operation unit and randomly changes their write-back memory addresses, realizing diversified combinations and obtaining randomly reorganized channel feature maps. Specifically, the data received this time is serially combined with the data received last time as the reorganized visual feature data. In the neural network computation, the unit must serially combine the previous result with the current result to obtain the final result; by controlling the offset addresses of the two results during serial combination, the module realizes channel adjustment and information exchange. To improve efficiency, the unit ensures that while a result is being computed, the CPU can simultaneously finish copying the previous result, and then performs concatenated storage according to the offset addresses.
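The serial combination with offset-address write-back can be sketched as concatenating the previous and current channel results, then writing each channel to a permuted destination slot. The `offsets` permutation stands in for the randomly chosen write-back addresses mentioned above:

```python
def shuffle_concat(prev_result, curr_result, offsets):
    """Random-reorganization sketch: serially combine the previous and
    current channel lists, then scatter each source channel to the
    destination slot given by the offset-address table."""
    combined = prev_result + curr_result  # serial concatenation
    out = [None] * len(combined)
    for src, dst in enumerate(offsets):   # offset-address write-back
        out[dst] = combined[src]
    return out
```

Interleaving channels from the two results in this way is what realizes the channel adjustment and information exchange.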
And the full connection unit is used for receiving the channel characteristic diagram output by the random recombination operation unit and mapping the channel characteristic diagram to the convolution operation unit for operation to obtain a characteristic vector corresponding to the perception information.
The visual-content-based intelligent perception computing processing unit adopts a quad-core 64-bit ARM Cortex-A72 CPU with a clock frequency of up to 1.5 GHz. Further, as shown in fig. 3, the underground coal mine intelligent visual perception terminal is configured with: a wired transmission interface and a wireless transmission interface for inputting surveillance video images and outputting perception result information; the wired transmission interface includes a wired network interface, an HDMI video output interface, 2 USB interfaces, a Micro-SD card slot and SDRAM memory; the wireless transmission interface includes a dual-band wireless network interface and a Bluetooth interface. The terminal is thus fully equipped with SDRAM memory, a Micro-SD card slot and USB interfaces for loading the operating system and storing data; it supports gigabit Ethernet, has a 2.4 GHz/5 GHz dual-band 802.11ac wireless network, supports Bluetooth 5.0, provides 2 USB 3.0 interfaces, and supports a high-definition HDMI video output interface.
Optionally, the underground coal mine intelligent visual perception terminal further includes: an intrinsically safe power supply management module, used to supply electric energy to the camera and the visual-content-based intelligent perception computing processing unit. The intrinsically safe power supply management module ensures that the electrical characteristics of the terminal system meet the rigid intrinsic-safety requirements of underground coal mine work, so that the terminal's electric and thermal energy remain low enough not to ignite explosive gas. The principle is as follows: by limiting circuit parameters or taking protective measures, the spark-discharge energy and thermal energy of the circuit are limited, so that electric sparks and thermal effects produced under normal operation and under specified fault conditions cannot ignite explosive mixtures in the surrounding environment, thereby achieving electrical intrinsic safety.
Some embodiments of the present invention further provide a method for intelligent visual perception of an underground coal mine, as shown in fig. 4, including the following steps:
s101: constructing a model network structure space, training models in the model network structure space by using sample data, and taking the trained model set as a set of models to be selected, wherein the sample data comprises a monitoring video image and an annotation data label in the monitoring video image.
S102: and selecting a specific model which is matched with the actual visual perception application hardware environment from the set of candidate models.
S103: and inputting the practical application monitoring video image into the selected model network, and obtaining a perception result corresponding to the practical application monitoring video image according to the inference result of the selected model network.
Specifically, the construction and deployment of the terminal perception algorithm model are shown in fig. 5. In the left half of fig. 5, K represents the convolution kernel size (Kmax is its maximum value), D represents the network depth (Dmax is its maximum value), and W represents the channel number (Wmax is its maximum value). When the terminal perception algorithm model is built, input visual images of different resolutions are supported. The specific steps are as follows:
(1) Define the model network sample space. According to the network model structure, define the set of convolution kernel sizes {K1, K2, ..., Kmax}, e.g., {3, 5, 7, 9}, i.e., kernel sizes {3x3, 5x5, 7x7, 9x9}; the set of network depths {D1, D2, ..., Dmax}, e.g., {4, 6, 8}; the set of channel numbers {W1, W2, ..., Wmax}, e.g., {3, 4, 6}; and the set of visual image resolutions {R1, R2, ..., Rmax}, e.g., {256, 264, ..., Resolution}, where Resolution denotes the source image resolution.
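Using the example sets from step (1), the (K, D, W) portion of the sample space can be enumerated directly; resolution is omitted here because its list is truncated in the description:

```python
import itertools

# Example sets from step (1); illustrative values only.
KERNEL_SIZES = [3, 5, 7, 9]  # K: 3x3 ... 9x9 kernels
DEPTHS = [4, 6, 8]           # D: number of layers
WIDTHS = [3, 4, 6]           # W: number of channels

def network_sample_space():
    """Enumerate every (K, D, W) sub-network configuration in the space."""
    return list(itertools.product(KERNEL_SIZES, DEPTHS, WIDTHS))
```

Even these small example sets already give 4 x 3 x 3 = 36 sub-network structures, which is why the adaptive training in steps (2)-(3) shares weights rather than training each structure from scratch.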
(2) Initially train the maximum-scale master model network. The maximum-scale model network is constructed automatically through training and learning: its convolution kernel size is the maximum Kmax in the kernel set, its network depth is the maximum layer count Dmax in the depth set, its channel number is the maximum Wmax in the channel set, and its visual image resolution is the maximum Rmax in the resolution set.
(3) Adaptively train to construct the sub-model network set. Starting from the maximum-scale master network structure, gradually shrink the network scale and exhaust the model network structure space, following the principle of avoiding correlated inference among network models, to complete adaptive training of all sub-model networks and build the model network set in a single training pass. Weights are shared between the master model network and the sub-model networks.
(4) Through this adaptive iterative machine learning, a set of perception sub-model networks suited to diversified hardware platforms is finally obtained, meeting the requirement of deployment on different hardware systems.
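Weight sharing between the master network and its sub-networks is commonly realized by slicing sub-network kernels out of the master kernels. This NumPy sketch shows center-slicing as one plausible scheme; the patent does not specify the slicing rule, so this is an assumption:

```python
import numpy as np

def sample_subnet_kernel(master_kernel, k, c):
    """Weight-sharing sketch: a sub-network's k x k kernel over c
    channels is taken as a centered view of the master network's
    largest kernel, so training sub-models reuses (and updates)
    the same underlying master weights."""
    kmax, _, cmax = master_kernel.shape
    assert k <= kmax and c <= cmax
    start = (kmax - k) // 2
    return master_kernel[start:start + k, start:start + k, :c]
```

Because NumPy basic slicing returns a view, gradient updates written through the sub-kernel would also update the master kernel, which is the point of one-pass set construction.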
(5) In the model network terminal pre-deployment stage, a pre-trained prediction guider optimally matches and selects the specific model from the model network set and imports it onto the terminal platform where the actual visual intelligent perception application runs.
After the model network set construction and the terminal pre-deployment preparation are complete, the personalized deployment of the visual perception algorithm model on the underground terminal system platform is carried out, realizing in-situ visual intelligent perception. As shown in the right half of fig. 5, the main steps are as follows:
(1) Initialize the application: check and confirm the environment required for the current program to run, and load and parse the configuration parameters required for operation, such as optional inference parameters like batch_size;
(2) Reading a specific model network required by a loading terminal;
(3) Creating an inference engine according to the loaded model network;
(4) Performing edge reasoning on the terminal system according to the requirements of a reasoning engine;
(5) And outputting a visual intelligent perception application result.
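Steps (1)-(5) of the deployment-stage application can be sketched as a small driver. The `engine_factory` callable, the JSON config format, and the byte-level model loading are assumptions standing in for whatever runtime the terminal actually uses:

```python
import json

def run_perception_app(config_path, model_path, frames, engine_factory):
    """Deployment sketch following steps (1)-(5): load run config,
    read the selected model, create an inference engine, run edge
    inference per frame, and yield perception results."""
    with open(config_path) as f:               # (1) init: load run config
        cfg = json.load(f)                     #     e.g. {"batch_size": 1}
    with open(model_path, "rb") as f:          # (2) read the specific model
        model_bytes = f.read()
    engine = engine_factory(model_bytes, cfg)  # (3) create inference engine
    for frame in frames:                       # (4) edge inference on terminal
        yield engine(frame)                    # (5) output perception result
```

Passing the engine in as a factory keeps the driver independent of any particular inference runtime, matching the one-model-set, many-platforms deployment goal.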
Aiming at the objective deficiencies of remote visual intelligent monitoring video systems built on the existing ground data center architecture, this scheme provides an underground coal mine intelligent visual perception terminal and system method based on the co-design of hardware structure and software algorithm. First, a terminal with underground edge-computing inference capability is designed on an FPGA, endowing the mine with in-situ visual intelligent perception and reducing data transmission latency. Second, on top of the terminal hardware system, a lightweight algorithm model suited to the underground terminal hardware environment is constructed through adaptive machine learning, overcoming the hardware-efficiency limits on deploying algorithm models underground: the algorithm model is constructed once and satisfies deployment and inference application on diversified hardware platforms.
A methodology for terminal perception algorithm model construction, deployment and perception application is thus provided. The lightweight algorithm model suited to the underground terminal hardware environment is built through adaptive machine learning, solving the hardware-efficiency limits of underground deployment; the model is constructed once and serves diversified hardware platforms. In the deployment-inference stage, the system selects a specific model network for deployment, inference and perception application according to the hardware platform and its constraints.
Some embodiments of the present invention provide a storage medium, which may be a computer-usable storage medium (including but not limited to disk memory, CD-ROM, optical memory, etc.), in which program instructions are stored; after the program instructions are read by a computer, the computer executes the method for intelligent visual perception in an underground coal mine according to any of the above technical solutions.
Fig. 6 is a schematic diagram of the hardware structure of an electronic device for executing the method for intelligent visual perception in an underground coal mine provided by the embodiment. The device includes one or more processors 201 and a memory 202, with one processor 201 illustrated in Fig. 6. The device executing the method may further include an input device 203 and an output device 204. The processor 201, the memory 202, the input device 203 and the output device 204 may be connected by a bus or other means; Fig. 6 takes the bus connection as an example. The memory 202, as a non-volatile computer-readable storage medium, may store non-volatile software programs, non-volatile computer-executable programs, and modules. By running the non-volatile software programs, instructions and modules stored in the memory 202, the processor 201 executes the various functional applications and data processing of the server, thereby implementing the method for intelligent visual perception in the underground coal mine of the method embodiment.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A model reasoning accelerator realized based on FPGA is characterized by comprising:
the convolution operation unit is used for performing matrix-matrix multiplication between the monitoring video image and the model network weight kernels to obtain a high-bit-width convolution operation feature map;
the data conversion unit is used for converting the high-bit-width convolution operation feature map data to obtain low-bit-width feature map data;
the pooling operation unit is used for receiving the low-bit-width feature map output by the data conversion unit and obtaining, over each pooling kernel range, a result according to a specified type of function, as the pooled output feature map;
the translation operation unit translates the pooled output feature map along a plurality of directions, so that neighborhood values are copied directly to the center-point position, obtaining a number of channel feature maps equal to the number of displacement directions;
the random reorganization operation unit is used for receiving the plurality of channel feature maps output by the translation operation unit and randomly changing the write-back memory addresses of the channel feature maps, realizing diversified combinations of the channel feature maps and obtaining randomly reorganized channel feature maps;
and the fully connected unit is used for receiving the channel feature maps output by the random reorganization operation unit and mapping them to the convolution operation unit for operation, obtaining a feature vector corresponding to the perception information.
2. The FPGA-based model reasoning accelerator of claim 1, wherein:
the data conversion unit includes, but is not limited to, being implemented by a comparator.
3. The FPGA-based model reasoning accelerator of claim 2, wherein:
and the comparator performs byte-width conversion on each convolution operation result in the convolution operation unit, reducing the byte width of the convolution operation result to obtain a conversion result, and maps the conversion results into a binary tree structure.
4. The FPGA-based model reasoning accelerator of claim 1, wherein:
the pooling operation unit and the translation operation unit are both implemented with a line-buffer design.
5. The FPGA-based model reasoning accelerator of claim 1, wherein:
and the random reorganization operation unit performs the random reorganization operation serially on the currently received data and the previously received data.
6. An intelligent visual perception terminal for an underground coal mine, characterized by comprising a camera and an intelligent perception calculation processing unit based on visual content, wherein the camera sends the captured monitoring video image to the intelligent perception calculation processing unit for processing, and the intelligent perception calculation processing unit comprises the FPGA-based model reasoning accelerator according to any one of claims 1-5.
7. The intelligent visual perception terminal for the underground coal mine according to claim 6, further comprising:
the wired transmission interface and the wireless transmission interface, used for inputting the monitoring video image and outputting the perception result information; the wired transmission interface comprises a wired network interface, an HDMI video data output interface, two USB interfaces, a Micro-SD card slot and an SDRAM memory; the wireless transmission interface comprises a dual-band wireless network interface and a Bluetooth interface.
8. The intelligent visual perception terminal for the underground coal mine according to claim 7, further comprising:
and the intrinsically safe power supply management module, used for supplying electric energy to the camera and the intelligent perception calculation processing unit based on visual content.
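The datapath recited in claim 1 can be illustrated with a minimal 1-D sketch: convolution, bit-width conversion, pooling, translation, channel shuffle, and a fully connected mapping. The array sizes, the fixed shuffle permutation, and the choice of max pooling and saturating conversion are illustrative assumptions, not details fixed by the claims.

```python
# Minimal 1-D sketch of the claim-1 datapath. All concrete sizes, kernels
# and the permutation are invented for illustration.

def convolve(signal, kernel):
    """Convolution unit: sliding dot product (high-bit-width results)."""
    n = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(n))
            for i in range(len(signal) - n + 1)]

def requantize(values, bits=8):
    """Data conversion unit: saturate each value to a low-bit-width signed
    range, as a comparator-based implementation would."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [max(lo, min(hi, v)) for v in values]

def max_pool(values, size=2):
    """Pooling unit: a max function applied over each pooling window."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def translate(values, shifts=(-1, 1)):
    """Translation unit: copy the neighbour at each shift to the centre,
    yielding one channel per shift direction (edges clamped)."""
    n = len(values)
    return [[values[min(max(i + s, 0), n - 1)] for i in range(n)]
            for s in shifts]

def shuffle_channels(channels, perm):
    """Random-reorganization unit: reorder the channel write-back order."""
    return [channels[p] for p in perm]

def fully_connect(channels, weights):
    """Fully connected unit: map the flattened channels through weights."""
    flat = [v for ch in channels for v in ch]
    return sum(v * w for v, w in zip(flat, weights))

feat   = convolve([1, 2, 3, 4, 5, 6], kernel=[100, 1])   # high bit width
feat   = requantize(feat, bits=8)                        # saturates to 127
pooled = max_pool(feat, size=2)
chans  = translate(pooled, shifts=(-1, 1))
chans  = shuffle_channels(chans, perm=[1, 0])            # fixed, for illustration
vec    = fully_connect(chans, weights=[1, 1, 1, 1])
print(vec)
```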
CN202211111792.0A 2020-09-27 2020-09-27 Model reasoning accelerator realized based on FPGA and intelligent visual perception terminal Pending CN115550607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211111792.0A CN115550607A (en) 2020-09-27 2020-09-27 Model reasoning accelerator realized based on FPGA and intelligent visual perception terminal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011035175.8A CN112153347B (en) 2020-09-27 2020-09-27 Coal mine underground intelligent visual terminal sensing method, storage medium and electronic equipment
CN202211111792.0A CN115550607A (en) 2020-09-27 2020-09-27 Model reasoning accelerator realized based on FPGA and intelligent visual perception terminal

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202011035175.8A Division CN112153347B (en) 2020-09-27 2020-09-27 Coal mine underground intelligent visual terminal sensing method, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115550607A 2022-12-30

Family

ID=73895398

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211111792.0A Pending CN115550607A (en) 2020-09-27 2020-09-27 Model reasoning accelerator realized based on FPGA and intelligent visual perception terminal
CN202011035175.8A Active CN112153347B (en) 2020-09-27 2020-09-27 Coal mine underground intelligent visual terminal sensing method, storage medium and electronic equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202011035175.8A Active CN112153347B (en) 2020-09-27 2020-09-27 Coal mine underground intelligent visual terminal sensing method, storage medium and electronic equipment

Country Status (1)

Country Link
CN (2) CN115550607A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657228A (en) * 2021-08-06 2021-11-16 北京百度网讯科技有限公司 Data processing method, device and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN106203621A (en) * 2016-07-11 2016-12-07 姚颂 The processor calculated for convolutional neural networks
CN109034371A (en) * 2018-06-27 2018-12-18 北京文安智能技术股份有限公司 A kind of deep learning model reasoning phase accelerated method, apparatus and system
CN110209627A (en) * 2019-06-03 2019-09-06 山东浪潮人工智能研究院有限公司 A kind of hardware-accelerated method of SSD towards intelligent terminal
US20190279075A1 (en) * 2018-03-09 2019-09-12 Nvidia Corporation Multi-modal image translation using neural networks
CN110458279A (en) * 2019-07-15 2019-11-15 武汉魅瞳科技有限公司 A kind of binary neural network accelerated method and system based on FPGA
CN110910434A (en) * 2019-11-05 2020-03-24 东南大学 Method for realizing deep learning parallax estimation algorithm based on FPGA (field programmable Gate array) high energy efficiency

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
WO2020041237A1 (en) * 2018-08-20 2020-02-27 Newton Howard Brain operating system
DE112019004943T5 (en) * 2018-10-31 2021-07-01 Movidius Ltd. Automated generation of neural networks
CN110135289A (en) * 2019-04-28 2019-08-16 北京天地玛珂电液控制系统有限公司 A kind of underground coal mine intelligent use cloud service platform based on deep learning
CN111208759B (en) * 2019-12-30 2021-02-02 中国矿业大学(北京) Digital twin intelligent monitoring system for unmanned fully mechanized coal mining face of mine
CN111709522B (en) * 2020-05-21 2022-08-02 哈尔滨工业大学 Deep learning target detection system based on server-embedded cooperation
CN111626414B (en) * 2020-07-30 2020-10-27 电子科技大学 Dynamic multi-precision neural network acceleration unit


Also Published As

Publication number Publication date
CN112153347B (en) 2023-04-07
CN112153347A (en) 2020-12-29

Similar Documents

Publication Publication Date Title
CN109785289B (en) Transmission line defect detection method and system and electronic equipment
CN109934222A (en) A kind of insulator chain self-destruction recognition methods based on transfer learning
CN115146883B (en) Management and control method and system for intelligent construction of building engineering
CN111862323A (en) Multi-element pregnant disaster digital twin intelligent perception identification early warning system and method
CN108764456B (en) Airborne target identification model construction platform, airborne target identification method and equipment
CN106530285B (en) A kind of transmission line part recognition methods based on GPU and the processing of CPU blended data
CN113436184B (en) Power equipment image defect discriminating method and system based on improved twin network
CN116825227B (en) Perovskite component proportion analysis method and device based on depth generation model
CN112153347B (en) Coal mine underground intelligent visual terminal sensing method, storage medium and electronic equipment
CN110992307A (en) Insulator positioning and identifying method and device based on YOLO
CN115527036A (en) Power grid scene point cloud semantic segmentation method and device, computer equipment and medium
CN114491200A (en) Method and device for matching heterogeneous interest points based on graph neural network
CN110503135A (en) Deep learning model compression method and system for the identification of power equipment edge side
CN114494830A (en) Multi-source information photovoltaic map generation method and device
CN113536944A (en) Distribution line inspection data identification and analysis method based on image identification
CN112015187B (en) Semantic map construction method and system for intelligent mobile robot
CN116503572A (en) Intelligent recruitment platform and space modeling method thereof
EP4258162A1 (en) Placement position acquisition method, model training method, and related devices
CN116543191A (en) Multi-source remote sensing data self-adaptive fusion ground object classification method
Yao et al. Research on technology of autonomous inspection system for UAV based on improved YOLOv4
CN114170515A (en) Power distribution network equipment abnormity type detection method
CN114399159A (en) Engineering field progress identification method based on full-time-space monitoring
CN114170291A (en) Power equipment inspection method
CN114626666B (en) Engineering site progress identification system based on full-time space monitoring
Li et al. Visual perception system design for rock breaking robot based on multi-sensor fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination