CN113111735A - Rapid scene recognition method and device under complex environment - Google Patents
- Publication number
- CN113111735A (application CN202110317587.9A)
- Authority
- CN
- China
- Prior art keywords
- scene recognition
- recognition model
- scene
- model
- converged
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/00 — Image or video recognition or understanding: scenes; scene-specific elements
- G06F18/214 — Pattern recognition: design or setup of recognition systems; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Pattern recognition: fusion techniques of extracted features
- G06N3/04 — Computing arrangements based on biological models: neural networks; architecture, e.g. interconnection topology
- G06N3/08 — Computing arrangements based on biological models: neural networks; learning methods
Abstract
The invention discloses a rapid scene recognition method and device for complex environments. The method is applied to a rapid scene recognition system for complex environments that includes an image acquisition device, and comprises the following steps: obtaining first image data through the image acquisition device; constructing a first scene recognition model through a GoogleNet network; training the network parameters of the first scene recognition model on a self-built five-class terrain data set to obtain the converged first scene recognition model; porting and deploying the converged first scene recognition model to the embedded end through TensorRT; and inputting the first image data into the deployed converged first scene recognition model and performing forward inference to obtain a recognition result. This solves the technical problems of the prior art that sensor-based recognition is prone to device error, that the recognition rate for complex scenes is low, that recognition is slow, and that the approach is unsuitable for real-time applications.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a method and device for rapid scene recognition in complex environments.
Background
Scene recognition is an image processing task: it judges, within an image, the type of place the scene depicts, together with accurate geographic position coordinates, so the recognition result can be used for subsequent positioning. However, in situations that place high demands on maneuvering capability, such as high-altitude jumps, the terrain changes quickly during the jump and must be assessed in real time; the traditional approach of stacking multiple sensors and fusing their observations to complete scene recognition has difficulty evaluating complex terrain and feeding the result back in real time.
However, in implementing the technical solution of the invention in the embodiments of the present application, the inventors found that the above technology has at least the following technical problems:

sensor-based recognition in the prior art is prone to device error, its recognition rate for complex scenes is low, its recognition speed is slow, and it is unsuitable for real-time applications.
Disclosure of Invention
The embodiments of the present application provide a method and device for rapid scene recognition in complex environments. They solve the prior-art technical problems that sensor-based recognition is prone to device error, that the recognition rate for complex scenes is low, that recognition is slow, and that the approach is unsuitable for real-time applications. By using an efficient deep neural network to deploy a high-accuracy scene recognition model on the embedded end, they achieve the technical effects of real-time, accurate, and rapid scene recognition.
In view of the foregoing problems, the embodiments of the present application provide a method and an apparatus for fast scene recognition in a complex environment.
In a first aspect, an embodiment of the present application provides a method for rapid scene recognition in a complex environment, where the method is applied to a rapid scene recognition system for complex environments that includes an image acquisition device. The method comprises: obtaining first image data through the image acquisition device; constructing a first scene recognition model through a GoogleNet network; training the network parameters of the first scene recognition model on a self-built five-class terrain data set to obtain the converged first scene recognition model; porting and deploying the converged first scene recognition model to the embedded end through TensorRT; and inputting the first image data into the deployed converged first scene recognition model and performing forward inference to obtain a recognition result.
In another aspect, the present application further provides a device for rapid scene recognition in a complex environment, the device comprising: a first obtaining unit for obtaining first image data through the image acquisition device; a first construction unit for constructing a first scene recognition model through a GoogleNet network; a second obtaining unit for training the network parameters of the first scene recognition model on a self-built five-class terrain data set to obtain the converged first scene recognition model; a first operation unit for porting and deploying the converged first scene recognition model to the embedded end through TensorRT; and a third obtaining unit for inputting the first image data into the deployed converged first scene recognition model and performing forward inference to obtain a recognition result.
In a third aspect, the present invention provides a fast scene recognition apparatus in a complex environment, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of the first aspect when executing the program.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
The camera acquires image data through photoelectric imaging, and the lidar acquires distance information by receiving reflected laser light and encodes it into point cloud data packets, which are transmitted to the core processor through the USB interface, the Ethernet interface, and the data bus for processing. A sparse network structure with high computational performance is built on the GoogleNet network architecture, the first scene recognition model is trained on a self-built five-class terrain data set until it converges, the converged model is then ported and deployed to the embedded end, and forward inference on the deployed model yields the recognition result. The efficient deep neural network thus enables embedded deployment of a high-accuracy scene recognition model and satisfies the real-time, accuracy, and speed requirements of scene recognition.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Fig. 1 is a schematic flowchart of a method for fast scene recognition in a complex environment according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a fast scene recognition apparatus in a complex environment according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an exemplary electronic device according to an embodiment of the present application.
Description of reference numerals: a first obtaining unit 11, a first constructing unit 12, a second obtaining unit 13, a first operating unit 14, a third obtaining unit 15, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304, a bus interface 305.
Detailed Description
The embodiments of the present application provide a method and device for rapid scene recognition in complex environments. They solve the prior-art technical problems that sensor-based recognition is prone to device error, that the recognition rate for complex scenes is low, that recognition is slow, and that the approach is unsuitable for real-time applications, and they achieve the technical effects of deploying a high-accuracy scene recognition model on the embedded end with an efficient deep neural network, satisfying the real-time, accuracy, and speed requirements of scene recognition. Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are merely some, not all, embodiments of the present application, and it should be understood that the present application is not limited to the example embodiments described herein.
Summary of the application
Scene recognition is an image processing task: it judges, within an image, the type of place the scene depicts, together with accurate geographic position coordinates, so the recognition result can be used for subsequent positioning. However, in situations that place high demands on maneuvering capability, such as high-altitude jumps, the terrain changes quickly during the jump; the traditional approach of stacking multiple sensors and fusing their observations to complete scene recognition has difficulty evaluating complex terrain and feeding the result back in real time. Moreover, in the prior art, sensor-based recognition is prone to device error, its recognition rate for complex scenes is low, its recognition speed is slow, and it is unsuitable for real-time applications.
In view of the above technical problems, the technical solution provided by the present application has the following general idea:
the embodiment of the application provides a rapid scene identification method under a complex environment, wherein the method is applied to a rapid scene identification system under the complex environment, the system comprises an image acquisition device, and the method comprises the following steps of; obtaining first image data by the image acquisition device; constructing a first scene recognition model through a GoogleNet network; network parameter training is carried out on the first scene recognition model on a self-built five-type terrain data set, and the first scene recognition model after convergence is obtained; carrying out transplantation deployment of the converged first scene recognition model at an embedded end through TensorRT; inputting the first image data into the converged first scene recognition model after deployment, and performing forward reasoning to obtain a recognition result. The embedded end deployment of the scene recognition model with high accuracy is realized by utilizing the efficient deep neural network, and the technical effects of real-time performance, accuracy and rapidity of scene recognition are met.
Having thus described the general principles of the present application, various non-limiting embodiments thereof will now be described in detail with reference to the accompanying drawings.
Example one
As shown in fig. 1, an embodiment of the present application provides a method for rapid scene recognition in a complex environment, where the method is applied to a rapid scene recognition system for complex environments that includes an image acquisition device, and the method includes:
step S100: obtaining first image data by the image acquisition device;
specifically, the image acquisition device comprises a camera module and a laser radar, wherein after the camera module forms an image through a camera, an optical signal captured from the outside is converted into an electric signal to be processed; the laser radar collects outside distance information by receiving reflected laser, encodes the collected information into a point cloud data packet and processes the point cloud data packet. The first image data is the integration of the camera module and the data collected by the laser radar. The camera module and the laser radar need to initialize and set related parameters through a control bus and a USB interface Ethernet interface according to a processor after acquiring related information. Furthermore, the image data can be subjected to advanced relevant preprocessing and then to next data processing, the corona data packet can be analyzed and then to relevant calculation, and in addition, all data information obtained by the image acquisition device can be integrated and then transmitted.
Step S200: constructing a first scene recognition model through a GoogleNet network;
specifically, the first scene recognition model is constructed and realized based on the GoogleNet network, wherein the construction based on the GoogleNet network is based on obtaining a high-quality scene model so as to increase the depth or the width of the model, and in detail, the GoogleNet network constructs a 'basic neuron' inclusion structure to construct a sparse and high-computing-performance network structure. Generally speaking, since general convolution is converted into sparse connection, the calculation efficiency is not high, and therefore, the main idea of the inclusion structure is to find out an approximate optimal local sparse structure, so that optimization operation should be completed by classification, and therefore the technical effects of accurately constructing the first scene identification model and improving the accuracy of processing data by the first scene identification model are achieved.
Step S300: network parameter training is carried out on the first scene recognition model on a self-built five-type terrain data set, and the first scene recognition model after convergence is obtained;
specifically, the detail level generated by the terrain data set for improving efficiency is generated by a point reduction or point refinement process based on a terrain pyramid due to the complexity of a flight scene, so that the number of measurement values required for representing the surface of a given area is reduced for research, not only the basic features of the terrain are maintained, but also the diversity accuracy of training data is ensured, and since the purpose of the network is to classify, the loss function optimized in the network parameter training process is a cross entropy function of a predicted value and a true value:
wherein p isiRepresenting the probability of the sample belonging to the ith class; y isiOnehot representation representing a sample label, y when a sample belongs to the category ii1, otherwise yi0; and c represents a sample label. And further. By training the network parameters of the first scene recognition model, the corresponding optimal scene recognition accuracy can be obtained, and the technical effect of ensuring high-precision recognition of the first scene recognition model is further achieved.
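As a concrete reading of the loss above, here is a minimal C++ sketch, assuming the softmax output p and the one-hot label y are already available as equal-length vectors:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cross entropy between a softmax output p and a one-hot label y;
// the sum runs over the classes, and only the true class contributes.
double crossEntropy(const std::vector<double>& p, const std::vector<double>& y) {
    double loss = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        loss -= y[i] * std::log(p[i] + 1e-12);  // epsilon guards log(0)
    }
    return loss;  // equals -log(p_c) when y is one-hot with true class c
}
```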
Step S400: carrying out transplantation deployment of the converged first scene recognition model at an embedded end through TensorRT;
specifically, the TensorRT is a high-performance deep learning inference optimizer, and can provide low-delay and high-throughput deployment inference for deep learning applications. After the server end completes model training, the model obtained through training needs to be transplanted to the embedded end. The TensorRT is adopted to carry out transplantation and deployment of the network at the embedded end, and the reasoning speed of the scene recognition model can be accelerated. The designed embedded device needs to have the functions of acquiring image data, acquiring laser radar data, transmitting, carrying and the like. In detail, after the scene model is trained, the information of the first scene model is saved, and the weight of the model is saved in a binary form. And the wts file with the model structure information and the binary weight parameters is moved to the embedded end, and the next processing is completed based on the embedded end. The method mainly comprises the steps of reconstructing a model structure, combining some operations together, wherein the operation combination comprises vertical combination and horizontal combination, the vertical combination is to combine Conv, BN and Relu three layers of the current mainstream neural network structure into one layer, and the horizontal combination is to combine the layers which are input into the same tensor and execute the same operation together. Therefore, the technical effect of accelerating the reasoning speed of the scene recognition model is achieved.
Step S500: inputting the first image data into the deployed converged first scene recognition model, and performing forward inference to obtain a recognition result.
Specifically, model inference is then completed on the embedded end. Before inference, the first scene recognition model is converted into a serialized format and saved; when the model needs to run, it can be loaded directly through deserialization, which greatly shortens program initialization time. Because the first scene recognition model is ported and deployed in this way, recognition latency stays low during a flight scenario, making the method suitable for highly real-time applications.
Further, where the first scene recognition model is constructed through a GoogleNet network, step S200 in the embodiment of the present application further includes:
step S210: obtaining an inclusion structure according to the GoogleNet network;
step S220: extracting features by utilizing convolution kernels with different sizes through a plurality of branches according to the inclusion structure to obtain feature maps with different sizes;
step S230: and performing feature fusion according to the feature graphs of different sizes to obtain the first scene recognition model.
Specifically, since the main idea of the Inception structure is to find an approximately optimal local sparse structure, features are extracted by multiple branches using convolution kernels of different sizes, yielding feature maps with receptive fields of different sizes, and the feature maps are finally spliced together to fuse features of different scales. The Inception structure thus increases the width of the network on one hand and its adaptability to scale on the other: extracting features through convolution kernels of different sizes implies receptive fields of different sizes, and the final splicing implies the fusion of features at different scales. By setting the convolution kernels appropriately, features of matching dimensions are obtained and can be spliced directly together, producing the approximately optimal local sparse structure from which the first scene recognition model is obtained. To avoid the feature maps having too many channels, 1x1 convolution kernels are applied for dimensionality reduction. The convolutional layers can extract information while reducing overfitting, and the added nonlinearity increases the network's expressive power, achieving the technical effect of higher operating speed while processing more and richer spatial features; a sketch of such a block follows.
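Here is a minimal sketch of one such multi-branch block, written against the TensorRT C++ network-definition API that the deployment steps below rely on; the branch widths, weight-map keys, and name prefix are illustrative assumptions, not values from the patent:

```cpp
#include <NvInfer.h>
#include <map>
#include <string>

using namespace nvinfer1;

// Inception-style block: parallel 1x1 / 3x3 / pool+1x1 branches whose outputs
// keep the same spatial size and are concatenated along the channel axis.
ITensor* inceptionBlock(INetworkDefinition* net,
                        std::map<std::string, Weights>& w,
                        ITensor& input, const std::string& p) {
    // Branch 1: 1x1 convolution (small receptive field).
    auto* c1 = net->addConvolutionNd(input, 64, DimsHW{1, 1},
                                     w[p + ".b1.weight"], w[p + ".b1.bias"]);
    auto* r1 = net->addActivation(*c1->getOutput(0), ActivationType::kRELU);

    // Branch 2: 3x3 convolution, padded so its output matches branch 1.
    auto* c2 = net->addConvolutionNd(input, 128, DimsHW{3, 3},
                                     w[p + ".b2.weight"], w[p + ".b2.bias"]);
    c2->setPaddingNd(DimsHW{1, 1});
    auto* r2 = net->addActivation(*c2->getOutput(0), ActivationType::kRELU);

    // Branch 3: 3x3 max pooling followed by a 1x1 convolution.
    auto* pool = net->addPoolingNd(input, PoolingType::kMAX, DimsHW{3, 3});
    pool->setStrideNd(DimsHW{1, 1});
    pool->setPaddingNd(DimsHW{1, 1});
    auto* c3 = net->addConvolutionNd(*pool->getOutput(0), 32, DimsHW{1, 1},
                                     w[p + ".b3.weight"], w[p + ".b3.bias"]);
    auto* r3 = net->addActivation(*c3->getOutput(0), ActivationType::kRELU);

    // Splice the branches: same spatial size, fused receptive-field scales.
    ITensor* branches[] = {r1->getOutput(0), r2->getOutput(0), r3->getOutput(0)};
    return net->addConcatenation(branches, 3)->getOutput(0);
}
```

The concatenation is what requires the padding above: all branches must agree in height and width so that only the channel count differs.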
Further, for the porting and deployment of the converged first scene recognition model on the embedded end through TensorRT, step S400 in this embodiment of the present application further includes:
step S410: generating wts files of the converged first scene recognition model at a server side;
step S420: moving the wts file to an embedded end;
step S430: defining, by the embedded end, the first scene recognition model definition layer and/or structure information;
step S440: loading and analyzing the wts file to obtain model structure information and weight parameters;
step S450: and deploying and creating the first scene recognition model according to the model structure information and the weight parameters.
Specifically, after the scene model is trained on the server side, the model information is saved and the model weights are stored in binary form. The wts file holding the model structure information and binary weight parameters is then moved to the embedded end, where the model can be rewritten: since TensorRT is a C++ library providing a C++ API, the classic layers such as convolution, deconvolution, full connection, and softmax all have corresponding implementations in TensorRT. Specifically, the relevant layers or structures are defined using the TensorRT API, and the wts file holding the model structure information and weight parameters is loaded and parsed to obtain the model's structure information and weight parameters, which are stored in a map container. The predefined layers or structures are then called according to the model's structure information, the model weight parameters are loaded, and the model is created, realizing the embedded deployment of the rapid scene recognition device and its high-accuracy scene recognition model; a sketch of the weight-file parsing follows.
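As one concrete possibility, the parser could target the wts layout popularized by common PyTorch-to-TensorRT migration examples (first line: entry count; each entry: name, value count, hex-encoded 32-bit floats). The patent does not fix the exact file format, so this layout is an assumption:

```cpp
#include <NvInfer.h>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

using namespace nvinfer1;

// Parse the .wts file into a name -> Weights map; the raw buffers must stay
// alive for as long as the engine is being built, so they are not freed here.
std::map<std::string, Weights> loadWeights(const std::string& file) {
    std::map<std::string, Weights> weightMap;
    std::ifstream input(file);
    assert(input.is_open() && "unable to open .wts file");

    std::int32_t count = 0;
    input >> count;                        // number of weight tensors
    while (count-- > 0) {
        Weights wt{DataType::kFLOAT, nullptr, 0};
        std::string name;                  // e.g. "inception3a.b1.weight"
        std::uint32_t size = 0;
        input >> name >> std::dec >> size;

        auto* values = new std::uint32_t[size];  // hex-encoded IEEE-754 floats
        for (std::uint32_t i = 0; i < size; ++i) input >> std::hex >> values[i];

        wt.values = values;
        wt.count = size;
        weightMap[name] = wt;
    }
    return weightMap;
}
```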
Further, before inputting the first image data into the deployed converged first scene recognition model and performing forward inference to obtain a recognition result, step S500 in this embodiment of the present application further includes:
step S510: converting the first scene recognition model into an Engine file;
step S520: and storing the Engine file after serialization.
In particular, the process of building an Engine with TensorRT is often time-consuming, especially on embedded devices. Therefore, the first scene recognition model is converted into an Engine file from which the model can be loaded, and the generated Engine file is serialized and stored. When the model needs to run, it can be loaded directly through deserialization, which greatly shortens program initialization time, modularizes the program, and speeds up model loading, so the model responds faster when processing data; this achieves the technical effect of meeting the accuracy and speed requirements of scene recognition. A sketch of the build-and-serialize step follows.
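Below is a sketch of that step, again assuming the TensorRT 7-era C++ API; the commented gap is where the input, the Inception blocks from the sketch above, and the softmax output would be defined from the parsed weight map:

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <string>

using namespace nvinfer1;

// Minimal logger required by the TensorRT builder and runtime.
class Logger : public ILogger {
    void log(Severity s, const char* msg) noexcept override {
        if (s <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

void buildAndSaveEngine(const std::string& enginePath) {
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // ... addInput("data", ...), the Inception blocks, the softmax output,
    //     and markOutput(...) go here, created from the loaded weight map ...

    IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1 << 28);  // 256 MiB of build scratch space
    builder->setMaxBatchSize(1);           // one frame per inference
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);

    // Serialize once and persist, so later runs skip the costly build.
    IHostMemory* blob = engine->serialize();
    std::ofstream out(enginePath, std::ios::binary);
    out.write(static_cast<const char*>(blob->data()),
              static_cast<std::streamsize>(blob->size()));

    blob->destroy();
    engine->destroy();
    config->destroy();
    network->destroy();
    builder->destroy();
}
```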
Further, in an embodiment of the present invention, when inputting the first image data into the deployed converged first scene recognition model and performing forward inference to obtain a recognition result, step S500 further includes:
step S530: judging whether a device needs to operate the first scene recognition model or not;
step S540: and if the device needs to operate the first scene recognition model, loading the first scene recognition model through deserialization, and performing forward reasoning to obtain a recognition result.
In particular, since the modules are independent of one another, the program loads faster, and a module is loaded only when its corresponding function is requested, so updates can be applied to individual modules without affecting other parts of the program. Further, the core of deserialization is the saving and reconstruction of object state, and forward inference is a search that starts from the initial data and works toward the goal, aiming to find a path through the problem space. The forward inference process can be described as the inference engine exploring the knowledge base with the information provided, matching the constraint priorities against the given current state, and then producing an accurate recognition result. This achieves the technical effects of satisfying highly real-time scenarios, very high scene recognition accuracy, and fast, accurate recognition of complex descent scenes; the on-demand load-and-infer path is sketched below.
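Under the same TensorRT 7-era API assumptions as above, the on-demand path might look like the following; the buffer sizes and binding order must match the deployed model:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

using namespace nvinfer1;

namespace {
struct Logger : ILogger {
    void log(Severity s, const char* msg) noexcept override {
        if (s <= Severity::kWARNING) std::fprintf(stderr, "%s\n", msg);
    }
} gLogger;
}

// Deserialize the stored engine only when recognition is requested, run one
// synchronous forward pass, and return the index of the most probable class.
int recognize(const std::string& enginePath,
              const float* hostInput, std::size_t inputBytes,
              float* hostOutput, std::size_t outputBytes) {
    std::ifstream in(enginePath, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
    IExecutionContext* ctx = engine->createExecutionContext();

    // Binding 0: input image tensor; binding 1: class-probability output.
    void* buffers[2];
    cudaMalloc(&buffers[0], inputBytes);
    cudaMalloc(&buffers[1], outputBytes);
    cudaMemcpy(buffers[0], hostInput, inputBytes, cudaMemcpyHostToDevice);
    ctx->executeV2(buffers);  // forward inference
    cudaMemcpy(hostOutput, buffers[1], outputBytes, cudaMemcpyDeviceToHost);
    cudaFree(buffers[0]);
    cudaFree(buffers[1]);
    ctx->destroy();
    engine->destroy();
    runtime->destroy();

    // Recognition result: the terrain class with the highest probability.
    std::size_t n = outputBytes / sizeof(float);
    int best = 0;
    for (std::size_t i = 1; i < n; ++i)
        if (hostOutput[i] > hostOutput[best]) best = static_cast<int>(i);
    return best;
}
```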
To sum up, the method and the device for fast scene recognition in a complex environment provided by the embodiment of the present application have the following technical effects:
1. The camera acquires image data through photoelectric imaging, and the lidar acquires distance information by receiving reflected laser light and encodes it into point cloud data packets, which are transmitted to the core processor through the USB interface, the Ethernet interface, and the data bus for processing. The first scene recognition model is built on the GoogleNet network architecture and trained on a self-built five-class terrain data set until it converges; the converged model is then ported and deployed to the embedded end, and forward inference on the deployed model yields the recognition result. The efficient deep neural network thus enables embedded deployment of a high-accuracy scene recognition model and satisfies the real-time, accuracy, and speed requirements of scene recognition.
2. Because a sparse network structure with high computational performance is built on the GoogleNet network architecture and features are fused according to the Inception structure, the first scene recognition model has higher precision, achieving the technical effect of improving recognition speed.
3. Because the trained model is ported to the embedded end and the network is ported and deployed there based on TensorRT, the embedded hardware device designed for deep neural network deployment accelerates the inference speed of the scene recognition model with its powerful computing performance, so recognition is fast and the requirements of highly real-time applications can be met.
Example two
Based on the same inventive concept as the method for rapid scene recognition in a complex environment in the foregoing embodiment, the present invention further provides a device for rapid scene recognition in a complex environment, as shown in fig. 2, the device comprising:
a first obtaining unit 11, wherein the first obtaining unit 11 is used for obtaining first image data through the image acquisition device;
a first construction unit 12, wherein the first construction unit 12 is configured to construct a first scene recognition model through a GoogleNet network;
a second obtaining unit 13, where the second obtaining unit 13 is configured to perform network parameter training on the first scene recognition model on a self-established five-class terrain data set, and obtain the converged first scene recognition model;
a first operation unit 14, wherein the first operation unit 14 is configured to port and deploy the converged first scene recognition model to the embedded end through TensorRT;
a third obtaining unit 15, wherein the third obtaining unit 15 is configured to input the first image data into the deployed converged first scene recognition model and perform forward inference to obtain a recognition result.
Further, the apparatus further comprises:
a fourth obtaining unit, configured to obtain an Inception structure according to the GoogleNet network;

a fifth obtaining unit, configured to extract features through multiple branches using convolution kernels of different sizes according to the Inception structure, obtaining feature maps of different sizes;
a sixth obtaining unit, configured to perform feature fusion on the feature maps of different sizes to obtain the first scene recognition model.
Further, the apparatus further comprises:
a first generating unit, configured to generate, at a server side, an wts file of the converged first scene recognition model;
a first moving unit for moving the wts file to an embedded end;
a first defining unit for defining the layers and/or structure information of the first scene recognition model on the embedded end;
a seventh obtaining unit, configured to load and parse the wts file, and obtain model structure information and weight parameters;
a first creating unit, configured to deploy and create the first scene recognition model according to the model structure information and the weight parameter.
Further, the apparatus further comprises:
a first conversion unit for converting the first scene recognition model into an Engine file;
and the first storage unit is used for storing the Engine file after serialization.
Further, the apparatus further comprises:
a first judging unit, configured to judge whether a device needs to run the first scene recognition model;
and the eighth obtaining unit is configured to load the first scene recognition model through deserialization and perform forward inference to obtain a recognition result if the device needs to operate the first scene recognition model.
The variations and specific examples of the method for rapid scene recognition in a complex environment in the first embodiment of fig. 1 also apply to the device for rapid scene recognition in a complex environment of this embodiment. From the foregoing detailed description of the method, those skilled in the art can clearly understand how the device of this embodiment is implemented, so the details are omitted here for brevity of the description.
Exemplary electronic device
The electronic device of the embodiment of the present application is described below with reference to fig. 3.
Fig. 3 illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application.
Based on the inventive concept of the method for fast scene recognition in a complex environment in the foregoing embodiments, the present invention further provides a device for fast scene recognition in a complex environment, in which a computer program is stored, and when the computer program is executed by a processor, the steps of any one of the methods for fast scene recognition in a complex environment are implemented.
In fig. 3, a bus architecture (represented by bus 300) is shown. Bus 300 may include any number of interconnected buses and bridges, linking together various circuits including one or more processors, represented by processor 302, and memory, represented by memory 304. Bus 300 may also link various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. A bus interface 305 provides an interface between bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium.
The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing data used by the processor 302 in performing operations.
The embodiment of the invention provides a rapid scene recognition method for complex environments. The method is applied to a rapid scene recognition system for complex environments that includes an image acquisition device, and comprises the following steps: obtaining first image data through the image acquisition device; constructing a first scene recognition model through a GoogleNet network; training the network parameters of the first scene recognition model on a self-built five-class terrain data set to obtain the converged first scene recognition model; porting and deploying the converged first scene recognition model to the embedded end through TensorRT; and inputting the first image data into the deployed converged first scene recognition model and performing forward inference to obtain a recognition result. This solves the prior-art technical problems that sensor-based recognition is prone to device error, that the recognition rate for complex scenes is low, that recognition is slow, and that the approach is unsuitable for real-time applications; using the efficient deep neural network to deploy the high-accuracy scene recognition model on the embedded end achieves the technical effects of real-time, accurate, and rapid scene recognition.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. A rapid scene recognition method in a complex environment, applied to a rapid scene recognition system in a complex environment, the system comprising an image acquisition device, the method comprising the following steps:
obtaining first image data by the image acquisition device;
constructing a first scene recognition model through a GoogleNet network;
training the network parameters of the first scene recognition model on a self-built five-class terrain data set to obtain the converged first scene recognition model;

porting and deploying the converged first scene recognition model to an embedded end through TensorRT;

inputting the first image data into the deployed converged first scene recognition model, and performing forward inference to obtain a recognition result.
2. The method of claim 1, wherein said building a first scene recognition model through a GoogleNet network comprises:
obtaining an Inception structure according to the GoogleNet network;

extracting features through multiple branches using convolution kernels of different sizes according to the Inception structure, to obtain feature maps of different sizes;

and performing feature fusion on the feature maps of different sizes to obtain the first scene recognition model.
3. The method of claim 1, wherein the loss function optimized during training of the first scene recognition model is a cross entropy function of the predicted values and the true values:

L = -Σ_i y_i · log(p_i) = -log(p_c)

wherein p_i represents the probability that the sample belongs to the i-th class; y_i is the one-hot representation of the sample label, with y_i = 1 when the sample belongs to class i and y_i = 0 otherwise; and c represents the sample label.
4. The method of claim 1, wherein the porting and deployment of the converged first scene recognition model to the embedded end through TensorRT comprises:
generating wts files of the converged first scene recognition model at a server side;
moving the wts file to an embedded end;
defining the layers and/or structure information of the first scene recognition model on the embedded end;
loading and analyzing the wts file to obtain model structure information and weight parameters;
and deploying and creating the first scene recognition model according to the model structure information and the weight parameters.
5. The method of claim 1, wherein before inputting the first image data into the deployed converged first scene recognition model and performing forward inference to obtain a recognition result, the method comprises:
converting the first scene recognition model into an Engine file;
and storing the Engine file after serialization.
6. The method of claim 1, wherein inputting the first image data into the deployed converged first scene recognition model and performing forward inference to obtain a recognition result comprises:
judging whether a device needs to operate the first scene recognition model or not;
and if the device needs to operate the first scene recognition model, loading the first scene recognition model through deserialization and performing forward inference to obtain a recognition result.
7. An apparatus for fast scene recognition in a complex environment, wherein the apparatus comprises:
a first obtaining unit for obtaining first image data by the image acquisition device;
the first construction unit is used for constructing a first scene recognition model through a GoogleNet network;
a second obtaining unit, configured to perform network parameter training on the first scene recognition model on a self-established five-class terrain data set, and obtain the converged first scene recognition model;
a first operation unit, configured to perform migration deployment of the converged first scene recognition model at an embedding end through a TensorRT;
a third obtaining unit, configured to input the first image data into the deployed converged first scene recognition model and perform forward inference to obtain a recognition result.
8. A fast scene recognition apparatus in a complex environment, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 6 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110317587.9A CN113111735A (en) | 2021-03-25 | 2021-03-25 | Rapid scene recognition method and device under complex environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110317587.9A CN113111735A (en) | 2021-03-25 | 2021-03-25 | Rapid scene recognition method and device under complex environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113111735A true CN113111735A (en) | 2021-07-13 |
Family
ID=76710421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110317587.9A Pending CN113111735A (en) | 2021-03-25 | 2021-03-25 | Rapid scene recognition method and device under complex environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113111735A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113850134A (en) * | 2021-08-24 | 2021-12-28 | 中国船舶重工集团公司第七0九研究所 | Safety helmet wearing detection method and system integrating attention mechanism |
- 2021-03-25: application CN202110317587.9A filed in China; published as CN113111735A (status: pending)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778682A (en) * | 2017-01-11 | 2017-05-31 | 厦门中控生物识别信息技术有限公司 | A kind of training method and its equipment of convolutional neural networks model |
CN110852167A (en) * | 2019-10-10 | 2020-02-28 | 中国人民解放军军事科学院国防科技创新研究院 | Remote sensing image classification method based on optimal activation model |
CN111012261A (en) * | 2019-11-18 | 2020-04-17 | 深圳市杉川机器人有限公司 | Sweeping method and system based on scene recognition, sweeping equipment and storage medium |
CN111832546A (en) * | 2020-06-23 | 2020-10-27 | 南京航空航天大学 | A Lightweight Natural Scene Text Recognition Method |
CN112070070A (en) * | 2020-11-10 | 2020-12-11 | 南京信息工程大学 | LW-CNN method and system for urban remote sensing scene recognition |
CN112348003A (en) * | 2021-01-11 | 2021-02-09 | 航天神舟智慧系统技术有限公司 | Airplane refueling scene recognition method and system based on deep convolutional neural network |
Non-Patent Citations (3)

Title |
---|
"Deep Learning Inference Using TensorRT", Applied Optics (应用光学) * |
Wang Wei, "Research on Real-Time Vehicle Detection and Tracking Algorithms from a UAV Perspective", China Master's Theses Full-text Database (Engineering Science & Technology II) * |
Cai Qingqing, "Research on Scene Recognition Based on GoogLeNet", China New Technologies and Products * |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210713