CN117612139A - Scene target detection method and system based on deep learning and electronic equipment - Google Patents

Scene target detection method and system based on deep learning and electronic equipment

Info

Publication number
CN117612139A
CN117612139A (application CN202311756173.1A)
Authority
CN
China
Prior art keywords
target detection
classification
vehicle
feature
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311756173.1A
Other languages
Chinese (zh)
Inventor
谢思敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming Shengai Hehao Technology Co ltd
Original Assignee
Kunming Shengai Hehao Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming Shengai Hehao Technology Co ltd filed Critical Kunming Shengai Hehao Technology Co ltd
Priority to CN202311756173.1A priority Critical patent/CN117612139A/en
Publication of CN117612139A publication Critical patent/CN117612139A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of target detection, and in particular discloses a scene target detection method and system based on deep learning, and an electronic device. In this way, real-time target detection can be realized while the automobile is running, and the road environment can be perceived and understood more accurately.

Description

Scene target detection method and system based on deep learning and electronic equipment
Technical Field
The present application relates to the field of object detection, and more particularly, to a scene object detection method, system and electronic device based on deep learning.
Background
With the continuous development of the economy and society, the automobile, as an important means of transportation, has brought great convenience to people's travel. However, as the number of automobiles has increased rapidly, a number of associated problems have also emerged. The frequency of accidents has become a worrying issue: a large number of traffic accidents occur every year, posing a huge threat to people's lives and property. Meanwhile, traffic congestion is also worsening, especially in urban centers, where vehicles often wait in long queues.
To address these traffic problems, autonomous driving technology is becoming a focus of attention. An autonomous vehicle can perceive the surrounding road environment on its own while driving, and make corresponding decisions and take actions based on the perceived information. The development of this technology has received extensive attention and investment from many countries and enterprises.
In the environment perception system of an autonomous vehicle, target detection is a very critical component. It analyzes and processes the received road information to provide accurate data for the decision-making system, so as to ensure driving safety. However, autonomous vehicles face a complex and diverse road environment in which there are many hard-to-detect target objects that are small in size and may be occluded. Conventional target detection algorithms perform poorly in the face of these situations, and it is difficult for them to meet the requirements of accuracy and stability.
Thus, there is a need for an optimized scene object detection scheme based on deep learning.
Disclosure of Invention
The present application has been made in order to solve the above technical problems. The embodiments of the application provide a scene target detection method and system based on deep learning, and an electronic device, which adopt a machine-vision-based artificial intelligence detection technology: an image of the vehicle's surroundings is captured by a vehicle-mounted camera while the automobile is running, features in a high-dimensional space are extracted from the vehicle surrounding environment image, and the features are analyzed to judge whether a target object exists. In this way, real-time target detection can be realized while the automobile is running, and the road environment can be perceived and understood more accurately.
According to one aspect of the present application, there is provided a scene object detection method based on deep learning, including:
acquiring a vehicle surrounding image shot by a vehicle-mounted camera;
performing image preprocessing on the vehicle surrounding environment image to obtain a preprocessed vehicle surrounding environment image;
passing the preprocessed vehicle surrounding environment image through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature images;
fusing the plurality of vehicle environment feature images to obtain a vehicle environment comprehensive feature image;
inputting the vehicle environment comprehensive feature map into a ShuffleNetV2 basic module to obtain a target detection classification feature map;
optimizing the target detection classification characteristic diagram to obtain an optimized target detection classification characteristic diagram;
and the optimized target detection classification characteristic diagram is passed through a classifier to obtain a classification result, wherein the classification result is used for indicating whether a target object exists or not.
In the scene target detection method based on deep learning, the mixed convolution layer comprises a first convolution branch structure, a second convolution branch structure, a third convolution branch structure and a fourth convolution branch structure which are parallel, wherein the first convolution branch uses a first convolution kernel with a first size, the second convolution branch uses a second convolution kernel with a first size and a first void ratio, the third convolution branch uses a third convolution kernel with a first size and a second void ratio, and the fourth convolution branch uses a fourth convolution kernel with a first size and a third void ratio.
In the scene object detection method based on deep learning, the step of passing the preprocessed vehicle surrounding environment image through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature maps includes: performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the first convolution kernel with the first size to obtain a first-scale vehicle environment feature map; performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the second convolution kernel with the first void ratio to obtain a second-scale vehicle environment feature map; performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the third convolution kernel with the second void ratio to obtain a third-scale vehicle environment feature map; and performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the fourth convolution kernel with the third void ratio to obtain a fourth-scale vehicle environment feature map.
In the scene target detection method based on deep learning, the optimizing the target detection classification feature map to obtain an optimized target detection classification feature map includes: calculating the hidden characteristic expression of the motion parameterization model of the target detection classification characteristic diagram relative to the target classification function according to the following optimization formula to obtain the optimized target detection classification characteristic diagram; wherein, the optimization formula is:
where f_{i,j,k} denotes the feature value at the (i, j, k)-th position of the target detection classification feature map, log denotes the logarithm with base 2, λ denotes a predetermined hyper-parameter, and f'_{i,j,k} denotes the feature value at the (i, j, k)-th position of the optimized target detection classification feature map.
In the scene target detection method based on deep learning, passing the optimized target detection classification feature map through a classifier to obtain a classification result, the classification result being used for indicating whether a target object exists, includes: processing the optimized target detection classification feature map with the classifier according to the following classification formula to generate the classification result; wherein the classification formula is:
O = softmax{(W_n, B_n) : … : (W_1, B_1) | Project(F)}
where O is the classification result, Project(F) denotes projecting the optimized target detection classification feature map into a vector, W_1 to W_n are the weight matrices of the fully connected layers, B_1 to B_n are the bias vectors of the fully connected layers, and softmax is the normalized exponential function.
According to another aspect of the present application, there is provided a scene object detection system based on deep learning, including:
the vehicle surrounding environment data acquisition module is used for acquiring vehicle surrounding environment images shot by the vehicle-mounted camera;
the environment image preprocessing module is used for preprocessing the vehicle surrounding environment image to obtain a preprocessed vehicle surrounding environment image;
the environment feature extraction module is used for enabling the preprocessed vehicle surrounding environment image to pass through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature images;
the environment characteristic fusion module is used for fusing the plurality of vehicle environment characteristic diagrams to obtain a vehicle environment comprehensive characteristic diagram;
the target detection feature generation module is used for inputting the vehicle environment comprehensive feature map into a ShuffleNetV2 basic module to obtain a target detection classification feature map;
the optimizing module is used for optimizing the target detection classification characteristic diagram to obtain an optimized target detection classification characteristic diagram;
and the target detection result generation module is used for enabling the optimized target detection classification characteristic diagram to pass through a classifier to obtain a classification result, wherein the classification result is used for indicating whether a target object exists or not.
According to yet another aspect of the present application, there is provided an electronic device comprising a memory and a processor coupled to the memory, the processor being configured to perform the scene object detection method based on deep learning as described above based on instructions stored in the memory.
Compared with the prior art, the scene target detection method, the scene target detection system and the scene target detection electronic equipment based on the deep learning adopt an artificial intelligent detection technology based on machine vision, a vehicle surrounding environment image in the running process of an automobile is obtained through shooting by a vehicle-mounted camera, and high-dimensional space feature extraction is carried out on the vehicle surrounding environment image so as to analyze and judge whether a target object exists. Thus, real-time target detection can be realized in the automobile process, and the road environment can be perceived and understood more accurately.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application, are incorporated in and constitute a part of this specification, illustrate the application, and do not constitute a limitation to the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a flowchart of a scene object detection method based on deep learning according to an embodiment of the present application.
Fig. 2 is an architecture diagram of a scene object detection method based on deep learning according to an embodiment of the application.
Fig. 3 is a flowchart of passing the preprocessed vehicle surrounding image through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature maps in the scene object detection method based on deep learning according to an embodiment of the present application.
Fig. 4 is a system block diagram of a scene object detection system based on deep learning according to an embodiment of the application.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The illustrative embodiments of the present application and their description are presented herein to illustrate the application and not to limit the application.
As described above in the background art, with the development of the economy and technology, the automobile, as an important means of transportation, has greatly facilitated people's travel. However, as the number of automobiles has increased rapidly, a number of associated problems have also emerged. Among them, the frequency of traffic accidents is alarming: a large number of accidents every year pose a huge threat to people's lives and property. Meanwhile, traffic congestion is also worsening, especially in urban centers, where vehicles often wait in long queues. To address these traffic problems, autonomous driving technology has become a hot topic. Autonomous driving is a comprehensive system integrating a number of advanced technologies, the key technologies of which include environment perception, logical reasoning and decision-making, motion control, processor performance, etc. With the continuous progress of technology, autonomous driving will play an increasingly important role in the future traffic field. Environment perception is a critical part of an autonomous driving system; it covers the vehicle's awareness and understanding of the surrounding environment. An autonomous vehicle can perceive the surrounding road environment on its own while driving and make corresponding decisions and take actions based on the perceived information. The development of this technology has attracted extensive attention and investment from many countries and enterprises. In the environment perception system of an autonomous vehicle, target detection is a crucial component. It analyzes and processes the received road information to provide accurate data for the decision-making system, so as to ensure driving safety, for example by identifying traffic signals, obstacles, pedestrians, and the positions and movements of other vehicles. However, autonomous vehicles face a complex and diverse road environment in which there are many hard-to-detect target objects that are small in size and may be occluded. Conventional target detection algorithms perform poorly in the face of these situations, and it is difficult for them to meet the requirements of accuracy and stability. Therefore, an optimized scene target detection scheme based on deep learning is desired.
In recent years, deep learning and neural networks have been widely used in the fields of computer vision, natural language processing, text signal processing, and the like. In addition, deep learning and neural networks have also shown levels approaching and even exceeding humans in the fields of image classification, object detection, semantic segmentation, text translation, and the like. The development of deep learning and neural networks provides new solutions and schemes for scene target detection based on deep learning.
The technical concept of the application is to preprocess the acquired vehicle surrounding environment image, use a feature extractor to extract relevant vehicle surrounding environment features from the image, and then pass the extracted features through a classifier to judge whether a target object exists.
Fig. 1 is a flowchart of a scene object detection method based on deep learning according to an embodiment of the present application. Fig. 2 is an architecture diagram of a scene object detection method based on deep learning according to an embodiment of the application. As shown in fig. 1 and 2, a scene object detection method based on deep learning according to an embodiment of the present application includes: S110, acquiring a vehicle surrounding image shot by a vehicle-mounted camera; S120, performing image preprocessing on the vehicle surrounding environment image to obtain a preprocessed vehicle surrounding environment image; S130, passing the preprocessed vehicle surrounding environment image through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature maps; S140, fusing the plurality of vehicle environment feature maps to obtain a vehicle environment comprehensive feature map; S150, inputting the vehicle environment comprehensive feature map into a ShuffleNetV2 basic module to obtain a target detection classification feature map; S160, optimizing the target detection classification feature map to obtain an optimized target detection classification feature map; and S170, passing the optimized target detection classification feature map through a classifier to obtain a classification result, the classification result being used for indicating whether a target object exists.
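Taken together, steps S130 to S170 form a simple sequential pipeline. The following structural sketch, assuming PyTorch, shows how the steps compose; each injected component is a placeholder for the corresponding module detailed below, and steps S110 to S120 (capture and preprocessing) happen before the tensor enters the module:

```python
import torch.nn as nn

class SceneTargetDetector(nn.Module):
    """Structural sketch of steps S130-S170; the five components are
    injected (each should itself be an nn.Module so its parameters
    register with this container)."""

    def __init__(self, extractor, fusion, shufflenet_block, optimize, classifier):
        super().__init__()
        self.extractor = extractor      # S130: mixed-convolution feature extractor
        self.fusion = fusion            # S140: fuse the multi-scale feature maps
        self.block = shufflenet_block   # S150: ShuffleNetV2 basic module
        self.optimize = optimize        # S160: feature-map optimization
        self.classifier = classifier    # S170: classifier producing the result

    def forward(self, image):
        feature_maps = self.extractor(image)  # plurality of vehicle environment feature maps
        fused = self.fusion(feature_maps)     # vehicle environment comprehensive feature map
        cls_map = self.block(fused)           # target detection classification feature map
        cls_map = self.optimize(cls_map)      # optimized classification feature map
        return self.classifier(cls_map)       # classification result: target present or not
```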
Specifically, in the technical scheme of the present application, first, a vehicle surrounding image captured by an in-vehicle camera is acquired. It will be appreciated that an onboard camera may provide a true perception of the environment surrounding the vehicle. By acquiring the environmental image, real-time, high-resolution visual information can be obtained, which can reflect the real road situation and the position, shape, movement state, etc. of surrounding objects. This is critical for an autopilot system because it requires accurate perception of the surrounding environment to make the correct decisions. In an autopilot scenario, real-time and low latency are very important because the vehicle needs to sense the surrounding environment and respond in time. The vehicle-mounted camera can provide real-time image data, and the processing and transmission delay of sensor data can be reduced by directly acquiring the image of the vehicle-mounted camera, so that the real-time performance of the system is improved.
Various noise and interference, such as image blurring, illumination variation and lens distortion, may exist in the image acquired by the vehicle-mounted camera, and these factors may degrade the performance of the target detection algorithm. Therefore, the vehicle surrounding image is subjected to image preprocessing to obtain a preprocessed vehicle surrounding image. Preprocessing reduces noise and interference in the image and improves its quality, which facilitates more accurate detection of the target; it also improves the visibility and distinguishability of the target, making the target easier to detect.
In an embodiment of the present application, the image preprocessing of the vehicle surrounding image to obtain the preprocessed vehicle surrounding image may be carried out as follows: 1. read the original vehicle surrounding image; 2. perform image smoothing with a Gaussian filter to remove high-frequency noise, optionally applying a deblurring algorithm to reduce image blur; 3. apply a suitable histogram equalization to enhance the contrast of the image as required, which helps to highlight the boundaries and details of the target and makes it more conspicuous and easier to detect; the CLAHE (Contrast Limited Adaptive Histogram Equalization) algorithm can be tried, as it preserves the local contrast of the image; 4. perform scale normalization on the image produced by the above steps, scaling it to a uniform size using bilinear interpolation, nearest-neighbor interpolation, or similar methods.
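As a concrete illustration, the following is a minimal sketch of such a preprocessing pipeline, assuming OpenCV is available; the Gaussian kernel size, the CLAHE parameters, and the 640×640 target resolution are illustrative choices, not values specified by the application:

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray, target_size=(640, 640)) -> np.ndarray:
    # 1. Gaussian smoothing to suppress high-frequency noise.
    smoothed = cv2.GaussianBlur(image, ksize=(5, 5), sigmaX=0)

    # 2. CLAHE on the luminance channel: enhance contrast while
    #    preserving local detail.
    lab = cv2.cvtColor(smoothed, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

    # 3. Scale normalization to a uniform size (bilinear interpolation).
    return cv2.resize(enhanced, target_size, interpolation=cv2.INTER_LINEAR)
```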
The preprocessed vehicle surroundings image contains rich information, and the features related to the target object need to be extracted from it. That is, the preprocessed vehicle surroundings image is passed through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature maps. The mixed convolution layer can capture information such as the structures, textures and colors of the vehicle surroundings. A convolution layer is a layer type commonly used in deep learning; it performs a filtering operation on the input image in a sliding manner to extract local features. The convolution layer consists of a set of learnable convolution kernels (also called filters), each of which can extract a different feature. Specifically, the convolution layer multiplies the convolution kernel element-wise with different positions of the input image and sums the products to obtain an output feature map. This process can be seen as the convolution kernel sliding over the input image and performing local feature extraction. The parameters of the convolution kernels are learnable, and the optimal feature extraction is learned automatically during training. The mixed convolution layer contains convolution kernels with several different receptive-field scales, so environmental features at different scales can be captured, which allows the target detection algorithm to detect targets of different sizes effectively. For example, a smaller receptive field can capture detail features, while a larger one can capture overall structural features.
Specifically, the hybrid convolution layer includes a first convolution branch structure, a second convolution branch structure, a third convolution branch structure, and a fourth convolution branch structure in parallel, wherein the first convolution branch uses a first convolution kernel having a first size, the second convolution branch uses a second convolution kernel having the first size and a first void (dilation) ratio, the third convolution branch uses a third convolution kernel having the first size and a second void ratio, and the fourth convolution branch uses a fourth convolution kernel having the first size and a third void ratio. In this embodiment, the first size of the convolution kernels is 3×3.
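For illustration, a sketch of such a hybrid layer is given below, assuming PyTorch; since the application does not specify the first, second and third void ratios, dilation rates of 2, 3 and 5 are used here purely as placeholders (the first branch is an ordinary 3×3 convolution, i.e. dilation 1). Padding equal to the dilation keeps all four outputs the same spatial size:

```python
import torch
import torch.nn as nn

class MixedConvLayer(nn.Module):
    """Four parallel 3x3 branches with different dilation (void) rates."""

    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 3, 5)):
        super().__init__()
        # padding = dilation keeps the spatial size constant for 3x3 kernels.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # One feature map per branch: the "plurality of vehicle
        # environment feature maps" of steps S131-S134.
        return [branch(x) for branch in self.branches]
```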
Fig. 3 is a flowchart of passing the preprocessed vehicle surrounding image through the environment feature extractor based on the mixed convolution layer to obtain the plurality of vehicle environment feature maps in the scene object detection method based on deep learning according to an embodiment of the present application. As shown in fig. 3, passing the preprocessed vehicle surrounding image through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature maps includes: S131, performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the first convolution kernel with the first size to obtain a first-scale vehicle environment feature map; S132, performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the second convolution kernel with the first void ratio to obtain a second-scale vehicle environment feature map; S133, performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the third convolution kernel with the second void ratio to obtain a third-scale vehicle environment feature map; and S134, performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the fourth convolution kernel with the third void ratio to obtain a fourth-scale vehicle environment feature map.
It should be appreciated that each vehicle environment feature map contains feature information of different levels and scales, and by fusing them together, feature representations of different levels and scales can be comprehensively considered, so that the diversity and complexity of the target object can be better captured. In the technical scheme of the application, the plurality of vehicle environment feature maps are fused to obtain a vehicle environment comprehensive feature map. There are a number of ways to fuse multiple vehicle environment feature maps, such as using convolution operations, weighted summation, and the like. The aim of the method is to mutually supplement and enhance the useful information in different characteristic diagrams and improve the accuracy and the robustness of target detection.
In this embodiment of the present application, fusing the plurality of vehicle environment feature maps to obtain the vehicle environment comprehensive feature map may be implemented as follows. Suppose there are 4 vehicle environment feature maps, F1, F2, F3 and F4, each of size H×W×C, where H denotes the height, W the width, and C the number of channels. The feature maps can be fused by weighted summation, with the following specific steps: assign each feature map a weight representing its importance in the fusion; the weights may be set according to the quality or importance of the feature maps or other prior knowledge. Denote the weights w1, w2, w3 and w4 and normalize them so that they sum to 1, i.e., w1' = w1/(w1+w2+w3+w4), w2' = w2/(w1+w2+w3+w4), ..., w4' = w4/(w1+w2+w3+w4). Then multiply each feature map Fi by its normalized weight wi' and add the results to obtain the fused feature map F_fusion, where F_fusion = w1'·F1 + w2'·F2 + w3'·F3 + w4'·F4. The fused feature map F_fusion is the vehicle environment comprehensive feature map, with size H×W×C, the same as the input feature maps.
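This weighted summation can be written in a few lines; a sketch assuming PyTorch, with the weights fixed rather than learned:

```python
import torch

def fuse_feature_maps(feature_maps, weights):
    """Weighted summation of same-shape feature maps; the weights are
    normalized so they sum to 1, matching wi' = wi / (w1+w2+w3+w4)."""
    w = torch.tensor(weights, dtype=feature_maps[0].dtype)
    w = w / w.sum()
    # F_fusion = w1'*F1 + w2'*F2 + w3'*F3 + w4'*F4
    return sum(wi * f for wi, f in zip(w, feature_maps))

# Usage with the four maps from the mixed convolution layer, equal weights:
# f_fusion = fuse_feature_maps([f1, f2, f3, f4], weights=[1.0, 1.0, 1.0, 1.0])
```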
After the vehicle environment comprehensive feature map is obtained, the features can be further processed and extracted by the deep learning network to provide a more accurate feature representation for the target detection task. That is, the vehicle environment comprehensive feature map is input into a ShuffleNetV2 basic module to obtain a target detection classification feature map. ShuffleNetV2 is a lightweight convolutional neural network structure with low computational complexity and parameter count, suitable for target detection in resource-constrained environments. By inputting the vehicle environment comprehensive feature map into the ShuffleNetV2 basic module, its convolution operations and feature-map recombination can be used to further extract and integrate the feature information. The ShuffleNetV2 basic module can effectively reduce the dimensionality of the feature map and lower the computational complexity while preserving the feature information. It can extract more discriminative feature representations, helping to distinguish target objects of different classes. The lightweight nature of ShuffleNetV2 also ensures that real-time target detection requirements can be met with limited computational resources. This design effectively combines deep learning with a lightweight network model to realize an efficient and accurate target detection function.
The ShuffleNetV2 basic module adopts techniques such as grouped convolution and channel rearrangement (channel shuffle), which significantly reduce the parameter count and computation of the model while maintaining high accuracy. In this embodiment, one way of inputting the vehicle environment comprehensive feature map into the ShuffleNetV2 basic module to obtain the target detection classification feature map is as follows. First, the input vehicle environment comprehensive feature map is divided into several groups by channels: the C channels are divided into G groups, each containing C/G channels. An independent convolution operation is performed on each group, using either a standard convolution or a depthwise separable convolution, yielding G output feature maps, each with C/G channels. Then, a channel rearrangement operation is applied to the output feature maps so that the channels of the groups are interleaved; the channels can be rearranged according to a rule determined by the actual requirements. The channel rearrangement can be implemented with simple tensor operations such as reshape and transpose. Through the grouped convolution and channel rearrangement operations, the vehicle environment comprehensive feature map can be processed into a target detection classification feature map with a smaller number of channels.
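The channel rearrangement described above is commonly implemented with the reshape-and-transpose trick; a sketch assuming PyTorch, with the group count G = 4 and the channel/spatial sizes chosen only for illustration:

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave the channels of the G groups (ShuffleNet-style shuffle)."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by the group count"
    # (N, C, H, W) -> (N, G, C/G, H, W) -> swap group axes -> flatten back
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# Grouped convolution followed by channel rearrangement, as described above:
conv = nn.Conv2d(64, 64, kernel_size=1, groups=4)  # G = 4 groups of C/G channels
x = torch.randn(1, 64, 32, 32)                     # dummy comprehensive feature map
y = channel_shuffle(conv(x), groups=4)
```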
In particular, in the technical solution of the present application, it is considered that, since the plurality of vehicle environment feature maps are extracted by the environment feature extractor based on the mixed convolution layer, there are differences in the feature distributions of the plurality of vehicle environment feature maps in the high-dimensional feature space. Specifically, the environment feature extractor based on the mixed convolution layer extracts features of the vehicle surrounding images at different levels, which contain information at different levels of abstraction. An earlier level may extract low-level detail features such as edges and textures, while a deeper level may extract higher-level semantic features such as the shape and structure of the object. Because the feature extractors sit at different locations and in different structures within the network, the perception ranges and perception capabilities of the different levels with respect to the vehicle surroundings differ. Earlier levels are closer to the input image and are more sensitive to detail information and local features, while deeper levels focus more on global semantic information. When the plurality of vehicle environment feature maps are fused, their distribution differences in the feature space may cause the fused feature map to suffer structural collapse in its local feature distribution. This means that some detail information may be lost or obscured, affecting the accuracy of the target detection classification feature map obtained by the ShuffleNetV2 basic module. In addition, the ShuffleNetV2 basic module may further change the distribution and structure of the feature map during feature processing, which may lead to further structural collapse in the local feature distribution and affect the accuracy of the classification result obtained by the classifier. In order to address the feature distribution differences of the plurality of vehicle environment feature maps in the high-dimensional feature space and the structural changes introduced by the fusion and the ShuffleNetV2 basic module, in the technical solution of the present application, the hidden feature expression of the motion parameterized model of the target detection classification feature map relative to the target classification function is calculated to obtain an optimized target detection classification feature map.
Specifically, optimizing the target detection classification feature map to obtain an optimized target detection classification feature map includes: calculating the hidden characteristic expression of the motion parameterization model of the target detection classification characteristic diagram relative to the target classification function according to the following optimization formula to obtain the optimized target detection classification characteristic diagram; wherein, the optimization formula is:
where f_{i,j,k} denotes the feature value at the (i, j, k)-th position of the target detection classification feature map, log denotes the logarithm with base 2, λ denotes a predetermined hyper-parameter, and f'_{i,j,k} denotes the feature value at the (i, j, k)-th position of the optimized target detection classification feature map.
That is, for the above technical problems, an optimization method based on latent feature expression is provided, which can calculate the latent feature expression of the motion parameterized model of the target detection classification feature map relative to the target classification function to obtain an optimized target detection classification feature map, so that the feature distribution extracted by the neural network can be self-adapted along with iteration by performing probabilistic interpretation on the feature values of the target detection classification feature map according to the position, thereby gradually approaching to the real information distribution, and improving the accuracy of the classification result of the target detection classification feature map through the classifier. The hidden feature expression is a feature coding method based on hidden variables, and can construct a probability distribution function of hidden variables according to the feature values of the target detection classification feature graphs and posterior distribution of the target classification functions, and the probability distribution function is used for carrying out probabilistic coding and decoding on the feature values of the target detection classification feature graphs so that the probability distribution function is more consistent with the posterior distribution of the target classification functions, thus not only improving the precision and efficiency of feature coding, but also maintaining the original information and semantics of the features and avoiding the information loss and confusion of the features.
Finally, the optimized target detection classification feature map is passed through a classifier to obtain a classification result, the classification result being used for indicating whether a target object exists. It should be understood that the optimized target detection classification feature map is obtained by processing, extracting and optimizing the vehicle environment comprehensive feature map through the deep learning network, and it contains feature representations of different positions in the image. By inputting the feature map into the classifier, the features of each position can be classified to judge whether that position belongs to the target object class. The classifier is typically a trained machine learning model that classifies the input features and outputs a corresponding class probability or confidence. By inputting the optimized target detection classification feature map into the classifier, the feature-to-class relationship learned by the model can be used to classify each position in the image. The classification result indicates whether the target object is present. From the output of the classifier, the probability or confidence that each position belongs to the target object class can be obtained. According to a set threshold, a position with a probability above the threshold may be judged as containing the target object, and a position with a probability below the threshold as not containing it. The significance of this is that combining the optimized target detection classification feature map with the classifier realizes classification judgment of the target object in the image: by classifying each position, it can be determined whether a target object exists in the image.
Specifically, passing the optimized target detection classification feature map through a classifier to obtain a classification result, the classification result being used for indicating whether a target object exists, includes: processing the optimized target detection classification feature map with the classifier according to the following classification formula to generate the classification result; wherein the classification formula is:
O = softmax{(W_n, B_n) : … : (W_1, B_1) | Project(F)}
where O is the classification result, Project(F) denotes projecting the optimized target detection classification feature map into a vector, W_1 to W_n are the weight matrices of the fully connected layers, B_1 to B_n are the bias vectors of the fully connected layers, and softmax is the normalized exponential function.
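Read as a stack of fully connected layers applied to the projected (flattened) feature map followed by softmax, the classification formula corresponds to a head of the following shape; this is a sketch assuming PyTorch, and the hidden width, the two-class output, and the 0.5 decision threshold are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 256, num_classes: int = 2):
        super().__init__()
        self.project = nn.Flatten()   # Project(F): feature map -> vector
        self.fc = nn.Sequential(      # (W_1, B_1) ... (W_n, B_n)
            nn.Linear(feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # feat_dim must equal C*H*W of the incoming feature map.
        return torch.softmax(self.fc(self.project(f)), dim=-1)

# A probability above the chosen threshold indicates the target object is present:
# probs = head(feature_map); present = probs[:, 1] > 0.5
```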
In summary, the scene target detection method based on deep learning according to the embodiments of the present application has been illustrated. It adopts a machine-vision-based artificial intelligence detection technology: an image of the vehicle's surroundings is captured by a vehicle-mounted camera while the automobile is running, and features in a high-dimensional space are extracted from the image to analyze and judge whether a target object exists. In this way, real-time target detection can be realized while the automobile is running, and the road environment can be perceived and understood more accurately.
Fig. 4 is a system block diagram of a scene object detection system based on deep learning according to an embodiment of the application. As shown in fig. 4, a scene object detection system 100 based on deep learning according to an embodiment of the present application includes: a vehicle surrounding data acquisition module 110 for acquiring a vehicle surrounding image captured by the vehicle-mounted camera; an environmental image preprocessing module 120, configured to perform image preprocessing on the vehicle surrounding image to obtain a preprocessed vehicle surrounding image; an environmental feature extraction module 130, configured to pass the preprocessed vehicle surrounding image through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature maps; an environmental feature fusion module 140, configured to fuse the plurality of vehicle environment feature maps to obtain a vehicle environment comprehensive feature map; a target detection feature generation module 150, configured to input the vehicle environment comprehensive feature map into a ShuffleNetV2 basic module to obtain a target detection classification feature map; an optimizing module 160, configured to optimize the target detection classification feature map to obtain an optimized target detection classification feature map; and a target detection result generation module 170, configured to pass the optimized target detection classification feature map through a classifier to obtain a classification result, where the classification result is used to indicate whether the target object exists.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described deep learning-based scene target detection system 100 have been described in detail in the above description of the deep learning-based scene target detection method with reference to fig. 1 to 3, and thus, repetitive descriptions thereof will be omitted.
In summary, the scene target detection system 100 based on deep learning according to the embodiment of the application has been illustrated. It adopts a machine-vision-based artificial intelligence detection technology: an image of the surrounding environment of the vehicle is obtained through a vehicle-mounted camera while the vehicle is running, and features in a high-dimensional space are extracted from the image to analyze and determine whether a target object exists. In this way, real-time target detection can be realized while the automobile is running, and the road environment can be perceived and understood more accurately.
As described above, the scene target detection system 100 based on deep learning according to the embodiment of the present application may be implemented in various wireless terminals, such as a server or the like for scene target detection based on deep learning. In one example, the deep learning based scene object detection system 100 according to embodiments of the present application may be integrated into a wireless terminal as one software module and/or hardware module. For example, the deep learning based scene object detection system 100 may be a software module in the operating system of the wireless terminal or may be an application developed for the wireless terminal; of course, the deep learning based scene object detection system 100 can also be one of many hardware modules of the wireless terminal.
Alternatively, in another example, the deep learning based scene object detection system 100 and the wireless terminal may be separate devices, and the deep learning based scene object detection system 100 may be connected to the wireless terminal through a wired and/or wireless network and transmit interaction information in an agreed data format.
Next, an electronic device according to an embodiment of the present application is described with reference to fig. 5. Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device may include: a processor 11, a communication interface 12, a memory 13 and a communication bus 14, wherein the processor 11, the communication interface 12 and the memory 13 communicate with each other through the communication bus 14. The processor 11 may invoke logic instructions in the memory 13 to perform the deep learning based scene object detection method described above.
Further, the logic instructions in the memory 13 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a standalone product. Based on such understanding, the technical solution of the present application may be embodied, in essence or in the part contributing to the prior art, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

Claims (10)

1. A scene target detection method based on deep learning, characterized by comprising the following steps:
acquiring a vehicle surrounding image shot by a vehicle-mounted camera;
performing image preprocessing on the vehicle surrounding environment image to obtain a preprocessed vehicle surrounding environment image;
passing the preprocessed vehicle surrounding environment image through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature images;
fusing the plurality of vehicle environment feature images to obtain a vehicle environment comprehensive feature image;
inputting the vehicle environment comprehensive feature map into a ShuffleNetV2 basic module to obtain a target detection classification feature map;
optimizing the target detection classification characteristic diagram to obtain an optimized target detection classification characteristic diagram;
and the optimized target detection classification characteristic diagram is passed through a classifier to obtain a classification result, wherein the classification result is used for indicating whether a target object exists or not.
2. The deep learning based scene object detection method of claim 1, wherein the hybrid convolution layer comprises a first convolution branch structure, a second convolution branch structure, a third convolution branch structure, and a fourth convolution branch structure in parallel, wherein the first convolution branch uses a first convolution kernel having a first size, the second convolution branch uses a second convolution kernel having a first size and having a first void fraction, the third convolution branch uses a third convolution kernel having a first size and having a second void fraction, and the fourth convolution branch uses a fourth convolution kernel having a first size and having a third void fraction.
3. The deep learning based scene object detection method of claim 2, wherein passing the preprocessed vehicle surroundings image through a mixed convolution layer based environment feature extractor to obtain a plurality of vehicle environment feature maps, comprising:
performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the first convolution kernel with the first size to obtain a first-scale vehicle environment feature map;
performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the second convolution kernel with the first void ratio to obtain a second-scale vehicle environment feature map;
performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the third convolution kernel with the second void ratio to obtain a third-scale vehicle environment feature map;
and performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the fourth convolution kernel with the third void ratio to obtain a fourth-scale vehicle environment feature map.
4. A scene object detection method based on deep learning according to claim 3, characterized in that optimizing the object detection classification feature map to obtain an optimized object detection classification feature map comprises: calculating the hidden characteristic expression of the motion parameterization model of the target detection classification characteristic diagram relative to the target classification function according to the following optimization formula to obtain the optimized target detection classification characteristic diagram;
wherein, the optimization formula is:
where f_{i,j,k} denotes the feature value at the (i, j, k)-th position of the target detection classification feature map, log denotes the logarithm with base 2, λ denotes a predetermined hyper-parameter, and f'_{i,j,k} denotes the feature value at the (i, j, k)-th position of the optimized target detection classification feature map.
5. The deep learning-based scene object detection method according to claim 4, wherein passing the optimized target detection classification feature map through a classifier to obtain a classification result, the classification result being used to indicate whether a target object exists, comprises: processing the optimized target detection classification feature map with the classifier according to the following classification formula to generate the classification result;
wherein, the classification formula is:
O = softmax{(W_n, B_n) : … : (W_1, B_1) | Project(F)}
where O is the classification result, Project(F) denotes projecting the optimized target detection classification feature map into a vector, W_1 to W_n are the weight matrices of the fully connected layers, B_1 to B_n are the bias vectors of the fully connected layers, and softmax is the normalized exponential function.
6. A scene target detection system based on deep learning, comprising:
the vehicle surrounding environment data acquisition module is used for acquiring vehicle surrounding environment images shot by the vehicle-mounted camera;
the environment image preprocessing module is used for preprocessing the vehicle surrounding environment image to obtain a preprocessed vehicle surrounding environment image;
the environment feature extraction module is used for enabling the preprocessed vehicle surrounding environment image to pass through an environment feature extractor based on a mixed convolution layer to obtain a plurality of vehicle environment feature images;
the environment characteristic fusion module is used for fusing the plurality of vehicle environment characteristic diagrams to obtain a vehicle environment comprehensive characteristic diagram;
the target detection feature generation module is used for inputting the vehicle environment comprehensive feature map into a ShuffleNetV2 basic module to obtain a target detection classification feature map;
the optimizing module is used for optimizing the target detection classification characteristic diagram to obtain an optimized target detection classification characteristic diagram;
and the target detection result generation module is used for enabling the optimized target detection classification characteristic diagram to pass through a classifier to obtain a classification result, wherein the classification result is used for indicating whether a target object exists or not.
7. The deep learning based scene target detection system of claim 6, wherein the environmental feature extraction module comprises:
a first scale vehicle environment feature extraction unit, configured to perform convolutional encoding on the preprocessed vehicle surrounding environment image by using the first convolution kernel having the first size to obtain a first scale vehicle environment feature map;
the second-scale vehicle environment feature extraction unit is used for performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the second convolution kernel with the first void ratio so as to obtain a second-scale vehicle environment feature map;
a third-scale vehicle environment feature extraction unit, configured to perform convolutional encoding on the preprocessed vehicle surrounding environment image by using the third convolution kernel with the second void ratio to obtain a third-scale vehicle environment feature map;
and the fourth-scale vehicle environment feature extraction unit is used for performing convolutional encoding on the preprocessed vehicle surrounding environment image by using the fourth convolution kernel with the third void ratio so as to obtain a fourth-scale vehicle environment feature map.
8. The deep learning based scene target detection system of claim 7, wherein the optimization module is configured to: calculate the hidden feature expression of the motion parameterized model of the target detection classification feature map relative to the target classification function according to the following optimization formula to obtain the optimized target detection classification feature map;
wherein, the optimization formula is:
wherein f_{i,j,k} represents the feature value at the (i, j, k)-th position of the target detection classification feature map, log denotes the logarithm with base 2, λ denotes a predetermined hyperparameter, and f'_{i,j,k} represents the feature value at the (i, j, k)-th position of the optimized target detection classification feature map.
9. The deep learning based scene target detection system of claim 8, wherein the target detection result generation module is configured to: process the optimized target detection classification feature map with the classifier according to the following classification formula to generate the classification result;
wherein, the classification formula is:
O = softmax{(W_n, B_n) : … : (W_1, B_1) | Project(F)}
wherein O is the classification result, project (F) represents projecting the optimization target detection classification feature map as a vector, W 1 To W n Weight matrix for all the connection layers of each layer, B 1 To B n Representing the bias vector of each fully connected layer, softmax is a normalized exponential function.
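The classification formula amounts to flattening the optimized feature map (Project(F)) and passing it through stacked fully connected layers (W_i, B_i) followed by softmax. Below is a minimal sketch under assumed layer sizes; a two-class output matches the "target object present or not" reading of the claims.

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    # Project(F): flatten the optimized feature map into a vector, then
    # apply stacked fully connected layers (W_i, B_i) and softmax.
    def __init__(self, in_features, hidden=256, num_classes=2):
        super().__init__()
        self.project = nn.Flatten()
        self.fc = nn.Sequential(
            nn.Linear(in_features, hidden),   # (W_1, B_1)
            nn.ReLU(),
            nn.Linear(hidden, num_classes),   # (W_n, B_n)
        )

    def forward(self, f):
        logits = self.fc(self.project(f))
        return torch.softmax(logits, dim=-1)  # normalized exponential function
```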
10. An electronic device comprising a memory and a processor coupled to the memory, wherein the processor is configured to perform the deep learning based scene target detection method of any one of claims 1 to 5 based on instructions stored in the memory.
CN202311756173.1A 2023-12-19 2023-12-19 Scene target detection method and system based on deep learning and electronic equipment Pending CN117612139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311756173.1A CN117612139A (en) 2023-12-19 2023-12-19 Scene target detection method and system based on deep learning and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311756173.1A CN117612139A (en) 2023-12-19 2023-12-19 Scene target detection method and system based on deep learning and electronic equipment

Publications (1)

Publication Number Publication Date
CN117612139A true CN117612139A (en) 2024-02-27

Family

ID=89956264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311756173.1A Pending CN117612139A (en) 2023-12-19 2023-12-19 Scene target detection method and system based on deep learning and electronic equipment

Country Status (1)

Country Link
CN (1) CN117612139A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120059788A1 (en) * 2010-09-08 2012-03-08 Masashi Sekino Rating prediction device, rating prediction method, and program
US20170169313A1 (en) * 2015-12-14 2017-06-15 Samsung Electronics Co., Ltd. Image processing apparatus and method based on deep learning and neural network learning
JP2020123008A (en) * 2019-01-29 2020-08-13 日本電信電話株式会社 Variable optimization device, noise elimination device, variable optimization method, noise elimination method, and program
WO2022121156A1 (en) * 2020-12-10 2022-06-16 平安科技(深圳)有限公司 Method and apparatus for detecting target object in image, electronic device and readable storage medium
US20220222520A1 (en) * 2021-01-13 2022-07-14 International Business Machines Corporation Supervised vae for optimization of value function and generation of desired data
CN113077379A (en) * 2021-03-23 2021-07-06 深圳数联天下智能科技有限公司 Method, device, equipment and storage medium for extracting characteristic latent codes
CN114220126A (en) * 2021-12-17 2022-03-22 杭州晨鹰军泰科技有限公司 Target detection system and acquisition method
CN114913409A (en) * 2022-04-14 2022-08-16 河南科技学院 Camouflage target identification method for marine organisms
CN115937591A (en) * 2022-12-12 2023-04-07 浙江中控信息产业股份有限公司 Method and system for improving traffic scene target detection classification precision
CN115861271A (en) * 2022-12-23 2023-03-28 南昌航空大学 Steel surface defect detection method and system based on high resolution and reparameterization
CN116109868A (en) * 2023-02-15 2023-05-12 北京工业大学 Image classification model construction and small sample image classification method based on lightweight neural network
CN116363738A (en) * 2023-06-01 2023-06-30 成都睿瞳科技有限责任公司 Face recognition method, system and storage medium based on multiple moving targets
CN116664859A (en) * 2023-06-06 2023-08-29 湖南师范大学 Mobile terminal real-time target detection method, terminal equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
QIUYU ZHU et al.: "A Softmax-Free Loss Function Based on Predefined Optimal-Distribution of Latent Features for Deep Learning Classifier", IEEE, 31 March 2023 (2023-03-31) *
SITI RAIHANAH ABDANI et al.: "Optimized Set of Parallel Atrous Convolutions for ShuffleNet V2", https://doi.org/10.1007/978-981-16-8690-0_69, 31 December 2022 (2022-12-31), pages 785-794 *
JIANG Daihong et al.: "Image classification algorithm based on feature-recalibration generative adversarial network", Application Research of Computers (《计算机应用研究》), 31 March 2020 (2020-03-31) *
ZHANG Dandan: "High-Resolution Remote Sensing Image Processing and Applications Based on Deep Neural Network Technology" (《基于深度神经网络技术的高分遥感图像处理及应用》), Beijing: China Astronautic Publishing House, 31 August 2020 *

Similar Documents

Publication Publication Date Title
CN107563372B License plate positioning method based on the deep learning SSD framework
CN111932553B (en) Remote sensing image semantic segmentation method based on area description self-attention mechanism
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN107609602A Driving scene classification method based on convolutional neural networks
CN105550701A (en) Real-time image extraction and recognition method and device
CN109948616B (en) Image detection method and device, electronic equipment and computer readable storage medium
CN114359851A (en) Unmanned target detection method, device, equipment and medium
CN115035361A (en) Target detection method and system based on attention mechanism and feature cross fusion
CN110826411B (en) Vehicle target rapid identification method based on unmanned aerial vehicle image
CN112990065A (en) Optimized YOLOv5 model-based vehicle classification detection method
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN115273032A (en) Traffic sign recognition method, apparatus, device and medium
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
CN113870254B (en) Target object detection method and device, electronic equipment and storage medium
Barodi et al. An enhanced artificial intelligence-based approach applied to vehicular traffic signs detection and road safety enhancement
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
Saravanarajan et al. Improving semantic segmentation under hazy weather for autonomous vehicles using explainable artificial intelligence and adaptive dehazing approach
CN106650814B (en) Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision
CN111144361A (en) Road lane detection method based on binaryzation CGAN network
Ballinas-Hernández et al. Marked and unmarked speed bump detection for autonomous vehicles using stereo vision
CN116071557A (en) Long tail target detection method, computer readable storage medium and driving device
CN117612139A (en) Scene target detection method and system based on deep learning and electronic equipment
Roncancio et al. Ceiling analysis of pedestrian recognition pipeline for an autonomous car application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination