CN111104903B - Depth perception traffic scene multi-target detection method and system - Google Patents

Depth perception traffic scene multi-target detection method and system

Info

Publication number
CN111104903B
CN111104903B (application CN201911317498.3A)
Authority
CN
China
Prior art keywords
neural network
layer
target
network layer
picture
Prior art date
Legal status
Active
Application number
CN201911317498.3A
Other languages
Chinese (zh)
Other versions
CN111104903A (en)
Inventor
张登银
彭巧
孙誉焯
周超
刘子捷
Current Assignee
China Austria Internet Of Things Technology Nanjing Co ltd
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN201911317498.3A
Publication of CN111104903A
Application granted
Publication of CN111104903B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats

Abstract

The invention discloses a depth-aware traffic scene multi-target detection method and system. A picture to be detected is input into a pre-trained Mask R-CNN model to identify the category and target position of first-type targets; the recognized picture is then input into a pre-trained optimized CNN model, which detects the category, confidence and target position of second-type targets in the picture. Taking full account of the complexity of traffic scenes and the severe missed detection of small targets in existing target tracking algorithms, the invention proposes an optimized CNN model: building on the advantages of the original CNN, the feature extraction network and the detection network are optimized and trained into a new model for small-target detection. Performing small-target detection on top of the large-target detection result strengthens multi-target detection in traffic scenes and improves the accuracy of small-target recognition.

Description

Depth perception traffic scene multi-target detection method and system
Technical Field
The invention relates to a depth perception traffic scene multi-target detection method and system, and belongs to the technical field of video image processing.
Background
Vision-based traffic scene perception (TSP) is one of many emerging areas in intelligent transportation systems and has been widely studied over the past decade. TSP aims to extract accurate road information in real time, and for the various objects of interest contained in an image it generally involves three phases: detection, recognition and tracking. Since tracking generally relies on the results of detection and recognition, the ability to detect and recognize objects effectively plays a crucial role in TSP; this has also long been a classical problem in recognizing multiple target objects in images or videos.
Beyond traditional image processing techniques, the CNN is a powerful and efficient approach to common image classification, recognition and detection tasks, and has spawned a number of excellent models and ideas. The early OverFeat applied a sliding-window scheme over ConvNet features for classification, localization and detection, and Ross Girshick proposed Region-CNN (R-CNN), which classifies region proposals with a deep ConvNet. To address its cost in computation time and memory, he then introduced an ROI pooling layer in the Fast region-based convolutional network (Fast R-CNN) to improve speed and detection accuracy. The more efficient Faster R-CNN was later proposed on this basis, directly introducing a new region proposal network to obtain candidate regions. Mask R-CNN, prototyped on Faster R-CNN, adds a branch for the segmentation task. The architectures of this series of models share several traits: each has a CNN backbone derived from a basic CNN, and each adds some extra purpose-built layers, such as the ROI pooling and RPN layers, that can effectively process the feature maps of the backbone CNN.
As a typical deep learning model, the CNN achieves excellent performance in object detection thanks to its strong feature extraction capability; however, for some important small visual objects, such as license plates and passengers inside vehicles, labels and information are insufficient, which increases the difficulty of traffic scene information acquisition and of deep learning development.
Disclosure of Invention
The invention aims to solve the prior-art problem that some important small visual objects, such as license plates and passengers inside vehicles, have insufficient labels and information, and provides a depth-aware traffic scene multi-target detection method and system.
The invention adopts the following technical scheme:
a multi-target detection method for traffic scene perception comprises the following steps:
inputting a picture to be detected into a Mask R-CNN model which is trained in advance, and extracting the category and the target position of a first type of target;
and (4) pre-training the recognized picture input value to the optimized CNN model, and detecting the class, the confidence coefficient and the target position of the second class of targets in the picture.
Further, the optimized CNN model comprises a feature extraction network and an object detection network, wherein the feature extraction network extracts features from the input picture to obtain a feature map, and the object detection network detects the picture to be detected and outputs the category, confidence and target position of the second-type targets in the picture.
Further preferably, the feature extraction network structure comprises 8 layers; from layer 1 to layer 8 these are a first convolutional neural network layer, a first max pooling layer, a second convolutional neural network layer, a third convolutional neural network layer, a second max pooling layer, a fourth convolutional neural network layer, a fifth convolutional neural network layer and a third max pooling layer;
the object detection network comprises three layers: the first layer is a sixth convolutional neural network layer; the second layer consists of two parallel convolutional neural network layers, a seventh and an eighth neural network layer, both connected to the sixth neural network layer; the third layer consists of a ninth and a tenth neural network layer, connected to the seventh and the eighth neural network layer respectively, where the ninth neural network layer outputs the confidence and target position of a target and the tenth neural network layer outputs the category of the target. Preferably, the first convolutional neural network layer is a normalization layer.
On the basis of the above technical solution, it is further preferable that the first convolutional neural network layer uses an 11 × 11 kernel; acting first on the input image, it retains low-level but rich details. The second and third, and the fourth and fifth, convolutional neural network layers are 3 × 3 convolutional layers; decomposing a larger kernel into two 3 × 3 convolutional layers introduces fewer parameters, which helps reduce overfitting and expresses stronger functions with fewer parameters, and is followed by batch normalization. The role of the max pooling layer is to compute the maximum value in each identified n × n region to downsample the image; this helps simplify the network's computational complexity, compress the input feature map and extract the main features.
Further, the seventh neural network layer and the ninth neural network layer are convolutional layers with a 1 × 1 kernel.
In this technical scheme, the feature extraction network is designed as a network integrating different convolutional layers, a local normalization layer and max pooling layers, acquiring as many detailed features of the target as possible to obtain a feature map of the image to be detected. The feature map is input into the detection network, which takes the pixel-level target features from the feature map, classifies and localizes the targets in the image element by element, generates predicted object boundaries, and outputs the difference between each predicted bounding box and the ground truth.
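To make the architecture concrete, below is a minimal PyTorch sketch of the optimized CNN under stated assumptions: the text fixes the layer order, the 11 × 11 kernel of Conv1, the 3 × 3 pairs, the 1 × 1 kernels of the seventh and ninth layers, and the two-branch detection head, but the channel widths, strides and activation functions here are illustrative choices, since Table 1 with the exact structural parameters is only reproduced as an image.

```python
# A minimal sketch, not the authors' exact network: channel widths,
# strides and normalization placement are assumptions.
import torch
import torch.nn as nn

class OptimizedCNN(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        # Feature extraction network, 8 layers: Conv1, Pool1, Conv2, Conv3,
        # Pool2, Conv4, Conv5, Pool3. A large 11x11 kernel acts first to
        # retain low-level but rich detail; each 3x3 pair ends in batch norm.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=2, padding=5),   # Conv1
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),                             # Pool1
            nn.Conv2d(64, 128, kernel_size=3, padding=1),            # Conv2
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),           # Conv3
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),                             # Pool2
            nn.Conv2d(128, 256, kernel_size=3, padding=1),           # Conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),           # Conv5
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),                             # Pool3
        )
        # Object detection network: Conv6, then two parallel branches.
        self.conv6 = nn.Sequential(
            nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        # Branch A (Conv7 -> Conv9), 1x1 kernels: confidence + box position.
        self.conv7 = nn.Sequential(
            nn.Conv2d(512, 512, kernel_size=1), nn.ReLU(inplace=True))
        self.conv9 = nn.Conv2d(512, 1 + 4, kernel_size=1)  # conf, (x, y, w, h)
        # Branch B (Conv8 -> Conv10): per-pixel class scores.
        self.conv8 = nn.Sequential(
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        self.conv10 = nn.Conv2d(512, num_classes, kernel_size=1)

    def forward(self, x):
        f = self.conv6(self.features(x))
        conf_bbox = self.conv9(self.conv7(f))  # "Output_bbox"-style head
        cls_map = self.conv10(self.conv8(f))   # "Output_type"-style head
        return conf_bbox, cls_map
```

A forward pass on a dummy input, e.g. OptimizedCNN()(torch.randn(1, 3, 224, 224)), returns the confidence/position map and the per-pixel class map.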
In another aspect, the invention provides a depth-aware traffic scene multi-target detection system, comprising:
a Mask R-CNN model, which takes the picture to be detected as input and identifies the category and target position of first-type targets;
and an optimized CNN model, which takes the picture recognized by the Mask R-CNN model as input and detects the category, confidence and target position of second-type targets in the picture.
Further, the optimized CNN model comprises a feature extraction network and an object detection network, wherein the feature extraction network extracts features from the input picture to obtain a feature map, and the object detection network detects the picture to be detected and outputs the category, confidence and target position of the second-type targets in the picture.
The invention achieves the following beneficial technical effects:
Firstly, the invention adopts Mask R-CNN to detect large target objects, obtaining the large targets that can be clearly detected in each picture. A Mask R-CNN network is selected because it can not only detect objects but also segment them from the input image; however, the invention keeps only the larger, clearly segmented objects, because objects that are small or unclear would be identified incorrectly;
Secondly, the invention employs an optimized feature extractor and detector for small object detection. The core of the feature extractor is a network integrating different convolutional layers, a local normalization layer and max pooling layers, aiming to acquire as many detailed features of small targets as possible; the core of the detector is the use of 1 × 1 convolution kernels in place of ordinary fully connected layers. Because such 1 × 1 convolution kernels have local receptive fields, they can slide over a larger input image to obtain multiple outputs, regardless of the input image size. This conversion improves the efficiency of the neural network's forward propagation, enhances the learning capability of the CNN, and saves a large amount of time.
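As a quick illustration of this sliding property, the following sketch (with hypothetical channel and class counts) shows that a 1 × 1 convolutional head accepts feature maps of different spatial sizes with the same weights, which a fully connected layer cannot:

```python
# Sketch: a 1x1 convolutional head slides over feature maps of any
# spatial size, producing one prediction per location.
import torch
import torch.nn as nn

head = nn.Conv2d(256, 4, kernel_size=1)  # e.g. 4 class scores per location

small = torch.randn(1, 256, 7, 7)
large = torch.randn(1, 256, 14, 14)
print(head(small).shape)  # torch.Size([1, 4, 7, 7])
print(head(large).shape)  # torch.Size([1, 4, 14, 14]) -- same weights, more outputs
```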
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a diagram of the Mask R-CNN model architecture employed in an embodiment of the present invention;
FIG. 3 is a training flow diagram of the optimized CNN algorithm for small target detection in an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The method combines Mask R-CNN with an improved CNN-based optimized model and is suitable for multi-target detection. The feature extraction network of the optimized small-target detection part learns a large amount of information from the fine-grained details of lower layers, enriching the representation of small targets by reasonably increasing the feature map size and by downsampling. The detection part is trained to make full use of convolutional layers, so the input features of various targets can be well classified and detected. The invention further optimizes this dedicated detector in a fully convolutional manner and applies deep learning to the traffic scene. The overall architecture of the optimized CNN detector consists of a feature extractor and a detector: the feature extractor is composed of different convolutional layers, max pooling layers and a local normalization layer, while the detector mainly handles the classification task, via "Softmax with loss", and the bounding box regression task for localization. A standard fully connected layer typically introduces a large number of parameters, so replacing the fully connected layer with a 1 × 1 convolution kernel is effective and advantageous in reducing the amount of computation.
Fig. 1 is the flowchart of the method according to an embodiment of the present invention. The format of the PASCAL VOC data set and its evaluation tools are adopted, four types of target objects are selected (vehicles, people, traffic signs and license plates), and format conversion is performed to generate a training set. Pictures to be tested are collected from real life, and a test set is produced by the same data set generation method.
The practice of the present invention is further illustrated below with reference to fig. 1 and the examples. Fig. 1 shows the depth-aware traffic scene multi-target detection method provided by this embodiment, which includes:
s1, adopting the format of the PASCAL VOC data set and an evaluation algorithm tool. First, the category of KITTI is switched: the PASCAL VOC has 20 categories in total, in an urban traffic scene, the key detection objects are four types, namely vehicles, people, traffic signs and license plates, so that the data set is divided into the 4 categories; secondly, converting the labeling information: converting the tagged file from txt to xml, removing other information in the tag, and only leaving four types of vehicles, people, traffic signs and license plates; finally, a required training set is generated. Similarly, the pictures to be tested in the real life are collected, and the test set of the embodiment is generated according to the method.
S2, for the large targets contained in the image, this embodiment inputs the training set into the original Mask R-CNN network for training and generates a network model, as shown in FIG. 2. A network such as Mask R-CNN is selected because it can not only detect objects but also segment them from the input image; however, only the large, clearly segmented objects produced by Mask R-CNN, i.e. the first-type objects, are retained, since small or unclear objects would be identified incorrectly. It should be noted that the Mask R-CNN network is prior art, and its construction and training are common knowledge in the field, so they are not described here.
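The following sketch illustrates this first stage, with the off-the-shelf torchvision Mask R-CNN standing in for the trained model; the score and area thresholds that decide what counts as "large and clear" are assumptions, as the text gives no numeric values:

```python
# Sketch of stage one: detect and keep only large, clearly detected
# objects. Thresholds are assumptions, not values from the patent.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_large_targets(image: torch.Tensor, min_area: float = 32 * 32,
                         min_score: float = 0.7):
    """image: float tensor (3, H, W) in [0, 1]."""
    with torch.no_grad():
        out = model([image])[0]
    keep = []
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        w, h = box[2] - box[0], box[3] - box[1]
        if score >= min_score and w * h >= min_area:  # large and clear only
            keep.append((label.item(), score.item(), box.tolist()))
    return keep
```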
S3, for small-sized objects with insufficient labels and information, this embodiment inputs the training set in PASCAL VOC format into the network architecture of the optimized CNN detector designed in this embodiment for training, generating a network model. The network structure is divided into two parts: a feature extraction network and a detection network.
(1) For the feature extraction network part, this embodiment uses a network integrating different convolutional layers, local normalization layers and max pooling layers, as shown in fig. 3.
Fig. 3 shows that the feature extraction network structure includes 8 layers, from layer 1 to layer 8, which are a first convolutional neural network layer, a first maximum pooling layer, a second convolutional neural network layer, a third convolutional neural network layer, a second maximum pooling layer, a fourth convolutional neural network layer, a fifth convolutional neural network layer, and a third maximum pooling layer, respectively;
preferably, the first convolutional neural network layer is a normalization layer.
The multiple convolutional layers with nonlinear activation functions help enhance nonlinear expressive power; compared with a single convolutional layer, they can correctly handle multiple targets in the image and acquire as many detailed target features as possible. The network gradually deepens from Conv1, the representation of small targets in the image, i.e. the second-type targets (such as license plates and passengers inside vehicles), is expressed at smaller scales, and the output is the feature map that the following detection part of this embodiment takes as input. A large kernel (11 × 11) in Conv1 acts first on the input image to preserve low-level but rich details. The resulting features are then passed through 3 × 3 convolutional layers: as shown in fig. 3, two 5 × 5 convolutional layers are decomposed into the two 3 × 3 pairs Conv2/Conv3 and Conv4/Conv5. The advantage of replacing the 5 × 5 kernel of VGG Net with two smaller consecutive 3 × 3 convolutional layers is twofold: first, the multi-layer structure with nonlinear functions helps enhance nonlinear expressive power and can extract deeper features than a single 5 × 5 convolutional layer; second, the decomposition into two 3 × 3 convolutional layers introduces fewer parameters. Assuming the convolutional layers have C input channels and D output channels, a single 5 × 5 kernel has 5 × 5 × C × D = 25 × C × D parameters, while the two combined 3 × 3 convolutional layers have only 2 × (3 × 3 × C × D) = 18 × C × D, reducing the parameters by a factor of 25/18 ≈ 1.4. Fewer parameters help reduce overfitting and allow stronger functions to be expressed.
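The parameter arithmetic can be checked directly, for any illustrative channel counts:

```python
# Quick check of the parameter counts above, with C input channels
# and D output channels (values here are illustrative).
C, D = 64, 64
single_5x5 = 5 * 5 * C * D        # 25*C*D = 102400
two_3x3 = 2 * (3 * 3 * C * D)     # 18*C*D = 73728
print(single_5x5 / two_3x3)       # 25/18 ≈ 1.39
```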
The role of the max pooling layer is to compute the maximum value in each identified n × n region to enable image downsampling. It helps to simplify the network computational complexity, compress the input feature map and extract the main features.
(2) The detection network part completes the classification and localization tasks. It is divided into two branches, denoted "Output_type" and "Output_bbox" respectively.
The "Output_type" branch classifies objects at the pixel level. Here the fully connected layers of a traditional network (such as VGG Net) are replaced with two convolutional layers, Conv7 and Conv9. Thus the output of the transformed network (excluding the softmax layer) is no longer a category but a heatmap. The next step is element-by-element classification prediction: the maximum numerical probability for each pixel position across the 1000 heatmaps is computed pixel by pixel and regarded as that pixel's class, as sketched below. Finally, a "Softmax with loss" layer is used to compute the loss function for this task.
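A sketch of this element-by-element prediction, assuming the heatmap has one channel per class (1000 here, matching the text) and an illustrative spatial size:

```python
# Sketch: per-pixel argmax over the heatmap channels produced by a
# fully convolutional classification branch. Sizes are illustrative.
import torch

heatmap = torch.randn(1, 1000, 32, 32)   # one score map per class
probs = torch.softmax(heatmap, dim=1)
pixel_class = probs.argmax(dim=1)        # (1, 32, 32): class index per pixel
```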
The "Output_bbox" branch implements target localization and is composed of similar fully convolutional layers. It predicts object boundaries and outputs the difference between the predicted bounding box (x_min, y_min, w, h) and the ground truth.
S4, input the test set (the pictures to be detected) into the trained Mask R-CNN model to detect the category, confidence and target position of the large targets in the images, and save the pictures in which large targets are identified as a new test set; then input the new test set into the trained optimized CNN model to detect the category, confidence and target position of the small targets in the images.
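Putting the two stages together, a minimal sketch of this test procedure might look as follows, with detect_large_targets and OptimizedCNN from the earlier sketches (both hypothetical) filling the two roles:

```python
# Sketch of the two-stage test procedure: only pictures in which a
# large target was identified form the new test set for the
# small-target detector.
import torch

def two_stage_detection(test_set, large_detector, small_model):
    results = []
    for image in test_set:                 # image: float tensor (3, H, W)
        large = large_detector(image)      # [(label, score, box), ...]
        if not large:                      # no clear large target found,
            continue                       # picture is not re-examined
        with torch.no_grad():
            conf_bbox, cls_map = small_model(image.unsqueeze(0))
        results.append({"image": image, "large_targets": large,
                        "small_targets": (conf_bbox, cls_map)})
    return results
```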
The optimized CNN model structure parameters in this example are shown in table 1.
TABLE 1 Optimized CNN model structural parameters (reproduced as an image in the original publication)
Another embodiment provides a depth-aware traffic scene multi-target detection system, including:
the Mask R-CNN model is used to take the picture to be detected as input and identify the category and target position of first-type targets;
and the optimized CNN model is used to detect, in the picture recognized by the Mask R-CNN model, the category, confidence and target position of second-type targets.
On the basis of the above embodiment, further, the optimized CNN model comprises a feature extraction network and an object detection network, wherein the feature extraction network extracts features from the input picture to obtain a feature map, and the object detection network detects the picture to be detected and outputs the category, confidence and target position of the second-type targets in the picture.
The specific implementation manners of the Mask R-CNN model and the optimized CNN model in this embodiment are the same as those in the above embodiment, and will not be described again.
The invention divides multi-target detection in a traffic scene into large-target detection and small-target detection. The first part, directed at large targets (vehicles, traffic signs and pedestrians), adopts a Mask R-CNN model to identify and segment the target objects in the input image. The second part, directed at small targets (license plates and passengers inside vehicles), proposes an optimized CNN model: based on the advantages of the original CNN network, the feature extraction network and the detection network are optimized and trained to generate a new model for small-target detection. Performing small-target detection on top of the large-target detection result strengthens multi-target detection in traffic scenes, improves the accuracy of small-target recognition, and provides a well-performing model for multi-target detection in actual traffic scenes.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (5)

1. A depth perception traffic scene multi-target detection method is characterized by comprising the following steps:
inputting a picture to be detected into a Mask R-CNN model which is trained in advance to identify the category and the target position of a first type of target;
inputting the recognized picture into an optimized CNN model which is trained in advance, and detecting the category, confidence and target position of a second type of target in the picture;
the optimized CNN model comprises a feature extraction network and an object detection network, wherein the feature extraction network is used to extract features from the input picture to obtain a feature map; the object detection network detects the picture to be detected and outputs the category, confidence and target position of the second-type targets in the picture;
the object detection network comprises three layers: the first layer is a sixth convolutional neural network layer; the second layer consists of two parallel convolutional neural network layers, a seventh and an eighth neural network layer, both connected to the sixth neural network layer; the third layer consists of a ninth and a tenth neural network layer, connected to the seventh and the eighth neural network layer respectively, where the ninth neural network layer outputs the confidence and target position of a target and the tenth neural network layer outputs the category of the target.
2. The method as claimed in claim 1, wherein the feature extraction network structure includes 8 layers, and from layer 1 to layer 8, there are a first convolutional neural network layer, a first maximum pooling layer, a second convolutional neural network layer, a third convolutional neural network layer, a second maximum pooling layer, a fourth convolutional neural network layer, a fifth convolutional neural network layer and a third maximum pooling layer, respectively.
3. The method for multi-target detection in the deep perception traffic scene as claimed in claim 2, wherein the first convolutional neural network layer is a normalization layer.
4. A depth perception traffic scene multi-target detection system is characterized in that,
the Mask R-CNN model is used to take the picture to be detected as input and identify the category and target position of first-type targets;
the optimized CNN model is used to detect, in the picture recognized by the Mask R-CNN model, the category, confidence and target position of second-type targets;
the optimized CNN model comprises a feature extraction network and an object detection network, wherein the feature extraction network is used to extract features from the input picture to obtain a feature map; the object detection network detects the picture to be detected and outputs the category, confidence and target position of the second-type targets in the picture;
the object detection network comprises three layers: the first layer is a sixth convolutional neural network layer; the second layer consists of two parallel convolutional neural network layers, a seventh and an eighth neural network layer, both connected to the sixth neural network layer; the third layer consists of a ninth and a tenth neural network layer, connected to the seventh and the eighth neural network layer respectively, where the ninth neural network layer outputs the confidence and target position of a target and the tenth neural network layer outputs the category of the target.
5. The system for multi-target detection in a depth-aware traffic scene as claimed in claim 4, wherein the feature extraction network is configured to extract features from the input picture to obtain a feature map, and the object detection network detects the picture to be detected and outputs the category, confidence and target position of the second-type targets in the picture.
CN201911317498.3A 2019-12-19 2019-12-19 Depth perception traffic scene multi-target detection method and system Active CN111104903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911317498.3A CN111104903B (en) 2019-12-19 2019-12-19 Depth perception traffic scene multi-target detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911317498.3A CN111104903B (en) 2019-12-19 2019-12-19 Depth perception traffic scene multi-target detection method and system

Publications (2)

Publication Number Publication Date
CN111104903A CN111104903A (en) 2020-05-05
CN111104903B (en) 2022-07-26

Family

ID=70422517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911317498.3A Active CN111104903B (en) 2019-12-19 2019-12-19 Depth perception traffic scene multi-target detection method and system

Country Status (1)

Country Link
CN (1) CN111104903B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611947B (en) * 2020-05-25 2024-04-09 济南博观智能科技有限公司 License plate detection method, device, equipment and medium
CN113743398B (en) * 2020-05-29 2023-11-17 富泰华工业(深圳)有限公司 Image identification method, device, computer device and storage medium
CN111694973B (en) * 2020-06-09 2023-10-13 阿波罗智能技术(北京)有限公司 Model training method and device for automatic driving scene and electronic equipment
CN111723723A (en) * 2020-06-16 2020-09-29 东软睿驰汽车技术(沈阳)有限公司 Image detection method and device
CN112766364A (en) * 2021-01-18 2021-05-07 南京信息工程大学 Tomato leaf disease classification method for improving VGG19
CN113191273A (en) * 2021-04-30 2021-07-30 西安聚全网络科技有限公司 Oil field well site video target detection and identification method and system based on neural network
CN113191274A (en) * 2021-04-30 2021-07-30 西安聚全网络科技有限公司 Oil field video intelligent safety event detection method and system based on neural network
CN113469272B (en) * 2021-07-20 2023-05-19 东北财经大学 Target detection method for hotel scene picture based on fast R-CNN-FFS model
CN113393410A (en) * 2021-07-26 2021-09-14 浙江大华技术股份有限公司 Image fusion method and device, electronic equipment and storage medium
CN114022705B (en) * 2021-10-29 2023-08-04 电子科技大学 Self-adaptive target detection method based on scene complexity pre-classification
CN114742204A (en) * 2022-04-08 2022-07-12 黑龙江惠达科技发展有限公司 Method and device for detecting straw coverage rate
CN115359301A (en) * 2022-09-06 2022-11-18 上海寻序人工智能科技有限公司 Data mining method based on cloud platform

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977812B (en) * 2019-03-12 2023-02-24 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN110516670B (en) * 2019-08-26 2022-04-22 广西师范大学 Target detection method based on scene level and area suggestion self-attention module

Also Published As

Publication number Publication date
CN111104903A (en) 2020-05-05

Similar Documents

Publication Publication Date Title
CN111104903B (en) Depth perception traffic scene multi-target detection method and system
Dai et al. TIRNet: Object detection in thermal infrared images for autonomous driving
Azimi et al. Aerial LaneNet: Lane-marking semantic segmentation in aerial imagery using wavelet-enhanced cost-sensitive symmetric fully convolutional neural networks
CN109840521B (en) Integrated license plate recognition method based on deep learning
Kaur et al. A comprehensive review of object detection with deep learning
CN107316016A (en) A kind of track of vehicle statistical method based on Hadoop and monitoring video flow
CN111767878B (en) Deep learning-based traffic sign detection method and system in embedded device
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
Alvarez et al. Road geometry classification by adaptive shape models
Xiang et al. Lightweight fully convolutional network for license plate detection
CN111462140B (en) Real-time image instance segmentation method based on block stitching
CN110705412A (en) Video target detection method based on motion history image
CN111126401B (en) License plate character recognition method based on context information
CN112861931B (en) Multi-level change detection method, system, medium and electronic device based on difference attention neural network
Gad et al. Real-time lane instance segmentation using segnet and image processing
Yun et al. Part-level convolutional neural networks for pedestrian detection using saliency and boundary box alignment
Zhang et al. A front vehicle detection algorithm for intelligent vehicle based on improved gabor filter and SVM
CN115147450B (en) Moving target detection method and detection device based on motion frame difference image
CN112446292B (en) 2D image salient object detection method and system
Xu et al. SPNet: Superpixel pyramid network for scene parsing
Nataprawira et al. Pedestrian Detection on Multispectral Images in Different Lighting Conditions
CN113221604A (en) Target identification method and device, storage medium and electronic equipment
Chanawangsa et al. A new color-based lane detection via Gaussian radial basis function networks
Nataprawira et al. Pedestrian Detection in Different Lighting Conditions Using Deep Neural Networks.
Saranya et al. The Proficient ML method for Vehicle Detection and Recognition in Video Sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221221

Address after: Room 802-5, 8th floor, building A1, Huizhi Science Park, Hengtai Road, Nanjing Economic and Technological Development Zone, Jiangsu 210000

Patentee after: China Austria Internet of things technology (Nanjing) Co.,Ltd.

Address before: 210023 9 Wen Yuan Road, Qixia District, Nanjing, Jiangsu.

Patentee before: NANJING University OF POSTS AND TELECOMMUNICATIONS