CN106372597B - CNN Vehicle Detection method based on adaptive contextual information - Google Patents

CNN Vehicle Detection method based on adaptive contextual information

Info

Publication number
CN106372597B
CN106372597B CN201610786130.1A
Authority
CN
China
Prior art keywords
context
target
feature
traffic
cnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610786130.1A
Other languages
Chinese (zh)
Other versions
CN106372597A (en)
Inventor
李涛
李冬梅
张玉宏
曲豪
邹香玲
张栋梁
朱晓珺
郭航宇
高大伟
刘永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Chantu Intelligent Technology Co ltd
Original Assignee
Zhengzhou Zen Graphics Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Zen Graphics Intelligent Technology Co Ltd filed Critical Zhengzhou Zen Graphics Intelligent Technology Co Ltd
Priority to CN201610786130.1A priority Critical patent/CN106372597B/en
Publication of CN106372597A publication Critical patent/CN106372597A/en
Application granted granted Critical
Publication of CN106372597B publication Critical patent/CN106372597B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/768 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using context analysis, e.g. recognition aided by known co-occurring patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a CNN vehicle detection method based on adaptive contextual information, comprising a training stage and a detection stage. In the training stage, an adaptive contextual feature selection model is trained under a specific traffic scene; on the basis of the obtained adaptive contextual feature selection model, a CNN traffic detection system based on adaptive contextual information is trained. In the detection stage, the context and the target are jointly predicted in the forward stage, and the traffic target is accurately framed through post-processing. The invention proposes a CNN traffic detection system based on adaptive contextual information, mainly comprising a CNN-based adaptive context selection model and a traffic detection system fusing the model, which further improves the accuracy of vehicle and pedestrian detection.

Description

CNN traffic detection method based on self-adaptive context information
Technical Field
The invention relates to a CNN traffic detection technology based on self-adaptive context information, which can be applied in real time.
Background art:
in order to address increasingly serious traffic problems, Intelligent Transportation Systems (ITS) have been developed. Vehicle and pedestrian identification is an important component of an intelligent transportation system, and some existing related technologies for vehicles and pedestrians are already widely used.
Existing traffic detection systems mainly identify and detect different targets (pedestrians and vehicles) by characterizing the appearance information of the targets. Such systems typically describe the target appearance either with hand-crafted features (such as HOG, LBP and SIFT) or with deep features obtained directly from the image itself through deep learning, and use the target appearance to realize target detection. However, daily traffic detection mostly takes place in an open, unconstrained environment: traffic scenes are complex and changeable, with interference such as illumination changes, viewpoint changes and target occlusion. If detection relies only on the appearance information of the target, the target category cannot be judged accurately when the traffic target in the image or video provides too little information. Moreover, different traffic scenes differ from one another to some extent, and a one-size-fits-all traffic target detection system that ignores the differences between traffic scenes reduces the accuracy of traffic target detection.
The invention content is as follows:
the invention provides a CNN traffic detection method based on self-adaptive context information, which aims to further enrich the description of traffic targets by drawing on the different context information of different traffic scenes in traffic video, so as to improve the accuracy of traffic target detection.
The technical scheme adopted for realizing the purpose of the invention is as follows: a CNN traffic detection method based on self-adaptive context information comprises a training phase and a detection phase, and is characterized in that,
the training phase comprises two steps:
firstly, under a specific traffic scene, an adaptive context feature selection model is trained: two groups of CNN feature maps are first extracted for a traffic target image and its context image in the specific traffic scene; the difference between the two groups of feature maps at the same scale is then calculated, and the position indexes of the feature maps whose sample difference is smaller than a set threshold are recorded and counted; finally, the position indexes of K effective context CNN feature maps are selected to obtain the adaptive context selection model, where K is an integer greater than 0;
secondly, on the basis of the obtained adaptive context feature selection model, a CNN traffic detection system based on the adaptive context information is trained; in the forward stage, the two groups of CNN feature maps of the traffic target image and its context image are first extracted, and the K feature map position indexes retained by the context feature selection model are used to keep the corresponding effective feature maps from the obtained context CNN feature maps; the two groups of feature maps are then convolved with a target kernel and a context kernel respectively to obtain a target score and a context score; the target score and the context score are then fused through a mixing coefficient to obtain a detection score;
in the backward stage, the error between the detection score and the label is calculated, and the target kernel, the context kernel, the mixing coefficient and other parameters are updated with the BP (back propagation) algorithm;
a detection stage: in the specific traffic scene, a traffic image to be detected is first input and 256 feature maps are extracted with the CNN (convolutional neural network); on the one hand, a target mask map is obtained by convolving the 256 feature maps with the trained target kernel; on the other hand, K feature maps are selected from the 256 feature maps using the K feature map position indexes retained by the context feature selection model, and a context mask map is obtained by convolving them with the trained context kernel; then, the obtained target mask map and context mask map are fused with the trained mixing coefficient to jointly predict the target position; finally, the traffic target is accurately framed through post-processing, where K is an integer greater than 0.
Through the difference measurement, the scheme adaptively blends in context features according to the scene, enhancing the representation of the traffic target and effectively improving the accuracy with which traffic targets are characterized.
1. The CNN traffic detection system based on self-adaptive context information uses the first five layers of AlexNet to extract the feature maps of the traffic image. The method comprises the following steps. (AlexNet is an eight-layer convolutional neural network model proposed in 2012. The network has 60 million parameters and 650,000 neurons and consists of five convolutional layers, some of which are followed by max-pooling layers, three fully-connected layers, and a final 1000-way softmax layer.)
1.1 Assume the input image is x_0, expressed as x_0 = {x_0^R, x_0^G, x_0^B}, where x_0^R, x_0^G and x_0^B respectively represent the three channel maps of image x_0 in RGB space. The index of the convolutional layer is denoted by l, l = 1,2,3,4,5. M_l denotes the number of feature maps of the l-th convolutional layer, with M_1 = 96, M_2 = 256, M_3 = 384, M_4 = 384, M_5 = 256. The j-th feature map c_j^l of the l-th convolutional layer is computed as:
c_j^l = Σ_{m ∈ W_l(j)} k_{j,m}^l ⊗ p_m^{l-1} + b_j^l (1)
where W_l represents the connection relation between the feature maps of adjacent convolutional layers, ⊗ represents the convolution operation, and k_{j,m}^l and b_j^l respectively represent a convolution kernel and a bias (the maps p_m^0 of layer 0 are the input channels of x_0).
1.2 The feature map p_j^l is obtained from c_j^l through the pooling layer and the non-linear layer of the l-th layer, expressed as:
p_j^l = pool(g(f(c_j^l))) (2)
where g(·) represents local response normalization, f(·) represents the activation function, for which the non-saturating nonlinearity
f(x) = max(0, x) (3)
is adopted.
Thus, for an input image x_0, the CNN obtains 256 feature maps p_j^5, j = 1,...,256, at the fifth convolutional layer. Each feature map p_j^5 is 1/16 of the size of the input image x_0. For convenience of expression, the system uses F(x_0, j) to denote the j-th feature map of the 5th convolutional layer extracted from the input image x_0.
2. Adaptive context selection model. First, the read-in traffic image set I is expressed uniformly in the following form: I = {(x_n^o, x_n^c, y_n)}, where x_n^o denotes a target image, x_n^c denotes the context image containing that target image, y_n ∈ {0,1} denotes the label of the positive and negative samples, and n denotes the sample index:
2.1 For an input target image x_n^o of size 80 × 48 and its corresponding context image x_n^c of size 144 × 112, the CNN is used to extract the feature maps of the 5th convolutional layer, obtaining 256 target feature maps F(x_n^o, j) of size 5 × 3 and 256 corresponding context feature maps F(x_n^c, j) of size 9 × 7, j = 1,...,256.
2.2 To be able to compare, at the same scale, the difference between a target feature map F(x_n^o, j) and its corresponding context feature map F(x_n^c, j), each target feature map is upsampled to the size 9 × 7 and recorded as up(F(x_n^o, j)).
2.3 Cosine similarity is used to measure the difference between the two feature maps up(F(x_n^o, j)) and F(x_n^c, j). The target feature map up(F(x_n^o, j)) is denoted x and the context feature map F(x_n^c, j) is denoted y; the similarity is compared with an empirical threshold ε:
S_cos(x, y) ≤ ε (4)
If x and y differ little, the context feature map F(x_n^c, j) is discarded; otherwise the context feature map is retained, and the position index of the retained context feature map is recorded. The results over the 2N positive and negative sample pictures are counted and sorted by the frequency with which each position occurs, and the position indexes of the retained context feature maps are finally selected, realizing adaptive context feature selection, where N is an integer greater than 0. In this traffic system, about 85% of the context feature maps are retained.
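As a concrete illustration of steps 2.1-2.3, the sketch below (PyTorch, with hypothetical tensor and function names) upsamples each target feature map to the context-map size, scores it against the corresponding context feature map with cosine similarity, records which context channels survive the threshold test per sample, and finally keeps the most frequently retained position indexes. The retain-when-dissimilar reading of the threshold test and the default values of eps and k are assumptions (k defaults to about 85% of 256, matching the retention rate quoted above).

```python
import torch
import torch.nn.functional as F

def select_context_indices(target_feats, context_feats, eps=0.5, k=218):
    """Adaptive context feature selection (sketch of steps 2.1-2.3).

    target_feats : (N, 256, 5, 3) conv5 maps F(x_n^o, j) of the target images
    context_feats: (N, 256, 9, 7) conv5 maps F(x_n^c, j) of the context images
    Returns the position indexes of the k context feature maps retained most often.
    """
    n, c, hc, wc = context_feats.shape
    # Step 2.2: upsample each target map to the context-map size, giving up(F(x_n^o, j)).
    up_target = F.interpolate(target_feats, size=(hc, wc),
                              mode="bilinear", align_corners=False)

    # Step 2.3: per-sample, per-channel cosine similarity between the two maps.
    x = up_target.reshape(n, c, -1)
    y = context_feats.reshape(n, c, -1)
    sim = F.cosine_similarity(x, y, dim=2)     # shape (N, 256)

    # Retain a context map when the similarity is low, i.e. when it is not
    # redundant with the target map (one reading of S_cos(x, y) <= eps).
    retained = sim <= eps                      # boolean, shape (N, 256)

    # Count how often each position index survives across the 2N samples and
    # keep the k most frequently retained indexes.
    freq = retained.sum(dim=0)
    topk = torch.topk(freq, k).indices
    return torch.sort(topk).values

if __name__ == "__main__":
    tgt = torch.randn(20, 256, 5, 3)           # 2N = 20 positive/negative samples
    ctx = torch.randn(20, 256, 9, 7)
    print(select_context_indices(tgt, ctx).shape)   # torch.Size([218])
```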
3. Training to obtain relevant parameters of the traffic detection system, and obtaining a context mask and a target mask by adopting a forward process in a training stage:
3.1 For the target image x_n^o and the context image x_n^c containing the target, the CNN features are extracted as in step 1 to obtain the corresponding target feature maps F(x_n^o, j) and context feature maps F(x_n^c, j), where j = 1,...,256 indexes the 256 feature maps.
3.2 According to the adaptive context feature selection model obtained in step 2, the cosine similarity is used to measure the difference between a target feature map F(x_n^o, j) and the corresponding context feature map F(x_n^c, j); according to the threshold, the K context feature maps to be retained, F(x_n^c, q), are selected from the 256 context feature maps, where q = 1,...,K and K ≤ 256.
3.3 The target feature maps and the selected context feature maps are convolved with a target kernel and a context kernel respectively, giving the corresponding target mask mask_n^o and context mask mask_n^c, respectively expressed as:
mask_n^o = Σ_{j=1}^{256} w_j^o ⊗ F(x_n^o, j) + b^o,  mask_n^c = Σ_{q=1}^{K} w_q^c ⊗ F(x_n^c, q) + b^c
where w^o and b^o denote the target kernel and its bias, and w^c and b^c denote the context kernel and its bias. The sizes of the target kernel w^o and the context kernel w^c are consistent with the sizes of the target feature maps F(x_n^o, j) and the context feature maps F(x_n^c, q), respectively. Valid convolution (which has boundary loss) is used, so mask_n^o and mask_n^c are both scalars.
3.4 The detection score score_n that incorporates the context information is obtained by fusing the target mask mask_n^o and the context mask mask_n^c through a mixing coefficient γ, where γ represents the mixing coefficient between the target and the context, γ ∈ [0, 1]. A different γ needs to be obtained for different scenes. Because this variable reflects the scene, different context information plays a different role in target detection; for example, if γ = 0, the context plays no role in target detection and the model is equivalent to a CNN target detection model that does not consider context.
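As an illustration of steps 3.3 and 3.4, the sketch below (PyTorch, hypothetical names) computes the two scalar masks with 'valid' convolutions whose kernels match the feature-map sizes, and fuses them into a detection score. The convex-combination form score_n = (1 − γ)·mask_n^o + γ·mask_n^c is an assumption consistent with γ ∈ [0,1] and with γ = 0 removing the context contribution; the patent's exact fusion formula is not reproduced in the text above.

```python
import torch
import torch.nn.functional as F

def forward_score(target_feats, context_feats_sel, w_o, b_o, w_c, b_c, gamma):
    """Forward pass for one training sample (sketch of steps 3.3-3.4).

    target_feats     : (1, 256, 5, 3) target feature maps F(x_n^o, j)
    context_feats_sel: (1, K, 9, 7)   the K retained context maps F(x_n^c, q)
    w_o, b_o         : target kernel of shape (1, 256, 5, 3) and bias of shape (1,)
    w_c, b_c         : context kernel of shape (1, K, 9, 7) and bias of shape (1,)
    gamma            : mixing coefficient in [0, 1]
    """
    # 'valid' convolution with a kernel the size of the feature map -> 1x1 output, a scalar mask.
    mask_o = F.conv2d(target_feats, w_o, bias=b_o)        # shape (1, 1, 1, 1)
    mask_c = F.conv2d(context_feats_sel, w_c, bias=b_c)   # shape (1, 1, 1, 1)

    # Assumed fusion: convex combination of the target and context masks.
    score = (1.0 - gamma) * mask_o + gamma * mask_c
    return score.squeeze(), mask_o.squeeze(), mask_c.squeeze()

if __name__ == "__main__":
    K = 218
    tgt = torch.randn(1, 256, 5, 3)
    ctx = torch.randn(1, K, 9, 7)
    w_o, b_o = torch.randn(1, 256, 5, 3), torch.zeros(1)
    w_c, b_c = torch.randn(1, K, 9, 7), torch.zeros(1)
    score, _, _ = forward_score(tgt, ctx, w_o, b_o, w_c, b_c, gamma=0.3)
    print(float(score))
```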
3.5 The system establishes the objective function with the minimum mean square error criterion and uses the BP algorithm to gradually reduce the error between score_n and the label y_n. The objective function of the model is:
L = (1/2N) Σ_{n=1}^{2N} (score_n − y_n)^2
where 2N represents the total number of positive and negative samples. To solve the optimization problem over the related parameters in the above formula, the system trains the parameters in the model with stochastic gradient descent, and every parameter w is updated with the following rule until convergence:
w_{i+1} = w_i − α ∂L/∂w_i
where i represents the iteration index and α represents the learning rate of the gradient descent algorithm; updating the relevant parameters requires repeatedly computing the gradient of the objective function L(·).
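Step 3.5 can then be sketched as the gradient-descent loop below, which minimizes the mean squared error between score_n and the label y_n and applies w ← w − α·∂L/∂w to every learnable parameter. It takes the hypothetical forward_score helper from the previous sketch as an argument, and the parameter names, learning rate and γ clamping are illustrative assumptions.

```python
import torch

def train_step(params, batch, forward_score, alpha=1e-3):
    """One stochastic-gradient step on L = (1/2N) * sum_n (score_n - y_n)^2 (sketch).

    params       : dict of tensors with requires_grad=True: w_o, b_o, w_c, b_c, gamma
    batch        : list of (target_feats, context_feats_sel, label) triples
    forward_score: the forward function from the previous sketch
    """
    scores, labels = [], []
    for tgt, ctx, y in batch:
        s, _, _ = forward_score(tgt, ctx, params["w_o"], params["b_o"],
                                params["w_c"], params["b_c"], params["gamma"])
        scores.append(s)
        labels.append(torch.as_tensor(float(y)))

    scores = torch.stack(scores)
    labels = torch.stack(labels)
    loss = ((scores - labels) ** 2).mean()    # mean over the batch of positive/negative samples

    loss.backward()
    with torch.no_grad():
        for w in params.values():
            w -= alpha * w.grad               # w_{i+1} = w_i - alpha * dL/dw_i
            w.grad = None
        params["gamma"].clamp_(0.0, 1.0)      # keep the mixing coefficient in [0, 1]
    return float(loss)
```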
4. Detecting a traffic target: the target position is predicted by combining the target mask map and the context mask map, and the detection result of the traffic target is then obtained by non-maximum suppression.
4.1 First, the system takes an image I_n of the traffic scene as input and extracts feature maps by the method of step 1, generating 256 feature maps.
4.2 Then, according to the obtained context feature map selection model, K effective context feature maps are selected from the 256 feature maps, supplementing the existing target feature maps with context information.
4.3 Then, the context feature maps and the target feature maps are convolved with their corresponding convolution kernels to obtain the corresponding target mask map mask_n^o and context mask map mask_n^c.
4.4 In the detection phase, same convolution (which has no boundary loss) is used, so mask_n^o and mask_n^c are both matrices. Finally, the target mask map mask_n^o and the context mask map mask_n^c jointly predict the target position M in a weighted manner.
4.5 The detection result of the corresponding traffic target is obtained through non-maximum suppression post-processing.
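For the detection pass (steps 4.1-4.5), the sketch below switches to 'same' convolutions so the two masks become maps, fuses them with the learned γ (again assuming the convex-combination form), converts high-scoring feature-map positions into boxes of the training-window size at an assumed stride of 16, and applies greedy non-maximum suppression. Here keep_idx stands for the index list produced by the selection model; the thresholds and box-generation details are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def detect(feats, keep_idx, w_o, b_o, w_c, b_c, gamma,
           score_thr=0.5, iou_thr=0.3, stride=16, box_hw=(80, 48)):
    """Detection pass (sketch): fuse target and context mask maps, then NMS."""
    ctx_feats = feats[:, keep_idx]                                   # K selected context maps
    # 'same' convolution -> mask maps the size of the feature map (no boundary loss).
    pad_o = (w_o.shape[2] // 2, w_o.shape[3] // 2)
    pad_c = (w_c.shape[2] // 2, w_c.shape[3] // 2)
    mask_o = F.conv2d(feats, w_o, bias=b_o, padding=pad_o)
    mask_c = F.conv2d(ctx_feats, w_c, bias=b_c, padding=pad_c)
    M = (1.0 - gamma) * mask_o + gamma * mask_c                      # fused position map

    # Turn high-scoring feature-map positions into candidate boxes in image coordinates.
    ys, xs = torch.nonzero(M[0, 0] > score_thr, as_tuple=True)
    h, w = box_hw
    boxes = torch.stack([xs * stride - w / 2, ys * stride - h / 2,
                         xs * stride + w / 2, ys * stride + h / 2], dim=1).float()
    scores = M[0, 0, ys, xs]

    # Greedy non-maximum suppression over the candidate boxes.
    keep = []
    order = scores.argsort(descending=True)
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        if order.numel() == 1:
            break
        rest = order[1:]
        lt = torch.maximum(boxes[i, :2], boxes[rest, :2])
        rb = torch.minimum(boxes[i, 2:], boxes[rest, 2:])
        inter = (rb - lt).clamp(min=0).prod(dim=1)
        area_i = (boxes[i, 2:] - boxes[i, :2]).prod()
        area_r = (boxes[rest, 2:] - boxes[rest, :2]).prod(dim=1)
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]
    return boxes[keep], scores[keep]
```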
The invention has the following beneficial effects. In the above solution, for the problem that the traffic target itself provides insufficient information, related information from outside the target in the picture or video, such as the context information around the target, is used to directly or indirectly provide auxiliary information for target detection, thereby improving the accuracy of traffic target detection. The scheme proposes a CNN traffic detection system based on self-adaptive context information, mainly comprising a CNN-based adaptive context selection model and a traffic detection system that fuses this model. Compared with existing systems, fusing the context information around the target and accounting for the differences between traffic scenes further improves the accuracy of vehicle and pedestrian detection.
Description of the drawings:
FIG. 1 is an overall framework diagram of a context information adaptive CNN traffic detection system;
FIG. 2 is a diagram of an adaptive context selection model;
FIG. 3 is a parameter learning diagram for a context information adaptive CNN traffic detection system;
FIG. 4 is a diagram of a CNN traffic detection system detection process with adaptive context information;
FIG. 5 is a diagram of the traffic target detection results.
The specific implementation mode is as follows:
in traffic video, the description of the traffic target can be further enriched by means of the context information that differs between traffic scenes, so that the accuracy of traffic target detection is improved. The overall framework is shown in fig. 1 and mainly comprises a training phase and a detection phase.
The training phase mainly comprises two steps. In the first step, under a specific traffic scene, an adaptive context feature selection model is trained: two groups of CNN feature maps are first extracted for a traffic target image and its context image in the specific traffic scene; the difference between the two groups of feature maps at the same scale is then calculated, and the position indexes of the feature maps whose sample difference is smaller than a set threshold are recorded and counted; finally, the position indexes of K effective context CNN feature maps are selected to obtain the adaptive context selection model. In the second step, on the basis of the obtained adaptive context feature selection model, a CNN traffic detection system based on the adaptive context information is trained. In the forward stage, the two groups of CNN feature maps of the traffic target image and its context image are first extracted, and the K feature map position indexes retained by the context feature selection model are used to keep the corresponding feature maps from the obtained context CNN feature maps; the two groups of feature maps are then convolved with a target kernel and a context kernel respectively to obtain a target score and a context score; the target score and the context score are then fused through a mixing coefficient to obtain a detection score. In the backward stage, the error between the detection score and the label is calculated, and the target kernel, the context kernel, the mixing coefficient and other parameters are updated with the BP (back propagation) algorithm.
In the detection stage, under a specific traffic scene, firstly, inputting a detected traffic image, and extracting 256 feature maps by using CNN (convolutional neural network), on one hand, obtaining a target mask map by using a trained target kernel to convolve the 256 feature maps; on the other hand, K feature maps are selected from 256 feature maps by using K feature map position indexes reserved by the context feature selection model, and a context mask map is obtained by performing convolution by using a trained context kernel. And then, fusing the obtained target mask image and the context mask image by using the trained mixing coefficient to jointly predict the target position. And finally, accurately framing the traffic target through post-processing.
Through the difference measurement, the scheme adaptively blends in context features according to the scene, enhancing the representation of the traffic target and effectively improving the accuracy with which traffic targets are characterized.
I. CNN feature extraction
1) The CNN traffic detection system based on the self-adaptive context information uses the first five layers of AlexNet to extract the feature maps of the traffic image. The detailed steps are as follows:
(1.1) Assume the input image is x_0, expressed as x_0 = {x_0^R, x_0^G, x_0^B}, where x_0^R, x_0^G and x_0^B respectively represent the three channel maps of image x_0 in RGB space. The index of the convolutional layer is denoted by l, l = 1,2,3,4,5. M_l denotes the number of feature maps of the l-th convolutional layer, with M_1 = 96, M_2 = 256, M_3 = 384, M_4 = 384, M_5 = 256. The j-th feature map c_j^l of the l-th convolutional layer is computed as:
c_j^l = Σ_{m ∈ W_l(j)} k_{j,m}^l ⊗ p_m^{l-1} + b_j^l (1)
where W_l represents the connection relation between the feature maps of adjacent convolutional layers, ⊗ represents the convolution operation, and k_{j,m}^l and b_j^l respectively represent a convolution kernel and a bias.
(1.2) The feature map p_j^l is obtained from c_j^l through the pooling layer and the non-linear layer of the l-th layer, expressed as:
p_j^l = pool(g(f(c_j^l))) (2)
where g(·) represents local response normalization, f(·) represents the activation function, for which the non-saturating nonlinearity
f(x) = max(0, x) (3)
is adopted.
Thus, for an input image x_0, the CNN obtains 256 feature maps p_j^5, j = 1,...,256, at the fifth convolutional layer. Each feature map p_j^5 is 1/16 of the size of the input image x_0. For convenience of expression, the system uses F(x_0, j) to denote the j-th feature map of the 5th convolutional layer extracted from the input image x_0.
II. Adaptive context selection model
2) First, the read-in traffic image set I is expressed uniformly in the following form: I = {(x_n^o, x_n^c, y_n)}, where x_n^o denotes a target image, x_n^c denotes the context image containing that target image, y_n ∈ {0,1} denotes the label of the positive and negative samples, and n denotes the sample index. The specific process is shown in fig. 2:
(2.1) For an input target image x_n^o of size 80 × 48 and its corresponding context image x_n^c of size 144 × 112, the CNN is used to extract the feature maps of the 5th convolutional layer, obtaining 256 target feature maps F(x_n^o, j) of size 5 × 3 and 256 corresponding context feature maps F(x_n^c, j) of size 9 × 7, j = 1,...,256.
(2.2) To be able to compare, at the same scale, the difference between a target feature map F(x_n^o, j) and its corresponding context feature map F(x_n^c, j), each target feature map is upsampled to the size 9 × 7 and recorded as up(F(x_n^o, j)).
(2.3) Cosine similarity is used to measure the difference between the two feature maps up(F(x_n^o, j)) and F(x_n^c, j). The target feature map up(F(x_n^o, j)) is denoted x and the context feature map F(x_n^c, j) is denoted y; the similarity is compared with an empirical threshold ε:
S_cos(x, y) ≤ ε (4)
If x and y differ little, the context feature map F(x_n^c, j) is discarded; otherwise the context feature map is retained, and the position index of the retained context feature map is recorded. The results over the 2N positive and negative sample pictures are counted and sorted by the frequency with which each position occurs, and the position indexes of the retained context feature maps are finally selected, realizing adaptive context feature selection. In this traffic system, about 85% of the context feature maps are retained.
III. Training to obtain the relevant parameters of the traffic detection system
3) The training phase adopts a forward process to obtain a context mask and a target mask, and the overall framework of the related parameter learning is shown in fig. 3.
(3.1) For the target image x_n^o and the context image x_n^c containing the target, the CNN features are extracted as in 1) to obtain the corresponding target feature maps F(x_n^o, j) and context feature maps F(x_n^c, j), where j = 1,...,256 indexes the 256 feature maps.
(3.2) According to the adaptive context feature selection model obtained in 2), the cosine similarity is used to measure the difference between a target feature map F(x_n^o, j) and the corresponding context feature map F(x_n^c, j); according to the threshold, the K context feature maps to be retained, F(x_n^c, q), are selected from the 256 context feature maps, where q = 1,...,K and K ≤ 256.
(3.3) The target feature maps and the selected context feature maps are convolved with a target kernel and a context kernel respectively, giving the corresponding target mask mask_n^o and context mask mask_n^c, respectively expressed as:
mask_n^o = Σ_{j=1}^{256} w_j^o ⊗ F(x_n^o, j) + b^o,  mask_n^c = Σ_{q=1}^{K} w_q^c ⊗ F(x_n^c, q) + b^c
where w^o and b^o denote the target kernel and its bias, and w^c and b^c denote the context kernel and its bias. The sizes of the target kernel w^o and the context kernel w^c are consistent with the sizes of the target feature maps F(x_n^o, j) and the context feature maps F(x_n^c, q), respectively. Valid convolution (which has boundary loss) is used, so mask_n^o and mask_n^c are both scalars.
(3.4) The detection score score_n that incorporates the context information is obtained by fusing the target mask mask_n^o and the context mask mask_n^c through a mixing coefficient γ, where γ represents the mixing coefficient between the target and the context, γ ∈ [0, 1]. A different γ needs to be obtained for different scenes. Because this variable reflects the scene, different context information plays a different role in target detection; for example, if γ = 0, the context plays no role in target detection and the model is equivalent to a CNN target detection model that does not consider context.
(3.5) The system establishes the objective function with the minimum mean square error criterion and uses the BP algorithm to gradually reduce the error between score_n and the label y_n. The objective function of the model is:
L = (1/2N) Σ_{n=1}^{2N} (score_n − y_n)^2
where 2N represents the total number of positive and negative samples. To solve the optimization problem over the related parameters in the above formula, the system trains the parameters in the model with stochastic gradient descent, and every parameter w is updated with the following rule until convergence:
w_{i+1} = w_i − α ∂L/∂w_i
where i represents the iteration index and α represents the learning rate of the gradient descent algorithm; updating the relevant parameters requires repeatedly computing the gradient of the objective function L(·).
IV. Traffic target detection based on the traffic detection system
4) The target position is predicted by combining the target mask map and the context mask map, and the detection result of the traffic target is then obtained by non-maximum suppression; the detection process is shown in fig. 4.
(4.1) First, the system takes an image I_n of the traffic scene as input and extracts feature maps by the method of 1), generating 256 feature maps.
(4.2) Then, according to the obtained context feature map selection model, K effective context feature maps are selected from the 256 feature maps, supplementing the existing target feature maps with context information.
(4.3) Then, the context feature maps and the target feature maps are convolved with their corresponding convolution kernels to obtain the corresponding target mask map mask_n^o and context mask map mask_n^c.
(4.4) In the detection phase, same convolution (which has no boundary loss) is used, so mask_n^o and mask_n^c are both matrices. Finally, the target mask map mask_n^o and the context mask map mask_n^c jointly predict the target position M in a weighted manner.
(4.5) The detection result of the corresponding traffic target is obtained through non-maximum suppression post-processing.
The scheme establishes an efficient and fast traffic detection system that achieves satisfactory traffic target detection results, as shown in fig. 5.

Claims (5)

1. A CNN traffic detection method based on self-adaptive context information is characterized by comprising a training phase and a detection phase,
the training phase comprises two steps:
firstly, under a specific traffic scene, an adaptive context feature selection model is trained: two groups of CNN feature maps are first extracted for a traffic target image and its context image in the specific traffic scene; the difference between the two groups of feature maps at the same scale is then calculated, and the position indexes of the feature maps whose sample difference is smaller than a set threshold are recorded and counted; finally, the position indexes of K effective context CNN feature maps are selected to obtain the adaptive context selection model, wherein K is an integer greater than 0;
secondly, on the basis of the obtained adaptive context feature selection model, a CNN traffic detection system based on the adaptive context information is trained; in the forward stage, the two groups of CNN feature maps of the traffic target image and its context image are first extracted, and the K feature map position indexes retained by the context feature selection model are used to keep the corresponding effective feature maps from the obtained context CNN feature maps; the two groups of feature maps are then convolved with a target kernel and a context kernel respectively to obtain a target score and a context score; the target score and the context score are then fused through a mixing coefficient to obtain a detection score;
in the backward stage, the error between the detection score and the label is calculated, and the target kernel, the context kernel and the mixing coefficient parameters are updated with a BP algorithm;
a detection stage: in the specific traffic scene, a traffic image to be detected is first input and 256 feature maps are extracted with the CNN (convolutional neural network); on the one hand, a target mask map is obtained by convolving the 256 feature maps with the trained target kernel; on the other hand, K feature maps are selected from the 256 feature maps using the K feature map position indexes retained by the context feature selection model, and a context mask map is obtained by convolving them with the trained context kernel; then, the obtained target mask map and context mask map are fused with the trained mixing coefficient to jointly predict the target position; finally, the traffic target is accurately framed through post-processing.
2. The CNN traffic detection method based on adaptive context information according to claim 1, wherein the feature extraction on the image feature map adopts a first five-layer structure of a CNN-based Alexnet model to extract a corresponding feature map; the method comprises the following specific steps:
(1) assume the input image is x_0, expressed as x_0 = {x_0^R, x_0^G, x_0^B}, wherein x_0^R, x_0^G and x_0^B respectively represent the three channel maps of image x_0 in RGB space; the index of the convolutional layer is denoted by l, l = 1,2,3,4,5; M_l denotes the number of feature maps of the l-th convolutional layer, with M_1 = 96, M_2 = 256, M_3 = 384, M_4 = 384, M_5 = 256; the j-th feature map c_j^l of the l-th convolutional layer is computed as:
c_j^l = Σ_{m ∈ W_l(j)} k_{j,m}^l ⊗ p_m^{l-1} + b_j^l (1)
wherein W_l represents the connection relation between the feature maps of adjacent convolutional layers; ⊗ represents the convolution operation; k_{j,m}^l and b_j^l respectively represent a convolution kernel and a bias;
(2) the feature map p_j^l is obtained from c_j^l through the pooling layer and the non-linear layer of the l-th layer, expressed as:
p_j^l = pool(g(f(c_j^l))) (2)
wherein g(·) represents local response normalization, f(·) represents the activation function, for which the non-saturating nonlinearity f(x) = max(0, x) (3) is adopted;
thus, for an input image x_0, the CNN obtains 256 feature maps p_j^5, j = 1,...,256, at the fifth convolutional layer; each feature map p_j^5 is 1/16 of the size of the input image x_0; for convenience of expression, the system uses F(x_0, j) to denote the j-th feature map of the 5th convolutional layer extracted from the input image x_0.
3. The CNN traffic detection method based on adaptive context information according to claim 1, wherein the adaptive context selection model is obtained as follows: firstly, the read-in traffic image set I is expressed uniformly in the following form: I = {(x_n^o, x_n^c, y_n)}, wherein x_n^o denotes a target image, x_n^c denotes the context image containing that target image, y_n ∈ {0,1} denotes the label of the positive and negative samples, and n denotes the sample index; then:
(1) for an input target image x_n^o of size 80 × 48 and its corresponding context image x_n^c of size 144 × 112, the CNN is used to extract the feature maps of the 5th convolutional layer, obtaining 256 target feature maps F(x_n^o, j) of size 5 × 3 and 256 corresponding context feature maps F(x_n^c, j) of size 9 × 7;
(2) to compare, at the same scale, the difference between a target feature map F(x_n^o, j) and its corresponding context feature map F(x_n^c, j), each target feature map is upsampled to the size 9 × 7 and recorded as up(F(x_n^o, j));
(3) cosine similarity is used to measure the difference between the two feature maps up(F(x_n^o, j)) and F(x_n^c, j); the target feature map up(F(x_n^o, j)) is denoted x and the context feature map F(x_n^c, j) is denoted y; the similarity is compared with an empirical threshold ε:
S_cos(x, y) ≤ ε (4)
if x and y differ little, the context feature map F(x_n^c, j) is discarded; otherwise the context feature map is retained, and the position index of the retained context feature map is recorded; the results over the 2N positive and negative sample pictures are counted and sorted by the frequency with which each position occurs, and the position indexes of the retained context feature maps are finally selected, realizing adaptive context feature selection, wherein N is an integer greater than 0.
4. The CNN traffic detection method based on adaptive context information of claim 3, further comprising training to obtain relevant parameters of the traffic detection system, wherein the training phase adopts a forward process to obtain a context mask and a target mask:
(1) for the target image x_n^o and the context image x_n^c containing the target, the CNN features are extracted to obtain the corresponding target feature maps F(x_n^o, j) and context feature maps F(x_n^c, j), wherein j = 1,...,256 indexes the 256 feature maps;
(2) according to the adaptive context feature selection model obtained in claim 3, the cosine similarity is used to measure the difference between a target feature map F(x_n^o, j) and the corresponding context feature map F(x_n^c, j); according to the threshold, the K context feature maps to be retained, F(x_n^c, q), are selected from the 256 context feature maps, wherein q = 1,...,K and K is less than or equal to 256;
(3) the target feature maps and the selected context feature maps are convolved with a target kernel and a context kernel respectively, giving the corresponding target mask mask_n^o and context mask mask_n^c, respectively expressed as:
mask_n^o = Σ_{j=1}^{256} w_j^o ⊗ F(x_n^o, j) + b^o,  mask_n^c = Σ_{q=1}^{K} w_q^c ⊗ F(x_n^c, q) + b^c
wherein w^o and b^o denote the target kernel and its bias, and w^c and b^c denote the context kernel and its bias; the sizes of the target kernel w^o and the context kernel w^c are consistent with the sizes of the target feature maps F(x_n^o, j) and the context feature maps F(x_n^c, q), respectively;
valid convolution (which has boundary loss) is used, so mask_n^o and mask_n^c are both scalars;
(4) the detection score score_n that incorporates the context information is obtained by fusing the target mask mask_n^o and the context mask mask_n^c through a mixing coefficient γ,
wherein γ represents the mixing coefficient between the target and the context, and γ ∈ [0,1]; a different γ needs to be obtained for different scenes; because this variable reflects the scene, different context information plays a different role in target detection; if γ is 0, the context plays no role in target detection, the context need not be merged into the model, and the model is equivalent to a CNN target detection model that does not consider context;
(5) the system establishes the objective function with the minimum mean square error criterion and uses the BP algorithm to gradually reduce the error between score_n and the label y_n; the objective function of the model is:
L = (1/2N) Σ_{n=1}^{2N} (score_n − y_n)^2
wherein 2N represents the total number of positive and negative samples; to solve the optimization problem over the related parameters in the above formula, the system trains the parameters in the model with stochastic gradient descent, and every parameter w is updated with the following rule until convergence:
w_{i+1} = w_i − α ∂L/∂w_i
wherein i represents the iteration index, α represents the learning rate of the gradient descent algorithm, and updating the relevant parameters requires repeatedly computing the gradient of the objective function L(·).
5. The CNN traffic detection method based on adaptive context information of claim 2, further comprising traffic target detection, wherein the target position is predicted jointly through a target mask map and a context mask map, and then the detection result of the traffic target is obtained through non-maximum suppression:
(1) first, the system takes an image I_n of the traffic scene as input and extracts feature maps by the method of claim 2, generating 256 feature maps;
(2) then, according to the obtained context feature map selection model, K effective context feature maps are selected from the 256 feature maps, supplementing the existing target feature maps with context information;
(3) then, the context feature maps and the target feature maps are convolved with their corresponding convolution kernels to obtain the corresponding target mask map mask_n^o and context mask map mask_n^c;
(4) in the detection phase, same convolution (which has no boundary loss) is used, so mask_n^o and mask_n^c are both matrices; finally, the target mask map mask_n^o and the context mask map mask_n^c jointly predict the target position M in a weighted manner;
(5) the detection result of the corresponding traffic target is obtained through non-maximum suppression post-processing.
CN201610786130.1A 2016-08-31 2016-08-31 CNN Vehicle Detection method based on adaptive contextual information Expired - Fee Related CN106372597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610786130.1A CN106372597B (en) 2016-08-31 2016-08-31 CNN Vehicle Detection method based on adaptive contextual information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610786130.1A CN106372597B (en) 2016-08-31 2016-08-31 CNN Vehicle Detection method based on adaptive contextual information

Publications (2)

Publication Number Publication Date
CN106372597A CN106372597A (en) 2017-02-01
CN106372597B true CN106372597B (en) 2019-09-13

Family

ID=57898758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610786130.1A Expired - Fee Related CN106372597B (en) 2016-08-31 2016-08-31 CNN Vehicle Detection method based on adaptive contextual information

Country Status (1)

Country Link
CN (1) CN106372597B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270668A (en) * 2020-11-06 2021-01-26 南京斌之志网络科技有限公司 Suspended cable detection method and system and electronic equipment

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273870A (en) * 2017-07-07 2017-10-20 郑州航空工业管理学院 The pedestrian position detection method of integrating context information under a kind of monitoring scene
CN107563299B (en) * 2017-08-07 2021-06-15 郑州信息科技职业学院 Pedestrian detection method using RecNN to fuse context information
CN108229477B (en) * 2018-01-25 2020-10-09 深圳市商汤科技有限公司 Visual relevance identification method, device, equipment and storage medium for image
CN109658412B (en) * 2018-11-30 2021-03-30 湖南视比特机器人有限公司 Rapid packaging box identification and segmentation method for unstacking and sorting
CN111833601B (en) * 2020-06-28 2022-05-20 北京邮电大学 Macroscopic traffic law modeling method with low communication cost

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104067314A (en) * 2014-05-23 2014-09-24 中国科学院自动化研究所 Human-shaped image segmentation method
CN105740891A (en) * 2016-01-27 2016-07-06 北京工业大学 Target detection method based on multilevel characteristic extraction and context model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104067314A (en) * 2014-05-23 2014-09-24 中国科学院自动化研究所 Human-shaped image segmentation method
CN105740891A (en) * 2016-01-27 2016-07-06 北京工业大学 Target detection method based on multilevel characteristic extraction and context model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Intriguing properties of neural networks; Christian Szegedy et al.; arXiv:1312.6199; 2014-02-19; full text *
Learning Deep Features for Scene Recognition using Places Database; Bolei Zhou et al.; Advances in Neural Information Processing Systems 27 (NIPS 2014); 2014-12-31; full text *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270668A (en) * 2020-11-06 2021-01-26 南京斌之志网络科技有限公司 Suspended cable detection method and system and electronic equipment
CN112270668B (en) * 2020-11-06 2021-09-21 威海世一电子有限公司 Suspended cable detection method and system and electronic equipment

Also Published As

Publication number Publication date
CN106372597A (en) 2017-02-01

Similar Documents

Publication Publication Date Title
CN106372597B (en) CNN Vehicle Detection method based on adaptive contextual information
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
WO2023077816A1 (en) Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium
US20200250436A1 (en) Video object segmentation by reference-guided mask propagation
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN110659664B (en) SSD-based high-precision small object identification method
CN110378381A (en) Object detecting method, device and computer storage medium
CN110879982B (en) Crowd counting system and method
CN107273870A (en) The pedestrian position detection method of integrating context information under a kind of monitoring scene
CN110659601B (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
CN102542302A (en) Automatic complicated target identification method based on hierarchical object semantic graph
CN113052106B (en) Airplane take-off and landing runway identification method based on PSPNet network
CN113743417B (en) Semantic segmentation method and semantic segmentation device
CN112036455A (en) Image identification method, intelligent terminal and storage medium
CN111768415A (en) Image instance segmentation method without quantization pooling
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN108776777A (en) The recognition methods of spatial relationship between a kind of remote sensing image object based on Faster RCNN
CN115345905A (en) Target object tracking method, device, terminal and storage medium
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN114999637A (en) Pathological image diagnosis method and system based on multi-angle coding and embedded mutual learning
CN116883868A (en) Unmanned aerial vehicle intelligent cruising detection method based on adaptive image defogging
CN117727046A (en) Novel mountain torrent front-end instrument and meter reading automatic identification method and system
CN117253044A (en) Farmland remote sensing image segmentation method based on semi-supervised interactive learning
CN112070181B (en) Image stream-based cooperative detection method and device and storage medium
CN112115786B (en) Monocular vision odometer method based on attention U-net

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20170801

Address after: 450000, Henan economic and Technological Development Zone, Zhengzhou Province Second Street West, south one South Road, Xinghua science and Technology Industrial Park, No. 2, building 9, room 908, -37

Applicant after: ZHENGZHOU CHANTU INTELLIGENT TECHNOLOGY CO.,LTD.

Address before: Yuelu District City, Hunan province 410000 Changsha Lushan Road No. 932

Applicant before: Li Tao

GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190913

Termination date: 20210831