CN113111711A - Pooling method based on bilinear pyramid and spatial pyramid - Google Patents


Publication number
CN113111711A
Authority
CN
China
Prior art keywords
pooling
feature
bilinear
feature map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110265552.5A
Other languages
Chinese (zh)
Inventor
Shao Yiming (邵一鸣)
Bao Xiao'an (包晓安)
Bao Ziqun (包梓群)
Xu Mingyang (许铭洋)
Ma Yunlong (马云龙)
Ma Xuanjun (马铉钧)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Zhejiang Sci Tech University ZSTU
Zhejiang University of Science and Technology ZUST
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110265552.5A
Publication of CN113111711A
Legal status: Pending


Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/253: Fusion techniques of extracted features
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06V 10/464: Salient features, e.g. scale invariant feature transforms [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames


Abstract

The invention discloses a pooling method based on bilinear and spatial pyramids, belonging to the fields of image processing and computer vision. The invention comprises the following steps: acquiring a video stream and intercepting the target image to be processed; extracting features of different levels or different categories from the target image; fusing the feature groups by a bilinear method to obtain a global feature map; performing pyramid pooling on the fused global feature map to reduce the dimensionality of the feature map; and normalizing the dimension-reduced feature map as the final feature of the target image, completing the pooling operation, the obtained final feature being used for subsequent classification to realize identification of the object to be detected. The method is suitable for behavior recognition in images and for pooling in target detection; it reduces the dimensionality of multi-feature fusion, improves recognition efficiency, and meets differing recognition requirements for multiple features.

Description

Pooling method based on bilinear pyramid and spatial pyramid
Technical Field
The invention relates to the field of image processing and computer vision, in particular to a pooling method based on bilinear and spatial pyramids.
Background
In the era of rapid development of intelligent science and technology, functions of intelligent monitoring such as behavior recognition and target detection are gradually being perfected and popularized. Pooling is often used in a convolutional neural network to reduce the dimension of the feature vector output by a convolutional layer and improve the result while minimally affecting the expression of the original image semantics. Images have a "stationarity" property: useful features can be shared and applied across different image regions, and, imitating the human visual system, pooling aggregates statistics of features at different positions.
Traditional pooling modes generally include average pooling, maximum pooling, stochastic pooling and the like. Average and maximum pooling take the mean or the maximum of the corresponding image region, respectively, while stochastic pooling selects an element at random with a probability that increases with the element's value; this preserves the dominance of large values while keeping the presence of the other elements, preventing excessive distortion.
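The three traditional modes above can be sketched on a small map with non-overlapping 2 × 2 windows. This is a minimal illustration in NumPy, not the patent's implementation; the function name is illustrative, and stochastic pooling assumes nonnegative activations so that values can be used as sampling weights:

```python
import numpy as np

def pool2x2(x, mode="max", rng=None):
    """Pool a (H, W) map with non-overlapping 2x2 windows.

    mode: "max", "avg", or "stochastic" (an element is sampled with
    probability proportional to its value; assumes x >= 0).
    """
    h, w = x.shape
    out = np.empty((h // 2, w // 2), dtype=float)
    rng = rng or np.random.default_rng(0)
    for i in range(h // 2):
        for j in range(w // 2):
            win = x[2 * i:2 * i + 2, 2 * j:2 * j + 2].ravel()
            if mode == "max":
                out[i, j] = win.max()
            elif mode == "avg":
                out[i, j] = win.mean()
            else:  # stochastic: larger values are more likely to be chosen
                p = win / win.sum() if win.sum() > 0 else np.full(4, 0.25)
                out[i, j] = rng.choice(win, p=p)
    return out

x = np.array([[1., 2., 0., 1.],
              [3., 4., 1., 0.],
              [0., 1., 2., 2.],
              [1., 1., 2., 2.]])
print(pool2x2(x, "max"))  # [[4. 1.] [1. 2.]]
print(pool2x2(x, "avg"))  # [[2.5 0.5] [0.75 2.]]
```

Stochastic pooling returns one of the window's own values, so its output stays within the range spanned by the maximum while still letting smaller elements through occasionally.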
Finally, weighing the respective advantages and disadvantages of the different pooling methods, the invention adopts bilinear pooling to fuse two features and then obtains the corresponding feature map through pyramid pooling, which reduces dimensionality and fixes the output dimension, thereby better supporting the accuracy of behavior recognition and target detection.
Disclosure of Invention
In order to overcome the defects of the conventional image pooling method aiming at behavior recognition, target detection and the like, the method combines bilinear pooling and pyramid pooling, firstly performs multi-feature extraction on an object in a target image, performs bilinear fusion on a feature group to obtain a fused global feature map, and then performs pyramid pooling on the corresponding position of the global feature map. The pooling method disclosed by the invention integrates more image characteristics, reduces data loss, lays a foundation for improving the subsequent classification accuracy, generates output with a fixed size aiming at image input with any size, can be suitable for various classifiers, and is wide in application. The technical scheme adopted by the invention for solving the technical problems is as follows:
a bilinear and spatial pyramid-based pooling method comprises the following steps:
s1: acquiring a video stream according to a time sequence recorded by a monitoring system, wherein the video stream comprises an object to be detected;
s2: preprocessing the intercepted video stream, wherein the preprocessing comprises video shot segmentation and key frame extraction, and the extracted key frame image is used as a target image;
s3: identifying an object in a target image, labeling a candidate frame, and performing multi-feature extraction on the object in the candidate frame to obtain multi-feature data;
s4: multiplying multiple features corresponding to the same position of a target image by a bilinear method to obtain a local feature map, and summing and pooling the local feature maps corresponding to all target positions in the image to obtain a fused global feature map;
s5: pyramid pooling is carried out on the fused global feature map, and the dimensionality of the feature map is reduced; and normalizing the feature map subjected to dimension reduction to be used as the final feature of the target image, finishing pooling operation, and using the obtained final feature for subsequent classification to realize identification of the object to be detected.
Compared with the prior art, the invention has the advantages that:
(1) The invention can realize the fusion of feature groups of different levels and various categories by means of a bilinear method; the feature groups may be related feature groups of different levels and different frequencies, or similar feature groups extracted by different extraction methods, where each individual feature retains its original dimensionality. Because the fused feature map contains features of different levels and different types, the obtained feature information is more comprehensive, laying a foundation for improving subsequent classification accuracy.
(2) The invention further adopts a pyramid pooling method to reduce the dimension of the feature map after bilinear fusion, performing pooling operations of different sizes on the feature map to obtain feature information at different resolutions, thereby effectively improving the network's recognition precision on the features. Compared with the traditional R-CNN dimension-reduction approach, the method divides the global feature map with windows of different scales, pools each feature region, sets weights for the different layers, and finally splices the results into features of uniform dimension; this dimension is far smaller than the feature dimension under the R-CNN computation mode, so the calculation efficiency is high.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of bilinear pooling employed in the present embodiment;
fig. 3 is a schematic diagram of spatial pyramid pooling employed in the present embodiment.
Detailed Description
The invention is further described by the following detailed description in conjunction with the accompanying drawings.
As shown in fig. 1, the pooling method based on bilinear pyramid and spatial pyramid provided by the present invention combines bilinear pooling with spatial pyramid pooling, and is used for multi-feature fusion and reduction to obtain uniform dimensionality, and includes the following steps:
step 1: and acquiring screening data to obtain video streams. Part of data of the invention is from an INRIA XMAX multi-view video library, and part of data is shot and recorded by a monitoring system.
Step 2: and preprocessing the intercepted video stream, wherein the preprocessing comprises video shot segmentation and key frame extraction, and the extracted key frame image is used as a target image.
And step 3: and identifying the object in the target image, labeling the candidate frame, performing multi-feature extraction on the object in the candidate frame, and acquiring multi-feature data.
And 4, step 4: as shown in fig. 2, the extracted multi-feature points are linearly fused by a bilinear pooling method, and a feature map after linear fusion is output.
And 5: as shown in fig. 3, the linear fused feature map is subjected to dimension reduction by using a spatial pyramid pooling method, and the output size is unified for further processing.
In a specific implementation of the present invention, the video stream obtained in step 1 includes shot images of different viewing angles of the object to be detected, and is set according to a specific application scenario. The method also comprises a step of dividing each key frame image into area blocks by adopting a block matching method between the step 2 and the step 3, wherein the similarity between the continuous frames is judged by comparing the corresponding area blocks, and the method utilizes the local characteristics of the images to inhibit noise.
In one embodiment of the present invention, the multiple features extracted in step 3 form a feature group, each type of feature has its own original dimension, and the feature group may be related feature groups of different levels and different frequencies; or similar feature groups extracted in different extraction manners. In this embodiment, a dynamic video feature extraction technology is used to perform multi-feature extraction.
As shown in fig. 2, the extracted multiple features are subjected to bilinear fusion processing, multiple features corresponding to the same position of the target image are multiplied to obtain a local feature map, and then the local feature maps corresponding to all target positions in the image are summed and pooled to obtain a fused global feature map; the method specifically comprises the following steps:
For the target image I in fig. 2, different features are extracted through two branches, and the features extracted at the same position are multiplied, with the calculation formula:

b(l, I, f_A, f_B) = f_A(l, I)^T · f_B(l, I)

f_A(l, I) ∈ R^(1×M), f_B(l, I) ∈ R^(1×N)

In the formula, b(l, I, f_A, f_B) ∈ R^(M×N) represents the local feature map at position l of the target image I after bilinear fusion, f_A(l, I) and f_B(l, I) are the two features extracted at the same location l of image I, M and N are the numbers of feature channels, and T denotes the transpose.
Summing and pooling the local feature maps corresponding to all target positions in the image gives the global feature map, with the calculation formula:

ξ(I) = Σ_l b(l, I, f_A, f_B)

where ξ(I) represents the global feature map of the target image, the final output in fig. 2.
In a specific implementation of the present invention, as shown in fig. 3, pyramid pooling of different scales is performed on the fused global feature map, so as to reduce the dimension of the feature map and obtain feature information with different resolutions; normalizing the feature map subjected to dimension reduction, and outputting the feature map with unified standard dimensions as final features of the target image; it should be noted here that the input data of fig. 3 is the final output in fig. 2, i.e., ξ (I) in the above description, and the picture given in fig. 3 is only for exemplary purposes. The method specifically comprises the following steps:
1) and dividing the global feature map by using windows with different scales, wherein each scale represents one layer of the pyramid, and the global feature map is divided into image blocks in each layer.
Image division calculation formulas:

win_size = ⌈a / n⌉, the pooling window size (rounded up);
str_size = ⌊a / n⌋, the pooling stride (rounded down);

where a × a is the size of the feature map input to the pyramid pooling layer, and n × n is the size of the feature map output by the pyramid pooling layer.
2) Performing unified pooling operation on each image block, setting the number of feature layers obtained by pooling in each layer as a weight, and extracting higher-level image feature information; the pooling operation described herein may be maximum pooling, average pooling, or random pooling.
3) And cascading the feature vectors of the corresponding dimensionality generated by each layer.
4) And carrying out normalization processing on the cascaded feature vectors to serve as final features of the target image.
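The window and stride formulas of step 1) amount to one ceiling and one floor division. A small helper makes this concrete (an illustrative sketch; the function name is not from the patent):

```python
import math

def spp_window_stride(a, n):
    """Window and stride for pooling an a x a map down to n x n:
    window = ceil(a / n) (rounded up), stride = floor(a / n) (rounded down)."""
    return math.ceil(a / n), math.floor(a / n)

# e.g. a 13 x 13 feature map pooled down to 4 x 4 uses
# 4 x 4 windows with stride 3
assert spp_window_stride(13, 4) == (4, 3)
assert spp_window_stride(8, 2) == (4, 4)
```

Rounding the window up and the stride down guarantees the n × n grid of windows covers the whole a × a input even when n does not divide a.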
In this embodiment, as shown in fig. 3, a three-layer pyramid is used for pooling; a dedicated resolution is specified for each layer, and the feature map of the corresponding layer is extracted.
Let the resolution of pooling layer 1 be a × a, that of pooling layer 2 be b × b, and that of pooling layer 3 be c × c, and take the numbers of feature layers x, y and z of the pooling layers as weights; the layers then produce a × x, b × y and c × z dimensional feature vectors respectively. The values of x, y and z can be chosen according to the recognition requirement: if the recognized object occupies only a small area of the whole picture, the weight of the pooling layer with small regions should be correspondingly large; if recognition depends on the association of all parts of the whole image and the object occupies a large area, the weight of the pooling layer with relatively large regions should be increased as appropriate. For example, when the method is applied to small targets such as traffic sign recognition, the signs are generally fine and occupy little area in the picture, but their semantic information is concentrated, so the value z of pooling layer 3 in fig. 3 is large and carries more of the total weight (x + y + z). Similarly, when the method is applied to pedestrian action recognition, the person occupies a large proportion of the whole picture, so the weight z of pooling layer 3 is reduced appropriately and the weights x and y of pooling layer 1 and pooling layer 2 are relatively increased. The weights x, y and z can be thought of as percentage shares, but each is an integer value that directly determines the number of semantic features of each dimension after linear concatenation.
Referring to fig. 3, in this embodiment, the pooling layer 1 uses a 1 × 1 window to divide the global feature map, that is, the global feature map is not divided into small regions, the number of feature map layers output by the pooling layer 1 is set to be x, and a 1 × 1 × x feature vector is output; the pooling layer 2 divides the global feature map by adopting a 2 × 2 window, namely, the global feature map is divided into 2 × 2 small areas, the number of feature layers output by the pooling layer 2 is set as y, and 2 × 2 × y feature vectors are output by the pooling layer; the pooling layer 3 divides the global feature map by using a 4 × 4 window, that is, the global feature map is divided into 4 × 4 small regions, the number of feature layers output by the pooling layer 3 is set to z, and a 4 × 4 × z feature vector is output. And finally, performing linear cascade feature fusion on the three feature vectors to obtain (a multiplied by x + b multiplied by y + c multiplied by z) dimensional feature vectors, wherein the feature vectors are fused with the semantic information of bilinear feature fusion and simultaneously contain feature information of different scales and different levels. And carrying out normalization processing on the cascaded feature vectors through a normalization function, wherein the normalized feature vectors serve as final features of the target image and are used for subsequent classifier classification.
In this embodiment, after the values of x, y and z are determined, the pooling layers with different resolutions yield different numbers of feature vectors of different dimensions: pooling layer 3 extracts z 16-dimensional feature vectors, pooling layer 2 extracts y 4-dimensional feature vectors, and pooling layer 1 extracts x 1-dimensional feature vectors. The feature vectors extracted by the three layers are then fused. Feature fusion is a conventional means in neural networks; this embodiment employs direct "early fusion", i.e. feature fusion by Concat, so the dimensionality of the linearly cascaded vector is the sum of the dimensions obtained by the three pooling layers. The fused vector is used for classification by a subsequent classifier, such as an SVM classifier, which judges the recognition result from semantic information of the different dimensions. The invention does not limit the choice of feature fusion method or classifier; they can be replaced according to actual requirements and the operating environment.
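The three-layer embodiment above (1 × 1, 2 × 2 and 4 × 4 grids, max-pooled and concatenated into a fixed-length vector, then normalized) can be sketched as follows. This is a simplified NumPy illustration, not the patent's implementation: it keeps the same channel count C at every level instead of the per-layer weights x, y, z, and uses max pooling for the uniform pooling operation of step 7.2):

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) map at each pyramid level and concatenate.

    Each level n splits the map into an n x n grid (n <= H, W assumed);
    every cell is max-pooled over its region, so the output length is
    C * sum(n * n for n in levels) regardless of H and W.
    """
    c, h, w = fmap.shape
    parts = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                r0, r1 = (i * h) // n, ((i + 1) * h) // n
                c0, c1 = (j * w) // n, ((j + 1) * w) // n
                parts.append(fmap[:, r0:r1, c0:c1].max(axis=(1, 2)))
    vec = np.concatenate(parts)
    # L2-normalize the cascaded vector, as in step 7.4)
    return vec / (np.linalg.norm(vec) + 1e-12)

f = np.random.default_rng(0).standard_normal((8, 13, 13))
v = spatial_pyramid_pool(f)
assert v.shape == (8 * (1 + 4 + 16),)  # fixed length 168
# a different input size yields the same output length
g = np.random.default_rng(1).standard_normal((8, 20, 17))
assert spatial_pyramid_pool(g).shape == v.shape
```

The fixed output length (here C · 21) is what lets the pooled vector feed a classifier with a fixed input dimension regardless of the original image size.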
According to the method, the bilinear pooling is used for fusing the feature groups, the pyramid pooling is used for dimensionality reduction, the corresponding feature graph is obtained in a fixed output dimensionality mode, the final feature graph comprises semantic information sampled under different levels, different features, different dimensionalities and resolution ratios, and the semantic information is used for subsequent classification, so that the accuracy of behavior identification and target detection can be effectively improved.
The foregoing lists merely illustrate specific embodiments of the invention. It is obvious that the invention is not limited to the above embodiments, but that many variations are possible. All modifications which can be derived or suggested by a person skilled in the art from the disclosure of the present invention are to be considered within the scope of the invention.

Claims (9)

1. A pooling method based on bilinear and spatial pyramids, characterized by comprising the following steps:
s1: acquiring a video stream according to a time sequence recorded by a monitoring system, wherein the video stream contains an object to be detected;
s2: preprocessing the intercepted video stream, wherein the preprocessing comprises video shot segmentation and key frame extraction, and the extracted key frame image is used as a target image;
s3: identifying an object in a target image, labeling a candidate frame, and performing multi-feature extraction on the object in the candidate frame to obtain multi-feature data;
s4: multiplying multiple features corresponding to the same position of a target image by a bilinear method to obtain a local feature map, and summing and pooling the local feature maps corresponding to all target positions in the image to obtain a fused global feature map;
s5: pyramid pooling is carried out on the fused global feature map, and the dimensionality of the feature map is reduced; and normalizing the feature map subjected to dimension reduction to be used as the final feature of the target image, finishing pooling operation, and using the obtained final feature for subsequent classification to realize identification of the object to be detected.
2. The bilinear and spatial pyramid based pooling method of claim 1, wherein the acquired video stream includes different view angle captured images of the object to be detected.
3. The bilinear and spatial pyramid-based pooling method of claim 1, further comprising a step of dividing each key frame image into region blocks by using a block matching method between said step S2 and step S3, wherein the similarity between consecutive frames is determined by comparing the corresponding region blocks.
4. The bilinear and spatial pyramid based pooling method of claim 1, wherein said multi-feature data includes different levels or classes of features.
5. The bilinear and spatial pyramid based pooling method of claim 1, wherein said bilinear method is calculated by the formula:
b(l, I, f_A, f_B) = f_A(l, I)^T · f_B(l, I)

f_A(l, I) ∈ R^(1×M), f_B(l, I) ∈ R^(1×N)

In the formula, b(l, I, f_A, f_B) represents the local feature map at position l of the target image I after bilinear fusion, f_A(l, I) and f_B(l, I) are the two features extracted at the same location l of image I, M and N are the numbers of feature channels, and T denotes the transpose.
6. The bilinear and spatial pyramid-based pooling method of claim 5, wherein the local feature maps corresponding to all target positions in the image are summed and pooled to obtain a global feature map, and the calculation formula is:
ξ(I) = Σ_l b(l, I, f_A, f_B)

In the formula, ξ(I) represents the global feature map of the target image.
7. The bilinear and spatial pyramid-based pooling method of claim 1, wherein said pyramid pooling process comprises:
7.1) dividing the global feature map by using windows with different scales, wherein each scale represents one layer of a pyramid, and the global feature map is divided into image blocks in each layer;
7.2) carrying out uniform pooling operation on each image block, and setting the number of feature layers obtained by pooling of each layer as a weight;
7.3) cascading the feature vectors of the corresponding dimensionality generated by each layer;
and 7.4) carrying out normalization processing on the cascaded feature vectors to be used as the final features of the target image.
8. Bilinear and spatial pyramid based pooling method according to claim 7, characterized in that said pooling of step 7.2) is a maximal pooling, an average pooling or a random pooling.
9. The bilinear and spatial pyramid-based pooling method of claim 1, wherein the fused global feature map is pyramid-pooled and then output in a uniform standard dimension.
CN202110265552.5A 2021-03-11 2021-03-11 Pooling method based on bilinear pyramid and spatial pyramid Pending CN113111711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110265552.5A CN113111711A (en) 2021-03-11 2021-03-11 Pooling method based on bilinear pyramid and spatial pyramid


Publications (1)

Publication Number Publication Date
CN113111711A true CN113111711A (en) 2021-07-13

Family

ID=76711151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110265552.5A Pending CN113111711A (en) 2021-03-11 2021-03-11 Pooling method based on bilinear pyramid and spatial pyramid

Country Status (1)

Country Link
CN (1) CN113111711A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160104056A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Spatial pyramid pooling networks for image processing
WO2019222951A1 (en) * 2018-05-24 2019-11-28 Nokia Technologies Oy Method and apparatus for computer vision
CN109215034A (en) * 2018-07-06 2019-01-15 成都图必优科技有限公司 A kind of Weakly supervised image, semantic dividing method for covering pond based on spatial pyramid
CN109902693A (en) * 2019-02-16 2019-06-18 太原理工大学 One kind being based on more attention spatial pyramid characteristic image recognition methods
CN110633661A (en) * 2019-08-31 2019-12-31 南京理工大学 Semantic segmentation fused remote sensing image target detection method
CN112329808A (en) * 2020-09-25 2021-02-05 武汉光谷信息技术股份有限公司 Optimization method and system of Deeplab semantic segmentation algorithm
CN112418176A (en) * 2020-12-09 2021-02-26 江西师范大学 Remote sensing image semantic segmentation method based on pyramid pooling multilevel feature fusion network
CN112329747A (en) * 2021-01-04 2021-02-05 湖南大学 Vehicle parameter detection method based on video identification and deep learning and related device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Duan Xunda et al., "Pedestrian attribute recognition based on attention mechanism and spatial pyramid pooling", Journal of University of Jinan (Science and Technology), vol. 34, no. 4, pages 342-349 *
Ma Li et al., "Fine-grained image classification based on sparsified bilinear convolutional neural network", Pattern Recognition and Artificial Intelligence, vol. 32, no. 4, pages 336-344 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination