CN111967522A - Image sequence classification method based on funnel convolution structure

Info

Publication number: CN111967522A
Authority: CN (China)
Prior art keywords: convolution; funnel; convolution kernel; image sequence; features
Legal status: Granted
Application number: CN202010834656.9A
Other languages: Chinese (zh)
Other versions: CN111967522B (en)
Inventors: 黄新俊, 陈建炜, 陈阳
Current Assignee: Nanjing Tuge Medical Technology Co ltd
Original Assignee: Nanjing Tuge Medical Technology Co ltd
Application filed by Nanjing Tuge Medical Technology Co ltd
Priority to CN202010834656.9A
Publication of CN111967522A
Application granted
Publication of CN111967522B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image sequence classification method based on a funnel convolution structure, comprising the following steps. Step 1: extract spatial features of the image sequence with a 1 × n × n convolution kernel. Step 2: extract short-term temporal features of the image sequence with a funnel convolution kernel. Step 3: extract long-term temporal features of the image sequence with ConvLSTM. Step 4: concatenate the features obtained in steps 1 to 3 along the channel dimension and weight them. The method improves on the 3D convolution kernel by replacing the 3D convolution kernels in the original network with the funnel convolution structure. Because the funnel convolution structure completely separates the extraction of temporal features from the extraction of spatial features, the decoupling is cleaner, the physical meaning is clearer, the number of training parameters is reduced, the features are extracted independently, the parameters influence one another less, and the results improve.

Description

Image sequence classification method based on funnel convolution structure
Technical Field
The invention belongs to the technical field of computer image processing, and particularly relates to an image sequence classification method based on a funnel convolution structure.
Background
Deep learning grew out of stacking perceptrons in machine learning. Convolutional neural networks, recurrent neural networks, and similar deep learning models can be used to solve problems including, but not limited to, classification, object detection, and segmentation. In video classification, it is common to sample some frames, extract temporal and spatial features from them, and then classify; that is, to classify an image sequence. Image sequence classification methods fall into three general categories: 3D convolutional neural networks, convolutional neural networks combined with LSTM, and two-stream networks based on optical flow. A 3D convolutional neural network usually uses a 3 × 3 × 3 convolution kernel, which extracts temporal and spatial features simultaneously and therefore works better than extracting spatial features from single frames with traditional methods. The problem with the 3D convolution kernel is equally apparent: the number of parameters grows substantially (an n × n × n kernel has n times as many weights as an n × n kernel), which leads to severe overfitting. The classical remedy used in recent years is to decompose the 3 × 3 × 3 convolution kernel into a 1 × 3 × 3 kernel and a 3 × 1 × 1 kernel, which extract spatial and temporal features respectively, to alleviate overfitting.
However, temporal features cannot be extracted with a 3 × 1 × 1 convolution kernel alone, because elements at the same location in different frames do not necessarily have the same semantics. For example, during a swinging motion the moving object may no longer be at the same location in the next frame, so a temporal feature extracted at a fixed location loses its meaning. Therefore, the 3 × 1 × 1 kernel must operate on features already extracted by the 1 × 3 × 3 kernel in order to take into account the pixels around the target position in the preceding and following frames. This creates a problem: spatial feature extraction affects temporal feature extraction, and parameter training becomes more difficult.
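As a rough check on the parameter counts discussed in this background, the following sketch counts the weights of a full 3 × 3 × 3 kernel and of the factorized 1 × 3 × 3 plus 3 × 1 × 1 pair. The channel count of 64, the helper name `conv3d_weights`, and the choice to ignore biases are illustrative assumptions, not values taken from the patent.

```python
def conv3d_weights(c_in, c_out, kt, kh, kw):
    """Number of weights in a 3D convolution layer (biases ignored)."""
    return c_in * c_out * kt * kh * kw


# Hypothetical layer width, chosen only for illustration.
channels = 64

full_3d = conv3d_weights(channels, channels, 3, 3, 3)   # joint space-time kernel
spatial = conv3d_weights(channels, channels, 1, 3, 3)   # spatial-only kernel
temporal = conv3d_weights(channels, channels, 3, 1, 1)  # temporal-only kernel
factorized = spatial + temporal

print(f"3x3x3 kernel:          {full_3d} weights")
print(f"1x3x3 + 3x1x1 kernels: {factorized} weights")
print(f"ratio: {factorized / full_3d:.2f}")
```

Per input-output channel pair, the full kernel holds 27 weights while the factorized pair holds 9 + 3 = 12, which is where the saving comes from.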
Disclosure of Invention
The technical problem to be solved by the present invention is to provide an image sequence classification method based on a funnel convolution structure. The method adopts a convolution structure that completely separates temporal features from spatial features: a funnel convolution kernel extracts short-term temporal features, a 1 × 3 × 3 convolution kernel extracts spatial features, ConvLSTM extracts long-term temporal features, and a channel attention mechanism assigns weights to the different feature channels.
In order to achieve the technical purpose, the technical scheme adopted by the invention is as follows:
An image sequence classification method based on a funnel convolution structure performs image sequence classification by replacing the 3D convolution kernels in a 3D convolutional neural network with a funnel convolution structure. The method comprises:
Step 1: passing the output of the previous network layer through a convolution layer with a kernel size of 1 × n × n to extract the spatial features of the image sequence;
Step 2: passing the output of the previous network layer through a convolution layer with a funnel convolution kernel to extract the short-term temporal features of the image sequence, i.e., the relationship between a given frame and its surrounding frames;
Step 3: passing the output of the previous network layer through the ConvLSTM network structure proposed by Xingjian Shi, Zhourong Chen et al. to extract the long-term temporal features of the image sequence, i.e., the relationship from the first frame up to the current frame;
Step 4: concatenating the features obtained in steps 1 to 3 along the channel dimension, weighting them, and then classifying through a fully connected layer.
To further optimize the technical scheme, the following specific measures are also adopted:
The total number of feature channels extracted in steps 1 to 3 equals the number of channels of the convolution layer in the original 3D convolutional neural network.
The funnel convolution kernel in step 2 is obtained by modifying an n × n × n 3D convolution kernel as follows: the spatial size of the kernel at its temporal center is changed to 1 × 1, and the other positions are unchanged.
When the 3D convolution kernel is a 3 × 3 × 3 kernel, the funnel convolution kernel derived from it is a 3D kernel formed by stacking three 2D kernels of sizes 3 × 3, 1 × 1, and 3 × 3.
Step 4 is specifically:
A channel attention mechanism weights the features obtained in steps 1 to 3: the features are concatenated along the channel dimension, globally pooled over all non-channel dimensions, and passed through a fully connected layer, and the result is multiplied element by element with the concatenated features, after which the image sequence is classified.
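The channel attention step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text names only global pooling, a fully connected layer, and an element-wise product, so the use of average pooling, a sigmoid gate, a single C × C fully connected layer, random weights, and the function name `channel_attention` are all assumptions.

```python
import numpy as np


def channel_attention(spatial, short_term, long_term, fc_weight, fc_bias):
    """Weight concatenated branch features channel by channel.

    Each input has shape (channels_i, T, H, W). Average pooling and the
    sigmoid gate are assumptions; the source text only names global pooling,
    a fully connected layer, and an element-wise product.
    """
    # Concatenate the three branches along the channel dimension.
    feats = np.concatenate([spatial, short_term, long_term], axis=0)
    # Global pooling over every non-channel dimension: one value per channel.
    pooled = feats.mean(axis=(1, 2, 3))
    # Fully connected layer followed by an assumed sigmoid gate.
    logits = fc_weight @ pooled + fc_bias
    weights = 1.0 / (1.0 + np.exp(-logits))
    # Broadcast the per-channel weights over (T, H, W).
    weighted = feats * weights[:, None, None, None]
    return weighted, weights


rng = np.random.default_rng(0)
n1, n2, n3 = 4, 2, 2           # hypothetical channel counts of the three branches
t, h, w = 3, 5, 5              # hypothetical clip length and frame size
spatial = rng.normal(size=(n1, t, h, w))
short_term = rng.normal(size=(n2, t, h, w))
long_term = rng.normal(size=(n3, t, h, w))

c = n1 + n2 + n3
fc_weight = rng.normal(size=(c, c))
fc_bias = np.zeros(c)

weighted, weights = channel_attention(spatial, short_term, long_term, fc_weight, fc_bias)
print(weighted.shape, weights.shape)
```

In a trained network `fc_weight` and `fc_bias` would be learned; here they are random so that the example runs on its own.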
The invention has the following beneficial effects:
A 3D convolutional neural network is a network for extracting features from time-series images. It typically uses an n × n × n convolution kernel, i.e., the kernel has size n along time, image height, and image width, so temporal and spatial features are extracted simultaneously. To extract temporal and spatial features independently within 3D convolution, the invention improves the convolution kernel and replaces the 3D convolution kernels in the original network with the funnel convolution structure. The funnel convolution structure extracts spatial features, short-term temporal features, and long-term temporal features independently, and an attention mechanism weighs these features, so the network can process temporal and spatial features separately. Because the extraction of temporal features is completely separated from the extraction of spatial features, the decoupling is cleaner, the physical meaning is clearer, the number of training parameters is reduced, the features are extracted independently, the parameters influence one another less, and the results improve.
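One way to picture the funnel convolution kernel is as an n × n × n kernel whose temporally central slice keeps only its center weight. The sketch below builds such a mask and applies it, with all-ones weights, to a toy clip in which a single bright pixel moves one column per frame. The mask-based formulation, the function name `funnel_mask`, and the toy clip are illustrative assumptions; in a real layer the mask would multiply learnable weights.

```python
import numpy as np


def funnel_mask(n=3):
    """Binary mask of shape (time, height, width) for an n x n x n kernel.

    All slices stay n x n except the temporally central one, which is
    reduced to its single center element (a 1 x 1 spatial kernel).
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that the kernel has a center")
    center = n // 2
    mask = np.ones((n, n, n), dtype=np.float32)
    mask[center] = 0.0
    mask[center, center, center] = 1.0
    return mask


mask = funnel_mask(3)
active_weights = int(mask.sum())  # 9 + 1 + 9 positions remain active

# Toy clip: 3 frames of 5 x 5 pixels with one bright pixel moving right.
clip = np.zeros((3, 5, 5), dtype=np.float32)
for frame in range(3):
    clip[frame, 2, 1 + frame] = 1.0

# 3 x 3 x 3 neighbourhood centered on the pixel's position in the middle frame.
patch = clip[:, 1:4, 1:4]

# A 3 x 1 x 1 temporal kernel only looks at the same location in each frame.
temporal_only = np.zeros((3, 3, 3), dtype=np.float32)
temporal_only[:, 1, 1] = 1.0

funnel_response = float((patch * mask).sum())
temporal_response = float((patch * temporal_only).sum())
print(active_weights, funnel_response, temporal_response)
```

With these all-ones weights the funnel-shaped kernel responds to the pixel in all three frames, whereas the fixed-location temporal kernel sees it only in the middle frame, which mirrors the moving-object argument made in the background section.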
Drawings
Fig. 1 is a schematic diagram of the funnel convolution structure;
Fig. 2 is a schematic structural diagram of the I3D convolutional neural network.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The invention provides an image sequence classification method based on a funnel convolution structure, comprising the following steps:
Step 1: extract spatial features of the image sequence with a 1 × n × n convolution kernel;
Step 2: extract short-term temporal features of the image sequence with a funnel convolution kernel;
The funnel convolution kernel is obtained by modifying an n × n × n 3D convolution kernel as follows: the spatial size of the kernel at its temporal center is changed to 1 × 1, and the other positions are unchanged.
Referring to Fig. 1, when the 3D convolution kernel is a 3 × 3 × 3 kernel, the funnel convolution kernel derived from it is a 3D kernel formed by stacking three 2D kernels of sizes 3 × 3, 1 × 1, and 3 × 3.
The left diagram of Fig. 1 shows the funnel convolution structure, which can replace a 3D convolution layer in the original network; in that case, the sum of N1, N2, and N3 should equal the number of channels of the 3D convolution layer in the original network. The right diagram shows the funnel convolution kernel, i.e., the original 3D convolution kernel with the size at its center changed to 1 × 1.
Except for the pixel at the convolution center, a change in any other pixel affects only one of the two feature types, short-term temporal or spatial.
Step 3: extract long-term temporal features of the image sequence with ConvLSTM, and concatenate the features of steps 1 to 3 along the channel dimension;
Referring to Fig. 1 and Fig. 2, in this embodiment all 3 × 3 × 3 convolution kernels of the I3D network are replaced with the funnel convolution structure proposed by the invention. The left diagram of Fig. 2 shows the I3D network structure, which contains several Inception modules; the right diagram shows the network structure of an Inception module.
Step 4: weight the features obtained in steps 1 to 3 with a channel attention mechanism, and then classify through a fully connected layer.
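To give a feel for how the three branches of the steps above might share a layer's channel budget, the following sketch counts weights per branch for one replaced layer. Everything numeric here is a hypothetical assumption: the patent does not state how the channels are split among N1, N2, and N3, the ConvLSTM is assumed to use n × n kernels for its four gates with peephole terms ignored, and biases and the attention layer are left out.

```python
def funnel_structure_weights(c_in, n1, n2, n3, n=3):
    """Weights per branch of one funnel convolution structure (sketch).

    n1, n2, n3 are the output channels of the spatial, funnel, and ConvLSTM
    branches. Biases, peephole terms, and the attention layer are ignored.
    """
    spatial = c_in * n1 * (n * n)              # 1 x n x n kernel
    funnel = c_in * n2 * (n**3 - n**2 + 1)     # n x n x n kernel, 1 x 1 central slice
    conv_lstm = 4 * (c_in + n3) * n3 * (n * n)  # 4 gates over input and hidden state
    return {"spatial": spatial, "funnel": funnel, "conv_lstm": conv_lstm}


# Hypothetical layer: 64 input channels, 64 output channels in the original network.
c_in, c_out = 64, 64
n1, n2, n3 = 32, 16, 16  # assumed split; only the sum is constrained
assert n1 + n2 + n3 == c_out

branches = funnel_structure_weights(c_in, n1, n2, n3)
total = sum(branches.values())
full_3d = c_in * c_out * 3 * 3 * 3  # original 3 x 3 x 3 layer

for name, count in branches.items():
    print(f"{name:>9}: {count}")
print(f"    total: {total} (original 3x3x3 layer: {full_3d})")
```

The ratio obtained depends heavily on the assumed split, so it should not be read as a reproduction of the figures reported in the patent's experiment.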
In one example, the I3D network and the improved I3D network were both trained from scratch on the UCF101 dataset, and their accuracy on the test set was compared. The results are shown in Table 1.
Table 1. Accuracy and parameter count of I3D and improved I3D
Network         Accuracy    Parameter count
I3D             42.59%      12.4M
Improved I3D    45.02%      10.99M
The improved I3D has fewer parameters and higher accuracy, showing that the decoupled operations yield a clear improvement.
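The differences implied by Table 1 can be computed directly from its four reported values. The variable names below are illustrative; no figures beyond those in the table are assumed.

```python
# Values reported in Table 1.
i3d_accuracy = 42.59         # percent
improved_accuracy = 45.02    # percent
i3d_params_m = 12.4          # millions of parameters
improved_params_m = 10.99    # millions of parameters

accuracy_gain_points = improved_accuracy - i3d_accuracy
params_saved_m = i3d_params_m - improved_params_m
param_reduction_pct = params_saved_m / i3d_params_m * 100

print(f"accuracy gain: {accuracy_gain_points:.2f} percentage points")
print(f"parameters saved: {params_saved_m:.2f}M ({param_reduction_pct:.1f}% fewer)")
```

This works out to a gain of 2.43 percentage points with roughly 11.4% fewer parameters.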
The above is only a preferred embodiment of the present invention. The scope of protection of the invention is not limited to the above embodiment; all technical solutions within the concept of the invention fall within its scope of protection. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention are also regarded as falling within the scope of protection of the invention.

Claims (5)

1. An image sequence classification method based on a funnel convolution structure, characterized by comprising the following steps:
step 1: passing the output of the previous network layer through a convolution layer with a kernel size of 1 × n × n to extract the spatial features of the image sequence;
step 2: passing the output of the previous network layer through a convolution layer with a funnel convolution kernel to extract the short-term temporal features of the image sequence, i.e., the relationship between a given frame and its surrounding frames;
step 3: passing the output of the previous network layer through a ConvLSTM network structure to extract the long-term temporal features of the image sequence, i.e., the relationship from the first frame up to the current frame;
step 4: concatenating the features obtained in steps 1 to 3 along the channel dimension, weighting them, and then classifying through a fully connected layer.
2. The image sequence classification method based on a funnel convolution structure according to claim 1, wherein the total number of feature channels extracted in steps 1 to 3 equals the number of channels of the convolution layer in the original 3D convolutional neural network.
3. The method according to claim 1, wherein the funnel convolution kernel of step 2 is obtained by modifying an n × n × n 3D convolution kernel as follows: the spatial size of the kernel at its temporal center is changed to 1 × 1, and the other positions are unchanged.
4. The method according to claim 1, wherein the 3D convolution kernel is a 3 × 3 × 3 kernel, and the funnel convolution kernel derived from it is a 3D kernel formed by stacking three 2D kernels of sizes 3 × 3, 1 × 1, and 3 × 3.
5. The image sequence classification method based on a funnel convolution structure according to claim 1, wherein step 4 specifically comprises:
weighting the features obtained in steps 1 to 3 with a channel attention mechanism, namely concatenating the features obtained in steps 1 to 3 along the channel dimension, performing global pooling over all non-channel dimensions, passing the result through a fully connected layer, and multiplying it element by element with the concatenated features, so as to realize image sequence classification.
CN202010834656.9A 2020-08-19 2020-08-19 Image sequence classification method based on funnel convolution structure Active CN111967522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010834656.9A CN111967522B (en) 2020-08-19 2020-08-19 Image sequence classification method based on funnel convolution structure

Publications (2)

Publication Number Publication Date
CN111967522A (en) 2020-11-20
CN111967522B (en) 2022-02-25

Family

ID=73388458

Country Status (1)

Country Link
CN (1) CN111967522B (en)

Also Published As

Publication number Publication date
CN111967522B (en) 2022-02-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant