WO2019136761A1 - Three-dimensional convolution device for recognizing human actions - Google Patents

Three-dimensional convolution device for recognizing human actions. Download PDF

Info

Publication number
WO2019136761A1
WO2019136761A1 | PCT/CN2018/072675 | CN2018072675W
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
dimensional convolution
frame
convolution
input
Prior art date
Application number
PCT/CN2018/072675
Other languages
English (en)
French (fr)
Inventor
肖梦秋
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司 filed Critical 深圳鲲云信息科技有限公司
Priority to CN201880002145.1A priority Critical patent/CN109416743B/zh
Priority to PCT/CN2018/072675 priority patent/WO2019136761A1/zh
Publication of WO2019136761A1 publication Critical patent/WO2019136761A1/zh

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/94: Hardware or software architectures specially adapted for image or video understanding
    • G06V10/955: Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition

Definitions

  • the invention belongs to the field of artificial intelligence technology, and relates to a convolution device, in particular to a three-dimensional convolution device for recognizing human actions.
  • in recent years, three-dimensional convolutional neural networks (3D CNNs) have been widely used in video analysis, 3D geometric data, and medical image diagnosis. Convolutional neural networks are already computationally intensive, and three-dimensional convolutional neural networks push the computational requirements to another level, because each computation depends on complex images.
  • the prior art generally uses GPUs and CPUs running two-dimensional convolutional neural networks to analyze and process video data, but the GPU incurs high power consumption during operation, the CPU suffers from low processing speed, and neither the GPU nor the CPU is suitable for video action recognition involving large amounts of data.
  • an object of the present invention is to provide a three-dimensional convolution device for recognizing human actions, which solves the problems that the prior art cannot recognize human actions from large amounts of video data through a hardware structure and is subject to storage and bandwidth limitations.
  • to achieve the above and other related objects, the present invention provides a three-dimensional convolution device for recognizing human actions, comprising: at least one three-dimensional convolution layer, at least one rectified linear unit layer, and at least one three-dimensional pooling layer;
  • the three-dimensional convolution layer includes: a cache memory for buffering the video data to be recognized, the video data comprising a number of feature images; and a line buffer for receiving the pixels of the feature images bit by bit to form row data and outputting K_C adjacent input frames in parallel, wherein each adjacent input frame is composed of row data;
  • K_C denotes the kernel count of the 3D convolution kernel and is greater than or equal to 3; (K_C-1) frame buffers buffer (K_C-1) adjacent input frames; K_C matrix buffers receive the K_C adjacent input frames and simultaneously output K_C*K_C adjacent output frames, wherein the first matrix buffer is directly connected to the line buffer so as to assemble the received row data directly into the first adjacent input frame and output the first adjacent output frame, and the remaining (K_C-1) matrix buffers are connected to the (K_C-1) frame buffers respectively; K_C three-dimensional convolution processors convolve the K_C*K_C adjacent output frames with a pre-stored three-dimensional convolution kernel consisting of three 2-dimensional kernels; and an accumulator accumulates the convolution results of the K_C convolution processors.
  • the three-dimensional convolution layer can process N_C*N_L feature images, each feature image having a height H and a width W, with N_C*N_L coefficient vectors of size K_C^3.
  • the three-dimensional convolution layer is processed using frame blocking, pixel blocking, and/or coefficient buffering.
  • the frame blocking refers to dividing the input frames fed to the K_C convolution processors into input data while maintaining the original size of each frame; if the buffered input frames amount to C_i, each frame block includes C_i/(N_C*H*W) frames, and the overhead pixels of each frame block are (K_C-1)*(N_C*H*W)/C_i; H is the height of the feature image, W is the width of the feature image, and N_C is the number of image channels.
  • the pixel blocking refers to dividing each input frame fed to the K_C convolution processors into square blocks of the same size while retaining all input frames; each input frame has 2*(K_C-1) overhead pixels, and if each input frame contains C_i/(N_L*N_C) pixels, the overhead pixels of each frame block are given by the expression shown in the original formula image, where N_L denotes the number of input frames.
  • the duty ratio of frame blocking to pixel blocking is given by the expression shown in the original formula image; if the duty ratio is greater than 1, pixel blocking is used to process the three-dimensional convolution layer, and if it is less than 1, frame blocking is used.
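  • the duty-ratio rule above can be sketched in software. This is an illustrative model only, not part of the patent disclosure: the exact closed forms appear only as formula images in the source, so the pixel-blocking overhead is taken as a free parameter here, and all names and numbers are assumptions:

```python
def frame_block_overhead(k_c: int, n_c: int, h: int, w: int, c_i: int) -> float:
    """Overhead pixels per frame block, per the stated formula:
    (K_C - 1) * (N_C * H * W) / C_i."""
    return (k_c - 1) * (n_c * h * w) / c_i

def choose_blocking(frame_overhead: float, pixel_overhead: float) -> str:
    """Duty ratio = frame-blocking overhead / pixel-blocking overhead.
    Ratio > 1 -> pixel blocking is cheaper; ratio < 1 -> frame blocking."""
    return "pixel blocking" if frame_overhead / pixel_overhead > 1 else "frame blocking"

# Example (assumed values): 3x3x3 kernel, 3 channels, 112x112 frames,
# 1 MiB of buffered input frames.
overhead = frame_block_overhead(k_c=3, n_c=3, h=112, w=112, c_i=2**20)
```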
  • the coefficient buffering means that if the coefficient buffer size is C_C and each vector contains K_C*K_C*K_C coefficients, the coefficient buffer size must satisfy C_C ≥ N_f*N_C*K_C*K_C*K_C.
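  • the buffer-size constraint can be checked mechanically; a minimal sketch (the concrete numbers below are assumptions for illustration, not values from the patent):

```python
def coeff_buffer_ok(c_c: int, n_f: int, n_c: int, k_c: int) -> bool:
    """Check C_C >= N_f * N_C * K_C^3: the buffer must hold all N_f * N_C
    coefficient vectors of K_C * K_C * K_C coefficients each."""
    return c_c >= n_f * n_c * k_c ** 3

# Assumed example: 64 output filters, 3 input channels, 3x3x3 kernels
# -> 64 * 3 * 27 = 5184 coefficients needed.
coeff_buffer_ok(8192, n_f=64, n_c=3, k_c=3)
```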
  • each 2-dimensional convolution kernel includes K_C^2 multipliers and an adder tree of depth log(K_C).
  • the line buffer is provided with K_C serially connected FIFO memories; each FIFO stores one row of data of the feature image; the rows of data are stored into the FIFOs in sequence along the path formed by the serial FIFO chain.
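  • the chained-FIFO behaviour can be modelled in software; a sketch under the assumption that each FIFO holds exactly one image row of W pixels (class and method names are illustrative, not from the patent):

```python
from collections import deque

class LineBuffer:
    """Model of K_C chained FIFOs, each one image row (W pixels) deep.
    A pixel shifted out of FIFO i feeds FIFO i+1, so once the chain is
    full the FIFO contents expose K_C vertically adjacent rows in parallel."""

    def __init__(self, k_c: int, width: int):
        self.fifos = [deque(maxlen=width) for _ in range(k_c)]

    def push(self, pixel) -> None:
        for fifo in self.fifos:
            if len(fifo) < fifo.maxlen:
                fifo.append(pixel)
                return
            evicted = fifo[0]
            fifo.append(pixel)   # deque(maxlen=...) drops the oldest entry
            pixel = evicted      # the shifted-out pixel feeds the next FIFO

    def rows(self):
        """The K_C buffered rows, newest row first."""
        return [list(f) for f in self.fifos]
```

Pushing the 12 pixels of a 3-row, 4-pixel-wide image leaves the three FIFOs holding the three adjacent rows in parallel.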
  • the matrix buffer arranges the incoming adjacent input frames into a matrix stored in a plurality of registers.
  • the three-dimensional convolution device for recognizing human actions of the present invention has the following advantageous effects:
  • the three-dimensional convolution device for recognizing human actions recognizes human actions from high-volume video data through a hardware structure, and solves the storage and bandwidth limitation problems, thereby reducing overall power consumption.
  • FIG. 1 is a block diagram of an embodiment of the three-dimensional convolution device for recognizing human actions of the present invention.
  • FIG. 2 is a schematic diagram of the hardware structure of the three-dimensional convolution device for recognizing human actions according to an embodiment of the present invention.
  • after a piece of video data is input into the three-dimensional convolution device for recognizing human actions provided by the present invention, the video data is divided into 16 non-overlapping frame segments and adjusted to three channels of 112*112 size, and three-dimensional convolution is used to extract, in the temporal and spatial dimensions, the motion information encoded in multiple consecutive frames of data.
  • for all convolutional layers, applying a stride of 1 with zero padding, the present invention makes the size of the input feature image equal to the size of the output feature image.
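  • the 16-frame segmentation described above can be sketched as follows (illustrative only: resizing to 112*112 with three channels is assumed to happen upstream, and dropping trailing frames that do not fill a segment is an assumption, not stated by the patent):

```python
import numpy as np

def to_clips(frames: np.ndarray, clip_len: int = 16) -> np.ndarray:
    """Split a video of shape (T, 112, 112, 3) into non-overlapping
    clip_len-frame segments, dropping any trailing remainder."""
    t = (frames.shape[0] // clip_len) * clip_len
    return frames[:t].reshape(-1, clip_len, *frames.shape[1:])

video = np.zeros((35, 112, 112, 3), dtype=np.uint8)
clips = to_clips(video)   # shape (2, 16, 112, 112, 3); 3 trailing frames dropped
```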
  • the embodiment provides a three-dimensional convolution device 1 for recognizing human actions, the device 1 comprising at least one three-dimensional convolution layer 2, at least one rectified linear unit layer 3, and at least one three-dimensional pooling layer 4, wherein the three-dimensional convolution layer is used to recognize human actions.
  • referring to FIG. 1, a schematic structural view of an embodiment of the three-dimensional convolution device is shown.
  • the three-dimensional convolution device 1 is provided with eight three-dimensional convolution layers (denoted Cov), five three-dimensional pooling layers (denoted pool), and two fully connected layers (denoted fc6).
  • the three-dimensional convolution layers use a three-dimensional convolution kernel with coefficients of size 3*3*3.
  • referring to FIG. 2, a hardware structure diagram of the three-dimensional convolution device in an embodiment is shown.
  • the three-dimensional convolution device 1 is provided, in the three-dimensional convolution layer 2, with a cache memory 21, a line buffer 22, frame buffers 23, matrix buffers 24, three-dimensional convolution processors 25, and an accumulator 26.
  • the cache memory 21 is configured to cache video data to be identified, and the to-be-identified video data includes a plurality of feature images.
  • the input feature images are sequentially input into the cache memory 21 in the order of division.
  • the feature image has a height H and a width W, with N_C*N_L coefficient vectors of size K_C^3; K_C is greater than or equal to 3, and in this embodiment K_C = 3.
  • a line buffer 22 connected to the cache memory 21 is configured to receive the pixels of the feature image bit by bit to form row data and to output K_C adjacent input frames in parallel; each adjacent input frame is composed of row data, and K_C denotes the kernel count of the 3D convolution kernel.
  • the line buffer 22 is provided with K_C serially connected FIFO memories (in this embodiment, three serial FIFO memories 221); each FIFO stores one row of data of the feature image; the rows of data are stored into the FIFOs in sequence along the path formed by the serial FIFO chain.
  • the (K_C-1) frame buffers 23 are used to buffer (K_C-1) adjacent input frames.
  • K_C matrix buffers 24 are used to receive K_C adjacent input frames while simultaneously outputting K_C*K_C adjacent output frames.
  • the first matrix buffer 24 is directly connected to the line buffer 22 so as to assemble the received row data directly into the first adjacent input frame and output the first adjacent output frame; the remaining (K_C-1) matrix buffers 24 are connected to the (K_C-1) frame buffers 23, respectively.
  • the K_C three-dimensional convolution processors 25, respectively connected to the K_C matrix buffers 24, are used to convolve the K_C*K_C adjacent output frames with the pre-stored three-dimensional convolution kernel.
  • the matrix buffer arranges the incoming adjacent input frames into a matrix stored in a plurality of registers.
  • the pre-stored three-dimensional convolution kernel consists of three 2-dimensional convolution kernels.
  • the three-dimensional convolution kernel is used for convolution processing of three adjacent output frames.
  • each 2-dimensional convolution kernel includes K_C^2 multipliers and an adder tree of depth log(K_C).
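  • numerically, the decomposition of a K_C*K_C*K_C kernel into K_C two-dimensional kernels whose results are accumulated can be illustrated as follows (a software model of the datapath, not the hardware itself; function names and shapes are illustrative assumptions):

```python
import numpy as np

def conv2d(frame: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """Valid 2D convolution (correlation form): K_C^2 multiplies per output
    pixel, summed as the hardware adder tree would."""
    k = k2.shape[0]
    h, w = frame.shape[0] - k + 1, frame.shape[1] - k + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(frame[i:i + k, j:j + k] * k2)
    return out

def conv3d_from_2d(frames: np.ndarray, k3: np.ndarray) -> np.ndarray:
    """A K_C x K_C x K_C 3D convolution over K_C adjacent frames, realized
    as K_C parallel 2D convolutions whose results are accumulated,
    mirroring the K_C convolution processors feeding the accumulator."""
    return sum(conv2d(frames[d], k3[d]) for d in range(k3.shape[0]))
```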
  • while the first input frame is subjected to three-dimensional convolution in the first convolution processor 25 (core 1.3 in FIG. 2), the second and third adjacent input frames, buffered in the second and third frame buffers 23, flow into the second convolution processor (core 1.2 in FIG. 2) and the third convolution processor (core 1.1 in FIG. 2) for three-dimensional convolution.
  • the rectified linear unit layer 3 includes a three-dimensional output buffer 31 connected to the accumulator 26, which selects, from the accumulated results output by the accumulator 26, the larger of each pixel and the number 0, and buffers the selected maximum pixels row by row.
  • the three-dimensional pooling layer 4 includes a three-dimensional pooler 41 connected to the three-dimensional output buffer 31, which buffers the maximum pixels of the row data input row by row through a line buffer to form a two-dimensional pooling result, caches the two-dimensional pooling result through a frame buffer, and selects the maximum value from the two-dimensional pooling result to form a three-dimensional pooling result; the three-dimensional pooling result is the human action recognized from the video to be recognized.
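  • the compare-with-zero selection and the two-stage (row-wise, then frame-wise) max pooling can be sketched as plain array operations. This is an illustrative model; the pooling window size p is an assumption, as the patent does not state one here:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Rectified linear unit: keep the larger of each pixel and 0."""
    return np.maximum(x, 0)

def pool3d(x: np.ndarray, p: int = 2) -> np.ndarray:
    """3D max pooling over p x p x p blocks: a max within each frame
    (the row-by-row stage) followed by a max across frames (the
    frame-buffer stage)."""
    t, h, w = (s // p * p for s in x.shape)
    blocks = x[:t, :h, :w].reshape(t // p, p, h // p, p, w // p, p)
    return blocks.max(axis=(1, 3, 5))
```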
  • since the three-dimensional convolution layer requires at least three 2-dimensional convolution kernels and more on-chip memory to buffer the input data of different frames, it consumes more resources and memory than a two-dimensional convolution design; therefore, to overcome the memory and bandwidth limitations of the three-dimensional convolution layer, frame blocking or pixel blocking is applied to it.
  • the frame blocking refers to dividing the input frames fed to the K_C convolution processors into input data while maintaining the original size of each frame; if the buffered input frames amount to C_i, each frame block includes C_i/(N_C*H*W) frames, and the overhead pixels of each frame block are (K_C-1)*(N_C*H*W)/C_i, where H is the height of the feature image, W is the width of the feature image, and N_C is the number of image channels.
  • the pixel blocking refers to dividing each input frame fed to the K_C convolution processors into square blocks of the same size while retaining all input frames; each input frame has 2*(K_C-1) overhead pixels, and if each input frame contains C_i/(N_L*N_C) pixels, the overhead pixels of each frame block are given by the expression shown in the original formula image, where N_L denotes the number of input frames.
  • the N_C*N_f coefficient vectors are buffered until the B blocks of input frames are completed.
  • this coefficient buffering is limited by the amount of on-chip storage; the volume of the coefficient weights is therefore far smaller than the coefficient buffer.
  • each coefficient vector contains K_C*K_C*K_C coefficients, and the coefficient buffer size must satisfy C_C ≥ N_f*N_C*K_C*K_C*K_C.
  • the three-dimensional convolution device for recognizing human actions can be implemented by an FPGA chip.
  • the three-dimensional convolution device for recognizing human actions recognizes human actions from high-volume video data through a hardware structure, solves the storage and bandwidth limitation problems, and reduces overall power consumption; therefore, the present invention effectively overcomes various shortcomings of the prior art and has high industrial utilization value.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Complex Calculations (AREA)
  • Image Generation (AREA)
  • Image Analysis (AREA)

Abstract

A three-dimensional convolution device (1) for recognizing human actions, comprising at least one three-dimensional convolution layer (2), at least one rectified linear unit layer (3), and at least one three-dimensional pooling layer (4). The three-dimensional convolution layer (2) comprises: a cache memory (21), a line buffer (22), (K_C-1) frame buffers (23), K_C matrix buffers (24), K_C three-dimensional convolution processors (25), and an accumulator (26); the rectified linear unit layer (3) comprises a three-dimensional output buffer (31); the three-dimensional pooling layer (4) comprises a three-dimensional pooler (41). The three-dimensional convolution device (1) for recognizing human actions recognizes human actions from high-volume video data through a hardware structure, solves the storage and bandwidth limitation problems, and reduces overall power consumption.

Description

A three-dimensional convolution device for recognizing human actions. Technical Field
The present invention belongs to the field of artificial intelligence technology and relates to a convolution device, in particular to a three-dimensional convolution device for recognizing human actions.
Background Art
In recent years, three-dimensional convolutional neural networks (3D CNNs) have been widely applied in technical fields such as video analysis, 3D geometric data, and medical image diagnosis. Convolutional neural networks are already computationally intensive, and three-dimensional convolutional neural networks push the computational requirements to another level, because each computation depends on complex images.
The prior art typically analyzes and processes video data with GPUs and CPUs running two-dimensional convolutional neural networks; however, the GPU incurs high power consumption during operation, the CPU suffers from low processing speed, and neither the GPU nor the CPU is suitable for video action recognition involving large amounts of data.
Therefore, how to provide a three-dimensional convolution device for recognizing human actions, so as to overcome the inability of the prior art to recognize human actions from high-volume video data through a hardware structure as well as its storage and bandwidth limitations, has become a technical problem to be urgently solved by those skilled in the art.
Summary of the Invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide a three-dimensional convolution device for recognizing human actions, which solves the problems that the prior art cannot recognize human actions from high-volume video data through a hardware structure and is subject to storage and bandwidth limitations.
To achieve the above and other related objects, the present invention provides a three-dimensional convolution device for recognizing human actions, comprising: at least one three-dimensional convolution layer, at least one rectified linear unit layer, and at least one three-dimensional pooling layer. The three-dimensional convolution layer comprises: a cache memory for buffering video data to be recognized, the video data comprising a number of feature images; a line buffer for receiving the pixels of the feature images bit by bit to form row data and outputting K_C adjacent input frames in parallel, wherein each adjacent input frame is composed of row data, K_C denotes the kernel count of the 3D convolution kernel, and K_C is greater than or equal to 3; (K_C-1) frame buffers for buffering (K_C-1) adjacent input frames; K_C matrix buffers for receiving the K_C adjacent input frames and simultaneously outputting K_C*K_C adjacent output frames, wherein the first matrix buffer is directly connected to the line buffer so as to assemble the received row data directly into the first adjacent input frame and output the first adjacent output frame, and the remaining (K_C-1) matrix buffers are connected to the (K_C-1) frame buffers respectively; K_C three-dimensional convolution processors for performing three-dimensional convolution on the K_C*K_C adjacent output frames using a pre-stored three-dimensional convolution kernel, the pre-stored three-dimensional convolution kernel consisting of three 2-dimensional convolution kernels; and an accumulator for accumulating the convolution results produced by the K_C convolution processors. The rectified linear unit layer comprises a three-dimensional output buffer for selecting, from the accumulated results output by the accumulator, the larger of each pixel and the number 0, and buffering the selected maximum pixels row by row. The three-dimensional pooling layer comprises a three-dimensional pooler for buffering, through a line buffer, the maximum pixels of the row data input row by row to form a two-dimensional pooling result, buffering the two-dimensional pooling result through a frame buffer, and selecting the maximum from the two-dimensional pooling result to form a three-dimensional pooling result; the three-dimensional pooling result is the human action recognized from the video to be recognized.
In an embodiment of the present invention, the three-dimensional convolution layer can process N_C*N_L feature images, each of height H and width W, with N_C*N_L coefficient vectors of size K_C^3.
In an embodiment of the present invention, the three-dimensional convolution layer is processed using frame blocking, pixel blocking, and/or coefficient buffering.
In an embodiment of the present invention, frame blocking refers to dividing the input frames fed to the K_C convolution processors into input data while keeping the original size of each frame; if the buffered input frames amount to C_i, each frame block contains C_i/(N_C*H*W) frames, and the overhead pixels of each frame block are (K_C-1)*(N_C*H*W)/C_i, where H is the height of the feature image, W is the width of the feature image, and N_C is the number of image channels.
In an embodiment of the present invention, pixel blocking refers to dividing each input frame fed to the K_C convolution processors into square blocks of the same size while retaining all input frames; if each input frame has 2*(K_C-1) overhead pixels and contains C_i/(N_L*N_C) pixels, the overhead pixels of each frame block are
Figure PCTCN2018072675-appb-000001
where N_L denotes the number of input frames.
In an embodiment of the present invention, the duty ratio of frame blocking to pixel blocking is
Figure PCTCN2018072675-appb-000002
wherein, if the duty ratio is greater than 1, pixel blocking is used to process the three-dimensional convolution layer; if it is less than 1, frame blocking is used.
In an embodiment of the present invention, coefficient buffering means that if the coefficient buffer size is C_C and each vector contains K_C*K_C*K_C coefficients, the coefficient buffer size must satisfy C_C ≥ N_f*N_C*K_C*K_C*K_C.
In an embodiment of the present invention, each 2-dimensional convolution kernel comprises K_C^2 multipliers and an adder tree of depth log(K_C).
In an embodiment of the present invention, the line buffer is provided with K_C serially connected first-in-first-out (FIFO) memories; each FIFO stores one row of data of the feature image; the rows of data are stored into the FIFOs in sequence along the path formed by the serial chain.
In an embodiment of the present invention, the matrix buffer arranges the incoming adjacent input frames into a matrix stored in a plurality of registers.
As described above, the three-dimensional convolution device for recognizing human actions of the present invention has the following beneficial effects:
The three-dimensional convolution device for recognizing human actions of the present invention recognizes human actions from high-volume video data through a hardware structure, solves the storage and bandwidth limitation problems, and reduces overall power consumption.
Brief Description of the Drawings
FIG. 1 is a structural diagram of an embodiment of the three-dimensional convolution device for recognizing human actions of the present invention.
FIG. 2 is a schematic diagram of the hardware structure of the three-dimensional convolution device for recognizing human actions in an embodiment of the present invention.
Description of Reference Numerals
1        Three-dimensional convolution device for
         recognizing human actions
2        Three-dimensional convolution layer
3        Rectified linear unit layer
4        Three-dimensional pooling layer
21       Cache memory
22       Line buffer
23       Frame buffer
24       Matrix buffer
25       Three-dimensional convolution processor
26       Accumulator
31       Three-dimensional output buffer
41       Three-dimensional pooler
Detailed Description of the Embodiments
The embodiments of the present invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention may also be implemented or applied through other different specific embodiments, and the details in this specification may be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, in the absence of conflict, the following embodiments and the features therein may be combined with one another.
It should be noted that the drawings provided in the following embodiments only illustrate the basic concept of the present invention schematically; they therefore show only the components related to the present invention rather than the actual number, shape, and size of components in implementation. The form, quantity, and proportions of the components in actual implementation may be changed arbitrarily, and the component layout may be more complex.
After a piece of video data is input into the three-dimensional convolution device for recognizing human actions provided by the present invention, the video data is divided into 16 non-overlapping frame segments and adjusted to three channels of 112*112 size, and three-dimensional convolution is used to extract, from the temporal and spatial dimensions, the motion information encoded in multiple consecutive frames of data. For all convolutional layers, applying a stride of 1 with zero padding, the present invention makes the size of the input feature image equal to the size of the output feature image.
This embodiment provides a three-dimensional convolution device 1 for recognizing human actions, the device 1 comprising at least one three-dimensional convolution layer 2, at least one rectified linear unit layer 3, and at least one three-dimensional pooling layer 4, wherein the three-dimensional convolution layer is used to recognize human actions.
Please refer to FIG. 1, which shows a structural diagram of an embodiment of the three-dimensional convolution device. As shown in FIG. 1, the device 1 is provided with eight three-dimensional convolution layers (denoted Cov), five three-dimensional pooling layers (denoted pool), and two fully connected layers (denoted fc6). The three-dimensional convolution layers use a three-dimensional convolution kernel with coefficients of size 3*3*3.
Please refer to FIG. 2, which shows the hardware structure of the three-dimensional convolution device in an embodiment. As shown in FIG. 2, the device 1 is provided, in the three-dimensional convolution layer 2, with a cache memory 21, a line buffer 22, frame buffers 23, matrix buffers 24, three-dimensional convolution processors 25, and an accumulator 26.
The cache memory 21 is used to buffer video data to be recognized, which comprises a number of feature images. In this embodiment, the input feature images are fed into the cache memory 21 in the order in which they were divided. Each feature image has height H and width W, with N_C*N_L coefficient vectors of size K_C^3. K_C is greater than or equal to 3; in this embodiment, K_C = 3.
The line buffer 22 connected to the cache memory 21 is used to receive the pixels of the feature images bit by bit to form row data, and to output K_C adjacent input frames in parallel, wherein each adjacent input frame is composed of row data and K_C denotes the kernel count of the 3D convolution kernel. The line buffer 22 is provided with K_C serially connected FIFO memories (in this embodiment, three serial FIFO memories 221); each FIFO stores one row of data of the feature image; the rows of data are stored into the FIFOs in sequence along the path formed by the serial chain.
The (K_C-1) frame buffers 23 are used to buffer (K_C-1) adjacent input frames.
The K_C matrix buffers 24 are used to receive the K_C adjacent input frames and simultaneously output K_C*K_C adjacent output frames. Referring to FIG. 2, since K_C = 3 in this embodiment, the first matrix buffer 24 is directly connected to the line buffer 22 so as to assemble the received row data directly into the first adjacent input frame and output the first adjacent output frame, while the remaining (K_C-1) matrix buffers 24 are connected to the (K_C-1) frame buffers 23 respectively.
The K_C three-dimensional convolution processors 25, connected to the K_C matrix buffers 24 respectively, are used to convolve the K_C*K_C adjacent output frames with a pre-stored three-dimensional convolution kernel. The matrix buffer arranges the incoming adjacent input frames into a matrix stored in a plurality of registers. The pre-stored three-dimensional convolution kernel consists of three 2-dimensional convolution kernels and is used to convolve three adjacent output frames. Each 2-dimensional convolution kernel comprises K_C^2 multipliers and an adder tree of depth log(K_C).
For example, while the first input frame undergoes three-dimensional convolution in the first convolution processor 25 (core 1.3 in FIG. 2), the second and third adjacent input frames, buffered in the second and third frame buffers 23, flow into the second convolution processor (core 1.2 in FIG. 2) and the third convolution processor (core 1.1 in FIG. 2) respectively for three-dimensional convolution.
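The staggered flow of adjacent frames through the frame buffers into the K_C processors can be modelled as a sliding window over the frame stream (an illustrative software sketch, not the patented hardware; the function name is assumed):

```python
from collections import deque

def frame_windows(frames, k_c: int = 3):
    """Model of the (K_C - 1) frame buffers: each arriving frame joins the
    K_C - 1 previously buffered frames, so every group of K_C adjacent
    frames is dispatched to the K_C convolution processors together."""
    buf = deque(maxlen=k_c)
    for f in frames:
        buf.append(f)
        if len(buf) == k_c:
            yield tuple(buf)

list(frame_windows(range(5)))   # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
```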
The accumulator 26 connected to the K_C convolution processors 25 is used to accumulate the convolution results produced by the K_C convolution processors 25 after three-dimensional convolution.
The rectified linear unit layer 3 comprises a three-dimensional output buffer 31 connected to the accumulator 26, which selects, from the accumulated results output by the accumulator 26, the larger of each pixel and the number 0, and buffers the selected maximum pixels row by row.
The three-dimensional pooling layer 4 comprises a three-dimensional pooler 41 connected to the three-dimensional output buffer 31, which buffers, through a line buffer, the maximum pixels of the row data input row by row to form a two-dimensional pooling result, buffers the two-dimensional pooling result through a frame buffer, and selects the maximum from the two-dimensional pooling result to form a three-dimensional pooling result; the three-dimensional pooling result is the human action recognized from the video to be recognized.
In this embodiment, since the three-dimensional convolution layer requires at least three 2-dimensional convolution kernels and more on-chip memory to buffer the input data of different frames, it consumes more resources and memory than a two-dimensional convolution design. Therefore, to overcome the memory and bandwidth limitations of the three-dimensional convolution layer, frame blocking or pixel blocking is applied to it.
Frame blocking refers to dividing the input frames fed to the K_C convolution processors into input data while keeping the original size of each frame; if the buffered input frames amount to C_i, each frame block contains C_i/(N_C*H*W) frames, and the overhead pixels of each frame block are (K_C-1)*(N_C*H*W)/C_i, where H is the height of the feature image, W is the width of the feature image, and N_C is the number of image channels.
Pixel blocking refers to dividing each input frame fed to the K_C convolution processors into square blocks of the same size while retaining all input frames; if each input frame has 2*(K_C-1) overhead pixels and contains C_i/(N_L*N_C) pixels, the overhead pixels of each frame block are
Figure PCTCN2018072675-appb-000003
where N_L denotes the number of input frames.
Dividing (K_C-1)*(N_C*H*W)/C_i by
Figure PCTCN2018072675-appb-000004
gives
Figure PCTCN2018072675-appb-000005
which represents the duty ratio of frame blocking to pixel blocking. If the duty ratio is greater than 1, pixel blocking is used to process the three-dimensional convolution layer; if it is less than 1, frame blocking is used.
In this embodiment, the N_C*N_f coefficient vectors are buffered until the B blocks of input frames are completed. This coefficient buffering, however, is limited by the amount of on-chip storage; the volume of the coefficient weights is therefore far smaller than the coefficient buffer.
If the coefficient buffer size is C_C and each coefficient vector contains K_C*K_C*K_C coefficients, the coefficient buffer size must satisfy C_C ≥ N_f*N_C*K_C*K_C*K_C.
In this embodiment, the three-dimensional convolution device for recognizing human actions can be implemented with an FPGA chip.
In summary, the three-dimensional convolution device for recognizing human actions of the present invention recognizes human actions from high-volume video data through a hardware structure, solves the storage and bandwidth limitation problems, and reduces overall power consumption. Therefore, the present invention effectively overcomes various shortcomings of the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes accomplished by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (10)

  1. A three-dimensional convolution device for recognizing human actions, characterized in that the device comprises: at least one three-dimensional convolution layer, at least one rectified linear unit layer, and at least one three-dimensional pooling layer;
    the three-dimensional convolution layer comprises:
    a cache memory for buffering video data to be recognized, the video data comprising a number of feature images;
    a line buffer for receiving the pixels of the feature images bit by bit to form row data, and outputting K_C adjacent input frames in parallel, wherein each adjacent input frame is composed of row data, K_C denotes the kernel count of the 3D convolution kernel, and K_C is greater than or equal to 3;
    (K_C-1) frame buffers for buffering (K_C-1) adjacent input frames;
    K_C matrix buffers for receiving the K_C adjacent input frames and simultaneously outputting K_C*K_C adjacent output frames, wherein the first matrix buffer is directly connected to the line buffer so as to assemble the received row data directly into the first adjacent input frame and output the first adjacent output frame, and the remaining (K_C-1) matrix buffers are connected to the (K_C-1) frame buffers respectively;
    K_C three-dimensional convolution processors for performing three-dimensional convolution on the K_C*K_C adjacent output frames using a pre-stored three-dimensional convolution kernel, the pre-stored three-dimensional convolution kernel consisting of three 2-dimensional convolution kernels;
    an accumulator for accumulating the convolution results produced by the K_C convolution processors after three-dimensional convolution;
    the rectified linear unit layer comprises:
    a three-dimensional output buffer for selecting, from the accumulated results output by the accumulator, the larger of each pixel and the number 0, and buffering the selected maximum pixels row by row;
    the three-dimensional pooling layer comprises:
    a three-dimensional pooler for buffering, through a line buffer, the maximum pixels of the row data input row by row to form a two-dimensional pooling result, buffering the two-dimensional pooling result through a frame buffer, and selecting the maximum from the two-dimensional pooling result to form a three-dimensional pooling result, the three-dimensional pooling result being the human action recognized from the video to be recognized.
  2. The three-dimensional convolution device for recognizing human actions according to claim 1, characterized in that the three-dimensional convolution layer can process N_C*N_L feature images, each of height H and width W, with N_C*N_L coefficient vectors of size K_C^3.
  3. The three-dimensional convolution device for recognizing human actions according to claim 1, characterized in that the three-dimensional convolution layer is processed using frame blocking, pixel blocking, and/or coefficient buffering.
  4. The three-dimensional convolution device for recognizing human actions according to claim 3, characterized in that frame blocking refers to dividing the input frames fed to the K_C convolution processors into input data while keeping the original size of each frame; if the buffered input frames amount to C_i, each frame block contains C_i/(N_C*H*W) frames, and the overhead pixels of each frame block are (K_C-1)*(N_C*H*W)/C_i, where H is the height of the feature image, W is the width of the feature image, and N_C is the number of image channels.
  5. The three-dimensional convolution device for recognizing human actions according to claim 3, characterized in that pixel blocking refers to dividing each input frame fed to the K_C convolution processors into square blocks of the same size while retaining all input frames; if each input frame has 2*(K_C-1) overhead pixels and contains C_i/(N_L*N_C) pixels, the overhead pixels of each frame block are
    Figure PCTCN2018072675-appb-100001
    where N_L denotes the number of input frames.
  6. The three-dimensional convolution device for recognizing human actions according to claim 4 or 5, characterized in that the duty ratio of frame blocking to pixel blocking is
    Figure PCTCN2018072675-appb-100002
    wherein, if the duty ratio is greater than 1, pixel blocking is used to process the three-dimensional convolution layer; if it is less than 1, frame blocking is used.
  7. The three-dimensional convolution device for recognizing human actions according to claim 3, characterized in that coefficient buffering means that if the coefficient buffer size is C_C and each vector contains K_C*K_C*K_C coefficients, the coefficient buffer size must satisfy C_C ≥ N_f*N_C*K_C*K_C*K_C.
  8. The three-dimensional convolution device for recognizing human actions according to claim 1, characterized in that each 2-dimensional convolution kernel comprises K_C^2 multipliers and an adder tree of depth log(K_C).
  9. The three-dimensional convolution device for recognizing human actions according to claim 1, characterized in that the line buffer is provided with K_C serially connected first-in-first-out (FIFO) memories; each FIFO stores one row of data of the feature image; and the rows of data are stored into the FIFOs in sequence along the path formed by the serial chain.
  10. The three-dimensional convolution device for recognizing human actions according to claim 1, characterized in that the matrix buffer arranges the incoming adjacent input frames into a matrix stored in a plurality of registers.
PCT/CN2018/072675 2018-01-15 2018-01-15 Three-dimensional convolution device for recognizing human actions WO2019136761A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880002145.1A CN109416743B (zh) 2018-01-15 2018-01-15 Three-dimensional convolution device for recognizing human actions
PCT/CN2018/072675 WO2019136761A1 (zh) 2018-01-15 2018-01-15 Three-dimensional convolution device for recognizing human actions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/072675 WO2019136761A1 (zh) 2018-01-15 2018-01-15 Three-dimensional convolution device for recognizing human actions

Publications (1)

Publication Number Publication Date
WO2019136761A1 true WO2019136761A1 (zh) 2019-07-18

Family

ID=65462098

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072675 WO2019136761A1 (zh) 2018-01-15 2018-01-15 Three-dimensional convolution device for recognizing human actions

Country Status (2)

Country Link
CN (1) CN109416743B (zh)
WO (1) WO2019136761A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808111A (zh) * 2021-09-18 2021-12-17 广州幻境科技有限公司 Three-dimensional virtual reconstruction method and system for medical images

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728367B (zh) * 2019-12-18 2020-05-05 深圳鲲云信息科技有限公司 Data storage method and device for neural networks
CN112016522B (zh) * 2020-09-25 2022-06-07 苏州浪潮智能科技有限公司 Video data processing method, system, and related components

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203283A (zh) * 2016-06-30 2016-12-07 重庆理工大学 Action recognition method based on a three-dimensional convolutional deep neural network and depth video
US20170243053A1 (en) * 2016-02-18 2017-08-24 Pinscreen, Inc. Real-time facial segmentation and performance capture from rgb input
CN107403117A (zh) * 2017-07-28 2017-11-28 西安电子科技大学 FPGA-based three-dimensional convolver
CN107506756A (zh) * 2017-09-26 2017-12-22 北京航空航天大学 Human action recognition method based on a Gabor-filter three-dimensional convolutional neural network model
CN107506740A (zh) * 2017-09-04 2017-12-22 北京航空航天大学 Human behavior recognition method based on a three-dimensional convolutional neural network and a transfer learning model
CN107564063A (zh) * 2017-08-30 2018-01-09 广州华多网络科技有限公司 Virtual object display method and device based on a convolutional neural network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217214B (zh) * 2014-08-21 2017-09-19 广东顺德中山大学卡内基梅隆大学国际联合研究院 RGB-D person behavior recognition method based on a configurable convolutional neural network
CN106503610B (zh) * 2015-09-08 2020-05-26 阿里巴巴集团控股有限公司 Video recognition method and device
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN106940815B (zh) * 2017-02-13 2020-07-28 西安交通大学 Programmable convolutional neural network coprocessor IP core
CN107527381B (zh) * 2017-09-11 2023-05-12 Oppo广东移动通信有限公司 Image processing method and device, electronic device, and computer-readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170243053A1 (en) * 2016-02-18 2017-08-24 Pinscreen, Inc. Real-time facial segmentation and performance capture from rgb input
CN106203283A (zh) * 2016-06-30 2016-12-07 重庆理工大学 Action recognition method based on a three-dimensional convolutional deep neural network and depth video
CN107403117A (zh) * 2017-07-28 2017-11-28 西安电子科技大学 FPGA-based three-dimensional convolver
CN107564063A (zh) * 2017-08-30 2018-01-09 广州华多网络科技有限公司 Virtual object display method and device based on a convolutional neural network
CN107506740A (zh) * 2017-09-04 2017-12-22 北京航空航天大学 Human behavior recognition method based on a three-dimensional convolutional neural network and a transfer learning model
CN107506756A (zh) * 2017-09-26 2017-12-22 北京航空航天大学 Human action recognition method based on a Gabor-filter three-dimensional convolutional neural network model

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808111A (zh) * 2021-09-18 2021-12-17 广州幻境科技有限公司 Three-dimensional virtual reconstruction method and system for medical images

Also Published As

Publication number Publication date
CN109416743B (zh) 2022-05-24
CN109416743A (zh) 2019-03-01

Similar Documents

Publication Publication Date Title
US20230351186A1 (en) Processing for multiple input data sets
CN111684473B (zh) 提高神经网络阵列的性能
WO2019136764A1 (zh) Convolver and artificial intelligence processing device applying same
US10445638B1 (en) Restructuring a multi-dimensional array
JP6771018B2 (ja) 二次元配列プロセッサの性能向上
TWI634490B (zh) Convolution operation device and convolution operation method
CN109844738A (zh) Arithmetic processing circuit and recognition system
US20190303731A1 (en) Target detection method and device, computing device and readable storage medium
CN108388537B (zh) Convolutional neural network acceleration device and method
WO2020062284A1 (zh) Convolutional-neural-network-based image processing method and device, and unmanned aerial vehicle
CN107066239A (zh) Hardware structure for implementing forward computation of a convolutional neural network
WO2019136762A1 (zh) Artificial intelligence processor and processing method applied thereto
WO2019136761A1 (zh) Three-dimensional convolution device for recognizing human actions
US10776689B2 (en) Systems and methods for processing convolutional neural network operations using textures
WO2021143569A1 (zh) FPGA-based dense optical flow computing system and method
US10402196B2 (en) Multi-dimensional sliding window operation for a vector processor, including dividing a filter into a plurality of patterns for selecting data elements from a plurality of input registers and performing calculations in parallel using groups of the data elements and coefficients
JP7402623B2 (ja) フィルタ処理装置及びその制御方法
JP7492555B2 (ja) 複数の入力データセットのための処理
JP2020042774A (ja) 人工知能推論演算装置
CN103198451A (zh) Method for implementing fast wavelet transform by blocking on a GPU
US20220215617A1 (en) Viewpoint image processing method and related device
CN108073548B (zh) Convolution operation device and convolution operation method
CN106952215B (zh) Image pyramid feature extraction circuit, device, and method
WO2019136747A1 (zh) Deconvolver and artificial intelligence processing device applying same
CN115601223A (zh) Image preprocessing device, method, and chip

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18900182

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 08/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18900182

Country of ref document: EP

Kind code of ref document: A1