WO2023115814A1 - Fpga hardware architecture, data processing method therefor and storage medium - Google Patents

Publication number
WO2023115814A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
feature data
unit
picture
processing
Prior art date
Application number
PCT/CN2022/095365
Other languages
French (fr)
Chinese (zh)
Inventor
曹其春
董刚
胡克坤
杨宏斌
尹文枫
王斌强
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023115814A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • the present application relates to the technical field of FPGA hardware processing, in particular to an FPGA hardware architecture and a data processing method, device, computer equipment and storage medium.
  • current FPGA-based neural network acceleration hardware architectures focus mainly on improving three things: FPGA (Field Programmable Gate Array) computing power, network accuracy, and network model size.
  • the FPGA hardware architecture typically includes an on-chip cache, a convolution acceleration module, a pooling (pool) module, a loading (load) module, a storage (save) module, and an instruction control module.
  • the FPGA hardware architecture itself is not especially difficult to implement, but the software compilation is relatively difficult to implement.
  • a data processing method applied to an FPGA hardware architecture, comprising: acquiring first picture feature data to be processed; and inputting the first picture feature data into a pedestrian re-identification network model based on contextual multi-scale feature learning to obtain the classification and identification information output by the pedestrian re-identification network model.
  • the structural blocks of the pedestrian re-identification network model include a sequentially connected forward hierarchical connection group, backward hierarchical connection group and channel scale selection module. The forward hierarchical connection group and the backward hierarchical connection group each contain a plurality of first structural units. Each first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence.
  • the summation unit is included in the channel scale selection module.
  • the forward hierarchical connection group is used to perform step-by-step inter-scale information fusion on the first picture feature data through its multiple first structural units.
  • the backward hierarchical connection group is used to perform cross-scale information fusion, through its multiple first structural units, on the information output by each first structural unit in the forward hierarchical connection group.
  • the channel scale selection module is used to sum, through the summation unit, the information output by each first structural unit in the backward hierarchical connection group to obtain the classification identification information.
  • the pedestrian re-identification network model further includes a 3x3 convolutional network module connected to the forward hierarchical connection group. The 3x3 convolutional network module is used to process the first picture feature data with multiple separable 3x3 convolutional networks to obtain second picture feature data; the forward hierarchical connection group is then used to perform step-by-step inter-scale information fusion on the second picture feature data through its plurality of first structural units.
  • the pedestrian re-identification network model also includes a translational convolution module connected to the forward hierarchical connection group. The translational convolution module is used to process the first picture feature data through a translation operation and a 1x1 convolutional network to obtain third picture feature data; the forward hierarchical connection group is then used to perform step-by-step inter-scale information fusion on the third picture feature data through its multiple first structural units.
  • the translational convolution module includes a second structural unit with the same structure as the first structural unit, and is used to process the first picture feature data through one second structural unit, or through a plurality of second structural units connected in sequence, to obtain the third picture feature data.
  • the translational convolution module further includes a pooling unit, and the pooling unit is located between the first second structural unit and the second second structural unit among the plurality of sequentially connected second structural units.
  • the feature data of the first picture is the feature data obtained after processing the original picture data with the quantization algorithm of the arbitrary bit quantization network DoReFa-Net.
  • a data processing device applied to an FPGA hardware architecture, comprising: an acquisition module for acquiring first picture feature data to be processed; and a processing module for inputting the first picture feature data into a pedestrian re-identification network model based on contextual multi-scale feature learning to obtain the classification and identification information output by the pedestrian re-identification network model; wherein the structural blocks of the pedestrian re-identification network model include a sequentially connected forward hierarchical connection group, backward hierarchical connection group and channel scale selection module.
  • both the forward hierarchical connection group and the backward hierarchical connection group contain a plurality of first structural units. Each first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence, and a summation unit is included in the channel scale selection module.
  • the FPGA hardware architecture comprises a central processing unit, a memory, a computing unit processing component, a pooling component, a residual component and a controller. The central processing unit is used to receive the picture feature data to be processed and store it in the memory. One or more arithmetic logic unit (ALU) matrices are arranged in the computing unit processing component, which reads the picture feature data from the memory and performs 1x1 convolution processing on it through the ALU matrix. The pooling component is used to pool the picture feature data output by the computing unit processing component. The residual component is used to perform residual accumulation processing on the picture feature data output by the pooling component and/or by the computing unit processing component. The controller is used, according to the pedestrian re-identification network model, to control the computing unit processing component to read the picture feature data, weight data and translation parameters from the memory to perform 1x1 convolution processing, to control whether the pooling component pools the feature data output by the computing unit processing component, and to control whether the residual component reads other picture feature data from the memory and whether it performs residual processing on the picture feature data input to it.
  • both the forward hierarchical connection group and the backward hierarchical connection group contain multiple first structural units.
  • each first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence, and the channel scale selection module contains the summation unit.
  • a computer device comprising a memory and one or more processors, wherein computer-readable instructions are stored in the memory and, when executed by the one or more processors, cause the one or more processors to execute the steps of any one of the above-mentioned data processing methods applied to the FPGA hardware architecture.
  • one or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of any one of the above-mentioned data processing methods applied to the FPGA hardware architecture.
  • Fig. 1 is a block diagram of the frame structure of an FPGA hardware architecture provided by the present application according to one or more embodiments;
  • FIG. 2 is a schematic flow diagram of a data processing method applied to FPGA hardware architecture provided by the present application according to one or more embodiments;
  • FIG. 3 is a network block diagram of an existing pedestrian re-identification network model based on contextual multi-scale feature learning provided by the present application according to one or more embodiments;
  • FIG. 4 is a network block diagram of the internal model of DW Conv provided by the present application according to one or more embodiments;
  • FIG. 5 is a network block diagram of an improved pedestrian re-identification network model based on contextual multi-scale feature learning provided by the present application according to one or more embodiments;
  • FIG. 6 is an example diagram of convolution calculation of a translation operation provided by the present application according to one or more embodiments;
  • Fig. 7 is an example diagram of convolution calculation of translation operation by means of average grouping provided by the present application according to one or more embodiments;
  • Fig. 8 is a schematic diagram of the dx and dy position identification of the convolution kernel of the translation operation provided by the present application according to one or more embodiments;
  • FIG. 9 is an example diagram of convolution calculation of a translation operation with a convolution kernel of 5 provided by the present application according to one or more embodiments;
  • FIG. 10 is an example diagram of 61 offset parameter coordinates provided by the present application according to one or more embodiments.
  • FIG. 11 is an example diagram of an 8-bit fixed point of arbitrary bit quantization DoReFa-Net provided by the present application according to one or more embodiments;
  • FIG. 12 is a specific hardware structural diagram of a FPGA hardware architecture provided by the present application according to one or more embodiments.
  • FIG. 13 is a structural block diagram of a data processing device applied to FPGA hardware architecture provided by the present application according to one or more embodiments;
  • Fig. 14 is an internal structure diagram of a computer device provided by the present application according to one or more embodiments.
  • a data processing method applied to FPGA hardware architecture provided by the present application can be applied to the FPGA hardware architecture shown in FIG. 1.
  • the FPGA hardware architecture includes a central processing unit, a memory, a computing unit processing component, a pooling component, a residual component, and a controller.
  • the central processing unit is connected to the memory and, on receiving the first picture feature data to be processed, stores the first picture feature data in the memory.
  • the computing unit processing component is mainly used to process the 1x1 convolutional network.
  • the arithmetic logic unit (ALU) matrix is set in the computing unit processing component, and its matrix size can be set according to the available hardware resources.
  • the computing unit processing component reads the first picture feature data from the memory, performs 1x1 convolution processing on it through the ALU matrix, and outputs the convolution result to the pooling component.
  • the pooling component is used to perform pooling processing on the information output by the computing unit processing component.
  • the residual component is connected to both the pooling component and the computing unit processing component. It can perform residual accumulation processing on the information output by the pooling component, on the information output by the computing unit processing component, or on both.
  • the controller is used, according to the pedestrian re-identification network model, to control the computing unit processing component to read the first picture feature data, weight data and translation parameters from the memory so as to perform 1x1 convolution processing; to control whether the pooling component pools the feature data output by the computing unit processing component; and to control whether the residual component reads other picture feature data from the memory and whether it performs residual processing on the picture feature data input to it.
  • the structural blocks of the pedestrian re-identification network model include a sequentially connected forward hierarchical connection group, backward hierarchical connection group and channel scale selection module. Both the forward hierarchical connection group and the backward hierarchical connection group contain a plurality of first structural units. Each first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence, and the channel scale selection module includes a summation unit. Therefore, the controller can control other components
  • a data processing method applied to the FPGA hardware architecture of the present application adopts the FPGA hardware architecture shown in FIG. 1.
  • the first picture feature data is read from the memory by the computing unit processing component. The related operations of the first structural units in the forward hierarchical connection group and the backward hierarchical connection group of the pedestrian re-identification network model are implemented by the arithmetic logic unit matrix in the computing unit processing component, and the related operation of the summation unit in the channel scale selection module is implemented by the residual component.
  • the pooling component can also be used to pool the information output by the computing unit processing component, after which the residual component can be used to perform residual accumulation processing.
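The data path described above (1x1 convolution in the ALU matrix, optional pooling, optional residual accumulation, all switched by the controller) can be sketched in software as follows. This is only an illustrative model under stated assumptions: the function names, the `use_pool` and `skip` flags, and the choice of 2x2 max pooling are hypothetical and are not taken from the application.

```python
def conv1x1(fmap, weights):
    """1x1 convolution. fmap: [C_in][H][W], weights: [C_out][C_in]."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[sum(weights[o][c] * fmap[c][y][x] for c in range(c_in))
              for x in range(w)]
             for y in range(h)]
            for o in range(len(weights))]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (pooling type is an assumption)."""
    return [[[max(ch[y][x], ch[y][x + 1], ch[y + 1][x], ch[y + 1][x + 1])
              for x in range(0, len(ch[0]) - 1, 2)]
             for y in range(0, len(ch) - 1, 2)]
            for ch in fmap]


def residual_add(fmap, skip):
    """Elementwise accumulation of two equally shaped feature maps."""
    return [[[a + b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(fmap, skip)]


def run_layer(fmap, weights, use_pool=False, skip=None):
    """Hypothetical controller logic: the flags decide which stages run."""
    out = conv1x1(fmap, weights)
    if use_pool:
        out = max_pool2x2(out)
    if skip is not None:
        out = residual_add(out, skip)
    return out
```

In this sketch the controller's role reduces to choosing `use_pool` and `skip` per layer; the real controller also schedules memory reads of weights and translation parameters, which is not modeled here.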
  • when the data processing method of the present application is used to process image data on the FPGA hardware architecture, it simplifies the hardware requirements of the architecture and achieves network acceleration on the FPGA with fewer resources.
  • a data processing method applied to an FPGA hardware architecture is provided. Taking its application to the FPGA hardware architecture in FIG. 1 as an example, the method comprises the following steps:
  • a data processing method applied to an FPGA hardware architecture is used to accelerate processing of image data based on a neural network of the FPGA hardware architecture, so as to quickly identify classification information of an image.
  • the acquired first picture feature data is the picture feature data obtained after feature data extraction of the picture to be processed.
  • the structural blocks of the pedestrian re-identification network model include a sequentially connected forward hierarchical connection group, backward hierarchical connection group and channel scale selection module.
  • both the forward hierarchical connection group and the backward hierarchical connection group contain multiple first structural units. Each first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence.
  • a summation unit is included in the channel scale selection module.
  • the forward hierarchical connection group is used to perform step-by-step inter-scale information fusion on the first picture feature data through its multiple first structural units.
  • the backward hierarchical connection group is used to perform cross-scale information fusion, through its multiple first structural units, on the information output by each first structural unit in the forward hierarchical connection group.
  • the channel scale selection module is used to sum, through the summation unit, the information output by each first structural unit in the backward hierarchical connection group to obtain the classification identification information.
  • the pedestrian re-identification network model based on contextual multi-scale feature learning is used to process the feature data of the first picture to obtain the classification and identification information of the original picture corresponding to the feature data of the first picture.
  • the pedestrian re-identification network model based on contextual multi-scale feature learning is obtained by improving the network model of the existing pedestrian re-identification network model based on contextual multi-scale feature learning.
  • CMSNet: Contextual Multi-Scale Feature Learning for Person Re-Identification
  • HFCG: Forward Hierarchical Connectgroup (forward hierarchical connection group)
  • BHCG: Backward Hierarchical Connectgroup (backward hierarchical connection group)
  • CMSNet also includes a CSS (Channel-Wise Scale Selection) module. Operations such as b-softmax, fully connected layers, and matrix product are used in the CSS structure.
  • Conv 1x1 represents a 1x1 convolutional network
  • the internal model of DW Conv is shown in Figure 4.
  • GConv 3x3 represents a separable 3x3 convolution
  • BN Batch Normalization
  • ReLU represents the rectified linear unit, referred to in this application as the linear activation function network.
  • b 1 , b 2 , b 3 , and b 4 respectively identify the output information of the corresponding DW Conv.
  • AvgPod represents the averaging node.
  • the improved pedestrian re-identification network model based on contextual multi-scale feature learning also includes three modules: forward hierarchical connection group HFCG, backward hierarchical connection group BHCG and channel scale selection module CSS.
  • the convolutional units in the improved forward hierarchical connection group HFCG and backward hierarchical connection group BHCG are no longer DW Conv units but the first structural unit, the CSC unit.
  • the CSC unit includes Conv 1x1, BN, Shift, and ReLU. Among them, Shift means translation operation, Conv 1x1 means 1x1 convolutional network, BN means batch normalization network, and ReLU means linear activation function network.
  • the improved channel scale selection module CSS only includes the summation unit Sum.
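As a minimal sketch of what "only includes the summation unit Sum" means in practice, the snippet below adds several equally shaped branch outputs element by element. The function name `css_sum` and the `[C][H][W]` nested-list layout are assumptions made for illustration; the application does not specify a data layout.

```python
def css_sum(branches):
    """Elementwise sum of equally shaped [C][H][W] branch outputs."""
    first = branches[0]
    return [[[sum(b[c][y][x] for b in branches)
              for x in range(len(first[0][0]))]
             for y in range(len(first[0]))]
            for c in range(len(first))]
```

Because the operation is a plain accumulation, it maps onto the residual component of the hardware rather than needing softmax, fully connected, or matrix-product logic.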
  • the above-mentioned FPGA hardware architecture and data processing method can simplify the hardware configuration of the FPGA hardware architecture when processing image feature data, cooperate with software and hardware, and realize network acceleration of the FPGA hardware architecture with fewer resources.
  • the pedestrian re-identification network model further includes a 3x3 convolutional network module connected to the forward hierarchical connection group. The 3x3 convolutional network module is used to process the first picture feature data with a plurality of separable 3x3 convolutional networks to obtain the second picture feature data; the forward hierarchical connection group is then used to fuse the second picture feature data step by step through a plurality of first structural units.
  • the extracted picture feature data is obtained after picture feature extraction is performed on the original picture.
  • the existing pedestrian re-identification network model uses a 7x7 convolutional network to process image feature data.
  • the improved pedestrian re-identification network model of this application uses multiple separable 3x3 convolutional networks to process the feature data of the first picture, and obtain the feature data of the second picture after processing. Therefore, this application replaces the 7x7 convolutional network in the original pedestrian re-identification network model with three separable 3x3 convolutional networks.
  • the pedestrian re-identification network model further includes a translational convolution module connected to the forward hierarchical connection group. The translational convolution module is used to process the first picture feature data through a translation operation and a 1x1 convolutional network to obtain the third picture feature data; the forward hierarchical connection group is then used to fuse the third picture feature data step by step through multiple first structural units.
  • the feature data of the first picture is processed through a translation operation and a 1x1 convolutional network.
  • the 7x7 convolutional network in the original pedestrian re-identification network model is replaced by the translation operation and the 1x1 convolutional network, or the separable 3x3 convolutional network in the above-mentioned embodiment can be replaced.
  • this replacement reduces the movement of feature map data. For example, in a 3x3 convolutional network each picture feature datum needs to be used three times, and in a 7x7 convolutional network seven times, whereas in a 1x1 convolutional network the picture feature data only needs to be moved once. This reduces the logic complexity and improves the computational speed of the pedestrian re-identification network model.
  • the translation operation moves pixels within a certain range of the picture feature data to the middle position as the result, which reduces the number of multiplication operations. The replacement lowers precision, but it reduces the number of operations performed by the FPGA hardware architecture and simplifies its network structure design.
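The reduction in multiplications can be made concrete with a rough count. The sketch below assumes standard dense convolutions with stride 1, "same" padding, and identical channel widths in every layer; these are simplifying assumptions for illustration, not figures from the application, and the function names are hypothetical. The translation (shift) step itself is counted as zero multiplications because it only moves data.

```python
def conv_mults(h, w, c_in, c_out, k):
    """Multiplications for a dense k x k convolution, stride 1, same padding."""
    return h * w * c_in * c_out * k * k


def compare(h, w, c_in, c_out):
    """Multiplication counts for the three stem variants discussed above."""
    return {
        "conv7x7": conv_mults(h, w, c_in, c_out, 7),
        "three_conv3x3": 3 * conv_mults(h, w, c_in, c_out, 3),
        "shift_conv1x1": conv_mults(h, w, c_in, c_out, 1),
    }
```

Per output pixel and channel pair this gives 49, 27 and 1 multiplications respectively, which is the sense in which the shift plus 1x1 variant lowers the operation count at some cost in precision.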
  • the translation operation and 1x1 convolutional network are described in detail below:
  • the convolution process of the translation operation is equivalent to translating the original input matrix in a certain direction, as shown in Figure 6.
  • a simple translation operation on its own does not appear to extract spatial information. However, the channel domain is a hierarchical diffusion of spatial domain information: if the tensor of input picture feature data (such as the second picture feature data output by the separable 3x3 convolutional network) is translated differently in different channels and then combined with a 1x1 convolutional network for cross-channel information fusion, information extraction in both the spatial domain and the channel domain can be realized.
  • for a translation operation with a convolution kernel of size $K \times K$ (where $K$ is an integer), each channel has $K^2$ possible translation directions. Assuming there are $M$ channels, there are $(K^2)^M$ possible translation options, and a brute-force search for the most suitable option in such a space is unrealistic. The $M$ channels are therefore divided into $K^2$ groups, each called a shift group, and the $\lfloor M / K^2 \rfloor$ channels in a group all use the same translation direction. When $M$ is not evenly divisible by $K^2$, some channels cannot be assigned to any group; these channels form the "centered" group, which is not translated. In short, the input is grouped by channel and each group of channels is translated in only one direction.
  • for example, with a 3x3 convolution kernel and 64 channels in total, the channels are divided into 9 groups of 7 channels each, and the remaining channel is not translated. The 9 groups are assigned in sequence to translation groups, and the 7 channels in each group share one translation parameter.
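The grouping rule can be sketched as a small function that returns one (dx, dy) offset per channel. The function name and the row-major ordering of directions are assumptions made for this illustration; the application only states that groups are assigned to translation directions in sequence.

```python
def shift_groups(num_channels, kernel_size):
    """Assign each channel a (dx, dy) offset using equal-sized shift groups."""
    n_dirs = kernel_size * kernel_size
    per_group = num_channels // n_dirs
    half = kernel_size // 2
    # Row-major order over the kernel window (ordering is an assumption).
    directions = [(dx, dy)
                  for dy in range(-half, half + 1)
                  for dx in range(-half, half + 1)]
    assignment = []
    for direction in directions:
        assignment.extend([direction] * per_group)
    # Leftover channels form the "centered" group and are not translated.
    leftover = num_channels - per_group * n_dirs
    assignment.extend([(0, 0)] * leftover)
    return assignment
```

For 64 channels and a 3x3 kernel this yields 9 groups of 7 channels plus one leftover channel with offset (0, 0), matching the example above.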
  • the separable 3x3 convolutional network is replaced by a translation operation and a 1x1 convolutional network.
  • the translation operation uses average grouping, and each group is assigned to a translation group in order.
  • the translation offsets dx and dy along the x-axis and y-axis are restricted to the range [-1, 1]; a simple illustration is shown in Figure 7.
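To illustrate the translation itself, the sketch below moves each channel of a feature map by its own (dx, dy) offset and fills the vacated positions with zeros. The sign convention (positive dx moves content to the right, positive dy moves it down), the zero fill, and the function names are assumptions for this example.

```python
def shift_channel(channel, dx, dy):
    """Translate one [H][W] channel by (dx, dy); vacated cells become 0."""
    h, w = len(channel), len(channel[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = channel[src_y][src_x]
    return out


def shift(fmap, offsets):
    """Apply a per-channel offset to a [C][H][W] feature map."""
    return [shift_channel(ch, dx, dy) for ch, (dx, dy) in zip(fmap, offsets)]
```

No multiplications are involved: the operation only changes which address each value is read from, which is why it can be merged with the data fetch of the following 1x1 convolution.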
  • the above-mentioned translational convolution module includes a second structural unit having the same structure as the first structural unit, and is used to process the first picture feature data through one second structural unit, or through a plurality of second structural units connected in sequence, to obtain the third picture feature data.
  • the translational convolution module further includes a pooling unit, located between the first and the second of the sequentially connected second structural units.
  • the above-mentioned translational convolution module is used to process the feature data of the first picture through a translation operation and a 1x1 convolutional network.
  • the first picture feature data may be processed by using a second structural unit having the same structure as the first structural unit.
  • like the first structural unit, the second structural unit includes a sequentially connected first 1x1 convolutional network, first batch normalization network, first translation network, second 1x1 convolutional network, second batch normalization network and linear activation function network.
  • through the second structural unit, the first picture feature data can be processed by the translation operation and the 1x1 convolutional network.
  • this application improves on the existing pedestrian re-identification network model.
  • the improved pedestrian re-identification network of this application is 2
  • in the improved pedestrian re-identification network model of this application, a pooling unit is added between the first and the second of the sequentially connected second structural units.
  • the pooling unit can be a unit for maximum pooling or an average pooling unit. Therefore, in order to keep the size of the intermediate feature map of the entire network consistent with the original network, a maximum pooling layer is added after the first CSC structure. The details are shown in the table below.
  • this application optimizes the CMSNet network structure in order to be compatible with the algorithm structure and hardware characteristics, and maximizes the cost performance of software and hardware within a reasonable range of network performance.
  • the optimized network structure is as follows:
  • Input represents the input
  • Layer represents the network level of the CMSNet network
  • Cout represents the number of output channels
  • Kernel represents the convolution kernel
  • Stride represents the convolution step size
  • Number represents the number of repetitions
  • Params represents the number of parameters
  • Flops represents the number of floating-point operations.
  • CSC represents the first structural unit of the above CSC
  • max pool represents the maximum pooling
  • CMS block represents the entire network module in Figure 5
  • conv2d represents two-dimensional convolution
  • average pool represents average pooling
  • global average pool represents global average pooling
  • fc means full connection.
  • the CMS Block structure is also optimized.
  • the optimized CMS Block structure is shown in Figure 5. Comparing Figure 3 and Figure 4 shows that, in the optimized CMS Block structure, DW Conv is replaced with the CSC structure, which is composed of Conv1x1 + Shift + Conv1x1 + BN + ReLU. After the CMSNet network structure is trained, BN is fused into Conv1x1 and solidified into the model file.
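Fusing BN into a preceding 1x1 convolution is a standard inference-time transformation, and a minimal sketch is given here. It assumes the usual batch normalization formula y = gamma * (x - mean) / sqrt(var + eps) + beta with per-output-channel statistics; the function name and argument layout are hypothetical, and the application does not describe its own fusion procedure in detail.

```python
import math


def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into 1x1 conv weights [C_out][C_in] and bias."""
    folded_w, folded_b = [], []
    for o, row in enumerate(weights):
        # Per-output-channel scale applied by batch normalization.
        scale = gamma[o] / math.sqrt(var[o] + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((bias[o] - mean[o]) * scale + beta[o])
    return folded_w, folded_b
```

After folding, the hardware only has to evaluate a single 1x1 convolution with the new weights and bias, so no separate BN stage is needed at inference time.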
  • the optimized network structure contains multiple first structural units.
  • a structural unit processes the feature data of the second picture.
  • the above-mentioned first picture feature data is feature data obtained after processing the original picture data by using an arbitrary bit quantization network DoReFa-Net.
  • The only weight parameters in the optimized CMSNet network are Conv1x1 parameters, and the structure is uniform, which makes the network easier to quantize.
  • the quantization algorithm of the arbitrary bit quantization network DoReFa-Net is used to quantize the full-precision weights in the CMSNet network.
  • This includes using an arbitrary bit quantization network to obtain the first picture feature data from the original picture data. It may also include using an arbitrary bit quantization network to quantize the first picture feature data before it is input to the pedestrian re-identification network model, to quantize the second picture feature data before it is input to the pedestrian re-identification network model, and to quantize the third picture feature data before it is input to the pedestrian re-identification network model.
  • the quantization algorithm of the arbitrary bit quantization network DoReFa-Net is used to quantize the picture feature data input to the pedestrian re-identification network model for convolution processing.
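The quantization steps above can be sketched with the k-bit quantizers published for DoReFa-Net. This is a minimal plain-Python illustration, assuming the commonly cited formulation (tanh-normalized weights, activations clipped to [0, 1]); the function names are hypothetical, and how the application maps these values onto fixed-point hardware formats is not shown here.

```python
import math


def quantize_k(r, k):
    """Quantize r in [0, 1] to one of 2**k evenly spaced levels."""
    levels = (1 << k) - 1
    return round(r * levels) / levels


def quantize_weights(weights, k):
    """DoReFa-style k-bit weight quantization; results lie in [-1, 1]."""
    squashed = [math.tanh(w) for w in weights]
    # Guard against an all-zero weight list to avoid division by zero.
    max_abs = max(abs(v) for v in squashed) or 1.0
    return [2.0 * quantize_k(v / (2.0 * max_abs) + 0.5, k) - 1.0
            for v in squashed]


def quantize_activations(values, k):
    """DoReFa-style k-bit activation quantization after clipping to [0, 1]."""
    return [quantize_k(min(max(v, 0.0), 1.0), k) for v in values]
```

With `k=8`, each weight or activation takes one of 256 levels, which can then be stored as an 8-bit code on the hardware side.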
  • The data processing method applied to the FPGA hardware architecture of the present application starts from the collaborative design of hardware and network: it separates the neural network acceleration hardware design from network compression in the pedestrian re-identification network model, and takes the hardware characteristics into account as far as possible when compressing the network, so that the network model is better suited to the hardware architecture.
  • This application designs a network that can be applied to the FPGA hardware architecture on the basis of CMSNet, the latest pedestrian re-identification network.
  • The Conv1x1+Shift operation removes the need for a separate translation (Shift) module in hardware, saving the resources such a module would consume and reducing data transfer between modules.
  • The optimized pedestrian re-identification network includes only Conv1x1 convolution, pooling, and residual operations.
  • In the hardware design, the main modules are therefore the Conv1x1 convolution computing unit, the pooling unit, and the residual unit.
  • The hardware structure is simpler, and data can flow between the hardware modules to the greatest extent, so that data throughput on the hardware is maximized.
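To make the Conv1x1+Shift idea concrete, the sketch below applies a per-channel spatial shift and then a 1x1 convolution that mixes the shifted channels. It is an illustrative software model only: the function names, the `[channel][row][column]` list layout, zero filling at the borders, and the sign convention of `(dx, dy)` are all assumptions, and the application's hardware may realize the shift differently.

```python
def shift_plane(plane, dx, dy):
    """Shift one channel by (dx, dy) pixels, filling vacated positions with zeros."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = plane[src_y][src_x]
    return out


def shift_then_conv1x1(feature, offsets, weight):
    """Shift each input channel by its own offset, then mix channels with a 1x1 conv.

    feature: [channel][row][col]; offsets: one (dx, dy) per input channel;
    weight: [out_channel][in_channel].
    """
    shifted = [shift_plane(p, dx, dy) for p, (dx, dy) in zip(feature, offsets)]
    h, w = len(feature[0]), len(feature[0][0])
    return [[[sum(row[c] * shifted[c][y][x] for c in range(len(shifted)))
              for x in range(w)]
             for y in range(h)]
            for row in weight]
```

Because the shift has no weights and only changes which input pixel is read, it can in principle be absorbed into the addressing of the 1x1 convolution, which is consistent with the claim that no separate Shift module is needed.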
  • Although the steps in the flowchart are displayed sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless otherwise specified herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowchart may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
  • The present application also provides an FPGA hardware architecture.
  • The FPGA hardware architecture includes a central processing unit, a memory, a computing unit processing component, a pooling component, a residual component, and a controller.
  • The central processing unit is used to receive the picture feature data to be processed and store the picture feature data in the memory;
  • One or more arithmetic logic unit (ALU) matrices are arranged in the computing unit processing component, which is used to read the picture feature data from the memory and perform 1x1 convolution processing on the picture feature data through the ALU matrix;
  • The pooling component is used to perform pooling processing on the picture feature data output by the computing unit processing component;
  • The residual component is used to perform residual accumulation processing on the picture feature data output by the pooling component and/or the picture feature data output by the computing unit processing component;
  • The controller is used to control, according to the pedestrian re-identification network model, the computing unit processing component to read the picture feature data, weight data and translation parameters from the memory to perform 1x1 convolution processing, to control whether the pooling component performs pooling processing on the feature data output by the computing unit processing component, and to control whether the residual component reads picture feature data from the memory and whether the picture feature data input to the residual component undergoes residual processing.
  • the pedestrian re-identification network model may further include the modules or units described in the embodiments corresponding to the above-mentioned data processing method applied to the FPGA hardware architecture. For details, please refer to the descriptions of the above-mentioned embodiments.
  • A specific hardware structure diagram of an FPGA hardware architecture is shown in FIG. 12.
  • CPU represents a central processing unit
  • DDR/DRAM represents double data rate synchronous dynamic random access memory / dynamic random access memory.
  • The central processing unit receives the picture feature data to be processed and stores the picture feature data in the DDR/DRAM.
  • One or more arithmetic logic unit matrices are arranged in the computing unit processing component, which reads the picture feature data from the DDR/DRAM and performs 1x1 convolution processing on the picture feature data through the arithmetic logic unit matrix.
  • The pooling component performs pooling processing on the picture feature data output by the computing unit processing component.
  • The residual component performs residual accumulation processing on the picture feature data output by the pooling component and/or the picture feature data output by the computing unit processing component.
  • The controller controls the computing unit processing component to read the picture feature data, weight data and translation parameters from the DDR/DRAM to perform 1x1 convolution processing, controls whether the pooling component performs pooling processing on the feature data output by the computing unit processing component, and controls whether the residual component reads picture feature data from the DDR/DRAM and whether the picture feature data input to the residual component undergoes residual processing.
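The controller's role of switching pooling and residual accumulation on or off around the 1x1 convolution can be modeled in software as follows. This is a simplified sketch under stated assumptions: `LayerControl`, `run_layer`, the dictionary standing in for DDR/DRAM, and the pairwise max pooling over a flat list of pixels are all hypothetical simplifications, and the translation parameters are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerControl:
    """Hypothetical per-layer control word issued by the controller."""
    weight_key: str                     # where the 1x1 weights live in memory
    pool: bool = False                  # whether the pooling component is enabled
    residual_key: Optional[str] = None  # saved feature map to accumulate, if any


def conv1x1(pixels, weight):
    """1x1 convolution over a list of pixels, each a list of channel values."""
    return [[sum(w * x for w, x in zip(row, p)) for row in weight] for p in pixels]


def max_pool_pairs(pixels):
    """Toy max pooling: channel-wise maximum over adjacent pixel pairs."""
    return [[max(a, b) for a, b in zip(pixels[i], pixels[i + 1])]
            for i in range(0, len(pixels) - 1, 2)]


def residual_add(pixels, saved):
    """Element-wise accumulation of a saved feature map onto the current one."""
    return [[a + b for a, b in zip(p, s)] for p, s in zip(pixels, saved)]


def run_layer(memory, input_key, output_key, ctrl):
    """Run one layer: compute, then optionally pool, then optionally add a residual."""
    out = conv1x1(memory[input_key], memory[ctrl.weight_key])
    if ctrl.pool:
        out = max_pool_pairs(out)
    if ctrl.residual_key is not None:
        out = residual_add(out, memory[ctrl.residual_key])
    memory[output_key] = out
    return out
```

In this model every layer of the network reduces to one control word plus memory addresses, which mirrors the idea that a network built only from Conv1x1, pooling and residual operations needs only three datapath components and a controller.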
  • The structural blocks of the pedestrian re-identification network model include a forward hierarchical connection group, a backward hierarchical connection group and a channel scale selection module connected in sequence. Both the forward hierarchical connection group and the backward hierarchical connection group contain multiple first structural units. A first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence, and the channel scale selection module includes a summation unit.
  • The present application also provides a data processing device applied to an FPGA hardware architecture.
  • The data processing device applied to the FPGA hardware architecture includes an acquisition module 1302 and a processing module 1304.
  • The acquisition module 1302 is used to acquire the first picture feature data to be processed;
  • The processing module 1304 is used to input the first picture feature data into the pedestrian re-identification network model based on contextual multi-scale feature learning and obtain the classification identification information output by the pedestrian re-identification network model; the structural blocks of the pedestrian re-identification network model include a forward hierarchical connection group, a backward hierarchical connection group and a channel scale selection module connected in sequence, the forward hierarchical connection group and the backward hierarchical connection group each contain multiple first structural units, a first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence, and the channel scale selection module includes a summation unit.
  • The forward hierarchical connection group is used to perform step-by-step inter-scale information fusion on the first picture feature data through its multiple first structural units;
  • The backward hierarchical connection group is used to perform cross-scale information fusion, through its multiple first structural units, on the information output by each first structural unit in the forward hierarchical connection group;
  • The channel scale selection module is used to sum, through the summation unit, the information output by each first structural unit in the backward hierarchical connection group to obtain the classification identification information.
  • The pedestrian re-identification network model further includes a 3x3 convolutional network module connected to the forward hierarchical connection group; the 3x3 convolutional network module is used to process the first picture feature data with multiple separable 3x3 convolutional networks to obtain the second picture feature data; the forward hierarchical connection group is used to perform step-by-step inter-scale information fusion on the second picture feature data through multiple first structural units.
  • The pedestrian re-identification network model also includes a translation convolution module connected to the forward hierarchical connection group; the translation convolution module is used to process the first picture feature data through a translation operation and a 1x1 convolutional network to obtain the third picture feature data; the forward hierarchical connection group is used to perform step-by-step inter-scale information fusion on the third picture feature data through multiple first structural units.
  • The translation convolution module includes a second structural unit with the same structure as the first structural unit, and the translation convolution module is used to process the first picture feature data through one second structural unit or multiple second structural units connected in sequence to obtain the third picture feature data.
  • the translational convolution module further includes a pooling unit, and the pooling unit is located between the first second structural unit and the second second structural unit among the plurality of sequentially connected second structural units.
  • the feature data of the first picture is the feature data obtained after processing the original picture data with the quantization algorithm of the arbitrary bit quantization network DoReFa-Net.
  • Each module in the above-mentioned data processing device applied to the FPGA hardware architecture can be fully or partially realized by software, hardware and combinations thereof.
  • the above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 14 .
  • the computer device includes a processor, memory, network interface and database connected by a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for the execution of the operating system and computer readable instructions in the non-volatile storage medium.
  • the network interface of the computer equipment is used to connect with external equipment, so as to receive the information of external equipment.
  • Figure 14 is only a block diagram of a partial structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • A computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the steps of the data processing method applied to the FPGA hardware architecture in any of the above embodiments are implemented.
  • A non-volatile computer-readable storage medium stores computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the steps of the data processing method applied to the FPGA hardware architecture in any one of the above embodiments can be implemented.
  • Nonvolatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an FPGA hardware architecture, a data processing method and apparatus, a computer device, and a storage medium. The method comprises: acquiring first picture feature data to be processed; inputting the first picture feature data into a pedestrian re-identification network model to obtain classified identification information, a structural block of the pedestrian re-identification network model comprising a forward hierarchical connection group, a backward hierarchical connection group and a channel scale selection module which are connected in sequence, each of the forward hierarchical connection group and the backward hierarchical connection group comprising multiple first structural units, a first structural unit comprising a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network which are connected in sequence, and the channel scale selection module comprising a summation unit.

Description

FPGA hardware architecture, data processing method therefor, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202111579432.9, entitled "FPGA hardware architecture and its data processing method, storage medium", filed with the China Patent Office on December 22, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of FPGA hardware processing, and in particular to an FPGA hardware architecture, a data processing method and apparatus, a computer device, and a storage medium.
Background
Current hardware architectures for FPGA-based neural network acceleration mainly focus on improving the following: FPGA (Field Programmable Gate Array) computing power, network accuracy, and network model size. These FPGA hardware architectures also generally include an on-chip cache, a convolution acceleration module, a pool (pooling) module, a load module, a save module, and an instruction control module. The FPGA hardware architecture itself is not especially difficult to implement; software compilation is the harder part.
Software compilation must adapt to different network models, remain compatible with changes in the FPGA hardware, and provide users with an easy-to-use interface. Achieving all of these at once is still difficult at present. The reason is that FPGA hardware architectures vary too much: the configurable parameters of each module can change with requirements, for example the degree of parallelism of the convolution module. In addition, network models are diverse and there are many open-source network model platforms, which leads to the diversification of FPGA hardware architectures.
The inventors realized that, because of software-hardware coordination problems, when an FPGA hardware architecture is used to accelerate neural network processing of picture data, a more complex software configuration makes the FPGA hardware architecture built on it correspondingly more complex. Therefore, how to co-design software and hardware and simplify the FPGA hardware architecture has become an urgent problem to be solved.
Summary
A data processing method applied to an FPGA hardware architecture, comprising: acquiring first picture feature data to be processed; and inputting the first picture feature data into a pedestrian re-identification network model based on contextual multi-scale feature learning to obtain classification identification information output by the pedestrian re-identification network model; wherein a structural block of the pedestrian re-identification network model comprises a forward hierarchical connection group, a backward hierarchical connection group and a channel scale selection module connected in sequence, the forward hierarchical connection group and the backward hierarchical connection group each comprise a plurality of first structural units, a first structural unit comprises a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence, and the channel scale selection module comprises a summation unit.
In one embodiment, the forward hierarchical connection group is used to perform step-by-step inter-scale information fusion on the first picture feature data through its plurality of first structural units; the backward hierarchical connection group is used to perform cross-scale information fusion, through its plurality of first structural units, on the information output by each first structural unit in the forward hierarchical connection group; and the channel scale selection module is used to sum, through the summation unit, the information output by each first structural unit in the backward hierarchical connection group to obtain the classification identification information.
In one embodiment, the pedestrian re-identification network model further comprises a 3x3 convolutional network module connected to the forward hierarchical connection group; the 3x3 convolutional network module is used to process the first picture feature data with a plurality of separable 3x3 convolutional networks to obtain second picture feature data; and the forward hierarchical connection group is used to perform step-by-step inter-scale information fusion on the second picture feature data through the plurality of first structural units.
In one embodiment, the pedestrian re-identification network model further comprises a translation convolution module connected to the forward hierarchical connection group; the translation convolution module is used to process the first picture feature data through a translation operation and a 1x1 convolutional network to obtain third picture feature data; and the forward hierarchical connection group is used to perform step-by-step inter-scale information fusion on the third picture feature data through the plurality of first structural units.
In one embodiment, the translation convolution module comprises a second structural unit with the same structure as the first structural unit, and the translation convolution module is used to process the first picture feature data through one second structural unit or a plurality of second structural units connected in sequence to obtain the third picture feature data.
In one embodiment, the translation convolution module further comprises a pooling unit, and the pooling unit is located between the first second structural unit and the second second structural unit of the plurality of second structural units connected in sequence.
In one embodiment, the first picture feature data is feature data obtained by processing original picture data with the quantization algorithm of the arbitrary bit quantization network DoReFa-Net.
A data processing device applied to an FPGA hardware architecture, comprising: an acquisition module, used to acquire first picture feature data to be processed; and a processing module, used to input the first picture feature data into a pedestrian re-identification network model based on contextual multi-scale feature learning to obtain classification identification information output by the pedestrian re-identification network model; wherein a structural block of the pedestrian re-identification network model comprises a forward hierarchical connection group, a backward hierarchical connection group and a channel scale selection module connected in sequence, the forward hierarchical connection group and the backward hierarchical connection group each comprise a plurality of first structural units, a first structural unit comprises a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence, and the channel scale selection module comprises a summation unit.
An FPGA hardware architecture, characterized in that the FPGA hardware architecture comprises a central processing unit, a memory, a computing unit processing component, a pooling component, a residual component, and a controller; the central processing unit is used to receive picture feature data to be processed and store the picture feature data in the memory; one or more arithmetic logic unit matrices are arranged in the computing unit processing component, and the computing unit processing component is used to read the picture feature data from the memory and perform 1x1 convolution processing on the picture feature data through the arithmetic logic unit matrix; the pooling component is used to perform pooling processing on the picture feature data output by the computing unit processing component; the residual component is used to perform residual accumulation processing on the picture feature data output by the pooling component and/or the picture feature data output by the computing unit processing component; the controller is used to control, according to a pedestrian re-identification network model, the computing unit processing component to read the picture feature data, weight data and translation parameters from the memory to perform 1x1 convolution processing, to control whether the pooling component performs pooling processing on the feature data output by the computing unit processing component, and to control whether the residual component reads picture feature data from the memory and whether the picture feature data input to the residual component undergoes residual processing; wherein a structural block of the pedestrian re-identification network model comprises a forward hierarchical connection group, a backward hierarchical connection group and a channel scale selection module connected in sequence, the forward hierarchical connection group and the backward hierarchical connection group each comprise a plurality of first structural units, a first structural unit comprises a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network and a linear activation function network connected in sequence, and the channel scale selection module comprises a summation unit.
A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of any one of the above data processing methods applied to an FPGA hardware architecture.
One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of any one of the above data processing methods applied to an FPGA hardware architecture.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a block diagram of the framework of an FPGA hardware architecture provided by the present application according to one or more embodiments;
FIG. 2 is a schematic flowchart of a data processing method applied to an FPGA hardware architecture provided by the present application according to one or more embodiments;
FIG. 3 is a network block diagram of an existing pedestrian re-identification network model based on contextual multi-scale feature learning, provided by the present application according to one or more embodiments;
FIG. 4 is a network block diagram of the internal model of DW Conv provided by the present application according to one or more embodiments;
FIG. 5 is a network block diagram of the improved pedestrian re-identification network model based on contextual multi-scale feature learning, provided by the present application according to one or more embodiments;
FIG. 6 is an example diagram of the convolution calculation of a translation operation provided by the present application according to one or more embodiments;
FIG. 7 is an example diagram of the convolution calculation of a translation operation using average grouping, provided by the present application according to one or more embodiments;
FIG. 8 is a schematic diagram of the dx and dy position labels of the convolution kernel of a translation operation, provided by the present application according to one or more embodiments;
FIG. 9 is an example diagram of the convolution calculation of a translation operation with a convolution kernel size of 5, provided by the present application according to one or more embodiments;
FIG. 10 is an example diagram of 61 offset parameter coordinates provided by the present application according to one or more embodiments;
FIG. 11 is an example diagram of 8-bit fixed-point quantization with the arbitrary bit quantization network DoReFa-Net, provided by the present application according to one or more embodiments;
FIG. 12 is a specific hardware structure diagram of an FPGA hardware architecture provided by the present application according to one or more embodiments;
FIG. 13 is a structural block diagram of a data processing device applied to an FPGA hardware architecture, provided by the present application according to one or more embodiments;
FIG. 14 is an internal structure diagram of a computer device provided by the present application according to one or more embodiments.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The data processing method applied to an FPGA hardware architecture provided by the present application can be applied to the FPGA hardware architecture shown in Fig. 1. As shown in Fig. 1, the FPGA hardware architecture includes a central processing unit, a memory, a computing unit processing component, a pooling component, a residual component, and a controller. The central processing unit is connected to the memory and, on receiving first picture feature data to be processed, stores the first picture feature data in the memory. The computing unit processing component is mainly used to process 1x1 convolutional networks. An arithmetic logic unit matrix is provided in the computing unit processing component, and the size of this matrix can be set according to the available hardware resources. The computing unit processing component reads the first picture feature data from the memory, performs 1x1 convolution on it through the arithmetic logic unit matrix, and outputs the convolution result to the pooling component. The pooling component pools the information output by the computing unit processing component. The residual component is connected to both the pooling component and the computing unit processing component; it can perform residual accumulation on the information output by the pooling component, on the information output by the computing unit processing component, or on both. The controller is used, according to the pedestrian re-identification network model, to control the computing unit processing component to read the first picture feature data, weight data, and translation parameters from the memory for 1x1 convolution, to control whether the pooling component pools the feature data output by the computing unit processing component, and to control whether the residual component reads other picture feature data from the memory and whether it performs residual processing on the picture feature data input to it. The structural block of the pedestrian re-identification network model includes a forward hierarchical connection group, a backward hierarchical connection group, and a channel scale selection module connected in sequence. The forward hierarchical connection group and the backward hierarchical connection group each contain a plurality of first structural units. A first structural unit contains, connected in sequence, a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network, and a linear activation function network. The channel scale selection module contains a summation unit. The controller can therefore control the other components according to the pedestrian re-identification network model used in the data processing method, thereby implementing the data processing method applied to an FPGA hardware architecture of the present application.
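The dataflow described above (1x1 convolution in the computing unit, then optional pooling, then optional residual accumulation, all switched by the controller) can be sketched in software as follows. This is only an illustrative model under stated assumptions: `LayerConfig`, `run_layer`, and the toy one-dimensional pairwise pooling are hypothetical names and simplifications introduced here, not structures defined by the application.

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]  # channel vector of one pixel


@dataclass
class LayerConfig:
    """Hypothetical per-layer control word issued by the controller."""
    weights: List[Vector]       # 1x1 convolution weights, shape [c_out][c_in]
    use_pooling: bool = False   # route the output through the pooling component
    use_residual: bool = False  # accumulate a second operand in the residual component


def conv1x1(pixels: List[Vector], weights: List[Vector]) -> List[Vector]:
    """Computing unit: a 1x1 convolution is a per-pixel matrix-vector product."""
    return [[sum(w * x for w, x in zip(row, px)) for row in weights] for px in pixels]


def max_pool_pairs(pixels: List[Vector]) -> List[Vector]:
    """Pooling component (toy 1-D version): channel-wise max over adjacent pixel pairs."""
    return [[max(a, b) for a, b in zip(pixels[i], pixels[i + 1])]
            for i in range(0, len(pixels) - 1, 2)]


def residual_add(pixels: List[Vector], skip: List[Vector]) -> List[Vector]:
    """Residual component: element-wise accumulation with data read from memory."""
    return [[a + b for a, b in zip(p, s)] for p, s in zip(pixels, skip)]


def run_layer(pixels: List[Vector], cfg: LayerConfig,
              skip: Optional[List[Vector]] = None) -> List[Vector]:
    """Controller: enable or bypass pooling and residual stages per layer."""
    out = conv1x1(pixels, cfg.weights)
    if cfg.use_pooling:
        out = max_pool_pairs(out)
    if cfg.use_residual and skip is not None:
        out = residual_add(out, skip)
    return out
```

Because every layer reduces to the same three stages, a layer differs from another only in its control word, which is the property the architecture relies on to keep the hardware small.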
The data processing method applied to an FPGA hardware architecture of the present application uses the FPGA hardware architecture shown in Fig. 1. Specifically, the computing unit processing component reads the first picture feature data from the memory. The arithmetic logic unit matrix in the computing unit processing component performs the operations of the first structural units in the forward hierarchical connection group and the backward hierarchical connection group of the pedestrian re-identification network model, and the residual component performs the operation of the summation unit in the channel scale selection module. When the data processing method requires pooling, the pooling component can first pool the information output by the computing unit processing component, after which the residual component performs residual accumulation. Thus, when picture data is processed on the FPGA hardware architecture, the data processing method of the present application simplifies the hardware requirements of the FPGA hardware architecture and achieves network acceleration on the FPGA hardware architecture with fewer resources.
In one embodiment, as shown in Fig. 2, a data processing method applied to an FPGA hardware architecture is provided. Taking the application of the method to the FPGA hardware architecture in Fig. 1 as an example, the method includes the following steps:
S202: acquire first picture feature data to be processed.
In this embodiment, the data processing method applied to an FPGA hardware architecture uses a neural network on the FPGA hardware architecture to accelerate the processing of picture data, so as to quickly identify the classification information of a picture. The first picture feature data acquired here is the picture feature data obtained by extracting features from the picture to be processed.
S204: input the first picture feature data into the pedestrian re-identification network model based on contextual multi-scale feature learning, and obtain the classification and recognition information output by the pedestrian re-identification network model.
The structural block of the pedestrian re-identification network model includes a forward hierarchical connection group, a backward hierarchical connection group, and a channel scale selection module connected in sequence. The forward hierarchical connection group and the backward hierarchical connection group each contain a plurality of first structural units. A first structural unit contains, connected in sequence, a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network, and a linear activation function network. The channel scale selection module contains a summation unit.
The forward hierarchical connection group performs stepwise inter-scale information fusion on the first picture feature data through its first structural units. The backward hierarchical connection group performs cross-scale information fusion, through its first structural units, on the information output by each first structural unit of the forward hierarchical connection group. The channel scale selection module sums, through the summation unit, the information output by each first structural unit of the backward hierarchical connection group to obtain the classification and recognition information.
In this embodiment, the pedestrian re-identification network model based on contextual multi-scale feature learning processes the first picture feature data to obtain the classification and recognition information of the original picture corresponding to the first picture feature data. Note that this model is obtained by improving the existing pedestrian re-identification network model based on contextual multi-scale feature learning. The existing model is described below:
The pedestrian re-identification network model, CMSNet for short (Contextual Multi-Scale Feature Learning for Person Re-Identification), is a contextual multi-scale network used to learn common and contextual multi-scale representations simultaneously. As shown in Fig. 3, the building block of CMSNet obtains contextual multi-scale representations through a bidirectional hierarchical connection group, which consists of a forward hierarchical connection group for stepwise inter-scale information fusion and a backward hierarchical connection group for cross-scale information fusion. In Fig. 3, HFCG denotes the forward hierarchical connection group and BHCG denotes the backward hierarchical connection group. As also shown in Fig. 3, CMSNet includes a CSS (Channel-Wise Scale Selection module) structure, which uses operations such as b-softmax, fullconnected, and matrix product. In Fig. 3, Conv 1x1 denotes a 1x1 convolutional network; the internal model of DW Conv is shown in Fig. 4. In Fig. 4, GConv 3x3 denotes a separable 3x3 convolution, BN (Batch Normalization) denotes a batch normalization network, and ReLU denotes a linear activation function network. In Fig. 3, b1, b2, b3, and b4 identify the output information of the corresponding DW Conv, and AvgPod denotes an averaging node.
Figs. 3 and 4 together show that the existing pedestrian re-identification network model based on contextual multi-scale feature learning uses separable 3x3 convolutions and, in the CSS structure, operations such as b-softmax, fullconnected, and matrix product. Accounting for the computation pattern and resource usage of each of these modules when designing the hardware of the FPGA hardware architecture would greatly increase the complexity of the hardware design.
In this embodiment, the existing pedestrian re-identification network model based on contextual multi-scale feature learning described above is improved. As shown in Fig. 5, the improved model likewise contains three modules: the forward hierarchical connection group HFCG, the backward hierarchical connection group BHCG, and the channel scale selection module CSS. However, the convolutional unit contained in the improved HFCG and BHCG is no longer DW Conv but the first structural unit, i.e. the CSC unit. As shown in Fig. 5, the CSC unit contains Conv 1x1, BN, Shift, and ReLU, where Shift denotes the translation operation, Conv 1x1 denotes a 1x1 convolutional network, BN denotes a batch normalization network, and ReLU denotes a linear activation function network. The improved channel scale selection module CSS contains only the summation unit Sum.
Comparing Fig. 3 with Fig. 5 shows that the optimized pedestrian re-identification network model of the present application has a simple network structure that contains only Conv1x1 convolution, pooling, and residual operations. The hardware design of the FPGA hardware architecture therefore needs only a few main modules: a computing unit for Conv1x1 convolution, a pooling unit, and a residual unit. The hardware structure is simpler, data can be pipelined between the hardware modules to the greatest extent, and the utilization of the hardware is maximized.
Therefore, when processing picture feature data, the above FPGA hardware architecture and data processing method simplify the hardware configuration of the FPGA hardware architecture, coordinate software and hardware, and achieve network acceleration on the FPGA hardware architecture with fewer resources.
In one embodiment, the pedestrian re-identification network model further includes a 3x3 convolutional network module connected to the forward hierarchical connection group. The 3x3 convolutional network module processes the first picture feature data with a plurality of separable 3x3 convolutional networks to obtain second picture feature data. The forward hierarchical connection group performs stepwise inter-scale information fusion on the second picture feature data through its first structural units.
In this embodiment, picture features are extracted from the original picture to obtain the picture feature data. The existing pedestrian re-identification network model processes the picture feature data with a 7x7 convolutional network. The improved model of the present application instead processes the first picture feature data with a plurality of separable 3x3 convolutional networks to obtain the second picture feature data; that is, the 7x7 convolutional network of the original model is replaced by three separable 3x3 convolutional networks. Because the pedestrian re-identification network model based on contextual multi-scale feature learning already contains separable 3x3 convolutional networks, processing the first picture feature data with separable 3x3 convolutional networks before feeding the resulting second picture feature data into the model means that the model never has to handle feature data with 7x7 convolution characteristics and only has to handle feature data with separable 3x3 convolution characteristics. This reduces the processing load of the pedestrian re-identification network model and improves its efficiency in processing feature data.
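One general reason three 3x3 layers can stand in for a single 7x7 layer is that, at stride 1, they cover the same receptive field. The small calculation below illustrates this. The function name is introduced here for illustration, and the stride-1 setting is an assumption, since this passage does not state the strides of the replacement layers.

```python
def stacked_receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of a stack of convolution layers.

    Each layer widens the field by (kernel - 1) times the product of the
    strides of all layers before it.
    """
    strides = strides or [1] * len(kernel_sizes)
    field, jump = 1, 1
    for kernel, stride in zip(kernel_sizes, strides):
        field += (kernel - 1) * jump
        jump *= stride
    return field


# Three stride-1 3x3 layers see the same 7x7 window as one 7x7 layer.
print(stacked_receptive_field([3, 3, 3]), stacked_receptive_field([7]))
```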
In one embodiment, the pedestrian re-identification network model further includes a translation convolution module connected to the forward hierarchical connection group. The translation convolution module processes the first picture feature data through a translation operation and a 1x1 convolutional network to obtain third picture feature data. The forward hierarchical connection group performs stepwise inter-scale information fusion on the third picture feature data through its first structural units.
In this embodiment, the first picture feature data is processed through a translation operation and a 1x1 convolutional network. The translation operation and 1x1 convolutional network replace the 7x7 convolutional network of the original pedestrian re-identification network model, or alternatively the separable 3x3 convolutional networks of the preceding embodiment. This replacement reduces the movement of feature map data. For example, in a 3x3 convolutional network each picture feature datum is used 3 times, and in the 7x7 convolutional network of the original model each picture feature datum is used 7 times, whereas in a 1x1 convolutional network each picture feature datum only needs to be moved once. This lowers the logic complexity and increases the computation speed of the pedestrian re-identification network model. The translation operation moves the pixels of a certain range of the picture feature data to the center as the result, which reduces the number of multiplications. The replacement lowers accuracy, but it reduces the number of operations on the FPGA hardware architecture and simplifies the network structure design of the FPGA hardware architecture. The translation operation and the 1x1 convolutional network are described in detail below:
The convolution process of the translation operation is equivalent to translating the original input matrix in a certain direction, as shown in Fig. 6. Although a simple translation operation does not appear to extract spatial information, the channel domain is a hierarchical diffusion of spatial-domain information. Therefore, by setting translation convolution kernels with different directions, the tensor of the input picture feature data (for example, the second picture feature data output by the separable 3x3 convolutional network) can be translated differently on different channels and then combined with a 1x1 convolutional network to fuse information across channels, which extracts information in both the spatial domain and the channel domain.
On each channel, the convolution kernel of a translation operation has [Figure PCTCN2022095365-appb-000001] (where [Figure PCTCN2022095365-appb-000002] denotes an integer) possible translation directions. Assuming there are M channels, there are [Figure PCTCN2022095365-appb-000003] possible translation choices in total. Clearly, a brute-force search for the most suitable translation choice in such a space is impractical. The M channels are therefore divided into [Figure PCTCN2022095365-appb-000004] groups, each called a shift group, and the [Figure PCTCN2022095365-appb-000005] channels of each group use the same translation choice, i.e. the same translation direction. The division may leave a remainder, in which case some channels cannot be assigned to any group; these channels are called the "centered" group, and the "centered" group is not translated. In short, the input is grouped by channel, and each group of channels is translated in only one direction.
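The combination of a per-channel translation followed by a 1x1 convolution can be sketched as below. This is a minimal pure-Python illustration, not the application's implementation: the function names, the zero fill at the borders, and the sign convention (positive `dx` moves content to the right, positive `dy` moves it down) are all assumptions made for this example.

```python
def shift_channel(plane, dx, dy, fill=0.0):
    """Translate one H x W channel by (dx, dy); vacated cells receive `fill`."""
    h, w = len(plane), len(plane[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx  # input pixel that lands on (y, x)
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = plane[src_y][src_x]
    return out


def pointwise_conv(planes, weights):
    """1x1 convolution: mix channels independently at every pixel.

    planes:  list of C_in channels, each H x W
    weights: C_out rows of C_in coefficients
    """
    h, w = len(planes[0]), len(planes[0][0])
    return [[[sum(wt * planes[c][y][x] for c, wt in enumerate(row))
              for x in range(w)] for y in range(h)] for row in weights]


# Two copies of a 1 x 3 row, shifted in opposite directions and then summed
# by a 1x1 convolution, behave like a horizontal [1, 0, 1] spatial kernel.
row = [[1.0, 2.0, 3.0]]
shifted = [shift_channel(row, 1, 0), shift_channel(row, -1, 0)]
print(pointwise_conv(shifted, [[1.0, 1.0]]))
```

The translation itself needs no multiplications; all arithmetic is in the 1x1 convolution, which is why the two steps map onto a single computing unit.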
For example, with a 3x3 convolution kernel and 64 channels in total, the channels are divided as described above into 9 groups of 7 channels each, and the one remaining channel is not translated. The 9 groups are assigned in order to the translation groups, and the 7 channels of each group share one translation parameter. The separable 3x3 convolutional network is replaced by a translation operation and a 1x1 convolutional network. The translation operation uses average grouping, each group is assigned in order to one translation group, and the translation offsets dx and dy along the x-axis and y-axis are kept within [-1, 1]; a simple illustration is shown in Fig. 7. Because Conv1x1 and the Shift translation operation can be merged, a separate Shift module can be omitted from the hardware of the FPGA hardware architecture, which makes the hardware design more streamlined. The dx and dy position identifiers of the Shift convolution kernel are shown in Fig. 8.
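The grouping in this example can be reproduced with a few lines of code. The sketch assumes one group per kernel position and a row-major assignment order; the actual order is defined by Fig. 8 and may differ, and `assign_shift_groups` is a name introduced here for illustration.

```python
def assign_shift_groups(num_channels, kernel_size):
    """Assign each channel a (dx, dy) offset by average grouping.

    Returns the per-channel offsets and the number of channels per group.
    Channels left over after the even split form the "centered" group and
    keep the offset (0, 0), i.e. they are not translated.
    """
    radius = kernel_size // 2
    offsets = [(dx, dy)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    per_group = num_channels // len(offsets)
    assignment = []
    for offset in offsets:
        assignment.extend([offset] * per_group)
    leftover = num_channels - len(assignment)
    assignment.extend([(0, 0)] * leftover)
    return assignment, per_group


assignment, per_group = assign_shift_groups(64, 3)
print(per_group, len(assignment) - 9 * per_group)  # channels per group, leftover channels
```

For 64 channels and a 3x3 kernel this gives 7 channels per group and 1 leftover channel, matching the numbers in the text.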
As shown in Fig. 9, if the convolution kernel of the translation operation has kernel=5, then 5x5=25 kinds of offsets are produced, with dx and dy in the range [-3,3]. Fewer channels then share each set of offset parameters, more outcomes are possible, and the performance of the pedestrian re-identification network model can be further improved.
If the convolution kernel of the translation operation has kernel=9, then 9x9=81 kinds of offset parameters are produced, with dx and dy in the range [-4,4]. Suitable [dx,dy] coordinate values within the scope are retained, and a Gaussian filter algorithm is applied to these offset parameters according to the following formula:
Figure PCTCN2022095365-appb-000006
The result is normalized to produce a probability value for each position, and the numpy.random.choice function is used to generate a random offset position [dx,dy] for each channel, which produces the kinds of translation position offsets. With receptive_field_radius=4.25, 61 offset parameter coordinates satisfy the condition, as shown in Fig. 10.
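The count of 61 coordinates can be checked with a short script. The sketch below rests on assumptions, because the formula itself is only available as an image: it takes the retention condition to be dx² + dy² ≤ receptive_field_radius², uses an isotropic Gaussian with a hypothetical `sigma` for the weights, and substitutes the standard-library `random.choices` for `numpy.random.choice` so that the example has no third-party dependency.

```python
import math
import random


def candidate_offsets(kernel_size, radius):
    """Offsets (dx, dy) inside the kernel whose distance from the center is <= radius."""
    half = kernel_size // 2
    return [(dx, dy)
            for dy in range(-half, half + 1)
            for dx in range(-half, half + 1)
            if dx * dx + dy * dy <= radius * radius]


def gaussian_probabilities(offsets, sigma):
    """Normalized Gaussian weight per offset (assumed form; sigma is hypothetical)."""
    weights = [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
               for dx, dy in offsets]
    total = sum(weights)
    return [w / total for w in weights]


def sample_channel_offsets(num_channels, offsets, probabilities, seed=0):
    """Draw one random offset per channel according to the probabilities."""
    rng = random.Random(seed)
    return rng.choices(offsets, weights=probabilities, k=num_channels)


offsets = candidate_offsets(kernel_size=9, radius=4.25)
probabilities = gaussian_probabilities(offsets, sigma=2.0)
per_channel = sample_channel_offsets(64, offsets, probabilities)
print(len(offsets))
```

Under the assumed distance condition, a 9x9 kernel with radius 4.25 retains 61 of the 81 coordinates, which agrees with the count stated in the text.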
In one embodiment, the translation convolution module contains second structural units with the same structure as the first structural unit. The translation convolution module processes the first picture feature data through one second structural unit, or through a plurality of second structural units connected in sequence, to obtain the third picture feature data.
The translation convolution module further contains a pooling unit, which is located between the first and the second of the sequentially connected second structural units.
In this embodiment, the translation convolution module processes the first picture feature data through a translation operation and a 1x1 convolutional network, and it can do so using second structural units with the same structure as the first structural unit. Like the first structural unit, a second structural unit contains, connected in sequence, a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network, and a linear activation function network. The second structural unit thus implements the processing of the first picture feature data through a translation operation and a 1x1 convolutional network. In addition, the present application improves on the existing pedestrian re-identification network model: if the first-layer 7x7 convolutional network of the existing model has a convolution stride of 2, the improved model adds a pooling unit between the first and the second of the sequentially connected second structural units. The pooling unit can perform either max pooling or average pooling. In other words, to keep the intermediate feature map sizes of the whole network consistent with the original network, a max pooling layer is added after the first CSC structure. The details are shown in the table below.
Specifically, the present application optimizes the CMSNet network structure to suit both the algorithm structure and the hardware characteristics, maximizing the cost-effectiveness of software and hardware while keeping network performance within a reasonable range. The optimized network structure is shown in the following table:
Figure PCTCN2022095365-appb-000007
Figure PCTCN2022095365-appb-000008
In the above table, Input denotes the input, Layer denotes the network layer of the CMSNet network, Cout denotes the number of output channels, Kernel denotes the convolution kernel, Stride denotes the convolution stride, Number denotes the number of repetitions, Params denotes the parameters, and Flops denotes the number of floating-point operations. CSC denotes the CSC first structural unit described above, max pool denotes max pooling, CMS block denotes the entire network module in Fig. 5, conv2d denotes two-dimensional convolution, average pool denotes average pooling, global average pool denotes global average pooling, and fc denotes a fully connected layer.
As the table shows, besides adopting the CSC convolution unit, the optimized CMSNet network structure also optimizes the CMS Block structure, which is shown in Fig. 5. Comparing Fig. 3 with Fig. 5 shows that DWConv in the CMS Block structure is replaced by the CSC structure, which consists of Conv1x1+Shift+Conv1x1+BN+Relu. After the CMSNet network structure has been trained, BN is fused into Conv1x1 and fixed in the model file.
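Fusing BN into the preceding Conv1x1 after training is a standard inference-time transformation: the BN scale and shift are absorbed into the convolution weights and bias. The sketch below shows the usual folding arithmetic for one 1x1 convolution. The function name, the presence of a convolution bias, and the `eps` default are assumptions for illustration; the application does not spell out its fusion procedure in this passage.

```python
import math


def fold_bn_into_conv1x1(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization into a 1x1 convolution.

    For output channel o, BN computes
        y = gamma[o] * (conv(x) - mean[o]) / sqrt(var[o] + eps) + beta[o],
    which equals a 1x1 convolution with rescaled weights and a new bias.
    weights has shape [c_out][c_in]; all other arguments have length c_out.
    """
    folded_weights, folded_bias = [], []
    for o, row in enumerate(weights):
        scale = gamma[o] / math.sqrt(var[o] + eps)
        folded_weights.append([w * scale for w in row])
        folded_bias.append((bias[o] - mean[o]) * scale + beta[o])
    return folded_weights, folded_bias
```

After folding, the hardware only needs the Conv1x1 weights and biases, so no separate normalization stage is required at inference time.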
As can be seen from the above table, the optimized network structure contains a plurality of first structural units. When the second picture feature data is input into the pedestrian re-identification network model, it is processed by the plurality of first structural units arranged in sequence in the model.
In one embodiment, the first picture feature data is feature data obtained by processing the original picture data with the arbitrary-bit quantization network DoReFa-Net.
After optimization, the only weight parameters in the CMSNet network are the Conv1x1 parameters; the structure is uniform, which makes the network easier to quantize. In this embodiment, to further reduce the number of network parameters, the quantization algorithm of the arbitrary-bit quantization network DoReFa-Net is used to quantize the full-precision weights in the CMSNet network. This includes using the arbitrary-bit quantization network to obtain the first picture feature data from the original picture data, and may also include using it to quantize the first picture feature data, the second picture feature data, or the third picture feature data before that data is input into the pedestrian re-identification network model. Specifically, the DoReFa-Net quantization algorithm is applied to the picture feature data that is input into the pedestrian re-identification network model for convolution processing.
In this embodiment, an 8-bit fixed-point representation may be used for quantization. As shown in Figure 11, before quantizing to k = 8 bits, the hyperbolic tangent function tanh is first used to limit the weights to [-1, 1]. The formula
Figure PCTCN2022095365-appb-000009
then constrains the values to [0, 1], where the maximum is taken over the weights of the entire layer. Next, the formula
Figure PCTCN2022095365-appb-000010
converts the floating-point values to k = 8-bit fixed-point values in the range [0, 1]. Finally, the mapping x0 = 2q - 1 constrains the weights to [-1, 1].
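The quantization steps described above can be sketched as follows. The two formula figures are not reproduced in this text, so the sketch assumes they correspond to the commonly published DoReFa-Net weight quantization, namely x = tanh(w) / (2 max|tanh(w)|) + 1/2 followed by q = round((2^k - 1) x) / (2^k - 1). The function name `quantize_weights` and the handling of an all-zero layer are illustrative assumptions.

```python
import math


def quantize_weights(weights, k=8):
    """Quantize one layer's weights to k-bit fixed-point values in [-1, 1]."""
    squashed = [math.tanh(w) for w in weights]       # limit to [-1, 1]
    max_abs = max(abs(v) for v in squashed)          # maximum over the whole layer
    levels = 2 ** k - 1                              # number of quantization steps
    quantized = []
    for v in squashed:
        # Constrain to [0, 1]; an all-zero layer maps to the midpoint (assumption).
        x = 0.5 if max_abs == 0 else v / (2 * max_abs) + 0.5
        q = round(x * levels) / levels               # k-bit fixed point in [0, 1]
        quantized.append(2 * q - 1)                  # x0 = 2q - 1, back to [-1, 1]
    return quantized


layer = [-2.0, -0.5, 0.0, 0.5, 2.0]
quantized_layer = quantize_weights(layer, k=8)
```

With k = 8, every output lies on a grid of 256 evenly spaced values between -1 and 1, so each weight can be stored as a single byte plus a shared scale.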
The data processing method applied to an FPGA hardware architecture of the present application approaches the problem from the co-design of hardware and network: the neural network acceleration hardware design and the network compression of the pedestrian re-identification network model are handled separately, and the characteristics of the hardware are taken into account as far as possible during network compression so that the network model better suits the hardware architecture. The present application designs a network suitable for the FPGA hardware architecture on the basis of the latest pedestrian re-identification network, CMSNet. Operations in the CMSNet network structure such as 7x7 convolution, b-softmax, and the fully-connected layer are difficult to implement in hardware and are used relatively infrequently in practice, so they are replaced or removed. The whole network structure is thereby optimized so that, while most of the accuracy is retained, it is easier to implement in hardware and network inference is faster. Specifically, the data processing method of the above embodiments has the following effects:
1. For the latest lightweight pedestrian re-identification network CMSNet, the idea of software-hardware co-design is applied: three separable 3x3 convolutions replace the ordinary 7x7 convolution, which reduces the number of parameters and removes the infrequently used ordinary 7x7 convolution, making hardware implementation easier; the translation operation and a 1x1 convolutional network (Shift+Conv1x1) then replace the 3x3 convolutions within the separable convolutions throughout the network; and the CSS structure in the CMS Block is removed, leaving only the SUM operation. The optimized pedestrian re-identification network CMSNet therefore retains only 1x1 convolutions and a few other operations, which greatly simplifies the hardware design and allows network acceleration with fewer resources.
2. Fusing the Conv1x1 and Shift operations means that the hardware does not need a separately designed Shift (translation operation) module, which saves the resources such a module would consume and reduces data transfer between modules.
3. The network structure of the optimized pedestrian re-identification network contains only Conv1x1 convolution, pooling, and residual operations, so the main hardware modules are the Conv1x1 computing unit, the pooling unit, and the residual unit. The hardware structure is simpler, data can be pipelined between the hardware modules to the greatest extent, and hardware utilization is maximized.
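The second effect listed above, fusing the Shift operation with Conv1x1, can be illustrated with a small sketch. The application does not specify how the fusion is addressed in hardware; the sketch below assumes one possible realization, in which the 1x1 convolution reads each input channel at an offset position instead of first writing out a shifted copy. The function names, the per-channel `offsets`, and the integer toy data are hypothetical.

```python
def shift(x, offsets):
    """Shift channel c of x (C x H x W nested lists) by offsets[c] = (dy, dx), zero-filled."""
    height, width = len(x[0]), len(x[0][0])
    out = [[[0] * width for _ in range(height)] for _ in x]
    for c, (dy, dx) in enumerate(offsets):
        for i in range(height):
            for j in range(width):
                si, sj = i - dy, j - dx  # source pixel that lands on (i, j)
                if 0 <= si < height and 0 <= sj < width:
                    out[c][i][j] = x[c][si][sj]
    return out


def conv1x1(x, weight):
    """Pointwise convolution; weight is C_out x C_in, no bias."""
    height, width = len(x[0]), len(x[0][0])
    return [[[sum(w * x[c][i][j] for c, w in enumerate(row))
              for j in range(width)] for i in range(height)] for row in weight]


def fused_shift_conv1x1(x, offsets, weight):
    """Same result as conv1x1(shift(x)), but reads each channel at an offset address."""
    height, width = len(x[0]), len(x[0][0])
    out = [[[0] * width for _ in range(height)] for _ in weight]
    for o, row in enumerate(weight):
        for i in range(height):
            for j in range(width):
                acc = 0
                for c, (dy, dx) in enumerate(offsets):
                    si, sj = i - dy, j - dx
                    if 0 <= si < height and 0 <= sj < width:
                        acc += row[c] * x[c][si][sj]
                out[o][i][j] = acc
    return out


# Integer toy data so that the comparison is exact.
x = [[[c * 100 + i * 10 + j for j in range(4)] for i in range(3)] for c in range(4)]
offsets = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
weight = [[1, 2, 3, 4], [-1, 0, 1, 0]]

two_step = conv1x1(shift(x, offsets), weight)
fused = fused_shift_conv1x1(x, offsets, weight)
```

In this sketch the fused version never materializes the shifted tensor, which mirrors the stated benefit of avoiding a separate Shift module and the data transfer it would require.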
It should be understood that although the steps in the flowcharts are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless expressly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; nor are these sub-steps or stages necessarily executed sequentially, as they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, the present application further provides an FPGA hardware architecture. As shown in Figure 1, the FPGA hardware architecture includes a central processing unit, a memory, a computing unit processing component, a pooling component, a residual component, and a controller. The central processing unit is configured to receive picture feature data to be processed and store it in the memory. The computing unit processing component is provided with one or more arithmetic logic unit matrices and is configured to read the picture feature data from the memory and perform 1x1 convolution processing on it through the arithmetic logic unit matrices. The pooling component is configured to pool the picture feature data output by the computing unit processing component. The residual component is configured to perform residual accumulation on the picture feature data output by the pooling component and/or by the computing unit processing component. The controller is configured, according to the pedestrian re-identification network model, to control the computing unit processing component to read the picture feature data, weight data, and translation parameters from the memory for 1x1 convolution processing; to control whether the pooling component pools the feature data output by the computing unit processing component; and to control whether the residual component reads picture feature data from the memory and whether residual processing is performed on the picture feature data input to the residual component. The structural block of the pedestrian re-identification network model includes a forward hierarchical connection group, a backward hierarchical connection group, and a channel scale selection module connected in sequence. The forward and backward hierarchical connection groups each contain a plurality of first structural units. Each first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network, and a linear activation function network connected in sequence. The channel scale selection module contains a summation unit.
In other embodiments, the pedestrian re-identification network model may further include the modules or units described in the embodiments of the above data processing method applied to an FPGA hardware architecture; for details, refer to the descriptions of those embodiments.
Specifically, a concrete hardware structure diagram of the FPGA hardware architecture is given in Figure 12, in which CPU denotes the central processing unit and DDR/DRAM denotes double data rate synchronous dynamic random access memory / dynamic random access memory. When the hardware shown in Figure 12 executes the above data processing method applied to an FPGA hardware architecture, the processing flow is as follows:
The central processing unit receives the picture feature data to be processed and stores it in the DDR/DRAM. The computing unit processing component, which is provided with one or more arithmetic logic unit matrices, reads the picture feature data from the DDR/DRAM and performs 1x1 convolution processing on it through the arithmetic logic unit matrices. The pooling component pools the picture feature data output by the computing unit processing component. The residual component performs residual accumulation on the picture feature data output by the pooling component and/or by the computing unit processing component. According to the pedestrian re-identification network model, the controller controls the computing unit processing component to read the picture feature data, weight data, and translation parameters from the DDR/DRAM for 1x1 convolution processing; controls whether the pooling component pools the feature data output by the computing unit processing component; and controls whether the residual component reads picture feature data from the DDR/DRAM and whether residual processing is performed on the picture feature data input to the residual component. The structural block of the pedestrian re-identification network model includes a forward hierarchical connection group, a backward hierarchical connection group, and a channel scale selection module connected in sequence. The forward and backward hierarchical connection groups each contain a plurality of first structural units. Each first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network, and a linear activation function network connected in sequence. The channel scale selection module contains a summation unit.
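The processing flow above can be pictured as a controller that walks through per-layer settings and decides which units each layer's data passes through. The following is a simplified software sketch under stated assumptions, not the hardware design itself: `LayerConfig`, its fields, the dictionary standing in for the DDR/DRAM, and the one-dimensional pooling are all hypothetical, and translation parameters are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerConfig:
    """Hypothetical per-layer control word derived from the network model."""
    weight: List[List[int]]               # 1x1 kernel, shape C_out x C_in
    pool: bool = False                    # route output through the pooling unit
    residual_from: Optional[str] = None   # memory key read by the residual unit
    store_as: Optional[str] = None        # memory key for writing the output back


def compute_unit(pixels, weight):
    """1x1 convolution: every pixel (C_in values) becomes C_out values."""
    return [[sum(w * v for w, v in zip(row, px)) for row in weight] for px in pixels]


def pooling_unit(pixels):
    """Max-pool adjacent pixel pairs (a 1-D stand-in for 2-D pooling)."""
    return [[max(a, b) for a, b in zip(pixels[i], pixels[i + 1])]
            for i in range(0, len(pixels) - 1, 2)]


def residual_unit(pixels, skip):
    """Element-wise residual accumulation."""
    return [[a + b for a, b in zip(p, s)] for p, s in zip(pixels, skip)]


def controller(memory, layers, input_key="input"):
    """Run each layer through the compute unit, then optionally pool and add a residual."""
    data = memory[input_key]
    for cfg in layers:
        data = compute_unit(data, cfg.weight)
        if cfg.pool:
            data = pooling_unit(data)
        if cfg.residual_from is not None:
            data = residual_unit(data, memory[cfg.residual_from])
        if cfg.store_as is not None:
            memory[cfg.store_as] = data
    return data


memory = {"input": [[1, 2], [3, 4], [5, 6], [7, 8]]}  # four pixels, two channels
layers = [
    LayerConfig(weight=[[1, 0], [0, 1]], pool=True, store_as="skip"),
    LayerConfig(weight=[[2, 0], [0, 2]], residual_from="skip"),
]
result = controller(memory, layers)  # [[9, 12], [21, 24]]
```

The point of the sketch is that only three kinds of processing units are needed, and the controller's per-layer flags decide whether the pooling and residual units are used or bypassed.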
It follows that, with the above data processing method applied to an FPGA hardware architecture, accelerating the processing of picture feature data with a neural network on the FPGA hardware architecture places only simple requirements on the hardware configuration, so network acceleration on the FPGA hardware architecture can be achieved with fewer resources.
In one embodiment, the present application further provides a data processing apparatus applied to an FPGA hardware architecture. As shown in Figure 13, the apparatus includes an acquisition module 1302 and a processing module 1304. The acquisition module 1302 is configured to acquire first picture feature data to be processed. The processing module 1304 is configured to input the first picture feature data into a pedestrian re-identification network model based on contextual multi-scale feature learning and obtain the classification and identification information output by the model. The structural block of the pedestrian re-identification network model includes a forward hierarchical connection group, a backward hierarchical connection group, and a channel scale selection module connected in sequence. The forward and backward hierarchical connection groups each contain a plurality of first structural units. Each first structural unit includes a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network, and a linear activation function network connected in sequence. The channel scale selection module contains a summation unit.
In one of the embodiments, the forward hierarchical connection group is configured to perform step-by-step inter-scale information fusion on the first picture feature data through its plurality of first structural units; the backward hierarchical connection group is configured to perform cross-scale information fusion, through its plurality of first structural units, on the information output by each first structural unit in the forward hierarchical connection group; and the channel scale selection module is configured to sum, through the summation unit, the information output by each first structural unit in the backward hierarchical connection group to obtain the classification and identification information.
In one of the embodiments, the pedestrian re-identification network model further includes a 3x3 convolutional network module connected to the forward hierarchical connection group. The 3x3 convolutional network module is configured to process the first picture feature data with a plurality of separable 3x3 convolutional networks to obtain second picture feature data, and the forward hierarchical connection group is configured to perform step-by-step inter-scale information fusion on the second picture feature data through the plurality of first structural units.
In one of the embodiments, the pedestrian re-identification network model further includes a translation convolution module connected to the forward hierarchical connection group. The translation convolution module is configured to process the first picture feature data through a translation operation and a 1x1 convolutional network to obtain third picture feature data, and the forward hierarchical connection group is configured to perform step-by-step inter-scale information fusion on the third picture feature data through the plurality of first structural units.
In one of the embodiments, the translation convolution module contains a second structural unit with the same structure as the first structural unit, and is configured to process the first picture feature data through one second structural unit, or through a plurality of second structural units connected in sequence, to obtain the third picture feature data.
In one of the embodiments, the translation convolution module further contains a pooling unit located between the first and the second of the plurality of second structural units connected in sequence.
In one of the embodiments, the first picture feature data is feature data obtained by processing the original picture data with the quantization algorithm of the arbitrary-bit quantization network DoReFa-Net.
For the specific limitations of the data processing apparatus applied to an FPGA hardware architecture, refer to the limitations of the data processing method applied to an FPGA hardware architecture above; they are not repeated here. Each module of the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Figure 14. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The network interface is used to connect to external devices and receive information from them. When executed by the processor, the computer-readable instructions implement a data processing method applied to an FPGA hardware architecture.
Those skilled in the art will understand that the structure shown in Figure 14 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the steps of the data processing method applied to an FPGA hardware architecture of any one of the above embodiments are implemented.
In one embodiment, a non-volatile computer-readable storage medium is further provided. The storage medium stores computer-readable instructions which, when executed by one or more processors, implement the steps of the data processing method applied to an FPGA hardware architecture of any one of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by computer-readable instructions that direct the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments represent only several implementations of the present application. Although they are described in specific detail, they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. The scope of protection of this patent application is therefore defined by the appended claims.

Claims (11)

  1. A data processing method applied to an FPGA hardware architecture, characterized in that the method comprises:
    acquiring first picture feature data to be processed; and
    inputting the first picture feature data into a pedestrian re-identification network model based on contextual multi-scale feature learning to obtain classification and identification information output by the pedestrian re-identification network model;
    wherein a structural block of the pedestrian re-identification network model comprises a forward hierarchical connection group, a backward hierarchical connection group, and a channel scale selection module connected in sequence; the forward hierarchical connection group and the backward hierarchical connection group each contain a plurality of first structural units; each first structural unit comprises a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network, and a linear activation function network connected in sequence; and the channel scale selection module contains a summation unit.
  2. The method according to claim 1, characterized in that the forward hierarchical connection group is configured to perform step-by-step inter-scale information fusion on the first picture feature data through the plurality of first structural units of the forward hierarchical connection group; the backward hierarchical connection group is configured to perform cross-scale information fusion, through the plurality of first structural units of the backward hierarchical connection group, on the information output by each first structural unit in the forward hierarchical connection group; and the channel scale selection module is configured to sum, through the summation unit, the information output by each first structural unit in the backward hierarchical connection group to obtain the classification and identification information.
  3. The method according to claim 2, characterized in that the pedestrian re-identification network model further comprises a 3x3 convolutional network module connected to the forward hierarchical connection group;
    the 3x3 convolutional network module is configured to process the first picture feature data with a plurality of separable 3x3 convolutional networks to obtain second picture feature data; and
    the forward hierarchical connection group is configured to perform step-by-step inter-scale information fusion on the second picture feature data through the plurality of first structural units.
  4. The method according to claim 2, characterized in that the pedestrian re-identification network model further comprises a translation convolution module connected to the forward hierarchical connection group;
    the translation convolution module is configured to process the first picture feature data through a translation operation and a 1x1 convolutional network to obtain third picture feature data; and
    the forward hierarchical connection group is configured to perform step-by-step inter-scale information fusion on the third picture feature data through the plurality of first structural units.
  5. The method according to claim 4, characterized in that the translation convolution module contains a second structural unit with the same structure as the first structural unit, and the translation convolution module is configured to process the first picture feature data through one second structural unit, or through a plurality of second structural units connected in sequence, to obtain the third picture feature data.
  6. The method according to claim 5, characterized in that the translation convolution module further contains a pooling unit located between the first and the second of the plurality of second structural units connected in sequence.
  7. The method according to claim 1, wherein the first picture feature data is feature data obtained by processing original picture data with the quantization algorithm of the arbitrary-bit quantization network DoReFa-Net.
  8. A data processing apparatus applied to an FPGA hardware architecture, wherein the apparatus comprises:
    an acquisition module configured to acquire first picture feature data to be processed; and
    a processing module configured to input the first picture feature data into a pedestrian re-identification network model based on contextual multi-scale feature learning, to obtain classification and identification information output by the pedestrian re-identification network model;
    wherein a structural block of the pedestrian re-identification network model comprises a forward hierarchical connection group, a backward hierarchical connection group, and a channel scale selection module connected in sequence; the forward hierarchical connection group and the backward hierarchical connection group each comprise a plurality of first structural units; each first structural unit comprises a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network, and a linear activation function network connected in sequence; and the channel scale selection module comprises a summing unit.
  9. An FPGA hardware architecture, wherein the FPGA hardware architecture comprises a central processing unit, a memory, a computing unit processing component, a pooling component, a residual component, and a controller;
    the central processing unit is configured to receive picture feature data to be processed and to store the picture feature data in the memory;
    the computing unit processing component is provided with one or more arithmetic logic unit matrices, and is configured to read the picture feature data from the memory and to perform 1x1 convolution processing on the picture feature data through the arithmetic logic unit matrices;
    the pooling component is configured to perform pooling processing on the picture feature data output by the computing unit processing component;
    the residual component is configured to perform residual accumulation processing on the picture feature data output by the pooling component and/or the picture feature data output by the computing unit processing component; and
    the controller is configured to, according to a pedestrian re-identification network model: control the computing unit processing component to read picture feature data, weight data, and translation parameters from the memory in order to perform 1x1 convolution processing; control whether the pooling component performs pooling processing on the feature data output by the computing unit processing component; and control whether the residual component reads picture feature data from the memory and whether it performs residual processing on the picture feature data input to the residual component;
    wherein a structural block of the pedestrian re-identification network model comprises a forward hierarchical connection group, a backward hierarchical connection group, and a channel scale selection module connected in sequence; the forward hierarchical connection group and the backward hierarchical connection group each comprise a plurality of first structural units; each first structural unit comprises a first 1x1 convolutional network, a first batch normalization network, a first translation network, a second 1x1 convolutional network, a second batch normalization network, and a linear activation function network connected in sequence; and the channel scale selection module comprises a summing unit.
  10. A computer device, comprising a memory and one or more processors, wherein the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1-7.
  11. One or more non-volatile computer-readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1-7.
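
Claim 7 refers to the quantization algorithm of DoReFa-Net without restating it. As a rough illustration only, the sketch below shows the k-bit uniform quantizer that DoReFa-Net applies to values in [0, 1]. How the claimed method maps original picture data into that range is not stated in the claims; the 8-bit pixel normalization and the helper name `quantize_pixels` are assumptions made for this example.

```python
import math
from typing import List


def quantize_k(r: float, k: int) -> float:
    """Quantize r in [0, 1] to the nearest of 2**k evenly spaced levels."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    levels = (1 << k) - 1
    # floor(x + 0.5) rounds half up, avoiding Python's banker's rounding.
    return math.floor(r * levels + 0.5) / levels


def quantize_pixels(pixels: List[int], k: int) -> List[int]:
    """Hypothetical helper: map 8-bit pixel values to k-bit integer codes.

    Assumes pixels are first normalized to [0, 1] by dividing by 255.
    """
    levels = (1 << k) - 1
    return [int(round(quantize_k(p / 255.0, k) * levels)) for p in pixels]


if __name__ == "__main__":
    print(quantize_pixels([0, 128, 255], 2))
```

With k = 2 there are four levels, so every pixel is stored as a 2-bit code, which is what makes low-bit-width arithmetic on the FPGA side attractive.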
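
The first structural unit recited in claims 8 and 9 chains a 1x1 convolution, batch normalization, a translation (shift) step, a second 1x1 convolution, a second batch normalization, and an activation. The sketch below is one possible reading of that chain in plain Python, not the applicant's implementation. Several details are assumptions: batch normalization is folded into a per-channel scale and bias as is common at inference time, the translation step moves each channel by a fixed `(dy, dx)` offset with zero fill, and the "linear activation function network" is taken to be a rectified linear unit.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def conv1x1(x: FeatureMap, weights: Sequence[Sequence[float]]) -> FeatureMap:
    """1x1 convolution: weights[o][i] mixes input channel i into output channel o."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wo[i] * x[i][r][c] for i in range(len(x)))
              for c in range(w)] for r in range(h)] for wo in weights]


def batch_norm(x: FeatureMap, scale: Sequence[float], bias: Sequence[float]) -> FeatureMap:
    """Inference-time batch norm, assumed folded into per-channel scale and bias."""
    return [[[scale[ch] * v + bias[ch] for v in row] for row in x[ch]]
            for ch in range(len(x))]


def shift(x: FeatureMap, offsets: Sequence[Tuple[int, int]]) -> FeatureMap:
    """Move each channel by its (dy, dx) offset, zero-filling vacated pixels."""
    out = []
    for ch, (dy, dx) in zip(x, offsets):
        h, w = len(ch), len(ch[0])
        out.append([[ch[r - dy][c - dx]
                     if 0 <= r - dy < h and 0 <= c - dx < w else 0.0
                     for c in range(w)] for r in range(h)])
    return out


def relu(x: FeatureMap) -> FeatureMap:
    """Rectified linear activation (assumed meaning of the claimed activation)."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def first_structural_unit(x, w1, bn1, offsets, w2, bn2) -> FeatureMap:
    """conv1x1 -> BN -> shift -> conv1x1 -> BN -> activation, in the claimed order."""
    y = batch_norm(conv1x1(x, w1), *bn1)
    y = shift(y, offsets)
    y = batch_norm(conv1x1(y, w2), *bn2)
    return relu(y)


if __name__ == "__main__":
    x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    out = first_structural_unit(
        x,
        w1=[[1, 0], [0, 1]], bn1=([1, 1], [0, 0]),
        offsets=[(0, 1), (1, 0)],
        w2=[[1, 1]], bn2=([1], [-5]),
    )
    print(out)
```

Because the shift itself needs no multiplications, the only arithmetic in the unit is the two 1x1 convolutions, which is consistent with claim 9 assigning 1x1 convolution to the arithmetic logic unit matrices while the translation parameters are simply read from memory.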
PCT/CN2022/095365 2021-12-22 2022-05-26 Fpga hardware architecture, data processing method therefor and storage medium WO2023115814A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111579432.9A CN113963241B (en) 2021-12-22 2021-12-22 FPGA hardware architecture, data processing method thereof and storage medium
CN202111579432.9 2021-12-22

Publications (1)

Publication Number Publication Date
WO2023115814A1 (en) 2023-06-29

Family

ID=79473583

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095365 WO2023115814A1 (en) 2021-12-22 2022-05-26 Fpga hardware architecture, data processing method therefor and storage medium

Country Status (2)

Country Link
CN (1) CN113963241B (en)
WO (1) WO2023115814A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963241B (en) * 2021-12-22 2022-03-08 苏州浪潮智能科技有限公司 FPGA hardware architecture, data processing method thereof and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2018140294A1 (en) * 2017-01-25 2018-08-02 Microsoft Technology Licensing, Llc Neural network based on fixed-point operations
CN111414815A (en) * 2020-03-04 2020-07-14 清华大学深圳国际研究生院 Pedestrian re-identification network searching method and pedestrian re-identification method
CN111967468A (en) * 2020-08-10 2020-11-20 东南大学 FPGA-based lightweight target detection neural network implementation method
CN113963241A (en) * 2021-12-22 2022-01-21 苏州浪潮智能科技有限公司 FPGA hardware architecture, data processing method thereof and storage medium

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
EP3582142A1 (en) * 2018-06-15 2019-12-18 Université de Liège Image classification using neural networks
CN109086782A (en) * 2018-08-21 2018-12-25 广东工业大学 Feature Descriptor generation method, device, equipment and computer readable storage medium
CN110334622B (en) * 2019-06-24 2022-04-19 电子科技大学 Pedestrian retrieval method based on adaptive feature pyramid
CN111192320B (en) * 2019-12-30 2023-07-25 上海联影医疗科技股份有限公司 Position information determining method, device, equipment and storage medium
CN111339942B (en) * 2020-02-26 2022-07-12 山东大学 Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment
CN111523470B (en) * 2020-04-23 2022-11-18 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium
CN111523489A (en) * 2020-04-26 2020-08-11 上海眼控科技股份有限公司 Generation method of age classification network, and vehicle-mounted person detection method and device
US20210350517A1 (en) * 2020-05-08 2021-11-11 The Board Of Trustees Of The University Of Alabama Robust roadway crack segmentation using encoder-decoder networks with range images
CN112132023B (en) * 2020-09-22 2024-05-17 上海应用技术大学 Crowd counting method based on multi-scale context enhancement network
CN112685609A (en) * 2021-01-04 2021-04-20 福州大学 Knowledge graph complementing method combining translation mechanism and convolutional neural network
CN112861780A (en) * 2021-03-05 2021-05-28 上海有个机器人有限公司 Pedestrian re-identification method, device, medium and mobile robot
CN112990116B (en) * 2021-04-21 2021-08-06 四川翼飞视科技有限公司 Behavior recognition device and method based on multi-attention mechanism fusion and storage medium
CN113488058B (en) * 2021-06-23 2023-03-24 武汉理工大学 Voiceprint recognition method based on short voice

Non-Patent Citations (1)

Title
LANG LEI, YINGQING XIA: "Survey on Compact Neural Network Model Design", JOURNAL OF FRONTIERS OF COMPUTER SCIENCE AND TECHNOLOGY, vol. 14, no. 9, 20 May 2020 (2020-05-20), pages 1456 - 1470, XP093075483, ISSN: 1673-9418, DOI: 10.3778/j.issn.1673-9418.1912079 *

Also Published As

Publication number Publication date
CN113963241A (en) 2022-01-21
CN113963241B (en) 2022-03-08

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22909149
Country of ref document: EP
Kind code of ref document: A1