CN109919307B - FPGA (Field Programmable Gate Array) and deep residual network implementation method, system and computer medium - Google Patents

Info

Publication number: CN109919307B (application number CN201910081806.0A)
Authority: CN (China)
Legal status: Active (granted)
Other versions: CN109919307A (in Chinese)
Inventors: 张新, 赵雅倩, 董刚
Current assignee: Guangdong Inspur Smart Computing Technology Co Ltd
Original assignee (applicant): Guangdong Inspur Big Data Research Co Ltd
Events: application filed by Guangdong Inspur Big Data Research Co Ltd; publication of CN109919307A; application granted; publication of CN109919307B

Classifications

    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management (Y02D: climate change mitigation technologies in information and communication technologies)

Abstract

The application discloses an FPGA (Field Programmable Gate Array) and a deep residual network implementation method, system and computer medium. The deep residual network implementation system realizes the deep residual network in an FPGA by means of a weight buffer module, a feature data buffer module, a memory read module, a Winograd transform module, a convolution module, a normalization module, a residual module, an activation module, a max pooling module and a memory write-back module. The FPGA, the deep residual network implementation method and the computer-readable storage medium solve the corresponding technical problems.

Description

FPGA (Field Programmable Gate Array) and deep residual network implementation method, system and computer medium
Technical Field
The present application relates to the field of software technologies, and in particular, to an FPGA, and to a deep residual network implementation method, system, and computer medium.
Background
The deep residual network plays an important role in deep learning and, due to its high recognition rate, is widely used for image classification. However, as the amount of data in data centers and embedded systems increases, performance and power consumption become limiting factors in the development of deep residual networks, so that their applicability decreases.
In order to improve the applicability of the deep residual network, a conventional solution is to use a GPU (Graphics Processing Unit) to heterogeneously accelerate the deep residual network. However, this GPU-based heterogeneous acceleration still suffers from high power consumption.
In summary, how to provide a deep residual network with low power consumption and good applicability is an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a deep residual network implementation system, which can, to a certain extent, solve the technical problem of how to provide a deep residual network with low power consumption and good applicability. The application also provides an FPGA, a deep residual network implementation method, and a computer-readable storage medium.
In order to achieve the above object, the present application provides the following technical solutions:
a depth residual error network implementation system is applied to an FPGA and comprises the following steps:
the weight buffer module is used for buffering weight data sent by host end equipment connected with the FPGA;
the characteristic data buffer module is used for caching the characteristic data sent by the host end equipment and the data sent by the memory write-back module;
the memory reading module is used for reading target weight data in the weight buffer module at each operation moment and reading target characteristic data in the characteristic data buffer module;
the Winograd conversion module is used for carrying out Winograd conversion on the target weight data and the target characteristic data to obtain conversion weight data and conversion characteristic data; carrying out Winograd transformation on the convolution operation result of the convolution module to obtain a transformation result;
the convolution module is used for carrying out convolution operation on the transformation weight data and the transformation characteristic data to obtain a convolution operation result;
the standardization module is used for carrying out standardization processing on the transformation result to obtain a standardization processing result;
the residual error module is used for reading the first characteristic data in the characteristic data buffer module and summing the standardized processing result and the first characteristic data to obtain a residual error processing result;
the activation module is used for activating the residual error processing result to obtain an activation processing result;
the maximum pooling module is used for performing maximum pooling on the activation processing result to obtain a maximum pooling processing result;
and the memory write-back module is used for writing the maximum pooling processing result back to the characteristic data buffering module.
Preferably, the Winograd conversion module includes:
the first Winograd conversion unit is used for taking the target characteristic data as the conversion characteristic data;
the second Winograd conversion unit is used for copying the target weight data to obtain the conversion weight data;
and the third Winograd conversion unit is used for taking the convolution operation result as the conversion result.
Preferably, the algorithm for Winograd transformation by the Winograd transformation module includes an F (2,3) Winograd transformation algorithm.
Preferably, the method further comprises the following steps:
and the control module is used for controlling the working states of the memory reading module, the convolution module, the Winograd conversion module, the standardization module, the residual error module, the activation module, the maximum pooling module and the memory writing-back module.
A deep residual network implementation method, applied to an FPGA, comprising the following steps:
buffering weight data and feature data sent by a host device connected to the FPGA;
at each operation moment, reading target weight data from the buffered weight data and target feature data from the buffered feature data;
performing a Winograd transform on the target weight data and the target feature data to obtain transformed weight data and transformed feature data;
performing a convolution operation on the transformed weight data and the transformed feature data to obtain a convolution result;
performing a Winograd transform on the convolution result to obtain a transform result;
normalizing the transform result to obtain a normalized result;
reading first feature data from the buffered feature data, and adding the normalized result to the first feature data to obtain a residual result;
activating the residual result to obtain an activation result;
performing max pooling on the activation result to obtain a max pooling result;
and writing the max pooling result back to the buffer region of the feature data.
An FPGA, comprising:
a storage unit for storing a computer program;
a processing unit for implementing the steps of the deep residual network implementation method described above when executing the computer program.
A computer-readable storage medium for an FPGA, having a computer program stored thereon which, when executed by a processor, implements the steps of the deep residual network implementation method described above.
A deep residual network implementation method, applied to a host device connected to an FPGA, comprising the following steps:
sending weight data and feature data to the FPGA;
receiving the operation result of the FPGA performing the deep residual operation on the weight data and the feature data;
performing mean pooling on the operation result to obtain a mean pooling result;
and performing a softmax operation on the mean pooling result to obtain a softmax result.
Preferably, sending the weight data and the feature data to the FPGA includes:
reorganizing the weight data and the feature data according to the data format of the FPGA;
and sending the reorganized weight data and feature data to the FPGA.
Preferably, data is transmitted between the host device and the FPGA through PCIe.
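The host-side post-processing described above (mean pooling followed by softmax) can be sketched as follows. This is an illustrative model only — the function name, the NumPy usage, and the channel-first shape are assumptions, not the patent's implementation:

```python
import numpy as np

def host_postprocess(fpga_result):
    """Host-side post-processing of the FPGA's deep-residual output:
    global mean pooling over the spatial dimensions, then softmax.
    fpga_result: array of shape (channels, height, width) -- assumed layout."""
    pooled = fpga_result.mean(axis=(1, 2))   # mean pooling -> one value per channel
    shifted = pooled - pooled.max()          # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()                   # softmax over the channel axis

# Example: 4 channels of 2x2 feature maps; uniform input gives uniform probabilities
probs = host_postprocess(np.ones((4, 2, 2)))
```

The max-subtraction before `exp` is a standard numerical-stability trick and does not change the softmax result.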
The deep residual network implementation system provided by the application is applied to an FPGA and comprises: a weight buffer module for buffering weight data sent by a host device connected to the FPGA; a feature data buffer module for buffering feature data sent by the host device and data sent by the memory write-back module; a memory read module for reading, at each operation moment, target weight data from the weight buffer module and target feature data from the feature data buffer module; a Winograd transform module for performing a Winograd transform on the target weight data and the target feature data to obtain transformed weight data and transformed feature data, and for performing a Winograd transform on the convolution result of the convolution module to obtain a transform result; a convolution module for performing a convolution operation on the transformed weight data and the transformed feature data to obtain a convolution result; a normalization module for normalizing the transform result to obtain a normalized result; a residual module for reading first feature data from the feature data buffer module and summing the normalized result with the first feature data to obtain a residual result; an activation module for activating the residual result to obtain an activation result; a max pooling module for performing max pooling on the activation result to obtain a max pooling result; and a memory write-back module for writing the max pooling result back to the feature data buffer module. Since the system realizes the deep residual network by means of the FPGA, and the FPGA is characterized by low latency and low power consumption, the deep residual network realized by means of the FPGA also has low latency and low power consumption and, accordingly, good applicability. The FPGA, the deep residual network implementation method and the computer-readable storage medium solve the corresponding technical problems.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic structural diagram of a deep residual network implementation system according to an embodiment of the present disclosure;
fig. 2 is a first flowchart of a method for implementing a deep residual network according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an FPGA according to an embodiment of the present application;
fig. 4 is another schematic structural diagram of an FPGA according to an embodiment of the present application;
fig. 5 is a second flowchart of a method for implementing a deep residual network according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any creative effort, shall fall within the protection scope of the present application.
The deep residual network plays an important role in deep learning and, due to its high recognition rate, is widely used for image classification. However, as the amount of data in data centers and embedded systems increases, performance and power consumption become limiting factors in the development of deep residual networks, so that their applicability decreases. In order to improve the applicability of the deep residual network, a conventional solution is to use a GPU (Graphics Processing Unit) to heterogeneously accelerate the deep residual network. However, this GPU-based heterogeneous acceleration still suffers from high power consumption. The deep residual network implementation system provided by the application solves the technical problem of how to provide a deep residual network with low power consumption and good applicability.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a deep residual network implementation system according to an embodiment of the present disclosure.
The deep residual network implementation system 100 provided in the embodiment of the present application is applied to an FPGA (Field-Programmable Gate Array) and may include:
a weight buffer module 101 for buffering weight data sent by a host device connected to the FPGA;
a feature data buffer module 102 for buffering feature data sent by the host device and data sent by the memory write-back module;
a memory read module 103, configured to read, at each operation moment, target weight data from the weight buffer module 101 and target feature data from the feature data buffer module 102;
a Winograd transform module 104, configured to perform a Winograd transform on the target weight data and the target feature data to obtain transformed weight data and transformed feature data, and to perform a Winograd transform on the convolution result of the convolution module 105 to obtain a transform result;
a convolution module 105, configured to perform a convolution operation on the transformed weight data and the transformed feature data to obtain a convolution result;
a normalization module 106, configured to normalize the transform result to obtain a normalized result;
a residual module 107, configured to read first feature data from the feature data buffer module 102 and add the normalized result to the first feature data to obtain a residual result;
an activation module 108, configured to activate the residual result to obtain an activation result;
a max pooling module 109, configured to perform max pooling on the activation result to obtain a max pooling result;
a memory write-back module 110, configured to write the max pooling result back to the feature data buffer module 102.
In a specific application scenario, in the deep residual network implementation system provided by the present application, the connection manner between the modules may be determined according to actual needs; for example, the modules may be connected wirelessly or by wire. The storage media used by the weight buffer module and the feature data buffer module may also be determined according to actual needs; for example, the storage media may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory) and the like.
In a specific application scenario, the convolution operation of the convolution module may be computed by a PE array, and the PE array may use a daisy-chain connection structure. Further, a single PE may be structured as an array of multiplier-accumulator (MAC) units and caches. The multiplier multiplies the Winograd-transformed, vectorized feature data by the filter parameters; the accumulator accumulates the intermediate results until the convolution is complete; and the accumulated result finally undergoes a further Winograd transform to obtain the real convolution result.
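The multiply-accumulate behaviour of a single PE described above can be sketched in miniature. This is a hypothetical scalar model for illustration only; the real PE operates on vectorized half-precision data in hardware:

```python
def pe_mac(feature_vec, weight_vec, acc=0.0):
    """One processing element (PE) step: elementwise multiply of a
    Winograd-domain feature vector with a filter vector, accumulated
    into a running partial sum until the convolution is complete."""
    for f, w in zip(feature_vec, weight_vec):
        acc += f * w
    return acc

# Accumulate over two successive input slices (illustrative values)
acc = pe_mac([1.0, 2.0], [0.5, 0.5])        # partial sum: 1.5
acc = pe_mac([3.0, 4.0], [0.5, 0.5], acc)   # acc == 5.0
```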
In a specific application scenario, the normalization module may achieve normalization through the formulas y = (x - mu)/sqrt(sigma) and z = alpha × y + bias, where the formula y = (x - mu)/sqrt(sigma) describes the batch normalization operation, x represents the convolution result, and mu and sigma represent the corresponding normalization coefficients; the formula z = alpha × y + bias describes the scaling operation, where alpha represents the scaling factor and bias represents the bias factor.
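The two formulas above can be written out directly as a scalar sketch (the function name and example values are illustrative assumptions, not the patent's implementation):

```python
import math

def batch_normalize(x, mu, sigma, alpha, bias):
    """Normalization module: batch normalization y = (x - mu)/sqrt(sigma),
    followed by the scaling operation z = alpha * y + bias."""
    y = (x - mu) / math.sqrt(sigma)
    return alpha * y + bias

# y = (5 - 3)/sqrt(4) = 1.0; z = 2*1.0 + 0.5 = 2.5
z = batch_normalize(x=5.0, mu=3.0, sigma=4.0, alpha=2.0, bias=0.5)
```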
The deep residual network implementation system provided by the application is applied to an FPGA and comprises: a weight buffer module for buffering weight data sent by a host device connected to the FPGA; a feature data buffer module for buffering feature data sent by the host device and data sent by the memory write-back module; a memory read module for reading, at each operation moment, target weight data from the weight buffer module and target feature data from the feature data buffer module; a Winograd transform module for performing a Winograd transform on the target weight data and the target feature data to obtain transformed weight data and transformed feature data, and for performing a Winograd transform on the convolution result of the convolution module to obtain a transform result; a convolution module for performing a convolution operation on the transformed weight data and the transformed feature data to obtain a convolution result; a normalization module for normalizing the transform result to obtain a normalized result; a residual module for reading first feature data from the feature data buffer module and summing the normalized result with the first feature data to obtain a residual result; an activation module for activating the residual result to obtain an activation result; a max pooling module for performing max pooling on the activation result to obtain a max pooling result; and a memory write-back module for writing the max pooling result back to the feature data buffer module. Since the system realizes the deep residual network by means of the FPGA, and the FPGA is characterized by low latency and low power consumption, the deep residual network realized by means of the FPGA also has low latency and low power consumption and, accordingly, better applicability.
In the deep residual network implementation system provided in the embodiment of the present application, the Winograd transform module may include: a first Winograd transform unit for taking the target feature data as the transformed feature data; a second Winograd transform unit for copying the target weight data to obtain the transformed weight data; and a third Winograd transform unit for taking the convolution result as the transform result.
That is, the deep residual network implementation system provided in the embodiment of the present application simplifies the Winograd transform process. Compared with the existing technique, which Winograd-transforms the target feature data into transformed feature data, Winograd-transforms the target weight data into transformed weight data, and Winograd-transforms the convolution result into the transform result, this process is simple and convenient, the operation speed is high, and the data throughput of the convolution module can be improved. Specifically, when the convolution module performs a 1 × 1 convolution operation, this simplified Winograd transform method of the present application may be used preferentially.
In a practical application scenario, the algorithm commonly applied for the Winograd transform is the F(4,3) Winograd transform. In the FPGA, however, each PE processes vectorized feature data in blocks of W_VECTOR × C_VECTOR, where W_VECTOR represents the width of the feature pixels and C_VECTOR represents the depth. In this FPGA architecture, C_VECTOR is a fixed value equal to 8, and W_VECTOR = 6 corresponds to the F(4,3) Winograd transform. Since a half-precision floating-point data type is adopted, a block of W_VECTOR × C_VECTOR × 2 byte = 96 byte must be obtained every cycle. The FPGA clock frequency can reach 238 MHz, so the bandwidth requirement is 96 byte × 238 MHz ≈ 22.8 GB/s, but one buffer module only provides a bandwidth of 15 GB/s, so the bandwidth becomes a bottleneck. Therefore, the F(2,3) Winograd transform algorithm, which requires only a 4-pixel-wide input tile, can be selected to reduce the bandwidth requirement.
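The bandwidth arithmetic above can be checked numerically. This sketch simply reproduces the figures as reconstructed from the text (238 MHz clock, 15 GB/s per buffer module); the variable names mirror the W_VECTOR/C_VECTOR notation:

```python
# F(4,3): a 6-pixel-wide, 8-channel-deep, half-precision block is needed per cycle.
W_VECTOR, C_VECTOR, BYTES_PER_VALUE = 6, 8, 2
block_bytes = W_VECTOR * C_VECTOR * BYTES_PER_VALUE   # 96 bytes per cycle

clock_hz = 238e6                                      # 238 MHz FPGA clock
required_gbs = block_bytes * clock_hz / 1e9           # ~22.8 GB/s demanded
provided_gbs = 15.0                                   # one buffer module supplies 15 GB/s

# F(2,3) needs only a 4-pixel input tile, reducing the demand to ~15.2 GB/s,
# close to what a single buffer module can supply.
f23_required_gbs = 4 * C_VECTOR * BYTES_PER_VALUE * clock_hz / 1e9
```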
In an actual application scenario, in order to enable control over each module in the deep residual network implementation system, the system provided in the embodiment of the present application may further include: a control module for controlling the working states of the memory read module, the convolution module, the Winograd transform module, the normalization module, the residual module, the activation module, the max pooling module, and the memory write-back module. In a specific application scenario, the control module may further control the respective data reading processes of the weight buffer module and the feature data buffer module, for example, calculating the start address for each data read.
The application also provides a method for implementing a deep residual network, which has effects corresponding to those of the deep residual network implementation system provided by the embodiment of the application. Referring to fig. 2, fig. 2 is a first flowchart of a method for implementing a deep residual network according to an embodiment of the present application.
The method for implementing the deep residual network provided by the embodiment of the application is applied to an FPGA and may include the following steps:
Step S101: buffering weight data and feature data sent by a host device connected to the FPGA;
Step S102: at each operation moment, reading target weight data from the buffered weight data and target feature data from the buffered feature data;
Step S103: performing a Winograd transform on the target weight data and the target feature data to obtain transformed weight data and transformed feature data;
Step S104: performing a convolution operation on the transformed weight data and the transformed feature data to obtain a convolution result;
Step S105: performing a Winograd transform on the convolution result to obtain a transform result;
Step S106: normalizing the transform result to obtain a normalized result;
Step S107: reading first feature data from the buffered feature data, and adding the normalized result to the first feature data to obtain a residual result;
Step S108: activating the residual result to obtain an activation result;
Step S109: performing max pooling on the activation result to obtain a max pooling result;
Step S110: writing the max pooling result back to the buffer region of the feature data.
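Taken together, steps S103–S109 can be sketched as a simplified scalar model. This is illustrative only: it assumes the simplified (identity) Winograd path used for 1 × 1 convolutions, so the transforms disappear, and max pooling over a single value is trivial:

```python
def residual_block(feature, weight, mu, sigma, alpha, bias):
    """Scalar sketch of steps S103-S109 on the simplified Winograd path:
    convolve, normalize and scale, add the shortcut, then ReLU-activate."""
    conv = feature * weight                            # S104: 1x1 convolution
    norm = alpha * (conv - mu) / sigma ** 0.5 + bias   # S106: normalization + scaling
    residual = norm + feature                          # S107: shortcut addition
    return max(residual, 0.0)                          # S108/S109: ReLU (+ trivial pooling)

# conv = 6.0, norm = 6.0, residual = 8.0, out = 8.0
out = residual_block(feature=2.0, weight=3.0, mu=0.0, sigma=1.0, alpha=1.0, bias=0.0)
```

The shortcut addition in S107 is what makes the block "residual": the input feature bypasses the convolution and is summed back in before activation.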
In the method for implementing a deep residual network provided in the embodiment of the present application, the process of performing a Winograd transform on the target weight data and the target feature data to obtain transformed weight data and transformed feature data may specifically be: taking the target feature data as the transformed feature data, and copying the target weight data to obtain the transformed weight data; correspondingly, the process of performing a Winograd transform on the convolution result to obtain the transform result may specifically be: taking the convolution result as the transform result.
In the method for implementing the deep residual network provided in the embodiment of the present application, the algorithm for performing the Winograd transform includes the F(2,3) Winograd transform algorithm.
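The F(2,3) Winograd algorithm mentioned above computes two outputs of a 3-tap 1-D convolution with four multiplications instead of the six that direct convolution needs. A minimal reference sketch (this is the standard F(2,3) formulation, not the FPGA's vectorized implementation):

```python
def winograd_f23(d, g):
    """F(2,3) Winograd transform: two outputs of a 1-D convolution of a
    4-element input tile d with a 3-tap filter g, using 4 multiplications."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

# Matches direct convolution: [1+2+3, 2+3+4] == [6, 9]
y = winograd_f23([1, 2, 3, 4], [1, 1, 1])
```

Note that a 4-pixel input tile yields 2 outputs, which is why F(2,3) halves the per-cycle input width compared with F(4,3)'s 6-pixel tile.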
The application also provides an FPGA and a computer-readable storage medium, which have effects corresponding to those of the deep residual network implementation system provided by the embodiment of the application. Referring to fig. 3, fig. 3 is a schematic structural diagram of an FPGA according to an embodiment of the present disclosure.
An FPGA provided in an embodiment of the present application includes a storage unit 201 and a processing unit 202, where the storage unit 201 stores a computer program, and the processing unit 202 implements the following steps when executing the computer program stored in the storage unit 201:
buffering weight data and feature data sent by a host device connected to the FPGA;
at each operation moment, reading target weight data from the buffered weight data and target feature data from the buffered feature data;
performing a Winograd transform on the target weight data and the target feature data to obtain transformed weight data and transformed feature data;
performing a convolution operation on the transformed weight data and the transformed feature data to obtain a convolution result;
performing a Winograd transform on the convolution result to obtain a transform result;
normalizing the transform result to obtain a normalized result;
reading first feature data from the buffered feature data, and adding the normalized result to the first feature data to obtain a residual result;
activating the residual result to obtain an activation result;
performing max pooling on the activation result to obtain a max pooling result;
and writing the max pooling result back to the buffer region of the feature data.
The FPGA provided in the embodiment of the present application includes a storage unit 201 and a processing unit 202; a computer subprogram is stored in the storage unit 201, and the following steps are specifically implemented when the processing unit 202 executes the computer subprogram stored in the storage unit 201: taking the target feature data as the transformed feature data, and copying the target weight data to obtain the transformed weight data; correspondingly, taking the convolution result as the transform result.
The FPGA provided in the embodiment of the present application includes a storage unit 201 and a processing unit 202; a computer subprogram is stored in the storage unit 201, and the following step is specifically implemented when the processing unit 202 executes the computer subprogram stored in the storage unit 201: performing the Winograd transform using the F(2,3) Winograd transform algorithm.
Referring to fig. 4, another FPGA provided in the embodiment of the present application may further include: an input port 203 connected to the processing unit 202, for transmitting externally input commands to the processing unit 202; a display unit 204 connected to the processing unit 202, for displaying the processing results of the processing unit 202 to the outside; and a communication module 205 connected to the processing unit 202, for realizing communication between the FPGA and the outside. The display unit 204 may be a display panel, a laser scanning display, or the like; the communication methods adopted by the communication module 205 include, but are not limited to, Mobile High-Definition Link (MHL), Universal Serial Bus (USB), High-Definition Multimedia Interface (HDMI), and wireless connections such as wireless fidelity (Wi-Fi), Bluetooth communication, Bluetooth Low Energy (BLE) communication, and IEEE 802.11s-based communication.
The computer-readable storage medium provided by the embodiment of the application is applied to an FPGA, and a computer program is stored in the computer-readable storage medium, and when being executed by a processor, the computer program realizes the following steps:
caching weight data and characteristic data sent by host end equipment connected with the FPGA;
at each operation moment, reading target weight data and target characteristic data from the cached weight data and characteristic data;
carrying out Winograd conversion on the target weight data and the target characteristic data to obtain conversion weight data and conversion characteristic data;
carrying out convolution operation on the transformation weight data and the transformation characteristic data to obtain a convolution operation result;
carrying out Winograd transformation on the convolution operation result to obtain a transformation result;
carrying out standardization processing on the transformation result to obtain a standardization processing result;
reading first characteristic data in the cached characteristic data, and adding the standardized processing result and the first characteristic data to obtain a residual error processing result;
activating the residual processing result to obtain an activation processing result;
performing maximum pooling processing on the activation processing result to obtain a maximum pooling processing result;
and caching the maximum pooling processing result into a cache region of the characteristic data.
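The processing steps above can be sketched end to end in NumPy. Everything in this sketch is illustrative: it uses a single-channel feature map, computes the 3x3 convolution directly rather than in the Winograd domain, and models the standardization step as batch-normalization-style scaling with assumed `gamma`/`beta` parameters, none of which the patent fixes.

```python
import numpy as np

def residual_block_sketch(x, w, identity, gamma=1.0, beta=0.0, eps=1e-5):
    """One pass through the pipeline: conv -> standardize -> residual add ->
    activate -> 2x2 max pool, on a single-channel feature map (illustrative)."""
    h, wd = x.shape
    # convolution (computed directly here; the FPGA performs it on
    # Winograd-transformed weight and feature data)
    conv = np.zeros((h - 2, wd - 2))
    for i in range(h - 2):
        for j in range(wd - 2):
            conv[i, j] = np.sum(x[i:i+3, j:j+3] * w)
    # standardization (batch-normalization style, per-map statistics here)
    norm = gamma * (conv - conv.mean()) / np.sqrt(conv.var() + eps) + beta
    # residual processing: add the first (shortcut) feature data
    res = norm + identity
    # activation (ReLU assumed)
    act = np.maximum(res, 0.0)
    # maximum pooling, 2x2 window with stride 2
    ph, pw = act.shape[0] // 2, act.shape[1] // 2
    return act[:ph * 2, :pw * 2].reshape(ph, 2, pw, 2).max(axis=(1, 3))
```

The returned map is what the memory write-back step would cache into the characteristic-data buffer for the next operation moment.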
The computer-readable storage medium provided by the embodiment of the application is applied to an FPGA, and a computer subprogram is stored in the computer-readable storage medium and specifically realizes the following steps when the computer subprogram is executed by a processor: taking the target characteristic data as transformation characteristic data, and copying the target weight data to obtain transformation weight data; correspondingly, the convolution operation result is used as a transformation result.
The computer-readable storage medium provided in the embodiment of the present application is applied to an FPGA, and a computer subprogram is stored in the computer-readable storage medium, and when being executed by a processor, the computer subprogram specifically implements the following steps: the Winograd transformation is performed by using the F (2,3) Winograd transformation algorithm.
Referring to fig. 5, fig. 5 is a second flowchart of a method for implementing a deep residual error network according to an embodiment of the present application.
The method for implementing the deep residual error network provided by the embodiment of the application is applied to the host end device connected with the above-described FPGA, and may include the following steps:
step S201: sending the weight data and the characteristic data to the FPGA;
step S202: receiving an operation result obtained by the FPGA performing the depth residual operation on the weight data and the characteristic data;
step S203: performing mean pooling on the operation result to obtain a mean pooling result;
step S204: and performing softmax operation on the mean value pooling processing result to obtain a softmax operation result.
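Steps S203 and S204 on the host side can be sketched as follows. The (channels, height, width) feature-map layout and the use of global mean pooling per channel are assumptions, since the patent does not specify them; the max-subtraction in the softmax is a standard numerical-stability measure.

```python
import numpy as np

def host_postprocess(fmap):
    """Mean pooling per channel followed by a numerically stable softmax.

    fmap: array of shape (C, H, W) -- layout assumed, not fixed by the patent.
    Returns a length-C probability vector."""
    pooled = fmap.mean(axis=(1, 2))   # step S203: mean pooling -> (C,)
    z = pooled - pooled.max()         # subtract the max so exp() cannot overflow
    e = np.exp(z)
    return e / e.sum()                # step S204: softmax over channels
```

After this, the host device can use the probability vector for the recognition-rate calculation and performance analysis mentioned above.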
In the method for implementing a deep residual error network provided in the embodiment of the present application, maximum pooling is performed by the FPGA while mean pooling is performed by the host end device, which simplifies the hardware architecture of the FPGA and improves the performance of the mean pooling. In a specific application scenario, after the softmax operation result is obtained, the host end device can also perform further computation such as recognition-rate calculation and performance analysis.
In the method for implementing a deep residual error network applied to a host-side device provided in the embodiment of the present application, a process of sending weight data and feature data to an FPGA may specifically be: according to the data format of the FPGA, recombining the weight data and the characteristic data; and sending the recombined weight data and feature data to the FPGA.
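The patent does not state the FPGA's data format, so the recombination step can only be illustrated under an assumption. A common choice on FPGA accelerators is to tile the channel dimension so that a fixed number of channels stream as one wide memory word; the repacking below is a sketch of that idea, with the block size of 8 chosen arbitrarily.

```python
import numpy as np

def to_blocked_layout(x, block=8):
    """Repack NCHW feature data into (N, C/block, H, W, block).

    Illustrative only: the patent does not specify the FPGA's format.
    After repacking, `block` consecutive channels sit contiguously and can
    stream to the FPGA as one wide word per (h, w) position."""
    n, c, h, w = x.shape
    assert c % block == 0, "channel count must be a multiple of the block size"
    return x.reshape(n, c // block, block, h, w).transpose(0, 1, 3, 4, 2).copy()
```

The host would apply the same kind of repacking to the weight data before sending both over PCIE.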
In the method for implementing a deep residual error network applied to a host device, provided by the embodiment of the present application, data transmission between the host device and an FPGA may be performed through PCIE.
The computer-readable storage media to which this application relates include random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
For a description of relevant parts in an FPGA, a method for implementing a deep residual error network, and a computer readable storage medium provided in the embodiments of the present application, reference is made to detailed descriptions of corresponding parts in a system for implementing a deep residual error network provided in the embodiments of the present application, and details are not repeated here. In addition, parts of the above technical solutions provided in the embodiments of the present application, which are consistent with the implementation principles of corresponding technical solutions in the prior art, are not described in detail so as to avoid redundant description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A system for realizing a deep residual error network, applied to an FPGA, the system comprising:
the weight buffer module is used for caching weight data sent by host end equipment connected with the FPGA;
the characteristic data buffer module is used for caching the characteristic data sent by the host end equipment and the data sent by the memory write-back module;
the memory reading module is used for reading target weight data in the weight buffer module at each operation moment and reading target characteristic data in the characteristic data buffer module;
the Winograd conversion module is used for carrying out Winograd conversion on the target weight data and the target characteristic data to obtain conversion weight data and conversion characteristic data; carrying out Winograd transformation on the convolution operation result of the convolution module to obtain a transformation result;
the convolution module is used for carrying out convolution operation on the transformation weight data and the transformation characteristic data to obtain a convolution operation result;
the standardization module is used for carrying out standardization processing on the transformation result to obtain a standardization processing result;
the residual error module is used for reading the first characteristic data in the characteristic data buffer module and adding the standardized processing result and the first characteristic data to obtain a residual error processing result;
the activation module is used for activating the residual processing result to obtain an activation processing result;
the maximum pooling module is used for performing maximum pooling on the activation processing result to obtain a maximum pooling processing result;
and the memory write-back module is used for writing the maximum pooling processing result back to the characteristic data buffering module.
2. The system of claim 1, wherein the Winograd transform module comprises:
the first Winograd conversion unit is used for taking the target characteristic data as the conversion characteristic data;
the second Winograd conversion unit is used for copying the target weight data to obtain the conversion weight data;
and the third Winograd conversion unit is used for taking the convolution operation result as the conversion result.
3. The system according to claim 1 or 2, wherein the Winograd transform algorithm performed by the Winograd transform module comprises the F(2,3) Winograd transform algorithm.
4. The system of claim 3, further comprising:
and the control module is used for controlling the working states of the memory reading module, the convolution module, the Winograd conversion module, the standardization module, the residual error module, the activation module, the maximum pooling module and the memory writing-back module.
5. A method for realizing a deep residual error network is applied to an FPGA and comprises the following steps:
caching weight data and characteristic data sent by host end equipment connected with the FPGA;
at each operation moment, reading target weight data and target characteristic data from the cached weight data and characteristic data;
carrying out Winograd transformation on the target weight data and the target characteristic data to obtain transformation weight data and transformation characteristic data;
performing convolution operation on the transformation weight data and the transformation characteristic data to obtain a convolution operation result;
carrying out Winograd transformation on the convolution operation result to obtain a transformation result;
standardizing the transformation result to obtain a standardized processing result;
reading first characteristic data in the cached characteristic data, and adding the standardized processing result and the first characteristic data to obtain a residual error processing result;
activating the residual error processing result to obtain an activation processing result;
performing maximum pooling processing on the activation processing result to obtain a maximum pooling processing result;
and caching the maximum pooling processing result to a cache region of the characteristic data.
6. An FPGA, comprising:
a storage unit for storing a computer program;
a processing unit for implementing the steps of the deep residual network implementation method as claimed in claim 5 when said computer program is executed.
7. A computer-readable storage medium for an FPGA, wherein a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the depth residual error network implementation method according to claim 5 are implemented.
8. A deep residual error network implementation method is applied to a host end device connected with the FPGA of claim 6, and comprises the following steps:
sending the weight data and the characteristic data to the FPGA;
receiving an operation result of the FPGA for performing depth residual operation on the weight data and the feature data;
performing mean pooling on the operation result to obtain a mean pooling result;
and performing softmax operation on the average value pooling processing result to obtain a softmax operation result.
9. The method of claim 8, wherein sending the weight data and the characterization data to the FPGA comprises:
recombining the weight data and the feature data according to the data format of the FPGA;
and sending the recombined weight data and feature data to the FPGA.
10. The method of claim 8, wherein data transmission between the host device and the FPGA is performed via PCIE.
CN201910081806.0A 2019-01-28 2019-01-28 FPGA (field programmable Gate array) and depth residual error network implementation method, system and computer medium Active CN109919307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910081806.0A CN109919307B (en) 2019-01-28 2019-01-28 FPGA (field programmable Gate array) and depth residual error network implementation method, system and computer medium

Publications (2)

Publication Number Publication Date
CN109919307A CN109919307A (en) 2019-06-21
CN109919307B true CN109919307B (en) 2023-04-07

Family

ID=66961035


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806586B (en) * 2021-11-18 2022-03-15 腾讯科技(深圳)有限公司 Data processing method, computer device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107993186A (en) * 2017-12-14 2018-05-04 中国人民解放军国防科技大学 3D CNN acceleration method and system based on Winograd algorithm
WO2018107383A1 (en) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Neural network convolution computation method and device, and computer-readable storage medium
CN108765247A (en) * 2018-05-15 2018-11-06 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FPGA heterogeneous computing platform and its applications; Hu Leijun et al.; Electric Power Information and Communication Technology; July 2016; full text *


Similar Documents

Publication Publication Date Title
CN109871510B (en) Two-dimensional convolution operation processing method, system, equipment and computer storage medium
US10116746B2 (en) Data storage method and network interface card
CN102591783B (en) Programmable memory controller
US10318165B2 (en) Data operating method, device, and system
CN111625546B (en) Data writing method, device, equipment and medium
US9619411B2 (en) Polling determination
CN109919307B (en) FPGA (field programmable Gate array) and depth residual error network implementation method, system and computer medium
CN113127382A (en) Data reading method, device, equipment and medium for additional writing
CN106681659A (en) Data compression method and device
CN109213745A (en) A kind of distributed document storage method, device, processor and storage medium
CN115860080B (en) Computing core, accelerator, computing method, apparatus, device, medium, and system
CN113590666B (en) Data caching method, system, equipment and computer medium in AI cluster
CN113672176B (en) Data reading method, system, equipment and computer readable storage medium
CN115237349A (en) Data read-write control method, control device, computer storage medium and electronic equipment
US10832132B2 (en) Data transmission method and calculation apparatus for neural network, electronic apparatus, computer-readable storage medium and computer program product
US20220188032A1 (en) Methods and apparatus for improving data transformation in processing devices
CN117093530B (en) FPGA (field programmable Gate array), model training system and data access method for data transmission
CN111506518B (en) Data storage control method and device
CN114153505B (en) Data interaction method, system, device, equipment and computer storage medium
CN117097346B (en) Decompressor and data decompression method, system, equipment and computer medium
CN115953651B (en) Cross-domain equipment-based model training method, device, equipment and medium
CN112380158B (en) Deep learning-oriented computing platform
TWI764311B (en) Memory access method and intelligent processing apparatus
US20220229583A1 (en) Ai algorithm operation accelerator and method thereof, computing system and non-transitory computer readable media
US10346572B1 (en) Inclusion and configuration of a transaction converter circuit block within an integrated circuit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant