WO2020119188A1 - Program detection method, apparatus and device, and readable storage medium - Google Patents

Info

Publication number
WO2020119188A1
WO2020119188A1 PCT/CN2019/103639 CN2019103639W WO2020119188A1 WO 2020119188 A1 WO2020119188 A1 WO 2020119188A1 CN 2019103639 W CN2019103639 W CN 2019103639W WO 2020119188 A1 WO2020119188 A1 WO 2020119188A1
Authority
WO
WIPO (PCT)
Prior art keywords
program
winograd
test data
convolution
fpga
Prior art date
Application number
PCT/CN2019/103639
Other languages
French (fr)
Chinese (zh)
Inventor
曹芳
赵雅倩
郭振华
Original Assignee
广东浪潮大数据研究有限公司
Application filed by 广东浪潮大数据研究有限公司
Publication of WO2020119188A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the present invention relates to the field of computer application technology, and in particular to a program detection method, apparatus, device, and readable storage medium.
  • CNN: Convolutional Neural Network
  • FPGA: Field-Programmable Gate Array
  • DSP multiplier
  • the Winograd algorithm is a fast algorithm for convolutional neural networks. It exploits the structural similarity between elements to generate a group of elements of the output feature map at once. This reduces the number of multiplication operations, greatly lowers the complexity of the algorithm, and can improve CNN performance on the FPGA.
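  • To illustrate how the Winograd algorithm reduces the number of multiplications, the following is a minimal C sketch of the smallest textbook case, the 1-D transform usually written F(2,3): two outputs of a 3-tap filter computed with 4 data multiplications instead of 6. The function names `direct_f23` and `winograd_f23` are illustrative assumptions, and this tile size is chosen only for brevity; it is not stated to be the one used in the patent.

```c
#include <stddef.h>

/* Direct (sliding-window) computation: 2 outputs, 3 taps, 6 multiplications. */
void direct_f23(const float d[4], const float g[3], float y[2]) {
    for (size_t i = 0; i < 2; ++i) {
        y[i] = 0.0f;
        for (size_t k = 0; k < 3; ++k) {
            y[i] += d[i + k] * g[k];
        }
    }
}

/* Winograd F(2,3): the same 2 outputs with 4 data multiplications (m1..m4).
 * The filter transform u0..u3 depends only on the filter, so it can be
 * computed once and reused for every input tile. */
void winograd_f23(const float d[4], const float g[3], float y[2]) {
    float u0 = g[0];
    float u1 = 0.5f * (g[0] + g[1] + g[2]);
    float u2 = 0.5f * (g[0] - g[1] + g[2]);
    float u3 = g[2];

    float m1 = (d[0] - d[2]) * u0;
    float m2 = (d[1] + d[2]) * u1;
    float m3 = (d[2] - d[1]) * u2;
    float m4 = (d[1] - d[3]) * u3;

    y[0] = m1 + m2 + m3;
    y[1] = m2 - m3 - m4;
}
```

  • Both functions should agree up to floating-point rounding for any input, which is exactly the property the detection method relies on.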
  • the purpose of the present invention is to provide a program detection method, apparatus, device, and readable storage medium that detect the Winograd program on the FPGA, so as to ensure the accuracy of the entire CNN algorithm.
  • the present invention provides the following technical solutions:
  • a program detection method including:
  • using the target algorithm program of the convolutional neural network to perform convolution calculation on the test data to obtain a convolution result;
  • the target algorithm program is an algorithm program that implements the convolutional neural network in a sliding window manner;
  • the target algorithm program of the convolutional neural network is used to perform convolution calculation on the test data to obtain a convolution result, including:
  • the FPGA uses the Winograd program to perform fast convolution calculation on the test data, including:
  • the FPGA uses the Winograd program to perform fast convolution calculation on the test data, and takes the first-layer result of the convolutional neural network obtained by the fast calculation as the fast convolution result.
  • it also includes:
  • the filter parameters are set in the target convolution algorithm program and the Winograd program, respectively.
  • sending the test data to the FPGA includes:
  • the FPGA uses the Winograd program to perform fast convolution calculation on the test data, including:
  • the FPGA starts the kernel and uses the Winograd program to perform fast convolution calculation on the test data.
  • calculating the similarity between the fast convolution result and the convolution result includes:
  • when the similarity is less than or equal to the threshold, the method further includes:
  • a program detection apparatus, including:
  • a test data acquisition module, configured to acquire test data when a Winograd program detection instruction is received;
  • a convolution calculation module, configured to use the target algorithm program of the convolutional neural network to perform convolution calculation on the test data to obtain a convolution result;
  • the target algorithm program is an algorithm program that implements the convolutional neural network in a sliding window manner;
  • a test data sending module configured to send the test data to the FPGA, so that the FPGA uses the Winograd program to perform fast convolution calculation on the test data;
  • a similarity calculation module configured to receive the fast convolution result sent by the FPGA, and calculate the similarity between the fast convolution result and the convolution result;
  • a detection result determination module, configured to determine that the Winograd program is correct when the similarity is greater than a threshold.
  • a program detection device, including:
  • a memory, configured to store a computer program;
  • a processor, configured to implement the steps of the above program detection method when executing the computer program.
  • a readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, the steps of the above program detection method are implemented.
  • after the Winograd program detection instruction is received, the test data is obtained; the target algorithm program of the convolutional neural network is used to perform convolution calculation on the test data to obtain the convolution result, the target algorithm program being an algorithm program that implements the convolutional neural network in a sliding window manner; the test data is sent to the FPGA, so that the FPGA uses the Winograd program to perform fast convolution calculation on the test data; the fast convolution result sent by the FPGA is received, and the similarity between the fast convolution result and the convolution result is calculated; when the similarity is greater than the threshold, it is determined that the Winograd program is correct.
  • the target algorithm program of the convolutional neural network, that is, the implementation of the sliding window algorithm, can express the convolution algorithm accurately using nothing more than nested loops, so it has the advantages of simple code and a low probability of error.
  • the Winograd program is a fast algorithm program for implementing the convolutional neural network. That is to say, when a correctly expressed Winograd program and the target algorithm program compute the convolution of the same input data, the two convolution results should be consistent or differ only within a certain range, that is, they should be similar. Based on this, after the Winograd program has been written to the FPGA and a Winograd program detection instruction is received, the test data for verification is obtained first.
  • the test data can then be sent to the FPGA.
  • the FPGA uses the Winograd program to perform fast convolution calculation on the test data, and then sends the fast convolution result to the CPU.
  • after the CPU receives the fast convolution result sent by the FPGA, it calculates the similarity between the fast convolution result and the convolution result; when the similarity is greater than the threshold, it is determined that the Winograd program is correct.
  • in this way, the target algorithm program running on the CPU can be used to detect the Winograd program in the FPGA, which ensures the accuracy of the Winograd algorithm part, can improve the accuracy of the CNN algorithm in the FPGA, and further improves the accuracy of computer vision tasks implemented on the FPGA.
  • the embodiments of the present invention also provide a program detection apparatus, device, and readable storage medium corresponding to the above program detection method, which have the above technical effects and are not described again here.
  • FIG. 1 is an implementation flowchart of a program detection method in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of the function for creating a board operating environment
  • FIG. 3 is a schematic diagram of the function for initializing board parameters
  • FIG. 5 is a schematic structural diagram of a program detection apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a program detection device according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a specific structure of a program detection device in an embodiment of the present invention.
  • the core of the present invention is to provide a program detection method that combines the advantages of the Winograd algorithm and the target algorithm: the running result of the Winograd algorithm program is checked against the running result of the target algorithm program, in order to determine whether the Winograd algorithm program expresses the Winograd algorithm correctly.
  • trans_input0 = 4.0f*d1-5.0f*d3+d5;
  • trans_input1 = -4.0f*d2-4.0f*d3+d4+d5;
  • trans_input3 = -2.0f*d2-d3+2.0f*d4+d5;
  • float trans_filter2 = minus_one_over_6*g0+one_over_6*g1
  • float trans_filter3 = one_over_24*g0+one_over_12*g1+one_over_6*g2;
  • float trans_filter4 = one_over_24*g0-one_over_12*g1+one_over_6*g2;
  • Winograd matrix multiplication
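  • The coefficients in the fragments above (4, -5, 1/6, 1/12, 1/24) appear to match the commonly published transform matrices for 1-D Winograd F(4,3), which produces 4 outputs of a 3-tap filter from a 6-element input tile using 6 element-wise multiplications. The sketch below is an assumption-laden illustration of that technique, not the patent's kernel code: the matrices `BT`, `G`, and `AT` are the standard published values, and the function name `winograd_f43` and the 0-based indexing are hypothetical.

```c
#include <stddef.h>

/* Input transform B^T (6x6), assumed standard F(4,3) values. */
static const float BT[6][6] = {
    {4.0f,  0.0f, -5.0f,  0.0f, 1.0f, 0.0f},
    {0.0f, -4.0f, -4.0f,  1.0f, 1.0f, 0.0f},
    {0.0f,  4.0f, -4.0f, -1.0f, 1.0f, 0.0f},
    {0.0f, -2.0f, -1.0f,  2.0f, 1.0f, 0.0f},
    {0.0f,  2.0f, -1.0f, -2.0f, 1.0f, 0.0f},
    {0.0f,  4.0f,  0.0f, -5.0f, 0.0f, 1.0f},
};

/* Filter transform G (6x3). */
static const float G[6][3] = {
    { 1.0f / 4.0f,   0.0f,          0.0f        },
    {-1.0f / 6.0f,  -1.0f / 6.0f,  -1.0f / 6.0f },
    {-1.0f / 6.0f,   1.0f / 6.0f,  -1.0f / 6.0f },
    { 1.0f / 24.0f,  1.0f / 12.0f,  1.0f / 6.0f },
    { 1.0f / 24.0f, -1.0f / 12.0f,  1.0f / 6.0f },
    { 0.0f,          0.0f,          1.0f        },
};

/* Output transform A^T (4x6). */
static const float AT[4][6] = {
    {1.0f, 1.0f,  1.0f, 1.0f,  1.0f, 0.0f},
    {0.0f, 1.0f, -1.0f, 2.0f, -2.0f, 0.0f},
    {0.0f, 1.0f,  1.0f, 4.0f,  4.0f, 0.0f},
    {0.0f, 1.0f, -1.0f, 8.0f, -8.0f, 1.0f},
};

/* y = A^T * ((G * g) .* (B^T * d)), where .* is element-wise. */
void winograd_f43(const float d[6], const float g[3], float y[4]) {
    float m[6];
    for (size_t i = 0; i < 6; ++i) {
        float td = 0.0f;  /* transformed input element  */
        float tg = 0.0f;  /* transformed filter element */
        for (size_t j = 0; j < 6; ++j) td += BT[i][j] * d[j];
        for (size_t j = 0; j < 3; ++j) tg += G[i][j] * g[j];
        m[i] = td * tg;   /* one of the 6 element-wise multiplications */
    }
    for (size_t i = 0; i < 4; ++i) {
        y[i] = 0.0f;
        for (size_t j = 0; j < 6; ++j) y[i] += AT[i][j] * m[j];
    }
}
```

  • A real kernel would unroll these loops into expressions like the fragments listed above; a single wrong coefficient or sign in such an unrolled form is the kind of error the detection method is meant to catch.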
  • the expression of the Winograd algorithm on the kernel side of the FPGA remains unchanged, but the convolution result it computes for the first CNN layer is returned to the host side; at the same time, the traditional convolution calculation is implemented on the host side, and another thread is used to compute the convolution result of the first CNN layer.
  • after the calculation is completed, the calculation result of the traditional convolution algorithm on the host side is compared with the calculation result of the Winograd algorithm returned by the kernel side. If the difference between the two results is small and within the expected tolerance, the Winograd calculation result is correct and the kernel-side CNN program continues to run; if the difference exceeds the expected range, the Winograd algorithm is expressed incorrectly, and the program needs to be interrupted for checking and modification.
  • FIG. 1 is a flowchart of a program detection method according to an embodiment of the present invention. The method can be applied to a CPU. The method includes the following steps:
  • the Winograd program is a fast algorithm program for realizing a convolutional neural network.
  • the Winograd program detection instruction can be sent to the CPU through the visual interface or through the command line.
  • after the CPU receives the Winograd program detection instruction, it can obtain the data for testing the Winograd program.
  • the test data may specifically be image data or a matrix. The test data can be received from an external source through an interface, or read directly from a storage device.
  • the target algorithm program is an algorithm program that implements a convolutional neural network in a sliding window manner.
  • the target algorithm program of the convolutional neural network may be written in advance. After the test data is obtained, the target algorithm program can be used to perform convolution calculation on the test data to obtain the convolution result. The target algorithm program may alternatively be an algorithm program that implements the convolutional neural network using the Fourier transform or im2col.
  • sliding window algorithm: this is the most intuitive and simplest method.
  • im2col algorithm: at present, almost all mainstream computing frameworks, including Caffe and MXNet, implement this method. It converts the entire convolution process into a GEMM (general matrix multiplication), and GEMM is heavily optimized in the various BLAS libraries.
  • FFT algorithm: the Fourier transform and the fast Fourier transform are commonly used calculation methods in classic image processing. Since the sliding window, Fourier, and im2col algorithms are all common algorithms, their specific processing logic is not described here.
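  • As an illustration of why the sliding window implementation is considered simple and hard to get wrong, here is a minimal C sketch of a single-channel 2-D convolution written as four nested loops. It assumes row-major buffers, no padding, stride 1, and CNN-style convolution without kernel flipping; the function name `conv2d_sliding` and its parameters are hypothetical and do not come from the patent.

```c
#include <stddef.h>

/* Sliding-window 2-D convolution, single channel, no padding, stride 1.
 *   in  : h x w input, row-major
 *   k   : kh x kw filter, row-major
 *   out : (h - kh + 1) x (w - kw + 1) output, row-major (caller-allocated)
 * Assumes h >= kh and w >= kw. */
void conv2d_sliding(const float *in, size_t h, size_t w,
                    const float *k, size_t kh, size_t kw,
                    float *out) {
    size_t oh = h - kh + 1;
    size_t ow = w - kw + 1;
    for (size_t r = 0; r < oh; ++r) {
        for (size_t c = 0; c < ow; ++c) {
            float acc = 0.0f;
            /* Slide the filter window over the input at position (r, c). */
            for (size_t i = 0; i < kh; ++i) {
                for (size_t j = 0; j < kw; ++j) {
                    acc += in[(r + i) * w + (c + j)] * k[i * kw + j];
                }
            }
            out[r * ow + c] = acc;
        }
    }
}
```

  • For a 3x3 input holding 1..9 and the 2x2 filter {1, 0, 0, 1}, each output is the sum of a window's main diagonal, for example 1 + 5 = 6 for the top-left window.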
  • after the test data is obtained, it needs to be sent to the FPGA.
  • the FPGA in the embodiment of the present invention may be a chip or device with programmable logic gates. After the FPGA receives the test data, it can use the Winograd program to perform fast convolution calculation on the test data to obtain a fast convolution result, which can then be returned to the CPU.
  • sending the test data to the FPGA includes:
  • Step 1: Create the FPGA board operating environment and initialize the board parameters.
  • Step 2: Send the test data to the FPGA.
  • the board operating environment can be created and the board parameters initialized by calling functions packaged by Intel to operate on the board.
  • the function for creating the board operating environment shown in FIG. 2 can be called, and the function for initializing board parameters shown in FIG. 3 can be used.
  • after the FPGA receives the test data, it can start the kernel and use the Winograd program to perform fast convolution calculation on the test data.
  • the kernel is a real-time operating system in the FPGA that provides event scheduling and synchronization, inter-process communication (messaging), memory management, and process management. After the fast convolution result is obtained, it can be returned to the CPU.
  • S104 Receive the fast convolution result sent by the FPGA, and calculate the similarity between the fast convolution result and the convolution result.
  • after the CPU receives the fast convolution result returned by the FPGA, it can calculate the similarity between the fast convolution result and the convolution result.
  • the two calculation results should be consistent or have a high degree of similarity.
  • because the code of the target algorithm program is relatively simple and not error-prone, the convolution result it produces for the test data can be used as a reference value.
  • by judging the similarity between the convolution result and the fast convolution result, it can be determined whether the Winograd program is correct.
  • the calculation method of the similarity includes but is not limited to the following two methods.
  • a calculation method of relative degree may be selected:
  • Method 1: Calculate the ratio between the fast convolution result and the convolution result, and use the ratio to determine the similarity. The similarity of two values can be judged from how close their ratio is to 1: the closer the ratio is to 1, the higher the similarity. Based on this, after the fast convolution result and the convolution result are obtained, their ratio can be calculated and then used to determine the similarity.
  • to keep the ratio within (0, 1], the direction of the division is chosen accordingly (that is, either the fast convolution result is divided by the convolution result, or the convolution result is divided by the fast convolution result). When the ratio is 1, the similarity is 100%; when the ratio lies in (0, 1), the ratio expressed directly as a percentage is taken as the similarity.
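  • Method 1 can be sketched for a single pair of values as follows. This is a minimal illustration under stated assumptions: the function name `ratio_similarity` is hypothetical, and the handling of zeros and of values with opposite signs (returning 0) is an assumption, since the text only covers ratios in (0, 1].

```c
/* Ratio-based similarity of two scalar results, returned in [0, 1].
 * The smaller magnitude is divided by the larger so that the ratio
 * stays in (0, 1]; a ratio of 1 corresponds to 100% similarity.
 * Assumption: identical values (including 0 and 0) give 1, while a
 * single zero or opposite signs give 0. */
float ratio_similarity(float fast, float ref) {
    if (fast == ref) {
        return 1.0f;
    }
    if (fast == 0.0f || ref == 0.0f) {
        return 0.0f;
    }
    if ((fast < 0.0f) != (ref < 0.0f)) {
        return 0.0f;
    }
    float a = fast < 0.0f ? -fast : fast;
    float b = ref < 0.0f ? -ref : ref;
    return a < b ? a / b : b / a;
}
```

  • For example, the pair (2, 4) and the pair (4, 2) both give a ratio of 0.5, that is, a similarity of 50%.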
  • Method 2: Calculate the difference between the fast convolution result and the convolution result, and use the difference to determine the similarity. Specifically, when the difference is 0, the similarity is 100%, and different differences are mapped to different similarities. For example, when the difference is 1, the similarity is 99%, and when the difference is 2, the similarity is 98%. The mapping follows a fixed proportion: the larger the difference, the smaller the similarity.
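  • Method 2 can be sketched in the same way. The function name `difference_similarity` and the parameter `percent_per_unit` are hypothetical; with `percent_per_unit` set to 1 the sketch reproduces the example mapping in the text (difference 1 gives 99%, difference 2 gives 98%). Clamping at 0% for very large differences is an assumption.

```c
/* Difference-based similarity of two scalar results, in percent.
 * The absolute difference is scaled by a fixed proportion and
 * subtracted from 100%; the result is clamped so it never drops
 * below 0% (assumption). */
float difference_similarity(float fast, float ref, float percent_per_unit) {
    float diff = fast > ref ? fast - ref : ref - fast;
    float similarity = 100.0f - diff * percent_per_unit;
    return similarity > 0.0f ? similarity : 0.0f;
}
```

  • The proportion would normally be chosen to suit the numeric range of the convolution results, since a difference of 1 is large for small activations and negligible for large ones.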
  • a threshold may be set, and the threshold is used to compare with the similarity to determine whether the Winograd program is correct. Specifically, when the similarity is greater than the threshold, the Winograd program can be determined to be correct. When the similarity is less than or equal to the threshold, it is determined that the Winograd program is wrong.
  • the value of the threshold can be determined as 99%, or 99.9%, or 99.999%.
  • the embodiments of the present invention also provide corresponding improvements.
  • the same steps as in the above-mentioned embodiments or the corresponding steps can be referred to each other, and the corresponding beneficial effects can also be cross-referenced, which will not be repeated in the preferred/improved embodiments herein.
  • the difference and the ratio can also be compared directly with a preset judgment threshold to determine whether the Winograd program is correct. Specifically, after the difference between the convolution result and the fast convolution result is calculated, the Winograd program is determined to be correct if the difference is less than 10^-3; alternatively, the Winograd program is determined to be correct if the ratio between the convolution result and the fast convolution result is greater than 0.999.
  • the judgment thresholds of 10^-3 and 0.999 can be adjusted according to the actual accuracy requirements.
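  • Because a convolution result is normally an array rather than a single number, the direct difference check can be applied element by element. The sketch below assumes that every element must satisfy the tolerance; the text does not say how per-element differences are aggregated, so this choice, along with the function name `within_tolerance`, is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every element of the fast (Winograd) result differs
 * from the reference result by strictly less than tol, e.g. tol = 1e-3f.
 * An empty result (n == 0) is treated as passing. */
bool within_tolerance(const float *fast, const float *ref, size_t n, float tol) {
    for (size_t i = 0; i < n; ++i) {
        float diff = fast[i] > ref[i] ? fast[i] - ref[i] : ref[i] - fast[i];
        if (diff >= tol) {
            return false;
        }
    }
    return true;
}
```

  • Tightening or loosening `tol` corresponds to adjusting the judgment threshold to the required accuracy.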
  • step S102 may specifically use the target algorithm program to perform convolution calculation on the test data and take the first-layer result of the convolutional neural network as the convolution result; accordingly, in step S104, the FPGA uses the Winograd program to perform fast convolution calculation on the test data and takes the first-layer result of the convolutional neural network obtained by the fast calculation as the fast convolution result. In this way, the verification time of the Winograd program can be shortened.
  • the filter parameters in the convolutional neural network can also be set. Specifically, the filter parameters of the convolutional neural network are obtained, and the filter parameters are respectively set in the target convolution algorithm program and the Winograd program. In this way, the filter parameters in the target convolution algorithm program and the Winograd program can be guaranteed to be consistent, and the accuracy of the Winograd program under different filter parameters can also be tested separately.
  • FIG. 4 is a specific flowchart of a program detection method according to an embodiment of the present invention.
  • the Winograd algorithm as expressed by the Winograd program on the FPGA kernel side remains unchanged, but the convolution result it computes for the first CNN layer is returned to the host (the host side, which is the same as the CPU or processor above). At the same time, the target convolution calculation is implemented on the host side, and a new thread is started to compute the convolution result of the first CNN layer. After the calculation is completed, the result of the target convolution algorithm on the host side is compared with the result of the Winograd algorithm returned by the kernel side.
  • first, the test data and filter data (the same as the filter parameters above) are loaded into the CPU cache. Then two threads are started on the host side: thread 1 computes the convolution according to the target convolution algorithm, and thread 2 starts the kernel so that the FPGA board accelerates the CNN calculation.
  • after thread 2 starts, it first creates the FPGA board operating environment and initializes the board parameters, then writes the test data and filter data into the FPGA board cache, and then starts the FPGA kernel program to perform the calculation. That is, the kernel program computes the convolution according to the Winograd algorithm, obtains the first-layer CNN convolution result, and returns it to the host side.
  • in thread 1, the input and filter data are obtained first, and then the first-layer CNN convolution result is computed according to the target convolution algorithm.
  • the host then receives the Winograd convolution result data returned from the kernel and compares the convolution results obtained by the two methods. If the difference is less than 10^-3, the Winograd algorithm program is correct. If the difference is beyond the expected range, there is a problem with how the kernel Winograd algorithm program is expressed, and the program needs to be interrupted for checking and modification.
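  • The host-side flow can be sketched as a small harness that runs a reference convolution and a device convolution on the same data and compares the results. Everything here is an illustrative assumption: the names `conv_fn`, `host_sliding_conv`, `stub_device_conv`, `stub_broken_conv`, and `check_winograd` are hypothetical, the two stub functions merely stand in for the FPGA kernel call (whose actual host API is not given in the text), and the two paths run sequentially rather than in two threads to keep the sketch short.

```c
#include <stddef.h>

/* Common signature for a 1-D convolution producing n - k + 1 outputs. */
typedef void (*conv_fn)(const float *in, size_t n,
                        const float *g, size_t k, float *out);

/* Host reference: sliding-window convolution with nested loops. */
void host_sliding_conv(const float *in, size_t n,
                       const float *g, size_t k, float *out) {
    for (size_t i = 0; i + k <= n; ++i) {
        float acc = 0.0f;
        for (size_t j = 0; j < k; ++j) acc += in[i + j] * g[j];
        out[i] = acc;
    }
}

/* Hypothetical stand-in for a correct FPGA Winograd kernel: same values
 * as the reference plus a small offset that mimics rounding differences. */
void stub_device_conv(const float *in, size_t n,
                      const float *g, size_t k, float *out) {
    host_sliding_conv(in, n, g, k, out);
    for (size_t i = 0; i + k <= n; ++i) out[i] += 1e-5f;
}

/* Hypothetical stand-in for a wrongly expressed Winograd kernel. */
void stub_broken_conv(const float *in, size_t n,
                      const float *g, size_t k, float *out) {
    host_sliding_conv(in, n, g, k, out);
    if (n >= k) out[0] += 1.0f;
}

/* Returns 1 when every output of the device path is within tol of the
 * reference path, 0 otherwise. ref_out and dev_out are caller-provided
 * buffers of n - k + 1 floats. */
int check_winograd(conv_fn reference, conv_fn device,
                   const float *in, size_t n, const float *g, size_t k,
                   float *ref_out, float *dev_out, float tol) {
    reference(in, n, g, k, ref_out);
    device(in, n, g, k, dev_out);
    for (size_t i = 0; i + k <= n; ++i) {
        float diff = ref_out[i] > dev_out[i] ? ref_out[i] - dev_out[i]
                                             : dev_out[i] - ref_out[i];
        if (diff >= tol) return 0;  /* mismatch: interrupt and inspect */
    }
    return 1;
}
```

  • In a real deployment the device function would wrap the board setup, data transfer, and kernel launch, and the reference path could run in its own thread while the FPGA is busy.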
  • an embodiment of the present invention also provides a program detection apparatus.
  • the program detection apparatus described below and the program detection method described above can be referred to each other.
  • the apparatus includes the following modules:
  • the test data obtaining module 101 is used to obtain test data when receiving the Winograd program detection instruction; wherein, the Winograd program is a fast algorithm program for realizing a convolutional neural network;
  • the convolution calculation module 102 is used to use the target algorithm program of the convolutional neural network to perform convolution calculation on the test data to obtain the convolution result;
  • the target algorithm program is an algorithm program that implements the convolutional neural network in a sliding window manner
  • the test data sending module 103 is used to send the test data to the FPGA so that the FPGA can use the Winograd program to perform fast convolution calculation on the test data;
  • the similarity calculation module 104 is used to receive the fast convolution result sent by the FPGA and calculate the similarity between the fast convolution result and the convolution result;
  • the detection result determination module 105 is used to determine that the Winograd program is correct when the similarity is greater than the threshold.
  • the convolution calculation module 102 is specifically configured so that, when the FPGA uses the Winograd program to perform fast convolution calculation on the test data and takes the first-layer result of the convolutional neural network obtained by the fast calculation as the fast convolution result, the target algorithm program is used to perform convolution calculation on the test data and the first-layer result of the convolutional neural network is taken as the convolution result.
  • the filter setting module is used to obtain the filter parameters of the convolutional neural network; the filter parameters are set in the target convolution algorithm program and the Winograd program, respectively.
  • the test data sending module 103 is specifically configured to create an FPGA board operating environment and initialize the board parameters, and to send the test data to the FPGA so that the FPGA starts the kernel and uses the Winograd program to perform fast convolution calculation on the test data.
  • the similarity calculation module 104 is specifically configured to calculate the ratio between the fast convolution result and the convolution result and use the ratio to determine the similarity; or to calculate the difference between the fast convolution result and the convolution result and use the difference to determine the similarity.
  • the detection result determination module 105 is specifically configured to determine that the Winograd program is wrong when the similarity is less than or equal to the threshold.
  • an embodiment of the present invention further provides a program detection device.
  • a program detection device described below and a program detection method described above can be referred to each other.
  • the program detection equipment includes:
  • a memory D1, configured to store a computer program;
  • a processor D2, configured to implement the steps of the program detection method in the foregoing method embodiments when executing the computer program.
  • FIG. 7 is a schematic diagram of a specific structure of a program detection device provided in this embodiment.
  • the program detection device may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPU) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) that store application programs 342 or data 344.
  • the memory 332 and the storage medium 330 may be short-term storage or persistent storage.
  • the program stored in the storage medium 330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data processing device.
  • the central processor 322 may be configured to communicate with the storage medium 330 and execute a series of instruction operations in the storage medium 330 on the program detection device 301.
  • the program detection device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, for example, Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
  • the steps in the program detection method described above can be implemented by the structure of the program detection device.
  • an embodiment of the present invention further provides a readable storage medium.
  • a readable storage medium described below and a program detection method described above can be referred to each other.
  • a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the steps of the program detection method of the foregoing method embodiments are implemented.
  • the readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A program detection method, apparatus and device, and a readable storage medium. The method of the present application comprises: after a Winograd program detection instruction is received, obtaining test data; performing convolution calculation on the test data by using a target algorithm program of a convolutional neural network to obtain a convolution result; sending the test data to an FPGA, so that the FPGA performs fast convolution calculation on the test data by using a Winograd program; receiving a fast convolution result sent by the FPGA, and calculating a similarity between the fast convolution result and the convolution result; and if the similarity is greater than a threshold, determining that the Winograd program is correct. By means of the method, the Winograd program in the FPGA can be detected.

Description

Program detection method, apparatus, device, and readable storage medium
This application claims priority to Chinese patent application No. 201811514703.0, entitled "Program detection method, apparatus, device, and readable storage medium", filed with the Chinese Patent Office on December 10, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of computer application technology, and in particular to a program detection method, apparatus, device, and readable storage medium.
Background
In recent years, convolutional neural networks (CNNs) have been applied ever more widely to computer vision tasks. A CNN usually contains multiple layers, and the output feature map of each layer is the input feature map of the next layer. Computation in current state-of-the-art CNNs is dominated by the convolutional layers.
The FPGA (Field-Programmable Gate Array) has attracted much attention as an effective hardware accelerator for CNNs because of its high performance, low energy consumption, and reconfigurability. With the target convolution algorithm, each element of the output feature map is computed separately through multiple multiply-accumulate steps, which consumes a large amount of the FPGA's DSP (multiplier) resources. The DSP resources on an FPGA board are limited and valuable, however, and cannot supply the number of multiplications that the target convolution algorithm requires.
The Winograd algorithm is a fast algorithm for convolutional neural networks. It exploits the structural similarity between elements to generate a series of elements of the output feature map together. This reduces the number of multiplications, greatly lowers the algorithmic complexity, and improves CNN performance on the FPGA.
However, implementing the Winograd algorithm in code on an FPGA increases program complexity, and errors are very easy to make during development. Once the Winograd part of the computation is wrong, the accuracy of the entire CNN algorithm is affected. To make program verification more reliable, the Winograd program usually has to be checked with different test data. During development, however, the complexity of the Winograd algorithm makes it difficult to work out the convolution results that correspond to different test data. Even if a table mapping test data to expected results is stored in advance for verification, the amount of test data is small, the results are subject to chance, and the test process is complicated and hard to carry out.
In summary, how to effectively verify whether a Winograd program is correct is a technical problem that those skilled in the art urgently need to solve.
Summary of the Invention
The purpose of the present invention is to provide a program detection method, apparatus, device, and readable storage medium that check the Winograd program on an FPGA and thereby safeguard the accuracy of the entire CNN algorithm.
To solve the above technical problem, the present invention provides the following technical solutions:
A program detection method, including:
obtaining test data when a Winograd program detection instruction is received;
performing a convolution calculation on the test data using the target algorithm program of the convolutional neural network to obtain a convolution result, the target algorithm program being an algorithm program that implements the convolutional neural network in a sliding-window manner;
sending the test data to an FPGA, so that the FPGA performs a fast convolution calculation on the test data using the Winograd program;
receiving the fast convolution result sent by the FPGA, and calculating the similarity between the fast convolution result and the convolution result;
determining that the Winograd program is correct when the similarity is greater than a threshold.
Preferably, performing the convolution calculation on the test data using the target algorithm program of the convolutional neural network to obtain the convolution result includes:
performing the convolution calculation on the test data using the target algorithm program, and taking the first-layer result of the convolutional neural network as the convolution result;
correspondingly, the FPGA performing the fast convolution calculation on the test data using the Winograd program includes:
the FPGA performing the fast convolution calculation on the test data using the Winograd program, and taking the first-layer result of the convolutional neural network obtained by the fast calculation as the fast convolution result.
Preferably, the method further includes:
obtaining filter parameters of the convolutional neural network;
setting the filter parameters in both the target convolution algorithm program and the Winograd program.
Preferably, sending the test data to the FPGA includes:
creating the FPGA board operating environment and initializing the board parameters;
sending the test data to the FPGA.
Preferably, the FPGA performing the fast convolution calculation on the test data using the Winograd program includes:
the FPGA starting the kernel and performing the fast convolution calculation on the test data using the Winograd program.
Preferably, calculating the similarity between the fast convolution result and the convolution result includes:
calculating the ratio of the fast convolution result to the convolution result, and determining the similarity from the ratio;
or calculating the difference between the fast convolution result and the convolution result, and determining the similarity from the difference.
Preferably, when the similarity is less than or equal to the threshold, the method further includes:
determining that the Winograd program is incorrect.
A program detection apparatus, including:
a test data acquisition module, configured to obtain test data when a Winograd program detection instruction is received;
a convolution calculation module, configured to perform a convolution calculation on the test data using the target algorithm program of the convolutional neural network to obtain a convolution result, the target algorithm program being an algorithm program that implements the convolutional neural network in a sliding-window manner;
a test data sending module, configured to send the test data to an FPGA, so that the FPGA performs a fast convolution calculation on the test data using the Winograd program;
a similarity calculation module, configured to receive the fast convolution result sent by the FPGA and calculate the similarity between the fast convolution result and the convolution result;
a detection result determination module, configured to determine that the Winograd program is correct when the similarity is greater than a threshold.
A program detection device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above program detection method when executing the computer program.
A readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above program detection method.
With the method provided by the embodiments of the present invention, test data is obtained when a Winograd program detection instruction is received; a convolution calculation is performed on the test data using the target algorithm program of the convolutional neural network to obtain a convolution result, the target algorithm program being an algorithm program that implements the convolutional neural network in a sliding-window manner; the test data is sent to the FPGA so that the FPGA performs a fast convolution calculation on the test data using the Winograd program; the fast convolution result sent by the FPGA is received, and the similarity between the fast convolution result and the convolution result is calculated; when the similarity is greater than the threshold, the Winograd program is determined to be correct.
The target algorithm program of the deep convolutional neural network, that is, the implementation of the sliding-window algorithm, can express the convolution algorithm exactly using only nested loops, so its code is simple and its probability of error is small. The Winograd program is a fast algorithm program for the convolutional neural network; in other words, when a correctly written Winograd program and the target algorithm program compute the convolution of the same input data, the two results should be identical or differ only within a certain range, that is, they should be similar. On this basis, after the Winograd program has been written to the FPGA and a Winograd program detection instruction is received, the test data used for verification is obtained first. The CPU then performs the convolution calculation on the test data using the target algorithm program of the convolutional neural network and obtains the convolution result. Meanwhile, the test data can be sent to the FPGA. After receiving the test data, the FPGA performs the fast convolution calculation on it using the Winograd program and sends the fast convolution result to the CPU. After the CPU receives the fast convolution result sent by the FPGA, it calculates the similarity between the fast convolution result and the convolution result; when the similarity is greater than the threshold, the Winograd program is determined to be correct. In this way, the target algorithm program running on the CPU can be used to check the Winograd program in the FPGA, which safeguards the correctness of the Winograd part, improves the accuracy of the CNN algorithm in the FPGA, and in turn improves the accuracy of computer vision tasks performed on the FPGA.
Correspondingly, the embodiments of the present invention also provide a program detection apparatus, device, and readable storage medium corresponding to the above program detection method. They have the technical effects described above, which are not repeated here.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an implementation flowchart of a program detection method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the function that creates the board operating environment;
FIG. 3 is a schematic diagram of the function that initializes the board parameters;
FIG. 4 is a detailed flowchart of a program detection method in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a program detection apparatus in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a program detection device in an embodiment of the present invention;
FIG. 7 is a detailed schematic structural diagram of a program detection device in an embodiment of the present invention.
Detailed Description
The core of the present invention is to provide a program detection method that combines the strengths of the Winograd algorithm and the target algorithm: the running result of the target algorithm program is used to check the running result of the Winograd algorithm program, and thereby to determine whether the Winograd algorithm program correctly expresses the Winograd algorithm.
A brief introduction to the fast Winograd algorithm: let F(m, r) denote a one-dimensional convolution that produces m outputs with a filter of size r, and let F(m×m, r×r) denote a two-dimensional convolution that produces m×m outputs with a filter of size r×r. The Winograd fast filtering algorithm for the one-dimensional convolution F(m, r) can be written in matrix form as $Y = A^T[(Gg) \odot (B^T d)]$, where $\odot$ denotes element-wise multiplication. Nesting F(m, r) with itself yields the minimal Winograd fast filtering algorithm for the two-dimensional convolution F(m×m, r×r), which can be expressed as $Y = A^T[(G g G^T) \odot (B^T d B)]A$, where g is the filter data, d is the input data, and G, A, and B are three transformation matrices. Taking F(4, 3) as an example, the values of G, B, and A are as follows:
[Figure PCTCN2019103639-appb-000001: values of the transformation matrices G, B, and A for F(4, 3)]
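For reference, the transform matrices commonly published for F(4, 3) are written out below. Treat them as an assumption about what the figure shows rather than a copy of it; they were chosen because each row of $B^T$, $G$, and $A^T$ agrees, coefficient by coefficient, with the scalar expressions for the one-dimensional case listed in this section.

```latex
B^T = \begin{bmatrix}
4 & 0 & -5 & 0 & 1 & 0 \\
0 & -4 & -4 & 1 & 1 & 0 \\
0 & 4 & -4 & -1 & 1 & 0 \\
0 & -2 & -1 & 2 & 1 & 0 \\
0 & 2 & -1 & -2 & 1 & 0 \\
0 & 4 & 0 & -5 & 0 & 1
\end{bmatrix},
\quad
G = \begin{bmatrix}
\tfrac{1}{4} & 0 & 0 \\
-\tfrac{1}{6} & -\tfrac{1}{6} & -\tfrac{1}{6} \\
-\tfrac{1}{6} & \tfrac{1}{6} & -\tfrac{1}{6} \\
\tfrac{1}{24} & \tfrac{1}{12} & \tfrac{1}{6} \\
\tfrac{1}{24} & -\tfrac{1}{12} & \tfrac{1}{6} \\
0 & 0 & 1
\end{bmatrix},
\quad
A^T = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 0 \\
0 & 1 & -1 & 2 & -2 & 0 \\
0 & 1 & 1 & 4 & 4 & 0 \\
0 & 1 & -1 & 8 & -8 & 1
\end{bmatrix}
```

With these shapes, $B^T$ is 6×6, $G$ is 6×3, and $A^T$ is 4×6, so F(4, 3) consumes six input elements and three filter taps and produces four outputs.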
When the Winograd algorithm is implemented in code on the FPGA, the computation must follow the formulas above. Taking one-dimensional F(4, 3) as an example, with d = [d1, d2, d3, d4, d5, d6] and g = [g0, g1, g2], computing B^T d requires six expressions in code:
float trans_input0 = 4.0f*d1 - 5.0f*d3 + d5;
float trans_input1 = -4.0f*d2 - 4.0f*d3 + d4 + d5;
float trans_input2 = 4.0f*d2 - 4.0f*d3 - d4 + d5;
float trans_input3 = -2.0f*d2 - d3 + 2.0f*d4 + d5;
float trans_input4 = 2.0f*d2 - d3 - 2.0f*d4 + d5;
float trans_input5 = 4.0f*d2 - 5.0f*d4 + d6;
Computing Gg requires six expressions in code:
float trans_filter0 = one_over_4*g0;
float trans_filter1 = minus_one_over_6*g0 + minus_one_over_6*g1 + minus_one_over_6*g2;
float trans_filter2 = minus_one_over_6*g0 + one_over_6*g1 + minus_one_over_6*g2;
float trans_filter3 = one_over_24*g0 + one_over_12*g1 + one_over_6*g2;
float trans_filter4 = one_over_24*g0 - one_over_12*g1 + one_over_6*g2;
float trans_filter5 = g2;
Assuming that the element-wise product of B^T d and Gg is [mul0, mul1, mul2, mul3, mul4, mul5], computing the final result Y requires four expressions in code:
float result0 = mul0 + mul1 + mul2 + mul3 + mul4;
float result1 = mul1 - mul2 + 2.0f*mul3 - 2.0f*mul4;
float result2 = mul1 + mul2 + 4.0f*mul3 + 4.0f*mul4;
float result3 = mul1 - mul2 + 8.0f*mul3 - 8.0f*mul4 + mul5;
As shown above, the Winograd code for one-dimensional F(4, 3) requires 6 + 6 + 4 = 16 hand-written arithmetic expressions. Similarly, two-dimensional F(4×4, 3×3) contains at least (6×6 + 6×6) + (6×3 + 6×6) + (4×6 + 4×4) = 166 expressions. The number of expressions that must be written by hand grows sharply, and each expression mixes the four arithmetic operations with different constants and different variables. The sheer number and complexity of the expressions greatly increases the chance of coding errors.
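The one-dimensional F(4, 3) computation can be sketched in C and checked against a direct sliding-window sum, which is the same idea the detection method relies on. The function names and the 0-based indexing (d[0..5] rather than d1 onward) are assumptions made for illustration; this is not the FPGA kernel code.

```c
#include <stddef.h>

/* Direct 1-D convolution: y[i] = sum over k of d[i + k] * g[k]. */
void direct_f43(const float d[6], const float g[3], float y[4]) {
    for (size_t i = 0; i < 4; ++i) {
        y[i] = 0.0f;
        for (size_t k = 0; k < 3; ++k)
            y[i] += d[i + k] * g[k];
    }
}

/* Winograd F(4,3): y = A^T [ (G g) .* (B^T d) ], written as scalar expressions. */
void winograd_f43(const float d[6], const float g[3], float y[4]) {
    float t[6], u[6], m[6];

    /* Input transform B^T d. */
    t[0] = 4.0f * d[0] - 5.0f * d[2] + d[4];
    t[1] = -4.0f * d[1] - 4.0f * d[2] + d[3] + d[4];
    t[2] = 4.0f * d[1] - 4.0f * d[2] - d[3] + d[4];
    t[3] = -2.0f * d[1] - d[2] + 2.0f * d[3] + d[4];
    t[4] = 2.0f * d[1] - d[2] - 2.0f * d[3] + d[4];
    t[5] = 4.0f * d[1] - 5.0f * d[3] + d[5];

    /* Filter transform G g. */
    u[0] = g[0] / 4.0f;
    u[1] = -(g[0] + g[1] + g[2]) / 6.0f;
    u[2] = -(g[0] - g[1] + g[2]) / 6.0f;
    u[3] = g[0] / 24.0f + g[1] / 12.0f + g[2] / 6.0f;
    u[4] = g[0] / 24.0f - g[1] / 12.0f + g[2] / 6.0f;
    u[5] = g[2];

    /* Element-wise product: the only six general multiplications. */
    for (size_t i = 0; i < 6; ++i)
        m[i] = t[i] * u[i];

    /* Output transform A^T m. */
    y[0] = m[0] + m[1] + m[2] + m[3] + m[4];
    y[1] = m[1] - m[2] + 2.0f * m[3] - 2.0f * m[4];
    y[2] = m[1] + m[2] + 4.0f * m[3] + 4.0f * m[4];
    y[3] = m[1] - m[2] + 8.0f * m[3] - 8.0f * m[4] + m[5];
}
```

Calling both functions on the same d and g and comparing the four outputs within a small floating-point tolerance is a quick way to catch a wrong coefficient in any of the sixteen expressions.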
A brief introduction to the traditional convolution algorithm: if F(4×4, 3×3) is computed with the traditional convolution algorithm, for example in a sliding-window manner, the program needs only four nested for loops, so the code is simple and the probability of error is small. The traditional algorithm program is as follows:
[Figure PCTCN2019103639-appb-000002: traditional sliding-window convolution program]
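A sliding-window convolution with four nested loops might look like the following C sketch. It is an illustration under assumptions (single channel, stride 1, no padding, row-major arrays, a hypothetical function name), not the program shown in the figure.

```c
#include <stddef.h>

/*
 * Sliding-window 2-D convolution, single channel, stride 1, no padding.
 * in   : in_h x in_w input, row-major
 * filt : k x k filter, row-major
 * out  : (in_h - k + 1) x (in_w - k + 1) output, row-major
 */
void conv2d_sliding_window(const float *in, size_t in_h, size_t in_w,
                           const float *filt, size_t k, float *out) {
    size_t out_h = in_h - k + 1;
    size_t out_w = in_w - k + 1;

    for (size_t oy = 0; oy < out_h; ++oy) {          /* output row */
        for (size_t ox = 0; ox < out_w; ++ox) {      /* output column */
            float acc = 0.0f;
            for (size_t ky = 0; ky < k; ++ky) {      /* filter row */
                for (size_t kx = 0; kx < k; ++kx) {  /* filter column */
                    acc += in[(oy + ky) * in_w + (ox + kx)] * filt[ky * k + kx];
                }
            }
            out[oy * out_w + ox] = acc;
        }
    }
}
```

For F(4×4, 3×3) the call would use a 6×6 input and a 3×3 filter and fill a 4×4 output. The whole computation is one multiply-accumulate statement inside the loops, which is why this form is a convenient reference.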
The analysis above shows that the Winograd (matrix multiplication) algorithm code implemented on the FPGA kernel side is complicated to write and has a high probability of error, whereas the traditional convolution algorithm is simple to write on the host side and has a very low error rate. This method combines the advantages of the two algorithms to improve on the Winograd algorithm. The specific approach is as follows:
The Winograd expressions on the FPGA kernel side remain unchanged, but the first-layer CNN convolution result they compute is sent back to the host side. Meanwhile, the traditional convolution calculation is implemented on the host side, where a separate thread computes the first-layer CNN convolution result. When both calculations have finished, the result of the traditional convolution algorithm on the host side is compared with the Winograd result returned from the kernel side. If the difference is small and within the expected tolerance, the Winograd result is correct and the CNN program on the kernel side continues to run; if the difference exceeds the expected tolerance, the Winograd expressions contain an error, and the program must be interrupted for inspection and correction.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1:
Please refer to FIG. 1, which is a flowchart of a program detection method in an embodiment of the present invention. The method can be applied in a CPU and includes the following steps:
S101. Obtain test data when a Winograd program detection instruction is received.
Here, the Winograd program is a fast algorithm program for the convolutional neural network.
After the developer has finished developing the Winograd program code and has written the Winograd program to the FPGA, a Winograd program detection instruction can be sent to the CPU through a visual interface or from the command line. When the CPU receives the instruction, it obtains the data for testing the Winograd program. The test data may specifically be image data or a matrix. It may be received from an external source through an interface, or read directly from a storage device.
S102. Perform a convolution calculation on the test data using the target algorithm program of the convolutional neural network to obtain a convolution result.
Here, the target algorithm program is an algorithm program that implements the convolutional neural network in a sliding-window manner.
In this embodiment of the present invention, the target algorithm program of the convolutional neural network can be written in advance. Once the test data has been obtained, the target algorithm program can be used to perform the convolution calculation on it and obtain the convolution result. The target algorithm program may alternatively implement the convolutional neural network using the Fourier or im2col approach.
The sliding-window algorithm is the most intuitive and simplest method. The im2col algorithm is implemented by almost all current mainstream computing frameworks, including Caffe and MXNet; it converts the whole convolution into a GEMM (general matrix multiplication), and GEMM is heavily optimized in the various BLAS libraries. The FFT algorithm: the Fourier transform and the fast Fourier transform are calculation methods frequently used in classical image processing. Because the sliding-window, Fourier, and im2col algorithms are all common, their processing logic is not described further here.
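To make the im2col idea concrete, the following C sketch unrolls a single-channel input into a column matrix and then applies the filter as one matrix product. The function names, the row-major layout, and the plain loop standing in for a BLAS GEMM call are all assumptions for illustration.

```c
#include <stddef.h>

/*
 * Unroll every k x k window of the input into one column.
 * cols has k*k rows and out_h*out_w columns (row-major), where
 * out_h = in_h - k + 1 and out_w = in_w - k + 1.
 */
void im2col(const float *in, size_t in_h, size_t in_w, size_t k, float *cols) {
    size_t out_h = in_h - k + 1;
    size_t out_w = in_w - k + 1;
    size_t n = out_h * out_w;

    for (size_t ky = 0; ky < k; ++ky) {
        for (size_t kx = 0; kx < k; ++kx) {
            size_t row = ky * k + kx;  /* one row per filter tap */
            for (size_t oy = 0; oy < out_h; ++oy) {
                for (size_t ox = 0; ox < out_w; ++ox) {
                    cols[row * n + oy * out_w + ox] =
                        in[(oy + ky) * in_w + (ox + kx)];
                }
            }
        }
    }
}

/*
 * out (1 x n) = filt (1 x k*k) * cols (k*k x n).
 * A real implementation would hand this product to an optimized GEMM.
 */
void conv2d_from_cols(const float *filt, size_t k, const float *cols,
                      size_t n, float *out) {
    for (size_t j = 0; j < n; ++j)
        out[j] = 0.0f;
    for (size_t r = 0; r < k * k; ++r)
        for (size_t j = 0; j < n; ++j)
            out[j] += filt[r] * cols[r * n + j];
}
```

The result has the same layout as a sliding-window output, so either variant could in principle serve as the host-side reference.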
S103. Send the test data to the FPGA, so that the FPGA performs a fast convolution calculation on the test data using the Winograd program.
After the test data has been obtained, it must also be sent to the FPGA. Specifically, the FPGA in this embodiment of the present invention may be a chip or device with programmable logic gates. After the FPGA receives the test data, it performs the fast convolution calculation on the test data using the Winograd program and obtains the fast convolution result. Once the FPGA has calculated the fast convolution result, it returns the result to the CPU.
Sending the test data to the FPGA specifically includes:
Step 1: create the FPGA board operating environment and initialize the board parameters;
Step 2: send the test data to the FPGA.
For ease of description, step 1 and step 2 are explained together below.
The FPGA board operating environment is created first and the board parameters are initialized; the test data can then be sent to the FPGA. Both creating the board operating environment and initializing the board parameters can be done by calling functions packaged by Intel to operate the board: for example, the function shown in FIG. 2 creates the board operating environment, and the function shown in FIG. 3 initializes the board parameters.
After receiving the test data, the FPGA starts the kernel and performs the fast convolution calculation on the test data using the Winograd program. Here, the kernel is a real-time operating system in the FPGA that provides event scheduling and synchronization, inter-process communication (message passing), memory management, and process management. Once the fast convolution result has been obtained, it is returned to the CPU.
S104. Receive the fast convolution result sent by the FPGA, and calculate the similarity between the fast convolution result and the convolution result.
After the CPU receives the fast convolution result sent by the FPGA, it can calculate the similarity between the fast convolution result and the convolution result.
Specifically, when the programs for the target algorithm of the convolutional neural network and for the Winograd algorithm each perform a convolution calculation on the same input data, the results should be identical or highly similar. Because the target algorithm program is relatively simple and unlikely to contain errors, the convolution result it produces for the test data can serve as the reference value. Once the fast convolution result of the Winograd program is available, the similarity between the convolution result and the fast convolution result indicates whether the Winograd program is correct.
Specifically, the similarity can be calculated in ways that include, but are not limited to, the following two; in practice either one may be chosen:
Method 1: calculate the ratio of the fast convolution result to the convolution result and determine the similarity from the ratio. The similarity of two values can be judged from how close their ratio is to 1: the closer the ratio is to 1, the more similar the two values. Accordingly, once the fast convolution result and the convolution result are available, their ratio is calculated and used to determine the similarity. Specifically, the ratio is kept within (0, 1] by taking whichever of the two quotients, fast convolution result over convolution result or convolution result over fast convolution result, is less than or equal to 1. A ratio of 1 is defined as a similarity of 100%; a ratio in (0, 1) is converted directly to a percentage, and that percentage is the similarity.
Method 2: calculate the difference between the fast convolution result and the convolution result and determine the similarity from the difference. Specifically, a difference of 0 can be defined as a similarity of 100%, with different differences mapped to different similarities; for example, a difference of 1 gives a similarity of 99% and a difference of 2 gives 98%. Any fixed proportion may be used, as long as a larger difference gives a lower similarity.
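The two similarity measures could be written for a single pair of values as in the C sketch below. The function names, the handling of zero and opposite-sign values, and the `step` parameter (0.01 reproduces the "difference of 1 gives 99%" example) are assumptions; the text does not specify these details.

```c
/*
 * Ratio-based similarity in [0, 1]: the smaller quotient of the two values.
 * Assumption: equal values (including 0 and 0) count as fully similar, and
 * a zero paired with a nonzero value or opposite signs count as dissimilar.
 */
float ratio_similarity(float fast, float ref) {
    if (fast == ref)
        return 1.0f;
    if (fast == 0.0f || ref == 0.0f)
        return 0.0f;
    float r = fast / ref;
    if (r < 0.0f)
        return 0.0f;
    return r <= 1.0f ? r : 1.0f / r;
}

/*
 * Difference-based similarity in [0, 1]: 1 - step * |fast - ref|,
 * clamped at 0 so that large differences do not go negative.
 */
float difference_similarity(float fast, float ref, float step) {
    float diff = fast - ref;
    if (diff < 0.0f)
        diff = -diff;
    float s = 1.0f - step * diff;
    return s > 0.0f ? s : 0.0f;
}
```

The ratio form is scale-independent, while the difference form depends on the magnitude of the data, so the choice of `step` would need to match the expected value range.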
S105. When the similarity is greater than a threshold, determine that the Winograd program is correct.
In this embodiment of the present invention, a threshold may be set and compared with the similarity to determine whether the Winograd program is correct. Specifically, the Winograd program is determined to be correct when the similarity is greater than the threshold, and incorrect when the similarity is less than or equal to the threshold. The threshold may be set to 99%, 99.9%, or 99.999%.
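The decision in step S105 amounts to a strict comparison. In the sketch below, the function names are hypothetical, and reducing a list of per-element similarities to their minimum is an assumption, because the description only discusses the similarity of two values.

```python
from typing import Iterable


def winograd_verdict(similarity_percent: float, threshold_percent: float = 99.9) -> str:
    """Return 'correct' only when the similarity is strictly greater than the threshold."""
    return "correct" if similarity_percent > threshold_percent else "incorrect"


def verdict_for_outputs(similarities: Iterable[float], threshold_percent: float = 99.9) -> str:
    """Assumption: judge a whole output tensor by its least similar element."""
    return winograd_verdict(min(similarities), threshold_percent)
```

Note that a similarity exactly equal to the threshold is reported as incorrect, as the text specifies.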
With the method provided by this embodiment of the present invention, test data is obtained when a Winograd program detection instruction is received; a target algorithm program of the convolutional neural network performs convolution calculation on the test data to obtain a convolution result, the target algorithm program being an algorithm program that implements the convolutional neural network in a sliding-window manner; the test data is sent to the FPGA so that the FPGA performs fast convolution calculation on the test data using the Winograd program; the fast convolution result sent by the FPGA is received, and the similarity between the fast convolution result and the convolution result is calculated; and when the similarity is greater than the threshold, the Winograd program is determined to be correct.
The target algorithm program of the deep convolutional neural network, that is, the implementation of the sliding-window algorithm, can express the convolution algorithm accurately using only nested loops, and has the advantages of simple code and a low probability of error. The Winograd program, in turn, is a fast algorithm program for implementing the convolutional neural network. In other words, when a correctly written Winograd program and the target algorithm program compute the convolution of the same input data, the two convolution results should be identical or differ only within a certain range; that is, they should be similar. On this basis, after the Winograd program has been written into the FPGA and a Winograd program detection instruction is received, test data for verification is first obtained. The CPU then performs convolution calculation on the test data using the target algorithm program of the convolutional neural network to obtain the convolution result. At the same time, the test data may be sent to the FPGA. After receiving the test data, the FPGA performs fast convolution calculation on it using the Winograd program and sends the fast convolution result to the CPU. After receiving the fast convolution result, the CPU calculates the similarity between the fast convolution result and the convolution result; when the similarity is greater than the threshold, the Winograd program is determined to be correct. In this way, the target algorithm program running on the CPU can be used to check the Winograd program in the FPGA, which safeguards the accuracy of the Winograd part of the algorithm, can improve the accuracy of the CNN algorithm in the FPGA, and further improves the accuracy of computer vision tasks performed on the FPGA.
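The relationship between the two programs can be shown on a small scale. The sketch below is not the patent's implementation: it uses a one-dimensional signal, a 3-tap filter, and the standard Winograd F(2,3) formulas as an assumed stand-in for the FPGA program, with a nested-loop sliding window as the reference. All function names are hypothetical.

```python
def direct_conv1d(signal, kernel):
    """Sliding-window reference: nested loops, one multiply-accumulate per tap."""
    n_out = len(signal) - len(kernel) + 1
    out = []
    for i in range(n_out):               # slide the window along the signal
        acc = 0.0
        for k in range(len(kernel)):     # accumulate inside the window
            acc += signal[i + k] * kernel[k]
        out.append(acc)
    return out


def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap filter from a 4-sample tile."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]


def winograd_conv1d(signal, kernel):
    """Apply F(2,3) tile by tile; this sketch requires an even number of outputs."""
    n_out = len(signal) - 2
    if len(kernel) != 3 or n_out <= 0 or n_out % 2 != 0:
        raise ValueError("sketch supports 3-tap kernels and an even output count")
    out = []
    for i in range(0, n_out, 2):         # each tile yields two outputs
        out.extend(winograd_f23(signal[i:i + 4], kernel))
    return out
```

For the same input, both functions should agree up to floating-point rounding, which is the property the detection method relies on; a mistake in any of the four products `m1` to `m4` would show up as a large difference from the reference.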
It should be noted that, based on the foregoing embodiment, the embodiments of the present invention also provide corresponding improvements. Steps in the preferred/improved embodiments that are the same as or correspond to steps in the foregoing embodiment may be cross-referenced, as may the corresponding beneficial effects; they are not repeated one by one in the preferred/improved embodiments herein.
Preferably, when determining whether the Winograd program is correct, the difference or the ratio may, in accordance with the similarity calculation principle, be compared directly with a preset judgment threshold. Specifically, after the difference between the convolution result and the fast convolution result is calculated, the Winograd program is determined to be correct if the difference is less than 10^-3; alternatively, the Winograd program is determined to be correct if the ratio between the convolution result and the fast convolution result is greater than 0.999. The judgment thresholds of 10^-3 and 0.999 may of course be adjusted according to the actual accuracy requirements.
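The direct comparison described above can be sketched as follows. The function name is hypothetical, and applying the check element by element to a list of outputs (with every element required to pass) is an assumption; the 10^-3 and 0.999 defaults come from the text.

```python
def passes_direct_check(fast, reference, max_abs_diff=1e-3, min_ratio=0.999):
    """Return True when every element passes the difference test or the ratio test."""
    if len(fast) != len(reference):
        return False  # results of different sizes cannot match

    def element_ok(f, r):
        if abs(f - r) < max_abs_diff:          # difference criterion
            return True
        if f == 0 or r == 0 or (f > 0) != (r > 0):
            return False                        # ratio undefined or negative (assumption)
        a, b = abs(f), abs(r)
        return min(a, b) / max(a, b) > min_ratio  # ratio criterion

    return all(element_ok(f, r) for f, r in zip(fast, reference))
```

The difference test suits outputs near zero, while the ratio test tolerates proportionally small deviations in large outputs; accepting either mirrors the "or" in the description.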
Preferably, because a convolutional neural network includes several convolutional layers, and the convolution algorithm can reduce the amount of code through repeated calls in a loop, it suffices to compare only the first-layer convolution results when testing the Winograd program. Specifically, step S102 may be: using the target algorithm program to perform convolution calculation on the test data, and taking the first-layer result of the convolutional neural network as the convolution result. Correspondingly, in step S104, the FPGA performing fast convolution calculation on the test data using the Winograd program is specifically: the FPGA performs fast convolution calculation on the test data using the Winograd program, and takes the first-layer result of the convolutional neural network obtained by the fast calculation as the fast convolution result. In this way, the verification time of the Winograd program can be shortened.
Preferably, before the Winograd program is tested, the filter parameters of the convolutional neural network may also be set. Specifically, the filter parameters of the convolutional neural network are obtained and set in both the target convolution algorithm program and the Winograd program. This ensures that the filter parameters in the two programs are consistent, and also allows the accuracy of the Winograd program to be tested under different filter parameters.
Embodiment 2:
To help those skilled in the art better understand the program detection method provided by the embodiments of the present invention, the method is described in detail below using a specific application scenario as an example.
Please refer to FIG. 4, which is a detailed flowchart of a program detection method in an embodiment of the present invention.
The Winograd algorithm expressed by the Winograd program on the FPGA kernel side remains unchanged, but the first-layer CNN convolution result it computes is sent back to the host side (the host, i.e., the CPU or processor mentioned above). Meanwhile, the target convolution calculation is implemented on the host side, where a separate thread computes the first-layer CNN convolution result. Once the calculation is complete, the result of the target convolution algorithm on the host side is compared with the Winograd result returned from the kernel side. If the difference between the results is small and within the expected tolerance, the Winograd result is correct and the CNN program on the kernel side continues to run; if the difference exceeds the expected range, the Winograd algorithm is expressed incorrectly, and the program must be interrupted for inspection and modification.
Specifically, the test data and the filter data (the filter parameters mentioned above) are first loaded into the CPU cache. Two threads are then started on the host side: thread 1 computes the convolution according to the target convolution algorithm, and thread 2 launches the kernel, which uses the FPGA board to accelerate the CNN computation.
After thread 2 starts, it first creates the FPGA board runtime environment and initializes the board parameters, then writes the test data and filter data into the FPGA board's cache, and then launches the FPGA kernel program to perform the computation. That is, the kernel program computes the convolution according to the Winograd algorithm and, once the first-layer CNN convolution is obtained, returns the convolution result to the host side.
After thread 1 starts, it first obtains the input and filter data, and then computes the first-layer CNN convolution result according to the target convolution algorithm. It then receives the Winograd convolution result returned from the kernel side and compares the convolution results obtained by the two methods. If the difference is less than 10^-3, the Winograd algorithm program is expressed correctly. If the difference exceeds the expected range, there is a problem in how the kernel's Winograd algorithm program is written, and the program must be interrupted for inspection and modification. Adding a host-side verification program to confirm the Winograd results makes it possible to determine whether the Winograd algorithm program running on the FPGA is correct and to fix it if it is not, which further ensures that the results of the entire CNN are correct.
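The two-thread host flow of this embodiment can be sketched in Python. This is only an illustration under stated assumptions: no real FPGA runtime is called, `simulated_fpga_first_layer` is a hypothetical stand-in for the kernel launch and result readback, and the data is one-dimensional for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def reference_first_layer(data, filt):
    """Thread 1 work: sliding-window convolution on the host."""
    n_out = len(data) - len(filt) + 1
    return [sum(data[i + k] * filt[k] for k in range(len(filt)))
            for i in range(n_out)]


def simulated_fpga_first_layer(data, filt):
    """Hypothetical stand-in for thread 2: a real host program would set up the
    board, write the buffers, launch the Winograd kernel, and read back the
    result. Here a tiny offset imitates rounding differences."""
    return [y + 1e-6 for y in reference_first_layer(data, filt)]


def verify_winograd(data, filt, run_kernel=simulated_fpga_first_layer, tolerance=1e-3):
    """Run both computations concurrently and compare the first-layer results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ref_future = pool.submit(reference_first_layer, data, filt)  # thread 1
        fast_future = pool.submit(run_kernel, data, filt)            # thread 2
        reference, fast = ref_future.result(), fast_future.result()
    if len(fast) != len(reference):
        return False
    worst = max(abs(a - b) for a, b in zip(fast, reference))
    return worst < tolerance  # True: continue running; False: stop and inspect
```

Passing the kernel launcher in as `run_kernel` keeps the comparison logic separate from the device-specific code, so the same check could wrap whatever host API actually drives the board.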
Embodiment 3:
Corresponding to the above method embodiment, an embodiment of the present invention further provides a program detection apparatus. The program detection apparatus described below and the program detection method described above may be cross-referenced.
Referring to FIG. 5, the apparatus includes the following modules:
a test data acquisition module 101, configured to obtain test data when a Winograd program detection instruction is received, where the Winograd program is a fast algorithm program for implementing a convolutional neural network;
a convolution calculation module 102, configured to perform convolution calculation on the test data using a target algorithm program of the convolutional neural network to obtain a convolution result, where the target algorithm program is an algorithm program that implements the convolutional neural network in a sliding-window manner;
a test data sending module 103, configured to send the test data to the FPGA so that the FPGA performs fast convolution calculation on the test data using the Winograd program;
a similarity calculation module 104, configured to receive the fast convolution result sent by the FPGA and calculate the similarity between the fast convolution result and the convolution result; and
a detection result determination module 105, configured to determine that the Winograd program is correct when the similarity is greater than a threshold.
With the apparatus provided by this embodiment of the present invention, test data is obtained when a Winograd program detection instruction is received; a target algorithm program of the convolutional neural network performs convolution calculation on the test data to obtain a convolution result, the target algorithm program being an algorithm program that implements the convolutional neural network in a sliding-window manner; the test data is sent to the FPGA so that the FPGA performs fast convolution calculation on the test data using the Winograd program; the fast convolution result sent by the FPGA is received, and the similarity between the fast convolution result and the convolution result is calculated; and when the similarity is greater than the threshold, the Winograd program is determined to be correct.
The target algorithm program of the deep convolutional neural network, that is, the implementation of the sliding-window algorithm, can express the convolution algorithm accurately using only nested loops, and has the advantages of simple code and a low probability of error. The Winograd program, in turn, is a fast algorithm program for implementing the convolutional neural network. In other words, when a correctly written Winograd program and the target algorithm program compute the convolution of the same input data, the two convolution results should be identical or differ only within a certain range; that is, they should be similar. On this basis, after the Winograd program has been written into the FPGA and a Winograd program detection instruction is received, test data for verification is first obtained. The CPU then performs convolution calculation on the test data using the target algorithm program of the convolutional neural network to obtain the convolution result. At the same time, the test data may be sent to the FPGA. After receiving the test data, the FPGA performs fast convolution calculation on it using the Winograd program and sends the fast convolution result to the CPU. After receiving the fast convolution result, the CPU calculates the similarity between the fast convolution result and the convolution result; when the similarity is greater than the threshold, the Winograd program is determined to be correct. In this way, the target algorithm program running on the CPU can be used to check the Winograd program in the FPGA, which safeguards the accuracy of the Winograd part of the algorithm, can improve the accuracy of the CNN algorithm in the FPGA, and further improves the accuracy of computer vision tasks performed on the FPGA.
In a specific implementation of the present invention, the convolution calculation module 102 is specifically configured to, where the FPGA performs fast convolution calculation on the test data using the Winograd program and takes the first-layer result of the convolutional neural network obtained by the fast calculation as the fast convolution result, perform convolution calculation on the test data using the target algorithm program and take the first-layer result of the convolutional neural network as the convolution result.
In a specific implementation of the present invention, the apparatus further includes:
a filter setting module, configured to obtain the filter parameters of the convolutional neural network and set the filter parameters in both the target convolution algorithm program and the Winograd program.
In a specific implementation of the present invention, the test data sending module 103 is specifically configured to create the FPGA board runtime environment and initialize the board parameters, and to send the test data to the FPGA so that the FPGA launches the kernel and performs fast convolution calculation on the test data using the Winograd program.
In a specific implementation of the present invention, the similarity calculation module 104 is specifically configured to calculate the ratio of the fast convolution result to the convolution result and use the ratio to determine the similarity; or to calculate the difference between the fast convolution result and the convolution result and use the difference to determine the similarity.
In a specific implementation of the present invention, the detection result determination module 105 is specifically configured to determine that the Winograd program is incorrect when the similarity is less than or equal to the threshold.
Embodiment 4:
Corresponding to the above method embodiment, an embodiment of the present invention further provides a program detection device. The program detection device described below and the program detection method described above may be cross-referenced.
Referring to FIG. 6, the program detection device includes:
a memory D1, configured to store a computer program; and
a processor D2, configured to implement the steps of the program detection method of the above method embodiment when executing the computer program.
Specifically, please refer to FIG. 7, which is a schematic diagram of a specific structure of a program detection device provided in this embodiment. The program detection device may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The program stored in the storage medium 330 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations for the data processing device. Furthermore, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the program detection device 301, the series of instruction operations in the storage medium 330.
The program detection device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps of the program detection method described above may be implemented by the structure of the program detection device.
Embodiment 5:
Corresponding to the above method embodiment, an embodiment of the present invention further provides a readable storage medium. The readable storage medium described below and the program detection method described above may be cross-referenced.
A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the program detection method of the above method embodiment.
The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other readable storage medium capable of storing program code.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.

Claims (10)

  1. A program detection method, characterized by comprising:
    obtaining test data when a Winograd program detection instruction is received;
    performing convolution calculation on the test data using a target algorithm program of the convolutional neural network to obtain a convolution result, wherein the target algorithm program is an algorithm program that implements the convolutional neural network in a sliding-window manner;
    sending the test data to an FPGA, so that the FPGA performs fast convolution calculation on the test data using the Winograd program;
    receiving a fast convolution result sent by the FPGA, and calculating a similarity between the fast convolution result and the convolution result; and
    determining that the Winograd program is correct when the similarity is greater than a threshold.
  2. The program detection method according to claim 1, characterized in that performing convolution calculation on the test data using the target algorithm program of the convolutional neural network to obtain the convolution result comprises:
    performing convolution calculation on the test data using the target algorithm program, and taking a first-layer result of the convolutional neural network as the convolution result;
    and correspondingly, the FPGA performing fast convolution calculation on the test data using the Winograd program comprises:
    the FPGA performing fast convolution calculation on the test data using the Winograd program, and taking the first-layer result of the convolutional neural network obtained by the fast calculation as the fast convolution result.
  3. The program detection method according to claim 1, characterized by further comprising:
    obtaining filter parameters of the convolutional neural network; and
    setting the filter parameters in both the target convolution algorithm program and the Winograd program.
  4. The program detection method according to claim 1, characterized in that sending the test data to the FPGA comprises:
    creating a runtime environment for the FPGA board, and initializing board parameters; and
    sending the test data to the FPGA.
  5. The program detection method according to claim 1, characterized in that the FPGA performing fast convolution calculation on the test data using the Winograd program comprises:
    the FPGA launching a kernel and performing fast convolution calculation on the test data using the Winograd program.
  6. The program detection method according to any one of claims 1 to 5, characterized in that calculating the similarity between the fast convolution result and the convolution result comprises:
    calculating a ratio of the fast convolution result to the convolution result, and determining the similarity using the ratio;
    or, calculating a difference between the fast convolution result and the convolution result, and determining the similarity using the difference.
  7. The program detection method according to claim 6, characterized in that, when the similarity is less than or equal to the threshold, the method further comprises:
    determining that the Winograd program is incorrect.
  8. A program detection apparatus, characterized by comprising:
    a test data acquisition module, configured to obtain test data when a Winograd program detection instruction is received;
    a convolution calculation module, configured to perform convolution calculation on the test data using a target algorithm program of the convolutional neural network to obtain a convolution result, wherein the target algorithm program is an algorithm program that implements the convolutional neural network in a sliding-window manner;
    a test data sending module, configured to send the test data to an FPGA, so that the FPGA performs fast convolution calculation on the test data using the Winograd program;
    a similarity calculation module, configured to receive a fast convolution result sent by the FPGA, and calculate a similarity between the fast convolution result and the convolution result; and
    a detection result determination module, configured to determine that the Winograd program is correct when the similarity is greater than a threshold.
  9. A program detection device, characterized by comprising:
    a memory, configured to store a computer program; and
    a processor, configured to implement the steps of the program detection method according to any one of claims 1 to 7 when executing the computer program.
  10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the steps of the program detection method according to any one of claims 1 to 7 are implemented.
PCT/CN2019/103639 2018-12-10 2019-08-30 Program detection method, apparatus and device, and readable storage medium WO2020119188A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811514703.0 2018-12-10
CN201811514703.0A CN109558329A (en) 2018-12-10 2018-12-10 A kind of program detecting method, device, equipment and readable storage medium storing program for executing

Publications (1)

Publication Number Publication Date
WO2020119188A1 true WO2020119188A1 (en) 2020-06-18

Family

ID=65869926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103639 WO2020119188A1 (en) 2018-12-10 2019-08-30 Program detection method, apparatus and device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN109558329A (en)
WO (1) WO2020119188A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330524A (en) * 2020-10-26 2021-02-05 沈阳上博智像科技有限公司 Device and method for quickly realizing convolution in image tracking system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558329A (en) * 2018-12-10 2019-04-02 广东浪潮大数据研究有限公司 A kind of program detecting method, device, equipment and readable storage medium storing program for executing
CN110457907B (en) * 2019-07-25 2021-04-20 腾讯科技(深圳)有限公司 Firmware program detection method and device
CN110516334B (en) * 2019-08-16 2021-12-03 浪潮电子信息产业股份有限公司 Convolution calculation simulation test method and device based on hardware environment and related equipment
CN111027277B (en) * 2019-11-12 2024-07-05 天津大学 Software and hardware cooperation verification method
CN113496272A (en) * 2021-05-10 2021-10-12 中国电子科技集团公司第十四研究所 Convolutional neural network operation method based on heterogeneous platform

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117330A (en) * 2015-08-07 2015-12-02 百度在线网络技术(北京)有限公司 CNN (Convolutional Neural Network) code testing method and apparatus
CN106528363A (en) * 2015-09-14 2017-03-22 深圳市博巨兴实业发展有限公司 Software and hardware cooperative design verifying method and device
EP3346390A1 (en) * 2016-12-30 2018-07-11 INTEL Corporation Winograd algorithm on a matrix processing architecture
CN108764083A (en) * 2018-05-17 2018-11-06 淘然视界(杭州)科技有限公司 Object detection method, electronic device and storage medium based on natural language expression
CN109558329A (en) * 2018-12-10 2019-04-02 广东浪潮大数据研究有限公司 Program detection method, apparatus and device, and readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170344876A1 (en) * 2016-05-31 2017-11-30 Samsung Electronics Co., Ltd. Efficient sparse parallel winograd-based convolution scheme
CN108229645B (en) * 2017-04-28 2021-08-06 北京市商汤科技开发有限公司 Convolution acceleration and calculation processing method and device, electronic equipment and storage medium
CN107844833A (en) * 2017-11-28 2018-03-27 郑州云海信息技术有限公司 Data processing method, device and medium for convolutional neural networks

Also Published As

Publication number Publication date
CN109558329A (en) 2019-04-02

Similar Documents

Publication Publication Date Title
WO2020119188A1 (en) Program detection method, apparatus and device, and readable storage medium
US11106486B2 (en) Techniques to manage virtual classes for statistical tests
WO2021088688A1 (en) Convolution acceleration operation method and apparatus, storage medium and terminal device
CN108334408B (en) Code execution method and device, terminal equipment and computer readable storage medium
CN110929865A (en) Network quantization method, service processing method and related product
CN111553215A (en) Personnel association method and device, and graph convolution network training method and device
CN111985831A (en) Scheduling method and device of cloud computing resources, computer equipment and storage medium
US8856102B2 (en) Modifying structured query language statements
CN114861039B (en) Parameter configuration method, device, equipment and storage medium of search engine
CN111813721B (en) Neural network data processing method, device, equipment and storage medium
WO2019134084A1 (en) Code execution method and apparatus, terminal device, and computer-readable storage medium
CN114386577A (en) Method, apparatus, and storage medium for executing deep learning model
CN113327194A (en) Image style migration method, device, equipment and storage medium
CN115511047B (en) Quantization method, device, equipment and medium of Softmax model
US20230244974A1 (en) Quantum state processing method, computing device and storage medium
CN113553407B (en) Event tracing method and device, electronic equipment and storage medium
WO2024016894A1 (en) Method for training neural network and related device
WO2024045175A1 (en) Optimization of executable graph for artificial intelligence model inference
CN117610619A (en) Quantized convolution method, device and equipment
CN117409939A (en) Disease appeal diagnosis method, device, electronic equipment and storage medium
CN117196927A (en) Image processing method, device, equipment and medium
CN118535336A (en) Task execution method, device, electronic equipment and storage medium
CN111625526A (en) Fuzzy data processing method and system and terminal equipment
CN113962370A (en) Fixed-point processing method and device for convolutional neural network and storage medium
CN114004479A (en) Target resource demand determining method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19896206; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 19896206; Country of ref document: EP; Kind code of ref document: A1