CN109472735B - Accelerator, method and accelerating system for realizing fabric defect detection neural network - Google Patents

Accelerator, method and accelerating system for realizing fabric defect detection neural network

Info

Publication number
CN109472735B
Authority
CN
China
Prior art keywords
neural network
calculation
defect detection
storage unit
fabric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811273849.0A
Other languages
Chinese (zh)
Other versions
CN109472735A (en)
Inventor
金玲玲
饶东升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Lintsense Technology Co ltd
Original Assignee
Shenzhen Lintsense Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Lintsense Technology Co ltd filed Critical Shenzhen Lintsense Technology Co ltd
Priority to CN201811273849.0A priority Critical patent/CN109472735B/en
Publication of CN109472735A publication Critical patent/CN109472735A/en
Application granted granted Critical
Publication of CN109472735B publication Critical patent/CN109472735B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Abstract

The application discloses an FPGA acceleration device, an FPGA-based acceleration method and an FPGA acceleration system for implementing a fabric defect detection neural network. The scheme of the embodiments uses an FPGA to accelerate the detection process of the fabric defect detection neural network, offering high performance and low power consumption compared with general-purpose processors and graphics processors. In addition, according to the scheme of the embodiments of the invention, the FPGA acceleration device outputs a calculation result only when a defect is detected, thereby saving transmission bandwidth.

Description

Accelerator, method and accelerating system for realizing fabric defect detection neural network
Technical Field
The application relates to the technical field of fabric inspection, and in particular to a method, an acceleration device and an acceleration system for implementing a neural network for fabric defect detection.
Background
On production lines for woven fabrics, knitted fabrics, nonwoven fabrics and the like, it is necessary to detect whether the produced fabric contains defects, for example stains, holes, or fuzzing.
In the existing detection method, an inspector stands in front of a cloth inspection machine, visually detects fabric defects, and marks or records them. When fabric output is large, such manual inspection is labor-intensive, and the inspector becomes fatigued after a period of work, so false detections may occur. As a result, the overall efficiency of manual defect detection is low, and its accuracy is not stable.
In the related art, computer-based cloth inspection is mainly realized by machine vision and defect classification: a cloth image to be inspected is captured by a camera, a number of categories are preset, a detection model determines, from the features of the image, the probability that the image belongs to each of the preset categories, and the category with the highest probability is taken as the category of the image, thereby obtaining the defect classification of the cloth image.
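As a concrete illustration of this classification step, the following sketch (not taken from the patent; the category list and the detection model's predict_proba interface are assumptions) simply picks the most probable of the preset categories:

```python
import numpy as np

# Hypothetical preset defect categories.
CATEGORIES = ["no_defect", "stain", "hole", "fuzzing"]

def classify_cloth_image(features: np.ndarray, detection_model) -> str:
    """Return the category with the highest predicted probability.

    `detection_model` is assumed to expose predict_proba(features),
    returning one probability per preset category.
    """
    probabilities = detection_model.predict_proba(features)
    return CATEGORIES[int(np.argmax(probabilities))]
```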
However, current computer-based cloth inspection is generally implemented on a general-purpose processor (CPU) or a graphics processing unit (GPU). With the massively parallel architecture of the GPU, the running speed of a detection model on a GPU system is often tens to thousands of times that of a single-core CPU, but the high power consumption of the GPU limits its application. Compared with GPUs, FPGAs have a great advantage in power consumption.
Disclosure of Invention
In view of the above, embodiments of the present invention provide an acceleration apparatus, method, and acceleration system for implementing a fabric defect detection neural network.
According to an embodiment of the invention, an FPGA acceleration device for implementing a fabric defect detection neural network comprises: at least one storage unit, configured to store the operation instructions involved in the calculation, at least one piece of candidate region data of a fabric image to be detected, and the weight data of the fabric defect detection neural network, and to allocate the at least one piece of candidate region data to at least one calculation unit in one-to-one correspondence; the at least one calculation unit, configured to execute the vector multiply-add operations in the calculation of the fabric defect detection neural network according to the operation instructions, the weight data and the allocated candidate region data, so as to obtain a calculation result, the at least one storage unit being further configured to store the calculation result and to output the calculation result indicating that a candidate region contains a fabric defect; and a control unit, connected to the at least one storage unit and the at least one calculation unit, and configured to obtain the operation instructions through the at least one storage unit and to parse the operation instructions so as to control the at least one calculation unit.
According to an embodiment of the invention, a method for implementing a fabric defect detection neural network based on an FPGA comprises the following steps: providing at least one storage unit, which stores the operation instructions involved in the calculation, at least one piece of candidate region data of a fabric image to be detected, and the weight data of the fabric defect detection neural network, and allocates the at least one piece of candidate region data to at least one calculation unit in one-to-one correspondence; providing the at least one calculation unit, which executes the vector multiply-add operations in the calculation of the fabric defect detection neural network according to the operation instructions, the weight data and the allocated candidate region data to obtain a calculation result, the at least one storage unit storing the calculation result and outputting the calculation result indicating that a candidate region contains a fabric defect; and providing a control unit, which is connected to the at least one storage unit and the at least one calculation unit, obtains the operation instructions through the at least one storage unit, and parses the operation instructions to control the at least one calculation unit.
According to an embodiment of the invention, a hardware acceleration system for implementing a fabric defect detection neural network based on an FPGA comprises a processor and the above FPGA acceleration device; the processor is configured to run a fabric defect detection device and send initial data for calculation to the FPGA acceleration device; and the FPGA acceleration device is configured to perform the calculation of the fabric defect detection neural network according to the initial data sent by the processor to obtain a calculation result, and to return to the processor the calculation results indicating that candidate regions contain fabric defects.
From the above description, it can be seen that the scheme of the embodiments of the invention uses an FPGA to accelerate the detection process of the fabric defect detection neural network, which offers high performance and low power consumption compared with general-purpose processors and graphics processors. In addition, according to the scheme of the embodiments of the invention, the FPGA acceleration device outputs a calculation result only when a defect is detected, thereby saving transmission bandwidth.
Drawings
FIG. 1 is a schematic diagram of an FPGA acceleration device implementing a fabric defect detection neural network in accordance with one embodiment of the present invention;
FIG. 2 is a schematic diagram of a hardware acceleration system for implementing a fabric defect detection neural network based on an FPGA, in accordance with one embodiment of the invention.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be appreciated that these embodiments are discussed only to enable a person skilled in the art to better understand and thereby practice the subject matter described herein, and are not limiting of the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, replace, or add various procedures or components as desired. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples may be combined in other examples as well.
As used herein, the term "comprising" and variations thereof mean open-ended terms, meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout this specification.
The FPGA acceleration device for implementing the fabric defect detection neural network provided by the invention is based on a storage-control-calculation structure:
the storage structure is used for storing the data and operation instructions involved in the calculation;
the control structure comprises a decoding circuit and a control circuit, wherein the decoding circuit parses the operation instructions and generates control signals to control the scheduling and storage of on-chip data and the calculation process of the neural network;
the calculation structure comprises arithmetic logic units that participate in the neural network calculations; the data involved in the calculation is operated on in the calculation structure, as sketched below.
The invention provides an FPGA acceleration device for implementing a neural network for fabric defect detection, which comprises: at least one storage unit, configured to store the operation instructions involved in the calculation, at least one piece of candidate region data of a fabric image to be detected, and the weight data of the fabric defect detection neural network, and to allocate the at least one piece of candidate region data to at least one calculation unit in one-to-one correspondence; the at least one calculation unit, configured to execute the vector multiply-add operations in the calculation of the fabric defect detection neural network according to the operation instructions, the weight data and the allocated candidate region data, so as to obtain a calculation result, the at least one storage unit being further configured to store the calculation result and to output the calculation result indicating that a candidate region contains a fabric defect; and a control unit, connected to the at least one storage unit and the at least one calculation unit, and configured to obtain the operation instructions through the at least one storage unit and to parse the operation instructions so as to control the at least one calculation unit.
In the FPGA acceleration device, the weight data are the weights of the trained fabric defect detection neural network.
In the FPGA acceleration device, when the device performs the neural network calculation, the trained neural network weights may be compressed off-chip and stored in the storage unit.
The invention compresses the fabric defect detection neural network off-chip in an offline compression mode, and transmits the compressed fabric defect detection neural network to the on-chip storage unit through an input interface.
FIG. 1 shows a schematic diagram of an FPGA acceleration device implementing a fabric defect detection neural network according to an embodiment of the present invention. The device 100 comprises an input data storage unit 102, a weight storage unit 104, an instruction storage unit 106, a calculation unit 108, an output data storage unit 110, and a control unit 112.
The input data storage unit 102 is configured to store at least one piece of candidate region data of the fabric image to be detected that participates in the calculation, and to allocate the at least one piece of candidate region data to the at least one calculation unit 108 in one-to-one correspondence, for example, allocating the first candidate region data to calculation unit 1, the second candidate region data to calculation unit 2, …, and the Nth candidate region data to calculation unit N (see the sketch below). The candidate region data comprise the original feature map data of the candidate region and the data generated by the intermediate layers of the fabric defect detection neural network. A candidate region may be an abnormal region possibly containing a fabric defect, obtained from the fabric image to be detected through image segmentation; the segmentation may be implemented in software, and since image segmentation is a known technique, its description is omitted here.
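A minimal sketch of this one-to-one allocation, assuming the regions have already been produced by software image segmentation (the function name and dictionary layout are illustrative only):

```python
def allocate_candidate_regions(candidate_regions, num_compute_units):
    """Assign candidate region i to compute unit i, one-to-one."""
    if len(candidate_regions) > num_compute_units:
        raise ValueError("more candidate regions than compute units")
    return {unit_id: region
            for unit_id, region in enumerate(candidate_regions, start=1)}
```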
The weight storage unit 104 is configured to store the weight data of the fabric defect detection neural network, i.e., the weights of a neural network structure that has been trained in advance on preset training samples until the accuracy of the neural network satisfies a preset accuracy. In one embodiment, the weight data are obtained by offline compression of the trained fabric defect detection neural network, and the compressed weight data are stored in the weight storage unit 104. Specifically, the accuracy of the compressed neural network on the preset training samples is not lower than the preset accuracy. This embodiment uses a genetic algorithm to compress the fabric defect detection neural network: while taking the accuracy of the neural network into account, genetic operations are applied to the trained neural network with compression as the criterion, and the neural network with the simplest structure is finally obtained, thereby compressing the network. Because the weights are compressed off-chip in an offline manner, neural networks with larger models can be deployed on the FPGA acceleration device.
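The patent does not spell out the genetic algorithm; the following sketch is one plausible offline realization under the stated constraint (accuracy on the preset samples not lower than the preset accuracy), where `evaluate_accuracy` is a caller-supplied function and all hyperparameters are assumptions:

```python
import numpy as np

def compress_with_ga(weights, evaluate_accuracy, preset_accuracy,
                     population=20, generations=50, mutation_rate=0.02, rng=None):
    """Search for a sparse weight mask whose accuracy stays above the preset threshold."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = weights.size
    masks = rng.integers(0, 2, size=(population, n))             # 1 = keep the weight

    def fitness(mask):
        acc = evaluate_accuracy(weights * mask.reshape(weights.shape))
        if acc < preset_accuracy:                                 # accuracy constraint
            return -1.0
        return 1.0 - mask.mean()                                  # reward sparser networks

    for _ in range(generations):
        scores = np.array([fitness(m) for m in masks])
        parents = masks[np.argsort(scores)[-(population // 2):]]  # selection
        children = []
        for _ in range(population - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = int(rng.integers(1, n))
            child = np.concatenate([a[:cut], b[cut:]])            # single-point crossover
            flip = rng.random(n) < mutation_rate                  # mutation
            child[flip] ^= 1
            children.append(child)
        masks = np.vstack([parents] + children)

    best = max(masks, key=fitness)
    return weights * best.reshape(weights.shape)
```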
The instruction storage unit 106 is used to store the operation instructions involved in the calculation, which are parsed to carry out the neural network calculation.
The calculation unit 108 is configured to perform the corresponding neural network calculation according to the control signal generated by the control unit 112. The calculation unit 108 is associated with one or more storage units: it may obtain data for computation from its associated input data storage unit 102 and may write data to its associated output data storage unit 110. The calculation unit 108 performs the bulk of the operations in the neural network algorithm, i.e., the vector multiply-add operations and the like. For example, the calculation unit 108 may perform the vector multiply-add operations in the neural network calculation according to the operation instructions, the allocated candidate region data, and the weight data to obtain the calculation result. The calculation results include intermediate calculation results and the final calculation result.
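For one fully connected layer, the vector multiply-add that dominates this calculation can be sketched as follows (the ReLU activation and the function name are assumptions used only for illustration):

```python
import numpy as np

def fully_connected_layer(x, weights, bias, activation=True):
    """Vector multiply-add: y = W·x + b, optionally followed by ReLU."""
    y = weights @ x + bias
    return np.maximum(y, 0.0) if activation else y
```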
The output data storage unit 110 is configured to store the calculation result computed by the calculation unit 108; if the calculation result indicates that the candidate region contains a fabric defect, the output data storage unit 110 outputs the calculation result.
The control unit 112 is connected to the input data storage unit 102, the weight storage unit 104, the instruction storage unit 106, the calculation unit 108, and the output data storage unit 110. The control unit 112 obtains the instructions stored in the instruction storage unit 106 and parses them, and controls the calculation unit 108 to perform the neural network calculation according to the control signals obtained by parsing the instructions.
From the above description, it can be seen that the scheme of the embodiments of the invention uses an FPGA to accelerate the detection process of the fabric defect detection neural network, which offers high performance and low power consumption compared with general-purpose processors and graphics processors. In addition, according to the scheme of the embodiments of the invention, the FPGA acceleration device outputs a calculation result only when a defect is detected, thereby saving transmission bandwidth.
The invention also provides an embodiment of a method for implementing the fabric defect detection neural network based on an FPGA, corresponding to the device described above. Since the method embodiment is substantially similar to the device embodiment, its description is brief; for relevant details, refer to the description of the device embodiment. The method comprises the following steps:
providing at least one storage unit, which stores the operation instructions involved in the calculation, at least one piece of candidate region data of a fabric image to be detected, and the weight data of the fabric defect detection neural network, and allocates the at least one piece of candidate region data to at least one calculation unit in one-to-one correspondence; providing the at least one calculation unit, which executes the vector multiply-add operations in the calculation of the fabric defect detection neural network according to the operation instructions, the weight data and the allocated candidate region data to obtain a calculation result, the at least one storage unit storing the calculation result and outputting the calculation result indicating that a candidate region contains a fabric defect; and providing a control unit, which is connected to the at least one storage unit and the at least one calculation unit, obtains the operation instructions through the at least one storage unit, and parses the operation instructions to control the at least one calculation unit.
In an embodiment of the above method, the storage units include an input data storage unit, an output data storage unit, a weight storage unit, and an instruction storage unit; the input data storage unit is used to store the candidate region data, where the candidate region data comprise the original feature map data of the candidate region and the data generated by the intermediate layers of the fabric defect detection neural network; the output data storage unit is used to store the calculation results; the weight storage unit is used to store the weight data; and the instruction storage unit is used to store the operation instructions.
In another embodiment of the above method, the candidate region data comprise the original feature map data of the candidate region and the data generated by the intermediate layers of the fabric defect detection neural network.
In yet another embodiment of the above method, the weight data are obtained by compressing the trained fabric defect detection neural network off-chip in an offline manner using a genetic algorithm.
The invention also provides a hardware acceleration system for implementing a fabric defect detection neural network based on an FPGA, comprising a processor and the FPGA acceleration device described above; the processor is configured to run the fabric defect detection device and send initial data for calculation to the FPGA acceleration device; and the FPGA acceleration device is configured to perform the calculation of the fabric defect detection neural network according to the initial data sent by the processor to obtain a calculation result, and to return to the processor the calculation results indicating that candidate regions contain fabric defects. The processor may be a central processing unit (CPU), a network processor (NP), an ARM (Advanced RISC Machines) processor, or a combination of a CPU and an NP.
In one embodiment of the above system, the processor obtains at least one piece of candidate region data of the fabric image to be detected by means of the fabric defect detection device, sends the at least one piece of candidate region data to the FPGA acceleration device, receives the calculation results indicating that candidate regions contain fabric defects output by the FPGA acceleration device, and, according to the calculation results, outputs the classification information of the fabric defects by means of the fabric defect detection device.
FIG. 2 illustrates a schematic diagram of a hardware acceleration system implementing a fabric defect detection neural network based on an FPGA according to an embodiment of the invention. The system 200 may include a processor 202, a memory 204, an FPGA acceleration device 206, and a bus 208, with the processor 202, the memory 204, and the FPGA acceleration device 206 interconnected by the bus 208. In practice, the system 200 and other necessary chips may be mounted on a printed circuit board (PCB).
In this embodiment, the processor 202 is a CPU and is the control side of the system 200. The processor 202 runs the fabric defect detection device and controls the calculation process of the FPGA acceleration device 206 by issuing a number of configuration parameters. The fabric defect detection device may be implemented in software.
The memory 204 may include volatile memory, such as random-access memory (RAM), and non-volatile memory, such as flash memory, a hard disk drive (HDD) or a solid-state drive (SSD), or a combination of the above types of memory. The memory 204 is used to buffer the collected data, the input weight data, and the calculation results returned by the FPGA acceleration device 206.
The FPGA acceleration device 206 is the hardware acceleration component of the system 200, an FPGA chip used to accelerate the fabric defect detection neural network algorithm. The FPGA acceleration device 206 includes a direct memory access (DMA) engine, a control interconnect, an input buffer, an output buffer, a weight buffer, an instruction buffer, and computing units (PE, processing elements). The input buffer stores the candidate region data, the output buffer stores the calculation results, the weight buffer stores the weight data, and the instruction buffer stores the operation instructions; the DMA engine is responsible for data transfer between the FPGA acceleration device 206 and the memory 204, and the control interconnect is responsible for interconnecting the control signal lines.
The bus 208 may include a data bus and a control bus. The data bus is responsible for data transfer between the processor 202 and the FPGA acceleration device 206 and uses the AXI-Stream protocol, a high-performance transfer protocol that allows unrestricted burst transfers. The control bus is responsible for control signal transfer between the processor 202 and the FPGA acceleration device 206 and uses the AXI-Lite protocol, a lightweight, address-mapped single-transfer protocol suitable for transmitting control signals to the hardware computing units.
In a specific application, when defect detection is required for a fabric, the system 200 obtains an image of the fabric to be detected through a peripheral device, performs image segmentation with the fabric defect detection software, obtains at least one candidate region, and stores it in the memory 204, which also holds the neural network model data and the control data. The control data include the buffer descriptors (BD) used by the DMA engine and the operation instructions used by the controller. When all data are ready, the processor 202 starts to configure, for the DMA engine, the BDs pre-stored in the memory 204. The configured DMA engine transfers the model data, the candidate region data, and the operation instructions through the control interconnect, triggering the calculation process of the FPGA acceleration device 206. Each time the DMA engine raises an interrupt, the processor 202 advances the pointer address in the BD list and configures a new BD for the DMA engine; this continues until the last BD transfer. When the interrupt for the last BD is received from the DMA engine, if the processor 202 has received the calculation result returned by the PE, the fabric defect detection software runs the softmax function on the received result and outputs the defect type information.
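The host-side flow just described can be summarized with the following pseudologic (a hypothetical sketch: the injected `segment`, `dma`, and `accelerator` objects and their method names are placeholders, not a real driver API; only the softmax itself is concrete):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def inspect_fabric(image, segment, dma, accelerator, defect_classes):
    """segment: software image segmentation; dma/accelerator: placeholder handles."""
    regions = segment(image)                       # candidate regions stored in memory
    for bd in dma.build_buffer_descriptors(regions):
        dma.configure_bd(bd)                       # triggers the FPGA calculation
        dma.wait_for_interrupt()                   # one interrupt per BD transfer
    results = accelerator.read_results()           # only defect-containing regions return
    return [defect_classes[int(np.argmax(softmax(r)))] for r in results]
```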
The detailed description set forth above in connection with the appended drawings describes exemplary embodiments, but does not represent all embodiments that may be implemented or fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. An FPGA acceleration device for implementing a fabric defect detection neural network, comprising:
at least one storage unit, configured to store the operation instructions involved in the calculation, at least one piece of candidate region data of a fabric image to be detected, and the weight data of the fabric defect detection neural network, and to allocate the at least one piece of candidate region data to at least one calculation unit in one-to-one correspondence;
the at least one calculation unit, configured to execute the vector multiply-add operations in the calculation of the fabric defect detection neural network according to the operation instructions, the weight data and the allocated candidate region data, so as to obtain a calculation result, the at least one storage unit being further configured to store the calculation result and to output the calculation result indicating that a candidate region contains a fabric defect;
a control unit, connected to the at least one storage unit and the at least one calculation unit, and configured to obtain the operation instructions through the at least one storage unit and to parse the operation instructions so as to control the at least one calculation unit;
wherein the weight data are the weights of the trained fabric defect detection neural network; when the device performs the neural network calculation, the trained neural network weights are compressed off-chip and stored in the storage unit, the compression being performed off-chip in an offline compression mode, with the compressed fabric defect detection neural network transmitted to the on-chip storage unit through an input interface.
2. A method for implementing a fabric defect detection neural network based on an FPGA, comprising the following steps:
providing at least one storage unit, which stores the operation instructions involved in the calculation, at least one piece of candidate region data of a fabric image to be detected, and the weight data of the fabric defect detection neural network, and allocates the at least one piece of candidate region data to at least one calculation unit in one-to-one correspondence;
providing the at least one calculation unit, which executes the vector multiply-add operations in the calculation of the fabric defect detection neural network according to the operation instructions, the weight data and the allocated candidate region data to obtain a calculation result, the at least one storage unit storing the calculation result and outputting the calculation result indicating that a candidate region contains a fabric defect;
providing a control unit, connected to the at least one storage unit and the at least one calculation unit, which obtains the operation instructions through the at least one storage unit and parses the operation instructions to control the at least one calculation unit;
wherein the weight data are the weights of the trained fabric defect detection neural network; when the neural network calculation is performed, the trained neural network weights are compressed off-chip and stored in the storage unit, the compression being performed off-chip in an offline compression mode, with the compressed fabric defect detection neural network transmitted to the on-chip storage unit through an input interface.
3. A hardware acceleration system for implementing a fabric defect detection neural network based on an FPGA, comprising: a processor and the FPGA acceleration device according to claim 1; wherein,
the processor is configured to run the fabric defect detection device and send initial data for calculation to the FPGA acceleration device;
and the FPGA acceleration device is configured to perform the calculation of the fabric defect detection neural network according to the initial data sent by the processor to obtain a calculation result, and to return to the processor the calculation results indicating that candidate regions contain fabric defects.
CN201811273849.0A 2018-10-30 2018-10-30 Accelerator, method and accelerating system for realizing fabric defect detection neural network Active CN109472735B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811273849.0A CN109472735B (en) 2018-10-30 2018-10-30 Accelerator, method and accelerating system for realizing fabric defect detection neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811273849.0A CN109472735B (en) 2018-10-30 2018-10-30 Accelerator, method and accelerating system for realizing fabric defect detection neural network

Publications (2)

Publication Number Publication Date
CN109472735A CN109472735A (en) 2019-03-15
CN109472735B true CN109472735B (en) 2023-05-26

Family

ID=65666337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811273849.0A Active CN109472735B (en) 2018-10-30 2018-10-30 Accelerator, method and accelerating system for realizing fabric defect detection neural network

Country Status (1)

Country Link
CN (1) CN109472735B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143250B (en) * 2019-12-20 2022-03-22 苏州浪潮智能科技有限公司 Method, device and medium for accessing FPGA storage unit based on AXI-ST interface

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751472A (en) * 2015-04-10 2015-07-01 浙江工业大学 Fabric defect detection method based on B-spline wavelets and deep neural network
CN106530288A (en) * 2016-11-03 2017-03-22 东华大学 Fabric defect detection method based on deep learning algorithm
CN107369155A (en) * 2017-07-24 2017-11-21 广东工业大学 A kind of cloth surface defect detection method and its system based on machine vision
CN207458128U (en) * 2017-09-07 2018-06-05 哈尔滨理工大学 A kind of convolutional neural networks accelerator based on FPGA in vision application
CN108133473A (en) * 2017-12-21 2018-06-08 江南大学 Warp knitted jacquard fabric defect detection method based on Gabor filtering and deep neural network
CN108288263A (en) * 2017-12-21 2018-07-17 江南大学 A kind of knitted fabric fault online test method based on Adaptive Neuro-fuzzy Inference

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1948603A (en) * 2006-11-10 2007-04-18 苏州大学 Method of identifying woven fabric defect
US9153230B2 (en) * 2012-10-23 2015-10-06 Google Inc. Mobile speech recognition hardware accelerator
US10140572B2 (en) * 2015-06-25 2018-11-27 Microsoft Technology Licensing, Llc Memory bandwidth management for deep learning applications
US10802992B2 (en) * 2016-08-12 2020-10-13 Xilinx Technology Beijing Limited Combining CPU and special accelerator for implementing an artificial neural network
US10656962B2 (en) * 2016-10-21 2020-05-19 International Business Machines Corporation Accelerate deep neural network in an FPGA
CN106845556A (en) * 2017-02-09 2017-06-13 东华大学 A kind of fabric defect detection method based on convolutional neural networks

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751472A (en) * 2015-04-10 2015-07-01 浙江工业大学 Fabric defect detection method based on B-spline wavelets and deep neural network
CN106530288A (en) * 2016-11-03 2017-03-22 东华大学 Fabric defect detection method based on deep learning algorithm
CN107369155A (en) * 2017-07-24 2017-11-21 广东工业大学 A kind of cloth surface defect detection method and its system based on machine vision
CN207458128U (en) * 2017-09-07 2018-06-05 哈尔滨理工大学 A kind of convolutional neural networks accelerator based on FPGA in vision application
CN108133473A (en) * 2017-12-21 2018-06-08 江南大学 Warp knitted jacquard fabric defect detection method based on Gabor filtering and deep neural network
CN108288263A (en) * 2017-12-21 2018-07-17 江南大学 A kind of knitted fabric fault online test method based on Adaptive Neuro-fuzzy Inference

Also Published As

Publication number Publication date
CN109472735A (en) 2019-03-15

Similar Documents

Publication Publication Date Title
CN110263921B (en) Method and device for training federated learning model
US11449745B2 (en) Operation apparatus and method for convolutional neural network
US9812093B2 (en) Programmable power performance optimization for graphics cores
WO2021232229A1 (en) Virtual scene generation method and apparatus, computer device and storage medium
US11148675B2 (en) Apparatus and method of sharing a sensor in a multiple system on chip environment
US9164931B2 (en) Clamping of dynamic capacitance for graphics
CN111723918A (en) Automatic generation and tuning tool for convolution kernel
US9378533B2 (en) Central processing unit, GPU simulation method thereof, and computing system including the same
CN107665161B (en) Apparatus and method for providing an indication of availability to charge a battery at a faster rate
US20150220134A1 (en) Optimizing boot-time peak power consumption for server/rack systems
RU2013134325A (en) DEVICE AND METHOD FOR RECOGNITION OF GESTURES ON THE BASIS OF ANALYSIS OF MANY POSSIBLE SECTION BORDERS
CN109472735B (en) Accelerator, method and accelerating system for realizing fabric defect detection neural network
CN114529658A (en) Graph rendering method and related equipment thereof
CN113742069A (en) Capacity prediction method and device based on artificial intelligence and storage medium
CN107678781B (en) Processor and method for executing instructions on processor
CN112598135A (en) Model training processing method and device, computer equipment and medium
CN111696134A (en) Target detection method and device and electronic equipment
CN109710562A (en) A kind of configurable and high speed FPGA configuration circuit and implementation method based on SELECTMAP
CN111886605A (en) Processing for multiple input data sets
Nageswaran et al. Computing spike-based convolutions on GPUs
CN111512321A (en) Automatic selection of a priori terms for training a detection convolutional neural network
CN112101284A (en) Image recognition method, training method, device and system of image recognition model
JP2021068433A5 (en)
CN105843578A (en) Splicing wall back-display method, splicing wall back-display device and splicing wall back-display system
CN111599417A (en) Method and device for acquiring training data of solubility prediction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant