CN115131339A - Factory tooling detection method and system based on neural network target detection - Google Patents

Factory tooling detection method and system based on neural network target detection

Info

Publication number
CN115131339A
Authority
CN
China
Prior art keywords
target detection
feature
layer
outputting
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210877409.6A
Other languages
Chinese (zh)
Inventor
林旭
李密
陈旭
陈佳期
唐光铁
曾远强
卢雨畋
周小报
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Strait Zhihui Technology Co ltd
Original Assignee
Fujian Strait Zhihui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Strait Zhihui Technology Co ltd filed Critical Fujian Strait Zhihui Technology Co ltd
Priority to CN202210877409.6A priority Critical patent/CN115131339A/en
Publication of CN115131339A publication Critical patent/CN115131339A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The factory tooling detection method and system based on neural network target detection provided by the embodiments of the application comprise the steps of dividing a data set, designing a loss function and an objective function, and training and running inference with a target detection model. The target detection model adopts YOLOv6, a target detection framework of the YOLO family. A training sample data set of work clothes sample images is divided, binary classification is used to judge whether the designated tooling is worn, the data set is fed into the target detection framework to generate a corresponding detection model, and an inference detection result is finally obtained.

Description

Factory tooling detection method and system based on neural network target detection
Technical Field
The application relates to the technical field of industrial vision, in particular to a factory tooling detection method and system based on neural network target detection.
Background
In recent years, non-standard operation and improper dressing have become major causes of accidents among operators in industrial plants. Detecting compliance with standardized operating rules has therefore become a technical index of industrial inspection.
Ordinary clothes generate static electricity through friction in dry weather or during work. In special locations such as substations and oil depots, static electricity must not be generated, so dedicated anti-static work clothes, made of anti-static fabric, must be worn. In addition, for ease of management, key personnel such as internal workers, drivers and construction staff wear a unified clothing style. Ensuring that all internal personnel wear the anti-static work clothes and uniform clothing requires checking everyone entering and leaving; relying on manual supervision alone consumes a large amount of human resources.
In view of this, the factory tooling detection method and system based on neural network target detection provided herein can detect factory tooling accurately and quickly, identifying whether workers in the workplace wear the designated work clothes as required.
Disclosure of Invention
The embodiments of the application provide a factory tooling detection method and system based on neural network target detection, aiming to solve the technical problems mentioned in the Background section.
In a first aspect, an embodiment of the present application provides a factory floor tooling detection method based on neural network target detection, including the following steps:
S1, acquiring a plurality of working clothes sample images, labeling the working clothes sample images, and determining all the working clothes sample images and labels corresponding to the working clothes sample images as a training sample data set;
S2, dividing the training sample data set into a training set, a verification set and a test set according to a set ratio;
S3, constructing a target detection model: inputting the images in the training sample data set into a backbone network, continuously outputting three layers of feature maps with different sizes through a Rep-PAN network at the neck layer according to the three-layer output of the backbone network, inputting the feature maps into a head layer, and performing three types of task prediction on the feature maps; and constructing a loss function of the target detection model;
S4, inputting the training set into the constructed target detection model for training, continuously iterating the loss function until convergence to obtain the optimal network weights, validating the target detection model on the verification set, and testing it on the test set; and
S5, setting a fixed threshold value, and outputting a target detection result according to the fixed threshold value.
Through the above technical scheme, a target detection model is trained for the designated work clothes: a large number of work clothes samples are collected and, through deep learning, the system identifies whether workers in the workplace wear the designated work clothes as required; for personnel not wearing them, an image is captured and an alarm is raised. In practical applications, a voice prompt alarm can also be issued.
In a specific embodiment, in step S3, inputting the images in the training sample data set into a backbone network includes the following sub-steps:
S311, inputting the 640 × 640 × 3 images of the training sample data set into the backbone network, and outputting a 320 × 320 × 32 feature map through the stem layer;
S312, the stem layer being followed by a plurality of ERBlocks, each of which down-samples the feature layer and increases the number of channels; each ERBlock consists of an RVB and an RB, the RVB down-sampling the feature layer while widening the channels, and the RB fully fusing the features before output; and
S313, the backbone network finally outputting three feature maps.
In a specific embodiment, in step S3, continuously outputting three layers of feature maps with different sizes through the Rep-PAN network at the neck layer comprises the following sub-steps:
S321, outputting a 20 × 20 × 512 feature map from ERB5, reducing it to 20 × 20 × 128 through SConv, doubling its height h and width w by upsampling, fusing it with the output feature map of ERB4 along the channel dimension to obtain a 40 × 40 × 384 feature map, and outputting a 40 × 40 × 128 feature map after RB;
S322, repeating step S321, and outputting the first feature map;
S323, down-sampling the 80 × 80 × 64 feature map through SConv to obtain a 40 × 40 × 64 feature map, fusing it along the channel dimension with the feature map of matching height h and width w from step S321, and outputting the second feature map after RB; and
S324, repeating step S323, and outputting the third feature map.
In a specific embodiment, in step S3, inputting the feature map into the head layer, and performing three types of task prediction on the feature map includes the following sub-steps:
S331, outputting three branches from the neck layer, and for each branch, first performing feature fusion on the output feature map through a BConv layer;
S332, after the feature fusion of step S331, dividing into two branches: one branch completes the classification-task prediction through BConv + Conv; the other branch first fuses features through BConv and then splits into two further branches, one completing the bounding-box regression through Conv and the other completing the foreground/background classification through Conv; and
S333, performing feature fusion on the three branches along the channel dimension, and outputting the prediction result.
In a specific embodiment, in step S2, the training sample data set is divided into a training set, a verification set and a test set according to a ratio of 8:1:1.
In a specific embodiment, in step S3, the loss function is an SIOU loss function, and the expression is:
SIOU = DIOU + βv
wherein DIOU is the distance loss function, β is a weight coefficient, and v measures the consistency of the aspect ratio between the prediction box and the real box;

β = v / ((1 − IoU) + v)

wherein

v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²

and w, h and w_gt, h_gt are the width and height of the prediction box and of the real box, respectively.
In a specific embodiment, the method further comprises setting the save path of the target detection model and setting the read path from which the training sample data set is loaded into the target detection model.
In a specific embodiment, the method further comprises controlling the number of training iterations of the target detection model and the number of images per iteration by adjusting the epoch and batch size parameters.
In a second aspect, the present application provides a factory floor tooling detection system based on neural network target detection, the system includes:
the acquisition module is used for acquiring a plurality of working clothes sample images, labeling the working clothes sample images with labels, and determining all the working clothes sample images and the labels corresponding to the working clothes sample images as a training sample data set;
the dividing module is used for dividing the training sample data set into a training set, a verification set and a test set according to the proportion;
the target detection module is used for constructing a target detection model: inputting the images in the training sample data set into a backbone network, continuously outputting three layers of feature maps with different sizes through a Rep-PAN network at the neck layer according to the three-layer output of the backbone network, inputting the feature maps into a head layer, and performing three types of task prediction on the feature maps; and constructing a loss function of the target detection model;
the optimization module is used for inputting the training set into the constructed target detection model for training, continuously iterating the loss function until it converges to obtain the optimal network weights, validating the target detection model on the verification set, and testing it on the test set; and
the output module is used for setting a fixed threshold value and outputting a target detection result according to the fixed threshold value.
In a third aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of the above.
The factory tooling detection method and system based on neural network target detection provided by the embodiments of the application comprise the steps of dividing a data set, designing a loss function and an objective function, and training and running inference with a target detection model. The target detection model adopts YOLOv6, a target detection framework of the YOLO family. A training sample data set of work clothes sample images is divided, binary classification is used to judge whether the designated tooling is worn, the data set is fed into the target detection framework to generate a corresponding detection model, and an inference detection result is finally obtained.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a factory floor tooling detection method based on neural network target detection according to the present application;
FIG. 2 is a schematic flow chart of the factory tooling detection method based on neural network target detection according to the present application;
FIG. 3 is a schematic diagram of target detection model training parameters according to one embodiment of the present application;
FIG. 4a is a schematic diagram of a labels_correlogram histogram according to one embodiment of the present application;
FIG. 4b is a schematic representation of the behavior of the training set and the validation set on the model according to one embodiment of the present application;
FIG. 4c is a schematic diagram of the P_curve according to an embodiment of the present application;
FIG. 4d is a schematic diagram of the PR_curve according to one embodiment of the present application;
FIG. 4e is a schematic diagram of the R_curve according to an embodiment of the present application;
FIG. 5 is a schematic illustration of prediction of an object detection model according to an embodiment of the present application;
FIG. 6 is a schematic illustration of prediction of an object detection model according to another embodiment of the present application;
FIG. 7 is a schematic diagram of a factory floor tooling detection system based on neural network target detection according to the present application;
FIG. 8 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
FIG. 1 illustrates a flow diagram of a factory floor tooling detection method based on neural network target detection in accordance with the present application; fig. 2 shows a specific flowchart of the factory floor tooling detection method based on neural network target detection according to the present application. Referring collectively to fig. 1 and 2, the method 100 includes the steps of:
S1, acquiring a plurality of work clothes sample images, labeling the work clothes sample images, and determining all the work clothes sample images and corresponding labels as a training sample data set;
S2, dividing the training sample data set into a training set, a verification set and a test set according to a set ratio;
S3, constructing a target detection model: inputting images in the training sample data set into a backbone network, continuously outputting three layers of feature maps with different sizes through a Rep-PAN network at the neck layer according to the three-layer output of the backbone network, inputting the feature maps into a head layer, and performing three types of task prediction on the feature maps; and constructing a loss function of the target detection model;
in this embodiment, the method further includes setting a path for the target detection model, and setting a training sample data set reading path for the target detection model. And controlling the training iteration times and the iteration picture size of the target detection model by adjusting the epoch and batch size parameters. Referring to fig. 3, fig. 3 is a schematic diagram illustrating training parameters of a target detection model according to an embodiment of the present application.
In this embodiment, inputting the images in the training sample data set into the backbone network includes the following sub-steps:
S311, inputting the 640 × 640 × 3 images of the training sample data set into the backbone network, and outputting a 320 × 320 × 32 feature map through the stem layer;
S312, the stem layer being followed by a plurality of ERBlocks, each of which down-samples the feature layer and increases the number of channels; each ERBlock consists of an RVB and an RB, the RVB down-sampling the feature layer while widening the channels, and the RB fully fusing the features before output; and
S313, the backbone network finally outputting three feature maps. A simplified code sketch of this backbone structure is given below.
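As a concrete reading of steps S311-S313, the backbone can be sketched in PyTorch as follows. This is a simplified stand-in, not the patented network: the RVB and RB blocks are reduced to plain conv-BN-ReLU stacks, and the channel widths (32, 64, 128, 256, 512) are assumptions chosen so that the feature-map sizes match those quoted in the text:

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class ERBlock(nn.Module):
    """RVB down-samples and widens the feature layer; RB fuses features."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.rvb = conv_bn_relu(c_in, c_out, stride=2)  # down-sample + widen
        self.rb = conv_bn_relu(c_out, c_out)            # fuse at the new scale

    def forward(self, x):
        return self.rb(self.rvb(x))

class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = conv_bn_relu(3, 32, stride=2)  # 640x640x3 -> 320x320x32
        self.erb2 = ERBlock(32, 64)                # -> 160x160x64
        self.erb3 = ERBlock(64, 128)               # -> 80x80x128
        self.erb4 = ERBlock(128, 256)              # -> 40x40x256
        self.erb5 = ERBlock(256, 512)              # -> 20x20x512 (ERB5)

    def forward(self, x):
        c3 = self.erb3(self.erb2(self.stem(x)))
        c4 = self.erb4(c3)
        c5 = self.erb5(c4)
        return c3, c4, c5  # the three feature maps of step S313

for f in Backbone()(torch.randn(1, 3, 640, 640)):
    print(f.shape)  # 80x80x128, 40x40x256, 20x20x512
```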
In this embodiment, continuously outputting three layers of feature maps with different sizes through the Rep-PAN network at the neck layer comprises the following sub-steps:
S321, outputting a 20 × 20 × 512 feature map from ERB5, reducing it to 20 × 20 × 128 through SConv, doubling its height h and width w by upsampling, fusing it with the output feature map of ERB4 along the channel dimension to obtain a 40 × 40 × 384 feature map, and outputting a 40 × 40 × 128 feature map after RB;
S322, repeating step S321, and outputting the first feature map;
S323, down-sampling the 80 × 80 × 64 feature map through SConv to obtain a 40 × 40 × 64 feature map, fusing it along the channel dimension with the feature map of matching height h and width w from step S321, and outputting the second feature map after RB; and
S324, repeating step S323, and outputting the third feature map. A simplified sketch of this fusion follows.
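Steps S321-S324 can likewise be sketched as a top-down/bottom-up fusion. Again this is a simplified, assumption-laden rendering: SConv is modelled as a 1 × 1 channel-reduction conv (plus a stride-2 conv for the down-sampling path) and RB as a single 3 × 3 conv, with shapes following the text (20 × 20 × 512 → 20 × 20 × 128 → upsample → concatenate with the ERB4 map → 40 × 40 × 384 → 40 × 40 × 128):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepPANSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.sconv5 = nn.Conv2d(512, 128, 1)                 # shrink ERB5 channels
        self.rb4 = nn.Conv2d(256 + 128, 128, 3, padding=1)
        self.sconv4 = nn.Conv2d(128, 64, 1)
        self.rb3 = nn.Conv2d(128 + 64, 64, 3, padding=1)     # first output map
        self.down3 = nn.Conv2d(64, 64, 3, 2, 1)              # SConv down-sampling
        self.rb4b = nn.Conv2d(64 + 64, 128, 3, padding=1)    # second output map
        self.down4 = nn.Conv2d(128, 128, 3, 2, 1)
        self.rb5b = nn.Conv2d(128 + 128, 256, 3, padding=1)  # third output map

    def forward(self, c3, c4, c5):
        p5 = self.sconv5(c5)                                 # 20x20x128
        up5 = F.interpolate(p5, scale_factor=2.0)            # h and w doubled
        f4 = self.rb4(torch.cat([up5, c4], dim=1))           # 40x40x384 -> 40x40x128
        p4 = self.sconv4(f4)                                 # 40x40x64
        up4 = F.interpolate(p4, scale_factor=2.0)            # 80x80x64
        out3 = self.rb3(torch.cat([up4, c3], dim=1))         # S322: first map
        out4 = self.rb4b(torch.cat([self.down3(out3), p4], dim=1))  # S323: second map
        out5 = self.rb5b(torch.cat([self.down4(out4), p5], dim=1))  # S324: third map
        return out3, out4, out5

c3 = torch.randn(1, 128, 80, 80)  # stand-ins for the three backbone outputs
c4 = torch.randn(1, 256, 40, 40)
c5 = torch.randn(1, 512, 20, 20)
for f in RepPANSketch()(c3, c4, c5):
    print(f.shape)
```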
In this embodiment, the feature maps are input into the head layer and three types of task prediction are carried out on them, comprising the following sub-steps:
S331, outputting three branches from the neck layer, and for each branch, first performing feature fusion on the output feature map through a BConv layer;
S332, after the feature fusion of step S331, dividing into two branches: one branch completes the classification-task prediction through BConv + Conv; the other branch first fuses features through BConv and then splits into two further branches, one completing the bounding-box regression through Conv and the other completing the foreground/background classification through Conv; and
S333, fusing the features of the three branches along the channel dimension, and outputting the prediction result. A simplified sketch of this decoupled head follows.
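The head of steps S331-S333 can be sketched as below; BConv is modelled as a conv-BN-ReLU block and the channel counts are assumptions. One branch predicts the classes (here two: wearing / not wearing the designated tooling), the other splits into bounding-box regression and foreground/background prediction:

```python
import torch
import torch.nn as nn

def bconv(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, 1, 1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class DecoupledHead(nn.Module):
    def __init__(self, c_in, num_classes=2):
        super().__init__()
        self.fuse = bconv(c_in, c_in)              # S331: BConv feature fusion
        self.cls_conv = bconv(c_in, c_in)          # classification branch (BConv)
        self.cls_pred = nn.Conv2d(c_in, num_classes, 1)   # + Conv
        self.reg_conv = bconv(c_in, c_in)          # shared BConv, then split
        self.box_pred = nn.Conv2d(c_in, 4, 1)      # bounding-box regression
        self.obj_pred = nn.Conv2d(c_in, 1, 1)      # foreground/background

    def forward(self, x):
        x = self.fuse(x)
        cls = self.cls_pred(self.cls_conv(x))
        r = self.reg_conv(x)
        return cls, self.box_pred(r), self.obj_pred(r)

cls, box, obj = DecoupledHead(64)(torch.randn(1, 64, 80, 80))
print(cls.shape, box.shape, obj.shape)
```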
In a specific embodiment, in step S2, the training sample data set is divided into a training set, a verification set and a test set according to a ratio of 8:1:1; a minimal split sketch follows.
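A minimal sketch of the 8:1:1 split; the directory layout and file extension are assumptions for illustration:

```python
import random
from pathlib import Path

def split_dataset(image_dir, seed=0):
    """Shuffle the sample images once, then cut 80% / 10% / 10%."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * 0.8)
    n_val = int(len(images) * 0.1)
    train = images[:n_train]
    val = images[n_train:n_train + n_val]
    test = images[n_train + n_val:]  # remaining ~10%
    return train, val, test

train_set, val_set, test_set = split_dataset("data/workwear/images")
print(len(train_set), len(val_set), len(test_set))
```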
In this embodiment, the loss function is the SIOU loss function, and the expression is:
SIOU = DIOU + βv
wherein DIOU is the distance loss function, β is a weight coefficient, and v measures the consistency of the aspect ratio between the prediction box and the real box;

β = v / ((1 − IoU) + v)

wherein

v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²

and w, h and w_gt, h_gt are the width and height of the prediction box and of the real box, respectively.
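A sketch of this loss in PyTorch. The combination of a normalized centre-distance term with the aspect-ratio term βv matches the well-known CIoU formulation, which is what this sketch implements; the (x1, y1, x2, y2) box layout is an assumption:

```python
import math
import torch

def siou_loss(pred, target, eps=1e-7):
    """SIOU = DIOU term (1 - IoU + centre distance) + beta * v, per box row."""
    # intersection and union -> IoU
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # DIOU distance term: squared centre distance over enclosing-box diagonal
    cx_p = (pred[:, 0] + pred[:, 2]) / 2; cy_p = (pred[:, 1] + pred[:, 3]) / 2
    cx_t = (target[:, 0] + target[:, 2]) / 2; cy_t = (target[:, 1] + target[:, 3]) / 2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    diag2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency v and its weight beta
    w_p = pred[:, 2] - pred[:, 0]; h_p = pred[:, 3] - pred[:, 1]
    w_t = target[:, 2] - target[:, 0]; h_t = target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps))
                              - torch.atan(w_p / (h_p + eps))) ** 2
    beta = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / diag2 + beta * v

pred = torch.tensor([[10., 10., 50., 60.]])
target = torch.tensor([[12., 8., 48., 62.]])
print(siou_loss(pred, target))
```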
S4, inputting the training set into the constructed target detection model for training, continuously iterating the loss function until convergence to obtain the optimal network weights, validating the target detection model on the verification set, and testing it on the test set; the training process can be visualized and the curves of the relevant model indexes inspected. Referring to figs. 4a-4e, which show the labels_correlogram histogram, the behavior of the training set and the verification set on the model, and the P_curve, PR_curve and R_curve, respectively.
S5, setting a fixed threshold value, and outputting the target detection result according to the fixed threshold value.
In a specific embodiment, the threshold variable is iou-thres, set to 0.65; that is, a detection is kept only if its confidence is greater than or equal to 0.65. The trained target detection model then performs predictive inference on the test set samples, as shown in fig. 5 and 6, which are schematic diagrams of predictions of the target detection model according to embodiments of the present application. A sketch of this threshold filtering follows.
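The fixed-threshold selection of step S5 then amounts to a simple confidence filter; the (x1, y1, x2, y2, confidence, class) row layout is an assumption:

```python
import torch

def filter_detections(detections, conf_thres=0.65):
    """Keep only detections whose confidence is >= the fixed threshold."""
    return detections[detections[:, 4] >= conf_thres]

dets = torch.tensor([[10., 20., 110., 220., 0.91, 0.],
                     [30., 40., 130., 240., 0.42, 1.]])
print(filter_detections(dets))  # only the 0.91 detection is kept
```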
Through the above technical scheme, a target detection model is trained for the designated work clothes: a large number of work clothes samples are collected and, through deep learning, the system identifies whether workers in the workplace wear the designated work clothes as required; for personnel not wearing them, an image is captured and an alarm is raised. In practical applications, this can be combined with a voice prompt alarm.
With further reference to fig. 7, as an implementation of the method described above, the present application provides an embodiment of a factory floor tooling detection system based on neural network target detection, where the embodiment of the system corresponds to the embodiment of the method shown in fig. 1, and the system may be specifically applied to various electronic devices. The system 200 includes:
the obtaining module 210 is configured to obtain a plurality of work clothes sample images, label the work clothes sample images, and determine all the work clothes sample images and labels corresponding to the work clothes sample images as a training sample data set;
a dividing module 220, configured to divide the training sample data set into a training set, a verification set, and a test set in proportion;
a target detection module 230, configured to construct a target detection model: inputting images in the training sample data set into a backbone network, continuously outputting three layers of feature maps with different sizes through a Rep-PAN network at the neck layer according to the three-layer output of the backbone network, inputting the feature maps into a head layer, and performing three types of task prediction on the feature maps; and constructing a loss function of the target detection model;
the optimization module 240, configured to input the training set into the constructed target detection model for training, continuously iterate the loss function until convergence to obtain the optimal network weights, validate the target detection model on the verification set, and test it on the test set; and
the output module 250, configured to set a fixed threshold and output the target detection result according to the fixed threshold.
As shown in fig. 8, the computer system 300 includes a Central Processing Unit (CPU) 301 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the system 300 are also stored. The CPU 301, ROM 302, and RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
The following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse, and the like; an output section 307 including a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card, a modem, or the like. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 310 as necessary, so that a computer program read out therefrom is installed into the storage section 308 as necessary.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 309, and/or installed from the removable medium 311. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 301. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an acquisition module, a dividing module, a target detection module, an optimization module, and an output module. The names of these modules do not in some cases constitute a limitation of the modules themselves.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A factory floor tooling detection method based on neural network target detection, characterized by comprising the following steps:
S1, acquiring a plurality of working clothes sample images, labeling the working clothes sample images, and determining all the working clothes sample images and labels corresponding to the working clothes sample images as a training sample data set;
S2, dividing the training sample data set into a training set, a verification set and a test set according to a set ratio;
S3, constructing a target detection model: inputting the images in the training sample data set into a backbone network, continuously outputting three layers of feature maps with different sizes through a Rep-PAN network at the neck layer according to the three-layer output of the backbone network, inputting the feature maps into a head layer, and performing three types of task prediction on the feature maps; and constructing a loss function of the target detection model;
S4, inputting the training set into the constructed target detection model for training, continuously iterating the loss function until convergence to obtain the optimal network weights, validating the target detection model on the verification set, and testing it on the test set; and
S5, setting a fixed threshold value, and outputting a target detection result according to the fixed threshold value.
2. The factory floor tooling detection method based on neural network target detection as claimed in claim 1, wherein in step S3, inputting the images in the training sample data set into a backbone network comprises the following sub-steps:
S311, inputting the 640 × 640 × 3 images of the training sample data set into the backbone network, and outputting a 320 × 320 × 32 feature map through the stem layer;
S312, the stem layer being followed by a plurality of ERBlocks, each of which down-samples the feature layer and increases the number of channels; each ERBlock consists of an RVB and an RB, the RVB down-sampling the feature layer while widening the channels, and the RB fully fusing the features before output; and
S313, the backbone network finally outputting three feature maps.
3. The factory floor tooling detection method based on neural network target detection as claimed in claim 1, wherein in step S3, continuously outputting three layers of feature maps with different sizes through the Rep-PAN network at the neck layer comprises the following sub-steps:
S321, outputting a 20 × 20 × 512 feature map from ERB5, reducing it to 20 × 20 × 128 through SConv, doubling its height h and width w by upsampling, fusing it with the output feature map of ERB4 along the channel dimension to obtain a 40 × 40 × 384 feature map, and outputting a 40 × 40 × 128 feature map after RB;
S322, repeating step S321, and outputting the first feature map;
S323, down-sampling the 80 × 80 × 64 feature map through SConv to obtain a 40 × 40 × 64 feature map, fusing it along the channel dimension with the feature map of matching height h and width w from step S321, and outputting the second feature map after RB; and
S324, repeating step S323, and outputting the third feature map.
4. The factory floor tool detection method based on neural network target detection as claimed in claim 1, wherein in step S3, the feature map is input into the head layer, and three types of task prediction are performed on the feature map, including the following sub-steps:
S331, outputting three branches from the neck layer, and for each branch, first performing feature fusion on the output feature map through a BConv layer;
S332, after the feature fusion of step S331, dividing into two branches: one branch completes the classification-task prediction through BConv + Conv; the other branch first fuses features through BConv and then splits into two further branches, one completing the bounding-box regression through Conv and the other completing the foreground/background classification through Conv; and
S333, performing feature fusion on the three branches along the channel dimension, and outputting the prediction result.
5. The factory tooling detection method based on neural network target detection according to claim 1, characterized in that in step S2, the training sample data set is divided into a training set, a verification set and a test set according to a ratio of 8:1:1.
6. The factory floor tooling detection method based on neural network target detection as claimed in claim 1, wherein in step S3, said loss function is a SIOU loss function, and the expression is:
SIOU = DIOU + βv
wherein DIOU is the distance loss function, β is a weight coefficient, and v measures the consistency of the aspect ratio between the prediction box and the real box;

β = v / ((1 − IoU) + v)

wherein

v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²

and w, h and w_gt, h_gt are the width and height of the prediction box and of the real box, respectively.
7. The factory floor tooling detection method based on neural network target detection according to claim 1, further comprising setting the save path of the target detection model and setting the read path from which the training sample data set is loaded into the target detection model.
8. The factory floor tooling detection method based on neural network target detection according to claim 1, characterized by further comprising controlling the number of training iterations of the target detection model and the number of images per iteration by adjusting the epoch and batch size parameters.
9. A factory tooling detection system based on neural network target detection, characterized in that the system comprises:
the acquisition module is used for acquiring a plurality of working clothes sample images, labeling the working clothes sample images with labels, and determining all the working clothes sample images and the labels corresponding to the working clothes sample images as a training sample data set;
the dividing module is used for dividing the training sample data set into a training set, a verification set and a test set according to the proportion;
the target detection module is used for constructing a target detection model: inputting the images in the training sample data set into a backbone network, continuously outputting three layers of feature maps with different sizes through a Rep-PAN network at the neck layer according to the three-layer output of the backbone network, inputting the feature maps into a head layer, and performing three types of task prediction on the feature maps; and constructing a loss function of the target detection model;
the optimization module is used for inputting the training set into the constructed target detection model for training, continuously iterating the loss function until it converges to obtain the optimal network weights, validating the target detection model on the verification set, and testing it on the test set; and
the output module is used for setting a fixed threshold value and outputting a target detection result according to the fixed threshold value.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202210877409.6A 2022-07-25 2022-07-25 Factory tooling detection method and system based on neural network target detection Pending CN115131339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210877409.6A CN115131339A (en) 2022-07-25 2022-07-25 Factory tooling detection method and system based on neural network target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210877409.6A CN115131339A (en) 2022-07-25 2022-07-25 Factory tooling detection method and system based on neural network target detection

Publications (1)

Publication Number Publication Date
CN115131339A true CN115131339A (en) 2022-09-30

Family

ID=83385980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210877409.6A Pending CN115131339A (en) 2022-07-25 2022-07-25 Factory tooling detection method and system based on neural network target detection

Country Status (1)

Country Link
CN (1) CN115131339A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926405A (en) * 2021-02-01 2021-06-08 西安建筑科技大学 Method, system, equipment and storage medium for detecting wearing of safety helmet
CN113361425A (en) * 2021-06-11 2021-09-07 珠海路讯科技有限公司 Method for detecting whether worker wears safety helmet or not based on deep learning
CN114187542A (en) * 2021-11-29 2022-03-15 国网福建省电力有限公司建设分公司 Insulating glove detection method and system in electric power scene
CN114419530A (en) * 2021-12-01 2022-04-29 国电南瑞南京控制系统有限公司 Helmet wearing detection algorithm based on improved YOLOv5
CN114299540A (en) * 2021-12-27 2022-04-08 南方电网大数据服务有限公司 Personal wear detection method and device, computer equipment and storage medium
CN114332632A (en) * 2022-02-10 2022-04-12 山东中科先进技术研究院有限公司 Safety helmet identification device and method
CN216647401U (en) * 2022-02-10 2022-05-31 山东中科先进技术研究院有限公司 Safety helmet recognition device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination