CN110399905A - Method for detecting and describing the wearing condition of safety helmets in a construction scene - Google Patents

Method for detecting and describing the wearing condition of safety helmets in a construction scene

Info

Publication number
CN110399905A
CN110399905A (application CN201910593069.2A)
Authority
CN
China
Prior art keywords
safety helmet
picture
target
detection
helmet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910593069.2A
Other languages
Chinese (zh)
Other versions
CN110399905B (en)
Inventor
徐守坤
李宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University filed Critical Changzhou University
Priority to CN201910593069.2A priority Critical patent/CN110399905B/en
Publication of CN110399905A publication Critical patent/CN110399905A/en
Application granted granted Critical
Publication of CN110399905B publication Critical patent/CN110399905B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a method for detecting and describing the wearing condition of safety helmets in a construction scene, using images and natural language processing methods to detect and describe whether workers wear safety helmets. In image description, current neural-network-based methods lack interpretability and describe details insufficiently, and research on describing construction-scene images is scarce. The invention therefore generates descriptive sentences of helmet wearing by combining the YOLOv3 object detection algorithm with a rule- and template-based method: the anchor-box parameter values are initialized with K-means cluster analysis, the model is then trained and tested on a self-built data set, and finally image descriptions of helmet wearing are generated from predefined rules combined with sentence templates. The invention has a clear advantage in detection efficiency, generates more accurate descriptions, and can achieve the purpose of reducing the accident rate.

Description

Method for detecting and describing the wearing condition of safety helmets in a construction scene
Technical field
The present invention relates to the technical field of image understanding, and in particular to a method for detecting and describing the wearing condition of safety helmets in a construction scene.
Background art
In recent years, with the accelerating urbanization of China and the continuous development of infrastructure, construction accidents occur frequently. Construction sites such as substations, chemical plants, and mine workspaces are complex and carry certain risk factors; unsafe behavior by workers easily causes accidents, casualties, and economic losses. On a construction site the safety helmet is a safeguard of life: wearing one meets the requirements of behavioral norms and reduces a worker's operating risk to a certain extent. To ensure the personal safety of the staff and reduce the accident rate caused by not wearing safety helmets, describing the helmet-wearing behavior of construction personnel is particularly important.
Image description expresses the content of a picture with natural language processing methods on the basis of image recognition; it is a further step of understanding beyond image recognition. In construction scenes, research on image descriptions of workers wearing safety helmets has important significance and application value.
Most descriptions generated by current image description methods are global descriptions of the image: detailed information is easily lost and a certain level of accuracy is lacking. For pictures of construction scenes, generating image descriptions of how the construction personnel wear safety helmets is the basis for analyzing the situation of the construction site, and thus for judging the safety and feasibility of the construction and eliminating hidden safety hazards. Existing research on helmet wearing addresses it as an image recognition task: whether traditional algorithms or deep learning techniques are used to detect helmet wearing, considerable research results have been achieved, but with certain limitations, since the helmet-wearing condition of operating personnel has not yet been described in natural language.
Summary of the invention
The technical problem to be solved by the present invention is: to overcome the shortcomings of the prior art, the present invention provides a method for detecting and describing the wearing condition of safety helmets in a construction scene. It generates descriptive sentences of helmet wearing using the YOLOv3 object detection algorithm combined with a rule- and template-based method, and can judge and describe with sentences, more accurately, whether construction personnel wear safety helmets during construction, so as to eliminate hidden safety hazards and improve the safety factor of the construction scene; at the same time, its accurate detection and high detection speed can also provide theoretical support for intelligent monitoring robots.
The technical solution adopted by the present invention to solve its technical problem is a method for detecting and describing the wearing condition of safety helmets in a construction scene, comprising the following steps:
S1: making the data set
Pictures are collected by web crawler technology, and the data set is expanded with pictures collected on site. The collected data comprise construction-site pictures of helmet wearing under various background conditions, resolutions, and qualities, containing construction personnel wearing safety helmets and construction personnel not wearing them. The pictures total 5000, so the richness of the data set is reasonably assured: it covers various scene conditions and can reflect real scenes relatively completely. Making the data set takes two steps:
S1.1: making the helmet-wearing detection data set
Following the annotation format of the Pascal VOC2007 public data set, the open-source annotation tool LabelImg applies multi-label annotation to the picture samples, automatically generating corresponding xml annotation files that contain the object names and the coordinates of the ground-truth bounding boxes. The annotated target categories are: person (man), safety helmet (helmet), and person wearing a safety helmet (man wear helmet).
S1.2: the helmet-wearing image caption data set is made as follows: sentence annotation is applied to the data set annotated in step S1.1. Combining self-written annotation software with manual annotation, the caption annotation is divided into:
S1.2.1: the self-written annotation software reads the name and dimension information (width and height, in pixels) of every picture and assigns each picture a unique picture id;
S1.2.2: captions are added to the pictures with the self-written annotation software: five descriptive sentences are annotated manually for every picture, mainly describing the helmet wearing of the personnel in the construction scene, and each sentence is assigned a unique sentence id. Every picture thus corresponds to one picture id and five sentence ids, and the picture caption annotation data are stored in json format, as sketched below.
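A minimal Python sketch of such a json record follows; the field names and example captions are assumptions for illustration only, since the method specifies just a unique picture id and five sentence ids per picture.

import json

# Illustrative caption record for one picture (step S1.2); field names and
# captions here are hypothetical, not prescribed by the patent.
record = {
    "picture_id": 1,
    "file_name": "site_0001.jpg",              # hypothetical file name
    "width": 1920, "height": 1080,             # picture size in pixels
    "sentences": [
        {"sentence_id": sid, "caption": cap}
        for sid, cap in enumerate([
            "a man wears a helmet on his head",
            "one worker is wearing a yellow helmet",
            "a man on the construction site has a helmet on",
            "the worker wears a safety helmet",
            "one man wearing a helmet is working",
        ], start=1)
    ],
}
with open("captions.json", "w") as f:
    json.dump(record, f, ensure_ascii=False, indent=2)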
S2: target detection
S2.1: selection of the detection model
Existing deep-learning object detection algorithms fall broadly into two kinds: two-stage methods based on region detection and one-stage methods without region detection. A two-stage detection algorithm first obtains candidate regions and then classifies within the current region, which increases the time complexity of candidate-region methods, so detection takes longer. Among one-stage detection algorithms, target detection based on YOLOv3 has achieved good results; the idea of this class of algorithms is to predict the classes and positions of different targets directly with only one CNN, making it a fast and accurate detection technique. Compared with other detection methods, its detection accuracy is close to Faster-RCNN as long as the targets to be detected are not very small; compared with SSD, which also belongs to the one-stage methods, YOLOv3 is superior in both detection speed and accuracy. Considering the detection speed and detection accuracy of the algorithm together, YOLOv3 is chosen as the model for judging and describing whether safety helmets are worn in the construction scene, and the trained model can be applied well in engineering.
S2.2: preprocessing of the self-built data set
The self-built helmet-wearing data set follows the Pascal VOC format; the annotation information includes the class of each target and the bounding-box coordinates. The annotation information is normalized and converted into a training format usable by YOLOv3.
To normalize a sample annotation, the annotated coordinates are divided by the width and height of the image so that the final data fall between 0 and 1, which allows fast reading of the training sample data and meets the requirement of multi-scale training. The normalization formula is:
x = (xmin + xmax) / (2 · width), y = (ymin + ymax) / (2 · height)
w = (xmax − xmin) / width, h = (ymax − ymin) / height
where (xmax, xmin, ymax, ymin) is the original bounding-box annotation of the sample, (width, height) is the picture size, and (x, y, w, h) is the normalized annotation, (x, y) being the center-point coordinates of the target and (w, h) its width and height. In the normalized data samples, the bounding-box information of every target in a picture comprises the 5 parameters (x, y, w, h, class_id), where class_id is the target category number.
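A minimal Python sketch of this normalization (the function name voc_to_yolo is an illustrative assumption):

# Minimal sketch of the normalization in step S2.2: corner coordinates from
# a Pascal VOC annotation are divided by the image width/height so that
# (x, y, w, h) all fall between 0 and 1.
def voc_to_yolo(xmin, ymin, xmax, ymax, width, height, class_id):
    x = (xmin + xmax) / 2.0 / width     # normalized box-center x
    y = (ymin + ymax) / 2.0 / height    # normalized box-center y
    w = (xmax - xmin) / float(width)    # normalized box width
    h = (ymax - ymin) / float(height)   # normalized box height
    return (x, y, w, h, class_id)

# e.g. a 100 x 200 pixel box with its top-left corner at (50, 40)
# in a 416 x 416 image:
print(voc_to_yolo(50, 40, 150, 240, 416, 416, class_id=2))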
S2.3: K-means clustering to initialize the anchor boxes
YOLOv3 initializes the anchor boxes with the K-means clustering algorithm in order to predict the bounding-box coordinates; the sizes of the anchor boxes affect the detection accuracy. The K-means clustering algorithm of the original YOLOv3 uses the Euclidean distance formula, and its anchor parameter values were clustered on public data sets; those values are general but unsuitable for the self-built helmet-wearing data set, so new anchors must be designed before training to improve the detection rate of the bounding boxes. K-means clustering on the self-built helmet-wearing data set yields 9 anchors, arranged from small to large and distributed evenly over the feature maps of 3 scales: the first 3 anchors correspond to the 52 × 52 feature map, the middle 3 anchors to the 26 × 26 feature map, and the last 3 anchors to the 13 × 13 feature map. The 9 anchor parameter values finally obtained are (26,19), (49,36), (58,145), (76,58), (101,199), (123,111), (152,222), (223,261), (372,491), corresponding to the coordinates of the cluster-center points c1–c9; the width and height dimensions of an anchor correspond to the width and height of the target box at the cluster center, in pixels.
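A minimal sketch of such anchor initialization, assuming scikit-learn's KMeans with Euclidean distance as the text describes (init_anchors and the random box sizes are illustrative):

import numpy as np
from sklearn.cluster import KMeans

# Minimal sketch of step S2.3: cluster the annotated box sizes (width,
# height, in pixels) into 9 groups; the cluster centers become the anchor
# sizes, sorted from small to large and assigned three per feature-map
# scale (52x52, 26x26, 13x13).
def init_anchors(box_sizes, k=9):
    centers = KMeans(n_clusters=k, n_init=10).fit(box_sizes).cluster_centers_
    anchors = sorted((tuple(int(round(v)) for v in c) for c in centers),
                     key=lambda wh: wh[0] * wh[1])
    return anchors  # anchors[0:3] -> 52x52, [3:6] -> 26x26, [6:9] -> 13x13

sizes = np.random.randint(10, 400, size=(1000, 2))  # hypothetical box sizes
print(init_anchors(sizes))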
S2.4: training of the network model
As a one-stage object detection method, a major advantage of YOLOv3 is that it processes the whole picture with only a single CNN, locating the targets in the image and predicting their target categories at the same time, converting the target detection problem into a regression problem. Besides locating the target coordinate information, network training also requires predicting the confidence of each bounding box and the score of each predefined target class. The training steps of the network model are as follows:
S2.4.1: locating target coordinate information
An input picture is expressed as a tensor of size n × m × 3, where n and m are the width and height of the picture in pixels and 3 is the number of RGB channels. Images of different sizes are first automatically resized to a fixed 416 × 416, and the picture is then divided into a 13 × 13 grid; the grid cell containing a target's center point is responsible for detecting that target. Each grid cell predicts 3 bounding boxes covering it together with the confidences of these boxes, and each bounding box comprises 6 predicted quantities: x, y, w, h, confidence, and class_id, where (x, y) is the center of the predicted bounding box relative to the cell boundary, (w, h) is the size of the predicted box relative to the whole picture, confidence is the confidence used to discard boxes below a threshold, and class_id is the target category number. The prediction for each bounding box includes its coordinates, width, and height, computed as:
bx = σ(tx) + cx, by = σ(ty) + cy
bw = pw · e^(tw), bh = ph · e^(th)
where (bx, by, bw, bh) are the predicted center coordinates, width, and height of the bounding box, (tx, ty, tw, th) are the targets learned by the network, (cx, cy) is the coordinate offset of the grid cell, and (pw, ph) are the preset anchor dimensions.
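A minimal Python sketch of this decoding (the standard YOLOv3 transform; decode_box is an illustrative name):

import math

# Minimal sketch of the box decoding in step S2.4.1:
# bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy,
# bw = pw * exp(tw),     bh = ph * exp(th).
def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx      # box center, in grid-cell units
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)     # box size, scaled from the anchor prior
    bh = ph * math.exp(th)
    return bx, by, bw, bh

print(decode_box(0.2, -0.1, 0.3, 0.0, cx=6, cy=7, pw=101, ph=199))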
S2.4.2: predicting bounding-box confidence
After the target coordinate information is located, the confidence of each predicted bounding box is needed. According to the 3 annotated classes (C = 3): person (man), safety helmet (helmet), and person wearing a safety helmet (man wear helmet), 3 bounding boxes are predicted per grid cell and each bounding box comprises 6 predicted quantities, so the number of channels is 3 × (4 + 1 + 3) = 24, and the 3 output feature-map scales are 13 × 13 × 24, 26 × 26 × 24, and 52 × 52 × 24.
S2.4.3: predicting the scores of the predefined target classes
After the bounding-box confidences are predicted, the scores of the predefined target classes are predicted. To improve the detection of small targets, the idea of multi-scale prediction is used: predictions are made on the feature maps of the 3 different scales 13 × 13, 26 × 26, and 52 × 52.
S2.4.4: training the model
According to the characteristics of the self-built data set, corresponding modifications are made to the configuration file of the YOLOv3 network. Before training, the weight file provided on the official website must also be converted, according to the modified network configuration file, into a weight file under the Keras framework, so that the pre-trained model can be loaded to provide initialization parameters for training the model.
The batch size during training is set to 64, i.e. each iteration randomly selects 64 samples to participate in training; the subdivision is set to 8, i.e. each batch is split into 8 groups sent to the network, relieving memory pressure. Batch normalization (BN) regularizes the network model to raise its convergence speed. The momentum is set to 0.9 and the weight decay to 0.0005 to prevent the model from overfitting; the initial learning rate is set to 0.001 and decays to 1/10 of its value every 5000 iterations. The model is finally trained for 20000 iterations, taking 8 hours; as the number of iterations increases, the loss of the model decreases gradually. The model fits rapidly during the first 4000 iterations, with the loss dropping fast; after 10000 iterations the loss levels off, with only slight oscillation.
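A minimal sketch of the step-decay learning-rate schedule described above (learning_rate is an illustrative helper, not part of the patent):

# Minimal sketch of the schedule in step S2.4.4: initial learning rate
# 0.001, decayed to 1/10 of its value every 5000 iterations.
def learning_rate(iteration, base_lr=0.001, step=5000, gamma=0.1):
    return base_lr * gamma ** (iteration // step)

for it in (0, 4999, 5000, 10000, 19999):
    print(it, learning_rate(it))  # 0.001, 0.001, 1e-04, 1e-05, 1e-06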
S2.5: target detection
The input picture is first resized to 416 × 416, picture features are then extracted with the Darknet-53 network, the feature vectors are sent into a feature-pyramid structure for multi-scale prediction, and finally non-maximum suppression is applied to the predicted bounding boxes to eliminate repeated detections and obtain the final prediction result.
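A minimal NumPy sketch of the non-maximum suppression step (standard greedy NMS; the nms helper and the 0.45 threshold are illustrative assumptions):

import numpy as np

# Minimal sketch of the NMS in step S2.5: keep the highest-scoring box and
# discard any remaining box whose IoU with it exceeds the threshold; repeat.
def nms(boxes, scores, iou_thresh=0.45):
    # boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) array.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[10, 10, 110, 210], [12, 12, 112, 212], [300, 50, 380, 170]])
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # -> [0, 2]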
S3: sentence generation
The visual concepts in the image are first detected with the object detection algorithm; next, the predefined rules and sentence templates are combined; the detected visual concepts are then filled into the sentence templates, finally producing the descriptive sentence of helmet wearing. The sentence description rules and templates are defined as follows:
S3.1: definition of the sentence description rules
The 3 classes of target visual concepts, person, safety helmet, and person wearing a safety helmet, are extracted by the object detection of the previous stage. A triple (m, n, p), each element initialized to zero, counts the targets to be detected among the 3 classes, where m is the total number of people detected, n the total number of safety helmets detected, and p the number of people detected wearing helmets. The number of people wearing helmets must not exceed the total number of people on the construction site (0 ≤ p ≤ m); otherwise it is treated as a detection error (p > m), and no descriptive sentence of helmet wearing can be generated. If the number of detected helmet wearers equals the total number of people, i.e. p = m, everyone is wearing a helmet; if the number of detected helmet wearers differs from the total number of people, i.e. p ≠ m, some people are wearing helmets while others are not.
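A minimal sketch of this counting rule (describe_counts and its output sentences are illustrative; the patent fills the counts into templates as defined in step S3.2):

# Minimal sketch of the rule in step S3.1: m = people detected, n = helmets
# detected, p = people detected wearing helmets. n is counted but the
# wearing rule compares only p with m.
def describe_counts(m, n, p):
    if not 0 <= p <= m:
        return None  # p > m is treated as a detection error: no sentence
    if p == m:
        return "all %d people wear helmets" % m
    return "%d people wear helmets and %d people do not" % (p, m - p)

print(describe_counts(m=4, n=2, p=2))  # 2 wearers out of 4 people
print(describe_counts(m=2, n=3, p=3))  # detection error -> None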
S3.2: definition of the sentence description templates
The sentence description templates are generated from the picture caption annotations; the words come from the original picture caption annotations or from the visual concepts extracted by the object detection algorithm. A visual word is essentially a label whose purpose is to reserve a slot for the word describing a specific region of the image. The object detection algorithm extracts the visual concepts and, combined with the rule- and template-based method, generates the image description sentences of helmet wearing by construction personnel in the construction scene.
With the method for detecting and describing the wearing condition of safety helmets in a construction scene provided by the present invention, the YOLOv3 algorithm can judge more accurately whether construction personnel wear safety helmets during construction, so as to eliminate hidden safety hazards and improve the safety factor of the construction scene; at the same time, its accurate detection and high detection speed can also provide theoretical support for intelligent monitoring robots.
Brief description of the drawings
The present invention is further explained below with reference to the attached drawings and embodiments.
Fig. 1 is the flow chart of the algorithm of the invention.
Fig. 2 is the framework diagram of the algorithm of the invention.
Fig. 3 compares the image descriptions produced by the NIC method and by the embodiment method, where (a) shows single-person wearing descriptions and (b) shows multi-person wearing descriptions.
Fig. 4 shows visualized experimental results of the embodiment method.
Specific embodiments
The present invention is now described in detail with reference to the attached drawings. The figures are simplified schematic diagrams that illustrate only the basic structure of the invention, so they show only the components relevant to the invention.
As shown in Fig. 1, the method of the invention for detecting and describing the wearing condition of safety helmets in a construction scene is described in detail below with a specific embodiment.
The embodiment platform runs Linux with Ubuntu 16.04 as the operating system; the GPU is an NVIDIA GeForce GTX 1080Ti with CUDA 8.0 and cuDNN 6.0, and the memory is 12 GB. The model is trained and tested with the Keras deep learning framework. The single-stage object detection algorithm YOLOv3 is selected to detect the helmet wearing of the construction personnel in the pictures. The rule- and template-based method is combined with the helmet-wearing detection algorithm to generate image descriptions of the helmet wearing of construction personnel; the flow chart of the algorithm of the invention is shown in Fig. 1.
(1) Making the data set
Pictures are collected by web crawler technology, and the data set is expanded with pictures collected on site. The collected data comprise construction-site pictures of helmet wearing under various background conditions, resolutions, and qualities, containing construction personnel wearing safety helmets and construction personnel not wearing them. The pictures total 5000, so the richness of the data set is reasonably assured: it covers various scene conditions and can reflect real scenes relatively completely. Making the data set takes two steps:
1) Making the helmet-wearing detection data set
Following the annotation format of the Pascal VOC2007 public data set, the open-source annotation tool LabelImg applies multi-label annotation to the picture samples, automatically generating corresponding xml annotation files that contain the object names and the coordinates of the ground-truth bounding boxes. The annotated target categories are: person (man), safety helmet (helmet), and person wearing a safety helmet (man wear helmet).
2) The helmet-wearing image caption data set is made as follows: sentence annotation is applied to the data set annotated in step 1). Combining self-written annotation software with manual annotation, the caption annotation is divided into:
a) the self-written annotation software reads the picture name and dimension information (width and height) of every picture and assigns each picture a unique picture id;
b) captions are added to the pictures with the self-written annotation software: five descriptive sentences are annotated manually for every picture, mainly describing the helmet wearing of the personnel in the construction scene, and each sentence is assigned a unique sentence id. Every picture corresponds to one picture id and five sentence ids, and the picture caption annotation data are stored in json format.
(2) Helmet-wearing detection
The embodiment divides the self-built helmet-wearing detection data set into three groups at a ratio of 7:2:1, namely a training set, a validation set, and a test set: 3500 pictures serve as training samples, 1000 pictures as validation samples, and 500 pictures as test samples. The training set and validation set contain annotation information, while the test samples contain none and are used to verify the validity of the trained model.
1) Preprocessing the helmet-wearing detection data set
The self-built helmet-wearing data set follows the Pascal VOC format; the annotation information includes the class of each target and the bounding-box coordinates. The annotation information is normalized and converted into a training format usable by YOLOv3.
To normalize a sample annotation, the annotated coordinates are divided by the width and height of the image so that the final data fall between 0 and 1, which allows fast reading of the training sample data and meets the requirement of multi-scale training. The normalization formula is:
x = (xmin + xmax) / (2 · width), y = (ymin + ymax) / (2 · height)
w = (xmax − xmin) / width, h = (ymax − ymin) / height
where (xmax, xmin, ymax, ymin) is the original bounding-box annotation of the sample, (width, height) is the picture size, and (x, y, w, h) is the normalized annotation, (x, y) being the center-point coordinates of the target and (w, h) its width and height. In the normalized data samples, the bounding-box information of every target in a picture comprises the 5 parameters (x, y, w, h, class_id), where class_id is the target category number.
The anchor parameter values of public data sets are unsuitable for the data set of the invention, so the anchor parameters must be redefined according to the self-built helmet-wearing data set. K-means cluster analysis on the data set of the invention yields the 9 anchor parameter values (26,19), (49,36), (58,145), (76,58), (101,199), (123,111), (152,222), (223,261), (372,491), corresponding to the coordinates of the cluster-center points c1–c9; the width and height dimensions of an anchor correspond to the width and height of the target box at the cluster center.
2) Training and testing the model
According to the characteristics of the data set self-built for the invention, corresponding modifications are made to the configuration file of the YOLOv3 network. Before training, the weight file must be converted: the weight file provided on the official website is converted, according to the modified network configuration file, into a weight file under the Keras framework, so that the pre-trained model can be loaded to provide initialization parameters for training the model.
The batch size during training is set to 64, i.e. each iteration randomly selects 64 samples to participate in training; the subdivision is set to 8, i.e. each batch is split into 8 groups sent to the network, relieving memory pressure. Batch normalization (BN) regularizes the network model to raise its convergence speed. The momentum is set to 0.9 and the weight decay to 0.0005 to prevent the model from overfitting; the initial learning rate is set to 0.001 and decays to 1/10 of its value every 5000 iterations. The model is finally trained for 20000 iterations, taking 8 hours; the experiments show that the loss of the model decreases gradually as the number of iterations increases. The model fits rapidly during the first 4000 iterations, with the loss dropping fast; after 10000 iterations the loss levels off, with only slight oscillation.
The invention realizes helmet-wearing detection with the YOLOv3 object detection algorithm and at the same time carries out comparison experiments with the Faster-RCNN and SSD algorithms. After training, the model weight file is loaded and the model is tested and assessed on the test set. The average precision (AP) of the invention's algorithm for helmet detection is slightly below Faster-RCNN, but in detection speed and in the mean average precision (mAP) over the 3 target classes the invention's algorithm is superior to the other algorithms.
(3) Generating the image description sentences of helmet wearing
The visual concepts in the image are detected with the object detection algorithm; combined with the predefined rules and sentence templates, the detected visual concepts are filled into the sentence templates, finally producing the descriptive sentence of helmet wearing. The framework of the algorithm is shown in Fig. 2.
The self-built helmet-wearing image caption data set is divided into three groups at 7:2:1 to keep the experimental data samples consistent. The sizes of the training, validation, and test sets are 3500, 1000, and 500 respectively; the training and validation sets include the pictures and their corresponding caption annotations, while the test set has no picture captions, which is used to verify the validity of the method.
1) Preprocessing the helmet-wearing image caption data set
The self-built image caption data set is preprocessed with the following main operations: a) caption sentences longer than 15 words in the annotated samples are truncated; b) commas and periods in the annotated samples are deleted, and letter case is unified by converting uppercase words to lowercase; c) word frequencies are counted, and each word in the annotated samples is assigned a unique id; d) a vocabulary is built: words occurring at least 3 times in the annotated samples are stored in the vocabulary as 3-tuples (word id, word, word frequency), and the remaining words are treated as rare words and represented by "UNK".
The vocabulary is built on the self-built picture caption training set: the word total is 183047, with 2872 distinct words; screening words with a threshold of 3 produces an effective vocabulary of size 1343, i.e. the description vocabulary comprises 1343 distinct words.
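A minimal sketch of such a vocabulary build (build_vocab and the two example captions are illustrative; the real build runs over the captions of the 3500 training pictures):

from collections import Counter

# Minimal sketch of step (3) 1): lowercase, strip punctuation, count word
# frequencies, keep words seen at least min_count times, and reserve id 0
# for the rare-word token "UNK".
def build_vocab(captions, min_count=3):
    counts = Counter(w for c in captions
                     for w in c.lower().replace(",", "").replace(".", "").split())
    kept = [w for w, n in counts.most_common() if n >= min_count]
    return {w: i for i, w in enumerate(["UNK"] + kept)}

vocab = build_vocab(["Two men wear helmets on their heads.",
                     "A man wears a helmet on his head."], min_count=1)
print(vocab)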
2) Defining the sentence description rules and templates
a) Definition of the sentence description rules. The 3 classes of target visual concepts, person, safety helmet, and person wearing a safety helmet, are extracted by the object detection of the previous stage. A triple (m, n, p), each element initialized to zero, counts the targets to be detected among the 3 classes, where m is the total number of people detected, n the total number of safety helmets detected, and p the number of people detected wearing helmets. The number of people wearing helmets must not exceed the total number of people on the construction site (0 ≤ p ≤ m); otherwise it is treated as a detection error (p > m), and no descriptive sentence of helmet wearing can be generated. If the number of detected helmet wearers equals the total number of people, i.e. p = m, everyone is wearing a helmet; if the number of detected helmet wearers differs from the total number of people, i.e. p ≠ m, some people are wearing helmets while others are not.
b) Definition of the sentence description templates. The sentence description templates are generated from the picture caption annotations; the words come from the original picture caption annotations and from the visual concepts extracted by the object detection algorithm. A visual word is essentially a label whose purpose is to reserve a slot for the word describing a specific region of the image. The object detection algorithm extracts the visual concepts and, combined with the rule- and template-based method, generates the image description sentences of helmet wearing by construction personnel in the construction scene.
c) Sentence generation
The sentence is finally produced by combining the previously defined sentence description rules with the sentence description templates. For example, the sentence template may be "<num-1> men <verb-1> <noun-1> on their heads"; the YOLOv3 algorithm then extracts the visual concepts of the region (man, wear, helmet), and combined with the predefined description rules, m = 2 and p = 2 indicate that all construction personnel in the figure wear safety helmets. Filling the template (<num-1> → two, <verb-1> → wear, <noun-1> → helmets) finally generates the descriptive sentence of the image: "two men wear helmets on their heads."
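A minimal sketch of this slot filling (fill_template and NUM_WORDS are illustrative helpers built around the template quoted above):

# Minimal sketch of step (3) 2) c): substitute detected visual concepts and
# counts into the <num-1>, <verb-1>, <noun-1> slots of a sentence template.
NUM_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}

def fill_template(template, num, verb, noun):
    return (template.replace("<num-1>", NUM_WORDS.get(num, str(num)))
                    .replace("<verb-1>", verb)
                    .replace("<noun-1>", noun))

print(fill_template("<num-1> men <verb-1> <noun-1> on their heads",
                    num=2, verb="wear", noun="helmets"))
# -> two men wear helmets on their heads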
(4) Analysis of results
To verify the validity of the invention's algorithm, it is compared on the self-built helmet-wearing image caption data set with image description algorithms such as NIC, Soft-Attention, and Adaptive. The invention's algorithm ties with the Adaptive algorithm on the BLEU-4 score, but its scores on the other evaluation metrics increase. The reason is that the invention's algorithm performs helmet-wearing detection, strengthening the correspondence between image regions and descriptive sentences, and describes helmet wearing with the rule- and template-based method, so it can accurately describe the numbers of people wearing and not wearing helmets in a picture.
To further verify the validity of the method of the invention, test pictures without caption annotations are tested, and the descriptive sentences generated for the same pictures by the method of the invention and by other algorithms are compared, as shown in Fig. 3 (since the annotation language is English, the experiments output English descriptive sentences). Fig. 3(a) shows single-person helmet-wearing descriptions. The left picture is taken under good illumination: NIC generates "the working man wears a yellow helmet", while the method of the invention generates "a man wears a helmet on his head". The right picture is taken under insufficient illumination: NIC generates "the man with a white helmet is wearing a blue shirt", while the method of the invention generates "a man wears a helmet on his head". Although the sentences generated by the two algorithms differ slightly, both describe the single-person wearing case well. Fig. 3(b) shows multi-person helmet-wearing descriptions. The left picture shows a case in which only some of the staff wear helmets: NIC generates "three men wear helmets in the factory", while the method of the invention generates "two men wear helmets and a man without a helmet". The right picture shows a case in which the target sizes differ greatly: NIC generates "two men are wearing helmets on the field", while the method of the invention generates "three men all wear helmets on their heads". Each algorithm thus has its strengths and weaknesses. The sentences generated by the NIC algorithm are diverse, but because that algorithm easily loses detailed information, it cannot accurately describe the number of people wearing helmets. Since the invention generates image descriptions with the rule- and template-based method, the generated sentences are somewhat lacking in diversity, but the method of the invention describes the number of people wearing helmets well.
Fig. 4 shows the visualized experimental results of the invention in 6 pictures; from left to right and from top to bottom, the image descriptions generated for each test picture are:
a man without a helmet is working hard
the man wearing a helmet is standing in the construction site
two persons without helmets are working
a man in an orange helmet wears an orange vest
a man in a white helmet is smiling
a man is wearing a blue helmet on his head
The results in Fig. 4 show that, whether for single-person or multi-person helmet wearing, and whether in a simple construction scene or a complex one, this method realizes image semantic description well. The method of the invention can therefore produce relatively accurate image semantic descriptions of the helmet wearing of construction personnel in different complex scenes.
Taking the above ideal embodiments of the present invention as inspiration, and through the above description, relevant staff can make various changes and amendments without departing from the scope of the present invention. The technical scope of this invention is not confined to the content of the specification; its technical scope must be determined according to the scope of the claims.

Claims (5)

1. A method for detecting and describing the wearing condition of safety helmets in a construction scene, characterized by comprising the following steps:
S1: making the data set;
pictures of the construction-site scene are collected by web crawler technology or by collecting pictures on site; the collected data comprise construction-site pictures of helmet wearing under various background conditions, resolutions, and qualities, the pictures containing construction personnel wearing safety helmets and construction personnel not wearing them, and all acquired pictures serve as the helmet-wearing data set; making the helmet-wearing data set comprises: making the helmet-wearing detection data set and making the helmet-wearing image caption data set;
S2: target detection;
S2.1: selection of the detection model: considering the detection speed and the detection accuracy of the algorithm together, YOLOv3 is chosen as the model for judging and describing whether safety helmets are worn in the construction scene;
S2.2: preprocessing of the self-built data set: the annotation information of the helmet-wearing data set made in step S1 is normalized and converted into a training format usable by YOLOv3;
S2.3: K-means clustering to initialize the anchor boxes;
the K-means clustering algorithm initializes the anchor boxes on the helmet-wearing data set normalized in step S2.2, in order to predict the bounding-box coordinates;
S2.4: training of the network model;
the annotated target coordinate information is located first, then the confidence of each annotated target bounding box and the score of each predefined target class are predicted, and finally unannotated test pictures are fed into the trained target detection network model: if the score of a detected target exceeds the set threshold, the detected target is outlined in the image and its score is output; otherwise the target in the image cannot be detected;
S2.5: network testing
the input picture is first resized to 416 × 416, picture features are then extracted with the Darknet-53 network, the feature vectors are sent into a feature-pyramid structure for multi-scale prediction, and finally non-maximum suppression is applied to the predicted bounding boxes to eliminate repeated detections and obtain the final prediction result;
S3: sentence generation;
the visual concepts in the image are first detected with the object detection algorithm, the predefined rules and sentence templates are then combined, the detected visual concepts are filled into the sentence templates, and the descriptive sentence of helmet wearing is finally produced.
2. The method for detecting and describing the wearing condition of safety helmets in a construction scene of claim 1, characterized in that making the helmet-wearing data set in step S1 specifically comprises:
S1.1: making the helmet-wearing detection data set;
following the annotation format of the Pascal VOC2007 public data set, the open-source annotation tool LabelImg applies multi-label annotation to the picture samples, automatically generating corresponding xml annotation files that contain the object names and the coordinates of the ground-truth bounding boxes; the annotated target categories are: person, safety helmet, and person wearing a safety helmet;
S1.2: making the helmet-wearing image caption data set;
sentence annotation is applied to the data set annotated in step S1.1; combining self-written annotation software with manual annotation, the caption annotation is divided into:
S1.2.1: the self-written annotation software reads the name and dimension information of every picture and assigns each picture a unique picture id;
S1.2.2: captions are added to the pictures with the self-written annotation software: five descriptive sentences are annotated manually for every picture, mainly describing the helmet wearing of the personnel in the construction scene, and each sentence is assigned a unique sentence id; every picture corresponds to one picture id and five sentence ids, and the picture caption annotation data are stored in json format.
3. The method for detecting and describing the wearing condition of safety helmets in a construction scene of claim 2, characterized in that normalizing the annotation information in step S2.2 specifically comprises:
to normalize a sample annotation, the annotated coordinates are divided by the width and height of the image so that the final data fall between 0 and 1, which allows fast reading of the training sample data and meets the requirement of multi-scale training; the normalization formula is:
x = (xmin + xmax) / (2 · width), y = (ymin + ymax) / (2 · height)
w = (xmax − xmin) / width, h = (ymax − ymin) / height
where (xmax, xmin, ymax, ymin) is the original bounding-box annotation of the sample, (width, height) is the picture size, and (x, y, w, h) is the normalized annotation, (x, y) being the center-point coordinates of the target and (w, h) its width and height; in the normalized data samples, the bounding-box information of every target in a picture comprises the 5 parameters (x, y, w, h, class_id), where class_id is the target category number.
4. The method for detecting and describing the wearing condition of safety helmets in a construction scene of claim 2, characterized in that the training steps of the network model in step S2.4 specifically comprise:
S2.4.1: locating target coordinate information;
an input picture is expressed as a tensor of size n × m × 3, where n and m are the width and height of the picture in pixels and 3 is the number of RGB channels; images of different sizes are first automatically resized to a fixed 416 × 416, the picture is then divided into a 13 × 13 grid, and the grid cell containing a target's center point is responsible for detecting that target; each grid cell predicts 3 bounding boxes covering it together with the confidences of these boxes, and each bounding box comprises 6 predicted quantities: x, y, w, h, confidence, and class_id, where (x, y) is the center of the predicted bounding box relative to the cell boundary, (w, h) is the size of the predicted box relative to the whole picture, confidence is the confidence used to discard boxes below a threshold, and class_id is the target category number; the prediction for each bounding box includes its coordinates, width, and height, computed as:
bx = σ(tx) + cx, by = σ(ty) + cy
bw = pw · e^(tw), bh = ph · e^(th)
where (bx, by, bw, bh) are the predicted center coordinates, width, and height of the bounding box, (tx, ty, tw, th) are the targets learned by the network, (cx, cy) is the coordinate offset of the grid cell, and (pw, ph) are the preset anchor dimensions;
S2.4.2: predicting bounding-box confidence;
after the target coordinate information is located, the confidence of each predicted bounding box is needed; according to the 3 annotated classes: person, safety helmet, and person wearing a safety helmet, 3 bounding boxes are predicted per grid cell and each bounding box comprises 6 predicted quantities, so the number of channels is 3 × (4 + 1 + 3) = 24, and the 3 output feature-map scales are 13 × 13 × 24, 26 × 26 × 24, and 52 × 52 × 24, in pixels;
S2.4.3: predicting the scores of the predefined target classes;
after the bounding-box confidences are predicted, the scores of the predefined target classes are predicted; to improve the detection of small targets, multi-scale prediction is used, predicting on the feature maps of the 3 different scales 13 × 13, 26 × 26, and 52 × 52, in pixels;
S2.4.4: training the model
according to the characteristics of the self-built helmet-wearing data set, corresponding modifications are made to the configuration file of the YOLOv3 network; before training, the weight file is converted, according to the modified network configuration file, into a weight file under the Keras framework, so that the pre-trained model can be loaded to provide initialization parameters for training the model.
5. The method for detecting and describing the wearing condition of safety helmets in a construction scene of claim 2, characterized in that defining the sentence description rules and templates in step S3 specifically comprises:
S3.1: definition of the sentence description rules
the 3 classes of target visual concepts, person, safety helmet, and person wearing a safety helmet, are extracted by the object detection of the previous stage; a triple (m, n, p), each element initialized to zero, counts the targets to be detected among the 3 classes, where m is the total number of people detected, n the total number of safety helmets detected, and p the number of people detected wearing helmets; the number of people wearing helmets must not exceed the total number of people on the construction site (0 ≤ p ≤ m), otherwise it is treated as a detection error (p > m) and no descriptive sentence of helmet wearing can be generated; if the number of detected helmet wearers equals the total number of people, i.e. p = m, everyone is wearing a helmet; if the number of detected helmet wearers differs from the total number of people, i.e. p ≠ m, some people are wearing helmets while others are not;
S3.2: definition of the sentence description templates
the sentence description templates are generated from the picture caption annotations, the words coming from the original picture caption annotations or from the visual concepts extracted by the object detection algorithm; a visual word is essentially a label whose purpose is to reserve a slot for the word describing a specific region of the image; the object detection algorithm extracts the visual concepts and, combined with the rule- and template-based method, the image description sentences of helmet wearing by construction personnel in the construction scene are generated.
CN201910593069.2A 2019-07-03 2019-07-03 Method for detecting and describing wearing condition of safety helmet in construction scene Active CN110399905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910593069.2A CN110399905B (en) 2019-07-03 2019-07-03 Method for detecting and describing wearing condition of safety helmet in construction scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910593069.2A CN110399905B (en) 2019-07-03 2019-07-03 Method for detecting and describing wearing condition of safety helmet in construction scene

Publications (2)

Publication Number Publication Date
CN110399905A true CN110399905A (en) 2019-11-01
CN110399905B CN110399905B (en) 2023-03-24

Family

ID=68322708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910593069.2A Active CN110399905B (en) 2019-07-03 2019-07-03 Method for detecting and describing wearing condition of safety helmet in construction scene

Country Status (1)

Country Link
CN (1) CN110399905B (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852283A (en) * 2019-11-14 2020-02-28 南京工程学院 Helmet wearing detection and tracking method based on improved YOLOv3
CN111062429A (en) * 2019-12-12 2020-04-24 上海点泽智能科技有限公司 Chef cap and mask wearing detection method based on deep learning
CN111160440A (en) * 2019-12-24 2020-05-15 广东省智能制造研究所 Helmet wearing detection method and device based on deep learning
CN111209822A (en) * 2019-12-30 2020-05-29 南京华图信息技术有限公司 Face detection method of thermal infrared image
CN111401276A (en) * 2020-03-20 2020-07-10 广东光速智能设备有限公司 Method and system for identifying wearing of safety helmet
CN111461028A (en) * 2020-04-02 2020-07-28 杭州视在科技有限公司 Mask detection model training and detection method, medium and device in complex scene
CN111582068A (en) * 2020-04-22 2020-08-25 北京交通大学 Method for detecting wearing state of personal mask
CN111598040A (en) * 2020-05-25 2020-08-28 中建三局第二建设工程有限责任公司 Construction worker identity identification and safety helmet wearing detection method and system
CN111753805A (en) * 2020-07-08 2020-10-09 深延科技(北京)有限公司 Method and device for detecting wearing of safety helmet
CN111831853A (en) * 2020-07-16 2020-10-27 深圳市商汤科技有限公司 Information processing method, device, equipment and system
CN111881730A (en) * 2020-06-16 2020-11-03 北京华电天仁电力控制技术有限公司 Wearing detection method for on-site safety helmet of thermal power plant
CN111881831A (en) * 2020-07-28 2020-11-03 南京拟态智能技术研究院有限公司 Multi-scale feature fusion target detection system based on deep learning
CN111986255A (en) * 2020-09-07 2020-11-24 北京凌云光技术集团有限责任公司 Multi-scale anchor initialization method and device of image detection model
CN112131983A (en) * 2020-09-11 2020-12-25 桂林理工大学 Helmet wearing detection method based on improved YOLOv3 network
CN112257620A (en) * 2020-10-27 2021-01-22 广州华微明天软件技术有限公司 Safe wearing condition identification method
CN112329532A (en) * 2020-09-30 2021-02-05 浙江汉德瑞智能科技有限公司 Automatic tracking safety helmet monitoring method based on YOLOv4
CN112347943A (en) * 2020-11-09 2021-02-09 哈尔滨理工大学 Anchor optimization safety helmet detection method based on YOLOV4
CN112487864A (en) * 2020-11-02 2021-03-12 江阴市智行工控科技有限公司 Method for detecting small target safety helmet and protective clothing for construction site
CN112613441A (en) * 2020-12-29 2021-04-06 新疆爱华盈通信息技术有限公司 Abnormal driving behavior recognition and early warning method and electronic equipment
CN112906497A (en) * 2021-01-29 2021-06-04 中国海洋大学 Embedded safety helmet detection method and equipment
CN113033289A (en) * 2021-01-29 2021-06-25 南瑞集团有限公司 Safety helmet wearing inspection method, device and system based on DSSD algorithm
CN113128476A (en) * 2021-05-17 2021-07-16 广西师范大学 Low-power consumption real-time helmet detection method based on computer vision target detection
CN113486860A (en) * 2021-08-03 2021-10-08 云南大学 YOLOv 5-based safety protector wearing detection method and system
CN113516076A (en) * 2021-07-12 2021-10-19 大连民族大学 Improved lightweight YOLO v4 safety protection detection method based on attention mechanism
CN113553963A (en) * 2021-07-27 2021-10-26 广联达科技股份有限公司 Detection method and device of safety helmet, electronic equipment and readable storage medium
CN113780322A (en) * 2021-02-09 2021-12-10 北京京东振世信息技术有限公司 Safety detection method and device
CN114146283A (en) * 2021-08-26 2022-03-08 上海大学 Attention training system and method based on target detection and SSVEP
CN114155669A (en) * 2021-11-30 2022-03-08 安徽富煌钢构股份有限公司 Building construction safety early warning protection system based on BIM
CN114627425A (en) * 2021-06-11 2022-06-14 珠海路讯科技有限公司 Method for detecting whether worker wears safety helmet or not based on deep learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446926A (en) * 2016-07-12 2017-02-22 重庆大学 Transformer station worker helmet wear detection method based on video analysis
CN108875595A (en) * 2018-05-29 2018-11-23 重庆大学 A kind of Driving Scene object detection method merged based on deep learning and multilayer feature
CN109145946A (en) * 2018-07-09 2019-01-04 暨南大学 A kind of identification of intelligent image and description method
CN109447168A (en) * 2018-11-05 2019-03-08 江苏德劭信息科技有限公司 A kind of safety cap wearing detection method detected based on depth characteristic and video object
CN109472220A (en) * 2018-10-23 2019-03-15 广东电网有限责任公司 A kind of substation's worker safety helmet detection method and its system based on Faster R-CNN
CN109635697A (en) * 2018-12-04 2019-04-16 国网浙江省电力有限公司电力科学研究院 Electric operating personnel safety dressing detection method based on YOLOv3 target detection
CN109829429A (en) * 2019-01-31 2019-05-31 福州大学 Security protection sensitive articles detection method under monitoring scene based on YOLOv3
CN109903331A (en) * 2019-01-08 2019-06-18 杭州电子科技大学 A kind of convolutional neural networks object detection method based on RGB-D camera

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446926A (en) * 2016-07-12 2017-02-22 重庆大学 Video analysis-based helmet wearing detection method for substation workers
CN108875595A (en) * 2018-05-29 2018-11-23 重庆大学 Driving scene object detection method based on deep learning and multi-layer feature fusion
CN109145946A (en) * 2018-07-09 2019-01-04 暨南大学 Intelligent image recognition and description method
CN109472220A (en) * 2018-10-23 2019-03-15 广东电网有限责任公司 Faster R-CNN-based safety helmet detection method and system for substation workers
CN109447168A (en) * 2018-11-05 2019-03-08 江苏德劭信息科技有限公司 Safety cap wearing detection method based on deep features and video object detection
CN109635697A (en) * 2018-12-04 2019-04-16 国网浙江省电力有限公司电力科学研究院 Safety attire detection method for electric power workers based on YOLOv3 target detection
CN109903331A (en) * 2019-01-08 2019-06-18 杭州电子科技大学 Convolutional neural network object detection method based on an RGB-D camera
CN109829429A (en) * 2019-01-31 2019-05-31 福州大学 YOLOv3-based detection method for security-sensitive articles in surveillance scenes

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852283A (en) * 2019-11-14 2020-02-28 南京工程学院 Helmet wearing detection and tracking method based on improved YOLOv3
CN111062429A (en) * 2019-12-12 2020-04-24 上海点泽智能科技有限公司 Chef cap and mask wearing detection method based on deep learning
CN111160440A (en) * 2019-12-24 2020-05-15 广东省智能制造研究所 Helmet wearing detection method and device based on deep learning
CN111160440B (en) * 2019-12-24 2023-11-21 广东省智能制造研究所 Deep learning-based safety helmet wearing detection method and device
CN111209822A (en) * 2019-12-30 2020-05-29 南京华图信息技术有限公司 Face detection method for thermal infrared images
CN111401276A (en) * 2020-03-20 2020-07-10 广东光速智能设备有限公司 Method and system for identifying wearing of safety helmet
CN111401276B (en) * 2020-03-20 2024-05-17 广东光速智能设备有限公司 Safety helmet wearing identification method and system
CN111461028A (en) * 2020-04-02 2020-07-28 杭州视在科技有限公司 Mask detection model training and detection method, medium and device for complex scenes
CN111582068A (en) * 2020-04-22 2020-08-25 北京交通大学 Method for detecting wearing state of personal mask
CN111598040B (en) * 2020-05-25 2024-05-14 中建三局第二建设工程有限责任公司 Construction worker identity recognition and safety helmet wearing detection method and system
CN111598040A (en) * 2020-05-25 2020-08-28 中建三局第二建设工程有限责任公司 Construction worker identity recognition and safety helmet wearing detection method and system
CN111881730A (en) * 2020-06-16 2020-11-03 北京华电天仁电力控制技术有限公司 On-site safety helmet wearing detection method for thermal power plants
CN111753805A (en) * 2020-07-08 2020-10-09 深延科技(北京)有限公司 Method and device for detecting wearing of safety helmet
CN111753805B (en) * 2020-07-08 2024-06-07 深延科技(北京)有限公司 Method and device for detecting wearing of safety helmet
CN111831853A (en) * 2020-07-16 2020-10-27 深圳市商汤科技有限公司 Information processing method, device, equipment and system
CN111881831A (en) * 2020-07-28 2020-11-03 南京拟态智能技术研究院有限公司 Multi-scale feature fusion target detection system based on deep learning
CN111986255A (en) * 2020-09-07 2020-11-24 北京凌云光技术集团有限责任公司 Multi-scale anchor initialization method and device of image detection model
CN111986255B (en) * 2020-09-07 2024-04-09 凌云光技术股份有限公司 Multi-scale anchor initialization method and device of image detection model
CN112131983A (en) * 2020-09-11 2020-12-25 桂林理工大学 Helmet wearing detection method based on improved YOLOv3 network
CN112329532A (en) * 2020-09-30 2021-02-05 浙江汉德瑞智能科技有限公司 Automatic tracking safety helmet monitoring method based on YOLOv4
CN112257620A (en) * 2020-10-27 2021-01-22 广州华微明天软件技术有限公司 Safety wearing condition recognition method
CN112257620B (en) * 2020-10-27 2021-10-26 广州华微明天软件技术有限公司 Safety wearing condition recognition method
CN112487864A (en) * 2020-11-02 2021-03-12 江阴市智行工控科技有限公司 Method for detecting small-target safety helmets and protective clothing on construction sites
CN112347943A (en) * 2020-11-09 2021-02-09 哈尔滨理工大学 Anchor-optimized safety helmet detection method based on YOLOv4
CN112613441A (en) * 2020-12-29 2021-04-06 新疆爱华盈通信息技术有限公司 Abnormal driving behavior recognition and early warning method and electronic equipment
CN112906497A (en) * 2021-01-29 2021-06-04 中国海洋大学 Embedded safety helmet detection method and equipment
CN113033289A (en) * 2021-01-29 2021-06-25 南瑞集团有限公司 Safety helmet wearing inspection method, device and system based on DSSD algorithm
CN113780322B (en) * 2021-02-09 2023-11-03 北京京东振世信息技术有限公司 Safety detection method and device
CN113780322A (en) * 2021-02-09 2021-12-10 北京京东振世信息技术有限公司 Safety detection method and device
CN113128476A (en) * 2021-05-17 2021-07-16 广西师范大学 Low-power consumption real-time helmet detection method based on computer vision target detection
CN114627425A (en) * 2021-06-11 2022-06-14 珠海路讯科技有限公司 Deep learning-based method for detecting whether a worker is wearing a safety helmet
CN114627425B (en) * 2021-06-11 2024-05-24 珠海路讯科技有限公司 Deep learning-based method for detecting whether a worker is wearing a safety helmet
CN113516076B (en) * 2021-07-12 2023-09-01 大连民族大学 Lightweight YOLOv4 safety protection detection method improved with an attention mechanism
CN113516076A (en) * 2021-07-12 2021-10-19 大连民族大学 Lightweight YOLOv4 safety protection detection method improved with an attention mechanism
CN113553963A (en) * 2021-07-27 2021-10-26 广联达科技股份有限公司 Detection method and device of safety helmet, electronic equipment and readable storage medium
CN113486860A (en) * 2021-08-03 2021-10-08 云南大学 YOLOv5-based safety protective equipment wearing detection method and system
CN114146283A (en) * 2021-08-26 2022-03-08 上海大学 Attention training system and method based on target detection and SSVEP
CN114155669A (en) * 2021-11-30 2022-03-08 安徽富煌钢构股份有限公司 Building construction safety early warning protection system based on BIM

Also Published As

Publication number Publication date
CN110399905B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN110399905A (en) The detection and description method of safety cap wear condition in scene of constructing
CN107391703B (en) Image library establishment method and system, image library, and image classification method
CN103268495B (en) Human behavior modeling and recognition method based on prior-knowledge clustering in computer systems
CN109492581 (en) Human motion recognition method based on the TP-STG framework
CN109446925A (en) Electric equipment maintenance algorithm based on convolutional neural networks
CN100452081C (en) Human eye positioning and human eye state recognition method
CN106778506A (en) Expression recognition method fusing depth images and multi-channel features
CN110532920A (en) Face recognition method for small datasets based on the FaceNet method
Gómez et al. Determining the accuracy in image supervised classification problems
CN101853397A (en) Bionic human face detection method based on human visual characteristics
CN109325398A (en) Face attribute analysis method based on transfer learning
CN109635875A (en) End-to-end network interface detection method based on deep learning
CN110458095A (en) Effective gesture recognition method, control method, device and electronic equipment
CN103500342B (en) Human body behavior recognition method based on accelerometer
CN106815604A (en) Gaze point detection method based on multi-layer information fusion
CN107748873A (en) Multi-modal target tracking method fusing background information
CN104008375B (en) Integrated face recognition method based on feature fusion
CN106529503A (en) Method for recognizing facial emotion using an integrated convolutional neural network
CN106355138A (en) Face recognition method based on deep learning and key feature extraction
CN114842208B (en) Deep learning-based target detection method for bird species harmful to power grids
CN109934047A (en) Deep learning-based face recognition system and face recognition method
CN109766822A (en) Neural network-based gesture recognition method and system
CN109871851A (en) Chinese character handwriting normalization judgment method based on a convolutional neural network algorithm
CN109902564A (en) Abnormal event detection method based on a structural-similarity sparse autoencoder network
CN106203256A (en) Low-resolution face recognition method based on sparsity-preserving canonical correlation analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant