CN117037023A - Unmanned forklift tray jack detection method and system based on Yolov5-Pallet - Google Patents


Info

Publication number
CN117037023A
Authority
CN
China
Prior art keywords
pallet
yolov5
unmanned forklift
steps
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310779769.7A
Other languages
Chinese (zh)
Inventor
周尖
张旭
占升荣
孙知信
孙哲
赵学健
胡冰
徐玉华
汪胡青
宫婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Yougu Express Intelligent Technology Co ltd
Original Assignee
Anhui Yougu Express Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Yougu Express Intelligent Technology Co ltd filed Critical Anhui Yougu Express Intelligent Technology Co ltd
Priority to CN202310779769.7A priority Critical patent/CN117037023A/en
Publication of CN117037023A publication Critical patent/CN117037023A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method for detecting the jacks of an unmanned forklift tray based on Yolov5-Pallet, which comprises the steps that a video acquisition device acquires a video stream to be detected; the video stream to be detected is transmitted to a computer terminal controlling the unmanned forklift; the computer terminal performs target detection; and the computer terminal generates corresponding instructions by combining the real-time target detection result with the work task and transmits them to the unmanned forklift. The application has a fast response speed and can detect continuously without interruption, greatly improving production efficiency and reducing production-line downtime; it ensures the accuracy and stability of tray-jack detection, avoiding incomplete insertion or insertion into the wrong hole, thereby reducing the risks of cargo damage and staff injury; and it reduces the hardware cost of the traditional unmanned forklift while also reducing the cargo damage caused by manual operation and error.

Description

Unmanned forklift tray jack detection method and system based on Yolov5-Pallet
Technical Field
The application relates to the field of computer vision and intelligent logistics equipment, in particular to a method and a system for detecting the jacks of an unmanned forklift tray based on Yolov5-Pallet.
Background
Against the broad background of the comprehensive promotion of intelligent manufacturing, the trend toward intelligent and unmanned logistics operation is increasingly clear. Unmanned forklifts are gradually being accepted by enterprises as one of the main ways of implementing unmanned warehouse operations. Compared with traditional manually operated forklifts, unmanned forklifts are safer, save cost, have strong anti-interference capability, run on a stable and reliable cycle, and can operate in special environments, so they are gradually becoming a main trend in industrial logistics automation.
During the use of an unmanned forklift, a large amount of tray-jack detection and recognition work is required so that the fork tines can be extended at the exact position to lift the tray, ensuring that goods are carried and delivered steadily. If recognition of the tray jacks is too slow, the working efficiency of the unmanned forklift is seriously affected; if recognition is inaccurate, the load becomes unstable and, in severe cases, the cargo may collapse, causing economic loss or safety accidents. Therefore, when the unmanned forklift works, visual recognition of the tray jacks must be performed both efficiently and precisely.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description of the application and in the title of the application, which may not be used to limit the scope of the application.
The application is provided in view of the problems of high recognition cost and insufficient detection precision of the existing unmanned forklift.
Therefore, the application aims to realize high-efficiency and high-precision identification of the tray and the jack by using the target detection technology in computer vision.
In order to solve the technical problems, the application provides the following technical scheme:
in a first aspect, an embodiment of the present application provides a method for detecting a jack of an unmanned forklift Pallet based on Yolov5-Pallet, which includes a video acquisition device acquiring a video stream to be detected; transmitting the detected video stream to a computer terminal for controlling the unmanned forklift; the computer terminal carries out target detection on the tray and the jack, and combines a real-time target detection result and a work task to generate a corresponding instruction and transmits the corresponding instruction to the unmanned forklift; and the unmanned forklift forks and takes the tray according to the operation instruction.
As a preferable scheme of the method for detecting the jacks of the unmanned forklift Pallet based on the Yolov5-Pallet, the application comprises the following steps: the target detection comprises the following steps: collecting images and establishing a data set; improving the Yolov5 network structure to obtain an improved Yolov5-Pallet target detection algorithm; and training a Yolov5-Pallet target detection model.
As a preferable scheme of the method for detecting the jacks of the unmanned forklift Pallet based on the Yolov5-Pallet, the application comprises the following steps: the improved Yolov5 network structure comprises the following steps: adding a characteristic fusion palettFPN based on BiFPN improvement into the original Yolov5 network structure to replace PANet which is originally used as a Neck structure; an attention mechanism is added into the original Yolov5 network structure.
As a preferable scheme of the method for detecting the jacks of the unmanned forklift Pallet based on the Yolov5-Pallet, the application comprises the following steps: PalletFPN learns the importance of different input features according to their resolution, the specific formula being as follows:
w_i = (a_i × b_i) / Σ_{j=1}^{n} (a_j × b_j)
wherein w_i is the weight value of the i-th layer input feature, a_i and b_i are the numbers of pixels in the horizontal and vertical directions of the i-th layer input feature map, and n is the total number of input features.
As a preferable scheme of the method for detecting the jacks of the unmanned forklift Pallet based on the Yolov5-Pallet, the application comprises the following steps: in the improved Yolov5-Pallet target detection algorithm, the PalletFPN extracts five frames of images based on a Gaussian distribution and performs downsampling operations of different degrees on them in order, the specific formulas being as follows:
P_n^mid = Conv(w_n · P_n^in + w_{n+1} · Resize(P_{n+1}^in))
P_n^out = Conv(w'_n · P_n^in + w''_n · P_n^mid + w_{n-1} · Resize(P_{n-1}^out))
wherein Conv denotes the convolution calculation; P_n^mid is the n-th layer intermediate feature, mid denoting an intermediate feature; w_n is the weight value of the n-th layer input feature; Resize denotes an up-sampling or down-sampling operation; n denotes the level of the input feature; P_n^in denotes the input features of different levels, in denoting an input feature; and P_n^out is the n-th layer output feature, out denoting an output feature.
As a preferable scheme of the method for detecting the jacks of the unmanned forklift Pallet based on the Yolov5-Pallet, the application comprises the following steps: extracting five frames of images based on a Gaussian distribution comprises the following steps: fitting the mean value and standard deviation of the Gaussian distribution according to the distribution of the video data; and calculating the probability density of each frame by the formula:
p(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
wherein μ is the mean of the Gaussian distribution, σ is its standard deviation, x is the sequence number of the frame, and p(x) is the probability density of the x-th frame.
The required frames are selected according to probability density: the probability densities are sorted in descending order, and the first five frames are selected as representative frames.
As a preferable scheme of the method for detecting the jacks of the unmanned forklift Pallet based on the Yolov5-Pallet, the application comprises the following steps: adding an attention mechanism to the original Yolov5 network structure comprises the following steps: performing maximum pooling on the input feature map to obtain an intermediate feature map f; decomposing the maximum pooling layer and encoding each pixel point of the intermediate feature map f along the horizontal and vertical coordinates with a pooling kernel; and calculating the similarity between the query object Q, with coordinates (x, y), and each feature pixel point:
s_ij = x · x_i + y · y_j
wherein s_ij is the attention score.
A softmax operation is performed on the attention scores s_ij as follows:
softmax(s_ij) = exp(s_ij) / Σ_{k=1}^{i_max} Σ_{l=1}^{j_max} exp(s_kl)
wherein the total number of attention scores is N = i_max × j_max.
The outputs z_ij of the feature pixel points are weighted and summed according to the weight coefficients softmax(s_ij) to obtain the attention value att, the specific formula being as follows:
att = Σ_{i=1}^{i_max} Σ_{j=1}^{j_max} softmax(s_ij) · z_ij
wherein att is the attention value, x_i is the abscissa and y_j the ordinate of a feature pixel point, z_ij is the output value of the feature pixel point at coordinates (x_i, y_j), i_max is the horizontal pixel length of the intermediate feature map f, and j_max is its vertical pixel length.
In a second aspect, an embodiment of the present application provides an unmanned forklift tray jack detection system, which includes a data acquisition module, configured to acquire on-site video information during forklift operation; a data transmission module, used for transmitting the acquired video stream information to the computer terminal; a target detection module, used for detecting the tray and the jacks in the video image information and outputting coordinate data; and a task execution module, used for receiving the coordinate data sent by the computer terminal and executing the forking operation.
In a third aspect, embodiments of the present application provide a computer apparatus comprising a memory and a processor, the memory storing a computer program, wherein: the computer program, when executed by the processor, implements the steps of the Yolov5-Pallet based unmanned forklift tray jack detection method according to the first aspect of the application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, wherein: the computer program, when executed by a processor, implements the steps of the Yolov5-Pallet based unmanned forklift tray jack detection method according to the first aspect of the application.
The application has the beneficial effects that: according to the application, by using a target detection technology, the tray jack can be automatically detected by the unmanned forklift, so that the automation level of operation is improved, and the risk of human errors is effectively avoided; the method has a relatively high response speed, and can continuously detect under the uninterrupted condition, so that the production efficiency can be greatly improved, and the downtime of a production line can be reduced; the accuracy and stability of the tray jack are ensured, and the condition that the jack is incomplete or misplaced is avoided, so that the risks of damage to goods and injury to staff are reduced; the hardware cost of the traditional unmanned forklift is reduced, and meanwhile, the damage of goods caused by manual operation and errors can be reduced, so that the cost is saved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
Fig. 1 is a flow chart of the unmanned forklift tray jack detection method based on Yolov5-Pallet of embodiment 1.
Fig. 2 is a schematic structural diagram of the PalletFPN of the unmanned forklift tray jack detection method based on Yolov5-Pallet of embodiment 1.
Fig. 3 is an original example image of the unmanned forklift tray jack detection method based on Yolov5-Pallet of embodiment 2.
Fig. 4 is an example target detection effect diagram of the unmanned forklift tray jack detection method based on Yolov5-Pallet of embodiment 2.
Fig. 5 is an example diagram of the relationship between model confidence and accuracy for the unmanned forklift tray jack detection method based on Yolov5-Pallet of embodiment 2.
Fig. 6 is an example diagram of the relationship between model confidence and recall for the unmanned forklift tray jack detection method based on Yolov5-Pallet of embodiment 2.
Fig. 7 is a diagram of the relationship between model accuracy and recall for the unmanned forklift tray jack detection method based on Yolov5-Pallet of embodiment 2.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application; however, the present application may be practiced in ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the application. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Example 1
Referring to fig. 1 and 2, a first embodiment of the present application provides a method for detecting a jack of an unmanned forklift Pallet based on Yolov5-Pallet, which includes,
s1: the video acquisition equipment acquires a video stream to be detected.
Further, in this example, video information obtained by a video device carried by the unmanned forklift during actual work is collected. The resolution of the video is 1920×1080 and the format is the FLV streaming-media format, which has the characteristics of low occupancy, good video quality and small file size, making it very suitable for network transmission.
It should be noted that ambient light detection is required before the video stream is acquired; when the ambient light is too dim to obtain a clear image, a light supplementing operation is required. This operation is added because the lighting in the working environment of the unmanned forklift is complex and changeable, and stable, imageable lighting is needed to ensure the accuracy of target detection.
Further, the light supplementing operation uses a lamp brightness control method based on a proportional control algorithm. The algorithm compares the measured ambient light brightness with a set target brightness and calculates the difference between them; it then generates a control signal from this difference, and the control signal adjusts the lamp brightness so that the measured ambient brightness gradually approaches the target brightness. The specific formula is as follows:
S = K_p × (L_d − L_m)
wherein K_p is a proportionality coefficient used to control the sensitivity of the brightness adjustment, L_m is the measured ambient light brightness, L_d is the target brightness, and S is the output control signal, a value in the range 0 to 100 used to control the duty cycle of the lamp brightness.
When L_m is less than L_d, the control signal is positive and the lamp brightness is increased;
when L_m is greater than L_d, the control signal is negative and the lamp brightness is decreased.
Further, the purpose of this process is to get a proper supplemental light source for the forklift in poorly lit scenes, so as to get a clear video image.
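The proportional control step above can be sketched as follows; the function names, the default gain, and the clamping of the duty cycle to the 0-100 range are illustrative assumptions, not taken from the original:

```python
def light_control_signal(l_target, l_measured, k_p=1.0):
    """Proportional control: S = K_p * (L_d - L_m).

    A positive S means measured brightness is below target (raise the lamp
    duty cycle); a negative S means it is above target (lower it).
    """
    return k_p * (l_target - l_measured)


def lamp_duty_cycle(current_duty, signal):
    """Apply the control signal and clamp to the assumed 0-100 duty-cycle range."""
    return max(0.0, min(100.0, current_duty + signal))
```

For example, with a target brightness of 80 and a measured brightness of 50, the signal is +30 and the duty cycle is raised accordingly.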
S2: and transmitting the detected video stream to a computer terminal for controlling the unmanned forklift.
It should be noted that, in this example, the video stream is transmitted to the computer terminal controlling the unmanned forklift through the network, and the transmission protocol of the video stream is RTMP, so that low delay in the transmission process can be ensured.
S3: the computer terminal carries out target detection on the tray and the jack, and generates corresponding instructions by combining a real-time target detection result and a work task and transmits the instructions to the unmanned forklift.
Further, the detection source is the video stream input by the unmanned forklift, and the detection network is a pre-trained Yolov5-Pallet network; pre-training the Yolov5-Pallet network comprises the following steps:
s3.1, collecting images and establishing a data set.
Specifically, Yolov5 is a deep learning model and needs a large number of data samples for training. An image acquisition device is used to collect images of trays in the warehouse or outdoor environments where the unmanned forklift will work; the states of the trays in the images cover three cases, namely carrying cargo, a single empty tray, and an empty tray stack; and images are acquired under different weather and lighting conditions.
Preferably, the images collected in step S3.1 are labeled one by one using image labeling software, thereby creating a customized tray dataset.
S3.2: and (3) improving the Yolov5 network structure to obtain an improved Yolov5-Pallet target detection algorithm.
Further, there are two main improvements:
s3.2.1: a feature fusion based on BiFPN improvement is added into the original Yolov5 network structure and named as PalletFPN for replacing PANet which is originally used as a Neck structure.
Preferably, as shown in fig. 2, the feature fusion palettfpn based on the BiFPN improvement is added because different input features have different resolutions, so that the weights occupied by the different input features in the fused output features are different, and the original neik structure is replaced by a simple and efficient weighted bidirectional feature pyramid network (palettfpn) by PANet, so that a learnable weight is introduced to learn the importance of the different input features, and simultaneously, top-down and bottom-up multi-scale features are repeatedly applied to fusion.
Furthermore, PalletFPN improves on BiFPN in the feature input required for feature fusion: after the unmanned forklift completes its pose-correcting operation, the computer terminal obtains a proper frame suitable for tray-jack target detection, then extracts the following four frames of the transmitted video stream based on a Gaussian distribution and uses them, together with the proper frame, as the input for feature fusion. This makes the input information dynamic and stereoscopic, captures more scene detail, and improves the accuracy of the target detection algorithm.
Further, the method for extracting the four input frames based on Gaussian distribution mainly comprises the following steps:
s3.2.1.1: the mean value and standard deviation of the Gaussian distribution are fitted according to the distribution condition of the video data.
S3.2.1.2: the probability density of each frame is calculated using the probability density function of the Gaussian distribution:
p(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
wherein μ is the mean of the Gaussian distribution, σ is its standard deviation, x is the sequence number of a frame, and p(x) is the probability density of the x-th frame; the larger the probability density value, the more representative the frame.
Preferably, the larger the probability density, the more representative the selected frame image. This provides the theoretical basis for replacing a segment of dynamic video stream information with representative frame images, which reduces the computation of the whole algorithm and improves response speed, while the representative frames still allow the algorithm to capture more dynamic information in the feature maps during training.
S3.2.1.3: the required frames are selected according to probability density. The probability density values of all frames are sorted in descending order, and the first n frames are selected as representative frames, thereby obtaining more dynamic, stereoscopic and representative information.
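The frame-selection steps above can be sketched as follows; fitting μ and σ to the video data is stubbed out here, and they are simply passed in as parameters:

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Probability density of the x-th frame under a Gaussian N(mu, sigma^2)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def select_representative_frames(num_frames, mu, sigma, k=5):
    """Rank frames 0..num_frames-1 by probability density and keep the top k."""
    ranked = sorted(range(num_frames),
                    key=lambda x: gaussian_pdf(x, mu, sigma),
                    reverse=True)
    return sorted(ranked[:k])
```

With a distribution centered on frame 10, for instance, the five frames nearest frame 10 are selected as the representative frames.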
Preferably, PalletFPN uses a feature fusion method with learnable weights and can learn the importance of different input features according to their resolution; the expression is as follows:
w_i = (a_i × b_i) / Σ_{j=1}^{n} (a_j × b_j)
wherein w_i is the weight value of the i-th layer input feature, a_i and b_i are the numbers of pixels in the horizontal and vertical directions of the i-th layer input feature map, and n is the total number of input features.
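The resolution-based weighting can be sketched as follows; normalizing each feature's pixel area (a_i × b_i) by the total area is an assumption consistent with the stated definitions of a_i, b_i and n, since the original formula image is not reproduced in the text:

```python
def resolution_weights(feature_shapes):
    """Weight each input feature by its share of total pixel area (assumed formula).

    feature_shapes: list of (a_i, b_i) pixel counts, one pair per input feature map.
    Returns w_i = (a_i * b_i) / sum_j(a_j * b_j), so the weights sum to 1.
    """
    areas = [a * b for a, b in feature_shapes]
    total = sum(areas)
    return [area / total for area in areas]
```

Under this assumption, a higher-resolution input feature receives a proportionally larger fusion weight.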
Further, after the five frames of images with the proper frame as the initial frame are obtained, downsampling operations of different degrees are performed on them in order, giving them different resolutions; input features of different levels are thus obtained from the different frame images and denoted P_n^in, where in denotes an input feature and n its level.
Further, the n-th layer intermediate feature, denoted P_n^mid (mid denoting an intermediate feature), is obtained by convolution and weighted feature fusion of the n-th and (n+1)-th layer input features; the calculation expression of the n-th layer intermediate feature of the PalletFPN structure is:
P_n^mid = Conv(w_n · P_n^in + w_{n+1} · Resize(P_{n+1}^in))
wherein Conv denotes the convolution calculation, w_n is the weight value of the n-th layer input feature, and Resize denotes an up-sampling or down-sampling operation.
Further, the n-th layer output feature, denoted P_n^out (out denoting an output feature), is obtained by convolution and weighted feature fusion of the n-th layer input feature, the intermediate feature and the (n−1)-th layer output feature; the calculation expression of the n-th layer output feature of the PalletFPN structure is:
P_n^out = Conv(w'_n · P_n^in + w''_n · P_n^mid + w_{n-1} · Resize(P_{n-1}^out))
wherein w'_n and w''_n are the weight values applied to the n-th layer input feature and intermediate feature in the output fusion.
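The weighted-fusion core of the steps above can be sketched with plain Python lists standing in for feature maps; the Conv and Resize stages are omitted, and normalizing the weighted sum in the BiFPN "fast normalized fusion" style (dividing by the weight sum plus a small epsilon) is an assumption:

```python
def weighted_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors.

    out[k] = sum_i(w_i * features[i][k]) / (sum_i(w_i) + eps)
    The eps term keeps the division stable when all weights are near zero.
    """
    norm = sum(weights) + eps
    length = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) / norm
            for k in range(length)]
```

With equal weights this reduces to an (almost exact) element-wise average of the input features.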
S3.2.2 adds an attention mechanism to the original Yolov5 network structure.
Preferably, an attention mechanism is added in the original Yolov5 network structure, so that the algorithm focuses more on information required by the current task in a large amount of input information, and the efficiency and the accuracy of target detection are improved while overload of the information is prevented.
It should be noted that the attention mechanism is added because, given the complexity of the unmanned forklift's working environment, it is necessary to focus on the information of interest (trays and jacks) and suppress irrelevant information, thereby improving the detection effect.
Further, the attention mechanism comprises the steps of:
s3.2.2.1 the input feature map of size h×w is maximally pooled, the filter parameter is set to 3×3, the stride is set to 2, and the intermediate feature map f is obtained.
S3.2.2.2: the maximum pooling layer is decomposed, and each pixel point of the intermediate feature map f is encoded along the horizontal and vertical coordinates with a pooling kernel. The output at abscissa x_i and ordinate y_j is denoted z_ij, so each feature pixel point can be represented as (x_i, y_j, z_ij).
S3.2.2.3: the encoded coordinates of the query object Q in the max pooling layer are abstracted as (x, y), and the similarity between Q and each feature pixel point is calculated and recorded as the attention score s_ij:
s_ij = x · x_i + y · y_j
wherein s_ij is the attention score, x_i is the abscissa and y_j the ordinate of the feature pixel point.
S3.2.2.4: a softmax operation is performed on the attention scores s_ij, converting them into a probability distribution in the range [0, 1] with a sum of 1:
softmax(s_ij) = exp(s_ij) / Σ_{k=1}^{i_max} Σ_{l=1}^{j_max} exp(s_kl)
wherein the total number of attention scores is N = i_max × j_max.
S3.2.2.5: the outputs z_ij of the feature pixel points are weighted and summed according to the weight coefficients softmax(s_ij) to obtain the attention value att:
att = Σ_{i=1}^{i_max} Σ_{j=1}^{j_max} softmax(s_ij) · z_ij
wherein att is the attention value, x_i is the abscissa and y_j the ordinate of a feature pixel point, z_ij is the output value of the feature pixel point at coordinates (x_i, y_j), i_max is the horizontal pixel length of the intermediate feature map f, and j_max is its vertical pixel length.
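The score-softmax-weighted-sum pipeline of steps S3.2.2.3 to S3.2.2.5 can be sketched over a flattened list of feature pixels; using a plain coordinate dot product as the similarity function is an assumption, since the exact similarity formula is not spelled out in the text:

```python
import math


def attention_scores(query, pixels):
    """s_ij: similarity of query (x, y) with each pixel's coordinates (assumed dot product)."""
    qx, qy = query
    return [qx * x + qy * y for x, y, _ in pixels]


def softmax(scores):
    """Convert scores to a probability distribution in [0, 1] summing to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_value(query, pixels):
    """att = sum over all pixels of softmax(s_ij) * z_ij, pixels given as (x_i, y_j, z_ij)."""
    weights = softmax(attention_scores(query, pixels))
    return sum(w * z for w, (_, _, z) in zip(weights, pixels))
```

When all scores are equal, the attention value reduces to the simple average of the pixel outputs.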
S3.3, training a Yolov5-Pallet target detection model.
Specifically, the dataset to be trained is imported and divided into a training set, a validation set and a test set; the parameters of the train.py file, including the training weight file, the training data path and the number of training epochs, are adjusted; and train.py is run to obtain the trained model and the metrics measuring its performance.
It should be noted here that after a new detection model is trained, its parameter metrics need to be analyzed, and some images of scenes likely to be encountered should be selected for pre-detection to check the detection effect of the model.
Further, regarding the related instructions sent back to the unmanned forklift: after receiving the video information sent by the unmanned forklift, the computer terminal generates, through target detection processing, the parameter information required for the subsequent operation of the unmanned forklift. This comprises the following steps: selecting a relevant tray detected in the video stream as the working object according to the set work task; and, after determining the tray to be forked, acquiring the anchor box parameters of the proper frame image in the video stream, the coordinate encoding format being coco, i.e., [x_min, y_min, width, height], with the tray anchor box parameters recorded as [x_1, y_1, w_1, h_1].
Specifically, a suitable frame image is one in which the anchor boxes of both pallet jacks lie entirely within the pallet anchor box. When the unmanned forklift acquires video information, a relative angle to the pallet to be forked that is too large or too small hinders the subsequent forking operation; requiring both jack anchor boxes to lie within the pallet anchor box ensures that the forklift is essentially directly in front of the pallet to be forked, which benefits the subsequent forking operation. This judgment is therefore also the posture-correction process of the unmanned forklift, and the conditions to be satisfied are:
x2, x2 + w2, x3, x3 + w3 ∈ (x1, x1 + w1)
y2, y2 + h2, y3, y3 + h3 ∈ (y1, y1 + h1)
Preferably, the parameters of the anchor boxes of the two jacks within the selected pallet anchor box are acquired and recorded as [x2, y2, w2, h2] and [x3, y3, w3, h3] respectively.
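The suitable-frame judgment above can be sketched as a containment check on COCO-format boxes; the function name and the strict inequalities are assumptions based on the stated conditions:

```python
def jacks_inside_pallet(pallet, jack_a, jack_b):
    """Posture check: both jack anchor boxes must lie inside the pallet box.

    All boxes use the COCO format [x_min, y_min, width, height].
    """
    x1, y1, w1, h1 = pallet

    def inside(box):
        x, y, w, h = box
        # both horizontal and vertical extents fall strictly within the pallet box
        return (x1 < x and x + w < x1 + w1 and
                y1 < y and y + h < y1 + h1)

    return inside(jack_a) and inside(jack_b)

# A frame counts as "suitable" only when the check passes.
ok = jacks_inside_pallet([0, 0, 100, 60], [10, 20, 30, 20], [60, 20, 30, 20])
```

When a jack box pokes outside the pallet box (the forklift is viewing the pallet at too sharp an angle), the check fails and the frame is rejected.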
Further, the two pallet jacks to be forked are selected, and their center coordinate parameters [x_center2, y_center2] and [x_center3, y_center3] are obtained in real time, calculated as:

x_center_k = x_k + w_k / 2, y_center_k = y_k + h_k / 2, k = 2, 3
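Assuming the usual COCO convention that [x_min, y_min, width, height] anchors the box at its top-left corner, the jack center computation can be sketched as:

```python
def box_center(box):
    """Center of a COCO-format box [x_min, y_min, width, height]."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

# Hypothetical jack anchor box; the center is what gets sent back to the forklift.
center_2 = box_center([10, 20, 30, 20])
```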
further, the generated jack center parameter coordinates are returned to the unmanned forklift.
S4: and the unmanned forklift forks and takes the tray according to the operation instruction.
This embodiment also provides an unmanned forklift Pallet jack detection system based on Yolov5-Pallet, comprising a data acquisition module, a data transmission module, a target detection module and a task execution module. The data acquisition module acquires on-site video information while the forklift is working; the data transmission module transmits the acquired video stream to the computer terminal; the target detection module detects the pallet and the jacks in the video images and outputs coordinate data; and the task execution module receives the coordinate data sent by the computer terminal and executes the forking operation.
The above unit modules may be embedded in hardware form in, or independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to the above units.
This embodiment also provides a computer device applicable to the above unmanned forklift pallet jack detection method, comprising a memory and a processor; the memory stores computer-executable instructions, and the processor executes the computer-executable instructions to implement the unmanned forklift pallet jack detection method of this embodiment.
The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for their operation. The communication interface of the computer device communicates with external terminals by wire or wirelessly, the wireless mode being realizable through Wi-Fi, an operator network, NFC (near-field communication) or other technologies. The display screen of the computer device may be a liquid-crystal or electronic-ink display, and the input device may be a touch layer covering the display screen, keys, a trackball or a touchpad arranged on the housing, or an external keyboard, touchpad or mouse.
The present embodiment also provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of: the video acquisition equipment acquires a video stream to be detected; transmitting the detected video stream to a computer terminal for controlling the unmanned forklift; the computer terminal carries out target detection on the tray and the jack, and combines a real-time target detection result and a work task to generate a corresponding instruction and transmits the corresponding instruction to the unmanned forklift; and the unmanned forklift forks and takes the tray according to the operation instruction.
In conclusion, the application has a high response speed and can detect continuously without interruption, which greatly improves production efficiency and reduces production-line downtime; it ensures the accuracy and stability of pallet-jack insertion, avoiding incompletely inserted or mis-inserted holes and reducing the risks of damaged goods and injured staff. At the same time, it reduces the hardware cost of the traditional unmanned forklift and reduces the goods damage caused by manual operation and errors.
Embodiment 2
Referring to figs. 3 to 7, in order to verify the beneficial effects of the present application, the second embodiment provides a Yolov5-Pallet-based unmanned forklift pallet jack detection method, and scientific demonstration is carried out through economic benefit calculation and simulation experiments.
Specifically, in the traditional scheme the unmanned forklift relies mainly on sensors such as lidar together with SLAM technology for target detection and positioning, but an automotive-grade lidar currently costs about 7000 yuan. A visual recognition method using a high-definition camera as the detection hardware costs only 200-300 yuan and therefore has a large market space, but it depends on the target detection algorithm.
Further, in order to verify the performance of the algorithm in the Yolov5-Pallet-based unmanned forklift pallet jack detection method described in this patent, the following simulation experiment was carried out:
First, the data set is prepared: pictures of the pallets and the warehouse required for the implementation are collected, and 112 pictures are annotated with Labelimg software. The data set, comprising the JPEG image files and the txt label files, is imported into the integrated development environment, and the parameters in the training execution file train.py are adjusted: the training weight file is 'yolov5s.pt' and the number of training epochs is 300. train.py is then run to obtain the trained model, and finally detect.py is run to verify the model's performance on the pictures in the validation set. An original image is shown in fig. 3, and a detection result picture generated by the target detection process is shown in fig. 4.
TABLE 1. Precision and recall of detection results at different anchor-box confidence levels

| Detection anchor-box confidence | Pallet recognition result | Jack recognition result | Recall |
| ------------------------------- | ------------------------- | ----------------------- | ---------------- |
| 0-0.792                         | Not fully accurate        | Not fully accurate      | Very high        |
| 0.792                           | Accurate                  | Accurate                | Relatively high  |
| 0.792-0.96                      | Accurate                  | Accurate                | Relatively low   |
| 0.96                            | Accurate                  | Accurate                | Very low         |
| 0.96-1                          | Accurate                  | Accurate                | Very low         |
Preferably, as shown in table 1, the performance of the algorithm is analyzed from the training results. In the P_curve graph showing detection precision, when the anchor-box confidence reaches 0.792 the recognition results for the pallet and the jacks are completely accurate; in the R_curve graph showing detection recall, when the anchor-box confidence reaches 0.96 almost no targets of either category are found. The two indicators are clearly in tension: the higher the detection precision, the lower the recall, and the higher the recall, the lower the precision.
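The precision/recall trade-off described above can be sketched with a toy computation; the detection flags and threshold values below are made-up illustrative data, not the patent's measurements:

```python
def precision_recall(confidences, is_tp, threshold, total_ground_truth):
    """Precision and recall of the detections kept at a confidence threshold.

    confidences        : per-detection confidence scores
    is_tp              : per-detection flag, True if it matches a ground truth
    total_ground_truth : number of annotated objects (denominator of recall)
    """
    kept = [tp for c, tp in zip(confidences, is_tp) if c >= threshold]
    if not kept:
        return 1.0, 0.0  # no detections kept: trivially precise, zero recall
    precision = sum(kept) / len(kept)
    recall = sum(kept) / total_ground_truth
    return precision, recall

# Raising the threshold from 0.2 to 0.8 trades recall away for precision.
confs, flags = [0.9, 0.8, 0.5, 0.3], [True, True, False, True]
p_low, r_low = precision_recall(confs, flags, 0.2, 3)    # keeps everything
p_high, r_high = precision_recall(confs, flags, 0.8, 3)  # keeps only confident hits
```

At the low threshold a false positive slips through (precision 0.75) but every object is found (recall 1.0); at the high threshold precision is perfect but one object is missed, mirroring the P_curve/R_curve behavior reported above.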
TABLE 2. Mean average precision (mAP) for different classes

| Category | IoU threshold | mAP   |
| -------- | ------------- | ----- |
| Pallet   | /             | 0.989 |
| Jack     | /             | 0.912 |
| Overall  | 0.5           | 0.95  |
Further, combining the two indicators yields the mean average precision, mAP (mean Average Precision), which is used to measure the performance of the target detection algorithm. As shown in table 2 and in the PR_curve graph, at an IoU threshold of 0.5 the overall mAP over the pallet and jack classes is 0.95, with both classes above 0.9, indicating a good detection effect. The P_curve, R_curve and PR_curve graphs are shown in figs. 5, 6 and 7.
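As a sketch under the usual definitions, each class's AP is the area under its precision-recall curve and mAP is the mean of the per-class APs; the two-point curve below is illustrative only, while the per-class values come from table 2:

```python
def average_precision(recalls, precisions):
    """Area under a (recall, precision) curve, summed over recall increments.

    Points must be ordered by increasing recall.
    """
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += p * (r - prev_r)  # rectangle of height p over this recall step
        prev_r = r
    return ap

# Illustrative two-point PR curve: precision 1.0 up to recall 0.5, then 0.8.
ap_example = average_precision([0.5, 1.0], [1.0, 0.8])

# The per-class APs from table 2 average to the overall mAP@0.5.
map_05 = (0.989 + 0.912) / 2   # rounds to the reported 0.95
```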
It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered in the scope of the claims of the present application.

Claims (10)

1. A method for detecting jacks of an unmanned forklift Pallet based on Yolov5-Pallet, characterized by comprising the following steps:
the video acquisition equipment acquires a video stream to be detected;
transmitting the detected video stream to a computer terminal for controlling the unmanned forklift;
the computer terminal carries out target detection on the tray and the jack, and combines a real-time target detection result and a work task to generate a corresponding instruction and transmits the corresponding instruction to the unmanned forklift;
and the unmanned forklift forks and takes the tray according to the operation instruction.
2. The method for detecting the jacks of the unmanned forklift Pallet based on Yolov5-Pallet, as set forth in claim 1, is characterized in that: the target detection comprises the following steps:
collecting images and establishing a data set;
improving the Yolov5 network structure to obtain an improved Yolov5-Pallet target detection algorithm;
and training a Yolov5-Pallet target detection model.
3. The method for detecting the jacks of the unmanned forklift Pallet based on Yolov5-Pallet according to claim 2, characterized in that improving the Yolov5 network structure comprises the following steps:
adding a characteristic fusion palettFPN based on BiFPN improvement into the original Yolov5 network structure to replace PANet which is originally used as a Neck structure;
an attention mechanism is added into the original Yolov5 network structure.
4. The method for detecting the jacks of the unmanned forklift Pallet based on Yolov5-Pallet according to claim 3, characterized in that: the PalletFPN learns the importance of different input features according to the resolution of each input feature, with the specific formula:

w_i = (a_i × b_i) / Σ_{k=1}^{n} (a_k × b_k)

wherein w_i is the weight value of the i-th layer input feature, a_i and b_i are respectively the numbers of pixels of the i-th layer input feature map in the lateral and longitudinal directions, and n is the total number of input features.
5. The method for detecting the jacks of the unmanned forklift Pallet based on Yolov5-Pallet according to claim 2, characterized in that: in the Yolov5-Pallet target detection algorithm, the PalletFPN takes five frames of images extracted based on a Gaussian distribution and carries out down-sampling operations of different degrees on them in sequence, with the specific formula:

f_n^mid = Conv(w_n · f_n^in + w_{n+1} · Resize(f_{n+1}^in))
f_n^out = Conv(w_n · f_n^mid + w_{n-1} · Resize(f_{n-1}^out))

wherein Conv denotes the convolution operation; f_n^mid is the n-th layer intermediate feature, mid denoting an intermediate feature; w_n is the weight value of the n-th layer input feature; Resize denotes the up-sampling or down-sampling operation; n denotes the level of the input feature; f_n^in denotes the input feature of level n, in denoting an input feature; and f_n^out is the n-th layer output feature, out denoting an output feature.
6. The method for detecting the jacks of the unmanned forklift Pallet based on Yolov5-Pallet, as set forth in claim 5, is characterized in that: the extracting five frames of images based on Gaussian distribution comprises the following steps:
fitting the mean value and standard deviation of Gaussian distribution according to the distribution condition of video data;
the specific formula for calculating the probability density of each frame is as follows:

p(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

wherein μ is the mean of the Gaussian distribution, σ is the standard deviation of the Gaussian distribution, x is the sequence number of the frame, and p(x) is the probability density of the x-th frame;
the required frames are selected according to the probability density, and the first five frames are selected as representative frames in order from large to small.
7. The method for detecting the jacks of the unmanned forklift Pallet based on Yolov5-Pallet, as claimed in claim 3, wherein the method comprises the following steps of: the adding of an attention mechanism in the original Yolov5 network structure comprises the following steps:
carrying out maximum pooling on the input feature map to obtain an intermediate feature map f;
decomposing the middle feature map f of the largest pooling layer, and coding each pixel point along a horizontal coordinate and a vertical coordinate by using a pooling kernel;
the similarity between the query object Q and each feature pixel point is calculated to give the attention score, with the specific formula as follows:

wherein s_ij is the attention score, x_i is the abscissa of the feature pixel point, and y_j is the ordinate of the feature pixel point;
the attention scores s_ij are normalized by a softmax operation, with the specific formula:

softmax(s_ij) = exp(s_ij) / Σ_{k=1}^{N} exp(s_k)

wherein N is the total number of attention scores and s_ij is an attention score;
according to the weight coefficients softmax(s_ij), the outputs z_ij of the feature pixel points are weighted and summed to obtain the attention value att, with the specific formula:

att = Σ_{i=1}^{i_max} Σ_{j=1}^{j_max} softmax(s_ij) · z_ij

wherein att is the attention value, x_i is the abscissa of the feature pixel point, y_j is the ordinate of the feature pixel point, z_ij is the output value of the feature pixel point at coordinates (x_i, y_j), i_max is the lateral pixel length of the intermediate feature map f, and j_max is the vertical pixel length of the intermediate feature map f.
8. An unmanned forklift Pallet jack detection system based on Yolov5-Pallet, based on the method for detecting the jacks of the unmanned forklift Pallet based on Yolov5-Pallet according to any one of claims 1-7, characterized by comprising:
the data acquisition module is used for acquiring video information of a forklift on site during working;
the data transmission module is used for transmitting the acquired video stream information to the computer terminal;
the target detection module is used for detecting the tray and the jack in the video image information and outputting coordinate data;
and the task execution module is used for receiving the coordinate data sent by the computer terminal and executing the forking operation.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that: the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program, when executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310779769.7A 2023-06-29 2023-06-29 Unmanned forklift tray jack detection method and system based on Yolov5-Pallet Pending CN117037023A (en)

Publication number: CN117037023A; publication date: 2023-11-10; family ID: 88601146; country: CN.


Legal Events:
PB01 — Publication
SE01 — Entry into force of request for substantive examination