CN115457494A - Object identification method and system based on infrared image and depth information fusion


Info

Publication number
CN115457494A
Authority
CN
China
Prior art keywords
infrared image
supporting leg
frames
boundary
candidate
Prior art date
Legal status
Pending
Application number
CN202110550741.7A
Other languages
Chinese (zh)
Inventor
邓耀桓
Current Assignee
Guangzhou Lanpangzi Mobile Technology Co ltd
Original Assignee
Guangzhou Lanpangzi Mobile Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Lanpangzi Mobile Technology Co ltd
Priority to CN202110550741.7A
Publication of CN115457494A
Legal status: Pending

Abstract

The application provides an object identification method based on the fusion of an infrared image and depth information, comprising the following steps: an infrared image and depth information of the object are collected; a single-stage object detection model is used to determine, in the infrared image, an object bounding box for the object and a plurality of supporting leg bounding boxes for the object; and the distance between the object and the automated guided vehicle, together with the angle of the object relative to the vehicle, is calculated from the supporting leg bounding boxes and the depth information and uploaded to the automated guided vehicle. By adopting a single-stage object detection model, the method supports rapid training and recognition of many types of industrial pallets and simplifies the deployment of the related systems. The infrared image is more robust in environments with unstable illumination, so the detection results are more stable. In addition, because the method operates on two-dimensional images, detection is faster, and the heavy computation required to process three-dimensional point cloud data in the prior art, with its associated time and computing costs, is avoided.

Description

Object identification method and system based on infrared image and depth information fusion
Technical Field
The application relates to the technical field of intelligent warehousing, in particular to an object identification method and system based on infrared image and depth information fusion.
Background
An Automated Guided Vehicle (AGV) is a transport vehicle equipped with an automatic guidance device, capable of traveling along a predetermined guide path, and providing safety protection and various transfer functions. In the field of intelligent warehousing, AGVs fully embody automation and flexibility and enable efficient, economical and flexible unmanned production. However, how to quickly and accurately identify an industrial pallet is a problem an AGV must face when forking a load.
Currently, the most common identification algorithms for industrial pallets are based on point cloud processing, such as the point cloud classification/segmentation deep learning framework PointNet proposed by Stanford University. Because point cloud processing requires substantial computation, real-time detection of industrial pallets demands expensive computing hardware, which inevitably drives up computing costs. In addition, labeling 3D point clouds costs more than ten times as much as labeling ordinary images and is considerably harder, which slows product iteration. Finally, depth cameras are highly susceptible to environmental influences such as external light and reflections, so the depth information is unstable and the recognition result is hard to guarantee.
Disclosure of Invention
The present application provides an object recognition method and system based on the fusion of an infrared image and depth information, intended to solve, at least in part, one of the problems described above or other disadvantages of the related art.
The application provides an object identification method based on infrared image and depth information fusion, comprising the following steps: an infrared image and depth information of an object are acquired; a single-stage object detection model is used to determine, in the infrared image, an object bounding box for the object and a plurality of supporting leg bounding boxes for the object; and the distance between the object and the automated guided vehicle and the angle of the object relative to the vehicle are calculated from the supporting leg bounding boxes and the depth information and uploaded to the vehicle.
In some embodiments, the infrared image and the depth information each cover all of the supporting legs of the object and a plurality of insertion holes provided on the object that mate with the automated guided vehicle.
In some embodiments, determining the object bounding box and the supporting leg bounding boxes in the infrared image with the single-stage object detection model may include: receiving the infrared image with the single-stage object detection model and resizing it according to the proportion of the image occupied by the object; extracting features from the infrared image with the convolutional neural network of the single-stage object detection model to obtain a prediction result containing a plurality of object candidate boxes and parameter information for each candidate box, the parameter information including the position parameters and confidence of each candidate box; and ranking the candidate boxes by confidence and selecting the candidate box with the highest confidence as the object bounding box.
In some embodiments, determining the object bounding box and the supporting leg bounding boxes in the infrared image with the single-stage object detection model may further include: receiving the infrared image with the single-stage object detection model and resizing it according to the proportion of the image occupied by the supporting legs; extracting features from the infrared image with the convolutional neural network of the single-stage object detection model to obtain a prediction result containing a plurality of supporting leg candidate boxes and parameter information for each, including position parameters and confidence; and selecting the supporting leg candidate boxes whose confidence meets a preset threshold as the supporting leg bounding boxes.
In some embodiments, after the object bounding box and the supporting leg bounding boxes have been determined with the single-stage object detection model, the method may further include: filtering out, using prior information about the identified object, any supporting leg bounding boxes that do not meet preset conditions, the preset conditions being that the supporting leg bounding boxes lie on the same straight line and are located inside the object bounding box; and establishing the relationship between the supporting leg bounding boxes and the object bounding box.
The present application further provides an object recognition system based on infrared image and depth information fusion, which may include a depth camera, a single-stage object detection model and a recognition module. The depth camera captures the infrared image and depth information of the object. The single-stage object detection model determines, in the infrared image, an object bounding box for the object and a plurality of supporting leg bounding boxes for the object. The recognition module calculates the distance between the object and the automated guided vehicle and the angle of the object relative to the vehicle from the supporting leg bounding boxes and the depth information and uploads them to the vehicle.
In some embodiments, the infrared image and the depth information each cover all of the supporting legs of the object and a plurality of insertion holes provided on the object that mate with the automated guided vehicle.
In some embodiments, the execution steps of the single-stage object detection model may include: receiving the infrared image and resizing it according to the proportion of the image occupied by the object; extracting features from the infrared image with a built-in convolutional neural network to obtain a prediction result containing a plurality of object candidate boxes and parameter information for each candidate box, the parameter information including position parameters and confidence; and ranking the candidate boxes by confidence and selecting the one with the highest confidence as the object bounding box.
In some embodiments, the execution steps of the single-stage object detection model may further include: receiving the infrared image and resizing it according to the proportion of the image occupied by the supporting legs; extracting features from the infrared image with the built-in convolutional neural network to obtain a prediction result containing a plurality of supporting leg candidate boxes and parameter information for each, including position parameters and confidence; and selecting the supporting leg candidate boxes whose confidence meets the preset threshold as the supporting leg bounding boxes.
In some embodiments, the system may further include a filtering module and an association module. The filtering module filters out, using prior information about the identified object, supporting leg bounding boxes that do not meet preset conditions, namely that the supporting leg bounding boxes lie on the same straight line and are located inside the object bounding box. The association module establishes the relationship between the supporting leg bounding boxes and the object bounding box.
The technical solutions of these embodiments offer at least one of the following advantages.
According to the object identification method and system based on the fusion of the infrared image and depth information, the single-stage object detection model operating on infrared images supports rapid training and recognition of many types of industrial pallets and simplifies the deployment of the related systems. The infrared image is more robust in environments with unstable illumination, so the detection results are more stable. In addition, because the method operates on two-dimensional images, detection is faster, and the heavy computation required to process three-dimensional point cloud data in the prior art, with its associated time and computing costs, is avoided.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of an object identification method based on infrared image and depth information fusion according to an exemplary embodiment of the present application; and
fig. 2 is a schematic structural diagram of an object recognition system based on infrared image and depth information fusion according to an exemplary embodiment of the present application.
Detailed Description
For a better understanding of the present application, various aspects of the present application will be described in more detail with reference to the accompanying drawings. It should be understood that the detailed description is merely illustrative of exemplary embodiments of the present application and does not limit the scope of the present application in any way. Like reference numerals refer to like elements throughout the specification. The expression "and/or" includes any and all combinations of one or more of the associated listed items.
In the drawings, the size, dimension, and shape of elements have been slightly adjusted for convenience of explanation. The figures are purely diagrammatic and not drawn to scale. As used herein, the terms "approximately", "about" and the like are used as terms of approximation and not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by one of ordinary skill in the art. In addition, in the present application, the order in which the processes of the respective steps are described does not necessarily indicate the order in which they occur in actual operation, unless explicitly defined otherwise or inferable from the context.
It will be further understood that terms such as "comprising," "including," "having," and/or "containing," when used in this specification, are open-ended rather than closed-ended, and specify the presence of the stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components, and/or groups thereof. Furthermore, when an expression such as "at least one of" appears after a list of features, it modifies the entire list rather than the individual elements of the list. In addition, "may" means "one or more embodiments of the application" when describing embodiments of the application, and the term "exemplary" refers to an example or illustration.
Unless otherwise defined, all terms (including engineering and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In addition, the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a flowchart of an object recognition method based on infrared image and depth information fusion according to an exemplary embodiment of the present application.
As shown in fig. 1, the present application provides an object identification method based on infrared image and depth information fusion, which may include: step S1, acquiring an infrared image and depth information of an object; step S2, determining, with a single-stage object detection model, an object bounding box for the object and a plurality of supporting leg bounding boxes for the object in the infrared image; and step S3, calculating the distance between the object and the automated guided vehicle and the angle of the object relative to the vehicle from the supporting leg bounding boxes and the depth information, and uploading them to the vehicle.
Step S1: acquire an infrared image and depth information of the object. Conventional images must be captured under stable, adequate illumination; that is, conventional imaging places high demands on the environment, and in places with uncertain illumination, such as a warehouse, it is clearly difficult to obtain a clear, easily recognized conventional image. For this reason, the present application uses infrared imaging to acquire an IR (infrared) image, i.e., an image formed by measuring the heat radiated from the object, from which the shape and texture characteristics of the photographed object can be obtained. An IR image acquired in this way is not affected by environmental factors such as illuminance, can be captured continuously and stably, and therefore offers a degree of robustness. In addition, to allow an AGV (Automated Guided Vehicle) to carry the object, the depth information of the object must also be acquired.
Based on the foregoing, the present application uses a depth camera to capture the IR image and depth information of the object; more specifically, a structured-light (combined optical) depth camera may be used. Such a camera is equipped with a near-infrared laser that projects light with a known structure onto the photographed object, which is then captured by a dedicated infrared camera. The structured light produces different image phase information depending on the depth of each region of the photographed object, and a built-in processor converts this phase information into depth values, yielding the position and depth information of the object. In other words, the depth information records the depth of the captured scene, i.e., the actual distance from each point in the scene to the lens.
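As an illustration of how such a depth frame can be turned into metric distances, the minimal sketch below averages the depth values inside an image region. It assumes the camera reports depth as a 16-bit image in millimetres with zero marking invalid pixels; these are assumptions about a typical depth camera, not details stated in the application.

```python
# Minimal sketch (assumption: depth frame is a uint16 array in millimetres,
# 0 marks invalid pixels). Not the application's actual implementation.
import numpy as np

def region_distance_m(depth_mm: np.ndarray, box: tuple) -> float:
    """Average metric distance of the pixels inside box = (x, y, w, h)."""
    x, y, w, h = box
    patch = depth_mm[y:y + h, x:x + w].astype(np.float32)
    valid = patch[patch > 0]                 # drop invalid (zero) readings
    if valid.size == 0:
        return float("nan")
    return float(valid.mean()) / 1000.0      # millimetres -> metres

# Example: a synthetic 480x640 depth frame with every point ~1.2 m away.
depth = np.full((480, 640), 1200, dtype=np.uint16)
print(region_distance_m(depth, (300, 200, 40, 80)))  # ~1.2
```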
It should be noted that the present application is applicable to the scenario in which an AGV identifies an industrial pallet; that is, the object in the present application can be regarded as an industrial pallet. Accordingly, the depth camera is mounted at the middle of the AGV's forks, tilted slightly toward the ground, so that it can record in real time a frontal IR image of the industrial pallet in the AGV's forking direction and clearly capture the pallet's supporting legs and the insertion holes that mate with the AGV. The insertion holes may be rectangular, but are not limited to that shape. Based on this application scenario, and for ease of understanding, the following description refers to the industrial pallet in place of the object.
It should also be noted that the IR image and depth information are collected and recorded by the depth camera in real time, which guarantees timeliness and provides accurate, effective input data for the identification performed in the subsequent steps.
Step S2: determine, with the single-stage object detection model, an object bounding box for the object and a plurality of supporting leg bounding boxes for the object in the infrared image.
In some embodiments, after the IR image of the industrial pallet is obtained, the single-stage object detection model is used to identify the specific position of the industrial pallet in the IR image and the specific positions of its supporting legs, providing the basis for subsequently determining the distance and relative angle of the industrial pallet with respect to the AGV. In the present application, the IR image and depth information are passed to the single-stage object detection model through the Robot Operating System (ROS) running on the AGV.
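The application only states that the IR image and depth frame reach the detection model through ROS; the sketch below shows one conventional way to do that with rospy, message_filters and cv_bridge. The topic names and the callback body are illustrative assumptions, not details from the patent.

```python
# Hypothetical ROS node that forwards time-synchronized IR and depth frames
# to the detector; topic names are assumptions for illustration.
import rospy
import message_filters
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def on_frames(ir_msg, depth_msg):
    ir = bridge.imgmsg_to_cv2(ir_msg, desired_encoding="passthrough")
    depth = bridge.imgmsg_to_cv2(depth_msg, desired_encoding="passthrough")
    # Hand the synchronized pair to the single-stage detector (not shown here).
    rospy.loginfo("IR %s, depth %s", ir.shape, depth.shape)

rospy.init_node("pallet_detector_input")
ir_sub = message_filters.Subscriber("/camera/ir/image_raw", Image)
depth_sub = message_filters.Subscriber("/camera/depth/image_raw", Image)
sync = message_filters.ApproximateTimeSynchronizer([ir_sub, depth_sub],
                                                   queue_size=10, slop=0.05)
sync.registerCallback(on_frames)
rospy.spin()
```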
In some embodiments, the single-stage object detection model detects the target by densely and regularly sampling possible positions, scales and aspect ratios over the IR image, which makes it computationally efficient; the present application therefore uses a single-stage object detection model to determine the bounding box of the industrial pallet and the bounding boxes of its supporting legs. More specifically, the IR image containing the industrial pallet is fed to the single-stage object detection model, which resizes the image according to the proportion of the image occupied by the pallet so that the image has the definition and size best suited to feature extraction.
Next, the convolutional neural network of the single-stage object detection model extracts features from the infrared image and produces a prediction result containing a plurality of industrial pallet candidate boxes and parameter information for each candidate box, the parameter information including the position parameters of the candidate box and its confidence. Specifically, the convolutional layers of the network analyze the input IR image, extract its features and output a number of pallet candidate boxes, each with its own parameter information. The parameter information includes the specific position coordinates of the box, which can be expressed in a rectangular coordinate system and serve as the localization of the pallet, as well as a width, a height and a confidence. The confidence is the predicted probability that the candidate box contains an industrial pallet; the higher the probability, the higher the confidence.
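The parameter information described above can be pictured as a small record per candidate box. The sketch below is only an illustrative data structure, since the patent does not prescribe a concrete format; all field names are assumptions.

```python
# Illustrative container for a candidate box's parameter information
# (position parameters, size and confidence); names are assumptions.
from dataclasses import dataclass

@dataclass
class CandidateBox:
    x: float           # position coordinate of the box (e.g. top-left corner)
    y: float
    width: float
    height: float
    confidence: float  # predicted probability that the box contains the target
    label: str         # "pallet" or "leg"
```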
Next, a confidence threshold is preset and the pallet candidate boxes whose confidence meets the threshold are retained. The retained candidate boxes are then ranked by confidence, and the candidate box with the highest confidence is selected as the industrial pallet bounding box. Note that the present application does not restrict the shape of the industrial pallet; the specific shape of the pallet can be determined by the method described above.
Similarly, the supporting leg bounding box of each supporting leg in the infrared image can be obtained in the same way, specifically: the single-stage object detection model receives the infrared image and resizes it according to the proportion of the image occupied by the supporting legs; the convolutional neural network of the model extracts features from the infrared image to obtain a prediction result containing a plurality of supporting leg candidate boxes and parameter information for each, including position parameters and confidence; and the supporting leg candidate boxes whose confidence meets the preset threshold are selected as the supporting leg bounding boxes. The principle is the same as for the pallet bounding box, so reference may be made to the description above and it is not repeated here.
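Putting the two screening rules together, a minimal sketch of this post-processing could look as follows, reusing the hypothetical CandidateBox record from the previous sketch; the threshold value is an assumed example, not one specified in the application.

```python
# Minimal sketch of the screening step: the highest-confidence pallet candidate
# becomes the pallet bounding box, and every leg candidate above the preset
# threshold becomes a leg bounding box. The threshold value is illustrative.
from typing import List, Optional

def select_pallet_box(cands: List[CandidateBox],
                      threshold: float = 0.5) -> Optional[CandidateBox]:
    kept = [c for c in cands if c.label == "pallet" and c.confidence >= threshold]
    return max(kept, key=lambda c: c.confidence) if kept else None

def select_leg_boxes(cands: List[CandidateBox],
                     threshold: float = 0.5) -> List[CandidateBox]:
    return [c for c in cands if c.label == "leg" and c.confidence >= threshold]
```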
Of course, the convolutional neural network of the single-stage object detection model must be trained before it is used. A large number of labeled industrial pallet bounding box samples and supporting leg bounding box samples are fed to the model, which outputs initial pallet or supporting leg bounding boxes through the analysis and computation of its convolutional layers; these outputs are compared with the corresponding expected results. If the comparison reveals an error, the weight parameters of the neurons in the convolutional layers are adjusted until the output falls within an acceptable range of the expected result, at which point training of the convolutional neural network is complete.
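The weight-update cycle just described can be illustrated with the minimal PyTorch-style sketch below. The patent does not name a framework, detector architecture, loss or dataset, so TinyDetector, the MSE loss and the synthetic tensors are placeholders for illustration only, not the application's actual training setup.

```python
# Minimal sketch of the train-compare-adjust cycle; all components are
# placeholders (assumed), not the patent's actual model or loss.
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Stand-in for the single-stage detector's convolutional network."""
    def __init__(self, num_preds=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, num_preds),  # x, y, w, h, confidence
        )

    def forward(self, x):
        return self.backbone(x)

model = TinyDetector()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.MSELoss()  # placeholder for a real detection loss

# Synthetic "IR image" batches and target boxes stand in for labeled samples.
images = torch.rand(16, 1, 64, 64)
targets = torch.rand(16, 5)

for epoch in range(10):
    pred = model(images)             # initial bounding-box prediction
    loss = criterion(pred, targets)  # compare with the expected output
    optimizer.zero_grad()
    loss.backward()                  # adjust the convolutional weights
    optimizer.step()
```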
In some embodiments, the method further includes filtering out, using prior information from previously identified industrial pallets, the supporting leg bounding boxes that do not meet preset conditions, the preset conditions being that the supporting leg bounding boxes lie on the same straight line and are located inside the pallet bounding box. This step helps guarantee the accuracy of the resulting supporting leg bounding boxes.
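A minimal sketch of the two prior-information checks, containment within the pallet box and collinearity of the leg boxes, is given below, again reusing the hypothetical CandidateBox record. The collinearity test is simplified to "roughly equal vertical centre" for a frontal view, and the pixel tolerance is an assumed parameter, not a value from the patent.

```python
# Minimal sketch of the prior-information filter: keep only leg boxes that sit
# inside the pallet box and whose centres lie (approximately) on one line.
from typing import List

def inside(leg: CandidateBox, pallet: CandidateBox) -> bool:
    return (leg.x >= pallet.x and leg.y >= pallet.y and
            leg.x + leg.width <= pallet.x + pallet.width and
            leg.y + leg.height <= pallet.y + pallet.height)

def filter_legs(legs: List[CandidateBox], pallet: CandidateBox,
                tol_px: float = 10.0) -> List[CandidateBox]:
    legs = [l for l in legs if inside(l, pallet)]
    if len(legs) < 2:
        return legs
    # Simplified collinearity check: in a frontal view the leg centres share
    # roughly the same vertical position in the image.
    centers_y = [l.y + l.height / 2 for l in legs]
    median_y = sorted(centers_y)[len(centers_y) // 2]
    return [l for l, cy in zip(legs, centers_y) if abs(cy - median_y) <= tol_px]
```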
In some embodiments, a relationship between the supporting leg bounding boxes and the pallet bounding box may further be established, providing the basis for determining the angle of the pallet relative to the AGV from the supporting leg bounding boxes and their parameter information in the subsequent steps.
Step S3: calculate the distance between the object and the automated guided vehicle and the angle of the object relative to the vehicle from the supporting leg bounding boxes and the depth information, and upload them to the vehicle. Specifically, once the supporting leg bounding boxes are available, the distance from the AGV (where the lens is mounted) to the industrial pallet is obtained from the position parameters of the supporting leg bounding boxes, such as their coordinates, together with the acquired depth information, i.e., the actual distance from each point of the pallet, and especially of the supporting legs, to the lens. Further, by combining the relationship between the supporting legs and the pallet, the angle of the pallet relative to the AGV can be determined from the position coordinates of the supporting legs.
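Step S3 can be illustrated with the following sketch: each leg box centre is back-projected into camera coordinates using its mean depth, the pallet distance is taken as the mean leg distance, and the pallet's yaw relative to the AGV is the angle of the line through the leftmost and rightmost legs. The camera intrinsics (fx, cx) and millimetre depth units are assumed values for illustration, not parameters given in the patent.

```python
# Minimal sketch of step S3 (assumed intrinsics fx, cx; depth in millimetres).
# Distance = mean depth over the leg boxes; angle = orientation of the line
# through the leftmost and rightmost leg in the camera's x-z plane.
import math
import numpy as np

def pallet_distance_and_angle(depth_mm: np.ndarray, leg_boxes, fx=580.0, cx=320.0):
    points = []                                    # (lateral x, depth z) per leg, metres
    for (x, y, w, h) in leg_boxes:
        patch = depth_mm[y:y + h, x:x + w].astype(np.float32)
        valid = patch[patch > 0]
        if valid.size == 0:
            continue
        z = float(valid.mean()) / 1000.0           # metres along the optical axis
        u = x + w / 2.0                            # pixel column of the leg centre
        points.append(((u - cx) * z / fx, z))      # back-project to camera x-z plane
    if len(points) < 2:
        return None
    distance = sum(z for _, z in points) / len(points)
    (x1, z1), (x2, z2) = min(points), max(points)  # leftmost and rightmost legs
    yaw = math.degrees(math.atan2(z2 - z1, x2 - x1))  # 0 deg = pallet face parallel
    return distance, yaw
```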
In some embodiments, after the angle and position information has been determined, the AGV can be driven the appropriate distance to fork the load.
According to the object identification method based on the fusion of the infrared image and depth information, the single-stage object detection model operating on infrared images supports rapid training and recognition of many types of industrial pallets and simplifies the deployment of the related systems. The infrared image is more robust in environments with unstable illumination, so the detection results are more stable. In addition, because the method operates on two-dimensional images, detection is faster, and the heavy computation required to process three-dimensional point cloud data in the prior art, with its associated time and computing costs, is avoided.
The present application further provides an object recognition system based on infrared image and depth information fusion, which may include a depth camera 1, a single-stage object detection model 2 and a recognition module 3, as shown in fig. 2. Specifically, the depth camera 1 captures the infrared image and depth information of the object. The single-stage object detection model 2 determines, in the infrared image, an object bounding box for the object and a plurality of supporting leg bounding boxes for the object. The recognition module 3 calculates the distance between the object and the automated guided vehicle and the angle of the object relative to the vehicle from the supporting leg bounding boxes and the depth information and uploads them to the vehicle.
In some embodiments, the infrared image and the depth information each cover all of the supporting legs of the object and the plurality of insertion holes provided on the object that mate with the automated guided vehicle.
In some embodiments, the execution steps of the single-stage object detection model 2 may include: receiving the infrared image and resizing it according to the proportion of the image occupied by the object; extracting features from the infrared image with a built-in convolutional neural network to obtain a prediction result containing a plurality of object candidate boxes and parameter information for each candidate box, the parameter information including position parameters and confidence; and ranking the candidate boxes by confidence and selecting the one with the highest confidence as the object bounding box.
In some embodiments, the execution steps of the single-stage object detection model 2 may further include: receiving the infrared image and resizing it according to the proportion of the image occupied by the supporting legs; extracting features from the infrared image with the built-in convolutional neural network to obtain a prediction result containing a plurality of supporting leg candidate boxes and parameter information for each, including position parameters and confidence; and selecting the supporting leg candidate boxes whose confidence meets the preset threshold as the supporting leg bounding boxes.
In some embodiments, the system may further include a filtering module 4 and an association module 5. Specifically, the filtering module 4 filters out, using prior information about the identified object, the supporting leg bounding boxes that do not meet preset conditions, namely that the supporting leg bounding boxes lie on the same straight line and are located inside the object bounding box. The association module 5 establishes the relationship between the supporting leg bounding boxes and the object bounding box.
The system is proposed to implement the method, so the specific application scenarios and principles of each module are entirely consistent with the corresponding parts of the method; reference may be made to the description above, which is not repeated here.
According to the object recognition system based on the fusion of the infrared image and depth information, the single-stage object detection model operating on infrared images supports rapid training and recognition of many types of industrial pallets and simplifies the deployment of the related systems of the application. The infrared image is more robust in environments with unstable illumination, so the detection results are more stable. In addition, because the system operates on two-dimensional images, detection is faster, and the heavy computation required to process three-dimensional point cloud data in the prior art, with its associated time and computing costs, is avoided.
The objects, technical solutions and advantageous effects of the present invention have been described in further detail with reference to the above embodiments. It should be understood that the above description covers only specific embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made without departing from the spirit and principle of the present invention shall fall within its scope of protection.

Claims (10)

1. An object identification method based on infrared image and depth information fusion is characterized by comprising the following steps:
collecting an infrared image and depth information of an object;
respectively determining an object boundary frame of an object in the infrared image and a plurality of supporting leg boundary frames of the object by adopting a single-stage target detection model; and
and calculating and uploading the distance between the object and the automated guided vehicle and the angle information of the object to the automated guided vehicle according to the supporting leg boundary frame and the depth information.
2. The method of claim 1, wherein the infrared image and the depth information each include all of the support feet of the object and a plurality of receptacles provided on the object that are compatible with the automated guided vehicle.
3. The method of claim 1, wherein the determining an object bounding box of the object in the infrared image and a plurality of supporting foot bounding boxes of the object using a single-stage object detection model comprises:
receiving the infrared image by the single-stage target detection model, and adjusting the size of the infrared image according to the ratio of objects in the infrared image;
performing feature extraction on the infrared image by a convolutional neural network of the single-stage target detection model to obtain a prediction result containing a plurality of object candidate frames and parameter information of each object candidate frame, wherein the parameter information comprises position parameters and confidence coefficients of the object candidate frames;
and sequencing the confidence values of the object candidate frames, screening the object candidate frame corresponding to the confidence of the object candidate frame with the maximum value, and taking the object candidate frame as the object boundary frame.
4. The method of claim 1 or 3, wherein the determining the object bounding box of the object and the plurality of leg bounding boxes of the object in the infrared image using a single-stage object detection model, respectively, further comprises:
receiving the infrared image by the single-stage target detection model, and adjusting the size of the infrared image according to the proportion of supporting legs in the infrared image;
performing feature extraction on the infrared image by a convolutional neural network of the single-stage target detection model to obtain a prediction result containing a plurality of supporting leg candidate frames and parameter information of each supporting leg candidate frame, wherein the parameter information comprises position parameters and confidence degrees of the supporting leg candidate frames;
screening out the supporting leg candidate frames corresponding to the confidence degrees of the supporting leg candidate frames meeting the preset threshold value, and taking the supporting leg candidate frames as the supporting leg boundary frames.
5. The method of claim 4, after said determining an object bounding box of the object in the infrared image and a plurality of supporting foot bounding boxes of the object, respectively, using a single-stage object detection model, further comprising:
filtering the supporting leg boundary frames which do not meet preset conditions in the obtained supporting leg boundary frames by utilizing prior information for identifying the object, wherein the preset conditions comprise that the supporting leg boundary frames are on the same straight line and the supporting leg boundary frames are located in the object boundary frames; and
and establishing the relation between the supporting leg boundary frame and the object boundary frame.
6. An object recognition system based on infrared image and depth information fusion, comprising:
the depth camera is used for acquiring an infrared image and depth information of an object;
the single-stage target detection model is used for respectively determining an object boundary frame of an object in the infrared image and a plurality of supporting leg boundary frames of the object; and
and the identification module is used for calculating and uploading the distance between the object and the automated guided vehicle and the angle information of the object to the automated guided vehicle according to the supporting leg boundary frame and the depth information.
7. The system of claim 6, wherein the infrared image and the depth information each include all of the support feet of the object and a plurality of receptacles disposed on the object that are compatible with the automated guided vehicle.
8. The system of claim 6, wherein the performing step of the single-stage object detection model comprises:
receiving the infrared image, and adjusting the size of the infrared image according to the ratio of objects in the infrared image;
extracting the characteristics of the infrared image by a built-in convolutional neural network to obtain a prediction result containing a plurality of object candidate frames and parameter information of each object candidate frame, wherein the parameter information comprises position parameters and confidence degrees of the object candidate frames;
and sequencing the confidence values of the object candidate frames, screening the object candidate frame corresponding to the confidence of the object candidate frame with the maximum value, and taking the object candidate frame as the object boundary frame.
9. The system of claim 6 or 8, wherein the performing step of the single-stage object detection model further comprises:
receiving the infrared image by the single-stage target detection model, and adjusting the size of the infrared image according to the proportion of supporting legs in the infrared image;
extracting the characteristics of the infrared image by a built-in convolutional neural network to obtain a prediction result comprising a plurality of supporting leg candidate frames and parameter information of each supporting leg candidate frame, wherein the parameter information comprises position parameters and confidence degrees of the supporting leg candidate frames;
screening out the supporting leg candidate frames corresponding to the confidence degrees of the supporting leg candidate frames which accord with the preset threshold value, and taking the supporting leg candidate frames as the supporting leg boundary frames.
10. The system of claim 6, further comprising:
the filtering module is used for filtering the supporting leg boundary frames which do not meet preset conditions in the plurality of supporting leg boundary frames obtained by utilizing the prior information for identifying the object, wherein the preset conditions comprise that the supporting leg boundary frames are on the same straight line and the supporting leg boundary frames are located in the object boundary frames; and
and the association module is used for establishing the relation between the supporting leg boundary frame and the object boundary frame.
CN202110550741.7A (priority date 2021-05-19, filing date 2021-05-19) - Object identification method and system based on infrared image and depth information fusion - Pending

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110550741.7A CN115457494A (en) 2021-05-19 2021-05-19 Object identification method and system based on infrared image and depth information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110550741.7A CN115457494A (en) 2021-05-19 2021-05-19 Object identification method and system based on infrared image and depth information fusion

Publications (1)

Publication Number: CN115457494A
Publication Date: 2022-12-09

Family

ID=84294592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110550741.7A Pending CN115457494A (en) 2021-05-19 2021-05-19 Object identification method and system based on infrared image and depth information fusion

Country Status (1)

Country Link
CN (1) CN115457494A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116812590A (en) * 2023-08-29 2023-09-29 苏州双祺自动化设备股份有限公司 Visual-based unloading method and system
CN116812590B (en) * 2023-08-29 2023-11-10 苏州双祺自动化设备股份有限公司 Visual-based unloading method and system

Similar Documents

Publication Publication Date Title
CN112476434B (en) Visual 3D pick-and-place method and system based on cooperative robot
CN110163904B (en) Object labeling method, movement control method, device, equipment and storage medium
US20210326624A1 (en) Method, system and device for difference automatic calibration in cross modal target detection
US9767604B2 (en) Image analysis method by analyzing point cloud using hierarchical search tree
Li et al. Fast detection and location of longan fruits using UAV images
Wu et al. SVM-based image partitioning for vision recognition of AGV guide paths under complex illumination conditions
CN110260795B (en) Absolute displacement detection method based on incremental absolute grating ruler
Yusefi et al. LSTM and filter based comparison analysis for indoor global localization in UAVs
Kochi et al. A 3D shape-measuring system for assessing strawberry fruits
CN115457494A (en) Object identification method and system based on infrared image and depth information fusion
CN116863371A (en) Deep learning-based AGV forklift cargo pallet pose recognition method
US11557058B2 (en) Machine vision-based method and system to facilitate the unloading of a pile of cartons in a carton handling system
CN116309882A (en) Tray detection and positioning method and system for unmanned forklift application
CN116310902A (en) Unmanned aerial vehicle target detection method and system based on lightweight neural network
CN102044079A (en) Apparatus and method for tracking image patch in consideration of scale
KR102597692B1 (en) Method, apparatus, and computer program for measuring volume of objects by using image
CA3204014A1 (en) Machine vision-based method and system to facilitate the unloading of a pile of cartons in a carton handling system
CN113932712A (en) Melon and fruit vegetable size measuring method based on depth camera and key points
CN112907666A (en) Tray pose estimation method, system and device based on RGB-D
Kang et al. Visual perception and modelling in unstructured orchard for apple harvesting robots
Nguyen et al. Fruits detection and distance estimation using rgb-d camera for harvesting robot
Araujo et al. A New Approach of Monocular Visual Odometry to Trajectory Estimation Within a Plantation
Montoya Cavero Sweet pepper recognition and peduncle pose estimation
Li et al. Design of Intelligent Grabbing System Based on ROS
Gao et al. Improved binocular localization of kiwifruit in orchard based on fruit and calyx detection using YOLOv5x for robotic picking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination