CN117315264B - Tray detection method based on image recognition and related device


Info

Publication number
CN117315264B
CN117315264B (application CN202311622695.2A)
Authority
CN
China
Prior art keywords
image
feature vector
coding
tray
target
Prior art date
Legal status
Active
Application number
CN202311622695.2A
Other languages
Chinese (zh)
Other versions
CN117315264A (en)
Inventor
漆文星
Current Assignee
Shenzhen Pallet Sharing Technology Co ltd
Original Assignee
Shenzhen Pallet Sharing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Pallet Sharing Technology Co ltd filed Critical Shenzhen Pallet Sharing Technology Co ltd
Priority to CN202311622695.2A priority Critical patent/CN117315264B/en
Publication of CN117315264A publication Critical patent/CN117315264A/en
Application granted granted Critical
Publication of CN117315264B publication Critical patent/CN117315264B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/467 - Encoded features or binary features, e.g. local binary patterns [LBP]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of data processing, and discloses a tray detection method based on image recognition and a related device. The tray detection method based on image recognition comprises the following steps: acquiring a first image of a target tray through preset image acquisition equipment, and dividing the first image according to a preset tray region division strategy to obtain a divided image of each specific region; converting each divided image into corresponding sub-image stream data, preprocessing each sub-image stream data through a preset image processing algorithm, restoring each sub-image stream data into a divided processing image, and combining each divided processing image to obtain a second image of the target tray. According to the invention, image processing and depth estimation are carried out through a deep learning model and algorithm, so that the automation level of tray detection and parameter calculation is improved, accurate image processing is realized, depth information is acquired and utilized, and the accuracy of tray analysis is improved.

Description

Tray detection method based on image recognition and related device
Technical Field
The invention relates to the technical field of data processing, in particular to a tray detection method based on image recognition and a related device.
Background
Tray detection is a key link in many industrial, warehousing and logistics scenarios; accurate identification and parameter measurement of trays ensures that these processes run smoothly. Traditional methods rely mainly on manual measurement or simple image processing techniques, which leads to low efficiency and poor accuracy.
In general, existing tray detection technology has obvious limitations in the fineness of image processing, the acquisition of depth information and the accurate calculation of tray parameters, all of which affect its efficiency and accuracy in practical applications.
Therefore, there is a need to develop new methods for finer segmentation and processing of images, so that depth information can be used to calculate the important parameters of the tray more accurately.
Disclosure of Invention
The invention provides a tray detection method based on image recognition and a related device, which are used to achieve efficient image data processing and more accurate calculation of important tray parameters.
The first aspect of the invention provides a tray detection method based on image recognition, which comprises the following steps:
acquiring a first image of a target tray through preset image acquisition equipment, and dividing the first image according to a preset tray region division strategy to obtain a divided image of each specific region;
Converting each divided image into corresponding sub-image stream data based on the divided image of each specific area, preprocessing each sub-image stream data through a preset image processing algorithm, restoring each sub-image stream data into a divided processing image, and combining each divided processing image to obtain a second image of the target tray;
encoding the second image of the target tray to generate a plurality of corresponding encoding character strings, and inputting the encoding character strings into a trained depth estimation model to obtain depth data corresponding to the second image; wherein the depth data comprises at least depth distance data of a target tray rail;
determining the image position of a target tray cross bar in a first image based on the feature matrix of each specific area on the target tray, and acquiring the depth distance of the target tray cross bar from the depth data;
and calculating the three-dimensional coordinates of the target tray cross bar based on the image position of the target tray cross bar and the depth distance corresponding to the target tray cross bar, and calculating the offset angle of the target tray relative to the image acquisition equipment and the center position of the jack of the target tray based on the three-dimensional coordinates of the target tray cross bar.
Optionally, in a first implementation manner of the first aspect of the present invention, the training process of the depth estimation model includes:
acquiring an initial image data set, and acquiring depth information of a corresponding initial image as a label; extracting main features of the initial image through a predefined feature extraction algorithm to generate an image feature vector;
inputting the image feature vector into a preset deep learning model for network training; the deep learning model comprises a multi-layer convolution network and a two-layer coding network;
performing convolution operations on the image feature vector layer by layer through the multi-layer convolution network, optimizing according to the network parameters, and generating an intermediate feature vector after convolution processing;
inputting the intermediate feature vector into a first layer coding network for coding operation, completing intermediate feature vector mapping, and generating a first coding feature vector;
inputting the first coding feature vector into a second layer coding network to perform secondary coding operation, completing deeper feature mapping, and generating a second coding feature vector;
based on a preset vector splicing algorithm, splicing the first coding feature vector and the second coding feature vector to obtain a target coding feature vector;
randomly generating a unique noise feature vector with a Gaussian noise generator, the unique noise feature vector being generated only once in a set period, and then integrating the noise feature vector with the target coding feature vector to generate a final feature vector; wherein a database stores the rules for integrating the noise feature vector and the target coding feature vector;
and inputting the final feature vector and the corresponding depth information label into a classification layer of the deep learning model, performing network training, and iteratively optimizing network parameters until a loss function of the classification layer converges, and completing a training process of the deep learning model to obtain a trained depth estimation model.
Optionally, in a second implementation manner of the first aspect of the present invention, the performing, based on a preset vector splicing algorithm, a splicing process on the first coding feature vector and the second coding feature vector to obtain a target coding feature vector includes:
collecting a first coding feature vector and a corresponding first event of a first source environment, and a second coding feature vector and a corresponding second event of a second source environment; wherein the first event and the second event are preliminary estimates of what may occur for each event before any data is received;
analyzing the real-time state of the first event according to the first coding feature vector to obtain a first predicted value, and analyzing the real-time state of the second event according to the second coding feature vector to obtain a second predicted value;
generating a first posterior probability index for the first coding feature vector by using a preset first conditional probability formula in combination with a preset prior probability and the first predicted value; wherein the first posterior probability index is the joint characterization distribution of the first coding feature vector;
generating a second posterior probability index for the second coding feature vector by using a preset second conditional probability formula in combination with a preset prior probability and the second predicted value; wherein the second posterior probability index is the joint characterization distribution of the second coding feature vector;
splicing the first posterior probability index and the second posterior probability index through a preset vector splicing algorithm to generate a complete target coding feature vector; the target coding feature vector represents the weighted fusion of the first coding feature vector and the second coding feature vector in a unified multi-modal vector space.
Optionally, in a third implementation manner of the first aspect of the present invention, the encoding processing is performed on the second image of the target tray to generate a plurality of corresponding encoding strings, including:
Acquiring a plurality of sub-region images of a second image of the target tray based on a preset plate characteristic mark; wherein, the database stores the dynamic mapping rule between the plate characteristic mark and the sub-region image of the second image of the target tray in advance;
analyzing the acquired multiple sub-region images through the trained section analysis model, and extracting attribute modes of the multiple sub-region images; the section analysis model is obtained after training based on a convolutional neural network model;
acquiring the attribute mode and matching it with a preset digital conversion data table so as to construct a target code table; wherein the target code table is constructed by matching the attribute mode with the preset digital conversion data table on the basis of a standard code table;
retrieving, in the target code table, the coding information corresponding to the plurality of sub-region images to obtain the specific coding information of each sub-region image;
adding attribute labels to each sub-region image according to the specific coding information, integrating each sub-region image with the attribute labels to obtain a digital index of the complete sub-region image, and converting the digital index into a plurality of corresponding coding character strings; the database extracts and stores the conversion rules of each digitized index and a plurality of coded character strings.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the attribute mode includes at least brightness, color distribution, and edge characteristics of the sub-area image.
The second aspect of the present invention provides an image recognition-based tray detection device, comprising:
the acquisition module is used for acquiring a first image of the target tray through preset image acquisition equipment, and dividing the first image according to a preset tray region division strategy to obtain divided images of each specific region;
the generation module is used for converting each divided image into corresponding sub-image stream data based on the divided image of each specific area, preprocessing each sub-image stream data through a preset image processing algorithm, restoring each sub-image stream data into a divided processing image, and combining each divided processing image to obtain a second image of the target tray;
the coding module is used for carrying out coding processing on the second image of the target tray, generating a plurality of corresponding coding character strings, and inputting the coding character strings into the trained depth estimation model to obtain depth data corresponding to the second image; wherein the depth data comprises at least depth distance data of a target tray rail;
The determining module is used for determining the image position of the target tray cross bar in the first image based on the feature matrix of each specific area on the target tray and obtaining the depth distance of the target tray cross bar from the depth data;
the calculating module is used for calculating the three-dimensional coordinates of the target tray cross bar based on the image position of the target tray cross bar and the depth distance corresponding to the target tray cross bar, and calculating the offset angle of the target tray relative to the image acquisition equipment and the center position of the jack of the target tray based on the three-dimensional coordinates of the target tray cross bar.
A third aspect of the present invention provides a tray detection apparatus based on image recognition, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the image recognition-based tray detection device to perform the image recognition-based tray detection method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the above-described image recognition-based tray detection method.
In the technical scheme provided by the invention, the beneficial effects are as follows: the invention provides a tray detection method based on image recognition and a related device, wherein a first image of a target tray is acquired through preset image acquisition equipment, and the first image is segmented according to a preset tray region segmentation strategy to obtain segmented images of each specific region; converting each divided image into corresponding sub-image stream data based on the divided image of each specific area, preprocessing each sub-image stream data through a preset image processing algorithm, restoring each sub-image stream data into a divided processing image, and combining each divided processing image to obtain a second image of the target tray; encoding the second image of the target tray to generate a plurality of corresponding encoding character strings, and inputting the encoding character strings into a trained depth estimation model to obtain depth data corresponding to the second image; determining the image position of a target tray cross bar in a first image based on the feature matrix of each specific area on the target tray, and acquiring the depth distance of the target tray cross bar from the depth data; and calculating the three-dimensional coordinates of the target tray cross bar based on the image position of the target tray cross bar and the depth distance corresponding to the target tray cross bar, and calculating the offset angle of the target tray relative to the image acquisition equipment and the center position of the jack of the target tray based on the three-dimensional coordinates of the target tray cross bar. According to the invention, through a preset tray region segmentation strategy and an image processing algorithm, the system can accurately segment and process the image of the target tray, so that the detailed information of each specific region is effectively extracted, and the analysis accuracy is increased. And analyzing the coded image through a depth estimation model, so as to obtain the depth data of the image. Acquiring depth information is very useful in many scenarios, such as in calculating the exact position of objects and analyzing the spatial layout of a scene. Through the acquired depth data and the characteristic matrix of the specific area, the system can accurately calculate the three-dimensional coordinates of the cross bar of the target tray, the offset angle of the tray and the center position of the jack. The accuracy and the reliability of the calculation of the tray parameters are improved. The image processing and the depth estimation are carried out by adopting a deep learning model and algorithm, so that the influence of manual operation and experience decision is reduced, and the automation level of tray detection and parameter calculation is improved.
Drawings
Fig. 1 is a schematic diagram of steps of a tray detection method based on image recognition in an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a tray detection device based on image recognition in an embodiment of the invention.
Detailed Description
The embodiment of the invention provides a tray detection method and a related device based on image recognition. The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For easy understanding, the following describes a specific flow of an embodiment of the present invention, referring to fig. 1, and an embodiment of a tray detection method based on image recognition in the embodiment of the present invention includes:
Step 101, acquiring a first image of a target tray through preset image acquisition equipment, and dividing the first image according to a preset tray region division strategy to obtain divided images of each specific region;
It can be understood that the execution subject of the present invention may be a tray detection device based on image recognition, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.
Specifically, to implement step 101, the following is a specific embodiment:
Image acquisition: the image of the target tray is acquired by using a preset image acquisition device. The image acquisition device may be a camera or a scanner used to acquire image data of the target tray. The acquired image may be a color image, a gray-scale image, or another form of digital image.
Presetting a tray region segmentation strategy: the region segmentation strategy of the tray is preset according to specific requirements and application scenarios. The tray region segmentation strategy is a segmentation algorithm or method for the target tray, used to distinguish the tray region in the image from the surrounding environment. For example, the position and extent of the tray can be determined from image features such as color, texture and contours.
Image segmentation: the acquired first image of the target tray is segmented according to the preset tray region segmentation strategy. The segmentation process involves image processing and analysis techniques such as image filtering, edge detection and region growing. By processing and analyzing the image at the pixel level, the tray area can be extracted from the whole image and the segmented image of each specific region can be obtained.
Obtaining a segmented image of a specific region: segmented images of each specific region are obtained according to the image segmentation result. The specific regions may be defined according to different parts or properties of the tray, such as a cargo area, a marking area or a blank area. The segmented images may be used for subsequent analysis and processing such as object recognition, feature extraction and measurement.
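For illustration only, a minimal sketch of this acquisition-and-segmentation step using OpenCV is given below; the camera index, grayscale threshold and minimum contour area are assumed values and not part of the disclosed tray region segmentation strategy.

```python
import cv2
import numpy as np

def acquire_first_image(camera_index: int = 0) -> np.ndarray:
    # Capture one frame of the target tray from a preset camera (index 0 assumed).
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("image acquisition failed")
    return frame

def segment_tray_regions(image: np.ndarray) -> dict:
    # Toy region-segmentation strategy: global threshold plus external contours.
    # The threshold (120) and minimum area (500 px) are assumed values.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 120, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    regions = {}
    for i, contour in enumerate(contours):
        if cv2.contourArea(contour) < 500:
            continue
        x, y, w, h = cv2.boundingRect(contour)
        # Each specific region is returned as a crop together with its bounding box.
        regions[f"region_{i}"] = ((x, y, w, h), image[y:y + h, x:x + w])
    return regions
```

Under these assumptions, segment_tray_regions(acquire_first_image()) returns the segmented image of each specific region keyed by an illustrative region name.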
Step 102, converting each divided image into corresponding sub-image stream data based on the divided image of each specific area, preprocessing each sub-image stream data through a preset image processing algorithm, restoring each sub-image stream data into a divided processing image, and combining each divided processing image to obtain a second image of the target tray;
specifically, to implement step 102, the following is a specific embodiment:
The divided image is converted into sub-image stream data: each divided image is converted into corresponding sub-image stream data according to the divided image of each specific region. Sub-image stream data is a form of encoding and representing each specific region in a divided image. For example, each segmented image may be converted into a corresponding binary data stream by using an image encoding algorithm, such as JPEG, PNG, or the like.
Preprocessing sub-image stream data by a preset image processing algorithm: and preprocessing each sub-image stream data by adopting a preset image processing algorithm. The preprocessing process may include image filtering, edge enhancement, noise cancellation, etc. to improve image quality and information readability, depending on the specific needs and application. For example, a mean filter may be applied to smooth sub-image stream data to reduce noise interference.
Restoring to a segmentation-processed image: the preprocessed sub-image stream data is restored into a segmentation-processed image. The restoration process involves image decoding and reconstruction techniques, such as decoding the binary data stream into a pixel matrix and reconstructing the pixel matrix according to the image size and the segmented regions. Through this restoration, a segmentation-processed image that has undergone the preprocessing can be obtained.
Combining the segmentation-processed images to obtain a second image: all the preprocessed segmentation-processed images are combined to obtain the second image of the target tray. The combination can adopt an image stitching technique to stitch each segmentation-processed image according to a preset layout and position. For example, the complete target tray image may be formed by stitching the individual segmentation-processed images in a certain order or in an overlapping manner.
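The following sketch illustrates one way such a convert-preprocess-restore-combine pipeline could look, assuming PNG as the sub-image stream format and a 3x3 mean filter as the preset image processing algorithm; both choices are assumptions rather than the disclosed configuration.

```python
import cv2
import numpy as np

def to_stream(sub_image: np.ndarray) -> bytes:
    # Convert a divided image into sub-image stream data (PNG byte stream assumed).
    ok, buf = cv2.imencode(".png", sub_image)
    if not ok:
        raise RuntimeError("sub-image encoding failed")
    return buf.tobytes()

def preprocess_stream(stream: bytes) -> np.ndarray:
    # Restore the stream to a pixel matrix and apply a mean filter (3x3 assumed).
    sub = cv2.imdecode(np.frombuffer(stream, dtype=np.uint8), cv2.IMREAD_COLOR)
    return cv2.blur(sub, (3, 3))

def combine_into_second_image(processed: dict, canvas_shape: tuple) -> np.ndarray:
    # Stitch each segmentation-processed image back at its original (x, y, w, h) box.
    canvas = np.zeros(canvas_shape, dtype=np.uint8)
    for (x, y, w, h), sub in processed.values():
        canvas[y:y + h, x:x + w] = cv2.resize(sub, (w, h))
    return canvas
```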
Step 103, performing coding processing on the second image of the target tray to generate a plurality of corresponding coding character strings, and inputting the coding character strings into a trained depth estimation model to obtain depth data corresponding to the second image; wherein the depth data comprises at least depth distance data of a target tray rail;
specifically, to implement step 103, the following is a specific embodiment:
Image coding processing: coding processing is carried out on the second image of the target tray to generate a plurality of corresponding coding character strings. The encoding process may be performed using an image encoding algorithm or a compression algorithm, such as JPEG, PNG, or the like. The encoding algorithm converts the image into a compressed format or binary data for subsequent processing and transmission.
Depth estimation model: the coding character strings are processed by the trained depth estimation model. The depth estimation model is a neural network model trained by deep learning techniques for estimating depth information from an image. The model predicts the depth data of the second image of the target tray from the input coding character strings.
Depth data generation: the coding character strings are input into the trained depth estimation model to obtain the depth data corresponding to the second image. The depth data is a representation of the depth distance information of the pixel points in the image. Through the processing and inference of the depth estimation model, the coding character strings are converted into specific depth values that represent the distance between each pixel point in the second image of the target tray and the camera.
Target tray cross-bar depth distance data extraction: the depth distance data of the target tray cross bar is extracted from the depth data. According to the shape and position of the specific target tray cross bar, the depth information of the cross bar in the image, namely its distance from the camera, can be extracted through image analysis and processing techniques. The depth distance data represents the spatial position and size of the cross bar, providing a basis for subsequent analysis and control.
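As an illustrative sketch only: the tiling size and base64 string encoding below are assumptions, and the trained depth estimation model is treated as a black box that maps the coding character strings to a per-pixel depth map from which the cross-bar depth distance is read.

```python
import base64
import cv2
import numpy as np

def encode_to_strings(second_image: np.ndarray, tile: int = 64) -> list:
    # Encode the second image as a list of coding character strings (one per tile).
    strings = []
    h, w = second_image.shape[:2]
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            ok, buf = cv2.imencode(".png", second_image[y:y + tile, x:x + tile])
            if ok:
                strings.append(base64.b64encode(buf.tobytes()).decode("ascii"))
    return strings

def crossbar_depth(depth_map: np.ndarray, crossbar_box: tuple) -> float:
    # Read the cross-bar depth distance as the median depth inside its bounding box.
    x, y, w, h = crossbar_box
    return float(np.median(depth_map[y:y + h, x:x + w]))
```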
Step 104, determining the image position of a target tray cross bar in the first image based on the feature matrix of each specific area on the target tray, and acquiring the depth distance of the target tray cross bar from the depth data;
specifically, to implement step 104, the following is a specific embodiment:
Specific region feature matrix: the characteristics of each specific area on the target tray are extracted to generate a feature matrix. Feature extraction may use computer vision techniques, such as image processing and pattern recognition algorithms, to extract and describe the image features of each specific region. The feature matrix is a numerical matrix that records the feature information of each specific area, such as edges and textures.
Target tray cross-bar image position determination: the position of the cross bar of the target tray in the first image is determined based on the feature matrix of each specific area on the target tray, by carrying out image processing and feature matching on the first image. For feature matching, the feature matrix of the target tray can be compared with the features of the corresponding region in the first image using feature matching algorithms such as SIFT or SURF to find the matching position.
Acquiring the cross-bar depth distance from the depth data: the depth distance of the target tray cross bar is extracted from the depth data according to the position of the target tray cross bar in the first image. Based on the position of the cross bar in the first image, the corresponding pixel positions can be determined, and the depth distance of the cross bar of the target tray can be obtained by querying the values at the corresponding positions in the depth data. The depth data may also be obtained by means of a depth sensor, laser scanning, and the like.
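A sketch of the cross-bar localisation described above. The patent text mentions SIFT and SURF; ORB is used here only because it ships with stock OpenCV, and the cross-bar template image is an assumed input.

```python
import cv2
import numpy as np

def locate_crossbar(first_image: np.ndarray, crossbar_template: np.ndarray) -> tuple:
    # Match features of the cross-bar template against the first image and
    # return the bounding box (x, y, w, h) around the matched keypoints.
    gray_t = cv2.cvtColor(crossbar_template, cv2.COLOR_BGR2GRAY)
    gray_i = cv2.cvtColor(first_image, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create()
    kp_t, des_t = orb.detectAndCompute(gray_t, None)
    kp_i, des_i = orb.detectAndCompute(gray_i, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_t, des_i), key=lambda m: m.distance)[:20]
    points = np.float32([kp_i[m.trainIdx].pt for m in matches])
    return cv2.boundingRect(points)
```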
Step 105, calculating the three-dimensional coordinates of the target tray cross bar based on the image position of the target tray cross bar and the depth distance corresponding to the target tray cross bar, and calculating the offset angle of the target tray relative to the image acquisition equipment and the center position of the jack of the target tray based on the three-dimensional coordinates of the target tray cross bar.
Specifically, to implement step 105, the following is a specific embodiment:
Calculating three-dimensional coordinates based on the image position and depth distance: knowing the position of the target tray cross bar in the first image and the corresponding depth distance, the three-dimensional coordinates can be calculated. According to the camera calibration parameters and the geometric relationship of the image acquisition equipment, pixel positions in the image can be converted into coordinates in three-dimensional space. By mapping the image position together with the corresponding depth distance, the three-dimensional coordinates of the cross bar of the target tray are obtained.
Calculating the offset angle relative to the acquisition device: based on the three-dimensional coordinates of the target tray cross bar, the offset angle of the target tray relative to the image acquisition device can be calculated. The offset angle represents the relative positional relationship between the target tray and the image acquisition device and describes the angular difference between the target tray and the acquisition device in space. From the three-dimensional coordinate results, combined with the camera calibration parameters and an attitude-resolving algorithm, the rotation angle or attitude information of the target tray relative to the acquisition equipment can be obtained.
Calculating the center position of the jack: based on the three-dimensional coordinates of the target tray cross bar, the position of the center of the target tray jack can be calculated. The jack center position represents the spatial position of the jack on the target tray and is used to determine the geometric center point of the target tray jack. By combining the three-dimensional coordinate results with the geometric shape of the target tray and the jack layout, the three-dimensional coordinate position of the center of the target tray jack can be obtained.
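The geometry of this step can be sketched under the pinhole camera model as follows; the intrinsics fx, fy, cx, cy would come from camera calibration, and taking the jack center as the midpoint of the cross bar is an illustrative simplification of the tray geometry, not the disclosed layout.

```python
import numpy as np

def pixel_to_camera(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    # Back-project an image position (u, v) with depth distance Z into 3-D
    # camera coordinates using the pinhole model.
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

def tray_offset_angle(left_end: np.ndarray, right_end: np.ndarray) -> float:
    # Yaw offset of the tray relative to the acquisition device: the angle of the
    # cross bar (its two 3-D endpoints) against the camera X axis, in degrees.
    direction = right_end - left_end
    return float(np.degrees(np.arctan2(direction[2], direction[0])))

def jack_center(left_end: np.ndarray, right_end: np.ndarray) -> np.ndarray:
    # Illustrative assumption: the insertion-hole (jack) center is taken as the
    # midpoint of the cross bar; a real layout offset would be added here.
    return (left_end + right_end) / 2.0
```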
In the embodiment of the invention, the beneficial effects are as follows: the position of the cross bar of the target tray in the first image can be determined by extracting and generating the feature matrix of the specific area on the target tray. And then, according to the corresponding pixels of the cross bar position in the first image, combining the depth data, and acquiring the depth distance information of the cross bar of the target tray. The position and distance of the cross bar of the target tray can be accurately measured, and important visual information is provided for subsequent analysis and control. By calculating the three-dimensional coordinates of the cross bar of the target tray, the offset angle of the target tray relative to the image acquisition equipment and the information of the center position of the jack of the target tray can be obtained. By utilizing fusion analysis of the image and the depth data, positioning and posture estimation of the target tray in a three-dimensional space can be realized, and accurate space reference is provided for subsequent target control and operation.
Another embodiment of the tray detection method based on image recognition in the embodiment of the invention comprises the following steps:
the training process of the depth estimation model comprises the following steps:
acquiring an initial image data set, and acquiring depth information of a corresponding initial image as a label; extracting main features of the initial image through a predefined feature extraction algorithm to generate an image feature vector;
inputting the image feature vector into a preset deep learning model for network training; the deep learning model comprises a multi-layer convolution network and a two-layer coding network;
performing convolution operations on the image feature vector layer by layer through the multi-layer convolution network, optimizing according to the network parameters, and generating an intermediate feature vector after convolution processing;
inputting the intermediate feature vector into a first layer coding network for coding operation, completing intermediate feature vector mapping, and generating a first coding feature vector;
inputting the first coding feature vector into a second layer coding network to perform secondary coding operation, completing deeper feature mapping, and generating a second coding feature vector;
based on a preset vector splicing algorithm, splicing the first coding feature vector and the second coding feature vector to obtain a target coding feature vector;
randomly generating a unique noise feature vector with a Gaussian noise generator, the unique noise feature vector being generated only once in a set period, and then integrating the noise feature vector with the target coding feature vector to generate a final feature vector; wherein a database stores the rules for integrating the noise feature vector and the target coding feature vector;
and inputting the final feature vector and the corresponding depth information label into a classification layer of the deep learning model, performing network training, and iteratively optimizing network parameters until a loss function of the classification layer converges, and completing a training process of the deep learning model to obtain a trained depth estimation model.
Specifically, in a specific embodiment of the present invention, a training process of a depth estimation model is described, and the specific method is as follows:
Acquiring an initial image dataset and extracting features: first, an initial image dataset is acquired, and the main features of the initial images are extracted by a predefined feature extraction algorithm, such as SIFT, SURF or HOG, to generate image feature vectors. Meanwhile, the depth information corresponding to each initial image is acquired as a label.
Performing network training of a deep learning model: and inputting the image feature vector into a preset deep learning model for training. The deep learning model may be a Convolutional Neural Network (CNN) model, a self-encoder network model, or the like, and the model includes a multi-layer convolutional network and a two-layer encoding network.
Performing convolution operations and generating a feature vector: convolution operations are carried out on the image feature vector layer by layer through the multi-layer convolution network, and optimization is performed based on the network parameters to obtain an intermediate feature vector after convolution processing.
Performing coding operations and generating coding feature vectors: the intermediate feature vector is input into the first-layer coding network, which completes a coding operation and generates a first coding feature vector. Subsequently, the first coding feature vector is input into the second-layer coding network for a secondary coding operation, thereby generating a second coding feature vector.
Performing feature vector processing: the first coding feature vector and the second coding feature vector are spliced using a predefined vector splicing algorithm to obtain the target coding feature vector.
Generating a final feature vector: based on a Gaussian noise generator, a unique noise feature vector is randomly generated and integrated with the target coding feature vector to obtain the final feature vector.
Inputting the depth information label for network training: the final feature vector and the corresponding depth information label are input together into the classification layer of the deep learning model for further network training. The network parameters are iteratively optimized until the loss function of the classification layer converges, thus completing the training of the depth estimation model.
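As a sketch only, the training pipeline described above could be prototyped in PyTorch as below; the layer sizes, the use of discretised depth bins for the classification layer, concatenation as the noise-integration rule, and the grayscale image standing in for the precomputed image feature vector are all assumptions rather than the disclosed configuration.

```python
import torch
import torch.nn as nn

class DepthEstimator(nn.Module):
    # Multi-layer convolution network, two cascaded coding layers, splicing of the
    # two coded vectors, fusion with a Gaussian noise vector, and a classification
    # layer over discretised depth bins.
    def __init__(self, depth_bins: int = 32, noise_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(            # x: grayscale image batch (B, 1, H, W)
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
        )                                     # intermediate feature vector
        self.enc1 = nn.Linear(32 * 8 * 8, 256)    # first-layer coding network
        self.enc2 = nn.Linear(256, 128)           # second-layer coding network
        self.head = nn.Linear(256 + 128 + noise_dim, depth_bins)

    def forward(self, x: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        mid = self.conv(x)
        e1 = torch.relu(self.enc1(mid))
        e2 = torch.relu(self.enc2(e1))
        target = torch.cat([e1, e2], dim=1)       # spliced target coding feature vector
        final = torch.cat([target, noise], dim=1) # integration rule assumed: concatenation
        return self.head(final)

def train_depth_estimator(model: DepthEstimator, loader, epochs: int = 10) -> None:
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    noise = torch.randn(1, 64)                    # unique noise vector, drawn once per period
    for _ in range(epochs):
        for images, depth_labels in loader:       # depth labels as bin indices
            logits = model(images, noise.expand(images.size(0), -1))
            loss = loss_fn(logits, depth_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```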
In the embodiment of the invention, the beneficial effects are as follows: according to the embodiment of the invention, through a deep learning model and combining a main feature extraction technology, a convolution network and a coding network, the feature representation with more expressive force and distinguishing degree is generated by carrying out deep feature extraction and repeated mapping on the image, so that the prediction precision of depth information is improved. Secondly, the generalization capability of the model is enhanced: by splicing different layers of coding features and combining noise features, the stability of the model is improved, so that the network has stronger prediction performance when facing different depth scenes, and the generalization capability of the model is improved. In addition, the training efficiency of the deep learning model is improved: by iterative optimization of model parameters until the loss function converges, the deep learning model can be converged as quickly as possible, thereby improving training efficiency. The trained depth estimation model can provide support for scenes needing depth information, such as roadblock recognition and avoidance of unmanned vehicles, real-time response of somatosensory games, environment perception of robot navigation and the like, and has wide application prospects.
Another embodiment of the tray detection method based on image recognition in the embodiment of the invention comprises the following steps:
The splicing processing is performed on the first coding feature vector and the second coding feature vector based on a preset vector splicing algorithm to obtain a target coding feature vector, including:
collecting a first coding feature vector and a corresponding first event of a first source environment, and a second coding feature vector and a corresponding second event of a second source environment; wherein the first event and the second event are preliminary estimates of what may occur for each event before any data is received;
analyzing the real-time state of the first event according to the first coding feature vector to obtain a first predicted value, and analyzing the real-time state of the second event according to the second coding feature vector to obtain a second predicted value;
generating a first posterior probability index for the first coding feature vector by using a preset first conditional probability formula in combination with a preset prior probability and the first predicted value; wherein the first posterior probability index is the joint characterization distribution of the first coding feature vector;
generating a second posterior probability index for the second coding feature vector by using a preset second conditional probability formula in combination with a preset prior probability and the second predicted value; wherein the second posterior probability index is the joint characterization distribution of the second coding feature vector;
splicing the first posterior probability index and the second posterior probability index through a preset vector splicing algorithm to generate a complete target coding feature vector; the target coding feature vector represents the weighted fusion of the first coding feature vector and the second coding feature vector in a unified multi-modal vector space.
Specifically, in a specific embodiment of the present invention, how to splice the first encoded feature vector and the second encoded feature vector based on a preset vector splicing algorithm to obtain the target encoded feature vector is described. The specific method comprises the following steps:
Collecting source-environment coding feature vectors and corresponding events: first, the first and second coding feature vectors corresponding to preliminarily estimated events (e.g., the damaged condition of a pallet) are acquired from a first source environment (e.g., a warehouse environment) and a second source environment (e.g., a distribution point environment), respectively. These coding feature vectors may be extracted from the environment images by an image processing algorithm.
Analyzing the real-time state of the event: the real-time states of the first event and the second event are then analyzed according to the first and second coding feature vectors respectively, and the first and second predicted values are obtained. For example, if the coding feature vector contains shape and color information of the tray, such information can be used to predict the extent of breakage of the tray.
Generating posterior probability indexes: the preset prior probability and the predicted values are then combined with the first and second conditional probability formulas to generate posterior probability indexes for the first and second coding feature vectors respectively. These posterior probability indexes characterize the joint characterization distribution of the coding feature vectors and provide a basis for further decisions.
Generating the target coding feature vector: finally, the first posterior probability index and the second posterior probability index are spliced through the preset vector splicing algorithm to generate the complete target coding feature vector. The target coding feature vector integrates the information of the first source environment and the second source environment in a unified multi-modal vector space and can reflect the comprehensive states of the two environments more accurately.
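The conditional probability formulas themselves are not given in the published text; the sketch below shows one plausible Bayesian reading in which the predicted value of each event serves as a per-dimension likelihood and the two posterior indices weight the spliced vector.

```python
import numpy as np

def posterior_index(prior: float, likelihood: np.ndarray) -> np.ndarray:
    # Per-dimension posterior probability index via Bayes' rule; 'likelihood' is
    # the predicted value for the event derived from the coding feature vector.
    evidence = prior * likelihood + (1.0 - prior) * (1.0 - likelihood)
    return prior * likelihood / np.clip(evidence, 1e-9, None)

def splice_coded_vectors(enc1: np.ndarray, pred1: np.ndarray,
                         enc2: np.ndarray, pred2: np.ndarray,
                         prior1: float = 0.5, prior2: float = 0.5) -> np.ndarray:
    # Generate both posterior indices and splice them into the target coding
    # feature vector; each branch is weighted by its mean posterior so that the
    # result is a weighted fusion in a single multi-modal vector space.
    post1 = posterior_index(prior1, pred1)
    post2 = posterior_index(prior2, pred2)
    return np.concatenate([post1.mean() * enc1 * post1,
                           post2.mean() * enc2 * post2])
```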
In the embodiment of the invention, the beneficial effects are as follows: the coding feature vectors extracted under different environments are analyzed and calculated in detail, so that the detection accuracy of the damaged state of the tray is effectively improved. By generating posterior probability indexes, a decision basis can be provided for analyzing the damage condition of the tray, further assisting the related decision process and making it more scientific and credible. Feature fusion in the multi-modal vector space allows the target coding feature vector to take both the warehouse environment and the distribution point environment into account, so that the resulting decision is more comprehensive and misjudgment caused by switching between environments is effectively prevented.
Another embodiment of the tray detection method based on image recognition in the embodiment of the invention comprises the following steps:
the encoding processing is performed on the second image of the target tray, and a plurality of corresponding encoding character strings are generated, including:
acquiring a plurality of sub-region images of a second image of the target tray based on a preset plate characteristic mark; wherein, the database stores the dynamic mapping rule between the plate characteristic mark and the sub-region image of the second image of the target tray in advance;
analyzing the acquired multiple sub-region images through the trained section analysis model, and extracting attribute modes of the multiple sub-region images; the section analysis model is obtained after training based on a convolutional neural network model;
acquiring the attribute mode and matching it with a preset digital conversion data table so as to construct a target code table; wherein the target code table is constructed by matching the attribute mode with the preset digital conversion data table on the basis of a standard code table;
retrieving, in the target code table, the coding information corresponding to the plurality of sub-region images to obtain the specific coding information of each sub-region image;
adding attribute labels to each sub-region image according to the specific coding information, integrating each sub-region image with the attribute labels to obtain a digital index of the complete sub-region image, and converting the digital index into a plurality of corresponding coding character strings; the database extracts and stores the conversion rules of each digitized index and a plurality of coded character strings.
Specifically, the specific steps of the embodiment of the invention are as follows:
Collecting sub-region images: first, a plurality of sub-region images of the second image of the target tray are acquired based on preset plate characteristic marks, such as edges, colors, textures, and the like. For example, the target tray image may be divided into a plurality of small, fixed-size square regions, each labeled with a plate characteristic mark.
Resolving the sub-region images: the acquired multiple sub-region images are then analyzed through the trained section analysis model. The section analysis model is a deep learning model, a specific type of which may be a Convolutional Neural Network (CNN) model; through training on a large amount of labeled data, the model learns to extract useful image characteristics, namely attribute modes, from the sub-region images.
Constructing a target code table: the analyzed attribute mode is then converted into a digital string by using the preset digital conversion data table, and the target code table is constructed. The digital conversion data table may contain the conversion rules from attribute modes to digital strings, so that the conversion from visual information to digital information is realized.
Generating coding information: in the target code table, the coding information corresponding to each sub-region image can be retrieved, and the specific coding information of each sub-region image can be obtained.
Generating a digitized index and coding character strings: finally, attribute labels are added to each sub-region image according to the obtained specific coding information, which is equivalent to tagging each sub-region image. The sub-region images are integrated with their corresponding attribute labels to form a digitized index, and the digitized index is converted into a plurality of coding character strings by looking up the conversion rules between digitized indexes and coding character strings stored in the database.
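A toy sketch of this encoding pipeline; the brightness, colour and edge descriptors, the enumerated code table and the hash-based string conversion are assumptions standing in for the section analysis model, the digital conversion data table and the conversion rules stored in the database.

```python
import hashlib
import cv2
import numpy as np

def attribute_pattern(sub_image: np.ndarray) -> tuple:
    # Stand-in for the section analysis model: quantised brightness, dominant
    # hue bin and edge density describe the sub-region's attribute mode.
    gray = cv2.cvtColor(sub_image, cv2.COLOR_BGR2GRAY)
    brightness = int(gray.mean() // 32)                                      # 0..7
    hsv = cv2.cvtColor(sub_image, cv2.COLOR_BGR2HSV)
    colour = int(np.argmax(cv2.calcHist([hsv], [0], None, [4], [0, 180])))  # 0..3
    edge_level = int(cv2.Canny(gray, 100, 200).mean() // 32)                # 0..7
    return brightness, colour, edge_level

# Assumed digital conversion data table: attribute pattern -> code entry.
CODE_TABLE = {(b, c, e): f"{b}{c}{e}" for b in range(8) for c in range(4) for e in range(8)}

def encode_sub_regions(sub_images: dict) -> list:
    # Label each sub-region with its code, build the digitised index and convert
    # it to a coding character string (a short hash is the assumed conversion rule).
    strings = []
    for name, sub in sub_images.items():
        code = CODE_TABLE[attribute_pattern(sub)]
        digitised_index = f"{name}:{code}"
        strings.append(hashlib.sha1(digitised_index.encode()).hexdigest()[:16])
    return strings
```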
In the embodiment of the invention, the beneficial effects are as follows: the embodiment of the invention can effectively convert the image information into the coding character string which is easy to process by a computer. The preset plate characteristic marks and the digital conversion data sheet are combined with the attribute modes extracted by the section analysis model to accurately describe the state of the tray at the digital level, so that the method has important significance for subsequent image analysis and state judgment. In addition, in the embodiment, the acquired multiple sub-region images are analyzed by utilizing the trained section analysis model (such as a convolutional neural network), the attribute mode is extracted, and then the digital processing is performed by combining the digital conversion data sheet, so that the tray state is judged on a highly accurate level, and the judgment precision is improved. Meanwhile, due to the adoption of automatic processing, human resources are greatly saved, and the efficiency is improved. And the coding information of all the sub-region images is integrated and converted into a group of complete coding character strings, so that the possibility is provided for analyzing the overall state of the tray, for example, the overall damage degree of the tray is conveniently analyzed, or the specific distribution condition of the damaged region is calculated, so that a decision maker can comprehensively view the condition of the tray. The last coded character string represents the information of the image, but the data volume is far smaller than the image itself, which makes it more convenient to store and transmit, and is especially suitable for remote monitoring in network environment.
Another embodiment of the tray detection method based on image recognition in the embodiment of the invention comprises the following steps:
the attribute mode at least comprises brightness, color distribution and edge characteristics of the sub-region image.
The above description is made on the tray detection method based on image recognition in the embodiment of the present invention, and the following description is made on the tray detection device based on image recognition in the embodiment of the present invention, referring to fig. 2, one embodiment of the tray detection device based on image recognition in the embodiment of the present invention includes:
the acquisition module is used for acquiring a first image of the target tray through preset image acquisition equipment, and dividing the first image according to a preset tray region division strategy to obtain divided images of each specific region;
the generation module is used for converting each divided image into corresponding sub-image stream data based on the divided image of each specific area, preprocessing each sub-image stream data through a preset image processing algorithm, restoring each sub-image stream data into a divided processing image, and combining each divided processing image to obtain a second image of the target tray;
the coding module is used for carrying out coding processing on the second image of the target tray, generating a plurality of corresponding coding character strings, and inputting the coding character strings into the trained depth estimation model to obtain depth data corresponding to the second image; wherein the depth data comprises at least depth distance data of a target tray rail;
The determining module is used for determining the image position of the target tray cross bar in the first image based on the feature matrix of each specific area on the target tray and obtaining the depth distance of the target tray cross bar from the depth data;
the calculating module is used for calculating the three-dimensional coordinates of the target tray cross bar based on the image position of the target tray cross bar and the depth distance corresponding to the target tray cross bar, and calculating the offset angle of the target tray relative to the image acquisition equipment and the center position of the jack of the target tray based on the three-dimensional coordinates of the target tray cross bar.
The invention also provides a tray detection device based on image recognition, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the tray detection method based on image recognition in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and may also be a volatile computer readable storage medium, where instructions are stored in the computer readable storage medium, when the instructions are executed on a computer, cause the computer to perform the steps of the tray detection method based on image recognition.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A tray detection method based on image recognition, characterized by comprising the following steps:
acquiring a first image of a target tray through preset image acquisition equipment, and dividing the first image according to a preset tray region division strategy to obtain a divided image of each specific region;
converting each divided image into corresponding sub-image stream data based on the divided image of each specific region, preprocessing each sub-image stream data through a preset image processing algorithm, restoring each sub-image stream data into a processed divided image, and combining the processed divided images to obtain a second image of the target tray;
encoding the second image of the target tray to generate a plurality of corresponding coded character strings, and inputting the coded character strings into a trained depth estimation model to obtain depth data corresponding to the second image; wherein the depth data comprises at least depth distance data of a target tray cross bar;
determining the image position of the target tray cross bar in the first image based on the feature matrix of each specific region on the target tray, and acquiring the depth distance of the target tray cross bar from the depth data;
calculating three-dimensional coordinates of the target tray cross bar based on the image position of the target tray cross bar and the corresponding depth distance, and respectively calculating, based on the three-dimensional coordinates of the target tray cross bar, an offset angle of the target tray relative to the image acquisition equipment and the center position of a jack of the target tray;
the training process of the depth estimation model comprises the following steps:
acquiring an initial image data set, and acquiring the depth information of each corresponding initial image as a label; extracting features of the initial image through a predefined feature extraction algorithm to generate an image feature vector;
inputting the image feature vector into a preset deep learning model for network training; wherein the deep learning model comprises a multi-layer convolutional network and a two-layer coding network;
performing convolution operations on the image feature vector layer by layer through the multi-layer convolutional network, optimizing according to the network parameters, and generating an intermediate feature vector after the convolution processing;
inputting the intermediate feature vector into the first-layer coding network for a coding operation, completing the mapping of the intermediate feature vector, and generating a first coding feature vector;
inputting the first coding feature vector into the second-layer coding network for a secondary coding operation, completing a deeper feature mapping, and generating a second coding feature vector;
splicing the first coding feature vector and the second coding feature vector based on a preset vector splicing algorithm to obtain a target coding feature vector;
randomly generating a unique noise feature vector with a Gaussian noise generator, the noise feature vector being generated only once within a set period, and then performing feature integration on the noise feature vector and the target coding feature vector to generate a final feature vector; wherein the database stores the rules for integrating the noise feature vector and the target coding feature vector;
inputting the final feature vector and the corresponding depth information label into a classification layer of the deep learning model, performing network training, and iteratively optimizing the network parameters until the loss function of the classification layer converges, thereby completing the training process of the deep learning model and obtaining the trained depth estimation model;
the splicing of the first coding feature vector and the second coding feature vector based on the preset vector splicing algorithm to obtain the target coding feature vector includes:
collecting a first coding feature vector and a corresponding first event of a first source environment, and a second coding feature vector and a corresponding second event of a second source environment; wherein the first event and the second event are preliminary estimates, made before any data is received, of what may occur for each event;
analyzing the real-time state of the first event according to the first coding feature vector to obtain a first pre-measurement value, and analyzing the real-time state of the second event according to the second coding feature vector to obtain a second pre-measurement value;
generating a first posterior probability index for the first coding feature vector by using a preset first conditional probability formula in combination with a preset prior probability and the first pre-measurement value; wherein the first posterior probability index is the joint characterization distribution of the first coding feature vector;
generating a second posterior probability index for the second coding feature vector by using a preset second conditional probability formula in combination with the preset prior probability and the second pre-measurement value; wherein the second posterior probability index is the joint characterization distribution of the second coding feature vector;
splicing the first posterior probability index and the second posterior probability index through the preset vector splicing algorithm to generate a complete target coding feature vector; wherein the target coding feature vector represents the weighted fusion of the first coding feature vector and the second coding feature vector in a unified multi-modal vector space (an illustrative sketch of this splicing step is given after this claim).
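The following minimal Python sketch is one possible reading of the splicing step above, offered only for orientation: each pre-measurement value is treated as a likelihood score for its event, the posterior probability indices follow an ordinary two-hypothesis Bayes update, and "weighted fusion" is interpreted as scaling each coding feature vector by its posterior index before concatenation. The function names, the priors, and the normalisation choice are assumptions made for this example.

    import numpy as np

    def posterior_index(prior, likelihood):
        # Two-hypothesis Bayes update: P(event | evidence) from a prior and a likelihood score.
        evidence = prior * likelihood + (1.0 - prior) * (1.0 - likelihood)
        return prior * likelihood / evidence

    def splice_features(first_vec, second_vec, prior=0.5, first_pre=0.8, second_pre=0.6):
        # Weight each coding feature vector by its posterior probability index,
        # then concatenate them into one target coding feature vector.
        w1 = posterior_index(prior, first_pre)
        w2 = posterior_index(prior, second_pre)
        fused = np.concatenate([w1 * np.asarray(first_vec), w2 * np.asarray(second_vec)])
        return fused / (np.linalg.norm(fused) + 1e-12)  # place the joint vector on a common scale

    target = splice_features(np.random.randn(128), np.random.randn(64))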
2. The tray detection method based on image recognition according to claim 1, wherein the encoding the second image of the target tray to generate a plurality of corresponding coded character strings includes:
acquiring a plurality of sub-region images of the second image of the target tray based on preset plate characteristic marks; wherein the database stores in advance dynamic mapping rules between the plate characteristic marks and the sub-region images of the second image of the target tray;
analyzing the acquired plurality of sub-region images through a trained section analysis model, and extracting attribute patterns of the plurality of sub-region images; wherein the section analysis model is obtained by training based on a convolutional neural network model;
acquiring the attribute patterns and matching them with a preset digital conversion data list so as to construct a target code list; wherein the target code list is constructed, based on a standard code list, by matching the attribute patterns with the preset digital conversion data list;
retrieving, in the target code list, the coding information corresponding to the plurality of sub-region images to obtain the specific coding information of each sub-region image;
adding an attribute label to each sub-region image according to the specific coding information, integrating each sub-region image carrying its attribute label to obtain a digitized index of the complete sub-region image, and converting the digitized index into a plurality of corresponding coded character strings; wherein the database extracts and stores the conversion rules between each digitized index and the plurality of coded character strings (an illustrative sketch of this encoding step is given after this claim).
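As a non-limiting illustration of the encoding step above, the sketch below quantises a few simple attribute patterns of one sub-region image (brightness, the spread of intensities as a crude proxy for color distribution, and edge density) against a small conversion table and joins the results into a coded character string. The bucket boundaries and the conversion table are invented for this example and do not correspond to the preset digital conversion data list of the disclosure.

    import numpy as np

    CONVERSION_TABLE = {0: "A", 1: "B", 2: "C", 3: "D"}  # hypothetical stand-in for the code list

    def quantise(value, edges=(0.25, 0.5, 0.75)):
        # Map a value in [0, 1] to one of four buckets.
        return int(np.searchsorted(edges, value))

    def encode_subregion(gray):
        # gray: 2-D array of grayscale intensities scaled to [0, 1].
        brightness = float(gray.mean())
        spread = float(gray.std())                      # crude proxy for color distribution
        gy, gx = np.gradient(gray)
        edge_density = float(np.hypot(gx, gy).mean())   # crude edge characteristic
        digits = [quantise(brightness), quantise(spread), quantise(edge_density)]
        return "".join(CONVERSION_TABLE[d] for d in digits)

    code_string = encode_subregion(np.random.rand(64, 64))  # a three-character code string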
3. The tray detection method based on image recognition according to claim 2, wherein the attribute patterns include at least the brightness, color distribution, and edge characteristics of the sub-region images.
4. A tray detection device based on image recognition, characterized in that the tray detection device based on image recognition includes:
an acquisition module, used for acquiring a first image of a target tray through preset image acquisition equipment, and dividing the first image according to a preset tray region division strategy to obtain a divided image of each specific region;
a generation module, used for converting each divided image into corresponding sub-image stream data based on the divided image of each specific region, preprocessing each sub-image stream data through a preset image processing algorithm, restoring each sub-image stream data into a processed divided image, and combining the processed divided images to obtain a second image of the target tray;
a coding module, used for encoding the second image of the target tray to generate a plurality of corresponding coded character strings, and inputting the coded character strings into a trained depth estimation model to obtain depth data corresponding to the second image; wherein the depth data comprises at least depth distance data of a target tray cross bar;
a determining module, used for determining the image position of the target tray cross bar in the first image based on the feature matrix of each specific region on the target tray, and acquiring the depth distance of the target tray cross bar from the depth data;
a calculating module, used for calculating three-dimensional coordinates of the target tray cross bar based on the image position of the target tray cross bar and the corresponding depth distance, and for calculating, based on the three-dimensional coordinates of the target tray cross bar, the offset angle of the target tray relative to the image acquisition equipment and the center position of the jack of the target tray;
the training process of the depth estimation model comprises the following steps:
acquiring an initial image data set, and acquiring the depth information of each corresponding initial image as a label; extracting features of the initial image through a predefined feature extraction algorithm to generate an image feature vector;
inputting the image feature vector into a preset deep learning model for network training; wherein the deep learning model comprises a multi-layer convolutional network and a two-layer coding network;
performing convolution operations on the image feature vector layer by layer through the multi-layer convolutional network, optimizing according to the network parameters, and generating an intermediate feature vector after the convolution processing;
inputting the intermediate feature vector into the first-layer coding network for a coding operation, completing the mapping of the intermediate feature vector, and generating a first coding feature vector;
inputting the first coding feature vector into the second-layer coding network for a secondary coding operation, completing a deeper feature mapping, and generating a second coding feature vector;
splicing the first coding feature vector and the second coding feature vector based on a preset vector splicing algorithm to obtain a target coding feature vector;
randomly generating a unique noise feature vector with a Gaussian noise generator, the noise feature vector being generated only once within a set period, and then performing feature integration on the noise feature vector and the target coding feature vector to generate a final feature vector; wherein the database stores the rules for integrating the noise feature vector and the target coding feature vector;
inputting the final feature vector and the corresponding depth information label into a classification layer of the deep learning model, performing network training, and iteratively optimizing the network parameters until the loss function of the classification layer converges, thereby completing the training process of the deep learning model and obtaining the trained depth estimation model (an illustrative training sketch is given after this claim);
the splicing of the first coding feature vector and the second coding feature vector based on the preset vector splicing algorithm to obtain the target coding feature vector includes:
collecting a first coding feature vector and a corresponding first event of a first source environment, and a second coding feature vector and a corresponding second event of a second source environment; wherein the first event and the second event are preliminary estimates, made before any data is received, of what may occur for each event;
analyzing the real-time state of the first event according to the first coding feature vector to obtain a first pre-measurement value, and analyzing the real-time state of the second event according to the second coding feature vector to obtain a second pre-measurement value;
generating a first posterior probability index for the first coding feature vector by using a preset first conditional probability formula in combination with a preset prior probability and the first pre-measurement value; wherein the first posterior probability index is the joint characterization distribution of the first coding feature vector;
generating a second posterior probability index for the second coding feature vector by using a preset second conditional probability formula in combination with the preset prior probability and the second pre-measurement value; wherein the second posterior probability index is the joint characterization distribution of the second coding feature vector;
splicing the first posterior probability index and the second posterior probability index through the preset vector splicing algorithm to generate a complete target coding feature vector; wherein the target coding feature vector represents the weighted fusion of the first coding feature vector and the second coding feature vector in a unified multi-modal vector space.
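For orientation only, the PyTorch sketch below gives one loose interpretation of the training process recited above: a small convolutional network yields the intermediate feature vector, two linear coding layers yield the first and second coding feature vectors, these are spliced and integrated with a Gaussian noise feature vector, and a classification layer is trained against depth labels discretised into bins. The layer sizes, the number of depth bins, and the treatment of depth as a classification target are assumptions made for illustration and are not asserted to be the architecture of this disclosure.

    import torch
    import torch.nn as nn

    class DepthEstimator(nn.Module):
        def __init__(self, in_ch=3, feat_dim=128, code_dim=64, noise_dim=64, n_bins=32):
            super().__init__()
            self.conv = nn.Sequential(                       # multi-layer convolutional network
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.encoder1 = nn.Linear(32, feat_dim)          # first-layer coding network
            self.encoder2 = nn.Linear(feat_dim, code_dim)    # second-layer coding network
            self.classifier = nn.Linear(feat_dim + code_dim + noise_dim, n_bins)  # classification layer

        def forward(self, x, noise):
            mid = self.conv(x).flatten(1)                    # intermediate feature vector
            first = torch.relu(self.encoder1(mid))           # first coding feature vector
            second = torch.relu(self.encoder2(first))        # second coding feature vector
            target = torch.cat([first, second], dim=1)       # spliced target coding feature vector
            final = torch.cat([target, noise], dim=1)        # integrate the noise feature vector
            return self.classifier(final)

    model = DepthEstimator()
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    images = torch.randn(8, 3, 64, 64)        # stand-in batch of input images (fictitious)
    labels = torch.randint(0, 32, (8,))       # depth labels discretised into hypothetical bins
    noise = torch.randn(1, 64).expand(8, 64)  # one noise vector per set period, shared by the batch
    for _ in range(5):                        # in practice, iterate until the loss converges
        optimiser.zero_grad()
        loss = criterion(model(images, noise), labels)
        loss.backward()
        optimiser.step()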
5. A tray detection device based on image recognition, characterized in that the tray detection device based on image recognition comprises: a memory and at least one processor, the memory having instructions stored therein;
wherein the at least one processor invokes the instructions in the memory to cause the tray detection device based on image recognition to perform the tray detection method based on image recognition of any one of claims 1-3.
6. A computer readable storage medium having instructions stored thereon which, when executed by a processor, implement the tray detection method based on image recognition of any one of claims 1-3.
CN202311622695.2A 2023-11-30 2023-11-30 Tray detection method based on image recognition and related device Active CN117315264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311622695.2A CN117315264B (en) 2023-11-30 2023-11-30 Tray detection method based on image recognition and related device

Publications (2)

Publication Number Publication Date
CN117315264A CN117315264A (en) 2023-12-29
CN117315264B true CN117315264B (en) 2024-03-08

Family

ID=89285250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311622695.2A Active CN117315264B (en) 2023-11-30 2023-11-30 Tray detection method based on image recognition and related device

Country Status (1)

Country Link
CN (1) CN117315264B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932555A (en) * 2020-07-31 2020-11-13 商汤集团有限公司 Image processing method and device and computer readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111833357A (en) * 2020-06-23 2020-10-27 方程式人工智能(安徽)有限公司 Identification method based on textural features
CN114202548A (en) * 2020-08-31 2022-03-18 紫东信息科技(苏州)有限公司 Forklift pallet positioning method and device, storage medium and electronic equipment
CN116579968A (en) * 2022-01-28 2023-08-11 青岛海尔智能技术研发有限公司 Identification method and device for food material image, steaming and baking equipment and storage medium
CN116977683A (en) * 2022-11-21 2023-10-31 腾讯科技(深圳)有限公司 Object recognition method, apparatus, computer device, storage medium, and program product
CN115546216A (en) * 2022-12-02 2022-12-30 深圳海星智驾科技有限公司 Tray detection method, device, equipment and storage medium
CN116863371A (en) * 2023-06-28 2023-10-10 无锡江南智造科技股份有限公司 Deep learning-based AGV forklift cargo pallet pose recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on path planning for pallet picking by intelligent forklifts based on B-spline curves; Lü Enli et al.; Transactions of the Chinese Society for Agricultural Machinery; Vol. 50, No. 5; pp. 394-402 *

Also Published As

Publication number Publication date
CN117315264A (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN110569837B (en) Method and device for optimizing damage detection result
CN110084243B (en) File identification and positioning method based on two-dimensional code and monocular camera
CN109118473B (en) Angular point detection method based on neural network, storage medium and image processing system
CN110852225A (en) Remote sensing image mangrove forest extraction method and system based on deep convolutional neural network
CN105009170A (en) Object identification device, method, and storage medium
CN113205466A (en) Incomplete point cloud completion method based on hidden space topological structure constraint
CN103729631B (en) Vision-based connector surface feature automatically-identifying method
CN111524224B (en) Panoramic imaging method for surface temperature distribution of power transformer
JP5262705B2 (en) Motion estimation apparatus and program
CN112700489B (en) Ship-based video image sea ice thickness measurement method and system based on deep learning
JP2018128897A (en) Detection method and detection program for detecting attitude and the like of object
CN114170184A (en) Product image anomaly detection method and device based on embedded feature vector
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
CN111723814A (en) Cross-image association based weak supervision image semantic segmentation method, system and device
CN114863129A (en) Instrument numerical analysis method, device, equipment and storage medium
CN116757713B (en) Work estimation method, device, equipment and storage medium based on image recognition
CN114219753A (en) Power equipment surface defect detection method based on deep learning and terminal
Florinabel Real-time image processing method to implement object detection and classification for remote sensing images
CN111723688B (en) Human body action recognition result evaluation method and device and electronic equipment
CN117315264B (en) Tray detection method based on image recognition and related device
CN113034492A (en) Printing quality defect detection method and storage medium
CN115169375B (en) AR and gun ball linkage-based high-level material visualization method
CN116704270A (en) Intelligent equipment positioning marking method based on image processing
CN108154107B (en) Method for determining scene category to which remote sensing image belongs
CN116152677A (en) Remote sensing image intelligent interpretation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant