CN111709987B - Package volume measuring method, device, equipment and storage medium


Info

Publication number
CN111709987B
Authority
CN
China
Prior art keywords
image
parcel
dimensional
package
bounding box
Prior art date
Legal status
Active
Application number
CN202010528604.9A
Other languages
Chinese (zh)
Other versions
CN111709987A (en)
Inventor
李斯 (Li Si)
赵齐辉 (Zhao Qihui)
Current Assignee
Dongpu Software Co Ltd
Original Assignee
Dongpu Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Dongpu Software Co Ltd
Priority to CN202010528604.9A
Publication of CN111709987A
Application granted
Publication of CN111709987B

Classifications

    • G06T7/62 — Analysis of geometric attributes of area, perimeter, diameter or volume
    • G01B11/002 — Measuring arrangements characterised by the use of optical techniques for measuring two or more coordinates
    • G01B11/24 — Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/10 — Segmentation; Edge detection
    • G06T7/70 — Determining position or orientation of objects or cameras
    • G06T2207/10004 — Still image; Photographic image
    • G06T2207/10028 — Range image; Depth image; 3D point clouds
    • G06T2207/20081 — Training; Learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • Y02P90/30 — Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of logistics management and discloses a method, a device, equipment and a storage medium for measuring parcel volume. The method comprises the following steps: acquiring an image of a parcel stacking scene and inputting it into a preset parcel identification model to obtain the first area range of each parcel in the image; then, according to the first area range of each parcel in the image, obtaining the corresponding parcel image and inputting it into a preset three-dimensional prediction model to obtain the three-dimensional position coordinate information of the parcel in the image; and obtaining the target bounding box of the parcel in the image according to the three-dimensional position coordinate information, outputting the volume parameters of the parcel, and calculating the volume of the parcel. According to the invention, the trained three-dimensional prediction model can obtain the three-dimensional position coordinate information of a parcel directly from a picture, and the parcel volume is determined based on that three-dimensional coordinate information, replacing the original manual estimation and improving the efficiency of parcel volume measurement.

Description

Package volume measuring method, device, equipment and storage medium
Technical Field
The invention relates to the field of logistics management, and in particular to a method, a device, equipment and a storage medium for measuring parcel volume.
Background
With the popularization of the internet, more and more people communicate and shop online. To cope with the safe transportation of ever more parcels, the loading and shipment of parcels must be planned, and such planning in turn requires estimating parcel volume and weight. At present, parcel volume is mostly estimated visually from experience, or measured with a 3D camera that generates a 3D point cloud; 3D cameras are costly, and volume estimation with them requires installing several cameras, so this implementation is too expensive.
Although a 3D camera can determine the target bounding box of a parcel from images and obtain the volume parameter information of the corresponding parcel, this approach is still limited to estimation from two-dimensional images, and a two-dimensional image differs greatly from the actual product, so the finally estimated volume deviates substantially from the actual volume of the parcel.
Disclosure of Invention
The invention mainly aims to solve the technical problems that the difference between the estimated parcel volume and the actual parcel volume is large and the measurement accuracy is low in the prior art.
The invention provides a parcel volume measuring method in a first aspect, which comprises the following steps:
acquiring a first live image captured in real time of a parcel stacking scene;
inputting the first live image into a preset parcel identification model for parcel identification to obtain a first identification image in which the parcels of the first live image are annotated, wherein each parcel annotation is represented by a first area range;
extracting a first parcel image corresponding to each parcel from the first identification image according to the first area range in the first identification image;
inputting the first parcel image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first parcel image, wherein the three-dimensional prediction processing comprises constructing a three-dimensional image of the parcel and calculating the three-dimensional coordinate information of that three-dimensional image in the first live image;
determining a volume parameter of a parcel in the first parcel image based on the three-dimensional position coordinate information of the parcel in the first parcel image;
calculating the volume of the parcel in the first live image based on the volume parameter of the parcel in the first live image.
Optionally, in a first implementation manner of the first aspect of the present invention, before the acquiring the first live image captured in real time of the parcel stacking scene, the method further includes:
acquiring a plurality of sample images of a truck loading scene, and taking the sample images as first training sample images;
labeling the parcels in the first training sample image to obtain first labeling information corresponding to each training sample image, and storing the first labeling information as a first annotation file, wherein the first labeling information is used for identifying a second area range containing parcels in the first training sample image;
and inputting the first training sample image and the corresponding first annotation file into a preset MASK R-CNN model for deep learning training to obtain a parcel identification model, wherein the MASK R-CNN model comprises a ResNet-101 network, an RPN network, a ROIAlign layer and a classification network.
Optionally, in a second implementation manner of the first aspect of the present invention, the inputting the first live image into a preset parcel identification model for parcel identification to obtain a first identification image in which the parcels of the first live image are annotated includes:
inputting the first live image into a preset parcel identification model, and extracting image features of the first live image through the ResNet-101 network to obtain a first feature map;
inputting the first feature map into the RPN network, and extracting a prediction box for the first feature map through a preset selective search algorithm to obtain the prediction box corresponding to the first feature map;
inputting the first feature map and the prediction box into the ROIAlign layer for prediction to obtain a second feature map containing the prediction box;
inputting the second feature map into the fully connected layer for classification processing to obtain the prediction probability that the prediction box contains a parcel;
and obtaining the first identification image with the parcel annotations of the first live image based on the prediction probability that the prediction box contains a parcel.
Optionally, in a third implementation manner of the first aspect of the present invention, before the acquiring the first live image captured in real time of the parcel stacking scene, the method further includes:
acquiring a second live image captured in real time of a parcel stacking scene;
inputting the second live image into a preset parcel identification model for identification to obtain a second identification image in which the parcels of the second live image are annotated, wherein each parcel annotation is represented by a third area range;
extracting a second parcel image corresponding to each parcel from the second identification image according to the third area range in the second identification image;
performing point cloud annotation on the second parcel image to obtain second annotation information corresponding to the second parcel image and storing the second annotation information as a second annotation file, wherein the second annotation information is used for identifying the target bounding box of a parcel in the second live image;
and taking the second live image as a second training sample image, and training on the second training sample image and the corresponding second annotation file based on a three-dimensional prediction network to obtain a three-dimensional prediction model, wherein the three-dimensional prediction network comprises a two-dimensional target detection sub-network, an instance depth estimation sub-network, a three-dimensional positioning sub-network and a corner regression sub-network.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing point cloud annotation on the second parcel image to obtain second annotation information corresponding to the second parcel image and storing the second annotation information as a second annotation file includes:
performing semantic segmentation on the second parcel image according to a preset target segmentation algorithm to obtain a semantic segmentation result;
determining the parcel to be detected in the second parcel image based on the semantic segmentation result;
performing scene reconstruction according to the plurality of second parcel images to obtain a reconstructed scene image;
performing point cloud segmentation on the reconstructed scene image to obtain point cloud data of the parcel to be detected;
generating a bounding box corresponding to the parcel to be detected in the reconstructed scene image;
fitting the bounding box to the point cloud data of the parcel to be detected to obtain a target bounding box and outputting parameter information of the target bounding box;
and determining second annotation information corresponding to each second parcel image based on the parameter information of the target bounding box, and saving the second annotation information as a second annotation file.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the inputting the first parcel image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first parcel image includes:
inputting the first parcel image into a preset three-dimensional prediction model, and extracting feature maps of the first parcel image through the instance depth estimation sub-network of the three-dimensional prediction model to obtain a shallow feature map and a deep feature map;
splicing the shallow feature map and the deep feature map, and outputting a plurality of disparity maps of different scales through depth prediction to obtain a depth map of the first parcel image;
inputting the first parcel image into the two-dimensional target detection sub-network, performing target detection on the first parcel image through the two-dimensional target detection sub-network, and determining the geometric type of the parcel in the first live image and the four vertex coordinates of its two-dimensional bounding box;
inputting the four vertex coordinates of the two-dimensional bounding box of the parcel in the first live image into the three-dimensional positioning sub-network to obtain the horizontal and vertical coordinates of the projection points of the parcel's three-dimensional bounding box in the first live image;
drawing the three-dimensional bounding box of the parcel in the first live image based on the horizontal and vertical coordinates of its projection points in the first live image, and determining the target bounding box of the parcel in the first live image according to the three-dimensional bounding box;
inputting the depth map of the first parcel image and the target bounding box of the parcel in the first live image into the corner regression sub-network, and performing corner regression on the depth map of the first parcel image through the corner regression sub-network to obtain the depth of the center of the target bounding box of the parcel in the first live image;
determining the three-dimensional position coordinate information of the parcel in the first live image based on the depth of the center of the target bounding box of the parcel in the first live image.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the fitting the bounding box to the point cloud data of the parcel to be detected to obtain a target bounding box and outputting parameter information of the target bounding box includes:
establishing an objective function of the bounding box and the point cloud according to the inclination angle of the bounding box relative to the vertical direction and its size parameters, wherein the objective function indicates the space remaining in the bounding box after it encloses the point cloud;
calculating an optimal solution of the objective function, wherein the optimal solution comprises an optimal inclination angle and optimal size parameters;
adjusting the bounding box based on the optimal solution to obtain the target bounding box;
determining a target display area based on the position of the parcel to be detected in the first sample image;
and outputting the target bounding box in the target display area, and displaying the inclination angle and size parameters of the target bounding box.
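For illustration, a minimal Python sketch of such a fitting step, under two stated assumptions: the inclination angle is treated as a rotation θ about the vertical axis, and a general-purpose optimizer stands in for an unspecified solver; the helper name and penalty weight are illustrative, not from the patent:

```python
import numpy as np
from scipy.optimize import minimize

def fit_target_bounding_box(points):
    """Fit a vertical-axis bounding box to a parcel point cloud.

    `points` is an (N, 3) array; the box is parameterized by a yaw
    angle theta about the vertical (z) axis and sizes (L, W, H).
    Returns the optimal (theta, L, W, H)."""
    centered = points - points.mean(axis=0)

    def objective(params):
        theta, L, W, H = params
        c, s = np.cos(theta), np.sin(theta)
        # Rotate the cloud into the candidate box frame (z stays vertical).
        rot = centered @ np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])
        extent = 2 * np.abs(rot).max(axis=0)          # tightest symmetric fit
        overflow = np.maximum(extent - [L, W, H], 0)  # points left outside
        return L * W * H + 1e3 * overflow.sum()       # volume + penalty

    x0 = np.array([0.0, *(2 * np.abs(centered).max(axis=0))])
    res = minimize(objective, x0, method="Nelder-Mead")
    return res.x
```

Minimizing the enclosing volume under a containment penalty also minimizes the remaining space of the bounding box, since the volume occupied by the point cloud itself is fixed.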
A second aspect of the present invention provides a parcel volume measuring device comprising:
a first acquisition module, used for acquiring a first live image captured in real time of a parcel stacking scene;
a first identification module, used for inputting the first live image into a preset parcel identification model for parcel identification to obtain a first identification image in which the parcels of the first live image are annotated, wherein each parcel annotation is represented by a first area range;
a first extraction module, used for extracting a first parcel image corresponding to each parcel from the first identification image according to the first area range in the first identification image;
a prediction module, used for inputting the first parcel image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first parcel image, wherein the three-dimensional prediction processing comprises constructing a three-dimensional image of the parcel and calculating the three-dimensional coordinate information of that three-dimensional image in the first live image;
a determination module, used for determining a volume parameter of a parcel in the first parcel image based on the three-dimensional position coordinate information of the parcel in the first parcel image;
a calculation module, used for calculating the volume of the parcel in the first live image based on the volume parameter of the parcel in the first live image.
Optionally, in a first implementation manner of the second aspect of the present invention, the parcel volume measuring device further includes:
a second acquisition module, used for acquiring a plurality of sample images of a truck loading scene and taking the sample images as first training sample images;
a first labeling module, used for labeling the parcels in the first training sample image to obtain first labeling information corresponding to each training sample image and storing the first labeling information as a first annotation file, wherein the first labeling information is used for identifying a second area range containing parcels in the first training sample image;
and a first training module, used for inputting the first training sample image and the corresponding first annotation file into a preset MASK R-CNN model for deep learning training to obtain a parcel identification model, wherein the MASK R-CNN model comprises a ResNet-101 network, an RPN network, a ROIAlign layer and a classification network.
Optionally, in a second implementation manner of the second aspect of the present invention, the first identification module is specifically configured to:
input the first live image into a preset parcel identification model, and extract image features of the first live image through the ResNet-101 network to obtain a first feature map;
input the first feature map into the RPN network, and extract a prediction box for the first feature map through a preset selective search algorithm to obtain the prediction box corresponding to the first feature map;
input the first feature map and the prediction box into the ROIAlign layer for prediction to obtain a second feature map containing the prediction box;
input the second feature map into the fully connected layer for classification processing to obtain the prediction probability that the prediction box contains a parcel;
and obtain the first identification image with the parcel annotations of the first live image based on the prediction probability that the prediction box contains a parcel.
Optionally, in a third implementation manner of the second aspect of the present invention, the parcel volume measuring device further comprises:
a third acquisition module, used for acquiring a second live image captured in real time of a parcel stacking scene;
a second identification module, used for inputting the second live image into a preset parcel identification model for identification to obtain a second identification image in which the parcels of the second live image are annotated, wherein each parcel annotation is represented by a third area range;
a second extraction module, used for extracting a second parcel image corresponding to each parcel from the second identification image according to the third area range in the second identification image;
a second labeling module, used for performing point cloud annotation on the second parcel image to obtain second annotation information corresponding to the second parcel image and storing the second annotation information as a second annotation file, wherein the second annotation information is used for identifying the target bounding box of a parcel in the second live image;
and a second training module, used for taking the second live image as a second training sample image and training on the second training sample image and the corresponding second annotation file based on a three-dimensional prediction network to obtain a three-dimensional prediction model, wherein the three-dimensional prediction network comprises a two-dimensional target detection sub-network, an instance depth estimation sub-network, a three-dimensional positioning sub-network and a corner regression sub-network.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the second labeling module includes:
a semantic segmentation unit, used for performing semantic segmentation on the second parcel image according to a preset target segmentation algorithm to obtain a semantic segmentation result;
a first determining unit, used for determining, based on the semantic segmentation result, the parcel to be detected in the second parcel image;
a scene reconstruction unit, used for performing scene reconstruction according to the plurality of second parcel images to obtain a reconstructed scene image;
a point cloud segmentation unit, used for performing point cloud segmentation on the reconstructed scene image to obtain point cloud data of the parcel to be detected;
a generating unit, used for generating a bounding box corresponding to the parcel to be detected in the reconstructed scene image;
a fitting unit, used for fitting the bounding box to the point cloud data of the parcel to be detected to obtain a target bounding box and outputting parameter information of the target bounding box;
and a second determining unit, used for determining second annotation information corresponding to each second parcel image based on the parameter information of the target bounding box and saving the second annotation information as a second annotation file.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the prediction module is specifically configured to:
input the first parcel image into a preset three-dimensional prediction model, and extract feature maps of the first parcel image through the instance depth estimation sub-network of the three-dimensional prediction model to obtain a shallow feature map and a deep feature map;
splice the shallow feature map and the deep feature map, and output a plurality of disparity maps of different scales through depth prediction to obtain a depth map of the first parcel image;
input the first parcel image into the two-dimensional target detection sub-network, perform target detection on the first parcel image through the two-dimensional target detection sub-network, and determine the geometric type of the parcel in the first live image and the four vertex coordinates of its two-dimensional bounding box;
input the four vertex coordinates of the two-dimensional bounding box of the parcel in the first live image into the three-dimensional positioning sub-network to obtain the horizontal and vertical coordinates of the projection points of the parcel's three-dimensional bounding box in the first live image;
draw the three-dimensional bounding box of the parcel in the first live image based on the horizontal and vertical coordinates of its projection points in the first live image, and determine the target bounding box of the parcel in the first live image according to the three-dimensional bounding box;
input the depth map of the first parcel image and the target bounding box of the parcel in the first live image into the corner regression sub-network, and perform corner regression on the depth map of the first parcel image through the corner regression sub-network to obtain the depth of the center of the target bounding box of the parcel in the first live image;
and determine the three-dimensional position coordinate information of the parcel in the first live image based on the depth of the center of the target bounding box of the parcel in the first live image.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the fitting unit is specifically configured to:
establish an objective function of the bounding box and the point cloud according to the inclination angle of the bounding box relative to the vertical direction and its size parameters, wherein the objective function indicates the space remaining in the bounding box after it encloses the point cloud;
calculate an optimal solution of the objective function, wherein the optimal solution comprises an optimal inclination angle and optimal size parameters;
adjust the bounding box based on the optimal solution to obtain the target bounding box;
determine a target display area based on the position of the parcel to be detected in the first sample image;
and output the target bounding box in the target display area, and display the inclination angle and size parameters of the target bounding box.
A third aspect of the invention provides parcel volume measuring equipment comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor being interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the parcel volume measuring equipment to perform the parcel volume measurement method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the above-described parcel volume measurement method.
In the technical scheme provided by the invention, an image of a parcel stacking scene is acquired and input into a preset parcel identification model to obtain the first area range of each parcel in the image; then, according to the first area range of each parcel in the image, the corresponding parcel image is obtained and input into a preset three-dimensional prediction model to obtain the three-dimensional position coordinate information of the parcel in the image; and the target bounding box of the parcel in the image is obtained according to the three-dimensional position coordinate information, the volume parameters of the parcel are output, and the volume of the parcel is calculated. According to the invention, the trained three-dimensional prediction model can obtain the three-dimensional position coordinate information of a parcel directly from a picture, and the parcel volume is determined based on that three-dimensional coordinate information, replacing the original manual estimation and improving the efficiency of parcel volume measurement.
Drawings
FIG. 1 is a schematic diagram of a first embodiment of a parcel volume measurement method in an embodiment of the invention;
FIG. 2 is a schematic diagram of a second embodiment of a parcel volume measurement method in an embodiment of the invention;
FIG. 3 is a schematic diagram of a third embodiment of a parcel volume measurement method in an embodiment of the invention;
FIG. 4 is a schematic diagram of a fourth embodiment of a parcel volume measurement method in an embodiment of the invention;
FIG. 5 is a schematic view of a fifth embodiment of a parcel volume measurement method in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a first embodiment of a parcel volume measuring device in an embodiment of the present invention;
FIG. 7 is a schematic diagram of a second embodiment of a parcel volume measuring device in an embodiment of the present invention;
FIG. 8 is a schematic diagram of an embodiment of parcel volume measuring equipment in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method, a device, equipment and a storage medium for measuring parcel volume. In the technical scheme of the invention, an image of a parcel stacking scene is acquired and input into a preset parcel identification model to obtain the first area range of each parcel in the image; then, according to the first area range of each parcel in the image, the corresponding parcel image is obtained and input into a preset three-dimensional prediction model to obtain the three-dimensional position coordinate information of the parcel in the image; and the target bounding box of the parcel in the image is obtained according to the three-dimensional position coordinate information, the volume parameters of the parcel are output, and the volume of the parcel is calculated. Compared with the previous MASK R-CNN model alone, the three-dimensional prediction network has higher accuracy; image data acquisition is added in the parcel volume measurement stage, so the parcel volume can be determined directly from pictures, replacing the original manual estimation, thereby improving the efficiency of parcel volume measurement and reducing logistics transportation costs.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below with reference to FIG. 1, where a first embodiment of a parcel volume measurement method in an embodiment of the present invention comprises:
101. Acquiring a first live image captured in real time of a parcel stacking scene;
It is to be understood that the execution subject of the present invention may be a parcel volume measuring device, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described taking a server as the execution subject.
In this embodiment, a first live image of a parcel stacking scene is first captured by a camera or similar device. For example, the parcels on the conveyor belt may be rectangular, circular, irregular, and so on. The server then reads the stored first live image.
102. Inputting the first live image into a preset parcel identification model for parcel identification to obtain a first identification image in which the parcels of the first live image are annotated, wherein each parcel annotation is represented by a first area range;
In this embodiment, the first live image collected on site is input into a preset parcel identification model for identification to obtain the first area range of each parcel in the first live image. Because the collected images may contain people (such as express sorting staff), vehicles (transporting parcels) and parcels, it cannot be taken for granted which parts of an image are parcels. It is therefore necessary to determine which positions in the image are parcels through a pre-trained parcel identification model: the collected first live image is input into the pre-trained parcel identification model, which outputs the first area range containing each parcel in the first live image.
103. Extracting a first parcel image corresponding to each parcel from the first identification image according to the first area range in the first identification image;
In this embodiment, the area range of each parcel in the first live image is then cropped from the first live image, thereby extracting a parcel image corresponding to each parcel. For example, to locate each parcel more accurately, the picture is "segmented" again according to the area range of the parcel in the image, and the first parcel image corresponding to each parcel is extracted, achieving a more accurate parcel identification effect.
104. Inputting the first parcel image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first parcel image;
In this embodiment, after the three-dimensional prediction model is obtained through training, the first live image currently shot over the conveyor belt is obtained by real-time snapshot; the first live image contains the parcels to be identified, and the first parcel image corresponding to each parcel in the first live image is input into the three-dimensional prediction model.
The three-dimensional prediction model obtains the target bounding box of each parcel in the first parcel image through target detection and depth estimation, and further determines the three-dimensional position coordinate information of the parcel in the first live image.
105. Determining the volume parameters of the parcels in the first parcel image based on the three-dimensional position coordinate information of the parcels in the first parcel image;
In this embodiment, the target bounding box of each parcel whose volume is to be measured is determined according to its three-dimensional position coordinate information detected in the live image, and the volume parameters of each such parcel, that is, the length, width and height of the object, are further determined. Taking a cuboid parcel (bounding box) as an example, its size parameters are length, width and height; the size parameters of other types of parcels (bounding boxes) are not illustrated here.
In this embodiment, since the target bounding box approximately represents the parcel whose volume is to be measured, the parameter information of the target bounding box can be regarded approximately as the parameter information of that parcel, and on this basis the parameter information of the target bounding box can be output to the user. The parameter information may be output in text form or in voice form. For example, if the parcel is a book, the bounding box may be chosen as a cuboid, and the cuboid most closely fitted to the parcel may be displayed in the region above the parcel, together with size parameters such as the length (L), width (W) and height (H) of the target bounding box and its inclination angle (θ) relative to the vertical direction.
106. Calculating the volume of the parcel in the first live image based on the volume parameter of the parcel in the first live image.
In this embodiment, the volume of each parcel is calculated from its volume parameters in the first live image. For example, after a series of processing steps on the picture, the corresponding parcel is found to be a cuboid with length, width and height x, y and z respectively, and its volume is calculated with the volume formula for a cuboid.
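As an illustrative sketch of this calculation (the patent works only the cuboid example; the dispatch to other geometric types is an assumption):

```python
import math

def parcel_volume(geometry: str, params: dict) -> float:
    """Compute parcel volume from bounding-box volume parameters."""
    if geometry == "cuboid":
        return params["x"] * params["y"] * params["z"]   # V = x * y * z
    if geometry == "cylinder":
        return math.pi * params["r"] ** 2 * params["h"]  # V = pi * r^2 * h
    raise ValueError(f"unsupported geometric type: {geometry}")

# A cuboid parcel 0.4 m x 0.3 m x 0.2 m -> 0.024 cubic metres.
print(parcel_volume("cuboid", {"x": 0.4, "y": 0.3, "z": 0.2}))
```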
In the embodiment of the invention, an image of a parcel stacking scene is acquired and input into a preset parcel identification model to obtain the first area range of each parcel in the image; then, according to the first area range of each parcel in the image, the corresponding parcel image is obtained and input into a preset three-dimensional prediction model to obtain the three-dimensional position coordinate information of the parcel in the image; and the target bounding box of the parcel in the image is obtained according to the three-dimensional position coordinate information, the volume parameters of the parcel are output, and the volume of the parcel is calculated. According to the invention, the trained three-dimensional prediction model can obtain the three-dimensional position coordinate information of a parcel directly from a picture and thereby determine the parcel volume, replacing the original manual estimation and improving the efficiency of parcel volume measurement.
Referring to FIG. 2, a second embodiment of a parcel volume measurement method according to an embodiment of the present invention includes:
201. Taking a plurality of sample images acquired from a truck loading scene as first training sample images, labeling the parcels in the first training sample images to obtain first labeling information corresponding to each training sample image, and storing the first labeling information as a first annotation file;
In this embodiment, sample images of a truck loading scene are first captured by a camera or similar device. For example, the parcels on the conveyor belt may be rectangular, circular, irregular, and so on. The server then reads the stored sample images as the first training sample images.
In this embodiment, two ways of labeling the parcels in the first training sample image are generally available: model labeling and manual labeling. Because no model currently exists that can label parcels accurately, this scheme adopts manual labeling.
In this embodiment, the first training sample image is input into preset image annotation software for display. Labelme is preferred as the image annotation software; it is an image annotation tool that can be used to create customized annotation tasks or perform image annotation. In manual mode, the parcels in the images are selected with closed polygons via an interactive device, and the interactive device sends the position coordinates corresponding to the closed polygons to the server. The server delimits the parcel areas in the training sample image according to these position coordinates to obtain an image containing the labeled parcel area ranges, thereby realizing instance segmentation annotation of the training sample image. The image containing the labeled parcel area ranges constitutes the required labeling information.
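For illustration, Labelme stores each annotation as a JSON file whose "shapes" entries hold a label and polygon points; a minimal loader that turns such a file into per-parcel area ranges might look as follows (the "parcel" label name is an assumption):

```python
import json
import numpy as np

def load_labelme_boxes(path: str):
    """Read a Labelme JSON file and return one bounding box per
    polygon labeled 'parcel' (the label name is an assumption)."""
    with open(path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    boxes = []
    for shape in ann["shapes"]:            # one entry per drawn polygon
        if shape["label"] != "parcel":
            continue
        pts = np.asarray(shape["points"])  # (N, 2) polygon vertices
        x1, y1 = pts.min(axis=0)
        x2, y2 = pts.max(axis=0)
        boxes.append((x1, y1, x2, y2))
    return boxes
```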
202. Inputting the first training sample image and the corresponding first annotation file into a preset MASK R-CNN model for deep learning training to obtain a parcel identification model, wherein the MASK R-CNN model comprises a ResNet-101 network, an RPN network, a ROIAlign layer and a classification network;
In this embodiment, MASK R-CNN is an instance segmentation model that can be used for target detection, target instance segmentation and target keypoint detection.
In this embodiment, MASK R-CNN is formed by connecting a ResNet-101 network, an RPN network, a ROIAlign layer and a classification network in sequence.
ResNet-101 is a member of the ResNet series of convolutional neural networks. In addition to learning feature extraction, ResNet learns the residual from the features of one layer to the features of the next, so that more features can be extracted. The training sample image is input into the ResNet-101 network, and the features in the training sample image are extracted through the convolutional layers to obtain the first feature data.
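For illustration, a minimal fine-tuning sketch of this training step using recent torchvision's Mask R-CNN implementation; torchvision's stock constructor ships a ResNet-50 backbone, so a ResNet-101 FPN backbone is assembled explicitly here, and the two-class setup (background and parcel) and the data_loader of annotated samples are assumptions:

```python
import torch
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet-101 backbone with FPN, as named in the patent; 2 classes =
# background + parcel (the class count is an assumption).
backbone = resnet_fpn_backbone(backbone_name="resnet101", weights=None)
model = MaskRCNN(backbone, num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

model.train()
for images, targets in data_loader:  # assumed loader of annotated samples
    # targets: list of dicts with "boxes", "labels", "masks" per image,
    # produced from the first annotation files.
    loss_dict = model(images, targets)
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```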
203. Acquiring a first live image captured in real time of a parcel stacking scene, inputting the first live image into a preset parcel identification model for parcel identification, and obtaining a first identification image in which the parcels of the first live image are annotated, wherein each parcel annotation is represented by a first area range;
204. Extracting a first parcel image corresponding to each parcel from the first identification image according to the first area range in the first identification image;
205. Acquiring a second live image captured in real time of a parcel stacking scene;
In this embodiment, a second live image of the parcel placement scene is first captured by a camera or similar device. For example, the parcels on the conveyor belt may be rectangular, circular, irregular, and so on. The server then reads the stored second live image.
206. Inputting the second live image into a preset parcel identification model for identification to obtain a second identification image in which the parcels of the second live image are annotated, wherein each parcel annotation is represented by a third area range;
In this embodiment, after the parcel identification model is obtained through training, the second live image currently shot over the conveyor belt is obtained by real-time snapshot; the second live image contains the parcels to be sorted. The second live image is then input into the parcel identification model, which can mark the express items in the second image with circular, rectangular or other-shaped boxes to obtain the third area range of each express item in the second live image.
207. Extracting a second parcel image corresponding to each parcel from the second identification image according to the third area range in the second identification image;
The area range of each parcel in the second live image is then cropped from the second live image, thereby extracting the second parcel image corresponding to each parcel.
208. Performing point cloud annotation on the second parcel image to obtain second annotation information corresponding to the second parcel image and storing the second annotation information as a second annotation file, wherein the second annotation information is used for identifying the target bounding box of a parcel in the second live image;
In this embodiment, the OpenCV open-source annotation tool CVAT is used to perform point cloud annotation on the third area range containing a parcel in each collected second parcel image to obtain the second annotation information corresponding to the second parcel image, which is stored as a second annotation file; the second annotation information is used for identifying the target bounding box of the parcel to be detected in the second live image.
CVAT, the Computer Vision Annotation Tool, is a Web-based tool for annotating videos and images for computer vision algorithms. It was inspired by the free online interactive video annotation tool Vatic. CVAT has many powerful features: interpolation of bounding boxes between key frames, automatic labeling using the TensorFlow Object Detection API, shortcuts for most key operations, a dashboard with annotation task lists, LDAP and basic authorization, and so on.
Point cloud annotation, also called three-dimensional point cloud annotation, refers to labeling collected point cloud data. Before explaining point cloud annotation, "point cloud data" is explained first. Besides the geometric position information represented by (X, Y, Z), point cloud data may also carry the RGB color, gray value, depth and segmentation result of each point. When point cloud data are labeled, in order to better judge whether a label is accurate, the points inside a labeling box can be color-rendered to form a color contrast with the unlabeled points, helping the annotator verify the accuracy of the labeling result.
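For illustration, a small Python sketch of that color-rendering aid, assuming the labeling box is axis-aligned and the cloud is stored as an (N, 3) position array with per-point RGB colors:

```python
import numpy as np

def highlight_points_in_box(xyz, rgb, box_min, box_max, color=(255, 0, 0)):
    """Color-render the points inside an axis-aligned labeling box so
    they stand out against the unlabeled remainder of the cloud.

    xyz: (N, 3) point positions; rgb: (N, 3) uint8 colors;
    box_min/box_max: (3,) opposite box corners."""
    inside = np.all((xyz >= box_min) & (xyz <= box_max), axis=1)
    rendered = rgb.copy()
    rendered[inside] = color
    return rendered, inside  # rendered colors and the membership mask
```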
209. Taking the second live image as a second training sample image, and training on the second training sample image and the corresponding second annotation file based on a three-dimensional prediction network to obtain a three-dimensional prediction model, wherein the three-dimensional prediction network comprises a two-dimensional target detection sub-network, an instance depth estimation sub-network, a three-dimensional positioning sub-network and a corner regression sub-network;
In this embodiment, the collected second live image is used as the second training sample image and, together with the corresponding second annotation file, is input into the three-dimensional prediction network for training to obtain the three-dimensional prediction model. The three-dimensional prediction network is a single unified network structure consisting of four task-specific sub-networks: a two-dimensional target detection sub-network, an instance depth estimation (IDE) sub-network, a three-dimensional positioning sub-network and a local corner regression sub-network, each corresponding to one task, namely two-dimensional target detection, instance depth estimation (IDE), three-dimensional positioning and local corner regression.
The three-dimensional prediction network (MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization) is a geometric reasoning network for monocular three-dimensional object detection and localization that can estimate the three-dimensional position of an object from a single image. Estimating depth from two-dimensional images is a key step in scene reconstruction and understanding tasks such as three-dimensional object detection and segmentation. Obtaining depth information from monocular images is known as the MDE problem (Monocular Depth Estimation). Depth estimation from two-dimensional images is a fundamental task in many applications, including image blurring, scene understanding and reconstruction; the goal of depth estimation is to assign to each pixel in the image the distance between the viewer and the scene point represented by that pixel.
The usual depth estimation method for images is as follows: train a convolutional neural network (CNN) on sample images and their corresponding depth maps to obtain a prediction model; then obtain Red Green Blue (RGB) images of a large number of sample images, manually label the depth value of each pixel in the RGB images, and use the RGB images labeled with per-pixel depth values to further optimize the prediction model. The optimized prediction model can then perform depth estimation on a target image to obtain the depth value of each pixel in the target image.
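For illustration, a compact sketch of that per-pixel depth-regression scheme; the toy encoder-decoder and the depth_loader of (RGB image, depth map) pairs are assumptions, not the patent's network:

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Toy encoder-decoder that regresses a depth value per pixel."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))  # (B, 1, H, W) depth map

model = TinyDepthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.L1Loss()  # per-pixel depth regression loss

for rgb, gt_depth in depth_loader:  # assumed loader of (image, depth) pairs
    pred = model(rgb)
    loss = criterion(pred, gt_depth)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```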
210. Inputting the first parcel image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first parcel image;
211. Determining the volume parameters of the parcels in the first parcel image based on the three-dimensional position coordinate information of the parcels in the first parcel image;
212. Calculating the volume of the parcel in the first live image based on the volume parameter of the parcel in the first live image.
The embodiment of the invention details the process by which the parcel identification model identifies parcels in a picture: features are first extracted through the ResNet-101 network, a preselection box containing a parcel is then generated through the RPN network, the preselection box and the feature map are fused through ROI Align, results are finally predicted through the FCN network, and the first area range of the parcel in the first live image is identified according to the prediction probability that the prediction box contains a parcel, yielding a more accurate parcel image.
Referring to FIG. 3, a third embodiment of the parcel volume measurement method according to an embodiment of the present invention includes:
301. Acquiring a first live image captured in real time of a parcel stacking scene;
302. Inputting the first live image into a preset parcel identification model, and extracting the image features of the first live image through a ResNet-101 network to obtain a first feature map;
In this embodiment, the first live image is input into a preset parcel identification model, and the image features of the first live image are extracted through the ResNet-101 network of the parcel identification model to obtain the first feature map.
In this embodiment, ResNet-101 is a member of the ResNet series of convolutional neural networks. By adding identity shortcut connections, ResNet learns, in addition to feature extraction, the residual from the features of one layer to the features of the next, so the stacked layers can learn new features on top of the input features and more features can be extracted. With a depth of 101 layers, ResNet-101 extracts finer features and achieves higher accuracy in instance segmentation.
In this embodiment, after the training sample image is input into the ResNet-101 network, the ResNet network extracts its features by convolution to obtain the corresponding first feature data. Since an image is composed of individual pixels, each of which can be represented numerically (for example, an RGB image can be represented by the three values of its R, G and B channels), it can be represented as a mathematical matrix of size 3 × a × b. The essence of feature extraction is to convolve the pixel values with a convolution kernel of a certain size, such as c × d; the corresponding first feature data can therefore also be represented by an m × k matrix.
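For illustration, such a feature map can be extracted with a recent torchvision ResNet-101 trunk by removing the classification head; the input resolution here is an arbitrary example:

```python
import torch
import torchvision

# ResNet-101 with the classification head removed: the remaining
# convolutional trunk maps an image to a spatial feature map.
resnet = torchvision.models.resnet101(weights=None)
trunk = torch.nn.Sequential(*list(resnet.children())[:-2])

image = torch.randn(1, 3, 800, 800)   # one RGB live image, 3 x a x b
with torch.no_grad():
    feature_map = trunk(image)
print(feature_map.shape)              # torch.Size([1, 2048, 25, 25])
```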
303. Inputting the first feature map into the RPN network, and extracting a prediction box for the first feature map through a preset selective search algorithm to obtain the prediction box corresponding to the first feature map;
In this embodiment, the obtained first feature map is input into the RPN network together with the preset anchor box information; the RPN network comprises a first classifier. Anchor boxes for the first feature map are generated according to the anchor box information. Further, whether a parcel exists in each anchor box is judged through the first classifier, and if so, bounding box regression is performed on the anchor box to obtain the prediction box corresponding to the first feature map.
In the past, a sliding window was used for target recognition; however, only one target can be detected per window, and the problem of multiple sizes arises. Anchor boxes (anchors) were therefore proposed. The anchor box information is preset; for example, the number of anchor boxes is 9, covering nine specifications such as 3 × 1 and 3 × 2.
The RPN network includes a first classifier, and this embodiment preferably uses softmax as the first classifier. softmax, also called the normalized exponential function, normalizes the scores of a finite discrete probability distribution into corresponding probability values. The score that each anchor box contains a parcel is calculated and then normalized to obtain the probability that each anchor box contains a parcel. If the probability is larger than a preset threshold, it is determined that the anchor box contains a parcel; if it is smaller than the preset threshold, it is determined that it does not.
Bounding box regression, also called BB regression, refers to the fine adjustment of the positions of the retained anchor boxes by regression analysis. The anchor boxes containing parcels can be screened out by the classifier, but their sizes are fixed by the preset anchor box information, so they do not necessarily contain the corresponding parcels precisely and need fine-tuning.
The fine-tuning approaches usually employed are translation and size scaling. Because both can be accomplished by simple linear mappings, a linear transformation formula can be preset and learned through training. If a parcel exists in an anchor box, the anchor box containing the parcel is retained and fine-tuned through bounding box regression, thereby obtaining the preselection box corresponding to the first feature data.
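For illustration, the conventional R-CNN-style form of that learned linear mapping applies predicted deltas (tx, ty, tw, th) to an anchor box; the patent does not spell out the formula, so this standard parameterization is an assumption:

```python
import numpy as np

def apply_bbox_regression(anchor, deltas):
    """Refine an (x1, y1, x2, y2) anchor box with regression deltas.

    deltas = (tx, ty, tw, th): translation of the center relative to
    the anchor size, and log-scale factors for width and height."""
    x1, y1, x2, y2 = anchor
    tx, ty, tw, th = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # Translate the center, then scale the size.
    cx, cy = cx + tx * w, cy + ty * h
    w, h = w * np.exp(tw), h * np.exp(th)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# Nudge a 3:1 anchor right by 10% of its width and widen it by ~22%.
print(apply_bbox_regression((0, 0, 30, 10), (0.1, 0.0, 0.2, 0.0)))
```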
304. Inputting the first feature map and the prediction box into a ROIAlign layer for prediction to obtain a second feature map containing the prediction box;
in this embodiment, the first feature map and the prediction box are input to the roilign layer, and the prediction box and the first feature map are predicted through the roilign layer, so as to obtain a second feature map including the prediction box.
In this embodiment, ROIAlign is a manner of gathering regional features. Since the grid size required by the subsequent network is generally smaller than that of the feature map, two times of quantization are adopted in the ROI Pooling layer, so that decimal points may exist at the positions of the grid size, and the number of values in the feature map is an integer, so that the matching is performed in an integer manner. However, the matching is not completely matched, so that the phenomenon of mismatching exists. While ROIAlign may solve this problem.
First, the region corresponding to each prediction box in the first feature map is traversed, keeping the floating-point boundary unquantized; the region is then divided into k × k units; finally, four sampling positions are fixed in each unit, their values are computed by bilinear interpolation, and a max-pooling operation is applied. The second feature map containing the prediction box is thereby obtained.
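A minimal NumPy sketch of the bilinear interpolation at the heart of ROIAlign; the surrounding loop that lays out the k × k units and max-pools the four samples per unit is omitted:

import numpy as np

def bilinear_sample(feature: np.ndarray, x: float, y: float) -> float:
    """Sample a 2-D feature map at a floating-point (x, y) without quantizing."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, feature.shape[1] - 1)
    y1 = min(y0 + 1, feature.shape[0] - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0, x0] * (1 - dx) + feature[y0, x1] * dx       # interpolate in x
    bottom = feature[y1, x0] * (1 - dx) + feature[y1, x1] * dx
    return top * (1 - dy) + bottom * dy                            # interpolate in y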
305. Inputting the second feature map into a fully connected layer for classification processing to obtain the prediction probability that the prediction box contains a parcel;
In this embodiment, the second feature map is input into the fully connected layer to obtain the target vector corresponding to the second feature map. The classification network includes the fully connected layer and a second classifier; the target vector is input into the second classifier for prediction, giving the prediction probability that the prediction box contains a parcel.
In this embodiment, each node in the fully connected layers (FC) is connected to all nodes in the previous layer, so as to integrate all the extracted features.
In this embodiment, the output of the fully connected layer is a one-dimensional vector. All the previously extracted features are integrated, and an activation function is then applied for nonlinear mapping, so that all features are mapped onto this one-dimensional vector, giving the vector features corresponding to the second feature map.
In this embodiment, the second classifier is a softmax classifier. After the vector features are obtained, the probability that each prediction box contains, or does not contain, a parcel is obtained through the softmax classifier.
If the parcel probability is greater than the preset threshold, the prediction box is judged to contain a parcel. The area range corresponding to the prediction box is then taken as a predicted parcel region and output as the classification result.
306. Obtaining a first identification image marked by the parcel of the first field image based on the prediction probability that the prediction frame contains the parcel;
In this embodiment, the first region range of each parcel in the first live image is identified based on the prediction probability that the prediction box contains a parcel. If the prediction probability is greater than the preset threshold, the area range corresponding to the prediction box is taken as the prediction region, and the prediction region is taken as the prediction result, giving the first identification image with parcel labels for the first live image.
307. Extracting a first parcel image corresponding to each parcel from the first identification image according to the first region range in the first identification image;
308. Inputting the first parcel image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first parcel image;
309. determining volume parameters of the packages in the first package image based on the three-dimensional position coordinate information of the packages in the first package image;
310. the volume of the parcel in the first field image is calculated based on the volume parameter of the parcel in the first field image.
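Once the volume parameters are available, the final calculation is elementary; a small Python sketch with illustrative shape names and parameters, covering the cuboid and cylinder bounding-box types mentioned in this specification:

import math

def parcel_volume(shape: str, **dims) -> float:
    """Volume from the bounding-box volume parameters output by the model."""
    if shape == "cuboid":
        return dims["length"] * dims["width"] * dims["height"]
    if shape == "cylinder":
        return math.pi * dims["radius"] ** 2 * dims["height"]
    raise ValueError(f"unsupported bounding-box shape: {shape}")

print(parcel_volume("cuboid", length=0.4, width=0.3, height=0.2))  # 0.024 (m^3)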
In this embodiment of the invention, an image of a parcel stacking scene is collected and input into a preset parcel recognition model to obtain the first region range of each parcel in the image; a corresponding parcel image is then extracted according to that region range and input into a preset three-dimensional prediction model to obtain the three-dimensional position coordinate information of the parcel; the target bounding box of the parcel is derived from this coordinate information, its volume parameters are output, and the parcel volume is calculated. With the trained three-dimensional prediction model, the three-dimensional position coordinates of a parcel can be obtained directly from a picture and its volume determined from them, replacing the original manual estimation and improving the efficiency of parcel volume measurement.
Referring to fig. 4, a fourth embodiment of the parcel volume measuring method according to the embodiment of the present invention includes:
401. Acquiring a first live image of a parcel stacking scene captured in real time, inputting the first live image into a preset parcel recognition model for parcel recognition, and obtaining a first identification image with parcel labels for the first live image;
402. extracting a first parcel image corresponding to each parcel from the first identification image according to the first region range in the first identification image;
403. Acquiring a second live image of a parcel stacking scene captured in real time, inputting the second live image into a preset parcel recognition model for recognition, and obtaining a second identification image with parcel labels for the second live image;
404. extracting a second parcel image corresponding to each parcel from the second identification image according to a third area range in the second identification image;
405. Performing semantic segmentation on the second parcel image according to a preset target segmentation algorithm to obtain a semantic segmentation result;
In this embodiment, semantic segmentation is performed on each collected second parcel image according to the preset target segmentation algorithm, giving the semantic segmentation result corresponding to each image.
Semantic segmentation is also called image semantic segmentation (semantic segmentation); literally, it means the computer segments an image according to its semantics: for example, given an input picture A, the computer can output the corresponding picture B. Semantics here plays the role that the meaning of speech plays in speech recognition: in the image field, it refers to the content of an image and an understanding of what the picture means, for example, that picture A shows three people riding three bicycles. Segmentation means separating the different objects in the picture at the pixel level and labeling every pixel of the original image, for example coloring people pink and bicycles green in picture B. The main current application fields of semantic segmentation are geographic information systems, autonomous driving, medical image analysis, robotics, and the like.
The target segmentation algorithm can be built on a current mainstream deep-learning image semantic segmentation model, such as Mask R-CNN, to segment the designated image semantically. Specifically, the semantic segmentation operation segments the object regions out of the designated image and identifies their content; that is, the pixel points belonging to the same object in the designated image are grouped together. Through this operation, one or more semantic segmentation results can be obtained. Furthermore, considering the resource and time cost of data processing, the designated image may be uploaded to a preset cloud server for semantic segmentation, with the result transmitted back to the mobile terminal once processing completes; this is not limited here.
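As a sketch of such an off-the-shelf model, the torchvision Mask R-CNN (ResNet-50 backbone, unlike the ResNet-101 used for the parcel recognition model here) can stand in for a trained segmentation network; the file name and score threshold are illustrative, and weights="DEFAULT" assumes torchvision 0.13 or later:

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("parcel_scene.jpg").convert("RGB"))  # hypothetical file
with torch.no_grad():
    out = model([image])[0]

keep = out["scores"] > 0.5          # confidence threshold
masks = out["masks"][keep]          # (N, 1, H, W) per-object soft masks
boxes = out["boxes"][keep]          # (N, 4) matching two-dimensional boxes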
406. Determining a package to be detected in the second package image based on the semantic segmentation result;
In this embodiment, the parcel to be detected in the second parcel image is determined according to the semantic segmentation result. The parcel to be detected here refers to an express parcel in the sample image.
Regarding the number of semantic segmentation results: if there is only one, only one object exists in the corresponding image, and the user can only choose to perform size detection on that object; if there are two or more, the designated image is determined to contain multiple objects, and the choice must then be made based on a selection instruction input by the user.
407. Carrying out scene reconstruction according to the plurality of second parcel images to obtain a scene reconstruction image;
In this embodiment, the parcel stacking scene is reconstructed from the plurality of second parcel images to obtain the scene reconstruction image. The dense reconstruction process involves two steps: running a VIO (visual-inertial odometry) algorithm and building a map based on dense optical flow. The VIO step is described first: the input data of the algorithm are the image sequence and the inertial measurement data, and the output data are the pose of each frame, i.e., the rotation and translation of each frame. Once the pose of each frame is obtained, various methods can be adopted for semi-dense three-dimensional reconstruction.
In this embodiment, after the dense optical flow and the relative transformation between frames (the sample images) are obtained, a conventional monocular reconstruction method can be adopted: the optical-flow parallax serves as the point-matching result between two frames, the depth of each pixel is triangulated, and the pixel depths are continuously updated and refined in subsequent frames, forming a dense reconstruction of the scene and yielding the scene reconstruction image.
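In the idealized rectified two-view case, the triangulation step reduces to depth = focal length × baseline / disparity; a hedged NumPy sketch, where the baseline comes from the VIO pose between the two frames and all names are illustrative:

import numpy as np

def depth_from_disparity(disparity: np.ndarray, focal_px: float, baseline_m: float) -> np.ndarray:
    """Per-pixel depth (metres) from optical-flow disparity along the baseline."""
    d = np.where(disparity > 1e-6, disparity, np.nan)  # guard against divide-by-zero
    return focal_px * baseline_m / d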
408. Performing point cloud segmentation on the scene reconstruction image to obtain point cloud data of the package to be detected;
In this embodiment, point cloud segmentation is performed on the scene reconstruction image to obtain the point cloud of the parcel to be detected. Since a point cloud is three-dimensional information, the pose of each frame, i.e., its rotation and translation, can be used to project the parcel to be detected into the scene reconstruction image, giving the point cloud of the parcel to be detected.
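A minimal sketch of that projection, assuming a metric depth map, a boolean parcel mask, pinhole intrinsics K, and a camera-to-world pose (R, t) from the VIO step; all names are illustrative:

import numpy as np

def parcel_point_cloud(depth: np.ndarray, mask: np.ndarray,
                       K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Lift the masked parcel pixels of one frame into world coordinates."""
    v, u = np.nonzero(mask)                        # pixel rows and columns of the parcel
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]                # back-project with the pinhole model
    y = (v - K[1, 2]) * z / K[1, 1]
    cam_pts = np.stack([x, y, z], axis=1)          # points in the camera frame
    return cam_pts @ R.T + t                       # transform into the world frame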
409. Generating a bounding box corresponding to the to-be-detected package in the scene reconstruction image;
in this embodiment, a bounding box corresponding to a parcel to be detected in an image is generated in a scene reconstruction image. The scene reconstruction image is not displayed in the foreground of the mobile terminal but only appears in the background during data processing of the mobile terminal, i.e. the scene reconstruction image is not visible to the user. After the reconstructed scene reconstruction image is obtained, the mobile terminal can generate a bounding box associated with the parcel to be detected in the scene reconstruction image, and the shape of the parcel to be detected is approximated by the bounding box, so that a complex object can be simplified.
410. Establishing an objective function of the bounding box and the point cloud according to the inclination angle of the bounding box relative to the vertical direction and its size parameters;
In this embodiment, an objective function of the bounding box and the point cloud is established according to the inclination angle of the parcel's bounding box relative to the vertical direction and its size parameters.
The centroid of the point cloud is taken as the initial position of the bounding box, initially aligning the bounding box with the point cloud of the parcel to be detected; the current reverse direction of gravity is estimated from the inertial measurement data and taken as the vertical upward direction. Among objects encountered in daily life, most regular objects stand naturally parallel to the direction of gravity, so for the orientation of the bounding box it suffices to estimate its inclination angle θ relative to the vertical direction, together with the size parameter s of the bounding box itself. As for the volume parameters: for a cylindrical bounding box, they are the height and radius; for a cuboid bounding box, they are the length, width, and height; the volume parameters of other bounding box types are not enumerated here.
411. Calculating the optimal solution of the objective function, and adjusting the bounding box based on the optimal solution to obtain the target bounding box, wherein the optimal solution comprises an optimal inclination angle and optimal size parameters;
In this embodiment, the optimal solution of the objective function is calculated, where the optimal solution comprises an optimal inclination angle and optimal size parameters. The optimal solution is the solution of the objective function that leaves the bounding box with the least residual space after the point cloud is enclosed in it; since the objective function is constructed from the inclination angle and the size parameters, the optimal solution consists of the optimal inclination angle and the optimal size parameters.
In this embodiment, the bounding box formed under the optimal solution is the target bounding box: with the optimal size and optimal inclination angle of the bounding box obtained, the target bounding box is determined.
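One simple way to realize this objective, shown as an illustrative sketch rather than the claimed optimization, is a grid search over the inclination angle: with the point cloud already aligned to the gravity direction, the volume of the enclosing cuboid at each angle is the objective value, and minimizing it minimizes the residual space:

import numpy as np

def fit_upright_cuboid(points: np.ndarray, angles=np.linspace(0.0, np.pi / 2, 90)):
    """Grid-search the tilt angle giving the tightest gravity-aligned cuboid.

    points: (N, 3) parcel point cloud whose z axis is the estimated vertical.
    Returns (volume, optimal inclination angle, optimal size parameters).
    """
    best = None
    for theta in angles:
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        xy = points[:, :2] @ R.T                        # rotate about the vertical axis
        size = np.append(xy.max(0) - xy.min(0),         # length and width at this angle
                         points[:, 2].max() - points[:, 2].min())  # height
        volume = float(size.prod())
        if best is None or volume < best[0]:
            best = (volume, theta, size)
    return best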
412. Determining a target display area based on the position of the parcel to be detected in the first sample image;
In this embodiment, the corresponding target display area is determined according to the position of the parcel to be detected in the first sample image. The target display area is understood here as the contour position in the image enclosing the parcel whose volume is to be measured.
413. Outputting a target bounding box in a target display area, and displaying the inclination angle and the size parameter of the target bounding box;
In this embodiment, the target bounding box of the parcel whose volume is to be measured is output in the target display area, together with the inclination angle of the target bounding box and its corresponding dimension parameters, where the dimension parameters may be the length, width, and so on of the target bounding box.
414. Determining second annotation information corresponding to each second wrapping image based on the parameter information of the target bounding box, and storing the second annotation information as a second annotation file;
in this embodiment, according to the parameter information of the target bounding box, second annotation information corresponding to each second package image is determined and stored as a second annotation file.
415. Taking the second field image as a second training sample image, and training the second training sample image and a corresponding second annotation file based on a three-dimensional prediction network to obtain a three-dimensional prediction model;
the three-dimensional prediction network comprises a two-dimensional target detection sub-network, an example depth estimation sub-network, a three-dimensional positioning sub-network and a corner point regression sub-network;
416. inputting the first wrapping image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first wrapping image;
417. based on the three-dimensional position coordinate information of the package in the first package image, determining volume parameters of the package in the first package image and calculating the volume of the package in the first live image.
In this embodiment of the invention, an image of a parcel stacking scene is collected and input into a preset parcel recognition model to obtain the first region range of each parcel in the image; a corresponding parcel image is then extracted according to that region range and input into a preset three-dimensional prediction model to obtain the three-dimensional position coordinate information of the parcel; the target bounding box of the parcel is derived from this coordinate information, its volume parameters are output, and the parcel volume is calculated. With the trained three-dimensional prediction model, the three-dimensional position coordinates of a parcel can be obtained directly from a picture and its volume determined from them, replacing the original manual estimation and improving the efficiency of parcel volume measurement.
Referring to fig. 5, a fifth embodiment of the parcel volume measuring method according to the embodiment of the present invention includes:
501. Acquiring a first live image of a parcel stacking scene captured in real time;
502. inputting the first field image into a preset package identification model for package identification to obtain a first identification image of a package label of the first field image;
503. extracting a first parcel image corresponding to each parcel from the first identification image according to the first region range in the first identification image;
504. Inputting the first parcel image into a preset three-dimensional prediction model, and extracting feature maps of the first parcel image through the example depth estimation network of the three-dimensional prediction model to obtain a shallow feature map and a deep feature map;
In this embodiment, the first parcel image corresponding to each parcel in the first live image is input into the preset three-dimensional prediction model, and the image is preprocessed through the example depth estimation network of the model to extract feature maps, giving a shallow feature map and a deep feature map. The example depth estimation network adopts ResNet as its backbone, and its encoding part and decoding part form a U-shaped network structure. The encoding part comprises, in sequence: a first convolution layer, a pooling layer, a second convolution layer, a third convolution layer, a fourth convolution layer, and a fifth convolution layer.
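A minimal PyTorch sketch of such a U-shaped encoder-decoder, using ResNet-18 in place of the embodiment's ResNet backbone and emitting a single disparity map instead of several scales; weights=None assumes torchvision 0.13 or later:

import torch
import torch.nn as nn
import torchvision

class DepthUNet(nn.Module):
    """Toy U-shaped depth estimator with a ResNet encoder."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)  # first conv
        self.pool = resnet.maxpool
        self.enc2, self.enc3 = resnet.layer1, resnet.layer2   # second and third stages
        self.enc4, self.enc5 = resnet.layer3, resnet.layer4   # fourth and fifth stages
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = nn.Conv2d(512 + 256, 256, 3, padding=1)    # decoder fusion conv
        self.head = nn.Conv2d(256, 1, 3, padding=1)           # one disparity map

    def forward(self, x):
        f = self.stem(x)
        f = self.enc3(self.enc2(self.pool(f)))
        skip = self.enc4(f)                        # shallower feature kept for the skip
        deep = self.enc5(skip)                     # deepest feature
        fused = torch.cat([self.up(deep), skip], dim=1)   # splice shallow and deep maps
        return self.head(torch.relu(self.dec(fused)))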
Depth estimation in computer vision, a component of three-dimensional reconstruction, derives depth distances from spatial geometric relationships, temporal transformations, or focal-length variation; here the input is a monocular image. Depth estimation can be used in three-dimensional modeling, scene understanding, depth-aware image synthesis, and other fields.
The basis of deep-learning monocular estimation is that pixel-value relationships reflect depth relationships, and the method fits a function that maps an image to a depth map. The resulting depth-map profiles, and a growing body of experimental results, show that such a function can indeed recover relative depth values from pixel values. A monocular image can also be used to estimate depth if image blur is modeled; existing monocular depth estimation methods generally take single-view image data as input and directly predict the depth value corresponding to each pixel of the image.
505. Splicing the shallow feature map and the deep feature map, and outputting a plurality of disparity maps of different scales through depth prediction to obtain a depth map of the first parcel image;
In this embodiment, a depth map is a three-dimensional representation of an object, generally acquired by a stereo camera or a TOF camera. If the camera parameters are fixed, the depth image can be converted into a point cloud. Further, the depth image is also called a range image: an image whose pixel values are the distances (depths) from the image capture device to the points in the scene, directly reflecting the geometry of the scene's visible surfaces. A depth image can be converted into point cloud data by coordinate transformation, and point cloud data with regular structure and the necessary information can be converted back into depth image data.
In the image frames provided by the depth data stream, each pixel represents the distance (in millimeters) to the plane of the camera from the object closest to the plane at that particular (x, y) coordinate in the field of view of the depth sensor.
Research on depth images mainly focuses on the following aspects: depth image segmentation, depth image edge detection, registration of multiple depth images from different viewpoints, three-dimensional reconstruction from depth data, three-dimensional target recognition from depth images, and multi-resolution modeling and geometric compression of depth data.
506. Inputting the first parcel image into a two-dimensional target detection subnetwork, carrying out target detection on the first parcel image through the two-dimensional target detection subnetwork, and determining the geometric type of a parcel in the first field image and four vertex coordinates of a two-dimensional bounding box;
in this embodiment, the first live image and the first package image corresponding to the first live image are input into a preset three-dimensional prediction model, and the geometric type of the package in the first live image and the four vertex coordinates of the two-dimensional bounding box are acquired through a two-dimensional target detection network of the model. According to the geometric type of each parcel (parcel to be detected) in the first live image, a corresponding bounding box is selected and generated, and various types of bounding boxes including a cube, a cylinder, a cuboid and the like can be preset. For example, when the package (package to be detected) is a book, the geometrical category is a cuboid, and then a bounding box in the form of a cuboid can be generated here, and when the package (package to be detected) is a television or a washing machine, the geometrical category is a cube.
Previously, target recognition used a sliding window; however, one window can only detect one target, and objects come in multiple sizes, so anchor boxes (anchors) were proposed. Anchor box information is set in advance, for example 9 anchor boxes covering nine specifications such as 3×1 and 3×2; for each value in the feature matrix, the corresponding 9 anchor boxes of those nine specifications are generated according to the anchor box information.
507. Inputting the four vertex coordinates of the parcel's two-dimensional bounding box in the first live image into the three-dimensional positioning sub-network to obtain the horizontal and vertical coordinates of the projection points of the parcel's three-dimensional bounding box in the live image;
In this embodiment, the four vertex coordinates of the parcel's two-dimensional bounding box in the first live image are input into the three-dimensional positioning network, which outputs the horizontal and vertical coordinates of the projection points of the parcel's three-dimensional bounding box. The two-dimensional bounding box of a parcel is the circumscribed frame of the area the parcel occupies in the first live image; from the four input vertex coordinates, the network obtains the vertex coordinates of the parcel whose volume is to be measured and the connection order among them, and hence the horizontal and vertical coordinates of the projection points in the first live image.
508. Drawing the parcel's three-dimensional bounding box in the first live image based on the horizontal and vertical coordinates of its projection points, and determining the parcel's target bounding box in the first live image according to the three-dimensional bounding box;
In this embodiment, the horizontal and vertical coordinates of the projection points in the first live image are obtained from the vertex coordinates of the parcel whose volume is to be measured and the connection order among them, and the parcel's three-dimensional bounding box is drawn in the first live image according to those projection coordinates, so as to determine the parcel's target bounding box.
In this embodiment, parameter information such as the inclination angle relative to the vertical direction and the size of the parcel whose volume is to be measured can be determined from the three-dimensional bounding box in the first live image, and the three-dimensional bounding box is adjusted according to this parameter information to determine the parcel's target bounding box in the first live image.
509. Inputting the depth map of the first parcel image and the parcel's target bounding box in the first live image into the corner regression sub-network, and performing corner regression on the depth map of the first parcel image through the sub-network to obtain the depth of the center of the parcel's target bounding box;
In this embodiment, the depth map of the first parcel image and the parcel's target bounding box in the first live image are input into the corner regression sub-network, and corner regression is performed on the depth map through the sub-network, giving the depth of the center of the parcel's target bounding box in the first live image.
In three-dimensional computer graphics, a depth map is an image or image channel containing information about the distance between the surfaces of scene objects and a viewpoint. A depth map resembles a grayscale image, except that each pixel value is the actual distance from the sensor to the object. Usually the RGB image and the depth image are registered, so their pixel points correspond one to one.
510. Determining three-dimensional position coordinate information of the parcel in the first live image based on the depth of the center of the target bounding box of the parcel in the first live image;
In this embodiment, the three-dimensional position coordinate information of the parcel in the first live image is determined from the depth of the center of the parcel's target bounding box. Once that center depth is determined, the three-dimensional position coordinates of the parcel are determined according to the size and inclination angle of the target bounding box.
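Given the centre, the size parameters, and the inclination angle, the eight corner coordinates of a cuboid bounding box follow directly; a NumPy sketch assuming the tilt is a rotation about the vertical (z) axis:

import numpy as np

def cuboid_corners(centre, dims, yaw):
    """Eight world-frame corners of a tilted cuboid bounding box.

    centre: (x, y, z) of the box centre; dims: (length, width, height);
    yaw: inclination angle about the vertical axis, in radians.
    """
    half = np.asarray(dims, dtype=float) / 2.0
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                                   for sy in (-1, 1)
                                   for sz in (-1, 1)], dtype=float)
    offsets = signs * half                             # corners in the box frame
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return offsets @ R.T + np.asarray(centre, dtype=float)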
511. Determining volume parameters of the packages in the first package image based on the three-dimensional position coordinate information of the packages in the first package image;
512. the volume of the parcel in the first field image is calculated based on the volume parameter of the parcel in the first field image.
In this embodiment of the invention, an image of a parcel stacking scene is collected and input into a preset parcel recognition model to obtain the first region range of each parcel in the image; a corresponding parcel image is then extracted according to that region range and input into a preset three-dimensional prediction model to obtain the three-dimensional position coordinate information of the parcel; the target bounding box of the parcel is derived from this coordinate information, its volume parameters are output, and the parcel volume is calculated. With the trained three-dimensional prediction model, the three-dimensional position coordinates of a parcel can be obtained directly from a picture and its volume determined from them, replacing the original manual estimation and improving the efficiency of parcel volume measurement.
With reference to fig. 6, the method for measuring the parcel volume in the embodiment of the present invention is described above, and the device for measuring the parcel volume in the embodiment of the present invention is described below, where the first embodiment of the device for measuring the parcel volume in the embodiment of the present invention includes:
a first obtaining module 601, configured to obtain a first live image of a parcel stacking scene captured in real time;
a first identification module 602, configured to input the first live image into a preset parcel recognition model for parcel recognition, so as to obtain a first identification image with parcel labels for the first live image;
in practical applications, the package label is represented by using the first area range.
A first extraction module 603, configured to extract, according to a first area range in the first identification image, a first parcel image corresponding to each parcel from the first identification image;
the prediction module 604 is configured to input the first package image into a preset three-dimensional prediction model to perform three-dimensional prediction processing, so as to obtain three-dimensional position coordinate information corresponding to the first package image, where the three-dimensional prediction processing includes construction of the package three-dimensional image and calculation of three-dimensional coordinate information of the package three-dimensional image in the first live image;
a determining module 605, configured to determine a volume parameter of the package in the first package image based on the three-dimensional position coordinate information of the package in the first package image;
a calculation module 606 for calculating a volume of the parcel in the first live image based on the volume parameter of the parcel in the first live image.
In this embodiment of the invention, an image of a parcel stacking scene is collected and input into a preset parcel recognition model to obtain the first region range of each parcel in the image; a corresponding parcel image is then extracted according to that region range and input into a preset three-dimensional prediction model to obtain the three-dimensional position coordinate information of the parcel; the target bounding box of the parcel is derived from this coordinate information, its volume parameters are output, and the parcel volume is calculated. With the trained three-dimensional prediction model, the three-dimensional position coordinates of a parcel can be obtained directly from a picture and its volume determined from them, replacing the original manual estimation and improving the efficiency of parcel volume measurement.
Referring to fig. 7, a second embodiment of the parcel volume measuring apparatus according to the embodiments of the present invention specifically includes:
a first obtaining module 601, configured to obtain a first live image of a parcel stacking scene captured in real time;
a first identification module 602, configured to input the first live image into a preset parcel recognition model for parcel recognition, so as to obtain a first identification image with parcel labels for the first live image;
in practical applications, the package label is represented by using the first area range.
A first extraction module 603, configured to extract, according to a first area range in the first identification image, a first parcel image corresponding to each parcel from the first identification image;
the prediction module 604 is configured to input the first package image into a preset three-dimensional prediction model to perform three-dimensional prediction processing, so as to obtain three-dimensional position coordinate information corresponding to the first package image, where the three-dimensional prediction processing includes construction of the package three-dimensional image and calculation of three-dimensional coordinate information of the package three-dimensional image in the first live image;
a determining module 605, configured to determine a volume parameter of the package in the first package image based on the three-dimensional position coordinate information of the package in the first package image;
a calculating module 606 for calculating a volume of the parcel in the first field image based on the volume parameter of the parcel in the first field image.
In practical application, the used parcel recognition model is obtained by training according to actual sample data before performing volume measurement, and the training process of the parcel recognition model is specifically realized by a training module, wherein the parcel volume measurement device further comprises:
the second obtaining module 607 is configured to obtain a plurality of sample images of a truck loading scene, and use the sample images as a first training sample image;
a first labeling module 608, configured to label the packages in the first training sample image, obtain first labeling information corresponding to each training sample image, and store the first labeling information as a first labeling file, where the first labeling information is used to identify a second area range in which the packages are loaded in the first training sample image;
the first training module 609 is configured to input the first training sample image and the corresponding first annotation file into a preset Mask R-CNN model for deep-learning training to obtain the parcel recognition model, where the Mask R-CNN model includes a ResNet-101 network, an RPN network, a ROIAlign layer, and a classification network.
In this embodiment, the first identifying module 602 is specifically configured to:
inputting the first live image into the preset parcel recognition model, and extracting image features of the first live image through the ResNet-101 network to obtain a first feature map;
inputting the first feature map into the RPN network, and extracting the prediction box of the first feature map through a preset selective search algorithm to obtain the prediction box corresponding to the first feature map;
inputting the first feature map and the prediction box into the ROIAlign layer for prediction to obtain a second feature map containing the prediction box;
inputting the second feature map into the fully connected layer for classification processing to obtain the prediction probability that the prediction box contains a parcel;
and obtaining the first identification image with parcel labels for the first live image based on the prediction probability that the prediction box contains a parcel.
In practical applications, the used three-dimensional prediction model is obtained by training according to actual sample data before performing volume measurement, and the training process of the three-dimensional prediction model is specifically realized by a training module, wherein the package volume measurement device further comprises:
a third obtaining module 610, configured to obtain a second live image of the parcel stacking scene captured in real time;
a second identifying module 611, configured to input the second live image into a preset parcel recognition model for recognition, so as to obtain a second identification image with parcel labels for the second live image, where the parcel labels are represented by a third area range;
a second extracting module 612, configured to extract, according to a third area range in the second identification image, a second parcel image corresponding to each parcel from the second identification image;
a second labeling module 613, configured to perform point cloud annotation on the second parcel image, obtain second annotation information corresponding to the second parcel image, and store the second annotation information as a second annotation file, where the second annotation information is used to identify the target bounding box of the parcel in the second live image;
and a second training module 614, configured to use the second field image as a second training sample image, and train the second training sample image and a corresponding second annotation file based on a three-dimensional prediction network to obtain a three-dimensional prediction model, where the three-dimensional prediction network includes a two-dimensional target detection sub-network, an example depth estimation sub-network, a three-dimensional positioning sub-network, and a corner point regression sub-network.
In this embodiment, the second labeling module 613 includes:
the semantic segmentation unit 6131 is configured to perform semantic segmentation on the second package image according to a preset target segmentation algorithm to obtain a semantic segmentation result;
a first determining unit 6132, configured to determine, based on the semantic segmentation result, a package to be detected in the second package image;
the scene reconstruction unit 6133 is configured to perform scene reconstruction according to the plurality of second parcel images to obtain a scene reconstruction image;
a point cloud segmentation unit 6134, configured to perform point cloud segmentation on the scene reconstruction image to obtain point cloud data of the package to be detected;
a generating unit 6135, configured to generate a bounding box corresponding to the parcel to be detected in the scene reconstruction image;
a fitting unit 6136, configured to fit the bounding box with the point cloud data of the package to be detected, to obtain a target bounding box, and output parameter information of the target bounding box;
a second determining unit 6137, configured to determine, based on the parameter information of the target bounding box, second annotation information corresponding to each second package image, and store the second annotation information as a second annotation file.
In this embodiment, the prediction module 604 is specifically configured to:
inputting the first parcel image into a preset three-dimensional prediction model, and extracting feature maps of the first parcel image through the example depth estimation network of the three-dimensional prediction model to obtain a shallow feature map and a deep feature map;
splicing the shallow feature map and the deep feature map, and outputting a plurality of disparity maps of different scales through depth prediction to obtain a depth map of the first parcel image;
inputting the first parcel image into the two-dimensional target detection sub-network, performing target detection on the first parcel image through the sub-network, and determining the geometric type of each parcel in the first live image and the four vertex coordinates of its two-dimensional bounding box;
inputting the four vertex coordinates of the parcel's two-dimensional bounding box in the first live image into the three-dimensional positioning sub-network to obtain the horizontal and vertical coordinates of the projection points of the parcel's three-dimensional bounding box in the live image;
drawing the parcel's three-dimensional bounding box in the first live image based on the horizontal and vertical coordinates of its projection points, and determining the parcel's target bounding box according to the three-dimensional bounding box;
inputting the depth map of the first parcel image and the parcel's target bounding box into the corner regression sub-network, and performing corner regression on the depth map through the sub-network to obtain the depth of the center of the parcel's target bounding box;
determining the three-dimensional position coordinate information of the parcel in the first live image based on the depth of the center of the parcel's target bounding box.
In this embodiment, the fitting unit 6136 is specifically configured to:
establishing an objective function of the bounding box and the point cloud according to the inclination angle and the size parameter of the bounding box relative to the vertical direction, wherein the objective function indicates the remaining space of the bounding box after the bounding box wraps the point cloud;
calculating to obtain an optimal solution of the objective function, wherein the optimal solution comprises an optimal inclination angle and an optimal size parameter;
adjusting the bounding box based on the optimal solution to obtain a target bounding box;
determining a target display area based on the position of the parcel to be detected in the first sample image;
and outputting the target bounding box in the target display area, and displaying the inclination angle and the size parameter of the target bounding box.
In this embodiment of the invention, an image of a parcel stacking scene is collected and input into a preset parcel recognition model to obtain the first region range of each parcel in the image; a corresponding parcel image is then extracted according to that region range and input into a preset three-dimensional prediction model to obtain the three-dimensional position coordinate information of the parcel; the target bounding box of the parcel is derived from this coordinate information, its volume parameters are output, and the parcel volume is calculated. With the trained three-dimensional prediction model, the three-dimensional position coordinates of a parcel can be obtained directly from a picture and its volume determined from them, replacing the original manual estimation and improving the efficiency of parcel volume measurement.
Fig. 6 and fig. 7 describe the parcel volume measuring apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities; below, the parcel volume measuring device in the embodiment of the present invention is described in detail from the perspective of hardware processing.
Fig. 8 is a schematic structural diagram of a package volume measuring device 800 according to an embodiment of the present invention, where the package volume measuring device 800 may have a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 810 (e.g., one or more processors) and a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing an application 833 or data 832. Memory 820 and storage medium 830 may be, among other things, transient or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown), each of which may include a sequence of instructions operating on the package volume measurement device 800. Still further, processor 810 may be configured to communicate with storage medium 830 to execute a series of instruction operations in storage medium 830 on package volume measurement device 800.
The package volume measuring device 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input-output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the package volume measuring device configuration shown in fig. 8 does not constitute a limitation of the package volume measuring devices provided herein, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the above-described parcel volume measurement method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, which is substantially or partly contributed by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A parcel volume measuring method, characterized in that it comprises:
acquiring a first live image of a parcel stacking scene captured in real time;
inputting the first live image into a preset parcel recognition model for parcel recognition to obtain a first identification image with parcel labels for the first live image, wherein the parcel labels are represented by a first region range;
extracting a first parcel image corresponding to each parcel from the first identification image according to a first region range in the first identification image;
inputting the first parcel image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first parcel image, wherein the three-dimensional prediction processing comprises construction of the parcel three-dimensional image and calculation of the three-dimensional coordinate information of the parcel three-dimensional image in the first live image;
determining a volume parameter of a parcel in the first parcel image based on three-dimensional position coordinate information of the parcel in the first parcel image;
calculating a volume of a parcel in the first live image based on a volume parameter of a parcel in the first live image.
2. The parcel volume measurement method of claim 1, further comprising, prior to said acquiring of a first live image of a parcel stacking scene captured in real time:
acquiring a plurality of sample images of a freight car loading scene, and taking the sample images as first training sample images;
labeling the packages in the first training sample image to obtain first labeling information corresponding to each training sample image, and storing the first labeling information as a first labeling file, wherein the first labeling information is used for identifying a second area range in which the packages are loaded in the first training sample image;
and inputting the first training sample image and the corresponding first label file into a preset MASK R-CNN model for deep learning training to obtain a parcel recognition model, wherein the MASK R-CNN model comprises a ResNet-101 network, an RPN network, a ROIAlign layer and a classification network.
3. The method of claim 2, wherein the inputting of the first live image into a preset parcel recognition model for parcel recognition to obtain a first identification image with parcel labels for the first live image comprises:
inputting the first live image into the preset parcel recognition model, and extracting image features of the first live image through the ResNet-101 network to obtain a first feature map;
inputting the first feature map into the RPN network, and extracting the prediction box of the first feature map through a preset selective search algorithm to obtain the prediction box corresponding to the first feature map;
inputting the first feature map and the prediction box into the ROIAlign layer for prediction to obtain a second feature map containing the prediction box;
inputting the second feature map into the fully connected layer for classification processing to obtain the prediction probability that the prediction box contains a parcel;
and obtaining the first identification image with parcel labels for the first live image based on the prediction probability that the prediction box contains a parcel.
4. The parcel volume measurement method according to any one of claims 1-3, further comprising, prior to the acquiring of the first live image of the parcel stacking scene captured in real time:
acquiring a second live image of a parcel stacking scene captured in real time;
inputting the second live image into a preset parcel recognition model for recognition to obtain a second identification image with parcel labels for the second live image, wherein the parcel labels are represented by a third area range;
extracting a second parcel image corresponding to each parcel from the second identification image according to the third area range in the second identification image;
performing point cloud annotation on the second parcel image to obtain second annotation information corresponding to the second parcel image and saving the second annotation information as a second annotation file, wherein the second annotation information is used for identifying the target bounding box of the parcel in the second live image;
and taking the second live image as a second training sample image, and training the second training sample image and the corresponding second annotation file based on a three-dimensional prediction network to obtain the three-dimensional prediction model, wherein the three-dimensional prediction network comprises a two-dimensional target detection sub-network, an example depth estimation sub-network, a three-dimensional positioning sub-network, and a corner regression sub-network.
5. The parcel volume measuring method of claim 4, wherein said point cloud labeling of the second parcel image to obtain second labeling information corresponding to the second parcel image and saving the second labeling information as a second labeling file comprises:
performing semantic segmentation on the second parcel image according to a preset target segmentation algorithm to obtain a semantic segmentation result;
determining a parcel to be detected in the second parcel image based on the semantic segmentation result;
carrying out scene reconstruction according to the plurality of second parcel images to obtain a scene reconstruction image;
performing point cloud segmentation on the scene reconstruction image to obtain point cloud data of the to-be-detected package;
generating a bounding box corresponding to the parcel to be detected in the scene reconstruction image;
fitting the bounding box with the point cloud data of the parcel to be detected to obtain a target bounding box and outputting the parameter information of the target bounding box;
and determining second annotation information corresponding to each second package image based on the parameter information of the target bounding box, and saving the second annotation information as a second annotation file.
6. The parcel volume measuring method of claim 4, wherein said inputting the first parcel image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first parcel image comprises:
inputting the first parcel image into a preset three-dimensional prediction model, and extracting feature maps of the first parcel image through an example depth estimation network of the three-dimensional prediction model to obtain a shallow feature map and a deep feature map;
splicing the shallow feature map and the deep feature map, and outputting a plurality of disparity maps of different scales through depth prediction to obtain a depth map of the first parcel image;
inputting the first parcel image into a two-dimensional target detection sub-network, performing target detection on the first parcel image through the two-dimensional target detection sub-network, and determining the geometric type of each parcel in the first live image and the four vertex coordinates of its two-dimensional bounding box;
inputting the four vertex coordinates of the parcel's two-dimensional bounding box in the first live image into the three-dimensional positioning sub-network to obtain the horizontal and vertical coordinates of the projection points of the parcel's three-dimensional bounding box in the live image;
drawing the parcel's three-dimensional bounding box in the first live image based on the horizontal and vertical coordinates of its projection points, and determining the parcel's target bounding box in the first live image according to the three-dimensional bounding box;
inputting the depth map of the first parcel image and the parcel's target bounding box in the first live image into the corner regression sub-network, and performing corner regression on the depth map of the first parcel image through the corner regression sub-network to obtain the depth of the center of the parcel's target bounding box;
determining the three-dimensional position coordinate information of the parcel in the first live image based on the depth of the center of the parcel's target bounding box.
7. The parcel volume measuring method of claim 5, wherein said fitting said bounding box to the point cloud data of the parcel to be detected to obtain a target bounding box and outputting parameter information of said target bounding box comprises:
establishing an objective function of the bounding box and the point cloud according to the inclination angle of the bounding box relative to the vertical direction and its size parameters, wherein the objective function indicates the space remaining in the bounding box after it encloses the point cloud;
solving the objective function to obtain an optimal solution, wherein the optimal solution comprises an optimal inclination angle and optimal size parameters;
adjusting the bounding box based on the optimal solution to obtain the target bounding box;
determining a target display area based on the position of the parcel to be detected in the first sample image;
and outputting the target bounding box in the target display area, and displaying the inclination angle and size parameters of the target bounding box.
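For illustration only: a minimal sketch of the objective in code, simplifying the inclination to a single rotation about the vertical axis and taking the rotated axis-aligned extents as the size parameters. Since the enclosed point cloud is fixed, minimizing the remaining space is equivalent to minimizing the box volume; the 1-degree grid search is an assumption, not the patent's solver.

```python
import numpy as np

def fit_box(points, step_deg=1.0):
    # Grid-search the inclination angle; at each candidate angle the optimal
    # size parameters are simply the rotated axis-aligned extents. With the
    # enclosed cloud fixed, minimizing leftover space == minimizing volume.
    best = None
    for deg in np.arange(0.0, 90.0, step_deg):  # box symmetry: 90 degrees suffice
        t = np.radians(deg)
        rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                        [np.sin(t),  np.cos(t), 0.0],
                        [0.0,        0.0,       1.0]])
        p = points @ rot.T
        size = p.max(axis=0) - p.min(axis=0)    # optimal size at this angle
        volume = float(np.prod(size))
        if best is None or volume < best["volume"]:
            best = {"angle_deg": float(deg), "size": size, "volume": volume}
    return best

# Example: a box-shaped cloud rotated by 30 degrees about the vertical axis.
rng = np.random.default_rng(0)
cloud = rng.uniform([-0.20, -0.15, 0.0], [0.20, 0.15, 0.25], size=(500, 3))
t = np.radians(30.0)
rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0,        0.0,       1.0]])
print(fit_box(cloud @ rz.T))  # optimal angle undoes the 30-degree turn (mod 90)
```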
8. A parcel volume measuring device, comprising:
the first acquisition module is used for acquiring a first site image of a parcel stacking scene captured in real time;
the first identification module is used for inputting the first site image into a preset parcel identification model for parcel identification to obtain a first identification image in which each parcel of the first site image is labeled, wherein each parcel label is represented by a first region range;
the first extraction module is used for extracting a first parcel image corresponding to each parcel from the first identification image according to the first region range in the first identification image;
the prediction module is used for inputting the first parcel image into a preset three-dimensional prediction model for three-dimensional prediction processing to obtain three-dimensional position coordinate information corresponding to the first parcel image, wherein the three-dimensional prediction processing comprises constructing a three-dimensional image of the parcel and calculating the three-dimensional coordinate information of that three-dimensional image in the first site image;
the determination module is used for determining a volume parameter of the parcel in the first parcel image based on the three-dimensional position coordinate information of the parcel in the first parcel image;
and the calculation module is used for calculating the volume of each parcel in the first site image based on the volume parameter of the parcel in the first site image.
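For illustration only: a minimal structural sketch of how the claimed modules could compose into a measurement pipeline. Every class, method, and number below is an invented stub, not the patent's implementation.

```python
import numpy as np

class Identifier:
    # First identification module (stub): returns one labeled parcel region
    # and its cropped first parcel image.
    def identify(self, site_image):
        return [((10, 10, 60, 60), site_image[10:60, 10:60])]

class Predictor:
    # Prediction module (stub): three-dimensional position coordinates.
    def predict(self, parcel_image):
        return np.array([0.40, 0.20, 1.10])

class Determiner:
    # Determination module (stub): volume parameters (length, width, height).
    def volume_parameters(self, coords):
        return np.array([0.35, 0.25, 0.20])

class Calculator:
    # Calculation module: volume = length x width x height.
    def volume(self, params):
        return float(np.prod(params))

def measure(site_image):
    identifier, predictor = Identifier(), Predictor()
    determiner, calculator = Determiner(), Calculator()
    volumes = []
    for _region, crop in identifier.identify(site_image):  # extraction step
        coords = predictor.predict(crop)
        volumes.append(calculator.volume(determiner.volume_parameters(coords)))
    return volumes

print(measure(np.zeros((480, 640, 3))))  # approx. [0.0175] cubic metres (stub values)
```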
9. A parcel volume measuring apparatus, comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor being interconnected by a communication line;
the at least one processor invokes the instructions in the memory to cause the parcel volume measuring apparatus to perform the parcel volume measuring method of any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the parcel volume measuring method of any one of claims 1-7.
CN202010528604.9A 2020-06-11 2020-06-11 Package volume measuring method, device, equipment and storage medium Active CN111709987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010528604.9A CN111709987B (en) 2020-06-11 2020-06-11 Package volume measuring method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010528604.9A CN111709987B (en) 2020-06-11 2020-06-11 Package volume measuring method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111709987A (en) 2020-09-25
CN111709987B (en) 2023-04-07

Family

ID=72540052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010528604.9A Active CN111709987B (en) 2020-06-11 2020-06-11 Package volume measuring method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111709987B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112969021B (en) * 2020-10-30 2022-03-15 上海称意包装设备有限公司 Relay control platform and method using content search
CN112697042B (en) * 2020-12-07 2023-12-05 深圳市繁维科技有限公司 Handheld TOF camera and method for measuring volume of package by using same
CN112580647A (en) * 2020-12-11 2021-03-30 湖北工业大学 Stacked object oriented identification method and system
CN112733641A (en) * 2020-12-29 2021-04-30 深圳依时货拉拉科技有限公司 Object size measuring method, device, equipment and storage medium
CN113496046B (en) * 2021-01-18 2024-05-10 华翼(广东)电商科技有限公司 E-commerce logistics system and method based on block chain
CN113496044A (en) * 2021-01-29 2021-10-12 十堰时风达工贸有限公司 E-commerce logistics management system and method based on block chain
CN112991423A (en) * 2021-03-15 2021-06-18 上海东普信息科技有限公司 Logistics package classification method, device, equipment and storage medium
CN113192017A (en) * 2021-04-21 2021-07-30 上海东普信息科技有限公司 Package defect identification method, device, equipment and storage medium
CN116597446A (en) * 2023-07-17 2023-08-15 亚信科技(南京)有限公司 Text information extraction method based on anchor points
CN117670979B (en) * 2024-02-01 2024-04-30 四川港投云港科技有限公司 Bulk cargo volume measurement method based on fixed point position monocular camera


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN109858437A * 2019-01-30 2019-06-07 苏州大学 Automatic baggage volume classification method based on a generative query network
CN110276317A (en) * 2019-06-26 2019-09-24 Oppo广东移动通信有限公司 A kind of dimension of object detection method, dimension of object detection device and mobile terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Huang Jun; Wang Cong; Liu Yue; Bi Tianteng. A survey of progress in monocular depth estimation. Journal of Image and Graphics, 2019, (12), full text. *

Also Published As

Publication number Publication date
CN111709987A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
CN111709987B (en) Package volume measuring method, device, equipment and storage medium
US10915793B2 (en) Method and system for converting point cloud data for use with 2D convolutional neural networks
CN112258618B (en) Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
JP7458405B2 (en) System and method for object dimensioning based on partial visual information
EP3499414B1 (en) Lightweight 3d vision camera with intelligent segmentation engine for machine vision and auto identification
CN108734087B (en) Object automatic identification method and system, shopping equipment and storage medium
CN110084243B (en) File identification and positioning method based on two-dimensional code and monocular camera
US20160189419A1 (en) Systems and methods for generating data indicative of a three-dimensional representation of a scene
CN108648194B (en) Three-dimensional target identification segmentation and pose measurement method and device based on CAD model
CN113177977B (en) Non-contact three-dimensional human body size measuring method
CN114424250A (en) Structural modeling
CN111985466A (en) Container dangerous goods mark identification method
CN112051853A (en) Intelligent obstacle avoidance system and method based on machine vision
CN110473221A (en) A kind of target object automatic scanning system and method
Alcantarilla et al. Large-scale dense 3D reconstruction from stereo imagery
Chang et al. Hypermap: Compressed 3d map for monocular camera registration
CN113192017A (en) Package defect identification method, device, equipment and storage medium
Börcs et al. A model-based approach for fast vehicle detection in continuously streamed urban LIDAR point clouds
Hoa et al. Efficient determination of disparity map from stereo images with modified sum of absolute differences (SAD) algorithm
Ao Fully convolutional networks for street furniture identification in panorama images
KR101241813B1 (en) Apparatus and method for detecting objects in panoramic images using gpu
CN113284221B (en) Target detection method and device and electronic equipment
US20230125042A1 (en) System and method of 3d point cloud registration with multiple 2d images
Budzan Fusion of visual and range images for object extraction
Palmer et al. Scale proportionate histograms of oriented gradients for object detection in co-registered visual and range data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant