CN117132633A - Method, device, equipment and medium for estimating loading rate based on monocular camera - Google Patents

Method, device, equipment and medium for estimating loading rate based on monocular camera

Info

Publication number
CN117132633A
Authority
CN
China
Prior art keywords
image
point cloud
warehouse
dimensional
monocular camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210550981.1A
Other languages
Chinese (zh)
Inventor
田万鑫
张修宝
沈海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202210550981.1A
Publication of CN117132633A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30268 Vehicle interior

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The disclosure provides a method, a device, equipment and a medium for estimating a loading rate based on a monocular camera. The method for estimating the loading rate based on the monocular camera comprises the following steps: acquiring a first image in a warehouse acquired by the monocular camera, and preprocessing the first image to obtain a first depth estimation map of the first image; mapping the first depth estimation map into a first three-dimensional point cloud; performing three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud in a world coordinate system; obtaining a first two-dimensional square matrix based on the second three-dimensional point cloud; and estimating the loading rate of the warehouse based on the first two-dimensional square matrix. The technical scheme provided by the disclosure achieves high estimation accuracy, hardware and installation costs far lower than those of traditional dot-matrix laser equipment, and strong adaptability, since it does not require collecting and labeling a large amount of real-time data for model training.

Description

Method, device, equipment and medium for estimating loading rate based on monocular camera
Technical Field
The disclosure relates to the technical field of data processing, in particular to a method, a device, equipment and a medium for estimating a loading rate based on a monocular camera.
Background
In the freight industry, freight efficiency and safety are two key indicators, and freight volume measurement is a core technology for improving both. Freight volume measurement detects and estimates the physical layout inside a truck's cargo warehouse in real time through various sensor devices, so as to enable better planning, prevention and control.
Currently, one freight volume estimation scheme is implemented based on dot-matrix laser equipment. Although the algorithm of this scheme is simple and its precision is high enough to meet deployment requirements, the scheme has not been applied at large scale because dot-matrix laser equipment is expensive and costly to install.
Another freight volume estimation scheme is implemented with an image classification method based on an image acquisition device. This scheme has low precision that cannot meet deployment requirements, and the acquisition and labeling costs of the required data sets are high. Thus, this solution is likewise not applied at large scale and is used only in a small number of fixed, simpler freight scenarios.
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present disclosure provide a method, apparatus, device, and medium for estimating a loading rate based on a monocular camera.
In a first aspect, a method for estimating a loading rate based on a monocular camera is provided in an embodiment of the present disclosure.
Specifically, the monocular camera-based load rate estimation method includes:
acquiring a first image in a warehouse acquired by the monocular camera, and preprocessing the first image to obtain a first depth estimation image of the first image;
mapping the first depth estimation map into a first three-dimensional point cloud;
performing three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud under a world coordinate system;
obtaining a first two-dimensional square matrix based on the second three-dimensional point cloud;
estimating a loading rate of the warehouse based on the first two-dimensional square matrix.
According to an embodiment of the present disclosure, the performing three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud under a world coordinate system includes:
acquiring a warehouse empty image acquired by the monocular camera, and preprocessing the empty image to obtain a second depth estimation image of the empty image;
mapping the second depth estimation map to a third three-dimensional point cloud;
performing point cloud plane segmentation on the third three-dimensional point cloud to obtain normal vectors and corresponding intercepts of each warehouse plane under a camera coordinate system;
Obtaining an origin of the world coordinate system based on the intercept of the warehouse plane and a camera optical center;
obtaining a three-dimensional coordinate vector of the world coordinate system based on the normal vector of the warehouse plane;
and obtaining the second three-dimensional point cloud based on the origin of the world coordinate system and the three-dimensional coordinate vector.
According to an embodiment of the present disclosure, the obtaining a first two-dimensional square matrix based on the second three-dimensional point cloud includes:
projecting the second three-dimensional point cloud onto the XY plane of the world coordinate system to obtain a first two-dimensional square matrix of shape dy × dx, wherein dx is the maximum intercept in the X-axis direction in the second three-dimensional point cloud, and dy is the maximum intercept in the Y-axis direction in the second three-dimensional point cloud;
and when a plurality of points of the second three-dimensional point cloud are projected to the same coordinate point under the XY plane, the point with the largest Z value is projected to the coordinate point under the XY plane.
According to an embodiment of the disclosure, the estimating the loading rate of the warehouse based on the two-dimensional square matrix includes:
carrying out mean value processing on the two-dimensional square matrix;
and obtaining the loading rate based on the two-dimensional square matrix subjected to mean value processing and the intercept dz, wherein the dz is the maximum intercept in the Z-axis direction in the second three-dimensional point cloud.
According to an embodiment of the present disclosure, the preprocessing the first image to obtain a first depth estimation map of the first image includes:
performing de-distortion processing on the first image to obtain a first de-distorted image;
and obtaining a first depth estimation image of the first image based on the first undistorted image and a pre-trained depth estimation model.
According to an embodiment of the present disclosure, further comprising:
calibrating the monocular camera to obtain built-in parameters of the monocular camera;
performing de-distortion processing on the first image based on built-in parameters of the monocular camera to obtain a first de-distorted image;
and mapping the first depth estimation map into a first three-dimensional point cloud based on built-in parameters of the monocular camera.
According to an embodiment of the present disclosure, further comprising:
dividing the warehouse space into blocks in the horizontal direction or the vertical direction;
and estimating the loading rate of the warehouse in each partition to obtain a first gridding loading rate matrix of the warehouse.
In a second aspect, in an embodiment of the present disclosure, a cargo dumping identifying method based on a monocular camera is provided.
Specifically, the goods dumping identification method based on the monocular camera comprises the following steps:
Acquiring a second image and a third image in a warehouse acquired by the monocular camera;
obtaining a second two-dimensional square matrix corresponding to the second image and a third two-dimensional square matrix corresponding to the third image by adopting the monocular camera-based loading rate estimation method according to the first aspect;
and when the difference between the second two-dimensional square matrix and the third two-dimensional square matrix is larger than a first threshold value, judging that goods toppling occurs.
In a third aspect, an embodiment of the present disclosure provides a cargo displacement identification method based on a monocular camera.
Specifically, the goods shift identification method based on the monocular camera comprises the following steps:
acquiring a fourth image and a fifth image in a warehouse acquired by the monocular camera;
obtaining a second gridding loading rate matrix corresponding to the fourth image and a third gridding loading rate matrix corresponding to the fifth image by adopting the loading rate estimation method based on the monocular camera according to the first aspect;
and when the difference between the second gridding loading rate matrix and the third gridding loading rate matrix is larger than a second threshold value, judging that cargo shifting occurs.
According to an embodiment of the present disclosure, further comprising:
Determining a partition with the largest difference between the second gridding loading rate matrix and the third gridding loading rate matrix;
and estimating the starting position and the ending position of the cargo shift based on the blocks with the largest difference.
In a fourth aspect, a monocular camera-based load rate estimation apparatus is provided in an embodiment of the present disclosure.
Specifically, the monocular camera-based load rate estimation device includes:
the first acquisition unit is configured to acquire a first image in a warehouse acquired by the monocular camera, and preprocess the first image to obtain a first depth estimation image of the first image;
a mapping unit configured to map the first depth estimation map to a first three-dimensional point cloud;
the reconstruction unit is configured to perform three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud under a world coordinate system;
a projection unit configured to obtain a first two-dimensional square matrix based on the second three-dimensional point cloud;
an estimation unit configured to estimate the loading rate of the warehouse based on the first two-dimensional square matrix.
According to an embodiment of the present disclosure, the performing three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud under a world coordinate system includes:
Acquiring a warehouse empty image acquired by the monocular camera, and preprocessing the empty image to obtain a second depth estimation image of the empty image;
mapping the second depth estimation map to a third three-dimensional point cloud;
performing point cloud plane segmentation on the third three-dimensional point cloud to obtain normal vectors and corresponding intercepts of each warehouse plane under a camera coordinate system;
obtaining an origin of the world coordinate system based on the intercept of the warehouse plane and a camera optical center;
obtaining a three-dimensional coordinate vector of the world coordinate system based on the normal vector of the warehouse plane;
and obtaining the second three-dimensional point cloud based on the origin of the world coordinate system and the three-dimensional coordinate vector.
According to an embodiment of the present disclosure, the obtaining a first two-dimensional square matrix based on the second three-dimensional point cloud includes:
projecting the second three-dimensional point cloud onto the XY plane of the world coordinate system to obtain a first two-dimensional square matrix of shape dy × dx, wherein dx is the maximum intercept in the X-axis direction in the second three-dimensional point cloud, and dy is the maximum intercept in the Y-axis direction in the second three-dimensional point cloud;
and when a plurality of points of the second three-dimensional point cloud are projected to the same coordinate point under the XY plane, the point with the largest Z value is projected to the coordinate point under the XY plane.
According to an embodiment of the disclosure, the estimating the loading rate of the warehouse based on the two-dimensional square matrix includes:
carrying out mean value processing on the two-dimensional square matrix;
and obtaining the loading rate based on the two-dimensional square matrix subjected to mean value processing and the intercept dz, wherein the dz is the maximum intercept in the Z-axis direction in the second three-dimensional point cloud.
According to an embodiment of the present disclosure, the preprocessing the first image to obtain a first depth estimation map of the first image includes:
performing de-distortion processing on the first image to obtain a first de-distorted image;
and obtaining a first depth estimation image of the first image based on the first undistorted image and a pre-trained depth estimation model.
According to an embodiment of the present disclosure, further comprising:
calibrating the monocular camera to obtain built-in parameters of the monocular camera;
performing de-distortion processing on the first image based on built-in parameters of the monocular camera to obtain a first de-distorted image;
and mapping the first depth estimation map into a first three-dimensional point cloud based on built-in parameters of the monocular camera.
According to an embodiment of the present disclosure, further comprising:
Dividing the warehouse space into blocks in the horizontal direction or the vertical direction;
and estimating the loading rate of the warehouse in each partition to obtain a first gridding loading rate matrix of the warehouse.
In a fifth aspect, in an embodiment of the present disclosure, a cargo dumping identifying device based on a monocular camera is provided.
Specifically, the goods dumping identification device based on the monocular camera comprises:
a second acquisition unit configured to acquire a second image and a third image within the warehouse acquired by the monocular camera;
a first determining unit configured to obtain a second two-dimensional square matrix corresponding to the second image and a third two-dimensional square matrix corresponding to the third image using the monocular camera-based load rate estimation method according to the first aspect;
and a first determination unit configured to determine that dumping of the cargo occurs when a difference between the second two-dimensional square matrix and the third two-dimensional square matrix is greater than a first threshold.
In a sixth aspect, in an embodiment of the present disclosure, there is provided a cargo displacement recognition device based on a monocular camera.
Specifically, the goods shift identification device based on the monocular camera comprises:
a third acquisition unit configured to acquire a fourth image and a fifth image within the warehouse acquired by the monocular camera;
A second determining unit configured to obtain a second gridding loading rate matrix corresponding to the fourth image and a third gridding loading rate matrix corresponding to the fifth image by using the monocular camera-based loading rate estimation method according to the first aspect;
and a second determination unit configured to determine that cargo displacement occurs when a difference between the second and third gridding loading rate matrices is greater than a second threshold.
According to an embodiment of the present disclosure, further comprising:
determining a partition with the largest difference between the second gridding loading rate matrix and the third gridding loading rate matrix;
and estimating the starting position and the ending position of the cargo shift based on the blocks with the largest difference.
In a seventh aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, wherein the memory is configured to store one or more computer instructions that support the data processing apparatus in performing the above monocular camera-based loading rate estimation method, cargo dumping identification method and cargo shift identification method, and the processor is configured to execute the computer instructions stored in the memory. The electronic device may further comprise a communication interface for the data processing apparatus to communicate with other devices or a communication network.
In an eighth aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing computer instructions for use by the data processing apparatus, the computer instructions, when executed, implementing the above monocular camera-based loading rate estimation method, cargo dumping identification method and cargo shift identification method.
According to the technical scheme provided by the embodiment of the disclosure, the real-time loading rate of the warehouse is estimated with a monocular camera by combining camera calibration, image de-distortion, depth estimation, point cloud mapping and three-dimensional reconstruction. The estimation accuracy is high, the hardware and installation costs are far lower than those of traditional dot-matrix laser equipment, and no large amount of real-time data needs to be collected and labeled for model training, so the adaptability is strong.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
Fig. 1 illustrates a flowchart of a monocular camera-based load rate estimation method according to an embodiment of the present disclosure.
Fig. 2 shows a camera installation schematic according to an embodiment of the present disclosure.
Fig. 3 is a flow chart of a monocular camera-based cargo dumping identification method in accordance with an embodiment of the disclosure.
Fig. 4 shows a flowchart of a monocular camera-based cargo displacement identification method according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of a monocular camera-based load rate estimation device according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of a monocular camera-based cargo dumping identification device in accordance with an embodiment of the disclosure.
Fig. 7 shows a block diagram of a monocular camera-based cargo displacement recognition device according to an embodiment of the present disclosure.
Fig. 8 shows a block diagram of an electronic device according to an embodiment of the disclosure.
Fig. 9 shows a schematic diagram of a computer system suitable for use in implementing methods according to embodiments of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. In addition, for the sake of clarity, portions irrelevant to description of the exemplary embodiments are omitted in the drawings.
In this disclosure, it should be understood that terms such as "comprises" or "comprising," etc., are intended to indicate the presence of features, numbers, steps, acts, components, portions, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, portions, or combinations thereof are present or added.
In addition, it should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In the present disclosure, the acquisition of user information or user data is an operation that is authorized, confirmed, or actively selected by the user.
As mentioned above, the common freight volume estimation schemes are currently implemented either with dot-matrix laser equipment or with an image classification method based on an image acquisition device. The former equipment is expensive and costly to install; the latter requires collecting a large amount of real-time data for model training, with high sample acquisition and labeling costs, and the chosen image classification algorithm adapts poorly to different scenes, easily producing many misclassified cases. Therefore, neither of the current solutions is applied at large scale, and they are used only in a small number of fixed, simpler freight scenarios.
In view of this, the embodiment of the present disclosure provides a method of estimating a loading rate based on a monocular camera.
Fig. 1 illustrates a flowchart of a monocular camera-based load rate estimation method according to an embodiment of the present disclosure. As shown in fig. 1, the method for estimating the loading rate based on the monocular camera includes the following steps S101 to S105:
In step S101, a first image in a warehouse acquired by the monocular camera is acquired, and the first image is preprocessed to obtain a first depth estimation image of the first image;
in step S102, mapping the first depth estimation map to a first three-dimensional point cloud;
in step S103, performing three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud under a world coordinate system;
in step S104, a first two-dimensional square matrix is obtained based on the second three-dimensional point cloud;
in step S105, the loading rate of the warehouse is estimated based on the first two-dimensional square matrix.
In the disclosed embodiments, the monocular camera-based loading rate estimation method may be applied to closed or semi-closed warehouses, including but not limited to truck cargo warehouses, container warehouses and storage warehouses. The three-dimensional shape of the warehouse is a cuboid: adjacent planes of the warehouse are perpendicular to each other, and opposite planes are parallel. When closed, the warehouse is a fully enclosed hexahedral cuboid; when open, it is semi-closed, a cuboid with one open face. For ease of illustration and explanation, the following description takes loading rate estimation for a truck cargo warehouse as an example. For a truck cargo warehouse, the end close to the cab is designated the head, the end away from the cab the tail, the side close to the ground the bottom, the side away from the ground the top, and the two lateral parts the sides of the warehouse. Correspondingly, the warehouse surface at the head is the head face, and the other surfaces are the tail face, bottom face, top face and side faces respectively.
In the embodiment of the disclosure, the estimation method may be performed by an estimation device, which may be implemented as software or as a combination of software and hardware, and the estimation device may be disposed in a remote server, a local server, or a mobile terminal, and wirelessly connected to a monocular camera, and receives data collected by the monocular camera, and outputs the processed data to a display device for display.
In the embodiment of the present disclosure, the monocular camera may be an ordinary pinhole camera, a wide-angle camera or a fisheye camera, which is not limited herein. The installation of the monocular camera needs to meet the following conditions: 1) after installation, the camera should be as stable as possible, avoiding deviation or rotation of the equipment caused by vehicle shake, accidental touching and the like; 2) after installation, the camera's image acquisition field of view should cover at least five planes of the warehouse and must cover the entire warehouse plane opposite the camera; 3) the camera should be mounted at least flush against a warehouse plane, and the optimal mounting position is on the intersection line of two warehouse planes; 4) to make each plane in the acquired image as symmetrical as possible, the camera should be mounted on the midline of a warehouse plane or at the midpoint of the intersection line of two warehouse planes; 5) provided that condition 2) is satisfied, the camera should capture as large a view as possible of the planes that do not intersect the camera. Conditions 2) and 3) must be met during installation, while conditions 1), 4) and 5) should be met as far as possible.
Fig. 2 shows a camera installation schematic according to an embodiment of the present disclosure. As shown in fig. 2, the camera is mounted at the midpoint of the intersection line of the tail face and the top face of the warehouse, satisfying mounting conditions 3) and 4) above; θ in fig. 2 is the field angle of the camera, the range within the field angle is the image acquisition range, and all corner points A of the warehouse head face and the other four planes lie within the field angle, satisfying mounting condition 2) above.
In the embodiment of the disclosure, after the first image in the warehouse acquired by the monocular camera is obtained, the first image may first be preprocessed to obtain a first depth estimation map of the first image. The preprocessing may include de-distortion processing and depth estimation processing, where the de-distortion processing corresponds to the camera calibration, and the depth estimation processing may be implemented based on a depth estimation model, including but not limited to unsupervised, semi-supervised or fully supervised, monocular and binocular depth estimation models.
Specifically, the monocular camera may be calibrated first to obtain the built-in parameters of the camera, where the built-in parameters include the intrinsic parameter matrix and the distortion parameters. In the embodiment of the disclosure, camera calibration may be implemented with an existing calibration method through the steps of collecting calibration material, running the calibration algorithm and determining the camera built-in parameters; the existing calibration method may be the Zhang Zhengyou calibration method or another calibration method, which is not limited herein. After the camera is calibrated, the image acquired by the monocular camera is de-distorted based on the built-in parameters. The de-distortion process can be expressed by the formula I = U(I0), where I is the output two-dimensional de-distorted image, U is the de-distortion model, and I0 is the input original two-dimensional image; the de-distortion model may be a polynomial de-distortion model or another model, which is not limited herein. After the de-distorted image is obtained, depth estimation is performed according to the formula D = F(I), where D is the output depth estimation map and F is the depth estimation model. For any point p on the de-distorted image with coordinates [u, v]^T, the same coordinates [u, v]^T on the depth estimation map D hold a value d, where d is the depth distance. With the above method, the preprocessing of the first image is realized and the first depth estimation map of the first image is obtained.
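As an illustration of this preprocessing step only (not taken from the patent itself), the following Python sketch shows one possible way to calibrate the camera with OpenCV, de-distort an image and run depth estimation; depth_model stands for any pre-trained monocular depth estimation model loaded separately (an assumption), and the chessboard pattern size and square length are illustrative values.

```python
# Sketch: camera calibration, de-distortion I = U(I0) and depth estimation D = F(I).
import cv2
import numpy as np

def calibrate_camera(chessboard_images, pattern=(9, 6), square=0.025):
    # Zhang-style calibration from chessboard views; returns intrinsics K and distortion coeffs.
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts = [], []
    for img in chessboard_images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, pattern, None)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
    _, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, gray.shape[::-1], None, None)
    return K, dist

def preprocess(image, K, dist, depth_model):
    # depth_model is a placeholder for a pre-trained monocular depth estimator (assumption).
    undistorted = cv2.undistort(image, K, dist)   # I = U(I0)
    depth = depth_model(undistorted)              # D = F(I), per-pixel depth distances
    return undistorted, depth
```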
In the embodiment of the disclosure, mapping the first depth estimation map to a first three-dimensional point cloud refers to mapping the first depth estimation map to a first three-dimensional point cloud based on the built-in parameters of the camera by using a point cloud mapping technology, wherein the first three-dimensional point cloud is located under a camera coordinate system, an optical axis of the camera is parallel to a Z-axis under the camera coordinate system, an optical center of the camera is an origin under the camera coordinate system, and an XY plane of the camera coordinate system is parallel to a camera imaging plane. The point cloud mapping technology may be any technology capable of converting a depth estimation map into a three-dimensional point cloud, and is not limited herein.
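A minimal sketch of this point cloud mapping, assuming a standard pinhole model with the intrinsic matrix K obtained from calibration: each pixel (u, v) with depth d is lifted to the camera-frame point (x, y, z). The function name and the zero-depth filtering are illustrative choices, not details from the patent.

```python
# Sketch: back-project a depth map into a 3D point cloud in the camera coordinate system.
import numpy as np

def depth_to_point_cloud(depth, K):
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx      # X grows along the image u-axis
    y = (v - cy) * z / fy      # Y grows along the image v-axis
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop pixels without a valid depth
```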
In an embodiment of the present disclosure, performing three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud in a world coordinate system, including: acquiring a warehouse empty image acquired by the monocular camera, and preprocessing the empty image to obtain a second depth estimation image of the empty image; mapping the second depth estimation map to a third three-dimensional point cloud; performing point cloud plane segmentation on the third three-dimensional point cloud to obtain normal vectors and corresponding intercepts of each warehouse plane under a camera coordinate system; obtaining an origin of the world coordinate system based on the intercept of the warehouse plane and a camera optical center; obtaining a three-dimensional coordinate vector of the world coordinate system based on the normal vector of the warehouse plane; and obtaining the second three-dimensional point cloud based on the origin of the world coordinate system and the three-dimensional coordinate vector.
Specifically, when the first three-dimensional point cloud is reconstructed in three dimensions, the normal vector and corresponding intercept of each warehouse plane in the camera coordinate system, obtained from processing the empty-warehouse image, must first be acquired, and the first three-dimensional point cloud in the camera coordinate system is then reconstructed based on these normal vectors and intercepts to obtain the second three-dimensional point cloud in the world coordinate system. Acquiring the normal vector and corresponding intercept of each warehouse plane in the camera coordinate system from the empty-warehouse image comprises the following steps: acquiring the empty-warehouse image captured by the monocular camera, preprocessing the empty image to obtain the second depth estimation map of the empty image, and mapping the second depth estimation map into the third three-dimensional point cloud, where the preprocessing step and the step of mapping the depth estimation map into a three-dimensional point cloud are the same as the preprocessing and mapping methods used for the first image and are not repeated herein; and performing point cloud plane segmentation on the third three-dimensional point cloud to obtain the normal vector and corresponding intercept of each warehouse plane in the camera coordinate system, where the point cloud plane segmentation may be implemented with any one of a random sample consensus (RANSAC) algorithm, a point cloud segmentation algorithm based on proximity information, a segmentation algorithm based on point cloud frequency, a min-cut segmentation algorithm, a supervoxel clustering segmentation algorithm and a convexity-based segmentation algorithm. Taking the RANSAC algorithm as an example, the implementation of point cloud plane segmentation of the third three-dimensional point cloud to obtain the normal vector and corresponding intercept of each warehouse plane in the camera coordinate system is described in detail below.
As described above, the third three-dimensional point cloud is established in the camera coordinate system, in which the optical axis of the camera is parallel to the Z-axis, the optical center of the camera is the origin, and the XY plane is parallel to the camera imaging plane. For the world coordinate system, the embodiment of the disclosure considers the case where the truck is placed horizontally on the ground. In this case the Z-axis of the world coordinate system is perpendicular to the bottom face of the warehouse; the bottom face of the warehouse is the warehouse bottom XY plane and the top face is the warehouse top XY plane. The vertical warehouse face that is perpendicular to the ground and close to the truck cab is denoted the warehouse XZ plane, i.e. the Y-axis of the world coordinate system is perpendicular to the warehouse XZ plane. The inner faces perpendicular to the X-axis of the world coordinate system, located on the two sides of the truck, are denoted the warehouse left YZ plane and the warehouse right YZ plane respectively.
When there is no cargo in the warehouse, plane segmentation of the obtained third three-dimensional point cloud yields, in the camera coordinate system, the normal vectors n_zt, n_zb, n_y, n_xl and n_xr of the warehouse top XY plane, the warehouse bottom XY plane, the warehouse XZ plane, the warehouse left YZ plane and the warehouse right YZ plane, together with the corresponding intercepts (dzt, dzb, dy, dxl, dxr). Here, since the warehouse left YZ plane is parallel to the warehouse right YZ plane and the warehouse top XY plane is parallel to the warehouse bottom XY plane, n_xl and n_xr are the same and are denoted n_x, and n_zt and n_zb are the same and are denoted n_z. The three normal vectors n_x, n_y and n_z are the three-dimensional coordinate vectors of the X-axis, Y-axis and Z-axis of the world coordinate system.
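As one possible realisation of the plane-segmentation step described above, the following sketch uses the RANSAC routine of the Open3D library to extract the dominant warehouse planes from the empty-warehouse point cloud and return their unit normals and intercepts; the plane count and distance threshold are illustrative assumptions rather than values from the text.

```python
# Sketch: iterative RANSAC plane extraction on the empty-warehouse point cloud with Open3D.
import numpy as np
import open3d as o3d

def segment_bin_planes(points, n_planes=5, dist_thresh=0.02):
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    planes = []   # each entry: (unit normal n, intercept) from the fitted plane n.p + d = 0
    for _ in range(n_planes):
        model, inliers = pcd.segment_plane(distance_threshold=dist_thresh,
                                           ransac_n=3, num_iterations=1000)
        a, b, c, d = model
        n = np.array([a, b, c])
        norm = np.linalg.norm(n)
        planes.append((n / norm, d / norm))
        pcd = pcd.select_by_index(inliers, invert=True)  # remove inliers and fit the next plane
    return planes
```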
After the normal vector and corresponding intercept of each warehouse plane in the camera coordinate system are obtained, the coordinate system of the second three-dimensional point cloud is first rotated from the camera coordinate system to the world coordinate system, on the premise that the coordinate origin is still located at the camera optical center, and then, based on the intercepts, the origin of the coordinate system of the second three-dimensional point cloud is translated from the camera optical center to the origin of the world coordinate system. Specifically, if the first three-dimensional point cloud in the camera coordinate system is denoted Pc and the second three-dimensional point cloud in the world coordinate system is denoted Pw, the point cloud Pw can be expressed as Pw = {(x, y, z)^T | 0 ≤ x ≤ dx, 0 ≤ y ≤ dy, 0 ≤ z ≤ dz}. The processes of coordinate system rotation and translation are described below in equation form. Assume that the three warehouse plane normal vectors n_x, n_y, n_z obtained when the warehouse is empty are mutually orthogonal unit vectors; stacked together they form a matrix A and can serve as the basis vectors of a coordinate system. The basis vectors of the rotated coordinate system are (1, 0, 0)^T, (0, 1, 0)^T and (0, 0, 1)^T, stacked here into a matrix B. The rotation matrix of the coordinate system is R = B^-1 A, where B^-1 is the inverse of matrix B. The coordinate system rotation can be described as left-multiplying every point in the three-dimensional point cloud by the rotation matrix R: for a point p in the three-dimensional point cloud with coordinates (a, b, c)^T, its coordinates after rotation are (a', b', c')^T, and this multiplication is expressed as (a', b', c')^T = R (a, b, c)^T. After the translation of the coordinate system, the coordinates of point p are (a'', b'', c'')^T, and the translation can be expressed as (a'', b'', c'')^T = (a', b', c')^T + (dxr, 0, dzb)^T. Finally, the process of transforming the point cloud Pc in the camera coordinate system into the point cloud Pw in the world coordinate system through rotation and translation can be expressed as Pw = R Pc + (dxr, 0, dzb)^T, where dxr is the intercept of the warehouse right YZ plane and dzb is the intercept of the warehouse bottom XY plane.
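The rotation-plus-translation step can be sketched as follows, under the assumption that the three unit normals are stacked as the rows of matrix A, so that with B equal to the identity the rotation matrix is R = B^-1 A = A and Pw = R Pc + (dxr, 0, dzb)^T; the function name and argument layout are illustrative.

```python
# Sketch: transform the camera-frame point cloud Pc into the world-frame point cloud Pw.
import numpy as np

def camera_to_world(points_c, n_x, n_y, n_z, dxr, dzb):
    A = np.stack([n_x, n_y, n_z])    # rows: the three unit normals (world axes in the camera frame)
    B = np.eye(3)                    # basis vectors of the rotated (world) coordinate system
    R = np.linalg.inv(B) @ A         # R = B^-1 A; with B = I this is simply A
    t = np.array([dxr, 0.0, dzb])    # translation moving the origin to the warehouse corner
    return (R @ points_c.T).T + t    # Pw = R Pc + (dxr, 0, dzb)^T for every point
```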
In an embodiment of the present disclosure, the obtaining a first two-dimensional square matrix based on the second three-dimensional point cloud includes: projecting the second three-dimensional point cloud onto the XY plane of the world coordinate system to obtain a first two-dimensional square matrix V of shape dy × dx, where dx is the maximum intercept in the X-axis direction, namely the difference between the maximum and minimum X-coordinates in the second three-dimensional point cloud, and dy is the maximum intercept in the Y-axis direction, namely the difference between the maximum and minimum Y-coordinates in the second three-dimensional point cloud. When a plurality of points of the second three-dimensional point cloud project to the same coordinate point in the XY plane, the point with the largest Z value is projected to that coordinate point.
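A possible sketch of this top-view projection: the world-frame points are rasterised onto the XY plane and, where several points fall into the same cell, only the largest Z value is kept. Coordinates are assumed here to already be expressed in the discretisation unit of the matrix (for example centimetres), which is an assumption not fixed by the text.

```python
# Sketch: build the dy x dx height matrix V by projecting the world-frame point cloud onto XY.
import numpy as np

def height_matrix(points_w, dx, dy):
    V = np.zeros((int(np.ceil(dy)), int(np.ceil(dx))))
    for x, y, z in points_w:
        ix, iy = int(x), int(y)
        if 0 <= ix < V.shape[1] and 0 <= iy < V.shape[0]:
            V[iy, ix] = max(V[iy, ix], z)   # keep the highest point in each cell
    return V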
In an embodiment of the disclosure, the estimating the loading rate of the warehouse based on the first two-dimensional square matrix includes: carrying out mean value processing on the two-dimensional square matrix; and obtaining the loading rate based on the mean-processed two-dimensional square matrix and the intercept dz. In one embodiment of the present disclosure, the two-dimensional square matrix is mean-processed into a matrix M over the X-axis and Y-axis coordinates, whose value ranges run from the minimum to the maximum X-coordinate and from the minimum to the maximum Y-coordinate in the second three-dimensional point cloud, respectively. In one embodiment of the present disclosure, the loading rate VR is then calculated from the mean-processed matrix and dz, where dz is the maximum intercept in the Z-axis direction, namely the difference between the maximum and minimum Z-coordinates, in the second three-dimensional point cloud.
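The exact mean-processing formula is not reproduced in the text above; the sketch below assumes a simple local mean filter over the height matrix and computes the loading rate as the average cell height divided by dz, which is one plausible reading of the description rather than the patented formula.

```python
# Sketch: mean-process the height matrix V and derive a loading rate VR in [0, 1].
import numpy as np
from scipy.ndimage import uniform_filter

def loading_rate(V, dz, kernel=5):
    M = uniform_filter(V, size=kernel)   # mean-processed two-dimensional square matrix (assumed filter)
    return float(M.mean() / dz)          # average filled height relative to the warehouse height dz
```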
According to the technical scheme provided by the embodiment of the disclosure, the real-time loading rate of the warehouse is estimated with a monocular camera by combining camera calibration, image de-distortion, depth estimation, point cloud mapping and three-dimensional reconstruction. The estimation accuracy is high, the hardware and installation costs are far lower than those of traditional dot-matrix laser equipment, and no large amount of real-time data needs to be collected and labeled for model training, so the adaptability is strong.
The above estimates the overall loading rate of the warehouse. In practical applications, the warehouse space can also be divided into blocks in the horizontal or vertical direction, and the gridded loading rate of the warehouse can then be estimated. In an embodiment of the present disclosure, the monocular camera-based loading rate estimation method may further include: dividing the warehouse space into blocks in the horizontal direction or the vertical direction; and estimating the loading rate of the warehouse in each block to obtain a first gridding loading rate matrix of the warehouse. Specifically, taking horizontal blocking as an example, the top view of the warehouse is first divided into M×N equal blocks, where M is the number of columns and N is the number of rows after blocking; the block spaces are equal, and the whole warehouse space consists of M×N columnar block spaces. On this basis, the loading rate estimation method described above can estimate the loading rate of each columnar space in real time after goods are loaded into the warehouse.
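A sketch of the gridded loading rate under the horizontal blocking just described: the height matrix is split into M×N equal blocks and a per-block loading rate is computed, giving the gridding loading rate matrix. The block counts below are illustrative defaults, not values from the text.

```python
# Sketch: per-block loading rates over an M x N grid of the top view.
import numpy as np

def gridded_loading_rate(V, dz, M=4, N=3):
    H, W = V.shape
    grid = np.zeros((N, M))
    for i in range(N):
        for j in range(M):
            block = V[i * H // N:(i + 1) * H // N, j * W // M:(j + 1) * W // M]
            grid[i, j] = block.mean() / dz   # loading rate of this columnar block space
    return grid
```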
Fig. 3 illustrates a flowchart of a monocular camera-based cargo dumping identification method in accordance with an embodiment of the disclosure. As shown in fig. 3, the goods dumping identification method based on the monocular camera includes the following steps S301 to S303:
In step S301, a second image and a third image in the warehouse acquired by the monocular camera are acquired;
in step S302, a second two-dimensional square matrix corresponding to the second image and a third two-dimensional square matrix corresponding to the third image are obtained by adopting a monocular camera-based loading rate estimation method as shown in fig. 1;
in step S303, when the difference between the second two-dimensional square matrix and the third two-dimensional square matrix is greater than a first threshold, it is determined that cargo toppling occurs.
In this embodiment of the present disclosure, the second image and the third image may be different frames captured at consecutive times. The first threshold may be determined according to the two-dimensional square matrix obtained when the warehouse is fully loaded; if the sum of all elements of the fully loaded two-dimensional square matrix is S, the first threshold may range from 0.05*S to 0.1*S.
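A minimal sketch of this dumping check, assuming the full-load element sum S is known and the first threshold is taken from the stated 0.05*S to 0.1*S range; comparing the summed absolute difference of the two matrices is one plausible reading of "difference" here.

```python
# Sketch: flag cargo dumping when two consecutive height matrices differ by more than the first threshold.
import numpy as np

def detect_dumping(V2, V3, S, ratio=0.05):
    threshold = ratio * S                 # first threshold, chosen within 0.05*S .. 0.1*S
    return np.abs(V2 - V3).sum() > threshold
```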
According to the technical scheme provided by the embodiment of the disclosure, whether cargo dumping occurs is determined by comparing the two-dimensional square matrices corresponding to different frames captured at consecutive times; the implementation is simple, the cost is low, the judgment accuracy is high, and the adaptability is strong.
Fig. 4 shows a flowchart of a monocular camera-based cargo displacement identification method according to an embodiment of the present disclosure. As shown in fig. 4, the goods shift identification method based on the monocular camera includes the following steps S401 to S403:
In step S401, a fourth image and a fifth image in the warehouse acquired by the monocular camera are acquired;
in step S402, a second gridding loading rate matrix corresponding to the fourth image and a third gridding loading rate matrix corresponding to the fifth image are obtained by adopting a loading rate estimation method based on a monocular camera as shown in fig. 1;
in step S403, it is determined that cargo shifting occurs when the difference between the second and third gridding loading rate matrices is greater than a second threshold.
In the embodiment of the present disclosure, the second threshold may be determined according to the size of the gridding loading rate matrix; if the width of the gridding loading rate matrix is W and its height is H, the second threshold may range from 1/(W×H) to 2/(W×H). In an embodiment of the present disclosure, the cargo shift identification method may further include: determining the block with the largest difference between the second gridding loading rate matrix and the third gridding loading rate matrix; and estimating the start position and the end position of the cargo shift based on the block with the largest difference. Specifically, assuming that the fourth image is captured before the fifth image, the block with the largest difference is first determined; the position corresponding to that block in the second gridding loading rate matrix is taken as the start position of the cargo shift, and the position corresponding to that block in the third gridding loading rate matrix is taken as the end position of the cargo shift.
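A sketch of the shift check and start/end estimation, assuming the second threshold is taken from the stated 1/(W×H) to 2/(W×H) range and interpreting the block that loses load in the earlier matrix as the start position and the block that gains load in the later matrix as the end position; this interpretation and the function interface are assumptions for illustration.

```python
# Sketch: detect cargo shift from two gridding loading rate matrices and locate start/end blocks.
import numpy as np

def detect_shift(G4, G5, threshold=None):
    if threshold is None:
        H, W = G4.shape
        threshold = 1.0 / (W * H)          # second threshold, chosen within 1/(W*H) .. 2/(W*H)
    diff = G5 - G4
    if np.abs(diff).max() <= threshold:
        return None                        # no shift detected
    start = np.unravel_index(np.argmin(diff), diff.shape)  # block that lost load (earlier position)
    end = np.unravel_index(np.argmax(diff), diff.shape)    # block that gained load (later position)
    return start, end
```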
According to the technical scheme provided by the embodiment of the disclosure, whether cargo has shifted is determined by comparing the gridding loading rate matrices corresponding to two images acquired by the monocular camera, and the start and end positions of the cargo shift are estimated based on these matrices.
Fig. 5 shows a block diagram of a monocular camera-based load rate estimation device according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both.
As shown in fig. 5, the monocular camera-based loading rate estimating apparatus 500 includes:
a first obtaining unit 501, configured to obtain a first image in a warehouse acquired by the monocular camera, and perform preprocessing on the first image to obtain a first depth estimation map of the first image;
a mapping unit 502 configured to map the first depth estimation map to a first three-dimensional point cloud;
a reconstruction unit 503 configured to perform three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud in a world coordinate system;
A projection unit 504 configured to obtain a first two-dimensional square matrix based on the second three-dimensional point cloud;
an estimation unit 505 configured to estimate the loading rate of the warehouse based on the first two-dimensional square matrix.
In the embodiment of the disclosure, after the first image in the warehouse acquired by the monocular camera is obtained, the first image may first be preprocessed to obtain a first depth estimation map of the first image. The preprocessing may include de-distortion processing and depth estimation processing, where the de-distortion processing corresponds to the camera calibration, and the depth estimation processing may be implemented based on a depth estimation model, including but not limited to unsupervised, semi-supervised or fully supervised, monocular and binocular depth estimation models.
Specifically, the monocular camera may be calibrated first to obtain the built-in parameters of the camera, where the built-in parameters include the intrinsic parameter matrix and the distortion parameters. In the embodiment of the disclosure, camera calibration may be implemented with an existing calibration method through the steps of collecting calibration material, running the calibration algorithm and determining the camera built-in parameters; the existing calibration method may be the Zhang Zhengyou calibration method or another calibration method, which is not limited herein. After the camera is calibrated, the image acquired by the monocular camera is de-distorted based on the built-in parameters, where the de-distortion process can be expressed by the formula I = U(I0), I is the output two-dimensional de-distorted image, U is the de-distortion model, and I0 is the input original two-dimensional image; the de-distortion model may be a polynomial de-distortion model or another model, which is not limited herein. After the de-distorted image is obtained, depth estimation is performed according to the formula D = F(I), where D is the output depth estimation map and F is the depth estimation model. For any point p on the de-distorted image with coordinates [u, v]^T, the same coordinates [u, v]^T on the depth estimation map D hold a value d, where d is the depth distance. With the above method, the preprocessing of the first image is realized and the first depth estimation map of the first image is obtained.
In the embodiment of the disclosure, mapping the first depth estimation map to a first three-dimensional point cloud refers to mapping the first depth estimation map to a first three-dimensional point cloud based on the built-in parameters of the camera by using a point cloud mapping technology, wherein the first three-dimensional point cloud is located under a camera coordinate system, an optical axis of the camera is parallel to a Z-axis under the camera coordinate system, an optical center of the camera is an origin under the camera coordinate system, and an XY plane of the camera coordinate system is parallel to a camera imaging plane. The point cloud mapping technology may be any technology capable of converting a depth estimation map into a three-dimensional point cloud, and is not limited herein.
In an embodiment of the present disclosure, performing three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud in a world coordinate system, including: acquiring a warehouse empty image acquired by the monocular camera, and preprocessing the empty image to obtain a second depth estimation image of the empty image; mapping the second depth estimation map to a third three-dimensional point cloud; performing point cloud plane segmentation on the third three-dimensional point cloud to obtain normal vectors and corresponding intercepts of each warehouse plane under a camera coordinate system; obtaining an origin of the world coordinate system based on the intercept of the warehouse plane and a camera optical center; obtaining a three-dimensional coordinate vector of the world coordinate system based on the normal vector of the warehouse plane; and obtaining the second three-dimensional point cloud based on the origin of the world coordinate system and the three-dimensional coordinate vector.
Specifically, when the first three-dimensional point cloud is reconstructed in three dimensions, the normal vector and corresponding intercept of each warehouse plane in the camera coordinate system, obtained from processing the empty-warehouse image, must first be acquired, and the first three-dimensional point cloud in the camera coordinate system is then reconstructed based on these normal vectors and intercepts to obtain the second three-dimensional point cloud in the world coordinate system. Acquiring the normal vector and corresponding intercept of each warehouse plane in the camera coordinate system from the empty-warehouse image comprises the following steps: acquiring the empty-warehouse image captured by the monocular camera, preprocessing the empty image to obtain the second depth estimation map of the empty image, and mapping the second depth estimation map into the third three-dimensional point cloud, where the preprocessing step and the step of mapping the depth estimation map into a three-dimensional point cloud are the same as the preprocessing and mapping methods used for the first image and are not repeated herein; and performing point cloud plane segmentation on the third three-dimensional point cloud to obtain the normal vector and corresponding intercept of each warehouse plane in the camera coordinate system, where the point cloud plane segmentation may be implemented with any one of a random sample consensus (RANSAC) algorithm, a point cloud segmentation algorithm based on proximity information, a segmentation algorithm based on point cloud frequency, a min-cut segmentation algorithm, a supervoxel clustering segmentation algorithm and a convexity-based segmentation algorithm. Taking the RANSAC algorithm as an example, the implementation of point cloud plane segmentation of the third three-dimensional point cloud to obtain the normal vector and corresponding intercept of each warehouse plane in the camera coordinate system is described in detail below.
As described above, the third three-dimensional point cloud is established in the camera coordinate system, in which the optical axis of the camera is parallel to the Z-axis, the optical center of the camera is the origin, and the XY plane is parallel to the camera imaging plane. For the world coordinate system, the embodiment of the disclosure considers the case where the truck is placed horizontally on the ground. In this case the Z-axis of the world coordinate system is perpendicular to the bottom face of the warehouse; the bottom face of the warehouse is the warehouse bottom XY plane and the top face is the warehouse top XY plane. The vertical warehouse face that is perpendicular to the ground and close to the truck cab is denoted the warehouse XZ plane, i.e. the Y-axis of the world coordinate system is perpendicular to the warehouse XZ plane. The inner faces perpendicular to the X-axis of the world coordinate system, located on the two sides of the truck, are denoted the warehouse left YZ plane and the warehouse right YZ plane respectively.
Under the condition that no cargo exists in the cargo warehouse, plane segmentation may be performed on the obtained third three-dimensional point cloud to obtain, under the camera coordinate system, the normal vectors of the cargo warehouse top XY plane, the cargo warehouse bottom XY plane, the cargo warehouse XZ plane, the left cargo warehouse YZ plane and the right cargo warehouse YZ plane, together with the corresponding intercepts (dzt, dzb, dy, dxl, dxr). Since the left cargo warehouse YZ plane is parallel to the right cargo warehouse YZ plane, their normal vectors are the same and may be denoted n_x; since the cargo warehouse top XY plane is parallel to the cargo warehouse bottom XY plane, their normal vectors are the same and may be denoted n_z; the normal vector of the cargo warehouse XZ plane may be denoted n_y. These three normal vectors are the three-dimensional coordinate vectors of the X axis, the Y axis and the Z axis of the world coordinate system.
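For illustration, the following is a minimal numpy sketch of a RANSAC-style plane segmentation of the empty-warehouse point cloud: candidate planes n·p = d are repeatedly fitted, and the five warehouse planes are peeled off one by one. The function names, iteration count and distance threshold are assumptions made for this sketch, not values taken from the disclosure.

```python
import numpy as np

def ransac_plane(points, iters=500, dist_thresh=0.02, rng=np.random.default_rng(0)):
    """Fit one dominant plane n . p = d to an (N, 3) point cloud with RANSAC.

    Returns the unit normal n, the intercept d (camera frame) and the inlier mask.
    Illustrative sketch only; the thresholds are assumptions.
    """
    best_inliers = np.zeros(len(points), dtype=bool)
    best_n, best_d = None, None
    for _ in range(iters):
        # Sample three points and derive a candidate plane from them.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample, try again
            continue
        n = n / norm
        d = np.dot(n, p0)
        inliers = np.abs(points @ n - d) < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_n, best_d = inliers, n, d
    return best_n, best_d, best_inliers

def segment_warehouse_planes(cloud, num_planes=5):
    """Peel off the five warehouse planes (top, bottom, front, left, right) in turn."""
    remaining = np.asarray(cloud).copy()
    planes = []
    for _ in range(num_planes):
        n, d, inliers = ransac_plane(remaining)
        planes.append((n, d))
        remaining = remaining[~inliers]   # remove the plane just found, then repeat
    return planes
```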
After the normal vector and the corresponding intercept of each warehouse plane under the camera coordinate system are obtained, the coordinate system in which the point cloud is located is first rotated from the camera coordinate system to the world coordinate system on the premise that the origin of the coordinate system remains at the camera optical center; the origin of the coordinate system is then translated from the camera optical center to the origin of the world coordinate system based on the intercepts, thereby obtaining the second three-dimensional point cloud.
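This rotate-then-translate step can be written compactly. The sketch below assumes the three plane normals n_x, n_y, n_z are unit length and mutually orthogonal, and that the world origin has already been expressed in camera coordinates from the intercepts and the camera optical center as described above; it is an illustrative sketch, not the disclosure's exact implementation.

```python
import numpy as np

def camera_to_world(points_cam, n_x, n_y, n_z, origin_cam):
    """Rotate, then translate, an (N, 3) point cloud from the camera frame to the world frame.

    n_x, n_y, n_z: unit normals of the warehouse planes, used as the world axes
                   expressed in camera coordinates (assumed mutually orthogonal).
    origin_cam:    world origin expressed in camera coordinates.
    """
    R = np.stack([n_x, n_y, n_z])   # rows are the world axes -> world-from-camera rotation
    rotated = points_cam @ R.T      # rotate while keeping the origin at the optical center
    return rotated - R @ origin_cam # then shift the origin to the world origin
```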
In an embodiment of the present disclosure, the obtaining a first two-dimensional square matrix based on the second three-dimensional point cloud includes: projecting the second three-dimensional point cloud onto the XY plane of the world coordinate system to obtain a first two-dimensional square matrix V with a shape of dy × dx, wherein dx is the maximum intercept in the X-axis direction, namely the difference between the maximum value and the minimum value of the X coordinate in the second three-dimensional point cloud, and dy is the maximum intercept in the Y-axis direction, namely the difference between the maximum value and the minimum value of the Y coordinate in the second three-dimensional point cloud. When a plurality of points of the second three-dimensional point cloud are projected to the same coordinate point on the XY plane, the point with the largest Z value is the one projected to that coordinate point.
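A possible implementation of this max-Z projection is sketched below. The discretisation resolution is an assumption introduced for the sketch; the disclosure only specifies the resulting dy × dx shape.

```python
import numpy as np

def project_to_height_map(points_world, resolution=1.0):
    """Project a world-frame (N, 3) point cloud onto the XY plane, keeping the
    largest Z value whenever several points fall on the same XY cell."""
    xyz = np.asarray(points_world)
    x = ((xyz[:, 0] - xyz[:, 0].min()) * resolution).astype(int)
    y = ((xyz[:, 1] - xyz[:, 1].min()) * resolution).astype(int)
    z = xyz[:, 2] - xyz[:, 2].min()
    V = np.zeros((y.max() + 1, x.max() + 1))   # shape dy * dx
    np.maximum.at(V, (y, x), z)                # keep the maximum Z per cell
    return V
```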
In an embodiment of the disclosure, the estimating the loading rate of the warehouse based on the first two-dimensional square matrix includes: carrying out mean value processing on the two-dimensional square matrix; and obtaining the loading rate based on the two-dimensional square matrix after the mean value processing and the intercept dz. In one embodiment of the present disclosure, the mean value processing may be carried out based on the formula M = (1/(dx*dy)) Σ_(x,y) V(x, y), where M is the mean value of the two-dimensional square matrix, and x and y are respectively the X-axis coordinate and the Y-axis coordinate, whose value ranges are respectively from the minimum value to the maximum value of the X coordinate and from the minimum value to the maximum value of the Y coordinate in the second three-dimensional point cloud. In one embodiment of the present disclosure, the loading rate may be calculated based on the formula VR = M / dz, where VR is the loading rate and dz is the maximum intercept in the Z-axis direction, namely the difference between the maximum value and the minimum value of the Z coordinate, in the second three-dimensional point cloud.
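Read together, the mean processing and the loading-rate formula amount to dividing the average stack height by the warehouse height. The sketch below encodes that reading and should be treated as an assumed reconstruction rather than the disclosure's exact formulas.

```python
import numpy as np

def loading_rate(V, dz):
    """Average the height map V over all XY cells and normalise by the hold height dz."""
    M = V.mean()     # mean of the dy * dx matrix over all x, y cells
    VR = M / dz      # fraction of the usable height that is occupied
    return VR
```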
According to the technical scheme provided by the embodiments of the present disclosure, the real-time loading rate of the warehouse is estimated with a single monocular camera by combining camera calibration, image de-distortion, depth estimation, point cloud mapping and three-dimensional reconstruction. The estimation accuracy is high, the hardware and installation costs are far lower than those of conventional dot-matrix laser equipment, and no large amount of real data needs to be collected and labeled for model training, so the adaptability is strong.
The above describes estimating the overall loading rate of the warehouse. In practical applications, the warehouse space may also be divided into blocks in the horizontal or vertical direction, and a gridded loading rate of the warehouse may then be estimated. In an embodiment of the present disclosure, the monocular camera-based loading rate estimation method may further include: dividing the warehouse space into blocks in the horizontal direction or the vertical direction; and estimating the loading rate of the warehouse in each block to obtain a first gridding loading rate matrix of the warehouse. Specifically, taking horizontal blocking as an example, the top view of the warehouse is first divided into M*N equal blocks, where M is the number of columns and N is the number of rows after blocking; the blocks occupy equal space, and the whole warehouse space consists of the M*N columnar block spaces. On this basis, after goods are loaded into the warehouse, the loading rate of each columnar space can be estimated in real time by the loading rate estimation method described above.
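As a sketch, the gridded loading rate can be obtained by splitting the top-view height map into M*N blocks and applying the same mean/dz normalisation per block. The handling of ragged block edges below is an assumption; the disclosure only states that the blocks are of equal proportion.

```python
import numpy as np

def gridded_loading_rate(V, dz, M_cols, N_rows):
    """Estimate one loading rate per block of an M_cols * N_rows grid over the
    top-view height map V (assumes V is at least N_rows x M_cols cells)."""
    H, W = V.shape
    grid = np.zeros((N_rows, M_cols))
    for r in range(N_rows):
        for c in range(M_cols):
            block = V[r * H // N_rows:(r + 1) * H // N_rows,
                      c * W // M_cols:(c + 1) * W // M_cols]
            grid[r, c] = block.mean() / dz   # per-block average height over hold height
    return grid
```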
Fig. 6 shows a block diagram of a monocular camera-based cargo dumping identification device in accordance with an embodiment of the disclosure.
The apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both.
As shown in fig. 6, the goods dumping identification device 600 based on the monocular camera includes:
a second acquiring unit 601 configured to acquire a second image and a third image in the warehouse acquired by the monocular camera;
a first determining unit 602 configured to obtain a second two-dimensional square matrix corresponding to the second image and a third two-dimensional square matrix corresponding to the third image by using the monocular camera-based load rate estimation method as shown in fig. 1;
a first determination unit 603 configured to determine that dumping of cargo occurs when a difference between the second two-dimensional square matrix and the third two-dimensional square matrix is greater than a first threshold.
According to the technical scheme provided by the embodiments of the present disclosure, whether goods have toppled is determined by comparing the two-dimensional square matrix information corresponding to different frame images at consecutive times; the implementation is simple, the cost is low, the judgment accuracy is high, and the adaptability is strong.
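A minimal sketch of this dumping check follows; using a mean absolute difference as the comparison metric is an assumption, since the disclosure only requires the difference between the two matrices to exceed a first threshold.

```python
import numpy as np

def detect_dumping(V2, V3, first_threshold):
    """Compare the height maps of two frames; a large change suggests toppled cargo."""
    diff = np.abs(V2 - V3).mean()   # assumed metric for the 'difference' between matrices
    return diff > first_threshold
```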
Fig. 7 shows a block diagram of a monocular camera-based cargo displacement recognition device according to an embodiment of the present disclosure.
The apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both.
As shown in fig. 7, the goods-shift-identifying device 700 based on a monocular camera includes:
a third acquiring unit 701 configured to acquire a fourth image and a fifth image in the warehouse acquired by the monocular camera;
a second determining unit 702 configured to obtain a second gridding loading rate matrix corresponding to the fourth image and a third gridding loading rate matrix corresponding to the fifth image by using a monocular camera-based loading rate estimation method as shown in fig. 1;
a second determination unit 703 configured to determine that cargo displacement occurs when a difference between the second and third gridding loading rate matrices is greater than a second threshold.
In an embodiment of the present disclosure, the cargo shift identification method may further include: determining the block with the largest difference between the second gridding loading rate matrix and the third gridding loading rate matrix; and estimating the start position and the end position of the cargo shift based on the block with the largest difference. Specifically, assuming that the shooting time of the fourth image precedes that of the fifth image, the block with the largest difference may first be determined; the position corresponding to that block in the second gridding loading rate matrix is then determined to be the start position of the cargo shift, and the position corresponding to that block in the third gridding loading rate matrix is determined to be the end position of the cargo shift.
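A minimal sketch of the shift check and the start/end estimation follows. Interpreting the "block with the largest difference" as the block whose loading rate decreased most in the earlier matrix (start) and the block whose loading rate increased most in the later matrix (end) is an assumed reading of the description above, as is the mean-absolute-difference test.

```python
import numpy as np

def locate_shift(grid_earlier, grid_later, second_threshold):
    """Flag cargo displacement from two gridded loading-rate matrices and
    estimate the start/end blocks of the shift."""
    delta = grid_later - grid_earlier
    shifted = np.abs(delta).mean() > second_threshold
    start_block = np.unravel_index(delta.argmin(), delta.shape)  # rate dropped most
    end_block = np.unravel_index(delta.argmax(), delta.shape)    # rate rose most
    return shifted, start_block, end_block
```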
According to the technical scheme provided by the embodiments of the present disclosure, whether the goods have shifted is determined by comparing the gridding loading rate matrices corresponding to two images acquired by the monocular camera, and the start position and the end position of the cargo shift are estimated from the same gridding loading rate matrices.
The present disclosure also discloses an electronic device, and fig. 8 shows a block diagram of the electronic device according to an embodiment of the present disclosure.
As shown in fig. 8, the electronic device includes a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement a method in accordance with an embodiment of the present disclosure.
Fig. 9 shows a schematic diagram of a computer system suitable for use in implementing methods according to embodiments of the present disclosure.
As shown in fig. 9, the computer system includes a processing unit that can execute various processes in the above-described embodiments in accordance with a program stored in a Read Only Memory (ROM) or a program loaded from a storage section into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the computer system are also stored. The processing unit, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory is mounted on the drive as needed, so that a computer program read therefrom is installed into the storage section as needed. The processing unit may be implemented as a CPU, GPU, TPU, FPGA, NPU, or the like.
In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising computer instructions which, when executed by a processor, implement the method steps described above. In such embodiments, the computer program product may be downloaded and installed from a network via a communications portion, and/or installed from a removable medium.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules referred to in the embodiments of the present disclosure may be implemented in software or in programmable hardware. The units or modules described may also be provided in a processor, the names of which in some cases do not constitute a limitation of the unit or module itself.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or the computer system in the above-described embodiments; or may be a computer-readable storage medium, alone, that is not assembled into a device. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention referred to in this disclosure is not limited to the specific combinations of the features described above, and also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example technical solutions formed by substituting the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Claims (16)

1. A monocular camera-based load rate estimation method, comprising:
Acquiring a first image in a warehouse acquired by the monocular camera, and preprocessing the first image to obtain a first depth estimation image of the first image;
mapping the first depth estimation map into a first three-dimensional point cloud;
performing three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud under a world coordinate system;
obtaining a first two-dimensional square matrix based on the second three-dimensional point cloud;
estimating a loading rate of the warehouse based on the first two-dimensional square matrix.
2. The method of claim 1, wherein the performing three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud in a world coordinate system comprises:
acquiring a warehouse empty image acquired by the monocular camera, and preprocessing the empty image to obtain a second depth estimation image of the empty image;
mapping the second depth estimation map to a third three-dimensional point cloud;
performing point cloud plane segmentation on the third three-dimensional point cloud to obtain normal vectors and corresponding intercepts of each warehouse plane under a camera coordinate system;
obtaining an origin of the world coordinate system based on the intercept of the warehouse plane and a camera optical center;
obtaining a three-dimensional coordinate vector of the world coordinate system based on the normal vector of the warehouse plane;
And obtaining the second three-dimensional point cloud based on the origin of the world coordinate system and the three-dimensional coordinate vector.
3. The method of claim 1, wherein the deriving a first two-dimensional square matrix based on the second three-dimensional point cloud comprises:
projecting the second three-dimensional point cloud to an XY plane of a world coordinate system to obtain a first two-dimensional square matrix V with a shape of dy*dx, wherein dx is the maximum intercept in the X-axis direction in the second three-dimensional point cloud, and dy is the maximum intercept in the Y-axis direction in the second three-dimensional point cloud;
and when a plurality of points of the second three-dimensional point cloud are projected to the same coordinate point under the XY plane, the point with the largest Z value is projected to the coordinate point under the XY plane.
4. The method of claim 1, wherein the estimating the loading rate of the warehouse based on the first two-dimensional square matrix comprises:
carrying out mean value processing on the two-dimensional square matrix;
and obtaining the loading rate based on the two-dimensional square matrix subjected to mean value processing and the intercept dz, wherein the dz is the maximum intercept in the Z-axis direction in the second three-dimensional point cloud.
5. The method of claim 1, wherein the preprocessing the first image to obtain a first depth estimation map of the first image comprises:
Performing de-distortion treatment on the first image to obtain a first de-distorted image;
and obtaining a first depth estimation image of the first image based on the first undistorted image and a pre-trained depth estimation model.
6. The method of claim 5, further comprising:
calibrating the monocular camera to obtain built-in parameters of the monocular camera;
performing de-distortion processing on the first image based on built-in parameters of the monocular camera to obtain a first de-distorted image;
and mapping the first depth estimation map into a first three-dimensional point cloud based on built-in parameters of the monocular camera.
7. The method of any of claims 1-6, further comprising:
dividing the warehouse space into blocks in the horizontal direction or the vertical direction;
and estimating the loading rate of the warehouse in each partition to obtain a first gridding loading rate matrix of the warehouse.
8. A method for identifying cargo dumping based on a monocular camera, comprising:
acquiring a second image and a third image in a warehouse acquired by the monocular camera;
obtaining a second two-dimensional square matrix corresponding to the second image and a third two-dimensional square matrix corresponding to the third image by adopting the method as claimed in any one of claims 1 to 6;
And when the difference between the second two-dimensional square matrix and the third two-dimensional square matrix is larger than a first threshold value, judging that goods toppling occurs.
9. A monocular camera-based cargo displacement identification method, comprising:
acquiring a fourth image and a fifth image in a warehouse acquired by the monocular camera;
obtaining a second gridding loading rate matrix corresponding to the fourth image and a third gridding loading rate matrix corresponding to the fifth image by adopting the method of claim 7;
and when the difference between the second gridding loading rate matrix and the third gridding loading rate matrix is larger than a second threshold value, judging that cargo shifting occurs.
10. The method of claim 9, further comprising:
determining a partition with the largest difference between the second gridding loading rate matrix and the third gridding loading rate matrix;
and estimating the starting position and the ending position of the cargo shift based on the blocks with the largest difference.
11. A monocular camera-based load rate estimation device, comprising:
the first acquisition unit is configured to acquire a first image in a warehouse acquired by the monocular camera, and preprocess the first image to obtain a first depth estimation image of the first image;
A mapping unit configured to map the first depth estimation map to a first three-dimensional point cloud;
the reconstruction unit is configured to perform three-dimensional reconstruction on the first three-dimensional point cloud to obtain a second three-dimensional point cloud under a world coordinate system;
a projection unit configured to obtain a first two-dimensional square matrix based on the second three-dimensional point cloud;
an estimation unit configured to estimate the loading rate of the warehouse based on the first two-dimensional square matrix.
12. The apparatus of claim 11, further comprising:
a blocking unit configured to divide the warehouse space into blocks in the horizontal direction or the vertical direction;
wherein the estimation unit is further configured to estimate the loading rate of the warehouse in each block to obtain a first gridding loading rate matrix of the warehouse.
13. A monocular camera-based cargo dumping identification device, comprising:
a second acquisition unit configured to acquire a second image and a third image within the warehouse acquired by the monocular camera;
a first determining unit configured to obtain a second two-dimensional square matrix corresponding to the second image and a third two-dimensional square matrix corresponding to the third image by using the method according to any one of claims 1 to 6;
and a first determination unit configured to determine that dumping of the cargo occurs when a difference between the second two-dimensional square matrix and the third two-dimensional square matrix is greater than a first threshold.
14. A monocular camera-based cargo displacement identification device, comprising:
a third acquisition unit configured to acquire a fourth image and a fifth image within the warehouse acquired by the monocular camera;
a second determining unit configured to obtain a second gridding loading rate matrix corresponding to the fourth image and a third gridding loading rate matrix corresponding to the fifth image by using the method as set forth in claim 7;
and a second determination unit configured to determine that cargo displacement occurs when a difference between the second and third gridding loading rate matrices is greater than a second threshold.
15. An electronic device includes a memory and a processor; wherein the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of any of claims 1-10.
16. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method steps of any of claims 1-10.