CN111028283B - Image detection method, device, equipment and readable storage medium


Info

Publication number
CN111028283B
Authority
CN (China)
Prior art keywords
image, points, determining, key points, position information
Prior art date
2019-12-11
Legal status
Active
Application number
CN201911266589.9A
Other languages
Chinese (zh)
Other versions
CN111028283A
Inventor
孙伟 (Sun Wei)
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
2019-12-11
Filing date
2019-12-11
Application filed by Beijing Megvii Technology Co Ltd; priority to CN201911266589.9A
Publication of CN111028283A: 2020-04-17
Application granted; publication of CN111028283B: 2024-01-12

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides an image detection method, an image detection device, and a computer-readable storage medium, wherein the method comprises the following steps: acquiring an original image comprising an object to be detected and a reference image comprising the outer surface of the object to be detected, wherein the original image comprises a color image and a depth image; determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image; determining the position coordinates of key points of the object to be detected according to the bounding box; and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image. In this method, by introducing the reference image, the position information of the object to be detected is accurately detected.

Description

Image detection method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image detection method, apparatus, device, and readable storage medium.
Background
In the prior art, when a stack of boxes is piled together, the positions of the boxes need to be known, but existing approaches to detecting box positions suffer from low accuracy, high cost, and poor flexibility.
Disclosure of Invention
To address the defects of existing approaches, the present application provides an image detection method, an image detection device, image detection equipment, and a computer-readable storage medium, which are used to solve the problem of how to accurately detect the position information of an object to be detected.
In a first aspect, the present application provides an image detection method, including:
acquiring an original image comprising an object to be measured and a reference image comprising an outer surface of the object to be measured, wherein the original image comprises a color image and a depth image;
determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image;
determining position coordinates of key points of the object to be detected according to the bounding box;
and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
In a second aspect, the present application provides an image detection apparatus, comprising:
the first processing module is used for acquiring an original image comprising an object to be detected and a reference image comprising the outer surface of the object to be detected, wherein the original image comprises a color image and a depth image;
the second processing module is used for determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image;
the third processing module is used for determining the position coordinates of the key points of the object to be detected according to the bounding box;
and the fourth processing module is used for determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
In a third aspect, the present application provides an electronic device, including: a processor, a memory, and a bus;
a bus for connecting the processor and the memory;
a memory for storing operation instructions;
and the processor is used for executing the image detection method of the first aspect of the application by calling the operation instruction.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program for executing the image detection method of the first aspect of the present application.
The technical scheme provided by the embodiment of the application has at least the following beneficial effects:
acquiring an original image comprising an object to be measured and a reference image comprising an outer surface of the object to be measured, wherein the original image comprises a color image and a depth image; determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image; determining position coordinates of key points of the object to be detected according to the bounding box; and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image. Thus, by introducing the reference image, the position information of the object to be detected is accurately detected.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flow chart of an image detection method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of acquiring a bounding box according to an embodiment of the present application;
Fig. 3 is a schematic diagram of scaling a first image extracted from a bounding box to a first size according to an embodiment of the present application;
Fig. 4 is a schematic diagram of acquiring position coordinates of a key point according to an embodiment of the present application;
Fig. 5 is a schematic diagram of four key points and all points in a quadrilateral formed by the four key points according to an embodiment of the present application;
Fig. 6 is a flowchart of another image detection method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an image detection device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In order to better understand and illustrate the embodiments of the present application, some technical terms related to the embodiments of the present application are briefly described below.
Neural network: an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Such a network depends on the complexity of the system, and achieves the purpose of processing information by adjusting the interconnection relationships among a large number of internal nodes.
Mask R-CNN model: an object detection and object segmentation model. The "nn" in Mask R-CNN stands for neural network, whose inspiration comes from the working principle of biological neurons: a neural network is a connected set of neurons, each of which outputs a signal according to its inputs and internal parameters. When a neural network is trained, the internal parameters of the neurons are continually adjusted to obtain the desired output. The "c" in Mask R-CNN stands for convolution; a CNN uses fewer parameters and less memory than a conventional neural network, which enables it to process larger images.
Unet network: the Unet network comprises a feature extraction part and an upsampling part, and is so named because the network structure resembles the letter U. The feature extraction part produces one scale after each pooling layer; in the upsampling part, each upsampling step fuses the feature map of the corresponding scale, with the matching number of channels, from the feature extraction part (a skip connection).
RANSAC algorithm: the random sample consensus algorithm (RANdom SAmple Consensus, RANSAC) iteratively estimates the parameters of a mathematical model from a set of observed data containing outliers. The RANSAC algorithm assumes that the data contains both correct data and anomalous data (also called noise). Correct data are denoted inliers, and anomalous data are denoted outliers. RANSAC also assumes that, given a set of correct data, there is a way to calculate model parameters that fit these data. The core ideas of the algorithm are randomness and hypothesis: sampling data are selected randomly according to the probability of occurrence of correct data, and by the law of large numbers the random simulation can approximately yield a correct result; the hypothesis is that the selected samples are all correct data, so these samples are used to fit the model of the problem, all other points are then evaluated against this model, and the result is scored.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Example 1
An embodiment of the present application provides an image detection method, where a flow chart of the method is shown in fig. 1, and the method includes:
s101, acquiring an original image comprising an object to be measured and a reference image comprising the outer surface of the object to be measured, wherein the original image comprises a color image and a depth image.
Optionally, the reference image comprises a surface image of the object to be measured.
Optionally, the original image is obtained by a depth camera and is an RGB-D (Red, Green, Blue-Depth) image comprising a color image and a depth image: the color image corresponds to the RGB (Red, Green, Blue) image in the RGB-D image, and the depth image corresponds to the Depth Map in the RGB-D image. The RGB color mode is an industry color standard in which colors are obtained by varying and superimposing the three color channels Red, Green, and Blue; RGB denotes the colors of these three channels. In 3D computer graphics, a Depth Map is an image or image channel containing information about the distance from a viewpoint to the surfaces of scene objects. A Depth Map is similar to a grayscale image, except that each of its pixel values characterizes the actual distance from the sensor to the object. Typically the RGB image and the Depth image are registered, so that there is a one-to-one correspondence between their pixels.
Optionally, the color image includes one or more objects to be measured, the objects to be measured are boxes, the top surface images of all boxes are the same, and the reference image is the top surface image of the box.
S102, determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image.
Optionally, the bounding box corresponding to an object to be measured is the smallest rectangle enclosing that object, with the length and width of the smallest rectangle parallel to the boundaries of the image containing the object.
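For illustration only, the following minimal sketch computes such an axis-aligned bounding box from a binary object mask; the mask input and the (x, y, w, h) return format are assumptions made for the example, not details from this application.

```python
import numpy as np

def axis_aligned_bbox(mask: np.ndarray):
    """Smallest axis-aligned rectangle enclosing the non-zero pixels of a
    binary object mask. Returns (x, y, w, h) with (x, y) the top-left
    corner; the rectangle's sides are parallel to the image borders."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask contains no object pixels")
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    return int(x_min), int(y_min), int(x_max - x_min + 1), int(y_max - y_min + 1)
```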
S103, determining the position coordinates of the key points of the object to be detected according to the bounding box.
Optionally, the object to be measured is a box, and one box has four key points, the four key points corresponding respectively to the four corners of the top-surface image of the box; the 2D coordinates of any point on the box can be represented by a coordinate pair (X, Y).
S104, determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
Optionally, the object to be measured is a box and one box has four key points; the 3D coordinates of any point on the box can be represented by a three-dimensional vector (X, Y, Z), where X, Y, and Z are the coordinate values on the X-, Y-, and Z-axes respectively, and the value on the Z-axis is the depth value.
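The 2D-to-3D conversion is not spelled out at this step, but with a registered depth image and known camera parameters it is conventionally the pinhole back-projection sketched below; fx, fy, cx, cy and depth_scale are assumed camera intrinsics and units, not values from this application.

```python
import numpy as np

def backproject(u, v, depth_map, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project pixel (u, v) to camera-frame 3D coordinates (X, Y, Z)
    with the standard pinhole model. Assumes the color and depth images
    are registered, so the depth at (u, v) belongs to the same surface
    point; depth_scale converts raw depth units to metres."""
    z = depth_map[int(v), int(u)] * depth_scale
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```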
In the embodiment of the application, an original image comprising an object to be detected and a reference image comprising the outer surface of the object to be detected are obtained, wherein the original image comprises a color image and a depth image; a bounding box corresponding to the object to be detected in the color image is determined according to the color image and the reference image; the position coordinates of the key points of the object to be detected are determined according to the bounding box; and the three-dimensional position information of the object to be detected is determined according to the position coordinates of the key points and the depth image. Thus, by introducing the reference image, the position information of the object to be detected is accurately detected.
Optionally, determining, according to the color image and the reference image, a bounding box corresponding to the object to be measured includes:
inputting the color image into a first feature extraction layer to obtain a first feature map;
inputting the reference image into a second feature extraction layer to obtain a second feature map;
and performing feature matching on the first feature map and the second feature map, and determining a bounding box.
Optionally, feature matching is performed on the first feature map and the second feature map, and determining the bounding box includes:
according to the spatial correspondence between the first feature map and the second feature map, performing feature matching on the first feature map and the second feature map to obtain a third feature map;
and superposing the first characteristic diagram and the third characteristic diagram to determine a bounding box.
Optionally, the first feature map has a size of (h, w, c1) and the second feature map has a size of (h, w, c2); the first feature map and the second feature map are superposed to obtain a third feature map of size (h, w, c1+c2).
Optionally, the first feature map has a size of (h, w, c1) and the second feature map has a size of (h, w, c2), and the two feature maps have a certain spatial correspondence. The second feature map is passed through a spatial transformer network (Spatial Transformer Network) to obtain a new feature map of size (h, w, c2); the first feature map and the new feature map are then superposed to obtain a third feature map of size (h, w, c1+c2).
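A minimal PyTorch sketch of this superposition step follows; the channel counts are illustrative, and bilinear interpolation merely stands in for the learned spatial transformer network.

```python
import torch
import torch.nn.functional as F

# Channel-wise superposition of the two feature maps (PyTorch layout is
# (batch, channels, h, w) rather than the (h, w, c) of the text).
first = torch.randn(1, 256, 64, 64)   # first feature map,  c1 = 256
second = torch.randn(1, 128, 32, 32)  # second feature map, c2 = 128

# Resample the reference-image feature map onto the grid of the first map;
# the spatial transformer network of the text would learn this warp.
second = F.interpolate(second, size=first.shape[-2:], mode="bilinear",
                       align_corners=False)

third = torch.cat([first, second], dim=1)  # size (1, c1 + c2, 64, 64)
```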
Optionally, as shown in fig. 2, the improved Mask R-CNN model comprises two feature extraction layers, a region candidate network, and a region-of-interest pooling (RoI pooling) layer, the two feature extraction layers being a first feature extraction layer and a second feature extraction layer used to extract features from the original image and the reference image respectively. For example, the object to be measured is a box, and a color image comprising two boxes is input into the first feature extraction layer to obtain a first feature map; the two boxes share the same reference image, namely the top-surface image of the box, which is input into a preset second feature extraction layer to obtain a second feature map. The first feature map and the second feature map are spliced to obtain a third feature map; the third feature map is input into the region candidate network to obtain initial bounding boxes; the initial bounding boxes and the third feature map are then input into the region-of-interest pooling layer, which outputs one bounding box per box, i.e. two bounding boxes for the two boxes. The bounding box corresponding to each box is the smallest rectangle enclosing that box, with its length and width parallel to the image boundaries. In an embodiment, the region candidate network may be an RPN, R-CNN, or other network, and is configured to determine the position coordinates of the object to be measured in the third feature map, i.e. the initial bounding boxes corresponding to the object to be measured, of which there may be several. Inputting the third feature map and the initial bounding boxes output by the region candidate network into the region-of-interest pooling layer updates the position coordinates of the object to be measured to more accurate values, thereby determining the bounding box corresponding to the object to be measured in the original image.
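The following sketch illustrates the final pooling stage under the assumption that initial bounding boxes are already available from the region candidate network; torchvision's roi_align, the variant used in Mask R-CNN, stands in here for the region-of-interest pooling layer, and all sizes are illustrative.

```python
import torch
from torchvision.ops import roi_align

# Fused feature map from the previous sketch and two stand-in proposals
# from the region candidate network (format: batch index, x1, y1, x2, y2,
# already expressed in feature-map coordinates, hence spatial_scale=1.0).
third = torch.randn(1, 384, 64, 64)
proposals = torch.tensor([[0., 10., 12., 30., 28.],
                          [0., 35., 20., 55., 50.]])

# Pool each proposal to a fixed 7x7 grid; a small head (omitted) would
# then regress the refined bounding box for each detected box.
pooled = roi_align(third, proposals, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)  # torch.Size([2, 384, 7, 7])
```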
Optionally, determining the position coordinates of the key points of the object to be measured according to the bounding box includes:
intercepting a region corresponding to the bounding box to obtain a first image;
upsampling the first image and the reference image to obtain heat maps comprising the key points of the object to be measured;
and determining the position coordinates of the key points according to the heat maps.
Optionally, determining the position coordinates of the key points of the object to be measured according to the bounding box includes:
when the color image comprises N objects to be detected, extracting N objects to be detected from N bounding boxes corresponding to the N objects to be detected respectively to obtain N first images, wherein the first images comprise the objects to be detected, and N is a positive integer;
scaling the N first images to a first size to obtain N second images;
and determining the position coordinates of the key points of the object to be detected according to the second image.
Optionally, as shown in fig. 3, when N is 2, one color image includes 2 boxes, and the 2 boxes are respectively extracted from 2 bounding boxes corresponding to the 2 boxes, so as to obtain 2 first images, where each first image includes one box; the 2 first images are scaled to the same first size, resulting in 2 second images, the size of the box in one second image being the same as the size of the box in the other second image.
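A minimal sketch of this crop-and-scale step follows; the first size of 256×256 is an assumed value, since the embodiment only requires that all second images share one size.

```python
import cv2

def crop_and_resize(color_image, bboxes, first_size=(256, 256)):
    """Extract each detected box from its bounding box (x, y, w, h) and
    scale the crops to a common first size, mirroring fig. 3."""
    second_images = []
    for (x, y, w, h) in bboxes:
        first_image = color_image[y:y + h, x:x + w]          # crop
        second_images.append(cv2.resize(first_image, first_size))
    return second_images
```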
Optionally, as shown in fig. 4, the modified Unet network comprises two feature extraction layers and an upsampling layer, and outputs four heat maps; the two feature extraction layers are a third feature extraction layer and a fourth feature extraction layer. When there are 2 second images, one second image comprising one box is input into the third feature extraction layer to obtain a fourth feature map; the reference images corresponding to the two boxes are the same, the reference image being, for example, the top-surface image of the box, and the reference image corresponding to the second image is input into the fourth feature extraction layer to obtain a fifth feature map. The fourth feature map and the fifth feature map are spliced to obtain a sixth feature map; the sixth feature map is input into the upsampling layer to obtain four heat maps; and from the four heat maps, the 2D coordinates of the four key points of the box are obtained. The other second image is processed in the same way to obtain the 2D coordinates of the four key points of the other box. Each heat map comprises a Gaussian distribution centered on one key point, the key points corresponding respectively to the four corners of the top-surface image of the box.
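For illustration, a heavily simplified sketch of such a two-branch network with an upsampling head is given below; the layer widths and depths are assumptions and do not reflect the actual network of the embodiment.

```python
import torch
import torch.nn as nn

class KeypointHeatmapNet(nn.Module):
    """Minimal sketch of the modified Unet of fig. 4: two feature
    extraction branches (second image and reference image), channel
    concatenation, and an upsampling head emitting four heat maps,
    one per key point. Both inputs must share the same spatial size."""

    def __init__(self, c=64):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, c, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(c, c, 3, stride=2, padding=1), nn.ReLU())
        self.enc_image = encoder()      # third feature extraction layer
        self.enc_reference = encoder()  # fourth feature extraction layer
        self.up = nn.Sequential(        # upsampling layer -> 4 heat maps
            nn.ConvTranspose2d(2 * c, c, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(c, 4, 4, stride=2, padding=1))

    def forward(self, second_image, reference_image):
        fused = torch.cat([self.enc_image(second_image),
                           self.enc_reference(reference_image)], dim=1)
        return self.up(fused)  # (batch, 4, h, w): one heat map per key point
```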
Optionally, determining the position coordinates of the key points according to the heat maps includes:
obtaining, from each heat map, the coordinates of the point with the largest pixel value, and taking those coordinates as the position coordinates of a key point; each heat map comprises a Gaussian distribution centered on a key point, and the key points correspond to the corners of the reference image.
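A short sketch of this per-heat-map argmax, assuming the heat map is a 2D numpy array:

```python
import numpy as np

def keypoint_from_heatmap(heatmap: np.ndarray):
    """Take the pixel with the largest value in one heat map as the
    predicted key point; returns (x, y) image coordinates."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)
```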
Optionally, determining the three-dimensional position information of the object to be measured according to the position coordinates of the key points and the depth image includes:
according to the position coordinates of the four key points, position coordinates of a plurality of points in a quadrilateral formed by the four key points are obtained;
according to the position coordinates of a plurality of points in the quadrangle and the depth image, obtaining three-dimensional position information of the plurality of points in the quadrangle;
and determining the three-dimensional position information of the four key points according to the three-dimensional position information of the plurality of points in the quadrangle and the position coordinates of the four key points.
Alternatively, as shown in fig. 5, all points in the quadrilateral enclosed by the four key points are selected, and the 3D coordinates of each of these points are calculated in combination with the camera parameters. A plane is then fitted to these 3D coordinates by the RANSAC algorithm: in each iteration, RANSAC randomly selects a subset of the points in the quadrilateral, fits a plane by the least-squares method, and computes the error between each selected point and the plane. When enough points have an error within a preset threshold, a new plane is recomputed from those points; the 2D coordinates of the four key points are then substituted into the new plane, converting the 2D coordinates of the four key points into 3D coordinates.
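A sketch of this RANSAC plane fit follows. Unlike the description above it samples minimal three-point subsets before a final least-squares refit, and the iteration count, distance threshold, and inlier quorum are assumed values.

```python
import numpy as np

def ransac_plane(points, iterations=100, threshold=0.005, min_inliers=50):
    """Fit a plane a*x + b*y + c*z + d = 0 to an (N, 3) array of 3D
    points by RANSAC, then refit on the inliers by least squares."""
    rng = np.random.default_rng(0)
    best_inliers = None
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                 # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        errors = np.abs(points @ normal + d)   # point-to-plane distances
        inliers = errors < threshold
        if inliers.sum() >= min_inliers and (
                best_inliers is None or inliers.sum() > best_inliers.sum()):
            best_inliers = inliers
    if best_inliers is None:
        raise RuntimeError("no plane with enough inliers found")
    # Least-squares refit on the inliers: solve a*x + b*y + c = z.
    x, y, z = points[best_inliers].T
    a, b, c = np.linalg.lstsq(np.column_stack([x, y, np.ones_like(x)]),
                              z, rcond=None)[0]
    return a, b, -1.0, c                # a*x + b*y - z + c = 0
```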
Optionally, determining the three-dimensional position information of the four key points according to the three-dimensional position information of the plurality of points in the quadrangle and the position coordinates of the four key points includes:
performing plane fitting on a plurality of points according to three-dimensional position information of the plurality of points in the quadrangle to obtain a fitting plane;
determining error values between each point in the plurality of points and the fitting plane according to the three-dimensional position information of the plurality of points and the fitting plane;
and when the number of points whose error value relative to the fitting plane is smaller than a preset threshold is not smaller than a preset number, obtaining the three-dimensional position information of the four key points according to the three-dimensional position information of the points whose error value relative to the fitting plane is smaller than the preset threshold and the position coordinates of the four key points.
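Substituting a 2D key point into the fitted plane amounts to intersecting the camera ray through that pixel with the plane; a sketch, again assuming the camera intrinsics fx, fy, cx, cy are known:

```python
import numpy as np

def keypoint_to_3d(u, v, plane, fx, fy, cx, cy):
    """Intersect the camera ray through pixel (u, v) with the plane
    a*x + b*y + c*z + d = 0, yielding the key point's (X, Y, Z)."""
    a, b, c, d = plane
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])  # direction, z = 1
    t = -d / (a * ray[0] + b * ray[1] + c)   # depth at which the ray hits
    return ray * t                            # 3D coordinates of the key point
```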
Another image detection method is provided in the embodiments of the present application, and a flow chart of the method is shown in fig. 6, where the method includes:
s201, acquiring an original image and a reference image, wherein the original image comprises a color image and a depth image.
Optionally, the reference image is an image of the top surface of the box.
S202, inputting a color image comprising N boxes and the reference image into the first feature extraction layer and the second feature extraction layer respectively, to obtain a first feature map and a second feature map, where N is a positive integer.
Optionally, the first feature extraction layer and the second feature extraction layer each use a convolutional neural network to convert an image into a feature map; the first feature map represents features of the color image and the second feature map represents features of the reference image. The input of a feature extraction layer is a picture of size h×w×3, where 3 refers to the three RGB channels, and the output is a feature map of size h'×w'×c. Both the first feature map and the second feature map are h'×w'×c feature maps.
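For illustration, one such h×w×3 to h'×w'×c feature extraction layer can be obtained by truncating a standard backbone; the sketch below uses resnet101, which the description later names as a possible structure, with the pooling and fully connected head removed.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

# Keep everything up to the last convolutional stage so the output stays
# a spatial feature map instead of a classification vector.
trunk = resnet101(weights=None)
feature_layer = nn.Sequential(*list(trunk.children())[:-2])

image = torch.randn(1, 3, 512, 512)   # one h x w x 3 input picture
features = feature_layer(image)       # (1, 2048, 16, 16): h' x w' x c
```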
And S203, splicing the first characteristic diagram and the second characteristic diagram to obtain a third characteristic diagram.
Optionally, the first feature map and the second feature map are spliced into a feature map of size h'×w'×2c; the third feature map is thus an h'×w'×2c feature map.
S204, inputting the third feature map into the area candidate network to obtain an initial bounding box.
Optionally, the input of the region candidate network (RPN) is the h'×w'×2c feature map and the output is N initial bounding boxes, denoted [(x_1, y_1, h_1, w_1), ..., (x_N, y_N, h_N, w_N)]. For any i with 1 ≤ i ≤ N, (x_i, y_i, h_i, w_i) represents an initial bounding box whose upper-left corner is (x_i, y_i), whose length is h_i, and whose width is w_i. N is a positive integer.
S205, inputting the initial bounding box and the third feature map into a region-of-interest pooling layer to obtain N bounding boxes.
Optionally, the input to the region-of-interest pooling layer is the N initial bounding boxes [(x_1, y_1, h_1, w_1), ..., (x_N, y_N, h_N, w_N)] and the h'×w'×2c feature map, and the output is N bounding boxes [(x_1', y_1', h_1', w_1'), ..., (x_N', y_N', h_N', w_N')]. These bounding box positions are more accurate than the initial bounding box positions.
And S206, extracting N first images according to the N bounding boxes, and scaling the N first images to the same size to obtain N second images with the same size.
Optionally, one color image includes N boxes, the N boxes are respectively extracted from N bounding boxes corresponding to the N boxes, so as to obtain N first images, and each first image includes one box; and scaling the N first images to the same first size to obtain N second images, wherein the boxes in the N second images are the same in size.
S207, obtaining 2D coordinates of 4N key points of the N boxes according to the N second images with the same size.
Optionally, one second image of the N second images with the same size is input into the third feature extraction layer to obtain a fourth feature map; the reference image is input into the fourth feature extraction layer to obtain a fifth feature map; the fourth feature map and the fifth feature map are spliced to obtain a sixth feature map; the sixth feature map is input into the upsampling layer to obtain four heat maps; and the 2D coordinates of the four key points of one box are obtained from the four heat maps. In this manner, the 2D coordinates of the 4N key points of the N boxes are obtained.
Optionally, the feature extraction layer here is the third feature extraction layer or the fourth feature extraction layer. The input of the feature extraction layer is a picture of size h×w×3, where 3 refers to the three RGB channels, and the output is an h'×w'×c feature map. The feature extraction layer may use the resnet101 network structure, and the fourth feature map and the fifth feature map are both h'×w'×c feature maps. The fourth feature map and the fifth feature map are spliced into an h'×w'×2c feature map, so the sixth feature map is an h'×w'×2c feature map. The input of the upsampling layer is the h'×w'×2c feature map and the output is 4 heat maps of size h'×w'×1, each heat map corresponding to one key point of a box. For the i-th heat map, the coordinates of the point with the maximum pixel value in the i-th heat map are taken as the predicted key-point coordinates (x_i, y_i); one key point is predicted per heat map, so the 4 heat maps yield the 4 key-point coordinates [(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)] of a box, where i is any integer among 1, 2, 3, and 4.
And S208, obtaining 3D coordinates of the 4N key points according to the 2D coordinates of the 4N key points and the depth image.
Optionally, the 2D coordinates of all points in the quadrilateral formed by the four key points of one box are obtained according to the 2D coordinates of those four key points; the 3D coordinates of all points in the quadrilateral are obtained according to their 2D coordinates and the depth image; and the 3D coordinates of the four key points of the box are determined from the 3D coordinates of all points in the quadrilateral and the 2D coordinates of the four key points. The 3D coordinates of the 4N key points of the N boxes are obtained in the same manner.
The application of the embodiment of the application has at least the following beneficial effects:
by introducing the reference diagram, the 3D position information of the box can be accurately detected.
Example two
Based on the same inventive concept, the embodiment of the present application further provides an image detection apparatus, and a schematic structural diagram of the apparatus is shown in fig. 7, where the image detection apparatus 30 includes a first processing module 301, a second processing module 302, a third processing module 303, and a fourth processing module 304.
A first processing module 301, configured to obtain an original image including an object to be measured and a reference image including an outer surface of the object to be measured, where the original image includes a color image and a depth image;
the second processing module 302 is configured to determine a bounding box corresponding to the object to be measured in the color image according to the color image and the reference image;
a third processing module 303, configured to determine position coordinates of key points of the object to be measured according to the bounding box;
the fourth processing module 304 is configured to determine three-dimensional position information of the object to be measured according to the position coordinates of the key points and the depth image.
Optionally, the second processing module 302 is specifically configured to input the color image into the first feature extraction layer to obtain a first feature map; input the reference image into the second feature extraction layer to obtain a second feature map; and perform feature matching on the first feature map and the second feature map to determine the bounding box.
Optionally, the second processing module 302 is specifically configured to perform feature matching on the first feature map and the second feature map according to the spatial correspondence between the first feature map and the second feature map, so as to obtain a third feature map; and superposing the first characteristic diagram and the third characteristic diagram to determine a bounding box.
The third processing module 303 is specifically configured to intercept the region corresponding to the bounding box to obtain a first image; upsample the first image and the reference image to obtain heat maps comprising the key points of the object to be measured; and determine the position coordinates of the key points according to the heat maps.
The third processing module 303 is specifically configured to obtain, from each heat map, the coordinates of the point with the maximum pixel value, and to use those coordinates as the position coordinates of a key point, where each heat map comprises a Gaussian distribution centered on the key point, and the key points correspond to the corners of the reference image.
The fourth processing module 304 is specifically configured to obtain, according to the position coordinates of the four key points, position coordinates of a plurality of points in a quadrilateral formed by the four key points; obtaining three-dimensional position information of a plurality of points in the quadrangle according to the position coordinates and the depth image of the plurality of points in the quadrangle; and determining the three-dimensional position information of the four key points according to the three-dimensional position information of the plurality of points in the quadrangle and the position coordinates of the four key points.
The fourth processing module 304 is specifically configured to perform plane fitting on a plurality of points according to three-dimensional position information of the plurality of points in the quadrilateral, so as to obtain a fitted plane; determining error values between each point in the plurality of points and the fitting plane according to the three-dimensional position information of the plurality of points and the fitting plane; and when the number of points, of which the error value between the plurality of points and the fitting plane is smaller than a preset threshold value, is not smaller than the preset number, obtaining the three-dimensional position information of the four key points according to the three-dimensional position information of the points, of which the error value between the fitting plane and the fitting plane is smaller than the preset threshold value, and the position coordinates of the four key points.
The application of the embodiment of the application has at least the following beneficial effects:
acquiring an original image comprising an object to be measured and a reference image comprising an outer surface of the object to be measured, wherein the original image comprises a color image and a depth image; determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image; determining position coordinates of key points of the object to be detected according to the bounding box; and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image. Thus, by introducing the reference image, the position information of the object to be detected is accurately detected.
The image detection device provided in the embodiment of the present application may refer to the image detection method provided in the first embodiment, and the beneficial effects that the image detection device provided in the embodiment of the present application can achieve are the same as those of the image detection method provided in the first embodiment, which is not described herein again.
Example III
Based on the same inventive concept, the embodiment of the present application further provides an electronic device, a schematic structural diagram of which is shown in fig. 8, where the electronic device 7000 includes at least one processor 7001, a memory 7002 and a bus 7003, and at least one processor 7001 is electrically connected to the memory 7002; the memory 7002 is configured to store at least one computer executable instruction, and the processor 7001 is configured to execute the at least one computer executable instruction to perform the steps of any one of the image detection methods as provided in any one of the embodiments or any one of the alternative implementations of the present application.
Further, the processor 7001 may be an FPGA (Field-Programmable Gate Array) or another device with logic processing capability, such as an MCU (Microcontroller Unit) or a CPU (Central Processing Unit).
The application of the embodiment of the application has at least the following beneficial effects:
acquiring an original image comprising an object to be measured and a reference image comprising an outer surface of the object to be measured, wherein the original image comprises a color image and a depth image; determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image; determining position coordinates of key points of the object to be detected according to the bounding box; and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image. Thus, by introducing the reference image, the position information of the object to be detected is accurately detected.
Example IV
Based on the same inventive concept, the embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the image detection method of any embodiment or optional implementation of the present application.
The computer readable storage medium provided by the embodiments of the present application includes, but is not limited to, any type of disk including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks, ROMs (Read-Only memories), RAMs (Random Access Memory, random access memories), EPROMs (Erasable Programmable Read-Only memories), EEPROMs (Electrically Erasable Programmable Read-Only memories), flash memories, magnetic cards, or optical cards. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
The application of the embodiment of the application has at least the following beneficial effects:
acquiring an original image comprising an object to be measured and a reference image comprising an outer surface of the object to be measured, wherein the original image comprises a color image and a depth image; determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image; determining position coordinates of key points of the object to be detected according to the bounding box; and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image. Thus, by introducing the reference image, the position information of the object to be detected is accurately detected.
It will be understood by those within the art that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. Those skilled in the art will appreciate that these computer program instructions can be implemented in a processor of a general purpose computer, special purpose computer, or other programmable data processing method, such that the blocks of the block diagrams and/or flowchart illustration are implemented by the processor of the computer or other programmable data processing method.
Those of skill in the art will appreciate that the various operations, methods, steps in the flow, actions, schemes, and alternatives discussed in the present application may be alternated, altered, combined, or eliminated. Further, other steps, means, or steps in a process having various operations, methods, or procedures discussed in this application may be alternated, altered, rearranged, split, combined, or eliminated. Further, steps, measures, schemes in the prior art with various operations, methods, flows disclosed in the present application may also be alternated, altered, rearranged, decomposed, combined, or deleted.
The foregoing is only a partial embodiment of the present application, and it should be noted that, for a person skilled in the art, several improvements and modifications can be made without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. An image detection method, comprising:
acquiring an original image comprising an object to be measured and a reference image comprising an outer surface of the object to be measured, wherein the original image comprises a color image and a depth image;
determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image;
determining position coordinates of key points of the object to be detected according to the bounding box;
and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
2. The method according to claim 1, wherein determining a bounding box corresponding to the object to be measured from the color image and the reference image comprises:
inputting the color image into a first feature extraction layer to obtain a first feature map;
inputting the reference image into a second feature extraction layer to obtain a second feature image;
and carrying out feature matching on the first feature map and the second feature map, and determining the bounding box.
3. The method of claim 2, wherein the feature matching the first feature map and the second feature map to determine the bounding box comprises:
according to the spatial correspondence between the first feature map and the second feature map, performing feature matching on the first feature map and the second feature map to obtain a third feature map;
and superposing the first characteristic diagram and the third characteristic diagram to determine the bounding box.
4. The method according to claim 1, wherein determining the position coordinates of the key points of the object to be measured according to the bounding box comprises:
intercepting a region corresponding to the bounding box to obtain a first image;
upsampling the first image and the reference image to obtain heat maps comprising key points of the object to be detected;
and determining the position coordinates of the key points according to the heat maps.
5. The method of claim 4, wherein determining the position coordinates of the key points according to the heat maps comprises:
obtaining, according to the heat maps, the coordinates of the point with the maximum pixel value in each heat map, and taking the coordinates of the point with the maximum pixel value in each heat map as the position coordinates of the key points, wherein each heat map comprises a Gaussian distribution centered on a key point, and the key points correspond to the corners of the reference image.
6. The method according to claim 5, wherein determining the three-dimensional position information of the object to be measured based on the position coordinates of the key points and the depth image comprises:
according to the position coordinates of the four key points, position coordinates of a plurality of points in a quadrilateral formed by the four key points are obtained;
obtaining three-dimensional position information of a plurality of points in the quadrangle according to the position coordinates of the plurality of points in the quadrangle and the depth image;
and determining the three-dimensional position information of the four key points according to the three-dimensional position information of a plurality of points in the quadrangle and the position coordinates of the four key points.
7. The method of claim 6, wherein determining the three-dimensional position information of the four key points from the three-dimensional position information of the plurality of points in the quadrilateral and the position coordinates of the four key points comprises:
performing plane fitting on a plurality of points in the quadrangle according to the three-dimensional position information of the plurality of points to obtain a fitting plane;
determining error values between each point in the plurality of points and the fitting plane according to the three-dimensional position information of the plurality of points and the fitting plane;
and when the number of points whose error value relative to the fitting plane is smaller than a preset threshold is not smaller than a preset number, obtaining the three-dimensional position information of the four key points according to the three-dimensional position information of the points whose error value relative to the fitting plane is smaller than the preset threshold and the position coordinates of the four key points.
8. An image detection apparatus, comprising:
the first processing module is used for acquiring an original image comprising an object to be detected and a reference image comprising the outer surface of the object to be detected, wherein the original image comprises a color image and a depth image;
the second processing module is used for determining a bounding box corresponding to the object to be detected in the color image according to the color image and the reference image;
the third processing module is used for determining the position coordinates of the key points of the object to be detected according to the bounding box;
and the fourth processing module is used for determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
9. An electronic device, comprising: a processor, a memory;
the memory is used for storing a computer program;
the processor is configured to execute the image detection method according to any one of the preceding claims 1-7 by invoking the computer program.
10. A computer readable storage medium, characterized in that a computer program is stored, which computer program is adapted to implement the image detection method according to any one of claims 1-7 when being executed by a processor.
CN201911266589.9A (filed 2019-12-11, priority date 2019-12-11) Image detection method, device, equipment and readable storage medium, Active, granted as CN111028283B (en)

Priority Applications (1)

CN201911266589.9A, priority date 2019-12-11, filing date 2019-12-11: Image detection method, device, equipment and readable storage medium

Publications (2)

CN111028283A (en), published 2020-04-17
CN111028283B (en), granted and published 2024-01-12

Family

ID=70208797

Country Status (1)

CN: CN111028283B (en) granted

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8578299B2 (en) * 2010-10-08 2013-11-05 Industrial Technology Research Institute Method and computing device in a system for motion detection
CN104584071B (en) * 2012-08-23 2018-01-26 日本电气株式会社 Object detector, object identification method
KR101370718B1 (en) * 2012-10-26 2014-03-06 한국과학기술원 Method and apparatus for 2d to 3d conversion using panorama image
US10198858B2 (en) * 2017-03-27 2019-02-05 3Dflow Srl Method for 3D modelling based on structure from motion processing of sparse 2D images

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102075689A (en) * 2009-11-24 2011-05-25 新奥特(北京)视频技术有限公司 Character generator for rapidly making animation
WO2013017306A1 (en) * 2011-08-02 2013-02-07 Qatar Foundation Copy detection
CN107851314A (en) * 2015-07-27 2018-03-27 米其林集团总公司 For the method for the optimization for analyzing surface of tyre uniformity
CN107203962A (en) * 2016-03-17 2017-09-26 掌赢信息科技(上海)有限公司 The method and electronic equipment of a kind of pseudo- 3D rendering of utilization 2D picture makings
CN105894655A (en) * 2016-04-25 2016-08-24 浙江大学 Method for detecting and recognizing bill under complex environments based on RGB-D camera
CN107633247A (en) * 2017-08-16 2018-01-26 歌尔股份有限公司 The determination method and device of image-region
CN109961472A (en) * 2017-12-25 2019-07-02 北京京东尚科信息技术有限公司 Method, system, storage medium and the electronic equipment that 3D thermodynamic chart generates
CN108830150A (en) * 2018-05-07 2018-11-16 山东师范大学 One kind being based on 3 D human body Attitude estimation method and device
CN110070083A (en) * 2019-04-24 2019-07-30 深圳市微埃智能科技有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN110348524A (en) * 2019-07-15 2019-10-18 深圳市商汤科技有限公司 A kind of human body critical point detection method and device, electronic equipment and storage medium
CN110473259A (en) * 2019-07-31 2019-11-19 深圳市商汤科技有限公司 Pose determines method and device, electronic equipment and storage medium
CN110427917A (en) * 2019-08-14 2019-11-08 北京百度网讯科技有限公司 Method and apparatus for detecting key point

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fast 3D human ear extraction and recognition; 陈雷蕾 (Chen Leilei), 王斌 (Wang Bin), 张立明 (Zhang Liming); Journal of Computer-Aided Design & Computer Graphics, (10): 1438-1445 *

Also Published As

CN111028283A (en), published 2020-04-17

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant