CN111028283A - Image detection method, device, equipment and readable storage medium


Info

Publication number
CN111028283A
Authority
CN
China
Prior art keywords
image, determining, points, detected, key points
Legal status
Granted
Application number
CN201911266589.9A
Other languages
Chinese (zh)
Other versions
CN111028283B (en)
Inventor
孙伟 (Sun Wei)
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date: 2019-12-11
Filing date: 2019-12-11
Application filed by Beijing Megvii Technology Co Ltd
Priority to CN201911266589.9A
Publication of CN111028283A: 2020-04-17
Application granted; publication of CN111028283B: 2024-01-12
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/50: Depth or shape recovery
    • G06T 7/55: Depth or shape recovery from multiple images
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10004: Still image; Photographic image
    • G06T 2207/10012: Stereo images


Abstract

The embodiment of the application provides an image detection method, an image detection device, image detection equipment and a computer-readable storage medium. The method comprises the following steps: acquiring an original image including an object to be detected and a reference image including the outer surface of the object to be detected, wherein the original image includes a color image and a depth image; determining a corresponding bounding box of the object to be detected in the color image according to the color image and the reference image; determining the position coordinates of key points of the object to be detected according to the bounding box; and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image. By introducing the reference image, the method accurately detects the position information of the object to be detected.

Description

Image detection method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image detection method, an image detection apparatus, an image detection device, and a readable storage medium.
Background
In the prior art, boxes are often stacked together and their positions need to be determined, but existing ways of detecting the positions of the boxes suffer from low accuracy, high cost and poor flexibility.
Disclosure of Invention
The application provides an image detection method, an image detection device, an image detection apparatus and a computer-readable storage medium, which address the problem of how to accurately detect the position information of an object to be detected.
In a first aspect, the present application provides an image detection method, including:
acquiring an original image including an object to be detected and a reference image including the outer surface of the object to be detected, wherein the original image includes a color image and a depth image;
determining a corresponding bounding box of the object to be detected in the color image according to the color image and the reference image;
determining the position coordinates of key points of the object to be detected according to the bounding box;
and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
In a second aspect, the present application provides an image detection apparatus comprising:
the first processing module is used for acquiring an original image comprising an object to be detected and a reference image comprising the outer surface of the object to be detected, wherein the original image comprises a color image and a depth image;
the second processing module is used for determining a corresponding bounding box of the object to be detected in the color image according to the color image and the reference image;
the third processing module is used for determining the position coordinates of the key points of the object to be detected according to the bounding box;
and the fourth processing module is used for determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
In a third aspect, the present application provides an electronic device, comprising: a processor, a memory, and a bus;
a bus for connecting the processor and the memory;
a memory for storing operating instructions;
and the processor is used for executing the image detection method of the first aspect of the application by calling the operation instruction.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program for executing the image detection method of the first aspect of the present application.
The technical solution provided by the embodiments of the application has at least the following beneficial effects:
an original image including an object to be detected and a reference image including the outer surface of the object to be detected are acquired, wherein the original image includes a color image and a depth image; a corresponding bounding box of the object to be detected in the color image is determined according to the color image and the reference image; the position coordinates of the key points of the object to be detected are determined according to the bounding box; and the three-dimensional position information of the object to be detected is determined according to the position coordinates of the key points and the depth image. Thus, by introducing the reference image, the position information of the object to be detected is accurately detected.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of an image detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of obtaining a bounding box provided by an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating scaling of a first image extracted from a bounding box to a first size according to an embodiment of the present application;
Fig. 4 is a schematic diagram of obtaining position coordinates of key points according to an embodiment of the present application;
FIG. 5 is a schematic diagram of four key points and all points in a quadrilateral formed by the four key points according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of another image detection method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any combination of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For better understanding and description of the embodiments of the present application, some technical terms used in the embodiments of the present application will be briefly described below.
A neural network: an algorithmic mathematical model that imitates the behavioral characteristics of biological neural networks and performs distributed parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes.
mask-rcnn model: an object detection and instance segmentation model. The "nn" in mask-rcnn stands for neural network, whose inspiration comes from the working principle of biological neurons: a neural network is a collection of connected neurons, each of which outputs a signal according to its inputs and internal parameters. When training a neural network, the internal parameters of the neurons are continually adjusted to obtain the desired output. The "c" in mask-rcnn stands for convolution; a convolutional neural network (cnn) uses fewer parameters and less memory than a conventional neural network, which enables it to process larger images.
The Unet network: the Unet network comprises a feature extraction part and an upsampling part, and is called Unet because its network structure is U-shaped. In the feature extraction part, each pooling layer produces a feature map at a new scale; in the upsampling part, each upsampling step fuses the result with the feature map of the corresponding scale and channel count from the feature extraction part.
RANSAC algorithm: Random Sample Consensus (RANSAC) iteratively estimates the parameters of a mathematical model from a set of observed data that contains outliers. The RANSAC algorithm assumes that the data contain both correct data, denoted as inliers, and abnormal data (also called noise), denoted as outliers. RANSAC also assumes that, given a set of correct data, there is a way to compute model parameters that fit those data. The core ideas of the algorithm are randomness and hypothesis: sampling data are selected at random according to the probability of drawing correct data, and by the law of large numbers repeated random sampling can approximate a correct result. The hypothesis is that the sampled data are all correct; the model satisfied by the problem is then computed from them, that model is used to evaluate the remaining points, and the result is scored.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Example one
An embodiment of the present application provides an image detection method, a flowchart of the method is shown in fig. 1, and the method includes:
s101, acquiring an original image comprising an object to be measured and a reference image comprising the outer surface of the object to be measured, wherein the original image comprises a color image and a depth image.
Optionally, the reference map comprises a surface image of the object to be measured.
Optionally, the original image is obtained by a Depth camera, the original image is an RGB-D (Red, Green, Blue-Depth, Red, Green, Blue-Depth) image, the original image includes a color image and a Depth image, the color image corresponds to an RGB (Red, Green, Blue) image in the RGB-D image, and the Depth image corresponds to a Depth image (Depth Map) in the RGB-D image. The RGB color scheme is a color standard in the industry, and various colors are obtained by changing three color channels of Red, Green and Blue and superimposing the three color channels on each other, wherein RGB is a color representing three channels of Red, Green and Blue. In 3D computer graphics, a Depth Map is an image or image channel containing information about the distance of the surface of a scene object of a viewpoint. Where the Depth Map is similar to a grayscale image except that each pixel value thereof characterizes the actual distance of the sensor from the object. Usually, the RGB image and the Depth image are registered, so that there is a one-to-one correspondence between the pixel points.
Optionally, the color image includes one or more objects to be measured, the objects to be measured are boxes, the top surface images of all the boxes are the same, and the reference image is the top surface image of the box.
S102, determining a corresponding bounding box of the object to be detected in the color image according to the color image and the reference image.
Optionally, the bounding box corresponding to an object to be detected is the minimum rectangle that bounds that object, with the length and width of the minimum rectangle parallel to the boundaries of the image containing the object.
S103, determining the position coordinates of the key points of the object to be detected according to the bounding box.
Optionally, the object to be detected is a box with four key points, which correspond to the four corners of the top surface image of the box; the 2D coordinates of any point on the box can be represented by a coordinate pair (X, Y).
S104, determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
Optionally, the object to be detected is a box with four key points, and the 3D coordinates of any point of the box can be represented by a three-dimensional vector (X, Y, Z), where X is the coordinate value on the X-axis, Y is the coordinate value on the Y-axis, and Z is the coordinate value on the Z-axis; the Z-axis value is the depth value.
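The patent does not spell out how a pixel and its depth value yield such a (X, Y, Z) vector, but a standard pinhole back-projection illustrates the idea; the intrinsics fx, fy, cx, cy and the function name below are illustrative assumptions, not part of the patent.

```python
import numpy as np

def pixel_to_3d(x, y, depth_image, fx, fy, cx, cy):
    """Back-project pixel (x, y) with its depth value into camera
    coordinates using a pinhole model. fx, fy, cx, cy come from
    camera calibration; this is a sketch, not the patent's exact
    formulation."""
    Z = depth_image[y, x]          # the depth value is the Z coordinate
    X = (x - cx) * Z / fx
    Y = (y - cy) * Z / fy
    return np.array([X, Y, Z])
```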
In the embodiment of the application, an original image comprising an object to be detected and a reference image comprising the outer surface of the object to be detected are obtained, wherein the original image comprises a color image and a depth image; a corresponding bounding box of the object to be detected in the color image is determined according to the color image and the reference image; the position coordinates of the key points of the object to be detected are determined according to the bounding box; and the three-dimensional position information of the object to be detected is determined according to the position coordinates of the key points and the depth image. Therefore, by introducing the reference image, the position information of the object to be detected is accurately detected.
Optionally, determining the bounding box corresponding to the object to be detected according to the color image and the reference image includes:
inputting the color image to a first feature extraction layer to obtain a first feature map;
inputting the reference image into a second feature extraction layer to obtain a second feature map;
and performing feature matching on the first feature map and the second feature map, and determining the bounding box.
Optionally, performing feature matching on the first feature map and the second feature map and determining the bounding box includes:
performing feature matching on the first feature map and the second feature map according to the spatial correspondence between the first feature map and the second feature map to obtain a third feature map;
and superimposing the first feature map and the third feature map to determine the bounding box.
Optionally, the size of the first feature map is (h, w, c1) and the size of the second feature map is (h, w, c2); superimposing the first feature map and the second feature map yields a third feature map of size (h, w, c1 + c2).
Optionally, the size of the first feature map is (h, w, c1) and the size of the second feature map is (h, w, c2); the first feature map and the second feature map have a certain spatial correspondence, so the second feature map is passed through a spatial transformer network (Spatial Transformer Network) to obtain a new feature map of size (h, w, c2); superimposing the first feature map and the new feature map yields a third feature map of size (h, w, c1 + c2).
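A minimal PyTorch sketch of this channel-wise superposition, with illustrative channel counts (c1 = 64, c2 = 32) and a placeholder where a spatial transformer module would align the second feature map; none of these values come from the patent.

```python
import torch

# Shapes follow the text: first feature map (h, w, c1), second (h, w, c2);
# PyTorch stores them as (N, C, H, W).
f1 = torch.randn(1, 64, 32, 32)            # first feature map, c1 = 64
f2 = torch.randn(1, 32, 32, 32)            # second feature map, c2 = 32
f2_aligned = f2                            # an assumed stn(f2) call would go here
f3 = torch.cat([f1, f2_aligned], dim=1)    # third feature map, c1 + c2 = 96 channels
print(f3.shape)                            # torch.Size([1, 96, 32, 32])
```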
Optionally, as shown in fig. 2, the improved mask-rcnn model includes two feature extraction layers, a region proposal network, and a region-of-interest pooling (RoIPooling) layer; the two feature extraction layers are a first feature extraction layer and a second feature extraction layer, used for feature extraction on the original image and the reference image respectively. For example, the object to be detected is a box, and a color image including two boxes is input to the first feature extraction layer to obtain a first feature map; the reference images corresponding to the two boxes are the same, namely the top surface image of the box, and this top surface image is input to a preset second feature extraction layer to obtain a second feature map. The first feature map and the second feature map are concatenated to obtain a third feature map; the third feature map is input into the region proposal network to obtain initial bounding boxes; the initial bounding boxes and the third feature map are input into the region-of-interest pooling layer, which outputs two bounding boxes corresponding to the two boxes, one per box. The bounding box corresponding to each box is the minimum rectangle enclosing that box, with its length and width parallel to the image boundary. In an embodiment, the region proposal network may be a network such as an RPN or R-CNN, used to determine the position coordinates of the object to be detected in the third feature map, that is, the initial bounding boxes corresponding to the object to be detected, of which there may be several. Inputting the third feature map and the several initial bounding boxes output by the region proposal network into the region-of-interest pooling layer refines the position coordinates of the object to be detected, yielding more accurate coordinates and thus the final bounding box of the object to be detected in the original image.
Optionally, determining the position coordinates of the key points of the object to be detected according to the bounding box includes:
intercepting the area corresponding to the bounding box to obtain a first image;
up-sampling the first image and the reference image to obtain a heat map comprising the key points of the object to be detected;
and determining the position coordinates of the key points according to the heat map.
Optionally, determining the position coordinates of the key points of the object to be detected according to the bounding box includes:
when the color image comprises N objects to be detected, extracting the N objects to be detected from the N bounding boxes corresponding to them respectively to obtain N first images, wherein each first image comprises one object to be detected, and N is a positive integer;
scaling the N first images to a first size to obtain N second images;
and determining the position coordinates of the key points of the objects to be detected according to the second images.
Optionally, as shown in fig. 3, when N is 2, one color image includes 2 boxes, and the 2 boxes are extracted from the 2 bounding boxes corresponding to them respectively to obtain 2 first images, each including one box; scaling the 2 first images to the same first size yields 2 second images, in which the box in one second image has the same size as the box in the other.
Optionally, as shown in fig. 4, the improved Unet network includes two feature extraction layers and an upsampling layer, and outputs four heat maps; the two feature extraction layers are a third feature extraction layer and a fourth feature extraction layer. The object to be detected is a box; when there are 2 second images, one second image comprising one box is input into the third feature extraction layer to obtain a fourth feature map. The reference images corresponding to the two boxes are the same, for example the top surface image of the box, and the top surface image corresponding to the second image is input to the fourth feature extraction layer to obtain a fifth feature map. The fourth feature map and the fifth feature map are concatenated to obtain a sixth feature map; the sixth feature map is input into the upsampling layer to obtain four heat maps; and from the four heat maps, the 2D coordinates of the four key points of this box are obtained. The other second image is processed in the same manner to obtain the 2D coordinates of the four key points of the other box. Each heat map comprises a Gaussian distribution centered on a key point, and the four key points correspond to the four corners of the top surface image of the box.
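The patent does not give the exact layer configuration, but a minimal sketch of an upsampling head that turns the concatenated feature map into four single-channel heat maps, one per box corner, might look as follows; the channel counts and the single transposed-convolution stage are assumptions.

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Sketch of the upsampling part: it takes the concatenated feature
    map (2c channels) and produces four heat maps, one per box corner.
    Channel counts and depth are illustrative, not the patent's Unet."""
    def __init__(self, in_channels=128):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_channels, 64, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 4, kernel_size=1),   # 4 heat maps, one per key point
        )

    def forward(self, fused_features):
        return self.up(fused_features)         # (N, 4, 2H', 2W'), spatially upsampled 2x
```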
Optionally, determining the position coordinates of the key points according to the heat maps includes:
obtaining, for each heat map, the coordinates of the point with the maximum pixel value in that heat map, and taking those coordinates as the position coordinates of the corresponding key point, wherein each heat map comprises a Gaussian distribution centered on a key point, and the key points correspond to the corners of the reference image.
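This argmax rule is straightforward to express; the following sketch assumes the heat map is a 2D NumPy array.

```python
import numpy as np

def heatmap_to_keypoint(heatmap):
    """Take the coordinates of the maximum-valued pixel as the key point
    position, as the text describes; sub-pixel refinement could be added
    but is not part of this sketch."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return x, y   # 2D position coordinates of the key point
```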
Optionally, determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image includes:
obtaining the position coordinates of a plurality of points in a quadrangle formed by the four key points according to the position coordinates of the four key points;
obtaining three-dimensional position information of a plurality of points in the quadrangle according to the position coordinates of the plurality of points in the quadrangle and the depth image;
and determining the three-dimensional position information of the four key points according to the three-dimensional position information of the plurality of points in the quadrangle and the position coordinates of the four key points.
Optionally, as shown in fig. 5, all points in the quadrilateral enclosed by the four key points are selected, and the 3D coordinates of each point in the quadrilateral are calculated by combining the camera parameters. A plane is then fitted with the RANSAC algorithm from these 3D coordinates: RANSAC randomly selects a subset of the points in the quadrilateral each time, computes a plane by the least squares method, and then computes the error between each remaining point and that plane. When enough points have an error within a preset threshold, a new plane is recalculated from those points; after the new plane is obtained, the 2D coordinates of the four key points are substituted into it, converting the 2D coordinates of the four key points into their 3D coordinates.
Optionally, determining three-dimensional position information of the four key points according to the three-dimensional position information of the plurality of points in the quadrangle and the position coordinates of the four key points includes:
performing plane fitting on the multiple points according to the three-dimensional position information of the multiple points in the quadrangle to obtain a fitting plane;
determining error values between each point of the plurality of points and the fitting plane according to the three-dimensional position information of the plurality of points and the fitting plane;
and when the number of points whose error values relative to the fitting plane are smaller than a preset threshold is not smaller than a preset number, obtaining the three-dimensional position information of the four key points according to the three-dimensional position information of the points whose error values relative to the fitting plane are smaller than the preset threshold and the position coordinates of the four key points.
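A sketch of this plane-fitting step, assuming the 3D points inside the quadrilateral are an (M, 3) NumPy array and that the key points' X and Y are already expressed in the same coordinate frame as the fitted plane; the iteration count, threshold and inlier count are illustrative, not values from the patent.

```python
import numpy as np

def fit_plane_ransac(points_3d, iters=100, threshold=0.005, min_inliers=50):
    """Fit a plane z = a*x + b*y + c to the 3D points in the quadrilateral
    by RANSAC, as the text describes: sample, fit by least squares, count
    points within the error threshold, then refit on the inliers."""
    best_inliers = None
    for _ in range(iters):
        sample = points_3d[np.random.choice(len(points_3d), 3, replace=False)]
        A = np.c_[sample[:, 0], sample[:, 1], np.ones(3)]
        coeffs, *_ = np.linalg.lstsq(A, sample[:, 2], rcond=None)
        errors = np.abs(points_3d[:, :2] @ coeffs[:2] + coeffs[2] - points_3d[:, 2])
        inliers = points_3d[errors < threshold]
        if len(inliers) >= min_inliers and (
                best_inliers is None or len(inliers) > len(best_inliers)):
            best_inliers = inliers
    if best_inliers is None:               # fall back to all points if nothing qualified
        best_inliers = points_3d
    A = np.c_[best_inliers[:, 0], best_inliers[:, 1], np.ones(len(best_inliers))]
    coeffs, *_ = np.linalg.lstsq(A, best_inliers[:, 2], rcond=None)
    return coeffs                          # (a, b, c)

def lift_keypoints(keypoints_xy, coeffs):
    """Substitute each key point's (X, Y) into the fitted plane to recover
    its Z, turning the 2D key points into 3D ones."""
    a, b, c = coeffs
    return np.array([[x, y, a * x + b * y + c] for x, y in keypoints_xy])
```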
Another image detection method is provided in the embodiment of the present application, a flowchart of the method is shown in fig. 6, and the method includes:
s201, acquiring an original image and a reference image, wherein the original image comprises a color image and a depth image.
Optionally, the reference image is the top surface image of the box.
S202, inputting the color image comprising N boxes and the reference image into a first feature extraction layer and a second feature extraction layer respectively to obtain a first feature image and a second feature image respectively, wherein N is a positive integer.
Optionally, the first feature extraction layer and the second feature extraction layer both convert an image into a feature map through a convolutional neural network; the first feature map represents the features of the color image and the second feature map represents the features of the reference image. The input of a feature extraction layer is a picture of size h × w × 3, where 3 refers to the 3 RGB channels, and the output is a feature map of size h' × w' × c. The first feature map and the second feature map are both h' × w' × c feature maps.
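As one concrete realization of such a feature extraction layer (the text later mentions resnet101 as a possible structure), the classification head of a torchvision ResNet-101 can be dropped to leave a backbone mapping an h × w × 3 picture to an h' × w' × c feature map; the torchvision usage and shapes below are a sketch, not the patent's implementation.

```python
import torch
import torchvision

# Drop the average-pool and fc layers of ResNet-101 to keep only the
# convolutional backbone. weights=None builds an untrained network;
# pretrained weights would normally be loaded in practice.
backbone = torchvision.models.resnet101(weights=None)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

image = torch.randn(1, 3, 512, 512)    # (N, 3, h, w) input picture
features = feature_extractor(image)    # (N, 2048, h/32, w/32) feature map
print(features.shape)                  # torch.Size([1, 2048, 16, 16])
```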
S203, concatenating the first feature map and the second feature map to obtain a third feature map.
Optionally, the first feature map and the second feature map are concatenated into an h' × w' × 2c feature map; the third feature map is this h' × w' × 2c feature map.
S204, inputting the third feature map into the region proposal network to obtain initial bounding boxes.
Optionally, the input of the region proposal network (RPN) is the h' × w' × 2c feature map, and the output is N initial bounding boxes, denoted [(x1, y1, h1, w1), ..., (xN, yN, hN, wN)]. For the i-th initial bounding box (xi, yi, hi, wi), with 1 ≤ i ≤ N, the four numbers describe a box whose upper-left corner is (xi, yi), whose length is hi and whose width is wi. N is a positive integer.
S205, inputting the initial bounding boxes and the third feature map into the region-of-interest pooling layer to obtain N bounding boxes.
Optionally, the input to the region-of-interest pooling layer is the N initial bounding boxes [(x1, y1, h1, w1), ..., (xN, yN, hN, wN)] and the h' × w' × 2c feature map, and the output is N bounding boxes [(x1', y1', h1', w1'), ..., (xN', yN', hN', wN')], whose positions are more accurate than those of the initial bounding boxes.
S206, extracting N first images according to the N bounding boxes, and scaling the N first images to the same size to obtain N second images of the same size.
Optionally, one color image includes N boxes, and the N boxes are respectively extracted from the N bounding boxes corresponding to them to obtain N first images, each comprising one box; the N first images are scaled to the same first size to obtain N second images, in which the boxes all have the same size.
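An OpenCV sketch of this crop-and-resize step; the (x, y, h, w) box layout follows the notation of S204 and S205, while the function name and the 256 × 256 target size are illustrative assumptions.

```python
import cv2

def crop_and_resize(color_image, boxes, size=(256, 256)):
    """Crop each detected box region and scale it to a common size.
    `boxes` holds (x, y, h, w) tuples in pixel coordinates; the target
    size is illustrative, not a value from the patent."""
    crops = []
    for (x, y, h, w) in boxes:
        patch = color_image[y:y + h, x:x + w]     # first image: the cropped region
        crops.append(cv2.resize(patch, size))     # second image: scaled to the first size
    return crops
```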
And S207, obtaining 2D coordinates of 4N key points of the N boxes according to the N second images with the same size.
Optionally, one of the N second images of the same size is input to the third feature extraction layer to obtain a fourth feature map; the reference image is input into the fourth feature extraction layer to obtain a fifth feature map; the fourth feature map and the fifth feature map are concatenated to obtain a sixth feature map; the sixth feature map is input into the upsampling layer to obtain four heat maps; and the 2D coordinates of the four key points of one box are obtained from the four heat maps. In this manner, the 2D coordinates of the 4N key points of the N boxes are obtained.
Optionally, the feature extraction layer here is the third or the fourth feature extraction layer. Its input is a picture of size h × w × 3, where 3 refers to the 3 RGB channels, and its output is a feature map of size h' × w' × c. The feature extraction layer may use a resnet101 network structure, and the fourth and fifth feature maps are both h' × w' × c feature maps. The fourth and fifth feature maps are concatenated into an h' × w' × 2c feature map, which is the sixth feature map. The input of the upsampling layer is the h' × w' × 2c feature map and the output is four h × w × 1 heat maps, each corresponding to one key point of a box. For the i-th heat map, the coordinates of the point with the largest pixel value are taken as the predicted key point coordinates (xi, yi); each heat map predicts one key point, so the 4 heat maps yield the 4 key point coordinates [(x1, y1), (x2, y2), (x3, y3), (x4, y4)] of one box, where i is any of 1, 2, 3 and 4.
S208, obtaining the 3D coordinates of the 4N key points according to the 2D coordinates of the 4N key points and the depth image.
Optionally, the 2D coordinates of all points in the quadrilateral formed by the four key points of one box are obtained from the 2D coordinates of those four key points; the 3D coordinates of all points in the quadrilateral are obtained from their 2D coordinates and the depth image; and the 3D coordinates of the four key points of the box are determined from the 3D coordinates of all points in the quadrilateral and the 2D coordinates of the four key points. The 3D coordinates of the 4N key points of the N boxes are obtained by the same processing.
Applying the embodiments of the present application has at least the following beneficial effect:
by introducing the reference image, accurate detection of the 3D position information of the boxes is achieved.
Example two
Based on the same inventive concept, the embodiment of the present application further provides an image detection apparatus, a schematic structural diagram of the apparatus is shown in fig. 7, and the image detection apparatus 30 includes a first processing module 301, a second processing module 302, a third processing module 303, and a fourth processing module 304.
A first processing module 301, configured to acquire an original image including an object to be detected and a reference image including the outer surface of the object to be detected, where the original image includes a color image and a depth image;
the second processing module 302 is configured to determine, according to the color image and the reference map, a corresponding bounding box of the object to be detected in the color image;
the third processing module 303 is configured to determine the position coordinates of the key points of the object to be detected according to the bounding box;
and the fourth processing module 304 is configured to determine three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
Optionally, the second processing module 302 is specifically configured to input the color image to the first feature extraction layer to obtain a first feature map; input the reference image into the second feature extraction layer to obtain a second feature map; and perform feature matching on the first feature map and the second feature map to determine the bounding box.
Optionally, the second processing module 302 is specifically configured to perform feature matching on the first feature map and the second feature map according to the spatial correspondence between them to obtain a third feature map; and superimpose the first feature map and the third feature map to determine the bounding box.
The third processing module 303 is specifically configured to intercept the area corresponding to the bounding box to obtain a first image; up-sample the first image and the reference image to obtain a heat map comprising the key points of the object to be detected; and determine the position coordinates of the key points according to the heat map.
The third processing module 303 is specifically configured to obtain, for each heat map, the coordinates of the point with the maximum pixel value and to use those coordinates as the position coordinates of the corresponding key point, where each heat map comprises a Gaussian distribution centered on a key point, and the key points correspond to the corners of the reference image.
The fourth processing module 304 is specifically configured to obtain the position coordinates of a plurality of points in a quadrangle formed by the four key points according to the position coordinates of the four key points; obtain the three-dimensional position information of the plurality of points in the quadrangle according to their position coordinates and the depth image; and determine the three-dimensional position information of the four key points according to the three-dimensional position information of the plurality of points in the quadrangle and the position coordinates of the four key points.
The fourth processing module 304 is specifically configured to perform plane fitting on the plurality of points according to their three-dimensional position information in the quadrangle to obtain a fitting plane; determine the error value between each of the plurality of points and the fitting plane according to the three-dimensional position information of the points and the fitting plane; and when the number of points whose error values relative to the fitting plane are smaller than a preset threshold is not smaller than a preset number, obtain the three-dimensional position information of the four key points according to the three-dimensional position information of the points whose error values are smaller than the preset threshold and the position coordinates of the four key points.
Applying the embodiments of the present application has at least the following beneficial effects:
an original image including an object to be detected and a reference image including the outer surface of the object to be detected are acquired, wherein the original image includes a color image and a depth image; a corresponding bounding box of the object to be detected in the color image is determined according to the color image and the reference image; the position coordinates of the key points of the object to be detected are determined according to the bounding box; and the three-dimensional position information of the object to be detected is determined according to the position coordinates of the key points and the depth image. Thus, by introducing the reference image, the position information of the object to be detected is accurately detected.
For the content that is not described in detail in the image detection apparatus provided in the embodiment of the present application, reference may be made to the image detection method provided in the first embodiment, and the beneficial effects that the image detection apparatus provided in the embodiment of the present application can achieve are the same as the image detection method provided in the first embodiment, and are not described herein again.
Example three
Based on the same inventive concept, the embodiments of the present application further provide an electronic device, a schematic structural diagram of which is shown in fig. 8. The electronic device 7000 includes at least one processor 7001, a memory 7002 and a bus 7003, with the at least one processor 7001 electrically connected to the memory 7002; the memory 7002 is configured to store at least one computer-executable instruction, and the processor 7001 is configured to execute the at least one computer-executable instruction so as to perform the steps of any image detection method provided in any embodiment or any alternative embodiment of the present application.
Further, the processor 7001 may be an FPGA (Field-Programmable Gate Array) or another device having logic processing capability, such as an MCU (Microcontroller Unit) or a CPU (Central Processing Unit).
Applying the embodiments of the present application has at least the following beneficial effects:
an original image including an object to be detected and a reference image including the outer surface of the object to be detected are acquired, wherein the original image includes a color image and a depth image; a corresponding bounding box of the object to be detected in the color image is determined according to the color image and the reference image; the position coordinates of the key points of the object to be detected are determined according to the bounding box; and the three-dimensional position information of the object to be detected is determined according to the position coordinates of the key points and the depth image. Thus, by introducing the reference image, the position information of the object to be detected is accurately detected.
Example four
Based on the same inventive concept, the embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the image detection method provided in any embodiment or any alternative embodiment of the present application.
The computer-readable storage medium provided by the embodiments of the present application includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROMs (Read-Only Memories), RAMs (Random Access Memories), EPROMs (Erasable Programmable Read-Only Memories), EEPROMs (Electrically Erasable Programmable Read-Only Memories), flash memories, magnetic cards, or optical cards. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
Applying the embodiments of the present application has at least the following beneficial effects:
an original image including an object to be detected and a reference image including the outer surface of the object to be detected are acquired, wherein the original image includes a color image and a depth image; a corresponding bounding box of the object to be detected in the color image is determined according to the color image and the reference image; the position coordinates of the key points of the object to be detected are determined according to the bounding box; and the three-dimensional position information of the object to be detected is determined according to the position coordinates of the key points and the depth image. Thus, by introducing the reference image, the position information of the object to be detected is accurately detected.
It will be understood by those within the art that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. Those skilled in the art will appreciate that the computer program instructions may be implemented by a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, implement the aspects specified in the block or blocks of the block diagrams and/or flowchart illustrations disclosed herein.
Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, other steps, measures, or schemes in various operations, methods, or flows that have been discussed in this application can be alternated, altered, rearranged, broken down, combined, or deleted. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.
The foregoing is only a partial embodiment of the present application, and it should be noted that, for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. An image detection method, comprising:
acquiring an original image comprising an object to be detected and a reference image comprising the outer surface of the object to be detected, wherein the original image comprises a color image and a depth image;
determining a corresponding bounding box of the object to be detected in the color image according to the color image and the reference image;
determining the position coordinates of the key points of the object to be detected according to the bounding box;
and determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
2. The method according to claim 1, wherein determining the bounding box corresponding to the object to be detected according to the color image and the reference image comprises:
inputting the color image to a first feature extraction layer to obtain a first feature map;
inputting the reference image into a second feature extraction layer to obtain a second feature map;
and performing feature matching on the first feature map and the second feature map, and determining the bounding box.
3. The method of claim 2, wherein the performing feature matching on the first feature map and the second feature map and determining the bounding box comprises:
performing feature matching on the first feature map and the second feature map according to the spatial correspondence between the first feature map and the second feature map to obtain a third feature map;
and superimposing the first feature map and the third feature map to determine the bounding box.
4. The method of claim 1, wherein determining the position coordinates of the key points of the object to be detected according to the bounding box comprises:
intercepting an area corresponding to the bounding box to obtain a first image;
up-sampling the first image and the reference image to obtain a heat map comprising key points of the object to be detected;
and determining the position coordinates of the key points according to the heat map.
5. The method of claim 4, wherein determining the position coordinates of the key points from the heat map comprises:
and obtaining, according to the heat maps, the coordinates of the point with the maximum pixel value in each heat map, and taking those coordinates as the position coordinates of the corresponding key point, wherein each heat map comprises a Gaussian distribution centered on a key point, and the key points correspond to the corners of the reference image.
6. The method according to claim 5, wherein determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image comprises:
obtaining the position coordinates of a plurality of points in a quadrangle formed by the four key points according to the position coordinates of the four key points;
obtaining three-dimensional position information of a plurality of points in the quadrangle according to the position coordinates of the plurality of points in the quadrangle and the depth image;
and determining the three-dimensional position information of the four key points according to the three-dimensional position information of the plurality of points in the quadrangle and the position coordinates of the four key points.
7. The method of claim 6, wherein determining the three-dimensional position information of the four key points according to the three-dimensional position information of the plurality of points in the quadrangle and the position coordinates of the four key points comprises:
performing plane fitting on the multiple points according to the three-dimensional position information of the multiple points in the quadrangle to obtain a fitting plane;
determining error values between each point of the plurality of points and the fitting plane according to the three-dimensional position information of the plurality of points and the fitting plane;
and when the number of points whose error values relative to the fitting plane are smaller than a preset threshold is not smaller than a preset number, obtaining the three-dimensional position information of the four key points according to the three-dimensional position information of the points whose error values relative to the fitting plane are smaller than the preset threshold and the position coordinates of the four key points.
8. An image detection apparatus, characterized by comprising:
the device comprises a first processing module, a second processing module and a third processing module, wherein the first processing module is used for acquiring an original image comprising an object to be detected and a reference image comprising the outer surface of the object to be detected, and the original image comprises a color image and a depth image;
the second processing module is used for determining a corresponding bounding box of the object to be detected in the color image according to the color image and the reference image;
the third processing module is used for determining the position coordinates of the key points of the object to be detected according to the bounding box;
and the fourth processing module is used for determining the three-dimensional position information of the object to be detected according to the position coordinates of the key points and the depth image.
9. An electronic device, comprising: a processor, a memory;
the memory for storing a computer program;
the processor is configured to execute the image detection method according to any one of claims 1 to 7 by calling the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the image detection method according to any one of claims 1 to 7.
CN201911266589.9A 2019-12-11 2019-12-11 Image detection method, device, equipment and readable storage medium Active CN111028283B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911266589.9A CN111028283B (en) 2019-12-11 2019-12-11 Image detection method, device, equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN111028283A 2020-04-17
CN111028283B 2024-01-12

Family

ID=70208797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911266589.9A Active CN111028283B (en) 2019-12-11 2019-12-11 Image detection method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111028283B (en)


Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102075689A (en) * 2009-11-24 2011-05-25 新奥特(北京)视频技术有限公司 Character generator for rapidly making animation
US20120089949A1 (en) * 2010-10-08 2012-04-12 Po-Lung Chen Method and computing device in a system for motion detection
WO2013017306A1 (en) * 2011-08-02 2013-02-07 Qatar Foundation Copy detection
US20150220810A1 (en) * 2012-08-23 2015-08-06 Nec Corporation Object identification apparatus, object identification method, and program
US20140118482A1 (en) * 2012-10-26 2014-05-01 Korea Advanced Institute Of Science And Technology Method and apparatus for 2d to 3d conversion using panorama image
CN107851314A (en) * 2015-07-27 2018-03-27 米其林集团总公司 For the method for the optimization for analyzing surface of tyre uniformity
CN107203962A (en) * 2016-03-17 2017-09-26 掌赢信息科技(上海)有限公司 The method and electronic equipment of a kind of pseudo- 3D rendering of utilization 2D picture makings
CN105894655A (en) * 2016-04-25 2016-08-24 浙江大学 Method for detecting and recognizing bill under complex environments based on RGB-D camera
US20180276885A1 (en) * 2017-03-27 2018-09-27 3Dflow Srl Method for 3D modelling based on structure from motion processing of sparse 2D images
CN107633247A (en) * 2017-08-16 2018-01-26 歌尔股份有限公司 The determination method and device of image-region
CN109961472A (en) * 2017-12-25 2019-07-02 北京京东尚科信息技术有限公司 Method, system, storage medium and the electronic equipment that 3D thermodynamic chart generates
CN108830150A (en) * 2018-05-07 2018-11-16 山东师范大学 One kind being based on 3 D human body Attitude estimation method and device
CN110070083A (en) * 2019-04-24 2019-07-30 深圳市微埃智能科技有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN110348524A (en) * 2019-07-15 2019-10-18 深圳市商汤科技有限公司 A kind of human body critical point detection method and device, electronic equipment and storage medium
CN110473259A (en) * 2019-07-31 2019-11-19 深圳市商汤科技有限公司 Pose determines method and device, electronic equipment and storage medium
CN110427917A (en) * 2019-08-14 2019-11-08 北京百度网讯科技有限公司 Method and apparatus for detecting key point

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Leilei; WANG Bin; ZHANG Liming: "Fast 3D Human Ear Extraction and Recognition", Journal of Computer-Aided Design & Computer Graphics, no. 10, pages 1438-1445 *

Also Published As

Publication number Publication date
CN111028283B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
CN111179324B (en) Object six-degree-of-freedom pose estimation method based on color and depth information fusion
US8902229B2 (en) Method and system for rendering three dimensional views of a scene
CN107248159A (en) A kind of metal works defect inspection method based on binocular vision
CN105069754B (en) System and method based on unmarked augmented reality on the image
CN109977952B (en) Candidate target detection method based on local maximum
CN111105452B (en) Binocular vision-based high-low resolution fusion stereo matching method
CN111739024B (en) Image recognition method, electronic device and readable storage medium
CN114998097A (en) Image alignment method, device, computer equipment and storage medium
CN116091574A (en) 3D target detection method and system based on plane constraint and position constraint
JP4649559B2 (en) 3D object recognition apparatus, 3D object recognition program, and computer-readable recording medium on which the same is recorded
CN111127358B (en) Image processing method, device and storage medium
CN110533713A (en) Bridge Crack width high-precision measuring method and measuring device
CN113379815A (en) Three-dimensional reconstruction method and device based on RGB camera and laser sensor and server
CN110874170A (en) Image area correction method, image segmentation method and device
CN110717910B (en) CT image target detection method based on convolutional neural network and CT scanner
JP2011002965A (en) Image retrieval method and device
CN111028283B (en) Image detection method, device, equipment and readable storage medium
CN113298755B (en) Method and device for rapidly detecting ecological environment change patch based on time sequence image
CN108805896A (en) A kind of range image segmentation method applied to urban environment
CN112802175B (en) Large-scale scene shielding and eliminating method, device, equipment and storage medium
Zhang et al. A novel algorithm for disparity calculation based on stereo vision
CN113126944A (en) Depth map display method, display device, electronic device, and storage medium
CN106777280B (en) Data processing method and device based on super large data set
CN116030450B (en) Checkerboard corner recognition method, device, equipment and medium
CN111260723A (en) Barycenter positioning method of bar and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant