CN109816724B - Three-dimensional feature extraction method and device based on machine vision


Info

Publication number
CN109816724B
CN109816724B (application number CN201811474153.4A)
Authority
CN
China
Prior art keywords
detected
image
position information
feature point
feature
Prior art date
Legal status
Active
Application number
CN201811474153.4A
Other languages
Chinese (zh)
Other versions
CN109816724A (en)
Inventor
沈震
熊刚
李志帅
彭泓力
郭超
董西松
商秀芹
王飞跃
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201811474153.4A priority Critical patent/CN109816724B/en
Publication of CN109816724A publication Critical patent/CN109816724A/en
Priority to PCT/CN2019/105962 priority patent/WO2020114035A1/en
Application granted granted Critical
Publication of CN109816724B publication Critical patent/CN109816724B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the field of machine vision, and specifically provides a three-dimensional feature extraction method and device based on machine vision. The invention aims to solve problems of three-dimensional model reconstruction in the prior art, such as a complex and time-consuming process and difficulty of popularization. To this end, the three-dimensional feature extraction method based on machine vision of the present invention comprises the steps of: acquiring multi-angle images containing preset feature points to be measured of a target object; extracting the position information of the feature points to be measured in each image; obtaining spatial position information of the feature points to be measured from their position information in each image; and calculating first distance information and/or second distance information corresponding to a certain feature point to be measured based on the spatial position information and a preset three-dimensional feature category. Images containing the feature points to be measured are acquired from different angles by machine vision so as to obtain their spatial position information, from which the distance information of the target object can be calculated.

Description

Three-dimensional feature extraction method and device based on machine vision
Technical Field
The invention belongs to the field of machine vision, and particularly relates to a three-dimensional feature extraction method and device based on machine vision.
Background
With the development of cloud manufacturing and cloud computing and the approach of "Industry 4.0", a social manufacturing model, i.e. a model of production customized for individual customers, has emerged. Social manufacturing is characterized by the ability to convert consumer requirements directly into products: grounded in social computing theory and supported by mobile internet technology, social media and 3D printing, members of society can fully participate, through crowdsourcing and similar modes, in the whole life-cycle manufacturing of a product, realizing a personalized, real-time and economical mode of production and consumption. That is, in social manufacturing, every consumer may participate in every stage of a product's full life cycle, including its design, manufacture and consumption. Taking shoemaking as an example, applying social manufacturing means that a user can customize shoes according to need, which in turn requires that the three-dimensional features of the user's foot shape be obtained simply, quickly and accurately.
However, traditional manual measurement yields only a few foot-shape parameters and cannot describe the foot shape accurately, and accurate measurements can be obtained only with professional shoemaking tools. To enable non-professionals to obtain accurate foot-shape parameters and thus realize personalized shoe customization, the present invention obtains the foot-shape parameters by model-building computation. Because arch height and the angle between the toes and the sole plane differ from person to person, obtaining only the two characteristic dimensions of foot length and foot width cannot accurately reflect the differences between individual feet of the same nominal size; the foot shape therefore needs three-dimensional model reconstruction to obtain accurate parameters. At present, foot-shape three-dimensional reconstruction can be performed with equipment such as laser three-dimensional scanners, but this approach is complex and time-consuming to operate, has high hardware cost and is difficult to popularize. A simpler three-dimensional modeling method is thus needed to obtain the foot-shape parameters accurately.
Accordingly, there is a need in the art for a new three-dimensional model reconstruction method that solves the above-mentioned problems.
Disclosure of Invention
In order to solve the above problems in the prior art, that is, the complexity, time consumption and difficulty of popularization of the existing three-dimensional model reconstruction process, a first aspect of the present invention discloses a three-dimensional feature extraction method based on machine vision, comprising the following steps: acquiring multi-angle images containing a reference object and preset feature points to be measured of a target object arranged relative to the reference object; extracting the position information of the feature points to be measured in each image; obtaining spatial position information of the feature points to be measured from their position information in each image; and calculating first distance information and/or second distance information corresponding to a certain feature point to be measured based on the spatial position information and a preset three-dimensional feature category. The first distance information is the distance between the certain feature point to be measured and other feature points to be measured, and the second distance information is the vertical distance between the certain feature point to be measured and a preset plane; the certain feature point to be measured, the other feature points to be measured and the plane all depend on the three-dimensional feature category.
In a preferred embodiment of the above three-dimensional feature extraction method based on machine vision, the step of "extracting the position information of the feature point to be measured in each image" includes: acquiring the pixel position of the characteristic point to be detected in a certain image by using a manual marking method; and extracting the corresponding pixel positions of the feature points to be detected in other images by using a preset feature point matching method and according to the acquired pixel positions.
In a preferred embodiment of the above three-dimensional feature extraction method based on machine vision, the step of "extracting the position information of the feature point to be measured in each image" includes: acquiring the area shape corresponding to the area where the characteristic point to be detected in the target object is located; acquiring a region to be detected corresponding to each image according to the region shape; and acquiring the position information of the feature point to be detected in each image according to the relative position between the feature point to be detected and the shape of the region and each region to be detected.
In a preferred embodiment of the above three-dimensional feature extraction method based on machine vision, the step of "extracting the position information of the feature point to be measured in each image" includes: acquiring the position information of the feature point to be detected in each image by utilizing a pre-constructed neural network; the neural network is a deep neural network which is based on a preset training set and trained by using a deep learning correlation algorithm.
In a preferred embodiment of the three-dimensional feature extraction method based on machine vision, the step of "obtaining spatial location information of the feature point to be measured according to location information of the feature point to be measured in each image" includes: and acquiring the Euclidean position of the feature point to be detected by using a triangulation method according to the position information of the feature point to be detected in each image and the internal and external parameters of the camera.
In a preferred embodiment of the three-dimensional feature extraction method based on machine vision, the step of "obtaining spatial location information of the feature point to be measured according to location information of the feature point to be measured in each image" includes: constructing a sparse model by using an incremental SFM method and the position information of the characteristic point to be detected in each image, and calculating the spatial position information of the characteristic point to be detected in a world coordinate system by using a triangulation method; and restoring the spatial position information of the characteristic point to be detected in the above steps in the world coordinate system by using the scale coefficient obtained in advance to obtain the real position of the characteristic point to be detected.
In a preferred technical solution of the three-dimensional feature extraction method based on machine vision, before "restoring, with the scale coefficient obtained in advance, the spatial position information of the feature point in the world coordinate system obtained in the above step to obtain the true position of the feature point to be measured", the method further comprises: obtaining the coordinates of the vertices of the reference object in the world coordinate system with the sparse model and from the pixel positions of the vertices in the camera coordinate system, noting that these vertex coordinates differ from the true spatial position in the world coordinate system by a scale coefficient λ; and calculating the scale coefficient λ from the coordinates of a vertex of the reference object in the world coordinate system and the true spatial position of that vertex.
In a preferred embodiment of the above three-dimensional feature extraction method based on machine vision, the triangularization method includes: and acquiring the projective space position of the feature point to be detected according to the internal and external parameters of the camera and the position information of the feature point to be detected in each image, and performing homogenization processing on the projective space position to obtain the Euclidean space position of the feature point to be detected.
The technical solution of the present invention acquires images of a target object from different angles, extracts the positions of the feature points to be measured in the images, calculates the spatial positions of the feature points in the world coordinate system by triangulation or by solving a sparse reconstruction problem, and then calculates the first distance information and/or second distance information between feature points from the calculated spatial positions. This three-dimensional feature extraction method can quickly determine the three-dimensional feature points of the target object from multi-angle images acquired with ordinary photographing equipment and then calculate the distance information of the target object; it requires no costly, operationally complex hardware such as a laser three-dimensional scanner, and it simplifies the three-dimensional reconstruction process.
In the preferred technical solutions of the invention, the pixel position of the feature point to be measured in each image is determined by manual marking or by an automatic method, where the automatic method either finds the region to be detected in each image according to the shape of the region containing the feature point, or uses a pre-constructed neural network, to obtain the feature point's position in each image. The camera parameters are then calibrated automatically with a reference object, and the true spatial position of the feature point is obtained by triangulation or by solving a sparse reconstruction problem, so that model reconstruction of the whole target object is unnecessary, reducing computation and simplifying the model-building process. Finally, the distance information corresponding to the feature points is calculated from their true spatial positions and the preset three-dimensional feature category.
A second aspect of the invention provides a storage device storing a plurality of programs adapted to be loaded by a processor to perform the machine vision based three-dimensional feature extraction method of any of the preceding claims.
It should be noted that the storage device has all the technical effects of the foregoing three-dimensional feature extraction method based on machine vision, and details are not repeated here.
The third aspect of the present invention also provides a control apparatus comprising a processor and a storage device, wherein the storage device is adapted to store a plurality of programs, and the programs are adapted to be loaded by the processor to perform the machine vision-based three-dimensional feature extraction method according to any one of the preceding claims.
It should be noted that the control device has all the technical effects of the aforementioned three-dimensional feature extraction method based on machine vision, and details are not repeated herein.
Drawings
The three-dimensional feature extraction method based on machine vision of the present invention is described below with reference to the accompanying drawings, taking a foot shape as an example. In the drawings:
FIG. 1 is a flow chart of the main steps of the machine-vision-based foot-shape three-dimensional feature extraction method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of detecting feature points with a generalized Hough transform using a circle as the template in the machine-vision-based foot-shape three-dimensional feature extraction method of an embodiment of the present invention;
FIG. 3 is a schematic diagram of detecting feature points with a generalized Hough transform using a circle as the template, from another viewing angle;
FIG. 4 is a schematic diagram of detecting feature points with a generalized Hough transform using a circle as the template, from a further viewing angle;
FIG. 5 is a schematic diagram of detecting the reference object with a generalized Hough transform using a straight line as the template in the machine-vision-based foot-shape three-dimensional feature extraction method of an embodiment of the present invention;
FIG. 6 is a schematic diagram of the process of solving the spatial position information of feature points by triangulation in the machine-vision-based foot-shape three-dimensional feature extraction method of an embodiment of the present invention;
FIG. 7 is a schematic diagram of the process of solving the spatial position information of feature points by sparse reconstruction in the machine-vision-based foot-shape three-dimensional feature extraction method of an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments only explain the technical principle of the present invention and are not intended to limit its scope. For example, although the invention is described with reference to a foot shape, it may also be applied to other objects that are modeled before being turned into a product, such as clothing. Likewise, although the invention is described with A4 paper as the reference object, other objects of known dimensions (such as floor tiles) may be used. Those skilled in the art can adjust the method as needed to suit particular applications.
It should be noted that the terms "first", "second" and "third" in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The following describes a method for extracting three-dimensional features of a foot shape based on machine vision according to the present invention with reference to the accompanying drawings.
In a specific embodiment of the invention, the extraction and calculation of the three-dimensional foot-shape parameters is converted into determining the spatial positions of the corresponding feature points, after which the parameters of the foot shape to be measured are calculated with the Euclidean distance formula. The basic foot-shape parameters obtainable in this way include the parameter information required for shoemaking, such as foot length, foot circumference, instep-girth height, the height of the upper bending point of the arch, foot width, big-toe height, heel-convex-point height, and the height of the center point of the outer ankle bone. A possible implementation of the machine-vision-based foot-shape three-dimensional feature extraction method is described below, taking the three parameters of foot length, foot width and ankle-point height as an example.
Referring to fig. 1, fig. 1 exemplarily shows main steps of a foot-shaped three-dimensional feature extraction method based on machine vision in an embodiment of the present invention, and the foot-shaped three-dimensional feature extraction method based on machine vision in the present invention may include the following steps:
and S100, acquiring a multi-angle image containing the preset feature points to be detected of the target object.
Specifically, the foot is placed on a sheet of A4 paper, and images of the foot shape at multiple angles are taken with a mobile photographing device such as a camera, so that the foot shape is fully captured and enough feature points to be measured are obtained: for example, the vertex of the longest toe and the heel convex point, for calculating the length of the foot shape to be measured; the outer point of the ball of the big toe and the outer point of the little-toe root, for calculating the width; and the ankle point, for calculating the ankle-point height. It should be noted that at least three images of the foot shape should be captured, and the more images contain a given feature point to be measured, the more accurate the foot-shape parameters calculated from that feature point.
Step S200: extracting the position information of the feature point to be measured in each image.
Specifically, in a preferred embodiment of this embodiment, the three-dimensional feature extraction method shown in fig. 1 may obtain the pixel position (x, y) of the feature point to be detected in each image according to the following steps, specifically:
firstly, the pixel position of the Feature Point to be detected in a certain image is marked manually, and then the corresponding pixel position of the Feature Point to be detected in other images is found by using a Feature Point matching method, such as Scale Invariant Feature Transform (SIFT) or Iterative Closest Point (ICP). Taking the measurement of the height of the ankle as an example, an image including the ankle point is selected, the pixel position of the ankle point in the image is manually marked, and then the corresponding pixel position of the ankle point in the images of other angles including the ankle point is found by using a feature point matching method such as SIFT or ICP. By the method, the corresponding pixel positions of the characteristic points to be detected in all the images can be quickly found without manually marking the characteristic points on each image, and the efficiency of acquiring the pixel positions of the characteristic points is improved.
Optionally, in another preferred embodiment of this embodiment, the three-dimensional feature extraction method shown in fig. 1 may further obtain a pixel position (x, y) of the feature point to be detected in each image according to the following steps, specifically:
and according to the uniqueness of the shape of the region where the feature point to be detected is located, detecting the specific shape by using a feature detection method such as generalized Hough transform so as to determine the position information of the feature point to be detected in each image. Specifically, the method comprises the steps of firstly determining the area shape corresponding to the area where the feature point to be detected is located, then automatically finding the corresponding area to be detected of the feature point to be detected in each image according to the area shape and by utilizing generalized Hough transform, and then obtaining the position information of the feature point to be detected in each image according to the relative position between the feature point to be detected and the area shape and the area to be detected in each image. In the following, a possible implementation is described by taking a circle as a template and finding feature points by using a generalized hough transform as an example.
Referring to figs. 2, 3 and 4, which are schematic diagrams of detecting feature points with the generalized Hough transform using a circle as the template in the machine-vision-based foot-shape three-dimensional feature extraction method of an embodiment of the present invention: figs. 2, 3 and 4 show, from different angles, specific implementations of finding the ankle point with the circle used as the template. As shown in the figures, the ankle region around the ankle center is circular, and this circular outline is unique within the foot shape. Therefore, applying the generalized Hough transform with the circle as the template automatically finds the circular position in the image (the dashed circle templates in figs. 2-4); that position is where the ankle lies, and its center, point G, is the pixel position in the image of the ankle feature point to be measured.
It can be understood that, when determining the position information of the vertex of the longest toe, the outline of the longest toe can be used as a template of the generalized hough transform, a search is performed in the image, and after finding the outline of the toe, the pixel position of the feature point is determined by the relative position of the outline and the vertex of the longest toe.
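Purely as an illustrative sketch of this search, the following Python snippet uses OpenCV's circle Hough transform (a simplification of the generalized Hough transform described above); the function name find_ankle_point and all radius and threshold parameters are assumptions, not values from the patent:

```python
import cv2

def find_ankle_point(image_bgr):
    """Locate the circular ankle region and return its center G
    as the estimated pixel position of the ankle point."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)  # suppress noise before edge voting
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2,
                               minDist=100, param1=100, param2=30,
                               minRadius=15, maxRadius=60)
    if circles is None:
        return None
    x, y, r = circles[0][0]          # strongest circle: the ankle contour
    return (float(x), float(y))      # center G = ankle-point pixel position
```

For non-circular templates such as the toe outline, OpenCV also offers a generalized variant (createGeneralizedHoughBallard) that votes with an arbitrary edge template.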
Optionally, in another preferred embodiment of this embodiment, the three-dimensional feature extraction method shown in fig. 1 may further obtain a pixel position (x, y) of the feature point to be detected in each image according to the following steps, specifically:
and constructing a deep neural network based on data samples of the characteristic points of the foot type marked by enough quantity and by using a deep learning algorithm, and then acquiring the position information of the characteristic points to be detected in each image by using the neural network. Specifically, when the neural network is trained, image data containing the feature point to be detected is input, the pixel position (x, y) of the feature point to be detected in the image is output, wherein the output comprises real output and expected output, the real output of the last full-connection layer of the network is the pixel position (x, y) of the feature point to be detected in the image, and the expected output of the network is the marked actual pixel position of the feature point to be detected in the image. And then reversely training the whole network by utilizing an error generated by the real output and the expected output of the network, iteratively training until the network is converged, inputting a certain image to be tested containing the characteristic point to be tested after the neural network is trained, and automatically outputting the pixel position of the neural network in the image by the neural network. Taking the pixel position of the ankle point as an example, selecting a sufficient amount of image samples marked with the ankle point as a training set, constructing a deep neural network, then training the deep neural network by using the training set, inputting an image to be tested containing the ankle point after the training is finished, and automatically outputting the pixel position of the ankle point in the image by using the neural network. It can be understood that when the pixel positions of other feature points are determined, the image data samples corresponding to the feature points are used for training a pre-constructed deep neural network, and then the image to be measured containing the feature points is input, so that the pixel positions of the feature points in the image are obtained.
Step S300: acquiring the spatial position information of the feature point to be measured according to its position information in each image.
Specifically, in a preferred embodiment of this embodiment, the three-dimensional feature extraction method shown in fig. 1 may obtain spatial position information of the feature point to be measured according to the following steps:
firstly, calibrating camera parameters by using a reference object, and then calculating the spatial position information of the characteristic point to be measured by using a triangulation method. Specifically, taking a sheet of a4 as a reference, a foot shape is placed on a sheet of a4, and a plurality of images at different angles, including the outline of the sheet of a4, are acquired by an imaging device such as a camera. And calibrating the camera by using the images at different angles, and determining an internal parameter matrix K of the camera, and a rotation matrix R and a translation matrix t of external parameters relative to a world coordinate system. And then, according to the pixel position (X, Y) of the feature point to be measured in the image obtained in the step S200, and by utilizing a triangulation method and a homogenization method, the spatial position information (X, Y, Z) of the feature point to be measured in a world coordinate system is solved. A possible implementation of obtaining the true spatial position of the feature point by the triangulation method is described below with reference to fig. 5 and 6.
Referring to fig. 5, fig. 5 is a schematic diagram of detecting the reference object with a generalized Hough transform using a straight line as the template, in the machine-vision-based foot-shape three-dimensional feature extraction method of an embodiment of the present invention. As shown in fig. 5, the edge lines of the A4 paper in the image are detected with a straight-line template using a randomized Hough transform. Four edge lines are detected; they intersect pairwise, and the intersections are the pixel positions \((x_i, y_i)\), \(i = 1, 2, 3, 4\), of the four vertices (A, B, C, D) of the A4 paper. With continued reference to figs. 2, 3 and 4, the following relationship between point A in Euclidean space and in projective space follows from the geometry of spatial transformations:

\[
s \begin{pmatrix} x_A \\ y_A \\ 1 \end{pmatrix}
= K \left[\, r_1 \;\; r_2 \;\; r_3 \;\middle|\; t \,\right]
\begin{pmatrix} X_A \\ Y_A \\ 0 \\ 1 \end{pmatrix}
= K \left[\, r_1 \;\; r_2 \;\middle|\; t \,\right]
\begin{pmatrix} X_A \\ Y_A \\ 1 \end{pmatrix}
\tag{1}
\]

In equation (1), K, R and t are the camera intrinsic parameter matrix and the rotation and translation matrices of the camera relative to the world coordinate system ([R | t] together is called the camera extrinsic parameter matrix); the symbol "|" denotes an augmented matrix, and \(r_1, r_2, r_3\) are the columns of the rotation matrix R. Because the A4 plane is taken as Z = 0, the column \(r_3\) multiplies the zero element in the matrix product and is eliminated.

Here \((x_A, y_A, 1)^T\) is the homogeneous pixel position of vertex A of the A4 paper, \((X_A, Y_A, Z_A)^T\) is its true position in the world coordinate system, and K[R|t] collects the intrinsic and extrinsic camera parameters. The homography matrix \(H = K[\,r_1 \; r_2 \;|\; t\,]\) has 8 degrees of freedom. The world coordinate system is established at vertex A of the A4 paper, so the world coordinates of the four vertices are (0, 0, 0), (X, 0, 0), (0, Y, 0) and (X, Y, 0), with X = 210 mm and Y = 297 mm. Writing each vertex in the form of equation (1) yields two linear equations, so the four vertices yield 8 linear equations, from which H is solved by the Direct Linear Transform (DLT) method.
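To make the DLT step concrete, here is a minimal numpy sketch (illustrative only; homography_dlt is a hypothetical helper name) that stacks the 8 linear equations contributed by the four A4 vertices and takes the null-space vector via SVD:

```python
import numpy as np

def homography_dlt(pix, world):
    """Solve H (up to scale) from 4 point pairs: pix[i] = (x_i, y_i),
    world[i] = (X_i, Y_i) on the Z = 0 plane of the A4 sheet."""
    A = []
    for (x, y), (X, Y) in zip(pix, world):
        # Each correspondence contributes two rows of the system A h = 0.
        A.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y, -x])
        A.append([0, 0, 0, X, Y, 1, -y * X, -y * Y, -y])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)   # null-space vector = h
    return H / H[2, 2]         # fix the arbitrary scale

# The four A4 vertices in world coordinates (mm), origin at vertex A:
world = [(0, 0), (210, 0), (0, 297), (210, 297)]
```

In practice cv2.findHomography(world_pts, pixel_pts) would return the same matrix up to numerical differences.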
The three photographs are taken from different angles, so the camera pose differs for each; by the same method, the three homography matrices \(H_1, H_2, H_3\) of the camera relative to the world coordinate system can be obtained.

K can be determined from a homography matrix H. Since \(H = [\,h_1 \; h_2 \; h_3\,] = K[\,r_1 \; r_2 \;|\; t\,]\), it follows that

\[
K^{-1}[\,h_1 \;\; h_2 \;\; h_3\,] = [\,r_1 \;\; r_2 \;\middle|\; t\,]
\tag{2}
\]

In equation (2), \(K^{-1}\) is the inverse of the camera intrinsic parameter matrix; \(r_1, r_2\) and t are the first two columns of the rotation matrix and the translation of the camera relative to the world coordinate system; and \(h_1, h_2, h_3\) are the columns of the homography matrix obtained from one image.

Since \(R = [\,r_1 \; r_2 \; r_3\,]\) is a rotation matrix, its columns are orthonormal: \(r_1^T r_2 = 0\) and \(\|r_1\| = \|r_2\| = 1\). Hence \(h_1^T K^{-T} K^{-1} h_2 = 0\), and further:

\[
h_1^T K^{-T} K^{-1} h_1 = h_2^T K^{-T} K^{-1} h_2
\tag{3}
\]

In equation (3), \(K^{-T}\) and \(K^{-1}\) are the transposed inverse and the inverse of the camera intrinsic parameter matrix, and \(h_1, h_2\) are the first two columns of an image's homography matrix; each image thus contributes two constraint equations on the camera intrinsic parameters.

The camera intrinsic parameter matrix K is an upper triangular matrix, so \(w = K^{-T}K^{-1}\) is a symmetric matrix. From the images at the three different angles in figs. 2, 3 and 4, w is solved linearly by DLT, and K is then recovered from w by orthogonal decomposition. By equation (2), \([\,r_1 \; r_2 \;|\; t\,] = K^{-1}[\,h_1 \; h_2 \; h_3\,]\); combining the solved \(h_1, h_2, h_3\) with K yields \(r_1, r_2\) and t. From the orthogonality of the rotation matrix, \(r_3 = r_1 \times r_2\), so \(R = [\,r_1 \; r_2 \; r_3\,]\). In this way the intrinsic and extrinsic camera parameters \(K[R_1|t_1]\), \(K[R_2|t_2]\) and \(K[R_3|t_3]\) at the moments figs. 2, 3 and 4 were taken are obtained.
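A sketch of this intrinsic-parameter recovery under the same assumptions (three homographies in; K and the per-image [R|t] out). The Cholesky factorization below plays the role of the orthogonal decomposition mentioned in the text, and the helper names are hypothetical:

```python
import numpy as np

def v_ij(H, i, j):
    """Row of the linear system on the symmetric matrix w = K^-T K^-1."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0]*hj[0],
                     hi[0]*hj[1] + hi[1]*hj[0],
                     hi[1]*hj[1],
                     hi[2]*hj[0] + hi[0]*hj[2],
                     hi[2]*hj[1] + hi[1]*hj[2],
                     hi[2]*hj[2]])

def intrinsics_from_homographies(Hs):
    """Each H gives the two constraints h1^T w h2 = 0 and
    h1^T w h1 = h2^T w h2 (equation (3)); solve w, then K."""
    V = []
    for H in Hs:
        V.append(v_ij(H, 0, 1))
        V.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))
    _, _, Vt = np.linalg.svd(np.asarray(V))
    a, b, c, d, e, f = Vt[-1]
    w = np.array([[a, b, d], [b, c, e], [d, e, f]])
    if w[0, 0] < 0:               # fix overall sign so w is positive definite
        w = -w
    L = np.linalg.cholesky(w)     # w = L L^T, L lower triangular
    K = np.linalg.inv(L.T)        # since w = K^-T K^-1, K = (L^T)^-1
    return K / K[2, 2]

def pose_from_homography(K, H):
    """Extrinsics for one image: [r1 r2 | t] = K^-1 [h1 h2 h3]."""
    B = np.linalg.inv(K) @ H
    lam = 1.0 / np.linalg.norm(B[:, 0])   # normalize so ||r1|| = 1
    r1, r2, t = lam * B[:, 0], lam * B[:, 1], lam * B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])  # r3 = r1 x r2
    return R, t
```

With noisy data the recovered R is only approximately orthogonal; a projection onto the nearest rotation matrix (e.g. via SVD) would normally follow.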
Referring to fig. 6, fig. 6 is a schematic diagram of the process of solving the spatial position of a feature point by triangulation in the machine-vision-based foot-shape three-dimensional feature extraction method of an embodiment of the present invention. As shown in fig. 6, taking the ankle point G in figs. 3 and 4 (i.e. Image1 and Image2) as an example, triangulation starts from the pixel positions \(x_1\) and \(x_2\) of the ankle point G in Image1 and Image2 obtained in step S200 and from the camera parameters \(P_1 = K_1[R_1|t_1]\) and \(P_2 = K_2[R_2|t_2]\) obtained in the preceding steps, and minimizes the sum of squared reprojection errors, \(\min \sum_i \|P_i X - x_i\|^2\), to obtain the position X = (M, N, O, w) of the feature point in projective space. Here \(P_1\) and \(P_2\) are the intrinsic and extrinsic parameters of the camera for Image1 and Image2 obtained by the calibration method above, \(K_1, K_2\) the corresponding intrinsic parameter matrices, and \(R_1, R_2\) and \(t_1, t_2\) the rotation and translation matrices relative to the world coordinate system when the two images were taken. Finally, homogenizing the projective coordinates yields the Euclidean-space position of feature point G, X = (M/w, N/w, O/w) = (X, Y, Z), where M, N, O, w are the coordinates of feature point G in projective space.
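As a minimal sketch of this triangulation step (OpenCV assumed; triangulate_point is a hypothetical helper, and cv2.triangulatePoints implements the linear DLT variant rather than the reprojection-error minimization described above, which is a simplification):

```python
import cv2
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Recover the Euclidean position of one feature point from its
    pixel positions x1, x2 in two views with 3x4 projection matrices
    P1 = K1 [R1|t1] and P2 = K2 [R2|t2]."""
    pts1 = np.asarray(x1, dtype=float).reshape(2, 1)
    pts2 = np.asarray(x2, dtype=float).reshape(2, 1)
    X_proj = cv2.triangulatePoints(P1, P2, pts1, pts2)  # (M, N, O, w)^T
    X = X_proj[:3, 0] / X_proj[3, 0]    # homogeneous normalization
    return X                            # (X, Y, Z) in world coordinates
```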
Optionally, in another preferred embodiment of this embodiment, the three-dimensional feature extraction method shown in fig. 1 may further obtain the real spatial position of the feature point to be detected according to the following steps, specifically:
and (3) converting the three-dimensional reconstruction problem into the sparse reconstruction problem of the characteristic points to be detected, such as constructing a sparse model by using an incremental SFM method and solving the sparse reconstruction problem by using a triangulation method. Specifically, according to the pixel position (X, Y) of the feature point to be measured in the multiple images obtained in step S200, unlike the previous embodiment, the incremental SFM method is used to directly solve the intra-camera parameter matrix K, the camera rotation matrix R, the translation t with respect to the world coordinate, and the coordinate λ (X, Y, Z) of the feature point to be measured in the world coordinate system, omitting the process of labeling the camera with the reference object, and then determining the scale coefficient λ using the reference object with known specifications, thereby obtaining the real spatial position coordinate (X, Y, Z) of the feature point. A possible implementation of solving the sparse reconstruction problem by using the incremental SFM method is described below with reference to fig. 7, which takes 3 images from different angles as an example.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a process of solving feature point spatial position information in a sparse reconstruction process of a foot-shaped three-dimensional feature extraction method based on machine vision in the embodiment of the present invention. As shown in fig. 7, the step of solving the sparse reconstruction problem by using the incremental SFM method specifically includes:
step 1: two images, Image1 and Image2, were randomly picked out of 3 images at different angles to determine an initial Image pair, and initial values of internal and external parameters [ R | t ] of cameras capturing the images, Image1 and Image2, were calculated by an incremental SFM method]Matrix: by using 5 sets of feature point pairs (longest toe apex and heel convex point, thumb ball outer side point and tail toe root outer side point, ankle point) in images Image1 and Image2, essence matrix E corresponding to images Image1 and Image2 is calculated by using 5-point method1And E2Wherein E ═ R | t]The camera rotation matrix R can be decomposed from the essential matrix E1、R2And translation t relative to world coordinates1、t2And (4) matrix. Then, combining the pixel positions of the feature point to be measured in the camera coordinate system obtained in the step S200 in the images Image1 and Image2 to construct an initial sparse model;
step 2: according to the initial sparse model constructed in the step 1, and a triangulation method is utilized to calculate the position coordinates lambda (X) of the feature point to be measured under the world coordinate system in the images Image1 and Image21,Y1,Z1) And λ (X)2,Y2,Z2);
Step 3: the pixel positions of the feature points of Image3 in the camera coordinate system obtained in step S200 are input into the initial sparse model obtained in step 2 to acquire the camera intrinsic and extrinsic parameter matrices [R|t], i.e. the camera rotation matrix R3 and translation t3 relative to world coordinates, and the initial sparse model is corrected with these camera parameters;
and 4, step 4: according to the sparse model corrected in the step 3, a triangulation method is used for calculating a space position coordinate lambda (X) of the characteristic point to be measured in the world coordinate system in the Image33,Y3,Z3);
Step 5: the feature-point position coordinates obtained in steps 2 and 4 are corrected with a Bundle Adjustment (BA) method to obtain an optimized sparse model.
Step 5 is repeated with the feature-point coordinates obtained from the remaining images, bundle-adjusting each time, until the error between the coordinates λ(X, Y, Z) of the feature point to be measured obtained in two successive computations is at or below a preset threshold.
Although only one specific implementation, solving the spatial position information of the feature points from three images with the incremental SFM method, is provided here, those skilled in the art will understand that the incremental SFM method can equally handle images from many different angles: while constructing the sparse model, the pixel positions of the feature points in each new image in the camera coordinate system are substituted in repeatedly, the camera intrinsic and extrinsic parameters are re-acquired, and the sparse model is corrected with them, until all acquired images have been added to the sparse model. It can be understood that the more viewing angles are acquired and the more iterations are computed, the more accurate the obtained camera parameters are, and the more accurate the spatial position information of the feature points in the world coordinate system calculated from the sparse model constructed in this way.
Step 6: with point A in fig. 4 as the origin of coordinates, the spatial coordinates of vertex D of the A4 paper, computed with the sparse model obtained in step 5 from the pixel position of vertex D in the camera coordinate system obtained in step S200, are (M, N, 0), whereas the true spatial position of vertex D is (210 mm, 297 mm, 0). The scale coefficient is therefore λ = M/210 (equivalently N/297, with the A4 dimensions in mm). Dividing the spatial coordinates λ(X, Y, Z) of the feature point to be measured in the world coordinate system obtained in step 5 by the scale coefficient λ then yields its true spatial position (X, Y, Z).
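As an illustration of steps 1 and 6 only (OpenCV assumed; the helper names are hypothetical, the intrinsics K are taken as known here whereas the text describes solving them as well, and a full incremental SFM pipeline would add feature matching, PnP registration of new images and bundle adjustment):

```python
import cv2
import numpy as np

def initialize_pair(pts1, pts2, K):
    """Step 1 sketch: estimate the essential matrix E of the initial
    image pair with the 5-point method and decompose it into R, t.
    pts1, pts2: float arrays of shape (N, 2), N >= 5 correspondences."""
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # first camera at origin
    P2 = K @ np.hstack([R, t])                         # second camera pose
    return P1, P2

def apply_scale(points_model, vertex_D_model):
    """Step 6 sketch: recover metric scale from vertex D of the A4 sheet.
    points_model holds the up-to-scale coordinates lambda*(X, Y, Z)."""
    lam = vertex_D_model[0] / 210.0   # true X extent of A4 paper is 210 mm
    return np.asarray(points_model) / lam   # true positions in mm
```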
Step S400: calculating first distance information and/or second distance information corresponding to a certain feature point to be measured based on the spatial position information and the preset three-dimensional feature category.
It should be noted that the first distance information is distance information between a certain feature point to be measured and other feature points to be measured, such as length, and the second distance information is vertical distance information between the certain feature point to be measured and a preset plane, such as height.
Specifically, taking the foot shape as an example, the spatial positions of the five feature points to be measured calculated in step S300 are obtained: the longest-toe vertex \((X_1, Y_1, Z_1)\), the heel convex point \((X_2, Y_2, Z_2)\), the outer point of the ball of the big toe \((X_3, Y_3, Z_3)\), the outer point of the little-toe root \((X_4, Y_4, Z_4)\) and the ankle point \((X_5, Y_5, Z_5)\). Using a distance formula such as the Euclidean distance formula

\[
d = \sqrt{(X_a - X_b)^2 + (Y_a - Y_b)^2 + (Z_a - Z_b)^2},
\]

the following can be calculated:

\[
\begin{aligned}
L &= \sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2 + (Z_1 - Z_2)^2}, \\
W &= \sqrt{(X_3 - X_4)^2 + (Y_3 - Y_4)^2 + (Z_3 - Z_4)^2}, \\
H &= Z_5.
\end{aligned}
\tag{4}
\]

Parameters L, W and H in equation (4) are the foot length, foot width and ankle height, respectively (the ankle height being the vertical distance from the ankle point to the A4 plane Z = 0).
Thus, the three parameters of foot length, foot width and ankle-point height are obtained. Although only one specific embodiment, calculating foot length, foot width and ankle-point height from the extracted three-dimensional feature points, is provided here, it will be understood by those skilled in the art that the three-dimensional feature extraction method of the present invention can also calculate other foot-shape parameters. For example, to calculate instep height, the images at the different angles must all contain the instep-point feature, and the instep height is then calculated by following in turn the steps of the three-dimensional feature extraction method described in the embodiments above.
In summary, in a preferred technical solution of the present invention, an image capturing device acquires images at different angles containing the five feature points to be measured (the longest-toe vertex, the heel convex point, the outer point of the ball of the big toe, the outer point of the little-toe root and the ankle point). The pixel position of each feature point in each image is determined by manual marking or by an automatic method, and the true spatial positions of the feature points are then obtained either by calibrating the camera parameters with the reference object followed by triangulation, or by solving the sparse reconstruction problem. Model reconstruction of the whole object is therefore unnecessary, which reduces the amount of computation and simplifies the model-building process. Finally, based on the spatial positions of the five feature points, the three parameters of foot length, foot width and ankle-point height are calculated with the Euclidean distance formula. By analogy, acquiring images of other feature points at different angles allows the corresponding foot-shape parameters to be calculated; for example, from images at different angles containing the instep point, the spatial position of the instep point can be computed by the steps above and the instep-height parameter obtained.
Further, based on the above method embodiments, the present invention also provides a storage device, where multiple programs are stored, and the programs may be suitable for being loaded by a processor to execute the machine vision-based three-dimensional feature extraction method described in the above method embodiments.
Furthermore, based on the above method embodiments, the present invention also provides a control apparatus, which includes a processor and a storage device, wherein the storage device may be adapted to store a plurality of programs, and the programs may be adapted to be loaded by the processor to execute the machine vision-based three-dimensional feature extraction method described in the above method embodiments.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (7)

1. A three-dimensional feature extraction method based on machine vision is characterized by comprising the following steps:
the method comprises the steps that a multi-angle image containing a reference object and preset feature points to be detected of a target object arranged relative to the reference object is obtained through a mobile camera, and the number of the multi-angle images is at least three;
extracting the position information of the feature point to be detected in each image, wherein the feature point to be detected is on a target object;
acquiring spatial position information of the feature points to be detected according to the position information of the feature points to be detected in each image, specifically, calibrating camera parameters by using a reference object, and calculating the spatial position information of the feature points to be detected by using a triangulation method;
calculating first distance information and/or second distance information corresponding to a certain feature point to be detected based on the spatial position information and a preset three-dimensional feature category;
the first distance information is distance information between the certain characteristic point to be measured and other characteristic points to be measured, and the second distance information is vertical distance information between the certain characteristic point to be measured and a preset plane; the certain feature point to be measured, the other feature points to be measured and the plane all depend on the three-dimensional feature category;
wherein the reference object is an object of known dimensions;
wherein the multi-angle image simultaneously comprises the reference object and the target object arranged relative to the reference object;
the step of "extracting the position information of the feature point to be detected in each image" is to obtain the pixel position (x, y) of the feature point to be detected in each image;
the model reconstruction is not performed on the whole target object, and the step of acquiring the spatial position information of the feature point to be detected according to the position information of the feature point to be detected in each image is to acquire the real spatial position (X, Y, Z) of the feature point to be detected.
2. The machine-vision-based three-dimensional feature extraction method according to claim 1, wherein the step of "extracting the position information of the feature point to be measured in each image" includes:
acquiring the pixel position of the characteristic point to be detected in a certain image by using a manual marking method;
and extracting the corresponding pixel positions of the feature points to be detected in other images by using a preset feature point matching method and according to the acquired pixel positions.
3. The machine-vision-based three-dimensional feature extraction method according to claim 1, wherein the step of "extracting the position information of the feature point to be measured in each image" includes:
acquiring the area shape corresponding to the area where the characteristic point to be detected in the target object is located;
acquiring a region to be detected corresponding to each image according to the region shape;
and acquiring the position information of the feature point to be detected in each image according to the relative position between the feature point to be detected and the shape of the region and each region to be detected.
4. The machine-vision-based three-dimensional feature extraction method according to claim 1, wherein the step of "extracting the position information of the feature point to be measured in each image" includes:
acquiring the position information of the feature point to be detected in each image by utilizing a pre-constructed neural network;
the neural network is a deep neural network which is based on a preset training set and trained by using a deep learning correlation algorithm.
5. The machine-vision-based three-dimensional feature extraction method according to any one of claims 1 to 4, wherein the step of acquiring spatial position information of the feature point to be measured from position information of the feature point to be measured in each of the images comprises:
and acquiring the Euclidean position of the feature point to be detected by using a triangulation method according to the position information of the feature point to be detected in each image and the internal and external parameters of the camera.
6. A storage device having stored therein a plurality of programs, characterized in that said programs are adapted to be loaded by a processor for performing the method of machine vision based three-dimensional feature extraction according to any of claims 1-5.
7. A control apparatus comprising a processor and a storage device adapted to store a plurality of programs, characterized in that the programs are adapted to be loaded by the processor to perform the machine vision based three-dimensional feature extraction method of any one of claims 1-5.
CN201811474153.4A 2018-12-04 2018-12-04 Three-dimensional feature extraction method and device based on machine vision Active CN109816724B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811474153.4A CN109816724B (en) 2018-12-04 2018-12-04 Three-dimensional feature extraction method and device based on machine vision
PCT/CN2019/105962 WO2020114035A1 (en) 2018-12-04 2019-09-16 Three-dimensional feature extraction method and apparatus based on machine vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811474153.4A CN109816724B (en) 2018-12-04 2018-12-04 Three-dimensional feature extraction method and device based on machine vision

Publications (2)

Publication Number Publication Date
CN109816724A CN109816724A (en) 2019-05-28
CN109816724B true CN109816724B (en) 2021-07-23

Family

ID=66601919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811474153.4A Active CN109816724B (en) 2018-12-04 2018-12-04 Three-dimensional feature extraction method and device based on machine vision

Country Status (2)

Country Link
CN (1) CN109816724B (en)
WO (1) WO2020114035A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816724B (en) * 2018-12-04 2021-07-23 中国科学院自动化研究所 Three-dimensional feature extraction method and device based on machine vision
CN110133443B (en) * 2019-05-31 2020-06-16 中国科学院自动化研究所 Power transmission line component detection method, system and device based on parallel vision
CN110223383A (en) * 2019-06-17 2019-09-10 重庆大学 A kind of plant three-dimensional reconstruction method and system based on depth map repairing
CN110796705B (en) * 2019-10-23 2022-10-11 北京百度网讯科技有限公司 Model error elimination method, device, equipment and computer readable storage medium
CN112070883A (en) * 2020-08-28 2020-12-11 哈尔滨理工大学 Three-dimensional reconstruction method for 3D printing process based on machine vision
CN112487979B (en) * 2020-11-30 2023-08-04 北京百度网讯科技有限公司 Target detection method, model training method, device, electronic equipment and medium
CN112541936B (en) * 2020-12-09 2022-11-08 中国科学院自动化研究所 Method and system for determining visual information of operating space of actuating mechanism
CN114113163B (en) * 2021-12-01 2023-12-08 北京航星机器制造有限公司 Automatic digital ray detection device and method based on intelligent robot
CN114841959B (en) * 2022-05-05 2023-04-04 广州东焊智能装备有限公司 Automatic welding method and system based on computer vision
CN115112098B (en) * 2022-08-30 2022-11-08 常州铭赛机器人科技股份有限公司 Monocular vision one-dimensional two-dimensional measurement method
CN116672082B (en) * 2023-07-24 2024-03-01 苏州铸正机器人有限公司 Navigation registration method and device for a surgical navigation ruler
CN118010751A (en) * 2024-04-08 2024-05-10 杭州汇萃智能科技有限公司 Machine vision detection method and system for workpiece defect detection

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7580546B2 (en) * 2004-12-09 2009-08-25 Electronics And Telecommunications Research Institute Marker-free motion capture apparatus and method for correcting tracking error
CN106204727A (en) * 2016-07-11 2016-12-07 北京大学深圳研究生院 Method and device for three-dimensional foot scanning and reconstruction
CN108305286B (en) * 2018-01-25 2021-09-07 哈尔滨工业大学深圳研究生院 Color coding-based multi-view stereoscopic vision foot type three-dimensional measurement method, system and medium
CN109816724B (en) * 2018-12-04 2021-07-23 中国科学院自动化研究所 Three-dimensional feature extraction method and device based on machine vision

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04349583A (en) * 1991-05-27 1992-12-04 Nippon Telegr & Teleph Corp <Ntt> Generalized Hough transform circuit
WO2002025592A2 (en) * 2000-09-22 2002-03-28 Hrl Laboratories, Llc SAR and FLIR image registration method
CN102376089A (en) * 2010-12-09 2012-03-14 深圳大学 Target correction method and system
CN102157013A (en) * 2011-04-09 2011-08-17 温州大学 System for fully automatically reconstructing foot-type three-dimensional surface from a plurality of images captured by a plurality of cameras simultaneously
CN102354457A (en) * 2011-10-24 2012-02-15 复旦大学 Generalized Hough transform-based method for detecting position of traffic signal lamp
CN105184857A (en) * 2015-09-13 2015-12-23 北京工业大学 Scale factor determination method in monocular vision reconstruction based on dot structured light ranging
JP2017191022A (en) * 2016-04-14 2017-10-19 有限会社ネットライズ Method for imparting actual dimensions to three-dimensional point cloud data, and position measurement of ducts and the like using the same
CN106127258A (en) * 2016-07-01 2016-11-16 华中科技大学 Target matching method
CN107767442A (en) * 2017-10-16 2018-03-06 浙江工业大学 Foot type three-dimensional reconstruction and measurement method based on Kinect and binocular vision

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Zhu T et al., "A generalized Hough transform template and its applications in computer vision", Journal of Computational Information Systems, 2005-09-30, Vol. 1, No. 3, full text *
Jun Shen et al., "Trinocular stereovision by generalized Hough transform", Intelligent Robots and Computer Vision XIV: Algorithms, Techniques, Active Vision, and Materials Handling, 1995-10-03, full text *
秦绪功 (Qin Xugong), "Research on a low-cost multi-camera stereo vision method for three-dimensional foot measurement" (低成本多目立体视觉脚型三维测量方法研究), China Master's Theses Full-text Database, Information Science and Technology, 2018-02-15, Vol. 2018, No. 2, Chapters 2-4, pp. I138-2199 *
史传飞 (Shi Chuanfei) et al., "Industrial photogrammetry technology and its implementation for large-scale equipment" (面向大型装备的工业摄影测量技术及实现), Aeronautical Manufacturing Technology, 2018-10-01, Vol. 61, No. 19, pp. 24-30 *

Also Published As

Publication number Publication date
WO2020114035A1 (en) 2020-06-11
CN109816724A (en) 2019-05-28

Similar Documents

Publication Publication Date Title
CN109816724B (en) Three-dimensional feature extraction method and device based on machine vision
CN112002014B (en) Fine structure-oriented three-dimensional face reconstruction method, system and device
CN107767442B (en) Foot type three-dimensional reconstruction and measurement method based on Kinect and binocular vision
US10460517B2 (en) Mobile device human body scanning and 3D model creation and analysis
JP5671281B2 (en) Position / orientation measuring apparatus, control method and program for position / orientation measuring apparatus
CN106705849B (en) Calibrating Technique For The Light-strip Sensors
Läbe et al. Automatic relative orientation of images
WO2018148841A1 (en) System, method, and apparatus for modelling feet and selecting footwear
US20130259403A1 (en) Flexible easy-to-use system and method of automatically inserting a photorealistic view of a two or three dimensional object into an image using a cd,dvd or blu-ray disc
CN103971378A (en) Three-dimensional reconstruction method of panoramic image in mixed vision system
JP2013524593A (en) Methods and configurations for multi-camera calibration
US11176738B2 (en) Method for calculating the comfort level of footwear
JP2011085971A (en) Apparatus, method, and program for processing image, recording medium, and image processing system
CN107025647B (en) Image tampering evidence obtaining method and device
Santos et al. Flexible three-dimensional modeling of plants using low-resolution cameras and visual odometry
US11475629B2 (en) Method for 3D reconstruction of an object
JP5976089B2 (en) Position / orientation measuring apparatus, position / orientation measuring method, and program
Ye et al. Accurate and dense point cloud generation for industrial measurement via target-free photogrammetry
JP7178803B2 (en) Information processing device, information processing device control method and program
US11816806B2 (en) System and method for foot scanning via a mobile computing device
Skabek et al. Comparison of photogrammetric techniques for surface reconstruction from images to reconstruction from laser scanning
Ni et al. Plant or tree reconstruction based on stereo vision
CN113128292A (en) Image identification method, storage medium and terminal equipment
Ni et al. 3D reconstruction of small plant from multiple views
Ni et al. 3D dense reconstruction of plant or tree canopy based on stereo vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant