CN113066132A

CN113066132A - 3D modeling calibration method based on multi-device acquisition

Info

Publication number: CN113066132A
Application number: CN202110407269.1A
Authority: CN
Inventors: 左忠斌; 左达宇
Original assignee: Tianmu Aishi Beijing Technology Co Ltd
Current assignee: Tianmu Aishi Beijing Technology Co Ltd
Priority date: 2020-03-16
Filing date: 2020-03-16
Publication date: 2021-07-02
Also published as: CN111445528B; CN111445528A; WO2021185215A1

Abstract

An embodiment of the invention provides a 3D modeling calibration method based on multi-device acquisition. A first acquisition device acquires an image of an object A; a second acquisition device acquires an image of an object B; the first acquisition device and the second acquisition device both acquire images of an object C; wherein the object A is not within the acquisition range of the second acquisition device, and the object B is not within the acquisition range of the first acquisition device; the object A is a calibration object and the object B is a target object, or the object A is a target object and the object B is a calibration object; the calibration object is provided with a plurality of calibration points; and the coordinates of the target object are calibrated according to the coordinates of the plurality of calibration points. Absolute size calibration of a remote target object is thereby achieved by multi-camera relay shooting.

Description

3D modeling calibration method based on multi-device acquisition

Technical Field

The invention relates to the technical field of topography measurement, in particular to the technical field of 3D topography measurement.

Background

At present, when 3D acquisition and measurement are performed visually, a camera is usually rotated relative to a target object, or a plurality of cameras are arranged around the target object and acquire images simultaneously. For example, the Digital Emily project of the University of Southern California uses a spherical bracket with hundreds of cameras fixed at different positions and angles to realize 3D acquisition and modeling of a human body. In either case, however, the camera must be close to the target object, at least close enough to be deployed around it, so that images of the target object can be captured from different positions.

In some applications, however, it is not possible to acquire images from around the target object. For example, when a surveillance camera covers a monitored region, it is difficult to arrange cameras around a target object or to rotate a camera around it, because the region is large, the distance is long, and the object to be acquired is not fixed. How to perform 3D acquisition and modeling of the target object in such a situation is a problem that urgently needs to be solved.

Furthermore, even when 3D modeling of such distant objects is possible, how to obtain their accurate dimensions, so that the 3D model has an absolute size, remains unsolved. For example, when modeling a distant building, the prior art generally places a calibration object on or beside the building and derives the size of the building's 3D model from the size of the calibration object. However, a calibration object cannot always be placed near the target object; in that case, even if a 3D model is obtained, its absolute size, and hence the actual size of the object, cannot be known. For example, to model a house on the opposite bank of a river, a calibration object would have to be placed on the house, which is difficult if the river cannot be crossed. The same problem arises when the distance is short but a calibration object cannot be placed on the target object for some reason. For example, when a human body is acquired, a calibration object cannot be placed on the body, and obtaining the absolute size of the human body model becomes a serious problem.

In addition, the prior art has proposed defining the camera position with empirical formulas that include the rotation angle, the target size, and the object distance, so as to balance synthesis speed and synthesis quality. In practical applications, however, it has been found that, unless a precise angle measuring device is provided, users are insensitive to angle and have difficulty determining it accurately; the target size is also difficult to determine accurately, for example in the scenario of building a 3D model of the house across the river described above. Measurement errors lead to errors in setting the camera position, which in turn affect the speed and quality of acquisition and synthesis; accuracy and speed therefore need further improvement.

Therefore, the following technical problems urgently need to be solved: first, collecting 3D information of a long-distance, non-specific target; second, achieving both synthesis speed and synthesis precision; third, accurately and conveniently obtaining the three-dimensional absolute size of a distant object or of an object on which a calibration object cannot suitably be placed.

Disclosure of Invention

In view of the above, the present invention has been developed to provide a calibration method that overcomes, or at least partially solves, the above-mentioned problems.

An embodiment of the invention provides a 3D modeling calibration method based on multi-device acquisition, in which:

a first acquisition device acquires an image of an object A;

a second acquisition device acquires an image of an object B;

the first acquisition device and the second acquisition device both acquire images of an object C;

wherein the object A is not within the acquisition range of the second acquisition device, and the object B is not within the acquisition range of the first acquisition device;

the object A is a calibration object and the object B is a target object, or the object A is a target object and the object B is a calibration object; the calibration object is provided with a plurality of calibration points;

and the coordinates of the target object are calibrated according to the coordinates of the plurality of calibration points.

The first acquisition device acquires an image of the object C while acquiring the image of the object A; or, after acquiring the image of the object A, the first acquisition device is moved and/or rotated until the object C enters its acquisition range and then acquires the image of the object C, a plurality of background images being acquired during the movement and/or rotation.

The second acquisition device acquires an image of the object C while acquiring the image of the object B; or, after acquiring the image of the object B, the second acquisition device is moved and/or rotated until the object C enters its acquisition range and then acquires the image of the object C, a plurality of background images being acquired during the movement and/or rotation.

In an alternative embodiment, the following condition is satisfied during the movement or rotation of the acquisition device: the intersection of the three images acquired at three adjacent acquisition positions is not empty.

In an optional embodiment, the acquisition device is a 3D smart vision device, and includes an image acquisition device and a rotation device;

the rotating device is used for driving the acquisition area of the image acquisition device to generate relative motion with the target object;

and the image acquisition device is used for acquiring a group of images of the target object through the relative movement.

In an optional embodiment, when the acquisition device is a 3D smart vision device, two adjacent acquisition positions of the 3D smart vision device satisfy the following condition:

wherein L is the straight-line distance between the optical centers of the image acquisition device at the two adjacent acquisition positions; f is the focal length of the image acquisition device; d is the length or width of the rectangular photosensitive element of the image acquisition device; T is the distance from the photosensitive element of the image acquisition device to the surface of the target object along the optical axis; and δ is an adjustment coefficient.

In an optional embodiment, feature points are extracted from the acquired images and matched to obtain sparse feature points of the objects; the coordinates of the matched feature points are input, and the sparse three-dimensional point cloud and the position and attitude data of the photographing camera are solved, yielding the sparse model three-dimensional point clouds of the objects A, B, and C and the model coordinate values of the camera positions.

In an alternative embodiment, the absolute coordinates X_T, Y_T, Z_T of the marker points on the calibration object are imported, and an image template of the marker points is matched against all the input photos to obtain the pixel row and column numbers x_i, y_i of the marker points in the input photos.

In an optional embodiment, the method further comprises inputting the pixel row and column numbers x_i, y_i of the marker points and, from the position and attitude data of the photographing camera, calculating the coordinates (X_i, Y_i, Z_i) of the marker points in the model coordinate system; then, from the absolute coordinates (X_T, Y_T, Z_T) and the model coordinates (X_i, Y_i, Z_i) of the marker points, solving the 7 spatial coordinate conversion parameters between the model coordinates and the absolute coordinates using the spatial similarity transformation formula.

In an alternative embodiment, the method further comprises using the 7 solved parameters to convert the coordinates of the three-dimensional point clouds of the object A and the object B and of the position and attitude data of the photographing camera into the absolute coordinate system, thereby obtaining the real size of the target object.

In an alternative embodiment, there is at least one acquisition device between the first acquisition device and the second acquisition device, adjacent acquisition devices each acquiring a common acquired object.

The embodiment of the invention also provides a 3D model construction device and a 3D model construction method that use the above device and method.

Invention and technical effects

1. The absolute size calibration of the remote target object is realized by a multi-camera relay shooting method.

2. By optimizing the position of the camera for collecting the picture, the synthesis speed and the synthesis precision can be ensured to be improved simultaneously. When the camera acquisition position is optimized, the angle and the target size do not need to be measured, and the applicability is stronger.

3. The target object images are acquired by rotating a camera whose optical axis forms an included angle with the turntable rather than being parallel to it, so that 3D synthesis and modeling are achieved without rotating around the target object, which improves adaptability to different scenes.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a schematic diagram of shooting calibration using multiple acquisition devices in an embodiment of the present invention;

FIG. 2 is a schematic diagram of shooting with a 3D smart vision device in an embodiment of the present invention;

FIG. 3 is another schematic diagram of photographing using a 3D smart vision device according to an embodiment of the present invention;

FIG. 4 is a diagram of an acquisition apparatus with a rotational acquisition area moving device according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an onboard collection device spin-shooting in an embodiment of the invention;

FIG. 6 is a schematic diagram of the vehicle-mounted acquisition device for driving shooting in the embodiment of the invention;

the correspondence of reference numerals to the respective components is as follows:

1 target object, 2 rotating device, 3 rotating device and 4 image acquisition device.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

3D acquisition calibration process

As shown in fig. 1, when the target object to be acquired is B, the calibration object A may be placed near B; in many cases, however, the calibration object A cannot be placed near the target object B. In that case, the following steps can be carried out:

(1) Acquisition device m is set up so that the calibration object A is within its acquisition range, and acquisition device n is set up so that the target object B is within its acquisition range.

(2) An intermediate object C is selected that lies within the acquisition range of acquisition device m and within the acquisition range of acquisition device n.

(3) Acquisition device m acquires an image of the calibration object A and, at the same time, an image of the intermediate object C; or, after acquiring the image of A, it is moved and/or rotated until the intermediate object C is within its acquisition range and then acquires the image of C. Of course, the image of C may also be acquired first and the image of A afterwards. Acquisition at a single location may proceed as follows: a motor drives the turntable, which in turn rotates the camera, so that the position of the camera's optical axis moves in space. For example, if the image acquisition device 4 captures an image of the target object once every distance L, it captures n images as the turntable rotates through 360°, each taken at a different camera position. The camera may stop at each acquisition position, capture an image, and then rotate on to the next acquisition position.

(4) Acquisition device n acquires an image of the target object B and, at the same time, an image of the intermediate object C; or, after acquiring the image of B, it is moved and/or rotated until the intermediate object C is within its acquisition range and then acquires the image of C. Of course, the image of C may also be acquired first and the image of B afterwards. Acquisition at a single location may proceed as follows: a motor drives the turntable, which in turn rotates the camera, so that the position of the camera's optical axis moves in space. For example, if the image acquisition device 4 captures an image of the target object once every distance L, it captures n images as the turntable rotates through 360°, each taken at a different camera position. The camera may stop at each acquisition position, capture an image, and then rotate on to the next acquisition position.

(5) Finally, a group of images comprising the target object B, the calibration object A and the intermediate object C are obtained.
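The single-location turntable capture described in steps (3) and (4) can be illustrated with a little geometry. The following is a minimal sketch, not the method of the invention: it assumes the camera's optical center travels on a circle of radius `radius` about the turntable axis and that the stops are equally spaced (the radius, the equal spacing, and all function names are illustrative assumptions not stated in the text). Under those assumptions the straight-line distance between neighbouring stops is the chord 2·r·sin(π/n), which gives the number of images n for a full 360° turn.

```python
import math


def chord_between_stops(radius: float, n_stops: int) -> float:
    """Straight-line distance between neighbouring stops on the circle."""
    return 2.0 * radius * math.sin(math.pi / n_stops)


def stops_per_revolution(radius: float, spacing: float) -> int:
    """Smallest number of equally spaced stops in a full turn such that the
    chord between neighbouring stops does not exceed `spacing` (the L of the
    text). `radius` is an assumed axis-to-optical-center distance."""
    if radius <= 0 or spacing <= 0:
        raise ValueError("radius and spacing must be positive")
    if spacing >= 2 * radius:
        return 2  # two opposite stops are already within the allowed spacing
    n = math.pi / math.asin(spacing / (2.0 * radius))
    return math.ceil(n - 1e-9)  # small tolerance against rounding error


def stop_angles_deg(n_stops: int) -> list:
    """Turntable angles (degrees) at which the camera stops and captures."""
    return [360.0 * k / n_stops for k in range(n_stops)]


if __name__ == "__main__":
    n = stops_per_revolution(radius=0.5, spacing=0.1)
    print(n, chord_between_stops(0.5, n), stop_angles_deg(n)[:3])
```

With a stop-and-shoot turntable, the angles returned by `stop_angles_deg` would be the positions at which rotation pauses for each exposure.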

Of course, the image of B may also be acquired first, with acquisition then shifting until the image of A is acquired; the process is the reverse of that described above. Since the state of the object may change in some situations, the acquisition speed must be increased; otherwise the object's state will differ between the images acquired by the image acquisition device 4, and 3D synthesis and modeling cannot be performed. This can be solved in two ways. First, n image acquisition devices are provided on the turntable, so that n images are captured at once and another n images are obtained at the next position. Second, to save cost, the number of image acquisition devices 4 is not increased but the rotation speed of the turntable is raised; the shutter of the image acquisition device 4 must then be set to a faster mode, otherwise the images will be blurred. A faster shutter in turn requires better lighting, so this method is suitable where a better light source is provided or where the scene has good natural light.

Meanwhile, where requirements are low, ordinary image acquisition equipment such as a camera may be used in place of the 3D intelligent vision device. The acquisition device may be moved in various ways, for example handheld, on a track, mounted on an unmanned aerial vehicle, or carried by a vehicle.

In another embodiment, there is no common intermediate object C within the acquisition ranges of acquisition devices m and n; in this case, multiple cameras and multiple intermediate objects may be used as relays. For example: m₁ shoots A₀ and A₁, m₂ shoots A₁ and A₂, m₃ shoots A₂ and A₃, ..., m_i shoots A_{i-1} and A_i, where A₀ and A_i are the calibration object and the target object respectively, and m₁, m₂, m₃, ..., m_i are the acquisition devices. It will be appreciated that any acquisition device m_i must shoot continuously, at a certain shooting interval, from A_{i-1} to A_i. The acquisition device acquires continuously at certain time or space intervals during its movement, and the continuous acquisition should satisfy the following: the images P, Q, R acquired at three adjacent acquisition positions satisfy P ∩ Q ∩ R ≠ ∅, so as to ensure that the information of the calibration object can be used to calibrate the target object.
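The two conditions in this embodiment, a relay chain from calibration object to target object and a non-empty intersection for every three adjacent images, can be sketched as simple set checks. This is an illustrative sketch only: representing an image by the set of feature identifiers visible in it, and a device by the set of object names it shoots, are assumptions made here for the example, as are the function names.

```python
def triple_overlap_ok(images):
    """`images` is a list of sets of feature ids in acquisition order.
    True when every three adjacent images P, Q, R share at least one
    feature, i.e. P & Q & R is non-empty. With fewer than three images
    there is nothing to check and the result is True."""
    return all(
        images[k] & images[k + 1] & images[k + 2]
        for k in range(len(images) - 2)
    )


def relay_chain_ok(device_views, calibration, target):
    """`device_views[k]` is the set of objects shot by device m_(k+1).
    True when the first device sees the calibration object, the last
    device sees the target object, and adjacent devices share an object."""
    if not device_views:
        return False
    if calibration not in device_views[0] or target not in device_views[-1]:
        return False
    return all(a & b for a, b in zip(device_views, device_views[1:]))


if __name__ == "__main__":
    views = [{"A0", "A1"}, {"A1", "A2"}, {"A2", "A3"}]
    print(relay_chain_ok(views, "A0", "A3"))
    print(triple_overlap_ok([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]))
```

In practice the feature sets would come from the feature matching stage, so a failed check indicates where the shooting interval should be reduced.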

Calibration method

(1) Photos of A and B are obtained by the photographing devices at different shooting angles. A and B are some distance apart and are difficult to capture in a single picture, so A is photographed by camera m and B by camera n; cameras m and n are then moved so that both can photograph the object C. A and B can thus be linked through the pictures of C. No fewer than 3 pictures are taken;

(2) 4 or more marker points with known coordinates are arranged uniformly on the calibration object A, and it is ensured that several (more than 3) photos capture the measured marker points, which remain stationary;

(3) Feature points are extracted from all the pictures taken by cameras m and n and matched to obtain sparse feature points of the objects. The coordinates of the matched feature points are input, and the sparse three-dimensional point cloud and the position and attitude data of the photographing camera are solved, yielding the sparse model three-dimensional point clouds of the objects A, B, and C and the model coordinate values of the camera positions.

(4) The absolute coordinates X_T, Y_T, Z_T of the marker points are imported, and an image template of the marker points is matched against all the input photos to obtain the pixel row and column numbers x_i, y_i of the marker points in the input photos (alternatively, the pixel row and column numbers x_i, y_i of the marker points are measured manually from the pictures);

(5) From the position and attitude data of the camera obtained in step (3) and the input pixel row and column numbers x_i, y_i of the marker points, the coordinates (X_i, Y_i, Z_i) of the marker points in the model coordinate system are calculated. From the absolute coordinates (X_T, Y_T, Z_T) and the model coordinates (X_i, Y_i, Z_i) of the 4 or more marker points, the 7 spatial coordinate conversion parameters between the model coordinates and the absolute coordinates are solved using the spatial similarity transformation formula, where εx, εy, εz, λ, X₀, Y₀, Z₀ are the 7 parameters.

(6) Using the 7 parameters solved in step (5), the coordinates of the three-dimensional point clouds of the objects A, B, and C and of the position and attitude data of the photographing camera are converted into the absolute coordinate system, so that the real dimensions of the model of object B are obtained.
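Steps (5) and (6) can be illustrated with a small numerical sketch. It assumes the common form of a spatial similarity transformation, absolute = λ·R·model + (X₀, Y₀, Z₀), where the rotation matrix R carries the three rotation angles; the exact formula and angle convention of the original are not reproduced in this text, so that form is an assumption. For brevity the sketch solves the transformation exactly from the first three non-collinear marker points, whereas a practical implementation would normally adjust over all 4 or more points by least squares. All function names are hypothetical.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return [x / n for x in a]


def _frame(p0, p1, p2):
    """Right-handed orthonormal axes built from three non-collinear points."""
    e1 = _unit(_sub(p1, p0))
    e3 = _unit(_cross(e1, _sub(p2, p0)))
    e2 = _cross(e3, e1)
    return [e1, e2, e3]


def solve_similarity(model_pts, abs_pts):
    """Return (scale, R, t) such that abs = scale * R * model + t, using the
    first three marker correspondences (assumed exact and non-collinear)."""
    m0, m1, m2 = model_pts[:3]
    a0, a1, a2 = abs_pts[:3]
    scale = math.dist(a0, a1) / math.dist(m0, m1)
    fm, fa = _frame(m0, m1, m2), _frame(a0, a1, a2)
    # R maps each model-frame axis onto the matching absolute-frame axis.
    R = [[sum(fa[k][i] * fm[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = [a0[i] - scale * _dot(R[i], m0) for i in range(3)]
    return scale, R, t


def to_absolute(point, scale, R, t):
    """Convert one model-coordinate point into the absolute coordinate system."""
    return [scale * _dot(R[i], point) + t[i] for i in range(3)]


if __name__ == "__main__":
    # Made-up marker coordinates, for illustration only.
    model = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
    absolute = [[10, 20, 30], [10, 22, 30], [8, 20, 30], [10, 20, 32]]
    scale, R, t = solve_similarity(model, absolute)
    print(scale, to_absolute(model[3], scale, R, t))
```

Once `scale`, `R`, and `t` are known, applying `to_absolute` to every point of the cloud of object B gives its model in absolute units, so distances measured on it are real dimensions.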

Utilizing 3D intelligent vision devices

The 3D intelligent vision device comprises an image acquisition device 4, a rotating device 2, and a cylindrical housing. As shown in fig. 2, the image acquisition device 4 is mounted on the rotating device 2, which is accommodated in the cylindrical housing and can rotate freely within it.

The image acquisition device 4 is used for acquiring a group of images of the target object through relative movement between its acquisition area and the target object; the acquisition area moving device is used for driving the acquisition area of the image acquisition device to move relative to the target object. The acquisition area is the effective field of view of the image acquisition device.

The image capturing device may be a camera and the rotating device 2 may be a turntable. The camera is arranged on the rotary table, a certain included angle is formed between the optical axis of the camera and the rotary table, and the rotary table surface is approximately parallel to the target object to be collected. The turntable drives the camera to rotate, so that the camera can acquire images of the target object at different positions.

Further, the camera is mounted on the turntable through an angle adjusting device, which can rotate to adjust the included angle γ between the optical axis of the image acquisition device 4 and the surface of the turntable, with an adjustment range of −90° < γ < 90°. When photographing a closer target object, the optical axis of the image acquisition device 4 can be tilted toward the central axis of the turntable, that is, γ is adjusted toward −90°. When photographing the interior of a cavity, the optical axis of the image acquisition device 4 can be tilted away from the central axis of the turntable, that is, γ is adjusted toward 90°. The adjustment can be made manually, or a distance measuring device can be provided on the 3D intelligent vision device to measure its distance from the target object, with γ adjusted automatically according to that distance.

The turntable can be connected with the motor through the transmission device, and is driven by the motor to rotate, and the image acquisition device is driven to rotate. The transmission means may be a gear system or a belt or other conventional mechanical structure.

In order to improve acquisition efficiency, a plurality of image acquisition devices 4 may be arranged on the turntable. As shown in fig. 3, the plurality of image acquisition devices 4 are distributed in sequence along the circumference of the turntable. For example, one image acquisition device may be arranged at each end of any diameter of the turntable, or one image acquisition device may be arranged every 60° of circumferential angle, so that 6 image acquisition devices are evenly arranged on the whole disc. The plurality of image acquisition devices may be cameras of the same type or of different types. For example, a visible light camera and an infrared camera may be arranged on the turntable, so that images in different wavebands can be acquired.

The image acquisition device is used for acquiring an image of the target object and may be a fixed-focus camera or a zoom camera. In particular, the camera may be a visible light camera or an infrared camera. Of course, any device with an image acquisition function can be used, and this does not limit the present invention; for example, the device may be a CCD, a CMOS sensor, a camera, a video camera, an industrial camera, a monitor, a mobile phone, a tablet, a notebook computer, a mobile terminal, a wearable device, smart glasses, a smart watch, a smart bracelet, or any other device with an image acquisition function.

Besides the turntable, the rotating device 2 can also take various forms such as a rotating arm, a rotating beam, or a rotating bracket, as long as it can drive the image acquisition device to rotate. Whichever form is used, the optical axis of the image acquisition device 4 forms a certain included angle γ with the rotation plane.

In general, the light sources are distributed around the lens of the image acquisition device; for example, the light source is an annular LED lamp around the lens and is located on the turntable, or it may be provided on the cross section of the cylindrical housing. Since in some applications the object to be acquired is a human body, the intensity of the light source needs to be controlled to avoid discomfort. In particular, a light softening device, for example a light softening housing, may be arranged in the light path of the light source. Alternatively, an LED surface light source may be used directly, which gives softer and more uniform light. Preferably, an OLED light source may be used, which is smaller and softer and, being flexible, can be attached to a curved surface. The light source may also be placed at other positions that provide uniform illumination of the target object. The light source may also be an intelligent light source whose parameters are adjusted automatically according to the target object and the ambient light.

When 3D acquisition is carried out, the direction of the optical axis of the image acquisition device at different acquisition positions does not change relative to the target object and is generally approximately perpendicular to the surface of the target object. The positions of two adjacent image acquisition devices, or two adjacent acquisition positions of the image acquisition device 4, satisfy the following condition:

μ < 0.482

wherein L is the straight-line distance between the optical centers of the image acquisition device 4 at the two adjacent acquisition positions; f is the focal length of the image acquisition device 4; d is the length or width of the rectangular photosensitive element (CCD) of the image acquisition device; M is the distance from the photosensitive element of the image acquisition device 4 to the surface of the target object along the optical axis; and μ is an empirical coefficient.

When the two positions lie along the length direction of the photosensitive element of the image acquisition device, d is the length of the rectangle; when the two positions lie along the width direction of the photosensitive element, d is the width of the rectangle.

M is taken as the distance from the photosensitive element to the surface of the target object along the optical axis when the image acquisition device is at either of the two positions.

As mentioned above, L should be the straight-line distance between the optical centers of the two image acquisition devices. However, since the optical center of an image acquisition device is not easy to determine in some cases, the center of the photosensitive element, the geometric center of the image acquisition device 4, the axial center of the connection between the image acquisition device and the pan/tilt head (or platform, or support), or the center of the proximal or distal surface of the lens may be used instead in some cases. Experiments show that the resulting error is within an acceptable range, and these substitutes therefore also fall within the protection scope of the present invention.

Experiments were conducted using the apparatus of the present invention, and the following experimental results were obtained.

From the above experimental results and extensive experimental experience, it can be concluded that μ should satisfy μ < 0.482. At this value a partial 3D model can already be synthesized; although some parts cannot be synthesized automatically, this is acceptable where requirements are low, and the parts that cannot be synthesized can be compensated manually or by switching algorithms. In particular, when μ < 0.357, the balance between synthesis quality and synthesis time is optimal; μ < 0.198 can be chosen for better synthesis quality, at the cost of a longer synthesis time. When μ is 0.5078, synthesis fails. It should be noted that the above ranges are only preferred embodiments and should not be construed as limiting the scope of protection.
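The thresholds stated above can be collected into a small lookup, shown here as a hedged sketch. The function takes an already computed μ as input, because the formula that defines μ from L, f, d, and M is not reproduced in this text; the function name and the band labels are illustrative assumptions, and values at or above 0.482 are simply reported as outside the stated range.

```python
def classify_mu(mu: float) -> str:
    """Map an empirical coefficient value onto the bands stated in the text."""
    if mu < 0.198:
        return "best quality, longer synthesis time"
    if mu < 0.357:
        return "balanced quality and time"
    if mu < 0.482:
        return "partial synthesis, acceptable for low requirements"
    return "outside stated range"


if __name__ == "__main__":
    for value in (0.15, 0.30, 0.45, 0.5078):
        print(value, "->", classify_mu(value))
```

A capture planner could call such a function before shooting and reduce the spacing between adjacent positions until the desired band is reached.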

The above data are obtained by experiments for verifying the conditions of the formula, and do not limit the invention. Without these data, the objectivity of the formula is not affected. Those skilled in the art can adjust the equipment parameters and the step details as required to perform experiments, and obtain other data which also meet the formula conditions.

Adjacent acquisition positions are two neighboring positions on the movement track at which acquisition occurs as the image acquisition device moves relative to the target object. This is easy to understand when it is the image acquisition device that moves. When it is the target object that moves and thereby causes the relative movement, the motion should be converted, by the relativity of motion, into one in which the target object is stationary and the image acquisition device moves; the two adjacent positions of the image acquisition device are then measured on the converted movement track.

Using 3D image acquisition devices

(1) The collecting area moving device is a rotary structure

Referring to fig. 4, the target object 1 is fixed at a certain position, and the rotating device 3 drives the image acquisition device 4 to rotate around the target object 1, for example through a rotating arm. Of course, the rotation need not be a complete circle and may cover only a certain angle according to acquisition needs. Nor need the motion be circular: the motion track of the image acquisition device 4 may be any other curve, as long as the camera can photograph the target object from different angles.

The rotating device 3 can also drive the image acquisition device 4 to rotate about its own axis, as shown in fig. 5, so that the image acquisition device 4 can acquire images of the target object from different angles through this rotation.

The rotating device 3 can be in various forms such as a cantilever, a turntable, a track and the like, and can also be handheld, vehicle-mounted or airborne, so that the image acquisition device can move.

In addition to the above, in some cases the camera may be fixed and the stage carrying the target object 1 rotated, so that the side of the target object facing the image acquisition device changes continuously and the image acquisition device can acquire images of the target object from different angles. In this case, however, the calculation can still be performed by converting the motion into an equivalent motion of the image acquisition device, so that it conforms to the corresponding empirical formula (which will be described in detail below). For example, where the stage rotates, it may be assumed that the stage is stationary and the image acquisition device 4 rotates. The spacing of the shooting positions for the rotating image acquisition device is set with the empirical formula, from which the rotation speed of the image acquisition device is derived; the rotation speed of the stage is then derived in reverse, which makes the speed easy to control and realizes 3D acquisition. Of course, such scenarios are uncommon, and rotating the image acquisition device is more usual.
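The reverse derivation of the stage speed can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the invention: the equivalent camera is assumed to move on a circle of radius `radius` about the stage axis, images are assumed to be taken at a fixed time interval, and the spacing between adjacent shooting positions is assumed to have been chosen already from the empirical formula. The function names and the sign convention (stage turning opposite to the equivalent camera) are hypothetical.

```python
import math


def angular_step_deg(spacing: float, radius: float) -> float:
    """Angle subtended at the stage axis by two adjacent shooting positions
    that are `spacing` apart (chord length) on a circle of radius `radius`."""
    if not 0 < spacing <= 2 * radius:
        raise ValueError("spacing must be in (0, 2 * radius]")
    return math.degrees(2.0 * math.asin(spacing / (2.0 * radius)))


def stage_speed_deg_per_s(spacing: float, radius: float,
                          frame_interval_s: float) -> float:
    """Stage rotation speed equivalent to the camera advancing one spacing
    per frame interval. Negative sign: the stage turns the opposite way."""
    if frame_interval_s <= 0:
        raise ValueError("frame interval must be positive")
    camera_speed = angular_step_deg(spacing, radius) / frame_interval_s
    return -camera_speed


if __name__ == "__main__":
    print(stage_speed_deg_per_s(spacing=0.2, radius=1.0, frame_interval_s=0.5))
```

Driving the stage at this speed while the fixed camera shoots at the chosen interval yields the same relative geometry as a camera stepping around a stationary target.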

In addition, in order to enable the image acquisition device 4 to capture images of the target object from different directions, both the image acquisition device and the target object may be kept still and the optical axis of the image acquisition device 4 rotated instead. For example, the acquisition area moving device is an optical scanning device, so that the acquisition area of the image acquisition device 4 moves relative to the target object 1 although the image acquisition device 4 itself neither moves nor rotates. The acquisition area moving device further comprises a light deflection unit, which is mechanically driven to rotate, or electrically driven to deflect the light path, or arranged as multiple groups distributed in space, so that images of the target object are acquired from different angles. The light deflection unit may typically be a mirror, which is rotated so that images of the target object are collected from different directions. Alternatively, mirrors surrounding the target object are arranged directly in space, and the light from each mirror enters the image acquisition device in turn. As before, the rotation of the optical axis can be regarded as rotation of a virtual position of the image acquisition device; by this conversion the image acquisition device 4 is assumed to rotate, and the calculation is performed using the empirical formula below.

The image acquisition device 4 is used for acquiring images of the target object 1 and may be a fixed-focus camera or a zoom camera. In particular, the camera may be a visible-light camera or an infrared camera. It is understood that any device with an image acquisition function can be used, and the choice does not limit the present invention; for example, the device can be a CCD, a CMOS sensor, a camera, a video camera, an industrial camera, a monitor, a mobile phone, a tablet, a notebook computer, a mobile terminal, a wearable device, smart glasses, a smart watch, a smart bracelet, or any other device with an image acquisition function.

The device further comprises a processor, also called processing unit, for synthesizing a 3D model of the object according to the plurality of images acquired by the image acquisition means and according to a 3D synthesis algorithm, to obtain 3D information of the object.

(2) The acquisition area moving device is a translation structure

In addition to the rotating structure described above, the image acquisition device 4 may move along a linear trajectory relative to the target object. For example, the image acquisition device 4 is mounted on a linear track, or on a vehicle or drone travelling in a straight line, as shown in Fig. 6, and takes photographs in sequence as it passes the target object along the linear trajectory, without rotating during the process. The linear track may also be replaced by a linear cantilever. Preferably, however, the image acquisition device 4 is rotated as the whole device moves along the linear trajectory, so that its optical axis keeps facing the target object 1.

(3) The acquisition area moving device is a random motion structure

Sometimes the movement of the acquisition area is irregular. For example, when the image acquisition device is handheld, or is vehicle-mounted or airborne and the travel route is irregular, it is difficult to follow a strict track, and the movement trajectory of the image acquisition device 4 is difficult to predict accurately. How to ensure that the captured images can be accurately and stably synthesized into a 3D model in this case is a difficult problem that has not yet been addressed. A common approach is to take many photographs and rely on redundancy in their number, but the synthesis results are not stable. There are also ways to improve the synthesis effect by limiting the rotation angle of the camera, but in practice users are not sensitive to angles, and even if a preferred angle is given, it is difficult to follow in handheld shooting. Therefore, the invention proposes improving the synthesis effect and shortening the synthesis time by limiting the distance the camera moves between two successive shots.

In the case of irregular movement, a sensor may be provided in the mobile terminal or the image acquisition device 4 to measure the straight-line distance moved by the image acquisition device 4 between two shots. When the moving distance does not satisfy the above empirical condition on L (specifically, the condition given below), an alarm is issued to the user, either by sound or by light. The distance the user has moved and the maximum movable distance L may also be displayed on the screen of the mobile phone, or announced by voice in real time, as the user moves the image acquisition device 4. Sensors that can perform this function include a range finder, a gyroscope, an accelerometer, a positioning sensor, and/or combinations thereof.
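The distance check behind such an alarm can be sketched as follows. This is a minimal illustration, assuming the sensor already yields 3D positions in metres and that the maximum spacing `l_max` has been obtained from the empirical condition; the function names and the sample values are hypothetical.

```python
import math

def step_distance(prev_pos, cur_pos):
    """Straight-line distance between two 3D shooting positions."""
    return math.dist(prev_pos, cur_pos)

def check_step(prev_pos, cur_pos, l_max):
    """Return (moved, ok): distance moved since the last shot and whether
    it stays within the maximum allowed spacing l_max."""
    moved = step_distance(prev_pos, cur_pos)
    return moved, moved <= l_max

# Hypothetical positions reported by the sensor at two successive shots.
moved, ok = check_step((0.0, 0.0, 0.0), (0.3, 0.4, 0.0), l_max=0.45)
if not ok:
    print(f"Warning: moved {moved:.2f} m, limit is {0.45:.2f} m")
```

The same pair of values could equally drive an on-screen display or a voice prompt instead of a printed warning.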

(4) Multiple camera mode

It can be understood that, besides having a camera photograph the target object from different angles through relative movement between the camera and the target object, a plurality of cameras can be arranged at different positions around the target object, so that images of the target object from different angles are captured simultaneously.

When the acquisition area moves relative to the target object, and in particular when the image acquisition device rotates around the target object, the direction of the optical axis of the image acquisition device relative to the target object changes between acquisition positions during 3D acquisition. In this case the positions of two adjacent image acquisition devices, or two adjacent acquisition positions of the image acquisition device 4, satisfy the following condition:

δ<0.603

wherein L is the linear distance between the optical centers of the two adjacent image acquisition positions; f is the focal length of the image acquisition device; d is the rectangular length or width of the photosensitive element (CCD) of the image acquisition device; t is the distance from the photosensitive element of the image acquisition device 4 to the surface of the target object 1 along the optical axis; δ is the adjustment coefficient.

When the image acquisition device is at either of the two positions, the distance from the photosensitive element to the surface of the target object along the optical axis is taken as T. Alternatively, L is the linear distance between the optical centers of two image acquisition devices $A_n$ and $A_{n+1}$; the distances from the photosensitive elements of $A_{n-1}$, $A_n$, $A_{n+1}$ and $A_{n+2}$ (where $A_{n-1}$ and $A_{n+2}$ are the image acquisition devices adjacent to $A_n$ and $A_{n+1}$) to the surface of the target object along the optical axis are $T_{n-1}$, $T_n$, $T_{n+1}$ and $T_{n+2}$ respectively, and $T = (T_{n-1} + T_n + T_{n+1} + T_{n+2})/4$. The average may of course also be calculated over more positions than these four adjacent ones.
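The averaging of T over several positions amounts to a plain arithmetic mean. The short sketch below assumes the distances have already been measured; the function name and sample values are hypothetical.

```python
def mean_surface_distance(distances):
    """Average the optical-axis distances T measured at several positions."""
    if not distances:
        raise ValueError("at least one distance is required")
    return sum(distances) / len(distances)

# T_(n-1), T_n, T_(n+1), T_(n+2), in hypothetical units of metres
t = mean_surface_distance([2.0, 2.5, 2.5, 3.0])
```

Passing a longer list averages over more than four positions without any change to the function.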

The camera lens is replaced, and the experiment is carried out again, so that the following experiment results are obtained.

As mentioned above, L should be the straight-line distance between the optical centers of the two image acquisition devices. However, because the optical center of an image acquisition device is not always easy to determine, the center of the photosensitive element, the geometric center of the image acquisition device, the center of the shaft connecting the image acquisition device to the pan-tilt head (or platform or support), or the center of the proximal or distal lens surface may be used instead in some cases. Experiments show that the error introduced by this substitution is within an acceptable range, so these alternatives are also within the protection scope of the present invention.

In the prior art, parameters such as object size and angle of view are generally used to estimate camera positions, and the positional relationship between two cameras is also expressed as an angle. Because angles are hard to measure in practice, this is inconvenient in actual use. The object size also changes with the object being measured. For example, when a pavilion is to be acquired after 3D information of an office building has been acquired, the size must be measured again and the positions recalculated. Such inconvenient and repeated measurements introduce measurement errors, which in turn cause errors in the estimated camera positions. In this scheme, the empirical condition that the camera positions must satisfy is derived from a large amount of experimental data, which avoids both the difficulty of measuring angles accurately and the need to measure the object size directly. In the empirical condition, d and f are fixed parameters of the camera, provided by the manufacturer when the camera and lens are purchased, and need not be measured. T is merely a straight-line distance and can be conveniently measured by conventional means such as a ruler or a laser range finder. The empirical formula of the invention therefore makes preparation quick and convenient while improving the accuracy of camera placement, so that the cameras can be placed at optimized positions that balance 3D synthesis precision and speed.

From the above experimental results and extensive experimental experience, δ should satisfy δ < 0.603. At this value part of the 3D model can be synthesized; although another part cannot be synthesized automatically, this is acceptable where requirements are low, and the part that cannot be synthesized can be compensated manually or by switching algorithms. In particular, δ < 0.410 gives the best balance between synthesis effect and synthesis time. For a better synthesis effect, δ < 0.356 can be chosen, at the cost of a longer synthesis time; to enhance the synthesis effect further, δ < 0.311 may be selected. When δ = 0.681, synthesis is not possible. It should be noted that the above ranges are only preferred embodiments and should not be construed as limiting the scope of protection.
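The nested thresholds on δ can be summarized in a small lookup. The sketch below reports the tightest range a given δ falls into; the function name and the tier labels are paraphrases introduced for illustration, and the text states no outcome for δ ≥ 0.603 other than the single value 0.681, so that case is reported only as outside the stated range.

```python
def delta_tier(delta):
    """Map an adjustment coefficient to the tightest range named in the text."""
    if delta < 0.311:
        return "further enhanced synthesis"
    if delta < 0.356:
        return "better synthesis, longer synthesis time"
    if delta < 0.410:
        return "best balance of effect and time"
    if delta < 0.603:
        return "partial synthesis, may need manual compensation"
    return "outside the stated range"

tier = delta_tier(0.40)
```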

Moreover, as the above experiments show, determining the shooting positions of the camera requires only the camera parameters (focal length f, CCD size) and the distance T between the camera CCD and the object surface to be substituted into the above formula, which makes the device easy to design and debug. The camera parameters are fixed when the camera is purchased and are stated in the product description, so they are readily available. The camera position can therefore be calculated easily from the formula, without complicated measurements of viewing angle or object size. In particular, where the camera lens has to be replaced, the camera position is obtained simply by substituting the focal length f of the new lens and recalculating. Similarly, when different objects are acquired, their differing sizes would make size measurement laborious, whereas with the method of the invention the camera position can be determined without measuring the object size. The camera position so determined balances synthesis time against synthesis effect. The above empirical condition is therefore one of the inventive points of the present invention.

Rotational movement in the present invention means that, during acquisition, the acquisition plane at the previous position and the acquisition plane at the next position intersect rather than being parallel, or that the optical axis of the image acquisition device at the previous position and its optical axis at the next position intersect rather than being parallel. That is, the acquisition area of the image acquisition device moves around, or partly around, the target object; both cases can be regarded as relative rotation. Although the embodiments of the present invention mostly exemplify rotation on a track, it should be understood that the limiting conditions of the present invention apply whenever the acquisition area of the image acquisition device and the target object undergo non-parallel motion, that is, rotation. The scope of the invention is not limited to the track-rotation embodiments.

3D synthetic modeling device and method

The processor, also called the processing unit, synthesizes a 3D model of the target object from the plurality of images acquired by the image acquisition device according to a 3D synthesis algorithm, and thereby obtains 3D information of the target object. The image acquisition device sends the acquired images to the processing unit, and the processing unit obtains the 3D information of the target object from the images in the group. The processing unit may be disposed directly in the housing of the image acquisition device, or may be connected to the image acquisition device through a data line or wirelessly. For example, an independent computer, a server, or a server cluster may serve as the processing unit, and the image data acquired by the image acquisition device are transmitted to it for 3D synthesis. The data of the image acquisition device can also be transmitted to a cloud platform, whose computing power is used for 3D synthesis.

The following method is executed in the processing unit:

1. Perform image enhancement processing on all input photos. The following filter is used to enhance the contrast of the original photos while suppressing noise.

In the formula: $g(x, y)$ is the gray value of the original image at $(x, y)$; $f(x, y)$ is the gray value at that position after enhancement by the Wallis filter; $m_g$ is the local gray mean of the original image; $s_g$ is the local gray standard deviation of the original image; $m_f$ is the target value of the local gray mean of the transformed image; $s_f$ is the target value of the local gray standard deviation of the transformed image; $c \in (0, 1)$ is the expansion constant of the image variance; and $b \in (0, 1)$ is the image brightness coefficient constant.
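A minimal sketch of this enhancement step follows. It assumes the commonly published form of the Wallis filter, f(x, y) = [g(x, y) − m_g] · c·s_f / (c·s_g + (1 − c)·s_f) + b·m_f + (1 − b)·m_g, which uses the same symbols as defined above. The window radius, the default parameter values, and the function names are illustrative assumptions, not values taken from this description.

```python
import statistics

def wallis_pixel(g, m_g, s_g, m_f, s_f, c, b):
    """Wallis transform of one gray value g given local and target statistics."""
    gain = (c * s_f) / (c * s_g + (1.0 - c) * s_f)
    offset = b * m_f + (1.0 - b) * m_g
    return (g - m_g) * gain + offset

def wallis_filter(image, m_f=127.0, s_f=50.0, c=0.8, b=0.9, radius=1):
    """Apply the transform with statistics from a (2*radius+1)^2 window.

    `image` is a list of rows of gray values; windows are clipped at the
    border. Default parameter values are illustrative assumptions only.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            m_g = statistics.fmean(window)   # local gray mean
            s_g = statistics.pstdev(window)  # local gray standard deviation
            out[y][x] = wallis_pixel(image[y][x], m_g, s_g, m_f, s_f, c, b)
    return out

# A flat patch has zero local contrast, so only the brightness term acts.
flat = wallis_filter([[100.0] * 3 for _ in range(3)])
```

A production implementation would typically use larger windows and vectorized array operations; the nested loops here are chosen only for readability.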

The filter greatly enhances texture patterns of different scales in the image, so that more feature points are extracted, and with higher precision, when point features are extracted from the image; this in turn improves the reliability and precision of the results in photo feature matching.

2. Extract feature points from all input images and match them to obtain sparse feature points. The SURF operator is used to extract and match the feature points of the photos. SURF feature matching comprises three processes: feature point detection, feature point description, and feature point matching. The method detects feature points with the Hessian matrix, replaces second-order Gaussian filtering with box filters, uses integral images to accelerate convolution, and reduces the dimension of the local image feature descriptor to speed up matching. The main steps are as follows. First, the Hessian matrix is constructed to generate all interest points for feature extraction; its purpose is to generate stable edge points (points of abrupt change) of the image. Second, the scale space is built and the feature points are located: each pixel processed by the Hessian matrix is compared with the 26 points in its neighborhood in the two-dimensional image space and the scale space to locate key points preliminarily, and key points with weak energy or wrong location are filtered out, leaving the final stable feature points. Third, the main direction of each feature point is determined from Haar wavelet responses collected in a circular neighborhood of the feature point.
Within this circular neighborhood, the sums of the horizontal and vertical Haar wavelet responses of all points inside a 60-degree sector are computed; the sector is then rotated in steps of 0.2 radian and the responses are summed again, and the direction of the sector with the largest sum is taken as the main direction of the feature point. Fourth, a 64-dimensional feature point description vector is generated: a square region around the feature point, oriented along the main direction, is divided into 4 × 4 sub-regions. In each sub-region the Haar wavelet responses of 25 pixels are collected in the horizontal and vertical directions, both defined relative to the main direction. Four values are kept per sub-region: the sum of the horizontal responses, the sum of the vertical responses, the sum of the absolute horizontal responses, and the sum of the absolute vertical responses. These form the feature vector of the sub-region, giving a 4 × 4 × 4 = 64-dimensional vector as the SURF descriptor. Fifth, the feature points are matched: the degree of matching is determined by the Euclidean distance between two feature points, and the shorter the distance, the better the match.
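The final matching step can be illustrated with a brute-force nearest-neighbour search on descriptor vectors. The sketch below is not a SURF implementation: it assumes the descriptors have already been computed, uses toy 4-dimensional vectors in place of 64-dimensional ones, and the function name and the optional distance threshold are hypothetical.

```python
import math

def match_descriptors(desc_a, desc_b, max_distance=None):
    """Nearest-neighbour matching of descriptor vectors by Euclidean distance.

    Returns (index_in_a, index_in_b, distance) for each descriptor in desc_a;
    matches farther apart than max_distance (if given) are dropped.
    """
    matches = []
    for i, da in enumerate(desc_a):
        j, dist = min(((j, math.dist(da, db)) for j, db in enumerate(desc_b)),
                      key=lambda pair: pair[1])
        if max_distance is None or dist <= max_distance:
            matches.append((i, j, dist))
    return matches

# Toy 4-dimensional descriptors; real SURF descriptors have 64 dimensions.
a = [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
b = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.9, 0.1, 0.0], [0.5, 0.5, 0.5, 0.5]]
matches = match_descriptors(a, b, max_distance=0.5)
```

The exhaustive search costs one distance evaluation per descriptor pair, which is acceptable for a sketch but would normally be replaced by an indexed search for large photo sets.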

3. Input the coordinates of the matched feature points and use bundle adjustment to solve for the sparse three-dimensional point cloud of the target object and the position and attitude data of the photographing cameras; that is, the model coordinate values of the sparse three-dimensional point cloud of the target object model and of the camera positions are obtained. Multi-view dense matching of the photos is then performed with the sparse feature points as initial values to obtain dense point cloud data. This process has four main steps: stereo pair selection, depth map calculation, depth map optimization, and depth map fusion. For each image in the input data set, a reference image is selected to form a stereo pair, which is used to calculate the depth map. This yields rough depth maps for all images, which may contain noise and errors; the depth map of each image is therefore optimized by a consistency check against the depth maps of its neighboring images. Finally, the depth maps are fused to obtain the three-dimensional point cloud of the whole scene.
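Bundle adjustment refines camera poses and 3D points by minimizing the reprojection error, and that cost can be sketched as below. The example assumes a simple pinhole camera without lens distortion; the data layout and function names are hypothetical, and the nonlinear least-squares solver that would minimize this cost is not shown.

```python
def project(point, rotation, translation, focal, principal=(0.0, 0.0)):
    """Project a 3D point with a pinhole camera: x_cam = R * X + t."""
    xc = [sum(rotation[r][k] * point[k] for k in range(3)) + translation[r]
          for r in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return (focal * xc[0] / xc[2] + principal[0],
            focal * xc[1] / xc[2] + principal[1])

def reprojection_error(observations, points, cameras):
    """Sum of squared pixel residuals over (camera_id, point_id, u, v) tuples."""
    total = 0.0
    for cam_id, pt_id, u, v in observations:
        rotation, translation, focal = cameras[cam_id]
        pu, pv = project(points[pt_id], rotation, translation, focal)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = {0: (identity, (0.0, 0.0, 0.0), 1000.0)}  # rotation, translation, focal
points = {0: (0.1, 0.2, 2.0)}
observations = [(0, 0, 51.0, 100.0)]  # observed one pixel off in u
error = reprojection_error(observations, points, cameras)
```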

4. Reconstruct the surface of the target object from the dense point cloud. The process comprises defining an octree, setting a function space, creating a vector field, solving a Poisson equation, and extracting an isosurface. The integral relation between the sampling points and the indicator function is obtained from the gradient relation; the vector field of the point cloud is obtained from this integral relation, and the approximation of the gradient field of the indicator function is calculated to form the Poisson equation. An approximate solution of the Poisson equation is found by matrix iteration, the isosurface is extracted with the marching cubes algorithm, and the model of the measured object is reconstructed from the measured point cloud.
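The idea of solving a Poisson equation by iteration can be shown on a deliberately small problem. The sketch below is a one-dimensional toy, not the octree-based three-dimensional system used in surface reconstruction: it applies Gauss-Seidel iteration to u'' = rhs on a uniform grid with zero boundary values, and the function name and grid size are hypothetical.

```python
def solve_poisson_1d(rhs, spacing, iterations=2000):
    """Gauss-Seidel iteration for u'' = rhs on a uniform grid, u = 0 at both ends.

    `rhs` holds the right-hand side at the interior grid points only.
    """
    n = len(rhs)
    u = [0.0] * (n + 2)  # includes the two boundary points
    for _ in range(iterations):
        for i in range(1, n + 1):
            # Discrete equation: (u[i-1] - 2 u[i] + u[i+1]) / h^2 = rhs[i-1]
            u[i] = 0.5 * (u[i - 1] + u[i + 1] - spacing ** 2 * rhs[i - 1])
    return u

# u'' = -2 with u(0) = u(1) = 0 has the exact solution u(x) = x (1 - x).
n = 9
h = 1.0 / (n + 1)
u = solve_poisson_1d([-2.0] * n, h)
```

Each sweep updates every unknown from its current neighbours, which is the same fixed-point pattern that iterative sparse solvers apply to the much larger systems arising in practice.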

5. Fully automatic texture mapping of the target object model. After the surface model is constructed, texture mapping is performed. The main process is as follows. First, texture data are obtained for the triangular surface mesh of the target reconstructed from the images. Second, visibility analysis is performed on the triangular faces of the reconstructed model: the calibration information of the images is used to calculate the set of visible images and the optimal reference image for each triangular face. Third, the triangular faces are clustered to generate texture patches: according to the visible image set of each triangular face, its optimal reference image, and the neighborhood topology of the faces, the triangular faces are clustered into a number of reference-image texture patches. Fourth, the texture patches are automatically sorted to generate the texture image: the patches are sorted by size, a texture image with the minimum bounding area is generated, and the texture mapping coordinates of each triangular face are obtained.
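The visibility analysis of a single triangular face can be sketched as follows. The selection criterion is an assumption made for illustration, since the description does not state one: a camera is treated as seeing the face if it lies on the side the face normal points to, and the optimal reference image is taken to be the one viewing the face most directly. Occlusion by other faces is ignored, and all names are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(v):
    n = math.sqrt(_dot(v, v))
    return tuple(x / n for x in v)

def visible_views(triangle, camera_centers):
    """Return ([visible camera indices], best index) for one triangle.

    Assumed criterion: front-facing cameras are visible; the best view is
    the one whose direction is closest to the face normal.
    """
    p0, p1, p2 = triangle
    normal = _unit(_cross(_sub(p1, p0), _sub(p2, p0)))
    centroid = tuple(sum(c) / 3.0 for c in zip(p0, p1, p2))
    scores = {}
    for idx, center in enumerate(camera_centers):
        cos_angle = _dot(normal, _unit(_sub(center, centroid)))
        if cos_angle > 0.0:
            scores[idx] = cos_angle
    best = max(scores, key=scores.get) if scores else None
    return sorted(scores), best

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))  # normal along +Z
cams = [(0.3, 0.3, 5.0), (5.0, 0.0, 1.0), (0.0, 0.0, -5.0)]
visible, best = visible_views(tri, cams)
```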

Acquisition position optimization of image acquisition device

When 3D acquisition is carried out with the direction of the optical axis of the image acquisition device unchanged relative to the target object between acquisition positions, the optical axis generally being approximately perpendicular to the surface of the target object, the positions of two adjacent image acquisition devices, or two adjacent acquisition positions of the image acquisition device, satisfy the following condition:

μ < 0.482

wherein L is the linear distance between the optical centers of the two adjacent image acquisition positions; f is the focal length of the image acquisition device; d is the rectangular length of a photosensitive element (CCD) of the image acquisition device; m is the distance from the photosensitive element of the image acquisition device to the surface of the target object along the optical axis; μ is an empirical coefficient.

Application of 3D intelligent vision equipment

For example, the 3D intelligent vision equipment is installed on a tower crane in a factory. While the tower crane is working, pictures of the target object beneath it are acquired in real time and a 3D model of the target object is synthesized in the processing unit, so that the type and size of the target object are identified. This makes it easier for the tower crane to hoist the target object and, at the same time, to match it with a placement area of suitable size.

Although the above embodiments describe the image acquisition device as acquiring images, this should not be understood as applying only to a group of pictures made up of individual photographs; it is merely an illustrative way of facilitating understanding. The image acquisition device can also acquire video data, and 3D synthesis can be performed directly on the video data or on images extracted from it. However, the shooting positions of the video frames or extracted images used in the synthesis must still satisfy the above empirical formula.

The terms target, target object, and object all denote an object for which three-dimensional information is to be acquired. It may be a single solid object or a combination of several objects, for example a building or a bridge. The three-dimensional information of the target object includes a three-dimensional image, a three-dimensional point cloud, a three-dimensional mesh, local three-dimensional features, three-dimensional dimensions, and all other parameters carrying three-dimensional features of the target object. Three-dimensional in the present invention means having information in the three directions X, Y and Z, and in particular depth information, which is essentially different from having only two-dimensional plane information. It is also fundamentally different from things that are called three-dimensional, panoramic, holographic, or stereoscopic but actually contain only two-dimensional information and, in particular, no depth information.

The acquisition area in the present invention refers to the range within which an image acquisition device (e.g., a camera) can capture images. The image acquisition device can be a CCD, a CMOS sensor, a camera, a video camera, an industrial camera, a monitor, a mobile phone, a tablet, a notebook computer, a mobile terminal, a wearable device, smart glasses, a smart watch, a smart bracelet, or any other device with an image acquisition function.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in an apparatus in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.

Thus, it should be appreciated by those skilled in the art that while a number of exemplary embodiments of the invention have been illustrated and described in detail herein, many other variations or modifications consistent with the principles of the invention may be directly determined or derived from the disclosure of the present invention without departing from the spirit and scope of the invention. Accordingly, the scope of the invention should be understood and interpreted to cover all such other variations or modifications.

Claims

1. A 3D modeling calibration method based on multi-device acquisition, characterized in that:

the second acquisition equipment acquires an image of the object B;

wherein the object A is not within the acquisition range of the second acquisition equipment; the object B is not within the acquisition range of the first acquisition equipment;

the object A is a calibration object, and the object B is a target object; or the object A is a target object, and the object B is a calibration object;

the calibration object is provided with a plurality of calibration points;

calibrating the coordinates of the target object according to the coordinates of the plurality of calibration points;

the first acquisition equipment acquires an image of an object C while acquiring the image of the object A; or the first acquisition equipment, after acquiring the image of the object A, moves and/or rotates until the object C enters its acquisition range and then acquires the image of the object C, acquiring a plurality of background images during the movement and/or rotation;

2. The method of claim 1, wherein: during the movement or rotation of the acquisition equipment, the following condition is met: the intersection of the three images acquired at three adjacent acquisition positions is not empty.

3. The method of claim 1, wherein: the acquisition equipment is 3D intelligent vision equipment and comprises an image acquisition device and a rotating device;

4. The method of claim 1, wherein: when the acquisition equipment is 3D intelligent image acquisition equipment, two adjacent acquisition positions of the 3D intelligent image acquisition equipment accord with the following conditions:

wherein L is the linear distance between the optical centers of the two adjacent image acquisition positions; f is the focal length of the image acquisition device; d is the rectangular length of the photosensitive element of the image acquisition device; t is the distance from the photosensitive element of the image acquisition device to the surface of the target along the optical axis; δ is the adjustment coefficient.

5. The method of claim 1, wherein: feature points are extracted from the acquired images and matched to obtain sparse feature points of the objects; the coordinates of the matched feature points are input, and the sparse three-dimensional point cloud and the position and attitude data of the photographing camera are solved, thereby obtaining the sparse model three-dimensional point clouds of the object A, the object B and the object C and the model coordinate values of the positions.

6. The method of claim 1, wherein: the absolute coordinates X_T, Y_T, Z_T of the mark points on the calibration object are imported, and an image template of the mark points is matched against all the input photos to obtain the pixel row and column numbers x_i, y_i of the mark points in the input photos.

7. The method of claim 6, wherein: the method further comprises calculating, from the position and attitude data of the photographing camera and the input pixel row and column numbers x_i, y_i of the mark points, the coordinates (X_i, Y_i, Z_i) of the mark points in the model coordinate system;

from the absolute coordinates (X_T, Y_T, Z_T) and the model coordinates (X_i, Y_i, Z_i) of the mark points, the 7 spatial coordinate conversion parameters between the model coordinates and the absolute coordinates are solved using the spatial similarity transformation formula.
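For reference, a spatial similarity transformation with 7 parameters is commonly written in the form below. The symbols λ (scale), R (rotation matrix built from three angles φ, ω, κ) and (X_0, Y_0, Z_0) (translation) are notation assumed here for illustration; the claim names the formula but does not state it.

```latex
\begin{bmatrix} X_T \\ Y_T \\ Z_T \end{bmatrix}
= \lambda \, R(\varphi, \omega, \kappa)
\begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix}
+ \begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \end{bmatrix}
```

The 7 unknowns are one scale factor, three rotation angles and three translations. Each mark point contributes three equations, so three non-collinear mark points suffice to determine them, and additional points allow a least-squares solution.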

8. The method of claim 7, wherein: using the 7 calculated parameters, the three-dimensional point clouds of the object A and the object B and the position and attitude data of the photographing camera are converted into the absolute coordinate system, thereby obtaining the real size of the target object.
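The following is a minimal numerical sketch of how the parameters of claim 7 might be solved and then applied as in claim 8. It assumes NumPy and uses a closed-form least-squares solution based on singular value decomposition (Umeyama's method), which is a substitute chosen for illustration; the claims do not specify a solver. The function names `solve_similarity` and `to_absolute` and all sample coordinates are hypothetical.

```python
import numpy as np


def solve_similarity(model_pts, abs_pts):
    """Estimate scale s, rotation R and translation t so that
    abs_pts ~= s * R @ model_pts + t (least squares, Umeyama's method).

    model_pts: (n, 3) mark-point coordinates in the model system.
    abs_pts:   (n, 3) absolute coordinates of the same mark points.
    Needs at least three non-collinear points.
    """
    P = np.asarray(model_pts, dtype=float)
    Q = np.asarray(abs_pts, dtype=float)
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - mu_p, Q - mu_q

    cov = Qc.T @ Pc / len(P)            # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation (det = +1).
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0

    R = U @ D @ Vt
    var_p = (Pc ** 2).sum() / len(P)    # variance of the model points
    s = np.trace(np.diag(S) @ D) / var_p
    t = mu_q - s * R @ mu_p
    return s, R, t


def to_absolute(points, s, R, t):
    """Convert (n, 3) model-system points into the absolute system."""
    return s * np.asarray(points, dtype=float) @ R.T + t


# Hypothetical mark points: model coordinates and their absolute coordinates,
# generated here from a known transform so the result can be checked.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
marks_model = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
marks_abs = 2.5 * marks_model @ R_true.T + np.array([10.0, -5.0, 3.0])

s, R, t = solve_similarity(marks_model, marks_abs)

# Convert a (hypothetical) target point cloud and measure a real-world length.
target_model = np.array([[4.0, 1.0, 0.0], [5.0, 1.0, 0.0]])
target_abs = to_absolute(target_model, s, R, t)
real_length = np.linalg.norm(target_abs[1] - target_abs[0])
print(round(s, 6), round(real_length, 6))
```

Because the two target points are one model unit apart, the recovered real length equals the recovered scale factor, which shows how the calibration object fixes the absolute size of the target object.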

9. The method of any of claims 1-8, wherein: at least one acquisition device is arranged between the first acquisition device and the second acquisition device, and adjacent acquisition devices acquire a common acquired object.

10. A 3D model building apparatus, characterized in that it uses the method according to any of claims 1-8.