CN111145339B - Image processing method and device, equipment and storage medium - Google Patents

Publication number: CN111145339B
Authority
CN
China
Prior art keywords: pose, reconstruction, real-time, point cloud
Legal status: Active
Application number: CN201911356995.4A
Original language: Chinese (zh)
Other versions: CN111145339A (en)
Inventors: 鲁晋杰, 马标, 李姬俊男
Current Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201911356995.4A
Publication of CN111145339A
Application granted
Publication of CN111145339B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The embodiment of the application discloses an image processing method, an image processing device, image processing equipment and a storage medium, wherein the method comprises the following steps: collecting N frames of sample images in a real physical scene through a camera on electronic equipment, wherein N is an integer greater than 0; acquiring a first real-time pose set, wherein the first real-time pose set comprises a first real-time pose of the camera in the real physical scene when each frame of the sample image is acquired; performing three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the first reconstruction pose set comprises a first reconstruction pose of a camera corresponding to each frame of the sample image; and performing scale transformation on at least the initial point cloud according to the first real-time pose set and the first reconstruction pose set to obtain a target point cloud.

Description

Image processing method and device, equipment and storage medium
Technical Field
Embodiments of the present application relate to electronic technology, and in particular, but not exclusively, to an image processing method and apparatus, a device, and a storage medium.
Background
The construction of point cloud data in a real physical scene is a core technology of computer vision positioning. If the scale information is not considered in the construction of the point cloud data, a three-dimensional structure model in equal proportion to the actual physical scene cannot be obtained. Thus, when the point cloud data is applied to indoor and outdoor navigation positioning, the obtained positioning result is often inaccurate.
Currently, artificial calibration objects of known size are typically introduced into the real physical scene to assist the electronic device in recovering the scale of the reconstructed initial point cloud. However, with this image processing method, the artificial calibration objects need to be manually manufactured, installed and disassembled, so the scaled point cloud data cannot be obtained conveniently and quickly, and additional material cost and labor cost are introduced.
Disclosure of Invention
In view of this, embodiments of the present application provide an image processing method, an image processing apparatus, a device, and a storage medium.
The technical scheme of the embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides an image processing method, including: collecting N frames of sample images in a real physical scene through a camera on electronic equipment, wherein N is an integer greater than 0; acquiring a first real-time pose set, wherein the first real-time pose set comprises a first real-time pose of the camera in the real physical scene when each frame of the sample image is acquired; performing three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the first reconstruction pose set comprises a first reconstruction pose of a camera corresponding to each frame of the sample image; and performing scale transformation on at least the initial point cloud according to the first real-time pose set and the first reconstruction pose set to obtain a target point cloud.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including: the image acquisition module is used for acquiring N frames of sample images in a physical scene, wherein N is an integer greater than 0; the pose acquisition module is used for acquiring a first real-time pose set, wherein the first real-time pose set comprises a first real-time pose of the image acquisition module in the real physical scene when acquiring each frame of the sample image; the three-dimensional reconstruction module is used for carrying out three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the first reconstruction pose set comprises a first reconstruction pose of the image acquisition module corresponding to each sample image; and the map transformation module is used for performing scale transformation on at least the initial point cloud according to the first real-time pose set and the first reconstruction pose set to obtain a target point cloud.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor implements steps in any of the image processing methods in the embodiments of the present application when the processor executes the program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements steps in any of the image processing methods of the embodiments of the present application.
In the embodiment of the application, the electronic device can acquire sample images in a real physical scene through the camera and synchronously acquire the first real-time pose of the camera in the real physical scene. Therefore, the method does not need to introduce artificial calibration objects into the real physical scene, and the scaled target point cloud can be obtained based only on the data acquired by the electronic device itself, so that scaled image processing can be realized more conveniently and quickly, and additional material cost and labor cost can be avoided.
Drawings
Fig. 1 is a schematic diagram of a part of a structure of a mobile phone according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation flow of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an implementation flow of another image processing method according to an embodiment of the present application;
fig. 4A is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 4B is a schematic diagram of another image processing apparatus according to an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
For the purposes, technical solutions and advantages of the embodiments of the present application to be more apparent, the specific technical solutions of the present application will be described in further detail below with reference to the accompanying drawings in the embodiments of the present application. The following examples are illustrative of the present application, but are not intended to limit the scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
It should be noted that the term "first/second/third" in the embodiments of the present application is merely used to distinguish similar or different objects and does not represent a specific ordering of the objects. It should be understood that "first/second/third" may be interchanged in a specific order or sequence, where allowed, so that the embodiments of the present application described herein can be practiced in an order other than that illustrated or described herein.
The method provided in the embodiments of the present application operates on an electronic device, which may also refer to a User Equipment (UE), an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a User terminal, a wireless communication device, a User agent, or a User Equipment. An access terminal may be a cellular telephone, a cordless telephone, a session initiation protocol (Session Initiation Protocol, SIP) phone, a wireless local loop (Wireless Local Loop, WLL) station, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device with wireless communication capabilities, a computing device or other processing device connected to a wireless modem, an in-vehicle device, a wearable device, a terminal device in a future 5G network or a terminal device in a future evolved public land mobile network (Public Land Mobile Network, PLMN) network, etc.
Taking an electronic device as an example of a mobile phone, fig. 1 is a block diagram illustrating a part of a structure of the mobile phone related to a method embodiment provided in an embodiment of the present application. Referring to fig. 1, a mobile phone includes: radio Frequency (RF) circuitry 170, memory 180, input unit 160, display unit 150, sensor 140, audio circuitry 120, wireless-fidelity (Wi-Fi) module 130, processor 110, and power supply 190. Those skilled in the art will appreciate that the handset configuration shown in fig. 1 is not limiting of the handset and may include more or fewer components than shown, or may combine certain components, or may be arranged in a different arrangement of components.
The following describes the components of the mobile phone in detail with reference to fig. 1.
The RF circuit 170 may be used for receiving and transmitting signals during information transmission and reception or during a call; specifically, downlink information of the base station is received and then handed to the processor 110 for processing, and uplink data is transmitted to the base station. Typically, the RF circuit 170 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 170 may also communicate with networks and other devices via wireless communications. The wireless communications may use any communication standard or protocol, including, but not limited to, Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.
The memory 180 may be used to store software programs and modules, and the processor 110 performs various functional applications and data processing of the cellular phone by running the software programs and modules stored in the memory. The memory 180 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a three-dimensional reconstruction function according to an embodiment of the present application), and the like; the storage data area may store data created from the use of the handset (such as the sample image and the first real-time pose set to which embodiments of the present application relate), and so on. In addition, memory 180 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The input unit 160 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. In particular, the input unit 160 may include a touch panel 161 and other input devices. The touch panel, also referred to as a touch screen, may collect touch operations on or near it by a user (e.g., operations performed on or near the touch panel using any suitable object or accessory such as a finger or a stylus) and drive the corresponding connection device according to a preset program. Alternatively, the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends them to the processor, and can receive and execute commands sent by the processor. In addition, the touch panel may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit may include other input devices in addition to the touch panel. In particular, the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 150 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The display unit may include a display panel 151. Alternatively, the display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED), or the like. Further, the touch panel 161 may cover the display panel 151, and when the touch panel 161 detects a touch operation thereon or thereabout, the touch panel 161 is transferred to the processor 110 to determine the type of touch event, and then the processor 110 provides a corresponding visual output on the display panel 151 according to the type of touch event. Although in fig. 1, the touch panel 161 and the display panel 151 implement input and output functions of the mobile phone as two independent components, in some embodiments, the touch panel 161 may be integrated with the display panel 151 to implement input and output functions of the mobile phone.
The handset may also include at least one sensor 140, such as a light sensor, an inertial measurement unit, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel according to the brightness of ambient light, and the proximity sensor may turn off the display panel and/or the backlight when the mobile phone moves to the ear. The inertial measurement unit can detect acceleration and angular velocity in all directions (generally three axes), can detect gravity and direction when stationary, and can be used for identifying mobile phone gesture applications (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration identification related functions (such as pedometer and knocking) and the like; other sensors such as barometer, hygrometer, thermometer, infrared sensor, etc. that may be configured in the mobile phone are not described herein.
The audio circuit 120, the receiver (earpiece) 121 and the microphone 122 may provide an audio interface between the user and the mobile phone. The audio circuit can transmit the electrical signal converted from received audio data to the receiver 121, which converts it into a sound signal for output; on the other hand, the microphone 122 converts collected sound signals into electrical signals, which are received by the audio circuit and converted into audio data; after being processed by the audio data output processor, the audio data is sent, for example, to another mobile phone via the RF circuit 170, or output to the memory for further processing.
WiFi belongs to a short-distance wireless transmission technology, and a mobile phone can help a user to send and receive emails, browse webpages, access streaming media and the like through the WiFi module 130, so that wireless broadband Internet access is provided for the user. Although fig. 1 shows the WiFi module 130, it is understood that it does not belong to the necessary constitution of the mobile phone, and can be omitted entirely as required within a range that does not change the essence of the present application.
The processor 110 is a control center of the mobile phone, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions of the mobile phone and processes data by running or executing application programs and/or modules stored in the memory 180 and invoking data stored in the memory 180, thereby performing overall monitoring of the mobile phone. Optionally, the mobile phone further includes a power supply 190 (such as a battery) for supplying power to the various components, and preferably, the power supply 190 may be logically connected to the processor 110 through a power management system, so as to implement functions of managing charging, discharging, and power consumption management through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In the embodiment of the present application, the processor 110 included in the mobile phone has the following image processing functions: collecting N frames of sample images in a real physical scene through a camera on a mobile phone, wherein N is an integer greater than 0; acquiring a first real-time pose set, wherein the first real-time pose set comprises a first real-time pose of the camera in the real physical scene when each frame of the sample image is acquired; performing three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the first reconstruction pose set comprises a first reconstruction pose of a camera corresponding to each frame of the sample image; and performing scale transformation on at least the initial point cloud according to the first real-time pose set and the first reconstruction pose set to obtain a target point cloud.
The functions implemented by the image processing method according to the embodiments of the present application may be implemented by invoking program codes by a processor in the electronic device, and of course, the program codes may be stored in a computer storage medium, where it is apparent that the electronic device includes at least the processor and the storage medium.
Fig. 2 is a schematic flowchart of an implementation of an image processing method according to an embodiment of the present application, as shown in fig. 2, where the method at least includes the following steps 101 to 104:
step 101, acquiring N frames of sample images in a real physical scene through a camera on electronic equipment; n is an integer greater than 0.
In some embodiments, the electronic device may capture N frames of Red Green Blue (RGB) images in a real physical scene, such as a mall, an airport, an underground parking garage, or an office building, with an installed monocular camera. In practical applications, the electronic device generally collects the sample images while moving.
Step 102, acquiring a first real-time pose set, wherein the first real-time pose set comprises a first real-time pose of the camera in the real physical scene when each frame of the sample image is acquired.
When the method is realized, the first real-time pose corresponding to each sample image in the real physical scene can be acquired through the augmented reality application on the electronic equipment, so that a first real-time pose set comprising N first real-time poses is obtained.
That is, the augmented reality application acquires data in synchronization with the camera: the former acquires the real-time pose of the camera in the real physical scene, and the latter acquires the sample images in the real physical scene. It should be noted that the augmented reality application may be any application program capable of obtaining the first real-time pose; for example, the augmented reality application is an ARCore application. The first real-time pose and the second real-time pose described below in the embodiments of the present application do not refer to a fixed value; that is, the first real-time poses corresponding to different sample images may be different, and the second real-time poses corresponding to different sample images may be different. The same is true for other terms with "first" and "second".
In an embodiment of the present application, the first real-time pose includes a first real-time position and pose of the camera in a real-world physical scene.
In some embodiments, the electronic device obtaining the first real-time pose may include: collecting the acceleration and angular velocity of the electronic device through an inertial measurement unit on the electronic device; acquiring the relative pose between the camera and the inertial measurement unit; and determining the first real-time pose according to the acceleration, the angular velocity and the relative pose.
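As a minimal illustration of how these quantities can be combined, the sketch below composes a world-to-IMU pose (as would be obtained from the integrated acceleration and angular velocity) with a fixed camera-to-IMU relative pose to obtain the first real-time pose of the camera; all matrices, values and names are illustrative assumptions, not part of the embodiments:

import numpy as np

def make_pose(R, t):
    # Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Assumed inputs: the pose of the inertial measurement unit in the world frame
# (from inertial/visual-inertial tracking) and the fixed relative pose of the
# camera in the IMU frame (the camera-IMU extrinsics).
T_world_imu = make_pose(np.eye(3), np.array([0.5, 0.0, 1.2]))      # illustrative values
T_imu_camera = make_pose(np.eye(3), np.array([0.0, 0.02, 0.0]))    # illustrative extrinsics

# First real-time pose of the camera in the real physical scene.
T_world_camera = T_world_imu @ T_imu_camera
first_real_time_position = T_world_camera[:3, 3]
first_real_time_orientation = T_world_camera[:3, :3]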
Step 103, performing three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the first reconstruction pose set comprises a first reconstruction pose of a camera corresponding to each sample image.
In some embodiments, the electronic device may process the N frames of sample images according to a structure-from-motion (Structure from Motion, SFM) algorithm to obtain the initial point cloud and the first reconstruction pose set. In practice, the electronic device may employ COLMAP for the three-dimensional reconstruction: the N frames of sample images are input to the executable files provided by COLMAP, so that the three-dimensional reconstruction can be realized.
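Assuming COLMAP is installed and available on the command line, a sketch of invoking its executables from Python might look as follows; the paths and options are illustrative and follow COLMAP's standard command-line pipeline rather than anything specified in the embodiments:

import subprocess
from pathlib import Path

image_dir = Path("sample_images")          # directory holding the N RGB sample frames
work_dir = Path("colmap_workspace")
work_dir.mkdir(exist_ok=True)
database = work_dir / "database.db"
sparse_dir = work_dir / "sparse"
sparse_dir.mkdir(exist_ok=True)

# Typical COLMAP pipeline: feature extraction, matching, then incremental mapping.
subprocess.run(["colmap", "feature_extractor",
                "--database_path", str(database),
                "--image_path", str(image_dir)], check=True)
subprocess.run(["colmap", "exhaustive_matcher",
                "--database_path", str(database)], check=True)
subprocess.run(["colmap", "mapper",
                "--database_path", str(database),
                "--image_path", str(image_dir),
                "--output_path", str(sparse_dir)], check=True)
# The resulting sparse model (3D points plus per-image poses) corresponds to the
# initial point cloud and the first reconstruction pose set, still without metric scale.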
And 104, performing scale transformation on at least the initial point cloud according to the first real-time pose set and the first reconstruction pose set to obtain a target point cloud.
It will be appreciated that in practical applications, the unit of the coordinate system in which the first real-time pose set is located may differ from the unit of the coordinate system in which the first reconstruction pose set is located; for example, the former may be in meters while the latter is in centimeters. Therefore, if the initial point cloud obtained by three-dimensional reconstruction is directly used as the target point cloud, the resulting scale-free target point cloud leads to inaccurate positioning when applied to indoor and outdoor navigation and positioning. For example, the user is actually located in an underground parking garage, while the positioning result output by the electronic device places the user on a certain road.
Based on this, in the embodiment of the application, the electronic device may utilize the feature that the augmented reality application can obtain the first real-time pose of the camera in the real physical scene, and scale-convert the initial point cloud according to the first real-time pose set and the first reconstruction pose set, so as to obtain a scaled target point cloud, and further improve the accuracy of navigation and positioning when the target point cloud is applied to indoor and outdoor navigation positioning.
In some embodiments, the electronic device may implement step 104 by steps 204 through 205 of the following embodiments; further, the electronic device may further implement step 104 through steps 304 to 307 of the following embodiments; still further, the electronic device may implement step 104 through steps 404 to 414 of the following embodiments.
In the embodiment of the application, the electronic device can acquire sample images in a real physical scene through the camera and synchronously acquire the first real-time pose of the camera in the real physical scene. Therefore, the method does not need to introduce artificial calibration objects into the real physical scene, and the scaled target point cloud can be obtained based only on the data acquired by the electronic device itself, so that scaled image processing can be realized more conveniently and quickly, and additional material cost and labor cost can be avoided.
The embodiment of the present application further provides an image processing method, which may include the following steps 201 to 205:
step 201, collecting N frames of sample images in a real physical scene through a camera on an electronic device; n is an integer greater than 0;
step 202, acquiring, by an augmented reality application on the electronic device, the first real-time pose of the camera in the real physical scene when each frame of the sample image is acquired, so as to obtain a first real-time pose set including N first real-time poses;
Step 203, performing three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the initial point cloud comprises three-dimensional reconstruction coordinates of a plurality of sampling points, and the first reconstruction pose set comprises a first reconstruction pose of a camera corresponding to each sample image;
step 204, determining a pose transformation relationship between the first real-time pose set and the first reconstructed pose set, wherein the pose transformation relationship comprises a scale transformation relationship, a rotation relationship and a translation relationship.
In implementation, the electronic device may determine the scaling relationship between the two sets through steps 404-412 of the following embodiments.
And the rotational relationship and translational relationship may be derived by determining a rigid translational rotational transformation between the first real-time pose set and the first reconstructed pose set. For example, as shown in the following formula (1), the rotation relationship R and the translation relationship t can be obtained by obtaining the optimal solution of the formula (1) by the least square method.
(R, t) = argmin_{R,t} Σ_{i=1}^{N} ||(R p_i + t) - q_i||^2    (1);
wherein R and t are the rotation and translation transforming the point set P_A (i.e., the first reconstruction pose set) to the point set P_B (i.e., the first real-time pose set), p_i represents each element of the point set P_A, and q_i represents each element of the point set P_B.
And 205, transforming the three-dimensional reconstruction coordinates of each sampling point in the initial point cloud according to the pose transformation relationship to obtain the target point cloud.
For example, the three-dimensional reconstruction coordinate q_k of the k-th sampling point in the initial point cloud is transformed according to the following formula (2) to obtain q_k':
q_k' = s * R * q_k + t    (2);
wherein the number of sampling points in the initial point cloud is n, and k is an integer greater than 0 and less than or equal to n; s denotes the scale transformation relationship, R denotes the rotation relationship, and t denotes the translation relationship. That is, the three-dimensional reconstruction coordinates of each sampling point are transformed in the same way.
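As an illustration, the following minimal numpy sketch applies formula (2) to every sampling point of an initial point cloud; the array shapes, example values and function name are assumptions made for illustration only:

import numpy as np

def scale_transform_points(points, s, R, t):
    # Apply q_k' = s * R * q_k + t to an (n, 3) array of three-dimensional reconstruction coordinates.
    return s * (points @ R.T) + t

# Illustrative inputs: an initial point cloud and a previously solved transformation.
initial_point_cloud = np.random.rand(1000, 3)                # n sampling points
s, R, t = 2.0, np.eye(3), np.array([0.1, 0.0, -0.3])
target_point_cloud = scale_transform_points(initial_point_cloud, s, R, t)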
In some embodiments, the first reconstruction pose includes a first reconstruction location and a first reconstruction pose, and the electronic device may further perform steps 211 through 213 of:
step 211, transforming each of the first reconstruction positions according to the pose transformation relationship to obtain a third reconstruction position.
It is understood that the first reconstruction location is the three-dimensional reconstruction coordinates of the camera. The method of transforming to the third reconstruction location is the same as the method of transforming the three-dimensional reconstruction coordinates of the sampling point, i.e., see the above formula (2).
Step 212, transforming each first reconstruction gesture according to the rotation relationship in the gesture transformation relationship to obtain a corresponding second reconstruction gesture;
Step 213, adding each third reconstruction location and corresponding second reconstruction pose and sample image as a set of data to the target point cloud.
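Steps 211 to 213 can be sketched in the same spirit; assuming a first reconstruction pose is represented by a position vector and a 3x3 orientation matrix, one possible (illustrative, not normative) transformation is:

import numpy as np

def transform_reconstruction_pose(position, orientation, s, R, t):
    # The position is transformed as in formula (2); the camera orientation is rotated by R.
    third_reconstruction_position = s * (R @ position) + t
    second_reconstruction_orientation = R @ orientation
    return third_reconstruction_position, second_reconstruction_orientation

# Each (third reconstruction position, second reconstruction orientation, sample image)
# triple is then stored together with the target point cloud as one set of data.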
The embodiment of the present application further provides an image processing method, which may include the following steps 301 to 307:
step 301, acquiring N frames of sample images in a real physical scene through a camera on an electronic device; n is an integer greater than 0;
step 302, through an augmented reality application on the electronic device, acquiring first real-time poses corresponding to the camera in the real physical scene when each sample image is acquired, so as to obtain a first real-time pose set including N first real-time poses;
step 303, performing three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the initial point cloud comprises three-dimensional reconstruction coordinates of a plurality of sampling points, and the first reconstruction pose set comprises a first reconstruction pose of a camera corresponding to each sample image;
step 304, respectively carrying out re-centering processing on the first real-time pose set and the first reconstruction pose set to obtain a second real-time pose set and a second reconstruction pose set, wherein the re-centering processing is used for eliminating the influence of a coordinate system where the pose is located on a determination result of the scale transformation relation.
It will be appreciated that, if the scale change relationship between the first real-time pose set and the first reconstructed pose set is determined directly from the two, the result is actually inaccurate, as affected by the coordinate system. Therefore, it is necessary to perform the re-centering process on the two sets, respectively, and the implementation of the re-centering process is, for example, steps 404 to 407 of the following embodiments. The purpose is to obtain a more accurate scaling relationship.
Step 305, determining a scale transformation relationship between the second real-time pose set and the second reconstructed pose set.
For example, a scaling relationship between the second real-time pose set and the second reconstructed pose set may be determined by steps 408 through 412 of the following embodiments.
Step 306, determining a rotational relationship and a translational relationship between the second real-time pose set and the second reconstructed pose set.
It will be appreciated that from the two sets of poses after the re-centering process, a more accurate rotational and translational relationship can be obtained.
And step 307, transforming the three-dimensional reconstruction coordinates of each sampling point in the initial point cloud according to the scale transformation relationship, the rotation relationship and the translation relationship to obtain the target point cloud.
The embodiment of the present application further provides an image processing method, which may include the following steps 401 to 414:
step 401, collecting N frames of sample images in a real physical scene through a camera on the electronic equipment; n is an integer greater than 0;
step 402, through an augmented reality application on the electronic device, acquiring first real-time poses corresponding to the camera in the real physical scene when each sample image is acquired, so as to obtain a first real-time pose set including N first real-time poses; wherein the first real-time pose comprises a first real-time position;
step 403, performing three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the initial point cloud comprises three-dimensional reconstruction coordinates of a plurality of sampling points, and the first reconstruction pose set comprises a first reconstruction pose of a camera corresponding to each sample image; wherein the first reconstruction pose comprises a first reconstruction position;
step 404, determining a first center position of a center point of the first real-time pose set according to the N first real-time positions.
In some embodiments, the electronic device determines the average of the N first real-time positions as the first center position. For example, if the N first real-time positions are p_1(x_1, y_1, z_1), p_2(x_2, y_2, z_2) and p_3(x_3, y_3, z_3), the first center position is ((x_1 + x_2 + x_3)/3, (y_1 + y_2 + y_3)/3, (z_1 + z_2 + z_3)/3).
In another embodiment, the electronic device may further determine a weighted average of the N first real-time positions as the first center position.
And step 405, processing each first real-time position according to the first central position to obtain a second real-time pose set including N second real-time positions.
In some embodiments, the electronic device may determine the difference between each first real-time position and the first center position as the corresponding second real-time position. For example, if the first real-time position is p_1(x_1, y_1, z_1) and the first center position is p_0(x_0, y_0, z_0), the second real-time position is p_1'(x_1 - x_0, y_1 - y_0, z_1 - z_0).
In some embodiments, the electronic device may also determine the product of the difference value and a particular coefficient as the second real-time location.
Step 406, determining a second center position of a center point of the first reconstruction pose set according to the N first reconstruction positions;
step 407, processing each of the first reconstruction positions according to the second center position to obtain a second reconstruction pose set including N second reconstruction positions.
It should be noted that the manner of determining the second center position is the same as that of determining the first center position, and the manner of determining the second reconstruction positions is the same as that of determining the second real-time positions, so the description thereof is not repeated here.
Step 408, determining a first distance between each of the second real-time locations and the first center location.
In some embodiments, the first distance may be a Euclidean distance, a cosine similarity, or the like. Correspondingly, the second distance is of the same parameter type as the first distance: when the first distance is a Euclidean distance, the second distance is also a Euclidean distance; when the first distance is a cosine similarity, the second distance is also a cosine similarity.
Step 409, accumulating each first distance to obtain a sum of the first distances;
step 410, determining a second distance between each of the second reconstruction locations and the second center location;
step 411, accumulating each of the second distances to obtain a sum of the second distances;
step 412, determining the scaling relationship according to the sum of the first distances and the sum of the second distances.
For example, a ratio between the first distance sum and the second distance sum is determined as the scaling relationship.
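As a minimal sketch of steps 404 to 412, assuming each pose set is available as an (N, 3) numpy array of positions and that Euclidean distances are used for the first and second distances:

import numpy as np

def scale_between_pose_sets(real_time_positions, reconstruction_positions):
    # Steps 404-412: re-center both position sets, then take the ratio of summed distances.
    mu_real = real_time_positions.mean(axis=0)                  # first center position
    mu_rec = reconstruction_positions.mean(axis=0)              # second center position
    second_real_time = real_time_positions - mu_real            # second real-time positions
    second_reconstruction = reconstruction_positions - mu_rec   # second reconstruction positions
    sum_first = np.linalg.norm(second_real_time, axis=1).sum()        # sum of first distances
    sum_second = np.linalg.norm(second_reconstruction, axis=1).sum()  # sum of second distances
    return sum_first / sum_second                               # scale transformation relationship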
Step 413, determining a rotational relationship and a translational relationship between the second real-time pose set and the second reconstructed pose set;
and step 414, transforming the three-dimensional reconstruction coordinates of each sampling point in the initial point cloud according to the scale transformation relationship, the rotation relationship and the translation relationship to obtain the target point cloud.
Based on this, an exemplary application of the embodiments of the present application in one practical application scenario will be described below.
The embodiment of the application provides a scale recovery method based on three-dimensional reconstruction of images. The method can reconstruct a reliable three-dimensional structure model in equal proportion to the actual physical scene in large indoor and outdoor scenes. The method depends on little equipment, has lower cost, higher reconstruction precision and more accurate scale recovery, and the recovered scaled map can be used in indoor and outdoor positioning and navigation solutions. As shown in fig. 3, the method mainly includes three parts: data acquisition, data preprocessing, and scale recovery and rotation-translation transformation.
In the embodiment of the application, the data acquisition part mainly performs RGB image acquisition of a real physical scene and real-time pose acquisition of a camera under a synchronous timestamp, and then inputs acquired data as data for data preprocessing. The data acquisition section as shown in fig. 3 is realized by the following steps S11 and S12:
s11, utilizing a monocular camera to acquire RGB images;
and S12, acquiring the real-time pose of the camera under the synchronous timestamp by using the ARCore application, namely acquiring the first real-time pose.
Regarding S11, RGB image acquisition with a monocular camera is explained here. An RGB image is an example of the sample image described in the above embodiments. In order to make the RGB images exactly aligned with the timestamps of the real-time camera poses recorded by ARCore in step S12, the most direct and effective choice of camera is the camera of an Android mobile phone supporting the ARCore application. The Android mobile phone can run a self-written data acquisition application program, and the synchronization of the RGB images and the real-time pose data is ensured through hardware and software control. Therefore, in the embodiment of the application, an Android mobile phone is adopted for data acquisition. It should be noted that most mobile phone cameras use rolling shutters of ordinary precision, so the device cannot move violently while the RGB images are acquired; otherwise the photos will suffer from motion blur, which affects the environment reconstruction and scale recovery effects.
Regarding S12, the real-time pose acquisition of the camera is explained here. ARCore uses the inertial measurement unit of the Android mobile phone, which records the acceleration and angular velocity of the phone in real time; ARCore then calculates the real-time pose of the camera from the acceleration and angular velocity data and the relative pose between the camera and the inertial measurement unit. The data acquisition application program records the RGB image and the real-time pose of the camera at the same moment, completing the acquisition of the camera pose under the synchronized timestamp in S12.
In the embodiment of the application, the data preprocessing part mainly obtains a scale-free reconstruction of the real physical scene through the SFM algorithm, and aligns the real-time pose data obtained by ARCore with the reconstructed pose data to serve as the input of the scale recovery and rotation-translation transformation. As shown in fig. 3, the data preprocessing section may be realized by the following steps S21 and S22:
s21, performing three-dimensional reconstruction according to the RGB image data set output in the S11 to obtain initial point cloud data and a first reconstruction pose corresponding to each RGB image;
s22, carrying out data alignment on the reconstructed pose and the pose data recorded by ARCore.
Regarding S21, the implementation of the three-dimensional reconstruction is explained here. The SFM algorithm is used for the three-dimensional reconstruction, for example using COLMAP. The RGB image data set is input to the executable files provided by COLMAP, so that the point cloud data of the real physical scene and the reconstruction pose data corresponding to each RGB image, namely the first reconstruction poses, can be reconstructed.
Regarding S22, the data alignment between the reconstructed poses and the real-time pose data recorded by ARCore is explained here. The reconstruction pose data corresponding to each RGB image obtained in S21 is not recorded in the order of the RGB image timestamps, whereas the real-time pose data corresponding to each RGB image obtained by ARCore is recorded in the order of the RGB image timestamps; aligning the two sets of pose data corresponding to the RGB images is necessary to make the input of the scale recovery and rotation-translation transformation error free. The alignment process is to sort the reconstruction pose data corresponding to each RGB image obtained in S21 according to the timestamp order.
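A small illustrative sketch of this alignment step is given below; it assumes each reconstruction pose can be looked up by the name (or timestamp) of the RGB image it belongs to, which is an assumption about the data layout rather than something prescribed by the patent:

def align_pose_sequences(reconstructed_by_image, arcore_records):
    # arcore_records: (image_name, real_time_pose) pairs already in timestamp order.
    # reconstructed_by_image: reconstruction poses keyed by image name.
    aligned_reconstructed, aligned_real_time = [], []
    for image_name, real_time_pose in arcore_records:
        if image_name in reconstructed_by_image:        # an image may have been dropped by SFM
            aligned_reconstructed.append(reconstructed_by_image[image_name])
            aligned_real_time.append(real_time_pose)
    return aligned_reconstructed, aligned_real_time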
In the embodiment of the application, the scale recovery and rotation-translation transformation part processes the data obtained by the data preprocessing part to obtain a final, reliable three-dimensional structure model reconstruction result in equal proportion to the real physical scene. As shown in fig. 3, the scale recovery and rotation-translation transformation part can be realized by the following steps S31 and S32:
s31, solving a pose transformation relation between two groups of pose data, wherein the pose transformation relation comprises the following steps: a scale transformation relationship, a rotation relationship R and a translation relationship t;
and S32, applying the pose transformation relation to the three-dimensional reconstruction data output in the S21 to obtain a three-dimensional reconstruction result with scale information.
Regarding S31, solving the pose transformation relationship between the two sets of pose data is explained here. The scale transformation relationship s is calculated first, and then the rotation relationship R and the translation relationship t are calculated.
In some embodiments, the method for calculating the scale transformation relationship s is as follows. The two sets of pose data can be regarded as two point sets: the set of translation components of the reconstruction poses obtained by three-dimensional reconstruction is recorded as P_A, and the set of translation components of the real-time poses obtained by ARCore is recorded as P_B. In order to eliminate the influence of the inconsistency of the coordinate systems on the calculation result of s, a point-set re-centering process is needed. First, the center points of the two point sets are calculated, a point set P being represented by the following formula (3):
P = {p_1, p_2, ..., p_N}    (3);
That is, the point set P_A includes the reconstruction coordinates in each reconstruction pose, and the point set P_B includes the real-time coordinates in each real-time pose.
For P_A and P_B, the center points μ_A and μ_B of the point sets are calculated according to the following formulas (4) and (5), respectively:
μ_A = (1/N) Σ_{i=1}^{N} p_i    (4);
μ_B = (1/N) Σ_{i=1}^{N} q_i    (5);
wherein N represents the number of reconstruction coordinates included in the point set P_A, and the number of data in the point set P_A is equal to the number of data in the point set P_B. As can be seen from the above formulas, the center point of a point set is the average of all coordinates in the point set. For example, if a point set P includes the coordinate points p_1(x_1, y_1, z_1), p_2(x_2, y_2, z_2) and p_3(x_3, y_3, z_3), the coordinates of the center point μ of the point set P are ((x_1 + x_2 + x_3)/3, (y_1 + y_2 + y_3)/3, (z_1 + z_2 + z_3)/3).
The two point sets are re-centered according to the following formulas (6) and (7) to obtain new point sets P_A' and P_B':
P_A' = P_A - μ_A    (6);
P_B' = P_B - μ_B    (7);
The scale transformation relationship s is then obtained according to the following formula (8):
s = ||P_B'||_2 / ||P_A'||_2    (8);
in the formula, ||P_B'||_2 represents the sum of the Euclidean distances between the coordinates of each point in the new point set P_B' and the center point μ_B, i.e.
||P_B'||_2 = Σ_{i=1}^{N} ||q_i - μ_B||_2;
||P_A'||_2 has a similar meaning and is not described in detail here.
In some embodiments, the method of calculating the rotation-translation relationship, i.e., the rotation relationship R and the translation relationship t, is as follows. Computing the rigid rotation-translation transformation between the two point sets P_A' and P_B' can be converted into a least squares optimization problem, as shown in the following formula (9):
(R, t) = argmin_{R,t} Σ_{i=1}^{N} ||(R p_i' + t) - q_i'||^2    (9);
wherein R and t are the rotation and translation transforming the point set P_A' to the point set P_B', p_i' represents each element of the point set P_A', and q_i' represents each element of the point set P_B'. Calculating the minimum value of the above formula (9) is converted into solving for the R and t at which the derivative of formula (9) is 0.
Treating R in the above formula (9) as a constant, denoting the objective function of formula (9) by F(R, t) and taking the derivative with respect to t, the following formula (10) can be obtained:
∂F/∂t = Σ_{i=1}^{N} 2(R p_i' + t - q_i') = 0    (10);
Let
μ_A' = (1/N) Σ_{i=1}^{N} p_i',  μ_B' = (1/N) Σ_{i=1}^{N} q_i',
i.e. μ_A' and μ_B' are the center points of P_A' and P_B'. The following formula (11) can be derived:
t = μ_B' - R μ_A'    (11);
Substituting formula (11) into the least squares expression (9) yields the following formula (12):
Σ_{i=1}^{N} ||R p_i' + (μ_B' - R μ_A') - q_i'||^2 = Σ_{i=1}^{N} ||R (p_i' - μ_A') - (q_i' - μ_B')||^2    (12);
Since P_A' and P_B' have already been re-centered, μ_A' and μ_B' are zero vectors, so the following formula (13) can be obtained:
R = argmin_R Σ_{i=1}^{N} ||R p_i' - q_i'||^2    (13);
The above formula (13) is expanded in a matrix representation to obtain the following formula (14):
||R p_i' - q_i'||^2 = p_i'^T R^T R p_i' - q_i'^T R p_i' - p_i'^T R^T q_i' + q_i'^T q_i'    (14);
Since the rotation matrix R is an orthogonal matrix, R^T R = R R^T = I; meanwhile, the second and third terms in the above formula (14) are scalars, and the transpose of a scalar is still equal to the scalar itself, so the following formula (15) can be obtained:
||R p_i' - q_i'||^2 = p_i'^T p_i' - 2 q_i'^T R p_i' + q_i'^T q_i'    (15);
The problem now turns to minimizing the above formula (15), where only one term is related to R and the other two terms are constants; since that term enters with a negative sign, the problem turns into maximizing the term containing the variable R, i.e., as shown in the following formula (16):
R = argmax_R Σ_{i=1}^{N} q_i'^T R p_i'    (16);
The following formula (17) is then obtained:
Σ_{i=1}^{N} q_i'^T R p_i' = tr(R P' Q'^T)    (17);
The conversion in the above formula turns the accumulation into a matrix multiplication, where R is a 3x3 matrix, Q' and P' are 3xN matrices whose columns are q_i' and p_i' respectively, and the trace of the matrix product is equal to the value on the left of the equation. Meanwhile, letting the singular value decomposition of P' Q'^T be P' Q'^T = U D V^T, the trace satisfies the transformation relationship shown in the following formula (18):
tr(R P' Q'^T) = tr(R U D V^T) = tr(D V^T R U)    (18);
In the above formula, since U, R and V are all orthogonal matrices, M = V^T R U is also an orthogonal matrix. To maximize the trace, given that M is an orthogonal matrix (whose diagonal elements cannot exceed 1), M is necessarily an identity matrix, namely the following formula (19) is obtained:
I = M = V^T R U    (19);
thereby obtaining the rotation relationship R according to the following formula (20):
R = V U^T    (20);
After the rotation matrix R is obtained, the translation vector t can be obtained according to the following formula (21):
t = μ_B' - R μ_A'    (21);
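Putting formulas (3) to (21) together, the following numpy sketch computes s, R and t from the aligned position sets; it is an illustrative implementation under the assumption that P_A and P_B are (N, 3) arrays of reconstruction and real-time positions, and the final translation is written in the convention used by formula (2) (applied to the original, un-centered coordinates):

import numpy as np

def solve_scale_rotation_translation(P_A, P_B):
    # P_A: reconstruction positions, P_B: real-time positions, both of shape (N, 3).
    mu_A, mu_B = P_A.mean(axis=0), P_B.mean(axis=0)              # formulas (4) and (5)
    A, B = P_A - mu_A, P_B - mu_B                                # re-centering, formulas (6) and (7)
    s = np.linalg.norm(B, axis=1).sum() / np.linalg.norm(A, axis=1).sum()   # formula (8)
    H = A.T @ B                                                  # 3x3 matrix P'Q'^T, cf. formula (17)
    U, _, Vt = np.linalg.svd(H)                                  # singular value decomposition, cf. (18)
    R = Vt.T @ U.T                                               # formula (20): R = V U^T
    if np.linalg.det(R) < 0:                                     # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = mu_B - s * (R @ mu_A)                                    # translation so that s*R*q + t maps P_A onto P_B
    return s, R, t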
Regarding S32, applying the pose transformation relationship to the three-dimensional reconstruction data of S21 to obtain a three-dimensional reconstruction result with scale information is explained here. The s, R and t obtained by solving in S31 are applied to the initial point cloud data obtained by three-dimensional reconstruction and to the reconstruction positions in the reconstruction poses, and R is applied to the reconstruction orientations in the reconstruction poses, so as to obtain a three-dimensional reconstruction result with scale information; in this way the reconstructed coordinate system can be aligned to the ARCore coordinate system, providing good map data for large-scale indoor and outdoor map positioning based on a mobile phone.
In the embodiment of the application, a complete solution is provided for reconstructing a reliable three-dimensional structure model in equal proportion to the real physical world in large indoor and outdoor scenes.
In the embodiment of the application, the solution for reconstructing the three-dimensional structure model depends on little equipment and can be realized with only a common Android mobile terminal device, so the cost is low, the reconstruction precision is high, and the scale recovery is accurate; the recovered scaled map, namely the target point cloud, can be used in indoor and outdoor positioning and navigation solutions, such as AR navigation.
Based on the foregoing embodiments, embodiments of the present application provide an image processing apparatus, where the apparatus includes each module included, and may be implemented by a processor in an electronic device; of course, the method can also be realized by a specific logic circuit; in an implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 4A is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, as shown in fig. 4A, the apparatus 400 includes an image acquisition module 401, a pose acquisition module 402, a three-dimensional reconstruction module 403, and a map transformation module 404, where:
the image acquisition module 401 is configured to acquire N frames of sample images in a real physical scene, where N is an integer greater than 0;
a pose acquisition module 402, configured to acquire a first real-time pose set, where the first real-time pose set includes a first real-time pose of the image acquisition module in the real physical scene when acquiring each frame of the sample image;
the three-dimensional reconstruction module 403 is configured to perform three-dimensional reconstruction according to the N frame sample images, so as to obtain an initial point cloud and a first reconstruction pose set; the first reconstruction pose set comprises a first reconstruction pose of an image acquisition module corresponding to each frame of the sample image;
And the map transformation module 404 is configured to perform scale transformation on at least the initial point cloud according to the first real-time pose set and the first reconstructed pose set, so as to obtain a target point cloud.
In some embodiments, the initial point cloud includes three-dimensional reconstruction coordinates of a plurality of sampling points; a map transformation module 404 for: determining a pose transformation relationship between the first real-time pose set and the first reconstructed pose set, wherein the pose transformation relationship comprises a scale transformation relationship, a rotation relationship and a translation relationship; and according to the pose transformation relation, transforming the three-dimensional reconstruction coordinates of each sampling point in the initial point cloud to obtain the target point cloud.
In some embodiments, the map transformation module 404 is to: respectively carrying out re-centering treatment on the first real-time pose set and the first reconstruction pose set to obtain a second real-time pose set and a second reconstruction pose set, wherein the re-centering treatment is used for eliminating the influence of a coordinate system where the pose is located on a determination result of the scale transformation relation; and determining a scale transformation relationship between the second real-time pose set and the second reconstructed pose set.
In some embodiments, the first real-time pose comprises a first real-time position, and the first reconstructed pose comprises a first reconstructed position; a map transformation module 404 for: determining a first center position of a center point of the first real-time pose set according to the N first real-time positions; processing each first real-time position according to the first center position to obtain a second real-time pose set comprising N second real-time positions; determining a second center position of a center point of the first reconstruction pose set according to the N first reconstruction positions; and processing each first reconstruction position according to the second center position to obtain a second reconstruction pose set comprising N second reconstruction positions.
In some embodiments, the map transformation module 404 is to: determining a first distance between each of the second real-time locations and the first central location; accumulating each first distance to obtain the sum of the first distances; determining a second distance between each of the second reconstruction locations and the second center location; accumulating each second distance to obtain the sum of the second distances; and determining the scale transformation relation according to the sum of the first distances and the sum of the second distances.
In some embodiments, the first reconstruction pose includes a first reconstruction location and a first reconstruction pose, as shown in fig. 4B, the apparatus further includes a pose transformation module 405 and a data addition module 406; the pose transformation module 405 is configured to transform each of the first reconstruction positions according to the pose transformation relationship to obtain a third reconstruction position; transforming each first reconstruction gesture according to the rotation relation in the gesture transformation relation to obtain a corresponding second reconstruction gesture; a data adding module 406, configured to add each of the third reconstruction locations and the corresponding second reconstruction pose and sample image as a set of data to the target point cloud.
In some embodiments, the pose acquisition module 402 is configured to collect the acceleration and angular velocity of the device through an inertial measurement unit on the device; acquire the relative pose between the image acquisition module 401 and the inertial measurement unit; and determine the first real-time pose according to the acceleration, the angular velocity and the relative pose.
The description of the apparatus embodiments above is similar to the description of the method embodiments, and the apparatus embodiments have advantageous effects similar to those of the method embodiments. For technical details not disclosed in the apparatus embodiments of the present application, please refer to the description of the method embodiments of the present application.
In the embodiments of the present application, if the image processing method is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, or the part thereof contributing to the related art, may be embodied essentially in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present application provides an electronic device. Fig. 5 is a schematic diagram of a hardware entity of the electronic device in an embodiment of the present application. As shown in fig. 5, the hardware entity of the electronic device 500 includes a memory 501 and a processor 502, the memory 501 stores a computer program executable on the processor 502, and the processor 502 implements the steps of the image processing method provided in the above embodiments when executing the program.
The memory 501 is configured to store instructions and applications executable by the processor 502, and may also cache data (for example, image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 502 and the modules in the electronic device 500. The memory 501 may be implemented by a flash memory (FLASH) or a random access memory (RAM).
Accordingly, the present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the image processing method provided in the above embodiment.
Fig. 6 is a schematic structural diagram of a chip according to an embodiment of the present application. The chip 600 shown in fig. 6 includes a processor 610, and the processor 610 may call and run a computer program from a memory to implement the methods in the embodiments of the present application.
Optionally, as shown in fig. 6, the chip 600 may further include a memory 620. The processor 610 may call and run a computer program from the memory 620 to implement the methods in the embodiments of the present application.
The memory 620 may be a separate device from the processor 610 or may be integrated into the processor 610.
Optionally, the chip 600 may also include an input interface 630. The processor 610 may control the input interface 630 to communicate with other devices or chips, and in particular, may acquire information or data sent by the other devices or chips.
Optionally, the chip 600 may further include an output interface 640. The processor 610 may control the output interface 640 to communicate with other devices or chips, and in particular, may output information or data to other devices or chips.
Optionally, the chip may be applied to the electronic device in the embodiment of the present application, and the chip may implement a corresponding flow implemented by the electronic device in each method in the embodiment of the present application, which is not described herein for brevity.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, a system-on-chip, or the like.
It should be noted here that the above description of the storage medium, chip, and electronic device embodiments is similar to the description of the method embodiments, and these embodiments have advantageous effects similar to those of the method embodiments. For technical details not disclosed in the storage medium, chip, and electronic device embodiments of the present application, please refer to the description of the method embodiments of the present application.
It should be appreciated that reference throughout this specification to "one embodiment", "an embodiment", or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment", "in an embodiment", or "in some embodiments" in various places throughout this specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application. The foregoing embodiment numbers are merely for description and do not represent that one embodiment is better or worse than another.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the image processing apparatus are merely illustrative. For example, the division of the modules is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or modules may be electrical, mechanical, or in other forms.
The modules described above as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated in one processing unit, or each module may be separately used as one unit, or two or more modules may be integrated in one unit; the integrated modules may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be completed by program instructions and related hardware. The foregoing program may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if the above integrated unit is implemented in the form of a software functional module and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, or the part thereof contributing to the related art, may be embodied essentially in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
The features disclosed in the several product embodiments provided in the present application may be combined arbitrarily without conflict to obtain new product embodiments.
The features disclosed in the several method or apparatus embodiments provided in the present application may be arbitrarily combined without conflict to obtain new method embodiments or apparatus embodiments.
The foregoing is merely an embodiment of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. An image processing method, the method comprising:
collecting N frames of sample images in a real physical scene through a camera on electronic equipment, wherein N is an integer greater than 0;
acquiring a first real-time pose set, wherein the first real-time pose set comprises a first real-time pose of the camera in the real physical scene when each frame of the sample image is acquired;
performing three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the first reconstruction pose set comprises a first reconstruction pose of a camera corresponding to each frame of the sample image;
according to the pose transformation relation between the first real-time pose set and the first reconstruction pose set, at least performing scale transformation on the initial point cloud to obtain a target point cloud;
wherein the first reconstruction pose comprises a first reconstruction position and a first reconstruction attitude;
transforming each first reconstruction position according to the pose transformation relation to obtain a third reconstruction position;
transforming each first reconstruction attitude according to the rotation relation in the pose transformation relation to obtain a corresponding second reconstruction attitude;
and adding each third reconstruction position and the corresponding second reconstruction attitude and sample image as a set of data into the target point cloud.
2. The method of claim 1, wherein the initial point cloud comprises three-dimensional reconstructed coordinates of a plurality of sampling points;
the pose transformation relation comprises a scale transformation relation, a rotation relation and a translation relation;
and according to the pose transformation relation, transforming the three-dimensional reconstruction coordinates of each sampling point in the initial point cloud to obtain the target point cloud.
3. The method of claim 2, wherein determining the scale transformation relation between the first real-time pose set and the first reconstruction pose set comprises:
respectively performing re-centering processing on the first real-time pose set and the first reconstruction pose set to obtain a second real-time pose set and a second reconstruction pose set, wherein the re-centering processing is used for eliminating the influence of the coordinate system in which the pose is located on the determination result of the scale transformation relation;
and determining the scale transformation relation between the second real-time pose set and the second reconstruction pose set.
4. The method of claim 3, wherein the first real-time pose comprises a first real-time position;
and the performing re-centering processing on the first real-time pose set and the first reconstruction pose set respectively to obtain the second real-time pose set and the second reconstruction pose set comprises:
determining a first center position of a center point of the first real-time pose set according to the N first real-time positions;
processing each first real-time position according to the first center position to obtain a second real-time pose set comprising N second real-time positions;
determining a second center position of a center point of the first reconstruction pose set according to the N first reconstruction positions;
and processing each first reconstruction position according to the second center position to obtain a second reconstruction pose set comprising N second reconstruction positions.
5. The method of claim 4, wherein determining the scale transformation relation between the second real-time pose set and the second reconstruction pose set comprises:
determining a first distance between each second real-time position and the first center position;
accumulating each first distance to obtain the sum of the first distances;
determining a second distance between each second reconstruction position and the second center position;
accumulating each second distance to obtain the sum of the second distances;
and determining the scale transformation relation according to the sum of the first distances and the sum of the second distances.
6. The method of claim 1, wherein acquiring the first real-time pose comprises:
collecting acceleration and angular velocity of the electronic equipment through an inertial measurement unit on the electronic equipment;
acquiring a relative pose between the camera and the inertial measurement unit;
and determining the first real-time pose according to the acceleration, the angular velocity and the relative pose.
7. An image processing apparatus, comprising:
the image acquisition module is used for acquiring N frames of sample images in a real physical scene, wherein N is an integer greater than 0;
the pose acquisition module is used for acquiring a first real-time pose set, wherein the first real-time pose set comprises a first real-time pose of the image acquisition module in the real physical scene when acquiring each frame of the sample image;
the three-dimensional reconstruction module is used for carrying out three-dimensional reconstruction according to the N frames of sample images to obtain an initial point cloud and a first reconstruction pose set; the first reconstruction pose set comprises a first reconstruction pose of the image acquisition module corresponding to each sample image; the first reconstruction pose comprises a first reconstruction position and a first reconstruction attitude;
the map transformation module is used for performing scale transformation on at least the initial point cloud according to the pose transformation relation between the first real-time pose set and the first reconstruction pose set to obtain a target point cloud;
the pose transformation module is used for transforming each first reconstruction position according to the pose transformation relation to obtain a third reconstruction position; transforming each first reconstruction attitude according to the rotation relation in the pose transformation relation to obtain a corresponding second reconstruction attitude;
and the data adding module is used for adding each third reconstruction position, the corresponding second reconstruction attitude and the sample image into the target point cloud as a set of data.
8. An electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the steps of the image processing method of any of claims 1 to 6 when the program is executed.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the image processing method of any one of claims 1 to 6.
CN201911356995.4A 2019-12-25 2019-12-25 Image processing method and device, equipment and storage medium Active CN111145339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911356995.4A CN111145339B (en) 2019-12-25 2019-12-25 Image processing method and device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111145339A (en) 2020-05-12
CN111145339B (en) 2023-06-02

Family

ID=70520006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911356995.4A Active CN111145339B (en) 2019-12-25 2019-12-25 Image processing method and device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111145339B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402429B (en) * 2020-06-08 2020-09-15 成都索贝数码科技股份有限公司 Scale reduction and three-dimensional reconstruction method, system, storage medium and equipment
CN112541971A (en) * 2020-12-25 2021-03-23 深圳市慧鲤科技有限公司 Point cloud map construction method and device, electronic equipment and storage medium
CN112750203B (en) * 2021-01-21 2023-10-31 脸萌有限公司 Model reconstruction method, device, equipment and storage medium
CN113256804B (en) * 2021-06-28 2021-10-22 湖北亿咖通科技有限公司 Three-dimensional reconstruction scale recovery method and device, electronic equipment and storage medium
CN113838197A (en) * 2021-11-29 2021-12-24 南京天辰礼达电子科技有限公司 Region reconstruction method and system
CN114494356B (en) * 2022-04-02 2022-06-24 中傲数据技术(深圳)有限公司 Badminton video clip processing method and system based on artificial intelligence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2863336A2 (en) * 2013-10-17 2015-04-22 Samsung Electronics Co., Ltd. System and method for reconstructing 3d model
CN109003325A (en) * 2018-06-01 2018-12-14 网易(杭州)网络有限公司 A kind of method of three-dimensional reconstruction, medium, device and calculate equipment
CN110047142A (en) * 2019-03-19 2019-07-23 中国科学院深圳先进技术研究院 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium
CN110189399A (en) * 2019-04-26 2019-08-30 浙江大学 A kind of method and system that interior three-dimensional layout rebuilds
CN110310333A (en) * 2019-06-27 2019-10-08 Oppo广东移动通信有限公司 Localization method and electronic equipment, readable storage medium storing program for executing
CN110415342A (en) * 2019-08-02 2019-11-05 深圳市唯特视科技有限公司 A kind of three-dimensional point cloud reconstructing device and method based on more merge sensors

Also Published As

Publication number Publication date
CN111145339A (en) 2020-05-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant