CN116468917A - Image processing method, electronic device and storage medium - Google Patents
Image processing method, electronic device and storage medium
- Publication number
- CN116468917A (application number CN202310283069.9A)
- Authority
- CN
- China
- Prior art keywords
- image frame
- points
- feature
- image
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
- G06V20/653—Three-dimensional objects by matching three-dimensional models, e.g. conformal mapping of Riemann surfaces
Abstract
The application provides an image processing method, an electronic device and a storage medium. The method comprises the following steps: acquiring a plurality of image frames at equal time intervals; performing feature extraction on each image frame and determining the feature points of the current image frame, wherein the feature point data of each feature point comprises a descriptor describing the image area around the feature point; performing feature matching between each feature point and all feature points of the previous image frame, and determining the feature point of the previous image frame that matches the feature point; and sending matching relation data and image data to a terminal, so that the terminal determines a three-dimensional model of a target based on the matching relation data and the image data, wherein the matching relation data comprises the matched feature points, the feature point IDs of the feature points of the previous image frame and the feature point data, and the image data comprises the image data of the current image frame. With the method and the device, a three-dimensional model with accurate parameters can be obtained, thereby improving the generation quality of three-dimensional images and the user experience.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an electronic device, and a storage medium.
Background
Smart head-mounted devices (e.g., AR glasses, VR glasses, or MR glasses) are developing rapidly and are being used increasingly widely. Such devices can display three-dimensional content and support interaction with it, but producing three-dimensional content currently depends mainly on professional equipment and professional staff. Shooting and sharing three-dimensional images is therefore not as convenient as it is for two-dimensional images, even though a smart headset equipped with a camera is more natural and convenient for shooting than a hand-held terminal (e.g., a cell phone).
Disclosure of Invention
In a first aspect, an embodiment of the present application provides an image processing method, which is applied to an intelligent wearable device, including:
acquiring a plurality of image frames at equal intervals;
performing feature extraction on each image frame, and determining the feature points of the current image frame, wherein the feature point data of the feature points comprise descriptors for describing the image areas around the feature points;
performing feature matching on each feature point and all feature points of the previous image frame, and determining the feature points of the previous image frame matched with the feature points;
and transmitting matching relation data and image data to a terminal, so that the terminal determines a three-dimensional model of a target based on the matching relation data and the image data, wherein the matching relation data comprises the matched feature points, the feature point IDs of the feature points of the previous image frame and the feature point data, and the image data comprises the image data of the current image frame.
In some embodiments, further comprising:
performing feature matching on each feature point and all feature points of the previous image frame, and determining Euclidean distances between each feature point and all feature points of the previous image frame;
determining the feature points of the previous image frame matched with the feature points based on the Euclidean distances;
or, performing feature matching on each feature point and all feature points of the previous image frame based on a fast neighbor matching algorithm, and determining the feature points of the previous image frame matched with the feature points.
In some embodiments, further comprising:
determining that a feature point does not match any feature point of the previous image frame, and discarding the feature point;
and determining that none of the feature points matches any feature point of the previous image frame, and discarding the current image frame.
In some embodiments, further comprising:
determining that a feature point does not match any feature point of the previous image frame, and retaining the feature point;
and determining that none of the feature points matches any feature point of the previous image frame, retaining the current image frame and discarding the previous image frame.
In a second aspect, an embodiment of the present application provides an image processing method, applied to a terminal, including:
receiving matching relation data and image data sent by a smart wearable device, wherein the matching relation data is determined by the smart wearable device based on feature matching between each feature point of a current image frame and all feature points of a previous image frame, the matching relation data comprises the matched feature points, the feature point IDs of the feature points of the previous image frame and the feature point data, the image data comprises the image data of the current image frame, and the feature point data of a feature point comprises a descriptor describing the image area around the feature point;
determining the camera pose and three-dimensional space points of the current image frame based on the matching relation data, the current image frame and the previous image frame;
performing BA optimization on all the determined camera poses and three-dimensional space points;
determining that no new image frame is received, and performing BA optimization on all the determined camera poses and three-dimensional space points to obtain final camera poses and three-dimensional space points;
and determining a three-dimensional model based on all received image frames and the final camera poses, camera intrinsic parameters and three-dimensional space points of all the image frames.
In some embodiments, further comprising:
setting the camera pose of the current image frame as an identity matrix, and determining a target matrix based on the identity matrix and the matching relation data;
decomposing the target matrix to obtain the camera pose of the current image frame;
and performing triangulation based on the camera pose of the current image frame, the camera pose of the previous image frame and the matching relationship data to determine the three-dimensional space point.
In some embodiments, further comprising:
and inputting all received image frames, together with the final camera poses, camera intrinsic parameters and three-dimensional space points of all the image frames, into a target algorithm, and outputting the three-dimensional model, wherein the target algorithm is an MVS algorithm or a NeRF algorithm.
In some embodiments, further comprising:
and outputting a multi-view rendering video or a three-dimensional data file based on the three-dimensional model.
In a third aspect, embodiments of the present application further provide an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements any one of the image processing methods described above.
In a fourth aspect, embodiments of the present application also provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an image processing method as described in any of the above.
In a fifth aspect, embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements an image processing method as described in any of the above.
Drawings
For a clearer description of the present application or of the prior art, the drawings that are used in the description of the embodiments or of the prior art will be briefly described, it being apparent that the drawings in the description below are some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a smart wearable device according to an embodiment of the present application;
Fig. 2 is a flow chart of an image processing method according to an embodiment of the present application;
Fig. 3 is a second flow chart of an image processing method according to an embodiment of the present application;
Fig. 4 is a third flow chart of an image processing method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions and advantages of the present application more apparent, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Some image processing methods provided by the embodiments of the present application may be applied to smart wearable devices such as augmented reality (augmented reality, AR) or virtual reality (virtual reality, VR) devices; the embodiments of the present application do not limit the specific type of the smart wearable device in any way.
Illustratively, fig. 1 is a schematic structural diagram of a smart wearable device provided in an embodiment of the present application. As shown in fig. 1, the smart wearable device 100 may include a processor 110, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, a wireless communication module 160, a sensor module 180, a key 190, a light emitting diode (light emitting diode, LED) lamp 191, a camera 193, a display component 194, an optical engine 195, and the like; wherein the sensor module 180 includes a touch sensor 180K, and the optical engine 195 includes a lens and a display screen.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the smart wearable device 100. In other embodiments of the present application, the smart wearable device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, a universal asynchronous receiver/transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The I2C interface is a bi-directional synchronous serial bus comprising a serial data line (serial data line, SDA) and a serial clock line (serial clock line, SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, charger, flash, camera 193, etc., respectively, through different I2C bus interfaces. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the smart wearable device 100.
The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function.
The MIPI interface may be used to connect the processor 110 with peripheral devices such as the display component 194, the camera 193, and the like. The MIPI interfaces include camera serial interfaces (camera serial interface, CSI), display serial interfaces (display serial interface, DSI), and the like. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the photographing function of the smart wearable device 100. The processor 110 and the display component 194 communicate through a DSI interface to implement the display function of the smart wearable device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display component 194, the wireless communication module 160, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the smart wearable device 100, and may also be used to transfer data between the smart wearable device 100 and a peripheral device. It may also be used to connect a headset and play audio through the headset. The interface may also be used to connect other smart wearable devices, such as smart bracelets, smart rings, or smart phones, among others.
It should be understood that the interface connection relationship between the modules illustrated in the embodiment of the present invention is only illustrated schematically, and does not limit the structure of the smart wearable device 100. In other embodiments of the present application, the smart wearable device 100 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the smart wearable device 100. The charging management module 140 may also supply power to the smart wearable device 100 through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display component 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the smart wearable device 100 may be implemented by the antenna 1, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antenna 1 is used for transmitting and receiving electromagnetic wave signals. Each antenna in the smart wearable device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The wireless communication module 160 may provide solutions for wireless communication applied on the smart wearable device 100, including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, Wi-Fi) network), bluetooth (bluetooth, BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (infrared, IR), etc. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 1, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it into electromagnetic waves to radiate through the antenna 1.
In some embodiments, the antenna 1 and the wireless communication module 160 of the smart wearable device 100 are coupled such that the smart wearable device 100 may communicate with a network and other devices through wireless communication techniques. The wireless communication techniques may include the global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include the global positioning system (global positioning system, GPS), the global navigation satellite system (global navigation satellite system, GLONASS), the beidou navigation satellite system (beidou navigation satellite system, BDS), the quasi-zenith satellite system (quasi-zenith satellite system, QZSS) and/or the satellite based augmentation systems (satellite based augmentation systems, SBAS).
The smart wearable device 100 implements display functions through a GPU, a display part 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display unit 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display component 194 is used to display images, videos, and the like. The display component 194 may include a display lens or a display mask, and may also include a display screen. The display lens or display mask may be an optical waveguide (such as a diffractive optical waveguide or a geometric optical waveguide), a free-form prism, free space, or the like; the display lens or display mask is the propagation path of the imaging light path and can transmit a virtual image to the human eyes. To allow AR glasses to show real and virtual pictures at the same time, a waveguide may be used to transmit the light of the virtual picture into the human eyes.
The display screen may be a display panel, and the display panel may be a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like.
In some embodiments, the smart wearable device 100 may include 1 or N display components 194, N being a positive integer greater than 1.
The smart wearable device 100 may implement a photographing function through an ISP, a camera 193, a video codec, a GPU, a display part 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, the smart wearable device 100 may include 1 or N cameras 193, N being a positive integer greater than 1. In embodiments of the present application, camera 193 may include at least one infrared camera.
Illustratively, the camera 193 may be configured to acquire a plurality of image frames from which a plurality of time-equally spaced image frames are acquired for performing the image processing method of the embodiment of the present application.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the smart wearable device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform on the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The smart wearable device 100 may support one or more video codecs. In this way, the smart wearable device 100 may play or record video in a variety of encoding formats, such as: moving picture experts group (moving picture experts group, MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent cognition of the intelligent wearable device 100 can be implemented by the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The internal memory 121 may be used to store computer executable program code including instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data (e.g., audio data, phonebook, etc.) created during use of the smart wearable device 100. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional applications and data processing of the smart wearable device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The touch sensor 180K is also referred to as a "touch device". The touch sensor 180K may be disposed on the display component 194, and the touch sensor 180K and the display component 194 form a touch screen, also referred to as a "touch panel". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the type of the touch event. Visual output related to the touch operation may be provided through the display component 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the smart wearable device 100 at a location different from that of the display component 194.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys or touch keys. The smart wearable device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the smart wearable device 100.
The optical engine 195 is mainly used for imaging, and includes a lens and a display screen, where the lens may be an optical component.
Other image processing methods provided in the embodiments of the present application may be applied to terminal devices such as mobile phones, tablet computers, vehicle-mounted devices, augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) devices, notebook computers, ultra-mobile personal computers (ultra-mobile personal computer, UMPC), netbooks and personal digital assistants (personal digital assistant, PDA), and may also be applied to databases, servers, and service response systems based on terminal artificial intelligence. In general, the processing capability of the terminal device described in the embodiments of the present application is stronger than that of the smart wearable device; for example, the terminal device has a graphics processing unit with more cores. The embodiments of the present application do not impose any limitation on the specific type of the terminal device. The mobile terminal (terminal device) in the embodiments of the present application includes various handheld devices, vehicle-mounted devices, computing devices, or other processing devices connected to a wireless modem, such as mobile phones, tablets, desktop and notebook computers, as well as intelligent devices that can run application programs, including the central console of a smart car, and the like. Specifically, it may refer to user equipment (user equipment, UE), an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent, or user equipment. The terminal device may also be a satellite phone, a cellular phone, a smart phone, a wireless data card, a wireless modem, a machine type communication device, a cordless phone, a session initiation protocol (session initiation protocol, SIP) phone, a wireless local loop (wireless local loop, WLL) station, a personal digital assistant (personal digital assistant, PDA), a handheld device with wireless communication capabilities, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, a wireless terminal in industrial control (industrial control), a wireless terminal in self-driving (self-driving), a wireless terminal in telemedicine (remote medical), a wireless terminal in a smart grid (smart grid), a wireless terminal in transportation safety (transportation safety), a wireless terminal in a smart city (smart city), a wireless terminal in a smart home (smart home), a terminal device in a 5G network or a future communication network, or the like. The mobile terminal may be powered by a battery, or may be attached to and powered by the power system of a vehicle or vessel. The power supply system of the vehicle or vessel may also charge the battery of the mobile terminal to extend the communication time of the mobile terminal.
Referring to fig. 2, an embodiment of the present application further provides an image processing method applied to an intelligent wearable device, including, but not limited to, the following steps:
step 201, acquiring a plurality of image frames with equal time intervals;
step 202, extracting features of each image frame, and determining feature points of the current image frame, wherein feature point data of the feature points comprise descriptors for describing image areas around the feature points;
step 203, performing feature matching on each feature point and all feature points of the previous image frame, and determining the feature point of the previous image frame matched with the feature point;
and step 204, sending matching relation data and image data to a terminal, so that the terminal determines a three-dimensional model of a target based on the matching relation data and the image data, wherein the matching relation data comprises the matched feature points, the feature point IDs of the feature points of the previous image frame and the feature point data, and the image data comprises the image data of the current image frame.
In the above step 201, the smart wearable device may, upon receiving an input instruction from the user, start the camera to shoot a video segment and select a plurality of image frames with equal time intervals from the video segment in real time using an equal-time-interval policy; alternatively, the camera may shoot at equal time intervals to obtain the plurality of image frames. This is not limited in the present application. The smart wearable device may be AR (augmented reality) glasses, VR (virtual reality) glasses, or the like. For example, the time interval may be set to 500 ms, i.e. the acquisition interval between image frames is 500 ms, and the interval may also be adjusted according to the specific situation.
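As a concrete illustration, the following is a minimal sketch of such equal-time-interval sampling, assuming OpenCV is used to read the captured video and a 500 ms interval; the function name and structure are illustrative, not taken from the application.

```python
import cv2

def sample_frames(video_path: str, interval_ms: float = 500.0):
    """Select image frames from a captured video at (approximately) equal time intervals."""
    cap = cv2.VideoCapture(video_path)
    frames, next_t = [], 0.0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = cap.get(cv2.CAP_PROP_POS_MSEC)  # timestamp of the decoded frame, in milliseconds
        if t >= next_t:                     # keep one frame per interval
            frames.append(frame)
            next_t += interval_ms
    cap.release()
    return frames
```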
It will be appreciated that each of the image frames described above may include one or more objects, or may not include any object, and the objects may be the same or different between image frames; for example, one image frame may include object I and object II while its previous image frame includes object I, object II and object III. This is not a limitation of the present application.
In the above step 202, feature extraction is performed on each image frame, feature points of each image frame are obtained, and feature points of the current image frame are determined from all the feature points.
The features to be matched in this embodiment may be SIFT features.
SIFT (Scale Invariant Feature Transform) extracts local features of an image: extreme points are found in scale space, and their position, scale and orientation information is extracted. Application areas of SIFT include object recognition, robot map perception and navigation, image stitching, three-dimensional model construction, gesture recognition, image tracking, etc.
The characteristics of SIFT features are as follows:
1. invariance to rotation, scale and brightness changes, and stability to a certain extent against viewpoint changes, noise, and the like;
2. strong distinctiveness and rich information content, suitable for fast and accurate matching in massive feature databases;
3. quantity: even a small number of objects can generate a large number of SIFT feature vectors;
4. extensibility: SIFT features can be conveniently combined with feature vectors of other forms.
the essence of the SIFT algorithm is to find key points (feature points) in different scale spaces, calculate the size, direction and scale information of the key points, and use the information to form the problem that the key points describe the feature points.
Note that the feature point data of the feature point in the present embodiment includes descriptors describing image areas around the feature point. In this embodiment, the descriptor may be a 128-dimensional vector describing the image information of the 16×16 pixel area around the feature point, that is, when the feature points of two adjacent image frames are matched, the descriptor of each feature point of the two image frames may be matched, and when the descriptors are matched, it indicates that the matching of the two feature points is successful.
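For illustration, a minimal sketch of the feature extraction in step 202 using OpenCV's SIFT implementation, which likewise produces 128-dimensional descriptors; the exact extractor used by the application is not specified, so this choice is an assumption.

```python
import cv2

def extract_features(frame):
    """Detect SIFT key points in an image frame and compute their descriptors."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    # Each key point carries position, scale and orientation; descriptors has shape (N, 128).
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors
```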
In the step 203, feature matching is performed on each feature point of the current image frame and all feature points of the previous image frame, and feature points of the previous image frame that are successfully matched are determined.
It should be noted that the feature matching in this embodiment may be performed according to the similarity of the feature points or according to a fast neighbor matching algorithm.
In the above step 204, the matching relationship data and the image data are transmitted to the terminal, so that the terminal determines a three-dimensional model of the target based on the matching relationship data and the image data.
The matching relation data includes the matched feature point, the feature point ID of the feature point of the previous image frame, and the feature point data, and the image data includes the image data of the current image frame.
Feature matching is performed between each feature point of the current frame to be processed and the feature points of the previous frame to be processed to obtain the matching relationship data, and the matching relationship data and the image data of the current frame to be processed are sent to the terminal; after receiving them, the terminal uses its own computing resources to perform the calculation and determine the three-dimensional model of the target.
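One possible way to package the matching relation data and image data of step 204 is sketched below; the field names and the JSON/JPEG encoding are illustrative assumptions, not a transmission format defined by the application.

```python
import json
import cv2

def build_payload(frame, matches, keypoints, descriptors, prev_feature_ids):
    """Bundle the matched feature points, the matched previous-frame feature point IDs,
    the descriptors and the current image frame for sending to the terminal."""
    matching_relation = []
    for m in matches:  # m: cv2.DMatch; queryIdx -> current frame, trainIdx -> previous frame
        kp = keypoints[m.queryIdx]
        matching_relation.append({
            "point": kp.pt,                                   # (x, y) in the current frame
            "prev_feature_id": prev_feature_ids[m.trainIdx],  # feature point ID in the previous frame
            "descriptor": descriptors[m.queryIdx].tolist(),   # 128-dimensional descriptor
        })
    ok, jpeg = cv2.imencode(".jpg", frame)                    # image data of the current frame
    return {
        "matching_relation_data": json.dumps(matching_relation).encode(),
        "image_data": jpeg.tobytes() if ok else b"",
    }
```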
According to the image processing method described above, a plurality of image frames with equal time intervals are acquired; feature extraction is performed on each image frame to determine the feature points of the current image frame; each feature point is matched against all feature points of the previous image frame to determine the matching feature point of the previous image frame; finally, the matching relationship data and the image data are sent to the terminal, and the terminal determines a three-dimensional model of the target based on them. In this way, the smart wearable device acquires the target images and performs part of the computation with its own computing resources, then sends the results to the terminal, which completes the remaining computation; by splitting the computation between the terminal and the smart wearable device, the limits on the computing power and power consumption of the smart wearable device are effectively overcome, so that a three-dimensional model with higher fineness is obtained.
In some embodiments, further comprising:
performing feature matching on each feature point and all feature points of the previous image frame, and determining Euclidean distances between each feature point and all feature points of the previous image frame;
determining the feature points of the previous image frame matched with the feature points based on the Euclidean distances;
or, performing feature matching on each feature point and all feature points of the previous image frame based on a fast neighbor matching algorithm, and determining the feature points of the previous image frame matched with the feature points.
It will be appreciated that, whether the Euclidean distances between each feature point and all feature points of the previous image frame are computed directly or the feature matching is performed with a fast neighbor matching algorithm, the feature points may be represented by their descriptors.
This embodiment gives specific ways of performing the feature matching.
In the process of feature matching, the Euclidean distance between each feature point and all feature points of the previous image frame needs to be determined. For example, the Euclidean distances from the current feature point 1 to feature point A, feature point B and feature point C of the previous image frame may be recorded as x1, x2 and x3 respectively, and the similarity to the current feature point 1 is then judged according to these Euclidean distances. If x1 < x2 < x3, the Euclidean distance between feature point A and the current feature point 1 is the smallest, i.e. their similarity is the largest; that is, feature point A of the previous image frame is the feature point successfully matched with the current feature point 1.
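A minimal sketch of this Euclidean-distance matching using OpenCV's brute-force matcher is given below. The ratio test for rejecting ambiguous matches is common practice rather than something the application specifies, and swapping in cv2.FlannBasedMatcher would give the approximate nearest-neighbour variant discussed next.

```python
import cv2

def match_features(desc_curr, desc_prev, ratio: float = 0.75):
    """Match current-frame descriptors against previous-frame descriptors by Euclidean (L2) distance."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc_curr, desc_prev, k=2)  # two nearest neighbours per descriptor
    good = []
    for m, n in knn:
        if m.distance < ratio * n.distance:  # keep only unambiguous nearest-neighbour matches
            good.append(m)                   # m.queryIdx: current frame, m.trainIdx: previous frame
    return good
```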
In some scenes, the dimension of the descriptors is high and, in addition, the number of feature points is large because the scene is complex; in such cases the matching can be implemented with a fast neighbor matching (approximate nearest neighbors) algorithm, such as randomized k-d trees or a priority-search k-means tree.
In some embodiments, further comprising:
determining that a feature point does not match any feature point of the previous image frame, and discarding the feature point;
and determining that none of the feature points matches any feature point of the previous image frame, and discarding the current image frame.
Specifically, during feature matching there may be feature points of the current image frame that do not match any feature point of the previous image frame; for example, if the Euclidean distances between a feature point and all feature points of the previous image frame are larger than a preset value, i.e. the similarity does not meet the preset requirement, it can be determined that the feature point does not match any feature point of the previous image frame, and the feature point can then be discarded.
If none of the feature points of the current image frame matches any feature point of the previous image frame, the current image frame may be discarded.
It can be understood that, of the current image frame and the previous image frame, this embodiment chooses to discard the current image frame; equivalently, the next image frame is then taken as the new current image frame and its feature points are matched against the retained previous image frame.
In some embodiments, further comprising:
determining that a feature point does not match any feature point of the previous image frame, and retaining the feature point;
and determining that none of the feature points matches any feature point of the previous image frame, retaining the current image frame, and discarding the previous image frame.
Specifically, this embodiment differs from the previous one in that, when it is determined that a feature point of the current image frame does not match any feature point of the previous image frame, the feature point is retained and the feature points of the previous image frame are discarded.
Each feature point of the current image frame is traversed and matched against the previous image frame; if none of the feature points of the current image frame matches any feature point of the previous image frame, the previous image frame is discarded.
It can be understood that, of the current image frame and the previous image frame, this embodiment chooses to discard the previous image frame; that is, feature point matching is then performed between the current frame to be processed and the image frame preceding the discarded previous image frame.
Referring to fig. 3, an embodiment of the present application provides an image processing method applied to a terminal, including but not limited to the following steps:
step 301, receiving matching relation data and image data sent by a smart wearable device, wherein the matching relation data is determined by the smart wearable device based on feature matching between each feature point of a current image frame and all feature points of a previous image frame, the matching relation data comprises the matched feature points, the feature point IDs of the feature points of the previous image frame and the feature point data, the image data comprises the image data of the current image frame, and the feature point data of a feature point comprises a descriptor describing the image area around the feature point;
step 302, determining a camera pose and a three-dimensional space point of the current image frame based on the matching relationship data, the current image frame and the previous image frame;
step 303, performing BA optimization on all the determined camera poses and three-dimensional space points;
step 304, determining that no new image frame is received, and performing BA optimization on all the determined camera poses and three-dimensional space points to obtain the final camera poses and three-dimensional space points;
step 305, determining a three-dimensional model based on all received image frames and the final camera poses, camera intrinsic parameters and three-dimensional space points of all the image frames.
It should be noted that, the image processing method provided in this embodiment is applied to a terminal, where the terminal and the intelligent wearable device may establish communication and perform data transmission, and the data processing of the terminal and the intelligent wearable device may be parallel to improve the overall generation efficiency of the stereoscopic image, and the communication may be established through the foregoing wireless communication technology or wired communication technology, which is not limited in this application.
In step 301, the terminal receives the image data of all frames to be processed and the matching relationship data between each current image frame and its previous image frame sent by the smart wearable device. It should be noted that the image frame data may be sent directly by the smart wearable device, or may be relayed, for example first sent by the smart wearable device to a cloud server and then forwarded by the cloud server to the terminal. It should also be noted that, in this embodiment, the terminal receives a plurality of consecutive image frames sent by the smart wearable device, and each image frame corresponds to one piece of image data. "The image data of all the image frames" can be understood as follows: each image frame is used in turn as the current frame to be processed, and the corresponding image data is sent to the terminal for processing, so that the terminal receives the image data of consecutive frames and can then build the three-dimensional model.
In this embodiment, the current frame to be processed is selected in real time by the smart wearable device based on the foregoing equal-interval policy, and the previous frame to be processed refers to the frame to be processed immediately preceding the current frame to be processed, which is not described again here.
The matching relation data are determined by the intelligent wearable device based on feature matching of each feature point of the current image frame and all feature points of the previous image frame. The matching relation data comprises the matched characteristic points, the characteristic point IDs of the characteristic points of the previous image frame and the characteristic point data, the image data comprises the image data of the current image frame, and the characteristic point data of the characteristic points comprise descriptors for describing the image areas around the characteristic points. In this embodiment, the descriptor may be a 128-dimensional vector describing image information of a 16×16 pixel area around the feature point.
In the above step 302, the terminal uses its own computing resources (e.g., the computing resources of a mobile phone) to determine the camera pose and three-dimensional space points of the current image frame from the matching relationship data, the current image frame and the previous image frame sent by the smart wearable device.
It should be noted that the camera pose of the current image frame is the pose matrix of the camera in the current state, and the three-dimensional space points are generated by triangulation from the camera pose of the current image frame and the camera pose of the previous image frame.
Next, in step 303, BA optimization is performed on the camera poses and three-dimensional space points. BA (Bundle Adjustment) improves the accuracy of the camera poses and three-dimensional space points so as to obtain optimal values for generating the final three-dimensional image; that is, the optimal three-dimensional structure and camera parameters are extracted from the visual pictures as the optimal data of each frame to be processed. Further, in step 304, when it is determined that no new image frame is received, i.e. the image acquisition of the smart wearable device has finished, global optimization is performed on all camera poses and three-dimensional space points to obtain the final camera poses and three-dimensional space points.
In some examples, step 303 further comprises:
or, performing local BA optimization on the camera pose and the three-dimensional space point of the image frames in the first frame sequence;
the first frame sequence is a frame sequence including at least a current image frame and a previous image frame, and it is understood that the first frame sequence may further include an image frame before the previous image frame, for example, further includes two or three consecutive image frames before the previous image frame, and the number of image frames of the first frame sequence is not limited in this application.
In some examples, step 304 further comprises:
or performing global BA optimization on the camera pose and the three-dimensional space point of the image frame in the second frame sequence to obtain a final camera pose and a final three-dimensional space point;
For example, a key frame is determined from each first frame sequence, and the second frame sequence may be the frame sequence formed by all key frames. After it is determined that no new image frame is received, global BA optimization is performed on all key frames in the second frame sequence to determine the globally optimized camera poses and three-dimensional space points of the key frames; together with the camera poses and three-dimensional space points of the non-key frames, these form the final camera poses and three-dimensional space points.
It should be noted that local optimization generally optimizes a frame sequence containing the current image frame, while global optimization optimizes the key frames of all frame sequences, thereby obtaining the final camera poses and three-dimensional space points; in this embodiment, the final three-dimensional space points of all image frames form a three-dimensional sparse point cloud.
It can be appreciated that the embodiment of the present application may perform BA optimization on all the image frames that have generated three-dimensional points after receiving the current image frame, or may perform local BA optimization on a portion of the image frames (i.e., the image frames of the first frame sequence content) that include the current image frame after receiving the current image frame.
In the embodiment of the application, when it is determined that no new image frame is received, BA optimization is performed on all image frames for which three-dimensional points have been generated; alternatively, global BA optimization may be performed on the image frames of the second frame sequence, which consists of the key frames determined from each first frame sequence, so as to obtain the final camera poses and three-dimensional space points of all image frames.
Finally, a three-dimensional model is determined based on all received image frames and the final camera poses, camera internal parameters and three-dimensional space points of all the image frames, so as to obtain the final three-dimensional image.
In this embodiment, three-dimensional imaging is performed according to the image data of the image frame, so as to obtain a three-dimensional stereoscopic image.
Alternatively, a multi-view rendered video or a three-dimensional data file may be output based on the three-dimensional model.
In some embodiments, the method further comprises: inputting all received image frames and the final camera poses, camera internal parameters and three-dimensional space points of all the image frames into a target algorithm, and outputting the three-dimensional model, wherein the target algorithm is an MVS algorithm or a NeRF algorithm.
Specifically, all received image frames and the final camera poses, camera internal parameters and three-dimensional space points of all the image frames are input into an MVS algorithm or a NeRF algorithm to obtain the three-dimensional model. In this embodiment, the three-dimensional space points form a three-dimensional sparse point cloud, and the MVS algorithm or the NeRF algorithm can output a three-dimensional dense point cloud, i.e., form a three-dimensional image.
It should be noted that the MVS (Multi-View Stereo) algorithm can construct a highly detailed three-dimensional model from images alone, using a large collection of images to solve for the 3D geometry of the scene. The NeRF (Neural Radiance Fields) algorithm can generate new views of a complex 3D scene from a partial set of 2D images: a neural network is trained with a rendering loss to reproduce the input views of the scene, taking the input images that represent the scene and interpolating between them to render the complete scene. The NeRF algorithm is therefore an efficient method of generating synthetic images.
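As a hedged illustration of how the sparse result could be handed to a NeRF-style pipeline, the sketch below writes the final camera-to-world poses and the intrinsics-derived field of view into a transforms.json file in the style used by several open NeRF implementations; the key names and layout follow that common convention and are an assumption for this sketch, not something specified by the application (an MVS pipeline would instead consume the poses, intrinsics and images directly).

```python
# Hedged export sketch: final poses + intrinsics -> a NeRF-style transforms.json.
import json
import math
import numpy as np

def export_transforms(poses_c2w, image_paths, K, image_width, path="transforms.json"):
    # Horizontal field of view derived from the focal length fx = K[0, 0].
    camera_angle_x = 2.0 * math.atan(image_width / (2.0 * K[0, 0]))
    frames = [{"file_path": p, "transform_matrix": np.asarray(m).tolist()}
              for p, m in zip(image_paths, poses_c2w)]
    with open(path, "w") as f:
        json.dump({"camera_angle_x": camera_angle_x, "frames": frames}, f, indent=2)
```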
In some embodiments, further comprising:
setting the camera pose of the current image frame as an identity matrix, and determining a target matrix based on the identity matrix and the matching relation data;
decomposing the target matrix to obtain the camera pose of the current image frame;
and performing triangulation based on the camera pose of the current image frame, the camera pose of the previous image frame and the matching relationship data to determine the three-dimensional space point.
It will be appreciated that this embodiment describes the process of generating the three-dimensional space points.
Specifically, this embodiment requires at least two temporally adjacent matched frames, i.e., the current image frame and the previous image frame. The camera pose of the first frame (namely the previous image frame) is set to the identity matrix, a target matrix is estimated from the matched feature points between the image data of the first frame and the image data of the second frame (namely the current image frame), and the target matrix is decomposed to obtain the camera pose of the second frame.
After the poses of the first two image frames have been estimated, three-dimensional space points are generated by triangulation.
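For illustration only, the following OpenCV sketch performs such a two-frame initialization under the assumption that the target matrix of this embodiment is the essential matrix, that pts_prev and pts_curr are the matched 2D feature points (Nx2 float arrays) recovered from the matching relation data, and that K is the camera intrinsic matrix; the embodiment's exact matrix and decomposition may differ.

```python
# Hedged two-view initialization sketch: previous frame fixed at the identity
# pose, essential matrix estimated from 2D-2D matches, decomposed into R, t,
# and the inlier matches triangulated into three-dimensional space points.
import cv2
import numpy as np

def initialize_two_view(pts_prev, pts_curr, K):
    E, inlier_mask = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                          method=cv2.RANSAC, threshold=1.0)
    # Decompose E into rotation R and a unit-length translation t.
    _, R, t, pose_mask = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inlier_mask)

    # Projection matrices: P1 = K[I|0] for the previous frame, P2 = K[R|t].
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])

    inl = pose_mask.ravel() > 0
    pts4d = cv2.triangulatePoints(P1, P2, pts_prev[inl].T, pts_curr[inl].T)
    pts3d = (pts4d[:3] / pts4d[3]).T   # Nx3 three-dimensional space points
    return R, t, pts3d
```

The translation recovered in this way only has unit length, which is why the later frames of the embodiment are expressed with t as the unit of length.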
In this embodiment, for the image data of each newly added current image frame, 2D-2D matching is performed between the image data of the current image frame and the image data of the previous image frame, so as to obtain 2D-3D matches between the 2D feature points of the current frame and the existing three-dimensional space points, and the camera pose of the current frame is then computed using RANSAC-PnP. New three-dimensional space points are subsequently generated for the current frame through triangulation, and the newly added three-dimensional space points are added to the historical data.
For example, denote the previous image frame as Frame1, the current image frame as Frame2, the next image frame as Frame3, and so on.
First, a 2D-2D computation is carried out between Frame1 and Frame2 to obtain the rotation R of the current frame and a translation t of unit length, and the 3D points are then computed by triangulation.
Then, a 3D-2D computation is carried out between Frame2 and Frame3 to obtain the rotation R of the current frame and a translation t1 expressed with t as the unit of length, and further 3D points are computed by triangulation.
By repeating this loop, the camera pose and the three-dimensional space points of each current image frame are continuously added to the historical data, so that the next image frame can use them to determine its camera pose and to generate further three-dimensional space points.
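The 3D-2D step can be sketched as follows, again only as an assumed illustration: pts3d_hist are three-dimensional space points already triangulated from earlier frames, pts2d_curr are their matched 2D observations in the current image frame obtained through the matching relation data, K is the intrinsic matrix and dist the distortion coefficients (None if unknown).

```python
# Hedged RANSAC-PnP sketch for registering each newly added image frame.
import cv2
import numpy as np

def register_frame(pts3d_hist, pts2d_curr, K, dist=None):
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d_hist, dtype=np.float64),
        np.asarray(pts2d_curr, dtype=np.float64),
        K, dist, flags=cv2.SOLVEPNP_ITERATIVE, reprojectionError=3.0)
    if not ok:
        raise RuntimeError("PnP failed; the frame cannot be registered")
    R, _ = cv2.Rodrigues(rvec)      # rotation matrix of the current frame
    return R, tvec, inliers         # pose in the scale fixed by the initial t
```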
According to the image processing method, the terminal receives the image data of all image frames and the matching relation data between the current image frame and the previous image frame sent by the intelligent wearable device, determines the matched feature points in the image data of the current image frame based on the matching relation data, then determines the camera pose and three-dimensional space points of the current image frame based on the matched feature points, the image data of the current image frame and the image data of the previous image frame, and finally determines a three-dimensional model based on all received image frames and the final camera poses, camera internal parameters and three-dimensional space points of all image frames. In this way, the computing resources of the intelligent wearable device and the computing resources of the terminal are used cooperatively, achieving an effective distribution of computing power and a three-dimensional image reconstruction that is as fine as possible with as little computational delay as possible.
The camera poses and three-dimensional space points are further optimized to obtain optimized data for each image frame, from which the three-dimensional image is obtained. Optimizing the data further improves the quality of the generated three-dimensional image and improves the user experience.
Referring to fig. 4, fig. 4 is a flowchart of an image processing method according to an embodiment of the present application, including the following steps:
step 401, starting shooting;
step 402, shooting by a camera of the intelligent wearable device;
step 403, collecting the current frame to be processed at equal intervals in real time;
step 404, extracting image features by utilizing local resources of the intelligent wearable equipment;
step 405, calculating a matching relationship between a current frame to be processed and a previous frame to be processed by using a local computing resource of the intelligent wearable device;
step 406, uploading the matching relation data and the compressed data of the current image frame to a terminal connected with the intelligent wearable device;
step 407, estimating the pose of the camera of the current frame to be processed at the terminal;
step 408, triangulating the current frame to be processed at the terminal to generate three-dimensional space points;
step 409, performing BA optimization on all the generated three-dimensional space points and the estimated pose at the terminal;
step 410, performing BA optimization on all data at the terminal;
step 411, obtaining the target poses, camera internal parameters and three-dimensional sparse point cloud of all frames to be processed;
step 412, the terminal calculates to obtain a three-dimensional model by using an MVS algorithm or a NeRF algorithm;
step 413, outputting a three-dimensional image including the three-dimensional model.
Fig. 5 illustrates a physical schematic diagram of an electronic device. As shown in fig. 5, the electronic device may include: a processor 510, a communication interface (Communications Interface) 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520 and the memory 530 communicate with each other through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform an image processing method comprising:
acquiring a plurality of image frames at equal intervals;
extracting features from each image frame, and determining the feature points of the current image frame, wherein the feature point data of the feature points comprise descriptors describing the image areas around the feature points;
performing feature matching between each feature point and all feature points of the previous image frame, and determining the feature points of the previous image frame matched with the feature points;
transmitting matching relation data and image data to a terminal, so that the terminal determines a three-dimensional model of a target based on the matching relation data and the image data, wherein the matching relation data comprises the matched feature points, the feature point IDs of the feature points of the previous image frame and the feature point data, and the image data comprises the image data of the current image frame;
or,
receiving matching relation data and image data sent by intelligent wearing equipment, wherein the matching relation data is determined by the intelligent wearing equipment based on feature matching of each feature point of a current image frame and all feature points of a previous image frame, the matching relation data comprises the feature points which are matched, feature point IDs of the feature points of the previous image frame and feature point data, the image data comprises the image data of the current image frame, and the feature point data of the feature points comprises descriptors for describing image areas around the feature points;
determining the camera pose and the three-dimensional space point of the current image frame based on the matching relation data, the current image frame and the last image frame;
performing BA optimization on all the determined camera pose and the three-dimensional space points;
determining that a new image frame is not received, and performing BA optimization on all the determined camera pose and the three-dimensional space points to obtain a final camera pose and three-dimensional space points;
and determining a three-dimensional model based on all received image frames and the final camera poses, camera internal parameters and three-dimensional space points of all the image frames.
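To make the device-side matching step recited above more concrete, the following is a hedged OpenCV sketch using ORB descriptors and brute-force Hamming matching with a Lowe ratio test; ORB and the 0.75 threshold are assumptions chosen for this sketch, since the method only requires matching descriptors by Euclidean distance or with a fast nearest-neighbor algorithm.

```python
# Hedged device-side matching sketch: for every feature point of the current
# frame, find its best match among all feature points of the previous frame
# and package the matched points, previous-frame feature IDs and descriptors
# as a matching-relation record.
import cv2

def match_against_previous(img_prev, img_curr, n_features=1000):
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(img_prev, None)   # previous image frame
    kp2, des2 = orb.detectAndCompute(img_curr, None)   # current image frame
    if des1 is None or des2 is None:
        return []

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(des2, des1, k=2)            # current vs. all previous

    matching_relation = []
    for pair in knn:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < 0.75 * n.distance:             # drop ambiguous matches
            matching_relation.append({
                "curr_pt": kp2[m.queryIdx].pt,
                "prev_feature_id": m.trainIdx,          # feature point ID in previous frame
                "descriptor": des2[m.queryIdx].tolist(),
            })
    return matching_relation
```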
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
In another aspect, the present application also provides a computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing the image processing method provided by the methods described above, the method comprising:
acquiring a plurality of image frames at equal intervals;
extracting features from each image frame, and determining the feature points of the current image frame, wherein the feature point data of the feature points comprise descriptors describing the image areas around the feature points;
performing feature matching between each feature point and all feature points of the previous image frame, and determining the feature points of the previous image frame matched with the feature points;
transmitting matching relation data and image data to a terminal, so that the terminal determines a three-dimensional model of a target based on the matching relation data and the image data, wherein the matching relation data comprises the matched feature points, the feature point IDs of the feature points of the previous image frame and the feature point data, and the image data comprises the image data of the current image frame;
or,
receiving matching relation data and image data sent by intelligent wearing equipment, wherein the matching relation data is determined by the intelligent wearing equipment based on feature matching of each feature point of a current image frame and all feature points of a previous image frame, the matching relation data comprises the feature points which are matched, feature point IDs of the feature points of the previous image frame and feature point data, the image data comprises the image data of the current image frame, and the feature point data of the feature points comprises descriptors for describing image areas around the feature points;
determining the camera pose and the three-dimensional space point of the current image frame based on the matching relation data, the current image frame and the last image frame;
performing BA optimization on all the determined camera pose and the three-dimensional space points;
determining that a new image frame is not received, and performing BA optimization on all the determined camera pose and the three-dimensional space points to obtain a final camera pose and three-dimensional space points;
and determining a three-dimensional model based on all received image frames and the final camera poses, camera internal parameters and three-dimensional space points of all the image frames.
In yet another aspect, the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the image processing method provided by the above methods, the method comprising:
acquiring a plurality of image frames at equal intervals;
extracting features from each image frame, and determining the feature points of the current image frame, wherein the feature point data of the feature points comprise descriptors describing the image areas around the feature points;
performing feature matching between each feature point and all feature points of the previous image frame, and determining the feature points of the previous image frame matched with the feature points;
transmitting matching relation data and image data to a terminal, so that the terminal determines a three-dimensional model of a target based on the matching relation data and the image data, wherein the matching relation data comprises the matched feature points, the feature point IDs of the feature points of the previous image frame and the feature point data, and the image data comprises the image data of the current image frame;
or,
receiving matching relation data and image data sent by intelligent wearing equipment, wherein the matching relation data is determined by the intelligent wearing equipment based on feature matching of each feature point of a current image frame and all feature points of a previous image frame, the matching relation data comprises the feature points which are matched, feature point IDs of the feature points of the previous image frame and feature point data, the image data comprises the image data of the current image frame, and the feature point data of the feature points comprises descriptors for describing image areas around the feature points;
determining the camera pose and the three-dimensional space point of the current image frame based on the matching relation data, the current image frame and the last image frame;
performing BA optimization on all the determined camera pose and the three-dimensional space points;
determining that a new image frame is not received, and performing BA optimization on all the determined camera pose and the three-dimensional space points to obtain a final camera pose and three-dimensional space points;
and determining a three-dimensional model based on all received image frames and the final camera poses, camera internal parameters and three-dimensional space points of all the image frames.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.
Claims (10)
1. An image processing method, which is applied to intelligent wearable equipment, comprises the following steps:
acquiring a plurality of image frames at equal intervals;
extracting features from each image frame, and determining the feature points of the current image frame, wherein the feature point data of the feature points comprise descriptors describing the image areas around the feature points;
performing feature matching between each feature point and all feature points of the previous image frame, and determining the feature points of the previous image frame matched with the feature points;
and transmitting matching relation data and image data to a terminal, so that the terminal determines a three-dimensional model of a target based on the matching relation data and the image data, wherein the matching relation data comprises the matched feature points, the feature point IDs of the feature points of the previous image frame and the feature point data, and the image data comprises the image data of the current image frame.
2. The image processing method according to claim 1, characterized by further comprising:
performing feature matching on each feature point and all feature points of the previous image frame, and determining Euclidean distances between each feature point and all feature points of the previous image frame;
determining the feature points of the previous image frame matched with the feature points based on the Euclidean distances;
or, performing feature matching between each feature point and all feature points of the previous image frame based on a fast nearest-neighbor matching algorithm, and determining the feature points of the previous image frame matched with the feature points.
3. The image processing method according to claim 1, characterized by further comprising:
determining that a feature point does not match any of the feature points of the previous image frame, and discarding that feature point;
and determining that none of the feature points matches any of the feature points of the previous image frame, and discarding the current image frame.
4. The image processing method according to claim 1, characterized by further comprising:
determining that a feature point does not match any of the feature points of the previous image frame, and retaining that feature point;
and determining that none of the feature points matches any of the feature points of the previous image frame, retaining the current image frame and discarding the previous image frame.
5. An image processing method, applied to a terminal, comprising:
receiving matching relation data and image data sent by intelligent wearing equipment, wherein the matching relation data is determined by the intelligent wearing equipment based on feature matching of each feature point of a current image frame and all feature points of a previous image frame, the matching relation data comprises the feature points which are matched, feature point IDs of the feature points of the previous image frame and feature point data, the image data comprises the image data of the current image frame, and the feature point data of the feature points comprises descriptors for describing image areas around the feature points;
determining the camera pose and the three-dimensional space point of the current image frame based on the matching relation data, the current image frame and the last image frame;
performing BA optimization on all the determined camera pose and the three-dimensional space points;
determining that a new image frame is not received, and performing BA optimization on all the determined camera pose and the three-dimensional space points to obtain a final camera pose and three-dimensional space points;
and determining a three-dimensional model based on all received image frames and the final camera poses, camera internal parameters and three-dimensional space points of all the image frames.
6. The image processing method according to claim 5, characterized by further comprising:
setting the camera pose of the current image frame as an identity matrix, and determining a target matrix based on the identity matrix and the matching relation data;
decomposing the target matrix to obtain the camera pose of the current image frame;
and performing triangulation based on the camera pose of the current image frame, the camera pose of the previous image frame and the matching relationship data to determine the three-dimensional space point.
7. The image processing method according to claim 5, characterized by further comprising:
and inputting all received image frames and the final camera poses, camera internal parameters and three-dimensional space points of all the image frames into a target algorithm, and outputting the three-dimensional model, wherein the target algorithm is an MVS algorithm or a NeRF algorithm.
8. The image processing method according to any one of claims 5 to 7, characterized by further comprising:
and outputting a multi-view rendering video or a three-dimensional data file based on the three-dimensional model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the image processing method of any one of claims 1 to 4 or the image processing method of any one of claims 5 to 8.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the image processing method according to any one of claims 1 to 4, or the image processing method according to any one of claims 5 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310283069.9A CN116468917A (en) | 2023-03-17 | 2023-03-17 | Image processing method, electronic device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310283069.9A CN116468917A (en) | 2023-03-17 | 2023-03-17 | Image processing method, electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116468917A true CN116468917A (en) | 2023-07-21 |
Family
ID=87174269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310283069.9A Pending CN116468917A (en) | 2023-03-17 | 2023-03-17 | Image processing method, electronic device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116468917A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117710695A (en) * | 2023-08-02 | 2024-03-15 | 荣耀终端有限公司 | Image data processing method and electronic equipment |
CN117036758A (en) * | 2023-10-10 | 2023-11-10 | 湖北星纪魅族集团有限公司 | Two-dimensional image target matching method, electronic device and storage medium |
CN117036758B (en) * | 2023-10-10 | 2024-01-12 | 湖北星纪魅族集团有限公司 | Two-dimensional image target matching method, electronic device and storage medium |
CN117765168A (en) * | 2023-12-12 | 2024-03-26 | 之江实验室 | Three-dimensional reconstruction method, device and equipment for satellite remote sensing image |
CN117765168B (en) * | 2023-12-12 | 2024-06-07 | 之江实验室 | Three-dimensional reconstruction method, device and equipment for satellite remote sensing image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||