CN111768443A - Image processing method and device based on mobile camera - Google Patents

Image processing method and device based on mobile camera

Info

Publication number
CN111768443A
CN111768443A (application CN201910667521.5A)
Authority
CN
China
Prior art keywords
image frame
target object
feature points
frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910667521.5A
Other languages
Chinese (zh)
Inventor
刘享军 (Liu Xiangjun)
吕晓磊 (Lü Xiaolei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201910667521.5A priority Critical patent/CN111768443A/en
Publication of CN111768443A publication Critical patent/CN111768443A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images

Abstract

The embodiment of the application discloses an image processing method and device based on a mobile camera. One embodiment of the method comprises: acquiring a current frame shot by the mobile camera, and determining the feature point positions of the feature points of the target object contained in the current frame; performing feature point matching between the current frame and a three-dimensional map using the feature point positions, so as to determine the spatial positions of the feature points of the target object contained in the current frame; and, based on the determined feature point positions and spatial positions, determining the pose change of the mobile camera between the current frame and a historical frame containing the target object. In this way the embodiment can match feature points between the current frame and the three-dimensional map and track the camera pose change of every frame, and the spatial positions of the feature points are obtained accurately from the planar features of the current frame.

Description

Image processing method and device based on mobile camera
Technical Field
The embodiments of the application relate to the field of computer technology, in particular to the field of the internet, and specifically to an image processing method and device based on a mobile camera.
Background
Augmented Reality (AR) technology is a technology for superimposing images, videos, and three-dimensional rendering models in a real-world scene.
As the computing power of consumer electronics improves, AR is being applied ever more widely. AR technology is now used in many areas of daily life; its popularization brings people novel experiences as well as practical convenience.
Disclosure of Invention
The embodiment of the application provides an image processing method and device based on a mobile camera.
In a first aspect, an embodiment of the present application provides an image processing method based on a mobile camera, including: acquiring a current frame shot by the mobile camera, and determining the feature point positions of the feature points of a target object contained in the current frame; performing feature point matching between the current frame and a three-dimensional map using the feature point positions, so as to determine the spatial positions of the feature points of the target object contained in the current frame, wherein the three-dimensional map comprises the spatial positions of a plurality of feature points of the target object; and determining, based on the determined feature point positions and spatial positions, the pose change of the mobile camera between the current frame and a historical frame containing the target object.
In some embodiments, the three-dimensional map is constructed by the following construction steps: acquiring at least two frames of images shot by the mobile camera that contain the target object, the at least two frames of images comprising a first image frame and a second image frame; in response to the feature points of the target object contained in the first image frame being successfully matched with those contained in the second image frame, determining the pose change of the mobile camera from shooting the first image frame to shooting the second image frame as a first pose change; and constructing the three-dimensional map based on the first image frame, the second image frame and the first pose change.
In some embodiments, the pose changes include rotational changes and translational changes of the moving camera; in response to the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame being successfully matched, determining that the pose change of the mobile camera from shooting the first image frame to shooting the second image frame is a first pose change, including: matching the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame based on the descriptors of the feature points of the target object contained in the first image frame and the descriptors of the feature points of the target object contained in the second image frame; in response to a successful match, a first pose change of the moving camera of the second image frame relative to the first image frame is determined using epipolar constraints.
In some embodiments, the at least two frames of images further include a third image frame, and constructing the three-dimensional map based on the first image frame, the second image frame and the first pose change includes: determining the pose change of the mobile camera of the third image frame relative to a previously shot image frame as a second pose change, wherein the at least two frames of images include the previously shot image frame; and constructing the three-dimensional map based on the first, second and third image frames together with the first pose change and the second pose change.
In some embodiments, after building the three-dimensional map, the method further comprises: acquiring a fourth image frame shot by the mobile camera, and matching descriptors of feature points of the fourth image frame with descriptors of feature points of a reference image, wherein the reference image contains the target object; and, in response to determining that the fourth image frame matches the reference image on more than a predetermined number of feature points, adjusting the three-dimensional map by minimizing the reprojection error.
In a second aspect, an embodiment of the present application provides an image processing apparatus based on a mobile camera, including: an acquisition unit configured to acquire a current frame shot by the mobile camera and determine the feature point positions of the feature points of the target object contained in the current frame; a first determining unit configured to perform feature point matching between the current frame and a three-dimensional map using the feature point positions, so as to determine the spatial positions of the feature points of the target object contained in the current frame, wherein the three-dimensional map includes the spatial positions of a plurality of feature points of the target object; and a second determining unit configured to determine the pose change of the mobile camera between the current frame and a historical frame containing the target object, based on the determined feature point positions and spatial positions.
In some embodiments, the apparatus further comprises a map building unit; the map construction unit includes: the device comprises an acquisition subunit, a processing unit and a display unit, wherein the acquisition subunit is configured to acquire at least two frames of images which are shot by a mobile camera and contain a target object, and the at least two frames of images comprise a first image frame and a second image frame; a determining subunit configured to determine that the pose change of the mobile camera from capturing the first image frame to capturing the second image frame is a first pose change in response to successful matching of the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame; a construction subunit configured to construct a three-dimensional map based on the first image frame, the second image frame, and the first pose change.
In some embodiments, the pose changes include rotational changes and translational changes of the moving camera; a determination subunit further configured to: matching the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame based on the descriptors of the feature points of the target object contained in the first image frame and the descriptors of the feature points of the target object contained in the second image frame; in response to a successful match, a first pose change of the moving camera of the second image frame relative to the first image frame is determined using epipolar constraints.
In some embodiments, the at least two frames of images further comprise a third image frame, and the construction subunit is further configured to: determine the pose change of the mobile camera of the third image frame relative to a previously shot image frame as a second pose change, wherein the at least two frames of images include the previously shot image frame; and construct the three-dimensional map based on the first, second and third image frames together with the first pose change and the second pose change.
In some embodiments, the apparatus further comprises: the matching unit is configured to acquire a fourth image frame shot by the mobile camera after the three-dimensional map is built, and match descriptors of feature points of the fourth image frame with descriptors of feature points of a reference image, wherein the reference image contains a target object; an adjusting unit configured to adjust the three-dimensional map with a minimized reprojection error in response to determining that the fourth image frame matches the reference image with more than a preset number of feature points.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement a method as in any embodiment of a method for image processing based on a mobile camera.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method as in any one of the embodiments of the image processing method based on a mobile camera.
According to the image processing scheme based on the mobile camera, a current frame shot by the mobile camera is first obtained, and the positions of the feature points of the target object contained in the current frame are determined. Then, feature point matching is performed between the current frame and a three-dimensional map using the feature point positions, so as to determine the spatial positions of the feature points of the target object contained in the current frame, wherein the three-dimensional map comprises the spatial positions of a plurality of feature points of the target object. Finally, based on the determined feature point positions and spatial positions, the pose change of the mobile camera between the current frame and a historical frame containing the target object is determined. The scheme provided by the embodiments of the application can match feature points between the current frame and the three-dimensional map and track the camera pose change of each frame, and the spatial positions of the feature points are obtained accurately from the planar features of the current frame.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2a is a flow diagram of one embodiment of a mobile-camera based image processing method according to the present application;
FIG. 2b is a schematic diagram of a three-dimensional map of a mobile-camera-based image processing method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a mobile-camera-based image processing method according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a mobile-camera based image processing method according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of a mobile-camera based image processing apparatus according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the mobile-camera based image processing method or mobile-camera based image processing apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as an image processing application based on a mobile camera, a video application, a live application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
Here, the terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server can analyze and process the received data of the current frame and the like, and feed back a processing result (for example, the pose change of the current frame relative to the moving camera of the historical frame) to the terminal equipment.
It should be noted that the image processing method based on the mobile camera provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, and 103, and accordingly, the image processing apparatus based on the mobile camera may be disposed in the server 105 or the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2a, a flow diagram 200 of one embodiment of a mobile-camera based image processing method according to the present application is shown. The image processing method based on the mobile camera comprises the following steps:
step 201, obtaining a current frame shot by a mobile camera, and determining a feature point position of a feature point of a target object contained in the current frame.
In this embodiment, an executing subject (for example, the server or a terminal device shown in fig. 1) of the image processing method based on the mobile camera may acquire a current frame captured by the mobile camera, and determine the feature point positions of the feature points of the target object included in the current frame. Specifically, the feature point positions may be represented in the form of coordinates; a feature point position indicates where a feature point of the target object lies in the current frame.
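For illustration only (this is not part of the claimed embodiments), step 201 could be realized with OpenCV's ORB detector as in the following sketch; the patent does not name a specific feature detector, so ORB and all parameter values here are assumptions.

```python
import cv2

def detect_feature_points(frame_gray):
    """Detect feature points in the current frame and return their pixel
    positions (the feature point positions) together with descriptors."""
    orb = cv2.ORB_create(nfeatures=1000)  # assumed detector and feature budget
    keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
    positions = [kp.pt for kp in keypoints]  # (u, v) coordinates in the frame
    return positions, descriptors

# Usage sketch:
# frame = cv2.imread("current_frame.png", cv2.IMREAD_GRAYSCALE)
# positions, descriptors = detect_feature_points(frame)
```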
Step 202, matching the feature points of the current frame and a three-dimensional map by using the feature point positions to determine the spatial positions of the feature points of the target object contained in the current frame, wherein the three-dimensional map includes the spatial positions of a plurality of feature points of the target object.
In this embodiment, the execution subject may perform feature point matching between the current frame and the three-dimensional map, so as to determine the spatial positions of the feature points of the target object included in the current frame. The spatial positions of these feature points in the three-dimensional map may be expressed as three-dimensional coordinates. Feature points having identical features, or features with high similarity (e.g., similarity greater than a preset threshold), can be matched successfully. If more than a preset number of feature points are matched, the execution subject can determine that the feature points of the target object have been successfully matched between the current frame and the three-dimensional map. In practice, the execution subject can triangulate the spatial positions of feature points by using the epipolar constraint.
As shown in fig. 2b, Xi (i.e. X1, X2 … X6) are the spatial positions of the feature points of the target object in three-dimensional space. The cuboid formed by the broken lines in the figure is the target object. Pi is the pose of the mobile camera of each keyframe in the world coordinate system.
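One possible in-memory layout for such a map, with map points Xi and keyframe poses Pi as in fig. 2b, is sketched below. The class and field names are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class MapPoint:
    position: np.ndarray    # Xi: 3-D coordinates of a feature point, shape (3,)
    descriptor: np.ndarray  # descriptor used when matching frames to the map

@dataclass
class Keyframe:
    pose: np.ndarray                                  # Pi: 4x4 camera pose in world coordinates
    observations: dict = field(default_factory=dict)  # map-point id -> (u, v) pixel position

@dataclass
class Map3D:
    points: list = field(default_factory=list)        # list[MapPoint]
    keyframes: list = field(default_factory=list)     # list[Keyframe]
```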
Specifically, the executing entity may perform feature point matching using the Fast Library for Approximate Nearest Neighbors (FLANN) or the random sample consensus (RANSAC) algorithm.
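A hedged sketch of the FLANN-based matching follows: descriptors of the current frame are matched against the descriptors stored with the map points, and ambiguous matches are discarded with the Lowe ratio test; outliers among the surviving 2D-3D matches can then be rejected by RANSAC inside the pose solver (see the PnP sketch further below). The LSH index parameters and the 0.7 ratio are assumed values.

```python
import cv2

def match_to_map(desc_frame, desc_map):
    """Match binary descriptors of the current frame against map-point descriptors."""
    index_params = dict(algorithm=6,  # FLANN_INDEX_LSH, suited to binary descriptors such as ORB
                        table_number=6, key_size=12, multi_probe_level=1)
    flann = cv2.FlannBasedMatcher(index_params, dict(checks=50))
    knn = flann.knnMatch(desc_frame, desc_map, k=2)
    # Lowe ratio test: keep a match only if it is clearly better than the runner-up
    return [m for m, n in (p for p in knn if len(p) == 2) if m.distance < 0.7 * n.distance]
```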
And step 203, determining the pose change of the current frame relative to the moving camera of the historical frame containing the target object based on the determined feature point position and the space position.
In this embodiment, the execution subject may determine the pose change of the mobile camera between the current frame and a history frame based on the positions of the feature points within the current frame and their spatial positions in the three-dimensional map. Specifically, the history frame also contains the target object.
In practice, the execution subject may determine a pose change of the mobile camera between the current frame and any history frame captured by the mobile camera by using a three-dimensional map.
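Determining the camera pose from the 2-D feature point positions and their 3-D spatial positions amounts to a perspective-n-point (PnP) problem. The patent does not name PnP explicitly, so the OpenCV-based sketch below is one plausible realization, with assumed RANSAC parameters.

```python
import cv2
import numpy as np

def estimate_pose(points_3d, points_2d, K):
    """points_3d: (N, 3) spatial positions from the map; points_2d: (N, 2)
    feature point positions in the current frame; K: 3x3 camera intrinsics.
    Returns the current camera pose; the pose change relative to any
    historical frame follows by composing this pose with the historical one."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, None, reprojectionError=3.0, confidence=0.99)
    if not ok:
        raise RuntimeError("pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    return R, tvec, inliers
```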
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the image processing method based on the mobile camera according to the present embodiment. In the application scenario of fig. 3, the execution subject 301 may obtain a current frame (e.g., the 102nd frame image of a target object captured by the mobile camera) 302, and determine the feature point position 303 of a feature point of the target object (e.g., pencil A) contained in the current frame. The current frame is matched with a three-dimensional map 304 using the feature point position 303 to determine the spatial position 305 of the feature points of pencil A contained in the current frame, wherein the three-dimensional map comprises the spatial positions of a plurality of feature points of pencil A. Execution subject 301 then determines 306 the pose change of the mobile camera between the current frame and a historical frame containing pencil A, based on the determined feature point position 303 and spatial position 305.
The method provided by the embodiment of the application can match feature points between the current frame and the three-dimensional map and track the camera pose change of each frame, and the spatial positions of the feature points are obtained accurately from the planar features of the current frame.
With further reference to fig. 4, a flow 400 of yet another embodiment of a mobile-camera based image processing method is shown. The flow 400 of the image processing method based on the mobile camera comprises the following steps:
step 401, obtaining a current frame shot by the mobile camera, and determining a feature point position of a feature point of a target object included in the current frame.
In this embodiment, an executing subject (for example, the server or a terminal device shown in fig. 1) of the image processing method based on the mobile camera may acquire a current frame captured by the mobile camera, and determine the feature point positions of the feature points of the target object included in the current frame. Specifically, the feature point positions may be represented in the form of coordinates; a feature point position indicates where a feature point lies in the current frame.
Step 402, matching the feature points of the current frame and a three-dimensional map by using the feature point positions to determine the spatial positions of the feature points of the target object contained in the current frame, wherein the three-dimensional map includes the spatial positions of a plurality of feature points of the target object.
In this embodiment, the execution subject may perform feature point matching between the current frame and the three-dimensional map, so as to determine the spatial positions of the feature points of the target object included in the current frame. The spatial positions of these feature points in the three-dimensional map may be expressed as three-dimensional coordinates. Feature points having identical features or features with high similarity can be matched successfully. If more than a preset number of feature points are matched, the feature points of the target object are considered successfully matched between the current frame and the three-dimensional map.
And step 403, determining the pose change of the current frame relative to the moving camera of the historical frame containing the target object based on the determined feature point position and the spatial position.
In this embodiment, the execution subject may determine the pose change of the mobile camera of the current frame relative to the history frame based on the feature point position of the feature point of the current frame in the current frame and the spatial position in the three-dimensional map. Specifically, the history frame also contains the target object.
In the present embodiment, the three-dimensional map is constructed by the following construction steps a, b and c:
step a, at least two frames of images which are shot by a mobile camera and contain a target object are obtained, wherein the at least two frames of images comprise a first image frame and a second image frame.
In the present embodiment, an execution subject (for example, the server or a terminal device shown in fig. 1) of the image processing method based on the mobile camera may acquire at least two frames of images, each containing the target object. The first image frame is captured before the second image frame; other image frames shot by the mobile camera may or may not lie between the shooting times of the two.
And b, in response to the fact that the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame are successfully matched, determining that the pose change of the mobile camera from shooting of the first image frame to shooting of the second image frame is a first pose change.
In this embodiment, the execution subject may match the feature points of the target object contained in the first image frame with those contained in the second image frame. In response to a successful match, the execution subject may determine the pose change of the mobile camera of the second image frame relative to the first image frame, and take that pose change as the first pose change.
In practice, the execution subject may match the feature points of the target object using the features of those feature points. If more than a preset number of feature points are matched, the matching of the feature points of the target object between the first image frame and the second image frame is considered successful.
In some optional implementations of this embodiment, the pose change includes a rotational change and a translational change of the mobile camera, and step b may include: matching the feature points of the target object contained in the first image frame with those contained in the second image frame, based on the descriptors of the feature points of the target object contained in each frame; and, in response to a successful match, determining the first pose change of the mobile camera of the second image frame relative to the first image frame using the epipolar constraint.
In these alternative implementations, the execution subject may extract a descriptor of each feature point of the target object contained in an image frame as the feature of that feature point, and perform matching based on the extracted descriptors. In particular, the descriptors may take various forms; for example, a descriptor may be a Binary Robust Independent Elementary Features (BRIEF) descriptor or a Speeded-Up Robust Features (SURF) descriptor.
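As a concrete (assumed) rendering of the epipolar-constraint step with OpenCV: from the matched feature point positions in the two frames and known camera intrinsics K, estimate the essential matrix and decompose it into the rotation change and translation change of the mobile camera. Note that the translation recovered this way is only defined up to scale.

```python
import cv2

def first_pose_change(pts1, pts2, K):
    """pts1, pts2: (N, 2) float arrays of matched feature point positions in
    the first and second image frames."""
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # recoverPose picks the one physically valid (R, t) among the four
    # decompositions of E, by requiring points to lie in front of both cameras
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # rotational change and (unit-scale) translational change
```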
And c, constructing a three-dimensional map based on the first image frame, the second image frame and the first position and posture change.
In this embodiment, the execution subject may construct a three-dimensional map based on the first image frame, the second image frame, and the pose change of the mobile camera between the two. Specifically, the spatial positions of a plurality of feature points of the target object contained in the two image frames can be determined (for example, by triangulation) using the feature point positions and the pose change of the mobile camera between shooting the two frames (i.e. the first pose change).
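A minimal sketch of this triangulation step, assuming the first camera is placed at the world origin (an assumption that merely fixes the map's coordinate frame) and that (R, t) is the first pose change recovered above:

```python
import cv2
import numpy as np

def triangulate(pts1, pts2, R, t, K):
    """pts1, pts2: (N, 2) matched feature point positions; returns (N, 3)
    spatial positions of the feature points."""
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # first frame at the origin
    P2 = K @ np.hstack([R, t.reshape(3, 1)])           # second frame after the first pose change
    pts4d = cv2.triangulatePoints(P1, P2,
                                  pts1.T.astype(np.float64),
                                  pts2.T.astype(np.float64))
    return (pts4d[:3] / pts4d[3]).T                    # de-homogenize to 3-D points
```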
In some optional implementations of this embodiment, the at least two frames of images further include a third image frame, and step c may include: determining the pose change of the mobile camera of the third image frame relative to a previously shot image frame as a second pose change, wherein the at least two frames of images include the previously shot image frame; and constructing the three-dimensional map based on the first, second and third image frames together with the first pose change and the second pose change.
In these alternative implementations, the execution subject may construct the three-dimensional map using not only the first and second image frames but also other image frames of the target object shot by the mobile camera. Specifically, the execution subject may determine the spatial positions of a plurality of feature points using the feature point positions in the third image frame and the previously shot image frame together with the pose change of the mobile camera between them (the second pose change), in addition to the feature point positions in the first and second image frames and the pose change between those two (the first pose change).
These implementations can determine the three-dimensional map from multiple image frames shot from multiple angles of the target object, reducing the error introduced by sampling only a few viewing angles, so that a more accurate three-dimensional map is obtained.
In some optional application scenarios of these implementations, after step 403, the method may further include:
acquiring a fourth image frame shot by a mobile camera, and matching descriptors of feature points of the fourth image frame with descriptors of feature points of a reference image, wherein the reference image comprises a target object; and adjusting the three-dimensional map by minimizing the reprojection error in response to determining that the fourth image frame matches the reference image for more than a predetermined number of feature points.
In these optional application scenarios, the execution subject may acquire the fourth image frame, extract the descriptors of its feature points, and match them against the descriptors of the feature points of the reference image. The reference image can be obtained in advance by shooting the target object with the mobile camera. In this way, feature point matching against the reference image can decide whether another image contains the target object; if so, that image may be used to adjust the three-dimensional map. Specifically, if more than a preset number of feature points of the other image match the reference image, the other image can be determined to contain the target object, and the execution subject can adjust the three-dimensional map with minimization of the reprojection error as the optimization objective.
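The reference-image check could look like the following sketch: descriptors of the new frame are matched against the reference image's descriptors with a brute-force matcher, and the target object is considered present when the match count exceeds the preset threshold. The threshold of 30 is an assumed example value.

```python
import cv2

def contains_target(desc_frame, desc_reference, min_matches=30):
    # Hamming distance with cross-check suits binary descriptors (ORB/BRIEF)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = bf.match(desc_frame, desc_reference)
    return len(matches) > min_matches
```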
In practice, the execution subject may not only adjust the three-dimensional map with the minimized reprojection error as the optimization target, but may also further refine the three-dimensional map by the Lagrange multiplier method.
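A simplified sketch of adjusting the map by minimizing reprojection error is given below; it refines only the 3-D point positions with a generic least-squares solver, whereas a full bundle adjustment would also refine the camera poses, and the Lagrange-multiplier refinement mentioned above is not shown. The use of scipy here is an assumption; the patent leaves the optimizer unspecified.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def adjust_map(points_3d, observations, rvec, tvec, K):
    """points_3d: (N, 3) initial spatial positions; observations: (N, 2)
    measured feature point positions in the matching frame; rvec, tvec:
    pose of that frame. Returns refined (N, 3) spatial positions."""
    def residual(x):
        pts = x.reshape(-1, 3)
        proj, _ = cv2.projectPoints(pts, rvec, tvec, K, None)
        # reprojection error: projected positions minus observed positions
        return (proj.reshape(-1, 2) - observations).ravel()

    result = least_squares(residual, np.asarray(points_3d, dtype=np.float64).ravel())
    return result.x.reshape(-1, 3)
```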
It should be noted that "first", "second", "third", "fourth", etc. in this application do not represent the order of the image frames (unless specifically stated), but only distinguish different image frames.
This embodiment can recognize and process images, realize feature point matching between two-dimensional images, and thereby construct an accurate three-dimensional map.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an image processing apparatus based on a mobile camera, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the image processing apparatus 500 based on a mobile camera of the present embodiment includes: an acquisition unit 501, a first determination unit 502, and a second determination unit 503. The acquisition unit 501 is configured to acquire a current frame shot by the mobile camera and determine the feature point positions of the feature points of the target object contained in the current frame; the first determination unit 502 is configured to perform feature point matching between the current frame and a three-dimensional map using the feature point positions, so as to determine the spatial positions of the feature points of the target object contained in the current frame, wherein the three-dimensional map includes the spatial positions of a plurality of feature points of the target object; and the second determination unit 503 is configured to determine the pose change of the mobile camera between the current frame and a historical frame containing the target object, based on the determined feature point positions and spatial positions.
In some embodiments, the obtaining unit 501 of the mobile-camera-based image processing apparatus 500 may obtain a current frame captured by the mobile camera and determine the feature point positions of the feature points of the target object included in the current frame. Specifically, the feature point positions may be represented in the form of coordinates; a feature point position indicates where a feature point of the target object lies in the current frame.
In some embodiments, the first determining unit 502 performs feature point matching between the current frame and the three-dimensional map, so as to determine the spatial positions of the feature points of the target object included in the current frame. The spatial positions of these feature points in the three-dimensional map may be expressed as three-dimensional coordinates. Feature points having identical features or features with high similarity can be matched successfully. If more than a preset number of feature points are matched, the apparatus can determine that the feature points of the target object have been successfully matched between the current frame and the three-dimensional map.
In some embodiments, the second determination unit 503 determines the pose change of the mobile camera between the current frame and a history frame based on the positions of the feature points within the current frame and their spatial positions in the three-dimensional map. Specifically, the history frame also contains the target object.
In some optional implementations of this embodiment, the apparatus further includes a map construction unit; the map construction unit includes: the device comprises an acquisition subunit, a processing unit and a display unit, wherein the acquisition subunit is configured to acquire at least two frames of images which are shot by a mobile camera and contain a target object, and the at least two frames of images comprise a first image frame and a second image frame; a determining subunit configured to determine that the pose change of the mobile camera from capturing the first image frame to capturing the second image frame is a first pose change in response to successful matching of the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame; a construction subunit configured to construct a three-dimensional map based on the first image frame, the second image frame, and the first pose change.
In some embodiments, the pose changes include rotational changes and translational changes of the moving camera; a determination subunit further configured to: matching the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame based on the descriptors of the feature points of the target object contained in the first image frame and the descriptors of the feature points of the target object contained in the second image frame; in response to a successful match, a first pose change of the moving camera of the second image frame relative to the first image frame is determined using epipolar constraints.
In some embodiments, the at least two frames of images further comprise a third image frame, and the construction subunit is further configured to: determine the pose change of the mobile camera of the third image frame relative to a previously shot image frame as a second pose change, wherein the at least two frames of images include the previously shot image frame; and construct the three-dimensional map based on the first, second and third image frames together with the first pose change and the second pose change.
In some embodiments, the apparatus further comprises: the matching unit is configured to acquire a fourth image frame shot by the mobile camera after the three-dimensional map is built, and match descriptors of feature points of the fourth image frame with descriptors of feature points of a reference image, wherein the reference image contains a target object; an adjusting unit configured to adjust the three-dimensional map with a minimized reprojection error in response to determining that the fourth image frame matches the reference image with more than a preset number of feature points.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium of the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium, by contrast, may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a first determination unit, and a second determination unit. The names of the units do not form a limitation on the units themselves in some cases, and for example, the acquiring unit may also be described as a "unit that acquires a current frame captured by a moving camera and determines the feature point positions of the feature points of the target object included in the current frame".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a current frame shot by a mobile camera, and determining the feature point position of the feature point of a target object contained in the current frame; matching the feature points of the current frame and the three-dimensional map by using the feature point positions to determine the spatial positions of the feature points of the target object contained in the current frame, wherein the three-dimensional map comprises the spatial positions of a plurality of feature points of the target object; based on the determined feature point locations and spatial locations, a pose change of the current frame relative to a moving camera of a historical frame containing the target object is determined.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. An image processing method based on a mobile camera comprises the following steps:
acquiring a current frame shot by the mobile camera, and determining the feature point position of the feature point of the target object contained in the current frame;
matching the feature points of the current frame and a three-dimensional map by using the feature point positions to determine the spatial positions of the feature points of the target object contained in the current frame, wherein the three-dimensional map comprises the spatial positions of a plurality of feature points of the target object;
based on the determined feature point positions and spatial positions, determining the pose change of the mobile camera between the current frame and a historical frame containing the target object.
2. The method of claim 1, wherein the three-dimensional map is constructed by the construction steps of:
acquiring at least two frames of images which are shot by the mobile camera and contain the target object, wherein the at least two frames of images comprise a first image frame and a second image frame;
in response to the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame being successfully matched, determining that the pose change of the mobile camera from shooting the first image frame to shooting the second image frame is a first pose change;
constructing the three-dimensional map based on the first image frame, the second image frame, and the first pose change.
3. The method of claim 2, wherein pose changes include rotational and translational changes of the moving camera;
the determining that the pose change of the mobile camera from capturing a first image frame to capturing a second image frame is a first pose change in response to successful matching of the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame comprises:
matching feature points of the target object included in the first image frame and feature points of the target object included in the second image frame based on descriptors of the feature points of the target object included in the first image frame and descriptors of the feature points of the target object included in the second image frame;
in response to a successful match, determining the first pose change of the mobile camera of the second image frame relative to the first image frame using epipolar constraints.
4. The method of claim 2, wherein the at least two frame images further include a third image frame, the constructing the three-dimensional map based on the first image frame, the second image frame, and the first pose change comprising:
determining the pose change of the mobile camera of the third image frame relative to a previously captured image frame as a second pose change, wherein the at least two frames of images comprise the previously captured image frame;
constructing the three-dimensional map based on the first image frame, the second image frame, the third image frame, and the first and second pose changes.
5. The method of claim 2, wherein after said constructing the three-dimensional map, the method further comprises:
acquiring a fourth image frame shot by the mobile camera, and matching descriptors of feature points of the fourth image frame with descriptors of feature points of a reference image, wherein the reference image comprises the target object;
adjusting the three-dimensional map with minimized reprojection errors in response to determining that the fourth image frame matches the reference image for more than a predetermined number of feature points.
6. An image processing apparatus based on a mobile camera, comprising:
the acquisition unit is configured to acquire a current frame shot by the mobile camera and determine the feature point position of the feature point of the target object contained in the current frame;
a first determining unit configured to perform feature point matching on the current frame and a three-dimensional map to determine a spatial position of a feature point of the target object included in the current frame, using the feature point position, wherein the three-dimensional map includes spatial positions of a plurality of feature points of the target object;
a second determination unit configured to determine the pose change of the mobile camera between the current frame and a historical frame containing the target object, based on the determined feature point positions and spatial positions.
7. The apparatus of claim 6, wherein the apparatus further comprises a mapping unit; the map construction unit includes:
an acquisition subunit configured to acquire at least two frames of images including the target object captured by the mobile camera, wherein the at least two frames of images include a first image frame and a second image frame;
a determining subunit configured to determine that the pose change of the mobile camera from capturing a first image frame to capturing a second image frame is a first pose change in response to successful matching of the feature points of the target object contained in the first image frame and the feature points of the target object contained in the second image frame;
a construction subunit configured to construct the three-dimensional map based on the first image frame, the second image frame, and the first pose change.
8. The apparatus of claim 6, wherein pose changes comprise rotational and translational changes of the moving camera;
the determining subunit further configured to:
matching feature points of the target object included in the first image frame and feature points of the target object included in the second image frame based on descriptors of the feature points of the target object included in the first image frame and descriptors of the feature points of the target object included in the second image frame;
in response to a successful match, determine the first pose change of the mobile camera of the second image frame relative to the first image frame using epipolar constraints.
9. The apparatus of claim 6, wherein the at least two frame images further comprise a third image frame, the construction subunit further configured to:
determining the pose change of the mobile camera of the third image frame relative to a previously captured image frame as a second pose change, wherein the at least two frames of images comprise the previously captured image frame;
constructing the three-dimensional map based on the first image frame, the second image frame, the third image frame, and the first and second pose changes.
10. The apparatus of claim 6, wherein the apparatus further comprises:
the matching unit is configured to acquire a fourth image frame shot by the mobile camera after the three-dimensional map is built, and match descriptors of feature points of the fourth image frame with descriptors of feature points of a reference image, wherein the reference image contains the target object;
an adjusting unit configured to adjust the three-dimensional map with a minimized reprojection error in response to determining that the fourth image frame matches the reference image with more than a preset number of feature points.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201910667521.5A 2019-07-23 2019-07-23 Image processing method and device based on mobile camera Pending CN111768443A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910667521.5A CN111768443A (en) 2019-07-23 2019-07-23 Image processing method and device based on mobile camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910667521.5A CN111768443A (en) 2019-07-23 2019-07-23 Image processing method and device based on mobile camera

Publications (1)

Publication Number Publication Date
CN111768443A (zh) 2020-10-13

Family

ID=72718967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910667521.5A Pending CN111768443A (en) 2019-07-23 2019-07-23 Image processing method and device based on mobile camera

Country Status (1)

Country Link
CN (1) CN111768443A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103900583A (en) * 2012-12-25 2014-07-02 联想(北京)有限公司 Device and method used for real-time positioning and map building
CN107705333A (en) * 2017-09-21 2018-02-16 歌尔股份有限公司 Space-location method and device based on binocular camera
CN107747941A (en) * 2017-09-29 2018-03-02 歌尔股份有限公司 A kind of binocular visual positioning method, apparatus and system
US20190204084A1 (en) * 2017-09-29 2019-07-04 Goertek Inc. Binocular vision localization method, device and system
CN108537848A (en) * 2018-04-19 2018-09-14 北京工业大学 A kind of two-stage pose optimal estimating method rebuild towards indoor scene
CN108648235A (en) * 2018-04-27 2018-10-12 腾讯科技(深圳)有限公司 Method for relocating, device and the storage medium of camera posture tracing process

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022141240A1 (en) * 2020-12-30 2022-07-07 SZ DJI Technology Co., Ltd. Determining vehicle positions for autonomous driving based on monocular vision and semantic map

Similar Documents

Publication Publication Date Title
US20210174124A1 (en) Method, device and storage medium for determining camera posture information
CN111325796B (en) Method and apparatus for determining pose of vision equipment
CN109858445B (en) Method and apparatus for generating a model
CN106846497B (en) Method and device for presenting three-dimensional map applied to terminal
CN107622252B (en) Information generation method and device
CN109829432B (en) Method and apparatus for generating information
CN109255337B (en) Face key point detection method and device
CN110516678B (en) Image processing method and device
CN110033423B (en) Method and apparatus for processing image
CN110263209B (en) Method and apparatus for generating information
CN110111241B (en) Method and apparatus for generating dynamic image
CN111292420B (en) Method and device for constructing map
CN109754464B (en) Method and apparatus for generating information
CN110059624B (en) Method and apparatus for detecting living body
CN111402122A (en) Image mapping processing method and device, readable medium and electronic equipment
CN108492284B (en) Method and apparatus for determining perspective shape of image
CN111325792B (en) Method, apparatus, device and medium for determining camera pose
CN110189252B (en) Method and device for generating average face image
CN110189364B (en) Method and device for generating information, and target tracking method and device
CN109816791B (en) Method and apparatus for generating information
CN111768443A (en) Image processing method and device based on mobile camera
CN112270242A (en) Track display method and device, readable medium and electronic equipment
CN111310595A (en) Method and apparatus for generating information
CN116079697A (en) Monocular vision servo method, device, equipment and medium based on image
CN115170395A (en) Panoramic image stitching method, panoramic image stitching device, electronic equipment, panoramic image stitching medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination