CN108108748A - A kind of information processing method and electronic equipment - Google Patents
A kind of information processing method and electronic equipment
- Publication number
- CN108108748A (application CN201711299204.XA)
- Authority
- CN
- China
- Prior art keywords
- target object
- image
- feature
- matching
- feature points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications

- G06F18/22—Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06T19/006—Manipulating 3D models or images for computer graphics; Mixed reality
- G06T7/246—Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods involving reference images or patches
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
- G06T2207/10016—Indexing scheme for image analysis or image enhancement; Image acquisition modality; Video; Image sequence
Abstract
The invention discloses an information processing method and an electronic device. The method includes: constructing a feature database, the feature database comprising feature point information of multi-frame images of a target object; acquiring an image of the target object, and extracting feature point information from the acquired image; matching the extracted feature point information with the feature point information of the frame images in the feature database, so as to perform object identification on the target object; when the object identification is successful, obtaining the spatial position information of the extracted feature points; and determining the acquisition pose of the image of the target object based on the obtained spatial position information of the feature points.
Description
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to an information processing method and an electronic device.
Background
A basic problem in Augmented Reality (AR) is how to superimpose virtual information on a real object. In related technical solutions, object identification is performed first, and the relative pose between the image acquisition device and the object is then calculated, that is, the acquisition pose corresponding to the image of the object is obtained, so that the virtual information can be seamlessly superimposed at the position desired by the user. At present, schemes for acquiring the pose corresponding to an object image include the following:
Scheme 1: a two-dimensional picture mark (marker) is pasted on the object in advance, and the relative pose of the image acquisition device and the object is then obtained by recognizing the marker; however, the application scenario is too limited for this scheme to be universal.
Scheme 2: the user provides a 3D model of the object, structural information of the object (corners, points, straight lines, and the like) is extracted, and object identification is realized through feature matching; however, requiring the user to provide a 3D model of the object greatly restricts the application scenario, so the scheme is not very practical.
Scheme 3: a sparse 3D model of the object is reconstructed from multiple pictures taken at different angles, and the object is then identified and the camera pose estimated by frame-by-frame matching; however, recognition under this scheme is slow, difficult to realize in real time, and therefore difficult to apply in practice.
Disclosure of Invention
Embodiments of the present invention provide an information processing method, an electronic device, and a storage medium, which can quickly perform object identification on a target object and determine the acquisition pose corresponding to an image of the target object.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides an information processing method, which comprises the following steps:
constructing a feature database; the feature database comprises feature point information of multi-frame images of the target object;
collecting an image of the target object, and extracting feature point information of the collected image;
matching the extracted feature point information with feature point information of frame images in the feature database to perform object identification on the target object;
when the object identification is successful, acquiring the spatial position information of the extracted feature points;
and determining the acquisition pose of the image of the target object based on the acquired spatial position information of the feature points.
In the above scheme, constructing the feature database includes:
acquiring multi-frame images of the target object under different viewpoints;
respectively extracting feature point information of the multi-frame images under different viewpoints;
performing feature point matching between the extracted frame images under different viewpoints to obtain matching information;
performing three-dimensional reconstruction on the target object based on the obtained matching information to obtain a three-dimensional reconstruction result;
and constructing the feature database based on the three-dimensional reconstruction result.
In the foregoing solution, constructing the feature database based on the three-dimensional reconstruction result includes:
obtaining spatial position information of the reconstructed feature points based on the three-dimensional reconstruction result;
selecting N frame images under different viewpoints as reference frame images; n is a positive integer greater than 1;
and constructing a feature database comprising the reference frame image, the feature points on the reference frame image and the spatial position information of the feature points on the reference frame image.
In the foregoing solution, matching the extracted feature point information with the feature point information of the frame images in the feature database to perform object identification on the target object includes:
matching the feature points of each frame of image in the feature database with the extracted feature points;
and determining that the object identification of the target object is successful when the number of successfully matched feature points is greater than a preset threshold.
In the above scheme, the method further comprises:
and when the object identification is successful, identifying the target object in continuously acquired frame images of the target object, so as to realize target tracking of the target object.
In the above scheme, the method further comprises:
and superimposing a virtual object at a preset position of the target object in the process of displaying the target object, according to the acquisition pose of the image corresponding to the target object.
In the above scheme, the method further comprises:
in response to the object identification of the target object being successful while feature points that failed to match exist among the extracted feature points,
acquiring the frame image in the feature database whose corresponding acquisition pose has the highest similarity to the acquisition pose of the image of the target object;
and matching the image of the target object with the acquired frame image based on projection features, to obtain the spatial position information of the feature points that failed to match.
An embodiment of the present invention further provides an electronic device, where the electronic device includes:
a memory for storing an executable program;
a processor for implementing, by executing the executable program stored in the memory:
constructing a feature database; the feature database comprises feature point information of multi-frame images of the target object;
collecting an image of the target object, and extracting feature point information of the collected image;
matching the extracted feature point information with feature point information of frame images in the feature database to perform object identification on the target object;
when the object identification is successful, acquiring the spatial position information of the extracted feature points;
and determining the acquisition pose of the image of the target object based on the acquired spatial position information of the feature points.
In the above scheme, the processor is further configured to match feature points of each frame of image in the feature database with the extracted feature points;
and determine that the object identification of the target object is successful when the number of successfully matched feature points is greater than a preset threshold.
In the above solution, the processor is further configured to, in response to the object identification of the target object being successful while feature points that failed to match exist among the extracted feature points,
acquire the frame image in the feature database whose corresponding acquisition pose has the highest similarity to the acquisition pose of the image of the target object;
and match the image of the target object with the acquired frame image based on projection features, to obtain the spatial position information of the feature points that failed to match.
By applying the information processing method, electronic device, and storage medium provided by the embodiments of the present invention, a feature database including feature point information of multi-frame images of the target object is constructed, so that after an image of the target object is acquired, feature point matching can be performed against the feature point information in the feature database; object identification of the target object is thereby quickly realized, and the acquisition pose of the image of the target object is determined.
Drawings
Fig. 1 is a first schematic flowchart of an information processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of establishing a feature database according to an embodiment of the present invention;
Fig. 3 is a second schematic flowchart of an information processing method according to an embodiment of the present invention;
Fig. 4 is a third schematic flowchart of an information processing method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the composition structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
In order to make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings; it should be understood that the embodiments provided herein only explain the present invention and are not intended to limit it. In addition, the embodiments below are only some, not all, of the embodiments for implementing the invention; all other technical solutions obtained by those skilled in the art, without creative effort, by recombining the technical solutions of the embodiments below also belong to the protection scope of the invention.
It should be noted that, in the embodiments of the present invention, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recited elements but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional related elements (e.g., steps in a method) in the method or apparatus that includes the element.
It should be noted that the terms "first", "second", and "third" in the embodiments of the present invention only distinguish similar objects and do not imply a specific ordering; where permitted, "first", "second", and "third" may exchange their specific order or sequence, so that the embodiments of the invention described herein can be carried out in orders other than those illustrated or described.
Before the present invention is described in further detail, the terms and expressions used in the embodiments of the present invention are explained; these terms and expressions are to be understood according to the following explanations.
1) Object recognition: recognizing (finding) a given object in an image or a set of video sequences.
2) Target tracking: in a video or a sequence of frame images, performing object recognition on a target object and tracking the motion track of the target object.
3) Feature points: points where the image gray value changes sharply, or points of large curvature on image edges (i.e., intersections of two edges); they reflect the essential characteristics of the image and can be used to identify target objects in the image.
4) Descriptors, i.e., feature descriptors: used to describe the attributes of feature points.
5) Acquisition pose: for an image, the position and posture of the image acquisition device (such as a camera) relative to the photographed target object, including rotation and translation, i.e., the Six Degrees of Freedom (6DoF) of the image acquisition device relative to the target object.
Example one
As an optional embodiment of the information processing method of the embodiments of the present invention, referring to fig. 1, fig. 1 is an optional schematic flowchart of the information processing method according to an embodiment of the present invention; the method involves steps 101 to 105, which are described below.
Step 101: constructing a feature database; the feature database includes feature point information of a plurality of frame images of a target object.
In one embodiment, the feature database may be constructed by:
acquiring multi-frame images of a target object under different viewpoints; respectively extracting feature point information of the multi-frame images under different viewpoints; performing feature point matching between the extracted frame images under different viewpoints to obtain matching information; performing three-dimensional reconstruction on the target object based on the obtained matching information to obtain a three-dimensional reconstruction result; and constructing a feature database based on the three-dimensional reconstruction result.
Referring to fig. 2, fig. 2 is a schematic flowchart of establishing the feature database according to an embodiment of the present invention. In actual implementation, the multi-frame images of the target object at different viewpoints may be acquired in the following manner: a camera records a video (scan) around the photographed target object (3D Object) to obtain a scanned video of the target object; each frame image constituting the scanned video is then obtained, forming the multi-frame images of the target object under different viewpoints. The whole acquisition process is carried out offline and is convenient to implement.
In one embodiment, referring to fig. 2, extracting the feature point information of a frame image may include: extracting ORB (Oriented FAST and Rotated BRIEF) feature points (key points) and their descriptors from the frame image. In practical application, the FAST (Features from Accelerated Segment Test) algorithm is used to extract the ORB feature points; the core idea of FAST is to find salient points, i.e., to compare a pixel with the pixels around it, and if it differs from most of them, it can be regarded as a feature point. The descriptor of a feature point can be calculated with the BRIEF (Binary Robust Independent Elementary Features) algorithm; the core idea of BRIEF is to select N point pairs around a key point P in a certain pattern and to combine the comparison results of these N point pairs into the descriptor.
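As an illustration of this extraction step, the following is a minimal sketch assuming OpenCV's ORB implementation, which combines the FAST detector and BRIEF-style descriptors as described above (the function name and parameters are illustrative):

```python
import cv2

def extract_orb_features(frame, max_features=500):
    """Extract ORB feature points (FAST key points) and their BRIEF-style
    binary descriptors from one frame of the scanned video."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=max_features)
    # keypoints: image coordinates of salient points; descriptors: one
    # 32-byte binary string per key point, built from intensity
    # comparisons of point pairs around the key point.
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors
```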
Since the collected multi-frame images of the target object at different viewpoints may be the frame images constituting a scanned video of the target object, they may be understood as a frame image sequence corresponding to the target object. In one embodiment, referring to fig. 2, performing feature point matching between the extracted frame images under different viewpoints to obtain matching information may include: for each frame image in the frame image sequence, performing feature point matching between that frame image and the adjacent N (N is a positive integer, e.g., 5) frame images before it, so as to obtain the matching information, i.e., the information of the successfully matched feature point pairs (point pair correspondences).
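A minimal sketch of this sliding-window matching, assuming the ORB descriptors extracted above and a brute-force Hamming matcher (appropriate for binary descriptors); the window size of 5 follows the example in the text:

```python
import cv2

def match_adjacent_frames(descriptor_list, window=5):
    """For each frame in the sequence, match its descriptors against the
    preceding `window` frames; returns {(j, i): [cv2.DMatch, ...]}."""
    # Cross-checking keeps only mutually nearest matches, a simple way to
    # retain the "successfully matched" feature point pairs.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    pair_matches = {}
    for i in range(len(descriptor_list)):
        for j in range(max(0, i - window), i):
            if descriptor_list[i] is None or descriptor_list[j] is None:
                continue  # frames where no feature point was detected
            pair_matches[(j, i)] = matcher.match(
                descriptor_list[j], descriptor_list[i])
    return pair_matches
```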
In an embodiment, performing three-dimensional reconstruction on the target object based on the obtained matching information to obtain a three-dimensional reconstruction result may include: based on the obtained matching information, performing sparse 3D model reconstruction on the target object with an SFM (Structure from Motion) algorithm; for example, camera parameters and three-dimensional information are recovered by numerical methods from the obtained set of matched feature points, so as to obtain a sparse 3D model and the 3D coordinates (spatial position information) of the reconstructed feature points. Thus the user does not need to provide a 3D model: by merely recording a video around the target object with a camera, a sparse 3D model can be obtained from the video. It should be noted that the pose (6 degrees of freedom) of the image acquisition device in this embodiment is relative to the target object.
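The patent does not prescribe a particular SFM implementation. As a rough sketch of the underlying idea, the two-view fragment below recovers relative camera motion from matched feature points and triangulates their 3D coordinates, assuming known camera intrinsics K; a full pipeline would register all frames incrementally and refine the sparse model with bundle adjustment:

```python
import cv2
import numpy as np

def two_view_reconstruction(pts1, pts2, K):
    """pts1, pts2: Nx2 float arrays of matched feature point coordinates
    in two frames; K: 3x3 intrinsic matrix. Returns (R, t, points_3d)."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    # Projection matrices: first camera at the origin, second at (R, t).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    points_3d = (pts4d[:3] / pts4d[3]).T  # homogeneous -> Euclidean
    return R, t, points_3d
```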
In one embodiment, building the feature database based on the three-dimensional reconstruction result may include:
obtaining spatial position information of the reconstructed feature points based on the three-dimensional reconstruction result; selecting N frame images under different viewpoints as reference frame images (reference frames), N being a positive integer greater than 1; and constructing a feature database comprising the reference frame images, the feature points on the reference frame images, and the spatial position information of those feature points. Each reference frame image corresponds to an acquisition pose of the image acquisition device (camera). For example, 20 frame images from different viewpoints forming the sparse 3D model may be selected as reference frame images, corresponding to 20 different acquisition poses of the camera.
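One entry of such a feature database might be organized as below; the field names are illustrative assumptions, not terms taken from the patent:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ReferenceFrame:
    """One reference frame image selected from the sparse 3D reconstruction."""
    image: np.ndarray        # the reference frame image itself
    keypoints: np.ndarray    # Mx2 image coordinates of its feature points
    descriptors: np.ndarray  # Mx32 binary ORB descriptors
    points_3d: np.ndarray    # Mx3 spatial positions of the feature points
    pose: np.ndarray         # 4x4 acquisition pose of the camera (6DoF)

# The feature database is then a collection of such reference frames,
# e.g. 20 frames covering different viewpoints:
feature_database: list[ReferenceFrame] = []
```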
Step 102: acquiring an image of the target object, and extracting feature point information of the acquired image.
Here, in an embodiment, a camera is used to capture an image of the target object, and the ORB feature points and their descriptors in the current frame image are then extracted.
Step 103: matching the extracted feature point information with the feature point information of the frame images in the feature database, so as to perform object identification on the target object.
In one embodiment, matching the extracted feature point information with the feature point information of the frame image in the feature database may be achieved by:
matching the feature points of each frame image in the feature database with the extracted feature points one by one; and determining that the object identification of the target object is successful when the number of successfully matched feature points is greater than a preset threshold. For example, feature point matching is performed between the extracted current frame image of the target object and each reference frame image in the feature database one by one; when the number of successfully matched feature points is greater than 15, the object identification of the target object is considered successful; otherwise, the object identification is considered failed and the processing flow ends.
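A sketch of this recognition step, reusing the illustrative ReferenceFrame structure above; the match-count threshold of 15 comes from the example, while the Hamming-distance cutoff is an added assumption:

```python
import cv2

def recognize(query_descriptors, feature_database, threshold=15):
    """Match the current frame against each reference frame one by one;
    object identification succeeds if enough feature points match."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    for ref in feature_database:
        matches = matcher.match(ref.descriptors, query_descriptors)
        good = [m for m in matches if m.distance < 64]  # Hamming cutoff (assumption)
        if len(good) > threshold:
            return ref, good   # object identification successful
    return None, []            # identification failed: end processing flow
```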
Step 104: when the object recognition is successful, the spatial position information of the extracted feature points is acquired.
Here, in actual implementation, since the feature database stores the spatial position information corresponding to each feature point, when the object identification is successful, the spatial position information of the successfully matched feature points can be obtained from the constructed feature database.
In one embodiment, when feature point matching is performed in step 103, there may be a case where, although the object identification is successful, some of the extracted feature points fail to match. For example, 18 feature points are extracted from the current frame image of the target object; when the current frame image is matched against the reference frame images one by one, 16 feature points match successfully, and the object identification of the target object is considered successful; however, 2 feature points fail to match. The spatial position information of the feature points that failed to match can be obtained in the following way: acquiring the reference frame image in the feature database whose corresponding acquisition pose has the highest similarity to the acquisition pose of the image of the target object; and matching the image of the target object with that reference frame image based on projection features, so as to obtain the spatial position information of the feature points that failed to match.
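The patent leaves the projection-based matching unspecified. One plausible reading, sketched below under the assumption that an initial pose of the current frame has already been estimated from the successfully matched points, is to pick the reference frame with the most similar pose and associate each unmatched 2D point with the nearest reprojection of that frame's 3D points:

```python
import cv2
import numpy as np

def closest_reference_frame(current_pose, feature_database):
    """Reference frame whose acquisition pose is most similar to the
    (initially estimated) pose of the current image, by rotation angle."""
    def rotation_angle(T1, T2):
        R = T1[:3, :3].T @ T2[:3, :3]
        return np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    return min(feature_database,
               key=lambda ref: rotation_angle(ref.pose, current_pose))

def recover_unmatched(points_2d, ref, current_pose, K, max_px=4.0):
    """Heuristic: project the reference frame's 3D points into the current
    image and assign each unmatched 2D point the 3D point whose projection
    falls closest to it (within max_px pixels)."""
    rvec, _ = cv2.Rodrigues(current_pose[:3, :3])
    proj, _ = cv2.projectPoints(np.asarray(ref.points_3d, dtype=np.float64),
                                rvec, current_pose[:3, 3], K, None)
    proj = proj.reshape(-1, 2)
    recovered = {}
    for i, p in enumerate(points_2d):
        d = np.linalg.norm(proj - p, axis=1)
        if d.min() < max_px:
            recovered[i] = ref.points_3d[int(np.argmin(d))]
    return recovered
```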
In an embodiment, after the object identification of the target object is successful, target tracking of the target object can further be implemented by identifying the target object in continuously acquired frame images of the target object. For example, in the acquired continuous frame image sequence of the target object, if the target object was identified in the previous frame image, feature point matching is performed between the current frame image and the previous frame image; when the number of successfully matched feature points is greater than a preset threshold (e.g., 15), the object identification of the target object is achieved in the current frame, and by analogy, target tracking of the target object is achieved over the continuous frame image sequence.
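A sketch of this frame-to-frame tracking check (descriptor extraction and matcher as in the earlier sketches; the threshold follows the example):

```python
import cv2

def tracked_in_current_frame(prev_descriptors, curr_descriptors,
                             threshold=15):
    """The object was identified in the previous frame: confirm it in the
    current frame by matching the two frames directly."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(prev_descriptors, curr_descriptors)
    good = [m for m in matches if m.distance < 64]  # Hamming cutoff (assumption)
    return len(good) > threshold   # True: target tracked in this frame
```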
Step 105: determining the acquisition pose of the image of the target object based on the acquired spatial position information of the feature points.
In an embodiment, according to the obtained spatial position information of the feature points, a Perspective-n-Point (PnP) algorithm can be used to determine the acquisition pose (camera pose) of the image of the target object; in the same way, the acquisition pose corresponding to each frame image in the continuous frame image sequence of the target object can be obtained. The obtained acquisition poses are smoothed by filtering and output; meanwhile, the frame image of the target object, the corresponding feature point information, and the acquisition pose can be added to the feature database.
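A sketch of the pose determination step, assuming OpenCV's RANSAC PnP solver and known intrinsics K; the smoothing filter applied to the output poses is omitted:

```python
import cv2
import numpy as np

def estimate_pose(points_3d, points_2d, K):
    """points_3d: Nx3 spatial positions from the feature database;
    points_2d: Nx2 matched feature point coordinates in the current frame.
    Returns the 6DoF acquisition pose as a 4x4 matrix, or None."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64), K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> rotation matrix
    pose = np.eye(4)
    pose[:3, :3], pose[:3, 3] = R, tvec.ravel()
    return pose
```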
In an embodiment, after the acquisition pose of the image of the target object is determined, a virtual object may be superimposed at a preset position of the target object in the process of displaying the target object, according to the determined acquisition pose of the image, so as to realize the superposition of the virtual object and the real object.
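As a sketch of the superposition, the vertices of a virtual object placed at the preset position (expressed in the target object's coordinate system) can be projected into the current frame with the determined pose; drawing a wireframe with OpenCV stands in here for real rendering:

```python
import cv2
import numpy as np

def overlay_virtual_object(frame, pose, K, vertices_3d):
    """vertices_3d: Nx3 vertices of the virtual object at the preset
    position, in the target object's coordinate system."""
    rvec, _ = cv2.Rodrigues(pose[:3, :3])
    pts, _ = cv2.projectPoints(np.asarray(vertices_3d, dtype=np.float64),
                               rvec, pose[:3, 3], K, None)
    pts = pts.reshape(-1, 2).astype(int)
    for a, b in zip(pts, np.roll(pts, -1, axis=0)):  # connect vertices
        cv2.line(frame, tuple(a), tuple(b), (0, 255, 0), 2)
    return frame
```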
Example two
As another optional embodiment of the information processing method of the embodiments of the present invention, referring to fig. 3 and fig. 4, which are optional schematic flowcharts of the information processing method according to embodiments of the present invention, the method is applied to an electronic device and involves steps 201 to 208, described below.
Step 201: the electronic device loads a feature database.
Here, in actual implementation, before this step, a feature database including feature point information of multi-frame images of the target object needs to be constructed. In one embodiment, the feature database may be constructed as follows: a camera records a video around the target object to obtain video including image information of the target object at different viewpoints, yielding a continuous frame image sequence of the target object; the ORB feature points and their descriptors in each frame image are extracted, and feature point matching is then performed between each frame image and the 5 adjacent frame images before it to obtain matching information (i.e., information of the successfully matched feature point pairs); sparse 3D model reconstruction is performed on the target object with the SFM algorithm to obtain a sparse 3D model of the target object and the spatial position information (3D coordinates) of the reconstructed feature points; 20 frame images at different viewpoints are randomly selected as reference frame images based on the sparse 3D model of the target object; and a feature database comprising the reference frame images, the feature points on the reference frame images, and the spatial position information of those feature points is constructed. Because the reference frame images correspond to frame images under different viewpoints, the 20 reference frame images respectively correspond to 20 different acquisition poses.
Step 202: carrying out continuous image acquisition on the target object, and respectively extracting the ORB feature points and descriptors of each frame image of the target object.
In actual implementation, continuous image acquisition is performed on the target object to obtain a continuous frame image sequence of the target object, and ORB feature point detection is performed on each frame image of the target object; the ORB feature points can be detected and extracted with the FAST algorithm. Specifically, a circle of pixel values around a candidate feature point may be examined based on the gray values of the image around the point: if enough pixels in the surrounding circle (e.g., three quarters of the points on the circle) differ from the gray value of the candidate point by more than a given gray threshold, the candidate point is regarded as a feature point. Meanwhile, in practical implementation, the descriptor of a feature point can be calculated with the BRIEF algorithm.
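A sketch of the simplified segment test described here, counting how many of the 16 pixels on the surrounding Bresenham circle differ from the candidate by more than the gray threshold (production FAST additionally requires the differing pixels to be contiguous):

```python
import numpy as np

# The 16 pixel offsets of the radius-3 Bresenham circle used by FAST.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def is_feature_point(gray, x, y, gray_thresh=20, fraction=0.75):
    """Candidate pixel (x, y), at least 3 pixels from the image border, is
    a feature point if enough surrounding pixels (e.g. three quarters of
    the circle) differ from it by more than gray_thresh."""
    center = int(gray[y, x])
    differing = sum(
        abs(int(gray[y + dy, x + dx]) - center) > gray_thresh
        for dx, dy in CIRCLE)
    return differing >= fraction * len(CIRCLE)
```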
Step 203: carrying out image matching between the frame images of the target object and the reference frame images in the feature database one by one, and judging whether the matching is successful; if the matching is successful, step 204 is executed; if the matching fails, step 208 is executed.
Here, in practical implementation, the image matching of the frame image of the target object with the reference frame image in the feature database one by one includes:
performing feature point matching between the frame image of the target object and a reference frame image; if the number of successfully matched feature points exceeds a preset threshold (e.g., 15), the image matching is regarded as successful; otherwise, the image matching is regarded as failed. In practical application, successful image matching means that the object identification of the target object is successful. If the image matching between the frame image of the target object and the currently selected reference frame image fails, another reference frame image in the feature database is selected (e.g., randomly) for image matching; if the current frame image of the target object fails to match all the reference frame images in the feature database, the processing flow ends.
Step 204: acquiring the spatial position information of the feature points of the frame image of the target object.
Here, in actual implementation, since the feature database stores the spatial position information corresponding to each feature point, when the object identification is successful, the spatial position information of the successfully matched feature points can be obtained from the constructed feature database. In an embodiment, when image matching is performed on the current frame image of the target object, there may be a case where the image matching is successful but not all feature points of the current frame image match successfully; for the feature points that failed to match, their spatial position information can be obtained as follows: acquiring the reference frame image in the feature database whose corresponding acquisition pose has the highest similarity to the acquisition pose of the image of the target object; and matching the image of the target object with that reference frame image based on projection features, so as to obtain the spatial position information of the feature points that failed to match.
Step 205: performing target tracking on the target object based on the frame images of the target object.
In practical implementation, target tracking of the target object can be realized by identifying the target object in continuously acquired frame images of the target object. For example, in the acquired continuous frame image sequence of the target object, if the target object was identified in the previous frame image, feature point matching is performed between the current frame image and the previous frame image; when the number of successfully matched feature points is greater than a preset threshold (e.g., 10), the object identification of the target object is achieved in the current frame, and by analogy, target tracking of the target object is achieved over the continuous frame image sequence.
Step 206: determining the acquisition pose of the image of the target object based on the acquired spatial position information of the feature points.
In an embodiment, a PnP algorithm may be used to determine the camera pose (6DoF) of the image of the target object according to the obtained spatial position information of the feature points; in the same manner, the camera pose corresponding to each frame image in the continuous frame image sequence of the target object may be obtained. The obtained camera poses are smoothed by filtering and output; meanwhile, the frame image of the target object, the corresponding feature point information, and the obtained camera pose may be added to the feature database.
Step 207: superimposing the virtual object at the preset position of the target object in the process of displaying the target object, according to the determined acquisition pose of the image.
Step 208: ending the processing flow.
By applying the embodiments of the present invention, a user can generate a sparse 3D model and a feature database for identifying the target object simply by recording a video around the object with a camera or other image acquisition equipment, without providing a 3D model of the target object; the user also does not need to paste any marker on the object, and because the scheme relies on the texture information of the object to be identified, the range of applicable scenarios is enlarged. After object identification of the target object based on the generated feature database, which includes the frame image information of the target object at each viewpoint, real-time target tracking of the target object can further be performed, and the superposition of the virtual object and the real object is realized based on the obtained acquisition pose.
Example three
As an alternative embodiment of the electronic device according to the embodiments of the present invention, referring to fig. 5, which is a schematic diagram of the composition structure of the electronic device according to an embodiment of the present invention, the electronic device includes: a processor 21, a memory 22, and at least one external communication interface 23; the processor 21, the memory 22, and the external communication interface 23 are all connected through a bus 24; wherein,
a memory 22 for storing an executable program;
a processor 21, configured to implement, by executing the executable program stored in the memory:
constructing a feature database; the feature database comprises feature point information of multi-frame images of the target object;
collecting an image of the target object, and extracting feature point information of the collected image;
matching the extracted feature point information with feature point information of frame images in the feature database to perform object identification on the target object;
when the object identification is successful, acquiring the spatial position information of the extracted feature points;
and determining the acquisition pose of the image of the target object based on the acquired spatial position information of the feature points.
In an embodiment, the processor 21 is further configured to acquire multiple frames of images of the target object at different viewpoints;
respectively extracting feature point information of the multi-frame images under different viewpoints;
performing feature point matching between the extracted frame images under different viewpoints to obtain matching information;
performing three-dimensional reconstruction on the target object based on the obtained matching information to obtain a three-dimensional reconstruction result;
and constructing the feature database based on the three-dimensional reconstruction result.
In an embodiment, the processor 21 is further configured to obtain spatial position information of the reconstructed feature points based on the three-dimensional reconstruction result;
selecting N frame images under different viewpoints as reference frame images; n is a positive integer greater than 1;
and constructing a feature database comprising the reference frame image, the feature points on the reference frame image and the spatial position information of the feature points on the reference frame image.
In an embodiment, the processor 21 is further configured to match feature points of each frame image in the feature database with the extracted feature points;
and determine that the object identification of the target object is successful when the number of successfully matched feature points is greater than a preset threshold.
In an embodiment, the processor 21 is further configured to, when the object identification is successful, identify the target object in continuously acquired frame images of the target object, so as to realize target tracking of the target object.
In an embodiment, the processor 21 is further configured to superimpose a virtual object at a preset position of the target object in the process of displaying the target object, according to the acquisition pose of the image corresponding to the target object.
In one embodiment, the processor 21 is further configured to, in response to the object identification of the target object being successful while feature points that failed to match exist among the extracted feature points,
acquire the frame image in the feature database whose corresponding acquisition pose has the highest similarity to the acquisition pose of the image of the target object;
and match the image of the target object with the acquired frame image based on projection features, to obtain the spatial position information of the feature points that failed to match.
It should be noted that the electronic device and the information processing method provided by the above embodiments belong to the same concept; the specific implementation processes are described in detail in the method embodiments and are not repeated here. For technical details not disclosed in the embodiments of the electronic device of the present invention, refer to the description of the method embodiments of the present invention.
Based on the above description of the information processing method and the electronic device, an embodiment of the present invention further provides a storage medium, on which computer instructions are stored, and the instructions, when executed by a processor, implement:
constructing a feature database; the feature database comprises feature point information of multi-frame images of the target object;
collecting an image of the target object, and extracting feature point information of the collected image;
matching the extracted feature point information with feature point information of frame images in the feature database to perform object identification on the target object;
when the object identification is successful, acquiring the spatial position information of the extracted feature points;
and determining the acquisition pose of the image of the target object based on the acquired spatial position information of the feature points.
In one embodiment, the instructions when executed by the processor further implement:
acquiring multi-frame images of the target object under different viewpoints;
respectively extracting feature point information of the multi-frame images under different viewpoints;
performing feature point matching between the extracted frame images under different viewpoints to obtain matching information;
performing three-dimensional reconstruction on the target object based on the obtained matching information to obtain a three-dimensional reconstruction result;
and constructing the feature database based on the three-dimensional reconstruction result.
In one embodiment, the instructions when executed by the processor further implement:
obtaining spatial position information of the reconstructed feature points based on the three-dimensional reconstruction result;
selecting N frame images under different viewpoints as reference frame images; n is a positive integer greater than 1;
and constructing a feature database comprising the reference frame image, the feature points on the reference frame image and the spatial position information of the feature points on the reference frame image.
In one embodiment, the instructions when executed by the processor further implement:
matching the feature points of each frame of image in the feature database with the extracted feature points;
and determining that the object identification of the target object is successful when the number of successfully matched feature points is greater than a preset threshold.
In one embodiment, the instructions when executed by the processor further implement:
and when the object identification is successful, identifying the target object in continuously acquired frame images of the target object, so as to realize target tracking of the target object.
In one embodiment, the instructions when executed by the processor further implement:
and superimposing a virtual object at a preset position of the target object in the process of displaying the target object, according to the acquisition pose of the image corresponding to the target object.
In one embodiment, the instructions when executed by the processor further implement:
in response to the object identification of the target object being successful while feature points that failed to match exist among the extracted feature points,
acquiring the frame image in the feature database whose corresponding acquisition pose has the highest similarity to the acquisition pose of the image of the target object;
and matching the image of the target object with the acquired frame image based on projection features, to obtain the spatial position information of the feature points that failed to match.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware related to program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any changes or substitutions that a person skilled in the art can easily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (10)
1. An information processing method, characterized in that the method comprises:
constructing a feature database; the feature database comprises feature point information of multi-frame images of the target object;
collecting an image of the target object, and extracting feature point information of the collected image;
matching the extracted feature point information with feature point information of frame images in the feature database to perform object identification on the target object;
when the object identification is successful, acquiring the spatial position information of the extracted feature points;
and determining the acquisition pose of the image of the target object based on the acquired spatial position information of the feature points.
2. The method of claim 1, wherein constructing the feature database comprises:
acquiring multi-frame images of the target object under different viewpoints;
respectively extracting feature point information of the multi-frame images under different viewpoints;
performing feature point matching between the extracted frame images under different viewpoints to obtain matching information;
performing three-dimensional reconstruction on the target object based on the obtained matching information to obtain a three-dimensional reconstruction result;
and constructing the feature database based on the three-dimensional reconstruction result.
3. The method of claim 2, wherein constructing the feature database based on the three-dimensional reconstruction result comprises:
obtaining spatial position information of the reconstructed feature points based on the three-dimensional reconstruction result;
selecting N frame images under different viewpoints as reference frame images; n is a positive integer greater than 1;
and constructing a feature database comprising the reference frame image, the feature points on the reference frame image and the spatial position information of the feature points on the reference frame image.
4. The method of claim 1, wherein matching the extracted feature point information with the feature point information of the frame images in the feature database to perform object identification on the target object comprises:
matching the feature points of each frame of image in the feature database with the extracted feature points;
and determining that the object identification of the target object is successful when the number of successfully matched feature points is greater than a preset threshold.
5. The method of claim 1, wherein the method further comprises:
and when the object identification is successful, identifying the target object in continuously acquired frame images of the target object, so as to realize target tracking of the target object.
6. The method of claim 1, wherein the method further comprises:
and superimposing a virtual object at a preset position of the target object in the process of displaying the target object, according to the acquisition pose of the image corresponding to the target object.
7. The method of claim 1, wherein the method further comprises:
in response to the object identification of the target object being successful while feature points that failed to match exist among the extracted feature points,
acquiring the frame image in the feature database whose corresponding acquisition pose has the highest similarity to the acquisition pose of the image of the target object;
and matching the image of the target object with the acquired frame image based on projection features, to obtain the spatial position information of the feature points that failed to match.
8. An electronic device, characterized in that the electronic device comprises:
a memory for storing an executable program;
a processor for implementing, by executing the executable program stored in the memory:
constructing a feature database; the feature database comprises feature point information of multi-frame images of the target object;
collecting an image of the target object, and extracting feature point information of the collected image;
matching the extracted feature point information with feature point information of frame images in the feature database to perform object identification on the target object;
when the object identification is successful, acquiring the spatial position information of the extracted feature points;
and determining the acquisition pose of the image of the target object based on the acquired spatial position information of the feature points.
9. The electronic device of claim 8,
the processor is further configured to match feature points of each frame of image in the feature database with the extracted feature points;
and determine that the object identification of the target object is successful when the number of successfully matched feature points is greater than a preset threshold.
10. The electronic device of claim 8,
the processor is further configured to, in response to the object identification of the target object being successful while feature points that failed to match exist among the extracted feature points,
acquire the frame image in the feature database whose corresponding acquisition pose has the highest similarity to the acquisition pose of the image of the target object;
and match the image of the target object with the acquired frame image based on projection features, to obtain the spatial position information of the feature points that failed to match.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711299204.XA | 2017-12-08 | 2017-12-08 | A kind of information processing method and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108108748A (en) | 2018-06-01 |
Family
ID=62208206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711299204.XA Pending CN108108748A (en) | 2017-12-08 | 2017-12-08 | A kind of information processing method and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108108748A (en) |
2017-12-08: Application CN201711299204.XA filed in China (CN); published as CN108108748A; status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101719286A (en) * | 2009-12-09 | 2010-06-02 | 北京大学 | Multi-viewpoint three-dimensional scene reconstruction method fusing single-viewpoint scene analysis, and system thereof |
US8798357B2 (en) * | 2012-07-09 | 2014-08-05 | Microsoft Corporation | Image-based localization |
CN103177468A (en) * | 2013-03-29 | 2013-06-26 | 渤海大学 | Markerless augmented reality registration method for three-dimensional moving objects |
CN104463108A (en) * | 2014-11-21 | 2015-03-25 | 山东大学 | Monocular real-time target recognition and pose measurement method |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109074757A (en) * | 2018-07-03 | 2018-12-21 | 深圳前海达闼云端智能科技有限公司 | Method, terminal and computer-readable storage medium for building a map |
CN109582147B (en) * | 2018-08-08 | 2022-04-26 | 亮风台(上海)信息科技有限公司 | Method for presenting enhanced interactive content and user equipment |
CN109582147A (en) * | 2018-08-08 | 2019-04-05 | 亮风台(上海)信息科技有限公司 | Method and user equipment for presenting enhanced interactive content |
CN109656364A (en) * | 2018-08-15 | 2019-04-19 | 亮风台(上海)信息科技有限公司 | Method and apparatus for presenting augmented reality content on a user device |
CN109656364B (en) * | 2018-08-15 | 2022-03-29 | 亮风台(上海)信息科技有限公司 | Method and device for presenting augmented reality content on user equipment |
CN109656363A (en) * | 2018-09-04 | 2019-04-19 | 亮风台(上海)信息科技有限公司 | Method and apparatus for setting enhanced interactive content |
CN109656363B (en) * | 2018-09-04 | 2022-04-15 | 亮风台(上海)信息科技有限公司 | Method and equipment for setting enhanced interactive content |
CN109584377A (en) * | 2018-09-04 | 2019-04-05 | 亮风台(上海)信息科技有限公司 | Method and apparatus for presenting augmented reality content |
CN109584377B (en) * | 2018-09-04 | 2023-08-29 | 亮风台(上海)信息科技有限公司 | Method and device for presenting augmented reality content |
CN109272438A (en) * | 2018-09-05 | 2019-01-25 | 联想(北京)有限公司 | Data acquisition method, electronic equipment and computer-readable storage medium |
CN110956644A (en) * | 2018-09-27 | 2020-04-03 | 杭州海康威视数字技术股份有限公司 | Motion trajectory determination method and system |
CN110956644B (en) * | 2018-09-27 | 2023-10-10 | 杭州海康威视数字技术股份有限公司 | Motion trajectory determination method and system |
CN109559347A (en) * | 2018-11-28 | 2019-04-02 | 中南大学 | Object recognition method, device, system and storage medium |
CN109657573A (en) * | 2018-12-04 | 2019-04-19 | 联想(北京)有限公司 | Image recognition method and device, and electronic equipment |
CN110246163B (en) * | 2019-05-17 | 2023-06-23 | 联想(上海)信息技术有限公司 | Image processing method, image processing device, image processing apparatus, and computer storage medium |
CN110246163A (en) * | 2019-05-17 | 2019-09-17 | 联想(上海)信息技术有限公司 | Image processing method, device, equipment and computer storage medium |
CN110404202A (en) * | 2019-06-28 | 2019-11-05 | 北京市政建设集团有限责任公司 | Detection method and device for an aerial work safety belt, and aerial work safety belt |
CN110428468A (en) * | 2019-08-12 | 2019-11-08 | 北京字节跳动网络技术有限公司 | Position coordinate generation system and method for a wearable display device |
CN110728245A (en) * | 2019-10-17 | 2020-01-24 | 珠海格力电器股份有限公司 | Optimization method and device for VSLAM front-end processing, electronic equipment and storage medium |
CN111311758A (en) * | 2020-02-24 | 2020-06-19 | Oppo广东移动通信有限公司 | Augmented reality processing method and device, storage medium and electronic equipment |
CN111457886B (en) * | 2020-04-01 | 2022-06-21 | 北京迈格威科技有限公司 | Distance determination method, device and system |
CN111457886A (en) * | 2020-04-01 | 2020-07-28 | 北京迈格威科技有限公司 | Distance determination method, device and system |
CN111882590A (en) * | 2020-06-24 | 2020-11-03 | 广州万维创新科技有限公司 | AR scene application method based on single picture positioning |
CN112288878A (en) * | 2020-10-29 | 2021-01-29 | 字节跳动有限公司 | Augmented reality preview method and preview device, electronic device and storage medium |
CN112288878B (en) * | 2020-10-29 | 2024-01-26 | 字节跳动有限公司 | Augmented reality preview method and preview device, electronic equipment and storage medium |
WO2022267781A1 (en) * | 2021-06-26 | 2022-12-29 | 华为技术有限公司 | Modeling method and related electronic device, and storage medium |
CN113657164A (en) * | 2021-07-15 | 2021-11-16 | 美智纵横科技有限责任公司 | Method and device for calibrating target object, cleaning equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108108748A (en) | Information processing method and electronic equipment | |
CN109711243B (en) | Static three-dimensional face liveness detection method based on deep learning | |
CN109376681B (en) | Multi-person pose estimation method and system | |
CN107808111B (en) | Method and apparatus for pedestrian detection and pose estimation | |
US8885920B2 (en) | Image processing apparatus and method | |
CN106503671B (en) | Method and apparatus for determining face pose | |
CN105243395B (en) | Human body image comparison method and device | |
US20150279075A1 (en) | Recording animation of rigid objects using a single 3d scanner | |
CN110648397B (en) | Scene map generation method and device, storage medium and electronic equipment | |
Ling et al. | Virtual contour guided video object inpainting using posture mapping and retrieval | |
CN112528902B (en) | Video monitoring dynamic face recognition method and device based on 3D face model | |
CN108109164B (en) | Information processing method and electronic equipment | |
CN111273772A (en) | Augmented reality interaction method and device based on SLAM mapping | |
JP6290760B2 (en) | Work similarity calculation method, apparatus and program | |
JP4938748B2 (en) | Image recognition apparatus and program | |
CN116843834A (en) | Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment | |
US20160110909A1 (en) | Method and apparatus for creating texture map and method of creating database | |
US9392146B2 (en) | Apparatus and method for extracting object | |
CN112396654B (en) | Method and device for determining pose of tracked object in image tracking process | |
CN105631938B (en) | Image processing method and electronic equipment | |
CN115620403A (en) | Living body detection method, electronic device, and storage medium | |
Elloumi et al. | Tracking orthogonal vanishing points in video sequences for a reliable camera orientation in Manhattan world | |
Blažević et al. | Towards reversible de-identification in video sequences using 3D avatars and steganography | |
CN112449701B (en) | Learning template representation library | |
JP3122290B2 (en) | Gesture video recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180601 |