CN112819892B - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
CN112819892B
Authority
CN
China
Prior art keywords
image
marker
frame
pose
target
Prior art date
Legal status
Active
Application number
CN202110177906.0A
Other languages
Chinese (zh)
Other versions
CN112819892A (en)
Inventor
齐越
谢振威
王君义
高连生
李弘毅
Current Assignee
Shenzhen Beihang Emerging Industrial Technology Research Institute
Beihang University
Original Assignee
Shenzhen Beihang Emerging Industrial Technology Research Institute
Beihang University
Priority date
Filing date
Publication date
Application filed by Shenzhen Beihang Emerging Industrial Technology Research Institute and Beihang University
Priority to CN202110177906.0A
Publication of CN112819892A
Application granted
Publication of CN112819892B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides an image processing method and device. When an image is rendered, texture feature point information of each frame of image in a video to be processed and marker information in each frame of image are acquired; the pose of the camera in the video to be processed and the pose of the target marker among the markers are determined according to the texture feature point information and the marker information of each frame of image; the target position of a preset virtual image in each frame of image is determined according to the pose of the camera and the pose of the target marker; and each frame of image is rendered based on the target position to obtain, for each frame, a target image containing the preset virtual image. Because the pose of the camera is determined from both the texture feature point information and the marker information in the image, the low pose accuracy that results from sparse texture feature information is avoided, the accuracy of the camera pose is improved, and the accuracy of the rendering result of the original image is improved accordingly.

Description

Image processing method and device
Technical Field
The present application relates to the field of augmented reality technologies, and in particular, to an image processing method and apparatus.
Background
Augmented Reality (AR) technology fuses virtual information with the real world. It is increasingly applied in fields such as entertainment, smart home, and industrial manufacturing, and a key component of AR is its three-dimensional registration technology.
In the prior art, a three-dimensional registration method based on simultaneous localization and mapping (SLAM) is usually used. In this scheme, information about the real environment is acquired through a device such as a camera, and texture feature information is extracted from it; feature point information is then extracted from the texture feature information with a feature extraction algorithm, and the camera pose is determined based on the feature point information, so that the original image is rendered according to the camera pose to obtain a virtual-real fused image.
However, when the camera pose is determined based on feature point information alone, the accuracy of the determined camera pose may be low if the real environment contains little texture feature information, so the accuracy of the rendering result of the original image may also be low.
Disclosure of Invention
The embodiment of the application provides an image processing method and device, which improve the accuracy of a camera pose, so that the accuracy of a rendering result of an original image is improved.
In a first aspect, an embodiment of the present application provides an image processing method, where the image processing method includes:
acquiring texture feature point information of each frame of image in a video to be processed and marker information in each frame of image.
And determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the marker information in each frame of image.
And determining the target position of a preset virtual image in each frame image according to the pose of the camera and the pose of the target marker.
And rendering each frame of image based on the target position to obtain each frame of target image containing the preset virtual image.
In a possible implementation manner, the determining, according to the pose of the camera and the pose of the target marker, a target position of a preset virtual image in each frame image includes:
and analyzing the pose of the camera and the pose of the target marker, and establishing a conversion relation between the pose of the camera and the pose of the target marker.
And determining the target position of the preset virtual image in each frame image according to the conversion relation and the parameters of the preset virtual image.
In a possible implementation manner, the determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the marker information in each frame of image includes:
and matching the characteristic points of the target marker in any two continuous frames of images in the video to be processed.
And determining the two frames of images with the highest matching degree of the feature points as two continuous frames of images to be used.
And determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the target markers in the images to be used of the two continuous frames.
In a possible implementation manner, the determining, according to the texture feature point information of each frame of image and the target markers in the two consecutive frames of images to be used, the pose of the camera in the video to be processed and the pose of the target markers in the markers includes:
and matching the target markers in the two continuous frames of images to be used, and determining each target marker pair corresponding to the two continuous frames of images to be used.
And determining a transformation matrix between the camera coordinate system corresponding to each target marker and the target marker coordinate system according to the camera coordinate system and the marker coordinate system corresponding to each target marker.
And determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the transformation matrix.
In a possible implementation manner, the determining, according to the texture feature point information of each frame of image and the transformation matrix, a pose of a camera in the video to be processed and a pose of a target marker in each marker includes:
determining an epipolar constraint described by the transformation matrix.
And determining, from the texture feature point information of each frame of image and the target markers in each frame of image, the sum of the number of texture feature points and the number of markers that satisfy the epipolar constraint.
And if the sum of the number is greater than a preset threshold value, determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the transformation matrix.
In one possible implementation, the method further includes:
and calculating the projection error between the camera coordinate system and the marker coordinate system corresponding to each target marker.
And optimizing the transformation matrix according to the projection error, and determining an updated transformation matrix.
In one possible implementation, the method further includes:
and transforming the marker information in each frame of image to obtain the marker information in the preset direction.
And determining the target marker in the marker information in each frame of image according to the marker information in the preset position.
In one possible implementation manner, the acquiring marker information in each frame of image includes:
and preprocessing each frame of image, and extracting the contour information of each frame of image.
And analyzing the contour information of each frame of image to obtain the marker information in each frame of image.
In a second aspect, an embodiment of the present application provides an image processing apparatus including:
the device comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring the texture feature point information of each frame of image in the video to be processed and the marker information in each frame of image.
And the determining unit is used for determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the marker information in each frame of image.
The determining unit is further configured to determine a target position of a preset virtual image in each frame image according to the pose of the camera and the pose of the target marker.
And the processing unit is used for rendering each frame of image based on the target position to obtain each frame of target image containing the preset virtual image.
In a possible implementation manner, the determining unit is specifically configured to parse the pose of the camera and the pose of the target marker, and establish a transformation relationship between the pose of the camera and the pose of the target marker; and determining the target position of the preset virtual image in each frame image according to the conversion relation and the parameters of the preset virtual image.
In a possible implementation manner, the determining unit is specifically configured to match feature points of a target marker in any two consecutive frames of images in the video to be processed; determining two frames of images with the highest feature point matching degree as two continuous frames of images to be used; and determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the target markers in the images to be used of the two continuous frames.
In a possible implementation manner, the determining unit is specifically configured to match target markers in the two consecutive frames of images to be used, and determine each target marker pair corresponding to the two consecutive frames of images to be used. And determining a transformation matrix between the camera coordinate system corresponding to each target marker and the target marker coordinate system according to the camera coordinate system and the marker coordinate system corresponding to each target marker. And determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the transformation matrix.
In a possible implementation manner, the determining unit is specifically configured to determine an epipolar constraint described by the transformation matrix; determine, from the texture feature point information of each frame of image and the target markers in each frame of image, the sum of the number of texture feature points and markers that satisfy the epipolar constraint; and if the sum is greater than a preset threshold, determine the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the transformation matrix.
In a possible implementation manner, the determining unit is further configured to calculate projection errors between a camera coordinate system and a marker coordinate system respectively corresponding to the target markers; and optimizing the transformation matrix according to the projection error, and determining an updated transformation matrix.
In a possible implementation manner, the obtaining unit is further configured to perform transformation processing on the marker information in each frame of image to obtain marker information in a preset orientation; and determine the target marker in the marker information in each frame of image according to the marker information in the preset orientation.
In a possible implementation manner, the obtaining unit is specifically configured to perform preprocessing on each frame of image, and extract contour information of each frame of image; and analyzing the contour information of each frame of image to obtain the marker information in each frame of image.
In a third aspect, an embodiment of the present application further provides an image processing apparatus, which may include a memory and a processor, wherein:
the memory is used for storing computer programs.
The processor is configured to read the computer program stored in the memory, and execute the image processing method in any one of the possible implementation manners of the first aspect according to the computer program in the memory.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where a computer-executable instruction is stored in the computer-readable storage medium, and when a processor executes the computer-executable instruction, the image processing method described in any one of the foregoing possible implementation manners of the first aspect is implemented.
In a fifth aspect, this application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the image processing method described in any one of the possible implementation manners of the first aspect.
Therefore, according to the image processing method and device provided by the embodiment of the application, when an image is rendered, the texture feature point information of each frame of image in the video to be processed and the marker information in each frame of image are obtained; the pose of the camera in the video to be processed and the pose of the target marker in each marker are determined according to the texture feature point information of each frame of image and the marker information in each frame of image; the target position of the preset virtual image in each frame of image is determined according to the pose of the camera and the pose of the target marker; and each frame of image is rendered based on the target position to obtain each frame of target image containing the preset virtual image. Because the pose of the camera is determined through the texture feature point information and the marker information in the image, the low accuracy of the determined camera pose caused by sparse texture feature information is avoided, the accuracy of the camera pose is improved, and the accuracy of the rendering result of the original image is improved.
Drawings
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a process of determining a pose of a camera in a video to be processed and a pose of a target marker in each marker according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 4 is a schematic view of a target marker provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a virtual-real fusion scene provided in an embodiment of the present application;
fig. 6 is a schematic diagram of another virtual-real fusion scene provided in the embodiment of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. The drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In the description of the text of the present application, the character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The technical scheme provided by the embodiment of the application can be applied to augmented reality scenes. Augmented Reality (AR) technology fuses virtual information with the real world; it is increasingly applied in fields such as entertainment, smart home, and industrial manufacturing, and a key component of AR is its three-dimensional registration technology. When the camera moves, part of the tracked scene may move out of the field of view, so that some feature points can no longer be tracked; in addition, tracking may fail because of noise, occlusion, and other causes, so the number of tracked feature points keeps decreasing.
In the prior art, a three-dimensional registration method based on simultaneous localization and mapping (SLAM) is usually used. The scheme collects information about the real environment and extracts texture feature information from it; feature point information is extracted from the texture feature information with a feature extraction algorithm, the feature points in two adjacent frames of images are matched, and feature point pairs are computed. A transformation matrix between the two frames of images is then calculated from the feature point pairs and the camera pose is determined, so that the original image is rendered according to the camera pose to obtain a virtual-real fused image.
However, when the pose of the camera is determined from the feature point information in the texture feature information, the texture feature information in the real environment may be too sparse to extract sufficient feature points, so the accuracy of the finally determined camera pose is low and the accuracy of the rendering result of the original image is low as well.
In order to avoid this problem, markers can be placed in the real environment so that sufficient feature points exist in it; the transformation relation between the camera pose and the marker pose can then be determined according to the marker information to obtain a transformation matrix, so that the original image can be rendered accurately.
Based on the above concept, the embodiment of the present application provides an image processing method, which obtains texture feature point information of each frame of image in a video to be processed and marker information in each frame of image when rendering an image; determining the pose of a camera in the video to be processed and the pose of a target marker in each marker according to the texture feature point information of each frame of image and the marker information in each frame of image; determining the target position of the preset virtual image in each frame image according to the pose of the camera and the pose of the target marker; and rendering each frame of image based on the target position to obtain each frame of target image containing the preset virtual image.
Therefore, in the embodiment of the application, the pose of the camera is determined through the texture feature point information and the marker information in the image, the problem that the accuracy of the determined pose of the camera is low due to less texture feature information can be avoided, the accuracy of the pose of the camera is improved, and the accuracy of the rendering result of the original image is improved.
Hereinafter, the image processing method provided by the present application will be described in detail by specific examples. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure. The image processing method may be performed by software and/or hardware means, for example, the hardware means may be an image processing means, which may be a terminal or a processing chip in the terminal. For example, referring to fig. 1, the image processing method may include:
s101, obtaining texture feature point information of each frame of image in a video to be processed and marker information in each frame of image.
For example, the marker in the embodiment of the present application may be a square paper sheet containing a pattern generated by a certain encoding mode, for example Hamming code encoding, or it may take another form, which is not limited in this embodiment of the application. The marker is placed on surfaces of the real environment that carry little texture feature point information, such as walls or floors, so that the feature point information available in the real environment is increased.
For example, when the texture feature point information of each frame of image in the video to be processed is obtained, the texture information in the real environment corresponding to each frame of image may be extracted, and the texture feature point information of each frame of image is obtained through a feature point extraction algorithm. The feature point extraction algorithm may be the Oriented FAST and Rotated BRIEF (ORB) algorithm, the Scale Invariant Feature Transform (SIFT) algorithm, a deep learning algorithm, and so on; the specific algorithm is not limited in the embodiment of the present application. The video to be processed can be acquired through a camera or other shooting equipment, and the acquisition equipment of the video to be processed is likewise not limited in the embodiment of the application.
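For illustration only, the texture feature point extraction described above can be sketched in Python with OpenCV roughly as follows; the function name and parameter values are illustrative assumptions rather than part of the patented method.

    import cv2

    def extract_texture_feature_points(frame):
        """Extract ORB texture feature points and descriptors from one frame of the video."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # ORB (Oriented FAST and Rotated BRIEF) is one of the feature extraction
        # algorithms mentioned above; SIFT or a learned detector could be used instead.
        orb = cv2.ORB_create(nfeatures=1000)
        keypoints, descriptors = orb.detectAndCompute(gray, None)
        return keypoints, descriptors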
When the marker information in each frame of image in the video to be processed is obtained, each frame of image is preprocessed and the contour information of each frame of image is extracted; the contour information of each frame of image is then analyzed to obtain the marker information in each frame of image. It can be understood that, when markers are obtained from contour information, the influence of non-markers in the real environment whose contours resemble those of the markers cannot be avoided; therefore, in the embodiment of the present application, the obtained marker information includes both the information of the target markers and environment information whose contour is similar to that of the target markers.
In the embodiment of the application, by extracting the contour information in each frame of image in the video to be processed, the markers can be preliminarily screened according to their contours, so that candidates whose contours differ greatly from a marker contour are excluded.
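As a rough sketch of the contour-based preliminary screening described above (again an illustrative assumption, using OpenCV; the threshold and area values are arbitrary):

    import cv2

    def extract_marker_candidates(frame):
        """Preprocess a frame and return quadrilateral contours as marker candidates."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Adaptive thresholding makes the dark marker border stand out from the background.
        binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                       cv2.THRESH_BINARY_INV, 11, 7)
        contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
        candidates = []
        for contour in contours:
            approx = cv2.approxPolyDP(contour, 0.03 * cv2.arcLength(contour, True), True)
            # Keep only convex quadrilaterals of reasonable size; non-marker contours that
            # survive this screening are rejected later when their code cannot be decoded.
            if len(approx) == 4 and cv2.isContourConvex(approx) and cv2.contourArea(approx) > 400:
                candidates.append(approx.reshape(4, 2))
        return candidates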
After obtaining the texture feature point information of each frame of image in the video to be processed and the marker information in each frame of image, the following S102 may be executed:
s102, determining the pose of a camera in the video to be processed and the pose of a target marker in each marker according to the texture feature point information of each frame of image and the marker information in each frame of image.
For example, before the pose of the camera in the video to be processed and the pose of the target marker in each marker are determined according to the texture feature point information of each frame of image and the marker information in each frame of image, the target marker needs to be determined. When the target marker is determined, the marker information in each frame of image can first be transformed to obtain the marker information in a preset orientation; the target marker is then determined in the marker information of each frame of image according to the marker information in the preset orientation.
It can be understood that when the marker information in each frame of image is transformed to obtain the marker information in the preset orientation, the pixel coordinates of four corner points of the marker are read, the markers in various orientations are transformed into the markers in the preset orientation in a projection transformation manner, and the marker information in the preset orientation is obtained. Wherein the preset orientation is the orientation when the marker is viewed in elevation, i.e. the front view of the marker.
When the target marker is determined in the marker information of each frame of image according to the marker information in the preset orientation, the coding information in the marker information can be read, whether the marker is the target marker can be determined according to that coding information, and the information of the target marker can be obtained: if the information of a marker contains valid coding information, the marker is determined to be a target marker; otherwise, it is not.
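A minimal sketch of this rectification and decoding step is given below; it assumes a square code grid like the Hamming-coded marker of fig. 4, and the grid size and the helper is_valid_hamming_code are hypothetical.

    import cv2
    import numpy as np

    def rectify_and_decode(gray, corners, warp_size=60, grid=6):
        """Warp a marker candidate to its front view (preset orientation) and sample its code bits."""
        dst = np.array([[0, 0], [warp_size - 1, 0],
                        [warp_size - 1, warp_size - 1], [0, warp_size - 1]], dtype=np.float32)
        homography = cv2.getPerspectiveTransform(corners.astype(np.float32), dst)
        front_view = cv2.warpPerspective(gray, homography, (warp_size, warp_size))
        _, bw = cv2.threshold(front_view, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        cell = warp_size // grid
        bits = [[int(bw[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell].mean() > 127)
                 for c in range(grid)] for r in range(grid)]
        # The candidate is accepted as a target marker only if its bits form a valid
        # codeword; is_valid_hamming_code is a hypothetical helper for that check.
        return bits if is_valid_hamming_code(bits) else None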
For example, the target marker information may include, in addition to the corner coordinates of the target marker, the encoding information, and the like, a mapping relationship between the target marker and each frame of image, that is, which target markers are included in each frame of image. In addition, after all the target markers are determined, a target marker map can be established, so that if a new video frame image appears, the target markers in the new video frame image can be directly synchronized into the established target marker map.
In the embodiment of the application, the information of the target marker is determined by analyzing and processing the marker information in each frame of image, so that the acquired information of the target marker is more accurate.
For example, when the pose of the camera in the video to be processed and the pose of the target marker in each marker are determined according to the texture feature point information of each frame of image and the marker information in each frame of image, the 6-degree-of-freedom pose of the target marker in the camera coordinate system can be calculated through a PnP (Perspective-n-Point) algorithm and the corner coordinates of the target marker, and the pose of the camera in the video to be processed and the pose of the target marker in each marker are thereby determined. The PnP algorithm is a method for solving for relative motion from 3-dimensional to 2-dimensional point correspondences.
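For illustration, the PnP step can be sketched with OpenCV's solvePnP, assuming the physical side length of the marker is known; the function and argument names are illustrative.

    import cv2
    import numpy as np

    def marker_pose_from_corners(image_corners, marker_size, K, dist_coeffs=None):
        """Estimate the 6-degree-of-freedom pose of a marker in the camera frame from its corners."""
        half = marker_size / 2.0
        # 3-D corner coordinates in the marker coordinate system (marker centre at the origin).
        object_points = np.array([[-half,  half, 0], [ half,  half, 0],
                                  [ half, -half, 0], [-half, -half, 0]], dtype=np.float32)
        ok, rvec, tvec = cv2.solvePnP(object_points, image_corners.astype(np.float32),
                                      K, dist_coeffs, flags=cv2.SOLVEPNP_IPPE_SQUARE)
        return (rvec, tvec) if ok else None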
S103, determining the target position of the preset virtual image in each frame image according to the pose of the camera and the pose of the target marker.
For example, when the target position of the preset virtual image in each frame image is determined according to the pose of the camera and the pose of the target marker, the pose of the camera and the pose of the target marker may be analyzed, and a transformation relationship between the pose of the camera and the pose of the target marker may be established; and determining the target position of the preset virtual image in each frame image according to the conversion relation and the parameters of the preset virtual image.
The parameters of the virtual image may include information such as the shape, color, and size of the virtual image; they are not specifically limited in this embodiment of the application.
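A sketch of how the conversion relation and the resulting target position could be computed is given below; 4x4 homogeneous poses and an anchor offset of the virtual image in marker coordinates are assumed for illustration.

    import numpy as np

    def virtual_image_anchor(T_cw, T_mw, offset_in_marker, K):
        """Return the target pixel position of the preset virtual image in one frame.

        T_cw: 4x4 camera pose (world to camera); T_mw: 4x4 target marker pose (world to marker);
        offset_in_marker: 3-D point, in marker coordinates, where the virtual image is anchored.
        """
        T_cm = T_cw @ np.linalg.inv(T_mw)                 # conversion relation: marker to camera
        p_cam = T_cm @ np.append(offset_in_marker, 1.0)   # anchor point in camera coordinates
        u, v, w = K @ p_cam[:3]
        return np.array([u / w, v / w])                   # target position in pixel coordinates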
And S104, rendering each frame of image based on the target position to obtain each frame of target image containing the preset virtual image.
For example, when each frame of image is rendered based on the target position, a model of the preset virtual image may be built according to the parameters of the preset virtual image and the target preset virtual image drawn; the target preset virtual image is then fused with each frame of image in the video to be processed to obtain each frame of target image containing the preset virtual image. The virtual image may be an image of a virtual character, or an image of a virtual object at any angle, which is not limited in this embodiment.
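As an illustrative sketch of the virtual-real fusion itself, assuming the virtual image has already been drawn as an RGBA sprite (the anchoring convention is an assumption):

    import numpy as np

    def fuse_virtual_and_real(frame, virtual_rgba, target_xy):
        """Overlay a drawn virtual image (RGBA) onto one original frame at the target position."""
        h, w = virtual_rgba.shape[:2]
        x = int(target_xy[0] - w / 2)          # anchor the sprite bottom-centre at the target
        y = int(target_xy[1] - h)
        roi = frame[y:y + h, x:x + w]          # a full implementation would clip to the frame
        alpha = virtual_rgba[:, :, 3:4] / 255.0
        roi[:] = (alpha * virtual_rgba[:, :, :3] + (1.0 - alpha) * roi).astype(np.uint8)
        return frame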
After each frame of image is rendered based on the target position to obtain each frame of target image containing the preset virtual image, each frame of target image containing the preset virtual image can be output to enable a user to view.
Therefore, the image processing method provided by the embodiment of the application obtains, when rendering an image, the texture feature point information of each frame of image in the video to be processed and the marker information in each frame of image; determines the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the marker information in each frame of image; determines the target position of the preset virtual image in each frame of image according to the pose of the camera and the pose of the target marker; and renders each frame of image based on the target position to obtain each frame of target image containing the preset virtual image. Because the pose of the camera is determined through the texture feature point information and the marker information in the image, the low accuracy of the determined camera pose caused by sparse texture feature information is avoided, the accuracy of the camera pose is improved, and the accuracy of the rendering result of the original image is improved accordingly.
Based on the above-described embodiment shown in fig. 1, in order to facilitate understanding of how to determine the pose of the camera in the video to be processed and the pose of the target marker in each marker in the present embodiment according to the texture feature point information of each frame image and the marker information in each frame image, in the following, how to determine the pose of the camera in the video to be processed and the pose of the target marker in each marker in the present embodiment according to the texture feature point information of each frame image and the marker information in each frame image will be described in detail by the embodiment shown in fig. 2.
Fig. 2 is a schematic flowchart of a process of determining a pose of a camera in a video to be processed and a pose of a target marker in each marker according to an embodiment of the present application. The method of determining the pose of the camera and the pose of the target marker in each marker in the video to be processed may be performed by software and/or hardware means. For example, referring to fig. 2, the method for determining the pose of the camera and the pose of the target markers in each marker in the video to be processed may include:
s201, matching the target markers in the two continuous frames of images to be used, and determining each target marker pair corresponding to the two continuous frames of images to be used.
For example, before matching the target markers in two consecutive images to be used, the two consecutive images to be used need to be determined. When two continuous frames of images to be used are determined, the feature points of the target marker in any two continuous frames of images in the video to be processed can be matched; and determining the two frames of images with the highest matching degree of the feature points as two continuous frames of images to be used.
In order to match the feature points of the target marker, the target markers may be numbered so that matching can be performed according to the marker number, that is, target markers with the same number in the two consecutive frames of images to be used are matched, which amounts to matching their feature points. The embodiments of the present application take matching by target marker number as an example, but are not limited thereto.
For example, in addition to matching the target markers, the texture feature points may be matched according to the texture feature point information of each frame of image, and texture feature point pairs determined.
In the embodiment of the application, the two frames of images with the highest feature point matching degree are determined as the two continuous frames of images to be used, so that the obtained two continuous frames of images to be used contain more feature point information, and the accuracy of determination results of the pose of the camera in the video to be processed and the pose of the target marker in each marker is improved.
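For illustration, the matching of feature points and of numbered target markers between two consecutive frames could be sketched as follows; descriptor matching uses OpenCV, and the marker_id attribute is an assumed field of a detected-marker record.

    import cv2

    def match_consecutive_frames(desc_prev, desc_curr, ratio=0.75):
        """Match ORB descriptors of two consecutive frames using Lowe's ratio test."""
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        knn = matcher.knnMatch(desc_prev, desc_curr, k=2)
        good = [m for m, n in knn if m.distance < ratio * n.distance]
        return good   # the consecutive frame pair with the most good matches is used

    def match_markers_by_id(markers_prev, markers_curr):
        """Pair target markers that carry the same number (code) in the two frames."""
        by_id = {m.marker_id: m for m in markers_prev}
        return [(by_id[m.marker_id], m) for m in markers_curr if m.marker_id in by_id]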
S202, determining a transformation matrix between the camera coordinate system and the target marker coordinate system corresponding to each target marker according to the camera coordinate system and the marker coordinate system corresponding to each target marker.
For example, when determining a transformation matrix between a camera coordinate system and a target marker coordinate system corresponding to each target marker according to the camera coordinate system and the marker coordinate system corresponding to each target marker, matching target markers in two consecutive frames of images to be used, and determining each target marker pair corresponding to two consecutive frames of images to be used; and determining a transformation matrix between the camera coordinate system and the target marker coordinate system corresponding to each target marker according to the camera coordinate system and the marker coordinate system corresponding to each target marker. Wherein each target marker is a target marker corresponding to each marker pair.
It can be understood that, when determining the transformation matrix between the camera coordinate system and the target marker coordinate system corresponding to each target marker according to the camera coordinate system and the marker coordinate system corresponding to each target marker, one target marker pair of all target marker pairs may be selected, that is, different projections of the same target marker in two different camera coordinate systems corresponding to two consecutive frames of images may be selected, or a plurality of target marker pairs may be selected.
For example, when the transformation matrix between the camera coordinate system and the target marker coordinate system is determined, assume that in a target marker pair the transformations of the two camera coordinate systems with respect to the target marker coordinate system are Tmc1 and Tmc2, respectively. The transformation matrix between the camera coordinate system and the target marker coordinate system, that is, the transformation matrix between the camera coordinate system and the target marker coordinate system corresponding to each target marker, can then be calculated by the following equations (1), (2), and (3).
T21 = Tmc2⁻¹ · Tmc1 (1)
E = t^ · R (2)
F = K⁻¹ · E · K (3)
Where t denotes the translation vector extracted from T21, t^ denotes its skew-symmetric matrix, R denotes the rotation matrix extracted from T21, E denotes the essential matrix, K denotes the intrinsic matrix of the camera, and F denotes the transformation matrix between the camera coordinate system and the target marker coordinate system.
If a plurality of target marker pairs are selected, the transformation matrix between each camera coordinate system and the target marker coordinate system can be sequentially calculated according to the method, and the optimal transformation matrix is selected as the transformation matrix between the camera coordinate system and the target marker coordinate system.
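A sketch that follows equations (1) to (3) directly is given below; equation (3) is reproduced as written above, although the conventional fundamental-matrix relation is F = K⁻ᵀ · E · K⁻¹.

    import numpy as np

    def relative_transform_and_f(Tmc1, Tmc2, K):
        """Apply equations (1)-(3) to one target marker pair."""
        T21 = np.linalg.inv(Tmc2) @ Tmc1                   # equation (1)
        R, t = T21[:3, :3], T21[:3, 3]
        t_hat = np.array([[0.0, -t[2], t[1]],              # skew-symmetric matrix of t
                          [t[2], 0.0, -t[0]],
                          [-t[1], t[0], 0.0]])
        E = t_hat @ R                                      # equation (2): essential matrix
        F = np.linalg.inv(K) @ E @ K                       # equation (3), as given in the text
        return T21, E, F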
In the embodiment of the application, the transformation matrix between the camera coordinate system and the target marker coordinate system respectively corresponding to each target marker is determined through the target marker pair in the images to be used in two continuous frames, so that the problem that the transformation matrix is calculated inaccurately by using the texture feature points is avoided, and the obtained transformation matrix is more accurate.
For example, to further verify the accuracy of the transformation matrix, in one possible implementation an epipolar constraint described by the transformation matrix may be determined; the sum of the number of texture feature points and target markers that satisfy the epipolar constraint is determined from the texture feature point information of each frame of image and the markers in each frame of image; and if that sum is greater than a preset threshold, the matrix is confirmed as the transformation matrix between the camera coordinate system and the target marker coordinate system corresponding to each target marker, so that the determined transformation matrix is more accurate. The size of the preset threshold is not limited in the embodiment of the present application.
For example, when the accuracy of the transformation matrix is checked, after the epipolar constraint described by the transformation matrix, the texture feature point information of each frame of image, and the markers in each frame of image have been determined, the ratio of the sum of the number of texture feature points and target markers that satisfy the epipolar constraint to the sum of the number of all texture feature points and target markers may be calculated; if the ratio is greater than a preset ratio, the matrix is confirmed as the transformation matrix between the camera coordinate system and the target marker coordinate system corresponding to each target marker. The size of the preset ratio is not limited in the embodiment of the present application.
When the sum of the number of texture feature points and target markers that satisfy the epipolar constraint is determined, the 3-dimensional positions of all feature points can be recovered by triangulation, the recovered 3-dimensional feature points projected with K·P_c, and the reprojection error calculated; feature points with large reprojection errors are eliminated. Here K denotes the camera intrinsic matrix and P_c denotes the coordinates of a 3-dimensional feature point in the camera coordinate system. Since the size of the target marker is known, the 3-dimensional position of the target marker recovered by triangulation is consistent with the real scale, which avoids the scale ambiguity that marker-free monocular visual simultaneous localization and mapping suffers from during map construction.
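An illustrative sketch of the epipolar-constraint check and the K·P_c reprojection test described above (the threshold value is an assumption):

    import numpy as np

    def count_epipolar_inliers(points1, points2, F, threshold=1.0):
        """Count feature point pairs that (approximately) satisfy x2ᵀ F x1 = 0."""
        inliers = 0
        for x1, x2 in zip(points1, points2):
            x1_h, x2_h = np.append(x1, 1.0), np.append(x2, 1.0)
            line = F @ x1_h                                  # epipolar line of x1 in image 2
            err = abs(x2_h @ line) / (np.hypot(line[0], line[1]) + 1e-12)
            if err < threshold:
                inliers += 1
        return inliers

    def reprojection_error(P_c, observed_px, K):
        """Reprojection error of a triangulated 3-D point P_c (camera coordinates)."""
        proj = K @ P_c                                       # K · P_c as in the text above
        return np.linalg.norm(proj[:2] / proj[2] - observed_px)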
In the embodiment of the application, whether the obtained transformation matrix is accurate or not is determined according to the number of all the texture feature points and the target markers and the number of the texture feature points and the target markers which meet the requirement of the transformation matrix, and the accuracy of rendering each frame of image is further improved.
And S203, determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the transformation matrix.
For example, before determining the pose of the camera in the video to be processed and the pose of the target marker in each marker, the projection error between the camera coordinate system and the marker coordinate system corresponding to each target marker can be calculated; and optimizing the transformation matrix according to the projection error, and determining the updated transformation matrix.
When the transformation matrix is optimized, the transformation relation from the corner point of the target marker in the target marker coordinate system to the camera coordinate system can be calculated by the following formula (4).
p = Tcw · Tmw⁻¹ · pm (4)
Where p denotes the position of the corner point of the target marker in the camera coordinate system, pm denotes the position of the corner point of the target marker in the target marker coordinate system, Tcw denotes the 6-degree-of-freedom pose of the camera, Tmw denotes the 6-degree-of-freedom pose of the target marker, and π denotes the function that converts points from the camera coordinate system to the pixel coordinate system.
Owing to pixel error, camera distortion, and other factors, the calculated 6-degree-of-freedom poses contain errors; the error term can be expressed by the following formula (5).
e = p̂ - π(p) (5)

where p̂ denotes the observed pixel position of the corner point in the image.
According to the error term, the error can be determined by a least square optimization method, namely, the transformation matrix is optimized according to the following formula (6) to determine an updated transformation matrix.
{Tcw, Tmw} = argmin Σ‖e‖² (6)
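A minimal sketch of the least-squares refinement of formula (6), simplified here to refining a single marker-to-camera transform parameterised as a rotation vector and a translation vector (SciPy and OpenCV are assumed):

    import cv2
    import numpy as np
    from scipy.optimize import least_squares

    def refine_marker_to_camera(x0, corners_marker, corners_observed, K):
        """Minimise the reprojection error of marker corners, as in formula (6).

        x0: initial 6-vector [rvec | tvec] for the marker-to-camera transform.
        corners_marker: Nx3 corner coordinates in the target marker coordinate system.
        corners_observed: Nx2 detected pixel positions of the same corners.
        """
        def residuals(x):
            rvec, tvec = x[:3], x[3:]
            projected, _ = cv2.projectPoints(corners_marker.astype(np.float32),
                                             rvec, tvec, K, None)
            return (projected.reshape(-1, 2) - corners_observed).ravel()

        result = least_squares(residuals, x0)   # least-squares optimisation of the error terms
        return result.x                         # parameters of the updated transformation matrix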
In the embodiment of the application, the transformation matrix is optimized and updated, so that the obtained transformation matrix is more accurate, and the rendering accuracy of each frame of image is improved.
For example, when the pose of the camera in the video to be processed and the pose of the target marker in each marker are determined according to the texture feature point information and the transformation matrix of each frame of image, the updated transformation matrix may be analyzed by the g2o method, so as to determine the pose of the camera in the video to be processed and the pose of the target marker in each marker.
Therefore, in the embodiment of the application, the target marker pairs corresponding to the two continuous frames of images to be used are determined by matching the target markers in the two continuous frames of images to be used; determining a transformation matrix between the camera coordinate system and the target marker coordinate system corresponding to each target marker according to the camera coordinate system and the marker coordinate system corresponding to each target marker; and determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information and the transformation matrix of each frame of image. The transformation matrix can be determined more accurately, so that the obtained pose of the camera in the video to be processed and the pose of the target marker in each marker are more accurate, and the rendering accuracy of each frame of image is improved.
Hereinafter, the image processing method provided in the embodiment of the present application will be described in detail by taking a specific application as an example, and specifically, refer to fig. 3, where fig. 3 is a schematic flow chart of another image processing method provided in the embodiment of the present application.
For example, as shown in fig. 3, the original frames corresponding to an original video are obtained by parsing the original video, that is, the video is split into a multi-frame image sequence; the pose estimation module analyzes the original frame images to obtain the camera pose and the target marker pose; the virtual-real fusion module receives and processes the camera pose and the target marker pose and draws the virtual image; and the virtual-real fusion module then fuses the virtual image with the original frame image to obtain a virtual-real fused image.
When the virtual-real fusion module receives and processes the camera pose and the target marker pose and draws the virtual image, the camera pose can be converted from the simultaneous localization and mapping coordinate system to the game engine coordinate system, namely from the SLAM coordinate system to the Unity coordinate system, and the scene information defined in an Extensible Markup Language file, namely a scene file in XML format, is loaded; the scene information includes the path and relative position of the model to be used and information related to the target marker. The virtual image is then drawn according to the transformed virtual coordinates and the loaded model.
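The SLAM-to-Unity conversion can be sketched as an axis flip, assuming the SLAM (OpenCV-style) frame is right-handed with y pointing down and the Unity frame is left-handed with y pointing up; the exact convention depends on the implementation.

    import numpy as np

    # Assumed conventions: SLAM/OpenCV frame is right-handed (x right, y down, z forward),
    # Unity frame is left-handed (x right, y up, z forward); flipping y converts between them.
    FLIP_Y = np.diag([1.0, -1.0, 1.0, 1.0])

    def slam_pose_to_unity(T_slam):
        """Convert a 4x4 camera pose from the SLAM coordinate system to the Unity convention."""
        return FLIP_Y @ T_slam @ FLIP_Y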
As shown in fig. 3, the pose estimation module includes a target marker recognition sub-module, a feature extraction sub-module, a simultaneous localization and mapping (SLAM) initialization sub-module, a SLAM back-end optimization sub-module, and a map multiplexing sub-module. When the pose estimation module analyzes the original frame images to obtain the camera pose and the target marker pose, the target marker recognition sub-module is responsible for recognizing the markers in the original frame, determines the target markers among the multiple markers by screening them, and extracts information related to the target markers; the method for screening the target markers and obtaining their information is the same as the method for determining the target marker in the above embodiments and is not repeated here. The information related to the target marker includes the name of the target marker and the rough pose of the target marker. The feature extraction sub-module uses OpenCV to extract ORB feature points in the original frame image, that is, the feature point information of the texture features of the original frame image. The SLAM initialization sub-module performs a simple analysis of the target marker information to obtain the transformation matrix between the target marker coordinate system and the camera coordinate system; the SLAM back-end optimization sub-module optimizes the target marker information and the feature point information of the texture features, that is, it optimizes the transformation matrix between the target marker coordinate system and the camera coordinate system to obtain an updated transformation matrix, and determines the camera pose and the accurate target marker pose; the map multiplexing sub-module provides the function of map reuse, that is, reusing the map described in the original frame images. It can be understood that the methods by which the target marker recognition sub-module, the feature extraction sub-module, the SLAM initialization sub-module, and the SLAM back-end optimization sub-module process the original frame image can refer to the above embodiments and are not repeated here.
For example, the target marker used in the embodiment of the present application may be a target marker in hamming code, which can be specifically shown in fig. 4. Fig. 4 is a schematic diagram of a target marker provided in an embodiment of the present application.
To further demonstrate the accuracy of the image processing method provided in the embodiment of the present application, the method shown in fig. 3 can be compared with the prior-art method. In a scene with rich texture features, both methods require about 10 initialization operations to obtain the transformation matrix between the camera coordinate system and the target marker coordinate system. In a scene lacking texture features, the method shown in fig. 3 still needs only about 10 operations, whereas the prior-art method needs hundreds or even fails to initialize; that is, the method shown in fig. 3 can still obtain the transformation matrix while the prior-art method cannot because the texture features are too sparse. In addition, the method shown in fig. 3 can reduce the error to 10%-50% of the original error.
For example, the virtual-real fused image obtained by the method shown in fig. 3 may be displayed in a terminal device, and fig. 5 and fig. 6 are both scene graphs obtained by transplanting the obtained virtual-real fused image into a mobile terminal. Fig. 5 is a schematic diagram of a virtual-real fusion scene provided in an embodiment of the present application, and fig. 6 is a schematic diagram of another virtual-real fusion scene provided in the embodiment of the present application.
Fig. 5 shows the combination of virtual object 1 with the real scene, and fig. 6 shows the combination of virtual object 2 with the real scene. The virtual objects in fig. 5 and 6 are both placed on the floor, which has few texture feature points. As can be seen from fig. 5 and 6, the rendering positions of the two virtual objects are both on floor areas of the real scene with few texture feature points, so the image processing method provided by the embodiment of the application can accurately render virtual objects into the real scene, and the obtained virtual-real fused images have a good effect.
In summary, the image processing method provided by the embodiment of the present application can determine the pose of the camera through the texture feature point information and the marker information in the video frame image, so as to avoid the problem of low accuracy of the determined pose of the camera due to less texture feature information, and improve the accuracy of the rendering result of the original image.
Fig. 7 is a schematic structural diagram of an image processing apparatus 70 according to an embodiment of the present application, and for example, please refer to fig. 7, the image processing apparatus 70 may include:
the obtaining unit 701 is configured to obtain texture feature point information of each frame of image in the video to be processed, and marker information in each frame of image.
A determining unit 702, configured to determine, according to the texture feature point information of each frame of image and the marker information in each frame of image, a pose of a camera in the video to be processed and a pose of a target marker in each marker.
The determining unit 702 is further configured to determine a target position of the preset virtual image in each frame image according to the pose of the camera and the pose of the target marker.
And the processing unit 704 is configured to render each frame of image based on the target position, so as to obtain each frame of target image including the preset virtual image.
Optionally, the determining unit 702 is specifically configured to analyze the pose of the camera and the pose of the target marker, and establish a transformation relationship between the pose of the camera and the pose of the target marker; and determining the target position of the preset virtual image in each frame image according to the conversion relation and the parameters of the preset virtual image.
Optionally, the determining unit 702 is specifically configured to match feature points of the target marker in any two consecutive frames of images in the video to be processed; determine the two frames of images with the highest feature point matching degree as the two consecutive frames of images to be used; and determine the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the target markers in the two consecutive frames of images to be used.
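A minimal sketch of selecting the two consecutive frames to be used: descriptors of the marker region are matched between each pair of consecutive frames with a brute-force Hamming matcher, and the pair with the most matches is kept. The matcher choice and the "count of matches" score are assumptions; the embodiment only requires the pair with the highest matching degree.

```python
import cv2

def best_consecutive_pair(descriptor_list):
    """descriptor_list[i]: ORB descriptors of the target marker in frame i.

    Returns the pair of indices (i, i + 1) with the highest matching degree,
    measured here simply as the number of cross-checked matches.
    """
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    best_i, best_score = 0, -1
    for i in range(len(descriptor_list) - 1):
        d1, d2 = descriptor_list[i], descriptor_list[i + 1]
        if d1 is None or d2 is None:
            continue
        score = len(bf.match(d1, d2))
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_i + 1
```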
Optionally, the determining unit 702 is specifically configured to match the target markers in the two consecutive frames of images to be used and determine each target marker pair corresponding to the two consecutive frames of images to be used; determine, according to the camera coordinate system and the marker coordinate system corresponding to each target marker, a transformation matrix between the camera coordinate system and the target marker coordinate system corresponding to each target marker; and determine the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the transformation matrix.
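To make the transformation-matrix step concrete, the sketch below estimates, for one matched target marker, the 4x4 transform from the marker coordinate system to the camera coordinate system by solving PnP on the four marker corners. The marker side length, the intrinsics K and the distortion coefficients dist are placeholders; the embodiment does not prescribe a particular PnP solver.

```python
import cv2
import numpy as np

def marker_to_camera_transform(corners_2d, marker_side, K, dist):
    """corners_2d: 4x2 detected corner pixels of one target marker, ordered
    top-left, top-right, bottom-right, bottom-left. Returns 4x4 T_cam_marker."""
    s = marker_side / 2.0
    corners_3d = np.float32([[-s,  s, 0], [ s,  s, 0],
                             [ s, -s, 0], [-s, -s, 0]])  # marker coordinate system

    ok, rvec, tvec = cv2.solvePnP(corners_3d, np.float32(corners_2d), K, dist)
    if not ok:
        return None

    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T  # transformation matrix between camera and marker coordinate systems
```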
Optionally, the determining unit 702 is specifically configured to determine the epipolar constraint described by the transformation matrix; determine, from the texture feature point information of each frame of image and the target markers in each frame of image, the total number of texture feature points and markers that satisfy the epipolar constraint; and, if this total number is greater than a preset threshold, determine the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the transformation matrix.
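One way to read this check, shown only as an assumption-laden sketch: build the essential matrix from the relative rotation R and translation t encoded by the transformation matrix, evaluate the epipolar residual x2ᵀ E x1 for each texture feature correspondence, and count how many fall below a tolerance. The tolerance value is arbitrary; the resulting count, together with the number of matched markers, would then be compared with the preset threshold described above.

```python
import numpy as np

def count_epipolar_inliers(pts1, pts2, R, t, K, tol=1e-3):
    """pts1, pts2: Nx2 pixel correspondences between the two frames to be used.
    R, t: relative rotation / translation taken from the estimated transformation matrix.
    Returns the number of correspondences satisfying the epipolar constraint."""
    t = np.asarray(t, dtype=float).ravel()
    t_cross = np.array([[0.0, -t[2], t[1]],
                        [t[2], 0.0, -t[0]],
                        [-t[1], t[0], 0.0]])
    E = t_cross @ R                      # essential matrix E = [t]_x R
    K_inv = np.linalg.inv(K)

    count = 0
    for p1, p2 in zip(pts1, pts2):
        x1 = K_inv @ np.array([p1[0], p1[1], 1.0])   # normalized coordinates, frame 1
        x2 = K_inv @ np.array([p2[0], p2[1], 1.0])   # normalized coordinates, frame 2
        if abs(x2 @ E @ x1) < tol:                   # x2^T E x1 ≈ 0 for inliers
            count += 1
    return count
```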
Optionally, the determining unit 702 is further configured to calculate the projection error between the camera coordinate system and the marker coordinate system corresponding to each target marker; and optimize the transformation matrix according to the projection error to determine an updated transformation matrix.
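A sketch of this optional refinement, under the assumption that the projection error is measured as the pixel distance between detected marker corners and corners reprojected through the current transformation; cv2.solvePnPRefineLM (available in OpenCV 4.1 and later) is used here as one possible Levenberg-Marquardt optimizer, not as the optimizer prescribed by the embodiment.

```python
import cv2
import numpy as np

def refine_transform(corners_3d, corners_2d, K, dist, rvec, tvec):
    """corners_3d: Nx3 marker-frame points; corners_2d: Nx2 detected pixels (float32).
    Returns (mean projection error before refinement, refined rvec, refined tvec)."""
    proj, _ = cv2.projectPoints(corners_3d, rvec, tvec, K, dist)
    error = float(np.linalg.norm(proj.reshape(-1, 2) - corners_2d, axis=1).mean())

    # Optimize the transformation by minimizing the reprojection error (Levenberg-Marquardt).
    rvec, tvec = cv2.solvePnPRefineLM(corners_3d, corners_2d, K, dist, rvec, tvec)
    return error, rvec, tvec
```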
Optionally, the obtaining unit 701 is further configured to perform transformation processing on the marker information in each frame of image to obtain marker information in a preset orientation; and determine the target marker from the marker information in each frame of image according to the marker information in the preset orientation.
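The transformation to a preset orientation can be pictured as a perspective rectification of the detected marker quadrilateral into a canonical, front-facing square before it is compared against the known target markers. The canonical size and the corner ordering in this sketch are illustrative assumptions.

```python
import cv2
import numpy as np

def rectify_marker(gray, corners_2d, canonical_size=64):
    """Warp a detected marker quadrilateral to a canonical front-facing square
    (the "preset orientation"), so its pattern can be compared with the target
    marker dictionary regardless of the viewing angle.

    corners_2d: 4x2 corner pixels ordered top-left, top-right, bottom-right, bottom-left.
    """
    dst = np.float32([[0, 0], [canonical_size - 1, 0],
                      [canonical_size - 1, canonical_size - 1],
                      [0, canonical_size - 1]])
    H = cv2.getPerspectiveTransform(np.float32(corners_2d), dst)
    return cv2.warpPerspective(gray, H, (canonical_size, canonical_size))
```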
Optionally, the obtaining unit 701 is specifically configured to preprocess each frame of image and extract contour information of each frame of image; and analyze the contour information of each frame of image to obtain the marker information in each frame of image.
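For the contour-based extraction, a minimal sketch of one plausible preprocessing pipeline (adaptive thresholding followed by polygonal contour approximation); the threshold parameters and the area filter are arbitrary illustrative values, not values taken from the embodiment.

```python
import cv2

def marker_candidate_contours(frame_bgr, min_area=100.0):
    """Preprocess one frame and return convex quadrilateral contours, which are
    candidate marker outlines to be analyzed further for marker information."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)

    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for cnt in contours:
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        if len(approx) == 4 and cv2.isContourConvex(approx) \
                and cv2.contourArea(approx) > min_area:
            candidates.append(approx.reshape(4, 2))
    return candidates
```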
The image processing apparatus provided in the embodiment of the present application can execute the technical solution of the image processing method in any of the above embodiments; its implementation principle and beneficial effects are similar to those of the image processing method, to which reference may be made, and are not described here again.
Fig. 8 is a schematic structural diagram of another image processing apparatus 80 according to an embodiment of the present application. Referring to fig. 8, the image processing apparatus 80 may include a processor 801 and a memory 802, wherein:
The memory 802 is used for storing a computer program.
The processor 801 is configured to read the computer program stored in the memory 802, and execute the technical solution of the image processing method in any of the embodiments according to the computer program in the memory 802.
Alternatively, the memory 802 may be separate or integrated with the processor 801. When the memory 802 is a device separate from the processor 801, the image processing apparatus 80 may further include: a bus for connecting the memory 802 and the processor 801.
Optionally, this embodiment further includes: a communication interface that may be coupled to the processor 801 via a bus. The processor 801 may control the communication interface to implement the functions of reception and transmission of the image processing apparatus 80 described above.
The image processing apparatus 80 shown in the embodiment of the present application can execute the technical solution of the image processing method in any of the above embodiments, and the implementation principle and the beneficial effect thereof are similar to those of the image processing method, and reference may be made to the implementation principle and the beneficial effect of the image processing method, which is not described herein again.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When a processor executes the computer-executable instructions, the technical solution of the image processing method in any of the above embodiments is implemented; the implementation principle and beneficial effects are similar to those of the image processing method, to which reference may be made, and are not described here again.
An embodiment of the present application further provides a computer program product, which includes a computer program. When the computer program is executed by a processor, the technical solution of the image processing method in any of the above embodiments is implemented; the implementation principle and beneficial effects are similar to those of the image processing method, to which reference may be made, and are not described here again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the present application may be embodied directly in a hardware processor for execution, or executed by a combination of hardware and software modules in the processor.
The memory may comprise a high-speed RAM memory and may further comprise a non-volatile memory (NVM), such as at least one magnetic disk memory, and may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The computer-readable storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An image processing method, comprising:
acquiring texture feature point information of each frame of image in a video to be processed and marker information in each frame of image;
determining the pose of a camera in the video to be processed and the pose of a target marker in each marker according to the texture feature point information of each frame of image and the marker information in each frame of image;
determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the marker information in each frame of image, wherein the determining comprises the following steps:
determining the pose of a camera in the video to be processed and the pose of a target marker in each marker according to the texture feature point information of each frame of image and the target markers in the images to be used of two continuous frames;
determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the target markers in the images to be used of two continuous frames, wherein the determining comprises the following steps:
matching the target markers in the two continuous frames of images to be used, and determining each target marker pair corresponding to the two continuous frames of images to be used;
determining a transformation matrix between the camera coordinate system corresponding to each target marker and the target marker coordinate system according to the camera coordinate system and the marker coordinate system corresponding to each target marker;
determining the pose of a camera in the video to be processed and the pose of a target marker in each marker according to the texture feature point information of each frame of image and the transformation matrix;
determining the target position of a preset virtual image in each frame image according to the pose of the camera and the pose of the target marker;
and rendering each frame of image based on the target position to obtain each frame of target image containing the preset virtual image.
2. The method of claim 1, wherein determining the target position of a preset virtual image in each frame image according to the pose of the camera and the pose of the target marker comprises:
analyzing the pose of the camera and the pose of the target marker, and establishing a conversion relation between the pose of the camera and the pose of the target marker;
and determining the target position of the preset virtual image in each frame image according to the conversion relation and the parameters of the preset virtual image.
3. The method of claim 1, wherein the marker information comprises feature points of the markers, and wherein determining the pose of the camera in the video to be processed and the pose of the target marker in the markers from the texture feature point information of the frames of images and the marker information in the frames of images comprises:
matching the feature points of the target marker in any two continuous frames of images in the video to be processed;
determining two frames of images with the highest feature point matching degree as two continuous frames of images to be used;
and determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the target markers in the images to be used of the two continuous frames.
4. The method according to claim 1, wherein the determining the pose of the camera in the video to be processed and the pose of the target marker in the markers according to the texture feature point information of the frame images and the transformation matrix comprises:
determining an epipolar constraint described by the transformation matrix;
determining, from the texture feature point information of each frame of image and the target markers in each frame of image, the sum of the number of texture feature points and markers which satisfy the epipolar constraint;
and if the sum of the number is greater than a preset threshold value, determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the transformation matrix.
5. The method of claim 1, further comprising:
calculating projection errors between the camera coordinate system and the marker coordinate system respectively corresponding to each target marker;
and optimizing the transformation matrix according to the projection error, and determining an updated transformation matrix.
6. The method according to any one of claims 1-5, further comprising:
carrying out transformation processing on the marker information in each frame of image to obtain marker information in a preset orientation;
and determining the target marker in the marker information in each frame of image according to the marker information in the preset orientation.
7. The method according to any one of claims 1-5, wherein obtaining marker information in each frame of image comprises:
preprocessing each frame of image, and extracting outline information of each frame of image;
and analyzing the contour information of each frame of image to obtain the marker information in each frame of image.
8. An image processing apparatus characterized by comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring texture feature point information of each frame of image in a video to be processed and marker information in each frame of image;
the determining unit is used for determining the pose of the camera in the video to be processed and the pose of the target marker in each marker according to the texture feature point information of each frame of image and the marker information in each frame of image;
the determining unit is specifically configured to determine, according to the texture feature point information of each frame of image and the target markers in the images to be used of two consecutive frames, the pose of the camera in the video to be processed and the pose of the target markers in each marker;
the determining unit is specifically configured to match target markers in the two consecutive frames of images to be used, and determine each target marker pair corresponding to the two consecutive frames of images to be used; determining a transformation matrix between the camera coordinate system corresponding to each target marker and the target marker coordinate system according to the camera coordinate system and the marker coordinate system corresponding to each target marker; determining the pose of a camera in the video to be processed and the pose of a target marker in each marker according to the texture feature point information of each frame of image and the transformation matrix;
the determining unit is further configured to determine a target position of a preset virtual image in each frame of image according to the pose of the camera and the pose of the target marker;
and the processing unit is used for rendering each frame of image based on the target position to obtain each frame of target image containing the preset virtual image.
9. An image processing apparatus, comprising a memory and a processor; wherein:
the memory for storing a computer program;
the processor is used for reading the computer program stored in the memory and executing an image processing method according to any one of the claims 1-7 according to the computer program in the memory.
10. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, implement an image processing method according to any one of claims 1 to 7.
CN202110177906.0A 2021-02-08 2021-02-08 Image processing method and device Active CN112819892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110177906.0A CN112819892B (en) 2021-02-08 2021-02-08 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110177906.0A CN112819892B (en) 2021-02-08 2021-02-08 Image processing method and device

Publications (2)

Publication Number Publication Date
CN112819892A CN112819892A (en) 2021-05-18
CN112819892B true CN112819892B (en) 2022-11-25

Family

ID=75864744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110177906.0A Active CN112819892B (en) 2021-02-08 2021-02-08 Image processing method and device

Country Status (1)

Country Link
CN (1) CN112819892B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223184B (en) * 2021-05-26 2023-09-05 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
CN117354709A (en) * 2022-06-28 2024-01-05 中兴通讯股份有限公司 2D marker, indoor positioning method and device
CN116129049B (en) * 2023-02-02 2023-07-11 阿里巴巴(中国)有限公司 Image processing method, apparatus, storage medium, and program product
CN117437288B (en) * 2023-12-19 2024-05-03 先临三维科技股份有限公司 Photogrammetry method, device, equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101520849A (en) * 2009-03-24 2009-09-02 上海水晶石信息技术有限公司 Reality augmenting method and reality augmenting system based on image characteristic point extraction and random tree classification
CN110858414A (en) * 2018-08-13 2020-03-03 北京嘀嘀无限科技发展有限公司 Image processing method and device, readable storage medium and augmented reality system
CN111369622A (en) * 2018-12-25 2020-07-03 中国电子科技集团公司第十五研究所 Method, device and system for acquiring camera world coordinate position by virtual and real superposition application
CN110047142A (en) * 2019-03-19 2019-07-23 中国科学院深圳先进技术研究院 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium
US10723281B1 (en) * 2019-03-21 2020-07-28 Lyft, Inc. Calibration of vehicle sensor array alignment
WO2020253010A1 (en) * 2019-06-17 2020-12-24 魔门塔(苏州)科技有限公司 Method and apparatus for positioning parking entrance in parking positioning, and vehicle-mounted terminal
CN112053447A (en) * 2020-07-29 2020-12-08 清华大学 Augmented reality three-dimensional registration method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Implementation of a multi-marker augmented reality system based on FLARToolkit; Fu Wenxiu et al.; Journal of Beijing Jiaotong University; 2016-10-31; pp. 16-22 *

Also Published As

Publication number Publication date
CN112819892A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN112819892B (en) Image processing method and device
CN110568447B (en) Visual positioning method, device and computer readable medium
CN106940704B (en) Positioning method and device based on grid map
CN112444242B (en) Pose optimization method and device
WO2017041731A1 (en) Markerless multi-user multi-object augmented reality on mobile devices
CN110363817B (en) Target pose estimation method, electronic device, and medium
WO2012044216A1 (en) Method and apparatus for solving position and orientation from correlated point features in images
CN104156998A (en) Implementation method and system based on fusion of virtual image contents and real scene
CN112083403B (en) Positioning tracking error correction method and system for virtual scene
JP2023540917A (en) 3D reconstruction and related interactions, measurement methods and related devices and equipment
EP2622572A1 (en) Method and apparatus for optimization and incremental improvement of a fundamental matrix
CN113240656B (en) Visual positioning method and related device and equipment
CN114004890B (en) Attitude determination method and apparatus, electronic device, and storage medium
CN113592015B (en) Method and device for positioning and training feature matching network
CN112991441A (en) Camera positioning method and device, electronic equipment and storage medium
CN111179309A (en) Tracking method and device
CN111079786A (en) ROS and Gazebo-based rotating camera feature matching algorithm
Bartczak et al. Extraction of 3D freeform surfaces as visual landmarks for real-time tracking
CN113012298B (en) Curved MARK three-dimensional registration augmented reality method based on region detection
KR20160049639A (en) Stereoscopic image registration method based on a partial linear method
CN114926542A (en) Mixed reality fixed reference system calibration method based on optical positioning system
CN113570535A (en) Visual positioning method and related device and equipment
CN112184766A (en) Object tracking method and device, computer equipment and storage medium
CN111344740A (en) Camera image processing method based on marker and augmented reality equipment
CN113570667B (en) Visual inertial navigation compensation method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant