CN116092071A - Target detection method of AR equipment and AR equipment

Info

Publication number
CN116092071A
Authority
CN
China
Prior art keywords
pose
target
target object
current moment
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211383028.9A
Other languages
Chinese (zh)
Inventor
卢锦亮
郑贵桢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Electronic Technology Shenzhen Co ltd
Original Assignee
Hisense Electronic Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Electronic Technology Shenzhen Co ltd
Priority to CN202211383028.9A
Publication of CN116092071A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The application provides a target detection method for an AR device, and an AR device, which are used to improve the accuracy of target detection. The method comprises: for an image containing a target object, captured by a camera of the AR device at any moment, performing target detection on the image using a target detection algorithm to obtain the scale of the target object at the current moment; obtaining the pose of the camera in a target coordinate system at the current moment based on the scale and the pose of the camera in the world coordinate system at the current moment, where the target coordinate system is the coordinate system corresponding to the target object; obtaining the initial pose of the target object in the camera coordinate system at the current moment from the pose of the camera in the target coordinate system and the local pose of the target object at the current moment, where the local pose of the target object is obtained based on its scale; and optimizing the initial pose of the target object at the current moment using the pose change amount of the camera, to obtain the target pose of the target object in the camera coordinate system at the current moment.

Description

Target detection method of AR equipment and AR equipment
Technical Field
The application relates to the technical field of image processing for virtual reality devices, and in particular to a target detection method of an AR (augmented reality) device and an AR device.
Background
AR (Augmented Reality) technology provides a rich virtual-real interaction experience: users can interact with virtual items in a real scene through an AR device. At present, virtual-real interaction mainly means that a user generates a virtual object in a real scene through an AR device, and the position of the virtual object in real space is determined from information related to the AR device, such as the pose of its camera. Beyond this, users may want virtual items to interact with real items, for example a virtual item placed around a real item as in FIG. 1, or a moving virtual item colliding with a real item as in FIG. 2. In such interaction between virtual objects and real objects, 3D (3-dimensional) target detection techniques play a significant role.
Existing 3D target detection techniques mainly adopt lidar-based schemes. Although generating 3D point cloud data with a lidar yields a high-precision 3D position of the target, acquisition equipment such as lidar is costly and power-hungry, and is therefore difficult to apply directly in portable devices. On the other hand, if target detection is performed without lidar, existing techniques suffer from low detection precision, so the accuracy of target detection is low.
Disclosure of Invention
The application provides a target detection method of an AR device, and an AR device, which are used to accurately locate a target without using acquisition equipment such as lidar, thereby improving the accuracy of target detection.
In a first aspect, an embodiment of the present application provides a target detection method of an AR device, including:
for an image containing a target object, captured by a camera of the AR device at any moment, performing target detection on the image using a target detection algorithm to obtain the scale of the target object at the current moment, where the scale of the target object at the current moment comprises the length, width and height of the target object;
obtaining the pose of the camera in a target coordinate system at the current moment based on the scale of the target object at the current moment and the pose of the camera in a world coordinate system at the current moment, where the target coordinate system is the coordinate system corresponding to the target object;
obtaining the initial pose of the target object in the camera coordinate system at the current moment according to the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, where the local pose of the target object is obtained based on its scale; and
optimizing the initial pose of the target object at the current moment through the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, where the pose change amount is the change of the camera's pose in the world coordinate system between the current moment and the last moment.
A second aspect of the present application provides an AR device, including a processor and a memory, where the processor and the memory are connected by a bus;
the memory has stored therein a computer program, the processor being configured to perform the following operations based on the computer program:
for an image containing a target object, captured by a camera of the AR device at any moment, performing target detection on the image using a target detection algorithm to obtain the scale of the target object at the current moment, where the scale of the target object at the current moment comprises the length, width and height of the target object;
obtaining the pose of the camera in a target coordinate system at the current moment based on the scale of the target object at the current moment and the pose of the camera in a world coordinate system at the current moment, where the target coordinate system is the coordinate system corresponding to the target object;
obtaining the initial pose of the target object in the camera coordinate system at the current moment according to the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, where the local pose of the target object is obtained based on its scale; and
optimizing the initial pose of the target object at the current moment through the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, where the pose change amount is the change of the camera's pose in the world coordinate system between the current moment and the last moment.
In a third aspect, an embodiment of the present application provides a computer storage medium storing a computer program, the computer program being used to execute the method according to the first aspect.
In the above embodiments of the present application, the pose of the camera in the target coordinate system is obtained based on the detected scale of the target object and the pose of the camera in the world coordinate system; the initial pose of the target object in the camera coordinate system is then obtained from the pose of the camera in the target coordinate system and the local pose of the target object; finally, the initial pose of the target object is optimized according to the pose change amount of the camera, yielding the target pose of the target object in the camera coordinate system. By optimizing the initial pose of the target object through the pose change amount of the camera, the 3D position of the target object relative to the camera can be effectively smoothed across the frames of a video, alleviating the impact of positional jitter of the target object on interaction. The target is thus located accurately without acquisition equipment such as lidar, and the accuracy of target detection is improved.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required by the embodiments or the description of the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 schematically illustrates a first application scenario provided in an embodiment of the present application;
Fig. 2 schematically illustrates a second application scenario provided in an embodiment of the present application;
Fig. 3 schematically illustrates a third application scenario provided in an embodiment of the present application;
Fig. 4 schematically illustrates a first flowchart of a target detection method of an AR device provided in an embodiment of the present application;
FIG. 5 schematically illustrates a flow chart for determining a second pose change amount according to an embodiment of the present application;
fig. 6 is a schematic flow chart of determining a target pose of the target object in a camera coordinate system at the current moment according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of optimizing the scale of a target object according to an embodiment of the present application;
Fig. 8 is a schematic flow chart of determining a target scale of the target object at the current moment according to an embodiment of the present application;
fig. 9 illustrates a second flowchart of a target detection method of an AR device according to an embodiment of the present application;
fig. 10 schematically illustrates a structural diagram of an object detection apparatus of an AR device provided in an embodiment of the present application;
fig. 11 illustrates a hardware configuration diagram of an AR device provided in an embodiment of the present application.
Detailed Description
To make the purposes, embodiments and advantages of the present application clearer, the exemplary embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the exemplary embodiments described are only some, not all, embodiments of the present application.
Based on the exemplary embodiments described herein, all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of the appended claims. Furthermore, while the disclosure is presented in terms of one or more exemplary embodiments, it should be appreciated that individual aspects of the disclosure may separately constitute a complete embodiment.
It should be noted that the brief description of the terms in the present application is only for convenience in understanding the embodiments described below, and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" as used in this application refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
The ideas of the embodiments of the present application are summarized below.
Existing 3D target detection techniques mainly adopt lidar-based schemes. Although generating 3D point cloud data with a lidar yields a high-precision 3D position of the target, acquisition equipment such as lidar is costly and power-hungry, and is difficult to apply directly in portable devices. Yet if 3D target detection is performed directly without acquisition equipment such as lidar, the detection precision is low and the accuracy of target detection decreases.
To address the low accuracy of target detection when no acquisition equipment such as lidar is used, the embodiments of the present application provide a target detection method of an AR device: the pose of the camera in the target coordinate system is obtained based on the detected scale of the target object and the pose of the camera in the world coordinate system; the initial pose of the target object in the camera coordinate system is then obtained from the pose of the camera in the target coordinate system and the local pose of the target object; finally, the initial pose of the target object is optimized according to the pose change amount of the camera, yielding the target pose of the target object in the camera coordinate system. By optimizing the initial pose of the target object through the pose change amount of the camera, the 3D position of the target object relative to the camera can be effectively smoothed across the frames of a video, alleviating the impact of positional jitter of the target object on interaction. The target is thus located accurately without acquisition equipment such as lidar, and the accuracy of target detection is improved.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 schematically illustrates an application scenario provided in an embodiment of the present application; as shown in fig. 1, the application scenario is described by taking an AR device as an example. The application scenario includes an AR device 110 and a server 120. The server 120 may be implemented by a single server or by a plurality of servers. The server 120 may be implemented by a physical server or may be implemented by a virtual server.
In a possible application scenario, the AR device 110 sends a captured image containing a target object to the server 120. For the image containing the target object, captured by a camera of the AR device 110 at any moment, the server 120 performs target detection on the image using a target detection algorithm to obtain the scale of the target object at the current moment, where the scale comprises the length, width and height of the target object. The server 120 obtains the pose of the camera in the target coordinate system at the current moment based on the scale of the target object at the current moment and the pose of the camera in the world coordinate system at the current moment, the target coordinate system being the coordinate system corresponding to the target object; it then obtains the initial pose of the target object in the camera coordinate system at the current moment from the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, the local pose being obtained based on the scale of the target object. Finally, the server 120 optimizes the initial pose of the target object at the current moment through the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, where the pose change amount is the change of the camera's pose in the world coordinate system between the current moment and the last moment.
As shown in fig. 2, another application scenario is schematically illustrated, where the application scenario includes an AR device 110, a server 120, and a memory 130. The AR device 110 stores the captured image containing the target object in the memory 130, the server 120 obtains the image containing the target object at any moment from the memory 130, and performs target detection on the image by using a target detection algorithm to obtain a scale of the target object at the current moment, where the scale of the target object at the current moment includes a length, a width and a height of the target object; the server 120 obtains the pose of the camera in the target coordinate system at the current moment based on the scale of the target object at the current moment and the pose of the camera in the world coordinate system at the current moment, wherein the target coordinate system is a coordinate system corresponding to the target object; and obtaining the initial pose of the target object in the camera coordinate system at the current moment according to the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, wherein the local pose of the target is obtained based on the scale of the target. Finally, the server 120 optimizes the initial pose of the target object at the current moment through the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, wherein the pose change amount is the change amount of the pose of the camera in the world coordinate system at the current moment and the last moment.
As shown in fig. 3, another application scenario is schematically illustrated, where the application scenario includes an AR device 110 and a memory 130. The AR device 110 performs target detection on an image containing a target object, which is captured at any time, by using a target detection algorithm to obtain a scale of the target object, where the scale of the target object at the current time includes a length, a width and a height of the target object; the AR device 110 obtains the pose of the camera in the target coordinate system at the current moment based on the scale of the target object at the current moment and the pose of the camera in the world coordinate system at the current moment, wherein the target coordinate system is a coordinate system corresponding to the target object; and obtaining the initial pose of the target object in the camera coordinate system at the current moment according to the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, wherein the local pose of the target is obtained based on the scale of the target. Finally, the AR device 110 optimizes the initial pose of the target object at the current moment by using the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, where the pose change amount is the change amount of the pose of the camera in the world coordinate system at the current moment and the last moment.
Although only a single AR device 110, a single server 120 and a single memory 130 are described in detail here, those skilled in the art will understand that they are shown to represent the operations of the AR device, server and memory involved in the technical solution of the present application, and imply no limitation on the number, type or location of AR devices, servers and memories. It should be noted that the underlying concepts of the example embodiments of the present application are not altered if additional modules are added to, or individual modules are removed from, the illustrated environment.
It should also be noted that the target detection method of the AR device provided in the present application is applicable not only to the application scenarios shown in fig. 1, fig. 2 and fig. 3, but also to any apparatus that performs target detection with an AR device.
The target detection method of the AR device according to the exemplary embodiment of the present application will be described below with reference to the accompanying drawings in conjunction with the above-described application scenario, and it should be noted that the above-described application scenario is only shown for the convenience of understanding the method and principle of the present application, and the embodiments of the present application are not limited in any way in this respect.
As shown in fig. 4, which is a flowchart of a target detection method of an AR device, the method may include the following steps:
step 401: aiming at an image containing a target object, which is shot by a camera in AR equipment at any moment, carrying out target detection on the image by utilizing a target detection algorithm to obtain the scale of the target object at the current moment, wherein the scale of the target object at the current moment comprises the length, the width and the height of the target object;
in this embodiment, before the image is subjected to target detection by using the target detection algorithm, a preprocessing operation needs to be performed on the image, where the preprocessing operation includes, but is not limited to, dynamically filling, scaling and normalizing the image, and the image is changed into an image with a single precision type and a specified size. And the single precision type is a float32 type in the present embodiment, and the specified size is 512×512×1 in the present embodiment. However, the image type and the image size in the present embodiment are not limited, and may be set according to actual situations.
It should be noted that: the target detection algorithm in this embodiment may be set according to actual situations, and this embodiment is not limited to the target detection algorithm.
Step 402: obtaining the pose of the camera in a target coordinate system at the current moment based on the scale of the target object at the current moment and the pose of the camera in a world coordinate system at the current moment, wherein the target coordinate system is a coordinate system corresponding to the target object;
the AR device is provided with a camera pose tracking algorithm, so that the AR device can directly acquire the pose of the camera in the world coordinate system at each moment through the camera pose tracking algorithm.
In one embodiment, step 402 may be implemented as: inputting the scale of the target object at the current moment and the pose of the camera in the world coordinate system at the current moment into a PnP (Perspective-n-Point) algorithm to obtain the pose of the camera in the target coordinate system at the current moment.
It should be noted that: the target coordinate system in this embodiment is a space rectangular coordinate system established by taking the center of gravity of the target object as an origin and taking the straight lines corresponding to the length, width and height of the target object as an x-axis, a y-axis and a z-axis respectively. However, the specific directions of the coordinate axes may be set according to the actual situation, and the present embodiment is not limited herein.
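By way of illustration only: the embodiment feeds the scale and the camera pose into the PnP algorithm, while a classical PnP call also needs 2D observations of known 3D points. The sketch below therefore assumes the target detection algorithm additionally outputs the eight projected corners of the target's 3D bounding box (`corners_2d`) and that the camera intrinsic matrix `K` is known; both are assumptions, not part of the embodiment:

```python
import cv2
import numpy as np

def box_corners(length: float, width: float, height: float) -> np.ndarray:
    """The 8 corners of the target's 3D bounding box in the target coordinate
    system: origin at the center of gravity, axes along length/width/height."""
    l, w, h = length / 2.0, width / 2.0, height / 2.0
    return np.array([[sx * l, sy * w, sz * h]
                     for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                    dtype=np.float64)

def camera_pose_in_target(scale, corners_2d, K):
    """Camera pose in the target coordinate system from PnP.

    corners_2d: (8, 2) projected box corners in the image (assumed to come
    from the detector); K: 3x3 camera intrinsic matrix (assumed known)."""
    object_pts = box_corners(*scale)
    ok, rvec, tvec = cv2.solvePnP(object_pts, corners_2d.astype(np.float64),
                                  K, None)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()   # target -> camera transform
    return np.linalg.inv(T)                 # camera pose in target coordinates
```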
Step 403: according to the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, obtaining the initial pose of the target object in the camera coordinate system at the current moment, wherein the local pose of the target is obtained based on the scale of the target;
The local pose of the target object is determined by taking the length in the scale of the target object as the x-coordinate of the local pose, the width as its y-coordinate, and the height as its z-coordinate, the orientation of the target object being a preset initial orientation. The position coordinates of the local pose are thereby obtained from these three coordinates together with the initial orientation.
In one embodiment, step 403 may be implemented as: multiplying the pose of the camera in the target coordinate system at the current moment by the local pose of the target object at the current moment to obtain the initial pose of the target object in the camera coordinate system at the current moment, through formula (1):

$$T_{O2C}^{i} = T_{C2T}^{i}\, T_{O}^{i} \qquad (1)$$

where $T_{O2C}^{i}$ is the initial pose of the target object in the camera coordinate system at the current moment $i$, $T_{C2T}^{i}$ is the pose of the camera in the target coordinate system at the current moment $i$, and $T_{O}^{i}$ is the local pose of the target object at the current moment $i$.
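With 4×4 homogeneous matrices, step 403 reduces to a single matrix product. The sketch below also builds the local pose from the scale and a preset initial orientation as described above; using the raw scale values as the position follows the embodiment, while the identity initial orientation is an assumption:

```python
import numpy as np

def local_pose(scale, R_init=None):
    """Local pose of the target as a 4x4 homogeneous matrix: the scale
    (length, width, height) supplies the position coordinates and R_init
    is the preset initial orientation (identity assumed here)."""
    T = np.eye(4)
    T[:3, :3] = np.eye(3) if R_init is None else R_init
    T[:3, 3] = scale
    return T

def initial_pose_in_camera(T_c2t, T_local):
    """Formula (1): compose the camera pose in the target coordinate system
    with the local pose of the target object."""
    return T_c2t @ T_local
```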
Step 404: optimizing the initial pose of the target object at the current moment through the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, wherein the pose change amount is the change amount of the pose of the camera in the world coordinate system at the current moment and the last moment.
The pose change amount comprises a first pose change amount and a second pose change amount, wherein the first pose change amount is obtained based on the pose of the camera in a world coordinate system, and the second pose change amount is obtained based on the pose of the camera in a target coordinate system;
the manner of determining the first pose change amount and the manner of determining the second pose change amount will be described in detail below:
first pose change amount: and subtracting the pose of the camera at the current moment in the world coordinate system from the pose of the camera at the last moment in the world coordinate system to obtain the first pose variation.
Second pose change amount: as shown in fig. 5, a flow chart for determining the second pose change amount includes the following steps:
step 501: according to the pose of the camera in the target coordinate system at the current moment and the pose of the camera in the world coordinate system at the current moment, converting the local pose of the target object into the local pose of the target object in the world coordinate system;
In one embodiment, step 501 may be implemented as: composing the pose of the camera in the world coordinate system at the current moment with the inverse of the pose of the camera in the target coordinate system at the current moment to obtain the local pose of the target object in the world coordinate system, through formula (2):

$$T_{O2W} = T_{C2W}^{i}\,\left(T_{C2T}^{i}\right)^{-1} \qquad (2)$$

where $T_{O2W}$ is the local pose of the target object in the world coordinate system, $T_{C2T}^{i}$ is the pose of the camera in the target coordinate system at the current moment $i$, and $T_{C2W}^{i}$ is the pose of the camera in the world coordinate system at the current moment $i$.
Step 502: obtaining the pose of the camera in the world coordinate system at the last moment based on the local pose of the target object in the world coordinate system and the pose of the camera in the target coordinate system at the last moment;
the local pose of the target object in the world coordinate system is unchanged, so that the pose of the camera in the world coordinate system at the last moment can be determined through the local pose of the target object in the world coordinate system.
In one embodiment, step 502 may be implemented as: multiplying the local pose of the target object in the world coordinate system by the pose of the camera in the target coordinate system at the last moment to obtain the pose of the camera in the world coordinate system at the last moment, through formula (3):

$$T_{C2W}^{i-1} = T_{O2W}\, T_{C2T}^{i-1} \qquad (3)$$

where $T_{C2W}^{i-1}$ is the pose of the camera in the world coordinate system at the previous moment $i-1$, $T_{O2W}$ is the local pose of the target object in the world coordinate system, and $T_{C2T}^{i-1}$ is the pose of the camera in the target coordinate system at the previous moment $i-1$.
Step 503: and subtracting the pose of the camera in the world coordinate system at the previous moment from the pose of the camera in the world coordinate system at the current moment to obtain the second pose variation.
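Steps 501 to 503 can be strung together as in the following sketch; the matrix reading of formulas (2) and (3) is reconstructed from the text and should be treated as an assumption:

```python
import numpy as np

def second_pose_change(T_c2t_curr, T_c2w_curr, T_c2t_prev):
    """Second pose change amount from the camera poses in the target
    (T_c2t_*) and world (T_c2w_curr) coordinate systems."""
    # formula (2): local pose of the target object in the world coordinate system
    T_o2w = T_c2w_curr @ np.linalg.inv(T_c2t_curr)
    # formula (3): camera pose in the world coordinate system at the last moment
    T_c2w_prev = T_o2w @ T_c2t_prev
    # step 503: change amount between the current and previous world poses
    return T_c2w_curr @ np.linalg.inv(T_c2w_prev)
```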
Having described the first and second pose change amounts and how they are determined, the specific manner of determining the target pose of the target object in the camera coordinate system at the current moment is described below. As shown in fig. 6, a schematic flowchart of determining the target pose of the target object in the camera coordinate system at the current moment comprises the following steps:
step 601: obtaining an initial weight matrix of the target object at the current moment based on an intermediate weight matrix corresponding to the target object at the previous moment;
in one embodiment, step 601 may be implemented as: multiplying the intermediate weight matrix with a preset state transition matrix to obtain a first intermediate state transition matrix; multiplying the first intermediate state transition matrix with the inverse matrix of the preset state transition matrix to obtain a second intermediate state transition matrix; and adding the second intermediate state transition matrix with a preset first noise matrix to obtain an initial weight matrix of the target object at the current moment. The initial weight matrix of the target object at the current moment can be obtained through a formula (4):
$$P_{i}^{-} = A\, P_{i-1}\, A' + Q \qquad (4)$$

where $P_{i}^{-}$ is the initial weight matrix of the target object at the current moment $i$, $A$ is the preset state transition matrix, $P_{i-1}$ is the intermediate weight matrix corresponding to the target object at the previous moment $i-1$, $A'$ is the inverse matrix of the preset state transition matrix, and $Q$ is the preset first noise matrix.
It should be noted that if the current moment is the first moment, the intermediate weight matrix corresponding to the target object at the last moment is a preset initial intermediate weight matrix.
Step 602: obtaining a target weight matrix of the target object at the current moment by using the initial weight matrix of the target object at the current moment; the target weight matrix of the target object at the current moment can be obtained through a formula (5):
$$K_{i} = P_{i}^{-} H' \left( H P_{i}^{-} H' + R \right)^{-1} \qquad (5)$$

where $K_{i}$ is the target weight matrix of the target object at the current moment $i$, $H$ is a preset second noise matrix, $H'$ is the inverse matrix of the second noise matrix, and $R$ is a preset third noise matrix.
After the target weight matrix is obtained, the intermediate weight matrix at the current time may be updated by equation (6):
$$P_{i} = \left( I - K_{i} H \right) P_{i}^{-} \qquad (6)$$

where $P_{i}$ is the intermediate weight matrix at the current moment $i$ and $I$ is a preset identity matrix.
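A compact sketch of formulas (4) to (6) follows. The embodiment defines A′ and H′ as inverse matrices and they are applied as such here; in a textbook Kalman filter these factors would be transposes, so this choice is the patent's own:

```python
import numpy as np

def update_weight_matrices(P_prev, A, Q, H, R):
    """Formulas (4)-(6): initial weight matrix, target weight matrix and the
    updated intermediate weight matrix for the current moment."""
    P_init = A @ P_prev @ np.linalg.inv(A) + Q                  # formula (4)
    H_inv = np.linalg.inv(H)                                    # H' per the text
    K = P_init @ H_inv @ np.linalg.inv(H @ P_init @ H_inv + R)  # formula (5)
    P_curr = (np.eye(P_init.shape[0]) - K @ H) @ P_init         # formula (6)
    return K, P_curr
```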
Step 603: obtaining a target pose change amount through the target weight matrix, the first pose change amount and the second pose change amount; wherein the target pose change amount can be obtained by the formula (7):
$$\Delta T_{i} = K_{i}\, \Delta T_{i}^{T} + \left( I - K_{i} \right) \Delta T_{i}^{W} \qquad (7)$$

where $\Delta T_{i}$ is the target pose change amount, $\Delta T_{i}^{T}$ is the second pose change amount, and $\Delta T_{i}^{W}$ is the first pose change amount.
Step 604: and obtaining the target pose of the target object in a camera coordinate system at the current moment according to the target pose change amount and the initial pose.
In one embodiment, step 604 may be implemented as: multiplying the target pose change amount by the initial pose to obtain the target pose, through formula (8):

$$\hat{T}_{O2C}^{i} = \Delta T_{i}\, T_{O2C}^{i} \qquad (8)$$

where $\hat{T}_{O2C}^{i}$ is the target pose of the target object in the camera coordinate system at the current moment $i$, $\Delta T_{i}$ is the target pose change amount, and $T_{O2C}^{i}$ is the initial pose of the target object in the camera coordinate system at the current moment $i$.
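Formulas (7) and (8) fuse the two change amounts and apply the result to the initial pose. The sketch below assumes the target weight matrix is 4×4 so that it can act directly on homogeneous pose matrices, and the convex-combination reading of formula (7) is our reconstruction:

```python
import numpy as np

def fuse_target_pose(K_i, dT_world, dT_target, T_initial):
    """Formula (7): blend the first (world-based) and second (target-based)
    pose change amounts; formula (8): apply the blend to the initial pose."""
    I = np.eye(K_i.shape[0])
    dT = K_i @ dT_target + (I - K_i) @ dT_world   # formula (7)
    return dT @ T_initial                         # formula (8)
```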
To further increase the accuracy of target detection, in one embodiment, after performing step 404, as shown in fig. 7, a flowchart of optimizing the scale of the target object includes the following steps:
Step 701: optimizing the scale of the target object at the current moment by utilizing the target pose of the target object at the current moment to obtain the target scale of the target object at the current moment;
as shown in fig. 8, a flow chart for determining a target scale of the target object at the current moment includes the following steps:
step 801: re-projecting the target pose of the target object at the current moment on a corresponding image to obtain the actual position of the target object in the image;
the manner of re-projection in the present embodiment may be set according to the actual situation, and the present embodiment is not limited to the manner of re-projection here.
Step 802: inputting the target pose of the target object into a pre-trained neural network to obtain the predicted position of the target object in the image;
it should be noted that: the neural network is not limited in this embodiment, and the neural network in this embodiment may be set according to actual situations.
Step 803: obtaining an adjustment weight according to the actual position and the predicted position;
In one embodiment, step 803 may be implemented as follows: multiply the initial adjustment weight by the actual position to obtain an adjusted actual position, and subtract the adjusted actual position from the predicted position to obtain an error value. If the error value is not smaller than a specified threshold, adjust the initial adjustment weight in a preset mode, take the adjusted value as the new initial adjustment weight, and return to the multiplication step; repeat until the error value is smaller than the specified threshold.
The preset mode is: increasing or decreasing the initial adjustment weight by a preset value each time to obtain the adjusted initial adjustment weight. However, the preset mode is not limited in this embodiment and may be set according to the actual situation.
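One way to read the loop of step 803 is sketched below; the initial weight, step size, threshold, iteration cap and the sign heuristic for choosing between increasing and decreasing are all assumptions, since the embodiment leaves the preset mode open:

```python
import numpy as np

def solve_adjust_weight(actual_pos, predicted_pos, alpha=1.0,
                        step=0.01, threshold=1e-3, max_iter=1000):
    """Step 803: iteratively adjust the weight until the error between the
    adjusted actual position and the predicted position drops below the
    threshold. All numeric constants are assumed."""
    actual = np.asarray(actual_pos, dtype=np.float64)
    predicted = np.asarray(predicted_pos, dtype=np.float64)
    for _ in range(max_iter):
        residual = predicted - alpha * actual
        if np.linalg.norm(residual) < threshold:
            break
        # increase or decrease the weight by a preset value per iteration
        alpha += step if residual @ actual > 0 else -step
    return alpha
```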
Step 804: and adjusting the scale of the target object at the current moment based on the adjustment weight to obtain the target scale of the target object at the current moment.
In one embodiment, the adjustment weight is multiplied by a scale of the target object at the current time to obtain a target scale of the target object at the current time. The target scale of the target object at the current moment can be obtained through a formula (9):
$$\hat{S}_{i} = \alpha_{0}\, S_{i} \qquad (9)$$

where $\hat{S}_{i}$ is the target scale of the target object at the current moment, $S_{i}$ is the scale of the target object at the current moment, and $\alpha_{0}$ is the adjustment weight.
Step 702: input the target scale of the target object at the current moment and the scale of the target object in the image at the next moment into a Kalman filtering algorithm, smooth the scale of the target object in the image at the next moment to obtain its smoothed scale, and take this smoothed scale as the scale of the target object in the image at the next moment, so that the target pose of the target object in the camera coordinate system at the next moment is obtained based on the scale of the target object in the image at the next moment.
In this embodiment, smoothing the scale with a Kalman filtering algorithm to obtain the smoothed scale of the target object in the image at the next moment follows the prior art and is not described in detail here.
To further improve the accuracy of target detection, in one embodiment, before step 402 is executed, the scale of the target object at the current moment and the target scale of the target object in the image at the previous moment are input into a Kalman filtering algorithm, the scale of the target object in the image at the current moment is smoothed to obtain its smoothed scale, and this smoothed scale is taken as the scale of the target object at the current moment.
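The embodiment defers to a standard Kalman filtering algorithm for this smoothing. Purely as an illustration, a per-dimension scalar Kalman step over the (length, width, height) vector could look as follows, with the noise constants q and r assumed:

```python
import numpy as np

def smooth_scale(prev_scale, detected_scale, P_prev=1.0, q=1e-4, r=1e-2):
    """One Kalman predict/update step pulling the newly detected scale toward
    the scale retained from the previous moment; q and r are assumed."""
    prev = np.asarray(prev_scale, dtype=np.float64)
    meas = np.asarray(detected_scale, dtype=np.float64)
    P = P_prev + q                          # predict
    K = P / (P + r)                         # gain
    smoothed = prev + K * (meas - prev)     # update
    return smoothed, (1.0 - K) * P
```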
When the target pose of the target object in the camera coordinate system is obtained, a virtual target can be placed on the target object based on the target pose, or the effect of a virtual target colliding with the target object can be realized. Since both approaches are prior art, their detailed implementation is not repeated here.
To further set out the technical solution of the present application, the detailed flow is described below with reference to fig. 9 and may comprise the following steps:
step 901: for an image containing a target object, captured by a camera of the AR device at any moment, perform target detection on the image using a target detection algorithm to obtain the scale of the target object at the current moment, where the scale of the target object at the current moment comprises the length, width and height of the target object;
step 902: input the scale of the target object at the current moment and the target scale of the target object in the image at the previous moment into a Kalman filtering algorithm, smooth the scale of the target object in the image at the current moment to obtain its smoothed scale, and take this smoothed scale as the scale of the target object at the current moment;
step 903: input the scale of the target object at the current moment and the pose of the camera in the world coordinate system at the current moment into a PnP algorithm to obtain the pose of the camera in the target coordinate system at the current moment;
Step 904: according to the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, obtaining the initial pose of the target object in the camera coordinate system at the current moment, wherein the local pose of the target is obtained based on the scale of the target;
step 905: obtaining an initial weight matrix of the target object at the current moment based on an intermediate weight matrix corresponding to the target object at the previous moment;
step 906: obtaining a target weight matrix of the target object at the current moment by using the initial weight matrix of the target object at the current moment;
step 907: obtaining a target pose change amount through the target weight matrix, the first pose change amount and the second pose change amount; wherein the first pose change amount is obtained based on the pose of the camera in a world coordinate system, and the second pose change amount is obtained based on the pose of the camera in a target coordinate system;
step 908: obtaining a target pose of the target object in a camera coordinate system at the current moment according to the target pose change amount and the initial pose;
Step 909: optimizing the scale of the target object at the current moment by utilizing the target pose of the target object at the current moment to obtain the target scale of the target object at the current moment;
step 910: input the target scale of the target object at the current moment and the scale of the target object in the image at the next moment into a Kalman filtering algorithm, smooth the scale of the target object in the image at the next moment to obtain its smoothed scale, and take this smoothed scale as the scale of the target object in the image at the next moment, so that the target pose of the target object in the camera coordinate system at the next moment is obtained based on the scale of the target object in the image at the next moment.
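Tying steps 901 to 910 together, the outline below composes the illustrative sketches from the preceding sections for one frame. run_detector and predict_position (standing for the target detection algorithm and the pre-trained network of step 802) are hypothetical placeholders, and the mutable state object is assumed to carry the quantities retained between frames; none of this glue code is part of the claimed method:

```python
def process_frame(image, state, K_intr):
    """One frame of the fig. 9 pipeline; illustrative glue code only."""
    # steps 901-902: detect, then smooth the scale against the last target scale
    scale, corners_2d = run_detector(preprocess(image))   # hypothetical detector
    scale, state.P_scale = smooth_scale(state.prev_scale, scale, state.P_scale)
    # step 903: camera pose in the target coordinate system via PnP
    T_c2t = camera_pose_in_target(scale, corners_2d, K_intr)
    # step 904: initial pose of the target in the camera coordinate system
    T_init = initial_pose_in_camera(T_c2t, local_pose(scale))
    # steps 905-906: weight matrices for the current moment
    K_w, state.P = update_weight_matrices(state.P, state.A, state.Q,
                                          state.H, state.R)
    # step 907: first and second pose change amounts
    dT_w = pose_delta(state.T_c2w, state.T_c2w_prev)
    dT_t = second_pose_change(T_c2t, state.T_c2w, state.T_c2t_prev)
    # step 908: fuse the change amounts and obtain the target pose
    T_target = fuse_target_pose(K_w, dT_w, dT_t, T_init)
    # steps 909-910: optimize the scale and retain it for the next frame
    alpha = solve_adjust_weight(reproject_position(T_target, scale, K_intr),
                                predict_position(T_target))  # hypothetical network
    state.prev_scale = alpha * scale
    state.T_c2t_prev = T_c2t
    return T_target
```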
Based on the same inventive concept, the target detection method of the AR device described above may also be implemented by a target detection apparatus of an AR device. The effect of this apparatus is similar to that of the foregoing method and is not repeated here.
Fig. 10 is a schematic structural diagram of an object detection apparatus of an AR device according to an embodiment of the present disclosure.
As shown in fig. 10, the target detection apparatus 1000 of the AR device of the present disclosure may include a target detection module 1010, a camera pose determination module 1020, an initial pose determination module 1030, and a target pose determination module 1040.
A target detection module 1010, configured to perform target detection on an image including a target object captured by a camera in an AR device at any time by using a target detection algorithm, to obtain a scale of the target object at a current time, where the scale of the target object at the current time includes a length, a width, and a height of the target object;
a camera pose determining module 1020, configured to obtain a pose of the camera in a target coordinate system at a current time based on a scale of the target object at the current time and a pose of the camera in a world coordinate system at the current time, where the target coordinate system is a coordinate system corresponding to the target object;
an initial pose determining module 1030, configured to obtain an initial pose of the target object in the camera coordinate system at the current time according to a pose of the camera in the target coordinate system at the current time and a local pose of the target object at the current time, where the local pose of the target is obtained based on a scale of the target;
The target pose determining module 1040 is configured to optimize an initial pose of the target object at a current moment by a pose change amount of a camera, so as to obtain a target pose of the target object at the current moment in a camera coordinate system, where the pose change amount is a change amount of a pose of the camera at the current moment and at a previous moment in a world coordinate system.
In one embodiment, the apparatus further comprises:
the scale optimization module 1050 is configured to optimize, according to the pose change of the camera, an initial pose of the target object at the current moment to obtain a target pose of the target object at the current moment in a camera coordinate system, and then optimize, according to the target pose of the target object at the current moment, a scale of the target object at the current moment to obtain a target scale of the target object at the current moment;
and inputting the target scale of the target object at the current moment and the scale in the image of the target object at the next moment into a Kalman filtering algorithm, carrying out smoothing processing on the scale of the image of the target object at the next moment to obtain the smoothed scale of the target object in the image of the next moment, and determining the smoothed scale as the scale of the target object in the image of the next moment so as to obtain the target pose of the target object in a camera coordinate system at the next moment based on the scale of the target object in the image of the next moment.
In one embodiment, when optimizing the scale of the target object at the current moment using the target pose of the target object at the current moment to obtain the target scale of the target object at the current moment, the scale optimization module 1050 is specifically configured to:
re-projecting the target pose of the target object at the current moment on a corresponding image to obtain the actual position of the target object in the image; the method comprises the steps of,
inputting the target pose of the target object into a pre-trained neural network to obtain the predicted position of the target object in the image;
obtaining an adjustment weight according to the actual position and the predicted position;
and adjusting the scale of the target object at the current moment based on the adjustment weight to obtain the target scale of the target object at the current moment.
In one embodiment, the camera pose determining module 1020 is specifically configured to:
inputting the scale of the target object at the current moment and the pose of the camera in the world coordinate system at the current moment into a PnP algorithm to obtain the pose of the camera in the target coordinate system at the current moment;
the initial pose determining module 1030 is specifically configured to:
Multiplying the pose of the camera in the target coordinate system at the current moment with the local pose of the target object at the current moment to obtain the initial pose of the target object in the camera coordinate system at the current moment.
In one embodiment, the pose change amount includes a first pose change amount obtained based on a pose of the camera in a world coordinate system and a second pose change amount obtained based on a pose of the camera in a target coordinate system;
the target pose determining module 1040 is specifically configured to:
obtaining an initial weight matrix of the target object at the current moment based on an intermediate weight matrix corresponding to the target object at the previous moment;
obtaining a target weight matrix of the target object at the current moment by using the initial weight matrix of the target object at the current moment;
obtaining a target pose change amount through the target weight matrix, the first pose change amount and the second pose change amount;
and obtaining the target pose of the target object in a camera coordinate system at the current moment according to the target pose change amount and the initial pose.
In one embodiment, when obtaining the initial weight matrix of the target object at the current moment based on the intermediate weight matrix corresponding to the target object at the previous moment, the target pose determining module 1040 is specifically configured to:
multiplying the intermediate weight matrix with a preset state transition matrix to obtain a first intermediate state transition matrix;
multiplying the first intermediate state transition matrix with the inverse matrix of the preset state transition matrix to obtain a second intermediate state transition matrix;
and adding the second intermediate state transition matrix with a preset first noise matrix to obtain an initial weight matrix of the target object at the current moment.
In one embodiment, the target pose determination module 1040 is further configured to:
the initial weight matrix is obtained by the following formula:
$$P_{i}^{-} = A\, P_{i-1}\, A' + Q$$

where $P_{i}^{-}$ is the initial weight matrix of the target object at the current moment $i$, $A$ is the preset state transition matrix, $P_{i-1}$ is the intermediate weight matrix corresponding to the target object at the previous moment $i-1$, $A'$ is the inverse matrix of the preset state transition matrix, and $Q$ is the preset first noise matrix.
In one embodiment, when obtaining the target weight matrix of the target object at the current moment using the initial weight matrix of the target object at the current moment, the target pose determining module 1040 is specifically configured to:
The target weight matrix of the target object at the current moment is obtained through the following formula:
$$K_{i} = P_{i}^{-} H' \left( H P_{i}^{-} H' + R \right)^{-1}$$

where $K_{i}$ is the target weight matrix of the target object at the current moment $i$, $H$ is a preset second noise matrix, $H'$ is the inverse matrix of the second noise matrix, and $R$ is a preset third noise matrix;
when obtaining the target pose change amount through the target weight matrix, the first pose change amount and the second pose change amount, the target pose determining module 1040 is specifically configured to:
the target pose change amount is obtained through the following formula:
$$\Delta T_{i} = K_{i}\, \Delta T_{i}^{T} + \left( I - K_{i} \right) \Delta T_{i}^{W}$$

where $\Delta T_{i}$ is the target pose change amount, $\Delta T_{i}^{T}$ is the second pose change amount, and $\Delta T_{i}^{W}$ is the first pose change amount.
In one embodiment, when obtaining the target pose of the target object in the camera coordinate system at the current moment according to the target pose change amount and the initial pose, the target pose determining module 1040 is specifically configured to:
and multiplying the target pose change amount by the initial pose to obtain the target pose.
Having described a target detection method and apparatus of an AR device according to an exemplary embodiment of the present invention, next, an AR device according to another exemplary embodiment of the present invention is described.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method or program product. Accordingly, aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to collectively herein as a "circuit," "module" or "system."
In some possible implementations, an AR device according to the present invention may include at least one processor, and at least one computer storage medium. Wherein the computer storage medium stores program code which, when executed by a processor, causes the processor to perform the steps in the object detection method of an AR device according to various exemplary embodiments of the present invention described above in the present specification. For example, the processor may perform steps 401-404 as shown in FIG. 4.
An AR device 1100 according to this embodiment of the present invention is described below with reference to fig. 11. The AR device 1100 shown in fig. 11 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 11, AR device 1100 is in the form of a generic AR device. Components of AR device 1100 may include, but are not limited to: the at least one processor 1101, the at least one computer storage medium 1102, a bus 1103 that connects the various system components, including the computer storage medium 1102 and the processor 1101.
The bus 1103 represents one or more of several types of bus structures, including a computer storage media bus or computer storage media controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The computer storage media 1102 may include readable media in the form of volatile computer storage media, such as random access computer storage media (RAM) 1121 and/or cache storage media 1122, and may further include read only computer storage media (ROM) 1123.
The computer storage media 1102 may also include a program/utility 1125 having a set (at least one) of program modules 1124, such program modules 1124 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The AR device 1100 may also communicate with one or more external devices 1104 (e.g., a keyboard, a pointing device, etc.), with one or more devices that enable a user to interact with the AR device 1100, and/or with any device (e.g., a router, a modem, etc.) that enables the AR device 1100 to communicate with one or more other AR devices. Such communication may occur through an input/output (I/O) interface 1105. Also, AR device 1100 may communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the internet, through a network adapter 1106. As shown, the network adapter 1106 communicates with the other modules of AR device 1100 over the bus 1103. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in connection with the AR device 1100, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In some possible embodiments, aspects of the object detection method of an AR device provided by the present invention may also be implemented in the form of a program product, which includes program code for causing a computer device to perform the steps in the object detection method of an AR device according to the various exemplary embodiments of the invention as described above in this specification, when the program product is run on the computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for object detection of an AR device according to embodiments of the present invention may employ a portable compact disc read-only memory (CD-ROM) containing the program code, and may run on the AR device. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's AR device, partly on the user's device as a stand-alone software package, partly on the user's AR device and partly on a remote AR device, or entirely on a remote AR device or server. In the latter case, the remote AR device may be connected to the user's AR device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external AR device (e.g., through the internet using an internet service provider).
It should be noted that although several modules of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. Indeed, the features and functions of two or more modules described above may be embodied in one module in accordance with embodiments of the present invention. Conversely, the features and functions of one module described above may be further divided into a plurality of modules to be embodied.
Furthermore, although the operations of the methods of the present invention are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in that particular order, or that all of the illustrated operations be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage media, CD-ROM, optical storage media, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable computer storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable computer storage medium produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method for detecting an object of an AR device, the method comprising:
for an image containing a target object, captured by a camera of the AR device at any moment, performing target detection on the image by using a target detection algorithm to obtain the scale of the target object at the current moment, wherein the scale of the target object at the current moment comprises the length, the width and the height of the target object; and,
obtaining the pose of the camera in a target coordinate system at the current moment based on the scale of the target object at the current moment and the pose of the camera in a world coordinate system at the current moment, wherein the target coordinate system is a coordinate system corresponding to the target object;
according to the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, obtaining the initial pose of the target object in the camera coordinate system at the current moment, wherein the local pose of the target is obtained based on the scale of the target;
optimizing the initial pose of the target object at the current moment through the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, wherein the pose change amount is the change amount of the pose of the camera in the world coordinate system between the current moment and the last moment.
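For orientation only (the claim wording alone is authoritative), the two pose manipulations in the last two steps of claim 1 can be sketched with 4x4 homogeneous transforms; the helper names are hypothetical:

```python
import numpy as np

def initial_target_pose(T_cam_in_target, T_local):
    """Third step of claim 1: the camera pose in the target coordinate system
    multiplied by the target's local pose (both assumed 4x4 transforms)."""
    return T_cam_in_target @ T_local

def camera_pose_change(T_world_now, T_world_prev):
    """Input to the fourth step of claim 1: relative camera motion in the
    world coordinate system between the last moment and the current moment."""
    return T_world_now @ np.linalg.inv(T_world_prev)
```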
2. The method according to claim 1, wherein, after the initial pose of the target object at the current moment is optimized through the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, the method further comprises:
Optimizing the scale of the target object at the current moment by utilizing the target pose of the target object at the current moment to obtain the target scale of the target object at the current moment;
inputting the target scale of the target object at the current moment and the scale of the target object in the image at the next moment into a Kalman filtering algorithm, and smoothing the scale of the target object in the image at the next moment to obtain a smoothed scale of the target object in the image at the next moment; and determining the smoothed scale as the scale of the target object in the image at the next moment, so as to obtain the target pose of the target object in the camera coordinate system at the next moment based on the scale of the target object in the image at the next moment.
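A non-authoritative sketch of the smoothing in claim 2, treating each of the three scale dimensions as a constant state in a scalar Kalman filter; the noise values q and r are assumptions, as the claim does not specify them:

```python
import numpy as np

class ScaleSmoother:
    """Per-dimension scalar Kalman smoothing of the (length, width, height)
    scale across consecutive frames."""

    def __init__(self, q=1e-4, r=1e-2):
        self.x = None          # smoothed scale estimate
        self.p = np.ones(3)    # per-dimension estimate variance
        self.q, self.r = q, r  # process / measurement noise (assumed values)

    def update(self, measured_scale):
        z = np.asarray(measured_scale, dtype=np.float64)
        if self.x is None:     # first frame: take the measurement as-is
            self.x = z
            return self.x
        p = self.p + self.q                  # predict
        k = p / (p + self.r)                 # gain
        self.x = self.x + k * (z - self.x)   # correct toward the measurement
        self.p = (1.0 - k) * p
        return self.x
```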
3. The method according to claim 2, wherein optimizing the scale of the target object at the current time by using the target pose of the target object at the current time to obtain the target scale of the target object at the current time comprises:
re-projecting the target pose of the target object at the current moment onto the corresponding image to obtain the actual position of the target object in the image; and,
Inputting the target pose of the target object into a pre-trained neural network to obtain the predicted position of the target object in the image;
obtaining an adjustment weight according to the actual position and the predicted position;
and adjusting the scale of the target object at the current moment based on the adjustment weight to obtain the target scale of the target object at the current moment.
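Claim 3 leaves open how the actual and predicted positions are turned into an adjustment weight; one plausible reading, with an assumed inverse-error weighting (alpha is a made-up parameter), is sketched below:

```python
import numpy as np

def adjust_scale(scale, actual_pos, predicted_pos, alpha=0.1):
    """Shrink the detected scale as the reprojected (actual) position drifts
    away from the network-predicted position; the error-to-weight mapping is
    an assumption, not taken from the claim."""
    error = np.linalg.norm(np.asarray(actual_pos) - np.asarray(predicted_pos))
    weight = 1.0 / (1.0 + alpha * error)
    return np.asarray(scale) * weight
```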
4. The method according to claim 1, wherein the obtaining the pose of the camera in the target coordinate system at the current time based on the scale of the target object at the current time and the pose of the camera in the world coordinate system at the current time includes:
inputting the scale of the target object at the current moment and the pose of the camera in the world coordinate system at the current moment into a PnP algorithm to obtain the pose of the camera in the target coordinate system at the current moment;
according to the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, obtaining the initial pose of the target object in the camera coordinate system at the current moment comprises the following steps:
multiplying the pose of the camera in the target coordinate system at the current moment with the local pose of the target object at the current moment to obtain the initial pose of the target object in the camera coordinate system at the current moment.
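A conventional PnP reading of the first step of claim 4, assuming that 2D image positions of the eight bounding-box corners are also available from the detector (the claim itself names only the scale and the camera's world pose as inputs); cv2.solvePnP is OpenCV's standard solver:

```python
import numpy as np
import cv2

def box_corners(length, width, height):
    """Eight corners of the target's bounding box in its own coordinate
    system, centered at the origin (an assumption about the local model)."""
    l, w, h = length / 2, width / 2, height / 2
    return np.array([[x, y, z] for x in (-l, l) for y in (-w, w) for z in (-h, h)],
                    dtype=np.float64)

def camera_pose_in_target(scale, corners_2d, camera_matrix):
    """Recover the camera pose in the target coordinate system from the
    detected scale and the 2D projections of the box corners."""
    ok, rvec, tvec = cv2.solvePnP(box_corners(*scale), corners_2d,
                                  camera_matrix, None)
    if not ok:
        raise RuntimeError("PnP failed")
    R_mat, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R_mat, tvec.ravel()
    return T
```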
5. The method of claim 1, wherein the pose change amount comprises a first pose change amount and a second pose change amount, wherein the first pose change amount is derived based on a pose of the camera in a world coordinate system and the second pose change amount is derived based on a pose of the camera in a target coordinate system;
optimizing the initial pose of the target object at the current moment by the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, wherein the method comprises the following steps:
obtaining an initial weight matrix of the target object at the current moment based on an intermediate weight matrix corresponding to the target object at the previous moment;
obtaining a target weight matrix of the target object at the current moment by using the initial weight matrix of the target object at the current moment;
obtaining a target pose change amount through the target weight matrix, the first pose change amount and the second pose change amount;
and obtaining the target pose of the target object in a camera coordinate system at the current moment according to the target pose change amount and the initial pose.
6. The method of claim 5, wherein the obtaining the initial weight matrix of the target object at the current time based on the intermediate weight matrix corresponding to the target object at the previous time comprises:
Multiplying the intermediate weight matrix with a preset state transition matrix to obtain a first intermediate state transition matrix;
multiplying the first intermediate state transition matrix with the inverse matrix of the preset state transition matrix to obtain a second intermediate state transition matrix;
and adding the second intermediate state transition matrix with a preset first noise matrix to obtain an initial weight matrix of the target object at the current moment.
7. The method according to claim 5 or 6, characterized in that the initial weight matrix is obtained by the following formula:
P_i^- = A P_{i-1} A′ + Q

where P_i^- is the initial weight matrix of the target object at the current moment i, A is the preset state transition matrix, P_{i-1} is the intermediate weight matrix corresponding to the target object at the previous moment i-1, A′ is the inverse matrix of the preset state transition matrix, and Q is the preset first noise matrix.
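Read literally, the formula of claims 6 and 7 is one line of linear algebra; note that the claims define A′ as the inverse of the preset state transition matrix, where a textbook Kalman prediction step would use the transpose. A hedged sketch:

```python
import numpy as np

def initial_weight_matrix(P_prev, A, Q):
    """P_i^- = A * P_{i-1} * A' + Q, per claims 6 and 7, with A' taken as the
    inverse of A following the claims' own definition."""
    return A @ P_prev @ np.linalg.inv(A) + Q
```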
8. The method according to claim 5, wherein the obtaining the target weight matrix of the target object at the current time by using the initial weight matrix of the target object at the current time includes:
the target weight matrix of the target object at the current moment is obtained through the following formula:
K_i = P_i^- H′(H P_i^- H′ + R)^{-1}

where K_i is the target weight matrix of the target object at the current moment i, H is the preset second noise matrix, H′ is the inverse matrix of the second noise matrix, and R is the preset third noise matrix;
the obtaining the target pose change amount through the target weight matrix, the first pose change amount and the second pose change amount includes:
the target pose change amount is obtained through the following formula:
ΔT_i = ΔT_i^2 + K_i(ΔT_i^1 - H ΔT_i^2)

where ΔT_i is the target pose change amount, ΔT_i^2 is the second pose change amount, and ΔT_i^1 is the first pose change amount.
9. The method according to claim 5, wherein the obtaining the target pose of the target object in the camera coordinate system at the current moment according to the target pose variation and the initial pose includes:
and multiplying the target pose change amount by the initial pose to obtain the target pose.
10. An AR device comprising a processor and a memory, said processor and said memory being connected by a bus;
the memory has stored therein a computer program, the processor being configured to perform the following operations based on the computer program:
for an image containing a target object, captured by the camera of the AR device at any moment, performing target detection on the image by using a target detection algorithm to obtain the scale of the target object at the current moment, wherein the scale of the target object at the current moment comprises the length, the width and the height of the target object; and,
obtaining the pose of the camera in a target coordinate system at the current moment based on the scale of the target object at the current moment and the pose of the camera in a world coordinate system at the current moment, wherein the target coordinate system is a coordinate system corresponding to the target object;
according to the pose of the camera in the target coordinate system at the current moment and the local pose of the target object at the current moment, obtaining the initial pose of the target object in the camera coordinate system at the current moment, wherein the local pose of the target is obtained based on the scale of the target;
optimizing the initial pose of the target object at the current moment through the pose change amount of the camera to obtain the target pose of the target object in the camera coordinate system at the current moment, wherein the pose change amount is the change amount of the pose of the camera in the world coordinate system between the current moment and the last moment.
CN202211383028.9A 2022-11-07 2022-11-07 Target detection method of AR equipment and AR equipment Pending CN116092071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211383028.9A CN116092071A (en) 2022-11-07 2022-11-07 Target detection method of AR equipment and AR equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211383028.9A CN116092071A (en) 2022-11-07 2022-11-07 Target detection method of AR equipment and AR equipment

Publications (1)

Publication Number Publication Date
CN116092071A true CN116092071A (en) 2023-05-09

Family

ID=86208997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211383028.9A Pending CN116092071A (en) 2022-11-07 2022-11-07 Target detection method of AR equipment and AR equipment

Country Status (1)

Country Link
CN (1) CN116092071A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination