WO2024042645A1 - Video processing device, video processing method, and video processing program - Google Patents

Video processing device, video processing method, and video processing program

Info

Publication number
WO2024042645A1
WO2024042645A1 (PCT/JP2022/031905)
Authority
WO
WIPO (PCT)
Prior art keywords
sight
line
user
video processing
moving object
Prior art date
Application number
PCT/JP2022/031905
Other languages
French (fr)
Japanese (ja)
Inventor
Makoto MUTO
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to PCT/JP2022/031905
Publication of WO2024042645A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Definitions

  • the present invention relates to a video processing device, a video processing method, and a video processing program.
  • a user using an augmented reality (AR) system can view the real space of the real world through a mobile terminal or an AR device.
  • content such as navigation information or 3D data (hereinafter referred to as AR content) is presented as additional information in the real space. That is, a user using an AR system can see AR content superimposed on the real world and use information about this content.
  • For example, when a user of an AR system is moving on a moving object, the image displayed by the AR device includes a portion of the camera image showing the environment (the scenery in front of the bicycle) and a portion showing the moving object (a part of the bicycle body).
  • In a situation where a moving object similar to the one on which the user is riding runs alongside it, a moving object on which the user is not riding is recognized in addition to the moving object on which the user is riding.
  • As a result, the AR device displays the AR content at the correct position, but also displays AR content at locations corresponding to moving objects on which the user is not riding.
  • This invention was made in view of the above circumstances, and its purpose is to provide a technology that can prevent AR content from being displayed at locations corresponding to moving objects on which the user is not riding, in cases where such moving objects are recognized in addition to the moving object on which the user is riding.
  • One aspect of the present invention is a video processing device worn by a user, comprising: an image acquisition unit that acquires a captured image, taken by a camera, containing a moving body and an environment; a moving body detection unit that detects the moving body from the captured image; a line-of-sight estimation unit that estimates the user's line of sight based on the captured image; a line-of-sight movement detection unit that acquires sensor data from a sensor provided in the video processing device and, based on the sensor data, detects a movement of the user's line of sight starting from the estimated line of sight; a determination unit that determines, based on the detected movement of the user's line of sight, whether the moving body is a moving body that the user is riding; a drawing unit that calculates how the AR content looks based on the estimated movement of the line of sight; and an output control unit that controls the calculated AR content to be displayed.
  • According to one aspect of the present invention, even if a moving object with a similar appearance appears in a captured image, AR content can be displayed only at the location corresponding to the moving object on which the user is riding, which makes it possible to accurately present AR content to the user.
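  • For illustration only, the sketch below shows one way the units enumerated in this aspect could be composed into a per-frame pipeline. It is not part of the disclosure: all class names, method names, and the Gaze structure are hypothetical, and the concrete detection, estimation, and rendering techniques are deliberately left abstract.

        # Minimal Python sketch of how the claimed units might be wired together per frame.
        # Every identifier here is hypothetical; the patent does not define an API.
        from dataclasses import dataclass
        import numpy as np

        @dataclass
        class Gaze:
            origin: np.ndarray     # 3D starting point of the line of sight
            direction: np.ndarray  # unit vector of the line-of-sight direction

        class VideoProcessingPipeline:
            def __init__(self, acquirer, detector, gaze_estimator,
                         gaze_mover, determiner, renderer, output):
                self.acquirer = acquirer          # image acquisition unit
                self.detector = detector          # moving body detection unit
                self.gaze_estimator = gaze_estimator
                self.gaze_mover = gaze_mover      # line-of-sight movement detection unit
                self.determiner = determiner      # determination unit
                self.renderer = renderer          # drawing unit
                self.output = output              # output control unit

            def process_frame(self, sensor_data) -> None:
                image = self.acquirer.acquire()                    # captured image (moving body + environment)
                moving_body = self.detector.detect(image)          # region of the moving body
                gaze: Gaze = self.gaze_estimator.estimate(image, moving_body)
                moved_gaze: Gaze = self.gaze_mover.detect(gaze, sensor_data)
                if self.determiner.is_users_moving_body(moved_gaze):
                    appearance = self.renderer.compute_appearance(moved_gaze)
                    self.output.display(appearance)
                # otherwise nothing is drawn, so AR content never appears
                # at a location corresponding to a moving body the user is not riding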
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device according to an embodiment.
  • FIG. 2 is a block diagram showing the software configuration of the video processing apparatus in the embodiment in relation to the hardware configuration shown in FIG. 1.
  • FIG. 3 is a flowchart illustrating an example of an operation by which the video processing device displays AR content only at the correct position of a captured image.
  • FIG. 4 is a diagram showing an example of a photographed image.
  • FIG. 5 is a diagram illustrating an example when a moving object is detected in a photographed image.
  • FIG. 6 is a diagram showing an example of a "feature point space" and an "AR content space" stored in the space storage unit.
  • FIG. 7 is a diagram illustrating an example of "the range of the starting point of the line of sight with the moving object on which the user is riding as an object" and "the range of the direction of the line of sight with the moving object of the user as the object".
  • FIG. 8 is a diagram showing an example of how the calculated AR content looks.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device 1 according to an embodiment.
  • the video processing device 1 is a computer that analyzes input data, generates and outputs output data.
  • the video processing device 1 may be, for example, an AR device including AR glasses, smart glasses, or other wearable devices. That is, the video processing device 1 may be a device worn and used by a user.
  • the video processing device 1 includes a control section 10, a program storage section 20, a data storage section 30, a communication interface 40, and an input/output interface 50.
  • the control unit 10, program storage unit 20, data storage unit 30, communication interface 40, and input/output interface 50 are communicably connected to each other via a bus.
  • the communication interface 40 may be communicably connected to an external device via a network.
  • the input/output interface 50 is communicably connected to the input device 2, the output device 3, the camera 4, and the inertial sensor 5.
  • the control unit 10 controls the video processing device 1.
  • the control unit 10 includes a hardware processor such as a central processing unit (CPU).
  • the control unit 10 may be an integrated circuit capable of executing various programs.
  • The program storage unit 20 uses, as storage media, a combination of non-volatile memories that can be written to and read from at any time, such as an EPROM (Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), or an SSD (Solid State Drive), and a non-volatile memory such as a ROM (Read Only Memory).
  • the program storage unit 20 stores programs necessary to execute various processes. That is, the control unit 10 can implement various controls and operations by reading and executing programs stored in the program storage unit 20.
  • The data storage unit 30 is a storage that uses, as storage media, a combination of a non-volatile memory that can be written to and read from at any time, such as an HDD or a memory card, and a volatile memory such as a RAM (Random Access Memory).
  • the data storage unit 30 is used to store data acquired and generated while the control unit 10 executes programs and performs various processes.
  • the communication interface 40 includes one or more wired or wireless communication modules.
  • the communication interface 40 includes a communication module that makes a wired or wireless connection to an external device via a network.
  • Communication interface 40 may include a wireless communication module that wirelessly connects to external devices such as Wi-Fi access points and base stations.
  • The communication interface 40 may include a wireless communication module for wirelessly connecting to an external device using short-range wireless technology. That is, the communication interface 40 may be any general communication interface as long as it is capable of communicating with an external device under the control of the control unit 10 and transmitting and receiving various information including past performance data.
  • the input/output interface 50 is connected to the input device 2, output device 3, camera 4, inertial sensor 5, etc.
  • the input/output interface 50 is an interface that allows information to be transmitted and received between the input device 2, the output device 3, and the plurality of cameras 4 and inertial sensors 5.
  • the input/output interface 50 may be integrated with the communication interface 40.
  • For example, the video processing device 1 and at least one of the input device 2, the output device 3, the camera 4, and the inertial sensor 5 may be wirelessly connected using short-range wireless technology or the like, and information may be sent and received using that short-range wireless technology.
  • the input device 2 may include, for example, a keyboard, a pointing device, etc. for the user to input various information including past performance data to the video processing device 1.
  • The input device 2 may also include a reader for reading data to be stored in the program storage section 20 or the data storage section 30 from a memory medium such as a USB memory, and a disk device for reading such data from a disk medium.
  • the output device 3 includes a display that displays images captured by the camera 4, AR content, and the like.
  • the output device 3 may be integrated with the video processing device 1.
  • For example, if the video processing device 1 is AR glasses or smart glasses, the output device 3 is the glass portion of the glasses.
  • the camera 4 is capable of photographing environments such as landscapes, and may be a general camera 4 that can be attached to the video processing device 1.
  • the environment generally refers to the scenery that is photographed.
  • the camera 4 may be integrated with the video processing device 1.
  • the camera 4 may output the captured image to the control unit 10 of the video processing device 1 through the input/output interface 50.
  • the inertial sensor 5 includes, for example, an acceleration sensor, an angular velocity sensor, a geomagnetic sensor, and the like.
  • the inertial sensor 5 senses the moving speed and head movement of the user wearing the AR device, and outputs sensor data according to the sensing to the control unit 10.
  • FIG. 2 is a block diagram showing the software configuration of the video processing device 1 in the embodiment in relation to the hardware configuration shown in FIG. 1.
  • The control unit 10 includes an image acquisition unit 101, a moving object detection unit 102, a feature point extraction unit 103, a line-of-sight estimation unit 104, a line-of-sight movement estimation unit 105, a sensor data control unit 106, a target object determination unit 107, an AR content drawing unit 108, and an output control unit 109.
  • the image acquisition unit 101 acquires a photographed image taken by the camera 4. Note that the image acquisition unit 101 may store the captured image in the image storage unit 301.
  • the moving object detection unit 102 detects a moving object from the captured image.
  • the moving object detection unit 102 detects the moving object MB appearing in the photographed image.
  • the moving object may be any arbitrary object such as a motorized bicycle, an electric bicycle, a motorcycle, or a vehicle.
  • the moving object may include only the moving object, or may include a part of the body of the user riding the moving object, such as an arm.
  • a general technique may be used as the detection method.
  • the moving object detection unit 102 may delete from the photographed image a portion of the environment that is not the detected moving object.
  • the feature point extraction unit 103 extracts feature points from the moving object in the captured image.
  • the feature point extracting unit 103 may extract as feature points those located near the feature point space stored in the space storage unit 302, which will be described later.
  • the line of sight estimation unit 104 estimates the position of the line of sight by comparing the feature points extracted by the feature point extraction unit 103 with the feature point space stored in the spatial storage unit 302. Note that details of the method for estimating the position of the line of sight will be described later.
  • the line-of-sight movement estimation unit 105 estimates the line-of-sight movement.
  • The line-of-sight movement estimation unit 105 estimates a line-of-sight movement that follows the movement of the user's head by moving the line-of-sight position received from the line-of-sight estimation unit 104, as a starting point, by the three-dimensional movement measured by the sensor data control unit 106 (described later).
  • That is, the line-of-sight movement estimating unit 105 estimates the movement of the user's line of sight, starting from the line-of-sight position estimated by the line-of-sight estimation unit 104, based on the sensor data.
  • the sensor data control unit 106 acquires sensor data from the inertial sensor 5. Then, the sensor data control unit 106 measures the user's head movement, body movement, etc. from the acquired sensor data. For example, the sensor data control unit 106 measures the user's three-dimensional movement (for example, the user's head movement) based on the sensor data.
  • The target object determination unit 107 determines whether the target object is a moving object on which the user is riding. For example, the target object determination unit 107 determines, based on the detected movement of the user's line of sight, whether the moving object is the moving object on which the user is riding. Specifically, the target object determination unit 107 determines that the moving object is the moving object on which the user is riding if the starting point of the detected user's line of sight is within a predetermined range and the direction of the detected user's line of sight falls within the range in which the moving object on which the user rides is visible from that predetermined range.
  • The predetermined range may be a cuboid range that is higher than the seating position of the moving object by the average sitting height of a person and that takes into consideration the posture in which the user rides.
  • the AR content drawing unit 108 calculates how the AR content looks.
  • The AR content drawing unit 108 sets the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the movement destination line of sight, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content will appear in the set space.
  • the output control unit 109 outputs AR content information.
  • the output control unit 109 controls the output device 3 to draw AR content.
  • the output control unit 109 controls the adjusted AR content to be displayed on AR glasses or the like.
  • the data storage unit 30 includes an image storage unit 301 and a spatial storage unit 302.
  • the image storage unit 301 may store captured images acquired by the image acquisition unit 101.
  • the captured image stored in the image storage unit 301 may have information about the longitude and latitude in the real world where the captured image was captured, which is acquired by the video processing device 1. Further, the image storage unit 301 may automatically delete the captured image after a predetermined period of time has passed.
  • the space storage unit 302 stores a feature point space and an AR content space corresponding to the feature point space.
  • the feature point space may be located at a preset position in the photographed image, and may be set at each part of the moving body, for example. Then, an AR content space corresponding to the feature point space may be set.
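  • As a rough illustration, the space storage unit 302 can be thought of as holding two aligned data sets: 3D feature points with descriptors, and AR content anchored in the same coordinate frame. The sketch below is an assumption about one possible layout; the patent does not specify data structures or field names.

        # Hypothetical layout for the "feature point space" and the corresponding
        # "AR content space" kept in the space storage unit 302.
        from dataclasses import dataclass, field
        from typing import List
        import numpy as np

        @dataclass
        class FeaturePoint:
            position: np.ndarray    # 3D position, e.g. a point on the bicycle handlebar
            descriptor: np.ndarray  # appearance descriptor used when matching captured images

        @dataclass
        class ARContentItem:
            anchor: np.ndarray      # 3D anchor pose in the same frame as the feature points
            model: str              # e.g. "speedometer"

        @dataclass
        class SpaceStorage:
            feature_point_space: List[FeaturePoint] = field(default_factory=list)
            ar_content_space: List[ARContentItem] = field(default_factory=list)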
  • Furthermore, the video processing device 1 estimates the user's line of sight (position and direction) by comparing the extracted feature points with data in which feature points have been extracted from a captured image taken in advance (for example, the captured image of the previous frame or of a frame several frames earlier) (hereinafter referred to as the feature point space).
  • The feature point space is a space constructed for a "surrounding space" set based on a predetermined usage scene. The captured image is therefore also an image of that surrounding space, and it is assumed that the positional relationships of both the feature point space and the captured image are identified based on the "surrounding space".
  • the video processing device 1 tracks the movement of the user's head based on the inertial data received from the inertial sensor 5, and tracks the user's line of sight described above in more real time.
  • the video processing device 1 positions the user's line of sight, which is being followed in real time, on an AR content space created in advance, and calculates how the AR content looks from there.
  • the video processing device 1 causes the output device 3 to draw the calculated appearance.
  • the AR system generates AR content and causes the video processing device 1, such as a smartphone or AR glasses, to display the AR content.
  • However, with this method, if multiple moving objects appear in the image captured by the camera 4, the video processing device 1 displays the AR content at the correct position but may also display AR content at locations corresponding to moving objects on which the user is not riding.
  • the operation of the video processing device 1 for displaying AR content at the correct position even in a captured image in which multiple moving objects are captured will be described below.
  • FIG. 3 is a flowchart illustrating an example of an operation by which the video processing device 1 displays AR content only at the correct position of a captured image.
  • the operation of this flowchart is realized by the control unit 10 of the video processing device 1 reading and executing the program stored in the program storage unit 20.
  • This operation flow is started, for example, when the control unit 10 outputs an instruction to display AR content because the user has input an instruction requesting AR content display or because a predetermined condition has been satisfied.
  • this operation flow may be started when the video processing device 1 is activated and the camera 4 acquires a photographed image. Further, in this operation, it is assumed that the moving object is a bicycle.
  • step ST101 the image acquisition unit 101 acquires a photographed image taken by the camera 4.
  • the image acquisition unit 101 may store the captured image in the image storage unit 301.
  • the photographed image includes the environment and a moving object.
  • the environment may be a general landscape, as described above. Therefore, the environment refers to the part excluding the moving object.
  • FIG. 4 is a diagram showing an example of a photographed image.
  • In the example of FIG. 4, the image is taken while the user is riding and driving a bicycle, which is a moving object. The photographed image therefore includes the moving object and the environment.
  • the camera 4 is a camera 4 included in the AR glasses that are the video processing device 1, and the photographed image is taken by this camera 4.
  • In step ST102, the moving object detection unit 102 detects the moving object MB from the captured image.
  • the moving body detection unit 102 may perform object detection using a general method to detect the moving body MB.
  • As the object detection method, a method such as the one disclosed in Non-Patent Document 2 may be used; a detailed explanation of the object detection method is therefore omitted here.
  • The moving object detection unit 102 leaves the portions detected as the moving body MB and deletes the other portions. That is, the moving object detection unit 102 deletes the environment portion of the captured image.
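  • A minimal sketch of this step, assuming a generic instance-segmentation detector that returns a boolean mask of the moving body (the patent only says that a general object detection method, such as the one in Non-Patent Document 2, may be used): everything outside the detected moving body MB is blanked out.

        import numpy as np

        def remove_environment(image: np.ndarray, detector) -> np.ndarray:
            """Keep only the pixels detected as the moving body MB; delete the rest.

            `detector` stands for any generic detection/segmentation model that
            returns a boolean mask of the moving body (including the rider's arms);
            its interface is assumed for illustration.
            """
            mask = detector.segment_moving_body(image)  # True where the moving body is
            masked = image.copy()
            masked[~mask] = 0                           # delete the environment portion
            return masked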
  • FIG. 5 is a diagram illustrating an example when a moving body MB is detected in a photographed image.
  • a bicycle and a user are detected as mobile objects MB. That is, the portion detected as the mobile body MB includes the user's arm in addition to the bicycle handle and wheels. Then, as shown in FIG. 5, the environment portion is deleted.
  • In step ST103, the line-of-sight estimation unit 104 estimates the user's line of sight.
  • a “feature point space” in which each part of a bicycle, which is a mobile object MB, is a feature point is stored in the space storage unit 302.
  • an "AR content space” in which a speedometer is arranged as an AR content at the center of the handle part in the feature point space is also stored in the space storage unit 302.
  • FIG. 6 is a diagram showing an example of the "feature point space” and the "AR content space” stored in the space storage unit 302.
  • the "feature point space" is indicated by the reference symbol CP, and the AR content is indicated by the reference symbol ARC.
  • FIG. 6 is an example, and it goes without saying that the space storage unit 302 may store a plurality of such spaces.
  • the feature point extraction unit 103 extracts feature points from the captured image from which the environment portion has been deleted in step ST102. That is, the feature point extraction unit 103 extracts feature points within the mobile body MB.
  • the feature point extraction method may be a general method.
  • For example, the feature point extraction unit 103 looks over the entire captured image and extracts specific features such as boundaries (edges) of objects or corners of objects.
  • The specific feature may be, for example, a boundary between objects. In the example of FIG. 6, the boundary of the mobile body MB can be extracted as a feature point.
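  • As one concrete example of such a general feature extractor, ORB corner detection from OpenCV could be run on the masked image; this particular choice is an assumption, not something prescribed by the patent.

        import cv2

        def extract_feature_points(masked_image):
            """Extract corner-like feature points from the moving-body region.

            ORB is used only as one example of a general extractor; the patent
            does not prescribe a specific feature type.
            """
            gray = cv2.cvtColor(masked_image, cv2.COLOR_BGR2GRAY)
            orb = cv2.ORB_create(nfeatures=500)
            keypoints, descriptors = orb.detectAndCompute(gray, None)
            return keypoints, descriptors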
  • the line of sight estimating unit 104 estimates the position of the line of sight by comparing the feature points extracted by the feature point extracting unit 103 with the feature point space stored in the spatial storage unit 302.
  • the line of sight estimation unit 104 may estimate the line of sight using vision-based AR technology or the like.
  • The vision-based AR technology may be a general technology, for example a markerless AR technology such as PTAM, SmartAR, or Microsoft HoloLens; a detailed explanation of the AR technology is therefore omitted here.
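  • One possible realization of this vision-based estimation is to match the extracted 2D features against the stored 3D feature point space and solve a perspective-n-point problem for the camera (gaze) pose, as sketched below. The matcher, the pose solver, and the assumed feature_point_space fields (descriptors, positions) are illustrative assumptions, not the patent's prescription.

        import cv2
        import numpy as np

        def estimate_gaze(keypoints, descriptors, feature_point_space, camera_matrix):
            """Estimate the gaze (camera pose) by matching 2D features against the
            stored 3D feature point space and solving a PnP problem.

            `feature_point_space.descriptors` and `.positions` are assumed fields;
            the matcher and solver can be swapped for other general methods.
            """
            matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
            matches = matcher.match(descriptors, feature_point_space.descriptors)
            pts_2d = np.float32([keypoints[m.queryIdx].pt for m in matches])
            pts_3d = np.float32([feature_point_space.positions[m.trainIdx] for m in matches])
            ok, rvec, tvec, _ = cv2.solvePnPRansac(pts_3d, pts_2d, camera_matrix, None)
            if not ok:
                return None
            rot, _ = cv2.Rodrigues(rvec)
            origin = (-rot.T @ tvec).ravel()               # gaze starting point in the feature-point frame
            direction = rot.T @ np.array([0.0, 0.0, 1.0])  # camera optical axis taken as the gaze direction
            return origin, direction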
  • the line-of-sight estimating unit 104 outputs the estimated position of the line-of-sight to the line-of-sight movement estimation unit 105.
  • the line-of-sight movement estimation unit 105 estimates the line-of-sight movement.
  • the sensor data control unit 106 acquires sensor data from the inertial sensor 5. Then, the sensor data control unit 106 measures the user's head movement, body movement, etc. from the acquired sensor data.
  • the inertial sensor 5 is an inertial measurement unit (IMU), and the sensor data control unit 106 acquires sensor data such as acceleration, angular velocity, and geomagnetism from the inertial sensor 5.
  • the sensor data control unit 106 may then measure the user's three-dimensional movement (for example, the user's head movement) based on these data. Then, the sensor data control unit 106 outputs the measurement result to the line of sight movement estimation unit 105.
  • The line-of-sight movement estimating unit 105 estimates a line-of-sight movement that follows the movement of the user's head by moving the line-of-sight position received from the line-of-sight estimation unit 104, as a starting point, by the three-dimensional movement measured by the sensor data control unit 106.
  • That is, the line-of-sight movement estimating unit 105 estimates the movement of the user's line of sight, starting from the line-of-sight position estimated by the line-of-sight estimation unit 104, based on the sensor data. The line-of-sight movement estimation unit 105 then outputs line-of-sight movement information including the estimated line-of-sight movement to the target object determination unit 107.
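  • A minimal sketch of this update, assuming the sensor data control unit 106 has already turned the raw acceleration/angular-velocity data into an incremental head rotation and translation since the last vision-based estimate (how that measurement is obtained is outside this sketch).

        import numpy as np

        def update_gaze(origin, direction, head_rotation, head_translation):
            """Move the previously estimated line of sight by the measured head motion.

            `head_rotation` (3x3) and `head_translation` (3,) represent the motion
            measured by the sensor data control unit 106 since the last vision-based
            estimate; their derivation from the IMU data is not shown here.
            """
            new_origin = origin + head_translation       # starting point follows the head position
            new_direction = head_rotation @ direction    # direction follows the head orientation
            new_direction = new_direction / np.linalg.norm(new_direction)
            return new_origin, new_direction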
  • The target object determination unit 107 determines whether the target object is the mobile body MB on which the user is riding. For example, the target object determination unit 107 determines, based on the detected movement of the user's line of sight, whether the mobile body MB is a moving body on which the user is riding. Specifically, the determination is as follows.
  • The target object determination unit 107 defines in advance the range where the head of the user riding the mobile body MB is located as "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object". Specifically, this range is defined as a cuboid range that is higher than the seating position of the mobile body MB (for example, the saddle of a bicycle) by the average sitting height of a typical person and that takes into account the posture of the user while riding.
  • In addition, the range of line-of-sight directions in which the user can see the moving body MB on which the user is riding is defined as "the range of line-of-sight directions with the user's mobile body MB as the target object". For example, this range is defined by a cone within which the mobile body MB is visible from the user's head.
  • The target object determination unit 107 determines whether the starting point of the line-of-sight movement estimated by the line-of-sight movement estimation unit 105 in step ST105 (that is, the starting point of the line of sight after the movement) is included in "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object", and whether the direction of the line of sight (that is, the direction of the line of sight after the movement) is included in "the range of line-of-sight directions with the user's mobile body MB as the target object".
  • If both are included, the target object determination unit 107 determines that the mobile body MB is the moving body on which the user is riding, outputs the line-of-sight movement information to the AR content drawing unit 108, and the process proceeds to step ST106.
  • Otherwise, the process ends. That is, AR content is not displayed at a location corresponding to a mobile body MB on which the user is determined not to be riding.
  • FIG. 7 is a diagram showing an example of "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object" and "the range of line-of-sight directions with the user's mobile body MB as the target object".
  • In FIG. 7, "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object" is indicated by the reference symbol Ori, and "the range of line-of-sight directions with the user's mobile body MB as the target object" is indicated by the reference symbol Di.
  • The line of sight in (a) of FIG. 7 is an example in which the starting point of the line of sight is included in "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object" and the direction of the line of sight is included in "the range of line-of-sight directions with the mobile body MB as the target object".
  • In the line of sight in (b1) of FIG. 7, the starting point of the line of sight is included in "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object", but the direction of the line of sight is not included in "the range of line-of-sight directions with the user's mobile body MB as the target object".
  • In the line of sight in (b2) of FIG. 7, the starting point of the line of sight is not included in "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object", although the direction of the line of sight is included in "the range of line-of-sight directions with the user's mobile body MB as the target object".
  • Therefore, the line of sight in (a) of FIG. 7 is determined to be the user's line of sight, while those in (b1) and (b2) are determined not to be the user's line of sight.
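  • The determination above can be sketched as a simple geometric test: the starting point of the moved line of sight must fall inside the cuboid range Ori, and its direction must fall inside the cone Di, as illustrated in FIG. 7. The cuboid and cone parameters below are illustrative placeholders, not values given in the patent.

        import numpy as np

        def is_users_moving_body(origin, direction,
                                 box_min, box_max,
                                 cone_axis, cone_half_angle):
            """Return True only when the moved line of sight matches the user's own moving body.

            box_min/box_max bound the cuboid range Ori of line-of-sight starting points
            above the seat; cone_axis and cone_half_angle describe the cone Di of directions
            in which the moving body MB is visible from the user's head. All parameters
            are illustrative.
            """
            origin_ok = bool(np.all(origin >= box_min) and np.all(origin <= box_max))

            gaze_dir = direction / np.linalg.norm(direction)
            axis = cone_axis / np.linalg.norm(cone_axis)
            angle = np.arccos(np.clip(np.dot(gaze_dir, axis), -1.0, 1.0))
            direction_ok = angle <= cone_half_angle

            return origin_ok and direction_ok  # (a) in FIG. 7 passes; (b1) and (b2) fail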
  • In step ST106, the AR content drawing unit 108 calculates how the AR content looks.
  • The AR content drawing unit 108 sets the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the movement destination line of sight, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content will appear in the set space.
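  • As a rough sketch of this calculation, the AR content anchored in the AR content space can be projected into the view defined by the moved line of sight using a standard pinhole projection; the drawing unit 108 would of course do more than this bare projection, and the camera matrix and pose conventions here are assumptions.

        import cv2
        import numpy as np

        def compute_ar_appearance(ar_points_3d, gaze_rotation, gaze_origin, camera_matrix):
            """Project AR content (e.g. the speedometer at the handlebar centre) into the
            view defined by the moved line of sight.

            `gaze_rotation` (3x3) and `gaze_origin` (3,) give the gaze pose in the AR
            content space; this bare pinhole projection only illustrates the geometry.
            """
            rvec, _ = cv2.Rodrigues(gaze_rotation.T)             # world-to-camera rotation
            tvec = (-gaze_rotation.T @ gaze_origin).reshape(3, 1)
            image_points, _ = cv2.projectPoints(
                np.float32(ar_points_3d), rvec, np.float32(tvec), camera_matrix, None)
            return image_points.reshape(-1, 2)                   # 2D positions at which to draw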
  • FIG. 8 is a diagram showing an example of how the calculated AR content looks.
  • The AR content drawing unit 108 adjusts the appearance of the AR content based on the line-of-sight movement information, and outputs AR content information for drawing the adjusted AR content to the output control unit 109.
  • In step ST107, the output control unit 109 outputs the AR content information.
  • the output control unit 109 controls the output device 3 to draw AR content.
  • the output control unit 109 controls the adjusted AR content to be displayed on AR glasses or the like.
  • As described above, the video processing device 1 can display AR content only at the location corresponding to the mobile body MB on which the user is riding, even if a mobile body MB with a similar appearance is included in the captured image. This allows the video processing device 1 to accurately present AR content to the user.
  • Note that the camera that takes the photographed image is not limited to the camera 4 included in the video processing device 1.
  • For example, it may be an independent camera connected to the video processing device 1.
  • In that case, the camera 4 is installed at a location (for example, above the user's head) from which it can capture an image that allows the user's line of sight to be estimated.
  • The method described in the above embodiments can be stored, as a program (software means) that can be executed by a computer, on a recording medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, MO, etc.), or a semiconductor memory (ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via a communication medium.
  • The programs stored on the medium side also include a setting program for constructing, in the computer, the software means (including not only execution programs but also tables and data structures) to be executed by the computer.
  • a computer that realizes this device reads a program stored in a storage medium, and if necessary, constructs software means using a setting program, and executes the above-described processing by controlling the operation of the software means.
  • the storage medium referred to in this specification is not limited to those for distribution, and includes storage media such as magnetic disks and semiconductor memories provided inside computers or devices connected via a network.
  • the present invention is not limited to the above-described embodiments, and various modifications can be made at the implementation stage without departing from the spirit of the invention. Moreover, each embodiment may be implemented in combination as appropriate as possible, and in that case, the combined effects can be obtained. Further, the embodiments described above include inventions at various stages, and various inventions can be extracted by appropriately combining the plurality of disclosed constituent elements.
  • 1 ... Video processing device, 2 ... Input device, 3 ... Output device, 4 ... Camera, 5 ... Inertial sensor, 10 ... Control unit, 101 ... Image acquisition unit, 102 ... Moving object detection unit, 103 ... Feature point extraction unit, 104 ... Line-of-sight estimation unit, 105 ... Line-of-sight movement estimation unit, 106 ... Sensor data control unit, 107 ... Object determination unit, 108 ... AR content drawing unit, 109 ... Output control unit, 20 ... Program storage unit, 30 ... Data storage unit, 301 ... Image storage unit, 302 ... Space storage unit, 40 ... Communication interface, 50 ... Input/output interface, MB ... Mobile body

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A user-worn video processing device according to one embodiment of the present invention comprises: an image acquisition unit that acquires a captured image captured by a camera, the captured image containing a moving body and an environment; a moving body detection unit that detects the moving body from the captured image; a line-of-sight estimation unit that estimates the line of sight of a user on the basis of the captured image; a line-of-sight movement detection unit that acquires sensor data from a sensor provided in the video processing device and detects a movement of the line of sight of the user with respect to the estimated line of sight as a start point on the basis of the sensor data; a determination unit that determines, on the basis of the detected movement of the line of sight of the user, whether the moving body is a moving body that the user is riding; a drawing unit that computes the appearance of an AR content on the basis of the estimated movement of the line of sight; and an output control unit that causes the computed AR content to be displayed.

Description

Video processing device, video processing method, and video processing program
The present invention relates to a video processing device, a video processing method, and a video processing program.
A user using an augmented reality (AR) system can view the real space of the real world through a mobile terminal or an AR device. At this time, content such as navigation information or 3D data (hereinafter referred to as AR content) is presented as additional information in the real space. That is, a user using an AR system sees AR content superimposed on the real world and can use the information of this content.
For example, when a user of an AR system is moving on a moving object, the image displayed by the AR device includes a portion of the camera image showing the environment (the scenery in front of the bicycle) and a portion showing the moving object (a part of the bicycle body).
In a situation where a moving object similar to the one on which the user is riding runs alongside it, a moving object on which the user is not riding is also recognized in addition to the moving object on which the user is riding. As a result, there is a problem in that the AR device displays the AR content at the correct position but also displays AR content at locations corresponding to moving objects on which the user is not riding.
This invention was made in view of the above circumstances, and its purpose is to provide a technology that can prevent AR content from being displayed at locations corresponding to moving objects on which the user is not riding, in cases where such moving objects are recognized in addition to the moving object on which the user is riding.
In order to solve the above problems, one aspect of the present invention is a video processing device worn by a user, comprising: an image acquisition unit that acquires a captured image, taken by a camera, containing a moving body and an environment; a moving body detection unit that detects the moving body from the captured image; a line-of-sight estimation unit that estimates the user's line of sight based on the captured image; a line-of-sight movement detection unit that acquires sensor data from a sensor provided in the video processing device and, based on the sensor data, detects a movement of the user's line of sight starting from the estimated line of sight; a determination unit that determines, based on the detected movement of the user's line of sight, whether the moving body is a moving body that the user is riding; a drawing unit that calculates how the AR content looks based on the estimated movement of the line of sight; and an output control unit that controls the calculated AR content to be displayed.
According to one aspect of the present invention, even if a moving object with a similar appearance appears in a captured image, AR content can be displayed only at the location corresponding to the moving object on which the user is riding, which makes it possible to accurately present AR content to the user.
FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device according to an embodiment.
FIG. 2 is a block diagram showing the software configuration of the video processing apparatus in the embodiment in relation to the hardware configuration shown in FIG. 1.
FIG. 3 is a flowchart illustrating an example of an operation by which the video processing device displays AR content only at the correct position of a captured image.
FIG. 4 is a diagram showing an example of a photographed image.
FIG. 5 is a diagram illustrating an example when a moving object is detected in a photographed image.
FIG. 6 is a diagram showing an example of the "feature point space" and the "AR content space" stored in the space storage unit.
FIG. 7 is a diagram showing an example of "the range of the starting point of the line of sight with the moving object on which the user is riding as the target object" and "the range of line-of-sight directions with the user's moving object as the target object".
FIG. 8 is a diagram showing an example of how the calculated AR content looks.
Hereinafter, embodiments according to the present invention will be described with reference to the drawings. Note that, hereinafter, elements that are the same as or similar to elements that have already been explained are given the same or similar reference numerals, and overlapping explanations are basically omitted. For example, when there are multiple identical or similar elements, a common reference numeral may be used to explain them without distinction, or branch numbers may be added to the common numeral to distinguish them.
[Embodiment]
(Configuration)
FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device 1 according to an embodiment.
The video processing device 1 is a computer that analyzes input data and generates and outputs output data. The video processing device 1 may be, for example, an AR device including AR glasses, smart glasses, or another wearable device. That is, the video processing device 1 may be a device worn and used by a user.
As shown in FIG. 1, the video processing device 1 includes a control section 10, a program storage section 20, a data storage section 30, a communication interface 40, and an input/output interface 50. The control unit 10, program storage unit 20, data storage unit 30, communication interface 40, and input/output interface 50 are communicably connected to each other via a bus. Further, the communication interface 40 may be communicably connected to an external device via a network. The input/output interface 50 is communicably connected to the input device 2, the output device 3, the camera 4, and the inertial sensor 5.
The control unit 10 controls the video processing device 1. The control unit 10 includes a hardware processor such as a central processing unit (CPU). For example, the control unit 10 may be an integrated circuit capable of executing various programs.
The program storage unit 20 uses, as storage media, a combination of non-volatile memories that can be written to and read from at any time, such as an EPROM (Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), or an SSD (Solid State Drive), and a non-volatile memory such as a ROM (Read Only Memory). The program storage unit 20 stores programs necessary to execute various processes. That is, the control unit 10 can implement various controls and operations by reading and executing programs stored in the program storage unit 20.
The data storage unit 30 is a storage that uses, as storage media, a combination of a non-volatile memory that can be written to and read from at any time, such as an HDD or a memory card, and a volatile memory such as a RAM (Random Access Memory). The data storage unit 30 is used to store data acquired and generated while the control unit 10 executes programs and performs various processes.
The communication interface 40 includes one or more wired or wireless communication modules. For example, the communication interface 40 includes a communication module that makes a wired or wireless connection to an external device via a network. The communication interface 40 may include a wireless communication module that wirelessly connects to external devices such as Wi-Fi access points and base stations. Furthermore, the communication interface 40 may include a wireless communication module for wirelessly connecting to an external device using short-range wireless technology. That is, the communication interface 40 may be any general communication interface as long as it is capable of communicating with an external device under the control of the control unit 10 and transmitting and receiving various information including past performance data.
The input/output interface 50 is connected to the input device 2, the output device 3, the camera 4, the inertial sensor 5, and the like. The input/output interface 50 is an interface that allows information to be transmitted and received between the input device 2, the output device 3, the cameras 4, and the inertial sensor 5. The input/output interface 50 may be integrated with the communication interface 40. For example, the video processing device 1 and at least one of the input device 2, the output device 3, the camera 4, and the inertial sensor 5 may be wirelessly connected using short-range wireless technology or the like, and information may be sent and received using that short-range wireless technology.
The input device 2 may include, for example, a keyboard, a pointing device, or the like for the user to input various information including past performance data to the video processing device 1. The input device 2 may also include a reader for reading data to be stored in the program storage section 20 or the data storage section 30 from a memory medium such as a USB memory, and a disk device for reading such data from a disk medium.
The output device 3 includes a display that displays images captured by the camera 4, AR content, and the like. The output device 3 may be integrated with the video processing device 1. For example, if the video processing device 1 is AR glasses or smart glasses, the output device 3 is the glass portion of the glasses.
The camera 4 is capable of photographing environments such as landscapes, and may be a general camera that can be attached to the video processing device 1. Here, the environment generally refers to the scenery that is photographed. The camera 4 may be integrated with the video processing device 1. The camera 4 may output the captured image to the control unit 10 of the video processing device 1 through the input/output interface 50.
The inertial sensor 5 includes, for example, an acceleration sensor, an angular velocity sensor, a geomagnetic sensor, and the like. For example, when the video processing device 1 is an AR device, the inertial sensor 5 senses the moving speed and head movement of the user wearing the AR device, and outputs sensor data according to the sensing to the control unit 10.
FIG. 2 is a block diagram showing the software configuration of the video processing device 1 in the embodiment in relation to the hardware configuration shown in FIG. 1.
The control unit 10 includes an image acquisition unit 101, a moving object detection unit 102, a feature point extraction unit 103, a line-of-sight estimation unit 104, a line-of-sight movement estimation unit 105, a sensor data control unit 106, a target object determination unit 107, an AR content drawing unit 108, and an output control unit 109.
The image acquisition unit 101 acquires a photographed image taken by the camera 4. Note that the image acquisition unit 101 may store the captured image in the image storage unit 301.
The moving object detection unit 102 detects a moving object from the captured image. The moving object detection unit 102 detects the moving object MB appearing in the photographed image. Of course, the moving object may be any arbitrary object such as a motorized bicycle, an electric bicycle, a motorcycle, or a vehicle. Further, the detected moving object may include only the moving object itself, or may also include a part of the body of the user riding the moving object, such as an arm. A general technique may be used as the detection method. Furthermore, the moving object detection unit 102 may delete from the photographed image the portion of the environment that is not the detected moving object.
The feature point extraction unit 103 extracts feature points from the moving object in the captured image. For example, the feature point extraction unit 103 may extract, as feature points, those located near the feature point space stored in the space storage unit 302, which will be described later.
The line-of-sight estimation unit 104 estimates the position of the line of sight by comparing the feature points extracted by the feature point extraction unit 103 with the feature point space stored in the space storage unit 302. Details of the method for estimating the position of the line of sight will be described later.
The line-of-sight movement estimation unit 105 estimates the line-of-sight movement. The line-of-sight movement estimation unit 105 estimates a line-of-sight movement that follows the movement of the user's head by moving the line-of-sight position received from the line-of-sight estimation unit 104, as a starting point, by the three-dimensional movement measured by the sensor data control unit 106 (described later). That is, the line-of-sight movement estimation unit 105 estimates the movement of the user's line of sight starting from the line-of-sight position estimated by the line-of-sight estimation unit 104, based on the sensor data.
The sensor data control unit 106 acquires sensor data from the inertial sensor 5. The sensor data control unit 106 then measures the user's head movement, body movement, and the like from the acquired sensor data. For example, the sensor data control unit 106 measures the user's three-dimensional movement (for example, the user's head movement) based on the sensor data.
The target object determination unit 107 determines whether the target object is a moving object on which the user is riding. For example, the target object determination unit 107 determines, based on the detected movement of the user's line of sight, whether the moving object is the moving object on which the user is riding. Specifically, the target object determination unit 107 determines that the moving object is the moving object on which the user is riding if the starting point of the detected user's line of sight is within a predetermined range and the direction of the detected user's line of sight falls within the range in which the moving object on which the user rides is visible from that predetermined range. The predetermined range may be a cuboid range that is higher than the seating position of the moving object by the average sitting height of a person and that takes into consideration the posture in which the user rides.
The AR content drawing unit 108 calculates how the AR content looks. The AR content drawing unit 108 sets the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the movement destination line of sight, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content will appear in the set space.
The output control unit 109 outputs the AR content information. The output control unit 109 controls the output device 3 to draw the AR content. For example, the output control unit 109 controls the adjusted AR content to be displayed on AR glasses or the like.
The data storage unit 30 includes an image storage unit 301 and a space storage unit 302.
The image storage unit 301 may store the captured images acquired by the image acquisition unit 101. Here, a captured image stored in the image storage unit 301 may have information, acquired by the video processing device 1, about the longitude and latitude in the real world at which the captured image was captured. Further, the image storage unit 301 may automatically delete a captured image after a predetermined period of time has passed.
The space storage unit 302 stores a feature point space and an AR content space corresponding to the feature point space. The feature point space may be located at a preset position in the photographed image and may be set, for example, at each part of the moving body. An AR content space corresponding to the feature point space may then be set.
 (動作) 
 最初に、一般的なARシステムにおいて、ユーザが使用する映像処理装置1(携帯端末またはARデバイス)にARコンテンツを表示する方法について説明する。
(motion)
First, a method for displaying AR content on the video processing device 1 (mobile terminal or AR device) used by the user in a general AR system will be described.
 映像処理装置1は、カメラ4が撮影した撮影画像から、特徴点を抽出する。さらに、映像処理装置1は、予め撮影された撮影画像(例えば、1つ前のフレームまたは数フレーム分だけ前のフレームの撮影画像)からも特徴点を抽出したデータ(以下、特徴点空間と称する)と、抽出した特徴点とを突合することにより、ユーザの視線(位置および方向)を推定する。ここで、特徴点空間は、予め決められた利用シーンに基づいて設定された「周辺空間」に対して構築された空間である。そのため、撮影画像も当該周辺空間を撮影したものになる。そのため、特徴点空間および撮影画像の両方とも「周辺空間」に基づいて位置関係を同定していること前提とする。 The video processing device 1 extracts feature points from the image taken by the camera 4. Furthermore, the video processing device 1 also extracts feature points from a captured image that has been captured in advance (for example, a captured image of the previous frame or several frames before the image) (hereinafter referred to as feature point space). ) and the extracted feature points to estimate the user's line of sight (position and direction). Here, the feature point space is a space constructed for a "surrounding space" set based on a predetermined usage scene. Therefore, the captured image is also a captured image of the surrounding space. Therefore, it is assumed that the positional relationships of both the feature point space and the captured image are identified based on the "surrounding space."
 The video processing device 1 tracks the movement of the user's head on the basis of the inertial data received from the inertial sensor 5, and thereby follows the user's line of sight described above more closely in real time.
 Furthermore, the video processing device 1 positions the user's line of sight, which it is following in real time, in the AR content space created in advance, and calculates how the AR content appears from that viewpoint.
 The video processing device 1 then causes the output device 3 to draw the calculated appearance.
 In this way, the AR system generates AR content and displays it on the video processing device 1, such as a smartphone or AR glasses. However, as described above, with this method, when multiple moving bodies appear in the image captured by the camera 4, the video processing device 1 displays the AR content at the correct position but may also display AR content at locations corresponding to moving bodies on which the user is not riding.
 The following therefore describes the operation of the video processing device 1 for displaying AR content only at the correct position even when the captured image contains multiple moving bodies.
 FIG. 3 is a flowchart illustrating an example of the operation by which the video processing device 1 displays AR content only at the correct position in a captured image.
 The operation of this flowchart is realized by the control unit 10 of the video processing device 1 reading and executing the program stored in the program storage unit 20.
 This operation flow is started, for example, when the control unit 10 outputs an instruction to display AR content because the user has entered an instruction requesting AR content or because a predetermined condition has been satisfied. Alternatively, the flow may be started when the video processing device 1 starts up and the camera 4 acquires a captured image. In the following description, the moving body is assumed to be a bicycle.
 In step ST101, the image acquisition unit 101 acquires the image captured by the camera 4. The image acquisition unit 101 may store the captured image in the image storage unit 301. The captured image is assumed to contain the environment and a moving body. As described above, the environment may be ordinary scenery; the environment therefore refers to the portion of the image excluding the moving body.
 FIG. 4 is a diagram showing an example of a captured image.
 The example in FIG. 4 is an image captured while the user is riding and steering a bicycle, which is the moving body, so the captured image contains both the moving body and the environment. The camera 4 here is the camera provided in the AR glasses serving as the video processing device 1, and the captured image was taken by this camera.
 In the example of FIG. 4, the user's arms and the bicycle's handlebar and wheels are shown hatched for simplicity.
 In step ST102, the moving object detection unit 102 detects the moving body MB from the captured image. The moving object detection unit 102 may detect the moving body MB by performing object detection with a general method, for example the object detection method disclosed in Non-Patent Document 2; a detailed description of object detection is therefore omitted here. The moving object detection unit 102 also keeps the region detected as the moving body MB and deletes the rest, that is, it deletes the environment portion of the captured image.
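 The sketch below illustrates only the environment-removal part of step ST102, under the assumption that an object detector (any general method, as noted above) has already produced a binary mask of the moving body; the function name is hypothetical.

```python
import numpy as np

def remove_environment(image: np.ndarray, body_mask: np.ndarray) -> np.ndarray:
    """image: H x W x 3 frame; body_mask: H x W boolean, True on the moving body MB.

    Pixels outside the detected moving body (i.e. the environment) are blanked,
    leaving only the bicycle and the user for the later feature extraction.
    """
    masked = np.zeros_like(image)
    masked[body_mask] = image[body_mask]
    return masked
```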
 FIG. 5 is a diagram showing an example in which the moving body MB has been detected in the captured image.
 In the example of FIG. 5, the bicycle and the user are detected as the moving body MB; the region detected as the moving body MB therefore includes the user's arms in addition to the bicycle's handlebar and wheels. As shown in FIG. 5, the environment portion is deleted.
 In step ST103, the line-of-sight estimation unit 104 estimates the user's line of sight. Assume first that a "feature point space" whose feature points are the parts of the bicycle serving as the moving body MB is stored in the space storage unit 302, and that an "AR content space" in which a speedometer is placed as AR content at the center of the handlebar in that feature point space is also stored in the space storage unit 302.
 FIG. 6 is a diagram showing an example of the "feature point space" and the "AR content space" stored in the space storage unit 302.
 In the example of FIG. 6, the "feature point space" is indicated by reference symbol CP and the AR content by reference symbol ARC. FIG. 6 is only an example; the space storage unit 302 may of course store a plurality of such spaces.
 The feature point extraction unit 103 extracts feature points from the captured image from which the environment portion was deleted in step ST102, that is, it extracts feature points lying within the moving body MB. The extraction method may be a general one: the feature point extraction unit 103 scans the entire captured image and extracts specific features such as boundaries (edges) of things, for example object outlines, or corners of objects. In the example of FIG. 6, for instance, the boundary of the moving body MB can be extracted as feature points.
 The line-of-sight estimation unit 104 then estimates the position of the line of sight by matching the feature points extracted by the feature point extraction unit 103 against the feature point space stored in the space storage unit 302. For example, the line-of-sight estimation unit 104 may estimate the line of sight using vision-based AR techniques such as the general markerless AR technologies PTAM, SmartAR, or Microsoft HoloLens; a detailed description of these techniques is therefore omitted here. The line-of-sight estimation unit 104 outputs the estimated line-of-sight position to the line-of-sight movement estimation unit 105.
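 As one possible reading of step ST103, the following sketch estimates the line of sight by matching ORB features from the masked image against the stored feature point space and solving a PnP problem with OpenCV. The choice of ORB, the stored-descriptor format, and the known camera intrinsics K are all assumptions; a markerless AR library such as those mentioned above would normally perform the equivalent internally.

```python
import cv2
import numpy as np

def estimate_line_of_sight(masked_gray, space_points_3d, space_descriptors, K):
    """Returns (rvec, tvec): the camera pose relative to the moving body,
    i.e. the origin and direction of the user's line of sight, or None."""
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(masked_gray, None)
    if descriptors is None:
        return None

    # Stored descriptors are assumed to be ORB (uint8) as well.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, space_descriptors)
    if len(matches) < 6:
        return None   # not enough correspondences for a stable pose

    img_pts = np.float32([keypoints[m.queryIdx].pt for m in matches])
    obj_pts = np.float32([space_points_3d[m.trainIdx] for m in matches])

    ok, rvec, tvec, _inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    return (rvec, tvec) if ok else None
```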
 In step ST104, the line-of-sight movement estimation unit 105 estimates the movement of the line of sight. First, the sensor data control unit 106 acquires sensor data from the inertial sensor 5 and measures the movement of the user's head, body, and so on from the acquired data. Specifically, for example, the inertial sensor 5 is an inertial measurement unit (IMU), and the sensor data control unit 106 acquires sensor data such as acceleration, angular velocity, and geomagnetism from it. The sensor data control unit 106 may measure the user's three-dimensional movement (for example, the movement of the user's head) on the basis of these data, and outputs the measurement result to the line-of-sight movement estimation unit 105.
 The line-of-sight movement estimation unit 105 applies the three-dimensional movement measured by the sensor data control unit 106 starting from the line-of-sight position received from the line-of-sight estimation unit 104, thereby estimating a line-of-sight movement that follows the movement of the user's head. In other words, the line-of-sight movement estimation unit 105 estimates, on the basis of the sensor data, the movement of the user's line of sight starting from the line-of-sight position estimated by the line-of-sight estimation unit 104. The line-of-sight movement estimation unit 105 then outputs line-of-sight movement information including the estimated movement to the object determination unit 107.
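 A minimal sketch of step ST104 follows, in which only the gyroscope reading is integrated over a short interval to move the line of sight estimated in step ST103; a practical implementation would also fuse the accelerometer and geomagnetic data, so this is illustrative only.

```python
import cv2
import numpy as np

def propagate_gaze(rvec, tvec, gyro_rad_s, dt):
    """rvec, tvec: line of sight estimated in step ST103.
    gyro_rad_s: measured angular velocity (3,) in rad/s; dt: elapsed seconds."""
    R0, _ = cv2.Rodrigues(np.asarray(rvec, dtype=np.float64))
    # Small-angle rotation accumulated over dt (gyroscope only).
    dR, _ = cv2.Rodrigues(np.asarray(gyro_rad_s, dtype=np.float64) * dt)
    R1 = dR @ R0
    rvec_moved, _ = cv2.Rodrigues(R1)
    return rvec_moved, tvec   # head position treated as unchanged over dt
```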
 In step ST105, the object determination unit 107 determines whether the object is the moving body MB on which the user is riding. For example, the object determination unit 107 determines, on the basis of the detected movement of the user's line of sight, whether the moving body MB is the moving body the user has boarded. Specifically, this is done as follows.
 First, the object determination unit 107 defines the range in which the head of a user riding the moving body MB would be located as the "range of line-of-sight origins whose object is the moving body MB on which the user is riding". Specifically, this range is defined as a cuboid raised above the seating position of the moving body MB (for example, a bicycle saddle) by the average human sitting height, taking into account the user's posture while riding.
 Furthermore, the range of lines of sight from which the user can see the moving body MB on which the user rides is defined as the "range of line-of-sight directions whose object is the user's moving body MB". Specifically, this range is defined as the cone within which the moving body MB is visible from the user's head.
 The object determination unit 107 determines whether the origin of the line-of-sight movement estimated by the line-of-sight movement estimation unit 105 in step ST104 (that is, the origin of the line of sight after the movement) falls within the predetermined range, namely the "range of line-of-sight origins whose object is the moving body on which the user is riding", and whether the direction of the line of sight (that is, the direction after the movement) falls within the "range of line-of-sight directions whose object is the user's moving body".
 If both are included, the moving body MB is determined to be the one the user has boarded; in this case, the object determination unit 107 outputs the line-of-sight movement information to the AR content drawing unit 108, and the process proceeds to step ST106. If not, the moving body MB is determined to be one the user has not boarded, and the process ends; in other words, no AR content is displayed at locations corresponding to a moving body MB on which the user is determined not to be riding.
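 The determination of step ST105 can be pictured with the following sketch, which tests whether the gaze origin lies inside the cuboid above the saddle and whether the gaze direction lies inside the visibility cone; the specific bounds, axis, and half-angle passed in would depend on the moving body and are assumptions here.

```python
import numpy as np

def is_riders_gaze(origin, direction, cuboid_min, cuboid_max,
                   cone_axis, cone_half_angle_rad):
    """origin/direction: moved line of sight in the moving body's frame.
    cuboid_min/max: opposite corners of the head range above the saddle.
    cone_axis: direction from the head range toward the moving body MB."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)

    # (1) The gaze origin must fall inside the cuboid range.
    if not (np.all(origin >= cuboid_min) and np.all(origin <= cuboid_max)):
        return False

    # (2) The gaze direction must fall inside the visibility cone.
    axis = np.asarray(cone_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return float(np.dot(direction, axis)) >= np.cos(cone_half_angle_rad)
```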
 FIG. 7 is a diagram showing an example of the "range of line-of-sight origins whose object is the moving body MB on which the user is riding" and the "range of line-of-sight directions whose object is the user's moving body MB".
 In FIG. 7, the "range of line-of-sight origins whose object is the moving body MB on which the user is riding" is denoted by reference symbol Ori, and the "range of line-of-sight directions whose object is the user's moving body MB" is denoted by reference symbol Di.
 The line of sight in FIG. 7(a) is an example in which the origin of the line of sight falls within the range of origins Ori and the direction of the line of sight falls within the range of directions Di.
 In contrast, the line of sight in FIG. 7(b1) is an example in which the origin falls within the range of origins but the direction does not fall within the range of directions. The line of sight in FIG. 7(b2) is an example in which the origin does not fall within the range of origins although the direction falls within the range of directions.
 The line of sight in FIG. 7(a) is therefore determined to be the riding user's line of sight, while those in (b1) and (b2) are determined not to be.
 In step ST106, the AR content drawing unit 108 calculates how the AR content should appear. The AR content drawing unit 108 places the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the line of sight at its destination, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content appears in that space.
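 As an illustrative reduction of step ST106, the sketch below projects the AR content's anchor point (for example, the handlebar-centre speedometer of FIG. 6) into the image using the moved line of sight and the camera intrinsics; a full implementation would render the content geometry rather than a single point, and the function name is an assumption.

```python
import cv2
import numpy as np

def project_content_anchor(anchor_3d, rvec, tvec, K):
    """Returns the pixel position of the content anchor and its depth,
    which can be used to place and scale the drawn AR content."""
    anchor = np.asarray(anchor_3d, dtype=np.float64).reshape(1, 1, 3)
    img_pts, _ = cv2.projectPoints(anchor, rvec, tvec, K, None)
    u, v = img_pts[0, 0]

    # Depth of the anchor in the camera frame, for scaling the content.
    R, _ = cv2.Rodrigues(np.asarray(rvec, dtype=np.float64))
    cam_pt = R @ anchor.reshape(3, 1) + np.asarray(tvec, dtype=np.float64).reshape(3, 1)
    depth = float(cam_pt[2, 0])
    return (float(u), float(v)), depth
```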
 FIG. 8 is a diagram showing an example of the calculated appearance of the AR content.
 As shown in FIG. 8, the AR content drawing unit 108 adjusts the appearance of the AR content on the basis of the line-of-sight movement information and outputs AR content information for drawing the adjusted AR content to the output control unit 109.
 In step ST107, the output control unit 109 outputs the AR content information. The output control unit 109 controls the output device 3 to draw the AR content; for example, it controls the adjusted AR content to be displayed on AR glasses or the like.
 (Operation and effects of the embodiment)
 According to the embodiment, the video processing device 1 can display AR content only at the location corresponding to the moving body MB on which the user is riding, even when a similar-looking moving body MB appears in the captured image. This allows the video processing device 1 to present AR content to the user accurately.
 [Other embodiments]
 The above embodiment uses an image captured by the camera 4 provided in the video processing device 1, but the captured image is not limited to one taken by that camera. For example, an independent camera 4 connected to the video processing device 1 may be used, provided that the camera 4 is installed at a position (for example, above the user's head) from which it can capture images usable for estimating the user's line of sight.
 The methods described in the above embodiment can be stored, as a program (software means) executable by a computer, on storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical discs (CD-ROM, DVD, MO, etc.), and semiconductor memories (ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via a communication medium. The program stored on the medium also includes a setting program that configures, within the computer, the software means (including not only the execution program but also tables and data structures) to be executed by the computer. A computer that realizes this device reads the program stored on the storage medium, where appropriate constructs the software means using the setting program, and executes the processing described above under the control of that software means. The storage media referred to in this specification are not limited to those for distribution, and include storage media such as magnetic disks and semiconductor memories provided inside the computer or in devices connected via a network.
 In short, the present invention is not limited to the above embodiment and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate where possible, in which case the combined effects are obtained. Furthermore, the above embodiment includes inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements.
 1 … Video processing device
 2 … Input device
 3 … Output device
 4 … Camera
 5 … Inertial sensor
 10 … Control unit
 101 … Image acquisition unit
 102 … Moving object detection unit
 103 … Feature point extraction unit
 104 … Line-of-sight estimation unit
 105 … Line-of-sight movement estimation unit
 106 … Sensor data control unit
 107 … Object determination unit
 108 … AR content drawing unit
 109 … Output control unit
 20 … Program storage unit
 30 … Data storage unit
 301 … Image storage unit
 302 … Space storage unit
 40 … Communication interface
 50 … Input/output interface
 MB … Moving body

Claims (8)

  1.  A video processing device worn by a user, comprising:
     an image acquisition unit that acquires a captured image, taken by a camera, containing a moving body and an environment;
     a moving object detection unit that detects the moving body from the captured image;
     a line-of-sight estimation unit that estimates the user's line of sight on the basis of the captured image;
     a line-of-sight movement detection unit that acquires sensor data from a sensor provided in the video processing device and detects, on the basis of the sensor data, a movement of the user's line of sight starting from the estimated line of sight;
     a determination unit that determines, on the basis of the detected movement of the user's line of sight, whether the moving body is a moving body on which the user is riding;
     a drawing unit that calculates how AR content appears on the basis of the estimated movement of the line of sight; and
     an output control unit that performs control to display the calculated AR content.
  2.  The video processing device according to claim 1, wherein the moving object detection unit deletes from the captured image the environment portion other than the detected moving body.
  3.  The video processing device according to claim 1, wherein the determination unit determines that the moving body is the moving body on which the user is riding when the origin of the line of sight at the destination of the detected movement of the user's line of sight is within a predetermined range and the direction of the detected user's line of sight is within a range in which the moving body on which the user rides is visible from the predetermined range.
  4.  The video processing device according to claim 3, wherein the predetermined range is a cuboid range raised above the seating position of the moving body by the average human sitting height and taking into account the posture in which the user rides.
  5.  The video processing device according to claim 1, further comprising:
     a feature point extraction unit that extracts feature points lying within the moving body; and
     a storage unit that stores a feature point space,
     wherein the line-of-sight estimation unit estimates the line of sight on the basis of the extracted feature points and the feature point space.
  6.  The video processing device according to claim 5, wherein the storage unit further stores an AR content space corresponding to the feature point space, and the drawing unit sets the destination of the line of sight in the AR content space and calculates how the AR content appears in the set space.
  7.  A video processing method executed by a processor of a video processing device worn by a user, the method comprising:
     acquiring a captured image, taken by a camera, containing a moving body and an environment;
     detecting the moving body from the captured image;
     estimating the user's line of sight on the basis of the captured image;
     acquiring sensor data from a sensor provided in the video processing device;
     detecting, on the basis of the sensor data, a movement of the user's line of sight starting from the estimated line of sight;
     determining, on the basis of the detected movement of the user's line of sight, whether the moving body is a moving body on which the user is riding;
     calculating how AR content appears on the basis of the estimated movement of the line of sight; and
     performing control to display the calculated AR content.
  8.  A video processing program comprising instructions to be executed by a processor of a video processing device worn by a user, the instructions comprising:
     acquiring a captured image, taken by a camera, containing a moving body and an environment;
     detecting the moving body from the captured image;
     estimating the user's line of sight on the basis of the captured image;
     acquiring sensor data from a sensor provided in the video processing device;
     detecting, on the basis of the sensor data, a movement of the user's line of sight starting from the estimated line of sight;
     determining, on the basis of the detected movement of the user's line of sight, whether the moving body is a moving body on which the user is riding;
     calculating how AR content appears on the basis of the estimated movement of the line of sight; and
     performing control to display the calculated AR content.
PCT/JP2022/031905 2022-08-24 2022-08-24 Video processing device, video processing method, and video processing program WO2024042645A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/031905 WO2024042645A1 (en) 2022-08-24 2022-08-24 Video processing device, video processing method, and video processing program


Publications (1)

Publication Number Publication Date
WO2024042645A1 (en)

Family

ID=90012790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/031905 WO2024042645A1 (en) 2022-08-24 2022-08-24 Video processing device, video processing method, and video processing program

Country Status (1)

Country Link
WO (1) WO2024042645A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018207426A1 (en) * 2017-05-09 2018-11-15 ソニー株式会社 Information processing device, information processing method, and program
WO2019087658A1 (en) * 2017-11-01 2019-05-09 ソニー株式会社 Information processing device, information processing method, and program



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22956473

Country of ref document: EP

Kind code of ref document: A1