WO2024042644A1 - Video processing device, video processing method, and video processing program - Google Patents

Video processing device, video processing method, and video processing program Download PDF

Info

Publication number
WO2024042644A1
WO2024042644A1 (PCT/JP2022/031904, JP2022031904W)
Authority
WO
WIPO (PCT)
Prior art keywords
sight
line
video processing
unit
image
Prior art date
Application number
PCT/JP2022/031904
Other languages
French (fr)
Japanese (ja)
Inventor
誠 武藤
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2022/031904
Publication of WO2024042644A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Definitions

  • the present invention relates to a video processing device, a video processing method, and a video processing program.
  • a user using an augmented reality (AR) system can view the real space of the real world through a mobile terminal or an AR device.
  • content such as navigation information or 3D data (hereinafter referred to as AR content) is presented as additional information in the real space. That is, a user using an AR system can see AR content superimposed on the real world and use information about this content.
  • for example, when a user of an AR system is riding on a moving object, the image displayed by the AR device contains both a portion of the camera image showing the environment (the scenery in front of the bicycle) and a portion showing the moving object (a part of the bicycle body). Conventional self-position estimation assumes that the camera image shows only the environment, so it cannot process such an image accurately, and as a result the AR device may fail to display AR content at the correct position.
  • This invention has been made in view of the above circumstances, and its purpose is to provide a technology that allows a mobile terminal or an AR device to display AR content at an accurate position even when the camera image contains both a portion showing the environment and a portion showing a moving object.
  • one aspect of the present invention is a video processing device worn by a user, comprising: an image acquisition unit that acquires a captured image, taken by a camera, that includes the moving object on which the user is riding and the environment; a moving object detection unit that detects the moving object from the captured image; an environment image restoration unit that complements the image of the environment in the region of the detected moving object when the area of the detected moving object is smaller than a predetermined threshold; a line-of-sight estimation unit that estimates the user's line of sight based on the captured image; a line-of-sight movement detection unit that acquires sensor data from a sensor included in the video processing device and detects, based on the sensor data, movement of the user's line of sight starting from the estimated line of sight; a drawing unit that calculates how the AR content looks based on the estimated line-of-sight movement; and an output control unit that performs control so that the calculated AR content is displayed.
  • according to one aspect of the present invention, even when the camera image contains both a portion showing the environment and a portion showing a moving object, a mobile terminal or an AR device can display AR content at an accurate position, which makes it possible to present the AR content to the user accurately.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device according to the first embodiment.
  • FIG. 2 is a block diagram showing the software configuration of the video processing apparatus in the first embodiment in relation to the hardware configuration shown in FIG.
  • FIG. 3 is a flowchart illustrating an example of an operation for the video processing device to display AR content at a correct position in a captured image.
  • FIG. 4 is a diagram showing an example of a photographed image.
  • FIG. 5 is a diagram illustrating an example when a moving object is detected in a photographed image.
  • FIG. 6 is a diagram showing an example of a case where a video of a portion extracted as a moving object is complemented with surrounding images.
  • FIG. 7 is a diagram showing an example of setting AR content corresponding to feature points.
  • FIG. 8 is a diagram showing an example of how the calculated AR content looks.
  • FIG. 9 is a diagram illustrating an example of a captured image captured by a camera in the second embodiment.
  • FIG. 10 is a block diagram showing the software configuration of the video processing apparatus in the second embodiment in relation to the hardware configuration shown in FIG. 1.
  • FIG. 11 is a flowchart illustrating an example of an operation for the video processing device to display AR content at a correct position in a captured image.
  • FIG. 12 is a diagram showing an example of detecting unevenness on a road surface in a photographed image.
  • FIG. 13 is a diagram showing an example in which unevenness on a road surface in a photographed image is blurred.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device 1 according to the first embodiment.
  • the video processing device 1 is a computer that analyzes input data and generates and outputs output data.
  • the video processing device 1 may be, for example, an AR device including AR glasses, smart glasses, or other wearable devices. That is, the video processing device 1 may be a device worn and used by a user.
  • the video processing device 1 includes a control section 10, a program storage section 20, a data storage section 30, a communication interface 40, and an input/output interface 50.
  • the control unit 10, program storage unit 20, data storage unit 30, communication interface 40, and input/output interface 50 are communicably connected to each other via a bus.
  • the communication interface 40 may be communicably connected to an external device via a network.
  • the input/output interface 50 is communicably connected to the input device 2, the output device 3, the camera 4, and the inertial sensor 5.
  • the control unit 10 controls the video processing device 1.
  • the control unit 10 includes a hardware processor such as a central processing unit (CPU).
  • the control unit 10 may be an integrated circuit capable of executing various programs.
  • the program storage unit 20 uses, as storage media, a combination of a non-volatile memory that can be written to and read from at any time, such as an EPROM (Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), or an SSD (Solid State Drive), and a non-volatile memory such as a ROM (Read Only Memory).
  • the program storage unit 20 stores programs necessary to execute various processes. That is, the control unit 10 can implement various controls and operations by reading and executing programs stored in the program storage unit 20.
  • the data storage unit 30 is a storage that uses, as storage media, a combination of a non-volatile memory that can be written to and read from at any time, such as an HDD or a memory card, and a volatile memory such as a RAM (Random Access Memory).
  • the data storage unit 30 is used to store data acquired and generated while the control unit 10 executes programs and performs various processes.
  • the communication interface 40 includes one or more wired or wireless communication modules.
  • the communication interface 40 includes a communication module that makes a wired or wireless connection to an external device via a network.
  • Communication interface 40 may include a wireless communication module that wirelessly connects to external devices such as Wi-Fi access points and base stations.
  • the communication interface 40 may include a wireless communication module for wirelessly connecting to an external device using short-range wireless technology. That is, the communication interface 40 may be any general communication interface as long as it can communicate with an external device under the control of the control unit 10 and transmit and receive various information, including past performance data.
  • the input/output interface 50 is connected to the input device 2, output device 3, camera 4, inertial sensor 5, etc.
  • the input/output interface 50 is an interface that allows information to be transmitted and received between the input device 2, the output device 3, and the plurality of cameras 4 and inertial sensors 5.
  • the input/output interface 50 may be integrated with the communication interface 40.
  • for example, the video processing device 1 and at least one of the input device 2, the output device 3, the camera 4, and the inertial sensor 5 may be wirelessly connected using short-range wireless technology or the like, and information may be sent and received over that wireless connection.
  • the input device 2 may include, for example, a keyboard, a pointing device, etc. for the user to input various information including past performance data to the video processing device 1.
  • the input device 2 may also include a reader for reading data to be stored in the program storage section 20 or the data storage section 30 from a memory medium such as a USB memory, and a disk device for reading such data from a disk medium.
  • the output device 3 includes a display that displays images captured by the camera 4, AR content, and the like.
  • the output device 3 may be integrated with the video processing device 1.
  • when the video processing device 1 is AR glasses or smart glasses, the output device 3 corresponds to the glass portion of the glasses.
  • the camera 4 is capable of photographing an environment such as scenery, and may be a general camera that can be attached to the video processing device 1.
  • the environment generally refers to the scenery that is photographed.
  • the camera 4 may be integrated with the video processing device 1.
  • the camera 4 may output the captured image to the control unit 10 of the video processing device 1 through the input/output interface 50.
  • the inertial sensor 5 includes, for example, an acceleration sensor, an angular velocity sensor, a geomagnetic sensor, and the like.
  • the inertial sensor 5 senses the moving speed and head movement of the user wearing the AR device, and outputs sensor data according to the sensing to the control unit 10.
  • FIG. 2 is a block diagram showing the software configuration of the video processing device 1 in the first embodiment in relation to the hardware configuration shown in FIG. 1.
  • the control unit 10 includes an image acquisition unit 101 , a moving object detection unit 102 , an environment image restoration unit 103 , a feature point extraction unit 104 , a line of sight estimation unit 105 , a line of sight movement estimation unit 106 , and a sensor data control unit 107 , an AR content drawing section 108 , and an output control section 109 .
  • the image acquisition unit 101 acquires a photographed image taken by the camera 4. Note that the image acquisition unit 101 may store the captured image in the image storage unit 301.
  • the moving object detection unit 102 detects a moving object from the captured image.
  • the moving object detection unit 102 detects the moving object MB appearing in the photographed image.
  • the moving object may be any arbitrary object such as a motorized bicycle, an electric bicycle, a motorcycle, or a vehicle.
  • the detected moving object region may contain only the moving object itself, or may also contain a part of the body of the user riding it, such as an arm. A general technique may be used as the detection method.
  • the environmental image restoration unit 103 complements the video of the portion detected as a moving object.
  • the environmental image restoration unit 103 replaces the image of the portion detected as a moving object with an environment image interpolated from the surrounding environment in the photographed image. Note that a general technique may be used for the interpolation.
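  • As a rough illustration of this kind of complementation, the sketch below fills the detected moving-object region using OpenCV inpainting. The function calls are real OpenCV APIs, but treating inpainting as the restoration method here is an assumption, since the embodiment only says that a general technique may be used; the variable names are hypothetical.

```python
# Minimal sketch: complement the moving-object region with surrounding environment pixels.
# Assumes `frame` is a BGR image and `moving_object_mask` is a uint8 mask
# (255 where the moving object MB was detected); both names are hypothetical.
import cv2
import numpy as np

def restore_environment(frame: np.ndarray, moving_object_mask: np.ndarray) -> np.ndarray:
    # Slightly dilate the mask so the object's edge pixels do not leak into the result.
    kernel = np.ones((7, 7), np.uint8)
    mask = cv2.dilate(moving_object_mask, kernel, iterations=2)
    # Telea inpainting propagates surrounding environment pixels into the masked region.
    return cv2.inpaint(frame, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```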
  • the feature point extraction unit 104 extracts objects in the environment of the photographed image as feature points. For example, it extracts, as feature points, objects located near the feature point space stored in the space storage unit 302, which will be described later.
  • the line of sight estimation unit 105 estimates the position of the line of sight by comparing the feature points extracted by the feature point extraction unit 104 with the feature point space stored in the spatial storage unit 302. Note that details of the method for estimating the position of the line of sight will be described later.
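  • The comparison of extracted feature points against a stored feature point space can be sketched as a standard feature-matching and PnP pose-estimation step, as below. This is only an illustrative assumption about how the matching could be done; the embodiment leaves the concrete method open (see the later mention of PTAM and other markerless vision-based AR techniques), and the input names are hypothetical.

```python
# Minimal sketch: estimate the line of sight (camera pose) by matching ORB descriptors
# in the current frame against a pre-built feature point space. `feature_space_descriptors`
# (Nx32) and `feature_space_points_3d` (Nx3 world coordinates) are assumed to come from
# the space storage unit; `camera_matrix` holds the calibrated camera intrinsics.
import cv2
import numpy as np

def estimate_line_of_sight(gray_frame, feature_space_descriptors, feature_space_points_3d, camera_matrix):
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(gray_frame, None)
    if descriptors is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, feature_space_descriptors)
    if len(matches) < 6:
        return None  # too few correspondences to estimate a pose
    image_points = np.float32([keypoints[m.queryIdx].pt for m in matches])
    object_points = np.float32([feature_space_points_3d[m.trainIdx] for m in matches])
    ok, rvec, tvec, _ = cv2.solvePnPRansac(object_points, image_points, camera_matrix, None)
    return (rvec, tvec) if ok else None  # rotation/translation = line-of-sight direction and position
```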
  • the line-of-sight movement estimation unit 106 estimates the line-of-sight movement.
  • the line-of-sight movement estimation unit 106 applies the three-dimensional movement measured by the sensor data control unit 107 (described later) to the line-of-sight position received from the line-of-sight estimation unit 105 as a starting point, thereby estimating a line-of-sight movement that follows the movement of the user's head. That is, the line-of-sight movement estimation unit 106 estimates, based on the sensor data, the movement of the user's line of sight starting from the line-of-sight position estimated by the line-of-sight estimation unit 105.
  • the sensor data control unit 107 acquires sensor data from the inertial sensor 5. Then, the sensor data control unit 107 measures the user's head movement, body movement, etc. from the acquired sensor data. For example, the sensor data control unit 107 measures the user's three-dimensional movement (for example, the user's head movement) based on the sensor data.
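  • As a minimal sketch of how head movement could be measured from the inertial data, the snippet below integrates angular velocity into an orientation estimate and accumulates acceleration into velocity and position. The sampling interval, axis conventions, gravity compensation, and the absence of drift correction are simplifying assumptions; a real implementation would typically use a proper sensor-fusion filter.

```python
# Minimal sketch: track the user's head rotation and rough translation from IMU samples.
# `gyro` is angular velocity in rad/s, `accel` is gravity-compensated acceleration in m/s^2,
# `dt` is the sample interval in seconds; all of these are assumed inputs.
import numpy as np
from scipy.spatial.transform import Rotation

class HeadMotionTracker:
    def __init__(self):
        self.orientation = Rotation.identity()   # head orientation relative to the start
        self.velocity = np.zeros(3)
        self.position = np.zeros(3)

    def update(self, gyro: np.ndarray, accel: np.ndarray, dt: float) -> None:
        # Integrate angular velocity into the orientation (small-angle rotation per sample).
        self.orientation = self.orientation * Rotation.from_rotvec(gyro * dt)
        # Rotate acceleration into the world frame, then integrate into velocity and position.
        world_accel = self.orientation.apply(accel)
        self.velocity += world_accel * dt
        self.position += self.velocity * dt
```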
  • the AR content drawing unit 108 calculates how the AR content looks.
  • the AR content drawing unit 108 places the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the destination line of sight, into the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content appears in that space.
  • the output control unit 109 outputs AR content information.
  • the output control unit 109 controls the output device 3 to draw AR content.
  • the output control unit 109 controls the adjusted AR content to be displayed on AR glasses or the like.
  • the data storage unit 30 includes an image storage unit 301 and a spatial storage unit 302.
  • the image storage unit 301 may store captured images acquired by the image acquisition unit 101.
  • the captured image stored in the image storage unit 301 may have information about the longitude and latitude in the real world where the captured image was captured, which is acquired by the video processing device 1. Further, the image storage unit 301 may automatically delete the captured image after a predetermined period of time has passed.
  • the space storage unit 302 stores a feature point space and an AR content space corresponding to the feature point space.
  • the feature point space may be located at a preset position in the photographed image, and for example, two positions may be set. Then, an AR content space may be set between the two feature point spaces.
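  • A minimal sketch of how the space storage unit 302 could hold this pairing is shown below; the field names are hypothetical and only illustrate the idea of an AR content space anchored midway between two stored feature point positions.

```python
# Minimal sketch: a feature point space with two anchor positions and an AR content
# space placed midway between them. All names are illustrative, not from the patent.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FeaturePointSpace:
    anchor_a: np.ndarray          # 3D position of the first preset feature point
    anchor_b: np.ndarray          # 3D position of the second preset feature point
    descriptors: np.ndarray       # feature descriptors used for matching

@dataclass
class ARContentSpace:
    feature_space: FeaturePointSpace
    content_offset: np.ndarray = field(default_factory=lambda: np.zeros(3))

    def content_position(self) -> np.ndarray:
        # Place the AR content at the midpoint of the two anchors, plus an optional offset.
        midpoint = (self.feature_space.anchor_a + self.feature_space.anchor_b) / 2.0
        return midpoint + self.content_offset
```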
  • the video processing device 1 extracts feature points from the image taken by the camera 4. It then compares the extracted feature points with feature points extracted from previously captured images (for example, the previous frame or a frame several frames earlier), hereinafter referred to as the feature point space, to estimate the user's line of sight (position and direction).
  • the feature point space is a space constructed for a "surrounding space" set based on an assumed usage scene, and the captured image is likewise an image of that surrounding space. It is therefore assumed that the positional relationships of both the feature point space and the captured image are identified with reference to the surrounding space.
  • the video processing device 1 tracks the movement of the user's head based on the inertial data received from the inertial sensor 5, and thereby follows the user's line of sight, estimated as described above, in closer to real time.
  • the video processing device 1 positions the user's line of sight, which is being followed in real time, on an AR content space created in advance, and calculates how the AR content looks from there.
  • the video processing device 1 causes the output device 3 to draw the calculated appearance.
  • in general, an AR system generates AR content and causes the video processing device 1, such as a smartphone or AR glasses, to display it. However, this method assumes that the camera 4 image shows only the environment, so for a captured image that includes both the environment and the moving object MB, the video processing device 1 may fail to display the AR content at the correct position.
  • the operation of the video processing device 1 for displaying AR content in the correct position even in a photographed image in which the environment and a moving object coexist will be described below.
  • FIG. 3 is a flowchart illustrating an example of an operation by which the video processing device 1 displays AR content at a correct position in a captured image.
  • the operation of this flowchart is realized by the control unit 10 of the video processing device 1 reading and executing the program stored in the program storage unit 20.
  • This operation flow is started, for example, when the user inputs an instruction to display AR content, or when a predetermined condition is satisfied and the control unit 10 outputs an instruction to display AR content.
  • this operation flow may be started when the video processing device 1 is activated and the camera 4 acquires a photographed image. Further, in this operation, it is assumed that the moving object is a bicycle.
  • In step ST101, the image acquisition unit 101 acquires a photographed image taken by the camera 4.
  • the image acquisition unit 101 may store the captured image in the image storage unit 301.
  • the photographed image includes the environment and a moving object.
  • the environment may be a general landscape, as described above. Therefore, the environment refers to the part excluding the moving object.
  • FIG. 4 is a diagram showing an example of a photographed image.
  • the image is taken while the user is riding and driving a bicycle, which is a moving object. Therefore, the photographed image includes the moving object and the environment.
  • here, the camera 4 is included in the AR glasses serving as the video processing device 1, and the photographed image is taken by this camera.
  • in FIG. 4, the user's arms, the bicycle handlebars, and the wheels are shown with diagonal lines for simplicity. Furthermore, in the example of FIG. 4, two buildings Bu appear in the photographed image, and the user is assumed to be facing forward.
  • In step ST102, the moving object detection unit 102 detects the moving object MB from the captured image.
  • the moving object detection unit 102 may detect the moving object MB by performing object detection using a general method. As the object detection method, a method such as that disclosed in Non-Patent Document 2 may be used, so a detailed explanation of object detection is omitted here.
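  • As one hedged illustration of such a general detection step, the sketch below uses an off-the-shelf instance segmentation model to build a mask covering the bicycle and the rider. The choice of torchvision's Mask R-CNN and the COCO class indices are assumptions made for this example, not something specified by this embodiment or by Non-Patent Document 2.

```python
# Minimal sketch: detect the moving object MB (bicycle + rider) and build a binary mask.
# Uses a pretrained COCO instance-segmentation model; class ids 1 (person) and 2 (bicycle)
# follow the COCO convention and are assumptions for this illustration.
import numpy as np
import torch
import torchvision

MODEL = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
MOVING_OBJECT_CLASSES = {1, 2}  # person, bicycle

def detect_moving_object_mask(frame_rgb: np.ndarray, score_threshold: float = 0.5) -> np.ndarray:
    tensor = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        output = MODEL([tensor])[0]
    mask = np.zeros(frame_rgb.shape[:2], dtype=np.uint8)
    for label, score, instance_mask in zip(output["labels"], output["scores"], output["masks"]):
        if score >= score_threshold and int(label) in MOVING_OBJECT_CLASSES:
            mask |= (instance_mask[0].numpy() > 0.5).astype(np.uint8) * 255
    return mask  # 255 where the moving object MB (and the rider's body) was detected
```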
  • FIG. 5 is a diagram illustrating an example when a moving body MB is detected in a photographed image.
  • in the example of FIG. 5, a bicycle and the user are detected as the moving object MB. That is, the portion detected as the moving object MB includes the user's arms in addition to the bicycle handlebars and wheels.
  • In step ST103, the environment image restoration unit 103 complements the video of the portion detected as the moving object MB.
  • the environment image restoration unit 103 complements the image of the surrounding environment in the video of the portion detected as the mobile object MB. That is, the environmental image restoration unit 103 reproduces the peripheral image of the portion detected as the mobile object MB.
  • the environment image restoration unit 103 may perform complementation using a general method.
  • the environmental image restoration unit 103 may use a complementation technique such as that disclosed in Non-Patent Document 3, so a detailed explanation of the complementation technique is omitted here.
  • FIG. 6 is a diagram showing an example of a case where a video of a portion extracted as a mobile object MB is complemented with surrounding images. As shown in FIG. 6, by complementing the image of the mobile body MB with the surrounding image, the environmental image restoration unit 103 can obtain a photographed image that would have been taken if the mobile body MB did not exist.
  • In step ST104, the line-of-sight estimation unit 105 estimates the user's line of sight.
  • the feature point extraction unit 104 extracts feature points from the captured image.
  • the feature points may be, for example, two buildings Bu.
  • FIG. 7 is a diagram showing an example of setting AR content corresponding to feature points.
  • it is assumed that the space storage unit 302 stores, in advance, a "feature point space" with the two buildings Bu as feature points, and an "AR content space" in which a smiley-face mark, which is the AR content, is placed midway between the two buildings Bu. That is, the space storage unit 302 stores a feature point space and an AR content space corresponding to the feature point space.
  • the AR content is indicated by reference symbol ARC.
  • FIG. 7 is just an example, and the space storage unit 302 may store the AR content space together with various feature point spaces.
  • the feature point extraction unit 104 looks over the entire photographed image and extracts specific feature points such as boundaries (edges) of objects or corners of objects.
  • a feature point may also be a boundary between objects. In the example of FIG. 7, the boundary of a building Bu can be extracted as such a specific feature point.
  • the line of sight estimation unit 105 estimates the position of the line of sight by comparing the feature points extracted by the feature point extraction unit 104 with the feature point space stored in the spatial storage unit 302.
  • the line of sight estimation unit 105 may estimate the line of sight using vision-based AR technology or the like.
  • the vision-based AR technology may be a general technology such as PTAM, SmartAR, Microsoft Hololens, etc., which are markerless AR technologies, for example. Therefore, a detailed explanation of the AR technology will be omitted here.
  • the line-of-sight estimation unit 105 outputs the estimated line-of-sight position to the line-of-sight movement estimation unit 106.
  • the line-of-sight estimation unit 105 can process at high speed because it compares the extracted, simplified feature points with the feature point space, rather than comparing the images themselves.
  • because the environmental image restoration unit 103 can complement the portion detected as the moving object with high restoration accuracy, the line-of-sight estimation unit 105 can estimate the line of sight successfully.
  • In step ST105, the line-of-sight movement estimation unit 106 estimates the line-of-sight movement.
  • the sensor data control unit 107 acquires sensor data from the inertial sensor 5. Then, the sensor data control unit 107 measures the user's head movement, body movement, etc. from the acquired sensor data.
  • the inertial sensor 5 is an inertial measurement unit (IMU), and the sensor data control unit 107 acquires sensor data such as acceleration, angular velocity, and geomagnetism from the inertial sensor 5.
  • the sensor data control unit 107 may then measure the user's three-dimensional movement (for example, the user's head movement) based on these data. Then, the sensor data control unit 107 outputs the measurement result to the line of sight movement estimation unit 106.
  • specifically, the line-of-sight movement estimation unit 106 applies the three-dimensional movement measured by the sensor data control unit 107 to the line-of-sight position received from the line-of-sight estimation unit 105 as a starting point, thereby estimating a line-of-sight movement that follows the movement of the user's head. That is, the line-of-sight movement estimation unit 106 estimates, based on the sensor data, the movement of the user's line of sight starting from the line-of-sight position estimated by the line-of-sight estimation unit 105.
  • the line-of-sight movement estimation unit 106 outputs line-of-sight movement information including the estimated movement to the AR content drawing unit 108.
  • In step ST106, the AR content drawing unit 108 calculates how the AR content looks.
  • the AR content drawing unit 108 places the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the destination line of sight, into the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content appears in that space.
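  • A minimal sketch of this appearance calculation is given below: given the destination line of sight as a camera pose, the 3D position of the AR content in the AR content space is projected onto the display image plane. Using cv2.projectPoints and a pinhole camera model is an assumption made for illustration only.

```python
# Minimal sketch: compute where the AR content should be drawn on the display for the
# estimated (moved) line of sight. `rvec`/`tvec` come from the line-of-sight estimation
# plus the IMU-based movement; `camera_matrix` holds the display/camera intrinsics.
import cv2
import numpy as np

def project_ar_content(content_points_3d: np.ndarray, rvec: np.ndarray, tvec: np.ndarray,
                       camera_matrix: np.ndarray) -> np.ndarray:
    # Project the AR content's 3D points (e.g. the smiley-face mark between the two
    # buildings Bu) into 2D pixel coordinates for the output device 3 to draw.
    image_points, _ = cv2.projectPoints(content_points_3d.astype(np.float64),
                                        rvec, tvec, camera_matrix, None)
    return image_points.reshape(-1, 2)
```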
  • FIG. 8 is a diagram showing an example of how the calculated AR content looks.
  • the AR content drawing unit 108 adjusts the appearance of the AR content based on the line-of-sight movement information, and outputs AR content information for drawing the adjusted AR content ARC to the output control unit 109.
  • In step ST107, the output control unit 109 outputs the AR content information.
  • the output control unit 109 controls the output device 3 to draw the AR content ARC.
  • the output control unit 109 controls the adjusted AR content ARC to be displayed on AR glasses or the like.
  • as described above, the video processing device 1 can accurately present the AR content ARC to the user even when a portion showing the environment and a portion showing the moving object MB coexist in the image captured by the camera 4.
  • FIG. 9 is a diagram showing an example of a captured image captured by the camera 4 in the second embodiment. As shown in FIG. 9, when the mobile object MB occupies most of the photographed image, the complementation processing may fail even if the processing in the first embodiment is performed.
  • the hardware configuration of the video processing device 1 in the second embodiment may be the same as the hardware configuration in the first embodiment, so a redundant explanation here will be omitted.
  • FIG. 10 is a block diagram showing the software configuration of the video processing device 1 in the second embodiment in relation to the hardware configuration shown in FIG. 1.
  • the control unit 10 differs from the control unit 10 in the first embodiment in that it includes a moving object area calculation unit 110 and a road image analysis unit 111.
  • the moving body area calculation unit 110 calculates the area of the moving body MB.
  • the moving object area calculation unit 110 may calculate the area of the moving object MB, or may calculate the ratio of the moving object MB to the captured image. Furthermore, the moving object area calculation unit 110 determines whether the calculated area of the moving object MB is equal to or larger than a threshold value.
  • the road surface image analysis unit 111 detects a portion where the unevenness in the environment shown in the photographed image has been deformed into a vertically elongated shape due to movement within the interval of the shutter speed of the camera 4.
  • the road surface image analysis unit 111 may detect irregularities (for example, pebbles, etc.) appearing in the environment of the photographed image using a general method, and detect deformation of the pebbles that are the irregularities. Furthermore, the road surface image analysis unit 111 estimates the moving speed of the mobile body MB from the degree of deformation. Details of the method for estimating the moving speed will be described later.
  • the data storage unit 30 differs from the first embodiment in that it includes a line-of-sight storage unit 303.
  • the line of sight storage unit 303 may store information about the line of sight estimated by the line of sight estimation unit 105. Note that the stored information regarding the line of sight may be deleted after a certain period of time has passed.
  • FIG. 11 is a flowchart illustrating an example of an operation by which the video processing device 1 displays AR content at a correct position in a captured image. The operation of this flowchart is realized by the control unit 10 of the video processing device 1 reading and executing the program stored in the program storage unit 20.
  • This operation flow is started, for example, when the user inputs an instruction to display AR content, or when a predetermined condition is satisfied and the control unit 10 outputs an instruction to display AR content.
  • this operation flow may be started when the video processing device 1 is activated and the camera 4 acquires a photographed image.
  • Step ST201 and step ST202 may be the same as step ST101 and step ST102 described with reference to FIG. 3, so duplicate explanation here will be omitted.
  • the moving object detection section 102 may output the captured image and information about the detected moving object MB to the moving object area calculation section 110.
  • In step ST203, the moving object area calculation unit 110 calculates the area of the moving object MB.
  • the moving object area calculation unit 110 may calculate the area of the moving object MB, or may calculate the proportion of the moving object MB in the captured image.
  • In step ST204, the moving object area calculation unit 110 determines whether the calculated area of the moving object MB is greater than or equal to a threshold value. If it is determined that the calculated area is not equal to or larger than the predetermined threshold, that is, if the moving object MB does not occupy most of the captured image, the process proceeds to step ST205. On the other hand, if it is determined that the calculated area is equal to or greater than the predetermined threshold, that is, if the moving object MB occupies most of the captured image, the process proceeds to step ST207. Note that when the ratio is calculated instead, the moving object area calculation unit 110 may determine whether the ratio of the moving object MB is equal to or greater than a threshold value.
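  • The area check in steps ST203 and ST204 can be sketched as below; computing the ratio from a binary detection mask and the 0.5 threshold value are illustrative assumptions, since the embodiment does not fix a concrete threshold.

```python
# Minimal sketch: decide which branch of the second embodiment's flow to take based on
# how much of the captured image the moving object MB occupies. The threshold is an
# assumed example value, not one specified by the embodiment.
import numpy as np

AREA_RATIO_THRESHOLD = 0.5  # hypothetical threshold

def moving_object_ratio(moving_object_mask: np.ndarray) -> float:
    # Ratio of moving-object pixels (mask value 255) to all pixels in the captured image.
    return float(np.count_nonzero(moving_object_mask)) / moving_object_mask.size

def occupies_most_of_image(moving_object_mask: np.ndarray) -> bool:
    # True  -> proceed to the road-surface analysis branch (step ST207 onward).
    # False -> proceed with complementation and line-of-sight estimation (steps ST205/ST206).
    return moving_object_ratio(moving_object_mask) >= AREA_RATIO_THRESHOLD
```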
  • Step ST205 and step ST206 may be the same as step ST103 and step ST104 described with reference to FIG. 3, so a duplicate description here will be omitted.
  • the line-of-sight estimation unit 105 may store information about the estimated line of sight in the line-of-sight storage unit 303 together with time information.
  • if the area of the moving object MB is equal to or larger than the threshold, the process estimates the user's line of sight using a method that can estimate the line of sight even when, for example, the user is looking down.
  • In step ST207, the road surface image analysis unit 111 detects a portion where unevenness in the environment shown in the photographed image has been deformed into a vertically elongated shape due to movement during the interval of the camera 4's shutter speed.
  • the road surface image analysis unit 111 may detect unevenness (for example, pebbles) appearing in the environment of the photographed image using a general method, and detect deformation of that unevenness. For example, a pebble is detected as a rectangle circumscribing it, and the road surface image analysis unit 111 estimates the degree of deformation from the ratio of the rectangle's height to its width. The road surface image analysis unit 111 then estimates the moving speed of the moving object MB from the degree of deformation.
  • alternatively, the road surface image analysis unit 111 may estimate the speed of the moving object MB based on, for example, the intensity of blur in the photographed image, the mounting angle of the camera 4, and the like.
  • a general technique may be used as the method for estimating the speed of the moving object MB; for example, a method such as that disclosed in Non-Patent Document 4 may be used. A detailed explanation of the speed estimation method is therefore omitted here.
  • FIG. 12 is a diagram showing an example of detecting unevenness on a road surface in a photographed image.
  • unevenness on the road surface is indicated by reference symbol CC.
  • the road image analysis unit 111 detects irregularities such as pebbles on the road surface from the photographed image.
  • FIG. 13 is a diagram showing an example in which unevenness on a road surface in a photographed image is blurred. As shown in FIG. 13, as the speed of the moving body MB increases, the detected pebbles become blurred.
  • the road surface image analysis unit 111 may estimate the speed of the moving body MB using the degree of blur.
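  • A minimal sketch of this speed estimation idea follows: the vertical elongation of each detected pebble's bounding rectangle is treated as a proxy for motion blur over the exposure time and converted into a speed. The exposure time, pixel-to-metre scale, and linear blur model are all illustrative assumptions; the embodiment defers the concrete method to a general technique such as Non-Patent Document 4.

```python
# Minimal sketch: estimate the speed of the moving object MB from how much road-surface
# unevenness (pebbles) is smeared vertically during one exposure. The exposure time and
# metres-per-pixel scale are hypothetical calibration values.
import cv2
import numpy as np

EXPOSURE_TIME_S = 1.0 / 60.0      # assumed shutter interval of camera 4
METRES_PER_PIXEL = 0.002          # assumed ground-plane scale at the pebbles' distance

def estimate_speed_from_pebbles(road_region_gray: np.ndarray) -> float:
    # Segment small blobs (pebbles) and measure their circumscribing rectangles.
    _, binary = cv2.threshold(road_region_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    blur_lengths = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w >= 2 and h > w:             # vertically elongated -> smeared by motion
            blur_lengths.append(h - w)   # extra height taken as the blur length in pixels
    if not blur_lengths:
        return 0.0
    blur_pixels = float(np.median(blur_lengths))
    return blur_pixels * METRES_PER_PIXEL / EXPOSURE_TIME_S  # speed in m/s
```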
  • In step ST208, the line-of-sight estimation unit 105 estimates the line of sight.
  • specifically, the line-of-sight estimation unit 105 acquires, from the line-of-sight storage unit 303, the line-of-sight estimation result most recently stored before the current time, and estimates the current line-of-sight position from that result and the estimated moving speed.
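  • As a minimal sketch, assuming the stored line of sight carries a timestamp, a position, and a direction of travel, the current position can be dead-reckoned from the estimated speed as below; the data layout and the constant-heading assumption are illustrative only.

```python
# Minimal sketch: estimate the current line-of-sight position from the last stored
# estimate and the speed obtained from the road-surface analysis. The StoredLineOfSight
# layout and the constant-heading assumption are hypothetical.
from dataclasses import dataclass
import numpy as np

@dataclass
class StoredLineOfSight:
    timestamp: float          # seconds, when the estimate was stored
    position: np.ndarray      # 3D position of the line of sight at that time
    direction: np.ndarray     # unit vector of the travel direction at that time

def estimate_current_position(stored: StoredLineOfSight, speed_mps: float, now: float) -> np.ndarray:
    elapsed = max(now - stored.timestamp, 0.0)
    # Dead reckoning: move along the last known travel direction by speed * elapsed time.
    return stored.position + stored.direction * speed_mps * elapsed
```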
  • Steps ST209 to ST211 may be the same as steps ST105 to ST107 described with reference to FIG. 3, so a duplicate explanation here will be omitted.
  • the AR content space is linked to a real-world coordinate system such as longitude and latitude. Therefore, even when the line of sight is estimated by a method that uses the distortion of unevenness in the captured image, the moving distance calculated from speed and time can be mapped into the AR content space based on the distance between two points in the longitude/latitude coordinate system. Through the processing in steps ST209 to ST211, the video processing device 1 can therefore correctly recognize the AR content space and accurately present the AR content ARC to the user.
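  • A minimal sketch of linking a travelled distance to the longitude/latitude frame is shown below, using a local equirectangular approximation; the Earth-radius constant and the assumption of a known heading are simplifications introduced for illustration.

```python
# Minimal sketch: convert a travelled distance along a known heading into a new
# longitude/latitude, so the dead-reckoned line of sight can be placed in the AR
# content space anchored to real-world coordinates. Uses a local equirectangular
# approximation, which is adequate over short distances.
import math

EARTH_RADIUS_M = 6_371_000.0

def advance_lat_lon(lat_deg: float, lon_deg: float, distance_m: float, heading_deg: float):
    heading = math.radians(heading_deg)           # 0 = north, 90 = east (assumed convention)
    d_north = distance_m * math.cos(heading)
    d_east = distance_m * math.sin(heading)
    new_lat = lat_deg + math.degrees(d_north / EARTH_RADIUS_M)
    new_lon = lon_deg + math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return new_lat, new_lon
```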
  • as described above, the video processing device 1 can accurately present the AR content ARC to the user even when the moving object MB, rather than the environment, occupies most of the image captured by the camera 4.
  • the camera that provides the photographed image is not limited to a camera 4 included in the video processing device 1; it may be an independent camera 4 connected to the video processing device 1. In that case, the camera 4 is installed at a location (for example, above the user's head) from which a captured image suitable for estimating the user's line of sight can be obtained.
  • the method described in the above embodiments can be stored, as a program (software means) executable by a computer, on a recording medium such as a magnetic disk (a floppy (registered trademark) disk, a hard disk, etc.), an optical disc (a CD-ROM, DVD, MO, etc.), or a semiconductor memory (a ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via a communication medium.
  • the programs stored on the medium side also include a setting program for configuring, in the computer, the software means (including not only execution programs but also tables and data structures) to be executed by the computer.
  • a computer that realizes this device reads a program stored in a storage medium, and if necessary, constructs software means using a setting program, and executes the above-described processing by controlling the operation of the software means.
  • the storage medium referred to in this specification is not limited to those for distribution, and includes storage media such as magnetic disks and semiconductor memories provided inside computers or devices connected via a network.
  • the present invention is not limited to the above-described embodiments, and various modifications can be made at the implementation stage without departing from the spirit thereof. Moreover, each embodiment may be implemented by appropriately combining them as much as possible, and in that case, the combined effects can be obtained. Further, the embodiments described above include inventions at various stages, and various inventions can be extracted by appropriately combining the plurality of disclosed constituent elements.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A video processing device according to one embodiment is to be worn by a user and comprises an image acquisition unit that acquires a captured image that has been captured by a camera and includes a mobile body being ridden by the user and the surroundings, a mobile body detection unit that detects the mobile body from the captured image, a surroundings image reconstruction unit that supplements an image of the surroundings at the portion of the detected mobile body when the area of the detected mobile body is smaller than a prescribed threshold value, a line of sight estimation unit that estimates the line of sight of the user on the basis of the captured image, a line of sight movement detection unit that acquires sensor data from a sensor of the video processing device and detects movement of the line of sight of the user from the estimated line of sight on the basis of the sensor data, a rendering unit that calculates the appearance of AR content on the basis of the movement of the estimated line of sight, and an output control unit that performs control to display the calculated AR content.

Description

映像処理装置、映像処理方法、および映像処理プログラムVideo processing device, video processing method, and video processing program
 この発明は、映像処理装置、映像処理方法、および映像処理プログラムに関する。 The present invention relates to a video processing device, a video processing method, and a video processing program.
 拡張現実(AR:Augmented Reality)システムを利用するユーザは、携帯端末またはARデバイス越しに現実世界の実空間を見ることが可能である。この際、実空間に付加情報として、ナビゲーション情報または3Dデータ等のコンテンツ(以下、ARコンテンツと記載する)が提示される。すなわち、ARシステムを利用するユーザは、現実世界にARコンテンツが重ねて表示されており、このコンテンツの情報を利用することができる。 A user using an augmented reality (AR) system can view the real space of the real world through a mobile terminal or an AR device. At this time, content such as navigation information or 3D data (hereinafter referred to as AR content) is presented as additional information in the real space. That is, a user using an AR system can see AR content superimposed on the real world and use information about this content.
 例えば、ARシステムを利用するユーザが移動体に乗って移動している場合、ARデバイスにより映し出される映像は、カメラの映像に環境を映した部分(自転車の前方の風景)と、移動体を映した部分(自転車の車体の一部)とが混在する。 For example, when a user using an AR system is moving around on a moving object, the image displayed by the AR device will include a portion of the camera image showing the environment (scenery in front of the bicycle) and a portion showing the moving object. (a part of the bicycle body).
 従来の自己位置推定処理は、カメラの映像が環境を映したものであることを前提としており、正確に処理できない。その結果、ARデバイスは、正しい位置にARコンテンツを表示することができないという問題がある。 Conventional self-position estimation processing is based on the premise that the camera image reflects the environment, and cannot be processed accurately. As a result, there is a problem in that the AR device cannot display AR content in the correct position.
 この発明は上記事情に着目してなされたもので、その目的とするところは、カメラの映像に環境を映した部分と移動体を映した部分とが混在した場合であっても、携帯端末またはARデバイスは、正確な位置にARコンテンツを表示することができる技術を提供することにある。 This invention has been made in view of the above-mentioned circumstances, and its purpose is to prevent mobile terminals or The purpose of an AR device is to provide a technology that can display AR content at a precise location.
 上記課題を解決するためにこの発明の一態様は、ユーザが装着する映像処理装置であって、カメラで撮影した、前記ユーザが乗っている移動体と環境とを含む撮影画像を取得する画像取得部と、前記撮影画像から前記移動体を検出する移動体検出部と、前記検出された移動体の面積が所定の閾値より小さい場合、前記検出された移動体の部分における環境の画像を補完する環境画像復元部と、前記撮影画像に基づいて前記ユーザの視線を推定する視線推定部と、前記映像処理装置が備えるセンサからセンサデータを取得し、前記センサデータに基づいて、前記推定された視線を起点とした前記ユーザの視線の移動を検出する視線移動検出部と、前記推定された視線の移動に基づいて、ARコンテンツの見え方を算出する描画部と、前記算出されたARコンテンツを表示するように制御する出力制御部と、を備えるようにしたものである。 In order to solve the above problems, one aspect of the present invention is an image processing device worn by a user, which acquires an image captured by a camera and including a moving object on which the user is riding and an environment. a moving object detecting section that detects the moving object from the photographed image; and a moving object detecting section that detects the moving object from the photographed image, and complements an image of the environment in the area of the detected moving object when the area of the detected moving object is smaller than a predetermined threshold. an environmental image restoration unit; a line-of-sight estimating unit that estimates the user's line of sight based on the photographed image; and a line-of-sight estimation unit that acquires sensor data from a sensor included in the image processing device, and calculates the estimated line-of-sight based on the sensor data. a line-of-sight movement detection unit that detects a movement of the user's line of sight starting at , a drawing unit that calculates how the AR content looks based on the estimated movement of the line-of-sight, and a display unit that displays the calculated AR content. and an output control section that controls the output so as to perform the control.
 この発明の一態様によれば、カメラの映像に環境を映した部分と移動体を映した部分とが混在した場合であっても、携帯端末またはARデバイスは、正確な位置にARコンテンツを表示することができ、これにより、ユーザにARコンテンツを正確に提示することが可能となる。 According to one aspect of the present invention, even when a camera image includes a portion showing the environment and a portion showing a moving object, a mobile terminal or an AR device displays AR content at an accurate position. This makes it possible to accurately present AR content to the user.
図1は、第1の実施形態に係る映像処理装置のハードウェア構成の一例を示すブロック図である。FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device according to the first embodiment. 図2は、第1の実施形態における映像処理装置のソフトウェア構成を、図1に示したハードウェア構成に関連付けて示すブロック図である。FIG. 2 is a block diagram showing the software configuration of the video processing apparatus in the first embodiment in relation to the hardware configuration shown in FIG. 図3は、映像処理装置が撮影画像の正しい位置にARコンテンツを表示させるための動作の一例を示すフローチャートである。FIG. 3 is a flowchart illustrating an example of an operation for the video processing device to display AR content at a correct position in a captured image. 図4は、撮影画像の一例を示す図である。FIG. 4 is a diagram showing an example of a photographed image. 図5は、撮影画像において移動体が検出した際の一例を示す図である。FIG. 5 is a diagram illustrating an example when a moving object is detected in a photographed image. 図6は、移動体として抽出された部分の映像を周辺画像で補完した場合の一例を示した図である。FIG. 6 is a diagram showing an example of a case where a video of a portion extracted as a moving object is complemented with surrounding images. 図7は、特徴点に対応するARコンテンツの設定例を示した図である。FIG. 7 is a diagram showing an example of setting AR content corresponding to feature points. 図8は、算出されたARコンテンツの見え方の一例を示した図である。FIG. 8 is a diagram showing an example of how the calculated AR content looks. 図9は、第2の実施形態におけるカメラで撮影された撮影画像の一例を示す図である。FIG. 9 is a diagram illustrating an example of a captured image captured by a camera in the second embodiment. 図10は、第2の実施形態における映像処理装置のソフトウェア構成を、図1に示したハードウェア構成に関連付けて示すブロック図である。FIG. 10 is a block diagram showing the software configuration of the video processing apparatus in the second embodiment in relation to the hardware configuration shown in FIG. 1. 図11は、映像処理装置が撮影画像の正しい位置にARコンテンツを表示させるための動作の一例を示すフローチャートである。FIG. 11 is a flowchart illustrating an example of an operation for the video processing device to display AR content at a correct position in a captured image. 図12は、撮影画像における路面上の凹凸を検出した一例を示した図である。FIG. 12 is a diagram showing an example of detecting unevenness on a road surface in a photographed image. 図13は、撮影画像における路面上の凹凸に対してブラーが掛かっている一例を示した図である。FIG. 13 is a diagram showing an example in which unevenness on a road surface in a photographed image is blurred.
 以下、図面を参照してこの発明に係る実施形態を説明する。なお、以降、説明済みの要素と同一または類似の要素には同一または類似の符号を付し、重複する説明については基本的に省略する。例えば、複数の同一または類似の要素が存在する場合に、各要素を区別せずに説明するために共通の符号を用いることがあるし、各要素を区別して説明するために当該共通の符号に加えて枝番号を用いることもある。 Hereinafter, embodiments according to the present invention will be described with reference to the drawings. Note that, hereinafter, elements that are the same or similar to elements that have already been explained will be given the same or similar numerals, and overlapping explanations will basically be omitted. For example, when there are multiple identical or similar elements, a common code may be used to explain each element without distinction, or a common code may be used to distinguish and explain each element. In addition, branch numbers may also be used.
 [第1の実施形態] 
 (構成) 
 図1は、第1の実施形態に係る映像処理装置1のハードウェア構成の一例を示すブロック図である。 
 映像処理装置1は、入力されたデータを解析して、出力データを生成し出力する、コンピュータである。映像処理装置1は、例えば、ARグラス、スマートグラス、または他のウェアラブルデバイスを含むARデバイスであって良い。すなわち、映像処理装置1は、ユーザが装着して使用するデバイスであって良い。
[First embodiment]
(composition)
FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device 1 according to the first embodiment.
The video processing device 1 is a computer that analyzes input data, generates and outputs output data. The video processing device 1 may be, for example, an AR device including AR glasses, smart glasses, or other wearable devices. That is, the video processing device 1 may be a device worn and used by a user.
 図1に示すように、映像処理装置1は、制御部10、プログラム記憶部20、データ記憶部30、通信インタフェース40、および入出力インタフェース50を備える。制御部10、プログラム記憶部20、データ記憶部30、通信インタフェース40、および入出力インタフェース50は、バスを介して互いに通信可能に接続されている。さらに通信インタフェース40は、ネットワークを介して外部装置と通信可能に接続されてよい。また、入出力インタフェース50は、入力装置2、出力装置3、カメラ4、および慣性センサ5と通信可能に接続される。 As shown in FIG. 1, the video processing device 1 includes a control section 10, a program storage section 20, a data storage section 30, a communication interface 40, and an input/output interface 50. The control unit 10, program storage unit 20, data storage unit 30, communication interface 40, and input/output interface 50 are communicably connected to each other via a bus. Further, the communication interface 40 may be communicably connected to an external device via a network. Further, the input/output interface 50 is communicably connected to the input device 2, the output device 3, the camera 4, and the inertial sensor 5.
 制御部10は、映像処理装置1を制御する。制御部10は、中央処理ユニット(CPU:Central Processing Unit)等のハードウェアプロセッサを備える。例えば、制御部10は、様々なプログラムを実行することが可能な集積回路であっても良い。 The control unit 10 controls the video processing device 1. The control unit 10 includes a hardware processor such as a central processing unit (CPU). For example, the control unit 10 may be an integrated circuit capable of executing various programs.
 プログラム記憶部20は、記憶媒体として、例えば、EPROM(Erasable Programmable Read Only Memory)、HDD(Hard Disk Drive)、SSD(Solid State Drive)等の随時書込みおよび読出しが可能な不揮発性メモリと、ROM(Read Only Memory)等の不揮発性メモリとを組み合わせて使用することができる。プログラム記憶部20は、各種処理を実行するために必要なプログラムを格納している。すなわち、制御部10は、プログラム記憶部20に格納されたプログラムを読み出して実行することにより各種制御および動作を実現し得る。 The program storage unit 20 includes non-volatile memories that can be written to and read from at any time such as EPROM (Erasable Programmable Read Only Memory), HDD (Hard Disk Drive), and SSD (Solid State Drive), as well as ROM ( It can be used in combination with non-volatile memory such as Read Only Memory). The program storage unit 20 stores programs necessary to execute various processes. That is, the control unit 10 can implement various controls and operations by reading and executing programs stored in the program storage unit 20.
 データ記憶部30は、記憶媒体として、例えば、HDD、メモリカード等の随時書込みおよび読出しが可能な不揮発性メモリと、RAM(Random Access Memory)等の揮発性メモリとを組み合わせて使用したストレージである。データ記憶部30は、制御部10がプログラムを実行して各種処理を行う過程で取得および生成されたデータを記憶するために用いられる。 The data storage unit 30 is a storage that uses a combination of a non-volatile memory that can be written to and read from at any time, such as an HDD or a memory card, and a volatile memory such as a RAM (Random Access Memory), as a storage medium. . The data storage unit 30 is used to store data acquired and generated while the control unit 10 executes programs and performs various processes.
 通信インタフェース40は、1つ以上の有線または無線の通信モジュールを含む。例えば、通信インタフェース40は、ネットワークを介して外部装置と有線または無線接続する通信モジュールを含む。通信インタフェース40は、Wi-Fiアクセスポイントおよび基地局等の外部装置と無線接続する無線通信モジュールを含んでも良い。さらに、通信インタフェース40は、近距離無線技術を利用して外部装置と無線接続するための無線通信モジュールを含んでも良い。すなわち、通信インタフェース40は、制御部10の制御の下、外部装置との間で通信を行い、過去の実績データを含む各種情報を送受信することができるものであれば一般的な通信インタフェースで良い。 The communication interface 40 includes one or more wired or wireless communication modules. For example, the communication interface 40 includes a communication module that makes a wired or wireless connection to an external device via a network. Communication interface 40 may include a wireless communication module that wirelessly connects to external devices such as Wi-Fi access points and base stations. Furthermore, the communication interface 40 may include a wireless communication module for wirelessly connecting to an external device using short-range wireless technology. That is, the communication interface 40 may be any general communication interface as long as it is capable of communicating with an external device under the control of the control unit 10 and transmitting and receiving various information including past performance data. .
 入出力インタフェース50は、入力装置2、出力装置3、カメラ4、慣性センサ5等と接続される。入出力インタフェース50は、入力装置2、出力装置3、および複数のカメラ4、慣性センサ5との間で情報の送受信を可能にするインタフェースである。入出力インタフェース50は、通信インタフェース40と一体であってもよい。例えば、映像処理装置1と、入力装置2、出力装置3、カメラ4、慣性センサ5の少なくとも1つとは、近距離無線技術等を使用して無線接続されており、当該近距離無線技術を用いて情報の送受信を行ってもよい。 The input/output interface 50 is connected to the input device 2, output device 3, camera 4, inertial sensor 5, etc. The input/output interface 50 is an interface that allows information to be transmitted and received between the input device 2, the output device 3, and the plurality of cameras 4 and inertial sensors 5. The input/output interface 50 may be integrated with the communication interface 40. For example, the video processing device 1 and at least one of the input device 2, the output device 3, the camera 4, and the inertial sensor 5 are wirelessly connected using short-range wireless technology or the like. Information may also be sent and received using
 入力装置2は、例えば、ユーザが映像処理装置1に対して過去の実績データを含む各種情報を入力するためのキーボードやポインティングデバイス等を含んでも良い。また、入力装置2は、プログラム記憶部20またはデータ記憶部30に格納するべきデータを、USBメモリ等のメモリ媒体から読み出すためのリーダや、そのようなデータをディスク媒体から読み出すためのディスク装置を含んでも良い。 The input device 2 may include, for example, a keyboard, a pointing device, etc. for the user to input various information including past performance data to the video processing device 1. The input device 2 also includes a reader for reading data to be stored in the program storage section 20 or the data storage section 30 from a memory medium such as a USB memory, and a disk device for reading such data from a disk medium. May be included.
 出力装置3は、カメラ4で撮影した映像、ARコンテンツを表示するディスプレイ等を含む。出力装置3は、映像処理装置1と一体になっていても良い。例えば、映像処理装置1がARグラスまたはスマートグラス等である場合、出力装置3は、グラスの部分になる。 The output device 3 includes a display that displays images captured by the camera 4, AR content, and the like. The output device 3 may be integrated with the video processing device 1. For example, when the video processing device 1 is AR glasses or smart glasses, the output device 3 is a part of the glasses.
 カメラ4は、風景等の環境を撮影することが可能であり、映像処理装置1に装着可能な一般的なカメラ4であって良い。ここで、環境は、一般的に撮影される風景を指す。カメラ4は、映像処理装置1と一体になっていても良い。カメラ4は、撮影した撮影画像を入出力インタフェース50を通じて、映像処理装置1の制御部10に出力して良い。 The camera 4 is capable of photographing environments such as landscapes, and may be a general camera 4 that can be attached to the video processing device 1. Here, the environment generally refers to the scenery that is photographed. The camera 4 may be integrated with the video processing device 1. The camera 4 may output the captured image to the control unit 10 of the video processing device 1 through the input/output interface 50.
 慣性センサ5は、例えば、加速度センサ、角速度センサ、地磁気センサ等を含む。例えば、映像処理装置1がARデバイスである場合、慣性センサ5は、ARデバイスを装着したユーザの移動スピード、頭の動きを感知し、感知に応じたセンサデータを制御部10に出力する。 The inertial sensor 5 includes, for example, an acceleration sensor, an angular velocity sensor, a geomagnetic sensor, and the like. For example, when the video processing device 1 is an AR device, the inertial sensor 5 senses the moving speed and head movement of the user wearing the AR device, and outputs sensor data according to the sensing to the control unit 10.
 図2は、第1の実施形態における映像処理装置1のソフトウェア構成を、図1に示したハードウェア構成に関連付けて示すブロック図である。
 制御部10は、画像取得部101と、移動体検出部102と、環境画像復元部103と、特徴点抽出部104と、視線推定部105と、視線移動推定部106と、センサデータ制御部107と、ARコンテンツ描画部108と、出力制御部109と、を備える。
FIG. 2 is a block diagram showing the software configuration of the video processing device 1 in the first embodiment in relation to the hardware configuration shown in FIG. 1.
The control unit 10 includes an image acquisition unit 101 , a moving object detection unit 102 , an environment image restoration unit 103 , a feature point extraction unit 104 , a line of sight estimation unit 105 , a line of sight movement estimation unit 106 , and a sensor data control unit 107 , an AR content drawing section 108 , and an output control section 109 .
 画像取得部101は、カメラ4が撮影した撮影画像を取得する。なお、画像取得部101は、撮影画像を画像記憶部301に記憶させて良い。 The image acquisition unit 101 acquires a photographed image taken by the camera 4. Note that the image acquisition unit 101 may store the captured image in the image storage unit 301.
 移動体検出部102は、撮影画像から移動体を検出する。移動体検出部102は、撮影画像に写った移動体MBを検出する。移動体は、原動機付き自転車、電動機付き自転車、自動二輪車、車両等任意のものであって良いのは勿論である。さらに、移動体は、移動体のみを含むだけでも良いし、移動体に乗っているユーザの体の一部、例えば、腕等も含んで良い。また、検出方法は、一般的な技術を用いて良い。 The moving object detection unit 102 detects a moving object from the captured image. The moving object detection unit 102 detects the moving object MB appearing in the photographed image. Of course, the moving object may be any arbitrary object such as a motorized bicycle, an electric bicycle, a motorcycle, or a vehicle. Further, the moving object may include only the moving object, or may include a part of the body of the user riding the moving object, such as an arm. Further, a general technique may be used as the detection method.
 環境画像復元部103は、移動体として検出された部分の映像を補完する。環境画像復元部103は、移動体として検出された部分の画像を撮影画像の環境の画像から補完された環境の画像に置き換える。なお、補完の方法は、一般的な技術を用いて良い。 The environmental image restoration unit 103 complements the video of the portion detected as a moving object. The environmental image restoration unit 103 replaces the image of the portion detected as a moving object with an environmental image supplemented from the environmental image of the photographed image. Note that a general technique may be used for the interpolation method.
 特徴点抽出部104は、撮影画像の環境内にあるものを特徴点として抽出する。例えば、特徴点抽出部104は、後述する空間記憶部302に記憶された特徴点空間付近にあるものを特徴点として抽出する。 The feature point extraction unit 104 extracts things in the environment of the photographed image as feature points. For example, the feature point extracting unit 104 extracts as feature points those located near the feature point space stored in the space storage unit 302, which will be described later.
 視線推定部105は、特徴点抽出部104が抽出した特徴点を空間記憶部302に記憶された特徴点空間と突合することにより、視線の位置を推定する。なお、視線の位置の推定方法の詳細は後述する。 The line of sight estimation unit 105 estimates the position of the line of sight by comparing the feature points extracted by the feature point extraction unit 104 with the feature point space stored in the spatial storage unit 302. Note that details of the method for estimating the position of the line of sight will be described later.
 The line-of-sight movement estimation unit 106 estimates line-of-sight movement. The line-of-sight movement estimation unit 106 applies the three-dimensional movement measured by the sensor data control unit 107 (described later), starting from the line-of-sight position received from the line-of-sight estimation unit 105, and thereby estimates a line-of-sight movement that follows the movement of the user's head. That is, the line-of-sight movement estimation unit 106 estimates, based on the sensor data, the movement of the user's line of sight starting from the line-of-sight position estimated by the line-of-sight estimation unit 105.
 The sensor data control unit 107 acquires sensor data from the inertial sensor 5 and measures the user's head movement, body movement, and the like from the acquired sensor data. For example, the sensor data control unit 107 measures the user's three-dimensional movement (for example, the movement of the user's head) based on the sensor data.
 The AR content drawing unit 108 calculates how the AR content looks. The AR content drawing unit 108 places the estimated line-of-sight movement included in the line-of-sight movement information received from the line-of-sight movement estimation unit 106, that is, the destination line of sight, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content appears in that space.
 The output control unit 109 outputs AR content information and controls the output device 3 to draw the AR content. For example, the output control unit 109 performs control so that the adjusted AR content is displayed on AR glasses or the like.
 The data storage unit 30 includes an image storage unit 301 and a space storage unit 302.
 The image storage unit 301 may store the captured images acquired by the image acquisition unit 101. Here, a captured image stored in the image storage unit 301 may carry information, acquired by the video processing device 1, about the longitude and latitude in the real world at which the image was captured. The image storage unit 301 may also automatically delete a captured image after a predetermined time has elapsed.
 The space storage unit 302 stores a feature point space and an AR content space corresponding to the feature point space. The feature point space may be located at preset positions in the captured image; for example, two positions may be set, and the AR content space may be set in the middle between the two feature point spaces.
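 Purely as an illustrative sketch, and not as part of the disclosure, the pairing of a feature point space and its AR content space held by the space storage unit 302 could be represented as follows; the class names, field names, and example coordinates are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FeaturePointSpace:
    points_3d: np.ndarray     # (N, 3) positions of registered feature points (e.g., buildings Bu)
    descriptors: np.ndarray   # (N, 32) binary feature descriptors, e.g., ORB

@dataclass
class ARContentSpace:
    anchor_3d: np.ndarray     # content anchor in the same coordinate system, e.g., the midpoint
    content_id: str = "smile_mark"

@dataclass
class SpaceStore:
    """Minimal stand-in for the space storage unit 302."""
    feature_space: FeaturePointSpace
    content_space: ARContentSpace

# Example: two buildings at known 3D positions, AR content placed midway between them.
buildings = np.array([[0.0, 0.0, 10.0], [6.0, 0.0, 10.0]])
store = SpaceStore(
    feature_space=FeaturePointSpace(points_3d=buildings,
                                    descriptors=np.zeros((2, 32), dtype=np.uint8)),
    content_space=ARContentSpace(anchor_3d=buildings.mean(axis=0)),
)
print(store.content_space.anchor_3d)   # [ 3.  0. 10.]
```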
 (Operation)
 First, a method for displaying AR content on the video processing device 1 (a mobile terminal or an AR device) used by the user in a general AR system will be described.
 The video processing device 1 extracts feature points from the image captured by the camera 4. Furthermore, the video processing device 1 estimates the user's line of sight (position and direction) by matching the extracted feature points against data in which feature points have also been extracted from previously captured images (for example, the image of the previous frame or of a frame several frames earlier), hereinafter referred to as the feature point space. Here, the feature point space is a space constructed for a "surrounding space" set based on a predetermined usage scene, so the captured image is also an image of that surrounding space. It is therefore assumed that the positional relationships of both the feature point space and the captured image are identified based on the "surrounding space."
 The video processing device 1 tracks the movement of the user's head based on the inertial data received from the inertial sensor 5, and thereby follows the above-described user's line of sight closer to real time.
 Furthermore, the video processing device 1 positions the user's line of sight, which is being followed in real time, in an AR content space created in advance, and calculates how the AR content looks from that position.
 Then, the video processing device 1 causes the output device 3 to draw the calculated appearance.
 In this way, the AR system generates AR content and causes the video processing device 1, such as a smartphone or AR glasses, to display it. However, as described above, this method assumes that the camera 4 captures only the environment; in the case of a captured image in which the environment and the moving object MB are mixed, the video processing device 1 may fail to display the AR content at the correct position.
 Therefore, the operation of the video processing device 1 for displaying AR content at the correct position even in a captured image in which the environment and a moving object coexist will be described below.
 FIG. 3 is a flowchart illustrating an example of an operation by which the video processing device 1 displays AR content at the correct position in a captured image.
 The operation of this flowchart is realized by the control unit 10 of the video processing device 1 reading and executing the program stored in the program storage unit 20.
 This operation flow is started, for example, when the control unit 10 outputs an instruction to display AR content because the user has input an instruction requesting the display of AR content or because a predetermined condition has been satisfied. Alternatively, this operation flow may be started when the video processing device 1 is activated and the camera 4 acquires a captured image. In this operation, the moving object is assumed to be a bicycle.
 In step ST101, the image acquisition unit 101 acquires a captured image taken by the camera 4. The image acquisition unit 101 may store the captured image in the image storage unit 301. The captured image is assumed to include both the environment and the moving object. Here, the environment may be general scenery as described above, and therefore refers to the portion of the image excluding the moving object.
 FIG. 4 is a diagram showing an example of a captured image.
 The example of FIG. 4 is an image captured while the user is riding and operating a bicycle, which is the moving object; the captured image therefore includes both the moving object and the environment. Here, the camera 4 is the camera 4 included in the AR glasses serving as the video processing device 1, and the captured image is taken by this camera 4.
 In the example of FIG. 4, the user's arms and the bicycle's handlebars and wheels are shown hatched for simplicity. Furthermore, in the example of FIG. 4, two buildings Bu appear in the captured image. The user shown in the example of FIG. 4 is assumed to be facing forward.
 In step ST102, the moving object detection unit 102 detects the moving object MB from the captured image. Here, the moving object detection unit 102 may detect the moving object MB by performing object detection using a general method; for example, an object detection method such as that disclosed in Non-Patent Document 2 may be used. A detailed description of the object detection method is therefore omitted here.
 FIG. 5 is a diagram illustrating an example in which the moving object MB is detected in the captured image.
 In the example of FIG. 5, the bicycle and the user are detected as the moving object MB. That is, the portion detected as the moving object MB includes the user's arms in addition to the bicycle's handlebars and wheels.
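 Purely as an illustration of step ST102, and not as the specific method of Non-Patent Document 2, the detection result can be thought of as a binary mask over the captured image; the segmentation model, its class labels, and the class IDs below are assumptions, and any detector that yields a mask covering the bicycle and the rider's arms would serve.

```python
import numpy as np

# Classes treated as the "moving object MB" in this sketch (bicycle plus rider); names are assumed.
MOVING_CLASSES = {"bicycle", "person"}

def moving_object_mask(class_map: np.ndarray, id_to_name: dict) -> np.ndarray:
    """Return a binary mask (uint8, 0/255) of pixels belonging to the moving object MB.

    class_map:  per-pixel class IDs from any off-the-shelf segmentation model (assumed available).
    id_to_name: mapping from class ID to class name for that model.
    """
    mask = np.zeros(class_map.shape, dtype=np.uint8)
    for class_id, name in id_to_name.items():
        if name in MOVING_CLASSES:
            mask[class_map == class_id] = 255
    return mask

# Toy usage with a fabricated 4x4 class map (0 = road, 1 = bicycle, 2 = person).
toy_map = np.array([[0, 0, 1, 1],
                    [0, 2, 1, 1],
                    [2, 2, 1, 0],
                    [0, 0, 0, 0]])
print(moving_object_mask(toy_map, {0: "road", 1: "bicycle", 2: "person"}))
```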
 In step ST103, the environment image restoration unit 103 complements the image of the portion detected as the moving object MB; that is, it fills in the surrounding-environment image for that portion, reproducing the surroundings hidden by the moving object MB. The environment image restoration unit 103 may perform the complementation using a general method; for example, an interpolation technique such as that disclosed in Non-Patent Document 3 may be used. A detailed description of the complementation technique is therefore omitted here.
 FIG. 6 is a diagram showing an example in which the image of the portion extracted as the moving object MB is complemented with the surrounding image.
 As shown in FIG. 6, by complementing the portion of the moving object MB with the surrounding image, the environment image restoration unit 103 can obtain a captured image as it would appear if the moving object MB were not present.
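 As one possible, non-authoritative way to realize the complementation of step ST103 (it is not the specific technique of Non-Patent Document 3), classical inpainting in OpenCV could be applied to the mask obtained in step ST102:

```python
import cv2
import numpy as np

def restore_environment(image_bgr: np.ndarray, moving_mask: np.ndarray) -> np.ndarray:
    """Fill the moving-object region from the surrounding environment (sketch only).

    image_bgr:   captured image from camera 4.
    moving_mask: uint8 mask from step ST102, 255 where the moving object MB was detected.
    """
    # Slightly dilate the mask so object borders do not leak into the filled area.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(moving_mask, kernel, iterations=1)
    # Telea inpainting propagates surrounding pixels into the masked region.
    return cv2.inpaint(image_bgr, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)

# Toy usage: a uniform image with a masked square that gets filled from its surroundings.
img = np.full((64, 64, 3), 128, dtype=np.uint8)
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 20:40] = 255
restored = restore_environment(img, mask)
print(restored.shape)  # (64, 64, 3)
```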
 In step ST104, the line-of-sight estimation unit 105 estimates the user's line of sight. First, the feature point extraction unit 104 extracts feature points from the captured image. The feature points may be, for example, the two buildings Bu.
 FIG. 7 is a diagram showing an example of setting AR content corresponding to feature points.
 As shown in FIG. 7, it is assumed that the space storage unit 302 has preset a "feature point space" whose feature points are the two buildings Bu, and also stores an "AR content space" in which a smile mark, the AR content, is placed midway between the two buildings Bu in that space. That is, the space storage unit 302 stores the feature point space and the AR content space corresponding to it. In FIG. 7, the AR content is indicated by the reference symbol ARC. The example of FIG. 7 is merely one example, and the space storage unit 302 may store AR content spaces together with various feature point spaces.
 The feature point extraction unit 104 surveys the entire captured image and extracts specific feature points such as boundaries (edges) or corners of objects. Here, such an object may be, for example, the boundary of a physical structure; in FIG. 7, the boundaries of the buildings Bu can be extracted as specific feature points.
 The line-of-sight estimation unit 105 then estimates the position of the line of sight by matching the feature points extracted by the feature point extraction unit 104 against the feature point space stored in the space storage unit 302. For example, the line-of-sight estimation unit 105 may estimate the line of sight using vision-based AR technology, such as general markerless AR technologies like PTAM, SmartAR, or Microsoft HoloLens. A detailed description of the AR technology is therefore omitted here. The line-of-sight estimation unit 105 outputs the estimated line-of-sight position to the line-of-sight movement estimation unit 106.
 For example, a method of storing images corresponding to captured images in the space storage unit 302 and estimating the line-of-sight position by matching the images against each other would not be efficient. By instead matching simplified, extracted feature points against the feature point space rather than the images themselves, the line-of-sight estimation unit 105 can perform the processing at high speed.
 When the user is facing forward, the area of the moving object appearing in the captured image is small. Because the moving object occupies only a small area of the captured image, the environment image restoration unit 103 can complement the portion detected as the moving object with high restoration accuracy. As a result, the line-of-sight estimation unit 105 succeeds in estimating the line of sight.
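 To make the matching of extracted feature points against the stored feature point space concrete, the following is a minimal illustrative sketch using ORB features and a RANSAC PnP solve; it stands in for, and is not, the markerless AR technologies named above, and the 2D-3D correspondences of the feature point space are assumed to be available in the form shown.

```python
import cv2
import numpy as np

def estimate_gaze_pose(gray_image, space_descriptors, space_points_3d, camera_matrix):
    """Estimate the camera (line-of-sight) pose by matching image features to the feature point space.

    space_descriptors: ORB descriptors stored for the feature point space (assumed precomputed).
    space_points_3d:   corresponding 3D coordinates in that space, shape (N, 3).
    Returns (rvec, tvec) or None if the estimate fails.
    """
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(gray_image, None)
    if descriptors is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(space_descriptors, descriptors)
    if len(matches) < 6:
        return None  # not enough correspondences for a reliable pose

    object_points = np.float32([space_points_3d[m.queryIdx] for m in matches])
    image_points = np.float32([keypoints[m.trainIdx].pt for m in matches])

    ok, rvec, tvec, _ = cv2.solvePnPRansac(object_points, image_points,
                                           camera_matrix, None)
    return (rvec, tvec) if ok else None
```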
 In step ST105, the line-of-sight movement estimation unit 106 estimates the line-of-sight movement. First, the sensor data control unit 107 acquires sensor data from the inertial sensor 5 and measures the user's head movement, body movement, and the like from the acquired data. Specifically, for example, the inertial sensor 5 is an inertial measurement unit (IMU), and the sensor data control unit 107 acquires sensor data such as acceleration, angular velocity, and geomagnetism from the inertial sensor 5. The sensor data control unit 107 may then measure the user's three-dimensional movement (for example, the movement of the user's head) based on these data, and outputs the measurement result to the line-of-sight movement estimation unit 106.
 The line-of-sight movement estimation unit 106 applies the three-dimensional movement measured by the sensor data control unit 107, starting from the line-of-sight position received from the line-of-sight estimation unit 105, and thereby estimates a line-of-sight movement that follows the movement of the user's head. That is, the line-of-sight movement estimation unit 106 estimates, based on the sensor data, the movement of the user's line of sight starting from the line-of-sight position estimated by the line-of-sight estimation unit 105. The line-of-sight movement estimation unit 106 then outputs line-of-sight movement information including the estimated line-of-sight movement to the AR content drawing unit 108.
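 A minimal sketch of the idea in step ST105 follows: integrate the gyroscope reading over the frame interval and compose it with the previously estimated line-of-sight rotation. The sampling period, the bias-corrected gyro input, and the Rodrigues-vector representation are assumptions made here for illustration.

```python
import cv2
import numpy as np

def update_gaze_with_imu(rvec, gyro_rad_per_s, dt):
    """Rotate the estimated line of sight by the head motion measured with the inertial sensor 5.

    rvec:           current line-of-sight rotation as a Rodrigues vector (e.g., from step ST104).
    gyro_rad_per_s: angular velocity [wx, wy, wz] reported by the IMU (assumed bias-corrected).
    dt:             time since the last update, in seconds.
    """
    delta_rvec = np.asarray(gyro_rad_per_s, dtype=np.float64).reshape(3, 1) * dt
    r_prev, _ = cv2.Rodrigues(np.asarray(rvec, dtype=np.float64).reshape(3, 1))
    r_delta, _ = cv2.Rodrigues(delta_rvec)
    new_rvec, _ = cv2.Rodrigues(r_delta @ r_prev)   # head motion composed onto the previous pose
    return new_rvec

# Example: a 10 ms gyro sample turning the head 0.5 rad/s about the vertical axis.
print(update_gaze_with_imu(np.zeros(3), [0.0, 0.5, 0.0], 0.01).ravel())  # ~[0, 0.005, 0]
```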
 In step ST106, the AR content drawing unit 108 calculates how the AR content looks. The AR content drawing unit 108 places the estimated line-of-sight movement included in the line-of-sight movement information received from the line-of-sight movement estimation unit 106, that is, the destination line of sight, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content appears in that space.
 FIG. 8 is a diagram showing an example of the calculated appearance of the AR content.
 As shown in FIG. 8, the AR content drawing unit 108 adjusts the appearance of the AR content based on the movement information of the line-of-sight movement, and outputs AR content information for drawing the adjusted AR content ARC to the output control unit 109.
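 For step ST106, calculating the "appearance" essentially amounts to projecting the content anchor from the AR content space into the current view. A hedged sketch with an assumed pinhole camera model and distortion-free intrinsics:

```python
import cv2
import numpy as np

def project_ar_content(anchor_3d, rvec, tvec, camera_matrix):
    """Project the AR content anchor (e.g., the smile mark ARC) into pixel coordinates.

    anchor_3d:     3D position of the content in the AR content space.
    rvec, tvec:    current line-of-sight pose from steps ST104/ST105.
    camera_matrix: 3x3 intrinsic matrix of camera 4 (assumed calibrated); no lens distortion here.
    """
    points, _ = cv2.projectPoints(np.float32([anchor_3d]), rvec, tvec,
                                  camera_matrix, np.zeros(5))
    return tuple(points.reshape(2))

# Example: content 3 m to the right and 10 m ahead, identity pose, simple pinhole intrinsics.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
u, v = project_ar_content([3.0, 0.0, 10.0], np.zeros((3, 1)), np.zeros((3, 1)), K)
print(float(u), float(v))  # 560.0 240.0
```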
 In step ST107, the output control unit 109 outputs the AR content information and controls the output device 3 to draw the AR content ARC. For example, the output control unit 109 performs control so that the adjusted AR content ARC is displayed on AR glasses or the like.
 (Operation and Effects of the First Embodiment)
 According to the first embodiment, the video processing device 1 can accurately present the AR content ARC to the user even when a portion showing the environment and a portion showing the moving object MB coexist in the image captured by the camera 4.
 [Second Embodiment]
 The second embodiment addresses the case in which, for example, a user riding a moving object such as a bicycle looks down, so that the moving object occupies most of the image captured by the camera 4.
 FIG. 9 is a diagram showing an example of a captured image taken by the camera 4 in the second embodiment.
 When the moving object MB occupies most of the captured image as shown in FIG. 9, the complementation processing may fail even if the processing of the first embodiment is performed.
 The second embodiment describes a method that makes it possible to present AR content to the user accurately even when the moving object occupies most of the captured image, for example because the user is looking down.
 (Configuration)
 The hardware configuration of the video processing device 1 in the second embodiment may be the same as that of the first embodiment, so a redundant description is omitted here.
 FIG. 10 is a block diagram showing the software configuration of the video processing device 1 in the second embodiment in relation to the hardware configuration shown in FIG. 1.
 In the second embodiment, the control unit 10 differs from the control unit 10 of the first embodiment in that it further includes a moving object area calculation unit 110 and a road surface image analysis unit 111.
 The moving object area calculation unit 110 calculates the area of the moving object MB; it may calculate the area itself or the proportion of the captured image occupied by the moving object MB. The moving object area calculation unit 110 also determines whether the calculated area of the moving object MB is equal to or larger than a threshold value.
 The road surface image analysis unit 111 detects portions where unevenness in the environment shown in the captured image has been deformed into a vertically elongated shape because it moved within the interval of the shutter speed of the camera 4. The road surface image analysis unit 111 may use a general method to detect unevenness appearing in the environment of the captured image (for example, pebbles) and to detect the deformation of the pebbles constituting that unevenness. The road surface image analysis unit 111 further estimates the moving speed of the moving object MB from the degree of the deformation. Details of the method for estimating the moving speed will be described later.
 The data storage unit 30 also differs from that of the first embodiment in that it includes a line-of-sight storage unit 303. The line-of-sight storage unit 303 may store information about the line of sight estimated by the line-of-sight estimation unit 105. The stored line-of-sight information may be deleted after a certain period of time has elapsed.
 (Operation)
 FIG. 11 is a flowchart illustrating an example of an operation by which the video processing device 1 displays AR content at the correct position in a captured image.
 The operation of this flowchart is realized by the control unit 10 of the video processing device 1 reading and executing the program stored in the program storage unit 20.
 This operation flow is started, for example, when the control unit 10 outputs an instruction to display AR content because the user has input an instruction requesting the display of AR content or because a predetermined condition has been satisfied. Alternatively, this operation flow may be started when the video processing device 1 is activated and the camera 4 acquires a captured image.
 Steps ST201 and ST202 may be the same as steps ST101 and ST102 described with reference to FIG. 3, so a redundant description is omitted here. In step ST202, the moving object detection unit 102 may output the captured image and information about the detected moving object MB to the moving object area calculation unit 110.
 In step ST203, the moving object area calculation unit 110 calculates the area of the moving object MB.
 The moving object area calculation unit 110 may calculate the area of the moving object MB or the proportion of the captured image occupied by the moving object MB.
 In step ST204, the moving object area calculation unit 110 determines whether the calculated area of the moving object MB is equal to or larger than a threshold value. If it determines that the calculated area of the moving object MB is not equal to or larger than the predetermined threshold, that is, if the moving object MB does not occupy most of the captured image, the process proceeds to step ST205. On the other hand, if it determines that the calculated area is equal to or larger than the predetermined threshold, that is, if the moving object MB occupies most of the captured image, the process proceeds to step ST207. When the proportion has been calculated, the moving object area calculation unit 110 may instead determine whether the proportion of the moving object MB is equal to or larger than a threshold value. Steps ST205 and ST206 may be the same as steps ST103 and ST104 described with reference to FIG. 3, so a redundant description is omitted here; however, the line-of-sight estimation unit 105 may store information about the estimated line of sight in the line-of-sight storage unit 303 together with time information.
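 A minimal sketch of the branch in steps ST203 and ST204, assuming the binary moving-object mask from step ST202 is available; the threshold value used here is an arbitrary assumption for illustration.

```python
import numpy as np

def moving_object_ratio(moving_mask: np.ndarray) -> float:
    """Fraction of the captured image occupied by the moving object MB (mask value 255)."""
    return float(np.count_nonzero(moving_mask)) / moving_mask.size

def choose_branch(moving_mask: np.ndarray, threshold: float = 0.5) -> str:
    """Return which step the flow proceeds to, mirroring step ST204."""
    if moving_object_ratio(moving_mask) >= threshold:
        return "ST207"  # moving object dominates: estimate speed from road-surface blur
    return "ST205"      # environment dominates: inpaint and match feature points as in FIG. 3

mask = np.zeros((480, 640), dtype=np.uint8)
mask[200:480, :] = 255          # lower part of the frame covered by handlebars and arms
print(choose_branch(mask))      # ST207, since the covered fraction exceeds the threshold
```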
 For example, as shown in FIG. 9, when the user is looking down, the moving object occupies most of the captured image. In such a case, even if the environment image restoration unit 103 attempts to complement the portion detected as the moving object as in the first embodiment, the accuracy deteriorates and the restoration cannot be performed correctly. As a result, the line-of-sight estimation unit 105 fails to estimate the line of sight. Therefore, as described below, the process estimates the user's line of sight using a method that can do so even when the user is looking down.
 In step ST207, the road surface image analysis unit 111 detects portions where unevenness in the environment shown in the captured image has been deformed into a vertically elongated shape because it moved within the interval of the shutter speed of the camera 4. The road surface image analysis unit 111 may use a general method to detect unevenness appearing in the environment of the captured image (for example, pebbles) and to detect the deformation of the pebbles constituting that unevenness. For example, an uneven feature (a pebble) is detected as its circumscribing rectangle, and the road surface image analysis unit 111 estimates the degree of deformation from the ratio of the rectangle's height to its width. The road surface image analysis unit 111 then estimates the moving speed of the moving object MB from the degree of deformation, for example based on the intensity of the blur, the mounting angle of the camera 4, and the like. A general technique may be used to estimate the speed of the moving object MB; for example, a speed estimation method such as that disclosed in Non-Patent Document 4 may be used, so a detailed description is omitted here.
 FIG. 12 is a diagram showing an example of detecting unevenness on the road surface in a captured image.
 In FIG. 12, the unevenness on the road surface is indicated by the reference symbol CC. As shown in FIG. 12, the road surface image analysis unit 111 detects unevenness such as pebbles on the road surface from the captured image.
 FIG. 13 is a diagram showing an example in which the unevenness on the road surface in the captured image is blurred.
 As shown in FIG. 13, the faster the moving object MB travels, the more the detected pebbles become blurred. The road surface image analysis unit 111 may estimate the speed of the moving object MB using this degree of blur.
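 One rough way to turn the vertical elongation of the pebbles CC into a speed value is sketched below; it is not the method of Non-Patent Document 4, and the exposure time, the ground sampling distance, and the stationary aspect ratio are assumed calibration values introduced only for illustration.

```python
import cv2
import numpy as np

def estimate_speed_from_blur(pebble_contour, shutter_time_s,
                             meters_per_pixel, nominal_aspect=1.0):
    """Estimate the moving-object speed from the motion blur of one road-surface pebble.

    pebble_contour:   contour of the detected pebble (e.g., from cv2.findContours).
    shutter_time_s:   exposure time of camera 4 during which the blur accumulated.
    meters_per_pixel: ground sampling distance on the road surface (assumed calibrated).
    nominal_aspect:   height/width ratio the pebble would show when stationary.
    """
    x, y, w, h = cv2.boundingRect(pebble_contour)          # circumscribing rectangle
    elongation = max(h / float(w) - nominal_aspect, 0.0)   # vertical stretch caused by motion
    blur_length_px = elongation * w                        # extra pixels swept during the exposure
    return blur_length_px * meters_per_pixel / shutter_time_s   # metres per second

# Toy pebble: about 11 px wide and 31 px tall, i.e. stretched to roughly 3x its nominal height.
contour = np.array([[[100, 200]], [[110, 200]], [[110, 230]], [[100, 230]]], dtype=np.int32)
print(estimate_speed_from_blur(contour, shutter_time_s=1 / 250, meters_per_pixel=0.002))  # 10.0 m/s
```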
 In step ST208, the line-of-sight estimation unit 105 estimates the line of sight. The line-of-sight estimation unit 105 retrieves from the line-of-sight storage unit 303 the line-of-sight estimate stored most recently before the current time, and estimates the current line-of-sight position based on that estimate and the moving speed.
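 The update of step ST208 can be sketched as simple dead reckoning from the last stored line-of-sight estimate; the flat-road, unchanged-heading assumption and the data layout below are simplifications introduced here for illustration.

```python
import numpy as np

def estimate_current_gaze(last_gaze_position, last_gaze_direction,
                          last_time_s, current_time_s, speed_m_per_s):
    """Advance the previously stored line-of-sight position using the estimated speed.

    last_gaze_position:  3D position stored in the line-of-sight storage unit 303.
    last_gaze_direction: unit vector of travel at that time (assumed unchanged since then).
    """
    dt = current_time_s - last_time_s
    travelled = speed_m_per_s * dt
    direction = np.asarray(last_gaze_direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    return np.asarray(last_gaze_position, dtype=float) + travelled * direction

# Example: 0.5 s since the last estimate, travelling at 10 m/s along the road (z axis).
print(estimate_current_gaze([0.0, 1.2, 0.0], [0.0, 0.0, 1.0], 10.0, 10.5, 10.0))
# [0.  1.2 5. ]
```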
 Steps ST209 to ST211 may be the same as steps ST105 to ST107 described with reference to FIG. 3, so a redundant description is omitted here.
 For example, the AR content space is linked to a coordinate system such as real-world longitude and latitude. Therefore, even when the line-of-sight estimation method is one that uses the distortion of unevenness in the captured image, the travel distance calculated from speed and time and the distance between two points in the AR content space in the longitude-latitude coordinate system can be linked and mapped to each other. Through the processing of steps ST209 to ST211, the video processing device 1 can therefore correctly recognize the AR content space and accurately present the AR content ARC to the user.
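 To illustrate the linkage between the travel distance and the longitude/latitude coordinate system of the AR content space, a small local-tangent-plane conversion could look as follows; the equirectangular approximation, the Earth radius constant, and the heading convention are simplifying assumptions, not part of the disclosure.

```python
import math

EARTH_RADIUS_M = 6_378_137.0   # WGS-84 equatorial radius, used as an approximation

def advance_lat_lon(lat_deg, lon_deg, distance_m, heading_deg):
    """Move a (latitude, longitude) point by distance_m along heading_deg (0 = north, 90 = east).

    Valid for the short distances travelled between frames; not suitable near the poles.
    """
    d_north = distance_m * math.cos(math.radians(heading_deg))
    d_east = distance_m * math.sin(math.radians(heading_deg))
    new_lat = lat_deg + math.degrees(d_north / EARTH_RADIUS_M)
    new_lon = lon_deg + math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return new_lat, new_lon

# Example: 5 m travelled due east from a point in Tokyo.
print(advance_lat_lon(35.6812, 139.7671, 5.0, 90.0))
```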
 (Operation and Effects of the Second Embodiment)
 According to the second embodiment, the video processing device 1 can accurately present the AR content ARC to the user even when the portion of the image captured by the camera 4 showing the moving object MB is larger than the portion showing the environment.
 [Other Embodiments]
 In the above embodiments, an example using a captured image taken by the camera 4 included in the video processing device 1 has been described, but the captured image is not limited to one taken by a camera 4 included in the video processing device 1. For example, the camera may be an independent camera 4 connected to the video processing device 1, provided that the camera 4 is installed at a location (for example, above the user's head) from which a captured image allowing estimation of the user's line of sight can be taken.
 The methods described in the above embodiments can also be stored, as programs (software means) that can be executed by a computer, in storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROM, DVD, MO, etc.), and semiconductor memories (ROM, RAM, flash memory, etc.), and can be transmitted and distributed via communication media. The programs stored on the medium side also include a setting program for configuring, within the computer, the software means (including not only execution programs but also tables and data structures) to be executed by the computer. A computer that realizes this device reads the program stored in the storage medium, constructs the software means by the setting program as necessary, and executes the above-described processing with its operation controlled by the software means. The storage medium referred to in this specification is not limited to one for distribution, and includes storage media such as magnetic disks and semiconductor memories provided inside the computer or in devices connected via a network.
 In short, the present invention is not limited to the above embodiments and can be modified in various ways at the implementation stage without departing from the gist thereof. The embodiments may also be combined as appropriate wherever possible, in which case the combined effects are obtained. Furthermore, the above embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the plurality of disclosed constituent elements.
 1 ... Video processing device
 2 ... Input device
 3 ... Output device
 4 ... Camera
 5 ... Inertial sensor
 10 ... Control unit
 101 ... Image acquisition unit
 102 ... Moving object detection unit
 103 ... Environment image restoration unit
 104 ... Feature point extraction unit
 105 ... Line-of-sight estimation unit
 106 ... Line-of-sight movement estimation unit
 107 ... Sensor data control unit
 108 ... AR content drawing unit
 109 ... Output control unit
 110 ... Moving object area calculation unit
 111 ... Road surface image analysis unit
 20 ... Program storage unit
 30 ... Data storage unit
 301 ... Image storage unit
 302 ... Space storage unit
 303 ... Line-of-sight storage unit
 40 ... Communication interface
 50 ... Input/output interface
 MB ... Moving object
 Bu ... Building

Claims (8)

  1. A video processing device worn by a user, comprising:
     an image acquisition unit that acquires a captured image, taken by a camera, including a moving object on which the user is riding and an environment;
     a moving object detection unit that detects the moving object from the captured image;
     an environment image restoration unit that, when the area of the detected moving object is smaller than a predetermined threshold, complements an image of the environment in the portion of the detected moving object;
     a line-of-sight estimation unit that estimates the user's line of sight based on the captured image;
     a line-of-sight movement detection unit that acquires sensor data from a sensor included in the video processing device and detects, based on the sensor data, a movement of the user's line of sight starting from the estimated line of sight;
     a drawing unit that calculates how AR content looks based on the estimated movement of the line of sight; and
     an output control unit that performs control so as to display the calculated AR content.
  2. The video processing device according to claim 1, further comprising:
     a moving object area calculation unit that calculates the area of the detected moving object and determines whether the area exceeds a predetermined threshold; and
     a video analysis unit that, when the area exceeds the predetermined threshold, detects unevenness in the environment of the captured image, detects deformation of the unevenness, and estimates the speed of the moving object based on the degree of the deformation.
  3. The video processing device according to claim 2, wherein the line-of-sight estimation unit estimates the user's current line of sight based on the speed of the moving object and the previous line-of-sight estimate.
  4. The video processing device according to claim 1, further comprising:
     a feature point extraction unit that extracts feature points in the environment within the captured image; and
     a storage unit that stores a feature point space,
     wherein the line-of-sight estimation unit estimates the line of sight based on the extracted feature points and the feature point space.
  5. The video processing device according to claim 4, wherein the storage unit further stores an AR content space corresponding to the feature point space, and the drawing unit sets the destination of the line-of-sight movement in the AR content space and calculates how the AR content looks in the set space.
  6. The video processing device according to claim 4, wherein the feature point extraction unit extracts, as feature points, features located at edges or corners of objects in the captured image.
  7. A video processing method executed by a processor of a video processing device worn by a user, the method comprising:
     acquiring a captured image, taken by a camera, including a moving object on which the user is riding and an environment;
     detecting the moving object from the captured image;
     when the area of the detected moving object is smaller than a predetermined threshold, complementing an image of the environment in the portion of the detected moving object;
     estimating the user's line of sight based on the captured image;
     acquiring sensor data from a sensor included in the video processing device;
     detecting, based on the sensor data, a movement of the user's line of sight starting from the estimated line of sight;
     calculating how AR content looks based on the estimated movement of the line of sight; and
     performing control so as to display the calculated AR content.
  8. A video processing program comprising instructions to be executed by a processor of a video processing device worn by a user, the instructions comprising:
     acquiring a captured image, taken by a camera, including a moving object on which the user is riding and an environment;
     detecting the moving object from the captured image;
     when the area of the detected moving object is smaller than a predetermined threshold, complementing an image of the environment in the portion of the detected moving object;
     estimating the user's line of sight based on the captured image;
     acquiring sensor data from a sensor included in the video processing device;
     detecting, based on the sensor data, a movement of the user's line of sight starting from the estimated line of sight;
     calculating how AR content looks based on the estimated movement of the line of sight; and
     performing control so as to display the calculated AR content.
PCT/JP2022/031904 2022-08-24 2022-08-24 Video processing device, video processing method, and video processing program WO2024042644A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/031904 WO2024042644A1 (en) 2022-08-24 2022-08-24 Video processing device, video processing method, and video processing program

Publications (1)

Publication Number Publication Date
WO2024042644A1 true WO2024042644A1 (en) 2024-02-29

Family

ID=90012833

Country Status (1)

Country Link
WO (1) WO2024042644A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018207426A1 (en) * 2017-05-09 2018-11-15 ソニー株式会社 Information processing device, information processing method, and program
WO2019087658A1 (en) * 2017-11-01 2019-05-09 ソニー株式会社 Information processing device, information processing method, and program
JP2019128824A (en) * 2018-01-25 2019-08-01 キヤノン株式会社 Image processing apparatus and image processing method
JP2022072598A (en) * 2020-10-30 2022-05-17 株式会社小松製作所 Work vehicle display system and work vehicle display method


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22956472

Country of ref document: EP

Kind code of ref document: A1