WO2024042645A1 - Video processing device, video processing method, and video processing program - Google Patents

Video processing device, video processing method, and video processing program

Info

Publication number
WO2024042645A1
WO2024042645A1 (PCT/JP2022/031905)
Authority
WO
WIPO (PCT)
Prior art keywords
sight
line
user
video processing
moving object
Prior art date
Application number
PCT/JP2022/031905
Other languages
French (fr)
Japanese (ja)
Inventor
Makoto MUTO
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to PCT/JP2022/031905
Publication of WO2024042645A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Definitions

  • the present invention relates to a video processing device, a video processing method, and a video processing program.
  • a user using an augmented reality (AR) system can view the real space of the real world through a mobile terminal or an AR device.
  • content such as navigation information or 3D data (hereinafter referred to as AR content) is presented as additional information in the real space. That is, a user using an AR system can see AR content superimposed on the real world and use information about this content.
  • For example, when a user of an AR system is moving on a moving object, the image displayed by the AR device includes a portion of the camera image showing the environment (the scenery in front of the bicycle) and a portion showing the moving object (a part of the bicycle body).
  • In a situation where a moving object similar to the one on which the user is riding runs alongside it, a moving object on which the user is not riding is recognized in addition to the moving object on which the user is riding.
  • As a result, the AR device displays the AR content at the correct position, but also displays AR content at locations corresponding to moving objects on which the user is not riding.
  • This invention was made in view of the above circumstances, and its purpose is to provide a technology that can prevent AR content from being displayed at locations corresponding to moving objects on which the user is not riding, in cases where such moving objects are recognized in addition to the moving object on which the user is riding.
  • One aspect of the present invention is a video processing device worn by a user, comprising: an image acquisition unit that acquires a captured image, taken by a camera, containing a moving body and an environment; a moving body detection unit that detects the moving body from the captured image; a line-of-sight estimation unit that estimates the user's line of sight based on the captured image; a line-of-sight movement detection unit that acquires sensor data from a sensor provided in the video processing device and, based on the sensor data, detects a movement of the user's line of sight starting from the estimated line of sight; a determination unit that determines, based on the detected movement of the user's line of sight, whether the moving body is a moving body that the user is riding; a drawing unit that calculates how the AR content looks based on the estimated movement of the line of sight; and an output control unit that controls the calculated AR content to be displayed.
  • According to one aspect of the present invention, even if a moving object with a similar appearance appears in a captured image, AR content can be displayed only at the location corresponding to the moving object on which the user is riding, which makes it possible to accurately present AR content to the user.
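  • For illustration only, the sketch below shows one way the units enumerated in this aspect could be composed into a per-frame pipeline. It is not part of the disclosure: all class names, method names, and the Gaze structure are hypothetical, and the concrete detection, estimation, and rendering techniques are deliberately left abstract.

        # Minimal Python sketch of how the claimed units might be wired together per frame.
        # Every identifier here is hypothetical; the patent does not define an API.
        from dataclasses import dataclass
        import numpy as np

        @dataclass
        class Gaze:
            origin: np.ndarray     # 3D starting point of the line of sight
            direction: np.ndarray  # unit vector of the line-of-sight direction

        class VideoProcessingPipeline:
            def __init__(self, acquirer, detector, gaze_estimator,
                         gaze_mover, determiner, renderer, output):
                self.acquirer = acquirer          # image acquisition unit
                self.detector = detector          # moving body detection unit
                self.gaze_estimator = gaze_estimator
                self.gaze_mover = gaze_mover      # line-of-sight movement detection unit
                self.determiner = determiner      # determination unit
                self.renderer = renderer          # drawing unit
                self.output = output              # output control unit

            def process_frame(self, sensor_data) -> None:
                image = self.acquirer.acquire()                    # captured image (moving body + environment)
                moving_body = self.detector.detect(image)          # region of the moving body
                gaze: Gaze = self.gaze_estimator.estimate(image, moving_body)
                moved_gaze: Gaze = self.gaze_mover.detect(gaze, sensor_data)
                if self.determiner.is_users_moving_body(moved_gaze):
                    appearance = self.renderer.compute_appearance(moved_gaze)
                    self.output.display(appearance)
                # otherwise nothing is drawn, so AR content never appears
                # at a location corresponding to a moving body the user is not riding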
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device according to an embodiment.
  • FIG. 2 is a block diagram showing the software configuration of the video processing apparatus in the embodiment in relation to the hardware configuration shown in FIG. 1.
  • FIG. 3 is a flowchart illustrating an example of an operation by which the video processing device displays AR content only at the correct position of a captured image.
  • FIG. 4 is a diagram showing an example of a photographed image.
  • FIG. 5 is a diagram illustrating an example when a moving object is detected in a photographed image.
  • FIG. 6 is a diagram showing an example of a "feature point space" and an "AR content space" stored in the space storage unit.
  • FIG. 7 is a diagram illustrating an example of "the range of the starting point of the line of sight with the moving object on which the user is riding as an object" and "the range of the direction of the line of sight with the moving object of the user as the object".
  • FIG. 8 is a diagram showing an example of how the calculated AR content looks.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device 1 according to an embodiment.
  • the video processing device 1 is a computer that analyzes input data, generates and outputs output data.
  • the video processing device 1 may be, for example, an AR device including AR glasses, smart glasses, or other wearable devices. That is, the video processing device 1 may be a device worn and used by a user.
  • the video processing device 1 includes a control section 10, a program storage section 20, a data storage section 30, a communication interface 40, and an input/output interface 50.
  • the control unit 10, program storage unit 20, data storage unit 30, communication interface 40, and input/output interface 50 are communicably connected to each other via a bus.
  • the communication interface 40 may be communicably connected to an external device via a network.
  • the input/output interface 50 is communicably connected to the input device 2, the output device 3, the camera 4, and the inertial sensor 5.
  • the control unit 10 controls the video processing device 1.
  • the control unit 10 includes a hardware processor such as a central processing unit (CPU).
  • the control unit 10 may be an integrated circuit capable of executing various programs.
  • The program storage unit 20 uses, as storage media, a combination of non-volatile memories that can be written to and read from at any time, such as an EPROM (Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), or an SSD (Solid State Drive), and a non-volatile memory such as a ROM (Read Only Memory).
  • the program storage unit 20 stores programs necessary to execute various processes. That is, the control unit 10 can implement various controls and operations by reading and executing programs stored in the program storage unit 20.
  • The data storage unit 30 is a storage that uses, as storage media, a combination of a non-volatile memory that can be written to and read from at any time, such as an HDD or a memory card, and a volatile memory such as a RAM (Random Access Memory).
  • the data storage unit 30 is used to store data acquired and generated while the control unit 10 executes programs and performs various processes.
  • the communication interface 40 includes one or more wired or wireless communication modules.
  • the communication interface 40 includes a communication module that makes a wired or wireless connection to an external device via a network.
  • Communication interface 40 may include a wireless communication module that wirelessly connects to external devices such as Wi-Fi access points and base stations.
  • The communication interface 40 may include a wireless communication module for wirelessly connecting to an external device using short-range wireless technology. That is, the communication interface 40 may be any general communication interface as long as it is capable of communicating with an external device under the control of the control unit 10 and transmitting and receiving various information including past performance data.
  • the input/output interface 50 is connected to the input device 2, output device 3, camera 4, inertial sensor 5, etc.
  • the input/output interface 50 is an interface that allows information to be transmitted and received between the input device 2, the output device 3, and the plurality of cameras 4 and inertial sensors 5.
  • the input/output interface 50 may be integrated with the communication interface 40.
  • For example, the video processing device 1 and at least one of the input device 2, the output device 3, the camera 4, and the inertial sensor 5 may be wirelessly connected using short-range wireless technology or the like, and information may be sent and received using that short-range wireless technology.
  • the input device 2 may include, for example, a keyboard, a pointing device, etc. for the user to input various information including past performance data to the video processing device 1.
  • The input device 2 may also include a reader for reading data to be stored in the program storage section 20 or the data storage section 30 from a memory medium such as a USB memory, and a disk device for reading such data from a disk medium.
  • the output device 3 includes a display that displays images captured by the camera 4, AR content, and the like.
  • the output device 3 may be integrated with the video processing device 1.
  • For example, if the video processing device 1 is AR glasses or smart glasses, the output device 3 is the glass portion of the glasses.
  • the camera 4 is capable of photographing environments such as landscapes, and may be a general camera 4 that can be attached to the video processing device 1.
  • the environment generally refers to the scenery that is photographed.
  • the camera 4 may be integrated with the video processing device 1.
  • the camera 4 may output the captured image to the control unit 10 of the video processing device 1 through the input/output interface 50.
  • the inertial sensor 5 includes, for example, an acceleration sensor, an angular velocity sensor, a geomagnetic sensor, and the like.
  • the inertial sensor 5 senses the moving speed and head movement of the user wearing the AR device, and outputs sensor data according to the sensing to the control unit 10.
  • FIG. 2 is a block diagram showing the software configuration of the video processing device 1 in the embodiment in relation to the hardware configuration shown in FIG. 1.
  • The control unit 10 includes an image acquisition unit 101, a moving object detection unit 102, a feature point extraction unit 103, a line-of-sight estimation unit 104, a line-of-sight movement estimation unit 105, a sensor data control unit 106, a target object determination unit 107, an AR content drawing unit 108, and an output control unit 109.
  • the image acquisition unit 101 acquires a photographed image taken by the camera 4. Note that the image acquisition unit 101 may store the captured image in the image storage unit 301.
  • the moving object detection unit 102 detects a moving object from the captured image.
  • the moving object detection unit 102 detects the moving object MB appearing in the photographed image.
  • the moving object may be any arbitrary object such as a motorized bicycle, an electric bicycle, a motorcycle, or a vehicle.
  • the moving object may include only the moving object, or may include a part of the body of the user riding the moving object, such as an arm.
  • a general technique may be used as the detection method.
  • the moving object detection unit 102 may delete from the photographed image a portion of the environment that is not the detected moving object.
  • the feature point extraction unit 103 extracts feature points from the moving object in the captured image.
  • the feature point extracting unit 103 may extract as feature points those located near the feature point space stored in the space storage unit 302, which will be described later.
  • the line of sight estimation unit 104 estimates the position of the line of sight by comparing the feature points extracted by the feature point extraction unit 103 with the feature point space stored in the spatial storage unit 302. Note that details of the method for estimating the position of the line of sight will be described later.
  • the line-of-sight movement estimation unit 105 estimates the line-of-sight movement.
  • The line-of-sight movement estimation unit 105 estimates a line-of-sight movement that follows the movement of the user's head by moving the line-of-sight position received from the line-of-sight estimation unit 104, as a starting point, by the three-dimensional movement measured by the sensor data control unit 106 (described later).
  • That is, the line-of-sight movement estimating unit 105 estimates the movement of the user's line of sight, starting from the line-of-sight position estimated by the line-of-sight estimation unit 104, based on the sensor data.
  • the sensor data control unit 106 acquires sensor data from the inertial sensor 5. Then, the sensor data control unit 106 measures the user's head movement, body movement, etc. from the acquired sensor data. For example, the sensor data control unit 106 measures the user's three-dimensional movement (for example, the user's head movement) based on the sensor data.
  • The target object determination unit 107 determines whether the target object is a moving object on which the user is riding. For example, the target object determination unit 107 determines, based on the detected movement of the user's line of sight, whether the moving object is the moving object on which the user is riding. Specifically, the target object determination unit 107 determines that the moving object is the moving object on which the user is riding if the starting point of the detected user's line of sight is within a predetermined range and the direction of the detected user's line of sight falls within the range in which the moving object on which the user rides is visible from that predetermined range.
  • The predetermined range may be a cuboid range that is higher than the seating position of the moving object by the average sitting height of a person and that takes into consideration the posture in which the user rides.
  • the AR content drawing unit 108 calculates how the AR content looks.
  • The AR content drawing unit 108 sets the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the movement destination line of sight, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content will appear in the set space.
  • the output control unit 109 outputs AR content information.
  • the output control unit 109 controls the output device 3 to draw AR content.
  • the output control unit 109 controls the adjusted AR content to be displayed on AR glasses or the like.
  • the data storage unit 30 includes an image storage unit 301 and a spatial storage unit 302.
  • the image storage unit 301 may store captured images acquired by the image acquisition unit 101.
  • the captured image stored in the image storage unit 301 may have information about the longitude and latitude in the real world where the captured image was captured, which is acquired by the video processing device 1. Further, the image storage unit 301 may automatically delete the captured image after a predetermined period of time has passed.
  • the space storage unit 302 stores a feature point space and an AR content space corresponding to the feature point space.
  • the feature point space may be located at a preset position in the photographed image, and may be set at each part of the moving body, for example. Then, an AR content space corresponding to the feature point space may be set.
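  • As a rough illustration, the space storage unit 302 can be thought of as holding two aligned data sets: 3D feature points with descriptors, and AR content anchored in the same coordinate frame. The sketch below is an assumption about one possible layout; the patent does not specify data structures or field names.

        # Hypothetical layout for the "feature point space" and the corresponding
        # "AR content space" kept in the space storage unit 302.
        from dataclasses import dataclass, field
        from typing import List
        import numpy as np

        @dataclass
        class FeaturePoint:
            position: np.ndarray    # 3D position, e.g. a point on the bicycle handlebar
            descriptor: np.ndarray  # appearance descriptor used when matching captured images

        @dataclass
        class ARContentItem:
            anchor: np.ndarray      # 3D anchor pose in the same frame as the feature points
            model: str              # e.g. "speedometer"

        @dataclass
        class SpaceStorage:
            feature_point_space: List[FeaturePoint] = field(default_factory=list)
            ar_content_space: List[ARContentItem] = field(default_factory=list)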
  • Furthermore, the video processing device 1 estimates the user's line of sight (position and direction) by comparing the extracted feature points with data in which feature points have been extracted from a captured image taken in advance (for example, the captured image of the previous frame or of a frame several frames earlier) (hereinafter referred to as the feature point space).
  • The feature point space is a space constructed for a "surrounding space" set based on a predetermined usage scene. The captured image is therefore also an image of that surrounding space, and it is assumed that the positional relationships of both the feature point space and the captured image are identified based on the "surrounding space".
  • the video processing device 1 tracks the movement of the user's head based on the inertial data received from the inertial sensor 5, and tracks the user's line of sight described above in more real time.
  • the video processing device 1 positions the user's line of sight, which is being followed in real time, on an AR content space created in advance, and calculates how the AR content looks from there.
  • the video processing device 1 causes the output device 3 to draw the calculated appearance.
  • the AR system generates AR content and causes the video processing device 1, such as a smartphone or AR glasses, to display the AR content.
  • However, with this method, if multiple moving objects appear in the image captured by the camera 4, the video processing device 1 displays the AR content at the correct position but may also display AR content at locations corresponding to moving objects on which the user is not riding.
  • the operation of the video processing device 1 for displaying AR content at the correct position even in a captured image in which multiple moving objects are captured will be described below.
  • FIG. 3 is a flowchart illustrating an example of an operation by which the video processing device 1 displays AR content only at the correct position of a captured image.
  • the operation of this flowchart is realized by the control unit 10 of the video processing device 1 reading and executing the program stored in the program storage unit 20.
  • This operation flow is started, for example, when the control unit 10 outputs an instruction to display AR content because the user has input an instruction requesting AR content display or because a predetermined condition has been satisfied.
  • this operation flow may be started when the video processing device 1 is activated and the camera 4 acquires a photographed image. Further, in this operation, it is assumed that the moving object is a bicycle.
  • step ST101 the image acquisition unit 101 acquires a photographed image taken by the camera 4.
  • the image acquisition unit 101 may store the captured image in the image storage unit 301.
  • the photographed image includes the environment and a moving object.
  • the environment may be a general landscape, as described above. Therefore, the environment refers to the part excluding the moving object.
  • FIG. 4 is a diagram showing an example of a photographed image.
  • In the example of FIG. 4, the image is taken while the user is riding and driving a bicycle, which is a moving object. The photographed image therefore includes the moving object and the environment.
  • the camera 4 is a camera 4 included in the AR glasses that are the video processing device 1, and the photographed image is taken by this camera 4.
  • In step ST102, the moving object detection unit 102 detects the moving object MB from the captured image.
  • the moving body detection unit 102 may perform object detection using a general method to detect the moving body MB.
  • As the object detection method, a method such as the one disclosed in Non-Patent Document 2 may be used; a detailed explanation of the object detection method is therefore omitted here.
  • The moving object detection unit 102 leaves the portions detected as the moving body MB and deletes the other portions. That is, the moving object detection unit 102 deletes the environment portion of the captured image.
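  • A minimal sketch of this step, assuming a generic instance-segmentation detector that returns a boolean mask of the moving body (the patent only says that a general object detection method, such as the one in Non-Patent Document 2, may be used): everything outside the detected moving body MB is blanked out.

        import numpy as np

        def remove_environment(image: np.ndarray, detector) -> np.ndarray:
            """Keep only the pixels detected as the moving body MB; delete the rest.

            `detector` stands for any generic detection/segmentation model that
            returns a boolean mask of the moving body (including the rider's arms);
            its interface is assumed for illustration.
            """
            mask = detector.segment_moving_body(image)  # True where the moving body is
            masked = image.copy()
            masked[~mask] = 0                           # delete the environment portion
            return masked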
  • FIG. 5 is a diagram illustrating an example when a moving body MB is detected in a photographed image.
  • a bicycle and a user are detected as mobile objects MB. That is, the portion detected as the mobile body MB includes the user's arm in addition to the bicycle handle and wheels. Then, as shown in FIG. 5, the environment portion is deleted.
  • In step ST103, the line-of-sight estimation unit 104 estimates the user's line of sight.
  • a “feature point space” in which each part of a bicycle, which is a mobile object MB, is a feature point is stored in the space storage unit 302.
  • an "AR content space” in which a speedometer is arranged as an AR content at the center of the handle part in the feature point space is also stored in the space storage unit 302.
  • FIG. 6 is a diagram showing an example of the "feature point space” and the "AR content space” stored in the space storage unit 302.
  • the "feature point space" is indicated by the reference symbol CP, and the AR content is indicated by the reference symbol ARC.
  • FIG. 6 is an example, and it goes without saying that the space storage unit 302 may store a plurality of such spaces.
  • the feature point extraction unit 103 extracts feature points from the captured image from which the environment portion has been deleted in step ST102. That is, the feature point extraction unit 103 extracts feature points within the mobile body MB.
  • the feature point extraction method may be a general method.
  • For example, the feature point extraction unit 103 looks over the entire captured image and extracts specific features such as boundaries (edges) of objects or corners of objects.
  • The specific feature may be, for example, a boundary between objects. In the example of FIG. 6, the boundary of the mobile body MB can be extracted as a feature point.
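  • As one concrete example of such a general feature extractor, ORB corner detection from OpenCV could be run on the masked image; this particular choice is an assumption, not something prescribed by the patent.

        import cv2

        def extract_feature_points(masked_image):
            """Extract corner-like feature points from the moving-body region.

            ORB is used only as one example of a general extractor; the patent
            does not prescribe a specific feature type.
            """
            gray = cv2.cvtColor(masked_image, cv2.COLOR_BGR2GRAY)
            orb = cv2.ORB_create(nfeatures=500)
            keypoints, descriptors = orb.detectAndCompute(gray, None)
            return keypoints, descriptors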
  • the line of sight estimating unit 104 estimates the position of the line of sight by comparing the feature points extracted by the feature point extracting unit 103 with the feature point space stored in the spatial storage unit 302.
  • the line of sight estimation unit 104 may estimate the line of sight using vision-based AR technology or the like.
  • The vision-based AR technology may be a general technology, for example a markerless AR technology such as PTAM, SmartAR, or Microsoft HoloLens; a detailed explanation of the AR technology is therefore omitted here.
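  • One possible realization of this vision-based estimation is to match the extracted 2D features against the stored 3D feature point space and solve a perspective-n-point problem for the camera (gaze) pose, as sketched below. The matcher, the pose solver, and the assumed feature_point_space fields (descriptors, positions) are illustrative assumptions, not the patent's prescription.

        import cv2
        import numpy as np

        def estimate_gaze(keypoints, descriptors, feature_point_space, camera_matrix):
            """Estimate the gaze (camera pose) by matching 2D features against the
            stored 3D feature point space and solving a PnP problem.

            `feature_point_space.descriptors` and `.positions` are assumed fields;
            the matcher and solver can be swapped for other general methods.
            """
            matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
            matches = matcher.match(descriptors, feature_point_space.descriptors)
            pts_2d = np.float32([keypoints[m.queryIdx].pt for m in matches])
            pts_3d = np.float32([feature_point_space.positions[m.trainIdx] for m in matches])
            ok, rvec, tvec, _ = cv2.solvePnPRansac(pts_3d, pts_2d, camera_matrix, None)
            if not ok:
                return None
            rot, _ = cv2.Rodrigues(rvec)
            origin = (-rot.T @ tvec).ravel()               # gaze starting point in the feature-point frame
            direction = rot.T @ np.array([0.0, 0.0, 1.0])  # camera optical axis taken as the gaze direction
            return origin, direction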
  • the line-of-sight estimating unit 104 outputs the estimated position of the line-of-sight to the line-of-sight movement estimation unit 105.
  • the line-of-sight movement estimation unit 105 estimates the line-of-sight movement.
  • the sensor data control unit 106 acquires sensor data from the inertial sensor 5. Then, the sensor data control unit 106 measures the user's head movement, body movement, etc. from the acquired sensor data.
  • the inertial sensor 5 is an inertial measurement unit (IMU), and the sensor data control unit 106 acquires sensor data such as acceleration, angular velocity, and geomagnetism from the inertial sensor 5.
  • the sensor data control unit 106 may then measure the user's three-dimensional movement (for example, the user's head movement) based on these data. Then, the sensor data control unit 106 outputs the measurement result to the line of sight movement estimation unit 105.
  • The line-of-sight movement estimating unit 105 estimates a line-of-sight movement that follows the movement of the user's head by moving the line-of-sight position received from the line-of-sight estimation unit 104, as a starting point, by the three-dimensional movement measured by the sensor data control unit 106.
  • That is, the line-of-sight movement estimating unit 105 estimates the movement of the user's line of sight, starting from the line-of-sight position estimated by the line-of-sight estimation unit 104, based on the sensor data. The line-of-sight movement estimation unit 105 then outputs line-of-sight movement information including the estimated line-of-sight movement to the target object determination unit 107.
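  • A minimal sketch of this update, assuming the sensor data control unit 106 has already turned the raw acceleration/angular-velocity data into an incremental head rotation and translation since the last vision-based estimate (how that measurement is obtained is outside this sketch).

        import numpy as np

        def update_gaze(origin, direction, head_rotation, head_translation):
            """Move the previously estimated line of sight by the measured head motion.

            `head_rotation` (3x3) and `head_translation` (3,) represent the motion
            measured by the sensor data control unit 106 since the last vision-based
            estimate; their derivation from the IMU data is not shown here.
            """
            new_origin = origin + head_translation       # starting point follows the head position
            new_direction = head_rotation @ direction    # direction follows the head orientation
            new_direction = new_direction / np.linalg.norm(new_direction)
            return new_origin, new_direction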
  • The target object determination unit 107 determines whether the target object is the mobile body MB on which the user is riding. For example, the target object determination unit 107 determines, based on the detected movement of the user's line of sight, whether the mobile body MB is a moving body on which the user is riding. Specifically, the determination is as follows.
  • The target object determination unit 107 defines in advance the range where the head of the user riding the mobile body MB is located as "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object". Specifically, this range is defined as a cuboid range that is higher than the seating position of the mobile body MB (for example, the saddle of a bicycle) by the average sitting height of a typical person and that takes into account the posture of the user while riding.
  • In addition, the range of line-of-sight directions in which the user can see the moving body MB on which the user is riding is defined as "the range of line-of-sight directions with the user's mobile body MB as the target object". For example, this range is defined by a cone within which the mobile body MB is visible from the user's head.
  • The target object determination unit 107 determines whether the starting point of the line-of-sight movement estimated by the line-of-sight movement estimation unit 105 in step ST105 (that is, the starting point of the line of sight after the movement) is included in "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object", and whether the direction of the line of sight (that is, the direction of the line of sight after the movement) is included in "the range of line-of-sight directions with the user's mobile body MB as the target object".
  • If both are included, the target object determination unit 107 determines that the mobile body MB is the moving body on which the user is riding, outputs the line-of-sight movement information to the AR content drawing unit 108, and the process proceeds to step ST106.
  • Otherwise, the process ends. That is, AR content is not displayed at a location corresponding to a mobile body MB on which the user is determined not to be riding.
  • FIG. 7 is a diagram showing an example of "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object" and "the range of line-of-sight directions with the user's mobile body MB as the target object".
  • In FIG. 7, "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object" is indicated by the reference symbol Ori, and "the range of line-of-sight directions with the user's mobile body MB as the target object" is indicated by the reference symbol Di.
  • The line of sight in (a) of FIG. 7 is an example in which the starting point of the line of sight is included in "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object" and the direction of the line of sight is included in "the range of line-of-sight directions with the mobile body MB as the target object".
  • In the line of sight in (b1) of FIG. 7, the starting point of the line of sight is included in "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object", but the direction of the line of sight is not included in "the range of line-of-sight directions with the user's mobile body MB as the target object".
  • In the line of sight in (b2) of FIG. 7, the starting point of the line of sight is not included in "the range of the starting point of the line of sight with the mobile body MB on which the user is riding as the target object", although the direction of the line of sight is included in "the range of line-of-sight directions with the user's mobile body MB as the target object".
  • Therefore, the line of sight in (a) of FIG. 7 is determined to be the user's line of sight, while those in (b1) and (b2) are determined not to be the user's line of sight.
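  • The determination above can be sketched as a simple geometric test: the starting point of the moved line of sight must fall inside the cuboid range Ori, and its direction must fall inside the cone Di, as illustrated in FIG. 7. The cuboid and cone parameters below are illustrative placeholders, not values given in the patent.

        import numpy as np

        def is_users_moving_body(origin, direction,
                                 box_min, box_max,
                                 cone_axis, cone_half_angle):
            """Return True only when the moved line of sight matches the user's own moving body.

            box_min/box_max bound the cuboid range Ori of line-of-sight starting points
            above the seat; cone_axis and cone_half_angle describe the cone Di of directions
            in which the moving body MB is visible from the user's head. All parameters
            are illustrative.
            """
            origin_ok = bool(np.all(origin >= box_min) and np.all(origin <= box_max))

            gaze_dir = direction / np.linalg.norm(direction)
            axis = cone_axis / np.linalg.norm(cone_axis)
            angle = np.arccos(np.clip(np.dot(gaze_dir, axis), -1.0, 1.0))
            direction_ok = angle <= cone_half_angle

            return origin_ok and direction_ok  # (a) in FIG. 7 passes; (b1) and (b2) fail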
  • In step ST106, the AR content drawing unit 108 calculates how the AR content looks.
  • The AR content drawing unit 108 sets the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the movement destination line of sight, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content will appear in the set space.
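  • As a rough sketch of this calculation, the AR content anchored in the AR content space can be projected into the view defined by the moved line of sight using a standard pinhole projection; the drawing unit 108 would of course do more than this bare projection, and the camera matrix and pose conventions here are assumptions.

        import cv2
        import numpy as np

        def compute_ar_appearance(ar_points_3d, gaze_rotation, gaze_origin, camera_matrix):
            """Project AR content (e.g. the speedometer at the handlebar centre) into the
            view defined by the moved line of sight.

            `gaze_rotation` (3x3) and `gaze_origin` (3,) give the gaze pose in the AR
            content space; this bare pinhole projection only illustrates the geometry.
            """
            rvec, _ = cv2.Rodrigues(gaze_rotation.T)             # world-to-camera rotation
            tvec = (-gaze_rotation.T @ gaze_origin).reshape(3, 1)
            image_points, _ = cv2.projectPoints(
                np.float32(ar_points_3d), rvec, np.float32(tvec), camera_matrix, None)
            return image_points.reshape(-1, 2)                   # 2D positions at which to draw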
  • FIG. 8 is a diagram showing an example of how the calculated AR content looks.
  • The AR content drawing unit 108 adjusts the appearance of the AR content based on the line-of-sight movement information, and outputs AR content information for drawing the adjusted AR content to the output control unit 109.
  • In step ST107, the output control unit 109 outputs the AR content information.
  • the output control unit 109 controls the output device 3 to draw AR content.
  • the output control unit 109 controls the adjusted AR content to be displayed on AR glasses or the like.
  • As described above, the video processing device 1 can display AR content only at the location corresponding to the mobile body MB on which the user is riding, even if a mobile body MB with a similar appearance is included in the captured image. This allows the video processing device 1 to accurately present AR content to the user.
  • Note that the camera that takes the photographed image is not limited to the camera 4 included in the video processing device 1.
  • For example, it may be an independent camera connected to the video processing device 1.
  • In that case, the camera 4 is installed at a location (for example, above the user's head) from which it can capture an image that allows the user's line of sight to be estimated.
  • The method described in the above embodiments can be stored, as a program (software means) that can be executed by a computer, on a recording medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, MO, etc.), or a semiconductor memory (ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via a communication medium.
  • The programs stored on the medium side also include a setting program for constructing, in the computer, the software means (including not only execution programs but also tables and data structures) to be executed by the computer.
  • a computer that realizes this device reads a program stored in a storage medium, and if necessary, constructs software means using a setting program, and executes the above-described processing by controlling the operation of the software means.
  • the storage medium referred to in this specification is not limited to those for distribution, and includes storage media such as magnetic disks and semiconductor memories provided inside computers or devices connected via a network.
  • the present invention is not limited to the above-described embodiments, and various modifications can be made at the implementation stage without departing from the spirit of the invention. Moreover, each embodiment may be implemented in combination as appropriate as possible, and in that case, the combined effects can be obtained. Further, the embodiments described above include inventions at various stages, and various inventions can be extracted by appropriately combining the plurality of disclosed constituent elements.
  • 1 ... Video processing device, 2 ... Input device, 3 ... Output device, 4 ... Camera, 5 ... Inertial sensor, 10 ... Control unit, 101 ... Image acquisition unit, 102 ... Moving object detection unit, 103 ... Feature point extraction unit, 104 ... Line-of-sight estimation unit, 105 ... Line-of-sight movement estimation unit, 106 ... Sensor data control unit, 107 ... Object determination unit, 108 ... AR content drawing unit, 109 ... Output control unit, 20 ... Program storage unit, 30 ... Data storage unit, 301 ... Image storage unit, 302 ... Space storage unit, 40 ... Communication interface, 50 ... Input/output interface, MB ... Mobile body

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A user-worn video processing device according to one embodiment of the present invention comprises: an image acquisition unit that acquires a captured image captured by a camera, the captured image containing a moving body and an environment; a moving body detection unit that detects the moving body from the captured image; a line-of-sight estimation unit that estimates the line of sight of a user on the basis of the captured image; a line-of-sight movement detection unit that acquires sensor data from a sensor provided in the video processing device and detects a movement of the line of sight of the user with respect to the estimated line of sight as a start point on the basis of the sensor data; a determination unit that determines, on the basis of the detected movement of the line of sight of the user, whether the moving body is a moving body that the user is riding; a drawing unit that computes the appearance of an AR content on the basis of the estimated movement of the line of sight; and an output control unit that causes the computed AR content to be displayed.

Description

Video processing device, video processing method, and video processing program
The present invention relates to a video processing device, a video processing method, and a video processing program.
A user using an augmented reality (AR) system can view the real space of the real world through a mobile terminal or an AR device. At this time, content such as navigation information or 3D data (hereinafter referred to as AR content) is presented as additional information in the real space. That is, a user using an AR system sees AR content superimposed on the real world and can use the information of this content.
For example, when a user of an AR system is moving on a moving object, the image displayed by the AR device includes a portion of the camera image showing the environment (the scenery in front of the bicycle) and a portion showing the moving object (a part of the bicycle body).
In a situation where a moving object similar to the one on which the user is riding runs alongside it, a moving object on which the user is not riding is also recognized in addition to the moving object on which the user is riding. As a result, there is a problem in that the AR device displays the AR content at the correct position but also displays AR content at locations corresponding to moving objects on which the user is not riding.
This invention was made in view of the above circumstances, and its purpose is to provide a technology that can prevent AR content from being displayed at locations corresponding to moving objects on which the user is not riding, in cases where such moving objects are recognized in addition to the moving object on which the user is riding.
In order to solve the above problems, one aspect of the present invention is a video processing device worn by a user, comprising: an image acquisition unit that acquires a captured image, taken by a camera, containing a moving body and an environment; a moving body detection unit that detects the moving body from the captured image; a line-of-sight estimation unit that estimates the user's line of sight based on the captured image; a line-of-sight movement detection unit that acquires sensor data from a sensor provided in the video processing device and, based on the sensor data, detects a movement of the user's line of sight starting from the estimated line of sight; a determination unit that determines, based on the detected movement of the user's line of sight, whether the moving body is a moving body that the user is riding; a drawing unit that calculates how the AR content looks based on the estimated movement of the line of sight; and an output control unit that controls the calculated AR content to be displayed.
According to one aspect of the present invention, even if a moving object with a similar appearance appears in a captured image, AR content can be displayed only at the location corresponding to the moving object on which the user is riding, which makes it possible to accurately present AR content to the user.
FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device according to an embodiment.
FIG. 2 is a block diagram showing the software configuration of the video processing apparatus in the embodiment in relation to the hardware configuration shown in FIG. 1.
FIG. 3 is a flowchart illustrating an example of an operation by which the video processing device displays AR content only at the correct position of a captured image.
FIG. 4 is a diagram showing an example of a photographed image.
FIG. 5 is a diagram illustrating an example when a moving object is detected in a photographed image.
FIG. 6 is a diagram showing an example of the "feature point space" and the "AR content space" stored in the space storage unit.
FIG. 7 is a diagram showing an example of "the range of the starting point of the line of sight with the moving object on which the user is riding as the target object" and "the range of line-of-sight directions with the user's moving object as the target object".
FIG. 8 is a diagram showing an example of how the calculated AR content looks.
Hereinafter, embodiments according to the present invention will be described with reference to the drawings. Note that, hereinafter, elements that are the same as or similar to elements that have already been explained are given the same or similar reference numerals, and overlapping explanations are basically omitted. For example, when there are multiple identical or similar elements, a common reference numeral may be used to explain them without distinction, or branch numbers may be added to the common numeral to distinguish them.
[Embodiment]
(Configuration)
FIG. 1 is a block diagram showing an example of the hardware configuration of a video processing device 1 according to an embodiment.
The video processing device 1 is a computer that analyzes input data and generates and outputs output data. The video processing device 1 may be, for example, an AR device including AR glasses, smart glasses, or another wearable device. That is, the video processing device 1 may be a device worn and used by a user.
As shown in FIG. 1, the video processing device 1 includes a control section 10, a program storage section 20, a data storage section 30, a communication interface 40, and an input/output interface 50. The control unit 10, program storage unit 20, data storage unit 30, communication interface 40, and input/output interface 50 are communicably connected to each other via a bus. Further, the communication interface 40 may be communicably connected to an external device via a network. The input/output interface 50 is communicably connected to the input device 2, the output device 3, the camera 4, and the inertial sensor 5.
The control unit 10 controls the video processing device 1. The control unit 10 includes a hardware processor such as a central processing unit (CPU). For example, the control unit 10 may be an integrated circuit capable of executing various programs.
The program storage unit 20 uses, as storage media, a combination of non-volatile memories that can be written to and read from at any time, such as an EPROM (Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), or an SSD (Solid State Drive), and a non-volatile memory such as a ROM (Read Only Memory). The program storage unit 20 stores programs necessary to execute various processes. That is, the control unit 10 can implement various controls and operations by reading and executing programs stored in the program storage unit 20.
The data storage unit 30 is a storage that uses, as storage media, a combination of a non-volatile memory that can be written to and read from at any time, such as an HDD or a memory card, and a volatile memory such as a RAM (Random Access Memory). The data storage unit 30 is used to store data acquired and generated while the control unit 10 executes programs and performs various processes.
The communication interface 40 includes one or more wired or wireless communication modules. For example, the communication interface 40 includes a communication module that makes a wired or wireless connection to an external device via a network. The communication interface 40 may include a wireless communication module that wirelessly connects to external devices such as Wi-Fi access points and base stations. Furthermore, the communication interface 40 may include a wireless communication module for wirelessly connecting to an external device using short-range wireless technology. That is, the communication interface 40 may be any general communication interface as long as it is capable of communicating with an external device under the control of the control unit 10 and transmitting and receiving various information including past performance data.
The input/output interface 50 is connected to the input device 2, the output device 3, the camera 4, the inertial sensor 5, and the like. The input/output interface 50 is an interface that allows information to be transmitted and received between the input device 2, the output device 3, the cameras 4, and the inertial sensor 5. The input/output interface 50 may be integrated with the communication interface 40. For example, the video processing device 1 and at least one of the input device 2, the output device 3, the camera 4, and the inertial sensor 5 may be wirelessly connected using short-range wireless technology or the like, and information may be sent and received using that short-range wireless technology.
The input device 2 may include, for example, a keyboard, a pointing device, or the like for the user to input various information including past performance data to the video processing device 1. The input device 2 may also include a reader for reading data to be stored in the program storage section 20 or the data storage section 30 from a memory medium such as a USB memory, and a disk device for reading such data from a disk medium.
The output device 3 includes a display that displays images captured by the camera 4, AR content, and the like. The output device 3 may be integrated with the video processing device 1. For example, if the video processing device 1 is AR glasses or smart glasses, the output device 3 is the glass portion of the glasses.
The camera 4 is capable of photographing environments such as landscapes, and may be a general camera that can be attached to the video processing device 1. Here, the environment generally refers to the scenery that is photographed. The camera 4 may be integrated with the video processing device 1. The camera 4 may output the captured image to the control unit 10 of the video processing device 1 through the input/output interface 50.
The inertial sensor 5 includes, for example, an acceleration sensor, an angular velocity sensor, a geomagnetic sensor, and the like. For example, when the video processing device 1 is an AR device, the inertial sensor 5 senses the moving speed and head movement of the user wearing the AR device, and outputs sensor data according to the sensing to the control unit 10.
FIG. 2 is a block diagram showing the software configuration of the video processing device 1 in the embodiment in relation to the hardware configuration shown in FIG. 1.
The control unit 10 includes an image acquisition unit 101, a moving object detection unit 102, a feature point extraction unit 103, a line-of-sight estimation unit 104, a line-of-sight movement estimation unit 105, a sensor data control unit 106, a target object determination unit 107, an AR content drawing unit 108, and an output control unit 109.
The image acquisition unit 101 acquires a photographed image taken by the camera 4. Note that the image acquisition unit 101 may store the captured image in the image storage unit 301.
The moving object detection unit 102 detects a moving object from the captured image. The moving object detection unit 102 detects the moving object MB appearing in the photographed image. Of course, the moving object may be any arbitrary object such as a motorized bicycle, an electric bicycle, a motorcycle, or a vehicle. Further, the detected moving object may include only the moving object itself, or may also include a part of the body of the user riding the moving object, such as an arm. A general technique may be used as the detection method. Furthermore, the moving object detection unit 102 may delete from the photographed image the portion of the environment that is not the detected moving object.
The feature point extraction unit 103 extracts feature points from the moving object in the captured image. For example, the feature point extraction unit 103 may extract, as feature points, those located near the feature point space stored in the space storage unit 302, which will be described later.
The line-of-sight estimation unit 104 estimates the position of the line of sight by comparing the feature points extracted by the feature point extraction unit 103 with the feature point space stored in the space storage unit 302. Details of the method for estimating the position of the line of sight will be described later.
The line-of-sight movement estimation unit 105 estimates the line-of-sight movement. The line-of-sight movement estimation unit 105 estimates a line-of-sight movement that follows the movement of the user's head by moving the line-of-sight position received from the line-of-sight estimation unit 104, as a starting point, by the three-dimensional movement measured by the sensor data control unit 106 (described later). That is, the line-of-sight movement estimation unit 105 estimates the movement of the user's line of sight starting from the line-of-sight position estimated by the line-of-sight estimation unit 104, based on the sensor data.
The sensor data control unit 106 acquires sensor data from the inertial sensor 5. The sensor data control unit 106 then measures the user's head movement, body movement, and the like from the acquired sensor data. For example, the sensor data control unit 106 measures the user's three-dimensional movement (for example, the user's head movement) based on the sensor data.
The target object determination unit 107 determines whether the target object is a moving object on which the user is riding. For example, the target object determination unit 107 determines, based on the detected movement of the user's line of sight, whether the moving object is the moving object on which the user is riding. Specifically, the target object determination unit 107 determines that the moving object is the moving object on which the user is riding if the starting point of the detected user's line of sight is within a predetermined range and the direction of the detected user's line of sight falls within the range in which the moving object on which the user rides is visible from that predetermined range. The predetermined range may be a cuboid range that is higher than the seating position of the moving object by the average sitting height of a person and that takes into consideration the posture in which the user rides.
The AR content drawing unit 108 calculates how the AR content looks. The AR content drawing unit 108 sets the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the movement destination line of sight, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content will appear in the set space.
The output control unit 109 outputs the AR content information. The output control unit 109 controls the output device 3 to draw the AR content. For example, the output control unit 109 controls the adjusted AR content to be displayed on AR glasses or the like.
The data storage unit 30 includes an image storage unit 301 and a space storage unit 302.
The image storage unit 301 may store the captured images acquired by the image acquisition unit 101. Here, a captured image stored in the image storage unit 301 may have information, acquired by the video processing device 1, about the longitude and latitude in the real world at which the captured image was captured. Further, the image storage unit 301 may automatically delete a captured image after a predetermined period of time has passed.
The space storage unit 302 stores a feature point space and an AR content space corresponding to the feature point space. The feature point space may be located at a preset position in the photographed image and may be set, for example, at each part of the moving body. An AR content space corresponding to the feature point space may then be set.
 (動作) 
 最初に、一般的なARシステムにおいて、ユーザが使用する映像処理装置1(携帯端末またはARデバイス)にARコンテンツを表示する方法について説明する。
(motion)
First, a method for displaying AR content on the video processing device 1 (mobile terminal or AR device) used by the user in a general AR system will be described.
 映像処理装置1は、カメラ4が撮影した撮影画像から、特徴点を抽出する。さらに、映像処理装置1は、予め撮影された撮影画像(例えば、1つ前のフレームまたは数フレーム分だけ前のフレームの撮影画像)からも特徴点を抽出したデータ(以下、特徴点空間と称する)と、抽出した特徴点とを突合することにより、ユーザの視線(位置および方向)を推定する。ここで、特徴点空間は、予め決められた利用シーンに基づいて設定された「周辺空間」に対して構築された空間である。そのため、撮影画像も当該周辺空間を撮影したものになる。そのため、特徴点空間および撮影画像の両方とも「周辺空間」に基づいて位置関係を同定していること前提とする。 The video processing device 1 extracts feature points from the image taken by the camera 4. Furthermore, the video processing device 1 also extracts feature points from a captured image that has been captured in advance (for example, a captured image of the previous frame or several frames before the image) (hereinafter referred to as feature point space). ) and the extracted feature points to estimate the user's line of sight (position and direction). Here, the feature point space is a space constructed for a "surrounding space" set based on a predetermined usage scene. Therefore, the captured image is also a captured image of the surrounding space. Therefore, it is assumed that the positional relationships of both the feature point space and the captured image are identified based on the "surrounding space."
 The video processing device 1 tracks the movement of the user's head on the basis of the inertial data received from the inertial sensor 5, and thereby follows the user's line of sight described above more closely in real time.
 Furthermore, the video processing device 1 positions the user's line of sight, which it is following in real time, in the AR content space created in advance, and calculates how the AR content appears from that viewpoint.
 The video processing device 1 then causes the output device 3 to draw the calculated appearance.
 In this way, the AR system generates AR content and displays it on the video processing device 1, such as a smartphone or AR glasses. However, as described above, with this method, when multiple moving bodies appear in the image captured by the camera 4, the video processing device 1 displays the AR content at the correct position but may also display AR content at locations corresponding to moving bodies on which the user is not riding.
 The following therefore describes the operation of the video processing device 1 for displaying AR content only at the correct position even when the captured image contains multiple moving bodies.
 FIG. 3 is a flowchart illustrating an example of the operation by which the video processing device 1 displays AR content only at the correct position in a captured image.
 The operation of this flowchart is realized by the control unit 10 of the video processing device 1 reading and executing the program stored in the program storage unit 20.
 This operation flow is started, for example, when the control unit 10 outputs an instruction to display AR content because the user has entered an instruction requesting AR content or because a predetermined condition has been satisfied. Alternatively, the flow may be started when the video processing device 1 starts up and the camera 4 acquires a captured image. In the following description, the moving body is assumed to be a bicycle.
 In step ST101, the image acquisition unit 101 acquires the image captured by the camera 4. The image acquisition unit 101 may store the captured image in the image storage unit 301. The captured image is assumed to contain the environment and a moving body. As described above, the environment may be ordinary scenery; the environment therefore refers to the portion of the image excluding the moving body.
 FIG. 4 is a diagram showing an example of a captured image.
 The example in FIG. 4 is an image captured while the user is riding and steering a bicycle, which is the moving body, so the captured image contains both the moving body and the environment. The camera 4 here is the camera provided in the AR glasses serving as the video processing device 1, and the captured image was taken by this camera.
 In the example of FIG. 4, the user's arms and the bicycle's handlebar and wheels are shown hatched for simplicity.
 In step ST102, the moving object detection unit 102 detects the moving body MB from the captured image. The moving object detection unit 102 may detect the moving body MB by performing object detection with a general method, for example the object detection method disclosed in Non-Patent Document 2; a detailed description of object detection is therefore omitted here. The moving object detection unit 102 also keeps the region detected as the moving body MB and deletes the rest, that is, it deletes the environment portion of the captured image.
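 The sketch below illustrates only the environment-removal part of step ST102, under the assumption that an object detector (any general method, as noted above) has already produced a binary mask of the moving body; the function name is hypothetical.

```python
import numpy as np

def remove_environment(image: np.ndarray, body_mask: np.ndarray) -> np.ndarray:
    """image: H x W x 3 frame; body_mask: H x W boolean, True on the moving body MB.

    Pixels outside the detected moving body (i.e. the environment) are blanked,
    leaving only the bicycle and the user for the later feature extraction.
    """
    masked = np.zeros_like(image)
    masked[body_mask] = image[body_mask]
    return masked
```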
 FIG. 5 is a diagram showing an example in which the moving body MB has been detected in the captured image.
 In the example of FIG. 5, the bicycle and the user are detected as the moving body MB; the region detected as the moving body MB therefore includes the user's arms in addition to the bicycle's handlebar and wheels. As shown in FIG. 5, the environment portion is deleted.
 In step ST103, the line-of-sight estimation unit 104 estimates the user's line of sight. Assume first that a "feature point space" whose feature points are the parts of the bicycle serving as the moving body MB is stored in the space storage unit 302, and that an "AR content space" in which a speedometer is placed as AR content at the center of the handlebar in that feature point space is also stored in the space storage unit 302.
 FIG. 6 is a diagram showing an example of the "feature point space" and the "AR content space" stored in the space storage unit 302.
 In the example of FIG. 6, the "feature point space" is indicated by reference symbol CP and the AR content by reference symbol ARC. FIG. 6 is only an example; the space storage unit 302 may of course store a plurality of such spaces.
 The feature point extraction unit 103 extracts feature points from the captured image from which the environment portion was deleted in step ST102, that is, it extracts feature points lying within the moving body MB. The extraction method may be a general one: the feature point extraction unit 103 scans the entire captured image and extracts specific features such as boundaries (edges) of things, for example object outlines, or corners of objects. In the example of FIG. 6, for instance, the boundary of the moving body MB can be extracted as feature points.
 The line-of-sight estimation unit 104 then estimates the position of the line of sight by matching the feature points extracted by the feature point extraction unit 103 against the feature point space stored in the space storage unit 302. For example, the line-of-sight estimation unit 104 may estimate the line of sight using vision-based AR techniques such as the general markerless AR technologies PTAM, SmartAR, or Microsoft HoloLens; a detailed description of these techniques is therefore omitted here. The line-of-sight estimation unit 104 outputs the estimated line-of-sight position to the line-of-sight movement estimation unit 105.
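 As one possible reading of step ST103, the following sketch estimates the line of sight by matching ORB features from the masked image against the stored feature point space and solving a PnP problem with OpenCV. The choice of ORB, the stored-descriptor format, and the known camera intrinsics K are all assumptions; a markerless AR library such as those mentioned above would normally perform the equivalent internally.

```python
import cv2
import numpy as np

def estimate_line_of_sight(masked_gray, space_points_3d, space_descriptors, K):
    """Returns (rvec, tvec): the camera pose relative to the moving body,
    i.e. the origin and direction of the user's line of sight, or None."""
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(masked_gray, None)
    if descriptors is None:
        return None

    # Stored descriptors are assumed to be ORB (uint8) as well.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, space_descriptors)
    if len(matches) < 6:
        return None   # not enough correspondences for a stable pose

    img_pts = np.float32([keypoints[m.queryIdx].pt for m in matches])
    obj_pts = np.float32([space_points_3d[m.trainIdx] for m in matches])

    ok, rvec, tvec, _inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    return (rvec, tvec) if ok else None
```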
 In step ST104, the line-of-sight movement estimation unit 105 estimates the movement of the line of sight. First, the sensor data control unit 106 acquires sensor data from the inertial sensor 5 and measures the movement of the user's head, body, and so on from the acquired data. Specifically, for example, the inertial sensor 5 is an inertial measurement unit (IMU), and the sensor data control unit 106 acquires sensor data such as acceleration, angular velocity, and geomagnetism from it. The sensor data control unit 106 may measure the user's three-dimensional movement (for example, the movement of the user's head) on the basis of these data, and outputs the measurement result to the line-of-sight movement estimation unit 105.
 The line-of-sight movement estimation unit 105 applies the three-dimensional movement measured by the sensor data control unit 106 starting from the line-of-sight position received from the line-of-sight estimation unit 104, thereby estimating a line-of-sight movement that follows the movement of the user's head. In other words, the line-of-sight movement estimation unit 105 estimates, on the basis of the sensor data, the movement of the user's line of sight starting from the line-of-sight position estimated by the line-of-sight estimation unit 104. The line-of-sight movement estimation unit 105 then outputs line-of-sight movement information including the estimated movement to the object determination unit 107.
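 A minimal sketch of step ST104 follows, in which only the gyroscope reading is integrated over a short interval to move the line of sight estimated in step ST103; a practical implementation would also fuse the accelerometer and geomagnetic data, so this is illustrative only.

```python
import cv2
import numpy as np

def propagate_gaze(rvec, tvec, gyro_rad_s, dt):
    """rvec, tvec: line of sight estimated in step ST103.
    gyro_rad_s: measured angular velocity (3,) in rad/s; dt: elapsed seconds."""
    R0, _ = cv2.Rodrigues(np.asarray(rvec, dtype=np.float64))
    # Small-angle rotation accumulated over dt (gyroscope only).
    dR, _ = cv2.Rodrigues(np.asarray(gyro_rad_s, dtype=np.float64) * dt)
    R1 = dR @ R0
    rvec_moved, _ = cv2.Rodrigues(R1)
    return rvec_moved, tvec   # head position treated as unchanged over dt
```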
 In step ST105, the object determination unit 107 determines whether the object is the moving body MB on which the user is riding. For example, the object determination unit 107 determines, on the basis of the detected movement of the user's line of sight, whether the moving body MB is the moving body the user has boarded. Specifically, this is done as follows.
 First, the object determination unit 107 defines the range in which the head of a user riding the moving body MB would be located as the "range of line-of-sight origins whose object is the moving body MB on which the user is riding". Specifically, this range is defined as a cuboid raised above the seating position of the moving body MB (for example, a bicycle saddle) by the average human sitting height, taking into account the user's posture while riding.
 Furthermore, the range of lines of sight from which the user can see the moving body MB on which the user rides is defined as the "range of line-of-sight directions whose object is the user's moving body MB". Specifically, this range is defined as the cone within which the moving body MB is visible from the user's head.
 The object determination unit 107 determines whether the origin of the line-of-sight movement estimated by the line-of-sight movement estimation unit 105 in step ST104 (that is, the origin of the line of sight after the movement) falls within the predetermined range, namely the "range of line-of-sight origins whose object is the moving body on which the user is riding", and whether the direction of the line of sight (that is, the direction after the movement) falls within the "range of line-of-sight directions whose object is the user's moving body".
 If both are included, the moving body MB is determined to be the one the user has boarded; in this case, the object determination unit 107 outputs the line-of-sight movement information to the AR content drawing unit 108, and the process proceeds to step ST106. If not, the moving body MB is determined to be one the user has not boarded, and the process ends; in other words, no AR content is displayed at locations corresponding to a moving body MB on which the user is determined not to be riding.
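 The determination of step ST105 can be pictured with the following sketch, which tests whether the gaze origin lies inside the cuboid above the saddle and whether the gaze direction lies inside the visibility cone; the specific bounds, axis, and half-angle passed in would depend on the moving body and are assumptions here.

```python
import numpy as np

def is_riders_gaze(origin, direction, cuboid_min, cuboid_max,
                   cone_axis, cone_half_angle_rad):
    """origin/direction: moved line of sight in the moving body's frame.
    cuboid_min/max: opposite corners of the head range above the saddle.
    cone_axis: direction from the head range toward the moving body MB."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)

    # (1) The gaze origin must fall inside the cuboid range.
    if not (np.all(origin >= cuboid_min) and np.all(origin <= cuboid_max)):
        return False

    # (2) The gaze direction must fall inside the visibility cone.
    axis = np.asarray(cone_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return float(np.dot(direction, axis)) >= np.cos(cone_half_angle_rad)
```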
 FIG. 7 is a diagram showing an example of the "range of line-of-sight origins whose object is the moving body MB on which the user is riding" and the "range of line-of-sight directions whose object is the user's moving body MB".
 In FIG. 7, the "range of line-of-sight origins whose object is the moving body MB on which the user is riding" is denoted by reference symbol Ori, and the "range of line-of-sight directions whose object is the user's moving body MB" is denoted by reference symbol Di.
 The line of sight in FIG. 7(a) is an example in which the origin of the line of sight falls within the range of origins Ori and the direction of the line of sight falls within the range of directions Di.
 In contrast, the line of sight in FIG. 7(b1) is an example in which the origin falls within the range of origins but the direction does not fall within the range of directions. The line of sight in FIG. 7(b2) is an example in which the origin does not fall within the range of origins although the direction falls within the range of directions.
 The line of sight in FIG. 7(a) is therefore determined to be the riding user's line of sight, while those in (b1) and (b2) are determined not to be.
 In step ST106, the AR content drawing unit 108 calculates how the AR content should appear. The AR content drawing unit 108 places the estimated line-of-sight movement included in the received line-of-sight movement information, that is, the line of sight at its destination, in the AR content space corresponding to the feature point space stored in the space storage unit 302, and calculates how the AR content appears in that space.
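 As an illustrative reduction of step ST106, the sketch below projects the AR content's anchor point (for example, the handlebar-centre speedometer of FIG. 6) into the image using the moved line of sight and the camera intrinsics; a full implementation would render the content geometry rather than a single point, and the function name is an assumption.

```python
import cv2
import numpy as np

def project_content_anchor(anchor_3d, rvec, tvec, K):
    """Returns the pixel position of the content anchor and its depth,
    which can be used to place and scale the drawn AR content."""
    anchor = np.asarray(anchor_3d, dtype=np.float64).reshape(1, 1, 3)
    img_pts, _ = cv2.projectPoints(anchor, rvec, tvec, K, None)
    u, v = img_pts[0, 0]

    # Depth of the anchor in the camera frame, for scaling the content.
    R, _ = cv2.Rodrigues(np.asarray(rvec, dtype=np.float64))
    cam_pt = R @ anchor.reshape(3, 1) + np.asarray(tvec, dtype=np.float64).reshape(3, 1)
    depth = float(cam_pt[2, 0])
    return (float(u), float(v)), depth
```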
 FIG. 8 is a diagram showing an example of the calculated appearance of the AR content.
 As shown in FIG. 8, the AR content drawing unit 108 adjusts the appearance of the AR content on the basis of the line-of-sight movement information and outputs AR content information for drawing the adjusted AR content to the output control unit 109.
 In step ST107, the output control unit 109 outputs the AR content information. The output control unit 109 controls the output device 3 to draw the AR content; for example, it controls the adjusted AR content to be displayed on AR glasses or the like.
 (Operation and effects of the embodiment)
 According to the embodiment, the video processing device 1 can display AR content only at the location corresponding to the moving body MB on which the user is riding, even when a similar-looking moving body MB appears in the captured image. This allows the video processing device 1 to present AR content to the user accurately.
 [Other embodiments]
 The above embodiment uses an image captured by the camera 4 provided in the video processing device 1, but the captured image is not limited to one taken by that camera. For example, an independent camera 4 connected to the video processing device 1 may be used, provided that the camera 4 is installed at a position (for example, above the user's head) from which it can capture images usable for estimating the user's line of sight.
 The methods described in the above embodiment can be stored, as a program (software means) executable by a computer, on storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical discs (CD-ROM, DVD, MO, etc.), and semiconductor memories (ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via a communication medium. The program stored on the medium also includes a setting program that configures, within the computer, the software means (including not only the execution program but also tables and data structures) to be executed by the computer. A computer that realizes this device reads the program stored on the storage medium, where appropriate constructs the software means using the setting program, and executes the processing described above under the control of that software means. The storage media referred to in this specification are not limited to those for distribution, and include storage media such as magnetic disks and semiconductor memories provided inside the computer or in devices connected via a network.
 In short, the present invention is not limited to the above embodiment and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate where possible, in which case the combined effects are obtained. Furthermore, the above embodiment includes inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements.
 1 … Video processing device
 2 … Input device
 3 … Output device
 4 … Camera
 5 … Inertial sensor
 10 … Control unit
 101 … Image acquisition unit
 102 … Moving object detection unit
 103 … Feature point extraction unit
 104 … Line-of-sight estimation unit
 105 … Line-of-sight movement estimation unit
 106 … Sensor data control unit
 107 … Object determination unit
 108 … AR content drawing unit
 109 … Output control unit
 20 … Program storage unit
 30 … Data storage unit
 301 … Image storage unit
 302 … Space storage unit
 40 … Communication interface
 50 … Input/output interface
 MB … Moving body

Claims (8)

  1.  A video processing device worn by a user, comprising:
     an image acquisition unit that acquires a captured image, taken by a camera, containing a moving body and an environment;
     a moving object detection unit that detects the moving body from the captured image;
     a line-of-sight estimation unit that estimates the user's line of sight on the basis of the captured image;
     a line-of-sight movement detection unit that acquires sensor data from a sensor provided in the video processing device and detects, on the basis of the sensor data, a movement of the user's line of sight starting from the estimated line of sight;
     a determination unit that determines, on the basis of the detected movement of the user's line of sight, whether the moving body is a moving body on which the user is riding;
     a drawing unit that calculates how AR content appears on the basis of the estimated movement of the line of sight; and
     an output control unit that performs control to display the calculated AR content.
  2.  The video processing device according to claim 1, wherein the moving object detection unit deletes from the captured image the environment portion other than the detected moving body.
  3.  The video processing device according to claim 1, wherein the determination unit determines that the moving body is the moving body on which the user is riding when the origin of the line of sight at the destination of the detected movement of the user's line of sight is within a predetermined range and the direction of the detected user's line of sight is within a range in which the moving body on which the user rides is visible from the predetermined range.
  4.  The video processing device according to claim 3, wherein the predetermined range is a cuboid range raised above the seating position of the moving body by the average human sitting height and taking into account the posture in which the user rides.
  5.  The video processing device according to claim 1, further comprising:
     a feature point extraction unit that extracts feature points lying within the moving body; and
     a storage unit that stores a feature point space,
     wherein the line-of-sight estimation unit estimates the line of sight on the basis of the extracted feature points and the feature point space.
  6.  The video processing device according to claim 5, wherein the storage unit further stores an AR content space corresponding to the feature point space, and the drawing unit sets the destination of the line of sight in the AR content space and calculates how the AR content appears in the set space.
  7.  A video processing method executed by a processor of a video processing device worn by a user, the method comprising:
     acquiring a captured image, taken by a camera, containing a moving body and an environment;
     detecting the moving body from the captured image;
     estimating the user's line of sight on the basis of the captured image;
     acquiring sensor data from a sensor provided in the video processing device;
     detecting, on the basis of the sensor data, a movement of the user's line of sight starting from the estimated line of sight;
     determining, on the basis of the detected movement of the user's line of sight, whether the moving body is a moving body on which the user is riding;
     calculating how AR content appears on the basis of the estimated movement of the line of sight; and
     performing control to display the calculated AR content.
  8.  A video processing program comprising instructions to be executed by a processor of a video processing device worn by a user, the instructions comprising:
     acquiring a captured image, taken by a camera, containing a moving body and an environment;
     detecting the moving body from the captured image;
     estimating the user's line of sight on the basis of the captured image;
     acquiring sensor data from a sensor provided in the video processing device;
     detecting, on the basis of the sensor data, a movement of the user's line of sight starting from the estimated line of sight;
     determining, on the basis of the detected movement of the user's line of sight, whether the moving body is a moving body on which the user is riding;
     calculating how AR content appears on the basis of the estimated movement of the line of sight; and
     performing control to display the calculated AR content.
PCT/JP2022/031905 2022-08-24 2022-08-24 Video processing device, video processing method, and video processing program WO2024042645A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/031905 WO2024042645A1 (en) 2022-08-24 2022-08-24 Video processing device, video processing method, and video processing program


Publications (1)

Publication Number Publication Date
WO2024042645A1 (en)

Family

ID=90012790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/031905 WO2024042645A1 (en) 2022-08-24 2022-08-24 Video processing device, video processing method, and video processing program

Country Status (1)

Country Link
WO (1) WO2024042645A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018207426A1 (en) * 2017-05-09 2018-11-15 ソニー株式会社 Information processing device, information processing method, and program
WO2019087658A1 (en) * 2017-11-01 2019-05-09 ソニー株式会社 Information processing device, information processing method, and program



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22956473

Country of ref document: EP

Kind code of ref document: A1