WO2022220048A1 - System, information processing method, and information processing program - Google Patents

System, information processing method, and information processing program

Info

Publication number
WO2022220048A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
controller
unit
image signal
state
Prior art date
Application number
PCT/JP2022/013835
Other languages
French (fr)
Japanese (ja)
Inventor
直之 宮田
英樹 柳澤
麻美子 石田
英明 岩木
Original Assignee
株式会社ソニー・インタラクティブエンタテインメント
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社ソニー・インタラクティブエンタテインメント
Publication of WO2022220048A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range

Definitions

  • the present invention relates to a system, an information processing method, and an information processing program.
  • An event-driven vision sensor is known in which pixels that detect a change in the intensity of incident light generate signals asynchronously in time.
  • An event-driven vision sensor is advantageous in that it can operate at lower power and higher speed than a frame-type vision sensor that scans all pixels at predetermined intervals, specifically an image sensor such as a CCD or CMOS sensor. Techniques related to such event-driven vision sensors are described in Patent Document 1 and Patent Document 2, for example.
  • An object of the present invention is to provide a system, an information processing method, and an information processing program capable of realizing suitable feedback control to a controller while suppressing latency.
  • According to one aspect of the present invention, there is provided a system including a sensor device, a controller that receives user operations, and an information processing device that performs processing based on the user operations. The sensor device includes a first image sensor that generates a first image signal by synchronously scanning all pixels at a predetermined timing, a second image sensor including an event-driven vision sensor that asynchronously generates a second image signal when it detects a change in the intensity of light incident on each pixel, an estimation unit that estimates the state of a user and the posture of the controller held by the user based on the first image signal and the second image signal, and an information output unit that outputs information indicating the state of the user and information indicating the posture of the controller. The information processing device includes a control value calculation unit that calculates a control value for feedback control to the controller based on at least one of the information indicating the state of the user and the information indicating the posture of the controller. The controller has at least one of a force sense presentation device that presents a force sense based on the control value, a vibration device that vibrates based on the control value, and an audio output device that outputs sound based on the control value.
  • According to another aspect of the present invention, there is provided an information processing method for outputting a control value for feedback control to a controller, the method including an estimation step of estimating the state of a user and the posture of the controller held by the user based on a first image signal generated by a first image sensor that synchronously scans all pixels at a predetermined timing and a second image signal generated by a second image sensor including an event-driven vision sensor that asynchronously generates the second image signal when it detects a change in the intensity of light incident on each pixel, and an information output step of outputting information indicating the state of the user and information indicating the posture of the controller.
  • According to still another aspect of the present invention, there is provided an information processing program that causes a computer to realize a function of estimating the state of a user and the posture of a controller held by the user based on a first image signal generated by a first image sensor that synchronously scans all pixels at a predetermined timing and a second image signal generated by a second image sensor including an event-driven vision sensor that asynchronously generates the second image signal when it detects a change in the intensity of light incident on each pixel, and a function of outputting information indicating the state of the user and information indicating the posture of the controller.
  • According to the above configurations, the state of the user and the posture of the controller held by the user are estimated based on the image signal generated by the sensor that synchronously scans all pixels at a predetermined timing and the image signal generated by the event-driven vision sensor, and a control value for feedback control to the controller is calculated based on the estimation result. Therefore, suitable feedback control to the controller can be realized while suppressing latency.
  • FIG. 1 is a schematic diagram showing the entire system according to one embodiment of the present invention.
  • FIG. 2 is a block diagram showing a schematic configuration of a system according to one embodiment of the present invention.
  • FIG. 3 is a block diagram showing a schematic configuration of an estimation unit in the system according to one embodiment of the present invention.
  • FIG. 4A is a diagram for explaining an example of estimation in one embodiment of the present invention.
  • FIGS. 4B, 4C, and 4D are further diagrams for explaining examples of estimation in one embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating an example of a processing method according to one embodiment of the present invention.
  • FIG. 6A is a diagram explaining feedback control in one embodiment of the present invention.
  • FIGS. 6B, 7A, and 7B are further diagrams explaining feedback control in one embodiment of the present invention.
  • FIG. 8 is a block diagram showing a schematic configuration of a system according to another embodiment of the present invention.
  • FIG. 1 is a schematic diagram showing the entire system 1 according to this embodiment.
  • a system 1 according to the present embodiment is a game system including a camera unit 10, an information processing device 20, one or more controllers 30, and a display device 40, as shown in FIG.
  • the information processing device 20 is connected to each of the camera unit 10, the controller 30, and the display device 40 via a wired or wireless network.
  • the information processing device 20 advances the game in accordance with information transmitted from the camera unit 10 and the controller 30, and the display device 40 displays a screen generated during execution by the information processing device 20, such as a game screen.
  • the camera unit 10 estimates the state of the user who is the game player and the orientation of the controller 30 held by the user, and transmits the estimation results to the information processing device 20 .
  • the state of the user includes at least one of the posture of the user, the shape of the user's arm, or the shape of the user's fingers.
  • the camera unit 10 functions as an operating device for accepting user operations, like the controller 30, by estimating and outputting the user's state and the orientation of the controller 30.
  • In order to estimate the user's state and the orientation of the controller 30, the camera unit 10 is arranged at a position where the user fits within its field of view, for example at a distance of about 1 meter from the user. In the example of FIG. 1, the camera unit 10 is arranged near the display device 40.
  • the optimum arrangement position of the camera unit 10 differs depending on the purpose. For example, depending on the content of the game to be played, it is desirable to arrange the camera unit 10 at a position where the target to be grasped, such as the user's whole body, upper body, hand, etc., fits within the field of view.
  • the information processing device 20 may display a tutorial or the like on the display device 40 to guide the user to arrange the camera unit 10 at an appropriate position.
  • FIG. 2 is a block diagram showing a schematic configuration of a system according to one embodiment of the invention.
  • Camera unit 10 includes RGB camera 11 , EDS (Event Driven Sensor) 12 , IMU (Inertial Measurement Unit) 13 , estimation section 14 , and information output section 15 .
  • the RGB camera 11 includes an image sensor 111 as a first image sensor and a processing circuit 112 connected to the image sensor 111 .
  • the image sensor 111 generates RGB image signals 113 by synchronously scanning all pixels at predetermined intervals or at predetermined timings according to user operations, for example.
  • Processing circuitry 112 converts, for example, RGB image signals 113 into a format suitable for storage and transmission.
  • the processing circuit 112 also gives the RGB image signal 113 a time stamp.
  • EDS 12 includes sensor 121 , which is a second image sensor forming a sensor array, and processing circuitry 122 connected to sensor 121 .
  • the sensor 121 includes a light-receiving element and is an event-driven vision sensor that generates an event signal 123 when it detects a change in the intensity of light incident on each pixel, more specifically a change in luminance exceeding a predetermined value.
  • the event signal 123 output through the processing circuit 122 includes identification information of the sensor 121 (for example, the pixel position), the polarity of the luminance change (rising or falling), and a time stamp. When detecting luminance changes, the EDS 12 can generate the event signal 123 at a significantly higher frequency than the RGB image signal 113 is generated (that is, than the frame rate of the RGB camera 11).
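To make the event data described above concrete, here is a minimal sketch (not from the publication; the `EventSignal` record layout and the threshold value are assumptions) of how a per-pixel luminance change exceeding a predetermined value could be turned into an asynchronous event carrying the pixel position, the polarity of the change, and a time stamp.

```python
from dataclasses import dataclass
import time

@dataclass
class EventSignal:
    x: int           # pixel column of the sensor that fired
    y: int           # pixel row of the sensor that fired
    polarity: int    # +1 = luminance rose, -1 = luminance fell
    timestamp_us: int

LUMINANCE_THRESHOLD = 15  # assumed value for "a change exceeding a predetermined value"

def maybe_emit_event(x: int, y: int, prev_luma: float, new_luma: float) -> EventSignal | None:
    """Return an event only when the per-pixel luminance change exceeds the threshold;
    otherwise nothing is generated, which is what makes the output asynchronous."""
    delta = new_luma - prev_luma
    if abs(delta) < LUMINANCE_THRESHOLD:
        return None
    return EventSignal(x=x, y=y, polarity=1 if delta > 0 else -1,
                       timestamp_us=time.monotonic_ns() // 1000)
```

Because nothing is emitted while the change stays below the threshold, the output rate tracks scene activity rather than a fixed frame rate, which is why the EDS 12 can run far faster than the RGB frame cycle.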
  • a signal from which an image can be constructed is called an image signal. Therefore, the RGB image signal 113 and the event signal 123 are examples of image signals.
  • the time stamps given to the RGB image signal 113 and the event signal 123 are synchronized.
  • the time stamps applied to the RGB image signal 113 and the event signal 123 can be synchronized, for example, by providing the RGB camera 11 with time information that is used to generate the time stamps in the EDS 12 .
  • alternatively, the time stamps given to the RGB image signal 113 and the event signal 123 can be synchronized after the fact by calculating the offset between them with reference to the time at which a specific event (for example, a change of the subject over the entire image) occurred.
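One way to picture this after-the-fact synchronization is to measure, for a specific event observed by both sensors, the difference between the RGB time stamp and the EDS time stamp, and shift one stream by that offset. The following sketch assumes exactly that; the function names and numbers are illustrative only.

```python
def estimate_offset_us(rgb_ts_of_reference_us: int, eds_ts_of_reference_us: int) -> int:
    """Offset between the two clocks, measured at a specific event that both
    sensors observed (e.g. a change of the subject over the entire image)."""
    return rgb_ts_of_reference_us - eds_ts_of_reference_us

def align_event_timestamp(event_ts_us: int, offset_us: int) -> int:
    """Shift an EDS time stamp onto the RGB camera's time base."""
    return event_ts_us + offset_us

# usage: if the reference event was stamped 1_000_500 us by the RGB camera and
# 2_000_000 us by the EDS, every event time stamp is shifted by -999_500 us.
offset = estimate_offset_us(1_000_500, 2_000_000)
aligned = align_event_timestamp(2_000_120, offset)  # -> 1_000_620
```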
  • the sensor 121 of the EDS 12 is associated with one or more pixels of the RGB image signal 113 by a calibration procedure for the RGB camera 11 and the EDS 12 performed in advance, so that the event signal 123 is generated in response to changes in the intensity of light in one or more pixels of the RGB image signal 113. More specifically, for example, a common calibration pattern is captured by the RGB camera 11 and the EDS 12, and correspondence parameters between the camera and the sensor are calculated from the internal and external parameters of the RGB camera 11 and the EDS 12, which allows the sensor 121 to be associated with one or more pixels of the RGB image signal 113.
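The pixel association obtained from the calibration could, for instance, be expressed as a planar homography estimated from the shared calibration pattern. The sketch below only shows how such a 3x3 mapping would be applied once it has been computed from the internal and external parameters; the matrix values are placeholders, not calibration results from the publication.

```python
# 3x3 homography H mapping EDS pixel coordinates to RGB pixel coordinates,
# assumed to have been estimated beforehand from a shared calibration pattern.
H = [
    [1.02, 0.00, 3.5],
    [0.00, 1.02, -2.0],
    [0.00, 0.00, 1.0],
]

def eds_to_rgb_pixel(x_eds: float, y_eds: float) -> tuple[int, int]:
    """Map an EDS sensor position to the corresponding RGB image pixel."""
    xh = H[0][0] * x_eds + H[0][1] * y_eds + H[0][2]
    yh = H[1][0] * x_eds + H[1][1] * y_eds + H[1][2]
    w = H[2][0] * x_eds + H[2][1] * y_eds + H[2][2]
    return round(xh / w), round(yh / w)

print(eds_to_rgb_pixel(100, 80))  # -> (106, 80) with the placeholder matrix
```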
  • the IMU 13 is a sensor that detects the orientation of the camera unit 10 itself.
  • the IMU 13 acquires three-dimensional posture data of the camera unit 10 at a predetermined cycle or timing, and outputs the data to the estimation unit 14 .
  • FIG. 3 is a block diagram showing a schematic configuration of the estimation unit 14.
  • Estimating section 14 includes first recognizing section 141 , coordinate calculating section 142 , trained model 143 , state estimating section 144 , second recognizing section 145 , and posture estimating section 146 .
  • a first recognition unit 141 of the estimation unit 14 recognizes one or more users included in the field of view of the camera unit 10 based on at least one of the RGB image signal 113 and the event signal 123 .
  • the first recognition unit 141 detects an object existing in a continuous pixel region in which the event signal 123 indicates that events of the same polarity have occurred, and recognizes the user by performing subject recognition on the corresponding portion of the RGB image signal 113. When multiple users are included in the field of view of the camera unit 10, the first recognition unit 141 identifies each user.
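A rough sketch of this two-stage recognition (group same-polarity events into a connected region, then run subject recognition on the corresponding part of the RGB image) is shown below; the `detector` callable stands in for whatever recognizer is actually used and is purely illustrative.

```python
from collections import deque

def connected_event_region(events, polarity):
    """Group events of the given polarity into 4-connected pixel regions and
    return the bounding box of the largest one, or None if there are none."""
    pts = {(e.x, e.y) for e in events if e.polarity == polarity}
    best = []
    while pts:
        seed = pts.pop()
        region, queue = [seed], deque([seed])
        while queue:
            x, y = queue.popleft()
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if n in pts:
                    pts.remove(n)
                    region.append(n)
                    queue.append(n)
        if len(region) > len(best):
            best = region
    if not best:
        return None
    xs, ys = [p[0] for p in best], [p[1] for p in best]
    return min(xs), min(ys), max(xs), max(ys)

def recognize_user(rgb_image, events, detector):
    """Crop the RGB image to where events occurred and run subject recognition
    there; `detector` is whatever person recognizer the system actually uses."""
    box = connected_event_region(events, polarity=1)
    if box is None:
        return None
    x0, y0, x1, y1 = box
    crop = [row[x0:x1 + 1] for row in rgb_image[y0:y1 + 1]]
    return detector(crop)
```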
  • the coordinate calculating unit 142 of the estimating unit 14 calculates, for each user recognized by the first recognizing unit 141, coordinate information indicating the positions of multiple joints of the user from the RGB image signal 113 based on the learned model 143.
  • the trained model 143 can be constructed in advance by performing supervised learning using, for example, images of a person having multiple joints as input data and coordinate information indicating the positions of the person's joints as ground-truth data. A detailed description of the specific machine learning method is omitted because various known techniques can be used.
  • the estimation unit 14 may include a relationship learning unit that learns the relationship between the image based on the input RGB image signal 113 and the coordinate information indicating the joint positions each time an RGB image signal 113 is input, and may update the trained model 143 accordingly. The state estimation unit 144 of the estimation unit 14 then estimates the user's state based on the coordinate information calculated by the coordinate calculation unit 142, for each user recognized by the first recognition unit 141.
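As an illustration only, the state estimation over the joint coordinates returned by the trained model might reduce to simple geometric rules over keypoints; the keypoint names and the "raised arm" rule below are assumptions rather than anything stated in the publication.

```python
from typing import Dict, Tuple

Keypoints = Dict[str, Tuple[float, float]]  # joint name -> (x, y) in image coordinates

def estimate_arm_state(kp: Keypoints) -> str:
    """Toy rule: an arm counts as 'raised' when the wrist is above the shoulder.
    (Image y grows downward, so 'above' means a smaller y value.)"""
    left_raised = kp["left_wrist"][1] < kp["left_shoulder"][1]
    right_raised = kp["right_wrist"][1] < kp["right_shoulder"][1]
    if left_raised and right_raised:
        return "both arms raised"
    if left_raised or right_raised:
        return "one arm raised"
    return "arms down"

# usage with coordinates as they might come from the trained model
keypoints = {
    "left_shoulder": (210.0, 300.0), "right_shoulder": (290.0, 300.0),
    "left_wrist": (190.0, 180.0), "right_wrist": (310.0, 360.0),
}
print(estimate_arm_state(keypoints))  # -> "one arm raised"
```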
  • the second recognition unit 145 of the estimation unit 14 recognizes the controller 30 held by each user recognized by the first recognition unit 141 based on at least one of the RGB image signal 113 and the event signal 123 .
  • the second recognition unit 145 recognizes the controller 30 by performing object recognition on the portion of the RGB image signal 113 corresponding to the vicinity of the user's hands, based on the coordinate information indicating the positions of the user's joints calculated by the coordinate calculation unit 142.
  • the second recognition unit 145 identifies the controller 30 for each user.
  • the identification of the controllers 30, that is, the determination of which controller 30 is held by which user, may be performed in any manner.
  • an identification mark or the like may be attached to each of the plurality of controllers 30, and determination may be made based on the RGB image signal 113.
  • alternatively, each of the plurality of controllers 30 may output a predetermined identification signal, and the determination may be made based on the identification signal received by the camera unit 10 or the information processing device 20. Further, when a user does not hold a controller 30, the second recognition unit 145 recognizes that the user does not hold a controller 30.
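Whichever identification route is used (identification marks or identification signals), the result is a per-user controller assignment. A minimal, purely illustrative way to express that assignment, assuming controller positions have already been detected near the users' hands:

```python
import math

def assign_controllers(hand_positions, controller_positions, max_dist=80.0):
    """Assign each detected controller to the user whose hand is closest to it.
    Users whose nearest controller is farther than max_dist are treated as
    'not holding a controller'."""
    assignments = {}
    for user_id, hand in hand_positions.items():
        best_id, best_d = None, max_dist
        for ctrl_id, pos in controller_positions.items():
            d = math.dist(hand, pos)
            if d < best_d:
                best_id, best_d = ctrl_id, d
        assignments[user_id] = best_id  # None means the user holds no controller
    return assignments

hands = {"U1": (320.0, 400.0), "U2": (620.0, 410.0)}
controllers = {"30U1": (330.0, 395.0)}
print(assign_controllers(hands, controllers))  # -> {'U1': '30U1', 'U2': None}
```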
  • the posture estimation unit 146 of the estimation unit 14 estimates the posture of each controller 30 recognized by the second recognition unit 145 .
  • the posture estimation unit 146 estimates the shape of the controller 30 based on the result of the subject recognition performed by the second recognition unit 145 on the RGB image signal 113, and estimates the posture of the controller 30 based on the estimated shape.
  • when the controller 30 has a sensor that detects the orientation of the controller 30 itself, the posture of the controller 30 may be determined by also taking the output of that sensor into account.
  • in this way, the posture estimation unit 146 estimates the posture of each controller 30. Note that the posture estimation unit 146 may estimate the posture of the controller 30 using a machine learning method with a trained model, similar to the coordinate calculation unit 142 described above.
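Since the publication only states that the posture is estimated from the recognized shape, optionally combined with the controller's own orientation sensor, the following is just a schematic way to picture it for a controller with left and right grip portions; the angle formula and the blending weight are assumptions.

```python
import math

def controller_roll_deg(left_grip_xy, right_grip_xy):
    """Rough in-image roll angle of a controller with left/right grip portions,
    computed from the two detected grip positions (0 deg = held level)."""
    dx = right_grip_xy[0] - left_grip_xy[0]
    dy = right_grip_xy[1] - left_grip_xy[1]
    return math.degrees(math.atan2(dy, dx))

def fuse_with_controller_imu(image_roll_deg, imu_roll_deg, imu_weight=0.7):
    """If the controller reports its own orientation, blend it with the
    image-based estimate instead of relying on either source alone."""
    return imu_weight * imu_roll_deg + (1.0 - imu_weight) * image_roll_deg

print(controller_roll_deg((300, 420), (420, 390)))  # about -14 degrees (tilted)
print(fuse_with_controller_imu(-14.0, -10.0))       # -> -11.2
```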
  • FIGS. 4A to 4D illustrate a controller 30 having left and right grip portions.
  • the state of the user estimated by the estimation unit 14 includes at least one of the posture of the user, the shape of the user's arms, or the shape of the user's fingers, as described above.
  • the posture of the user includes, for example, a state in which the user is seated on a chair, a state in which the user is standing, a state in which the user faces the camera unit 10, and a state in which the user faces sideways.
  • the shape of the user's arm includes, for example, a state in which the user raises the arm, a state in which the user moves the arm to take a predetermined pose, and the like.
  • the shape of the user's fingers includes, for example, a state in which the user is taking a predetermined pose such as a peace sign in which the user moves the fingers and raises two fingers, a state in which the user is gripping the controller 30, and the like.
  • the attitude of the controller 30 indicates which part of the controller 30 the user is holding and in what attitude the user is holding the controller 30 .
  • FIG. 4A illustrates a state in which the user takes a bow and arrow pose and holds the central portion of the controller 30 with one hand.
  • the example of FIG. 4A shows an example of a game in which a virtual bow V1 and arrow V2 are manipulated based on the user's state and the posture of the controller 30.
  • FIG. 4B illustrates a state in which the user holds the grip portion of the controller 30 with both hands and rotates it.
  • the example of FIG. 4B shows an example of a game in which the steering wheel of a motorcycle or the like is operated based on the state of the user and the posture of the controller 30.
  • FIG. 4C illustrates a state in which the user takes a pose of holding a bat and grips the grip portion on one side of the controller 30 with one hand.
  • the example of FIG. 4C shows an example of a game in which a virtual bat V3 is operated based on the state of the user and the posture of the controller 30.
  • FIG. 4D illustrates a state in which the user is in a shuriken throwing pose without holding the controller 30.
  • the example of FIG. 4D shows an example of a game in which a virtual shuriken V4 is manipulated based on the user's state without using the controller 30 .
  • the estimation processing by the estimation unit 14 described so far is merely an example, and how and when the RGB image signal 113 and the event signal 123 are used is not limited to this example.
  • the estimation unit 14 may calculate the amount of movement of part or all of the user included in the field of view and use the calculation result in the estimation process. The movement amount may be calculated using a known method such as block matching or a gradient method.
  • the three-dimensional posture data of the camera unit 10 detected by the IMU 13 may be used for the estimation processing by the estimation unit 14 .
  • the information output unit 15 outputs information indicating the state of the user estimated by the estimation unit 14 and information indicating the posture of the controller 30 to the information processing device 20 .
  • the information indicating the state of the user may be coordinate information calculated by the coordinate calculation unit 142 or information indicating the state of the user estimated by the state estimation unit 144 .
  • the information output unit 15 also outputs to the information processing device 20 information indicating the combination of each user and the controller 30 held by that user. Such information may be output, for example, as a table associating information indicating the user's state with information indicating the posture of the controller 30, by associating the information indicating the user's state with the identification information of the controller 30 held by that user, or by associating the information indicating the posture of the controller 30 with the information indicating the state of the user holding it. Further, by predetermining the relationship between the user's state, the posture of the controller 30, and user operations in a table or the like, the information output unit 15 may output information indicating the user's operation to the information processing device 20.
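The combination information sent to the information processing device 20 could be organized, for example, as one record per recognized user, pairing the user's state with the identifier and posture of the controller that user holds. The field names below are illustrative only.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class UserStateRecord:
    user_id: str
    user_state: str                     # e.g. "standing, gripping with both hands"
    controller_id: Optional[str]        # None when the user holds no controller
    controller_posture: Optional[dict]  # e.g. {"roll_deg": -11.2}

# one entry per user recognized by the first recognition unit
output_to_information_processing_device = [
    asdict(UserStateRecord("U1", "standing, gripping with both hands",
                           "30U1", {"roll_deg": -11.2})),
    asdict(UserStateRecord("U2", "seated, no controller", None, None)),
]
```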
  • the camera unit 10 functions as an operating device for accepting user operations in the same way as the controller 30 by estimating the user's state and the orientation of the controller 30 .
  • the user operation can be identified based on the state of the user estimated by the estimation unit 14 and the posture of the controller 30 .
  • the camera unit 10 alone completes everything from the generation of the RGB image signal 113 and the event signal 123 to the estimation of the user's state and the posture of the controller 30, and can output the estimation results to the information processing device 20 without outputting the RGB image signal 113 and the event signal 123.
  • the camera unit 10 preferably has an independent power source.
  • the information processing device 20 is implemented by a computer having, for example, a communication interface, a processor, and a memory, and includes a communication section 21 and a control section 22.
  • the control unit 22 includes functions of a control value calculation unit 221 and an image generation unit 222 implemented by the processor operating according to a program stored in memory or received via a communication interface. The function of each unit will be further described below.
  • the communication section 21 receives the information output from the information output section 15 of the camera unit 10. The communication section 21 can also communicate with the controller 30 and outputs images to be displayed on the display device 40. The control value calculation unit 221 of the control unit 22 calculates a control value for feedback control to external devices including the controller 30 and the display device 40, based on at least one of the information received from the information output section 15 of the camera unit 10 and the information received from the controller 30. As described above, the camera unit 10 and the controller 30 each function as an operation device for accepting user operations. Therefore, the control value calculation unit 221 calculates the control value for feedback control to the external devices including the controller 30 and the display device 40 according to a user operation performed via at least one of the camera unit 10 and the controller 30.
  • the calculated control value is output to the controller 30 and the display device 40 via the communication section 21 .
  • the image generation unit 222 of the control unit 22 generates a display image to be displayed on the display device 40 according to the control value calculated by the control value calculation unit 221.
  • the generated display image is output to the display device 40 via the communication section 21 . Details of calculation of the control values and generation of the display image will be described in connection with the description of the configurations of the controller 30 and the display device 40, which will be described later.
  • the controller 30 includes a communication section 31, an operation section 32, a force sense presentation section 33, a vibration section 34, and an audio output section 35, as shown in FIG. 2.
  • a user can perform various operations related to the game by operating the controller 30 .
  • the communication unit 31 receives the control values output from the communication unit 21 of the information processing device 20 and outputs the control values to the force sense presentation unit 33 , the vibration unit 34 , and the audio output unit 35 .
  • the communication unit 31 also outputs information regarding user operations received by the operation unit 32 to the information processing device 20 .
  • the operation unit 32 includes a plurality of operators such as buttons and pads, and accepts user's operation input to the operators.
  • the haptic presentation unit 33 is provided in at least a part of the operation elements of the operation unit 32 , and presents the user with a force that resists or interlocks with the user's operation according to the control value supplied from the information processing device 20 .
  • the force sense presentation unit 33 can be configured by a motor, an actuator, or the like including a rotating rotor.
  • a well-known device can be used as the force sense presentation device constituting the force sense presentation unit 33, and a detailed description thereof is omitted here.
  • the vibrating section 34 generates vibration according to a control value supplied from the information processing device 20, and can be configured by a motor, for example.
  • the vibrating unit 34 can notify the user that the user operation has been correctly performed and has been recognized by the information processing device 20 by generating vibration when the user operation is performed.
  • the audio output unit 35 outputs audio according to the control value supplied from the information processing device 20, and can be configured by, for example, a speaker.
  • the audio output unit 35 can notify the user that the user operation has been correctly performed and has been recognized by the information processing apparatus 20 by outputting audio when the user operation is performed.
  • By performing at least one of the vibration by the vibration unit 34 and the audio output by the audio output unit 35 in conjunction with the presentation of the force sense by the force sense presentation unit 33 described above, the variety of feedback control presented to the user can be enhanced.
  • the control value calculation unit 221 of the information processing device 20 calculates control values for feedback control to the controller 30, more specifically, control values for feedback control to the force sense presentation unit 33, the vibration unit 34, and the audio output unit 35 of the controller 30. For the force sense presentation unit 33, the control value calculation unit 221 calculates a control value indicating what kind of force sense is to be presented as feedback control according to the user's operation. For the vibration unit 34, the control value calculation unit 221 calculates a control value indicating what kind of vibration is to be generated as feedback control according to the user's operation.
  • For the audio output unit 35, the control value calculation unit 221 calculates a control value indicating what kind of audio is to be output as feedback control according to the user's operation.
  • the calculation of the control value by the control value calculator 221 can be performed according to a predetermined calculation formula, table, or the like.
  • when a plurality of users are recognized, the control value calculation unit 221 calculates the control value for each combination of a user and the controller 30 held by that user.
  • for example, when a user operation of operating the virtual bow V1 and arrow V2 shown in FIG. 4A is performed, the control value calculation unit 221 calculates, for the force sense presentation unit 33 and the vibration unit 34 of the controller 30, control values indicating the presentation of a force sense corresponding to the recoil of actually shooting a bow and arrow and the generation of a corresponding vibration, based on the state of the user and the posture of the controller 30. The control value calculation unit 221 also calculates a control value indicating the output of a sound corresponding to the sound of actually shooting a bow and arrow.
  • as shown in the example of FIG. 4B, when a user operation of operating the steering wheel is performed, the control value calculation unit 221 calculates, for the force sense presentation unit 33 and the vibration unit 34 of the controller 30, control values indicating the presentation of a force sense corresponding to the reaction to the steering operation and the generation of a corresponding vibration, based on the state of the user and the posture of the controller 30. The control value calculation unit 221 also calculates a control value indicating the output of a sound corresponding to the steering operation.
  • as shown in the example of FIG. 4C, when a user operation of operating the virtual bat V3 is performed, the control value calculation unit 221 calculates, for the force sense presentation unit 33 and the vibration unit 34 of the controller 30, control values indicating the presentation of a force sense corresponding to the recoil of actually swinging a bat and the generation of a corresponding vibration, based on the state of the user and the posture of the controller 30. The control value calculation unit 221 also calculates a control value indicating the output of a sound corresponding to actually swinging a bat. Note that, as shown in the example of FIG. 4D, when the user operation is performed without using the controller 30, the control value calculation unit 221 does not calculate a control value for feedback control to the controller 30.
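As noted above, the control values can be obtained from a predetermined formula or table. A toy table-driven version, with invented operation names and values, might look like this:

```python
# Assumed feedback table: operation -> control values for force sense, vibration, sound.
FEEDBACK_TABLE = {
    "shoot_bow":   {"force": 0.8, "vibration": 0.6, "sound": "bow_release.wav"},
    "turn_handle": {"force": 0.4, "vibration": 0.3, "sound": "engine.wav"},
    "swing_bat":   {"force": 0.9, "vibration": 0.8, "sound": "bat_hit.wav"},
}

def calculate_control_values(operation: str, holds_controller: bool):
    """Return control values for the controller's force sense presentation unit,
    vibration unit, and audio output unit; no values are produced when the
    operation was performed without a controller (as in the shuriken example)."""
    if not holds_controller:
        return None
    return FEEDBACK_TABLE.get(operation)

print(calculate_control_values("shoot_bow", holds_controller=True))
print(calculate_control_values("throw_shuriken", holds_controller=False))  # -> None
```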
  • the controller 30 may be configured as a pair of controllers that can be held in both hands, as a controller that allows character input such as a keyboard, or by an application running on a device such as a smartphone. Further, the controller 30 may be provided with a contact sensor, and information indicating the user's contact state with the contact sensor may be supplied to the information processing device 20 via the communication unit 31. Such information can be used for the calculation of the control value by the control value calculation unit 221. The controller 30 may also be provided with a voice input unit, and voice recognition technology may be applied: for example, the controller 30 may include a voice input unit such as a microphone and a voice recognition unit, and may supply commands uttered by the user and information indicating user calls to the information processing device 20 via the communication unit 31.
  • the display device 40 includes a reception unit 41 and a display unit 42, as shown in FIG. 2.
  • the reception unit 41 receives information indicating the display image generated by the image generation unit 222 of the information processing device 20 via the communication unit 21 .
  • the display unit 42 has a monitor such as an LCD (Liquid Crystal Display) or an organic EL, and can present the information to the user by displaying a display image based on the information received by the reception unit 41 .
  • the display device 40 described above may be configured by the dedicated display device shown in FIG. 1, or may be configured by a display device such as an HMD mounted on the user's head.
  • the display unit of the HMD includes a display element such as an LCD (Liquid Crystal Display) or organic EL element and an optical device such as a lens; the display element may be a transmissive display element.
  • wearable devices such as AR (Augmented Reality) glasses and MR (Mixed Reality) glasses may be used as the HMD.
  • the display device 40 described above may be configured by a display device of a computer, or may be configured by a display device of a terminal device such as a smart phone.
  • a touch panel for detecting contact may be provided on the surface of the display unit 42 .
  • the control value calculation unit 221 of the information processing device 20 also calculates a control value for feedback control to the display image displayed on the display device 40. More specifically, the control value calculation unit 221 calculates a control value indicating how the display image is to be changed as feedback control according to the user's operation. The calculation of the control value by the control value calculation unit 221 can be performed according to a predetermined calculation formula, table, or the like.
  • the image generation unit 222 of the information processing device 20 generates a display image to be displayed on the display device 40 according to the control value calculated by the control value calculation unit 221, as described above. More specifically, the image generator 222 generates a new display image to be displayed on the display device 40 according to the control value for changing the display image.
  • the state of the user estimated by the camera unit 10 and the posture of the controller 30 are reflected in the generation of the display image. Therefore, for example, when the user stands still and the posture of the controller 30 does not change, the generated display image changes little or not at all, whereas when the user or the posture of the controller 30 moves, the generated display image changes according to the user's operation. Also, when a plurality of users are included in the field of view of the camera unit 10, the generated display image changes according to the number of users.
  • FIG. 5 is a flow chart showing an example of processing of the camera unit 10 according to one embodiment of the present invention.
  • the image sensor 111 of the RGB camera 11 generates the RGB image signal 113 (step S101) and the sensor 121 of the EDS 12 generates the event signal 123 (step S102).
  • the first recognition unit 141 recognizes the user (step S103), and the coordinate calculation unit 142 and state estimation unit 144 estimate the state of the user (step S104).
  • the second recognition unit 145 recognizes the controller 30 (step S105), and the posture estimation unit 146 estimates the posture of the controller 30 (step S106).
  • the information output unit 15 outputs information indicating the state of the user, information indicating the posture of the controller 30, and information indicating the combination of the user and the controller held by the user (step S107).
  • the estimating unit 14 continues outputting information by repeating steps S103 to S107 (the processing of steps S101 and S102 is also repeated, but the cycle may not necessarily be the same as the processing after step S103).
  • the camera unit 10 always outputs information indicating the latest state of the user, information indicating the posture of the controller 30, and information indicating the combination of the user and the controller 30 held by the user.
  • the information processing device 20 calculates control values for feedback control to the controller 30 based on these pieces of information, thereby realizing suitable feedback control according to changes in the state of the user and the posture of the controller 30.
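Putting the flow of FIG. 5 together, the camera unit's repeating loop could be pictured roughly as follows; every name is a placeholder for the corresponding unit described above, not an API defined in the publication.

```python
def camera_unit_loop(sensors, estimator, output):
    """Repeat steps S103-S107: recognize users, estimate states and controller
    postures, and keep the latest information flowing to the information
    processing device. (RGB/event signal generation runs on its own cycle.)"""
    while True:
        rgb, events = sensors.latest_rgb(), sensors.latest_events()            # S101, S102
        users = estimator.recognize_users(rgb, events)                          # S103
        states = {u: estimator.estimate_state(u, rgb) for u in users}           # S104
        controllers = {u: estimator.recognize_controller(u, rgb, events)
                       for u in users}                                          # S105
        postures = {u: estimator.estimate_posture(c)
                    for u, c in controllers.items() if c is not None}           # S106
        output.send(states, postures, controllers)                              # S107
```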
  • For example, consider a case where the number of users changes, as illustrated in FIGS. 6A and 6B. As shown in FIG. 6A, when a single user U1 is present in the field of view of the camera unit 10, the camera unit 10 outputs information indicating the state of the user U1 and the posture of the controller 30U1 held by the user U1, and the information processing device 20 calculates a control value for feedback control to the controller 30U1.
  • Then, as shown in FIG. 6B, when another user U2 enters the field of view of the camera unit 10, the camera unit 10 outputs information indicating the states of the user U1 and the user U2, information indicating the postures of the controller 30U1 held by the user U1 and the controller 30U2 held by the user U2, and information indicating the combinations of the users and the controllers 30 held by them, and the information processing device 20 calculates control values for feedback control to the controller 30U1 and the controller 30U2.
  • In other words, even when the number of users changes, the information indicating the users' states, the information indicating the postures of the controllers 30, and the information indicating the combinations of users and the controllers 30 held by them are updated, so it is possible to realize feedback control that always matches the latest situation.
  • Next, as illustrated in FIGS. 7A and 7B, consider a case where the number of users changes and the user who holds the controller 30 changes. As shown in FIG. 7A, when a single user U1 is present in the field of view of the camera unit 10, the camera unit 10 outputs information indicating the state of the user U1 and the posture of the controller 30U1 held by the user U1, and the information processing device 20 calculates a control value for feedback control to the controller 30U1.
  • After that, as shown in FIG. 7B, when another user U2 enters the field of view of the camera unit 10 and the user U2 takes over the controller 30U1 that was held by the user U1, the camera unit 10 outputs information indicating the states of the user U1 and the user U2, information indicating the posture of the controller 30U1 held by the user U2, and information indicating the combination of the user and the controller 30 held by the user, and the information processing device 20 calculates a control value for feedback control to the controller 30U1.
  • In other words, even if the user holding a certain controller 30 changes dynamically, the information indicating the users' states, the information indicating the posture of the controller 30, and the information indicating the combination of the user and the controller 30 held by the user are updated, so it is possible to realize feedback control that always matches the latest situation.
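The "always matches the latest situation" behaviour amounts to routing feedback through the most recent user-controller combinations on every update rather than caching an initial assignment. A small sketch with invented identifiers:

```python
def route_feedback(latest_combinations, control_values_by_user):
    """Send each user's control value to whichever controller that user holds
    right now, so a hand-over from U1 to U2 (FIG. 7B) is picked up on the
    next update without any extra bookkeeping."""
    per_controller = {}
    for user_id, controller_id in latest_combinations.items():
        if controller_id is not None and user_id in control_values_by_user:
            per_controller[controller_id] = control_values_by_user[user_id]
    return per_controller

# FIG. 7A: U1 holds 30U1     -> feedback for U1 goes to 30U1
print(route_feedback({"U1": "30U1"}, {"U1": {"vibration": 0.6}}))
# FIG. 7B: U2 now holds 30U1 -> feedback for U2 goes to 30U1 instead
print(route_feedback({"U1": None, "U2": "30U1"}, {"U2": {"vibration": 0.6}}))
```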
  • As described above, in one embodiment of the present invention, the camera unit 10 estimates the state of the user and the posture of the controller 30 held by the user based on the RGB image signal 113 and the event signal 123 generated by the image sensor 111 and the sensor 121, respectively, and outputs information indicating the estimation results, and the information processing device 20 calculates the control value for feedback control to the controller 30 based on at least one of the information indicating the state of the user and the information indicating the posture of the controller 30. Therefore, when a user operation is performed via the camera unit 10, suitable feedback control to the controller 30 can be realized while suppressing latency.
  • the camera unit 10 performs everything from the generation of the RGB image signal 113 and the event signal 123 to the estimation of the user's state and the posture of the controller 30, and outputs only information indicating the estimation results without outputting the RGB image signal 113 and the event signal 123, which reduces problems of communication load and communication delay. Furthermore, since the RGB image signal 113 and the event signal 123 need not be output, this is also useful in terms of privacy protection.
  • since the camera unit 10 of one embodiment of the present invention can estimate the user's state and the posture of the controller 30 and accept user operations, there is no need to maintain a cursor position as with a conventional pointing-device-type operation device, and physical fatigue is not imposed on the user.
  • the camera unit 10 does not require the user to wear a marker or an attachment to be recognized, unlike a conventional posture detection type operating device.
  • both the EDS 12 and the RGB camera 11 are provided, and the user's state and the attitude of the controller 30 are estimated based on the event signal 123 and the RGB image signal 113 . Therefore, it is possible to realize suitable processing that takes advantage of the respective characteristics of the RGB image signal 113 and the event signal 123 .
  • the user's state estimated by the estimation unit 14 includes at least one of the user's posture, the shape of the user's arm, or the shape of the user's fingers. Therefore, it is possible to estimate the characteristic user state and accurately grasp the intention and content of the user's operation.
  • the estimation unit 14 calculates the coordinate information of at least one joint of the user included in the image based on the RGB image signal 113, using a trained model constructed by learning the relationship between an image of a person having multiple joints and coordinate information indicating the positions of those joints, and estimates the state of the user based on the coordinate information. Therefore, the user's state can be estimated accurately and quickly.
  • the first recognition unit 141 recognizes one or more users included in the object scene based on at least one of the RGB image signal 113 and the event signal 123, and the estimation unit 14 estimates the user's state and the posture of the controller 30 held by the user for each user recognized by the first recognition unit 141 . Therefore, it is possible to grasp user operations for each of a plurality of users included in the field of the camera unit 10 .
  • the second recognition unit 145 recognizes the controller 30 held by each user recognized by the first recognition unit 141, and information indicating the combination of the user recognized by the first recognition unit 141 and the controller 30 recognized by the second recognition unit 145 is output. Therefore, even when a plurality of controllers 30 are used by a plurality of users, user operations based on the combinations of users and controllers 30 can be grasped and reflected in the feedback control.
  • FIG. 8 is a block diagram showing a schematic configuration of a system according to another embodiment of the present invention. FIG. 8 shows the configuration of a system 2 having a server 50 and a terminal device 60 instead of the information processing device 20 of FIG. 2; constituent elements having substantially the same functional configuration are given the same reference numerals.
  • the server 50 is a server (for example, a cloud server) communicably connected to the camera unit 10 and the terminal device 60 via the Internet communication network or wirelessly.
  • the server 50 has the same configuration as the information processing apparatus 20 described with reference to FIG. 2, and performs various processes based on the information output by the camera unit 10.
  • the terminal device 60 also includes a communication unit 61 , and the communication unit 61 receives information output from the server 50 .
  • the communication unit 61 can communicate with the controller 30 and outputs an image to be displayed on the display device 40, like the communication unit 21 of the information processing apparatus 20 described with reference to FIG.
  • In this configuration as well, the camera unit 10 performs everything from the generation of the RGB image signal 113 and the event signal 123 to the estimation of the user's state, and outputs the estimated information to the server 50, so that similar effects can be obtained in a game system that uses a server such as a cloud server.
  • the number of RGB cameras 11 and EDS 12 may be the same or may be different. Also, the number of RGB cameras 11 and EDS 12 may be one or more.
  • when a plurality of RGB cameras 11 are provided, it is possible to expand the range of the field of view for generating the RGB image signals 113, or to estimate the state of a person three-dimensionally from the plurality of RGB image signals 113.
  • when a plurality of EDS 12 are provided, the range of the field of view for generating the event signals 123 can be expanded, and the three-dimensional movement amount of a person can be calculated based on the plurality of event signals 123.
  • the camera unit 10 described in each of the above examples may be implemented within a single device, or may be implemented distributed among a plurality of devices.
  • at least a portion of each sensor may be provided independently, and another configuration may be implemented as the camera unit 10 main body.

Abstract

Provided is a system including a sensor device, a controller, and an information processing device. The sensor device comprises: a first image sensor for generating a first image signal by synchronously scanning all pixels at a predetermined timing; a second image sensor including an event-driven vision sensor for generating a second image signal; an estimation unit for estimating, on the basis of the first image signal and the second image signal, a state of a user, and a posture of the controller held by the user; and an information output unit for outputting the estimation result. The information processing device comprises a control value calculation unit for calculating a control value for feedback control to the controller, on the basis of at least one of information indicating the state of the user, and information indicating the posture of the controller. The controller has at least one of a force sense presentation device, a vibration device, and a sound output device.

Description

System, information processing method, and information processing program
The present invention relates to a system, an information processing method, and an information processing program.
An event-driven vision sensor is known in which pixels that detect a change in the intensity of incident light generate signals asynchronously in time. An event-driven vision sensor is advantageous in that it can operate at lower power and higher speed than a frame-type vision sensor that scans all pixels at predetermined intervals, specifically an image sensor such as a CCD or CMOS sensor. Techniques related to such event-driven vision sensors are described in Patent Document 1 and Patent Document 2, for example.
Patent Document 1: Japanese Patent Publication No. 2014-535098
Patent Document 2: JP 2018-85725 A
However, although the above advantages of event-driven vision sensors are known, it can hardly be said that sufficient proposals have been made for methods of using them in combination with other devices.
Therefore, an object of the present invention is to provide a system, an information processing method, and an information processing program capable of realizing suitable feedback control to a controller while suppressing latency, by estimating the state of a user and the posture of a controller held by the user based on an image signal generated by a sensor that synchronously scans all pixels at a predetermined timing and an image signal generated by an event-driven vision sensor, and by calculating a control value for feedback control to the controller based on the estimation result.
According to one aspect of the present invention, there is provided a system including a sensor device, a controller that receives user operations, and an information processing device that performs processing based on the user operations. The sensor device includes a first image sensor that generates a first image signal by synchronously scanning all pixels at a predetermined timing, a second image sensor including an event-driven vision sensor that asynchronously generates a second image signal when it detects a change in the intensity of light incident on each pixel, an estimation unit that estimates the state of a user and the posture of the controller held by the user based on the first image signal and the second image signal, and an information output unit that outputs information indicating the state of the user and information indicating the posture of the controller. The information processing device includes a control value calculation unit that calculates a control value for feedback control to the controller based on at least one of the information indicating the state of the user and the information indicating the posture of the controller. The controller has at least one of a force sense presentation device that presents a force sense based on the control value, a vibration device that vibrates based on the control value, and an audio output device that outputs sound based on the control value.
According to another aspect of the present invention, there is provided an information processing method for outputting a control value for feedback control to a controller, the method including an estimation step of estimating the state of a user and the posture of the controller held by the user based on a first image signal generated by a first image sensor that synchronously scans all pixels at a predetermined timing and a second image signal generated by a second image sensor including an event-driven vision sensor that asynchronously generates the second image signal when it detects a change in the intensity of light incident on each pixel, and an information output step of outputting information indicating the state of the user and information indicating the posture of the controller.
According to still another aspect of the present invention, there is provided an information processing program that causes a computer to realize a function of estimating the state of a user and the posture of a controller held by the user based on a first image signal generated by a first image sensor that synchronously scans all pixels at a predetermined timing and a second image signal generated by a second image sensor including an event-driven vision sensor that asynchronously generates the second image signal when it detects a change in the intensity of light incident on each pixel, and a function of outputting information indicating the state of the user and information indicating the posture of the controller.
According to the above configurations, the state of the user and the posture of the controller held by the user are estimated based on the image signal generated by the sensor that synchronously scans all pixels at a predetermined timing and the image signal generated by the event-driven vision sensor, and a control value for feedback control to the controller is calculated based on the estimation result. Therefore, suitable feedback control to the controller can be realized while suppressing latency.
FIG. 1 is a schematic diagram showing the entire system according to one embodiment of the present invention. FIG. 2 is a block diagram showing a schematic configuration of a system according to one embodiment of the present invention. FIG. 3 is a block diagram showing a schematic configuration of an estimation unit in the system according to one embodiment of the present invention. FIG. 4A is a diagram for explaining an example of estimation in one embodiment of the present invention, and FIGS. 4B, 4C, and 4D are further diagrams for explaining examples of estimation in one embodiment of the present invention. FIG. 5 is a flowchart illustrating an example of a processing method according to one embodiment of the present invention. FIG. 6A is a diagram explaining feedback control in one embodiment of the present invention, and FIGS. 6B, 7A, and 7B are further diagrams explaining feedback control in one embodiment of the present invention. FIG. 8 is a block diagram showing a schematic configuration of a system according to another embodiment of the present invention.
Several embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In the present specification and drawings, constituent elements having substantially the same functional configuration are denoted by the same reference numerals, and redundant description thereof is omitted.
FIG. 1 is a schematic diagram showing the entire system 1 according to this embodiment.
The system 1 according to the present embodiment is a game system including a camera unit 10, an information processing device 20, one or more controllers 30, and a display device 40, as shown in FIG. 1. The information processing device 20 is connected to each of the camera unit 10, the controller 30, and the display device 40 via a wired or wireless network.
In the system 1, the information processing device 20 advances the game in accordance with information transmitted from the camera unit 10 and the controller 30, and the display device 40 displays a screen generated during execution by the information processing device 20, such as a game screen.
 In the present embodiment, the camera unit 10 estimates the state of the user who is playing the game and the attitude of the controller 30 held by that user, and transmits the estimation results to the information processing device 20. Here, the state of the user includes at least one of the posture of the user, the shape of the user's arms, and the shape of the user's fingers.
 By estimating and outputting the state of the user and the attitude of the controller 30, the camera unit 10 functions, like the controller 30, as an operation device for accepting user operations. To estimate the state of the user and the attitude of the controller 30, such a camera unit 10 is placed at a position where the user fits within its field of view, for example at a distance of about 1 meter from the user. In the example of FIG. 1, the camera unit 10 is placed near the display device 40. The optimum placement of the camera unit 10 differs depending on the purpose. For example, depending on the content of the game being played, it is desirable to place the camera unit 10 at a position where the target to be captured, such as the user's whole body, upper body, or hands, fits within the field of view.
 When the camera unit 10 is placed, the information processing device 20 may, for example, display a tutorial or the like on the display device 40 to guide the user to place the camera unit 10 at an appropriate position.
 Each component of the system 1 will be described below.
 FIG. 2 is a block diagram showing a schematic configuration of a system according to one embodiment of the present invention.
 The camera unit 10 includes an RGB camera 11, an EDS (Event Driven Sensor) 12, an IMU (Inertial Measurement Unit) 13, an estimation unit 14, and an information output unit 15.
 The RGB camera 11 includes an image sensor 111, which is a first image sensor, and a processing circuit 112 connected to the image sensor 111. The image sensor 111 generates an RGB image signal 113 by synchronously scanning all pixels, for example at a predetermined period or at a predetermined timing according to a user operation. The processing circuit 112 converts the RGB image signal 113 into, for example, a format suitable for storage and transmission. The processing circuit 112 also gives a timestamp to the RGB image signal 113.
 The EDS 12 includes a sensor 121, which is a second image sensor constituting a sensor array, and a processing circuit 122 connected to the sensor 121. The sensor 121 includes a light-receiving element and is an event-driven vision sensor that generates an event signal 123 when it detects a change in the intensity of light incident on each pixel, more specifically a luminance change exceeding a predetermined value. The event signal 123 output through the processing circuit 122 includes identification information of the sensor 121 (for example, the pixel position), the polarity of the luminance change (rising or falling), and a timestamp. When detecting luminance changes, the EDS 12 can generate event signals 123 at a significantly higher frequency than the generation frequency of the RGB image signal 113 (the frame rate of the RGB camera 11).
 In this specification, a signal from which an image can be constructed is referred to as an image signal. The RGB image signal 113 and the event signal 123 are therefore examples of image signals.
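 As an illustration of the data carried by these two kinds of image signal, the following minimal Python sketch models a single event record with the fields named above (pixel position, polarity of the luminance change, and timestamp); the class and field names are hypothetical and do not reflect an actual sensor interface.

```python
from dataclasses import dataclass

# Illustrative sketch only: the class and field names are hypothetical,
# not the interface of the EDS 12 or the sensor 121.
@dataclass(frozen=True)
class Event:
    x: int             # pixel column of the sensor element that fired
    y: int             # pixel row
    polarity: bool     # True = luminance rose, False = luminance fell
    timestamp_us: int  # timestamp in microseconds

# Unlike a frame-based RGB image signal, which arrives as a dense array per scan,
# events accumulate asynchronously and only where the luminance actually changed.
events = [Event(x=120, y=64, polarity=True, timestamp_us=1_000_123)]
```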
 In the present embodiment, the timestamps given to the RGB image signal 113 and the event signal 123 are synchronized. Specifically, the timestamps can be synchronized by, for example, providing the RGB camera 11 with the time information used to generate timestamps in the EDS 12. Alternatively, when the time information used to generate timestamps is independent for the RGB camera 11 and the EDS 12, the timestamps given to the RGB image signal 113 and the event signal 123 can be synchronized after the fact by calculating a timestamp offset with reference to the time at which a specific event (for example, a change of the subject over the entire image) occurred.
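 The after-the-fact synchronization described above can be pictured with the following sketch, which assumes that the same reference event (for example, a scene-wide change) was stamped by both clocks; the function names and microsecond units are illustrative assumptions, not part of the embodiment.

```python
def estimate_timestamp_offset(rgb_ref_time_us: int, eds_ref_time_us: int) -> int:
    # Offset (in microseconds) that maps RGB timestamps onto the EDS time base,
    # derived from the moment both devices stamped the same reference event.
    return eds_ref_time_us - rgb_ref_time_us

def to_eds_clock(rgb_timestamp_us: int, offset_us: int) -> int:
    # Convert an RGB image signal timestamp to the EDS time base after the fact.
    return rgb_timestamp_us + offset_us

# Example: the reference event was stamped at 2_000_000 us by the RGB camera and
# at 2_000_450 us by the EDS, so RGB timestamps are shifted by +450 us.
offset = estimate_timestamp_offset(2_000_000, 2_000_450)
print(to_eds_clock(2_000_000, offset))  # -> 2000450
```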
 Further, in the present embodiment, a calibration procedure of the RGB camera 11 and the EDS 12 performed in advance associates the sensors 121 of the EDS 12 with one or more pixels of the RGB image signal 113, so that each event signal 123 is generated in response to a change in light intensity at one or more pixels of the RGB image signal 113. More specifically, for example, a common calibration pattern is imaged by the RGB camera 11 and the EDS 12, and correspondence parameters between the camera and the sensor are calculated from the respective internal and external parameters of the RGB camera 11 and the EDS 12, whereby the sensors 121 can be associated with one or more pixels of the RGB image signal 113.
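 One way to picture the pixel correspondence obtained by such a calibration is the planar mapping sketched below; the 3x3 homography H is a hypothetical example of correspondence parameters that would be computed offline from the internal and external parameters of both devices, and the function name is illustrative.

```python
import numpy as np

def map_eds_pixel_to_rgb(h: np.ndarray, x: float, y: float) -> tuple[float, float]:
    # Apply a 3x3 homography (computed offline from the calibration of both
    # devices on a common pattern) to map an EDS pixel to RGB pixel coordinates.
    p = h @ np.array([x, y, 1.0])
    return float(p[0] / p[2]), float(p[1] / p[2])

# Hypothetical correspondence parameters; real values come from calibration.
H = np.array([[2.0, 0.0, 5.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 1.0]])
print(map_eds_pixel_to_rgb(H, 100.0, 50.0))  # -> (205.0, 103.0)
```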
 The IMU 13 is a sensor that detects the attitude of the camera unit 10 itself. The IMU 13 acquires three-dimensional attitude data of the camera unit 10 at a predetermined period or at a predetermined timing and outputs it to the estimation unit 14.
 Based on the event signal 123 generated by the EDS 12 and the RGB image signal 113 generated by the RGB camera 11, the estimation unit 14 estimates the state of the user and the attitude of the controller 30 held by that user. FIG. 3 is a block diagram showing a schematic configuration of the estimation unit 14. The estimation unit 14 includes a first recognition unit 141, a coordinate calculation unit 142, a trained model 143, a state estimation unit 144, a second recognition unit 145, and an attitude estimation unit 146.
 The first recognition unit 141 of the estimation unit 14 recognizes one or more users included in the field of view of the camera unit 10 based on at least one of the RGB image signal 113 and the event signal 123. For example, the first recognition unit 141 detects an object present in a continuous pixel region in which the event signal 123 indicates that events of the same polarity have occurred, and recognizes the user by performing object recognition on the corresponding portion of the RGB image signal 113. When a plurality of users are included in the field of view of the camera unit 10, the first recognition unit 141 identifies each user.
 For each user recognized by the first recognition unit 141, the coordinate calculation unit 142 of the estimation unit 14 calculates, from the RGB image signal 113 and based on the trained model 143, coordinate information indicating the positions of a plurality of joints of the user. The trained model 143 can be constructed in advance by, for example, performing supervised learning with images of persons having a plurality of joints as input data and coordinate information indicating the positions of those joints as ground-truth data. A detailed description of specific machine learning techniques is omitted because various known techniques can be used. The estimation unit 14 may also include a relationship learning unit so that, each time an RGB image signal 113 is input, the relationship between the image based on the input RGB image signal 113 and the coordinate information indicating the joint positions is learned and the trained model 143 is updated. Then, for each user recognized by the first recognition unit 141, the state estimation unit 144 of the estimation unit 14 estimates the state of the user based on the coordinate information calculated by the coordinate calculation unit 142.
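 The division of labor between the coordinate calculation unit 142 and the state estimation unit 144 can be sketched as follows; the stand-in model and the toy posture rule are assumptions made for illustration only and do not represent the actual trained model 143 or the actual state estimation logic.

```python
import numpy as np

# Hypothetical stand-in for the trained model 143; a real implementation would be
# a supervised pose-estimation network trained on (person image, joint coordinates).
class TrainedJointModel:
    def predict(self, rgb_image: np.ndarray) -> np.ndarray:
        # Dummy output: an (N_joints, 2) array of (x, y) pixel coordinates.
        return np.array([[320, 80], [320, 200], [300, 420], [340, 420]], dtype=float)

def estimate_user_state(joints_xy: np.ndarray) -> str:
    # Toy rule standing in for the state estimation unit 144: classify the posture
    # from the vertical spread of the detected joints.
    spread = joints_xy[:, 1].max() - joints_xy[:, 1].min()
    return "standing" if spread > 300.0 else "seated"

model = TrainedJointModel()
joints = model.predict(np.zeros((480, 640, 3), dtype=np.uint8))  # placeholder frame
print(estimate_user_state(joints))  # -> standing
```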
 For each user recognized by the first recognition unit 141, the second recognition unit 145 of the estimation unit 14 recognizes the controller 30 held by that user based on at least one of the RGB image signal 113 and the event signal 123. For example, the second recognition unit 145 recognizes the controller 30 by performing object recognition on the portion of the RGB image signal 113 corresponding to the vicinity of the user's hands, based on the coordinate information indicating the positions of the user's joints calculated by the coordinate calculation unit 142. When a plurality of users are recognized by the first recognition unit 141, the second recognition unit 145 identifies the controller 30 for each user. The identification of the controllers 30, that is, determining which user holds which controller 30, may be performed in any manner. For example, an identification mark or the like may be attached to each of the plurality of controllers 30 and the determination made based on the RGB image signal 113, or each of the plurality of controllers 30 may output a predetermined identification signal and the determination made based on the identification signal received by the camera unit 10 or the information processing device 20. When the user holds no controller 30, the second recognition unit 145 recognizes that the user does not hold a controller 30.
 The attitude estimation unit 146 of the estimation unit 14 estimates the attitude of each controller 30 recognized by the second recognition unit 145. For example, the attitude estimation unit 146 estimates the shape of the controller 30 from the RGB image signal 113 based on the result of the object recognition performed by the second recognition unit 145, and estimates the attitude of the controller 30 based on the estimated shape. When the controller 30 has a sensor that detects the attitude of the controller 30 itself, the attitude of the controller 30 may be estimated with the output of that sensor taken into account. When a plurality of controllers 30 are recognized by the second recognition unit 145, the attitude estimation unit 146 estimates the attitude of each of them. Like the coordinate calculation unit 142 described above, the attitude estimation unit 146 may estimate the attitude of the controller 30 using a machine learning technique with a trained model.
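 As a rough illustration of attitude estimation from a recognized shape, the sketch below infers a roll angle from the image positions of the two grip ends and optionally blends it with a controller-reported attitude; the functions, the blending weight, and the sign convention are all illustrative assumptions.

```python
import math

def estimate_controller_roll(left_grip_xy: tuple[float, float],
                             right_grip_xy: tuple[float, float]) -> float:
    # Toy stand-in for the attitude estimation unit 146: infer a roll angle in
    # degrees from the image positions of the two recognized grip ends.
    dx = right_grip_xy[0] - left_grip_xy[0]
    dy = right_grip_xy[1] - left_grip_xy[1]
    return math.degrees(math.atan2(dy, dx))

def fuse_with_controller_sensor(image_roll_deg: float, sensor_roll_deg: float,
                                sensor_weight: float = 0.7) -> float:
    # If the controller reports its own attitude, blend it with the image-based
    # estimate; the weighting is arbitrary and purely illustrative.
    return sensor_weight * sensor_roll_deg + (1.0 - sensor_weight) * image_roll_deg

image_roll = estimate_controller_roll((100.0, 210.0), (260.0, 190.0))
print(fuse_with_controller_sensor(image_roll, -6.5))
```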
 FIG. 4A to FIG. 4D are diagrams for explaining examples of the estimation of the user's state and the estimation of the attitude of the controller 30 by the estimation unit 14. FIG. 4A to FIG. 4D illustrate a controller 30 having grip portions on its left and right sides.
 As described above, the state of the user estimated by the estimation unit 14 includes at least one of the posture of the user, the shape of the user's arms, and the shape of the user's fingers. The posture of the user includes, for example, a state in which the user is seated on a chair or the like, a state in which the user is standing, a state in which the user faces the camera unit 10, and a state in which the user faces sideways. The shape of the user's arms includes, for example, a state in which the user raises an arm and a state in which the user moves an arm to take a predetermined pose. The shape of the user's fingers includes, for example, a state in which the user moves the fingers to take a predetermined pose such as a peace sign with two fingers raised, and a state in which the user grips the controller 30. The attitude of the controller 30 indicates which part of the controller 30 the user grips and in what attitude the user holds the controller 30.
 FIG. 4A illustrates a state in which the user takes a pose of drawing a bow and grips the central portion of the controller 30 with one hand. The example of FIG. 4A shows a game in which a virtual bow V1 and arrow V2 are operated based on the state of the user and the attitude of the controller 30.
 FIG. 4B illustrates a state in which the user grips the grip portions of the controller 30 with both hands and rotates it. The example of FIG. 4B shows a game in which the handlebar of a motorcycle or the like is operated based on the state of the user and the attitude of the controller 30.
 FIG. 4C illustrates a state in which the user takes a pose of holding a bat and grips the grip portion on one side of the controller 30 with one hand. The example of FIG. 4C shows a game in which a virtual bat V3 is operated based on the state of the user and the attitude of the controller 30.
 FIG. 4D illustrates a state in which the user takes a pose of throwing a shuriken without gripping the controller 30. The example of FIG. 4D shows a game in which a virtual shuriken V4 is operated based on the state of the user without using the controller 30.
 The estimation processing by the estimation unit 14 described so far is an example, and how and at what timing the RGB image signal 113 and the event signal 123 are used is not limited to this example. For example, the estimation unit 14 may calculate, based on the event signal 123, the amount of movement of part or all of a user included in the field of view and use the calculation result in the estimation processing. The estimation processing may also use known techniques such as block matching or gradient methods.
 Furthermore, the three-dimensional attitude data of the camera unit 10 detected by the IMU 13 may be used in the estimation processing by the estimation unit 14, for example.
 The information output unit 15 outputs information indicating the state of the user estimated by the estimation unit 14 and information indicating the attitude of the controller 30 to the information processing device 20. The information indicating the state of the user may be the coordinate information calculated by the coordinate calculation unit 142 or the information indicating the state of the user estimated by the state estimation unit 144.
 When a plurality of users are recognized by the first recognition unit 141, the information output unit 15 outputs to the information processing device 20 information indicating the combinations of users and the controllers held by those users. When outputting such information, it may output information such as a table indicating the combinations of the information indicating the users' states and the information indicating the attitudes of the controllers 30, it may associate the identification information of the controller 30 held by a user with the information indicating that user's state, or it may associate the information indicating the state of the user holding a controller 30 with the information indicating that controller's attitude.
 Furthermore, by defining in advance, in a table or the like, the relationship between the user's state and the attitude of the controller 30 on the one hand and user operations on the other, the information output unit 15 may output information indicating the user operation to the information processing device 20.
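 A predefined relationship of that kind could be held in a simple lookup table, as in the sketch below; the state names, attitude names, and operation names are invented for illustration and are not prescribed by the embodiment.

```python
# Hypothetical table relating (user state, controller attitude) pairs to user
# operations; every entry here is an invented example.
OPERATION_TABLE = {
    ("bow_pose", "held_center_one_hand"): "draw_and_release_arrow",
    ("both_arms_forward", "held_both_grips"): "steer_handlebar",
    ("bat_pose", "held_one_grip"): "swing_bat",
    ("throw_pose", "not_held"): "throw_shuriken",
}

def identify_operation(user_state: str, controller_attitude: str) -> str | None:
    # Returns None when the observed combination maps to no defined operation.
    return OPERATION_TABLE.get((user_state, controller_attitude))

print(identify_operation("bow_pose", "held_center_one_hand"))  # -> draw_and_release_arrow
```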
 As described so far, the camera unit 10 functions, like the controller 30, as an operation device for accepting user operations by estimating the state of the user and the attitude of the controller 30. In other words, a user operation can be identified based on the state of the user and the attitude of the controller 30 estimated by the estimation unit 14. Moreover, the camera unit 10 completes, by itself, the entire process from generating the RGB image signal 113 and the event signal 123 to estimating the state of the user and the attitude of the controller 30, and can output the estimation results to the information processing device 20 without outputting the RGB image signal 113 and the event signal 123. The camera unit 10 preferably has an independent power supply.
 Referring again to FIG. 2, the information processing device 20 is implemented by a computer having, for example, a communication interface, a processor, and a memory, and includes a communication unit 21 and a control unit 22. The control unit 22 includes the functions of a control value calculation unit 221 and an image generation unit 222, which are realized by the processor operating according to a program stored in the memory or received via the communication interface. The functions of these units are further described below.
 The communication unit 21 receives the information output from the information output unit 15 of the camera unit 10. The communication unit 21 can also communicate with the controller 30 and outputs images to be displayed on the display device 40.
 The control value calculation unit 221 of the control unit 22 calculates a control value for feedback control to external devices including the controller 30 and the display device 40, based on at least one of the information received from the information output unit 15 of the camera unit 10 and the information received from the controller 30. As described above, the camera unit 10 and the controller 30 function as operation devices for accepting user operations. The control value calculation unit 221 therefore calculates a control value for feedback control to the external devices including the controller 30 and the display device 40 in accordance with a user operation performed via at least one of the camera unit 10 and the controller 30. The calculated control value is output to the controller 30 and the display device 40 via the communication unit 21.
 The image generation unit 222 of the control unit 22 generates a display image to be displayed on the display device 40 according to the control value calculated by the control value calculation unit 221. The generated display image is output to the display device 40 via the communication unit 21.
 The details of the calculation of the control values and the generation of the display image are described below in connection with the configurations of the controller 30 and the display device 40.
 As shown in FIG. 2, the controller 30 includes a communication unit 31, an operation unit 32, a force presentation unit 33, a vibration unit 34, and an audio output unit 35. By operating the controller 30, the user can perform various operations related to the game.
 The communication unit 31 receives the control values output from the communication unit 21 of the information processing device 20 and outputs them to the force presentation unit 33, the vibration unit 34, and the audio output unit 35. The communication unit 31 also outputs information on user operations accepted by the operation unit 32 to the information processing device 20.
 The operation unit 32 includes a plurality of operation elements such as buttons and pads, and accepts the user's operation input to these elements.
 The force presentation unit 33 is provided on at least some of the operation elements of the operation unit 32 and, in accordance with the control value supplied from the information processing device 20, presents to the user a force that resists or accompanies the user's operation. Specifically, the force presentation unit 33 can be configured by a motor including a rotating rotor, an actuator, or the like. Since widely known force presentation devices can be employed as the force presentation unit 33, a detailed description is omitted here.
 The vibration unit 34 generates vibration according to the control value supplied from the information processing device 20 and can be configured by, for example, a motor. By generating vibration when a user operation is performed, the vibration unit 34 can notify the user that the operation was performed correctly and recognized by the information processing device 20.
 The audio output unit 35 outputs sound according to the control value supplied from the information processing device 20 and can be configured by, for example, a speaker. By outputting sound when a user operation is performed, the audio output unit 35 can notify the user that the operation was performed correctly and recognized by the information processing device 20.
 By performing at least one of vibration by the vibration unit 34 and sound output by the audio output unit 35 in conjunction with the presentation of force by the force presentation unit 33 described above, the variety of feedback control provided to the user can be increased.
 As described above, the control value calculation unit 221 of the information processing device 20 calculates control values for feedback control to the controller 30; more specifically, it calculates control values for feedback control to the force presentation unit 33, the vibration unit 34, and the audio output unit 35 of the controller 30. For the force presentation unit 33, the control value calculation unit 221 calculates a control value indicating what force is to be presented as feedback control according to the user operation. For the vibration unit 34, it calculates a control value indicating what vibration is to be generated as feedback control according to the user operation. For the audio output unit 35, it calculates a control value indicating what sound is to be output as feedback control according to the user operation. The calculation of the control values by the control value calculation unit 221 can be performed according to predetermined formulas, tables, or the like.
 When the first recognition unit 141 recognizes a plurality of users and the information output unit 15 outputs information indicating the combinations of the users and the controllers they hold, the control value calculation unit 221 calculates a control value for each combination of a user and the controller held by that user.
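 The per-operation calculation of control values for the three feedback channels might look like the following sketch; the ControlValues fields, the table entries, and the numeric strengths are illustrative assumptions rather than values defined by the embodiment.

```python
from dataclasses import dataclass

@dataclass
class ControlValues:
    force: float            # strength of the force presented by the force presentation unit 33
    vibration: float        # amplitude of the vibration generated by the vibration unit 34
    audio_clip: str | None  # identifier of the sound output by the audio output unit 35

# Hypothetical per-operation table; a real system could equally use formulas.
FEEDBACK_TABLE = {
    "draw_and_release_arrow": ControlValues(0.8, 0.6, "bow_release"),
    "steer_handlebar": ControlValues(0.5, 0.3, "engine"),
    "swing_bat": ControlValues(0.9, 0.7, "bat_hit"),
}

def control_values_for(operation: str | None, controller_held: bool) -> ControlValues | None:
    # No controller feedback is computed when the user holds no controller
    # (as in the FIG. 4D example).
    if operation is None or not controller_held:
        return None
    return FEEDBACK_TABLE.get(operation)

print(control_values_for("swing_bat", controller_held=True))
```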
 For example, as shown in FIG. 4A described above, when a user operation of shooting an arrow V2 from the virtual bow V1 is performed, the control value calculation unit 221 calculates, based on the state of the user and the attitude of the controller 30, control values for the force presentation unit 33 and the vibration unit 34 indicating the presentation of a force and the generation of a vibration corresponding to the recoil of actually shooting a bow and arrow. The control value calculation unit 221 also calculates a control value indicating the output of a sound corresponding to the sound of actually shooting a bow and arrow.
 As shown in the example of FIG. 4B, when a user operation of steering the handlebar of a motorcycle or the like is performed, the control value calculation unit 221 calculates, based on the state of the user and the attitude of the controller 30, control values for the force presentation unit 33 and the vibration unit 34 indicating the presentation of a force and the generation of a vibration corresponding to the reaction to the handlebar operation. It also calculates a control value indicating the output of a sound corresponding to the handlebar operation.
 As shown in the example of FIG. 4C, when a user operation of swinging the virtual bat V3 is performed, the control value calculation unit 221 calculates, based on the state of the user and the attitude of the controller 30, control values for the force presentation unit 33 and the vibration unit 34 indicating the presentation of a force and the generation of a vibration corresponding to the recoil of actually swinging a bat. It also calculates a control value indicating the output of a sound corresponding to the sound generated when a bat is actually swung.
 As shown in the example of FIG. 4D, when a user operation is performed without using the controller 30, the control value calculation unit 221 does not calculate a control value for feedback control to the controller 30.
 Various known configurations can be applied to the controller 30 described so far. For example, it may be configured as a pair of controllers that can each be held in one hand, as a controller such as a keyboard that allows character input, or as an application on a smartphone or the like.
 The controller 30 may also include a contact sensor and supply information indicating the user's contact state with the contact sensor to the information processing device 20 via the communication unit 31. Such information can be used by the control value calculation unit 221 in calculating control values.
 The controller 30 may also include an audio input unit and apply speech recognition technology. For example, the controller 30 may include an audio input unit such as a microphone and a speech recognition unit, and supply information indicating commands uttered by the user, calls from the user, and the like to the information processing device 20 via the communication unit 31.
 As shown in FIG. 2, the display device 40 includes a reception unit 41 and a display unit 42.
 The reception unit 41 receives, via the communication unit 21, information indicating the display image generated by the image generation unit 222 of the information processing device 20.
 The display unit 42 includes a monitor such as an LCD (Liquid Crystal Display) or an organic EL display, and can present a display image to the user by displaying it based on the information received by the reception unit 41.
 Various known configurations can be applied to the display device 40 described above. For example, it may be configured as the dedicated display device shown in FIG. 1 or as a display device such as an HMD worn on the user's head. The display unit of an HMD includes, for example, a display element such as an LCD (Liquid Crystal Display) or organic EL display and an optical device such as a lens, and the display element may be a transmissive or a non-transmissive display element. Furthermore, a wearable device such as AR (Augmented Reality) glasses or MR (Mixed Reality) glasses may be used as the HMD. The display device 40 described above may also be configured as the display device of a computer or as the display device of a terminal device such as a smartphone. A touch panel that detects contact may be provided on the surface of the display unit 42.
 As described above, the control value calculation unit 221 of the information processing device 20 calculates a control value for feedback control to the display image displayed on the display device 40; more specifically, it calculates a control value indicating how the display image is to be changed as feedback control according to the user operation. The calculation of the control value by the control value calculation unit 221 can be performed according to predetermined formulas, tables, or the like.
 As described above, the image generation unit 222 of the information processing device 20 generates the display image to be displayed on the display device 40 according to the control value calculated by the control value calculation unit 221. More specifically, the image generation unit 222 generates a new display image to be displayed on the display device 40 according to the control value for changing the display image. The state of the user and the attitude of the controller 30 estimated by the camera unit 10 are reflected in the generation of the display image. Therefore, for example, when the user stands still and the attitude of the controller 30 does not change, the generated display image changes little or not at all, whereas when a user operation is performed, the generated display image changes according to that operation. Also, when a plurality of users are included in the field of view of the camera unit 10, the generated display image changes according to the number of users.
 FIG. 5 is a flowchart showing an example of the processing of the camera unit 10 according to one embodiment of the present invention. In the illustrated example, the image sensor 111 of the RGB camera 11 generates the RGB image signal 113 (step S101) and the sensor 121 of the EDS 12 generates the event signal 123 (step S102).
 Then, the first recognition unit 141 recognizes the user (step S103), and the coordinate calculation unit 142 and the state estimation unit 144 estimate the state of the user (step S104). Next, the second recognition unit 145 recognizes the controller 30 (step S105), and the attitude estimation unit 146 estimates the attitude of the controller 30 (step S106).
 Then, the information output unit 15 outputs the information indicating the state of the user, the information indicating the attitude of the controller 30, and the information indicating the combination of the user and the controller held by that user (step S107).
 The estimation unit 14 continues outputting information by repeating steps S103 to S107 (the processing of steps S101 and S102 is also repeated, although not necessarily at the same period as the processing from step S103 onward).
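 The repetition of steps S101 to S107 can be summarized as a loop like the one sketched below; the objects passed in are hypothetical wrappers around the components described above, and their method names are assumptions made for the sake of the sketch.

```python
def camera_unit_loop(rgb_camera, eds, estimator, output):
    # Sketch of the repeated processing of FIG. 5; each call corresponds to a step.
    while True:
        rgb = rgb_camera.capture()                                        # S101
        events = eds.poll()                                               # S102
        users = estimator.recognize_users(rgb, events)                    # S103
        states = {u: estimator.estimate_state(u, rgb) for u in users}     # S104
        controllers = {u: estimator.recognize_controller(u, rgb, events)  # S105
                       for u in users}
        attitudes = {c: estimator.estimate_attitude(c, rgb)               # S106
                     for c in controllers.values() if c is not None}
        output.send(states, attitudes, controllers)                       # S107
```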
 Through such processing, the camera unit 10 always outputs up-to-date information indicating the state of the user, information indicating the attitude of the controller 30, and information indicating the combination of the user and the controller 30 held by that user.
 By calculating control values for feedback control to the controller 30 based on this information, the information processing device 20 can realize suitable feedback control according to changes in the state of the user and the attitude of the controller 30.
 For example, consider the case where the number of users changes, as illustrated in FIG. 6A and FIG. 6B. As shown in FIG. 6A, when a single user U1 is present in the field of view of the camera unit 10, the camera unit 10 outputs information indicating the state of the user U1 and information indicating the attitude of the controller 30U1 held by the user U1, and the information processing device 20 calculates a control value for feedback control to the controller 30U1. Then, as shown in FIG. 6B, when another user U2 enters the field of view of the camera unit 10, the camera unit 10 outputs information indicating the states of the users U1 and U2, information indicating the attitudes of the controller 30U1 held by the user U1 and the controller 30U2 held by the user U2, and information indicating the combinations of the users and the controllers 30 they hold, and the information processing device 20 calculates a control value for feedback control to the controller 30U1 and a control value for feedback control to the controller 30U2.
 In other words, even if the number of users changes dynamically, the information indicating the users' states, the information indicating the attitudes of the controllers 30, and the information indicating the combinations of users and the controllers 30 they hold are updated, so feedback control matched to the latest state can always be realized.
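 The per-combination recalculation can be pictured as follows; compute_per_combination and the labels U1, U2, C1, and C2 are illustrative, and any callable implementing the control value calculation could be passed in.

```python
def compute_per_combination(control_fn, states: dict, attitudes: dict, holders: dict) -> dict:
    # Recompute feedback control values for every (user, controller) pair each
    # time the camera unit reports updated information.
    values = {}
    for user, controller in holders.items():
        if controller is None:
            continue  # this user currently holds no controller: no controller feedback
        values[(user, controller)] = control_fn(states[user], attitudes[controller])
    return values

# With only U1 holding C1 one entry is produced; when U2 enters holding C2,
# the same call simply yields two entries on the next update.
print(compute_per_combination(lambda state, attitude: (state, attitude),
                              {"U1": "bow_pose", "U2": "bat_pose"},
                              {"C1": "held_center", "C2": "held_grip"},
                              {"U1": "C1", "U2": "C2"}))
```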
 Also consider, for example, the case where the number of users changes and the user holding the controller 30 changes, as illustrated in FIG. 7A and FIG. 7B. As shown in FIG. 7A, when a single user U1 is present in the field of view of the camera unit 10, the camera unit 10 outputs information indicating the state of the user U1 and information indicating the attitude of the controller 30U1 held by the user U1, and the information processing device 20 calculates a control value for feedback control to the controller 30U1. Then, as shown in FIG. 7B, when another user U2 enters the field of view of the camera unit 10 and takes over the controller 30U1 previously held by the user U1, the camera unit 10 outputs information indicating the states of the users U1 and U2, information indicating the attitude of the controller 30U1 held by the user U2, and information indicating the combination of the user and the controller 30 held by that user, and the information processing device 20 calculates a control value for feedback control to the controller 30U1.
 In other words, even if the user holding a given controller 30 changes dynamically, the information indicating the users' states, the information indicating the attitude of the controller 30, and the information indicating the combination of the user and the controller 30 held by that user are updated, so feedback control matched to the latest situation can always be realized.
 Although not illustrated, the same applies when the number of users does not change but the user holding the controller 30 changes. In either case, whether the number of users changes or the user holding the controller 30 changes, feedback control matched to the situation can be realized without any special setup processing.
 In the embodiment of the present invention described above, the camera unit 10 estimates the state of the user and the attitude of the controller 30 held by that user based on the RGB image signal 113 and the event signal 123 generated by the image sensor 111 and the sensor 121, respectively, and outputs the estimation results, and the information processing device 20 calculates a control value for feedback control to the controller 30 based on at least one of the information indicating the state of the user and the information indicating the attitude of the controller 30.
 Therefore, when a user operation is performed via the camera unit 10, calculating the control value for feedback control to the controller 30 based on the information indicating the user's state and the information indicating the attitude of the controller 30 realizes suitable feedback control to the controller while keeping latency low.
 Moreover, since the camera unit 10 performs everything from the generation of the RGB image signal 113 and the event signal 123 to the estimation of the state of the user and the attitude of the controller 30, and outputs information indicating the estimation results without outputting the RGB image signal 113 and the event signal 123, problems of communication load and communication delay can be reduced. Furthermore, since there is no need to output the RGB image signal 113 and the event signal 123, this is also useful in terms of privacy protection.
 In addition, since the camera unit 10 of one embodiment of the present invention can accept user operations by estimating the state of the user and the attitude of the controller 30, it does not cause the physical fatigue that conventional pointing-device-type operation devices cause by requiring the user to maintain a cursor position. Nor does the camera unit 10 require the user to wear markers or attachments to be recognized, unlike conventional attitude-detection-type operation devices.
 Furthermore, in one embodiment of the present invention, both the EDS 12 and the RGB camera 11 are provided, and the state of the user and the attitude of the controller 30 are estimated based on the event signal 123 and the RGB image signal 113. Suitable processing that exploits the respective characteristics of the RGB image signal 113 and the event signal 123 can therefore be realized.
 In one embodiment of the present invention, the state of the user estimated by the estimation unit 14 includes at least one of the posture of the user, the shape of the user's arms, and the shape of the user's fingers. Characteristic user states can therefore be estimated and the intention and content of the user operation accurately grasped.
 In one embodiment of the present invention, the estimation unit 14 calculates coordinate information of at least one joint of the user included in the image based on the RGB image signal 113, using a trained model constructed by learning the relationship between images of persons having a plurality of joints and coordinate information indicating the positions of those joints, and estimates the state of the user based on that coordinate information. The state of the user can therefore be estimated accurately and quickly.
 In one embodiment of the present invention, the first recognition unit 141 recognizes one or more users included in the field of view based on at least one of the RGB image signal 113 and the event signal 123, and the estimation unit 14 estimates, for each user recognized by the first recognition unit 141, the state of that user and the attitude of the controller 30 held by that user. User operations can therefore be grasped for each of a plurality of users included in the field of view of the camera unit 10.
 In one embodiment of the present invention, the second recognition unit 145 recognizes, for each user recognized by the first recognition unit 141, the controller 30 held by that user, and the information output unit 15 outputs information indicating the combinations of the users recognized by the first recognition unit 141 and the controllers 30 recognized by the second recognition unit 145. Therefore, even when a plurality of controllers 30 are used by a plurality of users, user operations can be grasped based on the combinations of users and controllers 30 and reflected in the feedback control.
 FIG. 8 is a block diagram showing a schematic configuration of a system according to another embodiment of the present invention. FIG. 8 shows the configuration of a system 2 that includes a server 50 and a terminal device 60 in place of the information processing device 20 of FIG. 2; in FIG. 8, components having substantially the same functional configuration as in FIG. 2 are denoted by the same reference numerals.
 In the example of FIG. 8, the server 50 is a server (for example, a cloud server) communicably connected to the camera unit 10 and the terminal device 60 via an Internet communication network or wirelessly. The server 50 has the same configuration as the information processing device 20 described with reference to FIG. 2 and performs various kinds of processing based on the information output by the camera unit 10. The terminal device 60 includes a communication unit 61, and the communication unit 61 receives the information output from the server 50. Like the communication unit 21 of the information processing device 20 described with reference to FIG. 2, the communication unit 61 can communicate with the controller 30 and outputs images to be displayed on the display device 40.
 With such a configuration, the camera unit 10 performs everything from the generation of the RGB image signal 113 and the event signal 123 to the estimation of the person's state and outputs the estimated information to the server 50, so the same effects can also be obtained in a game system that uses a server such as a cloud server.
 In each of the above examples, the numbers of RGB cameras 11 and EDSs 12 may be the same or different, and there may be one or more of each. For example, when a plurality of RGB cameras 11 are provided, the range of the field of view for generating the RGB image signals 113 can be expanded, and the state of a person can be estimated three-dimensionally from the plurality of RGB image signals 113. Likewise, when a plurality of EDSs 12 are provided, the range of the field of view for generating the event signals 123 can be expanded, and the three-dimensional amount of movement of a person can be calculated based on the plurality of event signals 123.
 The camera unit 10 described in each of the above examples may be implemented within a single device or distributed across a plurality of devices. For example, at least some of the sensors may be provided independently and the remaining components implemented as the main body of the camera unit 10.
 Although several embodiments of the present invention have been described in detail above with reference to the accompanying drawings, the present invention is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field to which the present invention belongs can conceive of various alterations or modifications within the scope of the technical ideas described in the claims, and it is understood that these naturally belong to the technical scope of the present invention.
 1, 2: system; 10: camera unit; 11: RGB camera; 12: EDS; 13: IMU; 14: estimation unit; 15: information output unit; 20: information processing device; 21, 31, 61: communication unit; 22: control unit; 32: operation unit; 33: force presentation unit; 34: vibration unit; 35: audio output unit; 40: display device; 41: reception unit; 42: display unit; 50: server; 60: terminal device; 111: image sensor; 112, 122: processing circuit; 113: RGB image signal; 121: sensor; 123: event signal; 141: first recognition unit; 142: coordinate calculation unit; 143: trained model; 144: state estimation unit; 145: second recognition unit; 146: attitude estimation unit; 221: control value calculation unit; 222: image generation unit.

Claims (7)

  1.  A system including a sensor device, a controller that accepts a user operation, and an information processing device that performs processing based on the user operation, wherein
     the sensor device includes:
      a first image sensor that generates a first image signal by synchronously scanning all pixels at a predetermined timing;
      a second image sensor including an event-driven vision sensor that asynchronously generates a second image signal upon detecting a change in the intensity of light incident on each pixel;
      an estimation unit that estimates a state of a user and an attitude of the controller held by the user based on the first image signal and the second image signal; and
      an information output unit that outputs information indicating the state of the user and information indicating the attitude of the controller,
     the information processing device includes a control value calculation unit that calculates a control value for feedback control to the controller based on at least one of the information indicating the state of the user and the information indicating the attitude of the controller, and
     the controller has at least one of a force presentation device that presents a force based on the control value, a vibration device that vibrates based on the control value, and an audio output device that outputs sound based on the control value.
  2.  The system according to claim 1, wherein the state of the user includes at least one of a posture of the user, a shape of an arm of the user, or a shape of a finger of the user.
  3.  The system according to claim 1 or 2, wherein the estimation unit calculates coordinate information of at least one joint of the user included in a first image based on the first image signal, on the basis of a learned model constructed by learning a relationship between an image of a person having a plurality of joints and coordinate information indicating positions of the plurality of joints, and estimates the state of the user on the basis of the coordinate information.
  4.  The system according to any one of claims 1 to 3, wherein
     the information processing device further includes a first recognition unit that recognizes one or more users included in the object field on the basis of at least one of the first image signal and the second image signal, and
     the estimation unit estimates, for each user recognized by the first recognition unit, the state of the user and the posture of the controller held by the user.
  5.  The system according to claim 4, wherein
     the system includes a plurality of the controllers,
     the information processing device further includes a second recognition unit that recognizes, for each user recognized by the first recognition unit, the controller held by that user, and
     the information output unit outputs information indicating a combination of the user recognized by the first recognition unit and the controller recognized by the second recognition unit.
  6.  An information processing method for outputting a control value for feedback control to a controller, the method including:
     an estimation step of estimating a state of a user and a posture of the controller held by the user on the basis of a first image signal generated by a first image sensor that synchronously scans all pixels at a predetermined timing and a second image signal generated by a second image sensor including an event-driven vision sensor that asynchronously generates the second image signal upon detecting a change in the intensity of light incident on each pixel; and
     an information output step of outputting information indicating the state of the user and information indicating the posture of the controller.
  7.  An information processing program that causes a computer to implement:
     a function of estimating a state of a user and a posture of a controller held by the user on the basis of a first image signal generated by a first image sensor that synchronously scans all pixels at a predetermined timing and a second image signal generated by a second image sensor including an event-driven vision sensor that asynchronously generates the second image signal upon detecting a change in the intensity of light incident on each pixel; and
     a function of outputting information indicating the state of the user and information indicating the posture of the controller.
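To make the processing flow recited in claims 1 and 3 more concrete, the sketch below strings the main steps together: a pose model standing in for the learned model returns joint coordinates, a simple rule derives the user's state from those coordinates, and a control value is calculated and passed to the controller's vibration device. The model interface, the state rule, and the control-value formula are illustrative assumptions only and are not the claimed implementation.

```python
import numpy as np
from typing import Callable, Dict, Tuple

Joint = Tuple[float, float]  # (x, y) in normalized image coordinates

def estimate_user_state(joints: Dict[str, Joint]) -> str:
    """Toy stand-in for state estimation from joint coordinate information.

    A real system would use the full skeleton returned by the learned model;
    here a single rule (wrist above shoulder) classifies the arm as raised.
    """
    return "arm_raised" if joints["right_wrist"][1] < joints["right_shoulder"][1] else "neutral"

def calculate_control_value(user_state: str, controller_pitch: float) -> float:
    """Illustrative feedback rule: vibrate harder the further the controller is tilted
    while the user's arm is raised. Returns an amplitude in [0, 1]."""
    if user_state != "arm_raised":
        return 0.0
    return float(np.clip(abs(controller_pitch) / (np.pi / 2), 0.0, 1.0))

def feedback_loop(pose_model: Callable[[np.ndarray], Dict[str, Joint]],
                  frame: np.ndarray,
                  controller_pitch: float,
                  vibrate: Callable[[float], None]) -> None:
    """One iteration: image -> joints -> user state -> control value -> vibration device."""
    joints = pose_model(frame)                                 # learned model: image -> joint coordinates
    state = estimate_user_state(joints)                        # estimation unit
    value = calculate_control_value(state, controller_pitch)   # control value calculation unit
    vibrate(value)                                             # feedback to the controller

# Example with a dummy model and a dummy vibration device.
if __name__ == "__main__":
    dummy_model = lambda img: {"right_wrist": (0.60, 0.30), "right_shoulder": (0.55, 0.50)}
    dummy_frame = np.zeros((480, 640, 3), dtype=np.uint8)
    feedback_loop(dummy_model, dummy_frame, controller_pitch=0.8,
                  vibrate=lambda a: print(f"vibration amplitude: {a:.2f}"))
```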
PCT/JP2022/013835 2021-04-13 2022-03-24 System, information processing method, and information processing program WO2022220048A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-067660 2021-04-13
JP2021067660A JP2022162702A (en) 2021-04-13 2021-04-13 System, information processing method, and information processing program

Publications (1)

Publication Number Publication Date
WO2022220048A1

Family

ID=83640587

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/013835 WO2022220048A1 (en) 2021-04-13 2022-03-24 System, information processing method, and information processing program

Country Status (2)

Country Link
JP (1) JP2022162702A (en)
WO (1) WO2022220048A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007296248A (en) * 2006-05-02 2007-11-15 Sony Computer Entertainment Inc Game device
JP2012521039A (en) * 2009-03-20 2012-09-10 マイクロソフト コーポレーション Working with virtual objects
JP2015527627A (en) * 2012-06-04 2015-09-17 株式会社ソニー・コンピュータエンタテインメント Multi-image interactive gaming device
US20160187990A1 (en) * 2014-12-26 2016-06-30 Samsung Electronics Co., Ltd. Method and apparatus for processing gesture input

Also Published As

Publication number Publication date
JP2022162702A (en) 2022-10-25

Similar Documents

Publication Publication Date Title
US10198874B2 (en) Methods and apparatus to align components in virtual reality environments
CN105094313B (en) For providing the system and method for touch feedback for remote interaction
US10997949B2 (en) Time synchronization between artificial reality devices
CN112312979A (en) Clock synchronization of head mounted display and controller on electromagnetic field
JP2003337963A (en) Device and method for image processing, and image processing program and recording medium therefor
CN107407965A (en) It is connected to the virtual reality helmet of mobile computing device
JP2015166890A (en) Information processing apparatus, information processing system, information processing method, and program
CN106796452A (en) By the head-mounted display apparatus and its control method, the computer program for controlling the device that tap control
CN110851095B (en) Multi-screen interactions in virtual and augmented reality
CN103914128B (en) Wear-type electronic equipment and input method
WO2019155840A1 (en) Information processing device, information processing method, and program
US11209657B2 (en) Position tracking system for head-mounted display systems that includes angle sensitive detectors
US10978019B2 (en) Head mounted display system switchable between a first-person perspective mode and a third-person perspective mode, related method and related non-transitory computer readable storage medium
JP6969577B2 (en) Information processing equipment, information processing methods, and programs
WO2022220048A1 (en) System, information processing method, and information processing program
JP7300436B2 (en) Information processing device, system, information processing method and information processing program
CN114731469A (en) Audio sample phase alignment in artificial reality systems
JP6535699B2 (en) INFORMATION PROCESSING METHOD, INFORMATION PROCESSING PROGRAM, AND INFORMATION PROCESSING APPARATUS
WO2022113834A1 (en) System, imaging device, information processing device, information processing method, and information processing program
WO2022220049A1 (en) System, information processing method, and information processing program
US11126267B2 (en) Tactile feedback device and operation method thereof
US20240127629A1 (en) System, information processing method, and information processing program
TW201944365A (en) A method to enhance first-person-view experience
WO2023286191A1 (en) Information processing apparatus and driving data generation method
JP2018000577A (en) Information processing method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22787973

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22787973

Country of ref document: EP

Kind code of ref document: A1