WO2021240668A1 - Gesture detection device and gesture detection method - Google Patents

Gesture detection device and gesture detection method

Info

Publication number
WO2021240668A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
hand
occupant
face
detected
Prior art date
Application number
PCT/JP2020/020828
Other languages
French (fr)
Japanese (ja)
Inventor
Taro Kumagai
Takuya Murakami
Original Assignee
Mitsubishi Electric Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to PCT/JP2020/020828
Publication of WO2021240668A1


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/26 Navigation; Navigational instruments specially adapted for navigation in a road network
    • G01C 21/34 Route searching; Route guidance
    • G01C 21/36 Input/output arrangements for on-board computers
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G 1/00 Traffic control systems for road vehicles
    • G08G 1/16 Anti-collision systems

Definitions

  • This disclosure relates to a gesture detection device and a gesture detection method.
  • Patent Document 1 proposes a control device that detects information about a user's hand only from a gesture area set based on the area of the driver's face.
  • A gesture detection device detects the occupant's hand based on the image. Depending on the state of the image, however, the gesture detection device may detect an object other than a hand as a hand.
  • The present disclosure solves the above-mentioned problem, and its object is to provide a gesture detection device that accurately detects the hand in an occupant's gesture.
  • The gesture detection device includes a face information acquisition unit, a hand candidate detection unit, and a determination unit.
  • The face information acquisition unit acquires information on the face orientation of the occupant.
  • The face orientation is detected based on the image captured by the image pickup device provided in the vehicle.
  • The hand candidate detection unit detects a hand candidate, which is a candidate for the occupant's hand, based on the image.
  • The determination unit rejects the hand candidate information, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected.
  • According to the present disclosure, a gesture detection device that accurately detects the hand in an occupant's gesture is provided.
  • FIG. 1 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 1.
  • FIG. 2 is a diagram showing an example of the configuration of the processing circuit included in the gesture detection device.
  • FIG. 3 is a diagram showing another example of the configuration of the processing circuit included in the gesture detection device.
  • FIG. 4 is a flowchart showing the gesture detection method in Embodiment 1.
  • FIG. 5 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 2.
  • FIG. 6 is a diagram showing an example of the face orientation of an occupant in Embodiment 2.
  • FIG. 7 is a flowchart showing the gesture detection method in Embodiment 2.
  • FIGS. 8 to 10 are diagrams each showing an example of a frame to be processed.
  • FIG. 11 is a diagram showing the relationship from the first frame to the fourth frame in Embodiment 3.
  • FIG. 12 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 4.
  • FIG. 13 is a flowchart showing the gesture detection method in Embodiment 4.
  • FIG. 14 is a diagram showing an example of a frame to be processed.
  • FIG. 15 is a block diagram showing the configuration of the gesture detection device in Embodiment 5 and the devices that operate in connection with it.
  • FIG. 1 is a functional block diagram showing the configuration of the gesture detection device 100 according to the first embodiment. Further, FIG. 1 shows an image pickup device 110 and a face detection unit 10 as devices that operate in connection with the gesture detection device 100.
  • The image pickup device 110 is provided in the vehicle.
  • The image pickup device 110 captures an image of an occupant inside the vehicle.
  • The face detection unit 10 detects the face orientation of the occupant based on the image.
  • The face orientation corresponds to, for example, the direction in which the front of the occupant's face is pointing, the direction of the line of sight, and the like.
  • The gesture detection device 100 detects the occupant's hand gesture based on the image taken by the image pickup device 110.
  • The gesture detection device 100 includes a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
  • The face information acquisition unit 20 acquires information on the face orientation of the occupant from the face detection unit 10.
  • The hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image taken by the image pickup device 110.
  • The hand candidate detection unit 30 detects a hand candidate by, for example, matching the shape pattern of an object in the image (luminance distribution information) against a predetermined hand shape pattern, as sketched below.
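  • For illustration only, a minimal sketch of such pattern matching is shown below, assuming OpenCV and a prepared grayscale hand-shape template; the threshold and the single-match simplification are assumptions, not details from this disclosure.
```python
import cv2

MATCH_THRESHOLD = 0.7  # hypothetical matching-score threshold

def detect_hand_candidates(frame_gray, hand_template):
    """Return (x, y, w, h) boxes whose luminance pattern resembles the template."""
    result = cv2.matchTemplate(frame_gray, hand_template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < MATCH_THRESHOLD:
        return []  # nothing in the frame matches the hand shape well enough
    h, w = hand_template.shape
    return [(max_loc[0], max_loc[1], w, h)]  # best-matching region only
```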
  • The determination unit 40 rejects the hand candidate information based on a predetermined condition regarding the face orientation.
  • The gesture detection device 100 does not identify a rejected hand candidate as a hand constituting the occupant's gesture.
  • FIG. 2 is a diagram showing an example of the configuration of the processing circuit 90 included in the gesture detection device 100.
  • Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 is realized by the processing circuit 90. That is, the processing circuit 90 has a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
  • When the processing circuit 90 is dedicated hardware, the processing circuit 90 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a circuit combining these.
  • The functions of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 may be realized individually by a plurality of processing circuits, or may be realized collectively by one processing circuit.
  • FIG. 3 is a diagram showing another example of the configuration of the processing circuit included in the gesture detection device 100.
  • the processing circuit includes a processor 91 and a memory 92.
  • Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 is realized by the processor 91 executing a program stored in the memory 92.
  • For example, each function is realized when the processor 91 executes software or firmware written as a program.
  • Thus, the gesture detection device 100 has a memory 92 that stores the program and a processor 91 that executes the program.
  • The program describes a function by which the gesture detection device 100 acquires information on the face orientation of the occupant, detected based on the image captured by the image pickup device 110 provided in the vehicle. The program also describes a function by which the gesture detection device 100 detects a hand candidate, which is a candidate for the occupant's hand, based on the image. The program further describes a function of rejecting the hand candidate information, based on the predetermined condition regarding the face orientation, so that the hand candidate is not detected as a hand in the occupant's gesture to be detected. In this way, the program causes a computer to execute the procedures or methods of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40.
  • the processor 91 is, for example, a CPU (Central Processing Unit), an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like.
  • The memory 92 is, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory).
  • Alternatively, the memory 92 may be a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD, or any storage medium to be used in the future.
  • Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 may be realized partly by dedicated hardware and partly by software or firmware. In this way, the processing circuit realizes each of the above functions by hardware, software, firmware, or a combination thereof.
  • FIG. 4 is a flowchart showing the gesture detection method in the first embodiment.
  • Prior to step S1 shown in FIG. 4, the face detection unit 10 detects the face orientation of the occupant based on the image taken by the image pickup device 110 provided in the vehicle.
  • In step S1, the face information acquisition unit 20 acquires information on the face orientation of the occupant from the face detection unit 10.
  • In step S2, the hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image captured by the image pickup device 110.
  • In step S3, the determination unit 40 determines whether or not to reject the hand candidate information based on a predetermined condition regarding the face orientation.
  • The determination unit 40 rejects the hand candidate information according to the determination result.
  • A rejected hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected. In other words, the gesture detection device 100 does not identify the rejected hand candidate as a hand constituting the occupant's gesture.
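  • As an informal illustration of steps S1 to S3, the following sketch shows the per-frame flow; the data types and the condition callback are assumptions introduced here, not definitions from this disclosure.
```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class FaceOrientation:
    pitch_deg: float  # rotation about the lateral axis
    yaw_deg: float    # rotation about the vertical axis

def process_frame(face: Optional[FaceOrientation],
                  hand_candidates: List[tuple],
                  condition: Callable[[FaceOrientation], bool]) -> List[tuple]:
    # S1: face orientation information has been acquired (may be None)
    # S2: hand_candidates were detected from the same frame
    # S3: reject every hand candidate when the face-orientation condition holds
    if face is not None and condition(face):
        return []              # rejected: not identified as gesture hands
    return hand_candidates     # identified as hands constituting the gesture
```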
  • As described above, the gesture detection device 100 in the first embodiment includes the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40.
  • The face information acquisition unit 20 acquires information on the face orientation of the occupant. The face orientation is detected based on the image captured by the image pickup device 110 provided in the vehicle.
  • The hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image.
  • The determination unit 40 rejects the hand candidate information, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected.
  • Such a gesture detection device 100 accurately detects the hand in the occupant's gesture.
  • Similarly, the gesture detection method in the first embodiment acquires information on the face orientation of the occupant, detected based on the image captured by the image pickup device 110 provided in the vehicle.
  • The gesture detection method detects a hand candidate, which is a candidate for the occupant's hand, based on the image. Further, the gesture detection method rejects the hand candidate information, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as a hand in the occupant's gesture to be detected.
  • With this gesture detection method as well, the occupant's hand in the gesture is accurately detected.
  • The gesture detection device and the gesture detection method according to the second embodiment will now be described.
  • The second embodiment is a subordinate concept of the first embodiment, and the gesture detection device according to the second embodiment includes each configuration of the gesture detection device 100 according to the first embodiment. Descriptions of configurations and operations identical to those of the first embodiment are omitted.
  • FIG. 5 is a functional block diagram showing the configuration of the gesture detection device 101 according to the second embodiment. Further, FIG. 5 shows an image pickup device 110 and an in-vehicle device 120 as devices that operate in connection with the gesture detection device 101.
  • The image pickup device 110 is provided in the front center of the vehicle interior.
  • The image pickup device 110 photographs the vehicle interior at a wide angle and captures both the driver's seat and the passenger seat at the same time.
  • The image pickup device 110 is, for example, a camera that detects infrared rays, a camera that detects visible light, or the like.
  • The gesture detection device 101 according to the second embodiment detects the occupant's hand gesture based on the image captured by the image pickup device 110.
  • The gesture is a gesture for operating the in-vehicle device 120.
  • The in-vehicle device 120 is, for example, an air conditioner, an audio system, or the like.
  • A gesture detected by the gesture detection device 101 is used, for example, to control the temperature of the air conditioner or to adjust the audio volume.
  • The in-vehicle device 120 is not limited to the air conditioner and the audio system.
  • The gesture detection device 101 includes an image acquisition unit 50, a face detection unit 10, a storage unit 60, a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
  • The image acquisition unit 50 acquires the image captured by the image pickup device 110 frame by frame.
  • The face detection unit 10 detects the occupant's face and face orientation for each frame of the image. For example, the face detection unit 10 detects the occupant's facial parts and detects the face orientation based on the positions of those parts; the face orientation detected in this way is the direction in which the front of the occupant's face is pointing. Alternatively, for example, the face detection unit 10 detects the occupant's line of sight and detects the face orientation based on it; the face orientation detected in this way is the direction in which the line of sight is pointing. That is, the face orientation in the second embodiment includes at least one of the direction in which the front of the occupant's face is pointing and the direction of the line of sight.
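  • This disclosure does not fix a specific algorithm for this step. As a hedged illustration, one common approach is to fit a generic 3D face model to detected 2D facial landmarks and decompose the resulting rotation; the model points and camera parameters below are assumptions.
```python
import cv2
import numpy as np

# Generic 3D face-model points (nose tip, chin, eye corners, mouth corners).
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye outer corner
    (225.0, 170.0, -135.0),    # right eye outer corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
])

def estimate_face_orientation(landmarks_2d, camera_matrix):
    """Return (pitch, yaw, roll) in degrees from six detected 2D landmarks."""
    ok, rvec, _tvec = cv2.solvePnP(MODEL_POINTS, landmarks_2d, camera_matrix,
                                   None, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None                       # face orientation not detected
    rot_mat, _ = cv2.Rodrigues(rvec)      # rotation vector -> rotation matrix
    euler_deg, *_ = cv2.RQDecomp3x3(rot_mat)
    pitch, yaw, roll = euler_deg
    return pitch, yaw, roll
```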
  • FIG. 6 is a diagram showing an example of the face orientation of the occupant in the second embodiment.
  • The face orientation is represented by pitch, yaw, and roll angles.
  • When the occupant's face is facing the front, the pitch angle, the yaw angle, and the roll angle are 0 degrees.
  • The face detection unit 10 detects at least the pitch angle and the yaw angle among the pitch, yaw, and roll angles.
  • The face detection unit 10 in the second embodiment also detects the head position in the image.
  • The head position detected in the second embodiment is a position in the height direction. In this way, the face detection unit 10 detects the face orientation and the head position of the occupant. The head position can also be read as the face position.
  • The storage unit 60 stores the face orientation information and the head position for each frame.
  • The face information acquisition unit 20 acquires the face orientation information for each frame.
  • Normally, the face information acquisition unit 20 acquires the face orientation information in the frame to be processed.
  • When the face is not detected in the frame to be processed, the face information acquisition unit 20 operates as follows.
  • Let the first frame be a frame before the frame to be processed, and let the second frame be the frame to be processed.
  • Suppose that the occupant's face orientation is detected in the first frame, while the occupant's face is not detected in the second frame.
  • In this case, the face information acquisition unit 20 acquires the face orientation and head position information in the first frame from the storage unit 60, as sketched below.
  • Here, the second frame is a frame within a predetermined number of frames from the first frame.
  • The predetermined number of frames may be stored in the gesture detection device 101 in advance, for example, or may be input from the outside.
  • The first frame is preferably the frame in which the occupant's face orientation was detected most recently before the second frame.
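  • A small sketch of this fallback is given below; the storage layout and the frame-gap constant are assumptions used only to make the behavior concrete.
```python
MAX_FRAME_GAP = 10  # "predetermined number of frames"; the value is assumed

def get_face_info(frame_index, storage):
    """storage maps frame index -> (face_orientation, head_position) for every
    frame in which the occupant's face orientation was detected."""
    if frame_index in storage:
        return storage[frame_index]          # face detected in this frame
    earlier = [i for i in storage if i < frame_index]
    if not earlier:
        return None
    latest = max(earlier)                    # most recent detection (first frame)
    if frame_index - latest <= MAX_FRAME_GAP:
        return storage[latest]               # reuse the first frame's information
    return None                              # gap too large: no usable face info
```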
  • The hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, for each frame of the image captured by the image pickup device 110.
  • The hand candidate detection unit 30 detects the occupant's hand candidate by, for example, matching the shape pattern of an object in the image (luminance distribution information) against a predetermined hand shape pattern, that is, by a pattern matching process.
  • The shape of the hand to be detected may be the shape of an open hand or the shape of a closed hand.
  • The shape of the hand to be detected may also be, for example, a hand shape indicating a number, a hand shape indicating a direction, or a hand shape indicating the occupant's intention (OK, Good, and so on).
  • The determination unit 40 rejects the hand candidate information for each frame based on a predetermined condition regarding the face orientation.
  • The predetermined condition may be stored in the gesture detection device 101 in advance, for example, or may be input from the outside. An example of the predetermined condition is described later. "Rejecting" may include the determination unit 40 identifying the hand candidate as something other than a hand. Alternatively, "rejecting" may include the determination unit 40 invalidating the hand candidate information. In either case, the rejected hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected. In other words, the gesture detection device 101 does not identify the rejected hand candidate as a hand constituting the occupant's gesture.
  • The gesture detection device 101 identifies a hand candidate not rejected by the determination unit 40 as a hand constituting the occupant's gesture. Based on the gesture by the occupant's hand identified by the gesture detection device 101, operation processing of the in-vehicle device 120 and the like is executed. In the functional block diagram shown in FIG. 5, the functional unit that performs processing between the determination unit 40 and the in-vehicle device 120 is not shown.
  • The determination unit 40 in the second embodiment rejects the hand candidate information when at least one of the pitch angle and the yaw angle representing the face orientation exceeds a predetermined range. That is, the predetermined condition regarding the face orientation in the second embodiment is that at least one of the pitch angle and the yaw angle representing the face orientation exceeds the predetermined range.
  • The range is, for example, from the angle corresponding to the front direction of the face to the angle corresponding to the oblique direction in which the image pickup device 110 is located. This is because, when the occupant makes a gesture, the occupant's face usually faces between the front direction and the image pickup device 110.
  • The range is predetermined for each of the pitch angle and the yaw angle.
  • Alternatively, the predetermined condition regarding the face orientation may be that at least one of the pitch angle and the yaw angle exceeds a predetermined threshold value.
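  • The sketch below illustrates this condition check; the numeric bounds are placeholders chosen for illustration and do not appear in this disclosure.
```python
PITCH_RANGE = (-15.0, 15.0)  # degrees; assumed bounds
YAW_RANGE = (-10.0, 30.0)    # degrees; assumed, asymmetric toward the camera side

def face_condition_met(pitch_deg, yaw_deg):
    """True when pitch or yaw leaves its range, i.e. the candidate is rejected."""
    pitch_in = PITCH_RANGE[0] <= pitch_deg <= PITCH_RANGE[1]
    yaw_in = YAW_RANGE[0] <= yaw_deg <= YAW_RANGE[1]
    return not (pitch_in and yaw_in)
```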
  • Further, the determination unit 40 rejects the hand candidate information of the second frame based on a condition regarding the face orientation and the head position in the first frame. For example, if at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds a predetermined range, the determination unit 40 rejects the hand candidate information in the second frame. That is, the predetermined condition regarding the face orientation in this case is that at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds the predetermined range.
  • The functions of the face detection unit 10, the face information acquisition unit 20, the hand candidate detection unit 30, the determination unit 40, the image acquisition unit 50, and the storage unit 60 are realized by the processing circuit shown in FIG. 2 or FIG. 3.
  • FIG. 7 is a flowchart showing the gesture detection method in the second embodiment.
  • In step S10, the image acquisition unit 50 acquires a frame to be processed from the image captured by the image pickup device 110.
  • In step S20, the face detection unit 10 detects the occupant's face, face orientation, and head position in the frame to be processed.
  • In step S30, the gesture detection device 101 determines whether or not the face orientation has been detected. If the face orientation has been detected, step S40 is executed. If the face orientation has not been detected, step S80 is executed.
  • In step S40, the storage unit 60 stores the face orientation and head position information for each frame.
  • In step S50, the face information acquisition unit 20 acquires the face orientation information in the frame to be processed.
  • The face information acquisition unit 20 may acquire the face orientation information from the face detection unit 10 or from the storage unit 60.
  • In step S60, the hand candidate detection unit 30 detects the occupant's hand candidate in the frame to be processed.
  • In step S70, the determination unit 40 determines whether or not the face orientation satisfies a predetermined condition.
  • Here, the predetermined condition is that at least one of the pitch angle and the yaw angle exceeds the predetermined range. If at least one of them exceeds its range, that is, if the condition is satisfied, step S120 is executed. If both are within their ranges, that is, if the condition is not satisfied, the gesture detection method ends.
  • FIG. 8 is a diagram showing an example of a frame to be processed.
  • In FIG. 8, the occupant is making a gesture with the hand 31 to operate the in-vehicle device 120.
  • The face detection unit 10 detects the occupant's face 11, and the face frame 12 is set so as to surround the face 11.
  • The hand candidate detection unit 30 detects the occupant's hand 31 as a hand candidate.
  • The hand candidate frame 32 is set so as to surround the hand candidate.
  • In FIG. 8, the occupant's face is facing the front, so both the pitch angle and the yaw angle are within the predetermined range.
  • The determination unit 40 therefore determines that the face orientation does not satisfy the predetermined condition, and the gesture detection method ends. That is, the gesture detection device 101 identifies the hand candidate as the hand 31 constituting the occupant's gesture.
  • FIG. 9 is a diagram showing an example of a frame to be processed.
  • In FIG. 9, the occupant is not making a hand gesture for operating the in-vehicle device 120.
  • The occupant is leaning in toward the information displayed on the center console of the vehicle.
  • The information is, for example, navigation information.
  • The face detection unit 10 detects the occupant's face 11.
  • The face frame 12 is set so as to surround the face 11.
  • However, the hand candidate detection unit 30 erroneously detects the occupant's face 11 as a hand candidate.
  • The hand candidate frame 32 is set so as to include the hand candidate.
  • For example, the hand candidate detection unit 30 may determine that the occupant's face 11 is a closed hand 31 and detect it as a hand candidate.
  • In this case, step S120 is executed.
  • In step S80, the face information acquisition unit 20 determines whether or not the frame to be processed is within a predetermined number of frames from the frame in which the occupant's face orientation was detected most recently. If the frame to be processed is within the predetermined number of frames, that is, if this condition is satisfied, step S90 is executed. If this condition is not satisfied, the gesture detection method ends.
  • In step S90, the face information acquisition unit 20 acquires, from the storage unit 60, the face orientation and head position information in the frame in which the occupant's face orientation was detected most recently.
  • In step S100, the hand candidate detection unit 30 detects the occupant's hand candidate in the frame to be processed.
  • In step S110, the determination unit 40 determines whether or not the face orientation and the head position satisfy a predetermined condition.
  • Here, the predetermined condition is that at least one of the pitch angle, the yaw angle, and the head position exceeds the predetermined range. If at least one exceeds its range, step S120 is executed. If all are within their ranges, the gesture detection method ends.
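  • For illustration, the step-S110 check can be sketched as below; all numeric bounds are placeholders, and the normalized head-height coordinate is an assumption introduced here.
```python
PITCH_RANGE = (-15.0, 15.0)       # degrees; assumed, as in the earlier sketch
YAW_RANGE = (-10.0, 30.0)         # degrees; assumed
HEAD_HEIGHT_RANGE = (0.35, 0.75)  # head position as a fraction of image height

def s110_condition_met(pitch_deg, yaw_deg, head_y_norm):
    """True when pitch, yaw, or head height leaves its range (reject in S120)."""
    all_in = (PITCH_RANGE[0] <= pitch_deg <= PITCH_RANGE[1]
              and YAW_RANGE[0] <= yaw_deg <= YAW_RANGE[1]
              and HEAD_HEIGHT_RANGE[0] <= head_y_norm <= HEAD_HEIGHT_RANGE[1])
    return not all_in
```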
  • FIG. 10 is a diagram showing an example of a frame to be processed.
  • The frame shown in FIG. 10 is a frame after the frame shown in FIG. 9, and is within the predetermined number of frames from it.
  • In FIG. 10, the occupant is leaning in further to check in detail the information displayed on the center console of the vehicle.
  • The face detection unit 10 has failed to detect the occupant's face 11, and the face frame 12 is not set.
  • However, the hand candidate detection unit 30 erroneously detects the occupant's head as a hand candidate, and the hand candidate frame 32 is set so as to include the hand candidate.
  • Since FIG. 10 is within the predetermined number of frames from the frame of FIG. 9, the face information acquisition unit 20 acquires the face orientation and head position information in the frame of FIG. 9.
  • In that frame, the occupant's face is oriented diagonally downward, and the pitch angle and the yaw angle exceed the predetermined range.
  • The determination unit 40 therefore determines that the face orientation satisfies the predetermined condition, and step S120 is executed.
  • In step S120, the determination unit 40 rejects the hand candidate information.
  • For example, the determination unit 40 identifies the hand candidate as something other than a hand.
  • Alternatively, the determination unit 40 replaces the detection result of the hand candidate with a detection result of an object other than a hand. In this way, the determination unit 40 rejects the hand candidate information based on the predetermined condition regarding the face orientation. This completes the gesture detection method for the frame.
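  • A minimal sketch of these two rejection behaviors is shown below; the class and field names are assumptions introduced for illustration.
```python
from dataclasses import dataclass

@dataclass
class HandCandidate:
    box: tuple            # (x, y, w, h) of the hand candidate frame
    label: str = "hand"
    valid: bool = True

def reject(candidate: HandCandidate) -> HandCandidate:
    candidate.label = "not_hand"  # identify the candidate as something other than a hand
    candidate.valid = False       # invalidate the hand candidate information
    return candidate
```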
  • In the above description, the gesture detection device 101 performs the hand candidate detection process after the face 11 detection process and the face orientation information acquisition process.
  • However, the gesture detection device 101 may execute the face 11 detection process and the face orientation information acquisition process after the hand candidate detection process.
  • Alternatively, the gesture detection device 101 may execute the hand candidate detection process in parallel with the face 11 detection process and the face orientation information acquisition process.
  • Next, consider a case in which the face detection unit 10 succeeds in detecting the occupant's face 11 and face orientation in the first frame but fails to detect the occupant's face 11 in the second frame.
  • Here, the first frame is the frame in which the occupant's face orientation was detected most recently before the second frame.
  • The frame shown in FIG. 9 corresponds to the first frame, and the frame shown in FIG. 10 corresponds to the second frame.
  • In step S10, the image acquisition unit 50 acquires the second frame from the image captured by the image pickup device 110.
  • In step S20, the face detection unit 10 fails to detect the occupant's face 11 in the second frame; therefore, the face orientation and the head position are not detected.
  • In step S30, the gesture detection device 101 determines that the occupant's face orientation has not been detected.
  • Step S80 is then executed.
  • In step S80, the face information acquisition unit 20 determines whether or not the second frame is within the predetermined number of frames from the first frame, in which the occupant's face orientation was detected most recently. Since the first frame and the second frame satisfy this condition as described above, step S90 is executed.
  • In step S90, the face information acquisition unit 20 acquires the face orientation and head position information in the first frame from the storage unit 60.
  • In step S100, the hand candidate detection unit 30 detects the occupant's hand candidate in the second frame.
  • In step S110, the determination unit 40 determines whether or not at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds the predetermined range. If at least one exceeds its range, step S120 is executed. If all are within their ranges, the gesture detection method ends.
  • In step S120, the determination unit 40 rejects the hand candidate information in the second frame. This completes the gesture detection method for one frame to be processed. After that, step S10 is executed again for the next frame.
  • Such a gesture detection device 101 reduces cases in which an object other than the occupant's hand is identified as the hand 31. That is, the gesture detection device 101 accurately detects the hand 31 constituting the occupant's gesture.
  • When operating the in-vehicle device 120, the occupant may lean in to check information displayed on a display device such as the vehicle's dashboard or center console. In that case, the occupant's head enters the detection range for hand candidates.
  • The hand candidate detection unit 30 may then determine that the occupant's head (or face 11) is a closed hand 31 (such as a thumbs-up hand) and detect it as a hand candidate (for example, FIG. 10).
  • While the occupant is making a gesture, the occupant's face orientation is within the oblique range from the front direction of the vehicle to the position of the image pickup device 110 (for example, FIG. 8).
  • While the occupant is leaning in toward the display device, the face orientation is out of that range.
  • The gesture detection device 101 in the second embodiment rejects the information of erroneously detected hand candidates based on the predetermined condition regarding the face orientation. In other words, when the occupant's face orientation exceeds the predetermined range, the gesture detection device 101 identifies the hand candidate as something other than a hand and rejects the hand candidate information. As a result, the gesture detection device 101 accurately detects the hand 31 in the occupant's gesture.
  • The predetermined condition regarding the face orientation is not limited to the above conditions.
  • For example, the condition may be a combination of a condition on at least one of the pitch angle, the yaw angle, and the roll angle and a condition on at least one of the lateral, depth, and height head positions.
  • The gesture detection device 101 in the second embodiment includes the storage unit 60.
  • The storage unit 60 stores the face orientation information and the occupant's head position information detected for each frame of the image.
  • When the occupant's face orientation is detected in the first frame and the occupant's face is not detected in the second frame, the face information acquisition unit 20 acquires the face orientation information and the head position information in the first frame from the storage unit 60.
  • Here, the second frame is a frame within a predetermined number of frames (a first predetermined number of frames) from the first frame.
  • The hand candidate detection unit 30 detects the hand candidate in the second frame.
  • The determination unit 40 rejects the hand candidate information in the second frame based on the predetermined condition regarding the face orientation and the head position in the first frame.
  • Since the pattern matching process for face detection and the pattern matching process for hand candidate detection differ from each other, even when the face detection unit 10 fails to detect the occupant's face orientation, the hand candidate detection unit 30 may still erroneously detect the occupant's face 11, head, or the like as a hand candidate (for example, FIG. 10).
  • The movement of the occupant from the posture shown in FIG. 9 to the posture shown in FIG. 10, that is, the movement of the occupant leaning in toward the display device, is continuous and takes place in a short time. Therefore, even when the determination unit 40 rejects the hand candidate information in the frame to be processed based on the face orientation information in a frame close in time to it, the accuracy of the rejection determination does not deteriorate.
  • The gesture detection device 101 according to the second embodiment thus prevents the occupant's face 11, head, and the like from being detected as hand candidates even when the occupant's face orientation is temporarily not detected. As a result, the detection accuracy of the occupant's hand improves.
  • The first frame in the second embodiment is the frame in which the occupant's face orientation was detected most recently before the second frame.
  • In other words, the gesture detection device 101 determines whether or not the most recently detected face orientation satisfies the predetermined condition. The gesture detection device 101 therefore accurately detects the occupant's hand 31.
  • The gesture detection device and the gesture detection method according to the third embodiment will now be described.
  • The third embodiment is a subordinate concept of the first embodiment, and the gesture detection device according to the third embodiment includes each configuration of the gesture detection device 101 according to the second embodiment. Descriptions of configurations and operations identical to those of the first and second embodiments are omitted.
  • FIG. 11 is a diagram showing the relationship from the first frame to the fourth frame in the third embodiment.
  • The first frame is the first frame to be processed among the plurality of frames constituting the video.
  • In the first frame, the occupant's face orientation is detected, and a hand candidate is also detected.
  • In this case, as in the second embodiment, the determination unit 40 determines whether or not at least one of the pitch angle and the yaw angle of the face 11 in the first frame exceeds the predetermined range, and rejects the hand candidate information in the first frame based on the determination result.
  • The second frame is the frame that is a first predetermined number of frames after the first frame.
  • The first predetermined number of frames may be stored in the gesture detection device in advance, for example, or may be input from the outside.
  • In each frame from the frame following the first frame up to the second frame, the occupant's face orientation is not detected, but hand candidates are detected.
  • Here, the first frame is the frame in which the occupant's face orientation was detected most recently before the second frame.
  • Therefore, in each of these frames, the determination unit 40 determines whether or not at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds the predetermined range.
  • The determination unit 40 rejects the hand candidate information in each frame based on the determination result. This determination operation is the same as in the second embodiment.
  • In this way, the gesture detection device prevents the occupant's face 11, head, and the like from being detected as hand candidates even when the occupant's face orientation is temporarily not detected.
  • The third frame is a frame after the second frame.
  • In each frame from the frame following the second frame up to the third frame, the occupant's face orientation is not detected, but hand candidates are detected.
  • In these frames, the determination unit 40 rejects the hand candidate information without making a rejection determination regarding the face orientation.
  • In step S80 shown in FIG. 7, each frame from the frame following the second frame up to the third frame would be determined as "No", and the hand candidate information would not be rejected; an object other than a hand detected as a hand candidate could then be recognized as the hand 31. The gesture detection device therefore rejects the information of the hand candidates detected in each of these frames without making a rejection determination regarding the face orientation, which improves the detection accuracy of the occupant's hand 31.
  • The fourth frame is the frame that is a second predetermined number of frames after the frame following the third frame.
  • The second predetermined number of frames may be stored in the gesture detection device in advance, for example, or may be input from the outside.
  • From the frame following the third frame up to the fourth frame, the occupant's face orientation is detected continuously.
  • In these frames, hand candidates are also detected.
  • Nevertheless, the determination unit 40 rejects the hand candidate information; in other words, the determination unit 40 rejects the hand candidate information without making a rejection determination regarding the face orientation.
  • This is because, immediately after face detection resumes, the detected face orientation may not be accurate.
  • While making a gesture, the occupant's face 11 usually faces between the front direction and the direction in which the image pickup device 110 is located. It is therefore preferable to restart the rejection determination of the hand candidate information in a state where the occupant's face orientation is likely to be within the predetermined range.
  • Accordingly, the gesture detection device rejects the hand candidate information until face detection has succeeded for the second predetermined number of frames or more.
  • After that, the determination unit 40 restarts the determination as to whether or not to reject the hand candidate information. This improves the detection accuracy of the occupant's hand 31.
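  • The following sketch restates the embodiment-3 behavior as a small per-frame state machine. The frame counts and the structure are assumptions introduced to make the text concrete; they are not part of this disclosure.
```python
FIRST_PREDETERMINED = 10   # frames tolerated without face detection (assumed)
SECOND_PREDETERMINED = 5   # consecutive detections required to restart (assumed)

class RejectionGate:
    def __init__(self):
        self.frames_without_face = 0
        self.consecutive_detections = 0
        self.suspended = False   # True: reject every candidate unconditionally

    def step(self, face_detected: bool) -> str:
        if face_detected:
            self.frames_without_face = 0
            self.consecutive_detections += 1
            if self.suspended and self.consecutive_detections >= SECOND_PREDETERMINED:
                self.suspended = False   # fourth frame reached: restart determination
        else:
            self.consecutive_detections = 0
            self.frames_without_face += 1
            if self.frames_without_face > FIRST_PREDETERMINED:
                self.suspended = True    # past the second frame: stop determining
        if self.suspended:
            return "reject_unconditionally"
        return "apply_face_orientation_condition"
```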
  • The gesture detection device and the gesture detection method according to the fourth embodiment will now be described.
  • The fourth embodiment is a subordinate concept of the first embodiment.
  • The gesture detection device according to the fourth embodiment includes each configuration of the gesture detection device 101 according to the second embodiment. Descriptions of configurations and operations identical to those of the first to third embodiments are omitted.
  • FIG. 12 is a functional block diagram showing the configuration of the gesture detection device 102 according to the fourth embodiment.
  • The gesture detection device 102 includes a hand candidate detection unit 30A.
  • The hand candidate detection unit 30A in the fourth embodiment is a modification of the hand candidate detection unit 30 in the second embodiment.
  • The hand candidate detection unit 30A performs the hand candidate detection process within a detection target area set in the video.
  • The detection target area is preset, for example, at a position where the occupant performs hand gestures. Alternatively, for example, the detection target area is set at a position including the hand candidate frame 32 detected in the frame before the frame to be processed.
  • The hand candidate detection unit 30A in the fourth embodiment narrows the detection target area based on a predetermined condition regarding the face orientation. For example, the hand candidate detection unit 30A narrows the detection target area when at least one of the pitch angle and the yaw angle exceeds the predetermined range; in other words, the predetermined condition is that at least one of the pitch angle and the yaw angle exceeds the predetermined range.
  • The hand candidate detection unit 30A then performs the hand candidate detection process within the narrowed detection target area.
  • The determination of whether or not the face orientation satisfies the predetermined condition is executed by, for example, the hand candidate detection unit 30A.
  • Alternatively, the hand candidate detection unit 30A may acquire the determination result from the determination unit 40.
  • The detection target area corresponds to, for example, a region called a gesture detection region.
  • FIG. 13 is a flowchart showing the gesture detection method according to the fourth embodiment. Steps S10 to S50 are the same as in the second embodiment, and steps S80 and S90 are also the same as in the second embodiment.
  • In the hand candidate detection step, the hand candidate detection unit 30A detects the occupant's hand candidate within the detection target area.
  • FIG. 14 is a diagram showing an example of a frame to be processed.
  • In FIG. 14, the occupant is not making a hand gesture for operating the in-vehicle device 120.
  • The occupant is operating the rearview mirror 2 provided in the cabin with the hand 31 while looking at the rearview mirror 2.
  • The face detection unit 10 detects the occupant's face 11.
  • The face frame 12 is set so as to surround the face 11.
  • The detection target area 33A is set in the frame.
  • The hand candidate detection unit 30A detects the occupant's hand 31 as a hand candidate within the detection target area 33A.
  • The hand candidate frame 32 is set so as to include the hand candidate.
  • Then, whether or not the face orientation satisfies the predetermined condition is determined.
  • The face orientation of the occupant detected by the face detection unit 10 is diagonally upward, and both the pitch angle and the yaw angle representing the face orientation exceed the predetermined range.
  • The hand candidate detection unit 30A therefore determines that the face orientation satisfies the predetermined condition, and step S120 is executed.
  • In step S120, the hand candidate information is rejected.
  • In the fourth embodiment, step S130 is executed after step S120.
  • In step S130, the hand candidate detection unit 30A narrows the detection target area 33A so that the rejected hand candidate is not detected again.
  • For example, the hand candidate detection unit 30A sets a detection target area 33B obtained by reducing the size of the detection target area 33A so as not to include the hand candidate frame 32.
  • In FIG. 14, the detection target area 33B corresponds to an area in which the upper portion of the detection target area 33A has been cut off.
  • In subsequent frames, the hand candidate detection unit 30A detects hand candidates within the narrowed detection target area 33B, as sketched below.
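  • As a rough illustration of this narrowing, the helper below lowers the top edge of the detection target area to just below the rejected candidate's frame, matching the FIG. 14 example; the coordinate convention and the function itself are assumptions.
```python
def narrow_detection_area(area, rejected_box):
    """area and rejected_box are (x, y, w, h); y grows downward in the image."""
    ax, ay, aw, ah = area
    bx, by, bw, bh = rejected_box
    new_top = by + bh                       # move the upper edge below the box
    if new_top >= ay + ah:
        return area                         # nothing would remain; keep the area
    return (ax, new_top, aw, ah - (new_top - ay))
```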
  • Such a gesture detection device 102 does not detect the hand 31 operating the rearview mirror 2 as a hand in a gesture for operating the in-vehicle device 120. Specifically, the gesture detection device 102 rejects the information of the hand candidate detected from the hand 31 operating the rearview mirror 2. Further, the gesture detection device 102 narrows the detection target area 33A based on the information on the occupant's face orientation so that hand candidates are not detected in the area around the rearview mirror 2.
  • The positions and sizes of the detection target areas 33A and 33B shown in FIG. 14 are examples, and the areas are not limited to them.
  • For example, the detection target areas 33A and 33B may cover not only the central portion between the driver's seat and the passenger seat but also regions extended toward both seats (in the left-right direction).
  • The gesture detection device 102 described above executes the process of narrowing the detection target area 33A after the process of rejecting the hand candidate information.
  • However, the process of narrowing the detection target area 33A may instead be performed between steps S50 and S60, or between steps S90 and S100.
  • In that case, the hand candidate detection unit 30A narrows the detection target area 33A in the frame to be processed when the face orientation in that frame satisfies the predetermined condition.
  • The gesture detection device shown in each of the above embodiments can also be applied to a system constructed by appropriately combining a navigation device, a communication terminal, a server, and the functions of applications installed in them.
  • The navigation device includes, for example, a PND (Portable Navigation Device) and the like.
  • The communication terminal includes, for example, a mobile terminal such as a mobile phone, a smartphone, or a tablet.
  • FIG. 15 is a block diagram showing the configuration of the gesture detection device 101 and the devices that operate in connection with it in the fifth embodiment.
  • In the fifth embodiment, the gesture detection device 101 and the communication device 130 are provided in the server 300.
  • The gesture detection device 101 acquires the image taken by the image pickup device 110 provided in the vehicle 1 via the communication device 140 and the communication device 130.
  • The gesture detection device 101 acquires information on the occupant's face orientation detected based on the image.
  • The gesture detection device 101 detects hand candidates based on the image.
  • The gesture detection device 101 rejects the hand candidate information based on the predetermined condition regarding the face orientation.
  • The gesture detection device 101 identifies a hand candidate that has not been rejected as the hand 31 constituting the occupant's gesture. Based on the gesture by the occupant's hand 31 identified by the gesture detection device 101, operation processing of the in-vehicle device 120 and the like is executed.
  • A part of the gesture detection device 101 may be provided in the server 300 while the other part is provided in the vehicle 1 in a distributed manner. The same effects are obtained when the gesture detection device 100 shown in the first embodiment is provided in the server 300.
  • The embodiments can be freely combined, and each embodiment can be modified or omitted as appropriate.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided is a gesture detection device which accurately detects the hand in a gesture by a vehicle occupant. The gesture detection device comprises a face information acquisition unit, a hand candidate detection unit, and a determination unit. The face information acquisition unit acquires information pertaining to the facial orientation of the vehicle occupant. The facial orientation is detected on the basis of a video taken by an imaging device provided in a vehicle. The hand candidate detection unit detects a hand candidate, which is a candidate for the vehicle occupant's hand, on the basis of the video. The determination unit dismisses, on the basis of a predetermined condition related to the facial orientation, hand candidate information such that the hand candidate is not detected as the vehicle occupant's hand which is to be detected in the gesture by the vehicle occupant.

Description

Gesture detection device and gesture detection method
 This disclosure relates to a gesture detection device and a gesture detection method.
 Regarding the operation of in-vehicle devices by a vehicle occupant, a system has been proposed in which, by detecting gestures of the occupant's hand, the occupant operates an in-vehicle device without touching it. For example, a gesture detection device detects the occupant's hand based on an image taken by a camera or the like provided in the vehicle. Since the in-vehicle device operates according to the gesture of the occupant's hand, accuracy is required in the detection of the occupant's hand by the gesture detection device. Patent Document 1 proposes a control device that detects information about a user's hand only from a gesture area set based on the area of the driver's face.
Patent Document 1: Japanese Unexamined Patent Publication No. 2014-119295
 A gesture detection device detects the occupant's hand based on the image. Therefore, depending on the state of the image, the gesture detection device may detect an object other than a hand as a hand.
 The present disclosure solves the above-mentioned problem, and its object is to provide a gesture detection device that accurately detects the hand in an occupant's gesture.
 The gesture detection device according to the present disclosure includes a face information acquisition unit, a hand candidate detection unit, and a determination unit. The face information acquisition unit acquires information on the face orientation of the occupant. The face orientation is detected based on the image captured by the image pickup device provided in the vehicle. The hand candidate detection unit detects a hand candidate, which is a candidate for the occupant's hand, based on the image. The determination unit rejects the hand candidate information, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected.
 According to the present disclosure, a gesture detection device that accurately detects the hand in an occupant's gesture is provided.
 The purposes, features, aspects, and advantages of this disclosure will become more apparent from the following detailed description and the accompanying drawings.
FIG. 1 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 1.
FIG. 2 is a diagram showing an example of the configuration of the processing circuit included in the gesture detection device.
FIG. 3 is a diagram showing another example of the configuration of the processing circuit included in the gesture detection device.
FIG. 4 is a flowchart showing the gesture detection method in Embodiment 1.
FIG. 5 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 2.
FIG. 6 is a diagram showing an example of the face orientation of an occupant in Embodiment 2.
FIG. 7 is a flowchart showing the gesture detection method in Embodiment 2.
FIG. 8 is a diagram showing an example of a frame to be processed.
FIG. 9 is a diagram showing an example of a frame to be processed.
FIG. 10 is a diagram showing an example of a frame to be processed.
FIG. 11 is a diagram showing the relationship from the first frame to the fourth frame in Embodiment 3.
FIG. 12 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 4.
FIG. 13 is a flowchart showing the gesture detection method in Embodiment 4.
FIG. 14 is a diagram showing an example of a frame to be processed.
FIG. 15 is a block diagram showing the configuration of the gesture detection device in Embodiment 5 and the devices that operate in connection with it.
 <Embodiment 1>
 FIG. 1 is a functional block diagram showing the configuration of the gesture detection device 100 according to the first embodiment. FIG. 1 also shows an image pickup device 110 and a face detection unit 10 as devices that operate in connection with the gesture detection device 100.
 The image pickup device 110 is provided in the vehicle. The image pickup device 110 captures an image of an occupant inside the vehicle.
 The face detection unit 10 detects the face orientation of the occupant based on the image. The face orientation corresponds to, for example, the direction in which the front of the occupant's face is pointing, the direction of the line of sight, and the like.
 The gesture detection device 100 detects the occupant's hand gesture based on the image taken by the image pickup device 110.
 The gesture detection device 100 includes a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
 The face information acquisition unit 20 acquires information on the face orientation of the occupant from the face detection unit 10.
 The hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image taken by the image pickup device 110. The hand candidate detection unit 30 detects a hand candidate by, for example, matching the shape pattern of an object in the image (luminance distribution information) against a predetermined hand shape pattern.
 The determination unit 40 rejects the hand candidate information based on a predetermined condition regarding the face orientation. The gesture detection device 100 does not identify a rejected hand candidate as a hand constituting the occupant's gesture.
 FIG. 2 is a diagram showing an example of the configuration of the processing circuit 90 included in the gesture detection device 100. Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 is realized by the processing circuit 90. That is, the processing circuit 90 has the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40.
 When the processing circuit 90 is dedicated hardware, the processing circuit 90 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a circuit combining these. The functions of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 may be realized individually by a plurality of processing circuits, or may be realized collectively by one processing circuit.
 FIG. 3 is a diagram showing another example of the configuration of the processing circuit included in the gesture detection device 100. The processing circuit includes a processor 91 and a memory 92. Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 is realized by the processor 91 executing a program stored in the memory 92. For example, each function is realized when the processor 91 executes software or firmware written as a program. Thus, the gesture detection device 100 has a memory 92 that stores the program and a processor 91 that executes the program.
 The program describes a function by which the gesture detection device 100 acquires information on the face orientation of the occupant, detected based on the image captured by the image pickup device 110 provided in the vehicle. The program also describes a function by which the gesture detection device 100 detects a hand candidate, which is a candidate for the occupant's hand, based on the image. The program further describes a function of rejecting the hand candidate information, based on the predetermined condition regarding the face orientation, so that the hand candidate is not detected as a hand in the occupant's gesture to be detected. In this way, the program causes a computer to execute the procedures or methods of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40.
 The processor 91 is, for example, a CPU (Central Processing Unit), an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like. The memory 92 is, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory). Alternatively, the memory 92 may be a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD, or any storage medium to be used in the future.
 Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 may be realized partly by dedicated hardware and partly by software or firmware. In this way, the processing circuit realizes each of the above functions by hardware, software, firmware, or a combination thereof.
 FIG. 4 is a flowchart showing the gesture detection method in the first embodiment. Prior to step S1 shown in FIG. 4, the face detection unit 10 detects the face orientation of the occupant based on the image taken by the image pickup device 110 provided in the vehicle.
 In step S1, the face information acquisition unit 20 acquires information on the face orientation of the occupant from the face detection unit 10.
 In step S2, the hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image captured by the image pickup device 110.
 In step S3, the determination unit 40 determines whether or not to reject the hand candidate information based on a predetermined condition regarding the face orientation. The determination unit 40 rejects the hand candidate information according to the determination result. A rejected hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected. In other words, the gesture detection device 100 does not identify the rejected hand candidate as a hand constituting the occupant's gesture.
 ステップS3にて、判定部40は、顔向きに関する予め定められた条件に基づいて、手候補の情報を棄却するか否かを判定する。判定部40は、その判定結果に従い、手候補の情報を棄却する。棄却された手候補は、検出対象である乗員のジェスチャにおける乗員の手として検出されない。言い換えると、ジェスチャ検出装置100は、棄却された手候補を、乗員のジェスチャを構成する手として識別しない。 In step S3, the determination unit 40 determines whether or not to reject the hand candidate information based on a predetermined condition regarding the face orientation. The determination unit 40 rejects the hand candidate information according to the determination result. The rejected hand candidate is not detected as a occupant's hand in the occupant's gesture to be detected. In other words, the gesture detection device 100 does not identify the rejected hand candidate as a hand constituting the occupant's gesture.
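 Steps S1 to S3 can be summarized as the following per-frame control flow. This is a sketch only; the three callables stand in for the face detection unit 10, the hand candidate detection unit 30, and the determination unit 40, and are not APIs defined by this disclosure.

```python
# Hypothetical per-frame flow for steps S1-S3. The callables are injected
# stand-ins for the device's functional blocks, kept abstract on purpose.
from typing import Callable, Optional, Sequence

def process_frame(frame: object,
                  get_face_orientation: Callable[[object], Optional[object]],
                  detect_hand_candidates: Callable[[object], Sequence[tuple]],
                  condition_met: Callable[[object], bool]) -> list[tuple]:
    face = get_face_orientation(frame)                # S1: acquire face orientation
    candidates = list(detect_hand_candidates(frame))  # S2: detect hand candidates
    if face is not None and condition_met(face):      # S3: predetermined condition holds
        return []       # rejected: not treated as the occupant's gesture hand
    return candidates   # passed on as hands constituting the gesture
```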
 To summarize the above, the gesture detection device 100 according to the first embodiment includes the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40. The face information acquisition unit 20 acquires information on the face orientation of the occupant. The face orientation is detected based on video captured by the image pickup device 110 provided in the vehicle. The hand candidate detection unit 30 detects, based on that video, hand candidates that are candidates for the occupant's hand. The determination unit 40 rejects the information of a hand candidate, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as the occupant's hand in the occupant's gesture that is the detection target.
 Such a gesture detection device 100 accurately detects the hand in the occupant's gesture.
 The gesture detection method according to the first embodiment acquires information on the face orientation of the occupant detected based on video captured by the image pickup device 110 provided in the vehicle, detects hand candidates that are candidates for the occupant's hand based on that video, and rejects the information of a hand candidate, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as a hand in the occupant's gesture that is the detection target.
 According to such a gesture detection method, the hand in the occupant's gesture is accurately detected.
 <Embodiment 2>
 The gesture detection device and the gesture detection method according to the second embodiment will be described. The second embodiment is a subordinate concept of the first embodiment, and the gesture detection device according to the second embodiment includes each component of the gesture detection device 100 according to the first embodiment. Descriptions of configurations and operations that are the same as in the first embodiment are omitted.
 FIG. 5 is a functional block diagram showing the configuration of the gesture detection device 101 according to the second embodiment. FIG. 5 also shows an image pickup device 110 and an in-vehicle device 120 as devices that operate in connection with the gesture detection device 101.
 The image pickup device 110 is provided at the front center of the vehicle interior. It captures the interior at a wide angle, covering both the driver's seat and the passenger seat at once. The image pickup device 110 is, for example, a camera that detects infrared light or a camera that detects visible light. The gesture detection device 101 according to the second embodiment detects a hand gesture of a vehicle occupant based on the video captured by the image pickup device 110. The gesture is one for operating the in-vehicle device 120, for example an air conditioner or an audio system. A gesture detected by the gesture detection device 101 triggers, for example, temperature adjustment of the air conditioner or volume adjustment of the audio system. However, the in-vehicle device 120 is not limited to an air conditioner and an audio system.
 The gesture detection device 101 includes a video acquisition unit 50, a face detection unit 10, a storage unit 60, a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
 The video acquisition unit 50 acquires the video captured by the image pickup device 110 frame by frame.
 The face detection unit 10 detects the occupant's face and face orientation for each frame of the video. For example, the face detection unit 10 detects facial parts of the occupant and detects the face orientation based on the positions of those facial parts; the face orientation detected in this way is the direction facing the front of the occupant's face. Alternatively, the face detection unit 10 may detect the occupant's line of sight and detect the face orientation based on it; the face orientation detected in this way is the direction in which the line of sight points. That is, the face orientation in the second embodiment includes at least one of the direction facing the front of the occupant's face and the direction of the line of sight.
 FIG. 6 is a diagram showing an example of the occupant's face orientation in the second embodiment. The face orientation is expressed by a pitch angle, a yaw angle, and a roll angle. For example, when the occupant's face points straight ahead of the vehicle, the pitch, yaw, and roll angles are all 0 degrees. The face detection unit 10 detects at least the pitch angle and the yaw angle among the three. The face detection unit 10 in the second embodiment further detects the head position in the video; the head position detected in the second embodiment is the position in the height direction. In this way, the face detection unit 10 detects the occupant's face orientation and head position. The head position can also be read as the face position.
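 For the sketches in this description, the per-frame face result can be carried in a small record like the one below; the field names and units (degrees, pixels) are assumptions made for illustration, not part of this disclosure.

```python
# Hypothetical container for the per-frame face result (units assumed).
from dataclasses import dataclass

@dataclass
class FaceInfo:
    pitch_deg: float  # pitch angle; 0 when the face points straight ahead
    yaw_deg: float    # yaw angle; 0 when the face points straight ahead
    roll_deg: float   # roll angle; 0 when the face points straight ahead
    head_y_px: float  # head position in the image, height direction
```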
 When the face detection unit 10 detects the face orientation, the storage unit 60 stores the face orientation and head position information for each frame.
 The face information acquisition unit 20 acquires the face orientation information for each frame. When the occupant's face orientation is detected in the frame being processed, the face information acquisition unit 20 acquires the face orientation information of that frame. When it is not detected, the face information acquisition unit 20 operates as follows. Here, a frame preceding the frame being processed is referred to as a first frame, and the frame being processed as a second frame: the occupant's face orientation is detected in the first frame, while the occupant's face is not detected in the second frame. In this case, in processing the second frame, the face information acquisition unit 20 acquires the face orientation and head position information of the first frame from the storage unit 60.
 The second frame is within a predetermined number of frames from the first frame. The predetermined number of frames may, for example, be stored in the gesture detection device 101 or input from outside. The first frame is preferably the most recent frame, counting back from the second frame, in which the occupant's face orientation was detected.
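 Continuing the FaceInfo sketch above, the storage unit 60 lookup can be pictured as follows; MAX_LOOKBACK_FRAMES is a hypothetical name and value for the predetermined number of frames.

```python
# Hypothetical sketch of the storage unit 60: store FaceInfo per frame and
# serve the most recent entry within the allowed look-back window.
from typing import Optional

MAX_LOOKBACK_FRAMES = 10  # assumed value of the predetermined frame count

class FaceHistory:
    def __init__(self) -> None:
        self._by_frame: dict[int, FaceInfo] = {}

    def store(self, frame_idx: int, info: FaceInfo) -> None:
        self._by_frame[frame_idx] = info

    def latest_within(self, frame_idx: int) -> Optional[FaceInfo]:
        """Most recent stored result, at most MAX_LOOKBACK_FRAMES back."""
        for past in range(frame_idx - 1, frame_idx - 1 - MAX_LOOKBACK_FRAMES, -1):
            if past in self._by_frame:
                return self._by_frame[past]
        return None
```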
 The hand candidate detection unit 30 detects hand candidates, that is, candidates for the occupant's hand, for each frame of the video captured by the image pickup device 110. The hand candidate detection unit 30 detects hand candidates by, for example, matching the shape pattern of an object in the video (information on its luminance distribution) against a predetermined hand-shape pattern, that is, by pattern matching. The hand shape to be detected may be either an open hand or a closed hand, and may also be, for example, a hand shape indicating a number, a hand shape indicating a direction, or a hand shape indicating the occupant's intention (such as OK or Good).
 The determination unit 40 rejects the information of a hand candidate for each frame based on a predetermined condition regarding the face orientation. The predetermined condition may, for example, be stored in the gesture detection device 101 or input from outside; an example is described later. "Rejecting" may include the determination unit 40 identifying the hand candidate as something other than a hand, or invalidating the information of the hand candidate. In either case, a rejected hand candidate is not detected as the occupant's hand in the occupant's gesture that is the detection target. In other words, the gesture detection device 101 does not identify a rejected hand candidate as a hand constituting the occupant's gesture. On the other hand, the gesture detection device 101 identifies a hand candidate not rejected by the determination unit 40 as a hand constituting the occupant's gesture. Based on the gesture made by the occupant's hand identified by the gesture detection device 101, operation processing of the in-vehicle device 120 and the like is executed. In the functional block diagram shown in FIG. 5, the functional unit that performs processing between the determination unit 40 and the in-vehicle device 120 is not shown.
 The determination unit 40 in the second embodiment rejects the information of a hand candidate when at least one of the pitch angle and the yaw angle representing the face orientation exceeds a predetermined range. That is, the predetermined condition regarding the face orientation in the second embodiment is that at least one of the pitch angle and the yaw angle representing the face orientation exceeds a predetermined range. The range spans, for example, from the angle corresponding to the frontal direction of the face to the angle corresponding to the oblique direction in which the image pickup device 110 is located, because when an occupant makes a gesture, the occupant's face usually points somewhere between straight ahead and the image pickup device 110. The range is predetermined for each of the pitch angle and the yaw angle. The predetermined condition regarding the face orientation may instead be that at least one of the pitch angle and the yaw angle exceeds a predetermined threshold value.
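 The pitch/yaw condition can be pictured as the following check; the angle ranges are illustrative assumptions, not values from this disclosure.

```python
# Hypothetical ranges spanning from the frontal direction toward the
# camera's oblique direction; real values would be set per installation.
PITCH_RANGE_DEG = (-15.0, 20.0)
YAW_RANGE_DEG = (-10.0, 40.0)

def should_reject(face: FaceInfo) -> bool:
    """True when pitch or yaw leaves its predetermined range, i.e. the
    frame's hand candidates are to be rejected."""
    pitch_ok = PITCH_RANGE_DEG[0] <= face.pitch_deg <= PITCH_RANGE_DEG[1]
    yaw_ok = YAW_RANGE_DEG[0] <= face.yaw_deg <= YAW_RANGE_DEG[1]
    return not (pitch_ok and yaw_ok)
```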
 When the first frame and the second frame have the relationship described above, the determination unit 40 rejects the information of the hand candidate in the second frame based on a condition regarding the face orientation and the head position in the first frame. For example, the determination unit 40 rejects the hand candidate information of the second frame when at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds a predetermined range. That is, the predetermined condition regarding the face orientation is that at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds a predetermined range.
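 For this fallback case, the head position joins the check. Again a sketch under the same assumptions as above, with a hypothetical pixel range.

```python
# Hypothetical variant used when the stored first-frame result is reused:
# the head position (height direction, pixels) joins pitch and yaw.
HEAD_Y_RANGE_PX = (120.0, 360.0)  # assumed allowable head-position band

def should_reject_with_head(face: FaceInfo) -> bool:
    head_ok = HEAD_Y_RANGE_PX[0] <= face.head_y_px <= HEAD_Y_RANGE_PX[1]
    return should_reject(face) or not head_ok
```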
 The functions of the face detection unit 10, the face information acquisition unit 20, the hand candidate detection unit 30, the determination unit 40, the video acquisition unit 50, and the storage unit 60 described above are realized by the processing circuit shown in FIG. 2 or FIG. 3.
 FIG. 7 is a flowchart showing the gesture detection method according to the second embodiment.
 In step S10, the video acquisition unit 50 acquires the frame to be processed from the video captured by the image pickup device 110.
 In step S20, the face detection unit 10 detects the occupant's face, face orientation, and head position in the frame being processed.
 In step S30, the gesture detection device 101 determines whether the face orientation has been detected. If it has, step S40 is executed; if not, step S80 is executed.
 In step S40, the storage unit 60 stores the face orientation and head position information for each frame.
 In step S50, the face information acquisition unit 20 acquires the face orientation information of the frame being processed, either from the face detection unit 10 or from the storage unit 60.
 In step S60, the hand candidate detection unit 30 detects the occupant's hand candidates in the frame being processed.
 In step S70, the determination unit 40 determines whether the face orientation satisfies the predetermined condition. Here, the predetermined condition is that at least one of the pitch angle and the yaw angle exceeds the predetermined range. If at least one of them exceeds the range, that is, if the condition is satisfied, step S120 is executed. If both are within the range, that is, if the condition is not satisfied, the gesture detection method ends.
 FIG. 8 is a diagram showing an example of a frame to be processed. In FIG. 8, the occupant is making a gesture with the hand 31 to operate the in-vehicle device 120. The face detection unit 10 has detected the occupant's face 11, and a face frame 12 is set so as to surround the face 11. The hand candidate detection unit 30 has detected the occupant's hand 31 as a hand candidate, and a hand candidate frame 32 is set so as to surround it. The occupant's face is oriented straight ahead, so both the pitch angle and the yaw angle are within the predetermined ranges. The determination unit 40 determines that the face orientation does not satisfy the predetermined condition. In the case of FIG. 8, the gesture detection method therefore ends; that is, the gesture detection device 101 identifies the hand candidate as the hand 31 constituting the occupant's gesture.
 FIG. 9 is a diagram showing an example of a frame to be processed. In FIG. 9, the occupant is not making a hand gesture to operate the in-vehicle device 120 but is peering at information displayed on the center console of the vehicle, for example navigation information. The face detection unit 10 has detected the occupant's face 11, and the face frame 12 is set so as to surround the face 11. The hand candidate detection unit 30 has erroneously detected the occupant's face 11 as a hand candidate, and the hand candidate frame 32 is set so as to contain it. As in FIG. 9, when the occupant's head is shaved, the hand candidate detection unit 30 may judge the occupant's face 11 to be a closed hand 31 and detect it as a hand candidate. In FIG. 9, however, the occupant's face orientation detected by the face detection unit 10 is diagonally downward, and both the pitch angle and the yaw angle exceed the predetermined ranges. The determination unit 40 determines that the face orientation satisfies the predetermined condition, so step S120 is executed.
 In step S80, the face information acquisition unit 20 determines whether the frame being processed is within a predetermined number of frames from the most recent frame in which the occupant's face orientation was detected. If it is, that is, if this condition is satisfied, step S90 is executed; otherwise the gesture detection method ends.
 In step S90, the face information acquisition unit 20 acquires, from the storage unit 60, the face orientation and head position information of the most recent frame in which the occupant's face orientation was detected.
 In step S100, the hand candidate detection unit 30 detects the occupant's hand candidates in the frame being processed.
 In step S110, the determination unit 40 determines whether the face orientation and the head position satisfy the predetermined condition. Here, the predetermined condition is that at least one of the pitch angle, the yaw angle, and the head position exceeds the predetermined range. If at least one of them exceeds its range, step S120 is executed; if all are within their ranges, the gesture detection method ends.
 FIG. 10 is a diagram showing an example of a frame to be processed. The frame shown in FIG. 10 follows the frame shown in FIG. 9 and is within the predetermined number of frames from it. In FIG. 10, the occupant is leaning in even further to examine the information displayed on the center console of the vehicle in detail. The face detection unit 10 has failed to detect the occupant's face 11, so no face frame 12 is set. The hand candidate detection unit 30 has erroneously detected the occupant's head as a hand candidate, and the hand candidate frame 32 is set so as to contain it. Because FIG. 10 is within the predetermined number of frames from the frame of FIG. 9, the face information acquisition unit 20 acquires the face orientation and head position information of the frame of FIG. 9. The occupant's face orientation is diagonally downward, and the pitch angle and the yaw angle exceed the predetermined ranges. The determination unit 40 determines that the face orientation satisfies the predetermined condition, so step S120 is executed.
 In step S120, the determination unit 40 rejects the information of the hand candidate. For example, the determination unit 40 identifies the hand candidate as something other than a hand, or replaces the detection result of the hand candidate with the detection result of a non-hand object. In this way, the determination unit 40 rejects the hand candidate information based on the predetermined condition regarding the face orientation. The gesture detection method then ends.
 In the gesture detection method described above, the gesture detection device 101 performs the hand candidate detection processing after the detection processing of the face 11 and the acquisition processing of the face orientation information. However, the gesture detection device 101 may execute the detection processing of the face 11 and the acquisition processing of the face orientation information after the hand candidate detection processing, or may execute the hand candidate detection processing in parallel with them.
 Next, as an example, the gesture detection method for the second frame is described for the case where the first frame and the second frame of the video have the relationship described above. Here, the face detection unit 10 succeeds in detecting the occupant's face 11 and face orientation in the first frame and fails to detect the occupant's face 11 in the second frame. The first frame is the most recent frame, counting back from the second frame, in which the occupant's face orientation was detected. The frame shown in FIG. 9 corresponds to the first frame, and the frame shown in FIG. 10 corresponds to the second frame.
 In step S10, the video acquisition unit 50 acquires the second frame of the video captured by the image pickup device 110.
 In step S20, the face detection unit 10 fails to detect the occupant's face 11 in the second frame, so neither the face orientation nor the head position is detected.
 In step S30, the gesture detection device 101 determines that the occupant's face orientation has not been detected, and step S80 is executed.
 In step S80, the face information acquisition unit 20 determines whether the second frame is within the predetermined number of frames from the first frame, the most recent frame in which the occupant's face orientation was detected. As described above, the first frame and the second frame satisfy this condition, so step S90 is executed.
 In step S90, the face information acquisition unit 20 acquires the face orientation and head position information of the first frame from the storage unit 60.
 In step S100, the hand candidate detection unit 30 detects the occupant's hand candidates in the second frame.
 In step S110, the determination unit 40 determines whether at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds its predetermined range. If at least one does, step S120 is executed; if all are within their ranges, the gesture detection method ends.
 In step S120, the determination unit 40 rejects the hand candidate information of the second frame. This completes the gesture detection method for one frame being processed; step S10 is then executed again for the next frame.
 Such a gesture detection device 101 reduces the chance that something other than the occupant's hand is identified as the hand 31. In other words, the gesture detection device 101 accurately detects the hand 31 constituting the occupant's gesture.
 When the occupant operates the in-vehicle device 120, the occupant peers at information displayed on a display device such as the vehicle's dashboard or center console to check it. In that case, the occupant's head enters the detection range for hand candidates, and the hand candidate detection unit 30 may judge the occupant's head (or face 11) to be a closed hand 31 (such as a thumbs-up hand) and detect it as a hand candidate (for example, FIG. 10). On the other hand, when the occupant makes a gesture to operate vehicle equipment, the occupant's face orientation normally falls within the range from the frontal direction of the vehicle to the oblique direction in which the image pickup device 110 is located (for example, FIG. 8); when the occupant peers at displayed information, the face orientation falls outside that range. The gesture detection device 101 according to the second embodiment rejects the information of erroneously detected hand candidates based on the predetermined condition regarding the face orientation. In other words, when the occupant's face orientation exceeds the predetermined range, the gesture detection device 101 identifies the hand candidate as something other than a hand and rejects its information. As a result, the gesture detection device 101 accurately detects the hand 31 in the occupant's gesture.
 The predetermined condition regarding the face orientation is not limited to the above. For example, the condition may combine at least one of the pitch angle, the yaw angle, and the roll angle with at least one of the head position in the lateral direction, the head position in the depth direction, and the head position in the height direction.
 The gesture detection device 101 according to the second embodiment includes the storage unit 60, which stores the face orientation information and the occupant's head position information detected for each frame of the video. When the occupant's face orientation is detected in the first frame of the video but not in the second frame, which follows the first frame, the face information acquisition unit 20 acquires the face orientation information and the head position information of the first frame from the storage unit 60. The second frame is within a predetermined number of frames (a first predetermined number of frames) from the first frame. The hand candidate detection unit 30 detects the hand candidate in the second frame. The determination unit 40 rejects the information of the hand candidate in the second frame based on, as the predetermined condition, a condition regarding the face orientation and the head position in the first frame.
 Because the pattern matching processing for face detection differs from that for hand candidate detection, the hand candidate detection unit 30 may erroneously detect the occupant's face 11, head, or the like as a hand candidate even when the face detection unit 10 has failed to detect the occupant's face orientation (for example, FIG. 10). The movement of the occupant from the posture shown in FIG. 9 to the posture shown in FIG. 10, that is, the motion of peering into the display device, is continuous and takes place in a short time. Therefore, even when the determination unit 40 rejects the hand candidate information of the frame being processed based on face orientation information from a frame close in time to it, the accuracy of the rejection determination does not deteriorate. The gesture detection device 101 according to the second embodiment prevents the occupant's face 11, head, and the like from being detected as hand candidates even while the occupant's face orientation is temporarily undetected. As a result, the detection accuracy of the occupant's hand 31 is improved.
 The first frame in the second embodiment is the most recent frame, counting back from the second frame, in which the occupant's face orientation was detected.
 Even while the occupant's face 11 is temporarily undetected, the gesture detection device 101 determines whether the most recently detected face orientation satisfies the predetermined condition. The gesture detection device 101 therefore detects the occupant's hand 31 accurately.
 <Embodiment 3>
 The gesture detection device and the gesture detection method according to the third embodiment will be described. The third embodiment is a subordinate concept of the first embodiment, and the gesture detection device according to the third embodiment includes each component of the gesture detection device 101 according to the second embodiment. Descriptions of configurations and operations that are the same as in the first or second embodiment are omitted.
 FIG. 11 is a diagram showing the relationship among the first to fourth frames in the third embodiment.
 The first frame is the first frame to be processed among the plurality of frames constituting the video. In the first frame, the occupant's face orientation is detected and a hand candidate is also detected. When the frame being processed is the first frame, the determination unit 40 determines whether at least one of the pitch angle and the yaw angle of the face 11 in the first frame exceeds its predetermined range, and rejects the hand candidate information of the first frame based on the determination result.
 The second frame is the frame that follows the first frame by a first predetermined number of frames. The first predetermined number of frames may, for example, be stored in the gesture detection device or input from outside. In each frame from the frame following the first frame up to the second frame, the occupant's face orientation is not detected, but hand candidates are detected. The first frame is the most recent frame, counting back from the second frame, in which the occupant's face orientation was detected. When the frame being processed is any frame from the frame following the first frame up to the second frame, the determination unit 40 determines whether at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds its predetermined range, and rejects the hand candidate information of that frame based on the determination result. This determination operation is the same as in the second embodiment. The gesture detection device thus prevents the occupant's face 11, head, and the like from being detected as hand candidates even while the occupant's face orientation is temporarily undetected.
 The third frame is a frame later than the second frame. In each frame from the frame following the second frame up to the third frame, the occupant's face orientation is not detected, but hand candidates are detected. When the frame being processed is any frame from the frame following the second frame up to the third frame, the determination unit 40 rejects the hand candidate information; in other words, it rejects the information without performing the rejection determination based on the face orientation.
 When the face orientation remains undetected, the number of frames between the first frame and the frame being processed grows beyond the first predetermined number of frames. In step S80 shown in FIG. 7, each frame from the frame following the second frame up to the third frame would therefore be judged "No", the hand candidate information would not be rejected, and a non-hand object detected as a hand candidate could be recognized as the hand 31. The gesture detection device therefore rejects the hand candidate information detected in each frame from the frame following the second frame up to the third frame without performing the face-orientation rejection determination. As a result, the detection accuracy of the occupant's hand 31 is improved.
 The fourth frame is the frame that follows the frame after the third frame by a second predetermined number of frames. The second predetermined number of frames may, for example, be stored in the gesture detection device or input from outside. The occupant's face orientation is detected continuously from the frame following the third frame up to the fourth frame, and hand candidates are also detected. When the frame being processed is any frame from the frame following the third frame up to the frame immediately before the fourth frame, the determination unit 40 rejects the hand candidate information; in other words, it rejects the information without performing the rejection determination based on the face orientation.
 When the face orientation is detected again after a period in which it was not detected, the newly detected face orientation may not be accurate. Moreover, when an occupant makes a gesture, the occupant's face 11 usually points somewhere between straight ahead and the image pickup device 110. It is therefore preferable to resume the rejection determination of hand candidate information in a state where the occupant's face orientation is likely to fall within the predetermined range. The gesture detection device according to the third embodiment keeps rejecting the hand candidate information until face detection has succeeded for at least the second predetermined number of frames.
 When the frame being processed is the fourth frame, the determination unit 40 resumes the determination of whether to reject the hand candidate information. This improves the detection accuracy of the occupant's hand 31.
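 The first- through fourth-frame behaviour can be pictured as the small state machine below, reusing the FaceInfo and should_reject sketches above. FIRST_N and SECOND_N stand for the first and second predetermined frame counts; their values, and the exact off-by-one placement of the fourth frame, are illustrative assumptions.

```python
# Hypothetical state machine for the third embodiment. Rejection is
# unconditional once the face has been undetected for more than FIRST_N
# frames (up to the third frame) and again until SECOND_N consecutive
# detections have accumulated (the fourth frame), where judgment resumes.
from typing import Optional

FIRST_N = 10   # assumed first predetermined frame count
SECOND_N = 5   # assumed second predetermined frame count

class RejectionPolicy:
    def __init__(self) -> None:
        self.frames_since_face: Optional[int] = None  # None until first detection
        self.consecutive_faces = 0
        self.last_face: Optional[FaceInfo] = None
        self.awaiting_streak = False  # True between the third and fourth frames

    def reject(self, face: Optional[FaceInfo]) -> bool:
        """True when this frame's hand candidates must be rejected."""
        if face is not None:
            if self.frames_since_face is not None and self.frames_since_face > FIRST_N:
                self.awaiting_streak = True  # re-detection after a long gap
            self.frames_since_face = 0
            self.consecutive_faces += 1
            self.last_face = face
            if self.awaiting_streak:
                if self.consecutive_faces < SECOND_N:
                    return True               # before the fourth frame: reject
                self.awaiting_streak = False  # fourth frame: resume judgment
            return should_reject(face)        # normal per-frame judgment
        # Face orientation not detected in this frame.
        self.consecutive_faces = 0
        if self.frames_since_face is None:
            return True                       # never detected yet: reject
        self.frames_since_face += 1
        if self.frames_since_face <= FIRST_N and self.last_face is not None:
            return should_reject(self.last_face)  # reuse the stored result
        return True                           # up to the third frame: reject
```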
 The above functions are realized by the processing circuit shown in FIG. 2 or FIG. 3.
 <Embodiment 4>
 The gesture detection device and the gesture detection method according to the fourth embodiment will be described. The fourth embodiment is a subordinate concept of the first embodiment. The gesture detection device according to the fourth embodiment includes each component of the gesture detection device 101 according to the second embodiment. Descriptions of configurations and operations that are the same as in any of the first to third embodiments are omitted.
 FIG. 12 is a functional block diagram showing the configuration of the gesture detection device 102 according to the fourth embodiment. The gesture detection device 102 includes a hand candidate detection unit 30A, which is a modification of the hand candidate detection unit 30 in the second embodiment.
 The hand candidate detection unit 30A performs the hand candidate detection processing within a detection target region set in the video. In normal hand candidate detection, the detection target region is, for example, preset at a position where the occupant makes hand gestures, or set at a position containing the hand candidate frame 32 detected in a frame preceding the frame being processed. The hand candidate detection unit 30A in the fourth embodiment narrows this detection target region based on a predetermined condition regarding the face orientation: for example, when at least one of the pitch angle and the yaw angle exceeds its predetermined range, the hand candidate detection unit 30A narrows the detection target region. In other words, the predetermined condition is that at least one of the pitch angle and the yaw angle exceeds a predetermined range. The hand candidate detection unit 30A then performs the hand candidate detection processing within the narrowed detection target region.
 The determination of whether the face orientation satisfies the predetermined condition is executed, for example, by the hand candidate detection unit 30A; alternatively, the hand candidate detection unit 30A may acquire the determination result from the determination unit 40. The detection target region corresponds, for example, to what is called a gesture detection region.
 FIG. 13 is a flowchart showing the gesture detection method according to the fourth embodiment. Steps S10 to S50, S80, and S90 are the same as in the second embodiment.
 In steps S60 and S100, the hand candidate detection unit 30A detects the occupant's hand candidates within the detection target region. FIG. 14 is a diagram showing an example of a frame to be processed. In FIG. 14, the occupant is not making a hand gesture to operate the in-vehicle device 120; the occupant is adjusting the rear-view mirror 2 provided in the cabin with the hand 31 while looking at the mirror. The face detection unit 10 has detected the occupant's face 11, and the face frame 12 is set so as to surround the face 11. A detection target region 33A is set within the frame. The hand candidate detection unit 30A has detected the occupant's hand 31 as a hand candidate within the detection target region 33A, and the hand candidate frame 32 is set so as to contain it.
 In steps S70 and S110, the determination unit 40 determines whether the face orientation satisfies the predetermined condition. In FIG. 14, the occupant's face orientation detected by the face detection unit 10 is diagonally upward, and both the pitch angle and the yaw angle indicating the face orientation exceed the predetermined ranges. The hand candidate detection unit 30A determines that the face orientation satisfies the predetermined condition, and step S120 is executed.
 In step S120, the hand candidate information is rejected. In the fourth embodiment, step S130 is executed after step S120.
 In step S130, the hand candidate detection unit 30A narrows the detection target region 33A so that the rejected hand candidate is no longer detected. For example, the hand candidate detection unit 30A sets a detection target region 33B, reduced in size from the detection target region 33A so as to exclude the hand candidate frame 32. Here, the detection target region 33B corresponds to the detection target region 33A with its upper portion removed. In the gesture detection processing for the next frame, the hand candidate detection unit 30A detects hand candidates within the narrowed detection target region 33B.
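 The narrowing of the region 33A into 33B can be pictured as trimming the region from the top until it no longer contains the rejected candidate box; the (x, y, w, h) pixel format is an assumption made for illustration.

```python
# Hypothetical sketch of step S130: shrink the detection target region so
# the rejected hand-candidate box (e.g. around the rear-view mirror) falls
# outside it, as region 33A is cut down to region 33B in FIG. 14.
def narrow_region(region: tuple[int, int, int, int],
                  rejected_box: tuple[int, int, int, int]) -> tuple[int, int, int, int]:
    rx, ry, rw, rh = region
    bx, by, bw, bh = rejected_box
    new_top = by + bh            # first image row below the rejected box
    if new_top <= ry:            # box already above the region: unchanged
        return region
    cut = min(new_top - ry, rh)  # never cut more than the region's height
    return (rx, ry + cut, rw, rh - cut)
```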
 Such a gesture detection device 102 does not detect the hand 31 operating the rear-view mirror 2 as a hand in a gesture for operating the in-vehicle device 120. More specifically, the gesture detection device 102 rejects the information of the hand candidate detected from the hand 31 operating the rear-view mirror 2, and further narrows the detection target region 33A, based on the information on the occupant's face orientation, so that hand candidates are not detected in the region around the rear-view mirror 2.
 The positions and sizes of the detection target regions 33A and 33B shown in FIG. 14 are examples and are not limiting. For example, the detection target regions 33A and 33B may extend not only over the central portion between the driver's seat and the passenger seat but also toward both seats (in the left-right direction).
 The gesture detection device 102 described above narrows the detection target region 33A after rejecting the hand candidate information. However, the narrowing may instead take place between steps S50 and S60 or between steps S90 and S100; in that case, the hand candidate detection unit 30A narrows the detection target region 33A of the frame being processed when the face orientation of that frame satisfies the predetermined condition.
 <Embodiment 5>
 The gesture detection devices shown in the above embodiments can also be applied to a system constructed by appropriately combining a navigation device, a communication terminal, a server, and the functions of applications installed on them. Here, the navigation device includes, for example, a PND (Portable Navigation Device), and the communication terminal includes, for example, a mobile terminal such as a mobile phone, a smartphone, or a tablet.
 FIG. 15 is a block diagram showing the configuration of the gesture detection device 101 and the devices operating in connection with it in the fifth embodiment.
 The gesture detection device 101 and a communication device 130 are provided in a server 300. The gesture detection device 101 acquires the video captured by the image pickup device 110 provided in the vehicle 1 via a communication device 140 and the communication device 130. The gesture detection device 101 acquires the information on the occupant's face orientation detected based on that video, detects hand candidates based on the video, and rejects the information of a hand candidate based on the predetermined condition regarding the face orientation. The gesture detection device 101 identifies a hand candidate that has not been rejected as the hand 31 constituting the occupant's gesture. Based on the gesture made by the occupant's hand 31 identified by the gesture detection device 101, operation processing of the in-vehicle device 120 and the like is executed.
 Arranging the gesture detection device 101 in the server 300 in this way makes it possible to simplify the configuration of the devices mounted on the vehicle 1.
 Alternatively, the functions or components of the gesture detection device 101 may be distributed, with some provided in the server 300 and others in the vehicle 1. The same effect is obtained when the gesture detection device 100 shown in the first embodiment is provided in the server 300.
 In the present disclosure, the embodiments may be freely combined, and each embodiment may be modified or omitted as appropriate.
 Although the present disclosure has been described in detail, the above description is in all aspects illustrative and not restrictive. Countless variations not illustrated here can be envisioned.
 1 vehicle, 2 rear-view mirror, 10 face detection unit, 11 face, 12 face frame, 20 face information acquisition unit, 30 hand candidate detection unit, 30A hand candidate detection unit, 31 hand, 32 hand candidate frame, 33A detection target region, 33B detection target region, 40 determination unit, 50 video acquisition unit, 60 storage unit, 100 gesture detection device, 101 gesture detection device, 102 gesture detection device, 110 image pickup device, 120 in-vehicle device.

Claims (6)

  1.  A gesture detection device comprising:
      a face information acquisition unit that acquires information on a face orientation of an occupant, the face orientation being detected based on an image captured by an image pickup device provided in a vehicle;
      a hand candidate detection unit that detects, based on the image, a hand candidate that is a candidate for the occupant's hand; and
      a determination unit that, based on a predetermined condition regarding the face orientation, rejects the information of the hand candidate so that the hand candidate is not detected as the occupant's hand in a gesture of the occupant to be detected.
  2.  The gesture detection device according to claim 1, further comprising a storage unit that stores, for each frame of the image, the information on the face orientation of the occupant and information on a head position of the occupant, wherein,
      when the face orientation of the occupant is detected in a first frame of the image and the face orientation of the occupant is not detected in a second frame that is later than the first frame and within a first predetermined number of frames from the first frame:
      the face information acquisition unit acquires the information on the face orientation and the information on the head position in the first frame from the storage unit;
      the hand candidate detection unit detects the hand candidate in the second frame; and
      the determination unit rejects the information of the hand candidate in the second frame based on, as the predetermined condition, a condition regarding the face orientation and the head position in the first frame.
  3.  The gesture detection device according to claim 2, wherein the first frame is the most recent frame, going back from the second frame, in which the face orientation of the occupant was detected.
  4.  The gesture detection device according to claim 3, wherein,
      when the face orientation of the occupant is continuously not detected from the frame following the first frame through the second frame up to a third frame, and the face orientation of the occupant is continuously detected from the frame following the third frame up to a fourth frame:
      the determination unit rejects the information of the hand candidate in each frame from the frame following the third frame to the frame immediately preceding the fourth frame, and resumes, in the fourth frame, the determination of whether to reject the information of the hand candidate;
      the second frame is the frame that is the first predetermined number of frames after the first frame; and
      the fourth frame is the frame that is a second predetermined number of frames after the third frame.
  5.  The gesture detection device according to claim 1, wherein the hand candidate detection unit
      narrows, based on the predetermined condition regarding the face orientation, a detection target area for the hand candidate set in the image, and
      detects the hand candidate in the detection target area.
  6.  A gesture detection method comprising:
      acquiring information on a face orientation of an occupant, the face orientation being detected based on an image captured by an image pickup device provided in a vehicle;
      detecting, based on the image, a hand candidate that is a candidate for the occupant's hand; and
      rejecting, based on a predetermined condition regarding the face orientation, the information of the hand candidate so that the hand candidate is not detected as the occupant's hand in a gesture of the occupant to be detected.
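The frame-window behavior of claims 2 to 4 amounts to a small state machine: fall back to the stored face information during a short loss of face detection, reject everything during a long loss, and resume the ordinary judgment only after the face has been re-detected for a run of consecutive frames. The sketch below is one possible reading of that logic; the parameters n1 and n2 (standing in for the first and second predetermined numbers of frames), their default values, and the FaceInfo structure are illustrative assumptions, not values fixed by the claims.

```python
# One possible reading of the frame-window logic in claims 2 to 4.
# n1, n2, and FaceInfo are hypothetical stand-ins; the claims fix the
# behavior, not concrete values or data layouts.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FaceInfo:
    yaw: float                 # face orientation, e.g. yaw angle in degrees
    head_position: tuple       # head position in image coordinates


class HandCandidateGate:
    def __init__(self, n1: int = 5, n2: int = 3):
        self.n1 = n1                    # grace period after the face is lost
        self.n2 = n2                    # settling period after the face returns
        self.last_face: Optional[FaceInfo] = None   # per-frame storage (claim 2)
        self.lost_frames = 0            # consecutive frames without a face
        self.long_loss = False          # loss lasted beyond the grace period
        self.recovered_frames = 0       # consecutive frames since the face returned

    def reject(self, face: Optional[FaceInfo],
               violates: Callable[[FaceInfo], bool]) -> bool:
        """Per-frame decision: True means the hand candidate is rejected."""
        if face is None:
            self.lost_frames += 1
            self.recovered_frames = 0
            if self.lost_frames > self.n1:
                self.long_loss = True
                return True             # lost beyond n1 frames: always reject
            # within n1 frames of the loss: judge with the stored face info
            # from the most recent detected frame (claims 2 and 3)
            return self.last_face is None or violates(self.last_face)
        # face detected in this frame
        self.lost_frames = 0
        self.last_face = face
        if self.long_loss:
            self.recovered_frames += 1
            if self.recovered_frames < self.n2:
                return True             # frames before the fourth frame: reject (claim 4)
            self.long_loss = False      # n2 consecutive detections: resume judgment
            self.recovered_frames = 0
        return violates(face)           # ordinary per-frame judgment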
PCT/JP2020/020828 2020-05-27 2020-05-27 Gesture detection device and gesture detection method WO2021240668A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/020828 WO2021240668A1 (en) 2020-05-27 2020-05-27 Gesture detection device and gesture detection method

Publications (1)

Publication Number Publication Date
WO2021240668A1

Family

ID=78723081

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/020828 WO2021240668A1 (en) 2020-05-27 2020-05-27 Gesture detection device and gesture detection method

Country Status (1)

Country Link
WO (1) WO2021240668A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014048937A (en) * 2012-08-31 2014-03-17 Omron Corp Gesture recognition device, control method thereof, display equipment, and control program
JP2014197252A (en) * 2013-03-29 2014-10-16 パナソニック株式会社 Gesture operation apparatus, program thereof, and vehicle mounted with gesture operation apparatus
WO2018122891A1 (en) * 2016-12-26 2018-07-05 三菱電機株式会社 Touch panel input device, touch gesture determination device, touch gesture determination method, and touch gesture determination program
JP2018528536A (en) * 2015-08-31 2018-09-27 エスアールアイ インターナショナルSRI International Method and system for monitoring driving behavior
WO2019229938A1 (en) * 2018-05-31 2019-12-05 三菱電機株式会社 Image processing device, image processing method, and image processing system
JP2019536673A (en) * 2017-08-10 2019-12-19 ペキン センスタイム テクノロジー ディベロップメント カンパニー リミテッド Driving state monitoring method and device, driver monitoring system, and vehicle

Similar Documents

Publication Title
CN108958467B (en) Apparatus and method for controlling display of hologram, vehicle system
US20050025345A1 (en) Non-contact information input device
US20120148117A1 (en) System and method for facial identification
JP6513321B2 (en) Vehicle imaging control device, driver monitoring device, and vehicle imaging control method
JP6589796B2 (en) Gesture detection device
JPWO2007043452A1 (en) On-vehicle imaging device and imaging movable range measurement method of on-vehicle camera
JP7109649B2 (en) Arousal level estimation device, automatic driving support device, and arousal level estimation method
WO2021240668A1 (en) Gesture detection device and gesture detection method
JP2022143854A (en) Occupant state determination device and occupant state determination method
KR102441079B1 (en) Apparatus and method for controlling display of vehicle
US10953811B2 (en) Vehicle image controller, system including the same, and method thereof
JP7051014B2 (en) Face detection processing device and face detection processing method
JP7258262B2 (en) Adjustment device, adjustment system, display device, occupant monitoring device, and adjustment method
WO2021240671A1 (en) Gesture detection device and gesture detection method
WO2019097677A1 (en) Image capture control device, image capture control method, and driver monitoring system provided with image capture control device
JP2009113599A (en) On-vehicle input device
JP7175381B2 (en) Arousal level estimation device, automatic driving support device, and arousal level estimation method
WO2021229741A1 (en) Gesture detecting device and gesture detecting method
JP6865906B2 (en) Display control device and display control method
JP6847323B2 (en) Line-of-sight detection device and line-of-sight detection method
JP7267517B2 (en) Gesture recognition device and gesture recognition method
WO2022157880A1 (en) Hand detection device, gesture recognition device, and hand detection method
JP6956686B2 (en) Angle control device and angle control method
US20240174248A1 (en) Vehicle warning apparatus
CN115206130B (en) Parking space detection method, system, terminal and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20938446

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20938446

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP