WO2021240668A1 - Gesture detection device and gesture detection method - Google Patents

Gesture detection device and gesture detection method

Info

Publication number
WO2021240668A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
hand
occupant
face
detected
Prior art date
Application number
PCT/JP2020/020828
Other languages
French (fr)
Japanese (ja)
Inventor
Taro Kumagai
Takuya Murakami
Original Assignee
Mitsubishi Electric Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to PCT/JP2020/020828
Publication of WO2021240668A1


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/26 Navigation; Navigational instruments specially adapted for navigation in a road network
    • G01C 21/34 Route searching; Route guidance
    • G01C 21/36 Input/output arrangements for on-board computers
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G 1/00 Traffic control systems for road vehicles
    • G08G 1/16 Anti-collision systems

Definitions

  • This disclosure relates to a gesture detection device and a gesture detection method.
  • Patent Document 1 proposes a control device that detects information about a user's hand only from a gesture area set based on the area of the driver's face.
  • A gesture detection device detects the occupant's hand based on the image. Depending on the state of the image, however, the gesture detection device may detect an object other than a hand as a hand.
  • The present disclosure solves the above-mentioned problem, and its object is to provide a gesture detection device that accurately detects the hand in an occupant's gesture.
  • The gesture detection device includes a face information acquisition unit, a hand candidate detection unit, and a determination unit.
  • The face information acquisition unit acquires information on the face orientation of the occupant.
  • The face orientation is detected based on the image captured by the image pickup device provided in the vehicle.
  • The hand candidate detection unit detects a hand candidate, which is a candidate for the occupant's hand, based on the image.
  • The determination unit rejects the hand candidate information, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected.
  • According to the present disclosure, a gesture detection device that accurately detects the hand in an occupant's gesture is provided.
  • FIG. 1 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 1.
  • FIG. 2 is a diagram showing an example of the configuration of the processing circuit included in the gesture detection device.
  • FIG. 3 is a diagram showing another example of the configuration of the processing circuit included in the gesture detection device.
  • FIG. 4 is a flowchart showing the gesture detection method in Embodiment 1.
  • FIG. 5 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 2.
  • FIG. 6 is a diagram showing an example of the face orientation of an occupant in Embodiment 2.
  • FIG. 7 is a flowchart showing the gesture detection method in Embodiment 2.
  • FIGS. 8 to 10 are diagrams each showing an example of a frame to be processed.
  • FIG. 11 is a diagram showing the relationship from the first frame to the fourth frame in Embodiment 3.
  • FIG. 12 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 4.
  • FIG. 13 is a flowchart showing the gesture detection method in Embodiment 4.
  • FIG. 14 is a diagram showing an example of a frame to be processed.
  • FIG. 15 is a block diagram showing the configuration of the gesture detection device in Embodiment 5 and the devices that operate in connection with it.
  • FIG. 1 is a functional block diagram showing the configuration of the gesture detection device 100 according to the first embodiment. Further, FIG. 1 shows an image pickup device 110 and a face detection unit 10 as devices that operate in connection with the gesture detection device 100.
  • The image pickup device 110 is provided in the vehicle.
  • The image pickup device 110 captures an image of an occupant inside the vehicle.
  • The face detection unit 10 detects the face orientation of the occupant based on the image.
  • The face orientation corresponds to, for example, the direction in which the front of the occupant's face is pointing, the direction of the line of sight, and the like.
  • The gesture detection device 100 detects the occupant's hand gesture based on the image taken by the image pickup device 110.
  • The gesture detection device 100 includes a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
  • The face information acquisition unit 20 acquires information on the face orientation of the occupant from the face detection unit 10.
  • The hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image taken by the image pickup device 110.
  • The hand candidate detection unit 30 detects a hand candidate by, for example, matching the shape pattern of an object in the image (luminance distribution information) against a predetermined hand shape pattern, as sketched below.
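  • For illustration only, a minimal sketch of such pattern matching is shown below, assuming OpenCV and a prepared grayscale hand-shape template; the threshold and the single-match simplification are assumptions, not details from this disclosure.
```python
import cv2

MATCH_THRESHOLD = 0.7  # hypothetical matching-score threshold

def detect_hand_candidates(frame_gray, hand_template):
    """Return (x, y, w, h) boxes whose luminance pattern resembles the template."""
    result = cv2.matchTemplate(frame_gray, hand_template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < MATCH_THRESHOLD:
        return []  # nothing in the frame matches the hand shape well enough
    h, w = hand_template.shape
    return [(max_loc[0], max_loc[1], w, h)]  # best-matching region only
```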
  • The determination unit 40 rejects the hand candidate information based on a predetermined condition regarding the face orientation.
  • The gesture detection device 100 does not identify a rejected hand candidate as a hand constituting the occupant's gesture.
  • FIG. 2 is a diagram showing an example of the configuration of the processing circuit 90 included in the gesture detection device 100.
  • Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 is realized by the processing circuit 90. That is, the processing circuit 90 has a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
  • When the processing circuit 90 is dedicated hardware, the processing circuit 90 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a circuit combining these.
  • The functions of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 may be realized individually by a plurality of processing circuits, or may be realized collectively by one processing circuit.
  • FIG. 3 is a diagram showing another example of the configuration of the processing circuit included in the gesture detection device 100.
  • the processing circuit includes a processor 91 and a memory 92.
  • Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 is realized by the processor 91 executing a program stored in the memory 92.
  • For example, each function is realized when the processor 91 executes software or firmware written as a program.
  • Thus, the gesture detection device 100 has a memory 92 that stores the program and a processor 91 that executes the program.
  • The program describes a function by which the gesture detection device 100 acquires information on the face orientation of the occupant, detected based on the image captured by the image pickup device 110 provided in the vehicle. The program also describes a function by which the gesture detection device 100 detects a hand candidate, which is a candidate for the occupant's hand, based on the image. The program further describes a function of rejecting the hand candidate information, based on the predetermined condition regarding the face orientation, so that the hand candidate is not detected as a hand in the occupant's gesture to be detected. In this way, the program causes a computer to execute the procedures or methods of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40.
  • the processor 91 is, for example, a CPU (Central Processing Unit), an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like.
  • The memory 92 is, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory).
  • Alternatively, the memory 92 may be a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD, or any storage medium to be used in the future.
  • Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 may be realized partly by dedicated hardware and partly by software or firmware. In this way, the processing circuit realizes each of the above functions by hardware, software, firmware, or a combination thereof.
  • FIG. 4 is a flowchart showing the gesture detection method in the first embodiment.
  • Prior to step S1 shown in FIG. 4, the face detection unit 10 detects the face orientation of the occupant based on the image taken by the image pickup device 110 provided in the vehicle.
  • In step S1, the face information acquisition unit 20 acquires information on the face orientation of the occupant from the face detection unit 10.
  • In step S2, the hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image captured by the image pickup device 110.
  • In step S3, the determination unit 40 determines whether or not to reject the hand candidate information based on a predetermined condition regarding the face orientation.
  • The determination unit 40 rejects the hand candidate information according to the determination result.
  • A rejected hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected. In other words, the gesture detection device 100 does not identify the rejected hand candidate as a hand constituting the occupant's gesture.
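  • As an informal illustration of steps S1 to S3, the following sketch shows the per-frame flow; the data types and the condition callback are assumptions introduced here, not definitions from this disclosure.
```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class FaceOrientation:
    pitch_deg: float  # rotation about the lateral axis
    yaw_deg: float    # rotation about the vertical axis

def process_frame(face: Optional[FaceOrientation],
                  hand_candidates: List[tuple],
                  condition: Callable[[FaceOrientation], bool]) -> List[tuple]:
    # S1: face orientation information has been acquired (may be None)
    # S2: hand_candidates were detected from the same frame
    # S3: reject every hand candidate when the face-orientation condition holds
    if face is not None and condition(face):
        return []              # rejected: not identified as gesture hands
    return hand_candidates     # identified as hands constituting the gesture
```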
  • As described above, the gesture detection device 100 in the first embodiment includes the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40.
  • The face information acquisition unit 20 acquires information on the face orientation of the occupant. The face orientation is detected based on the image captured by the image pickup device 110 provided in the vehicle.
  • The hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image.
  • The determination unit 40 rejects the hand candidate information, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected.
  • Such a gesture detection device 100 accurately detects the hand in the occupant's gesture.
  • Similarly, the gesture detection method in the first embodiment acquires information on the face orientation of the occupant, detected based on the image captured by the image pickup device 110 provided in the vehicle.
  • The gesture detection method detects a hand candidate, which is a candidate for the occupant's hand, based on the image. Further, the gesture detection method rejects the hand candidate information, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as a hand in the occupant's gesture to be detected.
  • With this gesture detection method as well, the occupant's hand in the gesture is accurately detected.
  • The gesture detection device and the gesture detection method according to the second embodiment will now be described.
  • The second embodiment is a subordinate concept of the first embodiment, and the gesture detection device according to the second embodiment includes each configuration of the gesture detection device 100 according to the first embodiment. Descriptions of configurations and operations identical to those of the first embodiment are omitted.
  • FIG. 5 is a functional block diagram showing the configuration of the gesture detection device 101 according to the second embodiment. Further, FIG. 5 shows an image pickup device 110 and an in-vehicle device 120 as devices that operate in connection with the gesture detection device 101.
  • The image pickup device 110 is provided in the front center of the vehicle interior.
  • The image pickup device 110 photographs the vehicle interior at a wide angle and captures both the driver's seat and the passenger seat at the same time.
  • The image pickup device 110 is, for example, a camera that detects infrared rays, a camera that detects visible light, or the like.
  • The gesture detection device 101 according to the second embodiment detects the occupant's hand gesture based on the image captured by the image pickup device 110.
  • The gesture is a gesture for operating the in-vehicle device 120.
  • The in-vehicle device 120 is, for example, an air conditioner, an audio system, or the like.
  • A gesture detected by the gesture detection device 101 is used, for example, to control the temperature of the air conditioner or to adjust the audio volume.
  • The in-vehicle device 120 is not limited to the air conditioner and the audio system.
  • The gesture detection device 101 includes an image acquisition unit 50, a face detection unit 10, a storage unit 60, a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
  • The image acquisition unit 50 acquires the image captured by the image pickup device 110 frame by frame.
  • The face detection unit 10 detects the occupant's face and face orientation for each frame of the image. For example, the face detection unit 10 detects the occupant's facial parts and detects the face orientation based on the positions of those parts; the face orientation detected in this way is the direction in which the front of the occupant's face is pointing. Alternatively, for example, the face detection unit 10 detects the occupant's line of sight and detects the face orientation based on it; the face orientation detected in this way is the direction in which the line of sight is pointing. That is, the face orientation in the second embodiment includes at least one of the direction in which the front of the occupant's face is pointing and the direction of the line of sight.
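  • This disclosure does not fix a specific algorithm for this step. As a hedged illustration, one common approach is to fit a generic 3D face model to detected 2D facial landmarks and decompose the resulting rotation; the model points and camera parameters below are assumptions.
```python
import cv2
import numpy as np

# Generic 3D face-model points (nose tip, chin, eye corners, mouth corners).
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye outer corner
    (225.0, 170.0, -135.0),    # right eye outer corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
])

def estimate_face_orientation(landmarks_2d, camera_matrix):
    """Return (pitch, yaw, roll) in degrees from six detected 2D landmarks."""
    ok, rvec, _tvec = cv2.solvePnP(MODEL_POINTS, landmarks_2d, camera_matrix,
                                   None, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None                       # face orientation not detected
    rot_mat, _ = cv2.Rodrigues(rvec)      # rotation vector -> rotation matrix
    euler_deg, *_ = cv2.RQDecomp3x3(rot_mat)
    pitch, yaw, roll = euler_deg
    return pitch, yaw, roll
```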
  • FIG. 6 is a diagram showing an example of the face orientation of the occupant in the second embodiment.
  • The face orientation is represented by pitch, yaw, and roll angles.
  • When the occupant's face is facing the front, the pitch angle, the yaw angle, and the roll angle are 0 degrees.
  • The face detection unit 10 detects at least the pitch angle and the yaw angle among the pitch, yaw, and roll angles.
  • The face detection unit 10 in the second embodiment also detects the head position in the image.
  • The head position detected in the second embodiment is a position in the height direction. In this way, the face detection unit 10 detects the face orientation and the head position of the occupant. The head position can also be read as the face position.
  • The storage unit 60 stores the face orientation information and the head position for each frame.
  • The face information acquisition unit 20 acquires the face orientation information for each frame.
  • Normally, the face information acquisition unit 20 acquires the face orientation information in the frame to be processed.
  • When the face is not detected in the frame to be processed, the face information acquisition unit 20 operates as follows.
  • Let the first frame be a frame before the frame to be processed, and let the second frame be the frame to be processed.
  • Suppose that the occupant's face orientation is detected in the first frame, while the occupant's face is not detected in the second frame.
  • In this case, the face information acquisition unit 20 acquires the face orientation and head position information in the first frame from the storage unit 60, as sketched below.
  • Here, the second frame is a frame within a predetermined number of frames from the first frame.
  • The predetermined number of frames may be stored in the gesture detection device 101 in advance, for example, or may be input from the outside.
  • The first frame is preferably the frame in which the occupant's face orientation was detected most recently before the second frame.
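  • A small sketch of this fallback is given below; the storage layout and the frame-gap constant are assumptions used only to make the behavior concrete.
```python
MAX_FRAME_GAP = 10  # "predetermined number of frames"; the value is assumed

def get_face_info(frame_index, storage):
    """storage maps frame index -> (face_orientation, head_position) for every
    frame in which the occupant's face orientation was detected."""
    if frame_index in storage:
        return storage[frame_index]          # face detected in this frame
    earlier = [i for i in storage if i < frame_index]
    if not earlier:
        return None
    latest = max(earlier)                    # most recent detection (first frame)
    if frame_index - latest <= MAX_FRAME_GAP:
        return storage[latest]               # reuse the first frame's information
    return None                              # gap too large: no usable face info
```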
  • The hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, for each frame of the image captured by the image pickup device 110.
  • The hand candidate detection unit 30 detects the occupant's hand candidate by, for example, matching the shape pattern of an object in the image (luminance distribution information) against a predetermined hand shape pattern, that is, by a pattern matching process.
  • The shape of the hand to be detected may be the shape of an open hand or the shape of a closed hand.
  • The shape of the hand to be detected may also be, for example, a hand shape indicating a number, a hand shape indicating a direction, or a hand shape indicating the occupant's intention (OK, Good, and so on).
  • The determination unit 40 rejects the hand candidate information for each frame based on a predetermined condition regarding the face orientation.
  • The predetermined condition may be stored in the gesture detection device 101 in advance, for example, or may be input from the outside. An example of the predetermined condition is described later. "Rejecting" may include the determination unit 40 identifying the hand candidate as something other than a hand. Alternatively, "rejecting" may include the determination unit 40 invalidating the hand candidate information. In either case, the rejected hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected. In other words, the gesture detection device 101 does not identify the rejected hand candidate as a hand constituting the occupant's gesture.
  • The gesture detection device 101 identifies a hand candidate not rejected by the determination unit 40 as a hand constituting the occupant's gesture. Based on the gesture by the occupant's hand identified by the gesture detection device 101, operation processing of the in-vehicle device 120 and the like is executed. In the functional block diagram shown in FIG. 5, the functional unit that performs processing between the determination unit 40 and the in-vehicle device 120 is not shown.
  • The determination unit 40 in the second embodiment rejects the hand candidate information when at least one of the pitch angle and the yaw angle representing the face orientation exceeds a predetermined range. That is, the predetermined condition regarding the face orientation in the second embodiment is that at least one of the pitch angle and the yaw angle representing the face orientation exceeds the predetermined range.
  • The range is, for example, from the angle corresponding to the front direction of the face to the angle corresponding to the oblique direction in which the image pickup device 110 is located. This is because, when the occupant makes a gesture, the occupant's face usually faces between the front direction and the image pickup device 110.
  • The range is predetermined for each of the pitch angle and the yaw angle.
  • Alternatively, the predetermined condition regarding the face orientation may be that at least one of the pitch angle and the yaw angle exceeds a predetermined threshold value.
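  • The sketch below illustrates this condition check; the numeric bounds are placeholders chosen for illustration and do not appear in this disclosure.
```python
PITCH_RANGE = (-15.0, 15.0)  # degrees; assumed bounds
YAW_RANGE = (-10.0, 30.0)    # degrees; assumed, asymmetric toward the camera side

def face_condition_met(pitch_deg, yaw_deg):
    """True when pitch or yaw leaves its range, i.e. the candidate is rejected."""
    pitch_in = PITCH_RANGE[0] <= pitch_deg <= PITCH_RANGE[1]
    yaw_in = YAW_RANGE[0] <= yaw_deg <= YAW_RANGE[1]
    return not (pitch_in and yaw_in)
```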
  • Further, the determination unit 40 rejects the hand candidate information of the second frame based on a condition regarding the face orientation and the head position in the first frame. For example, if at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds a predetermined range, the determination unit 40 rejects the hand candidate information in the second frame. That is, the predetermined condition regarding the face orientation in this case is that at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds the predetermined range.
  • The functions of the face detection unit 10, the face information acquisition unit 20, the hand candidate detection unit 30, the determination unit 40, the image acquisition unit 50, and the storage unit 60 are realized by the processing circuit shown in FIG. 2 or FIG. 3.
  • FIG. 7 is a flowchart showing the gesture detection method in the second embodiment.
  • In step S10, the image acquisition unit 50 acquires a frame to be processed from the image captured by the image pickup device 110.
  • In step S20, the face detection unit 10 detects the occupant's face, face orientation, and head position in the frame to be processed.
  • In step S30, the gesture detection device 101 determines whether or not the face orientation has been detected. If the face orientation has been detected, step S40 is executed. If the face orientation has not been detected, step S80 is executed.
  • In step S40, the storage unit 60 stores the face orientation and head position information for each frame.
  • In step S50, the face information acquisition unit 20 acquires the face orientation information in the frame to be processed.
  • The face information acquisition unit 20 may acquire the face orientation information from the face detection unit 10 or from the storage unit 60.
  • In step S60, the hand candidate detection unit 30 detects the occupant's hand candidate in the frame to be processed.
  • In step S70, the determination unit 40 determines whether or not the face orientation satisfies a predetermined condition.
  • Here, the predetermined condition is that at least one of the pitch angle and the yaw angle exceeds the predetermined range. If at least one of them exceeds its range, that is, if the condition is satisfied, step S120 is executed. If both are within their ranges, that is, if the condition is not satisfied, the gesture detection method ends.
  • FIG. 8 is a diagram showing an example of a frame to be processed.
  • In FIG. 8, the occupant is making a gesture with the hand 31 to operate the in-vehicle device 120.
  • The face detection unit 10 detects the occupant's face 11, and the face frame 12 is set so as to surround the face 11.
  • The hand candidate detection unit 30 detects the occupant's hand 31 as a hand candidate.
  • The hand candidate frame 32 is set so as to surround the hand candidate.
  • In FIG. 8, the occupant's face is facing the front, so both the pitch angle and the yaw angle are within the predetermined range.
  • The determination unit 40 therefore determines that the face orientation does not satisfy the predetermined condition, and the gesture detection method ends. That is, the gesture detection device 101 identifies the hand candidate as the hand 31 constituting the occupant's gesture.
  • FIG. 9 is a diagram showing an example of a frame to be processed.
  • In FIG. 9, the occupant is not making a hand gesture for operating the in-vehicle device 120.
  • The occupant is leaning in toward the information displayed on the center console of the vehicle.
  • The information is, for example, navigation information.
  • The face detection unit 10 detects the occupant's face 11.
  • The face frame 12 is set so as to surround the face 11.
  • However, the hand candidate detection unit 30 erroneously detects the occupant's face 11 as a hand candidate.
  • The hand candidate frame 32 is set so as to include the hand candidate.
  • For example, the hand candidate detection unit 30 may determine that the occupant's face 11 is a closed hand 31 and detect it as a hand candidate.
  • In this case, step S120 is executed.
  • In step S80, the face information acquisition unit 20 determines whether or not the frame to be processed is within a predetermined number of frames from the frame in which the occupant's face orientation was detected most recently. If the frame to be processed is within the predetermined number of frames, that is, if this condition is satisfied, step S90 is executed. If this condition is not satisfied, the gesture detection method ends.
  • In step S90, the face information acquisition unit 20 acquires, from the storage unit 60, the face orientation and head position information in the frame in which the occupant's face orientation was detected most recently.
  • In step S100, the hand candidate detection unit 30 detects the occupant's hand candidate in the frame to be processed.
  • In step S110, the determination unit 40 determines whether or not the face orientation and the head position satisfy a predetermined condition.
  • Here, the predetermined condition is that at least one of the pitch angle, the yaw angle, and the head position exceeds the predetermined range. If at least one exceeds its range, step S120 is executed. If all are within their ranges, the gesture detection method ends.
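  • For illustration, the step-S110 check can be sketched as below; all numeric bounds are placeholders, and the normalized head-height coordinate is an assumption introduced here.
```python
PITCH_RANGE = (-15.0, 15.0)       # degrees; assumed, as in the earlier sketch
YAW_RANGE = (-10.0, 30.0)         # degrees; assumed
HEAD_HEIGHT_RANGE = (0.35, 0.75)  # head position as a fraction of image height

def s110_condition_met(pitch_deg, yaw_deg, head_y_norm):
    """True when pitch, yaw, or head height leaves its range (reject in S120)."""
    all_in = (PITCH_RANGE[0] <= pitch_deg <= PITCH_RANGE[1]
              and YAW_RANGE[0] <= yaw_deg <= YAW_RANGE[1]
              and HEAD_HEIGHT_RANGE[0] <= head_y_norm <= HEAD_HEIGHT_RANGE[1])
    return not all_in
```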
  • FIG. 10 is a diagram showing an example of a frame to be processed.
  • The frame shown in FIG. 10 is a frame after the frame shown in FIG. 9, and is within the predetermined number of frames from it.
  • In FIG. 10, the occupant is leaning in further to check in detail the information displayed on the center console of the vehicle.
  • The face detection unit 10 has failed to detect the occupant's face 11, and the face frame 12 is not set.
  • However, the hand candidate detection unit 30 erroneously detects the occupant's head as a hand candidate, and the hand candidate frame 32 is set so as to include the hand candidate.
  • Since FIG. 10 is within the predetermined number of frames from the frame of FIG. 9, the face information acquisition unit 20 acquires the face orientation and head position information in the frame of FIG. 9.
  • In that frame, the occupant's face is oriented diagonally downward, and the pitch angle and the yaw angle exceed the predetermined range.
  • The determination unit 40 therefore determines that the face orientation satisfies the predetermined condition, and step S120 is executed.
  • In step S120, the determination unit 40 rejects the hand candidate information.
  • For example, the determination unit 40 identifies the hand candidate as something other than a hand.
  • Alternatively, the determination unit 40 replaces the detection result of the hand candidate with a detection result of an object other than a hand. In this way, the determination unit 40 rejects the hand candidate information based on the predetermined condition regarding the face orientation. This completes the gesture detection method for the frame.
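  • A minimal sketch of these two rejection behaviors is shown below; the class and field names are assumptions introduced for illustration.
```python
from dataclasses import dataclass

@dataclass
class HandCandidate:
    box: tuple            # (x, y, w, h) of the hand candidate frame
    label: str = "hand"
    valid: bool = True

def reject(candidate: HandCandidate) -> HandCandidate:
    candidate.label = "not_hand"  # identify the candidate as something other than a hand
    candidate.valid = False       # invalidate the hand candidate information
    return candidate
```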
  • In the above description, the gesture detection device 101 performs the hand candidate detection process after the face 11 detection process and the face orientation information acquisition process.
  • However, the gesture detection device 101 may execute the face 11 detection process and the face orientation information acquisition process after the hand candidate detection process.
  • Alternatively, the gesture detection device 101 may execute the hand candidate detection process in parallel with the face 11 detection process and the face orientation information acquisition process.
  • Next, consider a case in which the face detection unit 10 succeeds in detecting the occupant's face 11 and face orientation in the first frame but fails to detect the occupant's face 11 in the second frame.
  • Here, the first frame is the frame in which the occupant's face orientation was detected most recently before the second frame.
  • The frame shown in FIG. 9 corresponds to the first frame, and the frame shown in FIG. 10 corresponds to the second frame.
  • In step S10, the image acquisition unit 50 acquires the second frame from the image captured by the image pickup device 110.
  • In step S20, the face detection unit 10 fails to detect the occupant's face 11 in the second frame; therefore, the face orientation and the head position are not detected.
  • In step S30, the gesture detection device 101 determines that the occupant's face orientation has not been detected.
  • Step S80 is then executed.
  • In step S80, the face information acquisition unit 20 determines whether or not the second frame is within the predetermined number of frames from the first frame, in which the occupant's face orientation was detected most recently. Since the first frame and the second frame satisfy this condition as described above, step S90 is executed.
  • In step S90, the face information acquisition unit 20 acquires the face orientation and head position information in the first frame from the storage unit 60.
  • In step S100, the hand candidate detection unit 30 detects the occupant's hand candidate in the second frame.
  • In step S110, the determination unit 40 determines whether or not at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds the predetermined range. If at least one exceeds its range, step S120 is executed. If all are within their ranges, the gesture detection method ends.
  • In step S120, the determination unit 40 rejects the hand candidate information in the second frame. This completes the gesture detection method for one frame to be processed. After that, step S10 is executed again for the next frame.
  • Such a gesture detection device 101 reduces cases in which an object other than the occupant's hand is identified as the hand 31. That is, the gesture detection device 101 accurately detects the hand 31 constituting the occupant's gesture.
  • When operating the in-vehicle device 120, the occupant may lean in to check information displayed on a display device such as the vehicle's dashboard or center console. In that case, the occupant's head enters the detection range for hand candidates.
  • The hand candidate detection unit 30 may then determine that the occupant's head (or face 11) is a closed hand 31 (such as a thumbs-up hand) and detect it as a hand candidate (for example, FIG. 10).
  • While the occupant is making a gesture, the occupant's face orientation is within the oblique range from the front direction of the vehicle to the position of the image pickup device 110 (for example, FIG. 8).
  • While the occupant is leaning in toward the display device, the face orientation is out of that range.
  • The gesture detection device 101 in the second embodiment rejects the information of erroneously detected hand candidates based on the predetermined condition regarding the face orientation. In other words, when the occupant's face orientation exceeds the predetermined range, the gesture detection device 101 identifies the hand candidate as something other than a hand and rejects the hand candidate information. As a result, the gesture detection device 101 accurately detects the hand 31 in the occupant's gesture.
  • The predetermined condition regarding the face orientation is not limited to the above conditions.
  • For example, the condition may be a combination of a condition on at least one of the pitch angle, the yaw angle, and the roll angle and a condition on at least one of the lateral, depth, and height head positions.
  • The gesture detection device 101 in the second embodiment includes the storage unit 60.
  • The storage unit 60 stores the face orientation information and the occupant's head position information detected for each frame of the image.
  • When the occupant's face orientation is detected in the first frame and the occupant's face is not detected in the second frame, the face information acquisition unit 20 acquires the face orientation information and the head position information in the first frame from the storage unit 60.
  • Here, the second frame is a frame within a predetermined number of frames (a first predetermined number of frames) from the first frame.
  • The hand candidate detection unit 30 detects the hand candidate in the second frame.
  • The determination unit 40 rejects the hand candidate information in the second frame based on the predetermined condition regarding the face orientation and the head position in the first frame.
  • Since the pattern matching process for face detection and the pattern matching process for hand candidate detection differ from each other, even when the face detection unit 10 fails to detect the occupant's face orientation, the hand candidate detection unit 30 may still erroneously detect the occupant's face 11, head, or the like as a hand candidate (for example, FIG. 10).
  • The movement of the occupant from the posture shown in FIG. 9 to the posture shown in FIG. 10, that is, the movement of the occupant leaning in toward the display device, is continuous and takes place in a short time. Therefore, even when the determination unit 40 rejects the hand candidate information in the frame to be processed based on the face orientation information in a frame close in time to it, the accuracy of the rejection determination does not deteriorate.
  • The gesture detection device 101 according to the second embodiment thus prevents the occupant's face 11, head, and the like from being detected as hand candidates even when the occupant's face orientation is temporarily not detected. As a result, the detection accuracy of the occupant's hand improves.
  • The first frame in the second embodiment is the frame in which the occupant's face orientation was detected most recently before the second frame.
  • In other words, the gesture detection device 101 determines whether or not the most recently detected face orientation satisfies the predetermined condition. The gesture detection device 101 therefore accurately detects the occupant's hand 31.
  • The gesture detection device and the gesture detection method according to the third embodiment will now be described.
  • The third embodiment is a subordinate concept of the first embodiment, and the gesture detection device according to the third embodiment includes each configuration of the gesture detection device 101 according to the second embodiment. Descriptions of configurations and operations identical to those of the first and second embodiments are omitted.
  • FIG. 11 is a diagram showing the relationship from the first frame to the fourth frame in the third embodiment.
  • The first frame is the first frame to be processed among the plurality of frames constituting the video.
  • In the first frame, the occupant's face orientation is detected, and a hand candidate is also detected.
  • In this case, as in the second embodiment, the determination unit 40 determines whether or not at least one of the pitch angle and the yaw angle of the face 11 in the first frame exceeds the predetermined range, and rejects the hand candidate information in the first frame based on the determination result.
  • The second frame is the frame that is a first predetermined number of frames after the first frame.
  • The first predetermined number of frames may be stored in the gesture detection device in advance, for example, or may be input from the outside.
  • In each frame from the frame following the first frame up to the second frame, the occupant's face orientation is not detected, but hand candidates are detected.
  • Here, the first frame is the frame in which the occupant's face orientation was detected most recently before the second frame.
  • Therefore, in each of these frames, the determination unit 40 determines whether or not at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds the predetermined range.
  • The determination unit 40 rejects the hand candidate information in each frame based on the determination result. This determination operation is the same as in the second embodiment.
  • In this way, the gesture detection device prevents the occupant's face 11, head, and the like from being detected as hand candidates even when the occupant's face orientation is temporarily not detected.
  • The third frame is a frame after the second frame.
  • In each frame from the frame following the second frame up to the third frame, the occupant's face orientation is not detected, but hand candidates are detected.
  • In these frames, the determination unit 40 rejects the hand candidate information without making a rejection determination regarding the face orientation.
  • In step S80 shown in FIG. 7, each frame from the frame following the second frame up to the third frame would be determined as "No", and the hand candidate information would not be rejected; an object other than a hand detected as a hand candidate could then be recognized as the hand 31. The gesture detection device therefore rejects the information of the hand candidates detected in each of these frames without making a rejection determination regarding the face orientation, which improves the detection accuracy of the occupant's hand 31.
  • The fourth frame is the frame that is a second predetermined number of frames after the frame following the third frame.
  • The second predetermined number of frames may be stored in the gesture detection device in advance, for example, or may be input from the outside.
  • From the frame following the third frame up to the fourth frame, the occupant's face orientation is detected continuously.
  • In these frames, hand candidates are also detected.
  • Nevertheless, the determination unit 40 rejects the hand candidate information; in other words, the determination unit 40 rejects the hand candidate information without making a rejection determination regarding the face orientation.
  • This is because, immediately after face detection resumes, the detected face orientation may not be accurate.
  • While making a gesture, the occupant's face 11 usually faces between the front direction and the direction in which the image pickup device 110 is located. It is therefore preferable to restart the rejection determination of the hand candidate information in a state where the occupant's face orientation is likely to be within the predetermined range.
  • Accordingly, the gesture detection device rejects the hand candidate information until face detection has succeeded for the second predetermined number of frames or more.
  • After that, the determination unit 40 restarts the determination as to whether or not to reject the hand candidate information. This improves the detection accuracy of the occupant's hand 31.
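  • The following sketch restates the embodiment-3 behavior as a small per-frame state machine. The frame counts and the structure are assumptions introduced to make the text concrete; they are not part of this disclosure.
```python
FIRST_PREDETERMINED = 10   # frames tolerated without face detection (assumed)
SECOND_PREDETERMINED = 5   # consecutive detections required to restart (assumed)

class RejectionGate:
    def __init__(self):
        self.frames_without_face = 0
        self.consecutive_detections = 0
        self.suspended = False   # True: reject every candidate unconditionally

    def step(self, face_detected: bool) -> str:
        if face_detected:
            self.frames_without_face = 0
            self.consecutive_detections += 1
            if self.suspended and self.consecutive_detections >= SECOND_PREDETERMINED:
                self.suspended = False   # fourth frame reached: restart determination
        else:
            self.consecutive_detections = 0
            self.frames_without_face += 1
            if self.frames_without_face > FIRST_PREDETERMINED:
                self.suspended = True    # past the second frame: stop determining
        if self.suspended:
            return "reject_unconditionally"
        return "apply_face_orientation_condition"
```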
  • The gesture detection device and the gesture detection method according to the fourth embodiment will now be described.
  • The fourth embodiment is a subordinate concept of the first embodiment.
  • The gesture detection device according to the fourth embodiment includes each configuration of the gesture detection device 101 according to the second embodiment. Descriptions of configurations and operations identical to those of the first to third embodiments are omitted.
  • FIG. 12 is a functional block diagram showing the configuration of the gesture detection device 102 according to the fourth embodiment.
  • The gesture detection device 102 includes a hand candidate detection unit 30A.
  • The hand candidate detection unit 30A in the fourth embodiment is a modification of the hand candidate detection unit 30 in the second embodiment.
  • The hand candidate detection unit 30A performs the hand candidate detection process within a detection target area set in the video.
  • The detection target area is preset, for example, at a position where the occupant performs hand gestures. Alternatively, for example, the detection target area is set at a position including the hand candidate frame 32 detected in the frame before the frame to be processed.
  • The hand candidate detection unit 30A in the fourth embodiment narrows the detection target area based on a predetermined condition regarding the face orientation. For example, the hand candidate detection unit 30A narrows the detection target area when at least one of the pitch angle and the yaw angle exceeds the predetermined range; in other words, the predetermined condition is that at least one of the pitch angle and the yaw angle exceeds the predetermined range.
  • The hand candidate detection unit 30A then performs the hand candidate detection process within the narrowed detection target area.
  • The determination of whether or not the face orientation satisfies the predetermined condition is executed by, for example, the hand candidate detection unit 30A.
  • Alternatively, the hand candidate detection unit 30A may acquire the determination result from the determination unit 40.
  • The detection target area corresponds to, for example, a region called a gesture detection region.
  • FIG. 13 is a flowchart showing the gesture detection method according to the fourth embodiment. Steps S10 to S50 are the same as in the second embodiment, and steps S80 and S90 are also the same as in the second embodiment.
  • In the hand candidate detection step, the hand candidate detection unit 30A detects the occupant's hand candidate within the detection target area.
  • FIG. 14 is a diagram showing an example of a frame to be processed.
  • In FIG. 14, the occupant is not making a hand gesture for operating the in-vehicle device 120.
  • The occupant is operating the rearview mirror 2 provided in the cabin with the hand 31 while looking at the rearview mirror 2.
  • The face detection unit 10 detects the occupant's face 11.
  • The face frame 12 is set so as to surround the face 11.
  • The detection target area 33A is set in the frame.
  • The hand candidate detection unit 30A detects the occupant's hand 31 as a hand candidate within the detection target area 33A.
  • The hand candidate frame 32 is set so as to include the hand candidate.
  • Then, whether or not the face orientation satisfies the predetermined condition is determined.
  • The face orientation of the occupant detected by the face detection unit 10 is diagonally upward, and both the pitch angle and the yaw angle representing the face orientation exceed the predetermined range.
  • The hand candidate detection unit 30A therefore determines that the face orientation satisfies the predetermined condition, and step S120 is executed.
  • In step S120, the hand candidate information is rejected.
  • In the fourth embodiment, step S130 is executed after step S120.
  • In step S130, the hand candidate detection unit 30A narrows the detection target area 33A so that the rejected hand candidate is not detected again.
  • For example, the hand candidate detection unit 30A sets a detection target area 33B obtained by reducing the size of the detection target area 33A so as not to include the hand candidate frame 32.
  • In FIG. 14, the detection target area 33B corresponds to an area in which the upper portion of the detection target area 33A has been cut off.
  • In subsequent frames, the hand candidate detection unit 30A detects hand candidates within the narrowed detection target area 33B, as sketched below.
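  • As a rough illustration of this narrowing, the helper below lowers the top edge of the detection target area to just below the rejected candidate's frame, matching the FIG. 14 example; the coordinate convention and the function itself are assumptions.
```python
def narrow_detection_area(area, rejected_box):
    """area and rejected_box are (x, y, w, h); y grows downward in the image."""
    ax, ay, aw, ah = area
    bx, by, bw, bh = rejected_box
    new_top = by + bh                       # move the upper edge below the box
    if new_top >= ay + ah:
        return area                         # nothing would remain; keep the area
    return (ax, new_top, aw, ah - (new_top - ay))
```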
  • Such a gesture detection device 102 does not detect the hand 31 operating the rearview mirror 2 as a hand in a gesture for operating the in-vehicle device 120. Specifically, the gesture detection device 102 rejects the information of the hand candidate detected from the hand 31 operating the rearview mirror 2. Further, the gesture detection device 102 narrows the detection target area 33A based on the information on the occupant's face orientation so that hand candidates are not detected in the area around the rearview mirror 2.
  • The positions and sizes of the detection target areas 33A and 33B shown in FIG. 14 are examples, and the areas are not limited to them.
  • For example, the detection target areas 33A and 33B may cover not only the central portion between the driver's seat and the passenger seat but also regions extended toward both seats (in the left-right direction).
  • The gesture detection device 102 described above executes the process of narrowing the detection target area 33A after the process of rejecting the hand candidate information.
  • However, the process of narrowing the detection target area 33A may instead be performed between steps S50 and S60, or between steps S90 and S100.
  • In that case, the hand candidate detection unit 30A narrows the detection target area 33A in the frame to be processed when the face orientation in that frame satisfies the predetermined condition.
  • The gesture detection device shown in each of the above embodiments can also be applied to a system constructed by appropriately combining a navigation device, a communication terminal, a server, and the functions of applications installed in them.
  • The navigation device includes, for example, a PND (Portable Navigation Device) and the like.
  • The communication terminal includes, for example, a mobile terminal such as a mobile phone, a smartphone, or a tablet.
  • FIG. 15 is a block diagram showing the configuration of the gesture detection device 101 and the devices that operate in connection with it in the fifth embodiment.
  • In the fifth embodiment, the gesture detection device 101 and the communication device 130 are provided in the server 300.
  • The gesture detection device 101 acquires the image taken by the image pickup device 110 provided in the vehicle 1 via the communication device 140 and the communication device 130.
  • The gesture detection device 101 acquires information on the occupant's face orientation detected based on the image.
  • The gesture detection device 101 detects hand candidates based on the image.
  • The gesture detection device 101 rejects the hand candidate information based on the predetermined condition regarding the face orientation.
  • The gesture detection device 101 identifies a hand candidate that has not been rejected as the hand 31 constituting the occupant's gesture. Based on the gesture by the occupant's hand 31 identified by the gesture detection device 101, operation processing of the in-vehicle device 120 and the like is executed.
  • A part of the gesture detection device 101 may be provided in the server 300 while the other part is provided in the vehicle 1 in a distributed manner. The same effects are obtained when the gesture detection device 100 shown in the first embodiment is provided in the server 300.
  • The embodiments can be freely combined, and each embodiment can be modified or omitted as appropriate.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided is a gesture detection device which accurately detects the hand in a gesture by a vehicle occupant. The gesture detection device comprises a face information acquisition unit, a hand candidate detection unit, and a determination unit. The face information acquisition unit acquires information pertaining to the facial orientation of the vehicle occupant. The facial orientation is detected on the basis of a video taken by an imaging device provided in a vehicle. The hand candidate detection unit detects a hand candidate, which is a candidate for the vehicle occupant's hand, on the basis of the video. The determination unit dismisses, on the basis of a predetermined condition related to the facial orientation, hand candidate information such that the hand candidate is not detected as the vehicle occupant's hand which is to be detected in the gesture by the vehicle occupant.

Description

Gesture detection device and gesture detection method
 This disclosure relates to a gesture detection device and a gesture detection method.
 Regarding the operation of in-vehicle devices by a vehicle occupant, a system has been proposed in which, by detecting gestures of the occupant's hand, the occupant operates an in-vehicle device without touching it. For example, a gesture detection device detects the occupant's hand based on an image taken by a camera or the like provided in the vehicle. Since the in-vehicle device operates according to the gesture of the occupant's hand, accuracy is required in the detection of the occupant's hand by the gesture detection device. Patent Document 1 proposes a control device that detects information about a user's hand only from a gesture area set based on the area of the driver's face.
Patent Document 1: Japanese Unexamined Patent Publication No. 2014-119295
 A gesture detection device detects the occupant's hand based on the image. Therefore, depending on the state of the image, the gesture detection device may detect an object other than a hand as a hand.
 The present disclosure solves the above-mentioned problem, and its object is to provide a gesture detection device that accurately detects the hand in an occupant's gesture.
 The gesture detection device according to the present disclosure includes a face information acquisition unit, a hand candidate detection unit, and a determination unit. The face information acquisition unit acquires information on the face orientation of the occupant. The face orientation is detected based on the image captured by the image pickup device provided in the vehicle. The hand candidate detection unit detects a hand candidate, which is a candidate for the occupant's hand, based on the image. The determination unit rejects the hand candidate information, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected.
 According to the present disclosure, a gesture detection device that accurately detects the hand in an occupant's gesture is provided.
 The purposes, features, aspects, and advantages of this disclosure will become more apparent from the following detailed description and the accompanying drawings.
FIG. 1 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 1.
FIG. 2 is a diagram showing an example of the configuration of the processing circuit included in the gesture detection device.
FIG. 3 is a diagram showing another example of the configuration of the processing circuit included in the gesture detection device.
FIG. 4 is a flowchart showing the gesture detection method in Embodiment 1.
FIG. 5 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 2.
FIG. 6 is a diagram showing an example of the face orientation of an occupant in Embodiment 2.
FIG. 7 is a flowchart showing the gesture detection method in Embodiment 2.
FIG. 8 is a diagram showing an example of a frame to be processed.
FIG. 9 is a diagram showing an example of a frame to be processed.
FIG. 10 is a diagram showing an example of a frame to be processed.
FIG. 11 is a diagram showing the relationship from the first frame to the fourth frame in Embodiment 3.
FIG. 12 is a functional block diagram showing the configuration of the gesture detection device in Embodiment 4.
FIG. 13 is a flowchart showing the gesture detection method in Embodiment 4.
FIG. 14 is a diagram showing an example of a frame to be processed.
FIG. 15 is a block diagram showing the configuration of the gesture detection device in Embodiment 5 and the devices that operate in connection with it.
 <Embodiment 1>
 FIG. 1 is a functional block diagram showing the configuration of the gesture detection device 100 according to the first embodiment. FIG. 1 also shows an image pickup device 110 and a face detection unit 10 as devices that operate in connection with the gesture detection device 100.
 The image pickup device 110 is provided in the vehicle. The image pickup device 110 captures an image of an occupant inside the vehicle.
 The face detection unit 10 detects the face orientation of the occupant based on the image. The face orientation corresponds to, for example, the direction in which the front of the occupant's face is pointing, the direction of the line of sight, and the like.
 The gesture detection device 100 detects the occupant's hand gesture based on the image taken by the image pickup device 110.
 The gesture detection device 100 includes a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
 The face information acquisition unit 20 acquires information on the face orientation of the occupant from the face detection unit 10.
 The hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image taken by the image pickup device 110. The hand candidate detection unit 30 detects a hand candidate by, for example, matching the shape pattern of an object in the image (luminance distribution information) against a predetermined hand shape pattern.
 The determination unit 40 rejects the hand candidate information based on a predetermined condition regarding the face orientation. The gesture detection device 100 does not identify a rejected hand candidate as a hand constituting the occupant's gesture.
 FIG. 2 is a diagram showing an example of the configuration of the processing circuit 90 included in the gesture detection device 100. Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 is realized by the processing circuit 90. That is, the processing circuit 90 has the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40.
 When the processing circuit 90 is dedicated hardware, the processing circuit 90 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a circuit combining these. The functions of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 may be realized individually by a plurality of processing circuits, or may be realized collectively by one processing circuit.
 FIG. 3 is a diagram showing another example of the configuration of the processing circuit included in the gesture detection device 100. The processing circuit includes a processor 91 and a memory 92. Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 is realized by the processor 91 executing a program stored in the memory 92. For example, each function is realized when the processor 91 executes software or firmware written as a program. Thus, the gesture detection device 100 has a memory 92 that stores the program and a processor 91 that executes the program.
 The program describes a function by which the gesture detection device 100 acquires information on the face orientation of the occupant, detected based on the image captured by the image pickup device 110 provided in the vehicle. The program also describes a function by which the gesture detection device 100 detects a hand candidate, which is a candidate for the occupant's hand, based on the image. The program further describes a function of rejecting the hand candidate information, based on the predetermined condition regarding the face orientation, so that the hand candidate is not detected as a hand in the occupant's gesture to be detected. In this way, the program causes a computer to execute the procedures or methods of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40.
 The processor 91 is, for example, a CPU (Central Processing Unit), an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), or the like. The memory 92 is, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory). Alternatively, the memory 92 may be a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD, or any storage medium to be used in the future.
 Each function of the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40 may be realized partly by dedicated hardware and partly by software or firmware. In this way, the processing circuit realizes each of the above functions by hardware, software, firmware, or a combination thereof.
 FIG. 4 is a flowchart showing the gesture detection method in the first embodiment. Prior to step S1 shown in FIG. 4, the face detection unit 10 detects the face orientation of the occupant based on the image taken by the image pickup device 110 provided in the vehicle.
 In step S1, the face information acquisition unit 20 acquires information on the face orientation of the occupant from the face detection unit 10.
 In step S2, the hand candidate detection unit 30 detects a hand candidate, which is a candidate for the occupant's hand, based on the image captured by the image pickup device 110.
 In step S3, the determination unit 40 determines whether or not to reject the hand candidate information based on a predetermined condition regarding the face orientation. The determination unit 40 rejects the hand candidate information according to the determination result. A rejected hand candidate is not detected as the occupant's hand in the occupant's gesture to be detected. In other words, the gesture detection device 100 does not identify the rejected hand candidate as a hand constituting the occupant's gesture.
 ステップS3にて、判定部40は、顔向きに関する予め定められた条件に基づいて、手候補の情報を棄却するか否かを判定する。判定部40は、その判定結果に従い、手候補の情報を棄却する。棄却された手候補は、検出対象である乗員のジェスチャにおける乗員の手として検出されない。言い換えると、ジェスチャ検出装置100は、棄却された手候補を、乗員のジェスチャを構成する手として識別しない。 In step S3, the determination unit 40 determines whether or not to reject the hand candidate information based on a predetermined condition regarding the face orientation. The determination unit 40 rejects the hand candidate information according to the determination result. The rejected hand candidate is not detected as a occupant's hand in the occupant's gesture to be detected. In other words, the gesture detection device 100 does not identify the rejected hand candidate as a hand constituting the occupant's gesture.
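 Steps S1 to S3 can be summarized as the following per-frame control flow. This is a sketch only; the three callables stand in for the face detection unit 10, the hand candidate detection unit 30, and the determination unit 40, and are not APIs defined by this disclosure.

```python
# Hypothetical per-frame flow for steps S1-S3. The callables are injected
# stand-ins for the device's functional blocks, kept abstract on purpose.
from typing import Callable, Optional, Sequence

def process_frame(frame: object,
                  get_face_orientation: Callable[[object], Optional[object]],
                  detect_hand_candidates: Callable[[object], Sequence[tuple]],
                  condition_met: Callable[[object], bool]) -> list[tuple]:
    face = get_face_orientation(frame)                # S1: acquire face orientation
    candidates = list(detect_hand_candidates(frame))  # S2: detect hand candidates
    if face is not None and condition_met(face):      # S3: predetermined condition holds
        return []       # rejected: not treated as the occupant's gesture hand
    return candidates   # passed on as hands constituting the gesture
```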
 To summarize the above, the gesture detection device 100 according to the first embodiment includes the face information acquisition unit 20, the hand candidate detection unit 30, and the determination unit 40. The face information acquisition unit 20 acquires information on the face orientation of the occupant. The face orientation is detected based on video captured by the image pickup device 110 provided in the vehicle. The hand candidate detection unit 30 detects, based on that video, hand candidates that are candidates for the occupant's hand. The determination unit 40 rejects the information of a hand candidate, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as the occupant's hand in the occupant's gesture that is the detection target.
 Such a gesture detection device 100 accurately detects the hand in the occupant's gesture.
 The gesture detection method according to the first embodiment acquires information on the face orientation of the occupant detected based on video captured by the image pickup device 110 provided in the vehicle, detects hand candidates that are candidates for the occupant's hand based on that video, and rejects the information of a hand candidate, based on a predetermined condition regarding the face orientation, so that the hand candidate is not detected as a hand in the occupant's gesture that is the detection target.
 According to such a gesture detection method, the hand in the occupant's gesture is accurately detected.
 <Embodiment 2>
 The gesture detection device and the gesture detection method according to the second embodiment will be described. The second embodiment is a subordinate concept of the first embodiment, and the gesture detection device according to the second embodiment includes each component of the gesture detection device 100 according to the first embodiment. Descriptions of configurations and operations that are the same as in the first embodiment are omitted.
 FIG. 5 is a functional block diagram showing the configuration of the gesture detection device 101 according to the second embodiment. FIG. 5 also shows an image pickup device 110 and an in-vehicle device 120 as devices that operate in connection with the gesture detection device 101.
 The image pickup device 110 is provided at the front center of the vehicle interior. It captures the interior at a wide angle, covering both the driver's seat and the passenger seat at once. The image pickup device 110 is, for example, a camera that detects infrared light or a camera that detects visible light. The gesture detection device 101 according to the second embodiment detects a hand gesture of a vehicle occupant based on the video captured by the image pickup device 110. The gesture is one for operating the in-vehicle device 120, for example an air conditioner or an audio system. A gesture detected by the gesture detection device 101 triggers, for example, temperature adjustment of the air conditioner or volume adjustment of the audio system. However, the in-vehicle device 120 is not limited to an air conditioner and an audio system.
 The gesture detection device 101 includes a video acquisition unit 50, a face detection unit 10, a storage unit 60, a face information acquisition unit 20, a hand candidate detection unit 30, and a determination unit 40.
 The video acquisition unit 50 acquires the video captured by the image pickup device 110 frame by frame.
 The face detection unit 10 detects the occupant's face and face orientation for each frame of the video. For example, the face detection unit 10 detects facial parts of the occupant and detects the face orientation based on the positions of those facial parts; the face orientation detected in this way is the direction facing the front of the occupant's face. Alternatively, the face detection unit 10 may detect the occupant's line of sight and detect the face orientation based on it; the face orientation detected in this way is the direction in which the line of sight points. That is, the face orientation in the second embodiment includes at least one of the direction facing the front of the occupant's face and the direction of the line of sight.
 FIG. 6 is a diagram showing an example of the occupant's face orientation in the second embodiment. The face orientation is expressed by a pitch angle, a yaw angle, and a roll angle. For example, when the occupant's face points straight ahead of the vehicle, the pitch, yaw, and roll angles are all 0 degrees. The face detection unit 10 detects at least the pitch angle and the yaw angle among the three. The face detection unit 10 in the second embodiment further detects the head position in the video; the head position detected in the second embodiment is the position in the height direction. In this way, the face detection unit 10 detects the occupant's face orientation and head position. The head position can also be read as the face position.
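 For the sketches in this description, the per-frame face result can be carried in a small record like the one below; the field names and units (degrees, pixels) are assumptions made for illustration, not part of this disclosure.

```python
# Hypothetical container for the per-frame face result (units assumed).
from dataclasses import dataclass

@dataclass
class FaceInfo:
    pitch_deg: float  # pitch angle; 0 when the face points straight ahead
    yaw_deg: float    # yaw angle; 0 when the face points straight ahead
    roll_deg: float   # roll angle; 0 when the face points straight ahead
    head_y_px: float  # head position in the image, height direction
```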
 When the face detection unit 10 detects the face orientation, the storage unit 60 stores the face orientation and head position information for each frame.
 The face information acquisition unit 20 acquires the face orientation information for each frame. When the occupant's face orientation is detected in the frame being processed, the face information acquisition unit 20 acquires the face orientation information of that frame. When it is not detected, the face information acquisition unit 20 operates as follows. Here, a frame preceding the frame being processed is referred to as a first frame, and the frame being processed as a second frame: the occupant's face orientation is detected in the first frame, while the occupant's face is not detected in the second frame. In this case, in processing the second frame, the face information acquisition unit 20 acquires the face orientation and head position information of the first frame from the storage unit 60.
 The second frame is within a predetermined number of frames from the first frame. The predetermined number of frames may, for example, be stored in the gesture detection device 101 or input from outside. The first frame is preferably the most recent frame, counting back from the second frame, in which the occupant's face orientation was detected.
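 Continuing the FaceInfo sketch above, the storage unit 60 lookup can be pictured as follows; MAX_LOOKBACK_FRAMES is a hypothetical name and value for the predetermined number of frames.

```python
# Hypothetical sketch of the storage unit 60: store FaceInfo per frame and
# serve the most recent entry within the allowed look-back window.
from typing import Optional

MAX_LOOKBACK_FRAMES = 10  # assumed value of the predetermined frame count

class FaceHistory:
    def __init__(self) -> None:
        self._by_frame: dict[int, FaceInfo] = {}

    def store(self, frame_idx: int, info: FaceInfo) -> None:
        self._by_frame[frame_idx] = info

    def latest_within(self, frame_idx: int) -> Optional[FaceInfo]:
        """Most recent stored result, at most MAX_LOOKBACK_FRAMES back."""
        for past in range(frame_idx - 1, frame_idx - 1 - MAX_LOOKBACK_FRAMES, -1):
            if past in self._by_frame:
                return self._by_frame[past]
        return None
```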
 The hand candidate detection unit 30 detects hand candidates, that is, candidates for the occupant's hand, for each frame of the video captured by the image pickup device 110. The hand candidate detection unit 30 detects hand candidates by, for example, matching the shape pattern of an object in the video (information on its luminance distribution) against a predetermined hand-shape pattern, that is, by pattern matching. The hand shape to be detected may be either an open hand or a closed hand, and may also be, for example, a hand shape indicating a number, a hand shape indicating a direction, or a hand shape indicating the occupant's intention (such as OK or Good).
 The determination unit 40 rejects the information of a hand candidate for each frame based on a predetermined condition regarding the face orientation. The predetermined condition may, for example, be stored in the gesture detection device 101 or input from outside; an example is described later. "Rejecting" may include the determination unit 40 identifying the hand candidate as something other than a hand, or invalidating the information of the hand candidate. In either case, a rejected hand candidate is not detected as the occupant's hand in the occupant's gesture that is the detection target. In other words, the gesture detection device 101 does not identify a rejected hand candidate as a hand constituting the occupant's gesture. On the other hand, the gesture detection device 101 identifies a hand candidate not rejected by the determination unit 40 as a hand constituting the occupant's gesture. Based on the gesture made by the occupant's hand identified by the gesture detection device 101, operation processing of the in-vehicle device 120 and the like is executed. In the functional block diagram shown in FIG. 5, the functional unit that performs processing between the determination unit 40 and the in-vehicle device 120 is not shown.
 The determination unit 40 in the second embodiment rejects the information of a hand candidate when at least one of the pitch angle and the yaw angle representing the face orientation exceeds a predetermined range. That is, the predetermined condition regarding the face orientation in the second embodiment is that at least one of the pitch angle and the yaw angle representing the face orientation exceeds a predetermined range. The range spans, for example, from the angle corresponding to the frontal direction of the face to the angle corresponding to the oblique direction in which the image pickup device 110 is located, because when an occupant makes a gesture, the occupant's face usually points somewhere between straight ahead and the image pickup device 110. The range is predetermined for each of the pitch angle and the yaw angle. The predetermined condition regarding the face orientation may instead be that at least one of the pitch angle and the yaw angle exceeds a predetermined threshold value.
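 The pitch/yaw condition can be pictured as the following check; the angle ranges are illustrative assumptions, not values from this disclosure.

```python
# Hypothetical ranges spanning from the frontal direction toward the
# camera's oblique direction; real values would be set per installation.
PITCH_RANGE_DEG = (-15.0, 20.0)
YAW_RANGE_DEG = (-10.0, 40.0)

def should_reject(face: FaceInfo) -> bool:
    """True when pitch or yaw leaves its predetermined range, i.e. the
    frame's hand candidates are to be rejected."""
    pitch_ok = PITCH_RANGE_DEG[0] <= face.pitch_deg <= PITCH_RANGE_DEG[1]
    yaw_ok = YAW_RANGE_DEG[0] <= face.yaw_deg <= YAW_RANGE_DEG[1]
    return not (pitch_ok and yaw_ok)
```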
 When the first frame and the second frame have the relationship described above, the determination unit 40 rejects the information of the hand candidate in the second frame based on a condition regarding the face orientation and the head position in the first frame. For example, the determination unit 40 rejects the hand candidate information of the second frame when at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds a predetermined range. That is, the predetermined condition regarding the face orientation is that at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds a predetermined range.
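 For this fallback case, the head position joins the check. Again a sketch under the same assumptions as above, with a hypothetical pixel range.

```python
# Hypothetical variant used when the stored first-frame result is reused:
# the head position (height direction, pixels) joins pitch and yaw.
HEAD_Y_RANGE_PX = (120.0, 360.0)  # assumed allowable head-position band

def should_reject_with_head(face: FaceInfo) -> bool:
    head_ok = HEAD_Y_RANGE_PX[0] <= face.head_y_px <= HEAD_Y_RANGE_PX[1]
    return should_reject(face) or not head_ok
```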
 The functions of the face detection unit 10, the face information acquisition unit 20, the hand candidate detection unit 30, the determination unit 40, the video acquisition unit 50, and the storage unit 60 described above are realized by the processing circuit shown in FIG. 2 or FIG. 3.
 FIG. 7 is a flowchart showing the gesture detection method according to the second embodiment.
 In step S10, the video acquisition unit 50 acquires the frame to be processed from the video captured by the image pickup device 110.
 In step S20, the face detection unit 10 detects the occupant's face, face orientation, and head position in the frame being processed.
 In step S30, the gesture detection device 101 determines whether the face orientation has been detected. If it has, step S40 is executed; if not, step S80 is executed.
 In step S40, the storage unit 60 stores the face orientation and head position information for each frame.
 In step S50, the face information acquisition unit 20 acquires the face orientation information of the frame being processed, either from the face detection unit 10 or from the storage unit 60.
 In step S60, the hand candidate detection unit 30 detects the occupant's hand candidates in the frame being processed.
 In step S70, the determination unit 40 determines whether the face orientation satisfies the predetermined condition. Here, the predetermined condition is that at least one of the pitch angle and the yaw angle exceeds the predetermined range. If at least one of them exceeds the range, that is, if the condition is satisfied, step S120 is executed. If both are within the range, that is, if the condition is not satisfied, the gesture detection method ends.
 FIG. 8 is a diagram showing an example of a frame to be processed. In FIG. 8, the occupant is making a gesture with the hand 31 to operate the in-vehicle device 120. The face detection unit 10 has detected the occupant's face 11, and a face frame 12 is set so as to surround the face 11. The hand candidate detection unit 30 has detected the occupant's hand 31 as a hand candidate, and a hand candidate frame 32 is set so as to surround it. The occupant's face is oriented straight ahead, so both the pitch angle and the yaw angle are within the predetermined ranges. The determination unit 40 determines that the face orientation does not satisfy the predetermined condition. In the case of FIG. 8, the gesture detection method therefore ends; that is, the gesture detection device 101 identifies the hand candidate as the hand 31 constituting the occupant's gesture.
 FIG. 9 is a diagram showing an example of a frame to be processed. In FIG. 9, the occupant is not making a hand gesture to operate the in-vehicle device 120 but is peering at information displayed on the center console of the vehicle, for example navigation information. The face detection unit 10 has detected the occupant's face 11, and the face frame 12 is set so as to surround the face 11. The hand candidate detection unit 30 has erroneously detected the occupant's face 11 as a hand candidate, and the hand candidate frame 32 is set so as to contain it. As in FIG. 9, when the occupant's head is shaved, the hand candidate detection unit 30 may judge the occupant's face 11 to be a closed hand 31 and detect it as a hand candidate. In FIG. 9, however, the occupant's face orientation detected by the face detection unit 10 is diagonally downward, and both the pitch angle and the yaw angle exceed the predetermined ranges. The determination unit 40 determines that the face orientation satisfies the predetermined condition, so step S120 is executed.
 In step S80, the face information acquisition unit 20 determines whether the frame being processed is within a predetermined number of frames from the most recent frame in which the occupant's face orientation was detected. If it is, that is, if this condition is satisfied, step S90 is executed; otherwise the gesture detection method ends.
 In step S90, the face information acquisition unit 20 acquires, from the storage unit 60, the face orientation and head position information of the most recent frame in which the occupant's face orientation was detected.
 In step S100, the hand candidate detection unit 30 detects the occupant's hand candidates in the frame being processed.
 In step S110, the determination unit 40 determines whether the face orientation and the head position satisfy the predetermined condition. Here, the predetermined condition is that at least one of the pitch angle, the yaw angle, and the head position exceeds the predetermined range. If at least one of them exceeds its range, step S120 is executed; if all are within their ranges, the gesture detection method ends.
 FIG. 10 is a diagram showing an example of a frame to be processed. The frame shown in FIG. 10 follows the frame shown in FIG. 9 and is within the predetermined number of frames from it. In FIG. 10, the occupant is leaning in even further to examine the information displayed on the center console of the vehicle in detail. The face detection unit 10 has failed to detect the occupant's face 11, so no face frame 12 is set. The hand candidate detection unit 30 has erroneously detected the occupant's head as a hand candidate, and the hand candidate frame 32 is set so as to contain it. Because FIG. 10 is within the predetermined number of frames from the frame of FIG. 9, the face information acquisition unit 20 acquires the face orientation and head position information of the frame of FIG. 9. The occupant's face orientation is diagonally downward, and the pitch angle and the yaw angle exceed the predetermined ranges. The determination unit 40 determines that the face orientation satisfies the predetermined condition, so step S120 is executed.
 In step S120, the determination unit 40 rejects the information of the hand candidate. For example, the determination unit 40 identifies the hand candidate as something other than a hand, or replaces the detection result of the hand candidate with the detection result of a non-hand object. In this way, the determination unit 40 rejects the hand candidate information based on the predetermined condition regarding the face orientation. The gesture detection method then ends.
 In the gesture detection method described above, the gesture detection device 101 performs the hand candidate detection processing after the detection processing of the face 11 and the acquisition processing of the face orientation information. However, the gesture detection device 101 may execute the detection processing of the face 11 and the acquisition processing of the face orientation information after the hand candidate detection processing, or may execute the hand candidate detection processing in parallel with them.
 Next, as an example, the gesture detection method for the second frame is described for the case where the first frame and the second frame of the video have the relationship described above. Here, the face detection unit 10 succeeds in detecting the occupant's face 11 and face orientation in the first frame and fails to detect the occupant's face 11 in the second frame. The first frame is the most recent frame, counting back from the second frame, in which the occupant's face orientation was detected. The frame shown in FIG. 9 corresponds to the first frame, and the frame shown in FIG. 10 corresponds to the second frame.
 In step S10, the video acquisition unit 50 acquires the second frame of the video captured by the image pickup device 110.
 In step S20, the face detection unit 10 fails to detect the occupant's face 11 in the second frame, so neither the face orientation nor the head position is detected.
 In step S30, the gesture detection device 101 determines that the occupant's face orientation has not been detected, and step S80 is executed.
 In step S80, the face information acquisition unit 20 determines whether the second frame is within the predetermined number of frames from the first frame, the most recent frame in which the occupant's face orientation was detected. As described above, the first frame and the second frame satisfy this condition, so step S90 is executed.
 In step S90, the face information acquisition unit 20 acquires the face orientation and head position information of the first frame from the storage unit 60.
 In step S100, the hand candidate detection unit 30 detects the occupant's hand candidates in the second frame.
 In step S110, the determination unit 40 determines whether at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds its predetermined range. If at least one does, step S120 is executed; if all are within their ranges, the gesture detection method ends.
 In step S120, the determination unit 40 rejects the hand candidate information of the second frame. This completes the gesture detection method for one frame being processed; step S10 is then executed again for the next frame.
 Such a gesture detection device 101 reduces the chance that something other than the occupant's hand is identified as the hand 31. In other words, the gesture detection device 101 accurately detects the hand 31 constituting the occupant's gesture.
 When the occupant operates the in-vehicle device 120, the occupant peers at information displayed on a display device such as the vehicle's dashboard or center console to check it. In that case, the occupant's head enters the detection range for hand candidates, and the hand candidate detection unit 30 may judge the occupant's head (or face 11) to be a closed hand 31 (such as a thumbs-up hand) and detect it as a hand candidate (for example, FIG. 10). On the other hand, when the occupant makes a gesture to operate vehicle equipment, the occupant's face orientation normally falls within the range from the frontal direction of the vehicle to the oblique direction in which the image pickup device 110 is located (for example, FIG. 8); when the occupant peers at displayed information, the face orientation falls outside that range. The gesture detection device 101 according to the second embodiment rejects the information of erroneously detected hand candidates based on the predetermined condition regarding the face orientation. In other words, when the occupant's face orientation exceeds the predetermined range, the gesture detection device 101 identifies the hand candidate as something other than a hand and rejects its information. As a result, the gesture detection device 101 accurately detects the hand 31 in the occupant's gesture.
 The predetermined condition regarding the face orientation is not limited to the above. For example, the condition may combine at least one of the pitch angle, the yaw angle, and the roll angle with at least one of the head position in the lateral direction, the head position in the depth direction, and the head position in the height direction.
 The gesture detection device 101 according to the second embodiment includes the storage unit 60, which stores the face orientation information and the occupant's head position information detected for each frame of the video. When the occupant's face orientation is detected in the first frame of the video but not in the second frame, which follows the first frame, the face information acquisition unit 20 acquires the face orientation information and the head position information of the first frame from the storage unit 60. The second frame is within a predetermined number of frames (a first predetermined number of frames) from the first frame. The hand candidate detection unit 30 detects the hand candidate in the second frame. The determination unit 40 rejects the information of the hand candidate in the second frame based on, as the predetermined condition, a condition regarding the face orientation and the head position in the first frame.
 Because the pattern matching processing for face detection differs from that for hand candidate detection, the hand candidate detection unit 30 may erroneously detect the occupant's face 11, head, or the like as a hand candidate even when the face detection unit 10 has failed to detect the occupant's face orientation (for example, FIG. 10). The movement of the occupant from the posture shown in FIG. 9 to the posture shown in FIG. 10, that is, the motion of peering into the display device, is continuous and takes place in a short time. Therefore, even when the determination unit 40 rejects the hand candidate information of the frame being processed based on face orientation information from a frame close in time to it, the accuracy of the rejection determination does not deteriorate. The gesture detection device 101 according to the second embodiment prevents the occupant's face 11, head, and the like from being detected as hand candidates even while the occupant's face orientation is temporarily undetected. As a result, the detection accuracy of the occupant's hand 31 is improved.
 The first frame in the second embodiment is the most recent frame, counting back from the second frame, in which the occupant's face orientation was detected.
 Even while the occupant's face 11 is temporarily undetected, the gesture detection device 101 determines whether the most recently detected face orientation satisfies the predetermined condition. The gesture detection device 101 therefore detects the occupant's hand 31 accurately.
 <Embodiment 3>
 The gesture detection device and the gesture detection method according to the third embodiment will be described. The third embodiment is a subordinate concept of the first embodiment, and the gesture detection device according to the third embodiment includes each component of the gesture detection device 101 according to the second embodiment. Descriptions of configurations and operations that are the same as in the first or second embodiment are omitted.
 FIG. 11 is a diagram showing the relationship among the first to fourth frames in the third embodiment.
 The first frame is the first frame to be processed among the plurality of frames constituting the video. In the first frame, the occupant's face orientation is detected and a hand candidate is also detected. When the frame being processed is the first frame, the determination unit 40 determines whether at least one of the pitch angle and the yaw angle of the face 11 in the first frame exceeds its predetermined range, and rejects the hand candidate information of the first frame based on the determination result.
 The second frame is the frame that follows the first frame by a first predetermined number of frames. The first predetermined number of frames may, for example, be stored in the gesture detection device or input from outside. In each frame from the frame following the first frame up to the second frame, the occupant's face orientation is not detected, but hand candidates are detected. The first frame is the most recent frame, counting back from the second frame, in which the occupant's face orientation was detected. When the frame being processed is any frame from the frame following the first frame up to the second frame, the determination unit 40 determines whether at least one of the pitch angle, the yaw angle, and the head position in the first frame exceeds its predetermined range, and rejects the hand candidate information of that frame based on the determination result. This determination operation is the same as in the second embodiment. The gesture detection device thus prevents the occupant's face 11, head, and the like from being detected as hand candidates even while the occupant's face orientation is temporarily undetected.
 The third frame is a frame later than the second frame. In each frame from the frame following the second frame up to the third frame, the occupant's face orientation is not detected, but hand candidates are detected. When the frame being processed is any frame from the frame following the second frame up to the third frame, the determination unit 40 rejects the hand candidate information; in other words, it rejects the information without performing the rejection determination based on the face orientation.
 When the face orientation remains undetected, the number of frames between the first frame and the frame being processed grows beyond the first predetermined number of frames. In step S80 shown in FIG. 7, each frame from the frame following the second frame up to the third frame would therefore be judged "No", the hand candidate information would not be rejected, and a non-hand object detected as a hand candidate could be recognized as the hand 31. The gesture detection device therefore rejects the hand candidate information detected in each frame from the frame following the second frame up to the third frame without performing the face-orientation rejection determination. As a result, the detection accuracy of the occupant's hand 31 is improved.
 The fourth frame is the frame that follows the frame after the third frame by a second predetermined number of frames. The second predetermined number of frames may, for example, be stored in the gesture detection device or input from outside. The occupant's face orientation is detected continuously from the frame following the third frame up to the fourth frame, and hand candidates are also detected. When the frame being processed is any frame from the frame following the third frame up to the frame immediately before the fourth frame, the determination unit 40 rejects the hand candidate information; in other words, it rejects the information without performing the rejection determination based on the face orientation.
 When the face orientation is detected again after a period in which it was not detected, the newly detected face orientation may not be accurate. Moreover, when an occupant makes a gesture, the occupant's face 11 usually points somewhere between straight ahead and the image pickup device 110. It is therefore preferable to resume the rejection determination of hand candidate information in a state where the occupant's face orientation is likely to fall within the predetermined range. The gesture detection device according to the third embodiment keeps rejecting the hand candidate information until face detection has succeeded for at least the second predetermined number of frames.
 When the frame being processed is the fourth frame, the determination unit 40 resumes the determination of whether to reject the hand candidate information. This improves the detection accuracy of the occupant's hand 31.
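 The first- through fourth-frame behaviour can be pictured as the small state machine below, reusing the FaceInfo and should_reject sketches above. FIRST_N and SECOND_N stand for the first and second predetermined frame counts; their values, and the exact off-by-one placement of the fourth frame, are illustrative assumptions.

```python
# Hypothetical state machine for the third embodiment. Rejection is
# unconditional once the face has been undetected for more than FIRST_N
# frames (up to the third frame) and again until SECOND_N consecutive
# detections have accumulated (the fourth frame), where judgment resumes.
from typing import Optional

FIRST_N = 10   # assumed first predetermined frame count
SECOND_N = 5   # assumed second predetermined frame count

class RejectionPolicy:
    def __init__(self) -> None:
        self.frames_since_face: Optional[int] = None  # None until first detection
        self.consecutive_faces = 0
        self.last_face: Optional[FaceInfo] = None
        self.awaiting_streak = False  # True between the third and fourth frames

    def reject(self, face: Optional[FaceInfo]) -> bool:
        """True when this frame's hand candidates must be rejected."""
        if face is not None:
            if self.frames_since_face is not None and self.frames_since_face > FIRST_N:
                self.awaiting_streak = True  # re-detection after a long gap
            self.frames_since_face = 0
            self.consecutive_faces += 1
            self.last_face = face
            if self.awaiting_streak:
                if self.consecutive_faces < SECOND_N:
                    return True               # before the fourth frame: reject
                self.awaiting_streak = False  # fourth frame: resume judgment
            return should_reject(face)        # normal per-frame judgment
        # Face orientation not detected in this frame.
        self.consecutive_faces = 0
        if self.frames_since_face is None:
            return True                       # never detected yet: reject
        self.frames_since_face += 1
        if self.frames_since_face <= FIRST_N and self.last_face is not None:
            return should_reject(self.last_face)  # reuse the stored result
        return True                           # up to the third frame: reject
```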
 The above functions are realized by the processing circuit shown in FIG. 2 or FIG. 3.
 <Embodiment 4>
 The gesture detection device and the gesture detection method according to the fourth embodiment will be described. The fourth embodiment is a subordinate concept of the first embodiment. The gesture detection device according to the fourth embodiment includes each component of the gesture detection device 101 according to the second embodiment. Descriptions of configurations and operations that are the same as in any of the first to third embodiments are omitted.
 FIG. 12 is a functional block diagram showing the configuration of the gesture detection device 102 according to the fourth embodiment. The gesture detection device 102 includes a hand candidate detection unit 30A, which is a modification of the hand candidate detection unit 30 in the second embodiment.
 The hand candidate detection unit 30A performs the hand candidate detection processing within a detection target region set in the video. In normal hand candidate detection, the detection target region is, for example, preset at a position where the occupant makes hand gestures, or set at a position containing the hand candidate frame 32 detected in a frame preceding the frame being processed. The hand candidate detection unit 30A in the fourth embodiment narrows this detection target region based on a predetermined condition regarding the face orientation: for example, when at least one of the pitch angle and the yaw angle exceeds its predetermined range, the hand candidate detection unit 30A narrows the detection target region. In other words, the predetermined condition is that at least one of the pitch angle and the yaw angle exceeds a predetermined range. The hand candidate detection unit 30A then performs the hand candidate detection processing within the narrowed detection target region.
 The determination of whether the face orientation satisfies the predetermined condition is executed, for example, by the hand candidate detection unit 30A; alternatively, the hand candidate detection unit 30A may acquire the determination result from the determination unit 40. The detection target region corresponds, for example, to what is called a gesture detection region.
 FIG. 13 is a flowchart showing the gesture detection method according to the fourth embodiment. Steps S10 to S50, S80, and S90 are the same as in the second embodiment.
 In steps S60 and S100, the hand candidate detection unit 30A detects the occupant's hand candidates within the detection target region. FIG. 14 is a diagram showing an example of a frame to be processed. In FIG. 14, the occupant is not making a hand gesture to operate the in-vehicle device 120; the occupant is adjusting the rear-view mirror 2 provided in the cabin with the hand 31 while looking at the mirror. The face detection unit 10 has detected the occupant's face 11, and the face frame 12 is set so as to surround the face 11. A detection target region 33A is set within the frame. The hand candidate detection unit 30A has detected the occupant's hand 31 as a hand candidate within the detection target region 33A, and the hand candidate frame 32 is set so as to contain it.
 In steps S70 and S110, the determination unit 40 determines whether the face orientation satisfies the predetermined condition. In FIG. 14, the occupant's face orientation detected by the face detection unit 10 is diagonally upward, and both the pitch angle and the yaw angle indicating the face orientation exceed the predetermined ranges. The hand candidate detection unit 30A determines that the face orientation satisfies the predetermined condition, and step S120 is executed.
 In step S120, the hand candidate information is rejected. In the fourth embodiment, step S130 is executed after step S120.
 In step S130, the hand candidate detection unit 30A narrows the detection target region 33A so that the rejected hand candidate is no longer detected. For example, the hand candidate detection unit 30A sets a detection target region 33B, reduced in size from the detection target region 33A so as to exclude the hand candidate frame 32. Here, the detection target region 33B corresponds to the detection target region 33A with its upper portion removed. In the gesture detection processing for the next frame, the hand candidate detection unit 30A detects hand candidates within the narrowed detection target region 33B.
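 The narrowing of the region 33A into 33B can be pictured as trimming the region from the top until it no longer contains the rejected candidate box; the (x, y, w, h) pixel format is an assumption made for illustration.

```python
# Hypothetical sketch of step S130: shrink the detection target region so
# the rejected hand-candidate box (e.g. around the rear-view mirror) falls
# outside it, as region 33A is cut down to region 33B in FIG. 14.
def narrow_region(region: tuple[int, int, int, int],
                  rejected_box: tuple[int, int, int, int]) -> tuple[int, int, int, int]:
    rx, ry, rw, rh = region
    bx, by, bw, bh = rejected_box
    new_top = by + bh            # first image row below the rejected box
    if new_top <= ry:            # box already above the region: unchanged
        return region
    cut = min(new_top - ry, rh)  # never cut more than the region's height
    return (rx, ry + cut, rw, rh - cut)
```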
 Such a gesture detection device 102 does not detect the hand 31 operating the rear-view mirror 2 as a hand in a gesture for operating the in-vehicle device 120. More specifically, the gesture detection device 102 rejects the information of the hand candidate detected from the hand 31 operating the rear-view mirror 2, and further narrows the detection target region 33A, based on the information on the occupant's face orientation, so that hand candidates are not detected in the region around the rear-view mirror 2.
 The positions and sizes of the detection target regions 33A and 33B shown in FIG. 14 are examples and are not limiting. For example, the detection target regions 33A and 33B may extend not only over the central portion between the driver's seat and the passenger seat but also toward both seats (in the left-right direction).
 The gesture detection device 102 described above narrows the detection target region 33A after rejecting the hand candidate information. However, the narrowing may instead take place between steps S50 and S60 or between steps S90 and S100; in that case, the hand candidate detection unit 30A narrows the detection target region 33A of the frame being processed when the face orientation of that frame satisfies the predetermined condition.
 <Embodiment 5>
 The gesture detection devices shown in the above embodiments can also be applied to a system constructed by appropriately combining a navigation device, a communication terminal, a server, and the functions of applications installed on them. Here, the navigation device includes, for example, a PND (Portable Navigation Device), and the communication terminal includes, for example, a mobile terminal such as a mobile phone, a smartphone, or a tablet.
 FIG. 15 is a block diagram showing the configuration of the gesture detection device 101 and the devices operating in connection with it in the fifth embodiment.
 The gesture detection device 101 and a communication device 130 are provided in a server 300. The gesture detection device 101 acquires the video captured by the image pickup device 110 provided in the vehicle 1 via a communication device 140 and the communication device 130. The gesture detection device 101 acquires the information on the occupant's face orientation detected based on that video, detects hand candidates based on the video, and rejects the information of a hand candidate based on the predetermined condition regarding the face orientation. The gesture detection device 101 identifies a hand candidate that has not been rejected as the hand 31 constituting the occupant's gesture. Based on the gesture made by the occupant's hand 31 identified by the gesture detection device 101, operation processing of the in-vehicle device 120 and the like is executed.
 Arranging the gesture detection device 101 in the server 300 in this way makes it possible to simplify the configuration of the devices mounted on the vehicle 1.
 Alternatively, the functions or components of the gesture detection device 101 may be distributed, with some provided in the server 300 and others in the vehicle 1. The same effect is obtained when the gesture detection device 100 shown in the first embodiment is provided in the server 300.
 In the present disclosure, the embodiments may be freely combined, and each embodiment may be modified or omitted as appropriate.
 Although the present disclosure has been described in detail, the above description is in all aspects illustrative and not restrictive. Countless variations not illustrated here can be envisioned.
 1 vehicle, 2 rear-view mirror, 10 face detection unit, 11 face, 12 face frame, 20 face information acquisition unit, 30 hand candidate detection unit, 30A hand candidate detection unit, 31 hand, 32 hand candidate frame, 33A detection target region, 33B detection target region, 40 determination unit, 50 video acquisition unit, 60 storage unit, 100 gesture detection device, 101 gesture detection device, 102 gesture detection device, 110 image pickup device, 120 in-vehicle device.

Claims (6)

  1.  A gesture detection device comprising:
      a face information acquisition unit that acquires information on a face orientation of an occupant, the face orientation being detected based on an image captured by an image pickup device provided in a vehicle;
      a hand candidate detection unit that detects, based on the image, a hand candidate that is a candidate for the occupant's hand; and
      a determination unit that, based on a predetermined condition regarding the face orientation, rejects the information of the hand candidate so that the hand candidate is not detected as the occupant's hand in a gesture of the occupant to be detected.
  2.  The gesture detection device according to claim 1, further comprising a storage unit that stores, for each frame of the image, the information on the face orientation of the occupant and information on a head position of the occupant, wherein,
      when the face orientation of the occupant is detected in a first frame of the image and the face orientation of the occupant is not detected in a second frame that is later than the first frame and within a first predetermined number of frames from the first frame:
      the face information acquisition unit acquires the information on the face orientation and the information on the head position in the first frame from the storage unit;
      the hand candidate detection unit detects the hand candidate in the second frame; and
      the determination unit rejects the information of the hand candidate in the second frame based on, as the predetermined condition, a condition regarding the face orientation and the head position in the first frame.
  3.  The gesture detection device according to claim 2, wherein the first frame is the most recent frame, going back from the second frame, in which the face orientation of the occupant was detected.
  4.  The gesture detection device according to claim 3, wherein,
      when the face orientation of the occupant is continuously not detected from the frame following the first frame through the second frame up to a third frame, and the face orientation of the occupant is continuously detected from the frame following the third frame up to a fourth frame:
      the determination unit rejects the information of the hand candidate in each frame from the frame following the third frame to the frame immediately preceding the fourth frame, and resumes, in the fourth frame, the determination of whether to reject the information of the hand candidate;
      the second frame is the frame that is the first predetermined number of frames after the first frame; and
      the fourth frame is the frame that is a second predetermined number of frames after the third frame.
  5.  The gesture detection device according to claim 1, wherein the hand candidate detection unit
      narrows, based on the predetermined condition regarding the face orientation, a detection target area for the hand candidate set in the image, and
      detects the hand candidate in the detection target area.
  6.  A gesture detection method comprising:
      acquiring information on a face orientation of an occupant, the face orientation being detected based on an image captured by an image pickup device provided in a vehicle;
      detecting, based on the image, a hand candidate that is a candidate for the occupant's hand; and
      rejecting, based on a predetermined condition regarding the face orientation, the information of the hand candidate so that the hand candidate is not detected as the occupant's hand in a gesture of the occupant to be detected.
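The frame-window behavior of claims 2 to 4 amounts to a small state machine: fall back to the stored face information during a short loss of face detection, reject everything during a long loss, and resume the ordinary judgment only after the face has been re-detected for a run of consecutive frames. The sketch below is one possible reading of that logic; the parameters n1 and n2 (standing in for the first and second predetermined numbers of frames), their default values, and the FaceInfo structure are illustrative assumptions, not values fixed by the claims.

```python
# One possible reading of the frame-window logic in claims 2 to 4.
# n1, n2, and FaceInfo are hypothetical stand-ins; the claims fix the
# behavior, not concrete values or data layouts.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FaceInfo:
    yaw: float                 # face orientation, e.g. yaw angle in degrees
    head_position: tuple       # head position in image coordinates


class HandCandidateGate:
    def __init__(self, n1: int = 5, n2: int = 3):
        self.n1 = n1                    # grace period after the face is lost
        self.n2 = n2                    # settling period after the face returns
        self.last_face: Optional[FaceInfo] = None   # per-frame storage (claim 2)
        self.lost_frames = 0            # consecutive frames without a face
        self.long_loss = False          # loss lasted beyond the grace period
        self.recovered_frames = 0       # consecutive frames since the face returned

    def reject(self, face: Optional[FaceInfo],
               violates: Callable[[FaceInfo], bool]) -> bool:
        """Per-frame decision: True means the hand candidate is rejected."""
        if face is None:
            self.lost_frames += 1
            self.recovered_frames = 0
            if self.lost_frames > self.n1:
                self.long_loss = True
                return True             # lost beyond n1 frames: always reject
            # within n1 frames of the loss: judge with the stored face info
            # from the most recent detected frame (claims 2 and 3)
            return self.last_face is None or violates(self.last_face)
        # face detected in this frame
        self.lost_frames = 0
        self.last_face = face
        if self.long_loss:
            self.recovered_frames += 1
            if self.recovered_frames < self.n2:
                return True             # frames before the fourth frame: reject (claim 4)
            self.long_loss = False      # n2 consecutive detections: resume judgment
            self.recovered_frames = 0
        return violates(face)           # ordinary per-frame judgment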
PCT/JP2020/020828 2020-05-27 2020-05-27 Gesture detection device and gesture detection method WO2021240668A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/020828 WO2021240668A1 (en) 2020-05-27 2020-05-27 Gesture detection device and gesture detection method

Publications (1)

Publication Number Publication Date
WO2021240668A1

Family

ID=78723081

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/020828 WO2021240668A1 (en) 2020-05-27 2020-05-27 Gesture detection device and gesture detection method

Country Status (1)

Country Link
WO (1) WO2021240668A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014048937A (en) * 2012-08-31 2014-03-17 Omron Corp Gesture recognition device, control method thereof, display equipment, and control program
JP2014197252A (en) * 2013-03-29 2014-10-16 パナソニック株式会社 Gesture operation apparatus, program thereof, and vehicle mounted with gesture operation apparatus
WO2018122891A1 (en) * 2016-12-26 2018-07-05 三菱電機株式会社 Touch panel input device, touch gesture determination device, touch gesture determination method, and touch gesture determination program
JP2018528536A (en) * 2015-08-31 2018-09-27 エスアールアイ インターナショナルSRI International Method and system for monitoring driving behavior
WO2019229938A1 (en) * 2018-05-31 2019-12-05 三菱電機株式会社 Image processing device, image processing method, and image processing system
JP2019536673A (en) * 2017-08-10 2019-12-19 ペキン センスタイム テクノロジー ディベロップメント カンパニー リミテッド Driving state monitoring method and device, driver monitoring system, and vehicle

Similar Documents

Publication Title
CN108958467B (en) Apparatus and method for controlling display of hologram, vehicle system
US20050025345A1 (en) Non-contact information input device
US20120148117A1 (en) System and method for facial identification
JP6513321B2 (en) Vehicle imaging control device, driver monitoring device, and vehicle imaging control method
JP6589796B2 (en) Gesture detection device
JPWO2007043452A1 (en) On-vehicle imaging device and imaging movable range measurement method of on-vehicle camera
JP7109649B2 (en) Arousal level estimation device, automatic driving support device, and arousal level estimation method
WO2021240668A1 (en) Gesture detection device and gesture detection method
JP2022143854A (en) Occupant state determination device and occupant state determination method
KR102441079B1 (en) Apparatus and method for controlling display of vehicle
US10953811B2 (en) Vehicle image controller, system including the same, and method thereof
JP7051014B2 (en) Face detection processing device and face detection processing method
JP7258262B2 (en) Adjustment device, adjustment system, display device, occupant monitoring device, and adjustment method
WO2021240671A1 (en) Gesture detection device and gesture detection method
WO2019097677A1 (en) Image capture control device, image capture control method, and driver monitoring system provided with image capture control device
JP2009113599A (en) On-vehicle input device
JP7175381B2 (en) Arousal level estimation device, automatic driving support device, and arousal level estimation method
WO2021229741A1 (en) Gesture detecting device and gesture detecting method
JP6865906B2 (en) Display control device and display control method
JP6847323B2 (en) Line-of-sight detection device and line-of-sight detection method
JP7267517B2 (en) Gesture recognition device and gesture recognition method
WO2022157880A1 (en) Hand detection device, gesture recognition device, and hand detection method
JP6956686B2 (en) Angle control device and angle control method
US20240174248A1 (en) Vehicle warning apparatus
CN115206130B (en) Parking space detection method, system, terminal and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20938446

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20938446

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP