WO2022009349A1 - Conversation monitoring device, control method, and computer readable medium - Google Patents

Conversation monitoring device, control method, and computer readable medium

Info

Publication number
WO2022009349A1
Authority
WO
WIPO (PCT)
Prior art keywords
conversation
persons
video data
mobile robot
camera
Prior art date
Application number
PCT/JP2020/026745
Other languages
French (fr)
Japanese (ja)
Inventor
純一 船田
尚志 水本
Original Assignee
NEC Corporation (日本電気株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corporation
Priority to JP2022534569A (JP7416253B2)
Priority to PCT/JP2020/026745 (WO2022009349A1)
Publication of WO2022009349A1 publication Critical patent/WO2022009349A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services

Definitions

  • The present invention relates to a technique for detecting conversations between a plurality of persons.
  • Patent Document 1 discloses a technique that uses images obtained from a camera installed in a facility to detect that a resident and a visitor have had a conversation for a predetermined time or longer, and that notifies, in response to the detection, that the risk of infectious disease transmission is high.
  • In Patent Document 1, a state in which people face each other at a short distance is detected as a state of having a conversation.
  • Specifically, Patent Document 1 detects that a resident and a visitor are facing each other by using an image obtained from a camera fixedly installed in the facility.
  • With a fixedly installed camera, however, there is a possibility that the face of the resident or the visitor cannot be detected from the camera's image because the resident or the visitor has entered the camera's blind spot. As a result, a conversation between the resident and the visitor may go undetected even while it is taking place.
  • The present invention has been made in view of the above problem, and one of its purposes is to provide a technique for accurately detecting a situation in which a conversation is taking place.
  • The conversation monitoring device of the present disclosure has: a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, analyzes the first video data, and determines whether or not the plurality of persons are having a conversation; a movement control unit that, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moves a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and a second determination unit that, after the mobile robot has moved, analyzes second video data obtained from the first camera or voice data obtained from the microphone and determines whether or not the plurality of persons are having a conversation.
  • The control method of the present disclosure is executed by a computer.
  • The control method includes: a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, analyzing the first video data, and determining whether or not the plurality of persons are having a conversation; a movement control step of, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and a second determination step of, after the mobile robot has moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone and determining whether or not the plurality of persons are having a conversation.
  • The computer-readable medium of the present disclosure stores a program that causes a computer to execute the control method of the present disclosure.
  • FIG. 1 is a diagram illustrating an outline of the conversation monitoring device of Embodiment 1.
  • FIG. 2 is a diagram illustrating the functional configuration of the conversation monitoring device.
  • FIG. 3 is a block diagram illustrating the hardware configuration of the computer that realizes the conversation monitoring device.
  • FIG. 4 is a block diagram illustrating the hardware configuration of the mobile robot.
  • FIG. 5 is a flowchart illustrating the flow of the processing executed by the conversation monitoring device of Embodiment 1.
  • FIG. 1 is a diagram illustrating an outline of the conversation monitoring device of the first embodiment (the conversation monitoring device 2000 of FIG. 2, described later). The following description given with reference to FIG. 1 is intended to facilitate understanding of the conversation monitoring device 2000 of the first embodiment, and the operation of the conversation monitoring device 2000 of the first embodiment is not limited to what is described below.
  • the conversation monitoring device 2000 detects a situation in which a plurality of persons 10 are talking in a predetermined monitoring area.
  • the monitoring area can be any place such as an office. Further, the monitoring area may be outdoors.
  • the conversation monitoring device 2000 analyzes the video data 32 generated by the camera 30 and attempts to determine whether or not conversation is being performed by a plurality of persons 10 (hereinafter referred to as conversation determination).
  • The camera 30 may be the camera 22 mounted on the mobile robot 20 described later, or may be a camera other than the camera 22 (for example, a surveillance camera provided on a wall, a ceiling, or the like).
  • In the former case, the camera 22 mounted on the mobile robot 20 generates both the video data 32 and the video data 23 described later.
  • Hereinafter, a set of a plurality of persons 10 for which it is determined whether or not a conversation is taking place is referred to as a person group 40.
  • the video data 32 may include a plurality of person groups 40.
  • The persons 10 included in the person group 40 are located within a predetermined distance L1 of each other.
  • In other words, video data including a plurality of persons 10 whose distance from each other is the predetermined distance L1 or less is treated as the video data 32.
  • Here, "a plurality of persons 10 are within the predetermined distance L1 of each other" means, for example, that all of the persons 10 are contained in a circle of diameter L1.
  • For example, the predetermined distance L1 is determined based on the distance that people should keep from each other to prevent infectious diseases (the so-called social distance). Specifically, the value defined as the social distance, or that value plus a predetermined margin, can be used as the predetermined distance L1.
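  • As a rough illustration of this grouping criterion, the following Python sketch (not part of the disclosure; all names and the concrete values of the social distance and margin are assumptions) forms candidate person groups 40 from estimated positions by connecting persons whose pairwise distance is the predetermined distance L1 or less; the circle-of-diameter-L1 criterion mentioned above is approximated here by pairwise distances.

        import itertools

        SOCIAL_DISTANCE = 2.0  # assumed social-distance value, in meters
        MARGIN = 0.5           # assumed predetermined margin, in meters
        L1 = SOCIAL_DISTANCE + MARGIN

        def detect_person_groups(positions):
            """Group persons whose pairwise distance is <= L1.

            positions: dict mapping person id -> (x, y) in meters.
            Returns a list of sets of person ids (candidate person groups 40).
            """
            # Build a proximity graph: an edge means two persons are within L1.
            edges = {pid: set() for pid in positions}
            for a, b in itertools.combinations(positions, 2):
                (ax, ay), (bx, by) = positions[a], positions[b]
                if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 <= L1:
                    edges[a].add(b)
                    edges[b].add(a)
            # Connected components with two or more members are person groups.
            groups, seen = [], set()
            for pid in positions:
                if pid in seen:
                    continue
                stack, component = [pid], set()
                while stack:
                    cur = stack.pop()
                    if cur not in component:
                        component.add(cur)
                        stack.extend(edges[cur] - component)
                seen |= component
                if len(component) >= 2:
                    groups.append(component)
            return groups
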
  • When it cannot be determined from the video data 32 whether or not a conversation is taking place, the conversation monitoring device 2000 performs a further conversation determination for the person group 40 by using the mobile robot 20. Specifically, the conversation determination using the video data 32 yields one of the following three results: 1) a conversation is taking place in the person group 40; 2) no conversation is taking place in the person group 40; or 3) it cannot be determined whether or not a conversation is taking place in the person group 40 (for example, neither the probability that a conversation is taking place nor the probability that no conversation is taking place is sufficiently high). When the result is 3), the conversation monitoring device 2000 performs a further conversation determination by using the mobile robot 20.
  • For this purpose, the conversation monitoring device 2000 uses the video data 23 obtained from the camera 22 mounted on the mobile robot 20, or the voice data 25 obtained from the microphone 24 mounted on the mobile robot 20.
  • The mobile robot 20 may be provided with either one of the camera 22 and the microphone 24, or with both.
  • To do so, the conversation monitoring device 2000 moves the mobile robot 20 to an appropriate position (hereinafter referred to as the destination) from which it can be determined whether or not a conversation is taking place in the person group 40.
  • For example, the conversation monitoring device 2000 moves the mobile robot 20 to a position where the face of each person 10 included in the person group 40 can be imaged, and performs the conversation determination for the person group 40 using the video data 23 obtained after the movement.
  • Alternatively, the conversation monitoring device 2000 moves the mobile robot 20 to a position whose distance from the person group 40 is a predetermined distance L2 or less, and performs the conversation determination for the person group 40 using the voice data 25 obtained after the movement.
  • When it is determined that a conversation is taking place in the person group 40, the conversation monitoring device 2000 performs a predetermined coping process (for example, a warning process).
  • Even when the conversation monitoring device 2000 of the present embodiment obtains video data 32 in which a plurality of persons 10 located within the predetermined distance L1 of each other (a person group 40) are detected, it may be unable to determine from that data whether or not those persons 10 are having a conversation.
  • In that case, the mobile robot 20 is used. Specifically, the conversation monitoring device 2000 moves the mobile robot 20 to an appropriate position and performs the conversation determination for the person group 40 by using the video data 23 or the voice data 25 obtained from the mobile robot 20.
  • In this way, when the conversation monitoring device 2000 of the present embodiment cannot determine whether or not a plurality of persons 10 located within a specific distance of each other are having a conversation, it controls the mobile robot 20 so as to obtain video data 23 or voice data 25 from which the determination can be made. Therefore, it can be determined with higher accuracy whether or not a plurality of persons 10 located within a specific distance of each other are having a conversation.
  • FIG. 2 is a diagram illustrating the functional configuration of the conversation monitoring device 2000.
  • the conversation monitoring device 2000 includes a first determination unit 2020, a movement control unit 2040, and a second determination unit 2060.
  • the first determination unit 2020 analyzes the video data 32 and determines whether or not a conversation is taking place in the person group 40.
  • The movement control unit 2040 moves the mobile robot 20 provided with the camera 22 to a position where the face of each person 10 included in the person group 40 can be imaged, or moves the mobile robot 20 provided with the microphone 24 to a position whose distance from the person group 40 is the predetermined distance L2 or less.
  • After the mobile robot 20 has moved, the second determination unit 2060 analyzes the video data 23 obtained from the camera 22 or the voice data 25 obtained from the microphone 24, and determines whether or not a conversation is taking place in the person group 40.
  • Each functional component of the conversation monitoring device 2000 may be realized by hardware that realizes that component (for example, a hard-wired electronic circuit) or by a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls it).
  • Hereinafter, the case where each functional component of the conversation monitoring device 2000 is realized by a combination of hardware and software is further described.
  • FIG. 3 is a block diagram illustrating a hardware configuration of a computer 500 that realizes the conversation monitoring device 2000.
  • the computer 500 is any computer.
  • the computer 500 is a stationary computer such as a PC (Personal Computer) or a server machine.
  • the computer 500 is a portable computer such as a smartphone or a tablet terminal.
  • the computer 500 may be a controller (controller 600 described later) built in the mobile robot 20.
  • In this case, the conversation monitoring device 2000 is realized as the mobile robot 20 (that is, the mobile robot 20 also functions as the conversation monitoring device 2000).
  • the computer 500 may be a dedicated computer designed to realize the conversation monitoring device 2000, or may be a general-purpose computer.
  • For example, each function of the conversation monitoring device 2000 is realized on the computer 500 by installing a predetermined application on the computer 500.
  • The above application is composed of a program that realizes the functional components of the conversation monitoring device 2000.
  • the computer 500 has a bus 502, a processor 504, a memory 506, a storage device 508, an input / output interface 510, and a network interface 512.
  • the bus 502 is a data transmission path for the processor 504, the memory 506, the storage device 508, the input / output interface 510, and the network interface 512 to transmit and receive data to and from each other.
  • the method of connecting the processors 504 and the like to each other is not limited to the bus connection.
  • the processor 504 is various processors such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (Field-Programmable Gate Array).
  • The memory 506 is a main storage device realized by using RAM (Random Access Memory) or the like.
  • the storage device 508 is an auxiliary storage device realized by using a hard disk, SSD (Solid State Drive), memory card, ROM (Read Only Memory), or the like.
  • the input / output interface 510 is an interface for connecting the computer 500 and the input / output device.
  • an input device such as a keyboard and an output device such as a display device are connected to the input / output interface 510.
  • the network interface 512 is an interface for connecting the computer 500 to the wireless network.
  • This network may be a LAN (Local Area Network) or a WAN (Wide Area Network).
  • the computer 500 is communicably connected to the mobile robot 20 via a network interface 512 and a wireless network.
  • the storage device 508 stores a program (a program that realizes the above-mentioned application) that realizes each functional component of the conversation monitoring device 2000.
  • the processor 504 reads this program into the memory 506 and executes it to realize each functional component of the conversation monitoring device 2000.
  • the conversation monitoring device 2000 may be realized by one computer 500 or by a plurality of computers 500. In the latter case, the configurations of the computers 500 do not have to be the same and can be different.
  • FIG. 4 is a block diagram illustrating a hardware configuration of the mobile robot 20.
  • the mobile robot 20 includes a camera 22, a microphone 24, an actuator 26, a moving means 27, and a controller 600.
  • the mobile robot 20 moves by operating the moving means 27 according to the output of the actuator 26.
  • For example, the moving means 27 is a means for realizing traveling, such as wheels.
  • In this case, the mobile robot 20 moves by traveling in the monitoring area.
  • Alternatively, the moving means 27 may be a means for realizing flight, such as a propeller.
  • In this case, the mobile robot 20 moves by flying in the monitoring area.
  • the output of the actuator 26 is controlled by the controller 600.
  • the controller 600 is an arbitrary computer, and is realized by an integrated circuit such as SoC (System on a Chip) or SiP (System in a Package). In addition, for example, the controller 600 may be realized by a mobile terminal such as a smartphone.
  • the controller 600 has a bus 602, a processor 604, a memory 606, a storage device 608, an input / output interface 610, and a network interface 612.
  • The bus 602, the processor 604, the memory 606, the storage device 608, the input/output interface 610, and the network interface 612 have functions similar to those of the bus 502, the processor 504, the memory 506, the storage device 508, the input/output interface 510, and the network interface 512, respectively.
  • FIG. 5 is a flowchart illustrating the flow of processing executed by the conversation monitoring device 2000 of the first embodiment.
  • the first determination unit 2020 acquires the video data 32 including the person group 40, analyzes the video data 32, and makes a conversation determination about the person group 40 (S102).
  • If it is determined in S102 that a conversation is taking place in the person group 40, the conversation monitoring device 2000 executes a predetermined coping process (S104).
  • If it cannot be determined in S102 whether or not a conversation is taking place, the loop process A is executed.
  • The loop process A is composed of S106 to S112.
  • the movement control unit 2040 moves the mobile robot 20 (S108).
  • the second determination unit 2060 makes a conversation determination using the video data 23 or the voice data 25 obtained from the mobile robot 20 after the movement (S110).
  • If it is determined that a conversation is taking place in the person group 40 (S110: conversation), the conversation monitoring device 2000 executes the coping process (S104).
  • If it is determined that no conversation is taking place in the person group 40 (S110: no conversation), the process of FIG. 5 ends.
  • If it is determined that it cannot be determined whether or not a conversation is taking place in the person group 40 (S110: cannot be determined), the process returns to S106 and the loop process A is executed again. In this way, the conversation determination is repeated, while the mobile robot 20 is moved, until it can be determined whether or not there is a conversation.
  • the loop process A continues to be executed while it cannot be determined whether or not there is a conversation in the person group 40. However, even if it cannot be determined whether or not there is a conversation, the loop process A may be terminated and the process of FIG. 5 may be terminated when the predetermined termination condition is satisfied.
  • For example, the conversation monitoring device 2000 determines in S106 whether or not the predetermined end condition is satisfied. If the end condition is satisfied, the loop process A ends, and the process of FIG. 5 ends. On the other hand, if the end condition is not satisfied, the loop process A continues (S108 and onward).
  • the predetermined end condition is a condition that "a predetermined time has elapsed since the conversation determination (S102) was first performed for the person group 40".
  • the predetermined end condition may be a condition that "the distance between the persons 10 included in the person group 40 is larger than the predetermined distance L1".
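  • The flow of FIG. 5 described above, including the two example end conditions, can be summarized in the following Python-style sketch; it is a minimal illustration, and the helper functions passed in as parameters, the string results, and the concrete time limit are assumed names and values, not part of the disclosure.

        import time

        CONVERSATION, NO_CONVERSATION, UNDETERMINED = (
            "conversation", "no_conversation", "undetermined")

        def monitor_person_group(group, video_data_32, first_determination,
                                 second_determination, move_mobile_robot,
                                 coping_process, group_dispersed,
                                 max_duration_sec=60.0):
            """Sketch of S102-S112 of FIG. 5, with the two example end conditions."""
            started = time.monotonic()
            result = first_determination(video_data_32, group)      # S102
            while result == UNDETERMINED:
                # Loop process A; the end conditions correspond to S106.
                if time.monotonic() - started > max_duration_sec:   # time elapsed
                    return UNDETERMINED
                if group_dispersed(group):                          # distance now > L1
                    return UNDETERMINED
                move_mobile_robot(group)                            # S108
                result = second_determination(group)                # S110 (video 23 / voice 25)
            if result == CONVERSATION:
                coping_process(group)                               # S104
            return result
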
  • The video data 32 is video data generated by the camera 30, in which a person group 40 (a plurality of persons 10 whose distance from each other is the predetermined distance L1 or less) has been detected.
  • a device that performs a process of detecting a person group 40 from video data is called a person group detection device.
  • the person group detection device may be the conversation monitoring device 2000 (that is, the conversation monitoring device 2000 may also have the function of the person group detection device), or may be a device other than the conversation monitoring device 2000.
  • The person group detection device detects a plurality of persons 10 from the video data generated by the camera 30 and, by identifying that the distance between these persons 10 is the predetermined distance L1 or less, detects these persons 10 as a person group 40.
  • a plurality of cameras 30 may be provided.
  • the person group detection device analyzes the video data generated by the camera 30 and detects a plurality of people 10 from the video data. When a plurality of people 10 are detected, the person group detection device controls a projector to project an image representing a specific distance (hereinafter referred to as a distance image) onto the ground.
  • the distance image is projected at a position where both the detected plurality of people 10 and the distance image can be included in the imaging range of the camera 30.
  • the distance represented by the distance image is, for example, the predetermined distance L1 described above.
  • the projector may be mounted on the mobile robot 20 or may be installed in another place (for example, the ceiling).
  • The person group detection device detects a plurality of persons 10 and the distance image from the video data generated by the camera 30, and compares the distance between the persons 10 with the size of the distance image (that is, the predetermined distance L1 as it appears on the image). When the distance between the persons 10 is smaller than the size of the distance image, the person group detection device detects these persons 10 as a person group 40. Then, the person group detection device provides the conversation monitoring device 2000, as the video data 32, with the video data in which the person group 40 was detected and the video data generated by the camera 30 thereafter.
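  • A minimal sketch of this comparison on the image plane follows; the detector functions are assumed to be available externally (for example, from an object-detection library), and their names are illustrative.

        def detect_group_with_distance_image(frame, detect_people, detect_distance_image):
            """Compare inter-person pixel distances with the projected distance image.

            detect_people(frame) -> list of (person_id, (x, y)) image coordinates.
            detect_distance_image(frame) -> length, in pixels, of the projected
            distance image, i.e. the predetermined distance L1 as seen on the image.
            """
            people = detect_people(frame)
            l1_pixels = detect_distance_image(frame)
            group = set()
            for i, (id_a, (ax, ay)) in enumerate(people):
                for id_b, (bx, by) in people[i + 1:]:
                    if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 < l1_pixels:
                        group |= {id_a, id_b}
            return group  # detected person group 40 (empty set if none)
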
  • the method for specifying that the distance between the persons 10 is a predetermined distance L1 or less is not limited to the above method, and other existing techniques may be used.
  • the conversation monitoring device 2000 is provided with information that can specify the position of the person group 40 in addition to the video data 32.
  • the information that can specify the position of the person group 40 is, for example, information that represents the position of the person group 40 in the map data that represents the map of the monitoring area. It should be noted that existing technology can be used for the technology for specifying the position of the object captured by the camera on the map data.
  • The first determination unit 2020 performs the conversation determination for the person group 40 included in the video data 32 (S102). Specifically, the first determination unit 2020 performs the conversation determination based on the state of the face of each person 10 included in the person group 40.
  • Hereinafter, methods of conversation determination using video data are illustrated.
  • For example, the first determination unit 2020 determines whether or not there is a conversation by determining whether or not each person 10 included in the person group 40 is moving his or her mouth. For example, if any one of the persons 10 included in the person group 40 is moving the mouth, the first determination unit 2020 determines that all the persons 10 included in the person group 40 are having a conversation (that is, that a conversation is taking place in the person group 40). If no person 10 included in the person group 40 is moving the mouth, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Further, when no person 10 who is moving the mouth is detected but a person 10 for whom it cannot be determined whether or not the mouth is moving is detected, the first determination unit 2020 determines that the presence or absence of a conversation cannot be determined.
  • Here, the first determination unit 2020 may determine that the conversation is being held only by those persons 10 who are moving their mouths among the persons 10 included in the person group 40. In this case, the first determination unit 2020 excludes the persons 10 determined not to be moving their mouths from the person group 40. Specifically, when no person 10 for whom it cannot be determined whether or not the mouth is moving is detected from the video data 32 and two or more persons 10 who are moving their mouths are detected, the first determination unit 2020 excludes the persons 10 who are not moving their mouths from the person group 40 and then determines that a conversation is taking place in the person group 40.
  • When no person 10 for whom it cannot be determined whether or not the mouth is moving is detected and the number of persons 10 who are moving their mouths is one or less, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Further, when a person 10 for whom it cannot be determined whether or not the mouth is moving is detected from the video data 32, the first determination unit 2020 excludes the persons 10 determined not to be moving their mouths from the person group 40 and determines that the presence or absence of a conversation cannot be determined.
  • A person 10 for whom it cannot be determined whether or not the mouth is moving is, for example, a person whose mouth region is not included in the video data 32 because the person has his or her back to the camera that generates the video data 32.
  • For example, the first determination unit 2020 is configured to calculate, from time-series data of the image region representing the mouth of a person and its surroundings, both the probability that the person is moving the mouth and the probability that the person is not moving the mouth. Based on these probabilities, the first determination unit 2020 identifies the person's state as one of the following three: 1) the mouth is moving; 2) the mouth is not moving; or 3) it cannot be determined whether or not the mouth is moving.
  • For example, a predetermined value T1 is set in advance as a threshold for these probabilities.
  • If the probability that the mouth is moving is T1 or more, it is determined that the person is moving the mouth.
  • If the probability that the mouth is not moving is T1 or more, it is determined that the person is not moving the mouth.
  • If neither probability is T1 or more, it is determined that it cannot be determined whether or not the person is moving the mouth.
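  • In code, this three-way decision is a simple threshold test; the concrete value of T1 below is an assumption for illustration.

        T1 = 0.8  # assumed threshold for the two probabilities

        MOVING, NOT_MOVING, UNDECIDABLE = "moving", "not_moving", "undecidable"

        def classify_mouth_state(p_moving, p_not_moving, threshold=T1):
            """Three-way mouth-state decision from the two estimated probabilities."""
            if p_moving >= threshold:
                return MOVING
            if p_not_moving >= threshold:
                return NOT_MOVING
            return UNDECIDABLE  # neither probability reaches T1
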
  • the first determination unit 2020 determines whether or not there is a conversation based on the direction of the face or the line of sight of each person 10 included in the person group 40.
  • Hereinafter, the case of using the orientation of the face is described more specifically. Unless otherwise specified, the description of the case of using the direction of the line of sight is obtained by replacing "face" with "line of sight" in the following description.
  • For example, when the face of each person 10 included in the person group 40 is directed toward some other person 10 included in the person group 40, the first determination unit 2020 determines that all the persons 10 included in the person group 40 are having a conversation (that is, that a conversation is taking place in the person group 40). When none of the faces of the persons 10 included in the person group 40 is directed toward another person 10 included in the person group 40, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Further, when a person 10 whose face orientation cannot be determined is detected in the person group 40, the first determination unit 2020 determines that the presence or absence of a conversation cannot be determined.
  • Here, the first determination unit 2020 may determine that the conversation is being held only by those persons 10 whose faces are directed toward another person 10 included in the person group 40. In this case, the first determination unit 2020 excludes from the person group 40 the persons 10 determined not to have their faces directed toward another person 10. Specifically, when no person 10 whose face orientation cannot be determined is detected from the video data 32 and two or more persons 10 whose faces are directed toward another person 10 are detected, the first determination unit 2020 excludes the persons 10 whose faces are not directed toward another person 10 from the person group 40 and then determines that a conversation is taking place in the person group 40.
  • When no person 10 whose face orientation cannot be determined is detected from the video data 32 and the number of persons 10 whose faces are directed toward another person 10 is one or less, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Further, when a person 10 whose face orientation cannot be determined is detected from the video data 32, the first determination unit 2020 excludes the persons 10 whose faces are not directed toward another person 10 from the person group 40 and determines that the presence or absence of a conversation cannot be determined.
  • A person 10 whose face orientation cannot be determined is, for example, a person whose face region is not included in the video data 32 (for example, because the person has his or her back to the camera that generates the video data 32, or because the face is hidden by an obstacle).
  • When the direction of the line of sight is used instead of the orientation of the face, the eye region is analyzed instead of the face region.
  • For example, the first determination unit 2020 is configured to calculate, from time-series data of the image region representing the face of a person, the probability that the face is directed in each of a plurality of directions (for example, four or eight predetermined directions). When there is a direction for which the calculated probability is equal to or greater than a threshold, the first determination unit 2020 identifies that direction as the orientation of the face of the person 10. On the other hand, when the probabilities calculated for all the directions are less than the threshold, the first determination unit 2020 determines that the orientation of the face of the person 10 cannot be determined.
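  • The per-direction decision can be sketched as follows; the set of directions and the threshold value are illustrative assumptions.

        import numpy as np

        DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]  # assumed 8 directions
        DIRECTION_THRESHOLD = 0.6                                  # assumed threshold

        def classify_face_direction(direction_probs, threshold=DIRECTION_THRESHOLD):
            """Return the face direction, or None when it cannot be determined.

            direction_probs: per-direction probabilities in the order of DIRECTIONS.
            """
            best = int(np.argmax(direction_probs))
            if direction_probs[best] >= threshold:
                return DIRECTIONS[best]
            return None  # orientation cannot be determined
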
  • Alternatively, for example, the first determination unit 2020 may have a trained model that identifies, in response to the input of video data including the faces of a plurality of persons 10, whether or not the plurality of persons 10 are having a conversation.
  • For example, in response to the input of video data including the faces of a plurality of persons 10, the model outputs one of three determination results for the plurality of persons 10: 1) a conversation is taking place; 2) a conversation is not taking place; or 3) it cannot be determined whether or not a conversation is taking place.
  • Such a model can be realized by, for example, a recurrent neural network (RNN).
  • For example, the model calculates both the probability that a conversation is taking place and the probability that no conversation is taking place, and compares these probabilities with a threshold. If the probability that a conversation is taking place is equal to or greater than the threshold, the determination result that a conversation is taking place is output. If the probability that no conversation is taking place is equal to or greater than the threshold, the determination result that no conversation is taking place is output. If both probabilities are less than the threshold, the determination result that it cannot be determined whether or not a conversation is taking place is output.
  • The above model is trained in advance using learning data composed of combinations of video data and a correct-answer label (a label indicating whether or not a conversation is taking place).
  • an existing technique can be used as a technique for learning a model using learning data composed of a combination of input data and a correct label.
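  • One possible realization is sketched below in PyTorch; the framework, architecture, feature dimensions, and threshold are all assumptions made for illustration, since the disclosure only specifies an RNN that yields the two probabilities and the three-way decision.

        import torch
        import torch.nn as nn

        class ConversationRNN(nn.Module):
            """Three-way conversation classifier over a sequence of face features."""
            def __init__(self, feat_dim=128, hidden=64):
                super().__init__()
                self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 2)

            def forward(self, x):   # x: (batch, time, feat_dim)
                _, h = self.rnn(x)  # h: (num_layers, batch, hidden)
                # Independent probabilities for "conversation" and "no conversation",
                # so both can fall below the threshold at the same time.
                return torch.sigmoid(self.head(h[-1]))

        def decide(probs, threshold=0.7):
            """Map the two probabilities of one sample to the three results."""
            p_conv, p_no_conv = float(probs[0]), float(probs[1])
            if p_conv >= threshold:
                return "conversation"
            if p_no_conv >= threshold:
                return "no_conversation"
            return "cannot_determine"
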
  • The movement control unit 2040 moves the mobile robot 20 toward a position where video data 23 or voice data 25 from which the presence or absence of a conversation in the person group 40 can be determined is obtainable (S108). Then, the second determination unit 2060 performs the conversation determination for the person group 40 by using the video data 23 or the voice data 25 obtained from the mobile robot 20 during or after the movement.
  • a method of controlling the movement of the mobile robot 20 and a method of determining conversation will be described separately for a case where the video data 23 is used for the conversation determination and a case where the voice data 25 is used.
  • The method of the conversation determination using the video data 23 is the same as that of the conversation determination using the video data 32 described above. That is, the conversation determination using the video data 23 uses the movement of the mouth, the orientation of the face, the direction of the line of sight, and so on of each person 10. The movement control unit 2040 therefore moves the mobile robot 20 toward a position where the information necessary for identifying the movement of the mouth, the orientation of the face, or the direction of the line of sight can be obtained for each person 10 included in the person group 40.
  • the information required to identify the movement of the mouth, the orientation of the face, and the orientation of the line of sight is an image region including the mouth, an image region including the face, and an image region including the eyes, respectively.
  • the movement control unit 2040 moves the mobile robot 20 so as to approach the person group 40.
  • the movement control unit 2040 moves the mobile robot 20 to a position where there are no obstacles between the person 10 included in the person group 40 and the mobile robot 20.
  • the mobile robot may be moved so as to approach a specific object included in the video data obtained from the camera mounted on the mobile robot, or the robot may be moved to a position where there are no obstacles between the mobile robot and the specific object.
  • Existing technology can be used for the technology itself for moving a mobile robot.
  • the movement control unit 2040 calculates the face orientations of the plurality of persons 10 included in the person group 40, and sequentially moves the mobile robot 20 to the front of the faces of the plurality of persons 10. By doing so, the movement control unit 2040 sequentially specifies the movement of the mouth and the direction of the line of sight for each person 10.
  • the movement control unit 2040 may move the mobile robot 20 so that the mouths and eyes of a plurality of people 10 can be imaged from one place.
  • Alternatively, for example, the movement control unit 2040 calculates the average direction of the face orientations of the persons 10 from the video data 32 and the video data 23, and moves the mobile robot 20 to a position along that direction.
  • After the movement, the second determination unit 2060 attempts to identify the orientation of the face of each person 10 from the video data 23. Then, when the orientation of the face of a person 10 can be identified, the movement control unit 2040 moves the mobile robot 20 to the front of the face of that person 10.
  • the second determination unit 2060 makes a conversation determination about the person group 40 based on the relationship between the size of the voice included in the voice data 25 and the distance to the person group 40.
  • the movement control unit 2040 moves the mobile robot 20 to a position where the distance from the person group 40 is a predetermined distance L3 or less. This predetermined distance L3 is preset as a distance at which the voice of the conversation can be detected by the microphone 24 when the conversation is being held in the person group 40.
  • For example, the second determination unit 2060 acquires the voice data 25 from the microphone 24 of the mobile robot 20 that has moved to a position whose distance from the person group 40 is the predetermined distance L3 or less, and determines whether or not the loudness of the voice represented by the voice data 25 is equal to or greater than a threshold. When the loudness of the voice represented by the voice data 25 is equal to or greater than the threshold, the second determination unit 2060 determines that a conversation is taking place in the person group 40. On the other hand, when the loudness is less than the threshold, the second determination unit 2060 determines that no conversation is taking place in the person group 40.
  • the above threshold value may be a fixed value or may be dynamically set according to the distance from the mobile robot 20 to the person group 40. In the latter case, for example, a function that defines the relationship between the distance and the threshold value is predetermined.
  • In the latter case, the second determination unit 2060 identifies the distance from the mobile robot 20 to the person group 40 at the time when the voice data 25 was obtained from the microphone 24, identifies the threshold by inputting that distance into the above function, and compares the loudness of the voice represented by the voice data 25 with the identified threshold.
  • Further, the second determination unit 2060 may analyze the voice data 25 and determine whether or not a human voice is included in it. In this case, the second determination unit 2060 determines that a conversation is taking place in the person group 40 when the loudness of the voice represented by the voice data 25 is equal to or greater than the threshold and the voice includes a human voice. On the other hand, when the loudness of the voice is less than the threshold or the voice does not include a human voice, it determines that no conversation is taking place in the person group 40. In this way, for example, a situation in which a loud sound other than a human voice is occurring can be prevented from being erroneously detected as a situation in which the person group 40 is having a conversation.
  • Further, the second determination unit 2060 may consider the number of people whose voices are included in the voice data 25. For example, the second determination unit 2060 determines that a conversation is taking place in the person group 40 when the loudness of the voice represented by the voice data 25 is equal to or greater than the threshold and the voice includes the voices of a plurality of people. On the other hand, when the loudness of the voice is less than the threshold or the number of people whose voices are included is one or less, it determines that no conversation is taking place in the person group 40. In this way, for example, a situation in which one person is talking to himself or herself can be prevented from being erroneously detected as a situation in which the person group 40 is having a conversation.
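  • The audio-based decision logic described above can be combined as in the following sketch; the RMS loudness measure, the inverse-distance threshold function, and the externally supplied voice-activity and speaker-count analyzers are all illustrative assumptions.

        import numpy as np

        def loudness_threshold(distance_m, base=0.05, ref_distance=1.0):
            """Assumed distance-dependent threshold: lower when farther away,
            reflecting that received speech is quieter at larger distances."""
            return base * (ref_distance / max(distance_m, ref_distance))

        def audio_conversation_determination(samples, distance_m,
                                             contains_human_voice, count_speakers):
            """Sketch of the conversation determination from voice data 25.

            samples: mono audio samples in [-1, 1]; contains_human_voice and
            count_speakers stand in for external analyzers (e.g. VAD, diarization).
            """
            rms = float(np.sqrt(np.mean(np.square(samples))))  # loudness
            if rms < loudness_threshold(distance_m):
                return "no_conversation"
            if not contains_human_voice(samples):
                return "no_conversation"  # loud, but not a human voice
            if count_speakers(samples) <= 1:
                return "no_conversation"  # e.g. one person talking to themselves
            return "conversation"
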
  • When the accuracy of the determination as to whether or not the voice data 25 contains a human voice, or the accuracy of the calculation of the number of people whose voices are included in the voice data 25, is low, the second determination unit 2060 may determine that the presence or absence of a conversation cannot be determined. For example, if both the probability that the voice data 25 contains a human voice and the probability that it does not are less than a predetermined threshold, it is determined that the presence or absence of a conversation cannot be determined.
  • Alternatively, for example, the second determination unit 2060 may have a trained model that identifies, in response to the input of voice data, whether or not the voice data includes the voices of a plurality of persons 10 having a conversation. For example, in response to the input of voice data, the model outputs one of three determination results: 1) a conversation is taking place; 2) a conversation is not taking place; or 3) it cannot be determined whether or not a conversation is taking place.
  • Such a model can also be realized by, for example, a recurrent neural network (RNN).
  • For example, the model calculates both the probability that a conversation is taking place and the probability that no conversation is taking place, compares these probabilities with a threshold, and outputs the corresponding determination result in the same manner as the video-based model described above.
  • The above model is trained in advance using learning data composed of combinations of voice data and a correct-answer label (a label indicating whether or not a conversation is taking place).
  • a movement route to the destination is set using map data that can be referred to by the mobile robot 20.
  • a device that calculates a movement route to a destination using map data and performs a process of setting the calculated movement route in the mobile robot 20 is called a route setting device.
  • the route setting device may be a mobile robot 20, a conversation monitoring device 2000, or a device other than these.
  • The route setting device acquires the map data and calculates the movement route of the mobile robot 20 based on the map data and the destination (the position to which the mobile robot 20 should be moved) determined by the various methods described above. Then, the route setting device sets the calculated movement route in the mobile robot 20, and the mobile robot 20 moves according to the set movement route.
  • When the route setting device is a device other than the conversation monitoring device 2000, the movement control unit 2040 provides the route setting device with information indicating the destination to be set for the mobile robot 20.
  • the existing technology can be used as the technology for calculating the movement route based on the map data and the destination information.
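  • One such existing technique is grid-based A* search; the sketch below is an illustrative stand-in (the disclosure does not prescribe a specific planner) and assumes the map data has been rasterized into an occupancy grid.

        import heapq

        def plan_route(grid, start, goal):
            """Minimal A* on a 2D occupancy grid (0 = free, 1 = obstacle).

            Returns a list of (row, col) cells from start to goal, or None.
            """
            def h(a, b):  # Manhattan-distance heuristic
                return abs(a[0] - b[0]) + abs(a[1] - b[1])

            open_set = [(h(start, goal), 0, start, None)]
            came_from, g_cost = {}, {start: 0}
            while open_set:
                _, g, cell, parent = heapq.heappop(open_set)
                if cell in came_from:  # already expanded
                    continue
                came_from[cell] = parent
                if cell == goal:
                    path = []
                    while cell is not None:
                        path.append(cell)
                        cell = came_from[cell]
                    return path[::-1]
                r, c = cell
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                            and grid[nr][nc] == 0):
                        ng = g + 1
                        if ng < g_cost.get((nr, nc), float("inf")):
                            g_cost[(nr, nc)] = ng
                            heapq.heappush(
                                open_set, (ng + h((nr, nc), goal), ng, (nr, nc), cell))
            return None  # no route found
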
  • the mobile robot 20 moves so as not to interfere with the behavior of a person in the monitoring area.
  • For example, the mobile robot 20 uses the video data 32 and the video data 23 to grasp the movement of each person in the monitoring area, and moves so as not to come into contact with any of them. Existing technology (for example, technology for moving an autonomous vehicle so as not to collide with other vehicles or passersby) can be used for such control.
  • It is also preferable that the mobile robot 20 moves so as not to enter the field of view of persons who are not included in the person group 40. For example, when a person 10 not included in the person group 40 is detected from the video data 23, the route setting device identifies the face orientation or the line-of-sight direction of that person 10. Then, based on the identified face orientation or line-of-sight direction and the destination of the mobile robot 20, the route setting device calculates a movement route by which the mobile robot 20 can reach the destination without entering the field of view of that person 10, and sets the movement route in the mobile robot 20.
  • Here, the route setting device may detect from the video data only persons whose face orientation or line-of-sight direction is unlikely to change significantly (for example, persons who are standing still or sitting in chairs), and set the movement route of the mobile robot 20 so as not to enter the field of view of the detected persons.
  • Until it receives control from the movement control unit 2040, the mobile robot 20 may be stationary or may be moving.
  • In the latter case, for example, a movement route is set so that the mobile robot 20 patrols part or all of the monitoring area.
  • By having the mobile robot 20 patrol the monitoring area, it becomes possible to detect person groups 40 at various places within the monitoring area.
  • the movement route set in the mobile robot 20 for patrol is also referred to as a patrol route.
  • For example, it is preferable that the patrol route includes the regions of the monitoring area where the density of people is high (that is, where there are many people).
  • The patrol route may also include only the regions of the monitoring area where the density of people is high.
  • Alternatively, the patrol route is set so that regions with a high density of people are patrolled more frequently than regions with a low density of people.
  • It is also preferable that the patrol route of the mobile robot 20 includes areas not included in the imaging range of the camera 30 (that is, areas that none of the cameras 30 can image).
  • In this way, the mobile robot 20 can image areas that are difficult to image with the fixedly installed cameras, so that the monitoring area can be monitored broadly.
  • the patrol route may be set manually or automatically by a route setting device.
  • For example, the route setting device identifies regions outside the imaging range of the camera 30 by analyzing the video data generated by the camera 30, and generates a patrol route including those regions. More specifically, the route setting device identifies the regions within the imaging range of the camera 30 by using the map data of the monitoring area and the video data generated by the camera 30, and identifies the remaining regions as the regions outside the imaging range.
  • the route setting device generates a patrol route so as to patrol the region outside the imaging range.
  • Here, the regions outside the imaging range may be a plurality of regions that are not connected to each other.
  • In this case, the route setting device generates a patrol route that patrols the plurality of regions outside the imaging range sequentially, as sketched below.
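  • For example, the order in which the disconnected regions are visited could be chosen with a simple nearest-neighbor heuristic, as in the following sketch (an assumption for illustration; the disclosure does not specify how the sequential route is computed).

        import math

        def patrol_order(region_centroids, start):
            """Order out-of-range regions as a nearest-neighbor tour.

            region_centroids: list of (x, y) centroids of the regions outside
            the cameras' imaging range; start: the robot's current position.
            """
            remaining = list(region_centroids)
            order, current = [], start
            while remaining:
                nxt = min(remaining, key=lambda c: math.dist(current, c))
                remaining.remove(nxt)
                order.append(nxt)
                current = nxt
            return order  # plan a route between each consecutive pair of regions
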
  • When there are a plurality of mobile robots 20, a different patrol route may be set for each mobile robot 20.
  • In this case, it is preferable that the patrol routes include mutually different regions outside the imaging range.
  • the conversation monitoring device 2000 executes a predetermined coping process (S104). Any process can be adopted as the coping process.
  • the coping process is a process of issuing a warning to the person group 40 (hereinafter referred to as a warning process).
  • For example, the warning process is a process of displaying a screen indicating a warning on a display device provided in the mobile robot 20, or of projecting an image indicating a warning from a projector provided in the mobile robot 20.
  • the warning process is a process of outputting a voice indicating a warning from a speaker provided in the mobile robot 20.
  • the mobile robot 20 may give a warning to the person group 40 after approaching it to some extent.
  • That is, the conversation monitoring device 2000 may move the mobile robot 20 to a position whose distance from the person group 40 is equal to or less than a predetermined threshold, and then cause the mobile robot 20 to output the various warnings described above.
  • the conversation monitoring device 2000 may send a notification indicating a warning to each person included in the person group 40.
  • For example, information in which the identification information of each person 10 is associated with a notification destination for that person (for example, an e-mail address) is stored in advance in a storage device accessible from the conversation monitoring device 2000.
  • The conversation monitoring device 2000 identifies the identification information of each person included in the person group 40 by using the video data 32, the video data 23, or the voice data 25, and sends the above-mentioned notification to the destination corresponding to that identification information.
  • the identification information of the person 10 is a feature amount on the image of the person 10 (for example, a feature amount of a face).
  • In this case, the conversation monitoring device 2000 extracts the feature amount of each person 10 included in the person group 40 from the video data 32 or the video data 23, and sends the notification to the destination associated with a matching feature amount (for example, a feature amount whose similarity is equal to or higher than a threshold).
  • When the voice data 25 is used, the feature amount of the voice of the person 10 is used as the identification information of the person 10.
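  • The lookup from an extracted feature amount to a notification destination can be sketched as follows; cosine similarity and the threshold value are illustrative assumptions.

        import numpy as np

        SIMILARITY_THRESHOLD = 0.8  # assumed threshold for a "matching" feature

        def find_notification_destination(query_feature, registry):
            """Return the destination whose stored feature best matches the query.

            registry: list of (stored_feature: np.ndarray, destination: str)
            pairs, i.e. the pre-registered identification info and destinations.
            """
            def cosine(a, b):
                return float(np.dot(a, b) /
                             (np.linalg.norm(a) * np.linalg.norm(b)))

            best_dest, best_sim = None, SIMILARITY_THRESHOLD
            for stored_feature, destination in registry:
                sim = cosine(query_feature, stored_feature)
                if sim >= best_sim:
                    best_dest, best_sim = destination, sim
            return best_dest  # None if no stored feature is similar enough
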
  • When it is detected that a conversation is taking place in the person group 40, the conversation monitoring device 2000 may issue a warning calling for attention not only to the person group 40 but also to other people.
  • For example, the conversation monitoring device 2000 causes a broadcasting device to perform a broadcast (an indoor broadcast, an in-house broadcast, an outdoor broadcast, or the like) encouraging people to avoid conversations at short distances, or causes it to play a predetermined warning sound.
  • the coping process is not limited to the warning process.
  • For example, the conversation monitoring device 2000 may store information about the person group 40 having a conversation (its identification information, and the video data 32 or the video data 23 in which the person group 40 is captured) in the storage device. In this way, for example, when one member of the person group 40 is found to have an infectious disease, the other persons included in the person group 40 can be identified as persons suspected of having the infectious disease.
  • the conversation monitoring device 2000 does not immediately perform the coping process even if it is detected that the conversation is being performed in the person group 40, and may perform the coping process only when the conversation continues for a predetermined time.
  • the conversation monitoring device 2000 may perform coping processing in multiple stages according to the duration of the conversation.
  • In this case, information associating each of a plurality of warning levels with a different warning process is stored in advance in a storage device accessible from the conversation monitoring device 2000. For example, a higher warning level is associated with a more prominent (that is, more effective) warning.
  • The conversation monitoring device 2000 gives a higher-level warning as the duration of the conversation increases. For example, the conversation monitoring device 2000 performs the first-level warning process of "moving to a position within a predetermined distance from the person group 40" when the duration of the conversation is P1 or longer. Next, when the duration of the conversation is P2 (> P1) or longer, it performs the second-level warning process of "displaying the warning screen on the display device or projecting the warning image onto the ground". Then, when the duration of the conversation is P3 (> P2) or longer, it performs the third-level warning process of "outputting the warning voice from the speaker".
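  • The escalation itself reduces to mapping the conversation duration onto the warning levels; the concrete values of P1, P2, and P3 below are assumptions for illustration.

        # Assumed duration thresholds (seconds) for the three warning levels.
        P1, P2, P3 = 30.0, 60.0, 120.0

        def warning_level(conversation_duration_sec):
            """Map conversation duration to an escalating warning level (0 = none)."""
            if conversation_duration_sec >= P3:
                return 3  # output the warning voice from the speaker
            if conversation_duration_sec >= P2:
                return 2  # display the warning screen / project the warning image
            if conversation_duration_sec >= P1:
                return 1  # move within a predetermined distance of the group
            return 0
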
  • As a policy for restricting conversations in the monitoring area, it is conceivable to adopt a policy that conversations at a short distance are permitted if appropriate measures are taken to prevent infectious diseases.
  • In this case, the specific condition is that "appropriate measures have been taken to prevent infectious diseases". More specific examples include the condition that "all the persons 10 included in the person group 40 are wearing masks" and the condition that "the persons 10 included in the person group 40 are separated from each other by a partition".
  • the timing for determining whether or not the above-mentioned specific conditions are satisfied may be either before or after the conversation determination is performed for the person group 40.
  • Alternatively, the above-mentioned warning level may be changed depending on whether or not the specific condition is satisfied. That is, the warning level used when the specific condition is satisfied is made lower than the warning level used when it is not satisfied. In this way, for example, a more modest warning can be given when appropriate measures against infection have been taken, and a more prominent warning can be given when they have not.
  • Non-transitory computer-readable media include various types of tangible storage media.
  • Examples of non-transitory computer-readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM, CD-R, CD-R/W, and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM).
  • The program may also be provided to the computer by various types of transitory computer-readable media. Examples of transitory computer-readable media include electric signals, optical signals, and electromagnetic waves.
  • A transitory computer-readable medium can supply the program to the computer via a wired communication path such as an electric wire or an optical fiber, or via a wireless communication path.
  • (Appendix 1) A conversation monitoring device comprising: a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, analyzes the first video data, and determines whether or not the plurality of persons are having a conversation; a movement control unit that, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moves a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and a second determination unit that, after the mobile robot has moved, analyzes second video data obtained from the first camera or voice data obtained from the microphone and determines whether or not the plurality of persons are having a conversation.
  • (Appendix 2) The conversation monitoring device according to Appendix 1, wherein the first determination unit determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the orientation of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, and determines that it cannot be determined whether or not the plurality of persons are having a conversation when there is a person among the plurality of persons for whom the movement of the mouth, the orientation of the face, or the direction of the line of sight cannot be identified.
  • (Appendix 3) The conversation monitoring device according to Appendix 1, wherein the first determination unit calculates, for the plurality of persons included in the first video data, both the probability that a conversation is taking place and the probability that no conversation is taking place, and determines that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are less than a threshold.
  • (Appendix 4) The conversation monitoring device according to any one of Appendices 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the orientation of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
• (Appendix 5) The conversation monitoring device according to any one of Appendices 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the audio data and the distance between the mobile robot and the plurality of persons at the time the audio data was obtained.
• (Appendix 6) The conversation monitoring device according to any one of Appendices 1 to 5, wherein the mobile robot is equipped with a projector; before being controlled by the movement control unit, the mobile robot moves through the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance; the conversation monitoring device detects the distance image and a plurality of persons from the second video data and, by comparing the distance between the detected persons with the size of the distance image, detects a plurality of persons located within the first predetermined distance of each other; and the first determination unit acquires, as the first video data, the second video data in which the plurality of persons have been detected.
• (Appendix 7) The conversation monitoring device according to any one of Appendices 1 to 6, wherein the movement control unit identifies the face direction or gaze direction of a person around the mobile robot by analyzing the second video data, and moves the mobile robot so that it does not pass through a region located in the identified face direction or gaze direction.
• (Appendix 8) The conversation monitoring device according to any one of Appendices 1 to 7, wherein a second camera that images part of the monitoring area is provided; the mobile robot moves through a region that cannot be imaged by the second camera before being controlled by the movement control unit; and the first video data is video data generated by the first camera or the second camera.
• (Appendix 9) The conversation monitoring device according to Appendix 8, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
• (Appendix 10) The conversation monitoring device according to any one of Appendices 1 to 9, wherein, when the first determination unit or the second determination unit determines that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, the predetermined coping process including one or more of: a process of bringing the mobile robot closer to the plurality of persons; a process of causing a display device or projector provided on the mobile robot to output information indicating a warning; and a process of causing a speaker provided on the mobile robot to output a warning sound.
• (Appendix 11) A control method performed by a computer, comprising: a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area have been detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation; a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a position within a second predetermined distance of the plurality of persons; and a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or audio data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
• (Appendix 13) The control method according to Appendix 11, wherein, in the first determination step, both the probability that a conversation is taking place and the probability that no conversation is taking place are calculated for the plurality of persons included in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are less than a threshold value.
• (Appendix 14) The control method according to any one of Appendices 11 to 13, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or gaze direction of each of the plurality of persons included in the second video data.
• (Appendix 15) The control method according to any one of Appendices 11 to 13, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the loudness of the voice represented by the audio data and the distance between the mobile robot and the plurality of persons at the time the audio data was obtained.
• (Appendix 16) The control method according to any one of Appendices 11 to 15, wherein the mobile robot is equipped with a projector; before being controlled by the movement control step, the mobile robot moves through the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance; a plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and, in the first determination step, the second video data in which the plurality of persons have been detected is acquired as the first video data.
• (Appendix 18) The control method according to any one of Appendices 11 to 17, wherein a second camera that images part of the monitoring area is provided; the mobile robot moves through a region that cannot be imaged by the second camera before being controlled by the movement control step; and the first video data is video data generated by the first camera or the second camera.
• (Appendix 19) The control method according to Appendix 18, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
• (Appendix 20) The control method according to any one of Appendices 11 to 19, wherein, when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, the predetermined coping process including one or more of: a process of bringing the mobile robot closer to the plurality of persons; a process of causing a display device or projector provided on the mobile robot to output information indicating a warning; and a process of causing a speaker provided on the mobile robot to output a warning sound.
• (Appendix 21) A computer-readable medium storing a program that causes a computer to execute: a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area have been detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation; a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a position within a second predetermined distance of the plurality of persons; and a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or audio data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
• (Appendix 23) The computer-readable medium according to Appendix 21, wherein, in the first determination step, both the probability that a conversation is taking place and the probability that no conversation is taking place are calculated for the plurality of persons included in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are less than a threshold value.
• (Appendix 24) The computer-readable medium according to any one of Appendices 21 to 23, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or gaze direction of each of the plurality of persons included in the second video data.
• (Appendix 25) The computer-readable medium according to any one of Appendices 21 to 23, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the loudness of the voice represented by the audio data and the distance between the mobile robot and the plurality of persons at the time the audio data was obtained.
• (Appendix 26) The computer-readable medium according to any one of Appendices 21 to 25, wherein the mobile robot is equipped with a projector; before being controlled by the movement control step, the mobile robot moves through the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance; a plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and, in the first determination step, the second video data in which the plurality of persons have been detected is acquired as the first video data.
• (Appendix 27) The computer-readable medium according to any one of Appendices 21 to 26, wherein, in the movement control step, the face direction or gaze direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so that it does not pass through a region located in the identified face direction or gaze direction.
• (Appendix 28) The computer-readable medium according to any one of Appendices 21 to 27, wherein a second camera that images part of the monitoring area is provided; the mobile robot moves through a region that cannot be imaged by the second camera before being controlled by the movement control step; and the first video data is video data generated by the first camera or the second camera.
• (Appendix 30) The computer-readable medium according to any one of Appendices 21 to 29, wherein, when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, the predetermined coping process including one or more of: a process of bringing the mobile robot closer to the plurality of persons; a process of causing a display device or projector provided on the mobile robot to output information indicating a warning; and a process of causing a speaker provided on the mobile robot to output a warning sound.


Abstract

A conversation monitoring device (2000) acquires video data (32) in which a plurality of persons (10) (group of persons (40)) positioned within a first predetermined distance of each other inside a monitored area have been detected, and analyzes the video data (32) to determine whether or not a conversation is taking place among the group of persons (40). If the existence of a conversation cannot be determined, the conversation monitoring device (2000) causes a mobile robot (20) provided with a camera (22) to move to a location where an image of the face of each person (10) included in the group of persons (40) can be captured, or causes the mobile robot (20) provided with a microphone (24) to move to a location within a second predetermined distance from the group of persons (40). The conversation monitoring device (2000) analyzes video data (23) obtained from the camera (22) or audio data (25) obtained from the microphone (24) to determine whether or not a conversation is taking place among the group of persons (40).

Description

Conversation monitoring device, control method, and computer-readable medium
The present invention relates to a technique for detecting conversations between a plurality of persons.
From the viewpoint of preventing the spread of infectious diseases, there are situations in which it is preferable to avoid long conversations held at short distances. Systems have therefore been developed to detect situations in which a conversation is being held at a short distance for a long time. For example, Patent Document 1 discloses a technique that uses images obtained from a camera installed in a facility to detect that a resident and a visitor have had a conversation for a predetermined time or longer, and notifies, in response to the detection, that the risk of contracting an infectious disease is high. In Patent Document 1, a state in which people face each other at a short distance is detected as a state of having a conversation.
[Patent Document 1] International Publication No. 2019/239813
The system of Patent Document 1 uses images obtained from a camera fixedly installed in the facility to detect that a resident and a visitor are facing each other. However, with a fixedly installed camera, the face of a resident or visitor may not be detectable from the camera's images, for example because the person is in the camera's blind spot. As a result, even if the resident and the visitor are having a conversation, this may go undetected.
The present invention has been made in view of the above problems, and one of its objects is to provide a technique for accurately detecting a situation in which a conversation is taking place.
A conversation monitoring device of the present disclosure includes: a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area have been detected, and analyzes the first video data to determine whether or not the plurality of persons are having a conversation; a movement control unit that, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moves a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a position within a second predetermined distance of the plurality of persons; and a second determination unit that, after the mobile robot has been moved, analyzes second video data obtained from the first camera or audio data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
A control method of the present disclosure is executed by a computer. The control method includes: a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area have been detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation; a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a position within a second predetermined distance of the plurality of persons; and a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or audio data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
A computer-readable medium of the present disclosure stores a program that causes a computer to execute the control method of the present disclosure.
According to the present invention, there is provided a technique for accurately detecting a situation in which a conversation is taking place.
FIG. 1 is a diagram illustrating an overview of the conversation monitoring device of the first embodiment.
FIG. 2 is a diagram illustrating the functional configuration of the conversation monitoring device.
FIG. 3 is a block diagram illustrating the hardware configuration of a computer that realizes the conversation monitoring device.
FIG. 4 is a block diagram illustrating the hardware configuration of the mobile robot.
FIG. 5 is a flowchart illustrating the flow of processing executed by the conversation monitoring device of the first embodiment.
Hereinafter, embodiments of the present disclosure are described in detail with reference to the drawings. In the drawings, identical or corresponding elements are denoted by the same reference numerals, and duplicate descriptions are omitted where appropriate for clarity.
FIG. 1 illustrates an overview of the conversation monitoring device of the first embodiment (the conversation monitoring device 2000 of FIG. 2, described later). The following description given with reference to FIG. 1 is intended to facilitate understanding of the conversation monitoring device 2000 of the first embodiment; the operation of the conversation monitoring device 2000 is not limited to what is described below.
The conversation monitoring device 2000 detects a situation in which a plurality of persons 10 are having a conversation in a predetermined monitoring area. The monitoring area can be any place, such as an office, and may also be outdoors.
First, the conversation monitoring device 2000 analyzes video data 32 generated by a camera 30 and attempts to determine whether or not a conversation is taking place among a plurality of persons 10 (hereinafter, conversation determination). Here, the camera 30 may be the camera 22 mounted on the mobile robot 20 described later, or a camera other than the camera 22 (for example, a surveillance camera installed on a wall or ceiling). In the former case, the camera 22 mounted on the mobile robot 20 generates both the video data 32 and the video data 23 described later. Hereinafter, a set of a plurality of persons 10 subject to the determination of whether or not a conversation is taking place is referred to as a person group 40. The video data 32 may include a plurality of person groups 40.
Here, in the video data 32, the plurality of persons 10 included in the person group 40 are located within a predetermined distance L1 of each other. In other words, video data that includes a plurality of persons 10 whose distance from each other is the predetermined distance L1 or less is treated as the video data 32. When the person group 40 includes three or more persons 10, "the plurality of persons 10 are within the predetermined distance L1 of each other" means, for example, that all the persons 10 fit inside a circle of diameter L1.
Any value can be adopted as the predetermined distance L1. For example, the predetermined distance L1 is determined based on the distance that should be kept between people to prevent infection (so-called social distance). Specifically, a value defined as a social distance, or that value plus a predetermined margin, can be used as the predetermined distance L1.
When the conversation monitoring device 2000 cannot determine from the video data 32 whether or not a conversation is taking place in the person group 40, it uses the mobile robot 20 to make a further conversation determination for the person group 40. Specifically, the conversation determination using the video data 32 yields one of three results: 1) a conversation is taking place in the person group 40; 2) no conversation is taking place in the person group 40; or 3) it cannot be determined whether or not a conversation is taking place in the person group 40 (for example, neither the probability that a conversation is taking place nor the probability that no conversation is taking place is sufficiently high). When the result is 3), the conversation monitoring device 2000 makes a further conversation determination using the mobile robot 20.
The conversation monitoring device 2000 uses video data 23 obtained from the camera 22 mounted on the mobile robot 20, or audio data 25 obtained from the microphone 24 mounted on the mobile robot 20. Either one or both of the camera 22 and the microphone 24 may be provided. The conversation monitoring device 2000 moves the mobile robot 20 to an appropriate position (hereinafter, the destination) so that it can determine whether or not a conversation is taking place in the person group 40.
For example, suppose the video data 23 is used for the conversation determination. In this case, the conversation monitoring device 2000 moves the mobile robot 20 to a position where the face of each person 10 included in the person group 40 can be imaged, and makes a conversation determination for the person group 40 using the video data 23 obtained after the movement.
On the other hand, suppose the audio data 25 is used for the conversation determination. In this case, the conversation monitoring device 2000 moves the mobile robot 20 to a position at a predetermined distance L2 from the person group 40, and makes a conversation determination for the person group 40 using the audio data 25 obtained after the movement.
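As a reference for the audio-based route, the sketch below shows one plausible way such a judgment could use the loudness of the audio data 25 together with the robot-to-group distance (as recited in Appendix 5). The free-field attenuation model, the 55 dB threshold, the 1 m reference, and all names are assumptions for illustration, not the patent's specification.

```python
import math

# A minimal sketch, assuming sound pressure level falls off by
# 20*log10 of the distance ratio (free-field model). The threshold for
# conversational speech and the reference distance are illustrative only.
def is_conversation_by_audio(measured_db: float,
                             robot_to_group_m: float,
                             speech_threshold_db: float = 55.0,
                             reference_m: float = 1.0) -> bool:
    # Estimate the level at the reference distance from the person group,
    # then compare it with a typical conversational speech level.
    estimated_db = measured_db + 20.0 * math.log10(
        max(robot_to_group_m, reference_m) / reference_m)
    return estimated_db >= speech_threshold_db
```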
When it is determined that a conversation is taking place in the person group 40, the conversation monitoring device 2000 performs a predetermined coping process (for example, a warning process).
<Example of Advantageous Effects>
The conversation monitoring device 2000 of the present embodiment uses the mobile robot 20 when it cannot be determined, even from the video data 32 in which a plurality of persons 10 within the predetermined distance L1 of each other (the person group 40) have been detected, whether or not these persons 10 are having a conversation. Specifically, the conversation monitoring device 2000 moves the mobile robot 20 to an appropriate position and uses the video data 23 or audio data 25 obtained from the mobile robot 20 to make a conversation determination for the person group 40. Thus, according to the conversation monitoring device 2000 of the present embodiment, when it cannot be determined whether or not a plurality of persons 10 located within a specific distance of each other are having a conversation, the mobile robot 20 can be controlled to obtain video data 23 or audio data 25 that enables the determination. It is therefore possible to determine with higher accuracy whether or not a plurality of persons 10 located within a specific distance are having a conversation.
Hereinafter, the conversation monitoring device 2000 of the present embodiment is described in more detail.
<Example of Functional Configuration>
FIG. 2 illustrates the functional configuration of the conversation monitoring device 2000. The conversation monitoring device 2000 has a first determination unit 2020, a movement control unit 2040, and a second determination unit 2060. The first determination unit 2020 analyzes the video data 32 to determine whether or not a conversation is taking place in the person group 40. When the determination using the video data 32 cannot establish whether or not a conversation is taking place in the person group 40, the movement control unit 2040 moves the mobile robot 20 provided with the camera 22 to a position where the face of each person 10 included in the person group 40 can be imaged, or moves the mobile robot 20 provided with the microphone 24 to a position at the predetermined distance L2 or less from the person group 40. After the mobile robot 20 has moved, the second determination unit 2060 analyzes the video data 23 obtained from the camera 22 or the audio data 25 obtained from the microphone 24 to determine whether or not a conversation is taking place in the person group 40.
<Example of Hardware Configuration>
Each functional component of the conversation monitoring device 2000 may be realized by hardware that implements that component (for example, a hard-wired electronic circuit) or by a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls it). The following further describes the case where each functional component of the conversation monitoring device 2000 is realized by a combination of hardware and software.
FIG. 3 is a block diagram illustrating the hardware configuration of a computer 500 that realizes the conversation monitoring device 2000. The computer 500 is any computer. For example, the computer 500 is a stationary computer such as a PC (Personal Computer) or a server machine. Alternatively, the computer 500 may be a portable computer such as a smartphone or a tablet terminal, or the controller built into the mobile robot 20 (the controller 600 described later). In the latter case, the conversation monitoring device 2000 is realized as the mobile robot 20 (that is, the mobile robot 20 also functions as the conversation monitoring device 2000). The computer 500 may be a dedicated computer designed to realize the conversation monitoring device 2000, or a general-purpose computer.
For example, by installing a predetermined application on the computer 500, each function of the conversation monitoring device 2000 is realized on the computer 500. The application consists of a program for realizing the functional components of the conversation monitoring device 2000.
The computer 500 has a bus 502, a processor 504, a memory 506, a storage device 508, an input/output interface 510, and a network interface 512. The bus 502 is a data transmission path over which the processor 504, the memory 506, the storage device 508, the input/output interface 510, and the network interface 512 exchange data. However, the method of connecting the processor 504 and the other components to one another is not limited to a bus connection.
The processor 504 is any of various processors such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (Field-Programmable Gate Array). The memory 506 is a main storage device realized using a RAM (Random Access Memory) or the like. The storage device 508 is an auxiliary storage device realized using a hard disk, an SSD (Solid State Drive), a memory card, a ROM (Read Only Memory), or the like.
The input/output interface 510 is an interface for connecting the computer 500 to input/output devices. For example, an input device such as a keyboard and an output device such as a display device are connected to the input/output interface 510.
The network interface 512 is an interface for connecting the computer 500 to a wireless network. This network may be a LAN (Local Area Network) or a WAN (Wide Area Network). For example, the computer 500 is communicably connected to the mobile robot 20 via the network interface 512 and the wireless network.
The storage device 508 stores the program that realizes each functional component of the conversation monitoring device 2000 (the program that realizes the aforementioned application). The processor 504 reads this program into the memory 506 and executes it, thereby realizing each functional component of the conversation monitoring device 2000.
The conversation monitoring device 2000 may be realized by one computer 500 or by a plurality of computers 500. In the latter case, the configurations of the computers 500 need not be identical and can differ from one another.
<Example of Hardware Configuration of the Mobile Robot 20>
FIG. 4 is a block diagram illustrating the hardware configuration of the mobile robot 20. The mobile robot 20 has a camera 22, a microphone 24, an actuator 26, moving means 27, and a controller 600. The mobile robot 20 moves by operating the moving means 27 according to the output of the actuator 26. For example, the moving means 27 is a means for traveling, such as wheels; in this case the mobile robot 20 travels through the monitoring area. Alternatively, the moving means 27 may be a means for flight, such as a propeller; in this case the mobile robot 20 flies through the monitoring area. The output of the actuator 26 is controlled by the controller 600.
The controller 600 is any computer and is realized by, for example, an integrated circuit such as an SoC (System on a Chip) or SiP (System in a Package). Alternatively, the controller 600 may be realized by a mobile terminal such as a smartphone. The controller 600 has a bus 602, a processor 604, a memory 606, a storage device 608, an input/output interface 610, and a network interface 612, which have the same functions as the bus 502, the processor 504, the memory 506, the storage device 508, the input/output interface 510, and the network interface 512, respectively.
<Flow of Processing>
FIG. 5 is a flowchart illustrating the flow of processing executed by the conversation monitoring device 2000 of the first embodiment. The first determination unit 2020 acquires the video data 32 including the person group 40 and analyzes it to make a conversation determination for the person group 40 (S102). When it is determined that a conversation is taking place in the person group 40 (S102: conversation), the conversation monitoring device 2000 executes a predetermined coping process (S104). When it is determined that no conversation is taking place in the person group 40 (S102: no conversation), the processing of FIG. 5 ends.
When it is determined that it cannot be established whether or not a conversation is taking place in the person group 40 (S102: indeterminable), loop processing A is executed. Loop processing A consists of S106 to S112. The movement control unit 2040 moves the mobile robot 20 (S108). The second determination unit 2060 makes a conversation determination using the video data 23 or audio data 25 obtained from the mobile robot 20 after the movement (S110).
When it is determined that a conversation is taking place in the person group 40 (S110: conversation), the conversation monitoring device 2000 executes the coping process (S104). When it is determined that no conversation is taking place in the person group 40 (S110: no conversation), the processing of FIG. 5 ends. When it still cannot be determined whether or not a conversation is taking place in the person group 40 (S110: indeterminable), the processing returns to step S106 and loop processing A is executed again. In this way, the conversation determination is repeated while moving the mobile robot 20 until the presence or absence of a conversation can be established.
In the flowchart of FIG. 5, loop processing A continues to be executed as long as the presence or absence of a conversation in the person group 40 cannot be established. However, even when the presence or absence of a conversation cannot be established, loop processing A and the processing of FIG. 5 may be terminated once a predetermined termination condition is satisfied. For example, in S106 the conversation monitoring device 2000 determines whether or not the predetermined termination condition is satisfied; if it is satisfied, loop processing A ends and the processing of FIG. 5 ends, and if it is not, loop processing A continues.
Any condition can be adopted as the predetermined termination condition. For example, the predetermined termination condition is that a predetermined time has elapsed since the conversation determination (S102) was first made for the person group 40. Alternatively, the predetermined termination condition may be that the distance between the persons 10 included in the person group 40 has become larger than the predetermined distance L1. In this way, a plurality of persons 10 whose distance from each other has become larger than the predetermined distance can be excluded from the targets of conversation determination.
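A minimal sketch of loop processing A, including the two example termination conditions above, might look as follows; the 60-second timeout, the helper methods, and the string verdicts are assumptions for illustration.

```python
import time

def loop_a(second_determiner, robot, group40, l1_m: float, timeout_s: float = 60.0):
    """Repeat S106-S112 until the presence or absence of a conversation is
    established or a termination condition is satisfied."""
    start = time.monotonic()
    while True:
        # S106: example termination conditions described above.
        if time.monotonic() - start > timeout_s:
            return None                              # give up: still unknown
        if group40.max_pairwise_distance() > l1_m:
            return None                              # persons moved apart
        robot.move_toward(group40)                   # S108
        verdict = second_determiner.judge(group40)   # S110: video 23 or audio 25
        if verdict in ("conversation", "no conversation"):
            return verdict                           # caller handles S104
```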
<Regarding the Video Data 32>
The video data 32 is video data generated by the camera 30 in which a person group 40 (a plurality of persons 10 whose distance from each other is the predetermined distance L1 or less) has been detected. Here, a device that performs the process of detecting the person group 40 from video data is called a person group detection device. The person group detection device may be the conversation monitoring device 2000 (that is, the conversation monitoring device 2000 may also function as the person group detection device) or a device other than the conversation monitoring device 2000.
The person group detection device detects a plurality of persons 10 from the video data generated by the camera 30, and detects these persons 10 as a person group 40 by determining that the distance between them is the predetermined distance L1 or less. A plurality of cameras 30 may be provided.
There are various methods for determining that the distance between persons 10 is the predetermined distance L1 or less. For example, the person group detection device analyzes the video data generated by the camera 30 and detects a plurality of persons 10 from it. When a plurality of persons 10 are detected, the person group detection device controls a projector to project an image representing a specific distance (hereinafter, a distance image) onto the ground. The distance image is projected at a position where both the detected persons 10 and the distance image can be included in the imaging range of the camera 30. The distance represented by the distance image is, for example, the predetermined distance L1 described above. The projector may be mounted on the mobile robot 20 or installed elsewhere (for example, on the ceiling).
The person group detection device detects the plurality of persons 10 and the distance image from the video data generated by the camera 30, and compares the distance between the persons 10 with the size of the distance image (that is, the predetermined distance L1 in the image). When the distance between the persons 10 is smaller than the size of the distance image, the person group detection device detects these persons 10 as a person group 40. The person group detection device then provides the conversation monitoring device 2000, as the video data 32, with the video data in which the person group 40 was detected and the video data generated by the camera 30 thereafter.
The method of determining that the distance between persons 10 is the predetermined distance L1 or less is not limited to the above; other existing techniques may be used.
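For illustration, the comparison itself can be as simple as the following sketch, which assumes the persons' positions and the projected distance image have already been detected as pixel coordinates and a pixel length in the same frame; because the distance image is projected near the persons, its pixel size approximates the predetermined distance L1 at that location. All names are hypothetical.

```python
from itertools import combinations

def detect_group(person_centers, distance_image_px: float):
    """Return indices of persons whose pairwise image distance is smaller
    than the projected image of the predetermined distance L1."""
    group = set()
    for (i, a), (j, b) in combinations(enumerate(person_centers), 2):
        d = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
        if d < distance_image_px:
            group.update((i, j))
    return sorted(group)

# Example: two persons 80 px apart, one far away; L1 projects to 120 px.
print(detect_group([(100, 200), (180, 200), (900, 700)], 120.0))  # [0, 1]
```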
In addition to the video data 32, the conversation monitoring device 2000 is preferably provided with information that can identify the position of the person group 40, for example information representing the position of the person group 40 in map data representing a map of the monitoring area. An existing technique can be used to identify the position, in the map data, of an object imaged by a camera.
<Determination by the First Determination Unit 2020: S102>
The first determination unit 2020 makes a conversation determination for the person group 40 included in the video data 32 (S102). Specifically, the first determination unit 2020 makes the conversation determination based on the state of the face of each person 10 included in the person group 40. Methods of conversation determination using video data are illustrated below.
<<Determination Based on Mouth Movement>>
For example, the first determination unit 2020 determines the presence or absence of a conversation by determining whether or not each person 10 included in the person group 40 is moving their mouth. For example, if even one of the persons 10 included in the person group 40 is moving their mouth, the first determination unit 2020 determines that all the persons 10 in the person group 40 are having a conversation (that is, that a conversation is taking place in the person group 40). If none of the persons in the person group 40 is moving their mouth, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Furthermore, if no person 10 moving their mouth is detected and a person 10 for whom it cannot be determined whether they are moving their mouth is detected, the first determination unit 2020 determines that the presence or absence of a conversation cannot be established.
The first determination unit 2020 may instead determine that a conversation is being held only by those persons 10 in the person group 40 who are moving their mouths. In this case, the first determination unit 2020 excludes from the person group 40 any person 10 determined not to be moving their mouth. If no person 10 for whom mouth movement cannot be determined is detected from the video data 32 and two or more persons 10 moving their mouths are detected, the first determination unit 2020 excludes the persons 10 not moving their mouths from the person group 40 and determines that a conversation is taking place in the person group 40. If no such indeterminable person 10 is detected and at most one person 10 is moving their mouth, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Furthermore, if a person 10 for whom mouth movement cannot be determined is detected from the video data 32, the first determination unit 2020 excludes the persons 10 determined not to be moving their mouths from the person group 40 and determines that the presence or absence of a conversation cannot be established.
Here, a person 10 for whom it cannot be determined whether they are moving their mouth is, for example, a person whose mouth is not captured in the video data 32 (for example, because they have their back to the camera generating the video data 32 or their mouth is occluded by an obstacle), or a person for whom the confidence of the mouth-movement determination is low.
For example, the first determination unit 2020 is configured to calculate, from time-series data of the image region representing a person's mouth and its surroundings, both the probability that the person is moving their mouth and the probability that they are not. Based on these probabilities, the first determination unit 2020 identifies the person's state as one of: 1) moving their mouth; 2) not moving their mouth; or 3) indeterminable.
For example, a predetermined value T1 is set in advance as a threshold for these probabilities. If the probability that the person is moving their mouth is T1 or more, the person is determined to be moving their mouth. If the probability that the person is not moving their mouth is T1 or more, the person is determined not to be moving their mouth. If both probabilities are less than the threshold T1, it is determined that whether or not the person is moving their mouth cannot be established. For example, with the threshold T1 = 0.7, if "probability of moving the mouth = 0.6, probability of not moving the mouth = 0.4" is calculated, the result is "it cannot be determined whether or not the mouth is moving".
<<Determination Based on Face or Gaze Direction>>
For example, the first determination unit 2020 determines the presence or absence of a conversation based on the face or gaze direction of each person 10 included in the person group 40. The case using face direction is described more specifically below; unless otherwise noted, the description of the case using gaze direction is obtained by replacing "face" with "gaze" in the following.
For example, when the face of each person 10 in the person group 40 is directed toward some other person 10 in the person group 40, the first determination unit 2020 determines that all the persons 10 in the person group 40 are having a conversation (that is, that a conversation is taking place in the person group 40). When none of the faces of the persons in the person group 40 is directed toward another person 10 in the group, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Furthermore, when a person 10 whose face direction cannot be determined is detected in the person group 40, the first determination unit 2020 determines that the presence or absence of a conversation cannot be established.
The first determination unit 2020 may instead determine that a conversation is being held only by those persons 10 whose faces are directed toward another person 10 in the person group 40. In this case, the first determination unit 2020 excludes from the person group 40 any person 10 determined not to be facing another person 10 in the group. If no person 10 whose face direction cannot be determined is detected from the video data 32 and two or more persons 10 facing another person 10 are detected, the first determination unit 2020 excludes the persons 10 not facing another person 10 from the person group 40 and determines that a conversation is taking place in the person group 40. If no such person is detected and at most one person 10 is facing another person 10, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Furthermore, if a person 10 whose face direction cannot be determined is detected from the video data 32, the first determination unit 2020 excludes the persons 10 not facing another person 10 from the person group 40 and determines that the presence or absence of a conversation cannot be established.
A person 10 whose face direction cannot be determined is, for example, a person whose face is not captured in the video data 32 (for example, because they have their back to the camera generating the video data 32 or their face is occluded by an obstacle), or a person for whom the confidence of the face-direction determination is low. When gaze direction is used instead of face direction, the eye region is analyzed instead of the face region.
For example, the first determination unit 2020 is configured to calculate, from time-series data of the image region representing a person's face, the probability that the face is oriented in each of a plurality of directions (for example, four or eight predetermined directions). If there is a direction whose calculated probability is equal to or greater than a threshold, the first determination unit 2020 identifies that direction as the face direction of the person 10. If the probabilities calculated for all directions are less than the threshold, the first determination unit 2020 determines that the face direction of the person 10 cannot be determined.
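The same threshold pattern applies per direction; the sketch below assumes a classifier that returns one probability per predetermined direction, and returns None when no direction reaches the threshold (face direction indeterminable).

```python
def face_direction(probs_by_direction, threshold: float = 0.7):
    """Pick the most probable of the predetermined directions, or None."""
    direction, p = max(probs_by_direction.items(), key=lambda kv: kv[1])
    return direction if p >= threshold else None

# Example with four predetermined directions:
print(face_direction({"north": 0.10, "east": 0.75, "south": 0.10, "west": 0.05}))  # east
print(face_direction({"north": 0.30, "east": 0.30, "south": 0.25, "west": 0.15}))  # None
```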
<<Method Using a Model>>
 The first determination unit 2020 may have a trained model that identifies, in response to input of video data containing the faces of a plurality of persons 10, whether or not those persons 10 are having a conversation. In response to such input, the model outputs, for example, one of three determination results for the plurality of persons 10: 1) a conversation is being held, 2) no conversation is being held, or 3) it cannot be determined whether a conversation is being held. Such a model can be realized by, for example, a recurrent neural network (RNN).
 For example, the model calculates both the probability that a conversation is being held and the probability that no conversation is being held, and compares these probabilities with a threshold value. If the probability that a conversation is being held is equal to or greater than the threshold value, the determination result that a conversation is being held is output. If the probability that no conversation is being held is equal to or greater than the threshold value, the determination result that no conversation is being held is output. If both probabilities are below the threshold value, the determination result that it cannot be determined whether a conversation is being held is output.
 The above model is trained in advance using training data composed of pairs of video data and a ground-truth label (a label indicating whether or not a conversation is being held). Existing techniques can be used to train a model with training data composed of pairs of input data and ground-truth labels.
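 A minimal sketch of the three-way decision rule described above is shown below (Python); the model itself is abstracted to a pair of probabilities, the threshold value is an illustrative assumption, and resolving the case where both probabilities clear the threshold in favor of the "conversation" verdict is an arbitrary choice not fixed by the specification.

```python
from enum import Enum

class Verdict(Enum):
    TALKING = 1          # 1) a conversation is being held
    NOT_TALKING = 2      # 2) no conversation is being held
    UNDETERMINABLE = 3   # 3) presence of a conversation cannot be determined

def classify_conversation(p_talking: float, p_not_talking: float,
                          threshold: float = 0.6) -> Verdict:
    """Map the model's two probabilities to one of the three verdicts."""
    if p_talking >= threshold:
        return Verdict.TALKING
    if p_not_talking >= threshold:
        return Verdict.NOT_TALKING
    return Verdict.UNDETERMINABLE  # both probabilities below the threshold
```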
<Control of the Mobile Robot 20 and Determination by the Second Determination Unit 2060: S108, S110>
 The movement control unit 2040 moves the mobile robot 20 toward a position where video data 23 or voice data 25 from which the presence or absence of a conversation in the person group 40 can be determined is obtained (S108). The second determination unit 2060 then performs the conversation determination for the person group 40 using the video data 23 or voice data 25 obtained from the mobile robot 20 during or after the movement. The method of controlling the movement of the mobile robot 20 and the method of the conversation determination are described below, separately for the case where the video data 23 is used and the case where the voice data 25 is used.
<<Case Using the Video Data 23>>
 The method of the conversation determination using the video data 23 is the same as the conversation determination using the video data 32 described above. Therefore, the conversation determination using the video data 23 uses the mouth movement, face direction, or line-of-sight direction of each person 10. The movement control unit 2040 accordingly moves the mobile robot 20 toward a position where the information necessary to identify the mouth movement, face direction, or line-of-sight direction of each person 10 in the person group 40 can be obtained. The information necessary to identify the mouth movement, the face direction, and the line-of-sight direction is, respectively, an image region containing the mouth, an image region containing the face, and an image region containing the eyes.
 For example, the movement control unit 2040 moves the mobile robot 20 so as to approach the person group 40. As another example, the movement control unit 2040 moves the mobile robot 20 to a position where there is no obstacle between the mobile robot 20 and a person 10 in the person group 40. Existing techniques can be used to move a mobile robot toward a specific object contained in the video data obtained from its on-board camera, or to a position where no obstacle lies between the robot and that object.
 In order for the mouth and eyes of a person 10 to be captured in the video data 23, it is preferable to move the mobile robot 20 to a position in front of the face of the person 10. For example, the movement control unit 2040 calculates the face direction of each of the plurality of persons 10 in the person group 40 and moves the mobile robot 20 to the front of the face of each person 10 in turn. In this way, the movement control unit 2040 identifies the mouth movement and line-of-sight direction of each person 10 one after another.
 As another example, the movement control unit 2040 may move the mobile robot 20 so that the mouths and eyes of the plurality of persons 10 can be imaged from a single location. For example, the movement control unit 2040 calculates the average of the face directions of the persons 10 from the video data 32 or the video data 23, and moves the mobile robot 20 to a position along that average direction.
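 The placement along the average facing direction can be sketched as follows (Python); representing each face direction as an angle in the floor plane and the standoff distance of 1.5 m are illustrative assumptions.

```python
import math

def average_direction(angles_rad):
    """Average a set of facing angles by summing their unit vectors."""
    x = sum(math.cos(a) for a in angles_rad)
    y = sum(math.sin(a) for a in angles_rad)
    return math.atan2(y, x)

def camera_goal(group_center, angles_rad, standoff=1.5):
    """A position on the mean facing direction, `standoff` meters from the group."""
    mean = average_direction(angles_rad)
    cx, cy = group_center
    return (cx + standoff * math.cos(mean), cy + standoff * math.sin(mean))
```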
 Suppose that, even after the mobile robot 20 has been moved to the front of the face of a person 10 in this way, the face direction of the person 10 cannot be identified from the video data 23. In this case, the second determination unit 2060 attempts to identify the face direction of the person 10 from the video data 23 while the movement control unit 2040 moves the mobile robot 20 closer to the person group 40 or around the person group 40. Once the face direction of the person 10 is identified, the movement control unit 2040 moves the mobile robot 20 to the front of the face of the person 10.
<<Case Using the Voice Data 25>>
 When the voice data 25 is used, the second determination unit 2060 performs the conversation determination for the person group 40 based on the relationship between the loudness of the sound contained in the voice data 25 and the distance to the person group 40. Even if a conversation is being held in the person group 40, it is difficult for the microphone 24 to pick up the voice of that conversation when the mobile robot 20 is far from the person group 40. The movement control unit 2040 therefore moves the mobile robot 20 to a position whose distance from the person group 40 is equal to or less than a predetermined distance L3. The predetermined distance L3 is set in advance as a distance at which the microphone 24 can detect the voice of a conversation held in the person group 40. The second determination unit 2060 then acquires the voice data 25 from the microphone 24 of the mobile robot 20 that has moved to within the predetermined distance L3 of the person group 40, and determines whether the loudness of the sound represented by the voice data 25 is equal to or greater than a threshold value. If the loudness is equal to or greater than the threshold value, the second determination unit 2060 determines that a conversation is being held in the person group 40. On the other hand, if the loudness is below the threshold value, the second determination unit 2060 determines that no conversation is being held in the person group 40.
 The above threshold value may be a fixed value, or may be set dynamically according to the distance from the mobile robot 20 to the person group 40. In the latter case, for example, a function defining the relationship between distance and threshold value is determined in advance. The second determination unit 2060 identifies the distance from the mobile robot 20 to the person group 40 at the time the voice data 25 was obtained from the microphone 24, obtains the threshold value by inputting that distance into the function, and compares the loudness of the sound represented by the voice data 25 with the obtained threshold value.
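 The loudness test with either a fixed or a distance-dependent threshold can be sketched as follows (Python); the linear attenuation model and all numeric values are illustrative assumptions, since the specification leaves the function relating distance to threshold unspecified.

```python
def loudness_threshold(distance_m: float, base_db: float = 60.0,
                       per_meter_db: float = 3.0) -> float:
    """Assumed function relating distance to the required loudness (dB):
    the farther the robot is, the quieter a real conversation appears."""
    return max(base_db - per_meter_db * distance_m, 30.0)

def is_conversing(volume_db: float, distance_m: float,
                  dynamic: bool = True, fixed_db: float = 50.0) -> bool:
    """Compare the measured loudness with a fixed or distance-dependent threshold."""
    threshold = loudness_threshold(distance_m) if dynamic else fixed_db
    return volume_db >= threshold
```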
 The second determination unit 2060 may also analyze the voice data 25 to determine whether it contains a human voice. In this case, the second determination unit 2060 determines that a conversation is being held in the person group 40 when the loudness of the sound represented by the voice data 25 is equal to or greater than the threshold value and the sound contains a human voice. On the other hand, when the loudness is below the threshold value or the sound contains no human voice, it determines that no conversation is being held in the person group 40. This prevents, for example, a situation in which a sound other than a human voice is occurring from being falsely detected as a situation in which the person group 40 is having a conversation.
 The second determination unit 2060 may also take into account the number of people whose voices are contained in the voice data 25. For example, the second determination unit 2060 determines that a conversation is being held in the person group 40 when the loudness of the sound represented by the voice data 25 is equal to or greater than the threshold value and the sound contains the voices of a plurality of persons. On the other hand, when the loudness is below the threshold value or the sound contains the voice of at most one person, it determines that no conversation is being held in the person group 40. This prevents, for example, a situation in which a single person is talking to himself or herself from being falsely detected as a situation in which the person group 40 is having a conversation.
 The second determination unit 2060 may also determine that the presence or absence of a conversation cannot be determined when the confidence of the determination as to whether the voice data 25 contains a human voice, or the confidence of the calculated number of people whose voices are contained in the voice data 25, is low. For example, it is determined that the presence or absence of a conversation cannot be determined when both the probability that the voice data 25 contains a human voice and the probability that it does not are below a predetermined threshold value.
 Furthermore, the second determination unit 2060 may have a trained model that identifies, in response to input of voice data, whether or not the voice data contains the voices of a plurality of persons 10 having a conversation. In response to such input, the model outputs, for example, one of three determination results: 1) a conversation is being held, 2) no conversation is being held, or 3) it cannot be determined whether a conversation is being held. Such a model can be realized by, for example, a recurrent neural network (RNN).
 For example, the model calculates both the probability that a conversation is being held and the probability that no conversation is being held, and compares these probabilities with a threshold value. If the probability that a conversation is being held is equal to or greater than the threshold value, the determination result that a conversation is being held is output. If the probability that no conversation is being held is equal to or greater than the threshold value, the determination result that no conversation is being held is output. If both probabilities are below the threshold value, the determination result that it cannot be determined whether a conversation is being held is output.
 The above model is trained in advance using training data composed of pairs of voice data and a ground-truth label (a label indicating whether or not a conversation is being held).
<<Calculation of the Movement Route>>
 In order to move the mobile robot 20 to a specific destination, a movement route to that destination is set using map data that the mobile robot 20 can refer to. Here, a device that calculates a movement route to a destination using map data and sets the calculated movement route in the mobile robot 20 is called a route setting device. The route setting device may be the mobile robot 20 itself, the conversation monitoring device 2000, or some other device.
 The route setting device acquires map data and calculates the movement route of the mobile robot 20 based on the map data and the destination (the position to which the mobile robot 20 should be moved) determined by the various methods described above. The route setting device then sets the calculated movement route in the mobile robot 20, and the mobile robot 20 moves according to the set movement route. When the route setting device is a device other than the conversation monitoring device 2000, the movement control unit 2040 provides the route setting device with information indicating the destination to be set in the mobile robot 20.
 Existing techniques can be used to calculate a movement route based on map data and destination information.
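 As one illustration of such an existing technique, a breadth-first search over an occupancy-grid map finds a shortest route in grid steps; the sketch below is Python, the grid encoding (0 = free cell, 1 = obstacle) is an assumption, and a practical planner would typically use A* or a comparable algorithm instead.

```python
from collections import deque

def plan_route(grid, start, goal):
    """Return a list of grid cells from start to goal, or None if unreachable.
    grid: 2D list where 0 marks a free cell and 1 an obstacle (assumed encoding)."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back along the predecessor chain
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None  # the goal is not reachable on this map
```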
<<Other Matters Concerning Movement Control>>
 The mobile robot 20 preferably moves so as not to interfere with the activities of people in the monitoring area. For example, the mobile robot 20 uses the video data 32 or the video data 23 to track the movement of each person in the monitoring area and moves so as not to come into contact with any of them. Existing techniques (for example, techniques for moving an autonomous vehicle so that it does not collide with other vehicles or pedestrians) can be adopted to move the mobile robot 20 while avoiding contact with people.
 As another example, the mobile robot 20 preferably moves so as not to enter the field of view of persons not included in the person group 40. For example, when a person 10 not included in the person group 40 is detected from the video data 23, the route setting device identifies the face direction or line-of-sight direction of that person 10. Based on the identified face direction or line-of-sight direction and the destination of the mobile robot 20, the route setting device then calculates a movement route by which the mobile robot 20 can reach the destination without entering the field of view of the person 10, and sets that movement route in the mobile robot 20.
 However, when the face direction or line-of-sight direction of a person 10 changes greatly and repeatedly, it may be difficult to move the mobile robot 20 so as to stay out of that person's field of view. For example, the route setting device may therefore detect from the video data only persons whose face direction or line-of-sight direction is unlikely to change greatly (for example, persons standing still or sitting in a chair), and set the movement route of the mobile robot 20 so as to stay out of the fields of view of the detected persons.
 The mobile robot 20 may be stationary or moving until it receives control from the movement control unit 2040. In the latter case, for example, a movement route is set for the mobile robot 20 so that it patrols part or all of the monitoring area. In particular, when the person group 40 is detected using the video data 23 (that is, when the camera 22 is also used as the camera 30), it is preferable to have the mobile robot 20 patrol the monitoring area so that the person group 40 can be detected at various places within it. Hereinafter, the movement route set in the mobile robot 20 for patrolling is also referred to as a patrol route.
 The patrol route preferably includes regions of the monitoring area where the density of people is high (that is, where there are many people). For example, the patrol route may be set to include only the regions of the monitoring area with a high density of people. As another example, the patrol route may be set so that regions with a high density of people are patrolled more frequently than regions with a low density of people.
 When the camera 30 is a camera fixed at a place other than the mobile robot 20, such as a surveillance camera, the patrol route of the mobile robot 20 preferably includes regions not covered by the imaging range of the camera 30 (hereinafter, out-of-range regions). In this way, the mobile robot 20 can image regions that are difficult for the fixed camera to capture, so the monitoring area can be monitored broadly.
 The patrol route may be set manually or automatically by the route setting device. In the latter case, for example, the route setting device identifies the out-of-range regions of the camera 30 by analyzing the video data generated by the camera 30, and generates a patrol route that includes those out-of-range regions. More specifically, the route setting device uses the map data of the monitoring area and the video data generated by the camera 30 to identify the region within the imaging range of the camera 30, and identifies the regions other than that region as the out-of-range regions.
 For example, suppose the out-of-range region is a single closed region. In this case, the route setting device generates a patrol route that circulates within that region. On the other hand, suppose the out-of-range regions are a plurality of regions that are not connected to each other. In this case, for example, the route setting device generates a patrol route that visits these out-of-range regions in turn, as sketched below. When a plurality of mobile robots 20 are provided in the monitoring area, a different patrol route may be set for each mobile robot 20. In that case, it is preferable that the patrol routes include mutually different out-of-range regions.
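 The ordering of disconnected out-of-range regions can be sketched with a greedy nearest-neighbor tour (Python); treating each region as a single center point is an illustrative simplification.

```python
import math

def patrol_order(start, region_centers):
    """Visit each out-of-range region once, always moving to the nearest
    unvisited one (greedy nearest-neighbor heuristic)."""
    remaining = list(region_centers)
    tour, pos = [], start
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(pos, p))
        remaining.remove(nearest)
        tour.append(nearest)
        pos = nearest
    return tour
```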
<Coping Process: S104>
 When it is determined that a conversation is being held in the person group 40, the conversation monitoring device 2000 executes a predetermined coping process (S104). Any process can be adopted as the coping process. For example, the coping process is a process of issuing a warning to the person group 40 (hereinafter, warning process). The warning process is, for example, a process of displaying a warning screen on a display device provided on the mobile robot 20, or of having a projector provided on the mobile robot 20 project an image representing a warning. As another example, the warning process is a process of outputting a warning sound from a speaker provided on the mobile robot 20.
 Here, the mobile robot 20 may issue the warning after approaching the person group 40 to some extent. For example, the conversation monitoring device 2000 may move the mobile robot 20 to a position whose distance from the person group 40 is equal to or less than a predetermined threshold value, and then have the mobile robot 20 output the various warnings described above.
 As another example, the conversation monitoring device 2000 may send a notification representing a warning to each person in the person group 40. In this case, information associating the identification information of each person 10 with a notification destination for that person (for example, an e-mail address) is stored in advance in a storage device accessible from the conversation monitoring device 2000. The conversation monitoring device 2000 identifies the identification information of each person in the person group 40 using the video data 32, the video data 23, or the voice data 25, and sends the notification to the destination corresponding to that identification information. When video data is used to identify a person 10, the identification information of the person 10 is, for example, an image feature of the person 10 (for example, a facial feature). The conversation monitoring device 2000 extracts the feature of a person 10 in the person group 40 from the video data 32 or the video data 23, and sends the notification to the destination associated with a matching feature (for example, a feature whose similarity is equal to or greater than a threshold value). When the voice data 25 is used, a feature of the voice of the person 10 is used as the identification information of the person 10.
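 The destination lookup by feature matching can be sketched as follows (Python); the registry of (feature, address) pairs, cosine similarity as the matching measure, and the 0.8 cutoff are illustrative assumptions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def lookup_address(feature, registry, threshold=0.8):
    """Return the notification address whose stored feature best matches the
    extracted feature, or None when no similarity reaches the threshold.
    registry: iterable of (stored_feature, address) pairs (assumed format)."""
    best_address, best_sim = None, threshold
    for stored_feature, address in registry:
        sim = cosine_similarity(feature, stored_feature)
        if sim >= best_sim:
            best_address, best_sim = address, sim
    return best_address
```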
 When a conversation in the person group 40 is detected, the conversation monitoring device 2000 may also issue a warning that is not limited to the person group 40 but calls the attention of other people as well. For example, the conversation monitoring device 2000 may control a broadcasting device to make an announcement (indoor, in-facility, or outdoor broadcasting) urging people to avoid conversations at close range, or to play a predetermined warning sound.
 The coping process is not limited to a warning process. For example, the conversation monitoring device 2000 may store information about the person group 40 that was having a conversation (identification information, or the video data 32 or video data 23 in which the person group 40 was captured) in a storage device. In this way, for example, if one member of the person group 40 is later found to have an infectious disease, the other members of the person group 40 can be identified as persons suspected of infection.
 The conversation monitoring device 2000 may also refrain from performing the coping process immediately upon detecting a conversation in the person group 40, and perform it only when the conversation has continued for a predetermined time. As another example, the conversation monitoring device 2000 may perform the coping process in multiple stages according to the duration of the conversation. In this case, information associating each of a plurality of warning levels with a different warning process is stored in advance in a storage device accessible from the conversation monitoring device 2000. For example, a higher warning level is associated with a more conspicuous (more effective) warning.
 The conversation monitoring device 2000 issues a higher-level warning the longer the conversation continues. For example, when the conversation duration reaches P1 or more, the conversation monitoring device 2000 performs the first-level warning process of moving the robot to a position within a predetermined distance of the person group 40. Next, when the conversation duration reaches P2 (> P1) or more, it performs the second-level warning process of displaying a warning screen on the display device or projecting a warning image on the ground. Then, when the conversation duration reaches P3 (> P2) or more, it performs the third-level warning process of outputting a warning sound from the speaker.
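 The staged escalation can be sketched as follows (Python); the specification only requires P1 < P2 < P3, so the concrete values below are illustrative.

```python
P1, P2, P3 = 30.0, 60.0, 120.0  # assumed duration thresholds in seconds (P1 < P2 < P3)

def warning_level(duration_s: float) -> int:
    """0 = no warning yet; 1 = approach the group; 2 = display or project a
    warning; 3 = output a warning sound from the speaker."""
    if duration_s >= P3:
        return 3
    if duration_s >= P2:
        return 2
    if duration_s >= P1:
        return 1
    return 0
```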
 By issuing multi-stage warnings according to the duration of the conversation in this way, an operation becomes possible in which a modest warning is issued while the conversation is still short and a more conspicuous warning is issued as the conversation continues. This makes it possible to balance the effectiveness of the warning against the degree to which the warning interferes with people's activities. That is, while the conversation is still short, the conversation-stopping effect may be small, so a warning designed to interfere with the conversation as little as possible is issued; once the conversation has continued for a long time, a warning with a strong conversation-stopping effect can be issued even if it interferes with the conversation to some extent.
<Coping Process Considering Conditions Other Than the Presence of a Conversation>
 The conversation monitoring device 2000 may refrain from performing the coping process on the person group 40 when the person group 40 satisfies a specific condition. For example, the specific condition is that appropriate measures for preventing infectious diseases are being taken. More specific examples include the condition that all persons 10 in the person group 40 are wearing masks, and the condition that the persons 10 in the person group 40 are separated by a partition. Here, as a policy for restricting conversation in the monitoring area, it is conceivable to adopt the policy that conversation at close range is permitted as long as appropriate measures for preventing infectious diseases are being taken. Adopting the above conditions regarding infection prevention makes it possible to operate under such a policy. The determination as to whether the specific condition is satisfied may be made either before or after the conversation determination is performed for the person group 40.
 The warning level described above may also be changed according to whether the specific condition is satisfied. That is, the warning level when the specific condition is satisfied is set lower than the warning level when it is not. In this way, for example, a more modest warning can be issued when appropriate infection-prevention measures are being taken, and a more conspicuous warning when they are not.
 Although the present invention has been described above with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the invention.
 In the above examples, the program can be stored and provided to a computer using various types of non-transitory computer readable media. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM, CD-R, CD-R/W, and semiconductor memories (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM). The program may also be provided to the computer by various types of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to the computer via a wired communication path such as an electric wire or optical fiber, or via a wireless communication path.
 Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to the following.
 (Supplementary Note 1)
 A conversation monitoring device comprising:
 a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area are detected, and that analyzes the first video data to determine whether or not the plurality of persons are having a conversation;
 a movement control unit that, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moves a mobile robot provided with a first camera to a position from which the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a place within a second predetermined distance of the plurality of persons; and
 a second determination unit that, after the mobile robot has been moved, analyzes second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
 (Supplementary Note 2)
 The conversation monitoring device according to Supplementary Note 1, wherein the first determination unit:
 determines whether or not the plurality of persons are having a conversation based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the first video data; and
 determines that it cannot be determined whether or not the plurality of persons are having a conversation when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified.
 (Supplementary Note 3)
 The conversation monitoring device according to Supplementary Note 1, wherein the first determination unit calculates, for the plurality of persons contained in the first video data, both the probability that a conversation is being held and the probability that no conversation is being held, and determines that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are below a threshold value.
 (Supplementary Note 4)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the second video data.
 (Supplementary Note 5)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons at the time the voice data was obtained.
 (Supplementary Note 6)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 5, wherein:
 the mobile robot is equipped with a projector;
 before accepting control by the movement control unit, the mobile robot moves within the monitoring area while projecting from the projector onto the ground a distance image representing the magnitude of the first predetermined distance;
 the plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and
 the first determination unit acquires, as the first video data, the second video data in which the plurality of persons are detected.
 (Supplementary Note 7)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 6, wherein the movement control unit identifies the face direction or line-of-sight direction of a person around the mobile robot by analyzing the second video data, and moves the mobile robot so that it does not pass through a region located in the identified face direction or line-of-sight direction.
 (Supplementary Note 8)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 7, wherein:
 a second camera that images part of the monitoring area is provided;
 before accepting control by the movement control unit, the mobile robot moves through a region that cannot be imaged by the second camera; and
 the first video data is video data generated by the first camera or the second camera.
 (Supplementary Note 9)
 The conversation monitoring device according to Supplementary Note 8, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
 (Supplementary Note 10)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 9, wherein:
 when the first determination unit or the second determination unit determines that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons; and
 the predetermined coping process includes one or more of a process of moving the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information representing a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
 (Supplementary Note 11)
 A control method executed by a computer, comprising:
 a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area are detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation;
 a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position from which the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance of the plurality of persons; and
 a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
 (Supplementary Note 12)
 The control method according to Supplementary Note 11, wherein, in the first determination step:
 whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the first video data; and
 when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation.
 (Supplementary Note 13)
 The control method according to Supplementary Note 11, wherein, in the first determination step, both the probability that a conversation is being held and the probability that no conversation is being held are calculated for the plurality of persons contained in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are below a threshold value.
 (Supplementary Note 14)
 The control method according to any one of Supplementary Notes 11 to 13, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the second video data.
 (Supplementary Note 15)
 The control method according to any one of Supplementary Notes 11 to 13, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons at the time the voice data was obtained.
 (Supplementary Note 16)
 The control method according to any one of Supplementary Notes 11 to 15, wherein:
 the mobile robot is equipped with a projector;
 before accepting control by the movement control step, the mobile robot moves within the monitoring area while projecting from the projector onto the ground a distance image representing the magnitude of the first predetermined distance;
 the plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and
 in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
 (Supplementary Note 17)
 The control method according to any one of Supplementary Notes 11 to 16, wherein, in the movement control step, the face direction or line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so that it does not pass through a region located in the identified face direction or line-of-sight direction.
 (Supplementary Note 18)
 The control method according to any one of Supplementary Notes 11 to 17, wherein:
 a second camera that images part of the monitoring area is provided;
 before accepting control by the movement control step, the mobile robot moves through a region that cannot be imaged by the second camera; and
 the first video data is video data generated by the first camera or the second camera.
 (Supplementary Note 19)
 The control method according to Supplementary Note 18, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
 (Supplementary Note 20)
 The control method according to any one of Supplementary Notes 11 to 19, wherein:
 when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons; and
 the predetermined coping process includes one or more of a process of moving the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information representing a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
 (Supplementary Note 21)
 A computer readable medium storing a program, the program causing a computer to execute:
 a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area are detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation;
 a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position from which the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance of the plurality of persons; and
 a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
 (Supplementary Note 22)
 The computer readable medium according to Supplementary Note 21, wherein, in the first determination step:
 whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the first video data; and
 when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation.
 (Supplementary Note 23)
 The computer readable medium according to Supplementary Note 21, wherein, in the first determination step, both the probability that a conversation is being held and the probability that no conversation is being held are calculated for the plurality of persons contained in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are below a threshold value.
 (Supplementary Note 24)
 The computer readable medium according to any one of Supplementary Notes 21 to 23, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the second video data.
 (Supplementary Note 25)
 The computer readable medium according to any one of Supplementary Notes 21 to 23, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons at the time the voice data was obtained.
 (Supplementary Note 26)
 The computer readable medium according to any one of Supplementary Notes 21 to 25, wherein:
 the mobile robot is equipped with a projector;
 before accepting control by the movement control step, the mobile robot moves within the monitoring area while projecting from the projector onto the ground a distance image representing the magnitude of the first predetermined distance;
 the plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and
 in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
 (Supplementary Note 27)
 The computer readable medium according to any one of Supplementary Notes 21 to 26, wherein, in the movement control step, the face direction or line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so that it does not pass through a region located in the identified face direction or line-of-sight direction.
 (Supplementary Note 28)
 The computer readable medium according to any one of Supplementary Notes 21 to 27, wherein:
 a second camera that images part of the monitoring area is provided;
 before accepting control by the movement control step, the mobile robot moves through a region that cannot be imaged by the second camera; and
 the first video data is video data generated by the first camera or the second camera.
 (Supplementary Note 29)
 The computer readable medium according to Supplementary Note 28, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
 (Supplementary Note 30)
 The computer readable medium according to any one of Supplementary Notes 21 to 29, wherein:
 when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons; and
 the predetermined coping process includes one or more of a process of moving the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information representing a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
Some or all of the above embodiments may also be described, but not limited to:
(Appendix 1)
Whether or not the plurality of persons are having a conversation by acquiring the first video data in which a plurality of persons located within the first predetermined distance from each other in the monitoring area are detected and analyzing the first video data. The first judgment unit that determines
If it is not possible to determine whether or not the plurality of persons are talking even by analyzing the first video data, the mobile robot provided with the first camera can capture the faces of the plurality of persons. A movement control unit that moves a mobile robot provided with a microphone to a position within a second predetermined distance from the plurality of persons.
After moving the mobile robot, the second video data obtained from the first camera or the voice data obtained from the microphone is analyzed to determine whether or not the plurality of persons are having a conversation. A conversation monitoring device having two determination units.
(Appendix 2)
The first determination unit is
Based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, it is determined whether or not the plurality of persons are having a conversation.
Addition: When there is a person who cannot specify the movement of the mouth, the direction of the face, or the direction of the line of sight among the plurality of persons, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation. The conversation monitoring device according to 1.
(Appendix 3)
The first determination unit calculates both the probability that a conversation is taking place and the probability that a conversation is not taking place for the plurality of persons included in the first video data, and determines the probability that the conversation is taking place. The conversation monitoring device according to Appendix 1, wherein it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of the probabilities that no conversation is taking place are less than the threshold value.
(Appendix 4)
The second determination unit determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data. The conversation monitoring device according to any one of Supplementary note 1 to 3 for determination.
(Appendix 5)
The second determination unit is based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data is obtained. The conversation monitoring device according to any one of Supplementary note 1 to 3, which determines whether or not a person is having a conversation.
(Appendix 6)
The mobile robot is equipped with a projector.
Before accepting the control by the movement control unit, the mobile robot moves in the monitoring area while projecting a distance image showing the magnitude of the first predetermined distance from the projector to the ground.
By detecting the distance image and a plurality of people from the second video data and comparing the distance between the detected people with the size of the distance image, the distance images are located within the first predetermined distance from each other. Detects multiple people and
The conversation monitoring device according to any one of Supplementary note 1 to 5, wherein the first determination unit acquires the second video data in which the plurality of persons are detected as the first video data.
(Appendix 7)
The movement control unit identifies the direction or line-of-sight direction of the face of a person around the mobile robot by analyzing the second video data, and determines a region located in the specified face direction or line-of-sight direction. The conversation monitoring device according to any one of Supplementary note 1 to 6, which moves the mobile robot so as not to pass through.
(Appendix 8)
A second camera that captures a part of the monitoring area is provided.
The mobile robot moves an area that cannot be imaged by the second camera before accepting control by the movement control unit.
The conversation monitoring device according to any one of Supplementary note 1 to 7, wherein the first video data is video data generated by the first camera or the second camera.
(Appendix 9)
The conversation monitoring device according to Appendix 8, wherein a region that cannot be imaged by the second camera is specified by using the video data generated by the second camera and the map data of the monitoring area.
(Appendix 10)
When it is determined by the first determination unit or the second determination unit that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons.
The predetermined coping process includes a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or a projector provided in the mobile robot to output information indicating a warning, and a process provided in the mobile robot. The conversation monitoring device according to any one of Supplementary note 1 to 9, wherein one or more of the processes for outputting a warning sound to the speaker are included.
(Appendix 11)
A control method performed by a computer
Whether or not the plurality of persons are having a conversation by acquiring the first video data in which a plurality of persons located within the first predetermined distance from each other in the monitoring area are detected and analyzing the first video data. The first judgment step to determine
If it is not possible to determine whether or not the plurality of persons are talking even by analyzing the first video data, the mobile robot provided with the first camera can capture the faces of the plurality of persons. A movement control step for moving a mobile robot provided with a microphone to a position within a second predetermined distance from the plurality of persons.
After moving the mobile robot, the second video data obtained from the first camera or the voice data obtained from the microphone is analyzed to determine whether or not the plurality of persons are having a conversation. A control method comprising two determination steps.
(Appendix 12)
In the first determination step,
Based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, it is determined whether or not the plurality of persons are having a conversation.
Addendum: When there is a person who cannot specify the movement of the mouth, the direction of the face, or the direction of the line of sight among the plurality of persons, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation. 11. The control method according to 11.
(Appendix 13)
In the first determination step, both the probability that a conversation is taking place and the probability that a conversation is not taking place are calculated for the plurality of persons included in the first video data, and the probability that the conversation is taking place is calculated. The control method according to Appendix 11, wherein it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of the probabilities that the conversation is not taking place are less than the threshold value.
(Appendix 14)
In the second determination step, whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data. The control method according to any one of Supplementary note 11 to 13 for determination.
(Appendix 15)
In the second determination step, the plurality of voice data are based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data is obtained. The control method according to any one of Supplementary note 11 to 13, which determines whether or not a person is having a conversation.
(Appendix 16)
The mobile robot is equipped with a projector.
Before accepting the control by the movement control step, the mobile robot moves in the monitoring area while projecting a distance image showing the magnitude of the first predetermined distance from the projector to the ground.
By detecting the distance image and a plurality of people from the second video data and comparing the distance between the detected people with the size of the distance image, the distance images are located within the first predetermined distance from each other. Detects multiple people and
The control method according to any one of Supplementary note 11 to 15, wherein in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
(Appendix 17)
In the movement control step, by analyzing the second video data, the direction or line-of-sight direction of the face of a person around the mobile robot is specified, and the region located in the specified face direction or line-of-sight direction is determined. The control method according to any one of Supplementary note 11 to 16, wherein the mobile robot is moved so as not to pass through.
(Appendix 18)
A second camera that captures a part of the monitoring area is provided.
The mobile robot moves an area that cannot be imaged by the second camera before accepting the control by the movement control step.
The control method according to any one of Supplementary note 11 to 17, wherein the first video data is video data generated by the first camera or the second camera.
(Appendix 19)
The control method according to Appendix 18, wherein a region that cannot be imaged by the second camera is specified by using the video data generated by the second camera and the map data of the monitoring area.
(Appendix 20)
When it is determined by the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons.
The predetermined coping process includes a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or a projector provided in the mobile robot to output information indicating a warning, and a process provided in the mobile robot. The control method according to any one of Supplementary note 11 to 19, wherein any one or more of the processes for outputting a warning sound to the speaker is included.
(Appendix 21)
A computer-readable medium that stores programs
The program
Whether or not the plurality of persons are having a conversation by acquiring the first video data in which a plurality of persons located within the first predetermined distance from each other in the monitoring area are detected and analyzing the first video data. The first judgment step to determine
If it is not possible to determine whether or not the plurality of persons are talking even by analyzing the first video data, the mobile robot provided with the first camera can capture the faces of the plurality of persons. A movement control step for moving a mobile robot provided with a microphone to a position within a second predetermined distance from the plurality of persons.
After moving the mobile robot, the second video data obtained from the first camera or the voice data obtained from the microphone is analyzed to determine whether or not the plurality of persons are having a conversation. A computer-readable medium that causes a computer to perform two judgment steps.
(Appendix 22)
In the first determination step,
Based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, it is determined whether or not the plurality of persons are having a conversation.
Addition: When there is a person who cannot specify the movement of the mouth, the direction of the face, or the direction of the line of sight among the plurality of persons, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation. 21 is a computer-readable medium.
(Appendix 23)
In the first determination step, both the probability that a conversation is taking place and the probability that a conversation is not taking place are calculated for the plurality of persons included in the first video data, and the probability that the conversation is taking place is calculated. The computer-readable medium according to Appendix 21, wherein it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of the probabilities that the conversation is not taking place are less than the threshold value.
(Appendix 24)
The computer-readable medium according to any one of Supplementary notes 21 to 23, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
(Appendix 25)
The computer-readable medium according to any one of Supplementary notes 21 to 23, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data was obtained.
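Supplementary note 25 ties loudness to distance. A simple way to realize this is to refer the level measured at the robot back to a fixed reference distance from the group and compare it with a typical conversational speech level. Both the free-field 6 dB-per-distance-doubling attenuation model and the 55 dB threshold below are assumptions for illustration.

```python
import math

def is_conversation(level_db_at_robot, distance_m,
                    speech_threshold_db=55.0):
    """Refer the level measured at the robot back to 1 m from the group
    (free-field inverse-distance model) and compare it with a typical
    conversational speech level."""
    d = max(distance_m, 0.1)                     # guard against zero distance
    level_at_1m = level_db_at_robot + 20 * math.log10(d / 1.0)
    return level_at_1m >= speech_threshold_db

# E.g., 43 dB measured 4 m away maps to about 55 dB at 1 m -> conversation.
```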
(Appendix 26)
The computer-readable medium according to any one of Supplementary notes 21 to 25, wherein
the mobile robot is equipped with a projector,
before accepting control by the movement control step, the mobile robot moves in the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance,
the distance image and a plurality of persons are detected from the second video data, and the plurality of persons located within the first predetermined distance from each other are detected by comparing the distance between the detected persons with the size of the distance image, and
in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
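Supplementary note 26 uses the projected distance image as an on-the-ground scale reference: its apparent length in pixels, divided by its known physical length, gives a pixels-per-meter factor for measuring person-to-person distances. A sketch under that assumption (ground-plane geometry, foot positions already detected in image coordinates):

```python
import math

def pairs_within_distance(person_feet_px, marker_px_len, marker_real_len_m,
                          first_predetermined_m):
    """Index pairs of persons whose ground distance, measured against the
    projected distance image, is within the first predetermined distance."""
    px_per_m = marker_px_len / marker_real_len_m   # scale from the marker
    close = []
    for i in range(len(person_feet_px)):
        for j in range(i + 1, len(person_feet_px)):
            (x1, y1), (x2, y2) = person_feet_px[i], person_feet_px[j]
            dist_m = math.hypot(x2 - x1, y2 - y1) / px_per_m
            if dist_m <= first_predetermined_m:
                close.append((i, j))
    return close

# E.g., a 2 m marker appearing 80 px long gives 40 px/m; feet 100 px apart
# are then 2.5 m apart, outside a 2 m first predetermined distance.
```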
(Appendix 27)
The computer-readable medium according to any one of Supplementary notes 21 to 26, wherein in the movement control step, the direction of the face or the line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so as not to pass through a region located in the identified face direction or line-of-sight direction.
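The path constraint of Supplementary note 27 can be prototyped by carving a cone in front of each detected face out of the planner's traversable cells. The cone half-angle and reach below are assumed parameters, not values from the specification.

```python
import math

def forbidden_cells(cells, face_pos, face_heading_deg,
                    half_angle_deg=30.0, reach_m=5.0):
    """Cells lying in a cone along the identified face/gaze direction;
    the movement control step excludes them from the robot's route."""
    banned = set()
    for (x, y) in cells:
        dx, dy = x - face_pos[0], y - face_pos[1]
        d = math.hypot(dx, dy)
        if d == 0 or d > reach_m:
            continue
        bearing = math.degrees(math.atan2(dy, dx))
        off = abs((bearing - face_heading_deg + 180) % 360 - 180)
        if off <= half_angle_deg:
            banned.add((x, y))
    return banned
```

A planner would subtract these cells from its free set before computing the route, so the robot approaches from behind or beside the persons rather than through their line of sight.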
(Appendix 28)
The computer-readable medium according to any one of Supplementary notes 21 to 27, wherein
a second camera that captures an image of a part of the monitoring area is provided,
the mobile robot moves through an area that cannot be imaged by the second camera before accepting control by the movement control step, and
the first video data is video data generated by the first camera or the second camera.
(Appendix 29)
The computer-readable medium according to Supplementary note 28, wherein an area that cannot be imaged by the second camera is identified by using the video data generated by the second camera and the map data of the monitoring area.
(Appendix 30)
The computer-readable medium according to any one of Supplementary notes 21 to 29, wherein
when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, and
the predetermined coping process includes one or more of: a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information indicating a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
10 person
20 mobile robot
22 camera
23 video data
24 microphone
25 voice data
26 actuator
27 moving means
30 camera
32 video data
40 person group
500 computer
502 bus
504 processor
506 memory
508 storage device
510 input/output interface
512 network interface
600 controller
602 bus
604 processor
606 memory
608 storage device
610 input/output interface
612 network interface
2000 conversation monitoring device
2020 first determination unit
2040 movement control unit
2060 second determination unit

Claims (30)

1. A conversation monitoring device comprising:
a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, and analyzes the first video data to determine whether or not the plurality of persons are having a conversation;
a movement control unit that, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moves a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and
a second determination unit that, after the mobile robot has been moved, analyzes second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
2. The conversation monitoring device according to claim 1, wherein the first determination unit
determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, and
determines that it cannot be determined whether or not the plurality of persons are having a conversation when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified.
3. The conversation monitoring device according to claim 1, wherein the first determination unit calculates, for the plurality of persons included in the first video data, both the probability that a conversation is taking place and the probability that no conversation is taking place, and determines that it cannot be determined whether or not the plurality of persons are having a conversation when both probabilities are less than a threshold value.
4. The conversation monitoring device according to any one of claims 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
5. The conversation monitoring device according to any one of claims 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data was obtained.
6. The conversation monitoring device according to any one of claims 1 to 5, wherein
the mobile robot is equipped with a projector,
before accepting control by the movement control unit, the mobile robot moves in the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance,
the distance image and a plurality of persons are detected from the second video data, and the plurality of persons located within the first predetermined distance from each other are detected by comparing the distance between the detected persons with the size of the distance image, and
the first determination unit acquires, as the first video data, the second video data in which the plurality of persons are detected.
7. The conversation monitoring device according to any one of claims 1 to 6, wherein the movement control unit identifies the direction of the face or the line-of-sight direction of a person around the mobile robot by analyzing the second video data, and moves the mobile robot so as not to pass through a region located in the identified face direction or line-of-sight direction.
8. The conversation monitoring device according to any one of claims 1 to 7, wherein
a second camera that captures an image of a part of the monitoring area is provided,
the mobile robot moves through an area that cannot be imaged by the second camera before accepting control by the movement control unit, and
the first video data is video data generated by the first camera or the second camera.
9. The conversation monitoring device according to claim 8, wherein an area that cannot be imaged by the second camera is identified by using the video data generated by the second camera and the map data of the monitoring area.
10. The conversation monitoring device according to any one of claims 1 to 9, wherein
when the first determination unit or the second determination unit determines that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, and
the predetermined coping process includes one or more of: a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information indicating a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
11. A control method executed by a computer, the method comprising:
a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation;
a movement control step of, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and
a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
12. The control method according to claim 11, wherein in the first determination step,
it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, and
it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified.
13. The control method according to claim 11, wherein in the first determination step, both the probability that a conversation is taking place and the probability that no conversation is taking place are calculated for the plurality of persons included in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both probabilities are less than a threshold value.
14. The control method according to any one of claims 11 to 13, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
15. The control method according to any one of claims 11 to 13, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data was obtained.
16. The control method according to any one of claims 11 to 15, wherein
the mobile robot is equipped with a projector,
before accepting control by the movement control step, the mobile robot moves in the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance,
the distance image and a plurality of persons are detected from the second video data, and the plurality of persons located within the first predetermined distance from each other are detected by comparing the distance between the detected persons with the size of the distance image, and
in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
17. The control method according to any one of claims 11 to 16, wherein in the movement control step, the direction of the face or the line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so as not to pass through a region located in the identified face direction or line-of-sight direction.
18. The control method according to any one of claims 11 to 17, wherein
a second camera that captures an image of a part of the monitoring area is provided,
the mobile robot moves through an area that cannot be imaged by the second camera before accepting control by the movement control step, and
the first video data is video data generated by the first camera or the second camera.
19. The control method according to claim 18, wherein an area that cannot be imaged by the second camera is identified by using the video data generated by the second camera and the map data of the monitoring area.
20. The control method according to any one of claims 11 to 19, wherein
when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, and
the predetermined coping process includes one or more of: a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information indicating a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
21. A computer-readable medium storing a program, the program causing a computer to execute:
a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation;
a movement control step of, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and
a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
22. The computer-readable medium according to claim 21, wherein in the first determination step,
it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, and
it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified.
23. The computer-readable medium according to claim 21, wherein in the first determination step, both the probability that a conversation is taking place and the probability that no conversation is taking place are calculated for the plurality of persons included in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both probabilities are less than a threshold value.
24. The computer-readable medium according to any one of claims 21 to 23, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
25. The computer-readable medium according to any one of claims 21 to 23, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data was obtained.
26. The computer-readable medium according to any one of claims 21 to 25, wherein
the mobile robot is equipped with a projector,
before accepting control by the movement control step, the mobile robot moves in the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance,
the distance image and a plurality of persons are detected from the second video data, and the plurality of persons located within the first predetermined distance from each other are detected by comparing the distance between the detected persons with the size of the distance image, and
in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
27. The computer-readable medium according to any one of claims 21 to 26, wherein in the movement control step, the direction of the face or the line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so as not to pass through a region located in the identified face direction or line-of-sight direction.
28. The computer-readable medium according to any one of claims 21 to 27, wherein
a second camera that captures an image of a part of the monitoring area is provided,
the mobile robot moves through an area that cannot be imaged by the second camera before accepting control by the movement control step, and
the first video data is video data generated by the first camera or the second camera.
29. The computer-readable medium according to claim 28, wherein an area that cannot be imaged by the second camera is identified by using the video data generated by the second camera and the map data of the monitoring area.
30. The computer-readable medium according to any one of claims 21 to 29, wherein
when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, and
the predetermined coping process includes one or more of: a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information indicating a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
PCT/JP2020/026745 2020-07-08 2020-07-08 Conversation monitoring device, control method, and computer readable medium WO2022009349A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022534569A JP7416253B2 (en) 2020-07-08 2020-07-08 Conversation monitoring device, control method, and program
PCT/JP2020/026745 WO2022009349A1 (en) 2020-07-08 2020-07-08 Conversation monitoring device, control method, and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/026745 WO2022009349A1 (en) 2020-07-08 2020-07-08 Conversation monitoring device, control method, and computer readable medium

Publications (1)

Publication Number Publication Date
WO2022009349A1 true WO2022009349A1 (en) 2022-01-13

Family

ID=79552401

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/026745 WO2022009349A1 (en) 2020-07-08 2020-07-08 Conversation monitoring device, control method, and computer readable medium

Country Status (2)

Country Link
JP (1) JP7416253B2 (en)
WO (1) WO2022009349A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010238182A (en) * 2009-03-31 2010-10-21 Sogo Keibi Hosho Co Ltd Autonomous mobile object and suspicious person detecting method
JP2019084628A (en) * 2017-11-07 2019-06-06 富士ゼロックス株式会社 Autonomous mobile robot
WO2019239813A1 (en) * 2018-06-14 2019-12-19 パナソニックIpマネジメント株式会社 Information processing method, information processing program, and information processing system


Also Published As

Publication number Publication date
JPWO2022009349A1 (en) 2022-01-13
JP7416253B2 (en) 2024-01-17

Similar Documents

Publication Publication Date Title
JP7229662B2 (en) How to issue alerts in a video surveillance system
WO2021047306A1 (en) Abnormal behavior determination method and apparatus, terminal, and readable storage medium
WO2022183661A1 (en) Event detection method and apparatus, electronic device, storage medium, and program product
US11776274B2 (en) Information processing apparatus, control method, and program
KR101602753B1 (en) emergency call system using voice
JP5025607B2 (en) Abnormal behavior detection device
CN109492571B (en) Method and device for identifying human age and electronic equipment
US11375245B2 (en) Live video streaming based on an environment-related trigger
KR101692688B1 (en) Mode changing robot and control method thereof
WO2020024552A1 (en) Road safety monitoring method and system, and computer-readable storage medium
CN109544870B (en) Alarm judgment method for intelligent monitoring system and intelligent monitoring system
WO2020202865A1 (en) Person detection device and person detection method
WO2022183663A1 (en) Event detection method and apparatus, and electronic device, storage medium and program product
JP2022526071A (en) Situational awareness monitoring
WO2021057790A1 (en) Method and device for determining smoke
WO2022009349A1 (en) Conversation monitoring device, control method, and computer readable medium
CN109635691A (en) A kind of control method, device, equipment and storage medium
US11209796B2 (en) Surveillance system with intelligent robotic surveillance device
WO2022009339A1 (en) Conversation monitoring device, control method, and computer-readable medium
JP7476965B2 (en) Notification control device, control method, and program
CN111985309A (en) Alarm method, camera device and storage device
US20240038403A1 (en) Image display apparatus, image display system, image display method, and non-transitory computer readable medium
KR102141657B1 (en) Emergency guidance system based on voice and video
KR101520446B1 (en) Monitoring system for prevention beating and cruel act
CN112578909B (en) Method and device for equipment interaction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20944532

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022534569

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20944532

Country of ref document: EP

Kind code of ref document: A1