WO2022009349A1 - Conversation monitoring device, control method, and computer readable medium - Google Patents

Conversation monitoring device, control method, and computer readable medium

Info

Publication number
WO2022009349A1
Authority
WO
WIPO (PCT)
Prior art keywords
conversation
persons
video data
mobile robot
camera
Prior art date
Application number
PCT/JP2020/026745
Other languages
French (fr)
Japanese (ja)
Inventor
純一 船田
尚志 水本
Original Assignee
NEC Corporation (日本電気株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corporation
Priority to JP2022534569A (JP7416253B2)
Priority to PCT/JP2020/026745 (WO2022009349A1)
Publication of WO2022009349A1 publication Critical patent/WO2022009349A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services

Definitions

  • The present invention relates to a technique for detecting conversations between a plurality of persons.
  • Patent Document 1 discloses a technique that uses images obtained from a camera installed in a facility to detect that a resident and a visitor have had a conversation for a predetermined time or longer, and that notifies, in response to the detection, that the risk of infectious disease transmission is high.
  • In Patent Document 1, a state in which people face each other at a short distance is detected as a state of having a conversation.
  • Specifically, Patent Document 1 detects that a resident and a visitor are facing each other by using an image obtained from a camera fixedly installed in the facility.
  • With a fixedly installed camera, however, there is a possibility that the face of the resident or the visitor cannot be detected from the camera's image because the resident or the visitor has entered the camera's blind spot. As a result, a conversation between the resident and the visitor may go undetected even while it is taking place.
  • The present invention has been made in view of the above problem, and one of its purposes is to provide a technique for accurately detecting a situation in which a conversation is taking place.
  • The conversation monitoring device of the present disclosure has: a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, analyzes the first video data, and determines whether or not the plurality of persons are having a conversation; a movement control unit that, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moves a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and a second determination unit that, after the mobile robot has moved, analyzes second video data obtained from the first camera or voice data obtained from the microphone and determines whether or not the plurality of persons are having a conversation.
  • The control method of the present disclosure is executed by a computer.
  • The control method includes: a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, analyzing the first video data, and determining whether or not the plurality of persons are having a conversation; a movement control step of, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and a second determination step of, after the mobile robot has moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone and determining whether or not the plurality of persons are having a conversation.
  • The computer-readable medium of the present disclosure stores a program that causes a computer to execute the control method of the present disclosure.
  • FIG. 1 is a diagram illustrating an outline of the conversation monitoring device of Embodiment 1.
  • FIG. 2 is a diagram illustrating the functional configuration of the conversation monitoring device.
  • FIG. 3 is a block diagram illustrating the hardware configuration of the computer that realizes the conversation monitoring device.
  • FIG. 4 is a block diagram illustrating the hardware configuration of the mobile robot.
  • FIG. 5 is a flowchart illustrating the flow of the processing executed by the conversation monitoring device of Embodiment 1.
  • FIG. 1 is a diagram illustrating an outline of the conversation monitoring device of the first embodiment (the conversation monitoring device 2000 of FIG. 2, described later). The following description given with reference to FIG. 1 is intended to facilitate understanding of the conversation monitoring device 2000 of the first embodiment, and the operation of the conversation monitoring device 2000 of the first embodiment is not limited to what is described below.
  • the conversation monitoring device 2000 detects a situation in which a plurality of persons 10 are talking in a predetermined monitoring area.
  • the monitoring area can be any place such as an office. Further, the monitoring area may be outdoors.
  • the conversation monitoring device 2000 analyzes the video data 32 generated by the camera 30 and attempts to determine whether or not conversation is being performed by a plurality of persons 10 (hereinafter referred to as conversation determination).
  • The camera 30 may be the camera 22 mounted on the mobile robot 20 described later, or may be a camera other than the camera 22 (for example, a surveillance camera provided on a wall, a ceiling, or the like).
  • In the former case, the camera 22 mounted on the mobile robot 20 generates both the video data 32 and the video data 23 described later.
  • Hereinafter, a set of a plurality of persons 10 for which it is determined whether or not a conversation is taking place is referred to as a person group 40.
  • the video data 32 may include a plurality of person groups 40.
  • The persons 10 included in the person group 40 are located within a predetermined distance L1 of each other.
  • In other words, video data including a plurality of persons 10 whose distance from each other is the predetermined distance L1 or less is treated as the video data 32.
  • Here, "a plurality of persons 10 are within the predetermined distance L1 of each other" means, for example, that all of the persons 10 are contained in a circle of diameter L1.
  • For example, the predetermined distance L1 is determined based on the distance that people should keep from each other to prevent infectious diseases (the so-called social distance). Specifically, the value defined as the social distance, or that value plus a predetermined margin, can be used as the predetermined distance L1.
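  • As a rough illustration of this grouping criterion, the following Python sketch (not part of the disclosure; all names and the concrete values of the social distance and margin are assumptions) forms candidate person groups 40 from estimated positions by connecting persons whose pairwise distance is the predetermined distance L1 or less; the circle-of-diameter-L1 criterion mentioned above is approximated here by pairwise distances.

        import itertools

        SOCIAL_DISTANCE = 2.0  # assumed social-distance value, in meters
        MARGIN = 0.5           # assumed predetermined margin, in meters
        L1 = SOCIAL_DISTANCE + MARGIN

        def detect_person_groups(positions):
            """Group persons whose pairwise distance is <= L1.

            positions: dict mapping person id -> (x, y) in meters.
            Returns a list of sets of person ids (candidate person groups 40).
            """
            # Build a proximity graph: an edge means two persons are within L1.
            edges = {pid: set() for pid in positions}
            for a, b in itertools.combinations(positions, 2):
                (ax, ay), (bx, by) = positions[a], positions[b]
                if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 <= L1:
                    edges[a].add(b)
                    edges[b].add(a)
            # Connected components with two or more members are person groups.
            groups, seen = [], set()
            for pid in positions:
                if pid in seen:
                    continue
                stack, component = [pid], set()
                while stack:
                    cur = stack.pop()
                    if cur not in component:
                        component.add(cur)
                        stack.extend(edges[cur] - component)
                seen |= component
                if len(component) >= 2:
                    groups.append(component)
            return groups
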
  • When it cannot be determined from the video data 32 whether or not a conversation is taking place, the conversation monitoring device 2000 performs a further conversation determination for the person group 40 by using the mobile robot 20. Specifically, the conversation determination using the video data 32 yields one of the following three results: 1) a conversation is taking place in the person group 40; 2) no conversation is taking place in the person group 40; or 3) it cannot be determined whether or not a conversation is taking place in the person group 40 (for example, neither the probability that a conversation is taking place nor the probability that no conversation is taking place is sufficiently high). When the result is 3), the conversation monitoring device 2000 performs a further conversation determination by using the mobile robot 20.
  • For this purpose, the conversation monitoring device 2000 uses the video data 23 obtained from the camera 22 mounted on the mobile robot 20, or the voice data 25 obtained from the microphone 24 mounted on the mobile robot 20.
  • The mobile robot 20 may be provided with either one of the camera 22 and the microphone 24, or with both.
  • To do so, the conversation monitoring device 2000 moves the mobile robot 20 to an appropriate position (hereinafter referred to as the destination) from which it can be determined whether or not a conversation is taking place in the person group 40.
  • For example, the conversation monitoring device 2000 moves the mobile robot 20 to a position where the face of each person 10 included in the person group 40 can be imaged, and performs the conversation determination for the person group 40 using the video data 23 obtained after the movement.
  • Alternatively, the conversation monitoring device 2000 moves the mobile robot 20 to a position whose distance from the person group 40 is a predetermined distance L2 or less, and performs the conversation determination for the person group 40 using the voice data 25 obtained after the movement.
  • When it is determined that a conversation is taking place in the person group 40, the conversation monitoring device 2000 performs a predetermined coping process (for example, a warning process).
  • Even when the conversation monitoring device 2000 of the present embodiment obtains video data 32 in which a plurality of persons 10 located within the predetermined distance L1 of each other (a person group 40) are detected, it may be unable to determine from that data whether or not those persons 10 are having a conversation.
  • In that case, the mobile robot 20 is used. Specifically, the conversation monitoring device 2000 moves the mobile robot 20 to an appropriate position and performs the conversation determination for the person group 40 by using the video data 23 or the voice data 25 obtained from the mobile robot 20.
  • In this way, when the conversation monitoring device 2000 of the present embodiment cannot determine whether or not a plurality of persons 10 located within a specific distance of each other are having a conversation, it controls the mobile robot 20 so as to obtain video data 23 or voice data 25 from which the determination can be made. Therefore, it can be determined with higher accuracy whether or not a plurality of persons 10 located within a specific distance of each other are having a conversation.
  • FIG. 2 is a diagram illustrating the functional configuration of the conversation monitoring device 2000.
  • the conversation monitoring device 2000 includes a first determination unit 2020, a movement control unit 2040, and a second determination unit 2060.
  • the first determination unit 2020 analyzes the video data 32 and determines whether or not a conversation is taking place in the person group 40.
  • The movement control unit 2040 moves the mobile robot 20 provided with the camera 22 to a position where the face of each person 10 included in the person group 40 can be imaged, or moves the mobile robot 20 provided with the microphone 24 to a position whose distance from the person group 40 is the predetermined distance L2 or less.
  • After the mobile robot 20 has moved, the second determination unit 2060 analyzes the video data 23 obtained from the camera 22 or the voice data 25 obtained from the microphone 24, and determines whether or not a conversation is taking place in the person group 40.
  • Each functional component of the conversation monitoring device 2000 may be realized by hardware that realizes that component (for example, a hard-wired electronic circuit) or by a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls it).
  • Hereinafter, the case where each functional component of the conversation monitoring device 2000 is realized by a combination of hardware and software is further described.
  • FIG. 3 is a block diagram illustrating a hardware configuration of a computer 500 that realizes the conversation monitoring device 2000.
  • the computer 500 is any computer.
  • the computer 500 is a stationary computer such as a PC (Personal Computer) or a server machine.
  • the computer 500 is a portable computer such as a smartphone or a tablet terminal.
  • the computer 500 may be a controller (controller 600 described later) built in the mobile robot 20.
  • In this case, the conversation monitoring device 2000 is realized as the mobile robot 20 (that is, the mobile robot 20 also functions as the conversation monitoring device 2000).
  • the computer 500 may be a dedicated computer designed to realize the conversation monitoring device 2000, or may be a general-purpose computer.
  • For example, each function of the conversation monitoring device 2000 is realized on the computer 500 by installing a predetermined application on the computer 500.
  • The above application is composed of a program that realizes the functional components of the conversation monitoring device 2000.
  • the computer 500 has a bus 502, a processor 504, a memory 506, a storage device 508, an input / output interface 510, and a network interface 512.
  • the bus 502 is a data transmission path for the processor 504, the memory 506, the storage device 508, the input / output interface 510, and the network interface 512 to transmit and receive data to and from each other.
  • the method of connecting the processors 504 and the like to each other is not limited to the bus connection.
  • the processor 504 is various processors such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (Field-Programmable Gate Array).
  • The memory 506 is a main storage device realized by using RAM (Random Access Memory) or the like.
  • the storage device 508 is an auxiliary storage device realized by using a hard disk, SSD (Solid State Drive), memory card, ROM (Read Only Memory), or the like.
  • the input / output interface 510 is an interface for connecting the computer 500 and the input / output device.
  • an input device such as a keyboard and an output device such as a display device are connected to the input / output interface 510.
  • the network interface 512 is an interface for connecting the computer 500 to the wireless network.
  • This network may be a LAN (Local Area Network) or a WAN (Wide Area Network).
  • the computer 500 is communicably connected to the mobile robot 20 via a network interface 512 and a wireless network.
  • the storage device 508 stores a program (a program that realizes the above-mentioned application) that realizes each functional component of the conversation monitoring device 2000.
  • the processor 504 reads this program into the memory 506 and executes it to realize each functional component of the conversation monitoring device 2000.
  • the conversation monitoring device 2000 may be realized by one computer 500 or by a plurality of computers 500. In the latter case, the configurations of the computers 500 do not have to be the same and can be different.
  • FIG. 4 is a block diagram illustrating a hardware configuration of the mobile robot 20.
  • the mobile robot 20 includes a camera 22, a microphone 24, an actuator 26, a moving means 27, and a controller 600.
  • the mobile robot 20 moves by operating the moving means 27 according to the output of the actuator 26.
  • For example, the moving means 27 is a means for realizing traveling, such as wheels.
  • In this case, the mobile robot 20 moves by traveling in the monitoring area.
  • Alternatively, the moving means 27 may be a means for realizing flight, such as a propeller.
  • In this case, the mobile robot 20 moves by flying in the monitoring area.
  • the output of the actuator 26 is controlled by the controller 600.
  • the controller 600 is an arbitrary computer, and is realized by an integrated circuit such as SoC (System on a Chip) or SiP (System in a Package). In addition, for example, the controller 600 may be realized by a mobile terminal such as a smartphone.
  • the controller 600 has a bus 602, a processor 604, a memory 606, a storage device 608, an input / output interface 610, and a network interface 612.
  • The bus 602, the processor 604, the memory 606, the storage device 608, the input/output interface 610, and the network interface 612 have functions similar to those of the bus 502, the processor 504, the memory 506, the storage device 508, the input/output interface 510, and the network interface 512, respectively.
  • FIG. 5 is a flowchart illustrating the flow of processing executed by the conversation monitoring device 2000 of the first embodiment.
  • the first determination unit 2020 acquires the video data 32 including the person group 40, analyzes the video data 32, and makes a conversation determination about the person group 40 (S102).
  • If it is determined in S102 that a conversation is taking place in the person group 40, the conversation monitoring device 2000 executes a predetermined coping process (S104).
  • If it cannot be determined in S102 whether or not a conversation is taking place, the loop process A is executed.
  • The loop process A is composed of S106 to S112.
  • the movement control unit 2040 moves the mobile robot 20 (S108).
  • the second determination unit 2060 makes a conversation determination using the video data 23 or the voice data 25 obtained from the mobile robot 20 after the movement (S110).
  • If it is determined that a conversation is taking place in the person group 40 (S110: conversation), the conversation monitoring device 2000 executes the coping process (S104).
  • If it is determined that no conversation is taking place in the person group 40 (S110: no conversation), the process of FIG. 5 ends.
  • If it is determined that it cannot be determined whether or not a conversation is taking place in the person group 40 (S110: cannot be determined), the process returns to S106 and the loop process A is executed again. In this way, the conversation determination is repeated, while the mobile robot 20 is moved, until it can be determined whether or not there is a conversation.
  • the loop process A continues to be executed while it cannot be determined whether or not there is a conversation in the person group 40. However, even if it cannot be determined whether or not there is a conversation, the loop process A may be terminated and the process of FIG. 5 may be terminated when the predetermined termination condition is satisfied.
  • For example, the conversation monitoring device 2000 determines in S106 whether or not the predetermined end condition is satisfied. If the end condition is satisfied, the loop process A ends, and the process of FIG. 5 ends. On the other hand, if the end condition is not satisfied, the loop process A continues (S108 and onward).
  • the predetermined end condition is a condition that "a predetermined time has elapsed since the conversation determination (S102) was first performed for the person group 40".
  • the predetermined end condition may be a condition that "the distance between the persons 10 included in the person group 40 is larger than the predetermined distance L1".
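  • The flow of FIG. 5 described above, including the two example end conditions, can be summarized in the following Python-style sketch; it is a minimal illustration, and the helper functions passed in as parameters, the string results, and the concrete time limit are assumed names and values, not part of the disclosure.

        import time

        CONVERSATION, NO_CONVERSATION, UNDETERMINED = (
            "conversation", "no_conversation", "undetermined")

        def monitor_person_group(group, video_data_32, first_determination,
                                 second_determination, move_mobile_robot,
                                 coping_process, group_dispersed,
                                 max_duration_sec=60.0):
            """Sketch of S102-S112 of FIG. 5, with the two example end conditions."""
            started = time.monotonic()
            result = first_determination(video_data_32, group)      # S102
            while result == UNDETERMINED:
                # Loop process A; the end conditions correspond to S106.
                if time.monotonic() - started > max_duration_sec:   # time elapsed
                    return UNDETERMINED
                if group_dispersed(group):                          # distance now > L1
                    return UNDETERMINED
                move_mobile_robot(group)                            # S108
                result = second_determination(group)                # S110 (video 23 / voice 25)
            if result == CONVERSATION:
                coping_process(group)                               # S104
            return result
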
  • The video data 32 is video data generated by the camera 30, in which a person group 40 (a plurality of persons 10 whose distance from each other is the predetermined distance L1 or less) has been detected.
  • a device that performs a process of detecting a person group 40 from video data is called a person group detection device.
  • the person group detection device may be the conversation monitoring device 2000 (that is, the conversation monitoring device 2000 may also have the function of the person group detection device), or may be a device other than the conversation monitoring device 2000.
  • The person group detection device detects a plurality of persons 10 from the video data generated by the camera 30 and, by identifying that the distance between these persons 10 is the predetermined distance L1 or less, detects these persons 10 as a person group 40.
  • a plurality of cameras 30 may be provided.
  • the person group detection device analyzes the video data generated by the camera 30 and detects a plurality of people 10 from the video data. When a plurality of people 10 are detected, the person group detection device controls a projector to project an image representing a specific distance (hereinafter referred to as a distance image) onto the ground.
  • the distance image is projected at a position where both the detected plurality of people 10 and the distance image can be included in the imaging range of the camera 30.
  • the distance represented by the distance image is, for example, the predetermined distance L1 described above.
  • the projector may be mounted on the mobile robot 20 or may be installed in another place (for example, the ceiling).
  • The person group detection device detects a plurality of persons 10 and the distance image from the video data generated by the camera 30, and compares the distance between the persons 10 with the size of the distance image (that is, the predetermined distance L1 as it appears on the image). When the distance between the persons 10 is smaller than the size of the distance image, the person group detection device detects these persons 10 as a person group 40. Then, the person group detection device provides the conversation monitoring device 2000, as the video data 32, with the video data in which the person group 40 was detected and the video data generated by the camera 30 thereafter.
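  • A minimal sketch of this comparison on the image plane follows; the detector functions are assumed to be available externally (for example, from an object-detection library), and their names are illustrative.

        def detect_group_with_distance_image(frame, detect_people, detect_distance_image):
            """Compare inter-person pixel distances with the projected distance image.

            detect_people(frame) -> list of (person_id, (x, y)) image coordinates.
            detect_distance_image(frame) -> length, in pixels, of the projected
            distance image, i.e. the predetermined distance L1 as seen on the image.
            """
            people = detect_people(frame)
            l1_pixels = detect_distance_image(frame)
            group = set()
            for i, (id_a, (ax, ay)) in enumerate(people):
                for id_b, (bx, by) in people[i + 1:]:
                    if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 < l1_pixels:
                        group |= {id_a, id_b}
            return group  # detected person group 40 (empty set if none)
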
  • the method for specifying that the distance between the persons 10 is a predetermined distance L1 or less is not limited to the above method, and other existing techniques may be used.
  • the conversation monitoring device 2000 is provided with information that can specify the position of the person group 40 in addition to the video data 32.
  • the information that can specify the position of the person group 40 is, for example, information that represents the position of the person group 40 in the map data that represents the map of the monitoring area. It should be noted that existing technology can be used for the technology for specifying the position of the object captured by the camera on the map data.
  • The first determination unit 2020 performs the conversation determination for the person group 40 included in the video data 32 (S102). Specifically, the first determination unit 2020 performs the conversation determination based on the state of the face of each person 10 included in the person group 40.
  • Hereinafter, methods of conversation determination using video data are illustrated.
  • For example, the first determination unit 2020 determines whether or not there is a conversation by determining whether or not each person 10 included in the person group 40 is moving his or her mouth. For example, if any one of the persons 10 included in the person group 40 is moving the mouth, the first determination unit 2020 determines that all the persons 10 included in the person group 40 are having a conversation (that is, that a conversation is taking place in the person group 40). If no person 10 included in the person group 40 is moving the mouth, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Further, when no person 10 who is moving the mouth is detected but a person 10 for whom it cannot be determined whether or not the mouth is moving is detected, the first determination unit 2020 determines that the presence or absence of a conversation cannot be determined.
  • Here, the first determination unit 2020 may determine that the conversation is being held only by those persons 10 who are moving their mouths among the persons 10 included in the person group 40. In this case, the first determination unit 2020 excludes the persons 10 determined not to be moving their mouths from the person group 40. Specifically, when no person 10 for whom it cannot be determined whether or not the mouth is moving is detected from the video data 32 and two or more persons 10 who are moving their mouths are detected, the first determination unit 2020 excludes the persons 10 who are not moving their mouths from the person group 40 and then determines that a conversation is taking place in the person group 40.
  • When no person 10 for whom it cannot be determined whether or not the mouth is moving is detected and the number of persons 10 who are moving their mouths is one or less, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Further, when a person 10 for whom it cannot be determined whether or not the mouth is moving is detected from the video data 32, the first determination unit 2020 excludes the persons 10 determined not to be moving their mouths from the person group 40 and determines that the presence or absence of a conversation cannot be determined.
  • A person 10 for whom it cannot be determined whether or not the mouth is moving is, for example, a person whose mouth region is not included in the video data 32 because the person has his or her back to the camera that generates the video data 32.
  • For example, the first determination unit 2020 is configured to calculate, from time-series data of the image region representing the mouth of a person and its surroundings, both the probability that the person is moving the mouth and the probability that the person is not moving the mouth. Based on these probabilities, the first determination unit 2020 identifies the person's state as one of the following three: 1) the mouth is moving; 2) the mouth is not moving; or 3) it cannot be determined whether or not the mouth is moving.
  • For example, a predetermined value T1 is set in advance as a threshold for these probabilities.
  • If the probability that the mouth is moving is T1 or more, it is determined that the person is moving the mouth.
  • If the probability that the mouth is not moving is T1 or more, it is determined that the person is not moving the mouth.
  • If neither probability is T1 or more, it is determined that it cannot be determined whether or not the person is moving the mouth.
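  • In code, this three-way decision is a simple threshold test; the concrete value of T1 below is an assumption for illustration.

        T1 = 0.8  # assumed threshold for the two probabilities

        MOVING, NOT_MOVING, UNDECIDABLE = "moving", "not_moving", "undecidable"

        def classify_mouth_state(p_moving, p_not_moving, threshold=T1):
            """Three-way mouth-state decision from the two estimated probabilities."""
            if p_moving >= threshold:
                return MOVING
            if p_not_moving >= threshold:
                return NOT_MOVING
            return UNDECIDABLE  # neither probability reaches T1
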
  • the first determination unit 2020 determines whether or not there is a conversation based on the direction of the face or the line of sight of each person 10 included in the person group 40.
  • Hereinafter, the case of using the orientation of the face is described more specifically. Unless otherwise specified, the description of the case of using the direction of the line of sight is obtained by replacing "face" with "line of sight" in the following description.
  • For example, when the face of each person 10 included in the person group 40 is directed toward some other person 10 included in the person group 40, the first determination unit 2020 determines that all the persons 10 included in the person group 40 are having a conversation (that is, that a conversation is taking place in the person group 40). When none of the faces of the persons 10 included in the person group 40 is directed toward another person 10 included in the person group 40, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Further, when a person 10 whose face orientation cannot be determined is detected in the person group 40, the first determination unit 2020 determines that the presence or absence of a conversation cannot be determined.
  • Here, the first determination unit 2020 may determine that the conversation is being held only by those persons 10 whose faces are directed toward another person 10 included in the person group 40. In this case, the first determination unit 2020 excludes from the person group 40 the persons 10 determined not to have their faces directed toward another person 10. Specifically, when no person 10 whose face orientation cannot be determined is detected from the video data 32 and two or more persons 10 whose faces are directed toward another person 10 are detected, the first determination unit 2020 excludes the persons 10 whose faces are not directed toward another person 10 from the person group 40 and then determines that a conversation is taking place in the person group 40.
  • When no person 10 whose face orientation cannot be determined is detected from the video data 32 and the number of persons 10 whose faces are directed toward another person 10 is one or less, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Further, when a person 10 whose face orientation cannot be determined is detected from the video data 32, the first determination unit 2020 excludes the persons 10 whose faces are not directed toward another person 10 from the person group 40 and determines that the presence or absence of a conversation cannot be determined.
  • A person 10 whose face orientation cannot be determined is, for example, a person whose face region is not included in the video data 32 (for example, because the person has his or her back to the camera that generates the video data 32, or because the face is hidden by an obstacle).
  • When the direction of the line of sight is used instead of the orientation of the face, the eye region is analyzed instead of the face region.
  • For example, the first determination unit 2020 is configured to calculate, from time-series data of the image region representing the face of a person, the probability that the face is directed in each of a plurality of directions (for example, four or eight predetermined directions). When there is a direction for which the calculated probability is equal to or greater than a threshold, the first determination unit 2020 identifies that direction as the orientation of the face of the person 10. On the other hand, when the probabilities calculated for all the directions are less than the threshold, the first determination unit 2020 determines that the orientation of the face of the person 10 cannot be determined.
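  • The per-direction decision can be sketched as follows; the set of directions and the threshold value are illustrative assumptions.

        import numpy as np

        DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]  # assumed 8 directions
        DIRECTION_THRESHOLD = 0.6                                  # assumed threshold

        def classify_face_direction(direction_probs, threshold=DIRECTION_THRESHOLD):
            """Return the face direction, or None when it cannot be determined.

            direction_probs: per-direction probabilities in the order of DIRECTIONS.
            """
            best = int(np.argmax(direction_probs))
            if direction_probs[best] >= threshold:
                return DIRECTIONS[best]
            return None  # orientation cannot be determined
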
  • Alternatively, for example, the first determination unit 2020 may have a trained model that identifies, in response to the input of video data including the faces of a plurality of persons 10, whether or not the plurality of persons 10 are having a conversation.
  • For example, in response to the input of video data including the faces of a plurality of persons 10, the model outputs one of three determination results for the plurality of persons 10: 1) a conversation is taking place; 2) a conversation is not taking place; or 3) it cannot be determined whether or not a conversation is taking place.
  • Such a model can be realized by, for example, a recurrent neural network (RNN).
  • For example, the model calculates both the probability that a conversation is taking place and the probability that no conversation is taking place, and compares these probabilities with a threshold. If the probability that a conversation is taking place is equal to or greater than the threshold, the determination result that a conversation is taking place is output. If the probability that no conversation is taking place is equal to or greater than the threshold, the determination result that no conversation is taking place is output. If both probabilities are less than the threshold, the determination result that it cannot be determined whether or not a conversation is taking place is output.
  • The above model is trained in advance using learning data composed of combinations of video data and a correct-answer label (a label indicating whether or not a conversation is taking place).
  • an existing technique can be used as a technique for learning a model using learning data composed of a combination of input data and a correct label.
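  • One possible realization is sketched below in PyTorch; the framework, architecture, feature dimensions, and threshold are all assumptions made for illustration, since the disclosure only specifies an RNN that yields the two probabilities and the three-way decision.

        import torch
        import torch.nn as nn

        class ConversationRNN(nn.Module):
            """Three-way conversation classifier over a sequence of face features."""
            def __init__(self, feat_dim=128, hidden=64):
                super().__init__()
                self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 2)

            def forward(self, x):   # x: (batch, time, feat_dim)
                _, h = self.rnn(x)  # h: (num_layers, batch, hidden)
                # Independent probabilities for "conversation" and "no conversation",
                # so both can fall below the threshold at the same time.
                return torch.sigmoid(self.head(h[-1]))

        def decide(probs, threshold=0.7):
            """Map the two probabilities of one sample to the three results."""
            p_conv, p_no_conv = float(probs[0]), float(probs[1])
            if p_conv >= threshold:
                return "conversation"
            if p_no_conv >= threshold:
                return "no_conversation"
            return "cannot_determine"
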
  • The movement control unit 2040 moves the mobile robot 20 toward a position where video data 23 or voice data 25 from which the presence or absence of a conversation in the person group 40 can be determined is obtainable (S108). Then, the second determination unit 2060 performs the conversation determination for the person group 40 by using the video data 23 or the voice data 25 obtained from the mobile robot 20 during or after the movement.
  • a method of controlling the movement of the mobile robot 20 and a method of determining conversation will be described separately for a case where the video data 23 is used for the conversation determination and a case where the voice data 25 is used.
  • The method of the conversation determination using the video data 23 is the same as that of the conversation determination using the video data 32 described above. That is, the conversation determination using the video data 23 uses the movement of the mouth, the orientation of the face, the direction of the line of sight, and so on of each person 10. The movement control unit 2040 therefore moves the mobile robot 20 toward a position where the information necessary for identifying the movement of the mouth, the orientation of the face, or the direction of the line of sight can be obtained for each person 10 included in the person group 40.
  • the information required to identify the movement of the mouth, the orientation of the face, and the orientation of the line of sight is an image region including the mouth, an image region including the face, and an image region including the eyes, respectively.
  • the movement control unit 2040 moves the mobile robot 20 so as to approach the person group 40.
  • the movement control unit 2040 moves the mobile robot 20 to a position where there are no obstacles between the person 10 included in the person group 40 and the mobile robot 20.
  • the mobile robot may be moved so as to approach a specific object included in the video data obtained from the camera mounted on the mobile robot, or the robot may be moved to a position where there are no obstacles between the mobile robot and the specific object.
  • Existing technology can be used for the technology itself for moving a mobile robot.
  • the movement control unit 2040 calculates the face orientations of the plurality of persons 10 included in the person group 40, and sequentially moves the mobile robot 20 to the front of the faces of the plurality of persons 10. By doing so, the movement control unit 2040 sequentially specifies the movement of the mouth and the direction of the line of sight for each person 10.
  • the movement control unit 2040 may move the mobile robot 20 so that the mouths and eyes of a plurality of people 10 can be imaged from one place.
  • Alternatively, for example, the movement control unit 2040 calculates the average direction of the face orientations of the persons 10 from the video data 32 and the video data 23, and moves the mobile robot 20 to a position along that direction.
  • After the movement, the second determination unit 2060 attempts to identify the orientation of the face of each person 10 from the video data 23. Then, when the orientation of the face of a person 10 can be identified, the movement control unit 2040 moves the mobile robot 20 to the front of the face of that person 10.
  • the second determination unit 2060 makes a conversation determination about the person group 40 based on the relationship between the size of the voice included in the voice data 25 and the distance to the person group 40.
  • the movement control unit 2040 moves the mobile robot 20 to a position where the distance from the person group 40 is a predetermined distance L3 or less. This predetermined distance L3 is preset as a distance at which the voice of the conversation can be detected by the microphone 24 when the conversation is being held in the person group 40.
  • For example, the second determination unit 2060 acquires the voice data 25 from the microphone 24 of the mobile robot 20 that has moved to a position whose distance from the person group 40 is the predetermined distance L3 or less, and determines whether or not the loudness of the voice represented by the voice data 25 is equal to or greater than a threshold. When the loudness of the voice represented by the voice data 25 is equal to or greater than the threshold, the second determination unit 2060 determines that a conversation is taking place in the person group 40. On the other hand, when the loudness is less than the threshold, the second determination unit 2060 determines that no conversation is taking place in the person group 40.
  • the above threshold value may be a fixed value or may be dynamically set according to the distance from the mobile robot 20 to the person group 40. In the latter case, for example, a function that defines the relationship between the distance and the threshold value is predetermined.
  • In the latter case, the second determination unit 2060 identifies the distance from the mobile robot 20 to the person group 40 at the time when the voice data 25 was obtained from the microphone 24, identifies the threshold by inputting that distance into the above function, and compares the loudness of the voice represented by the voice data 25 with the identified threshold.
  • Further, the second determination unit 2060 may analyze the voice data 25 and determine whether or not a human voice is included in it. In this case, the second determination unit 2060 determines that a conversation is taking place in the person group 40 when the loudness of the voice represented by the voice data 25 is equal to or greater than the threshold and the voice includes a human voice. On the other hand, when the loudness of the voice is less than the threshold or the voice does not include a human voice, it determines that no conversation is taking place in the person group 40. In this way, for example, a situation in which a loud sound other than a human voice is occurring can be prevented from being erroneously detected as a situation in which the person group 40 is having a conversation.
  • Further, the second determination unit 2060 may consider the number of people whose voices are included in the voice data 25. For example, the second determination unit 2060 determines that a conversation is taking place in the person group 40 when the loudness of the voice represented by the voice data 25 is equal to or greater than the threshold and the voice includes the voices of a plurality of people. On the other hand, when the loudness of the voice is less than the threshold or the number of people whose voices are included is one or less, it determines that no conversation is taking place in the person group 40. In this way, for example, a situation in which one person is talking to himself or herself can be prevented from being erroneously detected as a situation in which the person group 40 is having a conversation.
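  • The audio-based decision logic described above can be combined as in the following sketch; the RMS loudness measure, the inverse-distance threshold function, and the externally supplied voice-activity and speaker-count analyzers are all illustrative assumptions.

        import numpy as np

        def loudness_threshold(distance_m, base=0.05, ref_distance=1.0):
            """Assumed distance-dependent threshold: lower when farther away,
            reflecting that received speech is quieter at larger distances."""
            return base * (ref_distance / max(distance_m, ref_distance))

        def audio_conversation_determination(samples, distance_m,
                                             contains_human_voice, count_speakers):
            """Sketch of the conversation determination from voice data 25.

            samples: mono audio samples in [-1, 1]; contains_human_voice and
            count_speakers stand in for external analyzers (e.g. VAD, diarization).
            """
            rms = float(np.sqrt(np.mean(np.square(samples))))  # loudness
            if rms < loudness_threshold(distance_m):
                return "no_conversation"
            if not contains_human_voice(samples):
                return "no_conversation"  # loud, but not a human voice
            if count_speakers(samples) <= 1:
                return "no_conversation"  # e.g. one person talking to themselves
            return "conversation"
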
  • When the accuracy of the determination as to whether or not the voice data 25 contains a human voice, or the accuracy of the calculation of the number of people whose voices are included in the voice data 25, is low, the second determination unit 2060 may determine that the presence or absence of a conversation cannot be determined. For example, if both the probability that the voice data 25 contains a human voice and the probability that it does not are less than a predetermined threshold, it is determined that the presence or absence of a conversation cannot be determined.
  • Alternatively, for example, the second determination unit 2060 may have a trained model that identifies, in response to the input of voice data, whether or not the voice data includes the voices of a plurality of persons 10 having a conversation. For example, in response to the input of voice data, the model outputs one of three determination results: 1) a conversation is taking place; 2) a conversation is not taking place; or 3) it cannot be determined whether or not a conversation is taking place.
  • Such a model can also be realized by, for example, a recurrent neural network (RNN).
  • For example, the model calculates both the probability that a conversation is taking place and the probability that no conversation is taking place, compares these probabilities with a threshold, and outputs the corresponding determination result in the same manner as the video-based model described above.
  • The above model is trained in advance using learning data composed of combinations of voice data and a correct-answer label (a label indicating whether or not a conversation is taking place).
  • a movement route to the destination is set using map data that can be referred to by the mobile robot 20.
  • a device that calculates a movement route to a destination using map data and performs a process of setting the calculated movement route in the mobile robot 20 is called a route setting device.
  • the route setting device may be a mobile robot 20, a conversation monitoring device 2000, or a device other than these.
  • The route setting device acquires the map data and calculates the movement route of the mobile robot 20 based on the map data and the destination (the position to which the mobile robot 20 should be moved) determined by the various methods described above. Then, the route setting device sets the calculated movement route in the mobile robot 20, and the mobile robot 20 moves according to the set movement route.
  • When the route setting device is a device other than the conversation monitoring device 2000, the movement control unit 2040 provides the route setting device with information indicating the destination to be set for the mobile robot 20.
  • the existing technology can be used as the technology for calculating the movement route based on the map data and the destination information.
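  • One such existing technique is grid-based A* search; the sketch below is an illustrative stand-in (the disclosure does not prescribe a specific planner) and assumes the map data has been rasterized into an occupancy grid.

        import heapq

        def plan_route(grid, start, goal):
            """Minimal A* on a 2D occupancy grid (0 = free, 1 = obstacle).

            Returns a list of (row, col) cells from start to goal, or None.
            """
            def h(a, b):  # Manhattan-distance heuristic
                return abs(a[0] - b[0]) + abs(a[1] - b[1])

            open_set = [(h(start, goal), 0, start, None)]
            came_from, g_cost = {}, {start: 0}
            while open_set:
                _, g, cell, parent = heapq.heappop(open_set)
                if cell in came_from:  # already expanded
                    continue
                came_from[cell] = parent
                if cell == goal:
                    path = []
                    while cell is not None:
                        path.append(cell)
                        cell = came_from[cell]
                    return path[::-1]
                r, c = cell
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                            and grid[nr][nc] == 0):
                        ng = g + 1
                        if ng < g_cost.get((nr, nc), float("inf")):
                            g_cost[(nr, nc)] = ng
                            heapq.heappush(
                                open_set, (ng + h((nr, nc), goal), ng, (nr, nc), cell))
            return None  # no route found
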
  • the mobile robot 20 moves so as not to interfere with the behavior of a person in the monitoring area.
  • For example, the mobile robot 20 uses the video data 32 and the video data 23 to grasp the movement of each person in the monitoring area, and moves so as not to come into contact with any of them. Existing technology (for example, technology for moving an autonomous vehicle so as not to collide with other vehicles or passersby) can be used for such control.
  • It is also preferable that the mobile robot 20 moves so as not to enter the field of view of persons who are not included in the person group 40. For example, when a person 10 not included in the person group 40 is detected from the video data 23, the route setting device identifies the face orientation or the line-of-sight direction of that person 10. Then, based on the identified face orientation or line-of-sight direction and the destination of the mobile robot 20, the route setting device calculates a movement route by which the mobile robot 20 can reach the destination without entering the field of view of that person 10, and sets the movement route in the mobile robot 20.
  • Here, the route setting device may detect from the video data only persons whose face orientation or line-of-sight direction is unlikely to change significantly (for example, persons who are standing still or sitting in chairs), and set the movement route of the mobile robot 20 so as not to enter the field of view of the detected persons.
  • Until it receives control from the movement control unit 2040, the mobile robot 20 may be stationary or may be moving.
  • In the latter case, for example, a movement route is set so that the mobile robot 20 patrols part or all of the monitoring area.
  • By having the mobile robot 20 patrol the monitoring area, it becomes possible to detect person groups 40 at various places within the monitoring area.
  • the movement route set in the mobile robot 20 for patrol is also referred to as a patrol route.
  • For example, it is preferable that the patrol route includes the regions of the monitoring area where the density of people is high (that is, where there are many people).
  • The patrol route may also include only the regions of the monitoring area where the density of people is high.
  • Alternatively, the patrol route is set so that regions with a high density of people are patrolled more frequently than regions with a low density of people.
  • It is also preferable that the patrol route of the mobile robot 20 includes areas not included in the imaging range of the camera 30 (that is, areas that none of the cameras 30 can image).
  • In this way, the mobile robot 20 can image areas that are difficult to image with the fixedly installed cameras, so that the monitoring area can be monitored broadly.
  • the patrol route may be set manually or automatically by a route setting device.
  • For example, the route setting device identifies regions outside the imaging range of the camera 30 by analyzing the video data generated by the camera 30, and generates a patrol route including those regions. More specifically, the route setting device identifies the regions within the imaging range of the camera 30 by using the map data of the monitoring area and the video data generated by the camera 30, and identifies the remaining regions as the regions outside the imaging range.
  • the route setting device generates a patrol route so as to patrol the region outside the imaging range.
  • Here, the regions outside the imaging range may be a plurality of regions that are not connected to each other.
  • In this case, the route setting device generates a patrol route that patrols the plurality of regions outside the imaging range sequentially, as sketched below.
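  • For example, the order in which the disconnected regions are visited could be chosen with a simple nearest-neighbor heuristic, as in the following sketch (an assumption for illustration; the disclosure does not specify how the sequential route is computed).

        import math

        def patrol_order(region_centroids, start):
            """Order out-of-range regions as a nearest-neighbor tour.

            region_centroids: list of (x, y) centroids of the regions outside
            the cameras' imaging range; start: the robot's current position.
            """
            remaining = list(region_centroids)
            order, current = [], start
            while remaining:
                nxt = min(remaining, key=lambda c: math.dist(current, c))
                remaining.remove(nxt)
                order.append(nxt)
                current = nxt
            return order  # plan a route between each consecutive pair of regions
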
  • When there are a plurality of mobile robots 20, a different patrol route may be set for each mobile robot 20.
  • In this case, it is preferable that the patrol routes include mutually different regions outside the imaging range.
  • the conversation monitoring device 2000 executes a predetermined coping process (S104). Any process can be adopted as the coping process.
  • the coping process is a process of issuing a warning to the person group 40 (hereinafter referred to as a warning process).
  • For example, the warning process is a process of displaying a screen indicating a warning on a display device provided in the mobile robot 20, or of projecting an image indicating a warning from a projector provided in the mobile robot 20.
  • the warning process is a process of outputting a voice indicating a warning from a speaker provided in the mobile robot 20.
  • the mobile robot 20 may give a warning to the person group 40 after approaching it to some extent.
  • That is, the conversation monitoring device 2000 may move the mobile robot 20 to a position whose distance from the person group 40 is equal to or less than a predetermined threshold, and then cause the mobile robot 20 to output the various warnings described above.
  • the conversation monitoring device 2000 may send a notification indicating a warning to each person included in the person group 40.
  • For example, information in which the identification information of each person 10 is associated with a notification destination for that person (for example, an e-mail address) is stored in advance in a storage device accessible from the conversation monitoring device 2000.
  • The conversation monitoring device 2000 identifies the identification information of each person included in the person group 40 by using the video data 32, the video data 23, or the voice data 25, and sends the above-mentioned notification to the destination corresponding to that identification information.
  • the identification information of the person 10 is a feature amount on the image of the person 10 (for example, a feature amount of a face).
  • In this case, the conversation monitoring device 2000 extracts the feature amount of each person 10 included in the person group 40 from the video data 32 or the video data 23, and sends the notification to the destination associated with a matching feature amount (for example, a feature amount whose similarity is equal to or higher than a threshold).
  • When the voice data 25 is used, the feature amount of the voice of the person 10 is used as the identification information of the person 10.
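  • The lookup from an extracted feature amount to a notification destination can be sketched as follows; cosine similarity and the threshold value are illustrative assumptions.

        import numpy as np

        SIMILARITY_THRESHOLD = 0.8  # assumed threshold for a "matching" feature

        def find_notification_destination(query_feature, registry):
            """Return the destination whose stored feature best matches the query.

            registry: list of (stored_feature: np.ndarray, destination: str)
            pairs, i.e. the pre-registered identification info and destinations.
            """
            def cosine(a, b):
                return float(np.dot(a, b) /
                             (np.linalg.norm(a) * np.linalg.norm(b)))

            best_dest, best_sim = None, SIMILARITY_THRESHOLD
            for stored_feature, destination in registry:
                sim = cosine(query_feature, stored_feature)
                if sim >= best_sim:
                    best_dest, best_sim = destination, sim
            return best_dest  # None if no stored feature is similar enough
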
  • When it is detected that a conversation is taking place in the person group 40, the conversation monitoring device 2000 may issue a warning calling for attention not only to the person group 40 but also to other people.
  • For example, the conversation monitoring device 2000 causes a broadcasting device to perform a broadcast (an indoor broadcast, an in-house broadcast, an outdoor broadcast, or the like) encouraging people to avoid conversations at short distances, or causes it to play a predetermined warning sound.
  • the coping process is not limited to the warning process.
  • For example, the conversation monitoring device 2000 may store information about the person group 40 having a conversation (its identification information, and the video data 32 or the video data 23 in which the person group 40 is captured) in the storage device. In this way, for example, when one member of the person group 40 is found to have an infectious disease, the other persons included in the person group 40 can be identified as persons suspected of having the infectious disease.
  • the conversation monitoring device 2000 does not immediately perform the coping process even if it is detected that the conversation is being performed in the person group 40, and may perform the coping process only when the conversation continues for a predetermined time.
  • the conversation monitoring device 2000 may perform coping processing in multiple stages according to the duration of the conversation.
  • In this case, information associating each of a plurality of warning levels with a different warning process is stored in advance in a storage device accessible from the conversation monitoring device 2000. For example, a higher warning level is associated with a more prominent (that is, more effective) warning.
  • The conversation monitoring device 2000 gives a higher-level warning as the duration of the conversation increases. For example, the conversation monitoring device 2000 performs the first-level warning process of "moving to a position within a predetermined distance from the person group 40" when the duration of the conversation is P1 or longer. Next, when the duration of the conversation is P2 (> P1) or longer, it performs the second-level warning process of "displaying the warning screen on the display device or projecting the warning image onto the ground". Then, when the duration of the conversation is P3 (> P2) or longer, it performs the third-level warning process of "outputting the warning voice from the speaker".
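  • The escalation itself reduces to mapping the conversation duration onto the warning levels; the concrete values of P1, P2, and P3 below are assumptions for illustration.

        # Assumed duration thresholds (seconds) for the three warning levels.
        P1, P2, P3 = 30.0, 60.0, 120.0

        def warning_level(conversation_duration_sec):
            """Map conversation duration to an escalating warning level (0 = none)."""
            if conversation_duration_sec >= P3:
                return 3  # output the warning voice from the speaker
            if conversation_duration_sec >= P2:
                return 2  # display the warning screen / project the warning image
            if conversation_duration_sec >= P1:
                return 1  # move within a predetermined distance of the group
            return 0
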
  • As a policy for restricting conversations in the monitoring area, it is conceivable to adopt a policy that conversations at a short distance are permitted if appropriate measures are taken to prevent infectious diseases.
  • In this case, the specific condition is that "appropriate measures have been taken to prevent infectious diseases". More specific examples include the condition that "all the persons 10 included in the person group 40 are wearing masks" and the condition that "the persons 10 included in the person group 40 are separated from each other by a partition".
  • the timing for determining whether or not the above-mentioned specific conditions are satisfied may be either before or after the conversation determination is performed for the person group 40.
  • Alternatively, the above-mentioned warning level may be changed depending on whether or not the specific condition is satisfied. That is, the warning level used when the specific condition is satisfied is made lower than the warning level used when it is not satisfied. In this way, for example, a more modest warning can be given when appropriate measures against infection have been taken, and a more prominent warning can be given when they have not.
  • Non-transitory computer-readable media include various types of tangible storage media.
  • Examples of non-transitory computer-readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM, CD-R, CD-R/W, and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM).
  • The program may also be provided to the computer by various types of transitory computer-readable media. Examples of transitory computer-readable media include electric signals, optical signals, and electromagnetic waves.
  • A transitory computer-readable medium can supply the program to the computer via a wired communication path such as an electric wire or an optical fiber, or via a wireless communication path.
  • (Appendix 1) A conversation monitoring device comprising: a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, analyzes the first video data, and determines whether or not the plurality of persons are having a conversation; a movement control unit that, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moves a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and a second determination unit that, after the mobile robot has moved, analyzes second video data obtained from the first camera or voice data obtained from the microphone and determines whether or not the plurality of persons are having a conversation.
  • (Appendix 2) The conversation monitoring device according to Appendix 1, wherein the first determination unit determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the orientation of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, and determines that it cannot be determined whether or not the plurality of persons are having a conversation when there is a person among the plurality of persons for whom the movement of the mouth, the orientation of the face, or the direction of the line of sight cannot be identified.
  • (Appendix 3) The conversation monitoring device according to Appendix 1, wherein the first determination unit calculates, for the plurality of persons included in the first video data, both the probability that a conversation is taking place and the probability that no conversation is taking place, and determines that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are less than a threshold.
  • (Appendix 4) The conversation monitoring device according to any one of Appendices 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the orientation of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
• (Appendix 5) The conversation monitoring device according to any one of Appendices 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the audio data and the distance between the mobile robot and the plurality of persons at the time the audio data was obtained.
• (Appendix 6) The conversation monitoring device according to any one of Appendices 1 to 5, wherein the mobile robot is equipped with a projector; before being controlled by the movement control unit, the mobile robot moves through the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance; the conversation monitoring device detects the distance image and a plurality of persons from the second video data and, by comparing the distance between the detected persons with the size of the distance image, detects a plurality of persons located within the first predetermined distance of each other; and the first determination unit acquires, as the first video data, the second video data in which the plurality of persons have been detected.
• (Appendix 7) The conversation monitoring device according to any one of Appendices 1 to 6, wherein the movement control unit identifies the face direction or gaze direction of a person around the mobile robot by analyzing the second video data, and moves the mobile robot so that it does not pass through a region located in the identified face direction or gaze direction.
• (Appendix 8) The conversation monitoring device according to any one of Appendices 1 to 7, wherein a second camera that images part of the monitoring area is provided; the mobile robot moves through a region that cannot be imaged by the second camera before being controlled by the movement control unit; and the first video data is video data generated by the first camera or the second camera.
• (Appendix 9) The conversation monitoring device according to Appendix 8, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
• (Appendix 10) The conversation monitoring device according to any one of Appendices 1 to 9, wherein, when the first determination unit or the second determination unit determines that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, the predetermined coping process including one or more of: a process of bringing the mobile robot closer to the plurality of persons; a process of causing a display device or projector provided on the mobile robot to output information indicating a warning; and a process of causing a speaker provided on the mobile robot to output a warning sound.
• (Appendix 11) A control method performed by a computer, comprising: a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area have been detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation; a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a position within a second predetermined distance of the plurality of persons; and a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or audio data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
• (Appendix 13) The control method according to Appendix 11, wherein, in the first determination step, both the probability that a conversation is taking place and the probability that no conversation is taking place are calculated for the plurality of persons included in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are less than a threshold value.
• (Appendix 14) The control method according to any one of Appendices 11 to 13, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or gaze direction of each of the plurality of persons included in the second video data.
• (Appendix 15) The control method according to any one of Appendices 11 to 13, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the loudness of the voice represented by the audio data and the distance between the mobile robot and the plurality of persons at the time the audio data was obtained.
• (Appendix 16) The control method according to any one of Appendices 11 to 15, wherein the mobile robot is equipped with a projector; before being controlled by the movement control step, the mobile robot moves through the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance; a plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and, in the first determination step, the second video data in which the plurality of persons have been detected is acquired as the first video data.
• (Appendix 18) The control method according to any one of Appendices 11 to 17, wherein a second camera that images part of the monitoring area is provided; the mobile robot moves through a region that cannot be imaged by the second camera before being controlled by the movement control step; and the first video data is video data generated by the first camera or the second camera.
• (Appendix 19) The control method according to Appendix 18, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
• (Appendix 20) The control method according to any one of Appendices 11 to 19, wherein, when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, the predetermined coping process including one or more of: a process of bringing the mobile robot closer to the plurality of persons; a process of causing a display device or projector provided on the mobile robot to output information indicating a warning; and a process of causing a speaker provided on the mobile robot to output a warning sound.
• (Appendix 21) A computer-readable medium storing a program that causes a computer to execute: a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area have been detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation; a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a position within a second predetermined distance of the plurality of persons; and a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or audio data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
• (Appendix 23) The computer-readable medium according to Appendix 21, wherein, in the first determination step, both the probability that a conversation is taking place and the probability that no conversation is taking place are calculated for the plurality of persons included in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are less than a threshold value.
• (Appendix 24) The computer-readable medium according to any one of Appendices 21 to 23, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or gaze direction of each of the plurality of persons included in the second video data.
• (Appendix 25) The computer-readable medium according to any one of Appendices 21 to 23, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the loudness of the voice represented by the audio data and the distance between the mobile robot and the plurality of persons at the time the audio data was obtained.
• (Appendix 26) The computer-readable medium according to any one of Appendices 21 to 25, wherein the mobile robot is equipped with a projector; before being controlled by the movement control step, the mobile robot moves through the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance; a plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and, in the first determination step, the second video data in which the plurality of persons have been detected is acquired as the first video data.
• (Appendix 27) The computer-readable medium according to any one of Appendices 21 to 26, wherein, in the movement control step, the face direction or gaze direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so that it does not pass through a region located in the identified face direction or gaze direction.
• (Appendix 28) The computer-readable medium according to any one of Appendices 21 to 27, wherein a second camera that images part of the monitoring area is provided; the mobile robot moves through a region that cannot be imaged by the second camera before being controlled by the movement control step; and the first video data is video data generated by the first camera or the second camera.
• (Appendix 30) The computer-readable medium according to any one of Appendices 21 to 29, wherein, when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, the predetermined coping process including one or more of: a process of bringing the mobile robot closer to the plurality of persons; a process of causing a display device or projector provided on the mobile robot to output information indicating a warning; and a process of causing a speaker provided on the mobile robot to output a warning sound.


Abstract

A conversation monitoring device (2000) acquires video data (32) in which a plurality of persons (10) (group of persons (40)) positioned within a first predetermined distance of each other inside a monitored area have been detected, and analyzes the video data (32) to determine whether or not a conversation is taking place among the group of persons (40). If the existence of a conversation cannot be determined, the conversation monitoring device (2000) causes a mobile robot (20) provided with a camera (22) to move to a location where an image of the face of each person (10) included in the group of persons (40) can be captured, or causes the mobile robot (20) provided with a microphone (24) to move to a location within a second predetermined distance from the group of persons (40). The conversation monitoring device (2000) analyzes video data (23) obtained from the camera (22) or audio data (25) obtained from the microphone (24) to determine whether or not a conversation is taking place among the group of persons (40).

Description

Conversation monitoring device, control method, and computer-readable medium
The present invention relates to a technique for detecting conversations between a plurality of persons.
From the viewpoint of preventing the spread of infectious diseases, there are situations in which it is preferable to avoid long conversations held at short distances. Systems have therefore been developed to detect situations in which a conversation is being held at a short distance for a long time. For example, Patent Document 1 discloses a technique that uses images obtained from a camera installed in a facility to detect that a resident and a visitor have had a conversation for a predetermined time or longer, and notifies, in response to the detection, that the risk of contracting an infectious disease is high. In Patent Document 1, a state in which people face each other at a short distance is detected as a state of having a conversation.
[Patent Document 1] International Publication No. 2019/239813
The system of Patent Document 1 uses images obtained from a camera fixedly installed in the facility to detect that a resident and a visitor are facing each other. However, with a fixedly installed camera, the face of a resident or visitor may not be detectable from the camera's images, for example because the person is in the camera's blind spot. As a result, even if the resident and the visitor are having a conversation, this may go undetected.
The present invention has been made in view of the above problems, and one of its objects is to provide a technique for accurately detecting a situation in which a conversation is taking place.
A conversation monitoring device of the present disclosure includes: a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area have been detected, and analyzes the first video data to determine whether or not the plurality of persons are having a conversation; a movement control unit that, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moves a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a position within a second predetermined distance of the plurality of persons; and a second determination unit that, after the mobile robot has been moved, analyzes second video data obtained from the first camera or audio data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
A control method of the present disclosure is executed by a computer. The control method includes: a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area have been detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation; a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a position within a second predetermined distance of the plurality of persons; and a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or audio data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
A computer-readable medium of the present disclosure stores a program that causes a computer to execute the control method of the present disclosure.
According to the present invention, there is provided a technique for accurately detecting a situation in which a conversation is taking place.
FIG. 1 is a diagram illustrating an overview of the conversation monitoring device of the first embodiment.
FIG. 2 is a diagram illustrating the functional configuration of the conversation monitoring device.
FIG. 3 is a block diagram illustrating the hardware configuration of a computer that realizes the conversation monitoring device.
FIG. 4 is a block diagram illustrating the hardware configuration of the mobile robot.
FIG. 5 is a flowchart illustrating the flow of processing executed by the conversation monitoring device of the first embodiment.
Hereinafter, embodiments of the present disclosure are described in detail with reference to the drawings. In the drawings, identical or corresponding elements are denoted by the same reference numerals, and duplicate descriptions are omitted where appropriate for clarity.
FIG. 1 illustrates an overview of the conversation monitoring device of the first embodiment (the conversation monitoring device 2000 of FIG. 2, described later). The following description given with reference to FIG. 1 is intended to facilitate understanding of the conversation monitoring device 2000 of the first embodiment; the operation of the conversation monitoring device 2000 is not limited to what is described below.
The conversation monitoring device 2000 detects a situation in which a plurality of persons 10 are having a conversation in a predetermined monitoring area. The monitoring area can be any place, such as an office, and may also be outdoors.
First, the conversation monitoring device 2000 analyzes video data 32 generated by a camera 30 and attempts to determine whether or not a conversation is taking place among a plurality of persons 10 (hereinafter, conversation determination). Here, the camera 30 may be the camera 22 mounted on the mobile robot 20 described later, or a camera other than the camera 22 (for example, a surveillance camera installed on a wall or ceiling). In the former case, the camera 22 mounted on the mobile robot 20 generates both the video data 32 and the video data 23 described later. Hereinafter, a set of a plurality of persons 10 subject to the determination of whether or not a conversation is taking place is referred to as a person group 40. The video data 32 may include a plurality of person groups 40.
Here, in the video data 32, the plurality of persons 10 included in the person group 40 are located within a predetermined distance L1 of each other. In other words, video data that includes a plurality of persons 10 whose distance from each other is the predetermined distance L1 or less is treated as the video data 32. When the person group 40 includes three or more persons 10, "the plurality of persons 10 are within the predetermined distance L1 of each other" means, for example, that all the persons 10 fit inside a circle of diameter L1.
Any value can be adopted as the predetermined distance L1. For example, the predetermined distance L1 is determined based on the distance that should be kept between people to prevent infection (so-called social distance). Specifically, a value defined as a social distance, or that value plus a predetermined margin, can be used as the predetermined distance L1.
When the conversation monitoring device 2000 cannot determine from the video data 32 whether or not a conversation is taking place in the person group 40, it uses the mobile robot 20 to make a further conversation determination for the person group 40. Specifically, the conversation determination using the video data 32 yields one of three results: 1) a conversation is taking place in the person group 40; 2) no conversation is taking place in the person group 40; or 3) it cannot be determined whether or not a conversation is taking place in the person group 40 (for example, neither the probability that a conversation is taking place nor the probability that no conversation is taking place is sufficiently high). When the result is 3), the conversation monitoring device 2000 makes a further conversation determination using the mobile robot 20.
The conversation monitoring device 2000 uses video data 23 obtained from the camera 22 mounted on the mobile robot 20, or audio data 25 obtained from the microphone 24 mounted on the mobile robot 20. Either one or both of the camera 22 and the microphone 24 may be provided. The conversation monitoring device 2000 moves the mobile robot 20 to an appropriate position (hereinafter, the destination) so that it can determine whether or not a conversation is taking place in the person group 40.
For example, suppose the video data 23 is used for the conversation determination. In this case, the conversation monitoring device 2000 moves the mobile robot 20 to a position where the face of each person 10 included in the person group 40 can be imaged, and makes a conversation determination for the person group 40 using the video data 23 obtained after the movement.
On the other hand, suppose the audio data 25 is used for the conversation determination. In this case, the conversation monitoring device 2000 moves the mobile robot 20 to a position at a predetermined distance L2 from the person group 40, and makes a conversation determination for the person group 40 using the audio data 25 obtained after the movement.
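As a reference for the audio-based route, the sketch below shows one plausible way such a judgment could use the loudness of the audio data 25 together with the robot-to-group distance (as recited in Appendix 5). The free-field attenuation model, the 55 dB threshold, the 1 m reference, and all names are assumptions for illustration, not the patent's specification.

```python
import math

# A minimal sketch, assuming sound pressure level falls off by
# 20*log10 of the distance ratio (free-field model). The threshold for
# conversational speech and the reference distance are illustrative only.
def is_conversation_by_audio(measured_db: float,
                             robot_to_group_m: float,
                             speech_threshold_db: float = 55.0,
                             reference_m: float = 1.0) -> bool:
    # Estimate the level at the reference distance from the person group,
    # then compare it with a typical conversational speech level.
    estimated_db = measured_db + 20.0 * math.log10(
        max(robot_to_group_m, reference_m) / reference_m)
    return estimated_db >= speech_threshold_db
```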
When it is determined that a conversation is taking place in the person group 40, the conversation monitoring device 2000 performs a predetermined coping process (for example, a warning process).
<Example of Advantageous Effects>
The conversation monitoring device 2000 of the present embodiment uses the mobile robot 20 when it cannot be determined, even from the video data 32 in which a plurality of persons 10 within the predetermined distance L1 of each other (the person group 40) have been detected, whether or not these persons 10 are having a conversation. Specifically, the conversation monitoring device 2000 moves the mobile robot 20 to an appropriate position and uses the video data 23 or audio data 25 obtained from the mobile robot 20 to make a conversation determination for the person group 40. Thus, according to the conversation monitoring device 2000 of the present embodiment, when it cannot be determined whether or not a plurality of persons 10 located within a specific distance of each other are having a conversation, the mobile robot 20 can be controlled to obtain video data 23 or audio data 25 that enables the determination. It is therefore possible to determine with higher accuracy whether or not a plurality of persons 10 located within a specific distance are having a conversation.
Hereinafter, the conversation monitoring device 2000 of the present embodiment is described in more detail.
<Example of Functional Configuration>
FIG. 2 illustrates the functional configuration of the conversation monitoring device 2000. The conversation monitoring device 2000 has a first determination unit 2020, a movement control unit 2040, and a second determination unit 2060. The first determination unit 2020 analyzes the video data 32 to determine whether or not a conversation is taking place in the person group 40. When the determination using the video data 32 cannot establish whether or not a conversation is taking place in the person group 40, the movement control unit 2040 moves the mobile robot 20 provided with the camera 22 to a position where the face of each person 10 included in the person group 40 can be imaged, or moves the mobile robot 20 provided with the microphone 24 to a position at the predetermined distance L2 or less from the person group 40. After the mobile robot 20 has moved, the second determination unit 2060 analyzes the video data 23 obtained from the camera 22 or the audio data 25 obtained from the microphone 24 to determine whether or not a conversation is taking place in the person group 40.
<Example of Hardware Configuration>
Each functional component of the conversation monitoring device 2000 may be realized by hardware that implements that component (for example, a hard-wired electronic circuit) or by a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls it). The following further describes the case where each functional component of the conversation monitoring device 2000 is realized by a combination of hardware and software.
FIG. 3 is a block diagram illustrating the hardware configuration of a computer 500 that realizes the conversation monitoring device 2000. The computer 500 is any computer. For example, the computer 500 is a stationary computer such as a PC (Personal Computer) or a server machine. Alternatively, the computer 500 may be a portable computer such as a smartphone or a tablet terminal, or the controller built into the mobile robot 20 (the controller 600 described later). In the latter case, the conversation monitoring device 2000 is realized as the mobile robot 20 (that is, the mobile robot 20 also functions as the conversation monitoring device 2000). The computer 500 may be a dedicated computer designed to realize the conversation monitoring device 2000, or a general-purpose computer.
For example, by installing a predetermined application on the computer 500, each function of the conversation monitoring device 2000 is realized on the computer 500. The application consists of a program for realizing the functional components of the conversation monitoring device 2000.
The computer 500 has a bus 502, a processor 504, a memory 506, a storage device 508, an input/output interface 510, and a network interface 512. The bus 502 is a data transmission path over which the processor 504, the memory 506, the storage device 508, the input/output interface 510, and the network interface 512 exchange data. However, the method of connecting the processor 504 and the other components to one another is not limited to a bus connection.
The processor 504 is any of various processors such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (Field-Programmable Gate Array). The memory 506 is a main storage device realized using a RAM (Random Access Memory) or the like. The storage device 508 is an auxiliary storage device realized using a hard disk, an SSD (Solid State Drive), a memory card, a ROM (Read Only Memory), or the like.
The input/output interface 510 is an interface for connecting the computer 500 to input/output devices. For example, an input device such as a keyboard and an output device such as a display device are connected to the input/output interface 510.
The network interface 512 is an interface for connecting the computer 500 to a wireless network. This network may be a LAN (Local Area Network) or a WAN (Wide Area Network). For example, the computer 500 is communicably connected to the mobile robot 20 via the network interface 512 and the wireless network.
The storage device 508 stores the program that realizes each functional component of the conversation monitoring device 2000 (the program that realizes the aforementioned application). The processor 504 reads this program into the memory 506 and executes it, thereby realizing each functional component of the conversation monitoring device 2000.
The conversation monitoring device 2000 may be realized by one computer 500 or by a plurality of computers 500. In the latter case, the configurations of the computers 500 need not be identical and can differ from one another.
<Example of Hardware Configuration of the Mobile Robot 20>
FIG. 4 is a block diagram illustrating the hardware configuration of the mobile robot 20. The mobile robot 20 has a camera 22, a microphone 24, an actuator 26, moving means 27, and a controller 600. The mobile robot 20 moves by operating the moving means 27 according to the output of the actuator 26. For example, the moving means 27 is a means for traveling, such as wheels; in this case the mobile robot 20 travels through the monitoring area. Alternatively, the moving means 27 may be a means for flight, such as a propeller; in this case the mobile robot 20 flies through the monitoring area. The output of the actuator 26 is controlled by the controller 600.
The controller 600 is any computer and is realized by, for example, an integrated circuit such as an SoC (System on a Chip) or SiP (System in a Package). Alternatively, the controller 600 may be realized by a mobile terminal such as a smartphone. The controller 600 has a bus 602, a processor 604, a memory 606, a storage device 608, an input/output interface 610, and a network interface 612, which have the same functions as the bus 502, the processor 504, the memory 506, the storage device 508, the input/output interface 510, and the network interface 512, respectively.
<Flow of Processing>
FIG. 5 is a flowchart illustrating the flow of processing executed by the conversation monitoring device 2000 of the first embodiment. The first determination unit 2020 acquires the video data 32 including the person group 40 and analyzes it to make a conversation determination for the person group 40 (S102). When it is determined that a conversation is taking place in the person group 40 (S102: conversation), the conversation monitoring device 2000 executes a predetermined coping process (S104). When it is determined that no conversation is taking place in the person group 40 (S102: no conversation), the processing of FIG. 5 ends.
When it is determined that it cannot be established whether or not a conversation is taking place in the person group 40 (S102: indeterminable), loop processing A is executed. Loop processing A consists of S106 to S112. The movement control unit 2040 moves the mobile robot 20 (S108). The second determination unit 2060 makes a conversation determination using the video data 23 or audio data 25 obtained from the mobile robot 20 after the movement (S110).
When it is determined that a conversation is taking place in the person group 40 (S110: conversation), the conversation monitoring device 2000 executes the coping process (S104). When it is determined that no conversation is taking place in the person group 40 (S110: no conversation), the processing of FIG. 5 ends. When it still cannot be determined whether or not a conversation is taking place in the person group 40 (S110: indeterminable), the processing returns to step S106 and loop processing A is executed again. In this way, the conversation determination is repeated while moving the mobile robot 20 until the presence or absence of a conversation can be established.
In the flowchart of FIG. 5, loop processing A continues to be executed as long as the presence or absence of a conversation in the person group 40 cannot be established. However, even when the presence or absence of a conversation cannot be established, loop processing A and the processing of FIG. 5 may be terminated once a predetermined termination condition is satisfied. For example, in S106 the conversation monitoring device 2000 determines whether or not the predetermined termination condition is satisfied; if it is satisfied, loop processing A ends and the processing of FIG. 5 ends, and if it is not, loop processing A continues.
Any condition can be adopted as the predetermined termination condition. For example, the predetermined termination condition is that a predetermined time has elapsed since the conversation determination (S102) was first made for the person group 40. Alternatively, the predetermined termination condition may be that the distance between the persons 10 included in the person group 40 has become larger than the predetermined distance L1. In this way, a plurality of persons 10 whose distance from each other has become larger than the predetermined distance can be excluded from the targets of conversation determination.
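A minimal sketch of loop processing A, including the two example termination conditions above, might look as follows; the 60-second timeout, the helper methods, and the string verdicts are assumptions for illustration.

```python
import time

def loop_a(second_determiner, robot, group40, l1_m: float, timeout_s: float = 60.0):
    """Repeat S106-S112 until the presence or absence of a conversation is
    established or a termination condition is satisfied."""
    start = time.monotonic()
    while True:
        # S106: example termination conditions described above.
        if time.monotonic() - start > timeout_s:
            return None                              # give up: still unknown
        if group40.max_pairwise_distance() > l1_m:
            return None                              # persons moved apart
        robot.move_toward(group40)                   # S108
        verdict = second_determiner.judge(group40)   # S110: video 23 or audio 25
        if verdict in ("conversation", "no conversation"):
            return verdict                           # caller handles S104
```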
<Regarding the Video Data 32>
The video data 32 is video data generated by the camera 30 in which a person group 40 (a plurality of persons 10 whose distance from each other is the predetermined distance L1 or less) has been detected. Here, a device that performs the process of detecting the person group 40 from video data is called a person group detection device. The person group detection device may be the conversation monitoring device 2000 (that is, the conversation monitoring device 2000 may also function as the person group detection device) or a device other than the conversation monitoring device 2000.
The person group detection device detects a plurality of persons 10 from the video data generated by the camera 30, and detects these persons 10 as a person group 40 by determining that the distance between them is the predetermined distance L1 or less. A plurality of cameras 30 may be provided.
There are various methods for determining that the distance between persons 10 is the predetermined distance L1 or less. For example, the person group detection device analyzes the video data generated by the camera 30 and detects a plurality of persons 10 from it. When a plurality of persons 10 are detected, the person group detection device controls a projector to project an image representing a specific distance (hereinafter, a distance image) onto the ground. The distance image is projected at a position where both the detected persons 10 and the distance image can be included in the imaging range of the camera 30. The distance represented by the distance image is, for example, the predetermined distance L1 described above. The projector may be mounted on the mobile robot 20 or installed elsewhere (for example, on the ceiling).
The person group detection device detects the plurality of persons 10 and the distance image from the video data generated by the camera 30, and compares the distance between the persons 10 with the size of the distance image (that is, the predetermined distance L1 in the image). When the distance between the persons 10 is smaller than the size of the distance image, the person group detection device detects these persons 10 as a person group 40. The person group detection device then provides the conversation monitoring device 2000, as the video data 32, with the video data in which the person group 40 was detected and the video data generated by the camera 30 thereafter.
The method of determining that the distance between persons 10 is the predetermined distance L1 or less is not limited to the above; other existing techniques may be used.
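For illustration, the comparison itself can be as simple as the following sketch, which assumes the persons' positions and the projected distance image have already been detected as pixel coordinates and a pixel length in the same frame; because the distance image is projected near the persons, its pixel size approximates the predetermined distance L1 at that location. All names are hypothetical.

```python
from itertools import combinations

def detect_group(person_centers, distance_image_px: float):
    """Return indices of persons whose pairwise image distance is smaller
    than the projected image of the predetermined distance L1."""
    group = set()
    for (i, a), (j, b) in combinations(enumerate(person_centers), 2):
        d = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
        if d < distance_image_px:
            group.update((i, j))
    return sorted(group)

# Example: two persons 80 px apart, one far away; L1 projects to 120 px.
print(detect_group([(100, 200), (180, 200), (900, 700)], 120.0))  # [0, 1]
```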
In addition to the video data 32, the conversation monitoring device 2000 is preferably provided with information that can identify the position of the person group 40, for example information representing the position of the person group 40 in map data representing a map of the monitoring area. An existing technique can be used to identify the position, in the map data, of an object imaged by a camera.
<Determination by the First Determination Unit 2020: S102>
The first determination unit 2020 makes a conversation determination for the person group 40 included in the video data 32 (S102). Specifically, the first determination unit 2020 makes the conversation determination based on the state of the face of each person 10 included in the person group 40. Methods of conversation determination using video data are illustrated below.
<<Determination Based on Mouth Movement>>
For example, the first determination unit 2020 determines the presence or absence of a conversation by determining whether or not each person 10 included in the person group 40 is moving their mouth. For example, if even one of the persons 10 included in the person group 40 is moving their mouth, the first determination unit 2020 determines that all the persons 10 in the person group 40 are having a conversation (that is, that a conversation is taking place in the person group 40). If none of the persons in the person group 40 is moving their mouth, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Furthermore, if no person 10 moving their mouth is detected and a person 10 for whom it cannot be determined whether they are moving their mouth is detected, the first determination unit 2020 determines that the presence or absence of a conversation cannot be established.
The first determination unit 2020 may instead determine that a conversation is being held only by those persons 10 in the person group 40 who are moving their mouths. In this case, the first determination unit 2020 excludes from the person group 40 any person 10 determined not to be moving their mouth. If no person 10 for whom mouth movement cannot be determined is detected from the video data 32 and two or more persons 10 moving their mouths are detected, the first determination unit 2020 excludes the persons 10 not moving their mouths from the person group 40 and determines that a conversation is taking place in the person group 40. If no such indeterminable person 10 is detected and at most one person 10 is moving their mouth, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Furthermore, if a person 10 for whom mouth movement cannot be determined is detected from the video data 32, the first determination unit 2020 excludes the persons 10 determined not to be moving their mouths from the person group 40 and determines that the presence or absence of a conversation cannot be established.
Here, a person 10 for whom it cannot be determined whether they are moving their mouth is, for example, a person whose mouth is not captured in the video data 32 (for example, because they have their back to the camera generating the video data 32 or their mouth is occluded by an obstacle), or a person for whom the confidence of the mouth-movement determination is low.
For example, the first determination unit 2020 is configured to calculate, from time-series data of the image region representing a person's mouth and its surroundings, both the probability that the person is moving their mouth and the probability that they are not. Based on these probabilities, the first determination unit 2020 identifies the person's state as one of: 1) moving their mouth; 2) not moving their mouth; or 3) indeterminable.
For example, a predetermined value T1 is set in advance as a threshold for these probabilities. If the probability that the person is moving their mouth is T1 or more, the person is determined to be moving their mouth. If the probability that the person is not moving their mouth is T1 or more, the person is determined not to be moving their mouth. If both probabilities are less than the threshold T1, it is determined that whether or not the person is moving their mouth cannot be established. For example, with the threshold T1 = 0.7, if "probability of moving the mouth = 0.6, probability of not moving the mouth = 0.4" is calculated, the result is "it cannot be determined whether or not the mouth is moving".
<<Determination Based on Face or Gaze Direction>>
For example, the first determination unit 2020 determines the presence or absence of a conversation based on the face or gaze direction of each person 10 included in the person group 40. The case using face direction is described more specifically below; unless otherwise noted, the description of the case using gaze direction is obtained by replacing "face" with "gaze" in the following.
For example, when the face of each person 10 in the person group 40 is directed toward some other person 10 in the person group 40, the first determination unit 2020 determines that all the persons 10 in the person group 40 are having a conversation (that is, that a conversation is taking place in the person group 40). When none of the faces of the persons in the person group 40 is directed toward another person 10 in the group, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Furthermore, when a person 10 whose face direction cannot be determined is detected in the person group 40, the first determination unit 2020 determines that the presence or absence of a conversation cannot be established.
The first determination unit 2020 may instead determine that a conversation is being held only by those persons 10 whose faces are directed toward another person 10 in the person group 40. In this case, the first determination unit 2020 excludes from the person group 40 any person 10 determined not to be facing another person 10 in the group. If no person 10 whose face direction cannot be determined is detected from the video data 32 and two or more persons 10 facing another person 10 are detected, the first determination unit 2020 excludes the persons 10 not facing another person 10 from the person group 40 and determines that a conversation is taking place in the person group 40. If no such person is detected and at most one person 10 is facing another person 10, the first determination unit 2020 determines that no conversation is taking place in the person group 40. Furthermore, if a person 10 whose face direction cannot be determined is detected from the video data 32, the first determination unit 2020 excludes the persons 10 not facing another person 10 from the person group 40 and determines that the presence or absence of a conversation cannot be established.
A person 10 whose face direction cannot be determined is, for example, a person whose face is not captured in the video data 32 (for example, because they have their back to the camera generating the video data 32 or their face is occluded by an obstacle), or a person for whom the confidence of the face-direction determination is low. When gaze direction is used instead of face direction, the eye region is analyzed instead of the face region.
For example, the first determination unit 2020 is configured to calculate, from time-series data of the image region representing a person's face, the probability that the face is oriented in each of a plurality of directions (for example, four or eight predetermined directions). If there is a direction whose calculated probability is equal to or greater than a threshold, the first determination unit 2020 identifies that direction as the face direction of the person 10. If the probabilities calculated for all directions are less than the threshold, the first determination unit 2020 determines that the face direction of the person 10 cannot be determined.
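The same threshold pattern applies per direction; the sketch below assumes a classifier that returns one probability per predetermined direction, and returns None when no direction reaches the threshold (face direction indeterminable).

```python
def face_direction(probs_by_direction, threshold: float = 0.7):
    """Pick the most probable of the predetermined directions, or None."""
    direction, p = max(probs_by_direction.items(), key=lambda kv: kv[1])
    return direction if p >= threshold else None

# Example with four predetermined directions:
print(face_direction({"north": 0.10, "east": 0.75, "south": 0.10, "west": 0.05}))  # east
print(face_direction({"north": 0.30, "east": 0.30, "south": 0.25, "west": 0.15}))  # None
```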
<<Method Using a Model>>
 The first determination unit 2020 may have a trained model that identifies, in response to input of video data containing the faces of a plurality of persons 10, whether or not those persons 10 are having a conversation. In response to such input, the model outputs, for example, one of three determination results for the plurality of persons 10: 1) a conversation is being held, 2) no conversation is being held, or 3) it cannot be determined whether a conversation is being held. Such a model can be realized by, for example, a recurrent neural network (RNN).
 For example, the model calculates both the probability that a conversation is being held and the probability that no conversation is being held, and compares these probabilities with a threshold value. If the probability that a conversation is being held is equal to or greater than the threshold value, the determination result that a conversation is being held is output. If the probability that no conversation is being held is equal to or greater than the threshold value, the determination result that no conversation is being held is output. If both probabilities are below the threshold value, the determination result that it cannot be determined whether a conversation is being held is output.
 The above model is trained in advance using training data composed of pairs of video data and a ground-truth label (a label indicating whether or not a conversation is being held). Existing techniques can be used to train a model with training data composed of pairs of input data and ground-truth labels.
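 A minimal sketch of the three-way decision rule described above is shown below (Python); the model itself is abstracted to a pair of probabilities, the threshold value is an illustrative assumption, and resolving the case where both probabilities clear the threshold in favor of the "conversation" verdict is an arbitrary choice not fixed by the specification.

```python
from enum import Enum

class Verdict(Enum):
    TALKING = 1          # 1) a conversation is being held
    NOT_TALKING = 2      # 2) no conversation is being held
    UNDETERMINABLE = 3   # 3) presence of a conversation cannot be determined

def classify_conversation(p_talking: float, p_not_talking: float,
                          threshold: float = 0.6) -> Verdict:
    """Map the model's two probabilities to one of the three verdicts."""
    if p_talking >= threshold:
        return Verdict.TALKING
    if p_not_talking >= threshold:
        return Verdict.NOT_TALKING
    return Verdict.UNDETERMINABLE  # both probabilities below the threshold
```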
<Control of the Mobile Robot 20 and Determination by the Second Determination Unit 2060: S108, S110>
 The movement control unit 2040 moves the mobile robot 20 toward a position where video data 23 or voice data 25 from which the presence or absence of a conversation in the person group 40 can be determined is obtained (S108). The second determination unit 2060 then performs the conversation determination for the person group 40 using the video data 23 or voice data 25 obtained from the mobile robot 20 during or after the movement. The method of controlling the movement of the mobile robot 20 and the method of the conversation determination are described below, separately for the case where the video data 23 is used and the case where the voice data 25 is used.
<<Case Using the Video Data 23>>
 The method of the conversation determination using the video data 23 is the same as the conversation determination using the video data 32 described above. Therefore, the conversation determination using the video data 23 uses the mouth movement, face direction, or line-of-sight direction of each person 10. The movement control unit 2040 accordingly moves the mobile robot 20 toward a position where the information necessary to identify the mouth movement, face direction, or line-of-sight direction of each person 10 in the person group 40 can be obtained. The information necessary to identify the mouth movement, the face direction, and the line-of-sight direction is, respectively, an image region containing the mouth, an image region containing the face, and an image region containing the eyes.
 For example, the movement control unit 2040 moves the mobile robot 20 so as to approach the person group 40. As another example, the movement control unit 2040 moves the mobile robot 20 to a position where there is no obstacle between the mobile robot 20 and a person 10 in the person group 40. Existing techniques can be used to move a mobile robot toward a specific object contained in the video data obtained from its on-board camera, or to a position where no obstacle lies between the robot and that object.
 In order for the mouth and eyes of a person 10 to be captured in the video data 23, it is preferable to move the mobile robot 20 to a position in front of the face of the person 10. For example, the movement control unit 2040 calculates the face direction of each of the plurality of persons 10 in the person group 40 and moves the mobile robot 20 to the front of the face of each person 10 in turn. In this way, the movement control unit 2040 identifies the mouth movement and line-of-sight direction of each person 10 one after another.
 As another example, the movement control unit 2040 may move the mobile robot 20 so that the mouths and eyes of the plurality of persons 10 can be imaged from a single location. For example, the movement control unit 2040 calculates the average of the face directions of the persons 10 from the video data 32 or the video data 23, and moves the mobile robot 20 to a position along that average direction.
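 The placement along the average facing direction can be sketched as follows (Python); representing each face direction as an angle in the floor plane and the standoff distance of 1.5 m are illustrative assumptions.

```python
import math

def average_direction(angles_rad):
    """Average a set of facing angles by summing their unit vectors."""
    x = sum(math.cos(a) for a in angles_rad)
    y = sum(math.sin(a) for a in angles_rad)
    return math.atan2(y, x)

def camera_goal(group_center, angles_rad, standoff=1.5):
    """A position on the mean facing direction, `standoff` meters from the group."""
    mean = average_direction(angles_rad)
    cx, cy = group_center
    return (cx + standoff * math.cos(mean), cy + standoff * math.sin(mean))
```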
 Suppose that, even after the mobile robot 20 has been moved to the front of the face of a person 10 in this way, the face direction of the person 10 cannot be identified from the video data 23. In this case, the second determination unit 2060 attempts to identify the face direction of the person 10 from the video data 23 while the movement control unit 2040 moves the mobile robot 20 closer to the person group 40 or around the person group 40. Once the face direction of the person 10 is identified, the movement control unit 2040 moves the mobile robot 20 to the front of the face of the person 10.
<<Case Using the Voice Data 25>>
 When the voice data 25 is used, the second determination unit 2060 performs the conversation determination for the person group 40 based on the relationship between the loudness of the sound contained in the voice data 25 and the distance to the person group 40. Even if a conversation is being held in the person group 40, it is difficult for the microphone 24 to pick up the voice of that conversation when the mobile robot 20 is far from the person group 40. The movement control unit 2040 therefore moves the mobile robot 20 to a position whose distance from the person group 40 is equal to or less than a predetermined distance L3. The predetermined distance L3 is set in advance as a distance at which the microphone 24 can detect the voice of a conversation held in the person group 40. The second determination unit 2060 then acquires the voice data 25 from the microphone 24 of the mobile robot 20 that has moved to within the predetermined distance L3 of the person group 40, and determines whether the loudness of the sound represented by the voice data 25 is equal to or greater than a threshold value. If the loudness is equal to or greater than the threshold value, the second determination unit 2060 determines that a conversation is being held in the person group 40. On the other hand, if the loudness is below the threshold value, the second determination unit 2060 determines that no conversation is being held in the person group 40.
 The above threshold value may be a fixed value, or may be set dynamically according to the distance from the mobile robot 20 to the person group 40. In the latter case, for example, a function defining the relationship between distance and threshold value is determined in advance. The second determination unit 2060 identifies the distance from the mobile robot 20 to the person group 40 at the time the voice data 25 was obtained from the microphone 24, obtains the threshold value by inputting that distance into the function, and compares the loudness of the sound represented by the voice data 25 with the obtained threshold value.
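 The loudness test with either a fixed or a distance-dependent threshold can be sketched as follows (Python); the linear attenuation model and all numeric values are illustrative assumptions, since the specification leaves the function relating distance to threshold unspecified.

```python
def loudness_threshold(distance_m: float, base_db: float = 60.0,
                       per_meter_db: float = 3.0) -> float:
    """Assumed function relating distance to the required loudness (dB):
    the farther the robot is, the quieter a real conversation appears."""
    return max(base_db - per_meter_db * distance_m, 30.0)

def is_conversing(volume_db: float, distance_m: float,
                  dynamic: bool = True, fixed_db: float = 50.0) -> bool:
    """Compare the measured loudness with a fixed or distance-dependent threshold."""
    threshold = loudness_threshold(distance_m) if dynamic else fixed_db
    return volume_db >= threshold
```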
 The second determination unit 2060 may also analyze the voice data 25 to determine whether it contains a human voice. In this case, the second determination unit 2060 determines that a conversation is being held in the person group 40 when the loudness of the sound represented by the voice data 25 is equal to or greater than the threshold value and the sound contains a human voice. On the other hand, when the loudness is below the threshold value or the sound contains no human voice, it determines that no conversation is being held in the person group 40. This prevents, for example, a situation in which a sound other than a human voice is occurring from being falsely detected as a situation in which the person group 40 is having a conversation.
 The second determination unit 2060 may also take into account the number of people whose voices are contained in the voice data 25. For example, the second determination unit 2060 determines that a conversation is being held in the person group 40 when the loudness of the sound represented by the voice data 25 is equal to or greater than the threshold value and the sound contains the voices of a plurality of persons. On the other hand, when the loudness is below the threshold value or the sound contains the voice of at most one person, it determines that no conversation is being held in the person group 40. This prevents, for example, a situation in which a single person is talking to himself or herself from being falsely detected as a situation in which the person group 40 is having a conversation.
 The second determination unit 2060 may also determine that the presence or absence of a conversation cannot be determined when the confidence of the determination as to whether the voice data 25 contains a human voice, or the confidence of the calculated number of people whose voices are contained in the voice data 25, is low. For example, it is determined that the presence or absence of a conversation cannot be determined when both the probability that the voice data 25 contains a human voice and the probability that it does not are below a predetermined threshold value.
 Furthermore, the second determination unit 2060 may have a trained model that identifies, in response to input of voice data, whether or not the voice data contains the voices of a plurality of persons 10 having a conversation. In response to such input, the model outputs, for example, one of three determination results: 1) a conversation is being held, 2) no conversation is being held, or 3) it cannot be determined whether a conversation is being held. Such a model can be realized by, for example, a recurrent neural network (RNN).
 For example, the model calculates both the probability that a conversation is being held and the probability that no conversation is being held, and compares these probabilities with a threshold value. If the probability that a conversation is being held is equal to or greater than the threshold value, the determination result that a conversation is being held is output. If the probability that no conversation is being held is equal to or greater than the threshold value, the determination result that no conversation is being held is output. If both probabilities are below the threshold value, the determination result that it cannot be determined whether a conversation is being held is output.
 The above model is trained in advance using training data composed of pairs of voice data and a ground-truth label (a label indicating whether or not a conversation is being held).
<<Calculation of the Movement Route>>
 In order to move the mobile robot 20 to a specific destination, a movement route to that destination is set using map data that the mobile robot 20 can refer to. Here, a device that calculates a movement route to a destination using map data and sets the calculated movement route in the mobile robot 20 is called a route setting device. The route setting device may be the mobile robot 20 itself, the conversation monitoring device 2000, or some other device.
 The route setting device acquires map data and calculates the movement route of the mobile robot 20 based on the map data and the destination (the position to which the mobile robot 20 should be moved) determined by the various methods described above. The route setting device then sets the calculated movement route in the mobile robot 20, and the mobile robot 20 moves according to the set movement route. When the route setting device is a device other than the conversation monitoring device 2000, the movement control unit 2040 provides the route setting device with information indicating the destination to be set in the mobile robot 20.
 Existing techniques can be used to calculate a movement route based on map data and destination information.
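 As one illustration of such an existing technique, a breadth-first search over an occupancy-grid map finds a shortest route in grid steps; the sketch below is Python, the grid encoding (0 = free cell, 1 = obstacle) is an assumption, and a practical planner would typically use A* or a comparable algorithm instead.

```python
from collections import deque

def plan_route(grid, start, goal):
    """Return a list of grid cells from start to goal, or None if unreachable.
    grid: 2D list where 0 marks a free cell and 1 an obstacle (assumed encoding)."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back along the predecessor chain
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None  # the goal is not reachable on this map
```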
<<Other Matters Concerning Movement Control>>
 The mobile robot 20 preferably moves so as not to interfere with the activities of people in the monitoring area. For example, the mobile robot 20 uses the video data 32 or the video data 23 to track the movement of each person in the monitoring area and moves so as not to come into contact with any of them. Existing techniques (for example, techniques for moving an autonomous vehicle so that it does not collide with other vehicles or pedestrians) can be adopted to move the mobile robot 20 while avoiding contact with people.
 As another example, the mobile robot 20 preferably moves so as not to enter the field of view of persons not included in the person group 40. For example, when a person 10 not included in the person group 40 is detected from the video data 23, the route setting device identifies the face direction or line-of-sight direction of that person 10. Based on the identified face direction or line-of-sight direction and the destination of the mobile robot 20, the route setting device then calculates a movement route by which the mobile robot 20 can reach the destination without entering the field of view of the person 10, and sets that movement route in the mobile robot 20.
 However, when the face direction or line-of-sight direction of a person 10 changes greatly and repeatedly, it may be difficult to move the mobile robot 20 so as to stay out of that person's field of view. For example, the route setting device may therefore detect from the video data only persons whose face direction or line-of-sight direction is unlikely to change greatly (for example, persons standing still or sitting in a chair), and set the movement route of the mobile robot 20 so as to stay out of the fields of view of the detected persons.
 The mobile robot 20 may be stationary or moving until it receives control from the movement control unit 2040. In the latter case, for example, a movement route is set for the mobile robot 20 so that it patrols part or all of the monitoring area. In particular, when the person group 40 is detected using the video data 23 (that is, when the camera 22 is also used as the camera 30), it is preferable to have the mobile robot 20 patrol the monitoring area so that the person group 40 can be detected at various places within it. Hereinafter, the movement route set in the mobile robot 20 for patrolling is also referred to as a patrol route.
 The patrol route preferably includes regions of the monitoring area where the density of people is high (that is, where there are many people). For example, the patrol route may be set to include only the regions of the monitoring area with a high density of people. As another example, the patrol route may be set so that regions with a high density of people are patrolled more frequently than regions with a low density of people.
 When the camera 30 is a camera fixed at a place other than the mobile robot 20, such as a surveillance camera, the patrol route of the mobile robot 20 preferably includes regions not covered by the imaging range of the camera 30 (hereinafter, out-of-range regions). In this way, the mobile robot 20 can image regions that are difficult for the fixed camera to capture, so the monitoring area can be monitored broadly.
 The patrol route may be set manually or automatically by the route setting device. In the latter case, for example, the route setting device identifies the out-of-range regions of the camera 30 by analyzing the video data generated by the camera 30, and generates a patrol route that includes those out-of-range regions. More specifically, the route setting device uses the map data of the monitoring area and the video data generated by the camera 30 to identify the region within the imaging range of the camera 30, and identifies the regions other than that region as the out-of-range regions.
 For example, suppose the out-of-range region is a single closed region. In this case, the route setting device generates a patrol route that circulates within that region. On the other hand, suppose the out-of-range regions are a plurality of regions that are not connected to each other. In this case, for example, the route setting device generates a patrol route that visits these out-of-range regions in turn, as sketched below. When a plurality of mobile robots 20 are provided in the monitoring area, a different patrol route may be set for each mobile robot 20. In that case, it is preferable that the patrol routes include mutually different out-of-range regions.
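 The ordering of disconnected out-of-range regions can be sketched with a greedy nearest-neighbor tour (Python); treating each region as a single center point is an illustrative simplification.

```python
import math

def patrol_order(start, region_centers):
    """Visit each out-of-range region once, always moving to the nearest
    unvisited one (greedy nearest-neighbor heuristic)."""
    remaining = list(region_centers)
    tour, pos = [], start
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(pos, p))
        remaining.remove(nearest)
        tour.append(nearest)
        pos = nearest
    return tour
```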
<Coping Process: S104>
 When it is determined that a conversation is being held in the person group 40, the conversation monitoring device 2000 executes a predetermined coping process (S104). Any process can be adopted as the coping process. For example, the coping process is a process of issuing a warning to the person group 40 (hereinafter, warning process). The warning process is, for example, a process of displaying a warning screen on a display device provided on the mobile robot 20, or of having a projector provided on the mobile robot 20 project an image representing a warning. As another example, the warning process is a process of outputting a warning sound from a speaker provided on the mobile robot 20.
 Here, the mobile robot 20 may issue the warning after approaching the person group 40 to some extent. For example, the conversation monitoring device 2000 may move the mobile robot 20 to a position whose distance from the person group 40 is equal to or less than a predetermined threshold value, and then have the mobile robot 20 output the various warnings described above.
 As another example, the conversation monitoring device 2000 may send a notification representing a warning to each person in the person group 40. In this case, information associating the identification information of each person 10 with a notification destination for that person (for example, an e-mail address) is stored in advance in a storage device accessible from the conversation monitoring device 2000. The conversation monitoring device 2000 identifies the identification information of each person in the person group 40 using the video data 32, the video data 23, or the voice data 25, and sends the notification to the destination corresponding to that identification information. When video data is used to identify a person 10, the identification information of the person 10 is, for example, an image feature of the person 10 (for example, a facial feature). The conversation monitoring device 2000 extracts the feature of a person 10 in the person group 40 from the video data 32 or the video data 23, and sends the notification to the destination associated with a matching feature (for example, a feature whose similarity is equal to or greater than a threshold value). When the voice data 25 is used, a feature of the voice of the person 10 is used as the identification information of the person 10.
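 The destination lookup by feature matching can be sketched as follows (Python); the registry of (feature, address) pairs, cosine similarity as the matching measure, and the 0.8 cutoff are illustrative assumptions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def lookup_address(feature, registry, threshold=0.8):
    """Return the notification address whose stored feature best matches the
    extracted feature, or None when no similarity reaches the threshold.
    registry: iterable of (stored_feature, address) pairs (assumed format)."""
    best_address, best_sim = None, threshold
    for stored_feature, address in registry:
        sim = cosine_similarity(feature, stored_feature)
        if sim >= best_sim:
            best_address, best_sim = address, sim
    return best_address
```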
 When a conversation in the person group 40 is detected, the conversation monitoring device 2000 may also issue a warning that is not limited to the person group 40 but calls the attention of other people as well. For example, the conversation monitoring device 2000 may control a broadcasting device to make an announcement (indoor, in-facility, or outdoor broadcasting) urging people to avoid conversations at close range, or to play a predetermined warning sound.
 The coping process is not limited to a warning process. For example, the conversation monitoring device 2000 may store information about the person group 40 that was having a conversation (identification information, or the video data 32 or video data 23 in which the person group 40 was captured) in a storage device. In this way, for example, if one member of the person group 40 is later found to have an infectious disease, the other members of the person group 40 can be identified as persons suspected of infection.
 The conversation monitoring device 2000 may also refrain from performing the coping process immediately upon detecting a conversation in the person group 40, and perform it only when the conversation has continued for a predetermined time. As another example, the conversation monitoring device 2000 may perform the coping process in multiple stages according to the duration of the conversation. In this case, information associating each of a plurality of warning levels with a different warning process is stored in advance in a storage device accessible from the conversation monitoring device 2000. For example, a higher warning level is associated with a more conspicuous (more effective) warning.
 The conversation monitoring device 2000 issues a higher-level warning the longer the conversation continues. For example, when the conversation duration reaches P1 or more, the conversation monitoring device 2000 performs the first-level warning process of moving the robot to a position within a predetermined distance of the person group 40. Next, when the conversation duration reaches P2 (> P1) or more, it performs the second-level warning process of displaying a warning screen on the display device or projecting a warning image on the ground. Then, when the conversation duration reaches P3 (> P2) or more, it performs the third-level warning process of outputting a warning sound from the speaker.
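 The staged escalation can be sketched as follows (Python); the specification only requires P1 < P2 < P3, so the concrete values below are illustrative.

```python
P1, P2, P3 = 30.0, 60.0, 120.0  # assumed duration thresholds in seconds (P1 < P2 < P3)

def warning_level(duration_s: float) -> int:
    """0 = no warning yet; 1 = approach the group; 2 = display or project a
    warning; 3 = output a warning sound from the speaker."""
    if duration_s >= P3:
        return 3
    if duration_s >= P2:
        return 2
    if duration_s >= P1:
        return 1
    return 0
```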
 By issuing multi-stage warnings according to the duration of the conversation in this way, an operation becomes possible in which a modest warning is issued while the conversation is still short and a more conspicuous warning is issued as the conversation continues. This makes it possible to balance the effectiveness of the warning against the degree to which the warning interferes with people's activities. That is, while the conversation is still short, the conversation-stopping effect may be small, so a warning designed to interfere with the conversation as little as possible is issued; once the conversation has continued for a long time, a warning with a strong conversation-stopping effect can be issued even if it interferes with the conversation to some extent.
<Coping Process Considering Conditions Other Than the Presence of a Conversation>
 The conversation monitoring device 2000 may refrain from performing the coping process on the person group 40 when the person group 40 satisfies a specific condition. For example, the specific condition is that appropriate measures for preventing infectious diseases are being taken. More specific examples include the condition that all persons 10 in the person group 40 are wearing masks, and the condition that the persons 10 in the person group 40 are separated by a partition. Here, as a policy for restricting conversation in the monitoring area, it is conceivable to adopt the policy that conversation at close range is permitted as long as appropriate measures for preventing infectious diseases are being taken. Adopting the above conditions regarding infection prevention makes it possible to operate under such a policy. The determination as to whether the specific condition is satisfied may be made either before or after the conversation determination is performed for the person group 40.
 The warning level described above may also be changed according to whether the specific condition is satisfied. That is, the warning level when the specific condition is satisfied is set lower than the warning level when it is not. In this way, for example, a more modest warning can be issued when appropriate infection-prevention measures are being taken, and a more conspicuous warning when they are not.
 Although the present invention has been described above with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the invention.
 In the above examples, the program can be stored and provided to a computer using various types of non-transitory computer readable media. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM, CD-R, CD-R/W, and semiconductor memories (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM). The program may also be provided to the computer by various types of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to the computer via a wired communication path such as an electric wire or optical fiber, or via a wireless communication path.
 Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to the following.
 (Supplementary Note 1)
 A conversation monitoring device comprising:
 a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area are detected, and that analyzes the first video data to determine whether or not the plurality of persons are having a conversation;
 a movement control unit that, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moves a mobile robot provided with a first camera to a position from which the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a place within a second predetermined distance of the plurality of persons; and
 a second determination unit that, after the mobile robot has been moved, analyzes second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
 (Supplementary Note 2)
 The conversation monitoring device according to Supplementary Note 1, wherein the first determination unit:
 determines whether or not the plurality of persons are having a conversation based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the first video data; and
 determines that it cannot be determined whether or not the plurality of persons are having a conversation when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified.
 (Supplementary Note 3)
 The conversation monitoring device according to Supplementary Note 1, wherein the first determination unit calculates, for the plurality of persons contained in the first video data, both the probability that a conversation is being held and the probability that no conversation is being held, and determines that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are below a threshold value.
 (Supplementary Note 4)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the second video data.
 (Supplementary Note 5)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons at the time the voice data was obtained.
 (Supplementary Note 6)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 5, wherein:
 the mobile robot is equipped with a projector;
 before accepting control by the movement control unit, the mobile robot moves within the monitoring area while projecting from the projector onto the ground a distance image representing the magnitude of the first predetermined distance;
 the plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and
 the first determination unit acquires, as the first video data, the second video data in which the plurality of persons are detected.
 (Supplementary Note 7)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 6, wherein the movement control unit identifies the face direction or line-of-sight direction of a person around the mobile robot by analyzing the second video data, and moves the mobile robot so that it does not pass through a region located in the identified face direction or line-of-sight direction.
 (Supplementary Note 8)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 7, wherein:
 a second camera that images part of the monitoring area is provided;
 before accepting control by the movement control unit, the mobile robot moves through a region that cannot be imaged by the second camera; and
 the first video data is video data generated by the first camera or the second camera.
 (Supplementary Note 9)
 The conversation monitoring device according to Supplementary Note 8, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
 (Supplementary Note 10)
 The conversation monitoring device according to any one of Supplementary Notes 1 to 9, wherein:
 when the first determination unit or the second determination unit determines that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons; and
 the predetermined coping process includes one or more of a process of moving the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information representing a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
 (Supplementary Note 11)
 A control method executed by a computer, comprising:
 a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area are detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation;
 a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position from which the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance of the plurality of persons; and
 a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
 (Supplementary Note 12)
 The control method according to Supplementary Note 11, wherein, in the first determination step:
 whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the first video data; and
 when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation.
 (Supplementary Note 13)
 The control method according to Supplementary Note 11, wherein, in the first determination step, both the probability that a conversation is being held and the probability that no conversation is being held are calculated for the plurality of persons contained in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are below a threshold value.
 (Supplementary Note 14)
 The control method according to any one of Supplementary Notes 11 to 13, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the second video data.
 (Supplementary Note 15)
 The control method according to any one of Supplementary Notes 11 to 13, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons at the time the voice data was obtained.
 (Supplementary Note 16)
 The control method according to any one of Supplementary Notes 11 to 15, wherein:
 the mobile robot is equipped with a projector;
 before accepting control by the movement control step, the mobile robot moves within the monitoring area while projecting from the projector onto the ground a distance image representing the magnitude of the first predetermined distance;
 the plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and
 in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
 (Supplementary Note 17)
 The control method according to any one of Supplementary Notes 11 to 16, wherein, in the movement control step, the face direction or line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so that it does not pass through a region located in the identified face direction or line-of-sight direction.
 (Supplementary Note 18)
 The control method according to any one of Supplementary Notes 11 to 17, wherein:
 a second camera that images part of the monitoring area is provided;
 before accepting control by the movement control step, the mobile robot moves through a region that cannot be imaged by the second camera; and
 the first video data is video data generated by the first camera or the second camera.
 (Supplementary Note 19)
 The control method according to Supplementary Note 18, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
 (Supplementary Note 20)
 The control method according to any one of Supplementary Notes 11 to 19, wherein:
 when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons; and
 the predetermined coping process includes one or more of a process of moving the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information representing a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
 (Supplementary Note 21)
 A computer readable medium storing a program, the program causing a computer to execute:
 a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance of each other in a monitoring area are detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation;
 a movement control step of, when it cannot be determined by analyzing the first video data whether or not the plurality of persons are having a conversation, moving a mobile robot provided with a first camera to a position from which the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance of the plurality of persons; and
 a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
 (Supplementary Note 22)
 The computer readable medium according to Supplementary Note 21, wherein, in the first determination step:
 whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the first video data; and
 when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation.
 (Supplementary Note 23)
 The computer readable medium according to Supplementary Note 21, wherein, in the first determination step, both the probability that a conversation is being held and the probability that no conversation is being held are calculated for the plurality of persons contained in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of these probabilities are below a threshold value.
 (Supplementary Note 24)
 The computer readable medium according to any one of Supplementary Notes 21 to 23, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the mouth movement, face direction, or line-of-sight direction of each of the plurality of persons contained in the second video data.
 (Supplementary Note 25)
 The computer readable medium according to any one of Supplementary Notes 21 to 23, wherein, in the second determination step, whether or not the plurality of persons are having a conversation is determined based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons at the time the voice data was obtained.
 (Supplementary Note 26)
 The computer readable medium according to any one of Supplementary Notes 21 to 25, wherein:
 the mobile robot is equipped with a projector;
 before accepting control by the movement control step, the mobile robot moves within the monitoring area while projecting from the projector onto the ground a distance image representing the magnitude of the first predetermined distance;
 the plurality of persons located within the first predetermined distance of each other are detected by detecting the distance image and a plurality of persons from the second video data and comparing the distance between the detected persons with the size of the distance image; and
 in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
 (Supplementary Note 27)
 The computer readable medium according to any one of Supplementary Notes 21 to 26, wherein, in the movement control step, the face direction or line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so that it does not pass through a region located in the identified face direction or line-of-sight direction.
 (Supplementary Note 28)
 The computer readable medium according to any one of Supplementary Notes 21 to 27, wherein:
 a second camera that images part of the monitoring area is provided;
 before accepting control by the movement control step, the mobile robot moves through a region that cannot be imaged by the second camera; and
 the first video data is video data generated by the first camera or the second camera.
 (Supplementary Note 29)
 The computer readable medium according to Supplementary Note 28, wherein the region that cannot be imaged by the second camera is identified using the video data generated by the second camera and map data of the monitoring area.
 (Supplementary Note 30)
 The computer readable medium according to any one of Supplementary Notes 21 to 29, wherein:
 when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons; and
 the predetermined coping process includes one or more of a process of moving the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information representing a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
Some or all of the above embodiments may also be described, but not limited to:
(Appendix 1)
Whether or not the plurality of persons are having a conversation by acquiring the first video data in which a plurality of persons located within the first predetermined distance from each other in the monitoring area are detected and analyzing the first video data. The first judgment unit that determines
If it is not possible to determine whether or not the plurality of persons are talking even by analyzing the first video data, the mobile robot provided with the first camera can capture the faces of the plurality of persons. A movement control unit that moves a mobile robot provided with a microphone to a position within a second predetermined distance from the plurality of persons.
After moving the mobile robot, the second video data obtained from the first camera or the voice data obtained from the microphone is analyzed to determine whether or not the plurality of persons are having a conversation. A conversation monitoring device having two determination units.
(Appendix 2)
The first determination unit is
Based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, it is determined whether or not the plurality of persons are having a conversation.
Addition: When there is a person who cannot specify the movement of the mouth, the direction of the face, or the direction of the line of sight among the plurality of persons, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation. The conversation monitoring device according to 1.
(Appendix 3)
The first determination unit calculates both the probability that a conversation is taking place and the probability that a conversation is not taking place for the plurality of persons included in the first video data, and determines the probability that the conversation is taking place. The conversation monitoring device according to Appendix 1, wherein it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of the probabilities that no conversation is taking place are less than the threshold value.
(Appendix 4)
The second determination unit determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data. The conversation monitoring device according to any one of Supplementary note 1 to 3 for determination.
(Appendix 5)
The second determination unit is based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data is obtained. The conversation monitoring device according to any one of Supplementary note 1 to 3, which determines whether or not a person is having a conversation.
(Appendix 6)
The mobile robot is equipped with a projector.
Before accepting the control by the movement control unit, the mobile robot moves in the monitoring area while projecting a distance image showing the magnitude of the first predetermined distance from the projector to the ground.
By detecting the distance image and a plurality of people from the second video data and comparing the distance between the detected people with the size of the distance image, the distance images are located within the first predetermined distance from each other. Detects multiple people and
The conversation monitoring device according to any one of Supplementary note 1 to 5, wherein the first determination unit acquires the second video data in which the plurality of persons are detected as the first video data.
(Appendix 7)
The movement control unit identifies the direction or line-of-sight direction of the face of a person around the mobile robot by analyzing the second video data, and determines a region located in the specified face direction or line-of-sight direction. The conversation monitoring device according to any one of Supplementary note 1 to 6, which moves the mobile robot so as not to pass through.
(Appendix 8)
A second camera that captures a part of the monitoring area is provided.
The mobile robot moves an area that cannot be imaged by the second camera before accepting control by the movement control unit.
The conversation monitoring device according to any one of Supplementary note 1 to 7, wherein the first video data is video data generated by the first camera or the second camera.
(Appendix 9)
The conversation monitoring device according to Appendix 8, wherein a region that cannot be imaged by the second camera is specified by using the video data generated by the second camera and the map data of the monitoring area.
(Appendix 10)
When it is determined by the first determination unit or the second determination unit that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons.
The predetermined coping process includes a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or a projector provided in the mobile robot to output information indicating a warning, and a process provided in the mobile robot. The conversation monitoring device according to any one of Supplementary note 1 to 9, wherein one or more of the processes for outputting a warning sound to the speaker are included.
(Appendix 11)
A control method performed by a computer
Whether or not the plurality of persons are having a conversation by acquiring the first video data in which a plurality of persons located within the first predetermined distance from each other in the monitoring area are detected and analyzing the first video data. The first judgment step to determine
If it is not possible to determine whether or not the plurality of persons are talking even by analyzing the first video data, the mobile robot provided with the first camera can capture the faces of the plurality of persons. A movement control step for moving a mobile robot provided with a microphone to a position within a second predetermined distance from the plurality of persons.
After moving the mobile robot, the second video data obtained from the first camera or the voice data obtained from the microphone is analyzed to determine whether or not the plurality of persons are having a conversation. A control method comprising two determination steps.
(Appendix 12)
In the first determination step,
Based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, it is determined whether or not the plurality of persons are having a conversation.
Addendum: When there is a person who cannot specify the movement of the mouth, the direction of the face, or the direction of the line of sight among the plurality of persons, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation. 11. The control method according to 11.
(Appendix 13)
In the first determination step, both the probability that a conversation is taking place and the probability that a conversation is not taking place are calculated for the plurality of persons included in the first video data, and the probability that the conversation is taking place is calculated. The control method according to Appendix 11, wherein it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of the probabilities that the conversation is not taking place are less than the threshold value.
(Appendix 14)
In the second determination step, whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data. The control method according to any one of Supplementary note 11 to 13 for determination.
(Appendix 15)
In the second determination step, the plurality of voice data are based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data is obtained. The control method according to any one of Supplementary note 11 to 13, which determines whether or not a person is having a conversation.
(Appendix 16)
The mobile robot is equipped with a projector.
Before accepting the control by the movement control step, the mobile robot moves in the monitoring area while projecting a distance image showing the magnitude of the first predetermined distance from the projector to the ground.
By detecting the distance image and a plurality of people from the second video data and comparing the distance between the detected people with the size of the distance image, the distance images are located within the first predetermined distance from each other. Detects multiple people and
The control method according to any one of Supplementary note 11 to 15, wherein in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
(Appendix 17)
In the movement control step, by analyzing the second video data, the direction or line-of-sight direction of the face of a person around the mobile robot is specified, and the region located in the specified face direction or line-of-sight direction is determined. The control method according to any one of Supplementary note 11 to 16, wherein the mobile robot is moved so as not to pass through.
(Appendix 18)
A second camera that captures a part of the monitoring area is provided.
The mobile robot moves an area that cannot be imaged by the second camera before accepting the control by the movement control step.
The control method according to any one of Supplementary note 11 to 17, wherein the first video data is video data generated by the first camera or the second camera.
(Appendix 19)
The control method according to Appendix 18, wherein a region that cannot be imaged by the second camera is specified by using the video data generated by the second camera and the map data of the monitoring area.
(Appendix 20)
When it is determined by the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons.
The predetermined coping process includes a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or a projector provided in the mobile robot to output information indicating a warning, and a process provided in the mobile robot. The control method according to any one of Supplementary note 11 to 19, wherein any one or more of the processes for outputting a warning sound to the speaker is included.
(Appendix 21)
A computer-readable medium that stores programs
The program
Whether or not the plurality of persons are having a conversation by acquiring the first video data in which a plurality of persons located within the first predetermined distance from each other in the monitoring area are detected and analyzing the first video data. The first judgment step to determine
If it is not possible to determine whether or not the plurality of persons are talking even by analyzing the first video data, the mobile robot provided with the first camera can capture the faces of the plurality of persons. A movement control step for moving a mobile robot provided with a microphone to a position within a second predetermined distance from the plurality of persons.
After moving the mobile robot, the second video data obtained from the first camera or the voice data obtained from the microphone is analyzed to determine whether or not the plurality of persons are having a conversation. A computer-readable medium that causes a computer to perform two judgment steps.
(Appendix 22)
In the first determination step,
Based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, it is determined whether or not the plurality of persons are having a conversation.
Addition: When there is a person who cannot specify the movement of the mouth, the direction of the face, or the direction of the line of sight among the plurality of persons, it is determined that it cannot be determined whether or not the plurality of persons are having a conversation. 21 is a computer-readable medium.
(Appendix 23)
In the first determination step, both the probability that a conversation is taking place and the probability that a conversation is not taking place are calculated for the plurality of persons included in the first video data, and the probability that the conversation is taking place is calculated. The computer-readable medium according to Appendix 21, wherein it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both of the probabilities that the conversation is not taking place are less than the threshold value.
(Appendix 24)
The computer-readable medium according to any one of Supplementary notes 21 to 23, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
(Appendix 25)
The computer-readable medium according to any one of Supplementary notes 21 to 23, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data was obtained.
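Supplementary note 25 ties loudness to distance. A simple way to realize this is to refer the level measured at the robot back to a fixed reference distance from the group and compare it with a typical conversational speech level. Both the free-field 6 dB-per-distance-doubling attenuation model and the 55 dB threshold below are assumptions for illustration.

```python
import math

def is_conversation(level_db_at_robot, distance_m,
                    speech_threshold_db=55.0):
    """Refer the level measured at the robot back to 1 m from the group
    (free-field inverse-distance model) and compare it with a typical
    conversational speech level."""
    d = max(distance_m, 0.1)                     # guard against zero distance
    level_at_1m = level_db_at_robot + 20 * math.log10(d / 1.0)
    return level_at_1m >= speech_threshold_db

# E.g., 43 dB measured 4 m away maps to about 55 dB at 1 m -> conversation.
```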
(Appendix 26)
The computer-readable medium according to any one of Supplementary notes 21 to 25, wherein
the mobile robot is equipped with a projector,
before accepting control by the movement control step, the mobile robot moves in the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance,
the distance image and a plurality of persons are detected from the second video data, and the plurality of persons located within the first predetermined distance from each other are detected by comparing the distance between the detected persons with the size of the distance image, and
in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
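Supplementary note 26 uses the projected distance image as an on-the-ground scale reference: its apparent length in pixels, divided by its known physical length, gives a pixels-per-meter factor for measuring person-to-person distances. A sketch under that assumption (ground-plane geometry, foot positions already detected in image coordinates):

```python
import math

def pairs_within_distance(person_feet_px, marker_px_len, marker_real_len_m,
                          first_predetermined_m):
    """Index pairs of persons whose ground distance, measured against the
    projected distance image, is within the first predetermined distance."""
    px_per_m = marker_px_len / marker_real_len_m   # scale from the marker
    close = []
    for i in range(len(person_feet_px)):
        for j in range(i + 1, len(person_feet_px)):
            (x1, y1), (x2, y2) = person_feet_px[i], person_feet_px[j]
            dist_m = math.hypot(x2 - x1, y2 - y1) / px_per_m
            if dist_m <= first_predetermined_m:
                close.append((i, j))
    return close

# E.g., a 2 m marker appearing 80 px long gives 40 px/m; feet 100 px apart
# are then 2.5 m apart, outside a 2 m first predetermined distance.
```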
(Appendix 27)
The computer-readable medium according to any one of Supplementary notes 21 to 26, wherein in the movement control step, the direction of the face or the line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so as not to pass through a region located in the identified face direction or line-of-sight direction.
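The path constraint of Supplementary note 27 can be prototyped by carving a cone in front of each detected face out of the planner's traversable cells. The cone half-angle and reach below are assumed parameters, not values from the specification.

```python
import math

def forbidden_cells(cells, face_pos, face_heading_deg,
                    half_angle_deg=30.0, reach_m=5.0):
    """Cells lying in a cone along the identified face/gaze direction;
    the movement control step excludes them from the robot's route."""
    banned = set()
    for (x, y) in cells:
        dx, dy = x - face_pos[0], y - face_pos[1]
        d = math.hypot(dx, dy)
        if d == 0 or d > reach_m:
            continue
        bearing = math.degrees(math.atan2(dy, dx))
        off = abs((bearing - face_heading_deg + 180) % 360 - 180)
        if off <= half_angle_deg:
            banned.add((x, y))
    return banned
```

A planner would subtract these cells from its free set before computing the route, so the robot approaches from behind or beside the persons rather than through their line of sight.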
(Appendix 28)
The computer-readable medium according to any one of Supplementary notes 21 to 27, wherein
a second camera that captures an image of a part of the monitoring area is provided,
the mobile robot moves through an area that cannot be imaged by the second camera before accepting control by the movement control step, and
the first video data is video data generated by the first camera or the second camera.
(Appendix 29)
The computer-readable medium according to Supplementary note 28, wherein an area that cannot be imaged by the second camera is identified by using the video data generated by the second camera and the map data of the monitoring area.
(Appendix 30)
The computer-readable medium according to any one of Supplementary notes 21 to 29, wherein
when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, and
the predetermined coping process includes one or more of: a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information indicating a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
10 person
20 mobile robot
22 camera
23 video data
24 microphone
25 voice data
26 actuator
27 moving means
30 camera
32 video data
40 person group
500 computer
502 bus
504 processor
506 memory
508 storage device
510 input/output interface
512 network interface
600 controller
602 bus
604 processor
606 memory
608 storage device
610 input/output interface
612 network interface
2000 conversation monitoring device
2020 first determination unit
2040 movement control unit
2060 second determination unit

Claims (30)

1. A conversation monitoring device comprising:
a first determination unit that acquires first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, and analyzes the first video data to determine whether or not the plurality of persons are having a conversation;
a movement control unit that, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moves a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moves a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and
a second determination unit that, after the mobile robot has been moved, analyzes second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
2. The conversation monitoring device according to claim 1, wherein the first determination unit
determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, and
determines that it cannot be determined whether or not the plurality of persons are having a conversation when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified.
3. The conversation monitoring device according to claim 1, wherein the first determination unit calculates, for the plurality of persons included in the first video data, both the probability that a conversation is taking place and the probability that no conversation is taking place, and determines that it cannot be determined whether or not the plurality of persons are having a conversation when both probabilities are less than a threshold value.
4. The conversation monitoring device according to any one of claims 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
5. The conversation monitoring device according to any one of claims 1 to 3, wherein the second determination unit determines whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data was obtained.
6. The conversation monitoring device according to any one of claims 1 to 5, wherein
the mobile robot is equipped with a projector,
before accepting control by the movement control unit, the mobile robot moves in the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance,
the distance image and a plurality of persons are detected from the second video data, and the plurality of persons located within the first predetermined distance from each other are detected by comparing the distance between the detected persons with the size of the distance image, and
the first determination unit acquires, as the first video data, the second video data in which the plurality of persons are detected.
7. The conversation monitoring device according to any one of claims 1 to 6, wherein the movement control unit identifies the direction of the face or the line-of-sight direction of a person around the mobile robot by analyzing the second video data, and moves the mobile robot so as not to pass through a region located in the identified face direction or line-of-sight direction.
8. The conversation monitoring device according to any one of claims 1 to 7, wherein
a second camera that captures an image of a part of the monitoring area is provided,
the mobile robot moves through an area that cannot be imaged by the second camera before accepting control by the movement control unit, and
the first video data is video data generated by the first camera or the second camera.
9. The conversation monitoring device according to claim 8, wherein an area that cannot be imaged by the second camera is identified by using the video data generated by the second camera and the map data of the monitoring area.
10. The conversation monitoring device according to any one of claims 1 to 9, wherein
when the first determination unit or the second determination unit determines that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, and
the predetermined coping process includes one or more of: a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information indicating a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
11. A control method executed by a computer, the method comprising:
a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation;
a movement control step of, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and
a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
12. The control method according to claim 11, wherein in the first determination step,
it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, and
it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified.
13. The control method according to claim 11, wherein in the first determination step, both the probability that a conversation is taking place and the probability that no conversation is taking place are calculated for the plurality of persons included in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both probabilities are less than a threshold value.
14. The control method according to any one of claims 11 to 13, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
15. The control method according to any one of claims 11 to 13, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data was obtained.
16. The control method according to any one of claims 11 to 15, wherein
the mobile robot is equipped with a projector,
before accepting control by the movement control step, the mobile robot moves in the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance,
the distance image and a plurality of persons are detected from the second video data, and the plurality of persons located within the first predetermined distance from each other are detected by comparing the distance between the detected persons with the size of the distance image, and
in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
17. The control method according to any one of claims 11 to 16, wherein in the movement control step, the direction of the face or the line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so as not to pass through a region located in the identified face direction or line-of-sight direction.
18. The control method according to any one of claims 11 to 17, wherein
a second camera that captures an image of a part of the monitoring area is provided,
the mobile robot moves through an area that cannot be imaged by the second camera before accepting control by the movement control step, and
the first video data is video data generated by the first camera or the second camera.
19. The control method according to claim 18, wherein an area that cannot be imaged by the second camera is identified by using the video data generated by the second camera and the map data of the monitoring area.
20. The control method according to any one of claims 11 to 19, wherein
when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, and
the predetermined coping process includes one or more of: a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information indicating a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
21. A computer-readable medium storing a program, the program causing a computer to execute:
a first determination step of acquiring first video data in which a plurality of persons located within a first predetermined distance from each other in a monitoring area are detected, and analyzing the first video data to determine whether or not the plurality of persons are having a conversation;
a movement control step of, when it cannot be determined whether or not the plurality of persons are having a conversation even by analyzing the first video data, moving a mobile robot provided with a first camera to a position where the faces of the plurality of persons can be imaged, or moving a mobile robot provided with a microphone to a place within a second predetermined distance from the plurality of persons; and
a second determination step of, after the mobile robot has been moved, analyzing second video data obtained from the first camera or voice data obtained from the microphone to determine whether or not the plurality of persons are having a conversation.
22. The computer-readable medium according to claim 21, wherein in the first determination step,
it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the first video data, and
it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when the plurality of persons include a person whose mouth movement, face direction, or line-of-sight direction cannot be identified.
23. The computer-readable medium according to claim 21, wherein in the first determination step, both the probability that a conversation is taking place and the probability that no conversation is taking place are calculated for the plurality of persons included in the first video data, and it is determined that it cannot be determined whether or not the plurality of persons are having a conversation when both probabilities are less than a threshold value.
24. The computer-readable medium according to any one of claims 21 to 23, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the movement of the mouth, the direction of the face, or the direction of the line of sight of each of the plurality of persons included in the second video data.
25. The computer-readable medium according to any one of claims 21 to 23, wherein in the second determination step, it is determined whether or not the plurality of persons are having a conversation based on the loudness of the voice represented by the voice data and the distance between the mobile robot and the plurality of persons when the voice data was obtained.
26. The computer-readable medium according to any one of claims 21 to 25, wherein
the mobile robot is equipped with a projector,
before accepting control by the movement control step, the mobile robot moves in the monitoring area while projecting, from the projector onto the ground, a distance image representing the magnitude of the first predetermined distance,
the distance image and a plurality of persons are detected from the second video data, and the plurality of persons located within the first predetermined distance from each other are detected by comparing the distance between the detected persons with the size of the distance image, and
in the first determination step, the second video data in which the plurality of persons are detected is acquired as the first video data.
27. The computer-readable medium according to any one of claims 21 to 26, wherein in the movement control step, the direction of the face or the line-of-sight direction of a person around the mobile robot is identified by analyzing the second video data, and the mobile robot is moved so as not to pass through a region located in the identified face direction or line-of-sight direction.
28. The computer-readable medium according to any one of claims 21 to 27, wherein
a second camera that captures an image of a part of the monitoring area is provided,
the mobile robot moves through an area that cannot be imaged by the second camera before accepting control by the movement control step, and
the first video data is video data generated by the first camera or the second camera.
29. The computer-readable medium according to claim 28, wherein an area that cannot be imaged by the second camera is identified by using the video data generated by the second camera and the map data of the monitoring area.
30. The computer-readable medium according to any one of claims 21 to 29, wherein
when it is determined in the first determination step or the second determination step that the plurality of persons are having a conversation, a predetermined coping process is performed on the plurality of persons, and
the predetermined coping process includes one or more of: a process of bringing the mobile robot closer to the plurality of persons, a process of causing a display device or projector provided on the mobile robot to output information indicating a warning, and a process of causing a speaker provided on the mobile robot to output a warning sound.
PCT/JP2020/026745 2020-07-08 2020-07-08 Conversation monitoring device, control method, and computer readable medium WO2022009349A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022534569A JP7416253B2 (en) 2020-07-08 2020-07-08 Conversation monitoring device, control method, and program
PCT/JP2020/026745 WO2022009349A1 (en) 2020-07-08 2020-07-08 Conversation monitoring device, control method, and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/026745 WO2022009349A1 (en) 2020-07-08 2020-07-08 Conversation monitoring device, control method, and computer readable medium

Publications (1)

Publication Number Publication Date
WO2022009349A1 true WO2022009349A1 (en) 2022-01-13

Family

ID=79552401

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/026745 WO2022009349A1 (en) 2020-07-08 2020-07-08 Conversation monitoring device, control method, and computer readable medium

Country Status (2)

Country Link
JP (1) JP7416253B2 (en)
WO (1) WO2022009349A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010238182A (en) * 2009-03-31 2010-10-21 Sogo Keibi Hosho Co Ltd Autonomous mobile object and suspicious person detecting method
JP2019084628A (en) * 2017-11-07 2019-06-06 富士ゼロックス株式会社 Autonomous mobile robot
WO2019239813A1 (en) * 2018-06-14 2019-12-19 パナソニックIpマネジメント株式会社 Information processing method, information processing program, and information processing system


Also Published As

Publication number Publication date
JPWO2022009349A1 (en) 2022-01-13
JP7416253B2 (en) 2024-01-17

Similar Documents

Publication Publication Date Title
JP7229662B2 (en) How to issue alerts in a video surveillance system
WO2021047306A1 (en) Abnormal behavior determination method and apparatus, terminal, and readable storage medium
WO2022183661A1 (en) Event detection method and apparatus, electronic device, storage medium, and program product
US11776274B2 (en) Information processing apparatus, control method, and program
KR101602753B1 (en) emergency call system using voice
JP5025607B2 (en) Abnormal behavior detection device
CN109492571B (en) Method and device for identifying human age and electronic equipment
US11375245B2 (en) Live video streaming based on an environment-related trigger
KR101692688B1 (en) Mode changing robot and control method thereof
WO2020024552A1 (en) Road safety monitoring method and system, and computer-readable storage medium
CN109544870B (en) Alarm judgment method for intelligent monitoring system and intelligent monitoring system
WO2020202865A1 (en) Person detection device and person detection method
WO2022183663A1 (en) Event detection method and apparatus, and electronic device, storage medium and program product
JP2022526071A (en) Situational awareness monitoring
WO2021057790A1 (en) Method and device for determining smoke
WO2022009349A1 (en) Conversation monitoring device, control method, and computer readable medium
CN109635691A (en) A kind of control method, device, equipment and storage medium
US11209796B2 (en) Surveillance system with intelligent robotic surveillance device
WO2022009339A1 (en) Conversation monitoring device, control method, and computer-readable medium
JP7476965B2 (en) Notification control device, control method, and program
CN111985309A (en) Alarm method, camera device and storage device
US20240038403A1 (en) Image display apparatus, image display system, image display method, and non-transitory computer readable medium
KR102141657B1 (en) Emergency guidance system based on voice and video
KR101520446B1 (en) Monitoring system for prevention beating and cruel act
CN112578909B (en) Method and device for equipment interaction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20944532

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022534569

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20944532

Country of ref document: EP

Kind code of ref document: A1