WO2023276700A1 - Focus determining system, communication analyzing system, and focus determining method - Google Patents

Focus determining system, communication analyzing system, and focus determining method

Info

Publication number
WO2023276700A1
Authority
WO
WIPO (PCT)
Prior art keywords
person
space
detection unit
communication
people
Application number
PCT/JP2022/024136
Other languages
French (fr)
Japanese (ja)
Inventor
一樹 北村
直毅 吉川
ジャマル ムリアナ ユスフ ビン
プラティック プラネイ
ジアリ マ
Original Assignee
パナソニックIpマネジメント株式会社
Application filed by パナソニックIpマネジメント株式会社 filed Critical パナソニックIpマネジメント株式会社
Priority to JP2023531788A priority Critical patent/JPWO2023276700A1/ja
Publication of WO2023276700A1 publication Critical patent/WO2023276700A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk

Definitions

  • the present invention relates to a focus determination system, a communication analysis system, and a focus determination method.
  • Patent Literature 1 discloses an information providing apparatus that provides information on the psychological state of a subject in order to evaluate intellectual activity in meetings, discussions, and the like.
  • the present invention provides a focus determination system and the like that can determine how much a target person is focused on communication.
  • A focus determination system according to one aspect of the present invention includes: a human detection unit that acquires video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detects, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a head orientation detection unit that acquires the video information and detects, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination unit that determines the target person's focus on communication conducted by the plurality of people in the space, based on the detected first position and the detected orientation of the target person's head. When a person other than the target person among the plurality of people is speaking, the focus determination unit determines a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard as a period in which the target person is focusing on the communication.
  • A communication analysis system according to one aspect of the present invention includes the above focus determination system and a communication analysis unit that analyzes the quality of communication conducted by the plurality of people in the space based on the determined focus of the target person.
  • A focus determination method according to one aspect of the present invention includes: a first detection step of acquiring video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detecting, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a second detection step of acquiring the video information and detecting, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination step of determining the target person's focus on communication conducted by the plurality of people in the space, based on the detected first position and the detected orientation of the target person's head. In the focus determination step, when a person other than the target person among the plurality of people is speaking, a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard is determined as a period in which the target person is focusing on the communication.
  • a program according to one aspect of the present invention is a program for causing a computer to execute the focus determination method.
  • the focus determination system and the like of the present invention can determine how much the target person is focused on communication.
  • FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment.
  • FIG. 2 is a diagram showing an example of a space to which the communication analysis system according to the embodiment is applied.
  • FIG. 3 is an external view of the sensing device according to the embodiment.
  • FIG. 4 is a flowchart of the operation of the sound source detection unit included in the communication analysis system according to the embodiment.
  • FIG. 5 is a diagram schematically showing detection results of the position of a sound source.
  • FIG. 6 is a flowchart of the operation of the human detection unit included in the communication analysis system according to the embodiment.
  • FIG. 7 is a diagram schematically showing detection results of a person's position.
  • FIG. 8 is a flowchart of the operation of the speech amount estimation unit included in the communication analysis system according to the embodiment.
  • FIG. 9 is a diagram schematically showing detection results of the position of a speaker.
  • FIG. 10 is a diagram showing an example of information indicating the amount of speech.
  • FIG. 11 is a flowchart of the operation of the head orientation detection unit included in the communication analysis system according to the embodiment.
  • FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit included in the communication analysis system according to the embodiment.
  • FIG. 13 is a diagram for explaining the correction of the angle of the head orientation.
  • FIG. 14 is a flowchart of the operation of the focus determination unit included in the communication analysis system according to the embodiment.
  • FIG. 15 is a diagram for explaining the target direction of the target person.
  • FIG. 16 is a diagram showing an example of information indicating a focus period.
  • FIG. 17 is a flowchart of the operation of the communication analysis unit included in the communication analysis system according to the embodiment.
  • FIG. 18 is a diagram showing an example of a score display screen indicating the quality of communication displayed on an information terminal.
  • Note that each figure is a schematic diagram and is not necessarily illustrated strictly. In each figure, substantially the same configurations are given the same reference numerals, and overlapping description may be omitted or simplified.
  • FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment.
  • FIG. 2 is a diagram showing an example of a space to which the communication analysis system is applied.
  • As shown in FIGS. 1 and 2, the communication analysis system 10 is a system used in an office or the like having a space 70 such as a conference room, for analyzing the quality of communication of a plurality of people located in the space 70.
  • the space 70 is, for example, a closed space, but may be an open space. Examples of the space 70 include a conference room as well as an open rest area in an office space (where chairs and tables are placed in part of the office space). Also, the space 70 does not need to be physically separated, and may be a place separated by illumination light or airflow in the entire space. For example, a warm color area with a color temperature of 3000 K may be provided in a corner of an office space illuminated with daylight color with a color temperature of 5000 K, and this area may be used as the space 70 .
  • the communication analysis system 10 includes a sensing device 20 and an information processing system 30.
  • the sensing device 20 will be described with reference to FIG. 3 in addition to FIGS. 1 and 2.
  • FIG. 3 is an external view of the sensing device 20. Part (a) of FIG. 3 is a top view of the sensing device 20, and part (b) of FIG. 3 is a side view of the sensing device 20. Note that the ranging sensor 23 is not shown in FIG. 3.
  • the sensing device 20 is installed on the desk 40 installed in the space 70 and senses sounds and images in the space 70 .
  • the sensing device 20 is specifically installed in the center of the desk 40 .
  • the sensing device 20 includes a microphone array 21 , a plurality of cameras 22 and a ranging sensor 23 .
  • the microphone array 21 acquires sound in the space 70 and outputs sound information (a plurality of sound signals) of the acquired sound.
  • the microphone array 21 specifically includes a plurality of microphone elements, and each of the plurality of microphone elements acquires sound in the space 70 and outputs a sound signal of the acquired sound.
  • Each of the plurality of cameras 22 captures video (that is, moving images) showing the people staying in the space 70 and outputs video information of the captured video.
  • the camera 22 is a general camera implemented by a CMOS image sensor or the like, but may be a fisheye camera or the like.
  • In the present embodiment, the sensing device 20 has four cameras 22 so that the entire surroundings of the sensing device 20 can be photographed from on the desk 40; however, it suffices that at least one camera capable of photographing all the people staying in the space 70 is provided.
  • the distance measuring sensor 23 measures the distance from the sensing device 20 (camera 22) to the object, and outputs distance information indicating the measured distance to the object.
  • the object is, for example, a person staying in the space 70 .
  • the ranging sensor 23 is, for example, a TOF (Time Of Flight) type LiDAR (Light Detection and Ranging), but may be a range image sensor or the like.
  • Sensing device 20 may include at least one ranging sensor 23 , but may include a plurality of ranging sensors 23 corresponding to cameras 22 .
  • The information processing system 30 communicates with the sensing device 20 by wire or wirelessly, and analyzes the quality of communication based on the sensing information (specifically, sound information, video information, distance information, and the like) acquired from the sensing device 20 through that communication.
  • the information processing system 30 is, for example, an edge computer installed in a facility having a space 70, but may be a cloud computer installed outside the facility.
  • the sensing device 20 and the information processing system 30 may be realized as one integrated device.
  • part of the functions of the information processing system 30 may be implemented as an edge computer, and part of the other functions may be implemented by a cloud computer.
  • The information processing system 30 includes a sound source detection unit 31, a person detection unit 32, a head orientation detection unit 33, a speech amount estimation unit 34, a focus determination unit 35, a communication analysis unit 36, and a storage unit 37.
  • the sound source detection unit 31 acquires sound information of the sound acquired in the space 70 from the sensing device 20, and detects a second position, which is the position of the sound source in the space 70, based on the acquired sound information.
  • the human detection unit 32 acquires from the sensing device 20 image information of an image showing a person staying in the space 70, and detects the first position, which is the position of the person in the space 70, based on the acquired image information.
  • The head orientation detection unit 33 acquires from the sensing device 20 the video information of video showing a person staying in the space 70, and detects the orientation of the person's head (in other words, the orientation of the face) based on the acquired video information.
  • the head direction detection unit 33 may detect the direction of the person's line of sight based on the acquired video information.
  • the speech amount estimation unit 34 estimates the amount of human speech based on the first position detected by the human detection unit 32 and the second position detected by the sound source detection unit 31 .
  • The focus determination unit 35 determines the person's focus on communication conducted by a plurality of people, including that person, in the space 70, based on the first position detected by the person detection unit 32 and the orientation of the person's head detected by the head orientation detection unit 33.
  • the communication analysis unit 36 analyzes the quality of communication based on at least one of the human speech volume estimated by the speech volume estimation unit 34 and the human focus determined by the focus determination unit 35 .
  • the communication analysis unit 36 also outputs analysis results.
  • Each of the sound source detection unit 31, the person detection unit 32, the head orientation detection unit 33, the speech amount estimation unit 34, the focus determination unit 35, and the communication analysis unit 36 described above is realized by a microcomputer or a processor. The functions of these units are implemented, for example, by the microcomputer or processor executing a computer program stored in the storage unit 37.
  • the storage unit 37 is a storage device that stores the computer program and information necessary for realizing the functions of the components.
  • the storage unit 37 is implemented by, for example, an HDD (Hard Disk Drive), but may be implemented by a semiconductor memory.
  • In the present embodiment, a system including the sound source detection unit 31, the human detection unit 32, and the speech amount estimation unit 34 is also referred to as a speaker diarization system 38. That is, the speaker diarization system 38 includes the sound source detection unit 31, the person detection unit 32, and the speech amount estimation unit 34. The speaker diarization system 38 may further include the head orientation detection unit 33 or the focus determination unit 35.
  • Similarly, a system including the human detection unit 32, the head orientation detection unit 33, and the focus determination unit 35 is also referred to as a focus determination system 39. That is, the focus determination system 39 includes the human detection unit 32, the head orientation detection unit 33, and the focus determination unit 35.
  • the focus determination system 39 may further include the sound source detection unit 31 or the speech amount estimation unit 34 .
  • FIG. 4 is a flowchart of the operation of the sound source detection unit 31.
  • the sound source detection unit 31 acquires sound information of the sound acquired in the space 70 from the microphone array 21 of the sensing device 20 (S11). Specifically, the sound source detection unit 31 acquires multiple sound signals output by multiple microphone elements included in the microphone array 21 .
  • each of the acquired multiple sound signals is a signal in the time domain.
  • the sound source detection unit 31 transforms each of the plurality of sound signals from a time domain signal into a frequency domain signal by performing a Fourier transform (S12).
  • the sound source detection unit 31 calculates a spatial correlation matrix from the input vector determined based on the multiple sound signals after being transformed into the frequency domain (S13).
  • When the microphone array 21 has M microphone elements and the m-th sound signal after the Fourier transform is denoted $X_m(\omega, t)$, the input vector $\mathbf{x}(\omega, t)$ is expressed by the following equation, where $T$ denotes transpose:

    $\mathbf{x}(\omega, t) = [X_1(\omega, t), X_2(\omega, t), \ldots, X_M(\omega, t)]^T$
  • The spatial correlation matrix $R$ is represented by the following formula, where $H$ denotes conjugate transpose and $E[\cdot]$ denotes expectation (time averaging). Hereinafter, the frequency index $\omega$ is omitted for simplicity:

    $R = E\left[\mathbf{x}(t)\,\mathbf{x}^H(t)\right]$
  • Next, the sound source detection unit 31 calculates eigenvectors by eigenvalue decomposition of the spatial correlation matrix (S14). Specifically, the sound source detection unit 31 performs eigenvalue decomposition of the spatial correlation matrix according to the following equation to obtain the eigenvectors $\mathbf{e}_1, \ldots, \mathbf{e}_M$ and the corresponding eigenvalues $\lambda_1, \ldots, \lambda_M$:

    $R = \sum_{m=1}^{M} \lambda_m \mathbf{e}_m \mathbf{e}_m^H$
  • Next, the sound source detection unit 31 detects the position of the sound source from the eigenvectors (S15). Specifically, the sound source detection unit 31 can identify the loudness of a sound and the direction from which it arrives from the eigenvectors, and can detect the direction from which a relatively loud sound arrives as the direction (position) of the sound source.
  • the sound source detection unit 31 can detect in which direction (angle) the sound source is positioned with respect to the position O of the sensing device 20 .
  • FIG. 5 is a diagram (a diagram of the desk 40 viewed from above) schematically showing the detection result of the position of the sound source.
  • two sound sources S1 and S2 are detected.
  • Note that the sound source detection unit 31 need only detect at least the two-dimensional position (direction) of the sound source as shown in FIG. 5, but it may detect the three-dimensional position of the sound source. In the following, the position of the sound source detected by the sound source detection unit 31 is also referred to as the second position.
  • the sound source detection unit 31 can track the position of the sound source (second position) by repeating the operation of FIG. 4 every unit time.
  • the sound source is specifically a speaker (person) staying in the space 70 , but may also be a device installed in the space 70 .
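As an illustration of steps S13 to S15, the following is a minimal NumPy sketch of this eigen-decomposition approach (a MUSIC-style direction search). It assumes the multichannel STFT and candidate steering vectors are already available; the function names and the noise-subspace pseudo-spectrum are illustrative choices, not details taken from the patent.

```python
import numpy as np

def spatial_correlation(x):
    """x: (M, T) array of one STFT frequency bin across M microphone
    elements and T time frames. Returns R = E[x x^H] as an M x M matrix."""
    return (x @ x.conj().T) / x.shape[1]

def sound_source_directions(R, steering, n_sources=1):
    """Eigendecompose R (step S14), then score each candidate direction by
    how orthogonal its steering vector is to the noise subspace (step S15).
    steering: (A, M), one row per candidate angle. Returns the indices of
    the n_sources strongest directions."""
    eigvals, eigvecs = np.linalg.eigh(R)            # eigenvalues ascending
    noise = eigvecs[:, : R.shape[0] - n_sources]    # noise-subspace eigenvectors
    proj = noise @ noise.conj().T                   # projector onto noise subspace
    denom = np.einsum("am,mk,ak->a", steering.conj(), proj, steering).real
    pseudo_spectrum = 1.0 / np.maximum(denom, 1e-12)
    return np.argsort(pseudo_spectrum)[-n_sources:]
```

Running this once per unit time over successive frames yields the sound-source track (the second position) described above.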
  • FIG. 6 is a flowchart of the operation of the human detection unit 32. Note that the following operations are actually performed on the video information acquired from each of the four cameras 22, but for convenience, the description below assumes that video information is acquired from one camera 22.
  • the human detection unit 32 acquires image information of the image acquired in the space 70 from the camera 22 of the sensing device 20 (S21).
  • the human detection unit 32 identifies an area in which a person appears in the image based on the acquired image information (S22).
  • the human detection unit 32 can identify an area in which a person appears in the video by a method using pattern matching, a method using a machine learning model, or the like.
  • the human detection unit 32 assigns identification information to the specified area (that is, the area where a person is present) (S23). For example, the human detection unit 32 identifies three areas and assigns identification information of A, B, and C to the identified three areas.
  • In the following, the person corresponding to area A is also referred to as person A, the person corresponding to area B as person B, and the person corresponding to area C as person C.
  • the human detection unit 32 identifies the direction in which the person is based on the position in the image of the area identified in step S23 (S24).
  • Specifically, since the storage unit 37 stores in advance information indicating the installation position of the camera 22 (the center of the desk 40) and the shooting range (angle of view) of the camera 22, the human detection unit 32 can specify which direction in the space 70 a position in the image corresponds to.
  • the human detection unit 32 estimates the distance from the sensing device 20 (camera 22) to the person based on the size of the area specified in step S23 (S25). In this case, it is estimated that the larger the area specified in step S23, the closer the distance from the sensing device 20 (camera 22) to the person.
  • the distance from the sensing device 20 (camera 22 ) to the person may be specified based on the distance information (measured distance value) acquired from the ranging sensor 23 .
  • the human detection unit 32 can detect the position of a person in the space 70, as shown in FIG.
  • FIG. 7 is a diagram (a diagram of the desk 40 viewed from above) schematically showing the detection result of the position of a person.
  • That is, the human detection unit 32 can detect the three-dimensional position of a person (the three-dimensional coordinates of the person's position, i.e., the person's distance and direction as seen from the sensing device 20).
  • In the following, the position of the person detected by the human detection unit 32 is also referred to as the first position.
  • the human detection unit 32 can track the position of a person (first position) by repeating the operation of FIG. 6 every unit time. At this time, the assignment of the identification information in step S23 may be performed only once for the first time.
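The direction mapping of step S24 and the size-based distance estimate of step S25 might look like the following sketch. The pinhole-model constants are assumed calibration values; the patent does not give the camera parameters.

```python
def person_direction(u, image_width, horizontal_fov_deg, camera_yaw_deg=0.0):
    """Step S24: map the horizontal pixel position u of a detected person
    region to a direction angle around the sensing device."""
    offset = u / image_width - 0.5               # -0.5 .. 0.5 across the frame
    return camera_yaw_deg + offset * horizontal_fov_deg

def person_distance(box_height_px, focal_px=900.0, person_height_mm=1700.0):
    """Step S25: pinhole-model distance estimate; a larger region means a
    closer person. focal_px and person_height_mm are assumed calibration
    values, not figures from the patent."""
    return focal_px * person_height_mm / box_height_px
```

When a ranging sensor 23 is available, its measured distance would simply replace the size-based estimate.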
  • FIG. 8 is a flowchart of the operation of the speech amount estimation unit 34.
  • the speech amount estimation unit 34 acquires the first position detected by the human detection unit 32 and the second position detected by the sound source detection unit 31 (S31). The first position and the second position acquired at this time are detected at substantially the same timing. “Substantially the same” means that some deviation may be included.
  • the speech amount estimation unit 34 converts the first position represented by three-dimensional coordinates into two-dimensional coordinates (angle) corresponding to the second position (S32). Note that if the position of the microphone array 21 and the position of the camera 22 are different, the first position after being converted into two-dimensional coordinates is corrected based on the difference between these positions.
  • FIG. 9 is a diagram schematically showing the detection result of the position of the speaker.
  • FIG. 9 is a superimposed view of the second position (FIG. 5) and the first position (FIG. 7).
  • the speech amount estimation unit 34 detects that the sound source S1 is the person A, for example, when the angle difference ⁇ 1 between the second position of the sound source S1 and the first position of the person A is less than or equal to a predetermined value. That is, the person A is detected as the speaker. Further, the speech amount estimation unit 34 detects that the sound source S2 is the person C, for example, when the angle difference ⁇ 2 between the second position of the sound source S2 and the first position of the person C is equal to or less than a predetermined value. That is, person C is detected as the speaker.
  • The speech amount estimation unit 34 can track each of person A, person B, and person C by repeating the operation of FIG. 8 every unit time, and can estimate the amount of speech of each of them. Specifically, the speech amount estimation unit 34 accumulates the periods during which each of person A, person B, and person C is detected as the speaker, and can thereby estimate each person's amount of speech. That is, the speech amount estimation unit 34 can store information indicating the periods during which person A, person B, and person C are speaking (information indicating the amount of speech) in the storage unit 37.
  • FIG. 10 is a diagram showing an example of information indicating the amount of speech. As shown in FIG. 10, the information indicating the amount of speech is information in which the amount of speech (speech time) is associated with each piece of identification information assigned in step S23.
  • As described above, the speech amount estimation unit 34 tracks each of the plurality of people based on the first position detected by the human detection unit 32 and the second position detected by the sound source detection unit 31, and can thereby estimate the amount of speech of each of the plurality of people staying in the space 70 (including the amount of speech of the target person).
  • Such a method of estimating the amount of speech by the speech amount estimation unit 34 is useful for analyzing the quality of communication in a conference or the like involving movement of seats by a plurality of people.
  • the movement of seats by a plurality of people means moving to use the whiteboard 60, for example.
  • Also, since the speech amount estimation unit 34 performs neither individual identification by voice recognition nor individual identification by image recognition, it can estimate the amount of speech of each of the plurality of people while maintaining their anonymity.
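Steps S31 and S32 reduce to nearest-angle matching between first and second positions. A minimal sketch, assuming angles in degrees and an illustrative matching threshold (the patent says only "a predetermined value"):

```python
def match_speakers(person_angles, source_angles, threshold_deg=10.0):
    """Attribute each detected sound source to the person whose direction
    is angularly closest, if the difference is within threshold_deg."""
    def angdiff(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)  # wrap to [0, 180]
    speakers = []
    for src in source_angles:
        pid, ang = min(person_angles.items(), key=lambda kv: angdiff(src, kv[1]))
        if angdiff(src, ang) <= threshold_deg:
            speakers.append(pid)
    return speakers

# Example: persons A/B/C at 30, 150, 270 degrees; sources at 33 and 268.
print(match_speakers({"A": 30.0, "B": 150.0, "C": 270.0}, [33.0, 268.0]))
# -> ['A', 'C'], so A and C are counted as speaking in this unit time.
```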
  • FIG. 11 is a flow chart of the operation of the head orientation detection unit 33.
  • FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit 33.
  • In the following, the camera 22 is described as being located at a position C(x0, y0, z0) on the desk 40, rather than at the center position O(0, 0, 0) of the desk 40.
  • FIG. 12 also shows the coordinates in the image indicated by UV.
  • the head orientation detection unit 33 acquires image information of the image acquired in the space 70 from the camera 22 of the sensing device 20 (S41).
  • the head orientation detection unit 33 identifies the orientation of the person's head in the image based on the acquired image information (S42).
  • The head orientation detection unit 33 can detect the orientation of the person's head by, for example, identifying an area corresponding to the person's head in the video by face recognition processing and applying a machine learning model to the identified area.
  • As shown in FIG. 12, the vector indicating the orientation of the head at this time forms an angle of A° with respect to the line segment (straight line) connecting the position P of the person and the position C of the camera.
  • the head direction detection unit 33 acquires the position of the person (more specifically, the position of the person's head) detected by the person detection unit 32 (S43). As shown in FIG. 12, the position of the person at this time is P(x1, y1, z1).
  • Using the horizontal viewing angle θ of the camera 22, the position (u, v) of the person in the image, and the width w of the image, the coordinate x1 can be expressed in terms of these quantities.
  • Information such as the size of the desk 40, the horizontal viewing angle ⁇ of the camera 22, and the width w of the image is stored in the storage unit 37 in advance.
  • Next, the head orientation detection unit 33 corrects the orientation of the head (angle A) identified in step S42 with reference to the position of the camera 22, based on the position of the person (specifically, the value of the coordinate x1) acquired in step S43 (S44).
  • FIG. 13 is a diagram (a diagram of the desk 40 viewed from above) for explaining such angle correction.
  • the head orientation detection unit 33 can calculate ⁇ OPC in FIG. 13 based on the x1 coordinate acquired in step S43, and can set A+ ⁇ OPC as the corrected angle.
  • the head orientation detection unit 33 can track the orientation of the person's head by repeating the operation of FIG. 11 every unit time.
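The correction of step S44 (A + ∠OPC in FIG. 13) can be sketched in two dimensions as follows, assuming top-view coordinates with the desk center O at the origin; the helper is an assumed formulation of that geometry.

```python
import math

def corrected_head_angle(angle_a_deg, person_xy, camera_xy):
    """Convert a head orientation measured relative to the line segment
    P-C (person to camera) into one referenced to the desk center O at
    the origin, i.e. A + angle(OPC) as in FIG. 13. Top view, 2-D."""
    px, py = person_xy
    cx, cy = camera_xy
    to_origin = math.atan2(-py, -px)                 # direction P -> O
    to_camera = math.atan2(cy - py, cx - px)         # direction P -> C
    opc_rad = (to_camera - to_origin + math.pi) % (2 * math.pi) - math.pi
    return angle_a_deg + math.degrees(opc_rad)
```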
  • FIG. 14 is a flowchart of the operation of the focus determination unit 35.
  • The focus determination unit 35 acquires the first position detected by the human detection unit 32 and the orientation of the head detected by the head orientation detection unit 33 (S51). More specifically, the first position acquired at this time is the first position of each of the plurality of people staying in the space 70, and the acquired orientation of the head is the orientation of the head of each of the plurality of people.
  • FIG. 15 is a diagram (a diagram of the desk 40 viewed from above) for explaining the target direction of the target person.
  • Next, the focus determination unit 35 determines the target direction of the target person (S52). For example, in FIG. 15, the focus determination unit 35 determines, as the target direction, the direction connecting the target person's position P to the position G where another person is located.
  • the target direction does not necessarily have to be determined for a person, and may be determined for the display 50 and the whiteboard 60 as described later.
  • The focus determination unit 35 then determines an allowable range around the target direction (S53). For example, in the example of FIG. 15, the range from ∠GPO − α to ∠GPO + α is determined as the allowable range, where α is a coefficient for determining the allowable range. The value of α differs depending on whether the target direction points to a person, the display 50, or the whiteboard 60.
  • the focus determination unit 35 determines the focus of the subject by comparing the orientation of the subject's head acquired in step S51 with the allowable range determined in step S53 (S54). Focus here means focus on communication performed by a plurality of people including the target person in the space 70 . The focus determination unit 35 determines that the target person is focusing on communication when the direction of the target person's head is within the allowable range, since it is considered that the target person is looking at another person. On the other hand, when the target person's head orientation is outside the allowable range, the focus determination unit 35 determines that the target person is not focusing on communication because it is considered that the target person is not looking at other people.
  • The focus determination unit 35 can determine the focus of each of person A, person B, and person C by repeating the operation of FIG. 14 every unit time. Specifically, the focus determination unit 35 accumulates the periods in which each of person A, person B, and person C is determined to be focusing, and can store information indicating the periods in which the plurality of people are focusing on communication in the storage unit 37.
  • FIG. 16 is a diagram illustrating an example of information indicating the focus period. As shown in FIG. 16, the information indicating the focus period is information in which a focus period is associated with each piece of identification information assigned in step S23.
  • As described above, the focus determination unit 35 can determine the target person's focus on the communication conducted by the plurality of people in the space 70, based on the first position detected by the human detection unit 32 and the orientation of the head detected by the head orientation detection unit 33.
  • Note that the focus determination unit 35 may further acquire the second position detected by the sound source detection unit 31 in order to detect the position of the speaker, or may acquire the position of the speaker detected by the speech amount estimation unit 34. In either case, the focus determination unit 35 can determine focus while taking into account whether or not each of the plurality of people is the speaker. For example, it may determine that the target person is focusing on communication only when the target person's head is facing the direction of the speaker, and determine that the target person is not focusing on communication when the target person is looking in the direction of a person who is not speaking.
  • the focus determination unit 35 may determine focus as follows.
  • For example, even when person A, located right beside the target person (assumed here to be person C), is speaking, it is difficult for the target person to turn toward a person positioned right beside them; the target person is instead considered likely to look at the other person B, or to face the display 50 (shown in FIG. 2) or the whiteboard 60 (shown in FIG. 2) on which meeting materials and the like are displayed. In such cases, too, it may be determined that the target person is focusing on communication. That is, when a person other than the target person among the plurality of people is speaking, the focus determination unit 35 may determine a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display 50, or the whiteboard 60 as a period in which the target person is focusing on the communication. Note that the period during which the target person is facing the whiteboard 60 may be determined as a focus period on the condition that person A is speaking near the whiteboard 60.
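A minimal sketch of the comparison in step S54, assuming the target directions for the other people, the display 50, and the whiteboard 60 have already been computed; the α values per target type are assumptions, since the patent states only that the coefficient differs by target type.

```python
def is_focus_period(head_deg, targets, alpha_by_kind=None):
    """The subject counts as focusing when the detected head orientation
    lies within the allowable range around any target direction.
    targets: list of (kind, angle_deg) for other people, the display 50,
    and the whiteboard 60."""
    alpha_by_kind = alpha_by_kind or {"person": 15.0, "display": 20.0,
                                      "whiteboard": 20.0}
    for kind, target_deg in targets:
        diff = abs((head_deg - target_deg + 180.0) % 360.0 - 180.0)
        if diff <= alpha_by_kind[kind]:
            return True
    return False
```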
  • FIG. 17 is a flowchart of the operation of the communication analysis unit 36.
  • the communication analysis unit 36 reads out the information indicating the amount of speech (FIG. 10) and the information indicating the focus period (FIG. 16) stored in the storage unit 37 (S61).
  • Next, the communication analysis unit 36 analyzes the quality of communication in the space 70 based on the acquired information indicating the amount of speech and the acquired information indicating the focus period. For example, the communication analysis unit 36 scores the quality of communication using criteria such as the following.
  • For example, the communication analysis unit 36 calculates the ratio of the speech amounts of persons A to C based on the acquired information indicating the amount of speech, and sets the first score, related to the amount of speech, higher as the minimum/maximum of this ratio approaches 1 (that is, as persons A to C speak more evenly). Specifically, when the ratio of the speech amounts of persons A to C is 1:1.2:1.5, the communication analysis unit 36 calculates the first score based on the difference between 1 (the minimum)/1.5 (the maximum) and 1.
  • the communication analysis unit 36 for example, based on the acquired information indicating the focus period, sets the second score related to focus to a larger value as the average value of the focus periods of persons A to C increases.
  • the communication analysis unit 36 calculates the sum of the first score and the second score as the final score indicating the quality of communication in the space 70.
  • Each of persons A to C can check the final score indicating the quality of communication (that is, the analysis result of the quality of communication), for example, by accessing the communication analysis system 10 (information processing system 30) using an information terminal such as a smartphone or a personal computer.
  • FIG. 18 is a diagram showing an example of a score display screen indicating the quality of communication displayed on the information terminal.
  • the communication analysis unit 36 can analyze the quality of communication performed by a plurality of people in the space 70 based on the speech volume estimation result and the focus determination result.
  • the communication analysis unit 36 may analyze the quality of communication performed by a plurality of people in the space 70 based on at least one of the speech volume estimation result and the focus determination result.
  • the score calculation criteria as described above are merely an example, and the score may be appropriately determined according to the content required for communication in the space 70 .
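Under those caveats, the scoring might be sketched as follows; the weights and the normalization by meeting length are assumptions, since the patent specifies only the min/max speech-ratio idea for the first score and the average focus period for the second.

```python
def final_score(speech_amounts, focus_periods, meeting_seconds,
                w_speech=50.0, w_focus=50.0):
    """Combine the two sub-scores described above."""
    # First score: min/max ratio of speech amounts, 1.0 when perfectly even.
    balance = min(speech_amounts.values()) / max(speech_amounts.values())
    first = w_speech * balance          # e.g. 1/1.5 for the 1:1.2:1.5 case
    # Second score: larger when the average focus period is longer.
    mean_focus = sum(focus_periods.values()) / len(focus_periods)
    second = w_focus * mean_focus / meeting_seconds
    return first + second

# Speech amounts in the 1:1.2:1.5 ratio of the example, focus in seconds.
score = final_score({"A": 600.0, "B": 720.0, "C": 900.0},
                    {"A": 2400.0, "B": 2700.0, "C": 2000.0},
                    meeting_seconds=3600.0)
```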
  • the microphone array 21, the camera 22, and the ranging sensor 23 are installed on the desk 40.
  • the microphone array 21, camera 22 and ranging sensor 23 may be installed on the ceiling.
  • Also, the microphone array 21, the camera 22, and the ranging sensor 23 do not have to be concentrated in one place, and may be installed in different places from one another.
  • the sound source detection unit 31 and the human detection unit 32 are used together to detect the speaker, but the speaker can also be detected with the human detection unit 32 alone.
  • the human detection unit 32 can detect whether or not the person is speaking from the movement of the person in the video.
  • The communication analysis unit 36 may calculate the final score indicating the quality of communication (that is, the analysis result of the quality of communication) in real time while a meeting or the like is being held in the space 70, and the environment in the space 70 may be controlled in real time based on the calculated final score.
  • For example, when the communication analysis unit 36 determines that the calculated final score is less than a predetermined value (that is, that the communication is not active), it controls the environment of the space 70 by transmitting a control signal to an environment control device (not shown) installed in the space 70.
  • Examples of the environment control device include an air conditioner that controls the temperature environment of the space 70, a lighting device that controls the light environment of the space 70, a fragrance generator that controls the scent of the space 70, and a music playback device that controls the sound environment of the space 70.
  • the communication analysis unit 36 can activate communication in the space 70 by raising the set temperature of the air conditioner or making the lighting equipment brighter than it is now.
  • The communication analysis unit 36 may also activate communication in the space 70 by operating the fragrance generator installed in the space 70 or by causing the music playback device installed in the space 70 to play music.
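A minimal sketch of this real-time environment control; the threshold and the device objects with their method names are hypothetical stand-ins for the "predetermined value" and the environment control equipment named in the text:

```python
SCORE_THRESHOLD = 60.0  # assumed value for "a predetermined value"

def control_environment(score, air_conditioner, lighting):
    """Nudge the room when communication is judged inactive. The device
    objects and their method names are hypothetical, not a real API."""
    if score < SCORE_THRESHOLD:
        air_conditioner.raise_set_temperature(delta_celsius=1.0)
        lighting.set_brightness_percent(minimum=80)
```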
  • As described above, the focus determination system 39 includes: a human detection unit 32 that acquires video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detects, based on the acquired video information, a first position that is the position of each of the plurality of people in the space 70; a head orientation detection unit 33 that acquires the video information and detects, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination unit 35 that determines the target person's focus on communication conducted by the plurality of people in the space 70, based on the detected first position and the detected orientation of the target person's head. When a person other than the target person among the plurality of people is speaking, the focus determination unit 35 determines a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display 50, or the whiteboard 60 as a period in which the target person is focusing on the communication.
  • Such a focus determination system 39 can determine how much the target person is focused on communication.
  • a desk 40 is installed in the space 70 , and the human detection unit 32 acquires video information from the camera 22 installed on the desk 40 .
  • Such a focus determination system 39 can acquire video information from the camera 22 installed on the desk 40 .
  • a plurality of cameras 22 are installed in the space 70 , and the image information acquired by the human detection unit 32 includes image information of images captured by each of the plurality of cameras 22 .
  • Such a focus determination system 39 can acquire video information from multiple cameras 22 installed on the desk 40 .
  • a ranging sensor 23 that measures the distance from the camera 22 to the subject is installed.
  • the human detection unit 32 detects the first position based on the acquired image information and the detection result of the distance measuring sensor 23 .
  • Such a focus determination system 39 can detect the first position based on the detection result of the distance measuring sensor 23 .
  • For example, in order to detect the first position, the human detection unit 32 estimates the distance from the camera 22 to the target person based on the size of the target person in the image.
  • Such a focus determination system 39 can estimate the distance from the camera 22 to the target person based on the size of the target person in the image.
  • the communication analysis system 10 also includes a focus determination system 39 and a communication analysis unit 36 that analyzes the quality of communication conducted by a plurality of people in the space 70 based on the determined focus of the subject.
  • Such a communication analysis system 10 can analyze the quality of communication based on the focus of the target person.
  • The communication analysis system 10 further includes: a sound source detection unit 31 that acquires sound information of the sound captured in the space 70 and detects, based on the acquired sound information, a second position that is the position of a sound source in the space 70; and a speech amount estimation unit 34 that tracks the target person based on the detected first position and the detected second position and estimates the target person's amount of speech during the tracking.
  • the communication analysis unit 36 analyzes the quality of communication based on the determined focus of the subject and the estimated speech volume of the subject.
  • Such a communication analysis system 10 can analyze the quality of communication based on the subject's focus and the subject's utterance volume.
  • A focus determination method executed by a computer such as the focus determination system 39 includes: a first detection step of acquiring video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detecting, based on the acquired video information, a first position that is the position of each of the plurality of people in the space 70; a second detection step of acquiring the video information and detecting, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination step of determining the target person's focus on communication conducted by the plurality of people in the space 70, based on the detected first position and the detected orientation of the target person's head. In the focus determination step, when a person other than the target person among the plurality of people is speaking, a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display 50, or the whiteboard 60 is determined as a period in which the target person is focusing on the communication.
  • Such a focus determination method can determine how much the target person is focused on communication.
  • the communication analysis system was implemented by multiple devices, but it may be implemented as a single device.
  • For example, the communication analysis system may be implemented as a single device corresponding to the information processing system, the speaker diarization system, or the focus determination system.
  • the functional components included in the communication analysis system may be distributed to the multiple devices in any way.
  • the communication method between devices in the above embodiment is not particularly limited. Further, a relay device (not shown) may intervene in communication between devices.
  • processing executed by a specific processing unit may be executed by another processing unit.
  • order of multiple processes may be changed, and multiple processes may be executed in parallel.
  • each component may be realized by executing a software program suitable for each component.
  • Each component may be realized by reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory by a program execution unit such as a CPU or processor.
  • each component may be realized by hardware.
  • each component may be a circuit (or integrated circuit). These circuits may form one circuit as a whole, or may be separate circuits. These circuits may be general-purpose circuits or dedicated circuits.
  • the present invention may be implemented as a speech amount estimation method executed by a computer such as a speaker diarization system, or may be implemented as a program for causing a computer to execute such a speech amount estimation method.
  • Alternatively, it may be realized as a computer-readable non-transitory recording medium on which such a program is recorded.
  • Similarly, the present invention may be implemented as a focus determination method executed by a computer such as the focus determination system, as a program for causing a computer to execute such a focus determination method, or as a computer-readable non-transitory recording medium on which such a program is recorded.
  • Reference numerals: 10 communication analysis system, 20 sensing device, 21 microphone array, 22 camera, 23 ranging sensor (sensor), 30 information processing system, 31 sound source detection unit, 32 person detection unit, 33 head orientation detection unit, 34 speech amount estimation unit, 35 focus determination unit, 36 communication analysis unit, 37 storage unit, 38 speaker diarization system, 39 focus determination system, 40 desk, 50 display, 60 whiteboard, 70 space

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A focus determining system (39) comprises: a person detecting unit (32) for detecting first positions, which are the positions of a plurality of persons in a space (70), on the basis of video information; a head orientation detecting unit (33) for detecting the orientation of the head of a subject among the plurality of persons, on the basis of the video information; and a focus determining unit (35) for determining, on the basis of the detected first positions and the detected orientation of the head of the subject, that when a person among the plurality of persons, other than the subject, is speaking, a period in which the detected orientation of the head of the subject is facing any of a person among the plurality of persons, other than the subject, a display (50), or a whiteboard (60), is a period during which the subject is focusing on communication.

Description

Focus determination system, communication analysis system, and focus determination method
The present invention relates to a focus determination system, a communication analysis system, and a focus determination method.
In an organization such as a company, it is important for employees to communicate closely with each other while working on their own tasks. As a technology related to such communication, Patent Literature 1 discloses an information providing apparatus that provides information on the psychological state of a subject in order to evaluate intellectual activity in meetings, discussions, and the like.
Patent Literature 1: JP-A-2004-112518
The present invention provides a focus determination system and the like that can determine how much a target person is focusing on communication.
A focus determination system according to one aspect of the present invention includes: a human detection unit that acquires video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detects, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a head orientation detection unit that acquires the video information and detects, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination unit that determines the target person's focus on communication conducted by the plurality of people in the space, based on the detected first position and the detected orientation of the target person's head. When a person other than the target person among the plurality of people is speaking, the focus determination unit determines a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard as a period in which the target person is focusing on the communication.
A communication analysis system according to one aspect of the present invention includes the above focus determination system and a communication analysis unit that analyzes the quality of communication conducted by the plurality of people in the space based on the determined focus of the target person.
A focus determination method according to one aspect of the present invention includes: a first detection step of acquiring video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detecting, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a second detection step of acquiring the video information and detecting, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination step of determining the target person's focus on communication conducted by the plurality of people in the space, based on the detected first position and the detected orientation of the target person's head. In the focus determination step, when a person other than the target person among the plurality of people is speaking, a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard is determined as a period in which the target person is focusing on the communication.
A program according to one aspect of the present invention is a program for causing a computer to execute the above focus determination method.
The focus determination system and the like of the present invention can determine how much the target person is focusing on communication.
FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment.
FIG. 2 is a diagram showing an example of a space to which the communication analysis system according to the embodiment is applied.
FIG. 3 is an external view of the sensing device according to the embodiment.
FIG. 4 is a flowchart of the operation of the sound source detection unit included in the communication analysis system according to the embodiment.
FIG. 5 is a diagram schematically showing detection results of the position of a sound source.
FIG. 6 is a flowchart of the operation of the human detection unit included in the communication analysis system according to the embodiment.
FIG. 7 is a diagram schematically showing detection results of a person's position.
FIG. 8 is a flowchart of the operation of the speech amount estimation unit included in the communication analysis system according to the embodiment.
FIG. 9 is a diagram schematically showing detection results of the position of a speaker.
FIG. 10 is a diagram showing an example of information indicating the amount of speech.
FIG. 11 is a flowchart of the operation of the head orientation detection unit included in the communication analysis system according to the embodiment.
FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit included in the communication analysis system according to the embodiment.
FIG. 13 is a diagram for explaining the correction of the angle of the head orientation.
FIG. 14 is a flowchart of the operation of the focus determination unit included in the communication analysis system according to the embodiment.
FIG. 15 is a diagram for explaining the target direction of the target person.
FIG. 16 is a diagram showing an example of information indicating a focus period.
FIG. 17 is a flowchart of the operation of the communication analysis unit included in the communication analysis system according to the embodiment.
FIG. 18 is a diagram showing an example of a score display screen indicating the quality of communication displayed on an information terminal.
Hereinafter, embodiments will be specifically described with reference to the drawings. Note that the embodiments described below each show a comprehensive or specific example. The numerical values, shapes, materials, components, arrangement positions and connection forms of components, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present invention. Further, among the components in the following embodiments, components not described in the independent claims are described as optional components.
 なお、各図は模式図であり、必ずしも厳密に図示されたものではない。また、各図において、実質的に同一の構成に対しては同一の符号を付し、重複する説明は省略または簡略化される場合がある。 It should be noted that each figure is a schematic diagram and is not necessarily strictly illustrated. Moreover, in each figure, the same code|symbol is attached|subjected with respect to substantially the same structure, and the overlapping description may be abbreviate|omitted or simplified.
(Embodiment)
[Configuration]
First, the configuration of the communication analysis system according to the embodiment will be described. FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment. FIG. 2 is a diagram showing an example of a space to which the communication analysis system is applied.

As shown in FIG. 1 and FIG. 2, the communication analysis system 10 is a system used, for example, in an office having a space 70 such as a conference room, for analyzing the quality of communication among a plurality of people located in the space 70. The space 70 is, for example, a closed space, but may be an open space. Besides a conference room, the space 70 may be, for example, an open rest area within an office space (a place where chairs and a table are placed in part of the office space). The space 70 also need not be physically partitioned, and may be a place within a larger space that is demarcated by illumination light, airflow, or the like. For example, a warm-color area with a color temperature of 3000 K may be provided in a corner of an office space illuminated with daylight-color light with a color temperature of 5000 K, and this area may serve as the space 70.
The communication analysis system 10 includes a sensing device 20 and an information processing system 30. First, the sensing device 20 will be described with reference to FIG. 3 in addition to FIG. 1 and FIG. 2. FIG. 3 is an external view of the sensing device 20; (a) of FIG. 3 is a top view of the sensing device 20, and (b) of FIG. 3 is a side view of the sensing device 20. Note that the ranging sensor 23 is not shown in FIG. 3.

The sensing device 20 is installed on a desk 40 placed in the space 70 and senses sound, video, and the like in the space 70. Specifically, the sensing device 20 is installed at the center of the top of the desk 40. The sensing device 20 includes a microphone array 21, a plurality of cameras 22, and a ranging sensor 23.

The microphone array 21 acquires sound in the space 70 and outputs sound information (a plurality of sound signals) of the acquired sound. Specifically, the microphone array 21 includes a plurality of microphone elements, and each of the microphone elements acquires sound in the space 70 and outputs a sound signal of the acquired sound.

Each of the cameras 22 captures video (in other words, moving images) showing the people staying in the space 70 and outputs video information of the video. The camera 22 is a general camera implemented by a CMOS image sensor or the like, but may be a fisheye camera or the like. The sensing device 20 includes four cameras so that the entire surroundings of the sensing device 20 can be captured from the top of the desk 40; it suffices, however, that the sensing device 20 includes at least one camera capable of capturing all the people staying in the space 70.

The ranging sensor 23 measures the distance from the sensing device 20 (camera 22) to an object and outputs distance information indicating the measured distance to the object. The object is, for example, a person staying in the space 70. The ranging sensor 23 is, for example, a TOF (Time Of Flight) LiDAR (Light Detection and Ranging), but may be a range image sensor or the like. The sensing device 20 needs to include at least one ranging sensor 23, but may include a plurality of ranging sensors 23 corresponding to the cameras 22.
Next, the information processing system 30 will be described. The information processing system 30 communicates with the sensing device 20 in a wired or wireless manner and analyzes the quality of communication based on sensing information (specifically, sound information, video information, distance information, and the like) acquired from the sensing device 20 through the communication. The information processing system 30 is, for example, an edge computer installed in the facility having the space 70, but may be a cloud computer installed outside the facility. When the information processing system 30 is an edge computer, the sensing device 20 and the information processing system 30 may be implemented as a single integrated device. Alternatively, some functions of the information processing system 30 may be implemented by an edge computer and the other functions by a cloud computer.

Specifically, the information processing system 30 includes a sound source detection unit 31, a person detection unit 32, a head orientation detection unit 33, a speech amount estimation unit 34, a focus determination unit 35, a communication analysis unit 36, and a storage unit 37.
The sound source detection unit 31 acquires, from the sensing device 20, sound information of sound captured in the space 70, and detects a second position, which is the position of a sound source in the space 70, based on the acquired sound information.

The person detection unit 32 acquires, from the sensing device 20, video information of video showing the people staying in the space 70, and detects a first position, which is the position of a person in the space 70, based on the acquired video information.

The head orientation detection unit 33 acquires, from the sensing device 20, video information of video showing the people staying in the space 70, and detects the orientation of a person's head (in other words, the orientation of the face) based on the acquired video information. The head orientation detection unit 33 may also detect the direction of the person's line of sight based on the acquired video information.

The speech amount estimation unit 34 estimates the amount of a person's speech based on the first position detected by the person detection unit 32 and the second position detected by the sound source detection unit 31.

The focus determination unit 35 determines, based on the first position detected by the person detection unit 32 and the head orientation detected by the head orientation detection unit 33, the person's focus on communication performed in the space 70 by a plurality of people including that person.

The communication analysis unit 36 analyzes the quality of the communication based on at least one of the amount of speech estimated by the speech amount estimation unit 34 and the focus determined by the focus determination unit 35. The communication analysis unit 36 also outputs the analysis result.
Each of the sound source detection unit 31, the person detection unit 32, the head orientation detection unit 33, the speech amount estimation unit 34, the focus determination unit 35, and the communication analysis unit 36 described above is implemented by a microcomputer or a processor. The functions of these units are implemented, for example, by the microcomputer or processor executing a computer program stored in the storage unit 37.

The storage unit 37 is a storage device that stores the computer program and the information necessary for implementing the functions of the above components. The storage unit 37 is implemented by, for example, an HDD (Hard Disk Drive), but may be implemented by a semiconductor memory.

A system including the sound source detection unit 31, the person detection unit 32, and the speech amount estimation unit 34 is also referred to as a speaker diarization system 38. That is, the speaker diarization system 38 includes the sound source detection unit 31, the person detection unit 32, and the speech amount estimation unit 34. The speaker diarization system 38 may further include the head orientation detection unit 33 or the focus determination unit 35.

A system including the person detection unit 32, the head orientation detection unit 33, and the focus determination unit 35 is also referred to as a focus determination system 39. That is, the focus determination system 39 includes the person detection unit 32, the head orientation detection unit 33, and the focus determination unit 35. The focus determination system 39 may further include the sound source detection unit 31 or the speech amount estimation unit 34.
[Operation of Sound Source Detection Unit]
Next, the operation of the sound source detection unit 31 will be described in more detail. FIG. 4 is a flowchart of the operation of the sound source detection unit 31.

First, the sound source detection unit 31 acquires sound information of sound captured in the space 70 from the microphone array 21 of the sensing device 20 (S11). Specifically, the sound source detection unit 31 acquires the plurality of sound signals output by the plurality of microphone elements included in the microphone array 21.

Each of the acquired sound signals is a time-domain signal. The sound source detection unit 31 converts each of the sound signals from a time-domain signal into a frequency-domain signal by applying a Fourier transform (S12).

Next, the sound source detection unit 31 calculates a spatial correlation matrix from an input vector determined from the frequency-domain sound signals (S13). Here, assuming that the microphone array 21 includes M microphone elements and that the m-th Fourier-transformed sound signal is X_m(ω, t), the input vector x(ω, t) is expressed by the following equation, where T denotes the transpose.
x(ω, t) = [X_1(ω, t), X_2(ω, t), …, X_M(ω, t)]^T
The spatial correlation matrix R is expressed by the following equation, where H denotes the conjugate transpose and E[·] denotes the expectation (a time average over frames). In the following, the frequency index ω is omitted for simplicity of description.
R = E[x(t) x(t)^H]
Next, the sound source detection unit 31 calculates eigenvectors by applying eigenvalue decomposition to the above spatial correlation matrix (S14). Specifically, by decomposing the spatial correlation matrix based on the following equation, the sound source detection unit 31 can calculate the eigenvectors e_1, …, e_M and the eigenvalues λ_1, …, λ_M.
R = E Λ E^H, where E = [e_1, …, e_M] and Λ = diag(λ_1, …, λ_M)
Next, the sound source detection unit 31 detects the position of the sound source from the eigenvectors (S15). Specifically, the eigenvectors allow the sound source detection unit 31 to identify the loudness of sound and the direction from which the sound arrives, and the unit can detect the direction from which a relatively loud sound arrives as the direction (position) of the sound source.

As a result, as shown in FIG. 5, the sound source detection unit 31 can detect in which direction (at which angle) a sound source is located with respect to the position O of the sensing device 20. FIG. 5 is a diagram schematically showing a detection result of the positions of sound sources (the desk 40 viewed from above). In the example of FIG. 5, two sound sources, S1 and S2, are detected. Note that it suffices for the sound source detection unit 31 to detect at least the two-dimensional position of a sound source as shown in FIG. 5 (a direction expressed as an angle in top view), but it may detect the three-dimensional position of the sound source. In the following description, the position of a sound source detected by the sound source detection unit 31 is also referred to as the second position.

The sound source detection unit 31 can track the position of the sound source (the second position) by repeating the operation of FIG. 4 every unit time. Note that a sound source is specifically a speaker (person) staying in the space 70, but may also be a device installed in the space 70.
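The description fixes the preprocessing (spatial correlation matrix and eigendecomposition) but not the exact direction-scan step. The following is a minimal Python sketch of steps S11 to S15 that fills that gap with a standard MUSIC-style spatial spectrum; the circular-array geometry and the parameters mic_angles_rad, radius_m, freq_hz, and n_sources are illustrative assumptions, not values from the source.

```python
import numpy as np

def estimate_source_directions(frames, mic_angles_rad, radius_m,
                               freq_hz=1000.0, speed_of_sound=343.0,
                               n_sources=2):
    # frames: complex STFT values of shape (T, M) at one frequency bin,
    # one column per microphone element (output of step S12).
    T, M = frames.shape
    # S13: spatial correlation matrix R = E[x x^H], averaged over frames.
    R = np.einsum('tm,tn->mn', frames, frames.conj()) / T
    # S14: eigenvalue decomposition; eigh returns eigenvalues ascending,
    # so the first M - n_sources eigenvectors span the noise subspace.
    eigvals, eigvecs = np.linalg.eigh(R)
    noise = eigvecs[:, :M - n_sources]
    # S15: scan candidate directions; peaks of the MUSIC spectrum mark
    # the directions from which relatively loud sound arrives.
    thetas = np.deg2rad(np.arange(360))
    spectrum = np.empty(len(thetas))
    for i, theta in enumerate(thetas):
        # Steering vector of a plane wave from direction theta for the
        # assumed circular array.
        delays = radius_m * np.cos(theta - mic_angles_rad) / speed_of_sound
        a = np.exp(-2j * np.pi * freq_hz * delays)
        spectrum[i] = 1.0 / (np.linalg.norm(noise.conj().T @ a) ** 2 + 1e-12)
    # Crude peak picking; a real implementation would find local maxima.
    return np.rad2deg(thetas[np.argsort(spectrum)[-n_sources:]])
```

A plain delay-and-sum beamformer scan would also fit the description; MUSIC is used here only because it works directly on the eigenvectors computed in step S14.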
[Operation of Person Detection Unit]
Next, the operation of the person detection unit 32 will be described in more detail. FIG. 6 is a flowchart of the operation of the person detection unit 32. Note that the following operations are actually performed on the video information acquired from each of the four cameras 22; for convenience, however, the description below assumes that video information is acquired from a single camera 22.

First, the person detection unit 32 acquires video information of video captured in the space 70 from the camera 22 of the sensing device 20 (S21).

Next, the person detection unit 32 identifies, based on the acquired video information, the regions of the video in which people appear (S22). The person detection unit 32 can identify the regions in which people appear by a method using pattern matching, a method using a machine learning model, or the like.

Next, the person detection unit 32 assigns identification information to each identified region (that is, each region where a person is present) (S23). For example, the person detection unit 32 identifies three regions and assigns the identification information A, B, and C to them. Hereinafter, the person corresponding to region A is also referred to as person A, the person corresponding to region B as person B, and the person corresponding to region C as person C.

Next, the person detection unit 32 identifies the direction in which each person is located based on the position of the region identified in step S23 within the video (S24). Information indicating the installation position of the camera 22 (the center of the top of the desk 40) and the imaging range (angle of view) of the camera 22 is stored in the storage unit 37 in advance, so the person detection unit 32 can identify which direction a position in the video corresponds to.

Next, the person detection unit 32 estimates the distance from the sensing device 20 (camera 22) to each person based on the size of the region identified in step S23 (S25). In this case, the larger the region identified in step S23, the shorter the distance from the sensing device 20 (camera 22) to the person is estimated to be. Note that, in step S25, the distance from the sensing device 20 (camera 22) to the person may instead be determined from the distance information (measured distance values) acquired from the ranging sensor 23.

As a result of steps S24 and S25, the person detection unit 32 can detect the positions of the people in the space 70, as shown in FIG. 7. FIG. 7 is a diagram schematically showing a detection result of the positions of people (the desk 40 viewed from above). In the example of FIG. 7, the positions of three people, person A, person B, and person C, are detected. Note that the person detection unit 32 detects the three-dimensional position of a person (the three-dimensional coordinates of the person's position), but it suffices to detect at least the two-dimensional position of the person as shown in FIG. 7 (a direction expressed as an angle in top view). In the following description, the position of a person detected by the person detection unit 32 is also referred to as the first position.

The person detection unit 32 can track the position of each person (the first position) by repeating the operation of FIG. 6 every unit time. In this case, the assignment of identification information in step S23 need only be performed the first time.
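As a concrete illustration of steps S24 and S25, the sketch below maps a detected bounding box to a direction and a distance under a pinhole-camera model. The box format, the calibration pair ref_height_px / ref_dist_m, and the function name are hypothetical; the source states only that direction comes from the region's position and distance from the region's size (or from the ranging sensor 23).

```python
import math

def direction_and_distance(box, image_width_px, fov_deg,
                           ref_height_px=400.0, ref_dist_m=1.0):
    # box: (u, v, w_px, h_px), the region of one detected person (S22).
    u, v, w_px, h_px = box
    # S24: horizontal pixel offset from the image centre -> angle,
    # using the pinhole relation offset = f * tan(angle).
    offset = (u + w_px / 2.0) / image_width_px - 0.5   # range -0.5 .. +0.5
    angle_rad = math.atan(2.0 * offset * math.tan(math.radians(fov_deg) / 2.0))
    # S25: a larger region means a closer person; under a pinhole model
    # the apparent height is inversely proportional to distance.
    distance_m = ref_dist_m * ref_height_px / h_px
    return math.degrees(angle_rad), distance_m
```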
[Operation of Speech Amount Estimation Unit]
Next, the operation of the speech amount estimation unit 34 will be described in more detail. FIG. 8 is a flowchart of the operation of the speech amount estimation unit 34.

The speech amount estimation unit 34 acquires the first position detected by the person detection unit 32 and the second position detected by the sound source detection unit 31 (S31). The first position and the second position acquired here are those detected at substantially the same timing, where "substantially the same" means that some deviation is permissible.

Next, the speech amount estimation unit 34 converts the first position, expressed in three-dimensional coordinates, into two-dimensional coordinates (an angle) comparable to the second position (S32). Note that, when the position of the microphone array 21 and the position of the camera 22 differ, the converted first position is corrected based on the difference between these positions.

Next, the speech amount estimation unit 34 detects a speaker (the speaker's position) by matching the converted first position against the second position (S33). FIG. 9 is a diagram schematically showing a detection result of the positions of speakers; it superimposes the second positions (FIG. 5) on the first positions (FIG. 7).

For example, when the angle difference Δθ1 between the second position of sound source S1 and the first position of person A is less than or equal to a predetermined value, the speech amount estimation unit 34 determines that sound source S1 is person A; that is, person A is detected as a speaker. Similarly, when the angle difference Δθ2 between the second position of sound source S2 and the first position of person C is less than or equal to a predetermined value, the speech amount estimation unit 34 determines that sound source S2 is person C; that is, person C is detected as a speaker.

By repeating the operation of FIG. 8 every unit time, the speech amount estimation unit 34 can track each of person A, person B, and person C and estimate the amount of speech of each. Specifically, the speech amount estimation unit 34 can estimate the period during which each of person A, person B, and person C is detected as a speaker to be the period during which that person is speaking. That is, the speech amount estimation unit 34 can store in the storage unit 37 information indicating the period during which each of person A, person B, and person C is speaking (information indicating the amount of speech). FIG. 10 is a diagram showing an example of information indicating the amount of speech. As shown in FIG. 10, the information indicating the amount of speech is information in which an amount of speech (speaking time) is associated with each piece of identification information assigned in step S23.

In this way, the speech amount estimation unit 34 can track each of a plurality of people based on the first positions detected by the person detection unit 32 and the second positions detected by the sound source detection unit 31, and can estimate the amount of speech of each of the people staying in the space 70 (the amount of speech of a target person). This method of estimating the amount of speech is useful for analyzing the quality of communication in meetings and the like in which people move from their seats, for example to use the whiteboard 60.

Moreover, rather than identifying individuals by voice recognition or image recognition, the speech amount estimation unit 34 can estimate the amount of speech of each person while maintaining the anonymity of the people.
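A minimal sketch of the projection in step S32, the matching in step S33, and the accumulation behind FIG. 10. The 10° matching threshold is an assumption (the source says only "a predetermined value"), and the function names and the per-step duration dt_s are illustrative.

```python
import math

def to_angle_deg(xyz):
    # S32: project a three-dimensional first position onto the top view
    # and express it as an angle, comparable to a second position.
    x, y, _ = xyz
    return math.degrees(math.atan2(y, x))

def detect_speakers(person_angles_deg, source_angles_deg, tol_deg=10.0):
    # S33: a person whose first position (as an angle) lies within
    # tol_deg of a sound-source angle (second position) is a speaker.
    def diff(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return [pid for pid, p in person_angles_deg.items()
            if any(diff(p, s) <= tol_deg for s in source_angles_deg)]

def accumulate_speech(totals, speakers, dt_s=1.0):
    # Add one unit-time step per detected speaker (the FIG. 10 record).
    for pid in speakers:
        totals[pid] = totals.get(pid, 0.0) + dt_s
    return totals

# Example matching FIG. 9: sound sources detected near persons A and C.
totals = accumulate_speech({}, detect_speakers(
    {"A": 40.0, "B": 120.0, "C": 265.0}, [38.5, 262.0]))
# totals == {"A": 1.0, "C": 1.0}
```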
[Operation of Head Orientation Detection Unit]
Next, the operation of the head orientation detection unit 33 will be described in more detail. FIG. 11 is a flowchart of the operation of the head orientation detection unit 33. FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit 33. As shown in FIG. 12, in the following description of the operation of the head orientation detection unit 33, the camera 22 is assumed to be installed not at the center position O (0, 0, 0) of the desk 40 but at a position C (x0, y0, z0) on the desk 40. In addition to the three-dimensional coordinates of the space 70 indicated by X-Y-Z, FIG. 12 also shows the in-image coordinates indicated by U-V.

First, the head orientation detection unit 33 acquires video information of video captured in the space 70 from the camera 22 of the sensing device 20 (S41).

Next, the head orientation detection unit 33 identifies the orientation of a person's head appearing in the video based on the acquired video information (S42). The head orientation detection unit 33 can detect the orientation of the person's head, for example, by identifying the region of the video corresponding to the person's head through face recognition processing and applying a machine learning model to the identified region. As shown in FIG. 12, the vector indicating the head orientation at this point forms an angle of A° with the line segment (straight line) connecting the person's position P and the camera position C.

Next, the head orientation detection unit 33 acquires the position of the person (more precisely, the position of the person's head) detected by the person detection unit 32 (S43). As shown in FIG. 12, the person's position at this point is P (x1, y1, z1).

For example, using z1 − z0, which is determined by the actual size of the desk 40, the horizontal viewing angle α of the camera 22, the person's position (u, v) in the video, and the width w of the video, x1 can be expressed by the following equation. Information such as the size of the desk 40, the horizontal viewing angle α of the camera 22, and the width w of the video is stored in the storage unit 37 in advance.
x1 = x0 + ((u - w/2)/w) × (z1 - z0) × tan(α)
Next, the head orientation detection unit 33 corrects the head orientation (angle A) identified in step S42 with respect to the position of the camera 22, based on the person's position acquired in step S43 (specifically, the value of coordinate x1) (S44). FIG. 13 is a diagram for explaining this angle correction (the desk 40 viewed from above).

The head orientation detection unit 33 can calculate the angle ∠OPC in FIG. 13 based on the coordinate x1 acquired in step S43, and can take A + ∠OPC as the corrected angle.

The head orientation detection unit 33 can track the orientation of a person's head by repeating the operation of FIG. 11 every unit time.
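A minimal sketch of the x1 equation and the correction in step S44, working in the top view of FIG. 13. The function names are illustrative, and the sign handling assumes the head angle A is measured on the same rotational side as ∠OPC, which the description leaves implicit.

```python
import math

def x1_from_image(u, w, x0, z1, z0, alpha_rad):
    # The x1 equation above: horizontal world coordinate of the person
    # from the in-image horizontal position u.
    return x0 + ((u - w / 2.0) / w) * (z1 - z0) * math.tan(alpha_rad)

def corrected_head_angle_deg(angle_a_deg, person_xy, camera_xy):
    # S44: correct the head angle A, measured relative to the line PC
    # (person P to camera C), into an angle measured relative to the
    # line PO (person P to the desk-centre reference O at the origin).
    px, py = person_xy
    cx, cy = camera_xy
    angle_po = math.atan2(-py, -px)           # direction of ray P -> O
    angle_pc = math.atan2(cy - py, cx - px)   # direction of ray P -> C
    d = math.degrees(angle_po - angle_pc)
    angle_opc = abs((d + 180.0) % 360.0 - 180.0)
    return angle_a_deg + angle_opc            # A + angle OPC, as in FIG. 13
```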
[Operation of Focus Determination Unit]
Next, the operation of the focus determination unit 35 will be described in more detail. FIG. 14 is a flowchart of the operation of the focus determination unit 35.

The focus determination unit 35 acquires the first positions detected by the person detection unit 32 and the head orientations detected by the head orientation detection unit 33 (S51). More precisely, the first positions acquired here are the first positions of each of the people staying in the space 70, and the head orientations are those of each of these people.

Next, the focus determination unit 35 takes one of the people as a target person and determines the target direction of the target person (S52). FIG. 15 is a diagram for explaining the target direction of the target person (the desk 40 viewed from above).

For example, when the person at position P is the target person, the focus determination unit 35 determines the direction connecting position P to position G, where another person is located, as a target direction. Note that a target direction does not necessarily have to be determined toward a person; as described later, target directions may also be determined toward the display 50 and the whiteboard 60.

Next, the focus determination unit 35 determines an allowable range centered on the target direction (S53). For example, in FIG. 15, the range from ∠GPO − β to ∠GPO + β is determined as the allowable range, where β is a coefficient that defines the allowable range. The value of β differs depending on whether the target direction points toward a person, the display 50, or the whiteboard 60.

Next, the focus determination unit 35 determines the focus of the target person by comparing the target person's head orientation acquired in step S51 with the allowable range determined in step S53 (S54). Focus here means focus on the communication performed in the space 70 by the plurality of people including the target person. When the target person's head orientation is within the allowable range, the target person is considered to be looking at another person, so the focus determination unit 35 determines that the target person is focusing on the communication. Conversely, when the target person's head orientation is outside the allowable range, the target person is considered not to be looking at another person, so the focus determination unit 35 determines that the target person is not focusing on the communication.
By repeating the operation of FIG. 14 every unit time with each of the people (person A, person B, and person C) as the target person, the focus determination unit 35 can determine the focus of each of person A, person B, and person C. Specifically, the focus determination unit 35 can accumulate the periods during which each of person A, person B, and person C is determined to be focusing, and store in the storage unit 37 information indicating the periods during which the people are focusing on the communication. FIG. 16 is a diagram showing an example of information indicating the focus periods. As shown in FIG. 16, the information indicating the focus periods is information in which a focus period is associated with each piece of identification information assigned in step S23.

In this way, the focus determination unit 35 can determine, based on the first positions detected by the person detection unit 32 and the head orientations detected by the head orientation detection unit 33, the target person's focus on the communication performed by the plurality of people in the space 70.

Note that the focus determination unit 35 may additionally acquire the second positions detected by the sound source detection unit 31 to detect the speaker's position, or may acquire the speaker's position detected by the speech amount estimation unit 34. In either case, the focus determination unit 35 can determine focus while taking into account whether each of the people is a speaker. For example, it may determine that the target person is focusing on the communication only when the target person's head is oriented toward a speaker, and that the target person is not focusing when the target person is looking toward a person who is not speaking.

When taking into account whether each of the people is a speaker, the focus determination unit 35 may also determine focus as follows.

For example, when a meeting is being held by a plurality of people in the space 70, the target person (say, person C) finds it difficult to turn toward person A when person A is located directly beside the target person, even while person A is speaking; the target person is then likely to look at the other person, person B, instead. The target person may also turn toward the display 50 (shown in FIG. 2) or the whiteboard 60 (shown in FIG. 2) on which meeting materials and the like are displayed. In such cases, if the target person were determined to be focusing on the communication only when the target person's head is oriented toward the speaker, as described above, a target person who is actually focusing on the communication would be determined not to be focusing.

Therefore, in such cases, while person A is speaking, the target person may be determined to be focusing on the communication during any of the following: (1) a period during which the target person faces person A; (2) a period during which the target person faces the display 50; (3) a period during which the target person faces a person other than person A (person B); and (4) a period during which the target person faces the whiteboard 60. In other words, while a person other than the target person among the plurality of people is speaking, the focus determination unit 35 may determine that a period during which the detected head orientation of the target person faces any of a person other than the target person among the plurality of people, the display 50, and the whiteboard 60 is a period during which the target person is focusing on the communication. Note that a period during which the target person faces the whiteboard 60 may be determined to be a period of focus on the communication under the condition that person A is speaking near the whiteboard.
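A minimal sketch of steps S53 and S54 combined with the relaxed rule (1) to (4) above. The width values beta_person, beta_display, and beta_board are illustrative, since the source states only that the coefficient β differs depending on the type of target; the behaviour when no one else is speaking is omitted for brevity.

```python
def within_range(head_deg, target_deg, beta_deg):
    # S53/S54: is the head orientation inside target ± β?
    return abs((head_deg - target_deg + 180.0) % 360.0 - 180.0) <= beta_deg

def is_focusing(head_deg, other_person_dirs, display_dir, whiteboard_dir,
                someone_else_speaking,
                beta_person=15.0, beta_display=20.0, beta_board=20.0):
    # While another person is speaking, facing any other participant, the
    # display 50, or the whiteboard 60 counts as focusing (rules (1)-(4)).
    if not someone_else_speaking:
        return False
    targets = [(d, beta_person) for d in other_person_dirs]
    targets += [(display_dir, beta_display), (whiteboard_dir, beta_board)]
    return any(within_range(head_deg, t, b) for t, b in targets)
```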
[Operation of Communication Analysis Unit]
Next, the operation of the communication analysis unit 36 will be described in more detail. FIG. 17 is a flowchart of the operation of the communication analysis unit 36.

The communication analysis unit 36 reads from the storage unit 37 the stored information indicating the amounts of speech (FIG. 10) and the stored information indicating the focus periods (FIG. 16) (S61).

Next, the communication analysis unit 36 analyzes the quality of the communication in the space 70 based on the acquired information indicating the amounts of speech and the acquired information indicating the focus periods. For example, the communication analysis unit scores the quality of the communication using the following scoring criteria.

For example, based on the acquired information indicating the amounts of speech, the communication analysis unit 36 calculates the ratio of the amounts of speech of persons A to C, and sets the first score, which relates to the amount of speech, to a larger value as the minimum/maximum of this ratio approaches 1 (that is, as persons A to C speak more evenly). Specifically, when the ratio of the amounts of speech of persons A to C is 1 : 1.2 : 1.5, the communication analysis unit 36 calculates the first score based on the difference between 1 (minimum) / 1.5 (maximum) and 1.

Also, based on the acquired information indicating the focus periods, the communication analysis unit 36 sets the second score, which relates to focus, to a larger value as the average of the focus periods of persons A to C increases, for example.

The communication analysis unit 36 then calculates the sum of the first score and the second score as a final score indicating the quality of the communication in the space 70. Each of persons A to C can check the final score indicating the quality of the communication (that is, the analysis result of the communication quality) by accessing the communication analysis system 10 (information processing system 30) using, for example, an information terminal such as a smartphone or a personal computer. FIG. 18 is a diagram showing an example of a display screen, shown on an information terminal, of a score indicating the quality of communication.

In this way, the communication analysis unit 36 can analyze the quality of the communication performed by the plurality of people in the space 70 based on the estimation result of the amounts of speech and the determination result of focus. Note that it suffices for the communication analysis unit 36 to analyze the quality of the communication based on at least one of the estimation result of the amounts of speech and the determination result of focus. The scoring criteria described above are also merely an example, and the score may be determined as appropriate according to what is required of the communication in the space 70.
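A minimal sketch of the scoring just described. The weights w1 and w2, the focus normalization meeting_s, and the use of the min/max ratio directly (a monotone equivalent of its difference from 1) are assumptions; the source fixes only the structure (first score from the evenness of the speech ratio, second score from the mean focus period, final score as their sum).

```python
def final_score(speech_s, focus_s, meeting_s, w1=50.0, w2=50.0):
    # speech_s / focus_s map person IDs to speaking time and focus period
    # in seconds (FIG. 10 and FIG. 16); meeting_s is the total meeting
    # length used to normalize the mean focus period.
    amounts = list(speech_s.values())
    evenness = min(amounts) / max(amounts)          # 1.0 = perfectly even
    first = w1 * evenness                           # larger when even
    mean_focus = sum(focus_s.values()) / len(focus_s)
    second = w2 * min(mean_focus / meeting_s, 1.0)  # larger when focused
    return first + second

# The 1 : 1.2 : 1.5 example from the text gives evenness 1/1.5.
score = final_score({"A": 600.0, "B": 720.0, "C": 900.0},
                    {"A": 1800.0, "B": 2400.0, "C": 2100.0},
                    meeting_s=3600.0)
```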
[Modifications]
In the embodiment described above, the microphone array 21, the camera 22, and the ranging sensor 23 are installed on the desk 40. However, the microphone array 21, the camera 22, and the ranging sensor 23 may instead be installed on the ceiling. Moreover, the microphone array 21, the camera 22, and the ranging sensor 23 need not be grouped in one place, and may be installed at mutually different locations.

In the above embodiment, the sound source detection unit 31 and the person detection unit 32 are used together to detect a speaker, but a speaker can also be detected by the person detection unit 32 alone. For example, the person detection unit 32 can detect whether a person is speaking from the movement of the person appearing in the video.

In the above embodiment, the communication analysis unit 36 may also calculate the final score indicating the quality of the communication (that is, the analysis result of the communication quality) in real time while a meeting or the like is being held in the space 70, and control the environment of the space 70 in real time based on the calculated final score.

For example, when the communication analysis unit 36 determines that the calculated final score is below a predetermined value (that is, that the communication is not active), it controls the environment of the space 70 by transmitting a control signal to environment control equipment (not shown) installed in the space 70.

Specifically, the environment control equipment includes an air conditioner that controls the temperature environment of the space 70, lighting equipment that controls the light environment of the space 70, a scent generator that controls the smell of the space 70, a music playback device that controls the sound environment of the space 70, and the like.

For example, the communication analysis unit 36 can stimulate communication in the space 70 by raising the set temperature of the air conditioner or making the lighting equipment brighter than it currently is. The communication analysis unit 36 may also stimulate communication in the space 70 by activating the scent generator installed in the space 70 or causing the music playback device installed in the space 70 to play music.
[Effects and the Like]
As described above, the focus determination system 39 includes: the person detection unit 32, which acquires video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detects first positions, which are the positions of the plurality of people in the space 70, based on the acquired video information; the head orientation detection unit 33, which acquires the video information and detects the head orientation of a target person among the plurality of people based on the acquired video information; and the focus determination unit 35, which determines the target person's focus on the communication performed by the plurality of people in the space 70 based on the detected first positions and the detected head orientation of the target person. While a person other than the target person among the plurality of people is speaking, the focus determination unit 35 determines that a period during which the detected head orientation of the target person faces any of a person other than the target person among the plurality of people, the display 50, and the whiteboard 60 is a period during which the target person is focusing on the communication.

Such a focus determination system 39 can determine to what extent the target person is focusing on the communication.
For example, the desk 40 is installed in the space 70, and the person detection unit 32 acquires the video information from the camera 22 installed on the desk 40.

Such a focus determination system 39 can acquire video information from the camera 22 installed on the desk 40.

For example, a plurality of cameras 22 are installed in the space 70, and the video information acquired by the person detection unit 32 includes video information of video captured by each of the cameras 22.

Such a focus determination system 39 can acquire video information from the plurality of cameras 22 installed on the desk 40.

For example, the ranging sensor 23, which measures the distance from the camera 22 to the target person, is installed in the space 70, and the person detection unit 32 detects the first positions based on the acquired video information and the detection result of the ranging sensor 23.

Such a focus determination system 39 can detect the first positions based on the detection result of the ranging sensor 23.

For example, to detect the first positions, the person detection unit 32 estimates the distance from the person detection unit 32 to the target person based on the size of the target person in the video.

Such a focus determination system 39 can estimate the distance from the person detection unit 32 to the target person based on the size of the target person in the video.
The communication analysis system 10 includes the focus determination system 39 and the communication analysis unit 36, which analyzes the quality of the communication performed by the plurality of people in the space 70 based on the determined focus of the target person.

Such a communication analysis system 10 can analyze the quality of the communication based on the focus of the target person.

For example, the communication analysis system 10 further includes: the sound source detection unit 31, which acquires sound information of sound captured in the space 70 and detects second positions, which are the positions of sound sources in the space 70, based on the acquired sound information; and the speech amount estimation unit 34, which tracks the target person and estimates the amount of speech of the target person being tracked, based on the detected first positions and the detected second positions. The communication analysis unit 36 analyzes the quality of the communication based on the determined focus of the target person and the estimated amount of speech of the target person.

Such a communication analysis system 10 can analyze the quality of the communication based on the focus of the target person and the amount of the target person's speech.

The focus determination method executed by a computer such as the focus determination unit 35 includes: a first detection step of acquiring video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detecting first positions, which are the positions of the plurality of people in the space 70, based on the acquired video information; a second detection step of acquiring the video information and detecting the head orientation of a target person among the plurality of people based on the acquired video information; and a focus determination step of determining the target person's focus on the communication performed by the plurality of people in the space 70 based on the detected first positions and the detected head orientation of the target person. In the focus determination step, while a person other than the target person among the plurality of people is speaking, a period during which the detected head orientation of the target person faces any of a person other than the target person among the plurality of people, the display 50, and the whiteboard 60 is determined to be a period during which the person is focusing on the communication.

Such a focus determination method can determine to what extent the target person is focusing on the communication.
 (その他の実施の形態)
 以上、実施の形態について説明したが、本発明は、上記実施の形態に限定されるものではない。
(Other embodiments)
Although the embodiments have been described above, the present invention is not limited to the above embodiments.
 例えば、上記実施の形態では、コミュニケーション解析システムは、複数の装置によって実現されたが、単一の装置として実現されてもよい。例えば、コミュニケーション解析システムは、情報処理システム、話者ダイアライゼーションシステム、または、注力判定システムに相当する単一の装置として実現されてもよい。コミュニケーション解析システムが複数の装置によって実現される場合、コミュニケーション解析システムが備える機能的な構成要素は、複数の装置にどのように振り分けられてもよい。 For example, in the above embodiments, the communication analysis system was implemented by multiple devices, but it may be implemented as a single device. For example, the communication analysis system may be implemented as a single device corresponding to an information processing system, a speaker diarization system, or an attention determination system. When the communication analysis system is realized by multiple devices, the functional components included in the communication analysis system may be distributed to the multiple devices in any way.
 また、上記実施の形態における装置間の通信方法については特に限定されるものではない。また、装置間の通信においては、図示されない中継装置が介在してもよい。 Also, the communication method between devices in the above embodiment is not particularly limited. Further, a relay device (not shown) may intervene in communication between devices.
 また、上記実施の形態において、特定の処理部が実行する処理を別の処理部が実行してもよい。また、複数の処理の順序が変更されてもよいし、複数の処理が並行して実行されてもよい。 Further, in the above embodiment, the processing executed by a specific processing unit may be executed by another processing unit. In addition, the order of multiple processes may be changed, and multiple processes may be executed in parallel.
 また、上記実施の形態において、各構成要素は、各構成要素に適したソフトウェアプログラムを実行することによって実現されてもよい。各構成要素は、CPUまたはプロセッサなどのプログラム実行部が、ハードディスクまたは半導体メモリなどの記録媒体に記録されたソフトウェアプログラムを読み出して実行することによって実現されてもよい。 Also, in the above embodiments, each component may be realized by executing a software program suitable for each component. Each component may be realized by reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory by a program execution unit such as a CPU or processor.
 また、各構成要素は、ハードウェアによって実現されてもよい。例えば、各構成要素は、回路(または集積回路)でもよい。これらの回路は、全体として1つの回路を構成してもよいし、それぞれ別々の回路でもよい。また、これらの回路は、それぞれ、汎用的な回路でもよいし、専用の回路でもよい。 Also, each component may be realized by hardware. For example, each component may be a circuit (or integrated circuit). These circuits may form one circuit as a whole, or may be separate circuits. These circuits may be general-purpose circuits or dedicated circuits.
 また、本発明の全般的または具体的な態様は、システム、装置、方法、集積回路、コンピュータプログラムまたはコンピュータ読み取り可能なCD-ROMなどの記録媒体で実現されてもよい。また、システム、装置、方法、集積回路、コンピュータプログラム及び記録媒体の任意な組み合わせで実現されてもよい。 Also, general or specific aspects of the present invention may be implemented in a system, apparatus, method, integrated circuit, computer program, or recording medium such as a computer-readable CD-ROM. Also, any combination of systems, devices, methods, integrated circuits, computer programs and recording media may be implemented.
 For example, the present invention may be realized as a speech amount estimation method executed by a computer such as the speaker diarization system, as a program for causing a computer to execute such a speech amount estimation method, or as a computer-readable non-transitory recording medium on which such a program is recorded.
 Likewise, the present invention may be realized as a focus determination method executed by a computer such as the focus determination system, as a program for causing a computer to execute such a focus determination method, or as a computer-readable non-transitory recording medium on which such a program is recorded.
 The present invention also encompasses forms obtained by applying various modifications conceivable to a person skilled in the art to each embodiment, and forms realized by arbitrarily combining the components and functions of the embodiments without departing from the spirit of the present invention.
 10 communication analysis system
 20 sensing device
 21 microphone array
 22 camera
 23 ranging sensor (sensor)
 30 information processing system
 31 sound source detection unit
 32 person detection unit
 33 head orientation detection unit
 34 speech amount estimation unit
 35 focus determination unit
 36 communication analysis unit
 37 storage unit
 38 speaker diarization system
 39 focus determination system
 40 desk
 50 display
 60 whiteboard
 70 space

Claims (9)

  1.  A focus determination system comprising:
      a person detection unit that acquires video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detects, based on the acquired video information, first positions that are positions of the plurality of people in the space;
      a head orientation detection unit that acquires the video information and detects, based on the acquired video information, a head orientation of a target person among the plurality of people; and
      a focus determination unit that determines, based on the detected first positions and the detected head orientation of the target person, the target person's focus on communication conducted by the plurality of people in the space,
      wherein the focus determination unit determines, as a period during which the target person is focusing on the communication, a period during which the detected head orientation of the target person faces any one of a person other than the target person among the plurality of people, the display, and the whiteboard while a person other than the target person among the plurality of people is speaking.
  2.  The focus determination system according to claim 1,
      wherein a desk is installed in the space, and
      the person detection unit acquires the video information from a camera installed on the desk.
  3.  The focus determination system according to claim 2,
      wherein a plurality of cameras are installed in the space, and
      the video information acquired by the person detection unit includes video information of video captured by each of the plurality of cameras.
  4.  The focus determination system according to claim 2,
      wherein a sensor that measures a distance from the camera to the target person is installed in the space, and
      the person detection unit detects the first positions based on the acquired video information and a detection result of the sensor.
  5.  The focus determination system according to claim 1,
      wherein, to detect the first positions, the person detection unit estimates a distance from the person detection unit to the target person based on a size of the target person in the video.
  6.  A communication analysis system comprising:
      the focus determination system according to any one of claims 1 to 5; and
      a communication analysis unit that analyzes, based on the determined focus of the target person, a quality of communication conducted by the plurality of people in the space.
  7.  The communication analysis system according to claim 6, further comprising:
      a sound source detection unit that acquires sound information of sound picked up in the space and detects, based on the acquired sound information, a second position that is a position of a sound source in the space; and
      a speech amount estimation unit that tracks the target person based on the detected first positions and the detected second position, and estimates a speech amount of the target person being tracked,
      wherein the communication analysis unit analyzes the quality of the communication based on the determined focus of the target person and the estimated speech amount of the target person.
  8.  A focus determination method comprising:
      a first detection step of acquiring video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detecting, based on the acquired video information, first positions that are positions of the plurality of people in the space;
      a second detection step of acquiring the video information and detecting, based on the acquired video information, a head orientation of a target person among the plurality of people; and
      a focus determination step of determining, based on the detected first positions and the detected head orientation of the target person, the target person's focus on communication conducted by the plurality of people in the space,
      wherein, in the focus determination step, a period during which the detected head orientation of the target person faces any one of a person other than the target person among the plurality of people, the display, and the whiteboard while a person other than the target person among the plurality of people is speaking is determined as a period during which the target person is focusing on the communication.
  9.  A program for causing a computer to execute the focus determination method according to claim 8.
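
 For illustration only, the following Python sketches relate to the distance estimation recited in claim 5 and the sound-source matching recited in claim 7; they are not part of the claims. The function names, the pinhole-camera constants, and the matching radius are all assumptions introduced here.

```python
import math

# Claim 5 illustration: estimating distance from the apparent size of the
# target person, assuming a pinhole camera model. Constants are illustrative.
ASSUMED_PERSON_HEIGHT_M = 1.7  # assumed real-world height of a person
FOCAL_LENGTH_PX = 1000.0       # assumed focal length of the camera in pixels

def estimate_distance(bbox_height_px):
    """Estimate the camera-to-person distance from a bounding-box height."""
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    # Pinhole model: image_size_px = focal_length_px * real_size_m / distance_m
    return FOCAL_LENGTH_PX * ASSUMED_PERSON_HEIGHT_M / bbox_height_px

# Claim 7 illustration: accumulating per-person speech time by matching
# detected sound-source positions (second positions) to tracked person
# positions (first positions).
def attribute_speech(frames, match_radius_m=0.5):
    """Each frame is a dict with 'dt' (frame duration in seconds),
    'people' (person_id -> (x, y)) and 'sources' (list of (x, y))."""
    totals = {}
    for f in frames:
        for sx, sy in f['sources']:
            best_id, best_d = None, match_radius_m
            for pid, (px, py) in f['people'].items():
                d = math.hypot(sx - px, sy - py)
                if d <= best_d:  # nearest tracked person within the radius
                    best_id, best_d = pid, d
            if best_id is not None:
                totals[best_id] = totals.get(best_id, 0.0) + f['dt']
    return totals
```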
PCT/JP2022/024136 2021-06-28 2022-06-16 Focus determining system, communication analyzing system, and focus determining method WO2023276700A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023531788A JPWO2023276700A1 (en) 2021-06-28 2022-06-16

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021106485 2021-06-28
JP2021-106485 2021-06-28

Publications (1)

Publication Number Publication Date
WO2023276700A1 (en)

Family

ID=84692337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/024136 WO2023276700A1 (en) 2021-06-28 2022-06-16 Focus determining system, communication analyzing system, and focus determining method

Country Status (2)

Country Link
JP (1) JPWO2023276700A1 (en)
WO (1) WO2023276700A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018036868A (en) * 2016-08-31 2018-03-08 株式会社リコー Conference support system, conference support device, and conference support method
JP2019057061A (en) * 2017-09-20 2019-04-11 富士ゼロックス株式会社 Information output device and program
JP2019200475A (en) * 2018-05-14 2019-11-21 富士通株式会社 Activity evaluation program, apparatus, and method
WO2021090702A1 (en) * 2019-11-07 2021-05-14 ソニー株式会社 Information processing device, information processing method, and program

Also Published As

Publication number Publication date
JPWO2023276700A1 (en) 2023-01-05

Similar Documents

Publication Publication Date Title
EP3400705B1 (en) Active speaker location detection
US7039199B2 (en) System and process for locating a speaker using 360 degree sound source localization
JP6519370B2 (en) User attention determination system, method and program
CN108089152B (en) Equipment control method, device and system
JP5004276B2 (en) Sound source direction determination apparatus and method
US10582117B1 (en) Automatic camera control in a video conference system
US10241990B2 (en) Gesture based annotations
US10645520B1 (en) Audio system for artificial reality environment
WO2010109700A1 (en) Three-dimensional object determining device, three-dimensional object determining method and three-dimensional object determining program
JP4595364B2 (en) Information processing apparatus and method, program, and recording medium
JP2018520595A (en) Multifactor image feature registration and tracking method, circuit, apparatus, system, and associated computer-executable code
CN111551921A (en) Sound source orientation system and method based on sound image linkage
WO2009119288A1 (en) Communication system and communication program
WO2023276700A1 (en) Focus determining system, communication analyzing system, and focus determining method
WO2023276701A1 (en) Speaker diarization system, communication analysis system, and utterance amount estimation method
WO2021033592A1 (en) Information processing apparatus, information processing method, and program
RU174044U1 (en) AUDIO-VISUAL MULTI-CHANNEL VOICE DETECTOR
KR101976937B1 (en) Apparatus for automatic conference notetaking using mems microphone array
CN111273232B (en) Indoor abnormal condition judging method and system
Li et al. Multiple active speaker localization based on audio-visual fusion in two stages
WO2021090702A1 (en) Information processing device, information processing method, and program
WO2021065694A1 (en) Information processing system and method
CN110730378A (en) Information processing method and system
US9883142B1 (en) Automated collaboration system
JPWO2023276700A5 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22832850

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023531788

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE