WO2023276700A1 - Focus determining system, communication analyzing system, and focus determining method - Google Patents

Focus determining system, communication analyzing system, and focus determining method

Info

Publication number
WO2023276700A1
Authority
WO
WIPO (PCT)
Prior art keywords
person
space
detection unit
communication
people
Application number
PCT/JP2022/024136
Other languages
French (fr)
Japanese (ja)
Inventor
一樹 北村
直毅 吉川
ジャマル ムリアナ ユスフ ビン
プラティック プラネイ
ジアリ マ
Original Assignee
パナソニックIpマネジメント株式会社
Application filed by パナソニックIpマネジメント株式会社 filed Critical パナソニックIpマネジメント株式会社
Priority to JP2023531788A priority Critical patent/JPWO2023276700A1/ja
Publication of WO2023276700A1 publication Critical patent/WO2023276700A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk

Definitions

  • the present invention relates to a focus determination system, a communication analysis system, and a focus determination method.
  • Patent Literature 1 discloses an information providing apparatus that provides information on the psychological state of a subject in order to evaluate intellectual activity in meetings, discussions, and the like.
  • the present invention provides a focus determination system and the like that can determine how much a target person is focused on communication.
  • A focus determination system according to one aspect of the present invention includes: a human detection unit that acquires video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detects, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a head orientation detection unit that acquires the video information and detects, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination unit that determines the target person's focus on communication conducted by the plurality of people in the space, based on the detected first position and the detected orientation of the target person's head. When a person other than the target person among the plurality of people is speaking, the focus determination unit determines a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard as a period in which the target person is focusing on the communication.
  • A communication analysis system according to one aspect of the present invention includes the above focus determination system and a communication analysis unit that analyzes the quality of communication conducted by the plurality of people in the space based on the determined focus of the target person.
  • A focus determination method according to one aspect of the present invention includes: a first detection step of acquiring video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detecting, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a second detection step of acquiring the video information and detecting, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination step of determining the target person's focus on communication conducted by the plurality of people in the space, based on the detected first position and the detected orientation of the target person's head. In the focus determination step, when a person other than the target person among the plurality of people is speaking, a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard is determined as a period in which the target person is focusing on the communication.
  • a program according to one aspect of the present invention is a program for causing a computer to execute the focus determination method.
  • the focus determination system and the like of the present invention can determine how much the target person is focused on communication.
  • FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment.
  • FIG. 2 is a diagram showing an example of a space to which the communication analysis system according to the embodiment is applied.
  • FIG. 3 is an external view of the sensing device according to the embodiment.
  • FIG. 4 is a flowchart of the operation of the sound source detection unit included in the communication analysis system according to the embodiment.
  • FIG. 5 is a diagram schematically showing detection results of the position of a sound source.
  • FIG. 6 is a flowchart of the operation of the human detection unit included in the communication analysis system according to the embodiment.
  • FIG. 7 is a diagram schematically showing detection results of a person's position.
  • FIG. 8 is a flowchart of the operation of the speech amount estimation unit included in the communication analysis system according to the embodiment.
  • FIG. 9 is a diagram schematically showing detection results of the position of a speaker.
  • FIG. 10 is a diagram showing an example of information indicating the amount of speech.
  • FIG. 11 is a flowchart of the operation of the head orientation detection unit included in the communication analysis system according to the embodiment.
  • FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit included in the communication analysis system according to the embodiment.
  • FIG. 13 is a diagram for explaining the correction of the angle of the head orientation.
  • FIG. 14 is a flowchart of the operation of the focus determination unit included in the communication analysis system according to the embodiment.
  • FIG. 15 is a diagram for explaining the target direction of the target person.
  • FIG. 16 is a diagram showing an example of information indicating a focus period.
  • FIG. 17 is a flowchart of the operation of the communication analysis unit included in the communication analysis system according to the embodiment.
  • FIG. 18 is a diagram showing an example of a score display screen indicating the quality of communication displayed on an information terminal.
  • Note that each figure is a schematic diagram and is not necessarily illustrated strictly. In each figure, substantially the same configurations are given the same reference numerals, and overlapping description may be omitted or simplified.
  • FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment.
  • FIG. 2 is a diagram showing an example of a space to which the communication analysis system is applied.
  • As shown in FIGS. 1 and 2, the communication analysis system 10 is a system used in an office or the like having a space 70 such as a conference room, for analyzing the quality of communication of a plurality of people located in the space 70.
  • the space 70 is, for example, a closed space, but may be an open space. Examples of the space 70 include a conference room as well as an open rest area in an office space (where chairs and tables are placed in part of the office space). Also, the space 70 does not need to be physically separated, and may be a place separated by illumination light or airflow in the entire space. For example, a warm color area with a color temperature of 3000 K may be provided in a corner of an office space illuminated with daylight color with a color temperature of 5000 K, and this area may be used as the space 70 .
  • the communication analysis system 10 includes a sensing device 20 and an information processing system 30.
  • the sensing device 20 will be described with reference to FIG. 3 in addition to FIGS. 1 and 2.
  • FIG. 3 is an external view of the sensing device 20. Part (a) of FIG. 3 is a top view of the sensing device 20, and part (b) of FIG. 3 is a side view of the sensing device 20. Note that the ranging sensor 23 is not shown in FIG. 3.
  • the sensing device 20 is installed on the desk 40 installed in the space 70 and senses sounds and images in the space 70 .
  • the sensing device 20 is specifically installed in the center of the desk 40 .
  • the sensing device 20 includes a microphone array 21 , a plurality of cameras 22 and a ranging sensor 23 .
  • the microphone array 21 acquires sound in the space 70 and outputs sound information (a plurality of sound signals) of the acquired sound.
  • the microphone array 21 specifically includes a plurality of microphone elements, and each of the plurality of microphone elements acquires sound in the space 70 and outputs a sound signal of the acquired sound.
  • Each of the plurality of cameras 22 captures video (that is, moving images) showing the people staying in the space 70 and outputs video information of the captured video.
  • the camera 22 is a general camera implemented by a CMOS image sensor or the like, but may be a fisheye camera or the like.
  • In the present embodiment, the sensing device 20 has four cameras 22 so that the entire surroundings of the sensing device 20 can be photographed from on the desk 40; however, it suffices that at least one camera capable of photographing all the people staying in the space 70 is provided.
  • the distance measuring sensor 23 measures the distance from the sensing device 20 (camera 22) to the object, and outputs distance information indicating the measured distance to the object.
  • the object is, for example, a person staying in the space 70 .
  • the ranging sensor 23 is, for example, a TOF (Time Of Flight) type LiDAR (Light Detection and Ranging), but may be a range image sensor or the like.
  • Sensing device 20 may include at least one ranging sensor 23 , but may include a plurality of ranging sensors 23 corresponding to cameras 22 .
  • The information processing system 30 communicates with the sensing device 20 by wire or wirelessly, and analyzes the quality of communication based on the sensing information (specifically, sound information, video information, distance information, and the like) acquired from the sensing device 20 through that communication.
  • the information processing system 30 is, for example, an edge computer installed in a facility having a space 70, but may be a cloud computer installed outside the facility.
  • the sensing device 20 and the information processing system 30 may be realized as one integrated device.
  • part of the functions of the information processing system 30 may be implemented as an edge computer, and part of the other functions may be implemented by a cloud computer.
  • The information processing system 30 includes a sound source detection unit 31, a person detection unit 32, a head orientation detection unit 33, a speech amount estimation unit 34, a focus determination unit 35, a communication analysis unit 36, and a storage unit 37.
  • the sound source detection unit 31 acquires sound information of the sound acquired in the space 70 from the sensing device 20, and detects a second position, which is the position of the sound source in the space 70, based on the acquired sound information.
  • the human detection unit 32 acquires from the sensing device 20 image information of an image showing a person staying in the space 70, and detects the first position, which is the position of the person in the space 70, based on the acquired image information.
  • The head orientation detection unit 33 acquires from the sensing device 20 the video information of video showing a person staying in the space 70, and detects the orientation of the person's head (in other words, the orientation of the face) based on the acquired video information.
  • the head direction detection unit 33 may detect the direction of the person's line of sight based on the acquired video information.
  • the speech amount estimation unit 34 estimates the amount of human speech based on the first position detected by the human detection unit 32 and the second position detected by the sound source detection unit 31 .
  • The focus determination unit 35 determines the person's focus on communication conducted by a plurality of people, including that person, in the space 70, based on the first position detected by the person detection unit 32 and the orientation of the person's head detected by the head orientation detection unit 33.
  • the communication analysis unit 36 analyzes the quality of communication based on at least one of the human speech volume estimated by the speech volume estimation unit 34 and the human focus determined by the focus determination unit 35 .
  • the communication analysis unit 36 also outputs analysis results.
  • Each of the sound source detection unit 31, the person detection unit 32, the head orientation detection unit 33, the speech amount estimation unit 34, the focus determination unit 35, and the communication analysis unit 36 described above is realized by a microcomputer or a processor. The functions of these units are implemented, for example, by the microcomputer or processor executing a computer program stored in the storage unit 37.
  • the storage unit 37 is a storage device that stores the computer program and information necessary for realizing the functions of the components.
  • the storage unit 37 is implemented by, for example, an HDD (Hard Disk Drive), but may be implemented by a semiconductor memory.
  • In the present embodiment, a system including the sound source detection unit 31, the human detection unit 32, and the speech amount estimation unit 34 is also referred to as a speaker diarization system 38. That is, the speaker diarization system 38 includes the sound source detection unit 31, the person detection unit 32, and the speech amount estimation unit 34. The speaker diarization system 38 may further include the head orientation detection unit 33 or the focus determination unit 35.
  • Similarly, a system including the human detection unit 32, the head orientation detection unit 33, and the focus determination unit 35 is also referred to as a focus determination system 39. That is, the focus determination system 39 includes the human detection unit 32, the head orientation detection unit 33, and the focus determination unit 35.
  • the focus determination system 39 may further include the sound source detection unit 31 or the speech amount estimation unit 34 .
  • FIG. 4 is a flowchart of the operation of the sound source detection unit 31.
  • the sound source detection unit 31 acquires sound information of the sound acquired in the space 70 from the microphone array 21 of the sensing device 20 (S11). Specifically, the sound source detection unit 31 acquires multiple sound signals output by multiple microphone elements included in the microphone array 21 .
  • each of the acquired multiple sound signals is a signal in the time domain.
  • the sound source detection unit 31 transforms each of the plurality of sound signals from a time domain signal into a frequency domain signal by performing a Fourier transform (S12).
  • the sound source detection unit 31 calculates a spatial correlation matrix from the input vector determined based on the multiple sound signals after being transformed into the frequency domain (S13).
  • When the microphone array 21 has M microphone elements and the m-th sound signal after the Fourier transform is denoted $X_m(\omega, t)$, the input vector $\mathbf{x}(\omega, t)$ is expressed by the following equation, where $T$ denotes transpose:

    $\mathbf{x}(\omega, t) = [X_1(\omega, t), X_2(\omega, t), \ldots, X_M(\omega, t)]^T$
  • The spatial correlation matrix $R$ is represented by the following formula, where $H$ denotes conjugate transpose and $E[\cdot]$ denotes expectation (time averaging). Hereinafter, the frequency index $\omega$ is omitted for simplicity:

    $R = E\left[\mathbf{x}(t)\,\mathbf{x}^H(t)\right]$
  • Next, the sound source detection unit 31 calculates eigenvectors by eigenvalue decomposition of the spatial correlation matrix (S14). Specifically, the sound source detection unit 31 performs eigenvalue decomposition of the spatial correlation matrix according to the following equation to obtain the eigenvectors $\mathbf{e}_1, \ldots, \mathbf{e}_M$ and the corresponding eigenvalues $\lambda_1, \ldots, \lambda_M$:

    $R = \sum_{m=1}^{M} \lambda_m \mathbf{e}_m \mathbf{e}_m^H$
  • Next, the sound source detection unit 31 detects the position of the sound source from the eigenvectors (S15). Specifically, the sound source detection unit 31 can identify the loudness of a sound and the direction from which it arrives from the eigenvectors, and can detect the direction from which a relatively loud sound arrives as the direction (position) of the sound source.
  • the sound source detection unit 31 can detect in which direction (angle) the sound source is positioned with respect to the position O of the sensing device 20 .
  • FIG. 5 is a diagram (a diagram of the desk 40 viewed from above) schematically showing the detection result of the position of the sound source.
  • two sound sources S1 and S2 are detected.
  • Note that the sound source detection unit 31 need only detect at least the two-dimensional position (direction) of the sound source as shown in FIG. 5, but it may detect the three-dimensional position of the sound source. In the following, the position of the sound source detected by the sound source detection unit 31 is also referred to as the second position.
  • the sound source detection unit 31 can track the position of the sound source (second position) by repeating the operation of FIG. 4 every unit time.
  • the sound source is specifically a speaker (person) staying in the space 70 , but may also be a device installed in the space 70 .
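As an illustration of steps S13 to S15, the following is a minimal NumPy sketch of this eigen-decomposition approach (a MUSIC-style direction search). It assumes the multichannel STFT and candidate steering vectors are already available; the function names and the noise-subspace pseudo-spectrum are illustrative choices, not details taken from the patent.

```python
import numpy as np

def spatial_correlation(x):
    """x: (M, T) array of one STFT frequency bin across M microphone
    elements and T time frames. Returns R = E[x x^H] as an M x M matrix."""
    return (x @ x.conj().T) / x.shape[1]

def sound_source_directions(R, steering, n_sources=1):
    """Eigendecompose R (step S14), then score each candidate direction by
    how orthogonal its steering vector is to the noise subspace (step S15).
    steering: (A, M), one row per candidate angle. Returns the indices of
    the n_sources strongest directions."""
    eigvals, eigvecs = np.linalg.eigh(R)            # eigenvalues ascending
    noise = eigvecs[:, : R.shape[0] - n_sources]    # noise-subspace eigenvectors
    proj = noise @ noise.conj().T                   # projector onto noise subspace
    denom = np.einsum("am,mk,ak->a", steering.conj(), proj, steering).real
    pseudo_spectrum = 1.0 / np.maximum(denom, 1e-12)
    return np.argsort(pseudo_spectrum)[-n_sources:]
```

Running this once per unit time over successive frames yields the sound-source track (the second position) described above.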
  • FIG. 6 is a flowchart of the operation of the human detection unit 32. Note that the following operations are actually performed on the video information acquired from each of the four cameras 22, but for convenience, the description below assumes that video information is acquired from one camera 22.
  • the human detection unit 32 acquires image information of the image acquired in the space 70 from the camera 22 of the sensing device 20 (S21).
  • the human detection unit 32 identifies an area in which a person appears in the image based on the acquired image information (S22).
  • the human detection unit 32 can identify an area in which a person appears in the video by a method using pattern matching, a method using a machine learning model, or the like.
  • the human detection unit 32 assigns identification information to the specified area (that is, the area where a person is present) (S23). For example, the human detection unit 32 identifies three areas and assigns identification information of A, B, and C to the identified three areas.
  • In the following, the person corresponding to area A is also referred to as person A, the person corresponding to area B as person B, and the person corresponding to area C as person C.
  • the human detection unit 32 identifies the direction in which the person is based on the position in the image of the area identified in step S23 (S24).
  • Specifically, since the storage unit 37 stores in advance information indicating the installation position of the camera 22 (the center of the desk 40) and the shooting range (angle of view) of the camera 22, the human detection unit 32 can specify which direction in the space 70 a position in the image corresponds to.
  • the human detection unit 32 estimates the distance from the sensing device 20 (camera 22) to the person based on the size of the area specified in step S23 (S25). In this case, it is estimated that the larger the area specified in step S23, the closer the distance from the sensing device 20 (camera 22) to the person.
  • the distance from the sensing device 20 (camera 22 ) to the person may be specified based on the distance information (measured distance value) acquired from the ranging sensor 23 .
  • the human detection unit 32 can detect the position of a person in the space 70, as shown in FIG.
  • FIG. 7 is a diagram (a diagram of the desk 40 viewed from above) schematically showing the detection result of the position of a person.
  • That is, the human detection unit 32 can detect the three-dimensional position of a person (the three-dimensional coordinates of the person's position, i.e., the person's distance and direction as seen from the sensing device 20).
  • In the following, the position of the person detected by the human detection unit 32 is also referred to as the first position.
  • the human detection unit 32 can track the position of a person (first position) by repeating the operation of FIG. 6 every unit time. At this time, the assignment of the identification information in step S23 may be performed only once for the first time.
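The direction mapping of step S24 and the size-based distance estimate of step S25 might look like the following sketch. The pinhole-model constants are assumed calibration values; the patent does not give the camera parameters.

```python
def person_direction(u, image_width, horizontal_fov_deg, camera_yaw_deg=0.0):
    """Step S24: map the horizontal pixel position u of a detected person
    region to a direction angle around the sensing device."""
    offset = u / image_width - 0.5               # -0.5 .. 0.5 across the frame
    return camera_yaw_deg + offset * horizontal_fov_deg

def person_distance(box_height_px, focal_px=900.0, person_height_mm=1700.0):
    """Step S25: pinhole-model distance estimate; a larger region means a
    closer person. focal_px and person_height_mm are assumed calibration
    values, not figures from the patent."""
    return focal_px * person_height_mm / box_height_px
```

When a ranging sensor 23 is available, its measured distance would simply replace the size-based estimate.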
  • FIG. 8 is a flowchart of the operation of the speech amount estimation unit 34.
  • the speech amount estimation unit 34 acquires the first position detected by the human detection unit 32 and the second position detected by the sound source detection unit 31 (S31). The first position and the second position acquired at this time are detected at substantially the same timing. “Substantially the same” means that some deviation may be included.
  • the speech amount estimation unit 34 converts the first position represented by three-dimensional coordinates into two-dimensional coordinates (angle) corresponding to the second position (S32). Note that if the position of the microphone array 21 and the position of the camera 22 are different, the first position after being converted into two-dimensional coordinates is corrected based on the difference between these positions.
  • FIG. 9 is a diagram schematically showing the detection result of the position of the speaker.
  • FIG. 9 is a superimposed view of the second position (FIG. 5) and the first position (FIG. 7).
  • the speech amount estimation unit 34 detects that the sound source S1 is the person A, for example, when the angle difference ⁇ 1 between the second position of the sound source S1 and the first position of the person A is less than or equal to a predetermined value. That is, the person A is detected as the speaker. Further, the speech amount estimation unit 34 detects that the sound source S2 is the person C, for example, when the angle difference ⁇ 2 between the second position of the sound source S2 and the first position of the person C is equal to or less than a predetermined value. That is, person C is detected as the speaker.
  • The speech amount estimation unit 34 can track each of person A, person B, and person C by repeating the operation of FIG. 8 every unit time, and can estimate the amount of speech of each of them. Specifically, the speech amount estimation unit 34 accumulates the periods during which each of person A, person B, and person C is detected as the speaker, and can thereby estimate each person's amount of speech. That is, the speech amount estimation unit 34 can store information indicating the periods during which person A, person B, and person C are speaking (information indicating the amount of speech) in the storage unit 37.
  • FIG. 10 is a diagram showing an example of information indicating the amount of speech. As shown in FIG. 10, the information indicating the amount of speech is information in which the amount of speech (speech time) is associated with each piece of identification information assigned in step S23.
  • As described above, the speech amount estimation unit 34 tracks each of the plurality of people based on the first position detected by the human detection unit 32 and the second position detected by the sound source detection unit 31, and can thereby estimate the amount of speech of each of the plurality of people staying in the space 70 (including the amount of speech of the target person).
  • Such a method of estimating the amount of speech by the speech amount estimation unit 34 is useful for analyzing the quality of communication in a conference or the like involving movement of seats by a plurality of people.
  • the movement of seats by a plurality of people means moving to use the whiteboard 60, for example.
  • Also, since the speech amount estimation unit 34 performs neither individual identification by voice recognition nor individual identification by image recognition, it can estimate the amount of speech of each of the plurality of people while maintaining their anonymity.
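Steps S31 and S32 reduce to nearest-angle matching between first and second positions. A minimal sketch, assuming angles in degrees and an illustrative matching threshold (the patent says only "a predetermined value"):

```python
def match_speakers(person_angles, source_angles, threshold_deg=10.0):
    """Attribute each detected sound source to the person whose direction
    is angularly closest, if the difference is within threshold_deg."""
    def angdiff(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)  # wrap to [0, 180]
    speakers = []
    for src in source_angles:
        pid, ang = min(person_angles.items(), key=lambda kv: angdiff(src, kv[1]))
        if angdiff(src, ang) <= threshold_deg:
            speakers.append(pid)
    return speakers

# Example: persons A/B/C at 30, 150, 270 degrees; sources at 33 and 268.
print(match_speakers({"A": 30.0, "B": 150.0, "C": 270.0}, [33.0, 268.0]))
# -> ['A', 'C'], so A and C are counted as speaking in this unit time.
```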
  • FIG. 11 is a flow chart of the operation of the head orientation detection unit 33.
  • FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit 33.
  • In the following, the camera 22 is described as being located at a position C(x0, y0, z0) on the desk 40, rather than at the center position O(0, 0, 0) of the desk 40.
  • FIG. 12 also shows the coordinates in the image indicated by UV.
  • the head orientation detection unit 33 acquires image information of the image acquired in the space 70 from the camera 22 of the sensing device 20 (S41).
  • the head orientation detection unit 33 identifies the orientation of the person's head in the image based on the acquired image information (S42).
  • The head orientation detection unit 33 can detect the orientation of the person's head by, for example, identifying an area corresponding to the person's head in the video by face recognition processing and applying a machine learning model to the identified area.
  • As shown in FIG. 12, the vector indicating the orientation of the head at this time forms an angle of A° with respect to the line segment (straight line) connecting the position P of the person and the position C of the camera.
  • the head direction detection unit 33 acquires the position of the person (more specifically, the position of the person's head) detected by the person detection unit 32 (S43). As shown in FIG. 12, the position of the person at this time is P(x1, y1, z1).
  • Using the horizontal viewing angle θ of the camera 22, the position (u, v) of the person in the image, and the width w of the image, the coordinate x1 can be expressed in terms of these quantities.
  • Information such as the size of the desk 40, the horizontal viewing angle ⁇ of the camera 22, and the width w of the image is stored in the storage unit 37 in advance.
  • Next, the head orientation detection unit 33 corrects the orientation of the head (angle A) identified in step S42 with reference to the position of the camera 22, based on the position of the person (specifically, the value of the coordinate x1) acquired in step S43 (S44).
  • FIG. 13 is a diagram (a diagram of the desk 40 viewed from above) for explaining such angle correction.
  • the head orientation detection unit 33 can calculate ⁇ OPC in FIG. 13 based on the x1 coordinate acquired in step S43, and can set A+ ⁇ OPC as the corrected angle.
  • the head orientation detection unit 33 can track the orientation of the person's head by repeating the operation of FIG. 11 every unit time.
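The correction of step S44 (A + ∠OPC in FIG. 13) can be sketched in two dimensions as follows, assuming top-view coordinates with the desk center O at the origin; the helper is an assumed formulation of that geometry.

```python
import math

def corrected_head_angle(angle_a_deg, person_xy, camera_xy):
    """Convert a head orientation measured relative to the line segment
    P-C (person to camera) into one referenced to the desk center O at
    the origin, i.e. A + angle(OPC) as in FIG. 13. Top view, 2-D."""
    px, py = person_xy
    cx, cy = camera_xy
    to_origin = math.atan2(-py, -px)                 # direction P -> O
    to_camera = math.atan2(cy - py, cx - px)         # direction P -> C
    opc_rad = (to_camera - to_origin + math.pi) % (2 * math.pi) - math.pi
    return angle_a_deg + math.degrees(opc_rad)
```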
  • FIG. 14 is a flowchart of the operation of the focus determination unit 35.
  • The focus determination unit 35 acquires the first position detected by the human detection unit 32 and the orientation of the head detected by the head orientation detection unit 33 (S51). More specifically, the first position acquired at this time is the first position of each of the plurality of people staying in the space 70, and the acquired orientation of the head is the orientation of the head of each of the plurality of people.
  • FIG. 15 is a diagram (a diagram of the desk 40 viewed from above) for explaining the target direction of the target person.
  • Next, the focus determination unit 35 determines the target direction of the target person (S52). For example, in FIG. 15, the focus determination unit 35 determines, as the target direction, the direction connecting the target person's position P to the position G where another person is located.
  • the target direction does not necessarily have to be determined for a person, and may be determined for the display 50 and the whiteboard 60 as described later.
  • The focus determination unit 35 then determines an allowable range around the target direction (S53). For example, in the example of FIG. 15, the range from ∠GPO − α to ∠GPO + α is determined as the allowable range, where α is a coefficient for determining the allowable range. The value of α differs depending on whether the target direction points to a person, the display 50, or the whiteboard 60.
  • the focus determination unit 35 determines the focus of the subject by comparing the orientation of the subject's head acquired in step S51 with the allowable range determined in step S53 (S54). Focus here means focus on communication performed by a plurality of people including the target person in the space 70 . The focus determination unit 35 determines that the target person is focusing on communication when the direction of the target person's head is within the allowable range, since it is considered that the target person is looking at another person. On the other hand, when the target person's head orientation is outside the allowable range, the focus determination unit 35 determines that the target person is not focusing on communication because it is considered that the target person is not looking at other people.
  • The focus determination unit 35 can determine the focus of each of person A, person B, and person C by repeating the operation of FIG. 14 every unit time. Specifically, the focus determination unit 35 accumulates the periods in which each of person A, person B, and person C is determined to be focusing, and can store information indicating the periods in which the plurality of people are focusing on communication in the storage unit 37.
  • FIG. 16 is a diagram illustrating an example of information indicating the focus period. As shown in FIG. 16, the information indicating the focus period is information in which a focus period is associated with each piece of identification information assigned in step S23.
  • As described above, the focus determination unit 35 can determine the target person's focus on the communication conducted by the plurality of people in the space 70, based on the first position detected by the human detection unit 32 and the orientation of the head detected by the head orientation detection unit 33.
  • Note that the focus determination unit 35 may further acquire the second position detected by the sound source detection unit 31 in order to detect the position of the speaker, or may acquire the position of the speaker detected by the speech amount estimation unit 34. In either case, the focus determination unit 35 can determine focus while taking into account whether or not each of the plurality of people is the speaker. For example, it may determine that the target person is focusing on communication only when the target person's head is facing the direction of the speaker, and determine that the target person is not focusing on communication when the target person is looking in the direction of a person who is not speaking.
  • the focus determination unit 35 may determine focus as follows.
  • For example, even when person A, located right beside the target person (assumed here to be person C), is speaking, it is difficult for the target person to turn toward a person positioned right beside them; the target person is instead considered likely to look at the other person B, or to face the display 50 (shown in FIG. 2) or the whiteboard 60 (shown in FIG. 2) on which meeting materials and the like are displayed. In such cases, too, it may be determined that the target person is focusing on communication. That is, when a person other than the target person among the plurality of people is speaking, the focus determination unit 35 may determine a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display 50, or the whiteboard 60 as a period in which the target person is focusing on the communication. Note that the period during which the target person is facing the whiteboard 60 may be determined as a focus period on the condition that person A is speaking near the whiteboard 60.
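A minimal sketch of the comparison in step S54, assuming the target directions for the other people, the display 50, and the whiteboard 60 have already been computed; the α values per target type are assumptions, since the patent states only that the coefficient differs by target type.

```python
def is_focus_period(head_deg, targets, alpha_by_kind=None):
    """The subject counts as focusing when the detected head orientation
    lies within the allowable range around any target direction.
    targets: list of (kind, angle_deg) for other people, the display 50,
    and the whiteboard 60."""
    alpha_by_kind = alpha_by_kind or {"person": 15.0, "display": 20.0,
                                      "whiteboard": 20.0}
    for kind, target_deg in targets:
        diff = abs((head_deg - target_deg + 180.0) % 360.0 - 180.0)
        if diff <= alpha_by_kind[kind]:
            return True
    return False
```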
  • FIG. 17 is a flowchart of the operation of the communication analysis unit 36.
  • the communication analysis unit 36 reads out the information indicating the amount of speech (FIG. 10) and the information indicating the focus period (FIG. 16) stored in the storage unit 37 (S61).
  • Next, the communication analysis unit 36 analyzes the quality of communication in the space 70 based on the acquired information indicating the amount of speech and the acquired information indicating the focus period. For example, the communication analysis unit 36 scores the quality of communication using criteria such as the following.
  • For example, the communication analysis unit 36 calculates the ratio of the speech amounts of persons A to C based on the acquired information indicating the amount of speech, and sets the first score, related to the amount of speech, higher as the minimum/maximum of this ratio approaches 1 (that is, as persons A to C speak more evenly). Specifically, when the ratio of the speech amounts of persons A to C is 1:1.2:1.5, the communication analysis unit 36 calculates the first score based on the difference between 1 (the minimum)/1.5 (the maximum) and 1.
  • the communication analysis unit 36 for example, based on the acquired information indicating the focus period, sets the second score related to focus to a larger value as the average value of the focus periods of persons A to C increases.
  • the communication analysis unit 36 calculates the sum of the first score and the second score as the final score indicating the quality of communication in the space 70.
  • Each of persons A to C can check the final score indicating the quality of communication (that is, the analysis result of the quality of communication), for example, by accessing the communication analysis system 10 (information processing system 30) using an information terminal such as a smartphone or a personal computer.
  • FIG. 18 is a diagram showing an example of a score display screen indicating the quality of communication displayed on the information terminal.
  • the communication analysis unit 36 can analyze the quality of communication performed by a plurality of people in the space 70 based on the speech volume estimation result and the focus determination result.
  • the communication analysis unit 36 may analyze the quality of communication performed by a plurality of people in the space 70 based on at least one of the speech volume estimation result and the focus determination result.
  • the score calculation criteria as described above are merely an example, and the score may be appropriately determined according to the content required for communication in the space 70 .
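Under those caveats, the scoring might be sketched as follows; the weights and the normalization by meeting length are assumptions, since the patent specifies only the min/max speech-ratio idea for the first score and the average focus period for the second.

```python
def final_score(speech_amounts, focus_periods, meeting_seconds,
                w_speech=50.0, w_focus=50.0):
    """Combine the two sub-scores described above."""
    # First score: min/max ratio of speech amounts, 1.0 when perfectly even.
    balance = min(speech_amounts.values()) / max(speech_amounts.values())
    first = w_speech * balance          # e.g. 1/1.5 for the 1:1.2:1.5 case
    # Second score: larger when the average focus period is longer.
    mean_focus = sum(focus_periods.values()) / len(focus_periods)
    second = w_focus * mean_focus / meeting_seconds
    return first + second

# Speech amounts in the 1:1.2:1.5 ratio of the example, focus in seconds.
score = final_score({"A": 600.0, "B": 720.0, "C": 900.0},
                    {"A": 2400.0, "B": 2700.0, "C": 2000.0},
                    meeting_seconds=3600.0)
```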
  • the microphone array 21, the camera 22, and the ranging sensor 23 are installed on the desk 40.
  • the microphone array 21, camera 22 and ranging sensor 23 may be installed on the ceiling.
  • Also, the microphone array 21, the camera 22, and the ranging sensor 23 do not have to be concentrated in one place, and may be installed in different places from one another.
  • the sound source detection unit 31 and the human detection unit 32 are used together to detect the speaker, but the speaker can also be detected with the human detection unit 32 alone.
  • the human detection unit 32 can detect whether or not the person is speaking from the movement of the person in the video.
  • The communication analysis unit 36 may calculate the final score indicating the quality of communication (that is, the analysis result of the quality of communication) in real time while a meeting or the like is being held in the space 70, and the environment in the space 70 may be controlled in real time based on the calculated final score.
  • For example, when the communication analysis unit 36 determines that the calculated final score is less than a predetermined value (that is, that the communication is not active), it controls the environment of the space 70 by transmitting a control signal to an environment control device (not shown) installed in the space 70.
  • Examples of the environment control device include an air conditioner that controls the temperature environment of the space 70, a lighting device that controls the light environment of the space 70, a fragrance generator that controls the scent of the space 70, and a music playback device that controls the sound environment of the space 70.
  • the communication analysis unit 36 can activate communication in the space 70 by raising the set temperature of the air conditioner or making the lighting equipment brighter than it is now.
  • The communication analysis unit 36 may also activate communication in the space 70 by operating the fragrance generator installed in the space 70 or by causing the music playback device installed in the space 70 to play music.
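A minimal sketch of this real-time environment control; the threshold and the device objects with their method names are hypothetical stand-ins for the "predetermined value" and the environment control equipment named in the text:

```python
SCORE_THRESHOLD = 60.0  # assumed value for "a predetermined value"

def control_environment(score, air_conditioner, lighting):
    """Nudge the room when communication is judged inactive. The device
    objects and their method names are hypothetical, not a real API."""
    if score < SCORE_THRESHOLD:
        air_conditioner.raise_set_temperature(delta_celsius=1.0)
        lighting.set_brightness_percent(minimum=80)
```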
  • As described above, the focus determination system 39 includes: a human detection unit 32 that acquires video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detects, based on the acquired video information, a first position that is the position of each of the plurality of people in the space 70; a head orientation detection unit 33 that acquires the video information and detects, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination unit 35 that determines the target person's focus on communication conducted by the plurality of people in the space 70, based on the detected first position and the detected orientation of the target person's head. When a person other than the target person among the plurality of people is speaking, the focus determination unit 35 determines a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display 50, or the whiteboard 60 as a period in which the target person is focusing on the communication.
  • Such a focus determination system 39 can determine how much the target person is focused on communication.
  • a desk 40 is installed in the space 70 , and the human detection unit 32 acquires video information from the camera 22 installed on the desk 40 .
  • Such a focus determination system 39 can acquire video information from the camera 22 installed on the desk 40 .
  • a plurality of cameras 22 are installed in the space 70 , and the image information acquired by the human detection unit 32 includes image information of images captured by each of the plurality of cameras 22 .
  • Such a focus determination system 39 can acquire video information from multiple cameras 22 installed on the desk 40 .
  • a ranging sensor 23 that measures the distance from the camera 22 to the subject is installed.
  • the human detection unit 32 detects the first position based on the acquired image information and the detection result of the distance measuring sensor 23 .
  • Such a focus determination system 39 can detect the first position based on the detection result of the distance measuring sensor 23 .
  • For example, in order to detect the first position, the human detection unit 32 estimates the distance from the camera 22 to the target person based on the size of the target person in the image.
  • Such a focus determination system 39 can estimate the distance from the camera 22 to the target person based on the size of the target person in the image.
  • the communication analysis system 10 also includes a focus determination system 39 and a communication analysis unit 36 that analyzes the quality of communication conducted by a plurality of people in the space 70 based on the determined focus of the subject.
  • Such a communication analysis system 10 can analyze the quality of communication based on the focus of the target person.
  • The communication analysis system 10 further includes: a sound source detection unit 31 that acquires sound information of the sound captured in the space 70 and detects, based on the acquired sound information, a second position that is the position of a sound source in the space 70; and a speech amount estimation unit 34 that tracks the target person based on the detected first position and the detected second position and estimates the target person's amount of speech during the tracking.
  • the communication analysis unit 36 analyzes the quality of communication based on the determined focus of the subject and the estimated speech volume of the subject.
  • Such a communication analysis system 10 can analyze the quality of communication based on the subject's focus and the subject's utterance volume.
  • A focus determination method executed by a computer such as the focus determination system 39 includes: a first detection step of acquiring video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detecting, based on the acquired video information, a first position that is the position of each of the plurality of people in the space 70; a second detection step of acquiring the video information and detecting, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination step of determining the target person's focus on communication conducted by the plurality of people in the space 70, based on the detected first position and the detected orientation of the target person's head. In the focus determination step, when a person other than the target person among the plurality of people is speaking, a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display 50, or the whiteboard 60 is determined as a period in which the target person is focusing on the communication.
  • Such a focus determination method can determine how much the target person is focused on communication.
  • the communication analysis system was implemented by multiple devices, but it may be implemented as a single device.
  • For example, the communication analysis system may be implemented as a single device corresponding to the information processing system, the speaker diarization system, or the focus determination system.
  • the functional components included in the communication analysis system may be distributed to the multiple devices in any way.
  • the communication method between devices in the above embodiment is not particularly limited. Further, a relay device (not shown) may intervene in communication between devices.
  • processing executed by a specific processing unit may be executed by another processing unit.
  • order of multiple processes may be changed, and multiple processes may be executed in parallel.
  • each component may be realized by executing a software program suitable for each component.
  • Each component may be realized by reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory by a program execution unit such as a CPU or processor.
  • each component may be realized by hardware.
  • each component may be a circuit (or integrated circuit). These circuits may form one circuit as a whole, or may be separate circuits. These circuits may be general-purpose circuits or dedicated circuits.
  • the present invention may be implemented as a speech amount estimation method executed by a computer such as a speaker diarization system, or may be implemented as a program for causing a computer to execute such a speech amount estimation method.
  • Alternatively, it may be realized as a computer-readable non-transitory recording medium on which such a program is recorded.
  • Similarly, the present invention may be implemented as a focus determination method executed by a computer such as the focus determination system, as a program for causing a computer to execute such a focus determination method, or as a computer-readable non-transitory recording medium on which such a program is recorded.
  • Reference numerals: 10 communication analysis system, 20 sensing device, 21 microphone array, 22 camera, 23 ranging sensor (sensor), 30 information processing system, 31 sound source detection unit, 32 person detection unit, 33 head orientation detection unit, 34 speech amount estimation unit, 35 focus determination unit, 36 communication analysis unit, 37 storage unit, 38 speaker diarization system, 39 focus determination system, 40 desk, 50 display, 60 whiteboard, 70 space

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A focus determining system (39) comprises: a person detecting unit (32) for detecting first positions, which are the positions of a plurality of persons in a space (70), on the basis of video information; a head orientation detecting unit (33) for detecting the orientation of the head of a subject among the plurality of persons, on the basis of the video information; and a focus determining unit (35) for determining, on the basis of the detected first positions and the detected orientation of the head of the subject, that when a person among the plurality of persons, other than the subject, is speaking, a period in which the detected orientation of the head of the subject is facing any of a person among the plurality of persons, other than the subject, a display (50), or a whiteboard (60), is a period during which the subject is focusing on communication.

Description

Focus determination system, communication analysis system, and focus determination method
The present invention relates to a focus determination system, a communication analysis system, and a focus determination method.
In an organization such as a company, it is important for employees to communicate closely with each other while working on their own tasks. As a technology related to such communication, Patent Literature 1 discloses an information providing apparatus that provides information on the psychological state of a subject in order to evaluate intellectual activity in meetings, discussions, and the like.
Patent Literature 1: JP-A-2004-112518
The present invention provides a focus determination system and the like that can determine how much a target person is focusing on communication.
A focus determination system according to one aspect of the present invention includes: a human detection unit that acquires video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detects, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a head orientation detection unit that acquires the video information and detects, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination unit that determines the target person's focus on communication conducted by the plurality of people in the space, based on the detected first position and the detected orientation of the target person's head. When a person other than the target person among the plurality of people is speaking, the focus determination unit determines a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard as a period in which the target person is focusing on the communication.
A communication analysis system according to one aspect of the present invention includes the above focus determination system and a communication analysis unit that analyzes the quality of communication conducted by the plurality of people in the space based on the determined focus of the target person.
A focus determination method according to one aspect of the present invention includes: a first detection step of acquiring video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detecting, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a second detection step of acquiring the video information and detecting, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination step of determining the target person's focus on communication conducted by the plurality of people in the space, based on the detected first position and the detected orientation of the target person's head. In the focus determination step, when a person other than the target person among the plurality of people is speaking, a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard is determined as a period in which the target person is focusing on the communication.
A program according to one aspect of the present invention is a program for causing a computer to execute the above focus determination method.
The focus determination system and the like of the present invention can determine how much the target person is focusing on communication.
FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment.
FIG. 2 is a diagram showing an example of a space to which the communication analysis system according to the embodiment is applied.
FIG. 3 is an external view of the sensing device according to the embodiment.
FIG. 4 is a flowchart of the operation of the sound source detection unit included in the communication analysis system according to the embodiment.
FIG. 5 is a diagram schematically showing detection results of the position of a sound source.
FIG. 6 is a flowchart of the operation of the human detection unit included in the communication analysis system according to the embodiment.
FIG. 7 is a diagram schematically showing detection results of a person's position.
FIG. 8 is a flowchart of the operation of the speech amount estimation unit included in the communication analysis system according to the embodiment.
FIG. 9 is a diagram schematically showing detection results of the position of a speaker.
FIG. 10 is a diagram showing an example of information indicating the amount of speech.
FIG. 11 is a flowchart of the operation of the head orientation detection unit included in the communication analysis system according to the embodiment.
FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit included in the communication analysis system according to the embodiment.
FIG. 13 is a diagram for explaining the correction of the angle of the head orientation.
FIG. 14 is a flowchart of the operation of the focus determination unit included in the communication analysis system according to the embodiment.
FIG. 15 is a diagram for explaining the target direction of the target person.
FIG. 16 is a diagram showing an example of information indicating a focus period.
FIG. 17 is a flowchart of the operation of the communication analysis unit included in the communication analysis system according to the embodiment.
FIG. 18 is a diagram showing an example of a score display screen indicating the quality of communication displayed on an information terminal.
Hereinafter, embodiments will be specifically described with reference to the drawings. Note that the embodiments described below each show a comprehensive or specific example. The numerical values, shapes, materials, components, arrangement positions and connection forms of components, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present invention. Further, among the components in the following embodiments, components not described in the independent claims are described as optional components.
 なお、各図は模式図であり、必ずしも厳密に図示されたものではない。また、各図において、実質的に同一の構成に対しては同一の符号を付し、重複する説明は省略または簡略化される場合がある。 It should be noted that each figure is a schematic diagram and is not necessarily strictly illustrated. Moreover, in each figure, the same code|symbol is attached|subjected with respect to substantially the same structure, and the overlapping description may be abbreviate|omitted or simplified.
(Embodiment)
[Configuration]
First, the configuration of the communication analysis system according to the embodiment will be described. FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment. FIG. 2 is a diagram showing an example of a space to which the communication analysis system is applied.

As shown in FIG. 1 and FIG. 2, the communication analysis system 10 is a system used, for example, in an office having a space 70 such as a conference room, for analyzing the quality of communication among a plurality of people located in the space 70. The space 70 is, for example, a closed space, but may be an open space. Besides a conference room, the space 70 may be, for example, an open rest area within an office space (a place where chairs and a table are placed in part of the office space). The space 70 also need not be physically partitioned, and may be a place within a larger space that is demarcated by illumination light, airflow, or the like. For example, a warm-color area with a color temperature of 3000 K may be provided in a corner of an office space illuminated with daylight-color light with a color temperature of 5000 K, and this area may serve as the space 70.
The communication analysis system 10 includes a sensing device 20 and an information processing system 30. First, the sensing device 20 will be described with reference to FIG. 3 in addition to FIG. 1 and FIG. 2. FIG. 3 is an external view of the sensing device 20; (a) of FIG. 3 is a top view of the sensing device 20, and (b) of FIG. 3 is a side view of the sensing device 20. Note that the ranging sensor 23 is not shown in FIG. 3.

The sensing device 20 is installed on a desk 40 placed in the space 70 and senses sound, video, and the like in the space 70. Specifically, the sensing device 20 is installed at the center of the top of the desk 40. The sensing device 20 includes a microphone array 21, a plurality of cameras 22, and a ranging sensor 23.

The microphone array 21 acquires sound in the space 70 and outputs sound information (a plurality of sound signals) of the acquired sound. Specifically, the microphone array 21 includes a plurality of microphone elements, and each of the microphone elements acquires sound in the space 70 and outputs a sound signal of the acquired sound.

Each of the cameras 22 captures video (in other words, moving images) showing the people staying in the space 70 and outputs video information of the video. The camera 22 is a general camera implemented by a CMOS image sensor or the like, but may be a fisheye camera or the like. The sensing device 20 includes four cameras so that the entire surroundings of the sensing device 20 can be captured from the top of the desk 40; it suffices, however, that the sensing device 20 includes at least one camera capable of capturing all the people staying in the space 70.

The ranging sensor 23 measures the distance from the sensing device 20 (camera 22) to an object and outputs distance information indicating the measured distance to the object. The object is, for example, a person staying in the space 70. The ranging sensor 23 is, for example, a TOF (Time Of Flight) LiDAR (Light Detection and Ranging), but may be a range image sensor or the like. The sensing device 20 needs to include at least one ranging sensor 23, but may include a plurality of ranging sensors 23 corresponding to the cameras 22.
Next, the information processing system 30 will be described. The information processing system 30 communicates with the sensing device 20 in a wired or wireless manner and analyzes the quality of communication based on sensing information (specifically, sound information, video information, distance information, and the like) acquired from the sensing device 20 through the communication. The information processing system 30 is, for example, an edge computer installed in the facility having the space 70, but may be a cloud computer installed outside the facility. When the information processing system 30 is an edge computer, the sensing device 20 and the information processing system 30 may be implemented as a single integrated device. Alternatively, some functions of the information processing system 30 may be implemented by an edge computer and the other functions by a cloud computer.

Specifically, the information processing system 30 includes a sound source detection unit 31, a person detection unit 32, a head orientation detection unit 33, a speech amount estimation unit 34, a focus determination unit 35, a communication analysis unit 36, and a storage unit 37.
The sound source detection unit 31 acquires, from the sensing device 20, sound information of sound captured in the space 70, and detects a second position, which is the position of a sound source in the space 70, based on the acquired sound information.

The person detection unit 32 acquires, from the sensing device 20, video information of video showing the people staying in the space 70, and detects a first position, which is the position of a person in the space 70, based on the acquired video information.

The head orientation detection unit 33 acquires, from the sensing device 20, video information of video showing the people staying in the space 70, and detects the orientation of a person's head (in other words, the orientation of the face) based on the acquired video information. The head orientation detection unit 33 may also detect the direction of the person's line of sight based on the acquired video information.

The speech amount estimation unit 34 estimates the amount of a person's speech based on the first position detected by the person detection unit 32 and the second position detected by the sound source detection unit 31.

The focus determination unit 35 determines, based on the first position detected by the person detection unit 32 and the head orientation detected by the head orientation detection unit 33, the person's focus on communication performed in the space 70 by a plurality of people including that person.

The communication analysis unit 36 analyzes the quality of the communication based on at least one of the amount of speech estimated by the speech amount estimation unit 34 and the focus determined by the focus determination unit 35. The communication analysis unit 36 also outputs the analysis result.
Each of the sound source detection unit 31, the person detection unit 32, the head orientation detection unit 33, the speech amount estimation unit 34, the focus determination unit 35, and the communication analysis unit 36 described above is implemented by a microcomputer or a processor. The functions of these units are implemented, for example, by the microcomputer or processor executing a computer program stored in the storage unit 37.

The storage unit 37 is a storage device that stores the computer program and the information necessary for implementing the functions of the above components. The storage unit 37 is implemented by, for example, an HDD (Hard Disk Drive), but may be implemented by a semiconductor memory.

A system including the sound source detection unit 31, the person detection unit 32, and the speech amount estimation unit 34 is also referred to as a speaker diarization system 38. That is, the speaker diarization system 38 includes the sound source detection unit 31, the person detection unit 32, and the speech amount estimation unit 34. The speaker diarization system 38 may further include the head orientation detection unit 33 or the focus determination unit 35.

A system including the person detection unit 32, the head orientation detection unit 33, and the focus determination unit 35 is also referred to as a focus determination system 39. That is, the focus determination system 39 includes the person detection unit 32, the head orientation detection unit 33, and the focus determination unit 35. The focus determination system 39 may further include the sound source detection unit 31 or the speech amount estimation unit 34.
[Operation of Sound Source Detection Unit]
Next, the operation of the sound source detection unit 31 will be described in more detail. FIG. 4 is a flowchart of the operation of the sound source detection unit 31.

First, the sound source detection unit 31 acquires sound information of sound captured in the space 70 from the microphone array 21 of the sensing device 20 (S11). Specifically, the sound source detection unit 31 acquires the plurality of sound signals output by the plurality of microphone elements included in the microphone array 21.

Each of the acquired sound signals is a time-domain signal. The sound source detection unit 31 converts each of the sound signals from a time-domain signal into a frequency-domain signal by applying a Fourier transform (S12).

Next, the sound source detection unit 31 calculates a spatial correlation matrix from an input vector determined from the frequency-domain sound signals (S13). Here, assuming that the microphone array 21 includes M microphone elements and that the m-th Fourier-transformed sound signal is X_m(ω, t), the input vector x(ω, t) is expressed by the following equation, where T denotes the transpose.
x(ω, t) = [X_1(ω, t), X_2(ω, t), …, X_M(ω, t)]^T
The spatial correlation matrix R is expressed by the following equation, where H denotes the conjugate transpose and E[·] denotes the expectation (a time average over frames). In the following, the frequency index ω is omitted for simplicity of description.
R = E[x(t) x(t)^H]
Next, the sound source detection unit 31 calculates eigenvectors by applying eigenvalue decomposition to the above spatial correlation matrix (S14). Specifically, by decomposing the spatial correlation matrix based on the following equation, the sound source detection unit 31 can calculate the eigenvectors e_1, …, e_M and the eigenvalues λ_1, …, λ_M.
R = E Λ E^H, where E = [e_1, …, e_M] and Λ = diag(λ_1, …, λ_M)
Next, the sound source detection unit 31 detects the position of the sound source from the eigenvectors (S15). Specifically, the eigenvectors allow the sound source detection unit 31 to identify the loudness of sound and the direction from which the sound arrives, and the unit can detect the direction from which a relatively loud sound arrives as the direction (position) of the sound source.

As a result, as shown in FIG. 5, the sound source detection unit 31 can detect in which direction (at which angle) a sound source is located with respect to the position O of the sensing device 20. FIG. 5 is a diagram schematically showing a detection result of the positions of sound sources (the desk 40 viewed from above). In the example of FIG. 5, two sound sources, S1 and S2, are detected. Note that it suffices for the sound source detection unit 31 to detect at least the two-dimensional position of a sound source as shown in FIG. 5 (a direction expressed as an angle in top view), but it may detect the three-dimensional position of the sound source. In the following description, the position of a sound source detected by the sound source detection unit 31 is also referred to as the second position.

The sound source detection unit 31 can track the position of the sound source (the second position) by repeating the operation of FIG. 4 every unit time. Note that a sound source is specifically a speaker (person) staying in the space 70, but may also be a device installed in the space 70.
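The description fixes the preprocessing (spatial correlation matrix and eigendecomposition) but not the exact direction-scan step. The following is a minimal Python sketch of steps S11 to S15 that fills that gap with a standard MUSIC-style spatial spectrum; the circular-array geometry and the parameters mic_angles_rad, radius_m, freq_hz, and n_sources are illustrative assumptions, not values from the source.

```python
import numpy as np

def estimate_source_directions(frames, mic_angles_rad, radius_m,
                               freq_hz=1000.0, speed_of_sound=343.0,
                               n_sources=2):
    # frames: complex STFT values of shape (T, M) at one frequency bin,
    # one column per microphone element (output of step S12).
    T, M = frames.shape
    # S13: spatial correlation matrix R = E[x x^H], averaged over frames.
    R = np.einsum('tm,tn->mn', frames, frames.conj()) / T
    # S14: eigenvalue decomposition; eigh returns eigenvalues ascending,
    # so the first M - n_sources eigenvectors span the noise subspace.
    eigvals, eigvecs = np.linalg.eigh(R)
    noise = eigvecs[:, :M - n_sources]
    # S15: scan candidate directions; peaks of the MUSIC spectrum mark
    # the directions from which relatively loud sound arrives.
    thetas = np.deg2rad(np.arange(360))
    spectrum = np.empty(len(thetas))
    for i, theta in enumerate(thetas):
        # Steering vector of a plane wave from direction theta for the
        # assumed circular array.
        delays = radius_m * np.cos(theta - mic_angles_rad) / speed_of_sound
        a = np.exp(-2j * np.pi * freq_hz * delays)
        spectrum[i] = 1.0 / (np.linalg.norm(noise.conj().T @ a) ** 2 + 1e-12)
    # Crude peak picking; a real implementation would find local maxima.
    return np.rad2deg(thetas[np.argsort(spectrum)[-n_sources:]])
```

A plain delay-and-sum beamformer scan would also fit the description; MUSIC is used here only because it works directly on the eigenvectors computed in step S14.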
[Operation of Person Detection Unit]
Next, the operation of the person detection unit 32 will be described in more detail. FIG. 6 is a flowchart of the operation of the person detection unit 32. Note that the following operations are actually performed on the video information acquired from each of the four cameras 22; for convenience, however, the description below assumes that video information is acquired from a single camera 22.

First, the person detection unit 32 acquires video information of video captured in the space 70 from the camera 22 of the sensing device 20 (S21).

Next, the person detection unit 32 identifies, based on the acquired video information, the regions of the video in which people appear (S22). The person detection unit 32 can identify the regions in which people appear by a method using pattern matching, a method using a machine learning model, or the like.

Next, the person detection unit 32 assigns identification information to each identified region (that is, each region where a person is present) (S23). For example, the person detection unit 32 identifies three regions and assigns the identification information A, B, and C to them. Hereinafter, the person corresponding to region A is also referred to as person A, the person corresponding to region B as person B, and the person corresponding to region C as person C.

Next, the person detection unit 32 identifies the direction in which each person is located based on the position of the region identified in step S23 within the video (S24). Information indicating the installation position of the camera 22 (the center of the top of the desk 40) and the imaging range (angle of view) of the camera 22 is stored in the storage unit 37 in advance, so the person detection unit 32 can identify which direction a position in the video corresponds to.

Next, the person detection unit 32 estimates the distance from the sensing device 20 (camera 22) to each person based on the size of the region identified in step S23 (S25). In this case, the larger the region identified in step S23, the shorter the distance from the sensing device 20 (camera 22) to the person is estimated to be. Note that, in step S25, the distance from the sensing device 20 (camera 22) to the person may instead be determined from the distance information (measured distance values) acquired from the ranging sensor 23.

As a result of steps S24 and S25, the person detection unit 32 can detect the positions of the people in the space 70, as shown in FIG. 7. FIG. 7 is a diagram schematically showing a detection result of the positions of people (the desk 40 viewed from above). In the example of FIG. 7, the positions of three people, person A, person B, and person C, are detected. Note that the person detection unit 32 detects the three-dimensional position of a person (the three-dimensional coordinates of the person's position), but it suffices to detect at least the two-dimensional position of the person as shown in FIG. 7 (a direction expressed as an angle in top view). In the following description, the position of a person detected by the person detection unit 32 is also referred to as the first position.

The person detection unit 32 can track the position of each person (the first position) by repeating the operation of FIG. 6 every unit time. In this case, the assignment of identification information in step S23 need only be performed the first time.
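As a concrete illustration of steps S24 and S25, the sketch below maps a detected bounding box to a direction and a distance under a pinhole-camera model. The box format, the calibration pair ref_height_px / ref_dist_m, and the function name are hypothetical; the source states only that direction comes from the region's position and distance from the region's size (or from the ranging sensor 23).

```python
import math

def direction_and_distance(box, image_width_px, fov_deg,
                           ref_height_px=400.0, ref_dist_m=1.0):
    # box: (u, v, w_px, h_px), the region of one detected person (S22).
    u, v, w_px, h_px = box
    # S24: horizontal pixel offset from the image centre -> angle,
    # using the pinhole relation offset = f * tan(angle).
    offset = (u + w_px / 2.0) / image_width_px - 0.5   # range -0.5 .. +0.5
    angle_rad = math.atan(2.0 * offset * math.tan(math.radians(fov_deg) / 2.0))
    # S25: a larger region means a closer person; under a pinhole model
    # the apparent height is inversely proportional to distance.
    distance_m = ref_dist_m * ref_height_px / h_px
    return math.degrees(angle_rad), distance_m
```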
[Operation of Speech Amount Estimation Unit]
Next, the operation of the speech amount estimation unit 34 will be described in more detail. FIG. 8 is a flowchart of the operation of the speech amount estimation unit 34.

The speech amount estimation unit 34 acquires the first position detected by the person detection unit 32 and the second position detected by the sound source detection unit 31 (S31). The first position and the second position acquired here are those detected at substantially the same timing, where "substantially the same" means that some deviation is permissible.

Next, the speech amount estimation unit 34 converts the first position, expressed in three-dimensional coordinates, into two-dimensional coordinates (an angle) comparable to the second position (S32). Note that, when the position of the microphone array 21 and the position of the camera 22 differ, the converted first position is corrected based on the difference between these positions.

Next, the speech amount estimation unit 34 detects a speaker (the speaker's position) by matching the converted first position against the second position (S33). FIG. 9 is a diagram schematically showing a detection result of the positions of speakers; it superimposes the second positions (FIG. 5) on the first positions (FIG. 7).

For example, when the angle difference Δθ1 between the second position of sound source S1 and the first position of person A is less than or equal to a predetermined value, the speech amount estimation unit 34 determines that sound source S1 is person A; that is, person A is detected as a speaker. Similarly, when the angle difference Δθ2 between the second position of sound source S2 and the first position of person C is less than or equal to a predetermined value, the speech amount estimation unit 34 determines that sound source S2 is person C; that is, person C is detected as a speaker.

By repeating the operation of FIG. 8 every unit time, the speech amount estimation unit 34 can track each of person A, person B, and person C and estimate the amount of speech of each. Specifically, the speech amount estimation unit 34 can estimate the period during which each of person A, person B, and person C is detected as a speaker to be the period during which that person is speaking. That is, the speech amount estimation unit 34 can store in the storage unit 37 information indicating the period during which each of person A, person B, and person C is speaking (information indicating the amount of speech). FIG. 10 is a diagram showing an example of information indicating the amount of speech. As shown in FIG. 10, the information indicating the amount of speech is information in which an amount of speech (speaking time) is associated with each piece of identification information assigned in step S23.

In this way, the speech amount estimation unit 34 can track each of a plurality of people based on the first positions detected by the person detection unit 32 and the second positions detected by the sound source detection unit 31, and can estimate the amount of speech of each of the people staying in the space 70 (the amount of speech of a target person). This method of estimating the amount of speech is useful for analyzing the quality of communication in meetings and the like in which people move from their seats, for example to use the whiteboard 60.

Moreover, rather than identifying individuals by voice recognition or image recognition, the speech amount estimation unit 34 can estimate the amount of speech of each person while maintaining the anonymity of the people.
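A minimal sketch of the projection in step S32, the matching in step S33, and the accumulation behind FIG. 10. The 10° matching threshold is an assumption (the source says only "a predetermined value"), and the function names and the per-step duration dt_s are illustrative.

```python
import math

def to_angle_deg(xyz):
    # S32: project a three-dimensional first position onto the top view
    # and express it as an angle, comparable to a second position.
    x, y, _ = xyz
    return math.degrees(math.atan2(y, x))

def detect_speakers(person_angles_deg, source_angles_deg, tol_deg=10.0):
    # S33: a person whose first position (as an angle) lies within
    # tol_deg of a sound-source angle (second position) is a speaker.
    def diff(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return [pid for pid, p in person_angles_deg.items()
            if any(diff(p, s) <= tol_deg for s in source_angles_deg)]

def accumulate_speech(totals, speakers, dt_s=1.0):
    # Add one unit-time step per detected speaker (the FIG. 10 record).
    for pid in speakers:
        totals[pid] = totals.get(pid, 0.0) + dt_s
    return totals

# Example matching FIG. 9: sound sources detected near persons A and C.
totals = accumulate_speech({}, detect_speakers(
    {"A": 40.0, "B": 120.0, "C": 265.0}, [38.5, 262.0]))
# totals == {"A": 1.0, "C": 1.0}
```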
[Operation of Head Orientation Detection Unit]
Next, the operation of the head orientation detection unit 33 will be described in more detail. FIG. 11 is a flowchart of the operation of the head orientation detection unit 33. FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit 33. As shown in FIG. 12, in the following description of the operation of the head orientation detection unit 33, the camera 22 is assumed to be installed not at the center position O (0, 0, 0) of the desk 40 but at a position C (x0, y0, z0) on the desk 40. In addition to the three-dimensional coordinates of the space 70 indicated by X-Y-Z, FIG. 12 also shows the in-image coordinates indicated by U-V.

First, the head orientation detection unit 33 acquires video information of video captured in the space 70 from the camera 22 of the sensing device 20 (S41).

Next, the head orientation detection unit 33 identifies the orientation of a person's head appearing in the video based on the acquired video information (S42). The head orientation detection unit 33 can detect the orientation of the person's head, for example, by identifying the region of the video corresponding to the person's head through face recognition processing and applying a machine learning model to the identified region. As shown in FIG. 12, the vector indicating the head orientation at this point forms an angle of A° with the line segment (straight line) connecting the person's position P and the camera position C.

Next, the head orientation detection unit 33 acquires the position of the person (more precisely, the position of the person's head) detected by the person detection unit 32 (S43). As shown in FIG. 12, the person's position at this point is P (x1, y1, z1).

For example, using z1 − z0, which is determined by the actual size of the desk 40, the horizontal viewing angle α of the camera 22, the person's position (u, v) in the video, and the width w of the video, x1 can be expressed by the following equation. Information such as the size of the desk 40, the horizontal viewing angle α of the camera 22, and the width w of the video is stored in the storage unit 37 in advance.
x1 = x0 + ((u - w/2)/w) × (z1 - z0) × tan(α)
Next, the head orientation detection unit 33 corrects the head orientation (angle A) identified in step S42 with respect to the position of the camera 22, based on the person's position acquired in step S43 (specifically, the value of coordinate x1) (S44). FIG. 13 is a diagram for explaining this angle correction (the desk 40 viewed from above).

The head orientation detection unit 33 can calculate the angle ∠OPC in FIG. 13 based on the coordinate x1 acquired in step S43, and can take A + ∠OPC as the corrected angle.

The head orientation detection unit 33 can track the orientation of a person's head by repeating the operation of FIG. 11 every unit time.
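A minimal sketch of the x1 equation and the correction in step S44, working in the top view of FIG. 13. The function names are illustrative, and the sign handling assumes the head angle A is measured on the same rotational side as ∠OPC, which the description leaves implicit.

```python
import math

def x1_from_image(u, w, x0, z1, z0, alpha_rad):
    # The x1 equation above: horizontal world coordinate of the person
    # from the in-image horizontal position u.
    return x0 + ((u - w / 2.0) / w) * (z1 - z0) * math.tan(alpha_rad)

def corrected_head_angle_deg(angle_a_deg, person_xy, camera_xy):
    # S44: correct the head angle A, measured relative to the line PC
    # (person P to camera C), into an angle measured relative to the
    # line PO (person P to the desk-centre reference O at the origin).
    px, py = person_xy
    cx, cy = camera_xy
    angle_po = math.atan2(-py, -px)           # direction of ray P -> O
    angle_pc = math.atan2(cy - py, cx - px)   # direction of ray P -> C
    d = math.degrees(angle_po - angle_pc)
    angle_opc = abs((d + 180.0) % 360.0 - 180.0)
    return angle_a_deg + angle_opc            # A + angle OPC, as in FIG. 13
```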
[Operation of Focus Determination Unit]
Next, the operation of the focus determination unit 35 will be described in more detail. FIG. 14 is a flowchart of the operation of the focus determination unit 35.

The focus determination unit 35 acquires the first positions detected by the person detection unit 32 and the head orientations detected by the head orientation detection unit 33 (S51). More precisely, the first positions acquired here are the first positions of each of the people staying in the space 70, and the head orientations are those of each of these people.

Next, the focus determination unit 35 takes one of the people as a target person and determines the target direction of the target person (S52). FIG. 15 is a diagram for explaining the target direction of the target person (the desk 40 viewed from above).

For example, when the person at position P is the target person, the focus determination unit 35 determines the direction connecting position P to position G, where another person is located, as a target direction. Note that a target direction does not necessarily have to be determined toward a person; as described later, target directions may also be determined toward the display 50 and the whiteboard 60.

Next, the focus determination unit 35 determines an allowable range centered on the target direction (S53). For example, in FIG. 15, the range from ∠GPO − β to ∠GPO + β is determined as the allowable range, where β is a coefficient that defines the allowable range. The value of β differs depending on whether the target direction points toward a person, the display 50, or the whiteboard 60.

Next, the focus determination unit 35 determines the focus of the target person by comparing the target person's head orientation acquired in step S51 with the allowable range determined in step S53 (S54). Focus here means focus on the communication performed in the space 70 by the plurality of people including the target person. When the target person's head orientation is within the allowable range, the target person is considered to be looking at another person, so the focus determination unit 35 determines that the target person is focusing on the communication. Conversely, when the target person's head orientation is outside the allowable range, the target person is considered not to be looking at another person, so the focus determination unit 35 determines that the target person is not focusing on the communication.
By repeating the operation of FIG. 14 every unit time with each of the people (person A, person B, and person C) as the target person, the focus determination unit 35 can determine the focus of each of person A, person B, and person C. Specifically, the focus determination unit 35 can accumulate the periods during which each of person A, person B, and person C is determined to be focusing, and store in the storage unit 37 information indicating the periods during which the people are focusing on the communication. FIG. 16 is a diagram showing an example of information indicating the focus periods. As shown in FIG. 16, the information indicating the focus periods is information in which a focus period is associated with each piece of identification information assigned in step S23.

In this way, the focus determination unit 35 can determine, based on the first positions detected by the person detection unit 32 and the head orientations detected by the head orientation detection unit 33, the target person's focus on the communication performed by the plurality of people in the space 70.

Note that the focus determination unit 35 may additionally acquire the second positions detected by the sound source detection unit 31 to detect the speaker's position, or may acquire the speaker's position detected by the speech amount estimation unit 34. In either case, the focus determination unit 35 can determine focus while taking into account whether each of the people is a speaker. For example, it may determine that the target person is focusing on the communication only when the target person's head is oriented toward a speaker, and that the target person is not focusing when the target person is looking toward a person who is not speaking.

When taking into account whether each of the people is a speaker, the focus determination unit 35 may also determine focus as follows.

For example, when a meeting is being held by a plurality of people in the space 70, the target person (say, person C) finds it difficult to turn toward person A when person A is located directly beside the target person, even while person A is speaking; the target person is then likely to look at the other person, person B, instead. The target person may also turn toward the display 50 (shown in FIG. 2) or the whiteboard 60 (shown in FIG. 2) on which meeting materials and the like are displayed. In such cases, if the target person were determined to be focusing on the communication only when the target person's head is oriented toward the speaker, as described above, a target person who is actually focusing on the communication would be determined not to be focusing.

Therefore, in such cases, while person A is speaking, the target person may be determined to be focusing on the communication during any of the following: (1) a period during which the target person faces person A; (2) a period during which the target person faces the display 50; (3) a period during which the target person faces a person other than person A (person B); and (4) a period during which the target person faces the whiteboard 60. In other words, while a person other than the target person among the plurality of people is speaking, the focus determination unit 35 may determine that a period during which the detected head orientation of the target person faces any of a person other than the target person among the plurality of people, the display 50, and the whiteboard 60 is a period during which the target person is focusing on the communication. Note that a period during which the target person faces the whiteboard 60 may be determined to be a period of focus on the communication under the condition that person A is speaking near the whiteboard.
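A minimal sketch of steps S53 and S54 combined with the relaxed rule (1) to (4) above. The width values beta_person, beta_display, and beta_board are illustrative, since the source states only that the coefficient β differs depending on the type of target; the behaviour when no one else is speaking is omitted for brevity.

```python
def within_range(head_deg, target_deg, beta_deg):
    # S53/S54: is the head orientation inside target ± β?
    return abs((head_deg - target_deg + 180.0) % 360.0 - 180.0) <= beta_deg

def is_focusing(head_deg, other_person_dirs, display_dir, whiteboard_dir,
                someone_else_speaking,
                beta_person=15.0, beta_display=20.0, beta_board=20.0):
    # While another person is speaking, facing any other participant, the
    # display 50, or the whiteboard 60 counts as focusing (rules (1)-(4)).
    if not someone_else_speaking:
        return False
    targets = [(d, beta_person) for d in other_person_dirs]
    targets += [(display_dir, beta_display), (whiteboard_dir, beta_board)]
    return any(within_range(head_deg, t, b) for t, b in targets)
```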
[Operation of Communication Analysis Unit]
Next, the operation of the communication analysis unit 36 will be described in more detail. FIG. 17 is a flowchart of the operation of the communication analysis unit 36.

The communication analysis unit 36 reads from the storage unit 37 the stored information indicating the amounts of speech (FIG. 10) and the stored information indicating the focus periods (FIG. 16) (S61).

Next, the communication analysis unit 36 analyzes the quality of the communication in the space 70 based on the acquired information indicating the amounts of speech and the acquired information indicating the focus periods. For example, the communication analysis unit scores the quality of the communication using the following scoring criteria.

For example, based on the acquired information indicating the amounts of speech, the communication analysis unit 36 calculates the ratio of the amounts of speech of persons A to C, and sets the first score, which relates to the amount of speech, to a larger value as the minimum/maximum of this ratio approaches 1 (that is, as persons A to C speak more evenly). Specifically, when the ratio of the amounts of speech of persons A to C is 1 : 1.2 : 1.5, the communication analysis unit 36 calculates the first score based on the difference between 1 (minimum) / 1.5 (maximum) and 1.

Also, based on the acquired information indicating the focus periods, the communication analysis unit 36 sets the second score, which relates to focus, to a larger value as the average of the focus periods of persons A to C increases, for example.

The communication analysis unit 36 then calculates the sum of the first score and the second score as a final score indicating the quality of the communication in the space 70. Each of persons A to C can check the final score indicating the quality of the communication (that is, the analysis result of the communication quality) by accessing the communication analysis system 10 (information processing system 30) using, for example, an information terminal such as a smartphone or a personal computer. FIG. 18 is a diagram showing an example of a display screen, shown on an information terminal, of a score indicating the quality of communication.

In this way, the communication analysis unit 36 can analyze the quality of the communication performed by the plurality of people in the space 70 based on the estimation result of the amounts of speech and the determination result of focus. Note that it suffices for the communication analysis unit 36 to analyze the quality of the communication based on at least one of the estimation result of the amounts of speech and the determination result of focus. The scoring criteria described above are also merely an example, and the score may be determined as appropriate according to what is required of the communication in the space 70.
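A minimal sketch of the scoring just described. The weights w1 and w2, the focus normalization meeting_s, and the use of the min/max ratio directly (a monotone equivalent of its difference from 1) are assumptions; the source fixes only the structure (first score from the evenness of the speech ratio, second score from the mean focus period, final score as their sum).

```python
def final_score(speech_s, focus_s, meeting_s, w1=50.0, w2=50.0):
    # speech_s / focus_s map person IDs to speaking time and focus period
    # in seconds (FIG. 10 and FIG. 16); meeting_s is the total meeting
    # length used to normalize the mean focus period.
    amounts = list(speech_s.values())
    evenness = min(amounts) / max(amounts)          # 1.0 = perfectly even
    first = w1 * evenness                           # larger when even
    mean_focus = sum(focus_s.values()) / len(focus_s)
    second = w2 * min(mean_focus / meeting_s, 1.0)  # larger when focused
    return first + second

# The 1 : 1.2 : 1.5 example from the text gives evenness 1/1.5.
score = final_score({"A": 600.0, "B": 720.0, "C": 900.0},
                    {"A": 1800.0, "B": 2400.0, "C": 2100.0},
                    meeting_s=3600.0)
```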
[Modifications]
In the embodiment described above, the microphone array 21, the camera 22, and the ranging sensor 23 are installed on the desk 40. However, the microphone array 21, the camera 22, and the ranging sensor 23 may instead be installed on the ceiling. Moreover, the microphone array 21, the camera 22, and the ranging sensor 23 need not be grouped in one place, and may be installed at mutually different locations.

In the above embodiment, the sound source detection unit 31 and the person detection unit 32 are used together to detect a speaker, but a speaker can also be detected by the person detection unit 32 alone. For example, the person detection unit 32 can detect whether a person is speaking from the movement of the person appearing in the video.

In the above embodiment, the communication analysis unit 36 may also calculate the final score indicating the quality of the communication (that is, the analysis result of the communication quality) in real time while a meeting or the like is being held in the space 70, and control the environment of the space 70 in real time based on the calculated final score.

For example, when the communication analysis unit 36 determines that the calculated final score is below a predetermined value (that is, that the communication is not active), it controls the environment of the space 70 by transmitting a control signal to environment control equipment (not shown) installed in the space 70.

Specifically, the environment control equipment includes an air conditioner that controls the temperature environment of the space 70, lighting equipment that controls the light environment of the space 70, a scent generator that controls the smell of the space 70, a music playback device that controls the sound environment of the space 70, and the like.

For example, the communication analysis unit 36 can stimulate communication in the space 70 by raising the set temperature of the air conditioner or making the lighting equipment brighter than it currently is. The communication analysis unit 36 may also stimulate communication in the space 70 by activating the scent generator installed in the space 70 or causing the music playback device installed in the space 70 to play music.
[Effects and the Like]
As described above, the focus determination system 39 includes: the person detection unit 32, which acquires video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detects first positions, which are the positions of the plurality of people in the space 70, based on the acquired video information; the head orientation detection unit 33, which acquires the video information and detects the head orientation of a target person among the plurality of people based on the acquired video information; and the focus determination unit 35, which determines the target person's focus on the communication performed by the plurality of people in the space 70 based on the detected first positions and the detected head orientation of the target person. While a person other than the target person among the plurality of people is speaking, the focus determination unit 35 determines that a period during which the detected head orientation of the target person faces any of a person other than the target person among the plurality of people, the display 50, and the whiteboard 60 is a period during which the target person is focusing on the communication.

Such a focus determination system 39 can determine to what extent the target person is focusing on the communication.
For example, the desk 40 is installed in the space 70, and the person detection unit 32 acquires the video information from the camera 22 installed on the desk 40.

Such a focus determination system 39 can acquire video information from the camera 22 installed on the desk 40.

For example, a plurality of cameras 22 are installed in the space 70, and the video information acquired by the person detection unit 32 includes video information of video captured by each of the cameras 22.

Such a focus determination system 39 can acquire video information from the plurality of cameras 22 installed on the desk 40.

For example, the ranging sensor 23, which measures the distance from the camera 22 to the target person, is installed in the space 70, and the person detection unit 32 detects the first positions based on the acquired video information and the detection result of the ranging sensor 23.

Such a focus determination system 39 can detect the first positions based on the detection result of the ranging sensor 23.

For example, to detect the first positions, the person detection unit 32 estimates the distance from the person detection unit 32 to the target person based on the size of the target person in the video.

Such a focus determination system 39 can estimate the distance from the person detection unit 32 to the target person based on the size of the target person in the video.
The communication analysis system 10 includes the focus determination system 39 and the communication analysis unit 36, which analyzes the quality of the communication performed by the plurality of people in the space 70 based on the determined focus of the target person.

Such a communication analysis system 10 can analyze the quality of the communication based on the focus of the target person.

For example, the communication analysis system 10 further includes: the sound source detection unit 31, which acquires sound information of sound captured in the space 70 and detects second positions, which are the positions of sound sources in the space 70, based on the acquired sound information; and the speech amount estimation unit 34, which tracks the target person and estimates the amount of speech of the target person being tracked, based on the detected first positions and the detected second positions. The communication analysis unit 36 analyzes the quality of the communication based on the determined focus of the target person and the estimated amount of speech of the target person.

Such a communication analysis system 10 can analyze the quality of the communication based on the focus of the target person and the amount of the target person's speech.

The focus determination method executed by a computer such as the focus determination unit 35 includes: a first detection step of acquiring video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detecting first positions, which are the positions of the plurality of people in the space 70, based on the acquired video information; a second detection step of acquiring the video information and detecting the head orientation of a target person among the plurality of people based on the acquired video information; and a focus determination step of determining the target person's focus on the communication performed by the plurality of people in the space 70 based on the detected first positions and the detected head orientation of the target person. In the focus determination step, while a person other than the target person among the plurality of people is speaking, a period during which the detected head orientation of the target person faces any of a person other than the target person among the plurality of people, the display 50, and the whiteboard 60 is determined to be a period during which the person is focusing on the communication.

Such a focus determination method can determine to what extent the target person is focusing on the communication.
 (その他の実施の形態)
 以上、実施の形態について説明したが、本発明は、上記実施の形態に限定されるものではない。
(Other embodiments)
Although the embodiments have been described above, the present invention is not limited to the above embodiments.
 例えば、上記実施の形態では、コミュニケーション解析システムは、複数の装置によって実現されたが、単一の装置として実現されてもよい。例えば、コミュニケーション解析システムは、情報処理システム、話者ダイアライゼーションシステム、または、注力判定システムに相当する単一の装置として実現されてもよい。コミュニケーション解析システムが複数の装置によって実現される場合、コミュニケーション解析システムが備える機能的な構成要素は、複数の装置にどのように振り分けられてもよい。 For example, in the above embodiments, the communication analysis system was implemented by multiple devices, but it may be implemented as a single device. For example, the communication analysis system may be implemented as a single device corresponding to an information processing system, a speaker diarization system, or an attention determination system. When the communication analysis system is realized by multiple devices, the functional components included in the communication analysis system may be distributed to the multiple devices in any way.
 また、上記実施の形態における装置間の通信方法については特に限定されるものではない。また、装置間の通信においては、図示されない中継装置が介在してもよい。 Also, the communication method between devices in the above embodiment is not particularly limited. Further, a relay device (not shown) may intervene in communication between devices.
 また、上記実施の形態において、特定の処理部が実行する処理を別の処理部が実行してもよい。また、複数の処理の順序が変更されてもよいし、複数の処理が並行して実行されてもよい。 Further, in the above embodiment, the processing executed by a specific processing unit may be executed by another processing unit. In addition, the order of multiple processes may be changed, and multiple processes may be executed in parallel.
 また、上記実施の形態において、各構成要素は、各構成要素に適したソフトウェアプログラムを実行することによって実現されてもよい。各構成要素は、CPUまたはプロセッサなどのプログラム実行部が、ハードディスクまたは半導体メモリなどの記録媒体に記録されたソフトウェアプログラムを読み出して実行することによって実現されてもよい。 Also, in the above embodiments, each component may be realized by executing a software program suitable for each component. Each component may be realized by reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory by a program execution unit such as a CPU or processor.
 また、各構成要素は、ハードウェアによって実現されてもよい。例えば、各構成要素は、回路(または集積回路)でもよい。これらの回路は、全体として1つの回路を構成してもよいし、それぞれ別々の回路でもよい。また、これらの回路は、それぞれ、汎用的な回路でもよいし、専用の回路でもよい。 Also, each component may be realized by hardware. For example, each component may be a circuit (or integrated circuit). These circuits may form one circuit as a whole, or may be separate circuits. These circuits may be general-purpose circuits or dedicated circuits.
 また、本発明の全般的または具体的な態様は、システム、装置、方法、集積回路、コンピュータプログラムまたはコンピュータ読み取り可能なCD-ROMなどの記録媒体で実現されてもよい。また、システム、装置、方法、集積回路、コンピュータプログラム及び記録媒体の任意な組み合わせで実現されてもよい。 Also, general or specific aspects of the present invention may be implemented in a system, apparatus, method, integrated circuit, computer program, or recording medium such as a computer-readable CD-ROM. Also, any combination of systems, devices, methods, integrated circuits, computer programs and recording media may be implemented.
 For example, the present invention may be realized as a speech amount estimation method executed by a computer such as the speaker diarization system, as a program for causing a computer to execute such a speech amount estimation method, or as a computer-readable non-transitory recording medium on which such a program is recorded.
 Likewise, the present invention may be realized as a focus determination method executed by a computer such as the focus determination system, as a program for causing a computer to execute such a focus determination method, or as a computer-readable non-transitory recording medium on which such a program is recorded.
 The present invention also encompasses forms obtained by applying various modifications conceivable to a person skilled in the art to each embodiment, and forms realized by arbitrarily combining the components and functions of the embodiments without departing from the spirit of the present invention.
 10 communication analysis system
 20 sensing device
 21 microphone array
 22 camera
 23 ranging sensor (sensor)
 30 information processing system
 31 sound source detection unit
 32 person detection unit
 33 head orientation detection unit
 34 speech amount estimation unit
 35 focus determination unit
 36 communication analysis unit
 37 storage unit
 38 speaker diarization system
 39 focus determination system
 40 desk
 50 display
 60 whiteboard
 70 space

Claims (9)

  1.  A focus determination system comprising:
      a person detection unit that acquires video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detects, based on the acquired video information, first positions that are positions of the plurality of people in the space;
      a head orientation detection unit that acquires the video information and detects, based on the acquired video information, a head orientation of a target person among the plurality of people; and
      a focus determination unit that determines, based on the detected first positions and the detected head orientation of the target person, the target person's focus on communication conducted by the plurality of people in the space,
      wherein the focus determination unit determines, as a period during which the target person is focusing on the communication, a period during which the detected head orientation of the target person faces any one of a person other than the target person among the plurality of people, the display, and the whiteboard while a person other than the target person among the plurality of people is speaking.
  2.  The focus determination system according to claim 1,
      wherein a desk is installed in the space, and
      the person detection unit acquires the video information from a camera installed on the desk.
  3.  The focus determination system according to claim 2,
      wherein a plurality of cameras are installed in the space, and
      the video information acquired by the person detection unit includes video information of video captured by each of the plurality of cameras.
  4.  The focus determination system according to claim 2,
      wherein a sensor that measures a distance from the camera to the target person is installed in the space, and
      the person detection unit detects the first positions based on the acquired video information and a detection result of the sensor.
  5.  The focus determination system according to claim 1,
      wherein, to detect the first positions, the person detection unit estimates a distance from the person detection unit to the target person based on a size of the target person in the video.
  6.  A communication analysis system comprising:
      the focus determination system according to any one of claims 1 to 5; and
      a communication analysis unit that analyzes, based on the determined focus of the target person, a quality of communication conducted by the plurality of people in the space.
  7.  The communication analysis system according to claim 6, further comprising:
      a sound source detection unit that acquires sound information of sound picked up in the space and detects, based on the acquired sound information, a second position that is a position of a sound source in the space; and
      a speech amount estimation unit that tracks the target person based on the detected first positions and the detected second position, and estimates a speech amount of the target person being tracked,
      wherein the communication analysis unit analyzes the quality of the communication based on the determined focus of the target person and the estimated speech amount of the target person.
  8.  A focus determination method comprising:
      a first detection step of acquiring video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detecting, based on the acquired video information, first positions that are positions of the plurality of people in the space;
      a second detection step of acquiring the video information and detecting, based on the acquired video information, a head orientation of a target person among the plurality of people; and
      a focus determination step of determining, based on the detected first positions and the detected head orientation of the target person, the target person's focus on communication conducted by the plurality of people in the space,
      wherein, in the focus determination step, a period during which the detected head orientation of the target person faces any one of a person other than the target person among the plurality of people, the display, and the whiteboard while a person other than the target person among the plurality of people is speaking is determined as a period during which the target person is focusing on the communication.
  9.  A program for causing a computer to execute the focus determination method according to claim 8.
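
 For illustration only, the following Python sketches relate to the distance estimation recited in claim 5 and the sound-source matching recited in claim 7; they are not part of the claims. The function names, the pinhole-camera constants, and the matching radius are all assumptions introduced here.

```python
import math

# Claim 5 illustration: estimating distance from the apparent size of the
# target person, assuming a pinhole camera model. Constants are illustrative.
ASSUMED_PERSON_HEIGHT_M = 1.7  # assumed real-world height of a person
FOCAL_LENGTH_PX = 1000.0       # assumed focal length of the camera in pixels

def estimate_distance(bbox_height_px):
    """Estimate the camera-to-person distance from a bounding-box height."""
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    # Pinhole model: image_size_px = focal_length_px * real_size_m / distance_m
    return FOCAL_LENGTH_PX * ASSUMED_PERSON_HEIGHT_M / bbox_height_px

# Claim 7 illustration: accumulating per-person speech time by matching
# detected sound-source positions (second positions) to tracked person
# positions (first positions).
def attribute_speech(frames, match_radius_m=0.5):
    """Each frame is a dict with 'dt' (frame duration in seconds),
    'people' (person_id -> (x, y)) and 'sources' (list of (x, y))."""
    totals = {}
    for f in frames:
        for sx, sy in f['sources']:
            best_id, best_d = None, match_radius_m
            for pid, (px, py) in f['people'].items():
                d = math.hypot(sx - px, sy - py)
                if d <= best_d:  # nearest tracked person within the radius
                    best_id, best_d = pid, d
            if best_id is not None:
                totals[best_id] = totals.get(best_id, 0.0) + f['dt']
    return totals
```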
PCT/JP2022/024136 2021-06-28 2022-06-16 Focus determining system, communication analyzing system, and focus determining method WO2023276700A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023531788A JPWO2023276700A1 (en) 2021-06-28 2022-06-16

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021106485 2021-06-28
JP2021-106485 2021-06-28

Publications (1)

Publication Number Publication Date
WO2023276700A1 (en)

Family

ID=84692337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/024136 WO2023276700A1 (en) 2021-06-28 2022-06-16 Focus determining system, communication analyzing system, and focus determining method

Country Status (2)

Country Link
JP (1) JPWO2023276700A1 (en)
WO (1) WO2023276700A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018036868A (en) * 2016-08-31 2018-03-08 株式会社リコー Conference support system, conference support device, and conference support method
JP2019057061A (en) * 2017-09-20 2019-04-11 富士ゼロックス株式会社 Information output device and program
JP2019200475A (en) * 2018-05-14 2019-11-21 富士通株式会社 Activity evaluation program, apparatus, and method
WO2021090702A1 (en) * 2019-11-07 2021-05-14 ソニー株式会社 Information processing device, information processing method, and program

Also Published As

Publication number Publication date
JPWO2023276700A1 (en) 2023-01-05

Similar Documents

Publication Publication Date Title
EP3400705B1 (en) Active speaker location detection
US7039199B2 (en) System and process for locating a speaker using 360 degree sound source localization
JP6519370B2 (en) User attention determination system, method and program
CN108089152B (en) Equipment control method, device and system
JP5004276B2 (en) Sound source direction determination apparatus and method
US10582117B1 (en) Automatic camera control in a video conference system
US10241990B2 (en) Gesture based annotations
US10645520B1 (en) Audio system for artificial reality environment
WO2010109700A1 (en) Three-dimensional object determining device, three-dimensional object determining method and three-dimensional object determining program
JP4595364B2 (en) Information processing apparatus and method, program, and recording medium
JP2018520595A (en) Multifactor image feature registration and tracking method, circuit, apparatus, system, and associated computer-executable code
CN111551921A (en) Sound source orientation system and method based on sound image linkage
WO2009119288A1 (en) Communication system and communication program
WO2023276700A1 (en) Focus determining system, communication analyzing system, and focus determining method
WO2023276701A1 (en) Speaker diarization system, communication analysis system, and utterance amount estimation method
WO2021033592A1 (en) Information processing apparatus, information processing method, and program
RU174044U1 (en) AUDIO-VISUAL MULTI-CHANNEL VOICE DETECTOR
KR101976937B1 (en) Apparatus for automatic conference notetaking using mems microphone array
CN111273232B (en) Indoor abnormal condition judging method and system
Li et al. Multiple active speaker localization based on audio-visual fusion in two stages
WO2021090702A1 (en) Information processing device, information processing method, and program
WO2021065694A1 (en) Information processing system and method
CN110730378A (en) Information processing method and system
US9883142B1 (en) Automated collaboration system
JPWO2023276700A5 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22832850

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023531788

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE