WO2023276700A1 - Focus determination system, communication analysis system, and focus determination method - Google Patents


Info

Publication number
WO2023276700A1
WO2023276700A1 (PCT/JP2022/024136)
Authority
WO
WIPO (PCT)
Prior art keywords
person
space
detection unit
communication
people
Prior art date
Application number
PCT/JP2022/024136
Other languages
English (en)
Japanese (ja)
Inventor
一樹 北村
直毅 吉川
ジャマル ムリアナ ユスフ ビン
プラティック プラネイ
ジアリ マ
Original Assignee
Panasonic IP Management Co., Ltd.
Priority date: 2021-06-28
Filing date: 2022-06-16
Publication date: 2023-01-05
Application filed by Panasonic IP Management Co., Ltd.
Priority to JP2023531788A (published as JPWO2023276700A1)
Publication of WO2023276700A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/10: Office automation; Time management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41: Structure of client; Structure of client peripherals
    • H04N 21/422: Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442: Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk

Definitions

  • the present invention relates to a focus determination system, a communication analysis system, and a focus determination method.
  • Patent Literature 1 discloses an information providing apparatus that provides information on the psychological state of a subject in order to evaluate intellectual activity in a meeting, discussion, or the like.
  • the present invention provides a focus determination system and the like that can determine how much a target person is focused on communication.
  • A focus determination system according to one aspect of the present invention includes: a human detection unit that acquires video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detects, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a head orientation detection unit that acquires the video information and detects, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination unit that determines, based on the detected first position and the detected orientation of the target person's head, the target person's focus on communication conducted by the plurality of people in the space. When a person other than the target person among the plurality of people is speaking, the focus determination unit determines a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard as a period in which the target person is focusing on the communication.
  • A communication analysis system according to one aspect of the present invention includes the above focus determination system and a communication analysis unit that analyzes the quality of communication conducted by the plurality of people in the space based on the determined focus of the target person.
  • A focus determination method according to one aspect of the present invention includes: a first detection step of acquiring video information of video showing a plurality of people staying in a space in which a display and a whiteboard are installed, and detecting, based on the acquired video information, a first position that is the position of each of the plurality of people in the space; a second detection step of acquiring the video information and detecting, based on the acquired video information, the orientation of the head of a target person among the plurality of people; and a focus determination step of determining, based on the detected first position and the detected orientation of the target person's head, the target person's focus on communication conducted by the plurality of people in the space. In the focus determination step, when a person other than the target person among the plurality of people is speaking, a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display, or the whiteboard is determined as a period in which the target person is focusing on the communication.
  • a program according to one aspect of the present invention is a program for causing a computer to execute the focus determination method.
  • the focus determination system and the like of the present invention can determine how much the target person is focused on communication.
  • FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment.
  • FIG. 2 is a diagram illustrating an example of a space to which the communication analysis system according to the embodiment is applied;
  • FIG. 3 is an external view of the sensing device according to the embodiment.
  • FIG. 4 is a flow chart of the operation of the sound source detection unit included in the communication analysis system according to the embodiment.
  • FIG. 5 is a diagram schematically showing detection results of the position of the sound source.
  • FIG. 6 is a flow chart of the operation of the human detection unit included in the communication analysis system according to the embodiment.
  • FIG. 7 is a diagram schematically showing detection results of a person's position.
  • FIG. 8 is a flowchart of the operation of the speech amount estimation unit included in the communication analysis system according to the embodiment.
  • FIG. 9 is a diagram schematically showing the detection result of the position of the speaker.
  • FIG. 10 is a diagram showing an example of information indicating the amount of speech.
  • FIG. 11 is a flowchart of the operation of the head orientation detection unit included in the communication analysis system according to the embodiment;
  • FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit included in the communication analysis system according to the embodiment;
  • FIG. 13 is a diagram for explaining the correction of the angle of the orientation of the head.
  • FIG. 14 is a flowchart of the operation of the focus determination unit provided in the communication analysis system according to the embodiment.
  • FIG. 15 is a diagram for explaining the target direction of the subject.
  • FIG. 16 is a diagram illustrating an example of information indicating a focus period.
  • FIG. 17 is a flowchart of the operation of the communication analysis unit included in the communication analysis system according to the embodiment.
  • FIG. 18 is a diagram showing an example of a score display screen indicating the quality of communication.
  • Each figure is a schematic diagram and is not necessarily strictly illustrated. Moreover, in each figure, substantially the same configurations are denoted by the same reference numerals, and redundant description may be omitted or simplified.
  • FIG. 1 is a block diagram showing the functional configuration of the communication analysis system according to the embodiment.
  • FIG. 2 is a diagram showing an example of a space to which the communication analysis system is applied.
  • The communication analysis system 10 is used in an office having a space 70 such as a conference room, and analyzes the quality of communication among a plurality of people staying in the space 70.
  • the space 70 is, for example, a closed space, but may be an open space. Examples of the space 70 include a conference room as well as an open rest area in an office space (where chairs and tables are placed in part of the office space). Also, the space 70 does not need to be physically separated, and may be a place separated by illumination light or airflow in the entire space. For example, a warm color area with a color temperature of 3000 K may be provided in a corner of an office space illuminated with daylight color with a color temperature of 5000 K, and this area may be used as the space 70 .
  • the communication analysis system 10 includes a sensing device 20 and an information processing system 30.
  • The sensing device 20 will be described with reference to FIG. 3 in addition to FIGS. 1 and 2.
  • FIG. 3 is an external view of the sensing device 20.
  • FIG. 3A is a top view of the sensing device 20, and FIG. 3B is a side view of the sensing device 20.
  • The sensing device 20 is placed on the desk 40 installed in the space 70, and senses sound and video in the space 70.
  • Specifically, the sensing device 20 is installed at the center of the desk 40.
  • the sensing device 20 includes a microphone array 21 , a plurality of cameras 22 and a ranging sensor 23 .
  • the microphone array 21 acquires sound in the space 70 and outputs sound information (a plurality of sound signals) of the acquired sound.
  • the microphone array 21 specifically includes a plurality of microphone elements, and each of the plurality of microphone elements acquires sound in the space 70 and outputs a sound signal of the acquired sound.
  • Each of the plurality of cameras 22 captures an image (in other words, moving image) of a person staying in the space 70 and outputs image information of the image.
  • the camera 22 is a general camera implemented by a CMOS image sensor or the like, but may be a fisheye camera or the like.
  • The sensing device 20 has four cameras 22 so that the entire surroundings of the sensing device 20 can be photographed from the desk 40; however, it suffices that the sensing device 20 includes at least one camera capable of photographing all the people staying in the space 70.
  • the distance measuring sensor 23 measures the distance from the sensing device 20 (camera 22) to the object, and outputs distance information indicating the measured distance to the object.
  • the object is, for example, a person staying in the space 70 .
  • the ranging sensor 23 is, for example, a TOF (Time Of Flight) type LiDAR (Light Detection and Ranging), but may be a range image sensor or the like.
  • Sensing device 20 may include at least one ranging sensor 23 , but may include a plurality of ranging sensors 23 corresponding to cameras 22 .
  • The information processing system 30 communicates with the sensing device 20 by wire or wirelessly, and analyzes the quality of communication based on sensing information (specifically, sound information, video information, distance information, and the like) acquired from the sensing device 20 through this communication.
  • the information processing system 30 is, for example, an edge computer installed in a facility having a space 70, but may be a cloud computer installed outside the facility.
  • the sensing device 20 and the information processing system 30 may be realized as one integrated device.
  • part of the functions of the information processing system 30 may be implemented as an edge computer, and part of the other functions may be implemented by a cloud computer.
  • The information processing system 30 includes a sound source detection unit 31, a human detection unit 32, a head orientation detection unit 33, a speech amount estimation unit 34, a focus determination unit 35, a communication analysis unit 36, and a storage unit 37.
  • the sound source detection unit 31 acquires sound information of the sound acquired in the space 70 from the sensing device 20, and detects a second position, which is the position of the sound source in the space 70, based on the acquired sound information.
  • the human detection unit 32 acquires from the sensing device 20 image information of an image showing a person staying in the space 70, and detects the first position, which is the position of the person in the space 70, based on the acquired image information.
  • The head orientation detection unit 33 acquires, from the sensing device 20, video information of video showing a person staying in the space 70, and detects the orientation of the person's head (in other words, the orientation of the face) based on the acquired video information.
  • the head direction detection unit 33 may detect the direction of the person's line of sight based on the acquired video information.
  • the speech amount estimation unit 34 estimates the amount of human speech based on the first position detected by the human detection unit 32 and the second position detected by the sound source detection unit 31 .
  • The focus determination unit 35 determines the person's focus on communication conducted by a plurality of people including that person in the space 70, based on the first position detected by the human detection unit 32 and the orientation of the person's head detected by the head orientation detection unit 33.
  • the communication analysis unit 36 analyzes the quality of communication based on at least one of the human speech volume estimated by the speech volume estimation unit 34 and the human focus determined by the focus determination unit 35 .
  • the communication analysis unit 36 also outputs analysis results.
  • Each of the sound source detection unit 31, the person detection unit 32, the head orientation detection unit 33, the speech amount estimation unit 34, the focus determination unit 35, and the communication analysis unit 36 described above is realized by a microcomputer or a processor.
  • The functions of the sound source detection unit 31, the human detection unit 32, the head orientation detection unit 33, the speech amount estimation unit 34, the focus determination unit 35, and the communication analysis unit 36 are implemented, for example, by the microcomputer or processor executing a computer program stored in the storage unit 37.
  • the storage unit 37 is a storage device that stores the computer program and information necessary for realizing the functions of the components.
  • the storage unit 37 is implemented by, for example, an HDD (Hard Disk Drive), but may be implemented by a semiconductor memory.
  • a system including the sound source detection unit 31, the human detection unit 32, and the speech amount estimation unit 34 is also described as a speaker diarization system 38. That is, the speaker diarization system 38 comprises a sound source detection unit 31 , a person detection unit 32 and a speech amount estimation unit 34 . Speaker diarization system 38 may further comprise head orientation detection unit 33 or focus determination unit 35 .
  • a system including the human detection unit 32, the head orientation detection unit 33, and the focus determination unit 35 is also described as a focus determination system 39. That is, the focus determination system 39 includes the human detection unit 32 , the head orientation detection unit 33 , and the focus determination unit 35 .
  • the focus determination system 39 may further include the sound source detection unit 31 or the speech amount estimation unit 34 .
  • FIG. 4 is a flowchart of the operation of the sound source detection unit 31.
  • the sound source detection unit 31 acquires sound information of the sound acquired in the space 70 from the microphone array 21 of the sensing device 20 (S11). Specifically, the sound source detection unit 31 acquires multiple sound signals output by multiple microphone elements included in the microphone array 21 .
  • each of the acquired multiple sound signals is a signal in the time domain.
  • the sound source detection unit 31 transforms each of the plurality of sound signals from a time domain signal into a frequency domain signal by performing a Fourier transform (S12).
  • the sound source detection unit 31 calculates a spatial correlation matrix from the input vector determined based on the multiple sound signals after being transformed into the frequency domain (S13).
  • When the microphone array 21 has M microphone elements and the m-th sound signal after the Fourier transform is X_m(ω, t), the input vector x(ω, t) is expressed by the following equation, where T denotes transpose: x(ω, t) = [X_1(ω, t), X_2(ω, t), …, X_M(ω, t)]^T.
  • The spatial correlation matrix R is expressed by the following formula, where H denotes conjugate transpose and E[·] denotes time averaging: R(ω) = E[x(ω, t) x(ω, t)^H].
  • the frequency index ⁇ will be omitted for the sake of simplicity.
  • Next, the sound source detection unit 31 calculates eigenvectors by eigenvalue decomposition of the spatial correlation matrix (S14). Specifically, the sound source detection unit 31 performs eigenvalue decomposition of the spatial correlation matrix based on the following equation to obtain the eigenvectors e_1, …, e_M and the eigenvalues λ_1, …, λ_M: R = λ_1 e_1 e_1^H + … + λ_M e_M e_M^H.
  • The sound source detection unit 31 then detects the position of the sound source from the eigenvectors (S15). Specifically, the sound source detection unit 31 can identify the loudness of a sound and its direction of arrival from the eigenvectors, and can detect a direction from which a relatively loud sound arrives as the direction (position) of a sound source.
  • the sound source detection unit 31 can detect in which direction (angle) the sound source is positioned with respect to the position O of the sensing device 20 .
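  • Steps S12 to S15 amount to subspace-based localization in the spirit of MUSIC. The following Python sketch illustrates one way to implement them; the single-frequency-bin simplification, the free-field steering vectors, the array geometry, and the MUSIC pseudo-spectrum criterion are all illustrative assumptions, since the text only states that directions of relatively loud arrivals are identified from the eigenvectors.

```python
import numpy as np

def detect_source_angles(frames, mic_pos, fs=16000, n_sources=2,
                         freq_bin=64, n_fft=512, c=343.0):
    """Sketch of S12-S15: frames is (M, T) time-domain signals of the
    M microphone elements; mic_pos is (M, 2) element coordinates [m].
    Returns n_sources candidate arrival angles [rad] around position O."""
    M = frames.shape[0]
    hop, win = n_fft // 2, np.hanning(n_fft)
    # S12: Fourier transform of each element signal (one bin kept)
    spec = np.array([
        np.fft.rfft(np.lib.stride_tricks.sliding_window_view(
            frames[m], n_fft)[::hop] * win, axis=-1)[:, freq_bin]
        for m in range(M)])                      # input vectors x(w, t)
    # S13: spatial correlation matrix R = E[x x^H]
    R = (spec @ spec.conj().T) / spec.shape[1]
    # S14: eigenvalue decomposition R = sum_m lam_m e_m e_m^H
    lam, E = np.linalg.eigh(R)                   # eigenvalues ascending
    En = E[:, : M - n_sources]                   # noise subspace
    # S15: scan directions; spectrum peaks are loud-arrival directions
    w = 2 * np.pi * freq_bin * fs / n_fft
    angles = np.linspace(-np.pi, np.pi, 360, endpoint=False)
    power = [1.0 / np.linalg.norm(
                 En.conj().T @ np.exp(-1j * w / c * (
                     mic_pos @ np.array([np.cos(th), np.sin(th)])))) ** 2
             for th in angles]
    return angles[np.argsort(power)[-n_sources:]]
```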
  • FIG. 5 is a diagram (a diagram of the desk 40 viewed from above) schematically showing the detection result of the position of the sound source.
  • two sound sources S1 and S2 are detected.
  • The sound source detection unit 31 need only detect at least the two-dimensional position (direction) of the sound source as shown in FIG. 5, but may detect a three-dimensional position. In the following description, the position of the sound source detected by the sound source detection unit 31 is also referred to as the second position.
  • the sound source detection unit 31 can track the position of the sound source (second position) by repeating the operation of FIG. 4 every unit time.
  • the sound source is specifically a speaker (person) staying in the space 70 , but may also be a device installed in the space 70 .
  • FIG. 6 is a flowchart of the operation of the human detection unit 32. Note that the following operations are actually performed on the video information acquired from each of the four cameras 22; for convenience, the following description assumes that video information is acquired from one camera 22.
  • the human detection unit 32 acquires image information of the image acquired in the space 70 from the camera 22 of the sensing device 20 (S21).
  • the human detection unit 32 identifies an area in which a person appears in the image based on the acquired image information (S22).
  • the human detection unit 32 can identify an area in which a person appears in the video by a method using pattern matching, a method using a machine learning model, or the like.
  • the human detection unit 32 assigns identification information to the specified area (that is, the area where a person is present) (S23). For example, the human detection unit 32 identifies three areas and assigns identification information of A, B, and C to the identified three areas.
  • a person corresponding to the area A is also described as a person A, a person corresponding to the area B as a person B, and a person corresponding to the area C as a person C.
  • the human detection unit 32 identifies the direction in which the person is based on the position in the image of the area identified in step S23 (S24).
  • The storage unit 37 stores in advance information indicating the installation position of the camera 22 (the center of the desk 40) and the shooting range (angle of view) of the camera 22, so the human detection unit 32 can specify which direction in the space 70 a position within the video corresponds to.
  • The human detection unit 32 estimates the distance from the sensing device 20 (camera 22) to the person based on the size of the area identified in step S23 (S25): the larger the identified area, the closer the person is estimated to be to the sensing device 20 (camera 22).
  • the distance from the sensing device 20 (camera 22 ) to the person may be specified based on the distance information (measured distance value) acquired from the ranging sensor 23 .
  • the human detection unit 32 can detect the position of a person in the space 70, as shown in FIG.
  • FIG. 7 is a diagram (a diagram of the desk 40 viewed from above) schematically showing the detection result of the position of a person.
  • The human detection unit 32 detects the three-dimensional position of a person (the three-dimensional coordinates of the person's position), but it suffices that at least the two-dimensional position (direction) of the person can be detected.
  • the position of the person detected by the human detection unit 32 is also described as the first position.
  • the human detection unit 32 can track the position of a person (first position) by repeating the operation of FIG. 6 every unit time. At this time, the assignment of the identification information in step S23 may be performed only once for the first time.
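  • The direction and distance computations of steps S24 and S25 reduce to simple pinhole-camera geometry. The sketch below illustrates them; the angle-of-view and image-width constants, the detector output format, and the reference body height used for the size-to-distance conversion are assumptions for illustration.

```python
import math

CAM_HFOV = math.radians(90.0)   # horizontal angle of view (assumed)
IMG_W = 1280                    # image width in pixels (assumed)
REF_HEIGHT_M = 0.8              # assumed seated upper-body height [m]
FOCAL_PX = IMG_W / (2 * math.tan(CAM_HFOV / 2))   # pinhole focal length

def person_direction(u_center, cam_yaw):
    """S24: direction of a person from the pixel column of their area."""
    # angular offset of the area centre from the camera's optical axis
    off = math.atan((2 * u_center / IMG_W - 1) * math.tan(CAM_HFOV / 2))
    return cam_yaw + off        # direction in desk (space 70) coordinates

def person_distance(area_height_px):
    """S25: the larger the identified area, the closer the person."""
    return REF_HEIGHT_M * FOCAL_PX / area_height_px
```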
  • FIG. 8 is a flowchart of the operation of the speech amount estimation unit 34.
  • the speech amount estimation unit 34 acquires the first position detected by the human detection unit 32 and the second position detected by the sound source detection unit 31 (S31). The first position and the second position acquired at this time are detected at substantially the same timing. “Substantially the same” means that some deviation may be included.
  • the speech amount estimation unit 34 converts the first position represented by three-dimensional coordinates into two-dimensional coordinates (angle) corresponding to the second position (S32). Note that if the position of the microphone array 21 and the position of the camera 22 are different, the first position after being converted into two-dimensional coordinates is corrected based on the difference between these positions.
  • FIG. 9 is a diagram schematically showing the detection result of the position of the speaker.
  • FIG. 9 is a superimposed view of the second position (FIG. 5) and the first position (FIG. 7).
  • the speech amount estimation unit 34 detects that the sound source S1 is the person A, for example, when the angle difference ⁇ 1 between the second position of the sound source S1 and the first position of the person A is less than or equal to a predetermined value. That is, the person A is detected as the speaker. Further, the speech amount estimation unit 34 detects that the sound source S2 is the person C, for example, when the angle difference ⁇ 2 between the second position of the sound source S2 and the first position of the person C is equal to or less than a predetermined value. That is, person C is detected as the speaker.
  • The speech amount estimation unit 34 tracks each of person A, person B, and person C by repeating the operation of FIG. 8 every unit time, and can thereby estimate the amount of speech of each person. Specifically, the speech amount estimation unit 34 estimates the period during which each of person A, person B, and person C is detected as the speaker as that person's speech time. That is, the speech amount estimation unit 34 can store, in the storage unit 37, information indicating the period during which each of person A, person B, and person C is speaking (information indicating the amount of speech).
  • FIG. 10 is a diagram showing an example of information indicating the amount of speech. As shown in FIG. 10, the information indicating the amount of speech is information in which the amount of speech (speech time) is associated with each piece of identification information assigned in step S23.
  • In this way, the speech amount estimation unit 34 tracks each of the plurality of people based on the first position detected by the human detection unit 32 and the second position detected by the sound source detection unit 31, and can estimate the amount of speech of each of the plurality of people staying in the space 70 (the amount of speech of the target person).
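  • A minimal sketch of this matching and accumulation logic follows; the matching threshold and the unit-time bookkeeping are assumptions, since the text only says the angle difference must be at most a predetermined value.

```python
import math

ANGLE_TOL = math.radians(15.0)   # assumed predetermined value

def angle_diff(a, b):
    """Smallest absolute difference between two angles [rad]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)

def update_speech_amounts(source_angles, person_angles, amounts, dt):
    """S31-S32 plus speaker detection: attribute each sound source
    (second position) to the nearest tracked person (first position,
    converted to an angle) and accumulate that person's speech time.

    source_angles: detected sound-source angles [rad]
    person_angles: {person_id: angle [rad]} of tracked people
    amounts:       {person_id: speech time [s]}, updated in place
    dt:            length of the unit time [s]
    """
    for src in source_angles:
        pid, diff = min(((p, angle_diff(src, a))
                         for p, a in person_angles.items()),
                        key=lambda t: t[1])
        if diff <= ANGLE_TOL:        # source coincides with a person
            amounts[pid] = amounts.get(pid, 0.0) + dt
```

  • For example, with person A tracked at 30° and a source detected at 33°, the 3° difference is within the tolerance, so person A's speech time grows by one unit time, matching the bookkeeping shown in FIG. 10.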
  • Such a method of estimating the amount of speech by the speech amount estimation unit 34 is useful for analyzing the quality of communication in a conference or the like involving movement of seats by a plurality of people.
  • the movement of seats by a plurality of people means moving to use the whiteboard 60, for example.
  • Further, the speech amount estimation unit 34 can estimate the amount of speech of each of the plurality of people while maintaining their anonymity, without performing individual identification by voice recognition or by image recognition.
  • FIG. 11 is a flow chart of the operation of the head orientation detection unit 33.
  • FIG. 12 is a diagram showing a three-dimensional coordinate space for explaining the operation of the head orientation detection unit 33.
  • Here, the camera 22 is located not at the center position O(0, 0, 0) of the desk 40 but at a position C(x0, y0, z0) on the desk 40.
  • FIG. 12 also shows the coordinates in the image indicated by UV.
  • the head orientation detection unit 33 acquires image information of the image acquired in the space 70 from the camera 22 of the sensing device 20 (S41).
  • the head orientation detection unit 33 identifies the orientation of the person's head in the image based on the acquired image information (S42).
  • The head orientation detection unit 33 can detect the orientation of the person's head by, for example, identifying an area corresponding to the person's head in the video by face detection processing and applying a machine learning model to the identified area.
  • As shown in FIG. 12, the vector indicating the orientation of the head at this time forms an angle of A° with the segment (straight line) connecting the position P of the person and the position C of the camera.
  • the head direction detection unit 33 acquires the position of the person (more specifically, the position of the person's head) detected by the person detection unit 32 (S43). As shown in FIG. 12, the position of the person at this time is P(x1, y1, z1).
  • From the horizontal viewing angle θ of the camera 22, the position (u, v) of the person in the image, and the width w of the image, x1 can be computed by a standard pinhole-projection relation.
  • Information such as the size of the desk 40, the horizontal viewing angle ⁇ of the camera 22, and the width w of the image is stored in the storage unit 37 in advance.
  • The head orientation detection unit 33 corrects the orientation of the head (angle A) identified in step S42, which is referenced to the position of the camera 22, using the position of the person acquired in step S43 (specifically, the coordinate value x1) (S44).
  • FIG. 13 is a diagram (a diagram of the desk 40 viewed from above) for explaining such angle correction.
  • the head orientation detection unit 33 can calculate ⁇ OPC in FIG. 13 based on the x1 coordinate acquired in step S43, and can set A+ ⁇ OPC as the corrected angle.
  • the head orientation detection unit 33 can track the orientation of the person's head by repeating the operation of FIG. 11 every unit time.
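  • A minimal sketch of the correction in step S44 follows, using 2D desk-plane coordinates with the desk center O as the origin; the signed-angle convention is an assumption (the text states the corrected angle simply as A + ∠OPC).

```python
import math

def corrected_head_angle(A_deg, P, C):
    """S44: re-reference the head angle from the camera to the desk center.

    A_deg: head orientation from the ML model, measured against the
           person-camera line P-C [deg]
    P, C:  (x, y) desk-plane positions of the person and the camera [m]
    Returns the head orientation measured against the line P-O [deg].
    """
    # signed angle OPC between the rays P->O and P->C
    opc = (math.atan2(C[1] - P[1], C[0] - P[0])
           - math.atan2(-P[1], -P[0]))
    opc = (opc + math.pi) % (2 * math.pi) - math.pi   # wrap to (-pi, pi]
    return A_deg + math.degrees(opc)                  # A + angle OPC
```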
  • FIG. 14 is a flowchart of the operation of the focus determination unit 35.
  • The focus determination unit 35 acquires the first position detected by the human detection unit 32 and the orientation of the person's head detected by the head orientation detection unit 33 (S51). More specifically, the first position acquired at this time is the first position of each of the plurality of people staying in the space 70, and the acquired head orientation is the orientation of the head of each of the plurality of people.
  • FIG. 15 is a diagram (a diagram of the desk 40 viewed from above) for explaining the target direction of the subject.
  • For example, the focus determination unit 35 determines the direction from position P of the subject toward position G where another person is located as the target direction (S52).
  • the target direction does not necessarily have to be determined for a person, and may be determined for the display 50 and the whiteboard 60 as described later.
  • Next, the focus determination unit 35 determines an allowable range around the target direction (S53). For example, in the example of FIG. 15, the range from ∠GPO − α to ∠GPO + α is determined as the allowable range, where α is a coefficient for determining the allowable range. The value of α may differ depending on whether the target direction points at a person, the display 50, or the whiteboard 60.
  • the focus determination unit 35 determines the focus of the subject by comparing the orientation of the subject's head acquired in step S51 with the allowable range determined in step S53 (S54). Focus here means focus on communication performed by a plurality of people including the target person in the space 70 . The focus determination unit 35 determines that the target person is focusing on communication when the direction of the target person's head is within the allowable range, since it is considered that the target person is looking at another person. On the other hand, when the target person's head orientation is outside the allowable range, the focus determination unit 35 determines that the target person is not focusing on communication because it is considered that the target person is not looking at other people.
  • The focus determination unit 35 repeats the operation of FIG. 14 every unit time, and can thereby determine the focus of each of person A, person B, and person C. Specifically, the focus determination unit 35 accumulates the periods in which each of person A, person B, and person C is determined to be focusing, and can store, in the storage unit 37, information indicating the period during which each of the plurality of people is focusing on communication.
  • FIG. 16 is a diagram illustrating an example of information indicating a focus period. As shown in FIG. 16, the information indicating the focus period is information in which each piece of identification information assigned in step S23 is associated with a focus period.
  • In this way, the focus determination unit 35 can determine the subject's focus on communication conducted by a plurality of people in the space 70, based on the first position detected by the human detection unit 32 and the head orientation detected by the head orientation detection unit 33.
  • Note that the focus determination unit 35 may further acquire the second position detected by the sound source detection unit 31 to detect the position of the speaker, or may acquire the position of the speaker detected by the speech amount estimation unit 34. In either case, the focus determination unit 35 can determine focus in consideration of whether or not each of the plurality of people is the speaker. For example, it may determine that the subject is focusing on communication only when the subject's head faces the direction of the speaker, and determine that the subject is not focusing on communication when the subject's head faces the direction of a person who is not speaking.
  • the focus determination unit 35 may determine focus as follows.
  • Even when person A, who is located right beside the target person (assumed here to be person C), is speaking, it is difficult for the target person to turn their head toward someone right beside them; the target person may instead look at the other person B, or may face the display 50 (shown in FIG. 2) or the whiteboard 60 (shown in FIG. 2) on which meeting materials and the like are displayed. Even in such cases, the target person may be determined to be focusing on communication. That is, when a person other than the target person among the plurality of people is speaking, the focus determination unit 35 may determine a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display 50, or the whiteboard 60 as a period in which the target person is focusing on the communication. Note that the period during which the target person faces the whiteboard 60 may be determined to be a focus period under the condition that person A is speaking near the whiteboard 60.
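  • The determination rule of steps S52 to S54 can be sketched as follows; the tolerance coefficients per target type and the 2D desk-plane geometry are assumptions (the text only states that α differs by target type).

```python
import math

# assumed tolerance coefficients alpha per target type [deg]
ALPHA = {"person": 20.0, "display": 15.0, "whiteboard": 15.0}

def angle_to(P, G):
    """Direction of a target at G as seen from the subject at P [deg]."""
    return math.degrees(math.atan2(G[1] - P[1], G[0] - P[0]))

def is_focusing(head_deg, subject_pos, targets):
    """S52-S54: the subject is focusing when their corrected head
    orientation lies within the allowable range around the target
    direction of any valid target.

    head_deg:    corrected head orientation of the subject [deg]
    subject_pos: (x, y) of the subject on the desk plane
    targets:     list of (kind, (x, y)) covering the people other than
                 the subject (while someone else is speaking), the
                 display 50, and the whiteboard 60
    """
    for kind, pos in targets:
        diff = (head_deg - angle_to(subject_pos, pos) + 180.0) % 360.0 - 180.0
        if abs(diff) <= ALPHA[kind]:      # within angle GPO +/- alpha
            return True
    return False
```

  • Accumulating the unit times for which is_focusing returns True yields the per-person focus periods recorded as in FIG. 16.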
  • FIG. 17 is a flowchart of the operation of the communication analysis unit 36.
  • the communication analysis unit 36 reads out the information indicating the amount of speech (FIG. 10) and the information indicating the focus period (FIG. 16) stored in the storage unit 37 (S61).
  • The communication analysis unit 36 analyzes the quality of communication in the space 70 based on the read information indicating the amount of speech and the read information indicating the focus period. For example, the communication analysis unit 36 scores the quality of communication according to the following criteria.
  • The communication analysis unit 36, for example, calculates the ratio of the speech amounts of persons A to C based on the read information indicating the amount of speech, and sets the first score, which relates to the amount of speech, higher the closer the minimum/maximum of this ratio is to 1 (that is, the more evenly persons A to C speak). Specifically, when the ratio of the speech amounts of persons A to C is 1:1.2:1.5, the communication analysis unit 36 calculates the first score based on the difference between 1 (minimum) / 1.5 (maximum) ≈ 0.67 and 1.
  • the communication analysis unit 36 for example, based on the acquired information indicating the focus period, sets the second score related to focus to a larger value as the average value of the focus periods of persons A to C increases.
  • the communication analysis unit 36 calculates the sum of the first score and the second score as the final score indicating the quality of communication in the space 70.
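  • A minimal sketch of this scoring follows; the weights and the linear mappings are assumptions, since the text only states that the first score grows with speech evenness and the second with the average focus period.

```python
def communication_score(speech_amounts, focus_periods, total_time,
                        w1=50.0, w2=50.0):
    """Score the quality of communication in space 70.

    speech_amounts: {person_id: speech time [s]} as in FIG. 10
    focus_periods:  {person_id: focused time [s]} as in FIG. 16
    total_time:     analyzed meeting length [s]
    Returns (first_score, second_score, final_score).
    """
    # first score: evenness of speech; min/max ratio of 1 is perfectly even
    amounts = list(speech_amounts.values())
    evenness = min(amounts) / max(amounts) if max(amounts) > 0 else 0.0
    first = w1 * evenness

    # second score: average fraction of the meeting spent focusing
    avg_focus = sum(focus_periods.values()) / (len(focus_periods) * total_time)
    second = w2 * avg_focus

    return first, second, first + second   # final score is the sum

# e.g. speech ratio 1 : 1.2 : 1.5 gives evenness 1/1.5 (about 0.67)
print(communication_score({"A": 100, "B": 120, "C": 150},
                          {"A": 200, "B": 180, "C": 220}, 300.0))
```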
  • Each of persons A to C can check the final score indicating the quality of communication (that is, the analysis result of the quality of communication) by, for example, accessing the communication analysis system 10 (information processing system 30) using an information terminal such as a smartphone or a personal computer.
  • FIG. 18 is a diagram showing an example of a score display screen indicating the quality of communication displayed on the information terminal.
  • the communication analysis unit 36 can analyze the quality of communication performed by a plurality of people in the space 70 based on the speech volume estimation result and the focus determination result.
  • the communication analysis unit 36 may analyze the quality of communication performed by a plurality of people in the space 70 based on at least one of the speech volume estimation result and the focus determination result.
  • the score calculation criteria as described above are merely an example, and the score may be appropriately determined according to the content required for communication in the space 70 .
  • the microphone array 21, the camera 22, and the ranging sensor 23 are installed on the desk 40.
  • the microphone array 21, camera 22 and ranging sensor 23 may be installed on the ceiling.
  • The microphone array 21, the camera 22, and the ranging sensor 23 need not be concentrated in one place, and may be installed in different places.
  • the sound source detection unit 31 and the human detection unit 32 are used together to detect the speaker, but the speaker can also be detected with the human detection unit 32 alone.
  • the human detection unit 32 can detect whether or not the person is speaking from the movement of the person in the video.
  • The communication analysis unit 36 may calculate the final score indicating the quality of communication (that is, the analysis result of the quality of communication) in real time while a meeting or the like is being held in the space 70, and the environment in the space 70 may be controlled in real time based on the calculated final score.
  • For example, when the communication analysis unit 36 determines that the calculated final score is less than a predetermined value (that is, that the communication is not active), it controls the environment of the space 70 by transmitting a control signal to an environment control device (not shown) installed in the space 70.
  • Examples of the environment control device include an air conditioner that controls the thermal environment of the space 70, a lighting device that controls the light environment of the space 70, a fragrance generator that controls the scent of the space 70, and a music playback device that controls the acoustic environment of the space 70.
  • the communication analysis unit 36 can activate communication in the space 70 by raising the set temperature of the air conditioner or making the lighting equipment brighter than it is now.
  • The communication analysis unit 36 may also activate communication in the space 70 by operating the fragrance generator installed in the space 70 or by causing the music playback device installed in the space 70 to play music.
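  • This feedback loop can be sketched as follows; the threshold value and the device interfaces are hypothetical stand-ins, as the text names the device types but not their APIs.

```python
def control_environment(final_score, devices, threshold=60.0):
    """If communication is judged inactive (final score below an assumed
    predetermined value), send control signals to the environment control
    devices installed in space 70. The device objects and their methods
    are hypothetical, since the actual interfaces are unspecified."""
    if final_score >= threshold:
        return                                   # communication is active
    devices["air_conditioner"].raise_set_temperature(1.0)   # degrees C
    devices["lighting"].increase_brightness(0.2)            # fraction
    devices["fragrance"].activate()
    devices["music"].play()
```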
  • As described above, the focus determination system 39 includes: the human detection unit 32, which acquires video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detects, based on the acquired video information, the first position that is the position of each of the plurality of people in the space 70; the head orientation detection unit 33, which acquires the video information and detects, based on the acquired video information, the orientation of the head of the target person among the plurality of people; and the focus determination unit 35, which determines, based on the detected first position and the detected orientation of the target person's head, the target person's focus on communication conducted by the plurality of people in the space 70. When a person other than the target person among the plurality of people is speaking, the focus determination unit 35 determines a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display 50, or the whiteboard 60 as a period in which the target person is focusing on the communication.
  • Such a focus determination system 39 can determine how much the target person is focused on communication.
  • a desk 40 is installed in the space 70 , and the human detection unit 32 acquires video information from the camera 22 installed on the desk 40 .
  • Such a focus determination system 39 can acquire video information from the camera 22 installed on the desk 40 .
  • a plurality of cameras 22 are installed in the space 70 , and the image information acquired by the human detection unit 32 includes image information of images captured by each of the plurality of cameras 22 .
  • Such a focus determination system 39 can acquire video information from multiple cameras 22 installed on the desk 40 .
  • In the space 70, a ranging sensor 23 that measures the distance from the camera 22 to an object is installed.
  • the human detection unit 32 detects the first position based on the acquired image information and the detection result of the distance measuring sensor 23 .
  • Such a focus determination system 39 can detect the first position based on the detection result of the distance measuring sensor 23 .
  • the human detection unit 32 estimates the distance from the human detection unit 32 to the target person based on the size of the target person in the image in order to detect the first position.
  • Such a focus determination system 39 can estimate the distance from the human detection unit 32 to the target person based on the size of the target person in the image.
  • the communication analysis system 10 also includes a focus determination system 39 and a communication analysis unit 36 that analyzes the quality of communication conducted by a plurality of people in the space 70 based on the determined focus of the subject.
  • Such a communication analysis system 10 can analyze the quality of communication based on the focus of the target person.
  • The communication analysis system 10 further includes: the sound source detection unit 31, which acquires sound information of the sound acquired in the space 70 and detects, based on the acquired sound information, a second position that is the position of a sound source in the space 70; and the speech amount estimation unit 34, which tracks the target person based on the detected first position and the detected second position and estimates the target person's amount of speech during tracking.
  • the communication analysis unit 36 analyzes the quality of communication based on the determined focus of the subject and the estimated speech volume of the subject.
  • Such a communication analysis system 10 can analyze the quality of communication based on the subject's focus and the subject's utterance volume.
  • The focus determination method executed by a computer such as the focus determination unit 35 includes: a first detection step of acquiring video information of video showing a plurality of people staying in the space 70 in which the display 50 and the whiteboard 60 are installed, and detecting, based on the acquired video information, a first position that is the position of each of the plurality of people in the space 70; a second detection step of acquiring the video information and detecting, based on the acquired video information, the orientation of the head of the target person among the plurality of people; and a focus determination step of determining, based on the detected first position and the detected orientation of the target person's head, the target person's focus on communication conducted by the plurality of people in the space 70. In the focus determination step, when a person other than the target person among the plurality of people is speaking, a period in which the detected orientation of the target person's head faces any of the people other than the target person, the display 50, or the whiteboard 60 is determined as a period in which the target person is focusing on the communication.
  • Such a focus determination method can determine how much the target person is focused on communication.
  • the communication analysis system was implemented by multiple devices, but it may be implemented as a single device.
  • For example, the communication analysis system may be implemented as a single device corresponding to the information processing system, the speaker diarization system, or the focus determination system.
  • the functional components included in the communication analysis system may be distributed to the multiple devices in any way.
  • the communication method between devices in the above embodiment is not particularly limited. Further, a relay device (not shown) may intervene in communication between devices.
  • processing executed by a specific processing unit may be executed by another processing unit.
  • order of multiple processes may be changed, and multiple processes may be executed in parallel.
  • each component may be realized by executing a software program suitable for each component.
  • Each component may be realized by reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory by a program execution unit such as a CPU or processor.
  • each component may be realized by hardware.
  • each component may be a circuit (or integrated circuit). These circuits may form one circuit as a whole, or may be separate circuits. These circuits may be general-purpose circuits or dedicated circuits.
  • The present invention may be implemented as a speech amount estimation method executed by a computer such as the speaker diarization system, or as a program for causing a computer to execute such a speech amount estimation method.
  • It may also be realized as a computer-readable non-transitory recording medium on which such a program is recorded.
  • Similarly, the present invention may be implemented as a focus determination method executed by a computer such as the focus determination system, as a program for causing a computer to execute such a focus determination method, or as a computer-readable non-transitory recording medium on which such a program is recorded.
  • 10 communication analysis system; 20 sensing device; 21 microphone array; 22 camera; 23 ranging sensor (sensor); 30 information processing system; 31 sound source detection unit; 32 human detection unit; 33 head orientation detection unit; 34 speech amount estimation unit; 35 focus determination unit; 36 communication analysis unit; 37 storage unit; 38 speaker diarization system; 39 focus determination system; 40 desk; 50 display; 60 whiteboard; 70 space


Abstract

The invention relates to a focus determination system (39) comprising: a person detection unit (32) that detects first positions, which are the positions of a plurality of people in a space (70), based on video information; a head orientation detection unit (33) that detects the orientation of the head of a subject among the plurality of people, based on the video information; and a focus determination unit (35) that determines, based on the detected first positions and the detected orientation of the subject's head, that, when a person among the plurality of people other than the subject is speaking, a period during which the detected orientation of the subject's head faces any of the plurality of people other than the subject, a display (50), or a whiteboard (60) is a period during which the subject is focusing on communication.
PCT/JP2022/024136 2021-06-28 2022-06-16 Focus determination system, communication analysis system, and focus determination method WO2023276700A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023531788A JPWO2023276700A1 (fr) 2021-06-28 2022-06-16

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021106485 2021-06-28
JP2021-106485 2021-06-28

Publications (1)

Publication Number Publication Date
WO2023276700A1 (fr) 2023-01-05

Family

ID=84692337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/024136 WO2023276700A1 (fr) 2021-06-28 2022-06-16 Focus determination system, communication analysis system, and focus determination method

Country Status (2)

Country Link
JP (1) JPWO2023276700A1 (fr)
WO (1) WO2023276700A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018036868A (ja) * 2016-08-31 2018-03-08 株式会社リコー Conference support system, conference support apparatus, and conference support method
JP2019057061A (ja) * 2017-09-20 2019-04-11 富士ゼロックス株式会社 Information output device and program
JP2019200475A (ja) * 2018-05-14 2019-11-21 富士通株式会社 Activity evaluation program, apparatus, and method
WO2021090702A1 (fr) * 2019-11-07 2021-05-14 ソニー株式会社 Information processing device, information processing method, and program


Also Published As

Publication number Publication date
JPWO2023276700A1 (fr) 2023-01-05

Similar Documents

Publication Publication Date Title
EP3400705B1 Active speaker location detection
US7039199B2 (en) System and process for locating a speaker using 360 degree sound source localization
JP6519370B2 User attention determination system, method, and program
CN108089152B Device control method, apparatus, and system
JP5004276B2 Sound source direction determination apparatus and method
US10582117B1 (en) Automatic camera control in a video conference system
US10241990B2 (en) Gesture based annotations
US10645520B1 (en) Audio system for artificial reality environment
WO2010109700A1 Three-dimensional object determination device, three-dimensional object determination method, and three-dimensional object determination program
JP4595364B2 Information processing apparatus and method, program, and recording medium
JP2018520595A Methods, circuits, devices, systems, and associated computer-executable code for multi-factor image feature registration and tracking
CN111551921A Sound source orientation system and method with audio-visual linkage
WO2009119288A1 Communication system and communication program
WO2023276700A1 Focus determination system, communication analysis system, and focus determination method
WO2023276701A1 Speaker diarization system, communication analysis system, and utterance amount estimation method
WO2021033592A1 Information processing apparatus, information processing method, and program
RU174044U1 Audiovisual multichannel voice presence detector
KR101976937B1 Automatic meeting-minutes creation apparatus using a microphone array
CN111273232B Indoor abnormal condition determination method and system
Li et al. Multiple active speaker localization based on audio-visual fusion in two stages
WO2021090702A1 Information processing device, information processing method, and program
WO2021065694A1 Information processing method and system
CN110730378A Information processing method and system
US9883142B1 (en) Automated collaboration system
JPWO2023276700A5 (fr)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22832850

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023531788

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE