JP2019095552A - Voice analysis system, voice analysis device, and voice analysis program

Info

Publication number: JP2019095552A
Application number: JP2017223705A
Authority: JP
Inventors: 充洋北村; Mitsuhiro Kitamura
Original assignee: Kyocera Document Solutions Inc
Current assignee: Kyocera Document Solutions Inc
Priority date: 2017-11-21
Filing date: 2017-11-21
Publication date: 2019-06-20

Abstract

To enable an effective response to signs of bullying. SOLUTION: A sound collection unit 1 collects voice in a school classroom and outputs a voice signal corresponding to the voice. A voice input unit 51 receives the voice signal from the sound collection unit 1. A voice analysis unit 52 detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying, and identifies at least one of a victim and a related person in the event on the basis of at least one of a voiceprint of the voice indicating the occurrence of the predetermined event, the location where the predetermined event occurred, and a name detected from the voice indicating the occurrence of the predetermined event. SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice analysis system, a voice analysis device, and a voice analysis program.

Bullying in groups such as schools is regarded as a problem, and there is a need to detect signs of bullying early and take countermeasures.

One voice data analysis device enables early detection of bullying by recording audio in a school during specific time periods and detecting slanderous keywords in the recorded voice data (see, for example, Patent Document 1).

Patent Document 1: JP 2008-165097 A

However, although the above voice data analysis device can automatically detect acts of bullying from voice, it is difficult to identify the victim and the related persons (perpetrators, followers, and the like), so an effective response toward the victim and the related persons may not be possible.

The present invention has been made in view of the above problem, and its object is to obtain a voice analysis system, a voice analysis device, and a voice analysis program that enable an effective response to signs of bullying.

A voice analysis system according to the present invention includes a sound collection unit that collects voice in a school classroom and outputs a voice signal corresponding to the voice, and a voice analysis device that receives the voice signal from the sound collection unit. The voice analysis device detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying, and identifies at least one of a victim and a related person in the event based on at least one of a voiceprint of the voice indicating the occurrence of the predetermined event, the location where the predetermined event occurred, and a name detected from the voice indicating the occurrence of the predetermined event.

A voice analysis device according to the present invention includes a voice input unit that receives a voice signal from a sound collection unit that collects voice in a school classroom and outputs the voice signal corresponding to the voice, and a voice analysis unit that detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying and identifies at least one of a victim and a related person in the event based on at least one of a voiceprint of the voice indicating the occurrence of the predetermined event, the location where the predetermined event occurred, and a name detected from the voice indicating the occurrence of the predetermined event.

A voice analysis program according to the present invention causes a computer to function as a voice input unit that receives a voice signal from a sound collection unit that collects voice in a school classroom and outputs the voice signal corresponding to the voice, and as a voice analysis unit that detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying and identifies at least one of a victim and a related person in the event based on at least one of a voiceprint of the voice indicating the occurrence of the predetermined event, the location where the predetermined event occurred, and a name detected from the voice indicating the occurrence of the predetermined event.

According to the present invention, a voice analysis system, a voice analysis device, and a voice analysis program that enable an effective response to signs of bullying are obtained.

The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description taken together with the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a voice analysis system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the student data 31 in FIG. 1.
FIG. 3 is a diagram showing an example of the event occurrence history data 41 in FIG. 1.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a voice analysis system according to an embodiment of the present invention. The voice analysis system shown in FIG. 1 includes a sound collection unit 1 and a voice analysis device 2. The sound collection unit 1 collects voice in a school classroom and outputs a voice signal corresponding to the voice. For example, the sound collection unit 1 is a microphone or the like installed in a school classroom. The sound collection unit 1 includes, as necessary, an amplifier, electronic circuits such as a communication circuit, a power supply circuit, and the like. The voice analysis device 2 is a terminal device or electronic device installed in the school, an external server, or the like. The connection between the sound collection unit 1 and the voice analysis device 2 may be wired or wireless.

The voice analysis device 2 receives the voice signal from the sound collection unit 1 and performs predetermined voice analysis processing on the voice signal. Specifically, the voice analysis device 2 detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying, and identifies at least one of a victim and a related person in the event based on at least one of the voiceprint of the voice indicating the occurrence of the predetermined event and the location where the predetermined event occurred.

As shown in FIG. 1, the voice analysis device 2 includes a user interface 11, a communication interface 12, a storage device 13, and an arithmetic processing device 14.

The user interface 11 includes a display device (such as a display panel) that displays various information to the user and an input device (such as a keyboard) that accepts user operations. The communication interface 12 is a peripheral device interface, a network interface, or the like for communicating with external devices.

The storage device 13 is a non-volatile storage device (hard disk drive, flash memory, or the like) that stores various programs and data. Here, the storage device 13 stores a voice analysis program 21. The storage device 13 also stores student data 31, a bullying sign event list 32, and a keyword list 33. In addition, the storage device 13 stores event occurrence history data 41.

The arithmetic processing device 14 is a computer that includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. It operates as various processing units by loading programs stored in the ROM or the storage device 13 into the RAM and executing them on the CPU.

Here, by executing the voice analysis program 21, the arithmetic processing device 14 operates as a voice input unit 51, a voice analysis unit 52, and an education support unit 53.

The voice input unit 51 receives the voice signal from the sound collection unit 1. When the sound collection unit 1 outputs an analog voice signal, an analog-to-digital converter converts the analog voice signal into a digital voice signal, and the digital voice signal is input to the voice input unit 51.

The voice analysis unit 52 detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying, and identifies at least one of a victim and a related person in the event based on at least one of the voiceprint of the voice indicating the occurrence of the predetermined event and the location where the predetermined event occurred.

For example, the occurrence of a predetermined event is determined to have been detected when any one of the conditions described in the bullying sign event list 32 (explained below) is satisfied. Whether the speaker of the voice is a victim or a related person is determined from words contained in the utterance (a student's name, specific keywords, and the like), in the same way as the event type is determined, as described below.

For example, the voice analysis unit 52 (a) identifies, from the voice signal, the voice indicating the occurrence of the predetermined event, and (b) based on the student data 31, which associates each of the students in the classroom with that student's voiceprint data, identifies the student whose voiceprint data matches the voiceprint of the identified voice as the victim or a related person.
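The voiceprint lookup described above can be sketched as follows. This is only an illustration under assumptions: the source does not specify the voiceprint representation or the matching rule, so the feature vectors, the cosine-similarity measure, the `threshold` value, and all names here are hypothetical.

```python
import math

# Hypothetical enrolled voiceprints: student ID -> feature vector.
STUDENT_VOICEPRINTS = {
    "S001": [0.9, 0.1, 0.3],
    "S002": [0.2, 0.8, 0.5],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_voiceprint(features, enrolled, threshold=0.8):
    """Return the best-matching student ID, or None if no score reaches the threshold."""
    best_id, best_score = None, threshold
    for student_id, reference in enrolled.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_id, best_score = student_id, score
    return best_id
```

Returning `None` when nothing reaches the threshold keeps an unknown voice from being attributed to the nearest enrolled student.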

For example, the student data 31 further associates each of the students in the classroom with the position of the seat assigned to that student. Based on this correspondence between seat positions and students, the voice analysis unit 52 generates voiceprint data for a student from a voice signal that contains the student's speech while seated (for example, during a morning assembly or a class in a predetermined time period), associates the generated voiceprint data with the student, and adds it to the student data 31.

Alternatively, for example, the sound collection unit 1 outputs a plurality of voice signals corresponding to voice collected at a plurality of positions, and the voice input unit 51 receives the plurality of voice signals. The voice analysis unit 52 then (a) identifies, from the plurality of voice signals, the position at which the voice indicating the occurrence of the predetermined event was produced, and (b) based on the student data 31, which associates each of the students in the classroom with the position of the seat assigned to that student, identifies the student whose seat position matches the identified position as the victim or a related person. For example, the position of the voice source is identified from the coordinates of the sound collection positions, the timing at which the voice is collected at each position, and the like. As another example, the sound collection unit 1 collects sound at a plurality of positions using a plurality of directional microphones, and the position of the voice source is identified from the coordinates of the sound collection positions, the volume collected by each directional microphone, and the like.
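As a rough illustration of the volume-based variant, the sketch below estimates the source position as a volume-weighted centroid of the microphone coordinates and then picks the nearest seat. The centroid rule is a simplifying assumption made here, not the method stated in the source, and the function names and coordinate convention are hypothetical.

```python
import math


def estimate_source_position(mic_positions, volumes):
    """Volume-weighted centroid of the microphone coordinates (assumed heuristic)."""
    total = sum(volumes)
    if total <= 0:
        return None  # silence: no position can be estimated
    x = sum(p[0] * v for p, v in zip(mic_positions, volumes)) / total
    y = sum(p[1] * v for p, v in zip(mic_positions, volumes)) / total
    return (x, y)


def nearest_seat(position, seats):
    """Return the student ID whose seat coordinates are closest to the position."""
    return min(seats, key=lambda student_id: math.dist(position, seats[student_id]))
```

A timing-based variant would replace `estimate_source_position` with a time-difference-of-arrival calculation; the seat lookup would stay the same.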

FIG. 2 is a diagram showing an example of the student data 31 in FIG. 1.

For example, as shown in FIG. 2, the student data 31 contains, for each student, the student's student ID, the student's voiceprint data, seat position data indicating the student's seat position in the classroom, the student's name, and the like. The voiceprint data is, for example, feature point data usable for voiceprint authentication.
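One possible in-memory shape for such a record is sketched below. The field names, the (row, column) seat encoding, and the use of a float list for the voiceprint are assumptions for illustration; FIG. 2 itself is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StudentRecord:
    student_id: str
    name: str
    seat_position: Tuple[int, int]            # assumed encoding: (row, column)
    voiceprint: Optional[List[float]] = None  # None until a voiceprint is enrolled


def find_by_seat(records, seat_position):
    """Return the record of the student assigned to the given seat, or None."""
    for record in records:
        if record.seat_position == seat_position:
            return record
    return None
```

Keeping `voiceprint` optional reflects the described flow in which voiceprint data is generated later from speech captured while the student is seated.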

In addition, the voice analysis unit 52 (a) identifies keywords contained in the voice by performing speech recognition processing on the voice signal, (b) identifies the emotion of the speaker of the voice by performing emotion recognition processing on the voice signal, and identifies the type of the predetermined event based on the keywords and the emotion. In other words, the occurrence of the predetermined event is detected, and its event type is identified, taking into account the detected emotions of the victim and of the related persons.

The types of predetermined events are described in advance in the bullying sign event list 32. The bullying sign event list 32 associates a plurality of predetermined event types with the determination conditions corresponding to each type.
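A list that pairs event types with determination conditions could be modeled as a table of predicates, as in the sketch below. The English type labels, the shape of the observation dictionary, and the emotion labels are hypothetical; the two refusal keywords are the examples the description gives for the coercion-type event.

```python
# Example refusal keywords ("no" / "stop it") taken from the description.
REFUSAL_KEYWORDS = {"いやだ", "やめて"}

# Hypothetical rule table: event type -> predicate over one observation.
# An observation is assumed to carry recognized keywords and an emotion label.
BULLYING_SIGN_EVENTS = {
    "ridicule": lambda obs: obs["emotion"] == "mocking",
    "coercion": lambda obs: bool(REFUSAL_KEYWORDS & obs["keywords"])
    or obs["emotion"] == "crying",
}


def classify_event(obs):
    """Return the first event type whose condition holds, or None."""
    for event_type, condition in BULLYING_SIGN_EVENTS.items():
        if condition(obs):
            return event_type
    return None
```

Holding the conditions in data rather than in branching code makes it straightforward to add or adjust event types without touching the detection loop.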

For example, the event types that are set include "ridicule", "obstruction", "imperative communication", "slander", "coercion", and "fight".

For "ridicule", for example, the condition is that, during class or the like, mocking laughter from other students (related persons) occurs after a specific student (victim) speaks or after the teacher calls the specific student's name. The specific student's utterance is identified by voiceprint analysis, and the specific student's name is extracted from the voice by speech recognition processing. The occurrence of mocking laughter is identified by, for example, emotion recognition processing. Whether a class is in progress is detected, for example, from the voice signal continuously containing no voice, only the teacher's voice, or only one student's voice, or is determined for each point in time from a predetermined class schedule (class start and end times).

For "obstruction", for example, the condition is that, during class or the like, a specific student's (victim's) utterance is cut off by an utterance from another student (related person).

For "imperative communication", for example, the condition is that, in the voice (that is, in the text data obtained from the voice signal by speech recognition processing), a specific sentence ending such as "しろ" ("do it"), "してこい" ("go and do it"), or "するな" ("don't do it") is detected in an utterance by another student (related person) directed at a specific student (victim).
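The sentence-ending check can be sketched on recognized text as below. The three endings are the examples given in the description; stripping trailing punctuation first is an assumption about what the speech recognizer outputs, and the function name is hypothetical.

```python
# Example imperative endings from the description.
IMPERATIVE_ENDINGS = ("しろ", "してこい", "するな")


def has_imperative_ending(utterance: str) -> bool:
    """True if the utterance, ignoring trailing punctuation, ends with a listed ending."""
    text = utterance.rstrip("。！？!?. 　")
    return text.endswith(IMPERATIVE_ENDINGS)
```

A production system would likely rely on morphological analysis rather than raw suffix matching, since plain suffixes can also match non-imperative words.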

For "slander", for example, the condition is that, in the voice (that is, in the text data obtained from the voice signal by speech recognition processing), a specific keyword listed in the keyword list 33 is detected in an utterance by another student (related person) directed at a specific student (victim).

For "coercion", for example, the condition is that, in the voice (that is, in the text data obtained from the voice signal by speech recognition processing), a specific keyword such as "いやだ" ("no") or "やめて" ("stop it") is detected in an utterance by a specific student (victim) directed at another student (related person). Here, instead of the specific student's utterance, the condition may be that a specific emotion (such as crying) is detected by emotion recognition processing.

For "fight", for example, the condition is that specific sounds, such as loud noises or sounds of a struggle, are detected continuously for a predetermined time. In this case, the victim and the related persons are identified from the position at which the specific sounds were produced.
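The "continuously for a predetermined time" condition can be illustrated with a run-length check over per-frame detection flags. How each frame is classified as a specific sound is outside this sketch, and the frame length and parameter names are assumptions.

```python
def sustained_detection(frame_flags, frame_seconds, min_seconds):
    """True if flagged frames occur consecutively for at least min_seconds."""
    run = 0
    for flagged in frame_flags:
        run = run + 1 if flagged else 0  # any unflagged frame resets the run
        if run * frame_seconds >= min_seconds:
            return True
    return False
```

For example, with 0.5-second frames and a 1.5-second requirement, three consecutive flagged frames satisfy the condition, while alternating flagged and unflagged frames do not.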

The education support unit 53 (a) records, as a history entry in the event occurrence history data 41, at least (a1) the identified type of the predetermined event, (a2) at least one of the identified victim and related persons, and (a3) a voice data file containing the voice signal, in association with one another, and (b) outputs the history as a report.

The voice analysis unit 52 may identify groups formed by students in the classroom based on the voice signal, and the education support unit 53 may (a) identify the group that includes a related person and (b) indicate the identified group in the report.

For example, the report is output at predetermined intervals, or is output as needed when an upper limit described below is reached at the time a history entry is added to the event occurrence history data 41.

For example, when the number of history entries in which a specific student is the victim is higher than the average number of history entries per student, a report to that effect is sent to a predetermined person in charge (such as a teacher or counselor).

As another example, when the number of history entries in which a specific student is the victim is continuously increasing, a report to that effect is sent to the predetermined person in charge.

As another example, when the number of history entries in which a specific student is the victim rises or falls sharply within a predetermined short period (that is, when the increase or decrease in the number within the predetermined short period exceeds a predetermined threshold), a report to that effect is sent to the predetermined person in charge.
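The three report conditions above can be sketched as simple checks on per-student counts. The data shapes are assumptions: `victim_counts` maps student IDs to total history entries, and `period_counts` is one student's count per consecutive period. Comparing only the last two periods in `sudden_change` is one possible reading of "within a predetermined short period".

```python
from statistics import mean


def above_average(victim_counts, student_id):
    """True if the student's entry count exceeds the per-student average."""
    return victim_counts.get(student_id, 0) > mean(victim_counts.values())


def continuously_increasing(period_counts):
    """True if the count rises in every consecutive period."""
    return len(period_counts) >= 2 and all(
        later > earlier for earlier, later in zip(period_counts, period_counts[1:])
    )


def sudden_change(period_counts, threshold):
    """True if the last period differs from the previous one by more than threshold."""
    if len(period_counts) < 2:
        return False
    return abs(period_counts[-1] - period_counts[-2]) > threshold
```

Note that `above_average` assumes `victim_counts` is non-empty, because `statistics.mean` raises an error on empty input.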

For example, the education support unit 53 uses the communication interface 12 to send the report, as an e-mail or the like, to the pre-registered address of the person in charge.

At that time, additional information, such as advice on responding to the signs of bullying and the risk of future bullying, may be sent together with the report.

The education support unit 53 also has a function of accepting student information through the user interface 11 and adding it to the student data 31.

FIG. 3 is a diagram showing an example of the event occurrence history data 41 in FIG. 1.

For example, as shown in FIG. 3, a history entry is added to the event occurrence history data 41 each time an event occurs. For each event, a record indicating the date and time of occurrence, occurrence location data, the victim's student ID, the related persons' student IDs, the event type, the evidence voice data (the path to the voice data file), and the like is added to the event occurrence history data 41. The evidence voice data is generated from the voice signal in a predetermined format (here, WAV format).
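A record with the fields listed above might look like the following sketch. The field names, the list-based history store, and the sample values used when calling it are hypothetical; FIG. 3 itself is not reproduced in this text.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import List


@dataclass
class EventRecord:
    occurred_at: datetime
    location: str
    victim_ids: List[str]
    related_ids: List[str]
    event_type: str
    evidence_path: str  # path to the WAV file holding the evidence audio


def add_record(history, record):
    """Append the record to the history as a plain dict and return the new length."""
    history.append(asdict(record))
    return len(history)
```

Storing only the path to the audio file, as the description does, keeps the history table small while still linking each entry to its evidence.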

Next, the operation of the voice analysis system will be described.

In accordance with a predetermined schedule, the voice input unit 51 of the voice analysis device 2 starts acquiring the voice signal from the sound collection unit 1 at a predetermined start time (for example, the time students arrive at school or the time the classroom is unlocked), and stops acquiring the voice signal from the sound collection unit 1 at a predetermined end time (for example, the time students leave school or the time the classroom is locked).
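The schedule-driven start and stop can be reduced to a time-window test, sketched below. The 08:00 and 16:00 defaults are placeholder values chosen here, not times given in the source, and the function name is hypothetical.

```python
from datetime import time


def should_record(now, start=time(8, 0), end=time(16, 0)):
    """True while the current time of day lies in the window [start, end)."""
    return start <= now < end
```

A caller would poll this with the current time of day and open or close the audio stream when the result changes.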

During the period from the predetermined start time to the predetermined end time, the voice analysis unit 52 monitors whether voice (an utterance or the like) indicating the occurrence of a predetermined event is detected in the voice signal acquired by the voice input unit 51.

When voice (an utterance or the like) indicating the occurrence of a predetermined event is detected, the voice analysis unit 52, using the student data 31, identifies for each utterance in that voice the student (student ID) who made the utterance, based on the voiceprint or the position of the source, and classifies that student as the victim or a related person according to the content of the utterance. The education support unit 53 then adds the above-described record for the detected event to the event occurrence history data 41.

In addition, the education support unit 53 analyzes the event occurrence history data 41 periodically or whenever such a record is added, and when a specific student satisfying one of the conditions described above is detected, sends a report about that student to the predetermined person in charge.

As described above, according to the embodiment, the sound collection unit 1 collects voice in a school classroom and outputs a voice signal corresponding to the voice. The voice input unit 51 receives the voice signal from the sound collection unit 1. The voice analysis unit 52 detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying, and identifies at least one of a victim and a related person in the event based on at least one of the voiceprint of the voice indicating the occurrence of the predetermined event and the location where the predetermined event occurred.

As a result, the parties involved in the bullying (the victim and the related persons) are identified, which makes an effective response to signs of bullying possible.

Various changes and modifications to the embodiment described above will be apparent to those skilled in the art. Such changes and modifications may be made without departing from the spirit and scope of the subject matter and without diminishing its intended advantages. That is, such changes and modifications are intended to be covered by the claims.

For example, in the above embodiment, the groups may be identified based on images captured by a security camera installed in the school, in addition to the voice signal. In that case, for example, students who appear together in the captured images continuously for a predetermined time are identified by face recognition or the like, and the identified students are treated as belonging to one group. The predetermined time may be set according to the place where the images are captured (a classroom, a club room, and the like).

Also, in the above embodiment, the names of students registered in the student data 31 may be searched for in the text data obtained by speech recognition processing of the voice signal of an utterance by a person involved, and the student whose name is found may be identified as the victim. In that case, for example, the student whose name is found in the text data obtained when an event of type "imperative communication" is detected is identified as the victim.
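The name lookup in recognized text can be sketched as a substring search over registered names. The sample names used when calling it are hypothetical, and plain substring matching is an assumption that ignores nicknames and recognition errors.

```python
def find_named_students(text, students):
    """Return IDs of students whose registered name appears in the recognized text."""
    return [student_id for student_id, name in students.items() if name in text]
```

With students registered as `{"S001": "田中", "S002": "佐藤"}`, the text "田中、早くしろ" yields only the first student, who would then be recorded as the victim of the imperative utterance.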

The present invention is applicable to, for example, terminal devices and electronic devices in schools.

1 sound collection unit
2 voice analysis device
14 arithmetic processing device (an example of a computer)
21 voice analysis program
51 voice input unit
52 voice analysis unit
53 education support unit

Claims

1. A voice analysis system comprising:
a sound collection unit that collects voice in a school classroom and outputs a voice signal corresponding to the voice; and
a voice analysis device that receives the voice signal from the sound collection unit,
wherein the voice analysis device detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying, and identifies at least one of a victim and a related person in the event based on at least one of a voiceprint of the voice indicating the occurrence of the predetermined event, the location where the predetermined event occurred, and a name detected from the voice indicating the occurrence of the predetermined event.

2. A voice analysis device comprising:
a voice input unit that receives a voice signal from a sound collection unit that collects voice in a school classroom and outputs the voice signal corresponding to the voice; and
a voice analysis unit that detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying, and identifies at least one of a victim and a related person in the event based on at least one of a voiceprint of the voice indicating the occurrence of the predetermined event, the location where the predetermined event occurred, and a name detected from the voice indicating the occurrence of the predetermined event.

3. The voice analysis device according to claim 2, wherein the voice analysis unit (a) identifies, from the voice signal, the voice indicating the occurrence of the predetermined event, and (b) based on student data that associates each of a plurality of students in the classroom with voiceprint data of that student, identifies the student whose voiceprint data matches the voiceprint of the identified voice as the victim or the related person.

4. The voice analysis device according to claim 3, wherein
the student data associates each of the plurality of students in the classroom with the seat position of the seat assigned to that student, and
the voice analysis unit generates the voiceprint data of a student from the voice signal containing the voice of the student speaking while seated in the seat, associates the generated voiceprint data with the student, and adds it to the student data.

5. The voice analysis device according to claim 2, wherein
the sound collection unit outputs a plurality of voice signals corresponding to voice collected at a plurality of positions,
the voice input unit receives the plurality of voice signals, and
the voice analysis unit (a) identifies, from the plurality of voice signals, the position at which the voice indicating the occurrence of the predetermined event was produced, and (b) based on student data that associates each of a plurality of students in the classroom with the seat position of the seat assigned to that student, identifies the student whose seat position matches the identified position as the victim or the related person.

6. The voice analysis device according to claim 2, wherein the voice analysis unit (a) identifies a keyword contained in the voice by speech recognition processing on the voice signal, (b) identifies an emotion of the speaker of the voice by emotion recognition processing on the voice signal, and identifies the type of the predetermined event based on the keyword and the emotion.

7. The voice analysis device according to any one of claims 2 to 6, further comprising an education support unit that (a) records, as a history, at least (a1) the identified type of the predetermined event, (a2) at least one of the identified victim and related person, and (a3) a voice data file containing the voice signal, in association with one another, and (b) outputs the history as a report.

8. The voice analysis device according to claim 7, wherein
the voice analysis unit identifies a group formed by a plurality of students in the classroom based on the voice signal, and
the education support unit (a) identifies the group that includes the related person and (b) indicates the identified group in the report.

9. The voice analysis device according to claim 8, wherein the group is identified based on the voice signal or a photographed image of a security camera installed in the school.

10. A voice analysis program that causes a computer to function as:
a voice input unit that receives a voice signal from a sound collection unit that collects voice in a school classroom and outputs the voice signal corresponding to the voice; and
a voice analysis unit that detects, from the voice signal, the occurrence of a predetermined event related to a sign of bullying, and identifies at least one of a victim and a related person in the event based on at least one of a voiceprint of the voice indicating the occurrence of the predetermined event, the location where the predetermined event occurred, and a name detected from the voice indicating the occurrence of the predetermined event.