JP5171501B2

JP5171501B2 - Server, system, method and program for extracting important words

Info

Publication number: JP5171501B2
Application number: JP2008237926A
Authority: JP
Inventors: 祐宮崎; 誠秀
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2008-03-07
Filing date: 2008-09-17
Publication date: 2013-03-27
Anticipated expiration: 2028-09-17
Also published as: JP2009238199A

Abstract

PROBLEM TO BE SOLVED: To provide a device that extracts important words in a conversation and presents them as text data. SOLUTION: A server 10 receives voice data of a conversation, separates the received voice data by speaker, converts each piece of separated voice data into text data, selects a prescribed number of text data items from the converted text data in order from the most recent utterance time, and extracts important words based on an index related to the appearance frequency of each word contained in the selected text data. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a server, a system, a method, and a program for extracting important words in a conversation.

Conventionally, the audio of meetings, conversations, and the like is not preserved unless it is recorded, and the content of recorded audio cannot be known unless it is played back continuously, so understanding the content and searching for information took time. For this reason, speech recognition technology has been used to convert voice data into text data.

When the voice data contains a plurality of speakers, however, the speakers can no longer be distinguished once the data has been converted into text data. A method of separating voice data by speaker has therefore been proposed (see, for example, Patent Document 1). With these techniques, voice data can be recorded and presented as text data for each speaker.

Meanwhile, it has been proposed to convert voice data of meetings, conversations, and the like into text data, extract keywords from the text data, and distribute advertisements corresponding to the keywords. For example, Patent Document 2 proposes that, in a videophone system, keywords be extracted from text data and advertisements corresponding to the keywords be displayed on a display screen. Patent Document 3 proposes collecting sound inside a vehicle, performing speech recognition processing, and distributing advertisements corresponding to keywords that match the recognized vocabulary to an in-vehicle device.
Patent Document 1: Japanese Patent No. 3364487; Patent Document 2: JP 2002-165193 A; Patent Document 3: JP 2004-226070 A

However, when voice data is converted into text data by the technique of Patent Document 1, every utterance is converted into data, so important utterances and off-topic utterances of low importance are recorded alike. As a result, it is difficult to judge the content of a meeting or conversation accurately, and Patent Documents 2 and 3 contain no clear description of how to extract important keywords. Furthermore, the techniques of Patent Documents 2 and 3 distribute advertisements to the site where the voice was collected, so the advertisements cannot be said to reach the general public. In addition, the service simply follows the keywords extracted from the text data, and service information is displayed regardless of the importance of each keyword. The advertisements therefore cannot be said to be distributed effectively.

Therefore, an object of the present invention is to provide an apparatus capable of extracting important words in a conversation, presenting them as text data, and distributing advertisements effectively.

The present invention provides the following solutions.

(1) A server that extracts important words in a conversation, comprising:
receiving means for receiving voice data of the conversation;
separating means for separating the voice data received by the receiving means by speaker;
conversion means for converting each piece of voice data separated by the separating means into text data;
selection means for selecting, from the text data converted by the conversion means, a predetermined number of text data items in order from the most recent utterance time; and
extraction means for extracting an important word based on an index relating to the appearance frequency of each word included in the text data selected by the selection means.

According to such a configuration, the server receives the voice data of the conversation, separates the received voice data by speaker, converts each piece of separated voice data into text data, selects a predetermined number of text data items from the converted text data in order from the most recent utterance time, and extracts important words based on an index relating to the appearance frequency of each word included in the selected text data.

As a result, the server extracts important words from the most recent predetermined number of utterances in the received voice data. Important words representing the topic currently under discussion can thus be presented to the user in a timely manner, even in a conversation whose content changes from moment to moment.

Because the user can check the current topic, it becomes easier to make remarks in line with it. In addition, by watching for important words that stray from the agenda in a meeting or the like, the discussion can be kept from diverging.
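The selection step in (1) can be pictured with a minimal Python sketch. The `Utterance` record, its field names, and the use of seconds for the utterance time are illustrative assumptions, not details taken from this document.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One speaker-separated utterance after speech-to-text conversion."""
    speaker: str
    time: float          # utterance time in seconds (hypothetical unit)
    words: list[str]     # words obtained from the converted text


def select_recent(utterances: list[Utterance], n: int) -> list[Utterance]:
    """Return the n most recent utterances, ordered oldest first."""
    ordered = sorted(utterances, key=lambda u: u.time)
    return ordered[-n:] if n > 0 else []
```

Because only the newest `n` items survive, older utterances drop out of the window as the conversation proceeds, which is what lets the extracted words follow the current topic.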

(2) The server according to (1), further comprising:
specifying means for specifying a terminal device connected via a network in response to a processing start request from the terminal device;
connection means for establishing a data communication connection with the terminal device specified by the specifying means; and
transmission means for transmitting the important word extracted by the extraction means to the terminal device with which the data communication connection has been established by the connection means.

According to such a configuration, the server specifies the terminal device in response to a processing start request from the terminal device connected via the network, establishes a data communication connection with the specified terminal device, and transmits the extracted important words to the terminal device with which the data communication connection has been established.

As a result, the server can notify the terminal device, in a timely manner, of important words that change from moment to moment, based on the voice data collected by the terminal device connected via the network.

(3) The server according to (1), further comprising:
real-time distribution means for distributing the video and audio of the conversation in real time to a terminal device connected via a network;
advertisement acquisition means for acquiring advertisement information corresponding to the important word extracted by the extraction means; and
advertisement distribution means for distributing the advertisement information acquired by the advertisement acquisition means to the terminal device in conjunction with the real-time distribution.

According to such a configuration, the server distributes the video and audio of the conversation to the terminal device in real time, acquires advertisement information corresponding to the extracted important words, and distributes the acquired advertisement information to the terminal device in conjunction with the real-time distribution.

As a result, the server can distribute advertisements that match the video and audio of the conversation to the terminal device in conjunction with the real-time distribution of that video and audio, and can therefore distribute advertisements effectively to the public. Because the distributed advertisement is in harmony with the video and audio of the conversation, the advertisement can be displayed without seeming out of place and is more likely to stay in the user's mind, so a high advertising effect can be expected. Moreover, when the user of the terminal device is a fan of a performer who appears in the distributed video, a high probability of the user clicking on the advertisement display can be expected.
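As a rough illustration of acquiring advertisement information for the extracted important words, the following Python sketch looks words up in a small table. The inventory contents and the function name `pick_ads` are hypothetical; the document does not describe how advertisement information is stored or matched.

```python
# Hypothetical inventory: important word -> advertisement text.
AD_INVENTORY = {
    "camera": "Ad: compact cameras",
    "travel": "Ad: weekend tours",
}


def pick_ads(important_words: list[str],
             inventory: dict[str, str] = AD_INVENTORY) -> list[str]:
    """Return ads for the important words, in order, skipping words with no ad."""
    return [inventory[w] for w in important_words if w in inventory]
```

For example, `pick_ads(["travel", "weather", "camera"])` yields the tour ad followed by the camera ad, and the word without a matching entry is ignored.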

(4) The server according to any one of (1) to (3), wherein the extraction means calculates, as the index, the product of a TF value indicating the frequency with which each word appears in the text data and an IDF value that is the reciprocal of a DF value relating to the frequency with which each word appears among the predetermined number of text data items.

According to such a configuration, the server takes the passage of time into account by selecting a predetermined number of utterances and, within that range, calculates a TF-IDF value based on the product of TF (Term Frequency) and IDF (Inverse Document Frequency).

As a result, the server can extract important words based on appearance frequency by using a TF-IDF technique that takes the freshness of words into account.
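The index in (4) can be sketched in Python as follows. This is a sketch under stated assumptions: TF is taken as a word's count divided by the utterance length, DF as the number of selected utterances containing the word, IDF as the plain reciprocal 1/DF following the wording of (4), and per-utterance products are summed per word. The normalization and the summation are assumptions; the function names are hypothetical.

```python
from collections import Counter


def tf_idf_scores(window: list[list[str]]) -> dict[str, float]:
    """Score each word over a window of recently selected utterances.

    TF  = occurrences of the word in one utterance / words in that utterance
    DF  = number of utterances in the window that contain the word
    IDF = 1 / DF (reciprocal form; log(N / DF) is a common alternative
          that is not shown here)
    The per-utterance TF * IDF values are summed per word (an assumption).
    """
    df = Counter()
    for words in window:
        df.update(set(words))

    scores: dict[str, float] = {}
    for words in window:
        if not words:
            continue
        for word, count in Counter(words).items():
            tf = count / len(words)
            scores[word] = scores.get(word, 0.0) + tf * (1.0 / df[word])
    return scores


def top_words(window: list[list[str]], k: int = 3) -> list[str]:
    """Return the k highest-scoring words; ties are broken alphabetically."""
    scores = tf_idf_scores(window)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Because the window holds only the most recently selected utterances, recomputing the scores after each new utterance is what gives the index its sensitivity to freshness.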

(5) The server according to any one of (1) to (4), wherein, when the selection means detects that the text data includes a word of a predetermined type, the selection means selects the text data from the point at which the word of the predetermined type was spoken onward.

According to such a configuration, when the server detects a word or phrase of a predetermined type, for example a topic-change word such as "by the way" or "now then", it extracts important words from the utterances that follow the detected word or phrase.

As a result, the server extracts important words from the period during which a single topic is being discussed, which reduces the possibility that the current important words are overlooked because different topics are mixed together.
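A minimal Python sketch of the topic-change rule in (5) follows. The English phrases in `TOPIC_CHANGE_WORDS` are stand-ins chosen for illustration, and matching by simple substring search is an assumption.

```python
# Hypothetical English stand-ins for topic-change words.
TOPIC_CHANGE_WORDS = {"by the way", "now then", "changing the subject"}


def select_after_topic_change(texts: list[str], n: int) -> list[str]:
    """Pick up to n most recent utterances, cut at the latest topic change.

    `texts` is ordered oldest first. If an utterance inside the candidate
    window contains a topic-change phrase, everything before that utterance
    is dropped and selection starts from it.
    """
    window = texts[-n:] if n > 0 else []
    for i in range(len(window) - 1, -1, -1):
        lowered = window[i].lower()
        if any(phrase in lowered for phrase in TOPIC_CHANGE_WORDS):
            return window[i:]
    return window
```

Scanning from the newest utterance backward means that the most recent topic change wins when several occur within the window.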

(6) The server according to any one of (1) to (5), further comprising a synonym database for determining synonyms of each word included in the text data,
wherein the extraction means calculates the index relating to the appearance frequency of each word with the synonyms stored in the synonym database included.

According to such a configuration, the server further includes a synonym database for determining synonyms of each word included in the text data, and calculates the index relating to the appearance frequency of each word with the synonyms stored in the synonym database included.

As a result, because the server includes the synonym database, it can treat synonyms as the same word when extracting important words. Variations in wording that arise from differences between speakers can thereby be absorbed.
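One way to picture the synonym handling in (6) is to map each word to a canonical form before counting. In the Python sketch below, the table entries are placeholders and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical synonym table: variant -> canonical form.
SYNONYMS = {"BBX": "BBB", "notebook PC": "laptop"}


def normalize(words: list[str], synonyms: dict[str, str] = SYNONYMS) -> list[str]:
    """Replace each word that has a registered synonym with its canonical form."""
    return [synonyms.get(w, w) for w in words]


def count_with_synonyms(utterances: list[list[str]]) -> Counter:
    """Count word occurrences across utterances after synonym normalization."""
    counts = Counter()
    for words in utterances:
        counts.update(normalize(words))
    return counts
```

With this table, an utterance containing "BBX" and another containing "BBB" both contribute to the count for "BBB", so two speakers using different wording reinforce the same candidate word.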

(7) The server according to any one of (1) to (6), wherein the extraction means weights the index of each word included in the text data of a speaker based on the number of text data items, among the text data selected by the selection means, that have that same speaker.

According to such a configuration, when there are a plurality of utterances by the same speaker, the server weights each word included in that speaker's utterances according to the number of utterances. As a result, the server can raise the importance of words contained in the utterances of a speaker who speaks often.
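The weighting in (7) might look like the following Python sketch. The linear form `1 + share` is an illustrative assumption; the text states only that the weight depends on how many of the selected utterances belong to the same speaker.

```python
from collections import Counter


def speaker_weights(speakers: list[str]) -> dict[str, float]:
    """Weight each speaker by their share of the selected utterances.

    `speakers` lists the speaker of each selected utterance. The formula
    1 + (utterances by speaker / total utterances) is a hypothetical choice.
    """
    counts = Counter(speakers)
    total = len(speakers)
    if total == 0:
        return {}
    return {s: 1.0 + c / total for s, c in counts.items()}
```

Each word's index in an utterance would then be multiplied by the weight of that utterance's speaker before the important words are ranked.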

(8) The server according to any one of (1) to (7), wherein the conversion means associates volume data indicating the volume of the voice data with the text data, and
the extraction means weights the index of each word included in the text data based on the volume data associated with the text data by the conversion means.

According to such a configuration, the server associates volume data indicating the volume of the voice data with the text data and, based on this volume data, weights the index of each word included in the text data.

As a result, the server can raise the importance of words spoken at a high volume, which increases the likelihood that words emphasized by a speaker are extracted as important words.
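A minimal Python sketch of the volume-based weighting in (8) is shown below. The decibel unit, the reference level of 60, and the proportional scaling are all assumptions made for illustration.

```python
def apply_volume_weight(word_scores: dict[str, float],
                        volume_db: float,
                        reference_db: float = 60.0) -> dict[str, float]:
    """Scale the word scores of one utterance by its loudness.

    The factor volume_db / reference_db is a hypothetical weighting; louder
    utterances receive proportionally larger scores.
    """
    factor = max(volume_db, 0.0) / reference_db
    return {word: score * factor for word, score in word_scores.items()}
```

An utterance recorded at 90 against the reference of 60 has every word score multiplied by 1.5, so emphasized words rise in the ranking.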

(9) A system for extracting important words in a conversation by means of a terminal device that receives the voice of the conversation and a server capable of data communication via a network, wherein
the server comprises:
specifying means for specifying the terminal device in response to a processing start request from the terminal device;
connection means for establishing a data communication connection with the terminal device specified by the specifying means;
receiving means for receiving voice data of the conversation;
separating means for separating the voice data received by the receiving means by speaker;
conversion means for converting each piece of voice data separated by the separating means into text data;
selection means for selecting, from the text data converted by the conversion means, a predetermined number of text data items in order from the most recent utterance time;
extraction means for extracting an important word based on an index relating to the appearance frequency of each word included in the text data selected by the selection means; and
transmission means for transmitting the important word extracted by the extraction means to the terminal device with which the data communication connection has been established by the connection means, and
the terminal device comprises
display means for displaying the important word transmitted by the transmission means.

According to such a configuration, in the system, the terminal device that receives the voice of the conversation communicates with the server and displays the important words transmitted by the server. By operating the system, the same effects as (1) and (2) can therefore be expected.

(10) The system according to (9), wherein the terminal device further comprises storage means for storing the important words transmitted by the transmission means in time series.

According to such a configuration, the terminal device of the system stores the extracted important words in time series, so the user can refer to the flow of the conversation even after it has ended, for example when preparing the minutes.
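The time-series storage in (10) can be sketched in Python as a small log kept on the terminal side. The class name `KeywordLog`, the timestamp representation, and the query method are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordLog:
    """Keeps received important words ordered by the time they arrived."""
    entries: list[tuple[float, list[str]]] = field(default_factory=list)

    def record(self, timestamp: float, words: list[str]) -> None:
        """Store one batch of important words and keep the log time-ordered."""
        self.entries.append((timestamp, list(words)))
        self.entries.sort(key=lambda entry: entry[0])

    def between(self, start: float, end: float) -> list[list[str]]:
        """Return the word batches received within [start, end]."""
        return [words for t, words in self.entries if start <= t <= end]
```

Reading the entries back in order gives a compact outline of how the topic moved during the meeting, which is the kind of record that helps when writing minutes.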

(11) A method for extracting important words in a conversation, comprising:
a receiving step of receiving voice data of the conversation;
a separation step of separating the voice data received in the receiving step by speaker;
a conversion step of converting each piece of voice data separated in the separation step into text data;
a selection step of selecting, from the text data converted in the conversion step, a predetermined number of text data items in order from the most recent utterance time; and
an extraction step of extracting an important word based on an index relating to the appearance frequency of each word included in the text data selected in the selection step.

According to such a configuration, the same effect as in (1) can be expected by executing the method.

(12) A program that causes a server to extract important words in a conversation, the program causing the server to execute:
a receiving step of receiving voice data of the conversation;
a separation step of separating the voice data received in the receiving step by speaker;
a conversion step of converting each piece of voice data separated in the separation step into text data;
a selection step of selecting, from the text data converted in the conversion step, a predetermined number of text data items in order from the most recent utterance time; and
an extraction step of extracting an important word based on an index relating to the appearance frequency of each word included in the text data selected in the selection step.

According to such a configuration, the same effect as in (1) can be expected by executing the program.

According to the present invention, important words in a conversation can be extracted and presented as text data. In addition, advertisements can be distributed effectively to the public.

(First embodiment)
The first embodiment will be described below with reference to the drawings.

[System Overview]
FIG. 1 is a diagram illustrating an overview of a system according to the first embodiment. A terminal device 20 that receives audio at the site of a conversation, such as a meeting, and a server 10 that extracts important words in the conversation are connected via a network. In the first embodiment, the terminal device 20 is described as a mobile phone, but it is not limited to this as long as it is a terminal device having a communication function. So as not to restrict where it can be used, the terminal device is preferably small and portable.

The terminal device 20 transmits voice data of the conversation to the server 10, and receives and displays the important words extracted by the server 10. The user of the terminal device 20 can thereby learn, in real time, the important words that change from moment to moment during the conversation.

The server 10 applies speech recognition technology to the voice data received from the terminal device 20 to generate text data of the speech. The server 10 then extracts the important words in the conversation from the generated text data by using a TF-IDF technique that takes the freshness of words into account (details will be described later).

[Hardware configuration]
FIG. 2 is a diagram illustrating an example of the hardware configuration of the server 10 according to the first embodiment. The server 10 includes a CPU (Central Processing Unit) 1 (1010) constituting the control device 101 (in a multiprocessor configuration, a plurality of CPUs such as CPU 2 (1012) may be added), a bus line 1005, a communication I/F 1040, a main memory 1050, a BIOS (Basic Input Output System) 1060, a USB port 1090, an I/O controller 1070, an input device 1100 such as a keyboard and mouse, and a display device 1022.

The BIOS 1060 stores a boot program executed by the control device 101 when the server 10 starts up, programs that depend on the hardware of the server 10, and the like.

Storage devices 107 such as a tape drive 1072, a hard disk 1074, an optical disk drive 1076, and a semiconductor memory 1078 can be connected to the I/O controller 1070.

The hard disk 1074 constituting the storage device 107 stores various programs that allow the server 10 to function as a server and programs that execute the functions of the present invention, and can also hold various databases as necessary.

As the optical disk drive 1076, for example, a DVD-ROM drive, a CD-ROM drive, a DVD-RAM drive, a CD-RAM drive, or the like can be used. In this case, an optical disk 1077 corresponding to each drive is used. A program or data can also be read from the optical disk 1077 by the optical disk drive 1076 and provided to the main memory 1050 or the hard disk 1074 via the I/O controller 1070. Similarly, a tape medium 1071 corresponding to the tape drive 1072 can be used, mainly for backup.

The program provided to the server 10 is supplied stored on a recording medium such as the hard disk 1074, the optical disk 1077, or a memory card. The program may be installed in the server 10 and executed by being read from the recording medium via the I/O controller 1070 or by being downloaded via the communication I/F 1040.

The aforementioned program may be stored in an internal or external storage medium. As the storage medium constituting the storage device 107, a magneto-optical recording medium such as an MD or a tape medium can be used in addition to the hard disk 1074, the optical disk 1077, or a memory card. Further, a storage device such as a hard disk 1074 or an optical disk library provided in a server system connected to a dedicated communication line or the Internet may be used as the recording medium, and the program may be provided to the server 10 via the communication line.

The display device 1022 displays a screen for accepting data input from the user and a screen showing the results of arithmetic processing by the server 10, and includes display devices such as a cathode ray tube display (CRT) and a liquid crystal display (LCD).

The input device 1100 accepts input from the user and may be constituted by a keyboard, a mouse, and the like.

The communication I/F 1040 is a network adapter that allows the server 10 to be connected to a terminal via a dedicated network or a public network. The communication I/F 1040 may include a modem, a cable modem, and an Ethernet (registered trademark) adapter.

The above example has mainly described the server 10, but the functions described above can also be realized by installing a program on a computer and operating that computer as a server device. Accordingly, the functions realized by the server described as one embodiment of the present invention can also be realized by executing the above-described method on the computer, or by installing and executing the above-described program on the computer.

[Functional configuration]
FIG. 3 is a diagram illustrating the configuration of the main functions of the control device 101 in the server 10 according to the first embodiment. Although each function described below is assumed to be realized by the server 10 alone, this is not a limitation, and the functions may be distributed across a plurality of servers as appropriate.

The server 10 includes a telephone number specifying unit 110, a voice receiving unit 120, a person specifying unit 130, a voice recognition unit 140, a weight setting unit 150, an important word extraction unit 160, and an important word transmission unit 170.

The telephone number specifying unit 110 specifies the telephone number of the terminal device 20 in response to receiving a processing start request from the terminal device 20. Specifically, the registered user table shown in FIG. 4 is stored in the storage device 107, and the telephone number specifying unit 110 specifies the telephone number of the terminal device 20 corresponding to the user ID, that is, the destination to which the important words are to be delivered.
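The lookup performed by the telephone number specifying unit 110 can be pictured as a simple table query. In this Python sketch, the user IDs, the telephone numbers, and the dictionary layout are placeholders; the actual columns of the registered user table in FIG. 4 are not reproduced here.

```python
from typing import Optional

# Hypothetical registered user table: user ID -> telephone number.
REGISTERED_USERS = {
    "user001": "090-0000-0001",
    "user002": "090-0000-0002",
}


def resolve_phone_number(user_id: str,
                         table: dict[str, str] = REGISTERED_USERS) -> Optional[str]:
    """Return the telephone number registered for user_id, or None if unknown."""
    return table.get(user_id)
```

Returning `None` for an unregistered user ID lets the caller reject the processing start request instead of delivering important words to an unknown destination; that handling is itself an assumption.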

The voice receiving unit 120 receives the voice data that has been collected by the terminal device 20 and converted into electronic data.

The person specifying unit 130 identifies the speaker of the voice data received by the voice receiving unit 120. That is, it separates the voice in the conversation by speaker and produces, in time series, a plurality of pieces of voice data each linked to a speaker. For example, in the schematic diagram of FIG. 5, which shows text data generated from a conversation, the voice data of "Mr. A", the voice data of "Mr. B" who spoke next, the voice data of "Mr. C" who spoke after that, and the voice data of "Mr. D" who spoke last are separated from one another.
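The result of speaker separation, a time-ordered series of voice data items each linked to a speaker, can be sketched in Python as follows. The sketch assumes that speaker labels have already been assigned to short audio chunks by some existing technique; only the grouping into per-speaker turns is shown, and the chunk identifiers are hypothetical.

```python
from itertools import groupby


def to_turns(segments: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Merge consecutive chunks from the same speaker into one turn.

    `segments` is a time-ordered list of (speaker, chunk_id) pairs. The
    output keeps the time order, with one entry per uninterrupted turn.
    """
    return [
        (speaker, [chunk for _, chunk in group])
        for speaker, group in groupby(segments, key=lambda seg: seg[0])
    ]
```

A speaker who talks, is interrupted, and talks again produces two separate turns, which preserves the time series that later selection by utterance time relies on.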

The person specifying unit 130 can be realized by using existing technology such as that of Patent Document 1 described above.

The voice recognition unit 140 analyzes the voice data received by the voice receiving unit 120 and separated by speaker by the person specifying unit 130, and converts each piece into text data. The voice recognition unit 140 further performs morphological analysis on the converted text data and extracts words. In the schematic diagram of FIG. 5, for example, "AAA", "BBX", and "CCC" are extracted from the utterance of "Mr. A".

Here, the voice recognition unit 140 refers to the synonym table shown in FIG. 6 and converts "BBX" into its synonym "BBB". It can thereby be recognized as identical to "BBB" extracted from the utterance of "Mr. B".

Here, based on existing technology, the recognized speech is stored as text data (reference numerals 51, 52, 53, and 54 in FIG. 5) in the form of a combination of words. The text data is stored in association with the speaker identified by the person specifying unit 130. Volume data indicating the volume of the voice data may also be stored in association with it.

重み付け設定部１５０は、音声認識部１４０により変換されたテキストデータのうち、重要語を抽出する対象であるテキストデータを選択し、更に、各テキストデータに対して重要度の重み付けを行う。 The weight setting unit 150 selects text data that is a target for extracting important words from the text data converted by the voice recognition unit 140, and further weights each text data with importance.

具体的には、重み付け設定部１５０は、まず、発言時刻が新しいものから所定数のテキストデータを選択する。この所定数は、予め設定されるものであって、例えば、所定数を４と設定すれば、図５の模式図ではテキストデータ５１、５２、５３および５４が選択される。 Specifically, the weight setting unit 150 first selects a predetermined number of text data, starting from the most recent utterance time. The predetermined number is set in advance; for example, if it is set to 4, text data 51, 52, 53, and 54 are selected in the schematic diagram of FIG. 5.

ここで、テキストデータの中に、「ところで」、「さて」、「話は違うが」等の話題転換語が現れた場合には、この話題転換語以降のテキストデータを選択することとする。例えば、所定数が４であっても、テキストデータ５４に話題転換語が現れた場合には、テキストデータ５１、５２および５３は除外される。このことにより、現在話されている話題に関するテキストデータが選択される。 Here, when a topic change word such as “by the way”, “now then”, or “to change the subject” appears in the text data, only the text data from that topic change word onward is selected. For example, even if the predetermined number is 4, when a topic change word appears in text data 54, text data 51, 52, and 53 are excluded. As a result, the text data relating to the topic currently being discussed is selected.
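The selection rule above (the newest N text data, cut off at the most recent topic change word) might be sketched as follows. The function name `select_recent` and the representation of each text data as a plain string are assumptions made for illustration. As a simplification, the whole utterance containing the topic change word is kept, including any words spoken before it.

```python
# Topic change words named in the description.
TOPIC_CHANGE_WORDS = ("ところで", "さて", "話は違うが")


def select_recent(texts, n, markers=TOPIC_CHANGE_WORDS):
    """Select text data for keyword extraction.

    texts: utterance strings ordered from oldest to newest.
    n: the predetermined number of text data to select.
    """
    recent = texts[-n:] if n > 0 else []
    # Scan from the newest utterance backwards for a topic change word.
    for i in range(len(recent) - 1, -1, -1):
        if any(marker in recent[i] for marker in markers):
            return recent[i:]  # drop everything before the topic change
    return recent


selected = select_recent(["a", "b", "c", "ところで d"], 4)
```

With a topic change word in the newest utterance, only that utterance remains, which corresponds to the case where text data 51, 52, and 53 are excluded.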

続いて、重み付け設定部１５０は、各テキストデータに関して、発言者の発言数や音量等に基づいて、重要度の重み付けを行う。例えば、発言数が他より多い発言者のテキストデータに対しては重要度を高くする。また、音声認識部１４０によりテキストデータに関連付けられた音量データが大きいほど、重要度を高くする。あるいは、同一の発言者のテキストデータの中で音量データが相対的に大きいものについて、重要度を高くしてもよい。 Subsequently, the weight setting unit 150 weights the importance of each text data based on the number of utterances by its speaker, the volume, and the like. For example, the importance is raised for text data of a speaker who has made more utterances than the others. Also, the larger the volume data associated with the text data by the speech recognition unit 140, the higher the importance. Alternatively, among the text data of the same speaker, the importance may be raised for those whose volume data is relatively large.

また、音声認識部１４０により、音量データは、テキストデータに含まれる語句それぞれに対して関連付けてもよい。この場合、重み付け設定部１５０は、テキストデータに含まれる語句に対して重要度の重み付けができる。 The speech recognition unit 140 may also associate the volume data with each individual word included in the text data. In this case, the weight setting unit 150 can weight the importance of each word included in the text data.
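One possible way to turn the number of utterances and the volume into a per-text-data weight is sketched below. The description does not specify how these factors are combined, so the multiplicative form, the normalization by the maximum, and the name `utterance_weights` are all assumptions.

```python
from collections import Counter


def utterance_weights(utterances):
    """Return one importance weight per utterance.

    utterances: non-empty list of (speaker, volume) pairs with positive volume.
    The weight grows with the speaker's utterance count and with the volume;
    the product of the two normalized factors is an assumed combination.
    """
    counts = Counter(speaker for speaker, _ in utterances)
    max_count = max(counts.values())
    max_volume = max(volume for _, volume in utterances)
    return [
        (counts[speaker] / max_count) * (volume / max_volume)
        for speaker, volume in utterances
    ]


weights = utterance_weights([("A", 60.0), ("B", 30.0), ("A", 45.0)])
```

Here speaker A has spoken twice and loudest, so A's first utterance receives the largest weight.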

重要語抽出部１６０は、重み付け設定部１５０により選択されたテキストデータ５１、５２、５３および５４を参照し、それぞれに含まれる語句について、出現頻度に関する指標として、ＴＦ（ＴｅｒｍＦｒｅｑｕｅｎｃｙ）とＩＤＦ（ＩｎｖｅｒｓｅＤｏｃｕｍｅｎｔＦｒｅｑｕｅｎｃｙ）の積であるＴＦ・ＩＤＦ（ｔ）値を算出する。 The keyword extraction unit 160 refers to the text data 51, 52, 53, and 54 selected by the weight setting unit 150 and, for each word they contain, calculates a TF·IDF(t) value, the product of TF (Term Frequency) and IDF (Inverse Document Frequency), as an index relating to appearance frequency.

語句ｔについてのＴＦ・ＩＤＦ（ｔ）値は、

ＴＦ・ＩＤＦ（ｔ）＝ＴＦ（ｔ）×ｌｏｇ（Ｎ／ＤＦ（ｔ））

により算出される。
ＴＦ（ｔ）は、語句ｔがテキストデータに含まれる数を示す。
ＤＦ（ｔ）は、語句ｔが含まれるテキストデータの数を示す。
Ｎは、選択されたテキストデータの数を示す。

The TF·IDF(t) value for a word t is calculated by

TF·IDF(t) = TF(t) × log(N / DF(t))

where:
TF(t) is the number of times the word t appears in the text data;
DF(t) is the number of text data that contain the word t; and
N is the number of selected text data.

例えば、図５の模式図によれば、所定数「Ｎ＝４」としたとき、テキストデータ５４に含まれる「ＡＡＡ」については、「ＴＦ（ｔ）＝１」、「ＤＦ（ｔ）＝４」となる。また、「ＥＥＥ」については、「ＴＦ（ｔ）＝２」、「ＤＦ（ｔ）＝２」となる。 For example, according to the schematic diagram of FIG. 5, when the predetermined number is “N = 4”, “AAA” included in text data 54 has “TF(t) = 1” and “DF(t) = 4”, while “EEE” has “TF(t) = 2” and “DF(t) = 2”.

すなわち、ＴＦ・ＩＤＦ（ｔ）値は、「ＡＡＡ」に比べて「ＥＥＥ」が大きくなる。重要語抽出部１６０は、このＴＦ・ＩＤＦ（ｔ）値が最大の語句を重要語として抽出する。なお、ＴＦ・ＩＤＦ（ｔ）値が最大の語句のみではなく、複数の語句を抽出してもよく、また、一定値以上の語句を抽出することとしてもよい。 That is, the TF·IDF(t) value of “EEE” is larger than that of “AAA”. The keyword extraction unit 160 extracts the word with the largest TF·IDF(t) value as the important word. Instead of only the word with the largest value, a plurality of words may be extracted, or all words whose value is at or above a certain threshold may be extracted.
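The worked example above can be reproduced with a short sketch. It assumes the common form TF(t) × log(N / DF(t)) with the natural logarithm and, following the example, counts TF within the newest text data (text data 54). The word lists are hypothetical and chosen only so that “AAA” has TF = 1, DF = 4 and “EEE” has TF = 2, DF = 2; the actual contents of FIG. 5 are not reproduced in this text.

```python
import math


def tf_idf(term, target, selected):
    """TF(t) * log(N / DF(t)) for one term.

    target: word list of the text data being scored.
    selected: the N selected text data, each a list of words.
    """
    tf = target.count(term)
    df = sum(1 for words in selected if term in words)
    if df == 0:
        return 0.0
    return tf * math.log(len(selected) / df)


# Hypothetical word lists standing in for text data 51 to 54.
text_51 = ["AAA", "BBB", "CCC"]
text_52 = ["AAA", "BBB", "DDD"]
text_53 = ["AAA", "DDD", "EEE"]
text_54 = ["AAA", "EEE", "EEE"]
selected = [text_51, text_52, text_53, text_54]

scores = {term: tf_idf(term, text_54, selected) for term in set(text_54)}
important_word = max(scores, key=scores.get)
```

Because “AAA” appears in every selected text data, its IDF factor is log(4/4) = 0, so “EEE” is chosen as the important word.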

ここで、重要語抽出部１６０は、重み付け設定部１５０により設定された重要度の重み付けに基づいて、ＴＦ（ｔ）、ＤＦ（ｔ）、あるいはＴＦ・ＩＤＦ（ｔ）値を調節することが好ましい。これにより、重要語抽出部１６０は、発言数や音量に基づく重要語を抽出することができる。 Here, the keyword extraction unit 160 preferably adjusts TF(t), DF(t), or the TF·IDF(t) value based on the importance weighting set by the weight setting unit 150. This allows the keyword extraction unit 160 to extract important words that reflect the number of utterances and the volume.
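As one illustration of such an adjustment, the sketch below scales each occurrence of a word by the weight of the text data it appears in before multiplying by the IDF factor. This is only one of the options the description allows (adjusting TF rather than DF or the final value). Counting TF across all selected text data, the natural logarithm, and the name `weighted_tf_idf` are assumptions.

```python
import math


def weighted_tf_idf(term, selected, weights):
    """TF·IDF with a weighted TF.

    selected: the N selected text data, each a list of words.
    weights: one importance weight per text data, in the same order.
    """
    tf = sum(w * words.count(term) for words, w in zip(selected, weights))
    df = sum(1 for words in selected if term in words)
    if df == 0:
        return 0.0
    return tf * math.log(len(selected) / df)


# Three hypothetical utterances; the second was weighted as more important.
selected = [["XXX"], ["YYY"], ["ZZZ"]]
weights = [1.0, 2.0, 1.0]
score_x = weighted_tf_idf("XXX", selected, weights)
score_y = weighted_tf_idf("YYY", selected, weights)
```

Without weighting, “XXX” and “YYY” would tie; with the weights applied, the word from the more important utterance scores higher.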

なお、第１実施形態のＴＦ・ＩＤＦ（ｔ）値の計算式は、重要度の指標としての一例であって、これには限られない。例えば、対数（ｌｏｇ）を用いなくてもよく、また、対数の底として、「１０」、「２」、あるいは自然対数「ｅ」等、適宜設定することができる。 Note that the formula for calculating the TF / IDF (t) value of the first embodiment is an example as an index of importance, and is not limited thereto. For example, the logarithm (log) need not be used, and the base of the logarithm can be set as appropriate, such as “10”, “2”, or the natural logarithm “e”.

また、ＴＦ（ｔ）は語句ｔの数としたが、これには限られず、テキストデータに含まれる割合等、出現頻度を示す値としてよい。また、ＤＦ（ｔ）についても同様に、Ｎに対して、語句ｔが含まれるテキストデータの割合等、出現頻度を示す値としてよい。 Moreover, although TF (t) is the number of words / phrases t, it is not limited thereto, and may be a value indicating the appearance frequency, such as a ratio included in text data. Similarly, DF (t) may be a value indicating the appearance frequency, such as a ratio of text data including the word t with respect to N.

重要語送信部１７０は、重要語抽出部１６０により抽出された重要語を、電話番号特定部１１０により特定された端末装置２０に対して送信する。この処理は、端末装置２０に対する文字データの片方向の通信でよいため、いわゆるプッシュ配信の手法を利用することができる。重要語送信部１７０は、電話番号特定部１１０により特定された端末装置２０とのデータ通信接続を確立した後、重要語抽出部１６０により重要語が抽出される度に、端末装置２０に対して配信を継続する。 The keyword transmission unit 170 transmits the important word extracted by the keyword extraction unit 160 to the terminal device 20 identified by the telephone number specifying unit 110. Because this only requires one-way communication of character data to the terminal device 20, a so-called push delivery technique can be used. After establishing a data communication connection with the terminal device 20 identified by the telephone number specifying unit 110, the keyword transmission unit 170 continues delivering to the terminal device 20 each time the keyword extraction unit 160 extracts an important word.

ここで、端末装置２０は、サーバ１０の重要語送信部１７０から受信した重要語を表示することにより、利用者に報知する。これにより、端末装置２０の利用者は、会話の中で刻々と変化する重要語を、リアルタイムで把握することができる。 Here, the terminal device 20 notifies the user by displaying the important word received from the important word transmission unit 170 of the server 10. Thereby, the user of the terminal device 20 can grasp the important words that change every moment in the conversation in real time.

また、端末装置２０は、受信した重要語を、発言の時刻と対応付けて時系列に記憶する。これにより、利用者は、後から会話の流れを把握することができる。 Further, the terminal device 20 stores the received important words in time series in association with the time of utterance. This allows the user to review the flow of the conversation later.

［処理フロー］ [Processing flow]
図７は、第１実施形態に係るサーバ１０における制御装置１０１の処理を示すフローチャートである。 FIG. 7 is a flowchart showing processing of the control device 101 in the server 10 according to the first embodiment.

ステップＳ１では、制御装置１０１は、端末装置２０から、ユーザＩＤと共に、処理の開始要求を受信する。 In step S 1, the control device 101 receives a process start request from the terminal device 20 together with the user ID.

ステップＳ２では、制御装置１０１は、ステップＳ１で受信したユーザＩＤにより、端末装置２０の電話番号を特定する。 In step S2, the control device 101 identifies the telephone number of the terminal device 20 based on the user ID received in step S1.

ステップＳ３では、制御装置１０１は、端末装置２０から会話の音声データを受信する。 In step S 3, the control device 101 receives conversation voice data from the terminal device 20.

ステップＳ４では、制御装置１０１は、ステップＳ３にて受信した音声データの発言者を特定する。 In step S4, the control apparatus 101 specifies the speaker of the voice data received in step S3.

ステップＳ５では、制御装置１０１は、ステップＳ４にて特定した発言者が変わったか否かを判定する。この判定がＹＥＳの場合は、同一発言者による一連の発言が終了したと判断できるので、ステップＳ６に移る。一方、判定がＮＯの場合は、同一発言者による発言が継続していると判断できるので、ステップＳ３に戻り音声データの受信を継続する。 In step S5, the control apparatus 101 determines whether the speaker specified in step S4 has changed. If this determination is YES, it can be determined that a series of utterances by the same speaker has ended, and the process moves to step S6. On the other hand, if the determination is NO, it can be determined that the utterance by the same speaker is continuing, so the process returns to step S3 and continues to receive the audio data.

ステップＳ６では、制御装置１０１は、ステップＳ３にて受信した音声データをテキストデータに変換する。 In step S6, the control device 101 converts the voice data received in step S3 into text data.

ステップＳ７では、制御装置１０１は、ステップＳ６にて変換されたテキストデータを形態素解析し、テキストデータに含まれる語句を抽出する。 In step S7, the control device 101 performs morphological analysis on the text data converted in step S6, and extracts a phrase included in the text data.

ステップＳ８では、制御装置１０１は、ステップＳ７にて抽出された語句に対して、重み付けを行う。具体的には、上述のように、テキストデータの新しさ、発言者の発言回数や音量等により、重要度を調整する。 In step S8, the control apparatus 101 weights the word / phrase extracted in step S7. Specifically, as described above, the importance level is adjusted based on the newness of the text data, the number of times the speaker speaks, the volume, and the like.

ステップＳ９では、制御装置１０１は、ステップＳ８にて重み付けされた語句について、ＴＦ・ＩＤＦ（ｔ）値を算出し、この値に基づいて重要語を抽出する。 In step S9, the control apparatus 101 calculates a TF / IDF (t) value for the word weighted in step S8, and extracts an important word based on this value.

ステップＳ１０では、制御装置１０１は、ステップＳ９にて抽出された重要語を端末装置２０に送信する。 In step S 10, the control device 101 transmits the important word extracted in step S 9 to the terminal device 20.

ステップＳ１１では、制御装置１０１は、処理を終了するか否かを判定する。具体的には、端末装置２０から終了要求を受信したことにより処理を終了すると判定する。この判定がＹＥＳの場合は処理を終了し、判定がＮＯの場合はステップＳ３に戻り、音声データの受信を継続する。 In step S11, the control device 101 determines whether or not to end the process. Specifically, it is determined that the process is ended by receiving the end request from the terminal device 20. If this determination is YES, the process ends. If the determination is NO, the process returns to step S3 and continues to receive audio data.
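The speaker-change check of steps S3 to S5 can be sketched as a generator that buffers audio chunks until the identified speaker changes, then hands one complete utterance to the later steps (S6 onward). The name `segment_by_speaker` and the `(speaker, chunk)` input format are assumptions. Flushing the final buffer when the input ends is also an assumption, since the flowchart as described only leaves the loop at step S11.

```python
def segment_by_speaker(chunks):
    """Group consecutive audio chunks by speaker.

    chunks: iterable of (speaker, audio_chunk) pairs in arrival order.
    Yields (speaker, [audio_chunk, ...]) each time the speaker changes,
    which corresponds to a YES determination in step S5.
    """
    current, buffer = None, []
    for speaker, chunk in chunks:
        if buffer and speaker != current:
            yield current, buffer  # the previous speaker's utterance is complete
            buffer = []
        current = speaker
        buffer.append(chunk)
    if buffer:
        # Assumed behaviour: emit the last utterance when input ends.
        yield current, buffer


stream = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("A", "a3")]
utterances = list(segment_by_speaker(stream))
```

Each yielded utterance would then be converted to text, analyzed, weighted, and scored as in steps S6 to S9.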

（第２実施形態） (Second Embodiment)
次に、第２実施形態について説明する。第２実施形態では、ライブ中継送信装置からライブ広告配信サーバに映像・音声を送信し、ライブ広告配信サーバにおいて、端末装置に映像・音声をリアルタイム配信すると共に、重要語の抽出を行い、この抽出した重要語に応じた広告を、端末装置に配信するものである。 Next, a second embodiment will be described. In the second embodiment, a live relay transmission device transmits video and audio to a live advertisement distribution server, which distributes the video and audio to a terminal device in real time, extracts important words, and distributes advertisements corresponding to the extracted important words to the terminal device.

なお、以下の説明において、上述した第１実施形態と同様の機能を果たす部分には、同一の符号または末尾に同一の符号を付して、重複する説明を適宜省略する。 In the following description, the same reference numerals or the same reference numerals are given to the portions that perform the same functions as those in the first embodiment described above, and overlapping descriptions will be omitted as appropriate.

［システム概要］ [System Overview]
図８は、第２実施形態に係るシステムの概要を示す図である。ライブ中継の映像・音声をライブ広告配信サーバ２００に送信するライブ中継送信装置３０と、端末装置２０と、ライブにおける映像・音声を端末装置２０にリアルタイム配信すると共に、ライブの音声の重要語を抽出し、この重要語に応じた広告をリアルタイム配信と連動させて端末装置２０に配信するライブ広告配信サーバ２００とが、ネットワークを介して接続されている。第２実施形態では、端末装置２０はリアルタイム配信されるライブ中継の映像・音声を受信して再生できるブラウザを備えているものとして説明する。 FIG. 8 is a diagram illustrating an overview of a system according to the second embodiment. A live relay transmission device 30 that transmits live relay video and audio to a live advertisement distribution server 200, a terminal device 20, and the live advertisement distribution server 200 are connected via a network. The live advertisement distribution server 200 distributes the live video and audio to the terminal device 20 in real time, extracts important words from the live audio, and distributes advertisements corresponding to those important words to the terminal device 20 in conjunction with the real-time distribution. In the second embodiment, the terminal device 20 is described as having a browser capable of receiving and playing back the live relay video and audio distributed in real time.

ライブ中継送信装置３０は、ライブを中継し、ライブ広告配信サーバ２００に映像・音声を送信する。 The live relay transmission device 30 relays the live and transmits video / audio to the live advertisement distribution server 200.

ライブ広告配信サーバ２００は、ライブ中継送信装置３０から映像・音声を受信すると、この映像・音声を端末装置２０にリアルタイム配信すると共に、音声認識技術を用いて音声のテキストデータを生成する。続いてライブ広告配信サーバ２００は、生成したテキストデータに対して、語の新鮮度を考慮したＴＦ・ＩＤＦ技術を用い、会話の中の重要語を抽出する。そして、抽出した重要語に応じた広告を取得し、リアルタイム配信と連動させて端末装置２０に配信する。 When the live advertisement distribution server 200 receives the video / audio from the live relay transmission device 30, the live advertisement distribution server 200 distributes the video / audio to the terminal device 20 in real time and generates voice text data using a voice recognition technology. Subsequently, the live advertisement distribution server 200 extracts an important word in the conversation by using the TF / IDF technology considering the freshness of the word for the generated text data. Then, an advertisement corresponding to the extracted important word is acquired and distributed to the terminal device 20 in conjunction with real-time distribution.

端末装置２０は、ライブ広告配信サーバ２００から映像・音声を受信し再生すると共に、広告を受信して映像・音声と連動させて表示する。 The terminal device 20 receives and reproduces the video / audio from the live advertisement distribution server 200, and also receives the advertisement and displays it in conjunction with the video / audio.

［機能構成］ [Function configuration]
図９は、第２実施形態に係るライブ広告配信サーバ２００における、制御装置１０１によって実行される主な機能の構成を示す図である。なお、以下に説明する各機能は、ライブ広告配信サーバ２００単体において実現されることとしたが、これには限られず、適宜、複数のサーバに機能を分散させてもよい。 FIG. 9 is a diagram illustrating a configuration of main functions executed by the control device 101 in the live advertisement distribution server 200 according to the second embodiment. Although each function described below is assumed to be implemented in the live advertisement distribution server 200 alone, this is not a limitation, and the functions may be distributed across a plurality of servers as appropriate.

ライブ広告配信サーバ２００は、映像音声受信部１２１と、人物特定部１３０と、音声認識部１４０と、重み付け設定部１５０と、重要語抽出部１６０と、広告抽出部２１０と、配信広告決定部２２０と、広告配信部２３０と、リアルタイム配信部２４０と、を備える。 The live advertisement distribution server 200 includes a video / audio reception unit 121, a person identification unit 130, a voice recognition unit 140, a weighting setting unit 150, a keyword extraction unit 160, an advertisement extraction unit 210, and a distribution advertisement determination unit 220. And an advertisement distribution unit 230 and a real-time distribution unit 240.

映像音声受信部１２１は、ライブ中継送信装置３０により収録され、電子データとして変換された映像・音声データを受信する。 The video / audio receiver 121 receives the video / audio data recorded by the live relay transmitter 30 and converted as electronic data.

人物特定部１３０は、映像音声受信部１２１により受信した映像・音声データのうち、音声データを抽出して、この音声データについて、第１実施形態の人物特定部１３０と同様に発言者の特定を行う。 The person specifying unit 130 extracts audio data from the video / audio data received by the audio / video receiving unit 121, and specifies the speaker for the audio data in the same manner as the person specifying unit 130 of the first embodiment. Do.

音声認識部１４０と、重み付け設定部１５０と、重要語抽出部１６０と、については、第１実施形態と同様の機能を有する。 The voice recognition unit 140, the weight setting unit 150, and the keyword extraction unit 160 have the same functions as those in the first embodiment.

広告抽出部２１０は、後述で説明する図１０に示す広告テーブルを参照して、重要語抽出部１６０により抽出された重要語に応じた広告を抽出する。具体的には、重要語と、広告テーブルに記憶されるキーワードとのマッチングを行う。そして、マッチングにおいて、重要語と一致するキーワードを含む広告を抽出する。 The advertisement extraction unit 210 refers to the advertisement table shown in FIG. 10, described later, and extracts advertisements corresponding to the important word extracted by the keyword extraction unit 160. Specifically, it matches the important word against the keywords stored in the advertisement table and extracts the advertisements that include a keyword matching the important word.

図１０は、第２実施形態に係る広告テーブルを示す図である。広告テーブルには、広告ＩＤと、キーワードと、広告内容と、ＵＲＬ等が記憶されている。広告ＩＤは、記憶されている広告を特定するキーである。キーワードは、重要語抽出部１６０により抽出される重要語とのマッチングを行うものである。広告内容およびＵＲＬは、端末装置２０に広告として配信される情報であり、ライブの映像と共に表示される。 FIG. 10 is a diagram illustrating an advertisement table according to the second embodiment. The advertisement table stores an advertisement ID, a keyword, advertisement contents, a URL, and the like. The advertisement ID is a key for specifying the stored advertisement. The keyword is used for matching with the keyword extracted by the keyword extraction unit 160. The advertisement content and URL are information distributed as an advertisement to the terminal device 20 and are displayed together with a live video.
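The advertisement table and the keyword matching might be sketched as below. The rows are hypothetical placeholders (the actual contents of FIG. 10 are not reproduced in this text), and the names `Ad`, `AD_TABLE`, and `extract_ads` as well as the example.com URLs are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    ad_id: str        # key identifying the stored advertisement
    keywords: tuple   # keywords matched against the important word
    content: str      # advertisement text shown with the live video
    url: str          # link shown with the advertisement


# Hypothetical rows standing in for the advertisement table of FIG. 10.
AD_TABLE = [
    Ad("101", ("drive", "car"), "Car rental offer", "https://example.com/101"),
    Ad("102", ("drive", "travel"), "Scenic route guide", "https://example.com/102"),
    Ad("103", ("cooking",), "Kitchen goods", "https://example.com/103"),
]


def extract_ads(important_word, table=AD_TABLE):
    """Return the advertisements whose keywords include the important word."""
    return [ad for ad in table if important_word in ad.keywords]


matched_ids = [ad.ad_id for ad in extract_ads("drive")]
```

Exact string matching is used here; a real table would likely need the same synonym normalization that is applied to the extracted words.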

図９に戻り、配信広告決定部２２０は、広告抽出部２１０により抽出された広告の数が予め定められた表示広告数より少ない場合には、抽出された広告を配信する広告として決定し、抽出された広告の数が表示広告数以上である場合には、抽出された広告の中から配信する広告を表示広告数だけランダムに選択したり、広告入札金額の高額なものの順に選択したりと、予め定められた規則に基づいて調整を行う。なお、ここで表示広告数は、ライブの映像・音声のリアルタイム配信と共に配信する広告の数であり、適宜決定できる数である。 Returning to FIG. 9, when the number of advertisements extracted by the advertisement extraction unit 210 is less than a predetermined number of display advertisements, the distribution advertisement determination unit 220 determines all the extracted advertisements as the advertisements to be distributed. When the number of extracted advertisements is equal to or greater than the number of display advertisements, it adjusts the selection according to a predetermined rule, for example by randomly selecting as many advertisements as the number of display advertisements from the extracted ones, or by selecting them in descending order of advertisement bid amount. Here, the number of display advertisements is the number of advertisements distributed together with the real-time distribution of live video and audio, and can be set as appropriate.
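The decision rule of the distribution advertisement determination unit 220 can be sketched as follows. The function name `decide_ads`, the representation of advertisements by their IDs, and the `bids` mapping are assumptions. The two selection rules shown (highest bid first, or random) are the two examples the description gives of a predetermined rule.

```python
import random


def decide_ads(extracted, display_count, bids=None, rng=random):
    """Choose which extracted advertisements to distribute.

    extracted: list of advertisement IDs matched to the important word.
    display_count: the predetermined number of display advertisements.
    bids: optional mapping of advertisement ID to bid amount.
    """
    if len(extracted) < display_count:
        return list(extracted)  # fewer than the limit: distribute all
    if bids is not None:
        # Rule 1: highest bid amounts first.
        ranked = sorted(extracted, key=lambda ad_id: bids[ad_id], reverse=True)
        return ranked[:display_count]
    # Rule 2: random selection without replacement.
    return rng.sample(list(extracted), display_count)


few = decide_ads(["101", "102"], 3)
by_bid = decide_ads(["a", "b", "c", "d"], 3, bids={"a": 10, "b": 40, "c": 30, "d": 20})
```

With two extracted advertisements and a display count of 3, both are distributed unchanged.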

広告配信部２３０は、配信広告決定部２２０により配信することが決定された広告を端末装置２０に配信する。 The advertisement distribution unit 230 distributes the advertisement determined to be distributed by the distribution advertisement determination unit 220 to the terminal device 20.

リアルタイム配信部２４０は、映像音声受信部１２１により受信した映像・音声データを端末装置２０にリアルタイム配信する。 The real-time distribution unit 240 distributes the video / audio data received by the video / audio reception unit 121 to the terminal device 20 in real time.

［広告配信処理のフローチャート］ [Flowchart of advertisement distribution processing]
図１１は、第２実施形態に係るライブ広告配信サーバ２００における、制御装置１０１によって実行される広告配信処理を示すフローチャートである。なお、ライブ広告配信サーバ２００では、広告配信処理とは別に、制御装置１０１により映像・音声のリアルタイム配信処理が行われる。 FIG. 11 is a flowchart showing an advertisement distribution process executed by the control device 101 in the live advertisement distribution server 200 according to the second embodiment. In the live advertisement distribution server 200, the control device 101 performs the real-time distribution of video and audio separately from the advertisement distribution process.

ステップＳ１０１では、制御装置１０１は、ライブ中継送信装置３０から、会話の映像・音声データを受信する。 In step S 101, the control apparatus 101 receives conversation video / audio data from the live relay transmission apparatus 30.

ステップＳ１０２では、制御装置１０１は、ステップＳ１０１にて受信した映像・音声データより音声データを抽出し、この音声データについて発言者を特定する。 In step S102, the control apparatus 101 extracts audio data from the video / audio data received in step S101, and identifies a speaker for the audio data.

ステップＳ１０３では、制御装置１０１は、ステップＳ１０２にて特定した発言者が変わったか否かを判定する。この判定がＹＥＳの場合は、同一発言者による一連の発言が終了したと判断できるので、ステップＳ１０４に移る。一方、判定がＮＯの場合は、同一発言者による発言が継続していると判断できるので、ステップＳ１０１に戻り音声データの受信を継続する。 In step S103, the control apparatus 101 determines whether the speaker specified in step S102 has changed. If this determination is YES, it can be determined that a series of utterances by the same speaker has ended, and the process moves to step S104. On the other hand, if the determination is NO, it can be determined that the utterance by the same speaker is continuing, so the process returns to step S101 to continue receiving the audio data.

ステップＳ１０４では、制御装置１０１は、ステップＳ１０１にて受信した音声データをテキストデータに変換する。 In step S104, the control device 101 converts the voice data received in step S101 into text data.

ステップＳ１０５では、制御装置１０１は、ステップＳ１０４にて変換されたテキストデータを形態素解析し、テキストデータに含まれる語句を抽出する。 In step S105, the control device 101 performs morphological analysis on the text data converted in step S104, and extracts a phrase included in the text data.

ステップＳ１０６では、制御装置１０１は、ステップＳ１０５にて抽出された語句に対して重み付けを行う。具体的には、上述のように、テキストデータの新しさ、発言者の発言回数や音量等により、重要度を調整する。 In step S106, the control apparatus 101 weights the words extracted in step S105. Specifically, as described above, the importance level is adjusted based on the newness of the text data, the number of times the speaker speaks, the volume, and the like.

ステップＳ１０７では、制御装置１０１は、ステップＳ１０６にて重み付けされた語句について、ＴＦ・ＩＤＦ（ｔ）値を算出し、この値に基づいて重要語を抽出する。 In step S107, the control device 101 calculates a TF / IDF (t) value for the word weighted in step S106, and extracts an important word based on this value.

ステップＳ１０８では、制御装置１０１は、ステップＳ１０７により抽出された重要語に応じた広告を記憶装置１０７より抽出する。 In step S108, the control device 101 extracts an advertisement corresponding to the important word extracted in step S107 from the storage device 107.

ステップＳ１０９では、制御装置１０１は、ステップＳ１０８により抽出された広告を端末装置２０に配信する。 In step S109, the control device 101 distributes the advertisement extracted in step S108 to the terminal device 20 .

ステップＳ１１０では、制御装置１０１は、処理を終了するか否かを判定する。具体的には、ライブ中継送信装置３０から終了要求を受信したことにより処理を終了すると判定する。この判定がＹＥＳの場合は処理を終了し、判定がＮＯの場合はステップＳ１０１に戻り、映像・音声データの受信を継続する。 In step S110, the control device 101 determines whether or not to end the process. Specifically, it is determined that the process is to be terminated when a termination request is received from the live relay transmitter 30. If this determination is YES, the process ends. If the determination is NO, the process returns to step S101 to continue receiving video / audio data.

［広告配信の表示例］ [Display example of advertisement distribution]
図１２および図１３は、第２実施形態に係る端末装置２０における広告配信の表示例を示す図である。なお、図１２および図１３の説明において、配信広告決定部２２０における表示広告数を「３」とする。 FIGS. 12 and 13 are diagrams illustrating display examples of advertisement distribution on the terminal device 20 according to the second embodiment. In the description of FIGS. 12 and 13, the number of display advertisements in the distribution advertisement determination unit 220 is assumed to be “3”.

図１２は、ライブ中継の様子を示す図である。図１２では、ＥさんとＦさんとが会話をする様子がライブ中継されており、ライブ中継送信装置３０により、このライブ中継の映像・音声データがライブ広告配信サーバ２００に送信される。ここでは、ＥさんとＦさんの会話の内容をふきだし５５〜５８で示しており、ふきだし５５および５７がＥさんの発言であり、ふきだし５６および５８がＦさんの発言である。 FIG. 12 is a diagram illustrating a state of live relay. In FIG. 12, the state where Mr. E and Mr. F have a conversation is being relayed live, and the live relay transmission device 30 transmits the live relay video / audio data to the live advertisement distribution server 200. Here, the contents of the conversation between Mr. E and Mr. F are indicated by balloons 55 to 58, balloons 55 and 57 are comments of Mr. E, and balloons 56 and 58 are comments of Mr. F.

ライブ広告配信サーバ２００は、映像音声受信部１２１により、ライブ中継の映像・音声データを受信する。そして、人物特定部１３０により、音声データを抽出し、この音声データについて発言者の特定を行う。図１２では、「Ｅさん」と「Ｆさん」との音声データに分離される。そして、音声認識部１４０により、発言者毎に分離された音声データを解析し、それぞれをテキストデータに変換する。そして、変換したテキストデータについて、形態素解析を行い、語句を抽出する。図１２のふきだし５５〜５８によれば、「趣味」、「ドライブ」、「休日」が抽出される。そして、音声認識部１４０により類義語についてテキストデータの変換を行い、重み付け設定部１５０により、テキストデータに対して重み付けを行う。ここでは、ふきだし５５〜５８に示される音声から変換された４つのテキストデータが選択される。 The live advertisement distribution server 200 receives live broadcast video / audio data by the video / audio reception unit 121. Then, the voice data is extracted by the person specifying unit 130, and the speaker is specified for the voice data. In FIG. 12, the voice data of “Mr. E” and “Mr. F” are separated. Then, the speech recognition unit 140 analyzes the speech data separated for each speaker and converts each into text data. Then, morphological analysis is performed on the converted text data to extract words. According to balloons 55 to 58 in FIG. 12, “hobby”, “drive”, and “holiday” are extracted. Then, the speech recognition unit 140 converts the text data for the synonym, and the weighting setting unit 150 weights the text data. Here, four text data converted from the speech shown in speech bubbles 55-58 are selected.

重要語抽出部１６０は、重み付け設定部１５０により選択された４つのテキストデータを参照し、ＴＦ・ＩＤＦ（ｔ）値を算出する。ここでは、第１実施形態の重要語抽出部１６０における例と同様に、所定数「Ｎ＝４」としたとき、ふきだし５８に応じたテキストデータに含まれる語句「ドライブ」については、「ＴＦ（ｔ）＝３」、「ＤＦ（ｔ）＝３」となり、ＴＦ・ＩＤＦ（ｔ）値が最大の語句、すなわち重要語として抽出される。 The keyword extraction unit 160 refers to the four text data selected by the weight setting unit 150 and calculates the TF·IDF(t) values. Here, as in the example for the keyword extraction unit 160 of the first embodiment, when the predetermined number is “N = 4”, the word “drive” included in the text data corresponding to balloon 58 has “TF(t) = 3” and “DF(t) = 3”, and is extracted as the word with the largest TF·IDF(t) value, that is, as the important word.

広告抽出部２１０は、広告テーブル（図１０）を参照し、重要語として抽出された「ドライブ」がキーワードに含まれている広告を抽出する。ここでは、広告ＩＤ「１０１」、「１０２」の広告が抽出される。配信広告決定部２２０は、抽出された広告の数が「２」で、表示広告数、すなわち「３」より小さいので、広告ＩＤ「１０１」、「１０２」の広告が配信される広告として決定される。そして、広告配信部２３０により、広告ＩＤ「１０１」、「１０２」の広告が端末装置２０に配信される。 The advertisement extraction unit 210 refers to the advertisement table (FIG. 10) and extracts the advertisements whose keywords include “drive”, the word extracted as the important word. Here, the advertisements with advertisement IDs “101” and “102” are extracted. Since the number of extracted advertisements is “2”, which is smaller than the number of display advertisements, “3”, the distribution advertisement determination unit 220 determines the advertisements with advertisement IDs “101” and “102” as the advertisements to be distributed. The advertisement distribution unit 230 then distributes the advertisements with advertisement IDs “101” and “102” to the terminal device 20.

図１３は、第２実施形態に係る端末装置２０に対して、重要語に応じた広告が配信されたときの表示例を示す図である。 FIG. 13 is a diagram illustrating a display example when an advertisement corresponding to an important word is distributed to the terminal device 20 according to the second embodiment.

図１３では、端末装置２０に設けられた表示部にブラウザ３０１が表示されている。そして、ブラウザ３０１にライブ映像３０２が表示されているのを確認できる。また、ライブ映像３０２の右部にスポンサー広告として、広告３０３および広告３０４が表示されているのを確認できる。この広告３０３および広告３０４は、広告配信部２３０により、ライブ広告配信サーバ２００から配信された広告であり、広告テーブル（図１０）の広告ＩＤ「１０１」、「１０２」に係る広告内容およびＵＲＬがそれぞれ表示されているのを確認できる。 In FIG. 13, a browser 301 is displayed on the display unit of the terminal device 20, and a live video 302 is displayed in the browser 301. To the right of the live video 302, an advertisement 303 and an advertisement 304 are displayed as sponsor advertisements. The advertisements 303 and 304 are the advertisements distributed from the live advertisement distribution server 200 by the advertisement distribution unit 230, and they show the advertisement content and URL of advertisement IDs “101” and “102” in the advertisement table (FIG. 10), respectively.

このように、ライブ中継の会話における重要語を抽出して、この重要語に応じた広告をライブ中継と共に配信するので、広告がライブ中継に調和し、違和感がない広告表示を実現できる。また、端末装置２０のユーザがライブ中継の出演者のファンである場合には、この広告表示に対してクリックする確率が高い状況、すなわち、高いコンバージョン率を見込むことができる。更に、表示される広告は、ライブ中継の映像・音声と調和した広告であるため、ユーザの印象に残り易いものとなり、高い広告効果が期待できる。 In this way, since an important word in a live relay conversation is extracted and an advertisement corresponding to the important word is distributed together with the live relay, the advertisement is harmonized with the live relay, and an advertisement display without a sense of incongruity can be realized. When the user of the terminal device 20 is a fan of a live broadcast performer, it is possible to expect a situation where the probability of clicking on the advertisement display is high, that is, a high conversion rate. Furthermore, since the displayed advertisement is an advertisement in harmony with the live broadcast video / audio, it tends to remain in the user's impression, and a high advertising effect can be expected.

以上、本発明の実施形態について説明したが、本発明は上述した実施形態に限るものではない。また、本発明の実施形態に記載された効果は、本発明から生じる最も好適な効果を列挙したに過ぎず、本発明による効果は、本発明の実施形態に記載されたものに限定されるものではない。 Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments. The effects described in the embodiments merely list the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

第１実施形態に係るシステムの概要を示す図である。 A diagram showing an overview of the system according to the first embodiment.
第１実施形態に係るサーバ１０のハードウェア構成の一例を示す図である。 A diagram showing an example of the hardware configuration of the server 10 according to the first embodiment.
第１実施形態に係るサーバ１０における、制御装置１０１の主な機能の構成を示す図である。 A diagram showing the configuration of the main functions of the control device 101 in the server 10 according to the first embodiment.
第１実施形態に係る登録ユーザテーブルを示す図である。 A diagram showing the registered user table according to the first embodiment.
第１実施形態に係る会話から生成されるテキストデータの模式図である。 A schematic diagram of text data generated from a conversation according to the first embodiment.
第１実施形態に係る類義語テーブルを示す図である。 A diagram showing the synonym table according to the first embodiment.
第１実施形態に係るサーバ１０における制御装置１０１の処理を示すフローチャートである。 A flowchart showing the processing of the control device 101 in the server 10 according to the first embodiment.
第２実施形態に係るシステムの概要を示す図である。 A diagram showing an overview of the system according to the second embodiment.
第２実施形態に係るライブ広告配信サーバ２００における、制御装置１０１によって実行される主な機能の構成を示す図である。 A diagram showing the configuration of the main functions executed by the control device 101 in the live advertisement distribution server 200 according to the second embodiment.
第２実施形態に係る広告テーブルを示す図である。 A diagram showing the advertisement table according to the second embodiment.
第２実施形態に係るライブ広告配信サーバ２００における、制御装置１０１によって実行される広告配信処理を示すフローチャートである。 A flowchart showing the advertisement distribution process executed by the control device 101 in the live advertisement distribution server 200 according to the second embodiment.
第２実施形態に係るライブ中継の様子を示す図である。 A diagram showing a live relay according to the second embodiment.
第２実施形態に係る端末装置２０に対して、重要語に応じた広告が配信されたときの表示例を示す図である。 A diagram showing a display example when an advertisement corresponding to an important word is distributed to the terminal device 20 according to the second embodiment.

１０サーバ 10 Server
２０端末装置 20 Terminal device
３０ライブ中継送信装置 30 Live relay transmission device
１０１制御装置 101 Control device
１０７記憶装置 107 Storage device
１１０電話番号特定部 110 Telephone number specifying unit
１２０音声受信部 120 Sound receiving unit
１２１映像音声受信部 121 Video/audio receiving unit
１３０人物特定部 130 Person specifying unit
１４０音声認識部 140 Speech recognition unit
１５０重み付け設定部 150 Weight setting unit
１６０重要語抽出部 160 Keyword extraction unit
１７０重要語送信部 170 Keyword transmission unit
２００ライブ広告配信サーバ 200 Live advertisement distribution server
２１０広告抽出部 210 Advertisement extraction unit
２２０配信広告決定部 220 Distribution advertisement determination unit
２３０広告配信部 230 Advertisement distribution unit
２４０リアルタイム配信部 240 Real-time distribution unit

Claims

A server that extracts important words in a conversation, comprising:
receiving means for receiving voice data of the conversation;
separating means for separating the voice data received by the receiving means for each speaker;
conversion means for converting each piece of voice data separated by the separating means into text data;
selection means for selecting, from the text data converted by the conversion means, a predetermined number of text data in order from the newest speech time; and
extraction means for extracting an important word based on an index relating to the appearance frequency of each word included in the text data selected by the selection means,
wherein the conversion means associates volume data indicating the volume of the voice data with the text data, and
the extraction means calculates, as the index, the product of a TF value indicating the number of times each word appears in the predetermined number of text data and an IDF value that is the reciprocal of a DF value indicating the number of text data containing each word, and adjusts the index by weighting according to an importance based on the volume data associated with the text data.
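
As a non-limiting illustration of the index recited in the claim above, the following Python sketch selects the newest utterances, counts TF and DF over them, takes the IDF as the reciprocal of the DF, and scales each occurrence by a weight derived from the volume data. The `Utterance` record, the function name `extract_important_words`, the `weight_of` parameter, the whitespace tokenization, and the choice to apply the weight while accumulating TF are assumptions made for this sketch; the claim does not specify them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    time: float    # speech time (hypothetical unit: seconds)
    text: str      # text data converted from the voice data
    volume: float  # volume data associated with the text data


def extract_important_words(utterances, n_recent=3, top_k=3,
                            weight_of=lambda u: 1.0):
    """Rank words of the newest n_recent utterances by weighted TF x (1/DF)."""
    # Selection: the predetermined number of text data, newest speech time first.
    recent = sorted(utterances, key=lambda u: u.time, reverse=True)[:n_recent]

    tf = Counter()  # weighted number of appearances of each word
    df = Counter()  # number of selected text data containing each word
    for u in recent:
        words = u.text.lower().split()  # assumption: whitespace tokenization
        weight = weight_of(u)           # assumption: importance derived per utterance
        for word in words:
            tf[word] += weight
        for word in set(words):
            df[word] += 1

    # Index = TF x IDF, where the IDF is the reciprocal of the DF.
    scores = {word: tf[word] / df[word] for word in tf}
    # Highest index first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Passing `weight_of=lambda u: u.volume` uses the raw volume as the importance; a real system would need a morphological analyzer rather than `split()` for Japanese text.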

The server according to claim 1, wherein the extraction means increases the importance of text data whose volume data is relatively large among the text data of the same speaker.
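
One possible reading of "relatively large" is a comparison against the same speaker's own average volume. The sketch below, offered only as an illustration, returns one weight per utterance equal to its volume divided by that speaker's mean volume, so louder-than-usual utterances receive a weight above 1. The function name `relative_volume_weights`, the `(speaker, volume)` input shape, and the ratio-to-mean rule are assumptions, and volumes are assumed to be positive.

```python
from collections import defaultdict
from statistics import mean


def relative_volume_weights(utterances):
    """utterances: list of (speaker, volume) pairs with positive volumes.

    Returns one weight per utterance: volume / mean volume of the same speaker.
    """
    by_speaker = defaultdict(list)
    for speaker, volume in utterances:
        by_speaker[speaker].append(volume)
    averages = {speaker: mean(vols) for speaker, vols in by_speaker.items()}
    return [volume / averages[speaker] for speaker, volume in utterances]
```

Normalizing per speaker keeps a naturally loud speaker from dominating the ranking merely because of a higher baseline volume.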

The server according to claim 1, further comprising:
specifying means for specifying, in response to a processing start request from a terminal device connected via a network, the terminal device;
connection means for establishing a data communication connection with the terminal device specified by the specifying means; and
transmission means for transmitting the important word extracted by the extraction means to the terminal device with which the data communication connection has been established by the connection means.

The server according to claim 1, further comprising:
real-time distribution means for distributing video and audio of the conversation in real time to a terminal device connected via a network;
advertisement acquisition means for acquiring advertisement information corresponding to the important word extracted by the extraction means; and
advertisement distribution means for distributing the advertisement information acquired by the advertisement acquisition means to the terminal device in conjunction with the real-time distribution.
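
The acquisition of advertisement information from important words can be pictured as a keyed lookup. The following sketch is illustrative only: the function name `select_advertisements`, the dictionary-shaped advertisement table, and the advertisement identifiers used in any example are hypothetical, and the claim does not state how many advertisements are chosen or how ties are resolved.

```python
def select_advertisements(important_words, ad_table, limit=1):
    """Return up to `limit` advertisements matching the important words.

    important_words: words ordered from the highest index to the lowest.
    ad_table: hypothetical mapping from a keyword to a list of advertisement IDs.
    """
    selected = []
    for word in important_words:
        for ad in ad_table.get(word, []):
            if ad not in selected:  # avoid distributing the same ad twice
                selected.append(ad)
    return selected[:limit]
```

Because the words are visited in ranked order, advertisements tied to the most important word are preferred when `limit` is small.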

The server according to any one of claims 1 to 4, wherein the selection means, upon detecting that the text data includes a predetermined type of word, selects the text data spoken after the predetermined type of word.
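
A minimal sketch of this selection rule follows, assuming that the predetermined type of word is a topic-changing phrase and that the utterance containing it is itself kept. Both points are assumptions, as are the function name `select_after_trigger` and the `(time, text)` input shape. `n_recent` is assumed to be at least 1.

```python
def select_after_trigger(utterances, trigger_words, n_recent):
    """Select up to n_recent newest utterances, ignoring those before the latest trigger.

    utterances: list of (time, text) pairs in any order.
    trigger_words: lowercase phrases treated as the predetermined type of word.
    """
    ordered = sorted(utterances, key=lambda u: u[0])
    start = 0
    for i, (_, text) in enumerate(ordered):
        if any(trigger in text.lower() for trigger in trigger_words):
            start = i  # remember the latest utterance containing a trigger
    # Keep only utterances from the trigger onward, then cap at n_recent.
    return ordered[start:][-n_recent:]
```

When no trigger is found, the function falls back to the plain "newest `n_recent` utterances" selection.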

The server according to any one of claims 1 to 5, further comprising a synonym database for determining synonyms of each word included in the text data,
wherein the extraction means calculates the index relating to the appearance frequency of each word with the synonyms stored in the synonym database counted together.
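
Counting synonyms together can be sketched by mapping every word to a representative form before counting. In the illustration below, the function name `count_with_synonyms` and the dictionary-shaped synonym table (word to representative word) are assumptions; the actual layout of the synonym database is not given in the claims.

```python
from collections import Counter


def count_with_synonyms(words, synonym_table):
    """Count word appearances after folding each word into its representative.

    synonym_table: hypothetical mapping from a word to its representative word;
    words without an entry are counted as themselves.
    """
    return Counter(synonym_table.get(word, word) for word in words)
```

The resulting counts can feed the TF and DF values, so that a topic mentioned under several names is not split across separate low-scoring entries.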

The server according to any one of claims 1 to 6, wherein the extraction means weights the index of each word included in the text data of a speaker based on the number of text data having the same speaker among the text data selected by the selection means.
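
The claim does not state whether a speaker with more utterances is weighted up or down. The sketch below assumes, purely for illustration, that the weight equals the speaker's share of the selected utterances, so frequent speakers count more. The function name `speaker_share_weights` is hypothetical.

```python
from collections import Counter


def speaker_share_weights(speakers):
    """speakers: speaker label of each selected utterance, in order.

    Returns one weight per utterance: the fraction of selected utterances
    spoken by the same speaker (assumed direction: more utterances, more weight).
    """
    counts = Counter(speakers)
    total = len(speakers)
    return [counts[speaker] / total for speaker in speakers]
```

If the opposite behavior is wanted, for example to surface words from quieter participants, the ratio can simply be inverted.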

A system in which a terminal device that receives the voice of a conversation and a server capable of data communication with the terminal device via a network extract important words in the conversation, wherein
the server comprises:
specifying means for specifying the terminal device in response to a processing start request from the terminal device;
connection means for establishing a data communication connection with the terminal device specified by the specifying means;
receiving means for receiving voice data of the conversation;
separating means for separating the voice data received by the receiving means for each speaker;
conversion means for converting each piece of voice data separated by the separating means into text data;
selection means for selecting, from the text data converted by the conversion means, a predetermined number of text data in order from the newest speech time;
extraction means for extracting an important word based on an index relating to the appearance frequency of each word included in the text data selected by the selection means; and
transmission means for transmitting the important word extracted by the extraction means to the terminal device with which the data communication connection has been established by the connection means,
the terminal device comprises
display means for displaying the important word transmitted by the transmission means,
the conversion means associates volume data indicating the volume of the voice data with the text data, and
the extraction means calculates, as the index, the product of a TF value indicating the number of times each word appears in the predetermined number of text data and an IDF value that is the reciprocal of a DF value indicating the number of text data containing each word, and adjusts the index by weighting according to an importance based on the volume data associated with the text data.

The system according to claim 8, wherein the terminal device further comprises storage means for storing the important words transmitted by the transmission means in time series.

A method by which a computer extracts important words in a conversation, the method comprising:
a reception step of receiving voice data of the conversation;
a separation step of separating the voice data received in the reception step for each speaker;
a conversion step of converting each piece of voice data separated in the separation step into text data;
a selection step of selecting, from the text data converted in the conversion step, a predetermined number of text data in order from the newest speech time; and
an extraction step of extracting an important word based on an index relating to the appearance frequency of each word included in the text data selected in the selection step,
wherein, in the conversion step, volume data indicating the volume of the voice data is associated with the text data, and
in the extraction step, the product of a TF value indicating the number of times each word appears in the predetermined number of text data and an IDF value that is the reciprocal of a DF value indicating the number of text data containing each word is calculated as the index, and the index is adjusted by weighting according to an importance based on the volume data associated with the text data.

A program that causes a computer to extract important words in a conversation, the program causing the computer to execute:
a reception step of receiving voice data of the conversation;
a separation step of separating the voice data received in the reception step for each speaker;
a conversion step of converting each piece of voice data separated in the separation step into text data;
a selection step of selecting, from the text data converted in the conversion step, a predetermined number of text data in order from the newest speech time; and
an extraction step of extracting an important word based on an index relating to the appearance frequency of each word included in the text data selected in the selection step,
wherein, in the conversion step, volume data indicating the volume of the voice data is associated with the text data, and
in the extraction step, the product of a TF value indicating the number of times each word appears in the predetermined number of text data and an IDF value that is the reciprocal of a DF value indicating the number of text data containing each word is calculated as the index, and the index is adjusted by weighting according to an importance based on the volume data associated with the text data.