JP2019110451A

JP2019110451A - Information processing system, information processing method, and program

Info

Publication number: JP2019110451A
Application number: JP2017242498A
Authority: JP
Inventors: Kazuma Umezu (梅津和真)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2017-12-19
Filing date: 2017-12-19
Publication date: 2019-07-04
Anticipated expiration: 2037-12-19
Also published as: JP7052335B2

Abstract

PROBLEM TO BE SOLVED: To reduce the psychological stress on a listener during a call.
SOLUTION: A call unit 10 includes: a voice reception unit 11 that collects sound and generates first voice data; a transmission unit 12 that transmits the first voice data to the other party's terminal; a reception unit 13 that acquires second voice data transmitted from the other party's terminal; and an output unit 14 that outputs the content of the other party's speech. The output unit 14 has a first mode in which the other party's speech is displayed as text instead of being output as sound, a second mode in which the other party's speech is output as sound based on processed second voice data, or a third mode in which the other party's speech is output using a pre-registered sound. While the output unit 14 operates in any of the first to third modes, the voice reception unit 11 continues to collect sound and generate the first voice data, and the transmission unit 12 continues to transmit the first voice data to the other party's terminal.
SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing system, an information processing method, and a program.

Patent Document 1 discloses an information processing terminal that estimates the emotion of the other party to a call.

Patent Document 2 discloses a printing system that prints a document with the portions matching pre-registered keywords made illegible.

Patent Document 3 discloses a portable communication terminal with a character mode in which words entered as text are converted into stored voice and output to the other party's portable communication terminal. It also discloses that, in the character mode, the other party's voice may be converted into text and shown on the display.

Patent Document 4 discloses a portable communication terminal that encodes a message entered as text into voice data and outputs the voice data to the other party's portable communication terminal. It also discloses that the other party's voice may be converted into text and shown on the display.

Patent Document 1: WO 2007/069361
Patent Document 2: JP 2010-141854 A
Patent Document 3: JP 2009-44679 A
Patent Document 4: JP 2006-295468 A

When the other party on a telephone call uses rough language or speaks roughly, for example, the listener may experience psychological stress. Patent Documents 1 to 4 neither describe nor suggest this problem. An object of the present invention is to reduce the psychological stress on a listener during a call.

According to the present invention, there is provided an information processing system comprising:
voice reception means for collecting sound and generating first voice data;
transmission means for transmitting the first voice data to a terminal of the other party to a call;
reception means for acquiring second voice data transmitted from the terminal of the other party; and
output means for outputting the content of the other party's speech,
wherein the output means has
a first mode in which the content of the other party's speech is displayed as text without being output as sound,
a second mode in which the content of the other party's speech is output as sound based on the second voice data after processing, or
a third mode in which the content of the other party's speech is output using a pre-registered sound, and
wherein, while the output means is operating in any one of the first to third modes, the voice reception means collects sound and generates the first voice data, and the transmission means transmits the first voice data to the terminal of the other party.

According to the present invention, there is also provided an information processing method in which a computer:
collects sound and generates first voice data;
transmits the first voice data to a terminal of the other party to a call;
acquires second voice data transmitted from the terminal of the other party;
outputs the content of the other party's speech in a first mode in which the content is displayed as text without being output as sound, a second mode in which the content is output as sound based on the second voice data after processing, or a third mode in which the content is output using a pre-registered sound; and,
while outputting the content of the other party's speech in any one of the first to third modes, collects sound, generates the first voice data, and transmits the first voice data to the terminal of the other party.

According to the present invention, there is further provided a program causing a computer to execute processing of:
collecting sound and generating first voice data;
transmitting the first voice data to a terminal of the other party to a call;
acquiring second voice data transmitted from the terminal of the other party;
outputting the content of the other party's speech in a first mode in which the content is displayed as text without being output as sound, a second mode in which the content is output as sound based on the second voice data after processing, or a third mode in which the content is output using a pre-registered sound; and,
while outputting the content of the other party's speech in any one of the first to third modes, collecting sound, generating the first voice data, and transmitting the first voice data to the terminal of the other party.

According to the present invention, the psychological stress on the listener during a call can be reduced.

FIG. 1 is a diagram showing an example of a functional block diagram of the telephone system 1 of the present embodiment.
FIG. 2 is a diagram showing an example of a functional block diagram of the call unit 10 of the present embodiment.
FIG. 3 is a diagram schematically showing an example of information output by the call unit 10 of the present embodiment.
FIG. 4 is a diagram schematically showing an example of information output by the call unit 10 of the present embodiment.
FIG. 5 is a diagram schematically showing an example of information output by the call unit 10 of the present embodiment.
FIG. 6 is a diagram showing an example of a functional block diagram of the call unit 10 of the present embodiment.
FIG. 7 is a diagram schematically showing an example of information collected by the telephone system 1 of the present embodiment.
FIG. 8 is a flowchart showing an example of the processing flow of the call unit 10 of the present embodiment.
FIG. 9 is a diagram showing an example of the hardware configuration of the apparatus of the present embodiment.

First Embodiment
First, an overview of the telephone system (information processing system) of the present embodiment will be described. A person making a call using the telephone system of the present embodiment is referred to as the "user", and the person on the other end of the call is referred to as the "other party". The telephone system of the present embodiment has any one of the first to third modes.

In the first mode, the content of the other party's speech is displayed as text instead of being output as sound.

In the second mode, the content of the other party's speech is output as sound from the speaker, based on voice data obtained by processing the voice data transmitted from the other party's terminal. For example, a voice changer alters the other party's voice before it is output.

In the third mode, the content of the other party's speech is output from the speaker using a pre-registered sound. That is, the content is output not in the other party's own voice but in, for example, another person's voice or a computer-generated voice.

While operating in any of the first to third modes, the telephone system of the present embodiment still collects the user's voice with a microphone, generates voice data, and transmits that voice data to the other party's terminal. The user can therefore convey what they say to the other party simply by speaking into the telephone system, even while it is operating in one of these modes.

With the telephone system of the present embodiment, the user can talk with the other party without hearing the other party's voice. This reduces the psychological stress the user experiences even when the other party uses rough language or speaks violently.

In addition, the user can convey their own speech to the other party in the conventional way, by speaking into the telephone system, so the call can continue naturally. With the techniques described in Patent Documents 3 and 4, when the other party's voice is converted into text and shown on the display, the user must type what they want to say. This creates awkward pauses and makes the conversation unnatural. The telephone system of the present embodiment alleviates this problem.

Next, the configuration of the telephone system of the present embodiment will be described in detail. As shown in the functional block diagram of FIG. 1, the telephone system 1 has a call unit 10 and a processing unit 20.

The call unit 10 and the processing unit 20 may be provided in physically and/or logically separate devices, or in a single physical and/or logical device. In the former case, the call unit 10 may be provided in a call terminal such as a telephone, a mobile phone, a smartphone, or a personal computer, and the processing unit 20 may be provided in a server apparatus configured to communicate with the call terminal (e.g., an IP-PBX (Internet Protocol Private Branch eXchange)). In the latter case, both the call unit 10 and the processing unit 20 may be provided in the call terminal.

First, the functional configuration of the processing unit 20 will be described. The processing unit 20 receives voice data transmitted from the other party's terminal, performs predetermined processing on it, and transmits the result to the call unit 10. In the case of a videophone call or the like, the processing unit 20 may additionally receive image data transmitted from the other party's terminal, perform predetermined processing on the image data, and transmit the result to the call unit 10.

The other party's terminal is any terminal with a call function, such as a telephone, a mobile phone, a smartphone, or a personal computer. The voice data transmitted from the other party's terminal is collected and generated by that terminal during the call, and the image data transmitted from it is captured and generated by that terminal during the call.

The predetermined processing performed by the processing unit 20 is described below. For example, the processing unit 20 may perform voice recognition processing on the voice data to generate text data indicating the content of the other party's speech, and transmit the text data to the call unit 10.

The processing unit 20 may also process the text data to determine whether the other party's speech contains any pre-registered prohibited word. If it does, the processing unit 20 may generate text data in which the prohibited-word portions are masked, and transmit that text data to the call unit 10. For example, a prohibited-word portion may be replaced with a predetermined phrase such as "abusive language" or "prohibited word", or masked by some other method.

The processing unit 20 may also count the number of prohibited words (number of occurrences) in the other party's speech and transmit that number to the call unit 10.
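The masking and counting steps can be sketched together with a single regular-expression substitution. This is a minimal illustration, assuming the prohibited words are held as a plain list of strings and the recognized speech is plain text; the function name `mask_prohibited` and the placeholder `[abusive language]` are illustrative, not taken from the source.

```python
import re
from typing import List, Tuple

# Illustrative placeholder; the source mentions phrases such as "abusive language".
MASK = "[abusive language]"

def mask_prohibited(text: str, prohibited: List[str], mask: str = MASK) -> Tuple[str, int]:
    """Mask every occurrence of a registered prohibited word in `text`.

    Returns the masked text and the number of occurrences that were masked.
    Plain substring matching is used (no word boundaries), which also works
    for languages such as Japanese that do not separate words with spaces.
    """
    if not prohibited:
        return text, 0
    # Try longer words first so that a longer entry is not cut short by a
    # shorter entry that happens to be its prefix.
    words = sorted(prohibited, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    # re.Pattern.subn returns (new_string, number_of_substitutions).
    return pattern.subn(mask, text)
```

For example, `mask_prohibited("You idiot, you absolute IDIOT", ["idiot"])` returns `("You [abusive language], you absolute [abusive language]", 2)`; the count can then be sent to the call unit alongside the masked text.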

The processing unit 20 may also perform voice recognition processing on the voice data to determine whether the other party is a pre-registered person requiring caution, and transmit the determination result to the call unit 10. The processing unit 20 may also use the image data in this determination.

The processing unit 20 may also analyze the voice data to determine the other party's emotion and transmit the determination result to the call unit 10. The processing unit 20 may also use the image data in this determination.

The processing unit 20 may also calculate the other party's degree of anger by a predetermined calculation method based on the other party's emotion, the number of occurrences of prohibited words, the loudness of the other party's voice, the call duration, and the like, and transmit the calculated degree of anger to the call unit 10. For example, the processing unit 20 may set the degree of anger to "0" when the other party's emotion is not "anger", and calculate a degree of anger of "1" or more when it is "anger". In the latter case, the processing unit 20 may calculate a higher degree of anger the more often prohibited words appear, the louder the voice, and the longer the call.
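The source only fixes the shape of the calculation (0 when the emotion is not anger, at least 1 when it is, and increasing with each factor), so the weights below are purely hypothetical. This sketch shows one additive formula that satisfies those constraints; the names `CallObservation` and `anger_degree` are illustrative.

```python
from dataclasses import dataclass

@dataclass
class CallObservation:
    emotion: str           # label from the emotion analysis, e.g. "anger"
    prohibited_count: int  # occurrences of prohibited words so far
    loudness_db: float     # average level of the other party's voice
    call_seconds: float    # elapsed call time

# Hypothetical tuning constants: the source only says the degree grows with
# each factor, not by how much.
WEIGHT_PER_WORD = 1.0
WEIGHT_PER_DB = 0.1
REFERENCE_DB = 60.0        # loudness at or below this adds nothing
SECONDS_PER_POINT = 60.0   # one extra point per minute of call

def anger_degree(obs: CallObservation) -> float:
    """Return 0 when the emotion is not anger; otherwise 1 plus non-negative
    terms that grow with prohibited words, loudness, and call duration."""
    if obs.emotion != "anger":
        return 0.0
    degree = 1.0
    degree += WEIGHT_PER_WORD * obs.prohibited_count
    degree += WEIGHT_PER_DB * max(0.0, obs.loudness_db - REFERENCE_DB)
    degree += obs.call_seconds / SECONDS_PER_POINT
    return degree
```

An additive form keeps each factor's contribution easy to inspect and tune; a real system might instead learn the weights from labeled calls.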

The processing unit 20 may also process the voice data with a voice changer and transmit the processed voice data to the call unit 10.

The processing unit 20 may also process the image data and transmit the processed image data to the call unit 10. For example, the processing unit 20 may blur the other party's face in the image, or replace it with another image (e.g., an animation, an animal's face, or another person's face).
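One common way to obscure a face is a mosaic (pixelation) over the face rectangle. The sketch below assumes the rectangle has already been found by some face detector (not shown, and not specified by the source) and works on a grayscale image stored as a list of rows; `pixelate_region` is an illustrative name, and a production system would use an image-processing library instead.

```python
from typing import List

Image = List[List[int]]  # grayscale, image[row][col], values 0-255

def pixelate_region(img: Image, top: int, left: int, height: int, width: int,
                    block: int = 8) -> Image:
    """Return a copy of `img` in which the given rectangle is replaced by a
    mosaic of `block` x `block` tiles, each filled with its mean value."""
    out = [row[:] for row in img]
    if not img or block <= 0:
        return out
    # Clip the rectangle to the image bounds.
    bottom = min(top + height, len(img))
    right = min(left + width, len(img[0]))
    top, left = max(0, top), max(0, left)
    for ty in range(top, bottom, block):
        for tx in range(left, right, block):
            ys = range(ty, min(ty + block, bottom))
            xs = range(tx, min(tx + block, right))
            mean = sum(img[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

The input image is left untouched, so the unprocessed second image data remains available if the user later switches back to a normal display.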

The processing unit 20 may also translate the text data indicating the content of the other party's speech into another language (e.g., from Japanese into English).

Next, the functional configuration of the call unit 10 will be described. As shown in FIG. 2, the call unit 10 includes a voice reception unit 11, a transmission unit 12, a reception unit 13, and an output unit 14.

The voice reception unit 11 has a microphone and collects sound during a call to generate voice data, referred to as first voice data. In this way, the voice reception unit 11 collects the user's voice.

The transmission unit 12 transmits the first voice data to the other party's terminal during a call. The telephone system 1 may also have a camera, in which case the transmission unit 12 may transmit image data generated by the camera during the call (image data capturing the user) to the other party's terminal. Image data generated by the camera during a call is referred to as first image data.

While the output unit 14, described below, is operating in any one of the first to third modes, the voice reception unit 11 still collects sound and generates the first voice data, and the transmission unit 12 still transmits the first voice data (optionally together with the first image data) to the other party's terminal.
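The key property here is that the output mode affects only the downlink (how received speech is presented), never the uplink. A minimal sketch, assuming the network send is an injected callable; the class name `CallUnitSketch` and the mode strings are hypothetical.

```python
from typing import Callable

class CallUnitSketch:
    """Minimal sketch: the output mode only affects how received speech is
    presented; the microphone-to-network path never looks at it."""

    def __init__(self, send_to_peer: Callable[[bytes], None]) -> None:
        self._send_to_peer = send_to_peer
        self.output_mode = "first"  # "first", "second" or "third"

    def on_microphone_frame(self, first_voice_data: bytes) -> None:
        # Transmitted unconditionally, whatever output_mode is set to.
        self._send_to_peer(first_voice_data)
```

Keeping the uplink free of any mode check is what lets the user keep speaking naturally while the other party's voice is suppressed.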

The reception unit 13 acquires voice data transmitted from the other party's terminal, referred to as second voice data. In addition to or instead of the second voice data, the reception unit 13 may acquire second voice data processed by the processing unit 20.

The reception unit 13 may also acquire image data transmitted from the other party's terminal, referred to as second image data. In addition to or instead of the second image data, the reception unit 13 may acquire second image data processed by the processing unit 20.

During a call, the telephone system 1 can transmit and receive voice data and image data using a protocol such as SIP (Session Initiation Protocol).

The output unit 14 outputs the content of the other party's speech via an output device such as a speaker or a display. The output unit 14 has the first mode, the second mode, or the third mode, and may have two or more of these three modes.

In the first mode, the output unit 14 displays the content of the other party's speech as text without outputting it as sound. For example, the output unit 14 shows text indicating the other party's speech on a display included in or connected to the telephone system 1, using the text data generated by the processing unit 20.

FIG. 3 shows an example of text displayed on the display by the output unit 14. In the figure, the other party's utterances are listed in the order in which they were spoken.

As shown in FIG. 4, the output unit 14 may display the user's own speech on the display in addition to the other party's speech. In this case, the processing unit 20 may perform voice recognition on the first voice data and generate text data indicating the content of the user's speech. In the figure, the utterances of the other party and of the user ("Self" in the figure) are listed in the order in which they were spoken.

As shown in FIG. 5, the output unit 14 may also display the other party's speech as text with the prohibited-word portions masked (shown as "Abusive language" in the figure). The output unit 14 can do so using the masked text data generated by the processing unit 20. A masked prohibited word may be revealed in response to a predetermined operation (e.g., clicking on the masked portion).

In the second mode, the output unit 14 outputs the content of the other party's speech from the speaker as sound, based on second voice data processed by a voice changer. The output unit 14 may use second voice data that the processing unit 20 has processed with a voice changer. Alternatively, the output unit 14 may itself have a voice changer, process the second voice data with it, and output sound based on the processed second voice data.
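The source does not say how the voice changer works. As one simple illustration, resampling the waveform by a constant factor with linear interpolation shifts the pitch, but it also changes the duration. Practical voice changers usually preserve timing with pitch-shifting techniques such as PSOLA, so treat this as a naive, offline sketch with a hypothetical function name `change_voice`.

```python
from typing import List

def change_voice(samples: List[float], pitch_factor: float) -> List[float]:
    """Naive voice changer: resample by `pitch_factor` using linear
    interpolation. A factor above 1 raises the pitch and shortens the audio;
    a factor below 1 lowers the pitch and lengthens it."""
    if pitch_factor <= 0:
        raise ValueError("pitch_factor must be positive")
    n_out = int(len(samples) / pitch_factor)
    out: List[float] = []
    for i in range(n_out):
        pos = i * pitch_factor      # fractional read position in the input
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[j + 1] if j + 1 < len(samples) else a
        out.append(a + (b - a) * frac)
    return out
```

For example, `change_voice([0.0, 1.0, 2.0, 3.0], 2.0)` returns `[0.0, 2.0]`: every second sample is kept, which doubles the pitch when played back at the original sample rate.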

In the third mode, the output unit 14 outputs the content of the other party's speech from the speaker using a pre-registered sound. Specifically, the output unit 14 reads out the words contained in the text data generated by the processing unit 20 (that is, the content of the other party's speech) in the pre-registered sound.
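The three modes amount to routing each utterance of the other party to a different output path. In this sketch, `display`, `play`, and `speak` are injected callables standing in for the display, the speaker, and a text-to-speech engine configured with the pre-registered voice; all names are hypothetical. The optional text display during the second and third modes follows the next paragraph of the description.

```python
from enum import Enum
from typing import Callable

class OutputMode(Enum):
    FIRST = 1   # display text only, no sound
    SECOND = 2  # play processed (e.g. voice-changed) second voice data
    THIRD = 3   # speak the recognized text with a pre-registered voice

def present_utterance(mode: OutputMode, text: str, processed_audio: bytes,
                      display: Callable[[str], None],
                      play: Callable[[bytes], None],
                      speak: Callable[[str], None],
                      show_text_too: bool = False) -> None:
    """Route one utterance of the other party to the output devices.

    The unprocessed second voice data is deliberately not a parameter, so
    no branch can play the other party's real voice.
    """
    if mode is OutputMode.FIRST:
        display(text)
        return
    if mode is OutputMode.SECOND:
        play(processed_audio)
    elif mode is OutputMode.THIRD:
        speak(text)
    if show_text_too:
        display(text)
```

Leaving the raw voice data out of the function signature enforces at the interface level that none of the three modes can reproduce the other party's actual voice.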

Even while operating in the second or third mode, the output unit 14 may also display the content of the other party's speech as text on the display. In this case as well, the prohibited-word portions may be masked.

When image data is transmitted from the other party's terminal, as in a videophone call, the output unit 14 may stop displaying the image while operating in any of the first to third modes, or may display an image using second image data processed by the processing unit 20 (e.g., image data in which the other party's face has been blurred or replaced with another image).

While operating in any one of the first to third modes, the output unit 14 may also output information indicating the other party's emotion. For example, characters, pictures, or graphics indicating the other party's emotion may be shown on the display.

When operating in the first mode, the output unit 14 may notify the user of the moment at which the other party stops speaking. For example, at that moment the output unit 14 may emit a predetermined sound from the speaker, show predetermined information on the display, turn on a warning lamp, cause a vibrator to vibrate, or notify the user in some other way.
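The source does not specify how the end of the other party's utterance is detected. A simple energy-based approach, sketched here under that assumption, marks a pause once speech has been heard and then a run of consecutive low-energy frames follows. The threshold of 0.02 RMS (for samples in the range -1 to 1) and the default of 25 silent frames (about 0.5 s at 20 ms per frame) are hypothetical values.

```python
import math
from typing import List, Sequence

def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (0.0 for an empty frame)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0

def pause_points(frames: Sequence[Sequence[float]], threshold: float = 0.02,
                 min_silent_frames: int = 25) -> List[int]:
    """Return indices of frames at which the other party is judged to have
    stopped talking: speech was seen, then `min_silent_frames` consecutive
    frames fell below `threshold`."""
    points: List[int] = []
    in_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            in_speech = True
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run == min_silent_frames:
                points.append(i)   # moment to notify the user
                in_speech = False  # wait for new speech before firing again
    return points
```

Each returned index is a point at which the output unit could trigger one of the notifications listed above. Detection on the receiving side is fine here because the second voice data is received even when it is not played.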

In this case, when the other party stops speaking, the output unit 14 may transmit pre-recorded voice data of the user's backchannel responses (short acknowledgements such as "I see") to the other party's terminal. Multiple types of backchannel voice data may be prepared, and a computer (AI: artificial intelligence) may decide which type to transmit based on what the other party has just said.
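The source leaves the choice of backchannel to an AI. As a plainly simpler stand-in, this sketch uses a keyword rule to pick between two pre-recorded clips of the user's own voice. The clip paths, the clip categories, and the keyword list are all hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical file names for clips of the user's own pre-recorded voice.
BACKCHANNEL_CLIPS: Dict[str, str] = {
    "apology": "clips/im_sorry.wav",
    "acknowledge": "clips/i_see.wav",
}

# Hypothetical keywords; a trained classifier (the "AI" in the text) would
# replace this rule.
COMPLAINT_KEYWORDS: Sequence[str] = ("broken", "late", "unacceptable", "refund")

def pick_backchannel(last_utterance: str) -> str:
    """Choose which pre-recorded clip to send once the other party pauses."""
    lowered = last_utterance.lower()
    if any(word in lowered for word in COMPLAINT_KEYWORDS):
        return BACKCHANNEL_CLIPS["apology"]
    return BACKCHANNEL_CLIPS["acknowledge"]
```

A rule like this is easy to audit, but it cannot capture nuance. That limitation is presumably why the description leaves the decision to a learned model.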

The output unit 14 may also display the text of the other party's speech in a language different from the one the other party is speaking. For example, when the other party speaks in English, the content may be displayed as Japanese text.

As described above, the telephone system 1 of the present embodiment, which has any one of the first to third modes, allows the user to talk with the other party without hearing the other party's voice. This reduces the psychological stress caused by another party who uses rough language or speaks roughly.

In addition, the user can convey their own speech to the other party in the conventional way, by speaking into the telephone system, so the call can continue naturally.

For videophone calls, in the first to third modes the telephone system 1 can stop displaying the other party's image, or blur the other party's face or replace it with another image, instead of displaying the image as is. This further reduces the user's psychological stress.

In the first to third modes, however, it becomes difficult for the user to perceive the other party's emotions. By notifying the user of the other party's emotion as determined by the processing unit 20, the telephone system 1 lets the user grasp that emotion and, as a result, communicate appropriately.

When operating in the first mode, it is also difficult for the user to tell when the other party has stopped speaking. This can lead to undesirable situations such as awkward pauses during the call, or the user talking while the other party is still speaking. Notifying the user when the other party stops speaking, or outputting pre-recorded backchannel responses, reduces such situations.

Second Embodiment
The telephone system 1 of the present embodiment also has a normal mode in which the content of the other party's speech is output from the speaker in the other party's own voice. The telephone system 1 switches between the normal mode and a special mode (any one of the first to third modes) at a predetermined timing. The functions of the telephone system 1 are described in detail below.

The functions of the processing unit 20 are the same as in the first embodiment.

FIG. 6 shows an example of a functional block diagram of the call unit 10. The call unit 10 includes a voice reception unit 11, a transmission unit 12, a reception unit 13, an output unit 14, and a determination unit 15. The functions of the voice reception unit 11, the transmission unit 12, and the reception unit 13 are the same as in the first embodiment.

The output unit 14 has a normal mode and a special mode (any one of the first to third modes). In the normal mode, the output unit 14 outputs the other party's speech content from the speaker in the other party's own voice.

The determination unit 15 determines the mode of the output unit 14, and the output unit 14 operates in the mode determined by the determination unit 15.

[Determination method 1]
The determination unit 15 may determine the mode of the output unit 14 based on at least one of the following, identified from the second voice data: the other party's emotion, the loudness of the other party's voice, the content of the other party's speech, and the call duration.

For example, the determination unit 15 may select the special mode when one of the following conditions, or a combination of several of them, is satisfied:
- the other party's emotion is a predetermined emotion (e.g., "anger");
- the loudness of the other party's voice is equal to or greater than a threshold;
- the other party has uttered a specific keyword (prohibited word) a predetermined number of times or more;
- the call duration is equal to or greater than a threshold.
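In its simplest form, determination method 1 evaluates a set of conditions on the ongoing call and selects the special mode if any of them holds. The minimal Python sketch below assumes that the emotion label, loudness, prohibited-word count and call duration have already been extracted from the second voice data, for example by the processing unit 20. The class, function and threshold names, and the threshold values, are hypothetical. The source specifies only "a predetermined emotion" and unspecified thresholds.

```python
from dataclasses import dataclass

# Hypothetical values; the source only speaks of "predetermined" values and thresholds.
TRIGGER_EMOTION = "anger"
LOUDNESS_THRESHOLD_DB = 70.0
PROHIBITED_WORD_LIMIT = 3
DURATION_THRESHOLD_S = 600.0


@dataclass
class CallState:
    emotion: str           # emotion label estimated from the second voice data
    loudness_db: float     # current loudness of the other party's voice
    prohibited_words: int  # prohibited words uttered so far in this call
    duration_s: float      # elapsed call time in seconds


def decide_mode(state: CallState) -> str:
    """Return "special" if any determination-method-1 condition holds, else "normal"."""
    conditions = [
        state.emotion == TRIGGER_EMOTION,
        state.loudness_db >= LOUDNESS_THRESHOLD_DB,
        state.prohibited_words >= PROHIBITED_WORD_LIMIT,
        state.duration_s >= DURATION_THRESHOLD_S,
    ]
    return "special" if any(conditions) else "normal"
```

To require a combination of conditions instead of a single one, replace `any` with `all` over the chosen subset.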

[Determination method 2]
The determination unit 15 may determine the mode of the output unit 14 based on information collected during past calls.

FIG. 7 schematically shows an example of the information collected during past calls. The illustrated collected information associates a caller ID (identifier), the caller's attributes, the call date and time, and the call features with one another.

The caller ID may be one or more of the other party's telephone number, an ID registered in a telephone application, and a feature value (voiceprint) extracted from the second voice data, or it may be some other identifier. The processing unit 20 may analyze the second voice data to extract this feature value.

Examples of the other party's attributes include gender, age group and accent characteristics. The processing unit 20 may analyze the second voice data to estimate these attributes.

The call features characterize calls with the other party. Examples include:
- the call duration;
- whether the output unit 14 operated in the special mode;
- the proportion of the call duration spent in the special mode;
- whether the other party's emotion became "anger";
- the proportion of the call duration during which the other party's emotion was "anger";
- whether the loudness of the other party's voice reached or exceeded a threshold;
- the proportion of the call duration during which it did so;
- whether the other party uttered a prohibited word;
- the number of times the other party uttered a prohibited word;
- the degree of anger.

The processing unit 20 may analyze the second voice data to generate these call features.

The determination unit 15 can determine the mode of the output unit 14 based on this collected information. For example, the determination unit 15 may select the special mode when the other party's past call features satisfy one of the following conditions, or a combination of several of them:

"The output unit 14 operated in the special mode"
"The proportion of the call duration spent in the special mode is equal to or greater than a threshold"
"The other party's emotion became 'anger'"
"The proportion of the call duration during which the other party's emotion was 'anger' is equal to or greater than a threshold"
"The loudness of the other party's voice reached or exceeded a threshold"
"The proportion of the call duration during which the loudness of the other party's voice was at or above the threshold is equal to or greater than a threshold"
"The other party uttered a prohibited word"
"The number of times the other party uttered a prohibited word is equal to or greater than a threshold"
"The other party's degree of anger reached or exceeded a threshold"

If the other party has made several past calls, the determination unit 15 may select the special mode when the above condition is satisfied in at least a predetermined proportion of those calls.
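Determination method 2, including the "predetermined proportion of past calls" rule, could be sketched in Python as shown below. The sketch assumes that the collected information has been reduced to a per-caller list of call-feature records. It uses only three of the call features listed above. The names `PastCall`, `call_meets_condition`, `decide_mode_from_history` and all threshold values are hypothetical illustrations, not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical thresholds, for illustration only.
SPECIAL_RATIO_THRESHOLD = 0.3    # share of call time spent in the special mode
ANGER_RATIO_THRESHOLD = 0.2      # share of call time the caller was judged "angry"
PROHIBITED_WORD_LIMIT = 2        # prohibited words per call
CALL_PROPORTION_THRESHOLD = 0.5  # the "predetermined proportion" of past calls


@dataclass
class PastCall:
    special_mode_ratio: float
    anger_ratio: float
    prohibited_words: int


def call_meets_condition(call: PastCall) -> bool:
    """One past call satisfies the condition if any of the chosen features crosses its threshold."""
    return (call.special_mode_ratio >= SPECIAL_RATIO_THRESHOLD
            or call.anger_ratio >= ANGER_RATIO_THRESHOLD
            or call.prohibited_words >= PROHIBITED_WORD_LIMIT)


def decide_mode_from_history(history: Dict[str, List[PastCall]],
                             caller_id: str) -> Optional[str]:
    """Return "special"/"normal", or None when no collected information exists for the caller."""
    calls = history.get(caller_id)
    if not calls:
        return None  # nothing collected; determination method 3 could apply instead
    hits = sum(1 for c in calls if call_meets_condition(c))
    return "special" if hits / len(calls) >= CALL_PROPORTION_THRESHOLD else "normal"
```

The `caller_id` key could be a telephone number, an application ID, or a voiceprint-derived identifier, as described above.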

[Determination method 3]
If there is no collected information for the other party, the determination unit 15 may determine the mode of the output unit 14 from the collected information of other callers. These are callers whose attributes (gender, age group, accent characteristics, etc.) and state (degree of anger) are the same as or similar to those of the other party.
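The source does not say how "similar" callers are identified. The Python sketch below shows one possible reading. It counts matching attributes, keeps the callers who share at least a minimum number of them, and averages how often those callers' past calls met the special-mode condition. The attribute keys, `CallerProfile`, `decide_mode_by_similar_callers` and the thresholds are all hypothetical. The state (degree of anger) is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical attribute keys and thresholds, for illustration only.
ATTRIBUTE_KEYS = ("gender", "age_group", "accent")
MIN_SHARED_ATTRIBUTES = 2
SPECIAL_RATE_THRESHOLD = 0.5


@dataclass
class CallerProfile:
    attributes: Dict[str, str]  # e.g. {"gender": "male", "age_group": "40s", "accent": "kansai"}
    special_rate: float         # share of this caller's past calls that met the special-mode condition


def shared_attributes(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Count attribute keys present in `a` whose values match in `b`."""
    return sum(1 for k in ATTRIBUTE_KEYS if k in a and a.get(k) == b.get(k))


def decide_mode_by_similar_callers(new_caller: Dict[str, str],
                                   known: List[CallerProfile]) -> str:
    similar = [p for p in known
               if shared_attributes(new_caller, p.attributes) >= MIN_SHARED_ATTRIBUTES]
    if not similar:
        return "normal"  # nothing comparable; keep the default mode
    avg_rate = sum(p.special_rate for p in similar) / len(similar)
    return "special" if avg_rate >= SPECIAL_RATE_THRESHOLD else "normal"
```

A weighted similarity, or a nearest-neighbour search over attribute vectors, would be equally valid readings of "same or similar".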

[Determination method 4]
The determination unit 15 may determine the mode of the output unit 14 based on user input; that is, the user may be able to select the mode of the output unit 14. Alternatively, a supervisor monitoring the user's calls (e.g., the user's superior) may be able to select the mode of the output unit 14 remotely.

Next, an example of the processing flow of the call unit 10 is described with reference to the flowchart of FIG. 8. Here, the mode is determined by determination methods 1 and 4.

Immediately after the call starts, the output unit 14 outputs the other party's speech content in the normal mode, which is the default setting (S10). For as long as the call continues (No in S13), the determination unit 15 keeps checking whether a mode change condition is satisfied (S11). The mode change condition here is any one of the following:
- a condition for selecting the special mode, as described in determination method 1;
- a condition for selecting the special mode based on the collected information of the other party, or of other callers with similar attributes, as described in determination methods 2 and 3;
- the acceptance of a user input for changing from the normal mode to the special mode, as described in determination method 4.

If the mode change condition is satisfied (Yes in S11), the determination unit 15 selects the special mode, and the output unit 14 outputs the other party's speech content in the special mode (S12). For as long as the call continues (No in S15), the determination unit 15 then keeps checking whether a mode change condition is satisfied (S14). The mode change condition here is the acceptance of a user input for changing from the special mode to the normal mode, as described in determination method 4.

If the mode change condition is satisfied (Yes in S14), the determination unit 15 selects the normal mode, and the output unit 14 outputs the other party's speech content in the normal mode (S10). The same processing is then repeated.
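The FIG. 8 flow is a two-state machine. In the normal state only the S11 condition is checked, and in the special state only the S14 user input is. The Python simulation below drives this loop with a list of events. The event names and `run_call` are hypothetical. A user request to enter the special mode (determination method 4) is folded into the "condition" event, because S11 treats it as one of the mode change conditions.

```python
from typing import Iterable, List


def run_call(events: Iterable[str]) -> List[str]:
    """Simulate the FIG. 8 loop and return the sequence of modes the output unit enters.

    Hypothetical event names:
      "condition"   - a mode change condition to the special mode holds (Yes in S11)
      "user_normal" - user input requesting a return to the normal mode (Yes in S14)
      "end"         - the call ends (Yes in S13 or S15)
      anything else - no change; keep checking
    """
    mode = "normal"  # S10: start in the default normal mode
    entered = [mode]
    for event in events:
        if event == "end":
            break
        if mode == "normal" and event == "condition":
            mode = "special"  # S12
            entered.append(mode)
        elif mode == "special" and event == "user_normal":
            mode = "normal"   # back to S10
            entered.append(mode)
    return entered
```

In the special mode a further "condition" event is ignored. This matches the flowchart, where S14 checks only for the user's request to return to the normal mode.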

The output unit 14 may also display the other party's speech content as text during the normal mode. In that case, the normal mode and the first mode (special mode) differ only in whether the other party's speech content is output as sound.

When determination methods 2 and 3 are adopted, the determination unit 15 may determine the mode after receiving the incoming call signal and before the call starts. It can do so based on the other party's telephone number, the ID of the other party's telephone application, or the like. In this case, the output unit 14 can output the other party's speech content in the special mode from the very start of the call.

The call unit 10 may also refrain from recording the call during the normal mode and record the other party's voice during the special mode.

Alternatively, the call unit 10 may record the call content throughout the call, in both the normal mode and the special mode. It may then process the recording data differently depending on whether the special mode was entered during the call or the call stayed in the normal mode throughout. For example, an importance flag may be attached to the recording data (audio file) if the special mode was entered during the call, and no flag need be attached if the call stayed in the normal mode. The user can then, for example, group recording data by the importance flag. The deletion timing may also differ: flagged recording data may be deleted M days after the recording date and unflagged recording data N days after the recording date (M > N).
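The flag-dependent retention rule (delete after M days if flagged, after N days otherwise, with M > N) is easy to express in code. The Python sketch below is illustrative only. The values of `M_DAYS` and `N_DAYS`, the `Recording` class and `due_for_deletion` are hypothetical. The source fixes only the relation M > N.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# Hypothetical retention periods; the source only requires M > N.
M_DAYS = 90  # retention for recordings flagged as important
N_DAYS = 30  # retention for unflagged recordings


@dataclass
class Recording:
    path: str
    recorded_on: date
    entered_special_mode: bool  # True if the special mode was entered during the call

    @property
    def important(self) -> bool:
        # The importance flag is attached only when the special mode was entered.
        return self.entered_special_mode

    def delete_on(self) -> date:
        days = M_DAYS if self.important else N_DAYS
        return self.recorded_on + timedelta(days=days)


def due_for_deletion(recordings: List[Recording], today: date) -> List[str]:
    """Paths of recordings whose retention period has elapsed as of `today`."""
    return [r.path for r in recordings if today >= r.delete_on()]
```

A periodic job could call `due_for_deletion` once a day and remove the returned files.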

The telephone system 1 of the first and second embodiments may be used, for example, in a call center, in other business settings, or in private settings.

The first mode can be used not only when the other party uses rough language or speaks aggressively, but also when the surroundings are so noisy that the user cannot hear the other party's voice. Even in such a situation, the first mode lets the user understand what the other party is saying. The display may be that of a wearable terminal (e.g., an eyeglass-type wearable terminal).

Next, an example of the hardware configuration of the telephone system 1 is described. Each function of the telephone system 1 of the present embodiment is realized by any combination of hardware and software. The main elements are the CPU (Central Processing Unit) of any computer, a memory, a program loaded into the memory, a storage unit such as a hard disk that stores the program, and a network connection interface. The storage unit can hold programs stored before the apparatus is shipped, as well as programs obtained from a storage medium such as a CD (Compact Disc) or downloaded from a server on the Internet. Those skilled in the art will understand that the method and apparatus for realizing these functions admit various modifications.

FIG. 9 is a block diagram illustrating the hardware configuration of the telephone system 1 of the present embodiment. As shown in FIG. 9, the telephone system 1 includes a processor 1A, a memory 2A, an input/output interface 3A, a peripheral circuit 4A, and a bus 5A. The peripheral circuit 4A includes various modules; the telephone system 1 need not include the peripheral circuit 4A. When the call unit 10 and the processing unit 20 are provided in physically and/or logically separate devices, each device has its own processor 1A, memory 2A, input/output interface 3A, peripheral circuit 4A, and bus 5A.

The bus 5A is a data transmission path through which the processor 1A, the memory 2A, the peripheral circuit 4A, and the input/output interface 3A exchange data with one another. The processor 1A is an arithmetic processing unit such as a CPU or a GPU (Graphics Processing Unit). The memory 2A is, for example, RAM (Random Access Memory) or ROM (Read Only Memory). The input/output interface 3A includes two kinds of interface. The first acquires information from input devices (e.g., keyboard, mouse, microphone), external devices, external servers, external sensors and the like. The second outputs information to output devices (e.g., display, speaker, printer, mailer), external devices, external servers and the like. The processor 1A can issue instructions to each module and perform calculations based on their results.

Examples of reference embodiments are appended below.
1. An information processing system comprising:
voice reception means for collecting sound and generating first voice data;
transmission means for transmitting the first voice data to a terminal of the other party of a call;
reception means for acquiring second voice data transmitted from the terminal of the other party; and
output means for outputting the speech content of the other party,
wherein the output means has
a first mode in which the speech content of the other party is displayed as text without being output as sound,
a second mode in which the speech content of the other party is output as sound based on the processed second voice data, or
a third mode in which the speech content of the other party is output as a pre-registered sound, and
wherein, while the output means is operating in any one of the first to third modes, the voice reception means collects sound to generate the first voice data, and the transmission means transmits the first voice data to the terminal of the other party.
2. The information processing system according to 1,
wherein the output means displays the speech content of the other party as text also while operating in the second mode and the third mode.
3. The information processing system according to 1 or 2,
wherein, when displaying the speech content of the other party as text, the output means masks a specific keyword by replacing it with placeholder characters.
4. The information processing system according to any one of 1 to 3,
wherein the output means has a normal mode in which the speech content of the other party is output in the voice of the other party,
the system further comprises determination means for determining the mode of the output means, and
the output means operates in the mode determined by the determination means.
5. The information processing system according to 4,
wherein the determination means determines the mode of the output means based on at least one of the emotion, voice loudness, speech content and call duration of the other party, each identified from the second voice data.
6. The information processing system according to 5,
wherein the determination means determines any one of the first to third modes as the mode of the output means when one or more of the following are satisfied:
the emotion of the other party is a predetermined emotion;
the loudness of the other party's voice is equal to or greater than a threshold;
a specific keyword has been uttered a predetermined number of times or more;
the call duration is equal to or greater than a threshold.
7. The information processing system according to any one of 4 to 6,
wherein the determination means determines the mode of the output means based on information collected during past calls.
8. The information processing system according to any one of 1 to 7,
wherein the output means outputs information indicating the emotion of the other party while operating in any one of the first to third modes.
9. The information processing system according to any one of 1 to 8,
wherein, when operating in the first mode, the output means gives notice of the timing at which the other party's speech pauses.
10. An information processing method in which a computer:
collects sound and generates first voice data;
transmits the first voice data to a terminal of the other party of a call;
acquires second voice data transmitted from the terminal of the other party;
outputs the speech content of the other party in a first mode in which the speech content is displayed as text without being output as sound, a second mode in which the speech content is output as sound based on the processed second voice data, or a third mode in which the speech content is output as a pre-registered sound; and
also while outputting the speech content of the other party in any one of the first to third modes, collects sound to generate the first voice data and transmits the first voice data to the terminal of the other party.
11. A program causing a computer to execute processing of:
collecting sound and generating first voice data;
transmitting the first voice data to a terminal of the other party of a call;
acquiring second voice data transmitted from the terminal of the other party;
outputting the speech content of the other party in a first mode in which the speech content is displayed as text without being output as sound, a second mode in which the speech content is output as sound based on the processed second voice data, or a third mode in which the speech content is output as a pre-registered sound; and
also while outputting the speech content of the other party in any one of the first to third modes, collecting sound to generate the first voice data and transmitting the first voice data to the terminal of the other party.
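Reference embodiment 3 masks specific keywords when the other party's speech is shown as text. The Python sketch below shows one way this might be done on a transcript string. It assumes that a keyword list is already registered and that masking replaces each matched character with a placeholder. `MASK_CHAR` and `mask_keywords` are hypothetical names, and the source does not specify the placeholder. The `re.escape`, `re.compile` and `Pattern.sub` calls are standard-library APIs.

```python
import re
from typing import Iterable

MASK_CHAR = "*"  # hypothetical placeholder character


def mask_keywords(text: str, keywords: Iterable[str]) -> str:
    """Replace every occurrence of a registered keyword with placeholder characters."""
    # Longest keywords first, so a longer term is masked whole rather than partially.
    ordered = sorted({k for k in keywords if k}, key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    # Keep the masked span the same length as the original word.
    return pattern.sub(lambda m: MASK_CHAR * len(m.group(0)), text)
```

Because `re.escape` is used, keywords containing punctuation, or written in Japanese, are matched literally.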

1A processor
2A memory
3A input/output I/F
4A peripheral circuit
5A bus
1 telephone system
10 call unit
11 voice reception unit
12 transmission unit
13 reception unit
14 output unit
15 determination unit
20 processing unit

Claims

1. An information processing system comprising:
voice reception means for collecting sound and generating first voice data;
transmission means for transmitting the first voice data to a terminal of the other party of a call;
reception means for acquiring second voice data transmitted from the terminal of the other party; and
output means for outputting the speech content of the other party,
wherein the output means has
a first mode in which the speech content of the other party is displayed as text without being output as sound,
a second mode in which the speech content of the other party is output as sound based on the processed second voice data, or
a third mode in which the speech content of the other party is output as a pre-registered sound, and
wherein, while the output means is operating in any one of the first to third modes, the voice reception means collects sound to generate the first voice data, and the transmission means transmits the first voice data to the terminal of the other party.

2. The information processing system according to claim 1,
wherein the output means displays the speech content of the other party as text also while operating in the second mode and the third mode.

3. The information processing system according to claim 1 or 2,
wherein, when displaying the speech content of the other party as text, the output means masks a specific keyword by replacing it with placeholder characters.

4. The information processing system according to any one of claims 1 to 3,
wherein the output means has a normal mode in which the speech content of the other party is output in the voice of the other party,
the system further comprises determination means for determining the mode of the output means, and
the output means operates in the mode determined by the determination means.

5. The information processing system according to claim 4,
wherein the determination means determines the mode of the output means based on at least one of the emotion, voice loudness, speech content and call duration of the other party, each identified from the second voice data.

6. The information processing system according to claim 5,
wherein the determination means determines any one of the first to third modes as the mode of the output means when one or more of the following are satisfied:
the emotion of the other party is a predetermined emotion;
the loudness of the other party's voice is equal to or greater than a threshold;
a specific keyword has been uttered a predetermined number of times or more;
the call duration is equal to or greater than a threshold.

7. The information processing system according to any one of claims 4 to 6,
wherein the determination means determines the mode of the output means based on information collected during past calls.

8. The information processing system according to any one of claims 1 to 7,
wherein the output means outputs information indicating the emotion of the other party while operating in any one of the first to third modes.

9. The information processing system according to any one of claims 1 to 8,
wherein, when operating in the first mode, the output means gives notice of the timing at which the other party's speech pauses.

10. An information processing method in which a computer:
collects sound and generates first voice data;
transmits the first voice data to a terminal of the other party of a call;
acquires second voice data transmitted from the terminal of the other party;
outputs the speech content of the other party in a first mode in which the speech content is displayed as text without being output as sound, a second mode in which the speech content is output as sound based on the processed second voice data, or a third mode in which the speech content is output as a pre-registered sound; and
also while outputting the speech content of the other party in any one of the first to third modes, collects sound to generate the first voice data and transmits the first voice data to the terminal of the other party.

11. A program causing a computer to execute processing of:
collecting sound and generating first voice data;
transmitting the first voice data to a terminal of the other party of a call;
acquiring second voice data transmitted from the terminal of the other party;
outputting the speech content of the other party in a first mode in which the speech content is displayed as text without being output as sound, a second mode in which the speech content is output as sound based on the processed second voice data, or a third mode in which the speech content is output as a pre-registered sound; and
also while outputting the speech content of the other party in any one of the first to third modes, collecting sound to generate the first voice data and transmitting the first voice data to the terminal of the other party.