WO2016006354A1

WO2016006354A1 - Information processing device, and translation-data provision method

Info

Publication number: WO2016006354A1
Application number: PCT/JP2015/065266
Authority: WO
Inventors: 康憲加藤; 和樹関谷; 浩中里; 有一好光; 雅高水澤
Original assignee: Ｎｅｃソリューションイノベータ株式会社
Priority date: 2014-07-08
Filing date: 2015-05-27
Publication date: 2016-01-14
Also published as: JPWO2016006354A1

Abstract

This information processing device (50) is provided with: an information acquisition unit (51) which acquires language information from a terminal device; a transmission unit (52) which transmits, to a server device, language specification data corresponding to the acquired language information, and speech data of a speaker; a reception unit (53) which receives, from the server device, translation data obtained by translating the speech data into the language indicated by the language specification data; and a provision unit (54) which transmits, to the terminal device, the received translation data.

Description

Information processing apparatus and translation data providing method

The present invention relates to a technique for providing a translation service.

The following Patent Document 1 proposes a multilingual communication method. In this method, when a desired language is selected from the list on a monitor provided in the seat, the correspondence between the seat and the selected language is managed. Based on the correspondence, content is output in the selected language to the acoustic system and monitor of the seat.

Patent Document 2 below proposes a multi-channel conversation system that can smoothly and easily transition between a channel in a chat system and a conference room for VoIP in a VoIP (Voice over IP (Internet Protocol)) system. ing. This proposed system recognizes a voice conversation message transmitted and received in a VoIP conference room, translates a character string as a recognition result, and sends a character string extracted from the translation result and a keyword extracted from the character string to a chat server. Send. The chat server transmits the character string information of the translation result and the extracted keyword to the client terminal as a character string conversation message.

In the following Patent Document 3, a communication support method is proposed. In this proposal, the client device generates an internal representation based on the first language by recognizing and analyzing speech data, and determines the importance of the internal representation. The server device translates the internal representation into the second language in a mode corresponding to the importance. According to this proposed method, for input that does not include important content, a low-load translation process is automatically selected, thereby speeding up the response time until a translation result is obtained.

JP-T-2006-512647 JP 2004-185088 A JP 2004-355118 A

In the above proposed method, the server device provides multilingual translation results to a plurality of client devices. Thereby, each user of each client device can receive provision of contents in a desired language. However, in such a technique, each client device is required to prove the validity of each user and establish communication (session) with the server device. Each user information is registered in the server device for validity verification. That is, according to such a method, all information such as conversation participants and lecture attendees remains on the server device. Such information can be considered as personal information indicating personal preferences.

The present invention has been made in view of such circumstances, and realizes a technique for providing a listener with a translation service into a desired language without registering the listener's personal information in the server device.

In each aspect of the present invention, the following configurations are adopted in order to solve the above-described problems.

The first aspect relates to an information processing apparatus. An information processing apparatus according to a first aspect includes information acquisition means for acquiring language information from a terminal device, and transmission means for transmitting language designation data and utterance data of a speaker corresponding to the acquired language information to a server apparatus Receiving means for receiving, from a server device, translation data in which the utterance data is translated into the language indicated by the language designation data, and providing means for sending the received translation data to the terminal device.

The second aspect relates to a translation data providing method executed by at least one computer. The translation data providing method according to the second aspect acquires language information from a terminal device, transmits language designation data corresponding to the acquired language information and utterance data of a speaker to a server device, and the utterance data is Receiving translation data translated into a language indicated by the language designation data from a server device, and transmitting the received translation data to the terminal device.

As another aspect of the present invention, there may be a program for causing at least one computer to execute the method of the second aspect, or a computer-readable recording medium recording such a program. May be. This recording medium includes a non-transitory tangible medium.

According to each aspect described above, it is possible to realize a technology for providing a listener with a translation service into a desired language without registering the listener's personal information in the server device.

The above-described object and other objects, features, and advantages will be further clarified by a preferred embodiment described below and the following drawings attached thereto.

It is a figure which shows notionally the system configuration | structure of the translation system containing the speaker apparatus in 1st embodiment. It is a figure which shows notionally the hardware structural example of the speaker apparatus in 1st embodiment. It is a figure which shows notionally the process structural example of the speaker apparatus in 1st embodiment. It is a figure which shows the example of the matching information stored in a corresponding | compatible memory | storage part. It is a figure which shows the example of the data of the normal response from a server apparatus. It is a figure which shows the example of the data of the abnormal response from a server apparatus. It is a figure which shows notionally the process structural example of a server apparatus. It is a flowchart which shows the operation example of the speaker apparatus in 1st embodiment. It is a flowchart which shows the operation example of the speaker apparatus in 2nd embodiment. It is a figure which shows notionally the process structural example of the information processing apparatus in 3rd embodiment. It is a flowchart which shows the operation example of the information processing apparatus in 3rd embodiment.

Hereinafter, embodiments of the present invention will be described. In addition, each embodiment given below is an illustration, respectively, and this invention is not limited to the structure of each following embodiment.

[First embodiment]
Hereinafter, the speaker device and the translation data providing method in the first embodiment will be described with reference to a plurality of drawings.

〔System configuration〕
FIG. 1 is a diagram conceptually showing the system configuration of a translation system including a speaker device in the first embodiment. The translation system includes a server device 10, a speaker device 20, and the like. The translation system provides a translation service to the listener device 30 via the server device 10. The translation system can include a plurality of server apparatuses 10 and a plurality of speaker apparatuses 20, and can also provide a translation service to a plurality of listener apparatuses 30 via one server apparatus 10.

The server device 10 and the speaker device 20 are communicably connected via the communication network 9. The communication network 9 is a mobile phone line network, a Wi-Fi (Wireless Fidelity) line network, an Internet communication network, a dedicated line network, a LAN (Local Area Network), or the like. In the present embodiment, the communication form of the communication network 9 is not limited.

The server device 10 is a so-called computer and includes a CPU (Central Processing Unit) 2, a memory 3, an input / output interface (I / F) 4, a communication unit 7 and the like as shown in FIG. The memory 3 is a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, or the like. The input / output I / F 4 can be connected to a user interface device such as a display device (not shown) or an input device (not shown). The communication unit 7 communicates with other computers such as the speaker device 20 and exchanges signals with other devices. The hardware configuration of the server device 10 is not limited.

《Speaker device》
FIG. 2 is a diagram conceptually illustrating a hardware configuration example of the speaker device 20 in the first embodiment. The speaker device 20 is a so-called computer such as a PC (Personal Computer), a mobile phone, a smartphone, a tablet terminal, and a wearable computer. The speaker device 20 includes a CPU 11, a memory 12, a display unit 13, a touch sensor 14, a communication unit 15, a microphone unit 16, a speaker unit 17, and the like. The CPU 11 is connected to other units via a communication line such as a bus.

The memory 12 is a RAM, a ROM, or an auxiliary storage device (such as a hard disk).
The display unit 13 includes a monitor such as an LCD (Liquid Crystal Display) or a CRT (Cathode Ray Tube) display, and performs display processing.

The touch sensor 14 receives an operation input from the user by sensing an external contact. The touch sensor 14 may be a sensor that can detect a proximity state from the outside even in a non-contact state. Further, the display unit 13 and the touch sensor 14 may be realized as a touch panel unit. Further, the speaker device 20 may have an input / output interface (not shown) connected to an input device such as a mouse or a keyboard together with the touch sensor 14 or instead of the touch sensor 14.

The microphone unit 16 is a sound collection device.
The speaker unit 17 is a sound output device.
The communication unit 15 communicates with other devices wirelessly or by wire. For example, when the speaker device 20 is a portable terminal, the communication unit 15 is wirelessly connected to the communication network 9, communicates with the communication unit 7 of the server device 10 via the communication network 9, and the listener device 30. Both perform wireless communication. Examples of wireless communication between the speaker device 20 and the listener device 30 include Bluetooth (registered trademark), ZigBee, NFC (Near Field Communication), and Wi-Fi. However, the form of the wireless communication is not limited.

The speaker device 20 can also include an imaging unit, a vibration sensor, an acceleration sensor, and the like in addition to the hardware elements shown in FIG. The hardware configuration of the speaker device 20 is not limited.

The listener device 30 is a so-called computer and has a hardware configuration similar to that of the speaker device 20. The hardware configuration of the listener device 30 is not limited as long as it can communicate with the speaker device 20 and can output the translation data sent from the speaker device 20. The hardware configuration of the speaker device 20 and the listener device 30 may be different.

[Processing configuration]
《Speaker device》
FIG. 3 is a diagram conceptually illustrating a processing configuration example of the speaker device 20 in the first embodiment. The speaker device 20 includes an information acquisition unit 21, a correspondence storage unit 22, an utterance data acquisition unit 23, a transmission unit 24, a reception unit 25, a provision unit 26, and the like. Each of these processing units is realized, for example, by executing a program stored in the memory 12 by the CPU 11. Further, the program may be installed from a portable recording medium such as a CD (Compact Disc) or a memory card or another computer on the network via the communication unit 15 and stored in the memory 12.

The information acquisition unit 21 acquires language information and a terminal ID from the plurality of listener devices 30. The language information is information on the language used by the user of each listener device 30, and indicates Japanese, English, French, German, Chinese, or the like. The language information obtained from each listener device 30 can also indicate a plurality of languages. When the language information indicates a plurality of languages, priority may be given to each language. When the language used by the user of each listener device 30 is different, the information acquisition unit 21 acquires a plurality of different language information from the plurality of listener devices 30.

The terminal ID is terminal identification information, and is used as an address of a destination or a transmission source in communication between the speaker device 20 and the listener device 30. The specific acquisition method of the language information and the terminal ID by the information acquisition unit 21 is exemplified in the section of the embodiment.
The information acquisition unit 21 stores each acquired language information and each terminal ID in the correspondence storage unit 22 in association with each other.

FIG. 4 is a diagram illustrating an example of association information stored in the correspondence storage unit 22. As illustrated in FIG. 4, the correspondence storage unit 22 stores a terminal ID and language information in association with each other. The linguistic information and terminal ID stored in the correspondence storage unit 22 may be those acquired by the information acquisition unit 21, or may be obtained by processing the information acquired by the information acquisition unit 21. May be. For example, when the information acquisition unit 21 acquires language information from text data, the correspondence storage unit 22 may store a language ID corresponding to the language indicated by the text data. When the language information acquired by the information acquisition unit 21 indicates a plurality of languages, the correspondence storage unit 22 may store information of one language extracted from the plurality of languages. The terminal ID stored in the correspondence storage unit 22 may be identification data uniquely generated by the speaker device 20. In this case, the speaker apparatus 20 manages the association between the uniquely generated identification data and the terminal ID acquired by the information acquisition unit 21.

The utterance data acquisition unit 23 acquires the utterance voice data of the speaker. The utterance data acquisition unit 23 acquires, as utterance voice data, voice data obtained by converting the voice signal collected by the microphone unit 16 by PCM (Pulse Code Modulation). The sound signal collected by the microphone unit 16 includes environmental sound in addition to the speech sound of the speaker. Therefore, the utterance data acquisition unit 23 can perform filter processing for removing the environmental sound on the acquired voice data, and use the obtained voice data as the utterance voice data. Further, the utterance data acquisition unit 23 may acquire utterance voice data including the silent time when the speaker does not speak, or may acquire the utterance voice data from which the silent time is removed.

The method of acquiring speech voice data by the speech data acquisition unit 23 is not limited to such a method. The utterance data acquisition unit 23 may acquire utterance voice data stored in the memory 12, a portable recording medium, or another computer in which the utterance of the speaker is recorded.

The utterance data acquisition unit 23 further acquires information on the language used in the utterance of the speaker. The utterance data acquisition unit 23 may have language information of the speaker in advance. The language information of the speaker may be input by the user operating the input device based on an input screen displayed on the monitor.

The transmission unit 24 receives the utterance voice data acquired by the utterance data acquisition unit 23 and language data corresponding to the language information of the speaker, and language designation data corresponding to the language information stored in the correspondence storage unit 22. 10 to send. For the language designation data and the language data, for example, a format defined as BCP 47 by IETF (The Internet Engineering Task B Force) is used. However, the data format of the language designation data and language data is arbitrary. The language data may be the language information of the speaker acquired by the utterance data acquisition unit 23. The language designation data may be the language information itself stored in the correspondence storage unit 22.

The transmission timing of the speech voice data, the language data of the speaker, and the language designation data may not be the same. For example, the transmission unit 24 can transmit the language designation data before other data after the language information is acquired by the information acquisition unit 21. Further, when the utterance data acquisition unit 23 has the language information of the speaker in advance, the transmission unit 24 can transmit the language data of the speaker before other data.

The receiving unit 25 receives translated text data from the server device 10 in which the speech voice data transmitted by the transmitting unit 24 is translated into the language indicated by the language designation data transmitted in the same manner. When the language designation data indicates a plurality of languages, the receiving unit 25 generates a plurality of pairs of translation text data and language data corresponding to the translation text data corresponding to the plurality of languages, based on the translation text data. It is received in a state where it can be associated with the text data of the speech recognition result of the uttered speech data.

As long as it is received in a state where it can be associated, there are no limitations on how to receive a plurality of pairs of translated text data and language data and text data of a speech recognition result. For example, the receiving unit 25 receives a plurality of pairs of translation text data and language data and text data of a speech recognition result as one communication message (response data). The receiving unit 25 may receive the plurality of pairs and the text data of the speech recognition result as separate communication messages (response data). In this case, association identification data for associating the plurality of pairs with the text data of the speech recognition result may be set in each communication message. Furthermore, the receiving unit 25 may receive a plurality of text recognition result text data for the plurality of pairs including one translation text data. Also in this case, by using the related identification data, a plurality of text data of the speech recognition results are linked, and the linked text data is associated with the plurality of pairs.

FIG. 5 is a diagram illustrating an example of normal response data from the server device 10. FIG. 6 is a diagram illustrating an example of abnormal response data from the server device 10. In the example of FIGS. 5 and 6, the response data from the server device 10 is described in a JSON (JavaScript (registered trademark) Object Notation) format. The value of the key “result” indicates whether it is a normal response (OK or ERROR), the sequence of the key “recg” indicates the speech recognition result, the sequence of the key “trans” indicates the translation result, and the key “code” The value indicates an error code, and the value of the key “message” indicates an error message. The “recg” array has an element “region” corresponding to the language data of the speaker and an element “text” corresponding to the text data of the speech recognition result of the speech data. The “trans” array includes pairs of an element “region” corresponding to language data and an element “text” corresponding to translated text data corresponding to the language data, for the number of languages indicated by the language designation data.

According to the example of FIG. 5, the translation text data and the language data are associated as one element in the “trans” array, and a plurality of pairs of the translation text data and the language data are “recg” in the response data. They are related by the relationship between the sequence and the “trans” sequence. However, the response data received by the receiving unit 25 from the server device 10 is not limited to the format shown in FIGS. If the identification data is set in each response data, the “recg” array and the “trans” array may be received by different response data.

The transmission unit 24 and the reception unit 25 can perform two-way communication with the server device 10 in one session by using, for example, a Web socket. According to this, utterance voice data in the direction from the speaker device 20 toward the server device 10 and translated text data in the reverse direction can be exchanged asynchronously. In other words, the server device 10 can freely divide the received utterance voice data, and sequentially transmit the translated text data converted from the divided partial utterance voice data to the speaker device 20 at an arbitrary timing.

The providing unit 26 transmits the translated text data received by the receiving unit 25 by specifying the terminal ID stored in the correspondence storage unit 22 in association with the language information corresponding to the translated text data as a destination. At this time, the providing unit 26 uses the language data received in association with the translated text data, and extracts the terminal ID associated with the language information that matches the language data from the correspondence storage unit 22. When a plurality of terminal IDs are extracted from the correspondence storage unit 22 for one language data, the providing unit 26 copies the translated text data by the number of the extracted terminal IDs, and the plurality of copied translation text data Are transmitted to the plurality of listener devices 30 indicated by the terminal IDs.

When a plurality of translated text data having different languages are received, the providing unit 26 causes the plurality of listener devices 30 to translate text corresponding to the language information of each listener device 30 in the received plurality of translated text data. The received plural translated text data are transmitted so that each data can be received. In this case, the providing unit 26 extracts a device ID from the correspondence storage unit 22 for each translation text data, and transmits each translated text data by designating the extracted device ID as a destination. The providing unit 26 may transmit the text data of the speech recognition result to the listener device 30 together with the translated text data.

<Server equipment>
FIG. 7 is a diagram conceptually illustrating a processing configuration example of the server device 10. The server device 10 includes a voice recognition unit 31, a translation unit 32, and the like. Each of these processing units is realized, for example, by executing a program stored in the memory 3 by the CPU 2. The program may be installed from a portable recording medium such as a CD or a memory card or another computer on the network via the communication unit 7 and stored in the memory 3.

The voice recognition unit 31 receives the voice data from the speaker device 20 and performs voice recognition processing on the voice data. A known voice recognition technique may be used for the voice recognition process. For example, the speech recognition unit 31 converts speech speech data into speech text data using an acoustic model formed by collecting sound waveform data and a language model formed by collecting words and word arrangements. In this case, the speech recognition unit 31 switches the acoustic model and the language model used in the speech recognition processing to the language model indicated by the language data based on the language data of the speaker sent from the speaker device 20. Moreover, the server apparatus 10 may have the speech recognition part 31 customized for each language for each language. In this case, based on the language data of the speaker sent from the speaker device 20, the server device 10 can also switch the voice recognition unit 31 to be executed.

The translation unit 32 performs a translation process (machine translation) on the utterance text data obtained by the speech recognition unit 31 from the language indicated by the language data of the speaker into the language indicated by the language designation data. For this translation process, a well-known translation technique such as a rule-based translation technique, a statistics-based translation technique, or the like may be used. When the language designation data indicates a plurality of different languages, the translation unit 32 performs a translation process corresponding to each language on the utterance text data. The translation unit 32 generates translation text data of each language indicated by the language designation data by the translation process.

The translation unit 32 generates a pair of the generated translation text data and the language data corresponding to the translation text data, and the text data (utterance text data) of the speech recognition result of the utterance voice data that is the source of the translation text data. To the speaker device 20 as response data. For example, the translation unit 32 transmits response data having the format shown in FIGS. 5 and 6 to the speaker apparatus 20.

The translation unit 32 may wait for speech text data having a length sufficient for translation to be obtained by the speech recognition unit 31 and execute the translation process. That is, the data unit translated by the translation unit 32 and the data unit processed by the speech recognition unit 31 may be different. If the speech recognition unit 31 obtains the utterance text data but does not perform the translation process on the data, the translation unit 32 produces the speech recognition result text data (utterance text data) and the text data. The related identification data for associating with the translated text data as the translation result may be transmitted to the speaker device 20 as response data. In this way, it is possible to avoid an event that the speaker device 20 cannot receive response data from the server device 10 for a long time.

[Operation example / Purchase support method]
Hereinafter, the translation data providing method in the first embodiment will be described with reference to FIG. FIG. 8 is a flowchart showing an operation example of the speaker device 20 in the first embodiment. As shown in FIG. 8, the translation data providing method in the first embodiment is executed by at least one computer such as the speaker device 20. For example, each illustrated process is executed by each processing unit included in the speaker device 20. Since each process is the same as the processing content of the above-mentioned each process part which the speaker apparatus 20 has, the detail of each process is abbreviate | omitted suitably.

In the following description, a case where the speaker device 20 provides translation data to a plurality of listener devices 30 is exemplified.

The speaker device 20 acquires language information and a device ID from each of the plurality of listener devices 30 (S81). The speaker device 20 associates the acquired language information with the device ID and stores them in the correspondence storage unit 22 (S82).

The speaker device 20 acquires the speech data and language information of the speaker (S83).
The speaker device 20 corresponds to the language data corresponding to the language information acquired in (S83), the speech data acquired in (S83), and the language information stored in the correspondence storage unit 22 in (S82). The language designation data to be transmitted is transmitted to the server device 10 (S84). When language information indicating a plurality of different languages is stored in the correspondence storage unit 22, language designation data indicating the plurality of languages is transmitted to the server device 10.

The server device 10 receives the data transmitted in (S84), performs speech recognition processing corresponding to the language indicated by the language data of the speaker on the received speech voice data, and generates speech text data. . The server device 10 performs a translation process on the utterance text data from the language of the speaker into the language indicated by the received language designation data. As a result, the server device 10 generates translated text data in which the speech voice data is translated into the language indicated by the language designation data.

The speaker device 20 receives response data for the data transmitted in (S84) from the server device 10 (S85). The response data includes a value indicating whether the response is normal. The response data indicating the normal response further includes a pair of translation text data and language data corresponding to the translation text data, and text data of a speech recognition result of the utterance voice data that is the basis of the translation text data. . When the language designation data indicates a plurality of languages, the speaker apparatus 20 sets a plurality of pairs of translation text data corresponding to the plurality of languages and language data corresponding to the translation text data in the translation text data. It is received as response data in a state in which it can be associated with the text data of the speech recognition result of the original speech voice data. Moreover, the speaker apparatus 20 may receive the text data and related identification data of a speech recognition result as response data.

The speaker apparatus 20 determines whether or not the response data indicates a normal response (S86). If the response data does not indicate a normal response (S86; NO), the speaker device 20 outputs error information based on the information set in the response data (S87). The error information to be output may be set in the response data as illustrated in FIG. Moreover, the output form of error information is arbitrary. The speaker device 20 can output error information to the monitor of the display unit 13. The speaker device 20 may cause the speaker unit 17 to send out a voice reading out the error information or a sound corresponding to the error information. Further, the speaker device 20 may transmit error information to each listener device 30.

When the response data indicates a normal response (S86; YES), the speaker apparatus 20 specifies the destination of the translated text data included in the response data (S88). Specifically, the speaker device 20 extracts the device ID associated with the language information corresponding to the translated text data from the correspondence storage unit 22, and uses the extracted device ID as the destination of the translated text data. At this time, a plurality of destinations (device IDs) may be specified for one translation text data. When the response data includes a plurality of translated text data related to a plurality of different languages, the speaker apparatus 20 specifies a destination (apparatus ID) for each of the plurality of translated text data.

The speaker device 20 transmits desired translation text data to each listener device 30 based on the destination specified in (S88) (S89). When a plurality of terminal IDs are extracted as destinations of one translation text data, the speaker apparatus 20 copies the translation text data by the number of extracted terminal IDs, and the copied plurality of translation text data. It transmits to the several listener apparatus 30 which those terminal ID shows. In addition, when the response data includes a plurality of translated text data in different languages, the speaker device 20 uses the plurality of listener devices 30 of each of the listener devices 30 in the received plurality of translated text data. The plurality of translated text data is transmitted so that each translated text data corresponding to the language information can be received.

The speaker device 20 may transmit the text data of the speech recognition result to the listener device 30 together with the translated text data. In addition, when the response data indicates a normal response (S86; YES) and the translated text data is not included in the response data, the speaker device 20 holds the response data and transmits the next response data. Wait (not shown). When the speaker device 20 receives the response data including the translated text data, the speaker device 20 links the text data of the speech recognition result and associates the linked data with the translated text data based on the related identification data. Do. In this case, the speaker device 20 may transmit only the text data of the speech recognition result included in the response data not including the translated text data to the listener device 30.

The listener device 30 acquires translated text data obtained by translating the speech voice data into the language indicated by the language information transmitted to the speaker device 20, and displays the translated text data on the monitor. In addition, the listener device 30 can output a voice that reads out the translated text data. Further, the listener device 30 may also output the text data of the speech recognition result of the utterance voice data that is received together with the translation text data and is the basis of the translation text data. In addition, when the listener device 30 receives the text data of the speech recognition result without the translated text data, the listener device 30 may output only the text data.

In FIG. 8, a plurality of steps (processes) are shown in order, but the steps executed in the first embodiment and the execution order of the steps are not limited to the example of FIG. For example, the speech audio data acquired in (S83) and the language information of the speaker may be acquired at different timings. The language information of the speaker can be acquired before (S81). Further, in FIG. 8, although simplified for convenience of explanation, when speech data is acquired at any time, (S83) and subsequent steps are repeated. Further, the translation data providing method may include a step of displaying the text data of the speech recognition result and the translation text data included in the response data indicating a normal response on the monitor of the display unit 13.

[Operation and Effect in First Embodiment]
As described above, in the first embodiment, the language information and the device ID are acquired by the speaker device 20 from the listener device 30 who wants to provide translation data, and the language information and the device ID are associated with each other in the correspondence storage unit 22. Stored. Then, utterance voice data that is the original data for translation is acquired by the utterer apparatus 20, and language designation data and utterance voice data corresponding to the correspondence information stored in the correspondence storage unit 22 are transmitted from the utterer apparatus 20 to the server apparatus 10. Sent to. In the server device 10, utterance voice data is converted into utterance text data by voice recognition, and the utterance text data is translated into a language indicated by the language designation data. The translated text data is sent from the server device 10 to the speaker device 20, and the device ID associated with the language information corresponding to the translated text data and stored in the correspondence storage unit 22 is designated as the destination, and the speech Is transmitted from the listener device 20 to the listener device 30.

Thus, according to the first embodiment, the listener device 30 can acquire the translation text data generated by the server device 10 via the speaker device 20 that acquires the speech voice data. In other words, the listener device 30 can obtain the translated text data from the speaker device 20 without accessing the server device 10 by providing the language information and the device ID to the speaker device 20. Conversely, the server device 10 only needs to recognize the speaker device 20, and does not need to recognize which listener device 30 receives the translated text data to be transmitted. Therefore, according to the first embodiment, a translation service into a desired language can be provided to the user without registering personal information of the user (listener) of the listener device 30 in the server device 10.

The reason why the personal information of the user of the listener device 30 is avoided as much as possible is registered in the server device 10 is that the server device 10 is in a third party (public) position unrelated to the utterance from which the translation is based. Because. In the first embodiment, the language information and the terminal ID of the listener device 30 are stored in the speaker device 20. However, each user of the speaker device 20 and the listener device 30 is in a relationship between the speaker and the listener, or a relationship close thereto (for example, a relationship between the person who acquires the speech data and the listener of the speech). The speaker device 20 is in a party position related to the utterance that is the source of translation. Therefore, even if such information is stored in the speaker device 20, it is difficult to be associated with leakage of personal information.

In the first embodiment, in addition to the speech voice data, language data corresponding to the language information of the speaker is transmitted from the speaker device 20 to the server device 10. Thereby, since the server apparatus 10 can switch a speech recognition process and a translation process for the language data, it can support a plurality of translation forms.

Further, in the first embodiment, when the language information acquired from the listener device 30 indicates a plurality of different languages, a plurality of pairs of translated text data and language data corresponding to the translated text data are converted into the translated text data. Is received by the speaker device 20 in a state associated with the text data of the speech recognition result of the speech data that is the source of the speech. Thereby, even when a plurality of listener devices 30 request a plurality of different languages, each listener device 30 can acquire translated text data translated into a desired language at substantially the same timing.

Further, in the first embodiment, the server apparatus 10 to the speaker apparatus 20 have a pair of translation text data and language data corresponding to the translation text data as the voice of the utterance voice data from which the translation text data is based. It is provided in a state where it can be associated with the text data of the recognition result. Thus, if the text data of the speech recognition result is displayed on the monitor of the speaker device 20, the speaker who is the user of the speaker device 20 or the person who can listen to the speech sees the text data. Thus, it can be determined whether or not the translated text data is accurate. If the text data of the speech recognition result does not have a sentence length sufficient for translation processing in the server device 10, the text data of the speech recognition result is sent from the server device 10 to the speaker device without the translation text data. 20 may be provided. Thereby, the speaker apparatus 20 can grasp | ascertain the translation condition in the server apparatus 10. FIG.

[Second Embodiment]
Hereinafter, the speaker apparatus and the translation data providing method in the second embodiment will be described with reference to a plurality of drawings. The system configuration in the second embodiment is the same as that in the first embodiment. The processing configurations of the server device 10 and the speaker device 20 in the second embodiment are also the same as in the first embodiment.

In the second embodiment, the speaker device 20 further acquires reliability information of speech recognition from the server device 10 in addition to the translated text data and the like. Hereinafter, the second embodiment will be described focusing on the contents different from the first embodiment, and the same contents as those of the first embodiment will be omitted as appropriate.

<Server equipment>
The speech recognition unit 31 generates speech text data by performing speech recognition processing on speech speech data, and further calculates the reliability of the speech recognition result. For example, the speech recognition unit 31 calculates the likelihood for each word of the recognition result candidate derived using the acoustic model and the language model, and the likelihood and selection of the word finally selected from the candidates The reliability can be calculated using a difference from the likelihood of the word that has not been performed. In this case, the higher the difference in likelihood, the higher the degree of reliability, and the lower the likelihood difference, the lower the degree of reliability. A known method may be used as a method for calculating the reliability of the speech recognition result.

The translation unit 32 associates the translation text data, the language data corresponding to the translation text data, the text data of the speech recognition result of the speech data that is the basis of the translation text data, and the reliability information of the speech recognition result. Transmit to the speaker device 20 in a possible state. When the translation unit 32 does not perform translation processing on the utterance text data obtained by the speech recognition unit 31, the text data (utterance text data) of the speech recognition result and the translation text data which is the translation result of this text data The association identification data for associating with and the reliability information of the voice recognition result may be transmitted to the speaker device 20 as response data.

《Speaker device》
The receiving unit 25 associates the translated text data, the language data corresponding to the translated text data, the text data of the speech recognition result of the speech data that is the basis of the translated text data, and the reliability information of the speech recognition result. Received from the server device 10 in a possible state. As long as it is received in a state where it can be associated, the method of receiving the translated text data, the language data, the text data of the speech recognition result, and the reliability information is not limited. When the text data and reliability information of the speech recognition result and the translated text data and language data are received as different response data, as described in the first embodiment, each response data includes related identification data. May be set.

The providing unit 26 determines whether to transmit the received translation text data as it is to the listener device 30 based on the reliability information received by the receiving unit 25. When the reliability information indicates a reliability greater than or equal to a predetermined value, the providing unit 26 transmits the translated text data to the listener device 30 as in the first embodiment. On the other hand, when the reliability information indicates a reliability lower than the predetermined value, the providing unit 26 outputs that the reliability is low because the accuracy of the translated text data is low. The providing unit 26 may display that the reliability is low on the monitor of the display unit 13 or may output the sound to the speaker unit 17 with sound. The predetermined value to be compared with the reliability is a reliability threshold and is held in advance by the providing unit 26.

Also, when the reliability information is lower than a predetermined value, the providing unit 26 may not send the translated text data to the listener device 30 or may allow the user to decide whether or not to send it. The providing unit 26 displays an operation button for selecting whether or not to transmit to the listener device 30 on the monitor together with the low reliability, and transmits the translated text data in response to a user operation on the operation button. You may decide to do or not send. The providing unit 26 may transmit reliability information to the listener device 30 together with the translation text data.

[Operation example / Purchase support method]
Hereinafter, the translation data providing method in the second embodiment will be described with reference to FIG. FIG. 9 is a flowchart showing an operation example of the speaker device 20 in the second embodiment. The execution subject of the translation data providing method in the second embodiment is the same as in the first embodiment. Since each process is the same as the processing content of the above-mentioned each process part which the speaker apparatus 20 has, the detail of each process is abbreviate | omitted suitably. In FIG. 9, steps having the same contents as those in FIG. 8 are denoted by the same reference numerals as those in FIG.

The speaker device 20 executes (S81) to (S84) as in the first embodiment.
The server device 10 receives the data transmitted in (S84), executes the speech recognition process and the translation process as in the first embodiment, and as a result, the speech voice data is translated into the language indicated by the language designation data. Generate translated text data. In addition, in the second embodiment, the server device 10 calculates the reliability of the voice recognition result. The server device 10 associates the translated text data, the language data corresponding to the translated text data, the text data of the speech recognition result of the speech data that is the basis of the translated text data, and the reliability information of the speech recognition result. In a possible state, it is transmitted as response data to the speaker device 20. The server device 10 may transmit the text data of the speech recognition result, the related identification data, and the reliability information of the speech recognition result as response data to the speaker device 20 without the translated text data.

The speaker apparatus 20 receives the response data from the server apparatus 10 (S91). The response data includes a value indicating whether or not the response is normal, a pair of translation text data and language data corresponding to the translation text data, text data of speech recognition results of the utterance voice data from which the translation text data is based And reliability information of the speech recognition result. When the language designation data indicates a plurality of languages, the speaker apparatus 20 sets a plurality of pairs of translation text data corresponding to the plurality of languages and language data corresponding to the translation text data in the translation text data. It is received as response data in a state associated with the text data and reliability information of the speech recognition result of the original speech voice data. Further, the speaker device 20 may receive text data of the speech recognition result, related identification data, and reliability information of the speech recognition result as response data.

The speaker device 20 determines whether or not the response data indicates a normal response (S92). If the response data does not indicate a normal response (S92; NO), the speaker device 20 outputs error information as in the first embodiment (S87).

When the response data indicates a normal response (S92; YES), the speaker apparatus 20 further determines whether or not the reliability information included in the response data indicates a reliability lower than a predetermined value (S93). When the reliability information indicates a reliability equal to or higher than a predetermined value (S93; NO), the speaker device 20 specifies the destination of the translated text data included in the response data (S88), as in the first embodiment. The desired translated text data is transmitted to each listener device 30 (S89). In the second embodiment, the speaker device 20 may transmit reliability information to the listener device 30 together with the translated text data.

The listener device 30 receives the translated text data from the speaker device 20 and displays it on a monitor that outputs the translated text data, as in the first embodiment. In the second embodiment, the listener device 30 can also output reliability information received together with the translated text data.

On the other hand, if the reliability information indicates a reliability lower than the predetermined value (S93; YES), the speaker device 20 presents that the reliability is low (S94). For example, the speaker device 20 may display that the reliability is low on the monitor of the display unit 13 or may output the sound to the speaker unit 17 with sound.

Furthermore, the speaker device 20 presents that the reliability is low, and causes the monitor to display an operation screen that allows the user to select whether or not to transmit the translated text data to the listener device 30. The speaker device 20 determines whether or not the user has selected transmission through a user operation via the operation image (S95). When the speaker device 20 determines that the user has selected transmission (S95; YES), the speaker device 20 executes (S88) as described above. If the speaker device 20 determines that the user has not selected transmission (S95; NO), the speaker device 20 outputs error information (S87).

In FIG. 9, a plurality of steps (processes) are shown in order, but the steps executed in the second embodiment and the execution order of the steps are not limited to the example of FIG. For example, when (S95) shown in FIG. 9 is omitted and the reliability is lower than a predetermined value (S93; YES), the speaker device 20 unconditionally sends the translated text data to the listener device 30. Instead, error information may be output (S87). Further, the speaker device 20 may always present the reliability information included in the response data without depending on the comparison result between the reliability and the predetermined value.

[Operation and Effect in Second Embodiment]
As described above, in the second embodiment, the reliability information of the speech recognition result is provided from the server device 10 to the speaker device 20 in a state where the reliability information can be associated with the translated text data, the language data, and the text data of the speech recognition result. Is done. Thereby, the speaker apparatus 20 can determine whether or not to transmit the translated text data to the listener apparatus 30 as it is based on the reliability information. When the reliability of the speech recognition result is low, the accuracy of the text data of the speech recognition result is low, and as a result, the accuracy of the translated text data converted from the text data is also low. Therefore, by using the reliability information, it is possible to prevent erroneous translation contents from being provided to the listener device 30. Further, if the speaker device 20 presents the reliability information, the speaker can be made to recognize that the reliability of voice recognition is low, and the speaker can be given a chance to rephrase. Thereby, the utterance content can be appropriately conveyed to the listener in another language. In addition, when the reliability is low, the speaker device 20 causes the user to select transmission of the translated text data to the listener device 30, so that the translated text data correctly translated even if the reliability is low, the listener device 30 can be provided.

[Supplement to the first embodiment and the second embodiment]
Although one server device 10 is illustrated in FIG. 1, the translation system can also include a plurality of server devices 10. For example, the server device 10 having the voice recognition unit 31 and the server device 10 having the translation unit 32 may be different devices. In this case, the server device 10 from which the utterer device 20 transmits uttered voice data or the like differs from the server device 10 from which the utterer device 20 receives the translated text data or the like. Different server devices 10 may be provided for each translated language.

Also, the listener device 30 may communicate with the speaker device 20 via another listener device 30. For example, the speaker device 20 and the plurality of listener devices 30 may form a wireless multi-hop network. According to this example, the listener device 30 existing at a position where the radio wave from the speaker device 20 does not reach can also receive translation data from the speaker device 20. A known technique may be used as a data propagation technique in the wireless multi-hop network.

[Third embodiment]
Hereinafter, the information processing apparatus and the translation data providing method according to the third embodiment will be described with reference to FIGS. 10 and 11.

FIG. 10 is a diagram conceptually illustrating a processing configuration example of the information processing apparatus in the third embodiment. As illustrated in FIG. 10, the information processing apparatus 50 includes an information acquisition unit 51, a transmission unit 52, a reception unit 53, a provision unit 54, and the like. The information acquisition unit 51 acquires language information from the terminal device. The transmission unit 52 transmits the language designation data corresponding to the language information acquired by the information acquisition unit 51 and the utterance data of the speaker to the server device. The receiving unit 53 receives from the server device the translation data in which the utterance data is translated into the language indicated by the language designation data. The providing unit 54 transmits the received translation data to the terminal device.

An example of the information processing apparatus 50 is the speaker apparatus 20 described above. An example of the terminal device is the listener device 30 described above, and an example of the server device is the server device 10 described above. However, the server device from which the receiving unit 53 receives translation data may be different from the server device from which the transmitting unit 52 transmits voice data.

An example of the specific processing content of the transmission unit 52 is the transmission unit 24 described above. The utterance data transmitted by the transmission unit 52 may not be voice data. For example, the transmission unit 52 may transmit the utterance text data as the utterance data to the server device. The utterance text data may be input by the user operating the input device of the information processing device 50. Further, the information processing apparatus 50 may include the voice recognition unit 31 described above, and the utterance text data may be converted from the utterance voice data by the voice recognition unit 31. In this case, the information processing apparatus 50 can generate text data as a speech recognition result and calculate the reliability of speech recognition. The server device that is the transmission destination of the utterance data may not have the voice recognition unit 31.

The transmission unit 52 may not transmit language data corresponding to the language information of the speaker. This corresponds to a case where the language of the speaker is fixedly fixed to one, or a case where the language can be automatically recognized from the utterance data on the server device side.

An example of the specific processing content of the receiving unit 53 is indicated by the receiving unit 25 described above. The translation data received by the receiving unit 53 may be voice data instead of text data. In this case, the server device generates and transmits translated voice data. In addition, when the language designation data transmitted by the transmission unit 52 indicates one language, the reception unit 53 does not need to receive language data corresponding to the translation data. Further, the receiving unit 53 may not receive the text data of the voice recognition result. This is because only the translation data need be provided to the terminal device, and the text data of the speech recognition result does not necessarily have to be presented in the information processing device 50.

An example of the specific processing content of the providing unit 54 is shown by the providing unit 26 described above. The translation data transmitted by the providing unit 54 may be voice data instead of text data. When the receiving unit 53 acquires the translated text data from the server device, the providing unit 54 may generate translated speech data that reads the translated text data, and transmit the translated speech data to the terminal device.

Also, the providing unit 54 can transmit the broadcast data in association with the language data corresponding to the translation data, instead of the unicast communication designating the terminal ID. In this case, the terminal device may extract translation data associated with desired language data from the received translation data.

An example of the specific processing content of the information acquisition unit 51 is indicated by the information acquisition unit 21 described above. However, the information acquisition unit 51 may not acquire the device ID when the providing unit 54 transmits the translation data by wireless broadcast.

As illustrated in FIG. 10, the information processing apparatus 50 may not include the correspondence storage unit 22. In this case, the information acquisition unit 51 may store each language information and each terminal ID in association with each other in the correspondence storage unit 22 of another computer. Moreover, the information acquisition part 51 should hold | maintain only language information, when not acquiring apparatus ID.

The information processing apparatus 50 shown in FIG. 10 has, for example, the same hardware configuration as the above-described speaker apparatus 20 shown in FIG. 2, and the program is processed in the same manner as the speaker apparatus 20. Each processing unit described above is realized. The hardware configuration of the information processing apparatus 50 is not limited.

FIG. 11 is a flowchart showing an operation example of the information processing apparatus 50 in the third embodiment. As shown in FIG. 11, the translation data providing method in the third embodiment is executed by at least one computer such as the information processing apparatus 50. For example, each illustrated process is performed by each processing unit included in the information processing apparatus 50.

The translation data providing method in this embodiment includes (S111) to (S116). In (S111), the computer acquires language information from the terminal device. In (S112), the computer transmits the language designation data corresponding to the language information acquired in (S111) and the utterance data of the speaker to the server device. In (S113), the computer receives response data from the server device. When the response data indicates a normal response (S114; YES), the response data includes translation data in which the utterance data is translated into the language indicated by the language designation data. When the response data does not indicate a normal response (S114; NO), the computer outputs error information (S116). In (S115), the computer transmits the translation data received in (S113) to the terminal device.

An example of (S111) is (S81) of FIGS. 8 and 9, an example of (S112) is (S84) of FIGS. 8 and 9, and an example of (S113) is (S85) of FIG. ) And (S91) of FIG. An example of (S115) is (S88) and (S89) in FIGS. 8 and 9, and an example of (S116) is (S87) in FIGS.

The third embodiment may be a program that causes at least one computer to execute such a method for providing translation data, or a recording medium that can be read by at least one computer that records such a program. There may be.

According to the third embodiment, the same operational effects as those of the first embodiment and the second embodiment described above can be obtained.

Hereinafter, examples will be given and the above-described embodiments will be described in more detail. The present invention is not limited in any way by the following examples.

When receiving the translation data, the listener operates his / her listener device 30 to pair his / her listener device 30 with the speaker device 20. Pairing between the listener device 30 and the speaker device 20 is realized through authentication corresponding to a form of wireless communication (Bluetooth (registered trademark), ZigBee, NFC, Wi-Fi, etc.) between both terminals. The In this pairing process, the speaker device 20 (information acquisition unit 21) may acquire a terminal ID from each listener device 30. Furthermore, at the time of pairing, the speaker device 20 establishes a radio channel with each of the listener devices 30 and receives user profile information from each of the listener devices 30 (the information acquisition unit 21 and the information acquisition unit 51). ). This user profile information includes the language information of the listener. Thereby, the user of the listener device 30 can receive provision of translation data only by performing an instruction operation for pairing with the speaker device 20.

Further, the speaker device 20 can also acquire the listener's language information from the listener device 30 by using a human area network technology that uses the surface electric field of the human body. In this case, the information acquisition unit 21 and the information acquisition unit 51 acquire language information from the terminal device as the human body communication with the terminal device such as the listener device 30 is successful. In this case, the speaker apparatus 20 has a communication unit 15 that performs human body communication using human area network technology, and acquires language information through human body communication using the communication unit 15. In this way, the user of the listener device 30 simply holds the listener device 30 and simply touches the holder of the speaker device 20 with the body like a handshake, so that the translation data can be easily obtained. You can receive the offer.

In each of the above-described embodiments, an utterance in a conversation between a speaker and a listener who is a user of the listener device 30 can be a translation target. Furthermore, in each embodiment, the speech of a speaker at a lecture or seminar can be translated. In this case, there may be a plurality of listeners who wish to listen in different languages. Even if the number of listener devices 30 that can be paired with the speaker device 20 is limited, a plurality of listener devices 30 can communicate with the speaker device 20 by using the wireless multi-hop network. Further, even if a wireless multi-hop network is not used, all the listener devices 30 can be paired with any one speaker device 20 by using a plurality of speaker devices 20. According to each embodiment described above, each listener can listen to translation data in a desired language almost simultaneously.

In the plurality of flowcharts used in the above description, a plurality of steps (processes) are described in order, but the execution order of the steps executed in each embodiment is not limited to the description order. In each embodiment, the order of the illustrated steps can be changed within a range that does not hinder the contents. Moreover, each above-mentioned embodiment and each modification can be combined in the range with which the content does not conflict.

Some or all of the above contents can be specified as follows. However, the above-mentioned content is not limited to the following description.

1. Information acquisition means for acquiring language information from the terminal device;
Transmitting means for transmitting the language designation data corresponding to the acquired language information and the utterance data of the speaker to the server device;
Receiving means for receiving, from a server device, translation data in which the utterance data is translated into a language indicated by the language designation data;
Providing means for transmitting the received translation data to the terminal device;
An information processing apparatus comprising:
2. The information acquisition means acquires a plurality of different language information from a plurality of terminal devices,
The transmission means transmits the language designation data and the utterance data corresponding to the acquired plurality of language information to a server device,
The receiving means receives a plurality of translation data translated into a plurality of languages indicated by the language designation data from a server device in a state associated with language data corresponding to each translation data,
The providing unit is configured to receive the plurality of received translation data so that the plurality of terminal devices can respectively receive translation data corresponding to language information of each terminal device among the plurality of received translation data. Send,
1. The information processing apparatus described in 1.
3. The information acquisition means acquires the language information and terminal identification information from the terminal device, stores each terminal identification information and each language information in association with each other,
The providing means transmits the received translation data by specifying terminal identification information stored in association with language information corresponding to the translation data as a destination;
1. Or 2. The information processing apparatus described in 1.
4). The providing means associates the received translation data with language data corresponding to the translation data, and transmits by radio broadcast.
1. To 3. The information processing apparatus according to any one of the above.
5. Speech data acquisition means for acquiring speech data and language information of the speaker;
Further comprising
The transmission means transmits language data corresponding to the language information of the speaker, the speech data as the speech data, and the language designation data to a server device,
The receiving means receives translated text data from the server device as the translated data;
1. To 4. The information processing apparatus according to any one of the above.
6). The receiving unit is configured to generate a plurality of pairs of the translated text data and language data corresponding to the translated text data corresponding to a plurality of languages indicated by the language designation data, and utterance voice data based on the translated text data Receive in a state that can be associated with the text data of the voice recognition result of
5. The information processing apparatus described in 1.
7). The receiving means includes the translated text data, language data corresponding to the translated text data, text data of speech recognition results of speech voice data that is the basis of the translated text data, and reliability information of the speech recognition results. Receive in a state that can be associated,
5. Or 6. The information processing apparatus described in 1.
8). The information acquisition means acquires the language information from the terminal device with the success of human body communication with the terminal device.
1. To 7. The information processing apparatus according to any one of the above.

9. In a translation data providing method executed on at least one computer,
Get language information from the terminal
Transmitting the language designation data corresponding to the acquired language information and the utterance data of the speaker to the server device;
Receiving the translation data from the server device, wherein the speech data is translated into the language indicated by the language designation data,
Transmitting the received translation data to the terminal device;
Translation data provision method including the above.
10. Acquire multiple different language information from multiple terminal devices,
Transmitting the language designation data and the utterance data corresponding to the acquired plurality of language information;
Receiving a plurality of translation data translated into a plurality of languages indicated by the language designation data in a state associated with each of the language data corresponding to each translation data;
Transmitting the plurality of received translation data so that the plurality of terminal devices can respectively receive translation data corresponding to language information of each terminal device among the plurality of received translation data;
Further includes: The translation data providing method described in 1.
11. Obtaining terminal identification information from the terminal device;
Each terminal identification information and each language information are stored in association with each other.
Further including
For transmission to the terminal device, the received translation data is transmitted by designating terminal identification information stored in association with language information corresponding to the translation data as a destination.
9. Or 10. The translation data providing method described in 1.
12 For transmission to the terminal device, the received translation data is associated with language data corresponding to the translation data, and is transmitted by radio broadcast.
9. To 11. The translation data provision method according to any one of the above.
13 Obtaining speech data and language information of the speaker,
Further including
Transmission to the server device, the language data corresponding to the language information of the speaker, the speech data as the speech data, and the language designation data,
Reception from the server device receives translation text data as the translation data from the server device,
9. To 12. The translation data provision method according to any one of the above.
14 The reception from the server device is based on a plurality of pairs of the translated text data and language data corresponding to the translated text data corresponding to a plurality of languages indicated by the language designation data. Received in a state where it can be associated with the text data of the speech recognition result of the speech data.
13 The translation data providing method described in 1.
15. The server device receives the translated text data, language data corresponding to the translated text data, text data of speech recognition results of speech voice data that is the basis of the translated text data, and trust of the speech recognition results. Receive information in a state that can be correlated,
13 Or 14. The translation data providing method described in 1.
16. The acquisition of the language information is acquired from the terminal device with the success of human body communication with the terminal device,
9. To 15. The translation data provision method according to any one of the above.

17. 9. To 16. A program that causes at least one computer to execute the translation data providing method according to any one of the above.
18. 17. The recording medium which records the program as described in readable by a computer.

This application claims priority based on Japanese Patent Application No. 2014-140134 filed on July 8, 2014, the entire disclosure of which is incorporated herein.

Claims

Information acquisition means for acquiring language information from the terminal device;
Transmitting means for transmitting the language designation data corresponding to the acquired language information and the utterance data of the speaker to the server device;
Receiving means for receiving, from a server device, translation data in which the utterance data is translated into a language indicated by the language designation data;
Providing means for transmitting the received translation data to the terminal device;
An information processing apparatus comprising:
The information acquisition means acquires a plurality of different language information from a plurality of terminal devices,
The transmission means transmits the language designation data and the utterance data corresponding to the acquired plurality of language information to a server device,
The receiving means receives a plurality of translation data translated into a plurality of languages indicated by the language designation data from a server device in a state associated with language data corresponding to each translation data,
The providing unit is configured to receive the plurality of received translation data so that the plurality of terminal devices can respectively receive translation data corresponding to language information of each terminal device among the plurality of received translation data. Send,
The information processing apparatus according to claim 1.
The information acquisition means acquires the language information and terminal identification information from the terminal device, stores each terminal identification information and each language information in association with each other,
The providing means transmits the received translation data by specifying terminal identification information stored in association with language information corresponding to the translation data as a destination;
The information processing apparatus according to claim 1 or 2.
The providing means associates the received translation data with language data corresponding to the translation data, and transmits by radio broadcast.
The information processing apparatus according to any one of claims 1 to 3.
Speech data acquisition means for acquiring speech data and language information of the speaker;
Further comprising
The transmission means transmits language data corresponding to the language information of the speaker, the speech data as the speech data, and the language designation data to a server device,
The receiving means receives translated text data from the server device as the translated data;
The information processing apparatus according to any one of claims 1 to 4.
The receiving unit is configured to generate a plurality of pairs of the translated text data and language data corresponding to the translated text data corresponding to a plurality of languages indicated by the language designation data, and utterance voice data based on the translated text data Receive in a state that can be associated with the text data of the voice recognition result of
The information processing apparatus according to claim 5.
The receiving means includes the translated text data, language data corresponding to the translated text data, text data of speech recognition results of speech voice data that is the basis of the translated text data, and reliability information of the speech recognition results. Receive in a state that can be associated,
The information processing apparatus according to claim 5 or 6.
The information acquisition means acquires the language information from the terminal device with the success of human body communication with the terminal device.
The information processing apparatus according to any one of claims 1 to 7.
In a translation data providing method executed on at least one computer,
Get language information from the terminal
Transmitting the language designation data corresponding to the acquired language information and the utterance data of the speaker to the server device;
Receiving the translation data from the server device, wherein the speech data is translated into the language indicated by the language designation data,
Transmitting the received translation data to the terminal device;
Translation data provision method including the above.
A program for causing at least one computer to execute the translation data providing method according to claim 9.