WO2016006354A1 - Information processing device, and translation-data provision method


Info

Publication number
WO2016006354A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
language
information
translation
text data
Application number
PCT/JP2015/065266
Other languages
French (fr)
Japanese (ja)
Inventor
康憲 加藤
和樹 関谷
浩 中里
有一 好光
雅高 水澤
Original Assignee
NEC Solution Innovators, Ltd.
Application filed by NEC Solution Innovators, Ltd.
Priority to JP2016532491A (JPWO2016006354A1)
Publication of WO2016006354A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/40 - Processing or translation of natural language

Definitions

  • the present invention relates to a technique for providing a translation service.
  • Patent Document 1 proposes a multilingual communication method.
  • In this method, when a desired language is selected from a list on a monitor provided at each seat, the correspondence between the seat and the selected language is managed. Based on this correspondence, content is output in the selected language to the acoustic system and monitor of that seat.
  • Patent Document 2 proposes a multi-channel conversation system that can smoothly and easily transition between a channel in a chat system and a conference room in a VoIP (Voice over IP (Internet Protocol)) system.
  • This proposed system recognizes a voice conversation message transmitted and received in a VoIP conference room, translates the character string obtained as the recognition result, and sends a character string extracted from the translation result and a keyword extracted from that character string to a chat server.
  • the chat server transmits the character string information of the translation result and the extracted keyword to the client terminal as a character string conversation message.
  • A communication support method has also been proposed.
  • In this method, the client device generates an internal representation based on the first language by recognizing and analyzing speech data, and determines the importance of the internal representation.
  • The server device translates the internal representation into the second language in a mode corresponding to the importance.
  • In this way, a low-load translation process is automatically selected according to the importance, which shortens the response time until a translation result is obtained.
  • the server device provides multilingual translation results to a plurality of client devices.
  • Each user of each client device can thereby receive content in his or her desired language.
  • In the methods described above, each client device is required to prove the validity of its user and establish communication (a session) with the server device.
  • Each user's information is registered in the server device for validity verification. That is, with such methods, information about all conversation participants and lecture attendees remains on the server device. Such information can be regarded as personal information indicating personal preferences.
  • The present invention has been made in view of such circumstances, and realizes a technique for providing a listener with a translation service into a desired language without registering the listener's personal information in a server device.
  • An information processing apparatus according to a first aspect includes: information acquisition means for acquiring language information from a terminal device; transmission means for transmitting language designation data corresponding to the acquired language information and utterance data of a speaker to a server device; receiving means for receiving, from the server device, translation data in which the utterance data is translated into the language indicated by the language designation data; and providing means for transmitting the received translation data to the terminal device.
  • the second aspect relates to a translation data providing method executed by at least one computer.
  • The translation data providing method includes: acquiring language information from a terminal device; transmitting language designation data corresponding to the acquired language information and utterance data of a speaker to a server device; receiving, from the server device, translation data in which the utterance data is translated into the language indicated by the language designation data; and transmitting the received translation data to the terminal device.
  • A third aspect may be a program for causing at least one computer to execute the method of the second aspect, or a computer-readable recording medium on which such a program is recorded.
  • This recording medium includes a non-transitory tangible medium.
  • FIG. 1 is a diagram conceptually showing the system configuration of a translation system including a speaker device in the first embodiment.
  • the translation system includes a server device 10, a speaker device 20, and the like.
  • the translation system provides a translation service to the listener device 30 via the server device 10.
  • the translation system can include a plurality of server apparatuses 10 and a plurality of speaker apparatuses 20, and can also provide a translation service to a plurality of listener apparatuses 30 via one server apparatus 10.
  • the server device 10 and the speaker device 20 are communicably connected via the communication network 9.
  • the communication network 9 is a mobile phone line network, a Wi-Fi (Wireless Fidelity) line network, an Internet communication network, a dedicated line network, a LAN (Local Area Network), or the like.
  • the communication form of the communication network 9 is not limited.
  • The server device 10 is a so-called computer, and includes a CPU (Central Processing Unit) 2, a memory 3, an input/output interface (I/F) 4, a communication unit 7, and the like, as shown in FIG. 1.
  • the memory 3 is a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, or the like.
  • the input / output I / F 4 can be connected to a user interface device such as a display device (not shown) or an input device (not shown).
  • the communication unit 7 communicates with other computers such as the speaker device 20 and exchanges signals with other devices.
  • the hardware configuration of the server device 10 is not limited.
  • FIG. 2 is a diagram conceptually illustrating a hardware configuration example of the speaker device 20 in the first embodiment.
  • the speaker device 20 is a so-called computer such as a PC (Personal Computer), a mobile phone, a smartphone, a tablet terminal, and a wearable computer.
  • the speaker device 20 includes a CPU 11, a memory 12, a display unit 13, a touch sensor 14, a communication unit 15, a microphone unit 16, a speaker unit 17, and the like.
  • the CPU 11 is connected to other units via a communication line such as a bus.
  • the memory 12 is a RAM, a ROM, or an auxiliary storage device (such as a hard disk).
  • the display unit 13 includes a monitor such as an LCD (Liquid Crystal Display) or a CRT (Cathode Ray Tube) display, and performs display processing.
  • the touch sensor 14 receives an operation input from the user by sensing an external contact.
  • the touch sensor 14 may be a sensor that can detect a proximity state from the outside even in a non-contact state.
  • the display unit 13 and the touch sensor 14 may be realized as a touch panel unit.
  • the speaker device 20 may have an input / output interface (not shown) connected to an input device such as a mouse or a keyboard together with the touch sensor 14 or instead of the touch sensor 14.
  • the microphone unit 16 is a sound collection device.
  • the speaker unit 17 is a sound output device.
  • The communication unit 15 communicates with other devices wirelessly or by wire. For example, when the speaker device 20 is a portable terminal, the communication unit 15 connects wirelessly to the communication network 9, communicates with the communication unit 7 of the server device 10 via the communication network 9, and also performs wireless communication with the listener devices 30. Examples of the wireless communication between the speaker device 20 and the listener devices 30 include Bluetooth (registered trademark), ZigBee, NFC (Near Field Communication), and Wi-Fi. However, the form of the wireless communication is not limited.
  • The speaker device 20 can also include an imaging unit, a vibration sensor, an acceleration sensor, and the like, in addition to the hardware elements shown in FIG. 2.
  • the hardware configuration of the speaker device 20 is not limited.
  • the listener device 30 is a so-called computer and has a hardware configuration similar to that of the speaker device 20.
  • the hardware configuration of the listener device 30 is not limited as long as it can communicate with the speaker device 20 and can output the translation data sent from the speaker device 20.
  • the hardware configuration of the speaker device 20 and the listener device 30 may be different.
  • FIG. 3 is a diagram conceptually illustrating a processing configuration example of the speaker device 20 in the first embodiment.
  • the speaker device 20 includes an information acquisition unit 21, a correspondence storage unit 22, an utterance data acquisition unit 23, a transmission unit 24, a reception unit 25, a provision unit 26, and the like.
  • Each of these processing units is realized, for example, by the CPU 11 executing a program stored in the memory 12. The program may be installed from a portable recording medium, such as a CD (Compact Disc) or a memory card, or from another computer on the network via the communication unit 15, and stored in the memory 12.
  • the information acquisition unit 21 acquires language information and a terminal ID from the plurality of listener devices 30.
  • the language information is information on the language used by the user of each listener device 30, and indicates Japanese, English, French, German, Chinese, or the like.
  • the language information obtained from each listener device 30 can also indicate a plurality of languages. When the language information indicates a plurality of languages, priority may be given to each language.
  • the information acquisition unit 21 acquires a plurality of different language information from the plurality of listener devices 30.
  • the terminal ID is terminal identification information, and is used as an address of a destination or a transmission source in communication between the speaker device 20 and the listener device 30.
  • Specific methods by which the information acquisition unit 21 acquires the language information and the terminal ID are exemplified in the embodiment examples below.
  • the information acquisition unit 21 stores each acquired language information and each terminal ID in the correspondence storage unit 22 in association with each other.
  • FIG. 4 is a diagram illustrating an example of association information stored in the correspondence storage unit 22.
  • the correspondence storage unit 22 stores a terminal ID and language information in association with each other.
  • The language information and terminal IDs stored in the correspondence storage unit 22 may be those acquired by the information acquisition unit 21 as-is, or may be obtained by processing the information acquired by the information acquisition unit 21.
  • For example, the correspondence storage unit 22 may store a language ID corresponding to the language indicated by the language information.
  • When the language information indicates a plurality of languages, the correspondence storage unit 22 may store information of one language extracted from the plurality of languages.
  • Further, the terminal ID stored in the correspondence storage unit 22 may be identification data uniquely generated by the speaker device 20. In this case, the speaker device 20 manages the association between the uniquely generated identification data and the terminal ID acquired by the information acquisition unit 21.
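  • As a concrete illustration of this correspondence management, the following is a minimal Python sketch; the class and method names (`CorrespondenceStore`, `terminals_for_language`) are illustrative and do not appear in the patent.

```python
# Minimal sketch of the correspondence storage unit (22): it associates
# each terminal ID with the language information acquired from that
# listener device. All names here are illustrative, not from the patent.
class CorrespondenceStore:
    def __init__(self):
        self._language_by_terminal = {}  # terminal ID -> language info

    def register(self, terminal_id, language_info):
        # The stored values may be the acquired language information as-is,
        # or a processed form such as a language ID (see the text above).
        self._language_by_terminal[terminal_id] = language_info

    def terminals_for_language(self, language):
        # Used when providing translation data: find every terminal whose
        # language information matches the language data of a translation.
        return [tid for tid, lang in self._language_by_terminal.items()
                if lang == language]


store = CorrespondenceStore()
store.register("terminal-001", "en")
store.register("terminal-002", "fr")
store.register("terminal-003", "en")
print(store.terminals_for_language("en"))  # ['terminal-001', 'terminal-003']
```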
  • the utterance data acquisition unit 23 acquires the utterance voice data of the speaker.
  • For example, the utterance data acquisition unit 23 acquires, as utterance voice data, voice data obtained by converting the voice signal collected by the microphone unit 16 using PCM (Pulse Code Modulation).
  • The sound signal collected by the microphone unit 16 includes environmental sound in addition to the speech of the speaker. The utterance data acquisition unit 23 can therefore apply filter processing that removes the environmental sound to the acquired voice data and use the result as the utterance voice data. Further, the utterance data acquisition unit 23 may acquire utterance voice data that includes the silent periods when the speaker is not speaking, or utterance voice data from which such silent periods have been removed.
  • the method of acquiring speech voice data by the speech data acquisition unit 23 is not limited to such a method.
  • For example, the utterance data acquisition unit 23 may acquire utterance voice data in which the speaker's utterance has been recorded, stored in the memory 12, in a portable recording medium, or in another computer.
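  • As one hedged illustration of this acquisition path, the sketch below reads PCM utterance data from a recorded file with Python's standard `wave` module; the file name is hypothetical.

```python
import wave

# Sketch of acquiring recorded utterance voice data (PCM) from a file,
# one of the acquisition paths described above. "lecture.wav" is a
# hypothetical file name used only for illustration.
with wave.open("lecture.wav", "rb") as wav:
    sample_rate = wav.getframerate()
    pcm_frames = wav.readframes(wav.getnframes())  # raw PCM bytes

print(f"read {len(pcm_frames)} bytes of PCM audio at {sample_rate} Hz")
```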
  • the utterance data acquisition unit 23 further acquires information on the language used in the utterance of the speaker.
  • the utterance data acquisition unit 23 may have language information of the speaker in advance.
  • the language information of the speaker may be input by the user operating the input device based on an input screen displayed on the monitor.
  • The transmission unit 24 transmits, to the server device 10, the utterance voice data acquired by the utterance data acquisition unit 23, language data corresponding to the language information of the speaker, and language designation data corresponding to the language information stored in the correspondence storage unit 22.
  • For the language designation data and the language data, for example, the format defined as BCP 47 by the IETF (Internet Engineering Task Force) is used.
  • the data format of the language designation data and language data is arbitrary.
  • the language data may be the language information of the speaker acquired by the utterance data acquisition unit 23.
  • the language designation data may be the language information itself stored in the correspondence storage unit 22.
  • the transmission timing of the speech voice data, the language data of the speaker, and the language designation data may not be the same.
  • For example, the transmission unit 24 can transmit the language designation data before the other data, once the language information has been acquired by the information acquisition unit 21. Likewise, when the utterance data acquisition unit 23 holds the language information of the speaker in advance, the transmission unit 24 can transmit the speaker's language data before the other data.
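  • The patent does not fix a wire format for these items; as one possibility, they could be serialized as JSON with BCP 47 language tags, as in the hedged sketch below (the field names are invented).

```python
import json

# Hypothetical request payload carrying the speaker's language data and
# the language designation data collected from the listener devices.
# BCP 47 tags ("ja", "en-US", ...) are used as the text suggests; the
# field names are invented for illustration, not defined by the patent.
request = {
    "speaker_lang": "ja",             # language data of the speaker
    "target_langs": ["en-US", "fr"],  # language designation data
    # The utterance voice data itself would be sent separately,
    # e.g. as binary frames over the same session.
}
print(json.dumps(request, ensure_ascii=False))
```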
  • The receiving unit 25 receives, from the server device 10, translated text data in which the utterance voice data transmitted by the transmission unit 24 has been translated into the language indicated by the language designation data transmitted along with it.
  • When the language designation data indicates a plurality of languages, the receiving unit 25 receives a plurality of pairs of translated text data and corresponding language data, one pair per language, in a state where they can be associated with the text data of the speech recognition result of the original utterance voice data.
  • the receiving unit 25 receives a plurality of pairs of translation text data and language data and text data of a speech recognition result as one communication message (response data).
  • the receiving unit 25 may receive the plurality of pairs and the text data of the speech recognition result as separate communication messages (response data).
  • association identification data for associating the plurality of pairs with the text data of the speech recognition result may be set in each communication message.
  • The receiving unit 25 may also receive a plurality of speech recognition result text data for a plurality of pairs that include one translated text data. In this case as well, by using the association identification data, the plurality of speech recognition result text data are linked, and the linked text data is associated with the plurality of pairs.
  • FIG. 5 is a diagram illustrating an example of normal response data from the server device 10.
  • FIG. 6 is a diagram illustrating an example of abnormal response data from the server device 10.
  • the response data from the server device 10 is described in a JSON (JavaScript (registered trademark) Object Notation) format.
  • The value of the key “result” indicates whether the response is normal (OK or ERROR); the array of the key “recg” indicates the speech recognition result; the array of the key “trans” indicates the translation result; the value of the key “code” indicates an error code; and the value of the key “message” indicates an error message.
  • the “recg” array has an element “region” corresponding to the language data of the speaker and an element “text” corresponding to the text data of the speech recognition result of the speech data.
  • the “trans” array includes pairs of an element “region” corresponding to language data and an element “text” corresponding to translated text data corresponding to the language data, for the number of languages indicated by the language designation data.
  • That is, translated text data and its language data are associated as one element of the “trans” array, and the plurality of pairs of translated text data and language data are associated with the speech recognition result through the relationship between the “recg” array and the “trans” array within the response data.
  • However, the response data received by the receiving unit 25 from the server device 10 is not limited to the format shown in FIGS. 5 and 6. For example, if association identification data is set in each response data, the “recg” array and the “trans” array may be received in different response data.
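  • Based on the keys described above, the normal response of FIG. 5 and the abnormal response of FIG. 6 plausibly take shapes like the following Python literals; the concrete values are invented for illustration.

```python
# Plausible shapes of the JSON response data: the key names ("result",
# "recg", "trans", "code", "message") come from the text above, while
# the concrete values are invented for illustration.
normal_response = {
    "result": "OK",
    "recg": [
        {"region": "ja", "text": "こんにちは"},  # speech recognition result
    ],
    "trans": [
        {"region": "en", "text": "Hello"},      # one pair per language
        {"region": "fr", "text": "Bonjour"},    # indicated by the
    ],                                          # language designation data
}

error_response = {
    "result": "ERROR",
    "code": 500,                      # error code (invented value)
    "message": "recognition failed",  # error message (invented value)
}
```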
  • The transmission unit 24 and the reception unit 25 can perform two-way communication with the server device 10 in one session by using, for example, a WebSocket. In this way, utterance voice data flowing from the speaker device 20 to the server device 10 and translated text data flowing in the reverse direction can be exchanged asynchronously. In other words, the server device 10 can freely divide the received utterance voice data and sequentially transmit, at arbitrary timing, the translated text data converted from each divided partial utterance voice data to the speaker device 20.
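  • A minimal sketch of such a single-session, asynchronous exchange, assuming the third-party Python `websockets` package and an illustrative server URL:

```python
import asyncio
import json
import websockets  # third-party package, used here as one possible client

# Sketch of the two-way exchange described above: utterance voice data
# flows to the server while translated text flows back asynchronously.
# The URL and the message shapes are assumptions for illustration.
async def relay(audio_chunks):
    async with websockets.connect("wss://server.example/translate") as ws:

        async def send_audio():
            for chunk in audio_chunks:  # arbitrarily divided PCM pieces
                await ws.send(chunk)

        async def receive_translations():
            async for message in ws:    # arrives at the server's pace
                response = json.loads(message)
                print(response.get("result"), response.get("trans"))

        await asyncio.gather(send_audio(), receive_translations())

# asyncio.run(relay(pcm_chunks)) would drive the exchange.
```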
  • The providing unit 26 transmits the translated text data received by the receiving unit 25 by designating, as the destination, the terminal ID stored in the correspondence storage unit 22 in association with the language information corresponding to that translated text data. At this time, the providing unit 26 uses the language data received in association with the translated text data and extracts, from the correspondence storage unit 22, the terminal IDs associated with language information that matches the language data. When a plurality of terminal IDs are extracted from the correspondence storage unit 22 for one language data, the providing unit 26 copies the translated text data by the number of extracted terminal IDs and transmits the copies to the plurality of listener devices 30 indicated by those terminal IDs.
  • In this way, the providing unit 26 transmits the plurality of received translated text data so that each of the plurality of listener devices 30 can receive the translated text data corresponding to its own language information.
  • Specifically, the providing unit 26 extracts a device ID from the correspondence storage unit 22 for each translated text data, and transmits each translated text data by designating the extracted device ID as the destination.
  • the providing unit 26 may transmit the text data of the speech recognition result to the listener device 30 together with the translated text data.
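  • Reusing the `CorrespondenceStore` sketch from earlier, the routing performed by the providing unit 26 could look like this; `send_to` stands in for whatever short-range transport (Bluetooth, Wi-Fi, etc.) carries the data.

```python
# Sketch of the providing unit (26): for each translation result, look up
# every terminal whose stored language information matches its language
# data and send that terminal its own copy of the translated text.
def provide(store, trans_array, send_to):
    for item in trans_array:            # e.g. the "trans" array above
        language, text = item["region"], item["text"]
        for terminal_id in store.terminals_for_language(language):
            send_to(terminal_id, text)  # one copy per matching terminal


# Example: both English listeners each get their own copy of "Hello".
provide(store,
        [{"region": "en", "text": "Hello"},
         {"region": "fr", "text": "Bonjour"}],
        lambda tid, text: print(tid, "<-", text))
```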
  • FIG. 7 is a diagram conceptually illustrating a processing configuration example of the server device 10.
  • the server device 10 includes a voice recognition unit 31, a translation unit 32, and the like.
  • Each of these processing units is realized, for example, by executing a program stored in the memory 3 by the CPU 2.
  • the program may be installed from a portable recording medium such as a CD or a memory card or another computer on the network via the communication unit 7 and stored in the memory 3.
  • the voice recognition unit 31 receives the voice data from the speaker device 20 and performs voice recognition processing on the voice data.
  • a known voice recognition technique may be used for the voice recognition process.
  • For example, the speech recognition unit 31 converts the utterance voice data into utterance text data using an acoustic model built from collected sound waveform data and a language model built from collected words and word sequences.
  • The speech recognition unit 31 switches the acoustic model and the language model used in the speech recognition processing to those for the language indicated by the language data, based on the language data of the speaker sent from the speaker device 20.
  • Alternatively, the server device 10 may have a speech recognition unit 31 customized for each language. In this case, based on the language data of the speaker sent from the speaker device 20, the server device 10 can switch which speech recognition unit 31 is executed.
  • the translation unit 32 performs a translation process (machine translation) on the utterance text data obtained by the speech recognition unit 31 from the language indicated by the language data of the speaker into the language indicated by the language designation data.
  • For the translation process, a well-known translation technique, such as a rule-based or statistics-based translation technique, may be used.
  • When the language designation data indicates a plurality of different languages, the translation unit 32 performs a translation process for each of those languages on the utterance text data.
  • the translation unit 32 generates translation text data of each language indicated by the language designation data by the translation process.
  • The translation unit 32 transmits, to the speaker device 20 as response data, pairs of the generated translated text data and the language data corresponding to each translated text data, together with the text data (utterance text data) of the speech recognition result of the utterance voice data from which the translated text data was derived, in a mutually associable state.
  • the translation unit 32 transmits response data having the format shown in FIGS. 5 and 6 to the speaker apparatus 20.
  • The translation unit 32 may wait until utterance text data of a length sufficient for translation is obtained by the speech recognition unit 31 before executing the translation process. That is, the data unit translated by the translation unit 32 and the data unit processed by the speech recognition unit 31 may differ. When the speech recognition unit 31 has obtained utterance text data but the translation process has not yet been performed on that data, the translation unit 32 may transmit, as response data to the speaker device 20, the speech recognition result text data (utterance text data) together with association identification data for linking that text data with the translated text data to be produced later. In this way, the speaker device 20 can be prevented from waiting a long time without receiving any response data from the server device 10.
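  • Putting the server-side steps together, here is a hedged sketch of how the server device 10 might assemble a normal response; `recognize` and `translate` stand in for the unspecified recognition and translation engines.

```python
# Server-side sketch: recognize the utterance in the speaker's language,
# then translate the recognized text into every designated language.
# `recognize` and `translate` are placeholders for the engines, which
# the patent leaves open (any known technique may be used).
def build_response(pcm_audio, speaker_lang, target_langs,
                   recognize, translate):
    utterance_text = recognize(pcm_audio, speaker_lang)
    return {
        "result": "OK",
        "recg": [{"region": speaker_lang, "text": utterance_text}],
        "trans": [
            {"region": lang,
             "text": translate(utterance_text, speaker_lang, lang)}
            for lang in target_langs
        ],
    }
```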
  • FIG. 8 is a flowchart showing an operation example of the speaker device 20 in the first embodiment.
  • the translation data providing method in the first embodiment is executed by at least one computer such as the speaker device 20.
  • Each illustrated process is executed by the corresponding processing unit of the speaker device 20. Since each process is the same as the processing of the corresponding unit described above, details of each process are omitted.
  • the speaker device 20 acquires language information and a device ID from each of the plurality of listener devices 30 (S81). The speaker device 20 associates the acquired language information with the device ID and stores them in the correspondence storage unit 22 (S82).
  • the speaker device 20 acquires the speech data and language information of the speaker (S83).
  • The speaker device 20 transmits, to the server device 10, the language data corresponding to the language information acquired in (S83), the utterance voice data acquired in (S83), and the language designation data corresponding to the language information stored in the correspondence storage unit 22 in (S82) (S84).
  • When the stored language information indicates a plurality of languages, language designation data indicating the plurality of languages is transmitted to the server device 10.
  • The server device 10 receives the data transmitted in (S84), performs speech recognition processing corresponding to the language indicated by the speaker's language data on the received utterance voice data, and generates utterance text data.
  • the server device 10 performs a translation process on the utterance text data from the language of the speaker into the language indicated by the received language designation data. As a result, the server device 10 generates translated text data in which the speech voice data is translated into the language indicated by the language designation data.
  • the speaker device 20 receives response data for the data transmitted in (S84) from the server device 10 (S85).
  • the response data includes a value indicating whether the response is normal.
  • the response data indicating the normal response further includes a pair of translation text data and language data corresponding to the translation text data, and text data of a speech recognition result of the utterance voice data that is the basis of the translation text data.
  • When the language designation data indicates a plurality of languages, the speaker device 20 receives, as response data, a plurality of pairs of translated text data corresponding to the plurality of languages and language data corresponding to each translated text data, in a state where they can be associated with the text data of the speech recognition result of the original utterance voice data.
  • The speaker device 20 may also receive the text data of a speech recognition result and association identification data as response data.
  • The speaker device 20 determines whether or not the response data indicates a normal response (S86). If the response data does not indicate a normal response (S86; NO), the speaker device 20 outputs error information based on the information set in the response data (S87). The error information to be output may be set in the response data as illustrated in FIG. 6. The output form of the error information is arbitrary.
  • the speaker device 20 can output error information to the monitor of the display unit 13.
  • the speaker device 20 may cause the speaker unit 17 to send out a voice reading out the error information or a sound corresponding to the error information. Further, the speaker device 20 may transmit error information to each listener device 30.
  • When the response data indicates a normal response (S86; YES), the speaker device 20 specifies the destination of the translated text data included in the response data (S88). Specifically, the speaker device 20 extracts, from the correspondence storage unit 22, the device ID associated with the language information corresponding to the translated text data, and uses the extracted device ID as the destination of the translated text data. At this time, a plurality of destinations (device IDs) may be specified for one translated text data.
  • When the response data includes a plurality of translated text data in a plurality of different languages, the speaker device 20 specifies a destination (device ID) for each of the plurality of translated text data.
  • the speaker device 20 transmits desired translation text data to each listener device 30 based on the destination specified in (S88) (S89).
  • When a plurality of terminal IDs are extracted for one translated text data, the speaker device 20 copies that translated text data by the number of extracted terminal IDs and transmits the copies to the plurality of listener devices 30 indicated by those terminal IDs.
  • In this way, the speaker device 20 transmits the plurality of translated text data so that each listener device 30 can receive the translated text data corresponding to its own language information.
  • the speaker device 20 may transmit the text data of the speech recognition result to the listener device 30 together with the translated text data.
  • When the response data indicates a normal response (S86; YES) but no translated text data is included in the response data, the speaker device 20 holds the response data and waits for the next response data (not shown).
  • When the speaker device 20 subsequently receives response data including the translated text data, it links the text data of the speech recognition results and associates the linked data with the translated text data based on the association identification data. In this case, the speaker device 20 may transmit to the listener device 30 only the text data of the speech recognition result contained in the response data that did not include translated text data.
  • The listener device 30 acquires the translated text data obtained by translating the utterance voice data into the language indicated by the language information it sent to the speaker device 20, and displays the translated text data on its monitor. The listener device 30 can also output a voice reading out the translated text data. Further, the listener device 30 may output the text data of the speech recognition result of the original utterance voice data that is received together with the translated text data. When the listener device 30 receives the text data of a speech recognition result without translated text data, it may output only that text data.
  • The steps executed in the first embodiment and their execution order are not limited to the example of FIG. 8.
  • For example, the utterance voice data and the language information of the speaker acquired in (S83) may be acquired at different timings.
  • the language information of the speaker can be acquired before (S81).
  • the translation data providing method may include a step of displaying the text data of the speech recognition result and the translation text data included in the response data indicating a normal response on the monitor of the display unit 13.
  • As described above, in the first embodiment, the language information and the device ID are acquired by the speaker device 20 from each listener device 30 to which translation data is to be provided, and the language information and the device ID are stored in association with each other in the correspondence storage unit 22.
  • Utterance voice data that is the source data for translation is acquired by the speaker device 20, and language designation data corresponding to the correspondence information stored in the correspondence storage unit 22 is transmitted together with the utterance voice data from the speaker device 20 to the server device 10.
  • In the server device 10, the utterance voice data is converted into utterance text data by voice recognition, and the utterance text data is translated into the languages indicated by the language designation data.
  • The translated text data is sent from the server device 10 to the speaker device 20, and is then transmitted from the speaker device 20 to the listener devices 30 by designating as the destination the device ID stored in the correspondence storage unit 22 in association with the language information corresponding to the translated text data.
  • the listener device 30 can acquire the translation text data generated by the server device 10 via the speaker device 20 that acquires the speech voice data.
  • the listener device 30 can obtain the translated text data from the speaker device 20 without accessing the server device 10 by providing the language information and the device ID to the speaker device 20.
  • the server device 10 only needs to recognize the speaker device 20, and does not need to recognize which listener device 30 receives the translated text data to be transmitted. Therefore, according to the first embodiment, a translation service into a desired language can be provided to the user without registering personal information of the user (listener) of the listener device 30 in the server device 10.
  • the language information and the terminal ID of the listener device 30 are stored in the speaker device 20.
  • each user of the speaker device 20 and the listener device 30 is in a relationship between the speaker and the listener, or a relationship close thereto (for example, a relationship between the person who acquires the speech data and the listener of the speech).
  • Moreover, the speaker device 20 is a party to the utterance that is the source of the translation. Therefore, even if such information is stored in the speaker device 20, it is unlikely to lead to a leakage of personal information.
  • language data corresponding to the language information of the speaker is transmitted from the speaker device 20 to the server device 10.
  • Since the server device 10 can switch its speech recognition process and translation process according to this language data, it can support a plurality of translation forms.
  • When the language information acquired from the listener devices 30 indicates a plurality of different languages, a plurality of pairs of translated text data and language data corresponding to each translated text data are provided from the server device 10 to the speaker device 20 in a state where they can be associated with the text data of the speech recognition result of the utterance voice data from which the translated text data was derived.
  • If the text data of the speech recognition result is displayed on the monitor of the speaker device 20, the speaker who is the user of the speaker device 20, or another person who can hear the utterance, can check the recognition result by looking at the text data.
  • In the second embodiment, the speaker device 20 further acquires reliability information of the speech recognition from the server device 10, in addition to the translated text data and the like.
  • The second embodiment will be described below, focusing on the contents that differ from the first embodiment; description of the contents that are the same as in the first embodiment is omitted as appropriate.
  • The speech recognition unit 31 generates utterance text data by performing speech recognition processing on the utterance voice data, and further calculates the reliability of the speech recognition result. For example, the speech recognition unit 31 calculates the likelihood of each candidate word derived using the acoustic model and the language model, and can calculate the reliability from the difference between the likelihood of the word finally selected from the candidates and the likelihood of the words that were not selected. In this case, the larger the likelihood difference, the higher the reliability, and the smaller the difference, the lower the reliability.
  • a known method may be used as a method for calculating the reliability of the speech recognition result.
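  • A toy sketch of this likelihood-gap heuristic follows; the clamping to [0, 1] is an assumption, since the text only states that a larger gap means higher reliability.

```python
# Toy sketch of the reliability heuristic described above: compare the
# likelihood of the finally selected word with that of the best rejected
# candidate. The normalization to [0, 1] is an assumption.
def word_reliability(chosen_likelihood, rejected_likelihoods):
    if not rejected_likelihoods:
        return 1.0  # unopposed candidate: treat as fully reliable
    gap = chosen_likelihood - max(rejected_likelihoods)
    return max(0.0, min(1.0, gap))

print(word_reliability(0.9, [0.2, 0.1]))  # large gap -> high reliability
print(word_reliability(0.5, [0.48]))      # small gap -> low reliability
```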
  • The translation unit 32 transmits the translated text data, the language data corresponding to the translated text data, the text data of the speech recognition result of the utterance voice data from which the translated text data was derived, and the reliability information of the speech recognition result to the speaker device 20 in a mutually associable state.
  • When the translation unit 32 has not yet performed translation processing on the utterance text data obtained by the speech recognition unit 31, it may transmit, as response data to the speaker device 20, the text data of the speech recognition result (utterance text data), association identification data for associating that text data with the translated text data that will be the translation result, and the reliability information of the speech recognition result.
  • The receiving unit 25 receives the translated text data, the language data corresponding to the translated text data, the text data of the speech recognition result of the utterance voice data from which the translated text data was derived, and the reliability information of the speech recognition result in a mutually associable state. When these are received as separate response data, association identification data may be set in each response data.
  • the providing unit 26 determines whether to transmit the received translation text data as it is to the listener device 30 based on the reliability information received by the receiving unit 25.
  • When the reliability information indicates a reliability equal to or higher than a predetermined value, the providing unit 26 transmits the translated text data to the listener devices 30 as in the first embodiment.
  • When the reliability information indicates a reliability lower than the predetermined value, the providing unit 26 outputs an indication that the reliability is low, since the accuracy of the translated text data is likely to be low.
  • For example, the providing unit 26 may display an indication that the reliability is low on the monitor of the display unit 13, or may output it as sound from the speaker unit 17.
  • the predetermined value to be compared with the reliability is a reliability threshold and is held in advance by the providing unit 26.
  • In this case, the providing unit 26 may refrain from sending the translated text data to the listener devices 30, or may let the user decide whether or not to send it.
  • For example, the providing unit 26 may display, together with the low-reliability indication, an operation button for selecting whether or not to transmit to the listener devices 30, and decide whether to transmit the translated text data in response to a user operation on that button.
  • the providing unit 26 may transmit reliability information to the listener device 30 together with the translation text data.
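  • The decision logic of the providing unit 26 in this embodiment can be summarized by the hedged sketch below; the threshold value and the callbacks are illustrative.

```python
# Sketch of the reliability gate in the providing unit (26): forward the
# translation only when the reliability clears the pre-held threshold;
# otherwise surface a low-reliability warning and, optionally, let the
# user decide. The threshold value and callbacks are illustrative.
RELIABILITY_THRESHOLD = 0.6  # the "predetermined value", held in advance

def gate_and_provide(translation, reliability, provide, warn, ask_user=None):
    if reliability >= RELIABILITY_THRESHOLD:
        provide(translation)                 # as in the first embodiment
        return
    warn("speech recognition reliability is low")
    if ask_user is not None and ask_user():  # user chose to send anyway
        provide(translation)
```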
  • FIG. 9 is a flowchart showing an operation example of the speaker device 20 in the second embodiment.
  • The execution subject of the translation data providing method in the second embodiment is the same as in the first embodiment. Since each process is the same as the processing of the corresponding unit of the speaker device 20 described above, details of each process are omitted.
  • In FIG. 9, steps having the same contents as those in FIG. 8 are denoted by the same reference numerals as in FIG. 8.
  • the speaker device 20 executes (S81) to (S84) as in the first embodiment.
  • The server device 10 receives the data transmitted in (S84) and executes the speech recognition process and the translation process as in the first embodiment, thereby generating translated text data in which the utterance voice data is translated into the languages indicated by the language designation data.
  • the server device 10 calculates the reliability of the voice recognition result.
  • The server device 10 transmits, as response data to the speaker device 20, the translated text data, the language data corresponding to the translated text data, the text data of the speech recognition result of the utterance voice data from which the translated text data was derived, and the reliability information of the speech recognition result, in a mutually associable state.
  • The server device 10 may also transmit the text data of the speech recognition result, the association identification data, and the reliability information of the speech recognition result as response data to the speaker device 20, without the translated text data.
  • the speaker apparatus 20 receives the response data from the server apparatus 10 (S91).
  • The response data includes a value indicating whether or not the response is normal, pairs of translated text data and language data corresponding to the translated text data, the text data of the speech recognition result of the utterance voice data from which the translated text data was derived, and the reliability information of the speech recognition result.
  • When the language designation data indicates a plurality of languages, the speaker device 20 receives, as response data, a plurality of pairs of translated text data corresponding to the plurality of languages and language data corresponding to each translated text data, in a state associated with the text data and the reliability information of the speech recognition result of the original utterance voice data. Further, the speaker device 20 may receive the text data of the speech recognition result, association identification data, and the reliability information of the speech recognition result as response data.
  • the speaker device 20 determines whether or not the response data indicates a normal response (S92). If the response data does not indicate a normal response (S92; NO), the speaker device 20 outputs error information as in the first embodiment (S87).
  • When the response data indicates a normal response (S92; YES), the speaker device 20 further determines whether or not the reliability information included in the response data indicates a reliability lower than a predetermined value (S93). When the reliability information indicates a reliability equal to or higher than the predetermined value (S93; NO), the speaker device 20 specifies the destination of the translated text data included in the response data (S88) as in the first embodiment, and transmits the appropriate translated text data to each listener device 30 (S89). In the second embodiment, the speaker device 20 may transmit the reliability information to the listener devices 30 together with the translated text data.
  • The listener device 30 receives the translated text data from the speaker device 20 and displays it on its monitor, as in the first embodiment. In the second embodiment, the listener device 30 can also output the reliability information received together with the translated text data.
  • When the reliability information indicates a reliability lower than the predetermined value (S93; YES), the speaker device 20 presents an indication that the reliability is low (S94).
  • For example, the speaker device 20 may display the indication on the monitor of the display unit 13, or output it as sound from the speaker unit 17.
  • Further, the speaker device 20 presents the low-reliability indication and causes the monitor to display an operation screen that allows the user to select whether or not to transmit the translated text data to the listener devices 30.
  • The speaker device 20 determines, through a user operation on this operation screen, whether or not the user has selected transmission (S95). When the speaker device 20 determines that the user has selected transmission (S95; YES), it executes (S88) as described above. If the speaker device 20 determines that the user has not selected transmission (S95; NO), it outputs error information (S87).
  • In FIG. 9, a plurality of steps (processes) are shown in order, but the steps executed in the second embodiment and their execution order are not limited to this example.
  • For example, when (S95) shown in FIG. 9 is omitted and the reliability is lower than the predetermined value (S93; YES), the speaker device 20 may output error information (S87) without sending the translated text data to the listener devices 30. Further, the speaker device 20 may always present the reliability information included in the response data, regardless of the comparison between the reliability and the predetermined value.
  • As described above, in the second embodiment, the reliability information of the speech recognition result is provided from the server device 10 to the speaker device 20 in a state where it can be associated with the translated text data, the language data, and the text data of the speech recognition result.
  • Thereby, the speaker device 20 can determine, based on the reliability information, whether or not to transmit the translated text data to the listener devices 30 as-is.
  • When the reliability of the speech recognition result is low, the accuracy of the text data of the speech recognition result is low, and as a result, the accuracy of the translated text data converted from that text data is also low. Therefore, by using the reliability information, it is possible to prevent erroneous translation contents from being provided to the listener devices 30.
  • Further, since the speaker device 20 presents the reliability information, the speaker can be made aware that the reliability of the voice recognition is low and can be given a chance to rephrase. Thereby, the utterance content can be appropriately conveyed to the listeners in other languages.
  • Further, since the speaker device 20 lets the user choose whether to transmit the translated text data to the listener devices 30, translated text data that is in fact correctly translated can still be provided to the listener devices 30 even when the reliability is low.
  • the translation system can also include a plurality of server devices 10.
  • the server device 10 having the voice recognition unit 31 and the server device 10 having the translation unit 32 may be different devices.
  • There may also be cases where the server device 10 to which the speaker device 20 transmits the utterance voice data and the like differs from the server device 10 from which the speaker device 20 receives the translated text data and the like.
  • Different server devices 10 may be provided for each translated language.
  • the listener device 30 may communicate with the speaker device 20 via another listener device 30.
  • the speaker device 20 and the plurality of listener devices 30 may form a wireless multi-hop network.
  • Thereby, a listener device 30 located where the radio waves from the speaker device 20 do not reach can also receive translation data originating from the speaker device 20.
  • a known technique may be used as a data propagation technique in the wireless multi-hop network.
  • FIG. 10 is a diagram conceptually illustrating a processing configuration example of the information processing apparatus in the third embodiment.
  • the information processing apparatus 50 includes an information acquisition unit 51, a transmission unit 52, a reception unit 53, a provision unit 54, and the like.
  • the information acquisition unit 51 acquires language information from the terminal device.
  • the transmission unit 52 transmits the language designation data corresponding to the language information acquired by the information acquisition unit 51 and the utterance data of the speaker to the server device.
  • the receiving unit 53 receives from the server device the translation data in which the utterance data is translated into the language indicated by the language designation data.
  • the providing unit 54 transmits the received translation data to the terminal device.
  • An example of the information processing apparatus 50 is the speaker apparatus 20 described above.
  • An example of the terminal device is the listener device 30 described above, and an example of the server device is the server device 10 described above.
  • The server device from which the receiving unit 53 receives the translation data may be different from the server device to which the transmission unit 52 transmits the utterance data.
  • the utterance data transmitted by the transmission unit 52 may not be voice data.
  • the transmission unit 52 may transmit the utterance text data as the utterance data to the server device.
  • the utterance text data may be input by the user operating the input device of the information processing device 50.
  • the information processing apparatus 50 may include the voice recognition unit 31 described above, and the utterance text data may be converted from the utterance voice data by the voice recognition unit 31. In this case, the information processing apparatus 50 can generate text data as a speech recognition result and calculate the reliability of speech recognition.
  • the server device that is the transmission destination of the utterance data may not have the voice recognition unit 31.
  • Further, the transmission unit 52 may not transmit language data corresponding to the language information of the speaker. This corresponds to a case where the language of the speaker is fixed to one language, or a case where the language can be automatically recognized from the utterance data on the server device side.
  • the translation data received by the receiving unit 53 may be voice data instead of text data.
  • In this case, the server device generates and transmits translated voice data.
  • Further, the receiving unit 53 does not need to receive language data corresponding to the translation data, and may not receive the text data of the voice recognition result either. This is because only the translation data needs to be provided to the terminal device, and the text data of the speech recognition result does not necessarily have to be presented on the information processing apparatus 50.
  • the translation data transmitted by the providing unit 54 may be voice data instead of text data.
  • the providing unit 54 may generate translated speech data that reads the translated text data, and transmit the translated speech data to the terminal device.
  • Further, the providing unit 54 can transmit the translation data by broadcast, in association with the language data corresponding to the translation data, instead of by unicast communication designating a terminal ID.
  • In this case, the terminal device may extract the translation data associated with its desired language data from the received broadcast data.
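  • On the terminal side, receiving such a broadcast reduces to filtering by language data, as in this hedged sketch (field names follow the earlier illustrative response shape).

```python
# Sketch of the broadcast variant: the information processing device
# broadcasts every (language data, translation data) pair, and each
# terminal keeps only the pair that matches its own desired language.
def filter_broadcast(broadcast_items, desired_lang):
    return [item["text"] for item in broadcast_items
            if item["region"] == desired_lang]


items = [{"region": "en", "text": "Hello"},
         {"region": "fr", "text": "Bonjour"}]
print(filter_broadcast(items, "fr"))  # ['Bonjour']
```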
  • An example of the specific processing content of the information acquisition unit 51 is indicated by the information acquisition unit 21 described above.
  • the information acquisition unit 51 may not acquire the device ID when the providing unit 54 transmits the translation data by wireless broadcast.
  • the information processing apparatus 50 may not include the correspondence storage unit 22.
  • For example, the information acquisition unit 51 may store each language information and each terminal ID in association with each other in the correspondence storage unit 22 of another computer. Alternatively, the information acquisition unit 51 may itself hold the acquired language information and terminal IDs.
  • The information processing apparatus 50 shown in FIG. 10 has, for example, the same hardware configuration as the speaker device 20 shown in FIG. 2, and the above-described processing units are realized by a program being processed in the same manner as in the speaker device 20.
  • the hardware configuration of the information processing apparatus 50 is not limited.
  • FIG. 11 is a flowchart showing an operation example of the information processing apparatus 50 in the third embodiment.
  • the translation data providing method in the third embodiment is executed by at least one computer such as the information processing apparatus 50.
  • each illustrated process is performed by each processing unit included in the information processing apparatus 50.
  • the translation data providing method in this embodiment includes (S111) to (S116).
  • In (S111), the computer acquires language information from the terminal device.
  • In (S112), the computer transmits the language designation data corresponding to the language information acquired in (S111) and the utterance data of the speaker to the server device.
  • In (S113), the computer receives response data from the server device.
  • When the response data indicates a normal response (S114; YES), the response data includes translation data in which the utterance data is translated into the language indicated by the language designation data.
  • When the response data does not indicate a normal response (S114; NO), the computer outputs error information (S116).
  • In (S115), the computer transmits the translation data received in (S113) to the terminal device.
  • An example of (S111) is (S81) of FIGS. 8 and 9, an example of (S112) is (S84) of FIGS. 8 and 9, and an example of (S113) is (S85) of FIG. 8 and (S91) of FIG. 9. An example of (S115) is (S88) and (S89) of FIGS. 8 and 9, and an example of (S116) is (S87) of FIGS. 8 and 9.
  • The third embodiment may also be a program that causes at least one computer to execute such a translation data providing method, or a computer-readable recording medium on which such a program is recorded.
  • When wishing to receive translation data, the listener operates his or her own listener device 30 to pair it with the speaker device 20. Pairing between the listener device 30 and the speaker device 20 is realized through authentication corresponding to the form of wireless communication (Bluetooth (registered trademark), ZigBee, NFC, Wi-Fi, etc.) between the two terminals.
  • the speaker device 20 (information acquisition unit 21) may acquire a terminal ID from each listener device 30.
  • The speaker device 20 establishes a radio channel with each listener device 30 and receives user profile information from each listener device 30 (information acquisition unit 21, information acquisition unit 51).
  • This user profile information includes the language information of the listener. Thereby, the user of the listener device 30 can receive provision of translation data only by performing an instruction operation for pairing with the speaker device 20.
  • the speaker device 20 can also acquire the listener's language information from the listener device 30 by using a human area network technology that uses the surface electric field of the human body.
  • For example, the information acquisition unit 21 and the information acquisition unit 51 acquire the language information from a terminal device, such as the listener device 30, upon successful human body communication with that terminal device.
  • In this case, the speaker device 20 has a communication unit 15 that performs human body communication using human area network technology, and acquires the language information through human body communication using the communication unit 15. In this way, the user of the listener device 30, simply by holding the listener device 30 and touching the holder of the speaker device 20, as in a handshake, can easily receive the provision of translation data.
  • In each of the embodiments described above, an utterance in a conversation between a speaker and a listener who is a user of a listener device 30 can be a translation target. Furthermore, in each embodiment, the speech of a speaker at a lecture or seminar can be translated. In this case, there may be a plurality of listeners who wish to listen in different languages. Even if the number of listener devices 30 that can be paired with one speaker device 20 is limited, a plurality of listener devices 30 can communicate with the speaker device 20 by using the wireless multi-hop network. Further, even without a wireless multi-hop network, all the listener devices 30 can be paired with some speaker device 20 by using a plurality of speaker devices 20. According to each embodiment described above, each listener can listen to translation data in a desired language almost simultaneously.
1. An information processing apparatus comprising:
information acquisition means for acquiring language information from a terminal device;
transmission means for transmitting language designation data corresponding to the acquired language information and utterance data of a speaker to a server device;
receiving means for receiving, from the server device, translation data in which the utterance data is translated into the language indicated by the language designation data; and
providing means for transmitting the received translation data to the terminal device.

2. The information processing apparatus according to 1., wherein
the information acquisition means acquires a plurality of pieces of different language information from a plurality of terminal devices,
the transmission means transmits language designation data corresponding to the acquired plurality of pieces of language information and the utterance data to the server device,
the receiving means receives, from the server device, a plurality of pieces of translation data translated into the plurality of languages indicated by the language designation data, in a state associated with language data corresponding to each piece of translation data, and
the providing means transmits the received plurality of pieces of translation data so that each of the plurality of terminal devices can receive, from among them, the translation data corresponding to its own language information.

3. The information processing apparatus according to 1. or 2., wherein
the information acquisition means acquires the language information and terminal identification information from the terminal device and stores each piece of terminal identification information and each piece of language information in association with each other, and
the providing means transmits the received translation data by designating, as a destination, the terminal identification information stored in association with the language information corresponding to the translation data.

4. The information processing apparatus according to 1., wherein the providing means associates the received translation data with language data corresponding to the translation data and transmits it by radio broadcast.

5. The information processing apparatus according to 1., wherein
the transmission means transmits, to the server device, language data corresponding to language information of the speaker, utterance voice data as the utterance data, and the language designation data, and
the receiving means receives translated text data from the server device as the translation data.

6. The information processing apparatus according to 5., wherein the receiving means receives a plurality of pairs of the translated text data and language data corresponding to the translated text data, corresponding to the plurality of languages indicated by the language designation data, in a state in which the pairs can be associated with text data of a speech recognition result of the utterance voice data from which the translated text data was generated.

7. The information processing apparatus according to 5. or 6., wherein the receiving means receives the translated text data, language data corresponding to the translated text data, text data of a speech recognition result of the utterance voice data from which the translated text data was generated, and reliability information of the speech recognition result, in a state in which they can be associated with one another.

8. The information processing apparatus according to any one of 1. to 7., wherein the information acquisition means acquires the language information from the terminal device upon success of human body communication with the terminal device.
9. A translation data providing method executed by at least one computer, the method including:
acquiring language information from a terminal device;
transmitting language designation data corresponding to the acquired language information and utterance data of a speaker to a server device;
receiving, from the server device, translation data in which the utterance data is translated into the language indicated by the language designation data; and
transmitting the received translation data to the terminal device.

10. The translation data providing method according to 9., wherein a plurality of pieces of different language information are acquired from a plurality of terminal devices, language designation data corresponding to the acquired plurality of pieces of language information is transmitted to the server device together with the utterance data, a plurality of pieces of translation data translated into the plurality of languages indicated by the language designation data are received from the server device in a state associated with language data corresponding to each piece of translation data, and the received plurality of pieces of translation data are transmitted so that each of the plurality of terminal devices can receive the translation data corresponding to its own language information.
11. The translation data providing method according to 9. or 10., wherein the received translation data is transmitted by designating, as a destination, terminal identification information stored in association with language information corresponding to the translation data.

12. The translation data providing method according to any one of 9. to 11., wherein the received translation data is associated with language data corresponding to the translation data and is transmitted by radio broadcast.

13. The translation data providing method according to any one of 9. to 12., further including acquiring utterance voice data and language information of the speaker, wherein the transmission to the server device transmits language data corresponding to the language information of the speaker, the utterance voice data as the utterance data, and the language designation data, and the reception from the server device receives translated text data as the translation data.

14. The translation data providing method according to 13., wherein the reception from the server device receives a plurality of pairs of the translated text data and language data corresponding to the translated text data, corresponding to the plurality of languages indicated by the language designation data, in a state in which the pairs can be associated with text data of a speech recognition result of the utterance voice data from which the translated text data was generated.

15. The translation data providing method according to 13. or 14., wherein the reception from the server device receives the translated text data, language data corresponding to the translated text data, text data of a speech recognition result of the utterance voice data from which the translated text data was generated, and reliability information of the speech recognition result, in a state in which they can be associated with one another.

16. The translation data providing method according to any one of 9. to 15., wherein the language information is acquired from the terminal device upon success of human body communication with the terminal device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

This information processing device (50) is provided with: an information acquisition unit (51) which acquires language information from a terminal device; a transmission unit (52) which transmits, to a server device, language specification data corresponding to the acquired language information, and speech data of a speaker; a reception unit (53) which receives, from the server device, translation data obtained by translating the speech data into the language indicated by the language specification data; and a provision unit (54) which transmits, to the terminal device, the received translation data.

Description

Information processing apparatus and translation data providing method
 The present invention relates to a technique for providing a translation service.
 Patent Document 1, identified below, proposes a multilingual communication method. In this method, when a desired language is selected from a list on a monitor provided at a seat, the correspondence between that seat and the selected language is managed. Based on this correspondence, content is output in the selected language to the acoustic system and monitor of that seat.
 Patent Document 2, identified below, proposes a multi-channel conversation system that can smoothly and easily transition, without interruption, between a channel in a chat system and a VoIP conference room in a VoIP (Voice over IP (Internet Protocol)) system. The proposed system performs speech recognition on voice conversation messages transmitted and received in the VoIP conference room, translates the character strings obtained as recognition results, and sends the translated character strings and keywords extracted from them to a chat server. The chat server transmits the translated character string information and the extracted keywords to client terminals as character-string conversation messages.
 Patent Document 3, identified below, proposes a communication support method. In this proposal, a client device generates an internal representation based on a first language by performing language recognition and language analysis on speech data, and determines the importance of the internal representation. A server device translates the internal representation into a second language in a mode corresponding to that importance. According to this proposed method, a low-load translation process is automatically selected for input that does not include important content, which shortens the response time until a translation result is obtained.
Patent Document 1: JP-T-2006-512647
Patent Document 2: JP 2004-185088 A
Patent Document 3: JP 2004-355118 A
 In the proposed methods described above, a server device provides multilingual translation results to a plurality of client devices, so that each user of each client device can receive content in a desired language. However, with such methods, each client device is required to prove the validity of its user and establish communication (a session) with the server device. For this validity verification, information on each user is registered in the server device. That is, with such methods, information such as who participated in a conversation or attended a lecture all remains on the server device. Such information can be regarded as personal information indicating personal preferences.
 The present invention has been made in view of such circumstances, and realizes a technique for providing a listener with a translation service into a desired language without registering the listener's personal information in a server device.
 To solve the above-described problem, each aspect of the present invention adopts the following configurations.
 A first aspect relates to an information processing apparatus. The information processing apparatus according to the first aspect includes: information acquisition means for acquiring language information from a terminal device; transmission means for transmitting language designation data corresponding to the acquired language information and utterance data of a speaker to a server device; receiving means for receiving, from the server device, translation data in which the utterance data is translated into the language indicated by the language designation data; and providing means for transmitting the received translation data to the terminal device.
 A second aspect relates to a translation data providing method executed by at least one computer. The translation data providing method according to the second aspect includes: acquiring language information from a terminal device; transmitting language designation data corresponding to the acquired language information and utterance data of a speaker to a server device; receiving, from the server device, translation data in which the utterance data is translated into the language indicated by the language designation data; and transmitting the received translation data to the terminal device.
 Another aspect of the present invention may be a program that causes at least one computer to execute the method of the second aspect, or a computer-readable recording medium on which such a program is recorded. The recording medium includes a non-transitory tangible medium.
 According to each of the above aspects, it is possible to realize a technique for providing a listener with a translation service into a desired language without registering the listener's personal information in a server device.
 The above-described object and other objects, features, and advantages will become more apparent from the preferred embodiments described below and the accompanying drawings.
FIG. 1 conceptually shows the system configuration of a translation system including a speaker device in the first embodiment.
FIG. 2 conceptually shows a hardware configuration example of the speaker device in the first embodiment.
FIG. 3 conceptually shows a processing configuration example of the speaker device in the first embodiment.
FIG. 4 shows an example of the association information stored in the correspondence storage unit.
FIG. 5 shows an example of normal response data from the server device.
FIG. 6 shows an example of abnormal response data from the server device.
FIG. 7 conceptually shows a processing configuration example of the server device.
FIG. 8 is a flowchart showing an operation example of the speaker device in the first embodiment.
FIG. 9 is a flowchart showing an operation example of the speaker device in the second embodiment.
FIG. 10 conceptually shows a processing configuration example of the information processing apparatus in the third embodiment.
FIG. 11 is a flowchart showing an operation example of the information processing apparatus in the third embodiment.
 Embodiments of the present invention will be described below. Each of the embodiments given below is an illustration, and the present invention is not limited to the configurations of the following embodiments.
[First Embodiment]
 The speaker device and the translation data providing method according to the first embodiment will be described below with reference to the drawings.
〔System configuration〕
 FIG. 1 conceptually shows the system configuration of a translation system including the speaker device according to the first embodiment. The translation system includes a server device 10, a speaker device 20, and the like, and provides a translation service to listener devices 30 via the server device 10. The translation system can include a plurality of server devices 10 and a plurality of speaker devices 20, and can also provide the translation service to a plurality of listener devices 30 via a single server device 10.
 The server device 10 and the speaker device 20 are communicably connected via a communication network 9. The communication network 9 is, for example, a mobile phone network, a Wi-Fi (Wireless Fidelity) network, the Internet, a dedicated line network, or a LAN (Local Area Network). In the present embodiment, the form of the communication network 9 is not limited.
 The server device 10 is a so-called computer and, as shown in FIG. 1, includes a CPU (Central Processing Unit) 2, a memory 3, an input/output interface (I/F) 4, a communication unit 7, and the like. The memory 3 is a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, or the like. The input/output I/F 4 can be connected to user interface devices such as a display device (not shown) and an input device (not shown). The communication unit 7 communicates with other computers such as the speaker device 20 and exchanges signals with other devices. The hardware configuration of the server device 10 is not limited.
《Speaker device》
 FIG. 2 conceptually shows a hardware configuration example of the speaker device 20 in the first embodiment. The speaker device 20 is a so-called computer such as a PC (Personal Computer), a mobile phone, a smartphone, a tablet terminal, or a wearable computer. The speaker device 20 includes a CPU 11, a memory 12, a display unit 13, a touch sensor 14, a communication unit 15, a microphone unit 16, a speaker unit 17, and the like. The CPU 11 is connected to the other units via communication lines such as a bus.
 The memory 12 is a RAM, a ROM, or an auxiliary storage device (such as a hard disk).
 The display unit 13 includes a monitor such as an LCD (Liquid Crystal Display) or a CRT (Cathode Ray Tube) display and performs display processing.
 The touch sensor 14 receives operation input from the user by sensing contact from outside. The touch sensor 14 may be a sensor that can detect proximity even without contact. The display unit 13 and the touch sensor 14 may be realized as a touch panel unit. Furthermore, the speaker device 20 may have an input/output interface (not shown) connected to an input device such as a mouse or a keyboard, either together with the touch sensor 14 or instead of it.
 The microphone unit 16 is a sound collection device.
 The speaker unit 17 is a sound output device.
 The communication unit 15 communicates with other devices wirelessly or by wire. For example, when the speaker device 20 is a portable terminal, the communication unit 15 connects wirelessly to the communication network 9, communicates with the communication unit 7 of the server device 10 via the communication network 9, and also performs wireless communication with the listener devices 30. Examples of the form of wireless communication between the speaker device 20 and a listener device 30 include Bluetooth (registered trademark), ZigBee, NFC (Near Field Communication), and Wi-Fi; however, the form of wireless communication is not limited.
 In addition to the hardware elements shown in FIG. 2, the speaker device 20 can also include an imaging unit, a vibration sensor, an acceleration sensor, and the like. The hardware configuration of the speaker device 20 is likewise not limited.
 The listener device 30 is a so-called computer and has a hardware configuration similar to that of the speaker device 20. The hardware configuration of the listener device 30 is not limited as long as it can communicate with the speaker device 20 and output the translation data sent from the speaker device 20. The hardware configurations of the speaker device 20 and the listener device 30 may differ.
〔Processing configuration〕
《Speaker device》
 FIG. 3 conceptually shows a processing configuration example of the speaker device 20 in the first embodiment. The speaker device 20 includes an information acquisition unit 21, a correspondence storage unit 22, an utterance data acquisition unit 23, a transmission unit 24, a reception unit 25, a provision unit 26, and the like. Each of these processing units is realized, for example, by the CPU 11 executing a program stored in the memory 12. The program may be installed via the communication unit 15 from a portable recording medium such as a CD (Compact Disc) or a memory card, or from another computer on the network, and stored in the memory 12.
 The information acquisition unit 21 acquires language information and a terminal ID from each of the plurality of listener devices 30. The language information indicates the language used by the user of each listener device 30, such as Japanese, English, French, German, or Chinese. The language information obtained from a listener device 30 can also indicate a plurality of languages; in that case, a priority may be assigned to each language. When the users of the listener devices 30 use different languages, the information acquisition unit 21 acquires a plurality of pieces of different language information from the plurality of listener devices 30.
 The terminal ID is terminal identification information and is used as a destination or source address in communication between the speaker device 20 and the listener device 30. Specific methods by which the information acquisition unit 21 acquires the language information and the terminal ID are illustrated in the Examples section.
 The information acquisition unit 21 stores each piece of acquired language information and each terminal ID in the correspondence storage unit 22 in association with each other.
 FIG. 4 shows an example of the association information stored in the correspondence storage unit 22. As illustrated in FIG. 4, the correspondence storage unit 22 stores terminal IDs and language information in association with each other. The language information and terminal IDs stored in the correspondence storage unit 22 may be exactly those acquired by the information acquisition unit 21, or may be processed versions of them. For example, when the information acquisition unit 21 acquires language information as text data, the correspondence storage unit 22 may store a language ID corresponding to the language indicated by that text data. When the language information acquired by the information acquisition unit 21 indicates a plurality of languages, the correspondence storage unit 22 may store information on one language extracted from the plurality of languages. The terminal ID stored in the correspondence storage unit 22 may also be identification data generated by the speaker device 20 itself; in this case, the speaker device 20 manages the association between the generated identification data and the terminal ID acquired by the information acquisition unit 21.
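 To make the role of the correspondence storage unit 22 concrete, the following is a minimal sketch in Python of such a terminal-ID-to-language store; the class and method names are illustrative assumptions and not part of the embodiment.

```python
class CorrespondenceStore:
    """Sketch of the correspondence storage unit 22: terminal ID <-> language info."""

    def __init__(self):
        self._languages_by_terminal = {}  # terminal ID -> language tag (e.g. "ja", "en-US")

    def register(self, terminal_id: str, language: str) -> None:
        # Store the pairing result: one language per terminal, as in FIG. 4.
        self._languages_by_terminal[terminal_id] = language

    def terminal_ids_for(self, language: str) -> list:
        # Destination lookup used by the provision unit 26: all terminals whose
        # registered language matches the language data of a translation.
        return [tid for tid, lang in self._languages_by_terminal.items()
                if lang == language]

    def designated_languages(self) -> set:
        # The set of languages to put into the language designation data.
        return set(self._languages_by_terminal.values())
```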
 The utterance data acquisition unit 23 acquires the speaker's utterance voice data. Specifically, it acquires, as utterance voice data, audio data obtained by converting the audio signal collected by the microphone unit 16 using PCM (Pulse Code Modulation). The audio signal collected by the microphone unit 16 includes environmental sound in addition to the speaker's voice; therefore, the utterance data acquisition unit 23 can also apply a filter process for removing environmental sound to the acquired audio data and use the result as the utterance voice data. The utterance data acquisition unit 23 may acquire utterance voice data that includes silent periods during which the speaker is not speaking, or utterance voice data from which such silent periods have been removed.
 The method by which the utterance data acquisition unit 23 acquires utterance voice data is not limited to the above. The utterance data acquisition unit 23 may acquire utterance voice data in which the speaker's utterance has been recorded and which is stored in the memory 12, on a portable recording medium, or on another computer.
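 For instance, when the utterance has been recorded to a WAV file, the acquisition step could be sketched as follows with the Python standard library's wave module; the file name is hypothetical.

```python
import wave

def acquire_recorded_utterance(path: str) -> bytes:
    """Sketch of acquiring pre-recorded PCM utterance voice data."""
    with wave.open(path, "rb") as wav:
        # Raw PCM frames; the sample rate and sample width travel with the file.
        return wav.readframes(wav.getnframes())

# Hypothetical usage:
# utterance_voice_data = acquire_recorded_utterance("lecture_recording.wav")
```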
 The utterance data acquisition unit 23 further acquires information on the language used in the speaker's utterance. The utterance data acquisition unit 23 may hold the speaker's language information in advance, or the language information may be input by the user operating the input device on an input screen displayed on the monitor.
 The transmission unit 24 transmits to the server device 10 the utterance voice data acquired by the utterance data acquisition unit 23, language data corresponding to the speaker's language information, and language designation data corresponding to the language information stored in the correspondence storage unit 22. For the language designation data and the language data, for example, the format defined as BCP 47 by the IETF (The Internet Engineering Task Force) can be used; however, the data format of both is arbitrary. The language data may be the speaker's language information itself as acquired by the utterance data acquisition unit 23, and the language designation data may be the language information itself as stored in the correspondence storage unit 22.
 The utterance voice data, the speaker's language data, and the language designation data need not be transmitted at the same time. For example, once the information acquisition unit 21 has acquired the language information, the transmission unit 24 can transmit the language designation data before the other data. Likewise, when the utterance data acquisition unit 23 holds the speaker's language information in advance, the transmission unit 24 can transmit the speaker's language data before the other data.
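 As an illustration of how the three pieces of data might travel together, the sketch below assembles one request message. The request field names and the base64 audio encoding are assumptions (the embodiment fixes only the response format of FIG. 5 and FIG. 6, not the request format); BCP 47 tags are used for the language fields as suggested above.

```python
import base64
import json

def build_request(utterance_pcm: bytes, speaker_language: str,
                  designated_languages: set) -> str:
    """Sketch of one request message from the speaker device 20 to the server device 10."""
    payload = {
        "speaker_region": speaker_language,               # language data, e.g. "ja"
        "target_regions": sorted(designated_languages),   # language designation data
        "audio": base64.b64encode(utterance_pcm).decode("ascii"),
    }
    return json.dumps(payload)

request_message = build_request(b"\x00\x01", "ja", {"en-US", "fr", "zh-CN"})
```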
 The reception unit 25 receives, from the server device 10, translated text data in which the utterance voice data transmitted by the transmission unit 24 has been translated into the language(s) indicated by the language designation data transmitted with it. When the language designation data indicates a plurality of languages, the reception unit 25 receives a plurality of pairs, each consisting of translated text data and the language data corresponding to it, for those languages, in a state in which the pairs can be associated with the text data of the speech recognition result of the utterance voice data from which the translated text data was generated.
 As long as they are received in an associable state, the manner of receiving the plurality of pairs of translated text data and language data together with the text data of the speech recognition result is not limited. For example, the reception unit 25 may receive the pairs and the speech recognition result text data in a single communication message (response data), or in separate communication messages; in the latter case, relation identification data for associating the pairs with the speech recognition result text data is set in each communication message. Furthermore, the reception unit 25 may receive a plurality of pieces of speech recognition result text data for one set of pairs containing one piece of translated text data; in this case as well, by using the relation identification data, the pieces of speech recognition result text data are concatenated and the concatenated text data is associated with the pairs.
 FIG. 5 shows an example of normal response data from the server device 10, and FIG. 6 shows an example of abnormal response data. In the examples of FIG. 5 and FIG. 6, the response data from the server device 10 is written in JSON (JavaScript (registered trademark) Object Notation) format. The value of the key "result" indicates whether the response is normal (OK or ERROR), the array of the key "recg" holds the speech recognition result, the array of the key "trans" holds the translation results, the value of the key "code" is an error code, and the value of the key "message" is an error message. The "recg" array has an element "region" corresponding to the speaker's language data and an element "text" corresponding to the text data of the speech recognition result of the utterance voice data. The "trans" array contains one pair of an element "region" (language data) and an element "text" (the translated text data for that language) for each language indicated by the language designation data.
 In the example of FIG. 5, translated text data and its language data are associated as one element within the "trans" array, and the plurality of pairs of translated text data and language data are associated with the speech recognition result through the relationship between the "recg" array and the "trans" array within one piece of response data. However, the response data that the reception unit 25 receives from the server device 10 is not limited to the format shown in FIG. 5 and FIG. 6; if relation identification data is set in each piece of response data, the "recg" array and the "trans" array may be received in separate pieces of response data.
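 Because the response keys ("result", "recg", "trans", "code", "message", "region", "text") are given by FIG. 5 and FIG. 6, parsing on the speaker device 20 could look like the following sketch, assuming the "recg" and "trans" arrays hold objects with "region" and "text" members; the function name and error handling are illustrative.

```python
import json

def parse_response(raw: str):
    """Sketch of parsing response data in the FIG. 5 / FIG. 6 format."""
    data = json.loads(raw)
    if data["result"] != "OK":
        # Abnormal response (FIG. 6): surface the error code and message.
        raise RuntimeError(f'server error {data["code"]}: {data["message"]}')
    # Normal response (FIG. 5): speech recognition result plus one
    # (region, text) pair per designated language.
    recognized = [(item["region"], item["text"]) for item in data["recg"]]
    translations = [(item["region"], item["text"]) for item in data["trans"]]
    return recognized, translations
```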
 The transmission unit 24 and the reception unit 25 can perform bidirectional communication with the server device 10 in a single session, for example by using a WebSocket. With this, utterance voice data flowing from the speaker device 20 toward the server device 10 and translated text data flowing in the opposite direction can be exchanged asynchronously. That is, the server device 10 can freely segment the received utterance voice data and sequentially transmit, at arbitrary timing, translated text data converted from each segmented portion of the utterance voice data to the speaker device 20.
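 A minimal sketch of such a single-session, full-duplex exchange is shown below, assuming the third-party Python websockets library and a hypothetical server URL; it reuses the parse_response() sketch above. The embodiment does not prescribe this implementation.

```python
import asyncio
import websockets  # third-party: pip install websockets

async def exchange(audio_chunks, server_url="wss://translate.example/session"):
    """Sketch: stream utterance audio while receiving responses asynchronously."""
    async with websockets.connect(server_url) as ws:

        async def send_audio():
            for chunk in audio_chunks:      # partial utterance voice data
                await ws.send(chunk)

        sender = asyncio.create_task(send_audio())
        async for raw in ws:                # responses arrive at arbitrary timing
            recognized, translations = parse_response(raw)
        await sender
```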
 The provision unit 26 transmits the translated text data received by the reception unit 25 by designating, as the destination, the terminal ID stored in the correspondence storage unit 22 in association with the language information corresponding to that translated text data. Specifically, the provision unit 26 uses the language data received in association with the translated text data to extract from the correspondence storage unit 22 the terminal IDs associated with language information matching that language data. When a plurality of terminal IDs are extracted for one piece of language data, the provision unit 26 copies the translated text data for each extracted terminal ID and transmits the copies to the plurality of listener devices 30 indicated by those terminal IDs.
 When a plurality of pieces of translated text data in different languages are received, the provision unit 26 transmits them so that each of the plurality of listener devices 30 can receive, from among the received pieces of translated text data, the one corresponding to its own language information. In this case, the provision unit 26 extracts the terminal IDs from the correspondence storage unit 22 for each piece of translated text data and transmits each piece by designating the extracted terminal IDs as destinations. The provision unit 26 may also transmit the text data of the speech recognition result to the listener devices 30 together with the translated text data.
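 The destination resolution and fan-out described above could be sketched as follows, reusing the hypothetical CorrespondenceStore and parse_response sketches above; send() is a stand-in for the wireless transmission performed via the communication unit 15.

```python
def deliver_translations(store, translations, send):
    """Sketch of the provision unit 26: route each (region, text) pair
    to every listener device whose registered language matches."""
    for region, text in translations:
        for terminal_id in store.terminal_ids_for(region):
            # One copy per destination terminal ID (wireless unicast).
            send(terminal_id, text)

# Hypothetical usage with the earlier sketches:
# recognized, translations = parse_response(raw)
# deliver_translations(store, translations, send=my_radio_send)
```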
《Server device》
 FIG. 7 conceptually shows a processing configuration example of the server device 10. The server device 10 includes a speech recognition unit 31, a translation unit 32, and the like. Each of these processing units is realized, for example, by the CPU 2 executing a program stored in the memory 3. The program may be installed via the communication unit 7 from a portable recording medium such as a CD or a memory card, or from another computer on the network, and stored in the memory 3.
 The speech recognition unit 31 receives the utterance voice data from the speaker device 20 and performs speech recognition processing on it. Any well-known speech recognition technique may be used. For example, the speech recognition unit 31 converts the utterance voice data into utterance text data using an acoustic model formed from collected sound waveform data and a language model formed from collected words and word sequences. In this case, the speech recognition unit 31 switches the acoustic model and the language model used in the speech recognition processing to the models for the language indicated by the speaker's language data sent from the speaker device 20. Alternatively, the server device 10 may have a speech recognition unit 31 customized for each language; in that case, the server device 10 can switch which speech recognition unit 31 to execute based on the speaker's language data sent from the speaker device 20.
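 The per-language switching of recognizers could be sketched as a simple dispatch table, as below; the Recognizer class is a placeholder and does not represent a real speech recognition API.

```python
class Recognizer:
    """Placeholder for a per-language recognizer (acoustic model + language model)."""

    def __init__(self, region: str):
        self.region = region

    def transcribe(self, pcm: bytes) -> str:
        return "<transcript>"  # a real ASR engine would produce text here

RECOGNIZERS = {"ja": Recognizer("ja"), "en-US": Recognizer("en-US")}

def recognize(speaker_region: str, utterance_pcm: bytes) -> str:
    """Sketch of the speech recognition unit 31: pick models by the speaker's language data."""
    return RECOGNIZERS[speaker_region].transcribe(utterance_pcm)
```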
 The translation unit 32 performs translation processing (machine translation) on the utterance text data obtained by the speech recognition unit 31, from the language indicated by the speaker's language data into the language(s) indicated by the language designation data. Any well-known translation technique, such as rule-based or statistics-based machine translation, may be used. When the language designation data indicates a plurality of different languages, the translation unit 32 executes the translation processing corresponding to each of those languages on the utterance text data, thereby generating translated text data for each language indicated by the language designation data.
 The translation unit 32 transmits to the speaker device 20, as response data, each pair of generated translated text data and the language data corresponding to it, in a state associated with the text data (utterance text data) of the speech recognition result of the utterance voice data from which the translated text data was generated. For example, the translation unit 32 transmits response data having the format shown in FIG. 5 and FIG. 6 to the speaker device 20.
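 Assembling a FIG. 5-style normal response could be sketched as follows; translate() is a placeholder for whatever rule-based or statistics-based engine is used, and only the key names are taken from the embodiment.

```python
import json

def translate(text: str, source: str, target: str) -> str:
    # Placeholder for a rule-based or statistics-based MT engine.
    return f"[{source}->{target}] {text}"

def build_normal_response(speaker_region: str, utterance_text: str,
                          target_regions: list) -> str:
    """Sketch of the translation unit 32 producing FIG. 5-style response data."""
    return json.dumps({
        "result": "OK",
        "recg": [{"region": speaker_region, "text": utterance_text}],
        "trans": [{"region": region,
                   "text": translate(utterance_text, speaker_region, region)}
                  for region in target_regions],
    })
```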
 The translation unit 32 may wait until utterance text data long enough to translate has been obtained by the speech recognition unit 31 before executing translation processing. That is, the data unit translated by the translation unit 32 may differ from the data unit processed by the speech recognition unit 31. When the speech recognition unit 31 has obtained utterance text data but the translation unit 32 does not yet translate it, the translation unit 32 may transmit to the speaker device 20, as response data, the text data of the speech recognition result (the utterance text data) together with relation identification data for associating it with the translated text data that will later result from it. This avoids a situation in which the speaker device 20 receives no response data from the server device 10 for a long time.
〔Operation example / Translation data providing method〕
 The translation data providing method in the first embodiment will be described below with reference to FIG. 8. FIG. 8 is a flowchart showing an operation example of the speaker device 20 in the first embodiment. As shown in FIG. 8, the translation data providing method in the first embodiment is executed by at least one computer, such as the speaker device 20. For example, each illustrated step is executed by the corresponding processing unit of the speaker device 20. Since each step corresponds to the processing content of the respective processing unit described above, details of each step are omitted as appropriate.
 In the following description, the case where the speaker device 20 provides translation data to a plurality of listener devices 30 is used as an example.
 The speaker device 20 acquires language information and a terminal ID from each of the plurality of listener devices 30 (S81), and stores the acquired language information and terminal IDs in the correspondence storage unit 22 in association with each other (S82).
 The speaker device 20 acquires the speaker's utterance voice data and language information (S83).
 The speaker device 20 transmits to the server device 10 the language data corresponding to the language information acquired in (S83), the utterance voice data acquired in (S83), and the language designation data corresponding to the language information stored in the correspondence storage unit 22 in (S82) (S84). When language information indicating a plurality of different languages is stored in the correspondence storage unit 22, language designation data indicating the plurality of languages is transmitted to the server device 10.
 The server device 10 receives the data transmitted in (S84), performs speech recognition processing corresponding to the language indicated by the speaker's language data on the received utterance voice data, and generates utterance text data. The server device 10 then translates the utterance text data from the speaker's language into the language(s) indicated by the received language designation data. As a result, the server device 10 generates translated text data in which the utterance voice data has been translated into the language(s) indicated by the language designation data.
 The speaker device 20 receives, from the server device 10, response data for the data transmitted in (S84) (S85). The response data includes a value indicating whether the response is normal. Response data indicating a normal response further includes pairs of translated text data and the language data corresponding to each, as well as the text data of the speech recognition result of the utterance voice data from which the translated text data was generated. When the language designation data indicates a plurality of languages, the speaker device 20 receives, as response data, a plurality of pairs of translated text data and corresponding language data for those languages, in a state in which the pairs can be associated with the text data of the speech recognition result of the source utterance voice data. The speaker device 20 may also receive, as response data, the text data of a speech recognition result together with relation identification data.
 The speaker device 20 determines whether the response data indicates a normal response (S86). If the response data does not indicate a normal response (S86; NO), the speaker device 20 outputs error information based on the information set in the response data (S87). The output error information may be set in the response data, as illustrated in FIG. 6. The output form of the error information is arbitrary: the speaker device 20 can output the error information to the monitor of the display unit 13, can have the speaker unit 17 emit a voice reading out the error information or a sound corresponding to it, and may also transmit the error information to each listener device 30.
 If the response data indicates a normal response (S86; YES), the speaker device 20 specifies the destination(s) of the translated text data included in the response data (S88). Specifically, the speaker device 20 extracts from the correspondence storage unit 22 the terminal IDs associated with the language information corresponding to the translated text data, and uses the extracted terminal IDs as the destinations of that translated text data. A plurality of destinations (terminal IDs) may be specified for one piece of translated text data. When the response data includes a plurality of pieces of translated text data for different languages, the speaker device 20 specifies destinations (terminal IDs) for each piece of translated text data.
 Based on the destinations specified in (S88), the speaker device 20 transmits the appropriate translated text data to each listener device 30 (S89). When a plurality of terminal IDs are extracted as the destinations of one piece of translated text data, the speaker device 20 copies that translated text data for each extracted terminal ID and transmits the copies to the plurality of listener devices 30 indicated by those terminal IDs. When the response data contains a plurality of pieces of translated text data in different languages, the speaker device 20 transmits them so that each of the plurality of listener devices 30 can receive the translated text data corresponding to its own language information.
 The speaker device 20 may transmit the text data of the speech recognition result to the listener devices 30 together with the translated text data. When the response data indicates a normal response (S86; YES) but contains no translated text data, the speaker device 20 holds that response data and waits for the next response data (not shown). When response data including translated text data is then received, the speaker device 20 uses the relation identification data to concatenate the text data of the speech recognition results and to associate the concatenated data with the translated text data. In this case, the speaker device 20 may also transmit to the listener devices 30 only the speech recognition result text data contained in the response data that included no translated text data.
 A listener device 30 acquires the translated text data in which the utterance voice data has been translated into the language indicated by the language information it sent to the speaker device 20, and displays the translated text data on its monitor. The listener device 30 can also output a voice reading out the translated text data. Furthermore, the listener device 30 may similarly output the text data of the speech recognition result of the source utterance voice data, received together with the translated text data. When the listener device 30 receives speech recognition result text data without translated text data, it may output only that text data.
 Although FIG. 8 shows a plurality of steps (processes) in sequence, the steps executed in the first embodiment and their execution order are not limited to the example of FIG. 8. For example, the utterance voice data and the speaker's language information acquired in (S83) may be acquired at different times, and the speaker's language information may be acquired before (S81). Although FIG. 8 is simplified for convenience of explanation, when utterance voice data is acquired continually, the steps from (S83) onward are repeated. Furthermore, the translation data providing method can also include a step of displaying, on the monitor of the display unit 13, the speech recognition result text data and the translated text data contained in response data indicating a normal response.
〔Operation and effects of the first embodiment〕
 As described above, in the first embodiment, the speaker device 20 acquires language information and a terminal ID from each listener device 30 that wishes to receive translation data, and stores the language information and terminal ID in the correspondence storage unit 22 in association with each other. The utterance voice data to be translated is acquired by the speaker device 20, and the language designation data corresponding to the association information stored in the correspondence storage unit 22 is sent together with the utterance voice data from the speaker device 20 to the server device 10. In the server device 10, the utterance voice data is converted into utterance text data by speech recognition, and the utterance text data is translated into the language(s) indicated by the language designation data. The translated text data is sent from the server device 10 to the speaker device 20, and is then transmitted from the speaker device 20 to each listener device 30 by designating, as the destination, the terminal ID stored in the correspondence storage unit 22 in association with the language information corresponding to that translated text data.
 Thus, according to the first embodiment, the listener device 30 can acquire the translated text data generated by the server device 10 via the speaker device 20 that acquires the utterance voice data. That is, by providing its language information and terminal ID to the speaker device 20, the listener device 30 can acquire translated text data from the speaker device 20 without ever accessing the server device 10. Conversely, the server device 10 only needs to recognize the speaker device 20 and does not need to know which listener devices 30 receive the transmitted translated text data. Therefore, according to the first embodiment, a translation service into a desired language can be provided to the user (listener) of a listener device 30 without registering that user's personal information in the server device 10.
 The reason for avoiding, as far as possible, the registration of the listener device 30 user's personal information in the server device 10 is that the server device 10 stands in a third-party (public) position unrelated to the utterance being translated. In the first embodiment, the language information and terminal IDs of the listener devices 30 are stored in the speaker device 20. However, since the users of the speaker device 20 and the listener devices 30 stand in a speaker-listener relationship, or a relationship close to it (for example, the relationship between the person who acquires the utterance voice data and the listeners of that utterance), the speaker device 20 is a party concerned with the utterance being translated. Therefore, even if such information is stored in the speaker device 20, it is unlikely to lead to leakage of personal information.
Also, in the first embodiment, language data corresponding to the speaker's language information is transmitted from the speaker device 20 to the server device 10 along with the utterance voice data. Because the server device 10 can switch its speech recognition and translation processes according to that language data, it can support multiple translation configurations.
Further, in the first embodiment, when the language information acquired from the listener devices 30 indicates a plurality of different languages, the speaker device 20 receives multiple pairs of translated text data and corresponding language data, each pair associated with the text data of the speech recognition result of the utterance voice data from which the translations were produced. Consequently, even when a plurality of listener devices 30 request different languages, each listener device 30 can receive translated text data in its desired language at substantially the same time.
Also, in the first embodiment, the server device 10 provides the speaker device 20 with each pair of translated text data and its corresponding language data in a form that can be associated with the text data of the speech recognition result of the source utterance voice data. If the speech recognition result text is displayed on the monitor of the speaker device 20, the speaker, or anyone else who can hear the utterance, can check that text and judge whether the translated text data is accurate. Moreover, when the speech recognition result text does not yet have sufficient sentence length for translation processing in the server device 10, the server device 10 may provide the speaker device 20 with the recognition result text alone, without translated text data. This lets the speaker device 20 track the translation status at the server device 10.
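A response carrying several language pairs tied to one recognition result could, for example, be shaped as below. The field names and the relation identifier are assumptions introduced here; the embodiment leaves the wire format open.

# Hypothetical shape of one server response (all field names assumed).
response = {
    "status": "ok",
    "relation_id": "utt-0042",            # ties related responses together
    "recognized_text": "hello everyone",  # speech recognition result
    "translations": [
        {"language": "fr", "text": "bonjour tout le monde"},
        {"language": "de", "text": "hallo zusammen"},
    ],
}

# A recognition-only response (text still too short to translate) would omit
# "translations" but keep a relation_id, so that a later response carrying
# the translation can be associated with it.
partial = {"status": "ok", "relation_id": "utt-0043",
           "recognized_text": "well"}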
[Second Embodiment]
Hereinafter, the speaker device and the translation data provision method in the second embodiment will be described with reference to several drawings. The system configuration in the second embodiment is the same as in the first embodiment, and so are the processing configurations of the server device 10 and the speaker device 20.
In the second embodiment, the speaker device 20 additionally acquires reliability information for the speech recognition from the server device 10, alongside the translated text data and related data. The description below focuses on the points that differ from the first embodiment; content identical to the first embodiment is omitted as appropriate.
《Server device》
The speech recognition unit 31 generates utterance text data by performing speech recognition on the utterance voice data, and further calculates the reliability of the recognition result. For example, the speech recognition unit 31 can compute a likelihood for each word among the recognition candidates derived with the acoustic model and the language model, and calculate the reliability from the difference between the likelihood of the word finally selected and the likelihoods of the words not selected. In this case, the larger the likelihood difference, the higher the reliability assigned; the smaller the difference, the lower the reliability. Any well-known method may be used to compute the reliability of a speech recognition result.
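As an illustration of the likelihood-difference idea, the following sketch turns the gap between the selected word and its best rival into a score in (0, 1); the squashing function and the example values are assumptions, since the embodiment only requires that a larger gap yield a higher reliability.

import math

def word_reliability(candidate_log_likelihoods):
    # Reliability of the selected (top-scoring) word, from the gap
    # between it and the best candidate that was not selected.
    ranked = sorted(candidate_log_likelihoods, reverse=True)
    if len(ranked) < 2:
        return 1.0  # unchallenged candidate
    gap = ranked[0] - ranked[1]
    # Squash the gap into (0, 1): larger gap -> reliability closer to 1.
    return 1.0 - math.exp(-gap)

print(word_reliability([-3.2, -7.9, -8.4]))  # clear winner -> ~0.99
print(word_reliability([-3.2, -3.3, -5.0]))  # near tie     -> ~0.10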
The translation unit 32 transmits to the speaker device 20 the translated text data, the language data corresponding to it, the text data of the speech recognition result of the source utterance voice data, and the reliability information of that recognition result, in a form in which they can be associated with one another. When the translation unit 32 does not run translation processing on the utterance text data obtained by the speech recognition unit 31, it may instead transmit, as response data, the recognition result text data (the utterance text data), relation identification data for associating this text with the translated text data that will later result from it, and the reliability information of the recognition result.
《Speaker device》
The receiving unit 25 receives from the server device 10 the translated text data, the language data corresponding to it, the text data of the speech recognition result of the source utterance voice data, and the reliability information of that recognition result, in a form in which they can be associated with one another. As long as they are received in an associable form, the manner of receiving the translated text data, language data, recognition result text data, and reliability information is not limited. When the recognition result text data and reliability information arrive in response data separate from the translated text data and language data, relation identification data may be set in each piece of response data, as described in the first embodiment.
Based on the reliability information received by the receiving unit 25, the providing unit 26 determines whether to transmit the received translated text data to the listener devices 30 as-is. When the reliability information indicates a reliability at or above a predetermined value, the providing unit 26 transmits the translated text data to the listener devices 30 as in the first embodiment. When the reliability information indicates a reliability below the predetermined value, the accuracy of the translated text data is also likely to be low, so the providing unit 26 outputs an indication that the reliability is low; it may display this on the monitor of the display unit 13 or output it as sound through the speaker unit 17. The predetermined value compared against the reliability is a reliability threshold held in advance by the providing unit 26.
When the reliability information falls below the predetermined value, the providing unit 26 may also withhold the translated text data from the listener devices 30, or let the user decide whether to send it. For example, the providing unit 26 may display, together with the low-reliability indication, an operation button for choosing whether to transmit to the listener devices 30, and decide to transmit or not according to the user's operation of that button. The providing unit 26 may also transmit the reliability information to the listener devices 30 along with the translated text data.
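The providing unit's decision can be sketched as follows; the threshold value, the callback names, and the user prompt are placeholders for whatever the device actually implements.

RELIABILITY_THRESHOLD = 0.7  # illustrative; the embodiment only requires a preset threshold

def handle_translation(reliability, translated_text, send, warn_user, ask_user_to_send):
    # Decide what providing unit 26 does with one translation result.
    if reliability >= RELIABILITY_THRESHOLD:
        send(translated_text)  # forward to the listener devices as-is
        return
    warn_user(f"Low recognition reliability ({reliability:.2f})")
    # Optionally let the speaker decide whether to send anyway.
    if ask_user_to_send():
        send(translated_text)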
[Operation Example / Translation Data Provision Method]
The translation data provision method in the second embodiment is described below with reference to FIG. 9, a flowchart showing an operation example of the speaker device 20 in the second embodiment. The entity executing the translation data provision method in the second embodiment is the same as in the first embodiment. Because each step matches the processing of the corresponding processing unit of the speaker device 20 described above, the details of each step are omitted as appropriate. In FIG. 9, steps with the same content as in FIG. 8 carry the same reference signs as in FIG. 8.
As in the first embodiment, the speaker device 20 executes (S81) through (S84).
The server device 10 receives the data transmitted in (S84) and, as in the first embodiment, runs the speech recognition and translation processes, producing translated text data in which the utterance voice data has been translated into the language indicated by the language designation data. In addition, in the second embodiment, the server device 10 calculates the reliability of the speech recognition result. The server device 10 transmits to the speaker device 20, as response data, the translated text data, the language data corresponding to it, the text data of the speech recognition result of the source utterance voice data, and the reliability information of that recognition result, in a mutually associable form. The server device 10 may also transmit, as response data, the recognition result text data, the relation identification data, and the reliability information without any translated text data.
The speaker device 20 receives the response data from the server device 10 (S91). The response data contains a value indicating whether the response is normal, one or more pairs of translated text data and corresponding language data, the text data of the speech recognition result of the source utterance voice data, and the reliability information of that recognition result. When the language designation data indicates multiple languages, the speaker device 20 receives, as response data, multiple pairs of translated text data and corresponding language data for those languages, associated with the recognition result text data and the reliability information. The speaker device 20 may also receive response data consisting of the recognition result text data, the relation identification data, and the reliability information alone.
The speaker device 20 determines whether the response data indicates a normal response (S92). If it does not (S92; NO), the speaker device 20 outputs error information, as in the first embodiment (S87).
If the response data indicates a normal response (S92; YES), the speaker device 20 further determines whether the reliability information contained in it indicates a reliability below the predetermined value (S93). If the reliability is at or above the predetermined value (S93; NO), the speaker device 20, as in the first embodiment, identifies the destination of each piece of translated text data in the response data (S88) and transmits the desired translated text data to each listener device 30 (S89). In the second embodiment, the speaker device 20 may transmit the reliability information to the listener devices 30 together with the translated text data.
Each listener device 30 receives the translated text data from the speaker device 20 and, as in the first embodiment, displays it on its monitor. In the second embodiment, the listener device 30 can also output the reliability information received along with the translated text data.
On the other hand, if the reliability information indicates a reliability below the predetermined value (S93; YES), the speaker device 20 presents that the reliability is low (S94). For example, it may display this on the monitor of the display unit 13 or output it as sound through the speaker unit 17.
Further, along with presenting the low reliability, the speaker device 20 displays on the monitor an operation screen that lets the user choose whether to transmit the translated text data to the listener devices 30. The speaker device 20 determines, from the user's operation on that screen, whether the user chose to transmit (S95). If the user chose to transmit (S95; YES), the speaker device 20 executes (S88) as described above. If the user did not choose to transmit (S95; NO), the speaker device 20 outputs error information (S87).
Although FIG. 9 shows the steps (processes) in sequence, the steps executed in the second embodiment and their order are not limited to the example of FIG. 9. For instance, (S95) in FIG. 9 may be omitted so that, when the reliability is below the predetermined value (S93; YES), the speaker device 20 unconditionally withholds the translated text data from the listener devices 30 and outputs error information (S87). The speaker device 20 may also always present the reliability information contained in the response data, regardless of the comparison between the reliability and the predetermined value.
[Operation and Effects of the Second Embodiment]
As described above, in the second embodiment, the reliability information of the speech recognition result is provided from the server device 10 to the speaker device 20 in a form that can be associated with the translated text data, the language data, and the recognition result text data. This lets the speaker device 20 determine, based on the reliability information, whether to transmit the translated text data to the listener devices 30 as-is. When the reliability of the speech recognition result is low, the accuracy of the recognition result text is low, and consequently so is the accuracy of the translated text data derived from it. Using the reliability information therefore prevents erroneous translations from being delivered to the listener devices 30. Moreover, if the speaker device 20 presents the reliability information, the speaker can be made aware that the recognition reliability is low and be given a chance to restate the utterance, so that the content of the utterance is conveyed to the listeners appropriately in the other languages. Further, when the reliability is low, letting the user choose whether to transmit the translated text data means that translations which happen to be correct despite low reliability can still be provided to the listener devices 30.
[Supplement to the First and Second Embodiments]
Although FIG. 1 illustrates a single server device 10, the translation system can also include a plurality of server devices 10. For example, the server device 10 having the speech recognition unit 31 and the server device 10 having the translation unit 32 may be different devices. In that case, the server device 10 to which the speaker device 20 transmits the utterance voice data differs from the server device 10 from which the speaker device 20 receives the translated text data. A different server device 10 may also be provided for each target language.
The listener devices 30 may also communicate with the speaker device 20 via other listener devices 30. For example, the speaker device 20 and a plurality of listener devices 30 may form a wireless multi-hop network. In that case, even a listener device 30 located out of radio range of the speaker device 20 can receive translation data from it. Any well-known technique may be used for propagating data in a wireless multi-hop network.
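One well-known propagation pattern is controlled flooding with duplicate suppression, sketched below; the message fields, the hop budget, and the broadcast call are generic assumptions, as the embodiment defers to known multi-hop techniques.

# Generic flooding sketch for relaying translation data over multiple hops;
# the message format and the radio API are assumptions.

class RelayNode:
    def __init__(self, radio):
        self.radio = radio
        self.seen = set()  # message IDs already handled

    def on_receive(self, message):
        if message["id"] in self.seen or message["ttl"] <= 0:
            return  # drop duplicates and expired messages
        self.seen.add(message["id"])
        self.deliver_locally(message)
        # Rebroadcast with a decremented hop budget.
        self.radio.broadcast({**message, "ttl": message["ttl"] - 1})

    def deliver_locally(self, message):
        print("translation:", message["payload"])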
[Third Embodiment]
The information processing device and the translation data provision method in the third embodiment are described below with reference to FIGS. 10 and 11.
FIG. 10 conceptually illustrates an example of the processing configuration of the information processing device in the third embodiment. As shown in FIG. 10, the information processing device 50 includes an information acquisition unit 51, a transmission unit 52, a receiving unit 53, a providing unit 54, and so on. The information acquisition unit 51 acquires language information from a terminal device. The transmission unit 52 transmits to a server device the language designation data corresponding to the acquired language information together with the speaker's utterance data. The receiving unit 53 receives from a server device the translation data in which the utterance data has been translated into the language indicated by the language designation data. The providing unit 54 transmits the received translation data to the terminal device.
An example of the information processing device 50 is the speaker device 20 described above; an example of the terminal device is the listener device 30, and an example of the server device is the server device 10. However, the server device from which the receiving unit 53 receives the translation data may differ from the server device to which the transmission unit 52 transmits the voice data.
An example of the specific processing of the transmission unit 52 is the transmission unit 24 described above. The utterance data transmitted by the transmission unit 52 need not be voice data. For example, the transmission unit 52 may transmit utterance text data as the utterance data. The utterance text data may be entered by the user operating the input device of the information processing device 50. Alternatively, the information processing device 50 may itself include the speech recognition unit 31 described above, with the utterance text data converted from the utterance voice data by that unit. In this case, the information processing device 50 can generate the recognition result text data and compute the reliability of the speech recognition, and the server device to which the utterance data is sent need not have a speech recognition unit 31.
The transmission unit 52 need not transmit language data corresponding to the speaker's language information. This applies, for example, when the speaker's language is fixed to a single language, or when the server device can automatically identify the language from the utterance data.
An example of the specific processing of the receiving unit 53 is given by the receiving unit 25 described above. The translation data received by the receiving unit 53 may be voice data rather than text data; in that case, the server device generates and transmits translated voice data. Also, when the language designation data transmitted by the transmission unit 52 indicates a single language, the receiving unit 53 need not receive language data corresponding to the translation data. Further, the receiving unit 53 need not receive the recognition result text data, because only the translation data has to be provided to the terminal device and the information processing device 50 does not necessarily present the recognition result text.
An example of the specific processing of the providing unit 54 is given by the providing unit 26 described above. The translation data transmitted by the providing unit 54 may be voice data rather than text data. When the receiving unit 53 obtains translated text data from the server device, the providing unit 54 may generate translated voice data that reads the translated text aloud and transmit that voice data to the terminal device.
The providing unit 54 can also transmit the translation data by wireless broadcast, associated with the language data corresponding to it, instead of by unicast communication addressed to a terminal ID. In that case, each terminal device extracts, from the received translation data, the translation data associated with its desired language data.
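A minimal sketch of the broadcast alternative follows; the frame layout and the filtering loop are assumptions for illustration.

# The providing unit tags each translation with its language and broadcasts
# once; each listener filters locally. The frame format is assumed.

def broadcast_translations(radio, translations):
    # translations: mapping of language code -> translated text
    for language, text in translations.items():
        radio.broadcast({"language": language, "text": text})

def listener_filter(frames, desired_language):
    # A listener keeps only the frames tagged with its own language.
    return [f["text"] for f in frames if f["language"] == desired_language]

frames = [{"language": "en", "text": "Hello"},
          {"language": "fr", "text": "Bonjour"}]
print(listener_filter(frames, "fr"))  # ['Bonjour']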
An example of the specific processing of the information acquisition unit 51 is given by the information acquisition unit 21 described above. However, when the providing unit 54 transmits the translation data by wireless broadcast, the information acquisition unit 51 need not acquire device IDs.
As shown in FIG. 10, the information processing device 50 need not include the correspondence storage unit 22. In that case, the information acquisition unit 51 may store each piece of language information and each terminal ID, associated with one another, in a correspondence storage unit 22 held by another computer. When device IDs are not acquired, the information acquisition unit 51 only has to hold the language information.
The information processing device 50 shown in FIG. 10 has, for example, the same hardware configuration as the speaker device 20 shown in FIG. 2, and the processing units described above are realized by a program being executed in the same manner as on the speaker device 20. The hardware configuration of the information processing device 50 is not limited, however.
FIG. 11 is a flowchart showing an operation example of the information processing device 50 in the third embodiment. As shown in FIG. 11, the translation data provision method in the third embodiment is executed by at least one computer, such as the information processing device 50; for example, each illustrated step is executed by the corresponding processing unit of the information processing device 50.
The translation data provision method in this embodiment includes (S111) through (S116). In (S111), the computer acquires language information from a terminal device. In (S112), the computer transmits to a server device the language designation data corresponding to the language information acquired in (S111) together with the speaker's utterance data. In (S113), the computer receives response data from the server device. When the response data indicates a normal response (S114; YES), it contains translation data in which the utterance data has been translated into the language indicated by the language designation data; when it does not (S114; NO), the computer outputs error information (S116). In (S115), the computer transmits the translation data received in (S113) to the terminal device.
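The step sequence maps onto a short routine like the following; the function and method names are placeholders, not part of the method itself.

def capture_utterance():
    # Placeholder for microphone capture or text input.
    return b"utterance-bytes"

def report_error(response):
    print("error:", response)

def provide_translation(terminal, server):
    # Sketch of steps (S111)-(S116) with assumed transport calls.
    language = terminal.get_language_info()                    # (S111)
    response = server.request(language_designation=language,   # (S112)
                              utterance=capture_utterance())
    if response.get("status") != "ok":                         # (S113)-(S114)
        report_error(response)                                 # (S116)
        return
    terminal.send(response["translation"])                     # (S115)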
An example of (S111) is (S81) in FIGS. 8 and 9; an example of (S112) is (S84) in FIGS. 8 and 9; examples of (S113) are (S85) in FIG. 8 and (S91) in FIG. 9; examples of (S115) are (S88) and (S89) in FIGS. 8 and 9; and an example of (S116) is (S87) in FIGS. 8 and 9.
The third embodiment may also take the form of a program that causes at least one computer to execute such a translation data provision method, or of a recording medium, readable by that at least one computer, on which such a program is recorded.
According to the third embodiment, the same operation and effects as in the first and second embodiments described above can be obtained.
Examples are given below to describe the above embodiments in more detail. The present invention is in no way limited by the following examples.
To receive translation data, a listener operates his or her listener device 30 to pair it with the speaker device 20. Pairing between the listener device 30 and the speaker device 20 is achieved through the authentication and related procedures of the wireless communication scheme used between the two terminals (Bluetooth (registered trademark), ZigBee, NFC, Wi-Fi, or the like). During this pairing process, the speaker device 20 (information acquisition unit 21) may acquire a terminal ID from each listener device 30. Furthermore, at pairing time, the speaker device 20 establishes a wireless channel with each listener device 30 and receives user profile information from each of them (information acquisition unit 21 and information acquisition unit 51). This user profile information includes the listener's language information. As a result, the user of a listener device 30 can receive translation data merely by instructing the device to pair with the speaker device 20.
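On the speaker side, the pairing callback might register a listener as below; the event object and the profile fields are assumptions, since the concrete pairing API depends on the radio technology used.

def on_paired(event, correspondence):
    # Store the language received in the user profile during pairing.
    profile = event.user_profile
    correspondence[event.terminal_id] = profile["language"]

class FakeEvent:
    # Stand-in for a pairing-complete event.
    terminal_id = "listener-01"
    user_profile = {"language": "fr"}

correspondence = {}
on_paired(FakeEvent(), correspondence)
print(correspondence)  # {'listener-01': 'fr'}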
The speaker device 20 can also acquire the listener's language information from the listener device 30 using human area network technology, which exploits the electric field at the surface of the human body. In this case, the information acquisition unit 21 and the information acquisition unit 51 acquire the language information from a terminal device such as the listener device 30 upon successful human body communication with it. The speaker device 20 then has a communication unit 15 that performs human body communication based on human area network technology and acquires the language information through it. In this way, the user of the listener device 30, while holding the device, can receive translation data simply through a physical contact such as shaking hands with the holder of the speaker device 20.
Each of the embodiments above can translate the utterances in a conversation between a speaker and a listener who is the user of a listener device 30, and can equally translate a speaker's utterances at a lecture, seminar, or similar event. In the latter case there may be multiple listeners who each wish to listen in a different language. Even if the number of listener devices 30 that can pair with one speaker device 20 is limited, a wireless multi-hop network allows many listener devices 30 to communicate with the speaker device 20; alternatively, without a multi-hop network, multiple speaker devices 20 can be used so that every listener device 30 can pair with one of them. According to the embodiments above, each listener can receive translation data in the desired language at substantially the same time.
In the flowcharts used in the description above, the steps (processes) are described in sequence, but the execution order of the steps in each embodiment is not limited to that order; the order of the illustrated steps may be changed to the extent that the content permits. The embodiments and modifications above may also be combined to the extent that their contents do not conflict.
Some or all of the above may also be specified as follows; however, the above is not limited to the following description.
1. An information processing device comprising:
 information acquisition means for acquiring language information from a terminal device;
 transmission means for transmitting, to a server device, language designation data corresponding to the acquired language information and utterance data of a speaker;
 receiving means for receiving, from a server device, translation data in which the utterance data has been translated into the language indicated by the language designation data; and
 providing means for transmitting the received translation data to the terminal device.
2. The information processing device according to 1., wherein:
 the information acquisition means acquires a plurality of pieces of different language information from a plurality of terminal devices;
 the transmission means transmits, to a server device, the language designation data corresponding to the acquired pieces of language information and the utterance data;
 the receiving means receives, from a server device, a plurality of pieces of translation data translated into the plurality of languages indicated by the language designation data, each associated with the language data corresponding to that translation data; and
 the providing means transmits the received pieces of translation data such that each of the plurality of terminal devices can receive the translation data corresponding to its own language information.
3. The information processing device according to 1. or 2., wherein:
 the information acquisition means acquires the language information and terminal identification information from the terminal device and stores each piece of terminal identification information in association with each piece of language information; and
 the providing means transmits the received translation data with, as its destination, the terminal identification information stored in association with the language information corresponding to that translation data.
4. The information processing device according to any one of 1. to 3., wherein the providing means transmits the received translation data by wireless broadcast in association with the language data corresponding to that translation data.
5. The information processing device according to any one of 1. to 4., further comprising utterance data acquisition means for acquiring utterance voice data and language information of the speaker, wherein:
 the transmission means transmits, to a server device, language data corresponding to the speaker's language information, the utterance voice data as the utterance data, and the language designation data; and
 the receiving means receives translated text data from a server device as the translation data.
6. The information processing device according to 5., wherein the receiving means receives a plurality of pairs of the translated text data and the language data corresponding to that translated text data, corresponding to the plurality of languages indicated by the language designation data, in a form that can be associated with the text data of the speech recognition result of the utterance voice data from which the translated text data was produced.
7. The information processing device according to 5. or 6., wherein the receiving means receives the translated text data, the language data corresponding to that translated text data, the text data of the speech recognition result of the utterance voice data from which the translated text data was produced, and the reliability information of that speech recognition result, in a mutually associable form.
8. The information processing device according to any one of 1. to 7., wherein the information acquisition means acquires the language information from the terminal device upon successful human body communication with the terminal device.
9. A translation data provision method executed by at least one computer, the method comprising:
 acquiring language information from a terminal device;
 transmitting, to a server device, language designation data corresponding to the acquired language information and utterance data of a speaker;
 receiving, from a server device, translation data in which the utterance data has been translated into the language indicated by the language designation data; and
 transmitting the received translation data to the terminal device.
10. The translation data provision method according to 9., further comprising:
 acquiring a plurality of pieces of different language information from a plurality of terminal devices;
 transmitting the language designation data corresponding to the acquired pieces of language information and the utterance data;
 receiving a plurality of pieces of translation data translated into the plurality of languages indicated by the language designation data, each associated with the language data corresponding to that translation data; and
 transmitting the received pieces of translation data such that each of the plurality of terminal devices can receive the translation data corresponding to its own language information.
11. The translation data provision method according to 9. or 10., further comprising:
 acquiring terminal identification information from the terminal device; and
 storing each piece of terminal identification information in association with each piece of language information,
 wherein the transmission to the terminal device transmits the received translation data with, as its destination, the terminal identification information stored in association with the language information corresponding to that translation data.
12. The translation data provision method according to any one of 9. to 11., wherein the transmission to the terminal device transmits the received translation data by wireless broadcast in association with the language data corresponding to that translation data.
13. The translation data provision method according to any one of 9. to 12., further comprising acquiring utterance voice data and language information of the speaker,
 wherein the transmission to the server device transmits language data corresponding to the speaker's language information, the utterance voice data as the utterance data, and the language designation data, and
 the reception from the server device receives translated text data from the server device as the translation data.
14. The translation data provision method according to 13., wherein the reception from the server device receives a plurality of pairs of the translated text data and the language data corresponding to that translated text data, corresponding to the plurality of languages indicated by the language designation data, in a form that can be associated with the text data of the speech recognition result of the utterance voice data from which the translated text data was produced.
15. The translation data provision method according to 13. or 14., wherein the reception from the server device receives the translated text data, the language data corresponding to that translated text data, the text data of the speech recognition result of the utterance voice data from which the translated text data was produced, and the reliability information of that speech recognition result, in a mutually associable form.
16. The translation data provision method according to any one of 9. to 15., wherein the acquisition of the language information acquires the language information from the terminal device upon successful human body communication with the terminal device.
17. A program that causes at least one computer to execute the translation data provision method according to any one of 9. to 16.
18. A recording medium on which the program according to 17. is recorded so as to be readable by a computer.
This application claims priority based on Japanese Patent Application No. 2014-140134, filed on July 8, 2014, the entire disclosure of which is incorporated herein.

Claims (10)

  1. An information processing device comprising:
     information acquisition means for acquiring language information from a terminal device;
     transmission means for transmitting, to a server device, language designation data corresponding to the acquired language information and utterance data of a speaker;
     receiving means for receiving, from a server device, translation data in which the utterance data has been translated into the language indicated by the language designation data; and
     providing means for transmitting the received translation data to the terminal device.
  2. The information processing device according to claim 1, wherein:
     the information acquisition means acquires a plurality of pieces of different language information from a plurality of terminal devices;
     the transmission means transmits, to a server device, the language designation data corresponding to the acquired pieces of language information and the utterance data;
     the receiving means receives, from a server device, a plurality of pieces of translation data translated into the plurality of languages indicated by the language designation data, each associated with the language data corresponding to that translation data; and
     the providing means transmits the received pieces of translation data such that each of the plurality of terminal devices can receive the translation data corresponding to its own language information.
  3. The information processing device according to claim 1 or 2, wherein:
     the information acquisition means acquires the language information and terminal identification information from the terminal device and stores each piece of terminal identification information in association with each piece of language information; and
     the providing means transmits the received translation data with, as its destination, the terminal identification information stored in association with the language information corresponding to that translation data.
  4. The information processing device according to any one of claims 1 to 3, wherein the providing means transmits the received translation data by wireless broadcast in association with the language data corresponding to that translation data.
  5. The information processing device according to any one of claims 1 to 4, further comprising utterance data acquisition means for acquiring utterance voice data and language information of the speaker, wherein:
     the transmission means transmits, to a server device, language data corresponding to the speaker's language information, the utterance voice data as the utterance data, and the language designation data; and
     the receiving means receives translated text data from a server device as the translation data.
  6. The information processing device according to claim 5, wherein the receiving means receives a plurality of pairs of the translated text data and the language data corresponding to that translated text data, corresponding to the plurality of languages indicated by the language designation data, in a form that can be associated with the text data of the speech recognition result of the utterance voice data from which the translated text data was produced.
  7. The information processing device according to claim 5 or 6, wherein the receiving means receives the translated text data, the language data corresponding to that translated text data, the text data of the speech recognition result of the utterance voice data from which the translated text data was produced, and the reliability information of that speech recognition result, in a mutually associable form.
  8. The information processing device according to any one of claims 1 to 7, wherein the information acquisition means acquires the language information from the terminal device upon successful human body communication with the terminal device.
  9. A translation data provision method executed by at least one computer, the method comprising:
     acquiring language information from a terminal device;
     transmitting, to a server device, language designation data corresponding to the acquired language information and utterance data of a speaker;
     receiving, from a server device, translation data in which the utterance data has been translated into the language indicated by the language designation data; and
     transmitting the received translation data to the terminal device.
  10. A program that causes at least one computer to execute the translation data provision method according to claim 9.
PCT/JP2015/065266 2014-07-08 2015-05-27 Information processing device, and translation-data provision method WO2016006354A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016532491A JPWO2016006354A1 (en) 2014-07-08 2015-05-27 Information processing apparatus and translation data providing method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014-140134 2014-07-08
JP2014140134 2014-07-08

Publications (1)

Publication Number Publication Date
WO2016006354A1 true WO2016006354A1 (en) 2016-01-14

Family

ID=55063996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/065266 WO2016006354A1 (en) 2014-07-08 2015-05-27 Information processing device, and translation-data provision method

Country Status (2)

Country Link
JP (1) JPWO2016006354A1 (en)
WO (1) WO2016006354A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05334353A (en) * 1992-06-02 1993-12-17 A T R Jido Honyaku Denwa Kenkyusho:Kk Speech translation and communication system
JP2006004296A (en) * 2004-06-18 2006-01-05 Cypress Soft Kk Multi-lingual translation system, central processing unit, computer program, and multi-lingual translation method
JP2013171478A (en) * 2012-02-22 2013-09-02 Zenrin Datacom Co Ltd Retrieval server device, information retrieval method and information retrieval program

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018186416A1 (en) * 2017-04-03 2018-10-11 旋造 田代 Translation processing method, translation processing program, and recording medium

Also Published As

Publication number Publication date
JPWO2016006354A1 (en) 2017-06-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15819400

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016532491

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15819400

Country of ref document: EP

Kind code of ref document: A1