WO2017135214A1 - Système, procédé et programme de traduction de parole (Speech translation system, speech translation method, and speech translation program) - Google Patents

Système, procédé et programme de traduction de parole (Speech translation system, speech translation method, and speech translation program)

Info

Publication number
WO2017135214A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
translation
unit
call
content
Prior art date
Application number
PCT/JP2017/003300
Other languages
English (en)
Japanese (ja)
Inventor
知高 大越
Original Assignee
株式会社リクルートライフスタイル (Recruit Lifestyle Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社リクルートライフスタイル (Recruit Lifestyle Co., Ltd.)
Publication of WO2017135214A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition

Definitions

  • the present invention relates to a speech translation system, a speech translation method, and a speech translation program.
  • In a conventional speech translation apparatus, when calling an interpreter while performing speech translation processing, the user must first specify identification information for identifying the interpreter (for example, a telephone number) and, after specifying the telephone number, must further perform a call operation. This may increase the burden on the user (the speaker) and decrease convenience.
  • An object of the present invention is therefore to provide a speech translation system, a speech translation method, and a speech translation program with which the burden on the user can be reduced, convenience can be improved, mistranslation can be prevented, and smooth communication can be realized.
  • According to one aspect of the present disclosure, a speech translation system comprises: an information terminal that inputs a user's speech; a server device that translates the content of the speech input to the information terminal; and an interpreter terminal that performs call processing with the information terminal. The server device comprises a speech recognition unit that recognizes the content of the speech input to the information terminal, and a translation unit that translates the content recognized by the speech recognition unit into content in a different language.
  • The information terminal comprises: a speech output unit that outputs, by voice, the content translated by the translation unit of the server device; a first display processing control unit that controls a process of displaying the text of the translated content and a process of selectively displaying a first image in addition to the text; and a call processing control unit that controls the call processing with the interpreter terminal and, when the first image is selected, transmits a call processing start request for starting the call processing to the interpreter terminal.
  • The server device may further include a score calculation unit that calculates a score related to translation accuracy, and the first display processing control unit may control a process of displaying the first image when the score is equal to or less than a predetermined threshold.
  • The server device may further include a storage unit that associates the translated content with the input speech content for each user and stores it as a translation history, and the interpreter terminal may further include a second display processing control unit that controls a process of displaying the translation history in association with each user.
  • The first display processing control unit may control a process of further displaying two or more second images respectively indicating two or more languages, and when one of the second images is selected, the call processing control unit may control call processing with the interpreter terminal associated with an interpreter who can use the language indicated by the selected second image.
  • According to another aspect of the present disclosure, a speech translation method comprises: outputting, by voice, content obtained by translating the content of a user's speech into content in a different language; controlling a process of displaying the text of the translated content; controlling a process of selectively displaying a first image in addition to the text; controlling call processing with an interpreter terminal; and, when the first image is selected, transmitting a call processing start request for starting the call processing to the interpreter terminal.
  • According to yet another aspect of the present disclosure, a speech translation program causes a computer to function as: a speech output unit that outputs, by voice, content obtained by translating the content of a user's speech into content in a different language; a first display processing control unit that controls a process of displaying the translated text and a process of selectively displaying a first image in addition to the text; and a call processing control unit that controls call processing with an interpreter terminal and, when the first image is selected, transmits a call processing start request for starting the call processing to the interpreter terminal.
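  • To make the flow described in the above aspects concrete, the following is a minimal sketch in Python of how the three devices could interact; every class, method, and string here is a hypothetical illustration added for this document, not an identifier taken from the patent.

```python
class SpeechTranslationServer:
    """Stands in for the server device 20: speech recognition + translation units."""

    def recognize(self, audio: bytes) -> str:
        # Speech recognition unit: audio -> recognized text (stubbed here).
        return "いらっしゃいませ"

    def translate(self, text: str, target_lang: str) -> str:
        # Translation unit: recognized text -> text in a different language (stubbed).
        return {"en": "Welcome to our restaurant."}.get(target_lang, text)


class InterpreterTerminal:
    """Stands in for the operator terminal 30."""

    def receive_call_start_request(self, terminal_id: str) -> str:
        # Returns a response signal: call permitted / not permitted.
        return "call-permitted"


class InformationTerminal:
    """Stands in for the information terminal 10."""

    def __init__(self, server: SpeechTranslationServer, interpreter: InterpreterTerminal):
        self.server = server
        self.interpreter = interpreter

    def on_user_speech(self, audio: bytes, target_lang: str) -> None:
        recognized = self.server.recognize(audio)
        translated = self.server.translate(recognized, target_lang)
        print(f"[speech output] {translated}")  # speech output unit
        print(f"[text display]  {translated}")  # first display processing control unit
        print("[display]       call start button (first image)")

    def on_call_button_selected(self) -> None:
        # Call processing control unit: transmit a call processing start request.
        response = self.interpreter.receive_call_start_request(terminal_id="terminal-10")
        print(f"[call]          response signal: {response}")


terminal = InformationTerminal(SpeechTranslationServer(), InterpreterTerminal())
terminal.on_user_speech(b"<pcm audio>", target_lang="en")
terminal.on_call_button_selected()
```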
  • In this specification, the terms “unit”, “apparatus”, and “system” do not simply mean physical means; they also cover cases where the functions of the “unit”, “apparatus”, or “system” are realized by software. In addition, the functions of one “unit”, “apparatus”, or “system” may be realized by two or more physical means or apparatuses, and the functions of two or more “units”, “apparatuses”, or “systems” may be realized by one physical means or apparatus.
  • According to the present invention, the burden on the user can be reduced, convenience can be improved, the occurrence of mistranslation can be prevented, and smooth communication can be realized.
  • FIG. 1 is a system block diagram schematically illustrating a preferred embodiment of the network configuration of a speech translation system according to the present disclosure.
  • FIG. 2 is a system block diagram schematically illustrating an example of the configuration of a user device (information terminal) in the speech translation system according to the present disclosure.
  • FIG. 3 is a functional block diagram schematically illustrating an example of the functional configuration of the user device (information terminal) in the speech translation system according to the present disclosure.
  • FIG. 4 is a system block diagram schematically illustrating an example of the configuration of a server device in the speech translation system according to the present disclosure.
  • FIG. 5 is a functional block diagram schematically illustrating an example of the functional configuration of the server device in the speech translation system according to the present disclosure.
  • FIG. 6 is a system block diagram schematically illustrating an example of the configuration of an operator terminal in the speech translation system according to the present disclosure.
  • FIG. 7 is a functional block diagram schematically illustrating an example of the functional configuration of the operator terminal in the speech translation system according to the present disclosure.
  • FIG. 8 is a flowchart showing an example of a process flow (part) in the speech translation system according to the present disclosure.
  • FIGS. 9(A) to 9(C), 10(A) to 10(C), and 11(A) to 11(D) are plan views showing examples of display screen transitions in the information terminal according to the present disclosure.
  • FIG. 12 is a diagram showing an example of a display screen in the interpreter terminal according to the present disclosure.
  • FIG. 13 is a flowchart showing another example of the process flow (part) in the speech translation system according to the present disclosure.
  • FIG. 1 is a system block diagram schematically illustrating a preferred embodiment of a network configuration according to a speech translation system according to the present disclosure.
  • As shown in FIG. 1, the speech translation system 100 illustratively comprises: an information terminal 10, used by the user (the speaker or the other party of the conversation), for inputting the user's voice; a server device 20, electronically connected to the information terminal 10 via the network N, which translates the content of the voice input to the information terminal 10; and an operator terminal 30 (interpreter terminal), used by an interpreter and electronically connected to the information terminal 10 and the server device 20 via the network N, which performs call processing with the information terminal 10.
  • FIG. 2 is a system block diagram schematically illustrating an example of the configuration of the user device (information terminal) in the speech translation system according to the present disclosure.
  • As shown in FIG. 2, the information terminal 10 illustratively includes a processor 11, a storage resource 12, a voice input/output device 13 (for example, a microphone and a speaker, separate or integrated), a communication interface 14, an input device 15, a display device 16, and a camera 17.
  • The information terminal 10 operates with installed speech translation application software (at least a part of a speech translation program according to an embodiment of the present disclosure), and thereby functions as a part or the whole of the speech translation system according to the embodiment of the present disclosure.
  • The information terminal 10 here is, for example, a portable tablet terminal device, including a mobile phone typified by a smartphone, that has a function of communicating with the network N.
  • the processor 11 includes an arithmetic logic unit and various registers (program counter, data register, instruction register, general-purpose register, etc.). Further, the processor 11 interprets and executes speech translation application software, which is the program P10 stored in the storage resource 12, and performs various processes.
  • the speech translation application software as the program P10 can be distributed from the server device 20 through the network N, for example, and may be installed and updated manually or automatically.
  • The network N includes, for example, wired networks (a short-range communication network (LAN), a wide-area communication network (WAN), a value-added network (VAN), and the like) and wireless networks (a mobile communication network, a satellite communication network, Bluetooth (registered trademark), WiFi (Wireless Fidelity), HSDPA (High Speed Downlink Packet Access), and the like).
  • The storage resource 12 is a logical device provided by the storage area of a physical device (for example, a computer-readable storage medium such as a semiconductor memory), and stores an operating system program, driver programs, various information, and the like used for the processing of the information terminal 10.
  • Examples of the driver programs include an input/output device driver program for controlling the voice input/output device 13, an input device driver program for controlling the input device 15, and an output device driver program for controlling the display device 16.
  • the voice input / output device 13 is, for example, a general microphone and a sound player capable of reproducing sound data.
  • the communication interface 14 provides a connection interface with the server device 20 and the operator terminal 30, for example, and is configured by a wireless communication interface and / or a wired communication interface.
  • The input device 15 provides an interface for accepting input operations by tap operations on icons, buttons, a virtual keyboard, and the like displayed on the display device 16; besides a touch panel, various input devices externally attached to the information terminal 10 can be exemplified.
  • The display device 16 provides various information as an image display interface to the user and, as necessary, to the other party of the conversation. Examples include an organic EL display, a liquid crystal display, and a CRT display, preferably of the various types that incorporate a touch panel.
  • the camera 17 is for capturing still images and moving images of various subjects.
  • FIG. 3 is a functional block diagram schematically illustrating an example of a functional configuration of a user device (information terminal) in the speech translation system according to the present disclosure.
  • the information terminal 10 functionally includes a voice input / output unit 101, a transmission / reception unit 103, an input operation reception unit 105, a display unit 107, an information processing unit 109, and a storage unit 117.
  • the information processing unit 109 functionally includes a score comparison unit 111, a first display processing control unit 113, a call processing control unit 115, and an operator terminal specifying unit 116.
  • The voice input/output unit 101 inputs the user's voice, for example. The voice input/output unit 101 also outputs, by voice, the content translated by the server device 20 shown in FIG. 1, for example. Here, the voice input/output device 13 shown in FIG. 2 functions as the voice input/output unit 101.
  • The transmission/reception unit 103 transmits and receives various information to and from the server device 20 and the operator terminal 30 shown in FIG. 1.
  • the transmission / reception unit 103 transmits the content of the input voice to the server device 20.
  • the transmission / reception unit 103 receives, for example, text information, audio information, and the like of content translated by the server device 20.
  • The transmission/reception unit 103 also receives, for example, a score related to translation accuracy from the server device 20.
  • the communication interface 14 illustrated in FIG. 2 functions as the transmission / reception unit 103.
  • the input operation accepting unit 105 is a block that accepts a user's input operation, for example.
  • the input device 15 illustrated in FIG. 2 functions as the input operation reception unit 105.
  • the display unit 107 displays various information.
  • The display unit 107 displays, for example, translated text. The display unit 107 also displays, for example, the language buttons 61 (second images) shown in FIG. 9(A) and the call start button 73 (first image) described later.
  • the display device 16 illustrated in FIG. 2 functions as the display unit 107.
  • The information processing unit 109 represents the functions of the processor 11 shown in FIG. 2. The score comparison unit 111 compares, for example, a score related to the translation accuracy of the translation processing performed by the server device 20 with a predetermined threshold (score).
  • the first display processing control unit 113 is a block that controls processing for displaying various types of information on the display unit 107.
  • The first display processing control unit 113 controls, for example, a process of displaying the text of the content translated by the server device 20, and also controls a process of selectively displaying the call start button 73 (first image) in addition to that text.
  • The call processing control unit 115 is, for example, a block that controls call processing between the information terminal 10 and the operator terminal 30; when the call start button 73 displayed on the display unit 107 is selected, it transmits a call processing start request for starting the call processing to the operator terminal 30.
  • The operator terminal specifying unit 116 specifies, for example, the operator terminal 30 used by an interpreter who can use the language indicated by the button (here, the English button) selected from the language buttons 61 shown in FIG. 9(A).
  • the storage unit 117 is a block that stores various programs and information used for processing of the information terminal 10.
  • the storage unit 117 stores, for example, text information, audio information, and the like of the content received by the transmission / reception unit 103 and translated by the server device 20.
  • the storage unit 117 stores a score related to the translation accuracy of the server device 20 received by the transmission / reception unit 103.
  • the storage resource 12 illustrated in FIG. 2 functions as the storage unit 117.
  • the camera 17 shown in FIG. 2 functions as, for example, an imaging unit (not shown in FIG. 3).
  • FIG. 4 is a system block diagram schematically illustrating an example of the configuration of the server device in the speech translation system according to the present disclosure.
  • the server device 20 illustratively includes a processor 21, a communication interface 22, and a storage resource 23.
  • The server device 20 is configured by, for example, a host computer with high arithmetic processing capability, and exhibits its server function when a predetermined server program operates on that host computer. For example, a single host computer or a plurality of host computers functioning as a speech synthesis server may be used (a single computer is shown in the figure, but this is not limiting).
  • The processor 21 is composed of an arithmetic and logic unit for processing arithmetic operations, logical operations, bit operations, and the like, and various registers (a program counter, a data register, an instruction register, a general-purpose register, and so on); it interprets and executes the program P20 stored in the storage resource 23 and outputs predetermined computation results.
  • the communication interface 22 is a hardware module for connecting to the information terminal 10 via the network N.
  • the communication interface 22 is a modulation / demodulation device such as an ISDN modem, an ADSL modem, a cable modem, an optical modem, or a soft modem.
  • The storage resource 23 is a logical device provided by, for example, the storage area of a physical device (a computer-readable storage medium such as a disk drive or a semiconductor memory), and stores one or more programs P20, various modules L20, various databases D20, and various models M20.
  • the program P20 is the above-described server program that is the main program of the server device 20.
  • The various modules L20 are software modules (modularized subprograms) that perform a series of information processing steps related to the requests and information transmitted from the information terminal 10; they are appropriately called and executed during the operation of the program P20. Examples of the modules L20 include a speech recognition module, a translation module, and a speech synthesis module.
  • The various databases D20 include the various corpora required for speech translation processing (for example, in the case of Japanese-English speech translation, a Japanese speech corpus, an English speech corpus, a Japanese character (vocabulary) corpus, an English character (vocabulary) corpus, a Japanese dictionary, an English dictionary, a Japanese-English bilingual dictionary, and a Japanese-English bilingual corpus), a speech database described later, a management database for managing information related to users, and the like.
  • examples of the various models M20 include an acoustic model and a language model used for speech recognition described later.
  • FIG. 5 is a functional block diagram schematically showing an example of the functional configuration of the server device in the speech translation system according to the present disclosure.
  • the server device 20 functionally includes a transmission / reception unit 201, an information processing unit 203, and a storage unit 213.
  • the information processing unit 203 includes, for example, a speech recognition unit 205, a multilingual translation unit 207, a score calculation unit 209, and a speech synthesis unit 211.
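  • As a concrete illustration, the four units of the information processing unit 203 can be pictured as a per-request pipeline, as in the following rough Python sketch; the recognizer, translator, and synthesizer are stubs standing in for whatever engines the modules L20 actually wrap, and all names are assumptions, not identifiers from the patent.

```python
from dataclasses import dataclass

@dataclass
class TranslationResponse:
    recognized_text: str      # output of the speech recognition unit 205
    translated_text: str      # output of the multilingual translation unit 207
    score: float              # accuracy score from the score calculation unit 209
    synthesized_audio: bytes  # output of the speech synthesis unit 211

# Stub engines; a real server would call ASR/MT/TTS modules (the modules L20) here.
def recognize_speech(audio: bytes, lang: str) -> str:
    return "ご注文はお決まりですか"

def translate_text(text: str, src: str, dst: str) -> tuple[str, float]:
    # Returns the translation and an accuracy score in [0, 1].
    return "Are you ready to order?", 0.92

def synthesize_speech(text: str, lang: str) -> bytes:
    return b"<synthesized pcm>"

def handle_speech(audio: bytes, source_lang: str, target_lang: str) -> TranslationResponse:
    recognized = recognize_speech(audio, source_lang)                         # step SS1
    translated, score = translate_text(recognized, source_lang, target_lang)  # step SS2
    synthesized = synthesize_speech(translated, target_lang)                  # step SS4
    return TranslationResponse(recognized, translated, score, synthesized)

response = handle_speech(b"<pcm audio>", "ja", "en")
print(response.translated_text, response.score)
```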
  • The transmission/reception unit 201 transmits and receives various information to and from the information terminal 10 and the operator terminal 30 shown in FIG. 1.
  • the transmission / reception unit 201 receives the content of the voice input to the information terminal 10 from the information terminal 10.
  • the transmission / reception unit 201 transmits, for example, text information, voice information, and the like of contents translated by the multilingual translation unit 207 described later to the information terminal 10.
  • the transmission / reception unit 201 transmits, for example, a score related to translation accuracy calculated by a score calculation unit 209 described later to the information terminal 10.
  • the communication interface 22 illustrated in FIG. 4 functions as the transmission / reception unit 201.
  • The information processing unit 203 represents the functions of the processor 21 shown in FIG. 4. The speech recognition unit 205 recognizes, for example, the content of the voice input to the information terminal 10.
  • the multilingual translation unit 207 translates the content recognized by the speech recognition unit 205 into the content of a different language.
  • the score calculation unit 209 calculates a score related to the translation accuracy of the multilingual translation unit 207.
  • the speech synthesis unit 211 performs speech synthesis based on the translation result by the multilingual translation unit 207.
  • the storage unit 213 is, for example, a block that stores various programs and information used for processing of the server device 20.
  • The storage unit 213 stores, for example, the content recognized by the speech recognition unit 205 and the content translated by the multilingual translation unit 207.
  • The storage unit 213 also stores the translated content in association with the content of the input voice, as a translation history for each user.
  • the storage resource 23 illustrated in FIG. 4 functions as the storage unit 213.
  • FIG. 6 is a system block diagram schematically illustrating an example of a configuration of an operator terminal (interpreter device) in the speech translation system according to the present disclosure.
  • As shown in FIG. 6, the operator terminal 30 includes a processor 31, a storage resource 32, a voice input/output device 33 (for example, a microphone and a speaker, separate or integrated), a communication interface 34, an input device 35, a display device 36, and a camera 37.
  • The operator terminal 30 thus has the same block configuration as the information terminal 10 shown in FIG. 2. In the following, configurations that differ from those of the information terminal 10 are described in particular.
  • The operator terminal 30 operates, for example, with installed CTI (Computer Telephony Integration) application software executed as at least a part of a speech translation program according to an embodiment of the present disclosure, and thereby functions as a part or the whole of the speech translation system.
  • The operator terminal 30 receives calls from the information terminal 10 shown in FIG. 1.
  • the interpreter performs interpretation via the operator terminal 30.
  • The operator terminal 30 displays, on the display device 36, information related to at least one of the other party of the call (for example, the information terminal 10 and the user operating it) and a translation history, which will be described in detail later.
  • the operator terminal 30 is, for example, a stationary terminal device including a desktop personal computer having a communication function with the network N.
  • the processor 31 interprets and executes CTI application software, which is the program P30 stored in the storage resource 32, and performs various processes.
  • The input device 35 provides an interface for accepting input operations by tap operations on icons, buttons, a virtual keyboard, and the like displayed on the display device 36; various input devices externally attached to the operator terminal 30, for example a keyboard and a mouse, can be exemplified. The input device 35 may also be a touch panel of any of various types that incorporates the function of the display device 36.
  • FIG. 7 is a functional block diagram schematically illustrating an example of a functional configuration of an operator terminal (interpreter device) in the speech translation system according to the present disclosure.
  • the operator terminal 30 functionally includes a voice input / output unit 301, a transmission / reception unit 303, an input operation reception unit 305, a display unit 307, an information processing unit 309, and a storage unit 315.
  • the information processing unit 309 functionally includes a call processing unit 311 and a second display processing control unit 313.
  • the voice input / output unit 301 inputs the voice of an operator including an interpreter, for example.
  • the voice input / output unit 301 may be configured to output the content indicating the translation history received by the transmission / reception unit 303 by voice as described later, for example.
  • the voice input / output device 33 illustrated in FIG. 6 functions as the voice input / output unit 301.
  • The transmission/reception unit 303 transmits and receives various information to and from the information terminal 10 and the server device 20 shown in FIG. 1.
  • the transmission / reception unit 303 receives, for example, a translation history transmitted from the server device 20 via the information terminal 10.
  • the transmission / reception unit 303 receives, for example, a call processing start request transmitted from the information terminal 10.
  • the transmission / reception unit 303 transmits a response signal to the call processing start request.
  • the communication interface 34 illustrated in FIG. 6 functions as the transmission / reception unit 303.
  • the input operation accepting unit 305 is a block that accepts an operator's input operation, for example.
  • the input device 35 illustrated in FIG. 6 functions as the input operation reception unit 305.
  • the display unit 307 displays various information.
  • the display unit 307 displays the translation history in association with each user.
  • the display device 36 illustrated in FIG. 6 functions as the display unit 307.
  • The information processing unit 309 represents the functions of the processor 31 shown in FIG. 6. The call processing unit 311, for example, performs call processing between the operator terminal 30 and the information terminal 10 based on a call processing start request transmitted from the information terminal 10, and generates a response signal to that request. The response signal is either a signal indicating that a call between the operator terminal 30 and the information terminal 10 is permitted or a signal indicating that such a call is not permitted.
  • the second display processing control unit 313 is a block that controls processing for displaying various types of information on the display unit 307.
  • the second display processing control unit 313 controls the display unit 307 to display the translation history in association with each user.
  • the storage unit 315 is a block that stores various programs and information used for processing of the operator terminal 30.
  • the storage unit 315 stores, for example, a translation history transmitted from the server device 20 via the information terminal 10 received by the transmission / reception unit 303.
  • the storage resource 32 illustrated in FIG. 6 functions as the storage unit 315.
  • Although not shown in FIG. 7, the camera 37 shown in FIG. 6 functions as an imaging unit, for example.
  • FIG. 8 is a flowchart illustrating an example of a process flow (part) in the speech translation system according to the present disclosure.
  • FIGS. 9(A) to 9(C), 10(A) to 10(C), and 11(A) to 11(D) are plan views illustrating examples of display screen transitions in the information terminal according to the present disclosure.
  • FIG. 12 is a diagram illustrating an example of a display screen in the interpreter terminal according to the present disclosure.
  • In the following, a conversation is assumed in which the user of the information terminal 10 is a restaurant clerk who speaks Japanese and the conversation partner is a customer who speaks English; that is, the input language is Japanese and the translation language is English. However, the present disclosure is not limited to this.
  • When the application is activated, a customer language selection screen is displayed on the display unit 107 (FIG. 8; step SJ2). As shown in FIG. 9(A), this language selection screen displays, for example, a Japanese text T21 for asking the customer about his or her language, an English text T22 for the same purpose, and language buttons 61 (second images) indicating a plurality of typical expected languages (for example, English, Chinese (two types depending on the typeface), and Korean (Hangul)).
  • At this time, the Japanese text T21 and the English text T22 are separated by the first display processing control unit 113 and the display unit 107 into, for example, differently colored areas on the screen of the display unit 107 of the information terminal 10, and are displayed in opposite orientations (different directions; upside down relative to each other in the figure).
  • the user can easily confirm the Japanese text T21, while the customer can easily confirm the English text T22.
  • Since the text T21 and the text T22 are displayed separately, there is also an advantage that they are clearly distinguished from each other.
  • the user presents the text T21 displayed on the language selection screen of FIG. 9A to the customer, and has the customer tap the English button, so that the customer's language is selected.
  • When the customer's language has been selected, a standby screen for voice input in Japanese and English is displayed on the display device as the home screen (FIG. 8; step SJ3). As shown in FIG. 9(B), a text T23 asking in which language, the user's or the customer's, the next utterance will be made is displayed on this home screen. The home screen also displays a history display button 63 for displaying a history of input contents, a language selection button 64 for returning to the language selection screen and switching the customer language (re-selecting the language), and a setting button 65 for performing various settings of the application software.
  • When the Japanese input button 62a is tapped on the home screen, a voice input screen (FIG. 9(C)) for accepting the user's Japanese utterance is displayed, and voice input from the voice input/output unit 101 is enabled. On this voice input screen, a text T24 prompting the user to input voice and a microphone design 66 indicating that voice input is in a standby state are displayed. To indicate that Japanese voice input was selected on the previous screen (FIG. 9(B)), the Japanese input button 62a is not displayed on the voice input screen of FIG. 9(C), and the English input button 62b is displayed in a light color, partially hidden behind the microphone design 66 (the same applies to FIGS. 10(A) and 10(B) described later). A cancel button 67 is displayed at the bottom of the voice input screen; by tapping it, the user can return to the voice input standby screen (FIG. 9(B)) and perform voice input again (the same applies to FIGS. 10(A) and 10(B) described later).
  • When the user speaks to the information terminal 10, a dynamically rendered multiple-circle design 68, which schematically indicates the volume of the voice, is displayed on the screen of the display unit 107 together with the text T24, and the voice input level is visually fed back to the user who is the speaker (FIG. 8; step SJ4).
  • When the information processing unit 109 of the information terminal 10 detects that there has been no voice input for a certain period of time, it ends the acceptance of the user's utterance.
  • the information processing unit 109 generates an audio signal based on the audio input, and transmits the audio signal to the server device 20 through the transmission / reception unit 103 and the network N.
  • The voice recognition unit 205 of the information processing unit 203 of the server device 20 receives the voice signal through the transmission/reception unit 201 and performs voice recognition processing (FIG. 8; step SS1). At this time, the speech recognition unit 205 calls the necessary module L20, database D20, and model M20 (a speech recognition module, a Japanese speech corpus, an acoustic model, a language model, and so on) from the storage unit 213, and converts the input voice into its “reading” (characters).
  • the information processing unit 203 generates a text signal for text output based on the recognized “reading” (characters) of the voice, and transmits the text signal to the information terminal 10 through the transmission / reception unit 201 and the network N.
  • Here, the information processing unit 203 may select, based on the content of the recognized speech itself and a Japanese conversation corpus stored in advance in the storage unit 213, the text signal candidate corresponding to the actual utterance content, and generate the text signal based on it.
  • The first display processing control unit 113 of the information terminal 10, having received the text signal through the transmission/reception unit 103, displays on the screen the Japanese text T25, which is the recognized content of the Japanese utterance input by the user.
  • Next, the multilingual translation unit 207 proceeds to a multilingual translation process for translating the recognized “reading” (characters) of the speech into another language (FIG. 8; step SS2). At this time, the multilingual translation unit 207 calls the necessary module L20 and database D20 (a translation module, a Japanese character corpus, a Japanese dictionary, an English dictionary, a Japanese-English bilingual dictionary, a Japanese-English bilingual corpus, and so on) from the storage unit 213, appropriately sorts the input speech “reading” (character string) that is the recognition result, converts it into Japanese phrases, clauses, and sentences, extracts the English corresponding to the conversion result, and arranges the result in accordance with English grammar. During this translation process, the display unit 107 displays a standby screen including a Japanese text T26 indicating that translation is in progress and a circular design 69 indicating the same.
  • When the multilingual translation process is completed, the storage unit 213 stores the translation result (translated content) in association with the content of the input speech, as a translation history for each user (FIG. 8; step SS3). Specifically, the storage unit 213 stores, for example, the English conversation corpus entry corresponding to the translated English phrase, clause, or sentence as a translation history in association with the content of the input speech.
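  • The per-user translation history kept by the storage unit 213 can be pictured with the following minimal Python sketch; the data shapes and names are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class HistoryEntry:
    input_text: str        # content of the input speech (recognized text)
    translated_text: str   # translation result associated with it
    timestamp: datetime = field(default_factory=datetime.now)

class TranslationHistoryStore:
    """Keeps translated contents keyed by user, like the storage unit 213."""

    def __init__(self) -> None:
        self._entries: dict[str, list[HistoryEntry]] = {}

    def store(self, user_id: str, input_text: str, translated_text: str) -> None:
        self._entries.setdefault(user_id, []).append(
            HistoryEntry(input_text, translated_text))

    def for_user(self, user_id: str) -> list[HistoryEntry]:
        # Entries are appended in order, so the list is already a time series,
        # which suits the chronological display described for the operator terminal.
        return self._entries.get(user_id, [])

store = TranslationHistoryStore()
store.store("user-1", "いらっしゃいませ", "Welcome.")
store.store("user-1", "ご注文はお決まりですか", "Are you ready to order?")
print([e.translated_text for e in store.for_user("user-1")])
```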
  • Next, the speech synthesis unit 211 calls the module L20, database D20, and model M20 necessary for speech synthesis (a speech synthesis module, an English speech corpus, an acoustic model, a language model, and so on) from the storage unit 213, and converts the English conversation corpus entry corresponding to the translation result, an English phrase, clause, or sentence, into natural speech (FIG. 8; step SS4).
  • When the multilingual translation process and the speech synthesis process are completed, the information processing unit 203 generates a text signal for text display based on the English conversation corpus entry that is the translation result (translated content), generates an audio signal for audio output based on the synthesized speech, and transmits both to the information terminal 10 through the transmission/reception unit 201 and the network N.
  • The first display processing control unit 113 of the information terminal 10, having received the text signal and the audio signal through the transmission/reception unit 103, displays, as a conversation screen, the text T25, the Japanese conversation corpus text T27 corresponding to the text T25 (here the same as the text T25, but not limited thereto), and the English conversation corpus text T28 that is the translation result; it also controls a process of selectively displaying the call start button 73 (first image) on that screen (FIG. 8; step SJ5).
  • The storage unit 117 of the information terminal 10 may store the information received by the transmission/reception unit 103 (the texts T27 and T28, the audio signal, and the like).
  • Together with step SJ5, the voice input/output unit 101 outputs (reads out) the content (translated content) of the English text T28 that is the translation result (FIG. 8; step SJ6).
  • step SJ6 may be executed before or after step SJ5.
  • At this time, the Japanese texts T25 and T27 and the English text T28 are also separated on the screen of the display unit 107 of the information terminal 10 by, for example, differently colored areas and line segments, and are displayed in opposite orientations (different directions; upside down relative to each other in the figure). Thereby, even though the user and the customer are in a face-to-face conversation, as long as both can see the screen of the display unit 107, the user can easily check the Japanese texts T25 and T27 (input content) while the customer can easily check the English text T28 (translated content). Since the texts T25 and T27 and the text T28 are displayed separately, there is also an advantage that they are clearly distinguished from each other.
  • The audio output can be repeated by tapping the audio output button 70 displayed on the conversation screen of FIG. 10(C). This conversation screen also displays a check button 71 indicating that the current translation is finished; by tapping it, the translation process is ended and the display can be returned to the home screen (FIG. 9(B)).
  • Next, when the customer answers the question of the user (store clerk), voice processing such as the customer's voice input, recognition, translation, and voice synthesis is performed (FIG. 8; No in step SJ7). Specifically, the check button 71 displayed in FIG. 10(C) is tapped to display the home screen (FIG. 9(B)), and the English input button 62b is then tapped to select English voice input by the customer. The subsequent processing is basically the same as the processing described above, except that the speaker changes from the user to the customer, Japanese voice input is replaced with English voice input, and English voice and text output is replaced with Japanese voice and text output; a detailed description is therefore omitted here. When the conversation ends, the series of speech translation processes is terminated.
  • Here, when the call start button 73 (first image) displayed in step SJ5 is selected, the call processing control unit 115 may be configured to transmit a call processing start request so that a call with the interpreter is made. The call processing control unit 115 may generate the call processing start request when the call start button 73 is selected, or may generate it in advance, before the call start button 73 is selected.
  • The call processing start request includes, for example, identification information of the information terminal 10, and may further be generated to include the translation history from the server device 20. The identification information of the information terminal 10 includes, for example, attributes of the user of the information terminal 10, that is, the user's name, address, date of birth, age, affiliation, family structure, and the like, as well as the telephone number or identification number (ID) of the information terminal 10.
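  • One plausible shape for such a call processing start request is sketched below in Python; the field names are hypothetical and simply mirror the items listed above.

```python
from dataclasses import dataclass, field

@dataclass
class UserAttributes:
    name: str
    address: str
    date_of_birth: str
    age: int
    affiliation: str

@dataclass
class CallProcessingStartRequest:
    terminal_id: str                 # telephone number or identification number (ID)
    user: UserAttributes             # attributes of the user of the information terminal 10
    translation_history: list[tuple[str, str]] = field(default_factory=list)
    # (input text, translated text) pairs forwarded from the server device 20

request = CallProcessingStartRequest(
    terminal_id="+81-3-0000-0000",
    user=UserAttributes("Hanako Sato", "Tokyo", "1980-01-01", 45, "Restaurant A"),
    translation_history=[("いらっしゃいませ", "Welcome.")],
)
print(request.terminal_id, len(request.translation_history))
```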
  • A call between the store clerk or customer using the information terminal 10 and the interpreter using the operator terminal 30 is carried out through the network N, which includes a general telephone line network, an IP telephone line network, and the like. There is no particular limitation on the calling means; it is sufficient that the two parties can talk to each other.
  • When there are a plurality of interpreters available for a call, the store clerk may want to talk to the most appropriate interpreter. For this purpose, identification information of each interpreter, or of the terminal used by each interpreter, is stored in association with language information indicating one or more languages that the interpreter can use.
  • In step SJ3 shown in FIG. 8, the user presents the text T21 displayed on the language selection screen shown in FIG. 9(A) to the customer and asks the customer to tap, for example, the English button, so that the customer's language is selected. The operator terminal specifying unit 116 then refers to the terminal identification information of each interpreter and the language information indicating the languages each interpreter can use, both stored in the storage unit 117, and specifies the operator terminal 30 used by an interpreter who can use the language indicated by the selected English button, that is, English. The call processing control unit 115 transmits a call processing start request to the operator terminal used by that interpreter, whereby a call between the two parties is started. In this way, an interpreter who can handle the language used in the communication between the store clerk and the customer can be appropriately identified.
  • The storage unit 117 of the information terminal 10 may store, in addition to the terminal identification information of each interpreter and the language information indicating the languages each interpreter can use, information indicating the interpretation level and interpretation ability of each interpreter, in association with the identification information of each interpreter or of the terminal used by each interpreter. Then, when the English button is selected in step SJ3 shown in FIG. 8, the operator terminal specifying unit 116 may specify the operator terminal used by the interpreter with the higher interpretation level and ability among the plurality of interpreters who can use English.
  • The operator terminal specifying unit 116 may specify the interpreter when the call start button 73 (first image) displayed on the display unit 107 of the information terminal 10 is selected in step SJ5 of FIG. 8. Alternatively, the operator terminal specifying unit 116 may be configured to specify in advance, for each language used in the communication between the store clerk and the customer, the operator terminal used by the interpreter who will take the call.
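  • The selection performed by the operator terminal specifying unit 116, filtering interpreters by usable language and preferring a higher interpretation level, could look like the following sketch; the registry layout and all names are assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Interpreter:
    terminal_id: str     # identification information of the interpreter's terminal
    languages: set[str]  # one or more languages the interpreter can use
    level: int           # interpretation level/ability (higher is better)

def specify_operator_terminal(registry: list[Interpreter], language: str) -> Optional[str]:
    # Filter by usable language, then prefer the highest interpretation level.
    candidates = [i for i in registry if language in i.languages]
    if not candidates:
        return None
    return max(candidates, key=lambda i: i.level).terminal_id

registry = [
    Interpreter("op-01", {"en"}, level=2),
    Interpreter("op-02", {"en", "zh"}, level=3),
    Interpreter("op-03", {"ko"}, level=1),
]
assert specify_operator_terminal(registry, "en") == "op-02"
assert specify_operator_terminal(registry, "fr") is None
```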
  • the transmission / reception unit 303 of the operator terminal 30 receives the call processing start request from the information terminal 10 (FIG. 8; step SO1).
  • the transmission / reception unit 303 transmits a response signal to the information terminal 10 (FIG. 8; step SO2).
  • When permitting a call between the information terminal 10 and the operator terminal 30, the call processing unit 311 generates a response signal indicating that the call is permitted. For example, the call processing unit 311 determines whether or not to permit a call with the information terminal 10 by comparing the identification information of the information terminal 10 included in the received call processing start request with identification information of callable information terminals stored in advance in the storage unit 315 or in another storage resource with which the operator terminal 30 can communicate. When the call processing unit 311 does not permit the call between the information terminal 10 and the operator terminal 30, it generates a response signal indicating that the call is not permitted.
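  • The permission check performed by the call processing unit 311 amounts to comparing the requesting terminal's identification information against a pre-registered list and building the corresponding response signal, as in this minimal sketch (names and data shapes are assumptions):

```python
# Identification information of callable information terminals, stored in advance
# in the storage unit 315 or another reachable storage resource (assumed contents).
ALLOWED_TERMINALS = {"terminal-10", "terminal-11"}

def respond_to_call_request(requesting_terminal_id: str) -> dict:
    # Compare the terminal ID carried by the call processing start request
    # against the pre-registered list, and build the response signal.
    permitted = requesting_terminal_id in ALLOWED_TERMINALS
    return {"type": "response-signal", "call_permitted": permitted}

assert respond_to_call_request("terminal-10")["call_permitted"] is True
assert respond_to_call_request("unknown-terminal")["call_permitted"] is False
```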
  • the second display processing control unit 313 causes the display unit 307 to display the translation history transmitted from the server device 20 via the information terminal 10 in association with each user (FIG. 8; step SO3).
  • For example, as shown in FIG. 12, the second display processing control unit 313 controls a process of displaying an image 81 indicating “calling” on the screen of the display unit 307, and a process of displaying an image 83 including a column indicating the user's name, a column indicating the telephone number of the information terminal used by the user, a column indicating the identification number of that information terminal, and a column indicating attribute information such as the user's address. It may also control a process of displaying the translation history for each user.
  • Thereby, when starting a call with the user (store clerk) or the customer, the interpreter can check the speech translation history, and can therefore respond based on the flow of the communication between the store clerk and the customer up to that point. Furthermore, by displaying the speech translation history in time series on the display unit 307, the operator terminal 30 allows the interpreter to grasp that flow more easily and to respond appropriately based on it.
  • When the information terminal 10 receives a response signal from the operator terminal 30 (FIG. 8; step SJ8), a connection between the information terminal 10 and the operator terminal 30 is established, and a call between the store clerk or customer and the interpreter is realized (FIG. 8; steps SJ9 and SO4).
  • At this time, the processor 11 displays the text T30 on the screen of the display unit 107.
  • In the first embodiment, the information terminal displays the call start button (first image) when outputting the translation result. The second embodiment differs from the first embodiment in that the information terminal compares the score related to translation accuracy calculated by the server device with a predetermined threshold, and displays the call start button (first image) when the score is equal to or lower than the predetermined threshold.
  • The second embodiment will be described with reference to FIG. 13. Differences from the flowchart shown in FIG. 8, which describes the first embodiment, are described in particular; description of the points similar to the flowchart shown in FIG. 8 is omitted.
  • FIG. 13 is a flowchart showing another example of the process flow (part) in the speech translation system.
  • The multilingual translation unit 207 of the server device 20 executes a multilingual translation process for translating the recognized “reading” (characters) of the speech into another language (FIG. 13; step SS12).
  • The storage unit 213 stores, as a translation history for each user, the translation result (translated content) associated with the content of the input speech together with a score related to the translation accuracy of that translation result (FIG. 13; step SS13).
  • In the translation process, for example, statistical translation is performed: correspondences between words and phrases of the two languages are extracted from bilingual data including, for example, a bilingual dictionary with probabilities and a word-order conversion table with probabilities. The score calculation unit 209 is configured to calculate, for each translation result, a score expressing the translation accuracy, for example as a percentage.
  • When the multilingual translation process and the speech synthesis process are completed, the speech synthesis unit 211 generates a text signal for text display based on the English conversation corpus entry that is the translation result (translated content), and also generates an audio signal for audio output based on the synthesized speech. The generated text signal, the generated audio signal, and the score related to translation accuracy are then transmitted to the information terminal 10 through the transmission/reception unit 201 and the network N.
  • Next, the score comparison unit 111 of the information terminal 10 compares the score related to translation accuracy calculated by the server device 20 with a predetermined threshold (FIG. 13; step SJ15). If the score is higher than the predetermined threshold (FIG. 13; No in step SJ15), the translation accuracy is good: the first display processing control unit 113 displays the translation result on the display unit 107, and the synthesized speech is output (FIG. 13; step SJ16). For example, when the predetermined threshold is 80% and the score related to the translation accuracy of the translation process in the server device 20 is 90%, the translation accuracy is judged to be good. If the translation is performed accurately and the customer can understand the question of the user (store clerk), the process returns to step SJ13 shown in FIG. 13, and this time the customer's voice is subjected to voice processing such as input, recognition, translation, and voice synthesis.
  • In step SJ15, if the score related to translation accuracy is equal to or lower than the predetermined threshold (FIG. 13; Yes in step SJ15), the translation accuracy is poor: the first display processing control unit 113 displays the translation result together with the call start button (FIG. 13; step SJ17).
  • As described above, the first display processing control unit of the information terminal controls a process of selectively displaying the call start button when the translation result is displayed on the display unit. When the call start button is selected, a call processing start request for starting a call between the user and an interpreter is transmitted; thereby, the burden on the user can be reduced, convenience can be improved, mistranslation can be prevented, and smooth communication can be realized. In particular, the information terminal compares the score related to the translation accuracy of the speech translation with the predetermined threshold and displays the call start button when the translation accuracy is low. Since the call start button is displayed only when the need for a call with an interpreter is high, the call with the interpreter can be started more smoothly.
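  • The decision made in step SJ15 can be summarized in a short Python sketch; the 80%/90% figures echo the example above, and the function names are illustrative assumptions.

```python
PREDETERMINED_THRESHOLD = 0.80  # e.g. 80%

def on_translation_received(translated_text: str, score: float) -> None:
    # FIG. 13, step SJ15: compare the server-calculated accuracy score
    # with the predetermined threshold.
    if score > PREDETERMINED_THRESHOLD:
        # No in step SJ15: accuracy is good (e.g. 90% > 80%).
        display_text(translated_text)
        play_synthesized_speech(translated_text)  # step SJ16
    else:
        # Yes in step SJ15: accuracy is poor; offer a human interpreter.
        display_text(translated_text)
        display_call_start_button()               # step SJ17

def display_text(text: str) -> None:
    print(f"[display] {text}")

def play_synthesized_speech(text: str) -> None:
    print(f"[speech]  {text}")

def display_call_start_button() -> None:
    print("[display] call start button (first image)")

on_translation_received("May I take your order?", score=0.90)  # button not shown
on_translation_received("May I take your odor?", score=0.72)   # button shown
```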
  • In the embodiments described above, the processes of speech recognition, translation, and speech synthesis are executed by the server device 20, but these processes may instead be executed by the information terminal 10.
  • the module L20 used for these processes may be stored in the storage resource 12 of the information terminal 10 or may be stored in the storage resource 23 of the server device 20.
  • Likewise, the databases D20 such as the voice database and/or the models M20 such as the acoustic model may be stored in the storage resource 12 of the information terminal 10 or in the storage resource 23 of the server device 20.
  • the speech translation system may not include the network N and the server device 20.
  • Alternatively, the system may be configured so that this processing is performed in the server device 20.
  • The step of displaying the translation history in association with each user in step SO3 shown in FIG. 8 may be executed simultaneously with step SO1, or after step SO1 and simultaneously with or before step SO2. Similarly, the step of displaying the translation history in association with each user in step SO13 shown in FIG. 13 may be executed simultaneously with step SO11, or after step SO11 and simultaneously with or before step SO12.
  • In the above description, the operator terminal 30 obtains the translation history by receiving a call processing start request that includes the translation history, but the configuration is not limited to this.
  • the operator terminal 30 may be configured to receive the translation history directly from the server device 20 before or after receiving the call processing start request.
  • a gateway server for converting a communication protocol between the information terminal 10 and the network N or between the operator terminal 30 and the network N may be interposed.
  • the information terminal 10 is not limited to a portable device, and may be a desktop personal computer, a notebook personal computer, a tablet personal computer, a laptop personal computer, or the like.
  • the operator terminal 30 is not limited to a stationary device, and may be configured with a portable tablet terminal device having a communication function with the network N.

Abstract

The present invention makes it possible to reduce the burden on a user and improve convenience, as well as to prevent the occurrence of mistranslation and establish smooth communication. A speech translation system comprises an information terminal for inputting a user's speech, a server device for translating the content of the speech input to the information terminal, and an interpreter terminal for performing call processing with the information terminal. The server device comprises a speech recognition unit for recognizing the content of the speech input to the information terminal and a translation unit for translating the content recognized by the speech recognition unit into content in a different language. The information terminal comprises a speech output unit for outputting, by voice, the content translated by the translation unit of the server device, a first display processing control unit for controlling a process of selectively displaying a first image in addition to the text, and a call processing control unit for transmitting to the interpreter terminal, when the first image is selected, a call processing start request for starting the call processing.
PCT/JP2017/003300 2016-02-01 2017-01-31 Système, procédé et programme de traduction de parole (Speech translation system, speech translation method, and speech translation program) WO2017135214A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016-017071 2016-02-01
JP2016017071A JP6449181B2 (ja) 2016-02-01 2016-02-01 音声翻訳システム、音声翻訳方法、及び音声翻訳プログラム (Speech translation system, speech translation method, and speech translation program)

Publications (1)

Publication Number Publication Date
WO2017135214A1 true WO2017135214A1 (fr) 2017-08-10

Family

ID=59499823

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/003300 WO2017135214A1 (fr) 2016-02-01 2017-01-31 Système, procédé et programme de traduction de parole (Speech translation system, speech translation method, and speech translation program)

Country Status (2)

Country Link
JP (1) JP6449181B2 (fr)
WO (1) WO2017135214A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107507615A (zh) * 2017-08-29 2017-12-22 百度在线网络技术(北京)有限公司 界面智能交互控制方法、装置、系统及存储介质 (Intelligent interface interaction control method, apparatus, system, and storage medium)
CN111478971A (zh) * 2020-04-14 2020-07-31 青岛联合视界数字传媒有限公司 一种多语言翻译电话系统及翻译方法 (Multilingual translation telephone system and translation method)
CN112818707B (zh) * 2021-01-19 2024-02-27 传神语联网网络科技股份有限公司 基于逆向文本共识的多翻引擎协作语音翻译系统与方法 (Multi-engine collaborative speech translation system and method based on reverse text consensus)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002223299A (ja) * 2001-01-26 2002-08-09 Hitachi Ltd 通訳サービスシステム (Interpretation service system)
JP2004157882A (ja) * 2002-11-07 2004-06-03 Patolis Corp オンライン文献検索・翻訳方法 (Online document retrieval and translation method)
JP2017010311A (ja) * 2015-06-23 2017-01-12 株式会社Nttドコモ 翻訳支援システム、情報処理装置およびプログラム (Translation support system, information processing apparatus, and program)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62286172A (ja) * 1986-06-04 1987-12-12 文書処理装置 (Document processing apparatus)
JPS63106866A (ja) * 1986-10-24 1988-05-11 Toshiba Corp 機械翻訳装置 (Machine translation apparatus)
JPH01230177A (ja) * 1988-03-10 1989-09-13 Oki Electric Ind Co Ltd 翻訳処理システム (Translation processing system)
JPH07105220A (ja) * 1993-09-30 1995-04-21 Hitachi Ltd 会話翻訳装置 (Conversation translation apparatus)
JP5821096B2 (ja) * 2011-06-30 2015-11-24 三井金属アクト株式会社 自動車用ドアロック装置 (Door lock device for automobile)

Also Published As

Publication number Publication date
JP2017138650A (ja) 2017-08-10
JP6449181B2 (ja) 2019-01-09

Similar Documents

Publication Publication Date Title
US9355094B2 (en) Motion responsive user interface for realtime language translation
US8781840B2 (en) Retrieval and presentation of network service results for mobile device using a multimodal browser
US20140288919A1 (en) Translating languages
JP2020118955A (ja) 非表音文字体系を使用する言語のための音声支援型アプリケーションプロトタイプの試験中の音声コマンドマッチング (Voice command matching during testing of a voice-assisted application prototype for a language using a non-phonetic writing system)
JP2002116796A (ja) 音声処理装置、音声処理方法及び記憶媒体 (Speech processing apparatus, speech processing method, and storage medium)
JP2015153108A (ja) 音声会話支援装置、及び音声会話支援方法及びプログラム (Speech conversation support apparatus, speech conversation support method, and program)
US11538476B2 (en) Terminal device, server and controlling method thereof
US11763074B2 (en) Systems and methods for tool integration using cross channel digital forms
US20080195375A1 (en) Echo translator
JP2014048506A (ja) 単語登録装置及びそのためのコンピュータプログラム (Word registration apparatus and computer program therefor)
WO2017135214A1 (fr) Système, procédé et programme de traduction de parole (Speech translation system, method, and program)
JPH07222248A (ja) 携帯型情報端末における音声情報の利用方式 (Method of using voice information in a portable information terminal)
JP6141483B1 (ja) 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム (Speech translation device, speech translation method, and speech translation program)
JP2000075887A (ja) パターン認識装置、方法及びシステム (Pattern recognition apparatus, method, and system)
JP6290479B1 (ja) 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム (Speech translation device, speech translation method, and speech translation program)
KR100593589B1 (ko) 음성인식을 이용한 다국어 통역/학습 장치 및 방법 (Multilingual interpretation/learning apparatus and method using speech recognition)
JP6250209B1 (ja) 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム (Speech translation device, speech translation method, and speech translation program)
JP6310950B2 (ja) 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム (Speech translation device, speech translation method, and speech translation program)
JP6353860B2 (ja) 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム (Speech translation device, speech translation method, and speech translation program)
WO2017122657A1 (fr) Dispositif de traduction de parole, procédé de traduction de parole et programme de traduction de parole (Speech translation device, speech translation method, and speech translation program)
JP6110539B1 (ja) 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム (Speech translation device, speech translation method, and speech translation program)
US20070244687A1 (en) Dialog supporting device
JP6334589B2 (ja) 定型フレーズ作成装置及びプログラム、並びに、会話支援装置及びプログラム (Fixed-phrase creation apparatus and program, and conversation support apparatus and program)
JP6198879B1 (ja) 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム (Speech translation device, speech translation method, and speech translation program)
JP6174746B1 (ja) 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム (Speech translation device, speech translation method, and speech translation program)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17747370

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17747370

Country of ref document: EP

Kind code of ref document: A1