WO2019071723A1 - Speech translation method, apparatus and translation machine - Google Patents

Speech translation method, apparatus and translation machine

Info

Publication number
WO2019071723A1
WO2019071723A1 · PCT/CN2017/111962 · CN2017111962W
Authority
WO
WIPO (PCT)
Prior art keywords
speech
voice information
voice
translation
information
Prior art date
Application number
PCT/CN2017/111962
Other languages
English (en)
French (fr)
Inventor
郑勇
金志军
熊宽
张立新
王文祺
Original Assignee
深圳市沃特沃德股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市沃特沃德股份有限公司 filed Critical 深圳市沃特沃德股份有限公司
Publication of WO2019071723A1 publication Critical patent/WO2019071723A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/55Rule-based translation
    • G06F40/56Natural language generation

Definitions

  • the present invention relates to the field of electronic technologies, and in particular, to a voice translation method, apparatus, and translation machine.
  • a primary object of the present invention is to provide a speech translation method, apparatus, and translation machine, which are intended to improve the convenience of operation and reduce the production cost.
  • an embodiment of the present invention provides a voice translation method, where the method includes the following steps.
  • the step of outputting the translated voice information includes:
  • when a segment of speech ends, the control output device outputs the translated voice information.
  • the step of detecting whether a segment of speech has ended includes:
  • determining whether no voice information is detected within a first time period; when no voice information is detected within the first time period, determining that the segment of speech has ended. Optionally, the first time period is longer than the time taken to translate the voice information.
  • the first time period is 1-2 seconds.
  • simultaneously with the step of the control output unit outputting the translated voice information, the method further includes: stopping collecting voice information.
  • the step of stopping collecting voice information comprises: turning off a voice input path of the microphone.
  • the output device is a sounding device.
  • the step of outputting the translated voice information further includes:
  • determining whether no voice information is detected within a second time period, and entering a standby state when no voice information is detected within the second time period;
  • the second time period is 1-10 minutes.
  • Embodiments of the present invention also provide a voice translation apparatus, where the apparatus includes:
  • an activation module configured to receive a voice wake-up instruction, and enter an active state according to the voice wake-up instruction
  • a processing module configured to collect voice information, and perform translation processing on the voice information
  • an output module configured to output voice information after the translation process.
  • the output module includes:
  • a detecting unit configured to detect whether a segment of speech has ended;
  • an output unit configured to control the output device to output the translated voice information when the segment of speech ends;
  • the detecting unit includes:
  • a determining subunit configured to determine whether no voice information is detected within a first time period;
  • a decision subunit configured to determine that a segment of speech has ended when no voice information is detected within the first time period.
  • the first time period is longer than the time taken to translate the voice information.
  • the processing module is further configured to stop collecting voice information while the output module outputs the translated voice information;
  • the processing module is configured to stop collecting voice information by turning off a voice input path of the microphone.
  • the device further includes:
  • a determining module configured to determine, after the output module outputs the voice information, whether no voice information is detected within a second time period;
  • a standby module configured to enter a standby state when no voice information is detected within the second time period.
  • Embodiments of the present invention also provide a translation machine including a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform the aforementioned speech translation method.
  • a speech translation method provided by an embodiment of the present invention activates the translation machine by voice control to perform translation processing, thereby eliminating the buttons of the translation machine. The user no longer needs to press a button twice per utterance; simultaneous translation can be realized simply by waking the translation machine by voice, which frees the user's hands, improves convenience of operation, and enhances the user experience. Since no additional buttons are needed, the production cost of the translation machine is reduced, which is beneficial to an integrated design.
  • FIG. 1 is a flow chart of a first embodiment of a speech translation method of the present invention
  • FIG. 2 is a flow chart of a second embodiment of a speech translation method of the present invention.
  • FIG. 3 is a block diagram showing an example of a system architecture for implementing the speech translation method of the present invention
  • FIG. 4 is a schematic diagram of state switching of a translation machine in implementing a speech translation method of the present invention.
  • FIG. 5 is a block diagram showing a first embodiment of a speech translation apparatus of the present invention.
  • FIG. 6 is a block diagram of the output module of FIG. 5;
  • FIG. 7 is a block diagram of the detecting unit of FIG. 6;
  • FIG. 8 is a block diagram showing a second embodiment of a speech translation apparatus of the present invention.
  • the terms "terminal" and "terminal device" used herein include both devices having only a wireless signal receiver without transmitting capability and devices having both receiving and transmitting hardware.
  • Such a device may comprise: a cellular or other communication device with a single-line display, a multi-line display, or no multi-line display; a PCS (Personal Communications Service) device, which may combine voice, data processing, fax and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; or a conventional laptop and/or palmtop computer or other device that includes a radio frequency receiver.
  • PCS: Personal Communications Service
  • PDA: Personal Digital Assistant
  • the "terminal" and "terminal equipment" used herein may be portable, transportable, installed in a vehicle (air, sea and/or land), or adapted and/or configured to operate locally and/or in distributed form at any location on the earth and/or in space.
  • the "terminal" and "terminal device" used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device), and/or a mobile phone with music/video playback functionality, and may also be a smart TV, set-top box, or other such device.
  • the server used herein includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers.
  • the cloud consists of a large number of computers or network servers based on cloud computing, a form of distributed computing in which a group of loosely coupled computers acts as a super virtual computer.
  • communication between the server, the terminal device, and the WNS server may be implemented by any communication means, including but not limited to mobile communication based on 3GPP, LTE, and WIMAX; computer network communication based on the TCP/IP and UDP protocols; and short-range wireless transmission based on Bluetooth and infrared transmission standards.
  • the voice translation method and device of the embodiments of the present invention are mainly applied to a translation machine, but may also be applied to a mobile terminal such as a mobile phone or tablet, a computer terminal such as a personal computer or notebook computer, or other terminal devices; the present invention is not limited in this respect.
  • a mobile terminal such as a mobile phone or a tablet
  • a computer terminal such as a personal computer or a notebook computer, and other terminal devices
  • a first embodiment of a speech translation method of the present invention includes the following steps: S11, receiving a voice wake-up instruction, and entering an active state according to the voice wake-up instruction.
  • the translation machine omits the button.
  • the user does not need to press the button with the finger, and only needs to issue a voice wake-up command to wake up the translation machine, so that the translation machine enters an active state to start voice translation.
  • the production cost of the translation machine is reduced, and on the other hand, the user's hands are liberated, and the convenience of operation is improved.
  • the user can set a specific keyword as the voice wake-up instruction according to preference, for example setting the keyword "Little Wo" as the voice wake-up instruction; when the voice message "Little Wo" is detected, the translator switches from the standby state to the active state and starts translating. In this way, the translator can only be activated by the specific keyword. When not activated it remains in the standby state, in which no speech translation is performed, which reduces power consumption on the one hand and avoids mistranslation on the other.
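As a rough illustration of the wake-word gating described above, the sketch below activates the device only when a configured keyword appears in the recognized text. The keyword string, function names, and the text-based matching are all assumptions for illustration; a production device would typically run a low-power keyword-spotting model directly on the audio.

```python
# Hypothetical sketch: keyword-based wake-up from standby.
# "little wo" mirrors the "Little Wo" example in the text; all names
# here are illustrative, not part of the patent.

WAKE_KEYWORD = "little wo"  # user-configurable wake word (assumed)


def is_wake_command(recognized_text: str) -> bool:
    """Return True when the recognized utterance contains the wake keyword."""
    return WAKE_KEYWORD in recognized_text.lower()


class TranslatorState:
    STANDBY = "standby"
    ACTIVE = "active"


def handle_utterance(state: str, recognized_text: str) -> str:
    """In standby, only the wake keyword activates the device;
    any other speech is ignored, so no mistranslation occurs."""
    if state == TranslatorState.STANDBY and is_wake_command(recognized_text):
        return TranslatorState.ACTIVE
    return state
```

Gating translation behind the keyword check is what lets the device sit in standby with low power draw, since the full recognition/translation pipeline never runs on unrelated speech.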
  • after being woken by the keyword, the translation machine switches from the standby state to the active state, immediately collects the sound signal through the microphone, performs voice activity detection (VAD) on the sound signal, acquires voice information, and detects the start and end of a segment of speech.
  • VAD voice activity detection
  • when performing voice activity detection, the signal is preferably processed frame by frame, with the length of each frame set according to the characteristics of the voice signal; for example, the 20-millisecond frame length of GSM may be used, and the ETSI VAD algorithm of the GSM communication system or the G.729 Annex B VAD algorithm may be adopted to extract parameter feature values of the sound signal and compare them with threshold values. When a parameter feature value is greater than or equal to the threshold, the frame is determined to be a speech frame and the speech information is obtained; when the parameter feature value is less than the threshold, the frame is determined to be a non-speech frame.
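The frame-by-frame speech/non-speech decision can be sketched with a simple short-term energy feature. This is a simplified stand-in, not the ETSI GSM or G.729 Annex B algorithm the text cites (those use richer parameter feature sets); the threshold value and helper names are assumptions.

```python
# Minimal energy-based VAD sketch for 20 ms frames.
# A real implementation would use the parameter features of the
# ETSI GSM VAD or G.729 Annex B; short-term energy is a stand-in.

def frame_energy(samples):
    """Average squared amplitude of one frame of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def classify_frames(frames, threshold):
    """Mark each frame as speech (True) when its energy feature
    is greater than or equal to the threshold, else non-speech (False)."""
    return [frame_energy(f) >= threshold for f in frames]
```

Only frames classified as speech would be forwarded for translation; non-speech frames are discarded, which is what makes frame-wise translation during collection feasible.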
  • the translation processing preferably proceeds frame by frame, that is, each frame of voice information is translated while voice information is still being collected.
  • the translation process mainly includes three stages: recognition, translation, and synthesis. First, the voice information is recognized, converting the sound into text to obtain a first character string; then the first character string is translated into a second character string in the target language; finally, speech synthesis is performed on the second character string to obtain a code stream of voice information in the target language.
  • the translation machine may translate the voice information locally, or may translate the voice information through a server, and the server may be one, two or three.
  • the translator sends the voice information to the server; the server recognizes, translates, and synthesizes the voice information, obtains a code stream of voice information translated into the target language, and returns it to the translator. The translator receives the code stream of target-language voice information, which is the translated voice information.
  • the translation machine sends the voice information to the recognition engine server; the recognition engine server recognizes the voice information, converts the sound into text, obtains the first character string, and returns it to the translation machine.
  • the translation machine sends the first character string to the translation engine server; the translation engine server translates the first character string into the second character string of the target language and returns it to the translator.
  • the translator sends the second character string to the synthesis engine server; the synthesis engine server performs speech synthesis on the second character string to obtain a code stream of target-language voice information and returns it to the translation machine.
  • the translation machine receives the code stream of target-language voice information, which is the translated voice information.
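The recognize-translate-synthesize flow above can be sketched as three placeholder stages chained together. All function bodies here are stand-ins (assumptions); in the patent each stage may be handled locally or by a separate engine server.

```python
# Sketch of the three-stage pipeline: recognition -> translation -> synthesis.
# Every stage below is a placeholder; real engines (ASR, MT, TTS) would
# replace these bodies, possibly as calls to three separate servers.

def recognize(audio: bytes) -> str:
    """ASR stage: convert sound into text (the first character string)."""
    return audio.decode("utf-8")  # stand-in for a real recognition engine


def translate(text: str, target_lang: str) -> str:
    """MT stage: translate the first string into the target-language string."""
    demo_dict = {("hello", "zh"): "你好"}  # illustrative lookup only
    return demo_dict.get((text, target_lang), text)


def synthesize(text: str) -> bytes:
    """TTS stage: produce a code stream of target-language voice."""
    return text.encode("utf-8")  # stand-in for a synthesis engine


def translate_speech(audio: bytes, target_lang: str) -> bytes:
    """Full pipeline: returns the code stream of the translated voice."""
    first_string = recognize(audio)
    second_string = translate(first_string, target_lang)
    return synthesize(second_string)
```

Splitting the pipeline into three independent stages is what allows the one-server and three-server deployments described in the text to share the same logical flow.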
  • the translation machine preferably outputs the translated voice information after the user finishes speaking. Specifically, the translation machine detects, by voice activity detection, whether a segment of speech has ended; when it has, the control output device outputs the translated speech information.
  • the output device may be a sounding device and/or a display device, etc.; that is, the translated voice information may be output as a sound signal, or may be output in the form of text and/or images.
  • the sounding device is, for example, a speaker (horn), an earpiece, or the like.
  • the translator can detect whether a segment of speech has finished in the following manner: determining whether no voice information is detected within the first time period; when no voice information is detected within the first time period, determining that the segment of speech has ended.
  • the first time period, which is the preset pause between two segments of speech, can be set according to actual needs and is generally longer than the time taken to translate the voice information, to ensure that translation of the last frame of voice information is completed.
  • the first time period is preferably 1-2 seconds; for example, when no speech information is detected within 1 second, it is determined that the segment of speech has ended.
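A minimal sketch of this silence-timeout rule, assuming timestamps in seconds and a hypothetical detector class (the class and method names are illustrative, not from the patent):

```python
# End-of-speech detection by silence timeout: a segment of speech is
# judged finished when no voice frame has arrived for the "first time
# period" (1-2 s in the text, which must exceed per-frame translation time).

class EndOfSpeechDetector:
    def __init__(self, first_period: float = 1.5):
        self.first_period = first_period  # seconds of silence that end a segment
        self.last_voice_time = None       # timestamp of the latest speech frame

    def on_voice_frame(self, t: float) -> None:
        """Record the timestamp of a frame the VAD classified as speech."""
        self.last_voice_time = t

    def speech_ended(self, now: float) -> bool:
        """True once silence has lasted at least the first time period."""
        if self.last_voice_time is None:
            return False
        return (now - self.last_voice_time) >= self.first_period
```

Because the timeout exceeds the translation latency of a frame, the last frame's translation is guaranteed to be ready by the time output begins.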
  • the translator can also determine whether a segment of speech has ended by recognizing a specific ending word; for example, the user may say "end", "finished", "over", or the like at the end of a sentence. When the translation machine detects such an ending word, it determines that the segment of speech has ended.
  • while outputting the translated voice information, the translator stops collecting voice information, for example by turning off the voice input path of the microphone, thereby reducing the power consumption of the translator.
  • when the output of the voice information is finished, collection of voice information is resumed, that is, the voice input path of the microphone is reopened.
  • after step S13, the following steps are further included:
  • step S14: determining whether no voice information is detected within the second time period.
  • when voice information is detected within the second time period, the process returns to step S12 and voice information continues to be collected for translation processing; when no voice information is detected within the second time period, the process proceeds to step S15.
  • when no voice information is detected for a long time (beyond the second time period), the translator automatically switches from the active state to the standby state to reduce power consumption.
  • the second time period needs to be longer than the first time period and can be set according to actual needs, preferably 1-10 minutes; for example, when no voice information is detected within 5 minutes, the translator automatically enters the standby state.
  • after the translation machine enters the standby state, if the user needs to use it again, it must be woken up again by the voice wake-up instruction; that is, the process returns to step S11, and when the voice wake-up command is received again, the translation machine switches from the standby state back to the active state.
  • the translator can also enter the standby state immediately after outputting the translated voice information, in which case the user needs to wake up the translator through the voice wake-up instruction each time before speaking.
  • the translator is connected to the server via a network.
  • the translator includes a microphone, a front-end processor and a speaker.
  • the server includes an identification engine, a translation engine, and a synthesis engine.
  • the translation machine collects the sound signal through the microphone. In the standby state it performs keyword recognition on the collected sound signal and switches from the standby state to the active state when the voice wake-up command is recognized. After entering the active state, it detects the voice information by voice activity detection in the front-end processor and sends the voice information to the server through the network.
  • the server recognizes, translates, and synthesizes the voice information through the recognition engine, the translation engine, and the synthesis engine respectively to obtain the translated voice information, and returns the translated voice information to the front-end processor of the translator.
  • the front-end processor of the translator drives the speaker to output the translated voice information and simultaneously turns off the voice input path of the microphone.
  • after the speaker finishes outputting the voice information, the voice input path of the microphone is reopened; when no voice information is collected for a long time, the translator automatically switches from the active state to the standby state.
  • the schematic diagram of state switching of the translation machine includes three states, S0, S1, and S2, that can be switched between in sequence.
  • S0 is the state in which the microphone input is on and the device waits in standby for a voice wake-up command.
  • S1 is the active state entered after the voice wake-up command, in which the microphone input is on.
  • S2 is the state in which the translated voice information is output and the microphone input is off.
  • in the S0 state, when a voice wake-up command is input, the device switches to the S1 state; in the S1 state, when no voice information is detected within the first time period, it switches to the S2 state; in the S2 state, when the voice information output is completed, it switches back to the S1 state.
  • in the S1 state, when no voice information is detected within the second time period, the device switches to the S0 state.
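The S0/S1/S2 switching described above can be sketched as a small transition table. The event names are assumptions; the transitions follow the text (wake-up command, end of a speech segment within the first time period, output completion, and the second-time-period timeout):

```python
# State machine sketch for the translator of FIG. 4:
# S0 standby (mic on, waiting for wake word), S1 active (mic on,
# collecting/translating), S2 output (speaker on, mic off).

TRANSITIONS = {
    ("S0", "wake_command"): "S1",    # wake word recognized: enter active state
    ("S1", "speech_ended"): "S2",    # silence > first period: output translation
    ("S2", "output_done"): "S1",     # speaker finished: mic path reopens
    ("S1", "second_timeout"): "S0",  # silence > second period: back to standby
}


def next_state(state: str, event: str) -> str:
    """Return the next state; events not defined for a state are ignored."""
    return TRANSITIONS.get((state, event), state)
```

Note that `speech_ended` is meaningless in S0, which is exactly why speech in standby is neither translated nor mistranslated: the table simply has no such transition.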
  • the speech translation method of the embodiment of the present invention activates the translation machine by voice control to perform translation processing, thereby eliminating the buttons of the translation machine: the user no longer needs to press a button twice per utterance, and simultaneous interpretation can be realized simply by waking the machine by voice. This frees the user's hands, improves convenience of operation, and enhances the user experience. At the same time, because no additional buttons are required, the production cost of the translation machine is reduced, which is conducive to an integrated design.
  • the apparatus includes an activation module 10, a processing module 20, and an output module 30, wherein: the activation module 10 is configured to receive a voice wake-up instruction and enter an active state according to the voice wake-up instruction;
  • the processing module 20 is configured to collect voice information, and perform translation processing on the voice information;
  • the output module 30 is configured to output the voice information after the translation processing.
  • the user can set a specific keyword as the voice wake-up instruction according to preference, for example setting the keyword "Little Wo" as the voice wake-up instruction; when the voice message "Little Wo" is detected, the activation module 10 controls the translation machine to enter the active state from the standby state, and voice translation is started.
  • in this way, the translator can only be activated by the specific keyword. When not activated it remains in the standby state, in which no speech translation is performed, which reduces power consumption on the one hand and avoids mistranslation on the other.
  • after entering the active state, the processing module 20 immediately collects a sound signal through the microphone, performs voice activity detection (VAD) on the sound signal, acquires voice information, and detects the start and end of a segment of speech. When performing voice activity detection, the signal is preferably processed frame by frame, with the length of each frame set according to the characteristics of the voice signal; for example, the 20-millisecond frame length of GSM may be used, and the ETSI VAD algorithm of the GSM communication system or the G.729 Annex B VAD algorithm may be adopted to extract parameter feature values of the sound signal and compare them with threshold values. When a parameter feature value is greater than or equal to the threshold, the frame is determined to be a speech frame and the speech information is obtained; when the parameter feature value is less than the threshold, the frame is determined to be a non-speech frame.
  • after acquiring the voice information, the processing module 20 performs translation processing on the voice information, translating it from one language to another. The translation processing is preferably performed frame by frame, that is, each frame of voice information is translated while voice information is still being collected.
  • the translation process mainly includes three stages: recognition, translation, and synthesis. First, the voice information is recognized, converting the sound into text to obtain a first character string; then the first character string is translated into a second character string in the target language; finally, speech synthesis is performed on the second character string to obtain a code stream of voice information in the target language.
  • the processing module 20 may translate the voice information locally, or may translate the voice information through a server, and the server may be one, two or three.
  • the processing module 20 sends the voice information to the server; the server recognizes, translates, and synthesizes the voice information, obtains a code stream of voice information translated into the target language, and returns the code stream to the processing module 20. The processing module 20 receives the code stream of target-language voice information, which is the translated voice information.
  • the processing module 20 sends the voice information to the recognition engine server; the recognition engine server recognizes the voice information, converts the sound into text, obtains the first character string, and returns it to the processing module 20.
  • the processing module 20 sends the first character string to the translation engine server; the translation engine server translates the first character string into the second character string of the target language and returns it to the processing module 20.
  • the processing module 20 sends the second character string to the synthesis engine server; the synthesis engine server performs speech synthesis on the second character string to obtain a code stream of target-language voice information and returns it to the processing module 20. The processing module 20 receives the code stream of target-language voice information, which is the translated voice information.
  • the output module 30 preferably outputs the voice information after the translation process after the user finishes speaking.
  • the output module 30 includes a detecting unit 31 and an output unit 32, wherein: the detecting unit 31 is configured to detect whether a segment of speech has ended; and the output unit 32 is configured to control the output device to output the translated voice information when the segment of speech ends.
  • the output device may be a sounding device and/or a display device, etc.; that is, the translated voice information may be output as a sound signal, or may be output in the form of text and/or images.
  • the sounding device is, for example, a speaker (horn), an earpiece, or the like.
  • the detecting unit 31 may detect whether a segment of speech has ended by using the feature that the user pauses after speaking a passage.
  • the detecting unit 31 includes a determining subunit 311 and a decision subunit 312, wherein: the determining subunit 311 is configured to determine whether no voice information is detected within the first time period; and the decision subunit 312 is configured to determine that a segment of speech has ended when no voice information is detected within the first time period.
  • the first time period, which is the preset pause between two segments of speech, can be set according to actual needs and is generally longer than the time taken to translate the voice information, to ensure that translation of the last frame of voice information is completed.
  • the first time period is 1-2 seconds; for example, when no voice information is detected within 1 second, it is determined that the segment of speech has ended.
  • the detecting unit may also determine whether a segment of speech has ended by recognizing a specific ending word; for example, the user may say "end", "finished", "over", or the like at the end of a sentence. When the detecting unit detects such an ending word, it determines that the segment of speech has ended.
  • the processing module 20 is further configured to stop collecting voice information while the output module 30 outputs the translated voice information, thereby reducing the power consumption of the translator.
  • the processing module 20 can stop collecting voice information by turning off the voice input path of the microphone. After the output of the voice information ends, the processing module 20 resumes collection of voice information, that is, reopens the voice input path of the microphone.
  • the apparatus further includes a judging module 40 and a standby module 50, wherein: the judging module 40 is configured to determine, after the output module 30 outputs the voice information, whether no voice information is detected within the second time period; and the standby module 50 is configured to enter the standby state when no voice information is detected within the second time period.
  • the standby module 50 controls the translator to automatically switch from the active state to the standby state to reduce power consumption.
  • the second time period needs to be longer than the first, and can be set according to actual needs, preferably 1-10 minutes; for example, if no voice information is detected within 5 minutes, the standby module 50 controls the translation machine to automatically enter the standby state.
  • when voice information is detected within the second time period, the processing module 20 continues to collect voice information for translation processing.
  • when a voice wake-up command is received again, the activation module 10 controls the translator to switch from the standby state to the active state.
  • the standby module 50 can also cause the device to enter the standby state immediately after the output module 30 outputs the translated voice information, in which case the user needs to wake up the translator through the voice wake-up instruction each time before speaking.
  • the speech translation apparatus of the embodiment of the present invention activates the translation machine by voice control to perform translation processing, thereby eliminating the buttons of the translation machine: the user no longer needs to press a button twice per utterance, and simultaneous interpretation can be realized simply by waking the machine by voice. This frees the user's hands, improves convenience of operation, and enhances the user experience. Since no additional buttons are needed, the production cost of the translation machine is reduced, which is conducive to an integrated design.
  • the present invention also provides a translation machine including a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform a speech translation method.
  • the voice translation method includes the following steps: receiving a voice wake-up instruction and entering an active state according to the voice wake-up instruction; collecting voice information and performing translation processing on the voice information; and outputting the translated voice information.
  • the speech translation method described in this embodiment is the speech translation method involved in the foregoing embodiments of the present invention, and details are not described here again. Those skilled in the art will appreciate that the present invention includes apparatus directed to performing one or more of the operations described herein.
  • These devices may be specially designed and manufactured for the required purposes, or may include known devices in a general-purpose computer. These devices have computer programs stored therein that are selectively activated or reconfigured. Such computer programs may be stored in a device-readable (e.g., computer-readable) medium, or in any type of medium suitable for storing electronic instructions and coupled to a bus, including but not limited to any type of disk (including floppy disks, hard disks, CDs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), and EPROM (Erasable Programmable Read-Only Memory).
  • a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
  • each block of the structural diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks therein, can be implemented by computer program instructions.
  • these computer program instructions can be provided to a general-purpose computer, a special-purpose computer, or a processor of another programmable data processing method, such that the schemes specified in the block or blocks of the structural diagrams and/or block diagrams and/or flow diagrams of the invention are executed by the computer or the processor of the other programmable data processing method.

Abstract

The present invention discloses a speech translation method comprising the following steps: receiving a voice wake-up instruction and entering an activated state according to the voice wake-up instruction; collecting voice information and performing translation processing on the voice information; and outputting the translated voice information. By activating the translation machine through voice control, the speech translation method provided by the embodiments of the present invention eliminates the translation machine's buttons and improves convenience of operation.

Description

Speech Translation Method, Apparatus, and Translation Machine

Technical Field
[0001] The present invention relates to the field of electronic technology, and in particular to a speech translation method, apparatus, and translation machine.
Background Art
[0002] At present, two users who speak different languages can communicate freely through a translation machine. In a typical implementation, the user presses a specific button on the translation machine once before speaking, whereupon the machine collects and translates the voice information; after finishing a passage, the user presses the button again, and the machine outputs the translated voice information.
[0003] It follows that existing translation machines require the user to press a button twice for every utterance, which is extremely inconvenient. Moreover, the extra buttons raise the production cost of the translation machine.

Technical Problem
[0004] The main objective of the present invention is to provide a speech translation method, apparatus, and translation machine that improve convenience of operation and reduce production cost.
Solution to the Problem

Technical Solution
[0005] To achieve the above objective, an embodiment of the present invention provides a speech translation method comprising the following steps:
[0006] receiving a voice wake-up instruction, and entering an activated state according to the voice wake-up instruction;
[0007] collecting voice information, and performing translation processing on the voice information;
[0008] outputting the translated voice information.
[0009] Optionally, the step of outputting the translated voice information comprises:
[0010] detecting whether a speech segment has ended;
[0011] when a speech segment has ended, controlling an output device to output the translated voice information.
[0012] Optionally, the step of detecting whether a speech segment has ended comprises:
[0013] determining whether no voice information has been detected within a first time period;
[0014] when no voice information has been detected within the first time period, judging that the speech segment has ended.

[0015] Optionally, the first time period is longer than the time taken to translate the voice information.
[0016] Optionally, the first time period is 1-2 seconds.
[0017] Optionally, simultaneously with the step of controlling the output unit to output the translated voice information, the method further comprises: stopping the collection of voice information.
[0018] Optionally, the step of stopping the collection of voice information comprises: closing the voice input path of the microphone.
[0019] Optionally, the output device is a sound-producing device.
[0020] Optionally, the step of outputting the translated voice information comprises:
[0021] determining whether no voice information has been detected within a second time period;
[0022] when no voice information has been detected within the second time period, entering a standby state.
[0023] Optionally, the second time period is 1-10 minutes.
[0024] An embodiment of the present invention also provides a speech translation apparatus comprising:
[0025] an activation module, configured to receive a voice wake-up instruction and enter an activated state according to the voice wake-up instruction;
[0026] a processing module, configured to collect voice information and perform translation processing on the voice information;
[0027] an output module, configured to output the translated voice information.
[0028] Optionally, the output module comprises:
[0029] a detection unit, configured to detect whether a speech segment has ended;
[0030] an output unit, configured to control an output device to output the translated voice information when a speech segment has ended.
[0031] Optionally, the detection unit comprises:
[0032] a judging subunit, configured to determine whether no voice information has been detected within a first time period;
[0033] a decision subunit, configured to judge that a speech segment has ended when no voice information has been detected within the first time period.
[0034] Optionally, the first time period is longer than the time taken to translate the voice information.
[0035] Optionally, the processing module is further configured to stop collecting voice information when the output module outputs the translated voice information.
[0036] Optionally, the processing module is configured to stop collecting voice information by closing the voice input path of the microphone.
[0037] Optionally, the apparatus further comprises:
[0038] a judging module, configured to determine, after the output module has output voice information, whether no voice information has been detected within a second time period;
[0039] a standby module, configured to enter a standby state when no voice information has been detected within the second time period.
[0040] An embodiment of the present invention further provides a translation machine comprising a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform the aforementioned speech translation method.
Advantageous Effects of the Invention

Advantageous Effects
[0041] The speech translation method provided by the embodiments of the present invention activates the translation machine by voice control to perform translation, thereby eliminating the translation machine's buttons. The user no longer needs to press a button twice per utterance: once the machine has been woken by voice, simultaneous interpretation proceeds automatically, freeing the user's hands, improving convenience of operation, and enhancing the user experience. Since no extra buttons are needed, the production cost of the translation machine is reduced, which also facilitates an integrated industrial design.
Brief Description of the Drawings

Description of the Drawings
[0042] FIG. 1 is a flowchart of a first embodiment of the speech translation method of the present invention;
[0043] FIG. 2 is a flowchart of a second embodiment of the speech translation method of the present invention;
[0044] FIG. 3 is a block diagram of an example system architecture implementing the speech translation method of the present invention;
[0045] FIG. 4 is a state-switching diagram of the translation machine during the speech translation method of the present invention;
[0046] FIG. 5 is a block diagram of a first embodiment of the speech translation apparatus of the present invention;
[0047] FIG. 6 is a block diagram of the output module in FIG. 5;
[0048] FIG. 7 is a block diagram of the detection unit in FIG. 6;
[0049] FIG. 8 is a block diagram of a second embodiment of the speech translation apparatus of the present invention.
[0050] The realization of the objectives, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Best Mode for Carrying Out the Invention

Best Mode of the Present Invention
[0051] It should be understood that the specific embodiments described herein are merely intended to explain the present invention and are not intended to limit it.

[0052] Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it.
[0053] Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may also be present. Furthermore, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The phrase "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.
[0054] Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meanings as commonly understood by one of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries are to be understood as having meanings consistent with their meanings in the context of the prior art and, unless specifically defined as herein, are not to be interpreted in an idealized or overly formal sense.
[0055] Those skilled in the art will understand that "terminal" and "terminal device" as used herein include both devices having only a wireless signal receiver with no transmission capability and devices having receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such devices may include: cellular or other communication devices with a single-line display, a multi-line display, or no multi-line display; a PCS (Personal Communications Service), which may combine voice, data processing, fax, and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio-frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar, and/or a GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device having and/or including a radio-frequency receiver. "Terminal" and "terminal device" as used herein may be portable, transportable, installed in a vehicle (aviation, maritime, and/or land), or suitable for and/or configured to run locally and/or in distributed form at any other location on Earth and/or in space. "Terminal" and "terminal device" as used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device), and/or a mobile phone with music/video playback capability, or a device such as a smart TV or a set-top box.
[0056] Those skilled in the art will understand that the server used herein includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, the cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. In the embodiments of the present invention, communication between the server, the terminal device, and the WNS server may be realized by any communication means, including but not limited to mobile communication based on 3GPP, LTE, or WIMAX, computer network communication based on TCP/IP or UDP, and short-range wireless transmission based on Bluetooth or infrared transmission standards.
[0057] The speech translation method and apparatus of the embodiments of the present invention are mainly applied to a translation machine, but may of course also be applied to mobile terminals such as mobile phones and tablets, computer terminals such as personal computers and laptops, and other terminal devices, which the present invention does not limit. The application to a translation machine is described in detail below as an example.
[0058] Referring to FIG. 1, a first embodiment of the speech translation method of the present invention is proposed, the method comprising the following steps:

[0059] S11: Receive a voice wake-up instruction and enter an activated state according to the voice wake-up instruction.
[0060] In the embodiments of the present invention, the translation machine omits buttons. During translation the user does not need to press a button with a finger; issuing a voice wake-up instruction is enough to wake the translation machine, so that it enters the activated state and begins speech translation. This both lowers the production cost of the translation machine and frees the user's hands, improving convenience of operation.
[0061] The user can set a specific keyword as the voice wake-up instruction according to preference, for example the keyword "小沃" (Xiao Wo). When the voice information "小沃" from the user is detected, the translation machine switches from the standby state to the activated state and begins speech translation. In this way the translation machine can only be activated by the specific keyword and remains in the standby state when not activated; no speech translation is performed in standby, which both reduces power consumption and avoids mistranslation.
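The keyword gate described above can be sketched as a small transcript check; a minimal illustration, assuming the recognizer already yields text for the captured audio (the function and state names are illustrative, not from the patent):

```python
# Minimal sketch of keyword-based wake-up: the machine leaves standby only
# when the configured keyword appears in the recognized transcript.

WAKE_KEYWORD = "小沃"  # user-configurable wake keyword from the description

class TranslatorState:
    STANDBY = "standby"
    ACTIVE = "active"

def handle_transcript(state: str, transcript: str) -> str:
    """Return the next state: wake up only on the configured keyword."""
    if state == TranslatorState.STANDBY and WAKE_KEYWORD in transcript:
        return TranslatorState.ACTIVE  # voice wake-up instruction received
    return state                       # otherwise stay in the current state
```

In standby no translation is attempted, matching the power-saving and mistranslation-avoidance behavior described.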
[0062] S12: Collect voice information and perform translation processing on the voice information.

[0063] After being woken by the keyword, the translation machine switches from the standby state to the activated state and immediately collects the sound signal through the microphone, performs voice activity detection (VAD) on the sound signal to obtain voice information, and detects the beginning and end of a speech segment. Voice activity detection is preferably performed frame by frame, with the frame duration set according to the characteristics of the speech signal; for example, using the 20 ms interval of GSM as the frame length, the ETSI VAD algorithm of the GSM communication system or the G.729 Annex B VAD algorithm extracts a parameter feature value from the sound signal and compares it with a threshold. When the parameter feature value is greater than or equal to the threshold, the frame is judged to be a speech frame and the voice information is obtained; when the parameter feature value is less than the threshold, the frame is judged to be a non-speech frame.
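The per-frame threshold comparison can be illustrated with a toy VAD. This is only a sketch: real systems such as ETSI VAD or G.729 Annex B use richer parameter features, and the sample rate and threshold below are assumed values, not taken from the patent.

```python
# Toy frame-based VAD in the spirit described above: split the signal into
# 20 ms frames, compute a feature (here, mean absolute amplitude), and
# compare it with a threshold to classify speech vs. non-speech frames.

FRAME_MS = 20                                  # 20 ms frame, as in GSM
SAMPLE_RATE = 8000                             # assumed sampling rate
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000     # samples per frame (160)

def is_speech_frame(frame, threshold=500):
    """Classify one frame: speech if its feature value reaches the threshold."""
    if not frame:
        return False
    feature = sum(abs(s) for s in frame) / len(frame)  # parameter feature value
    return feature >= threshold

def split_frames(samples):
    """Yield consecutive 20 ms frames of the sample stream."""
    for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        yield samples[i:i + FRAME_LEN]
```

Each collected frame would be fed through `is_speech_frame` so the device can track where a speech segment begins and ends.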
[0064] After obtaining the voice information, the translation machine translates it from one language into another. Translation is preferably performed frame by frame, that is, each frame of voice information is translated while the voice information is still being collected.
[0065] The translation process mainly comprises three stages: recognition, translation, and synthesis. First, the voice information is recognized, converting sound into text to obtain a first string; then the first string is translated into a second string in the target language; finally, speech synthesis is performed on the second string to obtain the code stream of the voice information in the target language.
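The three-stage flow can be written out as a simple function composition. The three engines below are stand-ins: a real device would call an ASR engine, an MT engine, and a TTS engine (locally or on a server); here they are stubbed so only the data flow is shown.

```python
# Recognition -> translation -> synthesis, as described in the text.

def recognize(audio: bytes) -> str:
    """ASR stub: audio in, source-language text (the first string) out."""
    return audio.decode("utf-8")           # pretend the audio *is* the text

def translate(text: str, target_lang: str) -> str:
    """MT stub: first string in, second string (target language) out."""
    toy_dict = {("你好", "en"): "hello"}   # illustrative lookup only
    return toy_dict.get((text, target_lang), text)

def synthesize(text: str) -> bytes:
    """TTS stub: second string in, synthesized speech code stream out."""
    return text.encode("utf-8")

def translate_speech(audio: bytes, target_lang: str) -> bytes:
    """Chain the three stages to produce the translated code stream."""
    return synthesize(translate(recognize(audio), target_lang))
```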
[0066] The translation machine may translate the voice information locally, or through a server; there may be one, two, or three servers.
[0067] For example, the translation machine sends the voice information to a server, which recognizes, translates, and synthesizes it, obtains the code stream of the voice information translated into the target language, and returns it to the translation machine. The translation machine receives the code stream of the voice information in the target language, which is the translated voice information.
[0068] As another example, the translation machine sends the voice information to a recognition engine server, which recognizes it, converts the sound into text to obtain a first string, and returns the first string to the translation machine; the translation machine sends the first string to a translation engine server, which translates it into a second string in the target language and returns the second string to the translation machine; the translation machine sends the second string to a synthesis engine server, which performs speech synthesis on it, obtains the code stream of the voice information in the target language, and returns it to the translation machine. The translation machine receives the code stream of the voice information in the target language, which is the translated voice information.
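The three-server variant is a relay: the device forwards each engine's reply to the next engine. The transport function and server names below are hypothetical; the patent does not specify a wire protocol, so the transport is injected as a parameter.

```python
# Device-side relay through three dedicated engine servers, as described.
# `post(server, payload)` is an injected transport function (e.g., an HTTP
# client in a real system); the endpoint names are illustrative only.

def relay_through_engines(audio, post):
    """Send each intermediate result to its engine and return the code stream."""
    first_string = post("recognition-engine", audio)           # sound -> text
    second_string = post("translation-engine", first_string)   # text -> target text
    code_stream = post("synthesis-engine", second_string)      # text -> speech
    return code_stream
```

Injecting the transport keeps the orchestration testable without any network.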
[0069] S13: Output the translated voice information.

[0070] In the embodiments of the present invention, the translation machine preferably outputs the translated voice information once after the user has finished a passage. Specifically, the translation machine detects through voice activity detection whether a speech segment has ended, and when it has, controls the output device to output the translated voice information. The output device may be a sound-producing device and/or a display device, i.e., the translated voice information may be output as a sound signal and/or as text and/or images. The sound-producing device is, for example, a loudspeaker or an earpiece.
[0071] Since users pause briefly after finishing a passage, the translation machine can detect whether a speech segment has ended as follows: determine whether no voice information has been detected within a first time period, and if so, judge that the speech segment has ended. The first time period is the preset pause between two passages and can be set as needed; it should generally be longer than the time taken to translate the voice information, so that the last frame of voice information is fully translated. The first time period is preferably 1-2 seconds; for example, when no voice information has been detected within 1 second, the speech segment is judged to have ended.
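The silence-timeout rule above can be sketched directly; a minimal illustration, assuming frames arrive with timestamps (the data shape is an assumption made so the logic is testable without a real clock):

```python
# End-of-segment detection by silence timeout: a segment is judged finished
# when no speech frame has been seen for `first_time` seconds (1-2 s per
# the description).

FIRST_TIME_S = 1.0  # preset pause between passages, from the 1-2 s range

def segment_ended(frame_events, now, first_time=FIRST_TIME_S):
    """frame_events: list of (timestamp, is_speech) pairs, oldest first."""
    speech_times = [t for t, is_speech in frame_events if is_speech]
    if not speech_times:
        return False              # nothing spoken yet, so no segment to end
    return now - max(speech_times) >= first_time
```

Keeping `first_time` longer than the per-frame translation latency, as the text requires, guarantees the last frame is fully translated before output begins.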
[0072] In other embodiments, the translation machine may also judge whether a speech segment has ended by recognizing a specific closing word. For example, the user can say a closing word such as "完毕" (done), "结束" (finished), or "over" at the end of a sentence; when the translation machine detects such a closing word, it judges that the speech segment has ended.
[0073] Further, since the user generally does not speak while the voice information is being output, the translation machine stops collecting voice information while outputting the translated voice information, for example by closing the voice input path of the microphone, thereby reducing the power consumption of the translation machine. When the output of the voice information is finished, collection resumes, i.e., the voice input path of the microphone is reopened.
[0074] Further, in a second embodiment of the speech translation method of the present invention, the following steps follow step S13:
[0075] S14: Determine whether no voice information has been detected within a second time period. When voice information is detected within the second time period, return to step S12 and continue to collect voice information for translation; when no voice information has been detected within the second time period, proceed to step S15.
[0076] S15: Enter the standby state.
[0077] In this embodiment, when no voice information has been detected for a long time (exceeding the second time period), the translation machine automatically switches from the activated state to the standby state to reduce power consumption. The second time period must be longer than the first and can be set as needed, preferably 1-10 minutes; for example, when no voice information has been detected within 5 minutes, the translation machine automatically enters the standby state.
[0078] After the translation machine enters the standby state, if the user wants to use it again, the translation machine must be woken again by the voice wake-up instruction, i.e., the method returns to step S11: when the voice wake-up instruction is received again, the translation machine switches from the standby state back to the activated state.
[0079] In other embodiments, the translation machine may also enter the standby state immediately after outputting the translated voice information, in which case the user must wake the translation machine with the voice wake-up instruction each time before speaking.
[0080] FIG. 3 shows an example system architecture implementing the speech translation method of the present invention. The translation machine is connected to a server through a network; the translation machine comprises a microphone, a front-end processor, and a loudspeaker, and the server comprises a recognition engine, a translation engine, and a synthesis engine. The translation machine collects the sound signal through the microphone. In the standby state it performs keyword recognition on the collected sound signal, and when the voice wake-up instruction is recognized it switches from the standby state to the activated state. Once activated, the front-end processor detects voice information using voice activity detection and sends it over the network to the server; the server recognizes, translates, and synthesizes the voice information through the recognition, translation, and synthesis engines respectively, obtains the translated voice information, and returns it to the front-end processor of the translation machine. The front-end processor drives the loudspeaker to output the translated voice information while closing the voice input path of the microphone; when the loudspeaker has finished outputting, the voice input path of the microphone is reopened. When no voice information has been collected for a long time, the translation machine automatically switches from the activated state to the standby state.
[0081] FIG. 4 is a state-switching diagram of the translation machine, showing three states S0, S1, and S2 that switch among one another in turn. S0 is the state in which the microphone input is open and the machine waits in standby for the voice wake-up instruction; S1 is the state in which the microphone input is open, the machine has been woken by the voice wake-up instruction, and voice information is being translated; S2 is the state in which voice information is being output and the microphone input is closed. In state S0, when a voice wake-up instruction is input, the machine switches to S1; in state S1, when no voice information has been detected within the first time period, it switches to S2; in state S2, when the output of voice information is finished, it switches back to S1; in state S1, when no voice information has been detected within the second time period, it switches to S0.
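The four transitions of the S0/S1/S2 diagram can be captured in a transition table; the event names are illustrative labels for the conditions described, not terminology from the patent:

```python
# S0 = standby (mic open, waiting for wake-up), S1 = awake and translating
# (mic open), S2 = outputting translated speech (mic closed).

TRANSITIONS = {
    ("S0", "wake_word"):      "S1",  # wake-up instruction received
    ("S1", "first_timeout"):  "S2",  # segment ended -> output translation
    ("S2", "output_done"):    "S1",  # playback finished -> reopen the mic
    ("S1", "second_timeout"): "S0",  # long silence -> back to standby
}

def next_state(state, event):
    """Apply one transition; events not defined for a state leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Encoding the diagram as data makes the allowed state changes explicit and easy to check against the figure.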
[0082] The speech translation method of the embodiments of the present invention activates the translation machine by voice control to perform translation, thereby eliminating the translation machine's buttons. The user no longer needs to press a button twice per utterance: once the machine has been woken by voice, simultaneous interpretation proceeds automatically, freeing the user's hands, improving convenience of operation, and enhancing the user experience. Since no extra buttons are needed, the production cost of the translation machine is reduced, which also facilitates an integrated industrial design.
[0083] Referring to FIG. 5, a first embodiment of the speech translation apparatus of the present invention is proposed. The apparatus comprises an activation module 10, a processing module 20, and an output module 30, wherein: the activation module 10 is configured to receive a voice wake-up instruction and enter an activated state according to the voice wake-up instruction; the processing module 20 is configured to collect voice information and perform translation processing on it; and the output module 30 is configured to output the translated voice information.
[0084] The user can set a specific keyword as the voice wake-up instruction according to preference, for example the keyword "小沃" (Xiao Wo). When the voice information "小沃" from the user is detected, the activation module 10 controls the translation machine to switch from the standby state to the activated state and begin speech translation. In this way the translation machine can only be activated by the specific keyword and remains in the standby state when not activated; no speech translation is performed in standby, which both reduces power consumption and avoids mistranslation.
[0085] Upon entering the activated state, the processing module 20 immediately collects the sound signal through the microphone, performs voice activity detection (VAD) on it to obtain voice information, and detects the beginning and end of a speech segment. Voice activity detection is preferably performed frame by frame, with the frame duration set according to the characteristics of the speech signal; for example, using the 20 ms interval of GSM as the frame length, the ETSI VAD algorithm of the GSM communication system or the G.729 Annex B VAD algorithm extracts a parameter feature value from the sound signal and compares it with a threshold. When the parameter feature value is greater than or equal to the threshold, the frame is judged to be a speech frame and the voice information is obtained; when the parameter feature value is less than the threshold, the frame is judged to be a non-speech frame.
[0086] After obtaining the voice information, the processing module 20 translates it from one language into another. Translation is preferably performed frame by frame, that is, each frame of voice information is translated while the voice information is still being collected.
[0087] The translation process mainly comprises three stages: recognition, translation, and synthesis. First, the voice information is recognized, converting sound into text to obtain a first string; then the first string is translated into a second string in the target language; finally, speech synthesis is performed on the second string to obtain the code stream of the voice information in the target language.
[0088] The processing module 20 may translate the voice information locally, or through a server; there may be one, two, or three servers.
[0089] For example, the processing module 20 sends the voice information to a server, which recognizes, translates, and synthesizes it, obtains the code stream of the voice information translated into the target language, and returns it to the processing module 20; the processing module 20 receives the code stream of the voice information in the target language, which is the translated voice information.
[0090] As another example, the processing module 20 sends the voice information to a recognition engine server, which recognizes it, converts the sound into text to obtain a first string, and returns the first string to the processing module 20; the processing module 20 sends the first string to a translation engine server, which translates it into a second string in the target language and returns the second string to the processing module 20; the processing module 20 sends the second string to a synthesis engine server, which performs speech synthesis on it, obtains the code stream of the voice information in the target language, and returns it to the processing module 20; the processing module 20 receives the code stream of the voice information in the target language, which is the translated voice information.
[0091] In the embodiments of the present invention, the output module 30 preferably outputs the translated voice information once after the user has finished a passage. Specifically, as shown in FIG. 6, the output module 30 comprises a detection unit 31 and an output unit 32, wherein: the detection unit 31 is configured to detect whether a speech segment has ended; the output unit 32 is configured to control the output device to output the translated voice information when a speech segment has ended. The output device may be a sound-producing device and/or a display device, i.e., the translated voice information may be output as a sound signal and/or as text and/or images. The sound-producing device is, for example, a loudspeaker or an earpiece.
[0092] Optionally, the detection unit 31 can exploit the fact that users pause briefly after finishing a passage to detect whether a speech segment has ended. Specifically, as shown in FIG. 7, the detection unit 31 comprises a judging subunit 311 and a decision subunit 312, wherein: the judging subunit 311 is configured to determine whether no voice information has been detected within a first time period; the decision subunit 312 is configured to judge that a speech segment has ended when no voice information has been detected within the first time period. The first time period is the preset pause between two passages and can be set as needed; it should generally be longer than the time taken to translate the voice information, so that the last frame of voice information is fully translated. The first time period is preferably 1-2 seconds; for example, when no voice information has been detected within 1 second, the speech segment is judged to have ended.
[0093] In other embodiments, the detection unit may also judge whether a speech segment has ended by recognizing a specific closing word. For example, the user can say a closing word such as "完毕" (done), "结束" (finished), or "over" at the end of a sentence; when the detection unit detects such a closing word, it judges that the speech segment has ended.
[0094] Further, since the user generally does not speak while the voice information is being output, the processing module 20 is also configured to stop collecting voice information when the output module 30 outputs the translated voice information, thereby reducing the power consumption of the translation machine. In a specific implementation, the processing module 20 can stop collecting voice information by closing the voice input path of the microphone. When the output of the voice information is finished, the processing module 20 resumes collection, i.e., reopens the voice input path of the microphone.
[0095] Further, as shown in FIG. 8, in a second embodiment of the speech translation apparatus of the present invention, the apparatus further comprises a judging module 40 and a standby module 50, wherein: the judging module 40 is configured to determine, after the output module 30 has output voice information, whether no voice information has been detected within a second time period; the standby module 50 is configured to enter the standby state when no voice information has been detected within the second time period.
[0096] In this embodiment, when no voice information has been detected for a long time (exceeding the second time period), the standby module 50 controls the translation machine to switch automatically from the activated state to the standby state to reduce power consumption. The second time period must be longer than the first and can be set as needed, preferably 1-10 minutes; for example, when no voice information has been detected within 5 minutes, the standby module 50 controls the translation machine to automatically enter the standby state.
[0097] Of course, when voice information is detected within the second time period after the voice information has been output, the processing module 20 continues to collect voice information for translation.
[0098] After entering the standby state, when the voice wake-up instruction is received again, the activation module 10 controls the translation machine to switch from the standby state back to the activated state.
[0099] In other embodiments, the standby module 50 may also enter the standby state immediately after the output module 30 outputs the translated voice information, in which case the user must wake the translation machine with the voice wake-up instruction each time before speaking.
[0100] The speech translation apparatus of the embodiments of the present invention activates the translation machine by voice control to perform translation, thereby eliminating the translation machine's buttons. The user no longer needs to press a button twice per utterance: once the machine has been woken by voice, simultaneous interpretation proceeds automatically, freeing the user's hands, improving convenience of operation, and enhancing the user experience. Since no extra buttons are needed, the production cost of the translation machine is reduced, which also facilitates an integrated industrial design.
[0101] The present invention also provides a translation machine comprising a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform a speech translation method. The speech translation method comprises the following steps: receiving a voice wake-up instruction and entering an activated state according to the voice wake-up instruction; collecting voice information and performing translation processing on it; and outputting the translated voice information. The speech translation method described in this embodiment is the speech translation method of the above embodiments of the present invention and is not described again here.

[0102] Those skilled in the art will understand that the present invention includes devices for performing one or more of the operations described in the present application. These devices may be specially designed and manufactured for the required purposes, or may include known devices in a general-purpose computer. These devices have computer programs stored therein that are selectively activated or reconfigured. Such computer programs may be stored in a device-readable (e.g., computer-readable) medium or in any type of medium suitable for storing electronic instructions and respectively coupled to a bus, the computer-readable medium including but not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable medium includes any medium in which information is stored or transmitted in a form readable by a device (e.g., a computer).
[0103] Those skilled in the art will understand that each block of these structural diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks therein, can be implemented by computer program instructions. Those skilled in the art will understand that these computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus for implementation, so that the processor of the computer or other programmable data processing apparatus executes the schemes specified in the block or blocks of the structural diagrams and/or block diagrams and/or flow diagrams disclosed by the present invention.
[0104] Those skilled in the art will understand that the steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art corresponding to the various operations, methods, and flows disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted.
[0105] The above are merely preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structural or flow transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims

Claims
[Claim 1] A speech translation method, characterized by comprising the following steps:
receiving a voice wake-up instruction, and entering an activated state according to the voice wake-up instruction;
collecting voice information, and performing translation processing on the voice information;
outputting the translated voice information.
[Claim 2] The speech translation method according to claim 1, characterized in that the step of outputting the translated voice information comprises:
detecting whether a speech segment has ended;
when a speech segment has ended, controlling an output device to output the translated voice information.
[Claim 3] The speech translation method according to claim 2, characterized in that the step of detecting whether a speech segment has ended comprises:
determining whether no voice information has been detected within a first time period;
when no voice information has been detected within the first time period, judging that the speech segment has ended.
[Claim 4] The speech translation method according to claim 3, characterized in that the first time period is longer than the time taken to translate the voice information.
[Claim 5] The speech translation method according to claim 4, characterized in that the first time period is 1-2 seconds.
[Claim 6] The speech translation method according to claim 2, characterized in that, simultaneously with the step of controlling the output unit to output the translated voice information, the method further comprises: stopping the collection of voice information.
[Claim 7] The speech translation method according to claim 6, characterized in that the step of stopping the collection of voice information comprises: closing the voice input path of the microphone.
[Claim 8] The speech translation method according to claim 2, characterized in that the output device is a sound-producing device.
[Claim 9] The speech translation method according to claim 1, characterized in that the step of outputting the translated voice information comprises:
determining whether no voice information has been detected within a second time period;
when no voice information has been detected within the second time period, entering a standby state.
[Claim 10] The speech translation method according to claim 9, characterized in that the second time period is 1-10 minutes.
[Claim 11] A speech translation apparatus, characterized by comprising:
an activation module, configured to receive a voice wake-up instruction and enter an activated state according to the voice wake-up instruction;
a processing module, configured to collect voice information and perform translation processing on the voice information;
an output module, configured to output the translated voice information.
[Claim 12] The speech translation apparatus according to claim 11, characterized in that the output module comprises:
a detection unit, configured to detect whether a speech segment has ended;
an output unit, configured to control an output device to output the translated voice information when a speech segment has ended.
[Claim 13] The speech translation apparatus according to claim 12, characterized in that the detection unit comprises:
a judging subunit, configured to determine whether no voice information has been detected within a first time period;
a decision subunit, configured to judge that a speech segment has ended when no voice information has been detected within the first time period.
[Claim 14] The speech translation apparatus according to claim 13, characterized in that the first time period is longer than the time taken to translate the voice information.
[Claim 15] The speech translation apparatus according to claim 14, characterized in that the first time period is 1-2 seconds.
[Claim 16] The speech translation apparatus according to claim 12, characterized in that the processing module is further configured to stop collecting voice information when the output module outputs the translated voice information.
[Claim 17] The speech translation apparatus according to claim 16, characterized in that the processing module is configured to stop collecting voice information by closing the voice input path of the microphone.
[Claim 18] The speech translation apparatus according to claim 12, characterized in that the output device is a sound-producing device.
[Claim 19] The speech translation apparatus according to claim 11, characterized in that the apparatus further comprises:
a judging module, configured to determine, after the output module has output voice information, whether no voice information has been detected within a second time period;
a standby module, configured to enter a standby state when no voice information has been detected within the second time period.
[Claim 20] A translation machine comprising a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, characterized in that the application is configured to perform the speech translation method according to claim 1.
PCT/CN2017/111962 2017-10-13 2017-11-20 语音翻译方法、装置和翻译机 WO2019071723A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710954366.6A CN107656923A (zh) 2017-10-13 2017-10-13 语音翻译方法和装置
CN201710954366.6 2017-10-13

Publications (1)

Publication Number Publication Date
WO2019071723A1 true WO2019071723A1 (zh) 2019-04-18

Family

ID=61118574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/111962 WO2019071723A1 (zh) 2017-10-13 2017-11-20 语音翻译方法、装置和翻译机

Country Status (2)

Country Link
CN (1) CN107656923A (zh)
WO (1) WO2019071723A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022125283A1 (en) * 2020-12-08 2022-06-16 Google Llc Freeze words

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002438A (zh) * 2018-07-02 2018-12-14 北京分音塔科技有限公司 防误触方法、装置和翻译机
CN109887508A (zh) * 2019-01-25 2019-06-14 广州富港万嘉智能科技有限公司 一种基于声纹的会议自动记录方法、电子设备及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680231A (zh) * 2013-12-17 2014-03-26 深圳环球维尔安科技有限公司 多信息同步编码学习装置及方法
CN105824807A (zh) * 2016-03-16 2016-08-03 安微省新脉科技发展有限公司 一种翻译终端和翻译方法
CN106131292A (zh) * 2016-06-03 2016-11-16 上海与德通讯技术有限公司 设置终端唤醒的方法、唤醒方法及对应的系统

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838714A (zh) * 2012-11-22 2014-06-04 北大方正集团有限公司 一种语音信息转换方法及装置
KR102346302B1 (ko) * 2015-02-16 2022-01-03 삼성전자 주식회사 전자 장치 및 음성 인식 기능 운용 방법
CN105957527A (zh) * 2016-05-16 2016-09-21 珠海格力电器股份有限公司 一种语音控制电器的方法、装置及语音控制空调

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680231A (zh) * 2013-12-17 2014-03-26 深圳环球维尔安科技有限公司 多信息同步编码学习装置及方法
CN105824807A (zh) * 2016-03-16 2016-08-03 安微省新脉科技发展有限公司 一种翻译终端和翻译方法
CN106131292A (zh) * 2016-06-03 2016-11-16 上海与德通讯技术有限公司 设置终端唤醒的方法、唤醒方法及对应的系统

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022125283A1 (en) * 2020-12-08 2022-06-16 Google Llc Freeze words
US11688392B2 (en) 2020-12-08 2023-06-27 Google Llc Freeze words

Also Published As

Publication number Publication date
CN107656923A (zh) 2018-02-02

Similar Documents

Publication Publication Date Title
AU2019246868B2 (en) Method and system for voice activation
JP7354110B2 (ja) オーディオ処理システム及び方法
US9652017B2 (en) System and method of analyzing audio data samples associated with speech recognition
US20160343376A1 (en) Voice Recognition System of a Robot System and Method Thereof
AU2013252518B2 (en) Embedded system for construction of small footprint speech recognition with user-definable constraints
TWI489372B (zh) 語音操控方法與行動終端裝置
WO2014208231A1 (ja) ローカルな音声認識を行なう音声認識クライアント装置
EP3526789B1 (en) Voice capabilities for portable audio device
WO2019153477A1 (zh) 语音遥控方法和装置
CN107112017A (zh) 操作语音识别功能的电子设备和方法
CN110675873B (zh) 智能设备的数据处理方法、装置、设备及存储介质
US9818404B2 (en) Environmental noise detection for dialog systems
WO2019071723A1 (zh) 语音翻译方法、装置和翻译机
CN108899028A (zh) 语音唤醒方法、搜索方法、装置和终端
WO2019075829A1 (zh) 语音翻译方法、装置和翻译设备
US20200380971A1 (en) Method of activating voice assistant and electronic device with voice assistant
CN110968353A (zh) 中央处理器的唤醒方法、装置、语音处理器以及用户设备
CN109859762A (zh) 语音交互方法、装置和存储介质
CN114999496A (zh) 音频传输方法、控制设备及终端设备
WO2019169685A1 (zh) 语音处理方法、装置和电子设备
CN113905264A (zh) 一种基于语音遥控器的语音控制系统
KR20180045633A (ko) 음성 인식 서비스 제공 방법 및 이를 위한 장치
CN112992133A (zh) 声音信号控制方法、系统、可读存储介质和设备
WO2024055831A1 (zh) 一种语音交互方法、装置及终端
AU2017101077A4 (en) A voice recognition system of a robot system and method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17928290

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17928290

Country of ref document: EP

Kind code of ref document: A1