WO2019227562A1 - Speech translation method and device thereof - Google Patents

Speech translation method and device thereof

Info

Publication number
WO2019227562A1
Authority
WO
WIPO (PCT)
Prior art keywords
gain
signal
speech
voice signal
speech signal
Prior art date
Application number
PCT/CN2018/093456
Other languages
English (en)
French (fr)
Inventor
周毕兴
Original Assignee
深圳市沃特沃德股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市沃特沃德股份有限公司
Publication of WO2019227562A1 publication Critical patent/WO2019227562A1/zh

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language
    • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/1815: Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the present invention relates to the technical field of translation equipment, and in particular, to a speech translation method and a device thereof.
  • With their compact, portable form and powerful language translation functions, translators are welcomed by people who need language translation, especially those travelling abroad, and are also a good aid for learning foreign languages.
  • A translator can read text aloud and translate during study or conversation, so that ordinary people can communicate with international friends and international customers without obstacles.
  • translators on the market combine modern electronic technology, communication technology and network technology to achieve good practical results.
  • However, current translators still have shortcomings. For example, users need to be close to the translator when speaking: the closer, the better; the farther away, the lower the accuracy of recognizing the user's speech, resulting in a worse translation.
  • the main object of the present invention is to provide a speech translation method and device for obtaining the best translation.
  • The invention proposes a speech translation method, including: respectively acquiring second speech signals obtained by translating a first speech signal under different preset gains; comparing the semantic relevance of each second speech signal with a preset third speech signal, wherein the first speech signal is a reply signal to the third speech signal and the second speech signal and the third speech signal are in the same language; acquiring the second speech signal with the highest semantic relevance and the corresponding second gain; judging whether the second gain is the same as the first gain used in the previous translation, the first gain and the second gain belonging to the preset gains; and, if they are the same, recording and playing the second speech signal with the highest semantic relevance.
  • the step of separately acquiring second voice signals obtained by translating the first voice signal under different preset gains includes:
  • obtaining the signal-to-noise ratio of the environment in which the translator is currently located; setting a plurality of different gains according to the signal-to-noise ratio; and respectively obtaining the second voice signals obtained by translating the first voice signal under the different gains.
  • the step of comparing the semantic relevance of each of the second speech signals with a preset third speech signal includes:
  • tracing back the historical information of the third voice signal one by one in reverse chronological order, from most recent to earliest, wherein the historical information includes dialogue information or a piece of utterance information; and comparing the semantic relevance between the historical information and each second speech signal.
  • the method further includes:
  • if the second gain is different from the first gain used in the previous translation, determining whether the semantic meaning of the second speech signal with the highest semantic relevance is the same as the semantic meaning of the second speech signal amplified by the first gain;
  • if the semantic meanings are the same, the second speech signal amplified by the first gain used in the previous translation is recorded and played.
  • the step of determining whether the semantic meaning of the second speech signal with the highest semantic relevance is the same as the semantic meaning of the second speech signal amplified by the first gain includes:
  • the invention also provides a speech translation device, including:
  • a first acquisition module configured to respectively acquire second speech signals, in one-to-one correspondence with different preset gains, obtained by translating a first speech signal of a first user;
  • a comparison module configured to compare the semantic relevance between each second voice signal and a third voice signal preset in the translator, wherein the first voice signal is a reply signal to the third voice signal, and the second voice signal and the third voice signal are in the same language;
  • a second acquisition module configured to acquire a second speech signal with the highest semantic relevance to the third speech signal and a corresponding second gain
  • a judging module configured to judge whether the second gain is the same as the first gain used in the previous translation, wherein the first gain and the second gain belong to the preset gain;
  • an execution module configured to record and play the second speech signal with the highest semantic relevance if the second gain is the same as the first gain used in the previous translation.
  • the first acquisition module includes:
  • a first obtaining unit configured to obtain a signal-to-noise ratio of a current environment of a translator
  • a preset unit configured to set a plurality of different gains according to the signal-to-noise ratio
  • a second acquiring unit is configured to acquire the second voice signals obtained by translating the first voice signals under different gains, respectively.
  • comparison module includes:
  • a traceback unit configured to trace back the historical information of the third voice signal one by one in reverse chronological order, from most recent to earliest, wherein the historical information includes dialogue information or a piece of utterance information;
  • the comparing unit is configured to compare the semantic relevance between the historical information and each of the second speech signals.
  • judgment module further includes:
  • a second determining unit configured to determine, if the second gain is different from the first gain used in the previous translation, whether the semantic meaning of the second speech signal with the highest semantic relevance is the same as that of the second speech signal amplified by the first gain;
  • a first recording unit configured to record and play a second speech signal with the highest semantic relevance and a corresponding gain if the semantic meaning is different;
  • a second recording unit configured to record and play the second voice signal amplified by the first gain used in the previous translation if the semantic meanings are the same.
  • the second judgment unit includes:
  • an acquisition subunit configured to acquire each first keyword in the text corresponding to the second speech signal with the highest semantic relevance and the connection relationships between the first keywords, and to acquire each second keyword in the text corresponding to the second speech signal amplified by the first gain and the connection relationships between the second keywords;
  • a judging subunit configured to judge whether the degree of matching between the first keywords and their connection relationships and the second keywords and their connection relationships is within a preset range;
  • a determination subunit configured to determine, if the matching degree is within the preset range, that the semantic meaning of the second speech signal with the highest semantic relevance is the same as that of the second speech signal amplified by the first gain, and otherwise to determine that the two semantic meanings are different.
  • The speech translation method and device of the present invention have the beneficial effect that translation is performed under a plurality of different preset gains and the best translation is selected according to the semantic relevance to the previous sentence, which makes the translator much more convenient to use and enhances the user experience.
  • FIG. 1 is a schematic flowchart of a speech translation method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of step S1 in an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of step S2 in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the specific process after step S4 in another embodiment of the present invention.
  • FIG. 5 is a detailed flowchart of step S41 in another embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a speech translation apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a first obtaining module according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a comparison module in an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a judgment module in another embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a second determination unit in another embodiment of the present invention.
  • the present invention provides a speech translation method, including:
  • Gain is the amplification factor. In electronics it is usually the ratio of a system's signal output to its signal input, and it describes the degree to which a component, circuit, device or system increases current, voltage or power. Gain is generally expressed in decibels (dB), a relative unit. Amplifier gain is proportional to the logarithm of the ratio of the amplifier's output power to its input power and indicates the degree of power amplification.
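The decibel relationship described above can be illustrated with a short sketch. The formulas are the standard electronics conventions for power and voltage gain, not figures taken from the patent itself:

```python
import math

def power_gain_db(p_out: float, p_in: float) -> float:
    """Power gain in decibels: 10 * log10(P_out / P_in)."""
    return 10.0 * math.log10(p_out / p_in)

def voltage_gain_db(v_out: float, v_in: float) -> float:
    """Voltage (amplitude) gain in decibels: 20 * log10(V_out / V_in)."""
    return 20.0 * math.log10(v_out / v_in)

# Amplifying power 100x corresponds to 20 dB of gain.
print(power_gain_db(100.0, 1.0))   # 20.0
# Amplifying voltage 10x also corresponds to 20 dB.
print(voltage_gain_db(10.0, 1.0))  # 20.0
```

Because the decibel is a relative unit, both functions describe ratios between two levels rather than absolute signal levels.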
  • the translator includes a microphone, a speech recognition unit, a processor, and a radio frequency part of the translator.
  • The voice recognition unit can set the gain of the voice signal entering the microphone, and the radio frequency part of the translator can connect to a back-end cloud server through wireless networks such as Wi-Fi, BT, 2G, 3G, 4G, eMTC and NB-IoT.
  • According to the signal-to-noise ratio of the environment, the gain of the voice signal entering the microphone is set: a plurality of different gains are preset, the first voice signal of the first user is translated under each of the different gains, and second voice signals in one-to-one correspondence with the gains are obtained. A third voice signal is preset in the translator, wherein the second voice signal and the third voice signal are in the same language and the first voice signal is a reply signal to the third voice signal. The second voice signal with the highest semantic relevance to the third voice signal, together with the corresponding second gain, is then acquired, and it is judged whether the second gain is the same as the first gain used in the previous translation. Both the first gain and the second gain belong to the preset gains. If they are the same gain, the second speech signal with the highest semantic relevance is stored and played.
  • In a specific embodiment, the voice signals entering the microphone from user 1, who speaks language A, and/or user 2, who speaks language B, are stored in advance as the third voice signal and amplified according to the first gain (the optimal gain). The amplified voice signal is then uploaded to the server to be recognized and translated. The server stores the results in languages A and B respectively, and they are kept in the translator as historical dialogue information used to judge the semantic relevance of the next voice signal to be translated (the first voice signal). The received first voice signal (the new voice signal) is amplified according to the different preset gains and uploaded to the translator's server to be recognized and translated into the corresponding second voice signals; each second voice signal is compared with the third voice signal, that is, the semantic relevance of voice signals in the same language (A or B) is compared, so as to obtain the second voice signal that best fits the context of the first voice signal to be translated. This second voice signal is stored and played. It is stored on the translator's cloud server; in other embodiments it can also be stored in a local storage space.
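The overall selection flow of the embodiment above can be sketched as follows. The `translate` and `relevance` callables stand in for the cloud recognition/translation service and the semantic-relevance comparison; they, like the function name itself, are illustrative assumptions rather than the patent's actual implementation:

```python
def pick_best_translation(first_signal, preset_gains, third_signal,
                          prev_gain, translate, relevance):
    """Translate `first_signal` under each preset gain and keep the
    candidate whose translation relates most closely to the preceding
    utterance `third_signal` (the stored dialogue context)."""
    candidates = {g: translate(first_signal, gain=g) for g in preset_gains}
    best_gain = max(candidates,
                    key=lambda g: relevance(candidates[g], third_signal))
    best_signal = candidates[best_gain]
    if best_gain == prev_gain:
        # Same gain as the previous translation: record and play directly.
        return best_signal, best_gain
    # Different gain: keep the previous gain only when the two results
    # carry the same meaning (plain equality is a crude stand-in for the
    # keyword-based semantic check described later in the patent).
    prev_signal = candidates.get(prev_gain, translate(first_signal, gain=prev_gain))
    if best_signal == prev_signal:
        return prev_signal, prev_gain
    return best_signal, best_gain
```

The returned gain would then serve as the "first gain" for the next translation round.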
  • Referring to FIG. 2, step S1, respectively obtaining the second voice signals obtained by translating the first voice signal under different preset gains, includes:
  • S11: obtaining the signal-to-noise ratio of the environment in which the translator is currently located;
  • S12: setting a plurality of different gains according to the signal-to-noise ratio;
  • S13: respectively obtaining the second voice signals obtained by translating the first voice signal under the different gains.
  • Signal-to-noise ratio, abbreviated SNR or S/N, refers to the ratio of signal to noise in an electronic device or electronic system. Here the signal is the electronic signal from outside the device that the device needs to process, while noise is a random additional signal that did not exist in the original signal and is generated as the signal passes through the device; the noise does not change with the original signal. The larger the signal-to-noise ratio, the less noise is mixed into the signal and the higher the quality of the reproduced sound, and vice versa.
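As a numeric illustration of the definition above, SNR is conventionally expressed as a power ratio in decibels (the convention is standard signal processing, not specific to this patent):

```python
import math

def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(signal_power / noise_power)

print(snr_db(1000.0, 10.0))  # 20.0 dB: relatively clean environment
print(snr_db(20.0, 10.0))    # ~3.0 dB: very noisy environment
```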
  • In this embodiment, the signal-to-noise ratio of the environment in which the translator is currently located is obtained first, a plurality of different gains are set according to the signal-to-noise ratio, and the second voice signals corresponding one-to-one to the first voice signal are obtained under the different gains.
  • For the translator to recognize the user's speech accurately and at a distance, the gain of the voice signal entering the microphone should be increased when the interference or noise source is small, and reduced when the interference or noise source is large. External interference and noise sources are not fixed; much of the noise is itself human speech, or overlaps in frequency with human speech, and the two are superimposed. The gain of the voice signal entering the microphone is therefore adjusted according to the level of the interference or noise source so that the translator can recognize the user's speech accurately and at a distance. For example, when the noise around the translator is low, the signal-to-noise ratio of the voice signal entering the microphone is relatively large, and increasing the gain of the voice signal entering the microphone improves the accuracy of recognition and translation.
  • Referring to FIG. 3, step S2, comparing the semantic relevance of each second voice signal with the preset third voice signal, includes:
  • S21: tracing back the historical information of the third voice signal one by one in reverse chronological order, from most recent to earliest, wherein the historical information includes dialogue information or a piece of utterance information;
  • S22: comparing the semantic relevance between the historical information and each second speech signal.
  • In this embodiment, the historical information refers to the complete conversation information, or a piece of utterance information from a particular user, recorded while the two users take turns speaking during the translation process.
  • The translator can sort each paragraph or sentence in the historical dialogue record in chronological order or by the number of recorded entries, and can label each paragraph in sequence.
  • The historical information includes historical conversation records in the two languages, and the record in each language contains the complete conversation information of the two users. For example, there are N paragraphs in the conversation record of each language, containing the complete conversation information of users A and B. The historical conversation record in language A includes the original information input by user A and the information obtained after the original information input by user B is translated into language A; the historical conversation record in language B includes the original information input by user B and the information obtained after the original information input by user A is translated into language B.
  • The historical information of the third voice signal is traced back one by one and compared with each second voice signal, so that a ranking of the second voice signals by semantic relevance to the historical information, from high to low or from low to high, can be obtained.
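A minimal sketch of this backtrack-and-rank step, assuming a `relevance` scoring function (a placeholder, since the patent does not specify a metric) and weighting more recent history entries more heavily:

```python
def rank_by_history_relevance(history, candidates, relevance):
    """Walk the dialogue history from newest to oldest, score each
    candidate second-signal against every history entry, and return
    the candidates sorted by total relevance, highest first."""
    scores = {}
    for cand in candidates:
        total = 0.0
        for age, utterance in enumerate(reversed(history)):
            weight = 1.0 / (1 + age)  # the newest entry weighs the most
            total += weight * relevance(cand, utterance)
        scores[cand] = total
    return sorted(candidates, key=lambda c: scores[c], reverse=True)
```

The first element of the returned list corresponds to the "second speech signal with the highest semantic relevance" in the method above.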
  • the method further includes:
  • If the second gain is different from the first gain used in the previous translation, a further judgment is needed: whether the semantic meaning of the second speech signal with the highest semantic relevance is the same as that of the second speech signal amplified by the first gain. If the semantic meanings are judged to be different, the second voice signal with the highest semantic relevance and the corresponding gain are recorded, this gain is used as the first gain in the next translation, and the second voice signal is played. If the semantic meanings are judged to be the same, the first gain used in the previous translation is retained as the first gain in the next translation, and the second voice signal amplified by that first gain is recorded and played.
  • Referring to FIG. 5, step S41, determining whether the semantic meaning of the second speech signal with the highest semantic relevance is the same as the semantic meaning of the second speech signal amplified by the first gain, includes:
  • S411: acquiring each first keyword in the text corresponding to the second speech signal with the highest semantic relevance and the connection relationships between the first keywords, and acquiring each second keyword in the text corresponding to the second speech signal amplified by the first gain and the connection relationships between the second keywords;
  • S412: determining whether the degree of matching between the first keywords and their connection relationships and the second keywords and their connection relationships is within a preset range;
  • S413: if the matching degree is within the preset range, determining that the semantic meaning of the second speech signal with the highest semantic relevance is the same as that of the second speech signal amplified by the first gain; otherwise, determining that the two semantic meanings are different.
  • Each first keyword in the text corresponding to the second speech signal with the highest semantic relevance, together with the connection relationships between the first keywords, is obtained, as is each second keyword in the text corresponding to the second speech signal amplified by the first gain, together with the connection relationships between the second keywords. A matching range is set in advance, and it is determined whether the degree of matching between the first keywords and their connection relationships and the second keywords and their connection relationships falls within this range; for example, the preset matching range is 90%-100%.
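A simple way to realize the keyword comparison sketched above is a set-overlap ratio checked against the preset 90%-100% range. Treating the match as Jaccard similarity over keywords is an assumption, since the patent leaves the matching metric unspecified, and the connection relationships are omitted here for brevity:

```python
def keyword_match_ratio(first_keywords: set[str], second_keywords: set[str]) -> float:
    """Fraction of keywords shared by the two transcripts (Jaccard
    similarity), a simple stand-in for the keyword matching described
    in the text."""
    if not first_keywords and not second_keywords:
        return 1.0
    overlap = first_keywords & second_keywords
    return len(overlap) / len(first_keywords | second_keywords)

def same_semantic_meaning(first_keywords, second_keywords,
                          low: float = 0.90, high: float = 1.00) -> bool:
    """Judge the two texts semantically identical when the match ratio
    falls in the preset range (90%-100% in the example above)."""
    ratio = keyword_match_ratio(set(first_keywords), set(second_keywords))
    return low <= ratio <= high

print(same_semantic_meaning({"book", "flight", "tomorrow"},
                            {"book", "flight", "tomorrow"}))  # True
print(same_semantic_meaning({"book", "flight"}, {"cancel", "flight"}))  # False
```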
  • the present invention also provides a voice translation device, including:
  • a first acquisition module 1 configured to respectively acquire second speech signals, in one-to-one correspondence with different preset gains, obtained by translating a first speech signal of a first user;
  • a comparison module 2 configured to compare the semantic relevance between each second voice signal and a third voice signal preset in the translator, wherein the first voice signal is a reply signal to the third voice signal, and the second voice signal and the third voice signal are in the same language;
  • a second acquisition module 3 configured to acquire a second speech signal with the highest semantic relevance to the third speech signal and a corresponding second gain
  • a judging module 4 for judging whether the second gain is the same as the first gain used in the previous translation, where the first gain and the second gain belong to a preset gain;
  • an execution module 5 configured to record and play the second speech signal with the highest semantic relevance if the second gain is the same as the first gain used in the previous translation.
  • Gain is the amplification factor. In electronics it is usually the ratio of a system's signal output to its signal input, and it describes the degree to which a component, circuit, device or system increases current, voltage or power. Gain is generally expressed in decibels (dB), a relative unit. Amplifier gain is proportional to the logarithm of the ratio of the amplifier's output power to its input power and indicates the degree of power amplification.
  • the translator includes a microphone, a speech recognition unit, a processor, and a radio frequency part of the translator.
  • The voice recognition unit can set the gain of the voice signal entering the microphone, and the radio frequency part of the translator can connect to a back-end cloud server through wireless networks such as Wi-Fi, BT, 2G, 3G, 4G, eMTC and NB-IoT.
  • According to the signal-to-noise ratio of the environment, the gain of the voice signal entering the microphone is set and a plurality of different gains are preset. The first acquisition module 1 translates the first voice signal of the first user under each of the different gains and obtains second voice signals in one-to-one correspondence with the gains. A third voice signal is preset in the translator, wherein the second voice signal and the third voice signal are in the same language and the first voice signal is a reply signal to the third voice signal. The second acquisition module 3 obtains the second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain, and the judgment module 4 then determines whether the second gain is the same as the first gain used in the previous translation. Both the first gain and the second gain are preset gains. If they are the same gain, the execution module 5 stores and plays the second speech signal with the highest semantic relevance.
  • In a specific embodiment, the voice signals entering the microphone from user 1, who speaks language A, and/or user 2, who speaks language B, are stored in advance as the third voice signal and amplified according to the first gain (the optimal gain). The amplified voice signal is then uploaded to the server to be recognized and translated. The server stores the results in languages A and B respectively, and they are kept in the translator as historical dialogue information used to judge the semantic relevance of the next voice signal to be translated (the first voice signal). The received first voice signal (the new voice signal) is amplified according to the different preset gains and uploaded to the translator's server to be recognized and translated into the corresponding second voice signals; each second voice signal is compared with the third voice signal one by one, that is, the semantic relevance of voice signals in the same language (A or B) is compared, and the second voice signal most relevant to the context of the first voice signal to be translated is obtained. This second voice signal is stored and played on the translator's cloud server; in other embodiments it can also be stored in a local storage space.
  • the first obtaining module 1 includes:
  • a first obtaining unit 11 configured to obtain a signal-to-noise ratio of an environment in which a translator is currently located;
  • a preset unit 12 configured to set a plurality of different gains according to a signal-to-noise ratio
  • the second obtaining unit 13 is configured to obtain second voice signals corresponding to the first voice signals one-to-one under different gains, respectively.
  • Signal-to-noise ratio, abbreviated SNR or S/N, refers to the ratio of signal to noise in an electronic device or electronic system. Here the signal is the electronic signal from outside the device that the device needs to process, while noise is a random additional signal that did not exist in the original signal and is generated as the signal passes through the device; the noise does not change with the original signal. The larger the signal-to-noise ratio, the less noise is mixed into the signal and the higher the quality of the reproduced sound, and vice versa.
  • The first acquisition unit 11 acquires the signal-to-noise ratio of the environment in which the translator is currently located, the preset unit 12 sets a plurality of different gains according to the signal-to-noise ratio, and under the set gains the second acquisition unit 13 respectively acquires the second speech signals corresponding one-to-one to the first voice signal.
  • For the translator to recognize the user's speech accurately and at a distance, the gain of the voice signal entering the microphone should be increased when the interference or noise source is small, and reduced when the interference or noise source is large. External interference and noise sources are not fixed; much of the noise is itself human speech, or overlaps in frequency with human speech, and the two are superimposed. The gain of the voice signal entering the microphone is therefore adjusted according to the level of the interference or noise source so that the translator can recognize the user's speech accurately and at a distance. For example, when the noise around the translator is low, the signal-to-noise ratio of the voice signal entering the microphone is relatively large, and increasing the gain of the voice signal entering the microphone improves the accuracy of recognition and translation.
  • the comparison module 2 includes:
  • The traceback unit 21 is configured to trace back the historical information of the third speech signal one by one in reverse chronological order, from most recent to earliest, wherein the historical information includes dialogue information or a piece of utterance information;
  • the comparing unit 22 is configured to compare the semantic relevance between the historical information and each second speech signal.
  • In this embodiment, the historical information refers to the complete conversation information, or a piece of utterance information from a particular user, recorded while the two users take turns speaking during the translation process.
  • The translator can sort each paragraph or sentence in the historical dialogue record in chronological order or by the number of recorded entries, and can label each paragraph in sequence.
  • The historical information includes historical conversation records in the two languages, and the record in each language contains the complete conversation information of the two users. For example, there are N paragraphs in the conversation record of each language, containing the complete conversation information of users A and B. The historical conversation record in language A includes the original information input by user A and the information obtained after the original information input by user B is translated into language A; the historical conversation record in language B includes the original information input by user B and the information obtained after the original information input by user A is translated into language B.
  • The historical information of the third voice signal is traced back one by one and compared with each second voice signal, so that a ranking of the second voice signals by semantic relevance to the historical information, from high to low or from low to high, can be obtained.
  • the determining module 4 further includes:
  • The second determining unit 41 is configured to determine, if the second gain is different from the first gain used in the previous translation, whether the semantic meaning of the second speech signal with the highest semantic relevance is the same as that of the second speech signal amplified by the first gain;
  • a first recording unit 421, configured to record and play a second speech signal with the highest semantic relevance and a corresponding gain if the semantic meaning is different;
  • a second recording unit 422 configured to record and play the second voice signal amplified by the first gain used in the previous translation if the semantic meanings are the same.
  • If the second gain is different from the first gain used in the previous translation, the second determining unit 41 makes a further judgment on whether the semantic meaning of the second speech signal with the highest semantic relevance is the same as that of the second speech signal amplified by the first gain. If the semantic meanings are judged to be different, the first recording unit 421 records the second voice signal with the highest semantic relevance and the corresponding gain, which is used as the first gain in the next translation, and plays the second voice signal; if the semantic meanings are judged to be the same, the first gain used in the previous translation is retained as the first gain in the next translation, and the second recording unit 422 records and plays the second voice signal amplified by that first gain.
  • the second determination unit 41 includes:
  • The obtaining subunit 411 is configured to obtain each first keyword in the text corresponding to the second speech signal with the highest semantic relevance and the connection relationships between the first keywords, and to obtain each second keyword in the text corresponding to the second speech signal amplified by the first gain and the connection relationships between the second keywords;
  • the judging subunit 412 is configured to judge whether the degree of matching between the first keywords and their connection relationships and the second keywords and their connection relationships is within a preset range;
  • a determination subunit 413 configured to determine, if the matching degree is within the preset range, that the semantic meaning of the second speech signal with the highest semantic relevance is the same as that of the second speech signal amplified by the first gain, and otherwise to determine that the two semantic meanings are different.
  • The obtaining subunit 411 obtains each first keyword in the text corresponding to the second speech signal with the highest semantic relevance and the connection relationships between the first keywords, and also obtains each second keyword in the text corresponding to the second speech signal amplified by the first gain and the connection relationships between the second keywords. A matching range is set in advance, and the judging subunit 412 judges whether the degree of matching between the first keywords and their connection relationships and the second keywords and their connection relationships falls within this range. For example, the preset matching range is 90%-100%. If the matching degree of the keywords and their connection relationships lies within the preset range of 90%-100%, the determination subunit 413 determines that the semantic meanings of the two speech signals are the same; otherwise, the determination subunit 413 determines that the semantic meanings of the two speech signals are different.
  • the speech translation method and device thereof according to the present invention respectively perform enlarged translation by presetting a plurality of different gains on the translation device, and obtain the best translation according to the semantic relevance of the previous sentence, which is greatly convenient. Users use translators to enhance their experience with translators.


Abstract

The present invention discloses a speech translation method and device, including: separately obtaining second speech signals produced by translating a first speech signal under different preset gains; comparing the semantic relevance of each second speech signal to a preset third speech signal, where the first speech signal is a reply to the third speech signal and the second and third speech signals are in the same language; obtaining the second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain; determining whether the second gain is the same as the first gain used in the previous translation, the first gain and the second gain both belonging to the preset gains; and, if they are the same, recording and playing the second speech signal with the highest semantic relevance. By amplifying and translating at different gains and using the semantic relevance to preceding utterances, the best translation is obtained, which greatly facilitates use of the translator and improves the user experience.

Description

Speech translation method and device

Technical Field

The present invention relates to the technical field of translation devices, and in particular to a speech translation method and device.

Background

With the rapid development of modern society, cross-border exchanges are becoming ever more frequent, yet for many people poor language communication is a considerable obstacle, and a wide variety of language translators have therefore appeared on the market.

With their compact, portable form and powerful translation functions, translators are popular with people who need language translation, especially those traveling abroad, and are also good aids for learning foreign languages. A language translator can read aloud and translate for you while you study or converse, so that ordinary people can communicate with international friends and customers without barriers.

Translators currently on the market combine modern electronics, communication, and network technologies to good practical effect, but they still have shortcomings. For example, when speaking, the user is required to be as close to the translator as possible; at even a slightly greater distance, the accuracy of recognizing the user's speech drops and the translation deteriorates.

Technical Problem

The main object of the present invention is to provide a speech translation method and device that obtain the best translation.

Technical Solution
The present invention proposes a speech translation method, comprising:

separately obtaining second speech signals produced by translating a first speech signal under different preset gains;

comparing the semantic relevance of each of the second speech signals to a preset third speech signal, wherein the first speech signal is a reply to the third speech signal, and the second speech signals and the third speech signal are in the same language;

obtaining the second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain;

determining whether the second gain is the same as a first gain used in the previous translation, wherein the first gain and the second gain belong to the preset gains;

if they are the same, recording and playing the second speech signal with the highest semantic relevance.
Further, the step of separately obtaining second speech signals produced by translating the first speech signal under different preset gains comprises:

obtaining the signal-to-noise ratio of the environment in which the translator is currently located;

setting a plurality of different gains according to the signal-to-noise ratio;

separately obtaining the second speech signals produced by translating the first speech signal under the different gains.

Further, the step of comparing the semantic relevance of each of the second speech signals to the preset third speech signal comprises:

tracing back the history of the third speech signal in order from latest to earliest, wherein the history includes dialogue information or a passage of speech;

comparing the semantic relevance of the history to each of the second speech signals.

Further, after the step of determining whether the second gain is the same as the first gain used in the previous translation, the method further comprises:

if the second gain differs from the first gain used in the previous translation, determining whether the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain;

if the semantics differ, recording and playing the second speech signal with the highest semantic relevance together with the corresponding gain;

if the semantics are the same, recording and playing the second speech signal amplified by the first gain used in the previous translation.

Further, the step of determining whether the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain comprises:

obtaining the first keywords in the text corresponding to the second speech signal with the highest semantic relevance and the cohesion relationships between the first keywords, and obtaining the second keywords in the text corresponding to the second speech signal amplified by the first gain and the cohesion relationships between the second keywords;

determining whether the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships is within a preset range;

if so, determining that the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain; otherwise, determining that they differ.
The present invention also proposes a speech translation device, comprising:

a first obtaining module, configured to separately obtain the second speech signals corresponding one-to-one to translations of a first user's first speech signal under different preset gains;

a comparison module, configured to compare the semantic relevance of each of the second speech signals to a third speech signal preset in the translator, wherein the first speech signal is a reply to the third speech signal, and the second speech signals and the third speech signal are in the same language;

a second obtaining module, configured to obtain the second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain;

a judgment module, configured to determine whether the second gain is the same as a first gain used in the previous translation, wherein the first gain and the second gain belong to the preset gains;

an execution module, configured to record and play the second speech signal with the highest semantic relevance if the second gain is the same as the first gain used in the previous translation.

Further, the first obtaining module comprises:

a first obtaining unit, configured to obtain the signal-to-noise ratio of the environment in which the translator is currently located;

a preset unit, configured to set a plurality of different gains according to the signal-to-noise ratio;

a second obtaining unit, configured to separately obtain the second speech signals produced by translating the first speech signal under the different gains.

Further, the comparison module comprises:

a backtracking unit, configured to trace back the history of the third speech signal in order from latest to earliest, wherein the history includes dialogue information or a passage of speech;

a comparison unit, configured to compare the semantic relevance of the history to each of the second speech signals.

Further, the judgment module further comprises:

a second judgment unit, configured to determine, if the second gain differs from the first gain used in the previous translation, whether the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain;

a first recording unit, configured to record and play the second speech signal with the highest semantic relevance together with the corresponding gain if the semantics differ;

a second recording unit, configured to record and play the second speech signal amplified by the first gain used in the previous translation if the semantics are the same.

Further, the second judgment unit comprises:

an obtaining subunit, configured to obtain the first keywords in the text corresponding to the second speech signal with the highest semantic relevance and the cohesion relationships between the first keywords, and to obtain the second keywords in the text corresponding to the second speech signal amplified by the first gain and the cohesion relationships between the second keywords;

a judging subunit, configured to determine whether the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships is within a preset range;

a determination subunit, configured to determine that the semantics of the second speech signal with the highest semantic relevance are the same as those of the second speech signal amplified by the first gain if the match is within the preset range, and otherwise that they differ.
Beneficial Effects

The speech translation method and device of the present invention amplify and translate through a plurality of different gains preset on the translation device, and obtain the best translation according to the semantic relevance to preceding utterances, greatly facilitating use of the translator and improving the user experience.
Brief Description of the Drawings

Fig. 1 is a schematic flowchart of a speech translation method in an embodiment of the present invention;

Fig. 2 is a schematic flowchart of step S1 in an embodiment of the present invention;

Fig. 3 is a schematic flowchart of step S2 in an embodiment of the present invention;

Fig. 4 is a schematic flowchart of the steps after step S4 in another embodiment of the present invention;

Fig. 5 is a schematic flowchart of step S41 in another embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a speech translation device in an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of the first obtaining module in an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of the comparison module in an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of the judgment module in another embodiment of the present invention;

Fig. 10 is a schematic structural diagram of the second judgment unit in another embodiment of the present invention.

The achievement of the objects, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Best Mode for Carrying Out the Invention

It should be understood that the specific embodiments described herein are intended only to explain the present invention and are not intended to limit it.
Referring to Fig. 1, the present invention provides a speech translation method, comprising:

S1. separately obtaining second speech signals produced by translating a first speech signal under different preset gains;

S2. comparing the semantic relevance of each of the second speech signals to a preset third speech signal, wherein the first speech signal is a reply to the third speech signal, and the second speech signals and the third speech signal are in the same language;

S3. obtaining the second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain;

S4. determining whether the second gain is the same as a first gain used in the previous translation, wherein the first gain and the second gain belong to the preset gains;

S5. if they are the same, recording and playing the second speech signal with the highest semantic relevance.
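Steps S1 to S5 can be sketched as a small selection routine. This is an illustrative sketch only, not the patent's implementation: the helpers `translate` (recognize and translate a signal amplified at a given gain, e.g. via the translator's server) and `relevance` (score semantic relevance against the stored third speech signal, i.e. the dialogue history) are hypothetical stand-ins supplied by the caller.

```python
def select_best_translation(first_signal, preset_gains, history,
                            previous_gain, translate, relevance):
    # S1: translate the first speech signal once per preset gain
    candidates = {gain: translate(first_signal, gain) for gain in preset_gains}
    # S2/S3: score every candidate against the dialogue history and keep
    # the one with the highest semantic relevance, plus its gain
    second_gain, best = max(candidates.items(),
                            key=lambda item: relevance(item[1], history))
    # S4/S5: if the winning gain equals the first gain from the previous
    # translation, record and play this candidate directly
    if second_gain == previous_gain:
        return best, previous_gain
    # otherwise the method proceeds to the extra semantic check of step
    # S41 (not shown in this sketch)
    return best, second_gain
```

The routine returns both the chosen translation and the gain to carry forward as the next "first gain".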
Gain is the amplification factor. In electronics it is usually the ratio of a system's signal output to its signal input, and it describes the degree to which the current, voltage, or power of a component, circuit, device, or system is increased. It is specified in decibels (dB); that is, the unit of gain is generally the decibel, a relative value. Amplifier gain is the logarithm of the ratio of the amplifier's output power to its input power and expresses the degree of power amplification.

In this embodiment, the method is applied to a translator comprising a microphone, a speech recognition unit, a processor, and a radio-frequency section. The speech recognition unit can set the gain of the speech signal entering the microphone, and the radio-frequency section can connect to a back-end cloud server over a wireless network such as WIFI, BT, 2G, 3G, 4G, eMTC, or NB-IoT. According to the signal-to-noise ratio of the translator's environment, a plurality of different gains are preset for the speech signal entering the microphone. The first user's first speech signal is translated under the different gains to obtain the one-to-one corresponding second speech signals. A third speech signal is preset in the translator, where the second and third speech signals are in the same language and the first speech signal is a reply to the third speech signal. The second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain are obtained, and the second gain is then compared with the first gain used in the previous translation, both gains belonging to the preset gains. If the first gain and the second gain are the same gain, the second speech signal with the highest semantic relevance is stored and played.

In a specific embodiment, the speech signals entering the microphone from user 1 (speaking language A) and/or user 2 (speaking language B) are stored in advance as the third speech signal, then amplified by the first gain (the best gain) and uploaded to the server for recognition and translation. The server stores the results in language A and language B respectively, to serve as the dialogue history in the translator for judging the semantic relevance of the next speech signal to be translated (the first speech signal). The received first speech signal (the new speech signal) is amplified at the different preset gains and uploaded to the translator server, where it is recognized and translated into the corresponding second speech signals. Each second speech signal is compared with the third speech signal, that is, the semantic relevance of speech signals in the same language (language A or B) is compared, and the second speech signal most relevant to the context of the first speech signal to be translated is obtained, stored on the translator's cloud server, and played; in other embodiments it may also be stored in local storage.
As shown in Fig. 2, in this embodiment, step S1 of separately obtaining the second speech signals produced by translating the first speech signal under different preset gains comprises:

S11. obtaining the signal-to-noise ratio of the environment in which the translator is currently located;

S12. setting a plurality of different gains according to the signal-to-noise ratio;

S13. separately obtaining the second speech signals corresponding one-to-one to the first speech signal under the different gains.

The signal-to-noise ratio, abbreviated SNR or S/N, is the ratio of signal to noise in an electronic device or system. Here the signal is the electronic signal from outside the device that the device is to process, and the noise is the irregular extra signal (or information) produced after passing through the device, which did not exist in the original signal and does not vary with it. The larger the signal-to-noise ratio, the less noise is mixed into the signal and the higher the quality of the reproduced sound, and vice versa.

When the translator is used, the signal-to-noise ratio of its current environment is obtained in advance, a plurality of different gains are set according to that ratio, and the second speech signals corresponding one-to-one to the first speech signal are obtained under the different gains. For the translator to recognize the user's speech accurately and at a distance, the gain of the speech signal entering the microphone should be increased when interference or noise sources are weak and decreased when they are strong. External interference and noise sources are not fixed; much noise is itself human speech, or overlaps and is superimposed on the frequency range of human speech. In actual use, the gain of the speech signal entering the microphone is adjusted according to the strength of the interference or noise sources so that the translator can recognize and translate the user's speech accurately and at a distance. For example, when the translator recognizes the user's speech accurately from a distance, the gain of the speech signal entering the microphone is increased: with low noise around the translator, the signal-to-noise ratio of the speech signal entering the microphone is high, and increasing the gain can improve the accuracy of recognition and translation.
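The description does not give a formula for deriving the preset gains from the measured SNR, so the sketch below makes its own assumptions (a 3 dB spacing between candidates and a 0-30 dB clamp are illustrative choices): a higher SNR permits a larger base gain, and several candidate gains are spread around that base, matching steps S11-S13 in spirit only.

```python
def preset_gains_from_snr(snr_db, count=3, step_db=3.0,
                          min_gain_db=0.0, max_gain_db=30.0):
    # Higher SNR (quiet surroundings) allows a larger base gain; lower
    # SNR (noisy surroundings) calls for a smaller one.
    base = max(min_gain_db, min(max_gain_db, snr_db))
    # Spread `count` candidate gains symmetrically around the base value.
    offset = step_db * (count - 1) / 2
    gains = [base - offset + i * step_db for i in range(count)]
    # Clamp every candidate to the permitted range.
    return [max(min_gain_db, min(max_gain_db, g)) for g in gains]
```

For example, a 20 dB SNR yields the candidate gains 17, 20, and 23 dB, each of which would then be tried in step S13.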
As shown in Fig. 3, in this embodiment, step S2 of comparing the semantic relevance of each second speech signal to the preset third speech signal comprises:

S21. tracing back the history of the third speech signal in order from latest to earliest, wherein the history includes dialogue information or a passage of speech;

S22. comparing the semantic relevance of the history to each of the second speech signals.

In this embodiment, the history refers to the complete dialogue of two users taking turns, or a passage of speech by one user, recorded and stored by the translator during translation. The translator may sort each passage or sentence in the dialogue history chronologically from earliest to latest, or by the number of recorded entries, and may mark each passage in order.

In some embodiments, the history includes dialogue records in two languages, each containing the complete dialogue of the two users. The dialogue record in language A and the dialogue record in language B each contain N passages covering the complete dialogue of user A and user B. The language-A record includes the original input of user A as well as user B's original input translated into language A; the language-B record includes the original input of user B as well as user A's original input translated into language B.

By tracing back the semantic relevance of the history of the third speech signal from latest to earliest and comparing it with each second speech signal, an ordering of the second speech signals by semantic relevance to the history, from highest to lowest or from lowest to highest, can be obtained.
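The latest-to-earliest backtracking of S21/S22 can be sketched as follows. The sketch rests on assumptions not in the patent: the history is modeled as a list of utterances in chronological order, semantic relevance is approximated by simple word overlap, and more recent history entries are weighted more heavily to reflect the backward traversal; a real system would use a proper semantic model.

```python
def history_relevance(candidate, history):
    # Walk the history from latest to earliest (S21), weighting the most
    # recent utterance with 1, the next with 1/2, then 1/3, and so on.
    words = set(candidate.split())
    score = 0.0
    for depth, utterance in enumerate(reversed(history), start=1):
        overlap = len(words & set(utterance.split()))
        score += overlap / depth
    return score

def rank_candidates(candidates, history):
    # S22: order the second speech signals by semantic relevance to the
    # history, highest first.
    return sorted(candidates,
                  key=lambda c: history_relevance(c, history),
                  reverse=True)
```

The first element of the ranked list plays the role of the "second speech signal with the highest semantic relevance" in step S3.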
As shown in Fig. 4, in this embodiment, after step S4 of determining whether the second gain is the same as the first gain used in the previous translation, the method further comprises:

S41. if the second gain differs from the first gain used in the previous translation, determining whether the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain;

S421. if the semantics differ, recording and playing the second speech signal with the highest semantic relevance together with the corresponding gain;

S422. if the semantics are the same, recording and playing the second speech signal amplified by the first gain used in the previous translation.

When the second gain is found to differ from the first gain used in the previous translation, a further judgment is needed: the semantics of the second speech signal with the highest semantic relevance are compared with the semantics of the second speech signal amplified by the first gain. If their semantics differ, the second speech signal with the highest semantic relevance and its corresponding gain are recorded, that gain is used as the first gain in the next translation, and the second speech signal is played. If their semantics are the same, the first gain used in the previous translation is retained as the first gain for the next translation, and the second speech signal amplified by that first gain is recorded and played.
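The fallback of S41 to S422 reduces to a small decision rule. A minimal sketch, assuming a caller-supplied boolean helper `same_semantics(a, b)` (for instance the keyword-matching check of step S41); the function names are illustrative, not from the patent.

```python
def resolve_gain_conflict(best_candidate, best_gain,
                          previous_candidate, previous_gain,
                          same_semantics):
    # S421: the meanings differ, so keep the new translation and adopt
    # its gain as the first gain for the next translation
    if not same_semantics(best_candidate, previous_candidate):
        return best_candidate, best_gain
    # S422: the meanings agree, so retain the previous first gain and
    # play the translation it produced
    return previous_candidate, previous_gain
```

The returned gain is what the next call would pass in as `previous_gain`, so the "first gain" only changes when the new gain actually yields a different meaning.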
As shown in Fig. 5, in this embodiment, step S41 of determining whether the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain comprises:

S411. obtaining the first keywords in the text corresponding to the second speech signal with the highest semantic relevance and the cohesion relationships between the first keywords, and obtaining the second keywords in the text corresponding to the second speech signal amplified by the first gain and the cohesion relationships between the second keywords;

S412. determining whether the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships is within a preset range;

S413. if so, determining that the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain; otherwise, determining that they differ.

In this embodiment, the first keywords in the text corresponding to the second speech signal with the highest semantic relevance and the cohesion relationships between them are obtained, and at the same time the second keywords in the text corresponding to the second speech signal amplified by the first gain and the cohesion relationships between them are obtained. A matching range value is preset, and it is determined whether the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships falls within that range. For example, with a preset matching range of 90%-100%: when the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships is within 90%-100%, the two speech signals are judged to have the same semantics; if the degree of match is below 90%, they are judged to have different semantics.
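Steps S411 to S413 can be sketched as a similarity test against the 90%-100% band. Two parts are assumptions for illustration: keywords are approximated by lowercased tokens, and "cohesion relationships" by adjacent keyword pairs; the patent specifies neither the keyword extractor nor the match metric, only the preset range.

```python
def _features(text):
    # Keywords approximated by lowercased tokens; cohesion approximated
    # by adjacent keyword pairs (an assumption, not the patent's method).
    words = text.lower().split()
    return set(words), set(zip(words, words[1:]))

def same_semantics(text_a, text_b, low=0.9, high=1.0):
    words_a, pairs_a = _features(text_a)
    words_b, pairs_b = _features(text_b)
    union = words_a | words_b | pairs_a | pairs_b
    if not union:
        return True  # two empty texts trivially match
    inter = (words_a & words_b) | (pairs_a & pairs_b)
    match = len(inter) / len(union)
    # S413: same semantics only if the match falls in the preset range
    return low <= match <= high
```

With the default band, identical sentences match at 100% and are judged semantically the same, while sentences sharing only a few keywords fall below 90% and are judged different.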
As shown in Fig. 6, the present invention also proposes a speech translation device, comprising:

a first obtaining module 1, configured to separately obtain the second speech signals produced by translating the first speech signal under different preset gains;

a comparison module 2, configured to compare the semantic relevance of each second speech signal to the third speech signal preset in the translator, wherein the first speech signal is a reply to the third speech signal, and the second and third speech signals are in the same language;

a second obtaining module 3, configured to obtain the second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain;

a judgment module 4, configured to determine whether the second gain is the same as the first gain used in the previous translation, wherein the first and second gains belong to the preset gains;

an execution module 5, configured to record and play the second speech signal with the highest semantic relevance if the second gain is the same as the first gain used in the previous translation.

Gain is the amplification factor. In electronics it is usually the ratio of a system's signal output to its signal input, and it describes the degree to which the current, voltage, or power of a component, circuit, device, or system is increased. It is specified in decibels (dB); that is, the unit of gain is generally the decibel, a relative value. Amplifier gain is the logarithm of the ratio of the amplifier's output power to its input power and expresses the degree of power amplification.

In this embodiment, the device is applied to a translator comprising a microphone, a speech recognition unit, a processor, and a radio-frequency section. The speech recognition unit can set the gain of the speech signal entering the microphone, and the radio-frequency section can connect to a back-end cloud server over a wireless network such as WIFI, BT, 2G, 3G, 4G, eMTC, or NB-IoT. According to the signal-to-noise ratio of the translator's environment, a plurality of different gains are preset for the speech signal entering the microphone. The first obtaining module 1 translates the first user's first speech signal under the different gains to obtain the one-to-one corresponding second speech signals. A third speech signal is preset in the translator, where the second and third speech signals are in the same language and the first speech signal is a reply to the third speech signal. The second obtaining module 3 obtains the second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain, and the judgment module 4 then determines whether the second gain is the same as the first gain used in the previous translation, both gains belonging to the preset gains. If the first gain and the second gain are the same gain, the execution module stores and plays the second speech signal with the highest semantic relevance.

In a specific embodiment, the speech signals entering the microphone from user 1 (speaking language A) and/or user 2 (speaking language B) are stored in advance as the third speech signal, then amplified by the first gain (the best gain) and uploaded to the server for recognition and translation. The server stores the results in language A and language B respectively, to serve as the dialogue history in the translator for judging the semantic relevance of the next speech signal to be translated (the first speech signal). The received first speech signal (the new speech signal) is amplified at the different preset gains and uploaded to the translator server, where it is recognized and translated into the corresponding second speech signals. Each second speech signal is compared with the third speech signal, that is, the semantic relevance of speech signals in the same language (language A or B) is compared, and the second speech signal most relevant to the context of the first speech signal to be translated is obtained, stored on the translator's cloud server, and played; in other embodiments it may also be stored in local storage.
As shown in Fig. 7, in this embodiment, the first obtaining module 1 comprises:

a first obtaining unit 11, configured to obtain the signal-to-noise ratio of the environment in which the translator is currently located;

a preset unit 12, configured to set a plurality of different gains according to the signal-to-noise ratio;

a second obtaining unit 13, configured to separately obtain the second speech signals corresponding one-to-one to the first speech signal under the different gains.

The signal-to-noise ratio, abbreviated SNR or S/N, is the ratio of signal to noise in an electronic device or system. Here the signal is the electronic signal from outside the device that the device is to process, and the noise is the irregular extra signal (or information) produced after passing through the device, which did not exist in the original signal and does not vary with it. The larger the signal-to-noise ratio, the less noise is mixed into the signal and the higher the quality of the reproduced sound, and vice versa.

When the translator is used, the first obtaining unit 11 obtains the signal-to-noise ratio of the translator's current environment, and the preset unit 12 sets a plurality of different gains according to that ratio; the second obtaining unit 13 uses these gains to separately obtain the second speech signals corresponding to translations of the first speech signal. For the translator to recognize the user's speech accurately and at a distance, the gain of the speech signal entering the microphone should be increased when interference or noise sources are weak and decreased when they are strong. External interference and noise sources are not fixed, and much noise is itself human speech, or overlaps and is superimposed on the frequency range of human speech. In actual use, the gain of the speech signal entering the microphone is adjusted according to the strength of the interference or noise sources so that the translator can recognize and translate the user's speech accurately and at a distance. For example, when the translator recognizes the user's speech accurately from a distance, the gain of the speech signal entering the microphone is increased: with low noise around the translator, the signal-to-noise ratio of the speech signal entering the microphone is high, and increasing the gain can improve the accuracy of recognition and translation.
As shown in Fig. 8, in this embodiment, the comparison module 2 comprises:

a backtracking unit 21, configured to trace back the history of the third speech signal in order from latest to earliest, wherein the history includes dialogue information or a passage of speech;

a comparison unit 22, configured to compare the semantic relevance of the history to each of the second speech signals.

In this embodiment, the history refers to the complete dialogue of two users taking turns, or a passage of speech by one user, recorded and stored by the translator during translation. The translator may sort each passage or sentence in the dialogue history chronologically from earliest to latest, or by the number of recorded entries, and may mark each passage in order.

In some embodiments, the history includes dialogue records in two languages, each containing the complete dialogue of the two users. The dialogue record in language A and the dialogue record in language B each contain N passages covering the complete dialogue of user A and user B. The language-A record includes the original input of user A as well as user B's original input translated into language A; the language-B record includes the original input of user B as well as user A's original input translated into language B.

By tracing back the semantic relevance of the history of the third speech signal from latest to earliest and comparing it with each second speech signal, an ordering of the second speech signals by semantic relevance to the history, from highest to lowest or from lowest to highest, can be obtained.
As shown in Fig. 9, in this embodiment, the judgment module 4 further comprises:

a second judgment unit 41, configured to determine, if the second gain differs from the first gain used in the previous translation, whether the semantics of the second speech signal with the highest semantic relevance are the same as those of the second speech signal amplified by the first gain;

a first recording unit 421, configured to record and play the second speech signal with the highest semantic relevance together with the corresponding gain if the semantics differ;

a second recording unit 422, configured to record and play the second speech signal amplified by the first gain used in the previous translation if the semantics are the same.

When the second gain is found to differ from the first gain used in the previous translation, the second judgment unit 41 makes a further judgment, comparing the semantics of the second speech signal with the highest semantic relevance with those of the second speech signal amplified by the first gain. If their semantics differ, the first recording unit 421 records the second speech signal with the highest semantic relevance and the corresponding gain, that gain is used as the first gain in the next translation, and the second speech signal is played. If their semantics are the same, the first gain used in the previous translation is retained as the first gain for the next translation, and the second recording unit 422 records and plays the second speech signal amplified by that first gain.
As shown in Fig. 10, in this embodiment, the second judgment unit 41 comprises:

an obtaining subunit 411, configured to obtain the first keywords in the text corresponding to the second speech signal with the highest semantic relevance and the cohesion relationships between the first keywords, and to obtain the second keywords in the text corresponding to the second speech signal amplified by the first gain and the cohesion relationships between the second keywords;

a judging subunit 412, configured to determine whether the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships is within a preset range;

a determination subunit 413, configured to determine that the semantics of the second speech signal with the highest semantic relevance are the same as those of the second speech signal amplified by the first gain if the match is within the preset range, and otherwise that they differ.

In this embodiment, the obtaining subunit 411 obtains the first keywords in the text corresponding to the second speech signal with the highest semantic relevance and the cohesion relationships between them, and also obtains the second keywords in the text corresponding to the second speech signal amplified by the first gain and the cohesion relationships between them. A matching range value is preset, and the judging subunit 412 determines whether the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships falls within that range.

In a specific embodiment, the preset matching range is 90%-100%. When the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships is within the preset range of 90%-100%, the determination subunit 413 determines that the semantics of the two speech signals are the same; if the degree of match is below 90%, the determination subunit 413 determines that the semantics of the two speech signals differ.
The speech translation method and device of the present invention amplify and translate through a plurality of different gains preset on the translation device, and obtain the best translation according to the semantic relevance to preceding utterances, greatly facilitating use of the translator and improving the user experience.

The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

  1. A speech translation method, characterized by comprising:
    separately obtaining second speech signals produced by translating a first speech signal under different preset gains;
    comparing the semantic relevance of each of the second speech signals to a preset third speech signal, wherein the first speech signal is a reply to the third speech signal, and the second speech signals and the third speech signal are in the same language;
    obtaining the second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain;
    determining whether the second gain is the same as a first gain used in the previous translation, wherein the first gain and the second gain belong to the preset gains;
    if they are the same, recording and playing the second speech signal with the highest semantic relevance.
  2. The speech translation method according to claim 1, characterized in that the step of separately obtaining second speech signals produced by translating the first speech signal under different preset gains comprises:
    obtaining the signal-to-noise ratio of the environment in which the translator is currently located;
    setting a plurality of different gains according to the signal-to-noise ratio;
    separately obtaining the second speech signals produced by translating the first speech signal under the different gains.
  3. The speech translation method according to claim 1, characterized in that the step of comparing the semantic relevance of each of the second speech signals to the preset third speech signal comprises:
    tracing back the history of the third speech signal in order from latest to earliest, wherein the history includes dialogue information or a passage of speech;
    comparing the semantic relevance of the history to each of the second speech signals.
  4. The speech translation method according to claim 1, characterized in that, after the step of determining whether the second gain is the same as the first gain used in the previous translation, the method further comprises:
    if the second gain differs from the first gain used in the previous translation, determining whether the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain;
    if the semantics differ, recording and playing the second speech signal with the highest semantic relevance together with the corresponding gain;
    if the semantics are the same, recording and playing the second speech signal amplified by the first gain used in the previous translation.
  5. The speech translation method according to claim 4, characterized in that the step of determining whether the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain comprises:
    obtaining the first keywords in the text corresponding to the second speech signal with the highest semantic relevance and the cohesion relationships between the first keywords, and obtaining the second keywords in the text corresponding to the second speech signal amplified by the first gain and the cohesion relationships between the second keywords;
    determining whether the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships is within a preset range;
    if so, determining that the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain; otherwise, determining that they differ.
  6. A speech translation device, characterized by comprising:
    a first obtaining module, configured to separately obtain second speech signals produced by translating a first speech signal under different preset gains;
    a comparison module, configured to compare the semantic relevance of each of the second speech signals to a third speech signal preset in the translator, wherein the first speech signal is a reply to the third speech signal, and the second speech signals and the third speech signal are in the same language;
    a second obtaining module, configured to obtain the second speech signal with the highest semantic relevance to the third speech signal and the corresponding second gain;
    a judgment module, configured to determine whether the second gain is the same as a first gain used in the previous translation, wherein the first gain and the second gain belong to the preset gains;
    an execution module, configured to record and play the second speech signal with the highest semantic relevance if the second gain is the same as the first gain used in the previous translation.
  7. The speech translation device according to claim 6, characterized in that the first obtaining module comprises:
    a first obtaining unit, configured to obtain the signal-to-noise ratio of the environment in which the translator is currently located;
    a preset unit, configured to set a plurality of different gains according to the signal-to-noise ratio;
    a second obtaining unit, configured to separately obtain the second speech signals produced by translating the first speech signal under the different gains.
  8. The speech translation device according to claim 6, characterized in that the comparison module comprises:
    a backtracking unit, configured to trace back the history of the third speech signal in order from latest to earliest, wherein the history includes dialogue information or a passage of speech;
    a comparison unit, configured to compare the semantic relevance of the history to each of the second speech signals;
    a first judgment unit, configured to determine the second speech signal with the highest semantic relevance to the history.
  9. The speech translation device according to claim 6, characterized in that the judgment module further comprises:
    a second judgment unit, configured to determine, if the second gain differs from the first gain used in the previous translation, whether the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain;
    a first recording unit, configured to record and play the second speech signal with the highest semantic relevance together with the corresponding gain if the semantics differ;
    a second recording unit, configured to record and play the second speech signal amplified by the first gain used in the previous translation if the semantics are the same.
  10. The speech translation device according to claim 9, characterized in that the second judgment unit comprises:
    an obtaining subunit, configured to obtain the first keywords in the text corresponding to the second speech signal with the highest semantic relevance and the cohesion relationships between the first keywords, and to obtain the second keywords in the text corresponding to the second speech signal amplified by the first gain and the cohesion relationships between the second keywords;
    a judging subunit, configured to determine whether the degree of match between the first keywords and their cohesion relationships and the second keywords and their cohesion relationships is within a preset range;
    a determination subunit, configured to determine that the semantics of the second speech signal with the highest semantic relevance are the same as the semantics of the second speech signal amplified by the first gain if the match is within the preset range, and otherwise that they differ.
PCT/CN2018/093456 2018-05-31 2018-06-28 Speech translation method and device WO2019227562A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810550273.1 2018-05-31
CN201810550273.1A CN108829687B (zh) 2018-05-31 2018-05-31 Speech translation method and device

Publications (1)

Publication Number Publication Date
WO2019227562A1 (zh)

Family

ID=64147107

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/093456 WO2019227562A1 (zh) 2018-05-31 2018-06-28 Speech translation method and device

Country Status (2)

Country Link
CN (1) CN108829687B (zh)
WO (1) WO2019227562A1 (zh)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844470A (zh) * 2016-09-18 2018-03-27 腾讯科技(深圳)有限公司 一种语音数据处理方法及其设备
CN107863102A (zh) * 2017-12-25 2018-03-30 青岛冠义科技有限公司 一种语音识别电路及翻译系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11994895B2 (en) * 2013-09-27 2024-05-28 Labor Genome, Ltd. System for scoring an organizational role capability
CN106782521A (zh) * 2017-03-22 2017-05-31 海南职业技术学院 一种语音识别系统


Also Published As

Publication number Publication date
CN108829687A (zh) 2018-11-16
CN108829687B (zh) 2021-04-13


Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18920815; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18920815; Country of ref document: EP; Kind code of ref document: A1)