CN107731232A - Voice translation method and device - Google Patents

Voice translation method and device

Publication number
CN107731232A
Authority
CN
China
Prior art keywords
voice
gender
voice information
voiceprint
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710967364.0A
Other languages
Chinese (zh)
Inventor
郑勇
王文祺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Water World Co Ltd
Original Assignee
Shenzhen Water World Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Water World Co Ltd
Priority to CN201710967364.0A
Priority to PCT/CN2017/111961 (WO2019075829A1)
Publication of CN107731232A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The present invention discloses a voice translation method and device. The method comprises the following steps: identifying the voice gender of original voice information; selecting a corresponding speech synthesis voiceprint according to the voice gender; and performing translation processing on the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with the voice gender of the original voice information, thereby adapting to the voice gender. When a man speaks, the translated speech is a male voice; when a woman speaks, the translated speech is a female voice. The original speech and the translated speech are thus in harmony, which greatly enhances the realism of the exchange and improves the user experience.

Description

Voice translation method and device
Technical field
The present invention relates to the field of electronic technology, and in particular to a voice translation method and device.
Background
At present, when two users who speak different languages communicate, a translation device can translate for them so that they can communicate without barriers. A specific implementation is as follows: the user presses a specific key on the translation device when starting to speak, and the translation device collects the voice information and performs translation processing; after finishing a passage, the user presses the key again, and the translation device outputs the translated voice information.
The voice gender of the voice information output by the translation device is preset. The user can set it to a male voice or a female voice, but once it is set, the translated voice information always has the same voice gender regardless of whether the speaker is a man or a woman. For example, if a male voice is set and the speaker is a woman, the translated voice information is a male voice; if a female voice is set and the speaker is a man, the translated voice information is a female voice.
It can be seen that, in the prior art, the voice gender of the translated voice information may be inconsistent with the voice gender of the original voice information. The original speech and the translated speech are then out of harmony, which sounds strange to the user, greatly reduces the realism of the exchange, and results in a poor user experience.
Summary of the invention
The main object of the present invention is to provide a voice translation method and device, aiming to solve the technical problem that the voice gender of the translated voice information is inconsistent with the voice gender of the original voice information, so as to enhance the realism of the exchange and improve the user experience.
To achieve the above object, an embodiment of the present invention proposes a voice translation method, the method comprising the following steps:
identifying the voice gender of original voice information;
selecting a corresponding speech synthesis voiceprint according to the voice gender;
performing translation processing on the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with the voice gender of the original voice information.
Optionally, the step of identifying the voice gender of the original voice information comprises:
obtaining the pitch frequency of the original voice information;
comparing the pitch frequency with a threshold;
when the pitch frequency is less than or equal to the threshold, identifying the voice gender of the original voice information as male;
when the pitch frequency is greater than the threshold, identifying the voice gender of the original voice information as female.
Optionally, the step of obtaining the pitch frequency of the original voice information comprises:
continuously sampling M frames of the original voice information at a preset sampling frequency, where M ≥ 2;
performing fundamental frequency feature extraction on the collected speech frames;
computing the pitch frequency of the original voice information from the extracted fundamental frequency features.
Optionally, 25 ≤ M ≤ 35.
Optionally, the duration of each speech frame is 20-30 ms.
Optionally, the sampling frequency is 8 kHz.
Optionally, the threshold is 180-220 Hz.
Optionally, the step of performing translation processing on the original voice information according to the selected speech synthesis voiceprint comprises:
performing speech recognition on the original voice information to obtain a first character string in the source language;
performing text translation on the first character string to obtain a second character string in the target language;
performing speech synthesis on the second character string using the selected speech synthesis voiceprint to obtain voice information in the target language.
Optionally, the step of identifying the voice gender of the original voice information comprises: whenever the start of a segment of voice information is detected, identifying the voice gender of that voice information.
Optionally, the speech synthesis voiceprints include a male voiceprint and a female voiceprint, and the step of selecting a corresponding speech synthesis voiceprint according to the voice gender comprises:
when the voice gender is male, selecting the male voiceprint;
when the voice gender is female, selecting the female voiceprint.
An embodiment of the present invention also proposes a voice translation device, the device comprising:
a gender identification module, for identifying the voice gender of original voice information;
a voiceprint selection module, for selecting a corresponding speech synthesis voiceprint according to the voice gender;
a translation processing module, for performing translation processing on the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with the voice gender of the original voice information.
Optionally, the gender identification module comprises:
an acquiring unit, for obtaining the pitch frequency of the original voice information;
a comparing unit, for comparing the pitch frequency with a threshold;
a first recognition unit, for determining that the voice gender of the original voice information is male when the pitch frequency is less than or equal to the threshold;
a second recognition unit, for determining that the voice gender of the original voice information is female when the pitch frequency is greater than the threshold.
Optionally, the acquiring unit comprises:
a sampling subunit, for continuously sampling M frames of the original voice information at a preset sampling frequency, where M ≥ 2;
an extraction subunit, for performing fundamental frequency feature extraction on the collected speech frames;
a statistics subunit, for computing the pitch frequency of the original voice information from the extracted fundamental frequency features.
Optionally, the translation processing module comprises:
a first processing unit, for performing speech recognition on the original voice information to obtain a first character string in the source language;
a second processing unit, for performing text translation on the first character string to obtain a second character string in the target language;
a third processing unit, for performing speech synthesis on the second character string using the selected speech synthesis voiceprint to obtain voice information in the target language.
Optionally, the gender identification module is used for: whenever the start of a segment of voice information is detected, identifying the voice gender of that voice information.
Optionally, the speech synthesis voiceprints include a male voiceprint and a female voiceprint, and the voiceprint selection module comprises:
a first selection unit, for selecting the male voiceprint when the voice gender is male;
a second selection unit, for selecting the female voiceprint when the voice gender is female.
An embodiment of the present invention also proposes a translation device, which includes a memory, a processor, and at least one application program that is stored in the memory and configured to be executed by the processor, the application program being configured to perform the foregoing voice translation method.
The voice translation method provided by the embodiments of the present invention identifies the voice gender of the original voice information, selects the corresponding speech synthesis voiceprint according to that voice gender, and finally performs translation processing on the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with the voice gender of the original voice information, thereby adapting to the voice gender. When a man speaks, the translated speech is a male voice; when a woman speaks, the translated speech is a female voice. The original speech and the translated speech are thus in harmony, which greatly enhances the realism of the exchange and improves the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the voice translation method of the present invention;
Fig. 2 is a detailed flowchart of step S11 in Fig. 1;
Fig. 3 is a module diagram of an embodiment of the voice translation device of the present invention;
Fig. 4 is a module diagram of the gender identification module in Fig. 3;
Fig. 5 is a module diagram of the acquiring unit in Fig. 4;
Fig. 6 is a module diagram of the voiceprint selection module in Fig. 3;
Fig. 7 is a module diagram of the translation processing module in Fig. 3.
The realization of the object, the functional characteristics, and the advantages of the present invention will be further described with reference to the drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are only intended to illustrate the present invention and are not intended to limit it.
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the present invention, and are not to be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as herein, will not be interpreted in an idealized or overly formal sense.
Those skilled in the art will understand that "terminal" and "terminal device" as used herein include both devices that have only a wireless signal receiver with no transmitting capability and devices that have receiving and transmitting hardware capable of two-way communication over a bidirectional communication link. Such devices may include: cellular or other communication devices, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax, and/or data communication capabilities; PDAs (Personal Digital Assistant), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. A "terminal" or "terminal device" as used herein may be portable, transportable, installed in a vehicle (air, sea, and/or land), or suitable for and/or configured to operate locally and/or in distributed form at any other location on the earth and/or in space. A "terminal" or "terminal device" as used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device), and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.
Those skilled in the art will understand that a server as used herein includes but is not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. Here, the cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers. In the embodiments of the present invention, communication between the server, the terminal device, and the WNS server may be implemented by any communication mode, including but not limited to mobile communication based on 3GPP, LTE, or WIMAX, computer network communication based on the TCP/IP or UDP protocols, and short-range wireless transmission based on Bluetooth or infrared transmission standards.
The voice translation method and device of the embodiments of the present invention can be applied to a translation device or to a server. The translation device may be a dedicated translator, a mobile terminal such as a mobile phone or tablet, or a terminal such as a personal computer or notebook computer. Referring to Fig. 1, an embodiment of the voice translation method of the present invention is proposed, the method comprising the following steps:
S11: Identify the voice gender of the original voice information.
The original voice information described in the embodiments of the present invention is the voice information to be translated. It may be voice information collected on the spot, voice information stored locally, or voice information obtained from another device.
Taking application to a translation device as an example, the translation device can collect the voice information uttered by the user through a microphone, and this voice information is the original voice information.
Taking application to a server as an example, the server receives the voice information sent by the translation device, and this voice information is the original voice information.
When identifying the voice gender of voice information, the fundamental frequency can be used as the basis for identification, and the voice gender of the original voice information can be identified by a gender recognition algorithm such as VQ (Vector Quantization), HMM (Hidden Markov Model), or SVM (Support Vector Machine).
As shown in Fig. 2, the voice gender of the original voice information can be identified in the following manner, which specifically includes the following steps:
S111: Obtain the pitch frequency of the original voice information.
Specifically, M (M ≥ 2) frames of the original voice information are first continuously sampled at a preset sampling frequency; fundamental frequency feature extraction is then performed on the collected speech frames; finally, the pitch frequency of the original voice information is computed from the extracted fundamental frequency features.
The sampling frequency may be 8 kHz, although other frequencies may also be chosen. M is preferably in the range 25 ≤ M ≤ 35, for example M = 30, that is, 30 speech frames are sampled continuously. The duration of each speech frame is preferably 20-30 ms. When computing the pitch frequency, the fundamental frequencies of the collected speech frames can be averaged, and the average taken as the pitch frequency of the original voice information.
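Step S111 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not say how the per-frame fundamental frequency is extracted, so a plain autocorrelation estimator is swapped in here, and the names `frame_pitch_hz` and `utterance_pitch_hz`, the 60-500 Hz search range, and the 0.3 voicing ratio are all hypothetical. Only the 8 kHz sampling frequency, M = 30 frames, the 20-30 ms frame duration, and the averaging come from the description.

```python
import math

def frame_pitch_hz(frame, sample_rate=8000, fmin=60.0, fmax=500.0):
    """Estimate one frame's fundamental frequency by autocorrelation.

    Returns None when the frame looks silent or unvoiced.
    The search range and the 0.3 voicing ratio are assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]              # remove DC offset
    energy = sum(s * s for s in x)
    if energy < 1e-9:
        return None                            # silent frame
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), n - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(x[i] * x[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_lag is None or best_corr / energy < 0.3:
        return None                            # no clear periodicity
    return sample_rate / best_lag

def utterance_pitch_hz(samples, sample_rate=8000, num_frames=30, frame_ms=25):
    """Average the per-frame estimates over the first M frames (M = num_frames)."""
    frame_len = sample_rate * frame_ms // 1000
    pitches = []
    for k in range(num_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        if len(frame) < frame_len:
            break                              # ran out of audio
        f0 = frame_pitch_hz(frame, sample_rate)
        if f0 is not None:
            pitches.append(f0)
    return sum(pitches) / len(pitches) if pitches else None
```

Frames for which no fundamental frequency can be found are skipped before averaging, which is a design choice of this sketch; the description only states that the per-frame values are averaged.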
S112: Compare the pitch frequency with the threshold, and judge whether the pitch frequency is less than or equal to the threshold. When the pitch frequency is less than or equal to the threshold, proceed to step S113; when the pitch frequency is greater than the threshold, proceed to step S114.
The fundamental frequency of a male voice is lower than that of a female voice: the fundamental frequency of male voices is generally distributed between 0 and 200 Hz, and that of female voices is generally distributed between 200 and 500 Hz. The threshold can therefore be set to 180-220 Hz, for example 200 Hz.
S113: Identify the voice gender of the original voice information as male.
S114: Identify the voice gender of the original voice information as female.
The voice gender of voice information described in the embodiments of the present invention includes male voice and female voice. When the pitch frequency is less than or equal to the threshold, the voice gender of the original voice information is identified as male. When the pitch frequency is greater than the threshold, the voice gender of the original voice information is identified as female.
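The comparison in steps S112 to S114 reduces to a single threshold test. The sketch below assumes the example threshold of 200 Hz given above; the function name `classify_voice_gender` and the string labels are hypothetical.

```python
def classify_voice_gender(pitch_hz, threshold_hz=200.0):
    """Return "male" when the pitch frequency is at or below the threshold (S113),
    otherwise "female" (S114)."""
    if pitch_hz <= threshold_hz:
        return "male"
    return "female"
```

A pitch exactly equal to the threshold is classified as male, matching the "less than or equal to" wording of step S112.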
In the embodiments of the present invention, whenever the start of a segment of voice information is detected, the voice gender of that segment is identified once, so that each segment of voice information is matched with its corresponding speech synthesis voiceprint and the voice gender of each translated segment is consistent with the voice gender of the corresponding original segment.
The start and end of a segment of voice information can be determined from the time interval between two segments of speech. For example, when no voice information is detected within a preset duration, the current segment is determined to have ended; when voice information is detected again, the next segment is determined to have started. Voice activity detection (VAD) technology can be used to detect whether a voice signal contains voice information.
When applied to a translation device, the start and end of a segment of voice information can also be detected by whether a specific key is triggered. For example, when the specific key is triggered for the first time, a segment of voice information starts; when the specific key is triggered again, the segment ends.
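The silence-gap rule for finding segment boundaries can be sketched as below. The sketch assumes that some voice activity detector has already produced one boolean per frame; the detector itself is not shown, and the name `split_segments` and the default gap of 20 frames are hypothetical stand-ins for the "preset duration" in the text.

```python
def split_segments(voiced_flags, max_silence_frames=20):
    """Group per-frame VAD decisions into speech segments.

    voiced_flags: iterable of booleans, one per frame (True = speech detected).
    A segment ends once max_silence_frames consecutive frames contain no speech.
    Returns a list of (start, end) frame indices, end exclusive.
    """
    segments = []
    start = None          # index of the first voiced frame of the open segment
    last_voiced = None    # index of the most recent voiced frame
    for i, voiced in enumerate(voiced_flags):
        if voiced:
            if start is None:
                start = i                      # a new segment starts
            last_voiced = i
        elif start is not None and i - last_voiced >= max_silence_frames:
            segments.append((start, last_voiced + 1))
            start = None                       # segment ended by silence
    if start is not None:
        segments.append((start, last_voiced + 1))
    return segments
```

Gender identification would then run once at the start of each returned segment, so that consecutive speakers of different genders each get a matching voiceprint.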
S12: Select the corresponding speech synthesis voiceprint according to the voice gender of the original voice information.
In the embodiments of the present invention, two kinds of speech synthesis voiceprints are preset: a male voiceprint and a female voiceprint. When the voice gender of the original voice information is identified as male, the male voiceprint is selected; when the voice gender of the original voice information is identified as female, the female voiceprint is selected.
Further, there may be at least two male voiceprints and at least two female voiceprints, each with a different fundamental frequency, and the corresponding male or female voiceprint can be selected according to the pitch frequency of the original voice information. The voiceprint of the translated voice information then matches the original voice information more closely, which further enhances the realism of the exchange.
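One way to pick among several voiceprints of the same gender is to take the one whose fundamental frequency is nearest to the measured pitch. The text only says the choice is made "according to" the pitch frequency, so the nearest-match rule, the voiceprint names, and their frequencies below are all assumptions made for illustration.

```python
# Hypothetical preset voiceprints: name -> nominal fundamental frequency in Hz.
VOICEPRINTS = {
    "male": {"male_low": 100.0, "male_high": 150.0},
    "female": {"female_low": 220.0, "female_high": 280.0},
}

def select_voiceprint(gender, pitch_hz, voiceprints=VOICEPRINTS):
    """Return the name of the voiceprint of the given gender whose nominal
    fundamental frequency is closest to pitch_hz."""
    candidates = voiceprints[gender]
    return min(candidates, key=lambda name: abs(candidates[name] - pitch_hz))
```

For example, a male speaker measured at 140 Hz would be mapped to `male_high` rather than `male_low` under these made-up presets.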
S13: Perform translation processing on the original voice information according to the selected speech synthesis voiceprint.
In step S13, translation processing is performed on the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with the voice gender of the original voice information, which enhances the realism of the exchange and improves the user experience.
The translation processing of voice information mainly includes three stages: speech recognition, text translation, and speech synthesis. Specifically, speech recognition is first performed on the original voice information to obtain a first character string in the source language; text translation is then performed on the first character string to obtain a second character string in the target language; finally, speech synthesis is performed on the second character string using the selected speech synthesis voiceprint to obtain voice information in the target language.
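The three-stage flow can be sketched as a small function that chains the stages. The recognition, translation, and synthesis engines are not specified in the text, so they are passed in as callables here; `translate_speech` and its parameter names are hypothetical, and the stand-in lambdas in the usage lines only show the data flow.

```python
def translate_speech(audio, voiceprint, recognize, translate, synthesize):
    """Run speech recognition, text translation, and speech synthesis in order.

    recognize(audio) -> first character string (source language)
    translate(text) -> second character string (target language)
    synthesize(text, voiceprint) -> voice information in the target language
    """
    source_text = recognize(audio)            # speech recognition
    target_text = translate(source_text)      # text translation
    return synthesize(target_text, voiceprint)  # synthesis with the selected voiceprint

# Usage with toy stand-ins for the three engines (assumptions, not real APIs).
result = translate_speech(
    b"raw-audio",
    "female",
    recognize=lambda audio: "hello",
    translate=lambda text: text.upper(),
    synthesize=lambda text, voiceprint: (text, voiceprint),
)
```

Passing the stages in as parameters keeps the sketch neutral about whether each stage runs locally on the translation device or on a remote server.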
Take application to a translation device as an example. The translation device can perform the translation processing locally, that is, it performs the three stages of speech recognition, text translation, and speech synthesis on the original voice information in sequence to obtain the bitstream of the voice information in the target language.
The translation device can also perform the translation processing through servers. For example, the translation device first sends the original voice information to a speech recognition server, which performs speech recognition on the original voice information, recognizes the first character string, and returns it to the translation device. The translation device receives the first character string and sends it to a text translation server, which translates the first character string into the second character string in the target language and returns it to the translation device. The translation device receives the second character string and sends the second character string and the selected speech synthesis voiceprint to a speech synthesis server, which performs speech synthesis on the second character string using the selected speech synthesis voiceprint, obtains the voice information in the target language, and returns it to the translation device in the form of a bitstream. The translation device receives this bitstream and thereby obtains the translated voice information.
Of course, in other embodiments, the translation device may also send the original voice information and the selected speech synthesis voiceprint to a single server, which performs speech recognition and text translation on the original voice information directly and then performs speech synthesis using the selected speech synthesis voiceprint to obtain the bitstream of the voice information in the target language.
Take application to a server as an example. The server performs the three stages of speech recognition, text translation, and speech synthesis on the original voice information in sequence to obtain the voice information in the target language, and sends it to the translation device in the form of a bitstream.
After the translation device obtains the translated voice information, it outputs that voice information, for example by driving a loudspeaker. Because the voice gender of the output voice information is consistent with the voice gender of the original voice information, it sounds more real to the user, which improves the user experience.
The voice translation method of the embodiments of the present invention identifies the voice gender of the original voice information, selects the corresponding speech synthesis voiceprint according to that voice gender, and finally performs translation processing on the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with the voice gender of the original voice information, thereby adapting to the voice gender. When a man speaks, the translated speech is a male voice; when a woman speaks, the translated speech is a female voice. The original speech and the translated speech are thus in harmony, which greatly enhances the realism of the exchange and improves the user experience.
Reference picture 3, propose the present invention the embodiment of speech translation apparatus one, described device include gender identification module 10, Vocal print selecting module 20 and translation processing module 30, wherein:Gender identification module 10, for identifying the language of original voice messaging Sound sex;Vocal print selecting module 20, for the phonetic synthesis vocal print according to corresponding to original voice Sex preference;Translation processing mould Block 30, translation processing is carried out to original voice messaging for the phonetic synthesis vocal print according to selection, so that after translation processing The voice sex of voice messaging is consistent with the voice sex of original voice messaging.
Original voice messaging described in the embodiment of the present invention, i.e., voice messaging to be translated.Original voice messaging can To be the voice messaging gathered on the spot, local voice messaging or the language obtained from miscellaneous equipment can be stored in Message ceases.
Exemplified by applied to interpreting equipment, interpreting equipment can gather the voice messaging that user sends by microphone, should Voice messaging is original voice messaging.
Taking application to a server as an example, the server receives the voice information sent by the translation device; this voice information is the original voice information.
When identifying the voice gender of the voice information, the gender identification module 10 may use the pitch (fundamental) frequency as the basis for identification and identify the voice gender of the original voice information with a gender recognition algorithm such as VQ (vector quantization), HMM (hidden Markov model), or SVM (support vector machine).
Optionally, as shown in Fig. 4, the gender identification module 10 includes an acquiring unit 11, a comparing unit 12, a first recognition unit 13, and a second recognition unit 14, wherein: the acquiring unit 11 is configured to obtain the pitch frequency of the original voice information; the comparing unit 12 is configured to compare the pitch frequency with a threshold; the first recognition unit 13 is configured to determine that the voice gender of the original voice information is a male voice when the pitch frequency is less than or equal to the threshold; and the second recognition unit 14 is configured to determine that the voice gender of the original voice information is a female voice when the pitch frequency is greater than the threshold.
As shown in Fig. 5, the acquiring unit 11 includes a sampling subunit 111, an extraction subunit 112, and a statistics subunit 113, wherein: the sampling subunit 111 is configured to continuously sample M frames (M ≥ 2) of the original voice information at a preset sampling frequency; the extraction subunit 112 is configured to extract pitch frequency features from the sampled speech frames; and the statistics subunit 113 is configured to derive the pitch frequency of the original voice information from the extracted pitch frequency features.
The sampling frequency may be 8 kHz, although other frequencies may of course be chosen. M preferably lies in the range 25 ≤ M ≤ 35; for example, M = 30, i.e. 30 speech frames are sampled continuously. Each speech frame preferably lasts 20-30 ms. When deriving the pitch frequency, the statistics subunit 113 may average the pitch frequencies of the sampled speech frames and take the mean as the pitch frequency of the original voice information.
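The sampling and averaging steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the text does not say how the per-frame pitch (fundamental) frequency is extracted, so a plain autocorrelation peak search is assumed here, and the function names (frame_pitch_hz, mean_pitch_hz, sine), the 25 ms frame length, and the 50-500 Hz search range are hypothetical choices.

```python
import math

SAMPLE_RATE = 8000   # Hz, the example sampling frequency from the text
FRAME_MS = 25        # assumed; the text prefers 20-30 ms per frame
NUM_FRAMES = 30      # M = 30, within the preferred range 25 <= M <= 35


def frame_pitch_hz(frame, sample_rate=SAMPLE_RATE, f_min=50.0, f_max=500.0):
    """Estimate one frame's pitch from its autocorrelation peak (assumed method).

    Returns None for silent frames or frames without a positive peak.
    """
    if not any(frame):
        return None
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def mean_pitch_hz(samples, sample_rate=SAMPLE_RATE,
                  frame_ms=FRAME_MS, num_frames=NUM_FRAMES):
    """Average the per-frame pitch over the first num_frames full frames."""
    frame_len = sample_rate * frame_ms // 1000
    pitches = []
    for i in range(num_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if len(frame) < frame_len:
            break  # not enough samples left for a full frame
        pitch = frame_pitch_hz(frame, sample_rate)
        if pitch is not None:
            pitches.append(pitch)
    return sum(pitches) / len(pitches) if pitches else None


def sine(freq_hz, seconds=0.75, sample_rate=SAMPLE_RATE):
    """Synthetic tone standing in for recorded speech in this sketch."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]
```

For instance, mean_pitch_hz(sine(120.0)) should come out close to 120 Hz, with a small error because the lag is an integer number of samples. Real speech is far less regular than a sine tone, so a practical system would likely add a voicing decision and outlier rejection before averaging.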
The pitch frequency of a male voice is lower than that of a female voice. Male pitch frequencies are typically distributed between 0 and 200 Hz, and female pitch frequencies between 200 and 500 Hz, so the threshold may be set within 180-220 Hz, for example to 200 Hz.
In this embodiment of the present invention, the voice gender of voice information is either a male voice or a female voice. When the pitch frequency is less than or equal to the threshold, the first recognition unit 13 identifies the voice gender of the original voice information as a male voice. When the pitch frequency is greater than the threshold, the second recognition unit 14 identifies the voice gender of the original voice information as a female voice.
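The threshold comparison described above reduces to a single branch. The following sketch assumes the pitch (fundamental) frequency has already been measured in Hz; the function name identify_voice_gender and the string labels "male" and "female" are hypothetical, while the 200 Hz default is the example threshold given in the text.

```python
DEFAULT_THRESHOLD_HZ = 200.0  # example from the text; suggested range is 180-220 Hz


def identify_voice_gender(pitch_hz, threshold_hz=DEFAULT_THRESHOLD_HZ):
    """Return "male" if the pitch is at or below the threshold, else "female"."""
    if pitch_hz <= 0:
        raise ValueError("pitch frequency must be positive")
    return "male" if pitch_hz <= threshold_hz else "female"
```

A pitch exactly equal to the threshold is classified as a male voice, matching the "less than or equal to" condition of the first recognition unit.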
In this embodiment of the present invention, whenever the start of a segment of voice information is detected, the gender identification module 10 identifies the voice gender of that segment, so that a corresponding speech synthesis voiceprint is matched to each segment separately and the voice gender of each translated segment is consistent with that of the corresponding original segment.
When detecting the start and end of a segment of voice information, the gender identification module 10 may rely on the time interval between two segments of speech. For example, when no voice information is detected within a preset duration, the current segment is determined to have ended; when voice information is detected again, the next segment is determined to have started. Voice activity detection (VAD) may be used to detect whether a sound signal contains voice information.
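The interval-based segmentation described above can be sketched as follows. The text does not specify a VAD algorithm, so a crude frame-energy test is assumed here purely as a stand-in; the names is_speech and split_segments, the energy threshold, and the representation of the preset duration as a number of frames are all hypothetical.

```python
def is_speech(frame, energy_threshold=0.01):
    """Crude energy-based stand-in for a real VAD (an assumption of this sketch)."""
    if not frame:
        return False
    energy = sum(s * s for s in frame) / len(frame)
    return energy > energy_threshold


def split_segments(speech_flags, max_silence_frames):
    """Group per-frame speech flags into (start, end) frame-index segments.

    A segment ends once max_silence_frames consecutive non-speech frames
    have been seen; end is one past the last speech frame of the segment.
    """
    segments = []
    start = None        # first frame of the currently open segment
    last_speech = None  # most recent frame flagged as speech
    for i, flag in enumerate(speech_flags):
        if flag:
            if start is None:
                start = i  # voice detected again: next segment starts
            last_speech = i
        elif start is not None and i - last_speech >= max_silence_frames:
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:
        segments.append((start, last_speech + 1))  # input ended mid-segment
    return segments
```

With 25 ms frames, max_silence_frames=40 would correspond to a preset duration of one second. Short pauses inside a sentence stay within one segment, so the voice gender is identified once per segment rather than once per pause.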
When applied to a translation device, the gender identification module 10 may also detect the start and end of a segment of voice information by detecting whether a specific key is triggered. For example, when the specific key is triggered for the first time, a segment of voice information starts; when the specific key is triggered again, the segment ends.
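The key-triggered variant described above amounts to a two-state toggle. A minimal sketch follows; the class name KeySegmenter, the method name on_key_triggered, and the returned labels are hypothetical and stand in for whatever key-event handling the device actually provides.

```python
class KeySegmenter:
    """Tracks segment boundaries from successive triggers of one specific key."""

    def __init__(self):
        self.recording = False  # True while a segment is open

    def on_key_triggered(self):
        """Toggle the state and report which boundary this trigger marks."""
        self.recording = not self.recording
        return "segment_start" if self.recording else "segment_end"
```

The first trigger opens a segment and the second closes it, after which the cycle repeats for the next segment.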
In this embodiment of the present invention, two speech synthesis voiceprints are preset: a male voiceprint and a female voiceprint. As shown in Fig. 6, the voiceprint selection module 20 includes a first selection unit 21 and a second selection unit 22, wherein: the first selection unit 21 is configured to select the male voiceprint when the voice gender of the original voice information is a male voice; and the second selection unit 22 is configured to select the female voiceprint when the voice gender of the original voice information is a female voice.
Further, there are at least two male voiceprints and at least two female voiceprints, each with a different pitch frequency, and the voiceprint selection module 20 may select the corresponding male or female voiceprint according to the pitch frequency of the original voice information. The voiceprint of the translated voice information then matches that of the original voice information more closely, further enhancing the sense of realism.
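One way to read the selection described above is as a nearest-pitch lookup within the identified gender. The sketch below makes that assumption explicit: the text only says the voiceprint is chosen according to the pitch (fundamental) frequency, so the "closest pitch wins" rule, the Voiceprint record, the catalogue entries, and the function name select_voiceprint are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voiceprint:
    name: str        # hypothetical identifier of a synthesis voice
    gender: str      # "male" or "female"
    pitch_hz: float  # nominal pitch frequency of the voice


# Hypothetical catalogue with at least two voiceprints per gender.
VOICEPRINTS = [
    Voiceprint("male_low", "male", 100.0),
    Voiceprint("male_high", "male", 160.0),
    Voiceprint("female_low", "female", 220.0),
    Voiceprint("female_high", "female", 300.0),
]


def select_voiceprint(pitch_hz, threshold_hz=200.0, catalogue=VOICEPRINTS):
    """Pick the voiceprint of the matching gender whose pitch is closest."""
    gender = "male" if pitch_hz <= threshold_hz else "female"
    candidates = [v for v in catalogue if v.gender == gender]
    if not candidates:
        raise LookupError(f"no {gender} voiceprint configured")
    return min(candidates, key=lambda v: abs(v.pitch_hz - pitch_hz))
```

The gender decision is made first, so a male speaker at 195 Hz still receives a male voiceprint even though a female voiceprint may be numerically closer in pitch.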
The translation processing module 30 translates the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with that of the original voice information, which enhances the sense of realism and improves the user experience.
Translation of voice information mainly comprises three stages: speech recognition, text translation, and speech synthesis. As shown in Fig. 7, the translation processing module 30 includes a first processing unit 31, a second processing unit 32, and a third processing unit 33: the first processing unit 31 is configured to perform speech recognition on the original voice information to obtain a first character string in the source language; the second processing unit 32 is configured to perform text translation on the first character string to obtain a second character string in the target language; and the third processing unit 33 is configured to perform speech synthesis on the second character string using the selected speech synthesis voiceprint to obtain voice information in the target language.
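The three stages described above compose into a simple pipeline. The sketch below is illustrative only: the text names no concrete recognition, translation, or synthesis engine, so the three stages are passed in as callables, and translate_speech together with the fake_* stand-ins are hypothetical names that exist only to make the sketch runnable.

```python
from typing import Callable


def translate_speech(
    audio: bytes,
    voiceprint: str,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
) -> bytes:
    """Run speech recognition, text translation, and synthesis in sequence."""
    source_text = recognize(audio)              # first character string
    target_text = translate(source_text)        # second character string
    return synthesize(target_text, voiceprint)  # target-language speech


# Toy stand-ins so the sketch runs; real engines would replace them.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    return {"ni hao": "hello"}.get(text, text)


def fake_synthesize(text: str, voiceprint: str) -> bytes:
    return f"[{voiceprint}] {text}".encode("utf-8")
```

Only the last stage receives the selected voiceprint, which reflects the description: recognition and text translation are independent of the speaker's voice gender, and the gender match is achieved entirely at synthesis time.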
Take application to a translation device as an example. The translation processing module 30 may perform the translation locally on the translation device, i.e. perform the three stages of speech recognition, text translation, and speech synthesis on the original voice information in sequence to obtain the bitstream of the voice information in the target language.
The translation processing module 30 may also perform the translation through servers. For example: the first processing unit 31 first sends the original voice information to a speech recognition server, which performs speech recognition on it, recognizes the first character string, and returns it to the translation device; the second processing unit 32 receives the first character string and sends it to a text translation server, which translates it into a second character string in the target language and returns it to the translation device; the third processing unit 33 receives the second character string and sends it, together with the selected speech synthesis voiceprint, to a speech synthesis server, which performs speech synthesis on the second character string using the selected voiceprint to obtain the voice information in the target language and returns it to the translation device in the form of a bitstream; the third processing unit 33 receives that bitstream and thereby obtains the translated voice information.
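The three round trips described above can be sketched with an injected transport. Everything concrete here is an assumption made for illustration: the text does not define a wire format, so the server names, the dictionary payloads, the send(server, payload) signature, and the in-memory make_fake_transport are hypothetical.

```python
def translate_via_servers(audio, voiceprint, send):
    """Perform the three device-to-server round trips in sequence.

    send(server, payload) is a hypothetical transport that delivers the
    payload dict to the named server and returns its reply dict.
    """
    reply = send("speech_recognition", {"audio": audio})
    source_text = reply["text"]  # first character string

    reply = send("text_translation", {"text": source_text})
    target_text = reply["text"]  # second character string

    reply = send("speech_synthesis",
                 {"text": target_text, "voiceprint": voiceprint})
    return reply["audio"]        # bitstream of the target-language speech


def make_fake_transport(log):
    """In-memory stand-in for the network that records each request."""
    def send(server, payload):
        log.append((server, sorted(payload)))
        if server == "speech_recognition":
            return {"text": payload["audio"].decode("utf-8")}
        if server == "text_translation":
            return {"text": payload["text"].upper()}
        if server == "speech_synthesis":
            text, voice = payload["text"], payload["voiceprint"]
            return {"audio": f"{voice}:{text}".encode("utf-8")}
        raise ValueError(f"unknown server: {server}")
    return send
```

Note that the selected voiceprint travels only with the third request; the recognition and translation servers never need it.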
Of course, in other embodiments the translation processing module 30 may also send the original voice information and the selected speech synthesis voiceprint to a single server, which directly performs speech recognition and text translation on the original voice information and performs speech synthesis using the selected speech synthesis voiceprint to obtain the bitstream of the voice information in the target language.
Take application to a server as an example. The translation processing module 30 uses the first processing unit 31, the second processing unit 32, and the third processing unit 33 to perform the three stages of speech recognition, text translation, and speech synthesis on the original voice information in sequence, obtains the voice information in the target language, and sends it to the translation device in the form of a bitstream.
After the translation device obtains the translated voice information, it outputs that voice information, for example by driving a loudspeaker. Because the voice gender of the output voice information is consistent with that of the original voice information, the output sounds more realistic to the user, which improves the user experience.
In the voice translation apparatus of this embodiment of the present invention, the voice gender of the original voice information is identified, a corresponding speech synthesis voiceprint is selected according to that voice gender, and the original voice information is then translated using the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with that of the original voice information. The apparatus thereby adapts to the speaker's voice gender: when a man speaks, the translated speech is a male voice, and when a woman speaks, the translated speech is a female voice. The original speech and the translated speech are thus in harmony, which makes the exchange feel considerably more realistic and improves the user experience.
The voice translation method and apparatus of this embodiment of the present invention are particularly suitable for a translator (translation machine). Taking advantage of the interactive nature of the translator's half-duplex data transmission, each time the user speaks a sentence, the user's gender is identified from the user's voice information, and voice information consistent with the user's gender is produced as the translation, which enhances the authenticity of the exchange and improves the user experience.
The present invention also proposes a translation device, which includes a memory, a processor, and at least one application program stored in the memory and configured to be executed by the processor, the application program being configured to perform a voice translation method. The voice translation method includes the following steps: identifying the voice gender of original voice information; selecting a corresponding speech synthesis voiceprint according to the voice gender of the original voice information; and translating the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with that of the original voice information. The voice translation method described in this embodiment is the voice translation method of the above embodiments of the present invention and is not described again here.
Those skilled in the art will understand that the present invention includes devices for performing one or more of the operations described herein. These devices may be specially designed and manufactured for the required purpose, or may comprise known devices in general-purpose computers. These devices have computer programs stored in them that are selectively activated or reconfigured. Such computer programs may be stored in a device-readable (e.g., computer-readable) medium or in any type of medium suitable for storing electronic instructions and coupled to a bus, the computer-readable medium including but not limited to any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
Those skilled in the art will understand that computer program instructions can be used to implement each block of these structure diagrams and/or block diagrams and/or flow diagrams, as well as combinations of blocks therein. Those skilled in the art will also understand that these computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus for execution, so that the processor of the computer or other programmable data processing apparatus carries out the schemes specified in the block or blocks of the structure diagrams and/or block diagrams and/or flow diagrams disclosed by the present invention.
Those skilled in the art will understand that the steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art that correspond to those in the various operations, methods, and flows disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted.
The foregoing describes only preferred embodiments of the present invention and does not thereby limit the scope of the claims of the invention. Any equivalent structure or equivalent flow transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A voice translation method, characterized by comprising the following steps:
identifying the voice gender of original voice information;
selecting a corresponding speech synthesis voiceprint according to the voice gender;
translating the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with the voice gender of the original voice information.
2. The voice translation method according to claim 1, characterized in that the step of identifying the voice gender of the original voice information comprises:
obtaining the pitch frequency of the original voice information;
comparing the pitch frequency with a threshold;
when the pitch frequency is less than or equal to the threshold, identifying the voice gender of the original voice information as a male voice;
when the pitch frequency is greater than the threshold, identifying the voice gender of the original voice information as a female voice.
3. The voice translation method according to claim 2, characterized in that the step of obtaining the pitch frequency of the original voice information comprises:
continuously sampling M frames of the original voice information at a preset sampling frequency, where M ≥ 2;
extracting pitch frequency features from the sampled speech frames;
deriving the pitch frequency of the original voice information from the extracted pitch frequency features.
4. The voice translation method according to any one of claims 1-3, characterized in that the step of translating the original voice information according to the selected speech synthesis voiceprint comprises:
performing speech recognition on the original voice information to obtain a first character string in the source language;
performing text translation on the first character string to obtain a second character string in the target language;
performing speech synthesis on the second character string using the selected speech synthesis voiceprint to obtain voice information in the target language.
5. The voice translation method according to claim 2 or 3, characterized in that the speech synthesis voiceprint includes a male voiceprint and a female voiceprint, and the step of selecting a corresponding speech synthesis voiceprint according to the voice gender comprises:
when the voice gender is a male voice, selecting the male voiceprint;
when the voice gender is a female voice, selecting the female voiceprint.
6. A voice translation apparatus, characterized by comprising:
a gender identification module, configured to identify the voice gender of original voice information;
a voiceprint selection module, configured to select a corresponding speech synthesis voiceprint according to the voice gender;
a translation processing module, configured to translate the original voice information according to the selected speech synthesis voiceprint, so that the voice gender of the translated voice information is consistent with the voice gender of the original voice information.
7. The voice translation apparatus according to claim 6, characterized in that the gender identification module comprises:
an acquiring unit, configured to obtain the pitch frequency of the original voice information;
a comparing unit, configured to compare the pitch frequency with a threshold;
a first recognition unit, configured to determine that the voice gender of the original voice information is a male voice when the pitch frequency is less than or equal to the threshold;
a second recognition unit, configured to determine that the voice gender of the original voice information is a female voice when the pitch frequency is greater than the threshold.
8. The voice translation apparatus according to claim 7, characterized in that the acquiring unit comprises:
a sampling subunit, configured to continuously sample M frames of the original voice information at a preset sampling frequency, where M ≥ 2;
an extraction subunit, configured to extract pitch frequency features from the sampled speech frames;
a statistics subunit, configured to derive the pitch frequency of the original voice information from the extracted pitch frequency features.
9. The voice translation apparatus according to any one of claims 6-8, characterized in that the translation processing module comprises:
a first processing unit, configured to perform speech recognition on the original voice information to obtain a first character string in the source language;
a second processing unit, configured to perform text translation on the first character string to obtain a second character string in the target language;
a third processing unit, configured to perform speech synthesis on the second character string using the selected speech synthesis voiceprint to obtain voice information in the target language.
10. The voice translation apparatus according to claim 7 or 8, characterized in that the speech synthesis voiceprint includes a male voiceprint and a female voiceprint, and the voiceprint selection module comprises:
a first selection unit, configured to select the male voiceprint when the voice gender is a male voice;
a second selection unit, configured to select the female voiceprint when the voice gender is a female voice.
CN201710967364.0A 2017-10-17 2017-10-17 Voice translation method and device Pending CN107731232A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710967364.0A CN107731232A (en) 2017-10-17 2017-10-17 Voice translation method and device
PCT/CN2017/111961 WO2019075829A1 (en) 2017-10-17 2017-11-20 Voice translation method and apparatus, and translation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710967364.0A CN107731232A (en) 2017-10-17 2017-10-17 Voice translation method and device

Publications (1)

Publication Number Publication Date
CN107731232A true CN107731232A (en) 2018-02-23

Family

ID=61211655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710967364.0A Pending CN107731232A (en) 2017-10-17 2017-10-17 Voice translation method and device

Country Status (2)

Country Link
CN (1) CN107731232A (en)
WO (1) WO2019075829A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831436A (en) * 2018-06-12 2018-11-16 深圳市合言信息科技有限公司 A method of text speech synthesis after simulation speaker's mood optimization translation
WO2019165748A1 (en) * 2018-02-28 2019-09-06 科大讯飞股份有限公司 Speech translation method and apparatus
CN112201224A (en) * 2020-10-09 2021-01-08 北京分音塔科技有限公司 Method, equipment and system for simultaneous translation of instant call
CN112614482A (en) * 2020-12-16 2021-04-06 平安国际智慧城市科技股份有限公司 Mobile terminal foreign language translation method, system and storage medium
CN112989847A (en) * 2021-03-11 2021-06-18 读书郎教育科技有限公司 Recording translation system and method of scanning pen

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007003897A (en) * 2005-06-24 2007-01-11 Toppan Printing Co Ltd Karaoke system, and device and program
US20080059147A1 (en) * 2006-09-01 2008-03-06 International Business Machines Corporation Methods and apparatus for context adaptation of speech-to-speech translation systems
CN101175272A (en) * 2007-09-19 2008-05-07 中兴通讯股份有限公司 Method for reading text short message
CN101359473A (en) * 2007-07-30 2009-02-04 国际商业机器公司 Auto speech conversion method and apparatus
KR20100068965A (en) * 2008-12-15 2010-06-24 한국전자통신연구원 Automatic interpretation apparatus and its method
JP2011197542A (en) * 2010-03-23 2011-10-06 Mitsubishi Electric Corp Rhythm pattern generation device
CN103956163A (en) * 2014-04-23 2014-07-30 成都零光量子科技有限公司 Common voice and encrypted voice interconversion system and method
CN105208194A (en) * 2015-08-17 2015-12-30 努比亚技术有限公司 Voice broadcast device and method
CN105913854A (en) * 2016-04-15 2016-08-31 腾讯科技(深圳)有限公司 Voice signal cascade processing method and apparatus

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9053096B2 (en) * 2011-12-01 2015-06-09 Elwha Llc Language translation based on speaker-related information
JP2013206253A (en) * 2012-03-29 2013-10-07 Toshiba Corp Machine translation device, method and program
CN103236259B (en) * 2013-03-22 2016-06-29 乐金电子研发中心(上海)有限公司 Voice recognition processing and feedback system, voice replying method
CN103559180A (en) * 2013-10-12 2014-02-05 安波 Chat translator
CN106156009A (en) * 2015-04-13 2016-11-23 中兴通讯股份有限公司 Voice translation method and device
CN106528547A (en) * 2016-11-09 2017-03-22 王东宇 Translation method for translation machine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
国家经贸委经济研究中心中国国际名牌协会: "《中国经济技术发展优秀文集》", 30 June 2003, 中国文史出版社 *
陈力为 袁琦: "《语言工程》", 31 August 1997 *

Also Published As

Publication number Publication date
WO2019075829A1 (en) 2019-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180223