CN109859737A - Communication encryption method, system and computer readable storage medium - Google Patents

Communication encryption method, system and computer readable storage medium Download PDF

Info

Publication number
CN109859737A
CN109859737A
Authority
CN
China
Prior art keywords
communication encryption
speech
waveform
speech recognition
mandarin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910243820.6A
Other languages
Chinese (zh)
Inventor
王远昌
罗胤豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shenghong Innovation Technology Co ltd
Original Assignee
Shenzhen Shenghong Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shenghong Innovation Technology Co ltd
Priority to CN201910243820.6A
Publication of CN109859737A
Legal status: Pending

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The present invention discloses a communication encryption method, system and storage medium. The method comprises: a transmitting terminal acquires a Mandarin speech signal, recognizes the Mandarin speech signal through the speech recognition system of the transmitting terminal and converts it into standard text, and then converts the standard text into a dialect waveform for output through the speech synthesis system of the transmitting terminal; a receiving terminal acquires the dialect speech signal output by the transmitting terminal, recognizes the dialect speech signal through the speech recognition system of the receiving terminal and converts it into standard text, and then converts the standard text into a Mandarin waveform for output through the speech synthesis system of the receiving terminal. The present invention improves the secrecy of communicated information. In addition, a multi-layer encryption mode can be provided to ensure that communication is safe and reliable. Furthermore, the linguistic feature sets of different languages can be used as model input features, and language labels and speaker labels can be added to distinguish different languages and speakers, so that Mandarin can be randomly converted into a variety of different dialects, further enhancing confidentiality.

Description

Communication encryption method, system and computer readable storage medium
Technical field
The present invention relates to the field of communication technology, and more particularly to a communication encryption method, system and computer-readable storage medium.
Background art
Secret communication refers to communication that adopts secrecy measures. Besides measures such as secret signals, code words and passwords, current secrecy measures mainly rely on channel security and information security. Channel security uses communication channels that are difficult for an eavesdropper to intercept, such as dedicated lines, burst communication and radio spread spectrum; information security conceals the transmitted information, for example by encoding it with ciphers, before it is sent.
With the development of electronic technology, cryptographic machines have been used for secrecy. Their feature is that the transmitted information undergoes transposition and encryption processing at the transmitting terminal and is restored to the original information by the inverse process at the receiving terminal, so that even if an eavesdropper receives the signal, the content it represents remains unknown.
However, current secrecy schemes are limited in variety and their confidentiality is not strong.
Summary of the invention
The present invention proposes a communication encryption method, system and computer-readable storage medium, aiming to improve the secrecy of communicated information.
To achieve the above object, the present invention provides a communication encryption method. The method is applied to a communication encryption system, which includes a speech recognition system and a speech synthesis system. The communication encryption method comprises the following steps:
The transmitting terminal acquires a Mandarin speech signal, recognizes the Mandarin speech signal through the speech recognition system of the transmitting terminal and converts it into standard text, and then converts the standard text into a dialect waveform for output through the speech synthesis system of the transmitting terminal;
The receiving terminal acquires the dialect speech signal output by the transmitting terminal, recognizes the dialect speech signal through the speech recognition system of the receiving terminal and converts it into standard text, and then converts the standard text into a Mandarin waveform for output through the speech synthesis system of the receiving terminal.
Wherein, the step of the speech recognition system recognizing a speech signal includes:
The speech recognition system performs acoustic feature extraction on the speech signal at the front end to obtain a speech-frame vector matrix of the speech signal;
The decoder at the back end of the speech recognition system, combining an acoustic model, a dictionary and/or a language model, decodes the speech-frame vector matrix and obtains a speech recognition result.
Wherein, the step of decoding the speech-frame vector matrix to obtain a speech recognition result includes:
recognizing each frame in the speech-frame vector matrix as a state;
combining the states into phonemes;
combining the phonemes into words to obtain the speech recognition result.
Wherein, before the step of the speech recognition system recognizing the speech signal, the method further includes:
performing silence-removal processing on the speech signal.
Wherein, the step of the speech synthesis system converting standard text into a dialect waveform or Mandarin waveform for output includes:
The speech synthesis system performs text analysis on the input standard text to obtain phone-level context-dependent linguistic features;
Based on the phone-level context-dependent linguistic features, frame-level features are expanded according to the prediction result of a duration model and used as the input of an acoustic model;
The acoustic feature parameters output by the acoustic model are fed into a vocoder, and the vocoder outputs a speech waveform, which is at least a dialect waveform or a Mandarin waveform.
Wherein, before the step of the speech synthesis system converting standard text into a dialect waveform or Mandarin waveform for output, the method further includes:
obtaining the acoustic model by LSTM-based statistical parametric modeling combined with a CBHG network, or by generative adversarial network modeling.
Wherein, before the step of the speech synthesis system converting standard text into a dialect waveform or Mandarin waveform for output, the method further includes:
obtaining the acoustic model by multi-speaker, multilingual hybrid modeling.
Wherein, before the step of the speech recognition system of the transmitting terminal recognizing the Mandarin speech signal and converting it into standard text, the method further includes:
performing a first conventional encryption on the Mandarin speech signal by means of modulation, carrier and software encryption techniques.
An embodiment of the present invention also proposes a communication encryption system. The communication encryption system includes: a speech recognition system, a speech synthesis system, a memory, a processor, and a communication encryption program stored in the memory. When called by the processor, the communication encryption program executes the steps of the communication encryption method described above.
An embodiment of the present invention also proposes a storage medium on which a communication encryption program is stored. When called by a processor, the communication encryption program executes the steps of the communication encryption method described above.
The beneficial effects of the present invention are:
The transmitting terminal acquires a Mandarin speech signal, recognizes it through the speech recognition system of the transmitting terminal and converts it into standard text, and then converts the standard text into a dialect waveform for output through the speech synthesis system of the transmitting terminal; the receiving terminal acquires the dialect speech signal output by the transmitting terminal, recognizes it through the speech recognition system of the receiving terminal and converts it into standard text, and then converts the standard text into a Mandarin waveform for output through the speech synthesis system of the receiving terminal. The secure communication mode of encrypting by dialect thereby replaces the work of a traditional (dialect-speaking) security operator and improves the secrecy of communicated information. Furthermore, a multi-layer encryption mode can be provided to ensure that communication is safe and reliable. In addition, the linguistic feature sets of different languages can be used as model input features, and language labels and speaker labels can be added to distinguish different languages and speakers, so that Mandarin can be randomly converted into a variety of different dialects, enhancing confidentiality.
Detailed description of the invention
Fig. 1 is a flow diagram of an embodiment of the communication encryption method of the present invention;
Fig. 2 is a flow diagram of the speech recognition system in the present invention;
Fig. 3 is an example waveform;
Fig. 4 is a schematic diagram of overlapping frames;
Fig. 5 is a schematic diagram of an observation sequence;
Fig. 6 is a schematic diagram of frames, states and phonemes;
Fig. 7 is a schematic diagram of the conditional probability of a frame on state S3;
Fig. 8 is a schematic diagram of modeling units;
Fig. 9 is a schematic diagram of a state-tying form;
Fig. 10 is a schematic diagram of acoustic model probabilities;
Fig. 11 is a schematic diagram of the CD-DNN-HMM architecture;
Fig. 12 is a flow diagram of the speech synthesis system in the present invention;
Fig. 13 is a schematic diagram of LSTM-based statistical parametric modeling in the present invention;
Fig. 14 is a schematic diagram of the CBHG network structure in the present invention;
Fig. 15 is a schematic diagram of the GAN structure in the present invention;
Fig. 16 is a schematic diagram of multi-speaker, multilingual hybrid modeling in the present invention.
The realization of the object, functions and advantages of the present invention will be further described with reference to the accompanying drawings in combination with the embodiments.
Specific embodiment
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
As shown in Fig. 1, an embodiment of the present invention proposes a communication encryption method. The method is applied to a communication encryption system, which includes a speech recognition system and a speech synthesis system. The communication encryption method includes the following steps:
Step S1: the transmitting terminal acquires a Mandarin speech signal, recognizes the Mandarin speech signal through the speech recognition system of the transmitting terminal and converts it into standard text, and then converts the standard text into a dialect waveform for output through the speech synthesis system of the transmitting terminal;
Step S2: the receiving terminal acquires the dialect speech signal output by the transmitting terminal, recognizes the dialect speech signal through the speech recognition system of the receiving terminal and converts it into standard text, and then converts the standard text into a Mandarin waveform for output through the speech synthesis system of the receiving terminal.
Wherein, the step of the speech recognition system recognizing a speech signal includes:
The speech recognition system performs acoustic feature extraction on the speech signal at the front end to obtain a speech-frame vector matrix of the speech signal;
The decoder at the back end of the speech recognition system, combining an acoustic model, a dictionary and/or a language model, decodes the speech-frame vector matrix and obtains a speech recognition result.
Wherein, the step of decoding the speech-frame vector matrix to obtain a speech recognition result includes:
recognizing each frame in the speech-frame vector matrix as a state;
combining the states into phonemes;
combining the phonemes into words to obtain the speech recognition result.
In addition, before the step of the speech recognition system recognizing the speech signal, silence-removal processing may be performed on the speech signal.
Wherein, the step of the speech synthesis system converting standard text into a dialect waveform or Mandarin waveform for output includes:
The speech synthesis system performs text analysis on the input standard text to obtain phone-level context-dependent linguistic features;
Based on the phone-level context-dependent linguistic features, frame-level features are expanded according to the prediction result of a duration model and used as the input of an acoustic model;
The acoustic feature parameters output by the acoustic model are fed into a vocoder, and the vocoder outputs a speech waveform, which is at least a dialect waveform or a Mandarin waveform.
Wherein, before the step of the speech synthesis system converting standard text into a dialect waveform or Mandarin waveform for output, the method further includes:
obtaining the acoustic model by LSTM-based statistical parametric modeling combined with a CBHG network, or by generative adversarial network modeling.
Alternatively, the acoustic model is obtained by multi-speaker, multilingual hybrid modeling.
Furthermore, a multi-layer encryption mode can be provided to ensure that communication is safe and reliable. In addition, the linguistic feature sets of different languages can be used as model input features, and language labels and speaker labels can be added to distinguish different languages and speakers, so that Mandarin can be randomly converted into a variety of different dialects, enhancing confidentiality.
Embodiments of the present invention are described in detail below:
The present invention provides a multi-layer encryption mode to ensure that communication is safe and reliable. First, a conventional encryption is performed on the signal by modulation, carrier and software encryption techniques; the encryption algorithm therein can ensure a high level of protection against leakage and decoding. In addition, a secure communication mode of encrypting by dialect is added, which also replaces the work of a traditional (dialect-speaking) security operator.
Fig. 2 is a flow diagram of the speech recognition system in the present invention.
The speech recognition system performs acoustic feature extraction on the speech signal at the front end to obtain a speech-frame vector matrix of the speech signal;
The decoder at the back end of the speech recognition system, combining an acoustic model, a dictionary and/or a language model, decodes the speech-frame vector matrix and obtains a speech recognition result.
The speech recognition result is obtained by:
recognizing each frame in the speech-frame vector matrix as a state;
combining the states into phonemes;
combining the phonemes into words to obtain the speech recognition result.
The acoustic model, dictionary and language model of the present invention are explained as follows:
Acoustic model (acoustic model): used to recognize speech vectors. The vectors can be recognized with methods such as GMM or DNN, and can be aligned with DTW, HMM or CTC so that the recognition result is output with timing (when a word starts and when it ends);
Dictionary (dictionary): most models use phonemes rather than words as the recognition unit. When the phonemes of a word such as "apple" are recognized, the dictionary is used to determine that the spoken word is "apple";
Language model (language model): a Chinese listener can still understand a foreigner's faulty Chinese because knowledge of grammar can correct illogical words recognized by the acoustic model. This is the role of the language model.
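As a hedged illustration of the dictionary's role described above, the lookup from a recognized phoneme sequence to a word can be sketched as follows; the phoneme symbols and the entries are invented for the example and are not part of the patent:

```python
# Minimal sketch of a dictionary (lexicon) mapping phoneme sequences to words.
# The phoneme inventory (ARPAbet-like symbols) and entries are illustrative.

LEXICON = {
    ("AE", "P", "AH", "L"): "apple",
    ("B", "AH", "N", "AE", "N", "AH"): "banana",
}

def phonemes_to_word(phonemes):
    """Map a recognized phoneme sequence to a word via the lexicon."""
    return LEXICON.get(tuple(phonemes))

print(phonemes_to_word(["AE", "P", "AH", "L"]))  # apple
```

A real recognizer searches over many lexicon entries jointly with the acoustic and language models rather than doing a single exact lookup.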
First, sound is actually a wave. For example, a Windows PCM file is an uncompressed pure waveform file, commonly known as a wav file. Apart from a file header, a wav file stores the individual sample points of the sound waveform. Fig. 3 shows an example waveform.
Before speech recognition starts, it is sometimes necessary to cut off the silence at both ends to reduce interference with subsequent steps. This silence-removal operation is commonly called VAD (voice activity detection) and requires some signal-processing techniques. To analyze the sound, it must be framed, that is, cut into short segments, each of which is called a frame. Framing is implemented with a moving window function, and adjacent frames usually overlap, as shown in Fig. 4.
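The silence-removal step can be sketched minimally as an energy threshold over fixed-length chunks; real VAD is far more robust, and the chunk length and threshold here are illustrative assumptions:

```python
import numpy as np

# Hedged sketch of energy-based silence trimming (a very simple form of VAD).
def trim_silence(signal, frame_len=200, threshold=0.01):
    """Drop leading/trailing chunks whose mean energy is below threshold."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    voiced = np.where(energy >= threshold)[0]
    if voiced.size == 0:
        return signal[:0]
    start, end = voiced[0] * frame_len, (voiced[-1] + 1) * frame_len
    return signal[start:end]

# Silence, then a tone, then silence:
sig = np.concatenate([np.zeros(400),
                      0.5 * np.sin(np.linspace(0, 50, 600)),
                      np.zeros(400)])
trimmed = trim_silence(sig)
print(len(sig), len(trimmed))  # 1400 600
```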
In Fig. 4, each frame is 25 milliseconds long, and every two adjacent frames overlap by 25-10=15 milliseconds. This is called framing with a frame length of 25 ms and a frame shift of 10 ms. After framing, the speech has become many short segments. But a waveform has almost no descriptive power in the time domain, so it must be transformed. A common transformation is to extract MFCC features: according to the physiological characteristics of the human ear, each frame of the waveform is turned into a multi-dimensional vector, which can be simply understood as containing the content information of that frame of speech. This process is called acoustic feature extraction. At this point, the sound has become a matrix of 12 rows (assuming the acoustic feature is 12-dimensional) and N columns, called the observation sequence, where N is the total number of frames. The observation sequence is illustrated in Fig. 5; each frame is represented by a 12-dimensional vector, and the shade of each block indicates the magnitude of the vector value.
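The 25 ms / 10 ms framing described above can be sketched as follows; the 16 kHz sampling rate is an assumption for illustration, and an MFCC front end would then turn each row into e.g. a 12-dimensional feature vector:

```python
import numpy as np

def frame_signal(signal, sr=16000, frame_ms=25, shift_ms=10):
    """Slice a 1-D signal into overlapping frames (rows of the result)."""
    frame_len = sr * frame_ms // 1000   # 400 samples per 25 ms frame
    shift = sr * shift_ms // 1000       # 160-sample shift -> 240-sample overlap
    n_frames = 1 + (len(signal) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return signal[idx]

sig = np.random.randn(16000)            # 1 second of audio
frames = frame_signal(sig)
print(frames.shape)                     # (98, 400)
```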
The workflow of speech recognition is: frames are recognized as states, states are combined into phonemes, and phonemes are combined into words.
Phoneme: the pronunciation of a word is composed of phonemes. For English, a commonly used phone set is the 39-phoneme set from Carnegie Mellon University. Chinese generally uses all initials and finals directly as the phone set, and Chinese recognition further distinguishes toned from toneless variants.
State: a state is a phonetic unit finer than a phoneme. A phoneme is usually divided into 3 states.
As shown in Fig. 6, each small vertical bar represents a frame. Several frames of speech correspond to one state, every three states are combined into one phoneme, and several phonemes are combined into one word. As long as we know which state each frame of speech correspondsds to, the result of speech recognition can be obtained. The state corresponding to each frame is determined by the state with the maximum probability for that frame; as shown in Fig. 7, the conditional probability of this frame on state S3 is the largest, so this frame is assigned to state S3.
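The per-frame state assignment of Fig. 7, where each frame takes the state with the largest probability, can be sketched as a simple argmax; the probability table is invented for the example, and a real system constrains the choice with transitions rather than deciding each frame independently:

```python
import numpy as np

# Rows are frames, columns are states S1..S3; each frame gets its most
# probable state independently (no transition constraints, for illustration).
state_probs = np.array([
    [0.1, 0.2, 0.7],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
])
states = state_probs.argmax(axis=1) + 1   # 1-based state index
print(states.tolist())                    # [3, 2, 1]
```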
The probabilities are read from the "acoustic model", which stores parameters estimated from a huge amount of speech data; through these parameters, the probability corresponding to each frame-state pair can be known.
A state network is constructed using a hidden Markov model (Hidden Markov Model, HMM), and the path best matching the sound is searched for in the state network. The network size and structure are chosen reasonably according to the needs of the actual task. Building the state network means expanding a word-level network into a phoneme network, and further into a state network.
The speech recognition process is to search for an optimal path in the state network such that the probability of the speech corresponding to this path is maximal; this is called "decoding". The path-search algorithm is a pruned dynamic-programming algorithm called the Viterbi algorithm, which is used to find the globally optimal path. Combined with the trained model, new speech vectors can be judged and the speech recognized.
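The Viterbi decoding described above can be sketched on a toy HMM; the two-state model and its probabilities are illustrative assumptions, not the patent's models, and real decoders add pruning over much larger networks:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Find the most probable state path.
    log_pi: (S,) initial log-probs; log_A: (S,S) transition log-probs;
    log_B: (T,S) per-frame observation log-likelihoods."""
    T, S = log_B.shape
    delta = log_pi + log_B[0]              # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)     # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A    # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):          # trace back the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

log = np.log
pi = log(np.array([0.6, 0.4]))
A = log(np.array([[0.7, 0.3], [0.4, 0.6]]))
B = log(np.array([[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]]))  # 3 frames, 2 states
print(viterbi(pi, A, B))   # [0, 0, 1]
```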
1. The basic framework of speech recognition
W* = argmaxW P(W|Y) (1)
= argmaxW P(Y|W)P(W) / P(Y) (2)
≈ argmaxW P(Y|W)P(W) (3)
In the above formulas, W denotes the word sequence and Y denotes the speech input. Formula (1) states that the goal of speech recognition is to find the most probable word sequence given the speech input. By Bayes' rule we obtain formula (2), in which the denominator represents the probability of this speech; since it does not depend on the word sequence being solved for, it can be ignored when solving, giving formula (3). The first part of formula (3), P(Y|W), represents the probability that this audio occurs given a word sequence; this is the acoustic model (Acoustic Model, AM) in speech recognition. The second part, P(W), represents the probability that this word sequence occurs; this is the language model (Language Model, LM) in speech recognition.
2. Acoustic model (Acoustic Model, AM)
The acoustic model can be understood as modeling the production of sound: it converts the input speech into an acoustic representation as output, or more precisely, gives the probability that the speech belongs to some acoustic symbol.
In English, this acoustic symbol can be a syllable (syllable) or a smaller-granularity phoneme (phoneme); in Chinese, it can likewise be an initial or final, or a unit as small as an English phoneme. The acoustic model in formula (3) can thus be expressed in the form of formula (4):
P(Y|W) = ΣQ P(Y|Q) P(Q|W) (4)
where Q denotes the sequence of pronunciation units. It can be seen from the formula that the acoustic model ultimately decomposes into a model from speech to pronunciation sequence and a dictionary from pronunciation sequence to the output word sequence. The pronunciation sequence here is usually phonemes, so the acoustic model so far is a description from speech to phoneme states. To distinguish phonemes in different contexts, context-dependent "triphones" are usually used as the modeling unit, as shown in Fig. 8.
The dictionary part is expressed as formula (5), whose meaning is that each word is split into a sequence of pronunciation symbols:
P(Q|W) = ∏i P(qi|wi) (5)
The acoustic part of formula (4) can be further decomposed into formula (6):
P(Y|Q) = Σθ P(Y|θ) P(θ|Q), where θ = θ0, ..., θT+1 is a state sequence (6)
Formula (6) shows that the granularity of acoustic modeling can be further decomposed into smaller states (state). Usually each triphone corresponds to 3 states (silence usually has 5), so the total number of acoustic modeling states is as large as 3*Q³ + 5, where Q is the number of phones. As a technique to compress the number of modeling units, state tying is widely used: states with similar pronunciation are represented by one model, reducing the number of parameters. State tying can use rules hand-written by experts or a data-driven approach. The specific tying form is shown in Fig. 9.
Based on the above derivation, the acoustic model is a model describing the conversion between speech and states.
At this point an HMM is introduced: the states are assumed to be hidden variables, the speech is the observation, and the transitions between states satisfy the Markov assumption. The acoustic model can then be further expressed as:
P(Y|θ) = ∏t a(θt−1 → θt) · b(yt | θt)
where a denotes the transition probability and b denotes the emission probability. Represented graphically, this gives the structure shown in Fig. 10.
As shown in Fig. 10, the observation probability is usually described with a GMM or DNN. This yields the speech recognition acoustic models of the CD-GMM-HMM architecture and the CD-DNN-HMM architecture. The CD-DNN-HMM architecture is represented in Fig. 11.
As shown in Fig. 12, the speech synthesis system includes modules such as text analysis, a duration model, an acoustic model and a vocoder. For an input text, text analysis is first performed to obtain phone-level context-dependent linguistic features; then, according to the prediction result of the duration model, these are expanded into frame-level features used as the input of the acoustic model; finally, the acoustic feature parameters output by the acoustic model are fed into the vocoder, which outputs the speech waveform. Here the acoustic and duration modeling is done with LSTMs, and the strategies of delayed output and frame-skipping output are used, which can effectively reduce the amount of computation while improving the modeling effect. The CBHG network introduced in the Tacotron system post-processes the output of the LSTM layers to guarantee the smoothness of the acoustic feature parameters. In addition, a generative adversarial network is also used to obtain more natural synthesized speech.
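The four-stage pipeline of Fig. 12 can be sketched with stub functions to show the data flow from text to waveform; every stage here is a placeholder standing in for a trained model, and the fixed 3-frames-per-phone duration is an assumption for illustration:

```python
# Hedged sketch of the synthesis pipeline: text analysis -> duration model
# -> acoustic model -> vocoder. All stages are stubs, not trained models.

def text_analysis(text):
    # One context-dependent linguistic feature record per phone (stubbed).
    return [{"phone": ch, "pos": i} for i, ch in enumerate(text)]

def duration_model(phone_feats, frames_per_phone=3):
    # Expand phone-level features to frame level per predicted durations.
    return [f for f in phone_feats for _ in range(frames_per_phone)]

def acoustic_model(frame_feats):
    # Map each frame's linguistic features to acoustic parameters (stubbed).
    return [[float(f["pos"])] for f in frame_feats]

def vocoder(acoustic_params):
    # Turn acoustic parameters into waveform samples (stubbed).
    return [p[0] * 0.1 for p in acoustic_params]

waveform = vocoder(acoustic_model(duration_model(text_analysis("ab"))))
print(len(waveform))   # 6: 2 phones x 3 frames each
```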
Statistical parameter modeling based on LSTM
A long short-term memory (Long Short Term Memory, LSTM) network has powerful sequence-modeling ability, and the bidirectional LSTM is widely used because it can fully consider the contextual information of a sequence. In the speech synthesis task, considering the needs of streaming processing, a unidirectional LSTM network is used here. To better abstract the input features, two fully connected layers are added before the LSTM layers.
To allow the unidirectional LSTM network to also observe following input information, a delayed-output strategy is adopted: the output of the first frame is produced only after the input information of several subsequent frames has been obtained. In addition, to reduce computation, a frame-skipping output strategy is used: for every N consecutive frames, only the input of the last frame needs to be given, and the output sequence of all N frames can be obtained, effectively reducing the amount of computation. Unlike a BLSTM, an LSTM model cannot output smooth characteristic parameters; using an RNN instead of a fully connected output layer gives a smoother characteristic parameter output, as shown in Fig. 13.
CBHG network
With an RNN as the output layer, smooth spectral parameters can be generated, but the fundamental-frequency parameters still have obvious smoothness problems. To solve this, the CBHG (1-D convolution bank + highway network + bidirectional GRU) network used in the Tacotron system is introduced here. The CBHG network structure, shown in Fig. 14, consists of a bank of one-dimensional convolution filters, plus a highway network and a bidirectional GRU network. CBHG is a very powerful network, often used to extract representations of sequences. Adding a CBHG network after the LSTM network can effectively alleviate the roughness of the output characteristic parameters and further improve the prediction accuracy of the model.
Generative adversarial network
A generative adversarial network (Generative Adversarial Network, GAN), as a powerful generative model, has been successfully applied to image generation and some other fields. The structure of the GAN, shown in Fig. 15, consists of a generator G and a discriminator D. G acts as the acoustic model in the parametric synthesis system, and its goal is to generate characteristic parameters close to natural speech; the role of D is to assess the similarity between the acoustic features output by G and the true acoustic features, and to pass this back to G through gradients, so that the generator network is adjusted to make its output acoustic features closer to natural speech. Using the GAN network structure can effectively relieve the over-smoothing of acoustic feature parameters brought by parametric synthesis, making the synthesized speech more realistic and natural. The generator of a traditional GAN takes random noise as input; here the input is linguistic information. The loss function of G also adds, on top of the traditional loss function, the mean squared error between the output acoustic features and the actual acoustic features. In the training stage, G and D are cross-trained: in each iteration, the parameters of D are first fixed and G is trained; then the parameters of G are fixed and D is trained.
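The alternating (cross-)training schedule described above can be sketched with stand-in models that merely record their updates; no real GAN is trained here, and the `Model` class is an invented placeholder:

```python
# Hedged sketch of GAN cross-training: each iteration freezes D and updates
# G, then freezes G and updates D. Models are stand-ins that count updates.

class Model:
    def __init__(self, name):
        self.name, self.updates, self.frozen = name, 0, False

    def update(self):
        # A real step would apply gradients; here we only count.
        if not self.frozen:
            self.updates += 1

def cross_train(gen, disc, iterations):
    for _ in range(iterations):
        disc.frozen, gen.frozen = True, False   # train G against fixed D
        gen.update()
        gen.frozen, disc.frozen = True, False   # train D against fixed G
        disc.update()

G, D = Model("G"), Model("D")
cross_train(G, D, iterations=5)
print(G.updates, D.updates)   # 5 5
```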
Multi-speaker, multilingual hybrid modeling
A traditional speech synthesis system trains a separate model for each speaker of each language. Considering the powerful modeling ability of the LSTM, a single model can be used here to model multiple languages and multiple speakers. For simplicity, the linguistic feature sets of different languages are used as the input features of the model, and language labels and speaker labels are added to distinguish different languages and speakers. Mandarin can thus be randomly converted into a variety of different dialects to increase confidentiality, as shown in Fig. 16.
In addition, an embodiment of the present invention also proposes a communication encryption system. The communication encryption system includes: a speech recognition system, a speech synthesis system, a memory, a processor, and a communication encryption program stored in the memory. When called by the processor, the communication encryption program executes the steps of the communication encryption method described above.
In addition, an embodiment of the present invention also proposes a storage medium on which a communication encryption program is stored. When called by a processor, the communication encryption program executes the steps of the communication encryption method described above.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (10)

1. A communication encryption method, characterized in that the method is applied to a communication encryption system, the communication encryption system comprising a speech recognition system and a speech synthesis system, the communication encryption method comprising the following steps:
a transmitting end obtains a Mandarin speech signal, recognizes the Mandarin speech signal by the speech recognition system of the transmitting end and converts it into standard text, and then converts the standard text into a dialect waveform for output by the speech synthesis system of the transmitting end;
a receiving end obtains the dialect speech signal output by the transmitting end, recognizes the dialect speech signal by the speech recognition system of the receiving end and converts it into standard text, and then converts the standard text into a Mandarin waveform for output by the speech synthesis system of the receiving end.
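The end-to-end flow of claim 1 can be sketched with stand-in recognizers and synthesizers. This is a toy illustration only: a real system would use the trained ASR/TTS components of claims 2-7, and all mappings below are hypothetical placeholders.

```python
# Toy sketch of claim 1's flow: Mandarin speech -> text -> dialect waveform
# at the transmitting end, and the inverse at the receiving end. The
# dictionary "models" are illustrative stand-ins, not real ASR/TTS.
def recognize(signal, lexicon):
    """Stand-in speech recognition: map a 'waveform' token back to text."""
    return lexicon[signal]

def synthesize(text, voices):
    """Stand-in speech synthesis: map text to a 'waveform' token."""
    return voices[text]

# Illustrative codebooks standing in for trained models.
MANDARIN_ASR = {"mandarin_wave:hello": "hello"}
DIALECT_TTS  = {"hello": "dialect_wave:hello"}
DIALECT_ASR  = {"dialect_wave:hello": "hello"}
MANDARIN_TTS = {"hello": "mandarin_wave:hello"}

def transmitting_end(mandarin_signal):
    text = recognize(mandarin_signal, MANDARIN_ASR)  # Mandarin -> text
    return synthesize(text, DIALECT_TTS)             # text -> dialect waveform

def receiving_end(dialect_signal):
    text = recognize(dialect_signal, DIALECT_ASR)    # dialect -> text
    return synthesize(text, MANDARIN_TTS)            # text -> Mandarin waveform

sent = transmitting_end("mandarin_wave:hello")
restored = receiving_end(sent)
```

An eavesdropper on the channel only observes the dialect waveform `sent`, which is the source of the claimed confidentiality.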
2. The communication encryption method according to claim 1, characterized in that the step of the speech recognition system recognizing a speech signal comprises:
the speech recognition system performs acoustic feature extraction on the speech signal through a front end to obtain a speech-frame vector matrix of the speech signal;
the speech recognition system decodes the speech-frame vector matrix through a back-end decoder in combination with an acoustic model, a dictionary and/or a language model, and obtains a speech recognition result.
3. The communication encryption method according to claim 2, characterized in that the step of decoding the speech-frame vector matrix to obtain a speech recognition result comprises:
recognizing the frames in the speech-frame vectors as states;
combining the states into phonemes;
combining the phonemes into words to obtain the speech recognition result.
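The frame-to-state-to-phoneme-to-word assembly of claim 3 can be sketched as below. A real decoder searches an HMM or WFST lattice; this toy version uses fixed lookup tables, and every table entry is an assumed illustration.

```python
# Toy sketch of claim 3's decode step: collapse per-frame state labels,
# group states into phonemes, and look the phoneme sequence up in a
# dictionary to form a word. Tables are illustrative assumptions.
from itertools import groupby

STATE_TO_PHONE = {("h1", "h2", "h3"): "h", ("i1", "i2", "i3"): "i"}  # assumed
PHONES_TO_WORD = {("h", "i"): "hi"}                                  # assumed

def decode(frame_states):
    # 1. Frames -> states: collapse consecutive identical frame labels.
    states = [s for s, _ in groupby(frame_states)]
    # 2. States -> phonemes: each phoneme is a fixed 3-state sequence here.
    phones = [STATE_TO_PHONE[tuple(states[i:i + 3])]
              for i in range(0, len(states), 3)]
    # 3. Phonemes -> words: dictionary lookup over the phoneme sequence.
    return PHONES_TO_WORD[tuple(phones)]

frames = ["h1", "h1", "h2", "h3", "h3", "i1", "i2", "i2", "i3"]
result = decode(frames)
```

The three list operations map one-to-one onto the three sub-steps of the claim.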
4. The communication encryption method according to claim 2, characterized in that, before the step of the speech recognition system recognizing the speech signal, the method further comprises:
performing silence removal on the speech signal.
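The silence removal of claim 4 is typically an energy-based voice activity detection. The sketch below is one plausible implementation; the frame size and threshold are assumptions, not taken from the patent.

```python
import numpy as np

# Simple energy-threshold silence removal: drop frames whose mean-square
# energy falls below a threshold. Frame size and threshold are illustrative.
def remove_silence(x, frame=160, thresh=0.01):
    n = len(x) // frame
    kept = [x[i * frame:(i + 1) * frame] for i in range(n)
            if np.mean(x[i * frame:(i + 1) * frame] ** 2) > thresh]
    return np.concatenate(kept) if kept else np.array([])

speech  = 0.5 * np.sin(np.linspace(0, 100, 1600))  # "voiced" segment
silence = np.zeros(1600)                           # silent segment
trimmed = remove_silence(np.concatenate([silence, speech, silence]))
```

Running this before recognition shortens the input and avoids spending decoder effort on non-speech frames.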
5. The communication encryption method according to claim 1, characterized in that the step of the speech synthesis system converting standard text into a dialect waveform or Mandarin waveform for output comprises:
the speech synthesis system performs text analysis on the input standard text to obtain phone-level context-dependent linguistic features;
based on the phone-level context-dependent linguistic features, expanding them into frame-level features according to the prediction result of a duration model, as the input to an acoustic model;
feeding the acoustic feature parameters output by the acoustic model into a vocoder, and outputting a speech waveform by the vocoder, the speech waveform comprising at least a dialect waveform or a Mandarin waveform.
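The duration-based expansion step of claim 5 amounts to repeating each phone-level feature vector for the number of frames the duration model predicts. A minimal sketch (with invented feature values and durations):

```python
import numpy as np

# Sketch of claim 5's expansion step: repeat each phone-level context
# feature vector for the predicted number of frames, producing the
# frame-level input to the acoustic model. Values are illustrative.
def expand_to_frame_level(phone_feats, durations):
    """phone_feats: (n_phones, dim); durations: frames per phone."""
    return np.repeat(phone_feats, durations, axis=0)

phone_feats = np.array([[1.0, 0.0],   # phone-level context-dependent features
                        [0.0, 1.0],
                        [1.0, 1.0]])
durations = np.array([3, 5, 2])       # duration model's frame counts per phone
frame_feats = expand_to_frame_level(phone_feats, durations)  # shape (10, 2)
```

The acoustic model then maps each of these frame-level vectors to acoustic feature parameters for the vocoder.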
6. The communication encryption method according to claim 5, characterized in that, before the step of the speech synthesis system converting standard text into a dialect waveform or Mandarin waveform for output, the method further comprises:
obtaining an acoustic model by LSTM-based statistical parametric modeling combined with a CBHG network, or by generative adversarial network modeling.
7. The communication encryption method according to claim 5, characterized in that, before the step of the speech synthesis system converting standard text into a dialect waveform or Mandarin waveform for output, the method further comprises:
obtaining an acoustic model by multi-speaker, multilingual hybrid modeling.
8. The communication encryption method according to claim 1, characterized in that, before the step of recognizing the Mandarin speech signal by the speech recognition system of the transmitting end and converting it into standard text, the method further comprises:
performing a conventional encryption on the Mandarin speech signal by means of modulation-and-carrier and software encryption techniques.
9. A communication encryption system, characterized in that the communication encryption system comprises: a speech recognition system, a speech synthesis system, a memory, a processor, and a communication encryption program stored on the memory, wherein the communication encryption program, when called by the processor, executes the steps of the communication encryption method according to any one of claims 1-8.
10. A storage medium, characterized in that a communication encryption program is stored on the storage medium, wherein the communication encryption program, when called by a processor, executes the steps of the communication encryption method according to any one of claims 1-8.
CN201910243820.6A 2019-03-28 2019-03-28 Communication encryption method, system and computer readable storage medium Pending CN109859737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910243820.6A CN109859737A (en) 2019-03-28 2019-03-28 Communication encryption method, system and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN109859737A true CN109859737A (en) 2019-06-07

Family

ID=66902366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910243820.6A Pending CN109859737A (en) 2019-03-28 2019-03-28 Communication encryption method, system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109859737A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000112488A (en) * 1998-09-30 2000-04-21 Fujitsu General Ltd Voice converting device
CN101950498A (en) * 2010-08-25 2011-01-19 赵洪鑫 Spoken Chinese encryption method for privacy protection
CN103035251A (en) * 2011-09-30 2013-04-10 西门子公司 Method for building voice transformation model and method and system for voice transformation
US20130110511A1 (en) * 2011-10-31 2013-05-02 Telcordia Technologies, Inc. System, Method and Program for Customized Voice Communication
CN105551480A (en) * 2015-12-18 2016-05-04 百度在线网络技术(北京)有限公司 Dialect conversion method and device
CN109285535A (en) * 2018-10-11 2019-01-29 四川长虹电器股份有限公司 Phoneme synthesizing method based on Front-end Design


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ANDREW ROSENBERG et al.: "MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS", 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
TAKUHIRO KANEKO et al.: "GENERATIVE ADVERSARIAL NETWORK-BASED POSTFILTER FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS", 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
LIU Yingtong: "Research on End-to-End Speech Encryption and Decryption Schemes for Mobile Communication", China Master's Theses Full-text Database, Information Science and Technology *
XU Zongchang: "Application of Graphics and Multimedia Technology in Equipment IETM", 31 October 2015, National Defense Industry Press *
ZHAO Li: "Speech Signal Processing (2nd Edition)", 31 May 2009, China Machine Press *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197655A (en) * 2019-06-28 2019-09-03 百度在线网络技术(北京)有限公司 Method and apparatus for synthesizing voice
CN110400560A (en) * 2019-07-24 2019-11-01 北京明略软件系统有限公司 Data processing method and device, storage medium, electronic device
CN110400560B (en) * 2019-07-24 2022-10-18 北京明略软件系统有限公司 Data processing method and device, storage medium and electronic device
CN110827803A (en) * 2019-11-11 2020-02-21 广州国音智能科技有限公司 Method, device and equipment for constructing dialect pronunciation dictionary and readable storage medium
WO2021098675A1 (en) * 2019-11-20 2021-05-27 维沃移动通信有限公司 Interaction method and electronic device
CN111354343A (en) * 2020-03-09 2020-06-30 北京声智科技有限公司 Voice wake-up model generation method and device and electronic equipment
CN111354343B (en) * 2020-03-09 2024-03-05 北京声智科技有限公司 Voice wake-up model generation method and device and electronic equipment
CN111899719A (en) * 2020-07-30 2020-11-06 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating audio
CN112435666A (en) * 2020-09-30 2021-03-02 远传融创(杭州)科技有限公司 Intelligent voice digital communication method based on deep learning model
CN112581929A (en) * 2020-12-11 2021-03-30 山东省计算中心(国家超级计算济南中心) Voice privacy density masking signal generation method and system based on generation countermeasure network
CN114490963A (en) * 2021-12-17 2022-05-13 中国人民解放军空军军医大学 All-media publishing system
CN114490963B (en) * 2021-12-17 2023-11-24 中国人民解放军空军军医大学 Full-media publishing system

Similar Documents

Publication Publication Date Title
CN109859737A (en) Communication encryption method, system and computer readable storage medium
O’Shaughnessy Automatic speech recognition: History, methods and challenges
EP3523796A1 (en) Speech synthesis
US20160086599A1 (en) Speech Recognition Model Construction Method, Speech Recognition Method, Computer System, Speech Recognition Apparatus, Program, and Recording Medium
CN112151005B (en) Chinese and English mixed speech synthesis method and device
WO2007114605A1 (en) Acoustic model adaptation methods based on pronunciation variability analysis for enhancing the recognition of voice of non-native speaker and apparatuses thereof
KR20230056741A (en) Synthetic Data Augmentation Using Voice Transformation and Speech Recognition Models
CN113724718B (en) Target audio output method, device and system
US6502073B1 (en) Low data transmission rate and intelligible speech communication
JP6330069B2 (en) Multi-stream spectral representation for statistical parametric speech synthesis
Akila et al. Isolated Tamil word speech recognition system using HTK
Banerjee et al. Application of triphone clustering in acoustic modeling for continuous speech recognition in Bengali
Sharma et al. Soft-Computational Techniques and Spectro-Temporal Features for Telephonic Speech Recognition: an overview and review of current state of the art
Daqrouq et al. Wavelet lpc with neural network for spoken arabic digits recognition system
Ijima et al. Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis.
Syed et al. Concatenative Resynthesis with Improved Training Signals for Speech Enhancement.
Atal et al. Speech research directions
Ninh A speaker-adaptive hmm-based vietnamese text-to-speech system
Tunalı A speaker dependent, large vocabulary, isolated word speech recognition system for turkish
Sai et al. Enhancing pitch robustness of speech recognition system through spectral smoothing
Pitrelli et al. Expressive speech synthesis using American English ToBI: questions and contrastive emphasis
Chen et al. Low-resource spoken keyword search strategies in georgian inspired by distinctive feature theory
Khalifa et al. Statistical modeling for speech recognition
Govender et al. The CSTR entry to the 2018 Blizzard Challenge
Besbes et al. Wavelet packet energy and entropy features for classification of stressed speech

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190607