WO2009074711A1 - Speech data encryption and decryption - Google Patents

Speech data encryption and decryption

Info

Publication number
WO2009074711A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
speech data
parameters
decryption
encrypted
Prior art date
Application number
PCT/FI2007/050685
Other languages
French (fr)
Inventor
Jani Nurminen
Sakari Himanen
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation
Priority to PCT/FI2007/050685
Publication of WO2009074711A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04K: SECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K1/00: Secret communication
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Definitions

  • the invention relates to arranging encryption and decryption of speech data.
  • a user specific encryption key is typically established for each call at a mobile terminal and at a base station serving the mobile terminal.
  • channel encoding is typically performed for the speech data.
  • the output of the channel encoder is encrypted.
  • the encrypted data transmitted from the mobile terminal is then deciphered at the base station.
  • the base station may pass the deciphered speech data to further transmission, for instance towards a Public Switched Telephone Network (PSTN) so that the intended recipient may receive the call.
  • FIG. 2 illustrates an apparatus according to an embodiment
  • Figure 3a illustrates speech encoding arrangement according to an embodiment
  • Figure 3b illustrates speech decoding arrangement according to an embodiment
  • Figure 4a illustrates a method according to an embodiment
  • Figure 4b illustrates a method according to an embodiment
  • Figure 5 illustrates a method according to an embodiment
  • Figure 6 illustrates a method according to an embodiment
  • Figure 7 illustrates examples of decoded speech data.
  • Figure 1 shows, in a simplified form, conventional transmission chain and reception chain for speech, for instance in a digital mobile terminal or system.
  • the transmit chain comprises a voice input, such as a microphone, an analog to digital (A/D) converter, a voice coder, a channel encoder, an encryption unit, a frame formatting unit, a modulator, and a transmitter.
  • the voice coder outputs coded speech frames, which comprise a plurality of speech bits.
  • the speech frames are input to the channel encoder for channel encoding using error correction and/or error detection codes.
  • the purpose of the error coding is to enable the receiver to detect and correct bit errors that may occur during transmission.
  • the channel encoder implements the encoding method used in a particular communications system.
  • the speech frames may then be protected from interception by encryption.
  • the speech frames are formatted for transmission by a frame formatting unit.
  • the frame formatting unit interleaves encrypted bits in the speech frames from the encryption unit, adds necessary synchronization and training bits, and assigns the resulting formatted bits to appropriate timeslots of a frame structure, for instance.
  • the modulator modulates the resulting formatted bits to an assigned radio frequency carrier for transmission by the transmitter.
  • the receive chain comprises a receiver, a demodulator, a frame capturing unit, a decryption unit, a channel decoder, a voice decoder, a digital to analog (D/A) converter, and a speaker.
  • the receiver receives the speech signals and converts the signals into a suitable format for digital processing.
  • the demodulator demodulates the received signal and provides the demodulated bits to the frame capturing unit, which performs error detection and de-interleaving.
  • the decryption unit decrypts demodulated bits received from the frame capturing unit.
  • the decrypted bits output from decryption unit are decoded by the channel decoder.
  • the decoded bits are then provided to the voice decoder.
  • FIG. 2 illustrates a simplified block diagram of an apparatus 20 according to an embodiment.
  • the apparatus comprises a unit 24 for encoding and encrypting digitized speech data 22 and a unit 30 for decrypting and decoding encrypted and encoded speech data 32.
  • the speech coding and encryption unit, or speech coder, 24 is configurable for speech data encryption.
  • the speech coding and encryption unit 24 is configured to define one or more encryption parameters for encrypting speech data before end of a speech data compression process.
  • the speech encoding and encryption unit 24 is configured to encrypt speech data 22 before end of the speech data compression process on the basis of the encryption parameters.
  • the coder 24 outputs encoded (compressed) and encrypted speech data 26 for further actions, for instance for storage or channel coding.
  • the speech decoding and decryption unit, or speech decoder, 30 is configurable for speech data decryption.
  • the speech decoding and decryption unit 30 is configured to define one or more decryption parameters for decrypting encrypted speech data before end of a speech data decompression process.
  • the speech decoding and decryption unit 30 is configured to decrypt encrypted and encoded speech data 32, obtained from a channel decoder or memory, for instance, before end of the speech data decompression process on the basis of the decryption parameters.
  • the speech decoding and decryption unit 30 outputs decrypted and decoded speech data 34, which may be provided for D/A conversion and further to speaker, or for further digital processing.
  • the speech data (de)compression process is to be understood broadly to refer to any speech (de)compressing method or procedure for (de)compressing (compressed) speech data, and providing (de)compressed speech data as output of the process.
  • another process such as channel coding or storing to memory (or D/A conversion after decompression) may be started.
  • speech data encryption and decryption can be made a seamless part of the encoding and decoding of speech data and encrypting/encoding and decryption/decoding can be considered parallel processes.
  • no separate encryption and decryption units and processes are required.
  • decoding of the bit stream into the correct speech signal can be made very difficult for outsiders. This is very beneficial in situations where the speech data itself is considered valuable or otherwise worth protecting.
  • An example of such a use case is the compression of text-to-speech (TTS) databases because it is desirable to keep the text-to-speech databases secret.
  • Although the apparatus 20 has been depicted as one entity, different modules and memory may be implemented in one or more physical or logical entities.
  • Although the encoder/encryption unit 24 and the decoder/decryption unit 30 are functionally separated in Figure 2, these functions could be implemented in a single unit or module. Further, there could be an apparatus implementing only one of these units 24, 30.
  • Embodiments of an apparatus comprising the speech coder 24 and/or decoder 30 are illustrated below in connection with Figures 3a, 3b, 4a, 4b, 5, and 6. It should be appreciated that the apparatus may comprise other units used in or for further storing, transmitting/receiving, or further processing speech data. However, they are irrelevant to the present embodiments and, therefore, need not be discussed in more detail here.
  • the apparatus may be any data processing device performing speech encoding and/or decoding. Examples of a personal data processing device include a personal computer, an entertainment device such as a game console, a laptop, a personal digital assistant, an embedded computing device or a mobile station (mobile phone).
  • the apparatus may also be a network element device serving a plurality of user devices, for instance a base station, a base station controller, or a radio network controller.
  • a wireless connection may be implemented with a wireless transceiver operating according to the GSM (Global System for Mobile Communications), WCDMA (Wideband Code Division Multiple Access), WLAN (Wireless Local Area Network) or Bluetooth® standard, or any other suitable standard/non-standard wireless communication means.
  • the apparatus could be in a form of a chip unit or some other kind of hardware module for controlling a data processing device.
  • Such a hardware module comprises connecting means for connecting to the data processing device mechanically and/or functionally.
  • the hardware module may form part of the device and could be removable.
  • Some examples of such hardware module are a sub-assembly or an accessory device, for instance a specific speech coder and/or decoder unit.
  • Apparatuses comprise not only prior art means, but also means for arranging encryption of speech data during encoding process and/or decryption of encrypted speech data during decoding process. In particular, means may be provided for arranging the features illustrated in connection with Figures 3a, 3b, 4a, 4b, 5, and 6.
  • the apparatus may be implemented as an electronic digital computer, which may comprise memory, a central processing unit (CPU), and a system clock.
  • the CPU may comprise a set of registers, an arithmetic logic unit, and a control unit.
  • the control unit is controlled by a sequence of program instructions transferred to the CPU from the memory.
  • the control unit may contain a number of microinstructions for basic operations.
  • the implementation of microinstructions may vary, depending on the CPU design.
  • the program instructions may be coded by a programming language, which may be a high-level programming language, such as C, Java, etc., or a low-level programming language, such as a machine language, or an assembler.
  • the electronic digital computer may also have an operating system, which may provide system services to a computer program written with the program instructions.
  • An embodiment provides a computer program embodied on a distribution medium, comprising program instructions which, when loaded into an electronic apparatus, constitute the units 24 and/or 30.
  • the computer program may be in source code form, object code form, or in some intermediate form, and it may be stored in some carrier, which may be any entity or device capable of carrying the program.
  • Such carriers include a record medium, computer memory, read-only memory, electrical carrier signal, telecommunications signal, and software distribution package, for example.
  • the encryption/decryption features are implemented by a program executed on top of an operating system, such as the Symbian operating system for mobile devices.
  • at least some of the currently described features are implemented as part of text-to-speech software.
  • the unit 24 enabling parallel speech encoding and encryption and/or the unit 30 enabling parallel decoding and decryption may also be implemented as one or more integrated circuits, such as application-specific integrated circuits (ASICs).
  • Other hardware embodiments are also feasible, such as a circuit built of separate logic components. A hybrid of these different implementations is also feasible.
  • FIG. 3a illustrates a speech encoding and encryption unit, or speech coder, 50 according to an embodiment.
  • Digitized speech 52 is input to a speech analysis process 54 to produce a set of speech parameters 56. It is to be noted that in some speech coding implementations, such as direct waveform coding, there may be no specific speech analysis 54.
  • Each speech parameter is quantized 58.
  • the quantization 58 is configured by control data 60.
  • the control data 60 may be any data for controlling the quantization of the speech parameters 56. For instance, applied codebook content, size, structure, etc. may be controlled. This control data 60 may be exchanged between the quantization module 58 in the encoder 50 and the dequantization module 94 in the decoder 80 during initialization of the encoding process.
  • the set of vectors 62 from the quantization 58 is then subjected to encryption 64.
  • the quantized speech vectors are encrypted by applying encryption parameters 66, the definition of which is illustrated in more detail below.
  • As an output of the encryption 64 a set of encrypted indices 68 is obtained and further provided for bit stream generation 70 to finalize the speech encoding and compression process.
  • the output 72 of the bit stream generation procedure 70 is a bit stream including encoded and encrypted speech data, which is the output of the speech encryption and encoding unit 50.
  • the output 72 may be provided for further units and procedures in the data processing device including the encoder 50, such as for a channel coding unit or for storage.
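The encoding and encryption pipeline of Figure 3a (analysis parameters 56, quantization 58 into indices 62, encryption 64 into encrypted indices 68, bit stream generation 70) could be sketched as follows. This is a minimal illustrative sketch in Python: the toy codebook, the function names, and the XOR-based encryption are assumptions for demonstration, not the codec specified by the embodiment.

```python
def quantize(params, codebook):
    """Quantization 58: map each speech parameter to the index of its
    nearest codebook entry (codebook plays the role of control data 60)."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - p))
            for p in params]

def encrypt_indices(indices, key, index_bits=8):
    """Encryption 64: XOR each quantization index with a scrambling
    parameter 66 (hypothetical scheme), keeping index_bits bits."""
    mask = (1 << index_bits) - 1
    return [(i ^ key) & mask for i in indices]

def pack_bitstream(indices):
    """Bit stream generation 70: serialize encrypted indices 68 into bytes."""
    return bytes(indices)

codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]          # toy codebook (control data 60)
params = [0.1, -0.6, 0.9]                        # toy speech parameters 56
indices = quantize(params, codebook)             # quantized indices 62
encrypted = encrypt_indices(indices, key=0x5A)   # encrypted indices 68
stream = pack_bitstream(encrypted)               # output bit stream 72
```

The stream is then handed to storage or channel coding exactly as the output 72 described above.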
  • Figure 3b illustrates a data decryption and decoding unit, or decoder, 80 according to an embodiment.
  • Encoded and encrypted speech bit stream 82 is applied as an input to the unit 80.
  • the bit stream is parsed 84.
  • the output of the parsing 84 is a set of encrypted indices 86 submitted for decryption 90.
  • the decryption 90 applies decryption parameter(s) 88 for decrypting the encrypted indices 86.
  • the decryption parameters 88 are formed similarly to the encryption parameters 66.
  • the encryption module 64 and the decryption module 90 share the same or at least very closely related data for encryption and decryption, i.e. symmetric cryptography is applied.
  • the decrypted indices 92 are dequantized 94, by using predefined dequantization control data 96.
  • the dequantization 94 may be part of the decryption process.
  • As an output of the dequantization 94 a set of decoded and decrypted speech parameters 98 is obtained. These speech parameters 98 are submitted to a speech synthesis 100 to finalize the speech decoding and decompression process.
  • the speech synthesis provides decoded (and decrypted) speech data 102 as an output of the decoder 80.
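Correspondingly, the decoding side of Figure 3b (bit stream parsing 84, decryption 90, dequantization 94) might be sketched as the inverse of the encoder, under the same illustrative assumptions (toy codebook, XOR-based symmetric scheme, hypothetical names):

```python
def parse_bitstream(stream):
    """Parsing 84: recover the encrypted indices 86 from the bit stream 82."""
    return list(stream)

def decrypt_indices(indices, key, index_bits=8):
    """Decryption 90: XOR is its own inverse, so the same key that
    scrambled the indices descrambles them (symmetric cryptography)."""
    mask = (1 << index_bits) - 1
    return [(i ^ key) & mask for i in indices]

def dequantize(indices, codebook):
    """Dequantization 94: map decrypted indices 92 back to speech
    parameters 98 using the shared control data 96."""
    return [codebook[i] for i in indices]

codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]   # must match the encoder's configuration data
stream = bytes([88, 91, 94])             # toy encoded and encrypted bit stream 82
decrypted = decrypt_indices(parse_bitstream(stream), key=0x5A)
params = dequantize(decrypted, codebook) # decoded speech parameters 98
```

With a wrong key or a wrong codebook the dequantization still produces parameter values, but not the original ones, which mirrors the "speech-like but incorrect" behaviour described for Figure 7.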
  • In the embodiments, three different pieces of data may be separated: the configuration data, the encryption/decryption parameter(s), and the encoded bit stream.
  • the configuration data may depend on the data to be encoded and it is generated before the encoding.
  • the configuration data comprises the control data 60 controlling quantization and the control data 96 controlling dequantization.
  • control data may be specific rules for mapping speech parameters to indices, for instance.
  • code books may be exchanged, and the code book content, code book size, structure, etc. may be varied.
  • part of the encryption can be considered to occur during quantization (or initialisation), and in principle even the entire encryption process could be implemented as part of the quantization.
  • quantization can provide at least part of the encryption process in speech coder and dequantization at least part of decryption process in speech decoder, there could be a single block for encryption (and quantization) in Figure 3a and a single block for decryption (and dequantization) in Figure 3b. However, it is not necessary to adapt quantization for encryption purposes.
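One way the quantization itself could carry part of the encryption, as suggested above, is to permute the codebook order with a secret seed shared between coder and decoder. The sketch below is an illustrative assumption (the embodiment does not mandate this particular scheme), using Python's deterministic `random.Random` as the shared generator:

```python
import random

def permuted_codebook(codebook, secret_seed):
    """Reorder codebook entries with a secret seed shared between the
    coder and the decoder; without the seed, quantization indices map
    to wrong parameter values, so decoding yields incorrect speech."""
    rng = random.Random(secret_seed)
    perm = list(range(len(codebook)))
    rng.shuffle(perm)
    return [codebook[i] for i in perm]

base = [0.0, 0.25, 0.5, 0.75, 1.0]       # toy codebook content
secret = permuted_codebook(base, 1234)   # same seed gives the same order on both sides
```

The permutation is part of the configuration data exchanged during initialization; an outsider holding only the bit stream sees valid indices but lacks the index-to-value mapping.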
  • Parameter(s) used in the encryption/decryption, such as time-varying scrambling parameter(s).
  • These parameter(s) may be provided for encryption 64 and decryption 90 by inputs 66 and 88, respectively.
  • the generation of the encryption/decryption parameters is not limited to any particular method. It is possible to apply any parameter agreed between or similarly generated by the coder and the decoder.
  • One embodiment for generating scrambling parameters by applying a random number generator is illustrated below.
  • the bit stream contains the compressed speech data as encrypted.
  • the bit stream can be decoded using the decoder even without any additional data but the result will be correct only if the correct configuration data and correct decryption parameter(s) are available and used in a correct way during the decoding.
  • Figure 4a illustrates a method for encoding and encrypting speech data according to an embodiment.
  • configuration data is shared with a decoder before processing any bit streams.
  • the coder 50 illustrated in Figure 3a may share quantization control data 60 with the decoding and decryption unit 80 of Figure 3b. It is to be noted that the step 400 may be completely separate from the actual encoding/decoding processes.
  • the configuration data may be conveyed between the coder (transmitter) and the decoder (receiver) by utilizing separate transmission means, for instance in a separate e-mail, or the data may be pre-configured in the apparatuses.
  • In step 402 the encoding is configured on the basis of the configuration data retrieved locally from a memory of an apparatus implementing the method. This step may be performed during initialization of speech coding, or the coder may have been configured already earlier after entering of the configuration data, for instance.
  • In step 404, when there is a need to compress and encrypt speech data, scrambling parameters are defined.
  • the scrambling parameters may be pre-stored and retrieved in step 404 from memory of the apparatus implementing the method. In another embodiment the parameters are generated in step 404.
  • scrambling parameters are transferred between the coder and the decoder.
  • the parameter(s) used in the scrambling may have also been conveyed to the decoder separately, not inside the bit stream and not together with the configuration data.
  • the parameter data could be sent as a text message.
  • both the coder and the decoder define their scrambling parameters independently of each other.
  • the definition of the encryption and decryption parameters is to be understood broadly to cover various embodiments of arranging the encryption parameters for use in the coder. For instance, some or all of the encryption and/or decryption parameters could be defined even on the basis of a user input, such as a specific word inputted by the user.
  • In step 406 speech data is scrambled by utilizing the one or more scrambling parameters as an encryption key.
  • one or more remaining speech encoding functions may be performed after step 406, such as the bit stream generation 70 of Figure 3a. After these remaining functions the speech compression and coding process ends and the coder provides as output encoded and encrypted speech data.
  • Figure 4b illustrates a speech data decoding and decryption method according to an embodiment. This method may be applied in a decoder configured to decode and decrypt speech data encrypted in accordance with the method of Figure 4a. For instance, the method of Figure 4b may be applied in the speech decoding and decryption unit 80 of Figure 3b.
  • configuration data is shared with the encryption unit.
  • the decoder is configured on the basis of the configuration data. This step may be performed during initialization of speech decoding or the decoder may be configured already earlier after entering of the configuration data.
  • In step 454 one or more descrambling parameters are defined, to enable the decoder to descramble scrambled speech data before end of the decompression process.
  • In step 456 scrambled speech data is descrambled on the basis of the descrambling parameters.
  • the scrambling/descrambling is just one option for arranging the encryption/decryption of the speech data.
  • Various other symmetric or asymmetric encryption and decryption methods may be applied.
  • Figures 4a and 4b illustrate only the encryption and decryption operations; the encoding/decoding operations are not shown.
  • the encryption and decryption can be applied during encoding and decoding processes, respectively.
  • the encryption may be implemented at some other point of the encoding process than after and/or part of the quantization.
  • decryption may be arranged at a different point than illustrated in Figure 3b.
  • the encryption key and encryption process is time-variable.
  • the scrambling may be performed in step 406 in a time-varying manner, based on time-varying scrambling parameters.
  • a set of scrambling parameters may be defined in step 404.
  • the scrambling process 406 may change the applied scrambling parameter periodically at predetermined time interval or after encrypting one or more speech data units, for instance.
  • One such further embodiment is illustrated in Figures 5 and 6.
  • decryption may be time-varying.
  • the descrambling may be performed in step 456 in a time-varying manner based on time-varying descrambling parameters; reference is made to the examples given above for the scrambling operations.
  • time-varying encryption enables the security level to be further increased, since an encryption/decryption parameter is valid only for some time. Conventional brute-force decrypting would not be feasible due to the time-varying nature of the scrambling.
  • FIG. 5 illustrates an embodiment for arranging specification of time-varying scrambling parameters. This embodiment may be applied in step 404 of Figure 4a and 452 of Figure 4b.
  • a speech unit identifier (or a number derived from it), such as a phoneme ID, is selected to serve as a seed for a random number generator.
  • a set of random number values is specified by applying the specified unit ID or the value derived from the unit ID.
  • random numbers are selected in consecutive manner as scrambling parameters for speech frames or units.
  • the first random number generated using that seed is selected as the scrambling parameter for the first frame or parameter value inside that unit
  • the next random number is selected as the scrambling parameter for the second frame or parameter value inside that unit
  • the same kind of deterministic pseudo-random number generator is used both at the coder (50) and at the decoder (80), and a single seed affects the data in that particular unit.
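The parameter derivation of Figure 5 can be sketched as follows. It is a minimal sketch under two assumptions: Python's deterministic `random.Random` stands in for the shared pseudo-random number generator, and the unit ID (e.g. a phoneme ID) is an integer usable directly as the seed.

```python
import random

def scrambling_params(unit_id, n_frames, bits=8):
    """Seed a deterministic PRNG with the speech unit ID and draw
    consecutive random numbers as per-frame scrambling parameters,
    following the steps described for Figure 5."""
    rng = random.Random(unit_id)                 # unit ID serves as the seed
    return [rng.getrandbits(bits) for _ in range(n_frames)]

params = scrambling_params(unit_id=42, n_frames=3)
# the coder (50) and the decoder (80) derive identical parameters
# from the same unit ID, so no key material travels in the bit stream
```

Note that a Mersenne Twister is deterministic but not cryptographically strong; the embodiment only requires that the same kind of generator runs at both ends.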
  • Above-illustrated features may naturally be applied also for generating descrambling parameters.
  • the above embodiment is only one example of available methods for generating the time-varying scrambling parameters, and various other methods may be applied in steps 404 and 452. Further, the above-illustrated method may be modified in various ways. For instance, every nth random number or random parameter is used as a scrambling parameter, or a random number or parameter is selected as the scrambling parameter for a pre-determined number of frames or units.
  • Figure 6 illustrates an embodiment for scrambling speech data. For the sake of conciseness, both scrambling and descrambling operations are illustrated.
  • the coder may apply this illustrated method in step 406 of Figure 4a by applying the scrambling parameter, and the decoder may apply the illustrated method in step 456 of Figure 4b by applying the descrambling parameter.
  • In step 600 a predetermined number of bits of a (de)scrambling parameter is selected.
  • In step 602 an exclusive-OR operation is performed between the binary index value and the selected bits of the (de)scrambling parameter.
  • the method may be repeated as long as there are frames or units or speech parameter indices to be (de)scrambled.
  • a next (de)scrambling parameter is selected and the method returns to step 600 for the next frame or unit to be (de)scrambled.
  • the above-illustrated embodiment provides a simple but effective speech data encryption. It does not require any additional storage space and the encrypting and decrypting operations can be made very fast, since, for instance, (de)quantization is anyhow required. Moreover, the protection can be made very strong due to the fact that it is always possible to decode the bitstream into speech-like signal but the result is correct only if the decrypting is done correctly.
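The exclusive-OR operation of steps 600 and 602 might look as follows. The index values and parameter values are illustrative; the key property is that XOR is self-inverse, so the identical routine serves both scrambling and descrambling, as Figure 6 implies.

```python
def xor_scramble(index_values, params, index_bits=8):
    """Steps 600/602: XOR each frame's quantization index with the
    selected bits of that frame's (de)scrambling parameter."""
    mask = (1 << index_bits) - 1
    return [(v ^ (p & mask)) & mask for v, p in zip(index_values, params)]

indices = [13, 200, 7]                 # toy quantization indices for one unit
keys = [0x3C, 0xA5, 0x0F]              # time-varying (de)scrambling parameters
scrambled = xor_scramble(indices, keys)
restored = xor_scramble(scrambled, keys)   # applying the same keys again descrambles
```

Because the operation works on the already-required (de)quantization indices, it adds no storage and almost no computation, matching the benefits stated above.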
  • The upper part 700 of Figure 7 shows a speech signal encrypted and decrypted correctly using the present embodiment.
  • One example of a feasible decoding output without correct decrypting is shown in the lower part 710 of Figure 7. It can be seen that the decoded signal retains some speech-like properties, but the content is no longer the same and is not understandable to a listener. It should also be noted that some incorrect decrypting could produce valid speech while the content could still be different.
  • At least some of the above illustrated features are applied for encrypting speech data for a text-to-speech (TTS) system.
  • data protection is arranged as follows:
  • the main TTS system may use a separate acoustic synthesis engine in synthesis of speech waveforms.
  • the synthesis engine includes a decoder of a speech codec, for instance a very low bit rate (VLBR) speech codec, and there is a separate interface allowing communication between the TTS system and the synthesis engine (and the VLBR speech decoder).
  • the configuration data is generated for each TTS voice separately.
  • the TTS system gives the configuration data to the synthesis engine interface that uses it for configuring the decoder.
  • When the TTS system is synthesizing a sentence, it first gives a list of unit IDs to the synthesis engine interface, which links these IDs to indices in the speech database. This information is also used for deriving time-varying parameters for the descrambling in a non-evident but deterministic manner.
  • the scrambling and descrambling parameters may be derived as illustrated in Figure 5.
  • the scrambling and descrambling operations may be performed as illustrated in Figure 6.
  • the bit stream is read and decoded into speech by the synthesis engine (or the decoder) according to the instructions coming from the TTS system using separate processing calls.
  • the application of the present invention is not limited to any specific speech application.
  • the present features may be applied in connection with transmission of speech data. For example, sensitive communication between two persons could be protected by applying the above embodiments.

Abstract

The present invention relates to speech data encryption and decryption methods. As regards decryption, a speech decoder for decompressing compressed speech data is configured for speech data decryption. One or more decryption parameters are defined for decrypting encrypted speech data before end of a speech data decompression process. Encrypted speech data is decrypted before end of the speech data decompression process on the basis of the one or more decryption parameters.

Description

Speech data encryption and decryption
Field
The invention relates to arranging encryption and decryption of speech data.
Background
In prior art digital mobile communication systems, digital encryption is used over the wireless communication link, which is otherwise too easily intercepted. A user specific encryption key is typically established for each call at a mobile terminal and at a base station serving the mobile terminal. After encoding a digitized speech unit to compress the speech information, channel encoding is typically performed for the speech data. The output of the channel encoder is encrypted. The encrypted data transmitted from the mobile terminal is then deciphered at the base station. After deciphering, the base station may pass the deciphered speech data to further transmission, for instance towards a Public Switched Telephone Network (PSTN) so that the intended recipient may receive the call.
However, besides protecting data transferred over the radio interface, there are also other events where speech data needs to be secured. For instance, it may be desirable to restrict access to speech data in text-to-speech or speech storage applications. Thus, a data encryption procedure needs to be carried out for the speech data.
Brief description
Methods, apparatuses, and computer program products are now provided, which are characterized by what is stated in the independent claims. Some embodiments of the invention are described in the dependent claims.
The invention and various embodiments of the invention provide several advantages, which will become apparent from the detailed description below. One advantage is that encryption and decryption can be made a seamless part of the speech encoding and decoding process.
List of drawings
Embodiments of the present invention are described below, by way of example only, with reference to the accompanying drawings, in which
Figure 1 illustrates transmission and reception chains for speech transmission;
Figure 2 illustrates an apparatus according to an embodiment;
Figure 3a illustrates speech encoding arrangement according to an embodiment;
Figure 3b illustrates speech decoding arrangement according to an embodiment;
Figure 4a illustrates a method according to an embodiment;
Figure 4b illustrates a method according to an embodiment;
Figure 5 illustrates a method according to an embodiment;
Figure 6 illustrates a method according to an embodiment; and
Figure 7 illustrates examples of decoded speech data.
Description of embodiments
The following embodiments are exemplary. Although the specification may refer to "an", "one", or "some" embodiment(s) in several locations, this does not necessarily mean that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.

Figure 1 shows, in a simplified form, conventional transmission chain and reception chain for speech, for instance in a digital mobile terminal or system. The transmit chain comprises a voice input, such as a microphone, an analog to digital (A/D) converter, a voice coder, a channel encoder, an encryption unit, a frame formatting unit, a modulator, and a transmitter. The voice coder outputs coded speech frames, which comprise a plurality of speech bits. The speech frames are input to the channel encoder for channel encoding using error correction and/or error detection codes. The purpose of the error coding is to enable the receiver to detect and correct bit errors that may occur during transmission. The channel encoder implements the encoding method used in a particular communications system. The speech frames may then be protected from interception by encryption. After encryption, the speech frames are formatted for transmission by a frame formatting unit. The frame formatting unit interleaves encrypted bits in the speech frames from the encryption unit, adds necessary synchronization and training bits, and assigns the resulting formatted bits to appropriate timeslots of a frame structure, for instance. The modulator modulates the resulting formatted bits to an assigned radio frequency carrier for transmission by the transmitter.
The receive chain comprises a receiver, a demodulator, a frame capturing unit, a decryption unit, a channel decoder, a voice decoder, a digital-to-analog (D/A) converter, and a speaker. The receiver receives the speech signals and converts the signals into a suitable format for digital processing. The demodulator demodulates the received signal and provides the demodulated bits to the frame capturing unit, which performs error detection and de-interleaving. The decryption unit decrypts demodulated bits received from the frame capturing unit. The decrypted bits output from the decryption unit are decoded by the channel decoder. The decoded bits are then provided to the voice decoder.
However, the procedure of Figure 1 has some drawbacks. Specific encryption and decryption units are needed, and the speech data may be transferred in encrypted form over only a part of the entire transmission path. Further, there is a need to protect speech data also in other situations, including situations not involving any transmission of the speech data. If only conventional speech coding is applied, it is trivial for anyone who knows the decoder to decode the bit stream into speech. If standard speech codecs are used, it is easy to find the correct decoder.
Figure 2 illustrates a simplified block diagram of an apparatus 20 according to an embodiment. The apparatus comprises a unit 24 for encoding and encrypting digitized speech data 22 and a unit 30 for decrypting and decoding encrypted and encoded speech data 32. Thus, the speech coding and encryption unit, or speech coder, 24 is configurable for speech data encryption. The speech coding and encryption unit 24 is configured to define one or more encryption parameters for encrypting speech data before end of a speech data compression process. The speech encoding and encryption unit 24 is configured to encrypt speech data 22 before end of the speech data compression process on the basis of the encryption parameters. The coder 24 outputs encoded (compressed) and encrypted speech data 26 for further actions, for instance for storage or channel coding.
The speech decoding and decryption unit, or speech decoder, 30 is configurable for speech data decryption. The speech decoding and decryption unit 30 is configured to define one or more decryption parameters for decrypting encrypted speech data before end of a speech data decompression process. The speech decoding and decryption unit 30 is configured to decrypt encrypted and encoded speech data 32, obtained from a channel decoder or memory, for instance, before end of the speech data decompression process on the basis of the decryption parameters. The speech decoding and decryption unit 30 outputs decrypted and decoded speech data 34, which may be provided for D/A conversion and further to a speaker, or for further digital processing.
The speech data (de)compression process is to be understood broadly as referring to any speech (de)compression method or procedure that takes (compressed) speech data as input and provides (de)compressed speech data as the output of the process. After the end of the speech (de)compression process, another process, such as channel coding or storing to memory (or D/A conversion after decompression), may be started.
Thus, speech data encryption and decryption can be made a seamless part of the encoding and decoding of speech data and encrypting/encoding and decryption/decoding can be considered parallel processes. When compared to conventional encryption and decryption of (compressed) speech data, no separate encryption and decryption units and processes are required. Further, it becomes possible to reduce overall computational complexity for compressing, encrypting, decrypting, and decompressing speech data, since it is possible to utilize functions anyhow required for compression/decompression. By applying the present embodiment, decoding of the bit stream into the correct speech signal can be made very difficult for outsiders. This is very beneficial in situations where the speech data itself is considered valuable or otherwise worth protecting. An example of such a use case is the compression of text-to-speech (TTS) databases because it is desirable to keep the text-to-speech databases secret.
Although the apparatus 20 has been depicted as one entity, different modules and memory may be implemented in one or more physical or logical entities. Although the encoder/encryption unit 24 and the decoder/decryption unit 30 are functionally separated in Figure 2, these functions could be implemented in a single unit or module. Further, there could be an apparatus implementing only one of these units 24, 30.
Some further embodiments of an apparatus comprising the speech coder 24 and/or decoder 30 are illustrated below in connection with Figures 3a, 3b, 4a, 4b, 5, and 6. It should be appreciated that the apparatus may comprise other units used in or for further storing, transmitting/receiving, or further processing speech data. However, they are irrelevant to the present embodiments and, therefore, they need not be discussed in more detail here. The apparatus may be any data processing device performing speech encoding and/or decoding. Examples of a personal data processing device include a personal computer, an entertainment device such as a game console, a laptop, a personal digital assistant, an embedded computing device, or a mobile station (mobile phone). The apparatus may also be a network element device serving a plurality of user devices, for instance a base station, a base station controller, or a radio network controller. In the case of a mobile communications device comprising a transceiver for wireless communications, a wireless connection may be implemented with a wireless transceiver operating according to the GSM (Global System for Mobile Communications), WCDMA (Wideband Code Division Multiple Access), WLAN (Wireless Local Area Network) or Bluetooth® standard, or any other suitable standard or non-standard wireless communication means.
The apparatus could be in the form of a chip unit or some other kind of hardware module for controlling a data processing device. Such a hardware module comprises connecting means for connecting to the data processing device mechanically and/or functionally. Thus, the hardware module may form part of the device and could be removable. Some examples of such a hardware module are a sub-assembly or an accessory device, for instance a specific speech coder and/or decoder unit. Apparatuses comprise not only prior art means, but also means for arranging encryption of speech data during the encoding process and/or decryption of encrypted speech data during the decoding process. In particular, means may be provided for arranging the features illustrated in connection with Figures 3a, 3b, 4a, 4b, 5, and 6. The apparatus may be implemented as an electronic digital computer, which may comprise memory, a central processing unit (CPU), and a system clock. The CPU may comprise a set of registers, an arithmetic logic unit, and a control unit. The control unit is controlled by a sequence of program instructions transferred to the CPU from the memory. The control unit may contain a number of microinstructions for basic operations. The implementation of microinstructions may vary, depending on the CPU design. The program instructions may be coded by a programming language, which may be a high-level programming language, such as C, Java, etc., or a low-level programming language, such as a machine language, or an assembler. The electronic digital computer may also have an operating system, which may provide system services to a computer program written with the program instructions.
An embodiment provides a computer program embodied on a distribution medium, comprising program instructions which, when loaded into an electronic apparatus, constitute the units 24 and/or 30. The computer program may be in source code form, object code form, or in some intermediate form, and it may be stored in some carrier, which may be any entity or device capable of carrying the program. Such carriers include a record medium, computer memory, read-only memory, electrical carrier signal, telecommunications signal, and software distribution package, for example. In one further embodiment the encryption/decryption features are implemented by a program executed on top of an operating system, such as the Symbian operating system for mobile devices. In a further embodiment at least some of the currently described features are implemented as part of text-to-speech software. The unit 24 enabling parallel speech encoding and encryption and/or the unit 30 enabling parallel decoding and decryption may also be implemented as one or more integrated circuits, such as application-specific integrated circuits (ASICs). Other hardware embodiments are also feasible, such as a circuit built of separate logic components. A hybrid of these different implementations is also feasible.
Figure 3a illustrates a speech encoding and encryption unit, or speech coder, 50 according to an embodiment. Digitized speech 52 is input to a speech analysis process 54 to produce a set of speech parameters 56. It is to be noted that in some speech coding implementations, such as direct waveform coding, there may be no specific speech analysis 54.
Each speech parameter is quantized 58. In one embodiment, as part of encryption process, the quantization 58 is configured by control data 60. The control data 60 may be any data for controlling the quantization of the speech parameters 56. For instance, applied codebook content, size, structure, etc. may be controlled. This control data 60 may be exchanged between the quantization module 58 in the encoder 50 and the dequantization module 94 in the decoder 80 during initialization of the encoding process.
The set of vectors 62 from the quantization 58 is then subjected to encryption 64. The quantized speech vectors are encrypted by applying encryption parameters 66, the definition of which is illustrated in more detail below. As an output of the encryption 64, a set of encrypted indices 68 is obtained and further provided for bit stream generation 70 to finalize the speech encoding and compression process. The output 72 of the bit stream generation procedure 70 is a bit stream including encoded and encrypted speech data, which is the output of the speech encryption and encoding unit 50. The output 72 may be provided for further units and procedures in the data processing device including the encoder 50, such as for a channel coding unit or for storage.
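The encoder pipeline of Figure 3a (analysis 54, quantization 58, encryption 64, bit stream generation 70) can be sketched as follows. This is a toy illustration only: the codebook contents, the XOR-based scrambling, and the 3-bit packing are invented for the example and do not represent any actual speech codec.

```python
# Toy sketch of the coder 50 in Figure 3a. The codebook (control data
# 60), the XOR scrambling, and the 3-bit packing are all illustrative
# assumptions, not part of any real codec.

CODEBOOK = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]  # control data 60

def quantize(params):
    """Map each speech parameter to its nearest codebook index (58)."""
    return [min(range(len(CODEBOOK)),
                key=lambda i: abs(CODEBOOK[i] - p)) for p in params]

def encrypt(indices, keys):
    """Scramble the quantized indices with per-index keys (64)."""
    return [i ^ (k % len(CODEBOOK)) for i, k in zip(indices, keys)]

def to_bitstream(indices, bits=3):
    """Pack the encrypted indices into a bit string (70)."""
    return "".join(format(i, f"0{bits}b") for i in indices)

speech_params = [0.1, 0.6, 1.4, 1.8]   # hypothetical output of analysis 54
keys = [5, 2, 7, 1]                    # encryption parameters 66
stream = to_bitstream(encrypt(quantize(speech_params), keys))
print(stream)                          # encoded and encrypted output 72
```

XOR scrambling is used here only because it is self-inverse; the embodiments above deliberately leave the actual encryption method open.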
Figure 3b illustrates a data decryption and decoding unit, or decoder, 80 according to an embodiment. Encoded and encrypted speech bit stream 82 is applied as an input to the unit 80. First, the bit stream is parsed 84. The output of the parsing 84 is a set of encrypted indices 86 submitted for decryption 90. The decryption 90 applies decryption parameter(s) 88 for decrypting the encrypted indices 86. The decryption parameters 88 are formed similarly to the encryption parameters 66. In one embodiment, the encryption module 64 and the decryption module 90 share the same or at least very closely related data for encryption and decryption, i.e. symmetric cryptography is applied.
The decrypted indices 92 are dequantized 94, by using predefined dequantization control data 96. The dequantization 94 may be part of the decryption process. As an output of the dequantization 94, a set of decoded and decrypted speech parameters 98 is obtained. These speech parameters 98 are submitted to a speech synthesis 100 to finalize the speech decoding and decompression process. The speech synthesis provides decoded (and decrypted) speech data 102 as an output of the decoder 80.
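The decoder of Figure 3b (parsing 84, decryption 90, dequantization 94) can be sketched as a mirror image of the coder. As before, the codebook, the key values, the bit stream content, and the 3-bit fields are illustrative assumptions only.

```python
# Toy sketch of the decoder 80 in Figure 3b. The codebook (control
# data 96), the XOR descrambling, and the 3-bit fields are invented
# for illustration.

CODEBOOK = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]  # control data 96

def parse_bitstream(stream, bits=3):
    """Split the bit stream into encrypted indices (84)."""
    return [int(stream[i:i + bits], 2) for i in range(0, len(stream), bits)]

def decrypt(indices, keys):
    """Undo the XOR scrambling with the shared keys (90)."""
    return [i ^ (k % len(CODEBOOK)) for i, k in zip(indices, keys)]

def dequantize(indices):
    """Map the decrypted indices back to parameter values (94)."""
    return [CODEBOOK[i] for i in indices]

keys = [5, 2, 7, 1]                    # decryption parameters 88
params = dequantize(decrypt(parse_bitstream("101000001110"), keys))
print(params)                          # speech parameters 98 for synthesis 100
# With the wrong keys the chain still yields valid-looking parameters,
# but they map to a different, incorrect signal (cf. Figure 7).
```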
Hence, according to an embodiment, three different pieces of data may be separated:
1. Data for configuring the codec (20, 24; 50; 80)
This data has to be available during both the encoding and the decoding; otherwise the decoding result will not be correct. The configuration data may depend on the data to be encoded and it is generated before the encoding. In one embodiment, the configuration data is the control data 60 controlling quantization and the control data 96 controlling dequantization. Such control data may be specific rules for mapping speech parameters to indices, for instance. Also codebooks may be exchanged, and the codebook content, size, structure, etc. may be varied. Hence, part of the encryption can be considered to occur during quantization (or initialization), and in principle even the entire encryption process could be implemented as part of the quantization. Thus, since quantization can provide at least part of the encryption process in the speech coder and dequantization at least part of the decryption process in the speech decoder, there could be a single block for encryption (and quantization) in Figure 3a and a single block for decryption (and dequantization) in Figure 3b. However, it is not necessary to adapt quantization for encryption purposes.
2. Parameter(s) used in the encryption/decryption, such as time-varying scrambling
These parameter(s) may be provided for encryption 64 and decryption 90 by inputs 66 and 88, respectively. The generation of the encryption/decryption parameters is not limited to any particular method. It is possible to apply any parameter agreed between, or similarly generated by, the coder and the decoder. One embodiment for generating scrambling parameters by applying a random number generator is illustrated below.
3. The actual bit stream representing the speech data
The bit stream contains the compressed speech data as encrypted.
The bit stream can be decoded using the decoder even without any additional data, but the result will be correct only if the correct configuration data and the correct decryption parameter(s) are available and used in a correct way during the decoding.
Without the configuration data, the parsing of the bit stream cannot be done in a correct way. In addition, even if the correct way of parsing were discovered, the meanings of the bit sequences would not be known.
Without the parameter(s) used in the scrambling, it is not possible to solve the relationship between bit sequences or scrambled indices and the speech parameter values or parameter indices.
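The role of the configuration data can be illustrated with a small sketch in which the codebook ordering itself is part of the secret: a coder and decoder sharing the permutation seed agree on the index-to-value mapping, while a third party using the default ordering dequantizes to wrong values. The seed and codebook values are invented for the example.

```python
import random

BASE_CODEBOOK = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]

def permuted_codebook(seed):
    """Return the codebook in a secret order derived from `seed`.

    The permutation plays the role of configuration data: it must be
    shared between coder and decoder before any bit stream is handled.
    """
    book = list(BASE_CODEBOOK)
    random.Random(seed).shuffle(book)   # deterministic for a given seed
    return book

# Coder and decoder derive the same mapping from the shared seed...
shared = permuted_codebook(1234)
assert shared == permuted_codebook(1234)

# ...while an outsider using the published base ordering will, in
# general, map the same indices to different parameter values.
indices = [3, 0, 5]
correct = [shared[i] for i in indices]
guessed = [BASE_CODEBOOK[i] for i in indices]
```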
Figure 4a illustrates a method for encoding and encrypting speech data according to an embodiment. In step 400 configuration data is shared with a decoder before processing any bit streams. In one embodiment the coder 50 illustrated in Figure 3a may share quantization control data 60 with the decoding and decryption unit 80 of Figure 3b. It is to be noted that the step 400 may be completely separate from the actual encoding/decoding processes. The configuration data may be conveyed between the coder (transmitter) and the decoder (receiver) by utilizing separate transmission means, for instance in a separate e-mail, or the data may be pre-configured in the apparatuses.
In step 402 the encoding is configured on the basis of the configuration data retrieved locally from a memory of an apparatus implementing the method. This step may be performed during initialization of speech coding or the coder may be configured already earlier after entering of the configuration data, for instance.
In step 404 there is a need to compress and encrypt speech data and scrambling parameters are defined. The scrambling parameters may be pre-stored and retrieved in step 404 from memory of the apparatus implementing the method. In another embodiment the parameters are generated in step 404.
In one embodiment scrambling parameters are transferred between the coder and the decoder. The parameter(s) used in the scrambling may have also been conveyed to the decoder separately, not inside the bit stream and not together with the configuration data. For example, in communication between two persons, the parameter data could be sent as a text message. Alternatively, both the coder and the decoder define their scrambling parameters independently of each other. Hence, the definition of the encryption and decryption parameters is to be understood broadly to cover various embodiments of arranging the encryption parameters for use in the coder. For instance, some or all of the encryption and/or decryption parameters could be defined even on the basis of a user input, such as a specific word inputted by the user.
In step 406 speech data is scrambled by utilizing the one or more scrambling parameters as an encryption key. Although not shown in Figure 4a, one or more remaining speech encoding functions may be performed after step 406, such as the bit stream generation 70 of Figure 3a. After these remaining functions the speech compression and coding process ends and the coder provides encoded and encrypted speech data as output.

Figure 4b illustrates a speech data decoding and decryption method according to an embodiment. This method may be applied in a decoder configured to decode and decrypt speech data encrypted in accordance with the method of Figure 4a. For instance, the method of Figure 4b may be applied in the speech decoding and decryption unit 80 of Figure 3b. In step 450 configuration data is shared with the encryption unit. In step 452 the decoder is configured on the basis of the configuration data. This step may be performed during initialization of speech decoding, or the decoder may be configured already earlier, after entry of the configuration data.
In step 454 one or more descrambling parameters are defined, to enable the decoder to descramble scrambled speech data before end of decompression process. There are various embodiments for implementing step 454, similarly as described above for defining scrambling parameters. In step 456 scrambled speech data is descrambled on the basis of the descrambling parameters.
It is to be noted that scrambling/descrambling is just one option for arranging the encryption/decryption of the speech data. Various other symmetric or asymmetric encryption and decryption methods may be applied. For simplicity, Figures 4a and 4b illustrate only the encryption and decryption operations; the encoding/decoding operations are not shown. However, as already illustrated in Figures 3a and 3b, the encryption and decryption can be applied during the encoding and decoding processes, respectively. The encryption may be implemented at some other point of the encoding process than after and/or as part of the quantization. Similarly, decryption may be arranged at a different point than illustrated in Figure 3b.
In one embodiment the encryption key and the encryption process are time-variable. Thus, the scrambling may be performed in step 406 in a time-varying manner based on time-varying scrambling parameters. A set of scrambling parameters may be defined in step 404. The scrambling process 406 may change the applied scrambling parameter periodically, at a predetermined time interval or after encrypting one or more speech data units, for instance. One such further embodiment is illustrated in Figures 5 and 6.
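One simple way to realize such periodic changes, sketched here purely as an assumption since the embodiment does not fix any particular schedule, is to assign each frame a parameter from the defined set and advance to the next parameter after a fixed number of frames:

```python
from itertools import cycle, islice

def time_varying_keys(param_set, n_frames, frames_per_key=4):
    """Assign a scrambling parameter to each frame, switching to the
    next parameter in the set after every `frames_per_key` frames and
    wrapping around when the set is exhausted."""
    per_frame = (k for k in cycle(param_set) for _ in range(frames_per_key))
    return list(islice(per_frame, n_frames))

# Two parameters, changing every 2 frames:
schedule = time_varying_keys([0xA1, 0xB2], n_frames=6, frames_per_key=2)
assert schedule == [0xA1, 0xA1, 0xB2, 0xB2, 0xA1, 0xA1]
```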
Similarly, decryption may be time-varying. For instance, the descrambling may be performed in step 456 in a time-varying manner based on time-varying descrambling parameters; reference is made to the examples given above for the scrambling operations. The use of time-varying encryption makes it possible to further increase the security level, since an encryption/decryption parameter is valid only for some time. Conventional brute-force decrypting would not be feasible due to the time-varying nature of the scrambling.
Figure 5 illustrates an embodiment for specifying time-varying scrambling parameters. This embodiment may be applied in step 404 of Figure 4a and step 454 of Figure 4b. In step 500 a speech unit identifier (ID) (or a number derived from it), such as a phoneme ID, is selected to serve as a seed for a random number generator. In step 502 a set of random number values is specified by applying the selected unit ID or the value derived from the unit ID. In step 504 random numbers are selected in a consecutive manner as scrambling parameters for speech frames or units. Thus, the first random number generated using that seed is selected as the scrambling parameter for the first frame or parameter value inside that unit, the next random number is selected as the scrambling parameter for the second frame or parameter value inside that unit, and so on. The same kind of deterministic pseudo-random number generator is used both at the coder (50) and at the decoder (80), and a single seed affects the data in that particular unit. The above-illustrated features may naturally be applied also for generating descrambling parameters.
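The procedure of Figure 5 can be sketched as follows, assuming integer unit IDs and using Python's deterministic Mersenne Twister as a stand-in for the shared pseudo-random number generator (the function name and bit width are illustrative):

```python
import random

def scrambling_params_for_unit(unit_id, n_frames, bits=16):
    """Derive one scrambling parameter per frame from a speech unit ID.

    The unit ID seeds a deterministic PRNG (step 500), a sequence of
    random values is generated (step 502), and consecutive values are
    used as per-frame scrambling parameters (step 504).
    """
    rng = random.Random(unit_id)
    return [rng.getrandbits(bits) for _ in range(n_frames)]

# A coder and decoder sharing the same generator and the same unit ID
# reproduce identical parameters without transmitting them.
coder_side = scrambling_params_for_unit(unit_id=42, n_frames=4)
decoder_side = scrambling_params_for_unit(unit_id=42, n_frames=4)
assert coder_side == decoder_side
```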
It is to be noted that the above embodiment is only one example of available methods for generating the time-varying scrambling parameters, and various other methods may be applied in steps 404 and 454. Further, the above-illustrated method may be modified in various ways. For instance, every nth random number or random parameter may be used as a scrambling parameter, or a random number or parameter may be selected as the scrambling parameter for a pre-determined number of frames or units.

Figure 6 illustrates an embodiment for scrambling speech data. For the sake of conciseness, both scrambling and descrambling operations are illustrated. The coder may apply the illustrated method in step 406 of Figure 4a by applying the scrambling parameter, and the decoder may apply the illustrated method in step 456 of Figure 4b by applying the descrambling parameter.
In step 600 a predetermined number of bits of a (de)scrambling parameter is selected. In step 602 an exclusive-OR operation is performed between the binary index value and the selected bits of the (de)scrambling parameter. As illustrated by 604, after scrambling or descrambling of the frame(s) or unit(s) or speech parameter indices with which the (de)scrambling parameter is associated, the method may be repeated as long as there are frames or units or speech parameter indices to be (de)scrambled. Thus, the next (de)scrambling parameter is selected and the method returns to step 600 for that parameter and the associated frame or unit to be (de)scrambled next.
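Because exclusive-OR with the same value is its own inverse, a single routine can serve for both steps. A minimal sketch, assuming integer quantizer indices and 8-bit parameters (both invented for the example):

```python
def xor_scramble(indices, params, bits=8):
    """(De)scramble quantizer indices as in Figure 6: select the low
    `bits` bits of each (de)scrambling parameter (step 600) and XOR
    them with the corresponding index (step 602), repeating for each
    parameter and its associated index (604)."""
    mask = (1 << bits) - 1
    return [idx ^ (param & mask) for idx, param in zip(indices, params)]

indices = [5, 200, 17, 99]
params = [0x3A, 0x7F, 0x11, 0xC4]
scrambled = xor_scramble(indices, params)
# XOR is an involution: applying the same parameters again restores
# the original indices.
assert xor_scramble(scrambled, params) == indices
```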
The above-illustrated embodiment provides simple but effective speech data encryption. It does not require any additional storage space, and the encrypting and decrypting operations can be made very fast since, for instance, (de)quantization is required anyhow. Moreover, the protection can be made very strong due to the fact that it is always possible to decode the bit stream into a speech-like signal, but the result is correct only if the decrypting is done correctly. This is demonstrated in Figure 7. The upper part 700 of Figure 7 shows a speech signal encrypted and decrypted correctly using the present embodiment. One example of a feasible decoding output without correct decrypting is shown in the lower part 710 of Figure 7. It can be seen that the decoded signal contains some speech-like properties, but the content is no longer the same and it is not understandable to a listener. It should also be noted that some incorrect decrypting could produce valid speech, but the content could still be different.
In one embodiment at least some of the above-illustrated features are applied for encrypting speech data for a text-to-speech (TTS) system. In the presently illustrated implementation example, designed for encrypting speech databases in a TTS system, data protection is arranged as follows: The main TTS system may use a separate acoustic synthesis engine in the synthesis of speech waveforms. The synthesis engine includes a decoder of a speech codec, for instance a very low bit rate (VLBR) speech codec, and there is a separate interface allowing communication between the TTS system and the synthesis engine (and the VLBR speech decoder). The configuration data is generated for each TTS voice separately. It is saved during the compression of the speech database and stored in a memory that is accessible to the TTS system. During the initialization of a certain voice, the TTS system gives the configuration data to the synthesis engine interface, which uses it for configuring the decoder. When the TTS system is synthesizing a sentence, it first gives a list of unit IDs to the synthesis engine interface, which links these IDs to indices in the speech database. This information is also used for deriving time-varying parameters for the descrambling in a non-evident but deterministic manner. The scrambling and descrambling parameters may be derived as illustrated in Figure 5.
The scrambling and descrambling operations may be performed as illustrated in Figure 6. The bit stream is read and decoded into speech by the synthesis engine (or the decoder) according to the instructions coming from the TTS system using separate processing calls.
In this embodiment, even if someone had the decoder and all the data available, (s)he would not be able to decode the data into the correct speech without finding out all the details and parameters that have to be used in the decrypting. Even if someone could follow the communication between the TTS system and the synthesis engine, it would not be easy to find out the correct techniques and parameters for the decrypting.
Although an example was illustrated above for TTS system, the application of the present invention is not limited to any specific speech application. As already indicated, the present features may be applied in connection with transmission of speech data. For example, sensitive communication between two persons could be protected by applying the above embodiments.
It will be obvious to a person skilled in the art that, as technology advances, the inventive concept can be implemented in various ways. The invention and its embodiments are not limited to the examples described above but may vary within the scope of the claims.

Claims
1. An apparatus comprising a speech coder for compressing speech data, wherein the speech coder is configurable for speech data encryption; the speech coder is configured to define one or more encryption parameters for encrypting speech data before end of a speech data compression process; and the speech coder is configured to encrypt speech data before end of the speech data compression process on the basis of the one or more encryption parameters.
2. The apparatus of claim 1, wherein the speech coder is further configured to: perform speech analysis of a digitised speech signal, quantize speech parameters resulting from the speech analysis, encrypt the quantized speech parameters, and generate a bit stream of the encrypted speech parameters, to output encoded and encrypted speech data.
3. The apparatus of any preceding claim, wherein the speech coder is configured to define a set of encryption parameters, and the speech coder is configured to encrypt speech data in time-varying manner by periodically changing the encryption parameter applied for the encryption.
4. The apparatus of any preceding claim, wherein the speech coder is configured to define the one or more encryption parameters on the basis of the source speech data to be encrypted.
5. An apparatus of any preceding claim, wherein the apparatus is configured to generate a set of random numbers as the one or more encryption parameters for a text-to-speech system by applying a speech unit identifier as a seed for a random number generator, and the speech coder is configured to scramble the speech data on the basis of the set of random numbers.
6. An apparatus of any preceding claim, wherein the speech coder is configured to control quantization of speech parameters on the basis of one or more quantization parameters used as the one or more encryption parameters to encrypt the speech data.
7. An apparatus of any preceding claim, wherein the apparatus is a mobile communications device comprising a transceiver for wireless communications.
8. An apparatus comprising a speech decoder for decompressing compressed speech data, wherein the speech decoder is configurable for speech data decryption; the speech decoder is configured to define one or more decryption parameters for decrypting encrypted speech data before end of a speech data decompression process; and the speech decoder is configured to decrypt encrypted speech data before end of the speech data decompression process on the basis of the one or more decryption parameters.
9. An apparatus according to claim 8, wherein the speech decoder is configured to: parse the received encrypted and encoded bit stream, decrypt the encrypted speech data obtained as output of the parsing on the basis of the decryption parameters, and dequantize the output of the decryption.
10. An apparatus of claim 8 or 9, wherein the speech decoder is configured to define a set of decryption parameters, and the speech decoder is configured to decrypt the encrypted speech data in time-varying manner by periodically changing the decryption parameter applied for the decryption.
11. An apparatus of claim 8, 9, or 10, wherein the speech decoder is configured to define the one or more decryption parameters on the basis of information from a speech encoder configured to encrypt the speech data and generate the information on the basis of the source speech data to be encrypted.
12. An apparatus of any preceding claim 8 to 11, wherein the apparatus is configured to generate a set of random numbers as the one or more decryption parameters for a text-to-speech system by applying a speech unit identifier as a seed for a random number generator, and the speech decoder is configured to descramble the encrypted speech data on the basis of the set of random numbers.
13. An apparatus of any preceding claim 8 to 12, wherein the speech decoder is configured to control dequantization of the encrypted speech data on the basis of one or more quantization parameters used as the one or more decryption parameters to decrypt the encrypted speech data.
14. An apparatus of any preceding claim 8 to 13, wherein the apparatus is a mobile communications device comprising a transceiver for wireless communications.
15. A method for encrypting speech data, comprising: configuring a speech coder for speech data encryption; defining one or more encryption parameters for encrypting speech data before end of a speech data compression process; and encrypting speech data before end of the speech data compression process on the basis of the one or more encryption parameters.
16. A method according to claim 15, wherein the method comprises: performing speech analysis of a digitised speech signal, quantizing speech parameters resulting from the speech analysis, encrypting the quantized speech parameters, and providing the encrypted speech parameters for bit stream generation resulting in encoded speech data.
17. A method according to claim 15 or 16, wherein the encryption parameters are time-varying, and the speech data is encrypted by the speech coding unit using the time-varying encryption parameters.
18. A method according to any preceding claim 15 to 17, wherein the one or more encryption parameters are defined on the basis of the source speech data to be encrypted.
19. A method according to any preceding claim 15 to 18, wherein a set of random numbers is generated as the one or more encryption parameters for a text-to-speech system by applying a speech unit identifier as a seed for a random number generator, and the speech data is scrambled on the basis of the set of random numbers.
20. A method according to any preceding claim 15 to 19, wherein the quantization of speech parameters is controlled on the basis of one or more quantization parameters used as the one or more encryption parameters to encrypt the speech data.
21. A method for decrypting encrypted speech data, comprising: configuring a speech decoder for speech data decryption; defining one or more decryption parameters for decrypting encrypted speech data before end of a speech data decompression process; and decrypting encrypted speech data before end of the speech data decompression process on the basis of the one or more decryption parameters.
22. A method according to claim 21 , wherein the encrypted and encoded bit stream is parsed, the encrypted speech data obtained as output of the parsing on the basis of the decryption parameters is decrypted, and the output of the decryption is dequantized.
23. A method according to claim 21 or 22, wherein a set of decryption parameters is defined, and the encrypted speech data is decrypted in time-varying manner by periodically changing the decryption parameter applied for the decryption.
24. A method according to claim 21 , 22, or 23, wherein the one or more decryption parameters is defined on the basis of information from a speech encoder configured to encrypt the speech data and generate the information on the basis of the source speech data to be encrypted.
25. A method according to any preceding claim 21 to 24, wherein a set of random numbers is generated as the one or more decryption parameters for a text-to-speech system by applying a speech unit identifier of as a seed for a random number generator, and the encrypted speech data is descrambled on the basis of the set of random numbers.
26. A method according to any preceding claim 21 to 25, wherein the dequantization of the encrypted speech data is controlled on the basis of one or more quantization parameters used as the one or more decryption parameters to decrypt the encrypted speech data.
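The decoder-side claims mirror the encoder: parse the bit stream, decrypt, then dequantize, optionally cycling through a set of decryption parameters over time (claim 23). The sketch below illustrates that ordering with a periodic key schedule; the XOR decryption, the uniform quantizer step size, and all names are hypothetical choices made for the example only.

```python
def decode_frames(frames, keys, period=2):
    """Decrypt-then-dequantize, switching the active decryption
    parameter every `period` frames (time-varying decryption in the
    spirit of claim 23).

    `frames` holds encrypted quantized indices per frame; the final
    step maps each index back to a parameter value using an assumed
    uniform quantizer step.
    """
    step = 0.05                                # assumed quantizer step size
    out = []
    for i, frame in enumerate(frames):
        key = keys[(i // period) % len(keys)]  # periodic key schedule
        decrypted = [v ^ key for v in frame]   # decrypt before dequantization
        out.append([idx * step for idx in decrypted])  # dequantize last
    return out
```

Note that decryption happens on the quantized indices, i.e. before the end of the decompression process (claim 21), rather than on the reconstructed waveform.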
27. A speech coder comprising: means for compressing speech data, and means for speech data encryption, the speech coder being configured to define one or more encryption parameters for encrypting speech data before end of a speech data compression process and to encrypt speech data before end of the speech data compression process on the basis of the one or more encryption parameters.
28. A speech coder according to claim 27, wherein the speech coder is further configured to: perform speech analysis of a digitised speech signal, quantize speech parameters resulting from the speech analysis, encrypt the quantized speech parameters, and generate a bit stream of the encrypted speech parameters, to output encoded and encrypted speech data.
29. A speech decoder comprising: means for decompressing compressed speech data, and means for data decryption; wherein the speech decoder is configured to define one or more decryption parameters for decrypting encrypted speech data before end of a speech data decompression process; and to decrypt encrypted speech data before end of the speech data decompression process on the basis of the decryption parameters.
30. A speech decoder according to claim 29, wherein the speech decoder is further configured to: parse the received encrypted and encoded bit stream, decrypt the encrypted speech data obtained as output of the parsing on the basis of the decryption parameters, and dequantize the output of the decryption.
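Taken together, the apparatus claims describe an encoder whose pipeline is quantize → encrypt → pack, and a decoder whose pipeline is parse → decrypt → dequantize. A minimal end-to-end sketch of that ordering is shown below; the 8-bit uniform quantizer, the XOR cipher, and the function names are assumptions for illustration and do not appear in the patent.

```python
def encode(params, key):
    """Encoder of claims 27-28: quantize, encrypt, then generate the
    bit stream, so encryption occurs before the end of compression."""
    quantized = [min(255, max(0, round(p / 0.05))) for p in params]  # assumed 8-bit uniform quantizer
    encrypted = [q ^ key for q in quantized]   # encrypt the quantized parameters
    return bytes(encrypted)                    # bit-stream generation

def decode(bitstream, key):
    """Decoder of claims 29-30: parse, decrypt, then dequantize."""
    encrypted = list(bitstream)                # parse the bit stream
    quantized = [b ^ key for b in encrypted]   # decrypt before dequantization
    return [q * 0.05 for q in quantized]       # dequantize
```

A decoder using the wrong key still produces valid-looking parameter values, just meaningless ones, which is consistent with keeping the encryption inside the codec rather than wrapping the whole bit stream.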
PCT/FI2007/050685 2007-12-13 2007-12-13 Speech data encryption and decryption WO2009074711A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/FI2007/050685 WO2009074711A1 (en) 2007-12-13 2007-12-13 Speech data encryption and decryption


Publications (1)

Publication Number Publication Date
WO2009074711A1 true WO2009074711A1 (en) 2009-06-18

Family

ID=40755261

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2007/050685 WO2009074711A1 (en) 2007-12-13 2007-12-13 Speech data encryption and decryption

Country Status (1)

Country Link
WO (1) WO2009074711A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4179586A (en) * 1972-08-02 1979-12-18 The United States Of America As Represented By The Secretary Of The Army System of encoded speech transmission and reception
US4550222A (en) * 1981-09-28 1985-10-29 Siemens Aktiengesellschaft Process for interception-protected frequency band compressed transmission of speech signals
WO2004036762A2 (en) * 2002-10-16 2004-04-29 Mazetech Co., Ltd. Encryption processing method and device of a voice signal


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"IEEE 1977 International Conference on Communications (ICC'77)", vol. 1, 12 June 1977, CHICAGO, article JAYANT, N.: "Speech encryption by manipulations of LPC and waveform-code parameters", pages: 13.4-301 - 13.4-305 *
SERVETTI, A. ET AL.: "Perception-based partial encryption of compressed speech", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, vol. 10, no. 8, November 2002 (2002-11-01), pages 637 - 643, XP011079684 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014117646A1 (en) * 2013-02-01 2014-08-07 飞天诚信科技股份有限公司 Audio data parsing method
EP4128618A4 (en) * 2020-03-31 2024-04-10 Ericsson Telefon Ab L M Artificial intelligence (ai) based garbled speech elimination
CN113645613A (en) * 2021-07-08 2021-11-12 中国人民解放军战略支援部队信息工程大学 Real-time voice encryption equipment and method for cellular mobile network
CN113645613B (en) * 2021-07-08 2023-07-04 中国人民解放军战略支援部队信息工程大学 Cellular mobile network real-time voice encryption equipment and method

Similar Documents

Publication Publication Date Title
US5592555A (en) Wireless communications privacy method and system
US20050180571A1 (en) Scrambling a compression-coded signal
US20130272518A1 (en) Speech encryption method and device, speech decryption method and device
JP2001320780A (en) Mobile station used in radio network, and method used for exchanging signal between mobile station and base station in radio network
US9877146B2 (en) Methods and systems for transmission of arbitrary data via Bluetooth HFP audio connections with low latency
EP0648031B1 (en) Audio scrambling system for scrambling and descrambling audio signals
CN103002406B (en) A kind of voice encryption method being applied to arrowband radio digital communication system
JP2688659B2 (en) Encryption system for digital cellular communication
CN107786574A (en) The voice communication Source Encryption system of mobile terminal
WO2009074711A1 (en) Speech data encryption and decryption
KR20070103113A (en) Encryption /decryption method for voice signal and apparatus for the same
Ridha et al. Modified blind source separation for securing end-to-end mobile voice calls
US20160173456A1 (en) Dynamic Spectrum Audio Encryption and Decryption Device
CN104994500B (en) A kind of speech security transmission method and device for mobile phone
CN101329869A (en) System and method for encrypting acoustic source of speech encoding for vector quantization
CN105788602A (en) Voice encryption method and device for voice band compression system
RU2433547C1 (en) Method, apparatus and system for end-to-end encryption of voice data and transmission thereof over public communication networks
JP4229830B2 (en) Mobile phone system
KR100634495B1 (en) Wireless communication terminal having information secure function and method therefor
KR100408516B1 (en) Terminal for secure communication in CDMA system and methods for transmitting information using encryption and receiving information using decryption
KR100633391B1 (en) Codec transparent analog scramble encryption and decryption device and method
KR20040059146A (en) The encrypting device for voice signals and the encrypting method for voice signals
KR0157666B1 (en) Audio scramble system, audio scramble apparatus and audio descramble apparatus
KR20120116137A (en) Apparatus for voice communication and method thereof
Grier et al. ETERNAL: Encrypted Transmission With an Error-correcting, Real-time, Noise-resilient Apparatus on Lightweight Devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07858336

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07858336

Country of ref document: EP

Kind code of ref document: A1