GB2585334A - Method for obscuring or encrypting a voice recording - Google Patents

Publication number
GB2585334A
Authority
GB
United Kingdom
Prior art keywords
recording
word
processing
speech
compressed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1902423.1A
Other versions
GB201902423D0 (en)
Inventor
Coker Tim
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-02-22
Filing date
2019-02-22
Publication date
2021-01-13
Application filed by Individual
Priority to GB1902423.1A
Publication of GB201902423D0
Publication of GB2585334A
Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04K SECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K1/00 Secret communication
    • H04K1/06 Secret communication by transmitting the information or elements thereof at unnatural speeds or in jumbled order or backwards
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G10L15/05 Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

A speech recording is obfuscated by processing it with a speech recognition means, identifying individual spoken words, extracting those words, reversing them, and then re-inserting each reversed word at the position in the recording from which it was extracted. The reinsertion step may be done so as to neither overwrite unprocessed data nor create new gaps in the modified recording. A word table may be generated (Fig. 4). The method may take place substantially in real-time. The method may be used to obscure an actor’s speech, for example in either a motion picture or in a theatre production, with the word table used to generate subtitles. This method allows a recording to be rendered unintelligible by reversing the time order in which each word is stored whilst retaining the word order in the recording. The original recording may be recovered by using information about the word breaks.

Description

Method for obscuring or encrypting a voice recording
This invention relates to a method of obscuring or encrypting a voice message or spoken word recording.
Voice messages and recordings are a common form of communication, for example voice-mail used in modern telephony, but more generally the spoken word has been recorded since the 19th century. Initially this was done using analogue methods but it is now invariably done with electronic and digital methods. Other examples of the application of voice recordings include the sound track to motion pictures, radio productions (that are not live) and audio books. In all cases the recorded form will consist of a series of spoken words that together form a speech, utterance or perhaps simply information in its most general sense. In some cases the recording may also include backing music or sound effects, but these are not an essential part of the invention.
In certain cases it may be advantageous or necessary to obscure or even encrypt such a voice recording, for security or possibly just to defeat casual eavesdropping. This can be done in many ways which will be familiar to those skilled in the art, for example by applying an encryption key to the digital message.
One method of obscuring a voice recording is to simply reverse the time order in which the recording is stored; this has the effect that, when played back, the recording becomes completely unintelligible. Note that with this method both the individual words and the sequence in which they are stored are reversed. However, this method has the disadvantage that the original message can be very easily restored by simply reversing the order of the stored message a second time, noting that no additional information is required to achieve this. The invention presently disclosed improves on this simple technique by reversing the time order in which each word is stored but retaining the order of the sequence of words in the recording. Now, when played back, the modified recording is still unintelligible, but the original version of the message cannot be restored by simply reversing the whole message. The original message can now only be fully restored if the restoration means also has the information regarding the positions in time of the word breaks in the original message. This requirement for additional information to recover the original form of the message means that the process is actually a form of encryption, and the information about the word breaks is equivalent to an encryption key.
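By way of illustration only, the difference between the two approaches can be sketched as follows. Letters stand in for audio samples and spaces for pauses; the function names and the toy word boundaries are assumptions made for this example and do not appear in the specification.

```python
def reverse_whole(samples):
    """Prior-art approach: reverse the entire recording."""
    return samples[::-1]


def reverse_each_word(samples, boundaries):
    """Reverse each word span in place while keeping the word order.

    `boundaries` is a list of half-open (start, end) index pairs, one per
    word. Samples outside every span (for example pauses) are untouched.
    """
    out = list(samples)
    for start, end in boundaries:
        out[start:end] = out[start:end][::-1]
    return out


# Toy "recording": letters stand in for samples, spaces for pauses.
original = list("the cat sat")
boundaries = [(0, 3), (4, 7), (8, 11)]  # hypothetical word breaks (the "key")

obscured = reverse_each_word(original, boundaries)
whole_reversed = reverse_whole(obscured)
restored = reverse_each_word(obscured, boundaries)

print("".join(obscured))        # eht tac tas
print("".join(whole_reversed))  # sat cat the
print("".join(restored))        # the cat sat
```

In this toy case, reversing the whole modified recording does not give back the original sequence, whereas applying the per-word reversal again with the same boundaries does.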
The invention will now be described, purely by way of example, with reference to the accompanying Figures in which:
Figure 1 shows a breakdown of the steps required in order to obscure or encrypt a voice message according to the invention.
Figure 2 shows Steps 2 and 3 of the process in more detail.
Figure 3 indicates how the order of words in the message is retained.
Figure 4 shows an example of a Word Table.
In one embodiment of the invention the process has four steps which are those shown in Figure 1. First the actual voice recording is digitised (Step 1, although this aspect may be carried out by a separate means), using any method known in the art, for example sampling the recorded waveform at a suitable frequency (16kHz is adequate for simple speech) and then digitising each sample, typically with 16 bit precision. These digitised samples can then be packed together in a digital file. By way of example, the ".wav" file format describes how uncompressed, digitised sound can be recorded in this way.
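As a minimal sketch of Step 1, assuming Python's standard `wave` module and a synthetic sine tone standing in for real speech, the sampling parameters mentioned above (16 kHz, 16 bit) can be packed into a ".wav" container as follows. The helper names `tone` and `write_wav` are hypothetical.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # Hz, the rate the text calls adequate for simple speech
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16 bit precision


def tone(duration_s, freq_hz=440.0):
    """Return signed 16 bit samples of a sine tone (a stand-in for speech)."""
    n = int(duration_s * SAMPLE_RATE)
    return [
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]


def write_wav(fileobj, samples):
    """Pack samples as little-endian signed 16 bit mono PCM in a WAV file."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))


# Write a quarter of a second of audio to an in-memory file, then read the
# header back to confirm the stored parameters.
buf = io.BytesIO()
write_wav(buf, tone(0.25))

buf.seek(0)
with wave.open(buf, "rb") as r:
    info = (r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes())

print(info)  # (1, 2, 16000, 4000)
```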
In Step 2, the digitised recording is processed using any suitable speech recognition means, for example CMUSphinx (https://cmusphinx.github.io), an Open Source speech recognition library. One function available in CMUSphinx is a means to iterate through a voice message or digitised speech recording and to output the start and finish times of each word as it is recognised. With this information the individual words can be extracted from the recording and then individually reversed (Step 3). Once the whole recording has been processed in this way, it is stored (Step 4) as an encrypted or obscured message, using any suitable storage means. Note that this storage step is not essential to the invention; for example, the modified recording could be broadcast and then discarded.
Figure 2 shows more detail for Steps 2 and 3. For Step 2, Step 2a is the processing referred to above, which has as its output the start/finish time within the recording of individual words (and optionally, the text form of the word itself). In Step 2b, this start/finish time is used to extract the individual word from the recording; said extracted word can then be reversed (Step 3a) and then re-inserted into the recording in place of the original word (Step 3b). An optional output of Step 2 is a "Word Table" (Step 2c), indicating, for each word recognised, the start/finish time within the recording. If this output is stored separately to the modified recording it becomes an encryption key that can be used to restore the original from a modified recording.
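Steps 2b to 3b can be sketched as below for the uncompressed case. This is an illustration under stated assumptions rather than the method as implemented: the word times are supplied by hand instead of by a speech recogniser, the `WordEntry` type and `obscure` function are hypothetical names, and the sample rate of 16 kHz is taken from the earlier example.

```python
from dataclasses import dataclass

SAMPLE_RATE = 16_000  # Hz; assumed, matching the uncompressed example above


@dataclass
class WordEntry:
    """One row of a word table: start/finish time and optional text."""
    start_s: float
    end_s: float
    text: str = ""


def obscure(samples, word_table, sample_rate=SAMPLE_RATE):
    """Reverse the samples of each word in place (Steps 2b, 3a and 3b)."""
    out = list(samples)
    for entry in word_table:
        a = round(entry.start_s * sample_rate)  # first sample of the word
        b = round(entry.end_s * sample_rate)    # one past the last sample
        out[a:b] = out[a:b][::-1]               # reverse, then re-insert
    return out


# One second of dummy samples; each value equals its own index so that the
# effect of the reversal is easy to inspect.
samples = list(range(SAMPLE_RATE))

# Hypothetical recogniser output (Step 2a / 2c).
word_table = [WordEntry(0.10, 0.35, "hello"), WordEntry(0.50, 0.90, "world")]

modified = obscure(samples, word_table)

print(len(modified) == len(samples))  # True: length is unchanged
print(modified[1600], modified[5599])  # 5599 1600: first word is mirrored
```

Because each span is replaced by a span of identical length, the modified recording has exactly as many samples as the original.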
In Figure 3 we show, diagrammatically, how a recorded message is processed. The reversed spelling of each word is meant to indicate how the individual words are reversed but the order of words within the whole is retained.
Figure 4 is an example of a word table that might be produced at Step 2c. If this is transmitted to a recipient along with the modified recording, then the original recording can be reconstructed using the following process:
A processing means scans through the recording to the point indicated by the first entry in the word table.
An extract of the recording, starting and finishing at the times indicated for the said first entry, is made.
This extract is reversed, then re-inserted into the recording starting at the start time of the said first entry.
This process is repeated for each subsequent entry in the said table.
Where necessary, the extracted and/or reversed data would be decompressed and optionally re-compressed, with the same regard for data size as in the encryption methods (see below).
Note that, if the word table is transmitted to a recipient as an encryption key, then, clearly, the text form of the actual words is not included.
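The reconstruction process described above can be sketched as follows for uncompressed data. Sample indices are used in place of times for brevity, and the names `reverse_spans`, `word_table`, `key` and `wrong_key` are illustrative assumptions. The sketch also shows the text column being dropped before the table is used as a key.

```python
def reverse_spans(samples, spans):
    """Reverse each half-open (start, end) span of `samples` in place.

    The same routine serves for obscuring and for restoring, because
    reversing a span twice returns it to its original order.
    """
    out = list(samples)
    for start, end in spans:
        out[start:end] = out[start:end][::-1]
    return out


original = list(range(20))

# Hypothetical word table with text; the key keeps only the positions.
word_table = [(2, 7, "one"), (9, 15, "two")]
key = [(start, end) for start, end, _text in word_table]

modified = reverse_spans(original, key)
restored = reverse_spans(modified, key)

# A recipient guessing different word breaks does not recover the original.
wrong_key = [(0, 10), (10, 20)]
wrong_attempt = reverse_spans(modified, wrong_key)

print(restored == original)       # True
print(wrong_attempt == original)  # False
```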
In a second embodiment of the invention, the recording might be compressed after it is sampled and digitised, for example according to the well known mp3 format. In this embodiment, the extracted word would need to be de-compressed before being reversed, and then re-compressed before being re-inserted into the message. The means to achieve this would also need to take account of the form of the compression, in that the compressed recording has something of a "frame" structure (unlike the uncompressed form which is effectively a continuous stream of samples without internal structure), thus when a word is extracted you would need to extract between frame boundaries in order to ensure that sufficient information is extracted to allow the extracted word to be de-compressed properly. Once decompressed the word is reversed then re-compressed before being re-inserted into the original.
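The frame-boundary arithmetic can be sketched as follows. This is only the index calculation, under the assumption of a fixed 1152 samples per frame (the figure for MPEG-1 Layer III; other codecs and modes differ). The function name is hypothetical, and a real mp3 implementation would also have to deal with codec details such as the bit reservoir and decoder delay, which this sketch ignores.

```python
FRAME_SAMPLES = 1152  # samples per frame, assumed (MPEG-1 Layer III)


def frame_aligned_span(start_s, end_s, sample_rate, frame_samples=FRAME_SAMPLES):
    """Widen a word's time span outwards to whole-frame boundaries.

    Returns (first_frame, end_frame, offset, length) where frames
    first_frame..end_frame-1 must be decoded, and the word occupies
    `length` samples starting `offset` samples into the decoded block.
    """
    start_sample = round(start_s * sample_rate)
    end_sample = round(end_s * sample_rate)
    first_frame = start_sample // frame_samples
    end_frame = -(-end_sample // frame_samples)  # integer ceiling, exclusive
    offset = start_sample - first_frame * frame_samples
    return first_frame, end_frame, offset, end_sample - start_sample


# A word from 0.50 s to 0.80 s in a 44.1 kHz recording.
span = frame_aligned_span(0.50, 0.80, 44_100)
print(span)  # (19, 31, 162, 13230)
```

Here frames 19 to 30 inclusive would be extracted and decompressed, and only the 13230 samples beginning at offset 162 within the decoded block would be reversed.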
Note that in the first embodiment, the file size of the modified recording should be exactly the same as the original, but in the second embodiment this is not necessarily the case, and the modified file could be the same size as the original, or smaller or indeed larger.
In a third embodiment, being a variation to the second embodiment, the reversed word is not re-compressed before being re-inserted, in which case the file size of the modified recording will definitely be larger than that of the original. In this case, whilst the method is different, the form of the modified recording will actually be the same as the first embodiment. In the cases where the reversed and possibly re-compressed word does not have the same data size as the extract it is replacing, when it is reinserted into the file (i.e. the recording) this must be done in such a way as not to over-write information that has not yet been processed or create gaps in the recording that were not present in the original. Means to do this will be known to those skilled in the art.
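One possible means, offered as an assumption rather than as the approach intended by the specification, is to build the modified recording in a new buffer instead of writing back into the original. Untouched data is copied verbatim and each replacement is appended at whatever size it happens to have, so nothing unprocessed is over-written and no gap appears. The `splice` function and the byte strings below are hypothetical.

```python
def splice(recording, replacements):
    """Build a new recording with some spans replaced by data of any size.

    `replacements` is a list of (start, end, new_bytes) tuples whose
    offsets refer to the ORIGINAL recording and which are sorted and
    non-overlapping.
    """
    out = bytearray()
    cursor = 0  # position in the original up to which data has been copied
    for start, end, new_bytes in replacements:
        if start < cursor or end < start:
            raise ValueError("spans must be sorted and non-overlapping")
        out += recording[cursor:start]  # unprocessed data, copied verbatim
        out += new_bytes                # replacement may be shorter or longer
        cursor = end
    out += recording[cursor:]           # tail after the last replaced span
    return bytes(out)


recording = b"AAAA1111BBBB22CCCC"
replacements = [(4, 8, b"xx"), (12, 14, b"yyyyyy")]

result = splice(recording, replacements)
print(result)  # b'AAAAxxBBBByyyyyyCCCC'
```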
In a fourth embodiment, wherein the original recording is compressed, the whole recording is decompressed before the word recognition and extraction/reversion/re-insertion process is carried out, but the subsequent file is not re-compressed. A fifth embodiment, being a minor variation to the fourth embodiment, would be to re-compress the recording after the extraction/reversion process has been carried out. A sixth embodiment is a variation to the first embodiment wherein the recording is compressed after being processed.
In terms of security it should be noted that the second, fifth and sixth embodiments are much more secure than any of the first, third or fourth embodiments. This is because a compressed recording cannot be simply reversed in order to play it back, unlike the other embodiments, where it is feasible to simply reverse a modified recording, although the information in the modified/reversed recording is still likely to be obscured or the speech unintelligible.
In a second application of the invention, which may use any of the six embodiments thus far described although methods that don't use compression will be more straightforward, the invention is used to provide a sound effect that can be referred to as "alien translation". This could be required for a science-fiction film or theatre production, where actors are required to portray aliens who would not be talking in English. In this case the method described herein has the advantage over prior art methods in that the rhythm of the spoken words is retained whilst the sound of each word is rendered unintelligible. This is because each word is reversed individually and not the order of words in the whole, thus the rhythm is retained and as a consequence the overall effect is more believable than simply reversing everything.
This application of the invention has the further advantage that the word table produced at Step 2c can be readily and immediately used to provide sub-titles such that the audience will be able to follow what the "alien" is saying. In the case of a motion picture, the word table information can be used to create a subtitle track to the film in the standard way. However for a theatre production, the word table would be used to display sub-title like information to the audience on additional screens. As an additional advantage to the invention, the method clearly works "on-the-fly" meaning that the subtitle for each word is created in close to real time, whereas a prior art method, based on reversing the whole recording, would neither broadcast the actor's alien words, nor the subtitles, to the audience in anything like real time, thus spoiling the overall effect.
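As an illustration of how a word table might feed a subtitle track, the sketch below emits one cue per word in the SubRip (.srt) text layout. The choice of SubRip is an assumption, since the specification refers only to creating subtitles "in the standard way", and the function names and sample table are hypothetical.

```python
def srt_time(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (SubRip style)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def word_table_to_srt(word_table):
    """Turn (start_s, end_s, text) rows into numbered subtitle cues."""
    cues = []
    for index, (start_s, end_s, text) in enumerate(word_table, start=1):
        cues.append(f"{index}\n{srt_time(start_s)} --> {srt_time(end_s)}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word table as produced at Step 2c, including the text form.
word_table = [(0.10, 0.35, "hello"), (0.50, 0.90, "world")]

srt = word_table_to_srt(word_table)
print(srt)
```

In practice several words would usually be grouped into one cue for readability; one cue per word is used here only to keep the mapping from the word table obvious.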
Note that throughout this specification, the term "recording" is used to refer to a voice message or speech that has been processed according to the invention or is undergoing such processing or is about to undergo such processing. It should be understood that the term "recording" does not necessarily imply that the result of the method being applied is used to create a permanent or substantially permanent copy of the speech or message. For example the modified "recording" could be broadcast to an audience (as in the second application) and then discarded creating no permanent copy. Nor should it be inferred that the input to the first processing step is required to be a recording rather than, for example, the live output of a microphone, or any other form of data stream. It should also be understood that as the method is followed, there will exist within the processing means what will in effect be interim copies of the voice message or speech, and these should all be considered as forms of a recording, according to the invention, and interpreted accordingly.
Further note that, whilst the speech recognition aspect of the method (Step 2a) can be accomplished with high accuracy, the method is not itself dependent on achieving 100% accuracy in this step. For the second application of the invention (motion picture or theatre productions wherein the actor is talking from a script) accuracy is more important but the script itself can be used to substantially improve the accuracy of the speech recognition process should it be necessary.
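The specification does not say how the script would be used. One simple possibility, shown here purely as an assumption, is to snap each recognised word to its closest match in the script vocabulary using `difflib.get_close_matches` from the Python standard library. The function name `correct_with_script` and the example words are hypothetical.

```python
import difflib


def correct_with_script(recognised_words, script_words, cutoff=0.6):
    """Replace each recognised word by its closest script word, if any.

    Words with no sufficiently similar script word are left unchanged.
    """
    corrected = []
    for word in recognised_words:
        matches = difflib.get_close_matches(word, script_words, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


script_words = ["we", "come", "in", "peace"]
recognised_words = ["we", "come", "in", "piece"]  # one misrecognised word

corrected = correct_with_script(recognised_words, script_words)
print(corrected)  # ['we', 'come', 'in', 'peace']
```

Only the text column of the word table changes under this scheme; the start/finish times that drive the reversal are untouched.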

Claims (18)

  1. A method of encrypting or obscuring a digitally recorded voice message or passage of speech, including the following steps: processing the digitised message with a speech recognition means in order to identify the individual spoken words, and for each word identified, extracting it from the recording, and reversing the extracted word, and re-inserting it into the recording in place of the part that has been extracted.
  2. A method according to Claim 1, in which the original recording is compressed and including the additional steps: where each word is extracted, this is done so taking due account of the formatting of the compressed data, and each extracted word is decompressed before being reversed, and each reversed word is re-compressed before being re-inserted into the recording, wherein the re-insertion step is carried out so as to neither over-write unprocessed data nor create gaps in the modified recording that were not present in the original.
  3. A method according to Claim 1, in which the original recording is compressed and including the following additional steps: where each word is extracted, this is done so taking due account of the formatting of the compressed data, and each extracted word is decompressed before being reversed, and each reversed word is re-inserted into the recording without being re-compressed, wherein the re-insertion step is carried out so as to neither over-write unprocessed data nor create gaps in the modified recording that were not present in the original.
  4. A method of encrypting or obscuring a recorded voice message or passage of speech in which the recording is compressed and including the additional steps: de-compressing the entire recording, and processing the said de-compressed recording according to the method described in Claim 1.
  5. A method according to Claim 4 including the additional step of re-compressing the recording after processing.
  6. A method according to Claim 1 including the additional step of compressing the recording after processing.
  7. A method of encrypting or obscuring a recorded voice message or passage of speech according to any of Claims 1 to 6 including the additional step of generating a word table.
  8. A method to allow an actor in a motion picture to portray an alien not speaking in an intelligible language, including the steps of: recording the actor's voice, and processing said recording according to Claim 7, and using the processed recording in the sound track of the motion picture.
  9. A method according to Claim 8, including the additional step of using "word table" information derived from the method to create sub-titles for the motion picture.
  10. A method to allow an actor in a theatre production to portray an alien not speaking in an intelligible language, including the steps of: recording the actor's voice, and processing said recording according to Claim 7, broadcasting the processed recording to the audience wherein the processing is executed quickly enough that the overall effect happens substantially in real-time.
  11. A method according to Claim 10, including the additional step of using the "word table" information that can be derived from the method to create sub-titles for the theatre audience that can be displayed on suitably placed screens.
  12. A method according to any of Claims 8 to 11 wherein the accuracy of the processing step is improved by reference to the script the actor is talking from.
  13. A method for decrypting a recording encrypted using a method according to any previous Claim wherein a word table is generated, including the steps of: scanning through the recording to the point indicated in the first entry in the word table, and making an extract from the recording starting and finishing at the times indicated for this entry, and reversing this extract, and re-inserting the reversed extract into the recording starting at the start time of this entry, and repeating these steps for each subsequent entry in the table, wherein the re-insertion step is carried out so as to neither over-write unprocessed data nor create gaps in the modified recording that were not present in the original.
  14. A device for modifying or encrypting a voice message or passage of speech including means for: recording said speech, and sampling and digitising said recording, and processing said digital recording according to any of Claims 1 to 7.
  15. A device according to Claim 14 in which the processing means is fast enough to process the recording in substantially real-time.
  16. A device according to Claim 15 including further means for broadcasting said modified recording to an audience in such a way that it appears that an actor is speaking unintelligibly.
  17. A device according to Claim 14, 15 or 16 including further means to create sub-titles from word table information.
  18. A device according to Claim 17 including further means to display said sub-titles to a theatre audience.
GB1902423.1A 2019-02-22 2019-02-22 Method for obscuring or encrypting a voice recording Withdrawn GB2585334A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1902423.1A GB2585334A (en) 2019-02-22 2019-02-22 Method for obscuring or encrypting a voice recording

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1902423.1A GB2585334A (en) 2019-02-22 2019-02-22 Method for obscuring or encrypting a voice recording

Publications (2)

Publication Number Publication Date
GB201902423D0 GB201902423D0 (en) 2019-04-10
GB2585334A true GB2585334A (en) 2021-01-13

Family

ID=65999010

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1902423.1A Withdrawn GB2585334A (en) 2019-02-22 2019-02-22 Method for obscuring or encrypting a voice recording

Country Status (1)

Country Link
GB (1) GB2585334A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2386041A (en) * 2002-03-01 2003-09-03 Sensaura Ltd A method of modifying intermittent audio signals such as speech or musical notes in real time
US20050065778A1 (en) * 2003-09-24 2005-03-24 Mastrianni Steven J. Secure speech
US20080243492A1 (en) * 2006-09-07 2008-10-02 Yamaha Corporation Voice-scrambling-signal creation method and apparatus, and computer-readable storage medium therefor
US20120016665A1 (en) * 2007-03-22 2012-01-19 Yamaha Corporation Sound masking system and masking sound generation method

Also Published As

Publication number Publication date
GB201902423D0 (en) 2019-04-10

Similar Documents

Publication Publication Date Title
US9870799B1 (en) System and method for processing ancillary data associated with a video stream
US20080275700A1 (en) Method of and System for Modifying Messages
US7657428B2 (en) System and method for seamless switching of compressed audio streams
JP4117328B2 (en) An apparatus and method for detoxifying content including audio, moving images, and still images.
TW200407812A (en) Information storage medium containing subtitle data for multiple languages using text data and downloadable fonts and apparatus therefor
EP0390049A3 (en) Apparatus and method for digital data continuously input or output
EP1770704A2 (en) Data recording and reproducing apparatus, method, and program therefor
CN111970579A (en) Video music adaptation method and system based on AI video understanding
US8615153B2 (en) Multi-media data editing system, method and electronic device using same
KR20140141408A (en) Method of creating story book using video and subtitle information
GB2585334A (en) Method for obscuring or encrypting a voice recording
US20070061133A1 (en) Recording/reproduction apparatus and recording/reproduction method
CN104538048B (en) A kind of processing method and processing device of audio file
CN108391064A (en) A kind of video clipping method and device
KR20030029790A (en) Information signal edition apparatus, information signal edition method, and information signal edition program
JP2822940B2 (en) Video and audio data editing device
KR101709053B1 (en) Caption data structure and caption player for synchronizing syllables between a sound source and caption data
KR102523814B1 (en) Electronic apparatus that outputs subtitle on screen where video is played based on voice recognition and operating method thereof
JP2006510304A (en) Method and apparatus for selectable rate playback without speech distortion
JPS62180684A (en) Editing and presenting device for voice and image
KR102150639B1 (en) Device of audio data for verifying the integrity of digital data and Method of audio data for verifying the integrity of digital data
JP5223948B2 (en) Electronics
JP4529859B2 (en) Audio playback device
JP2005204003A (en) Continuous media data fast reproduction method, composite media data fast reproduction method, multichannel continuous media data fast reproduction method, video data fast reproduction method, continuous media data fast reproducing device, composite media data fast reproducing device, multichannel continuous media data fast reproducing device, video data fast reproducing device, program, and recording medium
WO2021109000A1 (en) Data processing method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)