WO2016062153A1

WO2016062153A1 - Method, system, and terminal for secure transmission of audio data

Info

Publication number: WO2016062153A1
Application number: PCT/CN2015/087245
Authority: WO
Inventors: 陈璐
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-10-24
Filing date: 2015-08-17
Publication date: 2016-04-28
Also published as: CN104393994B; CN104393994A

Abstract

Disclosed are a method, system, and terminal for secure transmission of audio data. The method comprises: at the side of a transmitting terminal, the terminal receives an audio recording request launched by a user via an application, encrypts audio data recorded from an audio source to produce encrypted audio data, and transmits the encrypted audio data to a receiving terminal via the application; and, at the side of the receiving terminal, the terminal receives the audio data transmitted by the transmitting terminal, when an audio recording playback request launched by the user via an application is received, the terminal decrypts the audio data and plays the decrypted audio for the user.

Description

Audio data security transmission method, system and terminal

Technical field

Embodiments of the present invention relate to, but are not limited to, the field of communications, and in particular, to a method, system, and terminal for secure transmission of audio data.

Background art

With the rapid development of intelligent terminals, users can already chat by real-time voice or by sending voice clips or recording files instead of typing text manually. One type of chat software saves the user the trouble of typing: the user's voice is recorded and sent to the peer mobile phone, where it is played and listened to. Because this approach uses little network traffic and requires no typing, and because the effect is similar to a call while the message can also be listened to, it is very popular with users. Another type, network real-time calling software, cuts the user's voice into short voice segments and sends them to the other party at small time intervals, so that the call sounds continuous to the user.

In the related art, recordings are recorded and played by the bottom-layer framework of the terminal at the request of the upper-layer application. The terminal generates recordings in two ways: a recording service and an audio service. The first type of chat software described above uses the recording service, and the second type, network real-time calling software, uses the audio service.

In the recording service mode, the upper-layer application specifies the storage path of the recording file and then requests the underlying framework of the terminal to start recording. The recording file generated by the underlying framework is saved at the specified path in the file system, and the upper-layer application then obtains the recording file from the file system. FIG. 1 is a schematic diagram of the "recording service" in the underlying framework of a terminal in the related art providing a recording service for an upper-layer application. As shown in FIG. 1, an upper-layer application on terminal A (for example, the "press and hold" function of WeChat) sends a request to the recording service module. The request carries the specified recording file storage path and asks the recording service module to record and to store the recording file at that path in the file system. After the recording ends, the upper-layer application obtains the recording file from the file system and transmits it over the network to the network server. The network server stores the recording file in its local storage and then transmits it over the network to the upper-layer application on destination terminal B (for example, the "press and hold" function of WeChat). The upper-layer application on B stores the recording file in the file system and then issues a play request to the recording service module, which reads the recording file from the file system and plays it.

The recording file in the recording service is a complete audio file, whereas the audio service handles only an original audio data segment: it contains just a few speech frames, is small, has no file header, and is not a complete audio file. The original audio data segment is obtained through a system interface from a sound source such as a MIC. FIG. 2 is a schematic diagram of the "audio service" in the underlying framework of a terminal in the related art providing a recording service for an upper-layer application.

As shown in FIG. 2, the upper-layer application on terminal A (for example, the "real-time intercom" function of WeChat) acquires from the audio service module the original audio data segment recorded from the specified sound source and transmits it over the network to the network server. The network server uses its local storage device to store and retrieve the original audio data segment and then transmits it over the network to the upper-layer application on destination terminal B (for example, the "real-time intercom" function of WeChat). The upper-layer application on B sends the original audio data segment to the audio service module for playback.

The above chat methods carry security risks. Such recording files or audio data segments are generally stored on the SD card, and a thief can learn the content of the user's chats by copying the files directly from the SD card and playing them (for example, the thief disguises a Trojan as game software for the user to use, and the built-in Trojan secretly scans all *.amr files on the SD card in the background, packages them, and sends them to the thief's own server). If these recordings contain the user's private information, the user is exposed to considerable danger.

At one time, it was thought safe to transmit such audio data through upper-layer applications and network servers, and some third-party applications claimed that their way of transmitting information was secure and encrypted from the client to their servers. However, after Snowden exposed large-scale surveillance by the US government, people discovered that the network is very insecure as a transmission channel for information, and that users' private information may still leak on the server: for example, it may be obtained by hackers who break into the server, or it may be obtained and used directly from the server by an untrustworthy third-party company. Therefore, the intermediate transmission channel through which a third-party application transmits data cannot be trusted, is unsafe, and cannot satisfy users' privacy protection needs.

Therefore, there is a need for a secure audio data delivery method that can be trusted by users, independent of any third-party application, and capable of meeting the privacy needs of users.

Summary of the invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

The embodiments of the invention provide a method, a system and a terminal for securely transmitting audio data that are independent of any application, prevent leakage by the application, offer high security and reliability, and meet users' privacy protection needs.

The embodiment of the invention provides a method for securely transmitting audio data, including:

The terminal receives a recording request initiated by the user through an application;

The audio data recorded from the sound source is encrypted to generate encrypted audio data, and the encrypted audio data is transmitted to the receiving terminal by the application.

Optionally, wherein the encrypting the audio data recorded from the sound source to generate the encrypted audio data comprises:

Prompting the terminal user to input a recording key and receiving the recording key input by the terminal user, or acquiring a preset recording key stored locally;

Encrypting the voice frames of the recorded audio data by using the recording key input by the terminal user or the locally stored preset recording key, and then inserting one or more feature speech frames to generate the final encrypted audio data; or encrypting the voice frames of the recorded audio data by using the recording key input by the terminal user or the locally stored preset recording key, and inserting one or more feature speech frames before the voice frames of the encrypted audio data to generate the final encrypted audio data.

Optionally, before the encrypting the audio data recorded from the audio source to generate the encrypted audio data, the method further includes:

Turn on the security mode to trigger the encryption of the audio data recorded from the source to generate encrypted audio data.

Optionally, wherein the audio data comprises a recording file and a piece of original audio data.

The embodiment of the invention further provides a method for securely transmitting audio data, including:

The terminal receives the audio data sent by the sending terminal;

After receiving a recording play request initiated by the user through an application, the terminal decrypts the audio data and plays the decrypted audio to the user.

Optionally, the audio data sent by the sending terminal includes: a voice frame of the encrypted audio data and one or more feature voice frames;

The terminal decrypting the audio data after receiving the recording play request initiated by the user through the application includes:

After receiving the recording play request initiated by the user through the application, the terminal identifies the audio data sent by the sending terminal and, after identifying the one or more feature voice frames, prompts the terminal user to input the recording key and receives the recording key input by the terminal user, or, after identifying the one or more feature speech frames, acquires a preset recording key stored locally;

The voice frames of the encrypted audio data are decrypted by using the recording key input by the terminal user or the locally stored preset recording key.

Optionally, after the terminal receives the audio data sent by the sending terminal, the method further includes:

Identifying audio data sent by the transmitting terminal, and when the one or more feature speech frames are identified, the security mode is turned on.

The embodiment of the invention further provides a sending terminal, including:

The first application module is configured to receive a recording request initiated by the user, and send the encrypted audio data sent by the recording module to the receiving terminal;

The recording module is configured to encrypt the audio data recorded from the sound source to generate encrypted audio data, and send the encrypted audio data to the first application module.

Optionally, the recording module encrypts the audio data recorded by the audio source to generate encrypted audio data, including:

The recording module prompts the terminal user to input a recording key, and receives the recording key input by the terminal user, or obtains a preset recording key stored locally;

The voice frames of the recorded audio data are encrypted by using the recording key input by the terminal user or the locally stored preset recording key, and one or more feature speech frames are then inserted to generate the final encrypted audio data; or the voice frames of the recorded audio data are encrypted by using the recording key input by the terminal user or the locally stored preset recording key, and one or more feature speech frames are inserted before the voice frames of the encrypted audio data to generate the final encrypted audio data.

Optionally, the sending terminal further includes a first security mode opening module connected to the recording module, configured to: before the audio data recorded from the sound source is encrypted to generate the encrypted audio data, turn on the security mode and trigger the recording module to encrypt the audio data recorded from the sound source to generate the encrypted audio data;

The recording module is further configured to: after being triggered by the first security mode opening module, encrypt the audio data recorded from the sound source to generate encrypted audio data.

The embodiment of the invention further provides a receiving terminal, including:

The second application module is configured to receive audio data sent by the sending terminal, and is further configured to receive a recording play request initiated by the user;

The recording play module is configured to decrypt the audio data after receiving the recording play request, and play the decrypted audio to the user.

After receiving the recording play request, the recording and playing module decrypts the audio data, including:

After receiving the recording and playing request, the recording and playing module identifies the audio data sent by the sending terminal, and after identifying the one or more characteristic voice frames, prompts the terminal user to input the recording key, and receives the The recording key input by the terminal user, or after identifying the one or more feature speech frames, acquiring a preset recording key stored locally;

Optionally, the receiving terminal further includes a second security mode opening module connected to the second application module and the recording play module, configured to: after the second application module receives the audio data sent by the sending terminal, identify the audio data sent by the sending terminal and, when the one or more feature speech frames are identified, turn on the security mode and trigger the recording play module to start.

The embodiment of the invention further provides an audio data security delivery system, comprising: the sending terminal and the receiving terminal as described above.

The embodiment of the invention further provides a computer readable storage medium storing program instructions which, when executed, implement the method described above.

Compared with the related art, in the method, system and terminal for secure transmission of audio data provided by the embodiments of the present invention, the audio data to be transmitted is encrypted or decrypted by the bottom-layer module of the terminal, which prevents the recording content from being stolen from the SD card. The application serves only as the transmission channel for the encrypted audio data, which is ciphertext to the application, so leakage by the application is prevented, security and reliability are high, and users' privacy protection needs are met.

Other aspects will be apparent upon reading and understanding the drawings and detailed description.

Brief description of the drawings

FIG. 1 is a schematic diagram of a “recording service” in an underlying framework of a terminal in the related art providing a recording service for an upper layer application;

FIG. 2 is a schematic diagram of an “audio service” in an underlying framework of a terminal in the related art providing a recording service for an upper layer application;

FIG. 3 is a structural diagram of a transmitting terminal in an embodiment;

FIG. 4 is a structural diagram of a receiving terminal in an embodiment;

FIG. 5 is a flowchart of a method for securely transmitting audio data in an embodiment;

FIG. 6 is a schematic view of a "hard" switch and a "soft" switch in the embodiment;

FIG. 7 is a flowchart of a method for securely transmitting audio data in an embodiment;

FIG. 8 is a schematic diagram of a “recording service” in an underlying framework of a terminal in an application example providing a recording service for an upper layer application;

FIG. 9 is a schematic diagram of an "audio service" in an underlying framework of a terminal in an application example providing an audio service for an upper layer application;

FIG. 10 is a schematic diagram of inserting a feature speech frame into a recording file format in an application example;

FIG. 11 is a schematic diagram of inserting a distinguishable feature into an original audio data segment in an application example.

Embodiments of the invention

The embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the features in the embodiments and the embodiments in the present application may be arbitrarily combined with each other.

Example:

As can be seen from the background art, recordings are recorded and played by the underlying framework of the terminal at the request of the upper-layer application. If the terminal encrypts the recorded audio data, the recording can be played only with the correct password; if no password or a wrong password is entered, only noise is heard (including when the file is copied to a PC and forcibly played with a music player). In this way, the privacy of the user's voice is protected.

The present embodiment provides an audio data security delivery system, including a transmitting terminal and a receiving terminal, where, as shown in FIG. 3, the sending terminal includes:

The first application module 301 is configured to receive a recording request initiated by the user, and send the encrypted audio data sent by the recording module 302 to the receiving terminal;

The recording module 302 is configured to encrypt the audio data recorded from the sound source to generate encrypted audio data, and send the encrypted audio data to the first application module 301.

The recording module 302 is disposed in the bottom layer of the terminal. The first application module 301 refers to the application itself, and may be an application that is provided by the terminal itself, or may be a third-party application downloaded by the user, such as WeChat, QQ, and the like.

In an optional manner, the recording module 302 encrypts the audio data recorded by the audio source to generate encrypted audio data, including:

The recording module 302 prompts the terminal user to input a recording key, and receives the recording key input by the terminal user, or obtains a preset recording key stored locally;

The voice frames of the recorded audio data are encrypted by using the recording key input by the terminal user or the locally stored preset recording key, and one or more feature speech frames are then inserted to generate the final encrypted audio data.

Optionally, the one or more feature speech frames may be inserted at any position of the speech frame of the encrypted audio data, for example, before the speech frame of the encrypted audio data, at the middle or at the end of the speech frame.

In an optional manner, the sending terminal further includes a first security mode opening module 303 connected to the recording module 302, configured to turn on the security mode before the audio data recorded from the audio source is encrypted to generate encrypted audio data, and to trigger the recording module 302 to encrypt the audio data recorded from the sound source to generate encrypted audio data;

The recording module 302 is further configured to encrypt the audio data recorded from the sound source to generate encrypted audio data after being triggered by the first security mode opening module 303.

The audio data includes a recording file and a piece of original audio data.

As shown in FIG. 4, the receiving terminal includes:

The second application module 401 is configured to receive audio data sent by the sending terminal, and is further configured to receive a recording play request initiated by the user;

The recording play module 402 is configured to decrypt the audio data after receiving the recording play request, and play the decrypted audio to the user.

The recording and playing module 402 is disposed in the bottom layer of the terminal. The second application module 401 refers to the application itself, and may be an application that is provided by the terminal itself, or may be a third-party application downloaded by the user, such as WeChat, QQ, and the like.

The audio data sent by the sending terminal includes: a voice frame of the encrypted audio data and one or more feature speech frames;

After receiving the recording play request, the recording and playing module 402 decrypts the audio data, including:

After receiving the recording play request, the recording and playing module 402 identifies the audio data sent by the sending terminal, and after identifying the one or more feature voice frames, prompts the terminal user to input the recording key, and Receiving the recording key input by the terminal user, or after identifying the one or more feature speech frames, acquiring a preset recording key stored locally;

The receiving terminal further includes a second security mode opening module 403 connected to the second application module 401 and the recording and playing module 402, configured to: after the second application module 401 receives the audio data sent by the sending terminal, identify the audio data sent by the sending terminal and, when the one or more feature voice frames are identified, turn on the security mode and trigger the recording and playing module 402 to start.

The audio data includes a recording file and a piece of original audio data.

As shown in FIG. 5, this embodiment provides a method for securely transmitting audio data, including the following steps:

S101: The terminal receives a recording request initiated by the user through an application.

S102: Encrypt the audio data recorded by the audio source to generate encrypted audio data.

In an optional manner, the audio data recorded from the audio source is encrypted to generate encrypted audio data, including:

The voice frames of the recorded audio data are encrypted by using a recording key input by the terminal user or the locally stored preset recording key, and one or more feature speech frames are then inserted to generate the final encrypted audio data.

If the user is recording for himself or herself, the user needs to remember the recording key; if the recording is made for others (as with recording chat software), the sending and receiving parties should agree on the recording key in advance by some other means, for example, verbally.

Optionally, the one or more feature speech frames may be inserted before the voice frames of the encrypted audio data to generate the final encrypted audio data; of course, the one or more feature speech frames may also be inserted in the middle of or after the voice frames of the encrypted audio data.

The audio data includes a recording file and an original audio data segment. This embodiment likewise covers the two recording generation methods, the recording service and the audio service: the recording service is for recording files, and the audio service is for original audio data segments. When the audio data is a recording file, the recording request carries the storage path of the recording file.

In this embodiment, the encrypted recording file or original audio data segment keeps the normal audio file format, but some distinguishable features, i.e., one or more feature speech frames, are inserted into it. Because the normal audio file format is kept, an ordinary music player can still play the data, but only noise is played without the key, which lets the user understand intuitively that the recording method is secure. In addition, the receiver can use these feature speech frames to recognize that the recording file or original audio data segment is encrypted.

The feature speech frame differs from the voice frames of audio data recorded when an ordinary user chats. Take an amr file with a bit rate of 12.2 kbps as an example: each voice frame is 32 bytes long, of which the first byte is the frame header, fixed at 0x3c, and the remaining 31 bytes are real voice data. Suppose the feature speech frame is defined as the header 0x3c followed by 31 bytes that are all 0x01. Its characteristic is then very obvious, because recording in the real world cannot produce such a speech frame. The above is an example of inserting a single feature speech frame.
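The single-frame marker in this example can be sketched in Python as follows. This is only an illustration under the example's own figures (32-byte frames, header 0x3c, payload of 0x01 bytes); the function names are hypothetical, and the sketch works on raw frame data and ignores the file header that a complete recording file carries.

```python
from typing import List

FRAME_LEN = 32        # bytes per speech frame in the 12.2 kbps amr example
FRAME_HEADER = 0x3C   # fixed frame header byte from the example
# Feature speech frame from the example: header followed by 31 bytes of 0x01.
FEATURE_FRAME = bytes([FRAME_HEADER]) + b"\x01" * (FRAME_LEN - 1)


def split_frames(data: bytes) -> List[bytes]:
    """Split raw frame data into fixed-size speech frames."""
    if len(data) % FRAME_LEN != 0:
        raise ValueError("data is not a whole number of speech frames")
    return [data[i:i + FRAME_LEN] for i in range(0, len(data), FRAME_LEN)]


def feature_frame_positions(data: bytes) -> List[int]:
    """Return the indices of frames that match the feature speech frame."""
    return [i for i, frame in enumerate(split_frames(data))
            if frame == FEATURE_FRAME]
```

A receiver could treat a non-empty result as the sign that the data is encrypted; where in the sequence the marker is expected is left open here, as in the text.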

When a plurality of feature speech frames are inserted, it is not necessary to construct a speech frame that real-world recording cannot produce; instead, the sequence relationship among the feature speech frames can be used. For example, when a pre-recorded sequence of speech frames is inserted as feature speech frames in front of the voice frames of the encrypted audio data, that sequence is a very obvious sign of encrypted audio data, because recording in the real world is unlikely to produce exactly the same sequence of speech frames.

In one application example, one feature speech frame can be inserted at the beginning of the speech frame sequence of the recording file or original audio data segment. The playback time of this single feature speech frame is extremely short (for amr, one frame is 20 ms), so the human ear cannot distinguish it, but a machine can recognize it.
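The sending-side processing for this first method might look like the sketch below: each frame keeps its header byte so the data still parses as normal audio, only the 31 payload bytes are encrypted, and one feature speech frame is placed at the beginning. The embodiment does not name a cipher, so the SHA-256-derived XOR keystream used here is purely an assumed stand-in for illustration and is not a recommendation; `protect`, `unprotect` and `xor_payloads` are hypothetical names.

```python
import hashlib

FRAME_LEN = 32
FRAME_HEADER = 0x3C
FEATURE_FRAME = bytes([FRAME_HEADER]) + b"\x01" * (FRAME_LEN - 1)


def _keystream(key: bytes, frame_index: int, length: int) -> bytes:
    """Assumed stand-in keystream: SHA-256 over the key and frame index."""
    digest = hashlib.sha256(key + frame_index.to_bytes(8, "big")).digest()
    return digest[:length]


def xor_payloads(frames: bytes, key: bytes) -> bytes:
    """XOR each frame's payload with the keystream, leaving headers intact.

    XOR is symmetric, so the same function encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(frames), FRAME_LEN):
        frame = frames[offset:offset + FRAME_LEN]
        stream = _keystream(key, offset // FRAME_LEN, len(frame) - 1)
        out.append(frame[0])  # frame header stays readable
        out.extend(b ^ s for b, s in zip(frame[1:], stream))
    return bytes(out)


def protect(frames: bytes, key: bytes) -> bytes:
    """Encrypt the speech frames and prepend one feature speech frame."""
    return FEATURE_FRAME + xor_payloads(frames, key)


def unprotect(data: bytes, key: bytes) -> bytes:
    """Strip the leading feature frame, if present, and decrypt the rest."""
    if not data.startswith(FEATURE_FRAME):
        return data  # not marked as encrypted
    return xor_payloads(data[FRAME_LEN:], key)
```

With the wrong key, `unprotect` still returns well-formed frames, but their payloads are garbage, which matches the behaviour described in the text: a player can play the data but only noise is heard.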

In another application example, a special segment of speech frames, that is, a plurality of feature speech frames, can be inserted at the beginning of the speech frame sequence of the recording file (i.e., a recording is inserted). This recording is a spoken notice that the recording file is encrypted, and the human ear can distinguish it.

For example, a pre-recorded recording saying “palm whisper recording file” is inserted at the beginning of the speech frame sequence of the recording file (for amr, 50 frames are inserted, for a total of 1 s of recording), so that when the receiver plays the file with a normal music player, the user first hears this recording and learns that the file is an encrypted recording file. Since this special segment of speech frames is pre-recorded, it is a very obvious feature for the underlying security framework and is identifiable.
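The arithmetic and the recognition step for this second method can be sketched as follows. The frame size and frame duration come from the amr example in the text; the function names are hypothetical, and the announcement bytes would in practice be the pre-recorded frames known to the terminal.

```python
FRAME_LEN = 32   # bytes per frame in the amr example
FRAME_MS = 20    # playback time of one amr frame, per the text


def playback_ms(frame_count: int) -> int:
    """Playback time of a run of frames in milliseconds."""
    return frame_count * FRAME_MS


def has_announcement(data: bytes, announcement: bytes) -> bool:
    """True if the data begins with the known pre-recorded frame sequence."""
    return len(announcement) > 0 and data.startswith(announcement)


def strip_announcement(data: bytes, announcement: bytes) -> bytes:
    """Return the frames that follow the announcement, if it is present."""
    if has_announcement(data, announcement):
        return data[len(announcement):]
    return data
```

For instance, `playback_ms(50)` gives 1000, i.e. the 1 s announcement mentioned above, which occupies 50 × 32 = 1600 bytes of frame data.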

For an original audio data segment, the first method can be used: the audio data segment is relatively short, possibly only a few tens of ms, so it is reasonable to insert only one feature speech frame. Of course, a plurality of feature speech frames can also be inserted into a relatively large audio data segment.

For a complete recording file, both methods are applicable. When the second method is used, a recording is inserted, so the file is easy to recognize for both the human ear and the machine. If the user directly opens the encrypted recording file in the file system of the receiving terminal, the terminal recognizes from the feature frames of the inserted recording that this is an encrypted recording file and pops up a password input dialog box, in which the user enters the password. If the password is entered correctly, the user hears the real recording; if the password is entered incorrectly or not entered at all, the user hears only the “palm whisper recording file” announcement, and the terminal can directly omit the noise that follows (this differs from forced playback on a PC and mainly reflects that users do not like to hear noise).
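The playback decision described above can be sketched as a small function. The text does not say how the terminal determines that a password is wrong, so the `password_ok` callback here is an assumption introduced only for illustration, as are the other parameter names; the decryption routine is likewise passed in rather than specified.

```python
from typing import Callable, Optional


def select_audio_to_play(
    data: bytes,
    announcement: bytes,
    ask_password: Callable[[], Optional[bytes]],
    password_ok: Callable[[bytes], bool],
    decrypt: Callable[[bytes, bytes], bytes],
) -> bytes:
    """Decide what the receiving terminal plays for a recording file."""
    # No announcement at the start: treat it as an ordinary recording.
    if not announcement or not data.startswith(announcement):
        return data
    # Encrypted recording: pop up the password dialog (modelled as a callback).
    password = ask_password()
    if password is None or not password_ok(password):
        # Play only the announcement and omit the noise that follows.
        return announcement
    # Correct password: play the real recording.
    return decrypt(data[len(announcement):], password)
```

Passing the dialog and the check in as callbacks keeps the sketch independent of any particular user interface or key-verification scheme.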

In this embodiment, as an optional manner, a security mode that can be turned on or off may be set on the terminal. When the security mode is on and the upper-layer application requests the underlying framework to record, the recording generated by the underlying framework is encrypted; when the security mode is off, normal recording resumes. Turning the security mode on or off requires a switch, which can be called a "close" switch. It can be a physical button on the terminal (referred to herein as a hard "close" button), as shown in FIG. 6(a), or a virtual button floating on the screen (referred to herein as a soft "close" button), as shown in FIG. 6(b).
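A minimal sketch of this switch behaviour, assuming a hypothetical `RecordingModule` class whose encryption routine is supplied from outside, could be:

```python
from typing import Callable, Optional


class RecordingModule:
    """Hypothetical bottom-layer recording module with a security mode."""

    def __init__(self, encrypt: Callable[[bytes, bytes], bytes]) -> None:
        self._encrypt = encrypt
        self.security_mode = False  # normal recording by default

    def press_switch(self) -> bool:
        """Model a press of the hard or soft switch; returns the new state."""
        self.security_mode = not self.security_mode
        return self.security_mode

    def record(self, captured: bytes, key: Optional[bytes] = None) -> bytes:
        """Return what the upper-layer application receives for a recording."""
        if not self.security_mode:
            return captured  # security mode off: ordinary recording
        if key is None:
            raise ValueError("security mode is on but no recording key was supplied")
        return self._encrypt(captured, key)
```

The upper-layer application calls `record` the same way in both modes; only the state of the switch decides whether it receives plaintext or ciphertext.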

S103: Send the encrypted audio data to the receiving terminal by using the application.

As shown in FIG. 7, this embodiment provides a method for securely transmitting audio data, including the following steps:

S201: The terminal receives the audio data sent by the sending terminal.

S202: After receiving a recording play request initiated by the user through an application, the terminal decrypts the audio data.

The audio data sent by the sending terminal includes: a voice frame of the encrypted audio data and one or more feature voice frames;

The voice frames of the encrypted audio data are decrypted by using the recording key input by the terminal user or the locally stored preset recording key.

S203: Play the decrypted audio to the user.

The audio data includes a recording file and an original audio data segment. This embodiment likewise covers the two recording generation methods, the recording service and the audio service, and there are correspondingly two ways of decrypting and playing the audio data: the recording service is for recording files, and the audio service is for original audio data segments. When the audio data is a recording file, the recording play request carries the storage path of the recording file.

The present embodiment will be described in detail below in an application example.

FIG. 8 is a schematic diagram of the “recording service” in the underlying framework of the terminal in this application example providing a recording service for an upper-layer application. As shown in FIG. 8, terminal A and terminal B agree on the key in advance and provide it to the recording service module (i.e., the recording module of terminal A and the recording and playing module of terminal B). The upper-layer application on terminal A (for example, the "press and hold" function of WeChat) sends a request to the recording module. The request carries the specified recording file storage path and asks the recording module to record and to store the recording file at that path in the file system. The recording module encrypts the recording file recorded from the audio source by using the key (input by the terminal user, or a locally stored preset key), generates the encrypted recording file, and stores it at the specified path. After the recording is finished, the upper-layer application obtains the recording file from the file system and transmits it over the network to the network server. The network server stores the recording file in its local storage and then transmits it over the network to the upper-layer application on destination terminal B (for example, the "press and hold" function of WeChat). The upper-layer application on B stores the recording file in the file system and then issues a play request to the recording and playing module, which reads the recording file from the file system, decrypts it with the key (input by the terminal user, or a locally stored preset key), and plays it.

Compared with the conventional "recording service" in the related art, the recording module of this embodiment outputs an encrypted recording file, and this is transparent to the upper-layer application; that is, the upper-layer application still believes that it is using an ordinary "recording service". The upper-layer application does not itself take part in the encoding. Therefore, in this embodiment, recording encryption can be implemented by the underlying security framework without the upper-layer application modifying its code.

Since the recording file is encrypted at the time of generation, the recording files obtained by the upper-layer application and by the network server are both encrypted and cannot be played correctly without the key. If playback is forced, only noise is heard. Even if a thief obtains the file, the privacy of the user's voice is not revealed.
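The flow of FIG. 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the source: the names `RecordingService`, `keystream_xor`, and `app_send` are hypothetical, the 16-byte random prefix is an assumed detail, and the cipher is a toy keystream built from SHA-256, used only because the source does not name a cipher. A real terminal would use a vetted cipher instead.

```python
import hashlib
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher: XOR data with SHA-256(key || nonce || counter) blocks.

    Applying it twice with the same key and nonce restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class RecordingService:
    """Hypothetical bottom-layer service; the upper-layer app never sees the key."""

    def __init__(self, key: bytes):
        self._key = key

    def record(self, path: str, captured_audio: bytes) -> None:
        # Encrypt at generation time, so plaintext never reaches the file system.
        nonce = os.urandom(16)  # assumed detail, not taken from the source
        with open(path, "wb") as f:
            f.write(nonce + keystream_xor(self._key, nonce, captured_audio))

    def play(self, path: str) -> bytes:
        # Read the stored file and return the decrypted audio for playback.
        with open(path, "rb") as f:
            blob = f.read()
        nonce, ciphertext = blob[:16], blob[16:]
        return keystream_xor(self._key, nonce, ciphertext)


def app_send(path: str) -> bytes:
    # The upper-layer application only reads and forwards opaque bytes.
    with open(path, "rb") as f:
        return f.read()
```

In this sketch the application code path (`app_send`) is the same whether or not the file is encrypted, which is the sense in which the encryption is transparent to the upper layer.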

FIG. 9 is a schematic diagram of the "audio service" in the underlying framework of the terminal providing a recording service to an upper-layer application in this application example. As shown in FIG. 9, terminal A and terminal B agree on a key in advance and provide it to the recording module. The upper-layer application on terminal A (for example, WeChat's "real-time intercom" function) obtains from the recording module the original audio data segment recorded from the specified source. The recording module encrypts the original audio data segment using the key (entered by the terminal user or a locally stored preset key) and returns the encrypted original audio data segment to the upper-layer application, which transmits it over the network to the network server. The network server stores and retrieves the original audio data segment using its local storage device and then transmits it over the network to the upper-layer application on destination terminal B (for example, WeChat's "real-time intercom" function). The upper-layer application on B sends the original audio data segment to the recording playback module, which decrypts it with the key (entered by the terminal user or a locally stored preset key) and plays it.

Compared with the ordinary "audio service" in the related art, the recording module of this embodiment outputs an encrypted original audio data segment. This is transparent to the upper-layer application; that is, the upper-layer application still believes it is using an ordinary "audio service". Since the original audio data segment is encrypted, the segments obtained by the upper-layer application and by the network server are both encrypted and cannot be played correctly without the key. If playback is forced, only noise is heard. Even if a thief obtains the segment, the privacy of the user's voice is not revealed.

FIG. 10 is a schematic diagram of inserting feature speech frames into the recording file format in this application example. In the figure:

101 is an ordinary audio file structure. For convenience of explanation, an AMR file is used as an example; AMR is the most commonly used format for recording files on a terminal;

102 is an encrypted audio file structure (taking amr as an example) in which a feature speech frame is inserted;

103 is an encrypted audio file structure (taking AMR as an example) in which a plurality of feature speech frames (for example, forming a short recording) are inserted;

104 is the file header;

105 is an unencrypted original speech frame;

106 is a feature speech frame;

107 is an encrypted speech frame (heard as noise if playback is forced);

108 is a piece of recording composed of feature speech frame 1, ..., feature speech frame m (for example, "palm whisper recording file"), which is easy for both the human ear and a machine to recognize.

The single feature speech frame in 102 is easy for a machine to recognize but not for the human ear; the plurality of feature speech frames in 103 are easy for both a machine and the human ear to recognize. If 102 is forcibly played on a PC, only noise is heard. If 103 is forcibly played on a PC, the "palm encryption file" recording is heard first, followed by noise. With 103, the user can therefore easily recognize that the recording was encrypted using the above method.
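The layouts 102 and 103 can be illustrated with a short sketch. It assumes the feature speech frames sit directly after the file header and before the encrypted speech frames, and it treats every frame as an opaque block of a fixed, hypothetical size (`FRAME_SIZE`). The marker byte and labels produced by `make_feature_frame` are invented for illustration; only the `#!AMR\n` magic number is the standard single-channel AMR file header.

```python
AMR_MAGIC = b"#!AMR\n"   # standard header of a single-channel AMR file
FRAME_SIZE = 32          # assumed fixed frame size, for illustration only


def make_feature_frame(label: bytes) -> bytes:
    """Build a hypothetical fixed-size marker frame from a short label."""
    if len(label) > FRAME_SIZE - 1:
        raise ValueError("label too long for one frame")
    return b"\x7f" + label.ljust(FRAME_SIZE - 1, b"\x00")


def build_encrypted_file(feature_frames, encrypted_frames) -> bytes:
    """Header (104), then feature frame(s) (106), then encrypted frames (107)."""
    return AMR_MAGIC + b"".join(feature_frames) + b"".join(encrypted_frames)


def split_file(blob: bytes, feature_frames):
    """Return (is_encrypted, frames) for a file built as above.

    is_encrypted is True only when the expected feature frames follow the header;
    in that case they are stripped and the remaining frames are the encrypted ones.
    """
    if not blob.startswith(AMR_MAGIC):
        raise ValueError("missing AMR header")
    body = blob[len(AMR_MAGIC):]
    marker = b"".join(feature_frames)
    is_encrypted = bool(marker) and body.startswith(marker)
    if is_encrypted:
        body = body[len(marker):]
    frames = [body[i:i + FRAME_SIZE] for i in range(0, len(body), FRAME_SIZE)]
    return is_encrypted, frames
```

Passing a single marker frame corresponds to structure 102, and passing a list of m frames corresponds to structure 103, where the frames together would form an audible clip.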

FIG. 11 is a schematic diagram of inserting a distinguishable feature into the original audio data segment in this application example. In the figure:

201 is a piece of raw audio data recorded by the "audio service";

202 is an encrypted original audio data segment in which a feature speech frame is inserted;

203 is an unencrypted original speech frame;

204 is a feature speech frame;

205 is an encrypted speech frame (heard as noise if playback is forced).
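On the receiving side, the feature speech frame lets the playback module decide whether a segment needs decryption at all. The sketch below is one possible reading of that logic, with assumptions labeled: `FEATURE_FRAME` is a hypothetical marker, `handle_segment` is a hypothetical name, and the cipher is passed in as a callable because the source does not specify one.

```python
from typing import Callable, Optional

FEATURE_FRAME = b"\x7fFEATURE"  # hypothetical marker, not defined by the source


def handle_segment(
    segment: bytes,
    decrypt: Callable[[bytes, bytes], bytes],
    preset_key: Optional[bytes],
    ask_user_for_key: Callable[[], bytes],
) -> bytes:
    """Return the audio bytes to play for one received segment."""
    if not segment.startswith(FEATURE_FRAME):
        # No feature frame: treat it as an ordinary segment and play it as-is.
        return segment
    # Feature frame identified: enter the security mode and obtain the key,
    # either the locally stored preset key or one entered by the user.
    key = preset_key if preset_key is not None else ask_user_for_key()
    return decrypt(key, segment[len(FEATURE_FRAME):])
```

Keeping the cipher behind a callable means the same dispatch logic can be reused whichever algorithm the terminals agree on.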

It can be seen from the foregoing embodiments that, compared with the related art, in the audio data secure transmission method, system, and terminal provided above, the audio data to be transmitted is encrypted or decrypted by the bottom-layer module of the terminal, which prevents recording content from being stolen from the SD card. The application serves only as a transmission channel for the encrypted audio data, and the audio data passed to the application is ciphertext, which prevents the audio data from being leaked by the application or in the intermediate transmission channel (the network and the network server). Security and reliability are high, and users' need to protect their privacy is met.

One of ordinary skill in the art will appreciate that all or a portion of the steps described above can be accomplished by a program instructing the associated hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Correspondingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware or in the form of a software function module. Embodiments of the invention are not limited to any specific combination of hardware and software.

Industrial applicability

In the method, system, and terminal provided by the embodiments of the invention, the audio data to be transmitted is encrypted or decrypted by the bottom-layer module of the terminal, which prevents recording content from being stolen from the SD card. The application serves only as a transmission channel for the encrypted audio data, and the audio data passed to the application is ciphertext, which prevents the application from leaking it. Security and reliability are high, and users' need to protect their privacy is met.

Claims

A method for securely transmitting audio data, comprising:

The terminal receives a recording request initiated by the user through an application;

The audio data recorded from the sound source is encrypted to generate encrypted audio data, and the encrypted audio data is transmitted to the receiving terminal by the application.
The method of claim 1 wherein:

The encrypting of the audio data recorded from the sound source to generate encrypted audio data comprises:

Prompting the terminal user to enter a recording key and receiving the recording key entered by the terminal user, or acquiring a locally stored preset recording key;

After encrypting the voice frames of the recorded audio data by using the recording key entered by the terminal user or the locally stored preset recording key, inserting one or more feature speech frames to generate the final encrypted audio data; or, encrypting the voice frames of the recorded audio data by using the recording key entered by the terminal user or the locally stored preset recording key, and inserting one or more feature speech frames before the voice frames of the encrypted audio data to generate the final encrypted audio data.
The method according to claim 1 or 2, wherein before the encrypting of the audio data recorded from the sound source to generate encrypted audio data, the method further comprises:

Turning on a security mode to trigger the encrypting of the audio data recorded from the sound source to generate encrypted audio data.
The method of claim 1 or 2 wherein:

The audio data includes a recording file and a piece of original audio data.
A method for securely transmitting audio data, comprising:

The terminal receives the audio data sent by the sending terminal;

After receiving a recording play request initiated by the user through an application, the terminal decrypts the audio data and plays the decrypted audio to the user.
The method of claim 5 wherein:

The audio data sent by the sending terminal includes: a voice frame of the encrypted audio data and one or more feature speech frames;

The decrypting of the audio data by the terminal after receiving the recording play request initiated by the user through the application comprises:

After receiving the recording play request initiated by the user through the application, the terminal identifies the audio data sent by the sending terminal and, after identifying the one or more feature speech frames, prompts the terminal user to enter a recording key and receives the recording key entered by the terminal user, or, after identifying the one or more feature speech frames, acquires a locally stored preset recording key;

The voice frame of the encrypted audio data is decrypted by using a recording key input by the terminal user or the locally stored preset recording key.
The method of claim 5, wherein after the terminal receives the audio data sent by the transmitting terminal, the method further comprises:

Identifying the audio data sent by the transmitting terminal, and turning on the security mode when the one or more feature speech frames are identified.
A method as claimed in claim 5, 6 or 7 wherein:

The audio data includes a recording file and a piece of original audio data.
A transmitting terminal includes:

The first application module is configured to receive a recording request initiated by the user, and send the encrypted audio data sent by the recording module to the receiving terminal;

The recording module is configured to encrypt the audio data recorded from the sound source to generate encrypted audio data, and send the encrypted audio data to the first application module.
The transmitting terminal according to claim 9, wherein:

The recording module encrypting the audio data recorded from the sound source to generate encrypted audio data comprises:

The recording module prompts the terminal user to enter a recording key and receives the recording key entered by the terminal user, or acquires a locally stored preset recording key;

After encrypting the voice frames of the recorded audio data by using the recording key entered by the terminal user or the locally stored preset recording key, the recording module inserts one or more feature speech frames to generate the final encrypted audio data; or, the recording module encrypts the voice frames of the recorded audio data by using the recording key entered by the terminal user or the locally stored preset recording key, and inserts one or more feature speech frames before the voice frames of the encrypted audio data to generate the final encrypted audio data.
The transmitting terminal according to claim 9 or 10, further comprising a first security mode opening module connected to the recording module and configured to turn on the security mode before the audio data recorded from the sound source is encrypted to generate encrypted audio data, thereby triggering the recording module to encrypt the audio data recorded from the sound source to generate encrypted audio data;

The recording module is further configured to: after being triggered by the first security mode opening module, encrypt the audio data recorded from the sound source to generate encrypted audio data.
A transmitting terminal according to claim 9 or 10, wherein:

The audio data includes a recording file and a piece of original audio data.
A receiving terminal includes:

The second application module is configured to receive audio data sent by the sending terminal, and is further configured to receive a recording play request initiated by the user;

The recording play module is configured to decrypt the audio data after receiving the recording play request, and play the decrypted audio to the user.
The receiving terminal of claim 13 wherein:

The audio data sent by the sending terminal includes: a voice frame of the encrypted audio data and one or more feature speech frames;

The recording play module decrypting the audio data after receiving the recording play request comprises:

After receiving the recording play request, the recording play module identifies the audio data sent by the sending terminal and, after identifying the one or more feature speech frames, prompts the terminal user to enter a recording key and receives the recording key entered by the terminal user, or, after identifying the one or more feature speech frames, acquires a locally stored preset recording key;

The voice frame of the encrypted audio data is decrypted by using a recording key input by the terminal user or the locally stored preset recording key.
The receiving terminal according to claim 13, wherein the receiving terminal further comprises a second security mode opening module connected to the second application module and the recording play module, configured to: after the second application module receives the audio data sent by the sending terminal, identify the audio data sent by the sending terminal, and, when the one or more feature speech frames are identified, turn on the security mode and trigger the recording play module to start.
A receiving terminal according to claim 13, 14 or 15, wherein:

The audio data includes a recording file and a piece of original audio data.
An audio data secure transmission system comprising: a transmitting terminal according to any one of claims 9-12 and a receiving terminal according to any one of claims 13-16.
A computer readable storage medium storing program instructions that, when executed, can implement the method of any of claims 1-4.
A computer readable storage medium storing program instructions that, when executed, implement the method of any of claims 5-8.