WO2016062153A1 - Method, system, and terminal for secure transmission of audio data - Google Patents

Method, system, and terminal for secure transmission of audio data

Info

Publication number
WO2016062153A1
WO2016062153A1 PCT/CN2015/087245 CN2015087245W WO2016062153A1 WO 2016062153 A1 WO2016062153 A1 WO 2016062153A1 CN 2015087245 W CN2015087245 W CN 2015087245W WO 2016062153 A1 WO2016062153 A1 WO 2016062153A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
recording
terminal
module
encrypted
Prior art date
Application number
PCT/CN2015/087245
Other languages
French (fr)
Chinese (zh)
Inventor
陈璐
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016062153A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/03 Protecting confidentiality, e.g. by encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/22 Processing or transfer of terminal data, e.g. status or physical capabilities

Definitions

  • Embodiments of the present invention relate to, but are not limited to, the field of communications, and in particular, to a method, system, and terminal for secure transmission of audio data.
  • To save users the trouble of typing, some chat software records the user's voice and sends the recording to the peer mobile phone, where it is played back and listened to. Because this approach uses little network traffic, requires no typing, feels similar to a call, and lets the message be listened to again, it is very popular with users. There is also a type of network real-time calling software that cuts the user's voice into short voice segments and sends them to the other party at small time intervals, so that the call is continuous.
  • In both cases the recording is recorded and played by the underlying framework of the terminal at the request of the upper layer application. Recordings in the terminal are generated in two ways, by a recording service and by an audio service: the first type, chat software, uses the recording service, and the second type, network real-time calling software, uses the audio service.
  • With the recording service, the upper layer application specifies the storage path of the recording file and then requests the underlying framework of the terminal to start recording. The recording file generated by the underlying framework is saved in the file system at the specified storage path, and the upper layer application then obtains the recording file from the file system.
  • FIG. 1 is a schematic diagram of the "recording service" in the underlying framework of a terminal in the related art providing a recording service for an upper layer application. As shown in FIG. 1, an upper layer application on terminal A (for example, the "press and hold" function of WeChat) sends a request to the recording service module. The request carries a specified recording file storage path and asks the recording service module to record and to store the recording file at the specified path in the file system. After the recording ends, the upper layer application obtains the recording file from the file system and transmits it over the network to the network server. The network server stores the recording file in its local storage and then transmits it over the network to the upper layer application on destination terminal B (for example, the "press and hold" function of WeChat). The upper layer application on B stores the recording file in the file system and then sends a play request to the recording service module, which reads the recording file from the file system and plays it.
  • The recording file produced by the recording service is a complete audio file, whereas the audio service produces only a segment of original audio data. Such a segment contains just a few speech frames; it is small, has no file header, and is not a complete audio file.
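  • The difference between a complete recording file and a raw segment can be illustrated with a minimal sketch. It assumes AMR narrowband data, where a complete single-channel file begins with the magic bytes `#!AMR\n` and a raw segment begins directly with a speech frame. The function name and the sample byte strings are hypothetical and are not taken from the patent.

```python
AMR_MAGIC = b"#!AMR\n"  # file header of a single-channel AMR narrowband file


def is_complete_amr_file(data: bytes) -> bool:
    """Return True if the data starts with the AMR file header."""
    return data.startswith(AMR_MAGIC)


# Hypothetical sample data: one 32-byte frame (header byte 0x3C + 31 payload bytes).
one_frame = bytes([0x3C]) + bytes(31)
recording_file = AMR_MAGIC + one_frame   # what a recording service would store
raw_segment = one_frame                  # what an audio service would hand over
```

  A caller could use such a check to decide whether a buffer should be treated as a file or as a bare frame sequence.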
  • With the audio service, the upper layer application on terminal A obtains from the audio service module the original audio data segment recorded from the specified sound source and transmits it over the network to the network server. The network server holds the original audio data segment in its local storage device and then transmits it over the network to the upper layer application on destination terminal B (for example, the "real-time intercom" function of WeChat), which sends the original audio data segment to the audio service module for playback.
  • Both chat methods carry a security risk: the recording files or audio data segments are generally stored on the SD card, and a thief can learn the content of the user's chats by copying the files directly from the SD card and playing them. For example, the thief disguises a Trojan as game software for the user to use; the built-in Trojan secretly scans all *.amr files on the SD card in the background, packages them, and sends them to the thief's own server. If these recordings contain the user's private information, the user is exposed to considerable danger.
  • The embodiments of the invention provide a method, a system, and a terminal for secure transmission of audio data that are independent of any application, prevent leakage by the application, offer high security and reliability, and meet users' need to protect their privacy.
  • the embodiment of the invention provides a method for securely transmitting audio data, including:
  • the terminal receives a recording request initiated by the user through an application;
  • the audio data recorded from the sound source is encrypted to generate encrypted audio data, and the encrypted audio data is transmitted to the receiving terminal by the application.
  • Encrypting the audio data recorded from the sound source to generate the encrypted audio data comprises:
  • encrypting the voice frames of the recorded audio data using the recording key input by the terminal user or the locally stored preset recording key, and then inserting one or more feature speech frames to generate the final encrypted audio data.
  • the voice frames of the recorded audio data are encrypted using the recording key input by the terminal user or the locally stored preset recording key, and one or more feature speech frames are inserted before the voice frames of the encrypted audio data to generate the final encrypted audio data.
  • Before the audio data recorded from the sound source is encrypted to generate the encrypted audio data, the method further includes: turning on a security mode.
  • the audio data comprises a recording file and a piece of original audio data.
  • the embodiment of the invention further provides a method for securely transmitting audio data, including:
  • the terminal receives the audio data sent by the sending terminal;
  • after receiving a recording play request initiated by the user through the application, the terminal decrypts the audio data and plays the decrypted audio to the user.
  • the audio data sent by the sending terminal includes: a voice frame of the encrypted audio data and one or more feature voice frames;
  • The terminal decrypting the audio data after receiving the recording play request initiated by the user through the application includes:
  • after receiving the recording play request initiated through the application, the terminal identifies the audio data sent by the sending terminal and, after identifying the one or more feature voice frames, either prompts the terminal user to input the recording key and receives the recording key input by the terminal user, or acquires a preset recording key stored locally;
  • the voice frame of the encrypted audio data is decrypted by using a recording key input by the terminal user or the locally stored preset recording key.
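  • the method further includes: after receiving the audio data sent by the sending terminal, identifying the audio data and, when the one or more feature voice frames are identified, turning on the security mode.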
  • the audio data comprises a recording file and a piece of original audio data.
  • the embodiment of the invention further provides a sending terminal, including:
  • the first application module is configured to receive a recording request initiated by the user, and send the encrypted audio data sent by the recording module to the receiving terminal;
  • the recording module is configured to encrypt the audio data recorded from the sound source to generate encrypted audio data, and send the encrypted audio data to the first application module.
  • the recording module encrypts the audio data recorded by the audio source to generate encrypted audio data, including:
  • the recording module prompts the terminal user to input a recording key, and receives the recording key input by the terminal user, or obtains a preset recording key stored locally;
  • the sending terminal further includes a first security mode opening module connected to the recording module, configured to: before the audio data recorded from the sound source is encrypted to generate encrypted audio data, turn on a security mode and trigger the recording module to encrypt the audio data recorded from the sound source to generate encrypted audio data;
  • the recording module is further configured to: after being triggered by the first security mode opening module, encrypt the audio data recorded from the sound source to generate encrypted audio data.
  • the audio data comprises a recording file and a piece of original audio data.
  • the embodiment of the invention further provides a receiving terminal, including:
  • the second application module is configured to receive audio data sent by the sending terminal, and is further configured to receive a recording play request initiated by the user;
  • the recording play module is configured to decrypt the audio data after receiving the recording play request, and play the decrypted audio to the user.
  • the audio data sent by the sending terminal includes: a voice frame of the encrypted audio data and one or more feature voice frames;
  • the recording and playing module decrypts the audio data, including:
  • after receiving the recording play request, the recording play module identifies the audio data sent by the sending terminal and, after identifying the one or more feature voice frames, either prompts the terminal user to input the recording key and receives the recording key input by the terminal user, or acquires a preset recording key stored locally;
  • the voice frame of the encrypted audio data is decrypted by using a recording key input by the terminal user or the locally stored preset recording key.
  • the receiving terminal further includes a second security mode opening module connected to the second application module and the recording play module, configured to: after the second application module receives the audio data sent by the sending terminal, identify the audio data sent by the sending terminal and, when the one or more feature speech frames are identified, turn on the security mode and trigger the recording play module to start.
  • the audio data comprises a recording file and a piece of original audio data.
  • the embodiment of the invention further provides an audio data security delivery system, comprising: the sending terminal and the receiving terminal as described above.
  • An embodiment of the invention further provides a computer-readable storage medium storing program instructions; when the program instructions are executed, the above method can be implemented.
  • In the audio data secure transmission method, system, and terminal, the audio data to be transmitted is encrypted or decrypted by the bottom layer module of the terminal, which prevents the recording content from being stolen from the SD card. The application serves only as the transmission channel for the encrypted audio data, and what the application sees is ciphertext, which prevents leakage by the application. Security and reliability are high, and users' need to protect their privacy is met.
  • FIG. 1 is a schematic diagram of the "recording service" in the underlying framework of a terminal in the related art providing a recording service for an upper layer application;
  • FIG. 2 is a schematic diagram of the "audio service" in the underlying framework of a terminal in the related art providing a recording service for an upper layer application;
  • FIG. 3 is a structural diagram of a transmitting terminal in an embodiment;
  • FIG. 4 is a structural diagram of a receiving terminal in an embodiment;
  • FIG. 5 is a flowchart of a method for securely transmitting audio data in an embodiment;
  • FIG. 6 is a schematic diagram of a "hard" switch and a "soft" switch in an embodiment;
  • FIG. 7 is a flowchart of a method for securely transmitting audio data in an embodiment;
  • FIG. 8 is a schematic diagram of the "recording service" in the underlying framework of a terminal providing a recording service for an upper layer application in an application example;
  • FIG. 9 is a schematic diagram of the "audio service" in the underlying framework of a terminal providing an audio service for an upper layer application in an application example;
  • FIG. 10 is a schematic diagram of inserting a feature speech frame into a recording file format in an application example;
  • FIG. 11 is a schematic diagram of inserting a distinguishable feature into an original audio data segment in an application example.
  • Recordings are recorded and played by the underlying framework of the terminal at the request of the upper layer application. If the terminal encrypts the recorded audio data so that it can be played only with the correct password, and only noise is heard when no password or a wrong password is entered (including when the file is copied to a PC and forcibly played with a music player), the privacy of the user's voice can be protected.
  • the present embodiment provides an audio data security delivery system, including a transmitting terminal and a receiving terminal, where, as shown in FIG. 3, the sending terminal includes:
  • the first application module 301 is configured to receive a recording request initiated by the user, and send the encrypted audio data sent by the recording module 302 to the receiving terminal;
  • the recording module 302 is configured to encrypt the audio data recorded from the sound source to generate encrypted audio data, and send the encrypted audio data to the first application module 301.
  • the recording module 302 is disposed in the bottom layer of the terminal.
  • the first application module 301 refers to the application itself, and may be an application that is provided by the terminal itself, or may be a third-party application downloaded by the user, such as WeChat, QQ, and the like.
  • the recording module 302 encrypts the audio data recorded by the audio source to generate encrypted audio data, including:
  • the recording module 302 prompts the terminal user to input a recording key, and receives the recording key input by the terminal user, or obtains a preset recording key stored locally;
  • the one or more feature speech frames may be inserted at any position of the speech frame of the encrypted audio data, for example, before the speech frame of the encrypted audio data, at the middle or at the end of the speech frame.
  • the sending terminal further includes a first security mode opening module 303 connected to the recording module 302, configured to: before the audio data recorded from the sound source is encrypted to generate encrypted audio data, turn on the security mode and trigger the recording module 302 to encrypt the audio data recorded from the sound source to generate encrypted audio data.
  • the recording module 302 is further configured to encrypt the audio data recorded from the sound source to generate encrypted audio data after being triggered by the first security mode opening module 303.
  • the audio data includes a recording file and a piece of original audio data.
  • the receiving terminal includes:
  • the second application module 401 is configured to receive audio data sent by the sending terminal, and is further configured to receive a recording play request initiated by the user;
  • the recording play module 402 is configured to decrypt the audio data after receiving the recording play request, and play the decrypted audio to the user.
  • the recording and playing module 402 is disposed in the bottom layer of the terminal.
  • the second application module 401 refers to the application itself, and may be an application that is provided by the terminal itself, or may be a third-party application downloaded by the user, such as WeChat, QQ, and the like.
  • the audio data sent by the sending terminal includes: a voice frame of the encrypted audio data and one or more feature speech frames;
  • the recording and playing module 402 decrypts the audio data, including:
  • after receiving the recording play request, the recording play module 402 identifies the audio data sent by the sending terminal and, after identifying the one or more feature voice frames, either prompts the terminal user to input the recording key and receives the recording key input by the terminal user, or acquires a preset recording key stored locally;
  • the voice frame of the encrypted audio data is decrypted by using a recording key input by the terminal user or the locally stored preset recording key.
  • the receiving terminal further includes a second security mode opening module 403 connected to the second application module 401 and the recording play module 402, configured to: after the second application module 401 receives the audio data sent by the sending terminal, identify the audio data sent by the sending terminal and, when the one or more feature voice frames are identified, turn on the security mode and trigger the recording play module 402 to start.
  • the audio data includes a recording file and a piece of original audio data.
  • this embodiment provides a method for securely transmitting audio data, including the following steps:
  • S101 The terminal receives a recording request initiated by the user through an application.
  • S102 Encrypt the audio data recorded by the audio source to generate encrypted audio data.
  • the audio data recorded from the audio source is encrypted to generate encrypted audio data, including:
  • If the recording is made for the user's own use, the user must remember the recording key. If it is recorded for others (as with recording chat software), the sending and receiving parties should agree on the recording key in advance by other means, for example verbally.
  • One or more feature speech frames may be inserted before the voice frames of the encrypted audio data to generate the final encrypted audio data. Of course, the one or more feature speech frames may also be inserted in the middle of or after the voice frames of the encrypted audio data.
  • the audio data includes a recording file and a piece of original audio data.
  • This embodiment also covers the two recording generation methods, the recording service and the audio service: the recording service produces a recording file, and the audio service produces an original audio data segment. When the audio data is a recording file, the recording request carries the storage path of the recording file.
  • The encrypted audio file or original audio data segment keeps the normal audio file format, but some distinguishable features, namely one or more feature speech frames, are inserted into it. Because the normal audio file format is kept, an ordinary music player can still play the data, but only noise is heard without the key, which lets the user see directly that this recording method is secure. In addition, the receiver can use the feature speech frames to recognize that the recording file or original audio data segment is encrypted.
  • The feature speech frame differs from the voice frames of audio data recorded when an ordinary user chats. Taking an amr file with a bit rate of 12.2 kbps as an example, each voice frame is 32 bytes long: the first byte is the frame header, fixed at 0x3c, and the remaining 31 bytes are the actual voice data. Suppose the feature speech frame is defined as a frame in which the 31 bytes after the header 0x3c are all 0x01. Its signature is then very distinctive, because a real-world recording cannot produce such a speech frame. The above is one example of a feature speech frame.
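  • The example frame definition above can be written down as a minimal sketch. The byte values (32-byte frame, header 0x3c, 31 payload bytes of 0x01) come from the example in the text; the constant and function names are hypothetical.

```python
FRAME_LEN = 32        # length of one 12.2 kbps amr voice frame, per the example
FRAME_HEADER = 0x3C   # fixed frame header byte

# Example feature speech frame: header followed by 31 bytes of 0x01.
FEATURE_FRAME = bytes([FRAME_HEADER]) + bytes([0x01]) * (FRAME_LEN - 1)


def is_feature_frame(frame: bytes) -> bool:
    """Return True if the frame matches the example feature speech frame exactly."""
    return frame == FEATURE_FRAME
```

  An exact byte comparison is enough here because, as the text notes, real speech is not expected to produce this pattern.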
  • Feature speech frames can be inserted in two ways. In the first method, a single feature speech frame is inserted at the beginning of the speech frame sequence of a recording file or original audio data segment. The playback time of this one frame is extremely short (for amr, one frame lasts 20 ms), so the human ear cannot distinguish it, but a machine can recognize it.
  • In the second method, a recording, that is, a plurality of feature speech frames, is inserted at the beginning of the speech frame sequence of the recording file. The inserted recording is a spoken notice stating that the recording file is encrypted, and it can be distinguished by the human ear.
  • For original audio data segments, the first method can be used: because an audio data segment is relatively short, possibly only a few dozen ms, inserting a single feature speech frame is reasonable. Of course, multiple feature speech frames can also be inserted into a relatively large audio data segment.
  • For recording files, both methods are applicable.
  • With the second method, since a recording is inserted, it is easy to recognize both for the human ear and for the machine. If the user directly opens the encrypted recording file in the file system of the receiving terminal, the terminal determines from the feature frames that this is an encrypted recording file and pops up a password input dialog box for the user to enter the password. If the password is entered correctly, the user hears the real recording; if the password is wrong or no password is entered, the user hears only the inserted notice ("palm whisper recording file"), and the terminal can simply omit the noise that follows. This differs from forced playback on a PC and is done mainly because users do not like to hear noise.
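  • How a receiver might recognize either kind of marker can be sketched as follows. This is only an illustration under stated assumptions: the single-frame marker uses the example byte pattern from the text, while the multi-frame spoken notice is represented by two placeholder frames rather than real speech data. All names are hypothetical.

```python
# Method one: a single feature speech frame (example pattern from the text).
SINGLE_MARKER = bytes([0x3C]) + bytes([0x01]) * 31

# Method two: a short spoken notice made of several frames.
# These two frames are placeholder bytes standing in for real speech frames.
NOTICE_MARKER = (bytes([0x3C]) + bytes([0x02]) * 31) + (bytes([0x3C]) + bytes([0x03]) * 31)


def split_marker(frames: bytes):
    """Return (is_encrypted, remaining_frames) for a speech frame sequence."""
    for marker in (NOTICE_MARKER, SINGLE_MARKER):
        if frames.startswith(marker):
            return True, frames[len(marker):]
    return False, frames
```

  A terminal following the flow above could prompt for the password only when `split_marker` reports an encrypted sequence, and play the data unchanged otherwise.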
  • a security mode may be set on the terminal, which may be turned on or off.
  • When the security mode is turned on and the upper layer application requests the underlying framework to record, the recording generated by the underlying framework is encrypted; when the security mode is turned off, normal recording resumes. Turning the security mode on or off requires a switch, which can be called a "closed" switch.
  • The switch can be a physical button on the terminal (herein called a hard "close" button), as shown in FIG. 6(a), or a virtual button floating on the screen (herein called a soft "close" button), as shown in FIG. 6(b).
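  • The switch behaviour described above can be sketched as a small state holder in the recording layer. This is a hedged illustration, not the patent's implementation: the class name, the method names, and the stand-in XOR transform are all hypothetical, and the transform is a placeholder for whatever encryption the terminal actually applies.

```python
class RecordingService:
    """Hypothetical bottom-layer recorder with a security mode switch."""

    def __init__(self, encrypt):
        self.security_mode = False   # off by default: normal recording
        self._encrypt = encrypt      # callable taking and returning bytes

    def toggle_security_mode(self) -> bool:
        """Flip the switch (hard or soft button) and return the new state."""
        self.security_mode = not self.security_mode
        return self.security_mode

    def record(self, raw_audio: bytes) -> bytes:
        """Return what the upper layer application receives for a recording."""
        return self._encrypt(raw_audio) if self.security_mode else raw_audio


def placeholder_encrypt(data: bytes) -> bytes:
    # Stand-in transform for illustration only; not a real cipher.
    return bytes(b ^ 0x5A for b in data)


service = RecordingService(placeholder_encrypt)
```

  The upper layer application calls `record` in the same way in both states, which matches the idea that encryption is decided below the application.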
  • S103 Send the encrypted audio data to the receiving terminal by using the application.
  • this embodiment provides a method for securely transmitting audio data, including the following steps:
  • S201 The terminal receives the audio data sent by the sending terminal.
  • the audio data sent by the sending terminal includes: a voice frame of the encrypted audio data and one or more feature voice frames;
  • The terminal decrypting the audio data after receiving the recording play request initiated by the user through the application includes:
  • after receiving the recording play request initiated through the application, the terminal identifies the audio data sent by the sending terminal and, after identifying the one or more feature voice frames, either prompts the terminal user to input the recording key and receives the recording key input by the terminal user, or acquires a preset recording key stored locally;
  • the voice frames of the encrypted audio data are decrypted using the recording key input by the terminal user or the locally stored preset recording key.
  • the audio data includes a recording file and a piece of original audio data.
  • This embodiment likewise covers the two recording generation methods, the recording service and the audio service, and there are correspondingly two ways of decrypting and playing the audio data: the recording service handles recording files, and the audio service handles original audio data segments.
  • the recording playback request carries a storage path of the recording file.
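  • The sending and receiving steps can be illustrated end to end with a minimal sketch. The patent text does not name a cipher, so the keystream below (SHA-256 over the key and the frame index, XORed with each frame payload) is purely an assumption chosen to keep the example self-contained; it is not a recommendation for production use. The frame layout and the feature frame follow the amr example in the text, and all function names are hypothetical.

```python
import hashlib

FRAME_LEN = 32
FEATURE_FRAME = bytes([0x3C]) + bytes([0x01]) * 31  # example marker frame


def _keystream(key: bytes, index: int, n: int) -> bytes:
    # Illustrative keystream only: SHA-256 yields 32 bytes, enough for a 31-byte payload.
    return hashlib.sha256(key + index.to_bytes(4, "big")).digest()[:n]


def _xor_frames(frames: bytes, key: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(frames), FRAME_LEN):
        frame = frames[i:i + FRAME_LEN]
        payload = frame[1:]
        ks = _keystream(key, i // FRAME_LEN, len(payload))
        out.append(frame[0])  # keep the frame header so the audio format stays valid
        out.extend(p ^ k for p, k in zip(payload, ks))
    return bytes(out)


def encrypt_audio(frames: bytes, key: bytes) -> bytes:
    """Encrypt the voice frames and insert one feature frame in front."""
    return FEATURE_FRAME + _xor_frames(frames, key)


def decrypt_audio(data: bytes, key: bytes) -> bytes:
    """Strip the feature frame and decrypt; pass ordinary audio through unchanged."""
    if not data.startswith(FEATURE_FRAME):
        return data
    return _xor_frames(data[FRAME_LEN:], key)
```

  With a wrong key the receiver still obtains well-formed frames, but their payloads no longer match the original speech, which corresponds to the noise described in the text.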
  • FIG. 8 is a schematic diagram of a “recording service” in the underlying framework of the terminal in this application example providing a recording service for an upper layer application.
  • Terminal A and terminal B agree on the key in advance and provide it to the recording service module (that is, the recording module of terminal A and the recording play module of terminal B). The upper layer application on terminal A (for example, the "press and hold" function of WeChat) sends a request to the recording module. The request carries the specified recording file storage path and asks the recording module to record and to store the recording file at the specified path in the file system. The recording module encrypts the recording file recorded from the sound source using the key (input by the terminal user, or a locally stored preset key), generates the encrypted recording file, and stores it at the specified path. The upper layer application obtains the recording file from the file system and transmits it over the network to the network server. The network server stores the recording file in its local storage and then transmits it over the network to the upper layer application on destination terminal B (for example, the "press and hold" function of WeChat). The upper layer application on B stores the recording file in the file system and then sends a play request to the recording play module, which reads the recording file from the file system, decrypts it with the key (input by the terminal user, or a locally stored preset key), and plays it.
  • The recording module of this embodiment outputs an encrypted recording file, and this is transparent to the upper layer application: the upper layer application still believes it is using an ordinary "recording service".
  • The upper layer application does not take part in the encoding itself. In this embodiment, recording encryption can therefore be implemented by the underlying security framework without modifying the code of the upper layer application.
  • Because the recording file is encrypted when it is generated, the recording files obtained by the upper layer application and by the network server are encrypted and cannot be played correctly without the key; forced playback produces only noise. Even if a thief obtains the file, the privacy of the user's voice is not revealed.
  • FIG. 9 is a schematic diagram of the “audio service” in the underlying framework of the terminal in this application example providing a recording service for an upper layer application.
  • Terminal A and terminal B agree on the key in advance and provide it to the recording module. The upper layer application on terminal A (for example, the "real-time intercom" function of WeChat) obtains from the recording module the original audio data segment recorded from the specified sound source. The recording module encrypts the original audio data segment using the key (input by the terminal user, or a locally stored preset key) and returns the encrypted original audio data segment to the upper layer application, which transmits it over the network to the network server. The network server holds the original audio data segment in its local storage device and then transmits it over the network to the upper layer application on destination terminal B (for example, the "real-time intercom" function of WeChat). The upper layer application on B passes the original audio data segment to the recording play module, which decrypts it with the key and plays it.
  • The recording module of this embodiment outputs the encrypted original audio data segment, and this is transparent to the upper layer application: the upper layer application still believes it is using an ordinary "audio service". Because the original audio data segment is encrypted, the segments obtained by the upper layer application and by the network server are encrypted and cannot be played correctly without the key; forced playback produces only noise. Even if a thief obtains a segment, the privacy of the user's voice is not revealed.
  • FIG. 10 is a schematic diagram of inserting a feature speech frame into a recording file format in the application example. In the figure:
  • 101 is an ordinary audio file structure, using an amr file as an example; amr is the most commonly used file format for recording files on a terminal;
  • 102 is an encrypted audio file structure (taking amr as an example) in which a feature speech frame is inserted;
  • 103 is an encrypted audio file structure (taking amr as an example) in which a plurality of feature speech frames (for example, a recording) are inserted;
  • 105 is an unencrypted original speech frame;
  • 106 is a feature speech frame;
  • 107 is an encrypted speech frame (heard as noise if played forcibly);
  • 108 is a piece of recording composed of feature speech frame 1, ..., feature speech frame m (for example, "palm whisper recording file"), which is easy to recognize for both the human ear and a machine.
  • The single feature speech frame in 102 is easy for a machine to recognize but not for the human ear; the plurality of feature speech frames in 103 are easy for both a machine and the human ear to recognize. If 102 is forcibly played on a PC, only noise is heard; if 103 is forcibly played on a PC, the notice "palm whisper recording file" is heard first, followed by noise. With 103, the user can easily tell that the recording was encrypted using the above method.
  • FIG. 11 is a schematic diagram of inserting distinguishable features into the original audio data segment in this application example. In the figure:
  • 201 is an original audio data segment recorded by the "audio service";
  • 202 is an encrypted original audio data segment in which a feature speech frame is inserted;
  • 204 is a feature speech frame;
  • 205 is an encrypted speech frame (heard as noise if playback is forced).
  • In the audio data secure transmission method, system, and terminal provided by the foregoing embodiments, the terminal's underlying module encrypts or decrypts the audio data to be transmitted, which prevents the recording content from being stolen from the SD card. The application serves only as a transmission channel for the already encrypted audio data: the audio data passed to the application is ciphertext, so it cannot be leaked by the application or in the intermediate transmission channel (the network and the network server). Security and reliability are high, and the user's need to protect privacy is met.
  • The method, system, and terminal provided by the embodiments of the invention encrypt or decrypt the audio data to be transmitted in the terminal's underlying module, which prevents the recording content from being stolen from the SD card. The application serves only as a transmission channel for the encrypted audio data, and the audio data it carries is ciphertext, which prevents leakage through the application. Security and reliability are high, and the user's need to protect privacy is met.

Abstract

Disclosed are a method, system, and terminal for secure transmission of audio data. The method comprises: on the transmitting side, a terminal receives an audio recording request launched by a user via an application, encrypts the audio data recorded from an audio source to produce encrypted audio data, and transmits the encrypted audio data to a receiving terminal via the application; on the receiving side, a terminal receives the audio data transmitted by the transmitting terminal and, when an audio recording playback request launched by the user via an application is received, decrypts the audio data and plays the decrypted audio for the user.

Description

Method, system, and terminal for secure transmission of audio data
Technical field
Embodiments of the present invention relate to, but are not limited to, the field of communications, and in particular to a method, system, and terminal for secure transmission of audio data.
Background
With the rapid development of smart terminals, users can chat by real-time voice or by sending voice clips or recording files instead of typing text. One class of chat software, to spare users the trouble of typing, records the user's voice, sends it over the network to the peer's mobile phone, and plays it back there. Because this approach uses little network traffic, requires no typing, feels similar to a call, and allows messages to be replayed, it is very popular with users. Another class, network real-time calling software, records the user's speech into very short voice segments and sends them to the other party at very short intervals, so that the call sounds continuous.
In the related art, recordings are made and played by the terminal's underlying framework at the request of an upper-layer application. A terminal generates recordings in two ways: the recording service and the audio service. The first class of chat software above uses the recording service, while the second class, network real-time calling software, uses the audio service.
With the recording service, the upper-layer application specifies a storage path for the recording file and then asks the terminal's underlying framework to start recording. The framework saves the generated recording file in the file system at the specified path, and the upper-layer application then fetches the recording file from that file system. FIG. 1 is a schematic diagram of the "recording service" in the underlying framework of a terminal in the related art providing a recording service for an upper-layer application. As shown in FIG. 1, the upper-layer application on terminal A (for example, WeChat's "hold to talk" function) sends a request to the recording service module. The request carries the specified storage path and asks the recording service module to record and to store the recording file under that path in the file system. When recording ends, the upper-layer application takes the recording file from the file system and transmits it over the network to the network server. The network server stores the file in local storage and then sends it over the network to the upper-layer application on destination terminal B (for example, WeChat's "hold to talk" function). The upper-layer application on B stores the recording file in the file system and then asks the recording service module to play it; the recording service module reads the recording file from the file system and plays it.
A recording file in the recording service is a complete audio file, whereas the audio service handles only an original audio data segment: a very small piece that contains just a few speech frames, has no file header, and is not a complete audio file. The original audio data segment is obtained through a system interface from a sound source such as the microphone (MIC). FIG. 2 is a schematic diagram of the "audio service" in the underlying framework of a terminal in the related art providing a recording service for an upper-layer application.
As shown in FIG. 2, the upper-layer application on terminal A (for example, WeChat's "real-time intercom" function) obtains from the audio service module the original audio data segment recorded from the specified sound source and transmits it over the network to the network server. The network server stores and retrieves the segment using a local storage device and then sends it over the network to the upper-layer application on destination terminal B (for example, WeChat's "real-time intercom" function). The upper-layer application on B passes the original audio data segment to the audio service module for playback.
These chat methods carry a security risk. The recording files or audio data segments are generally stored on the SD card, so a thief can learn the content of the user's chats simply by copying the files off the SD card and playing them. For example, the thief may disguise a Trojan as a game for the user to install; in the background the Trojan secretly scans the SD card for all *.amr files, packs them, and sends them to the thief's own server. If the recordings contain private information, the user is exposed to considerable danger.
People once believed that transmitting such audio data through upper-layer applications and network servers was safe, and some third-party applications claim that their way of delivering information is secure because it is encrypted between the client and their servers. After Snowden exposed the large-scale surveillance carried out by the US government, however, it became clear that the network is a very insecure channel for information. The user's privacy can still leak on the server: for example, a hacker may break into the server and obtain the data, or a dishonest third-party company may take the data directly from its server and exploit it. The intermediate transmission channel through which third-party applications deliver data is therefore no longer trusted; it is insecure and cannot meet the user's need to protect privacy.
What is needed, therefore, is a secure audio data transmission method that users can trust, that is independent of any third-party application, and that meets the user's need to protect privacy.
Summary of the invention
The following is an overview of the subject matter described in detail in this document. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present invention provide a method, system, and terminal for secure transmission of audio data that are independent of any application, prevent leakage through the application, offer high security and reliability, and meet the user's need to protect privacy.
An embodiment of the present invention provides a method for secure transmission of audio data, including:
a terminal receives a recording request initiated by a user through an application;
the terminal encrypts the audio data recorded from a sound source to generate encrypted audio data, and sends the encrypted audio data to a receiving terminal through the application.
Optionally, encrypting the audio data recorded from the sound source to generate the encrypted audio data includes:
prompting the terminal user to enter a recording key and receiving the recording key entered by the terminal user, or obtaining a locally stored preset recording key;
encrypting the speech frames of the recorded audio data with the recording key entered by the terminal user or the locally stored preset recording key, and then inserting one or more feature speech frames to generate the final encrypted audio data; or encrypting the speech frames of the recorded audio data with the recording key entered by the terminal user or the locally stored preset recording key, and inserting one or more feature speech frames before the encrypted speech frames to generate the final encrypted audio data.
Optionally, before the audio data recorded from the sound source is encrypted to generate the encrypted audio data, the method further includes:
turning on a security mode, which triggers the encryption of the audio data recorded from the sound source to generate the encrypted audio data.
Optionally, the audio data includes recording files and original audio data segments.
An embodiment of the present invention further provides a method for secure transmission of audio data, including:
a terminal receives audio data sent by a sending terminal;
after receiving a recording playback request initiated by a user through an application, the terminal decrypts the audio data and plays the decrypted audio to the user.
Optionally, the audio data sent by the sending terminal includes the speech frames of the encrypted audio data and one or more feature speech frames;
decrypting the audio data after the terminal receives the recording playback request initiated by the user through an application includes:
after receiving the recording playback request initiated by the user through an application, the terminal examines the audio data sent by the sending terminal and, after identifying the one or more feature speech frames, either prompts the terminal user to enter a recording key and receives the recording key entered by the terminal user, or obtains a locally stored preset recording key;
the terminal decrypts the speech frames of the encrypted audio data with the recording key entered by the terminal user or the locally stored preset recording key.
Optionally, after the terminal receives the audio data sent by the sending terminal, the method further includes:
examining the audio data sent by the sending terminal and turning on a security mode when the one or more feature speech frames are identified.
Optionally, the audio data includes recording files and original audio data segments.
An embodiment of the present invention further provides a sending terminal, including:
a first application module, configured to receive a recording request initiated by a user and to send the encrypted audio data received from a recording module to a receiving terminal; and
a recording module, configured to encrypt the audio data recorded from a sound source to generate encrypted audio data and to send it to the first application module.
Optionally, the recording module encrypting the audio data recorded from the sound source to generate the encrypted audio data includes:
the recording module prompts the terminal user to enter a recording key and receives the recording key entered by the terminal user, or obtains a locally stored preset recording key;
the recording module encrypts the speech frames of the recorded audio data with the recording key entered by the terminal user or the locally stored preset recording key, and then inserts one or more feature speech frames to generate the final encrypted audio data; or it encrypts the speech frames of the recorded audio data with the recording key entered by the terminal user or the locally stored preset recording key, and inserts one or more feature speech frames before the encrypted speech frames to generate the final encrypted audio data.
Optionally, the sending terminal further includes a first security mode opening module connected to the recording module and configured to turn on a security mode before the audio data recorded from the sound source is encrypted, thereby triggering the recording module to encrypt the audio data recorded from the sound source to generate the encrypted audio data;
the recording module is further configured to encrypt the audio data recorded from the sound source to generate the encrypted audio data after being triggered by the first security mode opening module.
Optionally, the audio data includes recording files and original audio data segments.
An embodiment of the present invention further provides a receiving terminal, including:
a second application module, configured to receive audio data sent by a sending terminal and further configured to receive a recording playback request initiated by a user;
a recording playback module, configured to decrypt the audio data after the recording playback request is received and to play the decrypted audio to the user.
Optionally, the audio data sent by the sending terminal includes the speech frames of the encrypted audio data and one or more feature speech frames;
the recording playback module decrypting the audio data after the recording playback request is received includes:
after the recording playback request is received, the recording playback module examines the audio data sent by the sending terminal and, after identifying the one or more feature speech frames, either prompts the terminal user to enter a recording key and receives the recording key entered by the terminal user, or obtains a locally stored preset recording key;
the recording playback module decrypts the speech frames of the encrypted audio data with the recording key entered by the terminal user or the locally stored preset recording key.
Optionally, the receiving terminal further includes a second security mode opening module connected to the second application module and the recording playback module and configured to examine the audio data sent by the sending terminal after the second application module receives it, and, when the one or more feature speech frames are identified, to turn on a security mode and trigger the recording playback module to start.
Optionally, the audio data includes recording files and original audio data segments.
An embodiment of the present invention further provides a system for secure transmission of audio data, including the sending terminal described above and the receiving terminal described above.
An embodiment of the present invention further provides a computer-readable storage medium storing program instructions that implement the above method when executed.
Compared with the related art, in the method, system, and terminal for secure transmission of audio data provided by the embodiments of the present invention, the terminal's underlying module encrypts or decrypts the audio data to be transmitted, which prevents the recording content from being stolen from the SD card. The application serves only as a transmission channel for the already encrypted audio data, and the audio data it carries is ciphertext, which prevents leakage through the application. Security and reliability are high, and the user's need to protect privacy is met.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief description of the drawings
FIG. 1 is a schematic diagram of the "recording service" in the underlying framework of a terminal in the related art providing a recording service for an upper-layer application;
FIG. 2 is a schematic diagram of the "audio service" in the underlying framework of a terminal in the related art providing a recording service for an upper-layer application;
FIG. 3 is a structural diagram of the sending terminal in the embodiment;
FIG. 4 is a structural diagram of the receiving terminal in the embodiment;
FIG. 5 is a flowchart of the method for secure transmission of audio data in the embodiment;
FIG. 6 is a schematic diagram of the "hard" encryption switch and the "soft" encryption switch in the embodiment;
FIG. 7 is a flowchart of the method for secure transmission of audio data in the embodiment;
FIG. 8 is a schematic diagram of the "recording service" in the underlying framework of the terminal in the application example providing a recording service for an upper-layer application;
FIG. 9 is a schematic diagram of the "audio service" in the underlying framework of the terminal in the application example providing an audio service for an upper-layer application;
FIG. 10 is a schematic diagram of inserting feature speech frames into the recording file format in the application example;
FIG. 11 is a schematic diagram of inserting distinguishable features into the original audio data segment in the application example.
Embodiments of the invention
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. Provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.
Embodiment:
As the background section explains, recordings are made and played by the terminal's underlying framework at the request of an upper-layer application. If the terminal encrypts the recorded audio data so that it can be played only after the correct password is entered, and only noise is heard when no password or a wrong password is entered (including when the data is copied to a PC and played by force in a music player), the privacy of the user's voice is protected.
This embodiment provides a system for secure transmission of audio data, including a sending terminal and a receiving terminal. As shown in FIG. 3, the sending terminal includes:
a first application module 301, configured to receive a recording request initiated by a user and to send the encrypted audio data received from the recording module 302 to the receiving terminal; and
a recording module 302, configured to encrypt the audio data recorded from a sound source to generate encrypted audio data and to send it to the first application module 301.
The recording module 302 is located in the terminal's underlying framework. The first application module 301 is the application itself, which may be an application that ships with the terminal or a third-party application downloaded by the user, such as WeChat or QQ.
As an optional approach, the recording module 302 encrypting the audio data recorded from the sound source to generate the encrypted audio data includes:
the recording module 302 prompts the terminal user to enter a recording key and receives the recording key entered by the terminal user, or obtains a locally stored preset recording key;
the recording module 302 encrypts the speech frames of the recorded audio data with the recording key entered by the terminal user or the locally stored preset recording key, and then inserts one or more feature speech frames to generate the final encrypted audio data.
Optionally, the one or more feature speech frames may be inserted at any position among the speech frames of the encrypted audio data, for example before them, in the middle of them, or at the end.
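The encrypt-then-insert step can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's implementation: the text does not name a cipher, so the sketch substitutes a toy SHA-256 keystream XOR that must not be used for real protection; the function names and the `position` parameter are hypothetical; and the feature frame pattern (header 0x3c followed by 31 bytes of 0x01) is borrowed from the document's AMR 12.2 kbps example. Only the 31 payload bytes of each frame are encrypted, so the frame header and the overall file format stay intact.

```python
import hashlib
from typing import List

# Assumed feature frame: AMR 12.2 kbps header byte plus 31 bytes of 0x01.
FEATURE_FRAME = bytes([0x3C]) + b"\x01" * 31


def _keystream(key: bytes, frame_index: int, length: int) -> bytes:
    """Toy keystream built from SHA-256 (illustration only, not a vetted cipher)."""
    out = b""
    block = 0
    while len(out) < length:
        seed = key + frame_index.to_bytes(4, "big") + block.to_bytes(4, "big")
        out += hashlib.sha256(seed).digest()
        block += 1
    return out[:length]


def crypt_frame(frame: bytes, key: bytes, frame_index: int) -> bytes:
    """XOR the payload with the keystream and keep the header byte unchanged.

    XOR is its own inverse, so the same call also decrypts.
    """
    stream = _keystream(key, frame_index, len(frame) - 1)
    return frame[:1] + bytes(p ^ s for p, s in zip(frame[1:], stream))


def encrypt_recording(frames: List[bytes], key: bytes,
                      feature_count: int = 1,
                      position: str = "front") -> List[bytes]:
    """Encrypt every speech frame, then insert the feature frames."""
    encrypted = [crypt_frame(f, key, i) for i, f in enumerate(frames)]
    markers = [FEATURE_FRAME] * feature_count
    if position == "front":
        return markers + encrypted
    if position == "end":
        return encrypted + markers
    if position == "middle":
        mid = len(encrypted) // 2
        return encrypted[:mid] + markers + encrypted[mid:]
    raise ValueError("position must be 'front', 'middle', or 'end'")
```

Keeping the header byte in the clear is what lets an ordinary player still parse the frames; without the key it decodes the scrambled payload as noise.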
As an optional approach, the sending terminal further includes a first security mode opening module 303 connected to the recording module 302 and configured to turn on a security mode before the audio data recorded from the sound source is encrypted, thereby triggering the recording module 302 to encrypt the audio data recorded from the sound source to generate the encrypted audio data;
the recording module 302 is further configured to encrypt the audio data recorded from the sound source to generate the encrypted audio data after being triggered by the first security mode opening module 303.
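The relationship between the two modules can be pictured with a small sketch. The class and method names below are hypothetical, and the cipher is passed in as a parameter because the text does not specify one; the sketch only shows that the recording module returns plain audio until the security mode opening module has triggered it.

```python
from typing import Callable, Optional


class RecordingModule:
    """Returns plain audio until the security mode has been opened."""

    def __init__(self, encrypt: Callable[[bytes, bytes], bytes]) -> None:
        self._encrypt = encrypt
        self._key: Optional[bytes] = None
        self.secure_mode = False

    def on_secure_mode_opened(self, key: bytes) -> None:
        # Called by the security mode opening module.
        self._key = key
        self.secure_mode = True

    def record(self, raw_audio: bytes) -> bytes:
        if self.secure_mode and self._key is not None:
            return self._encrypt(raw_audio, self._key)
        return raw_audio


class SecureModeSwitch:
    """Stand-in for the first security mode opening module."""

    def __init__(self, recorder: RecordingModule) -> None:
        self._recorder = recorder

    def open(self, key: bytes) -> None:
        self._recorder.on_secure_mode_opened(key)
```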
The audio data includes recording files and original audio data segments.
As shown in FIG. 4, the receiving terminal includes:
a second application module 401, configured to receive audio data sent by the sending terminal and further configured to receive a recording playback request initiated by a user;
a recording playback module 402, configured to decrypt the audio data after the recording playback request is received and to play the decrypted audio to the user.
The recording playback module 402 is located in the terminal's underlying framework. The second application module 401 is the application itself, which may be an application that ships with the terminal or a third-party application downloaded by the user, such as WeChat or QQ.
The audio data sent by the sending terminal includes the speech frames of the encrypted audio data and one or more feature speech frames;
the recording playback module 402 decrypting the audio data after the recording playback request is received includes:
after the recording playback request is received, the recording playback module 402 examines the audio data sent by the sending terminal and, after identifying the one or more feature speech frames, either prompts the terminal user to enter a recording key and receives the recording key entered by the terminal user, or obtains a locally stored preset recording key;
the recording playback module 402 decrypts the speech frames of the encrypted audio data with the recording key entered by the terminal user or the locally stored preset recording key.
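The playback-side flow above (identify feature frames, obtain a key, decrypt) can be sketched as follows. Everything named here is hypothetical: `handle_play_request`, `prompt_for_key`, and `decrypt_frame` are illustrative names, the decryption routine is injected because the text does not name a cipher, and the feature frame pattern is the one from the document's AMR 12.2 kbps example. Preferring the preset key over prompting is a choice made for this sketch; the text presents the two key sources as alternatives.

```python
from typing import Callable, List, Optional

# Assumed feature frame: AMR 12.2 kbps header byte plus 31 bytes of 0x01.
FEATURE_FRAME = bytes([0x3C]) + b"\x01" * 31


def handle_play_request(
    frames: List[bytes],
    decrypt_frame: Callable[[bytes, bytes, int], bytes],
    preset_key: Optional[bytes] = None,
    prompt_for_key: Optional[Callable[[], bytes]] = None,
) -> List[bytes]:
    """Return the frames that should be handed to the audio output."""
    if not any(f == FEATURE_FRAME for f in frames):
        # No feature frame found: treat the audio as ordinary and play it as is.
        return list(frames)

    # Feature frame found: use the preset key, or ask the user for one.
    if preset_key is not None:
        key = preset_key
    elif prompt_for_key is not None:
        key = prompt_for_key()
    else:
        raise ValueError("encrypted audio, but no key source is available")

    # Drop the feature frames and decrypt the remaining speech frames in order.
    speech = [f for f in frames if f != FEATURE_FRAME]
    return [decrypt_frame(f, key, i) for i, f in enumerate(speech)]
```

A wrong key still yields well-formed frames, so playback proceeds but produces noise, which matches the behavior the document describes.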
As an optional approach, the receiving terminal further includes a second security mode opening module 403 connected to the second application module 401 and the recording playback module 402 and configured to examine the audio data sent by the sending terminal after the second application module 401 receives it, and, when the one or more feature speech frames are identified, to turn on a security mode and trigger the recording playback module 402 to start.
The audio data includes recording files and original audio data segments.
As shown in FIG. 5, this embodiment provides a method for secure transmission of audio data, including the following steps:
S101: The terminal receives a recording request initiated by a user through an application.
S102: The terminal encrypts the audio data recorded from a sound source to generate encrypted audio data.
As an optional approach, encrypting the audio data recorded from the sound source to generate the encrypted audio data includes:
prompting the terminal user to enter a recording key and receiving the recording key entered by the terminal user, or obtaining a locally stored preset recording key;
encrypting the speech frames of the recorded audio data with the recording key entered by the terminal user or the locally stored preset recording key, and then inserting one or more feature speech frames to generate the final encrypted audio data.
If the recording is made for the user's own listening, the user must remember the recording key. If it is made for someone else (as in recording-based chat software), the sender and the receiver should agree on the recording key in advance by some other means, for example verbally.
Optionally, the one or more feature speech frames may be inserted before the speech frames of the encrypted audio data to generate the final encrypted audio data; they may also be inserted in the middle of or after the speech frames of the encrypted audio data.
The audio data includes recording files and raw audio data segments. This embodiment accordingly covers two ways of producing a recording: the recording service and the audio service. The recording service handles recording files and the audio service handles raw audio data segments. When the audio data is a recording file, the recording request carries the storage path of the recording file.
In this embodiment, the encrypted recording file or raw audio data segment keeps a normal audio file format, but a distinguishable signature is inserted into it, namely one or more feature speech frames. Because the normal audio file format is kept, an ordinary music player can still play the data, but without the key it plays back only noise, which lets the user see directly that the recording method is secure. In addition, the feature speech frames allow the receiver to recognize that the recording file or raw audio data segment is encrypted.
The feature speech frame differs from the speech frames of audio recorded during an ordinary user chat. Take an AMR file with a bit rate of 12.2 kbps as an example: each speech frame is 32 bytes long, where the first byte is the frame header, fixed at 0x3c, and the following 31 bytes are the actual speech data. Suppose the feature speech frame is defined as a frame in which all 31 bytes after the frame header 0x3c are 0x01. Its signature is then unmistakable, because a real-world recording would not produce such a speech frame. This is an example of inserting a single feature speech frame.
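The single-frame marker described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the constant and function names are hypothetical, and only the 32-byte frame length, the 0x3c header, and the 0x01 filler come from the description.

```python
AMR_122_FRAME_LEN = 32   # bytes per 12.2 kbps AMR speech frame (from the description)
AMR_122_HEADER = 0x3C    # fixed frame-header byte (from the description)
FEATURE_FILL = 0x01      # filler value used in the description's example


def make_feature_frame() -> bytes:
    """Build the example feature frame: header 0x3C plus 31 bytes of 0x01."""
    payload = bytes([FEATURE_FILL]) * (AMR_122_FRAME_LEN - 1)
    return bytes([AMR_122_HEADER]) + payload


def is_feature_frame(frame: bytes) -> bool:
    """Return True only for a frame identical to the example feature frame."""
    return frame == make_feature_frame()
```

A receiver would call `is_feature_frame` on the first frame of incoming audio to decide whether the data should be treated as encrypted.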
When multiple feature speech frames are inserted, there is no need to construct frames that a real-world recording could not produce. The sequence formed by the feature speech frames serves as the signature instead. For example, when the speech frame sequence of a pre-recorded clip is inserted as feature speech frames before the speech frames of the encrypted audio data, that frame sequence is a clear marker of encrypted audio data, because a real-world recording would not reproduce exactly the same sequence of speech frames.
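Recognizing a multi-frame marker then reduces to comparing the leading frames against the known pre-recorded sequence. The sketch below assumes the marker sits at the start of the frame list and that both sides hold the same marker frames; the function name is hypothetical.

```python
from typing import List, Tuple


def split_marker(frames: List[bytes], marker: List[bytes]) -> Tuple[bool, List[bytes]]:
    """Return (marker_found, remaining_frames).

    The marker is found only if the first len(marker) frames match the
    known pre-recorded sequence exactly, frame for frame.
    """
    n = len(marker)
    if n > 0 and frames[:n] == marker:
        return True, frames[n:]
    return False, list(frames)
```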
In one application example, a single feature speech frame may be inserted at the start of the speech frame sequence of the recording file or raw audio data segment. This one frame plays for a very short time (for AMR, one frame lasts 20 ms), so the human ear cannot detect it, but a machine can.
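The sender-side step in this example (encrypt each speech frame, then put one feature frame in front) might look like the sketch below. The description does not name a cipher, so the SHA-256-derived XOR keystream here is only a stand-in chosen to keep the example dependency-free; it is not a vetted encryption scheme. Leaving the header byte unencrypted is also an assumption, made so that the output still parses as 32-byte AMR frames. All names are hypothetical.

```python
import hashlib
from typing import List

FEATURE_FRAME = bytes([0x3C]) + b"\x01" * 31  # example marker from the description


def _keystream(key: bytes, index: int, n: int) -> bytes:
    # Illustrative keystream only: hash of key plus frame index, truncated to n bytes.
    return hashlib.sha256(key + index.to_bytes(8, "big")).digest()[:n]


def xor_frame(frame: bytes, key: bytes, index: int) -> bytes:
    """Encrypt or decrypt one frame; the header byte is left as-is."""
    ks = _keystream(key, index, len(frame) - 1)
    return frame[:1] + bytes(a ^ b for a, b in zip(frame[1:], ks))


def encrypt_recording(frames: List[bytes], key: bytes) -> List[bytes]:
    """Encrypt every speech frame and prepend a single feature frame."""
    return [FEATURE_FRAME] + [xor_frame(f, key, i) for i, f in enumerate(frames)]
```

Because XOR is its own inverse, applying `xor_frame` with the same key and index to an encrypted frame restores the original frame.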
In another application example, a special run of speech frames (that is, a short recording) may be inserted at the start of the recording file's speech frame sequence. These multiple feature speech frames carry a spoken notice that the file is an encrypted recording, and this notice is audible to the human ear.
For example, a pre-recorded clip saying "Palm Whisper recording file" is inserted at the start of the recording file's speech frame sequence (for AMR, 50 frames, 1 s of audio in total). When the recipient plays the file in an ordinary music player, the user hears this clip first and learns that the file is an encrypted recording. Because this special run of frames is pre-recorded, it is also a distinct, recognizable signature for the underlying security framework.
For a raw audio data segment, the first approach can be used. Such segments are short, possibly only a few tens of milliseconds, so inserting a single feature speech frame is the reasonable choice. A larger audio data segment can of course also take multiple feature speech frames.
For a complete recording file, both approaches apply. The second approach inserts a short recording, which is easy for both the human ear and the machine to recognize. If the user opens the encrypted recording file directly from the file system of the receiving terminal, the terminal detects from the feature frames of this clip that the file is an encrypted recording and pops up a dialog asking the user to enter the password. If the password is correct, the user hears the actual recording. If the password is wrong or none is entered, the user hears only "Palm Whisper recording file", and the terminal can skip the noise that follows. This differs from forced playback on a PC and reflects the fact that users do not want to hear noise.
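The playback behavior just described can be summarized as a small decision function. This is a sketch under stated assumptions: the description does not say how the terminal determines that a password is correct, so that check is delegated to a hypothetical `try_decrypt` callback that returns `None` for a wrong password. All names are illustrative.

```python
from typing import Callable, List, Optional


def select_playback(
    frames: List[bytes],
    marker: List[bytes],
    password: Optional[bytes],
    try_decrypt: Callable[[List[bytes], bytes], Optional[List[bytes]]],
) -> List[bytes]:
    """Decide which frames the receiving terminal plays for a directly opened file."""
    n = len(marker)
    if n == 0 or frames[:n] != marker:
        return list(frames)      # no marker: ordinary recording, play as-is
    body = frames[n:]
    if password is not None:
        clear = try_decrypt(body, password)
        if clear is not None:
            return clear         # correct password: the actual recording
    return list(marker)          # wrong or missing password: notice only, noise skipped
```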
Optionally, in this embodiment, a security mode that can be turned on or off may be provided on the terminal. With the security mode on, a recording produced by the underlying framework at the request of an upper-layer application is encrypted. With it off, recording reverts to ordinary recording. Turning the security mode on or off requires a switch, which may be called the "secure" switch. It may be a physical key on the terminal (referred to here as the hard "secure" button), as shown in FIG. 6(a), or a virtual key floating on the screen (referred to here as the soft "secure" button), as shown in FIG. 6(b).
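One way to picture the switch is as a flag on the underlying recording service, so that the upper-layer application makes the same call in both modes. The class below is a hypothetical sketch; the encryption routine is injected and is not specified here.

```python
from typing import Callable, List


class RecordingService:
    """Underlying recording service whose output depends on the security mode."""

    def __init__(self, encrypt: Callable[[List[bytes]], List[bytes]]) -> None:
        self._encrypt = encrypt
        self.secure_mode = False  # state of the "secure" switch (hard or soft button)

    def record(self, captured: List[bytes]) -> List[bytes]:
        # The application calls record() the same way in either mode;
        # only the returned frames differ.
        if self.secure_mode:
            return self._encrypt(captured)
        return list(captured)
```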
S103: The encrypted audio data is sent to the receiving terminal through the application.
As shown in FIG. 7, this embodiment provides a method for securely transmitting audio data, including the following steps:
S201: The terminal receives audio data sent by a sending terminal;
S202: After receiving a recording playback request initiated by a user through an application, the terminal decrypts the audio data;
The audio data sent by the sending terminal includes the speech frames of the encrypted audio data and one or more feature speech frames;
Decrypting the audio data after the terminal receives the recording playback request initiated by the user through an application includes:
after receiving the recording playback request initiated by the user through an application, the terminal examines the audio data sent by the sending terminal and, once it identifies the one or more feature speech frames, either prompts the terminal user to enter a recording key and receives the recording key entered by the terminal user, or obtains a preset recording key stored locally;
decrypting the speech frames of the encrypted audio data with the recording key entered by the terminal user or with the locally stored preset recording key.
S203: The decrypted audio is played to the user.
The audio data includes recording files and raw audio data segments. This embodiment covers two ways of producing a recording, the recording service and the audio service, and there are two corresponding ways of decrypting and playing the audio data. The recording service handles recording files and the audio service handles raw audio data segments. When the audio data is a recording file, the recording playback request carries the storage path of the recording file.
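Steps S201 to S203 on the receiving side can be sketched as: check for the marker, obtain a key, then decrypt the remaining frames. The example below assumes a single leading feature frame in the 12.2 kbps AMR form given earlier. Preferring a stored key over prompting is an arbitrary choice made for illustration (the description allows either), and the cipher is passed in as a hypothetical `decrypt_frame` callback.

```python
from typing import Callable, List, Optional

FEATURE_FRAME = bytes([0x3C]) + b"\x01" * 31  # example marker from the description


def handle_playback_request(
    frames: List[bytes],
    stored_key: Optional[bytes],
    prompt_user: Callable[[], bytes],
    decrypt_frame: Callable[[bytes, bytes, int], bytes],
) -> List[bytes]:
    """Return the frames to hand to the audio output."""
    if not frames or frames[0] != FEATURE_FRAME:
        return list(frames)  # no marker identified: treat as ordinary audio
    # Marker identified: use the locally stored key if present, else ask the user.
    key = stored_key if stored_key is not None else prompt_user()
    return [decrypt_frame(f, key, i) for i, f in enumerate(frames[1:])]
```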
This embodiment is described in detail below using an application example.
FIG. 8 is a schematic diagram of the "recording service" in the terminal's underlying framework providing a recording service to an upper-layer application in this application example. As shown in FIG. 8, terminal A and terminal B agree on a key in advance and supply it to the recording service modules (that is, the recording module of terminal A and the recording playback module of terminal B). The upper-layer application on terminal A (for example, WeChat's "Hold to Talk" function) sends a request to the recording module. The request carries a specified storage path for the recording file and asks the recording module to record and to store the recording file under that path in the file system. The recording module encrypts the recording file recorded from the sound source with the key (entered by the terminal user or preset and stored locally), producing an encrypted recording file, and stores the encrypted recording file under the specified path. When recording ends, the upper-layer application fetches the recording file from the file system and sends it over the network to a network server. The network server keeps the recording file in its local storage and then delivers it over the network to the upper-layer application on the destination terminal B (for example, WeChat's "Hold to Talk" function). The upper-layer application on B stores the recording file in the file system and then sends the recording playback module a request to play that recording file. The recording playback module reads the recording file from the file system, decrypts it with the key (entered by the terminal user or preset and stored locally), and plays it.
Compared with the ordinary "recording service" in the related art, the recording module in this embodiment outputs an encrypted recording file. All of this is transparent to the upper-layer application, which still believes it is using the ordinary "recording service". The upper-layer application does not take part in the encoding itself, so in this embodiment the underlying security framework can encrypt recordings without any code changes in the upper-layer application.
Because the recording file is already encrypted when it is generated, the recording files obtained by the upper-layer application and by the network server are both encrypted. Without the key they cannot be played correctly, and forced playback yields only noise. Even if a file is obtained by an eavesdropper, the privacy of the user's voice is not exposed.
FIG. 9 is a schematic diagram of the "audio service" in the terminal's underlying framework providing a recording service to an upper-layer application in this application example. As shown in FIG. 9, terminal A and terminal B agree on a key in advance and supply it to the recording module. The upper-layer application on terminal A (for example, WeChat's "real-time intercom" function) obtains from the recording module the raw audio data segments recorded from the specified sound source. The recording module encrypts each raw audio data segment with the key (entered by the terminal user or preset and stored locally) and returns the encrypted segment to the upper-layer application, which sends it over the network to a network server. The network server stores and retrieves the segment using its local storage device and then delivers it over the network to the upper-layer application on the destination terminal B (for example, WeChat's "real-time intercom" function). The upper-layer application on B passes the segment to the recording playback module, which decrypts it with the key (entered by the terminal user or preset and stored locally) and plays it.
Compared with the ordinary "audio service" in the related art, the recording module in this embodiment outputs encrypted raw audio data segments. All of this is transparent to the upper-layer application, which still believes it is using the ordinary "audio service". Because the raw audio data segments are encrypted, the segments obtained by the upper-layer application and by the network server are both encrypted. Without the key they cannot be played correctly, and forced playback yields only noise. Even if a segment is obtained by an eavesdropper, the privacy of the user's voice is not exposed.
FIG. 10 is a schematic diagram of inserting feature speech frames into the recording file format in this application example. In the figure:
101 is an ordinary audio file structure. For ease of explanation an AMR file is used as the example; AMR is the file format most commonly used for recordings made on a terminal;
102 is an encrypted audio file structure with one feature speech frame inserted (AMR as the example);
103 is an encrypted audio file structure with multiple feature speech frames (for example, a short recording) inserted (AMR as the example);
104 is the file header;
105 is an unencrypted original speech frame;
106 is a single feature speech frame;
107 is an encrypted speech frame (noise if played by force);
108 is a recorded clip made up of feature speech frame 1, ..., feature speech frame m (for example, "Palm Whisper recording file"), which is easy for both the human ear and the machine to identify.
The single feature speech frame in 102 is easy for a machine to recognize but not for the human ear. The multiple feature speech frames in 103 are easy for both. If 102 is played by force on a PC, only noise is heard. If 103 is played by force on a PC, "Palm Whisper recording file" is heard, followed by noise. With 103, the user can readily recognize that the recording is a file encrypted by the above method.
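The three layouts (101, 102, 103) can be illustrated with a small builder and parser. This sketch assumes the standard single-channel AMR file header `#!AMR\n` and that every frame is a 32-byte 12.2 kbps frame; a real AMR file may also contain frames of other sizes, which this example does not handle. The function names are hypothetical.

```python
from typing import List, Sequence

AMR_MAGIC = b"#!AMR\n"   # file header (104) of a single-channel AMR file
FRAME_LEN = 32           # 12.2 kbps frame size assumed throughout


def build_file(frames: Sequence[bytes], marker: Sequence[bytes] = ()) -> bytes:
    """Layout 101 when marker is empty; layouts 102/103 when marker has 1 or m frames."""
    return AMR_MAGIC + b"".join(marker) + b"".join(frames)


def parse_file(data: bytes) -> List[bytes]:
    """Split a file built by build_file back into its 32-byte frames."""
    if not data.startswith(AMR_MAGIC):
        raise ValueError("missing AMR file header")
    body = data[len(AMR_MAGIC):]
    if len(body) % FRAME_LEN != 0:
        raise ValueError("body is not a whole number of 32-byte frames")
    return [body[i:i + FRAME_LEN] for i in range(0, len(body), FRAME_LEN)]
```

Since the marker frames are ordinary-sized frames placed directly after the header, a standard player walks through layouts 102 and 103 exactly as it would through 101.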
FIG. 11 is a schematic diagram of inserting a distinguishable signature into a raw audio data segment in this application example. In the figure:
201 is a raw audio data segment recorded by the "audio service";
202 is an encrypted raw audio data segment with one feature speech frame inserted;
203 is an unencrypted original speech frame;
204 is a single feature speech frame;
205 is an encrypted speech frame (noise if played by force);
As the above embodiments show, compared with the related art, in the method, system, and terminal for secure transmission of audio data provided by these embodiments, an underlying module of the terminal encrypts or decrypts the audio data to be transmitted. This prevents recorded content from being stolen from the SD card. The application serves only as a transmission channel for audio data that is already encrypted, so everything the application handles is ciphertext. This prevents leaks by the application and leaks of the audio data on the intermediate transmission path (the network and the network server), provides high security and reliability, and meets the user's need to protect privacy.
Those of ordinary skill in the art will understand that all or some of the steps of the above method may be carried out by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in hardware or as a software functional module. The embodiments of the present invention are not limited to any particular combination of hardware and software.
Industrial Applicability
In the method, system, and terminal provided by the embodiments of the present invention, an underlying module of the terminal encrypts or decrypts the audio data to be transmitted. This prevents recorded content from being stolen from the SD card. The application serves only as a transmission channel for audio data that is already encrypted, so everything the application handles is ciphertext. This prevents leaks by the application, provides high security and reliability, and meets the user's need to protect privacy.

Claims (19)

  1. A method for securely transmitting audio data, comprising:
    a terminal receiving a recording request initiated by a user through an application;
    encrypting audio data recorded from a sound source to generate encrypted audio data, and sending the encrypted audio data to a receiving terminal through the application.
  2. The method of claim 1, wherein:
    the encrypting audio data recorded from a sound source to generate encrypted audio data comprises:
    prompting the terminal user to enter a recording key and receiving the recording key entered by the terminal user, or obtaining a preset recording key stored locally;
    encrypting the speech frames of the recorded audio data with the recording key entered by the terminal user or with the locally stored preset recording key, and then inserting one or more feature speech frames to generate the final encrypted audio data; or encrypting the speech frames of the recorded audio data with the recording key entered by the terminal user or with the locally stored preset recording key, and inserting one or more feature speech frames before the speech frames of the encrypted audio data to generate the final encrypted audio data.
  3. The method of claim 1 or 2, wherein before the encrypting audio data recorded from a sound source to generate encrypted audio data, the method further comprises:
    turning on a security mode, which triggers the encrypting of the audio data recorded from the sound source to generate encrypted audio data.
  4. The method of claim 1 or 2, wherein:
    the audio data includes recording files and raw audio data segments.
  5. A method for securely transmitting audio data, comprising:
    a terminal receiving audio data sent by a sending terminal;
    after receiving a recording playback request initiated by a user through an application, the terminal decrypting the audio data and playing the decrypted audio to the user.
  6. The method of claim 5, wherein:
    the audio data sent by the sending terminal includes the speech frames of encrypted audio data and one or more feature speech frames;
    the terminal decrypting the audio data after receiving the recording playback request initiated by the user through an application comprises:
    after receiving the recording playback request initiated by the user through an application, the terminal examining the audio data sent by the sending terminal and, after identifying the one or more feature speech frames, prompting the terminal user to enter a recording key and receiving the recording key entered by the terminal user, or, after identifying the one or more feature speech frames, obtaining a preset recording key stored locally;
    decrypting the speech frames of the encrypted audio data with the recording key entered by the terminal user or with the locally stored preset recording key.
  7. The method of claim 5, wherein after the terminal receives the audio data sent by the sending terminal, the method further comprises:
    examining the audio data sent by the sending terminal, and turning on a security mode when the one or more feature speech frames are identified.
  8. The method of claim 5, 6 or 7, wherein:
    the audio data includes recording files and raw audio data segments.
  9. A sending terminal, comprising:
    a first application module, configured to receive a recording request initiated by a user and to send encrypted audio data received from a recording module to a receiving terminal; and
    the recording module, configured to encrypt audio data recorded from a sound source to generate the encrypted audio data and to send it to the first application module.
  10. The sending terminal of claim 9, wherein:
    the recording module encrypting the audio data recorded from the sound source to generate encrypted audio data comprises:
    the recording module prompting the terminal user to enter a recording key and receiving the recording key entered by the terminal user, or obtaining a preset recording key stored locally;
    encrypting the speech frames of the recorded audio data with the recording key entered by the terminal user or with the locally stored preset recording key, and then inserting one or more feature speech frames to generate the final encrypted audio data; or encrypting the speech frames of the recorded audio data with the recording key entered by the terminal user or with the locally stored preset recording key, and inserting one or more feature speech frames before the speech frames of the encrypted audio data to generate the final encrypted audio data.
  11. The sending terminal of claim 9 or 10, further comprising a first security mode enabling module connected to the recording module and configured to turn on a security mode before the audio data recorded from the sound source is encrypted to generate encrypted audio data, thereby triggering the recording module to encrypt the audio data recorded from the sound source to generate encrypted audio data;
    the recording module is further configured to encrypt the audio data recorded from the sound source to generate encrypted audio data after being triggered by the first security mode enabling module.
  12. The sending terminal of claim 9 or 10, wherein:
    the audio data includes recording files and raw audio data segments.
  13. A receiving terminal, comprising:
    a second application module, configured to receive audio data sent by a sending terminal and further configured to receive a recording playback request initiated by a user;
    a recording playback module, configured to decrypt the audio data after the recording playback request is received and to play the decrypted audio to the user.
  14. The receiving terminal of claim 13, wherein:
    the audio data sent by the sending terminal includes the speech frames of encrypted audio data and one or more feature speech frames;
    the recording playback module decrypting the audio data after the recording playback request is received comprises:
    after the recording playback request is received, the recording playback module examining the audio data sent by the sending terminal and, after identifying the one or more feature speech frames, prompting the terminal user to enter a recording key and receiving the recording key entered by the terminal user, or, after identifying the one or more feature speech frames, obtaining a preset recording key stored locally;
    decrypting the speech frames of the encrypted audio data with the recording key entered by the terminal user or with the locally stored preset recording key.
  15. The receiving terminal of claim 13, further comprising a second security mode enabling module connected to the second application module and the recording playback request module and configured to examine the audio data sent by the sending terminal after the second application module receives that audio data and, when the one or more feature speech frames are identified, to turn on a security mode and trigger the recording playback module to start.
  16. The receiving terminal of claim 13, 14 or 15, wherein:
    the audio data includes recording files and raw audio data segments.
  17. A system for secure transmission of audio data, comprising the sending terminal of claims 9 to 12 and the receiving terminal of claims 13 to 16.
  18. A computer-readable storage medium storing program instructions which, when executed, implement the method of any one of claims 1 to 4.
  19. A computer-readable storage medium storing program instructions which, when executed, implement the method of any one of claims 5 to 8.
PCT/CN2015/087245 2014-10-24 2015-08-17 Method, system, and terminal for secure transmission of audio data WO2016062153A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410583678.7A CN104393994B (en) 2014-10-24 2014-10-24 Audio data secure transmission method, system and terminal
CN201410583678.7 2014-10-24

Publications (1)

Publication Number Publication Date
WO2016062153A1 true WO2016062153A1 (en) 2016-04-28

Family

ID=52611830

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/087245 WO2016062153A1 (en) 2014-10-24 2015-08-17 Method, system, and terminal for secure transmission of audio data

Country Status (2)

Country Link
CN (1) CN104393994B (en)
WO (1) WO2016062153A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110636175A (en) * 2019-10-18 2019-12-31 深圳传音控股股份有限公司 Communication recording method, terminal device and computer readable storage medium
CN113707151A (en) * 2021-08-20 2021-11-26 天津讯飞极智科技有限公司 Voice transcription method, device, recording equipment, system and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104393994B (en) * 2014-10-24 2021-03-16 中兴通讯股份有限公司 Audio data secure transmission method, system and terminal
CN105099669B (en) * 2015-05-28 2019-07-19 努比亚技术有限公司 Recording encipher-decipher method and device
CN105554330A (en) * 2016-01-06 2016-05-04 努比亚技术有限公司 Voice message device and method
CN106211050A (en) * 2016-09-12 2016-12-07 青岛海信移动通信技术股份有限公司 Wireless cipher sending method, method of reseptance, Apparatus and system
CN109391283B (en) * 2018-12-12 2019-11-05 成都海得控制系统有限公司 Intercom system based on sewage treatment plant's operation and maintenance
CN112818375B (en) * 2021-03-08 2024-03-12 郑州铁路职业技术学院 Encryption system for recording information transmission

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030046063A1 (en) * 2001-09-03 2003-03-06 Samsung Electronics Co., Ltd. Combined stylus and method for driving thereof
CN102831912A (en) * 2012-08-10 2012-12-19 上海量明科技发展有限公司 Method, client and system for displaying playing progress of audio information
CN102916869A (en) * 2012-10-24 2013-02-06 鹤山世达光电科技有限公司 Instant messaging method and system
CN103780949A (en) * 2014-01-28 2014-05-07 佛山络威网络技术有限公司 Multimedia data recording method
CN103856386A (en) * 2012-11-28 2014-06-11 腾讯科技(深圳)有限公司 Information interaction method, system, server and instant messaging client
CN104393994A (en) * 2014-10-24 2015-03-04 中兴通讯股份有限公司 Safe transmission method and system for audio data and terminals

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101969545A (en) * 2010-09-08 2011-02-09 中兴通讯股份有限公司 Encryption method and device of multimedia file



Also Published As

Publication number Publication date
CN104393994B (en) 2021-03-16
CN104393994A (en) 2015-03-04

Similar Documents

Publication Publication Date Title
WO2016062153A1 (en) Method, system, and terminal for secure transmission of audio data
JP6592583B1 (en) Method and apparatus for information exchange
US8467512B2 (en) Method and system for authenticating telephone callers and avoiding unwanted calls
US7792296B2 (en) Access-controlled encrypted recording method for site, interaction and process monitoring
US8526620B2 (en) Method and system for secure data collection and distribution
CN106331751B (en) A kind of online encrypted slice video broadcasting method based on iOS operating system
US20190238795A1 (en) Method and system encrypting and decrypting audio/video file
US9503462B2 (en) Authenticating security parameters
US9571475B2 (en) Call encryption systems and methods
US20150134959A1 (en) Instant Communication Method and System
CN103200387B (en) A kind of monitoring video content protecting method and system
US20060218636A1 (en) Distributed communication security systems
WO2020155812A1 (en) Data storage method and device, and apparatus
CN105721903A (en) Method and system for playing online videos
WO2020003821A1 (en) Information processing system, information processing method, and information processing device
CN107094156A (en) A kind of safety communicating method and system based on P2P patterns
CN108768920B (en) Recorded broadcast data processing method and device
CN110380856B (en) Terminal device and voice information processing method and device thereof, and storage medium
WO2016082401A1 (en) Conversation method and apparatus, user terminal and computer storage medium
KR101078373B1 (en) System for authenticating a caller and Method thereof
CN108270917B (en) Encrypted smart phone
JP5778524B2 (en) Communication medium and data communication system for preventing data leakage
CN103986711A (en) Data processing method for voice communication
CN104038932B (en) A kind of safety equipment
KR101728338B1 (en) Call Security System

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15852040; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 15852040; Country of ref document: EP; Kind code of ref document: A1)