WO2017016363A1 - Method for processing digital audio signal - Google Patents

Method for processing digital audio signal

Info

Publication number
WO2017016363A1
WO2017016363A1 PCT/CN2016/087445 CN2016087445W WO2017016363A1 WO 2017016363 A1 WO2017016363 A1 WO 2017016363A1 CN 2016087445 W CN2016087445 W CN 2016087445W WO 2017016363 A1 WO2017016363 A1 WO 2017016363A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
audio signal
target data
digital audio
sequence
Application number
PCT/CN2016/087445
Other languages
French (fr)
Chinese (zh)
Inventor
李庆成
鹿毅忠
Original Assignee
李庆成
鹿毅忠
Application filed by 李庆成, 鹿毅忠
Publication of WO2017016363A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018 Audio watermarking, i.e. embedding inaudible data in the audio signal

Definitions

  • the present invention relates to a digital audio signal processing technique, and more particularly to a method for digital audio signal processing based on psychoacoustics using a masking effect.
  • Chinese Patent Application No. 201410301832.7 discloses a technique of encoding and modulating digital information to be transmitted to form a sound encoded signal; and mixing the sound encoded signal with an audio signal in a preselected audio and video program for output.
  • the "digital information to be transmitted" can be added to the normal sound by mixing; however, because that information is unpredictable, the sound encoded signal formed from it by coded modulation will in a considerable number of cases be heard as noise, and in other cases as other sounds that interfere with the normally played audio.
  • the following improvements are proposed in the specification section of the above patent application:
  • the digital information to be transmitted is encoded and modulated to form a sound encoded signal.
  • the sound encoded signal can be written as a digital sound signal file, or can be converted into a sound analog signal by a digital-to-analog converter.
  • the frequency of the sound analog signal can be chosen to lie in the band above 18 kHz and below 20 kHz, which the human ear has difficulty detecting and which does not affect normal playback of the original television sound or music signal. Because the user's local receiving device must receive and extract the digital information in subsequent steps, the sound coded information needs a distinguishing characteristic: its signal energy is distributed only within a certain frequency range, above 18 kHz and below 20 kHz.
  • the energy distribution of the portion of the sound coded information must be set within the frequency range of 18 kHz to 20 kHz.
  • the embedded target data can also be broadcasted together, and can be received and extracted by the device having the audio signal processing capability without being perceived by the human ear.
  • the target data is quantized by a quantizer capable of blind detection of the quantized result, and the result of the quantization is assigned to the discrete Fourier coefficients at the embedding position, thereby obtaining a plurality of second spectral data corresponding to the plurality of first spectral data.
  • target data to be transmitted can be embedded at a suitable position of the first digital audio signal in accordance with psychoacoustic principles.
  • the signal embedded at the embedding position to express the target data can be masked so that it is not perceived by the human ear, yet the embedded signals can still be detected and restored by a device with audio signal processing capability.
  • Another object of the present invention is to provide a method for extracting data from a digital audio signal, by which, while the digital audio signal is being played by audio equipment, the received digital audio signal can be processed and the target data embedded in it extracted using psychoacoustic principles.
  • the target data sequence is formed by serially arranging one or more specific audio data and/or encoded data in a predetermined order; the specific audio data correspond to a particular loudness and/or a particular pitch and/or timbre.
  • the target data sequence carried by the first digital audio signal by using the masking effect is extracted from the first digital audio signal, and the corresponding target data is further recovered;
  • although the embedded target data sequence is broadcast by the audio device together with the digital audio signal, it is not perceived by the human ear.
  • some target data needs to be embedded into the target digital audio signal.
  • each audio frame data is windowed. Then, frequency domain discrete Fourier transform is performed on each audio frame data subjected to windowing, and a plurality of first spectrum data respectively corresponding to the respective audio frame data can be obtained.
  • the first spectrum data are respectively mapped to the auditory critical bands, and a masking threshold is calculated for each sub-band of the auditory critical bands; the number of masking thresholds corresponds to the number of sub-bands.
  • a frequency point smaller than the masking threshold is selected as an embedding position of the target data; the target data is then quantized by a quantizer capable of blind detection of the quantized result, and the result of the quantization is used to assign (replace) the discrete Fourier coefficients at the embedding position, so that second spectral data corresponding to each of the foregoing first spectral data are obtained;
  • a second digital audio signal can be obtained by performing an inverse discrete Fourier transform on the plurality of second spectral data.
  • the target data described above is embedded in the newly obtained second digital audio signal.
  • the length of each audio frame and the size of the window may be determined by the relevant technician according to specific design requirements, and at least two schemes are available. One scheme resembles speech recognition practice, in which adjacent frames overlap; here the window length is generally 25 to 35 ms and the frame shift is 10 ms (it may of course be greater or less than 10 ms).
  • another scheme uses frames that do not overlap, with the window length specified directly as a number of time-domain sampling points, generally a power of two, $2^N$ (N is a positive integer); for example, 256 or 512 sampling points form one window of data.
  • mapping specifically refers to converting a linear frequency into a Bark domain frequency; for example, one available conversion formula is as follows:
  • f is the linear Hz frequency
  • z is the serial number of the Bark domain.
  • the above quantizer is unusable.
  • the quantizer used in the present invention has an adaptive step size and permits blind detection of the quantization result. Blind detection here has the sense it has for steganographic information: the secret data sequence quantized by such a quantizer is written into the carrier, and in the extraction (decoding) phase the written (embedded) data can be recovered from the quantized data by the quantizer without the participation of the original carrier data. Those skilled in the art may use any quantizer capable of blind detection of the quantized result, as long as it achieves the above effect.
  • by performing the above operations, the data information to be transmitted can be embedded in a first digital audio signal of a certain duration.
  • the specific improvements or additions of the present invention may be arbitrarily combined with one another on the basis of the first type of embodiment above, forming different specific technical solutions according to different design needs.
  • for quantizing the target data with the quantizer capable of blind detection of the quantized result, and using the result of the quantization to assign (replace) the discrete Fourier coefficients of the aforementioned embedding position, a preferred way is:
  • the target data is quantized by a quantizer capable of blind detection of the quantized result, and the discrete Fourier coefficients of the embedded position are assigned (replaced) by the result of the quantization process.
  • the advantage of this preferred solution is that the amount of embedded data adjusts automatically to the audio frame data at each embedding position: in an audio signal with more audio data and higher energy, the amount of embedded data is increased as far as the masking effect allows; in an audio signal with less audio data and lower energy (for example, a quiet passage), the amount of embedded data is correspondingly reduced to preserve the masking effect.
  • the process of calculating the embedded intensity coefficient from the energy value or power spectrum of the audio frame data is essentially calculating the quantization step size.
  • a non-uniform quantization step size can be adopted, with the step size adapting to the masking threshold of each frame, ensuring that the steganographic information cannot be heard.
  • the quantization step size representing the embedding strength can be calculated using the following formula:
  • Δ′ is the quantization step size representing the embedding strength
  • Δ is the base quantization step size
  • $LT_{\min}$ is the masking threshold of the audio frame in which the secret information is to be embedded.
  • Lb is the scaling factor for the quantization step increment, which is between 0 and 1, usually taking a value of 1.
  • the embedding position of the target data lies at a frequency point determined by the masking threshold; because the masking thresholds of the sub-bands of the critical bands usually differ, in the first embodiment of the present invention the frequency point corresponding to the smallest masking threshold of the sub-bands is preferably selected as the embedding position, and the target data is embedded there, so that the embedded target data is completely masked and cannot be heard.
  • the entire audio frequency range is 20 Hz to 20 kHz; in fact, not everyone can hear sound signals across the whole of that range.
  • when designing and manufacturing audio playback devices and systems, the industry often weakens or even filters out high-frequency audio signals and enhances low-frequency signals, in order to reduce the amount of transmitted data or improve the performance of the device or system; therefore, if the target data is embedded in the high frequency band under the first type of embodiment of the present invention, target data played through such systems or devices may be difficult to extract and recover, and sometimes may not be received at all. To solve this problem and ensure the robustness of the technical solution of the present invention, frequency points in the middle and low frequency bands are preferably used as the embedding positions of the target data.
  • the low frequency band in the present invention is 30 to 150 Hz
  • the medium and low frequency bands are 30 to 500 Hz
  • the medium and high frequency bands are 500 to 5000 Hz
  • the frequency range most suitable for embedding target data in the present invention is 30 to 4000 Hz.
  • of course, those skilled in the art can also select other frequency bands as the frequency range in which the target data is embedded, according to specific design requirements.
  • the essence of the technical solution of the present invention is to embed specific target data in the original digital audio signal, and the embedded target data can be regarded as a noise signal within the new digital audio signal obtained after embedding. When the noise signal is strong enough, it degrades the quality of the new digital audio signal and also affects the transmission and extraction of the target data. It is therefore necessary to evaluate the quality of the new digital audio signal obtained after embedding before deciding whether to use or output it.
  • the signal-to-noise ratio of the second digital audio signal may further be calculated, and the quality of the second digital audio signal after embedding the target data evaluated from the result. A calculated signal-to-noise ratio below a preset threshold (set by the relevant technician according to specific design requirements, for example 17 dB, 20 dB or 23 dB) indicates that the quality of the second digital audio signal does not meet the predetermined signal-to-noise ratio requirement.
  • in that case the embedding position of the target data, the Fourier coefficients and so on are re-determined, and the steps of the foregoing embodiments are re-executed until the signal-to-noise ratio of the resulting second digital audio signal reaches the predetermined requirement; the second digital audio signal that meets the signal-to-noise ratio requirement is then output.
  • the embedded target data is in fact a target data sequence formed by serially arranging one or more specific audio data and/or encoded data in a predetermined order.
  • the aforementioned specific audio data correspond to a specific loudness and/or a specific pitch and/or timbre; the aforementioned encoded data are numbers expressed in a computer numeral system.
  • a specific target data sequence may be composed solely of one or more specific audio data serially arranged in a predetermined order; may be composed solely of one or more specific encoded data serially arranged in a predetermined order; or may be constituted by interleaving one or more specific audio data with one or more specific encoded data according to a predetermined rule and serially arranging them in a predetermined order.
  • the advantage of a target data sequence composed solely of specific encoded data is that the target data can be embedded, received and extracted at high speed, which suits applications that need to transmit data frequently and quickly, such as live interaction.
  • a target data sequence may also be composed solely of one or more specific audio data serially arranged.
  • any particular audio data corresponds to a particular loudness and/or a particular pitch and/or timbre.
  • loudness, also known as volume, is the strength of a sound as perceived by the human ear, a subjective sense of how loud the sound is; its objective measure is the amplitude of the sound.
  • pitch is the height of the sound, which is determined by the vibration frequency. Therefore, the pitch is proportional to the vibration frequency.
  • timbre, also called tone quality, refers to the character of a sound as perceived by the listener; it is determined mainly by the spectrum of the sound, that is, by the composition of the fundamental and each harmonic.
  • a target data sequence may consist of a specified number of specific audio data; since any specific audio data can be determined by the loudness, pitch and timbre described above, every target data sequence composed of a predetermined number of specific audio data in the foregoing technical solutions is associated with an information codebook, which makes it possible to transmit data covering a larger information codebook.
  • different pitches have different frequency values; suppose n different frequency values are selected, the n pitches being denoted A, B, C, D, E, F, G, H, I, J; different loudnesses have different sound intensity values; suppose m different sound intensity values are selected, the m loudnesses being denoted a, b, c, d, e, f, g, h, ...; different timbres have different sound spectra; suppose k different sound spectra are selected, the k sound spectra being denoted 1, 2, 3, ..., k; on this basis, any audio data can be described in the following form:
  • the information codebook capacity W of any one of the audio data in the present invention can be calculated by the following formula:
  • a unit audio group is simply composed of five audio data; the information codebook capacity of any unit audio data group is calculated by:
  • the value of W is: $2^{30} \times 10^{5} > 10^{14}$
  • n, m and k are all natural numbers, and the relevant skilled person can select or determine according to the required codebook capacity when implementing the present invention.
  • a target data sequence can be constructed in a completely single target data form, for example, simply using audio data or simply using the encoded data to construct a target data sequence. However, in some cases, it may be necessary to construct a target data sequence using a mixture of audio data and encoded data.
  • even when a target data sequence is a mixture of audio data and encoded data, as long as it is used in a completely closed information system, any target data sequence can be constructed in an agreed manner without inserting an identification data sequence; in an open information system, by contrast, an identification data sequence is almost indispensable. Whether to use an identification data sequence should therefore be decided by the relevant technical personnel according to specific needs when designing the system.
  • the identification data sequence is preferably constructed using encoded data.
  • the relevant technician can also choose, according to specific design requirements, to form the identification data sequence from audio data or from a combination of audio data and encoded data.
  • an important advantage of the present invention is that, because the target data sequence is inserted at a position below the masking threshold of the digital audio signal, the masking effect ensures that the inserted audio signal sequence is not perceived by the human ear when the digital audio signal carrying the target data sequence is played.
  • because the present invention forms the audio data sequence from audio signals of several dimensions (loudness, pitch and timbre), the information codebook has a large capacity, and a limited amount of audio data can deliver sufficient information.
  • the present invention also provides the following technical solutions:
  • the received digital audio signal is framed into a plurality of audio frame data and windowed; a frequency-domain discrete Fourier transform is performed on the plurality of audio frame data to obtain a plurality of spectrum data respectively corresponding to the audio frame data;
  • a one-dimensional data sequence embedded in the spectrum data is extracted (for the embedding, see the digital audio signal processing of the present invention described above), wherein the target data sequence is formed by serially arranging one or more specific audio data and/or encoded data in a predetermined order, and the specific audio data correspond to a particular loudness and/or a particular pitch and/or timbre.
  • a corresponding one-dimensional data sequence can be extracted from a digital audio signal embedded with a target data sequence.
  • when the one-dimensional data sequence is composed of audio data, or of a mixture of audio data and encoded data, or when the digital audio signal is transmitted in an open information system, a predetermined identification data sequence must be found in the extracted one-dimensional data sequence; following the indication of the identification data sequence, pattern recognition is performed on the audio data at the corresponding positions of the extracted one-dimensional data sequence, finally yielding the corresponding target data sequence.
  • in some cases obtaining the target data sequence means obtaining the actual information, for example when the target data sequence is composed only of encoded data; in other cases, for example when the target data sequence is composed of audio data or of a mixture of audio data and encoded data, even after the target data sequence has been extracted by pattern recognition according to the identification data sequence, it may still be necessary to transform the target data sequence with a predetermined coding table to obtain the target data embedded in the digital audio signal.
  • the one-dimensional data sequence or the target data sequence may be obtained by using a receiving device, such as a mobile phone, a smart device having a microphone and audio processing capability, and the like.
  • the server side completes the search for the predetermined identification data sequence, extracts the target data sequence by pattern recognition according to the indication of the identification data sequence, transforms the target data sequence using a predetermined coding table, and thereby obtains the target data embedded in the aforementioned digital audio signal.
  • a specific application example: after the target data sequence embedded in the digital audio signal has been extracted by the above embodiments, if the target data sequence is composed solely of audio data, the specific audio data in it and their combinations can be matched against codes, that is, the data information corresponding to the audio signal sequence can be looked up in a predetermined coding table.
  • the predetermined coding table generally includes at least the following one-to-one correspondence: an audio data sequence, for example one composed of loudness, pitch and timbre as described above, and the specific information corresponding to it.
  • transmitting information in this way somewhat resembles the use of a telegraph code; however, as described above, if the information codebook capacity is sufficiently large, the method of the present invention can dispense with such a code and transmit the data directly.
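
The definitions above describe mapping a linear frequency f (in Hz) to a Bark-domain index z, but the conversion formula itself did not survive in this text. The following is a minimal sketch that assumes the widely used Zwicker-Terhardt approximation as a stand-in; it is not necessarily the formula the applicants used. The names `hz_to_bark` and `critical_band_index`, the 44.1 kHz sampling rate and the 512-point frame are illustrative assumptions.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed stand-in)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def critical_band_index(f_hz):
    """Integer critical-band (sub-band) number for each frequency."""
    return np.floor(hz_to_bark(f_hz)).astype(int)


# Map the DFT bins of one 512-point frame (assumed 44.1 kHz audio) to sub-bands.
sample_rate, frame_len = 44100, 512
bin_freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
bands = critical_band_index(bin_freqs)
print(bands.min(), bands.max())
```

Grouping bins by `bands` gives the sub-bands for which one masking threshold each would then be computed.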
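
The quality check described in the definitions above treats the embedded data as noise added to the original signal and compares the resulting signal-to-noise ratio with a preset threshold such as 17, 20 or 23 dB. A minimal sketch of that check follows, assuming the usual power-ratio definition of SNR; `snr_db`, `meets_quality` and the test tones are hypothetical illustrations, not taken from the source.

```python
import numpy as np


def snr_db(original, embedded):
    """SNR in dB, treating (embedded - original) as the noise signal."""
    original = np.asarray(original, dtype=float)
    embedded = np.asarray(embedded, dtype=float)
    noise_power = np.sum((embedded - original) ** 2)
    if noise_power == 0.0:
        return float("inf")  # nothing was changed
    return 10.0 * np.log10(np.sum(original ** 2) / noise_power)


def meets_quality(original, embedded, threshold_db=20.0):
    """True when the embedded signal satisfies the preset SNR threshold."""
    return snr_db(original, embedded) >= threshold_db


# Illustrative signals: a 440 Hz tone plus a weak 1 kHz disturbance.
t = np.arange(1024) / 8000.0
original = np.sin(2 * np.pi * 440 * t)
embedded = original + 0.01 * np.sin(2 * np.pi * 1000 * t)
print(round(snr_db(original, embedded), 1), meets_quality(original, embedded))
```

When `meets_quality` returns False, the definitions call for re-choosing the embedding positions and coefficients and repeating the embedding.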
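
The formulas for the information codebook capacity W are missing from the definitions above; only the value $2^{30} \times 10^{5} > 10^{14}$ for a group of five audio data survives. The sketch below assumes that one audio datum is a (pitch, loudness, timbre) triple, so that its capacity is n·m·k, and that a group of five data has capacity (n·m·k)^5. The values n = 10, m = 8 and k = 8 are assumptions chosen because they reproduce the surviving figure; the source does not confirm them.

```python
def datum_capacity(n, m, k):
    """Distinct (pitch, loudness, timbre) combinations for one audio datum."""
    return n * m * k


def group_capacity(n, m, k, group_size=5):
    """Distinct sequences of `group_size` audio data (assumed independent)."""
    return datum_capacity(n, m, k) ** group_size


# Assumed values: 10 pitches, 8 loudness levels, 8 timbres.
n, m, k = 10, 8, 8
w = group_capacity(n, m, k)
print(w, w == 2**30 * 10**5, w > 10**14)
```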

Abstract

A method for processing a digital audio signal. By embedding the other contents in a certain form into a digital audio signal, the purpose of secretly delivering digital information is realized; and the digital audio signal is enabled to carry pre-determined data mainly using the masking effect of a human auditory system. By means of the method, data that needs to be delivered can be embedded at a suitable position of a digital audio signal. When the digital audio signal is played, the audio signal, for expressing relevant data information, embedded at an embedding position can be masked, so that the audio signal is not perceived by a human ear but can be received by a device with an audio signal processing capability.

Description

Digital audio signal processing method
Technical field
The present invention relates to a digital audio signal processing technique, and more particularly to a method for digital audio signal processing based on psychoacoustics and using the masking effect.
Background art
Using digital audio signals to carry information is a technology that has attracted wide attention in the industry, with considerable manpower and funding invested in its research and development. With such technology, people can listen to music and watch television programs as usual while using a device with audio signal processing capability, such as a mobile communication terminal, to obtain the data information carried in that music or television program. An important criterion for judging whether such technology is mature and fit for use is that it must ensure both that the carried data can be accurately captured and transmitted, and that playback of the digital audio signal itself produces no interfering sound or noise that humans can perceive.
Chinese Patent Application No. 201410301832.7 discloses such a technique: the digital information to be transmitted is encoded and modulated to form a sound encoded signal, and the sound encoded signal is mixed with the audio signal of a preselected audio or video program and then output. Although this technique can add the "digital information to be transmitted" to the normal sound by mixing, that information is unpredictable, so the sound encoded signal formed from it by coded modulation will in a considerable number of cases be heard as noise, and in other cases as other sounds that interfere with the normally played audio. To avoid such problems, the specification of the above patent application proposes the following improvement:
"The digital information to be transmitted is encoded and modulated to form a sound encoded signal. The sound encoded signal can be written as a digital sound signal file, or converted into a sound analog signal by a digital-to-analog converter. The frequency of the sound analog signal can be chosen to lie in the band above 18 kHz and below 20 kHz, which the human ear has difficulty detecting and which does not affect normal playback of the original television sound or music signal. Because the user's local receiving device must receive and extract the digital information in subsequent steps, the sound coded information needs a distinguishing characteristic: its signal energy is distributed only within a certain frequency range, above 18 kHz and below 20 kHz."
Clearly, to keep the human ear from perceiving the sound code formed from the "digital information to be transmitted", the above scheme must confine the energy of this sound coded information to the frequency range of 18 kHz to 20 kHz.
It is well known that the full range of sound audible to the human ear is 20 Hz to 20 kHz. Adults with good hearing can usually hear frequencies between 30 Hz and 16 kHz, and older people with poorer hearing usually between 50 Hz and 10 kHz. Children, however, can generally hear higher frequencies, and many children can hear sound in the 18 kHz to 20 kHz range used in the above technical solution. Therefore, even if the energy of the sound coded information is confined to 18 kHz to 20 kHz, a considerable number of people, especially children, can still hear it; when listening to television or radio programs that carry sound coding produced with this technique, they will still be troubled by noise or interfering sound.
On the other hand, although the energy of the sound coded information could be placed outside the range audible to the human ear (20 Hz to 20 kHz), the frequency response of the vast majority of audio equipment is designed around the audible range, and audio signals outside 20 Hz to 20 kHz are generally filtered out as noise. The sound coded information, even if mixed into the normal audio signal, would therefore not be played by the audio equipment and could not be picked up by a receiving device.
In summary, the techniques described above are clearly immature and therefore unlikely to be widely applied.
Summary of the invention
An object of the present invention is to provide a method for digital audio signal processing that uses psychoacoustic principles to process a digital audio signal and embeds the information to be transmitted into it as specific target data, so that when the digital audio signal is played by audio equipment the embedded target data is played along with it and, without being perceived by the human ear, can be received and extracted by a device with audio signal processing capability.
This object of the present invention is achieved by the following technical solution:
The first digital audio signal is divided into a plurality of audio frame data and windowed; a frequency-domain discrete Fourier transform is performed on each of the audio frame data to obtain a plurality of first spectral data respectively corresponding to the audio frame data;
the plurality of first spectral data are mapped to the auditory critical bands (Bark domain), and a masking threshold is calculated for each sub-band of the auditory critical bands, the masking thresholds corresponding one-to-one with the sub-bands;
frequency points below the masking threshold are selected from the plurality of first spectral data as embedding positions;
the target data is quantized by a quantizer that permits blind detection of the quantization result, and the result of the quantization is assigned to the discrete Fourier coefficients at the embedding positions, thereby obtaining a plurality of second spectral data corresponding to the plurality of first spectral data;
an inverse discrete Fourier transform is performed on the plurality of second spectral data to obtain a second digital audio signal.
With the above method, the target data to be transmitted can be embedded at suitable positions in the first digital audio signal in accordance with psychoacoustic principles. When the first digital audio signal is played, the signals embedded at the embedding positions to represent the target data are masked and are not perceived by the human ear, yet they can be picked up and restored by a device with audio signal processing capability.
Another object of the present invention is to provide a method for extracting data from a digital audio signal. With this method, a digital audio signal received while it is being played by audio equipment can be processed, and the target data embedded in it can be extracted using psychoacoustic principles.
Framing the received first digital audio signal into a plurality of audio frame data and applying a window to each; performing a discrete Fourier transform on the plurality of audio frame data to obtain a plurality of first spectral data respectively corresponding to the plurality of audio frame data;
Mapping the plurality of first spectral data to the auditory critical bands and calculating a masking threshold for each subband of the critical bands, the masking thresholds being in one-to-one correspondence with the subbands;
Selecting, in the plurality of first spectral data, frequency points that are below the corresponding masking threshold as embedding positions;
Inverse-quantizing the discrete Fourier coefficients at the embedding positions with a quantizer whose quantization results can be blindly detected, to obtain the target data sequence embedded in the first digital audio signal; wherein the target data sequence consists of one or more specific audio data and/or coded data arranged serially in a predetermined order, and the specific audio frequency-domain signals correspond to a specific loudness and/or a specific pitch and/or a specific timbre.
With the above method, when the first digital audio signal is received, the target data sequence that it carries by means of the masking effect can be extracted using psychoacoustic principles, and the corresponding target data can then be recovered. Throughout this process, although the embedded target data sequence is played by the audio equipment together with the digital audio signal, it is not perceived by the human ear.
Detailed description
In a first class of embodiments of the present invention, target data needs to be embedded into a target digital audio signal.
To embed the target data in a digital audio signal, the signal is first framed into a plurality of audio frame data, and each audio frame is then windowed. A discrete Fourier transform is then performed on each windowed audio frame, yielding a plurality of first spectral data in one-to-one correspondence with the audio frames.
After the plurality of first spectral data has been obtained, each is mapped to the auditory critical bands, and the masking threshold of each subband of the critical bands is calculated; the number of masking thresholds corresponds to the number of subbands of the critical bands.
In each of the plurality of first spectral data, the frequency points that are below the masking threshold are selected as embedding positions for the target data. The target data is then quantized with a quantizer whose quantization results can be blindly detected, and the result of the quantization is assigned to (replaces) the discrete Fourier coefficients at the embedding positions, so that second spectral data corresponding to each of the first spectral data is obtained.
Performing an inverse discrete Fourier transform on the plurality of second spectral data yields the second digital audio signal. This newly obtained second digital audio signal has the target data embedded in it.
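The per-frame flow described above (transform, pick the bins below the threshold, overwrite their coefficients, transform back) can be sketched as follows. This is only an illustration under stated assumptions: the per-bin `thresholds` list and the already-quantized `values` are hypothetical inputs, the naive `dft`/`idft_real` helpers stand in for whatever transform an implementation would use, and keeping the original phase is a choice made here, not something the text specifies.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spec):
    """Inverse DFT, keeping the real part (valid for conjugate-symmetric spectra)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def embed_frame(frame, thresholds, values):
    """Write quantized magnitudes into bins whose magnitude is below threshold.

    thresholds[k] is an assumed per-bin masking threshold on the DFT magnitude;
    values are already-quantized, non-negative magnitudes (hypothetical inputs).
    Returns the modified frame and the list of bins that were used.
    """
    n = len(frame)
    spec = dft(frame)
    pending = list(values)
    used = []
    for k in range(1, n // 2):                  # skip DC and Nyquist
        if not pending:
            break
        if abs(spec[k]) < thresholds[k]:
            phase = cmath.phase(spec[k])        # keep the original phase (assumption)
            spec[k] = cmath.rect(pending.pop(0), phase)
            spec[n - k] = spec[k].conjugate()   # keep the time-domain signal real
            used.append(k)
    return idft_real(spec), used

# Example: a 16-sample tone in bin 2; every other bin is nearly empty.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
new_frame, used = embed_frame(frame, [1.0] * 16, [0.25, 0.0, 0.25])
```

In this toy example the strong tone in bin 2 is left untouched and the three values land in the quiet bins around it.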
It should be noted that when the first digital audio signal is framed and windowed, the length of each audio frame and the size of the window can be determined by the practitioner according to the specific design requirements; at least two schemes are available. One scheme is similar to that used in speech recognition, in which consecutive frames overlap; here the window length is typically 25 to 35 ms and the frame shift is 10 ms (it may of course be larger or smaller than 10 ms). The other scheme uses frames that do not overlap, with the window length specified directly as a number of time-domain samples, generally 2 to the power N (N being a positive integer); for example, one window of data may contain 256 or 512 samples.
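The two framing schemes can be illustrated with a short sketch. The 16 kHz sampling rate and the Hann window are assumptions chosen for the example; the text leaves both to the practitioner.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into frames of frame_len, advancing by hop samples."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames

def hann_window(n):
    """Periodic Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [s * w for s, w in zip(frame, window)]

sample_rate = 16000  # assumed sampling rate in Hz
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

# Scheme 1: overlapping frames, 25 ms window (400 samples), 10 ms shift (160 samples)
overlapped = frame_signal(signal, frame_len=400, hop=160)

# Scheme 2: non-overlapping frames of 512 samples (2 to the power 9)
blocks = frame_signal(signal, frame_len=512, hop=512)

windowed = [apply_window(f, hann_window(len(f))) for f in blocks]
```

Trailing samples that do not fill a whole frame are simply dropped here; padding the last frame would be an equally reasonable choice.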
In addition, the "mapping" mentioned above refers specifically to converting a linear frequency into a Bark-domain frequency; for example, one usable conversion formula is:
$$z = 13\arctan(0.00076f) + 3.5\arctan\left[(f/7500)^2\right]$$
where f is the linear frequency in Hz, and z rounded to an integer is the index of the Bark band.
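As a minimal sketch, the conversion formula can be written directly in code. The function names are hypothetical, and truncating z toward zero is one assumed reading of "rounded to an integer".

```python
import math

def hz_to_bark(f_hz):
    """Bark-domain value z for a linear frequency f in Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_band_index(f_hz):
    """Band index obtained by truncating z (assumed rounding rule)."""
    return int(hz_to_bark(f_hz))

z_1k = hz_to_bark(1000.0)        # roughly 8.5 Bark
band_1k = bark_band_index(1000.0)
```

The mapping is monotonic, so adjacent DFT bins fall into the same or neighbouring bands.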
For the correspondence between linear frequency in Hz and the Bark domain, see Zwicker, E., "Subdivision of the Audible Frequency Range into Critical Bands", The Journal of the Acoustical Society of America, Vol. 33, No. 2, p. 248, and Traunmüller, H. (1990), "Analytical expressions for the tonotopic sensory scale", in the same journal, Vol. 88, p. 97.
As is well known, when a signal x passes through a quantizer Q, it is quantized to a quantization level y, that is, y = Q(x); conversely, obtaining a signal x' from the quantization level y is inverse quantization, that is, x' = Q^{-1}(y). Because of quantization error, the signal x and the signal x' cannot be exactly identical.
An ordinary quantizer of this kind cannot be used in the present invention. The quantizer used here has an adaptive step size and allows its quantization results to be blindly detected. This refers to blind detection of steganographic information: once a hidden data sequence quantized by such a quantizer has been written into the carrier, the written (embedded) data can be extracted from the stego data in the extraction (decoding) stage by the same quantizer, without the original carrier data. For those skilled in the art, any quantizer that achieves this effect can be used.
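The text does not name a particular quantizer. One well-known family with the blind-detection property is quantization index modulation (QIM), shown here purely as an assumed example rather than as the method the inventors used. A bit is embedded by snapping a coefficient magnitude to one of two interleaved lattices, and the decoder recovers the bit from the received value and the step size alone.

```python
def qim_embed(coeff, bit, step):
    """Snap coeff to the lattice for `bit`: multiples of step, shifted by step/2 for bit 1."""
    offset = bit * step / 2.0
    return step * round((coeff - offset) / step) + offset

def qim_extract(coeff, step):
    """Recover the bit from the nearest half-step lattice point; no original needed."""
    return int(round(coeff / (step / 2.0))) % 2

step = 1.0
marked_one = qim_embed(3.7, 1, step)    # lands on 3.5
marked_zero = qim_embed(3.7, 0, step)   # lands on 4.0
bits = (qim_extract(marked_one, step), qim_extract(marked_zero, step))
```

Extraction tolerates perturbations smaller than a quarter of the step, which is why a larger step gives more robustness at the cost of a larger change to the carrier.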
With this class of embodiments, performing the above operations on every audio frame of the first digital audio signal makes it possible to embed the data to be transmitted in a first digital audio signal of a given duration.
Beyond the first class of embodiments, each of the specific improvements or additions described below can be combined freely with the first class of embodiments and with one another, so that different design requirements lead to different specific technical solutions.
In the above class of embodiments, a preferred way of quantizing the target data with a quantizer whose quantization results can be blindly detected, and assigning the result to (replacing) the discrete Fourier coefficients at the embedding positions, is as follows:
For a given embedding position, an embedding strength coefficient at that position is calculated from the energy value or power spectrum parameters of the audio frame data at that position; this embedding strength coefficient determines the amount of target data that can be embedded in the corresponding audio frame data;
Using the embedding strength coefficient calculated in the preceding step, the target data is quantized with a quantizer whose quantization results can be blindly detected, and the result of the quantization is assigned to (replaces) the discrete Fourier coefficients at the embedding position.
The advantage of this preferred approach is that the amount of embedded data can be adjusted automatically according to the signal in the audio frame at each embedding position. For example, in audio with rich content and high energy, the amount of embedded data can be increased as far as possible while still ensuring masking; in audio with little content and low energy (for example, silence), the amount of embedded data can be reduced accordingly to ensure masking.
Calculating the embedding strength coefficient from the energy value or power spectrum of the audio frame data is essentially a matter of calculating the quantization step size. In the present invention, to better achieve imperceptibility of the stego audio through auditory masking, a non-uniform quantization step size can be used: the step size adapts to the masking threshold of each frame, ensuring that the steganographic information cannot be heard. In one class of embodiments, the quantization step size representing the embedding strength can be calculated with the following formula:
$$\Delta' = \Delta + lb \cdot LT_{\min} / 50$$
where Δ′ is the quantization step size for the embedding strength, Δ is the base quantization step size, and LT_min is the masking threshold of the audio frame into which the hidden information is to be embedded. Clearly, the larger the masking threshold, the larger the quantization step size that can be used. lb is a scaling factor applied to the step size increment; it takes a value between 0 and 1 and is usually set to 1.
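A small worked example of the step size formula follows. The numeric inputs are invented for illustration; the text does not state the units of the masking threshold, so the values below should not be read as typical.

```python
def adaptive_step(base_step, lt_min, lb=1.0):
    """Quantization step adapted to the frame's masking threshold.

    base_step is the base step, lt_min the frame's masking threshold,
    and lb a scaling factor in [0, 1] applied to the increment.
    """
    if not 0.0 <= lb <= 1.0:
        raise ValueError("lb must lie between 0 and 1")
    return base_step + lb * lt_min / 50.0

step_full = adaptive_step(0.5, 40.0)           # 0.5 + 1.0 * 40 / 50
step_half = adaptive_step(0.5, 40.0, lb=0.5)   # 0.5 + 0.5 * 40 / 50
```

With lb set to 0 the step falls back to the base step, so lb controls how strongly the masking threshold influences the embedding strength.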
Although the embedding positions of the target data all lie at frequency points corresponding to the masking thresholds, the masking thresholds of the individual subbands of the critical bands usually differ. To ensure that the embedded target data is masked completely and cannot be heard by a human listener, a preferred class of embodiments builds on the first class of embodiments by selecting the frequency point corresponding to the smallest masking threshold among the subbands as the embedding position, and embedding the target data at that position.
As is well known, the full audio frequency range for humans is 20 Hz to 20 kHz; in practice, not everyone can hear sound across this entire range. For this reason, when designing and manufacturing audio playback devices and systems, the industry often attenuates or even filters out high-band audio and boosts the low and middle bands, in order to reduce the amount of data transmitted, improve device or system performance, and so on. Consequently, if the target data is embedded in the high band under the first class of embodiments, the data may be difficult to extract and recover when the audio is played on such systems or devices, and sometimes it may not be received at all. To solve this problem and ensure the robustness of the technical solution, frequency points in the middle and low bands are preferably used as embedding positions for the target data, on the basis of the embodiments described above.
Specifically, in the present invention the low band is 30 to 150 Hz, the low-mid band is 30 to 500 Hz, and the mid-high band is 500 to 5000 Hz. Overall, 30 to 4000 Hz is the most preferred frequency range for embedding target data. Those skilled in the art may of course choose other bands as the embedding range according to the specific design requirements.
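To restrict embedding to the preferred 30 to 4000 Hz range, the candidate DFT bins can be filtered by their centre frequency. The sketch below assumes a 512-point frame at 44.1 kHz; both numbers and the function name are illustrative only.

```python
def bins_in_band(frame_len, sample_rate, f_lo=30.0, f_hi=4000.0):
    """Indices of non-negative-frequency DFT bins whose centre lies in [f_lo, f_hi]."""
    bin_hz = sample_rate / frame_len          # spacing between adjacent bins
    return [k for k in range(frame_len // 2 + 1) if f_lo <= k * bin_hz <= f_hi]

candidates = bins_in_band(frame_len=512, sample_rate=44100)
```

Note that short frames give coarse frequency resolution: with these settings each bin is about 86 Hz wide, so only a few bins fall in the 30 to 150 Hz low band.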
The schemes described above achieve the basic object of the present invention. In some situations, however, the following measure is also needed to optimize the scheme further. In essence, the technical solution embeds specific target data in the original digital audio signal, and the embedded target data can be regarded as a noise signal in the new digital audio signal obtained after embedding. As is well known, when a noise signal is strong enough it degrades the quality of the new digital audio signal and also affects the transmission and extraction of the target data. It is therefore necessary to evaluate the quality of the new digital audio signal obtained after embedding before deciding whether to use and output it.
To this end, once the second digital audio signal has been obtained under any of the embodiments above, its signal-to-noise ratio can additionally be calculated, and the result used to evaluate the quality of the second digital audio signal after embedding. If the calculated signal-to-noise ratio is below a preset threshold (which the practitioner can set according to the specific design requirements, for example 17 dB, 20 dB or 23 dB), the quality of the second digital audio signal does not meet the required signal-to-noise ratio. In that case, parameters such as the embedding positions of the target data and the Fourier coefficients can be determined again and the steps of the embodiments repeated, until the signal-to-noise ratio of the resulting second digital audio signal meets the requirement; that second digital audio signal is then output.
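The quality check can be sketched as a plain signal-to-noise computation in which the "noise" is the difference between the original and the embedded signal. The text does not define the SNR formula, so the power-ratio definition below is an assumption, as are the function names.

```python
import math

def snr_db(original, modified):
    """SNR in dB, treating (original - modified) as the noise introduced by embedding."""
    signal_power = sum(s * s for s in original)
    noise_power = sum((s - m) ** 2 for s, m in zip(original, modified))
    if noise_power == 0:
        return math.inf                      # nothing was changed
    return 10.0 * math.log10(signal_power / noise_power)

def meets_requirement(original, modified, min_snr_db=20.0):
    """True if the embedded signal is good enough to output."""
    return snr_db(original, modified) >= min_snr_db

original = [1.0, 1.0, 1.0, 1.0]
modified = [1.1, 0.9, 1.1, 0.9]
value = snr_db(original, modified)           # close to 20 dB
```

If the check fails, the embedding would be repeated with different positions or a smaller step before the signal is output.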
In all of the embodiments above, the embedded target data is in fact a target data sequence formed by arranging one or more specific audio data and/or coded data serially in a predetermined order. Specifically, the specific audio data corresponds to a specific loudness and/or a specific pitch and/or a specific timbre, while the coded data is a number expressed in a computer number representation. A given target data sequence may consist solely of one or more specific audio data arranged serially in a predetermined order; or solely of one or more specific coded data arranged serially in a predetermined order; or, according to a predetermined rule, of one or more specific audio data interleaved with one or more specific coded data and arranged serially in a predetermined order.
A target data sequence consisting solely of coded data has the advantage that the target data can be embedded, received and extracted at high speed, which suits applications that need frequent and fast data transfer, such as live interactive broadcasts.
Where the application is not sensitive to the timeliness and speed of data transfer but needs to transfer a larger amount of data, a target data sequence consisting solely of specific audio data is more appropriate.
In the embodiments of the present invention, it is preferred that each specific audio data corresponds to a specific loudness and/or a specific pitch and/or a specific timbre. Loudness, also called volume, is the strength of a sound as perceived by the human ear; it is a subjective measure of how loud a sound is, and its objective counterpart is the amplitude of the sound. Pitch is how high a sound is; it is determined by the vibration frequency, and a higher frequency gives a higher pitch. Timbre, also called tone quality, is the character of a sound as perceived by hearing; it is determined mainly by the spectrum of the sound, that is, the composition of the fundamental and its harmonics.
In the embodiments above, a target data sequence can be made to contain a prescribed number of specific audio data. Because any given audio data can be characterized by loudness, pitch and timbre, every target data sequence consisting of a prescribed number of specific audio data can be made to correspond to an entry in an information codebook, which allows data drawn from a large codebook to be conveyed.
For example, different pitches have different frequency values; suppose n different frequency values are chosen, and denote these n pitches by A, B, C, D, E, F, G, H, I, J, and so on. Different loudnesses have different sound intensity values; suppose m different sound intensity values are chosen, and denote these m loudnesses by a, b, c, d, e, f, g, h, and so on. Different timbres have different sound spectra; suppose k different sound spectra are chosen, and denote these k spectra by 1, 2, 3, ..., k. On this basis, any audio data can be described in the following form:
[Formula image PCTCN2016087445-appb-000001, not reproduced]
where X is the pitch, of which there are n; Y is the loudness, of which there are m; and Z is the timbre, of which there are k.
The information codebook capacity W of any single audio data in the present invention can therefore be calculated as:
$$W = n \times m \times k$$
Suppose that, in a target data sequence of the present invention, a unit audio data group consists solely of 5 audio data; the information codebook capacity of any unit audio data group is then:
$$W = (n \times m \times k)^5$$
When n = 10, m = 8 and k = 8,
$$W = 2^{30} \times 10^5 > 10^{14}$$
The values of n, m and k are of course natural numbers, and the practitioner can choose them according to the codebook capacity required.
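The capacity arithmetic can be checked with exact integer arithmetic. The helper name and its `group_size` parameter are introduced here for illustration only.

```python
def codebook_capacity(n, m, k, group_size=1):
    """Number of distinct messages for group_size audio data, each drawn from
    n pitches, m loudness levels and k timbres."""
    return (n * m * k) ** group_size

single = codebook_capacity(10, 8, 8)                  # one audio datum: 640 combinations
group = codebook_capacity(10, 8, 8, group_size=5)     # five audio data per group
```

Since 640 equals 2 to the power 6 times 10, raising it to the fifth power gives exactly 2 to the power 30 times 10 to the power 5, which is slightly above 10 to the power 14.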
As described above, in the various embodiments a target data sequence can be built from a single form of target data, for example using only audio data or only coded data. In some cases, however, a target data sequence may need to be built from a mixture of audio data and coded data. So that the data can be extracted from the first digital audio signal by the correct means on reception, a predetermined identification data sequence needs to be inserted at a predetermined position in the target data sequence. Once the receiving device has parsed and recognized the identification data sequence, it can follow the indication of that sequence and apply the corresponding recognition scheme to extract the associated data; for example, it can use a pattern recognition scheme to recognize the audio data in the target data sequence.
Of course, even when a target data sequence is a mixture of audio data and coded data, if it is used within a completely closed information system, any target data sequence can be built in a mutually agreed manner without inserting any identification data sequence. In an open information system, by contrast, an identification data sequence is almost indispensable. Whether to use an identification data sequence should therefore be decided by the practitioner according to the specific requirements when designing the system.
In the various embodiments above, if an identification data sequence is used, it is preferably built from coded data. The practitioner may, however, also choose to build it from audio data, or from a combination of audio data and coded data, according to the specific design requirements.
In summary, an important advantage of the present invention is that, because the target data sequence is inserted at positions below the masking threshold of the digital audio signal, the inserted sequence is not perceived by the human ear when the digital audio signal is played after insertion, owing to the masking effect.
In addition, because the present invention builds audio data sequences from several dimensions of the audio signal (loudness, pitch and timbre), the information codebook can be very large, so that a limited amount of audio data can convey a large amount of information.
To receive and obtain a target data sequence embedded in a digital audio signal by the schemes described above, the present invention also provides the following technical solutions:
When a device (for example, a mobile phone or a smart device with a microphone and audio processing capability) receives a digital audio signal in which an audio signal sequence is embedded, the received digital audio signal is framed into a plurality of audio frame data and windowed; a discrete Fourier transform is performed on the plurality of audio frame data to obtain a plurality of spectral data respectively corresponding to the audio frames;
The spectral data is mapped to the auditory critical bands (Bark domain), and the masking threshold of each subband of the critical bands is calculated; the masking thresholds are in one-to-one correspondence with the subbands;
In the plurality of spectral data, frequency points that are below the masking threshold are selected as embedding positions; the discrete Fourier coefficients at the embedding positions are inverse-quantized with a quantizer whose quantization results can be blindly detected, yielding the one-dimensional data sequence embedded in the digital audio signal. As set out in the embodiments of the digital audio signal processing method above, the target data sequence consists of one or more specific audio data and/or coded data arranged serially in a predetermined order, and the specific audio frequency-domain signals correspond to a specific loudness and/or a specific pitch and/or a specific timbre.
采用本发明上述从数字音频信号中提取数据的具体实施方式,能够从嵌入有目标数据序列的数字音频信号提取到相应的一维数据序列。但是,如前所述:当一维数据序列是由音频数据构成,或者由音频数据和编码数据混合构成时;或者,这个数字音频信号是在一个开放的信息体系中传递时,需要在提取到的一维数据序列中查找预定的标识数据序列,并且根据这些标识数据序列的指示,对提取到的一维数据序列中与这些标识数据序列相关位置的音频数据进行模式识别,最终获得相应的目标数据序列。With the above-described embodiment of extracting data from a digital audio signal of the present invention, a corresponding one-dimensional data sequence can be extracted from a digital audio signal embedded with a target data sequence. However, as mentioned before: when the one-dimensional data sequence is composed of audio data, or is composed of a mixture of audio data and encoded data; or, when the digital audio signal is transmitted in an open information system, it needs to be extracted Finding a predetermined identification data sequence in the one-dimensional data sequence, and performing pattern recognition on the extracted audio data of the position corresponding to the identification data sequence in the extracted one-dimensional data sequence according to the indication of the identification data sequence, and finally obtaining the corresponding target Data sequence.
In some cases, obtaining the target data sequence means that the actual information has been obtained, for example when the target data sequence consists only of encoded data. In other cases, for example when the target data sequence consists of audio data or of a mixture of audio data and encoded data, the target data sequence extracted by pattern recognition under the guidance of the identification data sequence may still need to be transformed with a predetermined coding table to recover the target data embedded in the digital audio signal.
Of course, in the present invention, once the one-dimensional data sequence or the target data sequence has been obtained, a receiving device (for example, a mobile phone or a smart device with a microphone and audio processing capability) may send it to a server. The server then carries out the remaining operations: locating the predetermined identification data sequence, extracting the target data sequence by pattern recognition as directed by that identification data sequence, and transforming the target data sequence with the predetermined coding table to obtain the target data embedded in the digital audio signal.
A specific application example is as follows. After the target data sequence embedded in the digital audio signal has been extracted using the embodiments above, if that target data sequence consists purely of audio data, the individual specific audio data items and their combinations can be matched against codes; that is, the data information corresponding to the audio signal sequence can be looked up in the predetermined coding table.
The predetermined coding table typically contains at least the following information in one-to-one correspondence: audio data sequences and the specific information corresponding to each. For example, following the earlier example of an audio data sequence composed of loudness, pitch, and timbre, an audio data sequence of a specified length may correspond to the letter "A", to the word "energy", to the phrase "spectral data", to an item such as "mobile phone", to a web page address such as "www.baidu.com", and so on. Conveying information this way is somewhat similar to a telegraph code. However, as noted earlier, if the codebook capacity is large enough, the method of the present invention is no longer confined to the telegraph-code approach and can convey data directly.
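A coding table of this kind can be sketched as a dictionary keyed by audio data sequences. The representation of each audio data item as a `(loudness, pitch, timbre)` tuple, the concrete values, and the names `CODING_TABLE`, `decode`, and `encode` are all hypothetical; the patent only requires that sequences and information correspond one to one.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical item encoding: (loudness level, pitch number, timbre label).
Note = Tuple[int, int, str]

CODING_TABLE: Dict[Tuple[Note, ...], str] = {
    ((60, 69, "piano"), (60, 72, "piano")): "A",
    ((70, 64, "flute"), (70, 67, "flute")): "energy",
    ((60, 60, "piano"), (70, 64, "flute")): "www.baidu.com",
}

# Reverse mapping used on the embedding side.
REVERSE_TABLE: Dict[str, Tuple[Note, ...]] = {
    info: seq for seq, info in CODING_TABLE.items()
}

# The correspondence must be one to one, so no entry may be lost on inversion.
assert len(REVERSE_TABLE) == len(CODING_TABLE)


def decode(sequence: Sequence[Note]) -> Optional[str]:
    """Look up the information for a recognised audio data sequence."""
    return CODING_TABLE.get(tuple(sequence))


def encode(info: str) -> Optional[Tuple[Note, ...]]:
    """Look up the audio data sequence that represents `info`."""
    return REVERSE_TABLE.get(info)
```

A sequence that is not in the table yields `None`, which a receiver could treat as a recognition failure.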
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

  1. A method of digital audio signal processing, comprising:
    framing a first digital audio signal into a plurality of audio frame data and applying windowing; performing a discrete Fourier transform on the plurality of audio frame data to obtain a plurality of first spectral data respectively corresponding to the plurality of audio frame data;
    mapping the plurality of first spectral data to auditory critical bands, and calculating a masking threshold for each subband of the auditory critical bands, the number of masking thresholds corresponding to the number of subbands;
    selecting, from the plurality of first spectral data, frequency points that are below the masking threshold as embedding positions;
    quantizing target data using a quantizer that permits blind detection of the quantization result, and assigning the quantization result to the discrete Fourier coefficients at the embedding positions to obtain a plurality of second spectral data corresponding to the plurality of first spectral data;
    performing an inverse discrete Fourier transform on the plurality of second spectral data to obtain a second digital audio signal.
  2. The method according to claim 1, wherein the target data is obtained by:
    acquiring one or more specific audio data and/or encoded data, and arranging the one or more specific audio data and/or encoded data serially in a predetermined order into a target data sequence; or
    acquiring one or more specific audio data and/or encoded data, arranging the one or more specific audio data and/or encoded data serially in a predetermined order into a target data sequence, and inserting a predetermined identification data sequence at a predetermined position of the target data sequence, the identification data sequence being formed of predetermined encoded data arranged in an agreed length and order;
    wherein the specific audio data corresponds to a specific loudness and/or a specific pitch and/or timbre.
  3. The method according to claim 1 or 2, wherein quantizing the target data using a quantizer that permits blind detection of the quantization result, and assigning the quantization result to the discrete Fourier coefficients at the embedding positions, specifically comprises:
    based on the embedding positions, calculating a corresponding embedding strength from the masking threshold of the audio frame data, so as to determine the amount of data embedded in the corresponding audio frame data;
    according to the embedding strength, quantizing the target data using the quantizer that permits blind detection of the quantization result, and assigning the quantization result to the discrete Fourier coefficients at the embedding positions.
  4. The method according to claim 1 or 2, further comprising:
    taking a frequency point as an embedding position when the first spectral data at that frequency point is below the minimum masking threshold and/or the frequency point lies in the middle and low frequency band of the audio, the middle and low frequency band being 30 Hz to 4 kHz; and/or
    calculating the signal-to-noise ratio of the second digital audio signal, and outputting the second digital audio signal when its signal-to-noise ratio is above a predetermined threshold range.
  5. A method of extracting data from a digital audio signal, comprising:
    framing a first digital audio signal into a plurality of audio frame data and applying windowing; performing a discrete Fourier transform on the plurality of audio frame data to obtain a plurality of first spectral data respectively corresponding to the plurality of audio frame data;
    mapping the plurality of first spectral data to auditory critical bands, and calculating a masking threshold for each subband of the auditory critical bands, the number of masking thresholds corresponding to the number of subbands;
    selecting, from the plurality of first spectral data, frequency points that are below the masking threshold as embedding positions;
    inverse-quantizing the discrete Fourier coefficients at the embedding positions using a quantizer that permits blind detection of the quantization result, to obtain the target data sequence embedded in the first digital audio signal; wherein the target data sequence consists of one or more specific audio data and/or encoded data arranged serially in a predetermined order, and the specific audio frequency-domain signal corresponds to a specific loudness and/or a specific pitch and/or timbre.
  6. The method according to claim 5, further comprising:
    searching the target data sequence for a predetermined identification data sequence, and performing pattern recognition on the target data sequence according to the identification data sequence to obtain the corresponding target data sequence; or
    searching the target data sequence for a predetermined identification data sequence, performing pattern recognition on the target data sequence according to the identification data sequence to obtain the corresponding target data sequence, and transforming the target data sequence using a predetermined coding table to obtain the target data embedded in the first digital audio signal.
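The position filter and the signal-to-noise check of claim 4 can be sketched as follows. This is only an illustration under stated assumptions: the per-bin thresholds are taken as given inputs rather than computed from a psychoacoustic model, the two conditions are combined with "and" although the claim allows "and/or", and the signal-to-noise ratio is computed by treating the difference between the first and second signals as noise. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def in_mid_low_band(bin_index: int, frame_len: int, sample_rate: float,
                    lo_hz: float = 30.0, hi_hz: float = 4000.0) -> bool:
    """True if the DFT bin's centre frequency lies in the 30 Hz to 4 kHz band."""
    freq = bin_index * sample_rate / frame_len
    return lo_hz <= freq <= hi_hz


def candidate_bins(magnitudes: Sequence[float], thresholds: Sequence[float],
                   frame_len: int, sample_rate: float) -> List[int]:
    """Bins whose magnitude is below the given threshold and in the band."""
    return [
        k for k, (mag, thr) in enumerate(zip(magnitudes, thresholds))
        if mag < thr and in_mid_low_band(k, frame_len, sample_rate)
    ]


def snr_db(original: Sequence[float], modified: Sequence[float]) -> float:
    """SNR in dB, with (original - modified) treated as the noise."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, modified))
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)
```

An embedder built on this sketch would output the second signal only when `snr_db` exceeds whatever threshold the application has agreed on.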
PCT/CN2016/087445 2015-07-27 2016-06-28 Method for processing digital audio signal WO2017016363A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510447092.2A CN106409301A (en) 2015-07-27 2015-07-27 Digital audio signal processing method
CN201510447092.2 2015-07-27

Publications (1)

Publication Number Publication Date
WO2017016363A1 true WO2017016363A1 (en) 2017-02-02

Family

ID=57884085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/087445 WO2017016363A1 (en) 2015-07-27 2016-06-28 Method for processing digital audio signal

Country Status (2)

Country Link
CN (1) CN106409301A (en)
WO (1) WO2017016363A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3605531A4 (en) * 2017-03-28 2020-04-15 Sony Corporation Information processing device, information processing method, and program
CN108281152B (en) * 2018-01-18 2021-01-12 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method, device and storage medium
CN109257688B (en) * 2018-07-23 2021-01-22 东软集团股份有限公司 Audio distinguishing method and device, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050137876A1 (en) * 2003-12-17 2005-06-23 Kiryung Lee Apparatus and method for digital watermarking using nonlinear quantization
CN101101754A (en) * 2007-06-25 2008-01-09 中山大学 Steady audio-frequency water mark method based on Fourier discrete logarithmic coordinate transformation
CN101345054A (en) * 2008-08-25 2009-01-14 苏州大学 Digital watermark production and recognition method used for audio document
CN102959621A (en) * 2010-02-26 2013-03-06 弗兰霍菲尔运输应用研究公司 Watermark decoder and method for providing binary message data
WO2014199449A1 (en) * 2013-06-11 2014-12-18 株式会社東芝 Digital-watermark embedding device, digital-watermark detection device, digital-watermark embedding method, digital-watermark detection method, digital-watermark embedding program, and digital-watermark detection program
CN104505096A (en) * 2014-05-30 2015-04-08 华南理工大学 Method and device using music to transmit hidden information
CN104795071A (en) * 2015-04-18 2015-07-22 广东石油化工学院 Blind audio watermark embedding and watermark extraction processing method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2362385A1 (en) * 2010-02-26 2011-08-31 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Watermark signal provision and watermark embedding
CN102142255B (en) * 2010-07-08 2012-10-03 北京三信时代信息公司 Method for embedding and extracting digital watermark in audio signal

Also Published As

Publication number Publication date
CN106409301A (en) 2017-02-15

Similar Documents

Publication Publication Date Title
US11961527B2 (en) Methods and apparatus to perform audio watermarking and watermark detection and extraction
Hu et al. Robust, transparent and high-capacity audio watermarking in DCT domain
US6968564B1 (en) Multi-band spectral audio encoding
KR100898879B1 (en) Modulating One or More Parameter of An Audio or Video Perceptual Coding System in Response to Supplemental Information
CN1808568B (en) Audio encoding/decoding apparatus having watermark insertion/abstraction function and method using the same
CN102982806B (en) Methods and apparatus to perform audio signal decoding
AU2001251274A1 (en) System and method for adding an inaudible code to an audio signal and method and apparatus for reading a code signal from an audio signal
Hu et al. Effective blind speech watermarking via adaptive mean modulation and package synchronization in DWT domain
EP2787503A1 (en) Method and system of audio signal watermarking
Xiang et al. Digital audio watermarking: fundamentals, techniques and challenges
WO2017016363A1 (en) Method for processing digital audio signal
Chen et al. Telephony speech enhancement by data hiding
Eichelberger et al. Imperceptible audio communication
Cao et al. Bit replacement audio watermarking using stereo signals
Ferreira Perceptual coding of harmonic signals
Adib A high capacity quantization-based audio watermarking technique using the DWPT
AU2012241085B2 (en) Methods and apparatus to perform audio watermarking and watermark detection and extraction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16829729

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16829729

Country of ref document: EP

Kind code of ref document: A1