CN111462766A - Auditory pulse coding method and system based on sparse coding - Google Patents
Auditory pulse coding method and system based on sparse coding
- Publication number
- CN111462766A
- Authority
- CN
- China
- Prior art keywords
- coded
- signal
- coding
- sound signal
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 34
- 230000005236 sound signal Effects 0.000 claims abstract description 160
- 238000013507 mapping Methods 0.000 claims abstract description 14
- 238000007781 pre-processing Methods 0.000 claims abstract description 14
- 238000001208 nuclear magnetic resonance pulse sequence Methods 0.000 claims description 13
- 238000010276 construction Methods 0.000 claims description 5
- 238000012935 Averaging Methods 0.000 claims description 4
- 238000013528 artificial neural network Methods 0.000 abstract description 4
- 230000014509 gene expression Effects 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 210000000860 cochlear nerve Anatomy 0.000 description 2
- 238000000354 decomposition reaction Methods 0.000 description 2
- 230000004807 localization Effects 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 1
- 238000005265 energy consumption Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000036962 time dependent Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
- G10L19/107—Sparse pulse excitation, e.g. by using algebraic codebook
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Algebra (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention relates to an auditory pulse coding method and system based on sparse coding. The method comprises the following steps: constructing a kernel set capable of expressing sound basic elements; acquiring a sound signal to be coded; preprocessing the sound signal to be coded to obtain a preprocessed sound signal to be coded; according to the kernel set and the preprocessed sound signal to be coded, obtaining sparse codes of the preprocessed sound signal to be coded by adopting a temporal matching pursuit algorithm; and mapping each of the sparse codes to an auditory pulse code. The auditory pulse pattern generated by the invention is suitable for spiking neural networks and ensures both high coding efficiency and high coding fidelity.
Description
Technical Field
The invention relates to the field of sound processing, in particular to an auditory pulse coding method and system based on sparse coding.
Background
Sound structures in nature have non-stationary and time-dependent properties, such as transients, temporal relationships between acoustic events, and harmonic periodicity. In sound localization, human listeners can reliably detect an interaural time difference of less than 10 μs, which corresponds to a sound source offset of about 1 degree. For comparison, the sampling interval of an audio CD sampled at 44.1 kHz is 22.7 μs. Studies have shown that many sound cues, such as the onset and offset of sound events, coordinated harmonic modulation, and sound source localization, all depend on accurate timing information. It is therefore very important to extract sound structure features that carry accurate timing information from natural sounds. This presents a number of challenges, however, because in natural acoustic environments, with multiple sound sources and background noise, sound events cannot be observed directly and must be inferred from a number of ambiguous cues.
Most conventional sound feature representations, such as the Discrete Wavelet Transform, Perceptual Linear Prediction, and Mel-Frequency Cepstral Coefficients, are based on time blocks, i.e., the signal is processed segment by segment in a series of discrete blocks.
Disclosure of Invention
The invention aims to provide an auditory pulse coding method and system based on sparse coding so as to improve the sound coding efficiency and fidelity.
In order to achieve the purpose, the invention provides the following scheme:
a sparse coding based auditory pulse coding method, the method comprising:
constructing a kernel set capable of expressing sound basic elements;
acquiring a sound signal to be coded;
preprocessing the sound signal to be coded to obtain a preprocessed sound signal to be coded;
according to the kernel function group and the preprocessed sound signals to be coded, a temporal matching pursuit algorithm is adopted to obtain sparse codes of the preprocessed sound signals to be coded;
mapping each of the sparse codes to an auditory pulse code.
Optionally, the constructing a set of kernels that can express the sound basic elements specifically includes:
determining a center frequency group according to an equivalent rectangular bandwidth principle; the central frequency group comprises a plurality of central frequencies, and the values of the central frequencies are different;
and constructing a set of gammatone functions with various center frequencies according to the center frequency set.
Optionally, the preprocessing the sound signal to be encoded to obtain a preprocessed sound signal to be encoded specifically includes:
judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result;
if the first judgment result shows that the sound signal to be coded is a multi-channel signal, averaging signals of all channels in the multi-channel signal to obtain a single-channel signal;
determining the maximum absolute value of the single sound channel signal according to the single sound channel signal;
dividing the single sound channel signal by the maximum absolute value of the single sound channel signal to obtain a preprocessed sound signal to be coded;
if the first judgment result shows that the sound signal to be coded is not a multi-channel signal, acquiring the maximum value of the absolute value of the sound signal to be coded;
and dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
Optionally, the obtaining a plurality of sparse codes of the preprocessed sound signal to be coded by using a temporal matching pursuit algorithm according to the preprocessed sound signal to be coded specifically includes:
obtaining a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signals to be coded at all time positions;
obtaining a maximum value of the plurality of values;
combining the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value into a code; the maximum value is the encoded value of the encoding;
adding the code to a code table;
multiplying the code value of each code in the code table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coded short signals;
superposing the plurality of coded short signals according to the time position corresponding to each coded short signal to form a reconstructed signal;
subtracting the reconstructed signal from the preprocessed sound signal to be coded to obtain a residual signal;
according to the residual signal, obtaining the quotient of the length of the residual signal and the length of the sound signal to be coded;
judging whether the quotient is smaller than a preset quotient threshold value or not to obtain a second judgment result;
if the second judgment result indicates that the quotient is not less than the preset quotient threshold, taking the residual signal as the preprocessed sound signal to be coded, and returning to the step of obtaining a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signal to be coded at all time positions;
and if the second judgment result shows that the quotient is smaller than the preset quotient threshold value, outputting the coding table.
Optionally, the mapping each sparse code to an auditory pulse code specifically includes:
obtaining the maximum value of all the coding values in the coding table;
obtaining a plurality of equally spaced distribution values within a natural exponent range from 0 to the maximum value; each distribution value corresponds to an intensity level;
numbering the intensity levels in sequence according to the distribution values;
acquiring the intensity level of each code in the code table; the intensity level of a code is the intensity level for which the difference between the natural exponent of the corresponding distribution value and the coding value of the code is the minimum among the differences between the natural exponents of all the distribution values and the coding value of the code;
mapping each code into a pulse event; the occurrence time of the pulse event is the time position of each code, and the pulse sequence position to which the pulse event belongs is L = (m-1) × n + S;
all pulse events constitute an auditory pulse pattern;
wherein L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
A sparse coding based auditory pulse coding system, the system comprising:
a kernel function group construction unit for constructing a kernel function group capable of expressing basic elements of sound;
the device comprises a to-be-coded sound signal acquisition unit, a coding unit and a coding unit, wherein the to-be-coded sound signal acquisition unit is used for acquiring a sound signal to be coded;
the pre-processed sound signal to be coded acquiring unit is used for pre-processing the sound signal to be coded to acquire a pre-processed sound signal to be coded;
the sparse code acquisition unit is used for obtaining a plurality of sparse codes of the preprocessed sound signal to be coded by adopting a temporal matching pursuit algorithm according to the preprocessed sound signal to be coded;
an auditory pulse code acquisition unit for mapping each of the sparse codes to an auditory pulse code.
Optionally, the kernel function set constructing unit specifically includes:
the central frequency group acquisition subunit is used for determining a central frequency group according to an equivalent rectangular bandwidth principle; the central frequency group comprises a plurality of central frequencies, and the values of the central frequencies are different;
and the gammatone function acquisition subunit is used for constructing a group of gammatone functions with various center frequencies according to the center frequency group.
Optionally, the pre-processed sound signal to be encoded obtaining unit specifically includes:
the sound signal judgment result acquisition subunit is used for judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result;
a monaural signal obtaining subunit, configured to, if the first determination result indicates that the sound signal to be encoded is a multi-channel signal, average signals of all channels in the multi-channel signal to obtain a monaural signal;
a monaural signal maximum value obtaining subunit, configured to determine an absolute value maximum value of the monaural signal according to the monaural signal;
the preprocessing sound signal to be coded determining subunit is used for dividing the single sound channel signal by the maximum absolute value of the single sound channel signal to obtain a preprocessed sound signal to be coded;
a sound signal to be encoded maximum value obtaining subunit, configured to obtain an absolute value maximum value of the sound signal to be encoded if the first determination result indicates that the sound signal to be encoded is not a multi-channel signal;
and the preprocessed sound signal to be coded acquiring subunit is used for dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
Optionally, the sparse coding acquisition unit specifically includes:
an inner product multi-group value obtaining subunit, configured to obtain a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signal to be encoded at all time positions;
an inner product maximum value obtaining subunit configured to obtain a maximum value of the plurality of values;
the code acquisition subunit is used for forming a code by the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value; the maximum value is the encoded value of the encoding;
an encoding table acquisition subunit, configured to add the encoding to an encoding table;
the coding short signal acquisition subunit is used for multiplying the coding value of each code in the coding table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coding short signals;
the reconstructed signal determining subunit is configured to superimpose the plurality of encoded short signals according to a time position corresponding to each encoded short signal to form a reconstructed signal;
a residual signal determining subunit, configured to perform a difference between the preprocessed to-be-encoded sound signal and the reconstructed signal to obtain a residual signal;
a quotient obtaining subunit, configured to obtain, according to the residual signal, a quotient of a length of the residual signal and a length of the sound signal to be encoded;
a quotient judgment result obtaining subunit, configured to judge whether the quotient is smaller than a preset quotient threshold value, and obtain a second judgment result;
a to-be-coded sound signal obtaining subunit, configured to, if the second determination result indicates that the quotient is not smaller than the preset quotient threshold, use the residual signal as the preprocessed to-be-coded sound signal and return to the inner-product multi-group value obtaining subunit;
and the coding table output subunit is configured to output the coding table if the second determination result indicates that the quotient is smaller than the preset quotient threshold.
Optionally, the auditory pulse code acquiring unit specifically includes:
the maximum value acquisition subunit of all time positions is used for acquiring the maximum values of all the coding values in the coding table;
an equally-spaced distribution value acquisition subunit operable to acquire a plurality of equally-spaced distribution values within a natural exponent range from 0 to the maximum value; each distribution value corresponds to an intensity level;
an intensity level numbering and determining subunit, configured to number the intensity levels in sequence according to the size of the distribution value;
a coding strength level acquiring subunit, configured to acquire the intensity level of each code in the coding table; the intensity level of a code is the intensity level for which the difference between the natural exponent of the corresponding distribution value and the coding value of the code is the minimum among all the distribution values;
the pulse event acquisition subunit is used for mapping each code into a pulse event, wherein the occurrence time of the pulse event is the time position of each code, and the pulse sequence position of the pulse event is L = (m-1) × n + S;
an auditory pulse pattern acquisition subunit, configured to construct an auditory pulse pattern from all pulse events;
wherein L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the method comprises the steps of firstly constructing a kernel function group capable of expressing basic elements of sound, preprocessing sound signals to be coded, adopting a time sequence matching tracking algorithm according to the kernel function group and the preprocessed sound signals to be coded, obtaining sparse codes of a plurality of preprocessed sound signals to be coded, namely decomposing sound into a plurality of combinations of kernel functions with different coefficients and different time points, being capable of maximally retaining information of original signals, minimizing required computing resources and having high coding efficiency; and finally mapping each sparse code into a pulse event, wherein all the pulse events form an auditory pulse mode, and the occurrence time of each pulse event is the time position of each code, so that the extracted sound characteristics have accurate time information, and the coded sound signals have extremely high fidelity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a sparse coding-based auditory pulse coding method provided by the present invention;
FIG. 2 is a schematic diagram of a sparse coding-based auditory pulse coding method according to the present invention;
FIG. 3 is a schematic diagram of gammatone functions provided by the present invention;
FIG. 4 is a schematic diagram of sparse coding mapping to auditory pulse coding provided by the present invention;
FIG. 5 is a block diagram of a sparse coding based auditory pulse coding system provided by the present invention;
description of the symbols:
1 - kernel function group construction unit; 2 - to-be-coded sound signal acquisition unit; 3 - preprocessed to-be-coded sound signal acquisition unit; 4 - sparse coding acquisition unit; 5 - auditory pulse coding acquisition unit.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an auditory pulse coding method and system based on sparse coding so as to improve the sound coding efficiency and fidelity.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of an auditory pulse encoding method based on sparse coding according to the present invention. As shown in fig. 1, a sparse coding-based auditory pulse coding method includes:
s101, constructing a kernel set capable of expressing sound basic elements, specifically comprising:
determining a center frequency group according to an equivalent rectangular bandwidth principle; the center frequency group includes a plurality of center frequencies, and each of the center frequencies has a different value. Each center frequency is in the range of 20Hz to 8000 Hz.
From the set of center frequencies, a set of gammatone functions with various center frequencies is constructed, as shown in FIG. 3. Specifically, each center frequency in the center frequency group is used as an input quantity and is input into the time-domain expression of the gammatone function, so as to obtain the output time-domain expression (a discrete-time signal) of the gammatone filter; the time-domain expression of the gammatone filter can be understood as a one-dimensional time-varying vector.
The set of kernels corresponds to the kernel set Φ of FIG. 2, and the various center frequencies in the set of kernels correspond to the individual kernel functions shown in FIG. 2.
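As an illustration of step S101, the sketch below shows one way such a gammatone kernel set could be built in Python, using the standard gammatone form t^(n-1) · exp(-2πbt) · cos(2πf_c t). It is a minimal sketch rather than the patent's implementation: the ERB-rate constants follow the commonly used Glasberg-Moore form, and the number of kernels, kernel duration, filter order and sampling rate are assumed values.

```python
import numpy as np

def erb_center_frequencies(n_kernels, f_min=20.0, f_max=8000.0):
    """Center frequencies spaced evenly on the ERB-rate scale between f_min and f_max (Hz)."""
    erb_rate = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)    # Hz -> ERB-rate
    inv_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3   # ERB-rate -> Hz
    return inv_erb(np.linspace(erb_rate(f_min), erb_rate(f_max), n_kernels))

def gammatone_kernel(fc, fs, duration=0.05, order=4):
    """Unit-norm gammatone kernel as a one-dimensional time-varying vector."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)                    # equivalent rectangular bandwidth of this channel
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
    return g / np.linalg.norm(g)                               # normalize so inner products are comparable

def build_kernel_set(n_kernels=32, fs=16000):
    """The kernel set: one gammatone kernel per center frequency."""
    return [gammatone_kernel(fc, fs) for fc in erb_center_frequencies(n_kernels)]
```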
s102, acquiring a sound signal to be coded, such as the sound signal shown in FIG. 2.
S103, preprocessing the sound signal to be encoded to obtain a preprocessed sound signal to be encoded, which specifically includes:
and judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result.
And if the first judgment result shows that the sound signal to be coded is a multi-channel signal, averaging signals of all channels in the multi-channel signal to obtain a single-channel signal.
The maximum value of the absolute value of the monophonic signal is determined from the monophonic signal.
And dividing the mono signal by the maximum absolute value of the mono signal to obtain the preprocessed sound signal to be coded.
And if the first judgment result shows that the sound signal to be coded is not a multi-channel signal, acquiring the maximum value of the absolute value of the sound signal to be coded.
And dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
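The preprocessing of S103 amounts to channel averaging followed by peak normalization. A minimal sketch is given below, assuming the input is a NumPy array shaped (samples,) for mono audio or (samples, channels) for multi-channel audio:

```python
import numpy as np

def preprocess(sound):
    """Average a multi-channel signal into one channel, then divide by the maximum absolute value."""
    x = np.asarray(sound, dtype=float)
    if x.ndim > 1 and x.shape[1] > 1:      # multi-channel: average all channels into a mono signal
        x = x.mean(axis=1)
    else:
        x = x.reshape(-1)                  # already mono
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x     # guard against an all-zero signal
```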
S104, according to the kernel function group and the preprocessed to-be-coded sound signal, a temporal matching pursuit algorithm is adopted to obtain sparse codes of the preprocessed to-be-coded sound signal, which specifically comprises the following steps:
acquiring a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signals to be coded at all time positions;
obtaining a maximum value of the plurality of values;
forming a code by the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value; the maximum value is a coded value of the code;
the code is added to the code table. The coding table corresponds to the coding information table of fig. 2. The code value corresponds to s in the code information table of fig. 2, the time position corresponds to τ in the code information table of fig. 2, and the kernel index corresponds to m in the code information table of fig. 2.
And multiplying the coding value of each code in the coding table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coded short signals. The encoded short signal is the short signal in fig. 2.
And superposing the plurality of coded short signals according to the time position corresponding to each coded short signal to form a reconstructed signal. The reconstructed signal is the reconstructed signal in fig. 2.
And subtracting the reconstructed signal from the preprocessed sound signal to be coded to obtain a residual signal. The residual signal is the residual signal shown in fig. 2.
From the residual signal, a quotient of the length of the residual signal and the length of the sound signal to be encoded is obtained.
And judging whether the quotient is smaller than a preset quotient threshold value or not to obtain a second judgment result. The preset quotient threshold is preset according to actual conditions.
And if the second judgment result shows that the quotient is not less than the preset quotient threshold, taking the residual signal as the preprocessed sound signal to be coded, and returning to the step of obtaining a plurality of groups of values of the inner product of the kernel function group and the preprocessed sound signal to be coded at each time position. One time position corresponds to a set of values.
And if the second judgment result shows that the quotient is smaller than the preset quotient threshold value, outputting the coding table.
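The loop of S104 can be sketched as follows; in effect the signal is approximated as a sum of scaled, time-shifted kernels plus a residual. This is a simplified illustration under stated assumptions: kernels are the unit-norm one-dimensional vectors built above, a code is stored as the triple (coding value s, time position τ, kernel index m), only positive inner products are selected (the "maximum value" in the text), and the stopping "quotient" is read here as the ratio of the residual norm to the norm of the preprocessed signal.

```python
import numpy as np

def temporal_matching_pursuit(x, kernels, stop_ratio=0.1, max_iters=1000):
    """Greedy temporal matching pursuit over a preprocessed signal x.

    Each iteration finds the (kernel, time position) pair with the largest inner product,
    records it in the coding table as (value s, time tau, kernel index m), subtracts the
    scaled kernel (the coded short signal) from the residual, and stops when the
    residual-to-signal ratio drops below stop_ratio."""
    residual = x.copy()
    code_table = []
    x_norm = max(np.linalg.norm(x), 1e-12)
    for _ in range(max_iters):
        best_s, best_tau, best_m = 0.0, 0, 0
        for m, phi in enumerate(kernels):
            corr = np.correlate(residual, phi, mode="valid")   # inner products at every time position
            tau = int(np.argmax(corr))
            if corr[tau] > best_s:
                best_s, best_tau, best_m = float(corr[tau]), tau, m
        if best_s <= 0.0:                                      # nothing left to match
            break
        code_table.append((best_s, best_tau, best_m))
        residual[best_tau:best_tau + len(kernels[best_m])] -= best_s * kernels[best_m]
        if np.linalg.norm(residual) / x_norm < stop_ratio:     # assumed reading of the quotient test
            break
    return code_table, residual
```

With the sketches above, `code_table, _ = temporal_matching_pursuit(preprocess(sound), build_kernel_set())` would yield a coding table whose entries correspond to s, τ and m in the coding information table of fig. 2.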
S105, mapping each sparse code to an auditory pulse code, as shown in fig. 4, specifically including:
the maximum value of all the encoding values in the encoding table is obtained.
A plurality of equally spaced distribution values in a natural exponent range from 0 to the maximum value are obtained. Each distribution value corresponds to an intensity level.
And numbering the intensity levels in sequence according to the size of the distribution value.
The intensity level of each code in the code table is obtained: for each code, the intensity level is the one whose distribution value's natural exponent differs least from the coding value of the code among all the distribution values.
Each code is mapped to a pulse event. The occurrence time of the pulse event is the time position of each code, and the pulse sequence position to which the pulse event belongs is L = (m-1) × n + S.
All pulse events constitute an auditory pulse pattern.
Where L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
The obtained auditory pulse pattern can be directly input into any spiking neural network for processing.
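A sketch of the mapping in S105 is given below. The position rule L = (m-1) × n + S comes from the text; the exponential spacing of the quantization levels is only one hedged reading of the "equally spaced distribution values within a natural exponent range", and the number of levels n is an assumed parameter. As a numerical check of the position rule: with n = 8, a code from kernel m = 3 at intensity level S = 2 lands on spike train L = (3-1) × 8 + 2 = 18.

```python
import numpy as np

def to_pulse_events(code_table, n_levels=8):
    """Map each sparse code (value s, time tau, 0-based kernel index m0) to a pulse event.

    The spike time is tau; the spike-train position follows L = (m - 1) * n + S,
    where m = m0 + 1 is the 1-based kernel index and S is the intensity level."""
    s_max = max(s for s, _, _ in code_table)
    # Assumed reading of the quantization: distribution values equally spaced between
    # 0 and ln(s_max), their natural exponentials serving as the intensity levels.
    dist = np.sort(np.linspace(0.0, np.log(s_max), n_levels))
    levels = np.exp(dist)
    events = []
    for s, tau, m0 in code_table:
        S = int(np.argmin(np.abs(levels - s))) + 1       # nearest level, numbered 1..n_levels
        L = m0 * n_levels + S                            # equals (m - 1) * n + S
        events.append((tau, L))                          # (spike time, spike-train position)
    return events                                        # together the events form the auditory pulse pattern
```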
The invention provides an auditory nerve coding algorithm based on Temporal Matching Pursuit (TMP). Compared with traditional sound feature representations, the TMP-based auditory nerve coding algorithm has several advantages:
(1) Time sensitivity: unlike conventional time-block-based sound feature extraction techniques, this coding method can extract sound features with accurate timing information. This property gives the coded sound signal very high fidelity and makes it well suited to spiking neural networks that require precise spikes as inputs.
(2) High efficiency: the sound is decomposed into a plurality of combinations of kernel functions with different coefficients and different time points by utilizing a nonlinear signal decomposition method, so that the information of the original signal can be maximally reserved, the required computing resources are minimized, and the energy consumption is reduced.
(3) Robustness: spiketrum exhibits natural information robustness to noise and self-loss of coding. The information robustness comes from the global greedy selection of kernel functions at different moments in the sound decomposition process, namely the maximum projection principle. Thus, when the pulse frequency is reduced, spikerum will look for and preferentially discard the atom with the least amount of information for the reconstructed signal under the global input signal time window.
The invention also provides a sparse coding-based auditory pulse coding system corresponding to the sparse coding-based auditory pulse coding method. As shown in fig. 5, the system comprises: a kernel function group construction unit 1, a to-be-coded sound signal acquisition unit 2, a preprocessed to-be-coded sound signal acquisition unit 3, a sparse coding acquisition unit 4 and an auditory pulse coding acquisition unit 5.
A kernel function group constructing unit 1, configured to construct a kernel function group that can express the sound basic elements.
And the sound signal to be coded acquiring unit 2 is used for acquiring the sound signal to be coded.
The preprocessed to-be-coded sound signal obtaining unit 3 is configured to preprocess the to-be-coded sound signal to obtain a preprocessed to-be-coded sound signal.
And the sparse code acquisition unit 4 is configured to obtain a plurality of sparse codes of the preprocessed sound signal to be coded by adopting a temporal matching pursuit algorithm according to the preprocessed sound signal to be coded.
An auditory pulse code acquisition unit 5 for mapping each sparse code to an auditory pulse code.
The kernel function group construction unit 1 specifically includes: a center frequency group acquisition subunit and a gammatone function acquisition subunit.
The central frequency group acquisition subunit is used for determining a central frequency group according to an equivalent rectangular bandwidth principle; the center frequency group includes a plurality of center frequencies, and each of the center frequencies has a different value.
And the gammatone function acquisition subunit is used for constructing a group of gammatone functions with various center frequencies according to the center frequency group.
The preprocessed sound signal to be coded acquisition unit 3 specifically includes: a sound signal judgment result acquisition subunit, a single sound channel signal acquisition subunit, a single sound channel signal maximum value acquisition subunit, a preprocessed sound signal to be coded determining subunit, a sound signal to be coded maximum value acquisition subunit and a preprocessed sound signal to be coded acquisition subunit.
And the sound signal judgment result acquisition subunit is used for judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result.
And the single-channel signal acquisition subunit is used for averaging the signals of all the channels in the multi-channel signal to obtain a single-channel signal if the first judgment result indicates that the sound signal to be encoded is a multi-channel signal.
And the monaural signal maximum value acquisition subunit is used for determining the maximum value of the absolute value of the monaural signal according to the monaural signal.
And the preprocessing sound signal to be coded determining subunit is used for dividing the monaural signal by the maximum absolute value of the monaural signal to obtain the preprocessed sound signal to be coded.
And the sound signal to be coded maximum value acquiring subunit is used for acquiring the maximum value of the absolute value of the sound signal to be coded if the first judgment result indicates that the sound signal to be coded is not a multi-channel signal.
And the preprocessed sound signal to be coded acquiring subunit is used for dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
The sparse code acquisition unit 4 specifically includes: the device comprises an inner product multi-group value acquisition subunit, an inner product maximum value acquisition subunit, a code table acquisition subunit, a code short signal acquisition subunit, a reconstructed signal determination subunit, a residual signal determination subunit, a quotient acquisition subunit, a quotient judgment result acquisition subunit, a to-be-coded sound signal acquisition subunit and a code table output subunit.
And the inner product multi-group value acquisition subunit is used for acquiring a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signals to be coded at all time positions.
And the inner product maximum value acquisition subunit is used for acquiring the maximum value of the plurality of values.
And the code acquisition subunit is used for combining the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value into a code. The maximum value is the encoded value of the encoding.
And the coding table acquisition subunit is used for adding the codes into the coding table.
And the coding short signal acquisition subunit is used for multiplying the coding value of each code in the coding table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coding short signals.
And the reconstructed signal determining subunit is used for superposing the plurality of encoded short signals according to the time position corresponding to each encoded short signal to form a reconstructed signal.
And the residual signal determining subunit is used for subtracting the preprocessed sound signal to be coded and the reconstructed signal to obtain a residual signal.
And the quotient acquisition subunit is used for acquiring the quotient of the length of the residual signal and the length of the sound signal to be coded according to the residual signal.
And the quotient judgment result acquisition subunit is used for judging whether the quotient is smaller than a preset quotient threshold value or not to obtain a second judgment result.
And the sound signal to be coded acquiring subunit is used for taking the residual signal as the sound signal to be coded after the preprocessing if the second judgment result shows that the quotient is not less than the preset quotient threshold value, and returning to the inner product multi-group value acquiring subunit.
And the coding table output subunit is used for outputting the coding table if the second judgment result shows that the quotient is smaller than the preset quotient threshold.
The auditory pulse code acquiring unit 5 specifically includes: the device comprises an all-time-position inner product maximum value acquisition subunit, an equal-spacing distribution value acquisition subunit, an intensity level number determination subunit, a coding intensity level acquisition subunit, a pulse event acquisition subunit and an auditory pulse mode acquisition subunit.
And the maximum value acquisition subunit of all time positions is used for acquiring the maximum value of all the code values in the code table.
An equally-spaced distribution value acquisition subunit operable to acquire a plurality of equally-spaced distribution values within a natural exponent range from 0 to a maximum value. Each distribution value corresponds to an intensity level.
And the strength grade number determining subunit is used for sequentially numbering the strength grades according to the size of the distribution value.
And the coding strength level acquisition subunit is used for acquiring the intensity level of each code in the coding table. The intensity level of a code is the one whose distribution value's natural exponent differs least from the coding value of the code among all the distribution values.
The pulse event acquisition subunit is used for mapping each code to one pulse event, the occurrence time of the pulse event is the time position of each code, and the pulse sequence position to which the pulse event belongs is L = (m-1) × n + S.
An auditory pulse pattern acquisition subunit, configured to construct an auditory pulse pattern from all the pulse events.
Where L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
Firstly, a group of kernel functions capable of expressing basic auditory characteristics is set as a dictionary, and a single-channel sound signal of any length and type is taken as input; the matching pursuit algorithm is used to continuously search for the part of the signal most similar to the dictionary, remove this part from the original signal, and record its time position, the corresponding dictionary element index and the corresponding similarity strength grade; the above process is repeated until the sum of the squares of the remaining sound signal is less than a certain threshold. The set of sparse codes obtained in this way can be regarded as a binary set of pulse event sequences: the time position of each code represents the occurrence time of its corresponding pulse event, and the intensity level and the element index determine the pulse sequence position where it is located. The auditory pulse code generated by the invention is suitable for spiking neural networks and ensures both high coding efficiency and high coding fidelity.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.
Claims (10)
1. A sparse coding based auditory pulse coding method, the method comprising:
constructing a kernel set capable of expressing sound basic elements;
acquiring a sound signal to be coded;
preprocessing the sound signal to be coded to obtain a preprocessed sound signal to be coded;
according to the kernel function group and the preprocessed sound signals to be coded, a temporal matching pursuit algorithm is adopted to obtain sparse codes of the preprocessed sound signals to be coded;
mapping each of the sparse codes to an auditory pulse code.
2. An auditory pulse coding method based on sparse coding according to claim 1, wherein the constructing a set of kernels that can express basic elements of sound comprises:
determining a center frequency group according to an equivalent rectangular bandwidth principle; the central frequency group comprises a plurality of central frequencies, and the values of the central frequencies are different;
and constructing a set of gammatone functions with various center frequencies according to the center frequency set.
3. An auditory pulse coding method based on sparse coding according to claim 1, wherein the preprocessing the sound signal to be coded to obtain a preprocessed sound signal to be coded specifically comprises:
judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result;
if the first judgment result shows that the sound signal to be coded is a multi-channel signal, averaging signals of all channels in the multi-channel signal to obtain a single-channel signal;
determining the maximum absolute value of the single sound channel signal according to the single sound channel signal;
dividing the single sound channel signal by the maximum absolute value of the single sound channel signal to obtain a preprocessed sound signal to be coded;
if the first judgment result shows that the sound signal to be coded is not a multi-channel signal, acquiring the maximum value of the absolute value of the sound signal to be coded;
and dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
4. The sparse coding-based auditory pulse coding method according to claim 3, wherein the obtaining a plurality of sparse codes of the preprocessed sound signal to be coded by using a temporal matching pursuit algorithm according to the preprocessed sound signal to be coded specifically comprises:
obtaining a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signals to be coded at all time positions;
obtaining a maximum value of the plurality of values;
combining the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value into a code; the maximum value is the encoded value of the encoding;
adding the code to a code table;
multiplying the code value of each code in the code table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coded short signals;
superposing the plurality of coded short signals according to the time position corresponding to each coded short signal to form a reconstructed signal;
subtracting the reconstructed signal from the preprocessed sound signal to be coded to obtain a residual signal;
according to the residual signal, obtaining the quotient of the length of the residual signal and the length of the sound signal to be coded;
judging whether the quotient is smaller than a preset quotient threshold value or not to obtain a second judgment result;
if the second judgment result indicates that the quotient is not less than the preset quotient threshold, taking the residual signal as the preprocessed sound signal to be coded, and returning to the step of obtaining a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signal to be coded at all time positions;
and if the second judgment result shows that the quotient is smaller than the preset quotient threshold value, outputting the coding table.
5. The sparse-coding-based auditory pulse coding method according to claim 4, wherein the mapping each of the sparse codes to an auditory pulse code specifically comprises:
obtaining the maximum value of all the coding values in the coding table;
obtaining a plurality of equally spaced distribution values within a natural exponent range from 0 to the maximum value; each distribution value corresponds to an intensity level;
numbering the intensity levels in sequence according to the distribution values;
acquiring the intensity level of each code in the code table; the intensity level of a code is the intensity level for which the difference between the natural exponent of the corresponding distribution value and the coding value of the code is the minimum among the differences between the natural exponents of all the distribution values and the coding value of the code;
mapping each code into a pulse event; the occurrence time of the pulse event is the time position of each code, and the pulse sequence position to which the pulse event belongs is L = (m-1) × n + S;
all pulse events constitute an auditory pulse pattern;
wherein L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
6. A sparse coding based auditory pulse coding system, the system comprising:
a kernel function group construction unit for constructing a kernel function group capable of expressing basic elements of sound;
the device comprises a to-be-coded sound signal acquisition unit, a coding unit and a coding unit, wherein the to-be-coded sound signal acquisition unit is used for acquiring a sound signal to be coded;
the pre-processed sound signal to be coded acquiring unit is used for pre-processing the sound signal to be coded to acquire a pre-processed sound signal to be coded;
the sparse code acquisition unit is used for obtaining a plurality of sparse codes of the preprocessed sound signal to be coded by adopting a temporal matching pursuit algorithm according to the preprocessed sound signal to be coded;
an auditory pulse code acquisition unit for mapping each of the sparse codes to an auditory pulse code.
7. The sparse coding-based auditory pulse coding system of claim 6, wherein the set of kernels constructing unit specifically comprises:
the central frequency group acquisition subunit is used for determining a central frequency group according to an equivalent rectangular bandwidth principle; the central frequency group comprises a plurality of central frequencies, and the values of the central frequencies are different;
and the gammatone function acquisition subunit is used for constructing a group of gammatone functions with various center frequencies according to the center frequency group.
8. The sparse coding-based auditory pulse coding system according to claim 6, wherein the preprocessed sound signal to be coded acquisition unit specifically comprises:
the sound signal judgment result acquisition subunit is used for judging whether the sound signal to be coded is a multi-channel signal or not to obtain a first judgment result;
a monaural signal obtaining subunit, configured to, if the first determination result indicates that the sound signal to be encoded is a multi-channel signal, average signals of all channels in the multi-channel signal to obtain a monaural signal;
a monaural signal maximum value obtaining subunit, configured to determine an absolute value maximum value of the monaural signal according to the monaural signal;
the preprocessing sound signal to be coded determining subunit is used for dividing the single sound channel signal by the maximum absolute value of the single sound channel signal to obtain a preprocessed sound signal to be coded;
a sound signal to be encoded maximum value obtaining subunit, configured to obtain an absolute value maximum value of the sound signal to be encoded if the first determination result indicates that the sound signal to be encoded is not a multi-channel signal;
and the preprocessed sound signal to be coded acquiring subunit is used for dividing the sound signal to be coded by the maximum absolute value of the sound signal to be coded to obtain the preprocessed sound signal to be coded.
9. The sparse coding-based auditory pulse coding system according to claim 8, wherein the sparse coding acquisition unit specifically comprises:
an inner product multi-group value obtaining subunit, configured to obtain a plurality of values of inner products of all kernel functions in the kernel function group and the preprocessed sound signal to be encoded at all time positions;
an inner product maximum value obtaining subunit configured to obtain a maximum value of the plurality of values;
the code acquisition subunit is used for forming a code by the maximum value, the time position corresponding to the maximum value and the kernel function index corresponding to the maximum value; the maximum value is the encoded value of the encoding;
an encoding table acquisition subunit, configured to add the encoding to an encoding table;
the coding short signal acquisition subunit is used for multiplying the coding value of each code in the coding table by the kernel function corresponding to the kernel function index of each code to obtain a plurality of coding short signals;
the reconstructed signal determining subunit is configured to superimpose the plurality of encoded short signals according to a time position corresponding to each encoded short signal to form a reconstructed signal;
a residual signal determining subunit, configured to perform a difference between the preprocessed to-be-encoded sound signal and the reconstructed signal to obtain a residual signal;
a quotient obtaining subunit, configured to obtain, according to the residual signal, a quotient of a length of the residual signal and a length of the sound signal to be encoded;
a quotient judgment result obtaining subunit, configured to judge whether the quotient is smaller than a preset quotient threshold value, and obtain a second judgment result;
a to-be-coded sound signal obtaining subunit, configured to, if the second determination result indicates that the quotient is not smaller than the preset quotient threshold, use the residual signal as the preprocessed to-be-coded sound signal and return to the inner-product multi-group value obtaining subunit;
and the coding table output subunit is configured to output the coding table if the second determination result indicates that the quotient is smaller than the preset quotient threshold.
10. The sparse coding-based auditory pulse coding system according to claim 8, wherein the auditory pulse code acquisition unit specifically comprises:
the maximum value acquisition subunit of all time positions is used for acquiring the maximum values of all the coding values in the coding table;
an equally-spaced distribution value acquisition subunit operable to acquire a plurality of equally-spaced distribution values within a natural exponent range from 0 to the maximum value; each distribution value corresponds to an intensity level;
an intensity level numbering and determining subunit, configured to number the intensity levels in sequence according to the size of the distribution value;
a coding strength level acquiring subunit, configured to acquire the intensity level of each code in the coding table; the intensity level of a code is the intensity level for which the difference between the natural exponent of the corresponding distribution value and the coding value of the code is the minimum among all the distribution values;
the pulse event acquisition subunit is used for mapping each code into a pulse event, wherein the occurrence time of the pulse event is the time position of each code, and the pulse sequence position of the pulse event is L = (m-1) × n + S;
an auditory pulse pattern acquisition subunit, configured to construct an auditory pulse pattern from all pulse events;
wherein L is the pulse sequence position to which the pulse event belongs, m is the kernel function index of each code, n is the total number of intensity levels, and S is the intensity level of each code.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010273268.8A CN111462766B (en) | 2020-04-09 | 2020-04-09 | Auditory pulse coding method and system based on sparse coding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010273268.8A CN111462766B (en) | 2020-04-09 | 2020-04-09 | Auditory pulse coding method and system based on sparse coding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111462766A true CN111462766A (en) | 2020-07-28 |
CN111462766B CN111462766B (en) | 2022-04-26 |
Family
ID=71683706
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010273268.8A Active CN111462766B (en) | 2020-04-09 | 2020-04-09 | Auditory pulse coding method and system based on sparse coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111462766B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113049080A (en) * | 2021-03-08 | 2021-06-29 | 中国电子科技集团公司第三十六研究所 | GDWC auditory feature extraction method for ship radiation noise |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW200830770A (en) * | 2007-01-05 | 2008-07-16 | Univ Nat Chiao Tung | A joint channel estimation and data detection method for STBC/OFDM systems |
US20090103602A1 (en) * | 2003-03-28 | 2009-04-23 | Digital Accelerator Corporation | Overcomplete basis transform-based motion residual frame coding method and apparatus for video compression |
US20110222707A1 (en) * | 2010-03-15 | 2011-09-15 | Do Hyung Hwang | Sound source localization system and method |
CN103177265A (en) * | 2013-03-25 | 2013-06-26 | 中山大学 | High-definition image classification method based on kernel function and sparse coding |
- 2020-04-09: CN CN202010273268.8A patent/CN111462766B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090103602A1 (en) * | 2003-03-28 | 2009-04-23 | Digital Accelerator Corporation | Overcomplete basis transform-based motion residual frame coding method and apparatus for video compression |
TW200830770A (en) * | 2007-01-05 | 2008-07-16 | Univ Nat Chiao Tung | A joint channel estimation and data detection method for STBC/OFDM systems |
US20110222707A1 (en) * | 2010-03-15 | 2011-09-15 | Do Hyung Hwang | Sound source localization system and method |
CN103177265A (en) * | 2013-03-25 | 2013-06-26 | 中山大学 | High-definition image classification method based on kernel function and sparse coding |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113049080A (en) * | 2021-03-08 | 2021-06-29 | 中国电子科技集团公司第三十六研究所 | GDWC auditory feature extraction method for ship radiation noise |
Also Published As
Publication number | Publication date |
---|---|
CN111462766B (en) | 2022-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103730131B (en) | The method and apparatus of speech quality evaluation | |
EP3469584B1 (en) | Neural decoding of attentional selection in multi-speaker environments | |
Liutkus et al. | Informed source separation through spectrogram coding and data embedding | |
EP1941493B1 (en) | Content-based audio comparisons | |
CN1860526B (en) | Encoding audio signals | |
CN106373583B (en) | Multi-audio-frequency object coding and decoding method based on ideal soft-threshold mask IRM | |
US20110112669A1 (en) | Apparatus and Method for Calculating a Fingerprint of an Audio Signal, Apparatus and Method for Synchronizing and Apparatus and Method for Characterizing a Test Audio Signal | |
JP2009271554A (en) | Parametric representation of spatial audio | |
HUE031966T2 (en) | Companding apparatus and method to reduce quantization noise using advanced spectral extension | |
JP4538324B2 (en) | Audio signal encoding | |
CN111462766B (en) | Auditory pulse coding method and system based on sparse coding | |
Ideli et al. | Visually assisted time-domain speech enhancement | |
CN109584890A (en) | Audio watermark embedding method, audio watermark extracting method, television program interaction method and device | |
CN117238311B (en) | Speech separation enhancement method and system in multi-sound source and noise environment | |
JP4496378B2 (en) | Restoration method of target speech based on speech segment detection under stationary noise | |
Shin et al. | Audio coding based on spectral recovery by convolutional neural network | |
CN105283915B (en) | Digital watermark embedding device and method and digital watermark detecting device and method | |
Suied et al. | Auditory sketches: sparse representations of sounds based on perceptual models | |
Nogales et al. | A deep learning framework for audio restoration using Convolutional/Deconvolutional Deep Autoencoders | |
Derrien | Detection of genuine lossless audio files: Application to the MPEG-AAC codec | |
CN113571074A (en) | Voice enhancement method and device based on multi-band structure time domain audio separation network | |
Ballesteros L et al. | On the ability of adaptation of speech signals and data hiding | |
Khademi et al. | Audio steganography by using of linear predictive coding analysis in the safe places of discrete wavelet transform domain | |
CN116110373B (en) | Voice data acquisition method and related device of intelligent conference system | |
CN109040116A (en) | A kind of video conferencing system based on cloud server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |