CN118136023A - Speech perception hash authentication method, system, medium and equipment based on spectral entropy - Google Patents


Info

Publication number
CN118136023A
Authority
CN
China
Prior art keywords
audio data
pcm audio
frame
hash
pseudo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410289365.4A
Other languages
Chinese (zh)
Inventor
李强
王凌志
叶东翔
朱勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Bairui Internet Electronic Technology Co ltd
Original Assignee
Chongqing Bairui Internet Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Bairui Internet Electronic Technology Co ltd
Priority to CN202410289365.4A
Publication of CN118136023A
Legal status: Pending

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a speech perceptual hash authentication method, system, storage medium and device based on spectral entropy, belonging to the technical field of voice authentication for Bluetooth audio. The method includes: performing LC3 audio encoding on PCM audio data to obtain audio encoded data; performing partial LC3 audio decoding on the audio encoded data to obtain the spectral coefficients of each frame of PCM audio data; obtaining the pseudo-spectrum flatness of each frame of PCM audio data from the spectral coefficients of that frame; generating a perceptual hash sequence from the pseudo-spectrum flatness of each frame of PCM audio data; and comparing the perceptual hash sequence with a pre-stored hash database to determine whether the PCM audio data is the voice of a designated person. By building on the existing LC3 codec, the application reduces the amount of computation and the transmission bandwidth of the encoding and decoding process, and the hash sequence obtained from the pseudo-spectrum flatness can distinguish speech from noise and the voices of different people, thereby improving the efficiency and accuracy of voice authentication.

Description

Speech perception hash authentication method, system, medium and equipment based on spectral entropy
Technical Field
The application relates to the technical field of voice authentication of Bluetooth audio, in particular to a voice perception hash authentication method, a system, a storage medium and equipment based on spectral entropy.
Background
Perceptual hashing is a robust hashing technique based on the perceptual features of multimedia that has emerged in recent years. It is a one-way mapping from a set of multimedia data to a set of perceptual digests, that is, multimedia digital representations with the same perceptual content are uniquely mapped to one digital digest. It is robust to content-preserving operations and, from the point of view of perceived media content, has properties similar to those of conventional cryptographic hash functions, such as one-wayness and collision resistance. This allows many application modes of conventional hash functions to be extended to the multimedia field by means of perceptual hashing. For example, a keyed perceptual hash allows the integrity of media content to be authenticated robustly, in the same way that a cryptographic hash is used to authenticate the integrity of data.
CN 107195028A, "A high-precision wireless voice recognition access control system", describes an access control system that implements voice recognition based on perceptual hashing as follows: a voice acquisition system first acquires the voice; the voice is then transmitted through a wireless transmission module; a voice signal processing module generates a perceptual hash sequence; and finally a voice recognition module compares the generated perceptual hash sequence with a pre-stored database to confirm whether recognition is successful. It has the following drawbacks: 1. the voice is not compressed before being transmitted by the wireless transmission module, so the bit rate is high, a larger transmission bandwidth is needed, and power consumption is higher; 2. the wireless receiving end has to complete both the encoding and the decoding of the voice, which occupies more storage space and consumes more computing resources, making the receiving-end system harder to implement; 3. non-negative matrix factorization is applied to subband energies based on MDCT (modified discrete cosine transform) coefficients to reconstruct the hash sequence, but non-negative matrix factorization is computationally expensive, and concentrating all of these modules in a low-power embedded system is difficult to realize; 4. full-band subband energy is used, whereas in practice the spectral components most important to speech perceptual hashing are concentrated at 300 Hz to 3500 Hz, so using full-band subband energy not only reduces accuracy but also increases the amount of computation and the complexity of system implementation.
Disclosure of Invention
Aiming at the technical problems in the prior art, the application provides a voice perception hash authentication method, a system, a storage medium and equipment based on spectral entropy.
The first technical scheme adopted by the application is as follows: the voice perception hash authentication method based on the spectrum entropy comprises the following steps: performing LC3 audio coding on the PCM audio data to obtain audio coding data; performing LC3 audio partial decoding on the audio encoded data to obtain spectral coefficients of each frame of PCM audio data; obtaining pseudo spectrum flatness of the PCM audio data of the corresponding frame according to the spectral coefficient of the PCM audio data of each frame; generating a perception hash sequence according to the pseudo spectrum flatness of each frame of PCM audio data; and comparing the perceived hash sequence with a pre-stored hash database, and judging whether the PCM audio data is the voice of the appointed person.
The second technical scheme adopted by the application is as follows: provided is a voice perception hash authentication system based on spectrum entropy, comprising: a module for performing LC3 audio encoding on the PCM audio data to obtain audio encoded data; a module for performing LC3 audio partial decoding on the audio encoded data to obtain spectral coefficients of each frame of PCM audio data; a module for obtaining pseudo spectrum flatness of the corresponding frame of PCM audio data according to the spectral coefficients of each frame of PCM audio data; a module for generating a perceptual hash sequence based on the pseudo-spectrum flatness of each frame of PCM audio data; and a module for comparing the perceived hash sequence with a pre-stored hash database and judging whether the PCM audio data is the voice of the appointed person.
The third technical scheme adopted by the application is as follows: a computer readable storage medium is provided having stored thereon computer instructions operable to perform a spectral entropy based speech aware hash authentication method in scheme one.
The fourth technical scheme adopted by the application is as follows: there is provided a computer device comprising a processor and a memory, the memory storing a computer program, wherein: the processor operates the computer program to perform the spectral entropy based speech aware hash authentication method of scheme one.
The technical scheme of the application has the following beneficial effects: it can be used in Bluetooth Low Energy, classic Bluetooth and other short-range wireless communication scenarios; by building on the existing LC3 codec, it reduces the amount of computation, the algorithmic delay and the transmission bandwidth of the encoding and decoding process; and by computing the hash sequence from the spectral flatness, it can distinguish speech from noise and the voices of different people, improving the efficiency and accuracy of voice authentication.
Drawings
FIG. 1 is a schematic flow diagram of a specific embodiment of a spectral entropy-based voice perception hash authentication method of the present application;
FIG. 2 is a schematic diagram of one embodiment of a spectral entropy-based speech perceptual hash authentication method of the present application;
fig. 3 is a schematic diagram of a specific embodiment of a speech aware hash authentication system based on spectral entropy according to the present application.
Detailed Description
The preferred embodiments of the present application are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more easily understand the advantages and features of the present application and the scope of protection is clearly defined.
Fig. 1 is a schematic flow chart of a specific embodiment of a voice perception hash authentication method based on spectral entropy.
In the embodiment shown in fig. 1, the speech perception hash authentication method based on spectral entropy of the present application includes a process S101 of performing LC3 audio encoding on PCM audio data to obtain audio encoded data.
In this embodiment, at the Bluetooth transmitting end, LC3 audio encoding is performed on PCM (Pulse Code Modulation) audio data according to the standard specification to obtain audio encoded data, which is then transmitted wirelessly over Bluetooth to the Bluetooth receiving end.
In the embodiment shown in fig. 1, the speech perception hash authentication method based on spectral entropy of the present application includes a process S102, where LC3 audio partial decoding is performed on audio encoded data to obtain spectral coefficients of PCM audio data of each frame.
In this embodiment, the bluetooth receiving end performs LC3 partial decoding on the received audio encoded data according to the standard specification, to obtain spectral coefficients of PCM audio data of each frame, for subsequent generation of the hash sequence.
In a specific embodiment of the present application, performing partial LC3 audio decoding on the audio encoded data comprises: decoding the audio encoded data until transform-domain noise shaping decoding is completed. Performing only partial LC3 decoding omits the LD-IMDCT (low-delay inverse modified discrete cosine transform), overlap-add and LTPF (long-term post-filter) operations, which saves computation and memory and reduces system delay.
In the embodiment shown in fig. 1, the speech perceptual hash authentication method based on spectral entropy of the present application includes a process S103 of obtaining the pseudo-spectrum flatness of each frame of PCM audio data from the spectral coefficients of that frame.
In one embodiment of the present application, obtaining the pseudo-spectrum flatness of each frame of PCM audio data from the spectral coefficients of that frame comprises: obtaining the pseudo spectrum of each frame of PCM audio data from its spectral coefficients; and obtaining the pseudo-spectrum flatness of each frame from the ratio between the geometric mean and the arithmetic mean of its pseudo spectrum. Because pseudo-spectrum flatness depends only weakly on the energy of the audio and mainly on its characteristics (for example, speech and noise differ markedly in pseudo-spectrum flatness), it can be used to distinguish speech from noise and the voices of different people.
In this embodiment, the pseudo spectrum is computed from one frame of spectral coefficients obtained by decoding, where the spectral coefficient at index k is taken to be 0 when k = -1 or k = N_F.
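The pseudo-spectrum formula itself is not present in this text. As a rough sketch only, the code below assumes a commonly used MDCT pseudo-spectrum estimate, X_pseudo(k) = sqrt(X(k)^2 + (X(k+1) - X(k-1))^2), which is consistent with the zero boundary values at k = -1 and k = N_F mentioned above but is not confirmed to be the formula of this application. The function name `pseudo_spectrum` is hypothetical.

```python
import math


def pseudo_spectrum(coeffs):
    """Estimate a pseudo spectrum from one frame of spectral coefficients.

    Assumed formula (not confirmed by the source text):
        X_pseudo(k) = sqrt(X(k)^2 + (X(k+1) - X(k-1))^2)
    with the boundary values X(-1) = X(N_F) = 0.
    """
    n = len(coeffs)
    # padded[i] holds X(i - 1), so padded[0] = X(-1) and padded[n + 1] = X(N_F).
    padded = [0.0] + list(coeffs) + [0.0]
    return [
        math.sqrt(padded[k + 1] ** 2 + (padded[k + 2] - padded[k]) ** 2)
        for k in range(n)
    ]
```

The output has the same length as the input frame, so a 160-coefficient frame yields 160 pseudo-spectral coefficients.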
After obtaining the corresponding pseudo spectrum of each frame of PCM audio data, calculating the geometric mean value of the pseudo spectrum of each frame of PCM audio data and the corresponding arithmetic mean value, and obtaining the ratio between the geometric mean value of the pseudo spectrum of each frame of PCM audio data and the corresponding arithmetic mean value, namely the flatness of the pseudo spectrum of the corresponding frame of PCM audio data.
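The ratio described above can be illustrated with a short sketch. It computes the geometric mean in the log domain to avoid overflow of a long product; the function name `spectral_flatness` and the small guard value `eps` are assumptions introduced here, not part of the application.

```python
import math


def spectral_flatness(values, eps=1e-12):
    """Return geometric mean / arithmetic mean of non-negative values.

    `eps` is a hypothetical guard that keeps log() defined for zero bins.
    """
    n = len(values)
    arithmetic = sum(values) / n
    if arithmetic <= 0.0:
        return 0.0  # an all-zero (silent) frame has no meaningful flatness
    # Geometric mean computed as exp(mean(log(x))) for numerical stability.
    geometric = math.exp(sum(math.log(v + eps) for v in values) / n)
    return geometric / arithmetic
```

A flat (noise-like) spectrum gives a value close to 1, while a peaky (tonal or voiced) spectrum gives a value close to 0, which is the property the text relies on to separate speech from noise.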
In the embodiment shown in fig. 1, the speech perceptual hash authentication method based on spectral entropy of the present application includes a process S104 of generating a perceptual hash sequence according to pseudo spectrum flatness of PCM audio data of each frame.
In one embodiment of the present application, generating a perceptual hash sequence based on pseudo-spectral flatness of each frame of PCM audio data comprises: according to the spectral coefficient of each frame of PCM audio data, pseudo spectral energy of the corresponding frame of PCM audio data is obtained; and obtaining a perception hash value of each frame of PCM audio data according to the ratio between the pseudo spectrum energy of each frame of PCM audio data and the pseudo spectrum flatness of the corresponding frame of PCM audio data.
In one specific example of the present application, the whole pseudo spectrum of each frame of PCM audio data is divided into several subbands of predetermined width, and the effective subbands in which speech energy is concentrated are selected from them. In this case, the pseudo-spectrum flatness of each frame of PCM audio data is obtained from the ratio between the geometric mean and the arithmetic mean of the pseudo spectrum of the effective subbands, and the pseudo-spectrum energy of each frame is obtained from the pseudo spectrum of the effective subbands. Using only the effective subbands saves computation and storage space and can improve accuracy.
Specifically, the whole pseudo spectrum of each frame of PCM audio data is first divided into several subbands of predetermined width, where the number of subbands is SUBBAND_NUM and the width of each subband is SUBBAND_WIDTH. For example, when the sampling rate is 16 kHz and the frame length is 10 ms, the modified discrete cosine transform outputs 160 spectral coefficients, from which 160 pseudo-spectral coefficients are obtained:
X_pseudo(0), X_pseudo(1), …, X_pseudo(159)
The 160 pseudo-spectral coefficients may then be divided into 40 subbands, so that each subband has 4 pseudo-spectral coefficients:
The pseudo-spectral coefficients of the 1st subband are: X_pseudo(0), X_pseudo(1), …, X_pseudo(3);
The pseudo-spectral coefficients of the 2nd subband are: X_pseudo(4), X_pseudo(5), …, X_pseudo(7);
……
The pseudo-spectral coefficients of the 40th subband are: X_pseudo(156), X_pseudo(157), …, X_pseudo(159).
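The subband split in this example can be sketched as follows. The function name `split_into_subbands` is hypothetical; the width of 4 and the frame size of 160 come from the example above.

```python
def split_into_subbands(pseudo, subband_width=4):
    """Split a frame's pseudo spectrum into consecutive equal-width subbands."""
    if len(pseudo) % subband_width != 0:
        raise ValueError("frame length must be a multiple of the subband width")
    return [
        pseudo[start:start + subband_width]
        for start in range(0, len(pseudo), subband_width)
    ]


# Example: 160 coefficients with width 4 give 40 subbands of 4 coefficients each.
subbands = split_into_subbands(list(range(160)), subband_width=4)
```

Here the list indices stand in for the coefficient positions, so the first subband holds positions 0 to 3 and the last holds positions 156 to 159.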
After the subbands of predetermined width are obtained, the effective subbands in which speech energy is concentrated are selected from them. Speech energy is concentrated mainly in the 300 Hz to 3500 Hz band, so, for efficiency and ease of calculation, the effective subbands numbered 1 to 18 may be selected, corresponding to a frequency range of 200 Hz to 3600 Hz. The minimum and maximum effective subband numbers are denoted sbMin and sbMax respectively. The pseudo-spectrum energy, the geometric mean of the pseudo spectrum and the arithmetic mean of the pseudo spectrum of the effective subbands can then be calculated as follows:
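The frequency range quoted above follows from simple arithmetic: at a 16 kHz sampling rate the 160 coefficients span 0 Hz to 8000 Hz, so each coefficient covers 50 Hz and each 4-coefficient subband covers 200 Hz. The sketch below works this out. The function name and the zero-based subband indexing are assumptions made for illustration; with zero-based indices, subbands 1 through 17 cover 200 Hz to 3600 Hz.

```python
def subband_range_hz(index, sample_rate_hz=16000, num_coeffs=160, subband_width=4):
    """Return (low, high) edge frequencies in Hz of a zero-based subband index."""
    # Coefficients span 0 .. sample_rate/2, so each one covers this many Hz.
    bin_hz = (sample_rate_hz / 2) / num_coeffs
    low = index * subband_width * bin_hz
    high = (index + 1) * subband_width * bin_hz
    return (low, high)


# Edges of the first and last subbands of the assumed zero-based range 1..17.
first_low, _ = subband_range_hz(1)
_, last_high = subband_range_hz(17)
```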
First, the pseudo-spectrum energy of the pseudo spectrum corresponding to the effective subbands is calculated.
Second, the geometric mean of the pseudo spectrum of the effective subbands is calculated.
Third, the arithmetic mean of the pseudo spectrum of the effective subbands is calculated.
The pseudo-spectrum flatness of the corresponding frame of PCM audio data is then obtained as the ratio of the geometric mean to the arithmetic mean of the pseudo spectrum of the effective subbands.
Finally, the ratio between the pseudo-spectrum energy of the effective subbands and the pseudo-spectrum flatness is calculated.
In these calculations, μ1 is used to avoid taking the logarithm of 0 during silence and usually takes the value 1; μ2 is used to avoid a zero denominator and usually takes the value 0.0001.
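The formulas for these steps are not present in this text, so the following is only one plausible reading, offered as a sketch. It assumes that the energy is log10(μ1 + sum of squared pseudo-spectral coefficients) and that the final ratio is energy / (flatness + μ2). Both forms are assumptions inferred from the stated roles of μ1 and μ2, and the function name is hypothetical.

```python
import math

MU1 = 1.0     # guards the logarithm during silence (value given in the text)
MU2 = 0.0001  # guards the denominator (value given in the text)


def energy_flatness_ratio(band_values):
    """Ratio of pseudo-spectrum energy to pseudo-spectrum flatness.

    Assumed forms (not confirmed by the source text):
        energy   = log10(MU1 + sum(x^2))
        flatness = geometric_mean(x) / arithmetic_mean(x)
        ratio    = energy / (flatness + MU2)
    `band_values` holds the pseudo-spectral coefficients of the effective subbands.
    """
    n = len(band_values)
    energy = math.log10(MU1 + sum(v * v for v in band_values))
    arithmetic = sum(band_values) / n
    if arithmetic > 0.0 and all(v > 0.0 for v in band_values):
        geometric = math.exp(sum(math.log(v) for v in band_values) / n)
        flatness = geometric / arithmetic
    else:
        flatness = 0.0  # any zero bin makes the geometric mean zero
    return energy / (flatness + MU2)
```

Under these assumptions a silent frame gives a ratio of 0, because the energy term becomes log10(1) = 0 while μ2 keeps the denominator non-zero.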
In one embodiment of the present application, obtaining the perceptual hash value of each frame of PCM audio data from the ratio between the pseudo-spectrum energy and the pseudo-spectrum flatness of that frame comprises: taking the perceptual hash value of the current frame as 1 when the ratio between the pseudo-spectrum energy and the pseudo-spectrum flatness of the current frame is greater than the same ratio for the previous frame, and as 0 otherwise.
Specifically, the ratio between the pseudo-spectrum energy of the effective subbands and the pseudo-spectrum flatness is compared between the current frame and the previous frame to obtain the perceptual hash value: h(m) = 1 when the ratio for frame m is greater than the ratio for frame m-1, and h(m) = 0 otherwise.
Here h(m) is the hash value of the m-th frame, so that a perceptual hash sequence of length N is constructed as:
h(m-N+1),…,h(m-1),h(m)
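The frame-to-frame comparison rule can be sketched as below. The function name `perceptual_hash_bits` is hypothetical, and since the first frame has no predecessor, this sketch simply emits one bit per frame from the second frame onward, which is an assumption about boundary handling.

```python
def perceptual_hash_bits(ratios):
    """Map per-frame energy/flatness ratios to hash bits.

    h(m) = 1 if ratio(m) > ratio(m - 1), otherwise 0.
    The first frame has no previous frame, so N ratios yield N - 1 bits.
    """
    return [1 if cur > prev else 0 for prev, cur in zip(ratios, ratios[1:])]


# Example: rising, equal, falling, rising.
bits = perceptual_hash_bits([1.0, 2.0, 2.0, 1.5, 3.0])
```

Because each bit depends only on whether the ratio went up, the sequence is insensitive to overall gain changes that scale consecutive frames similarly.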
In the specific embodiment shown in fig. 1, the voice perception hash authentication method based on spectral entropy of the present application includes a process S105 of comparing a perception hash sequence with a pre-stored hash database to determine whether PCM audio data is voice of a designated person.
In a specific embodiment of the present application, comparing the perceptual hash sequence with a pre-stored hash database to determine whether the PCM audio data is the voice of a designated person comprises: calculating the arithmetic mean of the exclusive-OR values between the perceptual hash value of each frame of PCM audio data and the corresponding hash value pre-stored in the hash database to obtain the hash distance of the PCM audio data; and determining that the PCM audio data is the voice of the designated person when the hash distance is not greater than a predetermined threshold, and that it is not otherwise.
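Specifically, the hash distance is first calculated as the arithmetic mean, over the N frames of the sequence, of the exclusive-OR between each perceptual hash value and the corresponding pre-stored hash value.
The comparison result is then judged: the PCM audio data is determined to be the voice of the designated person if the hash distance is not greater than T, and not to be otherwise.
Here T is a predetermined threshold, which is an empirical value and may generally be 0.05 to 0.1.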
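The distance and decision steps amount to a normalized Hamming distance followed by a threshold test, which can be sketched as follows. The function names and the default threshold of 0.1 (the upper end of the range given above) are assumptions for illustration.

```python
def hash_distance(bits, reference):
    """Arithmetic mean of the bitwise XOR of two equal-length hash sequences."""
    if not bits or len(bits) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a ^ b for a, b in zip(bits, reference)) / len(bits)


def is_designated_speaker(bits, reference, threshold=0.1):
    """Accept when the hash distance is not greater than the threshold T."""
    return hash_distance(bits, reference) <= threshold


# Example: one differing bit out of 20 gives a distance of 0.05.
stored = [1, 0, 1, 1] * 5
probe = list(stored)
probe[0] ^= 1
accepted = is_designated_speaker(probe, stored)
```

A distance of 0 means the two sequences are identical, and a distance near 0.5 is what two unrelated bit sequences would be expected to produce.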
Fig. 2 is a schematic diagram of a specific embodiment of a speech perception hash authentication method based on spectral entropy according to the present application.
As shown in fig. 2, LC3 audio encoding is performed on the PCM audio data at the Bluetooth transmitting end. LC3 encoding compresses the audio data effectively and so reduces transmission bandwidth and power consumption. The audio encoded data is then transmitted wirelessly over Bluetooth to the Bluetooth receiving end, which performs partial LC3 decoding on it to obtain the spectral coefficients, carries out the operations described for the embodiment of fig. 1 on those coefficients to generate a perceptual hash sequence, and compares the perceptual hash sequence with a pre-stored hash database to output an authentication result. In this embodiment, LC3 encoding is completed at the Bluetooth transmitting end, where the operations related to the long-term post-filter are omitted, saving computation and storage space at the encoding end. Partial LC3 decoding is performed at the Bluetooth receiving end, where the LD-IMDCT, overlap-add and LTPF operations are omitted, further saving computation and storage space and reducing system delay. Also at the receiving end, only the pseudo spectrum of the effective subbands derived from the partially decoded spectral coefficients is used, which saves computation and storage space and can improve accuracy. The perceptual hash values are obtained from the pseudo-spectrum flatness to form the perceptual hash sequence, which is compared with the pre-stored hash database. Overall, this speeds up the voice authentication process, saves storage space, reduces transmission bandwidth, and improves the accuracy of the authentication result.
In the voice perception hash authentication method based on the spectrum entropy, the operation amount, the algorithm time delay and the transmission bandwidth in the encoding and decoding process can be effectively reduced by adopting the technology combined with the existing LC3 encoder and decoder, and the hash sequence obtained based on the pseudo spectrum flatness can effectively distinguish voice and noise and voices of different people, so that the efficiency and the accuracy of voice authentication are improved.
Fig. 3 is a schematic diagram of a specific embodiment of a speech aware hash authentication system based on spectral entropy according to the present application.
In the embodiment shown in fig. 3, the speech-aware hash authentication system based on spectral entropy of the present application includes a module 301 for performing LC3 audio encoding on PCM audio data to obtain audio encoded data.
In the embodiment shown in fig. 3, the speech-aware hash authentication system based on spectral entropy of the present application includes a module 302 for performing LC3 audio partial decoding on audio encoded data to obtain spectral coefficients of PCM audio data of each frame.
In a specific embodiment of the present application, performing partial LC3 audio decoding on the audio encoded data comprises: decoding the audio encoded data until transform-domain noise shaping decoding is completed. Performing only partial LC3 decoding omits the LD-IMDCT (low-delay inverse modified discrete cosine transform), overlap-add and LTPF (long-term post-filter) operations, which saves computation and memory and reduces system delay.
In the embodiment shown in fig. 3, the speech perception hash authentication system based on spectral entropy of the present application includes a module 303, configured to obtain pseudo spectrum flatness of PCM audio data of each frame according to spectral coefficients of the PCM audio data of the corresponding frame.
In one embodiment of the present application, obtaining the pseudo-spectrum flatness of each frame of PCM audio data from the spectral coefficients of that frame comprises: obtaining the pseudo spectrum of each frame of PCM audio data from its spectral coefficients; and obtaining the pseudo-spectrum flatness of each frame from the ratio between the geometric mean and the arithmetic mean of its pseudo spectrum. Because pseudo-spectrum flatness depends only weakly on the energy of the audio and mainly on its characteristics (for example, speech and noise differ markedly in pseudo-spectrum flatness), it can be used to distinguish speech from noise and the voices of different people.
In the embodiment shown in fig. 3, the speech perceptual hash authentication system based on spectral entropy of the present application comprises a module 304 for generating a perceptual hash sequence based on pseudo-spectral flatness of each frame of PCM audio data.
In one embodiment of the present application, generating a perceptual hash sequence based on pseudo-spectral flatness of each frame of PCM audio data comprises: according to the spectral coefficient of each frame of PCM audio data, pseudo spectral energy of the corresponding frame of PCM audio data is obtained; and obtaining a perception hash value of each frame of PCM audio data according to the ratio between the pseudo spectrum energy of each frame of PCM audio data and the pseudo spectrum flatness of the corresponding frame of PCM audio data.
In one specific example of the present application, the whole pseudo spectrum of each frame of PCM audio data is divided into several subbands of predetermined width, and the effective subbands in which speech energy is concentrated are selected from them. In this case, the pseudo-spectrum flatness of each frame of PCM audio data is obtained from the ratio between the geometric mean and the arithmetic mean of the pseudo spectrum of the effective subbands, and the pseudo-spectrum energy of each frame is obtained from the pseudo spectrum of the effective subbands. Using only the effective subbands saves computation and storage space and can improve accuracy.
In one embodiment of the present application, obtaining the perceptual hash value of each frame of PCM audio data from the ratio between the pseudo-spectrum energy and the pseudo-spectrum flatness of that frame comprises: taking the perceptual hash value of the current frame as 1 when the ratio between the pseudo-spectrum energy and the pseudo-spectrum flatness of the current frame is greater than the same ratio for the previous frame, and as 0 otherwise.
In the specific embodiment shown in fig. 3, the speech perception hash authentication system based on spectral entropy of the present application includes a module 305 for comparing the perception hash sequence with a pre-stored hash database, and determining whether PCM audio data is speech of a designated person.
In a specific embodiment of the present application, comparing the perceptual hash sequence with a pre-stored hash database to determine whether the PCM audio data is the voice of a designated person comprises: calculating the arithmetic mean of the exclusive-OR values between the perceptual hash value of each frame of PCM audio data and the corresponding hash value pre-stored in the hash database to obtain the hash distance of the PCM audio data; and determining that the PCM audio data is the voice of the designated person when the hash distance is not greater than a predetermined threshold, and that it is not otherwise.
In the voice perception hash authentication system based on the spectrum entropy, the calculation amount, the algorithm time delay and the transmission bandwidth in the encoding and decoding process can be effectively reduced by adopting the technology combined with the existing LC3 encoder and decoder, and the hash sequence obtained based on the pseudo spectrum flatness can effectively distinguish voice and noise and voices of different people, so that the efficiency and the accuracy of voice authentication are improved.
In one embodiment of the application, a computer-readable storage medium stores computer instructions operable to perform the speech perceptual hash authentication method based on spectral entropy described in any of the embodiments. The method may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one embodiment of the application, a computer device includes a processor and a memory storing computer instructions, wherein the processor executes the computer instructions to perform the spectral entropy-based speech-aware hash authentication method described in any of the embodiments.
The foregoing description is merely illustrative of the present application and is not intended to limit its scope. All equivalent structural changes made on the basis of the description and the accompanying drawings of the present application, and any direct or indirect application in other related technical fields, are likewise included in the scope of the present application.

Claims (10)

1. A voice perception hash authentication method based on spectral entropy, characterized by comprising the following steps:
performing LC3 audio encoding on PCM audio data to obtain audio encoded data;
performing LC3 audio partial decoding on the audio encoded data to obtain spectral coefficients of the PCM audio data for each frame;
obtaining pseudo spectrum flatness of the PCM audio data of the corresponding frame according to the spectral coefficients of the PCM audio data of each frame;
generating a perceptual hash sequence according to the pseudo spectrum flatness of the PCM audio data of each frame; and
comparing the perceptual hash sequence with a pre-stored hash database, and determining whether the PCM audio data is the voice of a designated person.
2. The voice perceptual hash authentication method based on spectral entropy as defined in claim 1, wherein the obtaining pseudo spectrum flatness of the PCM audio data according to spectral coefficients of the PCM audio data for each frame comprises:
obtaining, according to the spectral coefficients of the PCM audio data of each frame, the corresponding pseudo spectrum of the PCM audio data of each frame; and
obtaining the pseudo spectrum flatness of the PCM audio data of the corresponding frame according to the ratio between the geometric mean of the pseudo spectrum of the PCM audio data of each frame and the corresponding arithmetic mean.
3. The spectral entropy-based voice perceptual hash authentication method of claim 1, wherein the generating a perceptual hash sequence according to pseudo spectrum flatness of the PCM audio data for each frame comprises:
obtaining pseudo spectrum energy of the PCM audio data of the corresponding frame according to the spectral coefficients of the PCM audio data of each frame; and
obtaining a perceptual hash value of the PCM audio data of each frame according to the ratio between the pseudo spectrum energy of the PCM audio data of each frame and the pseudo spectrum flatness of the PCM audio data of the corresponding frame.
4. The voice perceptual hash authentication method based on spectral entropy as defined in claim 3, wherein said obtaining a perceptual hash value of the PCM audio data for each frame based on a ratio between pseudo spectrum energy of the PCM audio data for each frame and pseudo spectrum flatness of the PCM audio data for the corresponding frame comprises:
taking the perceptual hash value of the PCM audio data of the current frame as 1 when the ratio between the pseudo spectrum energy of the PCM audio data of the current frame and the pseudo spectrum flatness of the PCM audio data of the current frame is greater than the ratio between the pseudo spectrum energy of the PCM audio data of the previous frame and the pseudo spectrum flatness of the PCM audio data of the previous frame, and otherwise taking the perceptual hash value as 0.
5. The spectral entropy-based voice perceptual hash authentication method of claim 3, further comprising:
dividing the entire pseudo spectrum of the PCM audio data of each frame into a plurality of sub-bands of predetermined width; and
selecting, from the plurality of sub-bands of predetermined width, an effective sub-band in which speech energy is concentrated, wherein
the obtaining the pseudo spectrum flatness of the PCM audio data of the corresponding frame according to the spectral coefficients of the PCM audio data of each frame comprises obtaining the pseudo spectrum flatness of the PCM audio data of the corresponding frame according to the ratio between the geometric mean of the pseudo spectrum of the effective sub-band and the corresponding arithmetic mean, and
the obtaining the pseudo spectrum energy of the PCM audio data of the corresponding frame according to the spectral coefficients of the PCM audio data of each frame comprises obtaining the pseudo spectrum energy of the PCM audio data of the corresponding frame according to the pseudo spectrum corresponding to the effective sub-band.
6. The spectral entropy-based voice perceptual hash authentication method of claim 3, wherein the comparing the perceptual hash sequence with a pre-stored hash database to determine whether the PCM audio data is the voice of a designated person comprises:
calculating the arithmetic mean of the exclusive-OR values between the perceptual hash value of the PCM audio data of each frame and the corresponding hash value pre-stored in the hash database to obtain the hash distance of the PCM audio data; and
determining that the PCM audio data is the voice of the designated person when the hash distance is not greater than a preset threshold, and otherwise determining that the PCM audio data is not the voice of the designated person.
7. The spectral entropy-based voice perceptual hash authentication method of claim 1, wherein the performing LC3 audio partial decoding on the audio encoded data comprises: decoding the audio encoded data until transform domain noise shaping decoding is completed.
8. A spectral entropy-based speech perception hash authentication system, comprising:
a module for performing LC3 audio encoding on PCM audio data to obtain audio encoded data;
a module for performing LC3 audio partial decoding on the audio encoded data to obtain spectral coefficients of the PCM audio data for each frame;
a module for obtaining pseudo spectrum flatness of the PCM audio data of the corresponding frame according to the spectral coefficients of the PCM audio data of each frame;
a module for generating a perceptual hash sequence based on the pseudo spectrum flatness of the PCM audio data for each frame; and
a module for comparing the perceptual hash sequence with a pre-stored hash database and determining whether the PCM audio data is the voice of a designated person.
9. A computer readable storage medium storing a computer program, wherein the computer program is operative to perform the spectral entropy based speech perceptual hash authentication method of any of claims 1-7.
10. A computer device comprising a processor and a memory, the memory storing a computer program, wherein the processor executes the computer program to perform the spectral entropy based speech perceptual hash authentication method of any of claims 1-7.
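As an informal illustration of the hash generation and comparison recited in claims 3, 4 and 6, and not as part of the claims, the sketch below derives one bit per frame from the energy-to-flatness ratio of consecutive frames and compares the result with a stored reference using the mean of the bitwise exclusive-OR. The function names, the sample values and the threshold of 0.25 are hypothetical. Starting the sequence at the second frame is also an assumption, since the first frame has no predecessor to compare against.

```python
def perceptual_hash(energies, flatnesses):
    """Return one bit per frame, starting at the second frame.

    A bit is 1 when the current frame's energy/flatness ratio exceeds
    the previous frame's ratio, and 0 otherwise.
    """
    ratios = [e / f for e, f in zip(energies, flatnesses)]
    return [1 if cur > prev else 0 for prev, cur in zip(ratios, ratios[1:])]


def hash_distance(bits, reference):
    """Arithmetic mean of the bitwise XOR between two equal-length hashes."""
    if not bits or len(bits) != len(reference):
        raise ValueError("hash sequences must be non-empty and of equal length")
    return sum(a ^ b for a, b in zip(bits, reference)) / len(bits)


def is_designated_speaker(bits, reference, threshold=0.25):
    """Accept when the distance is not greater than the threshold.

    ASSUMPTION: 0.25 is a placeholder; the source only says 'preset threshold'.
    """
    return hash_distance(bits, reference) <= threshold


# Hypothetical per-frame values, chosen only to exercise the functions.
energies = [4.0, 9.0, 2.0, 8.0, 8.0]
flatnesses = [0.5, 0.3, 0.4, 0.2, 0.4]
stored_reference = [1, 0, 1, 1]

bits = perceptual_hash(energies, flatnesses)
distance = hash_distance(bits, stored_reference)
accepted = is_designated_speaker(bits, stored_reference)
```

Because each bit records only whether the ratio rose relative to the previous frame, the sequence in this sketch does not change when the whole signal is scaled by a constant gain.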
CN202410289365.4A 2024-03-14 2024-03-14 Speech perception hash authentication method, system, medium and equipment based on spectral entropy Pending CN118136023A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410289365.4A CN118136023A (en) 2024-03-14 2024-03-14 Speech perception hash authentication method, system, medium and equipment based on spectral entropy

Publications (1)

Publication Number Publication Date
CN118136023A true CN118136023A (en) 2024-06-04

Family

ID=91246864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410289365.4A Pending CN118136023A (en) 2024-03-14 2024-03-14 Speech perception hash authentication method, system, medium and equipment based on spectral entropy

Country Status (1)

Country Link
CN (1) CN118136023A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination