CN112687282A - Voice source tracking method based on fingerprint image perceptual hashing - Google Patents

Voice source tracking method based on fingerprint image perceptual hashing

Info

Publication number
CN112687282A
CN112687282A (application CN202011401234.9A)
Authority
CN
China
Prior art keywords
fingerprint image; voice; hash; fingerprint; perceptual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011401234.9A
Other languages
Chinese (zh)
Inventor
刘林 (Liu Lin)
贾鹏 (Jia Peng)
刘亮 (Liu Liang)
张磊 (Zhang Lei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202011401234.9A priority Critical patent/CN112687282A/en
Publication of CN112687282A publication Critical patent/CN112687282A/en
Pending legal-status Critical Current

Landscapes

  • Collating Specific Patterns (AREA)

Abstract

The invention relates to a voice source tracking method based on fingerprint image perceptual hashing, and belongs to the technical field of information processing and voice tracking. The method comprises the following steps: step 1, performing perceptual hashing on a fingerprint image based on image characteristics to generate a hashed fingerprint image; step 2, embedding the hash fingerprint image generated in step 1 into the voice signal in the manner of a digital watermark, thereby binding the unique biological characteristics of the speaker with the voice data and obtaining audio embedded with the perceptual hash of the fingerprint image of the speaker who is the voice source; and step 3, performing identity authentication on the voice using the fingerprint image perceptual hash. The method performs identity authentication based on the perceptual hash generated from the fingerprint image, effectively avoiding the technical defect that voice recognition technology is easily disturbed by external environmental factors; it has certain robustness and collision resistance, meets the uniqueness requirement of the perceptual hash, and also has strong security.

Description

Voice source tracking method based on fingerprint image perceptual hashing
Technical Field
The invention relates to a voice source tracking method based on fingerprint image perceptual hashing, and belongs to the technical field of information processing and voice tracking.
Background
With the acceleration of the digitization of social life, in today's open communication environment it is very easy to tamper with stored voice content and to eavesdrop on transmitted voice, and the source of a voice recording is difficult to authenticate. For example, once economic or legal disputes arise in financial operations or voice-order business, the responsibility of the speaker cannot easily be traced; voice evidence to be admitted in a judicial judgment can obstruct justice if part of its content is maliciously tampered with during storage or transmission but not discovered in time. Even if the content is not tampered with, the possibility that a perjurer, hoping to escape legal sanction, denies having given the testimony at all cannot be ignored. Therefore, research on tracking the source of voice and confirming the identity of the speaker is significant.
The Perceptual Hashing function takes multimedia data as input and outputs a perceptual digest, unidirectionally mapping multimedia digital information with the same perceptual content into a short digital digest. The most advantageous characteristic of perceptual hashing is its perceptual robustness: it tolerates the small-amplitude distortion and deformation frequently encountered when the input object is acquired. When constrained by a perceptual threshold, the perceptual hash function is transitive; at the same time, because the hash algorithm is collision-resistant, multimedia information with completely different perceptual content cannot map to the same perceptual hash value. The perceptual hash effectively reduces the dimensionality of the target object's feature vector, occupies extremely little data capacity, and is therefore suitable for generating a feature-value mark.
The main research directions of voice recognition technology fall into two categories: the first identifies and understands the content of the speaker's voice; the second identifies the speaker's voice by its unique characteristics, thereby identifying the speaker. The first category has produced many mature products, with voice-based human-computer interaction widely used in everyday life. The second category, which aims to bring speaker-identification results into real life, is still limited by many uncertain factors: first, voice signals are non-stationary, microphones differ during acquisition, and their filtering and capture of the original sound source differ; second, a speaker's voice is easily imitated or replayed by high-resolution recording equipment; third, the speaker's identity characteristics are difficult to separate thoroughly from the voice-content characteristics, and feature parameters that both uniquely identify different speakers and resist other external interference are lacking.
When voice signals are used for identity recognition, they contain not only the identity-characteristic information of the individual speaker but also the speaker's voice-content information; that is, the identity-characteristic signal and the voice-content signal are aliased, and the difference between the two signals is extremely small. Even a high-fidelity, high-resolution voice recognition system operating under relatively ideal external conditions is affected by multiple complex external factors in practical use, reducing the accuracy of voice-signal recognition. Therefore, the technical route of using the voice signal as the sole analysis target for speaker identification is difficult to apply at scale in real-life environments, and other types of identification information must be added to assist the voice signal in identifying the speaker.
This application studies the application of speaker biometric fingerprint image perceptual hashing to voice source tracking. Based on three fingerprint image perceptual hash generation schemes — center of gravity, pixel expectation, and center-of-gravity angle — the generated fingerprint image perceptual hash value is embedded into the voice information as a watermark. When the voice source needs to be tracked, the perceptual hash value extracted from the voice information is compared with the perceptual hash values of the fingerprint image library, thereby realizing tracking of the voice source.
Disclosure of Invention
The invention aims to solve the technical problem that, in existing identification and source-authentication of voice signals, recognition accuracy is low due to limitations of acquisition conditions and information-processing means, and provides a voice source tracking method based on fingerprint image perceptual hashing.
In order to achieve the above purpose, the present invention adopts the following technical scheme.
The voice source tracking method based on the fingerprint image perceptual hash comprises the following steps:
step 1, performing perceptual hashing on a fingerprint image based on image characteristics to generate a hashed fingerprint image;
wherein the image characteristics yield three variants of the fingerprint image hash: based on the center of gravity, the pixel expectation, and the center-of-gravity angle;
step 1, specifically comprising the following substeps:
step 1.1 calculate MD5 value with user input as key and calculated MD5 value as pseudo random number generator seed;
step 1.2, carrying out image acquisition on the user fingerprint, and randomly dividing a plurality of rectangular areas from the fingerprint image of the speaker through key control;
wherein, in the random division, the seed of the random number generator is generated in the step 1.1;
step 1.3, selecting parameters with good geometric invariance characteristics in each rectangular area generated in the step 1.2 as analysis objects, quantizing the selected parameters, and forming fingerprint image perception hash for identifying identity information of speakers to generate fingerprint images;
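The sub-steps above (user key → MD5 value → pseudo-random number generator seed → random rectangular areas) can be sketched as follows. This is a minimal illustration, not the patent's implementation; `seed_from_key`, `random_rectangles`, and the rectangle size bounds are assumptions.

```python
import hashlib
import random

def seed_from_key(key: str) -> int:
    # Step 1.1: derive a PRNG seed from the MD5 value of the user-input key
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16)

def random_rectangles(rng: random.Random, img_size: int, n_rects: int,
                      min_side: int = 8, max_side: int = 64):
    # Step 1.2: randomly place n_rects (possibly overlapping) rectangles
    # inside an img_size x img_size fingerprint image, under key control
    rects = []
    for _ in range(n_rects):
        w = rng.randint(min_side, max_side)
        h = rng.randint(min_side, max_side)
        x = rng.randint(0, img_size - w)
        y = rng.randint(0, img_size - h)
        rects.append((x, y, w, h))
    return rects

rng = random.Random(seed_from_key("user-password"))
rects = random_rectangles(rng, img_size=300, n_rects=150)
```

Because the partition is fully determined by the key, the same key reproduces the same rectangles, while a different key yields a different partition — the property the scheme relies on for security.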
step 2, embedding the hash fingerprint image generated in step 1 into the voice signal in the manner of a digital watermark, thereby binding the unique biological characteristics of the speaker with the voice data and obtaining audio embedded with the perceptual hash of the fingerprint image of the speaker who is the voice source;
step 2, specifically comprising the following substeps:
step 2.1, dividing the original audio signal A into M segments, the length of each segment being N;
step 2.2, dividing each audio segment into M1 sub-bands, the length of each sub-band being N/M1; for convenience of watermark embedding, the number of sub-bands M1 is set equal to the length of the perceptual hash in bits;
step 2.3, embedding the perceptual hash value into each of the M audio segments by the LSB (least significant bit) method; specifically, the 0 and 1 bits of the hash value are embedded into the 1st sampling point of each of the M1 sub-bands of the corresponding audio segment; this is repeated segment by segment until all M audio segments are embedded, yielding the audio A' embedded with the perceptual hash of the fingerprint image of the speaker who is the voice source;
step 3, identity authentication is carried out on the voice by utilizing fingerprint image perception hash, and the method specifically comprises the following steps:
let h1 and h2 respectively represent the perceptual hash values of two fingerprint images, and measure the similarity of the two fingerprint images by the normalized Hamming distance;
X represents a fingerprint image, Y represents the image X after a content-preserving operation, and Z represents another fingerprint image different from X; Hk represents the perceptual hash function controlled by the key K; the comparison of perceptual hash values satisfies the following conditions:
D(Hk(X), Hk(Y)) < T1
D(Hk(X), Hk(Z)) > T2
wherein 0 ≤ T1 < T2 ≤ 0.5;
Step 3, specifically comprising the following substeps:
step 3.1, extracting the perceptual hash value H1 from the lowest-order bits of the voice samples;
step 3.2, matching H1 one by one against the perceptual hash values H2 in the fingerprint image perceptual hash library and calculating the normalized Hamming distance D(H1, H2); if D(H1, H2) < T, the matching is successful, i.e., the identity of the speaker who is the voice source can be tracked through the fingerprint image corresponding to H2;
wherein T is the similarity threshold, 0 ≤ T ≤ 0.5.
Advantageous effects
The invention provides a voice source tracking method based on fingerprint image perceptual hashing, which has the following beneficial effects compared with the traditional voice source tracking method:
1. the method combines a plurality of advantageous characteristics of the biological fingerprint image with an encryption key, and embeds the calculated perceptual hash into the voice signal by referring to the digital watermark, so that the identity of the speaker in the voice source is bound with the voice signal, and the tracking of the speaker in the voice source is realized;
2. the method performs identity authentication based on the perceptual hash generated from the fingerprint image, effectively avoiding the technical defect that voice recognition technology is easily disturbed by external environmental factors, and has certain robustness;
3. the method has certain robustness and collision resistance, meets the uniqueness requirement of the perceptual hash, and also has strong security and authentication uniqueness.
Drawings
FIG. 1 is a flowchart of a voice source tracking method based on fingerprint image perceptual hashing according to the present invention;
FIG. 2 is a schematic diagram illustrating a voice source tracking method based on fingerprint image perceptual hashing according to the present invention;
FIG. 3 is a schematic diagram illustrating a gravity-based biometric fingerprint image perceptual hash generation method for voice source tracking based on fingerprint image perceptual hash according to the present invention;
FIG. 4 is a 300X 300 fingerprint image from a fingerprint image library;
FIG. 5 shows a tilted fingerprint image after rotation by the different angles 1°, 2°, 5°, 10°, 30° and 90°;
FIG. 6 compares the rotation-attack resistance of the three perceptual hash algorithms after the fingerprint image is rotated by different angles;
FIG. 7 is a histogram of the matching values.
Detailed Description
The following describes a voice source tracking method based on perceptual hashing of a fingerprint image in detail with reference to specific embodiments and the accompanying drawings.
Example 1
This embodiment describes steps and specific implementation of a voice source tracking method based on perceptual hashing of fingerprint images according to the present invention, and a flow thereof is shown in fig. 1.
In fig. 1, a fingerprint image of the speaker is first captured by a smartphone or fingerprint scanner; a perceptual hash value is then generated from the fingerprint image and embedded in the voice data as a watermark, yielding a voice signal embedded with the fingerprint image perceptual hash, which is transmitted remotely to the receiving end over a transmission channel. When the receiving end needs to track the identity of the speaker of a voice source, the perceptual hash value is first extracted from the voice information and then compared one by one with the perceptual hash values in the fingerprint image perceptual hash library; when its similarity with some perceptual hash value exceeds a certain threshold, the match succeeds, and the identity of the speaker of the voice source is tracked. Since the perceptual hash value is closely tied to the speaker's fingerprint image and different keys are used, the same perceptual hash value essentially never occurs for fingerprint images of different people; the requirements of collision resistance and security are thus satisfied. In addition, the center of gravity of an image resists geometric attacks such as translation and rotation and is highly stable, tolerating the slight geometric distortion of the fingerprint image caused by factors such as angle and pressing force during fingerprint entry, so the corresponding hash value also meets the robustness requirement.
In the specific implementation, as shown in fig. 2, the key is input by the user in the form of a character string. An MD5 value is calculated from the key and used as the seed of a pseudo-random number generator. The speaker's fingerprint is then captured as an image, and the fingerprint image is randomly divided into several rectangular areas under key control. In each rectangular area, parameters with good geometric-invariance characteristics are selected as analysis objects and quantized; finally, the fingerprint image perceptual hash identifying the speaker's identity information is formed, a fingerprint image is generated, and feature extraction, matching and tracking are then performed.
Perceptual hashing maps multimedia objects of any size to a short output according to human perceptual characteristics, so that multimedia objects with the same perception but different forms of expression produce similar or even identical perceptual hash values through the perceptual hash function. Image perceptual hashing maps a digital image into a string of fixed-length feature values. The effect resembles the judgment of the human visual perception system: two images with completely different content yield different perceptual hash values, while images with similar visual effect but slightly different sharpness or shooting angle yield similar or even identical perceptual hash values. This application realizes voice tracking based on perceptual hashing.
In the specific implementation of step 1, the perceptual hashing of the fingerprint image based on image characteristics comes in three variants — based on the center of gravity, the pixel expectation, and the center-of-gravity angle — realized respectively in 1A, 1B and 1C;
1A) the fingerprint image perception hash generation based on the gravity center is specifically that the perception hash generated by the gravity center of the biological fingerprint image is calculated firstly, and then the hash value is embedded into voice data, so that the identity of a speaker is tracked, and the specific implementation is as shown in fig. 2. The method comprises the following steps:
step 1A.1) a user inputs a password (key), an MD5 value of the key is calculated, and the MD5 value is used as a seed of a pseudorandom number generator;
step 1A.2) designate a positive integer N as the number of rectangular areas (i.e., the number of image blocks); under the control of the random number generator, randomly divide the fingerprint image into N possibly-overlapping rectangular areas;
step 1A.3) calculate the center of gravity of each rectangular area and the center of gravity of its complement image using the center-of-gravity formula, and calculate the distance L between the two centers of gravity;
wherein, in order to lengthen the distance between the two centers of gravity, a modified center-of-gravity formula is adopted, as follows:
(modified center-of-gravity formulas for m_x and m_y, rendered as equation images in the original document and not reproduced here)
wherein f(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ N, represents the gray value of the biometric fingerprint image, and the parameter δ ∈ R+; the modified center-of-gravity formula enlarges the distance between the two centers of gravity, while the parameter δ preserves the robustness of the fingerprint image's center of gravity;
step 1A.4) combine the N distances L into a column vector, then round and quantize it into a binary sequence L', where L' is the generated perceptual hash value; the barycentric coordinates G(m_x, m_y) of a fingerprint image are:

m_x = Σ_i Σ_j i · f(i, j) / Σ_i Σ_j f(i, j)

m_y = Σ_i Σ_j j · f(i, j) / Σ_i Σ_j f(i, j)
in concrete implementation, the center-of-gravity region of an image lies close to its geometric center; when the center of gravity of the image alone does not discriminate well, the complement-image center of gravity G'(m'_x, m'_y) is also used, the complement image being defined as follows:
f'(i, j) = G_level − f(i, j)
wherein G_level represents the maximum gray level of the fingerprint image; the image center of gravity is highly stable under geometric attacks, withstanding operations such as translation and rotation, so the center-of-gravity-based fingerprint image renders the slight geometric distortion introduced during fingerprint entry and reading negligible; the center-of-gravity result depends on the perceived image content, and because different encryption keys are adopted for the fingerprint images of different people, the same perceptual hash value essentially never occurs even when the acquisition results of different people are similar;
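A minimal sketch of the center-of-gravity scheme: per block, the distance between the block's center of gravity and the center of gravity of its complement image f'(i, j) = G_level − f(i, j) is computed and quantized to one bit. The standard (unmodified) centroid formula is used here, since the δ-modified formula appears only as an image in the original; the mod-2 parity quantizer is likewise an assumption, as the patent only says "round and quantize".

```python
import math

def centroid(block):
    # Center of gravity (m_x, m_y): m_x = sum(i*f)/sum(f), m_y = sum(j*f)/sum(f),
    # with 1-based row index i and column index j over the gray values f(i, j)
    total = float(sum(sum(row) for row in block))
    if total == 0:
        return (0.0, 0.0)
    mx = sum((i + 1) * v for i, row in enumerate(block) for v in row) / total
    my = sum((j + 1) * v for row in block for j, v in enumerate(row)) / total
    return mx, my

def complement(block, g_level=255):
    # Complement image: f'(i, j) = G_level - f(i, j)
    return [[g_level - v for v in row] for row in block]

def gravity_hash_bits(blocks):
    # One bit per rectangular block: distance between the block's center of
    # gravity and its complement image's center of gravity, rounded mod 2
    bits = []
    for b in blocks:
        mx, my = centroid(b)
        cx, cy = centroid(complement(b))
        dist = math.hypot(mx - cx, my - cy)
        bits.append(round(dist) % 2)
    return bits
```

Concatenating the bits over the N key-selected rectangles yields the binary sequence L' of the text.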
A block diagram of the center-of-gravity-based biometric fingerprint image perceptual hash generation is shown in fig. 3.
1B) Pixel-expectation-based fingerprint image perceptual hash generation, comprising the following steps:
step 1B.1) a user inputs a key, the key is regarded as a character string, an MD5 value of the key is calculated, and the MD5 value is used as a seed of a pseudorandom number generator;
step 1B.2) a user scans a fingerprint image of the user, and randomly divides the scanned fingerprint image into N overlapped rectangular areas by using a pseudo-random number generator;
step 1B.3) calculating the expectation of pixels of each rectangular area to obtain a vector F consisting of the expectation of rectangular pixels;
step 1B.4) rounding and quantizing F into a binary sequence F ', wherein F' is the generated perceptual hash value;
this method generates the perceptual hash value by calculating the pixel expectation over regions of the fingerprint image; it adapts well to a certain degree of fingerprint image distortion, but adapts poorly to rotation of the fingerprint image, because image rotation distorts the pixels.
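Steps 1B.3–1B.4 reduce to a per-block mean followed by rounding and binarization. A sketch under the same caveat as before: the patent does not fix the quantizer, so the mod-2 parity of each rounded mean is an assumed scheme.

```python
def expectation_hash_bits(blocks):
    # Step 1B.3: the expectation (mean gray value) of the pixels of each block
    means = [sum(sum(row) for row in b) / float(sum(len(row) for row in b))
             for b in blocks]
    # Step 1B.4: round and quantize each expectation to one bit
    # (mod-2 parity of the rounded mean is an assumed quantizer)
    return [round(m) % 2 for m in means]
```

The resulting bit sequence is the binary sequence F' of the text, one bit per key-selected rectangle.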
1C) Perceptual hash generation based on barycentric angles, comprising the steps of:
step 1C.1) a user inputs a key, the key is regarded as a character string, an MD5 value of the key is calculated, and the calculated MD5 value is used as a seed of a pseudorandom number generator;
step 1c.2) the user scans his fingerprint image and randomly segments this image into N overlappable rectangular areas using a pseudo-random number generator.
Step 1C.3) calculating the gravity center of each rectangular area and the corresponding complement chart gravity center by adopting a gravity center calculation formula;
step 1C.4) draw straight lines from the origin of the image (the point whose horizontal and vertical coordinates are zero) to the two centers of gravity of each rectangle, forming two straight lines intersecting at the origin;
step 1C.5) calculating an included angle of two intersecting straight lines to obtain a vector H related to the included angle;
step 1C.6) rounding and quantizing H into a binary sequence H ', and then H' is the generated perceptual hash value.
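Steps 1C.3–1C.6 can be sketched as follows: for each block, the angle at the image origin between the lines to the block's center of gravity and to its complement image's center of gravity is computed, then rounded and quantized. As above, the standard centroid formula and the mod-2 parity quantizer are assumptions.

```python
import math

def centroid(block):
    # Standard center of gravity with 1-based indices
    total = float(sum(sum(row) for row in block))
    mx = sum((i + 1) * v for i, row in enumerate(block) for v in row) / total
    my = sum((j + 1) * v for row in block for j, v in enumerate(row)) / total
    return mx, my

def angle_between(p, q):
    # Angle at the image origin (0, 0) between the lines origin->p and origin->q
    return abs(math.atan2(p[1], p[0]) - math.atan2(q[1], q[0]))

def angle_hash_bits(blocks, g_level=255):
    bits = []
    for b in blocks:
        g = centroid(b)                                        # step 1C.3
        g_c = centroid([[g_level - v for v in row] for row in b])
        theta = math.degrees(angle_between(g, g_c))            # steps 1C.4-1C.5
        bits.append(round(theta) % 2)                          # step 1C.6 (assumed)
    return bits
```

Because both centroids move together under translation and rotation about the origin, the angle between the two lines is comparatively stable, which matches the rotation-resistance result reported for this variant later in the text.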
Because the biometric fingerprint can represent the identity of the speaker, and the hash value calculated from it is a digest of the fingerprint image, the fingerprint image hash value generated for the speaker who is the voice source can be embedded into the voice signal as a watermark, thereby tracking that speaker's identity. Let the digital signal corresponding to the initial audio be A = {a(i)}, 1 ≤ i ≤ L. In the specific implementation, step 2 performs the embedding through the following steps:
step 2.1, segment the audio signal: divide the original audio signal A into M segments, each of length N, so that M = L/N; denote each audio segment as A1(p), p = 1, 2, ..., M;
step 2.2, divide each audio segment into M1 sub-bands A2(p, q), q = 1, 2, ..., M1, so that the length of each sub-band is N/M1; for convenience of watermark embedding, the number of sub-bands M1 is set equal to the length of the perceptual hash in bits;
step 2.3, embed the perceptual hash value into each of the M audio segments by the LSB (least significant bit) method; specifically, the 0 and 1 bits of the hash value are embedded into the 1st sampling point of each of the M1 sub-bands of the corresponding audio segment. Denoting by S1{A2(p, q)} the 1st sampling point of each sub-band, the embedding detail is:
LSB(S1{A2(p, q)}) ← h_d (1)
wherein LSB(·) represents the lowest bit of the sampling point, and h_d is the corresponding bit of the perceptual hash;
embed the M audio segments one by one by this method until all audio segments are embedded, obtaining the audio A' embedded with the perceptual hash of the fingerprint image of the speaker who is the voice source;
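Steps 2.1–2.3 amount to overwriting the least significant bit of the first sample of each sub-band with one hash bit, repeated across all segments. A sketch under the assumption of integer PCM samples; the function name and parameters are illustrative:

```python
def embed_hash_lsb(audio, hash_bits, n_segments):
    # audio: integer PCM samples; hash_bits: the perceptual hash, one bit per sub-band
    m1 = len(hash_bits)
    seg_len = len(audio) // n_segments       # N in the text (step 2.1)
    sub_len = seg_len // m1                  # N / M1 in the text (step 2.2)
    out = list(audio)
    for p in range(n_segments):              # each audio segment A1(p)
        for q, bit in enumerate(hash_bits):  # each sub-band A2(p, q)
            idx = p * seg_len + q * sub_len  # 1st sampling point of the sub-band
            out[idx] = (out[idx] & ~1) | bit # LSB(S1{A2(p, q)}) <- hash bit
    return out
```

Since only the lowest bit of one sample per sub-band changes, each modified sample differs from the original by at most 1, keeping the watermark inaudible.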
Let h1 and h2 respectively represent the perceptual hash values of two fingerprint images; the normalized Hamming distance is used to measure the similarity of the two fingerprint images and is defined as follows:

D(h1, h2) = (1/N) Σ_{i=1}^{N} |h1(i) − h2(i)| (2)
One fingerprint image is denoted by X, the image X after a content-preserving operation by Y, and another fingerprint image different from X by Z. Hk represents the perceptual hash function controlled by the key K. The comparison of perceptual hash values satisfies the following conditions:
D(Hk(X),Hk(Y))<T1 (3)
D(Hk(X),Hk(Z))>T2 (4)
wherein 0 ≤ T1 < T2 ≤ 0.5; in comparing perceptual hash values, the larger the gap between T1 and T2, the better. Ideally, the normalized Hamming distance of similar fingerprint images should be close to 0, and that of dissimilar fingerprint images close to 0.5.
Voice source tracking implementation: when tracking a voice source, first extract the perceptual hash value H1 from the lowest-order bits of the voice sample values, then match it one by one against the perceptual hash values H2 in the fingerprint image perceptual hash library, calculating the normalized Hamming distance D(H1, H2):

D(H1, H2) = (1/N) Σ_{i=1}^{N} |H1(i) − H2(i)|

wherein N is the length of the perceptual hash value. The smaller the normalized Hamming distance D(H1, H2), the closer the two fingerprint images and the higher the source-tracking accuracy.
Let T be the similarity threshold, 0 ≤ T ≤ 0.5. If D(H1, H2) < T, the matching is successful, i.e., the identity of the speaker who is the voice source can be tracked through the fingerprint image corresponding to H2. The smaller the similarity threshold T, the higher the source-tracking accuracy.
Performance analysis — robustness: in the experiment, the fingerprint images come from the million-scale FingerPass fingerprint image database of the intelligent biometric systems research group at the Institute of Automation, Chinese Academy of Sciences. The fingerprint images are 300 × 300 BMP grayscale images, and the generated perceptual hash value length is 150 bits.
Fig. 4 is a 300 × 300 fingerprint image from the fingerprint image library. A speaker often inevitably tilts the finger at some angle when entering a fingerprint; fig. 5 shows the speaker's tilted fingerprint image after rotation by the different angles 1°, 2°, 5°, 10°, 30° and 90°. Calculating the normalized Hamming distance between the rotated perceptual hash value h2 and the perceptual hash value h1 of the original fingerprint image yields the relation between rotation angle and normalized Hamming distance. Fig. 6 compares the rotation-attack resistance of the three perceptual hash algorithms after the fingerprint image is rotated by different angles. As can be seen from fig. 6, the perceptual hash generation algorithm based on the center-of-gravity angle resists the attack best; the normalized Hamming distance of all three algorithms grows with the rotation angle, although it drops somewhat at certain angles, rising overall in a wave-like fashion. In addition, the normalized Hamming distance is large for rotation angles above 5° and acceptable for rotation angles within 5°, showing that the speaker who is the voice source can be tracked fairly accurately.
Collision resistance: to study the collision resistance of the proposed perceptual hash algorithms, we tested the center-of-gravity-based perceptual hash generation algorithm as an example. 60 fingerprint images of size 304 × 256 were randomly selected from a fingerprint image library (collected in our laboratory) to generate hash values, which were matched pairwise to obtain 1770 matching results. Fig. 7 is a histogram of the matching values; it can be seen that the matching results approximately fit a Gaussian distribution N(μ, σ), with mathematical expectation μ = 1.327 and standard deviation σ = 0.5786. In the test, a threshold T' = 0.43 is selected, and the collision resistance of the generated perceptual hash values is calculated according to the following formula:
(equation (9), the collision-probability formula, is rendered as an image in the original document and not reproduced here)
The collision rate of the center-of-gravity-based perceptual hash generation algorithm calculated by equation (9) is 9.043 × 10⁻⁵; the collision rate of the algorithm is thus small, which guarantees the uniqueness of the perceptual hash value. Fig. 7: statistical histogram of the 1770 matching results.
Security: owing to the extreme sensitivity of the MD5 algorithm to its input, different user keys generate different random numbers and hence different random partitions of the fingerprint image, producing the perceptual hash value segments shown in table 1. As can be seen from table 1, the perceptual hash values generated by different keys are distinct; matching the two perceptual hash values gives a Euclidean distance of 85.9867, indicating that the distance is large and that the security requirement of the hash sequence is satisfied. Therefore, without knowledge of the key, the perceptual hash value of the fingerprint image cannot be obtained.
TABLE 1 perceptual Hash value segments generated by different keys for the same fingerprint image
(Table 1 is rendered as an image in the original document and is not reproduced here.)
While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.

Claims (5)

1. A voice source tracking method based on fingerprint image perceptual hashing is characterized in that: the method comprises the following steps:
step 1, performing perceptual hashing on a fingerprint image based on image characteristics to generate a hashed fingerprint image;
step 1, specifically comprising the following substeps:
step 1.1 calculate MD5 value with user input as key and calculated MD5 value as pseudo random number generator seed;
step 1.2, carrying out image acquisition on the user fingerprint, and randomly dividing a plurality of rectangular areas from the fingerprint image of the speaker through key control;
step 1.3, selecting parameters with good geometric invariance in each rectangular area generated in step 1.2 as analysis objects, quantizing the selected parameters, and forming the fingerprint image perceptual hash used to identify the identity information of the speaker, thereby generating the hashed fingerprint image;
step 2, embedding the hashed fingerprint image generated in step 1 into the voice signal in the form of a digital watermark, thereby binding the unique biological characteristics of the speaker to the voice data and obtaining audio embedded with the fingerprint image perceptual hash of the speaker of the voice source;
step 2, specifically comprising the following substeps:
step 2.1, dividing the original audio signal A into M sections, wherein the length of each section of audio signal is N;
step 2.2, dividing each audio segment into M1 sub-bands, the length of each sub-band being N/M1; for convenience of watermark embedding, the number of sub-bands M1 is set equal to the number of perceptual hash bits;
step 2.3, embedding the perceptual hash values into the M audio segments respectively by the LSB (least significant bit) method; specifically, the 0 and 1 bits of each hash value are embedded one by one into the 1st sampling point of each of the M1 sub-bands of the corresponding audio segment; this is repeated until all M audio segments are embedded, yielding the audio A' embedded with the fingerprint image perceptual hash of the speaker of the voice source;
step 3, identity authentication is carried out on the voice by utilizing fingerprint image perception hash, and the method specifically comprises the following steps:
letting h1 and h2 respectively denote the perceptual hash values of two fingerprint images, and measuring their similarity by the normalized Hamming distance;
X represents a fingerprint image, Y represents the image obtained from X by a content-preserving operation, and Z represents another fingerprint image different from X; Hk denotes the perceptual hash function controlled by the key K, and the comparison of perceptual hash values satisfies:
D(Hk(X), Hk(Y)) < T1
D(Hk(X), Hk(Z)) > T2
step 3, specifically comprising the following substeps:
step 3.1, extracting the perceptual hash value H1 from the lowest-order bits of the voice samples;
step 3.2, matching H1 one by one against the perceptual hash values H2 in the fingerprint image perceptual hash library and calculating the normalized Hamming distance D(H1, H2); if D(H1, H2) < T, the matching is successful, i.e. the fingerprint image corresponding to H2 tracks the identity of the speaker of the voice source.
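As an illustrative aside (not part of the claims), the matching of steps 3.1–3.2 above can be sketched as follows; the bit sequences and the threshold value are hypothetical examples, with T constrained to 0 ≤ T ≤ 0.5 as claim 5 requires:

```python
def normalized_hamming(h1, h2):
    """Normalized Hamming distance between two equal-length bit sequences."""
    assert len(h1) == len(h2)
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)

def match(h_extracted, hash_library, t=0.25):
    """Return the library indices whose distance to the extracted hash
    falls below the similarity threshold t (step 3.2)."""
    return [i for i, h in enumerate(hash_library)
            if normalized_hamming(h_extracted, h) < t]
```

A successful match returns the index of a library hash, from which the stored fingerprint image, and hence the speaker's identity, can be recovered.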
2. The voice source tracking method based on the perceptual hashing of fingerprint images as claimed in claim 1, wherein: in step 1, the image features comprise three types, based respectively on the gravity center, the pixel expectation, and the gravity center angle of the fingerprint image.
3. The voice source tracking method based on the perceptual hashing of fingerprint images as claimed in claim 2, wherein: in step 1.2, the random division is controlled by the pseudo-random number generator seeded in step 1.1.
4. The voice source tracking method based on the perceptual hashing of fingerprint images as claimed in claim 3, wherein: in step 3, 0 ≤ T1 < T2 ≤ 0.5.
5. The voice source tracking method based on the perceptual hashing of fingerprint images as claimed in claim 4, wherein: in step 3, T is the similarity threshold and 0 ≤ T ≤ 0.5.
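As an illustrative aside (not part of the claims), the LSB embedding of step 2.3 and the extraction of step 3.1 can be sketched for one audio segment of integer samples; the segment length, sub-band count, and sample values below are hypothetical:

```python
def embed_hash_lsb(samples, hash_bits, n_subbands):
    """Embed one hash bit into the LSB of the 1st sample of each of
    n_subbands equal-length sub-bands of one audio segment (step 2.3)."""
    assert len(hash_bits) == n_subbands
    sub_len = len(samples) // n_subbands
    out = list(samples)
    for i, bit in enumerate(hash_bits):
        pos = i * sub_len                       # 1st sample of sub-band i
        out[pos] = (out[pos] & ~1) | (bit & 1)  # overwrite the LSB only
    return out

def extract_hash_lsb(samples, n_subbands):
    """Recover the embedded bits from the sub-band 1st samples (step 3.1)."""
    sub_len = len(samples) // n_subbands
    return [samples[i * sub_len] & 1 for i in range(n_subbands)]
```

Because only the least significant bit of one sample per sub-band changes, each embedded bit perturbs its sample by at most 1, which is why LSB watermarking is perceptually transparent.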
CN202011401234.9A 2020-12-02 2020-12-02 Voice source tracking method based on fingerprint image perceptual hashing Pending CN112687282A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011401234.9A CN112687282A (en) 2020-12-02 2020-12-02 Voice source tracking method based on fingerprint image perceptual hashing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011401234.9A CN112687282A (en) 2020-12-02 2020-12-02 Voice source tracking method based on fingerprint image perceptual hashing

Publications (1)

Publication Number Publication Date
CN112687282A 2021-04-20

Family

ID=75445909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011401234.9A Pending CN112687282A (en) 2020-12-02 2020-12-02 Voice source tracking method based on fingerprint image perceptual hashing

Country Status (1)

Country Link
CN (1) CN112687282A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104835499A (en) * 2015-05-13 2015-08-12 西南交通大学 Cipher text speech perception hashing and retrieving scheme based on time-frequency domain trend change
CN110414200A (en) * 2019-04-08 2019-11-05 广州腾讯科技有限公司 Auth method, device, storage medium and computer equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HWAI-TSU HU: "Efficient and robust frame-synchronized blind audio watermarking by featuring multilevel DWT and DCT", 《CLUSTER COMPUTING》 *
QIU YONG: "Research on Voice Identity and Content Authentication Technology Based on Perceptual Hashing", 《China Master's Theses Full-text Database, Information Science and Technology》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592744A (en) * 2021-08-12 2021-11-02 长光卫星技术有限公司 Geometric precise correction method suitable for high-resolution remote sensing image
CN113592744B (en) * 2021-08-12 2024-03-19 长光卫星技术股份有限公司 Geometric fine correction method suitable for high-resolution remote sensing image
CN115021966A (en) * 2022-05-06 2022-09-06 深圳比特微电子科技有限公司 Voice access method, user access equipment and remote system
CN117116275A (en) * 2023-10-23 2023-11-24 浙江华创视讯科技有限公司 Multi-mode fused audio watermarking method, device and storage medium
CN117116275B (en) * 2023-10-23 2024-02-20 浙江华创视讯科技有限公司 Multi-mode fused audio watermarking method, device and storage medium

Similar Documents

Publication Publication Date Title
CN112687282A (en) Voice source tracking method based on fingerprint image perceptual hashing
Muhammad et al. A secure method for color image steganography using gray-level modification and multi-level encryption
CN104823203B (en) Biometric templates safety and key generate
CN102306305B (en) Method for authenticating safety identity based on organic characteristic watermark
CN101345054B (en) Digital watermark production and recognition method used for audio document
Ouyang et al. Robust hashing for image authentication using SIFT feature and quaternion Zernike moments
Yan et al. Multi-scale difference map fusion for tamper localization using binary ranking hashing
CN107993669B (en) Voice content authentication and tampering recovery method based on modification of least significant digit weight
CN108122225B (en) Digital image tampering detection method based on self-adaptive feature points
Bartlow et al. Protecting iris images through asymmetric digital watermarking
Bhalshankar et al. Audio steganography: LSB technique using a pyramid structure and range of bytes
Zenati et al. SSDIS-BEM: A new signature steganography document image system based on beta elliptic modeling
Choudhury et al. Cancelable iris Biometrics based on data hiding schemes
CN1315091C (en) Digital image recognising method based on characteristics
Liu et al. Data protection in palmprint recognition via dynamic random invisible watermark embedding
Dutta et al. An efficient and secure digital image watermarking using features from iris image
Benhamza et al. Image forgery detection review
Yuling et al. Robust Image Hashing Using Radon Transform and Invariant Features.
Ma et al. Block pyramid based adaptive quantization watermarking for multimodal biometric authentication
Sayeed et al. Forgery detection in dynamic signature verification by entailing principal component analysis
Dutta et al. Watermark generation from fingerprint features for digital right management control
Partala et al. Improving robustness of biometric identity determination with digital watermarking
Kaur et al. A secure and high payload digital audio watermarking using features from iris image
Low et al. A preliminary study on biometric watermarking for offline handwritten signature
Karsh Geometric invariant image authentication system using hashing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210420