CN112995135A - Mass digital voice content oriented batch content authentication method - Google Patents

Mass digital voice content oriented batch content authentication method

Info

Publication number
CN112995135A
CN112995135A (application CN202110149767.0A)
Authority
CN
China
Prior art keywords
voice
digital
digital voice
grouping
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110149767.0A
Other languages
Chinese (zh)
Other versions
CN112995135B (en)
Inventor
钱清
王欢
陈清容
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou University of Finance and Economics
Original Assignee
Guizhou University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou University of Finance and Economics filed Critical Guizhou University of Finance and Economics
Priority to CN202110149767.0A priority Critical patent/CN112995135B/en
Publication of CN112995135A publication Critical patent/CN112995135A/en
Application granted
Publication of CN112995135B publication Critical patent/CN112995135B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/08: Network architectures or network communication protocols for network security for authentication of entities
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/018: Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/24: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses a batch content authentication method for massive digital voice content, in the technical field of batch content authentication of digital voice. The method comprises two parts: stable grouping of batch voice based on voice feature analysis, and batch digital voice content authentication. First, a feature selection model is established by extracting key features of the digital voice in the time or frequency domain and analysing their robustness, so that massive voice can be grouped stably. Then, according to the grouping strategy, a multi-layer hierarchical and distributed parallel computing method, combined with a digital watermarking algorithm, is used to generate and embed watermarks for the batch voice and to authenticate its content, further improving the rate of massive voice content authentication.

Description

Mass digital voice content oriented batch content authentication method
Technical Field
The invention relates to the technical field of content authentication of massive digital voice, and in particular to a batch content authentication method for massive digital voice content.
Background
At present, incidents of tampering with and counterfeiting digital media content are endless. Through technical means such as tampering and forgery, malicious actors conceal the truth and create confusion, causing economic losses to individuals and organisations and even disturbing public order and social stability. The ease with which digital media data can be modified reduces its reliability, and for multimedia data whose authenticity is hard to judge, how to authenticate the integrity and authenticity of the content is a current research hotspot in the field of multimedia information security. Digital voice, as a main medium of information dissemination, is one of the most important ways for humans to acquire and exchange information, and it often carries important and sensitive information; once such voice content is tampered with or forged, the consequences can be immeasurable.
With the advent of the big-data era, the volume of digital voice in networks is becoming massive, and the traditional approach of storing voice on a local computer can no longer meet the demand for massive voice storage. With cloud storage technology, massive digital voice can be stored in the cloud. The subject of digital voice content authentication is therefore shifting from the individual to the cloud service provider; that is, the main application scenario is gradually changing from the authentication of a small number of personal voice files to the authentication of huge numbers of voice files held by cloud service providers. However, research on digital voice content authentication still focuses on personal, small-scale scenarios; the massive-scale scenario of cloud service providers has not yet attracted researchers' attention, and many key problems remain to be solved. In particular, the current practice of authenticating the content of each voice file one by one, in serial fashion, cannot meet the efficiency requirements of massive voice data processing. Designing a batch digital voice content authentication technique for this setting can significantly improve the efficiency of massive digital voice content authentication and has good prospects for application and popularisation.
Disclosure of Invention
Based on digital watermarking, machine learning and distributed parallel computing technologies, and from the perspective of active defence, the invention provides a batch content authentication method for massive digital voice content. The invention provides the following technical scheme:
a batch content authentication method for massive digital voice contents comprises the following steps:
step 1: extracting the characteristics of the digital voice in a time domain or a frequency domain, and determining the influence of voice attack on the extracted characteristics;
step 2: establishing a feature selection model, and selecting key features which are minimally affected by voice attack to stably group massive voices;
Step 3: determining the relevance between voice data according to the grouping result;
Step 4: using a multi-layer hierarchical and distributed parallel method to efficiently search for tampered voice among the massive voice and locate the tampered region within the digital voice.
Preferably, the step 1 specifically comprises:
Step 1.1: extracting the features of the digital voice in the time or frequency domain, where the voice features include pitch period, formants, centroid, sub-band energy ratio, spectral component ratio, smooth pitch ratio, Mel-frequency cepstral coefficients, emotion information and the like; denote the input massive digital voices as s_i, i = 1, 2, ..., I, where I is the number of digital voices;
Step 1.2: sequentially extracting multiple features from each digital voice; with J features extracted per voice, denote the set of jth features extracted from all digital voices as f_j = {f_{1,j}, f_{2,j}, ..., f_{i,j}, ..., f_{I,j}}, j = 1, 2, ..., J;
Step 1.3: performing K kinds of conventional signal processing (noise addition, resampling, requantization and low-pass filtering) on the input digital voice signals, and denoting the digital voice after the kth attack as s_i^(k), k = 1, 2, ..., K; sequentially extracting the feature sets of all digital voices after each conventional signal-processing operation, denoted f_j^(k), where f_j^(k) represents the set of jth features extracted after the kth conventional signal processing; sequentially calculating the distance between f_j and f_j^(k), denoted d_j^(k).
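The feature extraction and robustness measurement of step 1 can be sketched as follows. The two features (spectral centroid and low-band energy ratio), the single noise-addition attack, and the Euclidean distance are illustrative assumptions; the patent does not fix these choices.

```python
# Sketch of step 1 under assumed details (feature choice, attack, L2 distance).
import numpy as np

def extract_features(signal, sr=16000):
    """Two simple frequency-domain features per voice (illustrative)."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    centroid = float(np.sum(freqs * spec) / (np.sum(spec) + 1e-12))
    low = spec[freqs < 1000.0]                       # energy below 1 kHz
    band_ratio = float(np.sum(low ** 2) / (np.sum(spec ** 2) + 1e-12))
    return np.array([centroid, band_ratio])          # J = 2 features

rng = np.random.default_rng(0)
voices = [rng.standard_normal(1024) for _ in range(4)]   # I = 4 voices s_i
F = np.stack([extract_features(s) for s in voices])      # clean features f_j
F_att = np.stack([extract_features(s + 0.01 * rng.standard_normal(1024))
                  for s in voices])                      # after a noise attack
d = np.linalg.norm(F - F_att, axis=0)                    # distance d_j per feature
```

Robust features are those whose distance d_j stays small across all K attacks; in the full method this is repeated for each of the K processing operations.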
Preferably, the step 2 specifically comprises:
establishing a feature selection model: when a voice feature shows robustness in the attacked digital voice, it is stored into the feature sequence; otherwise it is discarded; the voices to be authenticated are grouped according to the massive-voice grouping strategy, selecting the key features least affected by voice attack for stable grouping;
calculating the mean value μ_j of each distance set and normalising it so that the result is mapped into [0, 1]:
μ_j = (1/K) Σ_{k=1}^{K} d_j^(k),
μ̄_j = (μ_j − min_j μ_j) / (max_j μ_j − min_j μ_j);
setting a threshold T: when the normalised mean μ̄_j satisfies μ̄_j ≤ T, the corresponding feature is taken out and stored into the feature set θ, θ = θ ∪ f_j; otherwise the corresponding feature is discarded;
performing feature fusion on the feature set θ by principal component analysis to reduce the data volume, obtaining the dimension-reduced feature F_i of length N; that is, from each digital voice a sequence of length N is extracted as the feature sequence representing that voice;
sequentially extracting the fused feature of each digital voice s_i, denoted F_i, i = 1, 2, ..., I;
selecting certain features from the extracted features as clustering centres, and dividing the extracted voice features into m classes using a clustering-analysis algorithm;
according to the classification result of the features, dividing the massive digital voice into m groups, and denoting the grouped voice clusters as g_j = {s_{x1}, s_{x2}, ..., s_{xy} | x1, x2, ..., xy ∈ [1, I]}, j = 1, 2, ..., m.
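A minimal sketch of the step-2 selection and grouping. The min-max normalisation, the threshold value T = 0.5, and the small k-means routine standing in for the unspecified clustering-analysis algorithm are all assumptions made for illustration.

```python
# Sketch of step 2 (assumed: min-max normalisation, T = 0.5, tiny k-means).
import numpy as np

def select_robust_features(distances, T=0.5):
    # distances: (K attacks, J features) -> keep features least affected
    mu = distances.mean(axis=0)                              # mean distance per feature
    mu_bar = (mu - mu.min()) / (mu.max() - mu.min() + 1e-12) # map into [0, 1]
    return np.flatnonzero(mu_bar <= T)

def kmeans_group(F, m, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(len(F), m, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((F[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(m):
            if np.any(labels == c):
                centers[c] = F[labels == c].mean(axis=0)
    return labels

distances = np.array([[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]])  # K = 2 attacks, J = 3
keep = select_robust_features(distances)                   # feature 1 is fragile
F = np.vstack([np.zeros((3, 2)), np.ones((3, 2))])         # 6 voices, 2 fused feats
groups = kmeans_group(F, m=2)                              # stable grouping g_j
```

Voices whose fused features are close land in the same group, which is what makes the grouping stable under moderate attacks.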
Preferably, the step 3 specifically comprises: according to a massive voice grouping strategy, all digital voices in each group are used for generating digital watermarks in the groups for grouping authentication;
embedding the group-authentication watermark information into the digital voice within the group while balancing inaudibility, embedding capacity and detection precision; denote the input massive digital voices as s_i, i = 1, 2, ..., I, where I is the number of digital voices;
according to the digital voice feature-extraction process, sequentially extracting the fused feature of each voice, denoted F_i, i = 1, 2, ..., I;
according to the classification result of the extracted features, dividing the massive digital voice into m groups, with the grouped voice clusters denoted g_j = {s_{x1}, s_{x2}, ..., s_{xy} | x1, x2, ..., xy ∈ [1, I]}, j = 1, 2, ..., m;
processing all the voice features within each group and finally converting them into a binary sequence that serves as the group watermark, denoted wg_m for the watermark generated for the mth group;
polling all digital voice signals contained in each group and generating the watermark information of each digital voice with a content-based digital voice watermark generation algorithm, the watermark generated for the jth digital voice in the mth group being denoted w_{m,j};
splicing the group watermark with the watermark generated from the single voice to obtain the watermark information finally embedded into the jth digital voice of the mth group, w*_{m,j} = wg_m ∥ w_{m,j};
embedding the intra-group digital voice watermarks into the voices using a distributed parallel method, executing the watermark-embedding algorithm on each digital voice simultaneously.
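The watermark generation, splicing and embedding of step 3 might look like the sketch below. SHA-256 as the content-based watermark generator and LSB substitution on 16-bit samples as the embedding algorithm are assumptions; the patent leaves both algorithms unspecified.

```python
# Sketch of step 3 (assumed: SHA-256 watermark generator, 16-bit LSB embedding).
import hashlib
import numpy as np

def content_watermark(feature_bytes, n_bits=32):
    # Binary watermark derived from (serialised) voice or group features.
    digest = hashlib.sha256(feature_bytes).digest()
    return np.unpackbits(np.frombuffer(digest, dtype=np.uint8))[:n_bits]

def embed_lsb(samples, bits):
    # Substitute the least-significant bit of the first len(bits) samples.
    out = samples.copy()
    out[: len(bits)] = (out[: len(bits)] & ~np.int16(1)) | bits.astype(np.int16)
    return out

group_features = b"fused-features-of-group-m"     # hypothetical stand-in for g_m
wg = content_watermark(group_features)            # group watermark wg_m
w = content_watermark(b"features-of-voice-j")     # per-voice watermark w_{m,j}
w_final = np.concatenate([wg, w])                 # spliced watermark wg_m || w_{m,j}
voice = np.zeros(4096, dtype=np.int16)            # stand-in for one voice signal
marked = embed_lsb(voice, w_final)
```

In the full method the embed step runs in parallel across all voices of a group; only the splicing order (group watermark first) is taken from the text above.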
Preferably, the step 4 specifically includes:
Step 4.1: denote the massive digital voice containing tampered voice as s_i, i = 1, 2, ..., N, where N is the number of digital voices;
dividing the massive digital voice into m groups according to the grouping strategy, with the grouped voice clusters denoted g_j = {s_{x1}, s_{x2}, ..., s_{xy} | x1, x2, ..., xy ∈ [1, N]}, j = 1, 2, ..., m;
consistent with the watermark generation and embedding process, sequentially generating the group watermark of each group according to the grouping result, the group watermark of the mth digital voice cluster being denoted wg_m;
using the extraction algorithm corresponding to the watermark-embedding algorithm to sequentially extract the group watermarks of all grouped voice clusters, the group watermark extracted from the mth digital voice cluster being denoted wg′_m.
Step 4.2: defining a packet authentication sequence WGmLet us order
Figure BDA0002931763640000033
WGmE {0, 1 }; when dividing intoGroup authentication sequence WGmIf all the elements are 0, all the digital voice in the current grouping is not tampered, and the authentication is finished; otherwise, when the group authentication sequence WGmIf the element contains an element with an element value of 1, the fact that at least one digital voice in the current packet is tampered is indicated, and the digital voice in the packet needs to be further verified;
Step 4.3: for a group containing tampered voice, sequentially reconstructing the watermark information of each digital voice in the group, the reconstructed watermark of the jth digital voice in the mth tampered voice group being denoted w_{m,j};
sequentially extracting the watermark information embedded in each digital voice of the tampered voice group, the watermark extracted from the jth digital voice in the mth tampered voice group being denoted w′_{m,j};
defining the authentication sequence of the jth digital voice in the mth tampered voice group as W_{m,j}, letting W_{m,j} = w_{m,j} ⊕ w′_{m,j}, with elements of W_{m,j} in {0, 1}; when all elements of W_{m,j} are 0, the current digital voice has not been tampered with and the authentication ends; otherwise, when W_{m,j} contains an element with value 1, the current digital voice has been tampered with, the tampered position is located according to the watermark-embedding principle, and the authentication ends.
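The two-level check of step 4 can be illustrated as below, assuming bitwise XOR as the comparison that produces the authentication sequences WG_m and W_{m,j} (consistent with "all elements 0" meaning intact); the example watermark values are made up.

```python
# Sketch of the step-4 hierarchical check (assumed: XOR comparison).
import numpy as np

def auth_sequence(regenerated, extracted):
    return np.bitwise_xor(regenerated, extracted)   # 0 = match, 1 = mismatch

# Group level: regenerated wg_m vs extracted wg'_m.
wg = np.array([1, 0, 1, 1], dtype=np.uint8)
wg_ext = np.array([1, 0, 1, 1], dtype=np.uint8)     # intact group
WG = auth_sequence(wg, wg_ext)
group_tampered = bool(WG.any())                     # False: whole group passes

# Voice level, only run for groups flagged above: w_{m,j} vs w'_{m,j}.
w = np.array([0, 1, 1, 0], dtype=np.uint8)
w_ext = np.array([0, 1, 0, 0], dtype=np.uint8)      # one flipped bit
W = auth_sequence(w, w_ext)
tampered_positions = np.flatnonzero(W)              # locates the altered region
```

The group-level test lets whole clusters of untampered voices pass with a single comparison, which is the source of the batch speed-up.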
The invention has the following beneficial effects:
The invention fully considers the demand for massive voice content authentication in a big-data environment. It adopts a multi-layer hierarchical, grouped batch voice content authentication method and authenticates the content of multiple voice groups simultaneously in a distributed parallel manner, further improving the rate of massive voice content authentication and facilitating the popularisation and application of the invention.
Drawings
FIG. 1 is a block diagram of mass digital speech feature extraction and speech grouping;
FIG. 2 is a block diagram of generation and embedding of a mass of digital voice watermarks;
FIG. 3 is a block diagram of authentication of mass digital voice content.
Detailed Description
The present invention will be described in detail with reference to specific examples.
The first embodiment is as follows:
As shown in FIG. 1 to FIG. 3, the present invention provides a batch content authentication method for massive digital voice content, comprising the following steps:
step 1: extracting the characteristics of the digital voice in a time domain or a frequency domain, and determining the influence of voice attack on the extracted characteristics;
the step 1 specifically comprises the following steps:
Step 1.1: extracting the features of the digital voice in the time or frequency domain, where the voice features include pitch period, formants, centroid, sub-band energy ratio, spectral component ratio, smooth pitch ratio, Mel-frequency cepstral coefficients and emotion information; denote the input massive digital voices as s_i, i = 1, 2, ..., I, where I is the number of digital voices;
Step 1.2: sequentially extracting multiple features from each digital voice; with J features extracted per voice, denote the set of jth features extracted from all digital voices as f_j = {f_{1,j}, f_{2,j}, ..., f_{i,j}, ..., f_{I,j}}, j = 1, 2, ..., J;
Step 1.3: performing K kinds of conventional signal processing (noise addition, resampling, requantization and low-pass filtering) on the input digital voice signals, and denoting the digital voice after the kth attack as s_i^(k), k = 1, 2, ..., K; sequentially extracting the feature sets of all digital voices after each conventional signal-processing operation, denoted f_j^(k), where f_j^(k) represents the set of jth features extracted after the kth conventional signal processing; sequentially calculating the distance between f_j and f_j^(k), denoted d_j^(k).
Step 2: establishing a selection model, and selecting key characteristics which are minimally influenced by voice attack to stably group massive voices;
the step 2 specifically comprises the following steps:
establishing a feature selection model: when a voice feature shows robustness in the attacked digital voice, the feature is stored into the feature sequence; otherwise the feature is discarded; the voices to be authenticated are grouped according to the massive-voice grouping strategy, selecting the key features least affected by voice attack for stable grouping;
calculating the mean value μ_j of each distance set and normalising it so that the result is mapped into [0, 1]:
μ_j = (1/K) Σ_{k=1}^{K} d_j^(k),
μ̄_j = (μ_j − min_j μ_j) / (max_j μ_j − min_j μ_j);
setting a threshold T: when the normalised mean μ̄_j satisfies μ̄_j ≤ T, the corresponding feature is taken out and stored into the feature set θ, θ = θ ∪ f_j; otherwise the corresponding feature is discarded;
performing feature fusion on the feature set θ by principal component analysis to reduce the data volume, obtaining the dimension-reduced feature F_i of length N; that is, from each digital voice a sequence of length N is extracted as the feature sequence representing that voice;
sequentially extracting the fused feature of each digital voice s_i, denoted F_i, i = 1, 2, ..., I;
selecting certain features from the extracted features as clustering centres, and dividing the extracted voice features into m classes using a clustering-analysis algorithm;
according to the classification result of the features, dividing the massive digital voice into m groups, and denoting the grouped voice clusters as g_j = {s_{x1}, s_{x2}, ..., s_{xy} | x1, x2, ..., xy ∈ [1, I]}, j = 1, 2, ..., m.
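The principal-component fusion used above can be sketched with an SVD-based PCA. Keeping N = 2 components and the random 8×5 feature matrix are illustrative assumptions; the patent does not specify N or the PCA variant.

```python
# Sketch of the PCA feature-fusion step (assumed: SVD-based PCA, N = 2).
import numpy as np

def pca_fuse(theta, n_components):
    # theta: (I voices, D stacked robust features) -> (I, n_components)
    centered = theta - theta.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T   # project onto top components

rng = np.random.default_rng(1)
theta = rng.standard_normal((8, 5))         # 8 voices, 5 selected robust features
F = pca_fuse(theta, n_components=2)         # length-N feature sequence F_i per voice
```

Each row of F is the fused, dimension-reduced representation that later feeds the clustering-based grouping.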
Step 3: determining the relevance between voice data according to the grouping result;
the step 3 specifically comprises the following steps: according to a massive voice grouping strategy, all digital voices in each group are used for generating digital watermarks in the groups for grouping authentication;
embedding the group-authentication watermark information into the digital voice within the group while balancing inaudibility, embedding capacity and detection precision; denote the input massive digital voices as s_i, i = 1, 2, ..., I, where I is the number of digital voices;
according to the digital voice feature-extraction process, sequentially extracting the fused feature of each voice, denoted F_i, i = 1, 2, ..., I;
according to the classification result of the extracted features, dividing the massive digital voice into m groups, with the grouped voice clusters denoted g_j = {s_{x1}, s_{x2}, ..., s_{xy} | x1, x2, ..., xy ∈ [1, I]}, j = 1, 2, ..., m;
processing all the voice features within each group and finally converting them into a binary sequence that serves as the group watermark, denoted wg_m for the watermark generated for the mth group;
polling all digital voice signals contained in each group and generating the watermark information of each digital voice with a content-based digital voice watermark generation algorithm, the watermark generated for the jth digital voice in the mth group being denoted w_{m,j};
splicing the group watermark with the watermark generated from the single voice to obtain the watermark information finally embedded into the jth digital voice of the mth group, w*_{m,j} = wg_m ∥ w_{m,j};
embedding the intra-group digital voice watermarks into the voices using a distributed parallel method, executing the watermark-embedding algorithm on each digital voice simultaneously.
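The distributed parallel embedding step might be approximated on a single machine with a thread pool, as in the following sketch; `embed_one` is a hypothetical stub standing in for the (unspecified) watermark-embedding algorithm, and the pool stands in for the distributed platform.

```python
# Sketch of parallel intra-group embedding (assumed: thread pool, stub embed).
from concurrent.futures import ThreadPoolExecutor

def embed_one(voice, watermark):
    # Stub: a real implementation would run the watermark-embedding
    # algorithm on the voice signal and return the marked signal.
    return (voice, watermark)

# One (voice, spliced watermark) pair per digital voice in the group.
group = [("s1", "w1"), ("s2", "w2"), ("s3", "w3")]
with ThreadPoolExecutor(max_workers=4) as pool:
    marked = list(pool.map(lambda p: embed_one(*p), group))
```

Because the voices in a group are independent once their watermarks are generated, the embedding step parallelises trivially, which is what the distributed-parallel claim relies on.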
Step 4: using a multi-layer hierarchical and distributed parallel method to efficiently search for tampered voice among the massive voice and locate the tampered region within the digital voice.
The step 4 specifically comprises the following steps:
Step 4.1: denote the massive digital voice containing tampered voice as s_i, i = 1, 2, ..., N, where N is the number of digital voices;
dividing the massive digital voice into m groups according to the grouping strategy, with the grouped voice clusters denoted g_j = {s_{x1}, s_{x2}, ..., s_{xy} | x1, x2, ..., xy ∈ [1, N]}, j = 1, 2, ..., m;
consistent with the watermark generation and embedding process, sequentially generating the group watermark of each group according to the grouping result, the group watermark of the mth digital voice cluster being denoted wg_m;
using the extraction algorithm corresponding to the watermark-embedding algorithm to sequentially extract the group watermarks of all grouped voice clusters, the group watermark extracted from the mth digital voice cluster being denoted wg′_m.
Step 4.2: defining a group authentication sequence WG_m, letting WG_m = wg_m ⊕ wg′_m, with elements of WG_m in {0, 1}; when all elements of the group authentication sequence WG_m are 0, none of the digital voices in the current group has been tampered with, and the authentication ends; otherwise, when WG_m contains an element with value 1, at least one digital voice in the current group has been tampered with, and the digital voices within the group need further verification;
Step 4.3: for a group containing tampered voice, sequentially reconstructing the watermark information of each digital voice in the group, the reconstructed watermark of the jth digital voice in the mth tampered voice group being denoted w_{m,j};
sequentially extracting the watermark information embedded in each digital voice of the tampered voice group, the watermark extracted from the jth digital voice in the mth tampered voice group being denoted w′_{m,j};
defining the authentication sequence of the jth digital voice in the mth tampered voice group as W_{m,j}, letting W_{m,j} = w_{m,j} ⊕ w′_{m,j}, with elements of W_{m,j} in {0, 1}; when all elements of W_{m,j} are 0, the current digital voice has not been tampered with and the authentication ends; otherwise, when W_{m,j} contains an element with value 1, the current digital voice has been tampered with, the tampered position is located according to the watermark-embedding principle, and the authentication ends.
The second embodiment is as follows:
the method comprises the steps of performing stable grouping on batch voice and authenticating the content of the batch digital voice based on voice feature analysis, extracting key features of the digital voice in a time domain or a frequency domain, analyzing the influence of conventional voice signal processing attack on the extracted features, establishing a selection model, and selecting the key features which are less influenced by the voice attack to perform stable grouping on massive voice; and (3) mining the relevance among mass voice data, efficiently searching the tampered voice from the mass voice and accurately positioning the tampered area in the tampered digital voice by adopting a multilayer hierarchical and distributed parallel computing technology.
The main steps of the batch voice stable grouping part of the invention are as follows: extract digital voice features, including pitch period, formants, centroid, sub-band energy ratio, spectral component ratio, smooth pitch ratio, Mel-frequency cepstral coefficients, emotion information and the like. If a feature shows strong robustness in the attacked digital voice, it is selected and stored into the feature sequence; otherwise it is discarded. The features stored in the feature sequence are processed by a feature-fusion algorithm to construct strongly robust features of the digital voice. Using these strongly robust features, a clustering technique stably groups the massive digital voice, so that voices attacked to a certain degree can still be grouped consistently with their grouping before the attack.
The main steps of the batch digital voice content authentication part are as follows: using a multi-layer hierarchical and distributed parallel computing method, generate and embed watermarks for the batch voice and authenticate its content.
According to the massive-voice grouping strategy, the invention uses all digital voices in each group to generate the intra-group digital watermark for group authentication, embeds the group-authentication watermark information into the voices of the current group while balancing inaudibility, embedding capacity and detection precision, and, using parallel techniques and a watermark generation and embedding method based on a single digital voice, generates corresponding watermark information for each voice in the group and embeds it into that single digital voice.
The voices to be authenticated are grouped according to the massive-voice grouping strategy, and the voice groups are then authenticated step by step with a multi-layer hierarchical method, detecting through the group-authentication watermark information embedded in the grouped voices whether a whole group has suffered a tampering attack. If the detection result shows that the whole group is intact, all the voices in the current group are considered untampered; otherwise the digital voices within the group need further authentication. Using parallel techniques and a single-voice content authentication method, content-integrity detection is performed in parallel on each digital voice in the group, and the tampered region within a tampered voice is located.
The above description is only a preferred embodiment of the batch content authentication method for massive digital voice content; the protection scope is not limited to the above embodiments, and all technical solutions under this idea belong to the protection scope of the invention. It should be noted that modifications and variations made by those skilled in the art that do not depart from the gist of the invention are also intended to fall within the protection scope of the invention.

Claims (5)

1. A batch content authentication method for massive digital voice content, characterised in that the method comprises the following steps:
step 1: extracting digital voice features in a time domain or a frequency domain, and determining the influence of voice attack on the extracted features;
step 2: establishing a feature selection model, and selecting key features which are minimally affected by voice attack to stably group massive voices;
Step 3: determining the relevance between voice data according to the grouping result;
Step 4: using a multi-layer hierarchical and distributed parallel method to efficiently search for tampered voice among the massive voice and locate the tampered region within the digital voice.
2. The batch content authentication method for massive digital voice content according to claim 1, characterised in that step 1 specifically comprises:
Step 1.1: extracting digital voice features in the time or frequency domain, where the voice features include pitch period, formants, centroid, sub-band energy ratio, spectral component ratio, smooth pitch ratio, Mel-frequency cepstral coefficients and emotion information; denote the input massive digital voices as s_i, i = 1, 2, ..., I, where I is the number of digital voices;
Step 1.2: sequentially extracting multiple features from each digital voice; with J features extracted per voice, denote the set of jth features extracted from all digital voices as f_j = {f_{1,j}, f_{2,j}, ..., f_{i,j}, ..., f_{I,j}}, j = 1, 2, ..., J;
Step 3, carrying out K types of conventional signal processing of adding noise, resampling, requantizing and low-pass filtering on the input digital voice signal, and recording the digital voice after each type of signal processing as
Figure FDA0002931763630000011
Sequentially extracting feature sets of all digital voices after each conventional signal processing operation
Figure FDA0002931763630000012
Figure FDA0002931763630000013
Representing a set of jth features extracted after kth conventional signal processing;
sequentially calculate fjAnd
Figure FDA0002931763630000014
and is noted as the distance between
Figure FDA0002931763630000015
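Step 1's robustness measurement can be sketched with numpy only. The three features (spectral centroid, zero-crossing rate, sub-band energy ratio) are from the claim's feature list, but their formulas and the four toy attack implementations are assumptions, not the patent's; the output is the distance d_j^(k) between clean and attacked feature sets.

```python
# Sketch of claim 2 (step 1): extract features, apply K = 4 conventional
# signal-processing attacks, measure how far each feature moves.
import numpy as np

def extract_features(s):
    """Feature vector (spectral centroid, zero-crossing rate, sub-band ratio)."""
    spectrum = np.abs(np.fft.rfft(s))
    freqs = np.fft.rfftfreq(len(s))
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(s)))) / 2.0
    sub_band = np.sum(spectrum[: len(spectrum) // 4]) / (np.sum(spectrum) + 1e-12)
    return np.array([centroid, zcr, sub_band])

def add_noise(s, rng):  return s + 0.01 * rng.standard_normal(len(s))
def requantize(s, _):   return np.round(s * 127) / 127          # 8-bit requantization
def lowpass(s, _):      return np.convolve(s, np.ones(5) / 5, mode="same")
def resample(s, _):     # decimate by 2, then linear interpolation back up
    half = s[::2]
    return np.interp(np.arange(len(s)) / 2.0, np.arange(len(half)), half)

ATTACKS = [add_noise, requantize, lowpass, resample]

def distance_matrix(voices, rng):
    """d[k, j]: mean distance of feature j between f_j and f_j^(k)."""
    F = np.array([extract_features(s) for s in voices])          # I x J
    d = np.zeros((len(ATTACKS), F.shape[1]))
    for k, attack in enumerate(ATTACKS):
        Fk = np.array([extract_features(attack(s, rng)) for s in voices])
        d[k] = np.mean(np.abs(F - Fk), axis=0)
    return d
```

A small d[k, j] across all k is what the next claim interprets as "robust under attack".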
3. The batch content authentication method for massive digital voice contents according to claim 1, characterized in that step 2 specifically comprises the following steps:
establishing a feature selection model: when a voice feature remains robust in the attacked digital voice, the feature is stored into the feature sequence, and otherwise it is discarded; the voices to be detected are grouped according to the massive-voice grouping strategy, and the key features least affected by voice attacks are selected for stable grouping;
calculating the mean value μ_j of each distance set, μ_j = (1/K) Σ_{k=1}^{K} d_j^(k), and normalizing it so that the resulting value, denoted \hat{μ}_j, is mapped into [0, 1];
setting a threshold T; for each normalized mean \hat{μ}_j satisfying \hat{μ}_j ≤ T, the corresponding feature is taken out and stored into the feature set Θ, Θ = Θ ∪ f_j, and otherwise the corresponding feature is discarded;
performing feature fusion on the feature set Θ with principal component analysis to reduce the data volume, the feature after dimension reduction being F_i = {F_{i,1}, F_{i,2}, ..., F_{i,N}}, indicating that a sequence of length N is extracted from each digital voice as the feature sequence representing that voice;
sequentially extracting the fused feature of each digital voice s_i, denoted F_i, i = 1, 2, ..., I;
selecting certain features from the extracted features as clustering centers, and dividing the extracted voice features into m classes with a cluster analysis algorithm;
according to the classification result of the features, dividing the massive digital voices into m groups, and denoting the grouped voice cluster as g_j = {s_{x1}, s_{x2}, ..., s_{xy} | x1, x2, ..., xy ∈ [1, N]}, j = 1, 2, ..., m.
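Step 2 can be sketched as three small numpy routines: threshold selection over the normalized mean distances, PCA fusion via SVD, and a plain k-means grouping. The min-max normalization and the "keep if \hat{μ}_j ≤ T" rule are assumptions; the claim only says the mean is mapped to [0, 1] and that the least-affected features are kept.

```python
# Sketch of claim 3 (step 2): robust-feature selection, PCA fusion, grouping.
import numpy as np

def select_features(d, T=0.5):
    """d: K x J attack-distance matrix; return indices of robust features."""
    mu = d.mean(axis=0)                        # mu_j averaged over K attacks
    span = mu.max() - mu.min()
    mu_hat = (mu - mu.min()) / span if span > 0 else np.zeros_like(mu)
    return np.where(mu_hat <= T)[0]            # feature set Theta

def pca_fuse(F, n_components):
    """F: I x J selected features -> I x N fused features, via SVD."""
    Fc = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:n_components].T

def kmeans_group(F, m, iters=20, seed=0):
    """Plain k-means; returns a group label in [0, m) for each voice."""
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(len(F), m, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((F[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(m):
            if np.any(labels == c):
                centers[c] = F[labels == c].mean(axis=0)
    return labels
```

Because only attack-robust features feed the clustering, the same voice lands in the same group before and after conventional signal processing, which is what makes the grouping "stable".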
4. The batch content authentication method for massive digital voice contents according to claim 1, characterized in that step 3 specifically comprises the following steps: according to the massive-voice grouping strategy, all the digital voices in each group are used to generate an in-group digital watermark for group authentication;
the group authentication watermark information is embedded into the digital voices in the group while balancing inaudibility, embedding capacity and detection precision, the input massive digital voices being denoted s_i, i = 1, 2, ..., I, where I is the number of digital voices;
following the digital voice feature extraction process, the fused feature of each voice is extracted in turn and denoted F_i, i = 1, 2, ..., I;
according to the classification result of the extracted features, the massive digital voices are divided into m groups, the grouped voice cluster being denoted g_j = {s_{x1}, s_{x2}, ..., s_{xy} | x1, x2, ..., xy ∈ [1, N]}, j = 1, 2, ..., m;
all the voice features in each group are processed and finally converted into a binary sequence serving as the grouping watermark, denoted w_{gm}, the grouping watermark generated for the mth group;
polling all the digital voice signals contained in each group, generating the watermark information of each digital voice with a content-based digital voice watermark generation algorithm, and denoting the watermark generated from the jth digital voice in the mth group as w_{m,j};
splicing the grouping watermark with the watermark generated from the single voice to obtain the watermark information finally embedded into the jth digital voice of the mth group, namely the concatenation of w_{gm} and w_{m,j};
embedding the in-group digital voice watermarks into the voices with a distributed parallel method, the watermark embedding algorithm being executed in all the digital voices simultaneously.
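Step 3's generation-and-embedding can be sketched as follows. Hashing the group's pooled features into w_gm, hashing each voice's content into w_mj, and carrying the concatenated watermark in sample LSBs are all stand-ins for the patent's unspecified watermark and embedding algorithms; only the structure (group watermark ‖ voice watermark, embedded in every voice of the group) follows the claim.

```python
# Sketch of claim 4 (step 3): build w_gm and w_mj, embed w_gm || w_mj via LSB.
import hashlib
import numpy as np

def bits_from_hash(data: bytes, n_bits: int) -> np.ndarray:
    """First n_bits of SHA-256(data), as a 0/1 array."""
    digest = hashlib.sha256(data).digest()
    return np.unpackbits(np.frombuffer(digest, dtype=np.uint8))[:n_bits]

def group_watermark(group_features, n_bits=32):
    """w_gm: binary sequence derived from all the features in the group."""
    pooled = np.asarray(group_features).mean(axis=0).round(3)
    return bits_from_hash(pooled.tobytes(), n_bits)

def voice_watermark(samples: np.ndarray, n_bits=32):
    """w_mj: content-based watermark of one voice (LSBs masked out first,
    so the watermark still matches after its own embedding)."""
    return bits_from_hash((samples & ~1).astype(np.int16).tobytes(), n_bits)

def embed(samples: np.ndarray, w_gm, w_mj):
    """Concatenate w_gm || w_mj and write it into the leading sample LSBs."""
    w = np.concatenate([w_gm, w_mj]).astype(np.int16)
    out = samples.copy()
    out[: len(w)] = (out[: len(w)] & ~1) | w
    return out
```

Masking the LSB before hashing is the detail that keeps the scheme self-consistent: the watermark is computed over exactly the bits that embedding leaves untouched.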
5. The batch content authentication method for massive digital voice contents according to claim 1, characterized in that step 4 specifically comprises the following steps:
step 4.1: denoting the massive digital voices containing tampered voices as s_i, i = 1, 2, ..., N, where N is the number of digital voices;
dividing the massive digital voices into m groups according to the grouping strategy, the grouped voice cluster being denoted g_j = {s_{x1}, s_{x2}, ..., s_{xy} | x1, x2, ..., xy ∈ [1, N]}, j = 1, 2, ..., m;
consistently with the watermark generation and embedding process, the grouping watermark of each group is generated in turn according to the grouping result, the grouping watermark of the mth group of digital voice clusters being denoted w_{gm};
using the extraction algorithm corresponding to the watermark embedding algorithm, the grouping watermarks of all the grouped voice clusters are extracted in turn, the grouping watermark extracted from the mth group of digital voice clusters being denoted w'_{gm};
step 4.2: defining the group authentication sequence WG_m as the bitwise XOR of w_{gm} and w'_{gm}, with each element of WG_m in {0, 1}; when all the elements of WG_m are 0, none of the digital voices in the current group has been tampered with and the authentication ends; otherwise, when WG_m contains an element whose value is 1, at least one digital voice in the current group has been tampered with and the digital voices in the group need further verification;
step 4.3: for a group containing tampered voice, the watermark information of each digital voice in the group is reconstructed in turn, the reconstructed watermark of the jth digital voice in the mth tampered group being denoted w_{m,j};
the watermark information embedded in each digital voice of the tampered group is extracted in turn, the watermark extracted from the jth digital voice in the mth tampered group being denoted w'_{m,j};
defining the authentication sequence of the jth digital voice in the mth tampered group as W_{m,j}, the bitwise XOR of w_{m,j} and w'_{m,j}, with each element of W_{m,j} in {0, 1}; when all the elements of W_{m,j} are 0, the current digital voice has not been tampered with and the authentication ends; otherwise, when W_{m,j} contains an element whose value is 1, the current digital voice has been tampered with, the tampered position is located according to the watermark embedding principle, and the authentication ends.
CN202110149767.0A 2021-02-03 2021-02-03 Mass digital voice content oriented batch content authentication method Active CN112995135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110149767.0A CN112995135B (en) 2021-02-03 2021-02-03 Mass digital voice content oriented batch content authentication method


Publications (2)

Publication Number Publication Date
CN112995135A true CN112995135A (en) 2021-06-18
CN112995135B CN112995135B (en) 2021-11-02

Family

ID=76346372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110149767.0A Active CN112995135B (en) 2021-02-03 2021-02-03 Mass digital voice content oriented batch content authentication method

Country Status (1)

Country Link
CN (1) CN112995135B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060227968A1 (en) * 2005-04-08 2006-10-12 Chen Oscal T Speech watermark system
CN102915740A (en) * 2012-10-24 2013-02-06 兰州理工大学 Phonetic empathy Hash content authentication method capable of implementing tamper localization
CN107993669A (en) * 2017-11-20 2018-05-04 西南交通大学 Voice content certification and tamper recovery method based on modification least significant digit weight


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Tao: "Research on Perceptual Hash-based Speech Authentication for Mobile Terminals and Its Applications", China Master's Theses Full-text Database *

Also Published As

Publication number Publication date
CN112995135B (en) 2021-11-02

Similar Documents

Publication Publication Date Title
US10666792B1 (en) Apparatus and method for detecting new calls from a known robocaller and identifying relationships among telephone calls
Poisel et al. Forensics investigations of multimedia data: A review of the state-of-the-art
CN110968845B (en) Detection method for LSB steganography based on convolutional neural network generation
Luo et al. Improved audio steganalytic feature and its applications in audio forensics
CN106991312B (en) Internet anti-fraud authentication method based on voiceprint recognition
CN107507626A (en) A kind of mobile phone source title method based on voice spectrum fusion feature
Jaiswal et al. Aird: Adversarial learning framework for image repurposing detection
Xia et al. A privacy-preserving handwritten signature verification method using combinational features and secure kNN
CN116150651A (en) AI-based depth synthesis detection method and system
CN111863025A (en) Audio source anti-forensics method
WO2004038532A2 (en) Apparatus for online signature verification using pattern transform technique and method therefor
Zhang et al. Spectrogram-based Efficient Perceptual Hashing Scheme for Speech Identification.
CN105283916B (en) Electronic watermark embedded device, electronic watermark embedding method and computer readable recording medium
CN113591853A (en) Keyword extraction method and device and electronic equipment
CN112995135B (en) Mass digital voice content oriented batch content authentication method
Zhang et al. Aslnet: An encoder-decoder architecture for audio splicing detection and localization
Chuchra et al. A deep learning approach for splicing detection in digital audios
CN115982388A (en) Case quality control map establishing method, case document quality testing method, case quality control map establishing equipment and storage medium
US11763836B2 (en) Hierarchical generated audio detection system
Wu et al. The defender's perspective on automatic speaker verification: An overview
Diwan et al. Visualizing the truth: a survey of multimedia forensic analysis
CN113178199A (en) Digital audio tampering evidence obtaining method based on phase deviation detection
Wang et al. General GAN-generated image detection by data augmentation in fingerprint domain
Liu et al. Anti‐noise image source identification
CN112967712A (en) Synthetic speech detection method based on autoregressive model coefficient

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant