KR101382356B1 - Apparatus for forgery detection of audio file - Google Patents

Publication number
KR101382356B1
Authority
KR
South Korea
Prior art keywords
audio signal
learning
feature data
model
recording path
Prior art date
Application number
KR1020130079293A
Other languages
Korean (ko)
Inventor
김경화
유하진
백록선
Original Assignee
대한민국
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 대한민국
Priority to KR1020130079293A
Application granted
Publication of KR101382356B1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

An embodiment of the present invention relates to an apparatus which extracts feature data from an audio file and automatically detects a potentially forged segment using a recording path estimation result based on the feature data. The detection method of the apparatus includes the steps of: receiving an audio signal; dividing the audio signal into frame units of a predetermined length to generate multiple frame unit signals; extracting one of a Mel-Frequency Cepstral Coefficient (MFCC) value, a Linear Prediction Coding (LPC) value, and a Perceptual Linear Prediction (PLP) value of each of the frame unit signals as the feature data; setting the model most similar to the feature data among multiple audio signal recording path models as the estimated recording path of each frame unit signal; and setting the most frequently estimated recording path among the estimated recording paths as the default recording path and detecting, as an unusual segment suspected of forgery, a segment whose estimated recording path differs from the default recording path. [Reference numerals] (AA) Start; (BB) End; (S210) Step of receiving an audio signal; (S220) Step of dividing an audio signal in frame units of a predetermined length and generating frame unit signals; (S230) Step of extracting feature data from each frame unit signal; (S240) Step of estimating each recording path of the frame unit signal and generating an estimated recording path; (S250) Step of detecting an unusual segment based on the estimated recording path

Description

Apparatus for Forgery Detection of Audio File

An embodiment of the present invention relates to an apparatus for detecting whether or not an audio file is forged.

The contents described in this section merely provide background information on the embodiment of the present invention and do not constitute the prior art.

The popularity of digital devices such as digital recorders, MP3s, and smartphones has made it easier for anyone to record digital voice files. With the ease of secret recordings using digital devices, the number of cases in which voice files are submitted as evidence in court continues to increase.

Digital voice files are easy to forge by insertion, deletion, or copying, but such forgery is not easy to detect, so an expert judges whether a file has been forged through careful listening analysis and instrumental analysis. Analyzing an entire voice file for forgery in this way takes the expert a long time. A system that automatically detects sections suspected of forgery can therefore effectively reduce the time and cost required for file analysis.

The digital voice file contains sound signal information according to the characteristics of the recording device and ambient noise during recording, and has different characteristics depending on the compression technique or the storage technique. Automatic identification of these features can improve the accuracy of the analysis results while reducing the physical time required to analyze the evidence.

According to an embodiment of the present invention, feature data of an audio file is extracted, the recording path used in the process of generating the audio file is estimated, and a suspected forgery interval of the audio file is automatically detected based on the recording path estimation result. The main object of the present invention is to provide a forgery suspected interval detection device that can thereby shorten the physical time required to analyze evidence data and improve the accuracy of the analysis result.

One embodiment of the present invention provides a method for detecting a forgery suspected interval, comprising: receiving an audio signal; generating a plurality of frame unit signals by dividing the audio signal into frame units having a predetermined length; extracting feature data from each of the plurality of frame unit signals; generating an estimated recording path by estimating the recording path of all or each of the plurality of frame unit signals; and detecting a singular section based on the estimated recording path.

The generating of the frame unit signal may allow a plurality of the frame unit signals to overlap at regular intervals.

In the generating of the estimated recording path, the recording path of the model having the highest similarity with the feature data may be estimated as all or respective recording paths of the corresponding frame unit signal.

The detecting of the singular section may include setting the recording path having the highest frequency among the estimated recording paths as a default path, and detecting, as the singular section, a section including all or each of the frame unit signals having a recording path different from the default path.

The feature data may be generated by at least one of features such as Mel-Frequency Cepstral Coefficient (MFCC), Linear Prediction Coding (LPC) Coefficient, and Perceptual Linear Prediction (PLP) Coefficient.

The method for detecting a forgery suspect interval further includes adding a model, wherein adding the model includes: receiving path information of the model; Receiving a learning audio signal; Extracting learning feature data of the learning audio signal; And learning the feature data for learning.

The learning may use at least one of methods such as Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), and Support Vector Machine (SVM).

The extracting of the feature data for learning may be performed using the same method as the method of extracting the feature data used in the process of extracting the feature data.

Further, another aspect of the embodiment of the present invention provides a model learning method comprising: receiving path information of the model; receiving an audio signal for training the model; extracting feature data of the audio signal; and learning the feature data.

The route information may include at least one of information such as a device used for recording, a sample frequency, a compression method, and a transmission method.

The feature data may be generated by at least one of features such as Mel-Frequency Cepstral Coefficient (MFCC), Linear Prediction Coding (LPC) Coefficient, and Perceptual Linear Prediction (PLP) Coefficient.

The learning may use at least one of methods such as Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), and Support Vector Machine (SVM).

In addition, another aspect of an embodiment of the present invention provides an apparatus for detecting a forgery suspected interval, comprising: an audio signal receiving unit for receiving an audio signal; a frame dividing unit for dividing the audio signal into frame units having a predetermined length to generate a plurality of frame unit signals; a feature data extraction unit for extracting feature data from each of the plurality of frame unit signals; a model storage unit for storing a model; a recording path estimator for estimating the recording path of all or each of the plurality of frame unit signals using the feature data and the model to generate an estimated recording path; and a singular section comparison detection unit for detecting a singular section based on the estimated recording path.

The apparatus for detecting a forgery suspect period further includes a model adding unit, wherein the model adding unit includes: a learning feature data extracting unit for receiving a learning audio signal and extracting learning feature data from the learning audio signal; And a model learning unit configured to receive the training feature data and the path information of the model, train the model, generate the learning result as an additional model, and transmit the additional model to the model storage unit.

The learning feature data extracting unit may use the same method as the method of extracting the feature data used in the feature data extracting unit as a method of extracting the learning feature data.

In addition, another aspect of an embodiment of the present invention provides a model learning apparatus comprising: an audio signal receiving unit for receiving an audio signal; a feature data extraction unit for extracting feature data from the audio signal; and a model learner which receives the feature data and the path information of the audio signal, learns the feature data, and generates the learning result as a model.

According to an embodiment of the present invention, automatically detecting a section of the audio file suspected of forgery makes it possible to shorten the physical time required to analyze the audio file, to reduce the cost of employing expert personnel, and to improve the accuracy of the analysis.

FIG. 1 is a view schematically showing the configuration of a forgery suspected interval detection device according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an operation of a singular section detecting unit in a forgery suspected section detecting apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating in detail a process of generating a frame unit signal in the apparatus for detecting a suspected forgery interval according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating in detail a process of extracting feature data in the apparatus for detecting a suspected forgery interval according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating the operation of the model adder in the forgery suspected interval detection device according to an embodiment of the present invention.
FIG. 6 is a block diagram of a forgery suspected interval detection device according to an embodiment of the present invention.
FIG. 7 is a view showing in detail the model addition unit in the forgery suspected interval detection device according to an embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

It should be noted that, in adding reference numerals to the constituent elements of the drawings, the same constituent elements are denoted by the same reference symbols as far as possible even if they are shown in different drawings. In the following description of the present invention, a detailed description of known functions and configurations will be omitted when it may obscure the subject matter of the present invention.

In describing the components of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms are intended only to distinguish one constituent element from another, and they do not limit the nature, sequence, or order of the constituent elements. When a component is described as being "connected", "coupled", or "joined" to another component, the component may be directly connected or joined to the other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between the two.

FIG. 1 is a view schematically showing the configuration of the forgery suspected interval detection device 100 according to an embodiment of the present invention.

Forgery suspected interval detection device 100 according to an embodiment of the present invention may be configured to include a specific section detection unit 110 and the model adding unit 120.

The singular section detection unit 110 receives the audio signal in which a singular section suspected of forgery is to be detected, extracts feature data from the received audio signal, estimates the recording path of the audio signal using the extracted feature data and a model, and may be configured to detect a singular section based on the estimated recording path. In this case, the model may be generated by extracting training feature data from a training audio file generated through previously known path information and learning the extracted training feature data, for use in the recording path estimation process of the audio signal.

The model adding unit 120 receives a learning audio signal whose path information is known, extracts learning feature data from the received learning audio signal, learns the extracted learning feature data to generate a model, and may be configured to transmit the model to the singular section detection unit 110.

FIG. 2 is a flowchart illustrating an operation of the singular section detection unit 110 in the forgery suspected section detection device 100 according to an embodiment of the present invention.

The process by which the singular section detection unit 110 detects a singular section may include receiving an audio signal (S210), dividing the audio signal into frame units of a predetermined length to generate frame unit signals (S220), extracting feature data from each frame unit signal (S230), estimating the recording path of each frame unit signal using the feature data and the model to generate an estimated recording path (S240), and detecting the singular section based on the estimated recording path (S250). The operation of the singular section detection unit 110 is described following the order illustrated in FIG. 2 as an example, but the processes may be performed in a different order and are not limited to the order illustrated in FIG. 2.

The method for detecting the singular section by the singular section detecting unit 110 starts from a process of receiving an audio signal to detect the singular section (S210).

The audio signal to be detected for the specific section, that is, the specific section detection target audio signal may be a digital or analog signal conforming to the uncompressed format such as WAV, AIFF and AU and the compressed format such as MP3, WMA, MPC and OGG.

Thereafter, the received audio signal is divided into frame units having a predetermined length to generate a frame unit signal (S220). FIG. 3 is a diagram illustrating a process of generating a frame unit signal (S220) in detail.

As shown in FIG. 3, the received audio signal first undergoes a pre-emphasis process, a filtering step with a high-pass characteristic that compensates for the high-frequency components. Through this process the received audio signal is made to have a more uniform energy distribution over the entire band.
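A common way to realize pre-emphasis is a first-order difference filter, y[n] = x[n] - a·x[n-1]. The sketch below illustrates that form only; the function name `pre_emphasis` and the coefficient value 0.97 are assumptions for illustration and are not specified in this document.

```python
def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through.

    alpha=0.97 is an assumed, commonly used value, not one given in the patent.
    """
    if not samples:
        return []
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        # Subtracting a scaled copy of the previous sample attenuates
        # slowly varying (low-frequency) content relative to fast changes.
        out.append(cur - alpha * prev)
    return out
```

A constant input is strongly attenuated after the first sample, which is the high-pass behaviour the paragraph describes.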

The pre-emphasized audio signal is divided into frames. A window function such as a rectangular, Hamming, Hanning, or Kaiser-Bessel window may be used to divide the audio signal in units of frames.

At this time, it is preferable to set the frame length to a very short time (tens to hundreds of msec) over which the characteristics of the audio signal can be assumed to be stationary. In the present embodiment, the frame length is set to 3 to 4 times the pitch period, but is not necessarily limited thereto.

When the signal is divided into frame units for analysis as described above, adjacent frames are set to overlap to some extent in order to ensure continuity between analysis units. For example, if the frame length is set to 20 msec and the frame shift is set to 10 msec, adjacent frames overlap by 10 msec.
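The framing step with the 20 msec / 10 msec example can be sketched as follows. The function names `split_into_frames` and `hamming` are hypothetical, and dropping a trailing partial frame is an assumption of this sketch rather than something the document states.

```python
import math


def split_into_frames(samples, sample_rate, frame_ms=20, hop_ms=10):
    """Split a signal into overlapping frames (defaults: 20 ms frames, 10 ms shift)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    frames = []
    # Assumption: a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


def hamming(n):
    """Hamming window of length n (one of the window functions named in the text)."""
    if n < 2:
        return [1.0] * n
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(frame):
    """Multiply a frame element-wise by a Hamming window of the same length."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]
```

With a 1 kHz sampling rate, 20 msec frames hold 20 samples and consecutive frames share 10 samples, which matches the 10 msec overlap in the example.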

Feature data is extracted from the audio signal divided into the frame unit signals (S230). As features, Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Coding (LPC) coefficients, Perceptual Linear Prediction (PLP) coefficients, total power spectrum, subband power, frequency centroid, bandwidth, and pitch frequency may be used.

In describing the operation of the forgery suspected interval detection device 100 according to an embodiment of the present invention, the MFCC is described as an example, but is not limited thereto.

FIG. 4 is a diagram illustrating a process of extracting MFCC feature data (S230) in detail.

The frame unit signal is converted into the frequency domain by a Fast Fourier Transform (FFT), and the converted signal passes through a Mel-Scale Filter Bank.

The Mel-Scale Filter Bank is composed of a plurality of band-pass filters modeled on the human auditory structure. The center frequencies of the filters are arranged on the Mel scale, a unit of human perceptual frequency: linearly below 1 kHz and on a log scale at 1 kHz and above. Processing by the Mel-Scale Filter Bank makes the incoming audio signal similar to the spectral signal perceived by the human auditory system.

The signal that has passed through the Mel-Scale Filter Bank undergoes a logarithmic transformation, and the MFCC feature data is then obtained by a discrete cosine transform.
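The chain described above (spectrum, Mel-scale filter bank, logarithm, discrete cosine transform) can be sketched in plain Python as below. This is an illustrative sketch under several assumptions not taken from this document: the mel mapping 2595·log10(1 + f/700), triangular filters, 20 filters, 12 coefficients, and a naive DFT standing in for the FFT (it yields the same spectrum, only more slowly). All function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(hz):
    # Assumed mel mapping: near-linear at low frequencies, logarithmic higher up.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (stand-in for an FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filter_bank(num_filters, n_fft, sample_rate):
    """Triangular band-pass filters with centers equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising slope
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            weights[k] = (right - k) / (right - center)
        bank.append(weights)
    return bank


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(num_coeffs)]


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    spec = power_spectrum(frame)
    bank = mel_filter_bank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(weights, spec)) for weights in bank]
    # Floor the energies so that empty filters do not produce log(0).
    log_energies = [math.log(max(e, 1e-10)) for e in energies]
    return dct_ii(log_energies, num_coeffs)
```

Calling `mfcc(frame, sample_rate)` on one windowed frame returns one feature vector; repeating this for every frame gives the per-frame feature data used in the following steps.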

Next, an estimated recording path is generated by estimating the recording path of each frame unit signal (S240). For this step, a model trained on the recording path must exist. That is, a model is trained on feature data extracted from an audio signal whose path information is known, and the recording path of each frame is estimated by comparing the trained model with the feature data extracted from the detection target audio signal.

If there is no model, the model adder 120 may add the model. Description of the learning and addition of the model will be described in detail when explaining the operation of the model adding unit 120.

The estimated recording path is generated by estimating the recording path of the frame with the model with the highest similarity by comparing the feature data of each frame with the learned model.
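The selection rule (pick the model with the highest similarity for each frame) can be sketched as follows. As a simplifying assumption, each recording path model here is a single diagonal Gaussian given as a `(mean, variance)` pair and similarity is its log-likelihood; the document itself names GMM, HMM, and SVM as possible models. The function names and the labels "A" and "B" are hypothetical.

```python
import math


def log_likelihood(feature, model):
    """Log density of one feature vector under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(feature, mean, var))


def estimate_paths(frame_features, models):
    """Return, for each frame, the label of the model with the highest score."""
    return [max(models, key=lambda label: log_likelihood(f, models[label]))
            for f in frame_features]


# Hypothetical two-path example with 2-dimensional features.
models = {
    "A": ([0.0, 0.0], [1.0, 1.0]),
    "B": ([5.0, 5.0], [1.0, 1.0]),
}
paths = estimate_paths([[0.1, -0.2], [4.8, 5.3], [0.3, 0.0]], models)
```

Any other scoring function could be substituted for `log_likelihood` as long as a higher score means a closer match.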

The path information on the recording path means information that can affect the formation of the characteristics of the audio signal during its generation. It may include the device used for recording, the sampling frequency, the compression method, the transmission method, and so on, and such information may be combined to form the path information.

Information on the equipment used for recording includes the type of device, such as mini recorders, small ballpoint pen recorders, calculator recorders, mobile phones, video cameras, cassette tape recorders, and digital audio tapes (DAT), and may also include the model name and manufacturing serial number of each device.

As for the sampling frequency, home devices such as CD (Compact Disc), MD (Mini Disc), and LD (Laser Disc) use 44.1 kHz, and broadcasting equipment such as DAT uses 48.0 kHz. Some equipment uses a sampling frequency much higher than the Nyquist frequency, such as 96 kHz, to improve the signal-to-noise ratio.

Compression methods such as MPEG-1 Layer III (MP3), Advanced Audio Coding (AAC), High-Efficiency Advanced Audio Coding (HE-AAC), Dolby Digital (AC3), Adaptive Multi-Rate (AMR), Adaptive TRansform Acoustic Coding (ATRAC), Adaptive Differential Pulse Coded Modulation (ADPCM), and Free Lossless Audio Codec (FLAC) may be used.

Transmission methods such as AES/EBU (Audio Engineering Society / European Broadcasting Union), S/PDIF (Sony / Philips Digital Interface Format), ADI (Alesis Digital Interface), TDIF (Tascam Digital InterFace), and USB (Universal Serial Bus) may be used.

Depending on the device used, the sampling frequency, compression method, and transmission method used in the device may be determined.
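One way to represent combined path information is an immutable record that can also serve as the key under which a trained model is stored. This is only a sketch; the class name `RecordingPath`, its field names, and the example values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordingPath:
    """Combination of path information items; any item may be unknown (None)."""
    device: Optional[str] = None
    sampling_rate_hz: Optional[int] = None
    compression: Optional[str] = None
    transmission: Optional[str] = None


# frozen=True makes instances hashable, so they can key a model store.
phone_mp3 = RecordingPath("mobile phone", 44100, "MP3", "USB")
model_store = {phone_mp3: "trained model placeholder"}
```

Two records with identical fields compare equal, so a lookup with a freshly built `RecordingPath` finds the model stored under an equal key.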

When the forgery suspected interval detection device 100 according to an embodiment of the present invention is used in a trial or the like, such path information may be grasped by a record of a production process of an audio signal submitted as evidence.

Thereafter, the specific section is detected based on the estimated recording path for each frame (S250).

Among the estimated recording paths, an estimated recording path having the highest frequency is selected as the basic path, and a section including a frame having an estimated recording path different from the basic path is detected as a specific section.
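The majority vote and the detection of deviating sections can be sketched as below. Grouping consecutive deviating frames into one section and converting frame indices to milliseconds with a 20 msec frame and 10 msec shift are assumptions of this sketch; the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def detect_singular_sections(estimated_paths):
    """Return (basic_path, [(first_frame, last_frame, path), ...])."""
    if not estimated_paths:
        return None, []
    # The most frequent estimated path becomes the basic path.
    basic_path = Counter(estimated_paths).most_common(1)[0][0]
    sections = []
    index = 0
    for path, run in groupby(estimated_paths):
        length = len(list(run))
        if path != basic_path:
            # A run of consecutive frames on another path is one section.
            sections.append((index, index + length - 1, path))
        index += length
    return basic_path, sections


def section_to_ms(first_frame, last_frame, frame_ms=20, hop_ms=10):
    """Convert a frame index range to a (start_ms, end_ms) time range."""
    return first_frame * hop_ms, last_frame * hop_ms + frame_ms


basic, sections = detect_singular_sections(["A", "A", "B", "B", "A", "C", "A"])
```

In this example "A" is the basic path, and frames 2 to 3 and frame 5 are reported as singular sections for an expert to review.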

FIG. 5 is a flowchart illustrating the operation of the model adding unit 120 in the forgery suspected interval detection device 100 according to an embodiment of the present invention.

The method of adding the model by the model adding unit 120 may include receiving path information of the model (S510), receiving an audio signal for use in model learning (S520), extracting feature data of the audio signal (S530), and learning the feature data (S540). In describing the operation of the model adding unit 120, the order illustrated in FIG. 5 is described as an example, but the order of each process is not limited thereto.

First, path information of a model to be added to the singular section detection unit 110 is received (S510). The route information may include a device used for recording, a sample frequency, a compression method, and a transmission method. Since the description of the route information has been described in detail when describing the operation of the singular section detection unit 110, it will be omitted.

Thereafter, an audio signal to be used for learning the recording path model is received (S520). In this case, the audio signal to be used for learning the recording path model should be an audio signal generated by the path information received in step S510 of receiving path information of the model to be learned.

Then, feature data of the received audio signal is extracted (S530). At this time, the method of extracting the feature data should be the same as the method used in the step (S230) of extracting the feature data from each frame unit signal in the singular section detection unit 110.

Next, the model of the recording path is trained using the extracted feature data (S540). As a learning method, a Gaussian Mixture Model (GMM), a Hidden Markov Model (HMM), and a Support Vector Machine (SVM) may be used.
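The model addition flow (path information in, training features in, trained model stored) can be sketched as follows. As a deliberate simplification, the sketch fits a single diagonal Gaussian, which corresponds to a one-component GMM; it is not a full GMM, HMM, or SVM trainer. The function names, the variance floor, and the label string are assumptions for illustration.

```python
def train_diagonal_gaussian(features, var_floor=1e-6):
    """Fit per-dimension mean and variance to a list of feature vectors."""
    n = len(features)
    dims = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dims)]
    # The floor keeps variances strictly positive for later likelihood scoring.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in features) / n, var_floor)
           for d in range(dims)]
    return mean, var


def add_model(model_store, path_label, training_features):
    """Train a model for one recording path and register it under its label."""
    model_store[path_label] = train_diagonal_gaussian(training_features)
    return model_store


store = add_model({}, "hypothetical path A", [[1.0, 2.0], [3.0, 6.0]])
```

The training features must come from the same extraction method as the detection features, as the preceding step requires, or the stored model and the test features will not be comparable.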

With the forgery suspected interval detection device 100 according to an embodiment of the present invention, the recording path of an audio signal can be estimated, and when audio signals recorded through different recording paths have been edited into one audio signal, the forgery can be detected automatically, greatly reducing the manpower and time required to analyze evidence in criminal investigations or trials.

FIG. 6 is a block diagram of the apparatus 100 for detecting suspected forgery interval according to an embodiment of the present invention.

The forgery suspected interval detection device 100 according to an embodiment of the present invention may be configured to include a detection audio signal receiver 610, a frame divider 620, a feature data extractor 630, a recording path estimator 640, a model storage unit 650, a singular section comparison detection unit 660, and the model adding unit 120.

The detection audio signal receiver 610 receives a detection audio signal for detecting a specific section. The detection audio signal to detect a specific section may be a digital or analog signal that follows an uncompressed format such as WAV, AIFF and AU, and a compressed format such as MP3, WMA, MPC, and OGG.

The frame dividing unit 620 divides the audio signal for detection into frame units to generate a frame unit signal. The audio signal for detection is divided in units of frames through a pre-emphasis process. A window function such as rectangular, hamming, hanning and kaiser-bessel may be used to divide the audio signal for detection in units of frames.

In this case, in order to ensure continuity between analysis units, adjacent frames are set to overlap each other. For example, if the frame length is set to 20 msec and the frame shift is set to 10 msec, adjacent frames overlap by 10 msec.

The feature data extractor 630 receives the detection audio signal divided into the frame unit signals and extracts the feature data of the detection audio signal in frame units. As features, Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Coding (LPC) coefficients, Perceptual Linear Prediction (PLP) coefficients, total power spectrum, subband power, frequency centroid, bandwidth, and pitch frequency may be used.

In describing the operation of the forgery suspected interval detection device 100 according to an embodiment of the present invention, the MFCC is described as an example, but is not limited thereto.

The frame unit signal is converted into the frequency domain by a Fast Fourier Transform (FFT), and the converted signal passes through a Mel-Scale Filter Bank. The signal that has passed through the Mel-Scale Filter Bank undergoes a logarithmic transformation, and the MFCC feature data is then obtained by a discrete cosine transform.

The recording path estimator 640 receives the feature data, estimates the recording path of each frame unit signal, and generates an estimated recording path. In this case, the model learned about the recording path stored in the model storage unit 650 is referred to. If there is no model, the model adder 120 may add the model. The estimated recording path is generated by estimating the recording path of the frame with the model with the highest similarity by comparing the feature data of each frame with the learned model.

The path information of the recording path means information that may affect the formation of the characteristics of the audio signal during its generation. It may include the device used for recording, the sampling frequency, the encoding method, the compression method, the transmission method, and the like, and such information may be combined to form the path information. Details are omitted since they have been described above.

The singular section comparison detection unit 660 receives the estimated recording path for each frame, selects the estimated recording path having the highest frequency among the estimated recording paths as the default path, and detects a section including a frame having an estimated recording path different from the default path as a singular section.

FIG. 7 is a view showing in detail the model adding unit 120 in the forgery suspected interval detection device 100 according to an embodiment of the present invention.

The model adder 120 may include a training audio signal receiver 710, a training feature data extractor 720, and a model learner 730.

The training audio signal receiver 710 receives a training audio signal to be used for learning the recording path model. At this time, the learning audio signal to be used for learning the recording path model should be an audio signal that knows the path information.

The training feature data extractor 720 extracts training feature data from the training audio signal. At this time, the method for extracting the training feature data should be the same as the method for extracting the feature data used in the feature data extraction unit 630 of the singular section detection unit 110.

The model learner 730 receives the path information and the training feature data of the learning audio signal, trains the model, generates a learning result as an additional model, and transmits the training result to the model storage unit 650. As a learning method, a Gaussian Mixture Model (GMM), a Hidden Markov Model (HMM), and a Support Vector Machine (SVM) may be used.

The forgery suspected interval detection device 100 according to an embodiment of the present invention may be implemented in the form of a general-purpose PCB, a digital signal processing (DSP) chip, a field-programmable gate array (FPGA), an application specific integrated circuit (ASIC), or a software program, but is not limited to any one of these.

While the present invention has been described in connection with what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. That is, within the scope of the present invention, all of the components may be selectively coupled to one or more of them.

The above description is merely illustrative of the technical idea of the present invention, and those skilled in the art to which the present invention pertains may make various modifications and changes without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed in the present invention are not intended to limit the technical idea of the present invention but to describe the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments. The scope of protection of the present invention should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto should be construed as being included in the scope of the present invention.

100: forgery suspected section detection device
110: singular section detection unit 120: model addition unit
610: detection audio signal receiving unit 620: frame division unit
630: feature data extraction unit 640: recording path estimation unit
650: model storage unit 660: specific section comparison detection unit
710: learning audio signal receiving unit 720: learning feature data extraction unit
730: model learning unit

Claims (16)

1. A forgery-suspected section detection method comprising:
receiving an audio signal;
generating a plurality of frame unit signals by dividing the audio signal into frame units having a predetermined length;
extracting, as feature data, one of a Mel-Frequency Cepstral Coefficient (MFCC), a Linear Prediction Coding (LPC) coefficient, and a Perceptual Linear Prediction (PLP) coefficient from each of the plurality of frame unit signals;
setting, as the estimated recording path of each frame unit signal, the audio signal recording path whose model has the highest similarity with the feature data among models of a plurality of audio signal recording paths; and
setting the most frequently estimated recording path among the estimated recording paths as a basic recording path, and detecting a section including a frame unit signal whose estimated recording path differs from the basic recording path as a singular section suspected of forgery.
2. The method according to claim 1,
wherein, in the generating of the plurality of frame unit signals, the frame unit signals are generated so as to overlap one another at regular intervals.
3. (Deleted)
4. (Deleted)
5. (Deleted)
6. The method according to claim 1,
further comprising adding a model for an audio signal recording path,
wherein the adding of the model for the audio signal recording path comprises:
receiving a learning audio signal;
receiving, as a learning audio signal recording path, path information including at least one of a sampling frequency, a compression method, a device used for recording, and a transmission method for the learning audio signal;
extracting, from the learning audio signal, one of a Mel-Frequency Cepstral Coefficient (MFCC), a Linear Prediction Coding (LPC) coefficient, and a Perceptual Linear Prediction (PLP) coefficient as learning feature data; and
learning a model for the learning audio signal recording path from the learning feature data using at least one learning method among a Gaussian Mixture Model (GMM), a Hidden Markov Model (HMM), and a Support Vector Machine (SVM).
7. (Deleted)
8. The method according to claim 6,
wherein the learning feature data is extracted using the same method as the method used in the extracting of the feature data.
9. A model learning method comprising:
receiving a learning audio signal and recording path information of the learning audio signal;
extracting, from the learning audio signal, one of a Mel-Frequency Cepstral Coefficient (MFCC), a Linear Prediction Coding (LPC) coefficient, and a Perceptual Linear Prediction (PLP) coefficient as feature data; and
learning the recording path information using the feature data.
10. The method according to claim 9,
wherein the recording path information includes at least one of a device used for recording, a sampling frequency, a compression method, and a transmission method.
11. (Deleted)
12. (Deleted)
13. A forgery-suspected section detection device comprising:
an audio signal receiving unit configured to receive an audio signal;
a frame division unit configured to divide the audio signal into frame units having a predetermined length to generate a plurality of frame unit signals;
a feature data extraction unit configured to extract, as feature data, one of a Mel-Frequency Cepstral Coefficient (MFCC), a Linear Prediction Coding (LPC) coefficient, and a Perceptual Linear Prediction (PLP) coefficient from each of the plurality of frame unit signals;
a model storage unit configured to store at least one model for an audio signal recording path;
a recording path estimation unit configured to set, as the estimated recording path of each of the frame unit signals, the audio signal recording path whose model has the highest similarity with the feature data among the models of the audio signal recording paths; and
a singular section comparison detection unit configured to set the most frequently estimated recording path among the estimated recording paths as a basic recording path, and to detect a section including a frame unit signal whose estimated recording path differs from the basic recording path as a singular section suspected of forgery.
14. The device according to claim 13,
further comprising a model addition unit,
wherein the model addition unit comprises:
a learning feature data extraction unit configured to receive a learning audio signal and to extract, as learning feature data, one of a Mel-Frequency Cepstral Coefficient (MFCC), a Linear Prediction Coding (LPC) coefficient, and a Perceptual Linear Prediction (PLP) coefficient from the learning audio signal; and
a model learning unit configured to receive path information of the learning audio signal, learn the path information using the learning feature data, additionally generate a model for an audio signal recording path based on the learning result, and transmit the model to the model storage unit,
wherein the learning is performed by at least one learning method among a Gaussian Mixture Model (GMM), a Hidden Markov Model (HMM), and a Support Vector Machine (SVM).
15. The device according to claim 14,
wherein the learning feature data extraction unit extracts the learning feature data using the same method as the method by which the feature data extraction unit extracts the feature data.
16. A model learning device comprising:
an audio signal receiving unit configured to receive an audio signal and recording path information of the audio signal;
a feature data extraction unit configured to extract, from the audio signal, one of a Mel-Frequency Cepstral Coefficient (MFCC), a Linear Prediction Coding (LPC) coefficient, and a Perceptual Linear Prediction (PLP) coefficient as feature data; and
a model learning unit configured to learn the recording path information using the feature data and to generate a model for the audio signal recording path based on the result of the learning.
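
The frame division and singular section detection steps recited in the claims can be sketched as follows. This is an illustrative sketch only, under stated assumptions: the per-frame estimated recording paths are taken as already computed (the feature extraction and model comparison steps are omitted), and the function names, frame length, hop size, and path labels "A", "B", "C" are hypothetical.

```python
from collections import Counter


def split_frames(samples, frame_len, hop):
    """Divide a signal into fixed-length frames; hop < frame_len yields overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def detect_singular_sections(estimated_paths):
    """Return (basic_path, sections).

    basic_path is the most frequently estimated recording path (ties are
    resolved in favour of the path seen first). Each section is an inclusive
    (first_frame, last_frame) index range whose frames were all estimated to
    come from a path other than basic_path.
    """
    basic_path = Counter(estimated_paths).most_common(1)[0][0]
    sections = []
    start = None
    for i, path in enumerate(estimated_paths):
        if path != basic_path and start is None:
            start = i                        # a suspicious run begins
        elif path == basic_path and start is not None:
            sections.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:                    # a run that reaches the final frame
        sections.append((start, len(estimated_paths) - 1))
    return basic_path, sections


# Toy signal of 10 samples, frames of 4 samples overlapping by 2.
frames = split_frames(list(range(10)), frame_len=4, hop=2)

# Hypothetical per-frame recording path estimates.
estimated = ["A", "A", "A", "B", "B", "A", "A", "C"]
basic, suspects = detect_singular_sections(estimated)
```

With these toy estimates the basic recording path is "A", and frames 3 to 4 and frame 7 are reported as singular sections. Adjacent deviating frames are merged into one section regardless of which other path they were assigned to, which is one possible reading of "a section including the frame unit signal".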
KR1020130079293A 2013-07-05 2013-07-05 Apparatus for forgery detection of audio file KR101382356B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020130079293A KR101382356B1 (en) 2013-07-05 2013-07-05 Apparatus for forgery detection of audio file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020130079293A KR101382356B1 (en) 2013-07-05 2013-07-05 Apparatus for forgery detection of audio file

Publications (1)

Publication Number Publication Date
KR101382356B1 (en) 2014-04-10

Family

ID=50656863

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130079293A KR101382356B1 (en) 2013-07-05 2013-07-05 Apparatus for forgery detection of audio file

Country Status (1)

Country Link
KR (1) KR101382356B1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101566425B1 (en) 2014-07-23 2015-11-06 대한민국 Detecting Method of Suspicious Points of Editing through Analysis of Frequency Distribution
CN112397102A (en) * 2019-08-14 2021-02-23 腾讯科技(深圳)有限公司 Audio processing method and device and terminal
CN113409771A (en) * 2021-05-25 2021-09-17 合肥讯飞数码科技有限公司 Detection method for forged audio frequency, detection system and storage medium thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6973573B1 (en) 2000-02-23 2005-12-06 Doug Carson & Associates, Inc. Detection of a digital data fingerprint
KR20080098878A (en) * 2007-05-07 2008-11-12 (주)엔써즈 Method and apparatus for generating audio fingerprint data and comparing audio data using the same
US20090031425A1 (en) 2007-07-27 2009-01-29 International Business Machines Corporation Methods, systems, and computer program products for detecting alteration of audio or image data
JP2009070026A (en) 2007-09-12 2009-04-02 Mitsubishi Electric Corp Recording device, verification device, reproduction device, recording method, verification method, and program

Similar Documents

Publication Publication Date Title
JP5362178B2 (en) Extracting and matching characteristic fingerprints from audio signals
Dean et al. The QUT-NOISE-TIMIT corpus for evaluation of voice activity detection algorithms
Cuccovillo et al. Audio tampering detection via microphone classification
WO2019148586A1 (en) Method and device for speaker recognition during multi-person speech
CN110189757A (en) A kind of giant panda individual discrimination method, equipment and computer readable storage medium
US20200265864A1 (en) Segmentation-based feature extraction for acoustic scene classification
CN102486920A (en) Audio event detection method and device
CN111640411B (en) Audio synthesis method, device and computer readable storage medium
Zhao et al. Audio splicing detection and localization using environmental signature
US9058384B2 (en) System and method for identification of highly-variable vocalizations
US9792898B2 (en) Concurrent segmentation of multiple similar vocalizations
CN105825857A (en) Voiceprint-recognition-based method for assisting deaf patient in determining sound type
WO2015092492A1 (en) Audio information processing
CN111782861A (en) Noise detection method and device and storage medium
CN105719660A (en) Voice tampering positioning detection method based on quantitative characteristic
KR101382356B1 (en) Apparatus for forgery detection of audio file
KR101808810B1 (en) Method and apparatus for detecting speech/non-speech section
JP4985134B2 (en) Scene classification device
Patil et al. Combining evidences from mel cepstral features and cepstral mean subtracted features for singer identification
Marković et al. Reverberation-based feature extraction for acoustic scene classification
Zhang et al. Deep scattering spectra with deep neural networks for acoustic scene classification tasks
CN104715756A (en) Audio data processing method and device
CN111627426B (en) Method and system for eliminating channel difference in voice interaction, electronic equipment and medium
CN112750458B (en) Touch screen sound detection method and device
Patole et al. Acoustic environment identification using blind de-reverberation

Legal Events

Date Code Title Description
A201 Request for examination
A302 Request for accelerated examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant